ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups
Submit Recipe
My Recipes

All Recipes
All Cookbooks


View by Category

Title: Safe Expression Evaluation
Submitter: Sami Hangaslammi (other recipes)Sami Hangaslammi (other recipes)
Last Updated: 2004/06/14
Version no: 1.2
Category: Programs

 

3 stars 1 vote(s)


Description:

Often, we might want to let (untrusted) users input simple Python expressions and evaluate them, but the eval-function in Python is unsafe. The restricted execution model in the rexec module is deprecated, so we need another way ensure only "safe" expressions will be evaluted: analyzing bytecodes.

Source: Text Source

import dis

_const_codes = map(dis.opmap.__getitem__, [
    'POP_TOP','ROT_TWO','ROT_THREE','ROT_FOUR','DUP_TOP',
    'BUILD_LIST','BUILD_MAP','BUILD_TUPLE',
    'LOAD_CONST','RETURN_VALUE','STORE_SUBSCR'
    ])

_expr_codes = _const_codes + map(dis.opmap.__getitem__, [
    'UNARY_POSITIVE','UNARY_NEGATIVE','UNARY_NOT',
    'UNARY_INVERT','BINARY_POWER','BINARY_MULTIPLY',
    'BINARY_DIVIDE','BINARY_FLOOR_DIVIDE','BINARY_TRUE_DIVIDE',
    'BINARY_MODULO','BINARY_ADD','BINARY_SUBTRACT',
    'BINARY_LSHIFT','BINARY_RSHIFT','BINARY_AND','BINARY_XOR',
    'BINARY_OR',
    ])

def _get_opcodes(codeobj):
    """_get_opcodes(codeobj) -> [opcodes]

    Extract the actual opcodes as a list from a code object

    >>> c = compile("[1 + 2, (1,2)]", "", "eval")
    >>> _get_opcodes(c)
    [100, 100, 23, 100, 100, 102, 103, 83]
    """
    i = 0
    opcodes = []
    s = codeobj.co_code
    while i < len(s):
        code = ord(s[i])
        opcodes.append(code)
        if code >= dis.HAVE_ARGUMENT:
            i += 3
        else:
            i += 1
    return opcodes        

def test_expr(expr, allowed_codes):
    """test_expr(expr) -> codeobj

    Test that the expression contains only the listed opcodes.
    If the expression is valid and contains only allowed codes,
    return the compiled code object. Otherwise raise a ValueError
    """
    try:
        c = compile(expr, "", "eval")
    except:
        raise ValueError, "%s is not a valid expression", expr
    codes = _get_opcodes(c)
    for code in codes:
        if code not in allowed_codes:
            raise ValueError, "opcode %s not allowed" % dis.opname[code]
    return c


def const_eval(expr):
    """const_eval(expression) -> value

    Safe Python constant evaluation

    Evaluates a string that contains an expression describing
    a Python constant. Strings that are not valid Python expressions
    or that contain other code besides the constant raise ValueError.

    >>> const_eval("10")
    10
    >>> const_eval("[1,2, (3,4), {'foo':'bar'}]")
    [1, 2, (3, 4), {'foo': 'bar'}]
    >>> const_eval("1+2")
    Traceback (most recent call last):
    ...
    ValueError: opcode BINARY_ADD not allowed
    """
    c = test_expr(expr, _const_codes)
    return eval(c)

def expr_eval(expr):
    """expr_eval(expression) -> value

    Safe Python expression evaluation

    Evaluates a string that contains an expression that only
    uses Python constants. This can be used to e.g. evaluate 
    a numerical expression from an untrusted source.

    >>> expr_eval("1+2")
    3
    >>> expr_eval("[1,2]*2")
    [1, 2, 1, 2]
    >>> expr_eval("__import__('sys').modules")
    Traceback (most recent call last):
    ...
    ValueError: opcode LOAD_NAME not allowed
    """
    c = test_expr(expr, _expr_codes)
    return eval(c)

Discussion:

The same sort of method could be used to analyze and allow more complex code too. We could allow the LOAD_NAME opcode and then be very restrictive about the names allowed in the code by examining the code object's co_names field. This way, we could for example extend the expr_eval function to allow mathematical functions like sin and cos.

The greatest caveat is that relying on bytecodes is probably not very portable between Python versions. The above examples have been tested in Python 2.3

Also, it should be noted that a malicious user can still for example cause the expression to take vast amounts of memory by inputting something like '100**100**100**100**100**100...'. There is no way to really prevent this from within Python, without making the expression limitations too restrictive.



Add comment

No comments.



Highest rated recipes:

1. A simple XML-RPC server

2. Web service accessible ...

3. Wrapping template engine ...

4. Assignment in expression

5. SOLVING THE METACLASS ...

6. Povray for python

7. Calling Windows API ...

8. Generic filter logic ...

9. Function Decorators by ...

10. MS SQL Server log monitor




Privacy Policy | Email Opt-out | Feedback | Syndication
© 2006 ActiveState Software Inc. All rights reserved.