Welcome, guest | Sign In | My Account | Store | Cart

Often, we might want to let (untrusted) users input simple Python expressions and evaluate them, but the eval-function in Python is unsafe. The restricted execution model in the rexec module is deprecated, so we need another way ensure only "safe" expressions will be evaluted: analyzing bytecodes.

Python, 97 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
import dis

_const_codes = map(dis.opmap.__getitem__, [
    'POP_TOP','ROT_TWO','ROT_THREE','ROT_FOUR','DUP_TOP',
    'BUILD_LIST','BUILD_MAP','BUILD_TUPLE',
    'LOAD_CONST','RETURN_VALUE','STORE_SUBSCR'
    ])

_expr_codes = _const_codes + map(dis.opmap.__getitem__, [
    'UNARY_POSITIVE','UNARY_NEGATIVE','UNARY_NOT',
    'UNARY_INVERT','BINARY_POWER','BINARY_MULTIPLY',
    'BINARY_DIVIDE','BINARY_FLOOR_DIVIDE','BINARY_TRUE_DIVIDE',
    'BINARY_MODULO','BINARY_ADD','BINARY_SUBTRACT',
    'BINARY_LSHIFT','BINARY_RSHIFT','BINARY_AND','BINARY_XOR',
    'BINARY_OR',
    ])

def _get_opcodes(codeobj):
    """_get_opcodes(codeobj) -> [opcodes]

    Extract the actual opcodes as a list from a code object

    >>> c = compile("[1 + 2, (1,2)]", "", "eval")
    >>> _get_opcodes(c)
    [100, 100, 23, 100, 100, 102, 103, 83]
    """
    i = 0
    opcodes = []
    s = codeobj.co_code
    while i < len(s):
        code = ord(s[i])
        opcodes.append(code)
        if code >= dis.HAVE_ARGUMENT:
            i += 3
        else:
            i += 1
    return opcodes        

def test_expr(expr, allowed_codes):
    """test_expr(expr) -> codeobj

    Test that the expression contains only the listed opcodes.
    If the expression is valid and contains only allowed codes,
    return the compiled code object. Otherwise raise a ValueError
    """
    try:
        c = compile(expr, "", "eval")
    except:
        raise ValueError, "%s is not a valid expression", expr
    codes = _get_opcodes(c)
    for code in codes:
        if code not in allowed_codes:
            raise ValueError, "opcode %s not allowed" % dis.opname[code]
    return c


def const_eval(expr):
    """const_eval(expression) -> value

    Safe Python constant evaluation

    Evaluates a string that contains an expression describing
    a Python constant. Strings that are not valid Python expressions
    or that contain other code besides the constant raise ValueError.

    >>> const_eval("10")
    10
    >>> const_eval("[1,2, (3,4), {'foo':'bar'}]")
    [1, 2, (3, 4), {'foo': 'bar'}]
    >>> const_eval("1+2")
    Traceback (most recent call last):
    ...
    ValueError: opcode BINARY_ADD not allowed
    """
    c = test_expr(expr, _const_codes)
    return eval(c)

def expr_eval(expr):
    """expr_eval(expression) -> value

    Safe Python expression evaluation

    Evaluates a string that contains an expression that only
    uses Python constants. This can be used to e.g. evaluate 
    a numerical expression from an untrusted source.

    >>> expr_eval("1+2")
    3
    >>> expr_eval("[1,2]*2")
    [1, 2, 1, 2]
    >>> expr_eval("__import__('sys').modules")
    Traceback (most recent call last):
    ...
    ValueError: opcode LOAD_NAME not allowed
    """
    c = test_expr(expr, _expr_codes)
    return eval(c)

The same sort of method could be used to analyze and allow more complex code too. We could allow the LOAD_NAME opcode and then be very restrictive about the names allowed in the code by examining the code object's co_names field. This way, we could for example extend the expr_eval function to allow mathematical functions like sin and cos.

The greatest caveat is that relying on bytecodes is probably not very portable between Python versions. The above examples have been tested in Python 2.3

Also, it should be noted that a malicious user can still for example cause the expression to take vast amounts of memory by inputting something like '100100100100100**100...'. There is no way to really prevent this from within Python, without making the expression limitations too restrictive.

Created by Sami Hangaslammi on Mon, 14 Jun 2004 (PSF)
Python recipes (4591)
Sami Hangaslammi's recipes (7)

Required Modules

Other Information and Tasks