Often we want to let (untrusted) users input simple Python expressions and evaluate them, but Python's eval function is unsafe. The restricted execution model in the rexec module is deprecated, so we need another way to ensure that only "safe" expressions are evaluated: analyzing bytecodes.
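To see what "analyzing bytecodes" means, we can list the opcodes the compiler emits for an expression. The sketch below is written for Python 3.4+ (where dis.get_instructions exists), not for the Python 2.3 this recipe targets, and the helper name opnames is hypothetical. The exact opcode names printed vary between Python versions.

```python
import dis

def opnames(expr):
    """Hypothetical helper: opcode names emitted for an eval-mode expression."""
    code = compile(expr, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

# Only constants, list building and arithmetic: no name lookups.
print(opnames("[1, 2] * 2"))
# Reaching __import__ requires a LOAD_NAME, which a whitelist can reject.
print(opnames("__import__('os')"))
```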
```python
import dis

# Opcodes needed to build constant values (numbers, strings, lists,
# tuples and dicts) in Python 2.3 bytecode.
_const_codes = map(dis.opmap.__getitem__, [
    'POP_TOP', 'ROT_TWO', 'ROT_THREE', 'ROT_FOUR', 'DUP_TOP',
    'BUILD_LIST', 'BUILD_MAP', 'BUILD_TUPLE',
    'LOAD_CONST', 'RETURN_VALUE', 'STORE_SUBSCR'
    ])

# The constant opcodes plus unary and binary arithmetic operators.
_expr_codes = _const_codes + map(dis.opmap.__getitem__, [
    'UNARY_POSITIVE', 'UNARY_NEGATIVE', 'UNARY_NOT',
    'UNARY_INVERT', 'BINARY_POWER', 'BINARY_MULTIPLY',
    'BINARY_DIVIDE', 'BINARY_FLOOR_DIVIDE', 'BINARY_TRUE_DIVIDE',
    'BINARY_MODULO', 'BINARY_ADD', 'BINARY_SUBTRACT',
    'BINARY_LSHIFT', 'BINARY_RSHIFT', 'BINARY_AND', 'BINARY_XOR',
    'BINARY_OR',
    ])

def _get_opcodes(codeobj):
    """_get_opcodes(codeobj) -> [opcodes]

    Extract the actual opcodes as a list from a code object

    >>> c = compile("[1 + 2, (1,2)]", "", "eval")
    >>> _get_opcodes(c)
    [100, 100, 23, 100, 100, 102, 103, 83]
    """
    i = 0
    opcodes = []
    s = codeobj.co_code
    while i < len(s):
        code = ord(s[i])
        opcodes.append(code)
        # Opcodes that take an argument are followed by a 2-byte operand.
        if code >= dis.HAVE_ARGUMENT:
            i += 3
        else:
            i += 1
    return opcodes

def test_expr(expr, allowed_codes):
    """test_expr(expr, allowed_codes) -> codeobj

    Test that the expression contains only the listed opcodes.
    If the expression is valid and contains only allowed codes,
    return the compiled code object. Otherwise raise a ValueError
    """
    try:
        c = compile(expr, "", "eval")
    except (SyntaxError, TypeError, ValueError):
        # Report any compilation failure as ValueError.
        raise ValueError("%s is not a valid expression" % (expr,))
    codes = _get_opcodes(c)
    for code in codes:
        if code not in allowed_codes:
            raise ValueError("opcode %s not allowed" % dis.opname[code])
    return c

def const_eval(expr):
    """const_eval(expression) -> value

    Safe Python constant evaluation

    Evaluates a string that contains an expression describing
    a Python constant. Strings that are not valid Python expressions
    or that contain other code besides the constant raise ValueError.

    >>> const_eval("10")
    10
    >>> const_eval("[1,2, (3,4), {'foo':'bar'}]")
    [1, 2, (3, 4), {'foo': 'bar'}]
    >>> const_eval("1+2")
    Traceback (most recent call last):
    ...
    ValueError: opcode BINARY_ADD not allowed
    """
    c = test_expr(expr, _const_codes)
    return eval(c)

def expr_eval(expr):
    """expr_eval(expression) -> value

    Safe Python expression evaluation

    Evaluates a string that contains an expression that only
    uses Python constants. This can be used to e.g. evaluate
    a numerical expression from an untrusted source.

    >>> expr_eval("1+2")
    3
    >>> expr_eval("[1,2]*2")
    [1, 2, 1, 2]
    >>> expr_eval("__import__('sys').modules")
    Traceback (most recent call last):
    ...
    ValueError: opcode LOAD_NAME not allowed
    """
    c = test_expr(expr, _expr_codes)
    return eval(c)
```
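The listing above targets Python 2.3 and will not run on Python 3: co_code is bytes with 2-byte instructions since CPython 3.6, and many opcode names changed. A rough Python 3 sketch of the same idea compares opcode names from dis.get_instructions against allowlists. The allowlists are an assumption about what CPython 3.8 and later emit for these expressions; check them against dis output on your interpreter before relying on them. Passing an empty __builtins__ to eval is an extra precaution that is not part of the original recipe.

```python
import dis

# Assumed opcode names for building constants across CPython 3.x releases.
# Names the running interpreter does not define are simply never matched.
_CONST_OPS = frozenset([
    "RESUME", "NOP",                       # prologue / padding in newer releases
    "LOAD_CONST", "LOAD_SMALL_INT",        # LOAD_SMALL_INT only in newer releases
    "RETURN_VALUE", "RETURN_CONST",        # RETURN_CONST only in some 3.12+ releases
    "BUILD_LIST", "BUILD_TUPLE", "BUILD_SET", "BUILD_MAP",
    "BUILD_CONST_KEY_MAP", "LIST_EXTEND", "SET_UPDATE",
])

# Constants plus unary and binary arithmetic.
_EXPR_OPS = _CONST_OPS | frozenset([
    "UNARY_POSITIVE", "UNARY_NEGATIVE", "UNARY_INVERT", "UNARY_NOT",
    "TO_BOOL",                             # emitted before UNARY_NOT in newer releases
    "BINARY_OP",                           # 3.11+: one opcode for all binary operators
    # Pre-3.11 spellings of the binary operators:
    "BINARY_POWER", "BINARY_MULTIPLY", "BINARY_FLOOR_DIVIDE",
    "BINARY_TRUE_DIVIDE", "BINARY_MODULO", "BINARY_ADD",
    "BINARY_SUBTRACT", "BINARY_LSHIFT", "BINARY_RSHIFT",
    "BINARY_AND", "BINARY_XOR", "BINARY_OR",
])

def _check(expr, allowed):
    """Compile expr in eval mode; raise ValueError on any opcode not in allowed."""
    try:
        code = compile(expr, "<expr>", "eval")
    except (SyntaxError, ValueError) as exc:
        raise ValueError("%s is not a valid expression" % (expr,)) from exc
    for ins in dis.get_instructions(code):
        if ins.opname not in allowed:
            raise ValueError("opcode %s not allowed" % ins.opname)
    return code

def const_eval(expr):
    return eval(_check(expr, _CONST_OPS), {"__builtins__": {}}, {})

def expr_eval(expr):
    return eval(_check(expr, _EXPR_OPS), {"__builtins__": {}}, {})
```

Because CPython 3 folds constant arithmetic at compile time, const_eval("1+2") may return 3 here instead of raising ValueError as it did under 2.3: the check only sees the bytecode that is actually emitted.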
The same sort of method could be used to analyze and allow more complex code too. We could allow the LOAD_NAME opcode and then be very restrictive about the names allowed in the code by examining the code object's co_names field. This way, we could for example extend the expr_eval function to allow mathematical functions like sin and cos.
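A Python 3 sketch of that extension (3.8+ assumed): LOAD_NAME and the call opcodes are allowed, and every name in co_names must appear in a hypothetical allowlist, _NAMES, that maps sin, cos and pi to the math module. The call-related opcode names differ between releases (CALL_FUNCTION before 3.11, PUSH_NULL/PRECALL/CALL afterwards), so the opcode list is an assumption to verify with dis.

```python
import dis
import math

# Hypothetical allowlist of names an expression may reference.
_NAMES = {"sin": math.sin, "cos": math.cos, "pi": math.pi}

# Assumed opcode names for constants, simple arithmetic, name lookups
# and plain calls across CPython 3.8+ releases.
_OPS = frozenset([
    "RESUME", "NOP", "LOAD_CONST", "LOAD_SMALL_INT",
    "RETURN_VALUE", "RETURN_CONST", "UNARY_NEGATIVE",
    "BINARY_OP", "BINARY_ADD", "BINARY_SUBTRACT",
    "BINARY_MULTIPLY", "BINARY_TRUE_DIVIDE", "BINARY_POWER",
    "LOAD_NAME",                                   # names checked via co_names below
    "PUSH_NULL", "PRECALL", "CALL", "CALL_FUNCTION",  # call sequences by version
])

def math_eval(expr):
    try:
        code = compile(expr, "<expr>", "eval")
    except (SyntaxError, ValueError) as exc:
        raise ValueError("%s is not a valid expression" % (expr,)) from exc
    for ins in dis.get_instructions(code):
        if ins.opname not in _OPS:
            raise ValueError("opcode %s not allowed" % ins.opname)
    # co_names holds every global and attribute name the code refers to.
    bad = set(code.co_names) - set(_NAMES)
    if bad:
        raise ValueError("names not allowed: %s" % ", ".join(sorted(bad)))
    return eval(code, {"__builtins__": {}}, dict(_NAMES))
```

Because co_names also contains attribute names, the co_names check would reject something like sin.__globals__ even if LOAD_ATTR were added to the opcode list.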
The biggest caveat is that bytecode is a CPython implementation detail: opcode names, numbers and instruction encoding change between Python versions, so this approach is not portable. The examples above have been tested with Python 2.3.
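To make the portability caveat concrete, this Python 3 snippet checks which of the recipe's opcode names still exist in dis.opmap on the running interpreter. BINARY_DIVIDE, for instance, was removed in Python 3.0, and the recipe's module would not even import unchanged under Python 3, because map() returns an iterator there and `_const_codes + map(...)` fails. Even where a name survives, its number may differ, and since CPython 3.6 every instruction is 2 bytes wide, so the `i += 3` stepping in _get_opcodes no longer applies.

```python
import dis

# Opcode names used by the Python 2.3 recipe.
ORIGINAL_NAMES = [
    'POP_TOP', 'ROT_TWO', 'ROT_THREE', 'ROT_FOUR', 'DUP_TOP',
    'BUILD_LIST', 'BUILD_MAP', 'BUILD_TUPLE',
    'LOAD_CONST', 'RETURN_VALUE', 'STORE_SUBSCR',
    'UNARY_POSITIVE', 'UNARY_NEGATIVE', 'UNARY_NOT', 'UNARY_INVERT',
    'BINARY_POWER', 'BINARY_MULTIPLY', 'BINARY_DIVIDE',
    'BINARY_FLOOR_DIVIDE', 'BINARY_TRUE_DIVIDE', 'BINARY_MODULO',
    'BINARY_ADD', 'BINARY_SUBTRACT', 'BINARY_LSHIFT', 'BINARY_RSHIFT',
    'BINARY_AND', 'BINARY_XOR', 'BINARY_OR',
]

# Names the running interpreter no longer defines.
missing = [name for name in ORIGINAL_NAMES if name not in dis.opmap]
print("not defined in this interpreter:", missing)
```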
Also note that a malicious user can still, for example, make the expression consume vast amounts of memory by entering something like '100**100**100**100**100**100...'. There is no way to really prevent this from within Python without making the expression limitations too restrictive.