The createFunction(sourceCode) function below returns a Python function that executes the given sourceCode (a string containing Python code). Because the result is a real Python function, calling it costs no more than calling any other Python function. Its environment is controlled: by default only harmless built-ins are available (e.g. map, reduce, filter, list), while import statements, open, eval, etc. are unavailable. The caller can extend this environment.
# The list of symbols that are included by default in the generated
# function's environment
SAFE_SYMBOLS = ["list", "dict", "tuple", "set", "long", "float", "object",
                "bool", "callable", "True", "False", "dir",
                "frozenset", "getattr", "hasattr", "abs", "cmp", "complex",
                "divmod", "id", "pow", "round", "slice", "vars",
                "hash", "hex", "int", "isinstance", "issubclass", "len",
                "map", "filter", "max", "min", "oct", "chr", "ord", "range",
                "reduce", "repr", "str", "type", "zip", "xrange", "None",
                "Exception", "KeyboardInterrupt"]
# Also add the standard exceptions
__bi = __builtins__
if type(__bi) is not dict:
    __bi = __bi.__dict__
for k in __bi:
    if k.endswith("Error") or k.endswith("Warning"):
        SAFE_SYMBOLS.append(k)
del __bi
def createFunction(sourceCode, args="", additional_symbols=dict()):
    r"""
    Create a python function from the given source code

    \param sourceCode A python string containing the body of the
    function. May include return statements (or not), definitions of
    local functions, classes, etc. Indentation matters!

    \param args The string representing the arguments to put in the function's
    prototype, such as "a, b", or "a=12, b",
    or "a=12, b=dict(akey=42, another=5)"

    \param additional_symbols A dictionary variable name =>
    variable/function/object to include in the generated function's
    environment (its globals)

    The sourceCode will be executed in a restricted environment,
    containing only the python builtins that are harmless (such as map,
    hasattr, etc.). To allow the function to access other modules or
    functions or objects, use the additional_symbols parameter. For
    example, to allow the source code to access the re and sys modules,
    as well as a global function F named afunction in the sourceCode and
    an object OoO named ooo in the sourceCode, specify:
        additional_symbols = dict(re=re, sys=sys, afunction=F, ooo=OoO)

    \return A python function implementing the source code. It can be
    recursive: the (internal) name of the function being defined is
    __TheFunction__. Its docstring is the initial sourceCode string.

    Tests show that the resulting function has no measurable call-time
    overhead (-3% to +3%, probably due to random system preemption)
    compared to normal python function calls.
    """
    # Include the sourcecode as the code of a function __TheFunction__:
    s = "def __TheFunction__(%s):\n" % args
    s += "\t" + "\n\t".join(sourceCode.split('\n')) + "\n"

    # Byte-compilation (optional)
    byteCode = compile(s, "<string>", 'exec')

    # Setup the local and global dictionaries of the execution
    # environment for __TheFunction__
    bis = dict() # builtins
    globs = dict()
    locs = dict()

    # Setup a standard-compatible python environment
    bis["locals"] = lambda: locs
    bis["globals"] = lambda: globs
    globs["__builtins__"] = bis
    globs["__name__"] = "SUBENV"
    globs["__doc__"] = sourceCode

    # Determine how the __builtins__ dictionary should be accessed
    if type(__builtins__) is dict:
        bi_dict = __builtins__
    else:
        bi_dict = __builtins__.__dict__

    # Include the safe symbols
    for k in SAFE_SYMBOLS:
        # try from current locals
        try:
            locs[k] = locals()[k]
            continue
        except KeyError:
            pass
        # Try from globals
        try:
            globs[k] = globals()[k]
            continue
        except KeyError:
            pass
        # Try from builtins
        try:
            bis[k] = bi_dict[k]
        except KeyError:
            # Symbol not available anywhere: silently ignored
            pass

    # Include the symbols added by the caller, in the globals dictionary
    globs.update(additional_symbols)

    # Finally execute the def __TheFunction__ statement:
    eval(byteCode, globs, locs)
    # As a result, the function is defined as the item __TheFunction__
    # in the locals dictionary
    fct = locs["__TheFunction__"]
    # Attach the function to the globals so that it can be recursive
    del locs["__TheFunction__"]
    globs["__TheFunction__"] = fct
    # Attach the actual source code to the docstring
    fct.__doc__ = sourceCode
    return fct
##################################################################
### Some tests

def test():
    # -----------------------------------------------------
    # Code to execute as function 'f' (as a string):
    s = """
if a == "BE RECURSIVE":
    print "In the recursion 1"
    return __TheFunction__("THE END", 54)
elif a == "THE END":
    print "In the recursion 2"
    return 54
print a
print b
x = True
def sayhello(s):
    print "I say hello that way: %s" % s
class SayHello(object):
    def __init__(self, s):
        self.__s = s
        print "ctor says %s" % self.__s
    def s(self):
        return self.__s
try:
    1/0
except ZeroDivisionError, ex:
    print "GOT EX", ex
print "ooo in here says", ooo.mouf()
result = a + b +1
afunction(a+1)
c = re.compile("^a").search("ba", 1)
d = re.compile("a").match("ba", 1)
sayhello("I am so happy today %s,%s" % (c, d))
o = SayHello("this works")
vvv = range(42)
print vvv
sys.stderr.write("writing to stderr\\n")
print __TheFunction__
print "============ BEGIN docstring ==========="
print __TheFunction__.__doc__
print "============ END docstring ==========="
return a*b + __TheFunction__("BE RECURSIVE", 33)
"""
    # End of source code string
    # -----------------------------------------------------

    # Create objects, functions, etc.
    class OOO:
        def __init__(self, id):
            self.__id = id
        def mouf(self):
            return "OOO: My ID is %s" % self.__id
    def F(n):
        print "F: my parameter is", n

    # Generate a first function, f, which needs the re and sys modules
    import sys, re
    OoO = OOO(64)
    f = createFunction(s, "a=3, b=4",
                       additional_symbols = dict(re=re, sys=sys,
                                                 afunction=F, ooo=OoO))

    # Generate another function
    OoO = OOO("FOR G")
    g = createFunction("print 'G: my parameter is ', p, 'and o says', o.mouf()\nreturn p",
                       "p='undefined'", dict(o=OoO))

    # Test them
    print "call f():", f()
    print "call g():", g()
    print "call f(42):", f(42)
    print "call g(22):", g(22)
    print "call f(b=42):", f(b=42)
    print "call f(b=7, a=8):", f(b=7, a=8)
    print "call f(9, 10):", f(9, 10)

    # Do some basic profiling
    def nothing(a):
        return a*42
    import time
    ITER=10000000
    time.sleep(1.) # force preemption
    st = time.time()
    for i in xrange(ITER):
        x = nothing(i)
    et = time.time()
    print "Basic: %fs" % (et-st)
    t = et-st

    time.sleep(1.) # force preemption
    f = createFunction("return a*42", "a")
    st = time.time()
    for i in xrange(ITER):
        x = f(i)
    et = time.time()
    print "FromString: %fs" % (et-st)
    print "FromString time = Basic%+f%%" % ((((et-st) - t) / t)*100.)
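The listing above is Python 2 (print statements, `except E, ex`, `xrange`; `long`, `cmp` and `xrange` no longer exist in Python 3, and `reduce` moved to `functools`). As a rough guide for porting, here is a minimal Python 3 sketch of the same technique. The names `create_function` and the shortened `SAFE_SYMBOLS` list are assumptions of this sketch, not part of the recipe: it wraps the body in `def __TheFunction__(...)`, executes the definition with a dictionary of safe builtins as `__builtins__`, and stores the function in its own globals so that it can recurse.

```python
import builtins

# Hypothetical Python 3 counterpart of SAFE_SYMBOLS (shorter than the
# recipe's list; long, cmp and xrange are gone, reduce is in functools).
SAFE_SYMBOLS = ["list", "dict", "tuple", "set", "frozenset", "float", "int",
                "bool", "str", "repr", "len", "range", "map", "filter", "zip",
                "min", "max", "abs", "sorted", "isinstance", "Exception"]
# Also add the standard exceptions and warnings, as the recipe does
SAFE_SYMBOLS += [k for k in vars(builtins)
                 if k.endswith("Error") or k.endswith("Warning")]


def create_function(source_code, args="", additional_symbols=None):
    """Wrap source_code in 'def __TheFunction__(args):' and execute the
    definition with a dictionary of safe builtins as __builtins__."""
    src = "def __TheFunction__(%s):\n" % args
    src += "\t" + "\n\t".join(source_code.split("\n")) + "\n"
    byte_code = compile(src, "<string>", "exec")
    # Only the names listed in SAFE_SYMBOLS are visible as builtins
    bis = {k: getattr(builtins, k) for k in SAFE_SYMBOLS if hasattr(builtins, k)}
    globs = {"__builtins__": bis, "__name__": "SUBENV",
             "__doc__": source_code}
    # Caller-supplied modules, functions and objects go into the globals
    globs.update(additional_symbols or {})
    locs = {}
    exec(byte_code, globs, locs)
    fct = locs.pop("__TheFunction__")
    # Store the function in its own globals so that it can call itself
    globs["__TheFunction__"] = fct
    fct.__doc__ = source_code
    return fct
```

For example, `create_function("return x * 2", "x")(21)` returns 42, while a body containing `import os` raises `ImportError` when called, because `__import__` is not among the builtins it can see.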
We developed a tool to watch log messages, and we wanted to let the average user define their own filters so that unwanted messages could be hidden. Regular expressions are probably the most efficient solution, but they can be tricky for the average user (for example: allow "x.y.z" only when x is "toto" and y is anything other than z). So we decided to provide a scripting mechanism inside the tool: the user writes the body of a function whose return statements return True or False to tell whether the log message should be shown or hidden.
So, basically, let's say we want to define a filter that takes two arguments, "host" and "event" (both set from the log message). We let the user write the body of:
def __TheFunction__(host, event):
    CODE_HERE
Then CODE_HERE can access the variables "host" and "event" as usual, as well as a controlled set of global variables/functions (see the list SAFE_SYMBOLS).
We create the function with:
filter = createFunction(CODE_HERE_as_a_string, "host, event")
That's it: filter is now a function equivalent to __TheFunction__ above.
Refer to test() and the comments in the code to see how to extend the function's default environment.
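To make the log-filter workflow concrete, here is a small Python 3 sketch. `make_filter` is a hypothetical, compact stand-in for createFunction (it builds the function the same way, with a deliberately tiny set of builtins), and the filter rule itself is an invented example of what a user might type. The `re` module is handed over through the extra-symbols dictionary, which is how additional_symbols extends the default environment. The result is named `log_filter` rather than `filter` to avoid shadowing the builtin.

```python
import builtins
import re


def make_filter(body, args, extra=None):
    """Hypothetical, compact Python 3 stand-in for createFunction."""
    src = "def __TheFunction__(%s):\n" % args
    src += "\t" + "\n\t".join(body.split("\n")) + "\n"
    # A deliberately tiny set of safe builtins for this example
    safe = {name: getattr(builtins, name) for name in ("len", "str")}
    globs = {"__builtins__": safe, "__name__": "SUBENV"}
    globs.update(extra or {})  # e.g. modules the user code may use
    locs = {}
    exec(compile(src, "<filter>", "exec"), globs, locs)
    return locs["__TheFunction__"]


# What a user might type into the tool (invented rule): show events named
# "toto.<y>.<z>" with y != z, coming from hosts named "prod-<something>".
USER_CODE = r'''parts = event.split(".")
if len(parts) != 3 or parts[0] != "toto":
    return False
if parts[1] == parts[2]:
    return False
return re.match(r"prod-\w+", host) is not None'''

log_filter = make_filter(USER_CODE, "host, event", {"re": re})
```

With this rule, `log_filter("prod-web1", "toto.a.b")` returns True (show the message) and `log_filter("dev-web1", "toto.a.b")` returns False (hide it).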
Beware of restricted Python environments. Python was not designed for restricted execution, and there are likely ways around it. For most use cases this code is good enough, but don't expect it to be secure against determined attackers.
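One illustration of why this is not a security boundary (a Python 3 sketch, not code from the recipe or its comments): any host function passed in through additional_symbols, like `afunction=F` in test(), carries a reference to its defining module's globals, and those globals hold the real builtins, including `__import__`. `run_restricted` and `host_helper` below are hypothetical names for this example.

```python
def run_restricted(body, extra):
    """Hypothetical minimal sandbox in the style of createFunction."""
    src = "def __TheFunction__():\n\t" + "\n\t".join(body.split("\n")) + "\n"
    # Only isinstance and dict are offered as builtins: no __import__
    globs = {"__builtins__": {"isinstance": isinstance, "dict": dict},
             "__name__": "SUBENV"}
    globs.update(extra)
    locs = {}
    exec(compile(src, "<string>", "exec"), globs, locs)
    return locs["__TheFunction__"]()


def host_helper(n):
    # An innocent-looking function that the host chooses to expose
    return n + 1


# afunction.__globals__ is the host module's globals; its "__builtins__"
# entry is either the builtins module or its dict, depending on context.
ESCAPE = '''b = afunction.__globals__["__builtins__"]
imp = b["__import__"] if isinstance(b, dict) else b.__import__
return imp("os")'''

leaked_os = run_restricted(ESCAPE, {"afunction": host_helper})
```

The sandboxed body ends up holding the real `os` module even though `import` itself is unavailable to it, so every object handed over through additional_symbols should be treated as part of the attack surface.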
Here's one I came up with through experimentation, though it depends on preconditions that not all programs meet. It assumes that "configobj" is installed, available as a zipped egg and already imported, and that the user knows the path to the egg.
It works because zipimporter doesn't check the interpreter's internal restricted-execution flag (PyEval_GetRestricted()). I can use it to load a zipped egg file, or for that matter any zip file. I chose "configobj" here because it's pretty popular and because it imports os, so I can get access to os.system. I could also have chosen elementtree.ElementTree, and no doubt others. The key thing is that the module must have been imported earlier so that it's available in the zipimport cache.
There's another subtle problem in this code. The Python specification says that support for "\f" (form feed) is implementation-specific. The C implementation treats it as a newline. Thus,
gives the output "I live!", when nearly everyone would expect that it doesn't do anything.
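A Python 3 sketch of the kind of effect described above (my own example, not the commenter's): in CPython, a form feed in a line's leading whitespace resets the indentation count to zero, which the language reference allows as an undefined, implementation-specific effect. Since createFunction only prefixes each line with "\t", a line starting with "\f" is parsed at module level, outside the function body, and runs as soon as the `def` is executed. `wrap_and_define` and `escaped` are hypothetical names; a list append stands in for the print.

```python
def wrap_and_define(source_code, globs):
    """Hypothetical helper reproducing createFunction's wrapping step."""
    src = "def __TheFunction__():\n"
    src += "\t" + "\n\t".join(source_code.split("\n")) + "\n"
    locs = {}
    exec(compile(src, "<string>", "exec"), globs, locs)
    return locs["__TheFunction__"]


escaped = []
# The second line starts with a form feed. After the "\t" prefix is added,
# CPython's tokenizer resets the indentation column to 0 at the "\f", so this
# line is parsed at module level instead of inside __TheFunction__.
user_code = "return 42\n\fescaped.append('I live!')"
fn = wrap_and_define(user_code, {"__builtins__": {}, "escaped": escaped})
# At this point the append has already run, although fn was never called.
```

Calling `fn()` afterwards simply returns 42; the escaped line executed once, at definition time.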
Indeed, thank you Andrew: this function is not a bullet-proof defense against attackers. So here is a disclaimer: don't use this technique on a system that is wide open to the public, where people could compromise other people's use of the service. The technique above is convenient for programs that need easy, user-friendly, on-the-fly configuration, with users who know that if they want to bring the program down, they can, and that they are the main (only) victims. The word "restricted" means that the intent is to limit the impact of the mistakes users can make when writing their code. But users who deliberately want to shoot themselves in the foot can do so, though it's somewhat tricky, as you showed.