
The createFunction(sourceCode) function below returns a Python function that executes the given sourceCode (a string containing Python code). Because the result is a real Python function, calling it incurs no overhead compared to a normal Python function. Its environment is controlled: by default only safe operations are permitted (e.g. map, reduce, filter, list), while others such as import, open, close and eval are forbidden. This environment can be extended.

Python, 223 lines
# The list of symbols that are included by default in the generated
# function's environment
SAFE_SYMBOLS = ["list", "dict", "tuple", "set", "long", "float", "object",
                "bool", "callable", "True", "False", "dir",
                "frozenset", "getattr", "hasattr", "abs", "cmp", "complex",
                "divmod", "id", "pow", "round", "slice", "vars",
                "hash", "hex", "int", "isinstance", "issubclass", "len",
                "map", "filter", "max", "min", "oct", "chr", "ord", "range",
                "reduce", "repr", "str", "type", "zip", "xrange", "None",
                "Exception", "KeyboardInterrupt"]
# Also add the standard exceptions
__bi = __builtins__
if type(__bi) is not dict:
    __bi = __bi.__dict__
for k in __bi:
    if k.endswith("Error") or k.endswith("Warning"):
        SAFE_SYMBOLS.append(k)
del __bi


def createFunction(sourceCode, args="", additional_symbols=dict()):
  """
  Create a python function from the given source code
  
  \param sourceCode A python string containing the core of the
  function. Might include the return statement (or not), definition of
  local functions, classes, etc. Indentation matters !
  
  \param args The string representing the arguments to put in the function's
  prototype, such as "a, b", or "a=12, b",
  or "a=12, b=dict(akey=42, another=5)"

  \param additional_symbols A dictionary variable name =>
  variable/function/object to include in the generated function's
  closure

  The sourceCode will be executed in a restricted environment,
  containing only the python builtins that are harmless (such as map,
  hasattr, etc.). To allow the function to access other modules or
  functions or objects, use the additional_symbols parameter. For
  example, to allow the source code to access the re and sys modules,
  as well as a global function F named afunction in the sourceCode and
  an object OoO named ooo in the sourceCode, specify:
      additional_symbols = dict(re=re, sys=sys, afunction=F, ooo=OoO)

  \return A python function implementing the source code. It can be
  recursive: the (internal) name of the function being defined is:
  __TheFunction__. Its docstring is the initial sourceCode string.

  Tests show that the resulting function does not have any calling
  time overhead (-3% to +3%, probably due to system preemption noise)
  compared to normal python function calls.
  """
  # Include the sourcecode as the code of a function __TheFunction__:
  s = "def __TheFunction__(%s):\n" % args
  s += "\t" + "\n\t".join(sourceCode.split('\n')) + "\n"

  # Byte-compilation (optional)
  byteCode = compile(s, "<string>", 'exec')  

  # Setup the local and global dictionaries of the execution
  # environment for __TheFunction__
  bis   = dict() # builtins
  globs = dict()
  locs  = dict()

  # Setup a standard-compatible python environment
  bis["locals"]  = lambda: locs
  bis["globals"] = lambda: globs
  globs["__builtins__"] = bis
  globs["__name__"] = "SUBENV"
  globs["__doc__"] = sourceCode

  # Determine how the __builtins__ dictionary should be accessed
  if type(__builtins__) is dict:
    bi_dict = __builtins__
  else:
    bi_dict = __builtins__.__dict__

  # Include the safe symbols
  for k in SAFE_SYMBOLS:
    # try from current locals
    try:
      locs[k] = locals()[k]
      continue
    except KeyError:
      pass
    # Try from globals
    try:
      globs[k] = globals()[k]
      continue
    except KeyError:
      pass
    # Try from builtins
    try:
      bis[k] = bi_dict[k]
    except KeyError:
      # Symbol not available anywhere: silently ignored
      pass

  # Include the symbols added by the caller, in the globals dictionary
  globs.update(additional_symbols)

  # Finally execute the def __TheFunction__ statement:
  eval(byteCode, globs, locs)
  # As a result, the function is defined as the item __TheFunction__
  # in the locals dictionary
  fct = locs["__TheFunction__"]
  # Attach the function to the globals so that it can be recursive
  del locs["__TheFunction__"]
  globs["__TheFunction__"] = fct
  # Attach the actual source code to the docstring
  fct.__doc__ = sourceCode
  return fct


##################################################################
### Some tests
def test():
    # -----------------------------------------------------
    # Code to execute as function 'f' (as a string):
    s = """
if a == "BE RECURSIVE":
    print "In the recursion 1"
    return __TheFunction__("THE END", 54)
elif a == "THE END":
    print "In the recursion 2"
    return 54

print a
print b
x = True

def sayhello(s):
    print "I say hello that way: %s" % s

class SayHello(object):
    def __init__(self, s):
        self.__s = s
        print "ctor says %s" % self.__s

    def s(self):
        return self.__s
try:
    1/0
except ZeroDivisionError, ex:
    print "GOT EX", ex

print "ooo in here says", ooo.mouf()

result = a + b +1
afunction(a+1)
c = re.compile("^a").search("ba", 1)
d = re.compile("a").match("ba", 1)
sayhello("I am so happy today %s,%s" % (c, d))
o = SayHello("this works")
vvv = range(42)
print vvv
sys.stderr.write("writing to stderr\\n")

print __TheFunction__
print "============ BEGIN docstring ==========="
print __TheFunction__.__doc__
print "============ END docstring ==========="
return a*b + __TheFunction__("BE RECURSIVE", 33)
"""
    # End of source code string
    # -----------------------------------------------------

    # Create objects, functions, etc.
    class OOO:
        def __init__(self, id):
            self.__id = id
        def mouf(self):
            return "OOO: My ID is %s" % self.__id
    def F(n):
        print "F: my parameter is", n

    # Generate a first function, f, which needs the re and sys modules
    import sys, re
    OoO = OOO(64)
    f = createFunction(s, "a=3, b=4",
                       additional_symbols = dict(re=re, sys=sys,
                                                 afunction=F, ooo=OoO))
    # Generate another function
    OoO = OOO("FOR G")
    g = createFunction("print 'G: my parameter is ', p, 'and o says', o.mouf()\nreturn p",
                    "p='undefined'", dict(o=OoO))

    # Test them
    print "call f():", f()
    print "call g():", g()
    print "call f(42):", f(42)
    print "call g(22):", g(22)
    print "call f(b=42):", f(b=42)
    print "call f(b=7, a=8):", f(b=7, a=8)
    print "call f(9, 10):", f(9, 10)

    # Do some basic profiling
    def nothing(a):
        return a*42

    import time
    ITER=10000000

    time.sleep(1.) # force preemption
    
    st = time.time()
    for i in xrange(ITER):
        x = nothing(i)
    et = time.time()
    print "Basic: %fs" % (et-st)
    t = et-st

    time.sleep(1.) # force preemption

    f = createFunction("return a*42", "a")
    st = time.time()
    for i in xrange(ITER):
        x = f(i)
    et = time.time()
    print "FromString: %fs" % (et-st)
    print "FromString time = Basic%+f%%" % ((((et-st) - t) / t)*100.)
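
The timing comparison at the end of test() can be approximated with the standard timeit module. The following is a minimal sketch for Python 3, not the recipe's own code: it builds a function from a string with a bare exec (the step createFunction performs internally) and times it against an ordinary def. The names plain, from_string and compare are illustrative assumptions.

```python
import timeit


def plain(a):
    return a * 42


# Build the same function from a string; exec runs the "def" statement
# and leaves the resulting function object in the namespace dictionary.
namespace = {}
exec("def from_string(a):\n    return a * 42\n", namespace)
from_string = namespace["from_string"]


def compare(number=200000, repeat=5):
    """Return the best-of-`repeat` timings (in seconds) for both functions."""
    t_plain = min(timeit.repeat(lambda: plain(7),
                                number=number, repeat=repeat))
    t_string = min(timeit.repeat(lambda: from_string(7),
                                 number=number, repeat=repeat))
    return t_plain, t_string


if __name__ == "__main__":
    t_plain, t_string = compare()
    print("plain: %.4fs  from string: %.4fs" % (t_plain, t_string))
```

Both objects are ordinary function objects, which is why no systematic difference is expected; taking the minimum over several repeats reduces the effect of scheduling noise.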

We developed a tool to watch log messages, and we wanted to let the average user define custom filters so that unwanted messages could be hidden. Regular expressions are probably the most efficient solution, but they can be tricky for the average user (for example: allow "x.y.z" only when x is "toto" and y is anything but z). So we decided to provide a scripting mechanism inside the tool: the user writes the body of a function whose "return" statements return True or False to tell whether the log message should be shown or hidden.

So, basically, let's say we want to define a filter that takes two arguments, "host" and "event" (each set from the log message). We let the user write the equivalent of CODE_HERE in:

def __TheFunction__(host, event):
   CODE_HERE

Then CODE_HERE can access the variables "host" and "event" as usual, as well as a controlled set of global variables/functions (see the list SAFE_SYMBOLS).

We create the function with:

filter = createFunction(CODE_HERE_as_a_string, "host, event")

And that's it, we have the function filter equivalent to __TheFunction__ above.

Refer to the test() and the comments to see how to extend the default environment of the function.
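
The log-filter use case described above can be sketched end to end. The block below is a condensed Python 3 approximation, not the recipe's createFunction: create_function, SAFE_NAMES, USER_CODE, the extra_symbols parameter and the sample host and event values are all assumptions made for illustration. It shows a user-written body receiving "host" and "event", plus one extra symbol (a compiled regular expression) injected by the caller.

```python
import builtins
import re

# Hypothetical whitelist, much shorter than the recipe's SAFE_SYMBOLS.
SAFE_NAMES = ["len", "str", "int", "bool", "isinstance", "min", "max"]


def create_function(source, args="", extra_symbols=None):
    """Condensed Python 3 approximation of the recipe's createFunction."""
    # Indent every user line so that it becomes the function body.
    body = "\n".join("    " + line for line in source.split("\n"))
    code = compile("def __TheFunction__(%s):\n%s\n" % (args, body),
                   "<string>", "exec")
    # Only whitelisted builtins are visible to the generated function.
    env = {"__builtins__": {n: getattr(builtins, n) for n in SAFE_NAMES}}
    env.update(extra_symbols or {})
    exec(code, env)
    return env["__TheFunction__"]


# What a user might type: show events from "toto.*" hosts, except heartbeats.
USER_CODE = '''
if not host.startswith("toto."):
    return False
return pattern.search(event) is None
'''

log_filter = create_function(
    USER_CODE, "host, event",
    extra_symbols={"pattern": re.compile(r"^heartbeat")})

if __name__ == "__main__":
    print(log_filter("toto.db", "login ok"))
    print(log_filter("toto.db", "heartbeat 12"))
```

Names outside the whitelist, such as open, raise NameError when the generated function is called, which catches honest mistakes but is not a security boundary.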

2 comments

Andrew Dalke 16 years, 2 months ago

Beware of restricted Python environments. Python was not designed for restricted execution, and there are likely ways around any restriction. For most use cases this code is good enough, but don't expect it to be secure against determined attackers.

Here's one I came up with through experimentation, though it depends on a precondition that not all programs meet. It assumes that "configobj" is installed, available as a zipped egg, already imported, and that the user knows the path to the egg.

import configobj

def test():
    s = """

all_types = ().__class__.__bases__[0].__subclasses__()
zipimport = [x for x in all_types if x.__name__ == "zipimporter"][0]
egg = "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/configobj-4.4.0-py2.5.egg"
x = zipimport(egg)
mod = x.load_module("configobj")

print "system", mod.os.system("ls")
"""
    f = createFunction(s)
    print "call f():", f()

test()

It works because zipimporter doesn't check the internal "PyEval_GetRestricted" flag. I can use it to load a zipped egg file, or for that matter any zip file. I chose "configobj" here because it's pretty popular and because it imports os, so I can get access to os.system. I could also have chosen elementtree.ElementTree and no doubt others. The key thing is that it must have been imported earlier so that it's available in the zipimport cache.
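
The first step of that exploit, walking from a tuple literal to every loaded class, does not depend on configobj at all. Here is a minimal Python 3 sketch of just that step, under the assumption that the snippet runs with a completely empty builtins mapping; SNIPPET, env and reachable are illustrative names.

```python
# Run a snippet with an empty __builtins__ mapping, as a restricted
# environment would, and record which classes it can still reach.
SNIPPET = (
    "found = [c.__name__ "
    "for c in ().__class__.__bases__[0].__subclasses__()]"
)

env = {"__builtins__": {}}
exec(SNIPPET, env)
reachable = env["found"]

if __name__ == "__main__":
    print(len(reachable), "classes reachable without any builtins")
```

Attribute access on a literal needs no builtin names, so removing names from the environment does not hide the classes themselves.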

There's another subtle problem in this code. The Python specification says that support for "\f" is implementation specific. The C implementation treats it as a newline. Thus,

def test():
    s = """pass
\fprint "I live!"
"""
    f = createFunction(s)

    #print "call f():", f()

test()

prints "I live!" even though f() is never called, when nearly everyone would expect it to do nothing.
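
One possible guard against code escaping the generated function body, whatever the tokenizer does with "\f", is to parse the wrapped source and check that it contains exactly one top-level statement: the def itself. This is a Python 3 sketch of that idea and an assumption of mine, not part of the recipe; wraps_single_function and the two sample strings are hypothetical.

```python
import ast


def wraps_single_function(module_source, name="__TheFunction__"):
    """Return True if module_source is exactly one def statement of `name`."""
    tree = ast.parse(module_source)
    return (len(tree.body) == 1
            and isinstance(tree.body[0], ast.FunctionDef)
            and tree.body[0].name == name)


# The wrapper text a createFunction-style helper would build.
good = "def __TheFunction__():\n    pass\n"
# The same wrapper with a statement that ended up at module level.
bad = "def __TheFunction__():\n    pass\nprint('I live!')\n"

if __name__ == "__main__":
    print(wraps_single_function(good), wraps_single_function(bad))
```

A helper could run this check on the assembled string before compiling it and refuse the input when it fails. It only addresses statements leaking out of the body and does nothing about the class-walking trick shown earlier.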

david decotigny (author) 16 years, 2 months ago

Thank you, Andrew. Indeed, this function is not a bullet-proof device against attackers. So here is a disclaimer: don't use this technique on a system that is wide open to the public, where people could compromise other people's use of the service. The technique above is convenient for programs that need some kind of easy, user-friendly, on-the-fly configuration, with users aware that they can bring the program down if they want to, and that they are the main (only) victims if they do. The word "restricted" here means that the intent is to limit the impact of the mistakes users can make when writing their code. But if users deliberately want to shoot themselves in the foot, they can, though it's somewhat tricky, as you showed.