ActiveState Code

Recipe 576409: Clean preceding and trailing whitespace in complex list, dictionary, and tuple structures


A function that strips leading and trailing whitespace from strings in nested list, dictionary, and tuple structures. It is recursive, so it reaches every item in the structure. I needed this, searched for a while without finding one, and ended up writing it myself, so I wanted to share it.

For example:

a = ["\rText \r\n", "This one is fine", ["stuff ", [" Something Else"], 4, "Another ", "one", " with"], "\twhitespace\r\n"]
print cleanWhiteSpace(a)

Result: ["Text", "This one is fine", ["stuff", ["Something Else"], 4, "Another", "one", "with"], "whitespace"]

Python
import types

def cleanWhiteSpace(obj):
	objType = type(obj)
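	if(objType is types.StringType):		# String (str only; unicode objects fall through unchanged)
		# Clean regular string
		return obj.lstrip().rstrip()
	elif((objType is types.ListType) or (objType is types.TupleType)):		# List or Tuple
		out = []	# Note: the result is always a list, even when obj is a tuple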
		for ele in obj:		# Iterate the elements
			out.append(cleanWhiteSpace(ele))	# Recurse into this function for the element
		return out
	elif(objType is types.DictType):		# Dictionary 
		out = {}
		for ele in obj:		# Iterate the elements
			out[ele] = cleanWhiteSpace(obj[ele])	# Recurse into this function for the element
		return out
	else:
		# Not a string, list, tuple, or dictionary: return it unchanged
		return obj
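
The recipe above targets Python 2; `types.StringType`, `types.ListType` and friends are not available in Python 3. What follows is a minimal sketch of how the same idea might look there, not the author's code. The name `clean_white_space_py3` is hypothetical, and `isinstance` is used in place of type identity, so subclasses of `str`, `list`, `tuple` and `dict` are handled too. Like the recipe, it returns tuples as lists and leaves dictionary keys alone.

```python
def clean_white_space_py3(obj):
    """Hypothetical Python 3 port of cleanWhiteSpace (a sketch, not the recipe itself)."""
    if isinstance(obj, str):
        # str.strip() with no arguments removes leading and trailing whitespace
        return obj.strip()
    if isinstance(obj, (list, tuple)):
        # Like the recipe, this returns a list even when given a tuple
        return [clean_white_space_py3(ele) for ele in obj]
    if isinstance(obj, dict):
        # Only values are cleaned; keys are left untouched, as in the recipe
        return {key: clean_white_space_py3(value) for key, value in obj.items()}
    # Anything else (numbers, None, ...) is returned unchanged
    return obj


a = ["\rText \r\n", "This one is fine",
     ["stuff ", [" Something Else"], 4, "Another ", "one", " with"],
     "\twhitespace\r\n"]
result = clean_white_space_py3(a)
print(result)
```

Run against the sample list from the description, this should give the same cleaned structure as the recipe's stated result.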

Comments

  1. At 6:39 p.m. on 6 Aug 2008, Edward Loper said:

    A few recommendations:

    • Use isinstance(...) rather than checking type identity
    • Use obj.strip() rather than obj.lstrip().rstrip()
    • It's standard in Python to use lowercase_with_underscores for functions, not camelCase. Same for variable names (obj_type not objType).
    • It would be a little cleaner to do the

    Putting all that together, I'd recommend the following rewrite:

    >>> def clean_whitespace(obj):
    ...     if isinstance(obj, basestring):
    ...         return obj.strip()
    ...     elif isinstance(obj, list):
    ...         return [clean_whitespace(o) for o in obj]
    ...     elif isinstance(obj, tuple):
    ...         return tuple(clean_whitespace(o) for o in obj)
    ...     elif isinstance(obj, dict):
    ...         return dict((k, clean_whitespace(v)) for (k,v) in obj.items())
    ...     else:
    ...         return obj
    

    (N.B. it does not recurse into dictionary keys -- this is to be consistent with your version. Obviously, it would be easy to change it to do so.)
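
A minimal sketch of that change to also clean dictionary keys, assuming Python 3 (so `str` stands in for `basestring`). The name `clean_whitespace_with_keys` is made up for illustration and does not come from the comment above. One consequence worth noting: keys that differ only in surrounding whitespace collapse into a single key after cleaning.

```python
def clean_whitespace_with_keys(obj):
    """Sketch: like clean_whitespace, but dictionary keys are cleaned as well."""
    if isinstance(obj, str):
        return obj.strip()
    elif isinstance(obj, list):
        return [clean_whitespace_with_keys(o) for o in obj]
    elif isinstance(obj, tuple):
        return tuple(clean_whitespace_with_keys(o) for o in obj)
    elif isinstance(obj, dict):
        # Keys are recursed into too. If two keys become equal after
        # stripping, the later item overwrites the earlier one.
        return {
            clean_whitespace_with_keys(k): clean_whitespace_with_keys(v)
            for k, v in obj.items()
        }
    else:
        return obj


cleaned = clean_whitespace_with_keys({" name ": [" a ", (" b",)]})
print(cleaned)
```

Whether the key collision behaviour is acceptable depends on the data; raising an error on a duplicate cleaned key would be an equally reasonable design.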

  2. At 5:34 p.m. on 7 Aug 2008, Garron Moore said:

    Taking the previous recommendation as an example, you could go a step further in abstracting the functionality to work with any object and operation.

    def recurse_into(obj, baseaction, basetype=basestring):
        if isinstance(obj, basetype):
            return baseaction(obj)
        elif isinstance(obj, list):
            return [recurse_into(o, baseaction, basetype) for o in obj]
        elif isinstance(obj, tuple):
            return tuple(recurse_into(o, baseaction, basetype) for o in obj)
        elif isinstance(obj, dict):
            return dict((k, recurse_into(v, baseaction, basetype)) for (k, v) in obj.items())
        else:
            return obj
    
    def generate_recurse(baseaction, basetype=basestring):
        def f(obj):
            return recurse_into(obj, baseaction, basetype)
        return f
    

    To use this to accomplish the original task of cleaning whitespace:

    import string
    clean_whitespace = generate_recurse(string.strip)
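
The code in this comment relies on `basestring` and `string.strip`, which exist only in Python 2. Below is a sketch of the same approach under the assumption of Python 3, where `str` replaces `basestring` and the unbound method `str.strip` replaces `string.strip`. The `double_ints` helper is hypothetical and only there to show the `basetype` parameter doing something other than string cleaning.

```python
def recurse_into(obj, baseaction, basetype=str):
    """Apply baseaction to every basetype instance nested in lists, tuples and dict values."""
    if isinstance(obj, basetype):
        return baseaction(obj)
    elif isinstance(obj, list):
        return [recurse_into(o, baseaction, basetype) for o in obj]
    elif isinstance(obj, tuple):
        return tuple(recurse_into(o, baseaction, basetype) for o in obj)
    elif isinstance(obj, dict):
        # Dictionary keys are left as they are; only values are visited
        return {k: recurse_into(v, baseaction, basetype) for k, v in obj.items()}
    else:
        return obj


def generate_recurse(baseaction, basetype=str):
    """Return a one-argument function that applies baseaction recursively."""
    def f(obj):
        return recurse_into(obj, baseaction, basetype)
    return f


# Python 3 equivalent of generate_recurse(string.strip)
clean_whitespace = generate_recurse(str.strip)

# Hypothetical second use: act on ints instead of strings.
# Note that bool is a subclass of int, so True/False would be doubled too.
double_ints = generate_recurse(lambda n: n * 2, basetype=int)

print(clean_whitespace([" a ", (" b", {"k": "c "})]))
print(double_ints([1, (2, {"k": 3}), "x"]))
```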
    
