
Smaller faster pickles! Eliminates unused PUT opcodes.

Python, 36 lines
from pickletools import genops

def optimize(p):
    'Optimize a pickle string by removing unused PUT opcodes'
    gets = set()            # set of args used by a GET opcode
    puts = []               # (arg, startpos, stoppos) for the PUT opcodes
    prevpos = None          # set to pos if previous opcode was a PUT
    for opcode, arg, pos in genops(p):
        if prevpos is not None:
            puts.append((prevarg, prevpos, pos))
            prevpos = None
        if 'PUT' in opcode.name:
            prevarg, prevpos = arg, pos
        elif 'GET' in opcode.name:
            gets.add(arg)

    # Copy the pickle string except for PUTS without a corresponding GET
    s = []
    i = 0
    for arg, start, stop in puts:
        j = stop if (arg in gets) else start
        s.append(p[i:j])
        i = stop
    s.append(p[i:])
    # b'' is plain str on Python 2.6+ and bytes on Python 3, so the join works on both
    return b''.join(s)


if __name__ == '__main__':
    from pickle import dumps
    from pickletools import dis

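    # Protocol 2 is requested explicitly so that PUT opcodes appear on any
    # Python version (protocol 4 and later use MEMOIZE instead of PUT).
    p = dumps(['the', 'quick', 'brown', 'fox'], protocol=2)
    print('Before:')
    dis(p)
    print('\nAfter:')
    dis(optimize(p))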

The pickler is designed to conserve memory by writing its output directly to a file as the pickle is generated. When it writes a potentially reusable object, it cannot yet know whether that object will be referenced again later. To allow for that possibility, the pickler has to emit a PUT opcode for each such object. This makes the pickle unnecessarily fat.
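One way to see the unused PUT opcodes is to count them. The sketch below is written for Python 3 and assumes protocol 2, chosen only because protocol 4 and later replace PUT with MEMOIZE. The helper name count_memo_opcodes is hypothetical and not part of the recipe.

```python
import pickle
from pickletools import genops

def count_memo_opcodes(data):
    """Return (PUT-style opcode count, GET-style opcode count) for a pickle."""
    puts = gets = 0
    for opcode, arg, pos in genops(data):
        if 'PUT' in opcode.name:      # PUT, BINPUT, LONG_BINPUT
            puts += 1
        elif 'GET' in opcode.name:    # GET, BINGET, LONG_BINGET
            gets += 1
    return puts, gets

# No object is referenced twice here, so no PUT is ever read back.
words = ['the', 'quick', 'brown', 'fox']
no_reuse = pickle.dumps(words, protocol=2)

# The same list object appears twice, so its second occurrence becomes a GET.
shared = ['a']
with_reuse = pickle.dumps([shared, shared], protocol=2)

print('no reuse   (puts, gets):', count_memo_opcodes(no_reuse))
print('with reuse (puts, gets):', count_memo_opcodes(with_reuse))
```

Every PUT that is not matched by a GET with the same argument is dead weight, and those are the opcodes the recipe removes.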

Fat pickles suck. They consume disk space if you write them to a file. They take extra transmission time if you send them across a network. And worse, fat pickles take more time and memory to unpickle. Unnecessary PUT opcodes cost you at both the sending and receiving ends.

To use the recipe, write "optimize(dumps(obj))" wherever you would have written "dumps(obj)".
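As a minimal usage sketch for Python 3, the snippet below wraps the call in a helper. It assumes the standard library's pickletools.optimize as a stand-in for the recipe's optimize(); that function is documented as eliminating unused PUT opcodes. The name dumps_optimized and the choice of protocol 2 are illustrative assumptions.

```python
import pickle
import pickletools

def dumps_optimized(obj, protocol=2):
    """Hypothetical helper: pickle obj, then drop the unused PUT opcodes."""
    return pickletools.optimize(pickle.dumps(obj, protocol=protocol))

obj = {'words': ['the', 'quick', 'brown', 'fox']}
data = dumps_optimized(obj)

# The optimized pickle must load back to an equal object.
restored = pickle.loads(data)
print(restored == obj)
```

The receiving side needs no changes: the optimized pickle is loaded with the ordinary loads().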

This recipe pulls the whole pickle into memory, scans all the GET and PUT opcodes, and then eliminates the unused PUT opcodes from the pickle string. This can shrink the pickle considerably when most objects are never referenced a second time. The effect is enhanced if you zip the pickle prior to transmission or storage.
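To check the size effect on your own data, a rough measurement along these lines may help. It is a sketch for Python 3 that assumes pickletools.optimize in place of the recipe's function and zlib as the "zip" step. The sample records are made up for illustration, and the numbers will vary with the data and protocol.

```python
import pickle
import pickletools
import zlib

# Illustrative data only: many small objects, few of them shared.
records = [{'id': i, 'name': 'item-%d' % i} for i in range(1000)]

raw = pickle.dumps(records, protocol=2)
slim = pickletools.optimize(raw)        # unused PUT opcodes removed

sizes = {
    'raw': len(raw),
    'optimized': len(slim),
    'raw + zlib': len(zlib.compress(raw)),
    'optimized + zlib': len(zlib.compress(slim)),
}
for label, size in sizes.items():
    print('%-18s %7d bytes' % (label, size))

# Compression and optimization must not change what gets unpickled.
roundtrip = pickle.loads(zlib.decompress(zlib.compress(slim)))
print(roundtrip == records)
```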

2 comments

Louis RIVIERE 16 years, 2 months ago

Pickle 3. That would be nice in Pickle protocol version 3.

I'm unclear. Is this only a problem in 'fat, bloated, and default' version of Pickle?