
Reader iterates over records in dBASE/Xbase DBF files. Writer creates DBF files from Python sequences.

Python, 219 lines
import struct, datetime, decimal, itertools

def dbfreader(f):
    """Returns an iterator over records in a Xbase DBF file.

    The first row returned contains the field names.
    The second row contains field specs: (type, size, decimal places).
    Subsequent rows contain the data records.
    If a record is marked as deleted, it is skipped.

    File should be opened for binary reads.

    """
    # See DBF format spec at:
    #     http://www.pgts.com.au/download/public/xbase.htm#DBF_STRUCT

    numrec, lenheader = struct.unpack('<xxxxLH22x', f.read(32))    
    numfields = (lenheader - 33) // 32

    fields = []
    for fieldno in xrange(numfields):
        name, typ, size, deci = struct.unpack('<11sc4xBB14x', f.read(32))
        name = name.replace('\0', '')       # eliminate NULs from string   
        fields.append((name, typ, size, deci))
    yield [field[0] for field in fields]
    yield [tuple(field[1:]) for field in fields]

    terminator = f.read(1)
    assert terminator == '\r'

    fields.insert(0, ('DeletionFlag', 'C', 1, 0))
    fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
    fmtsiz = struct.calcsize(fmt)
    for i in xrange(numrec):
        record = struct.unpack(fmt, f.read(fmtsiz))
        if record[0] != ' ':
            continue                        # deleted record
        result = []
        for (name, typ, size, deci), value in itertools.izip(fields, record):
            if name == 'DeletionFlag':
                continue
            if typ == "N":
                value = value.replace('\0', '').lstrip()
                if value == '':
                    value = 0
                elif deci:
                    value = decimal.Decimal(value)
                else:
                    value = int(value)
            elif typ == 'D':
                y, m, d = int(value[:4]), int(value[4:6]), int(value[6:8])
                value = datetime.date(y, m, d)
            elif typ == 'L':
                value = (value in 'YyTt' and 'T') or (value in 'NnFf' and 'F') or '?'
            elif typ == 'F':
                value = float(value)
            result.append(value)
        yield result


def dbfwriter(f, fieldnames, fieldspecs, records):
    """ Return a string suitable for writing directly to a binary dbf file.

    File f should be open for writing in a binary mode.

    Fieldnames should be no longer than ten characters and not include \x00.
    Fieldspecs are in the form (type, size, deci) where
        type is one of:
            C for ascii character data
            M for ascii character memo data (real memo fields not supported)
            D for datetime objects
            N for ints or decimal objects
            L for logical values 'T', 'F', or '?'
        size is the field width
        deci is the number of decimal places in the provided decimal object
    Records can be an iterable over the records (sequences of field values).
    
    """
    # header info
    ver = 3
    now = datetime.datetime.now()
    yr, mon, day = now.year-1900, now.month, now.day
    numrec = len(records)
    numfields = len(fieldspecs)
    lenheader = numfields * 32 + 33
    lenrecord = sum(field[1] for field in fieldspecs) + 1
    hdr = struct.pack('<BBBBLHH20x', ver, yr, mon, day, numrec, lenheader, lenrecord)
    f.write(hdr)
                      
    # field specs
    for name, (typ, size, deci) in itertools.izip(fieldnames, fieldspecs):
        name = name.ljust(11, '\x00')
        fld = struct.pack('<11sc4xBB14x', name, typ, size, deci)
        f.write(fld)

    # terminator
    f.write('\r')

    # records
    for record in records:
        f.write(' ')                        # deletion flag
        for (typ, size, deci), value in itertools.izip(fieldspecs, record):
            if typ == "N":
                value = str(value).rjust(size, ' ')
            elif typ == 'D':
                value = value.strftime('%Y%m%d')
            elif typ == 'L':
                value = str(value)[0].upper()
            else:
                value = str(value)[:size].ljust(size, ' ')
            assert len(value) == size
            f.write(value)

    # End of file
    f.write('\x1A')


# -------------------------------------------------------
# Example calls
if __name__ == '__main__':
    import sys, csv
    from cStringIO import StringIO
    from operator import itemgetter

    # Read a database
    filename = '/pydev/databases/orders.dbf'      
    if len(sys.argv) == 2:
        filename = sys.argv[1]
    f = open(filename, 'rb')
    db = list(dbfreader(f))
    f.close()
    for record in db:
        print record
    fieldnames, fieldspecs, records = db[0], db[1], db[2:]

    # Alter the database
    del records[4]
    records.sort(key=itemgetter(4))

    # Remove a field
    del fieldnames[0]
    del fieldspecs[0]
    records = [rec[1:] for rec in records]

    # Create a new DBF
    f = StringIO()
    dbfwriter(f, fieldnames, fieldspecs, records)

    # Read the data back from the new DBF
    print '-' * 20    
    f.seek(0)
    for line in dbfreader(f):
        print line
    f.close()

    # Convert to CSV
    print '.' * 20    
    f = StringIO()
    writer = csv.writer(f)
    writer.writerow(fieldnames)
    writer.writerows(records)
    print f.getvalue()
    f.close()




# Example Output
"""
['ORDER_ID', 'CUSTMR_ID', 'EMPLOY_ID', 'ORDER_DATE', 'ORDER_AMT']
[('C', 10, 0), ('C', 11, 0), ('C', 11, 0), ('D', 8, 0), ('N', 12, 2)]
['10005     ', 'WALNG      ', '555        ', datetime.date(1995, 5, 22), Decimal("173.40")]
['10004     ', 'BMARK      ', '777        ', datetime.date(1995, 5, 18), Decimal("3194.20")]
['10029     ', 'SAWYH      ', '777        ', datetime.date(1995, 6, 29), Decimal("97.30")]
['10013     ', 'RITEB      ', '777        ', datetime.date(1995, 6, 2), Decimal("560.40")]
['10024     ', 'RATTC      ', '444        ', datetime.date(1995, 6, 21), Decimal("2223.50")]
['10018     ', 'RATTC      ', '444        ', datetime.date(1995, 6, 12), Decimal("1076.05")]
['10025     ', 'RATTC      ', '444        ', datetime.date(1995, 6, 23), Decimal("185.80")]
['10038     ', 'OLDWO      ', '111        ', datetime.date(1995, 7, 14), Decimal("863.96")]
['10002     ', 'MTIME      ', '333        ', datetime.date(1995, 5, 16), Decimal("731.80")]
['10007     ', 'MORNS      ', '444        ', datetime.date(1995, 5, 24), Decimal("1405.00")]
['10026     ', 'MORNS      ', '555        ', datetime.date(1995, 6, 26), Decimal("17.40")]
['10030     ', 'LILLO      ', '111        ', datetime.date(1995, 7, 3), Decimal("909.91")]
['10022     ', 'LAPLA      ', '111        ', datetime.date(1995, 6, 19), Decimal("671.50")]
['10035     ', 'HIGHG      ', '111        ', datetime.date(1995, 7, 11), Decimal("1984.83")]
['10033     ', 'FOODG      ', '333        ', datetime.date(1995, 7, 6), Decimal("3401.32")]
--------------------
['CUSTMR_ID', 'EMPLOY_ID', 'ORDER_DATE', 'ORDER_AMT']
[('C', 11, 0), ('C', 11, 0), ('D', 8, 0), ('N', 12, 2)]
['MORNS      ', '555        ', datetime.date(1995, 6, 26), Decimal("17.40")]
['SAWYH      ', '777        ', datetime.date(1995, 6, 29), Decimal("97.30")]
['WALNG      ', '555        ', datetime.date(1995, 5, 22), Decimal("173.40")]
['RATTC      ', '444        ', datetime.date(1995, 6, 23), Decimal("185.80")]
['RITEB      ', '777        ', datetime.date(1995, 6, 2), Decimal("560.40")]
['LAPLA      ', '111        ', datetime.date(1995, 6, 19), Decimal("671.50")]
['MTIME      ', '333        ', datetime.date(1995, 5, 16), Decimal("731.80")]
['OLDWO      ', '111        ', datetime.date(1995, 7, 14), Decimal("863.96")]
['LILLO      ', '111        ', datetime.date(1995, 7, 3), Decimal("909.91")]
['RATTC      ', '444        ', datetime.date(1995, 6, 12), Decimal("1076.05")]
['MORNS      ', '444        ', datetime.date(1995, 5, 24), Decimal("1405.00")]
['HIGHG      ', '111        ', datetime.date(1995, 7, 11), Decimal("1984.83")]
['BMARK      ', '777        ', datetime.date(1995, 5, 18), Decimal("3194.20")]
['FOODG      ', '333        ', datetime.date(1995, 7, 6), Decimal("3401.32")]
....................
CUSTMR_ID,EMPLOY_ID,ORDER_DATE,ORDER_AMT
MORNS      ,555        ,1995-06-26,17.40
SAWYH      ,777        ,1995-06-29,97.30
WALNG      ,555        ,1995-05-22,173.40
RATTC      ,444        ,1995-06-23,185.80
RITEB      ,777        ,1995-06-02,560.40
LAPLA      ,111        ,1995-06-19,671.50
MTIME      ,333        ,1995-05-16,731.80
OLDWO      ,111        ,1995-07-14,863.96
LILLO      ,111        ,1995-07-03,909.91
RATTC      ,444        ,1995-06-12,1076.05
MORNS      ,444        ,1995-05-24,1405.00
HIGHG      ,111        ,1995-07-11,1984.83
BMARK      ,777        ,1995-05-18,3194.20
FOODG      ,333        ,1995-07-06,3401.32
"""

The dbf file format is old, precise, commonplace, and widely supported by everything from calendar software in PDAs, to contact managers, to Excel and Access. It can be a good way to get Python to interoperate with pre-existing, non-Python apps.

16 comments

Tim Morgan 16 years, 10 months ago  # | flag

Visual FoxPro. With a Visual FoxPro dbf file I get an AssertionError regarding the terminator.

>>> db = list(dbfreader(f))
Traceback (most recent call last):
  File "", line 1, in ?
  File "dbf.py", line 29, in dbfreader
    assert terminator == '\r'
AssertionError

Can your recipe be modified easily to work with Visual FoxPro?
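One possible workaround, sketched here against a fabricated header and untested on real FoxPro files: trust the lenheader field from the header instead of asserting on the terminator byte, and seek straight to where the records begin (Python 3 bytes handling shown):

```python
import io
import struct

def skip_header(f):
    # Read the record count and header length, then jump past the field
    # descriptors (and whatever terminator bytes FoxPro put there) by
    # seeking to lenheader, where the records start.
    numrec, lenheader = struct.unpack('<xxxxLH22x', f.read(32))
    f.seek(lenheader)
    return numrec, lenheader

# Fake file: 32-byte header + one 32-byte field descriptor + 1 terminator
# byte, so lenheader = 65; use a non-'\r' terminator, as FoxPro might.
fake = struct.pack('<BBBBLHH20x', 3, 95, 1, 1, 2, 65, 11)
fake += struct.pack('<11sc4xBB14x', b'NAME', b'C', 10, 0)
fake += b'\x00'

f = io.BytesIO(fake)
numrec, lenheader = skip_header(f)
print(numrec, f.tell())   # 2 65
```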

Marco Molinari 16 years, 9 months ago  # | flag

Small typo. There is a small typo in the example code: the main section is written twice, and the second "version" has a bug in it (a missing argument in dbfwriter()); the first version works, though.

Anthony Dycks 16 years, 6 months ago  # | flag

yield statement syntax error using ActiveState Python V2.2. Attempting to run the DBF reader portion of the code results in the following invalid syntax error ...

yield [field[0] for field in fields]

An accompanying warning message indicates that 'yield' will become a reserved word.

Chris Arndt 16 years, 5 months ago  # | flag

Generators in Python 2.2. Generators were still an optional feature in Python 2.2.

Just add "from __future__ import generators" to the top of the module.

hamish steiner 16 years, 2 months ago  # | flag

1 byte out.... Great code; as a newbie I would love to write code this tight. When I ran this on a DBF file it was one byte out; adding a new line to grab an extra byte sorted this out.

    terminator = f.read(1)
    assert terminator == '\r'

    fields.insert(0, ('DeletionFlag', 'C', 1, 0))
    fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])
    fmtsiz = struct.calcsize(fmt)
    hamish = f.read(1)   # ---- NEW LINE

Then I had no problems. I don't know if this is due to an extra byte after the terminator '\r'. Anyway, that helped.

The other problem I had was empty dates. If the value was ' ' then I would get errors, i.e. a = int(' ') raises an error.

Changing the code for the date record to this

    elif typ == 'D':
        y, m, d = value[:4], value[4:6], value[6:8]
        value = d + "/" + m + "/" + y

worked and gave me the date in d/m/y format. If the string was blank, I get ' / / ', which is OK.

Hamish
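Hamish's empty-date handling can also be written so blank fields come back as None rather than a placeholder string. parse_dbf_date below is a hypothetical helper, a sketch of the idea only (Python 3):

```python
import datetime

def parse_dbf_date(value):
    """Parse an 8-character DBF 'D' field; blank dates come back as None.

    Rather than letting int('    ') raise ValueError, treat an
    all-blank field as a missing date.
    """
    value = value.strip()
    if not value:
        return None
    return datetime.date(int(value[:4]), int(value[4:6]), int(value[6:8]))

print(parse_dbf_date('19950522'))   # 1995-05-22
print(parse_dbf_date('        '))   # None
```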

Alex Baraev 16 years, 2 months ago  # | flag

Note that file handles for dbfwriter must be opened in 'wb' mode, especially on Windows, because in text mode the write method will convert 0x0A to 0x0D 0x0A (LF to CR LF) and you'll get a corrupted DBF.
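A minimal illustration of why binary mode matters (Python 3; on POSIX systems text mode happens to be harmless, so the corruption only shows up on Windows):

```python
import os
import tempfile

# The bytes a dbf contains (here: CR, LF and the 0x1A end-of-file marker)
# pass through a binary-mode handle untouched. In text mode on Windows,
# each 0x0A would come back as 0x0D 0x0A and corrupt the file.
payload = b'\x0d\x0a\x1a'
path = os.path.join(tempfile.mkdtemp(), 'demo.dbf')

with open(path, 'wb') as out:       # 'wb' is essential for dbfwriter output
    out.write(payload)

with open(path, 'rb') as inp:
    data = inp.read()
print(data == payload)              # True: exactly three bytes, no translation
```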

Dennis Christopher 15 years ago  # | flag

Problem writing dbf file to disk. I am butting my head against a wall trying to do something very simple with this: write a dbf file to disk (on XP), e.g. modifying the sample script:

filename = r'c:\MyDBF.dbf'
g = open(filename, 'wb')
f = StringIO()
dbfwriter(f, fieldnames, fieldspecs, records)
g.write(f.getvalue())
g.flush()

Access doesn't like the resultant file, and the dbfreader function also complains about it.

Any insight would be appreciated.

Dennis Christopher

Yusdi Santoso 14 years, 5 months ago  # | flag

Python code for reading Visual FoxPro DBF. This is a Python module that read Visual FoxPro DBF file with memo:

http://www.physics.ox.ac.uk/users/santoso/Software.Repository.html

Hope this can be useful.

tyrion-mx 13 years, 3 months ago  # | flag

I have made a simple lib to read DBF files that looks like the django db/model api. It supports searching into the databases, creating new databases from a model definition, and creating a model from an existing database. There is also a lower-level interface, if you don't like the "django like" syntax. It's still in the early stages of development; I hope to have the time to finish it.

david SHI 12 years, 10 months ago  # | flag

How come I cannot update or delete a record?

david SHI 12 years, 10 months ago  # | flag

Can you make it generic?

Andrew Leonard 11 years ago  # | flag

I recently found this code, and I am using it to read ArcMap shapefile dbf files. When I read in the data using the dbfreader function, some of the fields come back as various-length strings of '*'. I don't know why the code is doing this; I think it may be the data, but Excel and ArcMap read the data perfectly fine. Some of the data columns are over the character length, which may be the problem. Any help would be greatly appreciated; I can also provide the data that is giving issues.

Thanks, Andrew

John Machin 10 years, 12 months ago  # | flag

To Andrew Leonard: I have a non-published DBF reader module which may be able to help you. You can find my e-mail address by google_search("John Machin xlrd").

Ken MacDonald 9 years, 5 months ago  # | flag

We are using the code above with a few mods for dumping CodeBase files, which has been a HUGE help for our development. However, CodeBase can create character fields with lengths greater than 255, and the above code chokes to death on that. This may be the solution to the problem Andrew Leonard mentions above, since he mentions over-length data columns.

As it turns out, the following line:

fmt = ''.join(['%ds' % fieldinfo[2] for fieldinfo in fields])

should look like:

fmt = ''.join(['%ds' % (fieldinfo[2] + (fieldinfo[3] * 256)) for fieldinfo in fields])

as fieldinfo[3] is the high order byte of the field length according to the spec at clicketyclick.dk, see comment above for URL. Once the above change is in place, files with long character fields read/dump properly.

CAVEAT: the spec mentions that this is the format of the high order byte for FoxPro and Clipper files for NON-NUMERIC data; the original posted code (comment, line 7) indicates that for some numeric types that fieldinfo[3] may be the number of decimal places. You may have to write some additional code if you have BOTH numeric and long character fields in your table. I didn't have any tables with decimal data to verify whether or not this is the case.

Good luck!

Ken
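Ken's width adjustment can be sketched like this (the field specs are hypothetical, and per his caveat the high-byte trick is applied only to character fields, leaving numeric deci bytes alone):

```python
# For character fields wider than 255, treat the "decimal places" byte as
# the high-order byte of the field width, so the struct format string
# reserves the full width. Numeric fields keep deci as decimal places.
fields = [('LONGTEXT', 'C', 44, 2),   # 44 + 2*256 = 556 chars wide
          ('AMOUNT',   'N', 12, 2)]   # genuinely 12 wide, 2 decimals

fmt = ''.join('%ds' % (size + deci * 256 if typ == 'C' else size)
              for name, typ, size, deci in fields)
print(fmt)   # '556s12s'
```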

Tomas Nordin 7 years, 4 months ago  # | flag

For my needs (reading in a dbf file), this awesome function just replaced a dependency on a 6000+ line, Java-style module I was using before. Loading a file (126K, 8) took about 3+ seconds, while with the module it took just under 6. So the function is fast. Awesome and fast.

What's more, the module I was using forced the field names to lowercase, and it raised an error on duplicated field names (even when just reading!). This function does no such thing. So it's also beautiful.