
Generate a catenateFiles function parameterized for common variations.

Python, 67 lines
import os

def catenateFilesFactory(isTextFiles=True, isClearTgt=True, isCreateTgt=True):
    """return a catenateFiles function parameterized by the factory arguments.
    isTextFiles:    Catenate text files.  If the last line of a non-empty file 
                    is not terminated by an EOL, append an EOL to it.
    isClearTgt      If the target file already exists, clear its original 
                    contents before appending the source files.
    isCreateTgt     If the target file does not already exist, and this
                    parameter is True, create the target file; otherwise raise 
                    an IOError.
    """
    eol = os.linesep.encode('ascii')  # bytes: files are opened in binary mode
    lenEol = len(eol)

    def catenateFiles(tgtFile, *srcFiles):
        isTgtAppendEol = False
        if os.path.isfile(tgtFile):
            if isClearTgt:
                tgt = open(tgtFile, 'wb')
                tgt.close()
            elif isTextFiles:
                tgt = open(tgtFile, 'rb')
                data = tgt.read()
                tgt.close()
                if len(data) and (len(data) < lenEol or data[-lenEol:] != eol):
                    isTgtAppendEol = True
        elif not isCreateTgt:
            raise IOError("catenateFiles target file '%s' not found" % (
                    tgtFile))
        tgt = open(tgtFile, 'ab')
        if isTgtAppendEol:
            tgt.write(eol)
        for srcFile in srcFiles:
            src = open(srcFile, 'rb')
            data = src.read()
            src.close()
            tgt.write(data)
            if (isTextFiles and len(data) and
                    (len(data) < lenEol or data[-lenEol:] != eol)):
                tgt.write(eol)
        tgt.close()
        return            

    # Support reflection and doc string.
    catenateFiles.isTextFiles = isTextFiles
    catenateFiles.isClearTgt = isClearTgt
    catenateFiles.isCreateTgt = isCreateTgt
    
    if isTextFiles:
        docFileType = "text"
    else:
        docFileType = "binary"
    if isCreateTgt:
        docCreate = "Create tgtFile if it does not already exist."
    else:
        docCreate = "Require that tgtFile already exists."
    if isClearTgt:
        docClear = "replace"
    else:
        docClear = "append to"
    catenateFiles.__doc__ = """Catenate %s srcFiles to %s the tgtFile.
    %s
    All of the srcFiles must exist; otherwise raise an IOError.
    """ % (docFileType, docClear, docCreate)

    return catenateFiles

Catenating (or concatenating) files is a common data processing task, but there are at least three independent binary choices for the functional requirements. Three binary choices give 2 x 2 x 2 = 8 variants, which could mean 8 different functions, or one function with three additional boolean arguments.
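The count of eight comes from taking every combination of the three flags. A small sketch of that enumeration, using the factory's keyword names from the recipe (the names FLAGS and variants are illustrative assumptions, not part of the recipe):

```python
from itertools import product

# Keyword names taken from catenateFilesFactory; FLAGS and variants are
# illustrative names, not part of the recipe.
FLAGS = ("isTextFiles", "isClearTgt", "isCreateTgt")

# One dict of keyword arguments per combination of True/False values.
variants = [dict(zip(FLAGS, values))
            for values in product((True, False), repeat=len(FLAGS))]

for kwargs in variants:
    print(kwargs)
```

Each dict could be passed as catenateFilesFactory(**kwargs) to build the corresponding function.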

Binary files are relatively easy to handle. So are text files - unless the last line of a file does not end with a line separator. Then the catenation function must insert a line separator between the last line and the first line of the next file.
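The text-file rule can be sketched on in-memory byte strings rather than files. This is a simplified illustration, not the recipe's code: needs_eol and join_text_chunks are hypothetical helper names, and the separator is assumed to be the platform's os.linesep encoded as bytes.

```python
import os

# Bytes form of the platform line separator (an assumption of this sketch).
EOL = os.linesep.encode("ascii")

def needs_eol(data, eol=EOL):
    """Return True if data is non-empty and does not end with eol."""
    return len(data) > 0 and not data.endswith(eol)

def join_text_chunks(chunks, eol=EOL):
    """Concatenate byte chunks, terminating each non-empty chunk with eol."""
    out = []
    for data in chunks:
        out.append(data)
        if needs_eol(data, eol):
            out.append(eol)  # repair the unterminated last line
    return b"".join(out)

# Explicit b"\n" separator so the result is the same on every platform.
print(join_text_chunks([b"one", b"", b"two\n"], eol=b"\n"))
```

Empty chunks are passed through untouched, which matches the recipe's rule that only a non-empty file gets a separator appended.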

An existing target file may or may not be cleared before any of the source files are appended to it.

Finally, if the target file does not already exist, it may be created; or it may be required to exist already.
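The two target-file choices map fairly directly onto file open modes. The sketch below is a simplification under stated assumptions: open_target is a hypothetical helper, and it picks the mode in one step, whereas the recipe truncates the file first and then reopens it for appending.

```python
import os

def open_target(path, clear=True, create=True):
    """Open the target for writing according to the two flags.

    clear:  truncate existing contents ('wb') instead of appending ('ab').
    create: allow the file to be created; otherwise a missing target
            raises IOError.
    """
    if not os.path.isfile(path) and not create:
        raise IOError("target file '%s' not found" % path)
    # Both 'wb' and 'ab' create the file if it does not exist yet.
    mode = "wb" if clear else "ab"
    return open(path, mode)
```

For example, open_target('growing.bin', clear=False, create=False) appends to an existing file and refuses to create a new one.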

The catenateFilesFactory generates and returns a parameterized catenation function each time it is called. Each function has attributes and a doc string describing the parameter values used to generate it.
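The pattern of attaching attributes and a generated doc string to the returned function can be shown in miniature. make_joiner below is a hypothetical one-flag factory written only for illustration; it joins byte strings in memory and is not a substitute for the recipe.

```python
def make_joiner(is_text=True):
    """Return a join function specialized by is_text (illustrative only)."""
    def join(*chunks):
        # Text mode puts a newline between chunks; binary mode adds nothing.
        sep = b"\n" if is_text else b""
        return sep.join(chunks)

    # Record the factory argument on the function and describe it in the
    # doc string, so each generated function documents itself.
    join.is_text = is_text
    join.__doc__ = "Join %s chunks." % ("text" if is_text else "binary")
    return join

join_text = make_joiner()
join_bin = make_joiner(is_text=False)
print(join_text.is_text, join_text.__doc__)
print(join_bin.is_text, join_bin.__doc__)
```

Because each call to the factory builds a new function object, the attributes of one generated function do not affect another.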

Example:

catenateFiles = catenateFilesFactory()
srcFiles = ['src1.txt', 'src2.txt', 'src3.txt']
catenateFiles(tgtFilename, *srcFiles)

catenateBin = catenateFilesFactory(isTextFiles=False, isClearTgt=False,
                                   isCreateTgt=False)
catenateBin('growing.bin', 'a.bin', 'b.bin')

Created by Jim Jinkins on Sun, 31 Jul 2005 (PSF)