Generate a catenateFiles function parameterized for common variations.
import os


def catenateFilesFactory(isTextFiles=True, isClearTgt=True, isCreateTgt=True):
    """Return a catenateFiles function parameterized by the factory arguments.

    isTextFiles:  Catenate text files. If the last line of a non-empty file
                  is not terminated by an EOL, append an EOL to it.
    isClearTgt:   If the target file already exists, clear its original
                  contents before appending the source files.
    isCreateTgt:  If the target file does not already exist, and this
                  parameter is True, create the target file; otherwise raise
                  an IOError.
    """
    # All files are opened in binary mode, so the EOL marker must be bytes.
    eol = os.linesep.encode('ascii')
    lenEol = len(eol)

    def catenateFiles(tgtFile, *srcFiles):
        isTgtAppendEol = False
        if os.path.isfile(tgtFile):
            if isClearTgt:
                # Truncate the existing target.
                tgt = open(tgtFile, 'wb')
                tgt.close()
            elif isTextFiles:
                # Appending text: check whether the existing target ends
                # with an EOL.
                tgt = open(tgtFile, 'rb')
                data = tgt.read()
                tgt.close()
                if len(data) and (len(data) < lenEol or data[-lenEol:] != eol):
                    isTgtAppendEol = True
        elif not isCreateTgt:
            raise IOError("catenateFiles target file '%s' not found" % (
                tgtFile))
        tgt = open(tgtFile, 'ab')
        if isTgtAppendEol:
            tgt.write(eol)
        for srcFile in srcFiles:
            src = open(srcFile, 'rb')
            data = src.read()
            src.close()
            tgt.write(data)
            # Terminate the last line of a non-empty text file if needed.
            if (isTextFiles and len(data) and
                    (len(data) < lenEol or data[-lenEol:] != eol)):
                tgt.write(eol)
        tgt.close()
        return

    # Support reflection and doc string.
    catenateFiles.isTextFiles = isTextFiles
    catenateFiles.isClearTgt = isClearTgt
    catenateFiles.isCreateTgt = isCreateTgt
    if isTextFiles:
        docFileType = "text"
    else:
        docFileType = "binary"
    if isCreateTgt:
        docCreate = "Create tgtFile if it does not already exist."
    else:
        docCreate = "Require that tgtFile already exists."
    if isClearTgt:
        docClear = "replace"
    else:
        docClear = "append to"
    catenateFiles.__doc__ = """Catenate %s srcFiles to %s the tgtFile.
    %s
    All of the srcFiles must exist; otherwise raise an IOError.
    """ % (docFileType, docClear, docCreate)
    return catenateFiles

Catenating (or concatenating) files is a common data processing task, but there are at least three independent binary choices for the functional requirements. This could mean eight different functions, or one function with three additional boolean arguments.
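To make the count concrete, the short sketch below enumerates every True/False assignment of the three flags. The flag names come from the factory signature; the `FLAGS` and `combos` names are only illustrative.

```python
import itertools

# The three flag names are taken from the recipe's factory signature.
FLAGS = ("isTextFiles", "isClearTgt", "isCreateTgt")

# Every True/False assignment to the three flags: 2 ** 3 == 8 variants.
combos = [dict(zip(FLAGS, values))
          for values in itertools.product((True, False), repeat=len(FLAGS))]

for combo in combos:
    print(combo)
```

Each dictionary could be passed to the factory as keyword arguments, which is one way to see why a single parameterized generator may be preferable to eight hand-written functions.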
Binary files are relatively easy to handle. So are text files - unless the last line of a file does not end with a line separator. Then the catenation function must insert a line separator between the last line and the first line of the next file.
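The text-file rule can be tried out in memory, without touching the file system. The following is a minimal sketch, assuming the data is handled as bytes; `needs_eol` and `join_text_chunks` are hypothetical helper names, not part of the recipe.

```python
import os

# Assumption: the platform line separator, encoded to bytes because the
# chunks below are byte strings (as they would be when read in 'rb' mode).
DEFAULT_EOL = os.linesep.encode("ascii")


def needs_eol(data, eol=DEFAULT_EOL):
    """Return True if non-empty `data` does not end with `eol`."""
    return len(data) > 0 and not data.endswith(eol)


def join_text_chunks(chunks, eol=DEFAULT_EOL):
    """Concatenate byte chunks, terminating each non-empty chunk with `eol`."""
    out = []
    for chunk in chunks:
        out.append(chunk)
        if needs_eol(chunk, eol):
            out.append(eol)
    return b"".join(out)
```

An empty chunk contributes nothing, so no stray blank line is inserted for an empty source file.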
An existing target file may or may not be cleared before any of the source files are appended to it.
Finally, if the target file does not already exist, it may be created; or it may be required to exist already.
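The two target-file choices can be isolated in a small helper. This is only a sketch of the decision logic, assuming binary append mode as in the recipe; `open_target`, `is_clear` and `is_create` are hypothetical names.

```python
import os


def open_target(path, is_clear=True, is_create=True):
    """Open `path` for binary appending according to the two flags."""
    if os.path.isfile(path):
        if is_clear:
            # Truncate the existing contents before anything is appended.
            open(path, "wb").close()
    elif not is_create:
        # The target is missing and the caller did not allow creating it.
        raise IOError("target file '%s' not found" % path)
    # Append mode creates the file if it does not exist yet.
    return open(path, "ab")
```

The caller is responsible for closing the returned file object.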
The catenateFilesFactory generates and returns a parameterized catenation function each time it is called. Each function has attributes and a doc string describing the parameter values used to generate it.
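The same pattern, a closure that carries its parameters as attributes and a generated doc string, can be shown with a much smaller in-memory analogue. Everything in this sketch (`joinerFactory`, `join`, the fixed `b"\n"` separator) is hypothetical and stands in for the file-based recipe.

```python
def joinerFactory(isTextFiles=True):
    """Return a function that joins byte chunks (in-memory illustration)."""
    eol = b"\n"  # assumption: a fixed EOL, chosen only for this illustration

    def join(*chunks):
        out = b""
        for chunk in chunks:
            out += chunk
            # Same rule as the recipe: terminate non-empty text chunks.
            if isTextFiles and chunk and not chunk.endswith(eol):
                out += eol
        return out

    # Support reflection and doc string, as the recipe does.
    join.isTextFiles = isTextFiles
    join.__doc__ = "Join %s chunks." % ("text" if isTextFiles else "binary")
    return join


joinText = joinerFactory()
joinBin = joinerFactory(isTextFiles=False)
print(joinText.isTextFiles, joinText.__doc__)
print(joinBin.isTextFiles, joinBin.__doc__)
```

Each call to the factory produces an independent function, so the two joiners keep their own flag values and doc strings.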
Example:
catenateFiles = catenateFilesFactory()
srcFiles = ['src1.txt', 'src2.txt', 'src3.txt']
catenateFiles(tgtFilename, *srcFiles)

catenateBin = catenateFilesFactory(isTextFiles=False, isClearTgt=False, isCreateTgt=False)
catenateBin('growing.bin', 'a.bin', 'b.bin')