This recipe provides recursive nlst() behavior on top of a normal ftplib.FTP instance. The rnlst() method provided by the LocalFTP class returns a list of the paths of all files under the path passed in as an argument, descending into every subdirectory. (One use for this list might be mirroring an FTP site. However, the Python distribution contains a script called ftpmirror.py; use that instead.)
It is best suited to fast local connections or relatively small remote FTP directories, because it issues a separate LIST command (through ftplib.FTP.dir()) for every directory it visits.
Please see code comments for additional information.
```python
import ftplib
import re
import warnings


class LocalFTP(object):
    """
    Class adding recursive nlst() behavior to ftplib.FTP instance. The
    ftplib.FTP instance is available through the connection attribute, and
    is exposed through __getattr__.

    The behavior added by this class (recursive directory listing) is most
    appropriate for ftp connections on a local network over a fast connection,
    or for small directories on remote ftp servers.

    The class relies on an externally defined callable, which can parse the
    lines returned by the ftplib.FTP.dir() method. This callable should be
    bound to the 'dirparser' attribute on this object. The 'dirparser'
    attribute can be initialized by passing the callable to the constructor
    using the keyword argument 'dirparser', or by attaching the callable to
    the 'dirparser' attribute after instantiation.

    The callable should return parsed results as a dict. This class makes some
    assumptions about the contents of the dict returned by the user-defined
    dirparser callable:
    -- the key 'trycwds' holds a list of booleans.
    -- the key 'names' holds a list of filenames in the dir() listing.
    -- The two lists should be the same length. A True value in the list
       referred to by the 'trycwds' key indicates the corresponding value
       in the list referred to by the 'names' key is a directory.
    -- The key names are based on fields in the ftpparse structure, from the
       ftpparse module/C library.
    -- Other keys can be included in the dict, but they are not used by the
       rnlst() method.
    -- The callable should return an empty dict() if there is nothing to
       return from the dir listing.

    This module provides two parsers which seem to work ok, but it should
    be easy to create others if these don't work for some reason:
    -- parse_windows parses the dir listing from Windows ftp servers.
    -- parse_unix parses the dir listing from UNIX ftp servers.
    """

    def __init__(self, host='', user='', passwd='', acct='',
                 dirparser=None):
        # ftplib.FTP only connects when host is given, and only logs in
        # when user is given.
        self.connection = ftplib.FTP(host, user, passwd, acct)
        self.remotepathsep = '/'
        self.dirparser = dirparser

    def __getattr__(self, name):
        """
        Delegate most requests to the underlying FTP object.
        """
        # Only reached for names not found on LocalFTP itself (e.g. cwd,
        # retrbinary, quit). Guard 'connection' to avoid infinite recursion
        # if it has not been set yet.
        if name == 'connection':
            raise AttributeError(name)
        return getattr(self.connection, name)

    def _dir(self, path):
        """
        Call dir() on path, and use callback to accumulate
        returned lines. Return list of lines.
        """
        dirlist = []
        try:
            self.connection.dir(path, dirlist.append)
        except ftplib.error_perm:
            # Permission problems are reported, and the path is treated
            # as an empty directory.
            warnings.warn('Access denied for path %s' % path)
        return dirlist

    def parsedir(self, path=''):
        """
        Method to parse the lines returned by the ftplib.FTP.dir(),
        when called on supplied path. Uses callable dirparser
        attribute.
        """
        if self.dirparser is None:
            msg = ('Must set dirparser attribute to a callable '
                   'before calling this method')
            raise TypeError(msg)
        dirlines = self._dir(path)
        dirdict = self.dirparser(dirlines)
        return dirdict

    def _cleanpath(self, path):
        """
        Clean up path - remove repeated and trailing separators.
        """
        slashes = self.remotepathsep * 2
        while slashes in path:
            path = path.replace(slashes, self.remotepathsep)
        if path.endswith(self.remotepathsep):
            path = path[:-1]
        return path

    def _rnlst(self, path, filelist):
        """
        Recursively accumulate filelist starting at
        path, on the server accessed through this object's
        ftp connection.
        """
        path = self._cleanpath(path)
        dirdict = self.parsedir(path)
        trycwds = dirdict.get('trycwds', [])
        names = dirdict.get('names', [])
        for trycwd, name in zip(trycwds, names):
            if trycwd:  # name is a directory: recurse into it
                self._rnlst(self.remotepathsep.join([path, name]), filelist)
            else:       # name is a file: record its full path
                filelist.append(self.remotepathsep.join([path, name]))
        return filelist

    def rnlst(self, path=''):
        """
        Recursive nlst(). Return a list of filenames under path.
        """
        filelist = []
        return self._rnlst(path, filelist)


# Naive ftplib.FTP.dir() parsing functions, which may or may not work. (These
# happen to work for servers I connect to.) Create your own functions (perhaps
# using ftpparse) for more robust solutions.

def parse_windows(dirlines):
    """
    Parse the lines returned by ftplib.FTP.dir(), when called
    on a Windows ftp server. May not work for all servers, but it
    works for the ones I need to connect to.
    """
    typemap = {'<DIR>': True}
    if not dirlines:
        return dict()
    maxlen = max(len(line) for line in dirlines)
    # Fixed character columns: date, time, <DIR> marker, size, name.
    columns = [slice(0, 9), slice(9, 17), slice(17, 29), slice(29, 38),
               slice(38, maxlen + 1)]
    fields = 'dates times trycwds sizes names'.split()
    values = []
    for line in dirlines:
        vals = [line[slc].strip() for slc in columns]
        vals[2] = typemap.get(vals[2], False)  # True only for <DIR> entries
        values.append(vals)
    # list() so this also works on Python 3, where zip() is lazy.
    lists = list(zip(*values))
    assert len(lists) == len(fields)
    return dict(zip(fields, lists))


def parse_unix(dirlines, startindex=1):
    """
    Parse the lines returned by ftplib.FTP.dir(), when called
    on a UNIX ftp server. May not work for all servers, but it
    works for the ones I need to connect to.
    """
    dirlines = dirlines[startindex:]
    if not dirlines:
        return dict()
    # 8 groups: type character, rest of the permissions, link count (stored
    # as 'inodes'), owner, group, size, three date tokens, and the name.
    pattern = re.compile(r'(.)(.*?)\s+(.*?)\s+(.*?)\s+(.*?)\s+'
                         r'(.*?)\s+(.*?\s+.*?\s+.*?)\s+(.*)')
    fields = 'trycwds tryretrs inodes users groups sizes dates names'.split()
    # Keep the captured groups of every line that matches; others are dropped.
    matches = [m.groups() for m in map(pattern.search, dirlines) if m]
    if not matches:
        return dict()
    lists = list(zip(*matches))
    # change the '-','d','l' values to booleans, where names referring
    # to directories get True, and others get False.
    lists[0] = ['d' == s for s in lists[0]]
    assert len(lists) == len(fields)
    return dict(zip(fields, lists))
```
The old recipe, if it worked at all, was trying to do too much at once (traversing the remote file system, retrieving the files, parsing the dir() listing, etc.).
This recipe only retrieves a list of the files in all directories under a path on an FTP server. That list is enough to build what the old recipe was trying to provide (mirroring an FTP directory), but it is not limited to that use. Still, ftpmirror.py is better suited to mirroring.
Notable omissions from the class include caching of, and retrieval methods for, the information parsed from the FTP server's directory listings. These should be easy to add if needed.
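As an illustration of how small such an addition could be, here is a sketch of a cache for parsed listings. DirCache and its methods are hypothetical names, not part of the recipe; the sketch assumes only that the wrapped callable follows the parsedir() contract from the LocalFTP docstring (path in, dict with 'names' and 'trycwds' out).

```python
class DirCache(object):
    """Memoize parsed directory listings by path (hypothetical helper).

    parsedir is any callable that takes a path and returns the dict
    described in the LocalFTP docstring, e.g. a LocalFTP instance's
    bound parsedir method.
    """

    def __init__(self, parsedir):
        self._parsedir = parsedir
        self._cache = {}

    def get(self, path=''):
        """Return the parsed listing for path, asking the server only once."""
        if path not in self._cache:
            self._cache[path] = self._parsedir(path)
        return self._cache[path]

    def invalidate(self, path=None):
        """Forget one cached path, or every cached path if path is None."""
        if path is None:
            self._cache.clear()
        else:
            self._cache.pop(path, None)

    def entries(self, path=''):
        """Retrieval helper: (name, is_directory) pairs for path."""
        listing = self.get(path)
        return list(zip(listing.get('names', []), listing.get('trycwds', [])))
```

One way to hook it into an existing instance would be `cache = DirCache(localftp.parsedir)` followed by `localftp.parsedir = cache.get`. Because _rnlst() calls self.parsedir(), the instance attribute takes precedence over the method, so repeated rnlst() calls reuse the cached listings until invalidate() is called.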
Interactive usage might look something like the following, if the above code were in a module called ftp.py:
```
In [1]: import getpass
In [2]: import ftp
In [3]: passwd = getpass.getpass()
Password:
In [4]: localftp = ftp.LocalFTP('192.168.0.100', 'username', passwd)
In [5]: localftp.dirparser = ftp.parse_unix
In [6]: filelist = localftp.rnlst('/data')
```
[now, do stuff with the filelist]
NOTE:
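The parse_unix function accepts a keyword argument, startindex, giving the number of lines to drop from the top of the ftplib.FTP.dir() listing (the default, 1, drops only the first line). It can be used to skip the '.' and '..' entries. Skipping these entries is important: otherwise rnlst() will descend into them, and since '.' lists the same directory again, the recursion never terminates normally.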
In the example above, I assumed the server does not list the '.' and '..' directories. If the server does list those directories, and these directories are always at the top of the listing, one way to avoid them follows:
```python
localftp.dirparser = lambda lines, startindex=3: ftp.parse_unix(lines, startindex)
```
Another way would be to write a better dirparser (perhaps using ftpparse), which skips these directories in some other way.
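A sketch of such a dirparser, assuming the same `ls -l`-style layout that parse_unix expects. The name parse_unix_skipdots is hypothetical. It drops '.' and '..' by name rather than by position, and it ignores any line that does not look like an entry (such as a leading "total N" line), so no startindex is needed. Like parse_unix, it will still misparse listings that deviate from that layout, and it does not treat symbolic links specially.

```python
import re

# Assumed `ls -l` layout: type + permissions, link count, owner, group,
# size, three date tokens, then the name (which may contain spaces).
_LINE = re.compile(r'^(.)(\S*)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+'
                   r'(\S+\s+\S+\s+\S+)\s+(.*)$')
_FIELDS = 'trycwds tryretrs inodes users groups sizes dates names'.split()


def parse_unix_skipdots(dirlines):
    """Hypothetical dirparser returning the same keys as parse_unix."""
    rows = []
    for line in dirlines:
        m = _LINE.match(line)
        if m is None:
            continue                  # not an entry, e.g. 'total 8'
        row = list(m.groups())
        if row[7] in ('.', '..'):
            continue                  # never recurse into self or parent
        row[0] = (row[0] == 'd')      # True marks a directory
        rows.append(row)
    if not rows:
        return {}
    # Transpose rows into one list per field, as LocalFTP expects.
    return {field: list(col) for field, col in zip(_FIELDS, zip(*rows))}
```

It would be installed with `localftp.dirparser = parse_unix_skipdots`, regardless of where (or whether) the server lists '.' and '..'.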
Reference:
- RFC 959 (File Transfer Protocol)
- ftplib documentation
How about an os.walk()-like interface?
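One possible shape for that, sketched under the assumption that a LocalFTP-style parsedir callable is available: a generator yielding (dirpath, dirnames, filenames) tuples top-down, as os.walk() does. ftpwalk is a hypothetical name, not part of ftplib or of the recipe's class; it would be called as ftpwalk(localftp.parsedir, '/data').

```python
def ftpwalk(parsedir, top='', sep='/'):
    """Yield (dirpath, dirnames, filenames) top-down, like os.walk().

    parsedir: any callable with the LocalFTP.parsedir contract, i.e. it
    takes a path and returns a dict whose 'trycwds' and 'names' lists
    line up, True marking a directory. (Hypothetical helper.)
    """
    listing = parsedir(top)
    dirnames, filenames = [], []
    for isdir, name in zip(listing.get('trycwds', []),
                           listing.get('names', [])):
        if isdir:
            dirnames.append(name)
        else:
            filenames.append(name)
    yield top, dirnames, filenames
    # As with os.walk(topdown=True), the caller may prune dirnames in
    # place before resuming, to stop the walk from descending into them.
    for name in dirnames:
        # Same joining rule as LocalFTP._rnlst: '' or '/' plus 'x' gives '/x'.
        child = sep.join([top.rstrip(sep), name])
        yield from ftpwalk(parsedir, child, sep)
```

Because it is a generator, each directory is listed only when the caller asks for the next step, so pruning dirnames also saves the corresponding LIST commands.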
Some help needed: when I execute the next line, I get an error.
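Add the line "import ftplib" at the top of the module ("ftplib" is a Python standard library module). The code also uses the standard library modules re and warnings, so "import re" and "import warnings" are needed as well. Sorry for the confusion; I'm not sure how I managed to post this code without those lines.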
Please note the utility functions (parse_unix, parse_windows) for parsing ftp server directory listings are not robust. For example, the parse_unix function does not handle filenames with spaces. If you need help re-writing one of these functions to work better for you, please let me know.
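ftplib.FTP in Python 3.3 and later has a method called mlsd(), which solves this problem (i.e., brittle parsing of ftplib.FTP.dir() directory listings) more robustly: it sends the MLSD command defined in RFC 3659 and yields each entry's name together with a dict of machine-readable facts such as 'type' and 'size'. The server must support MLSD for this to work.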
As an alternative, you may want to check out ftp client utilities (e.g., ftputil) on PyPI (http://pypi.python.org/pypi). You'll probably find a better solution there.