
This recipe provides recursive nlst() behavior on top of a normal ftplib.FTP instance. The rnlst() method provided by the LocalFTP class returns a list of filenames under the path passed in as an argument. (One use for this list might be mirroring an ftp site. However, the Python distribution contains a script called ftpmirror.py; use that instead.)

Because rnlst() issues one dir() request for every directory it visits, it is best suited for use on fast local connections, or on relatively small remote ftp directories.

Please see code comments for additional information.

Python, 202 lines
import ftplib
import re
import warnings


class LocalFTP(object):
    """
    Class adding recursive nlst() behavior to ftplib.FTP instance. The
    ftplib.FTP instance is available through the connection attribute, and
    is exposed through __getattr__.

    The behavior added by this class (recursive directory listing) is most
    appropriate for ftp connections on a local network over a fast connection, 
    or for small directories on remote ftp servers.

    The class relies on an externally defined callable, which can parse the
    lines returned by the ftplib.FTP.dir() method. This callable should be 
    bound to the 'dirparser' attribute on this object. The callable 'dirparser' 
    attribute can be initialized by passing it in to the constructor using the
    keyword argument 'dirparser', or by attaching the callable to the
    'dirparser' attribute after instantiation. 

    The callable should return parsed results as a dict. This class makes some
    assumptions about the contents of the dict returned by the user-defined 
    dirparser callable:

    -- the key 'trycwds' holds a list of booleans 

    -- the key 'names' holds a list of filenames in the dir() listing.
    
    -- The two lists should be the same length. A True value in the list
       referred to by the 'trycwds' key indicates the corresponding value
       in the list referred to by the 'names' key is a directory.

    -- The key names are based on fields in the ftpparse structure, from the   
       ftpparse module/C library.     
   
    -- Other keys can be included in the dict, but they are not used by the 
       rnlst() method.
      
    -- The callable should return an empty dict() if there is nothing to return
       from the dir listing.
       
    This module provides two parsers which seem to work ok, but it should
    be easy to create others if these don't work for some reason:

    -- parse_windows parses the dir listing from Windows ftp servers.
    -- parse_unix parses the dir listing from UNIX ftp servers.
    
    """
    
    def __init__(self, host='', user='', passwd='', acct='', 
                 dirparser=None):
        self.connection = ftplib.FTP(host, user, passwd, acct)
        self.remotepathsep = '/'
        self.dirparser = dirparser
        

    def __getattr__(self, name):
        """
        Delegate most requests to the underlying FTP object. 
        """

        return getattr(self.connection, name)


    def _dir(self,path):
        """
        Call dir() on path, and use callback to accumulate
        returned lines. Return list of lines.
        """

        dirlist = []
        try:
            self.connection.dir(path, dirlist.append)
        except ftplib.error_perm:
            warnings.warn('Access denied for path %s'%path)
        return dirlist


    def parsedir(self, path=''):
        """
        Method to parse the lines returned by the ftplib.FTP.dir(),
        when called on supplied path. Uses callable dirparser
        attribute. 
        """
        
        if self.dirparser is None:
            msg = ('Must set dirparser attribute to a callable '
                   'before calling this method')
            raise TypeError(msg)

        dirlines = self._dir(path)
        dirdict = self.dirparser(dirlines)
        return dirdict
        
        
    def _cleanpath(self, path):
        """
        Clean up path - remove repeated and trailing separators. 
        """
        
        slashes = self.remotepathsep*2
        while slashes in path:
            path = path.replace(slashes,self.remotepathsep)
            
        if path.endswith(self.remotepathsep):
            path = path[:-1]
            
        return path
        
        
    def _rnlst(self, path, filelist):
        """
        Recursively accumulate filelist starting at
        path, on the server accessed through this object's
        ftp connection.
        """
        
        path = self._cleanpath(path)
        dirdict = self.parsedir(path)
        
        trycwds = dirdict.get('trycwds', [])
        names = dirdict.get('names', [])
        
        for trycwd, name in zip(trycwds, names):           
            if trycwd: # name is a directory
                self._rnlst(self.remotepathsep.join([path, name]), filelist)
            else: 
                filelist.append(self.remotepathsep.join([path, name]))
                
        return filelist

                
    def rnlst(self, path=''):
        """
        Recursive nlst(). Return a list of filenames under path.
        """
      
        filelist = []
        return self._rnlst(path,filelist)
        

# Naive ftplib.FTP.dir() parsing functions, which may or may not work. (These
# happen to work for servers I connect to.) Create your own functions (perhaps
# using ftpparse) for more robust solutions.
       
def parse_windows(dirlines):
    """
    Parse the lines returned by ftplib.FTP.dir(), when called
    on a Windows ftp server. May not work for all servers, but it
    works for the ones I need to connect to.
    """

    typemap = {'<DIR>': True}
    
    if not dirlines:
        return dict()
    
    maxlen = max(len(line) for line in dirlines)
    columns = [slice(0, 9), slice(9, 17), slice(17, 29), slice(29, 38), 
               slice(38, maxlen+1)]

    fields = 'dates times trycwds sizes names'.split()

    values = []
    for line in dirlines:
        vals = [line[slc].strip() for slc in columns]
        vals[2] = typemap.get(vals[2], False)
        values.append(vals)
        
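    # Transpose rows into one tuple per field. list() is needed because
    # zip() returns an iterator on Python 3, and len() is called below.
    lists = list(zip(*values))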
    
    assert len(lists) == len(fields)

    return dict(zip(fields, lists))


def parse_unix(dirlines,startindex=1):
    """
    Parse the lines returned by ftplib.FTP.dir(), when called
    on a UNIX ftp server. May not work for all servers, but it
    works for the ones I need to connect to.
    """

    dirlines = dirlines[startindex:]
    if not dirlines:
        return dict()
   
    # Raw strings keep the \s escapes from being treated as (invalid)
    # string escape sequences.
    pattern = re.compile(r'(.)(.*?)\s+(.*?)\s+(.*?)\s+(.*?)\s+'
                         r'(.*?)\s+(.*?\s+.*?\s+.*?)\s+(.*)')

    fields = 'trycwds tryretrs inodes users groups sizes dates names'.split()

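    # Keep only the lines that match the listing pattern.
    matches = [line for line in dirlines if pattern.search(line)]
    if not matches:
        return dict()

    # Transpose the per-line field tuples into one entry per field. list()
    # is needed because zip() returns an iterator on Python 3.
    getfields = lambda s: pattern.findall(s)[0]
    lists = list(zip(*map(getfields, matches)))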
    
    # change the '-','d','l' values to booleans, where names referring
    # to directories get True, and others get False.
    lists[0] = ['d' == s for s in lists[0]]
    
    assert len(lists) == len(fields)
    
    return dict(zip(fields, lists))
    

The old recipe, if it worked at all, was trying to do too much at once (traversing the remote file system, retrieving the files, parsing the dir() listing, etc.).

This recipe just enables the retrieval of a list containing files in all directories under a path on an ftp server. This enables the functionality the old recipe was trying to provide (mirroring an ftp directory), but is not confined to that use only. Still, ftpmirror.py is better suited for this task.

Notable omissions from the class include caching and retrieval methods for the information parsed from the ftp server directory listings. This should be easy to add, if needed.
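
As a rough illustration of the caching mentioned above, the sketch below memoizes parsed listings by path. The helper name make_cached_parsedir is hypothetical and not part of the recipe; it assumes only that the wrapped callable takes a path and returns the parsed dict, as LocalFTP.parsedir does.

```python
def make_cached_parsedir(parsedir):
    """Wrap a parsedir-style callable so each path is listed only once.

    Hypothetical helper, not part of the recipe. 'parsedir' is assumed
    to take a path string and return a dict of parsed listing fields.
    """
    cache = {}

    def cached_parsedir(path=''):
        # Only ask the server for paths that have not been seen yet.
        if path not in cache:
            cache[path] = parsedir(path)
        return cache[path]

    # Expose the cache so callers can inspect or clear it.
    cached_parsedir.cache = cache
    return cached_parsedir
```

With the class above, this might be attached as localftp.parsedir = make_cached_parsedir(localftp.parsedir) before calling rnlst(). The cache is never invalidated, so it only suits listings that do not change during the session.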

Interactive usage might look something like the following, if the above code were in a module called ftp.py:

In [1]: import getpass

In [2]: import ftp

In [3]: passwd = getpass.getpass()
Password:

In [4]: localftp = ftp.LocalFTP('192.168.0.100','username',passwd)

In [5]: localftp.dirparser = ftp.parse_unix

In [6]: filelist = localftp.rnlst('/data')

[now, do stuff with the filelist]

NOTE:

The parse_unix function accepts a keyword argument startindex, which can be used to skip '.' and '..' entries in the ftplib.FTP.dir() listing. It is important to skip these directories; otherwise rnlst() will traverse into those directories.

In the example above, I assumed the server does not list the '.' and '..' directories. If the server does list those directories, and these directories are always at the top of the listing, one way to avoid them follows:

localftp.dirparser = lambda lines,startindex=3:ftp.parse_unix(lines,startindex)

Another way would be to write a better dirparser (perhaps using ftpparse), which skips these directories in some other way.
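
One possible way to skip these entries regardless of where they appear in the listing is to wrap an existing parser and filter its result. The sketch below is only an illustration: skip_dot_entries is a hypothetical name, and it assumes every value in the returned dict is a sequence with the same length as the 'names' sequence, which is the contract described in the LocalFTP docstring.

```python
def skip_dot_entries(parser):
    """Wrap a dirparser so '.' and '..' are dropped from its result.

    Hypothetical helper, not part of the recipe. Assumes every value in
    the parsed dict is a sequence aligned with the 'names' sequence.
    """
    def wrapped(dirlines):
        parsed = parser(dirlines)
        names = parsed.get('names', [])
        if not names:
            return parsed
        # Indexes of the entries to keep, in listing order.
        keep = [i for i, name in enumerate(names) if name not in ('.', '..')]
        # Apply the same selection to every field so the lists stay aligned.
        return {key: [values[i] for i in keep]
                for key, values in parsed.items()}
    return wrapped
```

Usage might then look like localftp.dirparser = skip_dot_entries(ftp.parse_unix), with startindex left at its default.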

Reference:

- RFC 959
- ftplib documentation

3 comments

Jochen Wersdörfer 13 years, 10 months ago

how about an os.walk()-like interface?

def _walk(self, top, topdown=True, onerror=None):

    top = self._cleanpath(top)
    dirdict = self.parsedir(top)

    trycwds = dirdict.get('trycwds', [])
    names = dirdict.get('names', [])

    dirs, nondirs = [], []
    for is_dir, name in zip(trycwds, names):
        if is_dir:
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        path = self.remotepathsep.join([top, name])
        for x in self._walk(path, topdown, onerror):
            yield x
    if not topdown:
        yield top, dirs, nondirs

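def rnlst(self, path=''):
    """
    Recursive nlst(). Return a list of filenames under path.
    """
    # Flatten the (top, dirs, nondirs) tuples yielded by _walk() into the
    # list of full file paths that rnlst() is documented to return.
    filelist = []
    for top, dirs, nondirs in self._walk(path):
        for name in nondirs:
            filelist.append(self.remotepathsep.join([top, name]))
    return filelist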

Alex Martinez Corria 10 years, 6 months ago

Some help needed, when I execute the next line

  localftp = ftp.LocalFTP('192.168.0.100','username',passwd)

I get an error

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "ftp.py", line 50, in __init__
    self.connection = ftplib.FTP(host, user, passwd, acct)
 NameError: global name 'ftplib' is not defined

Rich Krauter (author) 10 years, 6 months ago

Add the line "import ftplib" at the top of the module. "ftplib" is a python standard library module. Sorry for the confusion; I'm not sure how I managed to post this code without that line.

Please note the utility functions (parse_unix, parse_windows) for parsing ftp server directory listings are not robust. For example, the parse_unix function does not handle filenames with spaces. If you need help re-writing one of these functions to work better for you, please let me know.
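
As one hedged illustration of such a rewrite, the sketch below splits each line into at most nine whitespace-separated columns, so that anything after the eighth column, spaces included, is kept as the filename. The name parse_unix_split is hypothetical, and the sketch assumes the common "ls -l" layout (permissions, link count, user, group, size, and a three-column date); servers that use a different layout would need a different parser.

```python
def parse_unix_split(dirlines):
    """Parse 'ls -l' style lines, keeping spaces inside filenames.

    Hypothetical alternative to parse_unix. Assumes nine columns per
    entry: permissions, links, user, group, size, three date columns,
    and the name. Shorter lines (such as 'total 8') are skipped.
    """
    fields = 'trycwds tryretrs inodes users groups sizes dates names'.split()

    rows = []
    for line in dirlines:
        # maxsplit=8 leaves the rest of the line (the name) in one piece.
        parts = line.split(None, 8)
        if len(parts) < 9:
            continue
        perms, links, user, group, size = parts[:5]
        date = ' '.join(parts[5:8])
        name = parts[8]
        # True marks a directory, as rnlst() expects for 'trycwds'.
        rows.append((perms.startswith('d'), perms[1:], links, user,
                     group, size, date, name))

    if not rows:
        return dict()

    # Transpose the rows into one list per field.
    return dict(zip(fields, (list(column) for column in zip(*rows))))
```

Note that this variant does not use startindex, and it cannot preserve leading spaces in a filename, because split() discards the whitespace between columns.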

It looks like ftplib.FTP in python 3.3 will have a method called mlsd() which would help solve this problem (i.e., brittle methods for parsing ftplib.FTP.dir() directory listings) in a more robust manner.

As an alternative, you may want to check out ftp client utilities (e.g., ftputil) on PyPI (http://pypi.python.org/pypi). You'll probably find a better solution there.

Created by Rich Krauter on Mon, 22 Mar 2004 (PSF)