ActiveState Code

Recipe 146306: Http client to POST using multipart/form-data


A scripted web client that will post data to a site as if from a form using ENCTYPE="multipart/form-data". This is typically used to upload files, but also gets around a server's (e.g. ASP's) limitation on the amount of data that can be accepted via a standard POST (application/x-www-form-urlencoded).

Python
import httplib, mimetypes

def post_multipart(host, selector, fields, files):
    """
    Post fields and files to an http host as multipart/form-data.
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files
    Return the server's response page.
    """
    content_type, body = encode_multipart_formdata(fields, files)
    h = httplib.HTTP(host)
    h.putrequest('POST', selector)
    h.putheader('content-type', content_type)
    h.putheader('content-length', str(len(body)))
    h.endheaders()
    h.send(body)
    errcode, errmsg, headers = h.getreply()
    return h.file.read()

def encode_multipart_formdata(fields, files):
    """
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files
    Return (content_type, body) ready for httplib.HTTP instance
    """
    BOUNDARY = '----------ThIs_Is_tHe_bouNdaRY_$'
    CRLF = '\r\n'
    L = []
    for (key, value) in fields:
        L.append('--' + BOUNDARY)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    for (key, filename, value) in files:
        L.append('--' + BOUNDARY)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        L.append('Content-Type: %s' % get_content_type(filename))
        L.append('')
        L.append(value)
    L.append('--' + BOUNDARY + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
    return content_type, body

def get_content_type(filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

Discussion

At Python 9, Moshe Zadka showed how to create the multipart-mime data using MimeWriter ( http://www.python9.org/p9-zadka.ppt ). His recipe worked just fine for me when I was talking to a Zope server, but triggered a cryptic error message when pointed at an ASP server that was using the COM file upload component from http://persits.com .

The main problem with MimeWriter is that it does not use '\r\n' for newlines, and the ASP server insisted on this. Other bits of persnicketiness on the part of the ASP included rejection of a mime message that had content-type headers in its regular form fields, and insistence on at least five dashes at the front of the boundary marker.
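
One way to respect the boundary requirement described above without hard-coding a marker is to generate a fresh one per request. The following is a minimal Python 3 sketch, not part of the recipe: `make_boundary` is a hypothetical helper name, the use of `uuid` is an assumption, and the five-dash prefix reflects only the behaviour reported here for that particular ASP component.

```python
import uuid

def make_boundary(payloads, min_dashes=5):
    """Return a boundary that starts with at least `min_dashes` dashes and
    does not occur inside any of the given payloads (bytes objects)."""
    while True:
        boundary = '-' * min_dashes + uuid.uuid4().hex
        marker = boundary.encode('ascii')
        # Regenerate in the unlikely case the marker appears in the data.
        if not any(marker in payload for payload in payloads):
            return boundary

boundary = make_boundary([b'first value', b'\x89PNG\r\n binary data'])
```

Checking the payloads matters because a boundary that also appears inside a field value would make the server split the message in the wrong place.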

The function encode_multipart_formdata() shown here takes a more direct approach to creating the mime data, and fairly closely mimics the data sent by Internet Explorer 5.5.
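
For readers on Python 3, where `httplib` has become `http.client`, the same direct approach might look roughly like the sketch below. It is an adaptation under stated assumptions rather than the author's code: the body is built as bytes, text values are encoded as UTF-8 (the `_to_bytes` helper is hypothetical), and `HTTPConnection` is used in place of the older `HTTP` class.

```python
import http.client
import mimetypes

BOUNDARY = '----------ThIs_Is_tHe_bouNdaRY_$'
CRLF = b'\r\n'

def _to_bytes(value):
    """Encode text as UTF-8 (an assumption); pass bytes through unchanged."""
    return value if isinstance(value, bytes) else str(value).encode('utf-8')

def encode_multipart_formdata(fields, files, boundary=BOUNDARY):
    """fields: sequence of (name, value); files: sequence of
    (name, filename, value). Returns (content_type, body) with body as bytes."""
    delimiter = b'--' + boundary.encode('ascii')
    lines = []
    for key, value in fields:
        # Regular fields carry no Content-Type header, as in the recipe.
        lines += [
            delimiter,
            _to_bytes('Content-Disposition: form-data; name="%s"' % key),
            b'',
            _to_bytes(value),
        ]
    for key, filename, value in files:
        guessed = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        lines += [
            delimiter,
            _to_bytes('Content-Disposition: form-data; name="%s"; filename="%s"'
                      % (key, filename)),
            _to_bytes('Content-Type: %s' % guessed),
            b'',
            _to_bytes(value),
        ]
    lines += [delimiter + b'--', b'']
    return 'multipart/form-data; boundary=%s' % boundary, CRLF.join(lines)

def post_multipart(host, selector, fields, files):
    """POST over plain HTTP and return (status, reason, response body)."""
    content_type, body = encode_multipart_formdata(fields, files)
    conn = http.client.HTTPConnection(host)
    try:
        # http.client fills in Content-Length for a bytes body.
        conn.request('POST', selector, body, {'Content-Type': content_type})
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()
```

Because every part is bytes before the join, binary file contents pass through unchanged.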

Comments

  1. At 5:27 a.m. on 5 jan 2004, Anonymous said:

    Python module. I made a wrapper around urllib2.urlopen() to support file uploading:

    http://fabien.seisen.org/python/

    It uses boundary creation from mimetools and doesn't read the whole file in memory

    import urllib2_file
    import urllib2
    
    data = {'name': 'value',
            'file':  open('/etc/services')
           }
    urllib2.urlopen('http://site.com/script_upload.php', data)
    
  2. At 10:14 a.m. on 21 apr 2004, Chris Green said:

    Using URLs.

    import urlparse
    
    def posturl(url, fields, files):
        urlparts = urlparse.urlsplit(url)
        return post_multipart(urlparts[1], urlparts[2], fields,files)
    

    This allows you to specify the form as a url and not worry about host and selector.

  3. At 2:55 a.m. on 6 sep 2004, chris hoke said:

    Update to use HTTPConnection. A simple update that uses HTTPConnection instead of HTTP, which simplifies the recipe and switches it to HTTP 1.1.

    Replace only the first function of the recipe:

    def post_multipart(host, selector, fields, files):
        content_type, body = encode_multipart_formdata(fields, files)
        h = httplib.HTTPConnection(host)
        headers = {
            'User-Agent': 'INSERT USERAGENTNAME',
            'Content-Type': content_type
            }
        h.request('POST', selector, body, headers)
        res = h.getresponse()
        return res.status, res.reason, res.read()
    

    Should work as the original version.

  4. At 2:58 a.m. on 6 sep 2004, chris hoke said:

    Extended return. The new version additionally returns the status and reason. To get exactly the same return value as the original version, replace

    res.status, res.reason, res.read()
    

    with

    res.read()
    
  5. At 6:33 a.m. on 17 mar 2006, James Jurack said:

    With cookie support on Python 2.4. Here's a version of your code that supports cookies with python 2.4's urllib2 and cookielib.

    import httplib, mimetypes, mimetools, urllib2, cookielib
    
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    
    def post_multipart(host, selector, fields, files):
        """
        Post fields and files to an http host as multipart/form-data.
        fields is a sequence of (name, value) elements for regular form fields.
        files is a sequence of (name, filename, value) elements for data to be uploaded as files
        Return the server's response page.
        """
        content_type, body = encode_multipart_formdata(fields, files)
        headers = {'Content-Type': content_type,
                   'Content-Length': str(len(body))}
        r = urllib2.Request("http://%s%s" % (host, selector), body, headers)
        return urllib2.urlopen(r).read()
    
    def encode_multipart_formdata(fields, files):
        """
        fields is a sequence of (name, value) elements for regular form fields.
        files is a sequence of (name, filename, value) elements for data to be uploaded as files
        Return (content_type, body) ready for httplib.HTTP instance
        """
        BOUNDARY = mimetools.choose_boundary()
        CRLF = '\r\n'
        L = []
        for (key, value) in fields:
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"' % key)
            L.append('')
            L.append(value)
        for (key, filename, value) in files:
            L.append('--' + BOUNDARY)
            L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
            L.append('Content-Type: %s' % get_content_type(filename))
            L.append('')
            L.append(value)
        L.append('--' + BOUNDARY + '--')
        L.append('')
        body = CRLF.join(L)
        content_type = 'multipart/form-data; boundary=%s' % BOUNDARY
        return content_type, body
    
    def get_content_type(filename):
        return mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    
  6. At 3:44 p.m. on 6 apr 2006, Will Holcomb said:

    A less intrusive version using the urllib2 hierarchy. Here is the same basic idea, but using a class that plugs into urllib2's BaseHandler hierarchy. It has the advantage of leaving all the existing urllib2 functionality intact.

    Example usage:

    import MultipartPostHandler, urllib2, cookielib
    
    cookies = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),
                                    MultipartPostHandler.MultipartPostHandler)
    params = { "username" : "bob", "password" : "riviera",
               "file" : open("filename", "rb") }
    opener.open("http://wwww.bobsite.com/upload/", params)
    

    The code is at: http://odin.himinbi.org/MultipartPostHandler.py

  7. At 12:41 p.m. on 9 aug 2007, Brian Schneider said:

    MultipartPostHandler didn't work for unicode files. I fixed it by reading in via StringIO class.

    fix posted here:

    http://peerit.blogspot.com/2007/07/multipartposthandler-doesnt-work-for.html

  8. At 2:32 a.m. on 11 oct 2007, Thomas Guettler said:

    Script which can be called from the command line. This script is based on this recipe and can be called from the shell:

    http://fabien.seisen.org/python/urllib2_multipart.html

  9. At 3:12 p.m. on 1 feb 2008, Tim Keating said:

    Do NOT use this script verbatim. It relies on a deprecated backward-compatibility module in httplib that didn't work at all for me. When I switched to using the more modern HTTPConnection version (described by another commenter above) it worked correctly first time out of the box, so I strongly encourage you to use that instead.

  10. At 5:57 a.m. on 3 jul 2008, Lee June said:

    How to deal with this situation? I have a question about my case. I cannot find the solution; can anybody help me out? Thanks.

    Even for the logout action, I have to supply the 'data' variable as follows:

    [code works]
    import urllib, urllib2
    import MultipartPostHandler, cookielib
    cookies = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),
                                  MultipartPostHandler.MultipartPostHandler)
    urllib2.install_opener(opener)
    data = urllib.urlencode({'usr':'myname','pwd':'mypwd',})
    request = urllib2.Request('http://host/cgi-bin/bbslogin', data)
    data=urllib2.urlopen(request).read()
    utmpnum, utmpkey, utmpuserid=ReadFromWeb(data)
    data = urllib.urlencode({'utmpnum':utmpnum,'utmpkey':utmpkey,'utmpuserid':utmpuserid})
    request = urllib2.Request('http://host/cgi-bin/bbslogout', data)
    [/code works]
    

    If I do not supply 'data', I was told "you are not logged in"

    So, in this case, how can I supply all of utmpnum, utmpkey, utmpuserid and the attached file?

    I have this code, but the response still says "you are not logged in"!

    [code does not work]
    (this is previous log in code)
    
    bbs_att_url='http://host/cgi-bin/bbsdoupload'
    
    data = {
    'utmpnum':utmpnum,
    'utmpkey':utmpkey,
    'utmpuserid':utmpuserid,
    'upfile' : open("myfile.ico", "rb") ,
    }
    data=urllib.urlencode(data)
    request = urllib2.Request(bbs_att_url, data)
    fd=urllib2.urlopen(request)
    data=fd.read()
    print data            #"you are not logged in" can be found
    [/code does not work]
    
  11. At 4:52 a.m. on 14 aug 2008, Robert Lujo said:

    Take a look at recipe 576422: Python HTTP POST binary file upload with pycurl (http://code.activestate.com/recipes/576422/)

  12. At 11:52 a.m. on 24 nov 2008, Florian Steinel said:

    See also Python Bug 3244, target => Python 2.7

  13. At 12:21 p.m. on 17 aug 2009, İlkin Balkanay said:

    I am trying to upload an image to Flickr using the Flickr upload API. The recipe raises UnicodeDecodeError when trying to join '\r\n' and file.read(). How can I construct the body of the post data if I want to post an image (binary file)?

  14. At 10:57 p.m. on 20 aug 2009, İlkin Balkanay said:

    body = CRLF.join([element.decode('string_escape') for element in L]) worked for me.
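
The error reported in the last two comments most likely comes from joining text and raw file bytes in a single list. As a minimal Python 3 sketch (where mixing the two raises TypeError rather than UnicodeDecodeError), each part can be encoded to bytes before joining with a bytes CRLF. `join_parts` is a hypothetical helper name, the UTF-8 encoding of the text parts is an assumption, and `image_bytes` is a stand-in for real file contents.

```python
def join_parts(parts):
    """Join multipart lines with CRLF, encoding any str parts to bytes first."""
    encoded = [p if isinstance(p, bytes) else p.encode('utf-8') for p in parts]
    return b'\r\n'.join(encoded)

# Stand-in for open(path, 'rb').read(); the bytes are not valid ASCII or UTF-8.
image_bytes = b'\x89PNG\r\n\x1a\n\xff\xd8'

body = join_parts([
    '--boundary',
    'Content-Type: image/png',
    '',
    image_bytes,
    '--boundary--',
    '',
])
```

The file contents are never decoded, so arbitrary binary data survives the join untouched.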
