Popular Python recipes tagged "meta:requires=urllib"
http://code.activestate.com/recipes/langs/python/tags/meta:requires=urllib/
2014-07-05T18:47:47-07:00
ActiveState Code Recipes
Composing a POSTable HTTP request with multipart/form-data Content-Type to simulate a form/file upload. (Python)
2014-03-08T17:34:38-08:00István Pásztorhttp://code.activestate.com/recipes/users/4189380/http://code.activestate.com/recipes/578846-composing-a-postable-http-request-with-multipartfo/
<p style="color: grey">
Python
recipe 578846
by <a href="/recipes/users/4189380/">István Pásztor</a>
(<a href="/recipes/tags/field/">field</a>, <a href="/recipes/tags/file/">file</a>, <a href="/recipes/tags/form/">form</a>, <a href="/recipes/tags/html/">html</a>, <a href="/recipes/tags/httpclient/">httpclient</a>, <a href="/recipes/tags/mime/">mime</a>, <a href="/recipes/tags/multipart/">multipart</a>, <a href="/recipes/tags/post/">post</a>, <a href="/recipes/tags/upload/">upload</a>, <a href="/recipes/tags/web/">web</a>).
Revision 5.
</p>
<p>This code is useful if you are using an HTTP client and want to simulate the request a browser sends when it submits a form containing several input fields (including file upload fields). I've used this with Python 2.x.</p>
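<p>As a rough illustration of what such a request body looks like, here is a minimal sketch. It is not the recipe's code: it targets Python 3 rather than 2.x, and the function name <code>encode_multipart</code> and its argument layout are assumptions made for this example.</p>

```python
import uuid


def encode_multipart(fields, files):
    """Return (content_type, body) for a multipart/form-data POST.

    fields: dict mapping field name -> text value
    files:  dict mapping field name -> (filename, bytes content, MIME type)
    Both shapes are assumptions for this sketch.
    """
    boundary = uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    parts = []
    # Plain form fields: one part each, value after a blank line.
    for name, value in fields.items():
        parts += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # File fields additionally carry a filename and a Content-Type.
    for name, (filename, content, mime_type) in files.items():
        parts += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: " + mime_type).encode("ascii"),
            b"",
            content,
        ]
    # Closing delimiter, followed by a final CRLF.
    parts += [delimiter + b"--", b""]
    body = b"\r\n".join(parts)
    return "multipart/form-data; boundary=" + boundary, body
```

<p>The returned content type goes into the <code>Content-Type</code> header and the body is sent as the POST data with whatever HTTP client you use.</p>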
Recursive Multimedia (audio, video) M3U Playlist Generator (Python)
2014-02-08T01:03:36-08:00Mano Bastardohttp://code.activestate.com/recipes/users/4182040/http://code.activestate.com/recipes/578771-recursive-multimedia-audio-video-m3u-playlist-gene/
<p style="color: grey">
Python
recipe 578771
by <a href="/recipes/users/4182040/">Mano Bastardo</a>
(<a href="/recipes/tags/audio/">audio</a>, <a href="/recipes/tags/ffmpeg/">ffmpeg</a>, <a href="/recipes/tags/generate/">generate</a>, <a href="/recipes/tags/m3u/">m3u</a>, <a href="/recipes/tags/mulitmedia/">mulitmedia</a>, <a href="/recipes/tags/os/">os</a>, <a href="/recipes/tags/playlist/">playlist</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/system/">system</a>, <a href="/recipes/tags/video/">video</a>).
Revision 9.
</p>
<p>Generate an M3U playlist by searching recursively
for multimedia files (video or audio) in the given
directory.
For audio files, information from ID3 tags is extracted
when <a href="http://en.wikipedia.org/wiki/FFmpeg">FFmpeg</a> is available.</p>
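<p>The recursive search can be sketched as follows. This is only an outline under assumptions: the extension list and the name <code>build_m3u</code> are made up for the example, and the ID3/FFmpeg step is left out entirely.</p>

```python
import os

# Assumed set of extensions; the recipe may recognise a different list.
MEDIA_EXTENSIONS = {".mp3", ".ogg", ".flac", ".mp4", ".avi", ".mkv"}


def build_m3u(root):
    """Return the text of an M3U playlist for all media files under root."""
    lines = ["#EXTM3U"]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            extension = os.path.splitext(name)[1].lower()
            if extension in MEDIA_EXTENSIONS:
                lines.append(os.path.join(dirpath, name))
    return "\n".join(lines) + "\n"
```

<p>Writing the returned string to a file ending in <code>.m3u</code> gives a playlist that most players accept.</p>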
Music Downloader (Python)
2013-05-25T06:52:51-07:00Christian Careagahttp://code.activestate.com/recipes/users/4186639/http://code.activestate.com/recipes/578530-music-downloader/
<p style="color: grey">
Python
recipe 578530
by <a href="/recipes/users/4186639/">Christian Careaga</a>
(<a href="/recipes/tags/download/">download</a>, <a href="/recipes/tags/downloader/">downloader</a>, <a href="/recipes/tags/music/">music</a>, <a href="/recipes/tags/program/">program</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/python_scripts/">python_scripts</a>, <a href="/recipes/tags/selenium/">selenium</a>, <a href="/recipes/tags/urllib/">urllib</a>, <a href="/recipes/tags/urllib2/">urllib2</a>).
</p>
<p>A Python program I wrote that downloads music from the web.</p>
download the Activestate cook book recipe (Python)
2013-01-29T16:24:22-08:00lwz7512http://code.activestate.com/recipes/users/4185066/http://code.activestate.com/recipes/578439-download-the-activestate-cook-book-recipe/
<p style="color: grey">
Python
recipe 578439
by <a href="/recipes/users/4185066/">lwz7512</a>
.
</p>
<p>A small effort to store the Python recipes locally.</p>
<p>Similar efforts by other people:
543267-i-will-download-all-of-the-recipes-from-the-python
535162-i-download-all-the-python-cookbook-recipes</p>
download the Activestate cook book recipe (Python)
2012-07-04T12:14:22-07:00Sudeep AMhttp://code.activestate.com/recipes/users/4182702/http://code.activestate.com/recipes/578193-download-the-activestate-cook-book-recipe/
<p style="color: grey">
Python
recipe 578193
by <a href="/recipes/users/4182702/">Sudeep AM</a>
.
Revision 2.
</p>
<p>A small effort to store the Python recipes locally.</p>
<p>Similar efforts by other people:
543267-i-will-download-all-of-the-recipes-from-the-python
535162-i-download-all-the-python-cookbook-recipes</p>
KYSU (Keep Your Stuff Updated) (Python)
2012-07-29T01:12:01-07:00Gamoholichttp://code.activestate.com/recipes/users/4182585/http://code.activestate.com/recipes/578174-kysu-keep-your-stuff-updated/
<p style="color: grey">
Python
recipe 578174
by <a href="/recipes/users/4182585/">Gamoholic</a>
(<a href="/recipes/tags/automatic/">automatic</a>, <a href="/recipes/tags/exe/">exe</a>, <a href="/recipes/tags/installer/">installer</a>, <a href="/recipes/tags/updated/">updated</a>, <a href="/recipes/tags/updater/">updater</a>).
Revision 5.
</p>
<p>I wrote this program to keep my Samba share up to date. The Samba share contains installers for the programs that I use to fix and update computers (ccleaner, mbam, java, etc.). It can also be used for ISOs such as Clonezilla.</p>
<p>For the latest version of the script and the accompanying files please go to GitHub.</p>
<p><a href="https://github.com/Gamoholic/KYSU" rel="nofollow">https://github.com/Gamoholic/KYSU</a></p>
<p>Important: the only OS I have tested this on is Ubuntu. I will test Windows soon. If it doesn't work on any other OS, please let me know. Also, I have no idea how well this works with Python 3. I may test that eventually.</p>
Geocoding Lists via Google Maps (Python)
2012-05-11T05:06:27-07:00Mano Bastardohttp://code.activestate.com/recipes/users/4182040/http://code.activestate.com/recipes/578126-geocoding-lists-via-google-maps/
<p style="color: grey">
Python
recipe 578126
by <a href="/recipes/users/4182040/">Mano Bastardo</a>
(<a href="/recipes/tags/batch/">batch</a>, <a href="/recipes/tags/coordinates/">coordinates</a>, <a href="/recipes/tags/geocode/">geocode</a>, <a href="/recipes/tags/geocoding/">geocoding</a>, <a href="/recipes/tags/google/">google</a>, <a href="/recipes/tags/google_maps/">google_maps</a>, <a href="/recipes/tags/lat/">lat</a>, <a href="/recipes/tags/latitude/">latitude</a>, <a href="/recipes/tags/list/">list</a>, <a href="/recipes/tags/list_comprehension/">list_comprehension</a>, <a href="/recipes/tags/lng/">lng</a>, <a href="/recipes/tags/longitude/">longitude</a>, <a href="/recipes/tags/map/">map</a>, <a href="/recipes/tags/web/">web</a>).
Revision 2.
</p>
<p>A simple script written as an experiment in geocoding addresses in a database. A list of addresses in the form of "100 Any Street, Anytown, CA, 10010" is passed to a Google Maps URL, and the latitude/longitude coordinates are extracted from the returned XML.</p>
<p>This script does not use an XML parser; it relies on simple string searches instead.</p>
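<p>A minimal sketch of that string-search approach is shown below. The helper names and the sample response are hypothetical; in particular, the <code>lat</code> and <code>lng</code> element names are an assumption about the returned XML, not something taken from the recipe.</p>

```python
def extract_between(text, start_tag, end_tag):
    """Return the text between start_tag and end_tag, or None if absent."""
    start = text.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)
    end = text.find(end_tag, start)
    if end == -1:
        return None
    return text[start:end].strip()


def extract_coordinates(xml_text):
    """Return (latitude, longitude) as floats, or None if either is missing."""
    lat = extract_between(xml_text, "<lat>", "</lat>")
    lng = extract_between(xml_text, "<lng>", "</lng>")
    if lat is None or lng is None:
        return None
    return float(lat), float(lng)


# Hypothetical response fragment, used only to exercise the helpers.
SAMPLE = "<location><lat>37.4</lat><lng>-122.1</lng></location>"
print(extract_coordinates(SAMPLE))
```

<p>String searching like this is fragile if the response layout changes, which is the trade-off for not using an XML parser.</p>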
A Simple Webcrawler (Python)
2012-03-03T02:37:30-08:00Johnhttp://code.activestate.com/recipes/users/4181142/http://code.activestate.com/recipes/578060-a-simple-webcrawler/
<p style="color: grey">
Python
recipe 578060
by <a href="/recipes/users/4181142/">John</a>
(<a href="/recipes/tags/crawler/">crawler</a>, <a href="/recipes/tags/html/">html</a>, <a href="/recipes/tags/page/">page</a>, <a href="/recipes/tags/parser/">parser</a>, <a href="/recipes/tags/scraping/">scraping</a>, <a href="/recipes/tags/urllib/">urllib</a>, <a href="/recipes/tags/urlopen/">urlopen</a>, <a href="/recipes/tags/web/">web</a>).
</p>
<p>This is my simple web crawler. It takes as input a list of seed pages (web urls) and 'scrapes' each page of all its absolute path links (i.e. links in the format <a href="http://" rel="nofollow">http://</a>) and adds those to a dictionary. The web crawler can take all the links found in the seed pages and then scrape those as well. You can continue scraping as deep as you like. You can control how "deep you go" by specifying the depth variable passed into the WebCrawler class function start_crawling(seed_pages,depth). Think of the depth as the recursion depth (or the number of web pages deep you go before returning back up the tree).</p>
<p>To make this web crawler a little more interesting I added some bells and whistles. I added the ability to pass into the WebCrawler class constructor a regular expression object. The regular expression object is used to "filter" the links found during scraping. For example, in the code below you will see:</p>
<p>cnn_url_regex = re.compile('(?<=[.]cnn)[.]com') # cnn_url_regex is a regular expression object</p>
<p>w = WebCrawler(cnn_url_regex)</p>
<p>This particular regular expression says:</p>
<p>1) Find an occurrence of the string '.com'</p>
<p>2) Then, looking backwards from where '.com' was found, check that it is immediately preceded by '.cnn'</p>
<p>Why do this?</p>
<p>You can control where the crawler crawls. In this case I am constraining the crawler to operate on webpages within cnn.com.</p>
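<p>The effect of the lookbehind can be checked in isolation. In this small sketch the regular expression is the one quoted above, while the helper <code>is_allowed</code> is a hypothetical name and not part of the WebCrawler class.</p>

```python
import re

# Same pattern as above: '.com' only when immediately preceded by '.cnn'.
cnn_url_regex = re.compile(r"(?<=[.]cnn)[.]com")


def is_allowed(url, pattern=cnn_url_regex):
    """Return True if the URL passes the filter."""
    return pattern.search(url) is not None


for url in ("http://www.cnn.com/world", "http://www.bbc.com/news"):
    print(url, is_allowed(url))
```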
<p>Another feature I added was the ability to parse a given page looking for specific html tags. I chose as an example the <h1> tag. Once a <h1> tag is found I store all the words I find in the tag in a dictionary that gets associated with the page url.</p>
<p>Why do this?</p>
<p>My thought was that if I scraped the page for text I could eventually use this data for a search engine request. Say I searched for 'Lebron James'. And suppose that one of the pages my crawler scraped found an article that mentions Lebron James many times. In response to a search request I could return the link with the Lebron James article in it.</p>
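<p>Collecting the words of a heading tag can be sketched with the standard library parser. This is an assumption-laden outline rather than the recipe's MyHTMLParser: it uses Python 3's <code>html.parser</code> (the recipe targets Python 2.7.2), and the class name <code>H1Collector</code> and the sample page are made up for the example.</p>

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the words found inside h1 elements."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only text seen while inside an h1 element is kept.
        if self._in_h1:
            self.words.extend(data.split())


# Associate the collected words with the page URL, as described above.
page_words = {}
collector = H1Collector()
collector.feed("<html><h1>Lebron James scores</h1><p>other text</p></html>")
page_words["http://example.com/article"] = collector.words
```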
<p>The web crawler is described in the WebCrawler class. It has 2 functions the user should call:</p>
<p>1) start_crawling(seed_pages,depth)</p>
<p>2) print_all_page_text() # this is only used for debug purposes</p>
<p>The rest of WebCrawler's functions are internal functions that should not be called by the user (think private in C++).</p>
<p>Upon construction of a WebCrawler object, it creates a MyHTMLParser object. The MyHTMLParser class inherits from the built-in Python class HTMLParser. I use the MyHTMLParser object when searching for the <h1> tag. The MyHTMLParser class creates instances of a helper class named Tag. The Tag class is used in creating a "linked list" of tags.</p>
<p>So to get started with WebCrawler make sure to use Python 2.7.2. Enter the code a piece at a time into IDLE in the order displayed below. This ensures that you import libs before you start using them.</p>
<p>Once you have entered all the code into IDLE, you can start crawling the 'interwebs' by entering the following:</p>
<p>import re</p>
<p>cnn_url_regex = re.compile('(?<=[.]cnn)[.]com') </p>
<p>w = WebCrawler(cnn_url_regex)</p>
<p>w.start_crawling(['http://www.cnn.com/2012/02/24/world/americas/haiti-pm-resigns/index.html?hpt=hp_t3'],1)</p>
<p>Of course you can enter any page you want. But the regular expression object is already set up to filter on <a href="http://cnn.com" rel="nofollow">cnn.com</a>. Remember the second parameter passed into the start_crawling function is the recursion depth.</p>
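<p>To make the role of the depth parameter concrete, here is a small sketch of a depth-limited crawl. It is not the WebCrawler implementation: it walks level by level instead of recursing, it takes a hypothetical <code>get_links</code> callable in place of real page fetching, and it assumes that depth 0 means "seed pages only".</p>

```python
def crawl(seed_pages, depth, get_links):
    """Return {url: links} for every page reached within `depth` hops."""
    visited = {}
    frontier = list(seed_pages)
    # Level 0 is the seed pages; each further level follows one more hop.
    for _level in range(depth + 1):
        next_frontier = []
        for url in frontier:
            if url in visited:
                continue  # fetch each page at most once
            visited[url] = get_links(url)
            next_frontier.extend(visited[url])
        frontier = next_frontier
    return visited


# A fake link graph stands in for the web in this example.
FAKE_WEB = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(sorted(crawl(["a"], 1, lambda url: FAKE_WEB.get(url, []))))
```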
<p>Happy Crawling!</p>
Extract air quality data for Santiago, Chile, to a CSV file (Python)
2011-06-30T13:39:29-07:00jrovegnohttp://code.activestate.com/recipes/users/4170207/http://code.activestate.com/recipes/577773-extract-air-quality-data-of-santigo-chile-on-csv-f/
<p style="color: grey">
Python
recipe 577773
by <a href="/recipes/users/4170207/">jrovegno</a>
(<a href="/recipes/tags/aire/">aire</a>, <a href="/recipes/tags/calidad/">calidad</a>, <a href="/recipes/tags/chile/">chile</a>, <a href="/recipes/tags/data_mining/">data_mining</a>, <a href="/recipes/tags/santiago/">santiago</a>).
</p>
<p>Extract air quality data for Santiago, Chile, to a CSV file.</p>
Extract data from feedjit (Python)
2011-05-03T19:50:27-07:00jrovegnohttp://code.activestate.com/recipes/users/4170207/http://code.activestate.com/recipes/577683-extract-data-from-feedjit/
<p style="color: grey">
Python
recipe 577683
by <a href="/recipes/users/4170207/">jrovegno</a>
(<a href="/recipes/tags/data/">data</a>, <a href="/recipes/tags/feedjit/">feedjit</a>).
Revision 2.
</p>
<p>Script to extract data from my Feedjit live traffic feed.</p>
Download all lolcat images from iCanHasCheezburger.com (Python)
2011-03-10T08:49:14-08:00Rahul Anandhttp://code.activestate.com/recipes/users/4173646/http://code.activestate.com/recipes/577603-download-all-lolcat-images-from-icanhascheezburger/
<p style="color: grey">
Python
recipe 577603
by <a href="/recipes/users/4173646/">Rahul Anand</a>
(<a href="/recipes/tags/download/">download</a>, <a href="/recipes/tags/images/">images</a>, <a href="/recipes/tags/lolcat/">lolcat</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/web/">web</a>).
</p>
<p>Running this Python script will download all lolcat images from <a href="http://icanhascheezburger.com" rel="nofollow">icanhascheezburger.com</a> to the current folder. The download starts from the oldest image. Images are collected into subfolders lolcat0, lolcat1, etc., each containing 300 images. The script can be stopped and resumed at any time.
Before running the script, create the files <em>lolconfig.txt</em> and <em>log.txt</em> in the same folder. <em>lolconfig.txt</em> must initially contain the string <em>1496/1496/0</em>.
<em>log.txt</em> must initially be empty.</p>
Robot Pager (Search engines and others) (Python)
2010-10-10T19:33:20-07:00Carlos del Ojohttp://code.activestate.com/recipes/users/4173977/http://code.activestate.com/recipes/577420-robot-pager-search-engines-and-others/
<p style="color: grey">
Python
recipe 577420
by <a href="/recipes/users/4173977/">Carlos del Ojo</a>
(<a href="/recipes/tags/automate/">automate</a>, <a href="/recipes/tags/engine/">engine</a>, <a href="/recipes/tags/paging/">paging</a>, <a href="/recipes/tags/robot/">robot</a>, <a href="/recipes/tags/search/">search</a>, <a href="/recipes/tags/websites/">websites</a>).
Revision 3.
</p>
<p>This class makes it easier to develop robots that parse results from a website that uses paging, for example Google, Yahoo, Bing, or any other site with a paging system.</p>
<p>PagerEngine is the main class. I've developed three more classes implementing GoogleSearch, YahooSearch and BingSearch as examples.</p>
<p>By inheriting from PagerEngine (and with some RegExp knowledge) you can easily develop other robots for other websites.</p>
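<p>The general idea of a base class that pages through results and a subclass that supplies the site-specific parts can be sketched like this. The names <code>SimplePager</code> and <code>FakeSearch</code> are hypothetical and the interface is an assumption; the real PagerEngine may be organised quite differently.</p>

```python
import re


class SimplePager:
    """Hypothetical base class: subclasses set result_regex and fetch_page()."""

    result_regex = None
    max_pages = 10

    def fetch_page(self, page_number):
        raise NotImplementedError

    def results(self):
        """Yield regex matches page by page until a page has none."""
        for page_number in range(self.max_pages):
            matches = self.result_regex.findall(self.fetch_page(page_number))
            if not matches:
                return
            yield from matches


class FakeSearch(SimplePager):
    """Stand-in for a real site: pages come from a list, not the network."""

    result_regex = re.compile(r'href="([^"]+)"')
    pages = [
        '<a href="http://a.example/">A</a>',
        '<a href="http://b.example/">B</a>',
    ]

    def fetch_page(self, page_number):
        return self.pages[page_number] if page_number < len(self.pages) else ""
```

<p>A subclass for a real site would replace <code>fetch_page</code> with an HTTP request for the given page number and adjust the regular expression to that site's markup.</p>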
Script to download videos from http://www.chilevision.cl/ (Python)
2010-08-23T20:20:15-07:00jrovegnohttp://code.activestate.com/recipes/users/4170207/http://code.activestate.com/recipes/577367-script-para-descargar-videos-desde-httpwwwchilevis/
<p style="color: grey">
Python
recipe 577367
by <a href="/recipes/users/4170207/">jrovegno</a>
(<a href="/recipes/tags/chv/">chv</a>, <a href="/recipes/tags/download/">download</a>, <a href="/recipes/tags/tv/">tv</a>, <a href="/recipes/tags/video/">video</a>).
</p>
<p>Requires:
- aria2c - wget
Extras:
Offers to download the rest of the videos
Usage:
# Quotes are required because of a parser problem
cvh_video.py "http://www.chilevision.cl/home/index.php?option=com_content&task=view&id=YYYYY&Itemid=XXX"</p>
Script to check the air quality in Santiago, Chile (Python)
2011-06-02T00:55:02-07:00jrovegnohttp://code.activestate.com/recipes/users/4170207/http://code.activestate.com/recipes/577369-script-para-saber-calida-del-aire-santiago-de-chil/
<p style="color: grey">
Python
recipe 577369
by <a href="/recipes/users/4170207/">jrovegno</a>
(<a href="/recipes/tags/aire/">aire</a>, <a href="/recipes/tags/calidad/">calidad</a>, <a href="/recipes/tags/santiago/">santiago</a>).
Revision 4.
</p>
<p>Script to check the air quality in Santiago, Chile.</p>
ur1.ca command-line client (Python)
2011-03-23T05:27:27-07:00Conghttp://code.activestate.com/recipes/users/4167149/http://code.activestate.com/recipes/577236-ur1ca-command-line-client/
<p style="color: grey">
Python
recipe 577236
by <a href="/recipes/users/4167149/">Cong</a>
(<a href="/recipes/tags/scraping/">scraping</a>, <a href="/recipes/tags/shortening/">shortening</a>, <a href="/recipes/tags/url/">url</a>, <a href="/recipes/tags/web/">web</a>).
Revision 2.
</p>
<p><a href="http://ur1.ca/" rel="nofollow">ur1.ca</a> is the URL shortening service provided by <a href="http://status.net" rel="nofollow">status.net</a>. This script makes it possible to access the service from the command line. This is done by scraping the returned page and looking for the shortened URL.</p>
Routines for programmatically authenticating with the Google Accounts system at Google App-Engine. (Python)
2010-05-20T20:39:50-07:00Berendhttp://code.activestate.com/recipes/users/4173891/http://code.activestate.com/recipes/577217-routines-for-programmatically-authenticating-with-/
<p style="color: grey">
Python
recipe 577217
by <a href="/recipes/users/4173891/">Berend</a>
(<a href="/recipes/tags/auth/">auth</a>, <a href="/recipes/tags/authentication/">authentication</a>, <a href="/recipes/tags/gae/">gae</a>, <a href="/recipes/tags/google/">google</a>, <a href="/recipes/tags/http/">http</a>, <a href="/recipes/tags/sessions/">sessions</a>).
Revision 2.
</p>
<p>This takes two calls, one to the ClientLogin service of Google Accounts,
and then a second to the login frontend of App Engine.</p>
<p>User credentials are provided to the first, which responds with a token.
Passing that token to the _ah/login GAE endpoint then gives the cookie that can
be used to make further authenticated requests.</p>
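<p>The two steps can be outlined without touching the network. In this sketch everything beyond the description above is an assumption: the key=value response format with an <code>Auth</code> entry, the <code>auth</code> and <code>continue</code> query parameter names, and both helper names are illustrative and not taken from the recipe.</p>

```python
from urllib.parse import urlencode


def parse_clientlogin_response(body):
    """Parse a body made of key=value lines into a dict (assumed format)."""
    return dict(line.split("=", 1) for line in body.splitlines() if "=" in line)


def build_gae_login_url(app_url, auth_token, continue_url):
    """Build the _ah/login URL that exchanges the token for a cookie.

    The parameter names 'auth' and 'continue' are assumptions.
    """
    query = urlencode({"auth": auth_token, "continue": continue_url})
    return app_url.rstrip("/") + "/_ah/login?" + query


# Step 1: the first call answers with a token (hypothetical response body).
token = parse_clientlogin_response("SID=abc\nLSID=def\nAuth=tok123\n")["Auth"]
# Step 2: the token is passed to the app's login endpoint.
login_url = build_gae_login_url(
    "https://example.appspot.com", token, "https://example.appspot.com/"
)
```

<p>Requesting <code>login_url</code> with a cookie-aware opener would then store the session cookie for later authenticated requests.</p>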
Download chromium browser nightly builds for any OS (with proxy support) (Python)
2014-07-05T18:47:47-07:00ccpizzahttp://code.activestate.com/recipes/users/4170754/http://code.activestate.com/recipes/577162-download-chromium-browser-nightly-builds-for-any-o/
<p style="color: grey">
Python
recipe 577162
by <a href="/recipes/users/4170754/">ccpizza</a>
(<a href="/recipes/tags/chrome/">chrome</a>, <a href="/recipes/tags/chromium/">chromium</a>, <a href="/recipes/tags/download/">download</a>, <a href="/recipes/tags/unzip/">unzip</a>).
Revision 19.
</p>
<p>Downloads the latest Chromium browser build from <a href="http://commondatastorage.googleapis.com/chromium-browser-continuous/" rel="nofollow">http://commondatastorage.googleapis.com/chromium-browser-continuous/</a> using urllib2 or wget (with Python versions below 2.5) and unzips the downloaded zip file to a predefined folder.</p>
<p>To use a custom proxy define the <code>HTTP_PROXY</code> system variable.</p>
<p>The script will figure out the OS but you can also pass the platform as the first parameter (one of <code>win32, linux, linux64, mac</code>).</p>
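<p>One plausible way to pick the platform name is sketched below. The mapping from <code>sys.platform</code> and <code>platform.machine()</code> to the four names is an assumption, as are the function names; the script itself may detect the OS differently.</p>

```python
import platform
import sys


def detect_platform(sys_platform=None, is_64bit=None):
    """Map the running system to one of: win32, linux, linux64, mac."""
    if sys_platform is None:
        sys_platform = sys.platform
    if is_64bit is None:
        is_64bit = platform.machine().endswith("64")
    if sys_platform.startswith("win"):
        return "win32"
    if sys_platform == "darwin":
        return "mac"
    if sys_platform.startswith("linux"):
        return "linux64" if is_64bit else "linux"
    raise ValueError("unsupported platform: %s" % sys_platform)


def choose_platform(argv):
    """Use the first command-line parameter if given, else auto-detect."""
    return argv[1] if len(argv) > 1 else detect_platform()
```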
<p><em>Prerequisites (!! only for Python versions below 2.5):</em></p>
<ul>
<li><p><code>wget</code> - usually already installed on most linux systems. Windows users can get it <a href="http://gnuwin32.sourceforge.net/packages/wget.htm">here</a>.</p></li>
<li><p><code>unzip</code> - used for unpacking the archive; usually already installed on most linux systems. Windows users can get it <a href="http://gnuwin32.sourceforge.net/packages/unzip.htm">here</a>.</p></li>
</ul>
<p>Both <code>wget</code> and <code>unzip</code> should be available in PATH (for Python 2.5+ native Python modules are used).</p>
<p>For most Linux distros this script does not make much sense since the built-in package managers do a better job of managing chromium builds and dependencies, but it still can be useful if you are using a stable Chromium build but would like to be able to test the nightly builds too.</p>
<p>For OSX an additional installation step will be performed using the <code>install.sh</code> that is included in the OSX build. The OSX installer will copy the package to <code>~/Applications/Chromium</code>, and set some permissions that are required for Chromium to run. Running the unpacked zip without doing the installation will most likely not work because of missing executable permissions on some files.</p>
Retrieve Dell Warranty Information for all machines in AD Domain (Python)
2010-02-18T15:51:28-08:00Kenneth Keiterhttp://code.activestate.com/recipes/users/4173089/http://code.activestate.com/recipes/577056-retrieve-dell-warranty-information-for-all-machine/
<p style="color: grey">
Python
recipe 577056
by <a href="/recipes/users/4173089/">Kenneth Keiter</a>
(<a href="/recipes/tags/active_directory_scripts/">active_directory_scripts</a>, <a href="/recipes/tags/ad/">ad</a>, <a href="/recipes/tags/dell/">dell</a>, <a href="/recipes/tags/domain/">domain</a>, <a href="/recipes/tags/network/">network</a>, <a href="/recipes/tags/warranty/">warranty</a>, <a href="/recipes/tags/windows/">windows</a>).
</p>
<p>This snippet retrieves warranty information for all Dell machines in a domain and outputs a CSV of the results. </p>
<p>This should be run on a machine joined to an Active Directory or NT4 domain. It may need to be updated to parse Dell's website if they change it (since they have no service tag API).</p>
Improvements of the urllib.URLopen.retrieve() method (Python)
2010-01-16T04:50:07-08:00Kévin Gomezhttp://code.activestate.com/recipes/users/4172815/http://code.activestate.com/recipes/577009-improvements-of-the-urlliburlopenretrieve-method/
<p style="color: grey">
Python
recipe 577009
by <a href="/recipes/users/4172815/">Kévin Gomez</a>
(<a href="/recipes/tags/retrieve/">retrieve</a>, <a href="/recipes/tags/urllib/">urllib</a>).
</p>
<p>I improved the urllib.URLopener.retrieve() method so that it can restart a download that failed. Like wget -c, it resumes where it stopped.
The maximum number of tries can be changed.</p>
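<p>The resume-and-retry idea can be sketched as follows. This is not the recipe's code: it uses Python 3's <code>urllib.request</code>, it assumes the server honours HTTP <code>Range</code> requests, and the names <code>build_resume_request</code> and <code>retry</code> are hypothetical.</p>

```python
import os
import urllib.request


def build_resume_request(url, local_path):
    """Return (request, offset) asking only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if offset:
        # Ask the server to start at the first missing byte.
        request.add_header("Range", "bytes=%d-" % offset)
    return request, offset


def retry(download_once, max_tries=3):
    """Call download_once() until it succeeds or max_tries is reached."""
    for attempt in range(1, max_tries + 1):
        try:
            return download_once()
        except OSError:
            # urllib's URLError is a subclass of OSError in Python 3.
            if attempt == max_tries:
                raise
```

<p>A complete download step would open the request, append the received bytes to the local file, and be wrapped in <code>retry</code> so that each new attempt continues from the current file size.</p>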
Wordze.com API bindings (Python)
2010-01-21T21:24:39-08:00Sergei Lebedevhttp://code.activestate.com/recipes/users/4172845/http://code.activestate.com/recipes/577018-wordzecom-api-bindings/
<p style="color: grey">
Python
recipe 577018
by <a href="/recipes/users/4172845/">Sergei Lebedev</a>
(<a href="/recipes/tags/api/">api</a>, <a href="/recipes/tags/bindings/">bindings</a>, <a href="/recipes/tags/wordze/">wordze</a>, <a href="/recipes/tags/wordze_com/">wordze_com</a>).
Revision 3.
</p>
<p>Just in case anyone needs the bindings, here's a very basic implementation. For some reason, <a href="http://wordze.com" rel="nofollow">wordze.com</a> has very strict API usage limitations, so to use this efficiently, you'll probably need to cache every request-response. </p>
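<p>Caching every request-response pair could look roughly like this. The class <code>CachedClient</code> and the <code>fetch(method, params)</code> callable it wraps are hypothetical; they stand in for whatever function actually performs the API call.</p>

```python
class CachedClient:
    """Wrap a fetch(method, params) callable with an in-memory cache."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def get(self, method, **params):
        # Sorting makes the key independent of keyword argument order.
        key = (method, tuple(sorted(params.items())))
        if key not in self._cache:
            self._cache[key] = self._fetch(method, params)
        return self._cache[key]
```

<p>For a cache that survives between runs, the dictionary could be swapped for something persistent such as the standard library's <code>shelve</code> module.</p>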
<p>Any comments and/or corrections are welcome. :) </p>