Popular recipes tagged "crawler" but not "urllib"
http://code.activestate.com/recipes/tags/crawler-urllib/
2011-01-31T21:57:58-08:00
ActiveState Code Recipes

Simple Web Crawler (Python)
2009-08-18T13:21:49-07:00
manuelaraoz
http://code.activestate.com/recipes/users/4171484/
http://code.activestate.com/recipes/576884-simple-web-crawler/
<p style="color: grey">
Python
recipe 576884
by <a href="/recipes/users/4171484/">manuelaraoz</a>
(<a href="/recipes/tags/crawler/">crawler</a>, <a href="/recipes/tags/htmlparser/">htmlparser</a>, <a href="/recipes/tags/urllib2/">urllib2</a>).
</p>
<p>A simple class that starts at a URL and follows links to a desired depth.</p>
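The recipe's own code is not reproduced in this listing. As a rough illustration of the idea, here is a minimal Python 3 sketch, not manuelaraoz's implementation. In Python 3, the `urllib2` functionality from the tags moved to `urllib.request` and `urllib.error`, and the `HTMLParser` module was renamed `html.parser`. The names `LinkParser`, `Crawler`, `fetch_html`, and the `fetch` parameter are hypothetical, and decoding every page as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop any #fragment.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def fetch_html(url):
    """Default fetcher: download a page, assuming a UTF-8 encoding."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class Crawler:
    """Start at root_url and follow links up to `depth` hops away."""

    def __init__(self, root_url, depth, fetch=fetch_html):
        self.root_url = root_url
        self.depth = depth
        self.fetch = fetch  # injectable, so the class can be tried offline
        self.visited = set()

    def crawl(self):
        """Breadth-first crawl; returns the set of URLs fetched."""
        frontier = [self.root_url]
        for level in range(self.depth + 1):
            next_frontier = []
            for url in frontier:
                if url in self.visited:
                    continue
                self.visited.add(url)
                html = self.fetch(url)
                if level < self.depth:
                    # Only pages short of the depth limit are expanded.
                    parser = LinkParser(url)
                    parser.feed(html)
                    parser.close()
                    next_frontier.extend(parser.links)
            frontier = next_frontier
        return self.visited
```

Passing a `fetch` function, for example the `__getitem__` of a dictionary of fake HTML pages, lets the class run without a network; depth 0 fetches only the starting page. Crawling level by level means each page is reached by its shortest path from the start, whereas a depth-first walk could first reach a page by a long path and then stop before expanding it.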
Simple Web Crawler (Python)
2011-01-31T21:57:58-08:00
James Mills
http://code.activestate.com/recipes/users/4167757/
http://code.activestate.com/recipes/576551-simple-web-crawler/
<p style="color: grey">
Python
recipe 576551
by <a href="/recipes/users/4167757/">James Mills</a>
(<a href="/recipes/tags/crawler/">crawler</a>, <a href="/recipes/tags/network/">network</a>, <a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/web/">web</a>).
Revision 2.
</p>
<p>NOTE: This recipe has been updated with suggested improvements since the last revision.</p>
<p>This is a simple web crawler I wrote to
test websites and links. It traverses
every link it finds, down to any given depth.</p>
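The listing does not include James Mills's code, which uses BeautifulSoup. To illustrate the link-testing idea without a network or a third-party dependency, here is a hypothetical sketch: `check_links` and its `get_links` callback are illustrative names, not part of the recipe, and treating any exception raised while loading a page as a broken link is an assumption.

```python
from collections import deque


def check_links(root_url, depth, get_links):
    """Breadth-first link check out to `depth` hops from root_url.

    get_links(url) must return the absolute URLs linked from `url`,
    or raise an exception if the page cannot be loaded.
    Returns (ok, broken): the set of pages that loaded, and a dict
    mapping each failing URL to its error message.
    """
    ok, broken = set(), {}
    seen = {root_url}
    queue = deque([(root_url, 0)])
    while queue:
        url, distance = queue.popleft()
        try:
            links = get_links(url)
        except Exception as exc:  # any failure marks the URL as broken
            broken[url] = str(exc)
            continue
        ok.add(url)
        if distance == depth:
            continue  # don't follow links beyond the requested depth
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, distance + 1))
    return ok, broken
```

In a real run, `get_links` could download each page with `urllib.request.urlopen` and extract its anchors; `urlopen` raises `urllib.error.URLError`, a subclass of `OSError`, when a page cannot be retrieved, which this sketch would record as broken.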
<p>See --help for usage.</p>
<p>I'm posting this recipe because questions about
this kind of problem have come up on the Python
mailing list a number of times, so I
thought I'd share my simple little
implementation, based on the standard
library and BeautifulSoup.</p>
<p>--JamesMills</p>