Downloads a website's favicon and saves it to a file. If the website doesn't have a favicon, a default icon is used instead.
import sys
import shutil
import urllib2
import urlparse

import lxml.html

HEADERS = {
    'User-Agent': 'urllib2 (Python %s)' % sys.version.split()[0],
    'Connection': 'close',
}


def get_favicon(url, path='favicon.ico', alt_icon_path='alticon.ico'):
    """Save the favicon of `url` to `path`; copy `alt_icon_path` if none is found."""
    if not url.endswith('/'):
        url += '/'

    # First try the conventional location: <site>/favicon.ico
    request = urllib2.Request(url + 'favicon.ico', headers=HEADERS)
    try:
        icon = urllib2.urlopen(request).read()
    except (urllib2.HTTPError, urllib2.URLError):
        # Fall back to the icon declared in the page's <link> tag.
        request = urllib2.Request(url, headers=HEADERS)
        try:
            # 2048 bytes should be enough for most websites
            content = urllib2.urlopen(request).read(2048)
        except (urllib2.HTTPError, urllib2.URLError):
            shutil.copyfile(alt_icon_path, path)
            return

        icon_path = lxml.html.fromstring(content).xpath(
            '//link[@rel="icon" or @rel="shortcut icon"]/@href'
        )
        if not icon_path:
            # The page declares no icon: use the default one.
            shutil.copyfile(alt_icon_path, path)
            return

        # The href may be relative or absolute; urljoin handles both.
        icon_url = urlparse.urljoin(url, icon_path[0])
        request = urllib2.Request(icon_url, headers=HEADERS)
        try:
            icon = urllib2.urlopen(request).read()
        except (urllib2.HTTPError, urllib2.URLError):
            shutil.copyfile(alt_icon_path, path)
            return

    with open(path, 'wb') as icon_file:
        icon_file.write(icon)


if __name__ == '__main__':
    get_favicon('http://code.activestate.com', 'favicon.ico')
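A note on building the icon URL: the href found in a page's link tag may be relative to the page, relative to the site root, protocol-relative, or a full URL on another host, so plain string concatenation with the site URL only works for the first case. The sketch below shows one way to resolve all four with the standard library's urljoin. The helper name resolve_icon_url and the example.com URLs are made up for illustration and are not part of the recipe.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, which the recipe targets


def resolve_icon_url(page_url, href):
    """Return an absolute icon URL for an href taken from a <link> tag.

    Hypothetical helper for illustration only.
    """
    return urljoin(page_url, href)


if __name__ == '__main__':
    page = 'http://example.com/blog/'  # placeholder page URL
    for href in ('favicon.png',                    # relative to the page
                 '/static/icon.ico',               # relative to the site root
                 '//cdn.example.net/i.ico',        # protocol-relative
                 'http://cdn.example.net/i.ico'):  # already absolute
        print(resolve_icon_url(page, href))
```

Resolving against the page URL rather than the bare domain matters when the page lives in a subdirectory, since relative hrefs are interpreted from there.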
Favicon retrieval is a fun coding problem to solve, but with all the redirect following and page loads, it usually takes 3-4 seconds to retrieve a favicon. If you need icons faster, http://grabicon.com is a free service that lets you specify icon size, and generates unique defaults for sites that don't have an icon. This provides a uniform user experience for web/mobile applications because icons are guaranteed to be the size you requested, and never missing.
It's this simple:
http://grabicon.com/icon?domain=wikipedia.org
Or specify a size, if you don't want the default 32-pixel icon:
http://grabicon.com/icon?domain=wikipedia.org&size=16
There's also a jQuery snippet to automatically add icons to the external links on your web page.
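For use from Python, the two URL patterns above can be built with a small helper. This is only a sketch based on the query parameters shown in this comment (domain and size); the function name grabicon_url is hypothetical, and nothing else about the service's API is assumed.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

# Endpoint as it appears in the example URLs above.
GRABICON_ENDPOINT = 'http://grabicon.com/icon'


def grabicon_url(domain, size=None):
    """Build an icon URL for `domain`; `size` is optional (hypothetical helper)."""
    params = [('domain', domain)]
    if size is not None:
        params.append(('size', size))
    return GRABICON_ENDPOINT + '?' + urlencode(params)


if __name__ == '__main__':
    print(grabicon_url('wikipedia.org'))
    print(grabicon_url('wikipedia.org', size=16))
```

The resulting URL can be passed to any HTTP client or used directly as an image source.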