
This is a simple Python script that looks up Linux distribution details on DistroWatch using the BeautifulSoup, mechanize, and urllib2 modules. Given a distribution name as a command-line argument, it fetches that distribution's details from distrowatch.com.

Python, 35 lines
from bs4 import BeautifulSoup
from mechanize import Browser
import urllib2
import sys

# The distribution name must be given as the first argument.
if len(sys.argv) < 2:
    print "\nSyntax: python %s 'distribution title'" % (sys.argv[0])
    sys.exit(1)

# Join the words with '+' so multi-word names fit in the query string.
distribution = '+'.join(sys.argv[1].split())

try:
    br = Browser()
    br.open("http://distrowatch.com/table.php?distribution=" + distribution)
    print br.title()
    url = br.geturl()  # final address after any redirect
    content = urllib2.urlopen(url).read()
except urllib2.URLError:
    print "Unable to connect: check your internet connection."
    sys.exit(1)

soup=BeautifulSoup(content)

try:
    title = soup.find("h1").contents[0].strip()
    print "DISTRIBUTION:", title
    # The first <ul> on the page holds the summary fields.
    for item in soup.ul.findAll("li"):
        # Each item is a bold label followed by one or more links.
        links = "".join([a.text for a in item.findAll("a")])
        print u"{} {}.".format(item.b.text, links)
except (AttributeError, IndexError, TypeError):
    print "Distribution not found: check the distribution name."
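
The recipe builds its query string by joining the words of the name with '+'. Below is a minimal sketch of that one step for Python 3, assuming the standard library's urllib.parse.urlencode is an acceptable substitute for the manual join. The names BASE_URL and build_url are illustrative and not part of the recipe.

```python
from urllib.parse import urlencode

# Illustrative constant: the lookup page used by the recipe above.
BASE_URL = "http://distrowatch.com/table.php"


def build_url(name):
    """Return the lookup URL for a distribution name (hypothetical helper)."""
    # Collapse runs of whitespace, as sys.argv[1].split() does in the recipe.
    cleaned = " ".join(name.split())
    # urlencode escapes spaces as '+' and percent-encodes reserved characters.
    return BASE_URL + "?" + urlencode({"distribution": cleaned})


if __name__ == "__main__":
    # Prints http://distrowatch.com/table.php?distribution=linux+mint
    print(build_url("linux  mint"))
```

Unlike the manual join, this also escapes characters such as & that would otherwise be read as part of the query string.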

1 comment

Vatay Világi Norbert 8 years, 2 months ago

No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this (line 24):

 soup=BeautifulSoup(content)

to this:

 soup=BeautifulSoup(content, "lxml")