
Downloads the "All Recipe Authors" pages on ActiveState and uses regular expressions to parse each author's name and recipe count from every page. Finally, it displays the recipe submission distribution (how many authors have submitted how many recipes).

Python, 36 lines
import urllib2
import re

page = 1
contrib = [] # each element of contrib is a tuple consisting of the name of the user and the number of submitted recipes.

while 1: # loop over pages
    print "Processing page %s" % (page)
    f=urllib2.urlopen("http://code.activestate.com/recipes/users/?page=%s" % (page))
    html = f.read()
    f.close()
    
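    # Raw string so \s and \( reach the regex engine unchanged; non-greedy name and digit-only count avoid over-matching.
    pattern = r'<li><a href="/recipes/users/[^"]*/">(.*?)</a>\s*<span class="secondary">\((\d+) recipes?\)</span>'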
    res = re.findall(pattern, html)
    if res:
        contrib.extend(res)

    if html.find('<span class="next disabled">') != -1: # found at the last page
        break
    else:
        page += 1

# Print users and number of recipes on screen 
#for p in contrib:
#    print p[0], p[1]

# Number of recipes as a list:
nrecipes = [int(p[1]) for p in contrib]

# Print the distribution
n = 1
while n <= max(nrecipes or [0]): # max() of an empty list would raise ValueError
    c = nrecipes.count(n)
    if c:
        print "%s people contribute %s recipes each" % (c,n)
    n += 1

Like most social sharing sites, ActiveState recipes have an uneven, non-Gaussian distribution of contributions: a few users have a large number of contributions, while most users contribute only one or two recipes. This program parses and analyzes the recipe contribution statistics on ActiveState.
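The tally itself can also be expressed with collections.Counter. The following is a minimal Python 3 sketch, not the recipe's own code; the function name recipe_distribution and the sample counts are made up for illustration.

```python
from collections import Counter


def recipe_distribution(nrecipes):
    """Map each recipe count to the number of authors who have that count."""
    return Counter(nrecipes)


# Hypothetical data: one entry per author, giving that author's recipe count.
nrecipes = [1, 1, 1, 2, 2, 5]

dist = recipe_distribution(nrecipes)
for n in sorted(dist):
    print("%s people contribute %s recipes each" % (dist[n], n))
```

Counter visits the list once, and an empty list simply produces no output instead of raising an error from max().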

The program downloads and processes the ActiveState author list, starting at page 1. It appends each user name and the corresponding number of recipes to the list "contrib", then repeats for the next page. The main loop ends when the page contains the string '<span class="next disabled">', which marks the last page.
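The per-page step can be tried offline, without touching the network. The sketch below is in Python 3 and makes several assumptions: the helper names parse_authors and is_last_page are invented here, the page markup is inferred from the recipe's regular expression, and the sample fragment (names and counts) is hypothetical.

```python
import re

# Assumed markup of one author entry, inferred from the recipe's pattern.
AUTHOR_RE = re.compile(
    r'<li><a href="/recipes/users/[^"]*/">(.*?)</a>\s*'
    r'<span class="secondary">\((\d+) recipes?\)</span>'
)
LAST_PAGE_MARKER = '<span class="next disabled">'


def parse_authors(html):
    """Return (name, recipe_count) pairs found in one page of HTML."""
    return [(name, int(count)) for name, count in AUTHOR_RE.findall(html)]


def is_last_page(html):
    """True when the page carries the marker the recipe uses to stop."""
    return LAST_PAGE_MARKER in html


# Hypothetical page fragment for illustration only.
sample = (
    '<ul>'
    '<li><a href="/recipes/users/1/">Alice</a> '
    '<span class="secondary">(3 recipes)</span></li>'
    '<li><a href="/recipes/users/2/">Bob</a> '
    '<span class="secondary">(1 recipe)</span></li>'
    '</ul><span class="next disabled">Next</span>'
)

authors = parse_authors(sample)
last = is_last_page(sample)
print(authors, last)
```

Keeping the parsing separate from the download makes it possible to check the pattern against a saved page before running the full loop.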

At the time of this submission, the output shows that 875 of 1427 authors (61%) have submitted a single recipe, and the top four authors (0.3% of all authors) together account for 10% of all recipes.
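Figures like these can be derived from the list of per-author counts. The following Python 3 sketch shows one way to do it; the function names single_recipe_share and top_share and the sample list are hypothetical, and only the numbers 875, 1427, and 4 come from the text above.

```python
def single_recipe_share(nrecipes):
    """Fraction of authors who have submitted exactly one recipe."""
    return sum(1 for n in nrecipes if n == 1) / len(nrecipes)


def top_share(nrecipes, k):
    """Fraction of all recipes contributed by the k most prolific authors."""
    top = sorted(nrecipes, reverse=True)[:k]
    return sum(top) / sum(nrecipes)


# Hypothetical per-author recipe counts, not the real ActiveState data.
sample_counts = [10, 5, 2, 1, 1, 1]
print(single_recipe_share(sample_counts))
print(top_share(sample_counts, 1))

# Percentages quoted in the text, recomputed from the reported totals.
print(round(100 * 875 / 1427))     # share of single-recipe authors, in percent
print(round(100 * 4 / 1427, 1))    # share of authors in the top four, in percent
```

Both functions assume a non-empty list; an empty one would cause a division by zero.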

4 comments

amir naghavi 12 years, 10 months ago

m is not defined

amir naghavi 12 years, 10 months ago

it gave such error :

Traceback (most recent call last):
  File "E:\Download\recipe-577732-1.py", line 32, in <module>
    while n <=(max(nrecipes) or 1):
ValueError: max() arg is an empty sequence

amir naghavi 12 years, 10 months ago

i put in code : contrib=m and so i gave the correct answer.

Kaan Ozturk (author) 12 years, 10 months ago

Oops, I'm sorry. I'd replaced name "m" with "contrib" for readability, but I forgot to change it in line 16. Fixed now. Thanks.

Created by Kaan Ozturk on Thu, 2 Jun 2011 (MIT)

Required Modules

  • (none specified)
