The wikipedia Python library (available on PyPI) is a wrapper for the official Wikipedia API (the MediaWiki API). It is higher level and easier to use than the API itself, though it exposes only part of the API's functionality. It makes basic access to Wikipedia pages easy, which can be useful for educational, reference and other purposes. This recipe shows basic use of the wikipedia library by using it to look up information about oranges.

Python, 53 lines
# wikipedia_orange.py

# Author: Vasudev Ram

import wikipedia

# First, try fetching the Wikipedia page for a keyword - 'Orange'.
# If several pages match that title, the library raises a
# DisambiguationError whose message lists the candidate titles.

print("1: Searching Wikipedia for 'Orange'")
try:
    print(wikipedia.page('Orange'))
    print('-' * 60)
except wikipedia.exceptions.DisambiguationError as e:
    print(str(e))
    print('+' * 60)
    print('DisambiguationError: The page name is ambiguous')
print()

# Next, select one of the results from the search above,
# such as the orange fruit, and fetch its page,
# replacing spaces in the title with underscores:
print("2: Searching Wikipedia for 'Orange (fruit)'")
page = wikipedia.page('Orange_(fruit)')
print(page)
print()

# The output is:
# <WikipediaPage 'Orange (fruit)'>

# That is because the return value from the above call is a
# WikipediaPage object, not the content itself. To get the content
# we want, we have to access the 'content' attribute of the
# WikipediaPage object.

# In Python 2, printing the content directly could raise a Unicode
# error, so it had to be encoded to UTF-8 first. In Python 3 the
# content is already a str and can normally be printed as it is:

result = page.content
print("3: Result of searching Wikipedia for 'Orange_(fruit)':")
print(result)
print()

# And find the number of occurrences of our original search keyword,
# 'orange', within the resulting content. Note that str.count() is
# case-sensitive and counts substrings, so 'oranges' is counted
# but 'Orange' is not:
orange_count = result.count('orange')
print("The Wikipedia page for 'Orange_(fruit)' has "
      "{} occurrences of the word 'orange'".format(orange_count))
print()
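
The count in the last step matches substrings and is case-sensitive, so it may differ from the number of times the word itself appears. As a rough sketch (not part of the original recipe), a whole-word, case-insensitive count could look like the following. It is written for Python 3, and the helper name count_word and the sample text are made up for illustration.

```python
import re


def count_word(text, word):
    """Count case-insensitive, whole-word occurrences of `word` in `text`."""
    # \b matches at word boundaries, so 'oranges' is not counted for 'orange'
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical sample text standing in for the Wikipedia page content
sample = "Orange trees bear oranges; oranges are sweet. An orange a day."

print(sample.count('orange'))        # substring count, case-sensitive
print(count_word(sample, 'orange'))  # whole-word count, case-insensitive
```

With the recipe's own data, the same helper could be called as count_word(result, 'orange'), provided result is a text string and not UTF-8 encoded bytes.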

This blog post has more details and sample output:

https://jugad2.blogspot.in/2015/11/using-wikipedia-python-library.html
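
Step 1 of the recipe only prints the DisambiguationError. The exception also carries the candidate titles in its options attribute, so a script can choose one without a human reading the message. The sketch below shows one possible way to do that, written for Python 3. The helper names pick_option and fetch_page and the sample titles are hypothetical, and passing auto_suggest=False is an assumption made here so that the chosen title is looked up exactly as given.

```python
def pick_option(options, keyword):
    """Return the first title in `options` that contains `keyword`, else None."""
    keyword = keyword.lower()
    for title in options:
        if keyword in title.lower():
            return title
    return None


def fetch_page(title, keyword):
    """Fetch `title`; if it is ambiguous, retry with the option matching `keyword`."""
    # Third-party library; imported here so pick_option can be used without it
    import wikipedia

    try:
        return wikipedia.page(title)
    except wikipedia.exceptions.DisambiguationError as e:
        # e.options holds the candidate page titles listed in the error message
        choice = pick_option(e.options, keyword)
        if choice is None:
            raise
        # Assumption: skip title suggestion so the exact choice is requested
        return wikipedia.page(choice, auto_suggest=False)


# Hypothetical option titles, for illustration only (no network access needed)
titles = ['Orange (colour)', 'Orange (fruit)', 'Orange, California']
print(pick_option(titles, 'fruit'))
```

Calling fetch_page('Orange', 'fruit') would then try the ambiguous title first and fall back to the matching option. It needs network access and the wikipedia package installed.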