
Origin: http://wiki.tcl.tk/3490 Author: Richard Suchenwirth

Given a piece of data whose correctness is doubtful, one way to check it is simply to ask a search engine like Google and disregard the results except for the number of web pages found. Chances are that the correct data gets more hits than the faulty version.

Example output in the text widget (asking about a city in Italy, where post code and province were unsure):

 'bellaria rn': 3580 hits
 'bellaria fo': 609 hits
 'bellaria 47814': 1130 hits
 'bellaria 47014': 30 hits

These results seem to indicate that 47814 Bellaria RN (Rimini) is the correct address ;-) On single words one might use this for spelling verification:

 'suchenwirth': 280 hits
 'suchenworth': 0 hits

...or to check how strong an association between several words is:

 'suchenwirth tcl': 57 hits
 'suchenwirth java': 14 hits

The numbers may change over time, but the tendency ("fuzzy truth") can at least be estimated.
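
The comparison itself can be sketched in a few lines of Tcl. This is only an illustration, assuming the hit counts have already been collected; the helper name bestCandidate is hypothetical and not part of the original script, and it needs Tcl 8.5 or later for dict.

```tcl
# Hypothetical helper (not in the original script): given a dict that maps
# candidate strings to hit counts, return the candidate with the most hits.
proc bestCandidate {counts} {
    set best ""
    set max -1
    dict for {candidate nhits} $counts {
        if {$nhits > $max} {
            set max $nhits
            set best $candidate
        }
    }
    return $best
}

# Hit counts taken from the example output above
puts [bestCandidate {{bellaria rn} 3580 {bellaria fo} 609}]
puts [bestCandidate {{bellaria 47814} 1130 {bellaria 47014} 30}]
```

Only variants of the same datum (province against province, post code against post code) are compared with each other, since counts for different kinds of query are not directly comparable.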

Tcl, 23 lines
#!/bin/sh
 # \
 exec wish "$0" "$@"
 package require http
 #http::config -proxyhost proxy -proxyport 80

 proc google'nhits query {
    set url  http://google.yahoo.com/bin/query?p=[string map {" " +} $query]&hc=0&hs=0
    set token [http::geturl $url]
    set data [http::data $token]
    http::cleanup $token
    set nhits 0 ;# stays 0 when the pattern does not match
    regexp {\n[0-9-]+ of ([0-9]+)} $data -> nhits
    return $nhits
 }
 proc go {w} {
    global query
    $w insert end "'$query': [google'nhits $query] hits\n"
 }
 entry .e -textvar query -bg white
 bind .e <Return> {go .t}
 text .t -bg white
 pack .e .t -fill x -expand 1
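
The script replaces only blanks in the query, so characters such as & or # would end up unescaped in the URL. One possible way around this, sketched here as an assumption rather than as part of the original recipe, is to let http::formatQuery from the same http package do the encoding. The helper name queryUrl is hypothetical; the base URL and the hc/hs parameters are copied from the script above.

```tcl
package require http

# Hypothetical helper (not in the original script): build the query URL
# with the query string URL-encoded by the http package.
proc queryUrl {query} {
    # Base URL and the hc/hs parameters are taken from the script above
    return http://google.yahoo.com/bin/query?[http::formatQuery p $query hc 0 hs 0]
}

puts [queryUrl "bellaria & rimini"]
```

In google'nhits, the line that sets url could then call queryUrl $query instead of using string map.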

1 comment

Toby Gabele, 20 years, 7 months ago

Patch for line 13: to make this work with the current google.yahoo.com format (as of September 2003), consider replacing line 13 (the regexp call) with:

regexp {\n\s?out of about ([0-9 ,]+)} $data -> nhits

cheers,

Tobias

(http://uni-sql.de)
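
Note that the patched pattern also captures commas and blanks, so the result may look like "1,130" rather than a plain number. A minimal sketch of normalizing it follows. The proc name parseHits and the sample page fragment are assumptions for illustration; the real markup of the result page is not known here.

```tcl
# Hypothetical helper (not in the original script): extract the hit count
# using the patched pattern and strip thousands separators and blanks.
proc parseHits {data} {
    set nhits 0
    regexp {\n\s?out of about ([0-9 ,]+)} $data -> nhits
    # Remove commas and blanks so the result is a plain integer string
    return [string map {, "" " " ""} $nhits]
}

# Made-up page fragment, only for demonstration
puts [parseHits "Results\n out of about 1,130 for bellaria"]
```

If the pattern does not match, the default value 0 is returned unchanged.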

Created by andreas kupries on Tue, 11 Jun 2002 (MIT)