| Store | Cart

Working around a lack of 'goto' in python

From: Roy Smith <r...@panix.com>
Wed, 10 Mar 2004 08:52:35 -0500
In article <9qh2i1-2up.ln1 at home.rogerbinns.com>,
 "Roger Binns" <rogerb at rogerbinns.com> wrote:

> Roy Smith wrote:> > I find it interesting that many> > people who come from a C++ background (especially those who migrated> > from C) tend to dislike exceptions.  It must have been something in the> > water when they were growing up :-)> > It also affects former Java weenies.  As a Java programmer you expect> your code to be run under a JIT (and not interpretted) and you are> aware that exceptions are significantly more expensive than straight> forward code.  This is less of an issue in Python as interpretters> don't tend to have as expensive exceptions compared to "normal" code.

But, exceptions are things that (generally) don't happen very often.  
What difference does it make if something that doesn't happen very often 
is slow?

What really tends to surprise C++ types is that in Python, using 
exceptions is often FASTER than the alternative.  For example, if most 
accesses of a dictionary are for keys that are already there, it's 
faster to just try it and catch the occasional KeyError than to check 
before you try the access.

I wrote a little test program which reads a dictionary, generates the 
leading digram of each word (i.e. the first two letters) then builds a 
histogram of digram frequency.  I build the histogram 10 times to dilute 
out the time it takes to read the file in the first place.

Each of the 10 runs results in about 2 million dictionary accesses, of 
which about 500 generate KeyErrors.  I ran it using two algorithms; one 
catches the KeyErrors and deals with problem, the other uses has_key() 
to check first to avoid raising the exceptions in the first place.  
Here's the results (see full code listing below):

Roy-Smiths-Computer:play$ time ./digram.py -h > has
real    0m7.633s
user    0m6.830s
sys     0m0.100s

Roy-Smiths-Computer:play$ time ./digram.py -e > ex
real    0m5.642s
user    0m4.970s
sys     0m0.120s

The output files (has and ex) compare identical.  The has_key() version 
took 6.8 seconds, the exception catching version took 5.0.

I think what's really interesting about this is that even in C++, where 
exceptions are relatively costly, with a hit ratio of 500:2,000,000 
(0.025%), I bet the exception catching version would still be faster.


---------digram.py----------
#!/usr/bin/env python

import sys

def main ():
        digrams = []
        words = open ("/usr/share/dict/words")
        for word in words:
                digram = word.rstrip().lower()[0:2]
                digrams.append (digram)

        count = 10
        if sys.argv[1] == "-e":
                d = countEx (digrams, count)
        else:
                d = countHas (digrams, count)

        digrams = d.keys ()
        digrams.sort ()
        for digram in digrams:
                print "%s: %d" % (digram, d [digram])

def countEx (digrams, count):
        d = {}
        for i in range (count):
                for digram in digrams:
                        try:
                                d[digram] += 1
                        except KeyError:
                                d[digram] = 1
        return d

def countHas (digrams, count):
        d = {}
        for i in range (count):
                for digram in digrams:
                        if d.has_key (digram):
                                d[digram] += 1
                        else:
                                d[digram] = 1
        return d

if __name__ == "__main__":
        main ()
-----------------------------------

Recent Messages in this Thread
Brett Mar 06, 2004 06:18 pm
Andrew Koenig Mar 06, 2004 06:30 pm
Carmine Noviello Mar 06, 2004 06:41 pm
Jeff Schwaber Mar 06, 2004 06:41 pm
Peter Otten Mar 06, 2004 06:48 pm
Rene Pijlman Mar 06, 2004 06:49 pm
Christian Tismer Mar 06, 2004 08:50 pm
Roy Smith Mar 06, 2004 09:16 pm
Stephen Horne Mar 08, 2004 05:01 am
Dan Bishop Mar 06, 2004 11:23 pm
David M. Cooke Mar 07, 2004 08:49 am
Stephen Horne Mar 07, 2004 06:35 pm
Roy Smith Mar 07, 2004 06:54 pm
Stephen Horne Mar 07, 2004 07:57 pm
Roy Smith Mar 07, 2004 08:49 pm
Stephen Horne Mar 08, 2004 02:54 am
benjamin schollnick Mar 08, 2004 03:38 am
Stephen Horne Mar 08, 2004 05:10 am
Roger Binns Mar 07, 2004 10:33 pm
Roy Smith Mar 08, 2004 02:11 am
Roger Binns Mar 08, 2004 05:09 am
Georgy Mar 08, 2004 09:09 am
Jeff Epler Mar 08, 2004 02:50 pm
Mel Wilson Mar 08, 2004 03:43 pm
Roger Binns Mar 08, 2004 08:43 pm
Donn Cave Mar 08, 2004 10:46 pm
Roger Binns Mar 09, 2004 12:11 am
Stephen Horne Mar 08, 2004 03:30 am
Stephen Horne Mar 08, 2004 05:29 am
Y2KYZFR1 Mar 08, 2004 04:41 pm
Lou Pecora Mar 08, 2004 04:59 pm
Y2KYZFR1 Mar 09, 2004 04:36 pm
Joe Mason Mar 09, 2004 04:57 pm
Roy Smith Mar 09, 2004 05:25 pm
Joe Mason Mar 09, 2004 06:53 pm
Roy Smith Mar 09, 2004 09:05 pm
Roger Binns Mar 10, 2004 04:23 am
Roy Smith Mar 10, 2004 01:52 pm
Isaac To Mar 10, 2004 05:36 am
Jeff Epler Mar 10, 2004 01:05 pm
Isaac To Mar 10, 2004 03:58 pm
gabor Mar 10, 2004 04:34 pm
Donn Cave Mar 10, 2004 05:49 pm
Stephen Horne Mar 11, 2004 03:02 am
Jacek Generowicz Mar 11, 2004 11:40 am
Stephen Horne Mar 11, 2004 02:50 pm
Stephen Horne Mar 09, 2004 09:43 pm
Roy Smith Mar 09, 2004 09:55 pm
Stephen Horne Mar 10, 2004 12:41 am
Isaac To Mar 10, 2004 05:16 am
Christopher A. Craig Mar 08, 2004 07:54 pm
leeg Mar 09, 2004 01:34 am
Stephen Horne Mar 09, 2004 09:52 pm
leeg Mar 10, 2004 02:46 pm
Stephen Horne Mar 10, 2004 04:58 pm
Roger Binns Mar 10, 2004 06:14 pm
David MacQuigg Mar 09, 2004 11:26 pm
Anton Vredegoor Mar 10, 2004 10:11 am
Messages in this thread