Suggest more finesse, please. I/O and sequences.

From: Qertoip <q...@o2.pl>
Fri, 25 Mar 2005 23:06:17 +0100
On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:

Thanks for your reply! It was really enlightening.

 
> How about:
>      for line in inFile:
>          for word in line.split():
>              try:
>                  corpus[word] += 1
>              except KeyError:
>                  corpus[word] = 1

The above is (probably) not efficient when an exception is thrown, which
happens the first time each new word is seen. However, I've just read about
the following:
corpus[word] = corpus.setdefault( word, 0 ) + 1
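To make the comparison concrete, here is a minimal sketch of both counting styles side by side. It assumes Python 3 syntax rather than the 2.4 used in this thread, and the function names and sample sentence are made up for illustration. It only shows that the two approaches produce the same counts; it does not measure which one is faster.

```python
# Sketch only: function names and sample data are hypothetical.

def count_with_try(words):
    """Count words, handling the first occurrence via KeyError."""
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            # Raised once per distinct word, on its first occurrence.
            corpus[word] = 1
    return corpus


def count_with_setdefault(words):
    """Count words without exceptions, using dict.setdefault."""
    corpus = {}
    for word in words:
        # setdefault returns the stored count, inserting 0 if the key is missing.
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


sample = "the cat and the hat and the bat".split()
assert count_with_try(sample) == count_with_setdefault(sample)
```

Whether the exception path is actually slower depends on how many distinct words the input has, so timing both on real data would be the way to settle it.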


>> wordsLst = wordsDic.items()
>> wordsLst.sort( moreCommonWord )
> OK, here I'm going to get version specific.
> For Python 2.4 and later:
>      words = sorted((-freq, word) for word, freq in corpus.iteritems())

This is my favorite! :) You managed to avoid moreCommonWord() through the
clever use of a generator expression and sequence comparison rules.
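A small sketch of why this works, assuming Python 3 syntax (so `items()` instead of `iteritems()`) and a made-up `corpus`: tuples compare element by element, so negating the frequency puts the most common words first, and ties fall back to alphabetical order on the word.

```python
# Hypothetical counts, chosen so that two words tie on frequency.
corpus = {"pear": 2, "apple": 5, "fig": 2}

# Each item becomes (-freq, word); sorted() compares the first element,
# then the second when the first elements are equal.
words = sorted((-freq, word) for word, freq in corpus.items())

# words == [(-5, 'apple'), (-2, 'fig'), (-2, 'pear')]
```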


> After python 2.2:
>   for negfrequency, word in words:
>       print >>outFile, '%7d : %s' % (-negfrequency, word)

This is also cool, I didn't know about this kind of 'print' usage.
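For readers on Python 3, where `print` is a function, the same redirection is written with the `file=` keyword argument. The sketch below assumes Python 3 and uses an in-memory `io.StringIO` as a stand-in for the opened output file; the `words` list is made up.

```python
import io

words = [(-5, "apple"), (-2, "fig")]  # hypothetical (negFreq, word) pairs

out_file = io.StringIO()  # in-memory stand-in for an opened output file
for neg_freq, word in words:
    # Python 3 equivalent of: print >>outFile, '%7d : %s' % (-negFreq, word)
    print('%7d : %s' % (-neg_freq, word), file=out_file)

report = out_file.getvalue()
```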


> So, with all my prejudices in place and python 2.4 on my box, I'd
> lift a few things to functions:

While I like your functionality and reusability improvements, I will stick
to my as-simple-as-possible solution for the given requirements (which I
didn't mention, and which assume correct command line arguments, for example).

Therefore, the current code is:
-------------------------------------------------------------------------
import sys

corpus = {}
inFile = open( sys.argv[1] )
for line in inFile:
	for word in line.split():
		corpus[word] = corpus.setdefault( word, 0 ) + 1
inFile.close()

words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

outFile = open( sys.argv[2], 'w')
for negFreq, word in words:
	print >>outFile, '%7d : %s' % ( -negFreq, word )
outFile.close()
-------------------------------------------------------------------------

Any ideas how to make it even better? :>


-- 
Regards,
Piotrek
