ASPN ActiveState Programmer Network
ActiveState
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups


Recent Messages
List Archives
About the List
List Leaders
Subscription Options

View Subscriptions
Help

View by Topic
ActiveState
.NET Framework
Open Source
Perl
PHP
Python
Tcl
Web Services
XML & XSLT

View by Category
Database
General
SOAP
System Administration
Tools
User Interfaces
Web Programming
XML Programming


MyASPN >> Mail Archive >> python-list
python-list
Re: performance: initializing dictionary elements
by Skip Montanaro other posts by this author
May 21 2002 3:54PM messages near this date
performance: initializing dictionary elements | Python User Group in Jena/Germany
George>  I found Skip Montanaro's page of performance tips and the
    George>  subsection "Initializing Dictionary Elements" presented an
    George>  optimization using exceptions instead of conditional looping. I
    George>  tried this out for my operation (which was the same one
    George>  incidentally, a full-text index) but found that performance
    George>  actually suffered. I'm using Python 2.1.1. Does anyone have any
    George>  idea why this would happen?

The most likely reason is that Python is not static and I haven't done a
very good job keeping up with changes.  I wrote that missive several years
ago.  (I have a response to it in my Python mailbox dated 06 Nov 1996.)  I
should probably have timestamped the various sections, but didn't think of
it at the time.

I wrote a small script to test these two approaches (appended).  It uses two
sources for words.  One used all the "words" in the tex source for the
library reference manual (234k words, 33k unique).  The other uses
/usr/share/dict/words (45k words, all unique).  I also added another
approach to initialization that uses {}.get().  Dict's hadn't yet grown that
method when I wrote the original document.

In both cases all three versions perform about the same (within about 20% of
each other).  Note that using python's -O flag has an effect on the results
because it removes fewer SET_LINENO instructions from the getinit() function
than from the other two...

-- 
Skip Montanaro (skip@[...].com - http://www.mojam.com/)
"Excellant Written and Communications Skills required" - seen on chi.jobs

import glob, time

def ifinit(words):
    wdict = {}
    has_key = wdict.has_key
    for word in words:
        if not has_key(word): wdict[word] = 0
        wdict[word] = wdict[word] + 1
    return wdict

def excinit(words):
    wdict = {}
    for word in words:
        try:
            wdict[word] = wdict[word] + 1
        except KeyError:
            wdict[word] = 1
    return wdict

def getinit(words):
    wdict = {}
    get = wdict.get
    for word in words:
        wdict[word] = get(word, 0) + 1
    return wdict

def countwords(words):
    print "corpus is", len(words), "words"

    t = time.time()
    for i in range(5):
        wdict = ifinit(words)
    print "using if: %.2f seconds to process" % (time.time()-t),
    print len(wdict), "unique words"

    t = time.time()
    for i in range(5):
        wdict = ifinit(words)
    print "using try/except: %.2f seconds to process" % (time.time()-t),
    print len(wdict), "unique words"

    t = time.time()
    for i in range(5):
        wdict = getinit(words)
    print "using get(): %.2f seconds to process" % (time.time()-t),
    print len(wdict), "unique words"

    print

words = []
for file in glob.glob("/home/skip/src/python/head/dist/src/Doc/lib/*.tex"):
    words.extend(open(file).read().split())
countwords(words)

countwords(open("/usr/share/dict/words").read().split())



-- 
http://mail.python.org/mailman/listinfo/python-list
Thread:
George Thomas
Skip Montanaro

Privacy Policy | Email Opt-out | Feedback | Syndication
© ActiveState Software Inc. All rights reserved