Learning Python 2nd Edition

Book Updates Page

This page lists recent changes to Python that impact the book Learning Python 2nd Edition, as well as assorted author's notes on the book itself.

Most recent change to this page: June 15, 2005



New Features in Python

Recent Python releases include a number of enhancements that address some issues discussed in the book. Python 2.4 was released after this book's publication.


The enumerate function

The new enumerate built-in function allows you to obtain both values and their positions within a for loop. In practice, it can obviate the need to keep a loop counter manually. Use the list built-in to force enumerate to generate the position/value pairs all at once (for loops and other iteration contexts grab one pair on each iteration).

>>> S = "abcdefghij"

>>> e = enumerate(S)
>>> e
<enumerate object at 0x00B5EB70>
>>> e.next()
(0, 'a')
>>> e.next()
(1, 'b')
 
>>> list(enumerate(S))
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
 
>>> for (i, x) in enumerate(S):
	print x * i

b
cc
ddd
eeee
fffff
gggggg
hhhhhhh
iiiiiiii
jjjjjjjjj


The sorted function

The new sorted built-in function returns a new list containing the result of sorting another list (really, any iterable object's values). Because it is not an in-place operation on the original list, it can come in handy to work around the gotcha that occurs when trying to step through a dictionary's values in sorted-key order. Note that the last example below also uses the fact that a dictionary's iterator returns one key at a time--you don't need to call D.keys():

>>> D = {'a':1, 'b':2, 'c':3}
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> 
>>> for k in D.keys().sort():        # bad
	print D[k]

Error...

>>> Ks = D.keys()
>>> Ks.sort()
>>> for k in Ks:                     # okay
	print D[k]

1
2
3

>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> 
>>> for k in sorted(D):              # okay (2.4)
	print D[k]
	
1
2
3

A similar new function, reversed(seq), returns a new reverse iterator object useful in for loops and other iteration contexts, instead of reversing in-place like the list reverse method.
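
For instance, here is reversed at work interactively (a minimal illustration; the results shown are from Python 2.4):

>>> list(reversed([1, 2, 3, 4]))
[4, 3, 2, 1]

>>> for c in reversed('spam'):
	print c,

m a p s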


Generator expressions

In 2.4, you can use parentheses instead of square brackets around a list comprehension. If you do, then instead of producing a list containing all the loop's results, the expression produces a generator object that yields one result at a time in iteration contexts (e.g., in for loops). If the comprehension produces many values, this can save memory, because the results list is never physically created. The parentheses used for a function call suffice to enclose the expression.

>>> lines = [line[:-1] for line in open('lgmt2.py')]
>>> lines
['# comment here', 'import os', 'print os.getcwd()', 'print 2 ** 750']
 
>>> lines = (line[:-1] for line in open('lgmt2.py'))
>>> lines
<generator object at 0x00B368A0>
>>> lines.next()
'# comment here'
>>> lines.next()
'import os'
 
>>> lines = list(line[:-1] for line in open('lgmt2.py'))
>>> lines
['# comment here', 'import os', 'print os.getcwd()', 'print 2 ** 750']


The set type

Sets are now available via the built-in function set. In 2.3, they were available in the sets module instead. The usual set operations (intersection, union, difference, and more) are available as expression operators.

>>> x = set('abcd')
>>> x
set(['a', 'c', 'b', 'd'])
 
>>> y = set('bxd')

>>> x & y                           # intersect
set(['b', 'd'])
 
>>> x | y                           # union
set(['a', 'c', 'b', 'd', 'x'])
 
>>> x - set('bc')                   # difference
set(['a', 'd'])


The decimal type

Python 2.4 includes a new numeric type, available in the decimal module, for applications where the potential inaccuracies of floating-point hardware are significant (e.g., for representing money or other accounting data). For instance, normal floating-point objects cannot exactly represent some values, as you can see if you accept the default "repr" display format at the interactive prompt; the print statement gives the user-friendly "str" format instead, which sometimes hides a real inaccuracy that can become significant in tests or accumulations:

>>> 35.72 + 1.73
37.449999999999996
>>> 35.72 - 1.73
33.990000000000002
>>> print 35.72 + 1.73
37.45

Decimal objects retain correct results:

>>> import decimal
>>> a = decimal.Decimal('35.72')
>>> b = decimal.Decimal('1.73')
>>> a + b
Decimal("37.45")
>>> a - b
Decimal("33.99")

A Context object in this module also allows for specifying precision (number of decimal digits), and rounding modes (down, ceiling, etc.):

>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1428571428571428571428571429")

>>> decimal.getcontext().prec = 4
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1429")

Here is another example of how decimal math gives correct results when floating-point math cannot (notice that using print to get the "str" display format doesn't do much here to hide the floating-point inaccuracy):

>>> 0.1 + 0.1 + 0.1 - 0.3
5.5511151231257827e-017

>>> print 0.1 + 0.1 + 0.1 - 0.3
5.55111512313e-017

>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal("0.0")

>>> Decimal('0.1') + Decimal('0.10') + Decimal('0.10') - Decimal('0.30')
Decimal("0.00")
>>> 0.1 + 0.10 + 0.10 - 0.30
5.5511151231257827e-017

Decimal objects work the same as floating-point objects in most contexts (that is, they support the same object interface), and the initial construction call determines their precision. For more details, see the Python library manual.


Third slice limit

This was mentioned very briefly in the book, but not illustrated. Using the third slice limit (the stride, or step) is often a simpler alternative to a three-argument range in for loops.

>>> S = 'abcdefghij'
>>> for i in range(0, len(S), 2):
	print S[i],
	
a c e g i
 
>>> for c in S[::2]:
	print c,
	
a c e g i

This is also a handy way to reverse sequences, because a negative stride makes the slice run right to left:

>>> S[::-1]
'jihgfedcba'

>>> S[7:2:-1]
'hgfed'


Dictionary construction forms

There are many ways to create dictionaries now, and this may not have been summarized as clearly as it might have been. Dictionaries can be created with literals, with keyword arguments, one field at a time, with zipped key/value pairs, and with initial key values:

literals

>>> D = {'name': 'Bob', 'age': 42, 'job': 'dev'}
>>> D
{'job': 'dev', 'age': 42, 'name': 'Bob'}

keywords

>>> D = dict(name='Bob', age=42, job='dev')
>>> D
{'job': 'dev', 'age': 42, 'name': 'Bob'}

field by field

>>> D = {}
>>> D['name'] = 'Bob'
>>> D['age']  = 42
>>> D['job']  = 'dev'
>>> D
{'job': 'dev', 'age': 42, 'name': 'Bob'}

zipped keys/values

>>> pairs = zip(['name', 'age', 'job'], ('Bob', 42, 'dev'))
>>> pairs
[('name', 'Bob'), ('age', 42), ('job', 'dev')]
>>> D = dict(pairs)
>>> D
{'job': 'dev', 'age': 42, 'name': 'Bob'}

key lists

>>> D = dict.fromkeys(['name', 'age', 'job'], '?')
>>> D
{'job': '?', 'age': '?', 'name': '?'}


String membership

The in membership test now works more generally on strings: as of 2.3, the left operand can be any substring, not just a single character.

>>> S = 'abcdefg'
>>> 
>>> if S[0] in 'xgdhalk':
	print 'yes'

yes

>>> X = 'p'
>>> if X == 's' or X == 'p' or X == 'a':
	print 'yes'

yes
>>> if X in 'spa':
	print 'yes'

yes
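
And multi-character substring tests now work directly, a check that formerly required the string find method or similar:

>>> 'cde' in S
True
>>> 'axc' in S
False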


Template substitution

Python 2.4 adds a simpler form of string substitution. The traditional way of substituting variables is with the % operator and dictionaries:

>>> '%(page)i: %(title)s' % {'page': 2, 'title': 'The Best of Times'}
'2: The Best of Times'

A Template class has been added to the string module that uses $ to indicate a simpler substitution:

>>> import string
>>> t = string.Template('$page: $title')
>>> t.substitute({'page': 2, 'title': 'The Best of Times'})
'2: The Best of Times'

Substitution values may be keyword arguments or dictionary keys:

>>> s = string.Template('$who likes $what')

>>> s.substitute(who='bob', what=3.14)
'bob likes 3.14'

>>> s.substitute(dict(who='bob', what='pie'))
'bob likes pie'
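
One related note: substitute raises a KeyError if a substitution name is missing; the class's safe_substitute method instead leaves such a placeholder intact:

>>> s.safe_substitute(who='bob')
'bob likes $what'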



General Book Notes

This section collects assorted notes and clarifications to the book's material, which are not a result of Python changes.


Reassigning built-in names is not necessarily bad

On page 202 in chapter 13, there is a warning about reassigning built-in names. For example, "open=42" hides the built-in open function by creating an "open" in a closer scope ("True=False" works too!). Avoiding such reassignments is probably a good idea in general, but note that this is not a problem unless you really do need to use the original built-in version of a name. If you don't, you're free to reuse built-in names for your own purposes.

For instance, Python programmers often use the name "file" as a variable, even though it is a built-in name that refers to the file datatype. This is not at all wrong, unless you need to refer to the datatype (and most programs do not). Similarly, using the name "dir" to refer to a directory name string is okay, unless you also need to call the built-in dir function in the same scope.

In fact, one of the reasons you are allowed to reassign built-in names is that there are too many of them to worry about (and new ones are added all the time). If all the built-in names were reserved words, you'd have to consult a long and changing list each time you picked a new variable name. That's why the built-ins are just preset names in the outermost scope; you can reuse any of them for your own variables, as long as you don't need to refer to the built-in version. Although it's important to understand the issue, enforcing a style that dictates avoiding all built-in names would be as bad as making all of them reserved words.

Moreover, you may really want to hide the built-in version at times -- to provide a custom open function that validates directory access, for example:

import __builtin__

def open(name, mode='r'):
    if approved(name):                       # approved(): your own access test,
        return __builtin__.open(name, mode)  # assumed defined elsewhere
    else:
        raise IOError, name + ' permission denied!'


Generators versus threads

In the advanced function topics chapter, generator functions (those containing a "yield" statement) are described as a way to distribute work over time, instead of computing many results all at once. This is true, but it should be noted that threads can often serve the same role. In a typical threaded program, a producer is spawned as a parallel thread of control that adds results to a queue; the main thread queries the queue periodically for results. In some ways, threading is a more general way to distribute computation over time than generators, since there may be multiple producer and consumer threads running in parallel. Generators, however, also retain their local scope as state information, and do not require thread support or synchronization of access to global data.
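
To illustrate, here is a minimal sketch of the threaded producer model just described, followed by a generator that plays the same role; the function names here are made up for this example:

import threading, Queue

def producer(queue):
    for i in range(5):
        queue.put(i ** 2)            # add results to the shared queue
    queue.put(None)                  # sentinel: no more results

queue = Queue.Queue()
threading.Thread(target=producer, args=(queue,)).start()
while True:
    result = queue.get()             # main thread fetches results
    if result is None: break
    print result,                    # prints 0 1 4 9 16

def producer2():                     # generator: state in local scope,
    for i in range(5):               # no threads or queue locking needed
        yield i ** 2

for result in producer2():
    print result,                    # prints 0 1 4 9 16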


Global variables and threads

Speaking of global data and threads: the book also advises avoiding global variables whenever possible. This too is true in general, but it should be noted that threaded function calls normally communicate by setting and checking global objects such as queues (all threads share the same global scope, as well as system internals). Threads are beyond the scope of this book; see Programming Python or other books for more details.


Relative speed of for loops, maps, and list comprehensions

The book makes a claim about the relative speeds of equivalent list comprehensions, map calls, and simple for loops. While the claim was true for the code tested and the time at which it was tested, it turns out that the speeds of these three alternatives are highly dependent on both Python version and subtle coding variations. In fact, as I'll show here, the differences are so subtle and convoluted as to make a single rule of thumb elusive. Because performance analysis is this complex, the book might have done better not to claim any relative differences at all; you should time your own code to get an accurate measure.

1. The effect of version numbers

The relative speed of list comprehensions, map calls, and simple for loops has changed since this edition was written. On page 232, the book gives the results for Python 2.2: map and list comprehensions were nearly twice as fast as a for loop statement, and list comprehensions were slightly quicker than a map. Because Python is optimized with nearly every new release, though, this is different today.

Below are some test runs that exercise all three alternatives under Python 2.2, 2.3, and 2.4, on the same machine (the test code itself follows the results). The output shows the total time for 1 million iterations per test. The results: in 2.4, the for loop is still roughly twice as slow as a list comprehension, but the map call is actually slower than the for loop. That is, the list comprehension is about twice as quick as both for loops and map calls. In fact, list comprehensions and for loops both became roughly twice as quick between 2.2 and 2.4, while map became slightly slower. All three began speeding up in 2.3, and all but map improved again in 2.4; map alone grew slower in that release.

Just for fun, I also threw in two similar nested list comprehensions (their result lists vary slightly), as well as a 2.4 generator expression. All three are quicker than a map or for loop, but do not beat the basic list comprehension (and are arguably tougher to understand).

C:\...>c:\python22\python timerseqs.py
listcomp        : 1.45699355644
mapfunc         : 1.50583607693
forloop         : 2.55587707376
--------------------
listcomp        : 1.4429356245
mapfunc         : 1.50292565117
forloop         : 2.56842755155
--------------------
nestedcomp1     : 2.42550240324
nestedcomp2     : 1.65157415258
Test finished.

C:\...>c:\python23\python timerseqs.py
listcomp        : 1.28096028965
mapfunc         : 1.48229788982
forloop         : 1.83916419545
--------------------
listcomp        : 1.28577291248
mapfunc         : 1.47013265652
forloop         : 1.84038250672
--------------------
nestedcomp1     : 2.30293207656
nestedcomp2     : 1.5596926425
Test finished.

C:\...>c:\python24\python timerseqs.py
listcomp        : 0.709736572665
mapfunc         : 1.58013015621
forloop         : 1.35704845169
--------------------
listcomp        : 0.72173055514
mapfunc         : 1.58334313439
forloop         : 1.34741901555
--------------------
nestedcomp1     : 1.26382878271
nestedcomp2     : 0.93540044894
gencomp         : 0.841136792525
Test finished.

As usual, you should always run timing tests on your own Python, because both version differences and hardware differences such as caching can have a huge impact on speed. The tests above were run under Windows XP, on a somewhat dated 800 MHz Pentium 3 laptop with 256 MB of memory. Below is the output of this test on a faster 2.8 GHz Windows XP machine; it's quicker, but the relative performance of the coding alternatives is the same:

C:\Python24>python timerseqs.py
listcomp        : 0.273727879835
mapfunc         : 0.602030159705
forloop         : 0.498468066111
--------------------
listcomp        : 0.283013537798
mapfunc         : 0.601513854923
forloop         : 0.499488582081
--------------------
nestedcomp1     : 0.399356399634
nestedcomp2     : 0.321446681618
gencomp         : 0.307038229244
Test finished.

And here is the test code I used to measure this; change the rep* variables to try different scenarios, and add print statements with slicing if you want to see what is being built. Notice the exec to conditionally compile the 2.4 generator expression test; it's illegal syntax before that version.

timerseqs.py

import time, sys
repouter = 1000        # build 1K result list 1K times
repinner = 1000        # 1K * 1K = 1M iterations per test

def runtest(func):
    start = time.clock()
    func()
    stop = time.clock()
    print func.__name__, '\t:', stop - start

def listcomp():
    for i in xrange(repouter):
        t = [x**2 for x in range(repinner)]

def mapfunc():
    for i in xrange(repouter):
        t = map((lambda x: x**2), range(repinner))

def forloop():
    for i in xrange(repouter):
        t = []
        for x in range(repinner):
            t.append(x**2)

def nestedcomp1():
    # but builds a longer list (1M)
    t = [x**2 for y in range(repouter) for x in range(repinner)]

def nestedcomp2():
    # but keeps many lists (1K * 1K)
    t = [[x**2 for x in range(repinner)] for y in range(repouter)]

def testcases():
    for i in range(2):
        runtest(listcomp)
        runtest(mapfunc)
        runtest(forloop)
        print '-'*20

    runtest(nestedcomp1)
    runtest(nestedcomp2)

    if sys.version_info[:2] >= (2, 4):
        # 2.4 generator expression syntax
        exec """def gencomp():
            for i in xrange(repouter):
                e = (x**2 for x in range(repinner))
                t = list(e)"""
        runtest(gencomp)

testcases()
raw_input('Test finished.')

2. The effect of coding styles: built-in functions and map

After writing the preceding section, I discovered that map can actually still be quicker than list comprehensions (and for loops) in 2.4, if the operation applied to all items is a built-in function call. That is, map's handicap seems to occur when using a lambda function as the per-item operation. For instance, if you change the test script listed in the prior section to apply the built-in abs() function to each item, instead of running x ** 2:

def listcomp():
    for i in xrange(repouter):
        t = [abs(x) for x in range(repinner)]

def mapfunc():
    for i in xrange(repouter):
        t = map(abs, range(repinner))

def forloop():
    for i in xrange(repouter):
        t = []
        for x in range(repinner):
            t.append(abs(x))

Then map wins -- it runs almost twice as fast as the list comprehension, and three times as fast as the for loop:

# running abs(x) instead of x**2:

listcomp        : 0.760260591779
mapfunc         : 0.441295192545
forloop         : 1.29827980931
--------------------
listcomp        : 0.722472828251
mapfunc         : 0.443123357857
forloop         : 1.30097987314
--------------------
nestedcomp1     : 1.22460480312
nestedcomp2     : 0.985956874407
gencomp         : 0.845157135893
Test finished.

And if you change the code to apply a str() built-in function call to each item (a more complex operation), you get similar relative performance, though the cost of str() outweighs some of the for loop's loss:

# running str(x) instead of x**2:

listcomp        : 4.08835564297
mapfunc         : 2.60152589226
forloop         : 4.63220744536
--------------------
listcomp        : 3.85740282634
mapfunc         : 2.53059816262
forloop         : 4.48695043644
--------------------
nestedcomp1     : 4.58792388418
nestedcomp2     : 4.3173263641
gencomp         : 3.74688989802
Test finished.

The apparent conclusion here: using lambda with map seems to be the bottleneck; with built-in function calls, map can be quickest.

Unfortunately, in the case of running something like x ** 2 on each iteration, there is no built-in function that can be applied to the items directly: we need to pass in 2 as an argument to the call, and so must provide a function that defers ("wraps") the call. map accepts only a function object to apply, not extra arguments to pass to that function. The pow(x, y) built-in would almost work, but pow doesn't know about the 2 -- we would still need to wrap the call to pow in another function, in order to pass the 2 into the calls to pow.

For example, the following fails, because of too few arguments -- there is no place to specify the 2:

>>> map(pow, range(10))
...error...

But this works, thanks to the lambda function that takes a single argument, and defers the call to pow with the extra argument:

>>> map((lambda x: pow(x, 2)), range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Hence, the lambda speed hit seems inevitable when using map, except for single-argument built-in functions. Because list comprehensions allow for general expressions, they should perform better whenever there is more than one argument to pass to the applied function or operation. map can be quickest when applying a one-argument built-in function call, but lambdas slow down map substantially.

3. The effect of coding styles: lambda and def

Actually, it's even a bit more subtle than that: the real bottleneck for map seems to be calling user-defined and Python-coded functions in general, whether they are created with lambda or def. For example, here is part of the test script and its output when using the original map/lambda combination -- map is slowest in 2.4 as previously observed:

def mapfunc():
    for i in xrange(repouter):
        t = map((lambda x: x ** 2), range(repinner))

C:\Python24>timerseqs.py
listcomp        : 0.767703437169
mapfunc         : 1.69127221476
forloop         : 1.45426052753
--------------------
listcomp        : 0.767938103865
mapfunc         : 1.75999184254
forloop         : 1.53275905178
--------------------

But if we replace the lambda with a local function def, it makes no noticeable difference in the operation's speed:

def mapfunc():
    for i in xrange(repouter):
        def square(x): return x ** 2
        t = map(square, range(repinner))

C:\Python24>timerseqs.py
listcomp        : 0.770341481948
mapfunc         : 1.72459656185
forloop         : 1.5332769947
--------------------
listcomp        : 0.816619151317
mapfunc         : 1.75671461038
forloop         : 1.45702875645
--------------------

And even if we hoist the def outside the inner loop, it still has no noticeable impact on the speed of the map test. In fact, def seems to have no significant speed advantage over the lambda; the real cost is calling a non-builtin function within map:

def mapfunc():
    def square(x): return x ** 2
    for i in xrange(repouter):
        t = map(square, range(repinner))

C:\Python24>timerseqs.py
listcomp        : 0.767686116532
mapfunc         : 1.71990881523
forloop         : 1.45481227363
--------------------
listcomp        : 0.768373075349
mapfunc         : 1.693961104
forloop         : 1.45097240012
--------------------

4. The effect of coding styles: extra argument lists

One more twist: with clever coding, it is possible to provide the extra 2 argument for the square operation without a wrapping function layer, by passing more than one sequence to map. Given two sequences, map pairs up items taken in parallel from the sequences, and passes them as two arguments to the function on each iteration (or stuffs them into tuples if the function is None):

>>> [2] * 5
[2, 2, 2, 2, 2]

>>> map(None, range(5), [2]*5)
[(0, 2), (1, 2), (2, 2), (3, 2), (4, 2)]

Now, giving map a list of many 2s allows us to apply the built-in pow(x,y) to the list, to generate squares:

>>> repinner = 10
>>> t = map(pow, range(repinner), [2]*repinner)
>>> t
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Below is the updated test script code and output using this technique. This makes the map operation roughly twice as fast as when passing in a user-defined function, and almost as fast as the comprehension. On the other hand, it seems like a bit of a trick, and may not be a general solution to the extra-arguments dilemma; in some cases, extra arguments may be more complex to generate.

def mapfunc():
    for i in xrange(repouter):
        t = map(pow, range(repinner), [2]*repinner)

C:\Python24>timerseqs.py
listcomp        : 0.769735818379
mapfunc         : 0.990321954327
forloop         : 1.45857364553
--------------------
listcomp        : 0.770787907402
mapfunc         : 0.993462576948
forloop         : 1.44713308535
--------------------

5. The effect of coding styles: function calls and comprehensions

And if the prior sections have not muddied the waters enough, list comprehensions can wind up slowing down by a factor of 2 when function calls instead of simple expressions are applied. In the original, we run the expression x ** 2:

def listcomp():
    for i in xrange(repouter):
        t = [x ** 2 for x in range(repinner)]

C:\Python24>timerseqs.py
listcomp        : 0.775519234987
mapfunc         : 0.984034563052
forloop         : 1.45646723257

But changing this to call the pow() built-in function instead negates the list comprehension's win completely; it runs twice as slow, and map prevails again:

def listcomp():
    for i in xrange(repouter):
        t = [pow(x, 2) for x in range(repinner)]

C:\Python24>timerseqs.py
listcomp        : 1.40437038786
mapfunc         : 0.985775007717
forloop         : 1.45322408295
--------------------

A deeper understanding of this slowdown, and of the slowdown for map with user-defined functions, would require coming to grips with the way Python performs these tasks internally. Barring that, the best we can do is time specific cases.

6. Summary

As you can see, the speed of coding alternatives depends greatly on both Python version and the coding techniques used. As always, you should time your own specific code to get a clear performance picture. In this case, the simple task of measuring the relative speeds of for loops, comprehensions, and map is not as clear-cut as one might think. In the end, all we can say is that for the code tested, in 2.4:

- List comprehensions are usually the quickest, but can slow down by a factor of 2 when they apply function calls instead of simple expressions.
- map beats the other two alternatives when applying a built-in function, but is slowest when it must call a user-defined function, whether coded with lambda or def.
- for loops usually lag behind both of the other alternatives.

In other words, list comprehensions are often fastest (but can be made slow by coding style), map is highly influenced by coding styles (but can be fastest for some built-ins), for loops usually lag behind, and all of this has changed over time with new Python releases. Furthermore, all these findings apply only to the test script used; although reasonable, the structure of the code in the test script may be arbitrarily different from that of a particular application.

Of course, this may all change again in 2.5, and not every application is concerned with milliseconds. Performance analysis of Python code, though, is still a non-trivial undertaking.


Other ways to access global variables

It's possible to change both a local and a global version of the same variable name within a function. This was mentioned in a gotcha at the end of chapter 14, and in a footnote on page 284 in chapter 18, but not summarized in a single place. The following module illustrates all the techniques at once: assignment makes a simple name local, and the global declaration makes all occurrences global; importing the enclosing module or going through the sys.modules table, though, allows both a local and a global version to be changed (you do have to import the enclosing module by name, because it is not added to your scopes automatically).

thismod.py

var = 99

def local():
    var = 0                 # change local var
    
def glob1():
    global var              # declare global (normal)
    var += 1                # change global var

def glob2():
    var = 0                 # change local var
    import thismod          # import myself
    thismod.var += 1        # change global var

def glob3():
    var = 0                           # change local var
    import sys                        # import system table
    glob = sys.modules['thismod']     # get module object
    glob.var += 1                     # change global var

def test():
    print var
    local(); glob1(); glob2(); glob3()
    print var
    

>>> import thismod
>>> thismod.test()
99
102
>>> thismod.var
102


execfile() versus import+reload()

Page 36 in chapter 2 discusses the need to interactively reload a module once it has been imported already, after you make code changes (import only happens once per process by default). Page 39 goes on to suggest avoiding testing by imports and reloads, and using other launching techniques instead, such as the IDLE run menu pulldown.

That is still a good rule of thumb, but another alternative is to run programs from the interactive prompt by using the execfile("filename.py") built-in function. This will always run the current version of the code in the named file each time it is called, and so avoids the import/reload confusion. execfile works as though the code in the file were pasted in the place where the execfile call appears -- it runs the file's code in the namespace of the execfile() call itself, so that items defined in the file become variables in that scope (e.g., they will show up in the interactive prompt, and can be tested by name).

execfile does not actually import the module, though, and won't let you know if you've forgotten to save the changed file. More subtly, names assigned in the executed file may overwrite names that you are using interactively (you can work around this by passing in namespace dictionaries, but it's more complex). You won't be warned about the overwrites, and won't even notice until a name is no longer what you think it is. Because of that, the IDLE run menu or system command lines are still probably the best ways to go at first. Barring that, import and reload run the module in its own namespace, and so are at least not prone to silent name overwrites like execfile.
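
For instance, passing in an explicit dictionary shields your interactive names; this is a minimal sketch, assuming a file script.py that assigns a name spam:

>>> ns = {}
>>> execfile('script.py', ns)      # run the file's code with ns as its namespace
>>> ns['spam']                     # the file's assignments land in ns, not here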

Eventually, though, you'll probably outgrow the interactive prompt altogether, and put test code at the bottom of script files, perhaps wrapped up in __name__ == '__main__' tests. The interactive prompt is handy when starting out, but tends to require too much typing and retyping in general.
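
For reference, that idiom looks like the following; the test code at the bottom runs only when the file is executed as a program, not when it is imported (script.py and selftest are made-up names for this sketch):

script.py

def selftest():
    print 'running tests...'          # exercise this module's code

if __name__ == '__main__':            # true only when run, not imported
    selftest()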


Speed of linear versus sort-based min/max

On page 214 in chapter 13, three alternative implementations of min are presented -- two using linear for loop scans, and one using the list sort method. The book claims that the sort version is usually fastest, because of the speed of the underlying C sorting code (it's a hyper-optimized routine that actually picks one of two alternative algorithms, based upon some up-front analysis). This claim was true for the test cases run; apparently, though, the speed of the alternatives can vary widely depending on the relative ordering of the data to be sorted. That is, if the data is already mostly sorted, one scheme can beat the other; in truly random scenarios, and for large enough data sets, the linear nature of the for loop versions can sometimes beat the sort-based version.

Coding and timing this is left as an exercise for the reader (but see the list comprehension timing script above for hints). Note that these results can vary per Python release, because the list sort method may change. The point the book was making is that although it seems like a linear scan should beat a sort in almost all cases, the speed difference between a C-coded sorter and running Python virtual machine code can sometimes outweigh algorithmic differences, at least for some data sets. Timing your data sets is the only way to know for certain.


Generalizing min/max with classes

And since I mentioned min: page 215 gives an example that generalizes min and max into a single function, by making the caller pass in the desired comparison function (a less-than or greater-than function). The point of this exercise is to avoid code redundancy -- it's not a good idea to have two versions of a function that differ in just a single comparison operator; if you ever have to change one, you'll need to change both:

def minmax(test, *args):
    res = args[0]
    for arg in args[1:]:
        if test(arg, res):
            res = arg
    return res

def lessthan(x, y): return x < y
def grtrthan(x, y): return x > y

print minmax(lessthan, 1, 4, 6, 0, 3)                  # prints 0
print minmax(grtrthan, 1, 4, 6, 0, 3)                  # prints 6

print minmax((lambda x, y: x > y), 1, 4, 6, 0, 3)      # prints 6

import operator
print minmax(operator.gt, 1, 4, 6, 0, 3)               # prints 6

The function-based approach shown in the book works, but classes (discussed in a later chapter) are generally better at customizing software. For example, we could also generalize by coding a common superclass that implements the algorithm in just one place, and expects subclasses to provide the required comparison method. In a sense, you still pass in the comparison implicitly, by making the right kind of subclass object:

class MinMax:
    def run(self, *args):
        res = args[0]
        for arg in args[1:]:
            if self.test(arg, res):
                res = arg
        return res

class Min(MinMax):
    def test(self, x, y): return x < y
    
class Max(MinMax):
    def test(self, x, y): return x > y

m = Min()
print m.run(1, 4, 6, 0, 3)                     # prints 0
print Max().run(1, 4, 6, 0, 3)                 # prints 6


Min/max one-liners using sorted()

And another thing: a student in a class I was teaching pointed out that the new 2.4 sorted() function allows for a one-line min solution, because it is not an in-place operation, and accepts any sequence/iterable including a tuple. The original sort-based min:

def min(*args):
    tmp = list(args)
    tmp.sort()
    return tmp[0]

can now be coded as follows (of course min is also a builtin, but this technique applies to other tasks too, such as dictionary key traversals):

def min(*args):
    return sorted(args)[0]
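
Incidentally, the same trick yields a one-line max, by taking the item at the other end of the sorted result:

def max(*args):
    return sorted(args)[-1]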



More Python Gotchas

This section clarifies and extends the discussions of common Python mistakes in the book.


Cross-module changes

This was mentioned in the modules design issues section, but may not have been illustrated as explicitly as it could have been. It's always okay to use names imported from another module, but changing names in a different module is generally a bad idea unless the change is widely known -- it's hard for a file's reader to anticipate such remote changes. For example, given a module file M.py:

X = 99

Any module can import and refer to its top-level (global) names -- that's how modules use other modules, and how program architecture works:

import M
print M.X ** 2

Changing a module's names from another file makes maintenance difficult, though:

import M
M.X = 88

because the change sets up very strong and remote coupling between the two files. A better way to communicate across file boundaries is to call functions, passing in arguments and getting back return values. This scheme isolates the point of coupling to well-known interfaces -- the reader of your code expects interaction at the point of function calls:

from M import func
result = func(input)


Initialize your variables

Again, this was stated, but perhaps could have been called out better. Python variables must always be initialized before they can be used at all. This means you must initialize counters to 0, lists to an empty list, and so on.
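
For instance, in the following sketch, removing either of the first two assignments would make the corresponding line inside the loop fail with a NameError:

count = 0                  # counters must start at 0
total = []                 # and lists must start empty
for x in [1, 2, 3]:
    count += 1             # legal: count already exists
    total.append(x * 2)    # same for the list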


Gotchas article

For another list of gotchas, see the article I wrote for O'Reilly at this page.


Nested scopes gotcha

The book discusses the way Python automatically looks up names in the local scopes of enclosing functions, in the scopes material of Chapter 13 on pages 203-206, and then again for lambda in Chapter 14 on pages 220 and 224. The discussion is accurate as is, and makes the point that it is usually no longer necessary to pass in values from the enclosing scope with default arguments.

It turns out, however, that there is one case where you'll still want to pass in values with defaults. Because of that, nested scopes are not a complete replacement for the prior coding pattern. This seems like a gotcha (in fact, I was recently burned by this myself, when I blindly got rid of all defaults in the lambdas of a GUI program--all the button callbacks created in a loop did the same thing!).

The issue is a bit subtle, but within a function, references to names in the enclosing scope (whether that is a function or module) are resolved when the function is called, not when it is created. Because of this, when the function is later called, such name references will reflect the latest (or final) assignments made to the names anywhere in the enclosing scope, which are not necessarily the values they held when the function was made.

By contrast, default argument values are evaluated when a function is created, not when it is later called. Because of that, they can be used to remember the values of enclosing scope variables as they were when you made the function. Unlike enclosing scope name references, defaults will not have a different value if the variable later changes in the enclosing scope. (In fact, this is why mutable defaults retain their state between calls -- they are created only once, when the function is made.)
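
As a quick aside, here is that mutable-default behavior at work; the list is created just once, when the def statement runs, and so accumulates values across calls:

>>> def grow(item, stash=[]):      # stash is created once, at def time
	stash.append(item)
	return stash

>>> grow(1)
[1]
>>> grow(2)
[1, 2]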

This is normally a non-issue, because most enclosing scope references name a variable that is assigned just once in the enclosing scope (e.g., "self"). This can be a problem if you create functions within a loop, though--if you reference the loop variable within such a function, it will evaluate to the value it was given on the last loop iteration in all the nested functions generated. By contrast, if you use defaults instead, each function will remember the current value of the loop variable, not the last.

The net effect is that when making a function within a loop, you still must use defaults to remember the loop variable. Here's some code to illustrate; in the following, "spam" from the enclosing scope is automatically remembered, even though the enclosing scope exits before the nested function is called:

def weird():
    tmp = (lambda: spam * 2)        # remembers spam
    spam = 42                       # even though not set till here
    return tmp

act = weird()
print act()     # prints 84

In this code, the variable "spam" hasn't even been assigned yet when the nested function is made; it's looked up when the function is called. To make that even more obvious, watch what happens in this code:

def weird():
    spam = 42
    handler = (lambda: spam * 2)     # func doesn't save 42 now
    spam = 50
    print handler()                  # prints 100: spam looked up now
    spam = 60
    print handler()                  # prints 120: spam looked up again now
    
weird()

Here, "spam" refers to its latest value in the enclosing scope, not what it was when the function was made. Now, if you're going to make functions within a loop, you have to apply this knowledge to the loop variable:

def odd():
    funcs = []
    for c in 'abcdefg':
       funcs.append((lambda: c))      # c will be looked up later
    return funcs                      # does not remember current c

for func in odd():
    print func(),       # print 7 g's, not a,b,c,... !

We've run into trouble here because the variable "c" in the nested lambda functions refers to its latest setting in the enclosing scope when the function is called--the value it was given on the last loop iteration. The effect is that all 7 functions have the same value for "c", and we get 7 "g"s printed, not "a b c...". To make this work as planned, we have to use defaults to save the current value of the loop variable, not its future value:

def odd():
    funcs = []
    for c in 'abcdefg':
       funcs.append((lambda c=c: c))    # force to remember c now
    return funcs                        # defaults eval now

for func in odd():
    print func(),       # okay: now prints a,b,c,...

The short story here is that enclosing scope lookup does simplify some code, but as implemented, it is not a complete replacement for passing in values with defaults. For more examples and details, watch for the upcoming 3rd Edition of the book Programming Python. The first GUI chapter of that book covers this phenomenon in a bit more depth (in fact, I borrowed much of this description from an early draft of that book).



Noteworthy Book Errata

This section discusses selected errata in the book; see the link to the O'Reilly-maintained errata list at the top of this page for the full book errata list.


Code bug, min function

The first few printings of the book had a nasty coding bug in chapter 13, page 214, function min1. It read:

def min1(*args):
    res = args[0]
    for arg in args[1:]:
        if arg < args:     # bad line
            res = arg
    return res

But it should have read:

def min1(*args):
    res = args[0]
    for arg in args[1:]:
        if arg < res:
            res = arg
    return res

This bug was also reflected in the test output that follows the code listing (and is made more mind-numbing by the fact that this example has been coded correctly hundreds of times during classes; typos happen, especially when publishing schedules shrink). See the discussion above about generalizing min/max with classes for more on this example.

