Learning Python 4th Edition: Recent Clarifications

Below are recent book clarifications: notes which provide additional coverage of language topics, and are intended as supplements to the book. The items on this page were posted after the first reprint of this book, which was dated January 2010. Any book changes they propose do not appear in either the first printing or the first reprint.

To make this page's content easier to apply in reprints, I've divided its notes into two sections--those that merit book changes, and those that do not. These lists are in no strict order, though items appear mostly by page number and/or date of addition.

Also note that much of what follows was later incorporated into 2013's newer 5th Edition of this book.

Highlights

Here are some of the main topics for which you'll find more coverage here:

• Unicode string storage, encodings, and their scope in Python 3.X
• Right-side operator overloading methods, and the __radd__ = __add__ aliasing pattern
• Default __X__ methods in the object superclass, and the new-style MRO
• Byte-code "magic" version numbers, and the 3.X package-relative import model
• Descriptor state, pickling bound methods, yield in __iter__, and the super() built-in

See also the older clarifications page for items already patched in reprints, and the corrections pages for genuine book errata. And for more book-related topics, see also the book's notes pages; in general, I put notes that are less technically in-depth on that page.

Update, 10/30/2011: I've stopped adding trivial clarifications made in reprints to the list below, because the redundancy proved too complex to manage. For all items recently patched in reprints, including a few additional clarifications not listed here, please see the confirmed errata list at O'Reilly's site, sorted by submission date.


Items that merit changes in reprints


  1. [Jan-4-12] Page 902: more on Unicode internal storage models

    I inserted a short footnote at the bottom of Page 902 in reprints to describe the internal storage of Unicode characters in Python 3.X (per a July 2010 note below). Because this is changing in 3.3, and because it looks like there is space on this page for elaboration, I want to change the footnote's current text:

    "It may help to know that Python internally stores decoded strings in UTF-16 (roughly, UCS-2) format, with 2 bytes per character (a.k.a. Unicode "code pont"), unless compiled for 4 bytes/character. Encoded text is always translated to and from this internal form, in which text processing occurs."

    to read as follows (reprints: please ask me how to shorten if this is too large to fit at the bottom of the page, as it's not worth changing page breaks; this ideally should have been a sidebar, but it's too late for that much change):

    "It may help to know that Python always stores decoded text strings in a encoding-neutral, multi-byte format in memory. All text processing occurs in this uniform internal format. Text is translated to and from an encoding-specific format only when it is transferred to or from byte strings, external text files, or APIs with specific ecoding requirements. Through Python 3.2, strings are stored internally in UTF-16 (roughly, UCS-2) format with 2 bytes per character, unless Python is configured to use 4 bytes/character. Python 3.3 and later will instead use a variable-length scheme with 1, 2, or 4 bytes per character, depending on a string's content. Either way, encoding pertains mostly to files and transfers; once loaded into a Python string, text in memory has no notion of encoding, and is simply a sequence of Unicode characters (a.k.a. "code points") stored generically."



  2. [Jul-29-11] Page 728: more on right-side operator overloading methods: __radd__ = __add__

    At the empty space at the bottom of page 728, after the final code listing on this page, add a new short paragraph that reads as follows with all __radd__ and __add__ in literal font (it looks like there's ample room, but please ask how to shorten if not):

    "For truly commutative operations which do not require special-casing by position, it is also sometimes sufficient to alias the right-side __radd__ to the left-side __add__, by simply assigning the former name to the latter at the top-level of the class statement. Right appearances will then trigger the single, shared __add__ method passing the right operand to self."

    Discussion only follows: On Pages 727-729 the book introduces right-side operator overloading methods such as __radd__ in sufficient though somewhat cursory fashion. As mentioned there, this was intentional, given that most applications programmers and readers of this book will do little operator overloading if any, and even fewer will need to implement commutative expression operations for their objects.

    As one example, __radd__ shows up nowhere in the follow-up book Programming Python 4E, despite the fact that that book constructs larger and fully functional programs including desktop email clients, webmail sites, text editors, and image viewers, some of which span thousands of lines of code. Right-side methods are more important for implementing objects of truly numeric nature, a task which is relatively rare in practice. Where required, the full story is readily available in Python's manuals.

    Still, Learning Python 4E omits a common coding pattern for the right-side operator methods, which I thought was present in earlier editions or other books, but is absent today. In short, for operations that are truly commutative, it's somewhat common for a class to simply alias a right-side appearance method such as __radd__ to the left-side appearance method such as __add__, by assigning the former name to the latter at the class level. Abstractly:
    class C:
        def __add__(self, other):
            ....
        __radd__ = __add__
    
    Because self is actually on the right side of the operator when __radd__ is invoked, the effect is to treat this case the same as left-side appearances: __radd__ triggers __add__ with operand order swapped. The following code illustrates, by tracing __add__ calls and arguments:
    # radd.py
    from __future__ import print_function   # for 2.7
    
    class C:
        def __init__(self, value):
            self.data = value
        def __add__(self, other):
            print(self, '+', other, '=', end=' ')
            return self.data + other
        __radd__ = __add__
        def __str__(self):
            return '[%s]' % self.data
    
    x = C(1)
    y = C(2)
    
    print(x + 3)    # [1] + 3   =>  left:  __add__
    print(3 + y)    # 3 + [2]   =>  right: __radd__==__add__
    print(x + y)    # [1] + [2] =>  both:  __add__, then __radd__==__add__
    
    When run, this code's print calls trace how every call is routed into the single __add__ method, with operands swapped for right-side appearances:
    ...>C:\Python32\python radd.py
    [1] + 3 = 4
    [2] + 3 = 5
    [1] + [2] = [2] + 1 = 3
    
    ...>C:\Python27\python radd.py
    [1] + 3 = 4
    [2] + 3 = 5
    [1] + [2] = [2] + 1 = 3
    
    There's no reason to define __radd__ separately as shown in the book's brief call-tracing examples, unless right-side appearances require special-case processing. For instance, consider the book's second Commuter class example:
    class Commuter:                  # Propagate class type in results
        def __init__(self, val):
            self.val = val
        def __add__(self, other):
            if isinstance(other, Commuter): other = other.val
            return Commuter(self.val + other)
        def __radd__(self, other):
            return Commuter(other + self.val)
        def __str__(self):
            return '<Commuter: %s>' % self.val
    
    This class works the same if it simply assigns __radd__ to __add__, though it must still do some type testing to avoid nesting Commuter objects in expression results (comment-out the "if" to see why):
    class Commuter:                  # Propagate class type in results
        def __init__(self, val):
            self.val = val
        def __add__(self, other):
            if isinstance(other, Commuter): other = other.val
            return Commuter(self.val + other)
        __radd__ = __add__
        def __str__(self):
            return '<Commuter: %s>' % self.val
    
    Trace this to see why the equivalence works. The book's examples are designed to trace calls or illustrate concepts, of course, but they could use simpler patterns in real code.

    Also notice that it's possible to achieve a similar effect by re-adding the operands in reverse order within __radd__ -- the following works the same as the aliasing form above -- but name aliasing by simple assignment is more direct and does not incur an extra call and operation:
    class C:
        def __add__(self, other):
            ....
        def __radd__(self, other):   # other + self (__radd__) => self + other (__add__)
            return self + other      # but __radd__ = __add__ more direct and quick
    



  3. [May-30-11] Page 778: The object superclass comes with some __X__ defaults

    A minor clarification for new-style classes, and all classes in 3.X: the built-in object class at the top of each class tree in this model also comes with a small handful of default __X__ operator-overloading methods. Run a dir(object) to see what these are.
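
    For reference, here is what that call displays in Python 3.1 (a listing of my own; the exact set varies slightly from version to version):
    >>> dir(object)
    ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',
    '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
    '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
    '__setattr__', '__sizeof__', '__str__', '__subclasshook__']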

    These defaults are described explicitly by the book's diamond search order discussion (especially on Page 787); are demonstrated by the book's Lister mix-in examples (Pages 758-767); and are mentioned at various __str__ method appearances (Pages 971 and 1031). Still, this might have been called out more explicitly in the introductory bullet lists too, and mentioned as a footnote in the operator overloading chapter, though this would be a forward reference there.

    Because of this, I posted a minor insert for reprints at this book's errata page at oreilly.com:
    On page 778, 5th line from bottom, change:
    "and all classes (and hence types) inherit from object."
    
    by adding text at the end to read:
    "and all classes (and hence types) inherit from object, 
    which comes with a small set of default operator overloading
    methods."
    
    This is described ahead on Page 787 and in the context of other
    examples, but it seems important enough to mention in this
    summary (and it looks like there is ample space on this page).  
    The default methods of object in new-style classes such as 
    __str__ can sometimes be problematic if not anticipated.
    



  4. [Sep-29-10] Page 27 and 534, note "magic" version number check for byte-code recompiles

    On page 27, at the very end of the 2nd paragraph that begins "Python saves", add this sentence: "Imports also check to see if the file must be recompiled because it was created by a different Python version, using a "magic" number in the byte-code file itself."

    Also, on page 534, at the very end of the second last paragraph which begins "Python checks", add another sentence: "As noted in Chapter 2, imports also recreate byte code if its "magic" Python version number does not match."

    Discussion only: Technically, in order to know if a recompile is required, imports check both source/bytecode timestamps as well as the bytecode file's internal version/implementation "magic" number. The book describes the timestamp check because it's fundamental to all users, but does not go into further detail about the extra magic number test because this is arguably more low-level and detailed than most Python newcomers using a single Python require. It becomes more important to know once you start installing new or alternative Pythons, of course, though it would be difficult to imagine how an incompatible Python could work at all without such a mechanism. See also the note about the upcoming byte-code storage model changes in Python 3.2.
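
    To make this more concrete, you can inspect the number yourself; here is a minimal sketch (imp.get_magic() returns the running Python's tag; the spam.pyc filename is hypothetical, and the byte values shown are Python 3.2's):
    >>> import imp
    >>> imp.get_magic()                      # this Python's byte-code version tag
    b'l\x0c\r\n'
    >>> open('spam.pyc', 'rb').read(4)       # first 4 bytes of a compiled file
    b'l\x0c\r\n'
    
    If the two differ, the import system considers the byte code stale and recompiles from source.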



  5. [Nov-3-10] Pages 164 (200, 233): note the use of 3.X print function form for 2.X readers

    A reader wrote with confusion about why a 3.X print call in one of the early examples did not run under his Python 2.X. To minimize confusion, expand the text of the comment on the second last line of page 164 to make the usage explicit; change the first of the following lines to the second:
    >>> for c in myjob: print(c, end=' ')    # Step through items
    
    >>> for c in myjob: print(c, end=' ')    # Step through items (3.X print call)
    
    Similarly, extend two comments on Pages 200 and 233 to add the same text; change these lines as follows (and make sure the indentation of both the code and the "#" characters in all these lines is the same as it was originally):
    ...     print(x, end=' ')            # Iteration (3.X print call)
    >>> for line in open('myfile'):      # Use file iterators, not reads (3.X print call)
    
    (Discussion only follows): The example on page 164 uses the Python 3.X print function, instead of the 2.X print statement. It doesn't say so explicitly, but the new print function in 3.X is described in the Preface (see Table P-2 in particular), and the 3.X/2.X printing differences are covered in depth later in the book (see page 298). To run on 2.6, use the following -- the 2.X trailing comma syntax works like the 3.X end=' ' keyword function argument to avoid a newline:
    >>> myjob = "hacker"
    >>> for c in myjob: print c,      # versus 3.X print(c, end=' ')
    
    This is an unfortunate byproduct of having to address 2 Python versions in one book. Per its Preface, this book is primarily 3.X by default, with coverage of 2.X divergences. In this case, the 3.X print form might have been called out explicitly, and we'll expand the comment in reprints as noted above to minimize confusion. In general, though, if the book exhaustively noted every occurrence of an incompatibility for 2.X readers, it would have been much larger than it already is. In fact, the end=' ' form appears two more times before the book gets to print call/statement details in Chapter 11, and there are many other instances of 3.X-only usage, some of which are undoubtedly not explicitly noted as such.

    When in doubt, refer to the tables of 3.X changes in the Preface (ideally, you should at least scan the Preface up front), and check the index for details on 3.Xisms that create unavoidable forward dependencies like this in a dual-version book.



  6. [Oct-13-10] Page 792, third sentence: main point lost by edit made during production

    In this sentence, change the clause: "but they incur an extra method call for any accesses to names that require dynamic computation." to read as worded in my original text: "but they incur an extra method call only for accesses to names that require dynamic computation."

    This clause describes how properties differ from tools like __getattr__, and the "only" in my original wording is really the main point. As changed by editors, that main point (the contrast that stems from their focus on a specific attribute instead of many) was lost.

    While we're at it, please add page 792 to the Index entry for "property built-in function" -- this is a crucial first definition of them.



  7. [Jul-7-10] Assorted Unicode clarifications

    Three related minor updates meant to clarify the scope of Python 3.X Unicode strings.

    1. Page 896, end of second last paragraph: Unicode -- clarify impacts (new sentence)
      At the very end of the paragraph which begins "Even if you fall into", add a new last sentence which reads: "Though applications are beyond our scope here, especially if you work with the Internet, files, directories, network interfaces, databases, pipes, and even GUIs, Unicode may no longer be an optional topic for you in Python 3.X."

      I'm adding this because the existing text seems a bit misleading, after seeing firsthand how much Unicode permeates 3.X applications work. See this note for related discussion. Reprints: delete the first clause of this new sentence if it won't fit as is; it looks like there is plenty of room.

      (This and 6 other Unicode items on this page arose from a recent reread of the Unicode chapter a year after writing it; it's fine as is, but a few key concepts could be polished with simple inserts in the next printing.)

    2. Page 901, start of 1st paragraph on page: Unicode -- same policy for read, write (reword)
      The start of this paragraph seems potentially misleading in retrospect--it's not clear if writes work the same as reads. This is clarified later on (see page 920 and later), but it may be worth tightening up here.

      Change: "When a file is opened in text mode, reading its data automatically decodes its content (per a platform default or a provided encoding name) and returns it as a str; writing takes a str and automatically encodes it before transferring it to the file."

      to read as this (the parenthesized part has been pulled out): "When a file is opened in text mode, reading its data automatically decodes its content and returns it as a str; writing takes a str and automatically encodes it before transferring it to the file. Both reads and writes translate per a platform default or a provided encoding name."

    3. Page 936, last sentence of page: Unicode -- mention filename tools too (new text)
      Change the last part of the text: "For more details on re, struct, pickle, and XML tools in general, consult" to read: "For more details on re, struct, pickle, and XML, as well as the impacts of Unicode on other library tools such as filename expansion and directory walkers, consult".

      The section here dealing with tools impacted by Unicode could also have mentioned that os.listdir returns decoded Unicode str for str arguments, and encoded raw binary bytes for bytes arguments, in order to handle undecodable filenames. In short, pass in the directory name as a bytes object to suppress Unicode decoding of filenames per the platform default, or else an exception is raised if any filenames fail to decode. Passing in a str invokes Unicode filename decoding on platforms where this matters.
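
      A quick interactive sketch of that str/bytes distinction (run under Python 3.X; the filenames listed are hypothetical):
      >>> import os
      >>> os.listdir('.')                  # str argument: decoded str filenames
      ['eggs.txt', 'spam.py']
      >>> os.listdir(b'.')                 # bytes argument: raw encoded bytes filenames
      [b'eggs.txt', b'spam.py']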

      By proxy, os.walk and glob.glob work the same way, because they use os.listdir internally to generate filenames in directories. This was omitted here because the section already encroaches on the language/applications line. Instead, the impacts of Unicode on these and other tools are covered in depth in the new 4th Edition of Programming Python, where application topics are collected in general.



  8. [Jul-2-10] Assorted Unicode clarifications

    Four related minor updates meant to clarify the scope of Python 3.X Unicode strings.

    1. Page 898: Unicode -- mention UTF-16 and UTF-32 in intro (new text)
      Near the end of the second last paragraph on this page, expand the start of the second last line by adding the parenthesized text in the following, to read: "sets in similar ways (e.g., UTF-16 and UTF-32 format strings with 2 and 4 bytes per character, respectively), but all of these". This is implied by later UTF-16 examples, but UTF-16 is so common on Windows now that it merits a word here.

    2. Page 899: Unicode -- bytes is for encoded str too (new text)
      At the second bullet item in the second bullet list on this page, add the following text in parentheses at the end, so that the bullet item reads: "* bytes for representing binary data (including encoded text)". This is shown and implied in later examples, but this seems like a key link concept.

    3. Page 900: Unicode -- internal str format (new footnote)
      I avoided internals discussion in this chapter on purpose, using terms such as "character" instead, but in retrospect some readers might find a more tangible model useful too. Add a footnote at the bottom of page 900, with its star at the very end of the last paragraph before header "Text and Binary Files", which reads:

      "It may help to know that Python internally stores decoded strings in UTF-16 (roughly, UCS-2) format, with 2 bytes per character (a.k.a. Unicode "code pont"), unless compiled for 4 bytes/character. Encoded text is always translated to and from this internal string form, in which text processing occurs.".

      Reprints: if this doesn't fit at the bottom of this page as is, please ask me how it could be shortened.

    4. Page 909: Unicode -- "conversion" means encoding differently (new sentence)
      At the very end of the last paragraph on this page, add the following new sentence: "Either way, note that "conversion" here really just means encoding a text string to raw bytes per a different encoding scheme; decoded text has no encoding type, and is simply a string of Unicode code points (a.k.a. characters) in memory.".
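
      For instance, a minimal 3.X demonstration of this insert's point (a listing of my own, not the book's):
      >>> S = 'spam'
      >>> S.encode('latin-1')              # "conversion" makes new, encoded bytes
      b'spam'
      >>> S.encode('utf-16')               # same text, different encoding scheme
      b'\xff\xfes\x00p\x00a\x00m\x00'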



  9. [Aug-1-10] Page 1139 and entire Index: Index additions list with comments

    A reader posted a nice list of Index additions on O'Reilly's errata site for this book, and I replied there with a handful of clarifications and additions (I won't repeat the details here). The indexes which O'Reilly creates have improved much over the years, and this book is primarily tutorial rather than reference. Instead, Python Pocket Reference provides a quick-reference supplement in a more concise format. Still, we should try to pick up as many of these additions in a future reprint as space allows; this is exactly the sort of reader feedback needed to make improvements in this department.



  10. [Sep-1-10] Page 976 and 954: Note that descriptor state cannot vary per client class instance

    Add a new sentence at the very end of paragraph 3 on page 976, which reads "The downside of this scheme is that state stored inside a descriptor itself is class-level data which is effectively shared by all client class instances, and so cannot vary between them.".

    Also, after the first sentence of the last paragraph on page 954, add a new sentence which reads "Unlike data stored in the descriptor itself, this allows for data that can vary per client class instance.". It looks like there is space for both inserts, but please ask me how to shorten if not.

    (Discussion only follows): There is an implication of descriptor state options which might have been called out more explicitly than it was. Crucially, storing state in the descriptor instance instead of the owner (client) class instance means that the state will be effectively shared by all owner class instances. That is, because descriptors are class-level data, their content cannot vary per instance of client classes. To see this at work, in the descriptor-based CardHolder example on page 976-977, try printing attributes of the "bob" instance after creating the second instance, "sue". The values of sue's managed attributes ("name", "age", and "acct") effectively overwrite those of the earlier object bob, because both share the same, single descriptor instance attached to their class:
    class CardHolder: ...as is...
    
    bob = CardHolder('1234-5678',  'Bob Smith', 40, '123 main st')
    print(bob.name, bob.acct, bob.age, bob.addr) 
    
    sue = CardHolder('5678-12-34', 'Sue Jones', 35, '124 main st')
    print(sue.name, sue.acct, sue.age, sue.addr)    # addr differs: cardholder instance data
    print(bob.name, bob.acct, bob.age, bob.addr)    # name,acct,age same: descriptor data!
    
    ...> C:\Python31\python test.py
    bob_smith 12345*** 40 123 main st
    sue_jones 56781*** 35 124 main st
    sue_jones 56781*** 35 123 main st
    
    There are valid uses for descriptor state, of course (to manage descriptor implementation, for example), and this code was implemented to illustrate the technique. Moreover, the state scope implications of class versus instance attributes should be more or less a given at this point in the book. However, in this particular use case, attributes of CardHolder objects are probably better stored as per-instance data instead of descriptor instance data, perhaps using the same __X naming convention as the property-based equivalent to avoid name clashes in the instance.
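
    For illustration, here is a minimal sketch of that per-instance alternative (the attribute logic is simplified and hypothetical, not the book's actual CardHolder code; derive the classes from object to run this under 2.X):
    class Name:
        def __get__(self, instance, owner):
            return instance._name                # state lives in the client instance
        def __set__(self, instance, value):
            instance._name = value.lower().replace(' ', '_')
    
    class CardHolder:
        name = Name()                            # the descriptor object is still shared,
        def __init__(self, name):                # but the data it manages now is not
            self.name = name
    
    bob = CardHolder('Bob Smith')
    sue = CardHolder('Sue Jones')
    print(bob.name, sue.name)                    # bob_smith sue_jones: varies per instance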



  11. [Sep-27-10] Chapter 23: The 3.X package-relative import model precludes using directories as both program and package

    If we have space, on Page 569, 3rd paragraph from the end, extend the second sentence with the following's parenthesized text to read: " ..., they can keep saying just import utilities and expect to find their own files (when they are run as top-level programs, at least; per the next section, when used as a package in 3.X, their same directory inter-package imports may need to be changed to use absolute directory paths or package-relative imports). ".

    Also if we have space, add a paragraph to the end of the note on page 580, which reads: " Python 3.X's package-relative import model today also complicates using a directory of code used as both program and library. To import a file from the same directory, an inter-package importer generally must use package-relative syntax when it is being used in package mode, but cannot use this syntax when it is being used in non-package mode. Hence, you may need to either isolate externally visible files in their own package subdirectory; use fully specified package path imports instead; extend the import search path; or special-case imports per usage mode via the __name__ variable described in the next chapter. See the interactive prompt imports run earlier for equivalent cases. "

    (Discussion only follows): There is a bit of a gotcha to the Python 3.0 package-relative import model change for inter-package imports, which is implied by the examples and narrative of this chapter, but isn't called out or illustrated as explicitly as it might have been. I ran into it first-hand when updating some examples for Programming Python 4th Edition. In short, because 3.X:

    1. Does not search a package's own directory when it's used in package mode unless "from ." package-relative syntax is used, and
    2. Does not allow "from ." syntax to be used unless the importer is being used as part of a package,

    you can no longer directly create directories that serve as both standalone programs and importable packages--because import syntax can vary per usage mode, importers in such directories may need to pick between package-relative import syntax (and assume use as a package only) or normal import syntax (and assume non-package usage only). The workarounds are as follows:

    1. Always use fully specified "dir.dir.mod" absolute package imports instead of "from ." package relative imports,
    2. Specialize your import statements according to their usage context (package or program) by testing __name__,
    3. Add the package's directory to the sys.path module search path directly, or
    4. Move all files meant to be visible outside a directory into a nested subdirectory package so they are always used in package mode

    The latter may be the ultimate solution, but it implies substantial program restructuring for existing code meant to be used as both program and importable library. This cropped up in multiple cases in the PP4E book, but as a simple case, the PyEdit text editor is meant both to be run standalone, and to be imported as attachable component classes. Since this system is nested in the PP4E package, it is referenced with absolute package import syntax by clients outside the package:
    from PP4E.Gui.TextEditor import textEditor   # component and pop up
    
    In Python 2.X, PyEdit's own files imported files in its own directory with simple imports, relying on 2.X's implied package directory relative imports model:
    import textConfig    # startup font and colors
    
    This worked in 2.X for both package and top-level program usage modes. However, unless this module is also located elsewhere on the import search path, this fails in package mode in 3.X because the package directory itself is no longer searched. Simply using package-relative imports:
    from . import textConfig
    
    suffices when PyEdit is imported externally, but then fails when it is run standalone, because "from ." is allowed only for code being used as a package. To work around cases where the text config file had to be imported from the package directory, I specialized the imports per usage mode:
    if __name__ == '__main__':
        from textConfig import (               # my dir is on the path
            opensAskUser, opensEncoding,
            savesUseKnownEncoding, savesAskUser, savesEncoding)
    else:
        from .textConfig import (              # always from this package
            opensAskUser, opensEncoding,
            savesUseKnownEncoding, savesAskUser, savesEncoding)
    
    Other cases instead run a top-level script one level up from the package subdirectory to avoid the conflict. Restructuring PyEdit as a top-level script plus a package subdirectory may be arguably better, but seems like too much of a change to existing code just to accommodate the new model. Moreover, using full absolute paths from the PP4E root in every import seems to be overkill in the cases I observed, and is prone to requiring updates if directories are moved.

    I'm not sure if such a dual program/library role was taken into account in the 3.X inter-package import model change (indeed, package-relative import semantics is being discussed anew on the Python developers list as I write this note), but it seems to be a primary casualty.



  12. [Nov-22-10] Pages 767, 786: more on new-style inheritance method resolution order (MRO)

    Two inserts in the name of completeness.

    First, on page 767, at the end of the very last paragraph before the note box on this page, add the following new sentence ("class.mro()" in both of the text inserts should be literal font): " For more ideas, see also Python manuals for the class.mro() new-style class object method, which returns a list giving the class tree search order used by inheritance; this could be used by a class lister to show attribute sources. ".

    Second, at the very end of the last paragraph on page 786, add a new sentence which reads: " To trace how new-style inheritance works by default, see also the class.mro() method mentioned in the preceding chapter's class lister examples. ".

    [Discussion only follows] I resisted a formal description of new-style class method resolution order (MRO -- the order in which inheritance searches classes in a class tree), partly because most Python programmers don't care and probably don't need to care (this really only impacts diamonds, which are relatively rare in real-world code); partly because it differs between 2.X and 3.X; and partly because the details of the new-style MRO are a bit too arcane and academic for this book. As a rule, this book avoids formal, rigid description, and prefers to teach informally by example; see its treatment of function argument matching for another example of this approach.

    Having said that, some readers may still have an interest in the formal theory behind new-style MRO. If this set includes you, it's described in detail online at: this web page.

    Apart from such formalities, if you just want to see how Python's new-style inheritance orders superclasses in general, new-style classes (and hence all classes in 3.X) have a class.mro() method which returns a list giving the linear search order. Here are some illustrative examples:
    >>> class C: pass
    >>> class A(C): pass         # diamonds: order differs for newstyle
    >>> class B(C): pass         # breadth-first across lower levels
    >>> class D(A, B): pass
    >>> D.mro()
    [<class '__main__.D'>, <class '__main__.A'>, <class '__main__.B'>, 
    <class '__main__.C'>, <class 'object'>]
    
    >>> class C: pass
    >>> class A(C): pass         # nondiamond: order same as classic
    >>> class B: pass            # depth-first, then left-to-right
    >>> class D(A, B): pass
    >>> D.mro()
    [<class '__main__.D'>, <class '__main__.A'>, <class '__main__.C'>, 
    <class '__main__.B'>, <class 'object'>]
    
    >>> class X: pass
    >>> class Y: pass
    >>> class A(X): pass         # nondiamond: depth-first then left-to-right
    >>> class B(Y): pass         # though implied "object" always forms a diamond
    >>> class D(A, B): pass
    >>> D.mro()
    [<class '__main__.D'>, <class '__main__.A'>, <class '__main__.X'>, 
    <class '__main__.B'>, <class '__main__.Y'>, <class 'object'>]
    
    The mro method is only available on new-style classes (it's not present in 2.X unless classes derive from "object"). It might be useful to resolve confusion, and in tools that must imitate Python's inheritance search order. For instance, tree climbers such as the book's class tree lister (Chapter 30, Pages 757-767) might benefit, though climbers might also need to map this linear list to the structure of the tree being traced.


Items that are informational only


  1. [Jan-4-12] Python 3.2.0 breaks scripts using input() on Windows [LP4E]

    [No fix required] If a book example which uses the input() built-in seems to be failing, and you are using Python 3.2.0 in a Windows console window, see this post on this book's Notes page. This built-in was apparently broken temporarily in 3.2.0 (3.2) in Windows console mode, but has been fixed in later Python releases. The quickest fix is to upgrade to 3.2.1 or later, or try a different environment; the book examples work fine in all other Pythons and most other contexts such as IDLE.



  2. [Oct-17-11] More on pickle module constraints: bound methods

    Python's pickle object serialization module is mentioned a few times in this book: in Chapter 9 for flat files; in Chapter 27 to store an object database during a classes demo; in a Chapter 30 sidebar to describe storing a composite object; and in Chapter 36 in conjunction with string tool changes in 3.X (see the index for page numbers). Though really an application tool in the realm of the book Programming Python, which covers it in more depth, pickle has very broad utility, and is even at the heart of some newer distributed computing libraries such as Pyro -- a system which implements remote procedure calls by pickling function arguments and return values across network sockets, providing a Python-focused alternative to web service protocols such as XML-RPC and SOAP. Pickled data is also the transport medium in the newer multiprocessing module in Python itself -- a portable threading API implemented with processes.

    Learning Python doesn't go into much detail about the rules for what can and cannot be pickled, but only lists common types that can, and defers to Python's manuals and other books for more details. As described in those other resources, in general most built-in types and class instances can be pickled, but objects with system state such as open files cannot. Moreover, pickled functions, classes, and by proxy the classes of pickled instances, must all be importable -- they must live at the top of a module file on the import search path, because they are saved and loaded by name only. For such an object, pickle data records just the names of the function or class and its enclosing module, not its bytecode; unpickling reimports and fetches by name to recreate the original object. This applies to classes of pickled instances too: pickle saves the instance's attributes, and relinks them to the automatically imported class on loads.

    One notable item that cannot be pickled, which is implied but not mentioned explicitly in most other resources, is bound methods: callable method/instance pairs described explicitly on Pages 752-758. Python could not recreate the bound method properly if pickled. Technically, these fail because they do not conform to the importability rule for functions: class methods are not directly importable at the top of a module. More subtly, Python cannot pickle function objects except by name, and cannot assume that the function object referenced inside a bound method object originated from any particular name's binding. For instance, the original method name may have been reassigned in a class or instance between the time a bound method is created and pickled, and may thus reference an object different than the bound method's function if fetched anew.

    The net effect of all this is that you cannot serialize or store bound methods themselves, though you might devise other similar schemes that make assumptions reasonable for a given program. For example, a program may pickle an instance along with the desired method's name string, and fetch the method by name with getattr() after unpickling to call immediately or create a new bound method. In some cases it may also suffice to pickle a simple top-level function along with an instance to be passed into it after unpickling. The pickle module doesn't directly support such schemes itself, however.
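
    For instance, here is a minimal sketch of the name-string scheme just described (my own illustration, using the test.py class C shown in the session below; pickle itself provides no such convenience):
    import pickle
    from test import C                        # class must still be importable
    
    x = C(99)
    data = pickle.dumps((x, 'spam'))          # pickle instance plus method name string
    
    obj, name = pickle.loads(data)            # later, or in another program
    method = getattr(obj, name)               # recreate the bound method by name
    method()                                  # call it: prints 99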

    Here's an illustration of this limitation in code run with Python 3.1. The following creates, pickles, and unpickles an instance of an importable class. In this test the class lives in an importable module file, but the test works the same if this class is instead typed at the interactive shell where all this code runs, because the shell's namespace is then equivalent to the top of a module file (when typed interactively, the class is named __main__.C in object displays):
    >>> print(open('test.py').read())
    class C:
        def __init__(self, data):
            self.state = data
        def spam(self):
            print(self.state)
    
    >>> from test import C
    >>> X = C(99)
    >>> X.spam()
    99
    >>>
    >>> X
    <test.C object at 0x02695310>
    >>>
    >>> import pickle
    >>> pickle.dump(X, open('test.pkl', 'wb'))
    >>> pickle.load(open('test.pkl', 'rb'))
    <test.C object at 0x02695350>
    >>>
    >>> Y = pickle.load(open('test.pkl', 'rb'))
    >>> Y.spam()
    99
    
    As described in the book, bound methods allow us to treat an instance's methods as though they were simple callable functions -- especially useful in callback-based code such as GUIs to implement functions with state to be used while processing an event (see Pages 729-730 and the sidebar on Page 758 for more on this bound method role, as well as its __call__ alternative coding):
    >>> X
    <test.C object at 0x02695310>
    >>> X.spam()
    99
    >>>
    >>> X.spam
    <bound method C.spam of <test.C object at 0x02695310>>
    >>>
    >>> T = X.spam
    >>> T()
    99
    
    You won't be able to pickle bound (or unbound) methods directly, though, which precludes using them in roles such as persistently saved or transferred callback handlers without extra steps on unpickles:
    >>> pickle.dump(X.spam, open('test.pkl', 'wb'))
    Traceback (most recent call last):
    ...more...
    _pickle.PicklingError: Can't pickle <class 'method'>: attribute lookup builtins.method failed
    
    >>> pickle.dump(C.spam, open('test1.pkl', 'wb'))
    Traceback (most recent call last):
    ...more...
    _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
    
    Of course, pickling things like bound method callback handlers may not work in some cases anyhow, because the instance may contain state information that is valid in the pickling process only; references to GUI objects in callback handlers, for example, are likely invalid in an unpickling program. Unpickled state information might be less transient in other applications.

    I'm not marking this as a book update because this book doesn't go into this level of detail on pickling. See Programming Python and Python's Library Manual for more on pickle, as well as the related shelve module which adds access to objects by key. As described elsewhere, there is an additional pickle protocol for providing and restoring object state which may prove useful in this case. For instance, the __getstate__ and __setstate__ methods of pickled objects can be used for purposes such as reopening files on unpickling, and might be used to recreate a bound method when loading a pickled instance of a suitable wrapper class.



  3. [May-30-11] Pages 711-718: Using yield within the __iter__ method (or not!)

    (Note: I'm going to have more to say on this technique in the 5th Edition of this book; it is as implicit as this describes, but also does have some advantages in code size which are not described here as well as they might be.)

    [No fix required] I recently saw an iterator coding technique in Python standard library code which is described only tersely and abstractly in Python's own manuals, and implied but not covered explicitly in the book itself. Given that understanding this technique at all requires two big leaps of faith in the implicit and the magic, I'm not sure I would recommend it in general. Still, a brief look might help if you stumble onto it in code too.

    On Pages 711-718, the book teaches user-defined iterator objects by coding their classes to either return self, for a single-pass iteration:
    class C:
        def __iter__(self, ...):                    # called on iter()
            ...configure state
            return self
        def __next__(self):                         # called on next()
            ...use state                            # use .next() in 2.X
            ...return next or raise StopIteration
    
    or return a different object, to support multiple active iterations:
    class C:
        def __iter__(self, ...):
            return Citer(state)
    
    class Citer:
        def __init__(self, ...):
            ...configure state
        def __next__(self):
            ...use state
            ...return next or raise StopIteration
    
    This part of the book also compares such classes to generator functions and expressions, as well as simple list comprehensions, to show how the classes better support state retention and minimize memory requirements. Though not shown explicitly in the book, as implied directly by its coverage of generator functions on Pages 492-505, it's also possible to achieve similar effects by yielding values from the __iter__ method itself:
    class C:
        def __iter__(self, ...):        # __iter__ returns obj with __next__
            ...configure state          # yield makes this a generator
            for loop...:                # generators make objs with __next__
                yield next              # return raises StopIteration
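    
    To make this concrete, here is a minimal runnable sketch of the technique (the Squares class is my own illustration, patterned on the book's iterator examples):
    class Squares:
        def __init__(self, start, stop):
            self.start, self.stop = start, stop
        def __iter__(self):                          # called on iter(): the yield below
            for value in range(self.start, self.stop + 1):
                yield value ** 2                     # makes this a generator function
    
    for i in Squares(1, 5):                          # each iter() call makes a new generator,
        print(i, end=' ')                            # so multiple scans work: 1 4 9 16 25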
    
    This technique works too, but seems like too deep magic to me. To understand this at all, you need to know two very implicit things:

    • First, that __iter__ is invoked as a first step in iteration, and must return an object with a __next__ method (next in 2.X) to be called on each iteration. This is the iteration protocol in general, discussed in multiple places in the book; see the two iteration chapters especially.

    • Second, that this coding scheme only works because calling a generator function (a def statement containing a yield statement) automatically creates and returns an iterable object which has an internally created __next__ method, which automatically raises StopIteration on returns. This is the definition of generator functions, discussed in detail on Pages 492-505.

    In other words, this sort of __iter__ does return an object with a __next__ to be run later too, but only because that's what generator functions do automatically when they are first called. The combined effect is therefore the same as explicitly returning an object with an explicit __next__ method as in the book's examples, but there seems a magic multiplier factor at work here which makes the yield-based scheme substantially more obscure.

    I would even suggest that this qualifies the __iter__/yield scheme as non-Pythonic, at least by that term's original conception. Among other things, it soundly violates Python's longstanding EIBTI motto -- for Explicit is better than implicit, the second rule listed by the "import this" command of Python's underlying philosophies. (Run this command yourself at an interactive Python prompt to see what I mean; it's as formal a collection of goals and values as Python has.)

    Of course, the Python world and time are the final judges on such matters. Moreover, one could credibly argue that the very meaning of the term Pythonic has been modified in recent years to incorporate much more feature redundancy and implicit magic than it originally did. Consider the growing prominence of scope closure state retention in recent Python code, instead of traditional and explicit object attributes. The __iter__/yield iterator coding scheme is ultimately based on the former and more implicit of these, and reflects a growing shift in the language from object-oriented towards functional programming patterns.
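
    As a simple illustration of that contrast, compare these two coding styles (both sketches are my own; nonlocal is 3.X only):
    def counter():                     # functional style: state in an enclosing scope
        count = 0
        def incr():
            nonlocal count             # closure retains count between calls
            count += 1
            return count
        return incr
    
    class Counter:                     # object-oriented style: state in explicit attributes
        def __init__(self):
            self.count = 0
        def incr(self):
            self.count += 1
            return self.count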

    All of which is to me really just another instance of a general property I've observed often in the last two decades: Despite their many advantages, open source projects like Python sometimes seem to stand for no more than what their current crop of developers finds interesting. Naturally, whether you find that asset, liability, or both is up to you to decide.

    As a rule, though, and as underscored often in the book, code like this that requires the next programmer to experience "moments of great clarity" is probably less than ideal from a typical software lifecycle perspective. Academically interesting though such examples may be, magic and engineering do not generally mix very well in practice.



  4. [Feb-3-11] More concise coding option for transitive reloads example, page 596

    [No fix required] I was recently reviewing the transitive module reloading utility example on page 596, and noticed that it may be a bit more verbose than needed (a year's time has a way of affording fresh perspectives on such things). If I were to recode this today, I'd probably go with the version that follows -- by moving the loop to the top of the recursive function, it eliminates one of the two loops altogether. Compare this with the original in the book; it works the same, but is arguably simpler, and comes in at 4 lines shorter:
    """
    reloadall.py: transitively reload nested modules 
    """
    
    import types
    from imp import reload                               # from required in 3.0
    
    def status(module):
        print('reloading ' + module.__name__)
    
    def transitive_reload(objects, visited):
        for obj in objects:
            if type(obj) == types.ModuleType and obj not in visited:
                status(obj)
                reload(obj)                             # Reload this, recur to attrs
                visited[obj] = None
                transitive_reload(obj.__dict__.values(), visited)
    
    def reload_all(*args): 
        transitive_reload(args, {})
    
    if __name__ == '__main__':
        import reloadall                                 # Test code: reload myself
        reload_all(reloadall)                            # Should reload this, types
    
    Also keep in mind that both this and the original version reload only modules that were loaded with "import" statements; since names copied with "from" statements do not cause a module to be nested in the importer's namespace, their containing module is not reloaded. Handling "from" importers may require either source code analysis, or customization of the __import__ operation.

    If the recursion used in this example is confusing, see the discussion of recursive functions in the advanced function topics of Chapter 19; here is a simple example which demonstrates the technique:
    >>> def countdown(N):
            if N == 0:
                print('stop')                   # 2.X: print 'stop'
            else:
                print(N, end=' ')               # 2.X: print N,
                countdown(N-1)
    		
    >>> countdown(20)
    20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 stop
    
    For more on Python recursion, see also the recursive stack limit tools in the sys module (Python has a fixed depth limit on function calls, which you can increase for pathologically deep recursive use cases):
    >>> import sys
    >>> help(sys.setrecursionlimit)
    Help on built-in function setrecursionlimit in module sys:
    
    setrecursionlimit(...)
        setrecursionlimit(n)
        
        Set the maximum depth of the Python interpreter stack to n.  This
        limit prevents infinite recursion from causing an overflow of the C
        stack and crashing Python.  The highest possible limit is platform-
        dependent.
    
    >>> sys.getrecursionlimit()
    1000
    



  5. [Feb-5-11] More on using the super() built-in function in Python 3.X and 2.X ... or not!

    (Note: I'm going to have more to say on this call in the 5th Edition of this book; it has a valid use case--cooperative method dispatch in multiple inheritance trees--which is not given here, although this is still a rare and obscure role, relies on the esoteric MRO ordering of classes, and generally requires universal deployment in all classes of a tree to be used reliably--something that seems highly unrealistic in the many millions of lines of existing Python code.)

    [No fix required] This book very briefly mentions Python's super() built-in function on page 787, but this call probably merits further elaboration given its increase in popularity. Frankly, in my classes it seems to be most often requested by Java programmers starting to use Python anew, because of its conceptual origins in that language (for better or worse, many a new Python feature owes its existence to programmers of other languages bringing their old habits to a new tool). It was given limited coverage in this introductory-level book on purpose because it's arguably not best Python practice today, but it might help some readers to explain the rationale for that choice.

    Traditional form: portable, general
    In general, the book's examples prefer to call back to superclass methods when needed by naming the superclass explicitly, because this technique is traditional in Python; because it works the same in both Python 2.X and 3.X; and because it sidesteps limitations and complexities related to this call in both 2.X and 3.X, especially its weak support of multiple inheritance trees. For reasons I'll outline here, super() is not broadly used today, and might even be better avoided altogether, in favor of the more general and widely applicable traditional call scheme. As shown in the book, to augment a superclass method, the traditional superclass method call scheme works as follows:
    [Python 2.7 and 3.1]
    
    >>> class C:
    ...     def act(self):
    ...         print('spam')
    ...
    >>> class D(C):
    ...     def act(self):
    ...         C.act(self)          # 2.X and 3.X: name superclass explicitly, pass self
    ...         print('eggs')
    ...
    >>> X = D()
    >>> X.act()
    spam
    eggs
    
    This form works the same in 2.X and 3.X, follows Python's normal method call mapping model, applies to all inheritance tree forms, and does not lead to confusing behavior when operator overloading is used. To see why these distinctions matter, let's see how super() compares.

    Use in Python 3.X: a magic proxy
    One of the two goals of the super() built-in (see Python's manuals for the other) is to allow superclasses to be named generically in single-inheritance trees instead, in order to promote simpler code maintenance, and to avoid having to type long superclass reference paths at calls. In Python 3.X, this call seems at least on first glance to achieve this purpose well:
    [Python 3.1]
    
    >>> class C:
    ...     def act(self):
    ...         print('spam')
    ...
    >>> class D(C):
    ...     def act(self):
    ...         super().act()       # 3.X: reference superclass generically, omit self
    ...         print('eggs')
    ...
    >>> X = D()
    >>> X.act()
    spam
    eggs
    
    >>> super                       # a "magic" proxy object that routes later calls
    <class 'super'>
    
    This works, but perhaps the biggest potential downside with this call in 3.X is its reliance on deep magic: it operates by inspecting the call stack in order to automatically locate the self argument and find the superclass, and pairs the two in a proxy object which routes the later call to the superclass version of the method. If that sounds complicated and strange, it's because it is.

    Really, this call's semantics resembles nothing else in Python -- it's neither bound nor unbound itself, and somehow finds a self even though you omit one in the call. In single-inheritance trees, the superclass is available from self via the path self.__class__.__bases__[0], but the heavily implicit nature of this call makes this difficult to see, and even flies in the face of Python's explicit self policy that holds true everywhere else. That is, this call violates a fundamental Python idiom for a single use case. It also flies in the face of Python's general EIBTI rule at large (see earlier on this page for more on this rule).
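
    To illustrate that path, a quick 3.X sketch of my own, reusing the C and D classes of the preceding listing:
    >>> X = D()
    >>> X.__class__.__bases__[0]            # the single superclass, located explicitly
    <class '__main__.C'>
    >>> X.__class__.__bases__[0].act(X)     # roughly what super().act() achieves implicitly
    spam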

    Limitation in Python 3.X: multiple inheritance
    Besides its unusual semantics, even in 3.X this super() role really applies only to single-inheritance trees, not to multiple inheritance. This is a major limitation of scope; due to the utility of mix-in classes in Python, multiple inheritance is probably more the norm than the exception in realistic code. If your classes use more than one superclass, or may in the future, super() is essentially unusable -- it does not raise an exception for multiple inheritance trees, but will pick just the leftmost superclass, which may or may not be the one you want, and may silently mask the fact that you should really select superclasses explicitly in this case:
    [Python 3.1]
    
    >>> class A:
    ...    def act(self): print('A')
    ...
    >>> class B:
    ...    def act(self): print('B')
    ...
    >>> class C(A):
    ...    def act(self):
    ...       super().act()             # super applies to single-inheritance only
    ...
    >>> X = C()
    >>> X.act()
    A
    
    >>> class C(A, B):
    ...    def act(self):
    ...       super().act()             # doesn't fail on multi, but picks just one!
    ... 
    >>> X = C()
    >>> X.act()
    A
    
    >>> class C(B, A):
    ...    def act(self):
    ...       super().act()             # if B is listed first, A.act() is no longer run!
    ...
    >>> X = C()
    >>> X.act()
    B
    
    >>> class C(A, B):                  # traditional form
    ...    def act(self):               # you probably need to be more explicit here
    ...       A.act(self)               # this form handles both single and multiple inher
    ...       B.act(self)               # and works the same in both Python 3.X and 2.X
    ...                                 # so why use the super() special case at all?
    >>> X = C()
    >>> X.act()
    A
    B
    
    Here's a real world example of a case where super() does not apply, taken from the PyMailGUI case study in Programming Python 4th Edition -- the following very typical Python classes use multiple inheritance to mix in both application logic and window tools, and hence must invoke both superclass constructors explicitly with direct calls by name, because super() does not apply:
    class PyMailServerWindow(PyMailServer, windows.MainWindow):
        "a Tk, with extra protocol and mixed-in methods"
        def __init__(self):
            windows.MainWindow.__init__(self, appname, srvrname)
            PyMailServer.__init__(self)
    
    class PyMailFileWindow(PyMailFile, windows.PopupWindow):
        "a Toplevel, with extra protocol and mixed-in methods"
        def __init__(self, filename):
            windows.PopupWindow.__init__(self, appname, filename)
            PyMailFile.__init__(self, filename)
    
    The crucial point here is that using super() for just the single-inheritance cases where it applies means that programmers must remember two ways to accomplish the same goal, when just one, direct calls, would suffice for all cases. Which raises a question for super() advocates: Wasn't such feature creep one of the main things that Python originally sought to avoid?

    Even more fundamentally, it's also not clear that the trivial amount of code maintenance that super() is envisioned to avoid fully justifies its presence. In Python practice, superclass names in headers are rarely changed; when they are, there are usually at most a very small number of superclass calls to update within the class. And consider this: if you do use super() in a single-inheritance tree, and then add a second superclass in the future to leverage multiple inheritance (as in the example above), you may very well have to change all the super() calls in your class to use the traditional explicit call scheme instead -- a maintenance task which seems just as likely and tedious as the one that super() is supposed to address!

    Limitation in Python 3.X: operator overloading
    As mentioned briefly in Python's library manual, super() also doesn't quite work in the presence of __X__ operator overloading methods. If you study the following code, you'll see that direct named calls to overload methods in the superclass operate normally, but using the super() result in an expression fails to dispatch to the superclass's overload method:
    [Python 3.1]
    
    >>> class C:
    ...     def __getitem__(self, ix):      # index overload method
    ...         print('C index')
    ...
    >>> class D(C):
    ...     def __getitem__(self, ix):      # redefine to extend here
    ...         print('D index')
    ...         C.__getitem__(self, ix)     # traditional call form works
    ...         super().__getitem__(ix)     # direct name calls work too 
    ...         super()[ix]                 # but operators do not! (__getattribute__)
    ...
    >>> X = C()
    >>> X[99]
    C index
    >>>
    >>> X = D()
    >>> X[99]
    D index
    C index
    C index
    Traceback (most recent call last):
      File "", line 1, in 
      File "", line 6, in __getitem__
    TypeError: 'super' object is not subscriptable
    
    This behavior is apparently due to the same new-style (and 3.X) class change described at numerous places in the book (see the sidebar on Page 662 for the first) -- because the proxy object returned by super() uses __getattribute__ to catch and dispatch later method calls, it fails to intercept the automatic __X__ method invocations run by expression operators, as these begin their search in the class instead of the instance. This may seem less severe than the multiple-inheritance limitation, but operators should generally work the same as the equivalent method call, especially for a built-in like this, and not supporting this adds another exception for super() users to confront.
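
    The effect is easy to reproduce with an ordinary proxy class -- a minimal sketch, using a hypothetical Proxy that delegates with __getattr__ (super() objects actually use __getattribute__, but the implicit-lookup rule at play is the same):
    >>> class Wrapped:
    ...     def __getitem__(self, ix):
    ...         return 'wrapped index'
    ...
    >>> class Proxy:
    ...     def __init__(self, obj):
    ...         self.obj = obj
    ...     def __getattr__(self, name):        # catches explicit fetches only
    ...         return getattr(self.obj, name)
    ...
    >>> P = Proxy(Wrapped())
    >>> P.__getitem__(99)       # explicit fetch routes through __getattr__
    'wrapped index'
    >>> P[99]                   # implicit lookup starts at Proxy's class
    TypeError: 'Proxy' object is not subscriptable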

    Your Java mileage may have varied, but in Python, self is explicit, multiple inheritance and operator overloading are common, and superclass name updates are rare. Frankly, the super() call seems intended more to placate Java programmers than to address real Python problems. Because it adds an odd special case to the language -- one with strange semantics, limited scope, and questionable reward -- most Python programmers may be better served by the more broadly applicable traditional call scheme.

    Use in Python 2.X: verbose calls
    Just as problematic for current 2.X users, as well as for this dual-version book, the super() technique is not portable between Python lines. To make this call work in Python 2.X, you must first use new-style classes. Worse, you must also explicitly pass the immediate class name and self to super(), making the call so complex and verbose that in most cases it's probably easier to avoid it completely and simply name the superclass explicitly per the traditional code pattern above (for brevity, I'll leave it to readers to consider what changing a class's own name means for code maintenance under the 2.X super() form!):
    [Python 2.7]
    
    >>> class C(object):                # for new-style classes only
    ...     def act(self):
    ...         print('spam')
    ...
    >>> class D(C):
    ...     def act(self):
    ...         super(D, self).act()    # 2.X: call format seems too complex
    ...         print('eggs')           # "D" may be just as much to type as "C"!
    ...
    >>> X = D()
    >>> X.act()
    spam
    eggs
    
    >>> class D(C):
    ...     def act(self):
    ...         super().act()           # simpler 3.X call format fails in 2.X
    ...         print('eggs')
    ...
    >>> X = D()
    >>> X.act()
    Traceback (most recent call last):
      File "", line 1, in 
      File "", line 3, in act
    TypeError: super() takes at least 1 argument (0 given)
    
    >>> class D(C):
    ...     def act(self):
    ...         C.act(self)             # but traditional pattern works portably
    ...         print('eggs')           # and may be simpler in 2.X code
    ...
    >>> X = D()
    >>> X.act()
    spam
    eggs
    
    Summary
    Like all new Python language features, you should be the judge on this one too, of course, but because this call:

    - differs in form between Python 3.X and 2.X, and is verbose enough in 2.X to negate much of its benefit
    - applies fully only to single-inheritance trees, and silently picks just one superclass under multiple inheritance
    - does not dispatch to superclass operator overloading methods when used in expressions

    even ex-Java programmers should also consider the book's preferred traditional technique of explicitly naming superclasses in calls to be at least as valid a solution as Python's super() -- a call which seems an unusual and limited answer to a question that was not being asked for most of Python's history.

    Having said that, I recently found a use for this call in code that runs only on 3.X, and which used a very long superclass reference path (through a module package -- see the parser class in this code). As usual, time will tell whether such limited contexts lead to broader adoption of this call.

    Update Aug-27-11: For other opinions on Python's super() which go into further details both good and bad, see also: Python's Super Considered Harmful, as well as Python’s super() considered super!. You can find additional positions near or between these two with a simple web search.



  6. [Feb-5-11] Page 86, paragraph 2: punctuation inside quotes in non-code text

    [No fix required] A reader wrote to suggest that the "Hello," in the first line of this paragraph be changed to "Hello",, with the comma moved after the closing quote to match the pattern's substring. This isn't an erratum, though it is an interesting point. I agree with the poster in principle, but this text has to follow writing-style conventions. The text in question is not code; it simply quotes a word in the narrative (if this had been code, it would be in literal font). In non-code text like this, a comma or other punctuation which would normally follow quoted text is, by standard convention, moved inside the quotes, just before the closing quote. The same thing happens to "world." later in this paragraph. This doesn't exactly match the pattern, of course, but English isn't Python.



  7. [Feb-10-11] Chapter 38, decorators: annotations, aspects, and (not) macros

    [No fix required] Python's function and class decorators are covered in depth in the book, especially in Chapters 31 and 38. In a prior clarification which I posted on the first printing's page, I noted that Python's function decorators are similar to what is sometimes called aspect-oriented programming in some other languages -- code inserted to run automatically before or after a function call runs. Python's decorators also very closely resemble Java's annotations, even in their "@" syntax, though Python's model is usually considered more flexible and general.

    Recently, though, I've also heard some comparing decorators to macros, but I don't think this is entirely apt, and might even be misleading. Macros (e.g., C's #define preprocessor directive) are typically associated with textual replacement and expansion, and are designed for generating code. By contrast, Python's decorators are a runtime operation, based upon name rebinding, callable objects, and often, proxies. While the two may have use cases that sometimes overlap, decorators and macros are fundamentally different in scope, implementation, and coding patterns. Comparing the two seems akin to comparing Python's import operation with a C #include, which similarly confuses a runtime object-based operation with text insertion.
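
    For instance, here's a minimal sketch of the runtime model (a hypothetical tracer decorator): the @ line simply rebinds the function's name to whatever the decorator callable returns -- no source text is expanded anywhere:
    >>> def tracer(func):                   # runs once, at def time
    ...     def proxy(*args, **kwargs):     # a proxy wrapping the original
    ...         print('calling', func.__name__)
    ...         return func(*args, **kwargs)
    ...     return proxy
    ...
    >>> @tracer                             # same as: spam = tracer(spam)
    ... def spam(x):
    ...     return x * 2
    ...
    >>> spam(21)                            # really calls the proxy object
    calling spam
    42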

    Of course, the term "macro" has been a bit diluted over time (to some, it now can also refer to any canned series of steps or procedure), and some might find the macro analogy useful anyhow. But they should probably also keep in mind that decorators are about callable objects managing callable objects, not text expansion. Python tends to be best understood and used in terms of Python idioms.



  8. [Oct-27-10] Notes on using example code cut-and-paste from PDF or HTML

    [No fix required] A reader wrote with questions on using book example code obtained from HTML (online) and PDF (ebook) forms of the book. Indentation matters in Python code, and some formatting protocols support this better than others. In short, indentation in example code displays correctly when viewed in both formats, but copying the code may require special handling: line-break formatting may be lost when copying from HTML into Windows-only text editors; indentation is lost altogether when copying from the PDF; and the text files in the example distribution package avoid such issues entirely. In more detail:




  9. [Jul-7-10] Page 139: more on implementation of the bool type

    [No fix required] A reader wrote to ask how the bool type is actually implemented in Python. I mentioned in Chapters 5 and 31 that bool is really just a subclass of int with two predefined instances, True and False. This is true, but the implementation is actually a bit more subtle. For example, the following almost works, but not quite--the initialization-time value passed is consumed by the int type's __new__ method to set integer state, and is used in later math regardless of the self.val of this class:
    >>> class myBool(int):
    ...     def __init__(self, value):
    ...        self.val = 1 if value else 0     # val goes to int.__new__ first!
    ...     def __repr__(self):
    ...        return 'True' if self.val else 'False'
    ...     __str__ = __repr__
    ...
    >>> myTrue  = myBool(1)
    >>> myFalse = myBool(0)
    >>> myTrue
    True
    >>> myFalse
    False
    >>> myTrue + 8              # really uses int's state, not self.val
    9
    >>> myFalse - 3             # really uses int's state, not self.val
    -3
    >>> myOther = myBool(9)     # but doesn't use self.val==1 here!
    >>> myOther
    True
    >>> myOther + 3             # really an int(9) with a __repr__
    12
    >>> myOther.val
    1
    
    To see how bool really works, you need to study its C source code in Python's boolobject.c file. One possible emulation in Python code is the following -- define an int subclass whose __new__ operation always returns True or False, which are really just int objects but have a __class__ pointer referring to bool in order to obtain its __repr__ behavior:

    (Footnote: as mentioned on Page 707 and in Chapter 39, __new__ is a rarely used overloading method called to create an instance, before __init__ is run to initialize the new instance which __new__ returns. Most classes define just __init__ and allow __new__ to default to built-in code which creates and returns a new instance, but __new__ has some advanced roles in metaclasses, and can be used for some coding patterns such as singletons: classes that make at most one instance, and return it on later construction calls.)
    >>> class myBool(int):
    ...     def __new__(self, value):
    ...         return myTrue if value else myFalse
    ...     def __repr__(self):
    ...         return 'True' if self else 'False'
    ...     __str__ = __repr__
    ...     # plus __and__, __or__, __xor__ redefines here to retain type
    ...
    >>> myTrue = int(1)
    >>> myTrue.__class__ = myBool
    Traceback (most recent call last):
      File "", line 1, in 
    TypeError: __class__ assignment: only for heap types
    
    As you can see, this doesn't work in pure Python code in 3.X, though; the C implementation gets away with this in lower-level terms. A pure Python solution might look like the following, but it requires overriding the obscure __new__ method, and then rebinding the class's name to a factory function to ensure that at most two instances are ever created:
    >>> class myBool(int):
    ...     def __new__(self, value):
    ...         return int.__new__(self, 1 if value else 0)
    ...     def __repr__(self):
    ...         return 'True' if int(self) == 1 else 'False'
    ...
    >>> myTrue  = myBool(1)
    >>> myFalse = myBool(0)
    >>>
    >>> myTrue, myFalse
    (True, False)
    >>> myTrue + 3, myFalse + 3
    (4, 3)
    >>>
    >>> def myBool(value):                          # factory
    ...     return myTrue if value else myFalse     # at most these two 
    ...
    >>> myBool(1), myBool(0)
    (True, False)
    >>> myBool('spam'), myBool('')
    (True, False)
    >>>
    >>> myBool('spam') == myTrue, myBool('spam') is myTrue
    (True, True)
    
    Of course, this still isn't the same because Python uses its own True and False internally in its C-language code for operations like the last line here. Experimenting further (e.g., see the builtins module) is left as a suggested exercise.
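
    As an aside, here's a minimal sketch of the singleton role mentioned in the footnote above (a hypothetical Singleton class, unrelated to bool): __new__ creates the lone instance on the first construction call, and simply returns it on all later calls:
    >>> class Singleton:
    ...     instance = None
    ...     def __new__(cls):
    ...         if cls.instance is None:
    ...             cls.instance = super().__new__(cls)    # make at most one
    ...         return cls.instance
    ...
    >>> a = Singleton()
    >>> b = Singleton()
    >>> a is b                  # later calls return the same instance
    True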



  10. [Jul-7-10] Page 238 and exceptions part: more on files and "with"

    [No fix required] The book mentions that the "with" context manager statement can save 3 lines of code compared to the more generally applicable "try/finally" when you need to guarantee file closures in the face of possible exceptions. It's also true that "with" can even save 1 line of code when no exceptions are expected at all (albeit at the expense of further nesting and indenting file-processing logic):
    myfile = open(filename, 'w')               # traditional form
    ...process myfile...
    myfile.close()
    
    with open(filename) as myfile:             # context manager form
        ...process myfile...
    
    If you really need to close your file, though, you should generally allow for exceptions raised by unexpected system conditions, using the longer try/finally alternative to the first of these forms, as shown in the book:
    myfile = open(r'C:\misc\data.txt')
    try:
        ...process myfile...
    finally:
        myfile.close()
    



  11. [Sep-1-10] Page 333: code is deliberately abstract and partial

    [No fix required] A reader wrote:
    > I looked on your website for corrections to the book Learning Python 4th
    > edition for page 333 but did not find any. I am working through your
    > book on my own and found the program example on page 333 unclear and
    > broken, i.e. the "match(x[0])" is undefined. Can you explain this
    > example a bit more and give me an example definition for "x" and "match"
    > that will make this sample code run?
    
    Thanks for your note. You have a valid point: this code snippet is intended to be abstract and partial, but it's not explicitly described as such. In fact, the prior page's primes code is similarly abstract, though this fact is better noted there.

    The abstract code snippet in the book strips off successive items at the front of "x" and passes each into this function in turn. To make it work as real, live code, "x" would have to be a sequence such as a list or string, and "match()" would have to be a function which checks for a match against an object passed in. As a completely artificial example:
    x = list(range(100))
    def match(I):
        return I > 50
    
    A better example might make x a list of dictionary "records", and match() a test against a dictionary key's "field" value (e.g., looking in a database list for a record with a name value matching one selected by match()).

    I couldn't show a function like match() at this point in the book, though, without yet another forward dependency (functions are not covered until the next part). The goal here was to illustrate the loop else by itself. I also chose not to elaborate here because in practice a "for" loop is probably better than a "while" for this code, and iteration tools such as filter() and comprehensions might be better than both:
    x = [....]
    
    for item in x:
        if match(item):
            print('Ni')
            break 
    else:
        print('Not found')
    
    print('Ni' if [item for item in x if match(item)] else 'Not found')
    
    print('Ni' if list(filter(match, x)) else 'Not found')
    
    print('Ni' if any(item for item in x if match(item)) else 'Not found')
    
    print('Ni' if any(filter(match, x)) else 'Not found')
    
    Try running these on your own to see what I mean. Despite their conciseness, the downside of some of these alternatives is that they may wind up calling match() more times than required (for items after a match is found) -- possibly costly if match() is expensive to run.
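
    To see the difference in call counts for yourself, try instrumenting match() -- a minimal sketch, reusing the artificial x above with a call counter added purely for illustration:
    >>> calls = 0
    >>> def match(i):
    ...     global calls
    ...     calls += 1
    ...     return i > 50
    ...
    >>> x = list(range(100))
    >>> bool([item for item in x if match(item)]), calls
    (True, 100)
    >>> calls = 0
    >>> any(item for item in x if match(item)), calls
    (True, 52)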



  12. [Jul-7-10] Page 355/357 footnote/text: popen iteration, call iter() first

    [No fix required] These sections describe the way that popen iterators fail in 3.X for certain use cases. The discussion is correct, but not complete. Technically, popen objects support I.__next__() but not next(I) directly, unless I = iter(I) is called first. Automatic iterations work because they do call iter() first, not simply because they run I.__next__() instead of next(I).

    In effect, the initial iter() call triggers the wrapper's own __iter__, which returns the wrapped object that actually has a __next__ itself. Without the initial iter(), clients instead rely on the wrapper's __getattr__ to intercept the iteration method fetch and delegate it to the wrapped object -- which no longer works for the built-in next() in 3.X, because built-ins look for __next__ on the wrapper's class and skip the instance's __getattr__. Regardless, this is still a change, and an arguable regression, from Python 2.6.

    This is a subtle issue which is described in more detail in Programming Python 4th Edition, and can be studied in Python's os.py file, but as a quick summary:
    >>> import os
    >>> for line in os.popen('dir /B *.py'): print(line, end='')
    ...
    helloshell.py
    more.py
    
    >>> I = os.popen('dir /B *.py')
    >>> I
    <os._wrap_close object at 0x0148C750>
    
    >>> I = os.popen('dir /B *.py')
    >>> I.__next__()
    'helloshell.py\n'
    >>> next(I)
    TypeError: _wrap_close object is not an iterator
    
    >>> I = os.popen('dir /B *.py')
    >>> I = iter(I)
    >>> I.__next__()
    'helloshell.py\n'
    >>> next(I)
    'more.py\n'
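
    The delegation failure can be reproduced with ordinary classes -- a minimal sketch using hypothetical Wrapper and Wrapped classes (os.py's real code differs), which shows why iter() must be run first in 3.X:
    >>> class Wrapped:                      # has a real __next__ of its own
    ...     def __init__(self):
    ...         self.lines = iter(['helloshell.py\n', 'more.py\n'])
    ...     def __next__(self):
    ...         return next(self.lines)
    ...
    >>> class Wrapper:
    ...     def __init__(self):
    ...         self.stream = Wrapped()
    ...     def __iter__(self):
    ...         return self.stream          # iter() returns the wrapped object
    ...     def __getattr__(self, name):
    ...         return getattr(self.stream, name)
    ...
    >>> W = Wrapper()
    >>> W.__next__()            # explicit fetch routes through __getattr__
    'helloshell.py\n'
    >>> next(W)                 # implicit lookup skips __getattr__ in 3.X
    TypeError: 'Wrapper' object is not an iterator
    >>> next(iter(W))           # but works on the wrapped object itself
    'more.py\n'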
    



  13. [Jul-7-10] Page 475: enclosing scopes lambda, common confusion

    [No fix required] A reader wrote the following about an example which has been asked about enough times to warrant posting the interchange here:
    > The definition of knights() is shown as
    > 
    > def knights():
    > 
    > However, I think that it should be
    > 
    > def knights(x):
    >
    > Because 2 lines below refers to x in
    > 
    > action = (lambda x: title + ' ' + x)
    > 
    > I am not sure how the value of x is defined without being passed in.
    
    No, this isn't an error (try running this example's code yourself--it works exactly as shown in the book). This example does confuse, though; I believe I answered the same question for the 3rd Edition.

    The critical point here is that the lambda makes a new function which is returned (and not called) by knights. The function created by the lambda has the required "x" argument; when the function returned by knights is later called (by the code act('robin')), knights is not called again -- instead, the call's argument is passed to the "x" in the lambda function. The name "title" is fetched from the lambda function's enclosing scope, but "x" is the lambda function's own argument.

    If that's difficult to grasp, remember that lambdas can always be replaced by the name of a function previously defined with a def; here's the original and a def-based equivalent:
    def knights():
        title = 'Sir'
        action = (lambda x: title + ' ' + x)
        return action 
    
    act = knights()
    print(act('robin'))          # both print 'Sir robin'
    
    def knights():
        title = 'Sir'
        def action(x): return title + ' ' + x
        return action
    
    act = knights()
    print(act('robin'))
    



  14. [Jul-7-10] Page 594: exec() in functions requires eval() or ns['x']

    [No fix required] The bottom part of this page describes how to import a module given its name as a string. It uses exec() to import, and then uses the module's name as a simple variable; this works because the code is typed at the interactive prompt, and the module's name thus becomes a global variable on the fly.

    Note, however, that if you use exec() to import a module by name string within a function, you must also use eval() to reference the imported module, since its name is not recognized as an assigned local when Python creates the function. Passing an explicit namespace dictionary to exec() and later indexing it can have the same effect:
    >>> def f():
    ...    exec("import string")
    ...    print(string)
    ...
    >>> f()
    Traceback (most recent call last):
      File "", line 1, in 
      File "", line 3, in f
    NameError: global name 'string' is not defined
    
    >>> def f():
    ...    exec("import string")
    ...    print(eval("string"))
    ...
    >>> f()
    <module 'string' from 'c:\python31\lib\string.py'>
    
    >>> def f():
    ...     ns = {}
    ...     exec("import string", ns)
    ...     print(ns["string"])
    ...
    >>> f()
    <module 'string' from 'C:\Python31\lib\string.py'>
    



  15. [Sep-27-10] Page 638, middle of page: clarification on object attribute path semantics

    [No fix required] A reader wrote with a question about the externally defined method on this page:
    > I have a question/request for clarification for self.name.upper() in the context 
    > of the text below:
    > 
    > "Even methods, normally created by a def nested in a class, can be created completely 
    > independently of any class object. The following, for example, defines a simple 
    > function outside of any class that takes one argument:
    > 
    > >>> def upperName(self):
    > ...     return self.name.upper()    # Still needs a self
    > 
    > There is nothing about a class here yet it's a simple function, and it can be called 
    > as such at this point, provided we pass in an object with a name attribute (the name 
    > self does not make this special in any way)"
    > 
    > My question: I am lost about self.name.upper().  Why is this self.name.upper() instead 
    > of simply self.upper()?
    > 
    > From the context, 'name' is an attribute of object x and also an attribute of class rec. 
    > How can this 'name' attribute have an attribute (the upper() function) of its own?  Is 
    > it a "nested attribute"?  Is there even such a thing in Python?
    
    Well, the code is correct as shown, but the "self" in it might be a bit confusing (it's just a simple variable name here). I'd call this nested objects, not nested attributes. To understand it fully, you must evaluate it the way Python does -- from left to right, one expression/operation at a time. Given "self.name.upper()", and adding parentheses to emphasize the order of operations:

    1. (self.name) fetches the value of a "name" attribute from whatever object variable "self" happens to reference.
    2. ((self.name).upper) then fetches the value of an "upper" attribute from whatever object was returned by step 1.
    3. ((self.name).upper)() finally calls the function object that "upper" is assumed to reference, with no arguments.

    The net effect is that "self" references an object, whose "name" attribute references a (string) object, whose "upper" attribute references a (callable) object. It's object nesting; in general, that's what class instance state information always is -- nested objects assigned to instance attributes.

    And that's why it works to pass "x" to this function directly: "x" is a class instance object, whose "name" attribute references a string object with an "upper" method; "x" has no "upper" attribute itself. The "self" function argument is just a reference to the same object referenced by variable "x", whether the function is attached to a class or not.
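
    To see this in live code, here's a minimal sketch (a hypothetical bare-bones rec class, standing in for the book's version):
    >>> class rec: pass
    ...
    >>> x = rec()
    >>> x.name = 'Bob'              # a string object nested in the instance
    >>> def upperName(self):
    ...     return self.name.upper()
    ...
    >>> upperName(x)                # x has no upper itself, but x.name does
    'BOB'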



  16. [Sep-27-10] Page 974 1st paragraph: more on property use case example

    [No fix required] A reader wrote with two questions about one of the property examples in the advanced managed attributes coverage of Chapter 37:
    > re: "To understand this code, it's crucial to notice that the attribute 
    > assignments inside the __init__ constructor method trigger property 
    > setter methods too."
    >
    > (Using python 2.6.5, linux) Stepping with pydev debugger through 
    > Attribute_validation_w_properties it appears instance attribute 
    > assignments are only intercepted by properties for re-assignments, 
    > eg. bob.name = 'Bob Q. Smith' but not during instatiation since 
    > self._name remains 'Bob Smith" not "bob_smith" as setter implies. 
    > Correct ???
    > 
    > Also: inside __init__ "name mangling" missing out and perhaps just 
    > one leading underscore ?
    > 
    > self._acct
    > self._name
    > self._age
    > self.addr
    
    No, the example does work as shown and described, and the "__name" attribute format is intended. However, this is arguably one of the most subtle examples in the book, so I'll try to clarify a bit here. To see that the setter is indeed called for assignments in __init__ at instance-creation time, try adding a print() in the setter methods, and either run the self-test code or import the class and create an instance interactively:
    class CardHolder...
        def setName(self, value):
            print('in setName')
    
    >>> CardHolder('11111111', '25', 3, '44')
    in setName
    <test.CardHolder object at 0x01410830>
    
    The setter is called from __init__ when the instance is first created and the attribute is assigned, under both Python 3.X and 2.X. Also make sure that you derive the class from "object" under 2.X to make it a new-style class. As explained earlier in this chapter (and in Chapter 31), property setters don't quite work under 2.X without including "object" in the superclass list; once an attribute name is mistakenly assigned directly on an instance, it hides the property getter in the class too (perhaps this was the entire issue here?):
    class CardHolder(object):    # required in 2.X
    
    With this change, results under 2.6 and 3.1 are identical. You'll also need to use 2.X-style print statements, or a from __future__ import to enable 3.X-style print calls, of course; see earlier in the book for coverage of print() in 2.X:
    from __future__ import print_function
    
    The other oddness in this example (covered earlier in the book, but perhaps not explained as explicitly for this example as it could have been) is that names beginning with two underscores like "__name" are pseudo-private attributes: Python expands them to include the enclosing class's name, in order to localize them to the creating class. They are used intentionally here to avoid clashing with the real attribute names such as "name" that are part of the class's external client API. Python mangles each in-class appearance of the attribute like this:
    __name  ...becomes...  _CardHolder__name  
    
    The single-underscore naming pattern "_name" used elsewhere in this chapter is a weaker convention that informally attempts to avoid name collisions, but "__name" truly forces the issue, and is especially useful for classes like this one which manage attribute access but also need to record real state information in the instance. Clients use "name" (the property); the expanded version of "__name" (the data), where state is actually stored, is more or less hidden from them. Moreover, unlike "_name", it won't clash with other normal instance attributes if this class is later extended by a subclass.
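
    Here's a quick interactive sketch of the mangling at work in 3.X (a hypothetical stripped-down CardHolder, reduced to a single read-only attribute):
    >>> class CardHolder:
    ...     def __init__(self, name):
    ...         self.__name = name          # really sets _CardHolder__name
    ...     name = property(lambda self: self.__name)
    ...
    >>> bob = CardHolder('Bob Smith')
    >>> bob.name                            # clients use the property
    'Bob Smith'
    >>> bob._CardHolder__name               # the mangled name holds the data
    'Bob Smith'
    >>> bob.__name                          # no mangling outside the class
    AttributeError: 'CardHolder' object has no attribute '__name'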


Older clarifications: see the first printing clarifications page
