
LP5E: Recent Reader Queries

Last revised: July 2018

This page hosts replies to reader emails, for questions that either reflect issues raised by multiple readers, or otherwise seem of potentially broader interest. Content here is primarily related to the book Learning Python 5th Edition (LP5E), though a few general book posts have managed to sneak in as well. Items are posted as time allows; all personal details are of course removed; and this page's organization is as informal as its content—it's grouped into more- and less-technical topics, with newer items generally nearer the tops of these lists.

Related Resources

For more book-related resources, be sure to also see these external pages:

Content Here

More Technical

Less Technical

More on classmethod versus staticmethod

[Jul-2014] Learning Python, 5th Edition has an in-depth section on staticmethod and classmethod in Chapter 32, which includes examples of counting instances of classes in a small framework with both tools. I recently posted an even smaller, self-contained example which may help some readers further clarify the distinction between these tools:

Read the code here
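
As a rough illustration of the distinction (a minimal sketch written for this page, not the linked example or the book's Chapter 32 code; the class names Spam, Counted, and Sub are hypothetical), a static method receives no automatic argument and must name its class explicitly, while a class method receives the class it was called through:

```python
class Spam:
    num_instances = 0                  # one counter shared by the whole tree

    def __init__(self):
        Spam.num_instances += 1        # class is named explicitly

    @staticmethod
    def total():
        # No self or cls is passed: the class must be hardcoded
        return Spam.num_instances


class Counted:
    num_instances = 0                  # counter for this class only

    def __init__(self):
        self.count()                   # passes type(self) to cls below

    @classmethod
    def count(cls):
        # cls is the lowest class of the instance, so counts are per class
        cls.num_instances += 1


class Sub(Counted):
    num_instances = 0                  # separate counter for the subclass


a, b = Spam(), Spam()
x, y, z = Counted(), Sub(), Sub()
print(Spam.total(), Counted.num_instances, Sub.num_instances)
```

The static method suits data that belongs to one fixed class; the class method suits data that should differ per class in a tree.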

[Back to Index]

Formal Inheritance Rules: PyRef5E Excerpt

[Jul-2014] Learning Python, 5th Edition covers new-style inheritance in full by tutorial and example, especially in the Metaclasses Chapter 40 in its Advanced Topics part. As a supplement for this book's readers, see also the following concise summary of Python's formal inheritance rules, a preview excerpted from Python Pocket Reference, 5th Edition:

Read the excerpt here

[Back to Index]

More Benchmarks: str.format() Is Slower than % Formatting

A Python Pocket Reference reader suggested that the str.format() method should be avoided in programs where performance matters. Per benchmarks, str.format() currently runs 30%-40% slower than the % formatting expression, and in some cases as much as 10X slower. This is an important optimization finding in itself, but also touches on the benchmarking topics in Learning Python, 5th Edition (which has a new full chapter devoted to the subject), and serves as an opportunity for a supplemental example here.

For context, the reader's note and my reply appear below.

> -----Original Message-----
> From: ...
> To: lutz@rmi.net
> Subject: Python Pocket Reference: Performance of string formatting methods
> Date: Tue, 04 Mar 2014 18:48:56 -0500
> [...]
> In the chapter about "String formatting" you write that there is no
> compelling reason to prefer the '%' over the 'format()' method and vice versa. 
> Maybe you could briefly mention in the next edition of the book that '%' is 
> currently a lot faster (in Python 2.x and Python 3.x) than the 'format()' 
> method call. I did a quick comparison not too long ago if you are interested: 
> http://sebastianraschka.com/Articles/2014_python_performance_tweaks.html#string_assembly
> This might be worthwhile considering for extensive (and expensive) string 
> operations.

A follow-up: thought you may be interested in some quick
benchmarks I just ran for str.format() and %, on 3.3.4, 
2.7.3, and PyPy 1.9.  Script and results are attached.

Please draw your own conclusions, but it appears that  
str.format() is consistently on the order of 30% to 40% 
slower than % in my tests, when other factors are removed.  

As usual, though, this can vary by test context.  Your 
comprehension-based test code does indeed show str.format()
10X slower than both + and %, though that higher number may 
be a function of the list comprehension itself.  When
a function call is added to the % equivalent, it's almost
as slow as str.format(), though still slightly quicker.  

In other words, it may be that the reason for the 10X 
slowdown for str.format() is the cost of its inherent 
function call (always slow in Python).  Within a list 
comprehension (only?), Python seems to be able to run 
a % as a quicker in-line expression instead.  In other 
contexts, the function call cost seems to be moot.

That seems odd — so much so that I wonder if my tests
are missing something.  In any event, please take these
as preliminary, and feel free to follow-up if you find 
something askew.  Still, it is true that % is quicker in 
all Pythons, by at least a double-digit percentage: a big
factor for some code, and cause to prefer % in general.  

It's also true that 2.X almost always checks in faster than
3.X in these tests (and PyPy is stunningly faster than both),
but that's fairly normal.

--Mark Lutz (http://learning-python.com)
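
For readers who want to repeat this kind of comparison, the following is a minimal sketch using the standard library's timeit module. It is not the attached script mentioned in the reply; the statement strings, repetition counts, and the best_time helper are assumptions chosen for illustration, and relative results will vary by Python version and platform.

```python
import timeit

# Variables are created in setup so the formatted values are not constants
SETUP = "name, num = 'spam', 42"

STATEMENTS = {
    '% expression': "'%s=%d' % (name, num)",
    'str.format()': "'{}={}'.format(name, num)",
}

def best_time(stmt, number=100000, repeat=5):
    """Return the minimum total time over several repeated runs."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))

results = {label: best_time(stmt) for label, stmt in STATEMENTS.items()}

# List the quickest technique first
for label, secs in sorted(results.items(), key=lambda pair: pair[1]):
    print('%-14s %.4f sec' % (label, secs))
```

Taking the minimum of several runs filters out some system load noise, in the same spirit as the best-of timing discussed in the book's benchmarking chapter.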

[Back to Index]

New-Style Inheritance "Breadth-First" Search Order in Diamonds

A reader wrote to ask about the superclass search order of new-style classes in diamond-pattern inheritance trees—which seems as good an excuse as any to clarify this subtle matter here. Also note that Python Pocket Reference, 5th Edition now includes a new sketch of this post's topic; see the excerpted section on this query's topic described above.

> -----Original Message-----
> From: ...
> To: lutz@learning-python.com
> Subject: A question in 'Learning Python 4e'
> Date: Sat, 19 Oct 2013 10:57:48 +0800
> I have question about 'Diamond pattern' in chapter 31 at page 
> 783. (Learning Python 4e)
> I write a test code like this:
> class A:
>      attr = 1
> class E:
>      attr = 5
> class F(A):
>      pass
> class B(F):
>      pass
> class C(A):
>      attr = 2
> class D(B, C):
>      pass
> x = D()
> print(x.attr)
> If I run it in python3, it always shows 5.
> But if the search breadth-first, it should show 2.
> What's the real search order of this code?
> -- 
> Best Wishes!

Hmm; this can't be the code you're testing, as it cannot
produce the result "5" under either Python 2.X or 3.X.  
Your class "E" with attr=5 is not referenced anywhere,
and is not part of the class tree searched by inheritance
from class D, as the following ASCII sketch tries to show:

   A:1
   |
   F     A:1
   |     |
   B     C:2
    \   /
      D
      |
      X

In fact, your code runs fine with "E" commented-out entirely.
This was probably just a typo in your email, but be sure you
are testing the code in the book exactly.  As shown in the book,
(and even as given in your email with the unused "E" class),
this example:

--In Python 2.X, prints "1", because of 2.X's strict depth-first
  then left-to-right DFLR search order (it reaches "A" through 
  the leftmost branch first).

--In Python 3.X, prints "2", because 3.X's MRO search is more
  breadth-first in diamonds only (it reaches "C" before climbing 
  to any higher "A").

That is:

   C:\temp> py -2 di.py
   1

   C:\temp> py -3 di.py
   2

Really, the first behavior holds true for classic classes in 2.X,
and the second for both 3.X and new-style classes in 2.X; in terms
of default class models, though, it's 2.X and 3.X behavior; that
difference was the main point of this example.

There are expanded, more detailed, and more formal descriptions of
the 3.X (new-style) MRO search order in this book's new 5th Edition.
Technically, the 3.X order scans trees depth-first and then 
left-to-right collecting all classes along the way, but retains 
only the final (right-most) occurrence of each class in its linear
MRO ordering.  This both orders the search and removes duplicates.

The net effect of this is roughly breadth-first in diamonds (only),
because common superclasses are visited later than in 2.X, and 
just once per inheritance search.  In your example code, the 
DFLR ordering [X, D, B, F, A, C, A] becomes the MRO ordering
[X, D, B, F, C, A], which accounts for C's precedence in 3.X.
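
The collect-then-prune description above can be checked with a few lines of code under Python 3.X. This is only a sketch: dflr, keep_last, and names are hypothetical helpers invented here, and the pruning shortcut mirrors this reply's informal description instead of Python's full C3 linearization, though the two agree for this tree. Note that 3.X also lists the implicit object root class, which the orderings above omit, and the instance X is left out of the lists.

```python
class A:    attr = 1
class F(A): pass
class B(F): pass
class C(A): attr = 2
class D(B, C): pass

def dflr(cls):
    """Depth-first, left-to-right class listing, duplicates included."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(dflr(base))
    return order

def keep_last(classes):
    """Retain only the final (right-most) occurrence of each class."""
    return [c for i, c in enumerate(classes) if c not in classes[i + 1:]]

def names(classes):
    return [c.__name__ for c in classes]

print(names(dflr(D)))               # raw DFLR path, with repeats
print(names(keep_last(dflr(D))))    # pruned order
print(names(D.__mro__))             # Python's own computed MRO
print(D().attr)                     # found in C before A
```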

That said, I suspect what you're seeing is the effect of changing
B's superclass to E:

   class E:
        attr = 5

   class B(E):
        pass

generating the following different inheritance tree:

   E:5   A:1
   |     |
   B     C:2
    \   /
      D
      |
      X

In this case, both 2.X and 3.X print "5" for the attribute,
taken from class E in the left-most branch.  It's true that 
this is not breadth-first, but this is also not a diamond.  
As stated in the book, the breadth-first effect applies only 
in _diamond_ cases in new-style classes, because all but the
last appearances of common superclasses are filtered out; 
non-diamonds still are effectively DFLR, because each class 
appears just once in the tree.  

More formally, the DFLR and MRO orderings in non-diamonds 
are the same, because there are no duplicates to remove; 
here, both are [X, D, B, E, C, A], which is why the higher
"E" wins in both Python lines.
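
A quick way to confirm the non-diamond case is to ask Python 3.X for the order it computed. The sketch below assumes the modified tree guessed at above, with B inheriting from E; the mro_names variable is introduced here only for display, and the implicit object root appears at the end of the list.

```python
class A:    attr = 1
class E:    attr = 5
class B(E): pass
class C(A): attr = 2
class D(B, C): pass

# No class is reachable by two paths (apart from the implicit object
# root), so there are no duplicates to prune from the DFLR order
mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)

x = D()
print(x.attr)       # taken from E, the top of the left-most branch
```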

The 4th Ed of this book was deliberately informal and terse
on this topic due to its complexity, but it gets much fuller 
coverage in the 5th Ed (including 10 pages on the MRO and 23
pages on its super() use case).  Also note that the MRO order
is just something of a nested concept in the complete new-style
inheritance model, a procedure which encompasses descriptors 
and metaclasses, as summarized here:


The 5th edition covers this and other topics more completely
too, which accounts for most of its higher page-count.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Generator State Suspension versus Pickling

A reader wrote to ask about the relationship of generators to pickling. Per later interaction, it appears that this issue stems from a web-based Python interpreter emulator's auto-pickling of objects used at the interactive prompt. That is, point #2 below was the culprit, in the context of saving session state when interacting with a web server. If generator or other examples fail for you on pickling errors, try a different—and non-web-based—interpreter interface.

> -----Original Message-----
> From: ...
> To: "lutz@learning-python.com" <lutz@learning-python.com>
> Subject: Question from book
> Date: Tue, 8 Oct 2013 12:09:41 -0400
> Hello Mark!
> First off, I bought your learning python book and want you to know that I 
> really appreciate it and am enjoying it greatly!!
> I am about halfway through it though when I've gotten up to the section on 
> function generators, and when I try to follow your code (page 593-594) I get 
> a 'cannot pickle generators' error.. Am I doing something wrong? The reading 
> I have done so far on the Internet confirms that one cannot pickle a 
> generator, but u seem to do it in the book and I'd like to understand what 
> I'm missing. 
> Thanks!
> ...
> Sent from my iPhone

Thanks for your note.  As for your pickle issue: I'm not
sure I can tell how this arose from your note alone, but
generators are a notoriously confusing topic.

The generators coverage and examples around page 593-594 
don't mention or use pickling in any way.  Pickling is 
presented in the Chapter 9 files section, but then not 
mentioned again until the OOP part, where it is deployed
anew.  So, my best theories are that either:

1) Perhaps you're assuming that the state suspend/resume 
behavior of generators implies pickling to a file?  It 
doesn't — generators are purely in-memory tools.  They 
suspend their state in memory when they yield a result, 
and resume that in-memory state on later next() calls to 
pick up where they left off.  Really, their "state suspension"
is simply remembering variable values and code location.  
External files don't enter into it, and there is no need
to use the pickle module in any fashion to make generators 
work.  They're just resumable functions in memory.

2) Perhaps you're working in an IDE that attempts to pickle
objects in your interactive session automatically?  This 
sounds error-prone to me, and doesn't happen in IDLE or at 
a command-line shell prompt interface; but it's conceivable
that some interfaces may try to save your interactive state
as you work, to allow you to pick up where you left off in 
a prior session (or web page).  This seems a bit far-fetched,
though; beyond generators, nothing with system state would 
work at the interactive prompt in such a tool, including files.

For reference, below my signature line is the code I think 
you're referring to, working in Python 3.3 (in IDLE; a shell
prompt is the same, but the prompt may act a bit different).  
If I've misread your question entirely — always a possibility
with email — please feel free to clarify in a follow-up.

--Mark Lutz (http://learning-python.com)

# The page 593-594 code in question?

   >>> def gensquares(N):
           for i in range(N):
               yield i ** 2        # Resume here later

   >>> for i in gensquares(5):     # Resume the function
           print(i, end=' : ')     # Print last yielded value

   0 : 1 : 4 : 9 : 16 :
   >>> x = gensquares(4)
   >>> x
   <generator object gensquares at 0x00000000032EEEE8>
   >>> next(x)
   0
   >>> next(x)
   1
   >>> next(x)
   4
   >>> next(x)
   9
   >>> next(x)
   Traceback (most recent call last):
     File "<pyshell#10>", line 1, in <module>
       next(x)
   StopIteration
   >>> y = gensquares(5)
   >>> iter(y) is y
   True
   >>> next(y)
   0
   >>> list(y)
   [1, 4, 9, 16]

# The following fails, but it's not shown or mentioned in the book:

   >>> import pickle
   >>> pickle.dumps(y)
   Traceback (most recent call last):
     File "<pyshell#17>", line 1, in <module>
       pickle.dumps(y)
   _pickle.PicklingError: Can't pickle <class 'generator'>: attribute lookup 
   builtins.generator failed
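
If saved results are what you're really after, one workaround is to pickle the values a generator produces instead of the generator itself. The following is a sketch under assumptions: it targets standard CPython, where the failure may surface as either TypeError or pickle.PicklingError depending on the release, and the variable names are invented for illustration.

```python
import pickle

def gensquares(n):
    for i in range(n):
        yield i ** 2

gen = gensquares(5)
first = next(gen)                  # advances the in-memory state only

try:
    pickle.dumps(gen)              # the generator object itself
    pickled_ok = True
except (TypeError, pickle.PicklingError) as exc:
    pickled_ok = False
    print('cannot pickle generator:', exc)

# The remaining results are plain integers, which pickle handles fine
remaining = list(gen)
restored = pickle.loads(pickle.dumps(remaining))
print(first, restored)
```

This saves results, not the suspended function state; resuming a half-run generator in a later session would require recreating it and skipping ahead.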

[Back to Index]

Code Listings without Outputs?

A reader posted on O'Reilly's "Get Satisfaction" support forum with confusion stemming from formatting of code. To head-off further confusion, I also posted this as a clarification to the book's errata list, requesting the following text insert just before this code: "(the following snippets both print Bob's 2-item job list if run live and provided with another record structure)".

> From: O'Reilly Media [noreply.oreilly@getsatisfaction.com]
> Sent: 10/18/2013 10:31 AM
> To: getsatisfaction@oreilly.com
> Subject: New question: don't understand code at bottom of pg 261 of learning python
> ... just asked this question in O'Reilly Media: 
> "don't understand code at bottom of pg 261 of learning python"
> At the bottom of pg 261. why does >>>db[0]['jobs'] gives no response rather 
> than ['developer,'manager]? Same question for top of page 262 for 
> >>>db['bob']['jobs']. 

This code is not being run at the interactive prompt which 
prints results; it's just being listed as an example here,
and an abstract one at that.  Notice the lack of a ">>>" 
interactive prompt or bold font in the book, the same as in 
the other output-less code on page 262.  Also notice the 
italicized "other"; it's supposed to stand for another 
record structure.

This code will print what you expect if run live, as in the 
following (I'm using a string for "other"):

   >>> rec = {'name': 'Bob',
   ...        'jobs': ['developer', 'manager'],
   ...        'web':  'www.bobs.org/Bob',
   ...        'home': {'state': 'Overworked', 'zip': 12345}}
   >>> db = []
   >>> db.append(rec)
   >>> db.append('other')
   >>> db[0]['jobs']
   ['developer', 'manager']
   >>> db = {}
   >>> db['bob'] = rec
   >>> db['sue'] = 'other'
   >>> db['bob']['jobs']
   ['developer', 'manager']

When in doubt, try running the examples yourself (the book's 
examples package and ebook cut-and-paste can help), and don't
expect to see outputs in the book for code not typed at the 
">>>" prompt.  This is especially true of later, larger code.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Running Scripts 1: Commands and Prompts

Some basic pointers for beginners getting used to where to type commands, courtesy of a fellow learner's query. Today I'd also mention the PyEdit program available on this site (and noted ahead) as another program edit+run option.

> -----Original Message-----
> From: ...
> To: lutz@rmi.net
> Subject: Learning Python, 4th Edition
> Date: Sat, 7 Sep 2013 16:48:54 -0700
> Mr. Lutz,
> I am currently trying to go through the fourth edition of Learning Python.
> I'm on a Mac and I'm using Python 3.1.2.
> I've tried numerous times to get the very first script on page 44 to run,
> yet I get the following error:
> Traceback (most recent call last):
>     File "", line 1, in 
>        print (script1.py)
> NameError: name 'script1' is not defined
> Now, I've tried just entering script1.py, which has given me the same error.
> As I said, I'm running 3.1.2 using the IDLE launcher. I entered the
> statements just like the example on page 44 into the text editor that's in
> IDLE. I then saved it as script1.py to my desktop (so it's in
> /Users/myname/Desktop/script1.py).
> Any suggestions as to what I might be doing wrong? Maybe I'm not defining
> something correctly or not saving the file to the right location. This is
> my first time programming, so I'm somewhat lost.
> Thank you for any help you can provide. Have a good weekend.

It looks like you may be confusing prompts, a very common
source of frustration for newcomers.

Be careful in this chapter not to confuse the system prompt 
(where you launch Python and script files) with the Python 
prompt (where you run Python program code only).  This is 
stated and alluded to at various points in the chapter.

This means you can't type just "script1.py" at the ">>>"
Python prompt to run a script file.  That's a system command,
to be typed at the system prompt (a terminal window on Macs).
To run a script file from the ">>>" prompt, the chapter later
shows that an "import" statement or "exec" call will suffice, 
but both are Python code typed at ">>>", not system commands. 

This also means that you can't run a script by coding its
name in a Python print() call; as your traceback shows, 
Python treats the unquoted script1 as an undefined variable
name (you could quote it as a string, but that still wouldn't
run an external file of that name), and the attempt also 
confuses system and Python commands.
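
To make the distinction concrete, here is a small sketch of running a script file from within Python code. It creates a throwaway script1.py in a temporary folder so it can run anywhere; the file's contents and the variable names are assumptions made for this illustration, and runpy is a standard library alternative added here alongside the exec technique mentioned above. At a system prompt, the equivalent would be a command such as "python3 script1.py", though the exact command name depends on your install.

```python
import os
import runpy
import tempfile

# Make a stand-in script file (normally you'd save this from an editor)
script_text = "result = 2 ** 10\nprint('script1 says:', result)\n"
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'script1.py')
with open(path, 'w') as file:
    file.write(script_text)

# Technique 1: read the file's text and run it as Python code
namespace = {}
with open(path) as file:
    exec(file.read(), namespace)

# Technique 2: let the standard library load and run the file by path
script_globals = runpy.run_path(path)

print(namespace['result'], script_globals['result'])
```

In both techniques the filename is a quoted string passed to Python tools, not a bare name typed at the ">>>" prompt.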

In the end, though, you're probably best advised to create
and launch your files with the IDLE GUI at least when 
starting out, as it avoids system prompts altogether.
IDLE details are given in their own section later in the
chapter too.  This chapter is a catalog of launching 
techniques for a broad set of readers and backgrounds,
and you should feel free to pick and choose as you like.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Books and Other Training Options

Someone wrote asking about book and training choices. For a related post, see also the newer notes on book selection on the purchase pointers page.

> -----Original Message-----
> From: ...
> To: lutz@learning-python.com
> Subject: Python Training
> Date: Sat, 7 Sep 2013 12:48:12 +0530
> Dear Mark,
> I am resident of India, and residing @present in Jaipur. I have my computer
> background & worked in the IT industry late back in 1992-93, with
> programming expertise in MySQL. Left the industry in 1995.
> Once again I wish to be part of IT industry & would like to start up with
> Python (advance level training). Should I refer your book or distance
> learning training program if any.
> Kindly advice me on training aspect & if I have to refer your books then
> please let me know the complete name of set of books which will help me in
> programming from basic to advance level of Python.
> Looking forward to your kind reply.

My 3 books are designed to work together as a set, and 
function like a self-paced comprehensive Python course:

--"Learning Python, 5th Edition" is a tutorial that covers
foundational concepts of the Python language itself.

--"Programming Python, 4th Edition" is a tutorial that moves
on to explore how to apply Python in common applications.

--"Python Pocket Reference, 4th Edition" is a concise 
reference-only supplement, that complements the other two.

Together, these 3 were written to provide a self-paced 
experience similar to that had by students in the Python
classes I teach, but with substantially more depth.  They 
have been used by many to get started in Python.  They've 
also evolved over the last 2 decades, and so reflect 
changes that have occurred in the software field in general.

That being said, there are many ways to pick up Python,
and many types of readers with differing backgrounds and
goals.  With hundreds of Python books to choose from today, 
I encourage people to weigh the options for themselves.

Some online tutorials and training options are also no 
doubt useful, though I'd recommend caution on that front.
Some offerings may fall short, especially in the for-pay 
category; as is common, Python has grown large enough to 
attract lower quality products more interested in profit
than teaching.

However you proceed, best wishes with your Python
explorations.  You'll probably find it to be a very
productive choice, especially compared to the tools
prominent in the early 1990s (some of which were my 
own inspiration for exploring Python).

--Mark Lutz (http://learning-python.com)

[Back to Index]

About Future Editions of My Books

Update: see also the newer FAQ page about this topic.

I've received multiple queries about possible upcoming editions of my books after the new 5th Edition of Learning Python appeared in mid-2013. In short:

The first of these is new and doesn't require an update, of course, but here is the best detailed information I have on the other two as of this reply's latest revision.

Python Pocket Reference

I've begun working on a 5th Edition of this book, but it will be months before it is released. When it appears (most likely early in 2014), it will be only a very minor refresh for Pythons 3.4 and 2.7. I do not recommend waiting for this still tentative and minor update if you're looking for a reference resource now. The current 4th Edition is largely current today, as long as you think "3.X" when you see "3.0" or "3.1" per the book's introduction, and browse Python's What's New? documents for recent changes in 2.7, 3.2, 3.3, and 3.4.

Update, Nov 11, 2013: O'Reilly already has a web catalog page for this book, though some of the details are to be tweaked (e.g., it's 230 est. pages, March seems a worst case, and it's been updated for both 3.4 and 2.7). I've also now posted a preliminary page for this book on this site as well, with a draft of its introduction for content details.

Update, January 2014: This update is now available; see the aforementioned page for details. In the end, it wound up with 50 pages of new material, including fresh coverage of the MRO and inheritance, super(), namespace packages, enumerations, JSON usage, and more. See the Introduction preview on its page.

Programming Python

There are today no plans for a new edition of this book. Its latest edition, less than 3 years old, is already current with 3.X and is fully relevant as is. Some libraries it uses have changed in minor ways that may imply changes in some example code (for instance, see the latest examples package release for Python 3.3, 3.4, and 3.5), but that's in itself a fair lesson about development; change is a constant in the software world. As warranted, example updates for future Pythons will be posted both here and at oreilly.com.

More generally, while it may be premature to label any of my current publications as final editions, I hope they will continue to serve Python 2.X and 3.X users for many years to come. For future Python changes, watch the books' update pages on this site, as well as Python's own What's New? documents. For more on Python's status, see both the introduction and conclusion to Learning Python, 5th Edition (Chapters 1 and 41).

[Back to Index]

Suggestions for Mac and Windows Users

A reader wrote with the following suggestion on using Python 3.X on Macs:

> Dear Sir,
> Thank you for your book "Learning Python".
> May I suggest that on your FAQ web page you point Mac users to this page: 
> [edit: this page is now defunct; try this or this]
> I don't think I would have been able to make Python 3 the default 
> version on my Mac without the help provided on this page, and there's 
> nothing worse, when you are trying to learn alone, than to be stuck 
> right at the beginning.

You can also find Mac-specific resources in the notes on this site's Programming Python page, though most of it is GUI related; search for "Mac OS" or "Mac" there, and try the same with the Search button on this page's toolbar below (Mac OS shows up on a variety of pages at this site).

And speaking of getting stuck, some Windows users may benefit from the extra Windows launcher notes on this page; in short, file associations may fail to be set correctly in some installs.

[Back to Index]

About "The Shallows" Mentioned in the Preface

Some clarifications on recommended reading.

> -----Original Message-----
> From: ...
> To: lutz@rmi.net
> Subject: The shallows
> Date: Tue, 14 Jan 2014 01:07:03 -0800 (PST)
> hi Mark,
> You had paid tribute to a book - The shallows in LP5e.  I'm curious about it.
> Which exact book did you refer to?  There are quite a number of them with 
> similar title.
> Thank you.

In full detail, the book I mentioned is:
    The Shallows: What the Internet Is Doing to Our Brains 
    by Nicholas Carr

There are Amazon and Wikipedia pages for it here:

Per the latter of these, it had a different subtitle when 
published in the UK.

It's a bit controversial, but was a Pulitzer Prize finalist,
and raises questions that deserve to be asked about some of
the Web's impacts, while there may still be time to address 
them.  Its look at cognitive research is illuminating;
contrary to current dogma, the focus-breaking nature of the 
web probably isn't a win for anyone but advertising companies.  

--Mark Lutz (http://learning-python.com)

Update, Dec-2014: Carr has a new title released later in 2014: The Glass Cage: Automation and Us, on the deskilling effects of automation in knowledge-based domains, and the dangers of entrusting our society and lives to opaque technologies laden with hidden agendas. It's highly recommended reading for anyone interested in the ethical aspects of the software field (and that should be just about all of us).

Update, Apr-2015: Another reader asked about the implications of The Shallows; see the later related reply on this page.

[Back to Index]

Matching a User-IDs File to an Email-Addresses file

A reader wrote in March 2014, asking for help with a typical file processing problem: collecting the entries in an email address file that are not also present in a file containing email names (not full email addresses, as I understand the goal) of people who have left the company. This touches on both the basic string processing and file tools in the foundational Learning Python, as well as pattern matching and email parsing tools covered in the application-focused Programming Python:

> [...reader email omitted...]

Have you resolved this yet?  If not, here are some ideas.
The 'in' operator scans for a substring:

   >>> 'bob' in '123bob456'
   True
   >>> ('bob' in 'bob'), ('bob' in 'bob@spam.com'), ('bob@spam.com' in 'bob@spam.com')
   (True, True, True)
   >>> ('bob' in 'other')
   False

It's also not very accurate, as it will find the substring in any context:

   >>> ('bob' in 'bobsled@spam.com'), ('bob' in 'sue@bob.com')
   (True, True)

The str.find() method does the same, but is also inaccurate:

   >>> '123bob456'.find('bob'), 'other'.find('bob')
   (3, -1)

You could resort to a pattern match:

   >>> import re
   >>> re.match('.*?(%s).*?@.*' % 'bob', '123bob123@spam.com')
   <_sre.SRE_Match object at 0x0000000002B1B8A0>
   >>> re.match('.*?(%s).*?@.*' % 'bob', '123bob123@spam.com').groups()
   ('bob',)

But it's probably overkill: splitting on "@" and taking the 
result's [0] item will get just the name before the domain:

   >>> 'bob@spam.com'.split('@')
   ['bob', 'spam.com']
   >>> 'bob@spam.com'.split('@')[0]
   'bob'

In the worst case, full-blown email name/address pairs can be
parsed with the email package:

   >>> from email.utils import getaddresses
   >>> pairs = getaddresses(['bob@spam.com', '"Bob" <bob@spam.com>'])
   >>> pairs
   [('', 'bob@spam.com'), ('Bob', 'bob@spam.com')]

Assuming the split('@') suffices, though, the code using it
would look something like this:

   leavers = [line.rstrip() for line in open('leavers.list')]
   for line in open('user.db'):
       user = line.rstrip()
       if user.split('@')[0] not in leavers:
           print(user) # write addr to a file here

And again, set comprehensions might suffice (though this version
prints just the users' names, not their full addresses, and sets
both remove duplicates and reorder the original file's lines):

   leavers = {line.rstrip() for line in open('leavers.list')}
   users   = {line.rstrip().split('@')[0] for line in open('user.db')}
   for notleft in users - leavers:
       print(notleft) # write name to a file here
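
The two versions above can also be combined: a set of leaver names gives fast membership tests, while a plain loop over the address file keeps full addresses, original order, and duplicates. The sketch below uses in-memory lists in place of the files so it can run as is; the sample data and the still_here function name are made up for illustration.

```python
# Stand-ins for the lines of 'leavers.list' and 'user.db'
leaver_lines = ['bob\n', 'sue\n']
address_lines = ['bob@spam.com\n', 'ann@spam.com\n', 'bobsled@spam.com\n',
                 'sue@eggs.org\n', 'ann@spam.com\n']

def still_here(address_lines, leaver_lines):
    """Return full addresses whose name part is not a leaver, in file order."""
    leavers = {line.strip() for line in leaver_lines}    # set: quick lookups
    kept = []
    for line in address_lines:
        address = line.strip()
        if address and address.split('@')[0] not in leavers:
            kept.append(address)
    return kept

result = still_here(address_lines, leaver_lines)
for address in result:
    print(address)      # write address to a file here
```

Because the whole name before "@" is compared, "bobsled" is not mistaken for "bob", unlike the substring tests shown earlier.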

Beyond this, I recommend working through the book and 
experimenting interactively to see how various tools work.

> -----Original Message-----
> > [...earlier reader email omitted...]
> It sounds like you're trying to compute a file difference,
> right? — select all lines in a file that are not in another
> file?  If so, I might code this as follows:
>    leavers = [line.rstrip() for line in open('user.list')]
>    for line in open('user.db'):
>        user = line.rstrip()
>        if user not in leavers:
>            print(user) # eventually I'll write to a file here
> Or, since this seems a set difference, I'd do it with sets
> if the files are small enough to fit in memory, and the db
> contains no duplicate you care to retain in the result (use
> set(L) in 2.6 and earlier, as it has no set comprehensions):
>    leavers = {line.rstrip() for line in open('user.list')}
>    users   = {line.rstrip() for line in open('user.db')}
>    notleft = users - leavers
>    for user in notleft:
>        print(user)
> Perhaps the any() is the confusing bit: it returns true if
> any of its iterations are true — not if its argument is
> simply empty (which you can test without any(), as empty
> means false in Python).
> --Mark Lutz (http://learning-python.com)
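
The point about any() in the earlier reply can be seen in a short sketch. The reader's original code was omitted above, so the names and data here are hypothetical stand-ins and are not taken from it.

```python
leavers = ['bob', 'sue']

# any() is true only if at least one item it iterates over is true
match = any(name in 'bob@spam.com' for name in leavers)
no_match = any(name in 'ann@eggs.org' for name in leavers)

# An empty iterable, or one holding only false values, yields False
empty_result = any([])
falsy_result = any(['', 0, None])

# Emptiness itself is tested without any(): empty objects are false
lines = []
is_empty = not lines

print(match, no_match, empty_result, falsy_result, is_empty)
```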

[Back to Index]

More on Benchmarking: Understanding timer.bestof() Results

The following is a reader thread from June 2014, dealing with some of the subtleties of timing results reported in the book's new benchmarking chapter. This may not make very much sense without the book's context, but underscores common timing factors, and points out a potential Mac-specific issue.

> -----Original Message-----
> From: ...
> To: Mark Lutz 
> Subject: Fwd: Not really errata, but very curious
> Date: Sat, 7 Jun 2014 20:57:25 -0700
> Mark,
> When I replaced time.time with time.perf_counter, the 
> "timer.bestof(1000, str.upper, 'spam')" 0.0 anomalies disappear
> Begin forwarded message:
> > From: ...
> > Subject: Not really errata, but very curious
> > Date: June 7, 2014 at 1:45:03 PM PDT
> > To: Mark Lutz 
> > 
> > Mark,
> > 
> > In chapter 21, following along, I ran the examples you gave in the
> > book, specifically:
> > 
> > timer.bestof(1000, str.upper, 'spam')   
> > (see text below from your examples)
> > 
> > Each time I ran it, I would get (0.0, 'SPAM').
> > This was true even if I reduced the reps to 10.
> > 
> > Could the cause of it be the value of the preset epsilon?
> > I am running python3.4 on a Mac OSX 10.9
> > >>> sys.float_info
> > ...
> > Using print statements, I see multiple occurrences of elapsed 
> > being set to 0.0 on repeated runs with even 10 reps.
> > 
> > I do not expect to see so many computed elapsed times of 0.0
> > 
> > On the same page (see text below),
> > why is the return best value, tuple[0] always greater than the last 
> > computed elapsed time, tuple[1][0]?  That seems counter intuitive to me.
> > 
> > ---------------------------------
> > My PDF page 632 (in part):
> > 
> > [...book text omitted...] 
> >  

Have you worked this one out on your own yet?  If not,
here's a quick review.

1) About the time module calls: no idea why time.time() is
failing on Macs; this is actually fairly disturbing, as 
the newer time.*() calls in 3.3 (per page 633's sidebar) 
are not available until 3.3 on any platform.  All other 
Pythons will have the issue on Macs.  To be explored, but 
I'd also try time.clock() on the Mac, as it's in 2.X too.

2) About the bestoftotal() results: I agree it seems counter
intuitive, but there are two factors contributing to this:

a) There is some small amount of overhead time that elapses 
between the timer() start and stop calls in bestof(), which 
is added to the time reported by the nested total() call's 
time result.  The timer() calls in total() will thus always 
show a shorter elapsed time — which is why it may be better
to use a min(total(....)) approach in general, as explained 
near the top of page 633.

b) Look carefully at what bestoftotal() is actually 
returning: it's the same as the bestof() call, and contains
the best (i.e., min) time among all timed total() runs in 
tuple[0], along with the _final_ (not best) return value from
total() in tuple[1] — a total time and function result as a
nested tuple.  Hence, the bestof() result's tuple[1][0] time
from total() doesn't necessarily correspond to the best time 
reported in bestof()'s tuple[0] — which is itself skewed a 
bit up by the overhead inside bestof()'s code per the prior 
point.  Complex, but true; these two will likely never agree.
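
To make point (b) concrete, here is a simplified stand-in for the functions under discussion. It follows the structure described above but is a sketch under that assumption, not the book's exact timer.py code; its only purpose is to show which time lands in which slot of the result.

```python
import time

timer = getattr(time, 'perf_counter', time.time)    # assumption: best clock available

def total(reps, func, *pargs, **kargs):
    """Total time to run func() reps times, plus its last result."""
    start = timer()
    for _ in range(reps):
        ret = func(*pargs, **kargs)
    elapsed = timer() - start
    return (elapsed, ret)

def bestof(reps, func, *pargs, **kargs):
    """Quickest single run of func() among reps runs, plus its last result."""
    best = float('inf')
    for _ in range(reps):
        start = timer()
        ret = func(*pargs, **kargs)
        elapsed = timer() - start        # includes the call overhead around func()
        best = min(best, elapsed)
    return (best, ret)

def bestoftotal(reps1, reps2, func, *pargs, **kargs):
    """Best of reps1 runs of total(reps2, func, ...)."""
    return bestof(reps1, total, reps2, func, *pargs, **kargs)

best, (last_total, result) = bestoftotal(5, 1000, str.upper, 'spam')
print('best outer time :', best)          # min over all 5 total() runs, timed from outside
print('final inner time:', last_total)    # from the 5th total() run only, timed from inside
print('function result :', result)
```

The two times come from different runs and different measuring points, which is why they need not agree.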

--Mark Lutz (http://learning-python.com)

[Back to Index]

Programming Concepts for Younger Readers

Another reader wrote to ask for advice on using Learning Python and other books for teaching programming concepts to his 12-year-old child (not surprisingly, on-line book forums were not as useful as they might have been). There is no generic answer that applies to every learner, but here are a few pointers:

> [...March 2014 reader email omitted...]


I understand your position.  It's difficult to get good advice
on the Internet these days.  Asking an author about his own 
book probably isn't much better most of the time.  But as a
former parent of curious younger children too, I'll try to be 

Learning Python was designed to teach both programming and
the Python language at the same time.  It deals with common
software design issues like global variables, complexity, 
and even some ethics, to try to help readers become more
rounded practitioners.

That said, this is not a project-based book (it's a didactic
tutorial), so you might also be served by working on a tangible
project in parallel with the book, or surveying some of the 
Python books available for younger learners.  I don't know much
about these, but many are project based, some use gaming as
their vehicle, and some kids respond better to results-oriented
approaches.  If you go that route, I'd recommend combining it
with a more in-depth language resource like Learning Python; 
it's difficult for books that teach gaming to also teach 
Python or programming at large.

If you'd like to browse a sampler of Learning Python, 
O'Reilly has the first chapter posted on-line at the following 
site; this is a nontechnical overview, though, so it doesn't 
do much to show the meat of the book:


I'm also the author of Programming Python, which is entirely
application-focused, and does have a project focus (it constructs
GUIs, websites, and the like); but it's also too advanced for a
beginner to start out with, and assumes Learning Python's 
material as its prerequisite.  Then again, every learner is
different, so please judge for yourself.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Software Engineering Isn't Trivial

A reader wrote having difficulty learning enough programming on a tight schedule to use a Python scientific library, and asking which parts of Learning Python would suffice in a 2-week window left on an internship. The reader also mentioned that the entire team was getting bogged down in programming tasks for the library, instead of focusing on their core science work. This is a common concern; my reply:

> [...March 2014 reader email omitted...]


I appreciate your dilemma.  Many systems expose a Python 
scripting layer these days, but not all of them properly 
insulate their users from the complexities underlying the 
API.  I don't know if this is the case with your system,
but full-scale software engineering is as complex and
substantial as other engineering domains.  See your
university's computer science degree requirements to 
see what I mean.  There's a reason that many offer BS, 
MS, and PhD programs in the field.

Asking non-practitioners to write basic code can work if 
the system's internals are encapsulated well.  Moreover, 
not all people need full computer science knowledge — 
much as people balancing their checkbooks do not need to 
know calculus or statistics.  Unfortunately, people have 
been sold the idea that programming is somehow trivial for 
everyone (and many seem to have accepted the myth in full).
That makes it all the worse when systems expose too much 
of their implementation.  Again, this may or may not be 
your case, but it's a general concern.

In terms of the book, I suggest that the first few parts up
to the classes/OOP part should suffice for simple, procedural
scripting.  This can probably be covered in weeks with focus.
But that assumes the library you're using does not expose 
classes and OOP, or advanced functional or metaprogramming 
techniques; if it does, you've got more material to cover.  
OOP alone is a software engineer's tool, which in my 
experience is often too much for most people outside the 
field (much like calculus is to those outside science). 

As I mention in the book's Preface, learning a modern 
software tool like Python is not a trivial or quick task.
The new-style inheritance algorithm alone is easily enough
to occupy a week for someone already having a CS degree. 
Add in generators and Unicode, and it's a sizeable effort.

However you proceed, best wishes with your goals.  In a 
perfect world, the advice I'd like to pass on is that if
you and/or your team are getting bogged down in writing 
code, you should either expect to put in the effort required
to learn software engineering well, or hire a professional 
in the field who has.  But I also realize that won't fly for 
people in your position, with tight schedules and a need
to write just enough code to customize a packaged system.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Tracing Recursive Function Calls

A reader wrote O'Reilly's book support forum with questions about recursive call tracing functions included in the book's examples package. The dialog, with a few minor elaborations in the reply:

> -----Original Message-----
> From: BookTech 
> To: "lutz@rmi.net" 
> Subject: FW: Question regarding Learning Python 5E Mark Lutz
> Date: Tue, 5 Aug 2014 23:07:29 +0000 (GMT)
> Hi Mark,
> Can you clarify this for the reader?
> --------------- Original Message ---------------
> To: bookquestions@oreilly.com
> Subject: Question regarding Learning Python 5E Mark Lutz
> Hi,
> On page 559 the author refers to a script called sumtree2.py.
> The scripts has a lambda function called trace which the book quotes "*It
> adds items list tracing so you can watch it grow on both schemes, and can
> show numbers as they are visited so you see the search order.*"
> Unfortunately my script compiles so fast that all of the date shows up
> within a fraction of a second and unfortunately you can't see this
> phenomenon.
> I tried to modify the code to allow the trace function to be a delay
> mechanism:
> for i in [trace(items)]*int(1E6):
>     print('')
> But that didn't work either.
> Am I misunderstanding the text or am I missing something completely?
> Thanks!


This is about code not shown in the book, but included in the
examples package for readers to experiment with on their own;
it uses book examples, but adds tracing code and another variant.

In short, the "watch it grow" remark wasn't meant to imply that it 
would trace its progress slowly enough to observe in real time; it 
simply means that the included code can print status that can be 
inspected after the fact, to better understand the recursive calls.

Really, the included code simply has hooks for extra tracing, which
isolate display logic so that readers can flesh them out as desired. 
As provided, its two display functions print nodes as visited, but 
not the stack/queue lists:
   # as coded: show visits only
   trace = lambda x: None                 # or print
   visit = lambda x: print(x, end=', ')
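
For readers without the examples package at hand, here is a minimal self-contained sketch of a breadth-first traversal with the same two hooks. The function name sumtree, its keyword arguments, and the placement of the hook calls are assumptions for illustration; the real sumtree2.py may differ in its details.

```python
def sumtree(L, trace=lambda x: None, visit=lambda x: None):
    """Sum all numbers in nested list L, breadth-first, using an explicit queue."""
    tot = 0
    items = list(L)                   # start with a copy of the top level
    while items:
        trace(items)                  # hook: show the queue before each step
        front = items.pop(0)          # fetch and delete the front item
        if not isinstance(front, list):
            visit(front)              # hook: show each number as it is visited
            tot += front
        else:
            items.extend(front)       # queue nested items at the end
    return tot

tree = [1, [2, [3, 4], 5], 6, [7, 8]]
visited = []
print(sumtree(tree, visit=visited.append))    # 36
print(visited)                                # [1, 6, 2, 5, 7, 8, 3, 4]
```

Passing different callables for the trace and visit hooks changes what is displayed without touching the traversal logic.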

Here are some pointers for alternative ways to code these:

1) If you want to also see the lists as they grow and shrink, make
trace a synonym for the 3.X print function (and add the required
__future__ import in 2.X at file top if you're using that version):

   # show lists too
   trace = print
   visit = lambda x: print(x, end=', ')

2) If you really want to slow the progress down, simply add a 
call to the time.sleep() function to pause between prints; see
library manuals for more on this call:

   # show lists only, and slowly
   import time                                     # insert a pause
   trace = lambda x: (print(x), time.sleep(0.5))   # secs, need parens
   visit = lambda x: None

[Sidebar: the parens are required in this lambda, because the
tuple "," here outranks the lambda expression.  Or, equivalently,
"lambda" binds tighter than ",", which is a tupling operator of
lowest precedence, but only in contexts like these that don't 
treat "," specially.  Without parens, this lambda expression would
end after print(x) — yet another reason to code tuples in parens!]

Here is this version's output for the first breadth-first search:

   [1, [2, [3, 4], 5], 6, [7, 8]]
   [[2, [3, 4], 5], 6, [7, 8]]
   [6, [7, 8], 2, [3, 4], 5]
   [[7, 8], 2, [3, 4], 5] 
   [2, [3, 4], 5, 7, 8]
   [[3, 4], 5, 7, 8]
   [5, 7, 8, 3, 4]
   [7, 8, 3, 4]
   [8, 3, 4]
   [3, 4]

3) For full and gradual status, try something like this:

   # show lists slowly, visits slower
   import time
   trace = lambda x: (print(x), time.sleep(0.25))
   visit = lambda x: (print('=>', x, sep=''), time.sleep(0.5))

The last of these shows both lists and nodes visited, with a
quarter-second delay between list displays; here's what this
shows gradually for the first breadth-first traversal:

   [1, [2, [3, 4], 5], 6, [7, 8]]
   [[2, [3, 4], 5], 6, [7, 8]]
   [6, [7, 8], 2, [3, 4], 5]
   [[7, 8], 2, [3, 4], 5]
   [2, [3, 4], 5, 7, 8]
   [[3, 4], 5, 7, 8]
   [5, 7, 8, 3, 4]
   [7, 8, 3, 4]
   [8, 3, 4]
   [3, 4]

4) And finally, you can pause for a user Enter key press at
each step, if you really want an interactive trace (though
you'll probably tire quickly of keypresses for large trees):

   # show lists slowly, pause after visits
   import time
   trace = lambda x: (print(x), time.sleep(0.5))
   visit = lambda x: input('=>%s' %  x)           # 2.X: raw_input()
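
The precedence point in the sidebar above can be checked directly. In this small sketch (the variable names are illustrative), the unparenthesized form builds a tuple whose first item is the function, instead of a function that returns a tuple:

```python
# Without parens, the comma ends the lambda: this is ((lambda x: x + 1), 'extra')
unparenthesized = lambda x: x + 1, 'extra'

# With parens, the whole tuple is the lambda's body
parenthesized = lambda x: (x + 1, 'extra')

print(type(unparenthesized).__name__)     # tuple
print(unparenthesized[0](1))              # 2
print(parenthesized(1))                   # (2, 'extra')
```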

--Mark Lutz (http://learning-python.com)

[Back to Index]

More on Class Factories

A reader asked for clarification on an abstract code snippet in Learning Python, 5th Edition:

> -----Original Message-----
> From: ...
> To: "lutz@rmi.net" 
> Subject: Question - Learning Python
> Date: Thu, 6 Nov 2014 20:11:01 +0000
> Hi Mark,
>                 Thank you very much for make available a comprehensive, 
> in-depth Python book for readers like me. The book is clearly elaborated and 
> carefully arranged. I enjoyed reading the book and practicing with the 
> examples with little difficulty. If I do have any questions so far, here is 
> one: in page 956, classname parsed from the configuration file was not used 
> as the first argument of the factory function. Instead, aclass returned from 
> getattr() call was used. Could you explain what is the difference between 
> those two names?
> Regards,
> ...

Thanks for your note and feedback.  About your question:
the example is correct as shown, but very sketchy and
understandably confusing.  The idea it means to capture 
is that:

  1) The string name of a class, "classname", is parsed
  from a text file.

  2) This string name is then used by getattr() to fetch 
  a class object, "aclass".

  3) This class object is then called to generate an 
  instance, "reader", which is passed on to other code.

So, "classname" is used for getattr() to fetch the "aclass" 
object, which is then passed on to factory().  factory() 
doesn't need the string name "classname" anymore, because 
it uses the already-fetched "aclass" object directly.

The suggested application of this is that the "classname"
string might be entered in a GUI, and used to fetch and
then call the class object to make an instance dynamically.
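
The three steps above can be sketched in runnable form as follows. The reader classes, the streamtypes namespace (a stand-in for an imported module), and the hardcoded strings (stand-ins for values parsed from a file) are all hypothetical, invented here for illustration:

```python
import types

class TextReader:
    def __init__(self, source):
        self.source = source

class BinaryReader(TextReader):
    pass

# Hypothetical stand-in for an imported module that holds the reader classes
streamtypes = types.SimpleNamespace(TextReader=TextReader, BinaryReader=BinaryReader)

def factory(aclass, *pargs, **kargs):
    return aclass(*pargs, **kargs)         # call the class object to make an instance

classname = 'BinaryReader'                 # step 1: a string, as if parsed from a file
classarg  = 'data.bin'
aclass = getattr(streamtypes, classname)   # step 2: string name => class object
reader = factory(aclass, classarg)         # step 3: class object => instance

print(type(classname).__name__, type(aclass).__name__, type(reader).__name__)
# str type BinaryReader
```

The string is needed only for the getattr() lookup; once the class object is in hand, the factory never sees the name.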

--Mark Lutz (http://learning-python.com)

[Back to Index]

More Fun with Comprehensions

[Aug-2014] Learning Python 5E has two full chapters on iteration, comprehensions, and generators. I've coded a few additional examples recently that might shed more light on these tools—though, like many comprehensions, they seem to require moments of great clarity more than some coding options might. See the recent example programs table for more on the two use cases mentioned below.

From the mergeall System:

# find differing bytes in two files, named by path1 and path2;
# assumes the files are small enough to fit in memory all at once:
# enumerate() and zip() are both iterables on 3.X that defer their
# results, but reading a file's bytes pulls them all into memory;

bytes1 = open(path1, 'rb').read()
bytes2 = open(path2, 'rb').read() 
[(ix, (b1, b2)) for (ix, (b1, b2)) in enumerate(zip(bytes1, bytes2)) if b1 != b2] 

Run live:

>>> path1 = r'c:\marks-stuff\sheets\somefile.XLS'
>>> path2 = r'c:\users\mark\desktop\somefile.XLS'
>>> bytes1 = open(path1, 'rb').read()
>>> bytes2 = open(path2, 'rb').read()                               # read raw bytes
>>> bytes1 == bytes2
False
>>> bytes1[:8], bytes2[:8]
(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1', b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1')
>>> zipped = list(zip(bytes1, bytes2))
>>> zipped[:5]
[(208, 208), (207, 207), (17, 17), (224, 224), (161, 161)]          # combined bytes
>>> hex(208), hex(207), hex(17)
('0xd0', '0xcf', '0x11')
>>> [(b1, b2) for (b1, b2) in zipped if b1 != b2]                   # what differs?
[(250, 91), (11, 128), (165, 245), (157, 242), (173, 175)]
>>> len([(b1, b2) for (b1, b2) in zipped if b1 != b2])
5
>>> [(ix, (b1, b2)) for (ix, (b1, b2)) in enumerate(zipped) if b1 != b2]      # where?
[(145517, (250, 91)), (145518, (11, 128)), (145519, (165, 245)), (145520, (157, 242)), (145521, (173, 175))]
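
For experimenting without two real files, the same comprehension can be wrapped in a function and run on in-memory bytes. This is a sketch for Python 3.X, where iterating over bytes yields integers; the function name diff_bytes is invented here. Note that zip() truncates at the shorter argument, so a pure length difference is not reported by this technique.

```python
def diff_bytes(bytes1, bytes2):
    """Return (offset, (byte1, byte2)) for each position where the inputs differ."""
    return [(ix, (b1, b2))
            for (ix, (b1, b2)) in enumerate(zip(bytes1, bytes2))
            if b1 != b2]

print(diff_bytes(b'spam and eggs', b'spam AND eggs'))
# [(5, (97, 65)), (6, (110, 78)), (7, (100, 68))]

print(diff_bytes(b'spam', b'spam and eggs'))     # [] because zip() stops at 4 bytes
```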

From the pystockmood System:

# list construction and filtering, for terms used in text matching;
# combine nouns with verbs, and remove any items having duplicate
# prefixes, else they will match to the subject text redundantly;

nouns = ['wall street',  'stocks', 'markets']                       # plus others
verbs = [('rises', 'falls'), ('rose', 'fell'), ('rise', 'fall')]    # [(good, bad)]

# combine noun/verb
goodterms = [(noun + ' ' + good) for noun in nouns for (good, bad) in verbs]
badterms  = [(noun + ' ' + bad)  for noun in nouns for (good, bad) in verbs]

# fixup: 'x rise' is a prefix of 'x rises' => count for first term only!
goodterms = [term for term in goodterms if not
                  [other for other in goodterms
                             if other != term and term.startswith(other)]]

badterms  = [term for term in badterms if not
                  [other for other in badterms
                             if other != term and term.startswith(other)]]

Run live:

>>> nouns = ['wall street',  'stocks', 'markets']                   
>>> verbs = [('rises', 'falls'), ('rose', 'fell'), ('rise', 'fall')]
>>> goodterms = [(noun + ' ' + good) for noun in nouns for (good, bad) in verbs]
>>> goodterms
['wall street rises', 'wall street rose', 'wall street rise', 
 'stocks rises', 'stocks rose', 'stocks rise', 
 'markets rises', 'markets rose', 'markets rise']
>>> goodterms = [term for term in goodterms if not
...                   [other for other in goodterms
...                              if other != term and term.startswith(other)]]
>>> goodterms
['wall street rose', 'wall street rise', 
 'stocks rose', 'stocks rise', 
 'markets rose', 'markets rise']
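
As a variation on the fixup step, the nested list comprehension can be replaced by the any() built-in with a generator expression, which stops scanning at the first shorter term found. This is an alternative sketch rather than the pystockmood code itself, and the helper name drop_prefixed is invented here:

```python
nouns = ['wall street', 'stocks', 'markets']
verbs = [('rises', 'falls'), ('rose', 'fell'), ('rise', 'fall')]

def drop_prefixed(terms):
    """Keep only terms that do not begin with some other term in the list."""
    return [term for term in terms
            if not any(other != term and term.startswith(other) for other in terms)]

goodterms = drop_prefixed([noun + ' ' + good for noun in nouns for (good, bad) in verbs])
badterms  = drop_prefixed([noun + ' ' + bad  for noun in nouns for (good, bad) in verbs])

print(goodterms[:2])      # ['wall street rose', 'wall street rise']
print(badterms[:2])       # ['wall street fell', 'wall street fall']
```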

[Back to Index]

That Weird Iterator Example Sidebar in Chapter 20

[Dec-2014] A reader wrote seeking clarification about the iterators example in the sidebar on pages 621-622 (644-645 in later printings), titled Why You Will Care: One-Shot Iterations. The example:

def myzip(*args):
    iters = map(iter, args)
    while iters:
        res = [next(i) for i in iters]
        yield tuple(res)

taken verbatim from Python's standard manuals, works in 2.X but fails in 3.X, and was included as a prime example of the possibly unexpected consequences of 3.X iterator changes. In short, the change of map() results from lists to iterables is not just an interactive display issue; it can also lead to very subtle errors in 3.X, especially for code that expects the former list-like iteration behavior.

The incorrect code was included only to illustrate this point. Its intent was not to pick on Python manual writers; but if they didn't catch this, chances are good that it may trip up an unwarned LP5E reader too. I've trimmed much of the reader's mail below, as it chronicled a long and winding road in search of answers from other resources, including Stack Overflow, manuals, and mailing lists; indeed, this issue seems only weakly understood in the Python world at large. In any event, it has proven confusing for enough readers to merit a reply paste here.

> -----Original Message-----
> From: ...
> To: lutz@rmi.net
> Subject: Iterators and iterables (string vs list)
> Date: Sun, 7 Dec 2014 12:01:40 +0200
> I'm currently EE student (last year) and I'm studying Python for my B.Sc.
> project.
> For the above purpose, your book was chosen.
> Yep, till now it's the best way for me to learn this language from scratch.
> (just tried to make some compliment here ^_^).
> Note: before actually writing to you, I tried to solve this issue by myself:
> googled it, and wrote the letter to python mailing lists (but till this
> moment they were pretty silent with answers). 
> Anyway, in order to save your time I'll jump to the issue itself.
> At this moment I got to chapter 20 which talks about Comprehensions and
> Generations.  By the end of this chapter you provided some example  
> regard to "myzip" function and inherent iteration issues in it.
> So here is a quote from your book(pages: 621-622, "Learning Python", 5-th Ed.):
> [...]

First off, this example, taken from the Python manuals, has been 
notoriously confusing to many readers, and you've clearly done 
much valuable research on this already.  Replies to your three
somewhat related questions:

> 1. What actually map() trying to do in Python 3.X? 
> I mean, why is this works fine:
> >>> L = [1, 2, 3, 4]
> >>> k = iter(L) - what actually happens here?
> >>> next(k)
> 1
> and so on.

This creates an iterator on the list L itself — an object 
that produces the items in the list upon next() requests: 
the integers 1, 2, and so on.

> But not this: 
> >>> list(map(iter, L)) ---- and what happens here?
> TypeError: 'int' object is not iterable

The map(F, I) call applies function F to each item in iterable I.
It creates the results series:

   F(I.next()), F(I.next()),  F(I.next()), ...

In 2.X, map() produces and returns the results of this process 
all at once in a new list.  In 3.X, map() returns an iterable 
result object that delays the work until it's asked for a next 
result; the list() forces this object to produce all its results 
at once, and stores them in a new list for display or other use.

In your specific usage — list(map(iter, L)) — the map() call
is trying to apply iter() to each item _within_ iterable L, 
not to L itself (as in your prior code).  In equivalent 
indexing notation, this produces results series:

   iter(L[0]), iter(L[1]), iter(L[2]), ....

This won't work because the items within L are integers which 
do not support iteration.
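
One detail worth seeing in code: in 3.X the map() call itself does not fail, because no work is done until a result is requested. The following sketch (variable names are illustrative, and 3.X is assumed) shows the deferred error, and a case where map(iter, ...) does work because the items are themselves iterable:

```python
L = [1, 2, 3, 4]

k = iter(L)                 # one iterator over the list itself
first = next(k)             # 1

m = map(iter, L)            # 3.X: no error yet, because nothing has run
try:
    next(m)                 # runs iter(1), which fails
except TypeError as exc:
    message = str(exc)

nested = [[1, 2], [3, 4]]   # items that are themselves iterable work fine
heads = [next(i) for i in map(iter, nested)]

print(first)                # 1
print(message)              # 'int' object is not iterable
print(heads)                # [1, 3]
```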

> 2. Why strings are allowed(privileged)  "to become" an
> iterators(self-iterators)? 

Because strings are always sequences of 1-item strings.  That 
is, strings are iterable themselves, but so are their individual
components by definition.

This property is unique to strings, and stems from the fact that 
Python has no distinct type for individual characters.  In C, 
for example, strings are arrays of characters, and characters 
are atomic data items, that (usually) correspond to byte values. 
In Python, there are only strings, whose components are also 
strings of length 1; hence, the characters in a string are
themselves strings, and may be indexed, sliced, iterated, etc.:

   >>> x = 'spam'
   >>> x[0]               # first item in a string
   's'
   >>> x[0][0]            # but it's also a string of len 1...
   's'
   >>> list(iter(x))      # iterate over string itself
   ['s', 'p', 'a', 'm']
   >>> list(iter(x[0]))   # iterate over string's component
   ['s']

This doesn't generally work for lists or tuples, which are 
heterogeneous collections of arbitrary object types — except,
of course, for items in a list that happen to be 1-character strings:

   >>> y = [1, 'p', 2]
   >>> y[0]               # first item is an integer 
   1
   >>> y[0][0]            # not a sequence or iterable
   TypeError: 'int' object is not subscriptable
   >>> y[1][0]            # but second item is... 
   'p'
   >>> y[1][0][0]         # and so on: str[i] is always a str
   'p'

Confusing, perhaps, but it's a fundamental Python design 
choice, and hopefully clarifies the rest of your question:
items in a string are nested strings, and hence iterable
themselves, but that's not usually true for other more 
general collection types that are not homogeneous:

> I mean why, is this possible:
> >>> print(list(map(iter, S)))
> [<str_iterator object at 0x02E24FF0>, 
> <str_iterator object at 0x02E24CF0>, 
> <str_iterator object at 0x02E24E10>,
> <str_iterator object at 0x02E24DF0>]     
> I'm just trying to say, is that if I wouldn't tried to run the book's
> example with integer arguments (or tuples or lists as arguments)  it 
> wouldn't alarm this issue.
> And I would have lived happily assuming that I understand iterables. ))))
> Those examples works fine with strings but not with list/tuples etc.

> 3.	The last question
> You say:
> " But it falls into an infinite loop and fails in Python 3.X, because 
> the 3.X map returns a one-shot iterable object instead of a list as 
> in 2.X. In 3.X, as soon as we've run the list comprehension inside 
> the loop once, iters will be exhausted but still True. [...]
> To make this work in 3.X, we need to use the list built-in function 
> to create an object that can support multiple iterations". 
> (Like:"Wat?!" ^_^)

Well, a list.  Lists support multiple iterations (scans), but 
map() result objects do not.  You cannot rescan a map() result
by itself more than once, because it's empty after the first 
scan (in the book's phrasing, it's a "one-shot iterator").  
But wrapping a map() result object in a list() call collects 
its items in a list object which does allow multiple scans:

   C:\...> py -3
   >>> L = [1, 2, 3, 4]
   >>> [x * 2 for x in L]          # iterate across a list
   [2, 4, 6, 8]
   >>> [x * 2 for x in L]          # we can go again here...
   [2, 4, 6, 8]

   >>> M = map(abs, [1, 2, 3, 4])  # abs(X) simply returns X here
   >>> [x * 2 for x in M]          # iterate over a map() result
   [2, 4, 6, 8]
   >>> [x * 2 for x in M]          # <== but it's empty after 1 pass...
   []

   >>> LM = list(map(abs, [1, 2, 3, 4]))
   >>> [x * 2 for x in LM]
   [2, 4, 6, 8]
   >>> [x * 2 for x in LM]         # copying to a list works...
   [2, 4, 6, 8]

   >>> M = map(abs, [1, 2, 3, 4])
   >>> [x * 2 for x in M]
   [2, 4, 6, 8]
   >>> M = map(abs, [1, 2, 3, 4])  # or make a new map() object...
   >>> [x * 2 for x in M]
   [2, 4, 6, 8]
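
The underlying reason for the difference can be shown in a few lines: a list hands out a new, independent iterator on each iter() call, while a 3.X map() result is its own iterator and so has just one position to share. A small sketch, assuming 3.X:

```python
L = [1, 2, 3, 4]
M = map(abs, L)

print(iter(L) is L)           # False: each iter(L) is a new, independent scan
print(iter(M) is M)           # True: M is its own iterator, with one shared position

i1, i2 = iter(L), iter(L)
print(next(i1), next(i1), next(i2))    # 1 2 1

j1, j2 = iter(M), iter(M)
print(next(j1), next(j1), next(j2))    # 1 2 3
```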

> Why the infinite loop would be there and why should list() to make it
> finite?  o_0 

This is probably the most confusing part of the manual's arguably
confusingly coded example, and its failure in 3.X.  It occurs 
because once the map() result object is emptied by a single scan, 
it's always considered Boolean True, despite its empty status:

   >>> M = map(abs, [1])
   >>> next(M) 
   1
   >>> next(M)                     # now always empty in 3.X
   StopIteration
   >>> bool(M)                     # <= but True nonetheless...
   True

   >>> I = iter(M)                 # still empty/True if new scan tried
   >>> next(I)
   StopIteration
   >>> bool(M)
   True

This throws off the example's logic, triggering the infinite loop
in 3.X.  Specifically, the map() result object produced by the statement

   iters = map(iter, args)

is empty after its first scan of its iter() results within the 
loop, but also True thereafter:

   >>> iters = map(iter, ([1], [2, 3], [4]))
   >>> [next(i) for i in iters]
   [1, 2, 4]
   >>> bool(iters)
   True
   >>> [next(i) for i in iters]
   []
   >>> bool(iters)
   True
   >>> [next(i) for i in iters]      # infinite loop time...
   []
   >>> bool(iters)
   True

A list() call avoids this by allowing for multiple scans in 
3.X, and it's a non-issue in 2.X because map() returns a new 
list anyhow; in either case, the StopIteration is thrown in 
the loop correctly when any argument's iterator is exhausted.
Trace through the code again to see why.
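
Putting the pieces together, here is one way to code the list() fix discussed above. The try/except is an addition in this sketch for newer Pythons: as of Python 3.7 (PEP 479), a StopIteration that escapes a generator's body is converted to a RuntimeError, so the generator is ended explicitly with a return here instead.

```python
def myzip(*args):
    iters = list(map(iter, args))      # list() allows a rescan on every loop pass
    while iters:
        try:
            res = [next(i) for i in iters]
        except StopIteration:          # some argument is exhausted:
            return                     # end the generator explicitly (PEP 479)
        yield tuple(res)

print(list(myzip('abc', 'lmnop')))     # [('a', 'l'), ('b', 'm'), ('c', 'n')]
print(list(myzip()))                   # []
```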

[Some of this reply may show up (anonymously, of course) on my 
recent FAQs page, because it's officially attained common status.]

--Mark Lutz (http://learning-python.com)

[Back to Index]

More on map() with Differing-Length Iterables

[Jan-2015] A reader wrote seeking clarification on a Python 2.X failure for a map() example in the book. The edits this dialog spawned are recorded on the book's errata page at O'Reilly's site (look for page 617 there). The following gives the original email, followed by the discussion text from the errata post. This is mostly for 2.X readers (who may see the error), but it also is a general summary of the map() function's behavior in both lines.

> -----Original Message-----
> From: ...
> To: lutz@rmi.net
> Subject: Error? Learning Python 5e, p. 617
> Date: Wed, 24 Dec 2014 13:04:53 -0600
> Hello ... enjoying the book so far. I am getting an error message for a mapping 
> function on p. 617 (“Example: Emulating zip and map with Iteration Tools), 
> using Python 2.7. I tried checking online resources, including errata on your 
> book’s website, to no avail.
> The issue seems to be that map(pow, list1, list2) can’t tolerate lists of 
> different lengths (in contrast to the zip functions earlier on the page). 
> Whereas:
>     >>>map(pow, [1, 2, 3], [2, 3, 4])
> returns:
>     [1, 8, 81]
> adding 5 to the second list to match the book’s example:
>     >>>map(pow, [1, 2, 3], [2, 3, 4, 5])
> results in:
> Traceback (most recent call last):
>   File "", line 1, in 
>     map(pow, [1, 2, 3], [2, 3, 4, 5])
> TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'  
> Since I’m using Python 2.7, I have omitted the list terms in these examples.
> If I make the second list either shorter than, or longer than, the first I 
> get the error, which suggests that the function does not automatically stop 
> when it reaches the end of the shorter iteration. However, the book example 
> indicates that it should automatically stop at the end of the shorter list 
> iteration.
> Could you explain why the book example seems to tolerate unequal list lengths,
> but my code does not?

[from the errata page's post:]

A reader wrote to ask why this example on page 617 of Chapter 20:

   >>> list(map(pow, [1, 2, 3], [2, 3, 4, 5]))   # N sequences: N-ary function
   [1, 8, 81] 

works on Python 3.X, but fails in 2.X.  This reader later withdrew the query, 
after finding the earlier 2.X map() coverage which notes its None padding when 
argument lengths differ (by contrast, 3.X's zip() and map() both truncate).  
This earlier coverage is on pages 408-409, in Chapter 19's section "map equivalence 
in Python 2.X."

In hindsight, though, the Chapter 19 section (and related material later in 
Chapter 20) is perhaps not as clear about the 2.X/3.X differences in the map() 
call as it could have been.  Really, the padding with None always occurs in 2.X, 
_irrespective_ of the function argument passed in.  In full detail:

In Python 3.X, map() always truncates at the shortest argument's length, 
and a real function is expected in its first argument:

   C:\...> py -3
   >>> list(map(pow, (2, 3), (1, 2, 3)))     # 2**1, 3**2
   [2, 9]

   >>> list(map(pow, (2, 3, 4), (1, 2)))     # 2**1, 3**2
   [2, 9]

   >>> list(map(None, (2, 3), (1, 2, 3)))
   TypeError: 'NoneType' object is not callable

In Python 2.X, map() always pads shorter arguments with None, regardless of 
whether a real function or None is passed — which can lead to errors for 
functions that don't expect the None:

   C:\...> py -2
   >>> list(map(pow, (2, 3), (1, 2, 3)))     # 2**1, 3**2
   TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'

   >>> list(map(pow, (2, 3, 4), (1, 2)))     # 2**1, 3**2
   TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'NoneType'

   >>> list(map(None, (2, 3), (1, 2, 3)))
   [(2, 1), (3, 2), (None, 3)]

This is why page 617's "list(map(pow, [1, 2, 3], [2, 3, 4, 5]))" works 
in 3.X (as shown) but fails in 2.X (as not shown): on 2.X, the last 
function call runs "None ** 5" and fails.

In addition to 2.X map() coverage on pages 408-409, this 2.X behavior is
strongly implied by the manual zip() and map() implementations that 
immediately follow in Chapter 20.  Still, the extension to a real function
argument isn't stated explicitly in either location.

Also note that the 2.X-flavor map() implementation in Chapter 20's section 
"Coding your own zip(...) and map(None, ...)" is really just that of 2.X's 
map(None,...), as it does not apply a function to paired items (though you 
could easily extend it to do so).  This example primarily implements a 
zip() with padding.

For more on map() in Python 3.X and 2.X, see also Python Pocket Reference 
(a supplement to Learning Python), as well as Python's own standard 
library manual.  As a tutorial, Learning Python occasionally sidesteps 
some obscure or dated fine points on purpose; this qualifies on both counts
(mapping functions on differing-length iterables in 2.X seems rare in the 
extreme), but the book example's potential to confuse merits the patches.
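
For 3.X readers who want to see the 2.X padding behavior first-hand, it can be approximated with itertools.zip_longest. This is a rough emulation for two or more iterables only, and the name map2x is invented here; it does not reproduce 2.X's special case of map(None, X) with a single iterable.

```python
from itertools import zip_longest

def map2x(func, *iterables):
    """Approximate 2.X map(): pad shorter iterables with None, return a list."""
    padded = zip_longest(*iterables)            # fillvalue defaults to None
    if func is None:
        return list(padded)                     # like 2.X map(None, ...)
    return [func(*args) for args in padded]

print(list(map(pow, (2, 3), (1, 2, 3))))        # 3.X truncates: [2, 9]
print(map2x(None, (2, 3), (1, 2, 3)))           # [(2, 1), (3, 2), (None, 3)]

try:
    map2x(pow, (2, 3), (1, 2, 3))               # last call is pow(None, 3)
except TypeError as exc:
    print('TypeError:', exc)
```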

[Back to Index]

What Does "in" Do When Applied to a File?

[Jan-2015] A reader wrote O'Reilly's book support people asking about an example that applies the in membership operator to an open file object. I've seen this cause confusion before, so I'm posting the query and reply here; it's also an excuse to show an arguably powerful use case for the any() built-in, and preview the re pattern-matching module.

> -----Original Message-----
> From: BookTech 
> To: "lutz@rmi.net" 
> Subject: FW: Learning Python 5th Edition query
> Date: Mon, 5 Jan 2015 23:57:57 +0000 (GMT)
> Hi Mark,
> Would you be able to assist me in answering this customer's question about 
> your book please?
> --------------- Original Message ---------------
> From: ...
> Sent: 1/2/2015 2:38 AM
> To: bookquestions@oreilly.com
> Subject: Learning Python 5th Edition query
> Hi,
> I'm currently working my way through the above mentioned book and I'm a
> little confused by the results produced in one of the examples.  On page 431
> the author gives us a list of examples in which we can see the iteration
> protocol at work on a file and all of them work just fine except the
> membership test and although this makes sense to some degree, the author
> clearly expected it to work.
> I'm using my own file but I know one of the tests should produce a True
> result but it doesn't.  I've now tried this in both Python v2.7.6 and v3.3
> but I always get a False result.  
> I've attached the file I'm reading from and copied and pasted my own two
> forms of the example below;
> print('blue' in open('test.txt'))
> print('brown' in open('test.txt'))
> As I've said I was not overly surprised that the test fails because this
> example isn't really asking for the file to be read, it is a
> straight-forward test and the object it is testing again isn't a string, so
> I would have expected an exception to be raised instead.
> I imagine there are other ways to perform the test but I'm curious to know
> why this just produces a False.

Sure; this is actually simpler than the reader might expect.
Look carefully at what the "in" tests in the book are doing:

   >>> 'y = 2\n' in open('script2.py')      # Membership test
   >>> 'x = 2\n' in open('script2.py')

These are testing for the presence of an entire line, not a 
word within a line.  That's why there is an explicit '\n' at the 
end of the test strings.  It must be so,  because the file object 
iterator returned by open() and activated by the "in" iterates 
through full line strings, not individual words or characters.

Hence, in the reader's case and test file, a test for an 
individual word will never be True, but testing for a full 
line (as in the book) does work in both Python 2.X and 3.X:

   C:\...> type test.txt
   The quick brown fox
   jumped over the lazy hen
   Mary had a little lamb
   its fleece as white as snow

   C:\...> py -3
   >>> 'brown' in open('test.txt')
   False
   >>> 'The quick brown fox\n' in open('test.txt')
   True
   >>> 'The quick blue fox\n' in open('test.txt')
   False
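The line-by-line comparison described above can be made visible by collecting what the file iterator produces. This is a minimal sketch, assuming a small temporary file that stands in for the reader's test.txt; the path and contents are illustrative only:

```python
import os
import tempfile

# Hypothetical two-line sample file, standing in for the reader's test.txt
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('The quick brown fox\njumped over the lazy hen\n')

with open(path) as f:
    lines = list(f)          # the items that "in" compares against, one by one

print(lines)

word_found = 'brown' in lines                    # no whole line equals 'brown'
line_found = 'The quick brown fox\n' in lines    # an exact full-line match
print(word_found, line_found)
```

Because each item is a complete line with its trailing '\n', a bare word can match only if it is the entire content of some line.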

If you really want to test for individual words within
lines, you might try an outer for loop that scans lines,
prints affirmative on a match, and negative in the else
(run if the loop didn't hit a break):

   >>> for line in open('test.txt'):
   ...     if 'brown' in line:
   ...         print('yes')
   ...         break
   ... else:
   ...     print('no')
   ...
   yes
   >>> for line in open('test.txt'):
   ...     if 'blue' in line:
   ...         print('yes')
   ...         break
   ... else:
   ...     print('no')
   ...
   no

Alternatively, you might apply the "in" to each line with 
a list comprehension, and run the result through the any()
built-in to see if any came out True:

   >>> [('brown' in line) for line in open('test.txt')]
   [True, False, False, False]
   >>> [('blue' in line) for line in open('test.txt')]
   [False, False, False, False]

   >>> any([('brown' in line) for line in open('test.txt')])
   True
   >>> any([('blue' in line) for line in open('test.txt')])
   False

But if you've gone to that much bother, a generator
expression saves having to type the square brackets
(and building a list of Boolean results in memory):

   >>> any(('brown' in line) for line in open('test.txt'))
   True
   >>> any(('blue' in line) for line in open('test.txt'))
   False
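In a script, the same generator-based test might be wrapped in a function that also closes the file explicitly. The following is a sketch only; the function name file_has_substring and the temporary sample file are assumptions made for illustration:

```python
import os
import tempfile

def file_has_substring(path, text):
    """Return True if text appears within any single line of the file."""
    with open(path) as f:                         # the file is closed on exit
        return any(text in line for line in f)    # stops at the first match

# Hypothetical sample data for a quick check
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('The quick brown fox\njumped over the lazy hen\n')

has_brown = file_has_substring(path, 'brown')
has_blue = file_has_substring(path, 'blue')
print(has_brown, has_blue)
```

Since any() quits as soon as one result is true, lines after the first match are never read.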

Caveat: this is a bit inaccurate, as the search for "brown"
will also report true for a "browning" in the file, which 
may or may not be what you want.  To look for whole words 
only, you might first split line strings on delimiters such 
as whitespace:

   >>> 'The quick brown fox\n'.split()
   ['The', 'quick', 'brown', 'fox']

   >>> any(('bro' in line) for line in open('test.txt'))
   True
   >>> any(('bro' in line.split()) for line in open('test.txt'))
   False
   >>> any(('brown' in line.split()) for line in open('test.txt'))
   True

On the other hand, splitting precludes searching for arbitrary
line substrings:

   >>> any(('brown fox' in line) for line in open('test.txt'))
   True
   >>> any(('brown fox' in line.split()) for line in open('test.txt'))
   False

Finally, if the file being processed is small enough to fit
into memory, it's both quick and easy to load it into a 
single string with the file.read() method, and apply "in" 
to do substring search — which is close to the original 
intent, but applied to the full file, not its line strings
(and will fail for pathologically large files):

   >>> 'brown' in open('test.txt').read()
   True
   >>> 'blue' in open('test.txt').read()
   False

   >>> 'bro' in open('test.txt').read()
   True
   >>> 'bro' in open('test.txt').read().split()
   False
   >>> 'brown' in open('test.txt').read().split()
   True

As a postscript (and preview of a topic in the domain of the 
book Programming Python): splitting on whitespace may not
suffice if the file contains punctuation characters; a comma
immediately following a word is enough to throw this scheme off. 
If such is your text, you can still perform whole word matches
by splitting on a set of alternatives with Python's re pattern 
matching module to be more general.
Here's the case for one line; apply this within a line iteration
or after a full text read as appropriate; precompile the pattern 
string for speed as needed; and see the mentioned book, the book 
Python Pocket Reference, or Python's library manual for more 
details on this module:

# whitespace splitting: some words missed

   >>> line = 'Brown,green; light-blue,  red.  And orange!'

   >>> line.split()
   ['Brown,green;', 'light-blue,', 'red.', 'And', 'orange!']

   >>> 'green' in line.split()
   False

# smarter splitting: 1 or more of any in the [] set, \s=whitespace

   >>> import re
   >>> re.split(r'[,;.!\-\s]+', line)
   ['Brown', 'green', 'light', 'blue', 'red', 'And', 'orange', '']

   >>> 'green' in re.split(r'[,;.!\-\s]+', line)
   True

# ignoring case

   >>> 'brown' in [s.lower() for s in re.split(r'[,;.!\-\s]+', line)]
   True

# using dashes or not, generator expr

   >>> 'blue' in (s.lower() for s in re.split(r'[,;.!\-\s]+', line))
   True
   >>> 'blue' in (s.lower() for s in re.split(r'[,;.!\s]+', line))
   False

# applying to all lines in a file

   >>> import re
   >>> print(open('test2.txt').read())
   The quick, brown, fox
   jumped over the lazy hen
   Mary had a little lamb
   its fleece as Blue-White as snow

   >>> any( ('blue'  in re.split(r'[,;.!\-\s]+', line)) for line in open('test2.txt') )
   False
   >>> any( ('brown' in re.split(r'[,;.!\-\s]+', line)) for line in open('test2.txt') )
   True

   >>> any( ('blue' in (s.lower() for s in re.split(r'[,;.!\-\s]+', line))) for line in open('test2.txt') )
   True
   >>> any( ('blue' in (s.lower() for s in re.split(r'[,;.!\s]+', line))) for line in open('test2.txt') )
   False
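The precompilation mentioned earlier might look like the following. This is a hedged sketch rather than the reply's own code: the names DELIMS, line_has_word, and lines_have_word are hypothetical, and the delimiter set should be adjusted to suit the text at hand:

```python
import re

# Delimiter set: punctuation, dashes, and whitespace (an assumption; tune it).
# A raw string keeps the backslashes intact, and compiling once avoids
# repeating the pattern setup on every line.
DELIMS = re.compile(r'[,;.!\-\s]+')

def line_has_word(line, word, ignore_case=True):
    """Whole-word test on one line, after splitting on DELIMS."""
    words = DELIMS.split(line)
    if ignore_case:
        words = [w.lower() for w in words]
        word = word.lower()
    return word in words

def lines_have_word(lines, word, ignore_case=True):
    """Apply the whole-word test across any iterable of lines."""
    return any(line_has_word(line, word, ignore_case) for line in lines)

# Two sample lines in the style of test2.txt above
sample = ['The quick, brown, fox\n', 'its fleece as Blue-White as snow\n']
found_blue = lines_have_word(sample, 'blue')
found_exact = lines_have_word(sample, 'blue', ignore_case=False)
found_bro = lines_have_word(sample, 'bro')
print(found_blue, found_exact, found_bro)
```

Passing an open file object as lines applies the same test to a file, one line at a time.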

--Mark Lutz (http://learning-python.com)

[Back to Index]

"Programming Python" Content and Status

[Feb-2015] An LP5E reader wrote with questions and suggestions for both Learning Python and Programming Python. As the latter is an applications-level follow-up to the former, this is relevant to both Learning Python readers and this page. Some of the replies pasted below may help explain the purpose and goals of Programming Python, as well as constraints inherent in writing such books.

> -----Original Message-----
> From: ...
> To: lutz@learning-python.com
> Subject: Information regarding Programming Python 5th ed.
> Date: Mon, 09 Feb 2015 15:18:41 +0000
> Hi Mark,
> I am a student from India and have been following your book Learning Python
> very closely and its helping me a lot. Since I'm about to finish Learning
> Python and very excited to jump into "Programming Python", I have a few
> questions regarding the book Programming Python :
>    [...specific questions quoted below...]
> Thanks for your time.
> Regards,
> ...


Thanks for your note, and best wishes with the books and Python.
Responding to your queries:

>    1. Is the 5th edition going to be released in the near future? Because
>    4th edition was in 2010 and there have been significant changes in python
>    since then.

No, there is no 5th edition planned for this book today.  I don't
believe there will be one in the next few years, if at all.

In general, this book is a tutorial on getting started in common 
application domains - the web, GUIs, systems, text, databases, and 
so on.  It teaches these domains' fundamentals that span, and are 
prerequisite to using, more specific tools.  Its CGI coverage, 
for example, lays the Web scripting groundwork needed to understand
and properly use more advanced frameworks such as Django.  As such, 
this book's material is not out of date with, and is even largely 
immune to, the latest twists and turns of the software field.

One more specific note: this book's examples are known to work under 
Python 3.3 and 3.4 [edit: and later, 3.5] with only minor patches, 
and so are as current as they can be.  For more on this, see the 
following page, especially if you purchase the book for use under 
the latest Python 3.X:


>    2. If 5th edition is underway, does it include topics that are new to
>    python3 like asyncio etc.

Per the prior point, there is no 5th Edition underway.
The asyncio module would be a prime new topic, of course, 
but it's a bit unproven given its age, and there is ample 
coverage of related parallel processing topics in the book. 
Again, the book stresses fundamentals underlying modules 
like asyncio, not just API details of specific libraries.
This is especially true for newly emerged tools.

>    3. Current edition (4th) of Programming Python explains about CGI
>    scripts for web-based programming, are there any plans to include stuff
>    like WSGI, and werkzeug  / other
>    REST API based frameworks.

No, again per the first point.  This book covers general 
fundamentals that span tools, rather than trying to cover the 
latest popular tools — which, time has shown, often have a 
heyday that is not as long as the shelf life of a book.

>    4. Learning Python mentions about coroutines in Generators section while
>    talking about the "yield" statement, however there is no further discussion
>    of coroutines in later chapters or Programming Python. Is this topic
>    included in next version given the popularity of coroutines in libraries
>    like "gevent" and "tornado".

I appreciate the suggestion.  There is no Learning Python update on
the drawing board today, and won't be for years.   This book is just 
1.5 years old today, after all; if there is an update, it most likely 
won't be until 2017 and Python 3.6, given the book's normal update
cycle.  That said, more on coroutines might work (especially use of
the latest "yield" extensions), if its audience is large enough to 
justify the growth. 

[Update: see also here and here.]

>    5. Lastly as a feedback, I would humbly request you to add an (optional)
>    section about a guide to contributing to python opensource project and
>    briefly explaining how the important files in the source result into a
>    minimalist python (basically which important files do what) and how to use
>    debugger to find the file related to a particular bug. This could be
>    important for the python community as Guido Van Rossum has constantly been
>    discusing about the need to get more people in the python core development
>    and also given that many college students are now interested in
>    contributing to python who have relatively lower experience/understanding
>    about design of programming languages and compilers.

All good ideas as well (and some of which I've addressed in earlier 
books).  Unfortunately, this level of detail tends to change too 
frequently to codify in books that may be around for a decade or 
more.  My general policy is that "ecosystem" topics like source code 
structure, PyPI, and development procedures are best addressed on 
the web, where they're much more easily updated than in books. 
PyPI, for example, did not exist when earlier editions were 
published, and could be subsumed by other tools in the future.

Still, these are all useful topics; thanks for the suggestions,
and again, best wishes with Python.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Hiding Standard Modules Can Make IDLE Fail Too

[Apr-2015] A reader wrote with an IDLE usage note that underscores some of the subtlety of module search paths:

> From: ...
> To: lutz@rmi.net
> Subject: Question...
> Date: Mon, 13 Apr 2015 17:36:20 -0700
> I'm working my way through "Learning Python," 5th edition. On page 724
> there's some code that involves creating a "string.py" in the main
> ("c:/code") directory.
> With the basic Python 3.4 distribution, this doesn't seem to work - when I
> create a file by that name, I can't get the Idle shell to open. It runs as
> expected in Python from command line, but this is significant enough that I
> thought it should be mentioned in future editions...

Thanks for your note; I'll consider adding a footnote on this in the 
future.  This is a bit grey, because:

- It's somewhat implied by this section's coverage (it demonstrates 
  a module in CWD hiding one in the std lib, which is just what happens 
  to IDLE if it's run in this CWD).

- The example works as run in the book (it has explicit command-line 
  prompts to give its usage mode). 

- Most readers probably aren't having the issue, because they are 
  launching IDLE by clicks instead of command lines: other launch modes 
  won't run IDLE in the CWD where string is redefined.  In fact, this 
  is why the examples must be run from a command line instead - IDLE won't 
  see your CWD if clicked in a file explorer.  Getting IDLE to see your 
  CWD via command line triggers the issue the section demos.

That is, IDLE is a Python program and follows the normal module
lookup rules for the tools it requires.  As the current directory 
is searched first, a string.py there can break IDLE, but only if
IDLE is launched in that directory with a command line.  Clicking 
to launch runs IDLE in a different directory, and without problem.
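One way to spot this situation before it bites is to scan a folder for files named like standard library modules. The sketch below is illustrative only: shadowing_files is a hypothetical helper, sys.stdlib_module_names is available only in Python 3.10 and later, and the fallback set used for older Pythons is a small hand-picked sample:

```python
import os
import sys
import tempfile

# sys.stdlib_module_names exists in Python 3.10+; the fallback is a tiny
# hand-picked sample (an assumption), not a complete list
STDLIB = getattr(sys, 'stdlib_module_names',
                 frozenset({'string', 'random', 'os', 're'}))

def shadowing_files(folder):
    """Return .py filenames in folder that share a standard module's name."""
    hits = []
    for name in sorted(os.listdir(folder)):
        base, ext = os.path.splitext(name)
        if ext == '.py' and base in STDLIB:
            hits.append(name)
    return hits

# Demo on a scratch folder holding one clashing and one harmless file
folder = tempfile.mkdtemp()
for name in ('string.py', 'script1.py'):
    open(os.path.join(folder, name), 'w').close()

found = shadowing_files(folder)
print(found)
```

Any file this reports can hide the standard module of the same name for every Python program started in that folder, IDLE included.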

Still, this could be useful, and even informative, to note.

--Mark Lutz (http://learning-python.com)

[Back to Index]

Running Scripts 2: General Pointers, Using "cd"

[May-2015] A reader wrote with confusion on launching Python scripts; this is beginner-level material, but may be common enough to warrant a paste here.

Update, Feb-2018 If asked today, I'd add to the reply below that the PyEdit program available freely on this site is a lightweight alternative to IDLE for editing and running Python code, especially for users just getting started with Python. Grab the app or executable for your desktop platform and experiment to see how.

> From: ...
> To: lutz@learning-python.com
> Subject: sorry but I am stuck
> Date: Sat, 2 May 2015 15:14:37 +0000 (UTC)
> Hello, 
> I'm new to programming and stuck in the first concept.  I bought the book through B&N. 
> I created a file in python, then saved it in a new folder on the c drive, just like 
> you explained in the book.  I tried it both ways with python interphase (GUI) and with 
> notepad (saving it with the py extension) and none of them work.  When I go to the 
> command line (it is already in python >>>) and type 
> c:\code\script1.py 
> it says that there is a syntax error 
> it seems like the error is in the : 
> I spent about two hours looking for a solution online.  Other people seemed to 
> have the same issue but no response.  I downloaded 2.7. 
> Thanks 

Have you worked this out yet on your own?  There are ample
resources in the book on running programs, but the first 
steps can be challenging for those new to programming.

Unfortunately, it's difficult to assist without seeing 
exactly what is failing for you.  One specific note: 

> When I go to the command line (it is already in python >>>) 
> and type c:\code\script1.py 

In this case, it appears that you are attempting to run a 
system shell command (to launch a program) at the Python 
prompt.  Per the book, that never works — you can type only 
Python code at the Python prompt.  You need to type and run
the "c:\code\script1.py" command from a basic system shell,
not from a Python session.  On Windows, that means it must be 
typed in a Windows Command Prompt window without starting the 
">>>" Python session.  That is:
  1. Open a Command Prompt window via Windows Start or Run menus
  2. Run system command "c:\code\script1.py"
Or, if the prior doesn't work due to bad filename associations:
  1. Open a Command Prompt window via Windows Start or Run menus
  2. Run system command "python c:\code\script1.py"
Or, if the prior doesn't work because Python isn't on your PATH (as described in an appendix in the book):
  1. Open a Command Prompt window via Windows Start or Run menus
  2. Run system command "c:\python27\python c:\code\script1.py"
To avoid typing the script's path, cd (change directory) to its folder first; in system commands, script names without their full paths are taken as relative to the current directory:
  1. Open a Command Prompt window via Windows Start or Run menus
  2. Run system command "cd c:\code" to go to the script's directory
  3. Run system command "python script1.py"
I suspect you were missing the "cd" command in this procedure.
For example:

   ...open Command Prompt...
   C:\Users\you> cd c:\code
   c:\code> python script1.py
   win32
   4294967296
   Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
   c:\code>

For more on the "cd" system command, try a "help cd" in a Windows
Command Prompt, or http://en.wikipedia.org/wiki/Cd_%28command%29.

Other Options

The preceding deals with running scripts from a command line, but,
as covered in the book, there are other launch options.

Alternatively, you can type an "import script1" Python command at
its ">>>" prompt, but this works only if the window where the ">>>"
appears is running in the "c:\code" directory (else Python cannot
find your file in the current working directory).  To use this
technique, you must:
  1. Open a Command Prompt window
  2. Run system command "cd c:\code" to go to the script's directory
  3. Run system command "python" to start the ">>>" Python session
  4. Run Python command "import script1" (with no ".py") to run your script
This runs the file as it was when Python started up; to see changes
you've made to the file, you may need to reload() or restart the
Python session.

Finally, you can run the script from IDLE and skip command lines
altogether, using IDLE's pulldown menu options (IDLE also changes to
the script's directory automatically):
  1. Start IDLE via Python menu in the Start menu (or other technique)
  2. Open the script's code in IDLE via "File->Open" in IDLE's main window
  3. Run the script's code via "Run->Run Module" in the newly opened window
You may also run a script by clicking its icon, but this fails if
there are errors in the code; IDLE or command lines work better.

All of this is covered in the book, so I encourage you to reread the
early chapters if you're still having problems.  If your script
generates Python error messages when run, you have passed the first
hurdle (it's running), but will need to make sure you copied its code
exactly as shown in the book to avoid syntax errors from Python.

--Mark Lutz (http://learning-python.com)
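The reload step noted for the import technique above can be tried in a self-contained way. This is a sketch under stated assumptions: it uses importlib.reload (Python 3.4 and later), a scratch folder in place of c:\code, and a hypothetical module named demo_script1:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True        # keep cached bytecode out of the demo
folder = tempfile.mkdtemp()           # scratch folder, standing in for c:\code
sys.path.insert(0, folder)            # like starting Python in that folder
path = os.path.join(folder, 'demo_script1.py')    # hypothetical module file

with open(path, 'w') as f:
    f.write('value = 1\n')
importlib.invalidate_caches()         # make sure the new file is noticed
import demo_script1                   # the first import runs the file's code
first = demo_script1.value

with open(path, 'w') as f:
    f.write('value = 22\n')           # edit the file on disk
import demo_script1                   # a repeat import does not rerun it
second = demo_script1.value

importlib.reload(demo_script1)        # reload reruns the file's current code
third = demo_script1.value
print(first, second, third)
```

The second import leaves the old value in place because the module is already loaded; only the reload picks up the edit.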

[Back to Index]

More on List-Based Matrix Summation

[Nov-2015] A student recently asked about computing element-wise sums of a list-of-lists matrix structure. The book covers most of this directly on page 113 (and uses sum() elsewhere), but the final step is added here. Its generator expression-based variant leverages the automatic iteration performed by the sum() built-in:

>>> M = [[1, 2, 3],                     # 3x3 2D matrix
...      [4, 5, 6],
...      [7, 8, 9]]

>>> M                                   # really a list of row lists
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]      

>>> M[1]                                # second row (in book)
[4, 5, 6]

>>> [row[1] for row in M]               # second column (in book)
[2, 5, 8]

>>> [sum(row) for row in M]             # sum of rows (in book)
[6, 15, 24]

>>> sum([sum(row) for row in M])        # sum of all items: sum of row sums
45

>>> sum(sum(row) for row in M)          # same, but via generator, not list
45
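The same tools extend to other summations. As a small sketch beyond the book's example, column sums can be had by transposing with zip(*M), and a diagonal sum by indexing; the variable names here are illustrative:

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

row_sums = [sum(row) for row in M]              # one sum per row
col_sums = [sum(col) for col in zip(*M)]        # zip(*M) pairs up the columns
total = sum(sum(row) for row in M)              # generator: no temporary list
diag_sum = sum(M[i][i] for i in range(len(M)))  # main diagonal: 1 + 5 + 9

print(row_sums, col_sums, total, diag_sum)
```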

[Back to Index]

Loading Matrix Data from a Text File

[Nov-2015] A reader wrote asking how to load numeric data from a text file. Because this sort of basic file-processing task is common to a wide range of applications, it merits a post here. Like all replies on this page, this one hopes to address common queries (and not homework deadlines...).


> From: ...
> To: lutz@rmi.net
> Subject: Python Help
> Date: Mon, 26 Oct 2015 08:30:03 +0200
> Hi Mark,
> I am lost in Python, can you guide me please.
> I got a protein distance matrix (you can check the attachment)
> How can I read this file and scale this value on a scale of 0 to 1 usign
> sigmoid function.
> I know we can do this in Python like this:
> import math
> def sigmoid(x)
>      return 1/(1+math.exp(-x))
> but I don't know how can I read and change this file values?
> Any help would be welcome, thanks!

Attached data file, distance_matrix:
gi15801179  0.0000 42.4581 10.6714 39.6484 15.0681  9.0639 10.8328 16.3808
gi9967069  42.4581  0.0000 42.9834 10.7504 14.9194 14.6448 41.0313 10.1185
gi15925280 10.6714 42.9834  0.0000  5.9973 12.5600 40.5210  5.8560 27.7503
gi12313641 39.6484 10.7504  5.9973  0.0000 15.4373 40.4623  2.3851 34.7955
gi14719485 15.0681 14.9194 12.5600 15.4373  0.0000 12.4809  8.8614 27.0177
gi4758426   9.0639 14.6448 40.5210 40.4623 12.4809  0.0000 42.7092 20.1177
gi6633958  10.8328 41.0313  5.8560  2.3851  8.8614 42.7092  0.0000 27.7887
gi21730171 16.3808 10.1185 27.7503 34.7955 27.0177 20.1177 27.7887  0.0000


I don't understand your application's goals, of course, but you can 
parse and load the data file with code of the following sort.  Given
that the data is whitespace-delimited text, the trick is to split and 
convert, while reading line by line.

To update a text file like this, you'll probably write or print lines 
to a new version of the file as shown, with a space or tab ('\t') 
between each number.  Binary files imply different loading techniques,
and can be updated in-place instead via file seeks.

# file scan.py

data = open('distance_matrix')                   # open data in input mode
newdata = open('new_distance_matrix', 'w')       # open results output file

numcols = int(data.readline().strip())           # line 1, less blanks, str->int

for line in data:                                # for each line left
    cols  = line.split()                         # split on whitespace
    label = cols[0]
    vals  = [float(text) for text in cols[1:]]   # strings -> numbers
    print(label, '=>', vals)

    for (ix, val) in enumerate(vals):
        # here, you may want: vals[ix] = sigmoid(val)?
        # not sure of the purpose of your numcols line
        # enumerate() makes range(numcols) superfluous 
        pass                                     # placeholder: a loop body needs a statement

    # write results: list -> text
    newdata.write('\t'.join(str(v) for v in vals) + '\n')

data.close()
newdata.close()                                  # flush output to disk
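To make the scaling step concrete, here is a self-contained sketch that parses rows in the format shown above and applies the reader's sigmoid to each value. It assumes in-memory sample text in place of the real file, and the helper name load_matrix is hypothetical:

```python
import io
import math

def sigmoid(x):
    """The reader's scaling function: maps any float into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def load_matrix(lines):
    """Parse 'label v1 v2 ...' lines into (labels, rows of floats)."""
    labels, rows = [], []
    for line in lines:
        cols = line.split()                     # split on any whitespace
        if len(cols) < 2:                       # skip blank or count-only lines
            continue
        labels.append(cols[0])
        rows.append([float(text) for text in cols[1:]])
    return labels, rows

# Two truncated rows from the attached data, for illustration only
sample = io.StringIO(
    'gi15801179  0.0000 42.4581\n'
    'gi9967069  42.4581  0.0000\n')

labels, rows = load_matrix(sample)
scaled = [[sigmoid(v) for v in row] for row in rows]
print(labels)
print(scaled)
```

Note that because distances are never negative, a plain sigmoid maps them into the range 0.5 up to 1 rather than 0 to 1; whether that suits the application is a separate question.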

--Mark Lutz (http://learning-python.com)

[Back to Index]

More Fun with map+lambda: Nested Loops

[Jun-2016] A reader wrote asking for comments on some code written to perform a nested loop with a map()/lambda combination. This is an extension of the map() coverage in the book—which stops short when the nesting becomes complex enough to qualify as cruel and unusual punishment. Nevertheless, because other readers might be interested in the subject, I've posted the code of my reply giving a handful of alternative solutions here:


There certainly are additional alternatives, and some might even be fun to play with, but such code already pushes the envelope on readability enough to merit a stop here. Excerpted text from the reader's email:

Hello sir, I am currently reading your book learning python(5th ed.) I
was stuck at the part where using only map one can implement nested
loops. I came up with a solution that works for 2 nested levels, i.e.
for the same test case given in the book.

...[see code in linked file above]

I know the above might be bad code. I have attached an image that may
be used for more than 2 nested levels(N levels). Could you kindly tell
me if I'm wrong and how should I correct it. I haven't checked it yet
for more than 3 levels.

Thank you for this wonderful book. Your book has made me realize and
appreciate the beauty of the python language and it's design.
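For readers curious about the general idea, a two-level nested loop can be coded with map and lambda by nesting one map inside another and flattening the result. The following is an illustrative sketch with made-up test data, not the code from the reply or the reader's email:

```python
from itertools import chain

xs = [0, 1, 2]            # hypothetical test data
ys = [100, 200, 300]

# The readable form: a comprehension with two nested for clauses
comp = [x + y for x in xs for y in ys]

# The map+lambda form: the inner map runs once per x, and
# chain.from_iterable flattens the per-x results into one sequence
mapped = list(chain.from_iterable(
    map(lambda x: map(lambda y: x + y, ys), xs)))

print(comp)
print(mapped)
```

Each further nesting level adds another map and another flattening step, which is a large part of why the comprehension form is usually preferred.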

[Back to Index]

On "The Shallows" and Wake-Up Calls

[Apr-2015] A reader wrote with observations and questions about The Shallows, a book mentioned in Learning Python's Preface (see the earlier post). As this seems of general interest, the trimmed query and my slightly edited reply are below.

> From: ...
> To: lutz@learning-python.com
> Subject: The Shallows
> Date: Fri, 20 Mar 2015 10:44:12 -0500
> Mr. Lutz,
> I'm working my way through your *Learning Python* book, 5th Edition. In
> your preface you gave thanks for the book *The Shallows* because of its
> wake-up call in your life.
> Because of that comment, I bought and read the book *The Shallows,* and I'm
> curious about how this book affected you. It seems that Python forms part
> of "the system" called the Internet that is slowly eroding away at the
> intellectual aspects of life in the modern world.
> [...]
> How did that book give you a wake-up call? Can we study technology (like
> Python or networking principles) and still maintain that "depth of thought"
> Carr says is disappearing in our culture because of technology?
> Maybe I answered my own question when I bought the paper copy of *Learning
> Python* and not the electronic copy.
> [...]
> Thank you for your book. It's the best I've found on Python.

Thanks for your note, and I'm glad you found both books useful.

For me, the wakeup call of "The Shallows" was a reminder that I've
worked for 3 decades in a field that's great at asking the "how"
questions, but often lousy at the "why".  That's left us with devices 
and an Internet that have changed society in titanic ways in a 
very short amount of time — and nobody in the field seems to 
have asked whether this is a good thing.  

It may be too late for such questions, given the leveling of entire
industries that the web has wrought; and it may be unrealistic given 
the money that's flooded the field.  But it's my hope that we proceed
with more forethought than in the past.  Not only may there be adverse
cognitive consequences per "The Shallows", but the social and political
implications of online lives are enormously perilous.

As for Python's role: yes, it was there at the start and remains a 
major player (it was used for Google's first 1996 web spider, is behind
YouTube and Dropbox, and is at the core of much data analysis today). 
I suppose those of us who helped launch it are akin to the physicists 
whose theoretical work led to atomic weapons; a toolmaker isn't 
responsible for the creations of the tool, but does have some 
ethical obligation to sound the alarm when it is misused.

As examples, see my recent notes on cloud storage and on-line calendars
[edit: these links have changed - see the addresses they now reference]:


Which is not to say that either the Internet or Python is all bad. 
To be sure, Python programming can instill the very depth of thought
whose demise "The Shallows" laments.  But something with very large 
impacts should come with very large cautions.

Best wishes with Python,
--Mark Lutz (http://learning-python.com)

[Back to Index]
