Monday, December 7, 2009

A Ruby book worth reading for Pythonistas

It's been a while since I've read an entire technical book over a weekend. But I enjoyed Design Patterns in Ruby so much that I kept picking it up for "one more chapter".

I played with Ruby a little several years ago, but I've never used it seriously. I need to use it at work though. Specifically, I wanted to learn about creating Domain-Specific Languages (DSLs) in Ruby, which is what led me to this book (since it has a chapter on DSLs).

When the book arrived late Thursday I skipped straight to the DSL chapter, which was interesting. But then I started reading it from the beginning. Chapter 1 talks about design patterns, the book Design Patterns, and the general principles that begin the Gang of Four book:
  • Separate out the things that change from those that stay the same.
  • Program to an interface, not an implementation.
  • Prefer composition over inheritance.
  • Delegate, delegate, delegate.
Russ Olsen adds "You ain't gonna need it."

Chapter 2 is a concise introduction to Ruby: just 37 pages. It's not thorough, but it's enough to be able to read the Ruby code in the subsequent chapters.

Chapters 3 through 15 explain many of the patterns covered in the Gang of Four book. I enjoyed brushing up on my patterns, but what kept me wanting to read "one more chapter" over the weekend was how after explaining the pattern, Olsen explains how it does or does not apply in Ruby. And he takes the opportunity to introduce Ruby features (like procs & blocks, modules and mixins, modifying classes and objects on the fly, message passing, etc.) as he does this.

He also writes about how each pattern can be abused, and describes real-world uses of the patterns in the Ruby libraries.

Chapters 16 through 18 describe three "patterns that have emerged with the introduction and expanded use of Ruby": DSLs, meta-programming and convention over configuration. And there are two appendices: A on installing Ruby on different platforms, and B contains a carefully annotated bibliography.

I suspect I'll find myself referring back to this book often, even when I'm writing Python code. I suspect reading it will help make me a better Python programmer (and a better programmer in general). So I recommend it.

If I had loads of time, it would be a worthwhile exercise to go through each pattern in the book and convert the examples to Python, and look for examples of the pattern in the Python libraries. I wonder if anyone else with more time has already had this idea.

Thursday, June 4, 2009

PyCon 2009 Notes - March 28th

Day Two of the "core" of PyCon 2009...

Morning Lightning Talks

You’ll find the video at http://us.pycon.org/2009/conference/schedule/event/41/.

Jesse Noller’s “Python License V3” is pretty funny (http://blip.tv/file/1931026 contains only that talk).

http://blip.tv/file/1999383 contains all the Saturday morning lightning talks.

Katie Cunningham talks on Python at NASA at 19:15.

PlayerPiano (http://playerpiano.googlecode.com) pretend typing of doctests for demos at 35:15.

James Bennett had a good presentation on DVCSs at 40:10.

PyMite - a subset of Python for microcontrollers (at 45:15)

  • http://pymite.python-hosting.com
  • will run in as little as 64 KB flash and 4 KB RAM
  • easy porting: one file with six functions
  • interactive interface
  • easy to write native functions: embed C code in docstrings
  • to try it out:
    • svn co http://svn.pymite.python-hosting.com/trunk
    • cd trunk; make ipm
  • TODO - would be interesting to port it to Nios II

web2py (at 49:50)

  • admin interface not just for database, but for design & debug

Keynote: Guido van Rossum

You’ll find the video at http://us.pycon.org/2009/conference/schedule/event/42/.

The R-word

  • “I’m not retiring, but I’m tiring.”
  • doesn’t enjoy traveling like he used to
  • “I’m sort of thinking over the next five or ten years to gradually fade away.”
  • Hopes in 5 years that the BDFL is not actually leading the community anymore, more of a figurehead.
  • “Don’t look to me for guidance for everything.”

Mentioned a new PEP that he supports on “yield from” which is useful for building “coroutine libraries” [I think]

MAY be a dependency & packaging API in 2.7 & 3.1—he mentioned this problem several times, it’s certainly on his mind as a big unsolved problem

“Remember, once you put an API in the core, it may not be dead but it evolves at a glacial pace.” “Third-party packages can evolve much faster.”

“The standard library may be just the right size.”

  • Can’t shrink it without killing apps.
  • Don’t want to add things until they’re stable. “Biologists call things that are stable ‘dead’.”

Answering a question about code blocks: “Let’s stop tinkering with the languages and [focus on] solv[ing] the problems around the language.”

  • FOR FUTURE INVESTIGATION: Guido said you could put a decorator around a nested function that “says execute here” (@apply?) - would this be the equivalent of Ruby code blocks? What are the advantages of Ruby code blocks? How are they used? Would this apply to nested, apply-decorated functions like this?
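Here is my guess at what Guido meant, as a minimal sketch in Python 3 syntax. The decorator name `run_now` and the `total_of_squares` function are my own inventions for illustration (in Python 2 the built-in apply, used as a decorator on a function with no arguments, should have a similar effect). It is not a quote from the keynote.

```python
def run_now(func):
    """Call the decorated function immediately and keep its return value."""
    return func()


def total_of_squares(numbers):
    results = []

    @run_now
    def _block():
        # This body runs right here, at definition time, and can see the
        # enclosing function's variables, much like a block of code handed
        # to a caller that invokes it exactly once.
        for n in numbers:
            results.append(n * n)
        return len(results)

    # After decoration, _block is the *return value* (an int), not a function.
    return sum(results), _block
```

The obvious difference from a Ruby block is that the "block" here needs a name and a decorator line, and it runs once on the spot rather than being passed to another method.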

Jacob Kaplan-Moss - The State of Django

You’ll find the slides (in PDF format) and video at http://us.pycon.org/2009/conference/schedule/event/45/. I didn’t take any notes. (And the slides aren’t meant to be read by themselves.)

Joe Gregorio - The (lack of) design patterns in Python

You’ll find the slides (in PDF format) at http://us.pycon.org/2009/conference/schedule/event/51/, but there’s no video yet. (I’ll update this when it appears.)

BTW, Joe Gregorio’s PyCon 2009 blog post is at http://bitworking.org/news/420/pycon-2009-notes. You’ll find links to his “An Introduction to Google App Engine” tutorial materials there (as well as mention of this talk).

This was one of my favorite talks from PyCon 2009.

Mythology - languages are all turing complete

Python isn’t Java without the compile

“Yes, design patterns are good, but they’re also a sign of weakness in a language.”

The patterns are built in [to Python].

  • “No one talks about the ‘structured programming’ pattern, or the ‘object-oriented’ pattern, or the ‘function’ pattern anymore.”

“When most of your code does nothing in a pompous way that is a sure sign that you are heading in the wrong direction. Here’s a translation into Python.”
- Peter Otten

Wikipedia Strategy Pattern page - “invisible in languages with first-class functions”.
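To make the "invisible" point concrete for myself, here is a small sketch (not from Joe's slides; all the names are hypothetical) where the "strategy" is just a function passed as an argument:

```python
def by_length(word):
    return len(word)


def by_last_letter(word):
    return word[-1]


def rank(words, strategy):
    """Sort words using whichever key function the caller passes in."""
    return sorted(words, key=strategy)


words = ["pear", "fig", "banana"]
shortest_first = rank(words, by_length)
last_letter_order = rank(words, by_last_letter)
```

There is no Strategy base class and no concrete strategy subclasses. Swapping the algorithm means passing a different function.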

no need for iterator pattern in Python

also talked about Abstract Factory pattern, and others

TODO - I need to think about his using a decorator for the observer pattern. (Slides 27 through 29.)

Conclusions:

  1. look to features before patterns
  2. reduce patterns -> shorter code
  3. needed patterns -> language feature

Concurrency Patterns listed on Wikipedia:

  • Thread pool, etc.
  • “do these point to language features we should be talking about adding?”

See the Google Video: “Advanced Topics in Programming Languages: Concurrency and Message Passing in Newsqueak” by Rob Pike

Jack Diederich - Class Decorators: Radically Simple

You’ll find the slides (in PDF & PPT formats) and video at http://us.pycon.org/2009/conference/schedule/event/55/.

This was also one of my favorite talks from PyCon. I recommend you watch the video.

Practical definitions of a decorator:

  • takes one argument
  • returns something useful
Example:
import cron
@cron.schedule(cron.NIGHTLY)
class SalesReport(Report):
    def run(self):
        pass  # do stuff
“Can do everything in class decorators that you can do in metaclasses” [I may have paraphrased.]
  • class decorators are explicit
  • class decorators are easily stackable (unlike metaclasses—while possible, it’s difficult to remember how)
  • nice alternative to mixins

Popular patterns:

  • register - like cron scheduler above
  • augment - like mixins
  • fixup
  • verify

very simple (and slick) registration class decorator - see slide 25
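I have not reproduced slide 25 here. The following is my own minimal sketch of the "register" pattern, with a hypothetical module-level REGISTRY dictionary and made-up report classes, written so that it follows the "return the original class" advice:

```python
REGISTRY = {}  # hypothetical registry, keyed by class name


def register(cls):
    """Record the class under its name, then hand it back unchanged."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class SalesReport:
    def run(self):
        return "sales report done"


@register
class InventoryReport:
    def run(self):
        return "inventory report done"


def run_all():
    """Instantiate and run every registered class, in name order."""
    return [REGISTRY[name]().run() for name in sorted(REGISTRY)]
```

Because register returns the class it was given, other decorators can be stacked above or below it without surprises.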

Best Practices

  • return the original class
  • don't assume you are the only decorator
  • maybe you want a metaclass [he still uses them sometimes]
  • don't add slots

Ed Leafe - Dabo: Rich Client Web Applications in 100% Python

Slides (in ODP format) and video at http://us.pycon.org/2009/conference/schedule/event/58/.

advantages of web application:

  • zero deployment
  • cross platform
  • centralized control (of databases)

disadvantages:

  • limited UI

Manifest Class

  • client and server diff each other, à la rsync

Dabo Springboard is a launch app (to allow you to run apps without having to install them, so you get the same advantage as a browser-based app)

Bob Ippolito - Drop ACID and think about data

Video at http://us.pycon.org/2009/conference/schedule/event/64/. I found the slides at http://bitbucket.org/etrepum/drop_acid_pycon_2009/ (linked to from his "PyCon 2009, Drop ACID and think about data" blog post).

This was a "firehose-style" presentation, so I recommend you first have a look at the slides, then try the video if interested.

ACID

  • Atomicity - all or nothing
  • Consistency - no explosions
  • Isolation - no fights (when used concurrently)
  • Durability - no lying

networks make ACID difficult in a distributed environment (which is necessary for reliability and scalability)

BASE

  • basically available
  • soft state
  • eventually consistent

Then reviews the pros & cons of:

  • BigTable
  • Dynamo
  • Cassandra
  • Tokyo Cabinet & Tokyo Tyrant
  • Redis
  • CouchDB
  • MongoDB
  • Column Databases
  • MonetDB
  • LucidDB
  • Vertica
    • this is the one his company (MochiMedia) decided to pay for; “it works”

“Bloom Filters are Neat”

  • probabilistic data structure
  • false positives at a known error rate
  • constant space
  • uses:
    • approximate counting of a large set (e.g. unique IPs from logs)
    • knowing that data is definitely NOT stored somewhere (e.g. remote cache)
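To convince myself of the "definitely NOT stored" property, here is a toy Bloom filter. It is only a sketch under my own assumptions (the class name, the default sizes, and the trick of salting SHA-256 to get several hash positions are all mine, not from Bob's slides):

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)  # fixed space, whatever is added

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            data = ("%d:%s" % (salt, item)).encode("utf-8")
            yield int(hashlib.sha256(data).hexdigest(), 16) % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

An item that was added always reports True, so a False answer is trustworthy. A True answer for an item that was never added is the false positive, and its likelihood grows as more bits get set.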

Ian Bicking - Topics of Interest

Video, slides, and IRC logs available at http://us.pycon.org/2009/conference/schedule/event/76/.

I went to this talk because I had heard Ian Bicking is a great speaker and is “not to be missed”. I’m afraid I wish I had missed it. Ian kept a live IRC display on the screen as he spoke. While it was good for some laughs, it was a distraction to both the speaker and the audience.

TODO - he mentioned a lot of names of what I presume are projects he’s started (he assumed—incorrectly in my case—that we all knew what they are), I should learn about them:

  • Paste
  • Deliverance
  • WebOb
  • PoachEggs
  • pip

Alex Martelli - Abstraction as Leverage

I chose the Ian Bicking talk over this one. The video and slides (PDF) are at http://us.pycon.org/2009/conference/schedule/event/75/.

TODO: I intend to review them. I’ve never regretted attending an Alex Martelli lecture.

Lennart Regebro - Python 2.6 and 3.0 compatibility

I chose the Ian Bicking talk over this one also. But this looks interesting: “This talks takes a look at the various options of migrating to Python 3, and takes up examples of some tricks you can do to make you code run unmodified under both 2.6 and 3.0.” The video and slides (PDF) are at http://us.pycon.org/2009/conference/schedule/event/74/.

Evening Lightning Talks

You’ll find the video at http://us.pycon.org/2009/conference/schedule/event/78/.

snakebite - http://www.snakebite.org/ (right at the start of the video)

  • “Snakebite is a network that strives to provide developers of open source projects complete and unrestricted access to as many different platforms, operating systems, architectures, compilers, devices, databases, tools and applications that they may need in order to optimally develop their software.”

reading and writing Excel files from Python (5:40)

melkjug.org (8:35)

  • for filtering RSS feeds
  • open source

funny numbers rant (15:55)

Martin v. Löwis talks about porting Django and psycopg to Python 3.0 (32:30)

Leonard Reder from JPL spoke about “Mars Science Laboratory - Flight Software Automatic Code Generation Tools” (37:35)

Wednesday, April 22, 2009

PyCon 2009 Notes - March 27th

Friday, March 27th through Sunday, March 29th were the “core” conference days. These are the days with regular scheduled talks, keynote talks, lightning talks and open spaces. You can see an overview of the schedule for these three days at http://us.pycon.org/2009/conference/schedule/.

I’ll present my notes from PyCon in chronological order.

Last year there were lightning talk sessions after the scheduled talks on all three days. Perhaps the scheduling committee got plenty of “more lightning talks” feedback, because this year each day also started with lightning talks. So except for a brief introduction from the PyCon 2009 Chair David Goodger on Friday morning, the conference kicked off with lightning talks. I found this quite fitting and in keeping with PyCon being a community conference.

Morning Lightning Talks

You’ll find the video at http://us.pycon.org/2009/conference/schedule/event/2/.

Jeff Rush - About Python Namespaces (and Code Objects)

See http://us.pycon.org/2009/conference/schedule/event/7/ for the video, slides and other files.

I didn’t know compiling and disassembling Python code is as simple as:
s = 'x = 5'
co = compile(s, '<stdin>', 'exec')
from dis import dis
dis(co)
His slides and/or the video are worth reviewing.

His “thunk” example—which he defines as “like a proxy but it gets out of the way when you need it”—is interesting. See page 38 of the PDF or about 11:30 of the video.

I also noted that he said “we spend a lot of time going over source code in Dallas [at the Dallas Python Interest Group]”. That would be a worthwhile thing to try at BayPIGgies—I’ll propose it [TODO].

Adam D Christian - Using Windmill

See http://us.pycon.org/2009/conference/schedule/event/9/ for the video and PowerPoint slides.

“Windmill is the best-integrated solution for Web test development and its flexibility is largely due to its development in Python.”
  • Open source
  • Looks pretty slick
  • http://www.getwindmill.com
  • Selenium does SSL, Windmill doesn’t…yet.
  • Selenium has strong Java integration (and Windmill does not)

Question: “Why did you create Windmill?”
Answer: “At the time it took us longer to debug a Selenium test than to write it over again.”

Mike Fletcher - Introduction to Python Profiling

See http://us.pycon.org/2009/conference/schedule/event/15/ for the video and slides in OOo & PDF formats.

Asked 12 programmers “If I had a million dollars to spend on Python…”. The top three answers were about improving performance.

Good introduction. You may want to check out the slides first and then turn to the video for more detail.
  • Visualization Tools:
    • KCacheGrind - “some assembly required for use with Python” [and non-trivial to get it working on Mac OS X]
    • RunSnakeRun: (http://www.vrplumber.com/programming/runsnakerun/) - “doesn’t provide all the bells-and-whistles of a program like KCacheGrind, it’s intended to allow for profiling your Python programs, and just your Python programs”

Kumar McMillan - Strategies For Testing Ajax Web Applications

See http://us.pycon.org/2009/conference/schedule/event/18/ for the video and a ZIP containing the slides in HTML (or go to http://farmdev.com/talks/test-ajax/).

5 strategies:

  1. Test Data Handlers
  2. Test JavaScript
  3. Isolate UI for Testing
  4. Automate UI Tests
  5. Gridify Your Test Suite

Some resources on his wrap-up slide.

Aaron Maxwell - Building an Automated QA Infrastructure using Open-Source Python Tools

See http://us.pycon.org/2009/conference/schedule/event/22/ for the video and the slides (in OOo & PPT formats). Or see http://redsymbol.net/talks/auto-qa-python/:

“This demonstrates the value of an automated QA system. If you need to manually execute the code coverage tool, then in practice you just won’t do it as often as if it is run for you. If your QA system automatically runs code coverage each night (for example), you and your team are freed up from bothering to do it manually - or even remembering to do so. It’s just done silently, and a fresh coverage report is available when you are ready to see it.

“This talk referenced the The Buildbot QA/CI Framework. There are many such frameworks with different plusses and minuses. BuildBot’s weakness is its brief but steep learning curve, which makes it harder than anyone would like to set up for simple projects. Its plusses are its generality, range, and extensibility: it can be made to do almost anything you need your QA system to do, even for tremendously large projects with complex test metrics. Overall, I recommend BuildBot be used for building your QA framework, unless you have some particular reason to use one of the others that are out there.”

From slide 5: “Your QA System is ONLY as good as its reporting of results. If you don’t get this done well… none of the rest matters. Under appreciated…And critically, critically important.”

From slide 9: “BuildBot is probably the best general purpose Python-based, open-source framework available now.”

Slide 11 gives quick definitions of some BuildBot architectural terms.

Slides 12-19 walk through examples of a simple and a more complex BuildBot configuration.

Slides 20 & 21 show examples of extending BuildBot.

Owen Taylor - Reinteract: a better way to interact with Python

I didn’t attend this talk, but several people remarked on it later. I’ve since played with Reinteract and I recommend you check it out: http://www.reinteract.org/.

See http://us.pycon.org/2009/conference/schedule/event/23/ for the video and the slides (in PDF format). The slides are not at all useful by themselves. But I definitely recommend you watch the video. Reinteract could well be a tool you’ll want to use regularly.

“Traditionally Python has worked one of two ways: either a program with an edit-run cycle or a command prompt where the user types commands. Reinteract introduces a new way of working where the user creates a worksheet that interleaves Python code with the results of that code. Previously entered code can be changed and corrected. The ability to insert graphs and plots in the worksheet makes Reinteract very suitable for data analysis, but it also is a good for basic experimentation with the Python language. This talk introduces Reinteract and gives a high-level peek at the magic behind the scenes.”

Ned Batchelder - Coverage testing, the good and the bad.

See http://us.pycon.org/2009/conference/schedule/event/26/ for the video and the slides (in PDF format).

“Coverage testing tests your tests”

The slides are easy to read without the video if you prefer, so I won’t duplicate them here.

Writing more tests is the “only way to truly increase code coverage”. Excluding code to boost coverage is tempting, but you’ll never come back, so you’re only hurting yourself.

What is currently “100% broken”:
  • branch coverage
  • path coverage
  • loop path coverage
  • data-driven code - can’t measure data used
  • complex conditionals
  • hidden branches
  • broken tests

Dr. C. Titus Brown - Building tests for large, untested codebases

See http://us.pycon.org/2009/conference/schedule/event/30/ for the video and the slides (in PDF format).

Presented on his experiences creating tests for pygr, a Python graph database (for use in bioinformatics). (slide 11)
  • ~8K of Python, ~2K of Pyrex (-> C, for speed)
  • almost all library and framework (complex)
  • lots of technical debt
Code coverage invaluable when aimed at (slide 16)
  • new tests efforts on legacy code
  • understanding code bases
Grokking code through coverage (slide 19)
  • start with minimum useful statement
  • examine code that’s actually executed
  • add additional statement
  • examine executed code
  • repeat
(At some point, though I can’t find it in the slides, he showed a --coverage-diff command-line option, to figleaf?)

Coverage driven testing (slide 29)
  • each new test should “attack” an uncovered line of code
  • immediate gratification of new code coverage
  • finds simple bugs with ease
  • you now understand that code

Jesse Noller - Introduction to Multiprocessing in Python

I didn’t attend this (as it was at the same time as the above talk), but I heard it was good. See http://us.pycon.org/2009/conference/schedule/event/31/ for the video and the slides (in PDF format).

Michael Foord - Functional Testing of Desktop Applications

See http://us.pycon.org/2009/conference/schedule/event/34/ for the video. See http://www.voidspace.org.uk/python/articles/testing/index.shtml for “online slides”.

If you write applications without tests then you are a bad person, incapable of love. — Wilson Bilkovich (The Rails Way)

Why Test Functionally? (http://www.voidspace.org.uk/python/articles/testing/processes.shtml)
  • Unit tests test components - not the application as a whole
  • Check new features don’t break existing functionality
  • Massively helpful when refactoring
  • Individual tests act as specification for a feature
  • Test suites are a specification for the application
  • When the test passes you know the feature is done
  • They can drive development

Good advice in dealing with problems (http://www.voidspace.org.uk/python/articles/testing/problems.shtml)

  • Fragility due to layout changes
  • Timing problems (beware the lure of the voodoo sleep)
  • Some UI elements are very hard to test
  • System dialogs (that are hard to interact with programmatically)
  • How do you test printing?
  • Bugs in the GUI toolkit
  • Spurious, random and impossible failures

Raymond Hettinger - Easy AI with Python

I didn’t attend this (as it was at the same time as the above talk), but I heard it was good. See http://us.pycon.org/2009/conference/schedule/event/71/ for the video and slides (in PPT & PDF formats).

Evening Lightning Talks

You’ll find the video at http://us.pycon.org/2009/conference/schedule/event/39/.

RANT: “import *” is evil (right at 0:05 in the video)

some call Brazil “Belindia” because it’s like “islands of Belgium in a sea of India” (6:18)

Michael Foord - Metaclasses in Five Minutes (12:00)
http://www.voidspace.org.uk/python/articles/five-minutes.shtml

Thursday, April 16, 2009

Python 401: Some Advanced Topics

On Thursday, March 26th, I attended Steve Holden’s Python 401: Some Advanced Topics tutorial. This one wasn’t as mind-expanding as the previous tutorial (or the three I took at PyCon 2008), and none of the material was new to me. But I’ve found re-learning material will often fill in the gaps in my knowledge, and that certainly was the case here.

You’ll find the slides at http://holdenweb.com/files/Python401.pdf.

The material was divided into six “lessons”, and three appendices.

Lesson 1 (slides 4 through 15) was on string interpolation, which I thought I had mastered (especially after the Secrets of the Framework Creators tutorial at PyCon 2008 and after writing http://pypap.blogspot.com/2008/03/string-interpolation.html). But I did learn a few new things. For example, I didn’t realize that the ‘%s’ conversion converts the value with str(). So one can quite safely do:
print '%s' % foo
…regardless of the type of foo.

I also didn’t know that one can use an asterisk to make width and precision “data dependent”. (Steve notes this only works with tuple data—of course it won’t work with a dictionary because the values are not ordered.) So you can do the following:

>>> def foo_wide(width):
...     print '%*s' % (width, 'foo')
...
>>> foo_wide(4)
 foo
>>> foo_wide(10)
       foo
…or…
>>> import math
>>> def pi_wide(width, precision):
...     print '%*.*f' % (width, precision, math.pi)
...
>>> pi_wide(8,2)
    3.14
>>> pi_wide(10,5)
   3.14159
Lesson 2 (slides 17 through 27) was on iteration. Steve explained the “iteration protocol”:
  • iterables must have an __iter__() method which returns an iterator
  • iterators must be iterable, and must also have a next() method
Steve (and I later observed Michael Foord also) pronounced __iter__ as “dunder-iter”. It sounded a little strange at first, but it’s certainly easier than saying “under-under-iter” or “under-under-iter-under-under”.
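To check my understanding of the protocol, here is a minimal sketch with a separate iterable and iterator. The Countdown classes are my own hypothetical example, not Steve's. Note that in Python 3 the iterator method is spelled __next__() rather than next(), so the sketch defines both names.

```python
class Countdown:
    """An iterable: __iter__() returns a fresh iterator each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """An iterator: it is itself iterable and produces the next value on demand."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 spelling of the same method
```

Because Countdown hands out a new iterator on every call to __iter__(), the same object can be looped over more than once. The iterator itself is exhausted after one pass.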

Steve mentioned the itertools standard library module, but didn’t allocate time in the tutorial to cover it. (For that I recommend Doug Hellmann’s PyMOTW blog post.)

He concludes lesson 2 with a slide (#27) explaining how to use the enumerate() built-in function (which I have found useful many times).
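As a reminder to myself of how enumerate() reads in practice (this is my own small example, not the one on slide 27), it pairs each item with a running index so there is no need to maintain a counter by hand:

```python
names = ["spam", "eggs", "ham"]

# enumerate() yields (index, item) pairs, counting from zero by default.
numbered = ["%d: %s" % (i, name) for i, name in enumerate(names)]

# An optional second argument (available since Python 2.6) sets the first index.
numbered_from_one = ["%d: %s" % (i, name) for i, name in enumerate(names, 1)]
```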

Lesson 3 (slides 29 through 35) was on generators and generator expressions. I like Steve’s explanation that generators are for creating sequences where computation is needed to create each element. And in conclusion, he writes that generators can “express producer-consumer algorithms more naturally” since the “generation of values is cleanly separated from their processing”. But aside from these insights, I didn’t learn anything new about generators. (That may be difficult after David Beazley’s excellent “Generator Tricks for Systems Programmers” tutorial at PyCon 2008.) And in spite of the lesson’s title, Steve didn’t cover generator expressions.
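A sketch of what I take the producer-consumer remark to mean, using hypothetical function names of my own. One generator produces values, another consumes them lazily, and a generator expression (the part the lesson skipped) does the same kind of job inline:

```python
def read_numbers(lines):
    """Producer: turn non-blank text lines into integers, one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def running_total(numbers):
    """Consumer and producer: pull numbers in, push running totals out."""
    total = 0
    for n in numbers:
        total += n
        yield total


lines = ["1\n", "\n", "2\n", "3\n"]
totals = list(running_total(read_numbers(lines)))

# A generator expression: like a list comprehension, but evaluated lazily.
sum_of_squares = sum(n * n for n in read_numbers(lines))
```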

Lesson 4, covering descriptors and properties, was the most useful to me. I’d heard of descriptors and properties, but never really studied them or read code that used them. First, Steve explains in detail how attribute lookup works in new-style classes. This leads (after an aside which I’ll mention later) to his definition of properties: “a way of interposing code between client and server of a namespace”. Using the property() built-in, one can define a getter, setter and deleter, plus a doc string. And since the first argument to property() is the getter function, one can use property as a decorator (with no arguments) around a method. (See slide 41.) David Beazley (who was also taking this tutorial) spoke up and pointed out that in Python 2.6 a property object (returned from the property built-in) has setter and deleter methods that can be used as decorators. See the property built-in documentation for an example. On slide 45, Steve shows how to define properties without namespace pollution. Finally (slides 47 through 50) he goes into detail on the difference between old-style and new-style attribute lookup. I realized as Steve wrapped up this lesson that I still didn’t understand what a descriptor is, so I asked. Steve’s answer (I think) was that the “descriptor protocol” is what enables properties to work. I gave myself a to-do to read the Python documentation on descriptors.
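Here is a minimal sketch of the decorator form David Beazley described, using a hypothetical Temperature class of my own rather than anything from the slides:

```python
class Temperature(object):
    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def celsius(self):
        """Temperature in degrees Celsius."""
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Code interposed between the client and the stored attribute.
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    @celsius.deleter
    def celsius(self):
        del self._celsius
```

The property object stored on the class has a __get__ method, which is (as far as I understand it) the descriptor protocol at work: attribute lookup finds the property on the class and calls its __get__ instead of returning the property object itself.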

Back to the aside (on slide 39) I mentioned above. Steve notes that when you look up a callable on an instance, the interpreter creates a “bound method”, therefore (presumably because these are objects like everything else in Python) “a method call carries object creation overhead”. There’s a good illustration of this on the slide. This would be good to keep in mind if I ever find myself trying to squeeze as much performance as possible out of some Python code.
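This is not the illustration from slide 39, just my own small sketch (with a hypothetical Greeter class) of the same observation, plus the usual workaround of looking the method up once outside a hot loop:

```python
class Greeter(object):
    def greet(self):
        return "hello"


g = Greeter()
first = g.greet
second = g.greet

# Each attribute lookup builds a new bound-method object...
distinct_objects = first is not second
# ...although the two compare equal, since they wrap the same function and instance.
equal_objects = first == second


def call_many(obj, n):
    greet = obj.greet  # one lookup, one bound method, reused n times
    return [greet() for _ in range(n)]
```

I would only bother with the local-variable trick after profiling shows the lookup matters.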

Lesson 5 (slides 52 through 73) is on metaclasses. I’d seen these before in the “Secrets of the Framework Creators” tutorial at PyCon 2008 (and during the time I spent digging around inside the Django sources). If you’re still trying to wrap your head around metaclasses, this may be a quick way to get there. I won’t attempt to summarize, but the insight I gained from this lesson is that the type() built-in, when called with three arguments, returns a new type object. In other words it’s a dynamic form of the class statement. (This is the mechanism for implementing metaclasses.) You may also want to read Michael Foord’s “Metaclasses in Five Minutes” notes or watch the video of his lightning talk at PyCon 2009 (which is supposed to start 11 minutes in). Though I would conclude that if you’re considering using metaclasses, you should seriously consider using class decorators first. (See my notes on the “Class Decorators: Radically Simple” PyCon 2009 talk.)
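To see the "dynamic form of the class statement" for myself, here is a small sketch with hypothetical class names of my own. The three arguments to type() are the class name, a tuple of base classes, and a dictionary of attributes:

```python
def describe(self):
    return "%s with %d wheels" % (self.__class__.__name__, self.wheels)


# Built dynamically: name, bases, namespace dictionary.
Bicycle = type("Bicycle", (object,), {"wheels": 2, "describe": describe})


# The equivalent class statement, for comparison.
class Tricycle(object):
    wheels = 3
    describe = describe
```

Both classes end up as instances of type, which is why a metaclass (a subclass of type) gets to customize how classes are built.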

Lesson 6 (slides 75 & 76) was not really a lesson but a very quick wrap-up.

Finally there are three appendices. We did find the time to cover Appendix A (slides 78 through 84). It’s on decorators, but only on the simpler form of decorators that don’t take arguments. Slides 83 and 84 cover functools.wraps and functools.partial, and are interesting reading.
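A quick sketch of the two functools helpers mentioned on those slides. The logged decorator and the power function are my own hypothetical examples, not taken from the appendix:

```python
import functools


def logged(func):
    """A simple no-argument decorator that records each call."""
    calls = []

    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@logged
def power(base, exponent):
    """Raise base to exponent."""
    return base ** exponent


# partial() freezes some arguments and returns a new callable.
square = functools.partial(power, exponent=2)
```

Without functools.wraps, power.__name__ would report "wrapper" and the doc string would be lost, which makes decorated functions harder to debug.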

We did not cover the other two appendices. Appendix B (slides 86 through 89) is on context managers, which I myself covered back in July 2008 to present a “Newbie Nugget” to BayPIGgies on the with statement. Appendix C (slides 92 through 107) is on unit testing. If you’re new to unit testing or new to the Python unittest module, then this is worth a read.

Thursday, April 9, 2009

A Curious Course on Coroutines and Concurrency

On Wednesday, March 25th, I attended David Beazley's A Curious Course on Coroutines and Concurrency tutorial at PyCon 2009. This was an excellent tutorial that continued from where David's Generator Tricks for Systems Programmers tutorial from PyCon 2008 left off. (See my notes on that tutorial in my PyCon 2008 Notes blog post; I see I never did write a summary.)

David again has made his tutorial materials (including excellent slides and plenty of code samples) publicly available: http://www.dabeaz.com/coroutines/.

At the start of the tutorial I had written generators and had some recollection of the "Generator Tricks for Systems Programmers" tutorial. But I had only a vague sense of what a coroutine is. After the tutorial I feel like my understanding of coroutines is much deeper. I'm ready to use them when called for. I might even understand them well enough to avoid looking for all kinds of inappropriate nails to hit with this new hammer.

After an entertaining overview—David's sense of humor provided some sugar to help the medicine of the sometimes challenging material go down—he introduces coroutines. As he writes (on slide 8), in Python 2.5 generators "picked up some new features to allow 'coroutines'" (see PEP-342 "Coroutines via Enhanced Generators"), "most notably: a new send() method". He adds "If Python books are any guide, this is the most poorly documented, obscure, and apparently useless feature of Python."

I digress, but as the author of the Python Essential Reference, David has raised the bar for himself. I happened to pick up a copy of a draft manuscript of a fragment of chapter 6—"Functions and Functional Programming"—from the upcoming 4th Edition at the Addison-Wesley booth at PyCon. Following a section in that chapter explaining what coroutines are is a section entitled "Using Generators and Coroutines". David explains that "generator functions are useful if you want to set up a processing pipeline, similar in nature to using a pipe in the UNIX shell." After an example of this he writes "Coroutines can be used to write programs based on data-flow processing. Programs organized in this way look like inverted pipelines." The example that follows explains that "the coroutine pipeline remains active indefinitely or until close() is explicitly called on it." So "a program can continue to feed data into a coroutine for as long as necessary", and his example shows two consecutive calls to send different data into the pipeline.

I've never owned a copy of Python Essential Reference, but after reading this draft manuscript and seeing first-hand David's ability to simplify sometimes complex material, I've pre-ordered a copy of the 4th Edition.

Anyway, I need to remember that this is meant to be a summary of the tutorial. If you want the details you can read the slides and look at the code samples.

The tutorial was divided into 9 parts. Part 1 (slides 15 through 33) is a very clear introduction to generators and coroutines. He summarizes (in slide 33) that generators produce data for iteration, whereas coroutines are consumers of data and are not related to iteration. He warns us not to mix the two concepts together.
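A minimal sketch of a coroutine as a consumer, modeled on my recollection of the style of the tutorial's examples rather than copied from the slides. The matches list argument is my own addition so the result can be inspected:

```python
def coroutine(func):
    """Decorator that primes a coroutine by advancing it to its first yield."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


@coroutine
def grep(pattern, matches):
    """Consume lines pushed in with send(); keep the ones containing pattern."""
    while True:
        line = (yield)  # execution pauses here until someone calls send()
        if pattern in line:
            matches.append(line)
```

A typical use would be to create found = [], then g = grep("python", found), push lines in with g.send(...), and finish with g.close(). Nothing here is iterated over, which is the distinction he draws on slide 33.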

In Part 2 ("Coroutines, Pipelines, and Dataflow", slides 34 through 52) David explains that coroutines can be used to set up pipes. Each pipeline needs an initial source (a producer) and an end-point (a sink). Because (unlike generators which pull data through the pipe with iteration) coroutines push data into the pipeline with send(), they allow you to send data to multiple destinations. That is, you can have branches in the pipeline. He shows broadcasting to multiple targets as an example. (See slides 44 through 46.) He concludes by showing how coroutines are "somewhat similar to OO design patterns involving simple handler objects". (I think he's talking about the Chain of Responsibility pattern.) He notes that just like a generator is an iterator "stripped down to the bare essentials", so is a coroutine very simple compared to the multiple classes required to implement this pattern. (This is an example of the claim made by Joe Gregorio in his The (lack of) design patterns in Python PyCon 2009 talk.) David also shows that coroutines are faster than objects (because of the lack of self lookups).
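Here is my own small sketch of a branching pipeline in that spirit: a source, a broadcast stage, two filters and two sinks. The stage names keep_if and collect are hypothetical, and the code is a reconstruction of the idea, not the code from slides 44 through 46.

```python
def coroutine(func):
    """Prime a coroutine so it is ready to receive send() calls."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


def source(items, target):
    """The producer end of the pipe: an ordinary function that pushes data."""
    for item in items:
        target.send(item)


@coroutine
def broadcast(targets):
    """Fan each incoming item out to several downstream coroutines."""
    while True:
        item = (yield)
        for target in targets:
            target.send(item)


@coroutine
def keep_if(predicate, target):
    """Pass along only the items for which predicate(item) is true."""
    while True:
        item = (yield)
        if predicate(item):
            target.send(item)


@coroutine
def collect(bucket):
    """The sink end of the pipe: append everything received to a list."""
    while True:
        bucket.append((yield))


evens, odds = [], []
source(range(6), broadcast([
    keep_if(lambda n: n % 2 == 0, collect(evens)),
    keep_if(lambda n: n % 2 == 1, collect(odds)),
]))
```

With generators each stage pulls from exactly one upstream stage, so this kind of fan-out would need extra machinery. With send() it is just a loop over targets.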

Part 3 ("Coroutines and Event Dispatching", slides 53 through 74) shows that "coroutines can be used to write various components that process event streams". His example shows parsing the XML data that is available with the real-time GPS tracking data of most Chicago Transit Authority buses. He has a coroutine that implements a simple state machine to convert the "events" from an XML parser into dictionaries of bus data, another coroutine to filter on dictionary fields, and a coroutine to print the dictionaries as a table. What's quite slick about this is that he hooks them together into a pipeline that works without modification with SAX, expat, and a custom C extension written on top of the expat C library. (Each is faster than its predecessor, and the last is slightly faster than using ElementTree.)
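
To give a flavor of the event-dispatching idea, here is a rough sketch using the standard library's SAX parser. It is not David's code: the XML sample is invented, and the element names (bus, id, route) and the helper names are assumptions for illustration only.

```python
import functools
import xml.sax


def coroutine(func):
    """Decorator that primes a coroutine by advancing it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


class EventHandler(xml.sax.ContentHandler):
    """Turns SAX callbacks into (event, value) tuples sent to a coroutine."""

    def __init__(self, target):
        super().__init__()
        self.target = target

    def startElement(self, name, attrs):
        self.target.send(("start", name))

    def characters(self, content):
        self.target.send(("text", content))

    def endElement(self, name):
        self.target.send(("end", name))


@coroutine
def buses_to_dicts(target):
    """Two-state machine: wait for <bus>, then gather its child elements."""
    while True:
        event, value = yield
        if event == "start" and value == "bus":
            bus, fragments = {}, []
            while True:
                event, value = yield
                if event == "start":
                    fragments = []
                elif event == "text":
                    fragments.append(value)
                elif event == "end":
                    if value == "bus":
                        target.send(bus)
                        break
                    bus[value] = "".join(fragments)


@coroutine
def filter_on_field(field, match, target):
    """Forward only the dictionaries whose field equals match."""
    while True:
        bus = yield
        if bus.get(field) == match:
            target.send(bus)


@coroutine
def collect(into):
    """Sink: append everything received to a list."""
    while True:
        item = yield
        into.append(item)


# Invented sample data, not the real CTA feed.
SAMPLE = (b"<buses>"
          b"<bus><id>7574</id><route>22</route></bus>"
          b"<bus><id>1801</id><route>147</route></bus>"
          b"</buses>")

matches = []
xml.sax.parseString(
    SAMPLE,
    EventHandler(buses_to_dicts(filter_on_field("route", "22", collect(matches)))),
)
```

Only EventHandler knows about SAX. Swapping in a different parser means writing a different event source, while the downstream coroutines stay as they are.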

Part 4 ("From Data Processing to Concurrent Programming", slides 75 through 91) shows how coroutines "naturally tie into problems involving threads and distributed systems", since you send data to coroutines just as you do to threads (via queues) or processes (via messages). He creates a coroutine called threaded and hooks it up (slide 84) to the example coroutines from Part 3 so the filters and printing coroutines run in a separate thread. (And he notes this makes it run about 50% slower!) He then shows how coroutines could also be used to bridge two processes over a pipe or socket. So he notes that coroutines allow us to separate the implementation of a task (the coroutines) from the execution environment (threads, subprocesses, network). But he cautions us that huge collections of coroutines, threads and processes may be difficult to maintain and without careful study may make your program run slower. He also warns that the send() method on a coroutine must be synchronized, and if you call send() on an already-executing coroutine your program will crash (so no loops or cycles in the pipeline or multiple threads sending data into the same coroutine).
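
The general shape of such a thread bridge can be sketched as follows. This is an approximation under assumptions rather than the code on slide 84: the sentinel object, the join on shutdown, and the collect sink that records thread names are choices made for this example.

```python
import functools
import queue
import threading


def coroutine(func):
    """Decorator that primes a coroutine by advancing it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start


_DONE = object()    # sentinel telling the worker thread to shut down


@coroutine
def threaded(target):
    """Forward every received item to target, which runs in its own thread."""
    messages = queue.Queue()

    def run_target():
        while True:
            item = messages.get()
            if item is _DONE:
                target.close()
                return
            target.send(item)

    worker = threading.Thread(target=run_target)
    worker.start()
    try:
        while True:
            item = yield
            messages.put(item)      # hand the item over via the queue
    except GeneratorExit:
        messages.put(_DONE)
        worker.join()               # wait until the queue has been drained


@coroutine
def collect(into, thread_names):
    """Sink: record each item and the name of the thread that delivered it."""
    while True:
        item = yield
        into.append(item)
        thread_names.add(threading.current_thread().name)


results, names = [], set()
pipe = threaded(collect(results, names))
for n in range(3):
    pipe.send(n)
pipe.close()
```

The downstream coroutine is unchanged from a single-threaded pipeline. Only one thread ever calls send() on it, which respects the warning about not sending into a coroutine from multiple threads.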

From here, the tutorial gets much more "mondo". In Part 5 (slides 92 through 98) he explains that coroutines look like tasks, the building blocks of concurrent programming. In Part 6 (slides 99 through 109), he gives a crash course in operating systems (and shows the yield statement can be thought of like a trap) in order to prepare us for Part 7 (slides 110 through 168), where he builds an operating system using coroutines. I'm not going to go into detail on this, because this summary is already too long and you're better off reading his slides and looking at the code sample than reading my description of this. I'll note though that I enjoyed the humor in slide 152, where having written a Task class to wrap a coroutine, support for system calls and a Scheduler class with basic task management, he states "The next step is obvious; we must implement a web framework". But he settles for an echo server. In Part 8 (slides 169 through 188) he explains that coroutines can't call subroutine functions that yield, and explains the solution using "trampolining". Slide 187 is worth noting, where he shows that application code has "normal looking control flow", just like traditional socket code (and unlike any code using a module that uses event callbacks).
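
The "yield as a trap" idea can be illustrated with a drastically reduced round-robin scheduler. This sketch assumes far less than the tutorial builds: there is no Task class, no system calls and no I/O, and the names used here are illustrative only.

```python
from collections import deque


def countdown(name, n, log):
    """A task: does a little work, then yields to give up the CPU."""
    while n > 0:
        log.append((name, n))
        yield                   # the "trap": control returns to the scheduler
        n -= 1


class Scheduler:
    """Runs each ready task until its next yield, in round-robin order."""

    def __init__(self):
        self.ready = deque()

    def new(self, task):
        self.ready.append(task)

    def mainloop(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)              # run the task up to its next yield
            except StopIteration:
                continue                # task finished, so drop it
            self.ready.append(task)     # otherwise reschedule it


log = []
sched = Scheduler()
sched.new(countdown("a", 2, log))
sched.new(countdown("b", 3, log))
sched.mainloop()
```

The two tasks interleave even though there are no threads involved: each yield hands control back to the scheduler, much as a trap hands control back to an operating system.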

He wraps it all up in Part 9 (slides 189 through 198). Here's my summary of his summary:
  • generators (and coroutines) are "far more powerful than most people realize"
  • they have decent performance
  • but he's not convinced that it's worth using coroutines for general multitasking
  • it is "critically important" not to mix the three main uses of yield together:
      • iteration
      • receiving messages
      • a trap
If you find this at all interesting, I urge you to read David's slides and code. And join me in urging him to present another tutorial at PyCon 2010.

Wednesday, March 25, 2009

Why Python?

I started reading Programming Collective Intelligence again on my flight to Chicago for PyCon 2009. I started reading it months ago but never found the time to finish it. So now I’m starting again at the beginning, but hopefully this time I’ll read the parts I've already read faster (and with better recollection), and hopefully get through more—if not all—of it. (Not that I find it difficult or dull reading; on the contrary it’s very interesting and well written.)

I read something in the preface I don’t recall noticing last time. (Though I wish I had noticed it earlier.) Toby Segaran (the author) explains why he chose Python for all the example code in the book. He lists six reasons. Python is:
  • Concise - Python code tends to be shorter than other “mainstream languages”. So there’s less typing, but also it’s “easier to fit the algorithm in your head and really understand what it’s doing”.

  • Easy to read - If I recall correctly, I’ve heard Guido himself describe Python as “executable pseudocode”.

  • Easily extensible - In addition to the “batteries included”, there are many other modules that are free and easy to download, install and use.

  • Interactive - If you program in Python and you never use it interactively, you may be a figment of my imagination. (Though others may be shocked to learn I don’t use IPython. When I finish reading Programming Collective Intelligence, next on my list is another book I started and didn’t make time to finish: Python for Unix and Linux System Administration. Chapter 2 is on IPython.)

  • Multiparadigm - “Python supports object-oriented, procedural, and functional styles of programming…Sometimes it’s useful to pass around functions as parameters and other times to capture state in an object.” As I become more proficient with Python I’m surprised that I use all three styles.

  • Multiplatform and free - “The code described in this book will work on Windows, Linux, and Macintosh.”

I wanted to blog about this because I’m sure I’ll want to refer to this in the future the next time someone asks “Why Python?” (Though I find I hear that question less and less. The word is spreading.)

Thursday, January 8, 2009

Book Review: Python Web Development with Django


Back in April 2008 I eagerly volunteered to review a “rough cut” of Python Web Development with Django by Jeff Forcier, Paul Bissex, and Wesley Chun when I read of the opportunity on the BayPIGgies mailing list. I completed a first version of this review in April (based on a version of the rough cut updated March 11, 2008). But I was asked to hold my review until it was updated again. I updated the review using the next two updates, but didn’t complete it. (I’ll post the whole sordid story on yacitus.com.) I’m ashamed to say the book was published before I finished my review. (But I did post several comments on the rough cut, under the name “yacitus”, so I’m happy that I contributed to the final outcome in a small way.) I’m pleased to finally finish my review, about 10 months after I first read the “rough cut”. It’s fairly long, so you may want to skip to my recommendation in the third-to-last paragraph (before the footnotes).

You’ll find that my coverage of some chapters is extremely detailed, while I only quickly mention others. The deeper coverage was written back when I was reviewing the rough cut, and I just don’t have time right now to go that deep on the material added to the book since then. I plan to (someday) use this book to help me convert www.spitzer.us to a “real” blog, and I will update this review as I read other chapters more carefully then.

While I’ve been using Python for about 4 years (full-time for about half that), I’m quite new to Django. I’ve read The Definitive Guide to Django (which I’ll refer to as TheDjangoBook below) and I won’t be able to resist comparing it to Python Web Development with Django (which I’ll refer to as PyWDwD below).

The first thing I want to do before I start reading (or before I buy) a book is to get a sense of the intended audience. If I’m in a bookstore, I’ll start by reading the back. When browsing online (which for me almost always means Amazon.com), I’ll look at the “Editorial Reviews”. It doesn’t really contain a description of the intended audience (and neither does the back of the book). But back in April there was a clue in the overview on the book’s public Safari page: “This book is designed to help you learn and use Django (and Python, if necessary)…” (though that text is no longer there). So I assumed that the book would contain an optional introduction to Python, but would mostly concentrate on Django.

The second thing I look at is the table of contents. (Which you’ll find in the Amazon.com Editorial Reviews section.) The book is divided into five parts: “Getting Started”, “Django in Depth”, “Django Applications by Example”, “Advanced Django Techniques and Features”, and the appendices.

I’m not going to spend too much time reviewing Chapter 1: “Practical Python for Django”. I have limited experience with Django (which is why I was so eager to read even this incomplete version of this book), but I am quite comfortable with Python. But I did skim through the chapter and tried to keep an eye out for any Python features that are used in Django that they don’t cover. I didn’t find any. I was surprised to find concise introductions to some of the more complex Python features like generators and decorators. After reading the chapter over, I would guess that it would be useful for people with a fair bit of programming experience who are new to Python, but I think it would be confusing for anyone completely new to programming.

The beginning of the chapter states that they intend for it to be more than just a high-level introduction. They explain Python’s “object model, memory management, and philosophies as well as giving a good number of samples and sidebars which directly relate to Django development”. So I also made note of any content that I found valuable. (1)

The next chapter is “Django for the Impatient”. As an introduction to Django, they walk the reader through creating a simple blog project. I’m familiar enough with Django that I skimmed this very quickly, but I expect it would be quite useful to a Django newbie. It was worth the skim: defining ordering in a model (using the ordering attribute of the Meta inner class) was new to me.

In the next chapter (#3), titled “Starting Out”, they take a step back and provide a “tool-agnostic” view of the Web, provide high-level explanations of Django models, views and templates and explain the general philosophy of the creators of Django. They warn the reader not to skip ahead (saying “…even intermediate and experienced Web developers can benefit from taking a step back and reviewing the fundamentals…”), but I found I could have completely skipped the “Dynamic website basics” section. I also could have skipped the “Understanding Models, Views and Templates” section and the explanation of Django’s spin on MVC (2), but I do feel that both would be useful to a Django newbie. The “Core philosophies of Django” section was interesting, but I didn’t learn anything new. (It may have reinforced what I already know, however.)

Part II “Django in Depth” starts with Chapter 4: “Defining and Using Models”. There are (of course) two sections. The “Defining Models” section starts with an explanation of why to use an ORM. Then they describe the different types of fields, and explain primary keys (and they describe the “unique=True” argument, which is new to me). Then they go into detail on foreign keys (many-to-one relationships), many-to-many relationships (both “simple” and “complex”) and a brief explanation of composition with one-to-one relationships (but no examples). That’s followed by a brief description of constraining relationships with a “limit_choices_to” argument. That’s new to me, and I don’t see a use case for it, so I’d like to see a real example (rather than their contrived example of a Book model class that will only relate to authors whose name ends in Smith). Next is a detailed explanation of (the new Django feature) model inheritance, where they explain the two different approaches: abstract base classes and multi-table inheritance. They then describe the Meta inner class with a reasonable amount of detail and conclude with admin registration. They avoid a detailed explanation of the admin options, which is in keeping with this book’s intention to be more of a tutorial than a reference.

The “Using Models” section starts with an explanation of syncdb. They’re careful to explain right away that in spite of its name, syncdb will only create database tables to match models; it doesn’t do any sort of synchronization. I don’t recall TheDjangoBook explicitly stating this so clearly, but I also didn’t learn this the hard way. I think it became clear as I read through the examples. They provide a table of the manage.py functions, which I don’t recall seeing in TheDjangoBook, and if I read about the “sql*” functions and “loaddata” and “dumpdata” (in TheDjangoBook) then I’ve forgotten.

The meat of the section is in the section on query syntax. I also don’t recall TheDjangoBook explaining so clearly what a Manager is. I’ve heard the term, but until I read it in PyWDwD I didn’t realize a Manager object is returned from a model class’s objects attribute, and its methods (all, filter, exclude and get) always return QuerySets. The explanation in the (next) “QuerySet as a building block” section is also new (to me) and quite lucid. They explain how QuerySet “…is lazy: it will only execute a database query when it absolutely has to…” and explain how this allows them to be composed into complex queries. There’s a lot of detail here, including tweaking the SQL with the QuerySet method extra. Then the chapter concludes with an explanation of how to use SQL features that Django doesn’t provide.
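
The laziness idea is easy to see in a toy model. The class below is emphatically not Django's QuerySet or its implementation; LazyQuery, its in-memory rows and its log are hypothetical stand-ins that only mimic the "compose now, execute later" behavior described above.

```python
class LazyQuery:
    """Toy stand-in for a lazy, chainable query (NOT Django's QuerySet)."""

    def __init__(self, rows, predicates=(), log=None):
        self._rows = rows
        self._predicates = tuple(predicates)
        self._log = log if log is not None else []

    def _chain(self, predicate):
        # Composing returns a new query object; nothing is executed yet.
        return LazyQuery(self._rows, self._predicates + (predicate,), self._log)

    def filter(self, **conditions):
        return self._chain(
            lambda row: all(row.get(k) == v for k, v in conditions.items()))

    def exclude(self, **conditions):
        return self._chain(
            lambda row: not all(row.get(k) == v for k, v in conditions.items()))

    def __iter__(self):
        # The "database hit" happens only when the results are needed.
        self._log.append("query executed")
        return iter([row for row in self._rows
                     if all(pred(row) for pred in self._predicates)])


rows = [
    {"title": "A", "author": "Smith", "year": 2007},
    {"title": "B", "author": "Smith", "year": 2008},
    {"title": "C", "author": "Jones", "year": 2008},
]
log = []
books = LazyQuery(rows, log=log)
recent_smith = books.filter(author="Smith").exclude(year=2007)
executed_before_iteration = list(log)     # still empty: nothing has run
titles = [book["title"] for book in recent_smith]
```

Because each call just returns another query object, partially built queries can be passed around and refined before anything is executed.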

Chapter 5 is entitled “URLs, HTTP Mechanisms and Views”. The URLs section contains a very detailed explanation of URLconfs. Then follows the “Modeling HTTP: Requests, Responses and Middleware” section, which starts with a description of request objects, within which their explanation of GET and POST is helpful. And then the section gets even meatier. Their explanation of cookies and sessions is (IIRC) not covered in TheDjangoBook. The section concludes with a brief description of response objects and middleware.

Finally the “Views / Logic” section explains that views are just Python functions that must take an HttpRequest object and return an HttpResponse object (both of which were explained previously in the “Client/Server - HTTP” section). And then they jump right in to explaining generic views. (3) Unfortunately they don’t (yet?) provide any examples, so I imagine their descriptions of the most common generic views will go over the head of Django newbies. They state that the most common use of “semi-generic” views (calling a generic view from a custom view) is “…to work around an inherent limitation in the URLconf itself: you can’t perform logic with the captured URL parameters until the regular expression has been parsed.” But they only provide one short example with no explanation of what it does. Finally they very briefly explain custom views, and describe render_to_response() as replacing “…the two- or three-step process of creating a Context object, rendering a Template with it, and then returning an HttpResponse containing the result”, but I don’t believe they have yet explained what a context object is and only very briefly described templates.

The final chapter in Part II (#6) is “Templates and Form Processing”, containing (of course) two sections. The “Templates” section is more of an overview of Django templates than an in-depth description. But I did find it clear and easy to follow. I didn’t learn anything new, but I already have a reasonable amount of experience using Django templates. There is plenty of detail in the “Forms” section, and more examples than many of the previous chapters. I enjoyed reading this chapter—it’s a big improvement over the limited coverage of forms in TheDjangoBook.

Part III, “Django Applications by Example” contains four chapters, each dedicated to a different example application. Chapter 7—“Photo Gallery”—presents an example application using Django’s image upload field and a custom ImageField subclass that automatically generates thumbnails. Chapter 8—“Content Management System”—defines “CMS” and describes the “Un-CMS” Flatpages App and presents a simple custom CMS. Chapter 9—“Liveblog”—walks through the creation of a blog application, including Ajax integration. The introduction to the chapter states it “goes over everything you need to know to integrate Ajax with a Django Web application without going too deep into the specifics of complex client-server interaction or animation.” (I’m particularly looking forward to reading this chapter carefully.)

Chapter 10—“Pastebin”—is (I think) last in Part III because it is a lesson in using generic views. They write: “…the essence of this example is seeing how much work we can hand off to the framework. Some might call this approach lazy, but every line of code you don’t write is one you don’t have to debug.” They show and explain the model, and the templates are self-explanatory. The explanation for the URLs is detailed. And then we’re ready to try it out—they walk us through the “add one” form, the newly created paste, the list of submitted items and the admin screen. Then they show us how to limit the number of recent pastes displayed, how to add syntax highlighting using the SyntaxHighlighter JavaScript library, and how to write a cron job that periodically deletes old items.

Part IV—“Advanced Django Techniques and Features” contains two chapters. Chapter 11—“Advanced Django Programming”—has plenty of meat. There are sections on customizing the admin, generating RSS or Atom feeds, generating downloadable files (including examples of a vCard, CSV, and a chart using PyCha), enhancing the Django ORM with custom managers, and plenty of detail on extending the template system (by creating custom template tags, inclusion tags and custom filters) and using other template engines (Mako in this case). Their explanation of inclusion tags is detailed and walks through a useful example of how to create a template tag to display a calendar grid.

Chapter 12, “Advanced Django Development”, also has plenty of satisfying detail. There’s a section on writing utility scripts using Django, with a couple of examples: one that can be run using cron to delete old records, and a script to import email from an mbox file to a database using a Django model. They provide plenty of good advice and background on caching. I’m sure it’s just a sign of how little I know about Apache, but I had never heard of “ab”, the Apache Bench tool. I applaud the authors for taking the time to describe its use at the beginning of their caching section, and using it to show measured performance improvements. A section on testing covers doctest & unittest, testing models, testing your entire web app, and testing the Django codebase itself. And there are a couple of paragraphs on customizing the Django sources, where they (rightly) discourage the reader from doing so unless it’s worth contributing the changes back to the Django project.

There are 6 appendices. Appendix A is titled “Command Line Basics”. It is a very simple introduction to using a command-line environment on Linux or Unix. They cover common commands, options and arguments, pipes and redirection, environment variables, and the PATH. I think this would be quite useful to someone with no command-line experience (perhaps with a strictly Windows background).

Appendix B covers “Installing and Running Django”, including installing Python (and brief mentions of Easy Install and IPython), installing Django itself, choosing and configuring a web server (they cover Apache & mod_python, WSGI, and FastCGI with flup), and choosing and configuring an SQL database (they cover SQLite, PostgreSQL, and MySQL, with a quick mention of Oracle, Microsoft SQL Server and IBM DB2).

Appendix C is titled “Tools for Practical Django Development”. The first section is on version control. They discuss the fundamentals of branches and merging, describe Subversion and the Mercurial and Git DVCSs, and then walk through using version control on a Django project (using Mercurial). There’s a brief section on project management software, with a description of Trac, and a brief section on text editors with descriptions of Emacs, Vim, TextMate and Eclipse.

Appendix D is a quick 3 pages on “Finding, Evaluating, and Using Django Applications”, with brief sections on where to look for applications, how to evaluate them, how to use them, and sharing your own applications.

Appendix E covers “Django on the Google App Engine”. They focus on porting an existing Django app to App Engine, and creating a new Django app written specifically for App Engine. While not exhaustive, the detail on porting looks useful. (I hope to come back to this in more detail later.) But the intention of the appendix is to give an overview; they conclude with a list of online resources for more detail.

The final appendix F (and conclusion of the book) is two pages on “Getting Involved in the Django Project”. They describe ways to contribute that don’t require any programming, ways to contribute involving code that “still don’t require Herculean effort”, and offer ideas for contributions that would “have a significant impact on the Django community”. They conclude by pointing the reader to the two Django mailing lists, IRC and some Django community web sites. It feels like an appropriate ending to the book.

In conclusion, I recommend Python Web Development with Django to anyone considering (or just starting) using Django to build a web site. I recommend it over TheDjangoBook, not just because PyWDwD is more up-to-date, but also because I think it does a better job at explaining all the new concepts required when using a web framework. My main concern when reading the rough cut version was that there were not enough examples. But that’s been almost entirely rectified in the published book. (And the book should empower the reader to go out and find more open source examples as needed.)

The only other thing I noticed that was missing was examples of writing Django applications (Django terminology for modules built to be reusable, if possible). But a book can’t be everything to everyone, and that’s a niche that James Bennett’s book Practical Django Projects looks like it will fill nicely. (I hope to review it eventually.)

One other Django book I hope to find time to read is Marty Alchin’s Pro Django. It looks like it will pick up where these other two leave off.


Footnotes:

(1) I made note of the following tidbits along the way:
  • I didn’t know about the enumerate() built-in function. I’m sure there have been times where I could have used that to simplify my code. They have an explanation of a handy way to use enumerate in Django models that you’ll also find used in this Paul Bissex blog post.
  • I also wasn’t aware of the sorted() built-in function, which would also simplify my code at times in place of the list sort() method (which doesn’t have a return value).
  • Their explanation of “tuple-related gotchas” will be quite helpful to Python newbies.
  • I wasn’t aware of “from . import X” and “from .. import Y”. I can think of one specific place in my code where “from ..” will help.
  • Their entire “Common Gotchas” section is excellent reading.
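
For anyone else who missed enumerate() and sorted(), the first two tidbits above boil down to something like this (a plain-Python sketch with made-up data, not the Django-specific usage from the book):

```python
names = ["charlie", "alice", "bob"]

# sorted() returns a new list and leaves the original untouched...
ordered = sorted(names)

# ...whereas list.sort() sorts in place and returns None.
in_place = list(names)
sort_result = in_place.sort()

# enumerate() pairs each item with its index, replacing manual counters.
numbered = list(enumerate(ordered))
```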
(2) See http://docs.djangoproject.com/en/dev/faq/general/#django-appears-to-be-a-mvc-framework-but-you-call-the-controller-the-view-and-the-view-the-template-how-come-you-don-t-use-the-standard-names

(3) This is the opposite approach to TheDjangoBook, which (IIRC) gives many examples of views and saves generic views for a separate chapter. I don’t think one approach is demonstrably better than the other, but I found it easier to understand generic views this time. But that could be because I read PyWDwD after already reading about generic views in TheDjangoBook.