Wednesday, 22 October 2008

How to Hunt for Easter Eggs in Python Open Source Code

first steps with -m pdb

Write a simple script (simple.py) that exercises your favourite open source library, then run:

python -m pdb simple.py
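If you have nothing to hand, simple.py can be tiny. The sketch below is only an illustration: it uses the standard library's json module as a stand-in for "your favourite library", it is written in Python 3 syntax, and the function and variable names are made up for this example.

```python
import json


def roundtrip(record):
    """Serialise a dict to JSON text and parse it back again."""
    text = json.dumps(record, sort_keys=True)
    return json.loads(text)


# Two calls into the library give the debugger something to step into.
original = {"name": "egg", "hidden": True}
restored = roundtrip(original)
print(restored == original)
```

Stepping into json.dumps from here takes you straight into the library's own source, which is where the hunting starts.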

Type s and hit return repeatedly to step through the code. This gets boring after 10 seconds and you'll want to do more stuff. Type help and you'll see a slew of commands.
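  • j [linenumber] e.g. j 10 -> jumps to that line without running the code in between; you need to be a good guesser here, because pdb only lets you jump within the current (bottom-most) frame and refuses some jumps, such as into the middle of a for loop
  • where - if your program is crashing, go into pdb, type where (or w) and hit return and you will get a stack trace
  • cont - continue running until the next breakpoint or the end of the program; under python -m pdb the program is restarted once it finishes, so cont will take you through it again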

quit will exit the debugger. Now explore the other commands and make a list of your top 5 favourite debugger commands.
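To see a few of these commands in action without typing them by hand, a session can be scripted. This is a rough sketch in Python 3 syntax, assuming the standard pdb.Pdb class with its stdin, stdout, readrc and nosigint parameters; inner, outer and scripted_session are hypothetical names invented for the example.

```python
import io
import pdb


def inner(x):
    return x * 2


def outer():
    return inner(21)


def scripted_session(func, commands):
    """Run func under pdb, feeding it commands as if typed at the prompt."""
    fake_keyboard = io.StringIO("\n".join(commands) + "\n")
    transcript = io.StringIO()
    # readrc=False ignores any personal .pdbrc; nosigint=True leaves the
    # Ctrl-C handler alone when the session continues.
    debugger = pdb.Pdb(stdin=fake_keyboard, stdout=transcript,
                       readrc=False, nosigint=True)
    result = debugger.runcall(func)
    return result, transcript.getvalue()


# Step twice (into inner), print the stack, then run to the end.
result, transcript = scripted_session(outer, ["s", "s", "where", "cont"])
print(transcript)
print("result:", result)
```

The transcript shows what you would have seen at the (Pdb) prompt, including the stack printed by where.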

bugs as easter eggs

pdb is a great (though basic) tool for bug finding. Many people find bug finding frustrating and gripe when difficulties come their way.

Think of bug finding as looking for Easter eggs. You understand a system well when you know all the Easter eggs hidden within it.

When faced with a new bug - celebrate! Difficulties will teach you much more than random success and "first-time lucky" code - you won't learn much from those kinds of experiences. Debug patiently, study the bugs, understand fully why they happen.

real-life easter egg examples

Here's a real-life Easter egg I found in Python's BeautifulSoup. A quick introduction: BeautifulSoup (needs Py2.2 upwards) has two parts, BeautifulSoup for html and BeautifulStoneSoup for xml/sgml (built on sgmllib). It has a simple object model: 1) PageElement - base class for Tag and NavigableString, and 2) NavigableString - base class for CData, PI, Comment and Declaration.

Grep for "enterprise" and you will see the Easter egg!
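If you would rather do the grep from inside Python, something along these lines works for any module whose source is installed. It is a sketch in Python 3 syntax: grep_module is a hypothetical helper, and textwrap from the standard library is only a stand-in for the library you are actually exploring.

```python
import inspect
import textwrap  # stand-in module; swap in the library you are exploring


def grep_module(module, word):
    """Return (line_number, line) pairs from the module's source that mention word."""
    source = inspect.getsource(module)
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if word.lower() in line.lower():  # case-insensitive match
            hits.append((number, line.strip()))
    return hits


# Show the first few matches with their line numbers.
for number, line in grep_module(textwrap, "whitespace")[:5]:
    print(number, line)
```

The line numbers it prints are handy later as targets for pdb breakpoints.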

Here's another example, where the above techniques were used to find a bug in a test program for BeautifulSoup and the error message alone was insufficient to find the problem. I've copied the rogue code here for your perusal. Can you find the error using pdb? (You will need to install BeautifulSoup first, obviously!) Here's the code:

import sys
from BeautifulSoup import BeautifulSoup

def main(args):
    try:
        html = open("file", "r").readlines()
        soup = BeautifulSoup(html)
        soup.prettify()
        print soup
    except Exception, e:
        print e

main(sys.argv)

Look at the code above. What is it trying to do? What precise data transformations are taking place?

This is a good example of the caution needed when plugging the output of one API into another API as input. You need a clear understanding of the data types you are dealing with, and sometimes of the internal representation of those types (e.g. the concept of "blittability" in .NET). It also shows that in dynamically typed languages you sometimes need to think even more carefully about data types.
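A separate, deliberately unrelated illustration of the same point (so as not to spoil the puzzle): the sketch below, in Python 3 syntax, passes text and its encoded bytes to the same function. char_count is a hypothetical consumer made up for this example.

```python
def char_count(payload):
    """Hypothetical consumer that assumes it is handed text (str)."""
    return len(payload)


text = "héllo"
encoded = text.encode("utf-8")  # the kind of value a file or network API may return

# Both calls succeed, but they are measuring different things.
print(type(text).__name__, char_count(text))        # str 5
print(type(encoded).__name__, char_count(encoded))  # bytes 6
```

Nothing crashes here, which is exactly the danger: the function accepts both types, and only checking the type in the debugger (for instance with p type(payload)) shows that the second answer counts bytes rather than characters.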

A related post on UbuWorld entitled "Dynamic Typing Doesn't Mean You Don't Have to Think About Types" explores this idea with some more examples where "type-consciousness" is vital to debugging dynamically typed programs.

We struggle with bugs when we forget that programming is not magic but a very precise art form, heavily dependent on exact data representation for things to work together correctly. Abstraction and dynamic typing (the Scylla and Charybdis of programming languages) hide complexity, but abstractions break when they are assembled in unexpected ways, e.g. across API boundaries as illustrated above. The cost of such simplifications is that they make debugging more difficult by giving the illusion of simplicity. Under the covers, very precise data transformations and memory allocations and deallocations are taking place. Don't forget this. Programming is a very precise art form.

Hint: the above example requires a one-line change to get it to work.

Happy Hunting.
