Tuesday, 16 June 2009

Debugging and fixing python code on the fly

As mentioned in the previous post, debugging an application with a long execution time is very time consuming, because the programmer often has to run it many times to check whether a fix works and to fix multiple bugs. In this post I am going to describe a technique that makes it possible to try out fixes repeatedly and fix multiple bugs within a single execution.
The idea is to use a Python decorator to inject some scaffolding into the code that is to be tested and debugged, and to use Python's ability to dynamically reload source files to fix issues and resume execution. Here is the decorator:
def untrusted(func):
  def wrapper_func(*args, **kwds):
    try:
      return func(*args, **kwds)
    except Exception, e:
      print str(e) or e.__class__
      import pdb
      pdb.set_trace()
  return wrapper_func
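For anyone reading this on Python 3, the same decorator might look roughly like the sketch below. The "debugger" parameter and the use of functools.wraps are my own additions for illustration, not part of the version above; the parameter only exists so the wrapper can be exercised without an interactive prompt.

```python
import functools
import pdb


def untrusted(func, debugger=pdb.set_trace):
    """Run func; if it raises, print the error and open a debugger.

    The `debugger` parameter is an assumption of this sketch, not part of
    the original decorator. It defaults to pdb.set_trace.
    """
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper_func(*args, **kwds):
        try:
            return func(*args, **kwds)
        except Exception as e:  # Python 3 form of "except Exception, e"
            print(str(e) or e.__class__)
            # The debugger opens in this frame, so func, args and kwds
            # can be inspected and reused at the prompt.
            debugger()
    return wrapper_func
```

Used as a plain @untrusted it should behave like the Python 2 version; passing a different callable as "debugger" is only a convenience for testing.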
It just calls the decorated function and drops into the debugger if an exception occurs. Here is a function that has a typo; imagine it is going to be called in the middle of an hour-long batch job.

@untrusted
def print_date(d):
  print d.strftim("%Y-%m-%d")

When the function is called, the exception is raised and caught by the decorator, and on the command line it looks like this:
'datetime.datetime' object has no attribute 'strftim'
--Return--
> /Users/jiayao/examples/jiayao/debug.py(9)wrapper_func()->None
-> pdb.set_trace()
(Pdb) 

Now I can fix the typo in the source code. Then:
(Pdb) import example

(Pdb) example.print_date(*args)
2009-06-16
Then the program can be resumed as if no error had ever happened. "import example" loads the edited source code, so I can execute the corrected implementation of "print_date" with the same arguments as the original invocation. Note that "import example" works only once: importing the same module again has no effect, so to load the source again you need to call "reload(example)" the next time.
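The difference between a repeated import and a reload can be seen in a small standalone sketch. It is written for Python 3, where reload lives in importlib instead of being a builtin. The module name "reload_demo" and its contents are made up for the demonstration, and bytecode caching is switched off so that a stale cache file cannot mask the edit.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid a stale .pyc hiding the edit below

# A throwaway module with a bug (hypothetical name and contents).
module_dir = tempfile.mkdtemp()
module_file = pathlib.Path(module_dir) / "reload_demo.py"
module_file.write_text("def greeting():\n    return 'helo'\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import reload_demo
before_fix = reload_demo.greeting()

# "Fix" the source on disk while the program keeps running.
module_file.write_text("def greeting():\n    return 'hello'\n")

import reload_demo  # already in sys.modules, so nothing is re-executed
after_second_import = reload_demo.greeting()

importlib.reload(reload_demo)  # re-executes the edited source
after_reload = reload_demo.greeting()
```

Only the value taken after importlib.reload reflects the edited file; the second import statement just hands back the module object that is already cached.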

Another example: this time the error is that the code uses a module but forgot to import it first:
@untrusted
def match(pattern, text):
  return re.search(pattern, text)
match("^abc", "abcde")

Entering debug mode:
global name 're' is not defined
--Return--
> /Users/jiayao/examples/jiayao/debug.py(9)wrapper_func()->None
-> pdb.set_trace()
(Pdb)

Simply calling "import re" and invoking the function again will not work, because the function keeps its own reference to the globals of the module it was defined in, including that module's imports. "import re" at the Pdb prompt does not change the small world encapsulated in the function object. So we have to inject the import into it:
(Pdb) import re
(Pdb) func.func_globals['re'] = re
(Pdb) func(*args)
<_sre.SRE_Match object at 0x24e4f0>
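In Python 3 the attribute is spelled __globals__ rather than func_globals. Here is a small sketch of the same injection outside the debugger. The module "broken_example" is made up and built in memory so that the snippet can run on its own.

```python
import re
import types

# Stand-in for a module that forgot "import re" (hypothetical name).
broken = types.ModuleType("broken_example")
exec(
    "def match(pattern, text):\n"
    "    return re.search(pattern, text)\n",
    broken.__dict__,
)

# Our own "import re" above does not help: the function looks names up
# in the namespace of the module it was defined in.
try:
    broken.match("^abc", "abcde")
except NameError as e:
    error = str(e)

# Inject the missing name into that namespace, then call again.
broken.match.__globals__["re"] = re
result = broken.match("^abc", "abcde")
```

The function's __globals__ is the defining module's namespace itself, so the injected name is visible to every other function in that module as well.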

One last example. This one deals with objects and is a bit more complicated:
class A(object):
  def __init__(self, text=None):
    self.text = text

  @untrusted
  def search(self, keyword):
    return keyword in self.text

a = A()
a.search("abc")


>python example.py
argument of type 'NoneType' is not iterable
--Return--
> /Users/jiayao/examples/jiayao/debug.py(9)wrapper_func()->None
-> pdb.set_trace()

We fix the "search" method in the source code:
  @untrusted
  def search(self, keyword):
    if self.text:
      return keyword in self.text
    else:
      return False

And we try out the fix:
(Pdb) import example
(Pdb) example.A.search(*args)
*** TypeError: unbound method wrapper_func() must be called with A instance as first argument (got A instance instead)

This is a strange error at first glance. But the two "A" classes are not the same class:
(Pdb) example.A
<class 'example.A'>
(Pdb) type(args[0])
<class '__main__.A'>

So we cannot call example.A.search on a __main__.A object. We can work around this:
(Pdb) new_a = example.A()
(Pdb) new_a.__dict__ = args[0].__dict__
(Pdb) example.A.search(new_a, *args[1:])
False
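The two-classes situation can be reproduced without a debugger. The sketch below is Python 3 and uses two made-up in-memory modules, "main_like" and "example_like", to stand in for __main__ and the freshly imported example module. One difference worth noting: Python 3 no longer has unbound methods, so there the fixed function can also be called directly on the old instance.

```python
import types

OLD_SOURCE = (
    "class A(object):\n"
    "    def __init__(self, text=None):\n"
    "        self.text = text\n"
    "    def search(self, keyword):\n"
    "        return keyword in self.text\n"
)
FIXED_SOURCE = (
    "class A(object):\n"
    "    def __init__(self, text=None):\n"
    "        self.text = text\n"
    "    def search(self, keyword):\n"
    "        return keyword in self.text if self.text else False\n"
)


def load(name, source):
    """Build a module object from source text (helper for this sketch)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


main_like = load("main_like", OLD_SOURCE)          # plays __main__
example_like = load("example_like", FIXED_SOURCE)  # plays example

old_a = main_like.A()
# Same class name, but two distinct class objects.
is_new_class = isinstance(old_a, example_like.A)

# The workaround: a new-class instance that shares the old state.
new_a = example_like.A()
new_a.__dict__ = old_a.__dict__
found = new_a.search("abc")

# Python 3 only: the function accepts the old instance directly.
direct = example_like.A.search(old_a, "abc")
```

Because the two instances share one __dict__, any attribute the fixed method sets on new_a also shows up on the original object.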

Hope this is useful for some fellow programmers out there. Use the time you saved wisely! :)

Wednesday, 10 June 2009

I love Python, I love dynamic typing, but I also feel the pain

Most people who have written Python programs that take longer than a couple of minutes to run remember the pain: the pain of a program crashing after x minutes or hours because of some trivial problem, like a typo in a function name. You carefully fix the problem and run it again, and x minutes later you find yourself debugging another crash caused by another trivial problem...

What is the root problem? A lot of people who come from a Java or C++ background will say: you need static type checking! True, static type checking would have caught a lot, if not most, of the mistakes I have made in writing Python code, although that is still only a subset of the problems. Generally speaking, the pain comes from errors that are not detectable until runtime.

What are the solutions? For the errors that can be caught by type checking, the natural answer is "Let there be type checking!". That has been discussed extensively in the Python community in the past few years. See [1] Guido's essays on the topic. It was not introduced in Py3k. PEP 246 was a formal attempt to add optional type checking to Python, but it was rejected in 2006 because "Something much better is about to happen. --- GvR". So for now the problem is not fundamentally addressed.

One partial solution is static analysis. PyChecker and Pylint are the two most popular tools. They catch a lot of errors that are usually caught by the compiler in statically typed languages. They also catch coding style violations, which in turn helps in writing less error-prone code. Still, a lot of errors go undetected by these tools. For example:
def print_date(d):
  d.strftiem("%Y-%m-%d")
Nothing is known about "d" statically, so there is no way to find out that "strftiem" is a typo until we run the code. And the program may have been running for hours by the time this function is called.
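To make the point concrete, here is a small Python 3 sketch in which the misspelled function compiles and is defined without any complaint, and the typo only surfaces when the function is finally called. The "<typo_demo>" file name is just a label for the demonstration.

```python
import datetime

SOURCE = (
    "def print_date(d):\n"
    "    print(d.strftiem('%Y-%m-%d'))\n"
)

# Compiling succeeds: the compiler does not know what type "d" will be.
code = compile(SOURCE, "<typo_demo>", "exec")

# Defining the function succeeds too.
namespace = {}
exec(code, namespace)

# Only the call itself exposes the misspelled method name.
try:
    namespace["print_date"](datetime.datetime(2009, 6, 10))
except AttributeError as e:
    runtime_error = str(e)
```

In a long batch job, that final call could be hours away from the start of the run.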

Another partial solution is unit testing. Writing comprehensive unit tests can really iron out a lot of problems like this. There are a few reasons why unit testing is not enough to catch all the errors, but I won't go into them here.

Running static analysis frequently and writing unit tests can save a lot of pain. But what next? There are still runtime errors that fall through the cracks. Logging and a debugger can help find the problem, but you have to run the code again and again to iron out all the problems, which can take hours or days...

In the next post I will talk about a technique that I use to further reduce the pain.

[1] Guido's essays on static type checking