Nuggets of hard-won programming experience, brain-dumped for Googlability.

Tuesday, March 27, 2012

Py3k Porting Notes

I've been meaning to write this down for a while, but the wider release of PEP 404 finally spurred me into getting it done. I spent a bunch of time porting a medium-sized (~2.5kLoC) Python library to Python 3, using two different approaches, and there are a few tips that might have saved me time if I'd known them in advance.

2to3

For the first pass (2fecd513a4..cc08d9b081), I followed the recommended method of running 2to3, then fixed up the Python 2 source until the Py3k generated version passed all of the unit tests.

This was mostly straightforward, but tips that might have saved me time in this process are:

  • Understanding the old style versus new style import locations would have helped. But then I still don't really understand the wrinkles here.
    [2fecd513a4, e92f459593]
  • Get in the habit of using self.assertEqual rather than self.assertEquals in unit tests; the latter is a deprecated alias in recent Py3k releases. [c5688b8003]
  • Doctests don't work well with 2to3. It's possible to get around this by contorting the results, but that rather spoils the point of doctests being simple and clear.
    [a4635065da]
  • I never did figure out a good way to name/mark packages for the Cheese Shop where you've got a Python 2.x version and a generated Py3k version.
    [a39f4239aa]
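
To illustrate the doctest point above, here is a minimal sketch of one possible contortion: avoid showing the repr of a string (which is u'hello' on Python 2.x but 'hello' on Py3k) by printing the value or comparing it instead. The greeting function is made up for the example, and this is not necessarily what the linked commit does.

```python
import doctest


def greeting():
    """Return a short piece of text.

    Showing the bare result would expose the repr, which differs
    between Python 2.x and Py3k for Unicode strings. Printing the
    value, or comparing it, gives the same output on both.

    >>> print(greeting())
    hello
    >>> greeting() == "hello"
    True
    """
    # Hypothetical example function; a real library would return
    # something more interesting here.
    return "hello"


if __name__ == "__main__":
    # Run the examples embedded in the docstrings of this module.
    doctest.testmod()
```

The cost is visible even in this tiny case: the second example reads as a test rather than as documentation, which is the loss of clarity mentioned above.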

Dual Source

The original Python 2.x code and the generated Py3k code actually looked pretty similar, so the second pass (e334775b77..c90d133bc0) converted the codebase to a single set of source that could be used in both Python 2.x and Py3k.

This was rather trickier, and involved a few contortions. However, note that several of the contortions are because the minimum Python version I wanted to target is 2.5 – if the target had been >= 2.6, more of them could have been avoided with from __future__ import.

  • Unicode string literals are a pain. u"a string" is illegal in Py3k, but "a string" isn't Unicode in Python 2.x. In the meantime, there's a u("string that can include \u0101") function (which might be slow, so the code is tuned to avoid using it at runtime).
    (Would be fixed for a target of >= 2.6 with from __future__ import unicode_literals.)
    [e334775b77, 1897a0a30c, a49b35ead4, 59f4e2fef0, 5e6f1e2447, e019a990aa, b4e6759e67, 9246f92746, b4cb1e9246, 3ff89a806c, 319c33ed67]
  • Needed a prnt function to be used instead of print.
    (Would be fixed for a target of >= 2.6 with from __future__ import print_function.)
    [c15fd9cb8d, d32f784cb4]
  • Needed to use sys.exc_info to portably get at a caught exception.
    (Would be fixed for a target of >= 2.6 because the except ... as ... syntax is available there.)
    [3d8ddca92a]
  • There's no sys.maxint any more, and the suggested alternative
    (sys.maxsize) doesn't work on Python 2.5. So I just went for 65535.
    (Would be fixed for a target of >= 2.6 as sys.maxsize is included there.)
    [0964219b25]
  • No L suffix for literal longs any more, so need a wrapper to force ints to longs.
    [649b698b43]
  • Because the library includes auto-generated Python code, the generation tools needed to allow for Python 2.x/Py3k compatibility too. The new rpr function (sticking with disemvowelling as a way of naming things) converts generated code to use the u() utility.
    [3ff89a806c]
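
The sys.exc_info item in the list above is the least obvious of these, so here is a minimal sketch of the pattern. The parse_int function is a made-up example, not code from the library.

```python
import sys


def parse_int(text):
    """Return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # 'except ValueError, e' is a syntax error in Py3k, and
        # 'except ValueError as e' is a syntax error in Python 2.5,
        # so fetch the exception object from sys.exc_info() instead.
        e = sys.exc_info()[1]
        sys.stderr.write("bad input: %s\n" % e)
        return None
```

The same source parses on both Python 2.5 and Py3k because the except clause never names the exception.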

The net result of all of this is a small module that acts as a portability layer.
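
For illustration, a portability layer along these lines might look roughly like the sketch below. The names u and prnt come from the list above, but the bodies are my guess at a reasonable implementation rather than the library's actual code, and lng and MAXINT are hypothetical names.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    def u(s):
        # Plain string literals are already Unicode on Py3k.
        return s

    def lng(i):
        # Py3k has a single integer type, so there is nothing to force.
        return int(i)
else:
    def u(s):
        # On Python 2.x a plain literal is a byte string in which a
        # backslash-u escape is left untouched; decoding with
        # unicode_escape turns it into the intended character.
        return s.decode("unicode_escape")

    def lng(i):
        # Stands in for the L suffix on literals (hypothetical name).
        return long(i)


def prnt(*args):
    # Works the same whether print is a statement or a function.
    # Simplification: str() will not cope with non-ASCII Unicode
    # arguments on Python 2.x.
    sys.stdout.write(" ".join([str(a) for a in args]) + "\n")


# Fixed stand-in for sys.maxint, using the value mentioned in the list.
MAXINT = 65535
```

Usage would be along the lines of name = u("caf\u00e9") or prnt("total:", lng(3)). Because u() does real work on Python 2.x, it makes sense to call it once at module level rather than inside hot loops.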