Python 3 musings

2008-11-20 21:23 | Source

If you're a Python coder, and you haven't been living under a rock for the past few years, then you've probably heard of a thing called Python 3000, or Python 3.0. Python 3.0 is presented as a "new version" of Python, but with a twist: it represents a complete backwards compatibility break with the Python 2.x series, in order to make a whole slew of backwards-incompatible changes with no direct migration path. Thus, in some sense, this is actually a brand new language that just looks a lot like Python 2.x.

Is this a problem? Superficially, it doesn't "feel" like, say, trying to migrate from Java to C#; after all, it's just Python with a few cleanups and new features, right? Unfortunately, any non-trivial codebase is going to be riddled with code that is affected by the migration, and so the effort involved is still quite substantial. The problem runs deeper than that, however; code doesn't exist in a vacuum, it exists in a community of other code. All of your Python code has to run in the same runtime, so if you port your code to Python 3.x, you need all of the libraries / frameworks that you depend on to be available for Python 3.x. Of course, if you port your library to Python 3.x, people still using Python 2.x can't use any newer versions of your library anymore, so you either leave those users completely stranded, or you have to maintain two versions of your library simultaneously; that's really not much fun.

The Python developers have a solution to this problem: a tool called 2to3. 2to3 attempts to automatically convert Python 2.x code to Python 3.x code. Obviously, it can't do the right thing in every case, so the idea is that you modify your Python 2.x code so that it remains valid and correct Python 2.x code, but is also in a form that 2to3 can automatically convert to valid and correct Python 3.x code. Thus, you can simply maintain your Python 2.x codebase, generating the Python 3.x version, until one day you decide you no longer need Python 2.x code; at this point, you would run 2to3 over your codebase one last time and commit it, thus dropping Python 2.x forever.

Sounds great, right? Unfortunately, 2to3 is still in a fairly immature state, with lots of bugs and issues. In addition, there are many situations where 2to3 simply can't do the right thing. For example, in code that handles strings correctly, any use of the str type would simply be converted to use of the bytes type, and unicode would become str. However, a lot of code incorrectly uses the str type for strings of text, or because it uses APIs that are incorrectly written; in these cases, use of the str type might need to be converted to the Python 3.x str type, and 2to3 is never going to be able to figure this out on its own.

In the end, it's not yet clear that 2to3 will be a viable migration strategy, which opens the door to the possibility that we will see a schism in the Python ecosystem; with some people clinging to Python 2.x, while other people working in Python 3.x, isolated from their 2.x counterparts. Fortunately, the community has a lot of time to think about this: the 2.x series will probably continue until at least 2.7, if not beyond that. I'm personally still on Python 2.4 (my production servers run Debian stable), so I've got at least another 3 versions to go, which is quite a lot of time. If a viable migration strategy hasn't emerged by then, then I'm hoping that I'll either be able to switch from CPython to PyPy, or I'll already have made the jump to greener pastures, and thus not have to worry about Python at all.