More on Python 3

by on Dec.13, 2008

Since my earlier post on this subject, someone has since brought to my attention this blog post by James Bennett. James writes well, cutting straight through to the real issues at hand, but in some places I think his facts are incorrect, and in other places I draw different conclusions to the ones he draws.

First up, Unicode strings vs. byte strings. In fact, these are handled in almost exactly the same fashion in Python 2 and in Python 3; both languages have a type for storing strings of characters, and a type for storing strings of arbitrary bytes (including things like data read from a network socket, and the encoded form of character strings). In Python 3, the str type is for storing character strings, and the bytes type is for storing byte strings. In Python 2, the unicode type is for storing character strings, and the str type is for storing byte strings. That’s really the only difference; the Python 2 str type has some methods that the Python 3 bytes type doesn’t, but that’s a relatively unimportant difference. The real problem in Python 2 is that many people have used the str type to store character strings, when they really should have been using the unicode type; this includes things built into the language (like attribute names or function names), various stdlib modules, and vast oceans of third-party code.

What does Python 3 do to solve this? Well, not all that much, except for completely breaking everyone’s existing string-handling code; I guess James assumes that in the process of fixing all of their string-handling code, they’ll get things right this time around, but I’m somewhat less optimistic. Still, I think it is important to point out that Python 3 does *not* give you any additional tools for dealing with character / byte strings, nor does it make it any easier to work with them; at best, it just fixes some of the broken character / byte string-handling code that was being distributed with Python.

With that out the way, I’ll move on to the “different conclusions” part. First up, the “Death by a thousand cuts”; I know many programmers feel similarly about the myriad minor issues he mentions, but I’m simply not one of them. Sure, there are all sorts of minor annoyances, and they do start to add up over time, but they’re simply irrelevant compared with the big issues. I might spend two weeks out of a whole year dealing with them, as opposed to months of time spent working around the lack of production-quality libraries for certain tasks, or the lack of higher-level programming constructs requiring me to write pages and pages of lower-level code to solve a certain problem. I’ll admit that I used to find these minor issues a great annoyance, but over time, they’ve just faded away to background noise, just like much of the supposedly major differentiating factors between different libraries and different programming languages. Once you see the forest, you stop caring so much about the trees.

Speaking of libraries, the new standard-library reorganisation is all very exciting; but I would really have liked it if they’d spent the time and energy on actually improving the code to a level suitable for production applications. It really doesn’t matter how most of the standard library is organised, if you’re not going to be using any of it anyway. In addition, projects reorganise APIs *all the time*, and there’s a perfectly straightforward way to do it in a backwards-compatible fashion. You introduce the new API or new location of the API, deprecate the old one, and then eventually remove it. No Python 3-style chasm-of-incompatibility required.

Of course, some of the standard library changes are actual functional improvements, not just rearranging the deck chairs; I haven’t looked at it yet myself, but I’ll take it on faith that the new I/O library is a vast improvement over the old Python 2 I/O facilities. Except… you don’t need to break backwards-compatibility to introduce a new I/O library; and I assume it’ll be ported to Python 2 sooner or later. Indeed, this is a common trend in Python 3 improvements; all the really interesting functional improvements are stuff that can and most likely will be ported to Python 2, if it has not already been ported.

If Python was a brand new language, being developed from scratch with a brand new community, I would be very happy about all of the changes made in Python 3; but since it’s not, I must repeat my claim that aside from things that can be backported to Python 2 in the first place, absolutely none of the Python 3 changes are worth making the jump to what is essentially a whole new programming language.

:, , ,

  • http://lennartregebro.vox.com/ Lennart Regebro

    Funny. You say that incompatibilities is a problem because there is a lot of python software out there, but yet you also claim Python 2 is not production ready. That doesn't exactly make much sense.Neither is there any explanation of what is broken in the unicode handling of Python 3. The problem there is that there are two string types. In Python 3 there isn't, there is only one, so afaik the problem has been fixed.

  • http://mithrandi.vox.com/ Tristan Seligmann

    I claimed that portions of the standard library are not production
    ready. That's hardly a huge problem, since modules in the standard
    library aren't really any different from third-party libraries, aside
    from the installation convenience — it just means you need to either
    find an alternative that's of sufficient quality, or write your own. It
    is somewhat annoying, though, because people often assume that the code
    in the standard library is of a high quality, and only later discover
    all of the latent problems in various modules; then again, some of this
    is just due to Python's relatively long release cycle, which is quite
    suitable for a language implementation, but less suitable for
    libraries. (Maybe the Python stdlib should be released separately, and
    more frequently? There would still need to be a higher bar of quality,
    though.)There's nothing broken in the unicode handling of
    Python 3; but then again, there's very little broken in the unicode
    handling of Python 2 — the problem is the code people write, not the
    language itself. If you consider a "string" to be a sequence of
    characters, then Python 2 also has only one string type, called
    "unicode". There's no "problem" in the language to fix, people just
    have to understand the concept of encoding. I've blogged about this before;
    the only change that would be needed in that post to have it cover
    Python 3 is to replace mentions of "str" with "bytes", and mentions of
    "unicode" with "str".

  • http://eljay219.vox.com/ Eljay

    > If Python was a brand new language, being developed from scratch with a brand new community, I would be very happy about all of the changes made in Python 3; but since it's not, I must repeat my claim that aside from things that can be backported to Python 2 in the first place, absolutely none of the Python 3 changes are worth making the jump to what is essentially a whole new programming language.

Search

Loading