Python2, string .format(), and unicode


If you’ve dealt with unicode and byte str mixing in python2 before, you’ll know that there are certain percent-formatting operations that you absolutely should not do with them. For instance, if you are combining a string of each type and they both have non-ascii characters then you are going to get a traceback:

>>> print(u'くら%s' % (b'とみ',))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
>>> print(b'くら%s' % (u'とみ',))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

The canonical answer to this is to clean up your code to not mix unicode and byte str which seems fair enough here. You can convert one of the two strings to match with the other fairly easily:

>>> print(u'くら%s' % (unicode(b'とみ', 'utf-8'),))

However, if you’re part of a project which was written before the need to separate the two string types was realized you may be mixing the two types sometimes and relying on bug reports and python tracebacks to alert you to pieces of the code that need to be fixed. If you don’t get tracebacks then you may not bother to explicitly convert in some cases. Unfortunately, as code is changed you may find that the areas you thought of as safe to mix aren’t quite as broad as they first appeared. That can lead to UnicodeError exceptions suddenly popping up in your code with seemingly harmless changes….

A New Idiom

If you’re like me and trying to adopt python3-supported idioms into your python-2.6+ code bases then one of the changes you may be making is to switch from using percent formatting to construct your strings to the new string .format() method. This is usually fairly straightforward:

name = u"Kuratomi"

# Old style
print("Hello Mr. %s!" % (name,))

# New style
print("Hello Mr. {0}!".format(name))

# Output:
Hello Mr. Kuratomi!
Hello Mr. Kuratomi!

This seems like an obvious transformation with no possibility of UnicodeError being thrown. And for this simple example you’d be right. But we all know that real code is a little more obfuscated than that. So let’s start making this a little more real-world, shall we?

name = u"くらとみ"
print("Hello Mr. %s!" % (name,))
print("Hello Mr. {0}!".format(name))

# Output
Hello Mr. くらとみ!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

What happened here? In our code we set name to a unicode string that has non-ascii characters. Used with the old-style percent formatting, this continued to work fine. But with the new-style .format() method we ended up with a UnicodeError. Why? Well under the hood, the percent formatting uses the “%” operator. The function that handles the “%” operator (__mod__()) sees that you were given two strings one of which is a byte str and one of which is a unicode string. It then decides to convert the byte str to a unicode string and combine the two. Since our example only has ascii characters in the byte string, it converts successfully and python can then construct the unicode string u"Hello Mr. くらとみ!". Since it’s always the byte str that’s converted to unicode type we can build up an idea of what things will work and which will throw an exception:

# These are good as the byte string
# which is converted is ascii-only
"Mr. %s" % (u"くらとみ",)
u"%s くらとみ" % ("Mr.",)

# Output of either of those:
u"Mr. くらとみ"

# These will throw an exception as the
# *byte string* contains non-ascii characters
u"Mr. %s" % ("くらとみ",)
"%s くらとみ" % (u"Mr",)

Okay, so that explains what’s happening with the percent-formatting example. What’s happening with the .format() code? .format() is a method of one of the two string types (str for python2 byte strings or unicode for python2 text strings). This gives programmers a feeling that the method is more closely associated with the type it is a method of than the parameters that it is given. So the design decision was made that the method should convert to the type that the method is bound to instead of always converting to unicode string type. This means that we have to make sure parameters can be converted to the type of the format string rather than always to unicode. Taking that in mind, this is the matrix of things we expect to work and expect to fail:

# These are good as the parameter string
# which is converted is ascii-only
u"{0} くらとみ".format("Mr.")
"{0} くらとみ".format(u"Mr.")

# Output (first is a unicode, second is a str):
u"Mr. くらとみ"
"Mr. くらとみ"

# These will throw an exception as the
# parameters contain non-ascii characters
u"Mr. {0}".format("くらとみ")
"Mr. {0}".format(u"くらとみ")

So now we know why we get a traceback in the converted code but not in the original code. Let’s apply this to our example:

name = u"くらとみ"
# name is a unicode type so we need to make
# sure .format() does not implicitly convert it
print(u"Hello Mr. {0}!".format(name))

# Output
Hello Mr. くらとみ!

Alright! That seems good now, right? Are we done? Well, let’s take this real-world thing one step farther. With real-world users we often get transient errors because users are entering a value we didn’t test with. In real-world code, variables often aren’t being set a few lines above where you’re using them. Instead, they’re coming from user input or a config file or command line parsing which happened tens of function calls and thousands of lines away from where you are encountering your traceback. After you step through your program for a few hours you may be able to realize that the relation between your variable and where it is used looks something like this:

# Near the start of your program
name = raw_input("Your name")
if not name.strip():
    name = u"くらとみ"

# [..thousands of lines of code..]

print(u"Hello Mr. {0}!".format(name))

So what’s happening? There’s two ways that our variable could be set. One of those ways (the return from raw_input()) sets it to a byte str. The other way (when we set the default value) sets it to a unicode string. The way we’re using the variable in the print() function means that the value will be converted to a unicode string if it’s a byte string. Remember that we earlier determined that ascii-only byte strings would convert but non-ascii byte strings would throw an error. So that means the code will behave correctly if the default is used or if the user enters “Kuratomi” but it will throw an exception if the user enters “くらとみ” because it has non-ascii characters.

This is where explicit conversion comes in. We need to explicitly convert the value to a unicode string so that we do not throw a traceback when we use it later. There’s two sensible locations to do that conversion. The better long term option is to convert where the variable is being set:

name = raw_input("Your name")
name = unicode(name, "utf-8", "replace")
if not name.strip():
    name = u"くらとみ"

Doing it there means that everywhere in your code you know that the variable will contain a unicode string. If you do this to all of your variables you will get to the point where you know that all of your variables are unicode strings unless you are explicitly converting them to byte str (or have special variables that should always be bytes — in which case you should have a naming convention to identify them). Having this sort of default makes it much easier to write code that uses the variable without fearing that it will unexpectedly cause tracebacks.

The other point at which you can convert is at the point that the variable is being used:

if isinstance(name, 'str'):
    name = unicode(name, 'utf-8', 'replace')
print(u"Hello Mr. {0}!".format(name))

The drawbacks to setting the variable here include having to put this code in wherever you are using it (usually more places than the variable could be set) and having to add the isinstance check because you don’t know whether it was set to a unicode or str type at this point. However, it can be useful to use this strategy when you have some critical code deployed and you know you’re getting tracebacks at a specific location but don’t know what unintended consequences might occur from changing the type of the variable everywhere. In this case you might analyze the problem for a bit and decide to hotfix your production machines to convert at the point of use but in your development tree you change it where the variable is being set so that you have a bit more time to work your way through all the places that shows you that you are mixing string types.


Why sys.setdefaultencoding() will break code

I know wiser and more experienced Python coders have written to python-dev about this before but every time I’ve needed to reference one of those messages for someone else I have trouble finding one. This time when I did my google search the most relevant entry was a post from myself to the yum-devel mailing list in 2011. Since I know I’ll need to prove why setdefaultencoding() is to be avoided in the future I figured I should post the reasoning here so that I don’t have to search the web next time.

Some Background

15 years ago: Creating a Unicode Aware Python

In Python 2 it is possible to mix byte strings (str type) and text strings (unicode type) together to a limited extent. For instance:

>>> u'Toshio' == 'Toshio'
>>> print(u'Toshio' + ' Kuratomi')
Toshio Kuratomi

When you perform these operations Python sees that you have a unicode type on one side and a str type on the other. It takes the str value and decodes it to a unicode type and then performs the operation. The encoding it uses to interpret the bytes is what we’re going to call Python’s defaultencoding (named after sys.getdefaultencoding() which allows you to see what this value is set to.)

When the Python developers were first experimenting with a unicode-aware text type that was distinct from byte strings it was unclear what the value of defaultencoding should be. So they created a function to set the defaultencoding when Python started in order to experiment with different settings. The function they created was sys.setdefaultencoding() and the Python authors would modify their individual files to gain experience with how different encodings would change the experience of coding in Python.

Eventually, in October of 2000 (fourteen and a half years prior to me writing this) that experimental version of Python became Python-2.0 and the Python authors had decided that the sensible setting for defaultencoding should be ascii.

I know it’s easy to second guess the ascii decision today but remember 14 years ago the encoding landscape was a lot more cluttered. New programming languages and new APIs were emerging that optimized for fixed-width 2-byte encodings of unicode. 1-byte, non-unicode encodings for specific natural languages were even more popular then than they are now. Many pieces of data (even more than today!) could include non-ascii text without specifying what encoding to interpret that data as. In that environment anyone venturing outside of the ascii realm needed to be warned that they were entering a world where encoding dragons roamed freely. The ascii encoding helps to warn people that they were entering a land where their code had to take special precautions by throwing an error in many of the cases where the boundary was crossed.

However, there was one oversight about the unicode functionality that went into Python-2.0 that the Python authors grew to realize was a bad idea. That oversight was not removing the setdefaultencoding() function. They had taken some steps to prevent it being used outside of initialization (in the file) by deleting the reference to it from the sys module after Python initialized but it still existed for people to modify the defaultencoding in

The rise of the sys.setdefaultencoding() hack

As time went on, the utf-8 encoding emerged as the dominant encoding of both Unix-like systems and the Internet. Many people who only had to deal with utf-8 encoded text were tired of getting errors when they mixed byte strings and text strings together. Seeing that there was a function called setdefaultencoding(), people started trying to use it to get rid of the errors they were seeing.

At first, those with the ability to, tried modifying their Python installation’s global to make sys.setdefaultencoding do its thing. This is what the Python documentation suggests is the proper way to use it and it seemed to work on the user’s own machines. Unfortunately, the users often turned out to be coders. And it turned out that what these coders were doing was writing programs that had to work on machines run by other people: the IT department, customers, and users all over the Internet. That meant that applying the change in their often left them in a worse position than before: They would code something which would appear to work on their machines but which would fail for the people who were actually using their software.

Since the coders’ concern was confined to whether people would be able to run their software the coders figured if their software could set the defaultencoding as part of its initialization that would take care of things. They wouldn’t have to force other people to modify their Python install; their software could make that decision for them when the software was invoked. So they took another look at sys.setdefaultencoding(). Although the Python authors had done their best to make the function unavailable after python started up these coders hit upon a recipe to get at the functionality anyway:

import sys

Once this was run in the coders’ software, the default encoding for coercing byte strings to text strings was utf-8. This meant that when utf-8 encoded byte strings were mixed with unicode text strings, Python would successfully convert the str type data to unicode type and combine the two into one unicode string. This is what this new generation of coders were expecting from the majority of their data so the idea that solving their problem with just these few lines of (admittedly very hacky) code was very attractive to them. Unfortunately, there are non-obvious drawbacks to doing this….

Why sys.setdefaultencoding() will break your code

(1) Write once, change everything

The first problem with sys.setdefaultencoding() is not obviously a problem at first glance. When you call sys.setdefaultencoding() you are telling Python to change the defaultencoding for all of the code that it is going to run. Your software’s code, the code in the stdlib, and third-party library code all end up running with your setting for defaultencoding. That means that code which you weren’t responsible for that relied on the behaviour of having the defaultencoding be ascii would now stop throwing errors and potentially start creating garbage values. For instance, let’s say one of the libraries you rely on does this:

def welcome_message(byte_string):
        return u"%s runs your business" % byte_string
    except UnicodeError:
        return u"%s runs your business" % unicode(byte_string,

print(welcome_message(u"Angstrom (Å®)".encode("latin-1"))

Previous to setting defaultencoding this code would be unable to decode the “Å” in the ascii encoding and then would enter the exception handler to guess the encoding and properly turn it into unicode. Printing: Angstrom (Å®) runs your business. Once you’ve set the defaultencoding to utf-8 the code will find that the byte_string can be interpreted as utf-8 and so it will mangle the data and return this instead: Angstrom (Ů) runs your business.

Naturally, if this was your code, in your piece of software, you’d be able to fix it to deal with the defaultencoding being set to utf-8. But if it’s in a third party library that luxury may not exist for you.

(2) Let’s break dictionaries!

The most important problem with setting defaultencoding to the utf-8 encoding is that it will break certain assumptions about dictionaries. Let’s write a little code to show this:

def key_in_dict(key, dictionary):
    if key in dictionary:
        return True
    return False

def key_found_in_dict(key, dictionary):
    for dict_key in dictionary:
        if dict_key == key:
            return True
    return False

Would you assume that given the same inputs the output of both functions will be the same? In Python, if you don’t hack around with sys.setdefaultencoding(), your assumption would be correct:

>>> # Note: the following is the same as d = {'Café': 'test'} on
>>> #       systems with a utf-8 locale
>>> d = { u'Café'.encode('utf-8'): 'test' }
>>> key_in_dict('Café', d)
>>> key_found_in_dict('Café', d)
>>> key_in_dict(u'Café', d)
>>> key_found_in_dict(u'Café', d)
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

But what happens if you call sys.setdefaultencoding('utf-8')? Answer: the assumption breaks:

>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding('utf-8')
>>> d = { u'Café'.encode('utf-8'): 'test' }
>>> key_in_dict('Café', d)
>>> key_found_in_dict('Café', d)
>>> key_in_dict(u'Café', d)
>>> key_found_in_dict(u'Café', d)

This happens because the in operator hashes the keys and then compares the hashes to determine if they are equal. In the utf-8 encoding, only the characters represented by ascii hash to the same values whether in a byte string or a unicode type text string. For all other characters the hash for the byte string and the unicode text string will be different values. The comparison operator (==), on the other hand, converts the byte string to a unicode type and then compares the results. When you call setdefaultencoding('utf-8') you allow the byte string to be transformed into a unicode type. Then the two text strings will be compared and found to be equal. The ramifications of this are that containment tests with in now yield different values than equality testing individual entries via ==. This is a pretty big difference in behaviour to get used to and for most people would count as having broken a fundamental assumption of the language.

So how does Python 3 fix this?

You may have heard that in Python 3 the default encoding has been switched from ascii to utf-8. How does it get away with that without encountering the equality versus containment problem? The answer is that python3 does not perform implicit conversions between byte strings (python3 bytes type) and text strings (python3 str type). Since the two objects are now entirely separate comparing them via both equality and containment will always yield False:

$ python3
>>> a = {'A': 1}
>>> b'A' in a
>>> b'A' == list(a.keys())[0]

At first, coming from python2 where ascii values were the same this might look a little funny. But just remember that bytes are really a type of number and you wouldn’t expect this to work either:

>>> a = {'1': 'one'}
>>> 1 in a
>>> 1 == list(a.keys())[0]

Pattern or Antipattern? Splitting up initialization with asyncio

“O brave new world, That has such people in’t!” – William Shakespeare, The Tempest

Edit: Jean-Paul Calderone (exarkun) has a very good response to this detailing why it should be considered an antipattern. He has some great thoughts on the implicit contract that a programmer is signing when they write an __init__() method and the maintenance cost that is incurred if a programmer breaks those expectations. Definitely worth reading!

Instead of spending the Thanksgiving weekend fighting crowds of shoppers I indulged my inner geek by staying at home on my computer. And not to shop online either — I was taking a look at Python-3.4’s asyncio library to see whether it would be useful in general, run of the mill code. After quite a bit of experimenting I do think every programmer will have a legitimate use for it from time to time. It’s also quite sexy. I think I’ll be a bit prone to overusing it for a little while 😉

Something I discovered, though — there’s a great deal of good documentation and blog posts about the underlying theory of asyncio and how to implement some broader concepts using asyncio’s API. There’s quite a few tutorials that skim the surface of what you can theoretically do with the library that don’t go into much depth. And there’s a definite lack of examples showing how people are taking asyncio’s API and applying them to real-world problems.

That lack is both exciting and hazardous. Exciting because it means there’s plenty of neat new ways to use the API that no one’s made into a wide-spread and oft-repeated pattern yet. Hazardous because there’s plenty of neat new ways to abuse the API that no one’s thought to write a post explaining why not to do things that way before. My joke about overusing it earlier has a large kernel of truth in it… there’s not a lot of information saying whether a particular means of using asyncio is good or bad.

So let me mention one way of using it that I thought about this weekend — maybe some more experienced tulip or twisted programmers will pop up and tell me whether this is a good use or bad use of the APIs.

Let’s say you’re writing some code that talks to a microblogging service. You have one class that handles both posting to the service and reading from it. As you write the code you realize that there’s some time consuming tasks (for instance, setting up an on-disk cache for posts) that you have to do in order to read from the service that you do not have to wait for if your first actions are going to be making new posts. After a bit of thought, you realize you can split up your initialization into two steps. Initialization needed for posting will be done immediately in the class’s constructor and initialization needed for reading will be setup in a future so that reading code will know when it can begin to process. Here’s a rough sketch of what an implementation might look like:

import os
import sqlite
import asyncio

import aiohttp

class Microblog:
    def __init__(self, url, username, token, cachedir):
        self.auth = token
        self.username = username
        self.url = url
        loop = asyncio.get_event_loop()
        self.init_future = loop.run_in_executor(None, self._reading_init, cachedir)

    def _reading_init(self, cachedir):
        # Mainly setup our cache
        self.cachedir = cachedir
        os.makedirs(cachedir, mode=0o755, exist_ok=True)
        self.db = sqlite.connect('sqlite:////{0}/cache.sqlite'.format(cachedir))
        # Create tables, fill in some initial data, you get the picture [....]

    def post(self, msg):
        data = dict(payload=msg)
        headers = dict(Authorization=self.token)
        reply = yield from aiohttp.request('post', self.url, data=data, headers=headers)
        # Manipulate reply a bit [...]
        return reply

    def sync_latest(self):
        # Synchronize with the initialization we need before we can read
        yield from self.init_future
        data = dict(per_page=100, page=1)
        headers = dict(Authorization=self.token)
        reply = yield from aiohttp.request('get', self.url, data=data, headers=headers)
        # Stuff the reply in our cache

if __name__ == '__main__':
    chirpchirp = Microblog('', 'a.badger', TOKEN, '/home/badger/cache/')
    loop = asyncio.get_event_loop()
    # Contrived -- real code would probably have a coroutine to take user input
    # and then submit that while interleaving with displaying new posts
    asyncio.async(' '.join(sys.argv[1:])))

Some of this code is just there to give an idea of how this could be used. The real question’s revolve around splitting up initialization into two steps:

  • Is yield from the proper way for sync_latest() to signal that it needs self.init_future to finish before it can continue?
  • Is it good form to potentially start using the object for one task before __init__ has finished all tasks?
  • Would it be better style to setup posting and reading separately? Maybe a reading class and a posting class or the old standby of invoking _reading_init() the first time sync_latest() is called?

Porting Kitchen to Python3: Part 1 — Detecting string types

I’ve spent a good part of the last week working on the python3 port of kitchen. It’s now to the point where I’ve reviewed all of the code and got the unittests passing. I still need to add some deprecation warnings and a gettext object that mirrors the python3 API instead of the python2 API. Then it’ll be ready for an alpha release. Still a lot of work to do before a final release. Most of the documentation will need to be updated to change from unicode + str to str + bytes and the best practices sections will need a major overhaul since a lot of the problems with python2 and unicode have either been fixed, mitigated, or moved to a different level.

It was both an easy and hard undertaking. The easy part was that kitchen is largely a collection of dependent but unrelated functions. So it’s reasonably easy to pick a set of functions, figure out that they don’t depend on anything else in kitchen, and then port them one by one.

The hard part is that a lot of those functions deal with things that are explicitly unicode and things that are explicitly byte strings; an area that has both changed dramatically in python3 and that 2to3 doesn’t handle very well. Here’s a couple of things I ended up doing to help out:

Detecting String Types

Kitchen has several places that need to know whether an object it’s been given is a byte string, unicode string, or a generic string. The python2 idioms for this are:

if isinstance(obj, basestring):
    pass # object is any of the string types
    if isinstance(obj, str):
        pass # object is a byte string
    elif isinstance(obj, unicode):
        pass # object is a unicode string
    pass # object was not a string type

In python3, a couple things have changed.

  • There’s no longer a basestring type as byte strings and unicode strings are no longer meant to be related types.
  • Byte strings now have an immutable (bytes) and mutable (bytearray) type.

With these changes, the python3 idioms equivalent to the python2 ones look something like this:

if isinstance(obj, str) or isinstance(obj, bytes) or isinstance(obj, bytearray):
    pass # any string type
    if isinstance(obj, bytes) or isinstance(obj, bytearray):
        pass # byte string
    elif isinstance(obj, str):
        pass # unicode string

There’s two issues with these changes:

  • code that needs to do this needs to be manually ported when moving from python2 to python3. 2to3 can correctly change all occurrences of isinstance(obj, unicode) to isinstance(obj, str) but occurrences of isinstance(obj, basestring) and isinstance(obj, str) will also be rendered as isinstance(obj, str) in the 2to3 output. This is correct for a lot of normal python2 code that is trying to separate strings from ints, floats, or other types but it is incorrect for code that’s trying to explicitly separate bytes from unicode. So you’ll need to hand-audit and fix your code wherever these idioms are being used.
  • This is more prolix and tedious to write than the python2 version and if your code has to do this sort of differentiation in many places you’ll soon get bored of it.

For kitchen, I added a few helper functions into kitchen.text.misc that encapsulate the python2 and python3 idioms. For instance:

def isbasestring(obj):
    if isinstance(obj, str) or isinstance(obj, bytes) or isinstance(obj, bytearray):
        return True
    return False

and similar for isunicodestring() and isbytestring(). [In case you’re curious, I broke with PEP8 style for these function names because of the long history of is* functions and methods in python and other programming languages.] By pushing these into functions, I can use if isbasetring(obj): on both python2 and python3. I only have to change the implementation of the is*string() functions in a single place when porting from python2 to python3.

Now let’s mention some of the caveats to using this:

  • In python, calling a function (isbasestring()) is somewhat expensive. So if you use this in any hot inner loops, you may want to benchmark with the function and with the expanded version to see whether you take a noticable loss of speed.
  • Not every piece of code is going to want to define “string” in the same way. For instance, bytearrays are mutable so maybe your code shouldn’t include those with the “normal” string types.
  • Maybe your code can be changed to only deal with unicode strings (str). In python3 byte strings are not as ubiquitous as they were in python2 so maybe your code can be changed to stop checking for the type of the object altogether or reduced to a single isinstance(obj, str). The language has evolved so when possible, evolve your code to adapt as well.

Next time: Literals

My first python3 script

I’ve been hacking on other people’s python3 code for a while doing porting and bugfixes but so far my own code has been tied to python2 because of dependencies. Yesterday I ported my first personal script from python2 to python3. This was just a simple, one file script that hacks together a way to track how long my kids are using the computer and log them off after they’ve hit a quota. The kind of thing that many a home sysadmin has probably hacked together to automate just a little bit of their routine. For that use, it seemed very straightforward to make the switch. There were only three changes in the language that I encountered when making the transition:

  • octal values. I use octal for setting file permissions. The syntax for octal values has changed from "0755" to "0o755"
  • exception catching. No longer can you do: except Exception, exc. The new syntax is: except Exception as exc.
  • print function. In python2, print is a keyword so you do this: print 'hello world'. In python3, it’s a function so you write it this way: print('hello world')
  • The strict separation of bytes and string types. Required me to specify that one subprocess function should return string instead of bytes to me

When I’ve worked on porting libraries that needed to maintain some form of compat between python2 (older versions… no nice shiny python-2.7 for you!) and python3 these concerns were harder to address as there needed to be two versions of the code (usually, maintained via automatic build-time invocation of 2to3). With this application/script, throwing out python2 compatibility was possible so switching over was just a matter of getting an error when the code executed and switching the syntax over.

This script also didn’t use any modules that had either not ported, been dropped, or been restructured in the switch from python2 to python3. Unlike my day job where urllib’s restructuring would affect many of the things that we’ve written and lack of ported third-party libraries would prevent even more things from being ported, this script (and many other of my simple-home-use scripts) didn’t require any changes due to library changes.

Verdict? Within these constraints, porting to python3 was as painless as porting between some python2.x releases has been. I don’t see any reason I won’t use python3 for new programming tasks like this. I’ll probably port other existing scripts as I need to enhance them.

Python3 porting organization

Last week, a few people crawled out of the wordwork and decided we wanted to start porting third party python modules to python3. We need a bit of structure for this since some of the time, we have people who are packagers for individual Linux distributions (or even, just at the one company that they work for) doing the job of porting something they need over. The work that is done in that one place then doesn’t get upstreamed for some reason (upstream is dead, upstream only wants to work on python2 problems at the moment, you got busy and forgot about it). Having a central place to coordinate these efforts would make for a nice way to make sure that we’re working on things that no one else has done instead of everybody duplicating efforts.

With that in mind, we’ve decided that we should collaborate on the python porting mailing list to try to organize what we’re doing. If you’re interested in doing some of this python3 porting work, join up and see how you can help!