Better unittest coverage stats: excluding code that only executes on Python 2/3

What we’re testing

In my last post I used pytest to write a unittest for code which wrote to sys.stdout on Python2 and sys.stdout.buffer on Python3. Here’s that code again:

# byte_writer.py
import sys
import six

def write_bytes(some_bytes):
    # some_bytes must be a byte string
    if six.PY3:
        stdout = sys.stdout.buffer
    else:
        stdout = sys.stdout
    stdout.write(some_bytes)

and the unittest to test it:

# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfdbinary):
    write_bytes(b'\xff')
    out, err = capfdbinary.readouterr()
    assert out == b'\xff'

This all works great but as we all know, there’s always room for improvement.

Coverage

When working on code that’s intended to run on both Python2 and Python3, you often hear the advice that it’s extremely helpful to make sure every line of your code is executed during automated testing. This is, in part, because several classes of error that are easy to introduce in such code are picked up by tests that simply exercise the code. For instance:

  • Functions which were renamed or moved will throw an error when code attempts to use them.
  • Mixing of byte and text strings will throw an error as soon as the code that combines the strings runs on Python3 (see the short example below this list).
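
For example, here’s a tiny illustration of that second point (a hypothetical snippet, not part of byte_writer.py). On Python2 the mixed concatenation silently succeeds because the ascii-only byte string is implicitly decoded; on Python3 the same line raises TypeError the first time it runs:

# mixing_demo.py (hypothetical)
greeting = u'Hello ' + b'world'   # Python2: u'Hello world'; Python3: TypeError
print(greeting)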

These benefits are over and above the behaviours that your tests were specifically written to check for.

Once you accept that you want to strive for 100% coverage, though, the next question is how to get there. The Python coverage library can help. It integrates with your unittest framework to track which lines of code and which branches have been executed over the course of running your unittests. If you’re using pytest to write your unittests like I am, you will want to install the pytest-cov package to integrate coverage with pytest.

If you recall my previous post, I had to use a recent checkout of pytest to get my unittests working so we have to make sure not to overwrite that version when we install pytest-cov. It’s easiest to create a test-requirements.txt file to make sure that we get the right versions together:

$ cat test-requirements.txt
pytest-cov
git+git://github.com/pytest-dev/pytest.git@6161bcff6e3f07359c94a7be52ad32ecb8822142
$ pip install --user --upgrade -r test-requirements.txt
$ pip3 install --user --upgrade -r test-requirements.txt

And then we can run pytest to get coverage information:

$ pytest --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
test_byte_writer.py .

---------- coverage: platform linux, python 3.5.4-final-0 ----------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       7      1      2      1    78%   10, 7->10
===================== 1 passed in 0.02 seconds =====================

Yay! The output shows that 78% of byte_writer.py is currently executed when the unittests are run. The Missing column reports two kinds of missing coverage. The 10 means that line 10 was never executed. The 7->10 means that there’s a branch in the code where only one of its paths was taken. Since the two missing pieces coincide, we probably only need to add a single test case that exercises the other branch path to reach 100% coverage. Right?

Conditions that depend on Python version

If we take a look at lines 7 through 10 in byte_writer.py we can see that this might not be as easy as we first assumed:

    if six.PY3:
        stdout = sys.stdout.buffer
    else:
        stdout = sys.stdout

Which code path executes here depends on the Python version the code runs under. Line 8 is executed when we run on Python3 and Line 10 is skipped. When run on Python2 the opposite happens: Line 10 is executed and Line 8 is skipped. We could mock out the value in six.PY3 to hit both code paths but since sys.stdout only has a buffer attribute on Python3, we’d have to mock that out too. And that would lead us back into conflict with pytest capturing stdout, as we figured out in my previous post.

Taking a step back, it also doesn’t make sense that we’d test both code paths on the same run. Each code path should be executed when the environment is correct for it to run and we’d be better served by modifying the environment to trigger the correct code path. That way we also test that the code is detecting the environment correctly. For instance, if the conditional was triggered by the user’s encoding, we’d probably run the tests under different locale settings to check that each encoding went down the correct code path. In the case of a Python version check, we modify the environment by running the test suite under a different version of Python. So what we really want is to make sure that the correct branch is run when we run the test suite on Python3 and the other branch is run when we execute it under Python2. Since we already have to run the test suite under both Python2 and Python3 this has the net effect of testing all of the code.

So how do we achieve that?

Excluding lines from coverage

Coverage has the ability to match lines in your code against a string and then exclude those lines from the coverage report. This allows you to tell coverage that a branch of code will never be executed and thus shouldn’t contribute to the list of unexecuted code. By default, the only string matched is "pragma: no cover". However, "pragma: no cover" unconditionally excludes the lines it matches, which isn’t really what we want. We do want to test for coverage of the branch, but only when we’re running on the correct Python version. Luckily, the matched strings are customizable and, further, they can include environment variables. The combination of these two abilities means that we can add exclusion patterns that differ when we run on Python2 and Python3. Here’s how to configure coverage to do what we need it to:

$ cat .coveragerc
[report]
exclude_lines=
    pragma: no cover
    pragma: no py${PYTEST_PYMAJVER} cover

This .coveragerc includes the standard matched line, pragma: no cover, and our addition, pragma: no py${PYTEST_PYMAJVER} cover. The addition uses an environment variable, PYTEST_PYMAJVER, so that we can vary the string that’s matched when we invoke pytest.

Next we need to change the code in byte_writer.py so that the special strings are present:

    if six.PY3:  # pragma: no py2 cover
        stdout = sys.stdout.buffer
    else:  # pragma: no py3 cover
        stdout = sys.stdout

As you can see, we’ve added the special comments to the two conditional lines. Let’s run this manually and see if it works now:

$ PYTEST_PYMAJVER=3 pytest --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
test_byte_writer.py .
---------- coverage: platform linux, python 3.5.4-final-0 ----------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       6      0      0      0   100%
==================== 1 passed in 0.02 seconds =====================

$ PYTEST_PYMAJVER=2 pytest-2 --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
test_byte_writer.py .
--------- coverage: platform linux2, python 2.7.13-final-0 ---------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       5      0      0      0   100%
===================== 1 passed in 0.02 seconds =====================

Well, that seemed to work! Let’s run it once more with the wrong PYTEST_PYMAJVER value to show that coverage is still recording information on the branches that are supposed to be used:

$ PYTEST_PYMAJVER=3 pytest-2 --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
test_byte_writer.py .
-------- coverage: platform linux2, python 2.7.13-final-0 --------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       6      1      0      0    83%   8
===================== 1 passed in 0.02 seconds =====================

Yep. When we specify the wrong PYTEST_PYMAJVER value, the coverage report shows that the missing line is included as an unexecuted line. So that seems to be working.

Setting PYTEST_PYMAJVER automatically

Just one more thing… it’s kind of a pain to have to set the PYTEST_PYMAJVER variable with every test run, isn’t it? Wouldn’t it be better if pytest set it for you automatically? After all, pytest knows which Python version it’s running under, so it should be able to. I thought so too, so I wrote pytest-env-info to do just that. When installed, pytest-env-info sets PYTEST_VER, PYTEST_PYVER, and PYTEST_PYMAJVER so that they are available to pytest-cov and other, similar plugins which can use environment variables to configure themselves. It’s available on PyPI, so all you have to do to enable it is add it to your requirements file so that pip installs it, and then run pytest:

$ cat test-requirements.txt
pytest-cov
git+git://github.com/pytest-dev/pytest.git@6161bcff6e3f07359c94a7be52ad32ecb8822142
pytest-env-info
$ pip3 install --user -r test-requirements.txt
$ pytest --cov=byte_writer  --cov-report=term-missing --cov-branch
======================== test session starts =========================
plugins: env-info-0.1.0, cov-2.5.1
test_byte_writer.py .
---------- coverage: platform linux2, python 2.7.14-final-0 ----------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       5      0      0      0   100%
====================== 1 passed in 0.02 seconds ======================
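
If you’re curious how a plugin like pytest-env-info works, the core idea is small enough to sketch in a conftest.py. This is my own rough approximation, not the plugin’s actual source, and it assumes the pytest_configure hook fires early enough for whatever reads the variable (the real plugin hooks into pytest’s startup earlier, which is part of why it’s packaged as a plugin):

# conftest.py -- rough sketch, not pytest-env-info's real implementation
import os
import sys

def pytest_configure(config):
    # Export the major Python version, e.g. "2" or "3", so tools that read
    # environment variables (like the .coveragerc above) can use it.
    os.environ.setdefault('PYTEST_PYMAJVER', str(sys.version_info[0]))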

Python testing: Asserting raw byte output with pytest

The Code to Test

When writing code that can run on both Python2 and Python3, I’ve sometimes found that I need to send and receive bytes to stdout. Here’s some typical code I might write to do that:

# byte_writer.py
import sys
import six

def write_bytes(some_bytes):
    # some_bytes must be a byte string
    if six.PY3:
        stdout = sys.stdout.buffer
    else:
        stdout = sys.stdout
    stdout.write(some_bytes)

if __name__ == '__main__':
    write_bytes(b'\xff')

In this example, my code needs to write a raw byte to stdout. To do this, it uses sys.stdout.buffer on Python3 to circumvent the automatic encoding/decoding that occurs on Python3’s sys.stdout. So far so good. Python2 expects bytes to be written to sys.stdout by default so we can write the byte string directly to sys.stdout in that case.
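
If you haven’t bumped into this difference before, a quick interactive session shows why the branch exists (the exact wording of the Python3 error varies a bit between versions):

>>> import sys
>>> sys.stdout.write(b'\xff')         # Python2: writes the raw byte
>>> sys.stdout.write(b'\xff')         # Python3: raises TypeError, stdout wants text
>>> sys.stdout.buffer.write(b'\xff')  # Python3: writes the raw byte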

The First Attempt: Pytest newb, but willing to learn!

Recently I wanted to write a unittest for some code like that. I had never done this in pytest before so my first try looked a lot like my experience with nose or unittest2: override sys.stdout with an io.BytesIO object and then assert that the right values showed up in sys.stdout:

# test_byte_writer.py
import io
import sys

import mock
import pytest
import six

from byte_writer import write_bytes

@pytest.fixture
def stdout():
    real_stdout = sys.stdout
    fake_stdout = io.BytesIO()
    if six.PY3:
        sys.stdout = mock.MagicMock()
        sys.stdout.buffer = fake_stdout
    else:
        sys.stdout = fake_stdout

    yield fake_stdout

    sys.stdout = real_stdout

def test_write_byte(stdout):
    write_bytes(b'a')
    assert stdout.getvalue() == b'a'

This gave me an error:

[pts/38@roan /var/tmp/py3_coverage]$ pytest              (07:46:36)
_________________________ test_write_byte __________________________

stdout = 

    def test_write_byte(stdout):
        write_bytes(b'a')
>       assert stdout.getvalue() == b'a'
E       AssertionError: assert b'' == b'a'
E         Right contains more items, first extra item: 97
E         Use -v to get the full diff

test_byte_writer.py:27: AssertionError
----------------------- Captured stdout call -----------------------
a
===================== 1 failed in 0.03 seconds =====================

I could plainly see from pytest’s “Captured stdout” output that my test value had been printed to stdout. So it appeared that my stdout fixture just wasn’t capturing what was printed there. What could be the problem? Hmmm…. Captured stdout… If pytest is capturing stdout, then perhaps my fixture is getting overridden by pytest’s internal facility. Let’s google and see if there’s a solution.

The Second Attempt: Hey, that’s really neat!

Wow, not only did I find that there is a way to capture stdout with pytest, I found that you don’t have to write your own fixture to do it. You can just hook into pytest’s builtin capfd fixture. Cool, that should be much simpler:

# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfd):
    write_bytes(b'a')
    out, err = capfd.readouterr()
    assert out == b'a'

Okay, that works fine on Python2 but on Python3 it gives:

[pts/38@roan /var/tmp/py3_coverage]$ pytest              (07:46:41)
_________________________ test_write_byte __________________________

capfd = 

    def test_write_byte(capfd):
        write_bytes(b'a')
        out, err = capfd.readouterr()
>       assert out == b'a'
E       AssertionError: assert 'a' == b'a'

test_byte_writer.py:10: AssertionError
===================== 1 failed in 0.02 seconds =====================

The assert looks innocuous enough. So if I were an insufficiently paranoid person I might be tempted to think that this was just stdout using Python’s native string types (bytes on Python2 and text on Python3), so the solution would be to use a native string here ("a" instead of b"a"). However, where the correctness of anyone else’s bytes <=> text string code is concerned, I subscribe to the philosophy that you can never be too paranoid. So….

The Third Attempt: I bet I can break this more!

Rather than make the seemingly easy fix of switching the test expectation from b"a" to "a" I decided that I should test whether some harder test data would break either pytest or my code. Now my code is intended to push bytes out to stdout even if those bytes are non-decodable in the user’s selected encoding. On modern UNIX systems this is usually controlled by the user’s locale. And most of the time, the locale setting specifies a UTF-8 compatible encoding. With that in mind, what happens when I pass a byte string that is not legal in UTF-8 to write_bytes() in my test function?

# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfd):
    write_bytes(b'\xff')
    out, err = capfd.readouterr()
    assert out == b'\xff'

Here I adapted the test function to attempt writing the byte 0xff (255) to stdout. In UTF-8, this is an illegal byte (ie: by itself, that byte cannot be mapped to any unicode code point) which makes it good for testing this. (If you want to make a truly robust unittest, you should probably standardize on the locale settings (and hence, the encoding) to use when running the tests. However, that deserves a blog post of its own.) Anyone want to guess what happens when I run this test?

[pts/38@roan /var/tmp/py3_coverage]$ pytest              (08:19:52)
_________________________ test_write_byte __________________________

capfd = 

    def test_write_byte(capfd):
        write_bytes(b'\xff')
        out, err = capfd.readouterr()
>       assert out == b'\xff'
E       AssertionError: assert '�' == b'\xff'

test_byte_writer.py:10: AssertionError
===================== 1 failed in 0.02 seconds =====================

On Python3, we see that the undecodable byte is replaced with the unicode replacement character. Pytest is likely running the equivalent of b"Byte string".decode(errors="replace") on stdout. This is good when capfd is used to display the Captured stdout call information to the console. Unfortunately, it is not what we need when we want to check that our exact byte string was emitted to stdout.
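
You can reproduce that substitution directly in a Python3 interpreter: a strict UTF-8 decode refuses the byte outright, while errors="replace" quietly swaps in U+FFFD, which is exactly the character showing up in the assertion above:

>>> b'\xff'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
>>> b'\xff'.decode('utf-8', errors='replace')
'�'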

With this change, it also becomes apparent that the test isn’t doing the right thing on Python2 either:

[pts/38@roan /var/tmp/py3_coverage]$ pytest-2            (08:59:37)
_________________________ test_write_byte __________________________

capfd = 

    def test_write_byte(capfd):
        write_bytes(b'\xff')
        out, err = capfd.readouterr()
>       assert out == b'\xff'
E       AssertionError: assert '�' == '\xff'
E         - �
E         + \xff

test_byte_writer.py:10: AssertionError
========================= warnings summary =========================
test_byte_writer.py::test_write_byte
  /var/tmp/py3_coverage/test_byte_writer.py:10: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
    assert out == b'\xff'

-- Docs: http://doc.pytest.org/en/latest/warnings.html
=============== 1 failed, 1 warnings in 0.02 seconds ===============

In the previous version, this test passed. Now we see that it was passing because Python2 evaluates u"a" == b"a" as True. However, that’s not really what we want to test; we want to test that the byte string we passed to write_bytes() is the actual byte string that was emitted on stdout. The new data shows that instead, capfd is converting the value that reached stdout into a text string and the test is then comparing against that. So a fix is needed on both Python2 and Python3.
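
The difference is easy to see in a bare Python2 interpreter session; the ascii-compatible comparison quietly succeeds while the non-ascii one fails with the same UnicodeWarning that pytest surfaced above:

>>> u'a' == b'a'
True
>>> u'\xff' == b'\xff'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False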

These problems are down in the guts of pytest. How are we going to fix them? Will we have to seek out a different strategy that lets us capture stdout, overriding pytest’s builtin?

The Fourth Attempt: Fortuitous Timing!

Well, as it turns out, the pytest maintainers merged a pull request four days ago which implements a capfdbinary fixture. capfdbinary is like the capfd fixture that I was using in the above example but returns data as byte strings instead of as text strings. Let’s install it and see what happens:

$ pip install --user git+git://github.com/pytest-dev/pytest.git@6161bcff6e3f07359c94a7be52ad32ecb8822142
$ mv ~/.local/bin/pytest ~/.local/bin/pytest-2
$ pip3 install --user git+git://github.com/pytest-dev/pytest.git@6161bcff6e3f07359c94a7be52ad32ecb8822142

And then update the test to use capfdbinary instead of capfd:

# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfdbinary):
    write_bytes(b'\xff')
    out, err = capfdbinary.readouterr()
    assert out == b'\xff'

And with those changes, the tests now pass:

[pts/38@roan /var/tmp/py3_coverage]$ pytest              (11:42:06)
======================= test session starts ========================
platform linux -- Python 3.5.4, pytest-3.2.5.dev194+ng6161bcf, py-1.4.34, pluggy-0.5.2
rootdir: /var/tmp/py3_coverage, inifile:
plugins: xdist-1.15.0, mock-1.5.0, cov-2.4.0, asyncio-0.5.0
collected 1 item                                                    

test_byte_writer.py .

===================== 1 passed in 0.01 seconds =====================

Yay! Mission accomplished.

Python2, string .format(), and unicode

Primer

If you’ve dealt with unicode and byte str mixing in python2 before, you’ll know that there are certain percent-formatting operations that you absolutely should not do with them. For instance, if you are combining a string of each type and they both have non-ascii characters then you are going to get a traceback:

>>> print(u'くら%s' % (b'とみ',))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
>>> print(b'くら%s' % (u'とみ',))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

The canonical answer to this is to clean up your code to not mix unicode and byte str which seems fair enough here. You can convert one of the two strings to match with the other fairly easily:

>>> print(u'くら%s' % (unicode(b'とみ', 'utf-8'),))
くらとみ

However, if you’re part of a project which was written before the need to separate the two string types was realized you may be mixing the two types sometimes and relying on bug reports and python tracebacks to alert you to pieces of the code that need to be fixed. If you don’t get tracebacks then you may not bother to explicitly convert in some cases. Unfortunately, as code is changed you may find that the areas you thought of as safe to mix aren’t quite as broad as they first appeared. That can lead to UnicodeError exceptions suddenly popping up in your code with seemingly harmless changes….

A New Idiom

If you’re like me and trying to adopt python3-supported idioms into your python-2.6+ code bases, then one of the changes you may be making is to switch from percent formatting to the new string .format() method for constructing your strings. This is usually fairly straightforward:

name = u"Kuratomi"

# Old style
print("Hello Mr. %s!" % (name,))

# New style
print("Hello Mr. {0}!".format(name))

# Output:
Hello Mr. Kuratomi!
Hello Mr. Kuratomi!

This seems like an obvious transformation with no possibility of UnicodeError being thrown. And for this simple example you’d be right. But we all know that real code is a little more obfuscated than that. So let’s start making this a little more real-world, shall we?

name = u"くらとみ"
print("Hello Mr. %s!" % (name,))
print("Hello Mr. {0}!".format(name))

# Output
Hello Mr. くらとみ!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

What happened here? In our code we set name to a unicode string that has non-ascii characters. Used with the old-style percent formatting, this continued to work fine. But with the new-style .format() method we ended up with a UnicodeError. Why? Well, under the hood, percent formatting uses the “%” operator. The function that handles the “%” operator (__mod__()) sees that you were given two strings, one of which is a byte str and one of which is a unicode string. It then decides to convert the byte str to a unicode string and combine the two. Since our example only has ascii characters in the byte string, it converts successfully and python can then construct the unicode string u"Hello Mr. くらとみ!". Since it’s always the byte str that’s converted to the unicode type, we can build up an idea of which things will work and which will throw an exception:

# These are good as the byte string
# which is converted is ascii-only
"Mr. %s" % (u"くらとみ",)
u"%s くらとみ" % ("Mr.",)

# Output of either of those:
u"Mr. くらとみ"

# These will throw an exception as the
# *byte string* contains non-ascii characters
u"Mr. %s" % ("くらとみ",)
"%s くらとみ" % (u"Mr",)

Okay, so that explains what’s happening with the percent-formatting example. What’s happening with the .format() code? .format() is a method of one of the two string types (str for python2 byte strings or unicode for python2 text strings). This gives programmers a feeling that the method is more closely associated with the type it is a method of than with the parameters that it is given. So the design decision was made that the method should convert to the type that the method is bound to instead of always converting to the unicode string type. This means that we have to make sure parameters can be converted to the type of the format string rather than always to unicode. With that in mind, this is the matrix of things we expect to work and expect to fail:

# These are good as the parameter string
# which is converted is ascii-only
u"{0} くらとみ".format("Mr.")
"{0} くらとみ".format(u"Mr.")

# Output (first is a unicode, second is a str):
u"Mr. くらとみ"
"Mr. くらとみ"

# These will throw an exception as the
# parameters contain non-ascii characters
u"Mr. {0}".format("くらとみ")
"Mr. {0}".format(u"くらとみ")

So now we know why we get a traceback in the converted code but not in the original code. Let’s apply this to our example:

name = u"くらとみ"
# name is a unicode type so we need to make
# sure .format() does not implicitly convert it
print(u"Hello Mr. {0}!".format(name))

# Output
Hello Mr. くらとみ!

Alright! That seems good now, right? Are we done? Well, let’s take this real-world thing one step further. With real-world users we often get transient errors because users are entering a value we didn’t test with. In real-world code, variables often aren’t set a few lines above where you’re using them. Instead, they’re coming from user input or a config file or command line parsing which happened tens of function calls and thousands of lines away from where you are encountering your traceback. After you step through your program for a few hours you may realize that the relation between your variable and where it is used looks something like this:

# Near the start of your program
name = raw_input("Your name")
if not name.strip():
    name = u"くらとみ"

# [..thousands of lines of code..]

print(u"Hello Mr. {0}!".format(name))

So what’s happening? There are two ways that our variable could be set. One of those ways (the return from raw_input()) sets it to a byte str. The other way (when we set the default value) sets it to a unicode string. The way we’re using the variable in the print() function means that the value will be converted to a unicode string if it’s a byte string. Remember that we earlier determined that ascii-only byte strings would convert but non-ascii byte strings would throw an error. So that means the code will behave correctly if the default is used or if the user enters “Kuratomi”, but it will throw an exception if the user enters “くらとみ” because that has non-ascii characters.

This is where explicit conversion comes in. We need to explicitly convert the value to a unicode string so that we do not throw a traceback when we use it later. There are two sensible locations to do that conversion. The better long-term option is to convert where the variable is being set:

name = raw_input("Your name")
name = unicode(name, "utf-8", "replace")
if not name.strip():
    name = u"くらとみ"

Doing it there means that everywhere in your code you know that the variable will contain a unicode string. If you do this to all of your variables you will get to the point where you know that all of your variables are unicode strings unless you are explicitly converting them to byte str (or have special variables that should always be bytes — in which case you should have a naming convention to identify them). Having this sort of default makes it much easier to write code that uses the variable without fearing that it will unexpectedly cause tracebacks.
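
As a small, hypothetical illustration of that convention (the name_bytes variable is made up for this example), text stays unicode everywhere and a byte version only appears, clearly labelled, at the boundary that actually needs bytes:

import sys

name = unicode(raw_input("Your name"), "utf-8", "replace")

# The only byte version lives at the output boundary and its name says so.
name_bytes = name.encode("utf-8")
sys.stdout.write(name_bytes)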

The other point at which you can convert is at the point that the variable is being used:

if isinstance(name, str):
    name = unicode(name, 'utf-8', 'replace')
print(u"Hello Mr. {0}!".format(name))

The drawbacks to converting here include having to put this code everywhere you use the variable (usually more places than it could be set) and having to add the isinstance check because you don’t know whether it was set to a unicode or str type at this point. However, this strategy can be useful when you have some critical code deployed and you know you’re getting tracebacks at a specific location but don’t know what unintended consequences might occur from changing the type of the variable everywhere. In that case you might analyze the problem for a bit and decide to hotfix your production machines to convert at the point of use, while in your development tree you change it where the variable is being set, giving yourself more time to work through all the places where the change reveals that you are mixing string types.

My first python3 script

I’ve been hacking on other people’s python3 code for a while, doing porting and bugfixes, but so far my own code has been tied to python2 because of dependencies. Yesterday I ported my first personal script from python2 to python3. This was just a simple, one-file script that hacks together a way to track how long my kids are using the computer and log them off after they’ve hit a quota. The kind of thing that many a home sysadmin has probably hacked together to automate just a little bit of their routine. For that use, it seemed very straightforward to make the switch. There were only four changes in the language that I encountered when making the transition:

  • octal values. I use octal for setting file permissions. The syntax for octal values has changed from "0755" to "0o755"
  • exception catching. No longer can you do: except Exception, exc. The new syntax is: except Exception as exc.
  • print function. In python2, print is a keyword so you do this: print 'hello world'. In python3, it’s a function so you write it this way: print('hello world')
  • The strict separation of bytes and string types, which required me to specify that one subprocess function should return a string instead of bytes (see the sketch just after this list).
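
Here’s a small sketch showing all four of those changes in Python3 syntax. It’s an illustrative snippet rather than an excerpt from my actual script, and the echo call is just a stand-in for whatever subprocess invocation you need:

import subprocess

# 1. print is now a function
print('hello world')

# 2. octal literals are spelled 0o755 instead of 0755
permissions = 0o755
print(oct(permissions))

# 3. "except Exception, exc" becomes "except Exception as exc"
try:
    subprocess.check_call(['true'])
except OSError as exc:
    print('command failed to start:', exc)

# 4. the bytes/str split: universal_newlines=True makes check_output
#    return text (str) instead of bytes
output = subprocess.check_output(['echo', 'hello'], universal_newlines=True)
print(output.strip())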

When I’ve worked on porting libraries that needed to maintain some form of compat between python2 (older versions… no nice shiny python-2.7 for you!) and python3, these concerns were harder to address as there needed to be two versions of the code (usually maintained via automatic build-time invocation of 2to3). With this application/script, throwing out python2 compatibility was possible, so porting was just a matter of running the code, hitting an error, and switching the syntax over.
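
For those dual-version libraries, the usual approach back then was to let the build step run 2to3 for you. A minimal sketch of what that looked like in setup.py, assuming a setuptools/distribute version that supports the use_2to3 flag:

# setup.py -- sketch of build-time 2to3 conversion
from setuptools import setup

setup(
    name='mylib',
    version='1.0',
    py_modules=['mylib'],
    use_2to3=True,   # translate the python2 sources at build time for python3
)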

This script also didn’t use any modules that had either not been ported, been dropped, or been restructured in the switch from python2 to python3. Unlike my day job, where urllib’s restructuring would affect many of the things that we’ve written and the lack of ported third-party libraries would prevent even more things from being ported, this script (and many of my other simple-home-use scripts) didn’t require any changes due to library changes.
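
The urllib reshuffling is a good example of what those library changes look like; code that used urllib2 on python2 has to find the same function under urllib.request on python3 (a minimal illustration, not code from work):

# Python2:
#   import urllib2
#   data = urllib2.urlopen('http://example.com/').read()

# Python3:
from urllib.request import urlopen
data = urlopen('http://example.com/').read()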

Verdict? Within these constraints, porting to python3 was as painless as porting between some python2.x releases has been. I don’t see any reason I won’t use python3 for new programming tasks like this. I’ll probably port other existing scripts as I need to enhance them.

Account System 0.8.11 Release

We’ve just deployed a new Fedora Account System to production. This release just pulls in a few new features that didn’t quite make the 0.8.10 release:

  • Ian Cole (icole) added a feature to allow an email address to be used instead of a login name when logging in. Because of the way we do authentication, this means that email addresses can also be used on the other applications on admin.fedoraproject.org.
  • Pierre-Yves Chibon (pingou) implemented an audio captcha for people signing up for a new account. It generates a wav file that is downloaded to your computer; you can listen to it and then type in the proper answer to the captcha.
  • Adam M. Dutko (addutko) standardized some of the errors that can be returned from our JSON API.
  • Our translation team pointed out a few areas where we weren’t loading translations correctly and I fixed them. Look forward to more complete translations in the future.

That’s it for this minor update.

/me goes to play with the audio captcha some more.

Kitchen 1.1.0 release

As mentioned last week, a new kitchen release went out today. Since then, some small changes have been made to the documentation and a few changes to make building kitchen easier have been implemented, but nothing has changed in the code. Here’s the text of the release announcement:

== Kitchen 1.1.0 has been released ==

Kitchen is a python library that brings together small snippets of code that you might otherwise find yourself reimplementing via cut and paste. Each little bit is useful and important but they usually feel too small and too trivial to create a whole module just for that one little function. However, experience has shown that any code implemented by copying will inevitably be shown to have bugs. And when you fix those bugs, you’ll wish you had created the module so you could fix the bug in one place rather than two (or five.. or ten…). Kitchen aims to be that one place.

Kitchen currently has code for easily setting up gettext so it won’t throw UnicodeErrors in corner cases, compatibility modules for different python2 stdlib versions (2.4, 2.5, 2.7), a little bit of iterators, and a whole lot of code for unicode-byte string conversion. In addition to the code, kitchen contains documentation that explains some of the common problems that arise when dealing with unicode in python2 and how to fix them through changes in coding practices and/or making use of functions from kitchen.

The 1.1.0 release enhances the gettext portion of kitchen. The major enhancements are:

  • get_translation_object can now be used as a drop-in replacement for the stdlib’s gettext.translation() function (a usage sketch follows this list).
  • If get_translation_object finds multiple message catalogs for the domain, it will set up the additional catalogs as fallbacks in case the message isn’t found in the first one.
  • The gettext and lgettext functions were reworked so that they guarantee that the string they return is both a byte str (this was present in previous kitchen releases) and is a valid sequence of bytes in the selected output_charset. This should prevent tracebacks if your code decodes and reencodes a value returned from the gettext and lgettext family of functions.
  • Several fixes to the way fallback message catalogs interacted with input and output charsets.
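
To make that first bullet concrete, here’s a rough usage sketch written from memory; the parameter names (localedirs in particular) are my recollection rather than copied from the kitchen documentation, so check the docs for the exact signature:

from kitchen.i18n import get_translation_object

# Assumption: localedirs takes one or more catalog locations, and extra
# catalogs found there become fallbacks as described in the notes above.
translations = get_translation_object('myapp',
        localedirs=('locale', '/usr/share/locale'))
_ = translations.ugettext    # unicode strings out
b_ = translations.lgettext   # byte str, guaranteed valid in the output charset
print(_('Hello'))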

For the complete set of changes, see the NEWS file.