The Code to Test
When writing code that can run on both Python2 and Python3, I’ve sometimes found that I need to send raw bytes to stdout. Here’s some typical code I might write to do that:
# byte_writer.py
import sys

import six


def write_bytes(some_bytes):
    # some_bytes must be a byte string
    if six.PY3:
        stdout = sys.stdout.buffer
    else:
        stdout = sys.stdout
    stdout.write(some_bytes)


if __name__ == '__main__':
    write_bytes(b'\xff')
In this example, my code needs to write a raw byte to stdout. To do this, it uses sys.stdout.buffer on Python3 to circumvent the automatic encoding/decoding that occurs on Python3’s sys.stdout. So far so good. Python2 expects bytes to be written to sys.stdout by default so we can write the byte string directly to sys.stdout in that case.
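To make the text layer versus byte layer distinction concrete, here is a small Python3-only sketch. It does not touch the real sys.stdout; instead it builds a stand-in from io.TextIOWrapper wrapped around an io.BytesIO, which is my own simplification of how Python3 layers its standard streams. The variable names are made up for the illustration.

```python
import io

# Stand-in for Python3's sys.stdout: a text layer wrapped around a byte buffer.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8")

# The text layer only accepts str, so handing it bytes fails.
try:
    text_stream.write(b"\xff")
    text_write_error = None
except TypeError as exc:
    text_write_error = exc

# The underlying byte layer (.buffer) takes the bytes untouched.
text_stream.buffer.write(b"\xff")
text_stream.flush()
captured = raw.getvalue()
```

Here text_write_error ends up holding a TypeError, while captured holds the single byte exactly as it was written, which is the behaviour write_bytes() relies on.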
The First Attempt: Pytest newb, but willing to learn!
Recently I wanted to write a unittest for some code like that. I had never done this in pytest before so my first try looked a lot like my experience with nose or unittest2: override sys.stdout with an io.BytesIO object and then assert that the right values showed up in sys.stdout:
# test_byte_writer.py
import io
import sys

import mock
import pytest
import six

from byte_writer import write_bytes


@pytest.fixture
def stdout():
    # Swap sys.stdout for a byte buffer and restore it after the test
    real_stdout = sys.stdout
    fake_stdout = io.BytesIO()
    if six.PY3:
        sys.stdout = mock.MagicMock()
        sys.stdout.buffer = fake_stdout
    else:
        sys.stdout = fake_stdout
    yield fake_stdout
    sys.stdout = real_stdout


def test_write_byte(stdout):
    write_bytes(b'a')
    assert stdout.getvalue() == b'a'
This gave me an error:
[pts/38@roan /var/tmp/py3_coverage]$ pytest              (07:46:36)
_________________________ test_write_byte __________________________

stdout =

    def test_write_byte(stdout):
        write_bytes(b'a')
>       assert stdout.getvalue() == b'a'
E       AssertionError: assert b'' == b'a'
E         Right contains more items, first extra item: 97
E         Use -v to get the full diff

test_byte_writer.py:27: AssertionError
----------------------- Captured stdout call -----------------------
a
===================== 1 failed in 0.03 seconds =====================
I could plainly see from pytest’s “Captured stdout” output that my test value had been printed to stdout. So it appeared that my stdout fixture just wasn’t capturing what was printed there. What could be the problem? Hmmm…. Captured stdout… If pytest is capturing stdout, then perhaps my fixture is getting overridden by pytest’s internal facility. Let’s google and see if there’s a solution.
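As a sanity check on that suspicion, here is a Python3-only sketch of the same swap done outside of pytest, where nothing else is competing for sys.stdout. The swapped_stdout() helper is hypothetical (it is not part of my test file), and it uses types.SimpleNamespace in place of mock.MagicMock just to keep the example dependency-free.

```python
import contextlib
import io
import sys
import types


@contextlib.contextmanager
def swapped_stdout():
    """Hypothetical helper: temporarily replace sys.stdout with a byte-capturing stand-in."""
    real_stdout = sys.stdout
    fake_buffer = io.BytesIO()
    # Python3 only: code under test writes to sys.stdout.buffer
    sys.stdout = types.SimpleNamespace(buffer=fake_buffer)
    try:
        yield fake_buffer
    finally:
        sys.stdout = real_stdout


with swapped_stdout() as captured:
    sys.stdout.buffer.write(b"a")

result = captured.getvalue()
```

Run as a plain script, result holds the byte that was written. So the swap itself is sound, which is consistent with the idea that something in the test runner was putting its own sys.stdout in place before my test body ran.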
The Second Attempt: Hey, that’s really neat!
Wow, not only did I find that there is a way to capture stdout with pytest, I found that you don’t have to write your own fixture to do it. You can just use pytest’s builtin capfd fixture. Cool, that should be much simpler:
# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfd):
    write_bytes(b'a')
    out, err = capfd.readouterr()
    assert out == b'a'
Okay, that works fine on Python2 but on Python3 it gives:
[pts/38@roan /var/tmp/py3_coverage]$ pytest              (07:46:41)
_________________________ test_write_byte __________________________

capfd =

    def test_write_byte(capfd):
        write_bytes(b'a')
        out, err = capfd.readouterr()
>       assert out == b'a'
E       AssertionError: assert 'a' == b'a'

test_byte_writer.py:10: AssertionError
===================== 1 failed in 0.02 seconds =====================
The assert looks innocuous enough. So if I were an insufficiently paranoid person I might be tempted to think that this was just stdout using python native string types (bytes on Python2 and text on Python3) so the solution would be to use a native string here ("a" instead of b"a"). However, where the correctness of anyone else’s bytes <=> text string code is concerned, I subscribe to the philosophy that you can never be too paranoid. So….
The Third Attempt: I bet I can break this more!
Rather than make the seemingly easy fix of switching the test expectation from b"a" to "a" I decided that I should test whether some harder test data would break either pytest or my code. Now my code is intended to push bytes out to stdout even if those bytes are non-decodable in the user’s selected encoding. On modern UNIX systems this is usually controlled by the user’s locale. And most of the time, the locale setting specifies a UTF-8 compatible encoding. With that in mind, what happens when I pass a byte string that is not legal in UTF-8 to write_bytes() in my test function?
# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfd):
    write_bytes(b'\xff')
    out, err = capfd.readouterr()
    assert out == b'\xff'
Here I adapted the test function to attempt writing the byte 0xff (255) to stdout. In UTF-8, this is an illegal byte (i.e. by itself, that byte cannot be mapped to any unicode code point) which makes it good for testing this. (If you want to make a truly robust unittest, you should probably standardize on the locale settings (and hence, the encoding) to use when running the tests. However, that deserves a blog post of its own.) Anyone want to guess what happens when I run this test?
[pts/38@roan /var/tmp/py3_coverage]$ pytest              (08:19:52)
_________________________ test_write_byte __________________________

capfd =

    def test_write_byte(capfd):
        write_bytes(b'\xff')
        out, err = capfd.readouterr()
>       assert out == b'\xff'
E       AssertionError: assert '�' == b'\xff'

test_byte_writer.py:10: AssertionError
===================== 1 failed in 0.02 seconds =====================
On Python3, we see that the undecodable byte is replaced with the unicode replacement character. Pytest is likely running the equivalent of b"Byte string".decode(errors="replace") on stdout. This is good when capfd is used to display the Captured stdout call information to the console. Unfortunately, it is not what we need when we want to check that our exact byte string was emitted to stdout.
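Here is a small sketch of why a replace-style decode is fine for display but useless for exact comparison. It only illustrates the decode(errors="replace") behaviour that I am guessing at above; it is not taken from pytest’s source, and the variable names are made up for the example.

```python
undecodable = b"\xff"

# A strict UTF-8 decode refuses the byte outright.
try:
    undecodable.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# A replacing decode substitutes U+FFFD (the replacement character) instead.
replaced = undecodable.decode("utf-8", errors="replace")

# A different illegal byte decodes to the very same character,
# so the original bytes cannot be recovered from the text.
also_replaced = b"\xfe".decode("utf-8", errors="replace")
lossy = (replaced == also_replaced)
```

Both 0xff and 0xfe come out as the same replacement character, so once the captured output has been through a decode like this there is no way for an assert to tell which byte was really written.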
With this change, it also becomes apparent that the test isn’t doing the right thing on Python2 either:
[pts/38@roan /var/tmp/py3_coverage]$ pytest-2            (08:59:37)
_________________________ test_write_byte __________________________

capfd =

    def test_write_byte(capfd):
        write_bytes(b'\xff')
        out, err = capfd.readouterr()
>       assert out == b'\xff'
E       AssertionError: assert '�' == '\xff'
E         - �
E         + \xff

test_byte_writer.py:10: AssertionError
========================= warnings summary =========================
test_byte_writer.py::test_write_byte
  /var/tmp/py3_coverage/test_byte_writer.py:10: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
    assert out == b'\xff'

-- Docs: http://doc.pytest.org/en/latest/warnings.html
=============== 1 failed, 1 warnings in 0.02 seconds ===============
In the previous version, this test passed. Now we see that the test was passing because Python2 evaluates u"a" == b"a" as True. However, that’s not really what we want to test; we want to test that the byte string we passed to write_bytes() is the actual byte string that was emitted on stdout. The new data shows that instead, the test is converting the value that got to stdout into a text string and then trying to compare that. So a fix is needed on both Python2 and Python3.
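For contrast with the Python2 behaviour described above, this short sketch shows the Python3 side of the same comparison, which is why the capfd version of the test failed there even with plain ASCII data. It only runs meaningfully on Python3, and the variable names are just for illustration.

```python
text_value = "a"
byte_value = b"a"

# On Python3 a text string never compares equal to a byte string,
# even when the contents look identical.
equal_as_is = (text_value == byte_value)

# They only match once both sides are the same type.
equal_after_encoding = (text_value.encode("ascii") == byte_value)
```

On Python3, equal_as_is is False and equal_after_encoding is True. Python2’s implicit conversion hid the type mismatch for ASCII data, which is how the earlier test managed to pass there without really checking bytes.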
These problems are down in the guts of pytest. How are we going to fix them? Will we have to seek out a different strategy that lets us capture stdout, overriding pytest’s builtin?
The Fourth Attempt: Fortuitous Timing!
Well, as it turns out, the pytest maintainers merged a pull request four days ago which implements a capfdbinary fixture. capfdbinary is like the capfd fixture that I was using in the above example but returns data as byte strings instead of as text strings. Let’s install it and see what happens:
$ pip install --user git+git://github.com/pytest-dev/pytest.git@6161bcff6e3f07359c94a7be52ad32ecb8822142
$ mv ~/.local/bin/pytest ~/.local/bin/pytest-2
$ pip3 install --user git+git://github.com/pytest-dev/pytest.git@6161bcff6e3f07359c94a7be52ad32ecb8822142
And then update the test to use capfdbinary instead of capfd:
# test_byte_writer.py
import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfdbinary):
    write_bytes(b'\xff')
    out, err = capfdbinary.readouterr()
    assert out == b'\xff'
And with those changes, the tests now pass:
[pts/38@roan /var/tmp/py3_coverage]$ pytest              (11:42:06)
======================= test session starts ========================
platform linux -- Python 3.5.4, pytest-3.2.5.dev194+ng6161bcf, py-1.4.34, pluggy-0.5.2
rootdir: /var/tmp/py3_coverage, inifile:
plugins: xdist-1.15.0, mock-1.5.0, cov-2.4.0, asyncio-0.5.0
collected 1 item

test_byte_writer.py .

===================== 1 passed in 0.01 seconds =====================
Yay! Mission accomplished.