Better unittest coverage stats: excluding code that only executes on Python 2/3

What we’re testing

In my last post I used pytest to write a unittest for code which wrote to sys.stdout on Python2 and sys.stdout.buffer on Python3. Here’s that code again:

# byte_writer.py
import sys
import six

def write_bytes(some_bytes):
    # some_bytes must be a byte string
    if six.PY3:
        stdout = sys.stdout.buffer
    else:
        stdout = sys.stdout
    stdout.write(some_bytes)

and the unittest to test it:

import io
import sys

from byte_writer import write_bytes

def test_write_byte(capfdbinary):
    write_bytes(b'\xff')
    out, err = capfdbinary.readouterr()
    assert out == b'\xff'

This all works great but as we all know, there’s always room for improvement.


When working on code that’s intended to run on both Python2 and Python3, you often hear the advice that making sure that every line of your code is executed during automated testing is extremely helpful. This is, in part, because several classes of error that are easy to introduce in such code are easily picked up by tests which simply exercise the code. For instance:

  • Functions which were renamed or moved will throw an error when code attempts to use them.
  • Mixing of byte and text strings will throw an error as soon as the code that combines the strings runs on Python3.

These benefits are over and above the behaviours that your tests were specifically written to check for.
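
As a minimal sketch of the second bullet (build_header and raises_type_error are hypothetical names made up for this illustration, not part of the code under test), simply executing a line that mixes byte and text strings is enough to surface the bug on Python3:

```python
def build_header(name):
    # Typical porting slip: a byte-string literal joined to a text string.
    return b"Hello, " + name


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Python2 silently coerces the two string types; Python3 refuses to.
print(raises_type_error(build_header, u"world"))
```

No assertion about the return value is needed here; a test that merely reaches the line fails on Python3.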

Once you accept that you want to strive for 100% coverage, though, the next question is how to get there. The Python coverage library can help. It integrates with your unittest framework to track which lines of code and which branches have been executed over the course of running your unittests. If you’re using pytest to write your unittests like I am, you will want to install the pytest-cov package to integrate coverage with pytest.

If you recall my previous post, I had to use a recent checkout of pytest to get my unittests working so we have to make sure not to overwrite that version when we install pytest-cov. It’s easiest to create a test-requirements.txt file to make sure that we get the right versions together:

$ cat test-requirements.txt
$ pip install --user --upgrade -r test-requirements.txt
$ pip3 install --user --upgrade -r test-requirements.txt

And then we can run pytest to get coverage information:

$ pytest --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
.

---------- coverage: platform linux, python 3.5.4-final-0 ----------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       7      1      2      1    78%   10, 7->10
===================== 1 passed in 0.02 seconds =====================

Yay! The output shows that currently 78% of byte_writer.py is executed when the unittests are run. The Missing column shows two types of missing things. The 10 means that line 10 is not executed. The 7->10 means that the jump from line 7 to line 10 was never taken: the conditional on line 7 only ever went one way. Since the two missing pieces coincide, we probably only have to add a single test case that satisfies the second code branch path to reach 100% coverage. Right?
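
To make the two kinds of entries in the Missing column concrete, here is a small sketch that splits such a column into unexecuted lines and untaken branches. parse_missing is a hypothetical helper written for this post, not part of coverage, and it only assumes the entry shapes seen in term-missing reports: single lines ("10"), line ranges ("12-15") and branches ("7->10").

```python
def parse_missing(column):
    """Split a term-missing "Missing" column into lines and branches."""
    lines, branches = [], []
    for item in column.split(","):
        item = item.strip()
        if not item:
            continue
        if "->" in item:
            # A branch entry: source line, then the destination that was never reached.
            src, dest = item.split("->", 1)
            branches.append((int(src), dest))  # dest stays text; it is not always a plain line number
        elif "-" in item:
            # A range of consecutive unexecuted lines.
            start, end = item.split("-", 1)
            lines.extend(range(int(start), int(end) + 1))
        else:
            lines.append(int(item))
    return lines, branches


print(parse_missing("10, 7->10"))
```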

Conditions that depend on Python version

If we take a look at lines 7 through 10 in byte_writer.py we can see that this might not be as easy as we first assumed:

    if six.PY3:
        stdout = sys.stdout.buffer
    else:
        stdout = sys.stdout

Which code path executes here is dependent on the Python version the code runs under. Line 8 is executed when we run on Python3 and line 10 is skipped. When run on Python2 the opposite happens: line 10 is executed and line 8 is skipped. We could mock out the value in six.PY3 to hit both code paths but since sys.stdout only has a buffer attribute on Python3, we’d have to mock that out too. And that would lead us back into conflict with pytest capturing stdout as we figured out in my previous post.

Taking a step back, it also doesn’t make sense that we’d test both code paths on the same run. Each code path should be executed when the environment is correct for it to run and we’d be better served by modifying the environment to trigger the correct code path. That way we also test that the code is detecting the environment correctly. For instance, if the conditional was triggered by the user’s encoding, we’d probably run the tests under different locale settings to check that each encoding went down the correct code path. In the case of a Python version check, we modify the environment by running the test suite under a different version of Python. So what we really want is to make sure that the correct branch is run when we run the test suite on Python3 and the other branch is run when we execute it under Python2. Since we already have to run the test suite under both Python2 and Python3 this has the net effect of testing all of the code.

So how do we achieve that?

Excluding lines from coverage

Coverage has the ability to match lines in your code to a string and then exclude those lines from the coverage report. This allows you to tell coverage that a branch of code will never be executed and thus it shouldn’t contribute to the list of unexecuted code. By default, the only string to match is “pragma: no cover”. However, “pragma: no cover” will unconditionally exclude the lines it matches, which isn’t really what we want. We do want to test for coverage of the branch but only when we’re running on the correct Python version. Luckily, the matched lines are customizable and further, they can include environment variables. The combination of these two abilities means that we can add lines to exclude that are different when we run on Python2 and Python3. Here’s how to configure coverage to do what we need it to:

$ cat .coveragerc
[report]
exclude_lines =
    pragma: no cover
    pragma: no py${PYTEST_PYMAJVER} cover

This .coveragerc includes the standard matched line, pragma: no cover and our addition, pragma: no py${PYTEST_PYMAJVER} cover. The addition uses an environment variable, PYTEST_PYMAJVER so that we can vary the string that’s matched when we invoke pytest.

Next we need to change the code in byte_writer.py so that the special strings are present:

    if six.PY3:  # pragma: no py2 cover
        stdout = sys.stdout.buffer
    else:  # pragma: no py3 cover
        stdout = sys.stdout

As you can see, we’ve added the special comments to the two conditional lines. Let’s run this manually and see if it works now:

$ PYTEST_PYMAJVER=3 pytest --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
.
---------- coverage: platform linux, python 3.5.4-final-0 ----------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       6      0      0      0   100%
==================== 1 passed in 0.02 seconds =====================

$ PYTEST_PYMAJVER=2 pytest-2 --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
.
--------- coverage: platform linux2, python 2.7.13-final-0 ---------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       5      0      0      0   100%
===================== 1 passed in 0.02 seconds =====================

Well, that seemed to work! Let’s run it once more with the wrong PYTEST_PYMAJVER value to show that coverage is still recording information on the branches that are supposed to be used:

$ PYTEST_PYMAJVER=3 pytest-2 --cov=byte_writer  --cov-report=term-missing --cov-branch
======================= test session starts ========================
.
-------- coverage: platform linux2, python 2.7.13-final-0 --------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       6      1      0      0    83%   8
===================== 1 passed in 0.02 seconds =====================

Yep. When we specify the wrong PYTEST_PYMAJVER value, the coverage report shows that the missing line is included as an unexecuted line. So that seems to be working.
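
The behaviour seen in these three runs can be modelled roughly as follows. This is only a sketch of the matching step, under the assumption that each exclude entry is treated as a regular expression searched for in every source line after ${PYTEST_PYMAJVER} has been substituted; excluded_lines is a hypothetical helper, and coverage’s real implementation additionally drops the whole block that a matched line introduces.

```python
import re
from string import Template

# The two exclude_lines entries from the .coveragerc in this post.
EXCLUDE_TEMPLATES = [
    "pragma: no cover",
    "pragma: no py${PYTEST_PYMAJVER} cover",
]

# The annotated conditional from byte_writer.py.
SOURCE = [
    "if six.PY3:  # pragma: no py2 cover",
    "    stdout = sys.stdout.buffer",
    "else:  # pragma: no py3 cover",
    "    stdout = sys.stdout",
]


def excluded_lines(major_version):
    """Return the source lines matched once the variable is filled in."""
    env = {"PYTEST_PYMAJVER": str(major_version)}
    patterns = [re.compile(Template(t).substitute(env)) for t in EXCLUDE_TEMPLATES]
    return [line for line in SOURCE if any(p.search(line) for p in patterns)]


print(excluded_lines(3))
print(excluded_lines(2))
```

With the value 3 only the else line matches, so the Python2-only assignment is excluded; with the value 2 only the if line matches. That is why running Python2 with PYTEST_PYMAJVER=3 leaves line 8 in the report as unexecuted.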

Setting PYTEST_PYMAJVER automatically

Just one more thing… it’s kind of a pain to have to set the PYTEST_PYMAJVER variable with every test run, isn’t it? Wouldn’t it be better if pytest would automatically set that for you? After all, pytest knows which Python version it’s running under so it should be able to. I thought so too so I wrote pytest-env-info to do just that. When installed, pytest-env-info will set PYTEST_VER, PYTEST_PYVER, and PYTEST_PYMAJVER so that they are available to pytest-cov and other, similar plugins which can use environment variables to configure themselves. It’s available on PyPI so all you have to do to enable it is add it to your requirements so that pip will install it and then run pytest:

$ cat test-requirements.txt
$ pip3 install --user -r test-requirements.txt
$ pytest --cov=byte_writer  --cov-report=term-missing --cov-branch
======================== test session starts =========================
plugins: env-info-0.1.0, cov-2.5.1
.
---------- coverage: platform linux2, python 2.7.14-final-0 ----------
Name             Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------
byte_writer.py       5      0      0      0   100%
====================== 1 passed in 0.02 seconds ======================
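
For readers curious what setting the variable amounts to, here is a minimal sketch of the idea. It is not pytest-env-info’s actual code: export_python_major_version is a hypothetical helper, and keeping a value the user already set by hand (rather than overwriting it) is a choice made for this illustration.

```python
import os
import sys


def export_python_major_version(environ=os.environ):
    """Expose the running interpreter's major version as PYTEST_PYMAJVER."""
    # setdefault keeps a value that was already set manually on the command line.
    environ.setdefault("PYTEST_PYMAJVER", str(sys.version_info[0]))
    return environ["PYTEST_PYMAJVER"]


# Demonstrate on a plain dict so the real environment is left untouched.
demo_env = {}
print(export_python_major_version(demo_env))
```

Whatever sets the variable has to run before coverage reads .coveragerc, which is why doing this from a pytest plugin is convenient.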

9 thoughts on “Better unittest coverage stats: excluding code that only executes on Python 2/3”

  1. If you change ‘else’ after ‘if six.PY3’ to ‘if six.PY2’, and the exclude line to “if six.${PYTEST_PYMAJVER}:” and can deal with the upper/lower case issue, then you would not need to burden the code with coverage comments.

    • That’s a great idea! PYTEST_PYMAJVER is set to “3” or “2” so “if six.PY${PYTEST_PYMAJVER}” should work out of the box. That said, there are some pieces of code where using six.PY2/3 is not the most natural method of writing the real code so we probably still need the escape hatch of a comment. Some examples I think might fall under that:

      if isinstance(mystring, six.text_type):
          b_data = mystring.encode('utf-8')
      else:
          b_data = mystring

      if sys.version_info < (2, 7):
          cli_string = to_bytes(cli_string)
      args = shlex.split(cli_string)

      Admittedly, for the latter case there’s no harm in using a byte string on Python-2.7 so it can be rewritten to use six.PY2 and still be natural. The former is trickier, though. Under Python 2 many APIs accept either text or byte strings interchangeably. On Python 3, those APIs only accept text strings or only accept byte strings. If I’m coding to that same standard, then I’d want both pathways to be tested on Python 2 but may not want or be able to test both pathways on Python 3. (mystring could be set by calling an external API which, on Python 2, returns text or bytes depending on what I call it with. On Python 3 sending that API bytes is an error and it only returns text.)

  2. Excluding lines from coverage measurement always blurs out the information you get from it. Instead of relying on the data, you need to remember that you have the lines tested somehow.

    Wouldn’t it be better to run tests under both Python versions (since you’re supporting them), let’s say with tox, and combine the coverage info from both runs? You could either run coverage on the second run with “-a” option to append to the already created “.coverage” file or switch the parallel option to true and run “coverage combine” after both runs. But I would prefer the first option if you don’t actually need the “parallel” option.

    That way you won’t have to use any trick and will have full, reliable info about what’s getting covered. Also, the tests running under 2 can be just a subset of all the tests, that will explore these places that version differences show.

    • The process above does envision running the tests under both Python 2 and Python 3. (Initially, before writing this blog post, I was setting the environment variables inside of tox.ini. But I decided that I wanted something that also worked for people who ran pytest directly rather than tox -e.) If you don’t run the tests under both Python 2 and Python 3 then you can’t know if you’re testing every line of code under both Python versions.

      Combining the coverage statistics seems like a vastly inferior solution to me. As you say, combining the coverage statistics means “the tests running under 2 can be just a subset of all tests, that will explore these places that version differences show” which I see as a drawback, not a feature. Our goal is to know that a user who runs our software on Python-2.6 or Python-3.6 will have a bug-free experience. To do that, our sub-goal is to make sure our test suite executes every line of code that can be reached on every Python version we support. Combining coverage from the Python-2 and Python-3 runs obfuscates that by obscuring whether a line of code was only executed under Python-2 or only executed under Python-3 or (as we want) executed on both.

      Excluding the branches by marking them with a comment marker is saying “When you test this on Python-2, the following branch is dead code. It will never be executed”. Which is what we actually do want to say about this code. So that seems both appropriate for this case and doesn’t have the drawbacks which combining the coverage statistics from your two runs would have.

      One thing that would be nice would be for coverage to report if any lines marked as excluded are executed over the course of the test run. That way you know it’s not actually dead code and either your marking is wrong or your code is wrong.

      /me goes to file an issue for coverage to add this.

      • Well, I’m not saying that you need to run a subset of tests for one of the versions. You could run full test suites for all versions. If you’re excluding some tests for some versions I’m sure you can grep for that, the same as for “no cover”s. You could even produce a separate coverage report for all versions, and then a collective one.

        Telling someone who doesn’t want to run Tox and is oblivious to “no cover” tricks that running the suite only for Python 2 covers all code can be misleading. Though with your explanation I think I understand why you are doing this. Everything’s a tradeoff, right? 🙂

      • Yep, everything is a tradeoff. And like I say, that branch *should be* dead code on that Python version so it should be okay to do it this way. Of course, using this process for code which isn’t unreachable on that Python version is a very bad idea.

  3. I have to ask: what’s the point of all this extra configuration and boilerplate when you can just append to your coverage data files? Pytest-cov even has a convenient `--cov-append` option for such a use case.

    • As I replied to Michał Bultrowicz earlier, appending stats from the two runs on separate Python versions is contrary to the goals we have in collecting coverage stats. Our need is to know that the code runs without bugs on both Python2 and Python3. If we simply append stats from the code running on Python2 and running on Python3 we don’t know if a “100% covered” report means that a particular section of the code was exercised on both Python2 and Python3 or only one of them. That makes the combined stat a poor measurement of whether our unittests are testing the code that we need in the environment that we need.

      Take, for example, test code like this:

      def test_string(capfd):
          my_print('cafe')   # native str: bytes on Python2, text on Python3
          my_print(u'cafe')  # text on both
          out, err = capfd.readouterr()
          assert out == "cafe\ncafe"

      On Python2, the above code will test that my_print handles both byte strings and text strings. On Python3, the above code will only test that my_print handles text strings. If you combine your stats from running on both Python2 and Python3 the coverage report won’t inform you that your test case is incomplete on Python3 and needs to be changed.
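
      The difference between a combined report and per-version reports can be sketched with plain sets. The line numbers below are made up for illustration, and missing_lines is a hypothetical helper rather than anything coverage provides:

```python
def missing_lines(measured, executed):
    """Lines that were measured but never executed."""
    return sorted(set(measured) - set(executed))


# Hypothetical module with four measured lines; line 3 only ever runs on Python2.
MEASURED = [1, 2, 3, 4]
EXECUTED = {
    "py2": {1, 2, 3, 4},
    "py3": {1, 2, 4},
}

# Appending or combining the runs is a union of what each run executed.
combined = set().union(*EXECUTED.values())
combined_missing = missing_lines(MEASURED, combined)

# Separate reports keep the gap on Python3 visible.
per_version_missing = {
    version: missing_lines(MEASURED, lines) for version, lines in EXECUTED.items()
}

print(combined_missing)
print(per_version_missing)
```

      The union reports nothing missing even though line 3 was never exercised on Python3, which is exactly the information the per-version reports preserve.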
