Dear Lazyweb, how would you nicely bundle python code?

I’ve been looking into bundling the python six library into ansible because it’s getting painful to maintain compatibility with the old versions on some distros. However, the distribution developer in me wanted to make it easy for distro packagers to make sure the system copy was used rather than the bundled copy if needed and also make it easy for other ansible developers to make use of it. It seemed like the way to achieve that was to make an import in our namespace that would transparently decide which version of six was needed and use that. I figured out three ways of doing this but haven’t figured out which is better. So throwing the three ways out there in the hopes that some python gurus can help me understand the pros and cons of each (and perhaps improve on what I have so far).

Boilerplate
To be both transparent to our developers and use system packages if the system had a recent enough six, I created a six package in our namespace. Inside of this module I included the real six library as _six.py. Then I created an __init__.py with code to decide whether to use the system six or the bundled _six.py. So the directory layout is like this:

+ ansible/
  + __init__.py
  + compat/
    + __init__.py
    + six/
      + __init__.py
      + _six.py

__init__.py has two tasks. It has to determine whether we want the system six library or the bundled one. And then it has to make that choice what other code gets when it does import ansible.compat.six. here’s the basic boilerplate:

# Does the system have a six library installed?
try:
    import six as _system_six
except ImportError:
    _system_six = None

if _system_six:
    # Various checks that system six library is current enough
    if not hasattr(_system_six.moves, 'shlex_quote'):
        _system_six = None

if _system_six:
    # Here's where we have to load up the system six library
else:
    # Alternatively, we load up the bundled library

Loading using standard import
Now things start to get interesting. We know which version of the six library we want. We just have to make it available to people who are now going to use it. In the past, I’d used the standard import mechanism so that was the first thing I tried here:

if _system_six:
    from six import *
else:
    from ._six import *

As a general way of doing this, it has some caveats. It only pulls in the symbols that the module considers public. If a module has any functions or variables that are supposed to be public and marked with a leading underscore then they won’t be pulled in. Or if a module has an __all__ = [...] that doesn’t contain all of the public symbols then those won’t get pulled in. You can pull those additions in by specifying them explicitly if you have to.

For this case, we don’t have any issues with those as six doesn’t use __all__ and none of the public symbols are marked with a leading underscore. However, when I then started porting the ansible code to use ansible.compat.six I encountered an interesting problem:

# Simple things like this work
>>> from ansible.compat.six import moves
>>> moves.urllib.parse.urlsplit('https://toshio.fedorapeople.org/')
SplitResult(scheme='https', netloc='toshio.fedorapeople.org', path='/', query='', fragment='')

# this throws an error:
>>> from ansible.compat.six.moves import urllib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named moves

Hmm… I’m not quite sure what’s happening but I zero in on the word “module”. Maybe there’s something special about modules such that import * doesn’t give me access to import subpackages or submodules of that. Time to look for answers on the Internet…

The Sorta, Kinda, Hush-Hush, Semi-Official Way

Googling for a means to replace a module from itself eventually leads to a strategy that seems to have both some people who like it and some who don’t. It seems to be supported officially but people don’t want to encourage people to use it. It involves a module replacing its own entry in sys.modules. Going back to our example, it looks like this:

import sys
[...]
if _system_six:
    six = _system_six
else:
    from . import _six as six

sys.modules['ansible.compat.six'] = six

When I ran this with a simple test case of a python package with several nested modules, that seemed to clear up the problem. I was able to import submodules of the real module from my fake module just fine. So I was hopeful that everything would be fine when I implemented it for six.

Nope:

>>> from ansible.compat.six.moves import urllib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named moves

Hmm… same error. So I take a look inside of six.py to see if there’s any clue as to why my simple test case with multiple files and directories worked but six’s single file is giving us headaches. Inside I find that six is doing its own magic with a custom importer to make moves work. I spend a little while trying to figure out if there’s something specifically conflicting between my code and six’s code and then throw my hands up. There’s a lot of stuff that I’ve never used before here… it’ll take me a while to wrap my head around it and there’s no assurance that I’ll be able to make my code work with what six is doing even after I understand it. Is there anything else I could try to just tell my code to run everything that six would normally do when it is imported but do it in my ansible.compat.six namespace?

You tell me: Am I beating my code with the ugly stick?

As a matter of fact, python does provide us with a keyword in python2 and a function in python3 that might do exactly that. So here’s strategy number three:

import os.path
[...]
if _system_six:
    import six
else:
    from . import _six as six
six_py_file = '{0}.py'.format(os.path.splitext(six.__file__)[0])
exec (open(six_py_file, 'r'))

Yep, exec will take an open file handle of a python module and execute it in the current namespace. So this seems like it will do what we want. Let’s test it:

>>> from ansible.compat.six.moves import urllib
>>>
>>> from ansible.compat.six.moves.urllib.parse import urlsplit
>>> urlsplit('https://toshio.fedorapeople.org/')
SplitResult(scheme='https', netloc='toshio.fedorapeople.org', path='/', query='', fragment='')

So dear readers, you tell me — I now have some code that works but it relies on exec. And moreover, it relies on exec to overwrite the current namespace. Is this a good idea or a bad idea? Let’s contemplate a little further — is this an idea that should only be applied sparingly (Using sys.modules instead if the module isn’t messing around with a custom importer of its own) or is it a general purpose strategy that should be applied to other libraries that I might bundle as well? Are there caveats to doing things this way? For instance, is it bypassing the standard import caching and so might be slower? Is there a better way to do this that in my ignorance I jsut don’t know about?

Advertisements