Dear Lazyweb, how would you nicely bundle python code?

I’ve been looking into bundling the python six library into ansible because it’s getting painful to maintain compatibility with the old versions shipped by some distros. However, the distribution developer in me wanted to make it easy for distro packagers to make sure the system copy was used rather than the bundled copy if needed, and also to make it easy for other ansible developers to use. It seemed like the way to achieve that was to make an import in our namespace that would transparently decide which version of six was needed and use that. I figured out three ways of doing this but haven’t figured out which is best. So I’m throwing the three ways out there in the hope that some python gurus can help me understand the pros and cons of each (and perhaps improve on what I have so far).

Boilerplate
To be transparent to our developers and still use the system package when the system has a recent enough six, I created a six package in our namespace. Inside this package I included the real six library as _six.py. Then I created an __init__.py with code to decide whether to use the system six or the bundled _six.py. So the directory layout looks like this:

+ ansible/
  + __init__.py
  + compat/
    + __init__.py
    + six/
      + __init__.py
      + _six.py

__init__.py has two tasks. First, it has to determine whether we want the system six library or the bundled one. Then it has to make that choice visible to any other code that does import ansible.compat.six. Here’s the basic boilerplate:

# Does the system have a six library installed?
try:
    import six as _system_six
except ImportError:
    _system_six = None

if _system_six:
    # Various checks that system six library is current enough
    if not hasattr(_system_six.moves, 'shlex_quote'):
        _system_six = None

if _system_six:
    # Here's where we have to load up the system six library
    pass
else:
    # Alternatively, we load up the bundled library
    pass

Loading using standard import
Now things start to get interesting. We know which version of the six library we want. We just have to make it available to people who are now going to use it. In the past, I’d used the standard import mechanism so that was the first thing I tried here:

if _system_six:
    from six import *
else:
    from ._six import *

As a general approach, this has some caveats. It only pulls in the symbols that the module considers public. If a module has functions or variables that are meant to be public but are named with a leading underscore, they won’t be pulled in. Likewise, if a module defines an __all__ = [...] that doesn’t list all of the public symbols, the missing ones won’t get pulled in. If you have to, you can pull those in by importing them explicitly.
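These star-import caveats are easy to check in isolation. This sketch builds two throwaway in-memory modules (the star_demo_* names are made up for the example) and registers them in sys.modules so that ordinary import statements can find them, then runs the imports in a fresh namespace to see exactly which names get bound.

```python
import sys
import types

# A module without __all__: star import skips underscore-prefixed names.
plain = types.ModuleType('star_demo_plain')
plain.public_func = lambda: 'public'
plain._meant_to_be_public = lambda: 'underscored'

# A module with an incomplete __all__: star import copies only what is listed.
restricted = types.ModuleType('star_demo_restricted')
restricted.listed = 1
restricted.unlisted = 2
restricted.__all__ = ['listed']

# Register both so that import statements can find them.
sys.modules['star_demo_plain'] = plain
sys.modules['star_demo_restricted'] = restricted

# Run the star imports in a fresh namespace to see exactly what they bind.
namespace = {}
exec('from star_demo_plain import *', namespace)
exec('from star_demo_restricted import *', namespace)
star_names = sorted(k for k in namespace if not k.startswith('__'))
print(star_names)  # ['listed', 'public_func']

# Names the star import skipped can still be imported explicitly.
exec('from star_demo_plain import _meant_to_be_public', namespace)
exec('from star_demo_restricted import unlisted', namespace)
print(namespace['_meant_to_be_public'](), namespace['unlisted'])  # underscored 2
```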

For this case, we don’t have any issues with those as six doesn’t use __all__ and none of the public symbols are marked with a leading underscore. However, when I then started porting the ansible code to use ansible.compat.six I encountered an interesting problem:

# Simple things like this work
>>> from ansible.compat.six import moves
>>> moves.urllib.parse.urlsplit('https://toshio.fedorapeople.org/')
SplitResult(scheme='https', netloc='toshio.fedorapeople.org', path='/', query='', fragment='')

# this throws an error:
>>> from ansible.compat.six.moves import urllib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named moves

Hmm… I’m not quite sure what’s happening, but I zero in on the word “module”. Maybe there’s something special about modules such that import * doesn’t let me import their subpackages or submodules through the new name. Time to look for answers on the Internet…
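One way to see what is going on: as far as I understand python’s import system, a statement like from package.sub import name never looks for an attribute called sub on package. It looks for a module registered under the dotted name package.sub in sys.modules and, failing that, for something a finder can locate on the package’s __path__. A star import only copies attributes. The sketch below reproduces the error with made-up in-memory modules; attr_demo_real and attr_demo_shim are placeholders for six and ansible.compat.six.

```python
import importlib
import sys
import types

# Stand-in for the real library: a package whose 'moves' submodule is
# registered in sys.modules under its full dotted name.
real = types.ModuleType('attr_demo_real')
real.__path__ = []                      # mark it as a package
moves = types.ModuleType('attr_demo_real.moves')
moves.value = 42
real.moves = moves
sys.modules['attr_demo_real'] = real
sys.modules['attr_demo_real.moves'] = moves

# Stand-in for ansible.compat.six after "from real import *": a package that
# received the public attributes, including the 'moves' module object, but
# nothing is registered as 'attr_demo_shim.moves'.
shim = types.ModuleType('attr_demo_shim')
shim.__path__ = []
for attr_name, attr_value in vars(real).items():
    if not attr_name.startswith('_'):
        setattr(shim, attr_name, attr_value)
sys.modules['attr_demo_shim'] = shim

print(shim.moves.value)  # 42: attribute access works

# ...but importing through the shim's dotted name does not.
try:
    importlib.import_module('attr_demo_shim.moves')
    shim_import_ok = True
except ImportError as exc:
    shim_import_ok = False
    print('import failed:', exc)
```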

The Sorta, Kinda, Hush-Hush, Semi-Official Way

Googling for a way to have a module replace itself eventually leads to a strategy that some people like and some don’t. It seems to be officially supported, but people don’t want to encourage its use. It involves a module replacing its own entry in sys.modules. Going back to our example, it looks like this:

import sys
[...]
if _system_six:
    six = _system_six
else:
    from . import _six as six

sys.modules['ansible.compat.six'] = six

When I ran this with a simple test case of a python package with several nested modules, that seemed to clear up the problem. I was able to import submodules of the real module from my fake module just fine. So I was hopeful that everything would be fine when I implemented it for six.
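Here is a self-contained version of that kind of simple test case, written against throwaway packages in a temporary directory (smod_demo_real and smod_demo_shim are made-up names). It relies on a detail of the import machinery: after a module’s code runs, the importer hands back whatever object is in sys.modules under that name. The sketch also shows one caveat, as far as I can tell from how submodule lookup works: smod_demo_shim.sub is found on the real package’s __path__ and executed a second time under the shim’s name, so it is a separate module object from smod_demo_real.sub.

```python
import importlib
import os
import sys
import tempfile

# Throwaway layout (made-up names):
#   smod_demo_real/__init__.py   stands in for the real library
#   smod_demo_real/sub.py        one of its submodules
#   smod_demo_shim/__init__.py   the shim that replaces itself in sys.modules
root = tempfile.mkdtemp()
files = {
    os.path.join('smod_demo_real', '__init__.py'): 'TOP = "real top level"\n',
    os.path.join('smod_demo_real', 'sub.py'): 'VALUE = "from sub"\n',
    os.path.join('smod_demo_shim', '__init__.py'): (
        'import sys\n'
        'import smod_demo_real\n'
        'sys.modules[__name__] = smod_demo_real\n'
    ),
}
for rel_path, source in files.items():
    full_path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, 'w') as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

import smod_demo_shim                  # actually binds smod_demo_real
from smod_demo_shim.sub import VALUE   # submodule import now works

print(smod_demo_shim.TOP)  # real top level
print(VALUE)               # from sub

# The caveat: the shim's name loads its own copy of the submodule.
import smod_demo_real.sub
print(sys.modules['smod_demo_shim.sub'] is sys.modules['smod_demo_real.sub'])
# False
```

The temporary directory is not cleaned up; that is fine for a one-off experiment but not for real code.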

Nope:

>>> from ansible.compat.six.moves import urllib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named moves

Hmm… same error. So I take a look inside of six.py to see if there’s any clue as to why my simple test case with multiple files and directories worked but six’s single file is giving us headaches. Inside I find that six is doing its own magic with a custom importer to make moves work. I spend a little while trying to figure out if there’s something specifically conflicting between my code and six’s code and then throw my hands up. There’s a lot of stuff that I’ve never used before here… it’ll take me a while to wrap my head around it and there’s no assurance that I’ll be able to make my code work with what six is doing even after I understand it. Is there anything else I could try to just tell my code to run everything that six would normally do when it is imported but do it in my ansible.compat.six namespace?
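To get a feel for why a custom importer can defeat the sys.modules trick, here is a deliberately simplified python3 sketch of the same idea, using importlib’s MetaPathFinder and Loader. It is not six’s code: six’s real importer is more involved and targets the older find_module/load_module protocol, and VirtualSubmoduleImporter plus the vsi_demo_* names are hypothetical. The point it illustrates is that a virtual submodule served by such an importer is keyed on one fully qualified name. As far as I can tell from reading six.py, six builds those names from its own __name__, so a bundled copy imported as ansible.compat.six._six would register ansible.compat.six._six.moves, and aliasing the module in sys.modules would not make ansible.compat.six.moves importable.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class VirtualSubmoduleImporter(importlib.abc.MetaPathFinder,
                               importlib.abc.Loader):
    """Serve an in-memory 'moves' module under one fixed dotted name.

    Hypothetical, simplified stand-in for the kind of importer six uses.
    """

    def __init__(self, owner_name):
        # The key is derived from the owner's name, the way six derives
        # its moves names from __name__.
        self.known = {owner_name + '.moves': {'answer': 42}}

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.known:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # leave every other import to the normal finders

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.__dict__.update(self.known[module.__name__])


# A package stand-in, plus an alias for it made with the sys.modules trick.
owner = types.ModuleType('vsi_demo_owner')
owner.__path__ = []
sys.modules['vsi_demo_owner'] = owner
sys.modules['vsi_demo_alias'] = owner

sys.meta_path.insert(0, VirtualSubmoduleImporter('vsi_demo_owner'))

print(importlib.import_module('vsi_demo_owner.moves').answer)  # 42

try:
    importlib.import_module('vsi_demo_alias.moves')
    alias_import_ok = True
except ImportError:
    alias_import_ok = False
print(alias_import_ok)  # False
```

After the first import the alias even has a moves attribute (the import machinery sets it on the shared package object), yet importing vsi_demo_alias.moves still fails, which matches the symptom above.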

You tell me: Am I beating my code with the ugly stick?

As a matter of fact, python does provide us with a keyword in python2 and a function in python3 that might do exactly that. So here’s strategy number three:

import os.path
[...]
if _system_six:
    import six
else:
    from . import _six as six
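six_py_file = '{0}.py'.format(os.path.splitext(six.__file__)[0])
# exec a code object compiled from the source text; this form works on both
# python2 and python3 (python3's exec() does not accept a file object)
with open(six_py_file, 'r') as f:
    exec(compile(f.read(), six_py_file, 'exec'))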

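Yep, exec will run a python module’s source code in the current namespace. (The python2 exec statement even accepts an open file object directly; python3’s exec() function needs the source text or a compiled code object.) So this seems like it will do what we want. Let’s test it: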

>>> from ansible.compat.six.moves import urllib
>>>
>>> from ansible.compat.six.moves.urllib.parse import urlsplit
>>> urlsplit('https://toshio.fedorapeople.org/')
SplitResult(scheme='https', netloc='toshio.fedorapeople.org', path='/', query='', fragment='')
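My guess at why exec helps is __name__: code executed with exec in the shim’s namespace sees the shim’s __name__, so anything the library derives from __name__ (six’s moves names, by my reading of six.py) lands under ansible.compat.six. A minimal sketch, using a made-up single-file library (exec_demo_lib.py) and a made-up shim module name:

```python
import os
import tempfile
import types

# A tiny stand-in for a vendored single-file library. It records the
# __name__ it runs under and derives a submodule name from it.
lib_dir = tempfile.mkdtemp()
lib_path = os.path.join(lib_dir, 'exec_demo_lib.py')
with open(lib_path, 'w') as handle:
    handle.write('SEEN_NAME = __name__\n'
                 'MOVES_NAME = __name__ + ".moves"\n')

# Stand-in for the shim package's namespace (ansible.compat.six in the post).
shim = types.ModuleType('exec_demo_shim')

# Compile from source and execute inside the shim's namespace.
with open(lib_path) as handle:
    code = compile(handle.read(), lib_path, 'exec')
exec(code, shim.__dict__)

print(shim.SEEN_NAME)   # exec_demo_shim
print(shim.MOVES_NAME)  # exec_demo_shim.moves
```

Inside the real __init__.py, calling exec without an explicit namespace runs the code in that module’s own globals; passing shim.__dict__ here just makes the target explicit. Because compile() works from the source text, no .pyc file is read or written for the vendored file in this sketch.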

So, dear readers, you tell me: I now have some code that works, but it relies on exec. Moreover, it relies on exec to overwrite the current namespace. Is this a good idea or a bad idea? Let’s contemplate a little further: is this an idea that should only be applied sparingly (using sys.modules instead if the module isn’t messing around with a custom importer of its own), or is it a general-purpose strategy that should be applied to other libraries that I might bundle as well? Are there caveats to doing things this way? For instance, does it bypass the standard import caching and so might be slower? Is there a better way to do this that in my ignorance I just don’t know about?


7 thoughts on “Dear Lazyweb, how would you nicely bundle python code?”

  1. FWIW, as a distro packager, if it all gets too much, I don’t think it’s really *that* terrible to just dump six into your tree such that `import six` will use it. Because Python is sane, it’s still easy for distributions to unbundle: we simply wipe the directory from the ansible package, which is like one line in %prep, it’s no big deal.

    This does mean that people deploying ansible outside their distro package system on systems with sufficiently new system-wide copies will be using the bundled copy not the system-wide one, but is that really a big deal? Big enough that it wouldn’t be enough to just say ‘if you want to use the system copy, wipe or rename the bundled one’?

    • The problem is that it’s not so easy to make “import six” import the bundled six if the system six is not present or has insufficient features. So in our code we pretty much have to import from a location from within our namespace (for instance, import ansible.compat.six). If we do that, though, then the system packager has to go through and patch our code to remove the import from the bundled location and put an import from the system location in its place.

      One way to deal with that problem is in every python file that’s referencing the bundled version, have a try except like this:

      try:
          from six.moves import shlex_quote
      except ImportError:
          from ansible.compat.six.moves import shlex_quote
      

      That way we first try one location and then fall back to the other if it’s not found. (For some upstreams, preferring the system location is fine. Others prefer the bundled location but are still willing to keep the fallback in there so that system packagers can make the code do the right thing by doing an rm -rf on the bundled copy.)

      For something like six, there can be a great many files where it needs to be imported: https://github.com/ansible/ansible/pull/12769 so writing out the try except for every one of them gets tedious. Making the code inside of ansible.compat.six do the work means less work in the long run for everyone… provided that there’s a good way to make that work in the first place.

  2. Seems like a lot of work for what can be accomplished with `install_requires = ['six>=x.y']` in setup.py. Distro packages are always behind, so we never use them for Python apps or frameworks.

    Our ansible environment and playbooks are in a repo and treated just like any other app we create based on a framework: cloned into a virtualenv, pip install -r requirements.txt, changes committed, etc. All sysadmins fork, PR, etc. This creates a history and a tool set that runs very well.

    Installing ansible from a dist pkg would be like trying to create a django app and trying to use the dist supplied package. Making ansible less pip friendly is not the way to go. IMHO, Adam has it backwards.

    • There are many different shops out there with many different policies and preferences. Some places will only install distro packages, others will only install packages if they’re packaged to integrate with their distro packaging system, and others are willing to either install directly from pip or run from a git checkout. When I code something, I try to meet the needs of as many of these audiences as I can, as long as it doesn’t drag the code into areas that are insanely hard to support.

      For this particular case there isn’t a lot of code needed to make the distro expectations work. It’s just a matter of choosing which of these minimal code solutions actually has the fewest drawbacks. None of them make it so that pip no longer functions; it’s just that the ansible package now contains the version of six that it requires instead of making pip download it separately.
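The per-file try/except fallback described in the first comment thread can be factored into a helper so that each file needs only one line. This is a minimal sketch under my own assumptions: import_first is a hypothetical name, and the usage below uses stdlib modules as stand-ins. In ansible it might be called as import_first('six.moves', 'ansible.compat.six.moves') to get the moves module, though I have not checked that against six’s custom importer.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that imports successfully.

    Raises ImportError naming every module that was tried if none work.
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append('%s: %s' % (name, exc))
    raise ImportError('none of these could be imported: ' + '; '.join(errors))


# 'surely_missing_xyz_mod' plays the absent preferred copy.
fallback = import_first('surely_missing_xyz_mod', 'json')
print(fallback.__name__)  # json
```

The trade-off is that callers get a module object rather than the from ... import name syntax, so each file would still look up the names it needs on the returned module.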
