Dear Lazyweb, how would you nicely bundle python code?

I’ve been looking into bundling the python six library into ansible because it’s getting painful to maintain compatibility with the old versions on some distros. However, the distribution developer in me wanted to make it easy for distro packagers to make sure the system copy was used rather than the bundled copy if needed and also make it easy for other ansible developers to make use of it. It seemed like the way to achieve that was to make an import in our namespace that would transparently decide which version of six was needed and use that. I figured out three ways of doing this but haven’t figured out which is better. So throwing the three ways out there in the hopes that some python gurus can help me understand the pros and cons of each (and perhaps improve on what I have so far).

Boilerplate
To be both transparent to our developers and use the system package if the system had a recent enough six, I created a six package in our namespace. Inside of this package I included the real six library as _six.py. Then I created an __init__.py with code to decide whether to use the system six or the bundled _six.py. So the directory layout is like this:

+ ansible/
  + __init__.py
  + compat/
    + __init__.py
    + six/
      + __init__.py
      + _six.py

__init__.py has two tasks. It has to determine whether we want the system six library or the bundled one. And then it has to make whichever one we chose be what other code gets when it does import ansible.compat.six. Here’s the basic boilerplate:

# Does the system have a six library installed?
try:
    import six as _system_six
except ImportError:
    _system_six = None

if _system_six:
    # Various checks that system six library is current enough
    if not hasattr(_system_six.moves, 'shlex_quote'):
        _system_six = None

if _system_six:
    # Here's where we have to load up the system six library
    pass
else:
    # Alternatively, we load up the bundled library
    pass

Loading using standard import
Now things start to get interesting. We know which version of the six library we want. We just have to make it available to people who are now going to use it. In the past, I’d used the standard import mechanism so that was the first thing I tried here:

if _system_six:
    from six import *
else:
    from ._six import *

As a general way of doing this, it has some caveats. It only pulls in the symbols that the module considers public. If a module has functions or variables that are meant to be used but are named with a leading underscore, they won’t be pulled in. And if a module defines an __all__ = [...] that doesn’t list every public symbol, then anything missing from that list won’t get pulled in either. You can pull those extras in by naming them explicitly if you have to.
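
Here is a tiny, self-contained illustration of those caveats. The mylib module is made up and constructed in memory purely for this sketch:

import sys
import types

# Fabricate a module whose __all__ leaves out some of its public names
mylib = types.ModuleType('mylib')
mylib.__all__ = ['public_func']
mylib.public_func = lambda: 'public'
mylib.extra_func = lambda: 'extra'        # public by intent, but not listed in __all__
mylib._private_func = lambda: 'private'   # leading underscore
sys.modules['mylib'] = mylib

from mylib import *                            # only pulls in what __all__ lists
print('public_func' in globals())              # True
print('extra_func' in globals())               # False
from mylib import extra_func, _private_func    # the rest must be named explicitly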

For this case, we don’t have any issues with those as six doesn’t use __all__ and none of the public symbols are marked with a leading underscore. However, when I then started porting the ansible code to use ansible.compat.six I encountered an interesting problem:

# Simple things like this work
>>> from ansible.compat.six import moves
>>> moves.urllib.parse.urlsplit('https://toshio.fedorapeople.org/')
SplitResult(scheme='https', netloc='toshio.fedorapeople.org', path='/', query='', fragment='')

# this throws an error:
>>> from ansible.compat.six.moves import urllib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named moves

Hmm… I’m not quite sure what’s happening, but I zero in on the word “module”. Maybe there’s something special about modules such that import * doesn’t let me import subpackages or submodules of the thing I star-imported from. Time to look for answers on the Internet…

The Sorta, Kinda, Hush-Hush, Semi-Official Way

Googling for a way for a module to replace itself eventually leads to a strategy that some people like and some people don’t. It seems to be officially supported, but not something people want to encourage. It involves a module replacing its own entry in sys.modules. Going back to our example, it looks like this:

import sys
[...]
if _system_six:
    six = _system_six
else:
    from . import _six as six

# Replace this module's own entry in sys.modules so that future imports of
# ansible.compat.six get whichever six module object we chose above
sys.modules['ansible.compat.six'] = six

When I ran this with a simple test case of a python package with several nested modules, that seemed to clear up the problem. I was able to import submodules of the real module from my fake module just fine. So I was hopeful that everything would be fine when I implemented it for six.
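
For reference, here’s roughly the shape of that test case (every name in it is invented for the illustration): a real package with nested submodules on disk, plus a wrapper package whose __init__.py swaps the real package into its own sys.modules entry.

# Hypothetical test case layout:
#
#   realpkg/__init__.py
#   realpkg/sub/__init__.py
#   realpkg/sub/leaf.py
#   wrapper/__init__.py      <- contains the lines below
#
# wrapper/__init__.py
import sys

import realpkg

sys.modules['wrapper'] = realpkg

# Because realpkg is a real package on disk, the import machinery can walk its
# __path__ to find realpkg/sub/ and realpkg/sub/leaf.py, so this works:
#
#   >>> from wrapper.sub import leaf

six, of course, is a single file rather than a package laid out like this, which turns out to matter.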

Nope:

>>> from ansible.compat.six.moves import urllib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named moves

Hmm… same error. So I take a look inside of six.py to see if there’s any clue as to why my simple test case with multiple files and directories worked while six’s single file is giving us headaches. Inside I find that six is doing its own magic with a custom importer to make moves work. I spend a little while trying to figure out whether there’s something specifically conflicting between my code and six’s code and then throw my hands up. There’s a lot of stuff here that I’ve never used before… it’ll take me a while to wrap my head around it, and there’s no assurance that I’ll be able to make my code work with what six is doing even after I understand it. Is there anything else I could try that would simply run everything six normally does when it’s imported, but do it in my ansible.compat.six namespace?
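
For the curious, here is the general shape of that kind of importer magic. This is a simplified, modern-Python sketch of the technique rather than six’s actual code, and every name in it (mypkg, mypkg.moves, answer) is invented for the illustration: an object on sys.meta_path answers for a dotted module name even though no corresponding file exists on disk.

import importlib.abc
import importlib.util
import sys
import types


class InMemoryModuleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve a synthetic, in-memory module object for one dotted module name."""

    def __init__(self, name, module):
        self._name = name
        self._module = module

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self._name:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return self._module          # hand back the pre-built module object

    def exec_module(self, module):
        pass                         # nothing to execute; it is already populated


# Build a fake "mypkg.moves" module purely in memory...
moves = types.ModuleType('mypkg.moves')
moves.answer = 42

# ...give it a parent package so submodule imports are even attempted...
pkg = types.ModuleType('mypkg')
pkg.__path__ = []
sys.modules['mypkg'] = pkg

# ...and register the finder so a normal import statement can locate it.
sys.meta_path.append(InMemoryModuleFinder('mypkg.moves', moves))

from mypkg.moves import answer       # works even though mypkg/moves.py does not exist
print(answer)                        # 42

As far as I can tell, the heart of my conflict is that six’s importer only answers for the module names it already knows about (six.moves and friends), so a copy living under ansible.compat.six never gets that treatment.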

You tell me: Am I beating my code with the ugly stick?

As a matter of fact, python does provide us with a keyword in python2 and a function in python3 that might do exactly that. So here’s strategy number three:

import os.path
[...]
if _system_six:
    import six
else:
    from . import _six as six
# six.__file__ may point at the compiled .pyc, so swap the extension back to .py
six_py_file = '{0}.py'.format(os.path.splitext(six.__file__)[0])
exec(open(six_py_file, 'r').read())

Yep, exec will take the contents of a python module’s file and execute it in the current namespace. So this seems like it will do what we want. Let’s test it:

>>> from ansible.compat.six.moves import urllib
>>>
>>> from ansible.compat.six.moves.urllib.parse import urlsplit
>>> urlsplit('https://toshio.fedorapeople.org/')
SplitResult(scheme='https', netloc='toshio.fedorapeople.org', path='/', query='', fragment='')

So dear readers, you tell me: I now have some code that works but it relies on exec. And moreover, it relies on exec to overwrite the current namespace. Is this a good idea or a bad idea? Let’s contemplate a little further: is this an idea that should only be applied sparingly (using sys.modules instead when the module isn’t messing around with a custom importer of its own) or is it a general purpose strategy that I should apply to other libraries I might bundle as well? Are there caveats to doing things this way? For instance, is it bypassing the standard import caching and so might be slower? Is there a better way to do this that in my ignorance I just don’t know about?

Flock Ansible Talk

Hey Fedorans, I’m trying to come up with an Ansible talk proposal for Flock in Rochester. What would you like to hear about?

Some ideas:

* An intro to using ansible
* An intro to ansible-playbook
* Managing docker containers via ansible
* Using Ansible to do X (help me choose a value for X 😉)
* How to write your own ansible module
* How does ansible transform a playbook task into a python script on the remote system

Let me know what you’re interested in 🙂

Hacky, hacky playbook to upgrade several Fedora machines

Normally I like to polish my code a bit before publishing but seeing as Fedora 21 is relatively new and a vacation is coming up which people might use as an opportunity to upgrade their home networks, I thought I’d throw this extremely *unpolished and kludgey* ansible playbook out there for others to experiment with:

https://bitbucket.org/toshio/local-playbooks/src/3d3ae76a56034784ab60fcfa1129221c59a40f3b/provisioning/fedora-upgrade.yml?at=master

When I recently looked at updating all of my systems to Fedora 21 I decided to try to be a little lighter on network bandwidth than I usually am (I’m on slow DSL and I have three kids all trying to stream video at the same time as I’m downloading packages). So I decided that I’d use a squid caching proxy to cache the packages that I was going to be installing since many of the packages would end up on all of my machines. I found a page on caching packages for use with mock and saw that there were a bunch of steps that I probably wouldn’t remember the next time I wanted to do this. So I opened up vim, translated the steps into an ansible playbook, and tried to run it.

The first several times, it failed because there were unresolved dependencies in my package set (packages I’d built locally with outdated dependencies, packages that were no longer available in Fedora 21, etc.). Eventually I set the fedup steps to ignore errors so that the playbook would clean up all the configuration; that way I could fix the package dependency problems and then re-run the playbook immediately afterwards. I’ve now got it to the point where it will successfully run fedup in my environment and will cache many packages (something is still sometimes using mirrors instead of the baseurl I specified, but I haven’t tracked that down yet; those packages are getting cached more than once).

Anyhow, feel free to take a look, modify it to suit your needs, and let me know of any “bugs” that you find 🙂

Things I’ll probably do to it for when I update to F22:

Tales of an Ansible Newb: Ch.1 – Modules

In my last post I used the ansible command line to run an arbitrary command on several remote systems. Just being able to do that makes Ansible a useful tool in a sysadmin’s toolchest. However, that’s just scratching the surface of what Ansible can do. Let’s go one step further today and explore some Ansible modules.

The command from the last post looked like this:

ansible '*' -i 'host1,host2' -a 'yum -y install tmux' -K --sudo

That let me run an ad hoc command on a set of hosts. You can run any command you want by specifying a different command line to -a:

$ ansible '*' -i 'host1,' -a 'hostname'
host1 | success | rc=0 >>
host1.lan

$ ansible '*' -i 'fedora20-test,' -a 'grep fedora /etc/os-release'
fedora20-test | success | rc=0 >>
ID=fedora
CPE_NAME="cpe:/o:fedoraproject:fedora:20"
HOME_URL="https://fedoraproject.org/"

Definitely useful. But what if you need to combine several commands together? For instance, let’s say we want to run one command if we’re talking to Fedora and a different one if we’re talking to Ubuntu. Well, we know how to use if-then-else in a shell script, so maybe that will work here? Let’s give it a try!

$ ansible '*' -i 'fedora20-test,ubuntu14' -a \
> 'if test -f /etc/fedora-release ; then echo "Fedora!" ; else echo "Ubuntu!" ; fi'
ubuntu14 | FAILED | rc=2 >>
[Errno 2] No such file or directory

fedora20-test | FAILED | rc=2 >>
[Errno 2] No such file or directory

Nope, that didn’t work. For a moment we might be puzzled about why we get the error “No such file or directory”, but then we realize that if is a shell builtin. If ansible isn’t creating a shell on the remote system, then if won’t be found. So how can we do this?

Until now, all of the commands I’ve given Ansible have been a single program and its command line parameters, specified as a string to ansible’s -a parameter. But just what is the -a? Ansible’s man page has this information for us:

-a 'ARGUMENTS', --args='ARGUMENTS'
    The ARGUMENTS to pass to the module.

So what’s this module it’s talking about? Modules are small pieces of code that Ansible is able to run on a remote machine. By default, if no module is specified, Ansible runs the command module. The command module takes a single argument, a command line string, which it then runs on the remote machine. The important detail is that it invokes the command directly rather than passing it through a shell, so we don’t have to worry about a remote shell interpreting any special characters. The flip side is that shell constructs won’t work. If we do need them (shell variables, redirection, builtins, etc.), we can use the shell module instead. Woo hoo! Let’s try that:

$ ansible '*' -i 'fedora20-test,ubuntu14' -m shell -a \
> 'if test -f /etc/fedora-release ; then echo "Fedora!" ; else echo "Ubuntu!" ; fi'
ubuntu14 | success | rc=0 >>
Ubuntu!

fedora20-test | success | rc=0 >>
Fedora!

Excellent. So now we know how to use the full range of shell features if we need to. But I wonder what other modules Ansible ships with.

To find the answer to that, you can run:

$ ansible-doc -l

This prints out a list of 240 or so modules, which range from niche applications like managing specific brands of network devices to installing OS packages on various Linux distributions and copying files between the local system and remote hosts. You can find information about any one of these by running ansible-doc MODULE-NAME.

So let’s try out one of these modules. Let’s take our example of installing a package on several remote computers that we used last time and make it use the yum module instead of invoking yum via the command module. ansible-doc yum tells us that the yum module takes a few arguments, of which name is mandatory and specifies a package name. state lets us tell the module what state we want the package to be in when ansible exits. Let’s give it a try:

$ ansible '*' -i 'fedora20-test,' -m yum -a 'name=tmux,zsh state=latest' --sudo -K
sudo password: 
fedora20-test | success >> {
    "changed": true, 
    "msg": "", 
    "rc": 0, 
    "results": [
        "All packages providing tmux are up to date",
        "Resolving Dependencies\nRunning transaction check\n
Package zsh.x86_64 0:5.0.7-1.fc20 will be updated\n
Package zsh.x86_64 0:5.0.7-4.fc20 will be an update\n
Finished Dependency Resolution\n\nDependencies Resolved\n\n
================================================================================\n
Package       Arch             Version                 Repository         Size\n
================================================================================\n
Updating:\n zsh           x86_64           5.0.7-4.fc20            updates           2.5 M\n\n
      Transaction Summary\n     
================================================================================\n
Upgrade  1 Package\n\nTotal download size: 2.5 M\nDownloading packages:\n
Not downloading deltainfo for updates, MD is 2.9 M and rpms are 2.5 M\n
Running transaction check\nRunning transaction test\nTransaction test succeeded\n
Running transaction (shutdown inhibited)\n
Updating   : zsh-5.0.7-4.fc20.x86_64                                      1/2 \n
Cleanup    : zsh-5.0.7-1.fc20.x86_64                                      2/2 \n
Verifying  : zsh-5.0.7-4.fc20.x86_64                                      1/2 \n  
Verifying  : zsh-5.0.7-1.fc20.x86_64                                      2/2 \n\n
Updated:\n  zsh.x86_64 0:5.0.7-4.fc20                                                     \n\n
Complete!\n"
    ]
}

Yep, as we expect, telling ansible to use the yum module and specifying that the arguments to the module are name=tmux,zsh state=latest makes sure that the tmux and zsh packages are installed and at their latest available versions. But has using the yum module gained us anything? It’s about as long to type -m yum -a 'name=tmux,zsh state=latest' as it is to type -a 'yum install -y tmux zsh' and we already know the syntax for the latter. Is there a difference? For the yum module there isn’t much difference. The module and the single command do about the same thing. But there are other modules that do more. For instance, the git module.

Let’s say that you have a web application in a git repository and you want to deploy it directly from git. You need to clone the repository remotely and check out a specific revision to the working tree because that’s what should be running in production, not the current HEAD of the master branch. You also need to check out a few git submodules for projects that are hosted in different repositories. If you did this via the command and shell modules, you’d have to use several different calls to git:

$ ansible '*' -i 'host1,' -a \
  'git clone http://git.example.com/project /var/www/webapp'
$ ansible '*' -i 'host1,' -m shell -a \
  'cd /var/www/webapp && git checkout fad30ad8'
$ ansible '*' -i 'host1,' -m shell -a \
  'cd /var/www/webapp && git submodule update --init'

The git module encapsulates git’s features in such a way that you can do all of this in one ansible call:

$ ansible '*' -i 'host1,' -m git -a \
  'repo=http://git.example.com/project state=present version=fad30ad8 recursive=yes dest=/var/www/webapp'

Kinda handy, right? Where modules really start to shine, though, is when we stop trying to run everything in a single ansible command line and start using playbooks to perform multiple steps in a single run. More on playbooks next time.

Tales of an Ansible Newb: Ch.0.1 – What are these blog posts?

Last time I wrote about my first baby steps with Ansible, using it for ad hoc commands on multiple machines. When I first started working on Ansible three months ago, that was what 90% of my Ansible knowledge consisted of.

At work we were using Ansible to replace puppet as our configuration management system in Fedora Infrastructure but somehow I never understood the bigger picture of how it all fit together. Host_vars and group_vars and roles and inventory and playbooks that were included in other playbooks and playbooks that worked in conjunction with RHEL kickstart files to create a new host and playbooks that configured the hosts once they were created. And roles… where did roles fit into all of this?

Basically, the system at work was large and I had a hard time finding small enough pieces to master one at a time.

Now that I’m working on Ansible, I have an even larger need to understand how to use it. Code doesn’t exist in a vacuum; I need to know how people might use the code in order to better write the behaviours that they may want. So, on the theory that the best way to learn is by doing, and having learned from the mistake of trying to understand too much all at once, I’ve started writing playbooks for my home network. Now, I’m not yet ambitious enough to try to replicate the production quality or production goals of an infrastructure like Fedora’s. I’m not (yet?) interested in tearing down my home network and provisioning it from scratch with Ansible. At the moment my ambition is simply to automate some repetitive tasks. As time goes on we’ll see what else happens 🙂

Tales of an Ansible Newb: Ch.0 – The Hook

The main thing I like about Ansible is that it has a very easy onramp. If you’ve done even a little bit of administration of a *nix system, you’ve been exposed to the shell for interacting with the system and to ssh for connecting to a remote computer where you can interact with the remote system’s shell. At its most basic level, Ansible just builds on those pieces of knowledge.

For instance, let’s say you have a few computers at your house for you and your wife and your kids and the neighbors’ kids and the cat to use as a warm bed when the sun isn’t shining. You want to install a new piece of software on all of them. How do you do that?

Pre-Ansible it might look like this:

$ ssh host1
$ sudo yum -y install tmux
$ exit
$ ssh host2
$ sudo yum -y install tmux
$ exit
[...]

If you were feeling luc^Wconfident in your shell scripting you might try to automate some of that:

$ for host in host1 host2 [...] ; do
for> ssh -t $host sudo yum -y install tmux
for> done
[sudo] password for badger: [types password]
[waits while stuff gets done]
Connection to host1 closed.
[sudo] password for badger: [types password again]
[waits while stuff gets done]
Connection to host2 closed.
[...]

A little better, but you still have to type your sudo password once for every host. You still have to wait around between password prompts. And the tasks still run one-by-one on the remote hosts even though they could be done in parallel.

At this point, depending on what type of person you are you’ll be thinking one of three things:

* Yay! Automation!
* I bet I can do better… let me just crack open the expect manual, open up my text editor, and maybe rewrite this in perl….
* You know, someone else must have written an application for this…

I’m a programmer by nature so once upon a time I probably would have found myself running down path 2 with nary a backwards glance. But if time, grey hairs, and the prodding of your more experienced sysadmin friends teach you anything, it’s that you shouldn’t invent your own security-sensitive wheel when someone else’s is perfectly serviceable. So let’s reach for Ansible and see what it can do here:

ansible '*' -i 'host1,host2,[...]' -a 'yum -y install tmux' -K --sudo
sudo password:
fedora20 | success | rc=0 >>
Resolving Dependencies
--> Running transaction check
[...]
Installed:
tmux.x86_64 0:1.9a-2.fc20

Complete!

katahdin | success | rc=0 >>
Loaded plugins: langpacks
Resolving Dependencies
--> Running transaction check
[...]
Installed:
tmux.x86_64 0:1.9a-2.fc20

Complete!

Nice! So now I can run ad hoc commands on multiple ad hoc hosts, using sudo if I want to. I type my sudo password just once at the very beginning and then go do other things while Ansible runs my task on all the hosts I specified! That certainly makes managing my home network easier. I think I’ll be using this tool more often….