Pattern or Antipattern? Splitting up initialization with asyncio

“O brave new world, That has such people in’t!” – William Shakespeare, The Tempest

Edit: Jean-Paul Calderone (exarkun) has a very good response to this detailing why it should be considered an antipattern. He has some great thoughts on the implicit contract that a programmer is signing when they write an __init__() method and the maintenance cost that is incurred if a programmer breaks those expectations. Definitely worth reading!

Instead of spending the Thanksgiving weekend fighting crowds of shoppers I indulged my inner geek by staying at home on my computer. And not to shop online either — I was taking a look at Python-3.4’s asyncio library to see whether it would be useful in general, run of the mill code. After quite a bit of experimenting I do think every programmer will have a legitimate use for it from time to time. It’s also quite sexy. I think I’ll be a bit prone to overusing it for a little while ;-)

Something I discovered, though — there’s a great deal of good documentation and blog posts about the underlying theory of asyncio and how to implement some broader concepts using asyncio’s API. There’s quite a few tutorials that skim the surface of what you can theoretically do with the library that don’t go into much depth. And there’s a definite lack of examples showing how people are taking asyncio’s API and applying them to real-world problems.

That lack is both exciting and hazardous. Exciting because it means there’s plenty of neat new ways to use the API that no one’s made into a wide-spread and oft-repeated pattern yet. Hazardous because there’s plenty of neat new ways to abuse the API that no one’s thought to write a post explaining why not to do things that way before. My joke about overusing it earlier has a large kernel of truth in it… there’s not a lot of information saying whether a particular means of using asyncio is good or bad.

So let me mention one way of using it that I thought about this weekend — maybe some more experienced tulip or twisted programmers will pop up and tell me whether this is a good use or bad use of the APIs.

Let’s say you’re writing some code that talks to a microblogging service. You have one class that handles both posting to the service and reading from it. As you write the code you realize that there’s some time consuming tasks (for instance, setting up an on-disk cache for posts) that you have to do in order to read from the service that you do not have to wait for if your first actions are going to be making new posts. After a bit of thought, you realize you can split up your initialization into two steps. Initialization needed for posting will be done immediately in the class’s constructor and initialization needed for reading will be setup in a future so that reading code will know when it can begin to process. Here’s a rough sketch of what an implementation might look like:

import os
import sqlite
import asyncio

import aiohttp

class Microblog:
    def __init__(self, url, username, token, cachedir):
        self.auth = token
        self.username = username
        self.url = url
        loop = asyncio.get_event_loop()
        self.init_future = loop.run_in_executor(None, self._reading_init, cachedir)

    def _reading_init(self, cachedir):
        # Mainly setup our cache
        self.cachedir = cachedir
        os.makedirs(cachedir, mode=0o755, exist_ok=True)
        self.db = sqlite.connect('sqlite:////{0}/cache.sqlite'.format(cachedir))
        # Create tables, fill in some initial data, you get the picture [....]

    @asyncio.coroutine
    def post(self, msg):
        data = dict(payload=msg)
        headers = dict(Authorization=self.token)
        reply = yield from aiohttp.request('post', self.url, data=data, headers=headers)
        # Manipulate reply a bit [...]
        return reply

    @asyncio.coroutine
    def sync_latest(self):
        # Synchronize with the initialization we need before we can read
        yield from self.init_future
        data = dict(per_page=100, page=1)
        headers = dict(Authorization=self.token)
        reply = yield from aiohttp.request('get', self.url, data=data, headers=headers)
        # Stuff the reply in our cache

if __name__ == '__main__':
    chirpchirp = Microblog('http://chirpchirp.com', 'a.badger', TOKEN, '/home/badger/cache/')
    loop = asyncio.get_event_loop()
    # Contrived -- real code would probably have a coroutine to take user input
    # and then submit that while interleaving with displaying new posts
    asyncio.async(chirpchirp.post(' '.join(sys.argv[1:])))
    loop.run_until_complete(chirpchirp.sync_latest())
    

Some of this code is just there to give an idea of how this could be used. The real question’s revolve around splitting up initialization into two steps:

  • Is yield from the proper way for sync_latest() to signal that it needs self.init_future to finish before it can continue?
  • Is it good form to potentially start using the object for one task before __init__ has finished all tasks?
  • Would it be better style to setup posting and reading separately? Maybe a reading class and a posting class or the old standby of invoking _reading_init() the first time sync_latest() is called?

Security FAD

All packed up and waiting for my plane to Raleigh. Going there to work on enabling two-factor authentication for the hosts that give shell access inside of Fedora’s Infrastructure. For the first round, I think we’re planning on going for simple and minimal to show what we can do. Briefly, the simplest and minimalist is:

* Server to verify a one time password (we already have one for yubikeys)
* CGI to take a username, password, and otp to verify in fas and the otp server
* pam module for sudo that verifies the user via the cgi
* database to store the secret keys for the otp generation and associate them with the fas username

We’re hoping to go a little beyond the minimal at the FAD:

* Have a web frontend to configure the secret keys that are stored for an account.
* Presently we’re thinking that this is a FAS frontend but we may end up re-evaluating this depending on what we decide to do for web apps and what to require for changing an auth source.
* Allow both yubikey and google-authenticator as otp

I’m also hoping that since we’ll have most of the sysadmin side of infrastructure present that we’ll get a chance to discuss and write down a few OTP policies for the future:

* Do we want to make two-factor optional for some people and required for others?
* How many auth sources do we require in order to change a separate auth source (email address, password, secret for otp generation, phone number, gpg key, etc)?

If we manage to get through all of that work, there’s a few other things we could work on as well:

* Design and implement OTP for our web apps

Porting Kitchen to Python3: Part 1 — Detecting string types

I’ve spent a good part of the last week working on the python3 port of kitchen. It’s now to the point where I’ve reviewed all of the code and got the unittests passing. I still need to add some deprecation warnings and a gettext object that mirrors the python3 API instead of the python2 API. Then it’ll be ready for an alpha release. Still a lot of work to do before a final release. Most of the documentation will need to be updated to change from unicode + str to str + bytes and the best practices sections will need a major overhaul since a lot of the problems with python2 and unicode have either been fixed, mitigated, or moved to a different level.

It was both an easy and hard undertaking. The easy part was that kitchen is largely a collection of dependent but unrelated functions. So it’s reasonably easy to pick a set of functions, figure out that they don’t depend on anything else in kitchen, and then port them one by one.

The hard part is that a lot of those functions deal with things that are explicitly unicode and things that are explicitly byte strings; an area that has both changed dramatically in python3 and that 2to3 doesn’t handle very well. Here’s a couple of things I ended up doing to help out:

Detecting String Types

Kitchen has several places that need to know whether an object it’s been given is a byte string, unicode string, or a generic string. The python2 idioms for this are:

if isinstance(obj, basestring):
    pass # object is any of the string types
    if isinstance(obj, str):
        pass # object is a byte string
    elif isinstance(obj, unicode):
        pass # object is a unicode string
else:
    pass # object was not a string type

In python3, a couple things have changed.

  • There’s no longer a basestring type as byte strings and unicode strings are no longer meant to be related types.
  • Byte strings now have an immutable (bytes) and mutable (bytearray) type.

With these changes, the python3 idioms equivalent to the python2 ones look something like this:

if isinstance(obj, str) or isinstance(obj, bytes) or isinstance(obj, bytearray):
    pass # any string type
    if isinstance(obj, bytes) or isinstance(obj, bytearray):
        pass # byte string
    elif isinstance(obj, str):
        pass # unicode string

There’s two issues with these changes:

  • code that needs to do this needs to be manually ported when moving from python2 to python3. 2to3 can correctly change all occurrences of isinstance(obj, unicode) to isinstance(obj, str) but occurrences of isinstance(obj, basestring) and isinstance(obj, str) will also be rendered as isinstance(obj, str) in the 2to3 output. This is correct for a lot of normal python2 code that is trying to separate strings from ints, floats, or other types but it is incorrect for code that’s trying to explicitly separate bytes from unicode. So you’ll need to hand-audit and fix your code wherever these idioms are being used.
  • This is more prolix and tedious to write than the python2 version and if your code has to do this sort of differentiation in many places you’ll soon get bored of it.

For kitchen, I added a few helper functions into kitchen.text.misc that encapsulate the python2 and python3 idioms. For instance:

def isbasestring(obj):
    if isinstance(obj, str) or isinstance(obj, bytes) or isinstance(obj, bytearray):
        return True
    return False

and similar for isunicodestring() and isbytestring(). [In case you’re curious, I broke with PEP8 style for these function names because of the long history of is* functions and methods in python and other programming languages.] By pushing these into functions, I can use if isbasetring(obj): on both python2 and python3. I only have to change the implementation of the is*string() functions in a single place when porting from python2 to python3.

Now let’s mention some of the caveats to using this:

  • In python, calling a function (isbasestring()) is somewhat expensive. So if you use this in any hot inner loops, you may want to benchmark with the function and with the expanded version to see whether you take a noticable loss of speed.
  • Not every piece of code is going to want to define “string” in the same way. For instance, bytearrays are mutable so maybe your code shouldn’t include those with the “normal” string types.
  • Maybe your code can be changed to only deal with unicode strings (str). In python3 byte strings are not as ubiquitous as they were in python2 so maybe your code can be changed to stop checking for the type of the object altogether or reduced to a single isinstance(obj, str). The language has evolved so when possible, evolve your code to adapt as well.

Next time: Literals

My first python3 script

I’ve been hacking on other people’s python3 code for a while doing porting and bugfixes but so far my own code has been tied to python2 because of dependencies. Yesterday I ported my first personal script from python2 to python3. This was just a simple, one file script that hacks together a way to track how long my kids are using the computer and log them off after they’ve hit a quota. The kind of thing that many a home sysadmin has probably hacked together to automate just a little bit of their routine. For that use, it seemed very straightforward to make the switch. There were only three changes in the language that I encountered when making the transition:

  • octal values. I use octal for setting file permissions. The syntax for octal values has changed from "0755" to "0o755"
  • exception catching. No longer can you do: except Exception, exc. The new syntax is: except Exception as exc.
  • print function. In python2, print is a keyword so you do this: print 'hello world'. In python3, it’s a function so you write it this way: print('hello world')
  • The strict separation of bytes and string types. Required me to specify that one subprocess function should return string instead of bytes to me

When I’ve worked on porting libraries that needed to maintain some form of compat between python2 (older versions… no nice shiny python-2.7 for you!) and python3 these concerns were harder to address as there needed to be two versions of the code (usually, maintained via automatic build-time invocation of 2to3). With this application/script, throwing out python2 compatibility was possible so switching over was just a matter of getting an error when the code executed and switching the syntax over.

This script also didn’t use any modules that had either not ported, been dropped, or been restructured in the switch from python2 to python3. Unlike my day job where urllib’s restructuring would affect many of the things that we’ve written and lack of ported third-party libraries would prevent even more things from being ported, this script (and many other of my simple-home-use scripts) didn’t require any changes due to library changes.

Verdict? Within these constraints, porting to python3 was as painless as porting between some python2.x releases has been. I don’t see any reason I won’t use python3 for new programming tasks like this. I’ll probably port other existing scripts as I need to enhance them.

Account System 0.8.11 Release

We’ve just deployed a new Fedora Account System to production. This release just pulls a few new features that didn’t quite make the 0.8.10 release:

  • Ian Cole (icole) Added a feature to allow for email address to be used instead of login name for logging in. Because of the way we do authentication, this means that email addresses can also be used on the other applications on admin.fedoraproject.org as well.
  • Pierre-Yves Chibon (pingou) Implemented an audio captcha for people signing up for a new account. It generates a wav file that gets downloaded to your computer that you can listen to and then type in the proper answer to the captcha.
  • Adam M. Dutko (addutko) Standardized some of the errors that can be returned from our JSON API.
  • Our translation team pointed out a few areas where we weren’t loading translations correctly and I fixed them. Look forward to more complete translations in the future.

That’s it for this minor update.

/me goes to play with the audio captcha some more.

Kitchen 1.1.0 release

As mentioned last week a new kitchen release went out today. Since last week some small changes were made to the documentation and a few changes to make building kitchen easier were implemented but nothing has changed in the code. Here’s the text of the release announcement:

== Kitchen 1.1.0 has been released ==

Kitchen is a python library that brings together small snippets of code that you might otherwise find yourself reimplementing via cut and paste. Each little bit is useful and important but they usually feel too small and too trivial to create a whole module just for that one little function. However, experience has shown that any code implemented by copying will inevitably be shown to have bugs. And when you fix those bugs, you’ll wish you had created the module so you could fix the bug in one place rather than two (or five.. or ten…). Kitchen aims to be that one place.

Kitchen currently has code for easily setting up gettext so it won’t throw UnicodeErrors in corner cases, compatibility modules for different python2 stdlib versions (2.4, 2.5, 2.7), a little bit of iterators, and a whole lot of code for unicode-byte string conversion. In addition to the code, kitchen contains documentation that explains some of the common problems that arise when dealing with unicode in python2 and how to fix them through changes in coding practices and/or making use of functions from kitchen.

The 1.1.0 release enhances the gettext portion of kitchen. The major enhancements are:

  • get_translation_object can now be used as a drop in replacement for the stdlib’s gettext.translations() function.
  • If get_translation_object finds multiple message catalogs for the domain, it will setup the additional catalogs as fallbacks in case the message isn’t found in the first one.
  • The gettext and lgettext functions were reworked so that they guarantee that the string they return is both a byte str (this was present in previous kitchen releases) and is a valid sequence of bytes in the selected output_charset. This should prevent tracebacks if your code decodes and reencodes a value returned from the gettext and lgettext family of functions.
  • Several fixes to the way fallback message catalogs interacted with input and output charsets.

For the complete set of changes, see the NEWS file.