I realize I didn’t announce 0.2.3, so here are the NEWS entries for both 0.2.3 and 0.2.4:
lgettext functions instead of
- Correct docstring for
transforming into a byte
str, not into
- Correct some examples in the unicode frustrations documentation
- Correct some cross-references in the documentation
These are undocumented, and not in upstream’s __all__, but Google (and bug reports against kitchen) show that some people are using them. Note that upstream is leaning towards these being private so they may be deprecated in
So what do these changes mean for you? Hopefully it’ll just be bugfixes for everyone. The subprocess changes in 0.2.3 make more of the subprocess interface public because some code uses those functions and variables. People using them are advised to stop, as this upstream bug report shows that the python maintainers don’t intend them to be public and will be deprecating them in the future. Since I had to dig into the code to look into this, I’ll also note that if your code is using list2cmdline(), it’s likely that it’s buggy in corner cases. From the subprocess documentation: “list2cmdline() is designed for applications using the same rules as the MS C runtime.” That means that it’s not intended for dealing with Unix shells or even the MS-DOS command prompt. It’s only intended for the MS C runtime itself.
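To make the corner-case problem concrete, here is a minimal sketch in Python 3 comparing list2cmdline() with POSIX shell quoting. It calls subprocess.list2cmdline() directly, which, as noted above, is undocumented and may go away; the argument list is a made-up example, and shlex.quote() is used here only as a point of comparison, not as something kitchen does.

```python
import shlex
import subprocess

# Hypothetical argument list, chosen to contain characters a Unix shell treats specially.
args = ["echo", "$HOME", "it's here"]

# list2cmdline() follows MS C runtime rules: it adds double quotes around
# arguments containing spaces or tabs and leaves "$" and "'" untouched.
ms_style = subprocess.list2cmdline(args)

# shlex.quote() follows POSIX shell rules, so "$HOME" is single-quoted
# and will not be expanded by the shell.
posix_style = " ".join(shlex.quote(arg) for arg in args)

print(ms_style)
print(posix_style)
```

If the first string were handed to a Unix shell, $HOME would be expanded rather than passed through literally, which is the kind of corner case the documentation is warning about.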
The 0.2.4 change to easy_gettext_setup() changes behaviour, so there is a potential to break code, although I still classify it as a bugfix.
easy_gettext_setup() is intended to return the gettext functions needed to translate an application. Since python has both byte str and unicode string types that can be used, there are gettext functions that return one or the other of those. easy_gettext_setup() takes a parameter, use_unicode, to know whether to return a set of functions that work with byte str or a set of functions that work with unicode strings. There’s only one set of functions that return unicode, so when unicode is requested the code returns the ugettext() and ungettext() functions as expected. When byte str is requested, however, things are a little messier as there are two sets of functions to choose from: gettext() and ngettext(), or lgettext() and lngettext().
Prior to 0.2.4, easy_gettext_setup() returned gettext() and ngettext(). The gettext functions do return byte strings. However, the byte strings they return are in the encoding that was saved in the message catalogs on the filesystem. So, if the translators used utf-8 to encode their strings, you’d get utf-8 output; if they used latin-1, you’d get latin-1 output, and so forth. This works fine as long as you’re using the same encoding as the translators. However, when the translator uses a different encoding than you, you get mojibake.
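The mismatch described above can be reproduced without gettext at all. This is a small sketch in Python 3 (where byte str is bytes and unicode is str); the message text and the choice of utf-8 and latin-1 are assumptions picked purely for illustration.

```python
# Hypothetical translated message.
original = "café"

# Assume the translator saved the message catalog as utf-8.
catalog_bytes = original.encode("utf-8")

# A consumer whose environment expects latin-1 interprets the same
# bytes with the wrong encoding.
seen_by_user = catalog_bytes.decode("latin-1")

print(seen_by_user)
```

The two bytes that utf-8 uses for “é” are read as two separate latin-1 characters, so the user sees “cafÃ©” instead of “café”.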
In 0.2.4, we’ve switched to returning the lgettext functions to address this. lgettext() and lngettext() take the byte strings and the encoding information from the message catalog that the translator provided and use that to re-encode the strings in the desired encoding. That way if you have a locale setting of ja_JP.EUC_JP you get text encoded in EUC_JP and if you have a locale setting of ja_JP.UTF8 your text is encoded in