Python 2/3 Compatibility
Table of Contents
- No implicit package-relative imports
- Alternative Imports
- No dict.iteritems, iterkeys or itervalues
- No dict.has_key
- Map, Filter and Zip return generators
- Don't unicode.encode
- Don't str.decode either
- next(i) not i.next()
- Cannot use LazyT as sorting key
- XML with XML declaration must be bytes
- Sorting requires type consistency
- Turkish letters İ and ı
This guideline documents coding conventions to achieve hybrid Python-2.7/Python-3.5 compatibility in Sahana.
It's a working document that will be added to as we move towards full Python-3 compatibility.
No print statements
Don't use the print statement anywhere:
print "example" # deprecated
You shouldn't use the print-function either, because it can clash with uWSGI:
print("example") # not good
...but it can be tolerated in CLI scripts which don't run in the WSGI environment.
Best option for debug output is to use sys.stderr.write:
import sys
sys.stderr.write("example\n") # better
...or the logger (as it can be configured globally to write to a log file instead of the system console):
current.log.debug("example") # even better
Avoid overriding built-ins and wildcard-imports
Except where it is deliberate, we should avoid using the names of built-in functions or types as variable names.
Overriding a built-in function/type can - depending on scope - have adverse side-effects, which can then be very hard to trace back to a cause.
# Somewhere at the top of the module
range = lambda x, y: (x, y)
...
# Somewhere further down in the module
class Example(object):
    def __init__(self, maximum):
        self.indexes = [x for x in range(0, maximum)]

# Doesn't crash, but doesn't behave as expected either:
# my_example.indexes should be [0, 1, 2, 3, 4], but instead it is [0, 5]
my_example = Example(5)
The developer of the Example-class will find it difficult to understand why it doesn't work as expected - so we should avoid such situations.
Python-3 introduces some new built-in names (e.g. "input"), so this needs extra care. A good strategy is to use a static code checker (e.g. pylint) to detect overrides, or to use an editor that highlights built-in names right away.
A particularly tricky case is wildcard-imports such as from dateutil.rrule import * - these can (unintentionally) override built-ins without that being obvious (dateutil.rrule overrides range, for instance). Therefore, we should avoid all wildcard-imports, except from internal modules (e.g. from s3 import *) where we have control over what they expose and what not.
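A rough way to check what a wildcard import would override is to compare a module's exported names against the built-in namespace. The following is a minimal sketch, assuming Python-3 (where the module is called builtins); the helper name shadowed_builtins and the fake module are made up for illustration and are not part of Sahana.

```python
import builtins  # called __builtin__ in Python-2; this sketch assumes Python-3
import types


def shadowed_builtins(module):
    """Names that 'from module import *' would bring in and that collide with built-ins."""
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__, a wildcard import takes every name without a leading underscore
        exported = [name for name in vars(module) if not name.startswith("_")]
    return sorted(name for name in exported if hasattr(builtins, name))


# Hypothetical module that (accidentally) exposes its own "range"
fake_module = types.ModuleType("fake_module")
fake_module.range = lambda x, y: (x, y)
fake_module.helper = lambda: None

print(shadowed_builtins(fake_module))
```

A static checker such as pylint remains the more thorough option; this only illustrates the mechanism.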
Use as-syntax for catching exceptions
When catching exceptions in a variable, don't use the comma-syntax:
try:
    ...
except Exception, e: # deprecated
    ...
Instead, use the as-keyword:
try:
    ...
except Exception as e: # new standard
    ...
Raise Exception Instances
Python-3 no longer supports the raise E, V, T syntax - the new syntax is raise E(V).with_traceback(T).
However, the traceback is rarely required - and where E is an Exception class (rather than a string), we can use the raise E(V) syntax, which works in all Python versions.
# Works in Py2, but not in Py3:
raise SyntaxError, "Error Message"
# Works in all Python versions, hence our Standard:
raise SyntaxError("Error Message")
If a traceback object must be passed (which is rarely needed), then we must use the PY2 constant to implement alternative statements.
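One possible shape for such alternative statements is sketched below. It assumes a locally defined PY2 flag as a stand-in for the s3compat constant, and a hypothetical helper named reraise; because the three-argument raise is a syntax error in Python-3, the Python-2 variant has to be hidden inside a string passed to exec().

```python
import sys

PY2 = sys.version_info[0] == 2  # stand-in for the PY2 constant from s3compat

if PY2:
    # The Py2-only syntax must not be seen by the Py3 compiler, hence exec
    exec("def reraise(exc_class, message, tb):\n    raise exc_class, message, tb\n")
else:
    def reraise(exc_class, message, tb):
        # Py3: attach the traceback to the new exception instance
        raise exc_class(message).with_traceback(tb)


def convert_error():
    """Turn a ValueError into a RuntimeError while keeping the original traceback."""
    try:
        int("not a number")
    except ValueError:
        tb = sys.exc_info()[2]
        reraise(RuntimeError, "Error Message", tb)
```

Calling convert_error() raises a RuntimeError whose traceback still leads back to the failing int() call.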
No exec statements
In Python-3, the exec statement has become a function exec(). Python-2.7 accepts the function-syntax as well, so we use it throughout:
# Deprecated:
exec pyexpr
# New Standard:
exec(pyexpr)
NB the Python-2.7 exec statement also accepts a 3-tuple as parameter, exec(expr, globals, locals), a syntax that is equivalent to calling the exec-function in Python-3.
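As a small illustration of the function-syntax with an explicit namespace (the variable names here are made up for the example), the executed code reads from and writes to the dict that is passed in:

```python
# The dict serves as the namespace for the executed code
namespace = {"factor": 6}

# Function-syntax, accepted by both Python-2.7 and Python-3
exec("result = factor * 7", namespace)

print(namespace["result"])  # 42
```

Passing a dedicated dict keeps the executed code from touching the caller's own variables.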
No cmp-parameter in sort/sorted
Python-3 no longer supports the cmp parameter for sorted(x). We use the key parameter instead.
For locale-sensitive sorting, use the s3compat alternative:
# Python-2 pattern, not working in Python-3:
x = sorted(x, cmp=locale.strcoll)
# Python-3 pattern, not working with unicode in Python-2:
x = sorted(x, key=locale.strxfrm)
# Compatible pattern:
from s3compat import sorted_locale
x = sorted_locale(x)
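For other comparison functions that cannot easily be rewritten as a key function, the standard library's functools.cmp_to_key (available in Python-2.7 and Python-3) can wrap them. This is a general sketch with a made-up comparison function, not a pattern taken from the Sahana code base:

```python
from functools import cmp_to_key


def compare_length(a, b):
    """Old-style comparison function: negative, zero or positive."""
    return (len(a) > len(b)) - (len(a) < len(b))


words = ["pear", "fig", "banana"]

# Wrap the cmp-style function so it can be passed as key
by_cmp = sorted(words, key=cmp_to_key(compare_length))

# Where possible, a plain key function is simpler and faster
by_key = sorted(words, key=len)

print(by_cmp)  # ['fig', 'pear', 'banana']
```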
No implicit package-relative imports
Python-3 does not search for modules relative to the current module in the same package - unless explicitly indicated by a leading . or .. in the module path.
from s3datetime import s3_format_datetime # inside modules/s3, not working in Python-3
Python-2.7 would search relative to the current module, but on the other hand, it supports the explicit-relative syntax as well.
So we decide that only explicit paths shall be used in imports.
To import a module in the same package (e.g. within s3), either use explicit-relative syntax:
from .s3datetime import s3_format_datetime # inside modules/s3, preferred variant
...or an absolute path relative to modules (or the global python path):
from s3.s3datetime import s3_format_datetime # inside modules/s3, acceptable alternative
Outside of modules/s3, you should always import from the top-level of the s3 package (because the package structure may change over time):
from s3 import s3_format_datetime # outside modules/s3
Alternative Imports
As the locations and names of some libraries have changed in Python-3, we use the compatibility module (modules/s3compat.py) to implement suitable alternatives. Similarly, the compat module provides alternatives for other objects such as types, functions and certain common patterns.
Where an object is provided by modules/s3compat.py, it MUST be imported from there if used.
The following objects are provided by s3compat:
|PY2||boolean||Constant indicating whether we're currently running on Py2, should only be used if alternatives cannot be generalized|
|Cookie||module||maps to http.cookies in Py3|
|pickle||module||replaces cPickle in Py2|
|urlparse||module||maps to urllib.parse in Py3|
|urllib2||module||maps to urllib.request in Py3, which contains only part of Py2's urllib2 - some urllib2 objects therefore need to be imported separately (see below)|
|sorted_locale||lambda i||locale-sensitive sorting, |
|name2codepoint||function||from htmlentitydefs (Py2) or html.entities (Py3)|
|urlencode||function||replaces urllib.urlencode in Py2|
|urllib_quote||function||replaces urllib.quote in Py2|
|urlopen||function||replaces urllib.urlopen and urllib2.urlopen in Py2|
|xrange||function||maps to |
|zip_longest||function||replaces itertools.izip_longest in Py2 (which has been renamed to zip_longest in Py3)|
|long||type||same as |
|unicodeT||type||for type checking, instead of |
Type Tuples for isinstance()
|CLASS_TYPES||tuple||maps to tuple of all known class types: |
|INTEGER_TYPES||tuple||maps to tuple of all known integer types: (int,long) in Py2, (int,) in Py3|
|STRING_TYPES||tuple||maps to tuple of all known string types: (str,unicode) in Py2, (str,) in Py3|
Other Classes and Exceptions
|HTTPError||Exception||replaces urllib2.HTTPError in Py2, NB HTTPError is a subclass of URLError, so must be caught first in order to differentiate|
|HTMLParser||class||replaces HTMLParser.HTMLParser in Py2|
|StringIO||class/function||maps to cStringIO.StringIO in Py2 (which is a function rather than a class), so can't use this for type checking|
|BytesIO||class/function||for binary data streams, same as StringIO in Py2, but different in Py3|
|URLError||Exception||replaces urllib2.URLError in Py2|
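To illustrate the general idea behind such a compatibility module, here is a minimal sketch of version-dependent imports. It is not the actual content of modules/s3compat.py; it only mirrors a few of the names listed above, and the fetch function is a made-up example of catching HTTPError before URLError.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    import cPickle as pickle
    from urllib import urlencode, quote as urllib_quote
    from urllib2 import HTTPError, URLError, urlopen
    from itertools import izip_longest as zip_longest
    STRING_TYPES = (str, unicode)  # noqa: unicode only exists in Py2
    INTEGER_TYPES = (int, long)    # noqa: long only exists in Py2
else:
    import pickle
    from urllib.parse import urlencode, quote as urllib_quote
    from urllib.error import HTTPError, URLError
    from urllib.request import urlopen
    from itertools import zip_longest
    STRING_TYPES = (str,)
    INTEGER_TYPES = (int,)


def fetch(url):
    """Hypothetical example: HTTPError is a subclass of URLError, so catch it first."""
    try:
        return urlopen(url).read()
    except HTTPError as e:
        return "HTTP error %s" % e.code
    except URLError as e:
        return "URL error %s" % e.reason
```

Code using these names then imports them from the one compatibility module and stays free of version checks.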
No dict.iteritems, iterkeys or itervalues
Python-3 no longer has dict.iteritems(). We use dict.items() instead:
# Deprecated:
for x, y in d.iteritems():
    ...
# Compatible:
for x, y in d.items():
    ...
NB In Python-2, dict.items() returns a list of tuples, but in Python-3 it returns a dict view object that is sensitive to changes of the dict. If a list is required, or the dict is changed inside the loop, the result of dict.items() must be converted into a list explicitly.
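A short example of the second case, with made-up data: deleting entries while looping requires a list copy of the items, otherwise Python-3 raises a RuntimeError because the dict changed size during iteration.

```python
stock = {"apples": 3, "pears": 0, "plums": 5}

# list(...) takes a snapshot, so the dict can be modified inside the loop
for name, count in list(stock.items()):
    if count == 0:
        del stock[name]

print(sorted(stock))  # ['apples', 'plums']
```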
The same applies to dict.iterkeys() (use dict.keys() instead) and dict.itervalues() (use dict.values() instead).
No dict.has_key
The dict.has_key() method has been removed in Python-3 in favor of the x in y pattern, which is also available (and equivalent) in Python-2.7:
# Deprecated:
if d.has_key(k):
    ...
# New Standard:
if k in d:
    ...
Map, Filter and Zip return generators
In Python-3, the map(), filter() and zip() functions return lazy, generator-like iterator objects rather than lists. This is fine when we want to iterate over the result, especially when there is a chance to break out of the loop early.
But where lists are required, the return value must be converted explicitly using the list constructor.
# This is fine:
for item in map(func, values):
    ...
# This could be wrong:
result = map(func, values)
# This is better:
result = list(map(func, values))
# This could be even better:
result = [func(v) for v in values]
NB For building a list from a single iterable argument, we prefer list comprehensions over map() and filter() for readability and speed.
NB Generator objects (unlike lists, tuples or sets) can only be iterated over once, and they cannot be accessed by index.
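The single-pass behavior is easy to overlook, so here is a brief demonstration. The results in the comments apply to Python-3; in Python-2, map() returns a list and both passes would yield the same values.

```python
squares = map(lambda v: v * v, [1, 2, 3])

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing left in Python-3

print(first_pass)   # [1, 4, 9]
print(second_pass)  # [] in Python-3

# Converting once and keeping the list allows repeated access and indexing
as_list = list(map(lambda v: v * v, [1, 2, 3]))
print(as_list[0])   # 1
```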
Don't unicode.encode
Since there is no difference between unicode and str in Python-3, using the encode() method will produce bytes rather than str. A bytes object differs from a string in that it is an array of integers rather than an array of characters. It will also give a distorted result with any later str() call:
if isinstance(x, unicodeT): # unicodeT maps to str in Py3
    x = x.encode("utf-8") # x becomes a bytes-object in Py3, unlike in Py2 where it becomes a str
str(x) # thus, in Py3, this results in something like "b'example'" instead of the expected "example"
x[0] # is "e" in Py2, but 101 (an integer!) in Py3
If you just want to encode a potential unicode instance as a utf-8 encoded str, use s3_str rather than encode():
# Do this instead:
x = s3_str(x)
If you need to exclude non-string types from the conversion, you can keep the type-check:
if isinstance(x, unicodeT):
    x = s3_str(x)
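To make the idea of a "native string" conversion concrete, the following sketch shows one way such a helper could work. The function to_native_str is hypothetical and is not the implementation of s3_str; it merely illustrates returning the str type of whichever Python version is running.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native_str(value, encoding="utf-8"):
    """Hypothetical helper: always return the native str type of the running Python."""
    if PY2:
        if isinstance(value, unicode):  # noqa: unicode only exists in Py2
            return value.encode(encoding)
        return str(value)
    if isinstance(value, bytes):
        # Py3: decode bytes instead of producing "b'...'"
        return value.decode(encoding)
    return str(value)


encoded = u"example".encode("utf-8")
print(to_native_str(encoded))     # example
print(to_native_str(u"example"))  # example
print(to_native_str(5))           # 5
```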
Don't str.decode either
The str type has no decode method in Python-3.
To convert a utf-8 encoded
unicode in Py2, use
NB Do not attempt to utf-8-decode UI strings on the client side either when the server runs Python-3.
next(i) not i.next()
In Python-3, the i.next method of iterators has been renamed to i.__next__, and should not be called explicitly but via the built-in next(i).
This function is also available in Python-2.7 (where it calls i.next() instead), so we generally use the next function:
i = iter(l)
# Deprecated:
item = i.next()
# Forward+backward-compatible way to do it:
item = next(i)
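The built-in next() also accepts an optional default, which avoids a StopIteration when the iterator is exhausted. A small example with made-up values:

```python
items = iter(["a", "b"])

first = next(items)
second = next(items)

# With a default, an exhausted iterator returns the default instead of raising StopIteration
third = next(items, None)

print(first, second, third)
```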
Cannot use LazyT as sorting key
This may be a temporary issue:
LazyT does not define a __lt__ method, which is used for sorting in Python-3, but it does define a __cmp__ method, which is used by Python-2.7 and therefore works there. For the same reason, sorting an array of LazyT does not work in Python-3.
As a workaround, wrap the LazyT in s3_str when sorting or using it as sorting key.
a = [T("quick"), T("lazy"), T("fox")]
# This will crash with a TypeError in Py3:
a.sort()
# This will work both in Py2 and Py3:
a.sort(key=lambda item: s3_str(item))
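The effect can be reproduced without web2py using a stand-in class. LazyLabel below is hypothetical and only mimics the relevant property (a string form, but no __lt__); the built-in str takes the place of s3_str.

```python
class LazyLabel(object):
    """Hypothetical stand-in for LazyT: has a string form, but defines no __lt__."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


labels = [LazyLabel("quick"), LazyLabel("lazy"), LazyLabel("fox")]

try:
    sorted(labels)
    plain_sort_works = True
except TypeError:
    # Python-3: instances without __lt__ cannot be ordered
    plain_sort_works = False

# Sorting by the string form works in both Python versions
ordered = [str(label) for label in sorted(labels, key=str)]
print(ordered)  # ['fox', 'lazy', 'quick']
```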
XML with XML declaration must be bytes
Any XML that contains an XML declaration must be UTF-8 encoded bytes. In Python-2, bytes is synonymous with str - but in Python-3 it is a different data type.
- S3Resource.export_xml will always return bytes
- if string operations must be performed on such XML, use s3_str to convert it
Certain libraries expect a file-like object to represent binary data. Whilst in Python-2 this can be handled with StringIO, Python-3 requires the use of BytesIO.
zipfile expects and returns
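The need for a binary stream can be illustrated with the standard zipfile module and io.BytesIO, both of which exist in Python-2.7 and Python-3. The file name and XML content are made up for the example:

```python
import zipfile
from io import BytesIO

# XML with a declaration, held as UTF-8 encoded bytes
xml_bytes = u'<?xml version="1.0" encoding="utf-8"?><root/>'.encode("utf-8")

# Write the archive into an in-memory binary stream
stream = BytesIO()
with zipfile.ZipFile(stream, "w") as archive:
    archive.writestr("example.xml", xml_bytes)

# Read it back: the member content comes out as bytes again
stream.seek(0)
with zipfile.ZipFile(stream) as archive:
    content = archive.read("example.xml")

print(content == xml_bytes)  # True
```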
Sorting requires type consistency
When sorting iterables with sorted(i), all elements must have the same type - otherwise it will raise a TypeError in Python-3.
This is particularly relevant when the iterable can contain None. In such a case, use a key function to deal with None:
l = [4,2,7,None]
# Works in Py2, but raises a TypeError in Py3:
l.sort()
# Works in both:
l.sort(key=lambda item: item if item is not None else -float('inf'))
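The -float('inf') substitute only works for numbers. Where the other elements may be strings or dates, one alternative is a tuple key that sorts on "is not None" first, so that None is never compared with a real value. This is a sketch of that idea, not an established Sahana pattern:

```python
def none_first(item):
    """Sort key: None entries come first, everything else in natural order."""
    return (item is not None, item)


numbers = [4, 2, 7, None]
names = ["b", None, "a"]

print(sorted(numbers, key=none_first))  # [None, 2, 4, 7]
print(sorted(names, key=none_first))    # [None, 'a', 'b']
```

All non-None elements must still be of mutually comparable types.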
Turkish letters İ and ı
In Turkish, the letters I and i are not an upper/lowercase pair. Instead, there are two pairs (İ, i) and (I, ı), i.e. one with and one without the dot above.
According to the Unicode spec, the lowercase counterpart of İ is a sequence of two unicode characters, namely the i (with the dot) and the code point U+0307, which means "combining dot above". The latter is there to preserve the information about the dot for the conversion back to uppercase.
Python-2 did not implement the U0307 character, so it converted the letters like this:
# Actually wrong in both cases, but consistently so:
>>> u"İ".lower().upper()
u'I'
>>> u"ı".upper().lower()
u'i'
# NB with utf-8-encoded str, Python-2 doesn't "İ".lower() at all!
>>> print "İ".lower()
İ
Python-3, where all str are unicode, does implement the U0307 character, so the behavior is different:
>>> "İ".lower().upper()
'İ'
# But: same inconsistency as Py2 with the dot-less lowercase ı
>>> "ı".upper().lower()
'i'
Critically, the U0307 character changes the string length (it's an extra character!):
# Python-2
>>> len(u"İ".lower())
1
# Python-3
>>> len("İ".lower())
2
This is just something to keep in mind - an actual forward/backward compatibility pattern must be developed for the specific use-case. Neither the Python-2 nor the Python-3 behavior is particularly helpful for generalization; the Turkish I's always need special treatment.
Be careful with character-wise string processing after lower() when Turkish letters could be involved!
NB Many other platforms, e.g. ECMAScript, implement the same behavior as Python-3 (i.e. the Unicode spec), so there is cross-platform consistency, at least.
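As one example of what such special treatment could look like, the sketch below lowercases text that is known to be Turkish by mapping the two uppercase I's explicitly before calling lower(). The helper turkish_lower is hypothetical, assumes unicode input, and would give wrong results for non-Turkish text (where I should become i).

```python
def turkish_lower(text):
    """Hypothetical helper: lowercase unicode text according to Turkish casing rules."""
    # Map İ (U+0130) to plain i and I to dotless ı (U+0131) before the generic lower()
    return text.replace(u"\u0130", u"i").replace(u"I", u"\u0131").lower()


city = turkish_lower(u"\u0130STANBUL")
light = turkish_lower(u"I\u015eIK")

print(city)       # istanbul
print(len(city))  # 8, same length as the input (no U+0307 inserted)
```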