Python 2/3 Compatibility
Table of Contents
- Syntax
- Usage
- No implicit package-relative imports
- Alternative Imports
- No dict.iteritems, iterkeys or itervalues
- No dict.has_key
- Map, Filter and Zip return generators
- Don't unicode.encode
- Don't str.decode either
- next(i) not i.next()
- Cannot use LazyT as sorting key
- XML with XML declaration must be bytes
- BytesIO
- Sorting requires type consistency
- Turkish letters İ and ı
This guideline documents coding conventions to achieve hybrid Python-2.7/Python-3.5 compatibility in Sahana.
It's a working document that will be added to as we move towards full Python-3 compatibility.
Syntax
No print statements
Don't use the print statement anywhere:
print "example" # deprecated
You shouldn't use the print-function either, because it can clash with uWSGI:
print("example") # not good
...but it can be tolerated in CLI scripts which don't run in the WSGI environment.
Best option for debug output is to use sys.stderr.write:
import sys
sys.stderr.write("example\n") # better
...or the logger (as it can be configured globally to write to a log file instead of the system console):
current.log.debug("example") # even better
Avoid overriding built-ins and wildcard-imports
Except where it is deliberate, we should avoid using the names of built-in functions or types as variable names.
Overriding a built-in function/type can - depending on scope - have adverse side-effects, which can then be very hard to trace back to a cause.
Example:
# Somewhere at the top of the module
range = lambda x, y: (x, y)
...
# Somewhere further down in the module
class Example(object):
    def __init__(self, maximum):
        self.indexes = [x for x in range(0, maximum)]

# Doesn't crash, but doesn't behave as expected either:
# my_example.indexes should be [0, 1, 2, 3, 4], but instead it is [0, 5]
my_example = Example(5)
The developer of the Example-class will find it difficult to understand why it doesn't work as expected - so we should avoid such situations.
Python-3 introduces some new built-in names (e.g. "input"), so this needs extra care. A good strategy is to use a static code checker (e.g. pylint) to detect overrides, or to use an editor that highlights built-in names right away.
A particularly tricky case is wildcard imports such as from dateutil.rrule import *: these can (unintentionally) override built-ins without that being obvious (dateutil.rrule overrides range, for instance). Therefore, we should avoid all wildcard imports, except from internal modules (e.g. from s3 import *) where we have control over what they expose and what not.
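A static checker remains the more thorough option, but a quick runtime check can also reveal overrides. The following is a minimal sketch that compares a namespace against the built-in names; the helper name shadowed_builtins is hypothetical and not part of Sahana.

```python
try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

def shadowed_builtins(namespace):
    """Return the sorted public names in namespace that are also built-in names."""
    return sorted(name for name in namespace
                  if not name.startswith("_") and hasattr(builtins, name))

# Simulate what a careless assignment or wildcard import could leave behind
module_namespace = {"range": lambda x, y: (x, y), "s3_format_datetime": None}
overrides = shadowed_builtins(module_namespace)
```

Calling shadowed_builtins(globals()) at the end of a module would list the built-ins which that module has overridden.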
Use as-syntax for catching exceptions
When catching exceptions in a variable, don't use the comma-syntax:
try:
    ...
except Exception, e: # deprecated
    ...
Instead, use the as-keyword:
try:
    ...
except Exception as e: # new standard
    ...
Raise Exception Instances
Python-3 no longer supports the raise E, V, T syntax - the new syntax is raise E(V).with_traceback(T).
However, the traceback is rarely required - and where E is an Exception class (rather than a string), we can use the raise E(V) syntax which works in all Python versions.
# Works in Py2, but not in Py3:
raise SyntaxError, "Error Message"
# Works in all Python versions, hence our Standard:
raise SyntaxError("Error Message")
If a traceback object must be passed (which is rarely needed), then we must use the PY2 constant to implement alternative statements.
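A minimal sketch of such alternative statements follows. It assumes a locally defined PY2 flag as a stand-in for the constant from s3compat, and a hypothetical helper named reraise. Because the three-argument raise is a syntax error in Python-3, the Python-2 variant has to be hidden from the compiler inside a string.

```python
import sys

PY2 = sys.version_info[0] == 2  # stand-in for the PY2 constant from s3compat

if PY2:
    # The three-argument raise would not even compile in Py3,
    # so it is wrapped in a string and only executed on Py2
    exec("def reraise(exc_type, value, tb):\n"
         "    raise exc_type, value, tb\n")
else:
    def reraise(exc_type, value, tb):
        raise exc_type(value).with_traceback(tb)

def risky():
    raise ValueError("original problem")

try:
    try:
        risky()
    except ValueError:
        tb = sys.exc_info()[2]  # traceback of the original error
        reraise(RuntimeError, "wrapped problem", tb)
except RuntimeError as e:
    message = str(e)
```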
No exec statements
In Python-3, the exec statement has become a function exec(). Python-2.7 accepts the function-syntax as well, so we use it throughout:
# Deprecated:
exec pyexpr
# New Standard:
exec(pyexpr)
NB The Python-2.7 exec statement also accepts a 3-tuple as parameter, exec(expr, globals, locals), which is syntactically equivalent to the exec function in Python-3.
No cmp-parameter in sort/sorted
Python-3 no longer supports the cmp parameter for x.sort() and sorted(x). We use the key parameter instead.
For locale-sensitive sorting, use the s3compat alternative sorted_locale:
# Python-2 pattern, not working in Python-3:
x = sorted(x, cmp=locale.strcoll)
# Python-3 pattern, not working with unicode in Python-2:
x = sorted(x, key=locale.strxfrm)
# Compatible pattern:
from s3compat import sorted_locale
x = sorted_locale(x)
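Where a cmp-style function has to be reused outside of s3compat, functools.cmp_to_key (available from Python-2.7 onwards) can wrap it into a key function. The sketch below illustrates that general technique with locale.strcoll; the name sorted_locale_sketch is hypothetical, and this is an alternative to, not a copy of, the s3compat implementation.

```python
import functools
import locale

def sorted_locale_sketch(items):
    """Sort strings by the collation rules of the current locale."""
    # cmp_to_key turns the old-style comparison function into a key function
    return sorted(items, key=functools.cmp_to_key(locale.strcoll))

names = ["pear", "apple", "fig"]
result = sorted_locale_sketch(names)
```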
Usage
No implicit package-relative imports
Python-3 does not search for modules relative to the current module in the same package - unless explicitly indicated by leading . or .. in the module path.
from s3datetime import s3_format_datetime # inside modules/s3, not working in Python-3
Python-2.7 would search relative to the current module, but on the other hand, it supports the explicit-relative syntax as well.
So we decide that only explicit paths shall be used in imports.
To import a module in the same package (e.g. within s3), either use explicit-relative syntax:
from .s3datetime import s3_format_datetime # inside modules/s3, preferred variant
...or an absolute path relative to modules (or the global python path):
from s3.s3datetime import s3_format_datetime # inside modules/s3, acceptable alternative
Outside of modules/s3, you should always import from the top-level of the s3 package (because the package structure may change over time):
from s3 import s3_format_datetime # outside modules/s3
Alternative Imports
As the locations and names of some libraries have changed in Python-3, we use the compatibility module (modules/s3compat.py) to implement suitable alternatives. Similarly, the compat module provides alternatives for other objects such as types, functions and certain common patterns.
Where an object is provided by modules/s3compat.py, it MUST be imported from there if used.
The following objects are provided by s3compat:
Constants
Name | Type | Comments/Caveats |
---|---|---|
PY2 | boolean | Constant indicating whether we're currently running on Py2, should only be used if alternatives cannot be generalized |
Libraries
Name | Type | Comments/Caveats |
---|---|---|
Cookie | module | maps to http.cookies in Py3 |
pickle | module | replaces cPickle in Py2 |
urlparse | module | maps to urllib.parse in Py3 |
urllib2 | module | maps to urllib.request in Py3, which contains only part of Py2's urllib2 - some urllib2 objects therefore need to be imported separately (see below) |
Functions
Name | Type | Comments/Caveats |
---|---|---|
reduce | function | |
reload | function | |
sorted_locale | lambda i | locale-sensitive sorting, sorted(i, cmp=locale.strcoll) in Py2, sorted(i, key=locale.strxfrm) in Py3 |
name2codepoint | function | from htmlentitydefs (Py2) resp. html.entities (Py3) |
unichr | function | |
urlencode | function | replaces urllib.urlencode in Py2 |
urllib_quote | function | replaces urllib.quote in Py2 |
urlopen | function | replaces urllib.urlopen and urllib2.urlopen in Py2 |
xrange | function | maps to range in Py3, since xrange no longer exists (but range behaves like it) |
zip_longest | function | replaces itertools.izip_longest in Py2 (which has been renamed to zip_longest in Py3) |
Types
Name | Type | Comments/Caveats |
---|---|---|
basestring | type | |
long | type | same as int in Py3, so can occasionally lead to redundancy |
unicodeT | type | for type checking, instead of unicode in Py2, maps to str in Py3 |
ClassType | type | replaces types.ClassType (old-style classes) in Py2, maps to type in Py3 (old-style classes no longer exist) |
Type Tuples for isinstance()
Name | Type | Comments/Caveats |
---|---|---|
CLASS_TYPES | tuple | maps to tuple of all known class types: (type, types.ClassType) in Py2, just (type,) in Py3 |
INTEGER_TYPES | tuple | maps to tuple of all known integer types: (int,long) in Py2, (int,) in Py3 |
STRING_TYPES | tuple | maps to tuple of all known string types: (str,unicode) in Py2, (str,) in Py3 |
Other Classes and Exceptions
Name | Type | Comments/Caveats |
---|---|---|
HTTPError | Exception | replaces urllib2.HTTPError in Py2, NB HTTPError is a subclass of URLError, so must be caught first in order to differentiate |
HTMLParser | class | replaces HTMLParser.HTMLParser in Py2 |
StringIO | class/function | maps to cStringIO.StringIO in Py2 (which is a function rather than a class), so can't use this for type checking |
BytesIO | class/function | for binary data streams, same as StringIO in Py2, but different in Py3 |
URLError | Exception | replaces urllib2.URLError in Py2 |
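To illustrate the general pattern behind such a compatibility module, here is a minimal sketch that defines a handful of the names listed above. It is an assumption about how this kind of module is typically structured, not the actual source of modules/s3compat.py.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # This branch is only executed on Py2, where these names exist
    import cPickle as pickle
    from urllib import urlencode, quote as urllib_quote
    from itertools import izip_longest as zip_longest
    STRING_TYPES = (str, unicode)
    INTEGER_TYPES = (int, long)
    unicodeT = unicode
else:
    import pickle
    from urllib.parse import urlencode, quote as urllib_quote
    from itertools import zip_longest
    STRING_TYPES = (str,)
    INTEGER_TYPES = (int,)
    unicodeT = str
    xrange = range  # xrange is a built-in on Py2 already

# Calling code can then use one spelling for both versions
query = urlencode({"format": "xml"})
pairs = list(zip_longest([1, 2], ["a"]))
is_text = isinstance("example", STRING_TYPES)
```

The calling code never needs to check the Python version itself, which is why PY2 should only be used where alternatives cannot be generalized.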
No dict.iteritems, iterkeys or itervalues
Python-3 no longer has dict.iteritems(). We use dict.items() instead:
# Deprecated:
for x, y in d.iteritems():
# Compatible:
for x, y in d.items():
NB In Python-2, dict.items() returns a list of tuples, but in Python-3 it returns a dict view object that is sensitive to changes of the dict. If a list is required, or the dict is changed inside the loop, the result of dict.items() must be converted into a list explicitly.
The same applies to dict.iterkeys() (use dict.keys() instead) and dict.itervalues() (use dict.values() instead).
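The case of changing the dict inside the loop can be sketched as follows (the example data is made up). Without the list() call, Python-3 would raise a RuntimeError because the dict changes size during iteration.

```python
d = {"a": 1, "b": 2, "c": 3}

# list() takes a snapshot of the items, so deleting from d inside the loop is safe
for key, value in list(d.items()):
    if value % 2:
        del d[key]  # remove all entries with odd values
```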
No dict.has_key
The dict.has_key() method has been removed in Python-3 in favor of the x in y pattern, which is also available (and equivalent) in Python-2.7:
# Deprecated:
if d.has_key(k):
# New Standard:
if k in d:
Map, Filter and Zip return generators
In Python-3, the map(), filter() and zip() functions return lazy iterator objects (which behave much like generators) rather than lists. This is fine when we want to iterate over the result, especially when there is a chance to break out of the loop early.
But where lists are required, the return value must be converted explicitly using the list constructor.
# This is fine:
for item in map(func, values):
# This could be wrong:
result = map(func, values)
# This is better:
result = list(map(func, values))
# This could be even better:
result = [func(v) for v in values]
NB For building a list from a single iterable argument, we prefer list comprehensions over map() or filter() for readability and speed.
NB The iterators returned by map(), filter() and zip() in Python-3 (unlike lists, tuples or sets) can only be iterated over once, and they cannot be accessed by index.
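The one-pass behavior is easy to overlook, so here is a short illustration with made-up values. The comments describe Python-3; in Python-2, both passes would yield the full list.

```python
values = [1, 2, 3]

lazy = map(lambda v: v * 2, values)
first_pass = list(lazy)   # consumes the iterator in Py3
second_pass = list(lazy)  # the iterator is already exhausted in Py3

# A list comprehension gives a real list in both Python versions
doubled = [v * 2 for v in values]
second_item = doubled[1]  # index access works on the list
```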
Don't unicode.encode
Since there is no difference between unicode and str in Python-3, using the encode() method will produce bytes rather than str. A bytes object differs from a string in that it is an array of integers rather than an array of characters. It will also give a distorted result with any later str() or s3_str().
if isinstance(x, unicodeT): # unicodeT maps to str in Py3
    x = x.encode("utf-8") # x becomes a bytes-object in Py3, unlike in Py2 where it becomes a str
str(x) # thus, in Py3, this results in something like "b'example'" instead of the expected "example"
x[1] # is "x" in Py2, but 120 (an integer!) in Py3
If you just want to encode a potential unicode instance as a UTF-8 encoded str, use s3_str rather than unicode.encode:
# Do this instead:
x = s3_str(x)
If you need to exclude non-string types from the conversion, you can keep the type-check:
if isinstance(x, unicodeT):
    x = s3_str(x)
Don't str.decode either
Similarly, the str type has no decode method in Python-3.
To convert a UTF-8 encoded str to unicode in Py2, use s3_unicode.
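The idea behind such a conversion helper can be sketched as follows. The function to_text is hypothetical and only illustrates the type check that makes decoding safe in both Python versions; it is not the implementation of s3_unicode.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    # bytes is an alias of str in Py2 and a separate type in Py3
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

decoded = to_text(b"caf\xc3\xa9")  # UTF-8 encoded bytes
unchanged = to_text(u"caf\xe9")    # already text, returned as-is
```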
NB Do not attempt to utf-8-decode UI strings on the client side either when the server runs Python-3.
next(i) not i.next()
In Python-3 the i.next method of iterators has been renamed to i.__next__, and should not be called explicitly but via the built-in next(i) function.
This function is also available in Python-2.7 (where it calls i.next() instead), so we generally use the next function:
i = iter(l)
# Deprecated:
item = i.next()
# Forward+backward-compatible way to do it:
item = next(i)
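The built-in next() also accepts an optional default in both Python-2.7 and Python-3, which avoids having to catch StopIteration at the end of the iterator. A brief illustration with made-up data:

```python
i = iter(["a", "b"])

first = next(i)
second = next(i)
# The optional second argument is returned instead of raising StopIteration
third = next(i, None)
```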
Cannot use LazyT as sorting key
This may be a temporary issue:
Web2py's LazyT does not define a __lt__ method which is used for sorting in Python-3, but it does define a __cmp__ which is used by Python-2.7 and therefore works. For the same reason, sorting an array of LazyT does not work in Python-3.
As a workaround, wrap the LazyT in s3_str when sorting or using it as sorting key.
a = [T("quick"), T("lazy"), T("fox")]
# This will crash with a TypeError in Py3:
a.sort()
# This will work both in Py2 and Py3:
a.sort(key=lambda item: s3_str(item))
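The effect can be reproduced without web2py. The sketch below uses a hypothetical stand-in class, FakeLazyT, which (like the description of LazyT above) has a string representation but no __lt__ method, and the built-in str in place of s3_str. The outcome noted in the comments is that of Python-3.

```python
class FakeLazyT(object):
    """Hypothetical stand-in for a lazy translation: has __str__, but no __lt__."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

a = [FakeLazyT("quick"), FakeLazyT("lazy"), FakeLazyT("fox")]

try:
    a.sort()
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True  # this is what happens in Py3

# Sorting by the string representation works in both versions
a.sort(key=str)
ordered = [str(item) for item in a]
```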
XML with XML declaration must be bytes
Any XML that contains an XML declaration must be UTF-8 encoded bytes. In Python-2, bytes is synonymous with str - but in Python-3 it is a different data type.
- S3XML.tostring and S3Resource.export_xml will always return bytes
- if string operations must be performed on such XML, use s3_str to convert it
BytesIO
Certain libraries expect a file-like object to represent binary data. Whilst in Python-2 this can be handled with StringIO, Python-3 requires BytesIO instead.
- zipfile expects and returns bytes
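As an illustration of the zipfile case, the following sketch builds an archive in memory and reads it back. It imports BytesIO directly from the standard io module to stay self-contained; in Sahana code, it would be imported from s3compat as stated above. The file name and content are made up.

```python
import zipfile
from io import BytesIO

# Write a small archive into an in-memory binary stream
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "example")

# Read it back: the content is returned as bytes, not as text
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    content = archive.read("hello.txt")

text = content.decode("utf-8")
```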
Sorting requires type consistency
When sorting iterables with i.sort() or sorted(i), all elements must be mutually comparable (in practice: of the same type) - otherwise it will raise a TypeError in Python-3.
This is particularly relevant when the iterable can contain None. In such a case, use a key function to deal with None:
l = [4,2,7,None]
# Works in Py2, but raises a TypeError in Py3:
l.sort()
# Works in both:
l.sort(key=lambda item: item if item is not None else -float('inf'))
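The -float('inf') substitute only works for numbers. For other element types, one possible variant (a sketch, not an established Sahana pattern) is a tuple key whose first element separates None from real values, so that None is never compared with a string:

```python
l = ["pear", None, "apple", None]

# (False, ...) sorts before (True, ...), so None entries come first;
# the empty string replaces None so that only strings are ever compared
l.sort(key=lambda item: (item is not None, item if item is not None else ""))
```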
Turkish letters İ and ı
In Turkish, the letters I and i are not an upper/lowercase pair. Instead, there are two pairs (İ, i) and (I, ı), i.e. one with and one without the dot above.
According to the Unicode spec, the lowercase counterpart of İ is a sequence of two unicode characters, namely the i (with the dot) and the code point U+0307 (COMBINING DOT ABOVE). The latter is there to preserve the information about the dot for the conversion back to uppercase.
Python-2 did not implement this U+0307 mapping, so it converted the letters like this:
# Actually wrong in both cases, but consistently so:
>>> u"İ".lower().upper()
u'I'
>>> u"ı".upper().lower()
u'i'
# NB with utf-8-encoded str, Python-2 doesn't "İ".lower() at all!
>>> print "İ".lower()
İ
Python-3, where all str are unicode, does implement the U+0307 mapping, so the behavior is different:
>>> "İ".lower().upper()
'İ'
# But: same inconsistency as Py2 with the dot-less lowercase ı
>>> "ı".upper().lower()
'i'
Critically, the U+0307 character changes the string length (it's an extra character!):
# Python-2
>>> len(u"İ".lower())
1
# Python-3
>>> len("İ".lower())
2
This is just something to keep in mind - an actual forward/backward compatibility pattern must be developed for the specific use-case. Neither the Python-2 nor the Python-3 behavior is particularly helpful for generalization; the Turkish I's always need special treatment.
Be careful with character-wise string processing after lower() when Turkish letters could be involved!
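To see what character-wise processing would encounter, the code points of the lowercased İ can be listed as in the sketch below. The values in the comments are those of Python-3. Stripping U+0307 afterwards is shown only as one conceivable treatment; whether losing the dot is acceptable depends on the use-case.

```python
lowered = u"\u0130".lower()  # u"\u0130" is the capital İ (with dot above)

# In Py3 this gives two code points: U+0069 (i) followed by U+0307
code_points = ["U+%04X" % ord(c) for c in lowered]

# One conceivable treatment: drop the combining dot to get a single character
stripped = lowered.replace(u"\u0307", u"")
```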
NB Many other platforms, e.g. ECMAScript, implement the same behavior as Python-3 (i.e. the Unicode spec), so there is cross-platform consistency, at least.