Python 2/3 Compatibility

This guideline documents coding conventions to achieve hybrid Python-2.7/Python-3.5 compatibility in Sahana.

It's a working document that will be added to as we move towards full Python-3 compatibility.

Syntax

No print statements

Don't use the print statement anywhere:

print "example" # deprecated

You shouldn't use the print-function either, because it can clash with uWSGI:

print("example") # not good

...but it can be tolerated in CLI scripts which don't run in the WSGI environment.

The best option for debug output is sys.stderr.write:

import sys
sys.stderr.write("example\n") # better

...or the logger (as it can be configured globally to write to a log file instead of the system console):

current.log.debug("example") # even better
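As a rough illustration of why a logger is more flexible than writing to stderr directly, the sketch below uses Python's standard logging module as a stand-in for current.log. The helper name get_debug_logger and the message format are assumptions made for this example, not part of Sahana.

```python
import logging
import sys

def get_debug_logger(name="example", stream=None):
    """Return a logger writing DEBUG messages to `stream` (default: sys.stderr).

    Hypothetical helper: the destination is configured in one place, so
    callers never need print or sys.stderr.write themselves.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Attach the handler only once per logger name
        target = stream if stream is not None else sys.stderr
        handler = logging.StreamHandler(target)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger

log = get_debug_logger()
log.debug("example")  # goes to stderr, never to stdout
```

Redirecting the output to a file would then only require swapping the handler, without touching any calling code.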

Avoid overriding built-ins and wildcard-imports

Except where it is deliberate, we should avoid using the names of built-in functions or types as variable names.

Overriding a built-in function/type can - depending on scope - have adverse side-effects, which can then be very hard to trace back to a cause.

Example:

# Somewhere at the top of the module
range = lambda x, y: (x, y)

...

# Somewhere further down in the module
class Example(object):

    def __init__(self, maximum):

        self.indexes = [x for x in range(0, maximum)]

# Doesn't crash, but doesn't behave as expected either:
#   my_example.indexes should be [0, 1, 2, 3, 4], but instead it is (0, 5)
my_example = Example(5)

The developer of the Example-class will find it difficult to understand why it doesn't work as expected - so we should avoid such situations.

Python-3 introduces some new built-in names (e.g. "input"), so this needs extra care. A good strategy is to use a static code checker (e.g. pylint) to detect overrides, or to use an editor that highlights built-in names right away.

Wildcard imports like from dateutil.rrule import * are particularly tricky: they can (unintentionally) override built-ins without that being obvious (dateutil.rrule overrides range, for instance). Therefore, we should avoid all wildcard imports, except from internal modules (e.g. from s3 import *) where we control what they expose.
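Where no static checker is at hand, a quick runtime check can reveal which names in a namespace collide with built-ins. The following is a minimal sketch; the function name shadowed_builtins is made up for this example.

```python
try:
    import builtins                     # Python-3
except ImportError:
    import __builtin__ as builtins      # Python-2

def shadowed_builtins(namespace):
    """Return the sorted names in `namespace` that collide with built-ins.

    `namespace` can be any iterable of names, e.g. vars(module) or globals().
    Dunder names (__name__, __doc__, ...) are skipped because every module
    defines them.
    """
    builtin_names = set(dir(builtins))
    return sorted(name for name in namespace
                  if name in builtin_names and not name.startswith("__"))

# A namespace that overrides two built-ins:
print(shadowed_builtins({"range": None, "input": None, "helper": None}))
```

Calling shadowed_builtins(vars(some_module)) on a module that is a candidate for a wildcard import shows what that import would override.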

Use as-syntax for catching exceptions

When catching exceptions in a variable, don't use the comma-syntax:

try:
    ...
except Exception, e: # deprecated
    ...

Instead, use the as-keyword:

try:
    ...
except Exception as e: # new standard
    ...

Raise Exception Instances

Python-3 no longer supports the raise E, V, T syntax - the new syntax is raise E(V).with_traceback(T).

However, the traceback is rarely required - and where E is an Exception class (rather than a string), we can use the raise E(V) syntax which works in all Python versions.

# Works in Py2, but not in Py3:
raise SyntaxError, "Error Message"

# Works in all Python versions, hence our Standard:
raise SyntaxError("Error Message")

If a traceback object must be passed (which is rarely needed), then we must use the PY2 constant to implement alternative statements.
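A minimal sketch of the Python-3 side of such an alternative is shown below; reraise_as and parse_number are hypothetical names used only for illustration. Note that the Py2 form raise E, V, T is a syntax error under Python-3 even inside a branch that never runs, so the Py2 variant would have to live in a separate module or be wrapped in exec().

```python
import sys

def reraise_as(exc_class, message):
    """Raise exc_class(message) with the traceback of the exception
    currently being handled (Python-3 syntax only)."""
    traceback = sys.exc_info()[2]
    raise exc_class(message).with_traceback(traceback)

def parse_number(text):
    try:
        return int(text)
    except ValueError:
        # Change the exception type, but keep the original traceback
        reraise_as(RuntimeError, "Not a number: %s" % text)
```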

No exec statements

In Python-3, the exec statement has become a function exec(). Python-2.7 accepts the function-syntax as well, so we use it throughout:

# Deprecated:
exec pyexpr
# New Standard:
exec(pyexpr)

NB The Python-2.7 exec statement also accepts a 3-tuple as parameter, so exec(expr, globals, locals) is valid there and equivalent to calling the exec function in Python-3.
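The function syntax with an explicit namespace can be sketched as follows; the helper name evaluate is an assumption for this example. Passing a dedicated dict keeps the executed code from touching the caller's variables.

```python
def evaluate(source, variables):
    """Execute `source` in a copy of `variables` and return the
    resulting namespace (hypothetical helper)."""
    namespace = dict(variables)
    exec(source, namespace)
    # exec() inserts a __builtins__ entry into the globals dict
    namespace.pop("__builtins__", None)
    return namespace

result = evaluate("y = x + 1", {"x": 1})
```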

No cmp-parameter in sort/sorted

Python-3 no longer supports the cmp parameter for x.sort() and sorted(x). We use the key parameter instead.

For locale-sensitive sorting, use the s3compat alternative sorted_locale:

# Python-2 pattern, not working in Python-3:
x = sorted(x, cmp=locale.strcoll)

# Python-3 pattern, not working with unicode in Python-2:
x = sorted(x, key=locale.strxfrm)

# Compatible pattern:
from s3compat import sorted_locale
x = sorted_locale(x)
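For code that cannot import s3compat, one way to get a similar effect is functools.cmp_to_key, which is available in both Python-2.7 and Python-3 and turns a cmp-style function into a key. This is only a sketch and not the s3compat implementation; the name sorted_locale_sketch is made up.

```python
import locale
from functools import cmp_to_key

def sorted_locale_sketch(items):
    """Sort strings according to the current locale's collation rules."""
    return sorted(items, key=cmp_to_key(locale.strcoll))

names = sorted_locale_sketch(["pear", "apple", "fig"])
```

The result depends on the active locale (see locale.setlocale); in the default "C" locale it is a plain code point ordering.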

Usage

No implicit package-relative imports

Python-3 does not search for modules relative to the current module in the same package - unless explicitly indicated by leading . or .. in the module path.

from s3datetime import s3_format_datetime # inside modules/s3, not working in Python-3

Python-2.7 would search relative to the current module, but on the other hand, it supports the explicit-relative syntax as well.

So we decide that only explicit paths shall be used in imports.

To import a module in the same package (e.g. within s3), either use explicit-relative syntax:

from .s3datetime import s3_format_datetime # inside modules/s3, preferred variant

...or an absolute path relative to modules (or the global python path):

from s3.s3datetime import s3_format_datetime # inside modules/s3, acceptable alternative

Outside of modules/s3, you should always import from the top-level of the s3 package (because the package structure may change over time):

from s3 import s3_format_datetime # outside modules/s3

Alternative Imports

As the locations and names of some libraries have changed in Python-3, we use the compatibility module (modules/s3compat.py) to implement suitable alternatives. Similarly, the compat module provides alternatives for other objects such as types, functions and certain common patterns.

Where an object is provided by modules/s3compat.py, it MUST be imported from there if used.

The following objects are provided by s3compat:

Constants

Name | Type | Comments/Caveats
PY2 | boolean | Constant indicating whether we're currently running on Py2, should only be used if alternatives cannot be generalized

Libraries

Name | Type | Comments/Caveats
Cookie | module | maps to http.cookies in Py3
pickle | module | replaces cPickle in Py2
urlparse | module | maps to urllib.parse in Py3
urllib2 | module | maps to urllib.request in Py3, which contains only part of Py2's urllib2 - some urllib2 objects therefore need to be imported separately (see below)

Functions

Name | Type | Comments/Caveats
reduce | function |
reload | function |
sorted_locale | lambda i | locale-sensitive sorting, sorted(i, cmp=locale.strcoll) in Py2, sorted(i, key=locale.strxfrm) in Py3
name2codepoint | function | from htmlentitydefs (Py2) resp. html.entities (Py3)
unichr | function |
urlencode | function | replaces urllib.urlencode in Py2
urllib_quote | function | replaces urllib.quote in Py2
urlopen | function | replaces urllib.urlopen and urllib2.urlopen in Py2
xrange | function | maps to range in Py3, since xrange no longer exists (but range behaves like it)
zip_longest | function | replaces itertools.izip_longest in Py2 (which has been renamed to zip_longest in Py3)

Types

Name | Type | Comments/Caveats
basestring | type |
long | type | same as int in Py3, so can occasionally lead to redundancy
unicodeT | type | for type checking, instead of unicode in Py2, maps to str in Py3
ClassType | type | replaces types.ClassType (old-style classes) in Py2, maps to type in Py3 (old-style classes no longer exist)

Type Tuples for isinstance()

Name | Type | Comments/Caveats
CLASS_TYPES | tuple | maps to tuple of all known class types: (type, types.ClassType) in Py2, just (type,) in Py3
INTEGER_TYPES | tuple | maps to tuple of all known integer types: (int, long) in Py2, (int,) in Py3
STRING_TYPES | tuple | maps to tuple of all known string types: (str, unicode) in Py2, (str,) in Py3

Other Classes and Exceptions

Name | Type | Comments/Caveats
HTTPError | Exception | replaces urllib2.HTTPError in Py2, NB HTTPError is a subclass of URLError, so must be caught first in order to differentiate
HTMLParser | class | replaces HTMLParser.HTMLParser in Py2
StringIO | class/function | maps to cStringIO.StringIO in Py2 (which is a function rather than a class), so can't use this for type checking
BytesIO | class/function | for binary data streams, same as StringIO in Py2, but different in Py3
URLError | Exception | replaces urllib2.URLError in Py2
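To illustrate the general idea behind such a compatibility module, here is a minimal sketch of how a few of these names could be defined and used. It is not the actual s3compat source; the function describe_error is a hypothetical example of catching HTTPError before URLError.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # These names only exist in Python-2; this branch never runs in Py3
    STRING_TYPES = (str, unicode)   # noqa: F821
    INTEGER_TYPES = (int, long)     # noqa: F821
    from urllib2 import HTTPError, URLError
else:
    STRING_TYPES = (str,)
    INTEGER_TYPES = (int,)
    from urllib.error import HTTPError, URLError

def describe_error(error):
    """Classify an urllib error. HTTPError must be caught first, because
    it is a subclass of URLError and would otherwise never be reached."""
    try:
        raise error
    except HTTPError as e:
        return "HTTP error %s" % e.code
    except URLError as e:
        return "URL error: %s" % e.reason
```

Callers can then write isinstance(x, STRING_TYPES) without caring which Python version is running.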

No dict.iteritems, iterkeys or itervalues

Python-3 no longer has dict.iteritems(). We use dict.items() instead:

# Deprecated:
for x, y in d.iteritems():
# Compatible:
for x, y in d.items():

NB In Python-2, dict.items() returns a list of tuples, but in Python-3 it returns a dict view object that is sensitive to changes of the dict. If a list is required, or the dict is changed inside the loop, the result of dict.items() must be converted into a list explicitly.

The same applies to dict.iterkeys() (use dict.keys() instead) and dict.itervalues() (use dict.values() instead).
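The case where the dict is changed inside the loop can be sketched like this (drop_empty is a hypothetical helper). Without the list() call, Python-3 raises a RuntimeError because the dict changes size during iteration.

```python
def drop_empty(d):
    """Remove all keys whose value is None, modifying d in place."""
    # list() takes a snapshot, so deleting keys inside the loop is safe
    for key, value in list(d.items()):
        if value is None:
            del d[key]
    return d

cleaned = drop_empty({"a": 1, "b": None, "c": 3})
```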

No dict.has_key

The dict.has_key() method has been removed in Python-3 in favor of the x in y pattern, which is also available (and equivalent) in Python-2.7:

# Deprecated:
if d.has_key(k):
# New Standard:
if k in d:

Map, Filter and Zip return iterators

In Python-3, the map(), filter() and zip() functions return lazy iterator objects rather than lists. This is fine when we want to iterate over the result, especially when there is a chance to break out of the loop early.

But where lists are required, the return value must be converted explicitly using the list constructor.

# This is fine:
for item in map(func, values):
# This could be wrong:
result = map(func, values)
# This is better:
result = list(map(func, values))
# This could be even better:
result = [func(v) for v in values]

NB For building a list from a single iterable argument, we prefer list comprehensions over map() or filter() for readability and speed.

NB Iterator objects such as those returned by map(), filter() and zip() in Python-3 (unlike lists, tuples or sets) can only be iterated over once, and they cannot be accessed by index.
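The single-pass behavior is easy to overlook, so here is a short demonstration. The results in the comments apply to Python-3; under Python-2, map() returns a list and both passes would yield the same values.

```python
values = [1, 2, 3]

squares = map(lambda v: v * v, values)
first_pass = list(squares)    # consumes the map object
second_pass = list(squares)   # already exhausted in Python-3

# A list comprehension gives a real list that can be reused and indexed
squares_list = [v * v for v in values]
first_square = squares_list[0]
```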

Don't unicode.encode

Since there is no difference between unicode and str in Python-3, using the encode() method will produce bytes rather than str. A bytes object differs from a string in that it is an array of integers rather than an array of characters. It will also give a distorted result with any later str() or s3_str().

if isinstance(x, unicodeT):  # unicodeT maps to str in Py3
    x = x.encode("utf-8")    # x becomes a bytes-object in Py3, unlike in Py2 where it becomes a str

str(x)                       # thus, in Py3, this results in something like "b'example'" instead of the expected "example"

x[1]                         # is "x" in Py2, but 120 (an integer!) in Py3

If you just want to encode a potential unicode instance as a utf-8 encoded str, use s3_str rather than unicode.encode:

# Do this instead:
x = s3_str(x)

If you need to exclude non-string types from the conversion, you can keep the type-check:

if isinstance(x, unicodeT):
    x = s3_str(x)
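For readers without access to the Sahana code base, the idea of a "native str" conversion can be sketched as below. This is a hypothetical stand-in written for illustration (to_native_str is a made-up name), not the actual s3_str implementation.

```python
import sys

PY2 = sys.version_info[0] == 2

def to_native_str(value, encoding="utf-8"):
    """Convert value into the native str type of the running Python."""
    if PY2:
        # Py2: native str is a byte string, so unicode must be encoded
        if isinstance(value, unicode):   # noqa: F821 (Py2-only name)
            return value.encode(encoding)
        return str(value)
    # Py3: native str is text, so bytes must be decoded
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)
```

In both versions the result is always of type str, which is what most string operations in the code base expect.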

Don't str.decode either

Similarly, the str type has no decode method in Python-3.

To convert a utf-8 encoded str to unicode in Py2, use s3_unicode.
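The underlying pattern is to decode only when the value is actually a byte string. A minimal sketch, assuming UTF-8 input (to_text is a hypothetical name and not the s3_unicode implementation):

```python
def to_text(value, encoding="utf-8"):
    """Return value as text: byte strings are decoded, text passes through.

    In Py2, bytes is an alias of str, so the same check covers both versions.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```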

NB Do not attempt to utf-8-decode UI strings on the client side either when the server runs Python-3.

next(i) not i.next()

In Python-3, the i.next method of iterators has been renamed to i.__next__, and should not be called explicitly but via the built-in next(i) function.

This function is also available in Python-2.7 (where it calls i.next() instead), so we generally use the next function:

i = iter(l)
# Deprecated:
item = i.next()
# Forward+backward-compatible way to do it:
item = next(i)
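The built-in next() also accepts a default value as second argument, which avoids having to catch StopIteration when the iterator is exhausted. A small sketch using this (first_match is a hypothetical helper):

```python
def first_match(items, predicate, default=None):
    """Return the first item satisfying predicate, or default if none does."""
    return next((item for item in items if predicate(item)), default)

first_even = first_match([1, 3, 4, 6], lambda n: n % 2 == 0)
no_match = first_match([1, 3], lambda n: n % 2 == 0, default="none")
```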

Cannot use lazyT as sorting key

This may be a temporary issue: web2py's lazyT does not define the __lt__ method that Python-3 uses for sorting, only the __cmp__ method that Python-2.7 uses. Sorting a list of lazyT instances, or using them as sorting keys, therefore works in Python-2.7 but fails in Python-3.

As a workaround, wrap the lazyT in s3_str when sorting or using it as sorting key.

a = [T("quick"), T("lazy"), T("fox")]
# This will crash with a TypeError in Py3:
a.sort()
# This will work both in Py2 and Py3:
a.sort(key=lambda item: s3_str(item))
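The effect can be reproduced without web2py using a stand-in class that, like lazyT, only defines __cmp__. FakeLazy is a hypothetical class for this demonstration, and the built-in str takes the place of s3_str.

```python
class FakeLazy(object):
    """Stand-in for lazyT: defines __cmp__, but no __lt__."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __cmp__(self, other):
        # Only honored by Python-2; Python-3 ignores __cmp__ entirely
        return (str(self) > str(other)) - (str(self) < str(other))

items = [FakeLazy("quick"), FakeLazy("lazy"), FakeLazy("fox")]

try:
    sorted(items)
    plain_sort_works = True     # Python-2
except TypeError:
    plain_sort_works = False    # Python-3: no __lt__ available

# Sorting by the string representation works in both versions
ordered = [str(item) for item in sorted(items, key=str)]
```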

XML with XML declaration must be bytes

Any XML that contains an XML declaration must be UTF-8 encoded bytes. In Python-2, bytes is synonymous with str - but in Python-3 it is a different data type.

  • S3XML.tostring and S3Resource.export_xml will always return bytes
  • if string operations must be performed on such XML, use s3_str to convert it

BytesIO

Certain libraries expect a file-like object to represent binary data. Whilst this can be handled with StringIO in Python-2, Python-3 requires BytesIO instead.

  • zipfile expects and returns bytes
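A minimal sketch of an in-memory zip round trip with BytesIO follows; the helper names zip_bytes and unzip_bytes are made up for this example. It uses io.BytesIO directly, which exists in both Python-2.7 and Python-3.

```python
import zipfile
from io import BytesIO

def zip_bytes(files):
    """Pack a dict {filename: bytes} into a zip archive, returned as bytes."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()

def unzip_bytes(payload):
    """Unpack zip archive bytes into a dict {filename: bytes}."""
    with zipfile.ZipFile(BytesIO(payload)) as archive:
        return dict((name, archive.read(name)) for name in archive.namelist())

payload = zip_bytes({"hello.txt": b"hello"})
contents = unzip_bytes(payload)
```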

Sorting requires type consistency

When sorting iterables with i.sort() or sorted(i), all elements must be mutually comparable, which in practice means they should have the same type - otherwise Python-3 raises a TypeError.

This is particularly relevant when the iterable can contain None. In such a case, use a key function to deal with None:

l = [4,2,7,None]

# Works in Py2, but raises a TypeError in Py3:
l.sort()

# Works in both:
l.sort(key=lambda item: item if item is not None else -float('inf'))
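The -float('inf') substitute only works for numbers. For other element types (e.g. strings), a tuple key is a possible alternative: its first element decides where None goes, so None itself is never compared with a real value. A sketch with a hypothetical key function none_first:

```python
def none_first(item):
    """Sorting key that places None before all other values."""
    # (False, None) sorts before (True, value); the second element is
    # only compared between two non-None items
    return (item is not None, item)

numbers = sorted([4, 2, 7, None], key=none_first)
words = sorted(["b", None, "a"], key=none_first)
```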

Turkish letters İ and ı

In Turkish, the letters I and i are not an upper/lowercase pair. Instead, there are two pairs (İ, i) and (I, ı), i.e. one with and one without the dot above.

According to the Unicode spec, the lowercase counterpart of İ is a sequence of two unicode characters, namely the i (with the dot) and the code point U+0307, which means "combining dot above". The latter is there to preserve the information about the dot for the conversion back to uppercase.

Python-2 did not implement the U+0307 mapping, so it converted the letters like this:

# Actually wrong in both cases, but consistently so:
>>> u"İ".lower().upper()
u'I'
>>> u"ı".upper().lower()
u'i'

# NB with utf-8-encoded str, Python-2 doesn't "İ".lower() at all!
>>> print "İ".lower()
İ

Python-3, where all str are unicode, does implement the U+0307 mapping, so the behavior is different:

>>> "İ".lower().upper()
'İ'
# But: same inconsistency as Py2 with the dot-less lowercase ı
>>> "ı".upper().lower()
'i'

Critically, the U+0307 character changes the string length (it's an extra character!):

# Python-2
>>> len(u"İ".lower())
1

# Python-3
>>> len("İ".lower())
2

This is just something to keep in mind - an actual forward/backward compatibility pattern must be developed for the specific use-case. Neither the Python-2 nor the Python-3 behavior is particularly helpful for generalization; the Turkish I's always need special treatment.

Be careful with character-wise string processing after lower() when Turkish letters could be involved!
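As one possible shape of such special treatment, the sketch below maps the two Turkish uppercase letters explicitly before calling lower(), so the string length stays unchanged. The function turkish_lower is hypothetical and only appropriate where the text is known to be Turkish, since it would also turn an English "I" into a dotless ı.

```python
def turkish_lower(text):
    """Lowercase text following Turkish rules for the letters I and İ."""
    # U+0130 is İ (capital I with dot above), U+0131 is ı (dotless i)
    text = text.replace(u"\u0130", u"i").replace(u"I", u"\u0131")
    return text.lower()

city = turkish_lower(u"D\u0130YARBAKIR")
```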

NB Many other platforms, e.g. ECMAScript, implement the same behavior as Python-3 (i.e. the Unicode spec), so there is cross-platform consistency, at least.

Last modified on 07/24/19 08:24:37