Changes between Version 61 and Version 62 of DeveloperGuidelines/CodeConventions

Timestamp: 01/26/16 12:11:40 (10 years ago)
Author: Dominic König
Comment: --

DeveloperGuidelines/CodeConventions

-              v61
+              v62
 Such 'unicode' objects are not printable, though, i.e. they are not generally understood outside of the Python VM. When writing to interfaces, unicode-objects must be ''encoded'' as strings of printable characters, which Python represents as 'str' objects. The most common character encoding that covers all unicode characters is UTF-8.
-The str() constructor in Python 2 assumes that its argument is ASCII-encoded, and raises an exception for unicode-objects that contain non-ASCII characters. To prevent that, we must implement safe ways for converting unicode into str, ''enforcing'' UTF-8 encoding.
-Additionally, indices in str objects count byte-wise, not character-wise - which can lead to invalid characters when extracting substrings from UTF-8 encoded strings. Further, in Python 2, str.lower() and str.upper() may not work correctly for some unicode characters (e.g. "Ẽ".lower() gives "Ẽ" again - instead of "ẽ"), depending on the server locale setting. Therefore, for any substring- or character-operations we must safely ''decode'' the str into a unicode object, ''assuming'' UTF-8 encoding.
+The str() constructor in Python 2 assumes that its argument is ASCII-encoded, and raises an exception for unicode-objects that contain non-ASCII characters. To prevent that, we must implement safe ways to '''encode''' unicode into str, ''enforcing'' UTF-8 encoding.
+Additionally, indices in str objects count byte-wise, not character-wise - which can lead to invalid characters when extracting substrings from UTF-8 encoded strings. Further, in Python 2, str.lower() and str.upper() may not work correctly for some unicode characters (e.g. "Ẽ".lower() gives "Ẽ" again - instead of "ẽ"), depending on the server locale setting. Therefore, for any substring- or character-operations we must safely '''decode''' the str into a unicode object, ''assuming'' UTF-8 encoding.
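
The encode/decode rule in the two added paragraphs can be illustrated with a minimal sketch. The helper names to_utf8 and to_unicode are hypothetical and chosen only for this example; they are not taken from the guideline or from any library. The version check is likewise an assumption, added so that the same code runs on Python 2 (where the byte string type is str) and on Python 3 (where it is bytes).

```python
# -*- coding: utf-8 -*-
# Sketch only: to_utf8() and to_unicode() are hypothetical helper names,
# not functions defined by the guideline.

import sys

if sys.version_info[0] >= 3:
    text_type, bytes_type = str, bytes
else:
    text_type, bytes_type = unicode, str  # Python 2 names


def to_utf8(obj):
    """Encode a unicode object as UTF-8; byte strings pass through."""
    if isinstance(obj, text_type):
        return obj.encode("utf-8")
    return obj


def to_unicode(obj):
    """Decode a byte string into unicode, assuming UTF-8 encoding."""
    if isinstance(obj, bytes_type):
        return obj.decode("utf-8")
    return obj


text = u"\u1ebcx"          # "Ẽx": two characters
encoded = to_utf8(text)    # byte string for writing to an interface

char_count = len(text)     # counts characters
byte_count = len(encoded)  # counts bytes; "Ẽ" takes three bytes in UTF-8

# Slicing the byte string cuts through the multi-byte character,
# so the fragment is no longer valid UTF-8.
try:
    encoded[:1].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# Decode first, then do substring and case operations on unicode.
first_char = to_unicode(encoded)[:1]
lowered = to_unicode(encoded).lower()
```

Decoding before slicing or changing case keeps the operation character-wise and independent of the server locale; encoding happens only at the point where the data leaves the Python VM.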
 ==== Unicode-Guideline ====