Changes between Version 61 and Version 62 of DeveloperGuidelines/CodeConventions


Ignore:
Timestamp:
01/26/16 12:11:40 (9 years ago)
Author:
Dominic König
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • DeveloperGuidelines/CodeConventions

    v61 v62  
    9090Such 'unicode' objects are not printable, though, i.e. they are not generally understood outside of the Python VM. When writing to interfaces, unicode-objects must be ''encoded'' as strings of printable characters, which Python represents as 'str' objects. The most common character encoding that covers all unicode characters is UTF-8.
    9191
    92 The str() constructor in Python 2 assumes that its argument is ASCII-encoded, and raises an exception for unicode-objects that contain non-ASCII characters. To prevent that, we must implement safe ways for converting unicode into str, ''enforcing'' UTF-8 encoding.
    93 
    94 Additionally, indices in str objects count byte-wise, not character-wise - which can lead to invalid characters when extracting substrings from UTF-8 encoded strings. Further, in Python 2, str.lower() and str.upper() may not work correctly for some unicode characters (e.g. "Ẽ".lower() gives "Ẽ" again - instead of "ẽ"), depending on the server locale setting. Therefore, for any substring- or character-operations we must safely ''decode'' the str into a unicode object, ''assuming'' UTF-8 encoding.
    95 
     92The str() constructor in Python 2 assumes that its argument is ASCII-encoded, and raises an exception for unicode-objects that contain non-ASCII characters. To prevent that, we must implement safe ways to '''encode''' unicode into str, ''enforcing'' UTF-8 encoding.
     93
     94Additionally, indices in str objects count byte-wise, not character-wise - which can lead to invalid characters when extracting substrings from UTF-8 encoded strings. Further, in Python 2, str.lower() and str.upper() may not work correctly for some unicode characters (e.g. "Ẽ".lower() gives "Ẽ" again - instead of "ẽ"), depending on the server locale setting. Therefore, for any substring- or character-operations we must safely '''decode''' the str into a unicode object, ''assuming'' UTF-8 encoding.
    9695==== Unicode-Guideline ====
    9796