92 | | The str() constructor in Python 2 assumes that its argument is ASCII-encoded, and raises an exception for unicode-objects that contain non-ASCII characters. To prevent that, we must implement safe ways for converting unicode into str, ''enforcing'' UTF-8 encoding. |
93 | | |
94 | | Additionally, indices in str objects count byte-wise, not character-wise - which can lead to invalid characters when extracting substrings from UTF-8 encoded strings. Further, in Python 2, str.lower() and str.upper() may not work correctly for some unicode characters (e.g. "Ẽ".lower() gives "Ẽ" again - instead of "ẽ"), depending on the server locale setting. Therefore, for any substring- or character-operations we must safely ''decode'' the str into a unicode object, ''assuming'' UTF-8 encoding. |
95 | | |
| 92 | In Python 2, the str() constructor encodes unicode objects with the default ASCII codec, and raises a UnicodeEncodeError for unicode objects that contain non-ASCII characters. To prevent that, we must implement safe ways to '''encode''' unicode into str, ''enforcing'' UTF-8 encoding. |
| 93 | |
| 94 | Additionally, indices in str objects count bytes, not characters, which can produce invalid characters when extracting substrings from UTF-8 encoded strings. Further, in Python 2, str.lower() and str.upper() may not work correctly for some unicode characters (e.g. "Ẽ".lower() gives "Ẽ" again instead of "ẽ"), depending on the server locale setting. Therefore, for any substring or character operations we must safely '''decode''' the str into a unicode object, ''assuming'' UTF-8 encoding. |
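
The encode and decode steps described above can be sketched as follows. This is only a minimal illustration, not the project's actual implementation: the helper names `to_utf8` and `to_unicode` are hypothetical, and the sketch assumes that every byte string it receives is UTF-8 encoded. It is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration; the names are assumptions,
# not functions from the original text.

text_type = type(u"")    # unicode on Python 2, str on Python 3
bytes_type = type(b"")   # str on Python 2, bytes on Python 3


def to_utf8(value):
    """Encode text as UTF-8 bytes instead of relying on the ASCII default."""
    if isinstance(value, bytes_type):
        # Already a byte string; assumed (not verified) to be UTF-8
        return value
    if not isinstance(value, text_type):
        # Convert other objects (numbers etc.) to text first
        value = text_type(value)
    return value.encode("utf-8")


def to_unicode(value, errors="strict"):
    """Decode a byte string into text, assuming UTF-8 encoding."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes_type):
        return value.decode("utf-8", errors)
    return text_type(value)


# U+1EBC is "Ẽ"; it occupies three bytes in UTF-8
encoded = to_utf8(u"\u1ebc")
decoded = to_unicode(encoded)

byte_length = len(encoded)    # counts bytes
char_length = len(decoded)    # counts characters
lowered = decoded.lower()     # case mapping on the decoded text
```

Here `byte_length` is 3 while `char_length` is 1, which is why slicing should happen on the decoded object, and `lowered` is U+1EBD ("ẽ") regardless of the locale. Passing `errors="replace"` to `to_unicode` is one option when the input may not be valid UTF-8, at the cost of silently substituting characters.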