Changes between Version 74 and Version 75 of DeveloperGuidelines/Py_2_3


Timestamp:
07/10/19 22:31:41 (5 years ago)
Author:
Dominic König
Comment:

--

  • DeveloperGuidelines/Py_2_3

    v74 v75  
    315315l.sort(key=lambda item: item if item is not None else -float('inf'))
    316316}}}
     317
     318=== Turkish letters İ and ı ===
     319
     320In Turkish, the letters {{{I}}} and {{{i}}} are not an uppercase/lowercase pair. Instead, there are two pairs, {{{(İ, i)}}} and {{{(I, ı)}}}, i.e. one with and one without the dot above.
     321
     322According to the Unicode spec, the lowercase counterpart of {{{İ}}} is a sequence of two Unicode characters, namely {{{i}}} (with the dot) and the code point U+0307 (COMBINING DOT ABOVE). The latter is there to preserve the information about the dot for the conversion back to uppercase.
     323
     324Python-2 does not produce the U+0307 character when lowercasing (it only applies one-to-one case mappings), so it converts the letters like this:
     325{{{#!python
     326>>> u"İ".lower().upper()
     327u'I'
     328>>> u"ı".upper().lower()
     329u'i'
     330
     331# NB with a utf-8-encoded str, Python-2 doesn't lowercase "İ" at all!
     332>>> print "İ".lower()
     333İ
     334}}}
     335
     336Python-3 does produce the U+0307 character when lowercasing, so the behavior is different:
     337{{{#!python
     338>>> "İ".lower().upper()
     339'İ'
     340>>> "ı".upper().lower()
     341'i'
     342}}}
     343
     344Critically, the U+0307 character changes the string length (it's an extra character!):
     345{{{#!python
     346# Python-2
     347>>> len(u"İ".lower())
     3481
     349
     350# Python-3
     351>>> len("İ".lower())
     3522
     353}}}
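To see where the extra character comes from, the lowercased string can be inspected code point by code point. The following is a minimal Python-3 sketch using only the standard {{{unicodedata}}} module; the variable names are purely illustrative. It also shows that the round trip through {{{upper()}}} yields the decomposed form ({{{I}}} followed by U+0307), which only becomes the single code point U+0130 again after NFC normalization:
{{{#!python
import unicodedata

# U+0130 is LATIN CAPITAL LETTER I WITH DOT ABOVE; written as an escape
# so the example does not depend on the source file encoding
lowered = "\u0130".lower()

# List the Unicode names of the resulting code points
names = [unicodedata.name(ch) for ch in lowered]

# upper() yields "I" + U+0307 (two code points); NFC composes them
# back into the single precomposed code point U+0130
roundtrip_raw = lowered.upper()
roundtrip_nfc = unicodedata.normalize("NFC", roundtrip_raw)

print(names)
print(len(roundtrip_raw), len(roundtrip_nfc))
}}}
Whether NFC normalization is appropriate depends on the use-case, e.g. on whether the strings are compared against data that was stored in composed form.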
     354
     355This is just something to keep in mind - an actual forward/backward compatibility pattern must be developed for the specific use-case. Neither the Python-2 nor the Python-3 behavior is particularly helpful for generalization; the Turkish I's always need special treatment.
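
As an illustration of what such special treatment could look like, the sketch below maps the two Turkish pairs explicitly before falling back to the built-in case conversion. The helper names {{{turkish_lower}}} and {{{turkish_upper}}} are hypothetical (they are not part of any existing API), and the sketch assumes that the input is unicode text in composed form and is known to be Turkish:
{{{#!python
# -*- coding: utf-8 -*-

# Hypothetical helpers, assuming unicode input that is known to be Turkish.
# U+0130 = dotted capital I, U+0131 = dotless small i.

def turkish_lower(text):
    """Lowercase with the Turkish pairs (U+0130, i) and (I, U+0131)."""
    # Handle both I's first, so the built-in lower() never sees them
    return text.replace(u"\u0130", u"i").replace(u"I", u"\u0131").lower()

def turkish_upper(text):
    """Uppercase with the Turkish pairs (U+0130, i) and (I, U+0131)."""
    # Handle both i's first, so the built-in upper() never sees them
    return text.replace(u"i", u"\u0130").replace(u"\u0131", u"I").upper()
}}}
Because the two I's are replaced one-to-one before the generic conversion runs, no U+0307 is introduced and the length of these letters is preserved under both Python versions. Applying the helpers to non-Turkish text would be wrong (e.g. English "I" would become "ı"), and input that already contains a decomposed "i" + U+0307 sequence is not handled, so this is only a starting point for a use-case-specific pattern.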