    strconv: Switch _utf8_decode_rune to return rune ordinal instead of unicode character · 5cc679ac
    Kirill Smelkov authored
    This is a preparatory step for the next patch, which fixes strconv
    for Python2 builds with --enable-unicode=ucs2, where a single unicode
    character can take _2_ code units (a surrogate pair).
    
    In that general case, representing runes as unicode objects is
    fragile, because many operations do not work for characters at
    U+10000 and above; for example, ord breaks:
    
        >>> import sys
        >>> sys.maxunicode
        65535                       <-- NOTE indicates UCS2 build
        >>> s = u'\U00012345'
        >>> s
        u'\U00012345'
        >>> s.encode('utf-8')
        '\xf0\x92\x8d\x85'
        >>> len(s)
        2                           <-- NOTE _not_ 1
        >>> ord(s)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: ord() expected a character, but string of length 2 found
    
    so we switch to representing runes as integers, similarly to what Go does.
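
For illustration only, here is a minimal sketch of decoding one UTF-8 sequence straight to an integer ordinal, so that no unicode object (and hence no surrogate pair) is involved. It is not the actual _utf8_decode_rune from strconv.py: the names decode_rune and RUNE_ERROR are hypothetical, and the (ordinal, size) return convention is assumed here by analogy with Go's utf8.DecodeRune.

```python
# Hypothetical sketch, not the strconv.py implementation.
RUNE_ERROR = 0xFFFD  # U+FFFD, the value Go uses for utf8.RuneError


def decode_rune(data):
    """Decode the first UTF-8 sequence of data -> (ordinal, size).

    Returns (RUNE_ERROR, 0) for empty input and (RUNE_ERROR, 1) for
    invalid input, mirroring Go's utf8.DecodeRune convention.
    """
    b = bytearray(data)  # yields integers on both Python2 and Python3
    if len(b) == 0:
        return RUNE_ERROR, 0

    b0 = b[0]
    if b0 < 0x80:  # ASCII: one byte
        return b0, 1

    # Pick sequence length, payload bits of the lead byte, and the
    # smallest ordinal allowed for that length (rejects overlong forms).
    if 0xC2 <= b0 <= 0xDF:
        n, r, rmin = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        n, r, rmin = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        n, r, rmin = 4, b0 & 0x07, 0x10000
    else:
        return RUNE_ERROR, 1  # stray continuation byte or invalid lead byte

    if len(b) < n:
        return RUNE_ERROR, 1  # truncated sequence

    for c in b[1:n]:
        if c & 0xC0 != 0x80:  # continuation bytes must look like 10xxxxxx
            return RUNE_ERROR, 1
        r = (r << 6) | (c & 0x3F)

    # Reject overlong encodings, surrogates and values past U+10FFFF.
    if r < rmin or r > 0x10FFFF or 0xD800 <= r <= 0xDFFF:
        return RUNE_ERROR, 1
    return r, n
```

With this sketch, the bytes '\xf0\x92\x8d\x85' from the session above decode to the single integer 0x12345 with size 4, regardless of how wide the interpreter's unicode type is.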
strconv.py 8.52 KB