strconv: Switch _utf8_decode_rune to return rune ordinal instead of unicode character
This is a preparatory step for the next patch, which fixes strconv for
Python2 builds with --enable-unicode=ucs2, where a single unicode character
can take _2_ code units (a surrogate pair). In that general case relying on
unicode objects to represent runes is not good, because many things do not
work for U+10000 and above, e.g. ord breaks:

    >>> import sys
    >>> sys.maxunicode
    65535     <-- NOTE indicates UCS2 build
    >>> s = u'\U00012345'
    >>> s
    u'\U00012345'
    >>> s.encode('utf-8')
    '\xf0\x92\x8d\x85'
    >>> len(s)
    2         <-- NOTE _not_ 1
    >>> ord(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: ord() expected a character, but string of length 2 found

so we switch to representing runes as integers, similarly to what Go does.
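For illustration only, here is a minimal sketch of what "return the rune ordinal" can look like. It is not the actual _utf8_decode_rune from this patch: the name decode_rune_ordinal and the RUNE_ERROR constant are hypothetical, and the (U+FFFD, 1) error convention is borrowed from Go's utf8.DecodeRune as an assumption. Because the result is a plain integer, it does not depend on how wide the interpreter's unicode objects are.

```python
RUNE_ERROR = 0xFFFD  # hypothetical constant; same value as Go's utf8.RuneError


def decode_rune_ordinal(data):
    """Decode the first UTF-8 sequence in data -> (rune ordinal, size in bytes).

    Invalid input yields (RUNE_ERROR, 1); empty input yields (RUNE_ERROR, 0).
    """
    b = bytearray(data)  # indexing yields ints on both Python 2 and Python 3
    if len(b) == 0:
        return RUNE_ERROR, 0

    b0 = b[0]
    if b0 < 0x80:                      # 1-byte sequence (ASCII)
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:             # 2-byte sequence
        need, r, lowest = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:           # 3-byte sequence
        need, r, lowest = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:           # 4-byte sequence
        need, r, lowest = 3, b0 & 0x07, 0x10000
    else:                              # stray continuation or invalid lead byte
        return RUNE_ERROR, 1

    if len(b) < 1 + need:              # truncated sequence
        return RUNE_ERROR, 1
    for c in b[1:1 + need]:
        if c & 0xC0 != 0x80:           # not a continuation byte
            return RUNE_ERROR, 1
        r = (r << 6) | (c & 0x3F)

    # reject overlong encodings, surrogates and values above U+10FFFF
    if r < lowest or r > 0x10FFFF or 0xD800 <= r <= 0xDFFF:
        return RUNE_ERROR, 1
    return r, 1 + need
```

With the bytes from the session above, decode_rune_ordinal(b'\xf0\x92\x8d\x85') gives (0x12345, 4), with no call to ord on a length-2 unicode string.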