Commit 9bf2b3ae authored by Ezio Melotti

Update comment about surrogates.

parent 2f194b90
@@ -2450,11 +2450,11 @@ PyObject *PyUnicode_DecodeUTF8Stateful(const char *s,
             break;
         case 3:
-            /* XXX: surrogates shouldn't be valid UTF-8!
-               see http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf
-               (table 3-7) and http://www.rfc-editor.org/rfc/rfc3629.txt
-               Uncomment the 2 lines below to make them invalid,
-               codepoints: d800-dfff; UTF-8: \xed\xa0\x80-\xed\xbf\xbf. */
+            /* Decoding UTF-8 sequences in range \xed\xa0\x80-\xed\xbf\xbf
+               will result in surrogates in range d800-dfff. Surrogates are
+               not valid UTF-8 so they are rejected.
+               See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf
+               (table 3-7) and http://www.rfc-editor.org/rfc/rfc3629.txt */
             if ((s[1] & 0xc0) != 0x80 ||
                 (s[2] & 0xc0) != 0x80 ||
                 ((unsigned char)s[0] == 0xE0 &&
......
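
The updated comment says that the byte range \xed\xa0\x80-\xed\xbf\xbf would decode to the surrogates d800-dfff and is therefore rejected. A minimal sketch of that arithmetic follows. The helper name `decode3` and its return convention are assumptions made for illustration; this is not the code in PyUnicode_DecodeUTF8Stateful, whose full condition is truncated in the hunk above.

```c
/* Hypothetical helper, not CPython's code: decode one 3-byte UTF-8
   sequence. Returns the code point, or -1 if the sequence is invalid. */
static long decode3(const unsigned char *s)
{
    long ch;

    /* Lead byte must be 1110xxxx; both trailing bytes must be 10xxxxxx. */
    if ((s[0] & 0xf0) != 0xe0 ||
        (s[1] & 0xc0) != 0x80 ||
        (s[2] & 0xc0) != 0x80)
        return -1;

    /* 4 payload bits from the lead byte, 6 from each trailing byte. */
    ch = ((long)(s[0] & 0x0f) << 12) |
         ((long)(s[1] & 0x3f) << 6) |
          (long)(s[2] & 0x3f);

    /* Overlong form: values below 0x0800 fit in fewer bytes. */
    if (ch < 0x0800)
        return -1;

    /* Surrogates d800-dfff, i.e. \xed\xa0\x80-\xed\xbf\xbf, are rejected. */
    if (ch >= 0xd800 && ch <= 0xdfff)
        return -1;

    return ch;
}
```

With this sketch, the two ends of the surrogate range, \xed\xa0\x80 and \xed\xbf\xbf, both yield -1, while the neighbouring sequences \xed\x9f\xbf and \xee\x80\x80 decode to 0xd7ff and 0xe000.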