Kirill Smelkov / cpython
Commit 07985ef3, authored Jan 25, 2015 by Serhiy Storchaka
Issue #22286: The "backslashreplace" error handlers now works with
decoding and translating.
Parent: 58f02019

Showing 10 changed files with 196 additions and 83 deletions:

Doc/howto/unicode.rst              +5   -2
Doc/library/codecs.rst             +9   -5
Doc/library/functions.rst          +2   -3
Doc/library/io.rst                 +6   -5
Doc/whatsnew/3.5.rst               +3   -1
Lib/codecs.py                      +6   -3
Lib/test/test_codeccallbacks.py    +15  -11
Lib/test/test_codecs.py            +56  -0
Misc/NEWS                          +3   -0
Python/codecs.c                    +91  -53
Doc/howto/unicode.rst
@@ -280,8 +280,9 @@ and optionally an *errors* argument.
 The *errors* argument specifies the response when the input string can't be
 converted according to the encoding's rules.  Legal values for this argument are
 ``'strict'`` (raise a :exc:`UnicodeDecodeError` exception), ``'replace'`` (use
-``U+FFFD``, ``REPLACEMENT CHARACTER``), or ``'ignore'`` (just leave the
-character out of the Unicode result).
+``U+FFFD``, ``REPLACEMENT CHARACTER``), ``'ignore'`` (just leave the
+character out of the Unicode result), or ``'backslashreplace'`` (inserts a
+``\xNN`` escape sequence).
 The following examples show the differences::
 
     >>> b'\x80abc'.decode("utf-8", "strict")  #doctest: +NORMALIZE_WHITESPACE
...
@@ -291,6 +292,8 @@ The following examples show the differences::
       invalid start byte
     >>> b'\x80abc'.decode("utf-8", "replace")
     '\ufffdabc'
+    >>> b'\x80abc'.decode("utf-8", "backslashreplace")
+    '\\x80abc'
     >>> b'\x80abc'.decode("utf-8", "ignore")
     'abc'
...
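The documented behaviour can be checked directly. The following is a minimal sketch, assuming Python 3.5 or later (the first release that includes this commit); the variable names are illustrative and not part of the patch.

```python
# Decode a byte string whose first byte is not a valid UTF-8 start byte,
# once with each of the non-strict error handlers described above.
data = b'\x80abc'

replaced = data.decode("utf-8", "replace")
escaped = data.decode("utf-8", "backslashreplace")
ignored = data.decode("utf-8", "ignore")

assert replaced == '\ufffdabc'   # U+FFFD REPLACEMENT CHARACTER, then "abc"
assert escaped == '\\x80abc'     # a literal backslash followed by "x80abc"
assert ignored == 'abc'          # the offending byte is dropped
```

Unlike ``'replace'`` and ``'ignore'``, the ``'backslashreplace'`` result keeps the value of the offending byte visible in the decoded text.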
Doc/library/codecs.rst
@@ -314,8 +314,8 @@ The following error handlers are only applicable to
 |                         | reference (only for encoding). Implemented    |
 |                         | in :func:`xmlcharrefreplace_errors`.          |
 +-------------------------+-----------------------------------------------+
-| ``'backslashreplace'``  | Replace with backslashed escape sequences     |
-|                         | (only for encoding). Implemented in           |
+| ``'backslashreplace'``  | Replace with backslashed escape sequences.    |
+|                         | Implemented in                                |
 |                         | :func:`backslashreplace_errors`.              |
 +-------------------------+-----------------------------------------------+
 | ``'namereplace'``       | Replace with ``\N{...}`` escape sequences     |
...
@@ -350,6 +350,10 @@ In addition, the following error handler is specific to the given codecs:
 .. versionadded:: 3.5
    The ``'namereplace'`` error handler.
 
+.. versionchanged:: 3.5
+   The ``'backslashreplace'`` error handlers now works with decoding and
+   translating.
+
 The set of allowed values can be extended by registering a new named error
 handler:
...
@@ -417,9 +421,9 @@ functions:
 .. function:: backslashreplace_errors(exception)
 
-   Implements the ``'backslashreplace'`` error handling (for encoding with
-   :term:`text encodings <text encoding>` only): the unencodable character is
+   Implements the ``'backslashreplace'`` error handling (for
+   :term:`text encodings <text encoding>` only): malformed data is
    replaced by a backslashed escape sequence.
 
 .. function:: namereplace_errors(exception)
...
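The handler function can also be called directly with a hand-built exception, which is how the updated tests in this commit exercise it. A small sketch, assuming Python 3.5 or later; the exception arguments mirror those used in Lib/test/test_codeccallbacks.py, and the variable names are illustrative.

```python
import codecs

# Decoding error: the byte 0xff in the range [0, 1) could not be decoded.
decode_exc = UnicodeDecodeError("ascii", b"\xff", 0, 1, "ouch")
decode_result = codecs.backslashreplace_errors(decode_exc)

# Translating error: the character U+3042 in the range [0, 1).
translate_exc = UnicodeTranslateError("\u3042", 0, 1, "ouch")
translate_result = codecs.backslashreplace_errors(translate_exc)

# Encoding errors were already accepted before this change.
encode_exc = UnicodeEncodeError("ascii", "\u3042", 0, 1, "ouch")
encode_result = codecs.backslashreplace_errors(encode_exc)

# Each result is a (replacement string, position to resume at) pair.
assert decode_result == ("\\xff", 1)
assert translate_result == ("\\u3042", 1)
assert encode_result == ("\\u3042", 1)
```

Before this commit, the first two calls raised :exc:`TypeError`, as the removed assertions in the test diff show.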
Doc/library/functions.rst
@@ -973,9 +973,8 @@ are always available. They are listed here in alphabetical order.
      Characters not supported by the encoding are replaced with the
      appropriate XML character reference ``&#nnn;``.
 
-   * ``'backslashreplace'`` (also only supported when writing)
-     replaces unsupported characters with Python's backslashed escape
-     sequences.
+   * ``'backslashreplace'`` replaces malformed data by Python's backslashed
+     escape sequences.
 
    * ``'namereplace'`` (also only supported when writing)
      replaces unsupported characters with ``\N{...}`` escape sequences.
...
Doc/library/io.rst
@@ -825,11 +825,12 @@ Text I/O
    exception if there is an encoding error (the default of ``None`` has the same
    effect), or pass ``'ignore'`` to ignore errors.  (Note that ignoring encoding
    errors can lead to data loss.)  ``'replace'`` causes a replacement marker
-   (such as ``'?'``) to be inserted where there is malformed data.  When
-   writing, ``'xmlcharrefreplace'`` (replace with the appropriate XML character
-   reference), ``'backslashreplace'`` (replace with backslashed escape
-   sequences) or ``'namereplace'`` (replace with ``\N{...}`` escape sequences)
-   can be used.  Any other error handling name that has been registered with
+   (such as ``'?'``) to be inserted where there is malformed data.
+   ``'backslashreplace'`` causes malformed data to be replaced by a
+   backslashed escape sequence.  When writing, ``'xmlcharrefreplace'``
+   (replace with the appropriate XML character reference) or ``'namereplace'``
+   (replace with ``\N{...}`` escape sequences) can be used.  Any other error
+   handling name that has been registered with
    :func:`codecs.register_error` is also valid.
 
    .. index::
...
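For text I/O, this means ``'backslashreplace'`` is now usable when reading as well as when writing. A minimal sketch, assuming Python 3.5 or later; the in-memory ``BytesIO`` buffer stands in for a real file and is not part of the patch.

```python
import io

# "caf\xe9" is Latin-1 encoded; 0xe9 followed by a newline is not valid UTF-8.
raw = io.BytesIO(b"caf\xe9\n")
reader = io.TextIOWrapper(raw, encoding="utf-8", errors="backslashreplace")
text = reader.read()

# The undecodable byte survives as a four-character escape sequence.
assert text == "caf\\xe9\n"
```

The same ``errors`` value can be passed to :func:`open`, which forwards it to the text layer.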
Doc/whatsnew/3.5.rst
@@ -118,7 +118,9 @@ Other Language Changes
 Some smaller changes made to the core Python language are:
 
-* None yet.
+* Added the ``'namereplace'`` error handlers.  The ``'backslashreplace'``
+  error handlers now works with decoding and translating.
+  (Contributed by Serhiy Storchaka in :issue:`19676` and :issue:`22286`.)
...
Lib/codecs.py
@@ -127,7 +127,8 @@ class Codec:
          'surrogateescape' - replace with private code points U+DCnn.
          'xmlcharrefreplace' - Replace with the appropriate XML
                                character reference (only for encoding).
-         'backslashreplace'  - Replace with backslashed escape sequences
+         'backslashreplace'  - Replace with backslashed escape sequences.
+         'namereplace'       - Replace with \\N{...} escape sequences
                                (only for encoding).
 
         The set of allowed values can be extended via register_error.
...
@@ -359,7 +360,8 @@ class StreamWriter(Codec):
              'xmlcharrefreplace' - Replace with the appropriate XML
                                    character reference.
              'backslashreplace'  - Replace with backslashed escape
-                                   sequences (only for encoding).
+                                   sequences.
+             'namereplace'       - Replace with \\N{...} escape sequences.
 
             The set of allowed parameter values can be extended via
             register_error.
...
@@ -429,7 +431,8 @@ class StreamReader(Codec):
              'strict' - raise a ValueError (or a subclass)
              'ignore' - ignore the character and continue with the next
-             'replace'- replace with a suitable replacement character;
+             'replace'- replace with a suitable replacement character
+             'backslashreplace' - Replace with backslashed escape sequences;
 
             The set of allowed parameter values can be extended via
             register_error.
...
Lib/test/test_codeccallbacks.py
@@ -246,6 +246,11 @@ class CodecCallbackTest(unittest.TestCase):
             "\u0000\ufffd"
         )
 
+        self.assertEqual(
+            b"\x00\x00\x00\x00\x00".decode("unicode-internal", "backslashreplace"),
+            "\u0000\\x00"
+        )
+
         codecs.register_error("test.hui", handler_unicodeinternal)
 
         self.assertEqual(
...
@@ -565,17 +570,6 @@ class CodecCallbackTest(unittest.TestCase):
             codecs.backslashreplace_errors,
             UnicodeError("ouch")
         )
-        # "backslashreplace" can only be used for encoding
-        self.assertRaises(
-            TypeError,
-            codecs.backslashreplace_errors,
-            UnicodeDecodeError("ascii", bytearray(b"\xff"), 0, 1, "ouch")
-        )
-        self.assertRaises(
-            TypeError,
-            codecs.backslashreplace_errors,
-            UnicodeTranslateError("\u3042", 0, 1, "ouch")
-        )
         # Use the correct exception
         self.assertEqual(
             codecs.backslashreplace_errors(
...
@@ -701,6 +695,16 @@ class CodecCallbackTest(unittest.TestCase):
                 UnicodeEncodeError("ascii", "\udfff", 0, 1, "ouch")),
             ("\\udfff", 1)
         )
+        self.assertEqual(
+            codecs.backslashreplace_errors(
+                UnicodeDecodeError("ascii", bytearray(b"\xff"), 0, 1, "ouch")),
+            ("\\xff", 1)
+        )
+        self.assertEqual(
+            codecs.backslashreplace_errors(
+                UnicodeTranslateError("\u3042", 0, 1, "ouch")),
+            ("\\u3042", 1)
+        )
 
     def test_badhandlerresults(self):
         results = ( 42, "foo", (1,2,3), ("foo", 1, 3), ("foo", None), ("foo",), ("foo", 1, 3), ("foo", None), ("foo",) )
...
Lib/test/test_codecs.py
@@ -378,6 +378,10 @@ class ReadTest(MixInCheckStateHandling):
                              before + after)
             self.assertEqual(test_sequence.decode(self.encoding, "replace"),
                              before + self.ill_formed_sequence_replace + after)
+            backslashreplace = ''.join('\\x%02x' % b
+                                       for b in self.ill_formed_sequence)
+            self.assertEqual(test_sequence.decode(self.encoding, "backslashreplace"),
+                             before + backslashreplace + after)
 
 class UTF32Test(ReadTest, unittest.TestCase):
     encoding = "utf-32"
...
@@ -1300,14 +1304,19 @@ class UnicodeInternalTest(unittest.TestCase):
                                       "unicode_internal")
         if sys.byteorder == "little":
             invalid = b"\x00\x00\x11\x00"
+            invalid_backslashreplace = r"\x00\x00\x11\x00"
         else:
             invalid = b"\x00\x11\x00\x00"
+            invalid_backslashreplace = r"\x00\x11\x00\x00"
         with support.check_warnings():
             self.assertRaises(UnicodeDecodeError,
                               invalid.decode, "unicode_internal")
         with support.check_warnings():
             self.assertEqual(invalid.decode("unicode_internal", "replace"),
                              '\ufffd')
+        with support.check_warnings():
+            self.assertEqual(invalid.decode("unicode_internal", "backslashreplace"),
+                             invalid_backslashreplace)
 
     @unittest.skipUnless(SIZEOF_WCHAR_T == 4, 'specific to 32-bit wchar_t')
     def test_decode_error_attributes(self):
...
@@ -2042,6 +2051,16 @@ class CharmapTest(unittest.TestCase):
             ("ab\ufffd", 3)
         )
 
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace", "ab"),
+            ("ab\\x02", 3)
+        )
+
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace", "ab\ufffe"),
+            ("ab\\x02", 3)
+        )
+
         self.assertEqual(
             codecs.charmap_decode(b"\x00\x01\x02", "ignore", "ab"),
             ("ab", 3)
...
@@ -2118,6 +2137,25 @@ class CharmapTest(unittest.TestCase):
             ("ab\ufffd", 3)
         )
 
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace",
+                                  {0: 'a', 1: 'b'}),
+            ("ab\\x02", 3)
+        )
+
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace",
+                                  {0: 'a', 1: 'b', 2: None}),
+            ("ab\\x02", 3)
+        )
+
+        # Issue #14850
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace",
+                                  {0: 'a', 1: 'b', 2: '\ufffe'}),
+            ("ab\\x02", 3)
+        )
+
         self.assertEqual(
             codecs.charmap_decode(b"\x00\x01\x02", "ignore",
                                   {0: 'a', 1: 'b'}),
...
@@ -2194,6 +2232,18 @@ class CharmapTest(unittest.TestCase):
             ("ab\ufffd", 3)
         )
 
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace",
+                                  {0: a, 1: b}),
+            ("ab\\x02", 3)
+        )
+
+        self.assertEqual(
+            codecs.charmap_decode(b"\x00\x01\x02", "backslashreplace",
+                                  {0: a, 1: b, 2: 0xFFFE}),
+            ("ab\\x02", 3)
+        )
+
         self.assertEqual(
             codecs.charmap_decode(b"\x00\x01\x02", "ignore",
                                   {0: a, 1: b}),
...
@@ -2253,9 +2303,13 @@ class TypesTest(unittest.TestCase):
 
         self.assertRaises(UnicodeDecodeError, codecs.unicode_escape_decode, br"\U00110000")
         self.assertEqual(codecs.unicode_escape_decode(r"\U00110000", "replace"), ("\ufffd", 10))
+        self.assertEqual(codecs.unicode_escape_decode(r"\U00110000", "backslashreplace"),
+                         (r"\x5c\x55\x30\x30\x31\x31\x30\x30\x30\x30", 10))
 
         self.assertRaises(UnicodeDecodeError, codecs.raw_unicode_escape_decode, br"\U00110000")
         self.assertEqual(codecs.raw_unicode_escape_decode(r"\U00110000", "replace"), ("\ufffd", 10))
+        self.assertEqual(codecs.raw_unicode_escape_decode(r"\U00110000", "backslashreplace"),
+                         (r"\x5c\x55\x30\x30\x31\x31\x30\x30\x30\x30", 10))
 
 
 class UnicodeEscapeTest(unittest.TestCase):
...
@@ -2894,11 +2948,13 @@ class CodePageTest(unittest.TestCase):
             (b'[\xff]', 'strict', None),
             (b'[\xff]', 'ignore', '[]'),
             (b'[\xff]', 'replace', '[\ufffd]'),
+            (b'[\xff]', 'backslashreplace', '[\\xff]'),
             (b'[\xff]', 'surrogateescape', '[\udcff]'),
             (b'[\xff]', 'surrogatepass', None),
             (b'\x81\x00abc', 'strict', None),
             (b'\x81\x00abc', 'ignore', '\x00abc'),
             (b'\x81\x00abc', 'replace', '\ufffd\x00abc'),
+            (b'\x81\x00abc', 'backslashreplace', '\\xff\x00abc'),
         ))
 
     def test_cp1252(self):
...
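The charmap cases added above can be reproduced outside the test suite. A short sketch, assuming Python 3.5 or later; the inputs are taken from the new assertions in CharmapTest, and the variable names are illustrative.

```python
import codecs

data = b"\x00\x01\x02"

# String mapping: byte 0x02 has no entry in the two-character table "ab".
from_string = codecs.charmap_decode(data, "backslashreplace", "ab")

# Dict mapping: an entry of None marks byte 0x02 as undefined.
from_dict = codecs.charmap_decode(data, "backslashreplace",
                                  {0: "a", 1: "b", 2: None})

# Both report the escaped byte and the number of input bytes consumed.
assert from_string == ("ab\\x02", 3)
assert from_dict == ("ab\\x02", 3)
```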
Misc/NEWS
@@ -10,6 +10,9 @@ Release date: TBA
 Core and Builtins
 -----------------
 
+- Issue #22286: The "backslashreplace" error handlers now works with
+  decoding and translating.
+
 - Issue #23253: Delay-load ShellExecute[AW] in os.startfile for reduced
   startup overhead on Windows.
...
Python/codecs.c
@@ -864,74 +864,112 @@ PyObject *PyCodec_XMLCharRefReplaceErrors(PyObject *exc)
 
 PyObject *PyCodec_BackslashReplaceErrors(PyObject *exc)
 {
-    if (PyObject_IsInstance(exc, PyExc_UnicodeEncodeError)) {
-        PyObject *restuple;
-        PyObject *object;
-        Py_ssize_t i;
-        Py_ssize_t start;
-        Py_ssize_t end;
-        PyObject *res;
-        unsigned char *outp;
-        Py_ssize_t ressize;
-        Py_UCS4 c;
-        if (PyUnicodeEncodeError_GetStart(exc, &start))
+    PyObject *object;
+    Py_ssize_t i;
+    Py_ssize_t start;
+    Py_ssize_t end;
+    PyObject *res;
+    unsigned char *outp;
+    int ressize;
+    Py_UCS4 c;
+
+    if (PyObject_IsInstance(exc, PyExc_UnicodeDecodeError)) {
+        unsigned char *p;
+        if (PyUnicodeDecodeError_GetStart(exc, &start))
             return NULL;
-        if (PyUnicodeEncodeError_GetEnd(exc, &end))
+        if (PyUnicodeDecodeError_GetEnd(exc, &end))
             return NULL;
-        if (!(object = PyUnicodeEncodeError_GetObject(exc)))
+        if (!(object = PyUnicodeDecodeError_GetObject(exc)))
             return NULL;
-        if (end - start > PY_SSIZE_T_MAX / (1+1+8))
-            end = start + PY_SSIZE_T_MAX / (1+1+8);
-        for (i = start, ressize = 0; i < end; ++i) {
-            /* object is guaranteed to be "ready" */
-            c = PyUnicode_READ_CHAR(object, i);
-            if (c >= 0x10000) {
-                ressize += 1+1+8;
-            }
-            else if (c >= 0x100) {
-                ressize += 1+1+4;
-            }
-            else
-                ressize += 1+1+2;
+        if (!(p = (unsigned char*)PyBytes_AsString(object))) {
+            Py_DECREF(object);
+            return NULL;
         }
-        res = PyUnicode_New(ressize, 127);
+        res = PyUnicode_New(4 * (end - start), 127);
         if (res == NULL) {
             Py_DECREF(object);
             return NULL;
         }
-        for (i = start, outp = PyUnicode_1BYTE_DATA(res);
-            i < end; ++i) {
-            c = PyUnicode_READ_CHAR(object, i);
-            *outp++ = '\\';
-            if (c >= 0x00010000) {
-                *outp++ = 'U';
-                *outp++ = Py_hexdigits[(c>>28)&0xf];
-                *outp++ = Py_hexdigits[(c>>24)&0xf];
-                *outp++ = Py_hexdigits[(c>>20)&0xf];
-                *outp++ = Py_hexdigits[(c>>16)&0xf];
-                *outp++ = Py_hexdigits[(c>>12)&0xf];
-                *outp++ = Py_hexdigits[(c>>8)&0xf];
-            }
-            else if (c >= 0x100) {
-                *outp++ = 'u';
-                *outp++ = Py_hexdigits[(c>>12)&0xf];
-                *outp++ = Py_hexdigits[(c>>8)&0xf];
-            }
-            else
-                *outp++ = 'x';
-            *outp++ = Py_hexdigits[(c>>4)&0xf];
-            *outp++ = Py_hexdigits[c&0xf];
+        outp = PyUnicode_1BYTE_DATA(res);
+        for (i = start; i < end; i++, outp += 4) {
+            unsigned char c = p[i];
+            outp[0] = '\\';
+            outp[1] = 'x';
+            outp[2] = Py_hexdigits[(c>>4)&0xf];
+            outp[3] = Py_hexdigits[c&0xf];
         }
 
         assert(_PyUnicode_CheckConsistency(res, 1));
-        restuple = Py_BuildValue("(Nn)", res, end);
         Py_DECREF(object);
-        return restuple;
+        return Py_BuildValue("(Nn)", res, end);
+    }
+    if (PyObject_IsInstance(exc, PyExc_UnicodeEncodeError)) {
+        if (PyUnicodeEncodeError_GetStart(exc, &start))
+            return NULL;
+        if (PyUnicodeEncodeError_GetEnd(exc, &end))
+            return NULL;
+        if (!(object = PyUnicodeEncodeError_GetObject(exc)))
+            return NULL;
+    }
+    else if (PyObject_IsInstance(exc, PyExc_UnicodeTranslateError)) {
+        if (PyUnicodeTranslateError_GetStart(exc, &start))
+            return NULL;
+        if (PyUnicodeTranslateError_GetEnd(exc, &end))
+            return NULL;
+        if (!(object = PyUnicodeTranslateError_GetObject(exc)))
+            return NULL;
     }
     else {
         wrong_exception_type(exc);
         return NULL;
     }
+
+    if (end - start > PY_SSIZE_T_MAX / (1+1+8))
+        end = start + PY_SSIZE_T_MAX / (1+1+8);
+    for (i = start, ressize = 0; i < end; ++i) {
+        /* object is guaranteed to be "ready" */
+        c = PyUnicode_READ_CHAR(object, i);
+        if (c >= 0x10000) {
+            ressize += 1+1+8;
+        }
+        else if (c >= 0x100) {
+            ressize += 1+1+4;
+        }
+        else
+            ressize += 1+1+2;
+    }
+    res = PyUnicode_New(ressize, 127);
+    if (res == NULL) {
+        Py_DECREF(object);
+        return NULL;
+    }
+    outp = PyUnicode_1BYTE_DATA(res);
+    for (i = start; i < end; ++i) {
+        c = PyUnicode_READ_CHAR(object, i);
+        *outp++ = '\\';
+        if (c >= 0x00010000) {
+            *outp++ = 'U';
+            *outp++ = Py_hexdigits[(c>>28)&0xf];
+            *outp++ = Py_hexdigits[(c>>24)&0xf];
+            *outp++ = Py_hexdigits[(c>>20)&0xf];
+            *outp++ = Py_hexdigits[(c>>16)&0xf];
+            *outp++ = Py_hexdigits[(c>>12)&0xf];
+            *outp++ = Py_hexdigits[(c>>8)&0xf];
+        }
+        else if (c >= 0x100) {
+            *outp++ = 'u';
+            *outp++ = Py_hexdigits[(c>>12)&0xf];
+            *outp++ = Py_hexdigits[(c>>8)&0xf];
+        }
+        else
+            *outp++ = 'x';
+        *outp++ = Py_hexdigits[(c>>4)&0xf];
+        *outp++ = Py_hexdigits[c&0xf];
+    }
+
+    assert(_PyUnicode_CheckConsistency(res, 1));
+    Py_DECREF(object);
+    return Py_BuildValue("(Nn)", res, end);
 }
 
 static _PyUnicode_Name_CAPI *ucnhash_CAPI = NULL;
...
@@ -1444,8 +1482,8 @@ static int _PyCodecRegistry_Init(void)
                 backslashreplace_errors,
                 METH_O,
                 PyDoc_STR("Implements the 'backslashreplace' error handling, "
-                          "which replaces an unencodable character with a "
-                          "backslashed escape sequence.")
+                          "which replaces malformed data with a backslashed "
+                          "escape sequence.")
             }
         },
         {
...
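The two branches of the rewritten C function can be paraphrased in Python. This is only an illustrative sketch of the escaping rules visible in the diff, not the implementation itself; the helper names ``escape_bytes`` and ``escape_text`` are hypothetical.

```python
def escape_bytes(data: bytes) -> str:
    # Decoding branch: every offending byte becomes a fixed-width \xNN
    # escape, which is why the C code allocates 4 * (end - start) characters.
    return "".join("\\x%02x" % b for b in data)


def escape_text(text: str) -> str:
    # Encoding and translating branch: the escape width depends on the
    # code point, matching the 1+1+8, 1+1+4 and 1+1+2 size computation.
    parts = []
    for ch in text:
        c = ord(ch)
        if c >= 0x10000:
            parts.append("\\U%08x" % c)
        elif c >= 0x100:
            parts.append("\\u%04x" % c)
        else:
            parts.append("\\x%02x" % c)
    return "".join(parts)


# The sketch agrees with the built-in handler on these inputs.
sample = "\u3042\U0001f600\xe9"
assert escape_text(sample) == sample.encode("ascii", "backslashreplace").decode("ascii")
assert escape_bytes(b"\x80\xff") == b"\x80\xff".decode("ascii", "backslashreplace")
```

Because the decoding branch always emits four characters per byte, it needs no separate sizing pass, unlike the text branch.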