Kirill Smelkov / cpython
Commit 3aeb632c authored Sep 02, 2002 by Walter Dörwald
PEP 293 implementation (from SF patch http://www.python.org/sf/432401)
parent 94fab762
Showing 12 changed files with 2929 additions and 556 deletions
Doc/lib/libcodecs.tex +38 -1
Doc/lib/libexcs.tex +21 -0
Include/codecs.h +30 -0
Include/pyerrors.h +66 -0
Lib/codecs.py +12 -1
Lib/test/test_codeccallbacks.py +483 -0
Misc/NEWS +3 -0
Modules/_codecsmodule.c +28 -0
Objects/stringobject.c +6 -2
Objects/unicodeobject.c +1240 -552
Python/codecs.c +399 -0
Python/exceptions.c +603 -0
Doc/lib/libcodecs.tex
...
...
@@ -17,7 +17,7 @@
This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
-registry which manages the codec lookup process.
+registry which manages the codec and error handling lookup process.
It defines the following functions:
...
...
@@ -98,6 +98,43 @@ Raises a \exception{LookupError} in case the encoding cannot be found.
To simplify working with encoded files or stream, the module
also defines these utility functions:
\begin{funcdesc}{register_error}{name, error_handler}
Register the error handling function \var{error_handler} under the
name \var{name}. \var{error_handler} will be called during encoding
and decoding in case of an error, when \var{name} is specified as the
errors parameter. \var{error_handler} will be called with an
\exception{UnicodeEncodeError}, \exception{UnicodeDecodeError} or
\exception{UnicodeTranslateError} instance and must return a tuple
with a replacement for the unencodable/undecodable part of the input
and a position where encoding/decoding should continue.
\end{funcdesc}
\begin{funcdesc}{lookup_error}{name}
Return the error handler previously registered under the name
\var{name}.
Raises a \exception{LookupError} in case the handler cannot be found.
\end{funcdesc}
\begin{funcdesc}{strict_errors}{exception}
Implements the \code{strict} error handling.
\end{funcdesc}
\begin{funcdesc}{replace_errors}{exception}
Implements the \code{replace} error handling.
\end{funcdesc}
\begin{funcdesc}{ignore_errors}{exception}
Implements the \code{ignore} error handling.
\end{funcdesc}
\begin{funcdesc}{xmlcharrefreplace_errors}{exception}
Implements the \code{xmlcharrefreplace} error handling.
\end{funcdesc}
\begin{funcdesc}{backslashreplace_errors}{exception}
Implements the \code{backslashreplace} error handling.
\end{funcdesc}
\begin{funcdesc}{open}{filename, mode\optional{, encoding\optional{, errors\optional{, buffering}}}}
Open an encoded file using the given \var{mode} and return
...
...
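The documentation added above describes the handler contract only in prose: the handler receives the exception instance and returns a tuple of (replacement, resume position). A minimal sketch in modern Python 3 follows; the handler name `underscore` and the function `underscore_errors` are hypothetical and not part of this commit.

```python
import codecs


def underscore_errors(exc):
    """Hypothetical handler: replace each unencodable character with '_'."""
    if isinstance(exc, UnicodeEncodeError):
        # Return (replacement text, index at which encoding should resume).
        return ("_" * (exc.end - exc.start), exc.end)
    # This sketch does not handle decode or translate errors.
    raise exc


# Register under an illustrative name, then use it as the errors argument.
codecs.register_error("underscore", underscore_errors)
encoded = "na\u00efve caf\u00e9".encode("ascii", errors="underscore")
print(encoded)  # b'na_ve caf_'

# lookup_error returns the callable that was registered.
looked_up = codecs.lookup_error("underscore")
```

Returning `exc.end` as the resume position skips exactly the offending span; a handler may return a different position if it wants to re-examine or skip more input.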
Doc/lib/libexcs.tex
...
...
@@ -335,6 +335,24 @@ Raised when an \keyword{assert} statement fails.
\versionadded{2.0}
\end{excdesc}
\begin{excdesc}{UnicodeEncodeError}
Raised when a Unicode-related error occurs during encoding. It
is a subclass of \exception{UnicodeError}.
\versionadded{2.3}
\end{excdesc}
\begin{excdesc}{UnicodeDecodeError}
Raised when a Unicode-related error occurs during decoding. It
is a subclass of \exception{UnicodeError}.
\versionadded{2.3}
\end{excdesc}
\begin{excdesc}{UnicodeTranslateError}
Raised when a Unicode-related error occurs during translating. It
is a subclass of \exception{UnicodeError}.
\versionadded{2.3}
\end{excdesc}
\begin{excdesc}{ValueError}
Raised when a built-in operation or function receives an argument
that has the right type but an inappropriate value, and the
...
...
@@ -426,6 +444,9 @@ The class hierarchy for built-in exceptions is:
      |    |    +-- FloatingPointError
      |    +-- ValueError
      |    |    +-- UnicodeError
      |    |        +-- UnicodeEncodeError
      |    |        +-- UnicodeDecodeError
      |    |        +-- UnicodeTranslateError
      |    +-- ReferenceError
      |    +-- SystemError
      |    +-- MemoryError
...
...
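Because the three new exceptions sit under `UnicodeError` (and therefore under `ValueError`), code that already catches either base class keeps working. A small sketch in modern Python 3 illustrates this; the helper name `describe_failure` is made up for the example.

```python
# All three new classes are subclasses of UnicodeError and of ValueError.
new_classes = (UnicodeEncodeError, UnicodeDecodeError, UnicodeTranslateError)
hierarchy_ok = all(
    issubclass(cls, UnicodeError) and issubclass(cls, ValueError)
    for cls in new_classes
)


def describe_failure(action):
    """Hypothetical helper: run action() and name the Unicode error, if any."""
    try:
        action()
    except UnicodeError as exc:  # catches every subclass listed above
        return type(exc).__name__
    return "ok"


encode_result = describe_failure(lambda: "\u20ac".encode("ascii"))
decode_result = describe_failure(lambda: b"\xff".decode("ascii"))
plain_result = describe_failure(lambda: "abc".encode("ascii"))
print(encode_result, decode_result, plain_result)
```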
Include/codecs.h
...
...
@@ -117,6 +117,36 @@ PyAPI_FUNC(PyObject *) PyCodec_StreamWriter(
       const char *errors
       );
/* Unicode encoding error handling callback registry API */
/* Register the error handling callback function error under the name
   name. This function will be called by the codec when it encounters
   unencodable characters/undecodable bytes and doesn't know the
   callback name, when name is specified as the error parameter
   in the call to the encode/decode function.
   Return 0 on success, -1 on error */
PyAPI_FUNC(int) PyCodec_RegisterError(const char *name, PyObject *error);
/* Lookup the error handling callback function registered under the
   name error. As a special case NULL can be passed, in which case
   the error handling callback for "strict" will be returned. */
PyAPI_FUNC(PyObject *) PyCodec_LookupError(const char *name);
/* raise exc as an exception */
PyAPI_FUNC(PyObject *) PyCodec_StrictErrors(PyObject *exc);
/* ignore the unicode error, skipping the faulty input */
PyAPI_FUNC(PyObject *) PyCodec_IgnoreErrors(PyObject *exc);
/* replace the unicode error with ? or U+FFFD */
PyAPI_FUNC(PyObject *) PyCodec_ReplaceErrors(PyObject *exc);
/* replace the unicode encode error with XML character references */
PyAPI_FUNC(PyObject *) PyCodec_XMLCharRefReplaceErrors(PyObject *exc);
/* replace the unicode encode error with backslash escapes (\x, \u and \U) */
PyAPI_FUNC(PyObject *) PyCodec_BackslashReplaceErrors(PyObject *exc);
#ifdef __cplusplus
}
#endif
...
...
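The five C callbacks declared above back the handler names `strict`, `ignore`, `replace`, `xmlcharrefreplace` and `backslashreplace`. As a rough illustration of what each does on the encoding side, here is a sketch in modern Python 3 (the sample string is arbitrary).

```python
text = "caf\u00e9 \u20ac"  # 'é' and the euro sign cannot be encoded as ASCII

# Apply each non-strict built-in handler to the same input.
results = {
    name: text.encode("ascii", errors=name)
    for name in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}
for name, value in results.items():
    print(name, value)

# 'strict' re-raises the exception instead of producing a replacement.
try:
    text.encode("ascii", errors="strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```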
Include/pyerrors.h
...
...
@@ -54,6 +54,9 @@ PyAPI_DATA(PyObject *) PyExc_SystemExit;
PyAPI_DATA(PyObject *) PyExc_TypeError;
PyAPI_DATA(PyObject *) PyExc_UnboundLocalError;
PyAPI_DATA(PyObject *) PyExc_UnicodeError;
PyAPI_DATA(PyObject *) PyExc_UnicodeEncodeError;
PyAPI_DATA(PyObject *) PyExc_UnicodeDecodeError;
PyAPI_DATA(PyObject *) PyExc_UnicodeTranslateError;
PyAPI_DATA(PyObject *) PyExc_ValueError;
PyAPI_DATA(PyObject *) PyExc_ZeroDivisionError;
#ifdef MS_WINDOWS
...
...
@@ -114,6 +117,69 @@ PyAPI_FUNC(void) PyErr_SetInterrupt(void);
PyAPI_FUNC(void) PyErr_SyntaxLocation(char *, int);
PyAPI_FUNC(PyObject *) PyErr_ProgramText(char *, int);
/* The following functions are used to create and modify unicode
   exceptions from C */
/* create a UnicodeDecodeError object */
PyAPI_FUNC(PyObject *) PyUnicodeDecodeError_Create(
	const char *, const char *, int, int, int, const char *);
/* create a UnicodeEncodeError object */
PyAPI_FUNC(PyObject *) PyUnicodeEncodeError_Create(
	const char *, const Py_UNICODE *, int, int, int, const char *);
/* create a UnicodeTranslateError object */
PyAPI_FUNC(PyObject *) PyUnicodeTranslateError_Create(
	const Py_UNICODE *, int, int, int, const char *);
/* get the encoding attribute */
PyAPI_FUNC(PyObject *) PyUnicodeEncodeError_GetEncoding(PyObject *);
PyAPI_FUNC(PyObject *) PyUnicodeDecodeError_GetEncoding(PyObject *);
PyAPI_FUNC(PyObject *) PyUnicodeTranslateError_GetEncoding(PyObject *);
/* get the object attribute */
PyAPI_FUNC(PyObject *) PyUnicodeEncodeError_GetObject(PyObject *);
PyAPI_FUNC(PyObject *) PyUnicodeDecodeError_GetObject(PyObject *);
PyAPI_FUNC(PyObject *) PyUnicodeTranslateError_GetObject(PyObject *);
/* get the value of the start attribute (the int * may not be NULL)
   return 0 on success, -1 on failure */
PyAPI_FUNC(int) PyUnicodeEncodeError_GetStart(PyObject *, int *);
PyAPI_FUNC(int) PyUnicodeDecodeError_GetStart(PyObject *, int *);
PyAPI_FUNC(int) PyUnicodeTranslateError_GetStart(PyObject *, int *);
/* assign a new value to the start attribute
   return 0 on success, -1 on failure */
PyAPI_FUNC(int) PyUnicodeEncodeError_SetStart(PyObject *, int);
PyAPI_FUNC(int) PyUnicodeDecodeError_SetStart(PyObject *, int);
PyAPI_FUNC(int) PyUnicodeTranslateError_SetStart(PyObject *, int);
/* get the value of the end attribute (the int * may not be NULL)
   return 0 on success, -1 on failure */
PyAPI_FUNC(int) PyUnicodeEncodeError_GetEnd(PyObject *, int *);
PyAPI_FUNC(int) PyUnicodeDecodeError_GetEnd(PyObject *, int *);
PyAPI_FUNC(int) PyUnicodeTranslateError_GetEnd(PyObject *, int *);
/* assign a new value to the end attribute
   return 0 on success, -1 on failure */
PyAPI_FUNC(int) PyUnicodeEncodeError_SetEnd(PyObject *, int);
PyAPI_FUNC(int) PyUnicodeDecodeError_SetEnd(PyObject *, int);
PyAPI_FUNC(int) PyUnicodeTranslateError_SetEnd(PyObject *, int);
/* get the value of the reason attribute */
PyAPI_FUNC(PyObject *) PyUnicodeEncodeError_GetReason(PyObject *);
PyAPI_FUNC(PyObject *) PyUnicodeDecodeError_GetReason(PyObject *);
PyAPI_FUNC(PyObject *) PyUnicodeTranslateError_GetReason(PyObject *);
/* assign a new value to the reason attribute
   return 0 on success, -1 on failure */
PyAPI_FUNC(int) PyUnicodeEncodeError_SetReason(PyObject *, const char *);
PyAPI_FUNC(int) PyUnicodeDecodeError_SetReason(PyObject *, const char *);
PyAPI_FUNC(int) PyUnicodeTranslateError_SetReason(PyObject *, const char *);
/* These APIs aren't really part of the error implementation, but
often needed to format error messages; the native C lib APIs are
not available on all platforms, which is why we provide emulations
...
...
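The C accessors above correspond to the attributes `encoding`, `object`, `start`, `end` and `reason` on the exception instances. A short sketch of the Python-level view, written for modern Python 3 and using arbitrary sample values:

```python
# Capture a real encoding failure and inspect its attributes.
try:
    "a\u00e9b".encode("ascii")
except UnicodeEncodeError as exc:
    err = exc  # keep a reference; 'exc' is cleared when the block ends

captured = (err.encoding, err.object, err.start, err.end)
print(captured, err.reason)

# The exceptions can also be built directly; the argument order mirrors
# the C *_Create functions (encoding, object, start, end, reason).
manual = UnicodeEncodeError("ascii", "abc", 0, 1, "demo reason")
manual.start = 1  # attributes are writable, like the C Set* functions
manual.end = 3

# UnicodeTranslateError takes no encoding argument.
translate_err = UnicodeTranslateError("abc", 0, 1, "demo reason")
```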
Lib/codecs.py
...
...
@@ -20,7 +20,10 @@ except ImportError, why:
 __all__ = ["register", "lookup", "open", "EncodedFile", "BOM", "BOM_BE",
            "BOM_LE", "BOM32_BE", "BOM32_LE", "BOM64_BE", "BOM64_LE",
            "BOM_UTF8", "BOM_UTF16", "BOM_UTF16_LE", "BOM_UTF16_BE",
-           "BOM_UTF32", "BOM_UTF32_LE", "BOM_UTF32_BE"]
+           "BOM_UTF32", "BOM_UTF32_LE", "BOM_UTF32_BE",
+           "strict_errors", "ignore_errors", "replace_errors",
+           "xmlcharrefreplace_errors",
+           "register_error", "lookup_error"]
### Constants
...
...
@@ -632,6 +635,14 @@ def make_encoding_map(decoding_map):
            m[v] = None
    return m
### error handlers
strict_errors = lookup_error("strict")
ignore_errors = lookup_error("ignore")
replace_errors = lookup_error("replace")
xmlcharrefreplace_errors = lookup_error("xmlcharrefreplace")
backslashreplace_errors = lookup_error("backslashreplace")
# Tell modulefinder that using codecs probably needs the encodings
# package
_false = 0
...
...
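The module-level names added here are simply the registry entries looked up once at import time. The sketch below, for modern Python 3, checks that relationship and calls two of the handlers directly with a hand-built exception; the sample values are arbitrary.

```python
import codecs

names = ("strict", "ignore", "replace", "xmlcharrefreplace", "backslashreplace")

# Each codecs.<name>_errors attribute is the registered handler itself.
aliases_match = all(
    getattr(codecs, name + "_errors") is codecs.lookup_error(name)
    for name in names
)

# Handlers are ordinary callables taking the exception instance.
exc = UnicodeEncodeError("ascii", "\u00e9", 0, 1, "demo")
replace_result = codecs.replace_errors(exc)
ignore_result = codecs.ignore_errors(exc)
print(replace_result, ignore_result)
```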
Lib/test/test_codeccallbacks.py
0 → 100644
This diff is collapsed.
Misc/NEWS
...
...
@@ -57,6 +57,9 @@ Type/class unification and new-style classes
Core and builtins
- Codec error handling callbacks (PEP 293) are implemented.
Error handling in unicode.encode or str.decode can now be customized.
- A subtle change to the semantics of the built-in function intern():
interned strings are no longer immortal. You must keep a reference
to the return value intern() around to get the benefit.
...
...
Modules/_codecsmodule.c
...
...
@@ -706,6 +706,32 @@ mbcs_encode(PyObject *self,
#endif /* MS_WINDOWS */
#endif /* Py_USING_UNICODE */
/* --- Error handler registry --------------------------------------------- */
static PyObject *register_error(PyObject *self, PyObject *args)
{
    const char *name;
    PyObject *handler;
    if (!PyArg_ParseTuple(args, "sO:register_error",
                          &name, &handler))
        return NULL;
    if (PyCodec_RegisterError(name, handler))
        return NULL;
    Py_INCREF(Py_None);
    return Py_None;
}
static PyObject *lookup_error(PyObject *self, PyObject *args)
{
    const char *name;
    if (!PyArg_ParseTuple(args, "s:lookup_error",
                          &name))
        return NULL;
    return PyCodec_LookupError(name);
}
/* --- Module API --------------------------------------------------------- */
static PyMethodDef _codecs_functions[] = {
...
...
@@ -744,6 +770,8 @@ static PyMethodDef _codecs_functions[] = {
    {"mbcs_decode", mbcs_decode, METH_VARARGS},
#endif
#endif /* Py_USING_UNICODE */
    {"register_error", register_error, METH_VARARGS},
    {"lookup_error", lookup_error, METH_VARARGS},
    {NULL, NULL} /* sentinel */
};
...
...
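The two wrappers above surface the failure modes of the C registry functions: a non-callable handler is rejected with `TypeError`, and an unknown name produces `LookupError`. A minimal sketch of both cases in modern Python 3; the handler names are made up and assumed not to be registered.

```python
import codecs

# Registering something that is not callable is rejected.
try:
    codecs.register_error("not-callable-demo", 42)
    register_outcome = "accepted"
except TypeError:
    register_outcome = "TypeError"

# Looking up a name that was never registered fails.
try:
    codecs.lookup_error("no-such-handler-demo")
    lookup_outcome = "found"
except LookupError:
    lookup_outcome = "LookupError"

print(register_outcome, lookup_outcome)
```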
Objects/stringobject.c
...
...
@@ -2468,7 +2468,9 @@ PyDoc_STRVAR(encode__doc__,
 Encodes S using the codec registered for encoding. encoding defaults\n\
 to the default encoding. errors may be given to set a different error\n\
 handling scheme. Default is 'strict' meaning that encoding errors raise\n\
-a ValueError. Other possible values are 'ignore' and 'replace'.");
+a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and\n\
+'xmlcharrefreplace' as well as any other name registered with\n\
+codecs.register_error that is able to handle UnicodeEncodeErrors.");
 static PyObject *
 string_encode(PyStringObject *self, PyObject *args)
...
...
@@ -2487,7 +2489,9 @@ PyDoc_STRVAR(decode__doc__,
 Decodes S using the codec registered for encoding. encoding defaults\n\
 to the default encoding. errors may be given to set a different error\n\
 handling scheme. Default is 'strict' meaning that encoding errors raise\n\
-a ValueError. Other possible values are 'ignore' and 'replace'.");
+a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'\n\
+as well as any other name registered with codecs.register_error that is\n\
+able to handle UnicodeDecodeErrors.");
 static PyObject *
 string_decode(PyStringObject *self, PyObject *args)
...
...
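The decode docstring mentions that any registered handler able to deal with `UnicodeDecodeError` may be named. A sketch of the decoding side follows, written for modern Python 3 (where the method lives on `bytes` rather than `str`); the handler name `hexmark` and the function `hexmark_errors` are hypothetical.

```python
import codecs

data = b"abc\xffdef"  # 0xff is not valid ASCII

replaced = data.decode("ascii", "replace")  # inserts U+FFFD
ignored = data.decode("ascii", "ignore")    # drops the bad byte


def hexmark_errors(exc):
    """Hypothetical handler: show undecodable bytes as <XX> markers."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]  # a bytes slice
        return ("".join("<%02X>" % byte for byte in bad), exc.end)
    raise exc


codecs.register_error("hexmark", hexmark_errors)
marked = data.decode("ascii", "hexmark")
print(repr(replaced), repr(ignored), repr(marked))
```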
Objects/unicodeobject.c
This diff is collapsed.
Python/codecs.c
...
...
@@ -422,12 +422,409 @@ PyObject *PyCodec_Decode(PyObject *object,
    return NULL;
}
static PyObject *_PyCodec_ErrorRegistry;
/* Register the error handling callback function error under the name
   name. This function will be called by the codec when it encounters
   unencodable characters/undecodable bytes and doesn't know the
   callback name, when name is specified as the error parameter
   in the call to the encode/decode function.
   Return 0 on success, -1 on error */
int PyCodec_RegisterError(const char *name, PyObject *error)
{
    if (!PyCallable_Check(error)) {
        PyErr_SetString(PyExc_TypeError, "handler must be callable");
        return -1;
    }
    return PyDict_SetItemString(_PyCodec_ErrorRegistry, (char *)name, error);
}
/* Lookup the error handling callback function registered under the
   name error. As a special case NULL can be passed, in which case
   the error handling callback for strict encoding will be returned. */
PyObject *PyCodec_LookupError(const char *name)
{
    PyObject *handler = NULL;
    if (name==NULL)
        name = "strict";
    handler = PyDict_GetItemString(_PyCodec_ErrorRegistry, (char *)name);
    if (!handler)
        PyErr_Format(PyExc_LookupError, "unknown error handler name '%.400s'", name);
    else
        Py_INCREF(handler);
    return handler;
}
static void wrong_exception_type(PyObject *exc)
{
    PyObject *type = PyObject_GetAttrString(exc, "__class__");
    if (type != NULL) {
        PyObject *name = PyObject_GetAttrString(type, "__name__");
        Py_DECREF(type);
        if (name != NULL) {
            PyObject *string = PyObject_Str(name);
            Py_DECREF(name);
            PyErr_Format(PyExc_TypeError,
                "don't know how to handle %.400s in error callback",
                PyString_AS_STRING(string));
            Py_DECREF(string);
        }
    }
}
PyObject *PyCodec_StrictErrors(PyObject *exc)
{
    if (PyInstance_Check(exc))
        PyErr_SetObject((PyObject*)((PyInstanceObject*)exc)->in_class,
            exc);
    else
        PyErr_SetString(PyExc_TypeError, "codec must pass exception instance");
    return NULL;
}
PyObject *PyCodec_IgnoreErrors(PyObject *exc)
{
    int end;
    if (PyObject_IsInstance(exc, PyExc_UnicodeEncodeError)) {
        if (PyUnicodeEncodeError_GetEnd(exc, &end))
            return NULL;
    }
    else if (PyObject_IsInstance(exc, PyExc_UnicodeDecodeError)) {
        if (PyUnicodeDecodeError_GetEnd(exc, &end))
            return NULL;
    }
    else if (PyObject_IsInstance(exc, PyExc_UnicodeTranslateError)) {
        if (PyUnicodeTranslateError_GetEnd(exc, &end))
            return NULL;
    }
    else {
        wrong_exception_type(exc);
        return NULL;
    }
    /* ouch: passing NULL, 0, pos gives None instead of u'' */
    return Py_BuildValue("(u#i)", &end, 0, end);
}
PyObject *PyCodec_ReplaceErrors(PyObject *exc)
{
    PyObject *restuple;
    int start;
    int end;
    int i;
    if (PyObject_IsInstance(exc, PyExc_UnicodeEncodeError)) {
        PyObject *res;
        Py_UNICODE *p;
        if (PyUnicodeEncodeError_GetStart(exc, &start))
            return NULL;
        if (PyUnicodeEncodeError_GetEnd(exc, &end))
            return NULL;
        res = PyUnicode_FromUnicode(NULL, end-start);
        if (res == NULL)
            return NULL;
        for (p = PyUnicode_AS_UNICODE(res), i = start;
            i<end; ++p, ++i)
            *p = '?';
        restuple = Py_BuildValue("(Oi)", res, end);
        Py_DECREF(res);
        return restuple;
    }
    else if (PyObject_IsInstance(exc, PyExc_UnicodeDecodeError)) {
        Py_UNICODE res = Py_UNICODE_REPLACEMENT_CHARACTER;
        if (PyUnicodeDecodeError_GetEnd(exc, &end))
            return NULL;
        return Py_BuildValue("(u#i)", &res, 1, end);
    }
    else if (PyObject_IsInstance(exc, PyExc_UnicodeTranslateError)) {
        PyObject *res;
        Py_UNICODE *p;
        if (PyUnicodeTranslateError_GetStart(exc, &start))
            return NULL;
        if (PyUnicodeTranslateError_GetEnd(exc, &end))
            return NULL;
        res = PyUnicode_FromUnicode(NULL, end-start);
        if (res == NULL)
            return NULL;
        for (p = PyUnicode_AS_UNICODE(res), i = start;
            i<end; ++p, ++i)
            *p = Py_UNICODE_REPLACEMENT_CHARACTER;
        restuple = Py_BuildValue("(Oi)", res, end);
        Py_DECREF(res);
        return restuple;
    }
    else {
        wrong_exception_type(exc);
        return NULL;
    }
}
PyObject *PyCodec_XMLCharRefReplaceErrors(PyObject *exc)
{
    if (PyObject_IsInstance(exc, PyExc_UnicodeEncodeError)) {
        PyObject *restuple;
        PyObject *object;
        int start;
        int end;
        PyObject *res;
        Py_UNICODE *p;
        Py_UNICODE *startp;
        Py_UNICODE *outp;
        int ressize;
        if (PyUnicodeEncodeError_GetStart(exc, &start))
            return NULL;
        if (PyUnicodeEncodeError_GetEnd(exc, &end))
            return NULL;
        if (!(object = PyUnicodeEncodeError_GetObject(exc)))
            return NULL;
        startp = PyUnicode_AS_UNICODE(object);
        for (p = startp+start, ressize = 0; p < startp+end; ++p) {
            if (*p<10)
                ressize += 2+1+1;
            else if (*p<100)
                ressize += 2+2+1;
            else if (*p<1000)
                ressize += 2+3+1;
            else if (*p<10000)
                ressize += 2+4+1;
            else if (*p<100000)
                ressize += 2+5+1;
            else if (*p<1000000)
                ressize += 2+6+1;
            else
                ressize += 2+7+1;
        }
        /* allocate replacement */
        res = PyUnicode_FromUnicode(NULL, ressize);
        if (res == NULL) {
            Py_DECREF(object);
            return NULL;
        }
        /* generate replacement */
        for (p = startp+start, outp = PyUnicode_AS_UNICODE(res);
            p < startp+end; ++p) {
            Py_UNICODE c = *p;
            int digits;
            int base;
            *outp++ = '&';
            *outp++ = '#';
            if (*p<10) {
                digits = 1;
                base = 1;
            }
            else if (*p<100) {
                digits = 2;
                base = 10;
            }
            else if (*p<1000) {
                digits = 3;
                base = 100;
            }
            else if (*p<10000) {
                digits = 4;
                base = 1000;
            }
            else if (*p<100000) {
                digits = 5;
                base = 10000;
            }
            else if (*p<1000000) {
                digits = 6;
                base = 100000;
            }
            else {
                digits = 7;
                base = 1000000;
            }
            while (digits-->0) {
                *outp++ = '0' + c/base;
                c %= base;
                base /= 10;
            }
            *outp++ = ';';
        }
        restuple = Py_BuildValue("(Oi)", res, end);
        Py_DECREF(res);
        Py_DECREF(object);
        return restuple;
    }
    else {
        wrong_exception_type(exc);
        return NULL;
    }
}
static Py_UNICODE hexdigits[] = {
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
};
PyObject *PyCodec_BackslashReplaceErrors(PyObject *exc)
{
    if (PyObject_IsInstance(exc, PyExc_UnicodeEncodeError)) {
        PyObject *restuple;
        PyObject *object;
        int start;
        int end;
        PyObject *res;
        Py_UNICODE *p;
        Py_UNICODE *startp;
        Py_UNICODE *outp;
        int ressize;
        if (PyUnicodeEncodeError_GetStart(exc, &start))
            return NULL;
        if (PyUnicodeEncodeError_GetEnd(exc, &end))
            return NULL;
        if (!(object = PyUnicodeEncodeError_GetObject(exc)))
            return NULL;
        startp = PyUnicode_AS_UNICODE(object);
        for (p = startp+start, ressize = 0; p < startp+end; ++p) {
            if (*p >= 0x00010000)
                ressize += 1+1+8;
            else if (*p >= 0x100) {
                ressize += 1+1+4;
            }
            else
                ressize += 1+1+2;
        }
        res = PyUnicode_FromUnicode(NULL, ressize);
        if (res==NULL)
            return NULL;
        for (p = startp+start, outp = PyUnicode_AS_UNICODE(res);
            p < startp+end; ++p) {
            Py_UNICODE c = *p;
            *outp++ = '\\';
            if (c >= 0x00010000) {
                *outp++ = 'U';
                *outp++ = hexdigits[(c>>28)&0xf];
                *outp++ = hexdigits[(c>>24)&0xf];
                *outp++ = hexdigits[(c>>20)&0xf];
                *outp++ = hexdigits[(c>>16)&0xf];
                *outp++ = hexdigits[(c>>12)&0xf];
                *outp++ = hexdigits[(c>>8)&0xf];
            }
            else if (c >= 0x100) {
                *outp++ = 'u';
                *outp++ = hexdigits[(c>>12)&0xf];
                *outp++ = hexdigits[(c>>8)&0xf];
            }
            else
                *outp++ = 'x';
            *outp++ = hexdigits[(c>>4)&0xf];
            *outp++ = hexdigits[c&0xf];
        }
        restuple = Py_BuildValue("(Oi)", res, end);
        Py_DECREF(res);
        Py_DECREF(object);
        return restuple;
    }
    else {
        wrong_exception_type(exc);
        return NULL;
    }
}
static PyObject *strict_errors(PyObject *self, PyObject *exc)
{
    return PyCodec_StrictErrors(exc);
}
static PyObject *ignore_errors(PyObject *self, PyObject *exc)
{
    return PyCodec_IgnoreErrors(exc);
}
static PyObject *replace_errors(PyObject *self, PyObject *exc)
{
    return PyCodec_ReplaceErrors(exc);
}
static PyObject *xmlcharrefreplace_errors(PyObject *self, PyObject *exc)
{
    return PyCodec_XMLCharRefReplaceErrors(exc);
}
static PyObject *backslashreplace_errors(PyObject *self, PyObject *exc)
{
    return PyCodec_BackslashReplaceErrors(exc);
}
void _PyCodecRegistry_Init(void)
{
    static struct {
        char *name;
        PyMethodDef def;
    } methods[] =
    {
        {
            "strict",
            {
                "strict_errors",
                strict_errors,
                METH_O
            }
        },
        {
            "ignore",
            {
                "ignore_errors",
                ignore_errors,
                METH_O
            }
        },
        {
            "replace",
            {
                "replace_errors",
                replace_errors,
                METH_O
            }
        },
        {
            "xmlcharrefreplace",
            {
                "xmlcharrefreplace_errors",
                xmlcharrefreplace_errors,
                METH_O
            }
        },
        {
            "backslashreplace",
            {
                "backslashreplace_errors",
                backslashreplace_errors,
                METH_O
            }
        }
    };
    if (_PyCodec_SearchPath == NULL)
        _PyCodec_SearchPath = PyList_New(0);
    if (_PyCodec_SearchCache == NULL)
        _PyCodec_SearchCache = PyDict_New();
    if (_PyCodec_ErrorRegistry == NULL) {
        int i;
        _PyCodec_ErrorRegistry = PyDict_New();
        if (_PyCodec_ErrorRegistry) {
            for (i = 0; i < 5; ++i) {
                PyObject *func = PyCFunction_New(&methods[i].def, NULL);
                int res;
                if (!func)
                    Py_FatalError("can't initialize codec error registry");
                res = PyCodec_RegisterError(methods[i].name, func);
                Py_DECREF(func);
                if (res)
                    Py_FatalError("can't initialize codec error registry");
            }
        }
    }
    if (_PyCodec_SearchPath == NULL ||
        _PyCodec_SearchCache == NULL)
        Py_FatalError("can't initialize codec registry");
...
...
@@ -439,4 +836,6 @@ void _PyCodecRegistry_Fini(void)
    _PyCodec_SearchPath = NULL;
    Py_XDECREF(_PyCodec_SearchCache);
    _PyCodec_SearchCache = NULL;
    Py_XDECREF(_PyCodec_ErrorRegistry);
    _PyCodec_ErrorRegistry = NULL;
}
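The two largest C handlers in this file build their replacement strings character by character: decimal character references for `xmlcharrefreplace`, and `\x`, `\u` or `\U` escapes chosen by code point size for `backslashreplace`. The following is a pure-Python sketch of the same logic for modern Python 3, not the commit's implementation; the function names and the registered names ending in `-sketch` are hypothetical.

```python
import codecs


def xmlcharrefreplace_sketch(exc):
    """Mirror of the C logic: '&#' + decimal code point + ';' per character."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %s in error callback"
                        % type(exc).__name__)
    bad = exc.object[exc.start:exc.end]
    return ("".join("&#%d;" % ord(ch) for ch in bad), exc.end)


def backslashreplace_sketch(exc):
    """Mirror of the C logic: pick \\x, \\u or \\U by code point size."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %s in error callback"
                        % type(exc).__name__)
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        if cp >= 0x10000:
            parts.append("\\U%08x" % cp)  # 8 hex digits
        elif cp >= 0x100:
            parts.append("\\u%04x" % cp)  # 4 hex digits
        else:
            parts.append("\\x%02x" % cp)  # 2 hex digits
    return ("".join(parts), exc.end)


codecs.register_error("xmlref-sketch", xmlcharrefreplace_sketch)
codecs.register_error("backslash-sketch", backslashreplace_sketch)

sample = "\u00e9\u20ac\U0001f600"
xml_out = sample.encode("ascii", "xmlref-sketch")
backslash_out = sample.encode("ascii", "backslash-sketch")
print(xml_out, backslash_out)
```

The C version computes the output size in a first pass so it can allocate the result once; the Python sketch does not need that step because string joining handles allocation.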
Python/exceptions.c
This diff is collapsed.