Commit 812bc1b8, authored May 13, 2015 by R David Murray
Merge: #23088: Clarify null termination of bytes and strings in C API.
Parents: b01a1fdb, 0a560a11
Showing 3 changed files with 44 additions and 31 deletions:

Doc/c-api/bytearray.rst (+2, -1)
Doc/c-api/bytes.rst (+18, -14)
Doc/c-api/unicode.rst (+24, -16)
Doc/c-api/bytearray.rst
...
...
@@ -64,7 +64,8 @@ Direct API functions
 .. c:function:: char* PyByteArray_AsString(PyObject *bytearray)
    Return the contents of *bytearray* as a char array after checking for a
-   *NULL* pointer.
+   *NULL* pointer. The returned array always has an extra
+   null byte appended.
 .. c:function:: int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len)
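The extra null byte described in this hunk can be observed without writing an extension module. The following is a minimal sketch, assuming a CPython interpreter where `ctypes.pythonapi` exposes the C API; the variable names (`ba`, `addr`, `raw`) are illustrative only.

```python
import ctypes

# Assumption: this runs on CPython, where ctypes.pythonapi gives access to the C API.
api = ctypes.pythonapi
api.PyByteArray_AsString.argtypes = [ctypes.py_object]
# Use a raw address as the return type so ctypes does not stop at the first null byte.
api.PyByteArray_AsString.restype = ctypes.c_void_p

ba = bytearray(b"xy\x00z")              # contains an embedded null byte
addr = api.PyByteArray_AsString(ba)     # pointer to the bytearray's internal buffer

# Read the payload plus one extra byte: the appended null terminator.
raw = ctypes.string_at(addr, len(ba) + 1)
print(raw)
```

The pointer is only valid while `ba` is alive and not resized, so the read happens immediately after the call.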
...
...
Doc/c-api/bytes.rst
...
...
@@ -69,8 +69,8 @@ called with a non-bytes parameter.
    +===================+===============+================================+
    | :attr:`%%`        | *n/a*         | The literal % character.       |
    +-------------------+---------------+--------------------------------+
-   | :attr:`%c`        | int           | A single character,            |
-   |                   |               | represented as an C int.       |
+   | :attr:`%c`        | int           | A single byte,                 |
+   |                   |               | represented as a C int.        |
    +-------------------+---------------+--------------------------------+
    | :attr:`%d`        | int           | Exactly equivalent to          |
    |                   |               | ``printf("%d")``.              |
...
...
@@ -109,7 +109,7 @@ called with a non-bytes parameter.
    +-------------------+---------------+--------------------------------+
    An unrecognized format character causes all the rest of the format string to be
-   copied as-is to the result string, and any extra arguments discarded.
+   copied as-is to the result object, and any extra arguments discarded.
 .. c:function:: PyObject* PyBytes_FromFormatV(const char *format, va_list vargs)
...
...
@@ -136,11 +136,13 @@ called with a non-bytes parameter.
 .. c:function:: char* PyBytes_AsString(PyObject *o)
-   Return a NUL-terminated representation of the contents of *o*. The pointer
-   refers to the internal buffer of *o*, not a copy. The data must not be
-   modified in any way, unless the string was just created using
+   Return a pointer to the contents of *o*. The pointer
+   refers to the internal buffer of *o*, which consists of ``len(o) + 1``
+   bytes. The last byte in the buffer is always null, regardless of
+   whether there are any other null bytes. The data must not be
+   modified in any way, unless the object was just created using
    ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-   *o* is not a string object at all, :c:func:`PyBytes_AsString` returns *NULL*
+   *o* is not a bytes object at all, :c:func:`PyBytes_AsString` returns *NULL*
    and raises :exc:`TypeError`.
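The ``len(o) + 1`` layout and the effect of embedded null bytes can be illustrated from Python. This is a sketch under the assumption that the interpreter is CPython and that `ctypes.pythonapi` is available; `data`, `internal`, and `c_view` are hypothetical names chosen for the example.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes PyBytes_AsString.
api = ctypes.pythonapi
api.PyBytes_AsString.argtypes = [ctypes.py_object]
# Return a raw address so ctypes does not convert (and truncate) the result itself.
api.PyBytes_AsString.restype = ctypes.c_void_p

data = b"ab\x00cd"                      # bytes object with an embedded null byte
addr = api.PyBytes_AsString(data)       # points at the internal buffer, not a copy

# The buffer holds len(data) + 1 bytes; the last one is the null terminator.
internal = ctypes.string_at(addr, len(data) + 1)

# Reading without a length stops at the first null, as strlen()-style C code would.
c_view = ctypes.string_at(addr)
print(internal, c_view)
```

The second read shows why the new wording distinguishes the trailing null from "any other null bytes": code that treats the pointer as a C string sees only the part before the first embedded null.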
...
...
@@ -151,16 +153,18 @@ called with a non-bytes parameter.
 .. c:function:: int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
-   Return a NUL-terminated representation of the contents of the object *obj*
+   Return the null-terminated contents of the object *obj*
    through the output variables *buffer* and *length*.
-   If *length* is *NULL*, the resulting buffer may not contain NUL characters;
+   If *length* is *NULL*, the bytes object
+   may not contain embedded null bytes;
    if it does, the function returns ``-1`` and a :exc:`TypeError` is raised.
-   The buffer refers to an internal string buffer of *obj*, not a copy. The data
-   must not be modified in any way, unless the string was just created using
+   The buffer refers to an internal buffer of *obj*, which includes an
+   additional null byte at the end (not counted in *length*). The data
+   must not be modified in any way, unless the object was just created using
    ``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-   *string* is not a string object at all, :c:func:`PyBytes_AsStringAndSize`
+   *obj* is not a bytes object at all, :c:func:`PyBytes_AsStringAndSize`
    returns ``-1`` and raises :exc:`TypeError`.
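Both behaviours in this hunk (the terminator not counted in *length*, and the rejection of embedded nulls when *length* is *NULL*) can be sketched with `ctypes`. The assumptions are a CPython interpreter and illustrative variable names. The text in this commit documents :exc:`TypeError`; newer CPython releases raise :exc:`ValueError` for the embedded-null case, so the sketch accepts either.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi exposes PyBytes_AsStringAndSize.
api = ctypes.pythonapi
api.PyBytes_AsStringAndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_void_p),     # stands in for char **buffer
    ctypes.POINTER(ctypes.c_ssize_t),    # Py_ssize_t *length (may be NULL)
]
api.PyBytes_AsStringAndSize.restype = ctypes.c_int

data = b"ab\x00cd"                       # embedded null byte
buf = ctypes.c_void_p()
length = ctypes.c_ssize_t()

# With a length pointer, embedded nulls are fine and length excludes the terminator.
rc = api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), ctypes.byref(length))
contents = ctypes.string_at(buf.value, length.value + 1)

# With length == NULL, an embedded null makes the call fail with an exception,
# which ctypes.pythonapi re-raises in Python.
try:
    api.PyBytes_AsStringAndSize(data, ctypes.byref(buf), None)
    rejected = False
except (TypeError, ValueError):
    rejected = True
print(rc, length.value, contents, rejected)
```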
...
...
@@ -168,14 +172,14 @@ called with a non-bytes parameter.
    Create a new bytes object in *\*bytes* containing the contents of *newpart*
    appended to *bytes*; the caller will own the new reference. The reference to
-   the old value of *bytes* will be stolen. If the new string cannot be
+   the old value of *bytes* will be stolen. If the new object cannot be
    created, the old reference to *bytes* will still be discarded and the value
    of *\*bytes* will be set to *NULL*; the appropriate exception will be set.
 .. c:function:: void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart)
-   Create a new string object in *\*bytes* containing the contents of *newpart*
+   Create a new bytes object in *\*bytes* containing the contents of *newpart*
    appended to *bytes*. This version decrements the reference count of
    *newpart*.
...
...
Doc/c-api/unicode.rst
...
...
@@ -227,7 +227,10 @@ access internal read-only data of Unicode objects:
                 const char* PyUnicode_AS_DATA(PyObject *o)
    Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
-   ``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
+   returned buffer is always terminated with an extra null code point. It
+   may also contain embedded null code points, which would cause the string
+   to be truncated when used in most C functions. The ``AS_DATA`` form
+   casts the pointer to :c:type:`const char *`. The *o* argument has to be
    a Unicode object (not checked).
.. versionchanged:: 3.3
...
...
@@ -650,7 +653,8 @@ APIs:
    Copy the string *u* into a new UCS4 buffer that is allocated using
    :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
-   :exc:`MemoryError` set.
+   :exc:`MemoryError` set. The returned buffer always has an extra
+   null code point appended.
    .. versionadded:: 3.3
...
...
@@ -689,8 +693,9 @@ Extension modules can continue using them, as they will not be removed in Python
    Return a read-only pointer to the Unicode object's internal
    :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
    :c:type:`Py_UNICODE*` representation of the object if it is not yet
-   available. Note that the resulting :c:type:`Py_UNICODE` string may contain
-   embedded null characters, which would cause the string to be truncated when
+   available. The buffer is always terminated with an extra null code point.
+   Note that the resulting :c:type:`Py_UNICODE` string may also contain
+   embedded null code points, which would cause the string to be truncated when
    used in most C functions.
    Please migrate to using :c:func:`PyUnicode_AsUCS4`,
...
...
@@ -708,8 +713,9 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
    Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-   array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
-   may contain embedded null characters, which would cause the string to be
+   array length (excluding the extra null terminator) in *size*.
+   Note that the resulting :c:type:`Py_UNICODE*` string
+   may contain embedded null code points, which would cause the string to be
    truncated when used in most C functions.
    .. versionadded:: 3.3
...
...
@@ -717,11 +723,11 @@ Extension modules can continue using them, as they will not be removed in Python
 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
-   Create a copy of a Unicode string ending with a nul character. Return *NULL*
+   Create a copy of a Unicode string ending with a null code point. Return *NULL*
    and raise a :exc:`MemoryError` exception on memory allocation failure,
    otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
    the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
-   contain embedded null characters, which would cause the string to be
+   contain embedded null code points, which would cause the string to be
    truncated when used in most C functions.
    .. versionadded:: 3.2
...
...
@@ -902,10 +908,10 @@ wchar_t Support
    Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
    *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
-   0-termination character). Return the number of :c:type:`wchar_t` characters
+   null termination character). Return the number of :c:type:`wchar_t` characters
    copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
-   string may or may not be 0-terminated. It is the responsibility of the caller
-   to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+   string may or may not be null-terminated. It is the responsibility of the caller
+   to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
    required by the application. Also, note that the :c:type:`wchar_t*` string
    might contain null characters, which would cause the string to be truncated
    when used with most C functions.
...
...
@@ -914,8 +920,8 @@ wchar_t Support
 .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
    Convert the Unicode object to a wide character string. The output string
-   always ends with a nul character. If *size* is not *NULL*, write the number
-   of wide characters (excluding the trailing 0-termination character) into
+   always ends with a null character. If *size* is not *NULL*, write the number
+   of wide characters (excluding the trailing null termination character) into
    *\*size*.
    Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
...
...
@@ -1045,9 +1051,11 @@ These are the UTF-8 codec APIs:
 .. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
-   Return a pointer to the default encoding (UTF-8) of the Unicode object, and
-   store the size of the encoded representation (in bytes) in *size*. *size*
-   can be *NULL*, in this case no size will be stored.
+   Return a pointer to the UTF-8 encoding of the Unicode object, and
+   store the size of the encoded representation (in bytes) in *size*. The
+   *size* argument can be *NULL*; in this case no size will be stored. The
+   returned buffer always has an extra null byte appended (not included in
+   *size*), regardless of whether there are any other null code points.
    In the case of an error, *NULL* is returned with an exception set and no
    *size* is stored.
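The relationship between *size* and the appended null byte can be checked with a short sketch. It assumes a CPython interpreter whose shared library exports `PyUnicode_AsUTF8AndSize` through `ctypes.pythonapi`; the sample string and variable names are illustrative.

```python
import ctypes

# Assumption: CPython, with PyUnicode_AsUTF8AndSize reachable via ctypes.pythonapi.
api = ctypes.pythonapi
api.PyUnicode_AsUTF8AndSize.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
# Raw address as return type, so the embedded null does not truncate the result.
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p

text = "h\u00e9\x00llo"                  # non-ASCII character plus an embedded null
size = ctypes.c_ssize_t()
addr = api.PyUnicode_AsUTF8AndSize(text, ctypes.byref(size))

# size counts the UTF-8 bytes only; one more byte (the terminator) follows them.
encoded = ctypes.string_at(addr, size.value + 1)
print(size.value, encoded)
```

The returned pointer refers to data cached on `text`, so it is read while `text` is still referenced.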
...
...