Commit 0a560a11 authored by R David Murray

#23088: Clarify null termination of bytes and strings in C API.

Patch by Martin Panter, reviewed by Serhiy Storchaka and R. David Murray.
parent 3afdb287
......@@ -64,7 +64,8 @@ Direct API functions
.. c:function:: char* PyByteArray_AsString(PyObject *bytearray)
Return the contents of *bytearray* as a char array after checking for a
-*NULL* pointer.
+*NULL* pointer. The returned array always has an extra
+null byte appended.
.. c:function:: int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len)
......
......@@ -69,8 +69,8 @@ called with a non-bytes parameter.
+===================+===============+================================+
| :attr:`%%` | *n/a* | The literal % character. |
+-------------------+---------------+--------------------------------+
-| :attr:`%c` | int | A single character, |
-| | | represented as an C int. |
+| :attr:`%c` | int | A single byte, |
+| | | represented as a C int. |
+-------------------+---------------+--------------------------------+
| :attr:`%d` | int | Exactly equivalent to |
| | | ``printf("%d")``. |
......@@ -109,7 +109,7 @@ called with a non-bytes parameter.
+-------------------+---------------+--------------------------------+
An unrecognized format character causes all the rest of the format string to be
-copied as-is to the result string, and any extra arguments discarded.
+copied as-is to the result object, and any extra arguments discarded.
.. c:function:: PyObject* PyBytes_FromFormatV(const char *format, va_list vargs)
......@@ -136,11 +136,13 @@ called with a non-bytes parameter.
.. c:function:: char* PyBytes_AsString(PyObject *o)
-Return a NUL-terminated representation of the contents of *o*. The pointer
-refers to the internal buffer of *o*, not a copy. The data must not be
-modified in any way, unless the string was just created using
+Return a pointer to the contents of *o*. The pointer
+refers to the internal buffer of *o*, which consists of ``len(o) + 1``
+bytes. The last byte in the buffer is always null, regardless of
+whether there are any other null bytes. The data must not be
+modified in any way, unless the object was just created using
``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-*o* is not a string object at all, :c:func:`PyBytes_AsString` returns *NULL*
+*o* is not a bytes object at all, :c:func:`PyBytes_AsString` returns *NULL*
and raises :exc:`TypeError`.
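The buffer layout described in this hunk (``len(o) + 1`` bytes, with the last one always null) can be illustrated with a small sketch. This is plain C and does not call the CPython API: `demo_buffer`, `demo_size`, `c_string_length`, and `has_embedded_null` are hypothetical names that only mimic the documented layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bytes object's internal buffer: the
   payload bytes followed by one extra null byte (size + 1 in total). */
static const char demo_buffer[] = {'a', 'b', '\0', 'c', 'd', '\0'};
static const size_t demo_size = sizeof(demo_buffer) - 1;  /* plays the role of len(o) */

/* Length as seen by C string functions, which stop at the first null. */
static size_t c_string_length(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Whether the payload (excluding the terminator) contains a null byte. */
static int has_embedded_null(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

In this model the trailing null is always present at index `demo_size`, but `c_string_length` stops at the embedded null, which is why code that needs the full contents should carry the length separately.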
......@@ -151,16 +153,18 @@ called with a non-bytes parameter.
.. c:function:: int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
-Return a NUL-terminated representation of the contents of the object *obj*
+Return the null-terminated contents of the object *obj*
through the output variables *buffer* and *length*.
-If *length* is *NULL*, the resulting buffer may not contain NUL characters;
+If *length* is *NULL*, the bytes object
+may not contain embedded null bytes;
if it does, the function returns ``-1`` and a :exc:`TypeError` is raised.
-The buffer refers to an internal string buffer of *obj*, not a copy. The data
-must not be modified in any way, unless the string was just created using
+The buffer refers to an internal buffer of *obj*, which includes an
+additional null byte at the end (not counted in *length*). The data
+must not be modified in any way, unless the object was just created using
``PyBytes_FromStringAndSize(NULL, size)``. It must not be deallocated. If
-*string* is not a string object at all, :c:func:`PyBytes_AsStringAndSize`
+*obj* is not a bytes object at all, :c:func:`PyBytes_AsStringAndSize`
returns ``-1`` and raises :exc:`TypeError`.
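The *length* rule in this hunk can be sketched as a small model. This is not CPython code: `model_as_string_and_size` is a hypothetical plain C function that takes the payload and its size directly, and returns ``-1`` where the documented function would raise :exc:`TypeError`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not CPython code: `data` points at `size` payload
   bytes followed by one terminating null byte. Returns 0 on success and
   -1 where the documented function would raise TypeError. */
static int model_as_string_and_size(const char *data, size_t size,
                                    const char **buffer, size_t *length)
{
    assert(data != NULL && buffer != NULL);
    assert(data[size] == '\0');  /* the extra byte, not counted in size */

    if (length == NULL) {
        /* The caller will rely on the terminator alone, so an embedded
           null byte would silently truncate the data: reject it. */
        if (memchr(data, '\0', size) != NULL)
            return -1;
    } else {
        *length = size;
    }
    *buffer = data;  /* borrowed pointer into the buffer, not a copy */
    return 0;
}
```

Under this model, passing a non-NULL *length* accepts any contents, while passing *NULL* only succeeds when the data is safe to treat as an ordinary C string.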
......@@ -168,14 +172,14 @@ called with a non-bytes parameter.
Create a new bytes object in *\*bytes* containing the contents of *newpart*
appended to *bytes*; the caller will own the new reference. The reference to
-the old value of *bytes* will be stolen. If the new string cannot be
+the old value of *bytes* will be stolen. If the new object cannot be
created, the old reference to *bytes* will still be discarded and the value
of *\*bytes* will be set to *NULL*; the appropriate exception will be set.
.. c:function:: void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart)
-Create a new string object in *\*bytes* containing the contents of *newpart*
+Create a new bytes object in *\*bytes* containing the contents of *newpart*
appended to *bytes*. This version decrements the reference count of
*newpart*.
......
......@@ -227,7 +227,10 @@ access internal read-only data of Unicode objects:
const char* PyUnicode_AS_DATA(PyObject *o)
Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
-``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
+returned buffer is always terminated with an extra null code point. It
+may also contain embedded null code points, which would cause the string
+to be truncated when used in most C functions. The ``AS_DATA`` form
+casts the pointer to :c:type:`const char *`. The *o* argument has to be
a Unicode object (not checked).
.. versionchanged:: 3.3
......@@ -650,7 +653,8 @@ APIs:
Copy the string *u* into a new UCS4 buffer that is allocated using
:c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
-:exc:`MemoryError` set.
+:exc:`MemoryError` set. The returned buffer always has an extra
+null code point appended.
.. versionadded:: 3.3
......@@ -689,8 +693,9 @@ Extension modules can continue using them, as they will not be removed in Python
Return a read-only pointer to the Unicode object's internal
:c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
:c:type:`Py_UNICODE*` representation of the object if it is not yet
-available. Note that the resulting :c:type:`Py_UNICODE` string may contain
-embedded null characters, which would cause the string to be truncated when
+available. The buffer is always terminated with an extra null code point.
+Note that the resulting :c:type:`Py_UNICODE` string may also contain
+embedded null code points, which would cause the string to be truncated when
used in most C functions.
Please migrate to using :c:func:`PyUnicode_AsUCS4`,
......@@ -708,8 +713,9 @@ Extension modules can continue using them, as they will not be removed in Python
.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
-may contain embedded null characters, which would cause the string to be
+array length (excluding the extra null terminator) in *size*.
+Note that the resulting :c:type:`Py_UNICODE*` string
+may contain embedded null code points, which would cause the string to be
truncated when used in most C functions.
.. versionadded:: 3.3
......@@ -717,11 +723,11 @@ Extension modules can continue using them, as they will not be removed in Python
.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
-Create a copy of a Unicode string ending with a nul character. Return *NULL*
+Create a copy of a Unicode string ending with a null code point. Return *NULL*
and raise a :exc:`MemoryError` exception on memory allocation failure,
otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
-contain embedded null characters, which would cause the string to be
+contain embedded null code points, which would cause the string to be
truncated when used in most C functions.
.. versionadded:: 3.2
......@@ -895,10 +901,10 @@ wchar_t Support
Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
*size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
-0-termination character). Return the number of :c:type:`wchar_t` characters
+null termination character). Return the number of :c:type:`wchar_t` characters
copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
-string may or may not be 0-terminated. It is the responsibility of the caller
-to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
+string may or may not be null-terminated. It is the responsibility of the caller
+to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
required by the application. Also, note that the :c:type:`wchar_t*` string
might contain null characters, which would cause the string to be truncated
when used with most C functions.
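One way a caller might meet the termination responsibility described in this hunk is to reserve the last slot of the destination for a terminator. The sketch below shows that pattern in plain C. `copy_wide_terminated` is a hypothetical helper, not part of the CPython API, and it copies from an ordinary `wchar_t` array rather than from a Unicode object.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy at most `capacity - 1` wide characters from
   `src` (which holds `src_len` characters) into `dst` and always write a
   terminator. Returns the number of characters copied, excluding it. */
static size_t copy_wide_terminated(wchar_t *dst, size_t capacity,
                                   const wchar_t *src, size_t src_len)
{
    assert(dst != NULL && src != NULL && capacity > 0);
    size_t n = src_len < capacity - 1 ? src_len : capacity - 1;
    wmemcpy(dst, src, n);
    dst[n] = L'\0';  /* explicit termination, even when src was truncated */
    return n;
}
```

The same idea would presumably carry over to the documented function by passing one less than the buffer capacity as *size* and writing the terminator afterwards. Embedded null characters in the source would still truncate the result for functions such as `wcslen`.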
......@@ -907,8 +913,8 @@ wchar_t Support
.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
Convert the Unicode object to a wide character string. The output string
-always ends with a nul character. If *size* is not *NULL*, write the number
-of wide characters (excluding the trailing 0-termination character) into
+always ends with a null character. If *size* is not *NULL*, write the number
+of wide characters (excluding the trailing null termination character) into
*\*size*.
Returns a buffer allocated by :c:func:`PyMem_Alloc` (use
......@@ -1038,9 +1044,11 @@ These are the UTF-8 codec APIs:
.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
-Return a pointer to the default encoding (UTF-8) of the Unicode object, and
-store the size of the encoded representation (in bytes) in *size*. *size*
-can be *NULL*, in this case no size will be stored.
+Return a pointer to the UTF-8 encoding of the Unicode object, and
+store the size of the encoded representation (in bytes) in *size*. The
+*size* argument can be *NULL*; in this case no size will be stored. The
+returned buffer always has an extra null byte appended (not included in
+*size*), regardless of whether there are any other null code points.
In the case of an error, *NULL* is returned with an exception set and no
*size* is stored.
......