Commit be803255 authored by Stefan Behnel

string docs: moved comments on C++ string into their own section

parent 6d9f94d4
@@ -163,23 +163,6 @@ null bytes. Text encoded in UTF-8 or one of the ISO-8859 encodings is
usually a good candidate. If in doubt, it's better to pass indices
that are 'obviously' correct than to rely on the data to be as expected.
When wrapping a C++ library, strings will usually come in the form of
the :c:type:`std::string` class. Efficient decoding support is
available in Cython 0.17 and later::

    # distutils: language = c++

    from libcpp.string cimport string

    cdef string s = string('abcdefg')

    ustring1 = s.decode('UTF-8')
    ustring2 = s[2:-2].decode('UTF-8')

For C++ strings, decoding slices will always take the proper length
of the string into account and apply Python slicing semantics (e.g.
return empty strings for out-of-bounds indices).

It is common practice to wrap string conversions (and non-trivial type
conversions in general) in dedicated functions, as this needs to be
done in exactly the same way whenever receiving text from C. This
@@ -231,6 +214,49 @@ assignment. Later access to the invalidated pointer will read invalid
memory and likely result in a segfault. Cython will therefore refuse
to compile this code.

C++ strings
-----------
When wrapping a C++ library, strings will usually come in the form of
the :c:type:`std::string` class. As with C strings, Python byte strings
automatically coerce from and to C++ strings::
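
    # distutils: language = c++

    from libcpp.string cimport string

    def get_bytes():  # hypothetical helper, for illustration only
        py_bytes_object = b'hello world'
        # copies the bytes buffer into a new C++ string
        cdef string s = py_bytes_object

        s.append('abc')
        # coerces back into a new Python bytes object
        py_bytes_object = s
        return py_bytes_object
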
The memory management situation is different from C because the
creation of a C++ string makes an independent copy of the string
buffer which the string object then owns. It is therefore possible
to convert temporarily created Python objects directly into C++
strings. A common way to make use of this is when encoding a Python
unicode string into a C++ string::

    cdef string cpp_string = py_unicode_string.encode('UTF-8')

Note that this involves a bit of overhead because it first encodes
the Unicode string into a temporarily created Python bytes object
and then copies its buffer into a new C++ string.
For the other direction, efficient decoding support is available
in Cython 0.17 and later::

    cdef string s = string('abcdefg')

    ustring1 = s.decode('UTF-8')
    ustring2 = s[2:-2].decode('UTF-8')

For C++ strings, decoding slices will always take the proper length
of the string into account and apply Python slicing semantics (e.g.
return empty strings for out-of-bounds indices).
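
Because the decoding slices follow Python slicing semantics, their
results can be previewed with ordinary Python bytes objects. The
following is a minimal sketch under that assumption: the bytes object
merely stands in for a C++ string, and the variable names are
illustrative only::

    # A plain Python bytes object stands in for the C++ string here
    # (an assumption for illustration); slice bounds follow Python rules.
    data = b'abcdefg'

    # negative indices count from the end: drop two bytes on each side
    middle = data[2:-2].decode('UTF-8')

    # out-of-bounds indices are clipped instead of raising an error
    tail = data[5:100].decode('UTF-8')
    empty = data[10:20].decode('UTF-8')
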

Source code encoding
--------------------
......