Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
C
cython
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
Analytics
Analytics
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Commits
Issue Boards
Open sidebar
Boxiang Sun
cython
Commits
7e137a34
Commit
7e137a34
authored
Jul 19, 2012
by
Stefan Behnel
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
doc update on string decoding
parent
92db64f1
Changes
1
Hide whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
30 additions
and
2 deletions
+30
-2
docs/src/tutorial/strings.rst
docs/src/tutorial/strings.rst
+30
-2
No files found.
docs/src/tutorial/strings.rst
View file @
7e137a34
...
...
@@ -44,7 +44,9 @@ slicing the C string::
Here, no additional byte counting is required and ``length`` bytes from
the ``c_string`` will be copied into the Python bytes object, including
any null bytes.
any null bytes. Keep in mind that the slice indices are assumed to be
accurate in this case and no bounds checking is done, so incorrect
slice indices will lead to data corruption and crashes.
Note that the creation of the Python bytes string can fail with an
exception, e.g. due to insufficient memory. If you need to
...
...
@@ -138,7 +140,7 @@ contains no null bytes::
cdef char* some_c_string = c_call_returning_a_c_string()
ustring = some_c_string.decode('UTF-8')
And for strings where the length is known::
And
, more efficiently,
for strings where the length is known::
cdef char* c_string = NULL
cdef Py_ssize_t length = 0
...
...
@@ -152,6 +154,32 @@ The same should be used when the string contains null bytes, e.g. when
it uses an encoding like UCS-4, where each character is encoded in four
bytes most of which tend to be 0.
Again, no bounds checking is done if slice indices are provided, so
incorrect indices lead to data corruption and crashes. However, using
negative indices is possible since Cython 0.17 and will inject a call
to :c:func:`strlen()` in order to determine the string length.
Obviously, this only works for 0-terminated strings without internal
null bytes. Text encoded in UTF-8 or one of the ISO-8859 encodings is
usually a good candidate. If in doubt, it's better to pass indices
that are 'obviously' correct than to rely on the data to be as expected.
When wrapping a C++ library, strings will usually come in the form of
the :c:type:`std::string` class. Efficient decoding support is
available in Cython 0.17 and later::
# cython: language = c++
from libcpp.string cimport string
s = string('abcdefg')
ustring1 = s.decode('UTF-8')
ustring2 = s[2:-2].decode('UTF-8')
For C++ strings, decoding slices will always take the proper length
of the string into account and apply Python slicing semantics (e.g.
return empty strings for out-of-bounds indices).
It is common practice to wrap string conversions (and non-trivial type
conversions in general) in dedicated functions, as this needs to be
done in exactly the same way whenever receiving text from C. This
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment