Commit b18cf636 authored by Robert Bradshaw

Merge branch 'master' of github.com:cython/cython-docs

parents cde0b0b7 cbac7ae5
...@@ -40,17 +40,19 @@ source_suffix = '.rst' ...@@ -40,17 +40,19 @@ source_suffix = '.rst'
# The master toctree document. # The master toctree document.
master_doc = 'index' master_doc = 'index'
exclude_patterns = ['py*', 'build']
# General substitutions. # General substitutions.
project = 'Cython' project = 'Cython'
copyright = '2010, Stefan Behnel, Robert Bradshaw, Dag Sverre Seljebotn, Greg Ewing, William Stein, Gabriel Gellner, et al.' copyright = '2011, Stefan Behnel, Robert Bradshaw, Dag Sverre Seljebotn, Greg Ewing, William Stein, Gabriel Gellner, et al.'
# The default replacements for |version| and |release|, also used in various # The default replacements for |version| and |release|, also used in various
# other places throughout the built documents. # other places throughout the built documents.
# #
# The short X.Y version. # The short X.Y version.
version = '0.14' version = '0.15'
# The full version, including alpha/beta/rc tags. # The full version, including alpha/beta/rc tags.
release = '0.14' release = '0.15pre'
# There are two options for replacing |today|: either, you set today to some # There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used: # non-false value, then it is used:
......
...@@ -238,7 +238,8 @@ flags, such as:: ...@@ -238,7 +238,8 @@ flags, such as::
Once we have compiled the module for the first time, we can now import Once we have compiled the module for the first time, we can now import
it and instantiate a new Queue:: it and instantiate a new Queue::
PYTHONPATH=. python -c 'import queue.Queue as Q ; Q()' $ export PYTHONPATH=.
$ python -c 'import queue.Queue as Q ; Q()'
However, this is all our Queue class can do so far, so let's make it However, this is all our Queue class can do so far, so let's make it
more usable. more usable.
......
...@@ -15,12 +15,12 @@ features for managing it. ...@@ -15,12 +15,12 @@ features for managing it.
Finally, don't hesitate to ask questions (or post reports on Finally, don't hesitate to ask questions (or post reports on
successes!) on the Cython users mailing list [UserList]_. The Cython successes!) on the Cython users mailing list [UserList]_. The Cython
developer mailing list, [DevList]_, is also open to everybody. Feel developer mailing list, [DevList]_, is also open to everybody, but
free to use it to report a bug, ask for guidance, if you have time to focusses on core development issues. Feel free to use it to report a
spare to develop Cython, or if you have suggestions for future clear bug, to ask for guidance if you have time to spare to develop
development. Cython, or if you have suggestions for future development.
.. [DevList] Cython developer mailing list: http://codespeak.net/mailman/listinfo/cython-dev. .. [DevList] Cython developer mailing list: http://mail.python.org/mailman/listinfo/cython-devel
.. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython, .. [Seljebotn09] D. S. Seljebotn, Fast numerical computations with Cython,
Proceedings of the 8th Python in Science Conference, 2009. Proceedings of the 8th Python in Science Conference, 2009.
.. [UserList] Cython users mailing list: http://groups.google.com/group/cython-users .. [UserList] Cython users mailing list: http://groups.google.com/group/cython-users
Related work Related work
============ ============
Pyrex [Pyrex]_ is the compiler project that Cython was originally based on. Pyrex [Pyrex]_ is the compiler project that Cython was originally
Many features and the major design decisions of the Cython language based on. Many features and the major design decisions of the Cython
were developed by Greg Ewing as part of that project. Today, Cython language were developed by Greg Ewing as part of that project. Today,
supersedes the capabilities of Pyrex by providing a higher Cython supersedes the capabilities of Pyrex by providing a
compatibility with Python code and Python semantics, as well as substantially higher compatibility with Python code and Python
superior optimisations and better integration with scientific Python semantics, as well as superior optimisations and better integration
extensions like NumPy. with scientific Python extensions like NumPy.
ctypes [ctypes]_ is a foreign function interface (FFI) for Python. It ctypes [ctypes]_ is a foreign function interface (FFI) for Python. It
provides C compatible data types, and allows calling functions in DLLs provides C compatible data types, and allows calling functions in DLLs
...@@ -20,23 +20,24 @@ operations must pass through Python code first. Cython, being a ...@@ -20,23 +20,24 @@ operations must pass through Python code first. Cython, being a
compiled language, can avoid much of this overhead by moving more compiled language, can avoid much of this overhead by moving more
functionality and long-running loops into fast C code. functionality and long-running loops into fast C code.
SWIG [SWIG]_ is a wrapper code generator. It makes it very easy to parse SWIG [SWIG]_ is a wrapper code generator. It makes it very easy to
large API definitions in C/C++ header files, and to generate straight parse large API definitions in C/C++ header files, and to generate
forward wrapper code for a large set of programming languages. As straight forward wrapper code for a large set of programming
opposed to Cython, however, it is not a programming language itself. languages. As opposed to Cython, however, it is not a programming
Thin wrappers are easy to generate, but the more functionality a language itself. Thin wrappers are easy to generate, but the more
wrapper needs to provide, the harder it gets to implement it with functionality a wrapper needs to provide, the harder it gets to
SWIG. Cython, on the other hand, makes it very easy to write very implement it with SWIG. Cython, on the other hand, makes it very easy
elaborate wrapper code specifically for the Python language. Also, to write very elaborate wrapper code specifically for the Python
there exists third party code for parsing C header files and using it language, and to make it as thin or thick as needed at any given
to generate Cython definitions and module skeletons. place. Also, there exists third party code for parsing C header files
and using it to generate Cython definitions and module skeletons.
ShedSkin [ShedSkin]_ is an experimental Python-to-C++ compiler. It ShedSkin [ShedSkin]_ is an experimental Python-to-C++ compiler. It
uses profiling information and very powerful type inference engine uses a very powerful whole-module type inference engine to generate a
to generate a C++ program from (restricted) Python source code. C++ program from (restricted) Python source code. The main drawback
The main drawback is has no support for calling the Python/C API for is that it has no support for calling the Python/C API for operations
operations it does not support natively, and supports very few of the it does not support natively, and supports very few of the standard
standard Python modules. Python modules.
.. [ctypes] http://docs.python.org/library/ctypes.html. .. [ctypes] http://docs.python.org/library/ctypes.html.
.. there's also the original ctypes home page: http://python.net/crew/theller/ctypes/ .. there's also the original ctypes home page: http://python.net/crew/theller/ctypes/
......
...@@ -203,18 +203,23 @@ Single bytes and characters ...@@ -203,18 +203,23 @@ Single bytes and characters
--------------------------- ---------------------------
The Python C-API uses the normal C ``char`` type to represent a byte The Python C-API uses the normal C ``char`` type to represent a byte
value, but it has a special ``Py_UNICODE`` integer type for a Unicode value, but it has two special integer types for a Unicode code point
code point value, i.e. a single Unicode character. Since version value, i.e. a single Unicode character: ``Py_UNICODE`` and
0.13, Cython supports the latter natively, which is either defined as ``Py_UCS4``. Since version 0.13, Cython supports the first natively,
an unsigned 2-byte or 4-byte integer, or as ``wchar_t``, depending on support for ``Py_UCS4`` is new in Cython 0.15. ``Py_UNICODE`` is
the platform. The exact type is a compile time option in the build of either defined as an unsigned 2-byte or 4-byte integer, or as
the CPython interpreter and extension modules inherit this definition ``wchar_t``, depending on the platform. The exact type is a compile
at C compile time. time option in the build of the CPython interpreter and extension
modules inherit this definition at C compile time. The advantage of
In Cython, the ``char`` and ``Py_UNICODE`` types behave differently ``Py_UCS4`` is that it is guaranteed to be large enough for any
when coercing to Python objects. Similar to the behaviour of the Unicode code point value, regardless of the platform. It is defined
bytes type in Python 3, the ``char`` type coerces to a Python integer as a 32bit unsigned int or long.
value by default, so that the following prints 65 and not ``A``::
In Cython, the ``char`` type behaves differently from the
``Py_UNICODE`` and ``Py_UCS4`` types when coercing to Python objects.
Similar to the behaviour of the bytes type in Python 3, the ``char``
type coerces to a Python integer value by default, so that the
following prints 65 and not ``A``::
# -*- coding: ASCII -*- # -*- coding: ASCII -*-
...@@ -230,31 +235,32 @@ explicitly, and the following will print ``A`` (or ``b'A'`` in Python ...@@ -230,31 +235,32 @@ explicitly, and the following will print ``A`` (or ``b'A'`` in Python
The explicit coercion works for any C integer type. Values outside of The explicit coercion works for any C integer type. Values outside of
the range of a ``char`` or ``unsigned char`` will raise an the range of a ``char`` or ``unsigned char`` will raise an
``OverflowError``. Coercion will also happen automatically when ``OverflowError`` at runtime. Coercion will also happen automatically
assigning to a typed variable, e.g.:: when assigning to a typed variable, e.g.::
cdef bytes py_byte_string = char_val cdef bytes py_byte_string
py_byte_string = char_val
On the other hand, the ``Py_UNICODE`` type is rarely used outside of On the other hand, the ``Py_UNICODE`` and ``Py_UCS4`` types are rarely
the context of a Python unicode string, so its default behaviour is to used outside of the context of a Python unicode string, so their
coerce to a Python unicode object. The following will therefore print default behaviour is to coerce to a Python unicode object. The
the character ``A``:: following will therefore print the character ``A``, as would the same
code with the ``Py_UNICODE`` type::
cdef Py_UNICODE uchar_val = u'A' cdef Py_UCS4 uchar_val = u'A'
assert uchar_val == 65 # character point value of u'A' assert uchar_val == 65 # character point value of u'A'
print( uchar_val ) print( uchar_val )
Again, explicit casting will allow users to override this behaviour. Again, explicit casting will allow users to override this behaviour.
The following will print 65:: The following will print 65::
cdef Py_UNICODE uchar_val = u'A' cdef Py_UCS4 uchar_val = u'A'
print( <int>uchar_val ) print( <long>uchar_val )
Note that casting to a C ``int`` (or ``unsigned int``) will do just Note that casting to a C ``long`` (or ``unsigned long``) will work
fine on a platform with 32bit or more, as the maximum code point value just fine, as the maximum code point value that a Unicode character
that a Unicode character can have is 1114111 on a 4-byte unicode can have is 1114111 (``0x10FFFF``). On platforms with 32bit or more,
CPython platform ("wide unicode") and 65535 on a narrow (2-byte) ``int`` is just as good.
unicode platform.
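The Python 3 behaviour that this section compares against can be seen in plain Python (not Cython): indexing a bytes object yields an integer, while indexing a unicode string yields a one-character string. The following is only an illustrative sketch; the variable names are chosen here and do not come from the Cython documentation.

```python
# Plain Python 3, shown for comparison with the Cython coercion rules.
byte_val = b'A'[0]   # indexing bytes gives an integer
char_val = u'A'[0]   # indexing a unicode string gives a 1-character string

print(byte_val)      # 65
print(char_val)      # A

# The largest Unicode code point, as quoted in the text.
max_code_point = 0x10FFFF
print(max_code_point)                 # 1114111
print(max_code_point < 2**31 - 1)     # True: it fits in a signed 32-bit integer
```

The last line reflects why a 32-bit ``int`` is sufficient to hold any code point value.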
Narrow Unicode builds Narrow Unicode builds
...@@ -263,19 +269,19 @@ Narrow Unicode builds ...@@ -263,19 +269,19 @@ Narrow Unicode builds
In narrow Unicode builds of CPython, i.e. builds where In narrow Unicode builds of CPython, i.e. builds where
``sys.maxunicode`` is 65535 (such as all Windows builds, as opposed to ``sys.maxunicode`` is 65535 (such as all Windows builds, as opposed to
1114111 in wide builds), it is still possible to use Unicode character 1114111 in wide builds), it is still possible to use Unicode character
code points that do not fit into the two bytes wide ``Py_UNICODE`` code points that do not fit into the 16 bit wide ``Py_UNICODE`` type.
type. For example, such a CPython build will accept the unicode For example, such a CPython build will accept the unicode literal
literal ``u'\U00012345'``. However, the underlying system level ``u'\U00012345'``. However, the underlying system level encoding
encoding leaks into Python space in this case, so that the length of leaks into Python space in this case, so that the length of this
this literal becomes 2 instead of 1. This also shows when iterating literal becomes 2 instead of 1. This also shows when iterating over
over it or when indexing into it. The visible substrings are it or when indexing into it. The visible substrings are ``u'\uD808'``
``u'\uD808'`` and ``u'\uDF45'`` in this example. They form a and ``u'\uDF45'`` in this example. They form a so-called surrogate
so-called surrogate pair that represents the above character. pair that represents the above character.
For more information on this topic, it is worth reading the `Wikipedia For more information on this topic, it is worth reading the `Wikipedia
article about the UTF-16 encoding`_. article about the UTF-16 encoding`_.
.. _`Wikipedia article on the UTF-16 encoding`: http://en.wikipedia.org/wiki/UTF-16/UCS-2 .. _`Wikipedia article about the UTF-16 encoding`: http://en.wikipedia.org/wiki/UTF-16/UCS-2
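The surrogate pair quoted for ``u'\U00012345'`` can be checked with the standard UTF-16 arithmetic. The following is a minimal sketch in plain Python; the helper name ``to_surrogate_pair`` is hypothetical and is not part of Cython or CPython.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000        # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)       # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # lower 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x12345)
print(hex(high), hex(low))               # 0xd808 0xdf45
```

The two ranges ``0xD800-0xDBFF`` and ``0xDC00-0xDFFF`` do not overlap, which is why the halves of a pair stay identifiable inside a sequence of code units.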
The same properties apply to Cython code that gets compiled for a The same properties apply to Cython code that gets compiled for a
narrow CPython runtime environment. In most cases, e.g. when narrow CPython runtime environment. In most cases, e.g. when
...@@ -298,6 +304,20 @@ in question. Looking for substrings works correctly because the two ...@@ -298,6 +304,20 @@ in question. Looking for substrings works correctly because the two
code units in the surrogate pair use distinct value ranges, so the code units in the surrogate pair use distinct value ranges, so the
pair is always identifiable in a sequence of code points. pair is always identifiable in a sequence of code points.
As of version 0.15, Cython has extended support for surrogate pairs so
that you can safely use an ``in`` test to search character values from
the full ``Py_UCS4`` range even on narrow platforms::
cdef Py_UCS4 uchar = 0x12345
print( uchar in some_unicode_string )
Similarly, it can coerce a one character string with a high Unicode
code point value to a Py_UCS4 value on both narrow and wide Unicode
platforms::
cdef Py_UCS4 uchar = u'\U00012345'
assert uchar == 0x12345
Iteration Iteration
--------- ---------
...@@ -321,7 +341,7 @@ The same applies to bytes objects:: ...@@ -321,7 +341,7 @@ The same applies to bytes objects::
if c == 'A': ... if c == 'A': ...
For unicode objects, Cython will automatically infer the type of the For unicode objects, Cython will automatically infer the type of the
loop variable as ``Py_UNICODE``:: loop variable as ``Py_UCS4``::
cdef unicode ustring = ... cdef unicode ustring = ...
...@@ -335,13 +355,16 @@ value to be a Python object, so Cython may end up generating redundant ...@@ -335,13 +355,16 @@ value to be a Python object, so Cython may end up generating redundant
conversion code for the loop variable value inside of the loop. If conversion code for the loop variable value inside of the loop. If
this leads to a performance degradation for a specific piece of code, this leads to a performance degradation for a specific piece of code,
you can either type the loop variable as a Python object explicitly, you can either type the loop variable as a Python object explicitly,
or assign it to a Python typed temporary variable to enforce one-time or assign its value to a Python typed variable somewhere inside of the
coercion before running Python operations on it. loop to enforce one-time coercion before running Python operations on
it.
There is also an optimisation for ``in`` tests, so that the following There are also optimisations for ``in`` tests, so that the following
code will run in plain C code, (actually using a switch statement):: code will run in plain C code, (actually using a switch statement)::
cdef Py_UNICODE uchar_val = get_a_unicode_character() cdef Py_UCS4 uchar_val = get_a_unicode_character()
if uchar_val in u'abcABCxY': if uchar_val in u'abcABCxY':
... ...
Combined with the looping optimisation above, this can result in very
efficient character switching code, e.g. in unicode parsers.
...@@ -53,7 +53,7 @@ where calls occur within Cython code. For example: ...@@ -53,7 +53,7 @@ where calls occur within Cython code. For example:
def __init__(self, int x0, int y0, int x1, int y1): def __init__(self, int x0, int y0, int x1, int y1):
self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1 self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
cdef int _area(self): cdef int _area(self):
int area cdef int area
area = (self.x1 - self.x0) * (self.y1 - self.y0) area = (self.x1 - self.x0) * (self.y1 - self.y0)
if area < 0: if area < 0:
area = -area area = -area
...@@ -88,7 +88,7 @@ overheads. Consider this code: ...@@ -88,7 +88,7 @@ overheads. Consider this code:
def __init__(self, int x0, int y0, int x1, int y1): def __init__(self, int x0, int y0, int x1, int y1):
self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1 self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
cpdef int area(self): cpdef int area(self):
int area cdef int area
area = (self.x1 - self.x0) * (self.y1 - self.y0) area = (self.x1 - self.x0) * (self.y1 - self.y0)
if area < 0: if area < 0:
area = -area area = -area
......
...@@ -30,7 +30,7 @@ Other Current Limitations ...@@ -30,7 +30,7 @@ Other Current Limitations
========================== ==========================
* The :func:`globals` builtin returns the last Python callers globals, not the current function's locals. This behavior should not be relied upon, as it will probably change in the future. * The :func:`globals` builtin returns the last Python callers globals, not the current function's locals. This behavior should not be relied upon, as it will probably change in the future.
* The :fun:`locals` builtin can only be used if all local variables can be converted to Python objects, and returns a dict. * The :func:`locals` builtin can only be used if all local variables can be converted to Python objects, and returns a dict.
* Class and function definitions cannot be placed inside control structures. * Class and function definitions cannot be placed inside control structures.
Semantic differences between Python and Cython Semantic differences between Python and Cython
......
...@@ -15,7 +15,7 @@ called 'foo', and its fully qualified module name is ...@@ -15,7 +15,7 @@ called 'foo', and its fully qualified module name is
'foo.shrubbing'. 'foo.shrubbing'.
So when Pyrex wants to find out whether there is a `.pxd` file for shrubbing, So when Pyrex wants to find out whether there is a `.pxd` file for shrubbing,
it looks for one corresponding to a module called :module:`foo.shrubbing`. It it looks for one corresponding to a module called `foo.shrubbing`. It
does this by searching the include path for a top-level package directory does this by searching the include path for a top-level package directory
called 'foo' containing a file called 'shrubbing.pxd'. called 'foo' containing a file called 'shrubbing.pxd'.
......