Commit c41d9a2d authored by Robert Bradshaw's avatar Robert Bradshaw

Merge back in old code as users guide (as it doesn't look like it was all copied over.)

--HG--
rename : index.rst => src/userguide/index.rst
parents b6add9d2 e2458387
...@@ -7,4 +7,5 @@ Welcome to Cython's Documentation ...@@ -7,4 +7,5 @@ Welcome to Cython's Documentation
src/quickstart/index src/quickstart/index
src/tutorial/index src/tutorial/index
src/userguide/index
src/reference/index src/reference/index
.. highlight:: cython
.. _early-binding-for-speed:
**************************
Early Binding for Speed
**************************
As a dynamic language, Python encourages a programming style of considering
classes and objects in terms of their methods and attributes, more than where
they fit into the class hierarchy.
This can make Python a very relaxed and comfortable language for rapid
development, but with a price - the 'red tape' of managing data types is
dumped onto the interpreter. At run time, the interpreter does a lot of work
searching namespaces, fetching attributes and parsing argument and keyword
tuples. This run-time 'late binding' is a major cause of Python's relative
slowness compared to 'early binding' languages such as C++.
However with Cython it is possible to gain significant speed-ups through the
use of 'early binding' programming techniques.
For example, consider the following (silly) code example:
.. sourcecode:: cython
cdef class Rectangle:
cdef int x0, y0
cdef int x1, y1
def __init__(self, int x0, int y0, int x1, int y1):
self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
def area(self):
area = (self.x1 - self.x0) * (self.y1 - self.y0)
if area < 0:
area = -area
return area
def rectArea(x0, y0, x1, y1):
rect = Rectangle(x0, y0, x1, y1)
return rect.area()
In the :func:`rectArea` method, the call to :meth:`rect.area` and the
:meth:`.area` method contain a lot of Python overhead.
However, in Cython, it is possible to eliminate a lot of this overhead in cases
where calls occur within Cython code. For example:
.. sourcecode:: cython
cdef class Rectangle:
cdef int x0, y0
cdef int x1, y1
def __init__(self, int x0, int y0, int x1, int y1):
self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
cdef int _area(self):
int area
area = (self.x1 - self.x0) * (self.y1 - self.y0)
if area < 0:
area = -area
return area
def area(self):
return self._area()
def rectArea(x0, y0, x1, y1):
cdef Rectangle rect
rect = Rectangle(x0, y0, x1, y1)
return rect._area()
Here, in the Rectangle extension class, we have defined two different area
calculation methods, the efficient :meth:`_area` C method, and the
Python-callable :meth:`area` method which serves as a thin wrapper around
:meth:`_area`. Note also in the function :func:`rectArea` how we 'early bind'
by declaring the local variable ``rect`` which is explicitly given the type
Rectangle. By using this declaration, instead of just dynamically assigning to
``rect``, we gain the ability to access the much more efficient C-callable
:meth:`_rect` method.
But Cython offers us more simplicity again, by allowing us to declare
dual-access methods - methods that can be efficiently called at C level, but
can also be accessed from pure Python code at the cost of the Python access
overheads. Consider this code:
.. sourcecode:: cython
cdef class Rectangle:
cdef int x0, y0
cdef int x1, y1
def __init__(self, int x0, int y0, int x1, int y1):
self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
cpdef int area(self):
int area
area = (self.x1 - self.x0) * (self.y1 - self.y0)
if area < 0:
area = -area
return area
def rectArea(x0, y0, x1, y1):
cdef Rectangle rect
rect = Rectangle(x0, y0, x1, y1)
return rect.area()
.. note::
in earlier versions of Cython, the :keyword:`cpdef` keyword is
:keyword:`rdef` - but has the same effect).
Here, we just have a single area method, declared as :keyword:`cpdef` to make it
efficiently callable as a C function, but still accessible from pure Python
(or late-binding Cython) code.
If within Cython code, we have a variable already 'early-bound' (ie, declared
explicitly as type Rectangle, (or cast to type Rectangle), then invoking its
area method will use the efficient C code path and skip the Python overhead.
But if in Pyrex or regular Python code we have a regular object variable
storing a Rectangle object, then invoking the area method will require:
* an attribute lookup for the area method
* packing a tuple for arguments and a dict for keywords (both empty in this case)
* using the Python API to call the method
and within the area method itself:
* parsing the tuple and keywords
* executing the calculation code
* converting the result to a python object and returning it
So within Cython, it is possible to achieve massive optimisations by
using strong typing in declaration and casting of variables. For tight loops
which use method calls, and where these methods are pure C, the difference can
be huge.
.. highlight:: cython
.. _extension-types:
******************
Extension Types
******************
Introduction
==============
As well as creating normal user-defined classes with the Python class
statement, Cython also lets you create new built-in Python types, known as
extension types. You define an extension type using the :keyword:`cdef` class
statement. Here's an example::
cdef class Shrubbery:
cdef int width, height
def __init__(self, w, h):
self.width = w
self.height = h
def describe(self):
print "This shrubbery is", self.width, \
"by", self.height, "cubits."
As you can see, a Cython extension type definition looks a lot like a Python
class definition. Within it, you use the def statement to define methods that
can be called from Python code. You can even define many of the special
methods such as :meth:`__init__` as you would in Python.
The main difference is that you can use the :keyword:`cdef` statement to define
attributes. The attributes may be Python objects (either generic or of a
particular extension type), or they may be of any C data type. So you can use
extension types to wrap arbitrary C data structures and provide a Python-like
interface to them.
Attributes
============
Attributes of an extension type are stored directly in the object's C struct.
The set of attributes is fixed at compile time; you can't add attributes to an
extension type instance at run time simply by assigning to them, as you could
with a Python class instance. (You can subclass the extension type in Python
and add attributes to instances of the subclass, however.)
There are two ways that attributes of an extension type can be accessed: by
Python attribute lookup, or by direct access to the C struct from Cython code.
Python code is only able to access attributes of an extension type by the
first method, but Cython code can use either method.
By default, extension type attributes are only accessible by direct access,
not Python access, which means that they are not accessible from Python code.
To make them accessible from Python code, you need to declare them as
:keyword:`public` or :keyword:`readonly`. For example,::
cdef class Shrubbery:
cdef public int width, height
cdef readonly float depth
makes the width and height attributes readable and writable from Python code,
and the depth attribute readable but not writable.
.. note::
You can only expose simple C types, such as ints, floats, and
strings, for Python access. You can also expose Python-valued attributes.
.. note::
Also the :keyword:`public` and :keyword:`readonly` options apply only to
Python access, not direct access. All the attributes of an extension type
are always readable and writable by C-level access.
Type declarations
===================
Before you can directly access the attributes of an extension type, the Cython
compiler must know that you have an instance of that type, and not just a
generic Python object. It knows this already in the case of the ``self``
parameter of the methods of that type, but in other cases you will have to use
a type declaration.
For example, in the following function,::
cdef widen_shrubbery(sh, extra_width): # BAD
sh.width = sh.width + extra_width
because the ``sh`` parameter hasn't been given a type, the width attribute
will be accessed by a Python attribute lookup. If the attribute has been
declared :keyword:`public` or :keyword:`readonly` then this will work, but it
will be very inefficient. If the attribute is private, it will not work at all
-- the code will compile, but an attribute error will be raised at run time.
The solution is to declare ``sh`` as being of type :class:`Shrubbery`, as
follows::
cdef widen_shrubbery(Shrubbery sh, extra_width):
sh.width = sh.width + extra_width
Now the Cython compiler knows that ``sh`` has a C attribute called
:attr:`width` and will generate code to access it directly and efficiently.
The same consideration applies to local variables, for example,::
cdef Shrubbery another_shrubbery(Shrubbery sh1):
cdef Shrubbery sh2
sh2 = Shrubbery()
sh2.width = sh1.width
sh2.height = sh1.height
return sh2
Type Testing and Casting
------------------------
Suppose I have a method :meth:`quest` which returns an object of type :class:`Shrubbery`.
To access it's width I could write::
cdef Shrubbery sh = quest()
print sh.width
which requires the use of a local variable and performs a type test on assignment.
If you *know* the return value of :meth:`quest` will be of type :class:`Shrubbery`
you can use a cast to write::
print (<Shrubbery>quest()).width
This may be dangerous if :meth:`quest()` is not actually a :class:`Shrubbery`, as it
will try to access width as a C struct member which may not exist. At the C level,
rather than raising an :class:`AttributeError`, either an nonsensical result will be
returned (interpreting whatever data is at at that address as an int) or a segfault
may result from trying to access invalid memory. Instead, one can write::
print (<Shrubbery?>quest()).width
which performs a type check (possibly raising a :class:`TypeError`) before making the
cast and allowing the code to proceed.
To explicitly test the type of an object, use the :meth:`isinstance` method. By default,
in Python, the :meth:`isinstance` method checks the :class:`__class__` attribute of the
first argument to determine if it is of the required type. However, this is potentially
unsafe as the :class:`__class__` attribute can be spoofed or changed, but the C structure
of an extension type must be correct to access its :keyword:`cdef` attributes and call its :keyword:`cdef` methods. Cython detects if the second argument is a known extension
type and does a type check instead, analogous to Pyrex's :meth:`typecheck`.
The old behavior is always available by passing a tuple as the second parameter::
print isinstance(sh, Shrubbery) # Check the type of sh
print isinstance(sh, (Shrubbery,)) # Check sh.__class__
Extension types and None
=========================
When you declare a parameter or C variable as being of an extension type,
Cython will allow it to take on the value ``None`` as well as values of its
declared type. This is analogous to the way a C pointer can take on the value
``NULL``, and you need to exercise the same caution because of it. There is no
problem as long as you are performing Python operations on it, because full
dynamic type checking will be applied. However, when you access C attributes
of an extension type (as in the widen_shrubbery function above), it's up to
you to make sure the reference you're using is not ``None`` -- in the
interests of efficiency, Cython does not check this.
You need to be particularly careful when exposing Python functions which take
extension types as arguments. If we wanted to make :func:`widen_shrubbery` a
Python function, for example, if we simply wrote::
def widen_shrubbery(Shrubbery sh, extra_width): # This is
sh.width = sh.width + extra_width # dangerous!
then users of our module could crash it by passing ``None`` for the ``sh``
parameter.
One way to fix this would be::
def widen_shrubbery(Shrubbery sh, extra_width):
if sh is None:
raise TypeError
sh.width = sh.width + extra_width
but since this is anticipated to be such a frequent requirement, Cython
provides a more convenient way. Parameters of a Python function declared as an
extension type can have a ``not None`` clause::
def widen_shrubbery(Shrubbery sh not None, extra_width):
sh.width = sh.width + extra_width
Now the function will automatically check that ``sh`` is ``not None`` along
with checking that it has the right type.
.. note::
``not None`` clause can only be used in Python functions (defined with
:keyword:`def`) and not C functions (defined with :keyword:`cdef`). If
you need to check whether a parameter to a C function is None, you will
need to do it yourself.
.. note::
Some more things:
* The self parameter of a method of an extension type is guaranteed never to
be ``None``.
* When comparing a value with ``None``, keep in mind that, if ``x`` is a Python
object, ``x is None`` and ``x is not None`` are very efficient because they
translate directly to C pointer comparisons, whereas ``x == None`` and
``x != None``, or simply using ``x`` as a boolean value (as in ``if x: ...``)
will invoke Python operations and therefore be much slower.
Special methods
================
Although the principles are similar, there are substantial differences between
many of the :meth:`__xxx__` special methods of extension types and their Python
counterparts. There is a :ref:`separate page <special-methods>` devoted to this subject, and you should
read it carefully before attempting to use any special methods in your
extension types.
Properties
============
There is a special syntax for defining properties in an extension class::
cdef class Spam:
property cheese:
"A doc string can go here."
def __get__(self):
# This is called when the property is read.
...
def __set__(self, value):
# This is called when the property is written.
...
def __del__(self):
# This is called when the property is deleted.
The :meth:`__get__`, :meth:`__set__` and :meth:`__del__` methods are all
optional; if they are omitted, an exception will be raised when the
corresponding operation is attempted.
Here's a complete example. It defines a property which adds to a list each
time it is written to, returns the list when it is read, and empties the list
when it is deleted.::
# cheesy.pyx
cdef class CheeseShop:
cdef object cheeses
def __cinit__(self):
self.cheeses = []
property cheese:
def __get__(self):
return "We don't have: %s" % self.cheeses
def __set__(self, value):
self.cheeses.append(value)
def __del__(self):
del self.cheeses[:]
# Test input
from cheesy import CheeseShop
shop = CheeseShop()
print shop.cheese
shop.cheese = "camembert"
print shop.cheese
shop.cheese = "cheddar"
print shop.cheese
del shop.cheese
print shop.cheese
.. sourcecode:: text
# Test output
We don't have: []
We don't have: ['camembert']
We don't have: ['camembert', 'cheddar']
We don't have: []
Subclassing
=============
An extension type may inherit from a built-in type or another extension type::
cdef class Parrot:
...
cdef class Norwegian(Parrot):
...
A complete definition of the base type must be available to Cython, so if the
base type is a built-in type, it must have been previously declared as an
extern extension type. If the base type is defined in another Cython module, it
must either be declared as an extern extension type or imported using the
:keyword:`cimport` statement.
An extension type can only have one base class (no multiple inheritance).
Cython extension types can also be subclassed in Python. A Python class can
inherit from multiple extension types provided that the usual Python rules for
multiple inheritance are followed (i.e. the C layouts of all the base classes
must be compatible).
C methods
=========
Extension types can have C methods as well as Python methods. Like C
functions, C methods are declared using :keyword:`cdef` or :keyword:`cpdef` instead of
:keyword:`def`. C methods are "virtual", and may be overridden in derived
extension types.::
# pets.pyx
cdef class Parrot:
cdef void describe(self):
print "This parrot is resting."
cdef class Norwegian(Parrot):
cdef void describe(self):
Parrot.describe(self)
print "Lovely plumage!"
cdef Parrot p1, p2
p1 = Parrot()
p2 = Norwegian()
print "p1:"
p1.describe()
print "p2:"
p2.describe()
.. sourcecode:: text
# Output
p1:
This parrot is resting.
p2:
This parrot is resting.
Lovely plumage!
The above example also illustrates that a C method can call an inherited C
method using the usual Python technique, i.e.::
Parrot.describe(self)
Forward-declaring extension types
===================================
Extension types can be forward-declared, like :keyword:`struct` and
:keyword:`union` types. This will be necessary if you have two extension types
that need to refer to each other, e.g.::
cdef class Shrubbery # forward declaration
cdef class Shrubber:
cdef Shrubbery work_in_progress
cdef class Shrubbery:
cdef Shrubber creator
If you are forward-declaring an extension type that has a base class, you must
specify the base class in both the forward declaration and its subsequent
definition, for example,::
cdef class A(B)
...
cdef class A(B):
# attributes and methods
Making extension types weak-referenceable
==========================================
By default, extension types do not support having weak references made to
them. You can enable weak referencing by declaring a C attribute of type
object called :attr:`__weakref__`. For example,::
cdef class ExplodingAnimal:
"""This animal will self-destruct when it is
no longer strongly referenced."""
cdef object __weakref__
Public and external extension types
====================================
Extension types can be declared extern or public. An extern extension type
declaration makes an extension type defined in external C code available to a
Cython module. A public extension type declaration makes an extension type
defined in a Cython module available to external C code.
External extension types
------------------------
An extern extension type allows you to gain access to the internals of Python
objects defined in the Python core or in a non-Cython extension module.
.. note::
In previous versions of Pyrex, extern extension types were also used to
reference extension types defined in another Pyrex module. While you can still
do that, Cython provides a better mechanism for this. See
:ref:`sharing-declarations`.
Here is an example which will let you get at the C-level members of the
built-in complex object.::
cdef extern from "complexobject.h":
struct Py_complex:
double real
double imag
ctypedef class __builtin__.complex [object PyComplexObject]:
cdef Py_complex cval
# A function which uses the above type
def spam(complex c):
print "Real:", c.cval.real
print "Imag:", c.cval.imag
.. note::
Some important things:
1. In this example, :keyword:`ctypedef` class has been used. This is
because, in the Python header files, the ``PyComplexObject`` struct is
declared with:
.. sourcecode:: c
ctypedef struct {
...
} PyComplexObject;
2. As well as the name of the extension type, the module in which its type
object can be found is also specified. See the implicit importing section
below.
3. When declaring an external extension type, you don't declare any
methods. Declaration of methods is not required in order to call them,
because the calls are Python method calls. Also, as with
:keyword:`structs` and :keyword:`unions`, if your extension class
declaration is inside a :keyword:`cdef` extern from block, you only need to
declare those C members which you wish to access.
Name specification clause
-------------------------
The part of the class declaration in square brackets is a special feature only
available for extern or public extension types. The full form of this clause
is::
[object object_struct_name, type type_object_name ]
where ``object_struct_name`` is the name to assume for the type's C struct,
and type_object_name is the name to assume for the type's statically declared
type object. (The object and type clauses can be written in either order.)
If the extension type declaration is inside a :keyword:`cdef` extern from
block, the object clause is required, because Cython must be able to generate
code that is compatible with the declarations in the header file. Otherwise,
for extern extension types, the object clause is optional.
For public extension types, the object and type clauses are both required,
because Cython must be able to generate code that is compatible with external C
code.
Implicit importing
------------------
Cython requires you to include a module name in an extern extension class
declaration, for example,::
cdef extern class MyModule.Spam:
...
The type object will be implicitly imported from the specified module and
bound to the corresponding name in this module. In other words, in this
example an implicit::
from MyModule import Spam
statement will be executed at module load time.
The module name can be a dotted name to refer to a module inside a package
hierarchy, for example,::
cdef extern class My.Nested.Package.Spam:
...
You can also specify an alternative name under which to import the type using
an as clause, for example,::
cdef extern class My.Nested.Package.Spam as Yummy:
...
which corresponds to the implicit import statement::
from My.Nested.Package import Spam as Yummy
Type names vs. constructor names
--------------------------------
Inside a Cython module, the name of an extension type serves two distinct
purposes. When used in an expression, it refers to a module-level global
variable holding the type's constructor (i.e. its type-object). However, it
can also be used as a C type name to declare variables, arguments and return
values of that type.
When you declare::
cdef extern class MyModule.Spam:
...
the name Spam serves both these roles. There may be other names by which you
can refer to the constructor, but only Spam can be used as a type name. For
example, if you were to explicity import MyModule, you could use
``MyModule.Spam()`` to create a Spam instance, but you wouldn't be able to use
:class:`MyModule.Spam` as a type name.
When an as clause is used, the name specified in the as clause also takes over
both roles. So if you declare::
cdef extern class MyModule.Spam as Yummy:
...
then Yummy becomes both the type name and a name for the constructor. Again,
there are other ways that you could get hold of the constructor, but only
Yummy is usable as a type name.
Public extension types
======================
An extension type can be declared public, in which case a ``.h`` file is
generated containing declarations for its object struct and type object. By
including the ``.h`` file in external C code that you write, that code can
access the attributes of the extension type.
.. highlight:: cython
.. _external-C-code:
**********************************
Interfacing with External C Code
**********************************
One of the main uses of Cython is wrapping existing libraries of C code. This
is achieved by using external declarations to declare the C functions and
variables from the library that you want to use.
You can also use public declarations to make C functions and variables defined
in a Cython module available to external C code. The need for this is expected
to be less frequent, but you might want to do it, for example, if you are
`embedding Python`_ in another application as a scripting language. Just as a
Cython module can be used as a bridge to allow Python code to call C code, it
can also be used to allow C code to call Python code.
.. _embedding Python: http://www.freenet.org.nz/python/embeddingpyrex/
External declarations
=======================
By default, C functions and variables declared at the module level are local
to the module (i.e. they have the C static storage class). They can also be
declared extern to specify that they are defined elsewhere, for example::
cdef extern int spam_counter
cdef extern void order_spam(int tons)
Referencing C header files
---------------------------
When you use an extern definition on its own as in the examples above, Cython
includes a declaration for it in the generated C file. This can cause problems
if the declaration doesn't exactly match the declaration that will be seen by
other C code. If you're wrapping an existing C library, for example, it's
important that the generated C code is compiled with exactly the same
declarations as the rest of the library.
To achieve this, you can tell Cython that the declarations are to be found in a
C header file, like this::
cdef extern from "spam.h":
int spam_counter
void order_spam(int tons)
The ``cdef extern`` from clause does three things:
1. It directs Cython to place a ``#include`` statement for the named header file in
the generated C code.
2. It prevents Cython from generating any C code
for the declarations found in the associated block.
3. It treats all declarations within the block as though they started with
``cdef extern``.
It's important to understand that Cython does not itself read the C header
file, so you still need to provide Cython versions of any declarations from it
that you use. However, the Cython declarations don't always have to exactly
match the C ones, and in some cases they shouldn't or can't. In particular:
1. Don't use ``const``. Cython doesn't know anything about ``const``, so just
leave it out. Most of the time this shouldn't cause any problem, although
on rare occasions you might have to use a cast. You can also explicitly
declare something like::
ctypedef char* const_char_ptr "const char*"
though in most cases this will not be needed.
.. warning::
A problem with const could arise if you have something like::
cdef extern from "grail.h":
char *nun
where grail.h actually contains::
extern const char *nun;
and you do::
cdef void languissement(char *s):
#something that doesn't change s
...
languissement(nun)
which will cause the C compiler to complain. You can work around it by
casting away the constness::
languissement(<char *>nun)
2. Leave out any platform-specific extensions to C declarations such as
``__declspec()``.
3. If the header file declares a big struct and you only want to use a few
members, you only need to declare the members you're interested in. Leaving
the rest out doesn't do any harm, because the C compiler will use the full
definition from the header file.
In some cases, you might not need any of the struct's members, in which
case you can just put pass in the body of the struct declaration, e.g.::
cdef extern from "foo.h":
struct spam:
pass
.. note::
you can only do this inside a ``cdef extern from`` block; struct
declarations anywhere else must be non-empty.
4. If the header file uses ``typedef`` names such as :ctype:`word` to refer
to platform-dependent flavours of numeric types, you will need a
corresponding :keyword:`ctypedef` statement, but you don't need to match
the type exactly, just use something of the right general kind (int, float,
etc). For example,::
ctypedef int word
will work okay whatever the actual size of a :ctype:`word ` is (provided the header
file defines it correctly). Conversion to and from Python types, if any, will also
be used for this new type.
5. If the header file uses macros to define constants, translate them into a
dummy ``enum`` declaration.
6. If the header file defines a function using a macro, declare it as though
it were an ordinary function, with appropriate argument and result types.
7. For archaic reasons C uses the keyword :keyword:`void` to declare a function
taking no parameters. In Cython as in Python, simply declare such functions
as :meth:`foo()`.
A few more tricks and tips:
* If you want to include a C header because it's needed by another header, but
don't want to use any declarations from it, put pass in the extern-from
block::
cdef extern from "spam.h":
pass
* If you want to include some external declarations, but don't want to specify
a header file (because it's included by some other header that you've
already included) you can put ``*`` in place of the header file name::
cdef extern from *:
...
Styles of struct, union and enum declaration
----------------------------------------------
There are two main ways that structs, unions and enums can be declared in C
header files: using a tag name, or using a typedef. There are also some
variations based on various combinations of these.
It's important to make the Cython declarations match the style used in the
header file, so that Cython can emit the right sort of references to the type
in the code it generates. To make this possible, Cython provides two different
syntaxes for declaring a struct, union or enum type. The style introduced
above corresponds to the use of a tag name. To get the other style, you prefix
the declaration with :keyword:`ctypedef`, as illustrated below.
The following table shows the various possible styles that can be found in a
header file, and the corresponding Cython declaration that you should put in
the ``cdef extern`` from block. Struct declarations are used as an example; the
same applies equally to union and enum declarations.
+-------------------------+---------------------------------------------+-----------------------------------------------------------------------+
| C code | Possibilities for corresponding Cython Code | Comments |
+=========================+=============================================+=======================================================================+
| .. sourcecode:: c | :: | Cython will refer to the as ``struct Foo`` in the generated C code. |
| | | |
| struct Foo { | cdef struct Foo: | |
| ... | ... | |
| }; | | |
+-------------------------+---------------------------------------------+-----------------------------------------------------------------------+
| .. sourcecode:: c | :: | Cython will refer to the type simply as ``Foo`` in |
| | | the generated C code. |
| typedef struct { | ctypedef struct Foo: | |
| ... | ... | |
| } Foo; | | |
+-------------------------+---------------------------------------------+-----------------------------------------------------------------------+
| .. sourcecode:: c | :: | If the C header uses both a tag and a typedef with *different* |
| | | names, you can use either form of declaration in Cython |
| typedef struct foo { | cdef struct foo: | (although if you need to forward reference the type, |
| ... | ... | you'll have to use the first form). |
| } Foo; | ctypedef foo Foo #optional | |
| | | |
| | or:: | |
| | | |
| | ctypedef struct Foo: | |
| | ... | |
+-------------------------+---------------------------------------------+-----------------------------------------------------------------------+
| .. sourcecode:: c | :: | If the header uses the *same* name for the tag and typedef, you |
| | | won't be able to include a :keyword:`ctypedef` for it -- but then, |
| typedef struct Foo { | cdef struct Foo: | it's not necessary. |
| ... | ... | |
| } Foo; | | |
+-------------------------+---------------------------------------------+-----------------------------------------------------------------------+
Note that in all the cases below, you refer to the type in Cython code simply
as :ctype:`Foo`, not ``struct Foo``.
Accessing Python/C API routines
---------------------------------
One particular use of the ``cdef extern from`` statement is for gaining access to
routines in the Python/C API. For example,::
cdef extern from "Python.h":
object PyString_FromStringAndSize(char *s, Py_ssize_t len)
will allow you to create Python strings containing null bytes.
Special Types
--------------
Cython predefines the name ``Py_ssize_t`` for use with Python/C API routines. To
make your extensions compatible with 64-bit systems, you should always use
this type where it is specified in the documentation of Python/C API routines.
Windows Calling Conventions
----------------------------
The ``__stdcall`` and ``__cdecl`` calling convention specifiers can be used in
Cython, with the same syntax as used by C compilers on Windows, for example,::
cdef extern int __stdcall FrobnicateWindow(long handle)
cdef void (__stdcall *callback)(void *)
If ``__stdcall`` is used, the function is only considered compatible with
other ``__stdcall`` functions of the same signature.
Resolving naming conflicts - C name specifications
----------------------------------------------------
Each Cython module has a single module-level namespace for both Python and C
names. This can be inconvenient if you want to wrap some external C functions
and provide the Python user with Python functions of the same names.
Cython provides a couple of different ways of solving this problem. The
best way, especially if you have many C functions to wrap, is probably to put
the extern C function declarations into a different namespace using the
facilities described in the section on sharing declarations between Cython
modules.
The other way is to use a C name specification to give different Cython and C
names to the C function. Suppose, for example, that you want to wrap an
external function called :func:`eject_tomato`. If you declare it as::
cdef extern void c_eject_tomato "eject_tomato" (float speed)
then its name inside the Cython module will be ``c_eject_tomato``, whereas its name
in C will be ``eject_tomato``. You can then wrap it with::
def eject_tomato(speed):
c_eject_tomato(speed)
so that users of your module can refer to it as ``eject_tomato``.
Another use for this feature is referring to external names that happen to be
Cython keywords. For example, if you want to call an external function called
print, you can rename it to something else in your Cython module.
As well as functions, C names can be specified for variables, structs, unions,
enums, struct and union members, and enum values. For example,::
cdef extern int one "ein", two "zwei"
cdef extern float three "drei"
cdef struct spam "SPAM":
int i "eye"
cdef enum surprise "inquisition":
first "alpha"
second "beta" = 3
Using Cython Declarations from C
==================================
Cython provides two methods for making C declarations from a Cython module
available for use by external C code---public declarations and C API
declarations.
.. note::
You do not need to use either of these to make declarations from one
Cython module available to another Cython module – you should use the
:keyword:`cimport` statement for that. Sharing Declarations Between Cython Modules.
Public Declarations
---------------------
You can make C types, variables and functions defined in a Cython module
accessible to C code that is linked with the module, by declaring them with
the public keyword::
cdef public struct Bunny: # public type declaration
int vorpalness
cdef public int spam # public variable declaration
cdef public void grail(Bunny *): # public function declaration
...
If there are any public declarations in a Cython module, a header file called
:file:`modulename.h` file is generated containing equivalent C declarations for
inclusion in other C code.
Any C code wanting to make use of these declarations will need to be linked,
either statically or dynamically, with the extension module.
If the Cython module resides within a package, then the name of the ``.h``
file consists of the full dotted name of the module, e.g. a module called
:mod:`foo.spam` would have a header file called :file:`foo.spam.h`.
C API Declarations
-------------------
The other way of making declarations available to C code is to declare them
with the :keyword:`api` keyword. You can use this keyword with C functions and
extension types. A header file called :file:`modulename_api.h` is produced
containing declarations of the functions and extension types, and a function
called :func:`import_modulename`.
C code wanting to use these functions or extension types needs to include the
header and call the :func:`import_modulename` function. The other functions
can then be called and the extension types used as usual.
Any public C type or extension type declarations in the Cython module are also
made available when you include :file:`modulename_api.h`.::
# delorean.pyx
cdef public struct Vehicle:
int speed
float power
cdef api void activate(Vehicle *v):
if v.speed >= 88 and v.power >= 1.21:
print "Time travel achieved"
.. sourcecode:: c
# marty.c
#include "delorean_api.h"
Vehicle car;
int main(int argc, char *argv[]) {
import_delorean();
car.speed = atoi(argv[1]);
car.power = atof(argv[2]);
activate(&car);
}
.. note::
Any types defined in the Cython module that are used as argument or
return types of the exported functions will need to be declared public,
otherwise they won't be included in the generated header file, and you will
get errors when you try to compile a C file that uses the header.
Using the :keyword:`api` method does not require the C code using the
declarations to be linked with the extension module in any way, as the Python
import machinery is used to make the connection dynamically. However, only
functions can be accessed this way, not variables.
You can use both :keyword:`public` and :keyword:`api` on the same function to
make it available by both methods, e.g.::
cdef public api void belt_and_braces():
...
However, note that you should include either :file:`modulename.h` or
:file:`modulename_api.h` in a given C file, not both, otherwise you may get
conflicting dual definitions.
If the Cython module resides within a package, then:
* The name of the header file contains of the full dotted name of the module.
* The name of the importing function contains the full name with dots replaced
by double underscores.
E.g. a module called :mod:`foo.spam` would have an API header file called
:file:`foo.spam_api.h` and an importing function called
:func:`import_foo__spam`.
Multiple public and API declarations
--------------------------------------
You can declare a whole group of items as :keyword:`public` and/or
:keyword:`api` all at once by enclosing them in a :keyword:`cdef` block, for
example,::
cdef public api:
void order_spam(int tons)
char *get_lunch(float tomato_size)
This can be a useful thing to do in a ``.pxd`` file (see
:ref:`sharing-declarations`) to make the module's public interface
available by all three methods.
Acquiring and Releasing the GIL
---------------------------------
Cython provides facilities for releasing the Global Interpreter Lock (GIL)
before calling C code, and for acquiring the GIL in functions that are to be
called back from C code that is executed without the GIL.
Releasing the GIL
^^^^^^^^^^^^^^^^^
You can release the GIL around a section of code using the
:keyword:`with nogil` statement::
with nogil:
<code to be executed with the GIL released>
Code in the body of the statement must not manipulate Python objects in any
way, and must not call anything that manipulates Python objects without first
re-acquiring the GIL. Cython currently does not check this.
Acquiring the GIL
^^^^^^^^^^^^^^^^^
A C function that is to be used as a callback from C code that is executed
without the GIL needs to acquire the GIL before it can manipulate Python
objects. This can be done by specifying with :keyword:`gil` in the function
header::
cdef void my_callback(void *data) with gil:
...
Declaring a function as callable without the GIL
--------------------------------------------------
You can specify :keyword:`nogil` in a C function header or function type to
declare that it is safe to call without the GIL.::
cdef void my_gil_free_func(int spam) nogil:
...
If you are implementing such a function in Cython, it cannot have any Python
arguments, Python local variables, or Python return type, and cannot
manipulate Python objects in any way or call any function that does so without
acquiring the GIL first. Some of these restrictions are currently checked by
Cython, but not all. It is possible that more stringent checking will be
performed in the future.
Declaring a function with :keyword:`gil` also implicitly makes its signature
:keyword:`nogil`.
Cython Users Guide
==================
Contents:
.. toctree::
:maxdepth: 2
overview
tutorial
language_basics
extension_types
special_methods
sharing_declarations
external_C_code
source_files_and_compilation
wrapping_CPlusPlus
numpy_tutorial
profiling_tutorial
limitations
pyrex_differences
early_binding_for_speed
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. toctree::
.. highlight:: cython
.. _language-basics:
*****************
Language Basics
*****************
C variable and type definitions
===============================
The :keyword:`cdef` statement is used to declare C variables, either local or
module-level::
cdef int i, j, k
cdef float f, g[42], *h
and C :keyword:`struct`, :keyword:`union` or :keyword:`enum` types::
cdef struct Grail:
int age
float volume
cdef union Food:
char *spam
float *eggs
cdef enum CheeseType:
cheddar, edam,
camembert
cdef enum CheeseState:
hard = 1
soft = 2
runny = 3
There is currently no special syntax for defining a constant, but you can use
an anonymous :keyword:`enum` declaration for this purpose, for example,::
cdef enum:
tons_of_spam = 3
.. note::
the words ``struct``, ``union`` and ``enum`` are used only when
defining a type, not when referring to it. For example, to declare a variable
pointing to a ``Grail`` you would write::
cdef Grail *gp
and not::
cdef struct Grail *gp # WRONG
There is also a ``ctypedef`` statement for giving names to types, e.g.::
ctypedef unsigned long ULong
ctypedef int *IntPtr
Grouping multiple C declarations
--------------------------------
If you have a series of declarations that all begin with :keyword:`cdef`, you
can group them into a :keyword:`cdef` block like this::
cdef:
struct Spam:
int tons
int i
float f
Spam *p
void f(Spam *s):
print s.tons, "Tons of spam"
Python functions vs. C functions
==================================
There are two kinds of function definition in Cython:
Python functions are defined using the def statement, as in Python. They take
Python objects as parameters and return Python objects.
C functions are defined using the new :keyword:`cdef` statement. They take
either Python objects or C values as parameters, and can return either Python
objects or C values.
Within a Cython module, Python functions and C functions can call each other
freely, but only Python functions can be called from outside the module by
interpreted Python code. So, any functions that you want to "export" from your
Cython module must be declared as Python functions using def.
There is also a hybrid function, called :keyword:`cpdef`. A :keyword:`cpdef`
can be called from anywhere, but uses the faster C calling conventions
when being called from other Cython code.
Parameters of either type of function can be declared to have C data types,
using normal C declaration syntax. For example,::
def spam(int i, char *s):
...
cdef int eggs(unsigned long l, float f):
...
When a parameter of a Python function is declared to have a C data type, it is
passed in as a Python object and automatically converted to a C value, if
possible. Automatic conversion is currently only possible for numeric types
and string types; attempting to use any other type for the parameter of a
Python function will result in a compile-time error.
C functions, on the other hand, can have parameters of any type, since they're
passed in directly using a normal C function call.
A more complete comparison of the pros and cons of these different method
types can be found at :ref:`early-binding-for-speed`.
Python objects as parameters and return values
----------------------------------------------
If no type is specified for a parameter or return value, it is assumed to be a
Python object. (Note that this is different from the C convention, where it
would default to int.) For example, the following defines a C function that
takes two Python objects as parameters and returns a Python object::
cdef spamobjs(x, y):
...
Reference counting for these objects is performed automatically according to
the standard Python/C API rules (i.e. borrowed references are taken as
parameters and a new reference is returned).
The name object can also be used to explicitly declare something as a Python
object. This can be useful if the name being declared would otherwise be taken
as the name of a type, for example,::
cdef ftang(object int):
...
declares a parameter called int which is a Python object. You can also use
object as the explicit return type of a function, e.g.::
cdef object ftang(object int):
...
In the interests of clarity, it is probably a good idea to always be explicit
about object parameters in C functions.
Error return values
-------------------
If you don't do anything special, a function declared with :keyword:`cdef` that
does not return a Python object has no way of reporting Python exceptions to
its caller. If an exception is detected in such a function, a warning message
is printed and the exception is ignored.
If you want a C function that does not return a Python object to be able to
propagate exceptions to its caller, you need to declare an exception value for
it. Here is an example::
cdef int spam() except -1:
...
With this declaration, whenever an exception occurs inside spam, it will
immediately return with the value ``-1``. Furthermore, whenever a call to spam
returns ``-1``, an exception will be assumed to have occurred and will be
propagated.
When you declare an exception value for a function, you should never
explicitly return that value. If all possible return values are legal and you
can't reserve one entirely for signalling errors, you can use an alternative
form of exception value declaration::
cdef int spam() except? -1:
...
The "?" indicates that the value ``-1`` only indicates a possible error. In this
case, Cython generates a call to :cfunc:`PyErr_Occurred` if the exception value is
returned, to make sure it really is an error.
There is also a third form of exception value declaration::
cdef int spam() except *:
...
This form causes Cython to generate a call to :cfunc:`PyErr_Occurred` after
every call to spam, regardless of what value it returns. If you have a
function returning void that needs to propagate errors, you will have to use
this form, since there isn't any return value to test.
Otherwise there is little use for this form.
An external C++ function that may raise an exception can be declared with::
cdef int spam() except +
See :ref:`wrapping-cplusplus` for more details.
Some things to note:
* Exception values can only declared for functions returning an integer, enum,
float or pointer type, and the value must be a constant expression.
Void functions can only use the ``except *`` form.
* The exception value specification is part of the signature of the function.
If you're passing a pointer to a function as a parameter or assigning it
to a variable, the declared type of the parameter or variable must have
the same exception value specification (or lack thereof). Here is an
example of a pointer-to-function declaration with an exception
value::
int (*grail)(int, char *) except -1
* You don't need to (and shouldn't) declare exception values for functions
which return Python objects. Remember that a function with no declared
return type implicitly returns a Python object. (Exceptions on such functions
are implicitly propagated by returning NULL.)
Checking return values of non-Cython functions
----------------------------------------------
It's important to understand that the except clause does not cause an error to
be raised when the specified value is returned. For example, you can't write
something like::
cdef extern FILE *fopen(char *filename, char *mode) except NULL # WRONG!
and expect an exception to be automatically raised if a call to :func:`fopen`
returns ``NULL``. The except clause doesn't work that way; its only purpose is
for propagating Python exceptions that have already been raised, either by a Cython
function or a C function that calls Python/C API routines. To get an exception
from a non-Python-aware function such as :func:`fopen`, you will have to check the
return value and raise it yourself, for example,::
cdef FILE *p
p = fopen("spam.txt", "r")
if p == NULL:
raise SpamError("Couldn't open the spam file")
Automatic type conversions
==========================
In most situations, automatic conversions will be performed for the basic
numeric and string types when a Python object is used in a context requiring a
C value, or vice versa. The following table summarises the conversion
possibilities.
+----------------------------+--------------------+------------------+
| C types | From Python types | To Python types |
+============================+====================+==================+
| [unsigned] char | int, long | int |
| [unsigned] short | | |
| int, long | | |
+----------------------------+--------------------+------------------+
| unsigned int | int, long | long |
| unsigned long | | |
| [unsigned] long long | | |
+----------------------------+--------------------+------------------+
| float, double, long double | int, long, float | float |
+----------------------------+--------------------+------------------+
| char * | str/bytes | str/bytes [#]_ |
+----------------------------+--------------------+------------------+
| struct | | dict |
+----------------------------+--------------------+------------------+
.. [#] The conversion is to/from str for Python 2.x, and bytes for Python 3.x.
Caveats when using a Python string in a C context
-------------------------------------------------
You need to be careful when using a Python string in a context expecting a
``char *``. In this situation, a pointer to the contents of the Python string is
used, which is only valid as long as the Python string exists. So you need to
make sure that a reference to the original Python string is held for as long
as the C string is needed. If you can't guarantee that the Python string will
live long enough, you will need to copy the C string.
Cython detects and prevents some mistakes of this kind. For instance, if you
attempt something like::
cdef char *s
s = pystring1 + pystring2
then Cython will produce the error message ``Obtaining char * from temporary
Python value``. The reason is that concatenating the two Python strings
produces a new Python string object that is referenced only by a temporary
internal variable that Cython generates. As soon as the statement has finished,
the temporary variable will be decrefed and the Python string deallocated,
leaving ``s`` dangling. Since this code could not possibly work, Cython refuses to
compile it.
The solution is to assign the result of the concatenation to a Python
variable, and then obtain the ``char *`` from that, i.e.::
cdef char *s
p = pystring1 + pystring2
s = p
It is then your responsibility to hold the reference p for as long as
necessary.
Keep in mind that the rules used to detect such errors are only heuristics.
Sometimes Cython will complain unnecessarily, and sometimes it will fail to
detect a problem that exists. Ultimately, you need to understand the issue and
be careful what you do.
Statements and expressions
==========================
Control structures and expressions follow Python syntax for the most part.
When applied to Python objects, they have the same semantics as in Python
(unless otherwise noted). Most of the Python operators can also be applied to
C values, with the obvious semantics.
If Python objects and C values are mixed in an expression, conversions are
performed automatically between Python objects and C numeric or string types.
Reference counts are maintained automatically for all Python objects, and all
Python operations are automatically checked for errors, with appropriate
action taken.
Differences between C and Cython expressions
--------------------------------------------
There are some differences in syntax and semantics between C expressions and
Cython expressions, particularly in the area of C constructs which have no
direct equivalent in Python.
* An integer literal is treated as a C constant, and will
be truncated to whatever size your C compiler thinks appropriate.
To get a Python integer (of arbitrary precision) cast immediately to
an object (e.g. ``<object>100000000000000000000``). The ``L``, ``LL``,
and ``U`` suffixes have the same meaning as in C.
* There is no ``->`` operator in Cython. Instead of ``p->x``, use ``p.x``
* There is no unary ``*`` operator in Cython. Instead of ``*p``, use ``p[0]``
* There is an ``&`` operator, with the same semantics as in C.
* The null C pointer is called ``NULL``, not ``0`` (and ``NULL`` is a reserved word).
* Type casts are written ``<type>value`` , for example::
cdef char *p, float *q
p = <char*>q
Scope rules
-----------
Cython determines whether a variable belongs to a local scope, the module
scope, or the built-in scope completely statically. As with Python, assigning
to a variable which is not otherwise declared implicitly declares it to be a
Python variable residing in the scope where it is assigned.
.. note::
A consequence of these rules is that the module-level scope behaves the
same way as a Python local scope if you refer to a variable before assigning
to it. In particular, tricks such as the following will not work in Cython::
try:
x = True
except NameError:
True = 1
because, due to the assignment, the True will always be looked up in the
module-level scope. You would have to do something like this instead::
import __builtin__
try:
True = __builtin__.True
except AttributeError:
True = 1
Built-in Functions
------------------
Cython compiles calls to the following built-in functions into direct calls to
the corresponding Python/C API routines, making them particularly fast.
+------------------------------+-------------+----------------------------+
| Function and arguments | Return type | Python/C API Equivalent |
+==============================+=============+============================+
| abs(obj) | object | PyNumber_Absolute |
+------------------------------+-------------+----------------------------+
| delattr(obj, name) | int | PyObject_DelAttr |
+------------------------------+-------------+----------------------------+
| dir(obj) | object | PyObject_Dir |
| getattr(obj, name) (Note 1) | | |
| getattr3(obj, name, default) | | |
+------------------------------+-------------+----------------------------+
| hasattr(obj, name) | int | PyObject_HasAttr |
+------------------------------+-------------+----------------------------+
| hash(obj) | int | PyObject_Hash |
+------------------------------+-------------+----------------------------+
| intern(obj) | object | PyObject_InternFromString |
+------------------------------+-------------+----------------------------+
| isinstance(obj, type) | int | PyObject_IsInstance |
+------------------------------+-------------+----------------------------+
| issubclass(obj, type) | int | PyObject_IsSubclass |
+------------------------------+-------------+----------------------------+
| iter(obj) | object | PyObject_GetIter |
+------------------------------+-------------+----------------------------+
| len(obj) | Py_ssize_t | PyObject_Length |
+------------------------------+-------------+----------------------------+
| pow(x, y, z) (Note 2) | object | PyNumber_Power |
+------------------------------+-------------+----------------------------+
| reload(obj) | object | PyImport_ReloadModule |
+------------------------------+-------------+----------------------------+
| repr(obj) | object | PyObject_Repr |
+------------------------------+-------------+----------------------------+
| setattr(obj, name) | void | PyObject_SetAttr |
+------------------------------+-------------+----------------------------+
Note 1: There are two different functions corresponding to the Python
:func:`getattr` depending on whether a third argument is used. In a Python
context, they both evaluate to the Python :func:`getattr` function.
Note 2: Only the three-argument form of :func:`pow` is supported. Use the
``**`` operator otherwise.
Only direct function calls using these names are optimised. If you do
something else with one of these names that assumes it's a Python object, such
as assign it to a Python variable, and later call it, the call will be made as
a Python function call.
Operator Precedence
-------------------
Keep in mind that there are some differences in operator precedence between
Python and C, and that Cython uses the Python precedences, not the C ones.
Integer for-loops
------------------
Cython recognises the usual Python for-in-range integer loop pattern::
for i in range(n):
...
If ``i`` is declared as a :keyword:`cdef` integer type, it will
optimise this into a pure C loop. This restriction is required as
otherwise the generated code wouldn't be correct due to potential
integer overflows on the target architecture. If you are worried that
the loop is not being converted correctly, use the annotate feature of
the cython commandline (``-a``) to easily see the generated C code.
See :ref:`automatic-range-conversion`
For backwards compatibility to Pyrex, Cython also supports another
form of for-loop::
for i from 0 <= i < n:
...
or::
for i from 0 <= i < n by s:
...
where ``s`` is some integer step size.
Some things to note about the for-from loop:
* The target expression must be a variable name.
* The name between the lower and upper bounds must be the same as the target
name.
* The direction of iteration is determined by the relations. If they are both
from the set {``<``, ``<=``} then it is upwards; if they are both from the set
{``>``, ``>=``} then it is downwards. (Any other combination is disallowed.)
Like other Python looping statements, break and continue may be used in the
body, and the loop may have an else clause.
The include statement
=====================
.. warning::
Historically the ``include`` statement was used for sharing declarations.
Use :ref:`sharing-declarations` instead.
A Cython source file can include material from other files using the include
statement, for example::
include "spamstuff.pxi"
The contents of the named file are textually included at that point. The
included file can contain any complete statements or declarations that are
valid in the context where the include statement appears, including other
include statements. The contents of the included file should begin at an
indentation level of zero, and will be treated as though they were indented to
the level of the include statement that is including the file.
.. note::
There are other mechanisms available for splitting Cython code into
separate parts that may be more appropriate in many cases. See
:ref:`sharing-declarations`.
Conditional Compilation
=======================
Some features are available for conditional compilation and compile-time
constants within a Cython source file.
Compile-Time Definitions
------------------------
A compile-time constant can be defined using the DEF statement::
DEF FavouriteFood = "spam"
DEF ArraySize = 42
DEF OtherArraySize = 2 * ArraySize + 17
The right-hand side of the ``DEF`` must be a valid compile-time expression.
Such expressions are made up of literal values and names defined using ``DEF``
statements, combined using any of the Python expression syntax.
The following compile-time names are predefined, corresponding to the values
returned by :func:`os.uname`.
UNAME_SYSNAME, UNAME_NODENAME, UNAME_RELEASE,
UNAME_VERSION, UNAME_MACHINE
The following selection of builtin constants and functions are also available:
None, True, False,
abs, bool, chr, cmp, complex, dict, divmod, enumerate,
float, hash, hex, int, len, list, long, map, max, min,
oct, ord, pow, range, reduce, repr, round, slice, str,
sum, tuple, xrange, zip
A name defined using ``DEF`` can be used anywhere an identifier can appear,
and it is replaced with its compile-time value as though it were written into
the source at that point as a literal. For this to work, the compile-time
expression must evaluate to a Python value of type ``int``, ``long``,
``float`` or ``str``.::
cdef int a1[ArraySize]
cdef int a2[OtherArraySize]
print "I like", FavouriteFood
Conditional Statements
----------------------
The ``IF`` statement can be used to conditionally include or exclude sections
of code at compile time. It works in a similar way to the ``#if`` preprocessor
directive in C.::
IF UNAME_SYSNAME == "Windows":
include "icky_definitions.pxi"
ELIF UNAME_SYSNAME == "Darwin":
include "nice_definitions.pxi"
ELIF UNAME_SYSNAME == "Linux":
include "penguin_definitions.pxi"
ELSE:
include "other_definitions.pxi"
The ``ELIF`` and ``ELSE`` clauses are optional. An ``IF`` statement can appear
anywhere that a normal statement or declaration can appear, and it can contain
any statements or declarations that would be valid in that context, including
``DEF`` statements and other ``IF`` statements.
The expressions in the ``IF`` and ``ELIF`` clauses must be valid compile-time
expressions as for the ``DEF`` statement, although they can evaluate to any
Python value, and the truth of the result is determined in the usual Python
way.
.. highlight:: cython
.. _cython-limitations:
*************
Limitations
*************
Unsupported Python Features
============================
One of our goals is to make Cython as compatible as possible with standard
Python. This page lists the things that work in Python but not in Cython.
.. TODO: this limitation seems to be removed
.. ::
.. from module import *
.. This relies on at-runtime insertion of objects into the current namespace and
.. probably will be one of the few features never implemented (as any
.. implementation would be very slow). However, there is the --pre-import option
.. with treats all un-declared names as coming from the specified module, which
.. has the same effect as putting "from module import *" at the top-level of the
.. code. Note: the one difference is that builtins cannot be overriden in this
.. way, as the 'pre-import' scope is even higher than the builtin scope.
Nested def statements
----------------------
Function definitions (whether using ``def`` or ``cdef``) cannot be nested within
other function definitions. ::
def make_func():
def f(x):
return x*x
return f
(work in progress) This relies on functional closures
Generators
-----------
Using the yield keywords. (work in progress) This relies on functional closures
.. TODO Not really a limitation, rather an enchancement proposal
.. Support for builtin types
.. --------------------------
.. Support for statically declaring types such as list and dict and sequence
.. should be provided, and optimized code produced.
.. This needs to be well thought-out, and I think Pyrex has some plans along
.. these lines as well.
Other Current Limitations
==========================
* The :func:`globals` and :func:`locals` functions cannot be used.
* Class and function definitions cannot be placed inside control structures.
Semantic differences between Python and Cython
----------------------------------------------
Behaviour of class scopes
^^^^^^^^^^^^^^^^^^^^^^^^^
In Python, referring to a method of a class inside the class definition, i.e.
while the class is being defined, yields a plain function object, but in
Cython it yields an unbound method [#]_. A consequence of this is that the
usual idiom for using the :func:`classmethod` and :func:`staticmethod` functions,
e.g.::
class Spam:
def method(cls):
...
method = classmethod(method)
will not work in Cython. This can be worked around by defining the function
outside the class, and then assigning the result of ``classmethod`` or
``staticmethod`` inside the class, i.e.::
def Spam_method(cls):
...
class Spam:
method = classmethod(Spam_method)
.. rubric:: Footnotes
.. [#] The reason for the different behaviour of class scopes is that
Cython-defined Python functions are ``PyCFunction`` objects, not
``PyFunction`` objects, and are not recognised by the machinery that creates a
bound or unbound method when a function is extracted from a class. To get
around this, Cython wraps each method in an unbound method object itself
before storing it in the class's dictionary.
.. highlight:: cython
.. _numpy_tutorial:
**************************
Cython for NumPy users
**************************
This tutorial is aimed at NumPy users who have no experience with Cython at
all. If you have some knowledge of Cython you may want to skip to the
''Efficient indexing'' section which explains the new improvements made in
summer 2008.
The main scenario considered is NumPy end-use rather than NumPy/SciPy
development. The reason is that Cython is not (yet) able to support functions
that are generic with respect to datatype and the number of dimensions in a
high-level fashion. This restriction is much more severe for SciPy development
than more specific, "end-user" functions. See the last section for more
information on this.
The style of this tutorial will not fit everybody, so you can also consider:
* Robert Bradshaw's `slides on cython for SciPy2008
<http://wiki.sagemath.org/scipy08?action=AttachFile&do=get&target=scipy-cython.tgz>`_
(a higher-level and quicker introduction)
* Basic Cython documentation (see `Cython front page <http://cython.org>`_).
* ``[:enhancements/buffer:Spec for the efficient indexing]``
.. Note::
The fast array access documented below is a completely new feature, and
there may be bugs waiting to be discovered. It might be a good idea to do
a manual sanity check on the C code Cython generates before using this for
serious purposes, at least until some months have passed.
Cython at a glance
====================
Cython is a compiler which compiles Python-like code files to C code. Still,
''Cython is not a Python to C translator''. That is, it doesn't take your full
program and "turns it into C" -- rather, the result makes full use of the
Python runtime environment. A way of looking at it may be that your code is
still Python in that it runs within the Python runtime environment, but rather
than compiling to interpreted Python bytecode one compiles to native machine
code (but with the addition of extra syntax for easy embedding of faster
C-like code).
This has two important consequences:
* Speed. How much depends very much on the program involved though. Typical Python numerical programs would tend to gain very little as most time is spent in lower-level C that is used in a high-level fashion. However for-loop-style programs can gain many orders of magnitude, when typing information is added (and is so made possible as a realistic alternative).
* Easy calling into C code. One of Cython's purposes is to allow easy wrapping
of C libraries. When writing code in Cython you can call into C code as
easily as into Python code.
Some Python constructs are not yet supported, though making Cython compile all
Python code is a stated goal (among the more important omissions are inner
functions and generator functions).
Your Cython environment
========================
Using Cython consists of these steps:
1. Write a :file:`.pyx` source file
2. Run the Cython compiler to generate a C file
3. Run a C compiler to generate a compiled library
4. Run the Python interpreter and ask it to import the module
However there are several options to automate these steps:
1. The `SAGE <http://sagemath.org>`_ mathematics software system provides
excellent support for using Cython and NumPy from an interactive command
line (like IPython) or through a notebook interface (like
Maple/Mathematica). See `this documentation
<http://www.sagemath.org/doc/prog/node40.html>`_.
2. A version of `pyximport <http://www.prescod.net/pyximport/>`_ is shipped
with Cython, so that you can import pyx-files dynamically into Python and
have them compiled automatically (See :ref:`pyximport`).
3. Cython supports distutils so that you can very easily create build scripts
which automate the process, this is the preferred method for full programs.
4. Manual compilation (see below)
.. Note::
If using another interactive command line environment than SAGE, like
IPython or Python itself, it is important that you restart the process
when you recompile the module. It is not enough to issue an "import"
statement again.
Installation
=============
Unless you are used to some other automatic method:
`download Cython <http://cython.org/#download>`_ (0.9.8.1.1 or later), unpack it,
and run the usual ```python setup.py install``. This will install a
``cython`` executable on your system. It is also possible to use Cython from
the source directory without installing (simply launch :file:`cython.py` in the
root directory).
As of this writing SAGE comes with an older release of Cython than required
for this tutorial. So if using SAGE you should download the newest Cython and
then execute ::
$ cd path/to/cython-distro
$ path-to-sage/sage -python setup.py install
This will install the newest Cython into SAGE.
Manual compilation
====================
As it is always important to know what is going on, I'll describe the manual
method here. First Cython is run::
$ cython yourmod.pyx
This creates :file:`yourmod.c` which is the C source for a Python extension
module. A useful additional switch is ``-a`` which will generate a document
:file:`yourmod.html`) that shows which Cython code translates to which C code
line by line.
Then we compile the C file. This may vary according to your system, but the C
file should be built like Python was built. Python documentation for writing
extensions should have some details. On Linux this often means something
like::
$ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o yourmod.so yourmod.c
``gcc`` should have access to the NumPy C header files so if they are not
installed at :file:`/usr/include/numpy` or similar you may need to pass another
option for those.
This creates :file:`yourmod.so` in the same directory, which is importable by
Python by using a normal ``import yourmod`` statement.
The first Cython program
==========================
The code below does 2D discrete convolution of an image with a filter (and I'm
sure you can do better!, let it serve for demonstration purposes). It is both
valid Python and valid Cython code. I'll refer to it as both
:file:`convolve_py.py` for the Python version and :file:`convolve1.pyx` for the
Cython version -- Cython uses ".pyx" as its file suffix.
.. code-block:: python
from __future__ import division
import numpy as np
def naive_convolve(f, g):
# f is an image and is indexed by (v, w)
# g is a filter kernel and is indexed by (s, t),
# it needs odd dimensions
# h is the output image and is indexed by (x, y),
# it is not cropped
if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
raise ValueError("Only odd dimensions on filter supported")
# smid and tmid are number of pixels between the center pixel
# and the edge, ie for a 5x5 filter they will be 2.
#
# The output size is calculated by adding smid, tmid to each
# side of the dimensions of the input image.
vmax = f.shape[0]
wmax = f.shape[1]
smax = g.shape[0]
tmax = g.shape[1]
smid = smax // 2
tmid = tmax // 2
xmax = vmax + 2*smid
ymax = wmax + 2*tmid
# Allocate result image.
h = np.zeros([xmax, ymax], dtype=f.dtype)
# Do convolution
for x in range(xmax):
for y in range(ymax):
# Calculate pixel value for h at (x,y). Sum one component
# for each pixel (s, t) of the filter g.
s_from = max(smid - x, -smid)
s_to = min((xmax - x) - smid, smid + 1)
t_from = max(tmid - y, -tmid)
t_to = min((ymax - y) - tmid, tmid + 1)
value = 0
for s in range(s_from, s_to):
for t in range(t_from, t_to):
v = x - smid + s
w = y - tmid + t
value += g[smid - s, tmid - t] * f[v, w]
h[x, y] = value
return h
This should be compiled to produce :file:`yourmod.so` (for Linux systems). We
run a Python session to test both the Python version (imported from
``.py``-file) and the compiled Cython module.
.. sourcecode:: ipython
In [1]: import numpy as np
In [2]: import convolve_py
In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
... np.array([[1],[2],[1]], dtype=np.int))
Out [3]:
array([[1, 1, 1],
[2, 2, 2],
[1, 1, 1]])
In [4]: import convolve1
In [4]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
... np.array([[1],[2],[1]], dtype=np.int))
Out [4]:
array([[1, 1, 1],
[2, 2, 2],
[1, 1, 1]])
In [11]: N = 100
In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9))
In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
2 loops, best of 3: 1.86 s per loop
In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g)
2 loops, best of 3: 1.41 s per loop
There's not such a huge difference yet; because the C code still does exactly
what the Python interpreter does (meaning, for instance, that a new object is
allocated for each number used). Look at the generated html file and see what
is needed for even the simplest statements you get the point quickly. We need
to give Cython more information; we need to add types.
Adding types
=============
To add types we use custom Cython syntax, so we are now breaking Python source
compatibility. Here's :file:`convolve2.pyx`. *Read the comments!* ::
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# The builtin min and max functions works with Python objects, and are
# so very slow. So we create our own.
# - "cdef" declares a function which has much less overhead than a normal
# def function (but it is not Python-callable)
# - "inline" is passed on to the C compiler which may inline the functions
# - The C type "int" is chosen as return type and argument types
# - Cython allows some newer Python constructs like "a if x else b", but
# the resulting C file compiles with Python 2.3 through to Python 3.0 beta.
cdef inline int int_max(int a, int b): return a if a >= b else b
cdef inline int int_min(int a, int b): return a if a <= b else b
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
raise ValueError("Only odd dimensions on filter supported")
assert f.dtype == DTYPE and g.dtype == DTYPE
# The "cdef" keyword is also used within functions to type variables. It
# can only be used at the top indendation level (there are non-trivial
# problems with allowing them in other places, though we'd love to see
# good and thought out proposals for it).
#
# For the indices, the "int" type is used. This corresponds to a C int,
# other C types (like "unsigned int") could have been used instead.
# Purists could use "Py_ssize_t" which is the proper Python type for
# array indices.
cdef int vmax = f.shape[0]
cdef int wmax = f.shape[1]
cdef int smax = g.shape[0]
cdef int tmax = g.shape[1]
cdef int smid = smax // 2
cdef int tmid = tmax // 2
cdef int xmax = vmax + 2*smid
cdef int ymax = wmax + 2*tmid
cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
cdef int x, y, s, t, v, w
# It is very important to type ALL your variables. You do not get any
# warnings if not, only much slower code (they are implicitly typed as
# Python objects).
cdef int s_from, s_to, t_from, t_to
# For the value variable, we want to use the same data type as is
# stored in the array, so we use "DTYPE_t" as defined above.
# NB! An important side-effect of this is that if "value" overflows its
# datatype size, it will simply wrap around like in C, rather than raise
# an error like in Python.
cdef DTYPE_t value
for x in range(xmax):
for y in range(ymax):
s_from = int_max(smid - x, -smid)
s_to = int_min((xmax - x) - smid, smid + 1)
t_from = int_max(tmid - y, -tmid)
t_to = int_min((ymax - y) - tmid, tmid + 1)
value = 0
for s in range(s_from, s_to):
for t in range(t_from, t_to):
v = x - smid + s
w = y - tmid + t
value += g[smid - s, tmid - t] * f[v, w]
h[x, y] = value
return h
At this point, have a look at the generated C code for :file:`convolve1.pyx` and
:file:`convolve2.pyx`. Click on the lines to expand them and see corresponding C.
(Note that this code annotation is currently experimental and especially
"trailing" cleanup code for a block may stick to the last expression in the
block and make it look worse than it is -- use some common sense).
* .. literalinclude: convolve1.html
* .. literalinclude: convolve2.html
Especially have a look at the for loops: In :file:`convolve1.c`, these are ~20 lines
of C code to set up while in :file:`convolve2.c` a normal C for loop is used.
After building this and continuing my (very informal) benchmarks, I get:
.. sourcecode:: ipython
In [21]: import convolve2
In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
2 loops, best of 3: 828 ms per loop
Efficient indexing
====================
There's still a bottleneck killing performance, and that is the array lookups
and assignments. The ``[]``-operator still uses full Python operations --
what we would like to do instead is to access the data buffer directly at C
speed.
What we need to do then is to type the contents of the :obj:`ndarray` objects.
We do this with a special "buffer" syntax which must be told the datatype
(first argument) and number of dimensions ("ndim" keyword-only argument, if
not provided then one-dimensional is assumed).
More information on this syntax [:enhancements/buffer:can be found here].
Showing the changes needed to produce :file:`convolve3.pyx` only::
...
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
...
cdef np.ndarray[DTYPE_t, ndim=2] h = ...
Usage:
.. sourcecode:: ipython
In [18]: import convolve3
In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
3 loops, best of 100: 11.6 ms per loop
Note the importance of this change.
*Gotcha*: This efficient indexing only affects certain index operations,
namely those with exactly ``ndim`` number of typed integer indices. So if
``v`` for instance isn't typed, then the lookup ``f[v, w]`` isn't
optimized. On the other hand this means that you can continue using Python
objects for sophisticated dynamic slicing etc. just as when the array is not
typed.
Tuning indexing further
========================
The array lookups are still slowed down by two factors:
1. Bounds checking is performed.
2. Negative indices are checked for and handled correctly. The code above is
explicitly coded so that it doesn't use negative indices, and it
(hopefully) always access within bounds. We can add a decorator to disable
bounds checking::
...
cimport cython
@cython.boundscheck(False) # turn of bounds-checking for entire function
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
...
Now bounds checking is not performed (and, as a side-effect, if you ''do''
happen to access out of bounds you will in the best case crash your program
and in the worst case corrupt data). It is possible to switch bounds-checking
mode in many ways, see [:docs/compilerdirectives:compiler directives] for more
information.
Negative indices are dealt with by ensuring Cython that the indices will be
positive, by casting the variables to unsigned integer types (if you do have
negative values, then this casting will create a very large positive value
instead and you will attempt to access out-of-bounds values). Casting is done
with a special ``<>``-syntax. The code below is changed to use either
unsigned ints or casting as appropriate::
...
cdef int s, t # changed
cdef unsigned int x, y, v, w # changed
cdef int s_from, s_to, t_from, t_to
cdef DTYPE_t value
for x in range(xmax):
for y in range(ymax):
s_from = max(smid - x, -smid)
s_to = min((xmax - x) - smid, smid + 1)
t_from = max(tmid - y, -tmid)
t_to = min((ymax - y) - tmid, tmid + 1)
value = 0
for s in range(s_from, s_to):
for t in range(t_from, t_to):
v = <unsigned int>(x - smid + s) # changed
w = <unsigned int>(y - tmid + t) # changed
value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w] # changed
h[x, y] = value
...
(In the next Cython release we will likely add a compiler directive or
argument to the ``np.ndarray[]``-type specifier to disable negative indexing
so that casting so much isn't necessary; feedback on this is welcome.)
The function call overhead now starts to play a role, so we compare the latter
two examples with larger N:
.. sourcecode:: ipython
In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g)
3 loops, best of 100: 5.97 ms per loop
In [12]: N = 1000
In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9))
In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g)
1 loops, best of 10: 1.16 s per loop
In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g)
1 loops, best of 10: 597 ms per loop
(Also this is a mixed benchmark as the result array is allocated within the
function call.)
.. Warning::
Speed comes with some cost. Especially it can be dangerous to set typed
objects (like ``f``, ``g`` and ``h`` in our sample code) to :keyword:`None`.
Setting such objects to :keyword:`None` is entirely legal, but all you can do with them
is check whether they are None. All other use (attribute lookup or indexing)
can potentially segfault or corrupt data (rather than raising exceptions as
they would in Python).
The actual rules are a bit more complicated but the main message is clear: Do
not use typed objects without knowing that they are not set to None.
More generic code
==================
It would be possible to do::
def naive_convolve(object[DTYPE_t, ndim=2] f, ...):
i.e. use :obj:`object` rather than :obj:`np.ndarray`. Under Python 3.0 this
can allow your algorithm to work with any libraries supporting the buffer
interface; and support for e.g. the Python Imaging Library may easily be added
if someone is interested also under Python 2.x.
There is some speed penalty to this though (as one makes more assumptions
compile-time if the type is set to :obj:`np.ndarray`, specifically it is
assumed that the data is stored in pure strided more and not in indirect
mode).
[:enhancements/buffer:More information]
The future
============
These are some points to consider for further development. All points listed
here has gone through a lot of thinking and planning already; still they may
or may not happen depending on available developer time and resources for
Cython.
1. Support for efficient access to structs/records stored in arrays; currently
only primitive types are allowed.
2. Support for efficient access to complex floating point types in arrays. The
main obstacle here is getting support for efficient complex datatypes in
Cython.
3. Calling NumPy/SciPy functions currently has a Python call overhead; it
would be possible to take a short-cut from Cython directly to C. (This does
however require some isolated and incremental changes to those libraries;
mail the Cython mailing list for details).
4. Efficient code that is generic with respect to the number of dimensions.
This can probably be done today by calling the NumPy C multi-dimensional
iterator API directly; however it would be nice to have for-loops over
:func:`enumerate` and :func:`ndenumerate` on NumPy arrays create efficient
code.
5. A high-level construct for writing type-generic code, so that one can write
functions that work simultaneously with many datatypes. Note however that a
macro preprocessor language can help with doing this for now.
.. highlight:: cython
.. _overview:
********
Overview
********
About Cython
==============
Cython is a language that makes writing C extensions for the Python language
as easy as Python itself. Cython is based on the well-known `Pyrex
<http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/>`_ language by Greg Ewing,
but supports more cutting edge functionality and optimizations [#]_.
The Cython language is very close to the Python language, but Cython
additionally supports calling C functions and declaring C types on variables
and class attributes. This allows the compiler to generate very efficient C
code from Cython code.
This makes Cython the ideal language for wrapping external C libraries,
and for fast C modules that speed up the execution of Python code.
Future Plans
============
Cython is not finished. Substantial tasks remaining. See
:ref:`cython-limitations` for a current list.
.. rubric:: Footnotes
.. [#] For differences with Pyrex see :ref:`pyrex-differences`.
.. highlight:: cython
.. _profiling:
*********
Profiling
*********
This part describes the profiling abilities of Cython. If you are familiar
with profiling pure Python code, you can only read the first section
(:ref:`profiling_basics`). If you are not familiar with python profiling you
should also read the tutorial (:ref:`profiling_tutorial`) which takes you
through a complete example step by step.
.. _profiling_basics:
Cython Profiling Basics
=======================
Profiling in Cython is controlled by a compiler directive.
It can either be set either for an entire file or on a per function
via a Cython decorator.
Enable profiling for a complete source file
-------------------------------------------
Profiling is enable for a complete source file via a global directive to the
Cython compiler at the top of a file::
# cython: profile=True
Note that profiling gives a slight overhead to each function call therefore making
your program a little slower (or a lot, if you call some small functions very
often).
Once enabled, your Cython code will behave just like Python code when called
from the cProfile module. This means you can just profile your Cython code
together with your Python code using the same tools as for Python code alone.
Disabling profiling function wise
------------------------------------------
If your profiling is messed up because of the call overhead to some small
functions that you rather do not want to see in your profile - either because
you plan to inline them anyway or because you are sure that you can't make them
any faster - you can use a special decorator to disable profiling for one
function only::
cimport cython
@cython.profile(False)
def my_often_called_function():
pass
.. _profiling_tutorial:
Profiling Tutorial
==================
This will be a complete tutorial, start to finish, of profiling python code,
turning it into Cython code and keep profiling until it is fast enough.
As a toy example, we would like to evaluate the summation of the reciprocals of
squares up to a certain integer :math:`n` for evaluating :math:`\pi`. The
relation we want to use has been proven by Euler in 1735 and is known as the
`Basel problem <http://en.wikipedia.org/wiki/Basel_problem>`_.
.. math::
\pi^2 = 6 \sum_{k=1}^{\infty} \frac{1}{k^2} =
6 \lim_{k \to \infty} \big( \frac{1}{1^2} +
\frac{1}{2^2} + \dots + \frac{1}{k^2} \big) \approx
6 \big( \frac{1}{1^2} + \frac{1}{2^2} + \dots + \frac{1}{n^2} \big)
A simple python code for evaluating the truncated sum looks like this::
#!/usr/bin/env python
# encoding: utf-8
# filename: calc_pi.py
def recip_square(i):
return 1./i**2
def approx_pi(n=10000000):
val = 0.
for k in range(1,n+1):
val += recip_square(k)
return (6 * val)**.5
On my box, this needs approximately 4 seconds to run the function with the
default n. The higher we choose n, the better will be the approximation for
:math:`\pi`. An experienced python programmer will already see plenty of
places to optimize this code. But remember the golden rule of optimization:
Never optimize without having profiled. Let me repeat this: **Never** optimize
without having profiled your code. Your thoughts about which part of your
code takes too much time are wrong. At least, mine are always wrong. So let's
write a short script to profile our code::
#!/usr/bin/env python
# encoding: utf-8
# filename: profile.py
import pstats, cProfile
import calc_pi
cProfile.runctx("calc_pi.approx_pi()", globals(), locals(), "Profile.prof")
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
Running this on my box gives the following output::
TODO: how to display this not as code but verbatimly?
Sat Nov 7 17:40:54 2009 Profile.prof
10000004 function calls in 6.211 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.243 3.243 6.211 6.211 calc_pi.py:7(approx_pi)
10000000 2.526 0.000 2.526 0.000 calc_pi.py:4(recip_square)
1 0.442 0.442 0.442 0.442 {range}
1 0.000 0.000 6.211 6.211 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
This contains the information that the code runs in 6.2 CPU seconds. Note that
the code got slower by 2 seconds because it ran inside the cProfile module. The
table contains the real valuable information. You might want to check the
python `profiling documentation <http://docs.python.org/library/profile.html>`_
for the nitty gritty details. The most important columns here are totime (total
time spend in this function **not** counting functions that were called by this
function) and cumtime (total time spend in this function **also** counting the
functions called by this function). Looking at the tottime column, we see that
approximately half the time is spend in approx_pi and the other half is spend
in recip_square. Also half a second is spend in range ... of course we should
have used xrange for such a big iteration. And in fact, just changing range to
xrange makes the code run in 5.8 seconds.
We could optimize a lot in the pure python version, but since we are interested
in Cython, let's move forward and bring this module to Cython. We would do this
anyway at some time to get the loop run faster. Here is our first Cython version::
# encoding: utf-8
# cython: profile=True
# filename: calc_pi.pyx
def recip_square(int i):
return 1./i**2
def approx_pi(int n=10000000):
cdef double val = 0.
cdef int k
for k in xrange(1,n+1):
val += recip_square(k)
return (6 * val)**.5
Note the second line: We have to tell Cython that profiling should be enabled.
This makes the Cython code slightly slower, but without this we would not get
meaningful output from the cProfile module. The rest of the code is mostly
unchanged, I only typed some variables which will likely speed things up a bit.
We also need to modify our profiling script to import the Cython module directly.
Here is the complete version adding the import of the pyximport module::
#!/usr/bin/env python
# encoding: utf-8
# filename: profile.py
import pstats, cProfile
import pyximport
pyximport.install()
import calc_pi
cProfile.runctx("calc_pi.approx_pi()", globals(), locals(), "Profile.prof")
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
We only added two lines, the rest stays completely the same. Alternatively, we could also
manually compile our code into an extension; we wouldn't need to change the
profile script then at all. The script now outputs the following::
Sat Nov 7 18:02:33 2009 Profile.prof
10000004 function calls in 4.406 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.305 3.305 4.406 4.406 calc_pi.pyx:7(approx_pi)
10000000 1.101 0.000 1.101 0.000 calc_pi.pyx:4(recip_square)
1 0.000 0.000 4.406 4.406 {calc_pi.approx_pi}
1 0.000 0.000 4.406 4.406 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
We gained 1.8 seconds. Not too shabby. Comparing the output to the previous, we
see that recip_square function got faster while the approx_pi function has not
changed a lot. Let's concentrate on the approx_pi function a bit more. First
note, that this function is not to be called from code outside of our module;
so it would be wise to turn it into a cdef to reduce call overhead. We should
also get rid of the power operator: it is turned into a pow(i,2) function call by
Cython, but we could instead just write i*i which could be faster. The
whole function is also a good candidate for inlining. Let's look at the
necessary changes for these ideas::
# encoding: utf-8
# cython: profile=True
# filename: calc_pi.pyx
cdef inline double recip_square(int i):
return 1./(i*i)
def approx_pi(int n=10000000):
cdef double val = 0.
cdef int k
for k in xrange(1,n+1):
val += recip_square(k)
return (6 * val)**.5
Now running the profile script yields::
Sat Nov 7 18:10:11 2009 Profile.prof
10000004 function calls in 2.622 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.782 1.782 2.622 2.622 calc_pi.pyx:7(approx_pi)
10000000 0.840 0.000 0.840 0.000 calc_pi.pyx:4(recip_square)
1 0.000 0.000 2.622 2.622 {calc_pi.approx_pi}
1 0.000 0.000 2.622 2.622 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
That bought us another 1.8 seconds. Not the dramatic change we could have
expected. And why is recip_square still in this table; it is supposed to be
inlined, isn't it? The reason for this is that Cython still generates profiling code
even if the function call is eliminated. Let's tell it to not
profile recip_square any more; we couldn't get the function to be much faster anyway::
# encoding: utf-8
# cython: profile=True
# filename: calc_pi.pyx
cimport cython
@cython.profile(False)
cdef inline double recip_square(int i):
return 1./(i*i)
def approx_pi(int n=10000000):
cdef double val = 0.
cdef int k
for k in xrange(1,n+1):
val += recip_square(k)
return (6 * val)**.5
Running this shows an interesting result::
Sat Nov 7 18:15:02 2009 Profile.prof
4 function calls in 0.089 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.089 0.089 0.089 0.089 calc_pi.pyx:10(approx_pi)
1 0.000 0.000 0.089 0.089 {calc_pi.approx_pi}
1 0.000 0.000 0.089 0.089 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
First note the tremendous speed gain: this version only takes 1/50 of the time
of our first Cython version. Also note that recip_square has vanished from the
table like we wanted. But the most peculiar and import change is that
approx_pi also got much faster. This is a problem with all profiling: calling a
function in a profile run adds a certain overhead to the function call. This
overhead is **not** added to the time spend in the called function, but to the
time spend in the **calling** function. In this example, approx_pi didn't need 2.622
seconds in the last run; but it called recip_square 10000000 times, each time taking a
little to set up profiling for it. This adds up to the massive time loss of
around 2.6 seconds. Having disable profiling for the often called function now
reveals realistic timings for approx_pi; we could continue optimizing it now if
needed.
This concludes this profiling tutorial. There is still some room for
improvement in this code. We could try to replace the power operator in
approx_pi with a call to sqrt from the C stdlib; but this is not necessarily
faster than calling pow(x,0.5).
Even so, the result we achieved here is quite satisfactory: we came up with a
solution that is much faster then our original python version while retaining
functionality and readability.
I think this is a result of a recent change to Pyrex that
has been merged into Cython.
If a directory contains an :file:`__init__.py` or :file:`__init__.pyx` file,
it's now assumed to be a package directory. So, for example,
if you have a directory structure::
foo/
__init__.py
shrubbing.pxd
shrubbing.pyx
then the shrubbing module is assumed to belong to a package
called 'foo', and its fully qualified module name is
'foo.shrubbing'.
So when Pyrex wants to find out whether there is a `.pxd` file for shrubbing,
it looks for one corresponding to a module called :module:`foo.shrubbing`. It
does this by searching the include path for a top-level package directory
called 'foo' containing a file called 'shrubbing.pxd'.
However, if foo is the current directory you're running
the compiler from, and you haven't added foo to the
include path using a -I option, then it won't be on
the include path, and the `.pxd` won't be found.
What to do about this depends on whether you really
intend the module to reside in a package.
If you intend shrubbing to be a top-level module, you
will have to move it somewhere else where there is
no :file:`__init__.*` file.
If you do intend it to reside in a package, then there
are two alternatives:
1. cd to the directory containing foo and compile
from there::
cd ..; cython foo/shrubbing.pyx
2. arrange for the directory containing foo to be
passed as a -I option, e.g.::
cython -I .. shrubbing.pyx
Arguably this behaviour is not very desirable, and I'll
see if I can do something about it.
.. highlight:: cython
.. _pyrex-differences:
**************************************
Differences between Cython and Pyrex
**************************************
.. warning::
Both Cython and Pyrex are moving targets. It has come to the point
that an explicit list of all the differences between the two
projects would be laborious to list and track, but hopefully
this high-level list gives an idea of the differences that
are present. It should be noted that both projects make an effort
at mutual compatibility, but Cython's goal is to be as close to
and complete as Python as reasonable.
Python 3.0 Support
==================
Cython creates ``.c`` files that can be built and used with both
Python 2.x and Python 3.x. In fact, compiling your module with
Cython may very well be the easiest way to port code to Python 3.0.
We are also working to make the compiler run in both Python 2.x and 3.0.
Many Python 3 constructs are already supported by Cython.
List/Set/Dict Comprehensions
----------------------------
Cython supports the different comprehensions defined by Python 3.0 for
lists, sets and dicts::
[expr(x) for x in A] # list
{expr(x) for x in A} # set
{key(x) : value(x) for x in A} # dict
Looping is optimized if ``A`` is a list, tuple or dict. You can use
the :keyword:`for` ... :keyword:`from` syntax, too, but it is
generally preferred to use the usual :keyword:`for` ... :keyword:`in`
``range(...)`` syntax with a C run variable (e.g. ``cdef int i``).
.. note:: see :ref:`automatic-range-conversion`
Note that Cython also supports set literals starting from Python 2.3.
Keyword-only arguments
----------------------
Python functions can have keyword-only arguments listed after the ``*``
parameter and before the ``**`` parameter if any, e.g.::
def f(a, b, *args, c, d = 42, e, **kwds):
...
Here ``c``, ``d`` and ``e`` cannot be passed as position arguments and must be
passed as keyword arguments. Furthermore, ``c`` and ``e`` are required keyword
arguments, since they do not have a default value.
If the parameter name after the ``*`` is omitted, the function will not accept any
extra positional arguments, e.g.::
def g(a, b, *, c, d):
...
takes exactly two positional parameters and has two required keyword parameters.
Conditional expressions "x if b else y" (python 2.5)
=====================================================
Conditional expressions as described in
http://www.python.org/dev/peps/pep-0308/::
X if C else Y
Only one of ``X`` and ``Y`` is evaluated, (depending on the value of C).
cdef inline
=============
Module level functions can now be declared inline, with the :keyword:`inline`
keyword passed on to the C compiler. These can be as fast as macros.::
cdef inline int something_fast(int a, int b):
return a*a + b
Note that class-level :keyword:`cdef` functions are handled via a virtual
function table, so the compiler won't be able to inline them in almost all
cases.
Assignment on declaration (e.g. "cdef int spam = 5")
======================================================
In Pyrex, one must write::
cdef int i, j, k
i = 2
j = 5
k = 7
Now, with cython, one can write::
cdef int i = 2, j = 5, k = 7
The expression on the right hand side can be arbitrarily complicated, e.g.::
cdef int n = python_call(foo(x,y), a + b + c) - 32
'by' expression in for loop (e.g. "for i from 0 <= i < 10 by 2")
==================================================================
::
for i from 0 <= i < 10 by 2:
print i
yields::
0
2
4
6
8
.. note:: see :ref:`automatic-range-conversion`
Boolean int type (e.g. it acts like a c int, but coerces to/from python as a boolean)
======================================================================================
In C, ints are used for truth values. In python, any object can be used as a
truth value (using the :meth:`__nonzero__` method, but the canonical choices
are the two boolean objects ``True`` and ``False``. The :keyword:`bint` of
"boolean int" object is compiled to a C int, but get coerced to and from
Cython as booleans. The return type of comparisons and several builtins is a
:ctype:`bint` as well. This allows one to avoid having to wrap things in
:func:`bool()`. For example, one can write::
def is_equal(x):
return x == y
which would return ``1`` or ``0`` in Pyrex, but returns ``True`` or ``False`` in
python. One can declare variables and return values for functions to be of the
:ctype:`bint` type. For example::
cdef int i = x
cdef bint b = x
The first conversion would happen via ``x.__int__()`` whereas the second would
happen via ``x.__nonzero__()``. (Actually, if ``x`` is the python object
``True`` or ``False`` then no method call is made.)
Executable class bodies
=======================
Including a working :func:`classmethod`::
cdef class Blah:
def some_method(self):
print self
some_method = classmethod(some_method)
a = 2*3
print "hi", a
cpdef functions
=================
Cython adds a third function type on top of the usual :keyword:`def` and
:keyword:`cdef`. If a function is declared :keyword:`cpdef` it can be called
from and overridden by both extension and normal python subclasses. You can
essentially think of a :keyword:`cpdef` method as a :keyword:`cdef` method +
some extras. (That's how it's implemented at least.) First, it creates a
:keyword:`def` method that does nothing but call the underlying
:keyword:`cdef` method (and does argument unpacking/coercion if needed). At
the top of the :keyword:`cdef` method a little bit of code is added to check
to see if it's overridden. Specifically, in pseudocode::
if type(self) has a __dict__:
foo = self.getattr('foo')
if foo is not wrapper_foo:
return foo(args)
[cdef method body]
To detect whether or not a type has a dictionary, it just checks the
tp_dictoffset slot, which is ``NULL`` (by default) for extension types, but
non- null for instance classes. If the dictionary exists, it does a single
attribute lookup and can tell (by comparing pointers) whether or not the
returned result is actually a new function. If, and only if, it is a new
function, then the arguments packed into a tuple and the method called. This
is all very fast. A flag is set so this lookup does not occur if one calls the
method on the class directly, e.g.::
cdef class A:
cpdef foo(self):
pass
x = A()
x.foo() # will check to see if overridden
A.foo(x) # will call A's implementation whether overridden or not
See :ref:`early-binding-for-speed` for explanation and usage tips.
.. _automatic-range-conversion:
Automatic range conversion
============================
This will convert statements of the form ``for i in range(...)`` to ``for i
from ...`` when ``i`` is any cdef'd integer type, and the direction (i.e. sign
of step) can be determined.
.. warning::
This may change the semantics if the range causes
assignment to ``i`` to overflow. Specifically, if this option is set, an error
will be raised before the loop is entered, whereas without this option the loop
will execute until a overflowing value is encountered. If this effects you
change ``Cython/Compiler/Options.py`` (eventually there will be a better
way to set this).
More friendly type casting
===========================
In Pyrex, if one types ``<int>x`` where ``x`` is a Python object, one will get
the memory address of ``x``. Likewise, if one types ``<object>i`` where ``i``
is a C int, one will get an "object" at location ``i`` in memory. This leads
to confusing results and segfaults.
In Cython ``<type>x`` will try and do a coercion (as would happen on assignment of
``x`` to a variable of type type) if exactly one of the types is a python object.
It does not stop one from casting where there is no conversion (though it will
emit a warning). If one really wants the address, cast to a ``void *`` first.
As in Pyrex ``<MyExtensionType>x`` will cast ``x`` to type :ctype:`MyExtensionType` without any
type checking. Cython supports the syntax ``<MyExtensionType?>`` to do the cast
with type checking (i.e. it will throw an error if ``x`` is not a (subclass of)
:ctype:`MyExtensionType`.
Optional arguments in cdef/cpdef functions
============================================
Cython now supports optional arguments for :keyword:`cdef` and
:keyword:`cpdef` functions.
The syntax in the ``.pyx`` file remains as in Python, but one declares such
functions in the ``.pxd`` file by writing ``cdef foo(x=*)``. The number of
arguments may increase on subclassing, but the argument types and order must
remain the same. There is a slight performance penalty in some cases when a
cdef/cpdef function without any optional is overridden with one that does have
default argument values.
For example, one can have the ``.pxd`` file::
cdef class A:
cdef foo(self)
cdef class B(A)
cdef foo(self, x=*)
cdef class C(B):
cpdef foo(self, x=*, int k=*)
with corresponding ``.pyx`` file::
cdef class A:
cdef foo(self):
print "A"
cdef class B(A)
cdef foo(self, x=None)
print "B", x
cdef class C(B):
cpdef foo(self, x=True, int k=3)
print "C", x, k
.. note::
this also demonstrates how :keyword:`cpdef` functions can override
:keyword:`cdef` functions.
Function pointers in structs
=============================
Functions declared in :keyword:`structs` are automatically converted to
function pointers for convenience.
C++ Exception handling
=========================
:keyword:`cdef` functions can now be declared as::
cdef int foo(...) except +
cdef int foo(...) except +TypeError
cdef int foo(...) except +python_error_raising_function
in which case a Python exception will be raised when a C++ error is caught.
See :ref:`wrapping-cplusplus` for more details.
Synonyms
=========
``cdef import from`` means the same thing as ``cdef extern from``
Source code encoding
======================
.. TODO: add the links to the relevent PEPs
Cython supports PEP 3120 and PEP 263, i.e. you can start your Cython source
file with an encoding comment and generally write your source code in UTF-8.
This impacts the encoding of byte strings and the conversion of unicode string
literals like ``u'abcd'`` to unicode objects.
Automatic ``typecheck``
========================
Rather than introducing a new keyword :keyword:`typecheck` as explained in the
`Pyrex docs
<http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/special_methods.html>`_,
Cython emits a (non-spoofable and faster) typecheck whenever
:func:`isinstance` is used with an extension type as the second parameter.
From __future__ directives
==========================
Cython supports several from __future__ directives, namely ``unicode_literals`` and ``division``.
With statements are always enabled.
Pure Python mode
================
Cython has support for compiling ``.py`` files, and
accepting type annotations using decorators and other
valid Python syntax. This allows the same source to
be interpreted as straight Python, or compiled for
optimized results.
See http://wiki.cython.org/pure
for more details.
.. highlight:: cython
.. _sharing-declarations:
********************************************
Sharing Declarations Between Cython Modules
********************************************
This section describes a new set of facilities for making C declarations,
functions and extension types in one Cython module available for use in
another Cython module. These facilities are closely modelled on the Python
import mechanism, and can be thought of as a compile-time version of it.
Definition and Implementation files
====================================
A Cython module can be split into two parts: a definition file with a ``.pxd``
suffix, containing C declarations that are to be available to other Cython
modules, and an implementation file with a ``.pyx`` suffix, containing
everything else. When a module wants to use something declared in another
module's definition file, it imports it using the :keyword:`cimport`
statement.
A ``.pxd`` file that consists solely of extern declarations does not need
to correspond to an actual ``.pyx`` file or Python module. This can make it a
convenient place to put common declarations, for example declarations of
functions from an :ref:`external library <external-C-code>` that one wants to use in several modules.
What a Definition File contains
================================
A definition file can contain:
* Any kind of C type declaration.
* extern C function or variable declarations.
* Declarations of C functions defined in the module.
* The definition part of an extension type (see below).
It cannot contain any non-extern C variable declarations.
It cannot contain the implementations of any C or Python functions, or any
Python class definitions, or any executable statements. It is needed when one
wants to access :keyword:`cdef` attributes and methods, or to inherit from
:keyword:`cdef` classes defined in this module.
.. note::
You don't need to (and shouldn't) declare anything in a declaration file
public in order to make it available to other Cython modules; its mere
presence in a definition file does that. You only need a public
declaration if you want to make something available to external C code.
What an Implementation File contains
======================================
An implementation file can contain any kind of Cython statement, although there
are some restrictions on the implementation part of an extension type if the
corresponding definition file also defines that type (see below).
If one doesn't need to :keyword:`cimport` anything from this module, then this
is the only file one needs.
The cimport statement
=======================
The :keyword:`cimport` statement is used in a definition or
implementation file to gain access to names declared in another definition
file. Its syntax exactly parallels that of the normal Python import
statement::
cimport module [, module...]
from module cimport name [as name] [, name [as name] ...]
Here is an example. The file on the left is a definition file which exports a
C data type. The file on the right is an implementation file which imports and
uses it.
:file:`dishes.pxd`::
cdef enum otherstuff:
sausage, eggs, lettuce
cdef struct spamdish:
int oz_of_spam
otherstuff filler
:file:`restaurant.pyx`::
cimport dishes
from dishes cimport spamdish
cdef void prepare(spamdish *d):
d.oz_of_spam = 42
d.filler = dishes.sausage
def serve():
cdef spamdish d
prepare(&d)
print "%d oz spam, filler no. %d" % (d.oz_of_spam, d.otherstuff)
It is important to understand that the :keyword:`cimport` statement can only
be used to import C data types, C functions and variables, and extension
types. It cannot be used to import any Python objects, and (with one
exception) it doesn't imply any Python import at run time. If you want to
refer to any Python names from a module that you have cimported, you will have
to include a regular import statement for it as well.
The exception is that when you use :keyword:`cimport` to import an extension type, its
type object is imported at run time and made available by the name under which
you imported it. Using :keyword:`cimport` to import extension types is covered in more
detail below.
If a ``.pxd`` file changes, any modules that :keyword:`cimport` from it may need to be
recompiled.
Search paths for definition files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you :keyword:`cimport` a module called ``modulename``, the Cython
compiler searches for a file called :file:`modulename.pxd` along the search
path for include files, as specified by ``-I`` command line options.
Also, whenever you compile a file :file:`modulename.pyx`, the corresponding
definition file :file:`modulename.pxd` is first searched for along the same
path, and if found, it is processed before processing the ``.pyx`` file.
Using cimport to resolve naming conflicts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :keyword:`cimport` mechanism provides a clean and simple way to solve the
problem of wrapping external C functions with Python functions of the same
name. All you need to do is put the extern C declarations into a ``.pxd`` file
for an imaginary module, and :keyword:`cimport` that module. You can then
refer to the C functions by qualifying them with the name of the module.
Here's an example:
:file:`c_lunch.pxd` ::
cdef extern from "lunch.h":
void eject_tomato(float)
:file:`lunch.pyx` ::
cimport c_lunch
def eject_tomato(float speed):
c_lunch.eject_tomato(speed)
You don't need any :file:`c_lunch.pyx` file, because the only things defined
in :file:`c_lunch.pxd` are extern C entities. There won't be any actual
``c_lunch`` module at run time, but that doesn't matter; the
:file:`c_lunch.pxd` file has done its job of providing an additional namespace
at compile time.
Sharing C Functions
===================
C functions defined at the top level of a module can be made available via
:keyword:`cimport` by putting headers for them in the ``.pxd`` file, for
example,:
:file:`volume.pxd`::
cdef float cube(float)
:file:`spammery.pyx`::
from volume cimport cube
def menu(description, size):
print description, ":", cube(size), \
"cubic metres of spam"
menu("Entree", 1)
menu("Main course", 3)
menu("Dessert", 2)
:file:`volume.pyx`::
cdef float cube(float x):
return x * x * x
.. note::
When a module exports a C function in this way, an object appears in the
module dictionary under the function's name. However, you can't make use of
this object from Python, nor can you use it from Cython using a normal import
statement; you have to use :keyword:`cimport`.
Sharing Extension Types
=======================
An extension type can be made available via :keyword:`cimport` by splitting
its definition into two parts, one in a definition file and the other in the
corresponding implementation file.
The definition part of the extension type can only declare C attributes and C
methods, not Python methods, and it must declare all of that type's C
attributes and C methods.
The implementation part must implement all of the C methods declared in the
definition part, and may not add any further C attributes. It may also define
Python methods.
Here is an example of a module which defines and exports an extension type,
and another module which uses it.::
# Shrubbing.pxd
cdef class Shrubbery:
cdef int width
cdef int length
# Shrubbing.pyx
cdef class Shrubbery:
def __cinit__(self, int w, int l):
self.width = w
self.length = l
def standard_shrubbery():
return Shrubbery(3, 7)
# Landscaping.pyx
cimport Shrubbing
import Shrubbing
cdef Shrubbing.Shrubbery sh
sh = Shrubbing.standard_shrubbery()
print "Shrubbery size is %d x %d" % (sh.width, sh.height)
Some things to note about this example:
* There is a :keyword:`cdef` class Shrubbery declaration in both
:file:`Shrubbing.pxd` and :file:`Shrubbing.pyx`. When the Shrubbing module
is compiled, these two declarations are combined into one.
* In Landscaping.pyx, the :keyword:`cimport` Shrubbing declaration allows us
to refer to the Shrubbery type as :class:`Shrubbing.Shrubbery`. But it
doesn't bind the name Shrubbing in Landscaping's module namespace at run
time, so to access :func:`Shrubbing.standard_shrubbery` we also need to
``import Shrubbing``.
.. highlight:: cython
.. _compilation:
****************************
Source Files and Compilation
****************************
Cython source file names consist of the name of the module followed by a
``.pyx`` extension, for example a module called primes would have a source
file named :file:`primes.pyx`.
Once you have written your ``.pyx`` file, there are a couple of ways of turning it
into an extension module. One way is to compile it manually with the Cython
compiler, e.g.:
.. sourcecode:: text
$ cython primes.pyx
This will produce a file called :file:`primes.c`, which then needs to be
compiled with the C compiler using whatever options are appropriate on your
platform for generating an extension module. For these options look at the
official Python documentation.
The other, and probably better, way is to use the :mod:`distutils` extension
provided with Cython. The benifit of this method is that it will give the
platform specific compilation options, acting like a stripped down autotools.
Basic setup.py
===============
The distutils extension provided with Cython allows you to pass ``.pyx`` files
directly to the ``Extension`` constructor in your setup file.
If you have a single Cython file that you want to turn into a compiled
extension, say with filename :file:`example.pyx` the associated :file:`setup.py`
would be::
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
cmdclass = {'build_ext': build_ext},
ext_modules = [Extension("example", ["example.pyx"])]
)
To understand the :file:`setup.py` more fully look at the official
:mod:`distutils` documentation. To compile the extension for use in the
current directory use:
.. sourcecode:: text
$ python setup.py build_ext --inplace
Cython Files Depending on C Files
===================================
When you have come C files that have been wrapped with cython and you want to
compile them into your extension the basic :file:`setup.py` file to do this
would be::
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
sourcefiles = ['example.pyx', 'helper.c', 'another_helper.c']
setup(
cmdclass = {'build_ext': build_ext},
ext_modules = [Extension("example", sourcefiles)]
)
Notice that the files have been given a name, this is not necessary, but it
makes the file easier to format if the list gets long.
The :class:`Extension` class takes many options, and a fuller explanation can
be found in the `distutils documentation`_. Some useful options to know about
are ``include_dirs``, ``libraries``, and ``library_dirs`` which specify where
to find the ``.h`` and library files when linking to external libraries.
.. _distutils documentation: http://docs.python.org/extending/building.html
Multiple Cython Files in a Package
===================================
TODO
Distributing Cython modules
============================
It is strongly recommended that you distribute the generated ``.c`` files as well
as your Cython sources, so that users can install your module without needing
to have Cython available.
It is also recommended that Cython compilation not be enabled by default in the
version you distribute. Even if the user has Cython installed, he probably
doesn't want to use it just to install your module. Also, the version he has
may not be the same one you used, and may not compile your sources correctly.
This simply means that the :file:`setup.py` file that you ship with will just
be a normal distutils file on the generated `.c` files, for the basic example
we would have instead::
from distutils.core import setup
from distutils.extension import Extension
setup(
ext_modules = [Extension("example", ["example.c"])]
)
.. _pyximport:
Pyximport
===========
.. TODO add some text about how this is Paul Prescods code. Also change the
tone to be more universal (i.e. remove all the I statements)
Cython is a compiler. Therefore it is natural that people tend to go
through an edit/compile/test cycle with Cython modules. But my personal
opinion is that one of the deep insights in Python's implementation is
that a language can be compiled (Python modules are compiled to ``.pyc``)
files and hide that compilation process from the end-user so that they
do not have to worry about it. Pyximport does this for Cython modules.
For instance if you write a Cython module called :file:`foo.pyx`, with
Pyximport you can import it in a regular Python module like this::
import pyximport; pyximport.install()
import foo
Doing so will result in the compilation of :file:`foo.pyx` (with appropriate
exceptions if it has an error in it).
If you would always like to import Cython files without building them
specially, you can also the first line above to your :file:`sitecustomize.py`.
That will install the hook every time you run Python. Then you can use
Cython modules just with simple import statements. I like to test my
Cython modules like this:
.. sourcecode:: text
$ python -c "import foo"
Dependency Handling
--------------------
In Pyximport 1.1 it is possible to declare that your module depends on
multiple files, (likely ``.h`` and ``.pxd`` files). If your Cython module is
named ``foo`` and thus has the filename :file:`foo.pyx` then you should make
another file in the same directory called :file:`foo.pyxdep`. The
:file:`modname.pyxdep` file can be a list of filenames or "globs" (like
``*.pxd`` or ``include/*.h``). Each filename or glob must be on a separate
line. Pyximport will check the file date for each of those files before
deciding whether to rebuild the module. In order to keep track of the
fact that the dependency has been handled, Pyximport updates the
modification time of your ".pyx" source file. Future versions may do
something more sophisticated like informing distutils of the
dependencies directly.
Limitations
------------
Pyximport does not give you any control over how your Cython file is
compiled. Usually the defaults are fine. You might run into problems if
you wanted to write your program in half-C, half-Cython and build them
into a single library. Pyximport 1.2 will probably do this.
Pyximport does not hide the Distutils/GCC warnings and errors generated
by the import process. Arguably this will give you better feedback if
something went wrong and why. And if nothing went wrong it will give you
the warm fuzzy that pyximport really did rebuild your module as it was
supposed to.
For further thought and discussion
------------------------------------
I don't think that Python's :func:`reload` will do anything for changed
``.so``'s on some (all?) platforms. It would require some (easy)
experimentation that I haven't gotten around to. But reload is rarely used in
applications outside of the Python interactive interpreter and certainly not
used much for C extension modules. Info about Windows
`<http://mail.python.org/pipermail/python-list/2001-July/053798.html>`_
``setup.py install`` does not modify :file:`sitecustomize.py` for you. Should it?
Modifying Python's "standard interpreter" behaviour may be more than
most people expect of a package they install..
Pyximport puts your ``.c`` file beside your ``.pyx`` file (analogous to
``.pyc`` beside ``.py``). But it puts the platform-specific binary in a
build directory as per normal for Distutils. If I could wave a magic
wand and get Cython or distutils or whoever to put the build directory I
might do it but not necessarily: having it at the top level is *VERY*
*HELPFUL* for debugging Cython problems.
.. _special-methods:
Special Methods of Extension Types
===================================
This page describes the special methods currently supported by Cython extension
types. A complete list of all the special methods appears in the table at the
bottom. Some of these methods behave differently from their Python
counterparts or have no direct Python counterparts, and require special
mention.
.. Note: Everything said on this page applies only to extension types, defined
with the :keyword:`cdef class` statement. It doesn't apply to classes defined with the
Python :keyword:`class` statement, where the normal Python rules apply.
Declaration
------------
Special methods of extension types must be declared with :keyword:`def`, not
:keyword:`cdef`. This does not impact their performance--Python uses different
calling conventions to invoke these special methods.
Docstrings
-----------
Currently, docstrings are not fully supported in some special methods of extension
types. You can place a docstring in the source to serve as a comment, but it
won't show up in the corresponding :attr:`__doc__` attribute at run time. (This
seems to be is a Python limitation -- there's nowhere in the `PyTypeObject`
data structure to put such docstrings.)
Initialisation methods: :meth:`__cinit__` and :meth:`__init__`
---------------------------------------------------------------
There are two methods concerned with initialising the object.
The :meth:`__cinit__` method is where you should perform basic C-level
initialisation of the object, including allocation of any C data structures
that your object will own. You need to be careful what you do in the
:meth:`__cinit__` method, because the object may not yet be fully valid Python
object when it is called. Therefore, you should be careful invoking any Python
operations which might touch the object; in particular, its methods.
By the time your :meth:`__cinit__` method is called, memory has been allocated for the
object and any C attributes it has have been initialised to 0 or null. (Any
Python attributes have also been initialised to None, but you probably
shouldn't rely on that.) Your :meth:`__cinit__` method is guaranteed to be called
exactly once.
If your extension type has a base type, the :meth:`__cinit__` method of the base type
is automatically called before your :meth:`__cinit__` method is called; you cannot
explicitly call the inherited :meth:`__cinit__` method. If you need to pass a modified
argument list to the base type, you will have to do the relevant part of the
initialisation in the :meth:`__init__` method instead (where the normal rules for
calling inherited methods apply).
Any initialisation which cannot safely be done in the :meth:`__cinit__` method should
be done in the :meth:`__init__` method. By the time :meth:`__init__` is called, the object is
a fully valid Python object and all operations are safe. Under some
circumstances it is possible for :meth:`__init__` to be called more than once or not
to be called at all, so your other methods should be designed to be robust in
such situations.
Any arguments passed to the constructor will be passed to both the
:meth:`__cinit__` method and the :meth:`__init__` method. If you anticipate
subclassing your extension type in Python, you may find it useful to give the
:meth:`__cinit__` method `*` and `**` arguments so that it can accept and
ignore extra arguments. Otherwise, any Python subclass which has an
:meth:`__init__` with a different signature will have to override
:meth:`__new__`[#] as well as :meth:`__init__`, which the writer of a Python
class wouldn't expect to have to do. Alternatively, as a convenience, if you declare
your :meth:`__cinit__`` method to take no arguments (other than self) it
will simply ignore any extra arguments passed to the constructor without
complaining about the signature mismatch.
.. Note: Older Cython files may use :meth:`__new__` rather than :meth:`__cinit__`. The two are synonyms.
The name change from :meth:`__new__` to :meth:`__cinit__` was to avoid
confusion with Python :meth:`__new__` (which is an entirely different
concept) and eventually the use of :meth:`__new__` in Cython will be
disallowed to pave the way for supporting Python-style :meth:`__new__`
.. [#] http://docs.python.org/reference/datamodel.html#object.__new__
Finalization method: :meth:`__dealloc__`
----------------------------------------
The counterpart to the :meth:`__cinit__` method is the :meth:`__dealloc__`
method, which should perform the inverse of the :meth:`__cinit__` method. Any
C data that you explicitly allocated (e.g. via malloc) in your
:meth:`__cinit__` method should be freed in your :meth:`__dealloc__` method.
You need to be careful what you do in a :meth:`__dealloc__` method. By the time your
:meth:`__dealloc__` method is called, the object may already have been partially
destroyed and may not be in a valid state as far as Python is concerned, so
you should avoid invoking any Python operations which might touch the object.
In particular, don't call any other methods of the object or do anything which
might cause the object to be resurrected. It's best if you stick to just
deallocating C data.
You don't need to worry about deallocating Python attributes of your object,
because that will be done for you by Cython after your :meth:`__dealloc__` method
returns.
.. Note: There is no :meth:`__del__` method for extension types.
Arithmetic methods
-------------------
Arithmetic operator methods, such as :meth:`__add__`, behave differently from their
Python counterparts. There are no separate "reversed" versions of these
methods (:meth:`__radd__`, etc.) Instead, if the first operand cannot perform the
operation, the same method of the second operand is called, with the operands
in the same order.
This means that you can't rely on the first parameter of these methods being
"self" or being the right type, and you should test the types of both operands
before deciding what to do. If you can't handle the combination of types you've
been given, you should return `NotImplemented`.
This also applies to the in-place arithmetic method :meth:`__ipow__`. It doesn't apply
to any of the other in-place methods (:meth:`__iadd__`, etc.) which always
take `self` as the first argument.
Rich comparisons
-----------------
There are no separate methods for the individual rich comparison operations
(:meth:`__eq__`, :meth:`__le__`, etc.) Instead there is a single method
:meth:`__richcmp__` which takes an integer indicating which operation is to be
performed, as follows:
+-----+-----+
| < | 0 |
+-----+-----+
| == | 2 |
+-----+-----+
| > | 4 |
+-----+-----+
| <= | 1 |
+-----+-----+
| != | 3 |
+-----+-----+
| >= | 5 |
+-----+-----+
The :meth:`__next__` method
----------------------------
Extension types wishing to implement the iterator interface should define a
method called :meth:`__next__`, not next. The Python system will automatically
supply a next method which calls your :meth:`__next__`. Do *NOT* explicitly
give your type a :meth:`next` method, or bad things could happen.
Special Method Table
---------------------
This table lists all of the special methods together with their parameter and
return types. In the table below, a parameter name of self is used to indicate
that the parameter has the type that the method belongs to. Other parameters
with no type specified in the table are generic Python objects.
You don't have to declare your method as taking these parameter types. If you
declare different types, conversions will be performed as necessary.
General
^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __cinit__ |self, ... | | Basic initialisation (no direct Python equivalent) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __init__ |self, ... | | Further initialisation |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __dealloc__ |self | | Basic deallocation (no direct Python equivalent) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __cmp__ |x, y | int | 3-way comparison |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __richcmp__ |x, y, int op | object | Rich comparison (no direct Python equivalent) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __str__ |self | object | str(self) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __repr__ |self | object | repr(self) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __hash__ |self | int | Hash function |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __call__ |self, ... | object | self(...) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __iter__ |self | object | Return iterator for sequence |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __getattr__ |self, name | object | Get attribute |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __setattr__ |self, name, val | | Set attribute |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __delattr__ |self, name | | Delete attribute |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
Arithmetic operators
^^^^^^^^^^^^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __add__ | x, y | object | binary `+` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __sub__ | x, y | object | binary `-` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __mul__ | x, y | object | `*` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __div__ | x, y | object | `/` operator for old-style division |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __floordiv__ | x, y | object | `//` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __truediv__ | x, y | object | `/` operator for new-style division |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __mod__ | x, y | object | `%` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __divmod__ | x, y | object | combined div and mod |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __pow__ | x, y, z | object | `**` operator or pow(x, y, z) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __neg__ | self | object | unary `-` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __pos__ | self | object | unary `+` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __abs__ | self | object | absolute value |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __nonzero__ | self | int | convert to boolean |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __invert__ | self | object | `~` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __lshift__ | x, y | object | `<<` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __rshift__ | x, y | object | `>>` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __and__ | x, y | object | `&` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __or__ | x, y | object | `|` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __xor__ | x, y | object | `^` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
Numeric conversions
^^^^^^^^^^^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __int__ | self | object | Convert to integer |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __long__ | self | object | Convert to long integer |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __float__ | self | object | Convert to float |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __oct__ | self | object | Convert to octal |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __hex__ | self | object | Convert to hexadecimal |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __index__ (2.5+ only) | self | object | Convert to sequence index |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
In-place arithmetic operators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __iadd__ | self, x | object | `+=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __isub__ | self, x | object | `-=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __imul__ | self, x | object | `*=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __idiv__ | self, x | object | `/=` operator for old-style division |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __ifloordiv__ | self, x | object | `//=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __itruediv__ | self, x | object | `/=` operator for new-style division |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __imod__ | self, x | object | `%=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __ipow__ | x, y, z | object | `**=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __ilshift__ | self, x | object | `<<=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __irshift__ | self, x | object | `>>=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __iand__ | self, x | object | `&=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __ior__ | self, x | object | `|=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __ixor__ | self, x | object | `^=` operator |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
Sequences and mappings
^^^^^^^^^^^^^^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __len__ | self int | | len(self) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __getitem__ | self, x | object | self[x] |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __setitem__ | self, x, y | | self[x] = y |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __delitem__ | self, x | | del self[x] |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __getslice__ | self, Py_ssize_t i, Py_ssize_t j | object | self[i:j] |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __setslice__ | self, Py_ssize_t i, Py_ssize_t j, x | | self[i:j] = x |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __delslice__ | self, Py_ssize_t i, Py_ssize_t j | | del self[i:j] |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __contains__ | self, x | int | x in self |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
Iterators
^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __next__ | self | object | Get next item (called next in Python) |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
Buffer interface (no Python equivalents - see note 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __getreadbuffer__ | self, int i, void `**p` | | |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __getwritebuffer__ | self, int i, void `**p` | | |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __getsegcount__ | self, int `*p` | | |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __getcharbuffer__ | self, int i, char `**p` | | |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
Descriptor objects (see note 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| Name | Parameters | Return type | Description |
+=======================+=======================================+=============+=====================================================+
| __get__ | self, instance, class | object | Get value of attribute |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __set__ | self, instance, value | | Set value of attribute |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
| __delete__ | self, instance | | Delete attribute |
+-----------------------+---------------------------------------+-------------+-----------------------------------------------------+
.. note:: (1) The buffer interface is intended for use by C code and is not directly
accessible from Python. It is described in the Python/C API Reference Manual
under sections 6.6 and 10.6.
.. note:: (2) Descriptor objects are part of the support mechanism for new-style
Python classes. See the discussion of descriptors in the Python documentation.
See also PEP 252, "Making Types Look More Like Classes", and PEP 253,
"Subtyping Built-In Types".
.. highlight:: cython
.. _tutorial:
*********
Tutorial
*********
The Basics of Cython
====================
The fundamental nature of Cython can be summed up as follows: Cython is Python
with C data types.
Cython is Python: Almost any piece of Python code is also valid Cython code.
(There are a few :ref:`cython-limitations`, but this approximation will
serve for now.) The Cython compiler will convert it into C code which makes
equivalent calls to the Python/C API.
But Cython is much more than that, because parameters and variables can be
declared to have C data types. Code which manipulates Python values and C
values can be freely intermixed, with conversions occurring automatically
wherever possible. Reference count maintenance and error checking of Python
operations is also automatic, and the full power of Python's exception
handling facilities, including the try-except and try-finally statements, is
available to you -- even in the midst of manipulating C data.
Cython Hello World
===================
As Cython can accept almost any valid python source file, one of the hardest
things in getting started is just figuring out how to compile your extension.
So lets start with the canonical python hello world::
print "Hello World"
So the first thing to do is rename the file to :file:`helloworld.pyx`. Now we
need to make the :file:`setup.py`, which is like a python Makefile (for more
information see :ref:`compilation`). Your :file:`setup.py` should look like::
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
cmdclass = {'build_ext': build_ext},
ext_modules = [Extension("helloworld", ["helloworld.pyx"])]
)
To use this to build your Cython file use the commandline options:
.. sourcecode:: text
$ python setup.py build_ext --inplace
Which will leave a file in your local directory called :file:`helloworld.so` in unix
or :file:`helloworld.dll` in Windows. Now to use this file: start the python
interpreter and simply import it as if it was a regular python module::
>>> import helloworld
Hello World
Congratulations! You now know how to build a Cython extension. But So Far
this example doesn't really give a feeling why one would ever want to use Cython, so
lets create a more realistic example.
:mod:`pyximport`: Cython Compilation the Easy Way
==================================================
If your module doesn't require any extra C libraries or a special
build setup, then you can use the pyximport module by Paul Prescod and
Stefan Behnel to load .pyx files directly on import, without having to
write a :file:`setup.py` file. It is shipped and installed with
Cython and can be used like this::
>>> import pyximport; pyximport.install()
>>> import helloworld
Hello World
Since Cython 0.11, the :mod:`pyximport` module also has experimental
compilation support for normal Python modules. This allows you to
automatically run Cython on every .pyx and .py module that Python
imports, including the standard library and installed packages.
Cython will still fail to compile a lot of Python modules, in which
case the import mechanism will fall back to loading the Python source
modules instead. The .py import mechanism is installed like this::
>>> pyximport.install(pyimport = True)
Fibonacci Fun
==============
From the official Python tutorial a simple fibonacci function is defined as:
.. literalinclude:: ../examples/tutorial/fib1/fib.pyx
Now following the steps for the Hello World example we first rename the file
to have a `.pyx` extension, lets say :file:`fib.pyx`, then we create the
:file:`setup.py` file. Using the file created for the Hello World example, all
that you need to change is the name of the Cython filename, and the resulting
module name, doing this we have:
.. literalinclude:: ../examples/tutorial/fib1/setup.py
Build the extension with the same command used for the helloworld.pyx:
.. sourcecode:: text
$ python setup.py build_ext --inplace
And use the new extension with::
>>> import fib
>>> fib.fib(2000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
Primes
=======
Here's a small example showing some of what can be done. It's a routine for
finding prime numbers. You tell it how many primes you want, and it returns
them as a Python list.
:file:`primes.pyx`:
.. literalinclude:: ../examples/tutorial/primes/primes.pyx
:linenos:
You'll see that it starts out just like a normal Python function definition,
except that the parameter ``kmax`` is declared to be of type ``int`` . This
means that the object passed will be converted to a C integer (or a
``TypeError.`` will be raised if it can't be).
Lines 2 and 3 use the ``cdef`` statement to define some local C variables.
Line 4 creates a Python list which will be used to return the result. You'll
notice that this is done exactly the same way it would be in Python. Because
the variable result hasn't been given a type, it is assumed to hold a Python
object.
Lines 7-9 set up for a loop which will test candidate numbers for primeness
until the required number of primes has been found. Lines 11-12, which try
dividing a candidate by all the primes found so far, are of particular
interest. Because no Python objects are referred to, the loop is translated
entirely into C code, and thus runs very fast.
When a prime is found, lines 14-15 add it to the p array for fast access by
the testing loop, and line 16 adds it to the result list. Again, you'll notice
that line 16 looks very much like a Python statement, and in fact it is, with
the twist that the C parameter ``n`` is automatically converted to a Python
object before being passed to the append method. Finally, at line 18, a normal
Python return statement returns the result list.
Compiling primes.pyx with the Cython compiler produces an extension module
which we can try out in the interactive interpreter as follows::
>>> import primes
>>> primes.primes(10)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
See, it works! And if you're curious about how much work Cython has saved you,
take a look at the C code generated for this module.
Language Details
================
For more about the Cython language, see :ref:`language-basics`.
To dive right in to using Cython in a numerical computation context, see :ref:`numpy_tutorial`.
.. highlight:: cython
.. _wrapping-cplusplus:
********************************
Wrapping C++ Classes in Cython
********************************
Overview
=========
This page aims to get you quickly up to speed so you can wrap C++ interfaces
with a minimum of pain and 'surprises'.
In the past, Pyrex only supported wrapping of C APIs, and not C++. To wrap
C++, one had to write a pure-C shim, containing functions for
constructors/destructors and method invocations. Object pointers were passed
around as opaque void pointers, and cast to/from object pointers as needed.
This approach did work, but it got awfully messy and error-prone when trying
to wrap APIs with large class hierarchies and lots of inheritance.
These days, though, Pyrex offers an adequate bare minimum of C++ support,
which Cython has inherited. The approach described in this document will help
you wrap a lot of C++ code with only moderate effort. There are some
limitations, which we will discuss at the end of the document.
Procedure Overview
====================
* Specify C++ language in :file:`setup.py` script
* Create ``cdef extern from`` blocks and declare classes as
``ctypedef struct`` blocks
* Create constructors and destructors
* Add class methods as function pointers
* Create Cython wrapper class
An example C++ API
===================
Here is a tiny C++ API which we will use as an example throughout this
document. Let's assume it will be in a header file called
:file:`Rectangle.h`:
.. sourcecode:: c++
class Rectangle {
public:
int x0, y0, x1, y1;
Rectangle(int x0, int y0, int x1, int y1);
~Rectangle();
int getLength();
int getHeight();
int getArea();
void move(int dx, int dy);
};
This is pretty dumb, but should suffice to demonstrate the steps involved.
Specify C++ language in setup.py
=================================
In Cython :file:`setup.py` scripts, one normally instantiates an Extension
object. To make Cython generate and compile a C++ source, you just need
to add a keyword to your Extension construction statement, as in::
ext = Extension(
"rectangle", # name of extension
["rectangle.pyx"], # filename of our Cython source
language="c++", # this causes Cython to create C++ source
include_dirs=[...], # usual stuff
libraries=[...], # ditto
extra_link_args=[...], # if needed
cmdclass = {'build_ext': build_ext}
)
With the language="c++" keyword, Cython distutils will generate a C++ file.
Create cdef extern from block
==============================
The procedure for wrapping a C++ class is quite similar to that for wrapping
normal C structs, with a couple of additions. Let's start here by creating the
basic ``cdef extern from`` block::
cdef extern from "Rectangle.h":
This will make the C++ class def for Rectangle available.
Declare class as a ctypedef struct
-----------------------------------
Now, let's add the Rectangle class to this extern from block -- just copy the
class def from :file:`Rectangle.h` and adjust for Cython syntax, so now it
becomes::
cdef extern from "Rectangle.h":
# known in Cython namespace as 'c_Rectangle' but in C++ as 'Rectangle'
ctypedef struct c_Rectangle "Rectangle":
int x0, y0, x1, y1
We don't have any way of accessing the constructor/destructor or methods, but
we'll cover this now.
Add constructors and destructors
----------------------------------
We now need to expose a constructor and destructor into the Cython
namespace. Again, we'll be using C name specifications::
cdef extern from "Rectangle.h":
ctypedef struct c_Rectangle "Rectangle":
int x0, y0, x1, y1
c_Rectangle *new_Rectangle "new Rectangle" (int x0, int y0, int x1, int y1)
void del_Rectangle "delete" (c_Rectangle *rect)
Add class methods
-------------------
Now, let's add the class methods. You can circumvent Cython syntax
limitations by declaring these as function pointers. Recall that in the C++
class we have:
.. sourcecode:: c++
int getLength();
int getHeight();
int getArea();
void move(int dx, int dy);
So if we convert each of these to function pointers and stick them in our
extern block, we now get::
cdef extern from "Rectangle.h":
ctypedef struct c_Rectangle "Rectangle":
int x0, y0, x1, y1
int getLength()
int getHeight()
int getArea()
void move(int dx, int dy)
c_Rectangle *new_Rectangle "new Rectangle" (int x0, int y0, int x1, int y1)
void del_Rectangle "delete" (c_Rectangle *rect)
This will fool Cython into generating C++ method calls even though
Cython is mostly oblivious to C++.
In Pyrex you must explicitly declare these as function pointers, i.e.
``(int *getArea)()``.
Create Cython wrapper class
=============================
At this point, we have exposed into our pyx file's namespace a struct which
gives us access to the interface of a C++ Rectangle type. Now, we need to make
this accessible from external Python code (which is our whole point).
Common programming practice is to create a Cython extension type which
holds a C++ instance pointer as an attribute ``thisptr``, and create a bunch of
forwarding methods. So we can implement the Python extension type as::
cdef class Rectangle:
cdef c_Rectangle *thisptr # hold a C++ instance which we're wrapping
def __cinit__(self, int x0, int y0, int x1, int y1):
self.thisptr = new_Rectangle(x0, y0, x1, y1)
def __dealloc__(self):
del_Rectangle(self.thisptr)
def getLength(self):
return self.thisptr.getLength()
def getHeight(self):
return self.thisptr.getHeight()
def getArea(self):
return self.thisptr.getArea()
def move(self, dx, dy):
self.thisptr.move(dx, dy)
And there we have it. From a Python perspective, this extension type will look
and feel just like a natively defined Rectangle class. If you want to give
attribute access, you could just implement some properties::
property x0:
def __get__(self): return self.thisptr.x0
def __set__(self, x0): self.thisptr.x0 = x0
...
Caveats and Limitations
========================
In this document, we have discussed a relatively straightforward way of
wrapping C++ classes with Cython. However, there are some limitations in
this approach, some of which could be overcome with clever workarounds (anyone
here want to share some?), but some of which will require new features in
Cython.
The major limitations I'm most immediately aware of (and there will be many
more) include:
Overloading
------------
Presently, it's not easy to overload methods or constructors, but there may be
a workaround if you try some creative C name specifications
Access to C-only functions
---------------------------
Whenever generating C++ code, Cython generates declarations of and calls
to functions assuming these functions are C++ (ie, not declared as extern "C"
{...} . This is ok if the C functions have C++ entry points, but if they're C
only, you will hit a roadblock. If you have a C++ Cython module needing
to make calls to pure-C functions, you will need to write a small C++ shim
module which:
* includes the needed C headers in an extern "C" block
* contains minimal forwarding functions in C++, each of which calls the
respective pure-C function
Inherited C++ methods
----------------------
If you have a class ``Foo`` with a child class ``Bar``, and ``Foo`` has a
method :meth:`fred`, then you'll have to cast to access this method from
``Bar`` objects.
For example::
class MyClass:
Bar *b
...
def myfunc(self):
...
b.fred() # wrong, won't work
(<Foo *>(self.b)).fred() # should work, Cython now thinks it's a 'Foo'
It might take some experimenting by others (you?) to find the most elegant
ways of handling this issue.
Advanced C++ features
----------------------
Exceptions
^^^^^^^^^^^
Cython cannot throw C++ exceptions, or catch them with a try-except statement,
but it is possible to declare a function as potentially raising an C++
exception and converting it into a Python exception. For example, ::
cdef extern from "some_file.h":
cdef int foo() except +
This will translate try and the C++ error into an appropriate Python exception
(currently an IndexError on std::out_of_range and a RuntimeError otherwise
(preserving the what() message). ::
cdef int bar() except +MemoryError
This will catch any C++ error and raise a Python MemoryError in its place.
(Any Python exception is valid here.) ::
cdef int raise_py_error()
cdef int something_dangerous() except +raise_py_error
If something_dangerous raises a C++ exception then raise_py_error will be
called, which allows one to do custom C++ to Python error "translations." If
raise_py_error does not actually raise an exception a RuntimeError will be
raised.
Templates
^^^^^^^^^^
Cython does not natively understand C++ templates but we can put them to use
in some way. As an example consider an STL vector of C ints::
cdef extern from "some .h file which includes <vector>":
ctypedef struct intvec "std::vector<unsigned int>":
void (* push_back)(int elem)
intvec intvec_factory "std::vector<unsigned int>"(int len)
now we can use the vector like this::
cdef intvec v = intvec_factory(2)
v.push_back(2)
Overloading
^^^^^^^^^^^^
To support function overloading simply add a different alias to each
signature, so if you have e.g.
.. sourcecode:: c++
int foo(int a);
int foo(int a, int b);
in your C++ header then interface it like this in your ::
int fooi "foo"(int)
int fooii "foo"(int, int)
Operators
^^^^^^^^^^
Some operators (e.g. +,-,...) can be accessed from Cython like this::
ctypedef struct c_Rectangle "Rectangle":
c_Rectangle add "operator+"(c_Rectangle right)
Declaring/Using References
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Question: How do you declare and call a function that takes a reference as an argument?
Conclusion
============
A great many existing C++ classes can be wrapped using these techniques, in a
way much easier than writing a large messy C shim module. There's a bit of
manual work involved, and an annoying maintenance burden if the C++ library
you're wrapping is frequently changing, but this recipe should hopefully keep
the discomfort to a minimum.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment