early_binding_for_speed.rst 4.79 KB
Newer Older
Robert Bradshaw's avatar
Robert Bradshaw committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
.. highlight:: cython

.. _early-binding-for-speed:

**************************
Early Binding for Speed
**************************

As a dynamic language, Python encourages a programming style of considering
classes and objects in terms of their methods and attributes, more than where
they fit into the class hierarchy.

This can make Python a very relaxed and comfortable language for rapid
development, but with a price - the 'red tape' of managing data types is
dumped onto the interpreter. At run time, the interpreter does a lot of work
searching namespaces, fetching attributes and parsing argument and keyword
tuples. This run-time 'late binding' is a major cause of Python's relative
slowness compared to 'early binding' languages such as C++.

However with Cython it is possible to gain significant speed-ups through the
use of 'early binding' programming techniques.

For example, consider the following (silly) code example:

.. sourcecode:: cython

    cdef class Rectangle:
        cdef int x0, y0
        cdef int x1, y1
        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
        def area(self):
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
            return area

    def rectArea(x0, y0, x1, y1):
        rect = Rectangle(x0, y0, x1, y1)
        return rect.area()

In the :func:`rectArea` method, the call to :meth:`rect.area` and the
:meth:`.area` method contain a lot of Python overhead.

However, in Cython, it is possible to eliminate a lot of this overhead in cases
where calls occur within Cython code. For example:

.. sourcecode:: cython

    cdef class Rectangle:
        cdef int x0, y0
        cdef int x1, y1
        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
        cdef int _area(self):
56
            cdef int area
Robert Bradshaw's avatar
Robert Bradshaw committed
57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
            return area
        def area(self):
            return self._area()

    def rectArea(x0, y0, x1, y1):
        cdef Rectangle rect
        rect = Rectangle(x0, y0, x1, y1)
        return rect._area()

Here, in the Rectangle extension class, we have defined two different area
calculation methods, the efficient :meth:`_area` C method, and the
Python-callable :meth:`area` method which serves as a thin wrapper around
:meth:`_area`. Note also in the function :func:`rectArea` how we 'early bind'
by declaring the local variable ``rect`` which is explicitly given the type
Rectangle. By using this declaration, instead of just dynamically assigning to
``rect``, we gain the ability to access the much more efficient C-callable
76
:meth:`_area` method.
Robert Bradshaw's avatar
Robert Bradshaw committed
77 78 79 80 81 82 83 84 85 86 87 88 89 90

But Cython offers us more simplicity again, by allowing us to declare
dual-access methods - methods that can be efficiently called at C level, but
can also be accessed from pure Python code at the cost of the Python access
overheads. Consider this code:

.. sourcecode:: cython

    cdef class Rectangle:
        cdef int x0, y0
        cdef int x1, y1
        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
        cpdef int area(self):
91
            cdef int area
Robert Bradshaw's avatar
Robert Bradshaw committed
92 93 94 95 96 97 98 99 100 101
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
            return area

    def rectArea(x0, y0, x1, y1):
        cdef Rectangle rect
        rect = Rectangle(x0, y0, x1, y1)
        return rect.area()

mathbunnyru's avatar
mathbunnyru committed
102
.. note::
Robert Bradshaw's avatar
Robert Bradshaw committed
103 104

    in earlier versions of Cython, the :keyword:`cpdef` keyword is
105
    ``rdef`` - but has the same effect).
Robert Bradshaw's avatar
Robert Bradshaw committed
106 107 108 109 110 111 112 113 114 115 116 117 118

Here, we just have a single area method, declared as :keyword:`cpdef` to make it
efficiently callable as a C function, but still accessible from pure Python
(or late-binding Cython) code.

If within Cython code, we have a variable already 'early-bound' (ie, declared
explicitly as type Rectangle, (or cast to type Rectangle), then invoking its
area method will use the efficient C code path and skip the Python overhead.
But if in Pyrex or regular Python code we have a regular object variable
storing a Rectangle object, then invoking the area method will require:

* an attribute lookup for the area method
* packing a tuple for arguments and a dict for keywords (both empty in this case)
mathbunnyru's avatar
mathbunnyru committed
119
* using the Python API to call the method
Robert Bradshaw's avatar
Robert Bradshaw committed
120 121 122 123 124

and within the area method itself:

* parsing the tuple and keywords
* executing the calculation code
mathbunnyru's avatar
mathbunnyru committed
125
* converting the result to a python object and returning it
Robert Bradshaw's avatar
Robert Bradshaw committed
126 127 128 129 130 131

So within Cython, it is possible to achieve massive optimisations by
using strong typing in declaration and casting of variables. For tight loops
which use method calls, and where these methods are pure C, the difference can
be huge.