Commit de2a01e2 authored by scoder's avatar scoder Committed by GitHub

Merge pull request #2151 from gabrieldemarmiesse/example_cython_tutorial

An improved version of the tutorial.
parents ff591c46 8f9485cc
def primes(int kmax):
cdef int n, k, i
def primes(int nb_primes):
cdef int n, i, len_p
cdef int p[1000]
result = []
if kmax > 1000:
kmax = 1000
k = 0
if nb_primes > 1000:
nb_primes = 1000
len_p = 0 # The number of elements in p
n = 2
while k < kmax:
i = 0
while i < k and n % p[i] != 0:
i = i + 1
if i == k:
p[k] = n
k = k + 1
result.append(n)
n = n + 1
return result
while len_p < nb_primes:
# Is n prime?
for i in p[:len_p]:
if n % i == 0:
break
# If no break occurred in the loop
else:
p[len_p] = n
len_p += 1
n += 1
# Let's put the result in a python list:
result_as_list = [prime for prime in p[:len_p]]
return result_as_list
......@@ -132,27 +132,90 @@ them as a Python list.
:linenos:
You'll see that it starts out just like a normal Python function definition,
except that the parameter ``kmax`` is declared to be of type ``int`` . This
except that the parameter ``nb_primes`` is declared to be of type ``int`` . This
means that the object passed will be converted to a C integer (or a
``TypeError.`` will be raised if it can't be).
Now, let's dig into the core of the function::
cdef int n, i, len_p
cdef int p[1000]
Lines 2 and 3 use the ``cdef`` statement to define some local C variables.
Line 4 creates a Python list which will be used to return the result. You'll
notice that this is done exactly the same way it would be in Python. Because
the variable result hasn't been given a type, it is assumed to hold a Python
object.
The result is stored in the C array ``p`` during processing,
and will be copied into a Python list at the end (line 22).
.. NOTE:: You cannot create very large arrays in this manner, because
they are allocated on something called the stack.
To request larger arrays,
or even arrays with a length only known at runtime
you can learn how to use :ref:`Python arrays<array-array>`
or :ref:`NumPy arrays<memoryviews>` with Cython.
::
if nb_primes > 1000:
nb_primes = 1000
As in C, declaring a static array requires knowing the size at compile time.
We make sure the user doesn't set a value above 1000 (or we would have a
segmentation fault, just like in C). ::
len_p = 0 # The number of elements in p
n = 2
while len_p < nb_primes:
Lines 7-9 set up for a loop which will test candidate numbers for primeness
until the required number of primes has been found. Lines 11-12, which try
dividing a candidate by all the primes found so far, are of particular
interest. Because no Python objects are referred to, the loop is translated
entirely into C code, and thus runs very fast.
When a prime is found, lines 14-15 add it to the p array for fast access by
the testing loop, and line 16 adds it to the result list. Again, you'll notice
that line 16 looks very much like a Python statement, and in fact it is, with
the twist that the C parameter ``n`` is automatically converted to a Python
object before being passed to the append method. Finally, at line 18, a normal
until the required number of primes has been found. ::
# Is n prime?
for i in p[:len_p]:
if n % i == 0:
break
Lines 11-12, which try dividing a candidate by all the primes found so far,
are of particular interest. Because no Python objects are referred to,
the loop is translated entirely into C code, and thus runs very fast.
You will notice the way we iterate over the ``p`` C array. ::
for i in p[:len_p]:
The loop gets translated into C code transparently. As if it was a Python list
or a NumPy array. If you don't use ``[:len_p]`` then Cython will loop
over the 1000 elements of the array. ::
# If no break occurred in the loop
else:
p[len_p] = n
len_p += 1
n += 1
If no breaks occurred, it means that we found a prime, and the block of code
after the ``else`` line 16 will be executed. We add the prime found to ``p``.
If you find having an ``else`` after a ``for-loop`` strange, just know that it's a
lesser known features of the Python language of the python syntax, and
actually doesn't exist in C! But since Cython is made to be written with the
Python syntax, it'll work out, but at C speed in this case.
If the ``for-else`` syntax still confuses you, see this excellent
`blog post <https://shahriar.svbtle.com/pythons-else-clause-in-loops>`_. ::
# Let's put the result in a python list:
result_as_list = [prime for prime in p[:len_p]]
return result_as_list
Line 22, before returning the result, we need to copy our C array into a
Python list, because Python can't read C arrays. Cython can automatically
convert many C types from and to Python types, as described in the
documentation on :ref:`type conversion <type-conversion>`. But not C arrays. We can trick
Cython into doing it because Cython knows how to convert a C int to a Python int.
By doing a list comprehension, we "cast" each C int prime from ``p`` into a Python int.
You could have also iterated manually over the C array and used
``result_as_list.append(prime)``, the result would have been the same.
You'll notice we declare a Python list exactly the same way it would be in Python.
Because the variable ``result_as_list`` hasn't been explicitly declared with a type,
it is assumed to hold a Python object.
Finally, at line 18, a normal
Python return statement returns the result list.
Compiling primes.pyx with the Cython compiler produces an extension module
......@@ -165,6 +228,99 @@ which we can try out in the interactive interpreter as follows::
See, it works! And if you're curious about how much work Cython has saved you,
take a look at the C code generated for this module.
Cython has a way to visualise where interaction with Python objects and
Python's C-API is taking place. For this, pass the
``annotate=True`` parameter to ``cythonize()``. It produces a HTML file. Let's see:
.. figure:: htmlreport.png
If a line is white, it means that the code generated doesn't interact
with Python, so will run fast. The darker the yellow, the more Python
interaction there is. Those yellow lines will run slower.
The function declaration and return use the Python interpreter so it makes
sense for those lines to be yellow. Same for the list comprehension because
it involves the creation of a Python object. But the line ``if n % i == 0:``, why?
We can examine the generated C code to understand:
.. figure:: python_division.png
We can see that some checks happen. Because Cython defaults to the
Python behavior, the language will perform division checks at runtime,
just like Python does. You can deactivate those checks by using the
:ref:`compiler directives<compiler-directives>`.
Now let's see if, even if we have division checks, we obtained a boost in speed.
Let's write the same program, but Python-style::
def primes_python(nb_primes):
p = []
n = 2
while len(p) < nb_primes:
# Is n prime?
for i in p:
if n % i == 0:
break
# If no break occurred in the loop
else:
p.append(n)
n += 1
return p
It is also possible to take a plain ``.py`` file and to compile it with Cython.
Let's take ``primes_python``, change the function name to ``primes_python_compiled`` and
compile it with Cython (without changing the code). We will also change the name of the
file to ``example_py_cy.py`` to differentiate it from the others.
Now the ``setup.py`` looks like this::
from distutils.core import setup
from Cython.Build import cythonize
setup(
ext_modules=cythonize(['example.pyx', # has the primes() function
'example_py_cy.py'], # has the primes_python_compiled() function
annotate=True), # produces the html annotation file
)
Now we can ensure that those two programs output the same values::
>>> primes_python(1000) == primes(1000)
True
>>> primes_python_compiled(1000) == primes(1000)
True
It's possible to compare the speed now::
python -m timeit -s 'from example_py import primes_python' 'primes_python(1000)'
10 loops, best of 3: 23 msec per loop
python -m timeit -s 'from example_py_cy import primes_python_compiled' 'primes_python_compiled(1000)'
100 loops, best of 3: 11.9 msec per loop
python -m timeit -s 'from example import primes' 'primes(1000)'
1000 loops, best of 3: 1.65 msec per loop
The cythonize version of ``primes_python`` is 2 times faster than the Python one,
without changing a single line of code.
The Cython version is 13 times faster than the Python version! What could explain this?
Multiple things:
* In this program, very little computation happen at each line.
So the overhead of the python interpreter is very important. It would be
very different if you were to do a lot computation at each line. Using NumPy for
example.
* Data locality. It's likely that a lot more can fit in CPU cache when using C than
when using Python. Because everything in python is an object, and every object is
implemented as a dictionary, this is not very cache friendly.
Usually the speedups are between 2x to 1000x. It depends on how much you call
the Python interpreter. As always, remember to profile before adding types
everywhere. Adding types makes your code less readable, so use them with
moderation.
Language Details
================
......
......@@ -301,6 +301,10 @@ return value and raise it yourself, for example,::
raise SpamError("Couldn't open the spam file")
.. _type-conversion:
Automatic type conversions
==========================
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment