Merge pull request #2151 from gabrieldemarmiesse/example_cython_tutorial

An improved version of the tutorial.

de2a01e2 · scoder · GitHub · ff591c46 · 8f9485cc · de2a01e2
Commit de2a01e2 authored Mar 17, 2018 by scoder Committed by GitHub Mar 17, 2018
5 changed files
--- a/docs/examples/tutorial/primes/primes.pyx
+++ b/docs/examples/tutorial/primes/primes.pyx
-def primes(int kmax):
-    cdef int n, k, i
+def primes(int nb_primes):
+    cdef int n, i, len_p
    cdef int p[1000]
-    result = []
-    if kmax > 1000:
-        kmax = 1000
-    k = 0
+    if nb_primes > 1000:
+        nb_primes = 1000
+
+    len_p = 0  # The number of elements in p
    n = 2
-    while k < kmax:
-        i = 0
-        while i < k and n % p[i] != 0:
-            i = i + 1
-        if i == k:
-            p[k] = n
-            k = k + 1
-            result.append(n)
-        n = n + 1
-    return result
+    while len_p < nb_primes:
+        # Is n prime?
+        for i in p[:len_p]:
+            if n % i == 0:
+                break
+
+        # If no break occurred in the loop
+        else:
+            p[len_p] = n
+            len_p += 1
+        n += 1

+    # Let's put the result in a python list:
+    result_as_list = [prime for prime in p[:len_p]]
+    return result_as_list
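
Note on the control flow in the new `primes.pyx`: the rewritten inner loop relies on Python's `for ... else`, where the `else` block runs only when the loop finishes without hitting `break`. A minimal pure-Python sketch of that behaviour follows; the helper name `has_known_divisor` is hypothetical and is not part of this change.

```python
def has_known_divisor(n, known_primes):
    """Hypothetical helper mirroring the inner loop of primes()."""
    for p in known_primes:
        if n % p == 0:
            break  # a divisor was found, so the else block is skipped
    else:
        # Reached only when the loop ran to completion without a break
        # (this includes the case where known_primes is empty).
        return False
    return True


print(has_known_divisor(9, [2, 3, 5, 7]))
print(has_known_divisor(11, [2, 3, 5, 7]))
```

The first call breaks out of the loop at 3, and the second one falls through to the `else` block, which is the case where `primes()` stores `n` in `p`.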
--- a/docs/src/tutorial/cython_tutorial.rst
+++ b/docs/src/tutorial/cython_tutorial.rst
@@ -132,27 +132,90 @@ them as a Python list.
    :linenos:

 You'll see that it starts out just like a normal Python function definition,
-except that the parameter ``kmax`` is declared to be of type ``int`` . This
+except that the parameter ``nb_primes`` is declared to be of type ``int``. This
 means that the object passed will be converted to a C integer (or a
 ``TypeError`` will be raised if it can't be).

+Now, let's dig into the core of the function::
+
+    cdef int n, i, len_p
+    cdef int p[1000]
+
 Lines 2 and 3 use the ``cdef`` statement to define some local C variables.
-Line 4 creates a Python list which will be used to return the result. You'll
-notice that this is done exactly the same way it would be in Python. Because
-the variable result hasn't been given a type, it is assumed to hold a Python
-object.
+The result is stored in the C array ``p`` during processing,
+and will be copied into a Python list at the end (line 22).
+
+.. NOTE:: You cannot create very large arrays in this manner, because
+          they are allocated on the C function call stack, which has a limited size.
+          To request larger arrays,
+          or even arrays with a length only known at runtime,
+          you can learn how to use :ref:`Python arrays<array-array>`
+          or :ref:`NumPy arrays<memoryviews>` with Cython.
+::
+
+    if nb_primes > 1000:
+        nb_primes = 1000
+
+As in C, declaring a static array requires knowing the size at compile time.
+We make sure the user doesn't request more than 1000 primes (otherwise we would
+write past the end of ``p``, which could cause a segmentation fault, just like in C).  ::
+
+    len_p = 0  # The number of elements in p
+    n = 2
+    while len_p < nb_primes:

 Lines 7-9 set up for a loop which will test candidate numbers for primeness
-until the required number of primes has been found. Lines 11-12, which try
-dividing a candidate by all the primes found so far, are of particular
-interest. Because no Python objects are referred to, the loop is translated
-entirely into C code, and thus runs very fast.
-
-When a prime is found, lines 14-15 add it to the p array for fast access by
-the testing loop, and line 16 adds it to the result list. Again, you'll notice
-that line 16 looks very much like a Python statement, and in fact it is, with
-the twist that the C parameter ``n`` is automatically converted to a Python
-object before being passed to the append method. Finally, at line 18, a normal
+until the required number of primes has been found. ::
+
+    # Is n prime?
+    for i in p[:len_p]:
+        if n % i == 0:
+            break
+
+Lines 11-12, which try dividing a candidate by all the primes found so far,
+are of particular interest. Because no Python objects are referred to,
+the loop is translated entirely into C code, and thus runs very fast.
+You will notice the way we iterate over the ``p`` C array.  ::
+
+    for i in p[:len_p]:
+
+The loop gets translated into C code transparently, even though it is written
+as if ``p`` were a Python list or a NumPy array. If you don't use ``[:len_p]``,
+then Cython will loop over all 1000 elements of the array. ::
+
+    # If no break occurred in the loop
+    else:
+        p[len_p] = n
+        len_p += 1
+    n += 1
+
+If no break occurred, it means that we found a prime, and the block of code
+after the ``else`` on line 16 will be executed. We add the prime found to ``p``.
+If you find having an ``else`` after a ``for`` loop strange, just know that it's a
+lesser-known feature of the Python language, and that it
+doesn't exist in C! But since Cython is made to be written with the
+Python syntax, it'll work out, and at C speed in this case.
+If the ``for-else`` syntax still confuses you, see this excellent
+`blog post <https://shahriar.svbtle.com/pythons-else-clause-in-loops>`_. ::
+
+    # Let's put the result in a python list:
+    result_as_list = [prime for prime in p[:len_p]]
+    return result_as_list
+
+On line 22, before returning the result, we need to copy our C array into a
+Python list, because Python can't read C arrays. Cython can automatically
+convert many C types from and to Python types, as described in the
+documentation on :ref:`type conversion <type-conversion>`, but not C arrays. We can trick
+Cython into doing it because Cython knows how to convert a C int to a Python int.
+By doing a list comprehension, we "cast" each C int prime from ``p`` into a Python int.
+You could also have iterated manually over the C array and used
+``result_as_list.append(prime)``; the result would have been the same.
+
+You'll notice we create a Python list exactly the same way we would in Python.
+Because the variable ``result_as_list`` hasn't been explicitly declared with a type,
+it is assumed to hold a Python object.
+
+Finally, at line 23, a normal
 Python return statement returns the result list.

 Compiling primes.pyx with the Cython compiler produces an extension module
@@ -165,6 +228,99 @@ which we can try out in the interactive interpreter as follows::
 See, it works! And if you're curious about how much work Cython has saved you,
 take a look at the C code generated for this module.

+
+Cython has a way to visualise where interaction with Python objects and
+Python's C-API is taking place. For this, pass the
+``annotate=True`` parameter to ``cythonize()``. It produces an HTML file. Let's see:
+
+.. figure:: htmlreport.png
+
+If a line is white, it means that the code generated doesn't interact
+with Python, so it will run fast. The darker the yellow, the more Python
+interaction there is, and the slower the line will run.
+The function declaration and return use the Python interpreter so it makes
+sense for those lines to be yellow. The same goes for the list comprehension, because
+it involves the creation of a Python object. But why is the line ``if n % i == 0:`` yellow?
+We can examine the generated C code to understand:
+
+.. figure:: python_division.png
+
+We can see that some checks happen. Because Cython defaults to the
+Python behavior, the generated code performs division checks at runtime,
+just like Python does. You can deactivate those checks by using the
+:ref:`compiler directives<compiler-directives>`.
+
+Now let's see whether we obtained a boost in speed despite the division checks.
+Let's write the same program, but Python-style::
+
+    def primes_python(nb_primes):
+        p = []
+        n = 2
+        while len(p) < nb_primes:
+            # Is n prime?
+            for i in p:
+                if n % i == 0:
+                    break
+
+            # If no break occurred in the loop
+            else:
+                p.append(n)
+            n += 1
+        return p
+
+
+It is also possible to take a plain ``.py`` file and to compile it with Cython.
+Let's take ``primes_python``, change the function name to ``primes_python_compiled`` and
+compile it with Cython (without changing the code). We will also change the name of the
+file to ``example_py_cy.py`` to differentiate it from the others.
+Now the ``setup.py`` looks like this::
+
+    from distutils.core import setup
+    from Cython.Build import cythonize
+
+    setup(
+        ext_modules=cythonize(['example.pyx',        # has the primes() function
+                               'example_py_cy.py'],  # has the primes_python_compiled() function
+                              annotate=True),        # produces the html annotation file
+    )
+
+Now we can ensure that those two programs output the same values::
+
+    >>> primes_python(1000) == primes(1000)
+    True
+    >>> primes_python_compiled(1000) == primes(1000)
+    True
+
+It's possible to compare the speed now::
+
+    python -m timeit -s 'from example_py import primes_python' 'primes_python(1000)'
+    10 loops, best of 3: 23 msec per loop
+
+    python -m timeit -s 'from example_py_cy import primes_python_compiled' 'primes_python_compiled(1000)'
+    100 loops, best of 3: 11.9 msec per loop
+
+    python -m timeit -s 'from example import primes' 'primes(1000)'
+    1000 loops, best of 3: 1.65 msec per loop
+
+The cythonized version of ``primes_python`` is about 2 times faster than the Python one,
+without changing a single line of code.
+The Cython version is nearly 14 times faster than the Python version! What could explain this?
+
+Multiple things:
+ * In this program, very little computation happens on each line,
+   so the overhead of the Python interpreter is very significant. It would be
+   very different if you were to do a lot of computation on each line, using NumPy for
+   example.
+ * Data locality. It's likely that a lot more can fit in CPU cache when using C than
+   when using Python. Because everything in Python is an object that lives on the heap
+   and is reached through a pointer, this is not very cache friendly.
+
+Usually the speedups are between 2x and 1000x. It depends on how much you call
+the Python interpreter. As always, remember to profile before adding types
+everywhere. Adding types makes your code less readable, so use them in
+moderation.
+
+
 Language Details
 ================


--- a/docs/src/tutorial/htmlreport.png
+++ b/docs/src/tutorial/htmlreport.png
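
Note on the timings quoted in the tutorial text: the comparison there is run from the shell with `python -m timeit`. A rough sketch of the same measurement from within Python, using the standard `timeit` module, could look like the following. Only the pure-Python function is timed, since the compiled modules are not assumed to be importable here. The helpers `best_msec_per_call` and `speedup` are hypothetical names introduced for illustration, and the three timing figures are simply the ones quoted in the tutorial.

```python
import timeit


def primes_python(nb_primes):
    """Pure-Python version, as added to the tutorial in this change."""
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p


def best_msec_per_call(func, arg, number=5, repeat=3):
    """Hypothetical helper: best-of-`repeat` time per call, in milliseconds."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number * 1000.0


def speedup(baseline_msec, optimized_msec):
    """Hypothetical helper: how many times faster the optimized run is."""
    return baseline_msec / optimized_msec


# A small input keeps this sketch quick; absolute numbers depend on the machine.
print("primes_python(200): %.3f msec per call"
      % best_msec_per_call(primes_python, 200))

# Ratios derived from the timings quoted in the tutorial (23, 11.9 and 1.65 msec).
print("compiled .py vs Python: %.1fx" % speedup(23.0, 11.9))   # 1.9x
print("typed .pyx vs Python:   %.1fx" % speedup(23.0, 1.65))   # 13.9x
```

Taking the minimum over several repeats follows the "best of 3" convention shown in the `timeit` output in the tutorial.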
--- a/docs/src/tutorial/python_division.png
+++ b/docs/src/tutorial/python_division.png
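
Note on the division checks mentioned in the tutorial text: the line `if n % i == 0:` is highlighted because, by default, Cython keeps Python's semantics for `%` on C integers. As far as I understand, that means checking for a zero divisor and giving the result the sign of the divisor, whereas plain C truncates towards zero. The sketch below illustrates the difference in pure Python. `math.fmod` stands in for the C behaviour purely for illustration, and `python_checked_mod` is a hypothetical name, not something Cython generates.

```python
import math

# Python semantics: the result of % takes the sign of the divisor.
python_style = -7 % 3                   # 2

# C semantics (truncated division): the result takes the sign of the dividend.
# math.fmod follows the C library's fmod, so it is used here as a stand-in.
c_style = int(math.fmod(-7, 3))         # -1


def python_checked_mod(n, i):
    """Hypothetical illustration of the zero-divisor check kept by default."""
    if i == 0:
        raise ZeroDivisionError("integer division or modulo by zero")
    return n % i


print(python_style, c_style)
print(python_checked_mod(10, 4))
```

In `primes()` both operands are positive and the divisor is never zero, so the checks never fire. That is the situation in which disabling them through the `cdivision` compiler directive is usually considered safe.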
--- a/docs/src/userguide/language_basics.rst
+++ b/docs/src/userguide/language_basics.rst
@@ -301,6 +301,10 @@ return value and raise it yourself, for example,::
        raise SpamError("Couldn't open the spam file")


+
+
+.. _type-conversion:
+
 Automatic type conversions
 ==========================