Some improvements on the tutorial page.

Commit 1105d722 authored Mar 14, 2018 by gabrieldemarmiesse
5 changed files
--- a/docs/examples/tutorial/primes/primes.pyx
+++ b/docs/examples/tutorial/primes/primes.pyx
-def primes(int kmax):
-    cdef int n, k, i
+def primes(int nb_primes):
+    cdef int n, i, len_p
    cdef int p[1000]
-    result = []
-    if kmax > 1000:
-        kmax = 1000
-    k = 0
+    if nb_primes > 1000:
+        nb_primes = 1000
+
+    len_p = 0  # The number of elements in p
    n = 2
-    while k < kmax:
-        i = 0
-        while i < k and n % p[i] != 0:
-            i = i + 1
-        if i == k:
-            p[k] = n
-            k = k + 1
-            result.append(n)
-        n = n + 1
-    return result
+    while len_p < nb_primes:
+        # Is n prime?
+        for i in p[:len_p]:
+            if n % i == 0:
+                break
+
+        # If no break occurred in the loop
+        else:
+            p[len_p] = n
+            len_p += 1
+        n += 1

+    # Let's put the result in a python list:
+    result_as_list  = [prime for prime in p[:len_p]]
+    return result_as_list
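
To check that the rewritten loop finds the same primes as the removed one, both versions can be transliterated into plain Python and compared. This is only a sketch under stated assumptions: ``primes_old`` and ``primes_new`` are hypothetical names, a Python list stands in for the C array ``p``, and none of the C typing is reproduced.

```python
def primes_old(kmax):
    """Pure-Python transliteration of the removed while/index loop."""
    kmax = min(kmax, 1000)      # same cap as the fixed-size C array
    p = [0] * 1000              # stand-in for `cdef int p[1000]`
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:              # no earlier prime divides n
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result


def primes_new(nb_primes):
    """Pure-Python transliteration of the added for/else loop."""
    nb_primes = min(nb_primes, 1000)
    p = [0] * 1000
    len_p = 0                   # the number of elements in p
    n = 2
    while len_p < nb_primes:
        for i in p[:len_p]:
            if n % i == 0:
                break
        else:                   # no break occurred: n is prime
            p[len_p] = n
            len_p += 1
        n += 1
    return [prime for prime in p[:len_p]]


print(primes_old(10) == primes_new(10))
```

Because both functions are plain Python, this only checks the logic of the refactor; it says nothing about the speed of the compiled Cython code.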
--- a/docs/src/tutorial/cython_tutorial.rst
+++ b/docs/src/tutorial/cython_tutorial.rst
@@ -132,27 +132,83 @@ them as a Python list.
    :linenos:

 You'll see that it starts out just like a normal Python function definition,
-except that the parameter ``kmax`` is declared to be of type ``int`` . This
+except that the parameter ``nb_primes`` is declared to be of type ``int``. This
 means that the object passed will be converted to a C integer (or a
 ``TypeError`` will be raised if it can't be).

+Now, let's dig into the core of the function::
+
+    cdef int n, i, len_p
+    cdef int p[1000]
+
 Lines 2 and 3 use the ``cdef`` statement to define some local C variables.
-Line 4 creates a Python list which will be used to return the result. You'll
-notice that this is done exactly the same way it would be in Python. Because
-the variable result hasn't been given a type, it is assumed to hold a Python
-object.
+The result is stored in ``p`` and will be converted to a Python list at the end
+of the function (line 22). ::
+
+    if nb_primes > 1000:
+        nb_primes = 1000
+
+As in C, declaring a static array requires knowing the size at compile time.
+We make sure the user doesn't set a value above 1000 (otherwise we would write
+past the end of the array and risk a segmentation fault, just like in C). ::
+
+    len_p = 0  # The number of elements in p
+    n = 2
+    while len_p < nb_primes:

 Lines 7-9 set up for a loop which will test candidate numbers for primeness
-until the required number of primes has been found. Lines 11-12, which try
-dividing a candidate by all the primes found so far, are of particular
-interest. Because no Python objects are referred to, the loop is translated
-entirely into C code, and thus runs very fast.
-
-When a prime is found, lines 14-15 add it to the p array for fast access by
-the testing loop, and line 16 adds it to the result list. Again, you'll notice
-that line 16 looks very much like a Python statement, and in fact it is, with
-the twist that the C parameter ``n`` is automatically converted to a Python
-object before being passed to the append method. Finally, at line 18, a normal
+until the required number of primes has been found. ::
+
+    # Is n prime?
+    for i in p[:len_p]:
+        if n % i == 0:
+            break
+
+Lines 11-12, which try dividing a candidate by all the primes found so far,
+are of particular interest. Because no Python objects are referred to,
+the loop is translated entirely into C code, and thus runs very fast.
+You will notice the way we iterate over the ``p`` C array.  ::
+
+    for i in p[:len_p]:
+
+The loop gets translated into C code transparently. No more ugly C for loops!
+Still, don't forget how to loop in C style with integers; you might need it someday.
+If you don't use ``:len_p`` then Cython will loop over the 1000 elements of
+the array (it won't go out of bounds and give a segmentation fault). ::
+
+    # If no break occurred in the loop
+    else:
+        p[len_p] = n
+        len_p += 1
+    n += 1
+
+If no break occurred, it means that we found a prime, and the block of code
+after the ``else`` on line 16 will be executed. We add the prime found to ``p``.
+If you find having an ``else`` after a ``for`` loop strange, just know that it's a
+little-known feature of the Python syntax, and actually doesn't exist in C!
+But since Cython is made to be written with the Python syntax, it'll
+work out, as if you wrote Python code, but at C speed in this case.
+If the for...else syntax still confuses you, see this excellent
+`blog post <https://shahriar.svbtle.com/pythons-else-clause-in-loops>`_. ::
+
+    # Let's put the result in a python list:
+    result_as_list  = [prime for prime in p[:len_p]]
+    return result_as_list
+
+On line 22, before returning the result, we need to convert our C array into a
+Python list, because Python can't read C arrays. Note that Cython handles
+the conversion of quite a few types between C and Python for you (you can
+see exactly which :ref:`here<type-conversion>`), but not C arrays. We can trick
+Cython into doing it because Cython knows how to convert a C int to a Python int.
+By doing a list comprehension, we "cast" each C int prime from ``p`` into a Python int.
+You could also have iterated manually over the C array and used
+``result_as_list.append(prime)``; the result would have been the same.
+
+You'll notice we declare a Python list exactly the same way it would be in Python.
+Because the variable ``result_as_list`` hasn't been given a type, it is assumed to
+hold a Python object.
+
+Finally, at line 23, a normal
 Python return statement returns the result list.

 Compiling primes.pyx with the Cython compiler produces an extension module
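
The ``for ... else`` construct described in this hunk can be tried in plain Python. A minimal sketch, where ``first_divisor`` is a hypothetical helper that is not part of the tutorial:

```python
def first_divisor(n, candidates):
    """Return the first candidate that divides n, or None if none does."""
    for d in candidates:
        if n % d == 0:
            break          # a divisor was found: the else block is skipped
    else:
        return None        # runs only when the loop finished without a break
    return d


print(first_divisor(9, [2, 3, 5]))   # prints 3
print(first_divisor(7, [2, 3, 5]))   # prints None
```

The ``else`` branch also runs when ``candidates`` is empty, which matches the first iteration of the tutorial loop, where ``p[:len_p]`` is empty and 2 is accepted as prime.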
@@ -165,6 +221,72 @@ which we can try out in the interactive interpreter as follows::
 See, it works! And if you're curious about how much work Cython has saved you,
 take a look at the C code generated for this module.

+
+It is always good to check where the Python interaction happens in the code, using the
+``annotate=True`` parameter of ``cythonize()``. Let's see:
+
+.. figure:: htmlreport.png
+
+The function declaration and return use the Python interpreter so it makes
+sense for those lines to be yellow. Same for the list comprehension because
+it involves the creation of a Python object. But why the line ``if n % i == 0:``?
+We can examine the generated C code to understand:
+
+.. figure:: python_division.png
+
+We can see that some checks happen. Because Cython defaults to the
+Python behavior, the language will perform division checks at runtime,
+just like Python does. You can deactivate those checks by using the
+:ref:`compiler directives<compiler-directives>`.
+
+Now let's see whether, even with the division checks, we obtained a boost in speed.
+Let's write the same program, but Python-style::
+
+    def primes_python(nb_primes):
+        p = []
+        n = 2
+        while len(p) < nb_primes:
+            # Is n prime?
+            for i in p:
+                if n % i == 0:
+                    break
+
+            # If no break occurred in the loop
+            else:
+                p.append(n)
+            n += 1
+        return p
+
+Now we can ensure that those two programs output the same values::
+
+    >>> primes_python(500) == primes(500)
+    True
+
+It's possible to compare the speed now::
+
+    >>> %timeit primes_python(500)
+    5.8 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+
+    >>> %timeit primes(500)
+    502 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
+
+The Cython version is about 11.5 times faster than the Python version (5.8 ms versus 502 µs)! What could explain this?
+
+Multiple things:
+
+ * In this program, very little computation happens on each line, so the
+   overhead of the Python interpreter matters a lot. It would be very
+   different if each line did a lot of computation, using NumPy for example.
+ * Data locality. It's likely that a lot more can fit in the CPU cache when using C
+   than when using Python. A Python list stores pointers to separately allocated
+   integer objects rather than the values themselves, which is not very cache friendly.
+
+It's worth mentioning that speedups vary a lot from one program to another.
+We very likely touched a sweet spot with the CPU cache here. Usually the speedups
+are between 2x and 1000x. As always, remember to profile before adding types
+everywhere.
+
+
 Language Details
 ================


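
Outside IPython, where the ``%timeit`` magic is unavailable, the standard library ``timeit`` module gives a comparable measurement. The sketch below times only the pure-Python function from the tutorial text, on the assumption that the compiled ``primes`` module is not importable here. The ratio at the end only re-does the arithmetic on the two timings quoted in the hunk above and is not a new measurement.

```python
import timeit


def primes_python(nb_primes):
    """Pure-Python version, copied from the tutorial text."""
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p


# Best of 3 repeats of 10 calls each, reported per call.
calls = 10
best = min(timeit.repeat(lambda: primes_python(500), number=calls, repeat=3)) / calls
print(f"primes_python(500): {best * 1e3:.2f} ms per call")

# Speedup implied by the timings quoted in the tutorial (assumed as given).
python_seconds = 5.8e-3    # 5.8 ms
cython_seconds = 502e-6    # 502 µs
speedup = python_seconds / cython_seconds
print(f"speedup from quoted timings: {speedup:.2f}x")
```

Absolute timings depend on the machine and the Python version, so the printed per-call time will generally differ from the figure in the tutorial.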
--- a/docs/src/tutorial/htmlreport.png
+++ b/docs/src/tutorial/htmlreport.png
--- a/docs/src/tutorial/python_division.png
+++ b/docs/src/tutorial/python_division.png
--- a/docs/src/userguide/language_basics.rst
+++ b/docs/src/userguide/language_basics.rst
@@ -301,6 +301,10 @@ return value and raise it yourself, for example,::
        raise SpamError("Couldn't open the spam file")


+
+
+.. _type-conversion:
+
 Automatic type conversions
 ==========================