Gwenaël Samain / cython · Commits

Commit 1d86d5fe, authored Mar 18, 2018 by gabrieldemarmiesse
Parent: c00d9ed6

Quite some progress with this tutorial. Only two parts missing.

Showing 7 changed files with 145 additions and 112 deletions (+145, -112):
docs/examples/userguide/convolve_fused_types.pyx  (+1, -1)
docs/examples/userguide/convolve_infer_types.pyx  (+2, -1)
docs/examples/userguide/convolve_memview.pyx      (+1, -1)
docs/examples/userguide/convolve_py.py            (+1, -1)
docs/examples/userguide/convolve_typed.pyx        (+4, -8)
docs/src/userguide/convolve_types_html.png        (+0, -0)
docs/src/userguide/numpy_tutorial.rst             (+136, -100)
docs/examples/userguide/convolve_fused_types.pyx

...
@@ -8,7 +8,7 @@ ctypedef fused my_type:
 @cython.boundscheck(False)
 @cython.wraparound(False)
-cpdef naive_convolve_fused_types(my_type [:,:] f, my_type [:,:] g):
+cpdef naive_convolve(my_type [:,:] f, my_type [:,:] g):
     if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
         raise ValueError("Only odd dimensions on filter supported")
...
docs/examples/userguide/convolve_infer_types.pyx

 # cython: infer_types=True
 import numpy as np
+cimport cython

 DTYPE = np.intc

 @cython.boundscheck(False)
 @cython.wraparound(False)
-def naive_convolve_infer_types(int [:,::1] f, int [:,::1] g):
+def naive_convolve(int [:,::1] f, int [:,::1] g):
     if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
         raise ValueError("Only odd dimensions on filter supported")
...
docs/examples/userguide/convolve_memview.pyx

...
@@ -2,7 +2,7 @@ import numpy as np
 DTYPE = np.intc

-def naive_convolve_memview(int [:,:] f, int [:,:] g):
+def naive_convolve(int [:,:] f, int [:,:] g):
     if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
         raise ValueError("Only odd dimensions on filter supported")
...
docs/examples/userguide/convolve_py.py

 from __future__ import division
 import numpy as np

-def naive_convolve_py(f, g):
+def naive_convolve(f, g):
     # f is an image and is indexed by (v, w)
     # g is a filter kernel and is indexed by (s, t),
     #   it needs odd dimensions
...
docs/examples/userguide/convolve_typed.pyx

 import numpy as np

-# "def" can type its arguments but not have a return type. The type of the
-# arguments for a "def" function is checked at run-time when entering the
-# function.
 # We now need to fix a datatype for our arrays. I've used the variable
 # DTYPE for this, which is assigned to the usual NumPy runtime
 # type info object.
 DTYPE = np.intc

-# The arrays f, g and h is typed as "np.ndarray" instances. The only effect
-# this has is to a) insert checks that the function arguments really are
-# NumPy arrays, and b) make some attribute access like f.shape[0] much
-# more efficient. (In this example this doesn't matter though.)
-def naive_convolve_types(f, g):
+def naive_convolve(f, g):
     if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
         raise ValueError("Only odd dimensions on filter supported")
     assert f.dtype == DTYPE and g.dtype == DTYPE
...
@@ -46,6 +40,8 @@ def naive_convolve_types(f, g):
     cdef int value
     for x in range(xmax):
         for y in range(ymax):
+            # Cython has built-in C functions for min and max
+            # This makes the following lines very fast.
             s_from = max(smid - x, -smid)
             s_to = min((xmax - x) - smid, smid + 1)
             t_from = max(tmid - y, -tmid)
...
docs/src/userguide/convolve_types_html.png  (new file, mode 100644, 37.1 KB)
docs/src/userguide/numpy_tutorial.rst

...
@@ -160,22 +160,24 @@ run a Python session to test both the Python version (imported from
     array([[1, 1, 1],
            [2, 2, 2],
            [1, 1, 1]])

-    In [11]: N = 100
+    In [11]: N = 300

     In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N))

     In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9))

     In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
-    422 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+    3.9 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

     In [20]: %timeit -n2 -r3 convolve_cy.naive_convolve(f, g)
-    342 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+    3.12 s ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

 There's not such a huge difference yet; because the C code still does exactly
 what the Python interpreter does (meaning, for instance, that a new object is
-allocated for each number used). You can look at the Python interaction
-and the generated C code by using `-a` when calling Cython from the command
-line, `%%cython -a` when using a Jupyter Notebook, or by using
-`cythonize('convolve_cy.pyx', annotate=True)` when using a `setup.py`.
+allocated for each number used).
+
+You can look at the Python interaction and the generated C
+code by using ``-a`` when calling Cython from the command
+line, ``%%cython -a`` when using a Jupyter Notebook, or by using
+``cythonize('convolve_cy.pyx', annotate=True)`` when using a ``setup.py``.
 Look at the generated html file and see what
-is needed for even the simplest statements you get the point quickly. We need
+is needed for even the simplest statements. You get the point quickly. We need
 to give Cython more information; we need to add types.
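For reference, the ``setup.py`` route mentioned above can be as small as the following sketch (only the filename :file:`convolve_cy.pyx` is taken from this tutorial; the rest is standard Cython build boilerplate)::

    # Minimal build script; annotate=True also writes convolve_cy.html,
    # the annotated view of Python interaction discussed above.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(ext_modules=cythonize("convolve_cy.pyx", annotate=True))

Run it with ``python setup.py build_ext --inplace`` and open the generated html file in a browser.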
Adding types
============

...
@@ -187,62 +189,95 @@ compatibility. Here's :file:`convolve_typed.pyx`. *Read the comments!*
 .. literalinclude:: ../../examples/userguide/convolve_typed.pyx
     :linenos:

-At this point, have a look at the generated C code for :file:`convolve1.pyx` and
-:file:`convolve2.pyx`. Click on the lines to expand them and see corresponding C.
-(Note that this code annotation is currently experimental and especially
-"trailing" cleanup code for a block may stick to the last expression in the
-block and make it look worse than it is -- use some common sense).
-
-* .. literalinclude: convolve1.html
-* .. literalinclude: convolve2.html
+.. figure:: convolve_types_html.png
+
+At this point, have a look at the generated C code for :file:`convolve_cy.pyx` and
+:file:`convolve_typed.pyx`. Click on the lines to expand them and see corresponding C.

-Especially have a look at the for loops: In :file:`convolve1.c`, these are ~20 lines
-of C code to set up while in :file:`convolve2.c` a normal C for loop is used.
+Especially have a look at the for loops: In :file:`convolve_cy.c`, these are ~20 lines
+of C code to set up while in :file:`convolve_typed.c` a normal C for loop is used.

 After building this and continuing my (very informal) benchmarks, I get:

 .. sourcecode:: ipython

-    In [21]: import convolve2
-
-    In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
-    2 loops, best of 3: 828 ms per loop
+    In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
+    3.9 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [20]: %timeit -n2 -r3 convolve_cy.naive_convolve(f, g)
+    3.12 s ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [22]: %timeit -n2 -r3 convolve_typed.naive_convolve(f, g)
+    13.8 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

-Efficient indexing
-====================
+So, in the end, did adding types make the Cython code slower?
+
+What happened is that most of the time spent in this code is spent on line 60. ::
+
+    value += g[smid - s, tmid - t] * f[v, w]
+
+So what made this line so much slower than in the pure Python version?
+``g`` and ``f`` are still NumPy arrays, so Python objects, and expect
+Python integers as indexes. Here we give C integers. So every time
+Cython reaches this line, it has to convert all the C integers to Python
+integers. Since this line is called very often, it outweighs the speed
+benefits of the pure C loops that were created from the ``range()`` earlier.
+
+Furthermore, ``g[smid - s, tmid - t] * f[v, w]`` returns a Python integer
+and ``value`` is a C integer, so Cython has to do a type conversion again.
+In the end those type conversions add up and make our convolution really
+slow. But this can be solved easily by using memoryviews.
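To make the conversion cost tangible, here is a hypothetical micro-example (not part of this commit; the function name is invented) reproducing the pattern the paragraph describes, C integer indices used against an untyped NumPy array object::

    import numpy as np

    def sum_boxed(arr):
        # arr stays an untyped Python object (a NumPy array).
        cdef int i, total = 0
        for i in range(10):
            # Each pass boxes the C int i into a Python int to index arr,
            # then unboxes the Python integer result back into a C int.
            total += arr[i]
        return total

Every iteration pays two C/Python conversions, exactly like the ``value +=`` line above.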
+Efficient indexing with memoryviews
+===================================

-There's still a bottleneck killing performance, and that is the array lookups
-and assignments. The ``[]``-operator still uses full Python operations --
+There are still two bottlenecks killing performance: the array lookups
+and assignments, as well as the C/Python type conversions.
+The ``[]``-operator still uses full Python operations --
 what we would like to do instead is to access the data buffer directly at C
 speed.

 What we need to do then is to type the contents of the :obj:`ndarray` objects.
-We do this with a special "buffer" syntax which must be told the datatype
-(first argument) and number of dimensions ("ndim" keyword-only argument, if
-not provided then one-dimensional is assumed).
-More information on this syntax [:enhancements/buffer:can be found here].
+We do this with a memoryview. There is :ref:`a page in the Cython documentation
+<memoryviews>` dedicated to it.
+
+In short, memoryviews are C structures that can hold a pointer to the data
+of a NumPy array. They also support slices, so they work even if
+the NumPy array isn't contiguous in memory.
+They can be indexed by C integers, thus allowing fast access to the
+NumPy array data.
+
+No data is copied from the NumPy array to the memoryview in our example.
+As the name implies, it is only a "view" of the memory. So we can use
+``h`` for efficient indexing and then return ``h_np``
+because we want to return a NumPy array.
+
+Here is how to use them in our code:

-Showing the changes needed to produce :file:`convolve3.pyx` only
+:file:`convolve_memview.pyx`

 .. literalinclude:: ../../examples/userguide/convolve_memview.pyx
     :linenos:

-Usage:
+Let's see how much faster accessing is now.

 .. sourcecode:: ipython

-    In [18]: import convolve3
-
-    In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
-    3 loops, best of 100: 11.6 ms per loop
+    In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
+    3.9 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [20]: %timeit -n2 -r3 convolve_cy.naive_convolve(f, g)
+    3.12 s ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [21]: %timeit -n2 -r3 convolve_typed.naive_convolve(f, g)
+    13.8 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [22]: %timeit -n2 -r3 convolve_memview.naive_convolve(f, g)
+    13.5 ms ± 455 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

 Note the importance of this change.
-
-*Gotcha*: This efficient indexing only affects certain index operations,
-namely those with exactly ``ndim`` number of typed integer indices. So if
-``v`` for instance isn't typed, then the lookup ``f[v, w]`` isn't
-optimized. On the other hand this means that you can continue using Python
-objects for sophisticated dynamic slicing etc. just as when the array is not
-typed.
+We're now 290 times faster than an interpreted version of Python.
+
+Memoryviews can be used with slices too, or even
+with Python arrays. Check out the :ref:`memoryview page <memoryviews>` to see what they
+can do for you.
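A tiny sketch (invented names, not the tutorial's code) of the "view, not copy" behaviour described above::

    import numpy as np

    def view_demo():
        arr = np.zeros((4, 4), dtype=np.intc)
        cdef int [:, :] mv = arr   # no copy; mv points at arr's buffer
        mv[0, 0] = 42              # indexed with C integers, at C speed
        assert arr[0, 0] == 42     # the write is visible through the ndarray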
Tuning indexing further
========================

...
@@ -252,13 +287,15 @@ The array lookups are still slowed down by two factors:
 1. Bounds checking is performed.
 2. Negative indices are checked for and handled correctly. The code above is
    explicitly coded so that it doesn't use negative indices, and it
-   (hopefully) always access within bounds. We can add a decorator to disable
-   bounds checking::
+   (hopefully) always accesses within bounds.
+
+With decorators, we can deactivate those checks::

 ...
     cimport cython
-    @cython.boundscheck(False) # turn off bounds-checking for entire function
-    def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
+    @cython.boundscheck(False)  # Deactivate bounds checking
+    @cython.wraparound(False)   # Deactivate negative indexing.
+    def naive_convolve(int [:, :] f, int [:, :] g):
 ...

 Now bounds checking is not performed (and, as a side-effect, if you ''do''
 ...
@@ -267,54 +304,19 @@ and in the worst case corrupt data). It is possible to switch bounds-checking
 mode in many ways, see :ref:`compiler-directives` for more
 information.
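One of those "many ways": the same directives can be set file-wide with a header comment instead of per-function decorators. A minimal sketch (hypothetical module)::

    # cython: boundscheck=False, wraparound=False
    # Every function in this module now skips bounds checking and
    # negative-index handling unless a decorator overrides it locally.

    def first_element(int [:, :] f):
        return f[0, 0]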
-Negative indices are dealt with by ensuring Cython that the indices will be
-positive, by casting the variables to unsigned integer types (if you do have
-negative values, then this casting will create a very large positive value
-instead and you will attempt to access out-of-bounds values). Casting is done
-with a special ``<>``-syntax. The code below is changed to use either
-unsigned ints or casting as appropriate::
-
-...
-    cdef int s, t                     # changed
-    cdef unsigned int x, y, v, w      # changed
-    cdef int s_from, s_to, t_from, t_to
-    cdef DTYPE_t value
-    for x in range(xmax):
-        for y in range(ymax):
-            s_from = max(smid - x, -smid)
-            s_to = min((xmax - x) - smid, smid + 1)
-            t_from = max(tmid - y, -tmid)
-            t_to = min((ymax - y) - tmid, tmid + 1)
-            value = 0
-            for s in range(s_from, s_to):
-                for t in range(t_from, t_to):
-                    v = <unsigned int>(x - smid + s)   # changed
-                    w = <unsigned int>(y - tmid + t)   # changed
-                    value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w]   # changed
-            h[x, y] = value
-...
-
-(In the next Cython release we will likely add a compiler directive or
-argument to the ``np.ndarray[]``-type specifier to disable negative indexing
-so that casting so much isn't necessary; feedback on this is welcome.)
-
-The function call overhead now starts to play a role, so we compare the latter
-two examples with larger N:

 .. sourcecode:: ipython

-    In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g)
-    3 loops, best of 100: 5.97 ms per loop
-
-    In [12]: N = 1000
-
-    In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
-
-    In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9))
-
-    In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g)
-    1 loops, best of 10: 1.16 s per loop
-
-    In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g)
-    1 loops, best of 10: 597 ms per loop
-
-(Also this is a mixed benchmark as the result array is allocated within the
-function call.)
+    In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
+    3.9 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [20]: %timeit -n2 -r3 convolve_cy.naive_convolve(f, g)
+    3.12 s ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [21]: %timeit -n2 -r3 convolve_typed.naive_convolve(f, g)
+    13.8 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [22]: %timeit -n2 -r3 convolve_memview.naive_convolve(f, g)
+    13.5 ms ± 455 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+
+    In [23]: %timeit -n2 -r3 convolve_index.naive_convolve(f, g)
+    7.57 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

 .. Warning::
 ...
...
@@ -331,33 +333,67 @@ function call.)
 Declaring the NumPy arrays as contiguous
 ========================================

-Insert stuff here.
+For extra speed gains, if you know that the NumPy arrays you are
+providing are contiguous in memory, you can declare the
+memoryview as holding data contiguous in memory.
+
+We give an example on an array that has 3 dimensions.
+If they are C-contiguous, you have to declare the memoryview like this::
+
+    cdef int [:,:,::1] a
+
+If they are F-contiguous, you can declare the memoryview like this::
+
+    cdef int [::1, :, :] a
+
+If all this makes no sense to you, you can skip it; the performance gains are
+not that important. If you still want to understand what contiguous arrays are
+all about, you can see `this answer on StackOverflow
+<https://stackoverflow.com/questions/26998223/what-is-the-difference-between-contiguous-and-non-contiguous-arrays>`_.
+
+For the sake of giving numbers, here are the speed gains that you should
+get by declaring the memoryviews as contiguous:
+
+.. sourcecode:: ipython
+
+    In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
+    3.9 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [20]: %timeit -n2 -r3 convolve_cy.naive_convolve(f, g)
+    3.12 s ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [21]: %timeit -n2 -r3 convolve_typed.naive_convolve(f, g)
+    13.8 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
+    In [22]: %timeit -n2 -r3 convolve_memview.naive_convolve(f, g)
+    13.5 ms ± 455 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+
+    In [23]: %timeit -n2 -r3 convolve_index.naive_convolve(f, g)
+    7.57 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+
+    In [24]: %timeit -n2 -r3 convolve_contiguous.naive_convolve(f, g)
+    7.2 ms ± 40.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
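A side note on what the ``::1`` declaration buys you: a contiguous memoryview rejects non-contiguous input instead of silently mis-reading it. A hypothetical sketch (names invented)::

    import numpy as np

    def row_sum(int [:, ::1] m):   # requires C-contiguous data
        cdef int j, total = 0
        for j in range(m.shape[1]):
            total += m[0, j]
        return total

    # row_sum(np.ones((4, 4), dtype=np.intc))    # fine
    # row_sum(np.ones((4, 4), dtype=np.intc).T)  # raises ValueError: buffer is
    #                                            # not C-contiguous; convert with
    #                                            # np.ascontiguousarray first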
Making the function cleaner
===========================

-Some comments here.
+Declaring types can make your code quite verbose. If you don't mind
+Cython inferring the C types of your variables, you can use
+the ``infer_types=True`` compiler directive. It will save you quite a bit
+of typing.
+
+# explain here why value must be typed

 .. literalinclude:: ../../examples/userguide/convolve_infer_types.pyx
     :linenos:

+# explain here why it is faster.
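Until those placeholder explanations are written, here is a small sketch (hypothetical module, not from this commit) of what ``infer_types=True`` changes::

    # cython: infer_types=True
    import numpy as np

    def mean(double [:] x):
        total = 0.0              # inferred as a C double
        n = x.shape[0]           # inferred as a C integer
        for i in range(n):       # i is inferred as a C integer too
            total += x[i]
        return total / n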
More generic code
==================

-It would be possible to do
+# Explain here templated

 .. literalinclude:: ../../examples/userguide/convolve_fused_types.pyx
     :linenos:

-i.e. use :obj:`object` rather than :obj:`np.ndarray`. Under Python 3.0 this
-can allow your algorithm to work with any libraries supporting the buffer
-interface; and support for e.g. the Python Imaging Library may easily be added
-if someone is interested also under Python 2.x.
-
-There is some speed penalty to this though (as one makes more assumptions
-compile-time if the type is set to :obj:`np.ndarray`, specifically it is
-assumed that the data is stored in pure strided mode and not in indirect
-mode).
+# Explain the black magic of why it's faster.
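Pending the real explanation, a toy sketch of the fused-types ("templated") mechanism that :file:`convolve_fused_types.pyx` relies on (the function here is invented)::

    ctypedef fused my_type:
        int
        double

    cpdef my_type scale(my_type x, my_type factor):
        # Cython compiles one C specialization per type listed in my_type
        # and picks the right one from the argument types at call time.
        return x * factor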
Where to go from here?
======================

...