Commit 8eeea273, authored Jul 07, 2018 by scoder, committed by GitHub on Jul 07, 2018

Merge pull request #2390 from gabrieldemarmiesse/rendering_parameters

Changed the organisation of prange.

Parents: eb294a56, aa8f3b1c
Showing 1 changed file with 77 additions and 60 deletions.

docs/src/userguide/parallelism.rst (+77 / -60)
@@ -21,9 +21,7 @@ It currently supports OpenMP, but later on more backends might be supported.

 This function can be used for parallel loops. OpenMP automatically
 starts a thread pool and distributes the work according to the schedule
-used. ``step`` must not be 0. This function can only be used with the
-GIL released. If ``nogil`` is true, the loop will be wrapped in a nogil
-section.
+used.

 Thread-locality and reductions are automatically inferred for variables.
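(Aside, not part of this commit: the inferred-reduction behaviour mentioned in the context line above is easiest to see in code. In this illustrative sketch, with a made-up function name, an in-place ``+=`` on a C variable inside ``prange`` becomes an OpenMP reduction::

    from cython.parallel import prange

    def total(double[:] x):
        # ``s`` is assigned with an in-place operator inside prange,
        # so each thread accumulates a private copy and the partial
        # sums are combined with ``+`` after the loop.
        cdef double s = 0
        cdef Py_ssize_t i
        for i in prange(x.shape[0], nogil=True):
            s += x[i]
        return s

)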
@@ -36,80 +34,99 @@ It currently supports OpenMP, but later on more backends might be supported.

 Variables assigned to in a parallel with block will be private and unusable
 after the block, as there is no concept of a sequentially last value.

-The ``schedule`` is passed to OpenMP and can be one of the following:
-
-static:
-    If a chunksize is provided, iterations are distributed to all
-    threads ahead of time in blocks of the given chunksize. If no
-    chunksize is given, the iteration space is divided into chunks that
-    are approximately equal in size, and at most one chunk is assigned
-    to each thread in advance.
-
-    This is most appropriate when the scheduling overhead matters and
-    the problem can be cut down into equally sized chunks that are
-    known to have approximately the same runtime.
-
-dynamic:
-    The iterations are distributed to threads as they request them,
-    with a default chunk size of 1.
-
-    This is suitable when the runtime of each chunk differs and is not
-    known in advance and therefore a larger number of smaller chunks
-    is used in order to keep all threads busy.
-
-guided:
-    As with dynamic scheduling, the iterations are distributed to
-    threads as they request them, but with decreasing chunk size. The
-    size of each chunk is proportional to the number of unassigned
-    iterations divided by the number of participating threads,
-    decreasing to 1 (or the chunksize if provided).
-
-    This has an advantage over pure dynamic scheduling when it turns
-    out that the last chunks take more time than expected or are
-    otherwise being badly scheduled, so that most threads start running
-    idle while the last chunks are being worked on by only a smaller
-    number of threads.
-
-runtime:
-    The schedule and chunk size are taken from the runtime scheduling
-    variable, which can be set through the ``openmp.omp_set_schedule()``
-    function call, or the OMP_SCHEDULE environment variable. Note that
-    this essentially disables any static compile time optimisations of
-    the scheduling code itself and may therefore show a slightly worse
-    performance than when the same scheduling policy is statically
-    configured at compile time.
-
-.. auto   The decision regarding scheduling is delegated to the
-..        compiler and/or runtime system. The programmer gives
-..        the implementation the freedom to choose any possible
-..        mapping of iterations to threads in the team.
-
-The default schedule is implementation defined. For more information consult
-the OpenMP specification [#]_.
-
-The ``num_threads`` argument indicates how many threads the team should consist of. If not given,
-OpenMP will decide how many threads to use. Typically this is the number of cores available on
-the machine. However, this may be controlled through the ``omp_set_num_threads()`` function, or
-through the ``OMP_NUM_THREADS`` environment variable.
-
-The ``chunksize`` argument indicates the chunksize to be used for dividing the iterations among threads.
-This is only valid for ``static``, ``dynamic`` and ``guided`` scheduling, and is optional. Different chunksizes
-may give substantially different performance results, depending on the schedule, the load balance it provides,
-the scheduling overhead and the amount of false sharing (if any).
+:param start:
+    The index indicating the start of the loop (same as the start argument in range).
+
+:param stop:
+    The index indicating when to stop the loop (same as the stop argument in range).
+
+:param step:
+    An integer giving the step of the sequence (same as the step argument in range).
+    It must not be 0.
+
+:param nogil:
+    This function can only be used with the GIL released.
+    If ``nogil`` is true, the loop will be wrapped in a nogil section.
+
+:param schedule:
+    The ``schedule`` is passed to OpenMP and can be one of the following:
+
+    static:
+        If a chunksize is provided, iterations are distributed to all
+        threads ahead of time in blocks of the given chunksize. If no
+        chunksize is given, the iteration space is divided into chunks that
+        are approximately equal in size, and at most one chunk is assigned
+        to each thread in advance.
+
+        This is most appropriate when the scheduling overhead matters and
+        the problem can be cut down into equally sized chunks that are
+        known to have approximately the same runtime.
+
+    dynamic:
+        The iterations are distributed to threads as they request them,
+        with a default chunk size of 1.
+
+        This is suitable when the runtime of each chunk differs and is not
+        known in advance and therefore a larger number of smaller chunks
+        is used in order to keep all threads busy.
+
+    guided:
+        As with dynamic scheduling, the iterations are distributed to
+        threads as they request them, but with decreasing chunk size. The
+        size of each chunk is proportional to the number of unassigned
+        iterations divided by the number of participating threads,
+        decreasing to 1 (or the chunksize if provided).
+
+        This has an advantage over pure dynamic scheduling when it turns
+        out that the last chunks take more time than expected or are
+        otherwise being badly scheduled, so that most threads start running
+        idle while the last chunks are being worked on by only a smaller
+        number of threads.
+
+    runtime:
+        The schedule and chunk size are taken from the runtime scheduling
+        variable, which can be set through the ``openmp.omp_set_schedule()``
+        function call, or the OMP_SCHEDULE environment variable. Note that
+        this essentially disables any static compile time optimisations of
+        the scheduling code itself and may therefore show a slightly worse
+        performance than when the same scheduling policy is statically
+        configured at compile time.
+        The default schedule is implementation defined. For more information consult
+        the OpenMP specification [#]_.
+
+.. auto   The decision regarding scheduling is delegated to the
+..        compiler and/or runtime system. The programmer gives
+..        the implementation the freedom to choose any possible
+..        mapping of iterations to threads in the team.
+
+:param num_threads:
+    The ``num_threads`` argument indicates how many threads the team should consist of. If not given,
+    OpenMP will decide how many threads to use. Typically this is the number of cores available on
+    the machine. However, this may be controlled through the ``omp_set_num_threads()`` function, or
+    through the ``OMP_NUM_THREADS`` environment variable.
+
+:param chunksize:
+    The ``chunksize`` argument indicates the chunksize to be used for dividing the iterations among threads.
+    This is only valid for ``static``, ``dynamic`` and ``guided`` scheduling, and is optional. Different chunksizes
+    may give substantially different performance results, depending on the schedule, the load balance it provides,
+    the scheduling overhead and the amount of false sharing (if any).

 Example with a reduction:

 .. literalinclude:: ../../examples/userguide/parallelism/simple_sum.pyx

 Example with a typed memoryview (e.g. a NumPy array)::

     from cython.parallel import prange

     def func(double[:] x, double alpha):
         cdef Py_ssize_t i

         for i in prange(x.shape[0]):
             x[i] = alpha * x[i]

 .. function:: parallel(num_threads=None)
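
(Aside, not part of this commit: to make the newly documented keywords concrete, here is a minimal sketch, with a hypothetical function name and arbitrary values, combining ``schedule``, ``chunksize`` and ``num_threads``::

    from cython.parallel import prange

    def scale(double[:] x, double alpha):
        cdef Py_ssize_t i
        # Static schedule: blocks of 100 iterations are handed to a
        # team of 4 threads ahead of time.
        for i in prange(x.shape[0], schedule='static', chunksize=100,
                        num_threads=4, nogil=True):
            x[i] = alpha * x[i]

)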
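(Aside, not part of this commit: the ``runtime`` schedule defers the decision to ``omp_set_schedule()`` or the OMP_SCHEDULE environment variable, as the documentation above notes. A sketch, assuming Cython's bundled ``openmp`` declarations expose ``omp_set_schedule`` and the ``omp_sched_t`` constants, with a hypothetical function name::

    from cython.parallel import prange
    cimport openmp

    def double_all(double[:] x):
        cdef Py_ssize_t i
        # Assumed openmp.pxd API: select guided scheduling with a
        # chunk size of 10 at runtime; setting OMP_SCHEDULE in the
        # environment would have the same effect.
        openmp.omp_set_schedule(openmp.omp_sched_guided, 10)
        for i in prange(x.shape[0], schedule='runtime', nogil=True):
            x[i] = 2 * x[i]

)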