Merge pull request #2390 from gabrieldemarmiesse/rendering_parameters

Changed the organisation of prange.

Commit 8eeea273, authored Jul 07, 2018 by scoder; committed by GitHub Jul 07, 2018

Showing with 77 additions and 60 deletions

docs/src/userguide/parallelism.rst +77 -60

--- a/docs/src/userguide/parallelism.rst
+++ b/docs/src/userguide/parallelism.rst
@@ -21,9 +21,7 @@ It currently supports OpenMP, but later on more backends might be supported.
    This function can be used for parallel loops. OpenMP automatically
    starts a thread pool and distributes the work according to the schedule
-    used. ``step`` must not be 0. This function can only be used with the
+    used.
-    GIL released. If ``nogil`` is true, the loop will be wrapped in a nogil
-    section.
    Thread-locality and reductions are automatically inferred for variables.
@@ -36,80 +34,99 @@ It currently supports OpenMP, but later on more backends might be supported.
    Variables assigned to in a parallel with block will be private and unusable
    after the block, as there is no concept of a sequentially last value.
-    The ``schedule`` is passed to OpenMP and can be one of the following:
-    static:
+    :param start:
-       If a chunksize is provided, iterations are distributed to all
+        The index indicating the start of the loop (same as the start argument in range).
-       threads ahead of time in blocks of the given chunksize.  If no
-       chunksize is given, the iteration space is divided into chunks that
-       are approximately equal in size, and at most one chunk is assigned
-       to each thread in advance.
-       This is most appropriate when the scheduling overhead matters and
+    :param stop:
-       the problem can be cut down into equally sized chunks that are
+        The index indicating when to stop the loop (same as the stop argument in range).
-       known to have approximately the same runtime.
-    dynamic:
+    :param step:
-       The iterations are distributed to threads as they request them,
+        An integer giving the step of the sequence (same as the step argument in range).
-       with a default chunk size of 1.
+        It must not be 0.
-       This is suitable when the runtime of each chunk differs and is not
+    :param nogil:
-       known in advance and therefore a larger number of smaller chunks
+        This function can only be used with the GIL released.
-       is used in order to keep all threads busy.
+        If ``nogil`` is true, the loop will be wrapped in a nogil section.
-    guided:
+    :param schedule:
-       As with dynamic scheduling, the iterations are distributed to
+        The ``schedule`` is passed to OpenMP and can be one of the following:
-       threads as they request them, but with decreasing chunk size.  The
-       size of each chunk is proportional to the number of unassigned
-       iterations divided by the number of participating threads,
-       decreasing to 1 (or the chunksize if provided).
-       This has an advantage over pure dynamic scheduling when it turns
+        static:
-       out that the last chunks take more time than expected or are
+            If a chunksize is provided, iterations are distributed to all
-       otherwise being badly scheduled, so that most threads start running
+            threads ahead of time in blocks of the given chunksize.  If no
-       idle while the last chunks are being worked on by only a smaller
+            chunksize is given, the iteration space is divided into chunks that
-       number of threads.
+            are approximately equal in size, and at most one chunk is assigned
+            to each thread in advance.
-    runtime:
+            This is most appropriate when the scheduling overhead matters and
-       The schedule and chunk size are taken from the runtime scheduling
+            the problem can be cut down into equally sized chunks that are
-       variable, which can be set through the ``openmp.omp_set_schedule()``
+            known to have approximately the same runtime.
-       function call, or the OMP_SCHEDULE environment variable.  Note that
-       this essentially disables any static compile time optimisations of
-       the scheduling code itself and may therefore show a slightly worse
-       performance than when the same scheduling policy is statically
-       configured at compile time.
-    ..  auto             The decision regarding scheduling is delegated to the
+        dynamic:
-    ..                   compiler and/or runtime system. The programmer gives
+            The iterations are distributed to threads as they request them,
-    ..                   the implementation the freedom to choose any possible
+            with a default chunk size of 1.
-    ..                   mapping of iterations to threads in the team.
-    The default schedule is implementation defined. For more information consult
+            This is suitable when the runtime of each chunk differs and is not
-    the OpenMP specification [#]_.
+            known in advance and therefore a larger number of smaller chunks
+            is used in order to keep all threads busy.
-    The ``num_threads`` argument indicates how many threads the team should consist of. If not given,
+        guided:
-    OpenMP will decide how many threads to use. Typically this is the number of cores available on
+            As with dynamic scheduling, the iterations are distributed to
-    the machine. However, this may be controlled through the ``omp_set_num_threads()`` function, or
+            threads as they request them, but with decreasing chunk size.  The
-    through the ``OMP_NUM_THREADS`` environment variable.
+            size of each chunk is proportional to the number of unassigned
+            iterations divided by the number of participating threads,
+            decreasing to 1 (or the chunksize if provided).
-    The ``chunksize`` argument indicates the chunksize to be used for dividing the iterations among threads.
+            This has an advantage over pure dynamic scheduling when it turns
-    This is only valid for ``static``, ``dynamic`` and ``guided`` scheduling, and is optional. Different chunksizes
+            out that the last chunks take more time than expected or are
-    may give substantially different performance results, depending on the schedule, the load balance it provides,
+            otherwise being badly scheduled, so that most threads start running
-    the scheduling overhead and the amount of false sharing (if any).
+            idle while the last chunks are being worked on by only a smaller
+            number of threads.
-    Example with a reduction:
+        runtime:
+            The schedule and chunk size are taken from the runtime scheduling
+            variable, which can be set through the ``openmp.omp_set_schedule()``
+            function call, or the OMP_SCHEDULE environment variable.  Note that
+            this essentially disables any static compile time optimisations of
+            the scheduling code itself and may therefore show a slightly worse
+            performance than when the same scheduling policy is statically
+            configured at compile time.
+            The default schedule is implementation defined. For more information consult
+            the OpenMP specification [#]_.
-    .. literalinclude:: ../../examples/userguide/parallelism/simple_sum.pyx
+            ..  auto             The decision regarding scheduling is delegated to the
+            ..                   compiler and/or runtime system. The programmer gives
+            ..                   the implementation the freedom to choose any possible
+            ..                   mapping of iterations to threads in the team.
-    Example with a typed memoryview (e.g. a NumPy array)::
-        from cython.parallel import prange
-        def func(double[:] x, double alpha):
+    :param num_threads:
-            cdef Py_ssize_t i
+        The ``num_threads`` argument indicates how many threads the team should consist of. If not given,
+        OpenMP will decide how many threads to use. Typically this is the number of cores available on
+        the machine. However, this may be controlled through the ``omp_set_num_threads()`` function, or
+        through the ``OMP_NUM_THREADS`` environment variable.
-            for i in prange(x.shape[0]):
+    :param chunksize: 
-                x[i] = alpha * x[i]
+        The ``chunksize`` argument indicates the chunksize to be used for dividing the iterations among threads.
+        This is only valid for ``static``, ``dynamic`` and ``guided`` scheduling, and is optional. Different chunksizes
+        may give substantially different performance results, depending on the schedule, the load balance it provides,
+        the scheduling overhead and the amount of false sharing (if any).
+Example with a reduction:
+.. literalinclude:: ../../examples/userguide/parallelism/simple_sum.pyx
+Example with a typed memoryview (e.g. a NumPy array)::
+    from cython.parallel import prange
+    def func(double[:] x, double alpha):
+        cdef Py_ssize_t i
+        for i in prange(x.shape[0]):
+            x[i] = alpha * x[i]
 .. function:: parallel(num_threads=None)