Commit 4cb1ef64 authored by Tejun Heo's avatar Tejun Heo

workqueue: Implement BH workqueues to eventually replace tasklets

The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws such as
the execution code accessing the tasklet item after the execution is
complete which can lead to subtle use-after-free in certain usage scenarios
and less-developed flush and cancel mechanisms.

This patch implements BH workqueues which share the same semantics and
features of regular workqueues but execute their work items in the softirq
context. As there is always only one BH execution context per CPU, none of
the concurrency management mechanisms applies and a BH workqueue can be
thought of as a convenience wrapper around softirq.

Except for the inability to sleep while executing and lack of max_active
adjustments, BH workqueues and work items should behave the same as regular
workqueues and work items.

Currently, the execution is hooked to tasklet[_hi]. However, the goal is to
convert all tasklet users over to BH workqueues. Once the conversion is
complete, tasklet can be removed and BH workqueues can directly take over
the tasklet softirqs.

system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in
tasklet, all existing tasklet users should be able to use the system BH
workqueues without creating their own workqueues.

v3: - Add missing interrupt.h include.

v2: - Instead of using tasklets, hook directly into its softirq action
      functions - tasklet[_hi]_action(). This is slightly cheaper and closer
      to the eventual code structure we want to arrive at. Suggested by Lai.

    - Lai also pointed out several places which need NULL worker->task
      handling or can use clarification. Updated.
Signed-off-by: default avatarTejun Heo <tj@kernel.org>
Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.comTested-by: default avatarAllen Pais <allen.lkml@gmail.com>
Reviewed-by: default avatarLai Jiangshan <jiangshanlai@gmail.com>
parent 2fcdb1b4
......@@ -77,10 +77,12 @@ wants a function to be executed asynchronously it has to set up a work
item pointing to that function and queue that work item on a
workqueue.
Special purpose threads, called worker threads, execute the functions
off of the queue, one after the other. If no work is queued, the
worker threads become idle. These worker threads are managed in so
called worker-pools.
A work item can be executed in either a thread or the BH (softirq) context.
For threaded workqueues, special purpose threads, called [k]workers, execute
the functions off of the queue, one after the other. If no work is queued,
the worker threads become idle. These worker threads are managed in
worker-pools.
The cmwq design differentiates between the user-facing workqueues that
subsystems and drivers queue work items on and the backend mechanism
......@@ -91,6 +93,12 @@ for high priority ones, for each possible CPU and some extra
worker-pools to serve work items queued on unbound workqueues - the
number of these backing pools is dynamic.
BH workqueues use the same framework. However, as there can only be one
concurrent execution context, there's no need to worry about concurrency.
Each per-CPU BH worker pool contains only one pseudo worker which represents
the BH execution context. A BH workqueue can be considered a convenience
interface to softirq.
Subsystems and drivers can create and queue work items through special
workqueue API functions as they see fit. They can influence some
aspects of the way the work items are executed by setting flags on the
......@@ -106,7 +114,7 @@ unless specifically overridden, a work item of a bound workqueue will
be queued on the worklist of either normal or highpri worker-pool that
is associated to the CPU the issuer is running on.
For any worker pool implementation, managing the concurrency level
For any thread pool implementation, managing the concurrency level
(how many execution contexts are active) is an important issue. cmwq
tries to keep the concurrency at a minimal but sufficient level.
Minimal to save resources and sufficient in that the system is used at
......@@ -164,6 +172,17 @@ resources, scheduled and executed.
``flags``
---------
``WQ_BH``
BH workqueues can be considered a convenience interface to softirq. BH
workqueues are always per-CPU and all BH work items are executed in the
queueing CPU's softirq context in the queueing order.
All BH workqueues must have 0 ``max_active`` and ``WQ_HIGHPRI`` is the
only allowed additional flag.
BH work items cannot sleep. All other features such as delayed queueing,
flushing and canceling are supported.
``WQ_UNBOUND``
Work items queued to an unbound wq are served by the special
worker-pools which host workers which are not bound to any
......
......@@ -353,6 +353,7 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
* Documentation/core-api/workqueue.rst.
*/
enum wq_flags {
WQ_BH = 1 << 0, /* execute in bottom half (softirq) context */
WQ_UNBOUND = 1 << 1, /* not bound to any cpu */
WQ_FREEZABLE = 1 << 2, /* freeze during suspend */
WQ_MEM_RECLAIM = 1 << 3, /* may be used for memory reclaim */
......@@ -392,6 +393,9 @@ enum wq_flags {
__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
__WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
__WQ_ORDERED_EXPLICIT = 1 << 19, /* internal: alloc_ordered_workqueue() */
/* BH wq only allows the following flags */
__WQ_BH_ALLOWS = WQ_BH | WQ_HIGHPRI,
};
enum wq_consts {
......@@ -434,6 +438,9 @@ enum wq_consts {
* they are same as their non-power-efficient counterparts - e.g.
* system_power_efficient_wq is identical to system_wq if
* 'wq_power_efficient' is disabled. See WQ_POWER_EFFICIENT for more info.
*
* system_bh[_highpri]_wq are convenience interface to softirq. BH work items
* are executed in the queueing CPU's BH context in the queueing order.
*/
extern struct workqueue_struct *system_wq;
extern struct workqueue_struct *system_highpri_wq;
......@@ -442,6 +449,10 @@ extern struct workqueue_struct *system_unbound_wq;
extern struct workqueue_struct *system_freezable_wq;
extern struct workqueue_struct *system_power_efficient_wq;
extern struct workqueue_struct *system_freezable_power_efficient_wq;
extern struct workqueue_struct *system_bh_wq;
extern struct workqueue_struct *system_bh_highpri_wq;
void workqueue_softirq_action(bool highpri);
/**
* alloc_workqueue - allocate a workqueue
......
......@@ -27,6 +27,7 @@
#include <linux/tick.h>
#include <linux/irq.h>
#include <linux/wait_bit.h>
#include <linux/workqueue.h>
#include <asm/softirq_stack.h>
......@@ -802,11 +803,13 @@ static void tasklet_action_common(struct softirq_action *a,
static __latent_entropy void tasklet_action(struct softirq_action *a)
{
workqueue_softirq_action(false);
tasklet_action_common(a, this_cpu_ptr(&tasklet_vec), TASKLET_SOFTIRQ);
}
static __latent_entropy void tasklet_hi_action(struct softirq_action *a)
{
workqueue_softirq_action(true);
tasklet_action_common(a, this_cpu_ptr(&tasklet_hi_vec), HI_SOFTIRQ);
}
......
This diff is collapsed.
......@@ -79,7 +79,9 @@ def cpumask_str(cpumask):
wq_type_len = 9
def wq_type_str(wq):
if wq.flags & WQ_UNBOUND:
if wq.flags & WQ_BH:
return f'{"bh":{wq_type_len}}'
elif wq.flags & WQ_UNBOUND:
if wq.flags & WQ_ORDERED:
return f'{"ordered":{wq_type_len}}'
else:
......@@ -97,6 +99,7 @@ wq_pod_types = prog['wq_pod_types']
wq_affn_dfl = prog['wq_affn_dfl']
wq_affn_names = prog['wq_affn_names']
WQ_BH = prog['WQ_BH']
WQ_UNBOUND = prog['WQ_UNBOUND']
WQ_ORDERED = prog['__WQ_ORDERED']
WQ_MEM_RECLAIM = prog['WQ_MEM_RECLAIM']
......@@ -107,6 +110,8 @@ WQ_AFFN_CACHE = prog['WQ_AFFN_CACHE']
WQ_AFFN_NUMA = prog['WQ_AFFN_NUMA']
WQ_AFFN_SYSTEM = prog['WQ_AFFN_SYSTEM']
POOL_BH = prog['POOL_BH']
WQ_NAME_LEN = prog['WQ_NAME_LEN'].value_()
cpumask_str_len = len(cpumask_str(wq_unbound_cpumask))
......@@ -151,10 +156,12 @@ for pi, pool in idr_for_each(worker_pool_idr):
for pi, pool in idr_for_each(worker_pool_idr):
pool = drgn.Object(prog, 'struct worker_pool', address=pool)
print(f'pool[{pi:0{max_pool_id_len}}] ref={pool.refcnt.value_():{max_ref_len}} nice={pool.attrs.nice.value_():3} ', end='')
print(f'pool[{pi:0{max_pool_id_len}}] flags=0x{pool.flags.value_():02x} ref={pool.refcnt.value_():{max_ref_len}} nice={pool.attrs.nice.value_():3} ', end='')
print(f'idle/workers={pool.nr_idle.value_():3}/{pool.nr_workers.value_():3} ', end='')
if pool.cpu >= 0:
print(f'cpu={pool.cpu.value_():3}', end='')
if pool.flags & POOL_BH:
print(' bh', end='')
else:
print(f'cpus={cpumask_str(pool.attrs.cpumask)}', end='')
print(f' pod_cpus={cpumask_str(pool.attrs.__pod_cpumask)}', end='')
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment