    workqueue: Print backtraces from CPUs with hung CPU bound workqueues · cd2440d6
    Petr Mladek authored
    The workqueue watchdog reports a lockup when a worker pool has made no
    progress for a long time. Progress means that a pending work item
    starts being processed.
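
The progress rule above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct demo_pool`, its fields, and `demo_pool_stalled()` are hypothetical names invented for illustration and are not the kernel's data structures or API. Time is an abstract tick counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a worker pool (not the kernel struct). */
struct demo_pool {
	int id;
	bool has_pending_work;       /* at least one work item is still queued */
	unsigned long last_progress; /* tick at which a pending item last started running */
};

/*
 * Return true when the pool has pending work and no pending item has
 * started being processed for more than `thresh` ticks.
 * Unsigned subtraction keeps the comparison valid across counter wraparound.
 */
static bool demo_pool_stalled(const struct demo_pool *pool,
			      unsigned long now, unsigned long thresh)
{
	assert(pool != NULL);

	if (!pool->has_pending_work)
		return false;	/* an empty pool cannot be stalled */

	return now - pool->last_progress > thresh;
}
```

An idle pool is never reported in this model, because a stall is defined relative to pending work rather than to elapsed time alone.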
    
    Worker pools for unbound workqueues always wake up an idle worker and
    try to process the work immediately. The last idle worker has to create
    a new worker first. A stall can happen only when a new worker could
    not be created, in which case an error should get printed. Another
    possible cause is excessive load. In that case, the workers are victims
    of a global system problem.
    
    Worker pools for CPU-bound workqueues are designed for lightweight
    work items that do not need much CPU time. The items are processed one
    by one on a single worker. A new worker is used only when a work item
    sleeps. This creates one additional scenario: a stall might happen when
    a CPU-bound workqueue is used for CPU-intensive work.
    
    More precisely, the stall is detected when a CPU-bound worker is in
    the TASK_RUNNING state for too long. In this case, it might be useful
    to see the backtrace from the problematic worker.
    
    Information about how long a worker has been in the running state is
    not available. But CPU-bound worker pools do not have many workers in
    the running state by definition, and only a few pools are typically
    blocked.
    
    It should be acceptable to print backtraces from all workers in
    TASK_RUNNING state in the stalled worker pools. The number of false
    positives should be very low.
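
The selection rule described above (only stalled pools, only workers in the running state) can be sketched as a small userspace model. Everything here is an assumption made for illustration: the `demo_*` types and `demo_show_pool_hogs()` are hypothetical and do not reproduce the patch. A kernel implementation would dump the task's stack where this model prints a pid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical task states for this userspace model. */
enum demo_state { DEMO_RUNNING, DEMO_SLEEPING };

struct demo_worker {
	int pid;
	enum demo_state state;
};

/* Hypothetical per-CPU worker pool. */
struct demo_cpu_pool {
	int cpu;
	int stalled;			/* nonzero when the watchdog flagged the pool */
	const struct demo_worker *workers;
	size_t nr_workers;
};

/*
 * Report every running worker in every stalled pool.
 * Returns the number of workers reported. Sleeping workers and workers
 * in healthy pools are skipped, which keeps false positives low.
 */
static size_t demo_show_pool_hogs(const struct demo_cpu_pool *pools,
				  size_t nr_pools)
{
	size_t reported = 0;

	assert(pools != NULL || nr_pools == 0);

	for (size_t i = 0; i < nr_pools; i++) {
		if (!pools[i].stalled)
			continue;

		for (size_t j = 0; j < pools[i].nr_workers; j++) {
			const struct demo_worker *w = &pools[i].workers[j];

			if (w->state != DEMO_RUNNING)
				continue;

			/* Stand-in for printing the worker's backtrace. */
			printf("pool on CPU %d: possible hog, worker pid %d\n",
			       pools[i].cpu, w->pid);
			reported++;
		}
	}

	return reported;
}
```

Because the model has no per-worker run-time information, a worker that only just started running in a stalled pool is reported as well. That is the false-positive case the text accepts as rare.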
    Signed-off-by: Petr Mladek <pmladek@suse.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
workqueue.c 175 KB