Commit 9e1b0211 authored by Guoqing Jiang's avatar Guoqing Jiang Committed by David Teigland

dlm: recheck kthread_should_stop() before schedule()

Call schedule() here could make the thread miss wake
up from kthread_stop(), so it is better to recheck
kthread_should_stop() before call schedule(), a symptom
happened when I run indefinite test (which mostly created
clustered raid1, assemble it in other nodes, then stop
them) of clustered raid.

$ ps aux|grep md|grep D
root      4211  0.0  0.0  19760  2220 ?        Ds   02:58   0:00 mdadm -Ssq
$ cat /proc/4211/stack
kthread_stop+0x4d/0x150
dlm_recoverd_stop+0x15/0x20 [dlm]
dlm_release_lockspace+0x2ab/0x460 [dlm]
leave+0xbf/0x150 [md_cluster]
md_cluster_stop+0x18/0x30 [md_mod]
bitmap_free+0x12e/0x140 [md_mod]
bitmap_destroy+0x7f/0x90 [md_mod]
__md_stop+0x21/0xa0 [md_mod]
do_md_stop+0x15f/0x5c0 [md_mod]
md_ioctl+0xa65/0x18a0 [md_mod]
blkdev_ioctl+0x49e/0x8d0
block_ioctl+0x41/0x50
do_vfs_ioctl+0x96/0x5b0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x1e/0xad

This maybe not resolve the issue completely since the
KTHREAD_SHOULD_STOP flag could be set between "break"
and "schedule", but at least the chance for the symptom
happen could be reduce a lot (The indefinite test runs
more than 20 hours without problem and it happens easily
without the change).
Signed-off-by: default avatarGuoqing Jiang <gqjiang@suse.com>
Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
parent 26b41099
...@@ -299,8 +299,11 @@ static int dlm_recoverd(void *arg) ...@@ -299,8 +299,11 @@ static int dlm_recoverd(void *arg)
break; break;
} }
if (!test_bit(LSFL_RECOVER_WORK, &ls->ls_flags) && if (!test_bit(LSFL_RECOVER_WORK, &ls->ls_flags) &&
!test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
if (kthread_should_stop())
break;
schedule(); schedule();
}
set_current_state(TASK_RUNNING); set_current_state(TASK_RUNNING);
if (test_and_clear_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) { if (test_and_clear_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment