    fscache: Use wait_on_bit() to wait for the freeing of relinquished volume · 8226e37d
    Hou Tao authored
    Freeing a relinquished volume wakes up the pending volume acquisition
    with wake_up_bit(), but fscache_wait_on_volume_collision() waits with
    wait_var_event(). The two do not match: they operate on different
    wait-queues, so the wake-up never reaches the waiter.
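
    The mismatch can be pictured with a small userspace toy model. It is
    not kernel code: struct wait_key, struct waiter, wake_matching(),
    woken_by_bit_wake() and the ACQUIRE_PENDING value are hypothetical
    names invented for this sketch. It only assumes that a wake-up reaches
    a sleeper when both sides name the same thing to wait on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a wait key names what a sleeper is waiting on. */
struct wait_key {
	const void *addr;	/* word or variable being watched */
	int bit_nr;		/* bit index, or -1 for "whole variable" */
};

struct waiter {
	struct wait_key key;
	bool woken;
};

/* Hypothetical helper: wake every waiter whose key matches exactly. */
int wake_matching(struct waiter *w, size_t n, struct wait_key key)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (!w[i].woken && w[i].key.addr == key.addr &&
		    w[i].key.bit_nr == key.bit_nr) {
			w[i].woken = true;
			woken++;
		}
	}
	return woken;
}

#define ACQUIRE_PENDING 2	/* illustrative bit number, not the kernel's value */

/* Returns how many waiters a bit-style wake-up reaches. */
int woken_by_bit_wake(bool waiter_uses_bit_key)
{
	unsigned long flags = 1UL << ACQUIRE_PENDING;
	struct waiter w = {
		/* A var-style waiter registers the whole variable (-1). */
		.key = { &flags, waiter_uses_bit_key ? ACQUIRE_PENDING : -1 },
		.woken = false,
	};
	struct wait_key wake = { &flags, ACQUIRE_PENDING };

	return wake_matching(&w, 1, wake);
}
```

    In this model woken_by_bit_wake(false) returns 0, which corresponds to
    the waiter that is never woken, while woken_by_bit_wake(true) returns 1
    once both sides agree on the bit.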
    
    As implemented in fscache_wait_on_volume_collision(), if the wake-up of
    the pending acquisition is delayed by more than 20 seconds (e.g. because
    on-demand fd closing is delayed), the first wait_var_event_timeout()
    times out and the following wait_var_event() hangs forever, as shown
    below:
    
     FS-Cache: Potential volume collision new=00000024 old=00000022
     ......
     INFO: task mount:1148 blocked for more than 122 seconds.
           Not tainted 6.1.0-rc6+ #1
     task:mount           state:D stack:0     pid:1148  ppid:1
     Call Trace:
      <TASK>
      __schedule+0x2f6/0xb80
      schedule+0x67/0xe0
      fscache_wait_on_volume_collision.cold+0x80/0x82
      __fscache_acquire_volume+0x40d/0x4e0
      erofs_fscache_register_volume+0x51/0xe0 [erofs]
      erofs_fscache_register_fs+0x19c/0x240 [erofs]
      erofs_fc_fill_super+0x746/0xaf0 [erofs]
      vfs_get_super+0x7d/0x100
      get_tree_nodev+0x16/0x20
      erofs_fc_get_tree+0x20/0x30 [erofs]
      vfs_get_tree+0x24/0xb0
      path_mount+0x2fa/0xa90
      do_mount+0x7c/0xa0
      __x64_sys_mount+0x8b/0xe0
      do_syscall_64+0x30/0x60
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
    
    wake_up_bit() is the more selective of the two, so fix the mismatch by
    using wait_on_bit() instead of wait_var_event() to wait for the freeing
    of the relinquished volume. In addition, because wake_up_bit() checks
    waitqueue_active() and clear_bit() does not imply a memory barrier, use
    clear_and_wake_up_bit() to add the missing memory barrier between the
    clearing of the bit in cursor->flags and the waitqueue_active() check.
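
    The ordering requirement can be sketched in userspace with C11 atomics.
    This is an analogy under stated assumptions, not the kernel
    implementation: struct flag_word, clear_and_wake() and must_sleep() are
    hypothetical names, the sleepers counter stands in for
    waitqueue_active(), and the wakeups counter stands in for the actual
    wake-up.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogy only; none of these names are kernel APIs. */
struct flag_word {
	atomic_ulong flags;	/* bit is set while the acquisition is pending */
	atomic_int sleepers;	/* stands in for waitqueue_active() */
	atomic_int wakeups;	/* counts wake-ups issued, for illustration */
};

/* Waker side: clear the bit, order it before the sleeper check, then wake. */
bool clear_and_wake(struct flag_word *w, unsigned int bit)
{
	/* Release ordering publishes earlier writes with the cleared bit. */
	atomic_fetch_and_explicit(&w->flags, ~(1UL << bit),
				  memory_order_release);

	/* Full fence: the clear must be ordered before sleepers is read. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&w->sleepers, memory_order_relaxed) == 0)
		return false;	/* nobody is queued, skip the wake-up */

	atomic_fetch_add_explicit(&w->wakeups, 1, memory_order_relaxed);
	return true;
}

/* Waiter side: announce itself, fence, then re-check the bit. */
bool must_sleep(struct flag_word *w, unsigned int bit)
{
	atomic_fetch_add_explicit(&w->sleepers, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&w->flags, memory_order_acquire) &
		(1UL << bit)) != 0;
}
```

    With a fence on both sides, at least one party observes the other:
    either the waker sees a registered sleeper, or the waiter sees the
    cleared bit and does not sleep. Without the waker-side fence the waker
    could read a stale sleeper count and skip the wake-up.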
    
    Fixes: 62ab6335 ("fscache: Implement volume registration")
    Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Link: https://lore.kernel.org/r/20230113115211.2895845-2-houtao@huaweicloud.com/ # v3
volume.c 14.7 KB