• NeilBrown's avatar
    fscache: fix race between enablement and dropping of object · 765a05e2
    NeilBrown authored
    BugLink: https://bugs.launchpad.net/bugs/1811080
    
    [ Upstream commit c5a94f43 ]
    
    It was observed that a process blocked indefintely in
    __fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
    to be cleared via fscache_wait_for_deferred_lookup().
    
    At this time, ->backing_objects was empty, which would normaly prevent
    __fscache_read_or_alloc_page() from getting to the point of waiting.
    This implies that ->backing_objects was cleared *after*
    __fscache_read_or_alloc_page was was entered.
    
    When an object is "killed" and then "dropped",
    FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
    KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
    ->backing_objects cleared.  This leaves a window where
    something else can set FSCACHE_COOKIE_LOOKING_UP and
    __fscache_read_or_alloc_page() can start waiting, before
    ->backing_objects is cleared
    
    There is some uncertainty in this analysis, but it seems to be fit the
    observations.  Adding the wake in this patch will be handled correctly
    by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
    is empty again, after waiting.
    
    Customer which reported the hang, also report that the hang cannot be
    reproduced with this fix.
    
    The backtrace for the blocked process looked like:
    
    PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
     #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
     #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
     #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
     #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
     #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
     #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
     #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
     #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
     #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
     #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
    #10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
    #11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
    #12 [ffff881ff43eff18] sys_read at ffffffff811fda62
    #13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e
    Signed-off-by: default avatarNeilBrown <neilb@suse.com>
    Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
    Signed-off-by: default avatarJuerg Haefliger <juergh@canonical.com>
    Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
    765a05e2
object.c 32.1 KB