    [PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264
    Sunil Mushran authored
    As the fs recovery is asynchronous, there is a small chance that another
    node can mount (and thus recover) the slot before the recovery thread
    gets to it.
    
    If this happens, the recovery thread will block indefinitely on the
    journal/slot lock as that lock will be held for the duration of the mount
    (by design) by the node assigned to that slot.
    
    The solution implemented is to keep track of journal replays using a
    recovery generation stored in the journal inode, which is incremented by
    the thread replaying that journal. Before attempting the blocking lock on
    the journal/slot lock, the recovery thread compares the generation on disk
    with the value it has cached and skips recovery if the two do not match,
    since a mismatch means the journal has already been replayed by another
    node.
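
A minimal sketch of the generation check described above, not the code from this patch. The names `journal_inode`, `replay_journal`, `should_recover` and `recover_slot` are hypothetical, and the on-disk read and the cluster lock are reduced to comments so that the comparison logic stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk journal inode of one slot. */
struct journal_inode {
    uint32_t recovery_generation; /* bumped on every journal replay */
};

/* Whoever replays the journal increments the generation (assumed model). */
static void replay_journal(struct journal_inode *on_disk)
{
    on_disk->recovery_generation++;
}

/*
 * Recovery is still needed only if nobody has replayed the journal since
 * the generation was cached, i.e. the on-disk and cached values match.
 */
static bool should_recover(const struct journal_inode *on_disk,
                           uint32_t cached_generation)
{
    return on_disk->recovery_generation == cached_generation;
}

/*
 * Returns true if this caller replayed the journal, false if recovery was
 * skipped because the generation had already moved on.
 */
static bool recover_slot(struct journal_inode *on_disk,
                         uint32_t cached_generation)
{
    /* Check before the blocking lock so a mounted slot is never waited on. */
    if (!should_recover(on_disk, cached_generation))
        return false;

    /* The blocking journal/slot lock would be taken here (omitted). */
    replay_journal(on_disk);
    return true;
}
```

With a cached generation of 7 and 7 on disk, `recover_slot()` replays and the on-disk value becomes 8. A second call with the stale cached value 7 returns false without touching the lock, which is the case where another node has already mounted and recovered the slot.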
    
    This bug appears to have been inadvertently introduced during the mount/umount
    vote removal by mainline commit 34d024f8. In the
    mount voting scheme, the messaging would indirectly indicate that the slot
    was being recovered.
    Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
journal.c 43.6 KB