• Ryusuke Konishi's avatar
    nilfs2: fix deadlock in nilfs_count_free_blocks() · 8ac932a4
    Ryusuke Konishi authored
    A semaphore deadlock can occur if nilfs_get_block() detects metadata
    corruption while locating data blocks and a superblock writeback occurs at
    the same time:
    
    task 1                               task 2
    ------                               ------
    * A file operation *
    nilfs_truncate()
      nilfs_get_block()
        down_read(rwsem A) <--
        nilfs_bmap_lookup_contig()
          ...                            generic_shutdown_super()
                                           nilfs_put_super()
                                             * Prepare to write superblock *
                                             down_write(rwsem B) <--
                                             nilfs_cleanup_super()
          * Detect b-tree corruption *         nilfs_set_log_cursor()
          nilfs_bmap_convert_error()             nilfs_count_free_blocks()
            __nilfs_error()                        down_read(rwsem A) <--
              nilfs_set_error()
                down_write(rwsem B) <--
    
                               *** DEADLOCK ***
    
    Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
    and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
    corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
    inside the lock section.
    
    Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
    read-only and nilfs_set_error() attempts to writelock rwsem B (=
    nilfs->ns_sem) to write back superblock exclusively, hierarchical lock
    acquisition occurs in the order rwsem A -> rwsem B.
    
    Now, if another task starts updating the superblock, it may writelock
    rwsem B during the lock sequence above, and can deadlock trying to
    readlock rwsem A in nilfs_count_free_blocks().
    
    However, there is actually no need to take rwsem A in
    nilfs_count_free_blocks() because it, within the lock section, only reads
    a single integer data on a shared struct with
    nilfs_sufile_get_ncleansegs().  This has been the case after commit
    aa474a22 ("nilfs2: add local variable to cache the number of clean
    segments"), that is, even before this bug was introduced.
    
    So, this resolves the deadlock problem by just not taking the semaphore in
    nilfs_count_free_blocks().
    
    Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com
    Fixes: e828949e ("nilfs2: call nilfs_error inside bmap routines")
    Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
    Reported-by: syzbot+45d6ce7b7ad7ef455d03@syzkaller.appspotmail.com
    Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: <stable@vger.kernel.org>	[2.6.38+
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    8ac932a4
the_nilfs.c 20.1 KB