    ext4: Fix reusing stale buffer heads from last failed mounting · 26fb5290
    Zhihao Cheng authored
    The following sequence makes ext4 load stale buffer heads, left over
    from the last failed mount, in a new mount operation:
    mount_bdev
     ext4_fill_super
     | ext4_load_and_init_journal
     |  ext4_load_journal
     |   jbd2_journal_load
     |    load_superblock
     |     journal_get_superblock
     |      set_buffer_verified(bh) // buffer head is verified
     |   jbd2_journal_recover // fails with EIO
     | goto failed_mount3a // skip 'sb->s_root' initialization
     deactivate_locked_super
      kill_block_super
       generic_shutdown_super
        if (sb->s_root)
        // false, skip ext4_put_super->invalidate_bdev->
        // invalidate_mapping_pages->mapping_evict_folio->
        // filemap_release_folio->try_to_free_buffers, which
        // cannot drop buffer head.
       blkdev_put
        blkdev_put_whole
         if (atomic_dec_and_test(&bdev->bd_openers))
         // false, systemd-udev happens to open the device. Then
         // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
         // truncate_inode_folio->truncate_cleanup_folio->
         // folio_invalidate->block_invalidate_folio->
         // filemap_release_folio->try_to_free_buffers will be skipped,
         // dropping buffer head is missed again.
    
    Second mount:
    ext4_fill_super
     ext4_load_and_init_journal
      ext4_load_journal
       ext4_get_journal
        jbd2_journal_init_inode
         journal_init_common
          bh = getblk_unmovable
           bh = __find_get_block // finds the stale bh from the last failed mount
          journal->j_sb_buffer = bh
       jbd2_journal_load
        load_superblock
         journal_get_superblock
          if (buffer_verified(bh))
          // true, skip journal->j_format_version = 2, value is 0
        jbd2_journal_recover
         do_one_pass
          next_log_block += count_tags(journal, bh)
          // In journal_tag_bytes(), the 'tag_bytes' calculation depends on
          // jbd2_has_feature_csum3(), which returns false because
          // 'j->j_format_version >= 2' does not hold, so we get a wrong
          // next_log_block. do_one_pass() may then exit early when it
          // finds a non-JBD2_MAGIC_NUMBER block at 'next_log_block'.
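
The effect of the skipped superblock parse on the tag size can be sketched in
userspace C. This is a simplified model, not the kernel code: the struct and
function names (model_journal, model_has_feature, model_tag_bytes) are
hypothetical, and the tag sizes of 16 and 12 bytes are assumptions about the
on-disk jbd2 tag layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the journal; not the kernel struct. */
struct model_journal {
    int  format_version; /* stays 0 when the superblock parse is skipped */
    bool sb_csum3;       /* feature bits as recorded in the superblock */
    bool sb_csum2;
    bool sb_64bit;
};

/* Feature checks only succeed when format_version >= 2. */
static bool model_has_feature(const struct model_journal *j, bool on_disk_bit)
{
    return j->format_version >= 2 && on_disk_bit;
}

/* Models the shape of the tag size calculation described above. */
static size_t model_tag_bytes(const struct model_journal *j)
{
    const size_t tag3_size = 16; /* assumed size of the csum3 tag */
    size_t sz = 12;              /* assumed size of the legacy tag */

    assert(j != NULL);
    if (model_has_feature(j, j->sb_csum3))
        return tag3_size;
    if (model_has_feature(j, j->sb_csum2))
        sz += sizeof(uint16_t);
    if (model_has_feature(j, j->sb_64bit))
        return sz;
    return sz - sizeof(uint32_t); /* no high block number without 64bit */
}
```

With the same feature bits set, a journal whose format_version was left at 0
gets a different tag size than one with format_version 2, so walking the tags
in a descriptor block lands on the wrong offsets.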
    
    At this point the filesystem is corrupted: the journal is only partially
    replayed, and the new journal sequence number was already used by the
    last mount.
    
    invalidate_bdev() can drop all buffer heads even when racing with a bare
    reader of the block device (e.g. systemd-udev), so fix the problem by
    invalidating the bdev in the error handling path of __ext4_fill_super().
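
The stale-buffer reuse and the effect of invalidating on the error path can be
modelled in a few lines of userspace C. This is a toy sketch under stated
assumptions, not the kernel patch: every name prefixed with model_ is
hypothetical, and the "cache" is a plain array standing in for the block
device's buffer heads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NBLOCKS 8

/* Hypothetical toy buffer head: only the state relevant to this bug. */
struct model_bh { bool cached; bool verified; };
struct model_bdev { struct model_bh cache[MODEL_NBLOCKS]; };

/* Returns the cached buffer if present, otherwise creates a fresh one. */
static struct model_bh *model_getblk(struct model_bdev *bdev, size_t blk)
{
    assert(blk < MODEL_NBLOCKS);
    struct model_bh *bh = &bdev->cache[blk];
    if (!bh->cached) {
        bh->cached = true;
        bh->verified = false;
    }
    return bh;
}

/* Models the superblock parse: skipped when the buffer is already verified. */
static void model_get_superblock(struct model_bh *bh, int *format_version)
{
    if (bh->verified)
        return;              /* stale buffer: format_version stays 0 */
    *format_version = 2;
    bh->verified = true;
}

/* Drops every cached buffer, standing in for invalidating the bdev. */
static void model_invalidate_bdev(struct model_bdev *bdev)
{
    for (size_t i = 0; i < MODEL_NBLOCKS; i++)
        bdev->cache[i] = (struct model_bh){0};
}

/* One mount attempt that fails in recovery; returns the format version seen. */
static int model_failed_mount(struct model_bdev *bdev, bool invalidate_on_error)
{
    int format_version = 0;
    struct model_bh *bh = model_getblk(bdev, 0);

    model_get_superblock(bh, &format_version);
    /* Recovery fails here (e.g. EIO); take the error path. */
    if (invalidate_on_error)
        model_invalidate_bdev(bdev);
    return format_version;
}
```

Without invalidation, the second call to model_failed_mount() on the same
device sees format version 0. With invalidation on the error path, each
attempt starts from a fresh buffer and parses the superblock again.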
    
    A reproducer is available at [Link].
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
    Fixes: 25ed6e8a ("jbd2: enable journal clients to enable v2 checksumming")
    Cc: stable@vger.kernel.org # v3.5
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>