• Jan Kara's avatar
    fs: Don't invalidate page buffers in block_write_full_page() · 6dbf7bb5
    Jan Kara authored
    If block_write_full_page() is called for a page that is beyond current
    inode size, it will truncate page buffers for the page and return 0.
    This logic has been added in 2.5.62 in commit 81eb6906 ("fix ext3
    BUG due to race with truncate") in history.git tree to fix a problem
    with ext3 in data=ordered mode. This particular problem doesn't exist
    anymore because ext3 is long gone and ext4 handles ordered data
    differently. Also normally buffers are invalidated by truncate code and
    there's no need to specially handle this in ->writepage() code.
    
    This invalidation of page buffers in block_write_full_page() is causing
    issues to filesystems (e.g. ext4 or ocfs2) when block device is shrunk
    under filesystem's hands and metadata buffers get discarded while being
    tracked by the journalling layer. Although it is obviously "not
    supported" it can cause kernel crashes like:
    
    [ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at
    +0000000000000008
    [ 7986.697197] PGD 0 P4D 0
    [ 7986.699724] Oops: 0002 [#1] SMP PTI
    [ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G
    +O     --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
    [ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
    [ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
    ...
    [ 7986.810150] Call Trace:
    [ 7986.812595]  __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
    [ 7986.818408]  jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
    [ 7986.836467]  kjournald2+0xbd/0x270 [jbd2]
    
    which is not great. The crash happens because bh->b_private is suddently
    NULL although BH_JBD flag is still set (this is because
    block_invalidatepage() cleared BH_Mapped flag and subsequent bh lookup
    found buffer without BH_Mapped set, called init_page_buffers() which has
    rewritten bh->b_private). So just remove the invalidation in
    block_write_full_page().
    
    Note that the buffer cache invalidation when block device changes size
    is already careful to avoid similar problems by using
    invalidate_mapping_pages() which skips busy buffers so it was only this
    odd block_write_full_page() behavior that could tear down bdev buffers
    under filesystem's hands.
    Reported-by: default avatarYe Bin <yebin10@huawei.com>
    Signed-off-by: default avatarJan Kara <jack@suse.cz>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    CC: stable@vger.kernel.org
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    6dbf7bb5
buffer.c 89.1 KB