    ocfs2: fix defrag path triggering jbd2 ASSERT · 60eed1e3
    Heming Zhao via Ocfs2-devel authored
    code path:
    
    ocfs2_ioctl_move_extents
     ocfs2_move_extents
      ocfs2_defrag_extent
       __ocfs2_move_extent
        + ocfs2_journal_access_di
        + ocfs2_split_extent  //sub-paths call jbd2_journal_restart
        + ocfs2_journal_dirty //crash by jbd2 ASSERT
    
    crash stacks:
    
    PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
     #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
     #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
     #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
     #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
     #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
     #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
     #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
        [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
        RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
        RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
        RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
        RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
        R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
        R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
     #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
     #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
    
    Analysis
    
    This bug has the same root cause as commit 7f27ec97 ("ocfs2: call
    ocfs2_journal_access_di() before ocfs2_journal_dirty() in
    ocfs2_write_end_nolock()").  In this case, jbd2_journal_restart() is
    called by sub-paths of ocfs2_split_extent() during defragmentation,
    between the caller's ocfs2_journal_access_di() and ocfs2_journal_dirty().
    
    How to fix
    
    ocfs2_split_extent() handles all journal operations by itself, so the
    caller does not need the journal access/dirty pair; it only needs the
    journal start/stop pair.  The fix is to remove journal access/dirty
    from __ocfs2_move_extent().
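
    As a rough illustration of the ordering problem, here is a toy
    user-space model.  It is not kernel code: every toy_* name is
    hypothetical, and the real check in jbd2_journal_dirty_metadata() is
    more involved than the transaction-id comparison assumed here.

```c
#include <stdbool.h>

/*
 * Toy user-space model, NOT kernel code. All toy_* names are hypothetical
 * and only mimic the access/restart/dirty ordering described above.
 */
struct toy_handle { int tid; };        /* transaction the handle is attached to */
struct toy_buffer { int access_tid; }; /* transaction that granted write access, -1 if none */

/* Record which transaction granted write access to the buffer. */
static void toy_journal_access(const struct toy_handle *h, struct toy_buffer *b)
{
    b->access_tid = h->tid;
}

/* Restarting moves the handle to a new transaction. */
static void toy_journal_restart(struct toy_handle *h)
{
    h->tid++;
}

/* Real jbd2 would hit an ASSERT; the model only reports whether dirty is legal. */
static bool toy_journal_dirty(const struct toy_handle *h, const struct toy_buffer *b)
{
    return b->access_tid == h->tid;
}

/* Stand-in for the split helper: does its own access/dirty, then restarts. */
static bool toy_split_extent(struct toy_handle *h, struct toy_buffer *b)
{
    toy_journal_access(h, b);
    bool ok = toy_journal_dirty(h, b);
    toy_journal_restart(h);
    return ok;
}

/* Old caller: wraps the helper in its own access/dirty pair. */
bool toy_move_extent_old(void)
{
    struct toy_handle h = { .tid = 1 };
    struct toy_buffer di = { .access_tid = -1 };

    toy_journal_access(&h, &di);
    if (!toy_split_extent(&h, &di))
        return false;
    return toy_journal_dirty(&h, &di); /* stale access: handle was restarted */
}

/* Fixed caller: keeps only start/stop (implicit here); the helper owns access/dirty. */
bool toy_move_extent_fixed(void)
{
    struct toy_handle h = { .tid = 1 };
    struct toy_buffer di = { .access_tid = -1 };

    return toy_split_extent(&h, &di);
}
```

    In this model the old caller's final dirty is rejected because the
    handle moved to a new transaction after the caller's access, while
    the fixed caller never issues a dirty of its own.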
    
    The discussion for this patch:
    https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html
    
    Link: https://lkml.kernel.org/r/20230217003717.32469-1-heming.zhao@suse.com
    Signed-off-by: Heming Zhao <heming.zhao@suse.com>
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Gang He <ghe@suse.com>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>