• Maurizio Lombardi's avatar
    scsi: target: core: Fix hard lockup when executing a compare-and-write command · a72629b5
    Maurizio Lombardi authored
    While handling an I/O completion for the compare portion of a
    COMPARE_AND_WRITE command, it may happen that the
    compare_and_write_callback function submits new bio structs while still in
    softirq context.
    
    Low level drivers like md raid5 do not expect their make_request call to be
    used in softirq context, they call into schedule() and create a deadlocked
    system.
    
     __schedule at ffffffff873a0807
     schedule at ffffffff873a0cc5
     raid5_get_active_stripe at ffffffffc0875744 [raid456]
     raid5_make_request at ffffffffc0875a50 [raid456]
     md_handle_request at ffffffff8713b9f9
     md_make_request at ffffffff8713bacb
     generic_make_request at ffffffff86e6f14b
     submit_bio at ffffffff86e6f27c
     iblock_submit_bios at ffffffffc0b4e4dc [target_core_iblock]
     iblock_execute_rw at ffffffffc0b4f3ce [target_core_iblock]
     __target_execute_cmd at ffffffffc1090079 [target_core_mod]
     compare_and_write_callback at ffffffffc1093602 [target_core_mod]
     target_cmd_interrupted at ffffffffc108d1ec [target_core_mod]
     target_complete_cmd_with_sense at ffffffffc108d27c [target_core_mod]
     iblock_complete_cmd at ffffffffc0b4e23a [target_core_iblock]
     dm_io_dec_pending at ffffffffc00db29e [dm_mod]
     clone_endio at ffffffffc00dbf07 [dm_mod]
     raid5_align_endio at ffffffffc086d6c2 [raid456]
     blk_update_request at ffffffff86e6d950
     scsi_end_request at ffffffff87063d48
     scsi_io_completion at ffffffff87063ee8
     blk_complete_reqs at ffffffff86e77b05
     __softirqentry_text_start at ffffffff876000d7
    
    This problem appears to be an issue between target_cmd_interrupted() and
    compare_and_write_callback(). target_cmd_interrupted() calls the se_cmd's
    transport_complete_callback function pointer if the se_cmd is being stopped
    or aborted, and CMD_T_ABORTED was set on the se_cmd.
    
    When calling compare_and_write_callback(), the success parameter was set to
    false. target_cmd_interrupted() seems to expect this means the callback
    will do cleanup that does not require a process context. But
    compare_and_write_callback() ignores the parameter if there was I/O done
    for the compare part of COMPARE_AND_WRITE.
    
    Since there was data, the function continued on, passed the compare, and
    issued a write while ignoring the value of the success parameter.  The
    submit of a bio for the write portion of the COMPARE_AND_WRITE then causes
    schedule to be unsafely called from the softirq context.
    
    Fix the bug in compare_and_write_callback by jumping to the out label if
    success == "false", after checking if we have been called by
    transport_generic_request_failure(); The command is being aborted or
    stopped so there is no need to submit the write bio for the write part of
    the COMPARE_AND_WRITE command.
    Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
    Link: https://lore.kernel.org/r/20221121092703.316489-1-mlombard@redhat.comReviewed-by: default avatarMike Christie <michael.christie@oracle.com>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    a72629b5
target_core_sbc.c 35.7 KB