1. 03 Aug, 2018 1 commit
    • scsi: cxlflash: Synchronize reset and remove ops · bb7cccb0
      Uma Krishnan authored
      [ Upstream commit a3feb6ef ]
      
      The following Oops can be encountered if a device removal or system shutdown
      is initiated while an EEH recovery is in progress:
      
      [c000000ff2f479c0] c008000015256f18 cxlflash_pci_slot_reset+0xa0/0x100
                                            [cxlflash]
      [c000000ff2f47a30] c00800000dae22e0 cxl_pci_slot_reset+0x168/0x290 [cxl]
      [c000000ff2f47ae0] c00000000003ef1c eeh_report_reset+0xec/0x170
      [c000000ff2f47b20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
      [c000000ff2f47bb0] c00000000003f80c eeh_handle_normal_event+0x56c/0x580
      [c000000ff2f47c60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
      [c000000ff2f47d10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
      [c000000ff2f47dc0] c00000000013da48 kthread+0x1a8/0x1b0
      [c000000ff2f47e30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4
      
      The remove handler frees AFU memory while the EEH recovery is in progress,
      leading to a race condition. This can result in a crash if the recovery thread
      tries to access this memory.
      
      To resolve this issue, the cxlflash remove handler will evaluate the device
      state and yield to any active reset or probing threads.

      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
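
The synchronization described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver's code: `struct dev_ctx`, `dev_remove()`, `reset_thread()` and the state names are hypothetical, and a pthread mutex and condition variable stand in for whatever waiting primitive the kernel driver uses. The point it shows is that the remove path sleeps while the device is in a reset or probing state, so the memory is freed only after the recovery thread has finished with it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device states for this model (names are illustrative). */
enum dev_state { STATE_PROBING, STATE_NORMAL, STATE_RESET, STATE_FAILTERM };

struct dev_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  state_changed;     /* signalled on every state change */
    enum dev_state  state;
    int            *afu_mem;           /* stands in for the AFU memory */
    bool            reset_touched_afu; /* set once recovery has used afu_mem */
};

static void dev_init(struct dev_ctx *d, enum dev_state initial)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->state_changed, NULL);
    d->state = initial;
    d->afu_mem = calloc(1, sizeof(*d->afu_mem));
    assert(d->afu_mem != NULL);
    d->reset_touched_afu = false;
}

/* Recovery path: uses the AFU memory, then leaves the reset state. */
static void *reset_thread(void *arg)
{
    struct dev_ctx *d = arg;

    pthread_mutex_lock(&d->lock);
    *d->afu_mem = 1;                   /* would be a use-after-free if remove ran first */
    d->reset_touched_afu = true;
    d->state = STATE_NORMAL;
    pthread_cond_broadcast(&d->state_changed);
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Remove path: yield to an active reset or probe before freeing anything. */
static void dev_remove(struct dev_ctx *d)
{
    pthread_mutex_lock(&d->lock);
    while (d->state == STATE_RESET || d->state == STATE_PROBING)
        pthread_cond_wait(&d->state_changed, &d->lock);
    d->state = STATE_FAILTERM;         /* no further recovery is expected */
    free(d->afu_mem);
    d->afu_mem = NULL;
    pthread_mutex_unlock(&d->lock);
}
```

In this model, calling `dev_remove()` on a context initialized with `STATE_RESET` blocks until `reset_thread()` has run, regardless of which thread is scheduled first. In the kernel, a wait queue with `wait_event()` is the usual counterpart of the condition-variable loop used here.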
  2. 16 Feb, 2018 1 commit
  3. 25 Aug, 2017 1 commit
  4. 12 Jul, 2017 1 commit
  5. 01 Jul, 2017 3 commits
  6. 26 Jun, 2017 15 commits
  7. 14 Apr, 2017 16 commits
  8. 22 Feb, 2017 1 commit
  9. 12 Jan, 2017 1 commit
    • scsi: cxlflash: Cancel scheduled workers before stopping AFU · 0df5bef7
      Uma Krishnan authored
      When processing an AFU asynchronous interrupt, if the action results in an
      operation that requires off-level processing (a link reset, for example),
      the worker thread is scheduled. In the meantime, a reset event (e.g. EEH)
      could unmap the AFU to recover. This results in an Oops when the worker
      thread tries to access the AFU mapping.
      
      [c000000f17e03b90] d000000007cd5978 cxlflash_worker_thread+0x268/0x550
      [c000000f17e03c40] c00000000011883c process_one_work+0x1dc/0x680
      [c000000f17e03ce0] c000000000118e80 worker_thread+0x1a0/0x520
      [c000000f17e03d80] c000000000126174 kthread+0xf4/0x100
      [c000000f17e03e30] c00000000000a47c ret_from_kernel_thread+0x5c/0xe0
      
      In an effort to avoid this, a mapcount was introduced in
      commit b45cdbaf ("cxlflash: Resolve oops in wait_port_offline")
      but due to the race condition described above, this solution is incomplete.
      
      In order to fully resolve this problem and to simplify things, this commit
      removes the mapcount solution. Instead, the scheduled worker thread is
      cancelled after interrupts have been disabled and prior to the mapping
      being freed.
      
      Fixes: b45cdbaf ("cxlflash: Resolve oops in wait_port_offline")
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
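
The teardown ordering described in this commit message (disable interrupts, cancel the scheduled worker, then free the mapping) can be sketched as a single-threaded userspace model. Everything here is an assumption made for illustration: `struct afu_model`, `afu_async_irq()`, `run_pending_work()` and `afu_stop()` are hypothetical names, and a simple `work_pending` flag stands in for a kernel work item. In the kernel workqueue API, `cancel_work_sync()` is the call that cancels a pending work item and also waits for one that is already running, which this simplified model does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an AFU with one deferred work item. */
struct afu_model {
    int  *mmio;          /* stands in for the AFU mapping; NULL once unmapped */
    bool  irq_enabled;   /* interrupts may schedule work only while true */
    bool  work_pending;  /* worker has been scheduled but has not run yet */
    int   resets_done;   /* number of times the worker body has executed */
};

/* Interrupt path: request off-level processing (e.g. a link reset). */
static bool afu_async_irq(struct afu_model *a)
{
    if (!a->irq_enabled)
        return false;
    a->work_pending = true;
    return true;
}

/* Worker body: touches the mapping, so it must never run after unmap. */
static void afu_worker(struct afu_model *a)
{
    assert(a->mmio != NULL);
    *a->mmio += 1;
    a->resets_done++;
}

/* Stand-in for the workqueue: runs the worker if it is still pending. */
static void run_pending_work(struct afu_model *a)
{
    if (a->work_pending) {
        a->work_pending = false;
        afu_worker(a);
    }
}

/* Teardown in the order the commit message describes. */
static void afu_stop(struct afu_model *a)
{
    a->irq_enabled = false;   /* 1. no new work can be scheduled */
    a->work_pending = false;  /* 2. cancel work that was already scheduled */
    a->mmio = NULL;           /* 3. only now drop the mapping */
}
```

If step 2 were skipped, a later `run_pending_work()` would reach `afu_worker()` with a NULL mapping, which is the model's equivalent of the Oops shown in the trace above.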