    iwlwifi: fix the NMI flow for old devices · a800f958
    Emmanuel Grumbach authored
    I noticed that the flow that triggers an NMI on the firmware
    for old devices (tested on 7265) doesn't work.
    Apparently, the firmware / device is still in low power when
    we write the register that triggers the NMI. We call the
    "grab_nic_access" function to make sure the device is awake
    but that wasn't enough. I played with this and noticed that
    if we wait 1 ms after the device reports it is awake before
    we write to the NMI register, the device always sees our
    write and the firmware gets properly asserted.
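
The ordering described above (wake the device, wait, then write the NMI register) can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: every name here (`sim_dev`, `sim_grab_access`, `sim_force_nmi`, the `settle_us` field) is hypothetical and is not the iwlwifi API, and the simulated clock stands in for a real delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated device; not the real iwlwifi structures. */
struct sim_dev {
    uint64_t now_us;         /* simulated clock, in microseconds */
    bool awake;              /* device reports that it is awake */
    uint64_t awake_since_us; /* simulated time at which it reported awake */
    uint64_t settle_us;      /* assumed time after wake-up before writes land */
    bool nmi_seen;           /* set when the firmware notices the NMI write */
};

/* Advance the simulated clock instead of really sleeping. */
static void sim_delay_us(struct sim_dev *d, uint64_t us)
{
    d->now_us += us;
}

/* Stand-in for "grab NIC access": the device reports awake immediately. */
static bool sim_grab_access(struct sim_dev *d)
{
    if (!d->awake) {
        d->awake = true;
        d->awake_since_us = d->now_us;
    }
    return true;
}

static void sim_release_access(struct sim_dev *d)
{
    d->awake = false;
}

/* The write is only seen if the device has been awake long enough. */
static void sim_write_nmi(struct sim_dev *d)
{
    if (d->awake && d->now_us - d->awake_since_us >= d->settle_us)
        d->nmi_seen = true;
}

/* Wake the device, wait wait_us, then write the NMI register. */
static bool sim_force_nmi(struct sim_dev *d, uint64_t wait_us)
{
    assert(d);
    if (!sim_grab_access(d))
        return false;
    sim_delay_us(d, wait_us);
    sim_write_nmi(d);
    sim_release_access(d);
    return d->nmi_seen;
}
```

With an assumed `settle_us` of 500, calling `sim_force_nmi(&dev, 0)` models the old behaviour (the write is lost), while `sim_force_nmi(&dev, 1000)` models the 1 ms wait described above.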
    
    Triggering an NMI to the firmware can be done with the
    debugfs hook:
    echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:03.0/iwlmvm/fw_nmi
    
    What happened before is that the firmware would just stall
    without running its NMI routine. Because of that the driver
    wouldn't get the "firmware crashed" interrupt. After a while
    the driver would notice that the firmware is not responding
    to some command and it would read the error data from the
    firmware, but this data is populated in the NMI service
    routine in the firmware which was not called. So in the logs
    it looked like:
    
    iwlwifi 0000:00:03.0: Error sending REPLY_ERROR: time out after 2000ms.
    iwlwifi 0000:00:03.0: Current CMD queue read_ptr 33 write_ptr 34
    iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
    iwlwifi 0000:00:03.0: 0x00000000 | ADVANCED_SYSASSERT
    iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status0
    iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
    iwlwifi 0000:00:03.0: 0x00000000 | branchlink2
    iwlwifi 0000:00:03.0: 0x00000000 | interruptlink1
    iwlwifi 0000:00:03.0: 0x00000000 | interruptlink2
    iwlwifi 0000:00:03.0: 0x00000000 | data1
    iwlwifi 0000:00:03.0: 0x00000000 | data2
    iwlwifi 0000:00:03.0: 0x00000000 | data3
    iwlwifi 0000:00:03.0: 0x00000000 | beacon time
    iwlwifi 0000:00:03.0: 0x00000000 | tsf low
    ...
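
A dump like the one above, where every value is zero, is the signature of the NMI service routine never having run. As a rough illustration, a log checker could flag that case as follows. This is a sketch only: the helper names are hypothetical, and the parsing assumes each dump line has the `0x<hex> | <name>` shape shown in these logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a line of the form "... 0x00000000 | name".
 * Returns true and stores the value if the line matches that shape.
 */
static bool parse_dump_value(const char *line, unsigned long *value)
{
    const char *bar = strstr(line, " | ");
    const char *hex = strstr(line, "0x");

    if (!bar || !hex || hex > bar)
        return false;
    return sscanf(hex, "%lx", value) == 1;
}

/*
 * Returns true if at least one dump line was found and all parsed
 * values are zero, i.e. the error data looks unpopulated.
 */
static bool dump_looks_unpopulated(const char *const *lines, size_t n)
{
    size_t parsed = 0;

    assert(lines);
    for (size_t i = 0; i < n; i++) {
        unsigned long v;

        if (!parse_dump_value(lines[i], &v))
            continue; /* not a dump line, e.g. the firmware version */
        if (v != 0)
            return false;
        parsed++;
    }
    return parsed > 0;
}
```

Lines without the `0x<hex> | <name>` shape, such as the timeout message or the firmware version, are skipped rather than treated as errors.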
    
    With this fix, immediately after we trigger the NMI to the
    firmware, we get the expected:
    iwlwifi 0000:00:03.0: Microcode SW error detected.  Restarting 0x2000000.
    iwlwifi 0000:00:03.0: Start IWL Error Log Dump:
    iwlwifi 0000:00:03.0: Status: 0x00000040, count: 6
    iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
    iwlwifi 0000:00:03.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
    iwlwifi 0000:00:03.0: 0x000002F1 | trm_hw_status0
    iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
    iwlwifi 0000:00:03.0: 0x00043D6C | branchlink2
    iwlwifi 0000:00:03.0: 0x0004AFD6 | interruptlink1
    iwlwifi 0000:00:03.0: 0x000008C4 | interruptlink2
    iwlwifi 0000:00:03.0: 0x00000000 | data1
    iwlwifi 0000:00:03.0: 0x00000080 | data2
    iwlwifi 0000:00:03.0: 0x07030000 | data3
    iwlwifi 0000:00:03.0: 0x003FD4C3 | beacon time
    iwlwifi 0000:00:03.0: 0x00C22AC3 | tsf low
    iwlwifi 0000:00:03.0: 0x00000000 | tsf hi
    iwlwifi 0000:00:03.0: 0x00000000 | time gp1
    iwlwifi 0000:00:03.0: 0x00C22AC3 | time gp2
    iwlwifi 0000:00:03.0: 0x00000001 | uCode revision type
    iwlwifi 0000:00:03.0: 0x0000001D | uCode version major
    
    Notice the first line, "Microcode SW error detected.", which
    is printed from the driver's ISR. It means that the driver
    actually got an interrupt from the firmware saying that it
    crashed. The error data that follows is properly populated.
    
    Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
    Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
    Link: https://lore.kernel.org/r/iwlwifi.20210115130252.70e67cc75d88.I6615cad4361862e7f3c9f2d3cafb6a8c61e16781@changeid
iwl-io.c 10.7 KB