    ibmvnic: Do not reset dql stats on NON_FATAL err · 48538ccb
    Nick Child authored
    All ibmvnic resets make a call to netdev_tx_reset_queue() when
    re-opening the device. netdev_tx_reset_queue() resets the num_queued
    and num_completed byte counters. These stats are used in Byte Queue
    Limit (BQL) algorithms. The difference between these two stats tracks
    the number of bytes currently sitting on the physical NIC. ibmvnic
    increases the number of queued bytes through calls to
    netdev_tx_sent_queue() in the driver's xmit function. When VIOS reports
    that it is done transmitting bytes, the ibmvnic device increases the
    number of completed bytes through calls to netdev_tx_completed_queue().
    It is important to note that the driver batches its transmit calls and
    num_queued is increased every time that an skb is added to the next
    batch, not necessarily when the batch is sent to VIOS for transmission.
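
The accounting described above can be sketched as a small userspace
model. This is only an illustration: struct toy_dql and the toy_*
helpers are hypothetical names, not the kernel's struct dql or the
netdev_tx_*() functions, and the check in toy_completed() is assumed
to mirror the invariant that completed bytes never exceed queued bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two BQL byte counters (hypothetical, not struct dql). */
struct toy_dql {
	unsigned int num_queued;    /* bytes handed to the queue so far */
	unsigned int num_completed; /* bytes reported as transmitted */
};

/* Models netdev_tx_sent_queue(): called when an skb joins the batch. */
static void toy_sent(struct toy_dql *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/*
 * Models netdev_tx_completed_queue(). Returns false in the situation
 * where completing would exceed what was queued (assumed to be the
 * condition the kernel treats as a bug), true otherwise.
 */
static bool toy_completed(struct toy_dql *q, unsigned int bytes)
{
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}

/* Bytes counted as queued but not yet completed. */
static unsigned int toy_in_flight(const struct toy_dql *q)
{
	return q->num_queued - q->num_completed;
}
```

In this model, two toy_sent() calls of 100 and 50 bytes leave 150 bytes
in flight, even if neither skb has been handed to VIOS yet.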
    
    Unlike other reset types, a NON_FATAL reset will not flush the sub crq
    tx buffers. Therefore, it is possible for the batched skb array to be
    partially full. So if there is a call to netdev_tx_reset_queue() when
    re-opening the device, the value of num_queued (0) would not account
    for the skbs that are currently batched. Eventually, when the batch
    is sent to VIOS and completes, the call to netdev_tx_completed_queue()
    would increase num_completed to a value greater than num_queued. This
    causes a BUG_ON crash:
    
    ibmvnic 30000002: Firmware reports error, cause: adapter problem.
    Starting recovery...
    ibmvnic 30000002: tx error 600
    ibmvnic 30000002: tx error 600
    ibmvnic 30000002: tx error 600
    ibmvnic 30000002: tx error 600
    ------------[ cut here ]------------
    kernel BUG at lib/dynamic_queue_limits.c:27!
    Oops: Exception in kernel mode, sig: 5
    [....]
    NIP dql_completed+0x28/0x1c0
    LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
    Call Trace:
    ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
    ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
    __handle_irq_event_percpu+0x98/0x270
    ---[ end trace ]---
    
    Therefore, do not reset the dql stats when performing a NON_FATAL reset.
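
The failure and the fix can be modelled in userspace as follows. This
is a sketch under stated assumptions, not the driver code: struct
toy_txq, enum toy_reset and the toy_*() helpers are hypothetical, and
the skip_on_non_fatal flag only stands in for the behaviour change
this patch describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reset reasons; only the NON_FATAL distinction matters. */
enum toy_reset { TOY_RESET_FATAL, TOY_RESET_NON_FATAL };

struct toy_txq {
	unsigned int num_queued;    /* models the dql queued counter */
	unsigned int num_completed; /* models the dql completed counter */
	unsigned int batched_bytes; /* counted as queued, not yet sent */
};

/* xmit: the skb joins the batch and is counted as queued immediately. */
static void toy_xmit(struct toy_txq *q, unsigned int bytes)
{
	q->batched_bytes += bytes;
	q->num_queued += bytes;
}

/* Re-open after a reset. Only non-NON_FATAL resets flush the batch. */
static void toy_reopen(struct toy_txq *q, enum toy_reset reason,
		       bool skip_on_non_fatal)
{
	if (reason != TOY_RESET_NON_FATAL)
		q->batched_bytes = 0;
	else if (skip_on_non_fatal)
		return; /* patched behaviour: keep the counters intact */

	q->num_queued = 0;
	q->num_completed = 0;
}

/*
 * Send the surviving batch and complete it. Returns false where the
 * completed bytes would exceed the queued bytes (the crash case).
 */
static bool toy_flush_and_complete(struct toy_txq *q)
{
	unsigned int bytes = q->batched_bytes;

	q->batched_bytes = 0;
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}
```

With skip_on_non_fatal set to false, a 64-byte skb batched before a
NON_FATAL reset makes toy_flush_and_complete() return false. With it
set to true, the same sequence completes normally.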
    
    Fixes: 0d973388 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
    Signed-off-by: Nick Child <nnac123@linux.ibm.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>