    dmaengine: idxd: fix submission race window · 6b4b87f2
    Dave Jiang authored
    Konstantin observed that when descriptors are submitted, the descriptor is
    added to the pending list after the submission. This creates a race window
    with the slight possibility that the descriptor can complete before it
    gets added to the pending list and this window would cause the completion
    handler to miss processing the descriptor.
    
    To address the issue, the addition of the descriptor to the pending list
    must be done before it gets submitted to the hardware. However, submitting
    to the swq with the ENQCMDS instruction can fail if the wq is either full
    or not "active".
    
    With the descriptor allocation being the gate to the wq capacity, it is not
    possible to hit a retry with ENQCMDS submission to the swq. The only
    failure that can happen is when the wq is no longer "active" due to a hw
    error, and therefore we are moving towards taking down the portal. Given
    this is a rare condition and there's no longer concern over I/O
    performance, the driver can walk the completion lists in order to retrieve
    and abort the descriptor.
    
    The error path will set the descriptor to aborted status. It will take the
    work list lock to prevent further processing of the work list. It will do a
    delete_all on the pending llist to retrieve all descriptors on the pending
    llist. The delete_all action does not require a lock. It will walk through
    the acquired llist to find the aborted descriptor while adding all remaining
    descriptors to the work list, since it holds the lock. If it does not find
    the aborted descriptor on the llist, it will walk through the work
    list. And if it still does not find the descriptor, then it means the
    interrupt handler has removed the desc from the llist but is pending on
    the work list lock and will process it once the error path releases the
    lock.
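    
    The ordering and the error path described above can be sketched in plain
    userspace C. This is a minimal illustration, not the driver code: the
    types and helpers (struct desc, struct queue, push, take_all, unlink_desc,
    abort_desc, submit_desc) are hypothetical names, the lists are simple
    singly linked lists standing in for the pending llist and the work list,
    the "active" flag stands in for an ENQCMDS failure, and the work list lock
    is only marked by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum desc_state { DESC_PENDING, DESC_ABORTED };

struct desc {
	int id;
	enum desc_state state;
	struct desc *next;	/* link in whichever list currently holds it */
};

struct queue {
	struct desc *pending;	/* stand-in for the lock-free pending llist */
	struct desc *work;	/* stand-in for the lock-protected work list */
	bool active;		/* false once the wq is disabled by a hw error */
};

static void push(struct desc **head, struct desc *d)
{
	d->next = *head;
	*head = d;
}

/* Detach the whole list in one step, like a delete_all on an llist. */
static struct desc *take_all(struct desc **head)
{
	struct desc *list = *head;

	*head = NULL;
	return list;
}

/* Unlink target from the list at head; return true if it was there. */
static bool unlink_desc(struct desc **head, struct desc *target)
{
	for (struct desc **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			target->next = NULL;
			return true;
		}
	}
	return false;
}

/*
 * Error path: mark the descriptor aborted, drain the pending list, move
 * every other descriptor to the work list, then fall back to searching
 * the work list. Returns false if the descriptor is on neither list,
 * which in the scenario above means the completion handler already owns it.
 */
static bool abort_desc(struct queue *q, struct desc *target)
{
	bool found = false;

	assert(q && target);
	target->state = DESC_ABORTED;

	/* the work list lock would be taken here */
	struct desc *d = take_all(&q->pending);
	while (d) {
		struct desc *next = d->next;

		if (d == target) {
			d->next = NULL;
			found = true;
		} else {
			push(&q->work, d);
		}
		d = next;
	}
	if (!found)
		found = unlink_desc(&q->work, target);
	/* the work list lock would be released here */

	return found;
}

/* Submission: queue on the pending list first, then submit. */
static int submit_desc(struct queue *q, struct desc *d)
{
	d->state = DESC_PENDING;
	push(&q->pending, d);		/* visible to the handler before hw sees it */

	if (!q->active) {		/* stand-in for a failed ENQCMDS */
		abort_desc(q, d);
		return -1;
	}
	return 0;
}
```

    In this sketch a caller would treat a -1 return from submit_desc() as
    "descriptor aborted, wq going down". Descriptors that were pending at
    that moment end up on the work list, so a later completion pass can still
    find them.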
    
    Fixes: eb15e715 ("dmaengine: idxd: add interrupt handle request and release support")
    Reported-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/162628855747.360485.10101925573082466530.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
submit.c 4.77 KB