    usb: dwc2: host: Don't retry NAKed transactions right away · 38d2b5fb
    Douglas Anderson authored
    On rk3288-veyron devices on Chrome OS it was found that plugging in an
    Arduino-based USB device could cause the system to lock up, especially
    if the CPU frequency was at one of the slower operating points (like
    100 MHz / 200 MHz).
    
    Upon tracing, I found that the following was happening:
    * The USB device (full speed) was connected to a high speed hub and
      then to the rk3288.  Thus, we were dealing with split transactions,
      which are all handled in software on dwc2.
    * Userspace was initiating a BULK IN transfer
    * When we sent the SSPLIT (to start the split transaction), we got an
      ACK.  Good.  Then we issued the CSPLIT.
    * When we sent the CSPLIT, we got back a NAK.  We immediately (from
      the interrupt handler) started to retry and sent another SSPLIT.
    * The device kept NAKing our CSPLIT, so we kept ping-ponging between
      sending a SSPLIT and a CSPLIT, each time sending from the interrupt
      handler.
    * The handling of the interrupts (because of the low CPU speed and
      the inefficiency of the dwc2 interrupt handler) was actually taking
      _longer_ than it took the other side to send the ACK/NAK.  Thus we
      were _always_ in the USB interrupt routine.
    * The fact that USB interrupts were always going off was preventing
      other things from happening in the system.  This included preventing
      the system from being able to transition to a higher CPU frequency.
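
A toy model of the sequence above, as a sketch only: it assumes each retry is issued from the interrupt handler, that the next interrupt becomes pending response_us later, and that the handler itself runs for handler_us. The function name, its parameters, and the numbers in the usage note are hypothetical and not measurements from the rk3288. The extra_delay_us parameter models an optional pause before the retry is issued.

```c
#include <stdint.h>

/*
 * Toy model (not driver code): percentage of CPU time spent in the USB
 * interrupt handler while a device keeps NAKing.
 *
 * handler_us     - time one pass through the interrupt handler takes
 * response_us    - time until the ACK/NAK for the retry raises an interrupt
 * extra_delay_us - optional pause inserted before the retry is issued
 */
static uint32_t irq_load_percent(uint32_t handler_us, uint32_t response_us,
                                 uint32_t extra_delay_us)
{
    uint64_t idle_us = extra_delay_us;

    if (handler_us == 0)
        return 0;

    /* The CPU only gets a gap if the response is slower than the handler. */
    if (response_us > handler_us)
        idle_us += response_us - handler_us;

    return (uint32_t)(((uint64_t)handler_us * 100u) /
                      ((uint64_t)handler_us + idle_us));
}
```

With a handler slower than the response, for example irq_load_percent(30, 20, 0), the model reports 100: the next interrupt is already pending when the handler returns, which matches the "always in the USB interrupt routine" observation.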
    
    As I understand it, there is no requirement to retry super quickly
    after a NAK; we just have to retry sometime in the future.  Thus one
    solution to the above is to just add a delay between getting a NAK and
    retrying the transmission.  If this delay is sufficiently long to get
    out of the interrupt routine then the rest of the system will be able
    to make forward progress.  Even a 25 us delay would probably be
    enough, but we'll be extra conservative and try to delay 1 ms (the
    exact amount depends on HZ and the accuracy of the jiffy and how close
    the current jiffy is to ticking, but could be as much as 20 ms or as
    little as 1 ms).
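
To illustrate why a requested 1 ms delay depends on HZ, here is a userspace sketch of a round-up conversion from milliseconds to timer ticks. It is a stand-in written for this explanation, not the kernel's own helper; the names ms_to_ticks and ticks_to_us and the hz parameter are hypothetical.

```c
#include <stdint.h>

/*
 * Userspace stand-in for a ms-to-jiffies conversion: round up so that a
 * non-zero request never becomes zero ticks.  "hz" plays the role of the
 * kernel's HZ (ticks per second).
 */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t hz)
{
    return (uint32_t)(((uint64_t)ms * hz + 999u) / 1000u);
}

/* Length of "ticks" timer ticks in microseconds at the given tick rate. */
static uint32_t ticks_to_us(uint32_t ticks, uint32_t hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / hz);
}
```

In this sketch a 1 ms request is one tick at any of the usual rates, but that tick spans 10000 us at hz = 100 and only 1000 us at hz = 1000. When the timer actually fires also depends on how far the current tick has already progressed.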
    
    Presumably adding a delay like this could impact the USB throughput,
    so we only add the delay with repeated NAKs.
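
A minimal sketch of the "only delay after repeated NAKs" policy, assuming a per-endpoint counter of consecutive NAKs. The threshold value and every name below (NAKS_BEFORE_DELAY, struct nak_state, should_delay_retry) are assumptions made for illustration and are not taken from the dwc2 sources.

```c
#include <stdbool.h>

/* Hypothetical threshold: this many NAKs in a row are retried immediately. */
#define NAKS_BEFORE_DELAY 3

enum xfer_result { XFER_ACK, XFER_NAK };

/* Hypothetical per-endpoint state; zero-initialize before first use. */
struct nak_state {
    unsigned int consecutive_naks;
};

/*
 * Returns true when the next retry should be deferred (for example via a
 * timer) instead of being issued straight from the interrupt handler.
 */
static bool should_delay_retry(struct nak_state *s, enum xfer_result result)
{
    if (result != XFER_NAK) {
        /* Any non-NAK result resets the streak. */
        s->consecutive_naks = 0;
        return false;
    }

    if (s->consecutive_naks < NAKS_BEFORE_DELAY) {
        s->consecutive_naks++;
        return false;   /* still within the fast-retry budget */
    }

    return true;        /* repeated NAKs: back off before retrying */
}
```

A device that NAKs only occasionally never reaches the threshold, so its throughput is unaffected in this sketch; only a device that NAKs continuously gets the delayed retries.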
    
    NOTE: Upon further testing of a pl2303 serial adapter, I found that
    this fix may help with problems there.  Specifically I found that the
    pl2303 serial adapters tend to respond with a NAK when they have
    nothing to say and thus we end up with this same sequence.
    
    Signed-off-by: Douglas Anderson <dianders@chromium.org>
    Reviewed-by: Julius Werner <jwerner@chromium.org>
    Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
    Acked-by: John Youn <johnyoun@synopsys.com>
    Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
hcd_queue.c 64.5 KB