1. 04 Dec, 2018 3 commits
    • spi: bcm2835: Synchronize with callback on DMA termination · 2527704d
      Lukas Wunner authored
      Commit b36f09c3 ("dmaengine: Add transfer termination
      synchronization support") deprecated dmaengine_terminate_all() in favor
      of dmaengine_terminate_sync() and dmaengine_terminate_async() to avoid
      freeing resources used by the DMA callback before its execution has
      concluded.
      
      Commit de92436a ("dmaengine: bcm2835-dma: Use vchan_terminate_vdesc()
      instead of desc_free") amended the BCM2835 DMA driver with an
      implementation of ->device_synchronize(), which is a prerequisite for
      dmaengine_terminate_sync().  Thus, clients of the DMA driver (such as
      the BCM2835 SPI driver) may now be converted to the new API.
      
      It is generally desirable to use the _sync() variant except in atomic
      context.  There is only a single occurrence where the BCM2835 SPI driver
      calls dmaengine_terminate_all() in atomic context and that is in
      bcm2835_spi_dma_done() (the RX DMA channel's callback) to terminate the
      TX DMA channel.  The TX DMA channel doesn't have a callback (yet), hence
      it is safe to use the _async() variant there.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Cc: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: bcm2835: Speed up FIFO access if fill level is known · 2e0733bc
      Lukas Wunner authored
      The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
      (16 32-bit dwords).  The CS register provides hints on their fill level:
      
         "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
          0 = RX FIFO is less than [¾] full (or not active TA = 0).
          1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
              data from the RX FIFO or setting TA to 0."
      
         "Bit 16  DONE - Transfer Done
          0 = Transfer is in progress (or not active TA = 0).
          1 = Transfer is complete. Cleared by writing more data to the
              TX FIFO or setting TA to 0."
      
         "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
          If RXR is set read 12 [dwords] data from SPI_FIFO."
      
         [Source: Pages 153, 154 and 158 of
          https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
          Note: The spec is missing the "¾" character, presumably due to
          copy-pasting from a different charset.  It also incorrectly
          refers to 16 and 12 "bytes" instead of 32-bit dwords.]
      
      In short, the RXR bit indicates that 48 bytes can be read and the DONE
      bit indicates 64 bytes can be written.  Leverage this knowledge to read
      or write bytes blindly to the FIFO, without polling whether data can be
      read or free space is available to write.  Moreover, when a transfer is
      starting, the TX FIFO is known to be empty, likewise allowing a blind
      write of 64 bytes.
      
      This cuts the number of bus accesses in half if the fill level is known.
      Also, the (posted) write accesses can be pipelined on the AXI bus since
      they are no longer interleaved with (non-posted) reads.
      
      bcm2835_spi_transfer_one_poll() switches to interrupt mode when a time
      limit is exceeded by calling bcm2835_spi_transfer_one_irq().  The TX
      FIFO may contain data in this case, but is known to be empty when the
      function is called from bcm2835_spi_transfer_one().  Hence only blindly
      fill the TX FIFO in the latter case but not the former.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Tested-by: Eric Anholt <eric@anholt.net>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
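The halving of bus accesses described in the commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver's code: `struct mock_spi`, `wr_fifo_polled()` and `wr_fifo_blind()` are hypothetical names, and the model assumes one status-register read per byte in the polled case and one byte per FIFO register write.

```c
#include <assert.h>

#define FIFO_SIZE 64u /* bytes per FIFO, per the commit message */

/* Hypothetical model: counts bus accesses instead of touching hardware. */
struct mock_spi {
	unsigned int cs_reads;    /* non-posted reads of the CS register */
	unsigned int fifo_writes; /* posted writes to the FIFO register */
	unsigned int tx_len;      /* bytes still to be sent */
};

/* Polled write: check the status register before every byte. */
static void wr_fifo_polled(struct mock_spi *s, unsigned int room)
{
	while (s->tx_len && room) {
		s->cs_reads++;    /* assumed: one CS read to test for free space */
		s->fifo_writes++; /* one byte written to the FIFO */
		s->tx_len--;
		room--;
	}
}

/* Blind write: the caller knows that "count" bytes fit, so no polling. */
static void wr_fifo_blind(struct mock_spi *s, unsigned int count)
{
	assert(count <= FIFO_SIZE);
	if (count > s->tx_len)
		count = s->tx_len;
	s->fifo_writes += count;
	s->tx_len -= count;
}
```

With 100 bytes pending and an empty 64-byte TX FIFO, the polled variant performs 128 bus accesses in this model (64 reads plus 64 writes), whereas the blind variant performs 64 writes and no reads.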
    • spi: bcm2835: Polish transfer of DMA prologue · b31a9299
      Lukas Wunner authored
      Commit 3bd7f658 ("spi: bcm2835: Overcome sglist entry length
      limitation") was unfortunately merged even though submission of a
      refined version was imminent.  Apply those refinements as an amendment:
      
      * Drop no longer needed #include <asm/page.h>.  The lines requiring
        its inclusion were removed by the commit.
      
      * Change type of tx_spillover flag from bool to unsigned int for
        consistency with dma_pending flag and pursuant to Linus' dictum:
        https://lkml.org/lkml/2017/11/21/384
      
      * In bcm2835_rd_fifo_count() do not check for bs->rx_buf != NULL.
        The function will never be called if that's the case.
      
      * Amend kerneldoc of bcm2835_wait_tx_fifo_empty() to prevent its use in
        situations where the function might spin forever.  (In response to a
        review comment by Stefan Wahren.)
      
      * Sync only the cacheline containing the RX prologue back to memory,
        not the full first sglist entry.
      
      * Use sg_dma_address() and sg_dma_len() instead of referencing the
        sglist entry members directly.  This seems to be the more common
        syntax in the tree, even for lvalues.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
  2. 03 Dec, 2018 3 commits
  3. 29 Nov, 2018 3 commits
  4. 28 Nov, 2018 7 commits
    • spi: bcm2835: Overcome sglist entry length limitation · 3bd7f658
      Lukas Wunner authored
      When in DMA mode, the BCM2835 SPI controller requires that the FIFO is
      accessed in 4 byte chunks.  This rule is not fulfilled if a transfer
      consists of multiple sglist entries, one per page, and the first entry
      starts in the middle of a page with an offset not a multiple of 4.
      
      The driver currently falls back to programmed I/O for such transfers,
      incurring a significant performance penalty.
      
      Overcome this hardware limitation by transferring the first few bytes of
      a transfer without DMA such that the remainder of the first sglist entry
      becomes a multiple of 4.  Specifics are provided in kerneldoc comments.
      
      An alternative approach would have been to split transfers in the
      ->prepare_message hook, but this may necessitate two transfers per page,
      defeating the goal of clustering multiple pages together in a single
      transfer for efficiency.  E.g. if the first TX sglist entry's length is
      23 and the first RX's is 40, the first transfer would send and receive
      23 bytes, the second 40 - 23 = 17 bytes, the third 4096 - 17 = 4079
      bytes, the fourth 4096 - 4079 = 17 bytes and so on.  In other words,
      O(n) transfers are necessary (n = number of sglist entries), whereas
      the algorithm implemented herein only requires O(1) additional work.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
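The arithmetic in the commit above can be checked with a short user-space sketch. It rests on assumptions: `prologue_len()`, `split_lengths()` and `PAGE_SZ` are hypothetical names, the page size is taken to be 4096 bytes as in the commit's example, and the prologue is modelled simply as the first entry's length modulo 4. The driver's kerneldoc remains the reference for the actual algorithm.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u /* assumed page size, as in the commit's example */

/*
 * Bytes to transfer without DMA so that the remainder of the first
 * sglist entry is a multiple of 4.
 */
static unsigned int prologue_len(unsigned int first_entry_len)
{
	return first_entry_len % 4;
}

/*
 * Rejected alternative: split the transfer at every sglist entry
 * boundary of either direction.  Writes up to n transfer lengths to
 * out[] and returns how many were written.
 */
static size_t split_lengths(unsigned int tx_first, unsigned int rx_first,
			    unsigned int total, unsigned int *out, size_t n)
{
	unsigned int pos = 0, tx_next = tx_first, rx_next = rx_first;
	size_t i = 0;

	assert(tx_first > 0 && rx_first > 0);
	while (pos < total && i < n) {
		unsigned int next = tx_next < rx_next ? tx_next : rx_next;

		if (next > total)
			next = total;
		out[i++] = next - pos;
		pos = next;
		if (tx_next <= pos)
			tx_next += PAGE_SZ; /* subsequent entries span a page */
		if (rx_next <= pos)
			rx_next += PAGE_SZ;
	}
	return i;
}
```

For first entries of 23 bytes (TX) and 40 bytes (RX), `split_lengths()` yields 23, 17, 4079, 17 and so on, matching the sequence in the commit message, while `prologue_len(23)` is 3 and `prologue_len(40)` is 0.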
    • spi: bcm2835: Document struct bcm2835_spi · acf0f856
      Lukas Wunner authored
      Document the driver's data structure to lower the barrier to entry for
      contributors.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: bcm2835: Drop unused code for native Chip Select · 5c09e42f
      Lukas Wunner authored
      Commit a30a555d ("spi: bcm2835: transform native-cs to gpio-cs on
      first spi_setup") disabled the use of hardware-controlled native Chip
      Select in favour of software-controlled GPIO Chip Select but left code
      to support the former untouched.  Remove it to simplify the driver and
      ease the addition of new features and further optimizations.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • Mark Brown
      c06eea7d
    • spi: bcm2835: Fix race on DMA termination · e82b0b38
      Lukas Wunner authored
      If a DMA transfer completes normally just as spi_transfer_one_message()
      determines that it has timed out, the callbacks bcm2835_spi_dma_done()
      and bcm2835_spi_handle_err() race to call dmaengine_terminate_all(),
      potentially leading to double termination.
      
      Prevent this by atomically changing the dma_pending flag before calling
      dmaengine_terminate_all().
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
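The idea of atomically changing the flag before terminating can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the kernel code: `struct mock_spi` and `terminate_once()` are hypothetical, the counter stands in for the call to dmaengine_terminate_all(), and C11 `atomic_compare_exchange_strong()` stands in for whichever atomic primitive the driver uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-controller state. */
struct mock_spi {
	atomic_uint dma_pending;      /* 1 while a DMA transfer is in flight */
	unsigned int terminate_calls; /* counts simulated terminations */
};

/*
 * Called by both racing paths (completion callback and error handler).
 * Only the caller that flips dma_pending from 1 to 0 terminates DMA.
 */
static bool terminate_once(struct mock_spi *bs)
{
	unsigned int expected = 1;

	if (!atomic_compare_exchange_strong(&bs->dma_pending, &expected, 0))
		return false; /* the other path already claimed termination */

	bs->terminate_calls++; /* stands in for dmaengine_terminate_all() */
	return true;
}
```

Whichever path calls `terminate_once()` first wins the compare-and-exchange; the second call sees the flag already cleared and returns without terminating again.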
    • spi: bcm2835: Fix book-keeping of DMA termination · dbc94411
      Lukas Wunner authored
      If submission of a DMA TX transfer succeeds but submission of the
      corresponding RX transfer does not, the BCM2835 SPI driver terminates
      the TX transfer but neglects to reset the dma_pending flag to false.
      
      Thus, if the next transfer uses interrupt mode (because it is shorter
      than BCM2835_SPI_DMA_MIN_LENGTH) and runs into a timeout,
      dmaengine_terminate_all() will be called both for TX (once more) and
      for RX (which was never started in the first place).  Fix it.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode · 56c17234
      Lukas Wunner authored
      The IRQ handler bcm2835_spi_interrupt() first reads as much as possible
      from the RX FIFO, then writes as much as possible to the TX FIFO.
      Afterwards it decides whether the transfer is finished by checking if
      the TX FIFO is empty.
      
      If very few bytes were written to the TX FIFO, they may already have
      been transmitted by the time the FIFO's emptiness is checked.  As a
      result, the transfer will be declared finished and the chip will be
      reset without reading the corresponding received bytes from the RX FIFO.
      
      The odds of this happening increase with a high clock frequency (such
      that the TX FIFO drains quickly) and either passing "threadirqs" on the
      command line or enabling CONFIG_PREEMPT_RT_BASE (such that the IRQ
      handler may be preempted between filling the TX FIFO and checking its
      emptiness).
      
      Fix by instead checking whether rx_len has reached zero, which means
      that the transfer has been received in full.  This is also more
      efficient as it avoids one bus read access per interrupt.  Note that
      bcm2835_spi_transfer_one_poll() likewise uses rx_len to determine
      whether the transfer has finished.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Fixes: e34ff011 ("spi: bcm2835: move to the transfer_one driver model")
      Cc: stable@vger.kernel.org # v4.1+
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Frank Pavlic <f.pavlic@kunbus.de>
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
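The premature finish described in the commit above can be reproduced in a toy user-space model. It is a sketch under assumptions: `struct mock_xfer` and all functions below are hypothetical, the controller is modelled as receiving exactly one byte per byte sent, and the RX FIFO capacity is not modelled.

```c
#include <stdbool.h>

#define FIFO_SIZE 64u /* bytes per FIFO, per the datasheet excerpt */

/* Hypothetical transfer state; no hardware is involved. */
struct mock_xfer {
	unsigned int tx_len;  /* bytes not yet written to the TX FIFO */
	unsigned int rx_len;  /* bytes not yet read from the RX FIFO */
	unsigned int tx_fifo; /* bytes currently in the TX FIFO */
	unsigned int rx_fifo; /* bytes currently in the RX FIFO */
};

/* One pass of the handler: drain the RX FIFO, then fill the TX FIFO. */
static void irq_pass(struct mock_xfer *x)
{
	unsigned int n = FIFO_SIZE - x->tx_fifo;

	x->rx_len -= x->rx_fifo;
	x->rx_fifo = 0;
	if (n > x->tx_len)
		n = x->tx_len;
	x->tx_fifo += n;
	x->tx_len -= n;
}

/* The controller shifts out the TX FIFO; each byte sent yields one received. */
static void clock_out(struct mock_xfer *x)
{
	x->rx_fifo += x->tx_fifo;
	x->tx_fifo = 0;
}

/* Old completion test: TX FIFO is empty. */
static bool done_by_tx_empty(const struct mock_xfer *x)
{
	return x->tx_fifo == 0;
}

/* New completion test: everything has been received. */
static bool done_by_rx_len(const struct mock_xfer *x)
{
	return x->rx_len == 0;
}
```

For a 2-byte transfer, calling `irq_pass()` and then `clock_out()` before the completion check leaves the TX FIFO empty while 2 received bytes are still unread: `done_by_tx_empty()` reports completion, whereas `done_by_rx_len()` does so only after the next `irq_pass()` has drained the RX FIFO.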
  5. 27 Nov, 2018 4 commits
  6. 23 Nov, 2018 1 commit
  7. 17 Nov, 2018 1 commit
  8. 16 Nov, 2018 2 commits
  9. 15 Nov, 2018 1 commit
  10. 14 Nov, 2018 2 commits
  11. 13 Nov, 2018 9 commits
  12. 07 Nov, 2018 3 commits
  13. 06 Nov, 2018 1 commit