1. 12 Mar, 2020 10 commits
  2. 11 Mar, 2020 6 commits
  3. 10 Mar, 2020 7 commits
  4. 06 Mar, 2020 1 commit
  5. 05 Mar, 2020 16 commits
    • Merge series "TCFQ to XSPI migration for NXP DSPI driver" from Vladimir Oltean <olteanv@gmail.com> · 4a8ee2ab
      Mark Brown authored
      Vladimir Oltean <vladimir.oltean@nxp.com>:
      
      From: Vladimir Oltean <vladimir.oltean@nxp.com>
      
      This series aims to remove the most inefficient transfer method from the
      NXP DSPI driver.
      
      TCFQ (Transfer Complete Flag) mode works by transferring one word,
      waiting for its TX confirmation interrupt (or polling on the equivalent
      status bit), sending the next word, etc, until the buffer is complete.
      
      The issue with this mode is that it's fundamentally incompatible with
      any sort of batching such as writing to a FIFO. But actually, due to
      previous patchset ("Compatible string consolidation for NXP DSPI driver"):
      
      https://patchwork.kernel.org/cover/11414593/
      
      all existing users of TCFQ mode today already support a more advanced
      feature set, in the form of XSPI (extended SPI). XSPI brings 2 extra
      features:
      
      - Word sizes up to 32 bits. This is sub-utilized today, and acceleration
        of smaller-than-32 bpw values is provided.
      - "Command cycling", basically the ability to write multiple words in a
        row and receiving an interrupt only after the completion of the last
        one. This is what enables us to make use of the full FIFO depth of
        this controller.
      
      Series was tested on the NXP LS1021A-TSN and LS1043A-RDB boards, both
      functionally as well as from a performance standpoint.
      
      The command used to benchmark the increased throughput was:
      
      spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 20000000
      
      where spidev1.0 is a dummy spidev node, using a chip select that no
      peripheral responds to.
      
      On LS1021A, which has a 4-entry-deep FIFO and a less powerful CPU, the
      performance increase brought by this patchset is from 2700 kbps to 5800
      kbps.
      
      On LS1043A, which has a 16-entry-deep FIFO and a more powerful CPU, the
      performance increases from 4100 kbps to 13700 kbps.
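
      As a quick check of the figures quoted above, the throughput ratios
      can be worked out as below. This is only arithmetic on the quoted
      numbers; the names `measurements` and `speedup` are illustrative.

```python
# Throughput figures quoted above, in kbps: (before, after) per board.
measurements = {
    "LS1021A": (2700, 5800),   # 4-entry-deep FIFO
    "LS1043A": (4100, 13700),  # 16-entry-deep FIFO
}


def speedup(before_kbps: float, after_kbps: float) -> float:
    """Return the throughput ratio after/before."""
    return after_kbps / before_kbps


speedups = {board: speedup(*pair) for board, pair in measurements.items()}

for board, ratio in speedups.items():
    print(f"{board}: {ratio:.2f}x")
```

      The board with the deeper FIFO gains proportionally more, which is
      consistent with batching being the source of the improvement.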
      
      On average, SPI software timestamping is not adversely affected by the
      extra batching, due to the extra patches.
      
      There is one extra patch which clarifies why the TCFQ users were not
      converted to the "other" mode in this driver that makes use of the FIFO,
      which would be EOQ mode.
      
      My request to the many people on CC (known users and/or contributors) is
      to give this series a test to ensure there are no regressions, and for
      the Coldfire maintainers to clarify whether the EOQ limitation is
      acceptable for them in the long run.
      
      Vladimir Oltean (12):
        spi: spi-fsl-dspi: Simplify bytes_per_word gymnastics
        spi: spi-fsl-dspi: Remove unused chip->void_write_data
        spi: spi-fsl-dspi: Don't mask off undefined bits
        spi: spi-fsl-dspi: Add comments around dspi_pop_tx and dspi_push_rx
          functions
        spi: spi-fsl-dspi: Rename fifo_{read,write} and {tx,cmd}_fifo_write
        spi: spi-fsl-dspi: Implement .max_message_size method for EOQ mode
        spi: Do spi_take_timestamp_pre for as many times as necessary
        spi: spi-fsl-dspi: Convert TCFQ users to XSPI FIFO mode
        spi: spi-fsl-dspi: Accelerate transfers using larger word size if
          possible
        spi: spi-fsl-dspi: Optimize dspi_setup_accel for lowest interrupt
          count
        spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode
        spi: spi-fsl-dspi: Take software timestamp in dspi_fifo_write
      
       drivers/spi/spi-fsl-dspi.c | 421 ++++++++++++++++++++++++-------------
       drivers/spi/spi.c          |  19 +-
       include/linux/spi/spi.h    |   3 +-
       3 files changed, 288 insertions(+), 155 deletions(-)
      
      --
      2.17.1
    • dt-bindings: spi: spi-rockchip: add description for rk3328 · 6ac12131
      Johan Jonker authored
      The description below is already in use for rk3328.dtsi,
      but was somehow never added to a document, so add
      "rockchip,rk3328-spi", "rockchip,rk3066-spi"
      for spi nodes on a rk3328 platform to spi-rockchip.yaml.
      Signed-off-by: Johan Jonker <jbx6244@gmail.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20200304184203.9548-3-jbx6244@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • dt-bindings: spi: spi-rockchip: add description for rk3308 · db7dd939
      Johan Jonker authored
      The description below is already in use for rk3308.dtsi,
      but was somehow never added to a document, so add
      "rockchip,rk3308-spi", "rockchip,rk3066-spi"
      for spi nodes on a rk3308 platform to spi-rockchip.yaml.
      Signed-off-by: Johan Jonker <jbx6244@gmail.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20200304184203.9548-2-jbx6244@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • dt-bindings: spi: convert rockchip spi bindings to yaml · 5de04175
      Johan Jonker authored
      Current dts files with 'spi' nodes are manually verified.
      In order to automate this process spi-rockchip.txt
      has to be converted to yaml. In the new setup
      spi-rockchip.yaml will inherit properties from
      spi-controller.yaml.
      
      Add document to MAINTAINERS.
      
      Also rk3188.dtsi, rk3288.dtsi, rk3368.dtsi and rk3399.dtsi
      use an extra fallback string, so change this in the documentation.
      
      Changed:
      "rockchip,rk3188-spi", "rockchip,rk3066-spi"
      "rockchip,rk3288-spi", "rockchip,rk3066-spi"
      "rockchip,rk3368-spi", "rockchip,rk3066-spi"
      "rockchip,rk3399-spi", "rockchip,rk3066-spi"
      Signed-off-by: Johan Jonker <jbx6244@gmail.com>
      Link: https://lore.kernel.org/r/20200304184203.9548-1-jbx6244@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Make bus-num property optional · 29d2daf2
      Sascha Hauer authored
      The SPI bus number is completely optional to Linux, so make the
      corresponding device tree property optional as well.
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Link: https://lore.kernel.org/r/20200305115546.31814-1-s.hauer@pengutronix.de
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-nxp-fspi: Add support for imx8mm, imx8qxp · c7a1a20e
      Adam Ford authored
      Add support for nxp,imx8qxp-fspi and nxp,imx8mm-fspi to the bindings
      document.
      Signed-off-by: Adam Ford <aford173@gmail.com>
      Link: https://lore.kernel.org/r/20200126140913.2139260-4-aford173@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-nxp-fspi: Enable the Octal Mode in MCR0 · b7461fa5
      Han Xu authored
      Apply patch from NXP upstream repo to
      enable the octal combination mode in MCR0.
      Signed-off-by: Adam Ford <aford173@gmail.com>
      Signed-off-by: Han Xu <han.xu@nxp.com>
      Link: https://lore.kernel.org/r/20200126140913.2139260-3-aford173@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: fspi: dynamically alloc AHB memory · d166a735
      Han Xu authored
      Apply patch from NXP upstream repo to
      dynamically allocate AHB memory as needed.
      Signed-off-by: Adam Ford <aford173@gmail.com>
      Signed-off-by: Han Xu <han.xu@nxp.com>
      Link: https://lore.kernel.org/r/20200126140913.2139260-2-aford173@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: fspi: enable fspi on imx8qxp and imx8mm · 941be8a7
      Han Xu authored
      Pull in this patch from NXP's upstream repo to
      enable fspi on imx8qxp and imx8mm
      Signed-off-by: Adam Ford <aford173@gmail.com>
      Signed-off-by: Han Xu <han.xu@nxp.com>
      Link: https://lore.kernel.org/r/20200126140913.2139260-1-aford173@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Take software timestamp in dspi_fifo_write · e9bac900
      Vladimir Oltean authored
      Although the SPI system timestamps are supposed to reflect the moment
      that the peripheral has received a word rather than the moment when the
      CPU has enqueued that word to the FIFO, in practice it is easier to just
      record the latter time than the former (with a smaller error).
      
      With the recent migration of TCFQ users from poll back to interrupt mode
      (this time for XSPI FIFO), it's wiser to keep the interrupt latency
      outside of the measurement of the PTP system timestamp itself. If there
      proves to be any constant offset that requires static compensation, that
      can always be added later. So far that does not appear to be the case at
      least on the LS1021A-TSN board, where testing shows that the phc2sys
      offset is able to remain within +/- 200 ns even after 68 hours of
      testing.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-13-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode · ea93ed4c
      Vladimir Oltean authored
      The EOQ mode has a hardware limitation in that it stops the transmission
      (including the deassertion of the chip select signal) once the host CPU
      requests end-of-queue for a particular word in the TX FIFO.
      
      And XSPI mode has a limitation in that we need a separate CMD FIFO entry
      for the last byte in the buffer, where the chip select signal needs to
      be deasserted. It's not a functional limitation, but it's rather clunky
      and the fact that we need to halt the pipeline and write a single entry
      to the TX FIFO whenever a buffer ends brings the throughput down when
      transmitting small buffers.
      
      So the idea here is to use EOQ's limitation in our favor when using XSPI
      mode. Stop special-casing that final word in the buffer, and just kill
      the chip select signal by issuing an EOQ for that last word. Now it can
      be mixed in with all the other words in the current TX FIFO train.
      
      A small trick here is that we still keep using the XSPI-specific
      signaling via the CMDTCFQ interrupt in RSER, and not enabling the EOQ
      interrupt, in order to avoid hardware weirdness (potential races with
      separate interrupts being raised for CMDTCFQ and EOQ for what is in fact
      the end of the same transmission). That is just theoretical, but it's
      good to be cautious, and the EOQ interrupt isn't needed.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-12-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Optimize dspi_setup_accel for lowest interrupt count · 6365504d
      Vladimir Oltean authored
      Currently, a SPI transfer that is not a multiple of the highest supported
      word width (e.g. 4 bytes) will be transmitted as follows (assume a
      30-byte buffer transmitted through a 32-bit wide FIFO that is 32 bytes
      deep):
      
       - First 28 bytes are sent as 7 words of 32 bits each
       - Last 2 bytes are sent as 1 word of 16 bits size
      
      But if the dspi_setup_accel function had decided to use a lower
      oper_bits_per_word value (16 instead of 32), there would have been
      enough space in the TX FIFO to fit the entire buffer in one go (15 words
      of 16 bits each).
      
      What we're actually trying to avoid is mixing word sizes within the same
      run with the TX FIFO, since there is an erratum surrounding this, and
      invalid data might get transmitted.
      
      So this patch adds special cases for when the remaining length of the
      buffer can be sent in one go as 8-bit or 16-bit words, otherwise it
      falls back to the standard logic of sending as many bytes as possible at
      the highest oper_bits_per_word value possible.
      
      The benefit is that there will be one less CMDTCFQ/EOQ interrupt to
      service when the entire buffer is transmitted during a single go, and
      that will improve the overall latency of the transfer.
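
      The selection rule described above can be modeled roughly as follows.
      This is a simplified sketch, not the driver's code: the function names
      are hypothetical, and it assumes a TX FIFO counted in 16-bit entries
      where an 8-bit or 16-bit word takes one entry and a 32-bit word takes
      two, which is the reading under which the 30-byte example above works
      out (16 entries, 32 bytes).

```python
def greedy_bits(remaining: int, max_bits: int = 32) -> int:
    """Standard logic: the widest word that still fits in the remaining bytes."""
    bits = max_bits
    while bits > 8 and remaining < bits // 8:
        bits //= 2
    return bits


def pick_bits(remaining: int, fifo_entries: int, max_bits: int = 32) -> int:
    """Special-case buffers that fit in one FIFO run as 16-bit or 8-bit words."""
    if remaining % 2 == 0 and remaining <= fifo_entries * 2:
        return 16
    if remaining % 2 == 1 and remaining <= fifo_entries:
        return 8
    return greedy_bits(remaining, max_bits)


def count_runs(length: int, fifo_entries: int, optimized: bool) -> int:
    """Count FIFO runs (one completion interrupt each) needed for `length` bytes."""
    if fifo_entries < 2:
        raise ValueError("model assumes at least two FIFO entries")
    runs = 0
    remaining = length
    while remaining > 0:
        if optimized:
            bits = pick_bits(remaining, fifo_entries)
        else:
            bits = greedy_bits(remaining)
        bytes_per_word = bits // 8
        # Assumption: a 32-bit word occupies two 16-bit FIFO entries.
        entries_per_word = 2 if bits > 16 else 1
        words = min(remaining // bytes_per_word, fifo_entries // entries_per_word)
        remaining -= words * bytes_per_word
        runs += 1
    return runs
```

      Under these assumptions, count_runs(30, 16, optimized=False) gives 2
      runs (7 words of 32 bits, then 1 word of 16 bits), while
      count_runs(30, 16, optimized=True) gives 1 run of 15 words of 16 bits.
      Each run uses a single word size, so sizes are never mixed within a run.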
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-11-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Accelerate transfers using larger word size if possible · 6c1c26ec
      Vladimir Oltean authored
      This patch adds logic in the driver to transmit SPI buffers that use
      bits_per_word=8 with a higher bits_per_word count (multiple of 8).
      
      Currently the following (most common) modes are implemented:
       - 8 bits_per_word on 32-bit capable controllers
       - 8 bits_per_word on 16-bit capable controllers
       - 16 bits_per_word on 32-bit capable controllers
      
      Transfers which are not accelerated are transferred with a hardware
      bits_per_word value equal to the one of the SPI transfer.
      
      The difference from just extending bits_per_word=32 at the spi_device
      driver level is that endianness is different - the SPI core wants to
      treat bits_per_word=32 buffers as arrays of u32 (i.e. words in host CPU
      endianness). So to preserve endianness when clumping 4x8 bits into
      32-bit words, one must perform conversion between CPU and standard (big)
      endianness.
      
      All appearances (both on the wire as well as in the buffers presented to
      the peripheral driver) are preserved, just that accesses to the PUSHR
      and POPR registers are now more efficient, since the same number of
      reads/writes can now carry more data (2x more data on TX, 4x more data
      on RX).
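
      The endianness point can be illustrated with a small model. It is a
      sketch only: pack_bytes and unpack_words are hypothetical helpers, not
      driver functions, and the model assumes each hardware word is shifted
      out most significant bit first.

```python
from typing import List


def pack_bytes(buf: bytes, oper_bits_per_word: int) -> List[int]:
    """Group a bits_per_word=8 buffer into wider big-endian words."""
    step = oper_bits_per_word // 8
    if len(buf) % step:
        raise ValueError("buffer length must be a multiple of the word size")
    return [int.from_bytes(buf[i:i + step], "big") for i in range(0, len(buf), step)]


def unpack_words(words: List[int], oper_bits_per_word: int) -> bytes:
    """Inverse of pack_bytes, as would be needed on the receive side."""
    step = oper_bits_per_word // 8
    return b"".join(word.to_bytes(step, "big") for word in words)


tx = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
words32 = pack_bytes(tx, 32)   # two register writes instead of eight
words16 = pack_bytes(tx, 16)   # four register writes instead of eight
```

      Packing big-endian puts 0x01 in the most significant byte of the first
      word, so under the MSB-first assumption the bytes leave in their
      original order. Reading the same four bytes as a little-endian u32
      would give 0x04030201 instead and reverse them on the wire.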
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-10-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Convert TCFQ users to XSPI FIFO mode · d59c90a2
      Vladimir Oltean authored
      The Transfer Complete Flag (TCF) interrupt gets raised after each write
      to the TX FIFO (PUSHR) which means that it is not possible to devise a
      transfer procedure that makes full utilization of the FIFO depth (4
      entries on most controllers, 16 entries on some).
      
      On the other hand, XSPI mode has a feature called "command cycling",
      which allows a single TX command to be run for a pre-specified number of
      TX words. When the command cycle ends, the Command Transfer Complete
      Flag bit asserts and raises an interrupt. The advantage in this mode is
      that the TX FIFO can be better utilized (more words can be batched at
      once).
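
      A rough way to see the difference is to count completion interrupts
      per buffer. The sketch below is a deliberately simplified model with
      hypothetical function names: it ignores any separate command for the
      final word and any word-size acceleration.

```python
import math


def tcf_interrupts(n_words: int) -> int:
    """TCFQ mode: one Transfer Complete interrupt per word pushed."""
    return n_words


def cmdtcf_interrupts(n_words: int, fifo_depth: int) -> int:
    """XSPI command cycling: one interrupt per FIFO-sized batch of words."""
    return math.ceil(n_words / fifo_depth)


n_words = 256  # e.g. a 256-byte buffer sent at 8 bits per word
per_word = tcf_interrupts(n_words)
batched_depth_4 = cmdtcf_interrupts(n_words, 4)
batched_depth_16 = cmdtcf_interrupts(n_words, 16)
```

      In this model a 256-word buffer costs 256 interrupts in TCFQ mode,
      64 with a 4-entry FIFO and 16 with a 16-entry FIFO.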
      
      Other changes brought by this patch:
       - The dspi->rx_end variable has been removed, since now the
         dspi_fifo_write function sets up dspi->words_in_flight, so
         dspi_fifo_read knows how much to read without overrunning the RX
         buffer.
       - Stop using poll mode unconditionally for TCFQ mode, since XSPI mode
         is a little less efficient than that, and so, poll mode doesn't bring
         as many improvements for XSPI.
       - Stop relying on the hardware transfer counter (SPI_TCR_GET_TCNT) and
         instead increment the message->actual_length based on the newly
         introduced dspi->words_in_flight variable.
       - The CTARE register is now written in the hotpath instead of just at
         transfer init time, since it contains the DTCP field (transfer
         preload - the counter indicating how many txdata words will follow),
         which is a dynamic value.
      
      Due to the fact that the Chip Select toggling setting is part of the
      command written to the TX FIFO, the ending word of each buffer needs to
      be sent via its own TX command, so that we have a chance to emit a
      1-word command with deasserted PCS.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-9-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: Do spi_take_timestamp_pre for as many times as necessary · 6a726824
      Vladimir Oltean authored
      When dealing with a SPI controller driver that is sending more than 1
      byte at once (or the entire buffer at once), and the SPI peripheral
      driver has requested timestamping for a byte in the middle of the
      buffer, we find that spi_take_timestamp_pre never records a "pre"
      timestamp.
      
      This happens because the function currently expects to be called with
      the "progress" argument >= to what the peripheral has requested to be
      timestamped. But clearly there are cases when that isn't going to fly.
      
      And since we can't change the past when we realize that the opportunity
      to take a "pre" timestamp has just passed and there isn't going to be
      another one, the approach taken is to keep recording the "pre" timestamp
      on each call, overwriting the previously recorded one until the "post"
      timestamp is also taken.
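
      The overwrite-until-post behaviour can be sketched with a toy model.
      It is not the SPI core's implementation: the class, its fields and
      the integer clock are all hypothetical, and "progress" is taken to
      mean the number of words handed to the hardware so far.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferTimestamp:
    word: int                       # index of the word to be timestamped
    pre: Optional[int] = None       # last "pre" time recorded
    pre_progress: Optional[int] = None
    post: Optional[int] = None      # "post" time, recorded once
    timestamped: bool = False

    def take_pre(self, progress: int, now: int) -> None:
        """Record "pre" on every call, overwriting, until "post" is taken."""
        if self.timestamped:
            return
        self.pre = now
        self.pre_progress = progress

    def take_post(self, progress: int, now: int) -> None:
        """Record "post" once the requested word is among those sent."""
        if self.timestamped or progress <= self.word:
            return
        self.post = now
        self.timestamped = True


def simulate(n_words: int, batch: int, ts: TransferTimestamp) -> TransferTimestamp:
    """Send n_words in batches, bracketing each batch with pre/post calls."""
    clock = 0
    sent = 0
    while sent < n_words:
        ts.take_pre(sent, clock)
        clock += 1                  # the batch is on the wire
        sent = min(sent + batch, n_words)
        ts.take_post(sent, clock)
        clock += 1
    return ts


result = simulate(n_words=12, batch=4, ts=TransferTimestamp(word=5))
```

      With batches of 4 words and word 5 requested, the "pre" timestamp
      kept is the one taken before the second batch (progress 4), which
      is the batch that actually contains word 5.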
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-8-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-fsl-dspi: Implement .max_message_size method for EOQ mode · a3185c38
      Vladimir Oltean authored
      When it gets set, End Of Queue Flag halts the DSPI controller and forces
      the chip select signal to deassert.
      
      This operating mode is not ideal, but it is used for the DSPI
      instantiations where there is no other notification from the controller
      that the data in the FIFO has finished transmission. So in practice, it
      means that transmitting buffers larger than the FIFO size will yield
      unpredictable results.
      
      The only controller that operates in EOQ mode is MCF5441X (Coldfire). I
      would say that the way EOQ is used (and documented in the reference
      manual, too) on this chip is incorrect, and I would personally migrate
      it to TCFQ, but that's notably worse in terms of performance (it can
      only use 1 entry of the 16-deep FIFO) and if this limitation didn't
      bother any Coldfire DSPI user so far, it's likely that we just need to
      throw an error for larger buffers to make sure that callers are aware
      their transfers are getting truncated/split.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20200304220044.11193-7-olteanv@gmail.com
      Signed-off-by: Mark Brown <broonie@kernel.org>