1. 18 Mar, 2020 10 commits
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Add support for LS1028A · 138f56ef
      Vladimir Oltean authored
      This is similar to the DSPI instantiation on LS1028A, except that:
       - The A-011218 erratum has been fixed, so DMA works
       - The endianness is different, which has implications on XSPI mode
      
      Some benchmarking with the following command:
      
      spidev_test --device /dev/spidev2.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 20000000
      
      shows that in DMA mode, it can achieve around 2400 kbps, and in XSPI
      mode, the same command goes up to 4700 kbps. This is somewhat to be
      expected, since the DMA buffer size is extremely small at 8 bytes, the
      winner becomes whomever can prepare the buffers for transmission
      quicker, and DMA mode has higher overhead there. So XSPI FIFO mode has
      been chosen as the operating mode for this chip.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-11-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      138f56ef
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Move invariant configs out of dspi_transfer_one_message · 5b342c5a
      Vladimir Oltean authored
      The operating mode (DMA, XSPI, EOQ) is not going to change across the
      lifetime of the device. So it makes no sense to keep writing to SPI_RSER
      on each message. Move this configuration to dspi_init instead.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-10-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      5b342c5a
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Fix interrupt-less DMA mode taking an XSPI code path · 826b3a6a
      Vladimir Oltean authored
      Interrupts are not necessary for DMA functionality, since the completion
      event is provided by the DMA driver.
      
      But if the driver fails to request the IRQ defined in the device tree,
      it will call dspi_poll which would make the driver hang waiting for data
      to become available in the RX FIFO.
      
      Fixes: c55be305 ("spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-9-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      826b3a6a
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Avoid NULL pointer in dspi_slave_abort for non-DMA mode · 3d6224e6
      Vladimir Oltean authored
      The driver does not create the dspi->dma structure unless operating in
      DSPI_DMA_MODE, so it makes sense to check for that.
      
      Fixes: f4b32390 ("spi: Introduce dspi_slave_abort() function for NXP's dspi SPI driver")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-8-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      3d6224e6
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion · 4f5ee75e
      Vladimir Oltean authored
      Currently the driver puts the process in interruptible sleep waiting for
      the interrupt train to finish transfer to/from the tx_buf and rx_buf.
      
      But exiting the process with ctrl-c may make the kernel panic: the
      wait_event_interruptible call will return -ERESTARTSYS, which a proper
      driver implementation is perhaps supposed to handle, but nonetheless
      this one doesn't, and aborts the transfer altogether.
      
      Actually when the task is interrupted, there is still a high chance that
      the dspi_interrupt is still triggering. And if dspi_transfer_one_message
      returns execution all the way to the spi_device driver, that can free
      the spi_message and spi_transfer structures, leaving the interrupts to
      access a freed tx_buf and rx_buf.
      
      hexdump -C /dev/mtd0
      00000000  00 75 68 75 0a ff ff ff  ff ff ff ff ff ff ff ff
      |.uhu............|
      00000010  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
      |................|
      *
      ^C[   38.495955] fsl-dspi 2120000.spi: Waiting for transfer to complete failed!
      [   38.503097] spi_master spi2: failed to transfer one message from queue
      [   38.509729] Unable to handle kernel paging request at virtual address ffff800095ab3377
      [   38.517676] Mem abort info:
      [   38.520474]   ESR = 0x96000045
      [   38.523533]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   38.528861]   SET = 0, FnV = 0
      [   38.531921]   EA = 0, S1PTW = 0
      [   38.535067] Data abort info:
      [   38.537952]   ISV = 0, ISS = 0x00000045
      [   38.541797]   CM = 0, WnR = 1
      [   38.544771] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082621000
      [   38.551494] [ffff800095ab3377] pgd=00000020fffff003, p4d=00000020fffff003, pud=0000000000000000
      [   38.560229] Internal error: Oops: 96000045 [#1] PREEMPT SMP
      [   38.565819] Modules linked in:
      [   38.568882] CPU: 0 PID: 2729 Comm: hexdump Not tainted 5.6.0-rc4-next-20200306-00052-gd8730cdc8a0b-dirty #193
      [   38.578834] Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
      [   38.587129] pstate: 20000085 (nzCv daIf -PAN -UAO)
      [   38.591941] pc : ktime_get_real_ts64+0x3c/0x110
      [   38.596487] lr : spi_take_timestamp_pre+0x40/0x90
      [   38.601203] sp : ffff800010003d90
      [   38.604525] x29: ffff800010003d90 x28: ffff80001200e000
      [   38.609854] x27: ffff800011da9000 x26: ffff002079c40400
      [   38.615184] x25: ffff8000117fe018 x24: ffff800011daa1a0
      [   38.620513] x23: ffff800015ab3860 x22: ffff800095ab3377
      [   38.625841] x21: 000000000000146e x20: ffff8000120c3000
      [   38.631170] x19: ffff0020795f6e80 x18: ffff800011da9948
      [   38.636498] x17: 0000000000000000 x16: 0000000000000000
      [   38.641826] x15: ffff800095ab3377 x14: 0720072007200720
      [   38.647155] x13: 0720072007200765 x12: 0775076507750771
      [   38.652483] x11: 0720076d076f0772 x10: 0000000000000040
      [   38.657812] x9 : ffff8000108e2100 x8 : ffff800011dcabe8
      [   38.663139] x7 : 0000000000000000 x6 : ffff800015ab3a60
      [   38.668468] x5 : 0000000007200720 x4 : ffff800095ab3377
      [   38.673796] x3 : 0000000000000000 x2 : 0000000000000ab0
      [   38.679125] x1 : ffff800011daa000 x0 : 0000000000000026
      [   38.684454] Call trace:
      [   38.686905]  ktime_get_real_ts64+0x3c/0x110
      [   38.691100]  spi_take_timestamp_pre+0x40/0x90
      [   38.695470]  dspi_fifo_write+0x58/0x2c0
      [   38.699315]  dspi_interrupt+0xbc/0xd0
      [   38.702987]  __handle_irq_event_percpu+0x78/0x2c0
      [   38.707706]  handle_irq_event_percpu+0x3c/0x90
      [   38.712161]  handle_irq_event+0x4c/0xd0
      [   38.716008]  handle_fasteoi_irq+0xbc/0x170
      [   38.720115]  generic_handle_irq+0x2c/0x40
      [   38.724135]  __handle_domain_irq+0x68/0xc0
      [   38.728243]  gic_handle_irq+0xc8/0x160
      [   38.732000]  el1_irq+0xb8/0x180
      [   38.735149]  spi_nor_spimem_read_data+0xe0/0x140
      [   38.739779]  spi_nor_read+0xc4/0x120
      [   38.743364]  mtd_read_oob+0xa8/0xc0
      [   38.746860]  mtd_read+0x4c/0x80
      [   38.750007]  mtdchar_read+0x108/0x2a0
      [   38.753679]  __vfs_read+0x20/0x50
      [   38.757002]  vfs_read+0xa4/0x190
      [   38.760237]  ksys_read+0x6c/0xf0
      [   38.763471]  __arm64_sys_read+0x20/0x30
      [   38.767319]  el0_svc_common.constprop.3+0x90/0x160
      [   38.772125]  do_el0_svc+0x28/0x90
      [   38.775449]  el0_sync_handler+0x118/0x190
      [   38.779468]  el0_sync+0x140/0x180
      [   38.782793] Code: 91000294 1400000f d50339bf f9405e80 (f90002c0)
      [   38.788910] ---[ end trace 55da560db4d6bef7 ]---
      [   38.793540] Kernel panic - not syncing: Fatal exception in interrupt
      [   38.799914] SMP: stopping secondary CPUs
      [   38.803849] Kernel Offset: disabled
      [   38.807344] CPU features: 0x10002,20006008
      [   38.811451] Memory Limit: none
      [   38.814513] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      So it is clear that the "interruptible" part isn't handled correctly.
      When the process receives a signal, one could either attempt a clean
      abort (which appears to be difficult with this hardware) or just keep
      restarting the sleep until the wait queue really completes. But checking
      in a loop for -ERESTARTSYS is a bit too complicated for this driver, so
      just make the sleep uninterruptible, to avoid all that nonsense.
      
      The wait queue was actually restructured as a completion, after polling
      other drivers for the most "popular" approach.
      
      Fixes: 349ad66c ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
      Reported-by: default avatarMichael Walle <michael@walle.cc>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-7-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      4f5ee75e
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Protect against races on dspi->words_in_flight · 0dedf901
      Vladimir Oltean authored
      dspi->words_in_flight is a variable populated in the *_write functions
      and used in the dspi_fifo_read function. It is also used in
      dspi_fifo_write, immediately after transmission, to update the
      message->actual_length variable used by higher layers such as spi-mem
      for integrity checking.
      
      But it may happen that the IRQ which calls dspi_fifo_read to be
      triggered before the updating of message->actual_length takes place. In
      that case, dspi_fifo_read will decrement dspi->words_in_flight to -1,
      and that will cause an invalid modification of message->actual_length.
      
      For that, we make the simplest fix possible: to not decrement the actual
      shared variable in dspi->words_in_flight from dspi_fifo_read, but
      actually a copy of it which is on stack.
      
      But even if dspi_fifo_read from the next IRQ does not interfere with the
      dspi_fifo_write of the current chunk, the *next* dspi_fifo_write still
      can. So we must assume that everything after the last write to the TX
      FIFO can be preempted by the "TX complete" IRQ, and the dspi_fifo_write
      function must be safe against that. This means refactoring the 2
      flavours of FIFO writes (for EOQ and XSPI) such that the calculation of
      the number of words to be written is common and happens a priori. This
      way, the code for updating the message->actual_length variable works
      with a copy and not with the volatile dspi->words_in_flight.
      
      After some interior debate, the dspi->progress variable used for
      software timestamping was *not* backed up against preemption in a copy
      on stack. Because if preemption does occur between
      spi_take_timestamp_pre and spi_take_timestamp_post, there's really no
      point in trying to save anything. The first-in-time
      spi_take_timestamp_post call with a dspi->progress higher than the
      requested xfer->ptp_sts_word_post will trigger xfer->timestamped = true
      anyway and will close the deal.
      
      To understand the above a bit better, consider a transfer with
      xfer->ptp_sts_word_pre = xfer->ptp_sts_word_post = 3, and
      xfer->bits_per_words = 8 (so byte 3 needs to be timestamped). The DSPI
      controller timestamps in chunks of 4 bytes at a time, and preemption
      occurs in the middle of timestamping the first chunk:
      
        spi_take_timestamp_pre(0)
          .
          . (preemption)
          .
          . spi_take_timestamp_pre(4)
          .
          . spi_take_timestamp_post(7)
          .
        spi_take_timestamp_post(3)
      
      So the reason I'm not bothering to back up dspi->progress for that
      spi_take_timestamp_post(3) is that spi_take_timestamp_post(7) is going
      to (a) be more honest, (b) provide better accuracy and (c) already
      render the spi_take_timestamp_post(3) into a noop by setting
      xfer->timestamped = true anyway.
      
      Fixes: d59c90a2 ("spi: spi-fsl-dspi: Convert TCFQ users to XSPI FIFO mode")
      Reported-by: default avatarMichael Walle <michael@walle.cc>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-6-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      0dedf901
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Avoid reading more data than written in EOQ mode · c6c1e30a
      Vladimir Oltean authored
      If dspi->words_in_flight is populated with the hardware FIFO size,
      then in dspi_fifo_read it will attempt to read more data at the end of a
      buffer that is not a multiple of 16 bytes in length. It will probably
      time out attempting to do so.
      
      So limit the num_fifo_entries variable to the actual number of FIFO
      entries that is going to be used.
      
      Fixes: d59c90a2 ("spi: spi-fsl-dspi: Convert TCFQ users to XSPI FIFO mode")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-5-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      c6c1e30a
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Fix bits-per-word acceleration in DMA mode · a957499b
      Vladimir Oltean authored
      In DMA mode, dspi_setup_accel does not get called, which results in the
      dspi->oper_word_size variable (which is used by dspi_dma_xfer) to not be
      initialized properly.
      
      Because oper_word_size is zero, a few calculations end up being
      incorrect, and the DMA transfer eventually times out instead of sending
      anything on the wire.
      
      Set up native transfers (or 8-on-16 acceleration) using dspi_setup_accel
      for DMA mode too.
      
      Also take the opportunity and simplify the DMA buffer handling a little
      bit.
      
      Fixes: 6c1c26ec ("spi: spi-fsl-dspi: Accelerate transfers using larger word size if possible")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-4-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      a957499b
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Fix little endian access to PUSHR CMD and TXDATA · 671ffde1
      Vladimir Oltean authored
      In XSPI mode, the 32-bit PUSHR register can be written to separately:
      the higher 16 bits are for commands and the lower 16 bits are for data.
      
      This has nicely been hacked around, by defining a second regmap with a
      width of 16 bits, and effectively splitting a 32-bit register into 2
      16-bit ones, from the perspective of this regmap_pushr.
      
      The problem is the assumption about the controller's endianness. If the
      controller is little endian (such as anything post-LS1046A), then the
      first 2 bytes, in the order imposed by memory layout, will actually hold
      the TXDATA, and the last 2 bytes will hold the CMD.
      
      So take the controller's endianness into account when performing split
      writes to PUSHR. The obvious and simple solution would have been to call
      regmap_get_val_endian(), but that is an internal regmap function and we
      don't want to change regmap just for this. Therefore, we just re-read
      the "big-endian" device tree property.
      
      Fixes: 58ba07ec ("spi: spi-fsl-dspi: Add support for XSPI mode registers")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-3-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      671ffde1
    • Vladimir Oltean's avatar
      spi: spi-fsl-dspi: Don't access reserved fields in SPI_MCR · 4fcc7c22
      Vladimir Oltean authored
      The SPI_MCR_PCSIS macro assumes that the controller has a number of chip
      select signals equal to 6. That is not always the case, but actually is
      described through the driver-specific "spi-num-chipselects" device tree
      binding. LS1028A for example only has 4 chip selects.
      
      Don't write to the upper bits of the PCSIS field, which are reserved in
      the reference manual.
      
      Fixes: 349ad66c ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMichael Walle <michael@walle.cc>
      Link: https://lore.kernel.org/r/20200318001603.9650-2-olteanv@gmail.comSigned-off-by: default avatarMark Brown <broonie@kernel.org>
      4fcc7c22
  2. 17 Mar, 2020 1 commit
  3. 13 Mar, 2020 2 commits
  4. 12 Mar, 2020 12 commits
  5. 11 Mar, 2020 6 commits
  6. 10 Mar, 2020 7 commits
  7. 06 Mar, 2020 1 commit
  8. 05 Mar, 2020 1 commit
    • Mark Brown's avatar
      Merge series "TCFQ to XSPI migration for NXP DSPI driver" from Vladimir Oltean <olteanv@gmail.com> · 4a8ee2ab
      Mark Brown authored
      Vladimir Oltean <vladimir.oltean@nxp.com>:
      
      From: Vladimir Oltean <vladimir.oltean@nxp.com>
      
      This series aims to remove the most inefficient transfer method from the
      NXP DSPI driver.
      
      TCFQ (Transfer Complete Flag) mode works by transferring one word,
      waiting for its TX confirmation interrupt (or polling on the equivalent
      status bit), sending the next word, etc, until the buffer is complete.
      
      The issue with this mode is that it's fundamentally incompatible with
      any sort of batching such as writing to a FIFO. But actually, due to
      previous patchset ("Compatible string consolidation for NXP DSPI driver"):
      
      https://patchwork.kernel.org/cover/11414593/
      
      all existing users of TCFQ mode today already support a more advanced
      feature set, in the form of XSPI (extended SPI). XSPI brings 2 extra
      features:
      
      - Word sizes up to 32 bits. This is sub-utilized today, and acceleration
        of smaller-than-32 bpw values is provided.
      - "Command cycling", basically the ability to write multiple words in a
        row and receiving an interrupt only after the completion of the last
        one. This is what enables us to make use of the full FIFO depth of
        this controller.
      
      Series was tested on the NXP LS1021A-TSN and LS1043A-RDB boards, both
      functionally as well as from a performance standpoint.
      
      The command used to benchmark the increased throughput was:
      
      spidev_test --device /dev/spidev1.0 --bpw 8 --size 256 --cpha --iter 10000000 --speed 20000000
      
      where spidev1.0 is a dummy spidev node, using a chip select that no
      peripheral responds to.
      
      On LS1021A, which has a 4-entry-deep FIFO and a less powerful CPU, the
      performance increase brought by this patchset is from 2700 kbps to 5800
      kbps.
      
      On LS1043A, which has a 16-entry-deep FIFO and a more powerful CPU, the
      performance increases from 4100 kbps to 13700 kbps.
      
      On average, SPI software timestamping is not adversely affected by the
      extra batching, due to the extra patches.
      
      There is one extra patch which clarifies why the TCFQ users were not
      converted to the "other" mode in this driver that makes use of the FIFO,
      which would be EOQ mode.
      
      My request to the many people on CC (known users and/or contributors) is
      to give this series a test to ensure there are no regressions, and for
      the Coldfire maintainers to clarify whether the EOQ limitation is
      acceptable for them in the long run.
      
      Vladimir Oltean (12):
        spi: spi-fsl-dspi: Simplify bytes_per_word gymnastics
        spi: spi-fsl-dspi: Remove unused chip->void_write_data
        spi: spi-fsl-dspi: Don't mask off undefined bits
        spi: spi-fsl-dspi: Add comments around dspi_pop_tx and dspi_push_rx
          functions
        spi: spi-fsl-dspi: Rename fifo_{read,write} and {tx,cmd}_fifo_write
        spi: spi-fsl-dspi: Implement .max_message_size method for EOQ mode
        spi: Do spi_take_timestamp_pre for as many times as necessary
        spi: spi-fsl-dspi: Convert TCFQ users to XSPI FIFO mode
        spi: spi-fsl-dspi: Accelerate transfers using larger word size if
          possible
        spi: spi-fsl-dspi: Optimize dspi_setup_accel for lowest interrupt
          count
        spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode
        spi: spi-fsl-dspi: Take software timestamp in dspi_fifo_write
      
       drivers/spi/spi-fsl-dspi.c | 421 ++++++++++++++++++++++++-------------
       drivers/spi/spi.c          |  19 +-
       include/linux/spi/spi.h    |   3 +-
       3 files changed, 288 insertions(+), 155 deletions(-)
      
      --
      2.17.1
      4a8ee2ab