1. 27 Jul, 2019 7 commits
    • padata: Replace delayed timer with immediate workqueue in padata_reorder · 6fc4dbcf
      Herbert Xu authored
      The function padata_reorder uses a timer when it cannot make progress
      while completed jobs are outstanding (pd->reorder_objects > 0).  This
      is suboptimal: whenever the timer is actually needed, it introduces
      a gratuitous delay of one second.
      
      In fact we can easily distinguish between whether completed jobs
      are outstanding and whether we can make progress.  All we have to
      do is look at the next pqueue list.
      
      This patch does that by replacing pd->processed with pd->cpu so
      that the next pqueue is more accessible.
      
      A work queue is used instead of the original try_again to avoid
      hogging the CPU.
      
      Note that we don't bother removing the work queue in
      padata_flush_queues because the whole premise is broken.  You
      cannot flush async crypto requests so it makes no sense to even
      try.  A subsequent patch will fix it by replacing it with a ref
      counting scheme.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
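The idea of looking at the next pqueue to decide whether progress is possible can be sketched in userspace C. This is a simplified model, not the kernel code: `struct pqueue`, `struct pdata`, `job_done()`, `reorder()` and `can_progress()` are hypothetical names, jobs are assumed to be distributed round-robin by sequence number, and locking, the workqueue, and overflow checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4
#define QLEN  8

/* Hypothetical stand-in for a per-CPU reorder queue (FIFO of sequence numbers). */
struct pqueue {
    unsigned int jobs[QLEN];
    size_t head, tail;
};

/* Hypothetical stand-in for the parallel-data state. */
struct pdata {
    struct pqueue q[NCPUS];
    unsigned int cpu;             /* queue expected to hold the next in-order job */
    unsigned int reorder_objects; /* completed jobs not yet handed back */
};

static bool pqueue_empty(const struct pqueue *q)
{
    return q->head == q->tail;
}

/* A job finished: park it on the queue its sequence number maps to.
 * Assumption: no more than QLEN jobs are ever parked on one queue. */
static void job_done(struct pdata *pd, unsigned int seq)
{
    struct pqueue *q = &pd->q[seq % NCPUS];

    q->jobs[q->tail++ % QLEN] = seq;
    pd->reorder_objects++;
}

/* Progress is possible only if the next queue in order is non-empty. */
static bool can_progress(const struct pdata *pd)
{
    return !pqueue_empty(&pd->q[pd->cpu]);
}

/* Hand back as many in-order jobs as possible; returns how many were emitted. */
static size_t reorder(struct pdata *pd, unsigned int *out, size_t max)
{
    size_t n = 0;

    while (n < max && can_progress(pd)) {
        struct pqueue *q = &pd->q[pd->cpu];

        out[n++] = q->jobs[q->head++ % QLEN];
        pd->reorder_objects--;
        pd->cpu = (pd->cpu + 1) % NCPUS;  /* advance to the next queue */
    }
    return n;
}
```

In this model, `reorder_objects > 0` with `can_progress()` returning false means completed jobs are merely waiting on an earlier one, so there is nothing to retry until that job arrives.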
    • crypto: aegis - fix badly optimized clang output · 97ac82d9
      Arnd Bergmann authored
      Clang sometimes makes very different inlining decisions from gcc.
      In case of the aegis crypto algorithms, it decides to turn the innermost
      primitives (and, xor, ...) into separate functions but inline most of
      the rest.
      
      This results in a large number of variables being spilled to the
      stack, leading to rather slow execution as well as kernel stack usage
      beyond the 32-bit warning limit when CONFIG_KASAN is enabled:
      
      crypto/aegis256.c:123:13: warning: stack frame size of 648 bytes in function 'crypto_aegis256_encrypt_chunk' [-Wframe-larger-than=]
      crypto/aegis256.c:366:13: warning: stack frame size of 1264 bytes in function 'crypto_aegis256_crypt' [-Wframe-larger-than=]
      crypto/aegis256.c:187:13: warning: stack frame size of 656 bytes in function 'crypto_aegis256_decrypt_chunk' [-Wframe-larger-than=]
      crypto/aegis128l.c:135:13: warning: stack frame size of 832 bytes in function 'crypto_aegis128l_encrypt_chunk' [-Wframe-larger-than=]
      crypto/aegis128l.c:415:13: warning: stack frame size of 1480 bytes in function 'crypto_aegis128l_crypt' [-Wframe-larger-than=]
      crypto/aegis128l.c:218:13: warning: stack frame size of 848 bytes in function 'crypto_aegis128l_decrypt_chunk' [-Wframe-larger-than=]
      crypto/aegis128.c:116:13: warning: stack frame size of 584 bytes in function 'crypto_aegis128_encrypt_chunk' [-Wframe-larger-than=]
      crypto/aegis128.c:351:13: warning: stack frame size of 1064 bytes in function 'crypto_aegis128_crypt' [-Wframe-larger-than=]
      crypto/aegis128.c:177:13: warning: stack frame size of 592 bytes in function 'crypto_aegis128_decrypt_chunk' [-Wframe-larger-than=]
      
      Forcing the primitives to all get inlined avoids the issue and the
      resulting code is similar to what gcc produces.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
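As a rough illustration of forcing small primitives to be inlined, the following userspace sketch uses the GCC/Clang `always_inline` attribute. The `force_inline` macro stands in for the kernel's `__always_inline`, and `union block128` and the `block_*` helpers are hypothetical names for this example, not the aegis code itself.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's __always_inline (GCC/Clang only). */
#define force_inline inline __attribute__((always_inline))

/* Hypothetical 128-bit block type, used only for this sketch. */
union block128 {
    uint64_t w[2];
    uint8_t  b[16];
};

/* Innermost primitives: forced inline so the compiler cannot turn them
 * into out-of-line calls that spill the blocks to the stack. */
static force_inline void block_xor(union block128 *dst,
                                   const union block128 *src)
{
    dst->w[0] ^= src->w[0];
    dst->w[1] ^= src->w[1];
}

static force_inline void block_and(union block128 *dst,
                                   const union block128 *src)
{
    dst->w[0] &= src->w[0];
    dst->w[1] &= src->w[1];
}

/* A caller built from the primitives: dst ^= (a & b). */
static void block_xor_and(union block128 *dst,
                          const union block128 *a,
                          const union block128 *b)
{
    union block128 tmp = *a;

    block_and(&tmp, b);
    block_xor(dst, &tmp);
}
```

Whether this reduces stack usage in a given build depends on the compiler and options; the attribute only guarantees that the primitives are not emitted as separate calls.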
    • crypto: ccp - Replace dma_pool_alloc + memset with dma_pool_zalloc · bfb5eb08
      Chuhong Yuan authored
      Use dma_pool_zalloc instead of allocating memory with dma_pool_alloc
      and then zeroing it with memset.
      This simplifies the code.
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Acked-by: Gary R Hook <gary.hook@amd.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: caam/qi2 - Increase napi budget to process more caam responses · 6ed01097
      Vakul Garg authored
      While running ipsec processing for traffic through multiple network
      interfaces, it is observed that the caam driver gets less time to poll
      responses from the caam block than the ethernet driver does. This is
      because the ethernet driver has as many napi instances per cpu as there
      are ethernet interfaces in the system. Therefore, the caam driver's napi
      runs less often than the ethernet driver's napi instances. As a result,
      we end up submitting more requests to caam (which it is able to finish
      quite fast), but don't dequeue the responses at the same rate. This
      makes the caam response FQs bloat with a large number of frames. In
      some situations, it makes the kernel crash due to out-of-memory. To
      prevent this, we increase the napi budget of the dpseci driver to a big
      value so that the caam driver can drain its response queues fast enough.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
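The effect of the poll budget on a response queue can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a fixed number of responses arrives per poll round and each round dequeues at most `budget` of them. The function name `backlog_after()` and the numbers in the usage note are hypothetical, not values taken from the driver.

```c
#include <stddef.h>

/* Toy model (not driver code): backlog of a response queue after a number
 * of poll rounds, when each round may dequeue at most `budget` responses. */
static size_t backlog_after(size_t rounds, size_t arrivals_per_round,
                            size_t budget)
{
    size_t backlog = 0;

    for (size_t i = 0; i < rounds; i++) {
        size_t drained;

        backlog += arrivals_per_round;               /* new responses queued */
        drained = backlog < budget ? backlog : budget;
        backlog -= drained;                          /* one poll's worth removed */
    }
    return backlog;
}
```

With 100 arrivals per round, a budget of 64 leaves 36 responses behind every round, so the backlog grows without bound, while any budget of at least 100 keeps the queue empty in this model.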
    • hwrng: mxc-rnga - use devm_platform_ioremap_resource() to simplify code · f2f1d75a
      Anson Huang authored
      Use the new helper devm_platform_ioremap_resource(), which wraps
      platform_get_resource() and devm_ioremap_resource() together, to
      simplify the code.
      Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
      Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • hwrng: imx-rngc - use devm_platform_ioremap_resource() to simplify code · d10d094c
      Anson Huang authored
      Use the new helper devm_platform_ioremap_resource(), which wraps
      platform_get_resource() and devm_ioremap_resource() together, to
      simplify the code.
      Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: ccp - Reduce maximum stack usage · 72c8117a
      Arnd Bergmann authored
      Each of the operations in ccp_run_cmd() needs several hundred
      bytes of kernel stack. Depending on the inlining, these may
      need separate stack slots that add up to more than the warning
      limit, as shown in this clang based build:
      
      drivers/crypto/ccp/ccp-ops.c:871:12: error: stack frame size of 1164 bytes in function 'ccp_run_aes_cmd' [-Werror,-Wframe-larger-than=]
      static int ccp_run_aes_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
      
      The problem may also happen when there is no warning, e.g. in the
      ccp_run_cmd()->ccp_run_aes_cmd()->ccp_run_aes_gcm_cmd() call chain with
      over 2000 bytes.
      
      Mark each individual function as 'noinline_for_stack' to prevent
      this from happening, and move the calls to the two special cases for aes
      into the top-level function. This will keep the actual combined stack
      usage to the minimum: 828 bytes for ccp_run_aes_gcm_cmd() and
      at most 524 bytes for each of the other cases.
      
      Fixes: 63b94509 ("crypto: ccp - CCP device driver and interface support")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
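The pattern of keeping per-operation stack frames separate can be sketched in userspace with the GCC/Clang `noinline` attribute. The macro below stands in for the kernel's `noinline_for_stack`, and `enum op`, `run_aes()`, `run_sha()` and `run_cmd()` are hypothetical names with placeholder logic, not the ccp driver's functions.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's noinline_for_stack (GCC/Clang only). */
#define noinline_for_stack __attribute__((noinline))

enum op { OP_AES, OP_SHA };

/* Each handler owns a large scratch buffer. Because the handlers are not
 * inlined, the buffers live in separate frames that exist only while the
 * handler runs, instead of all being merged into the dispatcher's frame. */
static noinline_for_stack int run_aes(const unsigned char *in, size_t len)
{
    unsigned char scratch[512];
    int sum = 0;

    if (len > sizeof(scratch))
        return -1;
    memcpy(scratch, in, len);
    for (size_t i = 0; i < len; i++)
        sum += scratch[i];      /* placeholder work: byte sum */
    return sum;
}

static noinline_for_stack int run_sha(const unsigned char *in, size_t len)
{
    unsigned char scratch[512];
    int acc = 0;

    if (len > sizeof(scratch))
        return -1;
    memcpy(scratch, in, len);
    for (size_t i = 0; i < len; i++)
        acc ^= scratch[i];      /* placeholder work: byte xor */
    return acc;
}

/* Top-level dispatcher: its own frame stays small. */
static int run_cmd(enum op op, const unsigned char *in, size_t len)
{
    switch (op) {
    case OP_AES:
        return run_aes(in, len);
    case OP_SHA:
        return run_sha(in, len);
    }
    return -1;
}
```

With this layout the worst-case stack depth is the dispatcher's frame plus the largest single handler frame, rather than the sum of all handler frames.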
  2. 26 Jul, 2019 33 commits