1. 02 May, 2023 2 commits
    • nfsd: define exports_proc_ops with CONFIG_PROC_FS · 340086da
      Tom Rix authored
      gcc with W=1 and !CONFIG_PROC_FS reports:
      fs/nfsd/nfsctl.c:161:30: error: ‘exports_proc_ops’
        defined but not used [-Werror=unused-const-variable=]
        161 | static const struct proc_ops exports_proc_ops = {
            |                              ^~~~~~~~~~~~~~~~
      
      The only use of exports_proc_ops is when CONFIG_PROC_FS
      is defined, so its definition should be likewise conditional.
      Signed-off-by: Tom Rix <trix@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
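The pattern this commit describes can be sketched in plain C. This is a userspace illustration under stated assumptions, not the actual fs/nfsd/nfsctl.c change: `struct proc_ops_sketch`, `exports_open_sketch` and `register_exports_sketch` are hypothetical stand-ins, and only the `CONFIG_PROC_FS` macro and the `exports_proc_ops` name come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct proc_ops. */
struct proc_ops_sketch {
	int (*proc_open)(void);
};

#ifdef CONFIG_PROC_FS
/* Hypothetical open handler; exists only alongside its user. */
static int exports_open_sketch(void)
{
	return 1;
}

/* Defined only when the code that references it is also compiled in. */
static const struct proc_ops_sketch exports_proc_ops = {
	.proc_open = exports_open_sketch,
};

/* Returns 1 when a proc entry would be registered. */
static int register_exports_sketch(void)
{
	assert(exports_proc_ops.proc_open != NULL);
	return exports_proc_ops.proc_open();
}
#else
/* Without CONFIG_PROC_FS there is no user, so there is no definition. */
static int register_exports_sketch(void)
{
	return 0;
}
#endif
```

Because the definition and its only user sit under the same guard, a build without `CONFIG_PROC_FS` has no unused static const object for `-Wunused-const-variable` to flag.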
    • SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IV · af97b7df
      Ard Biesheuvel authored
      Scott reports SUNRPC self-test failures regarding the output IV on arm64
      when using the SIMD accelerated implementation of AES in CBC mode with
      ciphertext stealing ("cts(cbc(aes))" in crypto API speak).
      
      These failures are the result of the fact that, while RFC 3962 does
      specify what the output IV should be and includes test vectors for it,
      the general concept of an output IV is poorly defined, and generally,
      not specified by the various algorithms implemented by the crypto API.
      Only algorithms that support transparent chaining (e.g., CBC mode on a
      block boundary) have requirements on the output IV, but ciphertext
      stealing (CTS) is fundamentally about how to encapsulate CBC in a way
      where the length of the entire message may not be an integral multiple
      of the cipher block size, and the concept of an output IV does not exist
      here because it has no defined purpose past the end of the message.
      
      The generic CTS template takes advantage of this chaining capability of
      the CBC implementations, and as a result, happens to return an output
      IV, simply because it passes its IV buffer directly to the encapsulated
      CBC implementation, which operates on full blocks only, and always
      returns an IV. This output IV happens to match how RFC 3962 defines it,
      even though the CTS template itself does not contain any output IV logic
      whatsoever, and, for this reason, lacks any test vectors that exercise
      this accidental output IV generation.
      
      The arm64 SIMD implementation of cts(cbc(aes)) does not use the generic
      CTS template at all, but instead, implements the CBC mode and ciphertext
      stealing directly, and therefore does not encapsulate a CBC implementation
      that returns an output IV in the same way. The arm64 SIMD implementation
      complies with the specification and passes all internal tests, but when
      invoked by the SUNRPC code, fails to produce the expected output IV and
      causes its selftests to fail.
      
      Given that the output IV is defined as the penultimate block (where the
      final block may be smaller than the block size), we can quite easily derive
      it in the caller by copying the appropriate slice of ciphertext after
      encryption.
      
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <anna@kernel.org>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Reported-by: Scott Mayhew <smayhew@redhat.com>
      Tested-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
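The derivation described in this commit message, taking the penultimate ciphertext block as the output IV, can be sketched as follows. This is a minimal userspace illustration, not the kernel's code: `cts_output_iv_sketch` and `CTS_BLOCK_SIZE` are hypothetical names, the block size is assumed to be the 16-byte AES block, and the sketch assumes the message is longer than one block so that a penultimate block exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTS_BLOCK_SIZE 16 /* AES block size in bytes (assumption) */

/*
 * Copy the penultimate ciphertext block into iv_out and return the byte
 * offset it was copied from. The final block may be shorter than a full
 * block, so its size is computed first.
 */
static size_t cts_output_iv_sketch(const uint8_t *ct, size_t len,
				   uint8_t iv_out[CTS_BLOCK_SIZE])
{
	size_t tail, offset;

	/* A penultimate block only exists for messages longer than one block. */
	assert(len > CTS_BLOCK_SIZE);

	/* Size of the final block: between 1 and CTS_BLOCK_SIZE bytes. */
	tail = (len - 1) % CTS_BLOCK_SIZE + 1;

	/* The penultimate block is the full block just before the final one. */
	offset = len - tail - CTS_BLOCK_SIZE;
	memcpy(iv_out, ct + offset, CTS_BLOCK_SIZE);
	return offset;
}
```

For a 17-byte ciphertext the final block is one byte long, so the slice copied is bytes 0 through 15; for a 40-byte ciphertext it is bytes 16 through 31.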
  2. 27 Apr, 2023 7 commits
    • NFSD: Handle new xprtsec= export option · 9280c577
      Chuck Lever authored
      Enable administrators to require clients to use transport layer
      security when accessing particular exports.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Support TLS handshake in the server-side TCP socket code · b3cbf98e
      Chuck Lever authored
      This patch adds opportunistic RPC-with-TLS to the Linux in-kernel
      NFS server. If the client requests RPC-with-TLS and the user space
      handshake agent is running, the server will set up a TLS session.
      
      There are no policy settings yet. For example, the server cannot
      yet require the use of RPC-with-TLS to access its data.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Clean up xattr memory allocation flags · 22b620ec
      Chuck Lever authored
      Tetsuo Handa points out:
      > Since GFP_KERNEL is "GFP_NOFS | __GFP_FS", usage like
      > "GFP_KERNEL | GFP_NOFS" does not make sense.
      
      The original intent was to hold the inode lock while estimating
      the buffer requirements for the requested information. Frank van
      der Linden, the author of NFSD's xattr code, says:
      
      > ... you need inode_lock to get an atomic view of an xattr. Since
      > both nfsd_getxattr and nfsd_listxattr do the standard trick of
      > querying the xattr length with a NULL buf argument (just getting
      > the length back), allocating the right buffer size, and then
      > querying again, they need to hold the inode lock to avoid having
      > the xattr changed from under them while doing that.
      >
      > From that then flows the requirement that GFP_FS could cause
      > problems while holding i_rwsem, so I added GFP_NOFS.
      
      However, Dave Chinner states:
      > You can do GFP_KERNEL allocations holding the i_rwsem just fine.
      > All that it requires is the caller holds a reference to the
      > inode ...
      
      Since these code paths acquire a dentry, they do indeed hold a
      reference. It is therefore safe to use GFP_KERNEL for these memory
      allocations. In particular, that's what this code is already doing;
      but now the C source code looks sane too.
      
      At a later time we can revisit in order to remove the inode lock in
      favor of simply retrying if the estimated buffer size is too small.
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
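The observation quoted from Tetsuo Handa, that GFP_KERNEL is "GFP_NOFS | __GFP_FS" and so OR-ing the two changes nothing, can be checked with a small bitmask sketch. The `SK_` macros below are hypothetical and use illustrative bit values only; they are not the kernel's real flag definitions.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real flags differ. */
#define SK_GFP_RECLAIM 0x1u
#define SK_GFP_IO      0x2u
#define SK_GFP_FS      0x4u

/* NOFS permits reclaim and I/O but not calls back into the filesystem. */
#define SK_GFP_NOFS   (SK_GFP_RECLAIM | SK_GFP_IO)
/* Mirrors the relationship quoted in the commit: GFP_NOFS | __GFP_FS. */
#define SK_GFP_KERNEL (SK_GFP_NOFS | SK_GFP_FS)

/* OR-ing a subset into its superset leaves the superset unchanged. */
static_assert((SK_GFP_KERNEL | SK_GFP_NOFS) == SK_GFP_KERNEL,
	      "combined mask is just SK_GFP_KERNEL");

/* Returns nonzero when the mask still allows filesystem reclaim. */
static int allows_fs_reclaim_sketch(unsigned int gfp)
{
	return (gfp & SK_GFP_FS) != 0;
}
```

Under these assumptions the combined mask still has the FS bit set, which matches the commit's remark that the code was already performing GFP_KERNEL allocations.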
    • NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop · 147abcac
      Dai Ngo authored
      The following request sequence to the same file causes the NFS client and
      server to get into an infinite loop with COMMIT and NFS4ERR_DELAY:
      
      OPEN
      REMOVE
      WRITE
      COMMIT
      
      Problem reported by recall11, recall12, recall14, recall20, recall22,
      recall40, recall42, recall48, recall50 of the nfstest suite.
      
      This patch restores the handling of the race condition with unlink in
      nfsd_file_do_acquire to what it was prior to the regression.
      
      Fixes: ac3a2585 ("nfsd: rework refcounting in filecache")
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Clear rq_xid when receiving a new RPC Call · 695bc1f3
      Chuck Lever authored
      This is an eye-catcher for tracepoints that record the XID: it means
      svc_rqst() has not received a full RPC Call with an XID yet.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Recognize control messages in server-side TCP socket code · 5e052dda
      Chuck Lever authored
      To support kTLS, the server-side TCP socket receive path needs to
      watch for CMSGs.
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
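What "watching for CMSGs" means on a receive path can be illustrated from userspace with the standard control-message macros. This is a sketch of the general technique only, not the server-side kernel code this commit touches: `find_cmsg_sketch` is a hypothetical helper, and the caller supplies whichever level and type it is interested in.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Walk the control messages attached to a received msghdr and return a
 * pointer to the payload of the first one matching level/type, or NULL
 * when no such control message is present.
 */
static const unsigned char *find_cmsg_sketch(struct msghdr *msg,
					     int level, int type)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL);
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == level && cmsg->cmsg_type == type)
			return CMSG_DATA(cmsg);
	}
	return NULL;
}
```

A userspace kTLS receiver would typically call such a helper after `recvmsg()` with level `SOL_TLS` and type `TLS_GET_RECORD_TYPE` from `<linux/tls.h>` to learn the TLS record type of the data it just read.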
    • SUNRPC: Be even lazier about releasing pages · 6a0cdf56
      Chuck Lever authored
      A single RPC transaction that touches only a couple of pages means
      rq_pvec will not be even close to full in svc_xprt_release(). This is
      a common case.
      
      Instead, just leave the pages in rq_pvec until it is completely
      full. This improves the efficiency of the batch release mechanism
      on workloads that involve small RPC messages.
      
      The rq_pvec is also fully emptied just before thread exit.
      Reviewed-by: Calum Mackay <calum.mackay@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
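The lazy batching idea in this commit, keeping pages queued until the batch is completely full and emptying it once more at thread exit, can be sketched with a small fixed-size batch. Everything here is hypothetical: `struct page_batch_sketch`, its helpers, and the capacity of 15 are assumptions for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAPACITY 15 /* hypothetical size, not taken from the commit */

struct page_batch_sketch {
	size_t nr;                   /* slots currently in use */
	void *pages[BATCH_CAPACITY]; /* pages waiting to be released */
	size_t flushes;              /* number of bulk releases so far */
};

/* Release everything held in the batch in one bulk operation. */
static void batch_flush_sketch(struct page_batch_sketch *b)
{
	if (b->nr == 0)
		return;
	/* A real implementation would hand b->pages to the allocator here. */
	b->nr = 0;
	b->flushes++;
}

/* Queue one page; flush only when the batch has just become full. */
static void batch_add_sketch(struct page_batch_sketch *b, void *page)
{
	assert(b->nr < BATCH_CAPACITY);
	b->pages[b->nr++] = page;
	if (b->nr == BATCH_CAPACITY)
		batch_flush_sketch(b);
}
```

With this shape, a request that touches only two pages leaves them queued, and the explicit `batch_flush_sketch()` call plays the role of the final emptying before the thread exits.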
  3. 26 Apr, 2023 26 commits
  4. 25 Apr, 2023 5 commits
    • net: phy: marvell-88x2222: remove unnecessary (void*) conversions · 28b17f62
      wuych authored
      Pointer variables of type void * do not require a type cast.
      Signed-off-by: wuych <yunchuan@nfschina.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
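The C rule behind this cleanup is that a `void *` converts implicitly to any object pointer type, so an explicit cast adds nothing. A minimal sketch follows; `struct priv_sketch` and `struct device_sketch` are hypothetical types and do not reflect the marvell-88x2222 driver's actual structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver-private data and a device that stores it as void *. */
struct priv_sketch {
	int lanes;
};

struct device_sketch {
	void *priv;
};

/* Before: the cast is legal but redundant. */
static int lanes_with_cast(const struct device_sketch *dev)
{
	struct priv_sketch *priv = (struct priv_sketch *)dev->priv;

	assert(priv != NULL);
	return priv->lanes;
}

/* After: void * converts implicitly to the target pointer type in C. */
static int lanes_without_cast(const struct device_sketch *dev)
{
	struct priv_sketch *priv = dev->priv;

	assert(priv != NULL);
	return priv->lanes;
}
```

Both functions behave identically; the second simply drops the noise. This holds for C only, since C++ does require the cast.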
    • tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. · 50749f2d
      Kuniyuki Iwashima authored
      syzkaller reported [0] memory leaks of a UDP socket and ZEROCOPY
      skbs.  We can reproduce the problem with this sequence:
      
        sk = socket(AF_INET, SOCK_DGRAM, 0)
        sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE)
        sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1)
        sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53))
        sk.close()
      
      sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets
      skb->cb->ubuf.refcnt to 1, and calls sock_hold().  Here, struct
      ubuf_info_msgzc indirectly holds a refcnt of the socket.  When the
      skb is sent, __skb_tstamp_tx() clones it and puts the clone into
      the socket's error queue with the TX timestamp.
      
      When the original skb is received locally, skb_copy_ubufs() calls
      skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt.
      This additional count is decremented while freeing the skb, but struct
      ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is
      not called.
      
      The last refcnt is not released unless we retrieve the TX timestamped
      skb by recvmsg().  Since we clear the error queue in inet_sock_destruct()
      after the socket's refcnt reaches 0, there is a circular dependency.
      If we close() the socket holding such skbs, we never call sock_put()
      and leak the count, sk, and skb.
      
      TCP has the same problem, and commit e0c8bccd ("net: stream:
      purge sk_error_queue in sk_stream_kill_queues()") tried to fix it
      by calling skb_queue_purge() during close().  However, there is a
      small chance that skb queued in a qdisc or device could be put
      into the error queue after the skb_queue_purge() call.
      
      In __skb_tstamp_tx(), the cloned skb should not have a reference
      to the ubuf to remove the circular dependency, but skb_clone() does
      not call skb_copy_ubufs() for zerocopy skb.  So, we need to call
      skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs().
      
      [0]:
      BUG: memory leak
      unreferenced object 0xffff88800c6d2d00 (size 1152):
        comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00  ................
          02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
        backtrace:
          [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024
          [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083
          [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline]
          [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245
          [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515
          [<00000000b9b11231>] sock_create net/socket.c:1566 [inline]
          [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline]
          [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline]
          [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636
          [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline]
          [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline]
          [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647
          [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      BUG: memory leak
      unreferenced object 0xffff888017633a00 (size 240):
        comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff  .........-m.....
        backtrace:
          [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497
          [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline]
          [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596
          [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline]
          [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370
          [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037
          [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652
          [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253
          [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819
          [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline]
          [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734
          [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117
          [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline]
          [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline]
          [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125
          [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Fixes: b5947e5d ("udp: msg_zerocopy")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: amd: Fix link leak when verifying config failed · d325c34d
      Gencen Gan authored
      After failing to verify the configuration, the function returns directly
      without releasing the link, which may cause a memory leak.
      
      Paolo Abeni thinks that the whole code of this driver is quite
      "suboptimal" and looks unmaintained for at least ~15 years, so he
      suggests that we could simply remove the whole driver; please
      take it into consideration.
      
      Simon Horman suggests that the fix label should be set to
      "Linux-2.6.12-rc2" considering that the problem has existed
      since the driver was introduced and the commit above doesn't
      seem to exist in net/net-next.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Gan Gecen <gangecen@hust.edu.cn>
      Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: marvell: Fix inconsistent indenting in led_blink_set · 4774ad84
      Christian Marangi authored
      Fix inconsistent indenting in m88e1318_led_blink_set reported by the kernel
      test robot, probably caused by an if condition that was dropped in a
      later revision of the same code.
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202304240007.0VEX8QYG-lkp@intel.com/
      Fixes: ea9e8648 ("net: phy: marvell: Implement led_blink_set()")
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230423172800.3470-1-ansuelsmth@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lan966x: Don't use xdp_frame when action is XDP_TX · 700f11eb
      Horatiu Vultur authored
      When the action of an xdp program was XDP_TX, lan966x created an
      xdp_frame and used it to send the frame back. But it is also
      possible to send the frame back without an xdp_frame, because
      it can be sent using the page.
      Once the frame is transmitted, the page can be recycled directly with
      page_pool_recycle_direct, as lan966x is using page pools.
      This saves some CPU usage on this path, which results in a higher
      number of transmitted frames. Below are the statistics:
      Frame size:    Improvement:
      64                ~8%
      256              ~11%
      512               ~8%
      1000              ~0%
      1500              ~0%
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20230422142344.3630602-1-horatiu.vultur@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>