1. 17 May, 2016 10 commits
    • xprtrdma: Prevent inline overflow · 302d3deb
      Chuck Lever authored
      When deciding whether to send a Call inline, rpcrdma_marshal_req
      doesn't take into account header bytes consumed by chunk lists.
      This results in Call messages on the wire that are sometimes larger
      than the inline threshold.
      
      Likewise, when a Write list or Reply chunk is in play, the server's
      reply has to emit an RDMA Send that includes a larger-than-minimal
      RPC-over-RDMA header.
      
      The actual size of a Call message cannot be estimated until after
      the chunk lists have been registered. Thus the size of each
      RPC-over-RDMA header can be estimated only after chunks are
      registered; but the decision to register chunks is based on the size
      of that header. Chicken, meet egg.
      
      The best a client can do is estimate header size based on the
      largest header that might occur, and then ensure that inline content
      is always smaller than that.
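
      The idea can be illustrated with a small standalone C sketch (this
      is not the xprtrdma code; the constants and helper names are
      assumptions based on the RPC-over-RDMA XDR layout): compute the
      largest call header the allowed number of chunk segments could
      produce, and treat an RPC message as inline-eligible only if it
      still fits under the threshold after that worst case is set aside.

        #include <stdbool.h>
        #include <stddef.h>

        /* Assumed XDR sizes: the fixed RPC-over-RDMA header (xid, vers,
         * credits, proc, three chunk-list discriminators) and one
         * read-chunk entry (list discriminator + position + handle +
         * length + 64-bit offset). */
        #define RPCRDMA_HDR_FIXED_BYTES   28
        #define RPCRDMA_READ_SEG_BYTES    (4 + 4 + 4 + 4 + 8)

        static size_t max_call_header_bytes(unsigned int max_segments)
        {
                /* Worst case: every segment is its own read-chunk entry. */
                return RPCRDMA_HDR_FIXED_BYTES +
                       (size_t)max_segments * RPCRDMA_READ_SEG_BYTES;
        }

        static bool call_fits_inline(size_t rpc_msg_bytes,
                                     unsigned int max_segments,
                                     size_t inline_threshold)
        {
                return max_call_header_bytes(max_segments) + rpc_msg_bytes <=
                       inline_threshold;
        }

      With an 8-segment cap (see the next patch), this worst case stays
      in the low hundreds of bytes, consistent with the "over 700 bytes
      remaining" figure quoted below for a 1024-byte threshold.
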
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746
      Chuck Lever authored
      Send buffer space is shared between the RPC-over-RDMA header and
      an RPC message. A large RPC-over-RDMA header means less space is
      available for the associated RPC message, which then has to be
      moved via an RDMA Read or Write.
      
      As more segments are added to the chunk lists, the header increases
      in size.  Typical modern hardware needs only a few segments to
      convey the maximum payload size, but some devices and registration
      modes may need a lot of segments to convey data payload. Sometimes
      so many are needed that the remaining space in the Send buffer is
      not enough for the RPC message. Sending such a message usually
      fails.
      
      To ensure a transport can always make forward progress, cap the
      number of RDMA segments that are allowed in chunk lists. This
      prevents less-capable devices and memory registration modes from
      consuming a large portion of the Send buffer, at the cost of
      reducing the maximum data payload that can be conveyed with such
      devices.
      
      For now I choose an arbitrary maximum of 8 RDMA segments. This
      allows a maximum size RPC-over-RDMA header to fit nicely in the
      current 1024 byte inline threshold with over 700 bytes remaining
      for an inline RPC message.
      
      The current maximum data payload of NFS READ or WRITE requests is
      one megabyte. To convey that payload on a client with 4KB pages,
      each chunk segment would need to handle 32 or more data pages. This
      is well within the capabilities of FMR. For physical registration,
      the maximum payload size on platforms with 4KB pages is reduced to
      32KB.
      
      For FRWR, a device's maximum page list depth would need to be at
      least 34 to support the maximum 1MB payload. A device with a smaller
      maximum page list depth means the maximum data payload is reduced
      when using that device.
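
      The arithmetic above can be checked with a tiny standalone program
      (a sketch, not kernel code; the macro and function names are
      invented):

        #include <stdio.h>
        #include <stddef.h>

        #define MAX_RDMA_SEGMENTS  8      /* cap chosen by this patch */
        #define PAGE_BYTES         4096   /* 4KB pages */

        /* Rough upper bound on the data payload a registration mode can
         * convey, given how many pages one segment covers. */
        static size_t max_payload_bytes(unsigned int pages_per_segment)
        {
                return (size_t)MAX_RDMA_SEGMENTS * pages_per_segment *
                       PAGE_BYTES;
        }

        int main(void)
        {
                printf("FMR, 32 pages/segment:    %zu\n", max_payload_bytes(32));
                printf("physical, 1 page/segment: %zu\n", max_payload_bytes(1));
                return 0;
        }

      The first line prints 1048576 (the 1MB NFS READ/WRITE maximum), the
      second 32768, matching the 32KB physical-registration limit above.
      The FRWR depth of 34 is presumably slightly more than 32 so that a
      payload need not start and end on page boundaries.
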
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Bound the inline threshold values · 29c55422
      Chuck Lever authored
      Currently the sysctls that allow setting the inline threshold allow
      any value to be set.
      
      Small values only make the transport run slower. The default 1KB
      setting is as low as is reasonable. And the logic that decides how
      to divide a Send buffer between RPC-over-RDMA header and RPC message
      assumes (but does not check) that the lower bound is not crazy (say,
      57 bytes).
      
      Send and receive buffers share a page with some control information.
      Values larger than about 3KB can't be supported, currently.
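
      A minimal standalone illustration of the intended bounding (not the
      kernel's sysctl wiring; the exact bounds are assumptions taken from
      the description above, with 3KB as the ceiling):

        /* Assumed bounds: 1KB is the default and the lowest reasonable
         * setting; values much above 3KB cannot be supported because the
         * Send/Receive buffers share a page with control information. */
        #define RPCRDMA_INLINE_MIN_BYTES  1024
        #define RPCRDMA_INLINE_MAX_BYTES  3072

        static unsigned int bound_inline_threshold(unsigned int requested)
        {
                if (requested < RPCRDMA_INLINE_MIN_BYTES)
                        return RPCRDMA_INLINE_MIN_BYTES;
                if (requested > RPCRDMA_INLINE_MAX_BYTES)
                        return RPCRDMA_INLINE_MAX_BYTES;
                return requested;
        }
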
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • sunrpc: Advertise maximum backchannel payload size · 6b26cc8c
      Chuck Lever authored
      RPC-over-RDMA transports have a limit on how large a backward
      direction (backchannel) RPC message can be. Ensure that the NFSv4.x
      CREATE_SESSION operation advertises this limit to servers.
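
      Conceptually this means capping the backchannel sizes carried in
      the CREATE_SESSION arguments at what the transport can deliver; a
      rough standalone sketch (the struct and field names are simplified
      stand-ins, not the kernel definitions):

        #include <stddef.h>

        /* Simplified stand-in for the NFSv4.1 backchannel attributes
         * sent in CREATE_SESSION. */
        struct bc_channel_attrs {
                size_t max_rqst_sz;
                size_t max_resp_sz;
        };

        static void bound_bc_attrs(struct bc_channel_attrs *bc,
                                   size_t transport_bc_max_payload)
        {
                /* Never advertise more than the transport can carry. */
                if (bc->max_rqst_sz > transport_bc_max_payload)
                        bc->max_rqst_sz = transport_bc_max_payload;
                if (bc->max_resp_sz > transport_bc_max_payload)
                        bc->max_resp_sz = transport_bc_max_payload;
        }
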
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • sunrpc: Update RPCBIND_MAXNETIDLEN · 4b9c7f9d
      Chuck Lever authored
      Commit 176e21ee ("SUNRPC: Support for RPC over AF_LOCAL
      transports") added a 5-character netid, but did not bump
      RPCBIND_MAXNETIDLEN from 4 to 5.
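
      The fix itself is a one-character constant bump, roughly as below
      (illustrative; the comment's reasoning is an inference from the
      description above):

        /* "local", the AF_LOCAL netid added by 176e21ee, is five
         * characters, one more than the longest previously registered
         * netid. */
        #define RPCBIND_MAXNETIDLEN  (5u)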
      
      Fixes: 176e21ee ("SUNRPC: Support for RPC over AF_LOCAL ...")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Add rdma6 option to support NFS/RDMA IPv6 · 181342c5
      Shirley Ma authored
      RFC 5666: The "rdma" netid is to be used when IPv4 addressing
      is employed by the underlying transport, and "rdma6" for IPv6
      addressing.
      
      Add mount -o proto=rdma6 option to support NFS/RDMA IPv6 addressing.
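
      In practice the transport just has to pick the netid by address
      family when it is set up; a minimal sketch (not the xprtrdma code,
      the helper name is invented):

        #include <sys/socket.h>

        /* RFC 5666: "rdma" for transports over IPv4, "rdma6" over IPv6. */
        static const char *rpcrdma_netid_for(const struct sockaddr *sap)
        {
                return sap->sa_family == AF_INET6 ? "rdma6" : "rdma";
        }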
      
      Changes from v2:
       - Integrated comments from Chuck Lever, Anna Schumaker, Trond Myklebust
       - Add a little more to the patch description to describe NFS/RDMA
         IPv6, as suggested by Chuck Lever and Anna Schumaker
       - Removed duplicated rdma6 define
       - Remove Opt_xprt_rdma mountfamily since it doesn't support
      Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • nfs4: client: do not send empty SETATTR after OPEN_CREATE · a1d1c4f1
      Tigran Mkrtchyan authored
      OPEN_CREATE with EXCLUSIVE4_1 sends the initial file permission.
      Even though the server indicates that it has already set the file
      mode, the client sends another SETATTR request; since the mode is
      already set, that SETATTR is empty. This is harmless, but it costs
      an extra round trip and slows open on high-latency networks.
      
      This change skips the extra SETATTR after open when there are no
      attributes left to set.
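
      The shape of the change is a guard before the follow-up SETATTR; a
      hypothetical sketch (the names and the mode bit are invented, this
      is not the nfs4proc code):

        #include <stdbool.h>
        #include <stdint.h>

        #define ATTR_MODE_BIT  (1u << 0)  /* assumed "mode" attribute flag */

        /* Decide whether OPEN still needs a follow-up SETATTR. */
        static bool setattr_needed_after_open(uint32_t pending_attrs,
                                              bool server_applied_mode)
        {
                /* For EXCLUSIVE4_1 the server reports that it already
                 * set the file mode, so drop it from the pending set. */
                if (server_applied_mode)
                        pending_attrs &= ~ATTR_MODE_BIT;

                /* Nothing left to set: skip the extra round trip. */
                return pending_attrs != 0;
        }
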
      Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: Add COPY nfs operation · 2e72448b
      Anna Schumaker authored
      This adds the copy_range file_ops function pointer used by the
      sys_copy_range() function call.  This patch only implements sync copies,
      so if an async copy happens we decode the stateid and ignore it.
      Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
    • NFS: Add nfs_commit_file() · 67911c8f
      Anna Schumaker authored
      Copy will use this to set up a commit request for a generic range.  I
      don't want to allocate a new pagecache entry for the file, so I needed
      to change parts of the commit path to handle requests with a null
      wb_page.
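
      A hypothetical illustration of what "handle requests with a null
      wb_page" means for the commit path (simplified stand-in types, not
      the kernel structures):

        #include <stddef.h>

        struct page_lite { int unused; };

        /* A commit request may or may not be backed by a pagecache page;
         * requests set up for a COPY range are not. */
        struct commit_req_lite {
                struct page_lite *wb_page; /* NULL for COPY requests */
        };

        static void commit_req_release_page(struct commit_req_lite *req)
        {
                /* Per-page bookkeeping must now be conditional. */
                if (req->wb_page == NULL)
                        return;
                /* ... end-of-commit handling for the page goes here ... */
        }
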
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • Fixing oops in callback path · c2985d00
      Olga Kornievskaia authored
      Commit 80f96427 ("NFSv4.x: Enforce the ca_maxresponsesize_cached
      on the back channel") causes an oops when it receives a callback with
      cachethis=yes.
      
      [  109.667378] BUG: unable to handle kernel NULL pointer dereference at 00000000000002c8
      [  109.669476] IP: [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
      [  109.671216] PGD 0
      [  109.671736] Oops: 0000 [#1] SMP
      [  109.705427] CPU: 1 PID: 3579 Comm: nfsv4.1-svc Not tainted 4.5.0-rc1+ #1
      [  109.706987] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [  109.709468] task: ffff8800b4408000 ti: ffff88008448c000 task.ti: ffff88008448c000
      [  109.711207] RIP: 0010:[<ffffffffa08a3e68>]  [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
      [  109.713521] RSP: 0018:ffff88008448fca0  EFLAGS: 00010286
      [  109.714762] RAX: ffff880081ee202c RBX: ffff8800b7b5b600 RCX: 0000000000000001
      [  109.716427] RDX: 0000000000000008 RSI: 0000000000000008 RDI: 0000000000000000
      [  109.718091] RBP: ffff88008448fda8 R08: 0000000000000000 R09: 000000000b000000
      [  109.719757] R10: ffff880137786000 R11: ffff8800b7b5b600 R12: 0000000001000000
      [  109.721415] R13: 0000000000000002 R14: 0000000053270000 R15: 000000000000000b
      [  109.723061] FS:  0000000000000000(0000) GS:ffff880139640000(0000) knlGS:0000000000000000
      [  109.724931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  109.726278] CR2: 00000000000002c8 CR3: 0000000034d50000 CR4: 00000000001406e0
      [  109.727972] Stack:
      [  109.728465]  ffff880081ee202c ffff880081ee201c 000000008448fcc0 ffff8800baccb800
      [  109.730349]  ffff8800baccc800 ffffffffa08d0380 0000000000000000 0000000000000000
      [  109.732211]  ffff8800b7b5b600 0000000000000001 ffffffff81d073c0 ffff880081ee3090
      [  109.734056] Call Trace:
      [  109.734657]  [<ffffffffa03795d4>] svc_process_common+0x5c4/0x6c0 [sunrpc]
      [  109.736267]  [<ffffffffa0379a4c>] bc_svc_process+0x1fc/0x360 [sunrpc]
      [  109.737775]  [<ffffffffa08a2c2c>] nfs41_callback_svc+0x10c/0x1d0 [nfsv4]
      [  109.739335]  [<ffffffff810cb380>] ? prepare_to_wait_event+0xf0/0xf0
      [  109.740799]  [<ffffffffa08a2b20>] ? nfs4_callback_svc+0x50/0x50 [nfsv4]
      [  109.742349]  [<ffffffff810a6998>] kthread+0xd8/0xf0
      [  109.743495]  [<ffffffff810a68c0>] ? kthread_park+0x60/0x60
      [  109.744776]  [<ffffffff816abc4f>] ret_from_fork+0x3f/0x70
      [  109.746037]  [<ffffffff810a68c0>] ? kthread_park+0x60/0x60
      [  109.747324] Code: cc 45 31 f6 48 8b 85 00 ff ff ff 44 89 30 48 8b 85 f8 fe ff ff 44 89 20 48 8b 9d 38 ff ff ff 48 8b bd 30 ff ff ff 48 85 db 74 4c <4c> 8b af c8 02 00 00 4d 8d a5 08 02 00 00 49 81 c5 98 02 00 00
      [  109.754361] RIP  [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4]
      [  109.756123]  RSP <ffff88008448fca0>
      [  109.756951] CR2: 00000000000002c8
      [  109.757738] ---[ end trace 2b8555511ab5dfb4 ]---
      [  109.758819] Kernel panic - not syncing: Fatal exception
      [  109.760126] Kernel Offset: disabled
      [  118.938934] ---[ end Kernel panic - not syncing: Fatal exception
      
      The offending code neither unlocks the table nor sets the cps->clp
      pointer, which is needed later by nfs4_cb_free_slot().
      
      Fixes: 80f96427 ("NFSv4.x: Enforce the ca_maxresponsesize_cached ...")
      CC: stable@vger.kernel.org
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 09 May, 2016 13 commits
  3. 08 May, 2016 1 commit
  4. 07 May, 2016 7 commits
  5. 06 May, 2016 9 commits