1. 08 Jun, 2018 4 commits
    • Andrew Elble's avatar
      nfsd: fix error handling in nfs4_set_delegation() · 692ad280
      Andrew Elble authored
      I noticed a memory corruption crash in nfsd in
      4.17-rc1. This patch corrects the issue.
      
      Fix to return error if the delegation couldn't be hashed or there was
      a recall in progress. Use the existing error path instead of
      destroy_delegation() for readability.
      Signed-off-by: Andrew Elble <aweits@rit.edu>
      Fixes: 353601e7 ("nfsd: create a separate lease for each delegation")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      692ad280
    • Scott Mayhew's avatar
      nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo · 3171822f
      Scott Mayhew authored
      When running a fuzz tester against a KASAN-enabled kernel, the following
      splat periodically occurs.
      
      The problem occurs when the test sends a GETDEVICEINFO request with a
      malformed xdr array (size but no data) for gdia_notify_types and the
      array size is > 0x3fffffff, which results in an overflow in the value of
      nbytes which is passed to read_buf().
      
      If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
      the overflow occurs, the value of nbytes is 0, and when that happens the
      pointer returned by read_buf() points to the end of the xdr data (i.e.
      argp->end) when really it should be returning NULL.
      
      Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
      value is arbitrary, but it's the same threshold used by
      nfsd4_decode_bitmap()... it could really be any value >= 1 since it's
      expected to get at most a single bitmap in gdia_notify_types).
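
The wrap-around described above can be reproduced in a few lines of user-space C. This is only a sketch under assumptions: `nbytes_for`, `notify_count_ok` and `MAX_NOTIFY_WORDS` are hypothetical names chosen for illustration and are not the identifiers used in nfsd; the only facts taken from the message are the 32-bit overflow of "count times four" and the cap of 1000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap, mirroring the "> 1000" threshold named in the message. */
#define MAX_NOTIFY_WORDS 1000u

/*
 * Byte count for an array of 4-byte XDR words, computed in 32 bits.
 * For count > 0x3fffffff the multiplication wraps modulo 2^32.
 */
static uint32_t nbytes_for(uint32_t count)
{
	return count << 2;
}

/* Reject oversized counts before any byte count is computed. */
static bool notify_count_ok(uint32_t count)
{
	return count <= MAX_NOTIFY_WORDS;
}
```

With counts of 0x40000000, 0x80000000 or 0xc0000000 the computed byte count is 0, which is why a length check on the count itself has to come first.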
      
      [  119.256854] ==================================================================
      [  119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
      [  119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538
      
      [  119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
      [  119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
      [  119.261202] Call Trace:
      [  119.262265]  dump_stack+0x71/0xab
      [  119.263371]  print_address_description+0x6a/0x270
      [  119.264609]  kasan_report+0x258/0x380
      [  119.265854]  ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
      [  119.267291]  nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
      [  119.268549]  ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
      [  119.269873]  ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
      [  119.271095]  nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
      [  119.272393]  ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
      [  119.273658]  nfsd_dispatch+0x183/0x850 [nfsd]
      [  119.274918]  svc_process+0x161c/0x31a0 [sunrpc]
      [  119.276172]  ? svc_printk+0x190/0x190 [sunrpc]
      [  119.277386]  ? svc_xprt_release+0x451/0x680 [sunrpc]
      [  119.278622]  nfsd+0x2b9/0x430 [nfsd]
      [  119.279771]  ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
      [  119.281157]  kthread+0x2db/0x390
      [  119.282347]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  119.283756]  ret_from_fork+0x35/0x40
      
      [  119.286041] Allocated by task 436:
      [  119.287525]  kasan_kmalloc+0xa0/0xd0
      [  119.288685]  kmem_cache_alloc+0xe9/0x1f0
      [  119.289900]  get_empty_filp+0x7b/0x410
      [  119.291037]  path_openat+0xca/0x4220
      [  119.292242]  do_filp_open+0x182/0x280
      [  119.293411]  do_sys_open+0x216/0x360
      [  119.294555]  do_syscall_64+0xa0/0x2f0
      [  119.295721]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  119.298068] Freed by task 436:
      [  119.299271]  __kasan_slab_free+0x130/0x180
      [  119.300557]  kmem_cache_free+0x78/0x210
      [  119.301823]  rcu_process_callbacks+0x35b/0xbd0
      [  119.303162]  __do_softirq+0x192/0x5ea
      
      [  119.305443] The buggy address belongs to the object at ffff880113ada000
                      which belongs to the cache filp of size 256
      [  119.308556] The buggy address is located 0 bytes inside of
                      256-byte region [ffff880113ada000, ffff880113ada100)
      [  119.311376] The buggy address belongs to the page:
      [  119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
      [  119.314428] flags: 0x17ffe000000100(slab)
      [  119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
      [  119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
      [  119.319050] page dumped because: kasan: bad access detected
      
      [  119.321652] Memory state around the buggy address:
      [  119.322993]  ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  119.324515]  ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  119.327547]                    ^
      [  119.328730]  ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  119.330218]  ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      [  119.331740] ==================================================================
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      3171822f
    • Dave Wysochanski's avatar
      Fix 16-byte memory leak in gssp_accept_sec_context_upcall · 0070ed3d
      Dave Wysochanski authored
      There is a 16-byte memory leak inside sunrpc/auth_gss on an nfs server when
      a client mounts with 'sec=krb5' in a simple mount / umount loop.  The leak
      is seen by either monitoring the kmalloc-16 slab or with kmemleak enabled:
      
      unreferenced object 0xffff92e6a045f030 (size 16):
        comm "nfsd", pid 1096, jiffies 4294936658 (age 761.110s)
        hex dump (first 16 bytes):
          2a 86 48 86 f7 12 01 02 02 00 00 00 00 00 00 00  *.H.............
        backtrace:
          [<000000004b2b79a7>] gssx_dec_buffer+0x79/0x90 [auth_rpcgss]
          [<000000002610ac1a>] gssx_dec_accept_sec_context+0x215/0x6dd [auth_rpcgss]
          [<000000004fd0e81d>] rpcauth_unwrap_resp+0xa9/0xe0 [sunrpc]
          [<000000002b099233>] call_decode+0x1e9/0x840 [sunrpc]
          [<00000000954fc846>] __rpc_execute+0x80/0x3f0 [sunrpc]
          [<00000000c83a961c>] rpc_run_task+0x10d/0x150 [sunrpc]
          [<000000002c2cdcd2>] rpc_call_sync+0x4d/0xa0 [sunrpc]
          [<000000000b74eea2>] gssp_accept_sec_context_upcall+0x196/0x470 [auth_rpcgss]
          [<000000003271273f>] svcauth_gss_proxy_init+0x188/0x520 [auth_rpcgss]
          [<000000001cf69f01>] svcauth_gss_accept+0x3a6/0xb50 [auth_rpcgss]
      
      If you map the above to code you'll see the following call chain
        gssx_dec_accept_sec_context
          gssx_dec_ctx  (missing from kmemleak output)
            gssx_dec_buffer(xdr, &ctx->mech)
      
      Inside gssx_dec_buffer there is 'kmemdup' where we allocate memory for
      any gssx_buffer (buf) and store into buf->data.  In the above instance,
      'buf == &ctx->mech'.
      
      Further up in the chain in gssp_accept_sec_context_upcall we see ctx->mech
      is part of a stack variable 'struct gssx_ctx rctxh'.  Now later inside
      gssp_accept_sec_context_upcall after gssp_call, there are a number of
      memcpy and kfree statements, but there is no kfree(rctxh.mech.data)
      after the memcpy into data->mech_oid.data.
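
The leak pattern described above (a decoder duplicates data into a buffer owned by a stack variable, and the caller copies it out but never frees it) can be sketched in user-space C. All names here (`sketch_buf`, `sketch_dec_buffer`, `sketch_take_mech`) are hypothetical, and `malloc`/`free` stand in for the kernel's `kmemdup`/`kfree`; this is not the actual auth_rpcgss code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a decoded buffer such as ctx->mech. */
struct sketch_buf {
	size_t len;
	void *data;
};

/* Decoder side: duplicate the wire bytes into a fresh allocation. */
static int sketch_dec_buffer(struct sketch_buf *buf, const void *src, size_t len)
{
	buf->data = malloc(len);
	if (!buf->data)
		return -1;
	memcpy(buf->data, src, len);
	buf->len = len;
	return 0;
}

/*
 * Caller side: copy the decoded bytes into the caller's own storage,
 * then release the temporary allocation. Returns the number of bytes
 * copied, or -1 if nothing was copied.
 */
static int sketch_take_mech(struct sketch_buf *mech, unsigned char *out, size_t out_len)
{
	int ret = -1;

	if (mech->data && mech->len <= out_len) {
		memcpy(out, mech->data, mech->len);
		ret = (int)mech->len;
	}
	free(mech->data);	/* the release step the message says was missing */
	mech->data = NULL;
	mech->len = 0;
	return ret;
}
```

Clearing the pointer after freeing it keeps a later cleanup path from freeing the same allocation twice.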
      
      With this patch applied and the same mount / unmount loop, the kmalloc-16
      slab is stable and kmemleak enabled no longer shows the above backtrace.
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      0070ed3d
    • Chuck Lever's avatar
      svcrdma: Fix incorrect return value/type in svc_rdma_post_recvs · af7fd74e
      Chuck Lever authored
      This crept in during the development process and wasn't caught
      before I posted the "final" version.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 0b2613c5883f ('svcrdma: Allocate recv_ctxt's on CPU ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      af7fd74e
  2. 11 May, 2018 21 commits
    • Chuck Lever's avatar
      svcrdma: Remove unused svc_rdma_op_ctxt · 51cc257a
      Chuck Lever authored
      Clean up: Eliminate a structure that is no longer used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      51cc257a
    • Chuck Lever's avatar
      svcrdma: Persistently allocate and DMA-map Send buffers · 99722fe4
      Chuck Lever authored
      While sending each RPC Reply, svc_rdma_sendto allocates and DMA-
      maps a separate buffer where the RPC/RDMA transport header is
      constructed. The buffer is unmapped and released in the Send
      completion handler. This is significant per-RPC overhead,
      especially for small RPCs.
      
      Instead, allocate and DMA-map a buffer, and cache it in each
      svc_rdma_send_ctxt. This buffer and its mapping can be re-used
      for each RPC, saving the cost of memory allocation and DMA
      mapping.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      99722fe4
    • Chuck Lever's avatar
      svcrdma: Simplify svc_rdma_send() · 3abb03fa
      Chuck Lever authored
      Clean up: No current caller of svc_rdma_send passes in a chained
      WR. The logic that counts the chain length can be replaced with a
      constant (1).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      3abb03fa
    • Chuck Lever's avatar
      svcrdma: Remove post_send_wr · 986b7889
      Chuck Lever authored
      Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt,
      svc_rdma_post_send_wr is nearly empty.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      986b7889
    • Chuck Lever's avatar
      svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt · 25fd86ec
      Chuck Lever authored
      Receive buffers are always the same size, but each Send WR has a
      variable number of SGEs, based on the contents of the xdr_buf being
      sent.
      
      While assembling a Send WR, keep track of the number of SGEs so that
      we don't exceed the device's maximum, or walk off the end of the
      Send SGE array.
      
      For now the Send path just fails if it exceeds the maximum.
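
The accounting described above can be sketched as a bounded append: each new SGE is checked against both the device limit and the array size before it is stored. The structure, the function name and the choice of `-EIO` are assumptions made for illustration; they are not taken from svcrdma.

```c
#include <errno.h>

/* Hypothetical size of the per-context SGE array. */
#define SKETCH_MAX_SGES 4

struct sketch_send_ctxt {
	int sge_count;				/* SGEs used so far */
	int max_sges;				/* limit reported by the device */
	unsigned long addr[SKETCH_MAX_SGES];	/* one entry per SGE */
};

/*
 * Append one SGE. Fails instead of writing past the array or
 * exceeding what the device can handle in a single Send WR.
 */
static int sketch_add_sge(struct sketch_send_ctxt *ctxt, unsigned long addr)
{
	if (ctxt->sge_count >= ctxt->max_sges ||
	    ctxt->sge_count >= SKETCH_MAX_SGES)
		return -EIO;
	ctxt->addr[ctxt->sge_count++] = addr;
	return 0;
}
```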
      
      The current logic in svc_rdma_accept bases the maximum number of
      Send SGEs on the largest NFS request that can be sent or received.
      In the transport layer, the limit is actually based on the
      capabilities of the underlying device, not on properties of the
      Upper Layer Protocol.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      25fd86ec
    • Chuck Lever's avatar
      svcrdma: Introduce svc_rdma_send_ctxt · 4201c746
      Chuck Lever authored
      svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      Introduce a replacement to svc_rdma_op_ctxt's that is built
      especially for the svcrdma Send path.
      
      Subsequent patches will take advantage of this new structure by
      allocating real resources which are then cached in these objects.
      The allocations are freed when the transport is torn down.
      
      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and send_ctxt are not confused. As an
      additional clean up, structure fields are renamed to conform with
      kernel coding conventions.
      
      Additional clean ups:
      - Handle svc_rdma_send_ctxt_get allocation failure at each call
        site, rather than pre-allocating and hoping we guessed correctly
      - All send_ctxt_put call-sites request page freeing, so remove
        the @free_pages argument
      - All send_ctxt_put call-sites unmap SGEs, so fold that into
        svc_rdma_send_ctxt_put
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      4201c746
    • Chuck Lever's avatar
      svcrdma: Clean up Send SGE accounting · 23262790
      Chuck Lever authored
      Clean up: Since there's already a svc_rdma_op_ctxt being passed
      around with the running count of mapped SGEs, drop unneeded
      parameters to svc_rdma_post_send_wr().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      23262790
    • Chuck Lever's avatar
      svcrdma: Refactor svc_rdma_dma_map_buf · f016f305
      Chuck Lever authored
      Clean up: svc_rdma_dma_map_buf does mostly the same thing as
      svc_rdma_dma_map_page, so let's fold these together.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      f016f305
    • Chuck Lever's avatar
      svcrdma: Allocate recv_ctxt's on CPU handling Receives · eb5d7a62
      Chuck Lever authored
      There is a significant latency penalty when processing an ingress
      Receive if the Receive buffer resides in memory that is not on the
      same NUMA node as the CPU handling completions for a CQ.
      
      The system administrator and the device driver determine which CPU
      handles completions. This CPU does not change during the life of the CQ.
      Further, the Upper Layer does not have any visibility of which CPU it
      is.
      
      Allocating Receive buffers in the Receive completion handler
      guarantees that Receive buffers are allocated on the preferred NUMA
      node for that CQ.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      eb5d7a62
    • Chuck Lever's avatar
      svcrdma: Persistently allocate and DMA-map Receive buffers · 3316f063
      Chuck Lever authored
      The current Receive path uses an array of pages which are allocated
      and DMA mapped when each Receive WR is posted, and then handed off
      to the upper layer in rqstp::rq_arg. The page flip releases unused
      pages in the rq_pages pagelist. This mechanism introduces a
      significant amount of overhead.
      
      So instead, kmalloc the Receive buffer, and leave it DMA-mapped
      while the transport remains connected. This confers a number of
      benefits:
      
      * Each Receive WR requires only one receive SGE, no matter how large
        the inline threshold is. This helps the server-side NFS/RDMA
        transport operate on less capable RDMA devices.
      
      * The Receive buffer is left allocated and mapped all the time. This
        relieves svc_rdma_post_recv from the overhead of allocating and
        DMA-mapping a fresh buffer.
      
      * svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
        It has to DMA sync only the number of bytes that were received.
      
      * svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
        for each page in the Receive buffer, making it a constant-time
        function.
      
      * The Receive buffer is now plugged directly into the rq_arg's
        head[0].iov_vec, and can be larger than a page without spilling
        over into rq_arg's page list. This enables simplification of
        the RDMA Read path in subsequent patches.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      3316f063
    • Chuck Lever's avatar
      svcrdma: Preserve Receive buffer until svc_rdma_sendto · 3a88092e
      Chuck Lever authored
      Rather than releasing the incoming svc_rdma_recv_ctxt at the end of
      svc_rdma_recvfrom, hold onto it until svc_rdma_sendto.
      
      This permits the contents of the Receive buffer to be preserved
      through svc_process and then referenced directly in sendto as it
      constructs Write and Reply chunks to return to the client.
      
      The real changes will come in subsequent patches.
      
      Note: I cannot use ->xpo_release_rqst for this purpose because that
      is called _before_ ->xpo_sendto. svc_rdma_sendto uses information in
      the received Call transport header to construct the Reply transport
      header, which is preserved in the RPC's Receive buffer.
      
      The historical comment in svc_send() isn't helpful: it is already
      obvious that ->xpo_release_rqst is being called before ->xpo_sendto,
      but there is no explanation for this ordering going back to the
      beginning of the git era.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      3a88092e
    • Chuck Lever's avatar
      svcrdma: Simplify svc_rdma_recv_ctxt_put · 1e5f4160
      Chuck Lever authored
      Currently svc_rdma_recv_ctxt_put's callers have to know whether they
      want to free the ctxt's pages or not. This means the human
      developers have to know when and why to set that free_pages
      argument.
      
      Instead, the ctxt should carry that information with it so that
      svc_rdma_recv_ctxt_put does the right thing no matter who is
      calling.
      
      We want to keep track of the number of pages in the Receive buffer
      separately from the number of pages pulled over by RDMA Read. This
      is so that the correct number of pages can be freed properly and
      that number is well-documented.
      
      So now, rc_hdr_count is the number of pages consumed by head[0]
      (i.e., the page index where the Read chunk should start); and
      rc_page_count is always the number of pages that need to be released
      when the ctxt is put.
      
      The @free_pages argument is no longer needed.
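
The idea that the context carries its own release count, so that "put" needs no extra flag, can be sketched as follows. The structure and function names are hypothetical, and `malloc`/`free` stand in for page allocation; only the "count travels with the ctxt" design is taken from the message.

```c
#include <stdlib.h>

/* Hypothetical upper bound on pages tracked by one context. */
#define SKETCH_MAX_PAGES 8

struct sketch_recv_ctxt {
	unsigned int page_count;	/* pages that must be released on put */
	void *pages[SKETCH_MAX_PAGES];
};

/*
 * Release exactly the pages the context says it owns. Callers do not
 * pass a "free pages?" flag; the context already knows the answer.
 * Returns the number of pages released.
 */
static unsigned int sketch_recv_ctxt_put(struct sketch_recv_ctxt *ctxt)
{
	unsigned int i, released = 0;

	for (i = 0; i < ctxt->page_count; i++) {
		free(ctxt->pages[i]);
		ctxt->pages[i] = NULL;
		released++;
	}
	ctxt->page_count = 0;
	return released;
}
```

A second put on the same context releases nothing, because the count was reset by the first.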
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      1e5f4160
    • Chuck Lever's avatar
      svcrdma: Remove sc_rq_depth · 2c577bfe
      Chuck Lever authored
      Clean up: No need to retain rq_depth in struct svcrdma_xprt, it is
      used only in svc_rdma_accept().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      2c577bfe
    • Chuck Lever's avatar
      svcrdma: Introduce svc_rdma_recv_ctxt · ecf85b23
      Chuck Lever authored
      svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      To reduce contention further, separate the use of these objects in
      the Receive and Send paths in svcrdma.
      
      Subsequent patches will take advantage of this separation by
      allocating real resources which are then cached in these objects.
      The allocations are freed when the transport is torn down.
      
      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
      additional clean up, structure fields are renamed to conform with
      kernel coding conventions.
      
      As a final clean up, helpers related to recv_ctxt are moved closer
      to the functions that use them.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      ecf85b23
    • Chuck Lever's avatar
      svcrdma: Trace key RDMA API events · bd2abef3
      Chuck Lever authored
      This includes:
        * Posting on the Send and Receive queues
        * Send, Receive, Read, and Write completion
        * Connect upcalls
        * QP errors
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      bd2abef3
    • Chuck Lever's avatar
      svcrdma: Trace key RPC/RDMA protocol events · 98895edb
      Chuck Lever authored
      This includes:
        * Transport accept and tear-down
        * Decisions about using Write and Reply chunks
        * Each RDMA segment that is handled
        * Whenever an RDMA_ERR is sent
      
      As a clean-up, I've standardized the order of the includes, and
      removed some now redundant dprintk call sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      98895edb
    • Chuck Lever's avatar
      xprtrdma: Prepare RPC/RDMA includes for server-side trace points · b6e717cb
      Chuck Lever authored
      Clean up: Move #include <trace/events/rpcrdma.h> into source files,
      similar to how it is done with trace/events/sunrpc.h.
      
      Server-side trace points will be part of the rpcrdma subsystem,
      just like the client-side trace points.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      b6e717cb
    • Chuck Lever's avatar
      svcrdma: Use passed-in net namespace when creating RDMA listener · 8dafcbee
      Chuck Lever authored
      Ensure each RDMA listener and its children transports are created in
      the same net namespace as the user that started the NFS service.
      This is similar to how listener sockets are created in
      svc_create_socket, required for enabling support for containers.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      8dafcbee
    • Trond Myklebust's avatar
      nfsd: Do not refuse to serve out of cache · 7e5d0e0d
      Trond Myklebust authored
      Currently the knfsd replay cache appears to try to refuse replying to
      retries that come within 200ms of the cache entry being created. That
      makes limited sense in today's world of high speed TCP.
      
      After a TCP disconnection, a client can very easily reconnect and retry
      an rpc in less than 200ms.  If this logic drops that retry, however, the
      client may be quite slow to retry again.  This logic is original to the
      first reply cache implementation in 2.1, and may have made more sense
      for UDP clients that retried much more frequently.
      
      After this patch we will still drop on finding the original request
      still in progress.  We may want to fix that as well at some point,
      though it's less likely.
      
      Note that svc_check_conn_limits is often the cause of those
      disconnections.  We may want to fix that some day.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      7e5d0e0d
    • Scott Mayhew's avatar
      nfsd: make nfsd4_scsi_identify_device retry with a larger buffer · dac27072
      Scott Mayhew authored
      nfsd4_scsi_identify_device() performs a single IDENTIFY command for the
      device identification VPD page using a small buffer.  If the reply is
      too large to fit in this buffer then the GETDEVICEINFO reply will not
      contain any info for the SCSI volume aside from the registration key.
      This can happen for example if the device has descriptors using long
      SCSI name strings.
      
      When the initial reply from the device indicates a larger buffer is
      needed, retry once using the page length from that reply.
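
The "retry once with the length the device asked for" logic can be sketched in user-space C. Everything here is hypothetical: `sketch_query_fn` models a query that fills as much as fits and returns the full length the device wants to report, and `sketch_fake_device` is a made-up device with a 600-byte reply. None of these are the block-layer or SCSI calls used by nfsd.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical query callback: fills at most buf_len bytes and returns
 * the total length the device would need for a complete reply.
 */
typedef size_t (*sketch_query_fn)(unsigned char *buf, size_t buf_len);

/* Made-up device whose complete reply is 600 bytes long. */
static size_t sketch_fake_device(unsigned char *buf, size_t buf_len)
{
	const size_t full = 600;
	size_t n = buf_len < full ? buf_len : full;

	memset(buf, 0xab, n);
	return full;
}

/*
 * Query with a small buffer first; if the reply did not fit, retry
 * exactly once with the length reported by the first attempt.
 * Returns a malloc'ed buffer (caller frees) or NULL on failure.
 */
static unsigned char *sketch_identify(sketch_query_fn query,
				      size_t initial_len, size_t *out_len)
{
	size_t len = initial_len;
	int retried = 0;

	for (;;) {
		unsigned char *buf = malloc(len);
		size_t needed;

		if (!buf)
			return NULL;
		needed = query(buf, len);
		if (needed <= len) {
			*out_len = needed;
			return buf;
		}
		free(buf);
		if (retried)
			return NULL;	/* still too small after one retry */
		retried = 1;
		len = needed;
	}
}
```

Limiting the loop to a single retry keeps a misbehaving device, one that reports a larger length every time, from driving unbounded allocations.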
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      dac27072
  3. 07 May, 2018 2 commits
  4. 06 May, 2018 7 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 701e39d0
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM:
         - Fix proxying of GICv2 CPU interface accesses
         - Fix crash when switching to BE
         - Track source vcpu for GICv2 SGIs
         - Fix an outdated bit of documentation
      
        x86:
         - Speed up injection of expired timers (for stable)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: remove APIC Timer periodic/oneshot spikes
        arm64: vgic-v2: Fix proxying of cpuif access
        KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
        KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
        KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI
      701e39d0
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 772d4f84
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - fix a compile warning in the AMD IOMMU driver with irq remapping
         disabled
      
       - fix for VT-d interrupt remapping and invalidation size (caused a
         BUG_ON when trying to invalidate more than 4GB)
      
       - build fix and a regression fix for broken graphics with old DTS for
         the rockchip iommu driver
      
       - a revert in the PCI window reservation code which fixes a regression
         with VFIO.
      
      * tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu: rockchip: fix building without CONFIG_OF
        iommu/vt-d: Use WARN_ON_ONCE instead of BUG_ON in qi_flush_dev_iotlb()
        iommu/vt-d: fix shift-out-of-bounds in bug checking
        iommu/dma: Move PCI window region reservation back into dma specific path.
        iommu/rockchip: Make clock handling optional
        iommu/amd: Hide unused iommu_table_lock
        iommu/vt-d: Fix usage of force parameter in intel_ir_reconfigure_irte()
      772d4f84
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9c48eb6a
      Linus Torvalds authored
      Pull x86 fix from Thomas Gleixner:
       "Unbreak the CPUID CPUID_8000_0008_EBX reload which got dropped when
        the evaluation of physical and virtual bits which uses the same CPUID
        leaf was moved out of get_cpu_cap()"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Restore CPUID_8000_0008_EBX reload
      9c48eb6a
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fe282c60
      Linus Torvalds authored
      Pull clocksource fixes from Thomas Gleixner:
       "The recent addition of the early TSC clocksource breaks on machines
        which have an unstable TSC because in case that TSC is disabled, then
        the clocksource selection logic falls back to the early TSC which is
        obviously bogus.
      
        That also unearthed a few robustness issues in the clocksource
        derating code which are addressed as well"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Rework stale comment
        clocksource: Consistent de-rate when marking unstable
        x86/tsc: Fix mark_tsc_unstable()
        clocksource: Initialize cs->wd_list
        clocksource: Allow clocksource_mark_unstable() on unregistered clocksources
        x86/tsc: Always unregister clocksource_tsc_early
      fe282c60
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 03b5f0c1
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A single fix to prevent false positives in the spurious interrupt
        detector when more than a single demultiplex register is evaluated in
        the Qualcomm irq combiner driver"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/qcom: Fix check for spurious interrupts
      03b5f0c1
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86 · ee946c36
      Linus Torvalds authored
      Pull x86 platform driver fixes from Darren Hart:
      
       - We missed a case in the Dell config dependencies resulting in a
         possible bad configuration, resolve it by giving up on trying to keep
         DELL_LAPTOP visible in the menu and make it depend on DELL_SMBIOS.
      
       - Fix a null pointer dereference at module unload for the asus-wireless
         driver.
      
      * tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/x86: Kconfig: Fix dell-laptop dependency chain.
        platform/x86: asus-wireless: Fix NULL pointer dereference
      ee946c36
    • Linus Torvalds's avatar
      Merge tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 8e95cb33
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some USB driver fixes for 4.17-rc4.
      
        The majority of them are some USB gadget fixes that missed my last
        pull request. The "largest" patch in here is a fix for the old visor
        driver that syzbot found 6 months or so ago and I finally remembered
        to fix it.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        Revert "usb: host: ehci: Use dma_pool_zalloc()"
        usb: typec: tps6598x: handle block reads separately with plain-I2C adapters
        usb: typec: tcpm: Release the role mux when exiting
        USB: Accept bulk endpoints with 1024-byte maxpacket
        xhci: Fix use-after-free in xhci_free_virt_device
        USB: serial: visor: handle potential invalid device configuration
        USB: serial: option: adding support for ublox R410M
        usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
        usb: musb: host: fix potential NULL pointer dereference
        usb: gadget: composite Allow for larger configuration descriptors
        usb: dwc3: gadget: Fix list_del corruption in dwc3_ep_dequeue
        usb: dwc3: gadget: dwc3_gadget_del_and_unmap_request() can be static
        usb: dwc2: pci: Fix error return code in dwc2_pci_probe()
        usb: dwc2: WA for Full speed ISOC IN in DDMA mode.
        usb: dwc2: dwc2_vbus_supply_init: fix error check
        usb: gadget: f_phonet: fix pn_net_xmit()'s return type
      8e95cb33
  5. 05 May, 2018 6 commits
    • Anthoine Bourgeois's avatar
      KVM: x86: remove APIC Timer periodic/oneshot spikes · ecf08dad
      Anthoine Bourgeois authored
      Since the commit "8003c9ae: add APIC Timer periodic/oneshot mode VMX
      preemption timer support", a Windows 10 guest has some erratic timer
      spikes.
      
      Here are the results for a 1ms timer fired 150000 times without any load:
                | Before 8003c9ae | After 8003c9ae
      Max       |          1834us |        86000us
      Mean      |          1100us |         1021us
      Deviation |            59us |          149us
      Here are the results for a 1ms timer fired 150000 times with a cpu-z stress test:
                | Before 8003c9ae | After 8003c9ae
      Max       |         32000us |       140000us
      Mean      |          1006us |         1997us
      Deviation |           140us |        11095us
      
      The root cause of the problem is that starting an hrtimer with an expiry
      time already in the past can take more than 20 milliseconds to trigger
      the timer function.  It can be solved by forwarding such past timers
      immediately, rather than submitting them to hrtimer_start().
      In case the timer is periodic, update the target expiration and call
      hrtimer_start() with it.
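
      The control flow described above can be sketched in plain C. This is a
      simplified userspace illustration, not the kernel patch itself: the
      struct and every sketch_* name are hypothetical, and sketch_queue()
      merely stands in for hrtimer_start() by counting calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the APIC timer state. */
struct sketch_timer {
    uint64_t target_expiration_ns; /* absolute expiry time */
    uint64_t period_ns;            /* 0 means one-shot */
    unsigned int fired;            /* times the callback ran directly */
    unsigned int queued;           /* times handed to the timer queue */
};

/* Stand-in for the timer callback (assumption: it only needs to run once). */
static void sketch_fire(struct sketch_timer *t)
{
    t->fired++;
}

/* Stand-in for hrtimer_start(); it only records that the timer was queued. */
static void sketch_queue(struct sketch_timer *t)
{
    t->queued++;
}

/* Move a periodic timer's target forward by exactly one period. */
static void sketch_advance_periodic(struct sketch_timer *t)
{
    assert(t->period_ns > 0);
    t->target_expiration_ns += t->period_ns;
}

/*
 * Start the timer. If the expiry is already in the past, run the callback
 * right away instead of queueing it. A one-shot timer is then done; a
 * periodic timer is re-armed with its next target expiration.
 */
static void sketch_start_timer(struct sketch_timer *t, uint64_t now_ns)
{
    if (t->target_expiration_ns <= now_ns) {
        sketch_fire(t);
        if (t->period_ns == 0)
            return;
        sketch_advance_periodic(t);
    }
    sketch_queue(t);
}
```

      In this sketch an expired one-shot timer never reaches the queue, while
      an expired periodic timer fires once and is queued with its target moved
      one period ahead.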
      
      v2: Check if the tsc deadline is already expired. Thank you Mika.
      v3: Execute the past timers immediately rather than submitting them to
      hrtimer_start().
      v4: Rearm the periodic timer with advance_periodic_target_expiration(),
      a simpler version of set_target_expiration(). Thank you Paolo.
      
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
      Fixes: 8003c9ae ("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      ecf08dad
    • Radim Krčmář's avatar
      Merge tag 'kvmarm-fixes-for-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm · f3351c60
      Radim Krčmář authored
      KVM/arm fixes for 4.17, take #2
      
      - Fix proxying of GICv2 CPU interface accesses
      - Fix crash when switching to BE
      - Track source vcpu for GICv2 SGIs
      - Fix an outdated bit of documentation
      f3351c60
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v4.17' of... · c1c07416
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - remove stale comment in modpost
      
       - extend MAINTAINERS entry to cover modpost and more makefiles
      
       - fix missed building of SANCOV gcc-plugin
      
       - replace left-over 'bison' with $(YACC)
      
       - display short log when generating parser of genksyms
      
      * tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        genksyms: fix typo in parse.tab.{c,h} generation rules
        kbuild: replace hardcoded bison in cmd_bison_h with $(YACC)
        gcc-plugins: fix build condition of SANCOV plugin
        MAINTAINERS: Update Kbuild entry with a few paths
        modpost: delete stale comment
      c1c07416
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 4a7a7729
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A handful of fixes for the stm32mp1 clk driver came in during the
        merge window for the driver that got merged in the merge window.
      
        Plus a warning fix for unused PM ops and a couple fixes for the meson
        clk driver clk names that went unnoticed with the regmap rework.
      
        There's also another fix in here for the mux rounding flag which
        wasn't doing what it said it did, but now it does"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: meson: meson8b: fix meson8b_cpu_clk parent clock name
        clk: meson: meson8b: fix meson8b_fclk_div3_div clock name
        clk: meson: drop meson_aoclk_gate_regmap_ops
        clk: meson: honor CLK_MUX_ROUND_CLOSEST in clk_regmap
        clk: honor CLK_MUX_ROUND_CLOSEST in generic clk mux
        clk: cs2000: mark resume function as __maybe_unused
        clk: stm32mp1: remove ck_apb_dbg clock
        clk: stm32mp1: set stgen_k clock as critical
        clk: stm32mp1: add missing tzc2 clock
        clk: stm32mp1: fix SAI3 & SAI4 clocks
        clk: stm32mp1: remove unused dfsdm_src[] const
        clk: stm32mp1: add missing static
      4a7a7729
    • Linus Torvalds's avatar
      Merge tag 'rproc-v4.17-1' of git://github.com/andersson/remoteproc · f9331473
      Linus Torvalds authored
      Pull remoteproc and rpmsg fixes from Bjorn Andersson:
      
       - fix screw-up when reversing boolean for rproc_stop()
      
       - add missing OF node refcounting dereferences
      
       - add missing MODULE_ALIAS in rpmsg_char
      
      * tag 'rproc-v4.17-1' of git://github.com/andersson/remoteproc:
        rpmsg: added MODULE_ALIAS for rpmsg_char
        remoteproc: qcom: Fix potential device node leaks
        remoteproc: fix crashed parameter logic on stop call
      f9331473
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux · c12fd0fe
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "vmwgfx, i915, vc4, vga dac fixes.
      
        This seems eerily quiet, so I expect it will explode next week or
        something.
      
        One i915 model firmware, two vmwgfx fixes, one vc4 fix and one bridge
        leak fix"
      
      * tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux:
        drm/bridge: vga-dac: Fix edid memory leak
        drm/vc4: Make sure vc4_bo_{inc,dec}_usecnt() calls are balanced
        drm/i915/glk: Add MODULE_FIRMWARE for Geminilake
        drm/vmwgfx: Fix a buffer object leak
        drm/vmwgfx: Clean up fbdev modeset locking
      c12fd0fe