1. 08 Aug, 2018 1 commit
    • xprtrdma: Fix disconnect regression · 8d4fb8ff
      Chuck Lever authored
      I found that injecting disconnects with v4.18-rc resulted in
      random failures of the multi-threaded git regression test.
      
      The root cause appears to be that, after a reconnect, the
      RPC/RDMA transport is waking pending RPCs before the transport has
      posted enough Receive buffers to receive the Replies. If a Reply
      arrives before enough Receive buffers are posted, the connection
      is dropped. A few connection drops happen in quick succession as
      the client and server struggle to regain credit synchronization.
      
      This regression was introduced with commit 7c8d9e7c ("xprtrdma:
      Move Receive posting to Receive handler"). The client is supposed to
      post a single Receive when a connection is established because
      it's not supposed to send more than one RPC Call before it gets
      a fresh credit grant in the first RPC Reply [RFC 8166, Section
      3.3.3].
      
      Unfortunately there appears to be a longstanding bug in the Linux
      client's credit accounting mechanism. On connect, it simply dumps
      all pending RPC Calls onto the new connection. It's possible it has
      done this ever since the RPC/RDMA transport was added to the kernel
      ten years ago.
      
      Servers have so far been tolerant of this bad behavior. Currently no
      server implementation ever changes its credit grant over reconnects,
      and servers always repost enough Receives before connections are
      fully established.
      
      The Linux client implementation used to post a Receive before each
      of these Calls. This has covered up the flooding send behavior.
      
      I could try to correct this old bug so that the client sends exactly
      one RPC Call and waits for a Reply. Since we are so close to the
      next merge window, I'm going to instead provide a simple patch to
      post enough Receives before a reconnect completes (based on the
      number of credits granted to the previous connection).
      
      The spurious disconnects will be gone, but the client will still
      send multiple RPC Calls immediately after a reconnect.
      
      Addressing the latter problem will wait for a merge window because
      a) I expect it to be a large change requiring lots of testing, and
      b) obviously the Linux client has interoperated successfully since
      day zero while still being broken.
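
The interim fix described above (post enough Receives before the reconnect completes, sized from the previous connection's credit grant) can be pictured with the following minimal sketch. It is plain user-space C, not the xprtrdma code: `struct conn`, `post_receive()` and `prepare_reconnect()` are hypothetical names invented for illustration, and the fallback to a single Receive when no credits are recorded is an assumption based on the RFC 8166 behaviour quoted in the message.

```c
#include <assert.h>

/* Hypothetical model of a connection; not the kernel's data structures. */
struct conn {
    unsigned int credits;       /* credit grant seen on the previous connection */
    unsigned int posted_recvs;  /* Receive buffers currently posted */
};

/* Stand-in for posting one Receive buffer. */
static void post_receive(struct conn *c)
{
    c->posted_recvs++;
}

/*
 * Called before the reconnect is reported as complete, that is, before
 * pending RPCs are woken. Tops up the posted Receives to the previous
 * credit grant and returns how many were newly posted.
 */
static unsigned int prepare_reconnect(struct conn *c)
{
    unsigned int needed;
    unsigned int posted = 0;

    assert(c != 0);
    needed = c->credits ? c->credits : 1;   /* assumed floor of one Receive */

    while (c->posted_recvs < needed) {
        post_receive(c);
        posted++;
    }
    return posted;
}
```

Under this model, every Reply to the flood of Calls sent right after the reconnect finds a Receive buffer waiting, which is why the spurious disconnects go away even though the flooding itself is unchanged.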
      
      Fixes: 7c8d9e7c ("xprtrdma: Move Receive posting to ... ")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 31 Jul, 2018 6 commits
  3. 30 Jul, 2018 8 commits
  4. 26 Jul, 2018 17 commits
  5. 25 Jul, 2018 8 commits
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 6e77b267
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "One more round of updates for problems seen this -rc series. Driver
        fixes are:
      
         - Amlogic Meson audio divider fix and CPU clk critical marking
      
         - Qualcomm multimedia GDSC marked as 'always on' to keep display
           working
      
         - Aspeed fixes for critical clks, resets causing clks to stay
           disabled, and an incorrect HPLL frequency calculation
      
         - Marvell Armada 3700 cpu clks would undervolt when switching from
           low frequencies to high frequencies because the voltage didn't
           stabilize in time so now we switch to an intermediate frequency
      
        Plus we have a core framework thinko that messed up the debugfs flag
        printing logic to make it not very useful"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: aspeed: Support HPLL strapping on ast2400
        clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz
        clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical
        clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON
        clk: aspeed: Treat a gate in reset as disabled
        clk: Really show symbolic clock flags in debugfs
        clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock
        clk: meson: audio-divider is one based
        clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
    • Merge tag 'fscache-fixes-20180725' of... · 5c61ef1b
      Linus Torvalds authored
      Merge tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache/cachefiles fixes from David Howells:
      
       - Allow cancelled operations to be queued so they can be cleaned up.
      
       - Fix a refcounting bug in the monitoring of reads on backend files
         whereby a race can occur between monitor objects being listed for
         work, the work processing being queued and the work processor running
         and destroying the monitor objects.
      
       - Fix a ref overput in object attachment, whereby a tentatively
         considered object is put in error handling without first being 'got'.
      
       - Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an
         assertion occurs when we retry because it seems the object is now
         active.
      
       - Wait rather than BUG'ing on an object collision in the depths of
         cachefiles, as the active object should be being cleaned up - also
         depends on the one above.
      
      * tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        cachefiles: Wait rather than BUG'ing on "Unexpected object collision"
        cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
        fscache: Fix reference overput in fscache_attach_object() error handling
        cachefiles: Fix refcounting bug in backing-file read monitoring
        fscache: Allow cancelled operations to be enqueued
    • cachefiles: Wait rather than BUG'ing on "Unexpected object collision" · c2412ac4
      Kiran Kumar Modukuri authored
      If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
      the active object tree, we have been emitting a BUG after logging
      information about it and the new object.
      
      Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
      on the old object (or return an error).  The ACTIVE flag should be cleared
      after it has been removed from the active object tree.  A timeout of 60s is
      used in the wait, so we shouldn't be able to get stuck there.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag · 5ce83d4b
      Kiran Kumar Modukuri authored
      In cachefiles_mark_object_active(), the new object is marked active and
      then we try to add it to the active object tree.  If a conflicting object
      is already present, we want to wait for that to go away.  After the wait,
      we go round again and try to re-mark the object as being active - but it's
      already marked active from the first time we went through and a BUG is
      issued.
      
      Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
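
The retry problem can be reproduced in miniature. The sketch below is a user-space model, not the cachefiles code: `struct object`, `struct tree` (a one-slot stand-in for the active object tree) and `try_mark_active()` are hypothetical names, and the wait for the old object is left to the caller. What it shows is only the shape of the fix: on a collision the flag is cleared again, so the next attempt does not trip the "already active" check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object; 'active' models the CACHEFILES_OBJECT_ACTIVE flag. */
struct object {
    atomic_bool active;
};

/* One-slot stand-in for the active object tree. */
struct tree {
    struct object *occupant;
};

/*
 * Mark obj active and try to insert it. Returns true on success and
 * false on a collision, in which case the caller waits and retries.
 */
static bool try_mark_active(struct tree *t, struct object *obj)
{
    /* Finding the flag already set here corresponds to the reported BUG. */
    bool was_active = atomic_exchange(&obj->active, true);
    assert(!was_active);

    if (t->occupant == NULL) {
        t->occupant = obj;
        return true;
    }

    /* Collision: clear the flag so that the retry starts from a clean state. */
    atomic_store(&obj->active, false);
    return false;
}
```

Without the `atomic_store()` on the collision path, a second call for the same object would hit the assertion, which mirrors the failure described in this commit.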
      
      Analysis from Kiran Kumar Modukuri:
      
      [Impact]
      Oops during heavy NFS + FSCache + Cachefiles
      
      CacheFiles: Error: Overlong wait for old active object to go away.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      
      CacheFiles: Error: Object already active

      kernel BUG at fs/cachefiles/namei.c:163!
      
      [Cause]
      In a heavily loaded system with big files being read and truncated, an
      fscache object for a cookie is being dropped and a new object is being
      looked up. The new object has to wait for the old object to go away
      before it can be moved to the active state.
      
      [Fix]
      Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
      retrying the object lookup.
      
      [Testcase]
      Have run ~100 hours of NFS stress tests and have not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • fscache: Fix reference overput in fscache_attach_object() error handling · f29507ce
      Kiran Kumar Modukuri authored
      When a cookie is allocated that causes fscache_object structs to be
      allocated, those objects are initialised with the cookie pointer, but
      aren't blessed with a ref on that cookie unless the attachment is
      successfully completed in fscache_attach_object().
      
      If attachment fails because the parent object was dying or there was a
      collision, fscache_attach_object() returns without incrementing the cookie
      counter - but upon failure of this function, the object is released which
      then puts the cookie, whether or not a ref was taken on the cookie.
      
      Fix this by taking a ref on the cookie when it is assigned in
      fscache_object_init(), even when we're creating a root object.
      
      
      Analysis from Kiran Kumar:
      
      This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
      
      BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
      
      The fscache cookie ref count is updated incorrectly during fscache
      object allocation, resulting in the following Oops.
      
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
      
      [Cause]
      Two threads are trying to operate on a cookie and two objects.
      
      (1) One thread tries to unmount the filesystem and in the process goes
          over a huge list of objects, marking them dead and deleting them.
          cookie->usage is also decremented in the following path:
      
            nfs_fscache_release_super_cookie
             -> __fscache_relinquish_cookie
              ->__fscache_cookie_put
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      (2) A second thread tries to look up an object for reading data in the
          following path:
      
          fscache_alloc_object
          1) cachefiles_alloc_object
              -> fscache_object_init
                 -> assign cookie, but usage not bumped.
          2) fscache_attach_object -> fails in cant_attach_object because the
               cookie's backing object or cookie's->parent object are going away
          3) fscache_put_object
              -> cachefiles_put_object
                ->fscache_object_destroy
                  ->fscache_cookie_put
                     ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      [NOTE from dhowells] It's unclear as to the circumstances in which (2) can
      take place, given that thread (1) is in nfs_kill_super(), however a
      conflicting NFS mount with slightly different parameters that creates a
      different superblock would do it.  A backtrace from Kiran seems to show
      that this is a possibility:
      
          kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
          ...
          RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
          Call Trace:
           __fscache_relinquish_cookie+0x87/0x120 [fscache]
           nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
           nfs_kill_super+0x29/0x40 [nfs]
           deactivate_locked_super+0x48/0x80
           deactivate_super+0x5c/0x60
           cleanup_mnt+0x3f/0x90
           __cleanup_mnt+0x12/0x20
           task_work_run+0x86/0xb0
           exit_to_usermode_loop+0xc2/0xd0
           syscall_return_slowpath+0x4e/0x60
           int_ret_from_sys_call+0x25/0x9f
      
      [Fix] Bump the cookie usage in fscache_object_init(), when the object is
      first assigned a cookie.  This is done atomically, so that the cookie is
      assigned and its usage bumped only if its refcount is not zero.  Remove
      the assignment in fscache_attach_object().
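
The balance between get and put that this fix restores can be sketched as follows. This is a user-space illustration using C11 atomics, not the fscache code: `struct cookie`, `struct object`, `cookie_get_not_zero()`, `object_init()` and `object_destroy()` are hypothetical stand-ins. The point is that the object takes its ref at the moment the cookie pointer is assigned, so the put on destruction is always matched, whether or not attachment later succeeds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct cookie {
    atomic_int usage;           /* reference count */
};

struct object {
    struct cookie *cookie;      /* NULL if no ref could be taken */
};

/* Take a ref only while the count is still non-zero. */
static bool cookie_get_not_zero(struct cookie *ck)
{
    int old = atomic_load(&ck->usage);

    while (old > 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&ck->usage, &old, old + 1))
            return true;
    }
    return false;
}

static void cookie_put(struct cookie *ck)
{
    int old = atomic_fetch_sub(&ck->usage, 1);

    assert(old > 0);            /* an overput would trip here */
}

/* Assign the cookie and take the ref in one step. */
static void object_init(struct object *obj, struct cookie *ck)
{
    obj->cookie = cookie_get_not_zero(ck) ? ck : NULL;
}

/* Drop the ref taken at init time, if there was one. */
static void object_destroy(struct object *obj)
{
    if (obj->cookie) {
        cookie_put(obj->cookie);
        obj->cookie = NULL;
    }
}
```

In this model a failed attach followed by `object_destroy()` leaves the cookie's count exactly where it was before `object_init()`, which is the property the original code lacked.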
      
      [Testcase]
      I have run ~100 hours of NFS stress tests and not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: ccc4fc3d ("FS-Cache: Implement the cookie management part of the netfs API")
      Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • cachefiles: Fix refcounting bug in backing-file read monitoring · 934140ab
      Kiran Kumar Modukuri authored
      cachefiles_read_waiter() has the right to access a 'monitor' object by
      virtue of being called under the waitqueue lock for one of the pages in its
      purview.  However, it has no ref on that monitor object or on the
      associated operation.
      
      What it is allowed to do is to move the monitor object to the operation's
      to_do list, but once it drops the work_lock, it's actually no longer
      permitted to access that object.  However, it is trying to enqueue the
      retrieval operation for processing - but it can only do this via a pointer
      in the monitor object, something it shouldn't be doing.
      
      If it doesn't enqueue the operation, the operation may not get processed.
      If the order is flipped so that the enqueue is first, then it's possible
      for the work processor to look at the to_do list before the monitor is
      enqueued upon it.
      
      Fix this by getting a ref on the operation so that we can trust that it
      will still be there once we've added the monitor to the to_do list and
      dropped the work_lock.  The op can then be enqueued after the lock is
      dropped.
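
The ordering described above (pin the operation, hand the monitor over under the lock, drop the lock, then enqueue) might look roughly like this in isolation. The sketch is a user-space model using a pthread mutex and C11 atomics; `struct op`, `struct monitor`, `enqueue_op()`, `put_op()` and `read_waiter()` are hypothetical names and do not reproduce the cachefiles implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct monitor;

struct op {
    atomic_int usage;           /* reference count */
    struct monitor *to_do;      /* list of monitors awaiting processing */
    int enqueued;               /* how many times the op was queued */
};

struct monitor {
    struct op *op;
    struct monitor *next;
};

static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for handing the op to a work processor. */
static void enqueue_op(struct op *op)
{
    op->enqueued++;
}

static void put_op(struct op *op)
{
    int old = atomic_fetch_sub(&op->usage, 1);

    assert(old > 0);
}

static void read_waiter(struct monitor *m)
{
    struct op *op;

    pthread_mutex_lock(&work_lock);
    op = m->op;
    atomic_fetch_add(&op->usage, 1);    /* pin the op while m is still ours */
    m->next = op->to_do;                /* move the monitor to the to_do list */
    op->to_do = m;
    pthread_mutex_unlock(&work_lock);

    /* m must not be touched from here on; op is safe because of our ref. */
    enqueue_op(op);
    put_op(op);
}
```

Because the local `op` pointer is backed by its own ref, the enqueue no longer needs to reach through the monitor after the lock has been released.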
      
      The bug can manifest in one of a couple of ways.  The first manifestation
      looks like:
      
       FS-Cache:
       FS-Cache: Assertion failed
       FS-Cache: 6 == 5 is false
       ------------[ cut here ]------------
       kernel BUG at fs/fscache/operation.c:494!
       RIP: 0010:fscache_put_operation+0x1e3/0x1f0
       ...
       fscache_op_work_func+0x26/0x50
       process_one_work+0x131/0x290
       worker_thread+0x45/0x360
       kthread+0xf8/0x130
       ? create_worker+0x190/0x190
       ? kthread_cancel_work_sync+0x10/0x10
       ret_from_fork+0x1f/0x30
      
      This is due to the operation being in the DEAD state (6) rather than
      INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
      fscache_put_operation().
      
      The bug can also manifest like the following:
      
       kernel BUG at fs/fscache/operation.c:69!
       ...
          [exception RIP: fscache_enqueue_operation+246]
       ...
       #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
       #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
       #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
      
      I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
      entirely clear which assertion failed.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: Lei Xue <carmark.dlut@gmail.com>
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Reported-by: Anthony DeRobertis <aderobertis@metrics.net>
      Reported-by: NeilBrown <neilb@suse.com>
      Reported-by: Daniel Axtens <dja@axtens.net>
      Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
    • fscache: Allow cancelled operations to be enqueued · d0eb06af
      Kiran Kumar Modukuri authored
      Alter the state-check assertion in fscache_enqueue_operation() to allow
      cancelled operations to be given processing time so they can be cleaned up.
      
      Also fix a debugging statement that was requiring such operations to have
      an object assigned.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Merge tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 9981b4fb
      Linus Torvalds authored
      Pull MIPS fixes from Paul Burton:
       "A couple more MIPS fixes for 4.18:
      
         - Fix an off-by-one in reporting PCI resource sizes to userland which
           regressed in v3.12.
      
         - Fix writes to DDR controller registers used to flush write buffers,
           which regressed with some refactoring in v4.2"
      
      * tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: ath79: fix register address in ath79_ddr_wb_flush()
        MIPS: Fix off-by-one in pci_resource_to_user()