1. 27 Jul, 2023 4 commits
    • md: refactor idle/frozen_sync_thread() to fix deadlock · 130443d6
      Yu Kuai authored
      Our test found the following deadlock in raid10:
      
      1) Issue a normal write, and the write fails:
      
        raid10_end_write_request
         set_bit(R10BIO_WriteError, &r10_bio->state)
         one_write_done
          reschedule_retry
      
        // later from md thread
        raid10d
         handle_write_completed
          list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
      
        // later from md thread
        raid10d
         if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
          list_move(conf->bio_end_io_list.prev, &tmp)
          r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
          raid_end_bio_io(r10_bio)
      
      Dependency chain 1: normal io is waiting for the superblock to be updated
      
      2) Trigger a recovery:
      
        raid10_sync_request
         raise_barrier
      
      Dependency chain 2: sync thread is waiting for normal io
      
      3) echo idle/frozen to sync_action:
      
        action_store
         mddev_lock
          md_unregister_thread
           kthread_stop
      
      Dependency chain 3: dropping 'reconfig_mutex' is waiting for the sync thread
      
      4) md thread can't update superblock:
      
        raid10d
         md_check_recovery
          if (mddev_trylock(mddev))
           md_update_sb
      
      Dependency chain 4: updating the superblock is waiting for 'reconfig_mutex'
      
      Hence a cyclic dependency exists. To fix the problem, we must break one
      of the dependencies. Dependencies 1 and 2 can't be broken because they
      are fundamental to the design. Breaking dependency 4 may be possible if
      it can be guaranteed that no io is inflight; however, this requires a new
      mechanism, which seems complex. Dependency 3 is a good choice, because
      idle/frozen only requires the sync thread to finish, which can be done
      asynchronously (this is already implemented), and 'reconfig_mutex' is no
      longer needed.
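
      The four dependency chains can be viewed as edges in a wait-for graph.
      The following is a minimal userspace sketch, not kernel code: the enum
      labels, `struct wait_graph`, `build_graph()` and `has_cycle()` are
      hypothetical names chosen for illustration. It checks that the four
      edges form a cycle and that dropping edge 3 removes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative labels for the four parties in the dependency chains. */
enum waiter { NORMAL_IO, SB_UPDATE, MUTEX_RELEASE, SYNC_THREAD, NR_WAITERS };

/* waits_on[a][b] is true when a cannot make progress until b does. */
struct wait_graph {
	bool waits_on[NR_WAITERS][NR_WAITERS];
};

static void build_graph(struct wait_graph *g, bool stop_sync_under_mutex)
{
	memset(g, 0, sizeof(*g));
	g->waits_on[NORMAL_IO][SB_UPDATE] = true;	/* chain 1 */
	g->waits_on[SYNC_THREAD][NORMAL_IO] = true;	/* chain 2 */
	/* chain 3 exists only while idle/frozen stops the thread under the mutex */
	g->waits_on[MUTEX_RELEASE][SYNC_THREAD] = stop_sync_under_mutex;
	g->waits_on[SB_UPDATE][MUTEX_RELEASE] = true;	/* chain 4 */
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool visit(const struct wait_graph *g, int node, int *state)
{
	assert(node >= 0 && node < NR_WAITERS);
	if (state[node] == 1)
		return true;	/* reached a node already on the path: cycle */
	if (state[node] == 2)
		return false;
	state[node] = 1;
	for (int next = 0; next < NR_WAITERS; next++)
		if (g->waits_on[node][next] && visit(g, next, state))
			return true;
	state[node] = 2;
	return false;
}

static bool has_cycle(const struct wait_graph *g)
{
	int state[NR_WAITERS] = { 0 };

	for (int node = 0; node < NR_WAITERS; node++)
		if (visit(g, node, state))
			return true;
	return false;
}
```

      Calling `build_graph(&g, true)` models the state before this patch, and
      `build_graph(&g, false)` models idle/frozen no longer holding
      'reconfig_mutex' while waiting for the sync thread.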
      
      This patch switches 'idle' and 'frozen' to wait asynchronously for the
      sync thread to finish. It also adds a sequence counter that records how
      many times the sync thread has finished, so that 'idle' won't keep
      waiting on a newly started sync thread.
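
      The role of the sequence counter can be illustrated with a small
      userspace model. This is a sketch under stated assumptions:
      `struct array_state`, its fields and the helper names are hypothetical
      stand-ins rather than the kernel's data structures, and real code would
      sleep on a wait queue instead of evaluating a predicate by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-array state that matters here. */
struct array_state {
	unsigned int sync_seq;	/* incremented each time a sync thread finishes */
	bool sync_running;	/* true while some sync thread is running */
};

static void sync_thread_start(struct array_state *s)
{
	s->sync_running = true;
}

static void sync_thread_done(struct array_state *s)
{
	s->sync_running = false;
	s->sync_seq++;
}

/*
 * Wake-up condition for an 'idle' waiter that sampled the counter on entry.
 * The waiter is done as soon as the thread it saw has finished, even if a
 * new sync thread has already been started in the meantime.
 */
static bool idle_wait_done(const struct array_state *s, unsigned int seq_at_entry)
{
	return s->sync_seq != seq_at_entry || !s->sync_running;
}
```

      Without the counter, a waiter that only tested the running flag would
      see it set again by the new sync thread and keep waiting.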
      
      Note that raid456 has a similar deadlock [1], and it has been verified
      [2] that this patch fixes that deadlock as well.
      
      [1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
      [2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/

      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
    • md: add a mutex to synchronize idle and frozen in action_store() · 6f56f0c4
      Yu Kuai authored
      Currently, for idle and frozen, action_store() holds 'reconfig_mutex'
      and calls md_reap_sync_thread() to stop the sync thread; however, this
      causes a deadlock (explained in the next patch). To fix the problem, the
      following patch will release 'reconfig_mutex' and wait on 'resync_wait',
      as md_set_readonly() and do_md_stop() do.
      
      Note that action_store() sets/clears 'MD_RECOVERY_FROZEN'
      unconditionally, which might cause unexpected problems. For example,
      'frozen' has just set 'MD_RECOVERY_FROZEN' and is still in progress when
      'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is started,
      which might starve the in-progress 'frozen'. A mutex is added to
      synchronize idle and frozen in action_store().
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-4-yukuai1@huaweicloud.com
    • md: refactor action_store() for 'idle' and 'frozen' · 64e5e09a
      Yu Kuai authored
      Prepare to handle 'idle' and 'frozen' differently in order to fix a
      deadlock. There are no functional changes, except that
      MD_RECOVERY_RUNNING is checked again after 'reconfig_mutex' is held.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-3-yukuai1@huaweicloud.com
    • Revert "md: unlock mddev before reap sync_thread in action_store" · a865b96c
      Yu Kuai authored
      This reverts commit 9dfbdafd.
      
      The reverted commit introduces a defect: sync_thread can be running
      while MD_RECOVERY_RUNNING is cleared, which causes unexpected problems,
      for example:
      
      list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
      Call trace:
       __list_add_valid+0xfc/0x140
       insert_work+0x78/0x1a0
       __queue_work+0x500/0xcf4
       queue_work_on+0xe8/0x12c
       md_check_recovery+0xa34/0xf30
       raid10d+0xb8/0x900 [raid10]
       md_thread+0x16c/0x2cc
       kthread+0x1a4/0x1ec
       ret_from_fork+0x10/0x18
      
      This is because the work is requeued while it is still in the workqueue:
      
      t1:			t2:
      action_store
       mddev_lock
        if (mddev->sync_thread)
         mddev_unlock
         md_unregister_thread
         // first sync_thread is done
      			md_check_recovery
      			 mddev_trylock
      			 /*
      			  * once MD_RECOVERY_DONE is set, new sync_thread
      			  * can start.
      			  */
      			 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
      			 INIT_WORK(&mddev->del_work, md_start_sync)
      			 queue_work(md_misc_wq, &mddev->del_work)
      			  test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
      			  // set pending bit
      			  insert_work
      			   list_add_tail
      			 mddev_unlock
         mddev_lock_nointr
         md_reap_sync_thread
         // MD_RECOVERY_RUNNING is cleared
       mddev_unlock
      
      t3:
      
      // before queued work started from t2
      md_check_recovery
       // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
       INIT_WORK(&mddev->del_work, md_start_sync)
        work->data = 0
        // work pending bit is cleared
       queue_work(md_misc_wq, &mddev->del_work)
        insert_work
         list_add_tail
         // list is corrupted
      
      The above commit is reverted to fix the problem; the deadlock that it
      tried to fix will be fixed in the following patches.
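
      The corruption in the trace above can be reproduced in miniature. The
      following userspace sketch uses simplified stand-ins for the kernel's
      list and work primitives (`init_work()`, `queue_work()` and the other
      helpers here are illustrative re-implementations, not the kernel
      functions). It shows that re-initializing a work item that is still
      linked into the queue clears its pending flag and resets its list node,
      so the next enqueue fails the same consistency check that the kernel
      reported: prev->next points back at prev instead of at the list head.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list node, in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Simplified consistency check in the spirit of __list_add_valid(). */
static bool list_add_valid(struct list_head *node, struct list_head *prev,
			   struct list_head *next)
{
	return next->prev == prev && prev->next == next &&
	       node != prev && node != next;
}

/* Append node to the tail of the list; return false on detected corruption. */
static bool list_add_tail_checked(struct list_head *node, struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!list_add_valid(node, prev, head))
		return false;
	head->prev = node;
	node->next = head;
	node->prev = prev;
	prev->next = node;
	return true;
}

/* Simplified work item: a pending flag plus the queue linkage. */
struct work {
	bool pending;
	struct list_head entry;
};

/* Like INIT_WORK(): clears the pending flag and resets the list node. */
static void init_work(struct work *w)
{
	w->pending = false;
	init_list_head(&w->entry);
}

enum queue_result { QUEUED, ALREADY_PENDING, LIST_CORRUPTED };

static enum queue_result queue_work(struct list_head *wq, struct work *w)
{
	if (w->pending)
		return ALREADY_PENDING;	/* normal protection against requeue */
	w->pending = true;
	return list_add_tail_checked(&w->entry, wq) ? QUEUED : LIST_CORRUPTED;
}
```

      In this model, queueing the same work twice is harmless because the
      pending flag rejects the second attempt. Calling `init_work()` between
      the two attempts defeats that protection, which corresponds to t3
      running INIT_WORK() before the work queued by t2 has started.
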
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-2-yukuai1@huaweicloud.com
  2. 26 Jul, 2023 1 commit
  3. 25 Jul, 2023 11 commits
  4. 20 Jul, 2023 1 commit
  5. 17 Jul, 2023 9 commits
  6. 16 Jul, 2023 10 commits
    • Linux 6.5-rc2 · fdf0eaf1
      Linus Torvalds authored
    • Merge tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa · 5b8d6e85
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix interaction between unaligned exception handler and load/store
         exception handler
      
       - fix parsing ISS network interface specification string
      
       - add comment about etherdev freeing to ISS network driver
      
      * tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix unaligned and load/store configuration interaction
        xtensa: ISS: fix call to split_if_spec
        xtensa: ISS: add comment about etherdev freeing
    • Merge tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1667e630
      Linus Torvalds authored
      Pull perf fix from Borislav Petkov:
      
       - Fix a lockdep warning that occurs when the given event is the first
         one: no event group exists yet, but the code still iterates over
         event siblings
      
      * tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
    • Merge tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8a3e4a64
      Linus Torvalds authored
      Pull objtool fixes from Borislav Petkov:
      
       - Mark copy_iovec_from_user() __noclone in order to prevent gcc from
         doing an inter-procedural optimization and confusing objtool
      
       - Initialize struct elf fully to avoid build failures
      
      * tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        iov_iter: Mark copy_iovec_from_user() noclone
        objtool: initialize all of struct elf
    • Merge tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f61a89ca
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
      
       - Remove a cgroup from under a polling process properly
      
       - Fix the idle sibling selection
      
      * tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/psi: use kernfs polling functions for PSI trigger polling
        sched/fair: Use recent_used_cpu to test p->cpus_ptr
    • Merge tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · ede950b0
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "I'm mostly on vacation but what would vacation be without a few
        critical fixes so people can use their gaming laptops when hiding away
        from the sun (or rain)?
      
         - Fix a really annoying interrupt storm in the AMD driver affecting
           Asus TUF gaming notebooks
      
         - Fix device tree parsing in the Renesas driver"
      
      * tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: amd: Unify debounce handling into amd_pinconf_set()
        pinctrl: amd: Drop pull up select configuration
        pinctrl: amd: Use amd_pinconf_set() for all config options
        pinctrl: amd: Only use special debounce behavior for GPIO 0
        pinctrl: renesas: rzg2l: Handle non-unique subnode names
        pinctrl: renesas: rzv2m: Handle non-unique subnode names
    • Merge tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · fe756ad0
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - Two reconnect fixes: an important fix for an inFlight count leak
         (which can leak credits), and a fix for better handling of a deleted
         share
      
       - DFS fix
      
       - SMB1 cleanup fix
      
       - deferred close fix
      
      * tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix mid leak during reconnection after timeout threshold
        cifs: is_network_name_deleted should return a bool
        smb: client: fix missed ses refcounting
        smb: client: Fix -Wstringop-overflow issues
        cifs: if deferred close is disabled then close files immediately
    • Merge tag 'powerpc-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 20edcec2
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix Speculation_Store_Bypass reporting in /proc/self/status on
         Power10
      
       - Fix HPT with 4K pages since recent changes by implementing pmd_same()
      
       - Fix 64-bit native_hpte_remove() to be irq-safe
      
      Thanks to Aneesh Kumar K.V, Nageswara R Sastry, and Russell Currey.
      
      * tag 'powerpc-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm/book3s64/hash/4k: Add pmd_same callback for 4K page size
        powerpc/64e: Fix obtool warnings in exceptions-64e.S
        powerpc/security: Fix Speculation_Store_Bypass reporting on Power10
        powerpc/64s: Fix native_hpte_remove() to be irq-safe
    • Merge tag 'hardening-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 6eede068
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - Remove LTO-only suffixes from promoted global function symbols
         (Yonghong Song)
      
       - Remove unused .text..refcount section from vmlinux.lds.h (Petr Pavlu)
      
       - Add missing __always_inline to sparc __arch_xchg() (Arnd Bergmann)
      
       - Claim maintainership of string routines
      
      * tag 'hardening-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        sparc: mark __arch_xchg() as __always_inline
        MAINTAINERS: Foolishly claim maintainership of string routines
        kallsyms: strip LTO-only suffixes from promoted global functions
        vmlinux.lds.h: Remove a reference to no longer used sections .text..refcount
    • Merge tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 4b4eef57
      Linus Torvalds authored
      Pull probe fixes from Masami Hiramatsu:
      
       - fprobe: Add a comment explaining why fprobe is skipped if another
         kprobe is running in fprobe_kprobe_handler().

       - probe-events: Fix some issues related to fetch-arguments:

          - Fix double counting of the string length for user-string and
            symstr. The double count requires a longer buffer in the array
            case.

          - Do not count an error code (a negative value) toward the total
            used length of an array argument. Counting it makes the total
            used length shorter.

          - Update the dynamic used data size counter only if fetcharg uses
            the dynamic size data. Otherwise the used dynamic data size may
            be miscounted and data may be corrupted.

          - Revert "tracing: Add "(fault)" name injection to kernel probes"
            because it did not work correctly due to a bug, and we agreed
            the current '(fault)' output (instead of '"(fault)"' like a
            string) explains what happened more clearly.

          - Record 0-length (meaning fault access) data_loc data in the
            fetch function itself, instead of in store_trace_args(). When an
            array of strings is recorded, this saves the fault access data
            for each entry of the array correctly.
      
      * tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails
        Revert "tracing: Add "(fault)" name injection to kernel probes"
        tracing/probes: Fix to update dynamic data counter if fetcharg uses it
        tracing/probes: Fix not to count error code to total length
        tracing/probes: Fix to avoid double count of the string length on the array
        fprobes: Add a comment why fprobe_kprobe_handler exits if kprobe is running
  7. 15 Jul, 2023 4 commits