1. 27 Jul, 2023 7 commits
    • Jann Horn's avatar
      mm: fix memory ordering for mm_lock_seq and vm_lock_seq · b1f02b95
      Jann Horn authored
      mm->mm_lock_seq effectively functions as a read/write lock; therefore it
      must be used with acquire/release semantics.
      
      A specific example is the interaction between userfaultfd_register() and
      lock_vma_under_rcu().
      
      userfaultfd_register() does the following from the point where it changes
      a VMA's flags to the point where concurrent readers are permitted again
      (in a simple scenario where only a single private VMA is accessed and no
      merging/splitting is involved):
      
      userfaultfd_register
        userfaultfd_set_vm_flags
          vm_flags_reset
            vma_start_write
              down_write(&vma->vm_lock->lock)
              vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
              up_write(&vma->vm_lock->lock)
            vm_flags_init
              [sets VM_UFFD_* in __vm_flags]
        vma->vm_userfaultfd_ctx.ctx = ctx
        mmap_write_unlock
          vma_end_write_all
            WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
      
      There are no memory barriers in between the __vm_flags update and the
      mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
      reordered to above the `vm_flags_init()` call, which means from the
      perspective of a concurrent reader, a VMA can be marked as a userfaultfd
      VMA while it is not VMA-locked.  That's bad, we definitely need a
      store-release for the unlock operation.
      
      The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
      fine because all accesses to vma->vm_lock_seq that matter are always
      protected by the VMA lock.  There is a racy read in vma_start_read()
      though that can tolerate false-positives, so we should be using
      WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
      
      On the other side, lock_vma_under_rcu() works as follows in the relevant
      region for locking and userfaultfd check:
      
      lock_vma_under_rcu
        vma_start_read
          vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
          down_read_trylock(&vma->vm_lock->lock)
          vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
        userfaultfd_armed
          checks vma->vm_flags & __VM_UFFD_FLAGS
      
      Here, the interesting aspect is how far down the mm->mm_lock_seq read can
      be reordered - if this read is reordered down below the vma->vm_flags
      access, this could cause lock_vma_under_rcu() to partly operate on
      information that was read while the VMA was supposed to be locked.  To
      prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
      need to read it with a load-acquire.
      
      Some of the comment wording is based on suggestions by Suren.
      
      BACKPORT WARNING: One of the functions changed by this patch (which I've
      written against Linus' tree) is vma_try_start_write(), but this function
      no longer exists in mm/mm-everything.  I don't know whether the merged
      version of this patch will be ordered before or after the patch that
      removes vma_try_start_write().  If you're backporting this patch to a tree
      with vma_try_start_write(), make sure this patch changes that function.
      
      Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
      Fixes: 5e31275c ("mm: add per-VMA lock and helper functions to control it")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Reviewed-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b1f02b95
    • Drew Fustini's avatar
      scripts/spelling.txt: remove 'thead' as a typo · 15571273
      Drew Fustini authored
      T-Head is a vendor of processor core IP, and they have recently introduced
      the RISC-V TH1520 SoC.  Remove 'thead' as a typo of 'thread' to avoid
      checkpatch incorrectly warning that 'thead' is typo in patches that add
      support for T-Head designs in the kernel.
      
      Link: https://lkml.kernel.org/r/20230723010329.674186-1-dfustini@baylibre.com
      Link: https://www.t-head.cn/Signed-off-by: default avatarDrew Fustini <dfustini@baylibre.com>
      Acked-by: default avatarGuo Ren <guoren@kernel.org>
      Cc: Conor Dooley <conor@kernel.org>
      Cc: Jisheng Zhang <jszhang@kernel.org>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Diederik de Haas <didi.debian@cknow.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Luca Ceresoli <luca.ceresoli@bootlin.com> # versaclock5
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: SeongJae Park <sj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      15571273
    • Hugh Dickins's avatar
      mm/pagewalk: fix EFI_PGT_DUMP of espfix area · 8b1cb4a2
      Hugh Dickins authored
      Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form
      "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)".
      
      EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set
      up with pmd entries which fit the pmd_bad() check: so 0d940a9b warns
      and clears those entries, which would ruin running Win16 binaries.
      
      The failing pte_offset_map() stopped such a kernel from even booting,
      until a few commits later be872f83 changed the pagewalk to tolerate
      that: but it needs to be even more careful, to not spoil those entries.
      
      I might have preferred to change init_espfix_ap() not to use "bad" pmd
      entries; or to leave them out of the efi_mm dump.  But there is great
      value in staying away from there, and a pagewalk check of address against
      TASK_SIZE may protect from other such aberrations too.
      
      Link: https://lkml.kernel.org/r/22bca736-4cab-9ee5-6a52-73a3b2bbe865@google.com
      Closes: https://lore.kernel.org/linux-mm/CABXGCsN3JqXckWO=V7p=FhPU1tK03RE1w9UE6xL5Y86SMk209w@mail.gmail.com/
      Fixes: 0d940a9b ("mm/pgtable: allow pte_offset_map[_lock]() to fail")
      Fixes: be872f83 ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Tested-by: default avatarMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8b1cb4a2
    • Hugh Dickins's avatar
      shmem: minor fixes to splice-read implementation · fa598952
      Hugh Dickins authored
      HWPoison: my reading of folio_test_hwpoison() is that it only tests the
      head page of a large folio, whereas splice_folio_into_pipe() will splice
      as much of the folio as it can: so for safety we should also check the
      has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned. 
      (Perhaps that ugliness can be improved at the mm end later.)
      
      The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask
      it for "part" not "len".
      
      Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com
      Fixes: bd194b18 ("shmem: Implement splice-read")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      fa598952
    • Hugh Dickins's avatar
      tmpfs: fix Documentation of noswap and huge mount options · 253e5df8
      Hugh Dickins authored
      The noswap mount option is surely not one of the three options for sizing:
      move its description down.
      
      The huge= mount option does not accept numeric values: those are just in
      an internal enum.  Delete those numbers, and follow the manpage text more
      closely (but there's not yet any fadvise() or fcntl() which applies here).
      
      /sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
      barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
      still using the words deny and force, to help as informal reminders).
      
      [rdunlap@infradead.org: fixup Docs table for huge mount options]
        Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
      Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Fixes: d0f5a854 ("shmem: update documentation")
      Fixes: 2c6efe9c ("shmem: add support to ignore swap")
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> 
      Cc: Christian Brauner <brauner@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      253e5df8
    • Andy Shevchenko's avatar
      Revert "um: Use swap() to make code cleaner" · dddfa05e
      Andy Shevchenko authored
      This reverts commit 9b0da3f2.
      
      The sigio.c is clearly user space code which is handled by
      arch/um/scripts/Makefile.rules (see USER_OBJS rule).
      
      The above mentioned commit simply broke this agreement,
      we may not use Linux kernel internal headers in them without
      thorough thinking.
      
      Hence, revert the wrong commit.
      
      Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.intel.comSigned-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Herve Codina <herve.codina@bootlin.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Yang Guang <yang.guang5@zte.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      dddfa05e
    • Feng Tang's avatar
      mm/damon/core-test: initialise context before test in damon_test_set_attrs() · d1836a3b
      Feng Tang authored
      Running kunit test for 6.5-rc1 hits one bug:
      
              ok 10 damon_test_update_monitoring_result
          general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI
          CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G                 N 6.5.0-rc2 #15
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
          RIP: 0010:damon_set_attrs+0xb9/0x120
          Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d
          8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00
          49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89
          RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246
          RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70
          RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed
          R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78
          R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28
          FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           <TASK>
           damon_test_set_attrs+0x63/0x1f0
           kunit_generic_run_threadfn_adapter+0x17/0x30
           kthread+0xfd/0x130
      
      The problem seems to be related with the damon_ctx was used without
      being initialized. Fix it by adding the initialization.
      
      Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com
      Fixes: aa13779b ("mm/damon/core-test: add a test for damon_set_attrs()")
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Reviewed-by: default avatarSeongJae Park <sj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      d1836a3b
  2. 23 Jul, 2023 18 commits
    • Linus Torvalds's avatar
      Linux 6.5-rc3 · 6eaae198
      Linus Torvalds authored
      6eaae198
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 3b4e48b8
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Swapping the ring buffer for snapshotting (for things like irqsoff)
         can crash if the ring buffer is being resized. Disable swapping when
         this happens. The missed swap will be reported to the tracer
      
       - Report error if the histogram fails to be created due to an error in
         adding a histogram variable, in event_hist_trigger_parse()
      
       - Remove unused declaration of tracing_map_set_field_descr()
      
      * tag 'trace-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/histograms: Return an error if we fail to add histogram to hist_vars list
        ring-buffer: Do not swap cpu_buffer during resize process
        tracing: Remove unused extern declaration tracing_map_set_field_descr()
      3b4e48b8
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v6.5' of... · 12a5336c
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix stale help text in gconfig
      
       - Support *.S files in compile_commands.json
      
       - Flatten KBUILD_CFLAGS
      
       - Fix external module builds with Rust so that temporary files are
         created in the modules directories instead of the kernel tree
      
      * tag 'kbuild-fixes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: rust: avoid creating temporary files
        kbuild: flatten KBUILD_CFLAGS
        gen_compile_commands: add assembly files to compilation database
        kconfig: gconfig: correct program name in help text
        kconfig: gconfig: drop the Show Debug Info help text
      12a5336c
    • Miguel Ojeda's avatar
      kbuild: rust: avoid creating temporary files · df01b7cf
      Miguel Ojeda authored
      `rustc` outputs by default the temporary files (i.e. the ones saved
      by `-Csave-temps`, such as `*.rcgu*` files) in the current working
      directory when `-o` and `--out-dir` are not given (even if
      `--emit=x=path` is given, i.e. it does not use those for temporaries).
      
      Since out-of-tree modules are compiled from the `linux` tree,
      `rustc` then tries to create them there, which may not be accessible.
      
      Thus pass `--out-dir` explicitly, even if it is just for the temporary
      files.
      
      Similarly, do so for Rust host programs too.
      Reported-by: default avatarRaphael Nestler <raphael.nestler@gmail.com>
      Closes: https://github.com/Rust-for-Linux/linux/issues/1015Reported-by: default avatarAndrea Righi <andrea.righi@canonical.com>
      Tested-by: Raphael Nestler <raphael.nestler@gmail.com> # non-hostprogs
      Tested-by: Andrea Righi <andrea.righi@canonical.com> # non-hostprogs
      Fixes: 295d8398 ("kbuild: specify output names separately for each emission type from rustc")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMiguel Ojeda <ojeda@kernel.org>
      Tested-by: default avatarMartin Rodriguez Reboredo <yakoyoku@gmail.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      df01b7cf
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 269f4a4b
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM:
      
         - Avoid pKVM finalization if KVM initialization fails
      
         - Add missing BTI instructions in the hypervisor, fixing an early
           boot failure on BTI systems
      
         - Handle MMU notifiers correctly for non hugepage-aligned memslots
      
         - Work around a bug in the architecture where hypervisor timer
           controls have UNKNOWN behavior under nested virt
      
         - Disable preemption in kvm_arch_hardware_enable(), fixing a kernel
           BUG in cpu hotplug resulting from per-CPU accessor sanity checking
      
         - Make WFI emulation on GICv4 systems robust w.r.t. preemption,
           consistently requesting a doorbell interrupt on vcpu_put()
      
         - Uphold RES0 sysreg behavior when emulating older PMU versions
      
         - Avoid macro expansion when initializing PMU register names,
           ensuring the tracepoints pretty-print the sysreg
      
        s390:
      
         - Two fixes for asynchronous destroy
      
        x86 fixes will come early next week"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: s390: pv: fix index value of replaced ASCE
        KVM: s390: pv: simplify shutdown and fix race
        KVM: arm64: Fix the name of sys_reg_desc related to PMU
        KVM: arm64: Correctly handle RES0 bits PMEVTYPER<n>_EL0.evtCount
        KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption
        KVM: arm64: Add missing BTI instructions
        KVM: arm64: Correctly handle page aging notifiers for unaligned memslot
        KVM: arm64: Disable preemption in kvm_arch_hardware_enable()
        KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm
        KVM: arm64: timers: Use CNTHCTL_EL2 when setting non-CNTKCTL_EL1 bits
      269f4a4b
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 15b593ba
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Bug and regression fixes for 6.5-rc3 for ext4's mballoc and jbd2's
        checkpoint code"
      
      * tag 'ext4_for_linus-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
        ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()
        ext4: correct inline offset when handling xattrs in inode body
        jbd2: remove __journal_try_to_free_buffer()
        jbd2: fix a race when checking checkpoint buffer busy
        jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
        jbd2: remove journal_clean_one_cp_list()
        jbd2: remove t_checkpoint_io_list
        jbd2: recheck chechpointing non-dirty buffer
      15b593ba
    • Linus Torvalds's avatar
      Merge tag '6.5-rc2-smb3-client-fixes-ver2' of git://git.samba.org/sfrench/cifs-2.6 · 8266f53b
      Linus Torvalds authored
      Pull smb client fix from Steve French:
       "Add minor debugging improvement.
      
        The change improves ability to read a network trace to debug problems
        on encrypted connections which are very common (e.g. using wireshark
        or tcpdump).
      
        That works today with tools like 'smbinfo keys /mnt/file' but requires
        passing in a filename on the mount (see e.g. [1]), but it often makes
        more sense to just pass in the mount point path (ie a directory not a
        filename).
      
        So this fix was needed to debug some types of problems (an obvious
        example is on an encrypted connection failing operations on an empty
        share or with no files in the root of the directory) - so you can
        simply pass in the 'smbinfo keys <mntpoint>' and get the information
        that wireshark needs"
      
      Link: https://wiki.samba.org/index.php/Wireshark_Decryption [1]
      
      * tag '6.5-rc2-smb3-client-fixes-ver2' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module version number for cifs.ko
        cifs: allow dumping keys for directories too
      8266f53b
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-master-6.5-1' of... · 0c189708
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-6.5-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      Two fixes for asynchronous destroy
      0c189708
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-6.5-1' of... · 675a15f4
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.5, part #1
      
       - Avoid pKVM finalization if KVM initialization fails
      
       - Add missing BTI instructions in the hypervisor, fixing an early boot
         failure on BTI systems
      
       - Handle MMU notifiers correctly for non hugepage-aligned memslots
      
       - Work around a bug in the architecture where hypervisor timer controls
         have UNKNOWN behavior under nested virt.
      
       - Disable preemption in kvm_arch_hardware_enable(), fixing a kernel BUG
         in cpu hotplug resulting from per-CPU accessor sanity checking.
      
       - Make WFI emulation on GICv4 systems robust w.r.t. preemption,
         consistently requesting a doorbell interrupt on vcpu_put()
      
       - Uphold RES0 sysreg behavior when emulating older PMU versions
      
       - Avoid macro expansion when initializing PMU register names, ensuring
         the tracepoints pretty-print the sysreg.
      675a15f4
    • Mohamed Khalfella's avatar
      tracing/histograms: Return an error if we fail to add histogram to hist_vars list · 4b8b3905
      Mohamed Khalfella authored
      Commit 6018b585 ("tracing/histograms: Add histograms to hist_vars if
      they have referenced variables") added a check to fail histogram creation
      if save_hist_vars() failed to add histogram to hist_vars list. But the
      commit failed to set ret to failed return code before jumping to
      unregister histogram, fix it.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfella@purestorage.com
      
      Cc: stable@vger.kernel.org
      Fixes: 6018b585 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
      Signed-off-by: default avatarMohamed Khalfella <mkhalfella@purestorage.com>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      4b8b3905
    • Chen Lin's avatar
      ring-buffer: Do not swap cpu_buffer during resize process · 8a96c028
      Chen Lin authored
      When ring_buffer_swap_cpu was called during resize process,
      the cpu buffer was swapped in the middle, resulting in incorrect state.
      Continuing to run in the wrong state will result in oops.
      
      This issue can be easily reproduced using the following two scripts:
      /tmp # cat test1.sh
      //#! /bin/sh
      for i in `seq 0 100000`
      do
               echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
               sleep 0.5
               echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
               sleep 0.5
      done
      /tmp # cat test2.sh
      //#! /bin/sh
      for i in `seq 0 100000`
      do
              echo irqsoff > /sys/kernel/debug/tracing/current_tracer
              sleep 1
              echo nop > /sys/kernel/debug/tracing/current_tracer
              sleep 1
      done
      /tmp # ./test1.sh &
      /tmp # ./test2.sh &
      
      A typical oops log is as follows, sometimes with other different oops logs.
      
      [  231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
      [  231.713375] Modules linked in:
      [  231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec2 #15
      [  231.716750] Hardware name: linux,dummy-virt (DT)
      [  231.718152] Workqueue: events update_pages_handler
      [  231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  231.721171] pc : rb_update_pages+0x378/0x3f8
      [  231.722212] lr : rb_update_pages+0x25c/0x3f8
      [  231.723248] sp : ffff800082b9bd50
      [  231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
      [  231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
      [  231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
      [  231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
      [  231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
      [  231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
      [  231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
      [  231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
      [  231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
      [  231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
      [  231.744196] Call trace:
      [  231.744892]  rb_update_pages+0x378/0x3f8
      [  231.745893]  update_pages_handler+0x1c/0x38
      [  231.746893]  process_one_work+0x1f0/0x468
      [  231.747852]  worker_thread+0x54/0x410
      [  231.748737]  kthread+0x124/0x138
      [  231.749549]  ret_from_fork+0x10/0x20
      [  231.750434] ---[ end trace 0000000000000000 ]---
      [  233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  233.721696] Mem abort info:
      [  233.721935]   ESR = 0x0000000096000004
      [  233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  233.722596]   SET = 0, FnV = 0
      [  233.722805]   EA = 0, S1PTW = 0
      [  233.723026]   FSC = 0x04: level 0 translation fault
      [  233.723458] Data abort info:
      [  233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      [  233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [  233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [  233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
      [  233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
      [  233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  233.726720] Modules linked in:
      [  233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec2 #15
      [  233.727777] Hardware name: linux,dummy-virt (DT)
      [  233.728225] Workqueue: events update_pages_handler
      [  233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  233.729054] pc : rb_update_pages+0x1a8/0x3f8
      [  233.729334] lr : rb_update_pages+0x154/0x3f8
      [  233.729592] sp : ffff800082b9bd50
      [  233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
      [  233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
      [  233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
      [  233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
      [  233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
      [  233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
      [  233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
      [  233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
      [  233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
      [  233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
      [  233.734418] Call trace:
      [  233.734593]  rb_update_pages+0x1a8/0x3f8
      [  233.734853]  update_pages_handler+0x1c/0x38
      [  233.735148]  process_one_work+0x1f0/0x468
      [  233.735525]  worker_thread+0x54/0x410
      [  233.735852]  kthread+0x124/0x138
      [  233.736064]  ret_from_fork+0x10/0x20
      [  233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
      [  233.736959] ---[ end trace 0000000000000000 ]---
      
      After analysis, the seq of the error is as follows [1-5]:
      
      int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
      			int cpu_id)
      {
      	for_each_buffer_cpu(buffer, cpu) {
      		cpu_buffer = buffer->buffers[cpu];
      		//1. get cpu_buffer, aka cpu_buffer(A)
      		...
      		...
      		schedule_work_on(cpu,
      		 &cpu_buffer->update_pages_work);
      		//2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to
      		// update_pages_handler, do the update process, set 'update_done' in
      		// complete(&cpu_buffer->update_done) and to wakeup resize process.
      	//---->
      		//3. Just at this moment, ring_buffer_swap_cpu is triggered,
      		//cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer.
      		//ring_buffer_swap_cpu is called as the 'Call trace' below.
      
      		Call trace:
      		 dump_backtrace+0x0/0x2f8
      		 show_stack+0x18/0x28
      		 dump_stack+0x12c/0x188
      		 ring_buffer_swap_cpu+0x2f8/0x328
      		 update_max_tr_single+0x180/0x210
      		 check_critical_timing+0x2b4/0x2c8
      		 tracer_hardirqs_on+0x1c0/0x200
      		 trace_hardirqs_on+0xec/0x378
      		 el0_svc_common+0x64/0x260
      		 do_el0_svc+0x90/0xf8
      		 el0_svc+0x20/0x30
      		 el0_sync_handler+0xb0/0xb8
      		 el0_sync+0x180/0x1c0
      	//<----
      
      	/* wait for all the updates to complete */
      	for_each_buffer_cpu(buffer, cpu) {
      		cpu_buffer = buffer->buffers[cpu];
      		//4. get cpu_buffer, cpu_buffer(B) is used in the following process,
      		//the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
      		//for example, cpu_buffer(A)->update_done will leave be set 1, and will
      		//not 'wait_for_completion' at the next resize round.
      		  if (!cpu_buffer->nr_pages_to_update)
      			continue;
      
      		if (cpu_online(cpu))
      			wait_for_completion(&cpu_buffer->update_done);
      		cpu_buffer->nr_pages_to_update = 0;
      	}
      	...
      }
      	//5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong,
      	//Continuing to run in the wrong state, then oops occurs.
      
      Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cnSigned-off-by: default avatarChen Lin <chen.lin5@zte.com.cn>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      8a96c028
    • YueHaibing's avatar
      tracing: Remove unused extern declaration tracing_map_set_field_descr() · 1faf7e4a
      YueHaibing authored
      Since commit 08d43a5f ("tracing: Add lock-free tracing_map"),
      this is never used, so can be removed.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230722032123.24664-1-yuehaibing@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      1faf7e4a
    • Alexey Dobriyan's avatar
      kbuild: flatten KBUILD_CFLAGS · 0817d259
      Alexey Dobriyan authored
      Make it slightly easier to see which compiler options are added and
      removed (and not worry about column limit too!).
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Reviewed-by: default avatarNicolas Schier <n.schier@avm.de>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      0817d259
    • Benjamin Gray's avatar
      gen_compile_commands: add assembly files to compilation database · 1c679214
      Benjamin Gray authored
      Like C source files, tooling can find it useful to have the assembly
      source file compilation recorded.
      
      The .S extension appears to used across all architectures.
      Signed-off-by: default avatarBenjamin Gray <bgray@linux.ibm.com>
      Reviewed-by: default avatarFangrui Song <maskray@google.com>
      Reviewed-by: default avatarNathan Chancellor <nathan@kernel.org>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      1c679214
    • Ojaswin Mujoo's avatar
      ext4: fix rbtree traversal bug in ext4_mb_use_preallocated · 9d3de7ee
      Ojaswin Mujoo authored
      During allocations, while looking for preallocations(PA) in the per
      inode rbtree, we can't do a direct traversal of the tree because
      ext4_mb_discard_group_preallocation() can paralelly mark the pa deleted
      and that can cause direct traversal to skip some entries. This was
      leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy
      our request and ultimately tried to create a new PA that would overlap
      with the missed one.
      
      To makes sure we handle that case while still keeping the performance of
      the rbtree, we make use of the fact that the only pa that could possibly
      overlap the original goal start is the one that satisfies the below
      conditions:
      
        1. It must have it's logical start immediately to the left of
        (ie less than) original logical start.
      
        2. It must not be deleted
      
      To find this pa we use the following traversal method:
      
      1. Descend into the rbtree normally to find the immediate neighboring
      PA. Here we keep descending irrespective of if the PA is deleted or if
      it overlaps with our request etc. The goal is to find an immediately
      adjacent PA.
      
      2. If the found PA is on right of original goal, use rb_prev() to find
      the left adjacent PA.
      
      3. Check if this PA is deleted and keep moving left with rb_prev() until
      a non deleted PA is found.
      
      4. This is the PA we are looking for. Now we can check if it can satisfy
      the original request and proceed accordingly.
      
      This approach also takes care of having deleted PAs in the tree.
      
      (While we are at it, also fix a possible overflow bug in calculating the
      end of a PA)
      
      [1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+tKRFaEQHtt8Q@mail.gmail.com/
      
      Cc: stable@kernel.org # 6.4
      Fixes: 38727786 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
      Signed-off-by: default avatarOjaswin Mujoo <ojaswin@linux.ibm.com>
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
      Tested-by: Ritesh Harjani (IBM) ritesh.list@gmail.com
      Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.1690045963.git.ojaswin@linux.ibm.comSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      9d3de7ee
    • Ojaswin Mujoo's avatar
      ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail() · 5d5460fa
      Ojaswin Mujoo authored
      In ext4_mb_choose_next_group_best_avail(), we want the start order to be
      1 less than goal length and the min_order to be, at max, 1 more than the
      original length. This commit fixes an off by one issue that arose due to
      the fact that 1 << fls(n) > (n).
      
      After all the processing:
      
      order = 1 order below goal len
      min_order = maximum of the three:-
                   - order - trim_order
                   - 1 order below B2C(s_stripe)
                   - 1 order above original len
      
      Cc: stable@kernel.org
      Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
      Signed-off-by: default avatarOjaswin Mujoo <ojaswin@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.comSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      5d5460fa
    • Eric Whitney's avatar
      ext4: correct inline offset when handling xattrs in inode body · 6909cf5c
      Eric Whitney authored
      When run on a file system where the inline_data feature has been
      enabled, xfstests generic/269, generic/270, and generic/476 cause ext4
      to emit error messages indicating that inline directory entries are
      corrupted.  This occurs because the inline offset used to locate
      inline directory entries in the inode body is not updated when an
      xattr in that shared region is deleted and the region is shifted in
      memory to recover the space it occupied.  If the deleted xattr precedes
      the system.data attribute, which points to the inline directory entries,
      that attribute will be moved further up in the region.  The inline
      offset continues to point to whatever is located in system.data's former
      location, with unfortunate effects when used to access directory entries
      or (presumably) inline data in the inode body.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarEric Whitney <enwlinux@gmail.com>
      Link: https://lore.kernel.org/r/20230522181520.1570360-1-enwlinux@gmail.comSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      6909cf5c
    • Linus Torvalds's avatar
      Merge tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · c2782531
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Reinstate support for little endian ELFv1 binaries, which it turns
         out still exist in the wild.
      
       - Revert a change which used asm goto for WARN_ON/__WARN_FLAGS, as it
         lead to dead code generation and seemed to trigger compiler bugs in
         some edge cases.
      
       - Fix a deadlock in the pseries VAS code, between live migration and
         the driver's mmap handler.
      
       - Disable KCOV instrumentation in the powerpc KASAN code.
      
      Thanks to Andrew Donnellan, Benjamin Gray, Christophe Leroy, Haren
      Myneni, Russell Currey, and Uwe Kleine-König.
      
      * tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"
        powerpc/kasan: Disable KCOV in KASAN code
        powerpc/512x: lpbfifo: Convert to platform remove callback returning void
        powerpc/crypto: Add gitignore for generated P10 AES/GCM .S files
        Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto"
        powerpc/pseries/vas: Hold mmap_mutex after mmap lock during window close
      c2782531
  3. 22 Jul, 2023 8 commits
    • Steve French's avatar
      cifs: update internal module version number for cifs.ko · ba61a03a
      Steve French authored
      From 2.43 to 2.44
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      ba61a03a
    • Shyam Prasad N's avatar
      cifs: allow dumping keys for directories too · b3edef6b
      Shyam Prasad N authored
      Dumping the enc/dec keys is a session wide operation.
      And it should not matter if the ioctl was run on
      a regular file or a directory.
      
      Currently, we obtain the tcon pointer from the
      cifs file handle. But since there's no dir open call
      in cifs, this is not populated for dirs.
      
      This change allows dumping of session keys using ioctl
      even for directories. To do this, we'll now get the
      tcon pointer from the superblock, and not from the file
      handle.
      Signed-off-by: default avatarShyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      b3edef6b
    • Linus Torvalds's avatar
      Merge tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 295e1388
      Linus Torvalds authored
      Pull s390 fixes from Heiko Carstens:
      
       - Fix per vma lock fault handling: add missing !(fault & VM_FAULT_ERROR)
         check to fault handler to prevent error handling for return values
         that don't indicate an error
      
       - Use kfree_sensitive() instead of kfree() in paes crypto code to clear
         memory that may contain keys before freeing it
      
       - Fix reply buffer size calculation for CCA replies in zcrypt device
         driver
      
      * tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/zcrypt: fix reply buffer calculations for CCA replies
        s390/crypto: use kfree_sensitive() instead of kfree()
        s390/mm: fix per vma lock fault handling
      295e1388
    • Linus Torvalds's avatar
      Merge tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux · f036d67c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Fix for loop regressions (Mauricio)
      
       - Fix a potential stall with batched wakeups in sbitmap (David)
      
       - Fix for stall with recursive plug flushes (Ross)
      
       - Skip accounting of empty requests for blk-iocost (Chengming)
      
       - Remove a dead field in struct blk_mq_hw_ctx (Chengming)
      
      * tag 'block-6.5-2023-07-21' of git://git.kernel.dk/linux:
        loop: do not enforce max_loop hard limit by (new) default
        loop: deprecate autoloading callback loop_probe()
        sbitmap: fix batching wakeup
        blk-iocost: skip empty flush bio in iocost
        blk-mq: delete dead struct blk_mq_hw_ctx->queued field
        blk-mq: Fix stall due to recursive flush plug
      f036d67c
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux · bdd1d82e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix for io-wq not always honoring REQ_F_NOWAIT, if it was set and
         punted directly (eg via DRAIN) (me)
      
       - Capability check fix (Ondrej)
      
       - Regression fix for the mmap changes that went into 6.4, which
         apparently broke IA64 (Helge)
      
      * tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux:
        ia64: mmap: Consider pgoff when searching for free mapping
        io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()
        io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
        io_uring: don't audit the capability check in io_uring_create()
      bdd1d82e
    • Linus Torvalds's avatar
      Merge tag 'devicetree-fixes-for-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 725d444d
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix moortec,mr75203 schema usage of 'multipleOf' keyword
      
       - Fix regression in systems depending on "of-display" device name
      
       - Build fix for s390 with CONFIG_PCI=n and OF_EARLY_FLATTREE=y
      
       - Drop two obsolete serial .txt bindings
      
      * tag 'devicetree-fixes-for-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: serial: Remove obsolete nxp,lpc1850-uart.txt
        dt-bindings: serial: Remove obsolete cavium-uart.txt
        dt-bindings: hwmon: moortec,mr75203: fix multipleOf for coefficients
        of: Preserve "of-display" device name for compatibility
        of: make OF_EARLY_FLATTREE depend on HAS_IOMEM
      725d444d
    • Linus Torvalds's avatar
      Merge tag 'regmap-fix-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 39b14286
      Linus Torvalds authored
      Pull regmap fixes from Mark Brown:
       "Three fixes here:
      
         - The issues with accounting for register and padding length on raw
           buses turn out to be quite widespread in custom buses.
      
           In order to avoid disturbing anything drop the initial fixes and
           fall back to a point fix in the SMBus code where the issue was
           originally noticed, a more substantial refactoring of the API which
           ensures that all buses make the same assumptions will follow.
      
         - The generic regcache code had been forcing on async I/O which did
           not work with the new maple tree sync code when used with SPI.
      
           Since that was mainly for the rbtree cache and the assumptions
           about hardware that drove the choice are probably not true any more
           fix this by pushing the enablement of async down into the rbtree
           code.
      
           This probably also makes cache syncs for systems faster though it's
           not the point.
      
         - The test code was triggering use of the rbtree and maple tree
           caches with dynamic allocation of nodes since all the testing is
           with RAM backed caches with no I/O performance issues.
      
           Just disable the locking in the tests to avoid triggering warnings
           when allocation debugging is turned on, it's not really what's
           being tested"
      
      * tag 'regmap-fix-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: Disable locking for RBTREE and MAPLE unit tests
        regcache: Push async I/O request down into the rbtree cache
        regmap: Account for register length in SMBus I/O limits
        regmap: Drop initial version of maximum transfer length fixes
      39b14286
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · c0842db5
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix initial value handling for output-only pins in gpio-tps68470
      
       - fix two resource leaks in gpio-mvebu
      
      * tag 'gpio-fixes-for-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: mvebu: fix irq domain leak
        gpio: mvebu: Make use of devm_pwmchip_add
        gpio: tps68470: Make tps68470_gpio_output() always set the initial value
      c0842db5
  4. 21 Jul, 2023 7 commits