1. 23 Jun, 2023 2 commits
    • Merge tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux · c213de63
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A fix for a race condition with poll removal and linked timeouts, and
        then a few followup fixes/tweaks for the msg_control patch from last
        week.
      
        Not super important, particularly the sparse fixup, as it was broken
        before that recent commit. But let's get it sorted for real for this
        release, rather than just have it broken a bit differently"
      
      * tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux:
        io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
        io_uring/net: disable partial retries for recvmsg with cmsg
        io_uring/net: clear msg_controllen on partial sendmsg retry
        io_uring/poll: serialize poll linked timer start with poll removal
      c213de63
    • Merge tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 5950a006
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "It's late but here are two bug fixes. Both fix problems which can be
        severe but are very confined in scope. The risk to most use cases
        should be minimal.
      
         - Fix for an old bug which triggers if a cgroup subsystem is
           remounted to a different hierarchy while someone is reading its
           cgroup.procs/tasks file. The risk is pretty low given how seldom
           cgroup subsystems are moved across hierarchies.
      
         - We moved cpus_read_lock() outside of cgroup internal locks a while
           ago but forgot to update the legacy_freezer leading to lockdep
           triggers. Fixed"
      
      * tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Do not corrupt task iteration when rebinding subsystem
        cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
      5950a006
  2. 21 Jun, 2023 11 commits
    • Merge tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dad9774d
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A single regression fix for a regression fix:
      
        For a long time the tick was aligned to clock MONOTONIC so that the
        tick event happened at a multiple of nanoseconds per tick starting
        from clock MONOTONIC = 0.
      
        At some point this changed as the refined jiffies clocksource which is
        used during boot before the TSC or other clocksources become usable,
        was adjusted with a boot offset, so that time 0 is closer to the point
        where the kernel starts.
      
        This broke the assumption in the tick code that when the tick setup
        happens early on ktime_get() will return a multiple of nanoseconds per
        tick. As a consequence applications which aligned their periodic
        execution so that it does not collide with the tick were no longer
        guaranteed that the tick period starts from time 0.
      
        The fix for this regression was to realign the tick when it is
        initially set up to a multiple of tick periods. That works as long as
        the underlying tick device supports periodic mode, but breaks under
        certain conditions when the tick device supports only one shot mode.
      
        Depending on the offset, the alignment delta to clock MONOTONIC can
        get in a range where the minimal programming delta of the underlying
        clock event device is larger than the calculated delta to the next
        tick. This results in a boot hang as the tick code tries to play catch
        up, but as the tick never fires jiffies are not advanced so it keeps
        trying for ever.
      
        Solve this by moving the tick alignment into the NOHZ / HIGHRES
        enablement code because at that point it is guaranteed that the
        underlying clocksource is high resolution capable and no longer
        depending on the tick.
      
        This is far before user space starts, so at the point where
        applications try to align their timers, the old behaviour of the tick
        happening at a multiple of nanoseconds per tick starting from clock
        MONOTONIC = 0 is restored"
      
      * tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick/common: Align tick period during sched_timer setup
      dad9774d
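A minimal user-space sketch of the alignment arithmetic described in the timer pull message above. It is not the kernel's tick code: the helper names and the 4 ms period (what HZ=250 would give) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick period: 4 ms, i.e. what HZ=250 would give. */
#define SKETCH_TICK_NSEC 4000000ULL

/*
 * Return the first tick boundary strictly after 'now_ns', where boundaries
 * are multiples of 'period_ns' counted from clock MONOTONIC = 0.
 */
static uint64_t next_aligned_tick(uint64_t now_ns, uint64_t period_ns)
{
	assert(period_ns != 0);
	return (now_ns / period_ns + 1) * period_ns;
}

/*
 * Delta from 'now_ns' to the next aligned tick. This is the value that can
 * end up smaller than a clock event device's minimal programming delta.
 */
static uint64_t delta_to_next_tick(uint64_t now_ns, uint64_t period_ns)
{
	return next_aligned_tick(now_ns, period_ns) - now_ns;
}
```

With a 4 ms period, a setup time just before a boundary leaves a delta of a few nanoseconds. That is the case the pull message describes, where a one-shot device cannot be programmed that close.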
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 00703497
      Linus Torvalds authored
      Pull virtio fix from Michael Tsirkin:
       "A last minute revert to fix a regression"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        Revert "virtio-blk: support completion batching for the IRQ path"
      00703497
    • Revert "efi: random: refresh non-volatile random seed when RNG is initialized" · 69cbeb61
      Linus Torvalds authored
      This reverts commit e7b813b3 (and the
      subsequent fix for it: 41a15855 "efi: random: fix NULL-deref when
      refreshing seed").
      
      It turns out to cause non-deterministic boot stalls on at least an HP
      6730b laptop.
      Reported-and-bisected-by: Sami Korkalainen <sami.korkalainen@proton.me>
      Link: https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24OrlljeCyeWveLW9pICEmB9Qc8PKdNt3w1t_g3-Uvxq1l8Wj67PpoMeWDoH8PKk=@proton.me/
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69cbeb61
    • Merge tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 2214170c
      Linus Torvalds authored
      Pull spi fix from Mark Brown:
       "One last fix for SPI, just a simple fix for incorrect handling of
        probe deferral for DMA in the Qualcomm GENI driver"
      
      * tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
      2214170c
    • Merge tag 'regulator-fix-v6.4-rc7' of... · 6e6fb54d
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fix from Mark Brown:
       "One simple fix for v6.4, some incorrectly specified bitfield masks in
        the PCA9450 driver"
      
      * tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
      6e6fb54d
    • Merge tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · e075d681
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "One more fix for v6.4
      
        The earlier fix to take account of the register data size when
        limiting raw register writes exposed the fact that the Intel AVMM bus
        was incorrectly specifying too low a limit on the maximum data
        transfer, it is only capable of transmitting one register so had set a
        transfer size limit that couldn't fit both the value and the
        register address into a single message"
      
      * tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: spi-avmm: Fix regmap_bus max_raw_write
      e075d681
    • io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr · 26fed836
      Jens Axboe authored
      Rather than assign the user pointer to msghdr->msg_control, assign it
      to msghdr->msg_control_user to make sparse happy. They are in a union
      so the end result is the same, but let's avoid new sparse warnings and
      squash this one.
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
      Fixes: cac9e441 ("io_uring/net: save msghdr->msg_control for retries")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      26fed836
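A minimal sketch of why the two assignments in the commit above end up storing the same value. The struct here is a hypothetical, cut-down stand-in rather than the kernel's `struct msghdr`, and `SKETCH_USER` is an empty placeholder for the annotation that sparse checks.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a sparse-only address-space annotation; expands to nothing. */
#define SKETCH_USER

/* Hypothetical, cut-down msghdr with the two union members named above. */
struct sketch_msghdr {
	union {
		void *msg_control;                  /* kernel pointer view */
		void SKETCH_USER *msg_control_user; /* user pointer view */
	};
	size_t msg_controllen;
};

/* Store a user-space control buffer through the member meant for it. */
static void sketch_set_user_control(struct sketch_msghdr *msg, void *ubuf,
				    size_t len)
{
	assert(msg != NULL);
	msg->msg_control_user = ubuf;
	msg->msg_controllen = len;
}
```

Because both members share storage, reading `msg_control` afterwards yields the same pointer. Only the annotation seen by the static checker differs.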
    • io_uring/net: disable partial retries for recvmsg with cmsg · 78d0d206
      Jens Axboe authored
      We cannot sanely handle partial retries for recvmsg if we have cmsg
      attached. If we don't, then we'd just be overwriting the initial cmsg
      header on retries. Alternatively we could increment and handle this
      appropriately, but it doesn't seem worth the complication.
      
      Move the MSG_WAITALL check into the non-multishot case while at it,
      since MSG_WAITALL is explicitly disabled for multishot anyway.
      
      Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
      Cc: stable@vger.kernel.org # 5.10+
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Reviewed-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      78d0d206
    • io_uring/net: clear msg_controllen on partial sendmsg retry · b1dc4920
      Jens Axboe authored
      If we have cmsg attached AND we transferred partial data at least, clear
      msg_controllen on retry so we don't attempt to send that again.
      
      Cc: stable@vger.kernel.org # 5.10+
      Fixes: cac9e441 ("io_uring/net: save msghdr->msg_control for retries")
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Reviewed-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b1dc4920
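The rule in the commit above can be sketched in user space with the POSIX `struct msghdr`. The helper name and its `done` parameter are hypothetical. This is an illustration of the idea, not the io_uring code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: prepare 'msg' for a retry after 'done' bytes of
 * payload have already been transferred. If any payload went out, the
 * control data went out with it, so it must not be sent a second time.
 */
static void sketch_prepare_sendmsg_retry(struct msghdr *msg, size_t done)
{
	assert(msg != NULL);
	if (done > 0)
		msg->msg_controllen = 0;
}
```

A retry with `done == 0` leaves the control length untouched, since nothing was sent yet.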
    • Revert "virtio-blk: support completion batching for the IRQ path" · afd384f0
      Michael S. Tsirkin authored
      This reverts commit 07b679f7.
      
      This change appears to have broken things...
      We now see applications hanging during disk accesses.
      e.g.
      multi-port virtio-blk device running in h/w (FPGA)
      Host running a simple 'fio' test.
      [global]
      thread=1
      direct=1
      ioengine=libaio
      norandommap=1
      group_reporting=1
      bs=4K
      rw=read
      iodepth=128
      runtime=1
      numjobs=4
      time_based
      [job0]
      filename=/dev/vda
      [job1]
      filename=/dev/vdb
      [job2]
      filename=/dev/vdc
      ...
      [job15]
      filename=/dev/vdp
      
      i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
      This is repeatedly run in a loop.
      
      After a few, normally <10 seconds, fio hangs.
      With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
      Last message:
      fio-3.19
      Starting 8 threads
      Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
      I think this means at the end of the run 1 queue was left incomplete.
      
      'diskstats' (run while fio is hung) shows no outstanding transactions.
      e.g.
      $ cat /proc/diskstats
      ...
      252       0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
      252      16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
      ...
      
      Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done())) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
      e.g.
      PF= 0                         vq=0           1           2           3
      [a]request_count     -   839416590   813148916   105586179    84988123
      [b]completion1_count -   839416590   813148916   105586179    84988123
      [c]completion2_count -           0           0           0           0
      
      PF= 1                         vq=0           1           2           3
      [a]request_count     -   823335887   812516140   104582672    75856549
      [b]completion1_count -   823335887   812516140   104582672    75856549
      [c]completion2_count -           0           0           0           0
      
      i.e. the issue is after the virtio-blk driver.
      
      This change was introduced in kernel 6.3.0.
      I am seeing this using 6.3.3.
      If I run with an earlier kernel (5.15), it does not occur.
      If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch() call, it does not fail.
      e.g.
      kernel 5.15 - this is OK
      virtio_blk.c,virtblk_done() [irq handler]
              if (likely(!blk_should_fake_timeout(req->q))) {
                      blk_mq_complete_request(req);
              }
      
      kernel 6.3.3 - this fails
      virtio_blk.c,virtblk_handle_req() [irq handler]
              if (likely(!blk_should_fake_timeout(req->q))) {
                      if (!blk_mq_complete_request_remote(req)) {
                              if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                                      virtblk_request_done(req);    // this never gets called... so blk_mq_add_to_batch() must always succeed
                              }
                      }
              }
      
      If I do, kernel 6.3.3 - this is OK
      virtio_blk.c,virtblk_handle_req() [irq handler]
              if (likely(!blk_should_fake_timeout(req->q))) {
                      if (!blk_mq_complete_request_remote(req)) {
                              virtblk_request_done(req);    // force this here...
                              if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                                      virtblk_request_done(req);    // this never gets called... so blk_mq_add_to_batch() must always succeed
                              }
                      }
              }
      
      Perhaps you might like to fix/test/revert this change...
      Martin
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
      Cc: Suwan Kim <suwan.kim027@gmail.com>
      Tested-by: edliaw@google.com
      Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
      Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      afd384f0
    • Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of... · 8ba90f5c
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull hotfixes from Andrew Morton:
       "19 hotfixes.  8 of these are cc:stable.
      
        This includes a wholesale reversion of the post-6.4 series 'make slab
        shrink lockless'. After input from Dave Chinner it has been decided
        that we should go a different way [1]"
      
      Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1]
      
      * tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftests/mm: fix cross compilation with LLVM
        mailmap: add entries for Ben Dooks
        nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
        Revert "mm: vmscan: make global slab shrink lockless"
        Revert "mm: vmscan: make memcg slab shrink lockless"
        Revert "mm: vmscan: add shrinker_srcu_generation"
        Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
        Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
        Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
        Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
        nilfs2: fix buffer corruption due to concurrent device reads
        scripts/gdb: fix SB_* constants parsing
        scripts: fix the gfp flags header path in gfp-translate
        udmabuf: revert 'Add support for mapping hugepages (v4)'
        mm/khugepaged: fix iteration in collapse_file
        memfd: check for non-NULL file_seals in memfd_create() syscall
        mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
        mm/mprotect: fix do_mprotect_pkey() limit check
        writeback: fix dereferencing NULL mapping->host on writeback_page_template
      8ba90f5c
  3. 20 Jun, 2023 8 commits
    • Merge tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e660abd5
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Fix a kernel crash during early resume from ACPI S3 that has been
        present since the 5.15 cycle when might_sleep() was added to
        down_timeout(), which in some configurations of the kernel caused an
        implicit preemption point to trigger at a wrong time"
      
      * tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
      e660abd5
    • Merge tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c74e2ac2
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Fix a regression introduced during the 6.3 cycle causing
        intel_soc_dts_iosf to report incorrect temperature values
        due to a coding mistake (Hans de Goede)"
      
      * tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
      c74e2ac2
    • Merge tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 2e30b973
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix MAINTAINERS file to point to proper mailing list for rtla and rv
      
         The mailing list pointed to linux-trace-devel instead of
         linux-trace-kernel. The former is for the tracing libraries and the
         latter is for anything in the Linux kernel tree. The wrong mailing
         list was used because linux-trace-kernel did not exist when rtla and
         rv were created.
      
       - User events:
      
          - Fix matching of dynamic events to their user events
      
            When a user writes to the dynamic_events file, a lookup of the
            registered dynamic events is made, but there were some cases in
            which a match could be incorrectly made.
      
          - Add auto cleanup of user events
      
            Have the user events automatically get removed when the last
            reference (file descriptor) is closed. This was asked for to
            prevent leaks of user events hanging around needing admins to
            clean them up.
      
          - Add persistent logic (but not let user space use it yet)
      
            In some cases, having a persistent user event (one that does not
            get cleaned up automatically) is useful. But there are still debates
            about how to expose this to user space. The infrastructure is
            added, but the API is not.
      
          - Update the selftests
      
            Update the user event selftests to reflect the above changes"
      
      * tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/user_events: Document auto-cleanup and remove dyn_event refs
        selftests/user_events: Adapt dyn_test to non-persist events
        selftests/user_events: Ensure auto cleanup works as expected
        tracing/user_events: Add auto cleanup and future persist flag
        tracing/user_events: Track refcount consistently via put/get
        tracing/user_events: Store register flags on events
        tracing/user_events: Remove user_ns walk for groups
        selftests/user_events: Add perf self-test for empty arguments events
        selftests/user_events: Clear the events after perf self-test
        selftests/user_events: Add ftrace self-test for empty arguments events
        tracing/user_events: Fix the incorrect trace record for empty arguments events
        tracing: Modify print_fields() for fields output order
        tracing/user_events: Handle matching arguments that is null from dyn_events
        tracing/user_events: Prevent same name but different args event
        tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
      2e30b973
    • Merge tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 4b0c7a1b
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more regression fix for an assertion failure that uncovered a
        nasty problem with stripe calculations. This is caused by a u32
        overflow when there are enough devices. The fstests require 6 so this
        hasn't been caught, I was able to hit it with 8.
      
        The fix is minimal and only adds u64 casts, we'll clean that up later.
        I did various additional tests to be sure"
      
      * tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix u32 overflows when left shifting stripe_nr
      4b0c7a1b
    • regmap: spi-avmm: Fix regmap_bus max_raw_write · c8e79689
      Russ Weight authored
      The max_raw_write member of the regmap_spi_avmm_bus structure is defined
      as:
      	.max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
      
      SPI_AVMM_VAL_SIZE == 4 and MAX_WRITE_CNT == 1 so this results in a
      maximum write transfer size of 4 bytes which provides only enough space to
      transfer the address of the target register. It provides no space for the
      value to be transferred. This bug became an issue (divide-by-zero in
      _regmap_raw_write()) after the following was accepted into mainline:
      
      commit 39815141 ("regmap: Account for register length when chunking")
      
      Change max_raw_write to include space (4 additional bytes) for both the
      register address and value:
      
      	.max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
      
      Fixes: 7f9fb673 ("regmap: add Intel SPI Slave to AVMM Bus Bridge support")
      Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
      Signed-off-by: Russ Weight <russell.h.weight@intel.com>
      Link: https://lore.kernel.org/r/20230620202824.380313-1-russell.h.weight@intel.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      c8e79689
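The size arithmetic in the commit above can be checked with a short sketch. The macro names come from the commit message; `SPI_AVMM_REG_SIZE` is taken to be 4 from its "4 additional bytes" wording. The helper is hypothetical and only mirrors the idea of subtracting the register address before counting values. It is not the code in `_regmap_raw_write()`.

```c
#include <assert.h>
#include <stddef.h>

#define SPI_AVMM_REG_SIZE 4 /* inferred from "4 additional bytes" */
#define SPI_AVMM_VAL_SIZE 4
#define MAX_WRITE_CNT     1

/*
 * Hypothetical helper: how many register values fit in one raw write once
 * the register address has been accounted for.
 */
static size_t sketch_vals_per_write(size_t max_raw_write)
{
	assert(max_raw_write >= SPI_AVMM_REG_SIZE);
	return (max_raw_write - SPI_AVMM_REG_SIZE) / SPI_AVMM_VAL_SIZE;
}
```

With the old limit the helper returns 0 values per write, so any later division by that count is a divide-by-zero. With the new limit it returns 1.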
    • Merge tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd · 99ec1ed7
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Four smb3 server fixes, all also for stable:
      
         - fix potential oops in parsing compounded requests
      
         - fix various paths (mkdir, create etc) where mnt_want_write was not
           checked first
      
         - fix slab out of bounds in check_message and write"
      
      * tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: validate session id and tree id in the compound request
        ksmbd: fix out-of-bound read in smb2_write
        ksmbd: add mnt_want_write to ksmbd vfs functions
        ksmbd: validate command payload size
      99ec1ed7
    • btrfs: fix u32 overflows when left shifting stripe_nr · a7299a18
      Qu Wenruo authored
      [BUG]
      David reported an ASSERT() getting triggered during fio load on 8 devices
      with data/raid6 and metadata/raid1c3:
      
        fio --rw=randrw --randrepeat=1 --size=3000m \
      	  --bsrange=512b-64k --bs_unaligned \
      	  --ioengine=libaio --fsync=1024 \
      	  --name=job0 --name=job1 \
      
      The ASSERT() is from rbio_add_bio() of raid56.c:
      
      	ASSERT(orig_logical >= full_stripe_start &&
      	       orig_logical + orig_len <= full_stripe_start +
      	       rbio->nr_data * BTRFS_STRIPE_LEN);
      
      Which is checking if the target rbio is crossing the full stripe
      boundary.
      
        [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
        [100.795] ------------[ cut here ]------------
        [100.796] kernel BUG at fs/btrfs/raid56.c:1622!
        [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
        [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
        [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
        [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
        [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
        [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
        [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
        [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
        [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
        [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
        [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
        [100.817] FS:  0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
        [100.821] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
        [100.824] Call Trace:
        [100.825]  <TASK>
        [100.825]  ? die+0x32/0x80
        [100.826]  ? do_trap+0x12d/0x160
        [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.827]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.829]  ? do_error_trap+0x90/0x130
        [100.830]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.831]  ? handle_invalid_op+0x2c/0x30
        [100.833]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.835]  ? exc_invalid_op+0x29/0x40
        [100.836]  ? asm_exc_invalid_op+0x16/0x20
        [100.837]  ? rbio_add_bio+0x204/0x210 [btrfs]
        [100.837]  raid56_parity_write+0x64/0x270 [btrfs]
        [100.838]  btrfs_submit_chunk+0x26e/0x800 [btrfs]
        [100.840]  ? btrfs_bio_init+0x80/0x80 [btrfs]
        [100.841]  ? release_pages+0x503/0x6d0
        [100.842]  ? folio_unlock+0x2f/0x60
        [100.844]  ? __folio_put+0x60/0x60
        [100.845]  ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
        [100.847]  btrfs_submit_bio+0x21/0x60 [btrfs]
        [100.847]  submit_one_bio+0x6a/0xb0 [btrfs]
        [100.849]  extent_write_cache_pages+0x395/0x680 [btrfs]
        [100.850]  ? __extent_writepage+0x520/0x520 [btrfs]
        [100.851]  ? mark_usage+0x190/0x190
        [100.852]  extent_writepages+0xdb/0x130 [btrfs]
        [100.853]  ? extent_write_locked_range+0x480/0x480 [btrfs]
        [100.854]  ? mark_usage+0x190/0x190
        [100.854]  ? attach_extent_buffer_page+0x220/0x220 [btrfs]
        [100.855]  ? reacquire_held_locks+0x178/0x280
        [100.856]  ? writeback_sb_inodes+0x245/0x7f0
        [100.857]  do_writepages+0x102/0x2e0
        [100.858]  ? page_writeback_cpu_online+0x10/0x10
        [100.859]  ? __lock_release.isra.0+0x14a/0x4d0
        [100.860]  ? reacquire_held_locks+0x280/0x280
        [100.861]  ? __lock_acquired+0x1e9/0x3d0
        [100.862]  ? do_raw_spin_lock+0x1b0/0x1b0
        [100.863]  __writeback_single_inode+0x94/0x450
        [100.864]  writeback_sb_inodes+0x372/0x7f0
        [100.864]  ? lock_sync+0xd0/0xd0
        [100.865]  ? do_raw_spin_unlock+0x93/0xf0
        [100.866]  ? sync_inode_metadata+0xc0/0xc0
        [100.867]  ? rwsem_optimistic_spin+0x340/0x340
        [100.868]  __writeback_inodes_wb+0x70/0x130
        [100.869]  wb_writeback+0x2d1/0x530
        [100.869]  ? __writeback_inodes_wb+0x130/0x130
        [100.870]  ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
        [100.870]  wb_do_writeback+0x3eb/0x480
        [100.871]  ? wb_writeback+0x530/0x530
        [100.871]  ? mark_lock_irq+0xcd0/0xcd0
        [100.872]  wb_workfn+0xe0/0x3f0
      
      [CAUSE]
      Commit a97699d1 ("btrfs: replace map_lookup->stripe_len by
      BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
      u64 division.
      
      Function btrfs_max_io_len() gets the length to the stripe boundary.
      
      It calculates the full stripe start offset (inside the chunk) by the
      following code:
      
      		*full_stripe_start =
      			rounddown(*stripe_nr, nr_data_stripes(map)) <<
      			BTRFS_STRIPE_LEN_SHIFT;
      
      The calculation itself is fine, but the type of the value returned by
      rounddown() depends on both @stripe_nr (which is u32) and
      nr_data_stripes() (which returns int).
      
      Thus the result is also u32, and the left shift that follows can
      overflow u32.
      
      If such an overflow happens, @full_stripe_start will be way smaller
      than @offset, causing the later "full_stripe_len - (offset -
      *full_stripe_start)" to underflow. The later length calculation then
      has no stripe boundary limit, resulting in a write bio that exceeds the
      stripe boundary.
      
      There are some other locations like this, where a u32 @stripe_nr gets
      left shifted, which can lead to a similar overflow.
      
      [FIX]
      Fix all left shifts of @stripe_nr by adding a type cast to u64 before
      the left shift.
      
      The involved @stripe_nr and similar variables record the stripe
      number inside the chunk, which is small enough to fit in u32, but
      their offset inside the chunk can not fit into u32.
      
      Thus a type cast to u64 is necessary for those specific left shifts.
      To keep the fix minimal, this patch does not change the variables
      themselves; the code will be cleaned up in the future.
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: a97699d1 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
      Tested-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a7299a18
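The overflow described in the commit above can be reproduced in user space. This is a minimal sketch, not the btrfs code: the function names are hypothetical, and the shift of 16 assumes a 64 KiB stripe length.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STRIPE_LEN_SHIFT 16 /* assumes a 64 KiB stripe length */

/* Buggy form: the shift is done in 32 bits and only then widened. */
static uint64_t sketch_stripe_offset_u32(uint32_t stripe_nr)
{
	assert(SKETCH_STRIPE_LEN_SHIFT < 32);
	return stripe_nr << SKETCH_STRIPE_LEN_SHIFT;
}

/* Fixed form: widen to 64 bits first, then shift. */
static uint64_t sketch_stripe_offset_u64(uint32_t stripe_nr)
{
	return (uint64_t)stripe_nr << SKETCH_STRIPE_LEN_SHIFT;
}
```

Under these assumptions, stripe number 65536 corresponds to an offset of 4 GiB. The 32-bit form wraps to 0, while the cast form keeps the full value. Small stripe numbers give identical results in both forms, which is consistent with the merge message's note that the bug only showed up with enough devices.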
    • Merge tag 'hyperv-fixes-signed-20230619' of... · 692b7dc8
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fixes from Wei Liu:
      
       - Fix races in Hyper-V PCI controller (Dexuan Cui)
      
       - Fix handling of hyperv_pcpu_input_arg (Michael Kelley)
      
       - Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley)
      
       - Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui)
      
       - Add noop for real mode handlers for virtual trust level code (Saurabh
         Sengar)
      
      * tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        PCI: hv: Add a per-bus mutex state_lock
        Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
        PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
        PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
        PCI: hv: Fix a race condition bug in hv_pci_query_relations()
        arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing
        x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
        Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
        Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails
        x86/hyperv/vtl: Add noop for realmode pointers
      692b7dc8
  4. 19 Jun, 2023 19 commits