1. 23 Aug, 2021 13 commits
    • btrfs: continue readahead of siblings even if target node is in memory · 069a2e37
      Filipe Manana authored
      At reada_for_search(), when attempting to readahead a node or leaf's
      siblings, we skip the readahead of the siblings if the node/leaf is
      already in memory. That is probably fine for the READA_FORWARD and
      READA_BACK readahead types, as they are used in contexts where we
      end up reading some consecutive leaves, but usually not the whole btree.
      
      However for a READA_FORWARD_ALWAYS mode, currently only used for full
      send operations, it does not make sense to skip the readahead if the
      target node or leaf is already loaded in memory, since we know the caller
      is visiting every node and leaf of the btree in ascending order.
      
      So change the behaviour to not skip the readahead when the target node is
      already in memory and the readahead mode is READA_FORWARD_ALWAYS.
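
The decision described above can be modelled as a small predicate. This is only a sketch under stated assumptions, not the kernel's reada_for_search() code: the helper name skip_sibling_readahead() and the enum values are hypothetical, and only the three mode names come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode names are taken from the commit message; the numeric values are assumed. */
enum reada_mode {
	READA_BACK,
	READA_FORWARD,
	READA_FORWARD_ALWAYS,
};

/*
 * Hypothetical helper: returns true when readahead of the siblings
 * should be skipped for the given mode.
 */
bool skip_sibling_readahead(enum reada_mode mode, bool target_in_memory)
{
	/* A full forward walk visits every node and leaf, so never skip. */
	if (mode == READA_FORWARD_ALWAYS)
		return false;

	/* Other modes keep the old behaviour: skip if the target is cached. */
	return target_in_memory;
}
```

With this model, a cached target still suppresses sibling readahead for READA_FORWARD and READA_BACK, while READA_FORWARD_ALWAYS always proceeds.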
      
      The following test script was used to measure the improvement on a box
      using an average, consumer grade, spinning disk, with 32GiB of RAM and
      using a non-debug kernel config (Debian's default config).
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/sdj
        MNT=/mnt/sdj
        MKFS_OPTIONS="--nodesize 16384"     # default, just to be explicit
        MOUNT_OPTIONS="-o max_inline=2048"  # default, just to be explicit
      
        mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null
        mount $MOUNT_OPTIONS $DEV $MNT
      
        # Create files with inline data to make it easier and faster to create
        # large btrees.
        add_files()
        {
            local total=$1
            local start_offset=$2
            local number_jobs=$3
            local total_per_job=$(($total / $number_jobs))
      
            echo "Creating $total new files using $number_jobs jobs"
            for ((n = 0; n < $number_jobs; n++)); do
                (
                    local start_num=$(($start_offset + $n * $total_per_job))
                    for ((i = 1; i <= $total_per_job; i++)); do
                        local file_num=$((start_num + $i))
                        local file_path="$MNT/file_${file_num}"
                        xfs_io -f -c "pwrite -S 0xab 0 2000" $file_path > /dev/null
                        if [ $? -ne 0 ]; then
                            echo "Failed creating file $file_path"
                            break
                        fi
                    done
                ) &
                worker_pids[$n]=$!
            done
      
            wait ${worker_pids[@]}
      
            sync
            echo
            echo "btree node/leaf count: $(btrfs inspect-internal dump-tree -t 5 $DEV | egrep '^(node|leaf) ' | wc -l)"
        }
      
        file_count=2000000
        add_files $file_count 0 4
      
        echo
        echo "Creating snapshot..."
        btrfs subvolume snapshot -r $MNT $MNT/snap1
      
        umount $MNT
      
        echo 3 > /proc/sys/vm/drop_caches
        blockdev --flushbufs $DEV &> /dev/null
        hdparm -F $DEV &> /dev/null
      
        mount $MOUNT_OPTIONS $DEV $MNT
      
        echo
        echo "Testing full send..."
        start=$(date +%s)
        btrfs send $MNT/snap1 > /dev/null
        end=$(date +%s)
        echo
        echo "Full send took $((end - start)) seconds"
      
        umount $MNT
      
      The duration of the full send operation, in seconds, was as follows:
      
      Before this change:  85 seconds
      After this change:   76 seconds (-11.2%)
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: check-integrity: drop kmap/kunmap for block pages · 5da38479
      David Sterba authored
      The pages in block_ctx have never been allocated from highmem (in
      btrfsic_read_block) so the mapping is pointless and can be removed.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: compression: drop kmap/kunmap from generic helpers · 4c2bf276
      David Sterba authored
      The pages in compressed_pages are not from highmem anymore so we can
      drop the mapping for checksum calculation and inline extent.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: compression: drop kmap/kunmap from zstd · bbaf9715
      David Sterba authored
      As we don't use highmem pages anymore, drop the kmap/kunmap. The kmap is
      simply page_address and kunmap is a no-op.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: compression: drop kmap/kunmap from zlib · 696ab562
      David Sterba authored
      As we don't use highmem pages anymore, drop the kmap/kunmap. The kmap is
      simply page_address and kunmap is a no-op.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: compression: drop kmap/kunmap from lzo · 8c945d32
      David Sterba authored
      As we don't use highmem pages anymore, drop the kmap/kunmap. The kmap is
      simply page_address and kunmap is a no-op.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: drop from __GFP_HIGHMEM all allocations · b0ee5e1e
      David Sterba authored
      The highmem flag is used for allocating pages for compression and for
      raid56 pages. The high memory makes sense on 32bit systems but is not
      without problems. On 64bit systems it's just another layer of wrappers.
      
      The time the pages are allocated for compression or raid56 is relatively
      short (about a transaction commit), so the pages are not blocked
      indefinitely. As the number of pages depends on the amount of data being
      written/read, there's a theoretical problem. A fast device on a 32bit
      system could use most of the low memory pool, while with the highmem
      allocation that would not happen. This was possibly the original idea
      a long time ago, but nowadays we optimize for 64bit systems.
      
      This patch removes all usage of the __GFP_HIGHMEM flag for page
      allocation; the kmap/kunmap calls are still in place and will be removed in
      followup patches. What remains is masking out the bit in
      alloc_extent_state and __lookup_free_space_inode, which can safely stay.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: cleanup fs_devices pointer usage in btrfs_trim_fs · 23608d51
      Anand Jain authored
      Drop the variable 'devices' (used only once) and add a new variable for
      fs_devices, which is used at two locations within btrfs_trim_fs() and
      also helps to access fs_devices->devices.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove max argument from generic_bin_search · 67d5e289
      Marcos Paulo de Souza authored
      Both callers use btrfs_header_nritems to feed the max argument.  Remove
      the argument and let generic_bin_search call it itself.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make btrfs_finish_chunk_alloc private to block-group.c · 2eadb9e7
      Nikolay Borisov authored
      One of the final things that must be done to add a new chunk is
      inserting its device extent items in the device tree. They describe
      the portion of allocated device physical space during phase 1 of
      chunk allocation. This is currently done in btrfs_finish_chunk_alloc
      whose name isn't very informative. What's more, this function is only
      used in block-group.c but is defined as public. There isn't anything
      special about it that would warrant it being defined in volumes.c.
      
      Just move btrfs_finish_chunk_alloc and alloc_chunk_dev_extent to
      block-group.c, make the former static and rename both functions to
      insert_dev_extents and insert_dev_extent respectively.
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: check-integrity: drop unnecessary function prototypes · 4a9531cf
      Anand Jain authored
      The function prototypes below aren't necessary as the functions are
      defined before they are called. Remove them.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add special case to setget helpers for 64k pages · b3b7e1d0
      David Sterba authored
      On 64K pages the size of the extent_buffer::pages array is 1 and
      compilation with -Warray-bounds warns due to
      
        kaddr = page_address(eb->pages[idx + 1]);
      
      when reading a byte range crossing a page boundary.
      
      This never actually overflows the array, because on 64K pages all
      the data fit in one page and bounds are checked by check_setget_bounds.
      
      To fix the reported overflows and warnings, add a compile-time condition
      that will allow the compiler to eliminate the dead code that reads from the
      idx + 1 page.
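
A userspace sketch of the idea follows. It is not the btrfs setget code: struct buf, read_u16(), PAGE_SZ and NUM_PAGES are hypothetical stand-ins, and the tiny page size is chosen only to keep the example small. The point is that a condition on a compile-time constant lets the compiler discard the branch that touches the idx + 1 page when the array has a single entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ   16u	/* tiny stand-in for PAGE_SIZE (assumption) */
#define NUM_PAGES 1u	/* length of the pages array; 1 models the 64K page case */

struct buf {
	uint8_t *pages[NUM_PAGES];
};

/*
 * Read a little-endian 16-bit value at byte offset off.
 * Bounds are assumed to have been checked by the caller.
 */
uint16_t read_u16(const struct buf *b, unsigned int off)
{
	unsigned int idx = off / PAGE_SZ;	/* page index */
	unsigned int oip = off % PAGE_SZ;	/* offset in page */
	uint8_t raw[2];

	/* NUM_PAGES == 1 is a constant, so the else branch becomes dead code. */
	if (NUM_PAGES == 1 || oip + sizeof(raw) <= PAGE_SZ) {
		memcpy(raw, b->pages[idx] + oip, sizeof(raw));
	} else {
		/* Value straddles two pages; only reachable when NUM_PAGES > 1. */
		raw[0] = b->pages[idx][oip];
		raw[1] = b->pages[idx + 1][0];
	}
	return (uint16_t)(raw[0] | (raw[1] << 8));
}
```

Raising NUM_PAGES above 1 would make the second branch live again for values that cross a page boundary.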
      
      Link: https://lore.kernel.org/lkml/20210623083901.1d49d19d@canb.auug.org.au/
      CC: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: zoned: remove max_zone_append_size logic · 5a80d1c6
      Johannes Thumshirn authored
      There used to be a patch in the original series for zoned support which
      limited the extent size to max_zone_append_size, but this patch has been
      dropped somewhere around v9.
      
      We've decided to go the opposite direction: instead of limiting extents
      in the first place, we split them before submission to comply with the
      device's limits.
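
As a rough illustration of splitting at submission time, the sketch below cuts one length into chunks that each respect a device limit. It is a hypothetical userspace helper, not btrfs code: split_for_submission() and its parameters are invented for this example, and the real code splits bios rather than plain lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: split len bytes into chunks of at most max_chunk
 * bytes, writing the chunk sizes to out (capacity cap entries).
 * Returns the number of chunks written. Returns 0 when len is 0,
 * when max_chunk is 0, or when out is too small to hold all chunks.
 */
size_t split_for_submission(uint64_t len, uint64_t max_chunk,
			    uint64_t *out, size_t cap)
{
	size_t n = 0;

	if (max_chunk == 0)
		return 0;

	while (len > 0) {
		uint64_t chunk = len < max_chunk ? len : max_chunk;

		if (n == cap)
			return 0;	/* not enough room for the result */
		out[n++] = chunk;
		len -= chunk;
	}
	return n;
}
```

The extent itself keeps its full size in this model; only the units handed to the device are bounded.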
      
      Remove the related code, btrfs_fs_info::max_zone_append_size and
      btrfs_zoned_device_info::max_zone_append_size.
      
      This also removes the workaround for dm-crypt introduced in
      1d68128c ("btrfs: zoned: fail mount if the device does not support
      zone append") because the fix has been merged as f34ee1dc ("dm
      crypt: Fix zoned block device support").
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 22 Aug, 2021 2 commits
  3. 21 Aug, 2021 9 commits
  4. 20 Aug, 2021 16 commits
    • io_uring: fix xa_alloc_cycle() error return value check · a30f895a
      Jens Axboe authored
      We currently check for ret != 0 to indicate error, but '1' is a valid
      return and just indicates that the allocation succeeded with a wrap.
      Correct the check to be for < 0, like it was before the xarray
      conversion.
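
The difference between the two checks can be shown with a pair of predicates over the return value. This is a minimal sketch, not the io_uring code: the function names are hypothetical. It relies on the xa_alloc_cycle() convention of returning 0 on success, 1 on success after the allocation wrapped, and a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Check used before the fix: treats any non-zero return as an error. */
bool alloc_failed_buggy(int ret)
{
	return ret != 0;
}

/* Check after the fix: only negative errno values are errors. */
bool alloc_failed_fixed(int ret)
{
	return ret < 0;
}
```

A return of 1 is the interesting case: the old check reports a failure even though an ID was allocated, while the corrected check accepts it.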
      
      Cc: stable@vger.kernel.org
      Fixes: 61cf9370 ("io_uring: Convert personality_idr to XArray")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fa54d366
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix two mistakes in new code.
      
        Specifics:
      
         - Prevent confusing messages from being printed if the PRMT table is
           not present or there are no PRM modules (Aubrey Li).
      
         - Fix the handling of suspend-to-idle entry and exit in the case when
           the Microsoft UUID is used with the Low-Power S0 Idle _DSM
           interface (Mario Limonciello)"
      
      * tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
        ACPI: PRM: Deal with table not present or no module found
    • Merge tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · cae68764
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix some issues in the ARM cpufreq drivers and in the operating
        performance points (OPP) framework.
      
        Specifics:
      
         - Fix useless WARN() in the OPP core and prevent a noisy warning
           from being printed by OPP _put functions (Dmitry Osipenko).
      
         - Fix error path when allocation failed in the arm_scmi cpufreq
           driver (Lukasz Luba).
      
         - Blacklist Qualcomm sc8180x and Qualcomm sm8150 in
           cpufreq-dt-platdev (Bjorn Andersson, Thara Gopinath).
      
         - Forbid cpufreq for 1.2 GHz variant in the armada-37xx cpufreq
           driver (Marek Behún)"
      
      * tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        opp: Drop empty-table checks from _put functions
        cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
        cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
        cpufreq: arm_scmi: Fix error path when allocation failed
        opp: remove WARN when no valid OPPs remain
        cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
    • Merge branch 'akpm' (patches from Andrew) · ed3bad2e
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "10 patches.
      
        Subsystems affected by this patch series: MAINTAINERS and mm (shmem,
        pagealloc, tracing, memcg, memory-failure, vmscan, kfence, and
        hugetlb)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        hugetlb: don't pass page cache pages to restore_reserve_on_error
        kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE
        mm: vmscan: fix missing psi annotation for node_reclaim()
        mm/hwpoison: retry with shake_page() for unhandlable pages
        mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
        MAINTAINERS: update ClangBuiltLinux IRC chat
        mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names
        mm/page_alloc: don't corrupt pcppage_migratetype
        Revert "mm: swap: check if swap backing device is congested or not"
        Revert "mm/shmem: fix shmem_swapin() race with swapoff"
    • Merge tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm · 8ba9fbe1
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Regularly scheduled fixes. The ttm one solves a problem of GPU drivers
        failing to load if debugfs is off in Kconfig, otherwise the i915,
        mediatek, and amdgpu fixes are all fairly normal.
      
        Nouveau has a couple of display fixes, but it has a fix for a
        longstanding race condition in its memory manager code, and the fix
        mostly removes some code that wasn't working properly and has no
        userspace users. This fix makes the diffstat kinda larger but in a
        good (negative line-count) way.
      
        core:
         - fix drm_wait_vblank uapi copying bug
      
        ttm:
         - fix debugfs init when debugfs is off
      
        amdgpu:
         - vega10 SMU workload fix
         - DCN VM fix
         - DCN 3.01 watermark fix
      
        amdkfd:
         - SVM fix
      
        nouveau:
         - ampere display fixes
         - remove MM misfeature to fix a longstanding race condition
      
        i915:
         - tweaked display workaround for all PCHs
         - eDP MSO pipe sanity for ADL-P fix
         - remove unused symbol export
      
        mediatek:
         - AAL output size setting
         - Delete component in remove function"
      
      * tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm:
        drm/amd/display: Use DCN30 watermark calc for DCN301
        drm/i915/dp: remove superfluous EXPORT_SYMBOL()
        drm/i915/edp: fix eDP MSO pipe sanity checks for ADL-P
        drm/i915: Tweaked Wa_14010685332 for all PCHs
        drm/nouveau: rip out nvkm_client.super
        drm/nouveau: block a bunch of classes from userspace
        drm/nouveau/fifo/nv50-: rip out dma channels
        drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences
        drm/nouveau/disp: power down unused DP links during init
        drm/nouveau: recognise GA107
        drm: Copy drm_wait_vblank to user before returning
        drm/amd/display: Ensure DCN save after VM setup
        drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure
        drm/amd/pm: change the workload type for some cards
        Revert "drm/amd/pm: fix workload mismatch on vega10"
        drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails
        drm/mediatek: Add component_del in OVL and COLOR remove function
        drm/mediatek: Add AAL output size configuration
    • Merge tag 'pci-v5.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 3db903a8
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer (Rahul Tanwar)
      
       - Add Jim Quinlan et al as Broadcom STB PCIe maintainers (Jim Quinlan)
      
       - Increase D3hot-to-D0 delay for AMD Renoir/Cezanne XHCI (Marcin
         Bachry)
      
       - Correct iomem_get_mapping() usage for legacy_mem sysfs (Krzysztof
         Wilczyński)
      
      * tag 'pci-v5.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/sysfs: Use correct variable for the legacy_mem sysfs object
        PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI
        MAINTAINERS: Add Jim Quinlan et al as Broadcom STB PCIe maintainers
        MAINTAINERS: Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer
    • Merge tag 'mmc-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · a27c75e5
      Linus Torvalds authored
      Pull MMC host fixes from Ulf Hansson:
      
       - dw_mmc: Fix hang on data CRC error
      
       - mmci: Fix voltage switch procedure for the stm32 variant
      
       - sdhci-iproc: Fix some clock issues for BCM2711
      
       - sdhci-msm: Fixup software timeout value
      
      * tag 'mmc-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711
        mmc: sdhci-iproc: Cap min clock frequency on BCM2711
        mmc: sdhci-msm: Update the software timeout value for sdhc
        mmc: mmci: stm32: Check when the voltage switch procedure should be done
        mmc: dw_mmc: Fix hang on data CRC error
    • Merge tag 'sound-5.14-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 43a6473e
      Linus Torvalds authored
      Pull more sound fixes from Takashi Iwai:
       "This is a quick follow up for 5.14: a fix for a very recently
        introduced regression on ASoC Intel Atom driver, and another trivial
        HD-audio quirk for HP laptops"
      
      * tag 'sound-5.14-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ASoC: intel: atom: Fix breakage for PCM buffer address setup
        ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 54e9ea3c
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
      
       - Fix cleaning of vDSO directories
      
       - Ensure CNTHCTL_EL2 is fully initialised when booting at EL2
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: initialize all of CNTHCTL_EL2
        arm64: clean vdso & vdso32 files
    • Merge branch 'acpi-pm' · 0f09f4c4
      Rafael J. Wysocki authored
      * acpi-pm:
        ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
    • Merge tag 'iommu-fixes-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · b7d184d3
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix for a potential NULL-ptr dereference in IOMMU core code
      
       - Two resource leak fixes
      
       - Cache flush fix in the Intel VT-d driver
      
      * tag 'iommu-fixes-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
        iommu/vt-d: Fix PASID reference leak
        iommu: Check if group is NULL before remove device
        iommu/dma: Fix leak in non-contiguous API
    • Merge branch 'pm-opp' · f2963c7e
      Rafael J. Wysocki authored
      * pm-opp:
        opp: Drop empty-table checks from _put functions
        opp: remove WARN when no valid OPPs remain
    • hugetlb: don't pass page cache pages to restore_reserve_on_error · c7b1850d
      Mike Kravetz authored
      syzbot hit kernel BUG at fs/hugetlbfs/inode.c:532 as described in [1].
      This BUG triggers if the HPageRestoreReserve flag is set on a page in
      the page cache.  It should never be set, as the routine
      huge_add_to_page_cache explicitly clears the flag after adding a page to
      the cache.
      
      The only code other than huge page allocation which sets the flag is
      restore_reserve_on_error.  It will potentially set the flag in rare out
      of memory conditions.  syzbot was injecting errors to cause memory
      allocation errors which exercised this specific path.
      
      The code in restore_reserve_on_error is doing the right thing.  However,
      there are instances where pages in the page cache were being passed to
      restore_reserve_on_error.  This is incorrect, as once a page goes into
      the cache reservation information will not be modified for the page
      until it is removed from the cache.  Error paths do not remove pages
      from the cache, so even in the case of error, the page will remain in
      the cache and no reservation adjustment is needed.
      
      Modify routines that potentially call restore_reserve_on_error with a
      page cache page to no longer do so.
      
      Note on fixes tag: Prior to commit 846be085 ("mm/hugetlb: expand
      restore_reserve_on_error functionality") the routine would not process
      page cache pages because the HPageRestoreReserve flag is not set on such
      pages.  Therefore, this issue could not be triggered.  The code added
      by commit 846be085 ("mm/hugetlb: expand restore_reserve_on_error
      functionality") is needed and correct.  It exposed incorrect calls to
      restore_reserve_on_error which is the root cause addressed by this
      commit.
      
      [1] https://lore.kernel.org/linux-mm/00000000000050776d05c9b7c7f0@google.com/
      
      Link: https://lkml.kernel.org/r/20210818213304.37038-1-mike.kravetz@oracle.com
      Fixes: 846be085 ("mm/hugetlb: expand restore_reserve_on_error functionality")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: <syzbot+67654e51e54455f1c585@syzkaller.appspotmail.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE · a7cb5d23
      Marco Elver authored
      Originally the addr != NULL check was meant to take care of the case
      where __kfence_pool == NULL (KFENCE is disabled).  However, this does
      not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE.
      
      This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or
      any other faulting access with addr < KFENCE_POOL_SIZE.  While the
      kernel would likely crash, the stack traces and report might be
      confusing due to double faults upon KFENCE's attempt to unprotect such
      an address.
      
      Fix it by just checking that __kfence_pool != NULL instead.
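
The failure mode can be reproduced in a userspace model that uses plain integers instead of kernel pointers. This is a sketch under stated assumptions, not the kernel's is_kfence_address(): POOL_SIZE, in_pool_old() and in_pool_new() are hypothetical names, and a pool value of 0 stands in for __kfence_pool == NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 4096u	/* stand-in for KFENCE_POOL_SIZE; the value is arbitrary */

/* Shape of the old check: range test plus addr != 0. */
bool in_pool_old(uintptr_t pool, uintptr_t addr)
{
	/* Unsigned subtraction wraps, so addresses below pool fail the test. */
	return (addr - pool) < POOL_SIZE && addr != 0;
}

/* Shape of the fixed check: range test plus pool != 0. */
bool in_pool_new(uintptr_t pool, uintptr_t addr)
{
	return (addr - pool) < POOL_SIZE && pool != 0;
}
```

With the pool disabled (pool == 0), a small non-zero address such as 8 passes the old test because addr - 0 is below POOL_SIZE, which is the misclassification the commit describes. The new test rejects every address when there is no pool.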
      
      Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Marco Elver <elver@google.com>
      Reported-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Acked-by: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>    [5.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: fix missing psi annotation for node_reclaim() · 57f29762
      Johannes Weiner authored
      In a debugging session the other day, Rik noticed that node_reclaim()
      was missing memstall annotations.  This means we'll miss pressure and
      lost productivity resulting from reclaim on an overloaded local NUMA
      node when vm.zone_reclaim_mode is enabled.
      
      There haven't been any reports, but that's likely because
      vm.zone_reclaim_mode hasn't been a commonly used feature recently, and
      the intersection between such setups and psi users is probably nil.
      
      But secondary memory such as CXL-connected DIMMS, persistent memory etc,
      and the page demotion patches that handle them
      (https://lore.kernel.org/lkml/20210401183216.443C4443@viggo.jf.intel.com/)
      could soon make this a more common codepath again.
      
      Link: https://lkml.kernel.org/r/20210818152457.35846-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Rik van Riel <riel@surriel.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hwpoison: retry with shake_page() for unhandlable pages · fcc00621
      Naoya Horiguchi authored
      HWPoisonHandlable() sometimes returns false for typical user pages due
      to races with average memory events like transfers over LRU lists.  This
      causes failures in hwpoison handling.
      
      There's retry code for such a case, but it does not work because the retry
      loop reaches the retry limit too quickly, before the page settles down to
      a handlable state.  Let get_any_page() call shake_page() to fix it.
      
      [naoya.horiguchi@nec.com: get_any_page(): return -EIO when retry limit reached]
        Link: https://lkml.kernel.org/r/20210819001958.2365157-1-naoya.horiguchi@linux.dev
      
      Link: https://lkml.kernel.org/r/20210817053703.2267588-1-naoya.horiguchi@linux.dev
      Fixes: 25182f05 ("mm,hwpoison: fix race with hugetlb page allocation")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reported-by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>		[5.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>