1. 16 Nov, 2020 2 commits
  2. 15 Nov, 2020 11 commits
    •
      Merge tag 'drm-fixes-2020-11-16' of git://anongit.freedesktop.org/drm/drm · a6af8718
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Nouveau fixes:
      
         - atomic modesetting regression fix
      
         - ttm pre-nv50 fix
      
         - connector NULL ptr deref fix"
      
      * tag 'drm-fixes-2020-11-16' of git://anongit.freedesktop.org/drm/drm:
        drm/nouveau/kms/nv50-: Use atomic encoder callbacks everywhere
        drm/nouveau/ttm: avoid using nouveau_drm.ttm.type_vram prior to nv50
        drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth
      a6af8718
    •
      Merge branch 'linux-5.10' of git://github.com/skeggsb/linux into drm-fixes · 8f598d15
      Dave Airlie authored
      - atomic modesetting regression fix
      - ttm pre-nv50 fix
      - connector NULL ptr deref fix
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Ben Skeggs <skeggsb@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5D9p78MNN0OxVeRZxN8LDqcadJEGUEFCgWJQ6+_rjPuw@mail.gmail.com
      8f598d15
    •
      Merge tag 'char-misc-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9cfd9c45
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char/misc/whatever driver fixes for 5.10-rc4.
      
        Nothing huge, lots of small fixes for reported issues:
      
         - habanalabs driver fixes
      
         - speakup driver fixes
      
         - uio driver fixes
      
         - virtio driver fix
      
         - other tiny driver fixes
      
        Full details are in the shortlog.
      
        All of these have been in linux-next for a full week with no reported
        issues"
      
      * tag 'char-misc-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        uio: Fix use-after-free in uio_unregister_device()
        firmware: xilinx: fix out-of-bounds access
        nitro_enclaves: Fixup type and simplify logic of the poll mask setup
        speakup ttyio: Do not schedule() in ttyio_in_nowait
        speakup: Fix clearing selection in safe context
        speakup: Fix var_id_t values and thus keymap
        virtio: virtio_console: fix DMA memory allocation for rproc serial
        habanalabs/gaudi: mask WDT error in QMAN
        habanalabs/gaudi: move coresight mmu config
        habanalabs: fix kernel pointer type
        mei: protect mei_cl_mtu from null dereference
      9cfd9c45
    •
      Merge tag 'usb-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 281b3ec3
      Linus Torvalds authored
      Pull USB and Thunderbolt fixes from Greg KH:
       "Here are some small Thunderbolt and USB driver fixes for 5.10-rc4 to
        solve some reported issues.
      
        Nothing huge in here, just small things:
      
         - thunderbolt memory leaks fixed and new device ids added
      
         - revert of problem patch for the musb driver
      
         - new quirks added for USB devices
      
         - typec power supply fixes to resolve many reported problems about
           charging notifications not working anymore
      
        All except the cdc-acm driver quirk addition have been in linux-next
        with no reported issues (the quirk patch was applied on Friday, and is
        self-contained)"
      
      * tag 'usb-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
        MAINTAINERS: add usb raw gadget entry
        usb: typec: ucsi: Report power supply changes
        xhci: hisilicon: fix refercence leak in xhci_histb_probe
        Revert "usb: musb: convert to devm_platform_ioremap_resource_byname"
        thunderbolt: Add support for Intel Tiger Lake-H
        thunderbolt: Only configure USB4 wake for lane 0 adapters
        thunderbolt: Add uaccess dependency to debugfs interface
        thunderbolt: Fix memory leak if ida_simple_get() fails in enumerate_services()
        thunderbolt: Add the missed ida_simple_remove() in ring_request_msix()
      281b3ec3
    •
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 0062442e
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "Fixes for ARM and x86, the latter especially for old processors
        without two-dimensional paging (EPT/NPT)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
        KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
        KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
        KVM: x86: clflushopt should be treated as a no-op by emulation
        KVM: arm64: Handle SCXTNUM_ELx traps
        KVM: arm64: Unify trap handlers injecting an UNDEF
        KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
      0062442e
    •
      Merge tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 326fd6db
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A small set of fixes for x86:
      
         - Cure the fallout from the MSI irqdomain overhaul which missed that
           the Intel IOMMU does not register virtual function devices and
           therefore never reaches the point where the MSI interrupt domain is
           assigned. This made the VF devices use the non-remapped MSI domain
           which is trapped by the IOMMU/remap unit
      
         - Remove an extra space in the SGI_UV architecture type procfs output
           for UV5
      
         - Remove an unused function which was missed when removing the UV BAU
           TLB shootdown handler"
      
      * tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        iommu/vt-d: Cure VF irqdomain hickup
        x86/platform/uv: Fix copied UV5 output archtype
        x86/platform/uv: Drop last traces of uv_flush_tlb_others
      326fd6db
    •
      Merge tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 64b609d6
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "A set of fixes for perf:
      
          - A set of commits which reduce the stack usage of various perf
            event handling functions which allocated large data structs on
            stack causing stack overflows in the worst case
      
          - Use the proper mechanism for detecting soft interrupts in the
            recursion protection
      
          - Make the recursion protection simpler and more robust
      
          - Simplify the scheduling of event groups to make the code more
            robust and prepare for fixing the issues vs. scheduling of
            exclusive event groups
      
          - Prevent event multiplexing and rotation for exclusive event groups
      
          - Correct the perf event attribute exclusive semantics to take
            pinned events, e.g. the PMU watchdog, into account
      
          - Make the anythread filtering conditional for Intel's generic PMU
            counters as it is no longer guaranteed to be supported on newer
            CPUs. Check the corresponding CPUID leaf to make sure
      
          - Fixup a duplicate initialization in an array which was probably
            caused by the usual 'copy & paste - forgot to edit' mishap"
      
      * tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Fix Add BW copypasta
        perf/x86/intel: Make anythread filter support conditional
        perf: Tweak perf_event_attr::exclusive semantics
        perf: Fix event multiplexing for exclusive groups
        perf: Simplify group_sched_in()
        perf: Simplify group_sched_out()
        perf/x86: Make dummy_iregs static
        perf/arch: Remove perf_sample_data::regs_user_copy
        perf: Optimize get_recursion_context()
        perf: Fix get_recursion_context()
        perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
        perf: Reduce stack usage of perf_output_begin()
      64b609d6
    •
      Merge tag 'sched-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d0a37fd5
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A set of scheduler fixes:
      
         - Address a load balancer regression by making the load balancer use
           the same logic as the wakeup path to spread tasks in the LLC domain
      
         - Prefer the CPU on which a task ran last over the local CPU in the
           fast wakeup path for asymmetric CPU capacity systems to align with
           the symmetric case. This ensures more locality and prevents massive
           migration overhead on those asymmetric systems
      
         - Fix a memory corruption bug in the scheduler debug code caused by
           handing a modified buffer pointer to kfree()"
      
      * tag 'sched-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/debug: Fix memory corruption caused by multiple small reads of flags
        sched/fair: Prefer prev cpu in asymmetric wakeup path
        sched/fair: Ensure tasks spreading in LLC during LB
      d0a37fd5
    •
      Merge tag 'locking-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 259c2fbe
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "Two fixes for the locking subsystem:
      
         - Prevent an unconditional interrupt enable in a futex helper
           function which can be called from contexts which expect interrupts
           to stay disabled across the call
      
         - Don't modify lockdep chain keys in the validation process as that
           causes chain inconsistency"
      
      * tag 'locking-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Avoid to modify chain keys in validate_chain()
        futex: Don't enable IRQs unconditionally in put_pi_state()
      259c2fbe
    •
      Merge branch 'for-5.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · a50cf159
      Linus Torvalds authored
      Pull percpu fix and cleanup from Dennis Zhou:
       "A fix for a Wshadow warning in the asm-generic percpu macros came in
        and then I tacked on the removal of flexible array initializers in the
        percpu allocator"
      
      * 'for-5.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
        percpu: convert flexible array initializers to use struct_size()
        asm-generic: percpu: avoid Wshadow warning
      a50cf159
    •
      kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Paolo Bonzini authored
      In some cases where shadow paging is in use, the root page will
      be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
      not have an associated struct kvm_mmu_page, because it is allocated
      with alloc_page instead of kvm_mmu_alloc_page.
      
      Just return false quickly from is_tdp_mmu_root if the TDP MMU is
      not in use, which also includes the case where shadow paging is
      enabled.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c887c9b9
  3. 14 Nov, 2020 26 commits
    •
      Merge branch 'akpm' (patches from Andrew) · e28c0d7c
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "14 patches.
      
        Subsystems affected by this patch series: mm (migration, vmscan, slub,
        gup, memcg, hugetlbfs), mailmap, kbuild, reboot, watchdog, panic, and
        ocfs2"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        ocfs2: initialize ip_next_orphan
        panic: don't dump stack twice on warn
        hugetlbfs: fix anon huge page migration race
        mm: memcontrol: fix missing wakeup polling thread
        kernel/watchdog: fix watchdog_allowed_mask not used warning
        reboot: fix overflow parsing reboot cpu number
        Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
        compiler.h: fix barrier_data() on clang
        mm/gup: use unpin_user_pages() in __gup_longterm_locked()
        mm/slub: fix panic in slab_alloc_node()
        mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov
        mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
        mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
        mm/compaction: count pages and stop correctly during page isolation
      e28c0d7c
    •
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 31908a60
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Two small clk driver fixes:
      
         - Make to_clk_regmap() inline to avoid compiler annoyance
      
         - Fix critical clks on i.MX imx8m SoCs"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: imx8m: fix bus critical clk registration
        clk: define to_clk_regmap() as inline function
      31908a60
    •
      Merge tag 'hwmon-for-v5.10-rc4' of... · 7e908b74
      Linus Torvalds authored
      Merge tag 'hwmon-for-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - Fix potential buffer overflow in pmbus/max20730 driver
      
       - Fix locking issue in pmbus core
      
       - Fix regression causing timeouts in applesmc driver
      
       - Fix RPM calculation in pwm-fan driver
      
       - Restrict counter visibility in amd_energy driver
      
      * tag 'hwmon-for-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (amd_energy) modify the visibility of the counters
        hwmon: (applesmc) Re-work SMC comms
        hwmon: (pwm-fan) Fix RPM calculation
        hwmon: (pmbus) Add mutex locking for sysfs reads
        hwmon: (pmbus/max20730) use scnprintf() instead of snprintf()
      7e908b74
    •
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 0c045111
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three small fixes, all in the embedded ufs driver subsystem"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufshcd: Fix missing destroy_workqueue()
        scsi: ufs: Try to save power mode change and UIC cmd completion timeout
        scsi: ufs: Fix unbalanced scsi_block_reqs_cnt caused by ufshcd_hold()
      0c045111
    •
      Merge tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 30636a59
      Linus Torvalds authored
      Pull selinux fix from Paul Moore:
       "One small SELinux patch to make sure we return an error code when an
        allocation fails. It passes all of our tests, but given the nature of
        the patch that isn't surprising"
      
      * tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: Fix error return code in sel_ib_pkey_sid_slow()
      30636a59
    •
      Merge tag 'for-linus-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 4aea779d
      Linus Torvalds authored
      Pull uml fix from Richard Weinberger:
       "Call PMD destructor in __pmd_free_tlb()"
      
      * tag 'for-linus-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Call pgtable_pmd_page_dtor() in __pmd_free_tlb()
      4aea779d
    •
      afs: Fix afs_write_end() when called with copied == 0 [ver #3] · 3ad216ee
      David Howells authored
      When afs_write_end() is called with copied == 0, it tries to set the
      dirty region, but there's no way to actually encode a 0-length region in
      the encoding in page->private.
      
      "0,0", for example, indicates a 1-byte region at offset 0.  The maths
      miscalculates this and sets it incorrectly.
      
      Fix it to just do nothing but unlock and put the page in this case.  We
      don't actually need to mark the page dirty as nothing presumably
      changed.
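The encoding problem described above can be illustrated with a minimal userspace sketch. The bit layout, the 16-bit field width, and the names `encode_dirty`, `dirty_from`, `dirty_to`, and `record_write` are assumptions made for illustration and are not taken from the afs code; the only property borrowed from the commit message is that the end of the range is stored inclusively, so an all-zero value decodes as one byte at offset 0 and an empty range has no representation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: low 16 bits hold "from", next 16 bits hold "to - 1". */
#define RANGE_SHIFT 16u
#define RANGE_MASK  0xffffu

/* Encode the half-open range [from, to); an empty range cannot be encoded. */
static unsigned long encode_dirty(unsigned int from, unsigned int to)
{
    assert(from < to && to <= RANGE_MASK + 1u);
    return ((unsigned long)(to - 1u) << RANGE_SHIFT) | from;
}

static unsigned int dirty_from(unsigned long priv)
{
    return (unsigned int)(priv & RANGE_MASK);
}

static unsigned int dirty_to(unsigned long priv)
{
    return (unsigned int)((priv >> RANGE_SHIFT) & RANGE_MASK) + 1u;
}

/*
 * write_end-style helper: when nothing was copied, leave the stored range
 * untouched instead of trying to encode a zero-length region.
 * Returns true if a dirty range was recorded.
 */
static bool record_write(unsigned long *priv, unsigned int pos,
                         unsigned int copied)
{
    if (copied == 0)
        return false;
    *priv = encode_dirty(pos, pos + copied);
    return true;
}
```

With this layout, `dirty_from(0)` is 0 and `dirty_to(0)` is 1, which is why a zero-length write has to be handled before any encoding is attempted.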
      
      Fixes: 65dd2d60 ("afs: Alter dirty range encoding in page->private")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ad216ee
    •
      ocfs2: initialize ip_next_orphan · f5785283
      Wengang Wang authored
      Though the problem was found on an older 4.1.12 kernel, I think
      upstream has the same issue.
      
      On one node in the cluster, there is the following call trace:
      
         # cat /proc/21473/stack
         __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
         ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
         ocfs2_evict_inode+0x152/0x820 [ocfs2]
         evict+0xae/0x1a0
         iput+0x1c6/0x230
         ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
         ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
         ocfs2_dir_foreach+0x29/0x30 [ocfs2]
         ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
         ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
         process_one_work+0x169/0x4a0
         worker_thread+0x5b/0x560
         kthread+0xcb/0xf0
         ret_from_fork+0x61/0x90
      
      The above stack is not reasonable: the final iput() shouldn't happen in
      the ocfs2_orphan_filldir() function.  Looking at the code,
      
        2067         /* Skip inodes which are already added to recover list, since dio may
        2068          * happen concurrently with unlink/rename */
        2069         if (OCFS2_I(iter)->ip_next_orphan) {
        2070                 iput(iter);
        2071                 return 0;
        2072         }
        2073
      
      The logic assumes the inode is already in the recover list when it
      sees that ip_next_orphan is non-NULL, so it skips this inode after
      dropping the reference that was taken in ocfs2_iget().
      
      However, if the inode were already in the recover list, it would hold
      another reference, and the iput() at line 2070 would not be the final
      iput (dropping the last reference).  So I don't think the inode is
      really in the recover list (no vmcore to confirm).
      
      Note that ocfs2_queue_orphans(), though not shown in the call trace,
      holds the cluster lock on the orphan directory while looking up
      unlinked inodes.  The on-disk inode eviction can involve a lot of I/O,
      which may take a long time to finish.  That means this node can hold
      the cluster lock for a very long time, which can cause lock requests
      (from other nodes) on the orphan directory to hang for a long time.
      
      Looking further at ip_next_orphan, I found that it is not initialized
      when a new ocfs2_inode_info structure is allocated.
      
      This causes reflink operations from some nodes to hang for a very long
      time waiting for the cluster lock on the orphan directory.
      
      Fix: initialize ip_next_orphan as NULL.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5785283
    •
      panic: don't dump stack twice on warn · 2f31ad64
      Christophe Leroy authored
      Before commit 3f388f28 ("panic: dump registers on panic_on_warn"),
      __warn() was calling show_regs() when regs was not NULL, and show_stack()
      otherwise.
      
      After that commit, show_stack() is called regardless of whether
      show_regs() has been called or not, leading to duplicated Call Trace:
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/nohash/8xx.c:186 mmu_mark_initmem_nx+0x24/0x94
        CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty #4092
        NIP:  c00128b4 LR: c0010228 CTR: 00000000
        REGS: c9023e40 TRAP: 0700   Not tainted  (5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty)
        MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24000424  XER: 00000000
      
        GPR00: c0010228 c9023ef8 c2100000 0074c000 ffffffff 00000000 c2151000 c07b3880
        GPR08: ff000900 0074c000 c8000000 c33b53a8 24000822 00000000 c0003a20 00000000
        GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00800000
        NIP [c00128b4] mmu_mark_initmem_nx+0x24/0x94
        LR [c0010228] free_initmem+0x20/0x58
        Call Trace:
          free_initmem+0x20/0x58
          kernel_init+0x1c/0x114
          ret_from_kernel_thread+0x14/0x1c
        Instruction dump:
        7d291850 7d234b78 4e800020 9421ffe0 7c0802a6 bfc10018 3fe0c060 3bff0000
        3fff4080 3bffffff 90010024 57ff0010 <0fe00000> 392001cd 7c3e0b78 953e0008
        CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty #4092
        Call Trace:
          __warn+0x8c/0xd8 (unreliable)
          report_bug+0x11c/0x154
          program_check_exception+0x1dc/0x6e0
          ret_from_except_full+0x0/0x4
        --- interrupt: 700 at mmu_mark_initmem_nx+0x24/0x94
            LR = free_initmem+0x20/0x58
          free_initmem+0x20/0x58
          kernel_init+0x1c/0x114
          ret_from_kernel_thread+0x14/0x1c
        ---[ end trace 31702cd2a9570752 ]---
      
      Only call show_stack() when regs is NULL.
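The intended control flow can be sketched in userspace as follows. The `*_sketch` names, the counters, and the `struct regs_sketch` type are stand-ins invented for illustration; the real `__warn()`, `show_regs()`, and `show_stack()` have different signatures and print to the kernel log. The sketch assumes, as in the powerpc output above, that the register dump already includes a call trace.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct pt_regs; only its presence or absence matters here. */
struct regs_sketch {
    unsigned long ip;
};

/* Counters replace real console output so the behaviour can be checked. */
static int regs_dumps;
static int stack_dumps;

static void show_regs_sketch(const struct regs_sketch *regs)
{
    assert(regs != NULL);
    regs_dumps++;       /* register dump, assumed to include a call trace */
}

static void show_stack_sketch(void)
{
    stack_dumps++;      /* plain call trace */
}

/* Dump exactly one trace per warning: registers if known, stack otherwise. */
static void warn_dump(const struct regs_sketch *regs)
{
    if (regs)
        show_regs_sketch(regs);
    else
        show_stack_sketch();
}
```

Calling `warn_dump()` with a non-NULL pointer bumps only `regs_dumps`, and calling it with NULL bumps only `stack_dumps`, so no warning produces two traces.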
      
      Fixes: 3f388f28 ("panic: dump registers on panic_on_warn")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Link: https://lkml.kernel.org/r/e8c055458b080707f1bc1a98ff8bea79d0cec445.1604748361.git.christophe.leroy@csgroup.eu
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f31ad64
    •
      hugetlbfs: fix anon huge page migration race · 336bf30e
      Mike Kravetz authored
      Qian Cai reported the following BUG in [1]
      
        LTP: starting move_pages12
        BUG: unable to handle page fault for address: ffffffffffffffe0
        ...
        RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
        Call Trace:
          rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
          try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
          migrate_pages+0x1005/0x1fb0
          move_pages_and_store_status.isra.47+0xd7/0x1a0
          __x64_sys_move_pages+0xa5c/0x1100
          do_syscall_64+0x5f/0x310
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Hugh Dickins diagnosed this as a migration bug caused by code introduced
      to use i_mmap_rwsem for pmd sharing synchronization.  Specifically, the
      routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
      flag to try_to_unmap() while holding i_mmap_rwsem.  This is wrong for
      anon pages as the anon_vma_lock should be held in this case.  Further
      analysis suggested that i_mmap_rwsem was not required to be held at all
      when calling try_to_unmap for anon pages as an anon page could never be
      part of a shared pmd mapping.
      
      Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
      to drop page lock and acquire i_mmap_rwsem is wrong.  There is no way to
      keep mapping valid while dropping page lock.
      
      This patch does the following:
      
       - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
         calling try_to_unmap.
      
       - Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
         will now simply do a 'trylock' while still holding the page lock. If
         the trylock fails, it will return NULL. This could impact the
         callers:
      
          - migration calling code will receive -EAGAIN and retry up to the
            hard coded limit (10).
      
          - memory error code will treat the page as BUSY. This will force
            killing (SIGKILL) any mapping tasks instead of sending SIGBUS.
      
         Do note that this change in behavior only happens when there is a
         race. None of the standard kernel testing suites actually hit this
         race, but it is possible.
      
      [1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
      [2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
      
      Fixes: c0d0381a ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
      Reported-by: Qian Cai <cai@lca.pw>
      Suggested-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      336bf30e
    •
      mm: memcontrol: fix missing wakeup polling thread · 8b21ca02
      Muchun Song authored
      When we poll swap.events, we can miss being woken up when a swap
      event occurs, because no notification is sent.
      
      Fixes: f3a53a3a ("mm, memcontrol: implement memory.swap.events")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Cc: Chris Down <chris@chrisdown.name>
      Cc: Tejun Heo <tj@kernel.org>
      Link: https://lkml.kernel.org/r/20201105161936.98312-1-songmuchun@bytedance.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8b21ca02
    •
      kernel/watchdog: fix watchdog_allowed_mask not used warning · e7e04615
      Santosh Sivaraj authored
      Define watchdog_allowed_mask only when SOFTLOCKUP_DETECTOR is enabled.
      
      Fixes: 7feeb9cd ("watchdog/sysctl: Clean up sysctl variable name space")
      Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20201106015025.1281561-1-santosh@fossix.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7e04615
    •
      reboot: fix overflow parsing reboot cpu number · df5b0ab3
      Matteo Croce authored
      Limit the CPU number to num_possible_cpus(), because setting it to a
      value lower than INT_MAX but higher than NR_CPUS produces the following
      error on reboot and shutdown:
      
          BUG: unable to handle page fault for address: ffffffff90ab1bb0
          #PF: supervisor read access in kernel mode
          #PF: error_code(0x0000) - not-present page
          PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
          RIP: 0010:migrate_to_reboot_cpu+0xe/0x60
          Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c
          RSP: 0018:ffffc90000013e08 EFLAGS: 00010246
          RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000
          RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0
          RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348
          R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
          R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000
          FS:  00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
            __do_sys_reboot.cold+0x34/0x5b
            do_syscall_64+0x2d/0x40
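A minimal userspace sketch of the missing bound check follows. The helper names `bitmap_test` and `choose_reboot_cpu`, the fallback to CPU 0, and the plain `unsigned long` bitmap are assumptions for illustration; this is not the code from the patch. It shows why an unchecked index such as 2000000000 (0x77359400, the value visible in RBX and RDX above) would make the online-mask lookup read far outside the bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Test bit nr in a bitmap; the caller must guarantee nr is in range. */
static bool bitmap_test(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Pick the CPU to reboot on. Out-of-range or offline requests fall back to
 * CPU 0 (an assumption of this sketch). The range check must come first:
 * without it, bitmap_test() would index past the end of the bitmap.
 */
static unsigned int choose_reboot_cpu(unsigned long requested,
                                      const unsigned long *online,
                                      size_t nr_possible)
{
    assert(online != NULL && nr_possible > 0);
    if (requested >= nr_possible)
        return 0;
    if (!bitmap_test(online, requested))
        return 0;
    return (unsigned int)requested;
}
```

For a mask with CPUs 0 and 2 online out of 4 possible, a request for CPU 2 is honoured, while requests for CPU 1 or CPU 2000000000 both fall back to CPU 0 without touching memory outside the mask.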
      
      Fixes: 1b3a5d02 ("reboot: move arch/x86 reboot= handling to generic kernel")
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df5b0ab3
    •
      Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" · 8b92c4ff
      Matteo Croce authored
      Patch series "fix parsing of reboot= cmdline", v3.
      
      The parsing of the reboot= cmdline has two major errors:
      
       - a missing bound check can crash the system on reboot
      
       - parsing of the cpu number only works if specified last
      
      Fix both.
      
      This patch (of 2):
      
      This reverts commit 616feab7.
      
      kstrtoint() and simple_strtoul() have a subtle difference which makes
      them non-interchangeable: if a non-digit character is found during
      parsing, the former returns an error, while the latter just stops
      parsing, e.g.  simple_strtoul("123xyx") = 123.
      
      The kernel cmdline reboot= argument allows specifying the CPU used for
      rebooting, with the syntax `s####` among the other flags, e.g.
      "reboot=warm,s31,force", so with kstrtoint(), if this flag is not the
      last one given, it is silently ignored along with the subsequent ones.
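      
      The difference can be reproduced in userspace. The following is a
      minimal sketch, assuming the C library's strtoul()/strtol() as rough
      stand-ins for the kernel helpers; parse_lenient() and parse_strict()
      are hypothetical names, and the strict variant does not reproduce
      every detail of kstrtoint() (for example its handling of a trailing
      newline).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Lenient parse, in the spirit of simple_strtoul(): stop at the first
 * non-digit character and return what was parsed so far. */
unsigned long parse_lenient(const char *s)
{
	assert(s != NULL);
	return strtoul(s, NULL, 10);
}

/* Strict parse, in the spirit of kstrtoint(): any trailing character
 * makes the whole string invalid and *out is left untouched. */
int parse_strict(const char *s, int *out)
{
	char *end;
	long v;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE ||
	    v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	*out = (int)v;
	return 0;
}
```

      With the tail of "reboot=warm,s31,force", the lenient parser reads 31
      from "31,force", while the strict one rejects the same string, which
      is why the CPU number only worked when it was given last.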
      
      Fixes: 616feab7 ("kernel/reboot.c: convert simple_strtoul to kstrtoint")
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8b92c4ff
    • Arvind Sankar's avatar
      compiler.h: fix barrier_data() on clang · 3347acc6
      Arvind Sankar authored
      Commit 815f0ddb ("include/linux/compiler*.h: make compiler-*.h
      mutually exclusive") neglected to copy barrier_data() from
      compiler-gcc.h into compiler-clang.h.
      
      The definition in compiler-gcc.h was really to work around clang's more
      aggressive optimization, so this broke barrier_data() on clang, and
      consequently memzero_explicit() as well.
      
      For example, this results in at least the memzero_explicit() call in
      lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
      
      Fix this by moving the definition of barrier_data() into compiler.h.
      
      Also move the gcc/clang definition of barrier() into compiler.h.
      __memory_barrier() is icc-specific (and barrier() is already defined
      using it in compiler-intel.h) and doesn't belong in compiler.h.
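      
      To illustrate why the barrier matters, here is a minimal userspace
      sketch, assuming a GCC- or Clang-compatible compiler. The macro uses
      the empty-asm idiom with the pointer as an input and a "memory"
      clobber; zero_explicit() is a hypothetical stand-in for
      memzero_explicit(), not the kernel function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Empty asm that takes ptr as an input and clobbers memory: the compiler
 * must assume the bytes behind ptr are observed, so stores made to them
 * before this point cannot be discarded as dead. */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Hypothetical userspace analogue of memzero_explicit(). */
void zero_explicit(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);
	barrier_data(buf);
}
```

      Without the barrier_data() line, a compiler is free to drop the
      memset() when the buffer is not read again afterwards, which is the
      failure mode described for sha256_transform() above.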
      
      [rdunlap@infradead.org: fix ALPHA builds when SMP is not enabled]
      
      Link: https://lkml.kernel.org/r/20201101231835.4589-1-rdunlap@infradead.org
      Fixes: 815f0ddb ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201014212631.207844-1-nivedita@alum.mit.edu
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3347acc6
    • Jason Gunthorpe's avatar
      mm/gup: use unpin_user_pages() in __gup_longterm_locked() · 96e1fac1
      Jason Gunthorpe authored
      When FOLL_PIN is passed to __get_user_pages() the page list must be put
      back using unpin_user_pages(), otherwise the page pin reference persists
      in a corrupted state.
      
      There are two places in the unwind of __gup_longterm_locked() that put
      the pages back without checking.  Normally on error this function would
      return the partial page list making this the caller's responsibility,
      but in these two cases the caller is not allowed to see these pages at
      all.
      
      Fixes: 3faa52c0 ("mm/gup: track FOLL_PIN pages")
      Reported-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/0-v2-3ae7d9d162e2+2a7-gup_cma_fix_jgg@nvidia.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96e1fac1
    • Laurent Dufour's avatar
      mm/slub: fix panic in slab_alloc_node() · 22e4663e
      Laurent Dufour authored
      While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs
      with 11TB of ram, I hit the following panic:
      
          BUG: Kernel NULL pointer dereference on read at 0x00000007
          Faulting instruction address: 0xc000000000456048
          Oops: Kernel access of bad area, sig: 11 [#2]
          LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
          Modules linked in: rpadlpar_io rpaphp
          CPU: 160 PID: 1 Comm: systemd Tainted: G      D           5.9.0 #1
          NIP:  c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
          REGS: c00006028d1b77a0 TRAP: 0300   Tainted: G      D            (5.9.0)
          MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004228  XER: 00000000
          CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
          GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
          GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
          GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
          GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
          GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
          GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
          GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
          GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
          NIP [c000000000456048] __kmalloc_node+0x108/0x790
          LR [c000000000455fd4] __kmalloc_node+0x94/0x790
          Call Trace:
            kvmalloc_node+0x58/0x110
            mem_cgroup_css_online+0x10c/0x270
            online_css+0x48/0xd0
            cgroup_apply_control_enable+0x2c4/0x470
            cgroup_mkdir+0x408/0x5f0
            kernfs_iop_mkdir+0x90/0x100
            vfs_mkdir+0x138/0x250
            do_mkdirat+0x154/0x1c0
            system_call_exception+0xf8/0x200
            system_call_common+0xf0/0x27c
          Instruction dump:
          e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
          2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78
      
      This points to the following code:
      
          mm/slub.c:2851
                  if (unlikely(!object || !node_match(page, node))) {
          c000000000456038:       00 00 bc 2f     cmpdi   cr7,r28,0
          c00000000045603c:       18 00 9e 41     beq     cr7,c000000000456054 <__kmalloc_node+0x114>
          node_match():
          mm/slub.c:2491
                  if (node != NUMA_NO_NODE && page_to_nid(page) != node)
          c000000000456040:       30 02 92 41     beq     cr4,c000000000456270 <__kmalloc_node+0x330>
          page_to_nid():
          include/linux/mm.h:1294
          c000000000456044:       10 00 27 e9     ld      r9,16(r7)
          c000000000456048:       07 00 29 89     lbz     r9,7(r9)	<<<< r9 = NULL
          node_match():
          mm/slub.c:2491
          c00000000045604c:       00 48 99 7f     cmpw    cr7,r25,r9
          c000000000456050:       20 02 9e 41     beq     cr7,c000000000456270 <__kmalloc_node+0x330>
      
      The panic occurred in slab_alloc_node() when checking for the page's node:
      
      	object = c->freelist;
      	page = c->page;
      	if (unlikely(!object || !node_match(page, node))) {
      		object = __slab_alloc(s, gfpflags, node, addr, c);
      		stat(s, ALLOC_SLOWPATH);
      
      The issue is that object is not NULL while page is NULL which is odd but
      may happen if the cache flush happened after loading object but before
      loading page.  Thus checking for the page pointer is required too.
      
      The cache flush is done through an inter processor interrupt when a
      piece of memory is off-lined.  That interrupt is triggered when a memory
      hot-unplug operation is initiated and offline_pages() is calling the
      slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback()
      which is calling flush_cpu_slab().  If that interrupt is caught between
      the reading of c->freelist and the reading of c->page, this could lead
      to such a situation.  That situation is expected and the later call to
      this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
      the whole operation.
      
      In commit 6159d0f5 ("mm/slub.c: page is always non-NULL in
      node_match()") the check on the page pointer was removed on the
      assumption that page is always valid when node_match() is called.  That
      assumption does not hold in this particular case, so check for page
      before calling node_match() here.
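      
      The shape of the check can be sketched in userspace with mock types.
      This is an illustration under stated assumptions, not the actual
      mm/slub.c code: struct page_mock, struct cpu_slab_mock and
      needs_slow_path() are hypothetical, and only the order of the tests
      is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-ins for struct page and the per-CPU slab state. */
struct page_mock { int nid; };
struct cpu_slab_mock { void *freelist; struct page_mock *page; };

/* Like node_match() without a NULL check: dereferences page whenever a
 * specific node is requested. */
bool node_match(const struct page_mock *page, int node)
{
	if (node != NUMA_NO_NODE && page->nid != node)
		return false;
	return true;
}

/* Returns true when the fast path must be abandoned.  The !page test
 * runs before node_match(), so a page pointer cleared by a concurrent
 * flush is never dereferenced. */
bool needs_slow_path(const struct cpu_slab_mock *c, int node)
{
	void *object;
	struct page_mock *page;

	assert(c != NULL);
	object = c->freelist;
	page = c->page;
	return !object || !page || !node_match(page, node);
}
```

      A state with a non-NULL freelist and a NULL page, as produced by the
      race described above, then takes the slow path instead of faulting.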
      
      Fixes: 6159d0f5 ("mm/slub.c: page is always non-NULL in node_match()")
      Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      22e4663e
    • Dmitry Baryshkov's avatar
      mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov · 044747e9
      Dmitry Baryshkov authored
      Change back surname to new (old) one.  Dmitry Baryshkov -> Dmitry
      Eremin-Solenikov -> Dmitry Baryshkov.  Map several odd entries to main
      identity.
      Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20201103005158.1181426-1-dmitry.baryshkov@linaro.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      044747e9
    • Nicholas Piggin's avatar
      mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit · 2da9f630
      Nicholas Piggin authored
      Previously the negated unsigned long would be cast back to signed long
      which would have the correct negative value.  After commit 730ec8c0
      ("mm/vmscan.c: change prototype for shrink_page_list"), the large
      unsigned int converts to a large positive signed long.
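      
      The arithmetic can be checked in isolation. Below is a minimal
      sketch, assuming a platform where int is 32 bits wide; the
      fixed-width types stand in for unsigned long and unsigned int on a
      64-bit kernel, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Before the prototype change: the count is 64 bits wide, so negating it
 * and converting to a signed 64-bit value yields -nr. */
int64_t delta_from_u64(uint64_t nr)
{
	assert(nr > 0);
	return (int64_t)(-nr);
}

/* After the prototype change: the negation happens in 32 bits and wraps
 * to 2^32 - nr, which then widens to a large positive 64-bit value. */
int64_t delta_from_u32(uint32_t nr)
{
	assert(nr > 0);
	return (int64_t)(-nr);
}
```

      For nr = 5 the first function returns -5 and the second returns
      4294967291, so a counter meant to be decremented grows instead.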
      
      Symptoms include CMA allocations hanging forever holding the cma_mutex
      due to alloc_contig_range->...->isolate_migratepages_block waiting
      forever in "while (unlikely(too_many_isolated(pgdat)))".
      
      [akpm@linux-foundation.org: fix -stat.nr_lazyfree_fail as well, per Michal]
      
      Fixes: 730ec8c0 ("mm/vmscan.c: change prototype for shrink_page_list")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vaneet Narang <v.narang@samsung.com>
      Cc: Maninder Singh <maninder1.s@samsung.com>
      Cc: Amit Sahrawat <a.sahrawat@samsung.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201029032320.1448441-1-npiggin@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2da9f630
    • Zi Yan's avatar
      mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate · d20bdd57
      Zi Yan authored
      In isolate_migratepages_block, if we have too many isolated pages and
      nr_migratepages is not zero, we should try to migrate what we have
      without wasting time on isolating.
      
      In theory it's possible that multiple parallel compactions will cause
      too_many_isolated() to become true even if each has isolated less than
      COMPACT_CLUSTER_MAX, and loop forever in the while loop.  Bailing
      immediately prevents that.
      
      [vbabka@suse.cz: changelog addition]
      
      Fixes: 1da2f328 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Link: https://lkml.kernel.org/r/20201030183809.3616803-2-zi.yan@sent.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d20bdd57
    • Zi Yan's avatar
      mm/compaction: count pages and stop correctly during page isolation · 38935861
      Zi Yan authored
      In isolate_migratepages_block, when cc->alloc_contig is true, we are
      able to isolate compound pages.  But nr_migratepages and nr_isolated did
      not count compound pages correctly, causing us to isolate more pages
      than we thought.
      
      So count compound pages as the number of base pages they contain.
      Otherwise, we might be trapped in the too_many_isolated() while loop,
      since the actual number of isolated pages can go up to
      COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32, because
      we stop isolation only after cc->nr_migratepages reaches
      COMPACT_CLUSTER_MAX.
      
      In addition, after we fix the issue above, cc->nr_migratepages could
      never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
      thus page isolation could not stop as we intended.  Change the isolation
      stop condition to '>='.
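      
      The effect of the stop condition can be shown with a small
      simulation. This is a sketch under stated assumptions rather than
      the compaction code: steps_until_stop() is a hypothetical helper
      that adds a fixed number of base pages per isolation step and
      reports when the chosen comparison first fires.

```c
#include <assert.h>

#define COMPACT_CLUSTER_MAX 32U

/* Add pages_per_step base pages per isolation step and return the step
 * at which the stop condition fires, or 0 if it never fires within
 * max_steps.  use_ge selects '>=' (fixed) over '==' (original). */
unsigned int steps_until_stop(unsigned int pages_per_step, int use_ge,
			      unsigned int max_steps)
{
	unsigned int nr_migratepages = 0;
	unsigned int step;

	assert(pages_per_step > 0);
	for (step = 1; step <= max_steps; step++) {
		nr_migratepages += pages_per_step;
		if (use_ge ? nr_migratepages >= COMPACT_CLUSTER_MAX
			   : nr_migratepages == COMPACT_CLUSTER_MAX)
			return step;
	}
	return 0;
}
```

      With one base page per step both comparisons stop at step 32. With a
      512-page compound page per step the counter jumps from 0 to 512, so
      '==' never matches while '>=' stops after the first step.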
      
      The issue can be triggered as follows:
      
      In a system with 16GB memory and an 8GB CMA region reserved by
      hugetlb_cma, if we first allocate 10GB THPs and mlock them (so some THPs
      are allocated in the CMA region and mlocked), reserving 6 1GB hugetlb
      pages via /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages will
      get stuck (looping in too_many_isolated function) until we kill either
      task.  With the patch applied, oom will kill the application with 10GB
      THPs and let hugetlb page reservation finish.
      
      [ziy@nvidia.com: v3]
      
      Link: https://lkml.kernel.org/r/20201030183809.3616803-1-zi.yan@sent.com
      Fixes: 1da2f328 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201029200435.3386066-1-zi.yan@sent.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38935861
    • Lyude Paul's avatar
      drm/nouveau/kms/nv50-: Use atomic encoder callbacks everywhere · 5c6fb4b2
      Lyude Paul authored
      It turns out that I forgot to go through and make sure that I converted all
      encoder callbacks to use atomic_enable/atomic_disable(), so let's go and
      actually do that.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Fixes: 09838c4e ("drm/nouveau/kms: Search for encoders' connectors properly")
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      5c6fb4b2
    • Ben Skeggs's avatar
      drm/nouveau/ttm: avoid using nouveau_drm.ttm.type_vram prior to nv50 · 6c27ffab
      Ben Skeggs authored
      Pre-NV50 chipsets don't currently use the MMU subsystem that later
      chipsets use, and type_vram is negative here, leading to an OOB memory
      access.
      
      This was previously guarded by a chipset check, restore that.
      Reported-by: Thomas Zimmermann <tzimmermann@suse.de>
      Fixes: 5839172f ("drm/nouveau: explicitly specify caching to use")
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      6c27ffab
    • Alexander Kapshuk's avatar
      drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth · 630f5122
      Alexander Kapshuk authored
      This oops manifests itself on the following hardware:
      01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce G 103M] (rev a1)
      
      Oct 09 14:17:46 lp-sasha kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: #PF: supervisor read access in kernel mode
      Oct 09 14:17:46 lp-sasha kernel: #PF: error_code(0x0000) - not-present page
      Oct 09 14:17:46 lp-sasha kernel: PGD 0 P4D 0
      Oct 09 14:17:46 lp-sasha kernel: Oops: 0000 [#1] SMP PTI
      Oct 09 14:17:46 lp-sasha kernel: CPU: 1 PID: 191 Comm: systemd-udevd Not tainted 5.9.0-rc8-next-20201009 #38
      Oct 09 14:17:46 lp-sasha kernel: Hardware name: Hewlett-Packard Compaq Presario CQ61 Notebook PC/306A, BIOS F.03 03/23/2009
      Oct 09 14:17:46 lp-sasha kernel: RIP: 0010:nouveau_connector_detect_depth+0x71/0xc0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel: Code: 0a 00 00 48 8b 49 48 c7 87 b8 00 00 00 06 00 00 00 80 b9 4d 0a 00 00 00 75 1e 83 fa 41 75 05 48 85 c0 75 29 8b 81 10 0d 00 00 <39> 06 7c 25 f6 81 14 0d 00 00 02 75 b7 c3 80 b9 0c 0d 00 00 00 75
      Oct 09 14:17:46 lp-sasha kernel: RSP: 0018:ffffc9000028f8c0 EFLAGS: 00010297
      Oct 09 14:17:46 lp-sasha kernel: RAX: 0000000000014c08 RBX: ffff8880369d4000 RCX: ffff8880369d3000
      Oct 09 14:17:46 lp-sasha kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffff8880369d4000
      Oct 09 14:17:46 lp-sasha kernel: RBP: ffff88800601cc00 R08: ffff8880051da298 R09: ffffffff8226201a
      Oct 09 14:17:46 lp-sasha kernel: R10: ffff88800469aa80 R11: ffff888004c84ff8 R12: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: R13: ffff8880051da000 R14: 0000000000002000 R15: 0000000000000003
      Oct 09 14:17:46 lp-sasha kernel: FS:  00007fd0192b3440(0000) GS:ffff8880bc900000(0000) knlGS:0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Oct 09 14:17:46 lp-sasha kernel: CR2: 0000000000000000 CR3: 0000000004976000 CR4: 00000000000006e0
      Oct 09 14:17:46 lp-sasha kernel: Call Trace:
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_connector_get_modes+0x1e6/0x240 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  ? kfree+0xb9/0x240
      Oct 09 14:17:46 lp-sasha kernel:  ? drm_connector_list_iter_next+0x7c/0xa0
      Oct 09 14:17:46 lp-sasha kernel:  drm_helper_probe_single_connector_modes+0x1ba/0x7c0
      Oct 09 14:17:46 lp-sasha kernel:  drm_client_modeset_probe+0x27e/0x1360
      Oct 09 14:17:46 lp-sasha kernel:  ? nvif_object_sclass_put+0xc/0x20 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  ? nouveau_cli_init+0x3cc/0x440 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  ? ktime_get_mono_fast_ns+0x49/0xa0
      Oct 09 14:17:46 lp-sasha kernel:  ? nouveau_drm_open+0x4e/0x180 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  __drm_fb_helper_initial_config_and_unlock+0x3f/0x4a0
      Oct 09 14:17:46 lp-sasha kernel:  ? drm_file_alloc+0x18f/0x260
      Oct 09 14:17:46 lp-sasha kernel:  ? mutex_lock+0x9/0x40
      Oct 09 14:17:46 lp-sasha kernel:  ? drm_client_init+0x110/0x160
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_fbcon_init+0x14d/0x1c0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_drm_device_init+0x1c0/0x880 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_drm_probe+0x11a/0x1e0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  pci_device_probe+0xcd/0x140
      Oct 09 14:17:46 lp-sasha kernel:  really_probe+0xd8/0x400
      Oct 09 14:17:46 lp-sasha kernel:  driver_probe_device+0x4a/0xa0
      Oct 09 14:17:46 lp-sasha kernel:  device_driver_attach+0x9c/0xc0
      Oct 09 14:17:46 lp-sasha kernel:  __driver_attach+0x6f/0x100
      Oct 09 14:17:46 lp-sasha kernel:  ? device_driver_attach+0xc0/0xc0
      Oct 09 14:17:46 lp-sasha kernel:  bus_for_each_dev+0x75/0xc0
      Oct 09 14:17:46 lp-sasha kernel:  bus_add_driver+0x106/0x1c0
      Oct 09 14:17:46 lp-sasha kernel:  driver_register+0x86/0xe0
      Oct 09 14:17:46 lp-sasha kernel:  ? 0xffffffffa044e000
      Oct 09 14:17:46 lp-sasha kernel:  do_one_initcall+0x48/0x1e0
      Oct 09 14:17:46 lp-sasha kernel:  ? _cond_resched+0x11/0x60
      Oct 09 14:17:46 lp-sasha kernel:  ? kmem_cache_alloc_trace+0x19c/0x1e0
      Oct 09 14:17:46 lp-sasha kernel:  do_init_module+0x57/0x220
      Oct 09 14:17:46 lp-sasha kernel:  __do_sys_finit_module+0xa0/0xe0
      Oct 09 14:17:46 lp-sasha kernel:  do_syscall_64+0x33/0x40
      Oct 09 14:17:46 lp-sasha kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Oct 09 14:17:46 lp-sasha kernel: RIP: 0033:0x7fd01a060d5d
      Oct 09 14:17:46 lp-sasha kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48
      Oct 09 14:17:46 lp-sasha kernel: RSP: 002b:00007ffc8ad38a98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      Oct 09 14:17:46 lp-sasha kernel: RAX: ffffffffffffffda RBX: 0000563f6e7fd530 RCX: 00007fd01a060d5d
      Oct 09 14:17:46 lp-sasha kernel: RDX: 0000000000000000 RSI: 00007fd01a19f95d RDI: 000000000000000f
      Oct 09 14:17:46 lp-sasha kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000007
      Oct 09 14:17:46 lp-sasha kernel: R10: 000000000000000f R11: 0000000000000246 R12: 00007fd01a19f95d
      Oct 09 14:17:46 lp-sasha kernel: R13: 0000000000000000 R14: 0000563f6e7fbc10 R15: 0000563f6e7fd530
      Oct 09 14:17:46 lp-sasha kernel: Modules linked in: nouveau(+) ttm xt_string xt_mark xt_LOG vgem v4l2_dv_timings uvcvideo ulpi udf ts_kmp ts_fsm ts_bm snd_aloop sil164 qat_dh895xccvf nf_nat_sip nf_nat_irc nf_nat_ftp nf_nat nf_log_ipv6 nf_log_ipv4 nf_log_common ltc2990 lcd intel_qat input_leds i2c_mux gspca_main videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc drivetemp cuse fuse crc_itu_t coretemp ch7006 ath5k ath algif_hash
      Oct 09 14:17:46 lp-sasha kernel: CR2: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: ---[ end trace 0ddafe218ad30017 ]---
      Oct 09 14:17:46 lp-sasha kernel: RIP: 0010:nouveau_connector_detect_depth+0x71/0xc0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel: Code: 0a 00 00 48 8b 49 48 c7 87 b8 00 00 00 06 00 00 00 80 b9 4d 0a 00 00 00 75 1e 83 fa 41 75 05 48 85 c0 75 29 8b 81 10 0d 00 00 <39> 06 7c 25 f6 81 14 0d 00 00 02 75 b7 c3 80 b9 0c 0d 00 00 00 75
      Oct 09 14:17:46 lp-sasha kernel: RSP: 0018:ffffc9000028f8c0 EFLAGS: 00010297
      Oct 09 14:17:46 lp-sasha kernel: RAX: 0000000000014c08 RBX: ffff8880369d4000 RCX: ffff8880369d3000
      Oct 09 14:17:46 lp-sasha kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffff8880369d4000
      Oct 09 14:17:46 lp-sasha kernel: RBP: ffff88800601cc00 R08: ffff8880051da298 R09: ffffffff8226201a
      Oct 09 14:17:46 lp-sasha kernel: R10: ffff88800469aa80 R11: ffff888004c84ff8 R12: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: R13: ffff8880051da000 R14: 0000000000002000 R15: 0000000000000003
      Oct 09 14:17:46 lp-sasha kernel: FS:  00007fd0192b3440(0000) GS:ffff8880bc900000(0000) knlGS:0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Oct 09 14:17:46 lp-sasha kernel: CR2: 0000000000000000 CR3: 0000000004976000 CR4: 00000000000006e0
      
      The disassembly:
      Code: 0a 00 00 48 8b 49 48 c7 87 b8 00 00 00 06 00 00 00 80 b9 4d 0a 00 00 00 75 1e 83 fa 41 75 05 48 85 c0 75 29 8b 81 10 0d 00 00 <39> 06 7c 25 f6 81 14 0d 00 00 02 75 b7 c3 80 b9 0c 0d 00 00 00 75
      All code
      ========
         0:   0a 00                   or     (%rax),%al
         2:   00 48 8b                add    %cl,-0x75(%rax)
         5:   49                      rex.WB
         6:   48 c7 87 b8 00 00 00    movq   $0x6,0xb8(%rdi)
         d:   06 00 00 00
        11:   80 b9 4d 0a 00 00 00    cmpb   $0x0,0xa4d(%rcx)
        18:   75 1e                   jne    0x38
        1a:   83 fa 41                cmp    $0x41,%edx
        1d:   75 05                   jne    0x24
        1f:   48 85 c0                test   %rax,%rax
        22:   75 29                   jne    0x4d
        24:   8b 81 10 0d 00 00       mov    0xd10(%rcx),%eax
        2a:*  39 06                   cmp    %eax,(%rsi)              <-- trapping instruction
        2c:   7c 25                   jl     0x53
        2e:   f6 81 14 0d 00 00 02    testb  $0x2,0xd14(%rcx)
        35:   75 b7                   jne    0xffffffffffffffee
        37:   c3                      retq
        38:   80 b9 0c 0d 00 00 00    cmpb   $0x0,0xd0c(%rcx)
        3f:   75                      .byte 0x75
      
      Code starting with the faulting instruction
      ===========================================
         0:   39 06                   cmp    %eax,(%rsi)
         2:   7c 25                   jl     0x29
         4:   f6 81 14 0d 00 00 02    testb  $0x2,0xd14(%rcx)
         b:   75 b7                   jne    0xffffffffffffffc4
         d:   c3                      retq
         e:   80 b9 0c 0d 00 00 00    cmpb   $0x0,0xd0c(%rcx)
        15:   75                      .byte 0x75
      
      objdump -SF --disassemble=nouveau_connector_detect_depth
      [...]
              if (nv_connector->edid &&
         c85e1:       83 fa 41                cmp    $0x41,%edx
         c85e4:       75 05                   jne    c85eb <nouveau_connector_detect_depth+0x6b> (File Offset: 0xc866b)
         c85e6:       48 85 c0                test   %rax,%rax
         c85e9:       75 29                   jne    c8614 <nouveau_connector_detect_depth+0x94> (File Offset: 0xc8694)
                  nv_connector->type == DCB_CONNECTOR_LVDS_SPWG)
                      duallink = ((u8 *)nv_connector->edid)[121] == 2;
              else
                      duallink = mode->clock >= bios->fp.duallink_transition_clk;
      
              if ((!duallink && (bios->fp.strapless_is_24bit & 1)) ||
         c85eb:       8b 81 10 0d 00 00       mov    0xd10(%rcx),%eax
         c85f1:       39 06                   cmp    %eax,(%rsi)
         c85f3:       7c 25                   jl     c861a <nouveau_connector_detect_depth+0x9a> (File Offset: 0xc869a)
                  ( duallink && (bios->fp.strapless_is_24bit & 2)))
         c85f5:       f6 81 14 0d 00 00 02    testb  $0x2,0xd14(%rcx)
         c85fc:       75 b7                   jne    c85b5 <nouveau_connector_detect_depth+0x35> (File Offset: 0xc8635)
                      connector->display_info.bpc = 8;
      [...]
      
      % scripts/faddr2line /lib/modules/5.9.0-rc8-next-20201009/kernel/drivers/gpu/drm/nouveau/nouveau.ko nouveau_connector_detect_depth+0x71/0xc0
      nouveau_connector_detect_depth+0x71/0xc0:
      nouveau_connector_detect_depth at /home/sasha/linux-next/drivers/gpu/drm/nouveau/nouveau_connector.c:891
      
      It is actually line 889. See the disassembly above.
      889                     duallink = mode->clock >= bios->fp.duallink_transition_clk;
      
      The NULL pointer being dereferenced is mode.
      
      Git bisect has identified the following commit as bad:
      f28e32d3 drm/nouveau/kms: Don't change EDID when it hasn't actually changed
      
      Here is the chain of events that causes the oops.
      On entry to nouveau_connector_detect_lvds, edid is set to NULL.  The call
      to nouveau_connector_detect sets nv_connector->edid to valid memory,
      with status set to connector_status_connected and the flow of execution
      branching to the out label.
      
      The subsequent call to nouveau_connector_set_edid erroneously clears
      nv_connector->edid, via the local edid pointer which remains set to NULL.
      
      Fix this by setting edid to the value of the just acquired
      nv_connector->edid and executing the body of nouveau_connector_set_edid
      only if nv_connector->edid and edid point to different memory addresses
      thus preventing nv_connector->edid from being turned into a dangling
      pointer.
      
      Fixes: f28e32d3 ("drm/nouveau/kms: Don't change EDID when it hasn't actually changed")
      Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      630f5122
    • Linus Torvalds's avatar
      Merge tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f01c30de
      Linus Torvalds authored
      Pull fs freeze fix and cleanups from Darrick Wong:
       "A single vfs fix for 5.10, along with two subsequent cleanups.
      
        A very long time ago, a hack was added to the vfs fs freeze protection
        code to work around lockdep complaints about XFS, which would try to
        run a transaction (which requires intwrite protection) to finalize an
        xfs freeze (by which time the vfs had already taken intwrite).
      
        Fast forward a few years, and XFS fixed the recursive intwrite problem
        on its own, and the hack became unnecessary. Fast forward almost a
        decade, and latent bugs in the code converting this hack from freeze
        flags to freeze locks combine with lockdep bugs to make this reproduce
        frequently enough to notice page faults racing with freeze.
      
        Since the hack is unnecessary and causes thread race errors, just get
        rid of it completely. Making this kind of vfs change midway through a
        cycle makes me nervous, but a large enough number of the usual
        VFS/ext4/XFS/btrfs suspects have said this looks good and solves a
        real problem vector.
      
        And once that removal is done, __sb_start_write is now simple enough
        that it becomes possible to refactor the function into smaller,
        simpler static inline helpers in linux/fs.h. The cleanup is
        straightforward.
      
        Summary:
      
         - Finally remove the "convert to trylock" weirdness in the fs freezer
           code. It was necessary 10 years ago to deal with nested
           transactions in XFS, but we've long since removed that; and now
           this is causing subtle race conditions when lockdep goes offline
           and sb_start_* aren't prepared to retry a trylock failure.
      
         - Minor cleanups of the sb_start_* fs freeze helpers"
      
      * tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        vfs: move __sb_{start,end}_write* to fs.h
        vfs: separate __sb_start_write into blocking and non-blocking helpers
        vfs: remove lockdep bogosity in __sb_start_write
      f01c30de
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.10-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · d9315f56
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
      
       - Fix a fairly serious problem where the reverse mapping btree key
         comparison functions were silently ignoring parts of the keyspace
         when doing comparisons
      
       - Fix a thinko in the online refcount scrubber
      
       - Fix a missing unlock in the pnfs code
      
      * tag 'xfs-5.10-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix a missing unlock on error in xfs_fs_map_blocks
        xfs: fix brainos in the refcount scrubber's rmap fragment processor
        xfs: fix rmap key and record comparison functions
        xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents
        xfs: fix flags argument to rmap lookup when converting shared file rmaps
      d9315f56
  4. 13 Nov, 2020 1 commit