1. 19 Jul, 2019 40 commits
    • Merge tag 'csky-for-linus-5.3-rc1' of git://github.com/c-sky/csky-linux · a84d2d29
      Linus Torvalds authored
      Pull arch/csky updates from Guo Ren:
       "This round of csky subsystem gives two features (ASID algorithm
        update, Perf pmu record support) and some fixups.
      
        ASID updates:
         - Revert mmu ASID mechanism
         - Add new asid lib code from arm
         - Use generic asid algorithm to implement switch_mm
         - Improve tlb operation with help of asid
      
        Perf pmu record support:
         - Init pmu as a device
         - Add count-width property for csky pmu
         - Add pmu interrupt support
         - Fix perf record in kernel/user space
         - dt-bindings: Add csky PMU bindings
      
        Fixes:
         - Fixup no panic in kernel for some traps
         - Fixup some error count in 810 & 860.
         - Fixup abiv1 memset error"
      
      * tag 'csky-for-linus-5.3-rc1' of git://github.com/c-sky/csky-linux:
        csky: Fixup abiv1 memset error
        csky: Improve tlb operation with help of asid
        csky: Use generic asid algorithm to implement switch_mm
        csky: Add new asid lib code from arm
        csky: Revert mmu ASID mechanism
        dt-bindings: csky: Add csky PMU bindings
        dt-bindings: interrupt-controller: Update csky mpintc
        csky: Fixup some error count in 810 & 860.
        csky: Fix perf record in kernel/user space
        csky: Add pmu interrupt support
        csky: Add count-width property for csky pmu
        csky: Init pmu as a device
        csky: Fixup no panic in kernel for some traps
        csky: Select intc & timer drivers
    • Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · b5d72dda
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
       "Fixes and features:
      
         - A series to introduce a common command line parameter for disabling
           paravirtual extensions when running as a guest in virtualized
           environment
      
         - A fix for int3 handling in Xen pv guests
      
         - Removal of the Xen-specific tmem driver as support of tmem in Xen
           has been dropped (and it was experimental only)
      
         - A security fix for running as Xen dom0 (XSA-300)
      
         - A fix for IRQ handling when offlining cpus in Xen guests
      
         - Some small cleanups"
      
      * tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: let alloc_xenballooned_pages() fail if not enough memory free
        xen/pv: Fix a boot up hang revealed by int3 self test
        x86/xen: Add "nopv" support for HVM guest
        x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
        xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete
        x86: Add "nopv" parameter to disable PV extensions
        x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
        xen: remove tmem driver
        Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"
        xen/events: fix binding user event channels to cpus
    • Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 26473f83
      Linus Torvalds authored
      Pull iomap split/cleanup from Darrick Wong:
       "As promised, here's the second part of the iomap merge for 5.3, in
        which we break up iomap.c into smaller files grouped by functional
        area so that it'll be easier in the long run to maintain cohesiveness
        of code units and to review incoming patches. There are no functional
        changes and fs/iomap.c split cleanly.
      
        Summary:
      
         - Regroup the fs/iomap.c code by major functional area so that we can
           start development for 5.4 from a more stable base"
      
      * tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: move internal declarations into fs/iomap/
        iomap: move the main iteration code into a separate file
        iomap: move the buffered IO code into a separate file
        iomap: move the direct IO code into a separate file
        iomap: move the SEEK_HOLE code into a separate file
        iomap: move the file mapping reporting code into a separate file
        iomap: move the swapfile code into a separate file
        iomap: start moving code to fs/iomap/
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 4f5ed131
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted stuff"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        perf_event_get(): don't bother with fget_raw()
        vfs: update d_make_root() description
    • Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d2fbf4b6
      Linus Torvalds authored
      Pull adfs updates from Al Viro:
       "More ADFS patches from Russell King"
      
      * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/adfs: add time stamp and file type helpers
        fs/adfs: super: limit idlen according to directory type
        fs/adfs: super: fix use-after-free bug
        fs/adfs: super: safely update options on remount
        fs/adfs: super: correct superblock flags
        fs/adfs: clean up indirect disc addresses and fragment IDs
        fs/adfs: clean up error message printing
        fs/adfs: use %pV for error messages
        fs/adfs: use format_version from disc_record
        fs/adfs: add helper to get filesystem size
        fs/adfs: add helper to get discrecord from map
        fs/adfs: correct disc record structure
    • Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 933a90bf
      Linus Torvalds authored
      Pull vfs mount updates from Al Viro:
       "The first part of mount updates.
      
        Convert filesystems to use the new mount API"
      
      * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
        mnt_init(): call shmem_init() unconditionally
        constify ksys_mount() string arguments
        don't bother with registering rootfs
        init_rootfs(): don't bother with init_ramfs_fs()
        vfs: Convert smackfs to use the new mount API
        vfs: Convert selinuxfs to use the new mount API
        vfs: Convert securityfs to use the new mount API
        vfs: Convert apparmorfs to use the new mount API
        vfs: Convert openpromfs to use the new mount API
        vfs: Convert xenfs to use the new mount API
        vfs: Convert gadgetfs to use the new mount API
        vfs: Convert oprofilefs to use the new mount API
        vfs: Convert ibmasmfs to use the new mount API
        vfs: Convert qib_fs/ipathfs to use the new mount API
        vfs: Convert efivarfs to use the new mount API
        vfs: Convert configfs to use the new mount API
        vfs: Convert binfmt_misc to use the new mount API
        convenience helper: get_tree_single()
        convenience helper get_tree_nodev()
        vfs: Kill sget_userns()
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5f4fc6d4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix AF_XDP cq entry leak, from Ilya Maximets.
      
       2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.
      
       3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.
      
       4) Fix handling of neigh timers wrt. entries added by userspace, from
          Lorenzo Bianconi.
      
       5) Various cases of missing of_node_put(), from Nishka Dasgupta.
      
       6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.
      
       7) Various RDS layer fixes, from Gerd Rausch.
      
       8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
          Cong Wang.
      
       9) Fix FIB source validation checks over loopback, also from Cong Wang.
      
      10) Use promisc for unsupported number of filters, from Justin Chen.
      
      11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
        tcp: fix tcp_set_congestion_control() use from bpf hook
        ag71xx: fix return value check in ag71xx_probe()
        ag71xx: fix error return code in ag71xx_probe()
        usb: qmi_wwan: add D-Link DWM-222 A2 device ID
        bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
        net: dsa: sja1105: Fix missing unlock on error in sk_buff()
        gve: replace kfree with kvfree
        selftests/bpf: fix test_xdp_noinline on s390
        selftests/bpf: fix "valid read map access into a read-only array 1" on s390
        net/mlx5: Replace kfree with kvfree
        MAINTAINERS: update netsec driver
        ipv6: Unlink sibling route in case of failure
        liquidio: Replace vmalloc + memset with vzalloc
        udp: Fix typo in net/ipv4/udp.c
        net: bcmgenet: use promisc for unsupported filters
        ipv6: rt6_check should return NULL if 'from' is NULL
        tipc: initialize 'validated' field of received packets
        selftests: add a test case for rp_filter
        fib: relax source validation check for loopback packets
        mlxsw: spectrum: Do not process learned records with a dummy FID
        ...
    • Merge branch 'akpm' (patches from Andrew) · 249be851
      Linus Torvalds authored
      Merge yet more updates from Andrew Morton:
       "The rest of MM and a kernel-wide procfs cleanup.
      
        Summary of the more significant patches:
      
         - Patch series "mm/memory_hotplug: Factor out memory block
           device handling", v3. David Hildenbrand.
      
           Some spring-cleaning of the memory hotplug code, notably in
           drivers/base/memory.c
      
         - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
           Shi.
      
           Fix /proc/pid/smaps output for THP pages used in shmem.
      
         - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.
      
           Bugfix and speedup for kernel/resource.c
      
         - Patch series "mm: Further memory block device cleanups", David
           Hildenbrand.
      
           More spring-cleaning of the memory hotplug code.
      
         - Patch series "mm: Sub-section memory hotplug support". Dan
           Williams.
      
           Generalise the memory hotplug code so that pmem can use it more
           completely. Then remove the hacks from the libnvdimm code which
           were there to work around the memory-hotplug code's constraints.
      
         - "proc/sysctl: add shared variables for range check", Matteo Croce.
      
           We have about 250 instances of
      
                int zero;
                ...
                        .extra1 = &zero,
      
           in the tree. This is a tree-wide sweep to make all those private
           "zero"s and "one"s use global variables.
      
           Alas, it isn't practical to make those two global integers const"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
        proc/sysctl: add shared variables for range check
        mm: migrate: remove unused mode argument
        mm/sparsemem: cleanup 'section number' data types
        libnvdimm/pfn: stop padding pmem namespaces to section alignment
        libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
        mm/devm_memremap_pages: enable sub-section remap
        mm: document ZONE_DEVICE memory-model implications
        mm/sparsemem: support sub-section hotplug
        mm/sparsemem: prepare for sub-section ranges
        mm: kill is_dev_zone() helper
        mm/hotplug: kill is_dev_zone() usage in __remove_pages()
        mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
        mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
        mm/sparsemem: add helpers track active portions of a section at boot
        mm/sparsemem: introduce a SECTION_IS_EARLY flag
        mm/sparsemem: introduce struct mem_section_usage
        drivers/base/memory.c: get rid of find_memory_block_hinted()
        mm/memory_hotplug: move and simplify walk_memory_blocks()
        mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
        mm: make register_mem_sect_under_node() static
        ...
    • csky: Fixup abiv1 memset error · bdfeb0cc
      Guo Ren authored
      The current memset implementation in abiv1 is wrong and causes unaligned
      accesses. Remove it and use the generic one. This patch will cause a
      performance degradation, which we will address with a new design in the
      next patchset.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
    • csky: Improve tlb operation with help of asid · 4e562c11
      Guo Ren authored
      There are two generations of tlb operation instructions for C-SKY.
      The first generation uses the mcr register and needs software to do
      more work; the second generation uses specific instructions, e.g.:
       tlbi.va, tlbi.vas, tlbi.alls

      We implemented the following functions:

       - flush_tlb_range (a range of entries)
       - flush_tlb_page (one entry)

       The functions above use the asid from vma->mm to invalidate tlb
       entries, and can use the tlbi.vas instruction on the newest
       generation of csky cpus.

       - flush_tlb_kernel_range
       - flush_tlb_one

       The functions above ignore the asid and invalidate tlb entries by
       vpn only, and can use the tlbi.vaas instruction on the newest
       generation of csky cpus.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
    • csky: Use generic asid algorithm to implement switch_mm · 22d55f02
      Guo Ren authored
      Use the generic Linux asid/vmid algorithm to implement the csky
      switch_mm function. The algorithm comes from arm and works on
      SMP systems. It helps reduce tlb flushes in switch_mm during
      task/vm switches.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
    • csky: Add new asid lib code from arm · a231b883
      Guo Ren authored
      This patch only contains the asid helper code from arm, for the next
      patch to use.

      The asid allocator uses a five-level check to reduce the cost of
      switch_mm.

       1. Check if the asid version is the same (the common case)
       2. Check reserved_asid, which is set in the rollover flush_context().
          The key point is to keep the same bit position under the current
          asid version instead of the input version.
       3. Check if the bitmap position is free; if so, it can be set and
          used directly.
       4. find_next_zero_bit() (a small performance cost)
       5. flush_context() (the most expensive case; it increments the
          current asid version)

      The checks run level by level, and each level costs more than the one
      before. The reserved_asid and bitmap mechanisms avoid unnecessary
      find_next_zero_bit() calls.
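
The five levels can be sketched in user-space C as below. This is a simplified model under loud assumptions, not the arm or csky code: it models a single CPU, uses a plain bool array instead of the kernel bitmap, has no atomics or locking, and every name (struct asid_alloc, asid_switch, and so on) is made up for illustration. ASID_BITS is kept tiny so that rollover is easy to trigger.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ASID_BITS 2                       /* tiny on purpose, to force rollovers */
#define NUM_ASIDS (1u << ASID_BITS)
#define ASID_MASK ((uint64_t)NUM_ASIDS - 1)

struct asid_alloc {
    uint64_t generation;   /* current version, stored above the ASID bits */
    uint64_t active;       /* value running on the single modelled CPU */
    uint64_t reserved;     /* value carried across the last rollover */
    bool used[NUM_ASIDS];  /* stand-in for the bitmap; index 0 is never handed out */
    unsigned rollovers;    /* how often the most expensive path ran */
};

void asid_init(struct asid_alloc *a)
{
    memset(a, 0, sizeof(*a));
    a->generation = NUM_ASIDS;            /* first version */
}

/* Level 4: linear scan for a free index, skipping index 0. */
int asid_find_free(const struct asid_alloc *a)
{
    for (unsigned i = 1; i < NUM_ASIDS; i++)
        if (!a->used[i])
            return (int)i;
    return -1;
}

/* Level 5: bump the version and forget every index except the running one. */
void asid_rollover(struct asid_alloc *a)
{
    a->generation += NUM_ASIDS;
    memset(a->used, 0, sizeof(a->used));
    if (a->active)
        a->reserved = a->active;
    a->active = 0;
    if (a->reserved)
        a->used[a->reserved & ASID_MASK] = true;
    a->rollovers++;
}

/* Returns the value the caller should store for its address space (0 = none yet). */
uint64_t asid_switch(struct asid_alloc *a, uint64_t asid)
{
    uint64_t newasid = a->generation | (asid & ASID_MASK);
    int idx;

    if (asid != 0 && (asid & ~ASID_MASK) == a->generation) {
        newasid = asid;                           /* level 1: same version */
    } else if (asid != 0 && asid == a->reserved) {
        a->reserved = newasid;                    /* level 2: reserved at rollover */
    } else if (asid != 0 && !a->used[asid & ASID_MASK]) {
        a->used[asid & ASID_MASK] = true;         /* level 3: old index still free */
    } else {
        idx = asid_find_free(a);                  /* level 4: search for a free index */
        if (idx < 0) {
            asid_rollover(a);                     /* level 5: new version */
            idx = asid_find_free(a);
        }
        a->used[idx] = true;
        newasid = a->generation | (uint64_t)idx;
    }
    a->active = newasid;
    return newasid;
}
```

With three usable indices, a fourth address space forces a rollover; the address space that was running at that moment keeps its index (level 2), and another old address space whose index is still free gets it back (level 3).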
      
      The atomic 64-bit asid is also suitable for 32-bit systems, and it
      does not cost much in the first, second, and third level checks.

      The set/clear mm_cpumask operations were removed in arm64 compared to
      arm32. This seems to have no side effect on current arm64 systems, but
      semantically it is wrong. Although csky does not need them either, we
      add them back for csky.

      The asid_per_ctxt is of no use for csky; it reserves the lowest bits for
      other uses, maybe: trust zone? OK, just keep it in the csky copy.

      It seems this could also be used by other archs, and it would be worth
      moving the asid code to generic code in the future.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Julien Grall <julien.grall@arm.com>
    • csky: Revert mmu ASID mechanism · 9d35dc30
      Guo Ren authored
      The current C-SKY ASID mechanism comes from mips and does not work well
      with multiple cores. A per-core ASID mechanism is not suitable for C-SKY
      SMP tlb maintenance operations, e.g. tlbi.vas needs the same asid to be
      shared across all processors, and it invalidates the tlb entries with
      that asid on all cores.

      This patch prepares for the new ASID mechanism.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
    • dt-bindings: csky: Add csky PMU bindings · 4d581034
      Mao Han authored
      This patch adds documentation describing how to add a pmu node to the
      dts.
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
      Cc: Rob Herring <robh+dt@kernel.org>
    • dt-bindings: interrupt-controller: Update csky mpintc · 69d812f5
      Guo Ren authored
      Add trigger type setting for csky,mpintc. The driver can also
      support #interrupt-cells <1>, so this does not invalidate existing
      DTs. Here we only show the complete format.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Reviewed-by: Rob Herring <robh+dt@kernel.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
    • csky: Fixup some error count in 810 & 860. · e7534198
      Guo Ren authored
      The CK810 pmu only supports events with index 0-8 and 0xd; the CK860
      only supports events 1~4 and 0xa~0x1b. So do not register unsupported
      events as hardware cache events, which may lead to unknown behavior.
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
    • csky: Fix perf record in kernel/user space · d41435d9
      Mao Han authored
      csky_pmu_event_init is called several times during perf record
      initialization. After the event counter is configured for either kernel
      space or user space, csky_pmu_event_init is called twice with no
      attr specified, so the configuration is overwritten with sampling in
      both kernel space and user space. --all-kernel/--all-user is
      useless without this patch applied.
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • csky: Add pmu interrupt support · f622fbf2
      Mao Han authored
      This patch adds the interrupt request and handler for the csky pmu.
      perf can record on hardware events with this patch applied.
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • csky: Add count-width property for csky pmu · ccffa1ad
      Mao Han authored
      The csky pmu counters may have different io widths. When a counter is
      narrower than 64 bits and the counter value is smaller than the old
      value, the result is an extremely large delta. To avoid this, the
      sampled value should be extended to 64 bits, with the number of
      extension bits based on the count-width property from the dts.
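
The effect of extending a narrow counter to 64 bits can be illustrated with a small user-space sketch. It assumes a 48-bit counter purely as an example, and the helpers here (sign_extend64 as a local copy, counter_delta, naive_delta) are illustrative stand-ins rather than the driver's code. The sketch relies on two's-complement conversion and arithmetic right shift of negative values, which mainstream compilers provide.

```c
#include <stdint.h>

/* Treat bit `index` of value as the sign bit and extend it to 64 bits. */
int64_t sign_extend64(uint64_t value, int index)
{
    int shift = 63 - index;

    /* Move the sign bit to bit 63, then shift back arithmetically. */
    return (int64_t)(value << shift) >> shift;
}

/* Delta between two raw samples of a counter that is count_width bits wide. */
int64_t counter_delta(uint64_t prev_raw, uint64_t new_raw, int count_width)
{
    int64_t prev = sign_extend64(prev_raw, count_width - 1);
    int64_t cur = sign_extend64(new_raw, count_width - 1);

    return cur - prev;
}

/* What happens without extension: the subtraction wraps in 64 bits. */
uint64_t naive_delta(uint64_t prev_raw, uint64_t new_raw)
{
    return new_raw - prev_raw;
}
```

For a 48-bit counter that moves from 0xFFFFFFFFFFF0 to 0x10, counter_delta() yields 32, while naive_delta() yields a value close to 2^64, which is the "extremely large delta" described above.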
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • csky: Init pmu as a device · f132076c
      Mao Han authored
      This patch changes the csky pmu initialization from arch init to
      device init. The pmu can be configured with information from the
      device tree (pmu device name, irq number, etc.).
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • csky: Fixup no panic in kernel for some traps · 3158d289
      Guo Ren authored
      These traps should never happen in the kernel; if they do, we must
      panic rather than send a signal to userspace.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
    • csky: Select intc & timer drivers · 1994cc49
      Guo Ren authored
      Let the arch select the interrupt controller and timer drivers
      instead of having people select them in menuconfig. This helps a
      minimal system boot up.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
    • tcp: fix tcp_set_congestion_control() use from bpf hook · 8d650cde
      Eric Dumazet authored
      Neal reported incorrect use of ns_capable() from bpf hook.
      
      bpf_setsockopt(...TCP_CONGESTION...)
        -> tcp_set_congestion_control()
         -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
          -> ns_capable_common()
           -> current_cred()
            -> rcu_dereference_protected(current->cred, 1)
      
      Accessing 'current' in bpf context makes no sense, since packets
      are processed from softirq context.
      
      As Neal stated : The capability check in tcp_set_congestion_control()
      was written assuming a system call context, and then was reused from
      a BPF call site.
      
      The fix is to add a new parameter to tcp_set_congestion_control(),
      so that the ns_capable() call is only performed under the right
      context.
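
The shape of that fix can be sketched in user-space C: the capability decision is made by the caller and passed in, so the shared function never consults the current task. All names below are hypothetical stand-ins, and the choice of passing true from the bpf path is an assumption made for this sketch, not something the message above states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a congestion control algorithm entry. */
struct demo_ca {
    const char *name;
    bool restricted;        /* only privileged callers may select it */
};

/* Shared helper: the privilege decision arrives as a parameter. */
int demo_set_congestion_control(const struct demo_ca *ca, bool cap_net_admin)
{
    if (ca->restricted && !cap_net_admin)
        return -EPERM;
    return 0;
}

/* System call path: the caller has a meaningful current task to check. */
int demo_setsockopt_path(const struct demo_ca *ca, bool caller_is_admin)
{
    return demo_set_congestion_control(ca, caller_is_admin);
}

/* bpf path: no meaningful current task, so a fixed decision is passed in
 * (assumed to be "allowed" in this sketch). */
int demo_bpf_path(const struct demo_ca *ca)
{
    return demo_set_congestion_control(ca, true);
}
```

The design point is that the check moves to the call site that knows its own context, instead of the callee guessing it.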
      
      Fixes: 91b5b21c ("bpf: Add support for changing congestion control")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ag71xx: fix return value check in ag71xx_probe() · 269b7c5f
      Wei Yongjun authored
      In case of error, the function of_get_mac_address() returns ERR_PTR()
      and never returns NULL. The NULL test in the return value check should
      be replaced with IS_ERR().
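
Why a NULL test misses an error pointer can be shown with a user-space imitation of the kernel's error-pointer convention. The lowercase helpers below (err_ptr, ptr_err, is_err) are local stand-ins written for this sketch, and lookup_mac() is a hypothetical function, not of_get_mac_address().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static const unsigned char demo_mac[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

/* Hypothetical lookup: returns an error pointer on failure, never NULL. */
const void *lookup_mac(bool present)
{
    if (!present)
        return err_ptr(-ENODEV);
    return demo_mac;
}

/* Buggy check: an error pointer is not NULL, so failure looks like success. */
bool mac_ok_null_check(const void *mac)
{
    return mac != NULL;
}

/* Correct check for a function that reports failure via error pointers. */
bool mac_ok_is_err_check(const void *mac)
{
    return !is_err(mac);
}
```

With lookup_mac(false), the NULL-based check reports success and the caller would go on to dereference an invalid pointer, while the is_err()-based check reports failure.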
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ag71xx: fix error return code in ag71xx_probe() · 6f5fa8d2
      Wei Yongjun authored
      Fix to return error code -ENOMEM from the dmam_alloc_coherent() error
      handling case instead of 0, as done elsewhere in this function.
      
      Fixes: d51b6ce4 ("net: ethernet: add ag71xx driver")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • proc/sysctl: add shared variables for range check · eec4844f
      Matteo Croce authored
      In the sysctl code the proc_dointvec_minmax() function is often used to
      validate the user supplied value against an allowed range.  This
      function uses the extra1 and extra2 members from struct ctl_table as
      minimum and maximum allowed value.

      On sysctl handler declaration, in every source file there are some
      read-only variables containing just an integer whose address is assigned
      to the extra1 and extra2 members, so the sysctl range is enforced.

      The special values 0, 1 and INT_MAX are very often used as range
      boundaries, leading to duplication of variables like zero=0, one=1,
      int_max=INT_MAX in different source files:
      
          $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
          248
      
      Add a const int array containing the most commonly used values, some
      macros to refer more easily to the correct array member, and use them
      instead of creating a local one for every object file.
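
A minimal user-space sketch of the idea follows. The array and macro names are given as I recall them from the kernel tree and should be treated as an assumption here; struct demo_ctl and demo_set_minmax() are hypothetical stand-ins for struct ctl_table and proc_dointvec_minmax(), reduced to the range check.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* One shared, read-only table of common bounds instead of per-file copies. */
const int sysctl_vals[] = { 0, 1, INT_MAX };

/* Macros to refer to the right array member (names assumed, see above). */
#define SYSCTL_ZERO    ((void *)&sysctl_vals[0])
#define SYSCTL_ONE     ((void *)&sysctl_vals[1])
#define SYSCTL_INT_MAX ((void *)&sysctl_vals[2])

/* Hypothetical, reduced stand-in for struct ctl_table. */
struct demo_ctl {
    int *data;      /* the tunable itself */
    void *extra1;   /* optional pointer to the minimum */
    void *extra2;   /* optional pointer to the maximum */
};

/* Store value only if it lies within [*extra1, *extra2]. */
int demo_set_minmax(const struct demo_ctl *t, int value)
{
    const int *min = t->extra1;
    const int *max = t->extra2;

    if ((min && value < *min) || (max && value > *max))
        return -EINVAL;
    *t->data = value;
    return 0;
}
```

A boolean-style tunable would then be declared with extra1 = SYSCTL_ZERO and extra2 = SYSCTL_ONE, with no private zero or one variable in the declaring file.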
      
      This is the bloat-o-meter output comparing the old and new binary
      compiled with the default Fedora config:
      
          # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
          add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
          Data                                         old     new   delta
          sysctl_vals                                    -      12     +12
          __kstrtab_sysctl_vals                          -      12     +12
          max                                           14      10      -4
          int_max                                       16       -     -16
          one                                           68       -     -68
          zero                                         128      28    -100
          Total: Before=20583249, After=20583085, chg -0.00%
      
      [mcroce@redhat.com: tipc: remove two unused variables]
        Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
      [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
      [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
        Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
      [akpm@linux-foundation.org: fix fs/eventpoll.c]
      Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migrate: remove unused mode argument · 37109694
      Keith Busch authored
      migrate_page_move_mapping() doesn't use the mode argument.  Remove it
      and update callers accordingly.
      
      Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.com
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/sparsemem: cleanup 'section number' data types · 9a845030
      Dan Williams authored
      David points out that there is a mixture of 'int' and 'unsigned long'
      usage for section number data types.  Update the memory hotplug path to
      use 'unsigned long' consistently for section numbers.
      
      [akpm@linux-foundation.org: fix printk format]
      Link: http://lkml.kernel.org/r/156107543656.1329419.11505835211949439815.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • libnvdimm/pfn: stop padding pmem namespaces to section alignment · a3619190
      Dan Williams authored
      Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
      memory, we no longer need to add padding at pfn/dax device creation
      time.  The kernel will still honor padding established by older kernels.
      
      Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields · 7e3e888d
      Dan Williams authored
      At namespace creation time there is the potential for the "expected to
      be zero" fields of a 'pfn' info-block to be filled with indeterminate
      data.  While the kernel buffer is zeroed on allocation it is immediately
      overwritten by nd_pfn_validate() filling it with the current contents of
      the on-media info-block location.  For fields like 'flags' and the
      'padding' it potentially means that future implementations cannot rely on
      those fields being zero.
      
      In preparation to stop using the 'start_pad' and 'end_trunc' fields for
      section alignment, arrange for fields that are not explicitly
      initialized to be guaranteed zero.  Bump the minor version to indicate
      it is safe to assume the 'padding' and 'flags' are zero.  Otherwise,
      this corruption is expected to be benign since all other critical fields
      are explicitly initialized.
      
      Note: the cc: stable is about spreading this new policy to as many
      kernels as possible, not fixing an issue in those kernels.  It is not
      until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
      section alignment" where this improper initialization becomes a problem.
      So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
      namespaces to section alignment" (which is not tagged for stable), make
      sure this pre-requisite is flagged.
      
      Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 32ab0a3f ("libnvdimm, pmem: 'struct page' for pmem")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: <stable@vger.kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e3e888d
    • Dan Williams's avatar
      mm/devm_memremap_pages: enable sub-section remap · 7cc7867f
      Dan Williams authored
      Teach devm_memremap_pages() about the new sub-section capabilities of
      arch_{add,remove}_memory().  Effectively, just replace all usage of
      align_start, align_end, and align_size with res->start, res->end, and
      resource_size(res).  The existing sanity check will still make sure that
      the two separate remap attempts do not collide within a sub-section (2MB
      on x86).
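
      One way to picture the constraint is the sketch below, which asks
      whether two physical ranges touch a common 2MB sub-section.  It is
      an illustration only: ranges_share_subsection() is a hypothetical
      helper, not the sanity check devm_memremap_pages() performs, and
      the 2MB size assumes x86.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sub-section size: 2MB, as on x86. */
#define SUBSECTION_SIZE (UINT64_C(1) << 21)

/* Round a physical address down to the start of its sub-section. */
static uint64_t subsection_floor(uint64_t addr)
{
    return addr & ~(SUBSECTION_SIZE - 1);
}

/*
 * Ranges use inclusive end addresses, like struct resource.  Two ranges
 * collide when any sub-section contains bytes of both.
 */
static bool ranges_share_subsection(uint64_t a_start, uint64_t a_end,
                                    uint64_t b_start, uint64_t b_end)
{
    return subsection_floor(a_start) <= subsection_floor(b_end) &&
           subsection_floor(b_start) <= subsection_floor(a_end);
}
```

      With these definitions, [0, 3MB) and [3MB, 5MB) collide because
      both touch the sub-section at 2MB, while [0, 2MB) and [2MB, 4MB)
      do not.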
      
      Link: http://lkml.kernel.org/r/156092355542.979959.10060071713397030576.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7cc7867f
    • Dan Williams's avatar
      mm: document ZONE_DEVICE memory-model implications · a0653406
      Dan Williams authored
      Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
      of 'devm_memremap_pages()'.
      
      [dan.j.williams@intel.com: update ZONE_DEVICE memory model documentation]
        Link: http://lkml.kernel.org/r/156109575458.1409767.1885676287099277666.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0653406
    • Dan Williams's avatar
      mm/sparsemem: support sub-section hotplug · ba72b4c8
      Dan Williams authored
      The libnvdimm sub-system has suffered a series of hacks and broken
      workarounds for the memory-hotplug implementation's awkward
      section-aligned (128MB) granularity.
      
      For example the following backtrace is emitted when attempting
      arch_add_memory() with physical address ranges that intersect 'System
      RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:
      
          # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
          100000000-1ffffffff : System RAM
          200000000-303ffffff : Persistent Memory (legacy)
          304000000-43fffffff : System RAM
          440000000-23ffffffff : Persistent Memory
          2400000000-43bfffffff : Persistent Memory
            2400000000-43bfffffff : namespace2.0
      
          WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
          [..]
          RIP: 0010:add_pages+0x5c/0x60
          [..]
          Call Trace:
           devm_memremap_pages+0x460/0x6e0
           pmem_attach_disk+0x29e/0x680 [nd_pmem]
           ? nd_dax_probe+0xfc/0x120 [libnvdimm]
           nvdimm_bus_probe+0x66/0x160 [libnvdimm]
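
      Taking the addresses from the /proc/iomem listing above and
      assuming 128MB sections and 2MB sub-sections (the x86-64 values),
      the short sketch below shows that the end of the legacy PMEM
      range and the start of the following System RAM range land in
      the same section.  The helper names are illustrative only.

```c
#include <stdint.h>

/* Assumed x86-64 granularities: 128MB sections, 2MB sub-sections. */
#define SECTION_SIZE_BITS 27
#define SUBSECTION_SHIFT  21

/* Index of the memory section containing a physical address. */
static uint64_t section_nr(uint64_t phys)
{
    return phys >> SECTION_SIZE_BITS;
}

/* Index of the sub-section containing a physical address. */
static uint64_t subsection_nr(uint64_t phys)
{
    return phys >> SUBSECTION_SHIFT;
}

/* True when the last byte of one range and the first byte of the next
 * fall within the same section. */
static int share_section(uint64_t prev_end, uint64_t next_start)
{
    return section_nr(prev_end) == section_nr(next_start);
}
```

      Here share_section(0x303ffffff, 0x304000000) is true (both are in
      section 96), while subsection_nr() differs for the two addresses
      because 0x304000000 is 2MB aligned.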
      
      It was discovered that the problem goes beyond RAM vs PMEM collisions as
      some platforms produce PMEM vs PMEM collisions within a given section.
      The libnvdimm workaround for that case revealed that the libnvdimm
      section-alignment-padding implementation has been broken for a long
      while.
      
      A fix for that long-standing breakage introduces as many problems as it
      solves as it would require a backward-incompatible change to the
      namespace metadata interpretation.  Instead of that dubious route [1],
      address the root problem in the memory-hotplug implementation.
      
      Note that EEXIST is no longer treated as success, since that is how
      sparse_add_section() reports subsection collisions.  Treating it as
      success was also obviated by recent changes to perform the
      request_region() for 'System RAM' before arch_add_memory() in the
      add_memory() sequence.
      
      [1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      
      [osalvador@suse.de: fix deactivate_section for early sections]
        Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
      Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba72b4c8
    • Dan Williams's avatar
      mm/sparsemem: prepare for sub-section ranges · 7ea62160
      Dan Williams authored
      Prepare the memory hot-{add,remove} paths for handling sub-section
      ranges by plumbing the starting page frame and number of pages being
      handled through arch_{add,remove}_memory() to
      sparse_{add,remove}_one_section().
      
      This is simply plumbing, small cleanups, and some identifier renames.
      No intended functional changes.
      
      Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ea62160
    • Dan Williams's avatar
      mm: kill is_dev_zone() helper · 46d945ae
      Dan Williams authored
      Given there are no more usages of is_dev_zone() outside of 'ifdef
      CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.
      
      Link: http://lkml.kernel.org/r/156092353211.979959.1489004866360828964.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      46d945ae
    • Dan Williams's avatar
      mm/hotplug: kill is_dev_zone() usage in __remove_pages() · 96da4350
      Dan Williams authored
      The zone type check was a leftover from the cleanup that plumbed altmap
      through the memory hotplug path, i.e.  commit da024512 "mm: pass the
      vmem_altmap to arch_remove_memory and __remove_pages".
      
      Link: http://lkml.kernel.org/r/156092352642.979959.6664333788149363039.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96da4350
    • Dan Williams's avatar
      mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() · e9c0a3f0
      Dan Williams authored
      Allow sub-section sized ranges to be added to the memmap.
      
      populate_section_memmap() takes an explicit pfn range rather than
      assuming a full section, and those parameters are plumbed all the way
      through to vmemmap_populate().  There should be no sub-section usage in
      current deployments.  New warnings are added to clarify which memmap
      allocation paths are sub-section capable.
      
      Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9c0a3f0
    • Dan Williams's avatar
      mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal · 49ba3c6b
      Dan Williams authored
      Sub-section hotplug support reduces the unit of operation of hotplug
      from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
      (PAGES_PER_SUBSECTION).  Teach shrink_{zone,pgdat}_span() to consider
      PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
      valid_section(), can toggle.
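
      The change in stride can be sketched in user space as follows.
      The constants assume x86-64 (4KB pages, 128MB sections, 2MB
      sub-sections); find_smallest_valid_pfn() and example_pfn_valid()
      are illustrative stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 values: 4KB pages, 128MB sections, 2MB sub-sections. */
#define PAGE_SHIFT            12
#define SECTION_SIZE_BITS     27
#define SUBSECTION_SHIFT      21
#define PAGES_PER_SECTION     (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
#define PAGES_PER_SUBSECTION  (1UL << (SUBSECTION_SHIFT - PAGE_SHIFT))

typedef bool (*pfn_valid_fn)(unsigned long pfn);

/* Stand-in predicate: only the fourth sub-section (pfns 1536..2047) is valid. */
static bool example_pfn_valid(unsigned long pfn)
{
    return pfn >= 3 * PAGES_PER_SUBSECTION && pfn < 4 * PAGES_PER_SUBSECTION;
}

/*
 * Return the lowest sub-section-aligned pfn in [start_pfn, end_pfn) that the
 * predicate reports as valid, or 0 if there is none.  Validity is sampled
 * once per sub-section because that is where it can toggle.
 */
static unsigned long find_smallest_valid_pfn(unsigned long start_pfn,
                                             unsigned long end_pfn,
                                             pfn_valid_fn valid)
{
    assert(start_pfn % PAGES_PER_SUBSECTION == 0);
    for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SUBSECTION)
        if (valid(start_pfn))
            return start_pfn;
    return 0;
}
```

      A walk that stepped by PAGES_PER_SECTION over the same range would
      sample only pfn 0 and miss the populated sub-section entirely.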
      
      [osalvador@suse.de: fix shrink_{zone,node}_span]
        Link: http://lkml.kernel.org/r/20190717090725.23618-3-osalvador@suse.de
      Link: http://lkml.kernel.org/r/156092351496.979959.12703722803097017492.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49ba3c6b
    • Dan Williams's avatar
      mm/sparsemem: add helpers track active portions of a section at boot · f46edbd1
      Dan Williams authored
      Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
      sub-section active bitmask, each bit representing a PMD_SIZE span of the
      architecture's memory hotplug section size.
      
      The implication of a partially populated section is that pfn_valid()
      needs to go beyond a valid_section() check and either determine that the
      section is an "early section", or read the sub-section active ranges
      from the bitmask.  The expectation is that the bitmask (subsection_map)
      fits in the same cacheline as the valid_section() / early_section()
      data, so the incremental performance overhead to pfn_valid() should be
      negligible.
      
      The rationale for using early_section() to short-circuit the
      subsection_map check is that there are legacy code paths that use
      pfn_valid() at section granularity before validating the pfn against
      pgdat data.  So, the early_section() check allows those traditional
      assumptions to persist while also permitting subsection_map to tell the
      truth for purposes of populating the unused portions of early sections
      with PMEM and other ZONE_DEVICE mappings.
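
      The lookup order described above can be sketched in user space as
      follows.  This is a simplified model, not the kernel
      implementation: struct section, sketch_pfn_valid(), and
      subsection_mark_active() are hypothetical names, and the
      constants assume x86-64 (64 sub-sections of 2MB per 128MB
      section).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values: 4KB pages, 128MB sections, 2MB sub-sections. */
#define PAGE_SHIFT            12
#define SECTION_SIZE_BITS     27
#define SUBSECTION_SHIFT      21
#define PFN_SUBSECTION_SHIFT  (SUBSECTION_SHIFT - PAGE_SHIFT)
#define PAGES_PER_SECTION     (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Simplified per-section state; the bitmap has one bit per sub-section. */
struct section {
    bool     valid;
    bool     early;
    uint64_t subsection_map;
};

/* Index of the sub-section that a pfn occupies within its section. */
static unsigned int subsection_index(unsigned long pfn)
{
    return (pfn & (PAGES_PER_SECTION - 1)) >> PFN_SUBSECTION_SHIFT;
}

/* Mark the sub-sections covering [pfn, pfn + nr_pages) active.  The range
 * must be non-empty and must not cross a section boundary. */
static void subsection_mark_active(struct section *ms, unsigned long pfn,
                                   unsigned long nr_pages)
{
    unsigned int first = subsection_index(pfn);
    unsigned int last = subsection_index(pfn + nr_pages - 1);

    assert(nr_pages > 0 && first <= last);
    for (unsigned int i = first; i <= last; i++)
        ms->subsection_map |= UINT64_C(1) << i;
}

/* Early sections short-circuit the bitmap; hot-added ones consult it. */
static bool sketch_pfn_valid(const struct section *ms, unsigned long pfn)
{
    if (!ms->valid)
        return false;
    if (ms->early)
        return true;
    return (ms->subsection_map >> subsection_index(pfn)) & 1;
}
```

      For a hot-added section with only pfns 512..1535 marked active,
      the bitmap is 0x6 and sketch_pfn_valid() is false for pfn 0,
      whereas an early section reports every pfn as valid regardless
      of its bitmap.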
      
      Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Tested-by: Jane Chu <jane.chu@oracle.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f46edbd1
    • Dan Williams's avatar
      mm/sparsemem: introduce a SECTION_IS_EARLY flag · 326e1b8f
      Dan Williams authored
      In preparation for sub-section hotplug, track whether a given section
      was created during early memory initialization, or later via memory
      hotplug.  This distinction is needed to maintain the coarse expectation
      that pfn_valid() returns true for any pfn within a given section even if
      that section has pages that are reserved from the page allocator.
      
      For example, one of the goals of subsection hotplug is to support
      cases where the system physical memory layout collides System RAM and
      PMEM within a section.  Several pfn_valid() users expect to just check
      if a section is valid, but they are not careful to check if the given
      pfn is within a "System RAM" boundary and instead expect pgdat
      information to further validate the pfn.
      
      Rather than unwind those paths to make their pfn_valid() queries more
      precise, a follow-on patch uses the SECTION_IS_EARLY flag to maintain the
      traditional expectation that pfn_valid() returns true for all early
      sections.
      
      Link: https://lore.kernel.org/lkml/1560366952-10660-1-git-send-email-cai@lca.pw/
      Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      326e1b8f