1. 06 Nov, 2019 3 commits
    • Mel Gorman's avatar
      mm, meminit: recalculate pcpu batch and high limits after init completes · 3e8fc007
      Mel Gorman authored
      Deferred memory initialisation updates zone->managed_pages during the
      initialisation phase but before that finishes, the per-cpu page
      allocator (pcpu) calculates the number of pages allocated/freed in
      batches as well as the maximum number of pages allowed on a per-cpu
      list.  As zone->managed_pages is not up to date yet, the pcpu
      initialisation calculates inappropriately low batch and high values.
      
      This increases zone lock contention quite severely in some cases with
      the degree of severity depending on how many CPUs share a local zone and
      the size of the zone.  A private report indicated that kernel build
      times were excessive with extremely high system CPU usage.  A perf
      profile indicated that a large chunk of time was lost on zone->lock
      contention.
      
      This patch recalculates the pcpu batch and high values after deferred
      initialisation completes for every populated zone in the system.  It was
      tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
      workload -- allmodconfig and all available CPUs.
      
      mmtests configuration: config-workload-kernbench-max Configuration was
      modified to build on a fresh XFS partition.
      
      kernbench
                                      5.4.0-rc3              5.4.0-rc3
                                        vanilla           resetpcpu-v2
      Amean     user-256    13249.50 (   0.00%)    16401.31 * -23.79%*
      Amean     syst-256    14760.30 (   0.00%)     4448.39 *  69.86%*
      Amean     elsp-256      162.42 (   0.00%)      119.13 *  26.65%*
      Stddev    user-256       42.97 (   0.00%)       19.15 (  55.43%)
      Stddev    syst-256      336.87 (   0.00%)        6.71 (  98.01%)
      Stddev    elsp-256        2.46 (   0.00%)        0.39 (  84.03%)
      
                         5.4.0-rc3    5.4.0-rc3
                           vanilla resetpcpu-v2
      Duration User       39766.24     49221.79
      Duration System     44298.10     13361.67
      Duration Elapsed      519.11       388.87
      
      The patch reduces system CPU usage by 69.86% and total build time by
      26.65%.  The variance of system CPU usage is also much reduced.
      
      Before, this was the breakdown of batch and high values over all zones
      was:
      
          256               batch: 1
          256               batch: 63
          512               batch: 7
          256               high:  0
          256               high:  378
          512               high:  42
      
      512 pcpu pagesets had a batch limit of 7 and a high limit of 42.  After
      the patch:
      
          256               batch: 1
          768               batch: 63
          256               high:  0
          768               high:  378
      
      [mgorman@techsingularity.net: fix merge/linkage snafu]
        Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.netSigned-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Qian Cai <cai@lca.pw>
      Cc: <stable@vger.kernel.org>	[4.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3e8fc007
    • John Hubbard's avatar
      mm/gup_benchmark: fix MAP_HUGETLB case · 64801d19
      John Hubbard authored
      The MAP_HUGETLB ("-H" option) of gup_benchmark fails:
      
        $ sudo ./gup_benchmark -H
        mmap: Invalid argument
      
      This is because gup_benchmark.c is passing in a file descriptor to
      mmap(), but the fd came from opening up the /dev/zero file.  This
      confuses the mmap syscall implementation, which thinks that, if the
      caller did not specify MAP_ANONYMOUS, then the file must be a huge page
      file.  So it attempts to verify that the file really is a huge page
      file, as you can see here:
      
      ksys_mmap_pgoff()
      {
          if (!(flags & MAP_ANONYMOUS)) {
              retval = -EINVAL;
              if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
                  goto out_fput; /* THIS IS WHERE WE END UP */
      
          else if (flags & MAP_HUGETLB) {
              ...proceed normally, /dev/zero is ok here...
      
      ...and of course is_file_hugepages() returns "false" for the /dev/zero
      file.
      
      The problem is that the user space program, gup_benchmark.c, really just
      wants anonymous memory here.  The simplest way to get that is to pass
      MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this
      patch does.
      
      Link: http://lkml.kernel.org/r/20191021212435.398153-2-jhubbard@nvidia.comSigned-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      64801d19
    • Shakeel Butt's avatar
      mm: memcontrol: fix NULL-ptr deref in percpu stats flush · 7961eee3
      Shakeel Butt authored
      __mem_cgroup_free() can be called on the failure path in
      mem_cgroup_alloc().  However memcg_flush_percpu_vmstats() and
      memcg_flush_percpu_vmevents() which are called from __mem_cgroup_free()
      access the fields of memcg which can potentially be null if called from
      failure path from mem_cgroup_alloc().  Indeed syzbot has reported the
      following crash:
      
      	kasan: CONFIG_KASAN_INLINE enabled
      	kasan: GPF could be caused by NULL-ptr deref or user memory access
      	general protection fault: 0000 [#1] PREEMPT SMP KASAN
      	CPU: 0 PID: 30393 Comm: syz-executor.1 Not tainted 5.4.0-rc2+ #0
      	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      	RIP: 0010:memcg_flush_percpu_vmstats+0x4ae/0x930 mm/memcontrol.c:3436
      	Code: 05 41 89 c0 41 0f b6 04 24 41 38 c7 7c 08 84 c0 0f 85 5d 03 00 00 44 3b 05 33 d5 12 08 0f 83 e2 00 00 00 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 0f 85 91 03 00 00 48 8b 85 10 fe ff ff 48 8b b0 90
      	RSP: 0018:ffff888095c27980 EFLAGS: 00010206
      	RAX: 0000000000000012 RBX: ffff888095c27b28 RCX: ffffc90008192000
      	RDX: 0000000000040000 RSI: ffffffff8340fae7 RDI: 0000000000000007
      	RBP: ffff888095c27be0 R08: 0000000000000000 R09: ffffed1013f0da33
      	R10: ffffed1013f0da32 R11: ffff88809f86d197 R12: fffffbfff138b760
      	R13: dffffc0000000000 R14: 0000000000000090 R15: 0000000000000007
      	FS:  00007f5027170700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 0000000000710158 CR3: 00000000a7b18000 CR4: 00000000001406f0
      	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      	Call Trace:
      	__mem_cgroup_free+0x1a/0x190 mm/memcontrol.c:5021
      	mem_cgroup_free mm/memcontrol.c:5033 [inline]
      	mem_cgroup_css_alloc+0x3a1/0x1ae0 mm/memcontrol.c:5160
      	css_create kernel/cgroup/cgroup.c:5156 [inline]
      	cgroup_apply_control_enable+0x44d/0xc40 kernel/cgroup/cgroup.c:3119
      	cgroup_mkdir+0x899/0x11b0 kernel/cgroup/cgroup.c:5401
      	kernfs_iop_mkdir+0x14d/0x1d0 fs/kernfs/dir.c:1124
      	vfs_mkdir+0x42e/0x670 fs/namei.c:3807
      	do_mkdirat+0x234/0x2a0 fs/namei.c:3830
      	__do_sys_mkdir fs/namei.c:3846 [inline]
      	__se_sys_mkdir fs/namei.c:3844 [inline]
      	__x64_sys_mkdir+0x5c/0x80 fs/namei.c:3844
      	do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
      	entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixing this by moving the flush to mem_cgroup_free as there is no need
      to flush anything if we see failure in mem_cgroup_alloc().
      
      Link: http://lkml.kernel.org/r/20191018165231.249872-1-shakeelb@google.com
      Fixes: bb65f89b ("mm: memcontrol: flush percpu vmevents before releasing memcg")
      Fixes: c350a99e ("mm: memcontrol: flush percpu vmstats before releasing memcg")
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Reported-by: syzbot+515d5bcfe179cdf049b2@syzkaller.appspotmail.com
      Reviewed-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7961eee3
  2. 03 Nov, 2019 2 commits
    • Linus Torvalds's avatar
      Linux 5.4-rc6 · a99d8080
      Linus Torvalds authored
      a99d8080
    • Linus Torvalds's avatar
      Merge tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 3a69c9e5
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "The USB sub-maintainers woke up this past week and sent a bunch of
        tiny fixes. Here are a lot of small patches that that resolve a bunch
        of reported issues in the USB core, drivers, serial drivers, gadget
        drivers, and of course, xhci :)
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
        usb: dwc3: gadget: fix race when disabling ep with cancelled xfers
        usb: cdns3: gadget: Fix g_audio use case when connected to Super-Speed host
        usb: cdns3: gadget: reset EP_CLAIMED flag while unloading
        USB: serial: whiteheat: fix line-speed endianness
        USB: serial: whiteheat: fix potential slab corruption
        USB: gadget: Reject endpoints with 0 maxpacket value
        UAS: Revert commit 3ae62a42 ("UAS: fix alignment of scatter/gather segments")
        usb-storage: Revert commit 747668db ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
        usbip: Fix free of unallocated memory in vhci tx
        usbip: tools: Fix read_usb_vudc_device() error path handling
        usb: xhci: fix __le32/__le64 accessors in debugfs code
        usb: xhci: fix Immediate Data Transfer endianness
        xhci: Fix use-after-free regression in xhci clear hub TT implementation
        USB: ldusb: fix control-message timeout
        USB: ldusb: use unsigned size format specifiers
        USB: ldusb: fix ring-buffer locking
        USB: Skip endpoints with 0 maxpacket length
        usb: cdns3: gadget: Don't manage pullups
        usb: dwc3: remove the call trace of USBx_GFLADJ
        usb: gadget: configfs: fix concurrent issue between composite APIs
        ...
      3a69c9e5
  3. 02 Nov, 2019 10 commits
    • Linus Torvalds's avatar
      Merge tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6 · 56cfd250
      Linus Torvalds authored
      Pull cifs fix from Steve French:
       "A small smb3 memleak fix"
      
      * tag '5.4-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
        fix memory leak in large read decrypt offload
      56cfd250
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-v5.4-rc6' of... · 9d234505
      Linus Torvalds authored
      Merge tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - Fix read timeout problem in ina3221 driver
      
       - Fix wrong bitmask in nct7904 driver
      
      * tag 'hwmon-for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (ina3221) Fix read timeout issue
        hwmon: (nct7904) Fix the incorrect value of vsen_mask & tcpu_mask & temp_mode in nct7904_data struct.
      9d234505
    • Linus Torvalds's avatar
      Merge tag 'pwm/for-5.4-rc6' of... · e935842a
      Linus Torvalds authored
      Merge tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm fixes from Thierry Reding:
       "It turned out that relying solely on drivers storing all the PWM state
        in hardware was a little premature and causes a number of subtle (and
        some not so subtle) regressions. Revert the offending patch for now"
      
      * tag 'pwm/for-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        Revert "pwm: Let pwm_get_state() return the last implemented state"
      e935842a
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f83e148a
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
        and one core change in sd that fixes an I/O failure on DIF type 3
        devices"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: stop timer in shutdown path
        scsi: sd: define variable dif as unsigned int instead of bool
        scsi: target: cxgbit: Fix cxgbit_fw4_ack()
        scsi: qla2xxx: Fix partial flash write of MBI
        scsi: qla2xxx: Initialized mailbox to prevent driver load failure
        scsi: lpfc: Honor module parameter lpfc_use_adisc
        scsi: ufs-bsg: Wake the device before sending raw upiu commands
        scsi: lpfc: Check queue pointer before use
        scsi: qla2xxx: fixup incorrect usage of host_byte
      f83e148a
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 8194c28e
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Our recent cleanup of EEH led to an oops on bare metal machines when
        the cxl (CAPI) driver creates virtual devices for an attached FPGA
        accelerator.
      
        The "secure virtual machine" support we added in v5.4 had a bug if the
        kernel was relocated (moved during boot), in those cases the signature
        of the kernel text wouldn't verify and the Ultravisor would refuse to
        run the VM.
      
        A recent change to disable interrupts before calling
        arch_cpu_idle_dead() caused a WARN_ON() in our bare metal CPU offline
        code to always trigger.
      
        The KUAP (SMAP) support we added for 32-bit Book3S had a bug if the
        address range crossed a segment (256MB) boundary which could lead to
        spurious faults.
      
        Thanks to: Christophe Leroy, Frederic Barrat, Michael Anderson,
        Nicholas Piggin, Sam Bobroff, Thiago Jung Bauermann"
      
      * tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: Fix CPU idle to be called with IRQs disabled
        powerpc/prom_init: Undo relocation before entering secure mode
        powerpc/powernv/eeh: Fix oops when probing cxl devices
        powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.
      8194c28e
    • Linus Torvalds's avatar
      Merge tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 969a5197
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix cpu idle time accounting
      
       - Fix stack unwinder case when both pt_regs and sp are specified
      
       - Fix information leak via cmm timeout proc handler
      
      * tag 's390-5.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/idle: fix cpu idle time calculation
        s390/unwind: fix mixing regs and sp
        s390/cmm: fix information leak in cmm_timeout_handler()
      969a5197
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1204c70d
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix free/alloc races in batmanadv, from Sven Eckelmann.
      
       2) Several leaks and other fixes in kTLS support of mlx5 driver, from
          Tariq Toukan.
      
       3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
          Høiland-Jørgensen.
      
       4) Add an r8152 device ID, from Kazutoshi Noguchi.
      
       5) Missing include in ipv6's addrconf.c, from Ben Dooks.
      
       6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
          easily infer the 32-bit secret otherwise etc.
      
       7) Several netdevice nesting depth fixes from Taehee Yoo.
      
       8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
          when doing lockless skb_queue_empty() checks, and accessing
          sk_napi_id/sk_incoming_cpu lockless as well.
      
       9) Fix jumbo packet handling in RXRPC, from David Howells.
      
      10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.
      
      11) Fix DMA synchronization in gve driver, from Yangchun Fu.
      
      12) Several bpf offload fixes, from Jakub Kicinski.
      
      13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.
      
      14) Fix ping latency during high traffic rates in hisilicon driver, from
          Jiangfent Xiao.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
        net: fix installing orphaned programs
        net: cls_bpf: fix NULL deref on offload filter removal
        selftests: bpf: Skip write only files in debugfs
        selftests: net: reuseport_dualstack: fix uninitalized parameter
        r8169: fix wrong PHY ID issue with RTL8168dp
        net: dsa: bcm_sf2: Fix IMP setup for port different than 8
        net: phylink: Fix phylink_dbg() macro
        gve: Fixes DMA synchronization.
        inet: stop leaking jiffies on the wire
        ixgbe: Remove duplicate clear_bit() call
        Documentation: networking: device drivers: Remove stray asterisks
        e1000: fix memory leaks
        i40e: Fix receive buffer starvation for AF_XDP
        igb: Fix constant media auto sense switching when no cable is connected
        net: ethernet: arc: add the missed clk_disable_unprepare
        igb: Enable media autosense for the i350.
        igb/igc: Don't warn on fatal read failures when the device is removed
        tcp: increase tcp_max_syn_backlog max value
        net: increase SOMAXCONN to 4096
        netdevsim: Fix use-after-free during device dismantle
        ...
      1204c70d
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs · 372bf6c1
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "This contains two delegation fixes (with the RCU lock leak fix marked
        for stable), and three patches to fix destroying the the sunrpc back
        channel.
      
        Stable bugfixes:
      
         - Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
      
        Other fixes:
      
         - The TCP back channel mustn't disappear while requests are
           outstanding
      
         - The RDMA back channel mustn't disappear while requests are
           outstanding
      
         - Destroy the back channel when we destroy the host transport
      
         - Don't allow a cached open with a revoked delegation"
      
      * tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
        NFSv4: Don't allow a cached open with a revoked delegation
        SUNRPC: Destroy the back channel when we destroy the host transport
        SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
        SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
      372bf6c1
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-block · 0821de28
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Two small nvme fixes, one is a fabrics connection fix, the other one
         a cleanup made possible by that fix (Anton, via Keith)
      
       - Fix requeue handling in umb ubd (Anton)
      
       - Fix spin_lock_irq() nesting in blk-iocost (Dan)
      
       - Three small io_uring fixes:
           - Install io_uring fd after done with ctx (me)
           - Clear ->result before every poll issue (me)
           - Fix leak of shadow request on error (Pavel)
      
      * tag 'for-linus-20191101' of git://git.kernel.dk/linux-block:
        iocost: don't nest spin_lock_irq in ioc_weight_write()
        io_uring: ensure we clear io_kiocb->result before each issue
        um-ubd: Entrust re-queue to the upper layers
        nvme-multipath: remove unused groups_only mode in ana log
        nvme-multipath: fix possible io hang after ctrl reconnect
        io_uring: don't touch ctx in setup after ring fd install
        io_uring: Fix leaked shadow_req
      0821de28
    • Linus Torvalds's avatar
      Merge tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · e5897c7d
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "One fix for PCIe users:
      
         - Fix legacy PCI I/O port access emulation
      
        One set of cleanups:
      
         - Resolve most of the warnings generated by sparse across arch/riscv.
           No functional changes
      
        And one MAINTAINERS update:
      
         - Update Palmer's E-mail address"
      
      * tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        MAINTAINERS: Change to my personal email address
        RISC-V: Add PCIe I/O BAR memory mapping
        riscv: for C functions called only from assembly, mark with __visible
        riscv: fp: add missing __user pointer annotations
        riscv: add missing header file includes
        riscv: mark some code and data as file-static
        riscv: init: merge split string literals in preprocessor directive
        riscv: add prototypes for assembly language functions from head.S
      e5897c7d
  4. 01 Nov, 2019 25 commits