1. 28 Jan, 2020 4 commits
    • sched/uclamp: Reject negative values in cpu_uclamp_write() · b562d140
      Qais Yousef authored
      The check to ensure that the new written value into cpu.uclamp.{min,max}
      is within range, [0:100], wasn't working because of the signed
      comparison
      
      	if (req.percent > UCLAMP_PERCENT_SCALE) {
      		req.ret = -ERANGE;
      		return req;
      	}
      
      	# echo -1 > cpu.uclamp.min
      	# cat cpu.uclamp.min
      	42949671.96
      
      Cast req.percent into u64 to force the comparison to be unsigned and
      work as intended in capacity_from_percent().
      
      	# echo -1 > cpu.uclamp.min
      	sh: write error: Numerical result out of range
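
      The effect of the cast can be sketched in isolation. This is a minimal
      illustration, not the kernel code: PERCENT_SCALE and both helper names
      are hypothetical, and the value 10000 assumes the percentage is stored
      with two decimal places.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit: 100% with two decimal places, i.e. 100.00 -> 10000. */
#define PERCENT_SCALE 10000

/* Signed comparison: a negative input is below the limit and slips through. */
static bool out_of_range_signed(int64_t percent)
{
	return percent > PERCENT_SCALE;
}

/* Unsigned comparison: the cast turns a negative input into a huge value. */
static bool out_of_range_unsigned(int64_t percent)
{
	return (uint64_t)percent > PERCENT_SCALE;
}
```

      With an input of -100 (that is, "-1" scaled by two decimal places), the
      signed check accepts the value while the unsigned check rejects it.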
      
      Fixes: 2480c093 ("sched/uclamp: Extend CPU's cgroup controller")
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20200114210947.14083-1-qais.yousef@arm.com
    • sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domains · b396f523
      Mel Gorman authored
      The CPU load balancer balances between different domains to spread load
      and strives to have equal balance everywhere. Communicating tasks can
      migrate so they are topologically close to each other but these decisions
      are independent. On a lightly loaded NUMA machine, two communicating tasks
      pulled together at wakeup time can be pushed apart by the load balancer.
      In isolation, the load balancer decision is fine but it ignores the tasks'
      data locality and the wakeup/LB paths continually conflict. NUMA balancing
      is also a factor but it also simply conflicts with the load balancer.
      
      This patch allows a fixed degree of imbalance of two tasks to exist
      between NUMA domains regardless of utilisation levels. In many cases,
      this prevents communicating tasks being pulled apart. It was evaluated
      whether the imbalance should be scaled to the domain size. However, no
      additional benefit was measured across a range of workloads and machines
      and scaling adds the risk that lower domains have to be rebalanced. While
      this could change again in the future, such a change should specify the
      use case and benefit.
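
      The rule described above can be sketched as a small helper. This is an
      illustration under stated assumptions, not the code added by the patch:
      the function name, its parameters and ALLOWED_NUMA_IMBALANCE are
      hypothetical, and only the fixed value of two tasks comes from the text.

```c
#include <assert.h>

/* Hypothetical constant: the fixed pair of communicating tasks. */
#define ALLOWED_NUMA_IMBALANCE 2

/*
 * Return the imbalance the balancer should act on.
 * is_numa_domain:      non-zero when balancing between NUMA nodes.
 * busiest_nr_running:  tasks running in the busiest group.
 */
static long effective_imbalance(long imbalance, int is_numa_domain,
				unsigned int busiest_nr_running)
{
	assert(imbalance >= 0);

	/* Tolerate a small imbalance between NUMA nodes; do not pull tasks. */
	if (is_numa_domain && busiest_nr_running <= ALLOWED_NUMA_IMBALANCE)
		return 0;

	/* Otherwise normal load balancing applies unchanged. */
	return imbalance;
}
```

      Because the threshold is fixed rather than scaled to the domain size, any
      group running more than two tasks is balanced exactly as before.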
      
      The most obvious impact is on netperf TCP_STREAM -- two simple
      communicating tasks with some softirq offload depending on the
      transmission rate.
      
       2-socket Haswell machine 48 core, HT enabled
       netperf-tcp -- mmtests config config-network-netperf-unbound
      			      baseline              lbnuma-v3
       Hmean     64         568.73 (   0.00%)      577.56 *   1.55%*
       Hmean     128       1089.98 (   0.00%)     1128.06 *   3.49%*
       Hmean     256       2061.72 (   0.00%)     2104.39 *   2.07%*
       Hmean     1024      7254.27 (   0.00%)     7557.52 *   4.18%*
       Hmean     2048     11729.20 (   0.00%)    13350.67 *  13.82%*
       Hmean     3312     15309.08 (   0.00%)    18058.95 *  17.96%*
       Hmean     4096     17338.75 (   0.00%)    20483.66 *  18.14%*
       Hmean     8192     25047.12 (   0.00%)    27806.84 *  11.02%*
       Hmean     16384    27359.55 (   0.00%)    33071.88 *  20.88%*
       Stddev    64           2.16 (   0.00%)        2.02 (   6.53%)
       Stddev    128          2.31 (   0.00%)        2.19 (   5.05%)
       Stddev    256         11.88 (   0.00%)        3.22 (  72.88%)
       Stddev    1024        23.68 (   0.00%)        7.24 (  69.43%)
       Stddev    2048        79.46 (   0.00%)       71.49 (  10.03%)
       Stddev    3312        26.71 (   0.00%)       57.80 (-116.41%)
       Stddev    4096       185.57 (   0.00%)       96.15 (  48.19%)
       Stddev    8192       245.80 (   0.00%)      100.73 (  59.02%)
       Stddev    16384      207.31 (   0.00%)      141.65 (  31.67%)
      
      In this case, there was a sizable improvement to performance and
      a general reduction in variance. However, this is not universal.
      For most machines, the impact was roughly a 3% performance gain.
      
       Ops NUMA base-page range updates       19796.00         292.00
       Ops NUMA PTE updates                   19796.00         292.00
       Ops NUMA PMD updates                       0.00           0.00
       Ops NUMA hint faults                   16113.00         143.00
       Ops NUMA hint local faults %            8407.00         142.00
       Ops NUMA hint local percent               52.18          99.30
       Ops NUMA pages migrated                 4244.00           1.00
      
      Without the patch, only 52.18% of sampled accesses are local.  In an
      earlier changelog, 100% of sampled accesses were local and indeed on
      most machines, this was still the case. In this specific case, the
      local sampled rate was 99.3% but note the "base-page range updates"
      and "PTE updates".  The activity with the patch is negligible as was
      the number of faults. The small number of pages migrated were related to
      shared libraries.  A 2-socket Broadwell showed better results on average
      but they are not presented for brevity as the performance was similar
      except it showed 100% of the sampled NUMA hints were local. The patch
      holds up for a 4-socket Haswell, an AMD EPYC and an AMD EPYC 2 machine.
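
      The "NUMA hint local percent" rows follow from the two fault counters
      above them. A minimal sketch of that arithmetic, with a hypothetical
      helper name:

```c
#include <assert.h>

/* Percentage of sampled NUMA hint faults that were node-local. */
static double local_fault_percent(unsigned long local_faults,
				  unsigned long total_faults)
{
	assert(total_faults > 0);
	return 100.0 * (double)local_faults / (double)total_faults;
}
```

      Feeding in 8407 of 16113 faults gives roughly 52.18, and 142 of 143
      gives roughly 99.30, matching the table.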
      
      For dbench, the impact depends on the filesystem used and the number of
      clients. On XFS, there is little difference as the clients typically
      communicate with workqueues which have a separate class of scheduler
      problem at the moment. For ext4, performance is generally better,
      particularly for small numbers of clients as NUMA balancing activity is
      negligible with the patch applied.
      
      A more interesting example is the Facebook schbench which uses a
      number of messaging threads to communicate with worker threads. In this
      configuration, one messaging thread is used per NUMA node and the number of
      worker threads is varied. The 50, 75, 90, 95, 99, 99.5 and 99.9 percentiles
      for response latency are then reported.
      
       Lat 50.00th-qrtle-1        44.00 (   0.00%)       37.00 (  15.91%)
       Lat 75.00th-qrtle-1        53.00 (   0.00%)       41.00 (  22.64%)
       Lat 90.00th-qrtle-1        57.00 (   0.00%)       42.00 (  26.32%)
       Lat 95.00th-qrtle-1        63.00 (   0.00%)       43.00 (  31.75%)
       Lat 99.00th-qrtle-1        76.00 (   0.00%)       51.00 (  32.89%)
       Lat 99.50th-qrtle-1        89.00 (   0.00%)       52.00 (  41.57%)
       Lat 99.90th-qrtle-1        98.00 (   0.00%)       55.00 (  43.88%)
       Lat 50.00th-qrtle-2        42.00 (   0.00%)       42.00 (   0.00%)
       Lat 75.00th-qrtle-2        48.00 (   0.00%)       47.00 (   2.08%)
       Lat 90.00th-qrtle-2        53.00 (   0.00%)       52.00 (   1.89%)
       Lat 95.00th-qrtle-2        55.00 (   0.00%)       53.00 (   3.64%)
       Lat 99.00th-qrtle-2        62.00 (   0.00%)       60.00 (   3.23%)
       Lat 99.50th-qrtle-2        63.00 (   0.00%)       63.00 (   0.00%)
       Lat 99.90th-qrtle-2        68.00 (   0.00%)       66.00 (   2.94%)
      
      For higher numbers of worker threads, the differences become negligible
      but it's interesting to note the difference in wakeup latency at low
      utilisation and mpstat confirms that activity was almost all on one node
      until the number of worker threads increases.
      
      Hackbench generally showed neutral results across a range of machines.
      This is different to earlier versions of the patch which allowed imbalances
      for higher degrees of utilisation. perf bench pipe showed negligible
      differences in overall performance as the differences are very close to
      the noise.
      
      An earlier prototype of the patch showed major regressions for NAS C-class
      when running with only half of the available CPUs -- 20-30% performance
      hits were measured at the time. With this version of the patch, the impact
      is negligible with small gains/losses within the noise measured. This is
      because the number of threads far exceeds the small imbalance the patch
      cares about. Similarly, there were reports of regressions for the autonuma
      benchmark against earlier versions but again, normal load balancing now
      applies for that workload.
      
      In general, the patch simply seeks to avoid unnecessary cross-node
      migrations in the basic case where imbalances are very small.  For low
      utilisation communicating workloads, this patch generally behaves better
      with less NUMA balancing activity. For high utilisation, there is no
      change in behaviour.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Phil Auld <pauld@redhat.com>
      Tested-by: Phil Auld <pauld@redhat.com>
      Link: https://lkml.kernel.org/r/20200114101319.GO3466@techsingularity.net
    • timers/nohz: Update NOHZ load in remote tick · ebc0f83c
      Peter Zijlstra (Intel) authored
      The way loadavg is tracked during nohz only pays attention to the load
      upon entering nohz.  This can be particularly noticeable if full nohz is
      entered while non-idle, and then the cpu goes idle and stays that way for
      a long time.
      
      Use the remote tick to ensure that full nohz cpus report their deltas
      within a reasonable time.
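
      The staleness problem can be shown with a toy model. This is only a
      sketch of the idea of folding per-CPU deltas into a global count; the
      struct, the function and the variable names are hypothetical and do not
      correspond to kernel symbols.

```c
#include <assert.h>

/* Toy per-CPU state. */
struct toy_cpu {
	long nr_active;		/* runnable tasks on this CPU right now */
	long reported;		/* value last folded into the global count */
};

static long global_active;	/* what the load average would be computed from */

/* Fold the change since the last report into the global count. */
static void fold_active(struct toy_cpu *cpu)
{
	long delta = cpu->nr_active - cpu->reported;

	cpu->reported = cpu->nr_active;
	global_active += delta;
}
```

      If fold_active() runs only when the CPU enters nohz while busy, the
      global count keeps that CPU's contribution after it goes idle. A later
      call, standing in for the remote tick, brings the count back in line.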
      
      [ swood: Added changelog and removed recheck of stopped tick. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Scott Wood <swood@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/1578736419-14628-3-git-send-email-swood@redhat.com
    • sched/core: Don't skip remote tick for idle CPUs · 488603b8
      Scott Wood authored
      This will be used in the next patch to get a loadavg update from
      nohz cpus.  The delta check is skipped because idle_sched_class
      doesn't update se.exec_start.
      Signed-off-by: Scott Wood <swood@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/1578736419-14628-2-git-send-email-swood@redhat.com
  2. 20 Jan, 2020 1 commit
  3. 17 Jan, 2020 14 commits
  4. 25 Dec, 2019 9 commits
  5. 23 Dec, 2019 2 commits
  6. 22 Dec, 2019 10 commits
    • Merge tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c6017471
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Fix a few bugs that could lead to corrupt files, fsck complaints, and
        filesystem crashes:
      
         - Minor documentation fixes
      
         - Fix a file corruption due to read racing with an insert range
           operation.
      
         - Fix log reservation overflows when allocating large rt extents
      
         - Fix a buffer log item flags check
      
         - Don't allow administrators to mount with sunit= options that will
           cause later xfs_repair complaints about the root directory being
           suspicious because the fs geometry appeared inconsistent
      
         - Fix a non-static helper that should have been static"
      
      * tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Make the symbol 'xfs_rtalloc_log_count' static
        xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
        xfs: split the sunit parameter update into two parts
        xfs: refactor agfl length computation function
        libxfs: resync with the userspace libxfs
        xfs: use bitops interface for buf log item AIL flag check
        xfs: fix log reservation overflows when allocating large rt extents
        xfs: stabilize insert range start boundary to avoid COW writeback race
        xfs: fix Sphinx documentation warning
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a3965607
      Linus Torvalds authored
      Pull ext4 bug fixes from Ted Ts'o:
       "Ext4 bug fixes, including a regression fix"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: clarify impact of 'commit' mount option
        ext4: fix unused-but-set-variable warning in ext4_add_entry()
        jbd2: fix kernel-doc notation warning
        ext4: use RCU API in debug_print_tree
        ext4: validate the debug_want_extra_isize mount option at parse time
        ext4: reserve revoke credits in __ext4_new_inode
        ext4: unlock on error in ext4_expand_extra_isize()
        ext4: optimize __ext4_check_dir_entry()
        ext4: check for directory entries too close to block end
        ext4: fix ext4_empty_dir() for directories with holes
    • Merge tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block · 44579f35
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Let's try this one again, this time without the compat_ioctl changes.
        We've got those fixed up, but that can go out next week.
      
        This contains:
      
         - block queue flush lockdep annotation (Bart)
      
         - Type fix for bsg_queue_rq() (Bart)
      
         - Three dasd fixes (Stefan, Jan)
      
         - nbd deadlock fix (Mike)
      
         - Error handling bio user map fix (Yang)
      
         - iocost fix (Tejun)
      
         - sbitmap waitqueue addition fix that affects the kyber IO scheduler
           (David)"
      
      * tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block:
        sbitmap: only queue kyber's wait callback if not already active
        block: fix memleak when __blk_rq_map_user_iov() is failed
        s390/dasd: fix typo in copyright statement
        s390/dasd: fix memleak in path handling error case
        s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
        block: Fix a lockdep complaint triggered by request queue flushing
        block: Fix the type of 'sts' in bsg_queue_rq()
        block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
        nbd: fix shutdown and recv work deadlock v2
        iocost: over-budget forced IOs should schedule async delay
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a313c8e0
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "PPC:
         - Fix a bug where we try to do an ultracall on a system without an
           ultravisor
      
        KVM:
         - Fix uninitialised sysreg accessor
         - Fix handling of demand-paged device mappings
         - Stop spamming the console on IMPDEF sysregs
         - Relax mappings of writable memslots
         - Assorted cleanups
      
        MIPS:
         - Now orphan, James Hogan is stepping down
      
        x86:
         - MAINTAINERS change, so long Radim and thanks for all the fish
         - supported CPUID fixes for AMD machines without SPEC_CTRL"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        MAINTAINERS: remove Radim from KVM maintainers
        MAINTAINERS: Orphan KVM for MIPS
        kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD
        kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
        KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor
        KVM: arm/arm64: Properly handle faulting of device mappings
        KVM: arm64: Ensure 'params' is initialised when looking up sys register
        KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region
        KVM: arm64: Don't log IMP DEF sysreg traps
        KVM: arm64: Sanely ratelimit sysreg messages
        KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()
        KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
        KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()
    • Merge tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 7214618c
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "Several fixes, and one cleanup, for RISC-V.
      
        Fixes:
      
         - Fix an error in a Kconfig file that resulted in an undefined
           Kconfig option "CONFIG_CONFIG_MMU"
      
         - Fix undefined Kconfig option "CONFIG_CONFIG_MMU"
      
         - Fix scratch register clearing in M-mode (affects nommu users)
      
         - Fix a mismerge on my part that broke the build for
           CONFIG_SPARSEMEM_VMEMMAP users
      
        Cleanup:
      
         - Move SiFive L2 cache-related code to drivers/soc, per request"
      
      * tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: move sifive_l2_cache.c to drivers/soc
        riscv: define vmemmap before pfn_to_page calls
        riscv: fix scratch register clearing in M-mode.
        riscv: Fix use of undefined config option CONFIG_CONFIG_MMU
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 78bac77b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
          including adding a missing ipv6 match description.
      
       2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
          Bhat.
      
       3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
      
       4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
      
       5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
      
       6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
          Chaignon.
      
       7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
      
       8) Fix established socket lookup race when socket goes from
          TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
          RCU grace period. From Eric Dumazet.
      
       9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
      
      10) Fix active backup transition after link failure in bonding, from
          Mahesh Bandewar.
      
      11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
      
      12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
      
      13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
      
      14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
      
      15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
      
      16) Reject invalid MTU values in stmmac, from Jose Abreu.
      
      17) Fix refcount leak in error path of u32 classifier, from Davide
          Caratti.
      
      18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
          Kaseorg.
      
      19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
      
      20) Disable hardware GRO when XDP is attached to qede, from Manish
          Chopra.
      
      21) Since we encode state in the low pointer bits, dst metrics must be
          at least 4 byte aligned, which is not necessarily true on m68k. Add
          annotations to fix this, from Geert Uytterhoeven.
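
      Item 21 relies on the two low bits of a pointer being free, which only
      holds when the pointed-to object is at least 4-byte aligned. A minimal
      sketch of that technique follows; struct toy_metrics, TAG_MASK and the
      helpers are hypothetical names, and C11 _Alignas stands in here for the
      kernel's own alignment annotation.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)3)	/* the two low bits carry state */

/* Hypothetical stand-in for a metrics array, forced to 4-byte alignment. */
struct toy_metrics {
	_Alignas(4) uint16_t vals[3];
};

/* Pack state flags into the low bits of the pointer value. */
static uintptr_t tag_ptr(const struct toy_metrics *m, uintptr_t flags)
{
	/* Only valid if the two low address bits are zero. */
	assert(((uintptr_t)m & TAG_MASK) == 0);
	return (uintptr_t)m | (flags & TAG_MASK);
}

/* Recover the original pointer by masking the flags off. */
static struct toy_metrics *untag_ptr(uintptr_t tagged)
{
	return (struct toy_metrics *)(tagged & ~TAG_MASK);
}

/* Extract only the flags. */
static uintptr_t tag_flags(uintptr_t tagged)
{
	return tagged & TAG_MASK;
}
```

      Without the alignment request, an ABI that aligns the array to only 2
      bytes could place it at an address with bit 1 set, and the flag bits
      would then corrupt the pointer.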
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
        sfc: Include XDP packet headroom in buffer step size.
        sfc: fix channel allocation with brute force
        net: dst: Force 4-byte alignment of dst_metrics
        selftests: pmtu: fix init mtu value in description
        hv_netvsc: Fix unwanted rx_table reset
        net: phy: ensure that phy IDs are correctly typed
        mod_devicetable: fix PHY module format
        qede: Disable hardware gro when xdp prog is installed
        net: ena: fix issues in setting interrupt moderation params in ethtool
        net: ena: fix default tx interrupt moderation interval
        net/smc: unregister ib devices in reboot_event
        net: stmmac: platform: Fix MDIO init for platforms without PHY
        llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
        net: hisilicon: Fix a BUG trigered by wrong bytes_compl
        net: dsa: ksz: use common define for tag len
        s390/qeth: don't return -ENOTSUPP to userspace
        s390/qeth: fix promiscuous mode after reset
        s390/qeth: handle error due to unsupported transport mode
        cxgb4: fix refcount init for TC-MQPRIO offload
        tc-testing: initial tdc selftests for cls_u32
        ...
    • pipe: fix empty pipe check in pipe_write() · 0dd1e377
      Jan Stancek authored
      LTP pipeio_1 test is hanging with v5.5-rc2-385-gb8e382a1,
      with read side observing empty pipe and sleeping and write
      side running out of space and then sleeping as well. In this
      scenario there are 5 writers and 1 reader.
      
      Problem is that after pipe_write() reacquires pipe lock, it
      re-checks for empty pipe with potentially stale 'head' and
      doesn't wake up read side anymore. pipe->tail can advance
      beyond 'head', because there are multiple writers.
      
      Use pipe->head for empty pipe check after reacquiring lock
      to observe current state.
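
      The difference between the two checks can be sketched with a toy ring.
      This is an illustration only: struct toy_pipe and the helper names are
      hypothetical, and the real pipe carries far more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ring indices. */
struct toy_pipe {
	unsigned int head;	/* next slot writers fill */
	unsigned int tail;	/* next slot the reader drains */
};

static bool ring_empty(unsigned int head, unsigned int tail)
{
	return head == tail;
}

/* Buggy variant: uses a head value sampled before the lock was dropped. */
static bool was_empty_stale(const struct toy_pipe *p, unsigned int stale_head)
{
	return ring_empty(stale_head, p->tail);
}

/* Fixed variant: re-reads head after the lock is reacquired. */
static bool was_empty_current(const struct toy_pipe *p)
{
	return ring_empty(p->head, p->tail);
}
```

      Suppose a writer samples head as 3 and sleeps. Other writers advance
      head to 5 and the reader drains up to tail 5. The stale check compares 3
      with 5 and concludes the pipe is not empty, so the reader is never woken.
      The current check compares 5 with 5 and sees the empty pipe.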
      
      Testing: With patch, LTP pipeio_1 ran successfully in a loop for 1 hour.
               Without patch it hung within a minute.
      
      Fixes: 1b6b26ae ("pipe: fix and clarify pipe write wakeup logic")
      Reported-by: Rachel Sibley <rasibley@redhat.com>
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'kvm-ppc-fixes-5.5-1' of... · d68321de
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-fixes-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
      
      PPC KVM fix for 5.5
      
      - Fix a bug where we try to do an ultracall on a system without an
        ultravisor.
    • MAINTAINERS: remove Radim from KVM maintainers · 19a049f1
      Paolo Bonzini authored
      Radim's kernel.org email is bouncing, which I take as a signal that
      he is not really able to deal with KVM at this time.  Make MAINTAINERS
      match the effective value of KVM's bus factor.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MAINTAINERS: Orphan KVM for MIPS · 088e11d4
      James Hogan authored
      I haven't been active for 18 months, and don't have the hardware set up
      to test KVM for MIPS, so mark it as orphaned and remove myself as
      maintainer. Hopefully somebody from MIPS can pick this up.
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>