1. 11 Feb, 2020 1 commit
  2. 10 Feb, 2020 1 commit
    • sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression · 52262ee5
      Mel Gorman authored
      
      The following XFS commit:
      
        8ab39f11 ("xfs: prevent CIL push holdoff in log recovery")
      
      changed the logic from using bound workqueues to using unbound
      workqueues. Functionally this makes sense but it was observed at the
      time that the dbench performance dropped quite a lot and CPU migrations
      were increased.
      
      The current pattern of the task migration is straightforward. With XFS,
      an IO issuer delegates work to xlog_cil_push_work() on an unbound kworker.
      This runs on a nearby CPU and on completion, dbench wakes up on its old CPU
      as it is still idle and no migration occurs. dbench then queues the real
      IO on the blk_mq_requeue_work() work item which runs on a bound kworker
      which is forced to run on the same CPU as dbench. When IO completes,
      the bound kworker wakes dbench, but as the kworker is a bound but real
      task, the CPU is not considered idle and dbench gets migrated by
      select_idle_sibling() to a new CPU. dbench may ping-pong between two CPUs
      for a while but ultimately it starts a round-robin of all CPUs sharing
      the same LLC. High-frequency migration on each IO completion has poor
      performance overall. It has negative implications for both communication
      costs and power management. mpstat confirmed that at low thread counts,
      all CPUs sharing an LLC have low levels of activity.
      
      Note that even if the CIL patch was reverted, there still would
      be migrations but the impact is less noticeable. It turns out that
      individually the scheduler, XFS, blk-mq and workqueues all made sensible
      decisions but in combination, the overall effect was sub-optimal.
      
      This patch special cases the IO issue/completion pattern and allows
      a bound kworker waker and a task wakee to stack on the same CPU if
      there is a strong chance they are directly related. The expectation
      is that the kworker is likely going back to sleep shortly. This is not
      guaranteed as the IO could be queued asynchronously but there is a very
      strong relationship between the task and kworker in this case that would
      justify stacking on the same CPU instead of migrating. There should be
      few concerns about kworker starvation given that the special casing is
      only when the kworker is the waker.
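
      The core of the change is a small special case in select_idle_sibling();
      the following is a close paraphrase of the added check (exact placement
      in kernel/sched/fair.c per the patch itself):

        /*
         * Allow a per-CPU kthread to stack with the wakee when the
         * kworker runs on the task's previous CPU. The assumption is
         * that the wakee queued work for the per-CPU kthread that is
         * now complete, so the wakeup is essentially a sync wakeup.
         */
        if (is_per_cpu_kthread(current) &&
            prev == smp_processor_id() &&
            this_rq()->nr_running <= 1)
                return prev;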
      
      DBench on XFS
      MMTests config: io-dbench4-async modified to run on a fresh XFS filesystem
      
      UMA machine with 8 cores sharing LLC
                                5.5.0-rc7              5.5.0-rc7
                        tipsched-20200124           kworkerstack
      Amean     1        22.63 (   0.00%)       20.54 *   9.23%*
      Amean     2        25.56 (   0.00%)       23.40 *   8.44%*
      Amean     4        28.63 (   0.00%)       27.85 *   2.70%*
      Amean     8        37.66 (   0.00%)       37.68 (  -0.05%)
      Amean     64      469.47 (   0.00%)      468.26 (   0.26%)
      Stddev    1         1.00 (   0.00%)        0.72 (  28.12%)
      Stddev    2         1.62 (   0.00%)        1.97 ( -21.54%)
      Stddev    4         2.53 (   0.00%)        3.58 ( -41.19%)
      Stddev    8         5.30 (   0.00%)        5.20 (   1.92%)
      Stddev    64       86.36 (   0.00%)       94.53 (  -9.46%)
      
      NUMA machine, 48 CPUs total, 24 CPUs share cache
                                 5.5.0-rc7              5.5.0-rc7
                         tipsched-20200124      kworkerstack-v1r2
      Amean     1         58.69 (   0.00%)       30.21 *  48.53%*
      Amean     2         60.90 (   0.00%)       35.29 *  42.05%*
      Amean     4         66.77 (   0.00%)       46.55 *  30.28%*
      Amean     8         81.41 (   0.00%)       68.46 *  15.91%*
      Amean     16       113.29 (   0.00%)      107.79 *   4.85%*
      Amean     32       199.10 (   0.00%)      198.22 *   0.44%*
      Amean     64       478.99 (   0.00%)      477.06 *   0.40%*
      Amean     128     1345.26 (   0.00%)     1372.64 *  -2.04%*
      Stddev    1          2.64 (   0.00%)        4.17 ( -58.08%)
      Stddev    2          4.35 (   0.00%)        5.38 ( -23.73%)
      Stddev    4          6.77 (   0.00%)        6.56 (   3.00%)
      Stddev    8         11.61 (   0.00%)       10.91 (   6.04%)
      Stddev    16        18.63 (   0.00%)       19.19 (  -3.01%)
      Stddev    32        38.71 (   0.00%)       38.30 (   1.06%)
      Stddev    64       100.28 (   0.00%)       91.24 (   9.02%)
      Stddev    128      186.87 (   0.00%)      160.34 (  14.20%)
      
      Dbench has been modified to report the time to complete a single "load
      file". This is a more meaningful metric for dbench than a throughput
      metric, as the benchmark makes many different system calls that are not
      throughput-related.
      
      The patch shows a 9.23% and 48.53% reduction in the time to process a
      load file, with the difference partially explained by the number of CPUs
      sharing an LLC. In a separate run, task migrations were almost eliminated
      by the patch for low client counts. In case people have issues with the
      metric used for the benchmark, this is a comparison of the throughputs as
      reported by dbench on the NUMA machine.
      
      dbench4 Throughput (misleading but traditional)
                                 5.5.0-rc7              5.5.0-rc7
                         tipsched-20200124      kworkerstack-v1r2
      Hmean     1        321.41 (   0.00%)      617.82 *  92.22%*
      Hmean     2        622.87 (   0.00%)     1066.80 *  71.27%*
      Hmean     4       1134.56 (   0.00%)     1623.74 *  43.12%*
      Hmean     8       1869.96 (   0.00%)     2212.67 *  18.33%*
      Hmean     16      2673.11 (   0.00%)     2806.13 *   4.98%*
      Hmean     32      3032.74 (   0.00%)     3039.54 (   0.22%)
      Hmean     64      2514.25 (   0.00%)     2498.96 *  -0.61%*
      Hmean     128     1778.49 (   0.00%)     1746.05 *  -1.82%*
      
      Note that this is somewhat specific to XFS; ext4 shows no performance
      difference as it does not rely on kworkers in the same way. No major
      problem was observed running other workloads on different machines,
      although not all tests have completed yet.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200128154006.GD3466@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 28 Jan, 2020 6 commits
    • sched/fair: Prevent unlimited runtime on throttled group · 2a4b03ff
      Vincent Guittot authored
      When a running task is moved to a throttled task group and there is no
      other task enqueued on the CPU, the task can keep running using 100% CPU
      regardless of the bandwidth allocated to the group, even though its
      cfs_rq is throttled. Furthermore, the group entity of the cfs_rq and its
      parents are not enqueued but only set as curr on their respective cfs_rqs.
      
      We have the following sequence:
      
      sched_move_task
        - dequeue_task: dequeue task and group entities.
        - put_prev_task: put task and group entities.
        - sched_change_group: move task to new group.
        - enqueue_task: enqueue only the task but not the group entities
          because the cfs_rq is throttled.
        - set_next_task: set task and group entities as current sched_entity
          of their cfs_rq.
      
      Another impact is that the runnable_load_avg of the root cfs_rq stays
      zero because the group entities are not enqueued. This situation persists
      until an "external" event triggers a reschedule. Let's trigger it
      immediately instead.
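
      A minimal sketch of the idea (not the exact upstream hunk; the real
      change wires this into the fair-class enqueue path, and
      throttled_hierarchy()/resched_curr() are existing helpers):

        /*
         * Sketch: after a task lands on a throttled hierarchy, force an
         * immediate reschedule so it cannot keep running on bandwidth
         * the group does not have.
         */
        if (throttled_hierarchy(cfs_rq))
                resched_curr(rq);
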
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Ben Segall <bsegall@google.com>
      Link: https://lkml.kernel.org/r/1579011236-31256-1-git-send-email-vincent.guittot@linaro.org
    • sched/nohz: Optimize get_nohz_timer_target() · e938b9c9
      Wanpeng Li authored
      On a machine where CPU 0 is used for housekeeping and the other 39 CPUs
      in the same socket are in nohz_full mode, ftrace shows a huge amount of
      time burned in the loop searching for the nearest busy housekeeping CPU:
      
        2)               |                        get_nohz_timer_target() {
        2)   0.240 us    |                          housekeeping_test_cpu();
        2)   0.458 us    |                          housekeeping_test_cpu();
      
        ...
      
        2)   0.292 us    |                          housekeeping_test_cpu();
        2)   0.240 us    |                          housekeeping_test_cpu();
        2)   0.227 us    |                          housekeeping_any_cpu();
        2) + 43.460 us   |                        }
      
      This patch optimizes the search by iterating only over the housekeeping
      cpumask to find the nearest housekeeping CPU, reducing the worst-case
      search time from ~44us to under 10us in my testing. In addition, the
      last busy housekeeping CPU seen by the old iteration was an essentially
      random candidate; the current CPU is a better fallback if it is itself
      a housekeeping CPU.
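
      A sketch of the optimized lookup, close to the upstream form of
      get_nohz_timer_target() (minor details hedged):

        int get_nohz_timer_target(void)
        {
                int i, cpu = smp_processor_id(), default_cpu = -1;
                struct sched_domain *sd;

                if (housekeeping_cpu(cpu, HK_FLAG_TIMER)) {
                        if (!idle_cpu(cpu))
                                return cpu;
                        default_cpu = cpu;
                }

                rcu_read_lock();
                for_each_domain(cpu, sd) {
                        /* Walk only nearby CPUs that do housekeeping. */
                        for_each_cpu_and(i, sched_domain_span(sd),
                                         housekeeping_cpumask(HK_FLAG_TIMER)) {
                                if (cpu == i)
                                        continue;

                                if (!idle_cpu(i)) {
                                        cpu = i;
                                        goto unlock;
                                }
                        }
                }

                if (default_cpu == -1)
                        default_cpu = housekeeping_any_cpu(HK_FLAG_TIMER);
                cpu = default_cpu;
        unlock:
                rcu_read_unlock();
                return cpu;
        }
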
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lkml.kernel.org/r/1578876627-11938-1-git-send-email-wanpengli@tencent.com
    • sched/uclamp: Reject negative values in cpu_uclamp_write() · b562d140
      Qais Yousef authored
      The check to ensure that the new value written into cpu.uclamp.{min,max}
      is within range, [0:100], wasn't working because of the signed
      comparison:

      	if (req.percent > UCLAMP_PERCENT_SCALE) {
      		req.ret = -ERANGE;
      		return req;
      	}
      
      	# echo -1 > cpu.uclamp.min
      	# cat cpu.uclamp.min
      	42949671.96
      
      Cast req.percent into u64 to force the comparison to be unsigned and
      work as intended in capacity_from_percent().
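
      The corrected comparison in capacity_from_percent() then reads (a sketch
      of the one-line fix):

      	if ((u64)req.percent > UCLAMP_PERCENT_SCALE) {
      		req.ret = -ERANGE;
      		return req;
      	}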
      
      	# echo -1 > cpu.uclamp.min
      	sh: write error: Numerical result out of range
      
      Fixes: 2480c093 ("sched/uclamp: Extend CPU's cgroup controller")
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20200114210947.14083-1-qais.yousef@arm.com
    • sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domains · b396f523
      Mel Gorman authored
      The CPU load balancer balances between different domains to spread load
      and strives to have equal balance everywhere. Communicating tasks can
      migrate so they are topologically close to each other but these decisions
      are independent. On a lightly loaded NUMA machine, two communicating tasks
      pulled together at wakeup time can be pushed apart by the load balancer.
      In isolation, the load balancer decision is fine, but it ignores the
      tasks' data locality and the wakeup/LB paths continually conflict. NUMA
      balancing is also a factor, but it too simply conflicts with the load
      balancer.
      
      This patch allows a fixed degree of imbalance of two tasks to exist
      between NUMA domains regardless of utilisation levels. In many cases,
      this prevents communicating tasks being pulled apart. It was evaluated
      whether the imbalance should be scaled to the domain size. However, no
      additional benefit was measured across a range of workloads and machines
      and scaling adds the risk that lower domains have to be rebalanced. While
      this could change again in the future, such a change should specify the
      use case and benefit.
      
      The most obvious impact is on netperf TCP_STREAM -- two simple
      communicating tasks with some softirq offload depending on the
      transmission rate.
      
       2-socket Haswell machine, 48 cores, HT enabled
       netperf-tcp -- mmtests config config-network-netperf-unbound
      			      baseline              lbnuma-v3
       Hmean     64         568.73 (   0.00%)      577.56 *   1.55%*
       Hmean     128       1089.98 (   0.00%)     1128.06 *   3.49%*
       Hmean     256       2061.72 (   0.00%)     2104.39 *   2.07%*
       Hmean     1024      7254.27 (   0.00%)     7557.52 *   4.18%*
       Hmean     2048     11729.20 (   0.00%)    13350.67 *  13.82%*
       Hmean     3312     15309.08 (   0.00%)    18058.95 *  17.96%*
       Hmean     4096     17338.75 (   0.00%)    20483.66 *  18.14%*
       Hmean     8192     25047.12 (   0.00%)    27806.84 *  11.02%*
       Hmean     16384    27359.55 (   0.00%)    33071.88 *  20.88%*
       Stddev    64           2.16 (   0.00%)        2.02 (   6.53%)
       Stddev    128          2.31 (   0.00%)        2.19 (   5.05%)
       Stddev    256         11.88 (   0.00%)        3.22 (  72.88%)
       Stddev    1024        23.68 (   0.00%)        7.24 (  69.43%)
       Stddev    2048        79.46 (   0.00%)       71.49 (  10.03%)
       Stddev    3312        26.71 (   0.00%)       57.80 (-116.41%)
       Stddev    4096       185.57 (   0.00%)       96.15 (  48.19%)
       Stddev    8192       245.80 (   0.00%)      100.73 (  59.02%)
       Stddev    16384      207.31 (   0.00%)      141.65 (  31.67%)
      
      In this case, there was a sizable improvement in performance and
      a general reduction in variance. However, this is not universal.
      For most machines, the impact was roughly a 3% performance gain.
      The NUMA balancing activity for the same netperf-tcp run is also
      telling:
      
                                              baseline      lbnuma-v3
       Ops NUMA base-page range updates       19796.00         292.00
       Ops NUMA PTE updates                   19796.00         292.00
       Ops NUMA PMD updates                       0.00           0.00
       Ops NUMA hint faults                   16113.00         143.00
       Ops NUMA hint local faults %            8407.00         142.00
       Ops NUMA hint local percent               52.18          99.30
       Ops NUMA pages migrated                 4244.00           1.00
      
      Without the patch, only 52.18% of sampled accesses are local. In an
      earlier changelog, 100% of sampled accesses were local and indeed on
      most machines, this was still the case. In this specific case, the
      local sampled rate was 99.3% but note the "base-page range updates"
      and "PTE updates". The activity with the patch is negligible, as were
      the number of faults. The small number of pages migrated were related
      to shared libraries. A 2-socket Broadwell showed better results on
      average but they are not presented for brevity, as the performance was
      similar except that 100% of the sampled NUMA hints were local. The
      patch holds up for a 4-socket Haswell, an AMD EPYC and an AMD EPYC 2
      machine.
      
      For dbench, the impact depends on the filesystem used and the number of
      clients. On XFS, there is little difference as the clients typically
      communicate with workqueues which have a separate class of scheduler
      problem at the moment. For ext4, performance is generally better,
      particularly for small numbers of clients as NUMA balancing activity is
      negligible with the patch applied.
      
      A more interesting example is the Facebook schbench which uses a
      number of messaging threads to communicate with worker threads. In this
      configuration, one messaging thread is used per NUMA node and the number
      of worker threads is varied. The 50, 75, 90, 95, 99, 99.5 and 99.9
      percentiles of response latency are then reported for the baseline and
      lbnuma-v3 kernels:
      
       Lat 50.00th-qrtle-1        44.00 (   0.00%)       37.00 (  15.91%)
       Lat 75.00th-qrtle-1        53.00 (   0.00%)       41.00 (  22.64%)
       Lat 90.00th-qrtle-1        57.00 (   0.00%)       42.00 (  26.32%)
       Lat 95.00th-qrtle-1        63.00 (   0.00%)       43.00 (  31.75%)
       Lat 99.00th-qrtle-1        76.00 (   0.00%)       51.00 (  32.89%)
       Lat 99.50th-qrtle-1        89.00 (   0.00%)       52.00 (  41.57%)
       Lat 99.90th-qrtle-1        98.00 (   0.00%)       55.00 (  43.88%)
       Lat 50.00th-qrtle-2        42.00 (   0.00%)       42.00 (   0.00%)
       Lat 75.00th-qrtle-2        48.00 (   0.00%)       47.00 (   2.08%)
       Lat 90.00th-qrtle-2        53.00 (   0.00%)       52.00 (   1.89%)
       Lat 95.00th-qrtle-2        55.00 (   0.00%)       53.00 (   3.64%)
       Lat 99.00th-qrtle-2        62.00 (   0.00%)       60.00 (   3.23%)
       Lat 99.50th-qrtle-2        63.00 (   0.00%)       63.00 (   0.00%)
       Lat 99.90th-qrtle-2        68.00 (   0.00%)       66.00 (   2.94%)
      
      For higher worker thread counts, the differences become negligible, but
      it's interesting to note the difference in wakeup latency at low
      utilisation; mpstat confirms that activity was almost all on one node
      until the number of worker threads increases.
      
      Hackbench generally showed neutral results across a range of machines.
      This is different to earlier versions of the patch which allowed imbalances
      for higher degrees of utilisation. perf bench pipe showed negligible
      differences in overall performance as the differences are very close to
      the noise.
      
      An earlier prototype of the patch showed major regressions for NAS
      C-class when running with only half of the available CPUs -- 20-30%
      performance hits were measured at the time. With this version of the
      patch, the impact is negligible, with small gains/losses within the
      noise measured. This is because the number of threads far exceeds the
      small imbalance the patch cares about. Similarly, there were reports of
      regressions for the autonuma benchmark against earlier versions but
      again, normal load balancing now applies for that workload.
      
      In general, the patch simply seeks to avoid unnecessary cross-node
      migrations in the basic case where imbalances are very small.  For low
      utilisation communicating workloads, this patch generally behaves better
      with less NUMA balancing activity. For high utilisation, there is no
      change in behaviour.
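
      The mechanism is a small NUMA special case in the load balancer's
      calculate_imbalance(); a close paraphrase of the hunk (field names as
      in kernel/sched/fair.c, exact context hedged):

        /* Consider allowing a small imbalance between NUMA groups */
        if (env->sd->flags & SD_NUMA) {
                unsigned int imbalance_min;

                /*
                 * Allow a small imbalance based on a simple pair of
                 * communicating tasks that remain local when the
                 * source domain is almost idle.
                 */
                imbalance_min = 2;
                if (busiest->sum_nr_running <= imbalance_min)
                        env->imbalance = 0;
        }
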
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Phil Auld <pauld@redhat.com>
      Tested-by: Phil Auld <pauld@redhat.com>
      Link: https://lkml.kernel.org/r/20200114101319.GO3466@techsingularity.net
    • timers/nohz: Update NOHZ load in remote tick · ebc0f83c
      Peter Zijlstra (Intel) authored
      The way loadavg is tracked during nohz only pays attention to the load
      upon entering nohz.  This can be particularly noticeable if full nohz is
      entered while non-idle, and then the cpu goes idle and stays that way for
      a long time.
      
      Use the remote tick to ensure that full nohz cpus report their deltas
      within a reasonable time.
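
      A minimal sketch of the new helper (assuming the existing nohz load
      folding is factored into a calc_load_nohz_fold() helper, per the
      patch), called from sched_tick_remote() for the CPU whose tick is
      being run remotely:

        /* Fold a remote nohz CPU's load delta from the remote tick. */
        void calc_load_nohz_remote(struct rq *rq)
        {
                calc_load_nohz_fold(rq);
        }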
      
      [ swood: Added changelog and removed recheck of stopped tick. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Scott Wood <swood@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/1578736419-14628-3-git-send-email-swood@redhat.com
    • sched/core: Don't skip remote tick for idle CPUs · 488603b8
      Scott Wood authored
      This will be used in the next patch to get a loadavg update from
      nohz cpus.  The delta check is skipped because idle_sched_class
      doesn't update se.exec_start.
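
      A hedged sketch of the resulting sched_tick_remote() body: rather than
      bailing out early for idle CPUs, only the exec_start delta check is
      gated on the task being non-idle:

        curr = rq->curr;
        update_rq_clock(rq);

        if (!is_idle_task(curr)) {
                /*
                 * Make sure the next tick runs within a reasonable
                 * amount of time.
                 */
                delta = rq_clock_task(rq) - curr->se.exec_start;
                WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
        }
        curr->sched_class->task_tick(rq, curr, 0);
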
      Signed-off-by: Scott Wood <swood@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/1578736419-14628-2-git-send-email-swood@redhat.com
  4. 20 Jan, 2020 1 commit
  5. 17 Jan, 2020 14 commits
  6. 25 Dec, 2019 9 commits
  7. 23 Dec, 2019 2 commits
  8. 22 Dec, 2019 6 commits
    • Merge tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c6017471
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Fix a few bugs that could lead to corrupt files, fsck complaints, and
        filesystem crashes:
      
         - Minor documentation fixes
      
         - Fix a file corruption due to read racing with an insert range
           operation.
      
         - Fix log reservation overflows when allocating large rt extents
      
         - Fix a buffer log item flags check
      
         - Don't allow administrators to mount with sunit= options that will
           cause later xfs_repair complaints about the root directory being
           suspicious because the fs geometry appeared inconsistent
      
         - Fix a non-static helper that should have been static"
      
      * tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Make the symbol 'xfs_rtalloc_log_count' static
        xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
        xfs: split the sunit parameter update into two parts
        xfs: refactor agfl length computation function
        libxfs: resync with the userspace libxfs
        xfs: use bitops interface for buf log item AIL flag check
        xfs: fix log reservation overflows when allocating large rt extents
        xfs: stabilize insert range start boundary to avoid COW writeback race
        xfs: fix Sphinx documentation warning
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a3965607
      Linus Torvalds authored
      Pull ext4 bug fixes from Ted Ts'o:
       "Ext4 bug fixes, including a regression fix"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: clarify impact of 'commit' mount option
        ext4: fix unused-but-set-variable warning in ext4_add_entry()
        jbd2: fix kernel-doc notation warning
        ext4: use RCU API in debug_print_tree
        ext4: validate the debug_want_extra_isize mount option at parse time
        ext4: reserve revoke credits in __ext4_new_inode
        ext4: unlock on error in ext4_expand_extra_isize()
        ext4: optimize __ext4_check_dir_entry()
        ext4: check for directory entries too close to block end
        ext4: fix ext4_empty_dir() for directories with holes
    • Merge tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block · 44579f35
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Let's try this one again, this time without the compat_ioctl changes.
        We've got those fixed up, but that can go out next week.
      
        This contains:
      
         - block queue flush lockdep annotation (Bart)
      
         - Type fix for bsg_queue_rq() (Bart)
      
         - Three dasd fixes (Stefan, Jan)
      
         - nbd deadlock fix (Mike)
      
         - Error handling bio user map fix (Yang)
      
         - iocost fix (Tejun)
      
         - sbitmap waitqueue addition fix that affects the kyber IO scheduler
           (David)"
      
      * tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block:
        sbitmap: only queue kyber's wait callback if not already active
        block: fix memleak when __blk_rq_map_user_iov() is failed
        s390/dasd: fix typo in copyright statement
        s390/dasd: fix memleak in path handling error case
        s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
        block: Fix a lockdep complaint triggered by request queue flushing
        block: Fix the type of 'sts' in bsg_queue_rq()
        block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
        nbd: fix shutdown and recv work deadlock v2
        iocost: over-budget forced IOs should schedule async delay
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a313c8e0
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "PPC:
         - Fix a bug where we try to do an ultracall on a system without an
           ultravisor
      
        KVM:
         - Fix uninitialised sysreg accessor
         - Fix handling of demand-paged device mappings
         - Stop spamming the console on IMPDEF sysregs
         - Relax mappings of writable memslots
         - Assorted cleanups
      
        MIPS:
         - Now orphan, James Hogan is stepping down
      
        x86:
         - MAINTAINERS change, so long Radim and thanks for all the fish
         - supported CPUID fixes for AMD machines without SPEC_CTRL"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        MAINTAINERS: remove Radim from KVM maintainers
        MAINTAINERS: Orphan KVM for MIPS
        kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD
        kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
        KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor
        KVM: arm/arm64: Properly handle faulting of device mappings
        KVM: arm64: Ensure 'params' is initialised when looking up sys register
        KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region
        KVM: arm64: Don't log IMP DEF sysreg traps
        KVM: arm64: Sanely ratelimit sysreg messages
        KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()
        KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
        KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()
    • Merge tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 7214618c
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "Several fixes, and one cleanup, for RISC-V.
      
        Fixes:
      
         - Fix an error in a Kconfig file that resulted in an undefined
           Kconfig option "CONFIG_CONFIG_MMU"

         - Fix scratch register clearing in M-mode (affects nommu users)
      
         - Fix a mismerge on my part that broke the build for
           CONFIG_SPARSEMEM_VMEMMAP users
      
        Cleanup:
      
         - Move SiFive L2 cache-related code to drivers/soc, per request"
      
      * tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: move sifive_l2_cache.c to drivers/soc
        riscv: define vmemmap before pfn_to_page calls
        riscv: fix scratch register clearing in M-mode.
        riscv: Fix use of undefined config option CONFIG_CONFIG_MMU
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 78bac77b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
          including adding a missing ipv6 match description.
      
       2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
          Bhat.
      
       3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
      
       4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
      
       5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
      
       6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
          Chaignon.
      
       7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
      
        8) Fix established socket lookup race when socket goes from
           TCP_ESTABLISHED to TCP_LISTEN, because an intervening RCU grace
           period is missing. From Eric Dumazet.
      
       9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
      
      10) Fix active backup transition after link failure in bonding, from
          Mahesh Bandewar.
      
      11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
      
      12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
      
      13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
      
      14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
      
      15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
      
      16) Reject invalid MTU values in stmmac, from Jose Abreu.
      
      17) Fix refcount leak in error path of u32 classifier, from Davide
          Caratti.
      
      18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
          Kaseorg.
      
      19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
      
       20) Disable hardware GRO when XDP is attached to qede, from Manish
           Chopra.
      
      21) Since we encode state in the low pointer bits, dst metrics must be
          at least 4 byte aligned, which is not necessarily true on m68k. Add
          annotations to fix this, from Geert Uytterhoeven.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
        sfc: Include XDP packet headroom in buffer step size.
        sfc: fix channel allocation with brute force
        net: dst: Force 4-byte alignment of dst_metrics
        selftests: pmtu: fix init mtu value in description
        hv_netvsc: Fix unwanted rx_table reset
        net: phy: ensure that phy IDs are correctly typed
        mod_devicetable: fix PHY module format
        qede: Disable hardware gro when xdp prog is installed
        net: ena: fix issues in setting interrupt moderation params in ethtool
        net: ena: fix default tx interrupt moderation interval
        net/smc: unregister ib devices in reboot_event
        net: stmmac: platform: Fix MDIO init for platforms without PHY
        llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
        net: hisilicon: Fix a BUG trigered by wrong bytes_compl
        net: dsa: ksz: use common define for tag len
        s390/qeth: don't return -ENOTSUPP to userspace
        s390/qeth: fix promiscuous mode after reset
        s390/qeth: handle error due to unsupported transport mode
        cxgb4: fix refcount init for TC-MQPRIO offload
        tc-testing: initial tdc selftests for cls_u32
        ...