1. 01 Dec, 2023 3 commits
    • cgroup: Avoid false cacheline sharing of read mostly rstat_cpu · 77070eeb
      Waiman Long authored
      The rstat_cpu and also rstat_css_list of the cgroup structure are read
      mostly variables. However, they may share the same cacheline as the
      subsequent rstat_flush_next and *bstat variables which can be updated
      frequently.  That will slow down the cgroup_rstat_cpu() call which is
      called pretty frequently in the rstat code. Add a CACHELINE_PADDING()
      line in between them to avoid false cacheline sharing.
      
      A parallel kernel build on a 2-socket x86-64 server is used as the
      benchmarking tool for measuring the lock hold time. Below is the lock
      hold time frequency distribution before and after the patch:
      
            Run time        Before patch       After patch
            --------        ------------       -----------
             0-01 us         9,928,562          9,820,428
            01-05 us           110,151             50,935
            05-10 us               270                 93
            10-15 us               273                146
            15-20 us               135                 76
            20-25 us                 0                  2
            25-30 us                 1                  0
      
      It can be seen that the patch further pushes the lock hold time towards
      the lower end.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      77070eeb
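      The padding idea in the commit above can be illustrated in userspace C.
      This is a minimal sketch, not the kernel code: struct demo_cgroup, the
      64-byte CACHELINE_SIZE and cacheline_index() are assumptions made for
      illustration, and C11 alignas stands in for the kernel's
      CACHELINE_PADDING() macro.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE_SIZE 64  /* assumed line size; real hardware may differ */

/* Hypothetical, simplified stand-in for the cgroup fields named above. */
struct demo_cgroup {
    /* Read-mostly fields: written rarely, read on hot paths. */
    void *rstat_cpu;
    void *rstat_css_list;

    /*
     * Forcing cache-line alignment on the first frequently written field
     * starts a new line here, much like a padding member would.
     */
    alignas(CACHELINE_SIZE) void *rstat_flush_next;
    unsigned long bstat;
};

/* Return the index of the assumed cache line that holds a byte offset. */
static inline size_t cacheline_index(size_t offset)
{
    return offset / CACHELINE_SIZE;
}
```

      With this layout, offsetof() places the two read-mostly pointers on
      line 0 and rstat_flush_next at the start of line 1, so writes to the
      latter no longer invalidate the line that readers of rstat_cpu use.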
    • cgroup/rstat: Optimize cgroup_rstat_updated_list() · d499fd41
      Waiman Long authored
      The current design of cgroup_rstat_cpu_pop_updated() is to traverse
      the updated tree in a way to pop out the leaf nodes first before
      their parents. This can cause traversal of multiple nodes before a
      leaf node can be found and popped out. IOW, a given node in the tree
      can be visited multiple times before the whole operation is done. So
      it is not very efficient and the code can be hard to read.
      
      With the introduction of cgroup_rstat_updated_list() to build a list
      of cgroups to be flushed before any flushing is done, we can optimize
      the way the updated tree nodes are popped by pushing the parents to
      the tail end of the list before their children. In this way, most
      updated tree nodes will be visited only once, with the exception of
      the subtree root, as we still need to go back to its parent and pop
      it out of its updated_children list. This also makes the code easier
      to read.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      d499fd41
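      The parent-first push described in the commit above can be sketched on
      a plain tree. This is an illustration under stated assumptions, not the
      kernel implementation: struct node, MAX_NODES and build_flush_list()
      are hypothetical, and a breadth-first walk with an explicit queue is
      used here in place of the kernel's level-by-level traversal.

```c
#include <stddef.h>

#define MAX_NODES 16  /* arbitrary bound for this sketch */

/* Hypothetical tree node; field names only loosely echo the commit text. */
struct node {
    const char *name;
    struct node *first_child;   /* first child in the updated tree */
    struct node *next_sibling;  /* next child of the same parent */
    struct node *flush_next;    /* link in the resulting flush list */
};

/*
 * Walk the tree breadth-first and prepend each node to the list as it is
 * visited. Parents are visited (pushed) before their children, so every
 * child ends up ahead of its parent. Each node is visited exactly once.
 * The tree must hold at most MAX_NODES nodes.
 */
static struct node *build_flush_list(struct node *root)
{
    struct node *queue[MAX_NODES];
    size_t head = 0, tail = 0;
    struct node *list = NULL;

    if (!root)
        return NULL;
    queue[tail++] = root;

    while (head < tail) {
        struct node *n = queue[head++];

        n->flush_next = list;   /* push in front of earlier nodes */
        list = n;

        for (struct node *c = n->first_child; c; c = c->next_sibling)
            if (tail < MAX_NODES)
                queue[tail++] = c;
    }
    return list;
}
```

      Walking the returned list through flush_next reaches every child
      before its parent, which is the order a bottom-up flush needs.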
    • cgroup: Fix documentation for cpu.idle · 7b91eb60
      Josh Don authored
      Two problems:
      	- cpu.idle cgroups show up with 0 weight, correct the
      	  documentation to indicate this.
      	- cpu.idle has no entry describing it.
      Signed-off-by: Josh Don <joshdon@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      7b91eb60
  2. 28 Nov, 2023 1 commit
    • cgroup/cpuset: Expose cpuset.cpus.isolated · 877c737d
      Waiman Long authored
      The root-only cpuset.cpus.isolated control file shows the current set
      of isolated CPUs in isolated partitions. This control file is currently
      exposed only with the cgroup_debug boot command line option which also
      adds the ".__DEBUG__." prefix. This is actually a useful control file if
      users want to find out which CPUs are currently in an isolated state by
      the cpuset controller. Remove CFTYPE_DEBUG flag for this control file and
      make it available by default without any prefix.
      
      The test_cpuset_prs.sh test script and the cgroup-v2.rst documentation
      file are also updated accordingly. A minor code change is also made in
      test_cpuset_prs.sh to avoid a false test failure when running on a
      debug kernel.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      877c737d
  3. 22 Nov, 2023 2 commits
    • Merge branch 'for-6.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-6.8 · 20259566
      Tejun Heo authored
      cgroup/for-6.8 is carrying two workqueue changes to allow cpuset to restrict
      the CPUs used by unbound workqueues. Unfortunately, this conflicts with a
      new bug fix in wq/for-6.7-fixes. The conflict is contextual but can be a bit
      confusing to resolve. Pull the fix branch to resolve the conflict.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      20259566
    • workqueue: Make sure that wq_unbound_cpumask is never empty · 4a6c5607
      Tejun Heo authored
      During boot, depending on how the housekeeping and workqueue.unbound_cpus
      masks are set, wq_unbound_cpumask can end up empty. Since 8639eceb
      ("workqueue: Implement non-strict affinity scope for unbound workqueues"),
      this may end up feeding -1 as a CPU number into the scheduler, leading to
      oopses.
      
        BUG: unable to handle page fault for address: ffffffff8305e9c0
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        ...
        Call Trace:
         <TASK>
         select_idle_sibling+0x79/0xaf0
         select_task_rq_fair+0x1cb/0x7b0
         try_to_wake_up+0x29c/0x5c0
         wake_up_process+0x19/0x20
         kick_pool+0x5e/0xb0
         __queue_work+0x119/0x430
         queue_work_on+0x29/0x30
        ...
      
      An empty wq_unbound_cpumask is a clear misconfiguration and is already
      disallowed once the system has booted. Let's warn on and ignore
      unbound_cpumask restrictions which lead to no unbound cpus. While at it,
      also remove the now unnecessary empty check on wq_unbound_cpumask in
      wq_select_unbound_cpu().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-Tested-by: Yong He <alexyonghe@tencent.com>
      Link: http://lkml.kernel.org/r/20231120121623.119780-1-alexyonghe@tencent.com
      Fixes: 8639eceb ("workqueue: Implement non-strict affinity scope for unbound workqueues")
      Cc: stable@vger.kernel.org # v6.6+
      Reviewed-by: Waiman Long <longman@redhat.com>
      4a6c5607
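      The "warn on and ignore" behaviour described in the commit above can be
      modelled with plain bitmasks. This is a rough sketch, assuming a 64-bit
      integer in place of the kernel's cpumask type; restrict_unbound_mask()
      is a hypothetical name and the warning text is invented for
      illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Apply a restriction to a CPU bitmask (bit n set means CPU n is allowed).
 * If the restriction would leave no CPU at all, warn and keep the current
 * mask unchanged, so the result is never empty as long as the input is not.
 */
static uint64_t restrict_unbound_mask(uint64_t current, uint64_t restriction)
{
    uint64_t restricted = current & restriction;

    if (restricted == 0) {
        fprintf(stderr, "warning: restriction 0x%llx leaves no CPUs, ignoring\n",
                (unsigned long long)restriction);
        return current;
    }
    return restricted;
}
```

      Because the result can no longer be empty, a later "pick any CPU from
      the mask" step has no reason to re-check for emptiness, which matches
      the check the commit says it removes.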
  4. 21 Nov, 2023 1 commit
  5. 12 Nov, 2023 6 commits
    • cgroup/rstat: Reduce cpu_lock hold time in cgroup_rstat_flush_locked() · e76d28bd
      Waiman Long authored
      When cgroup_rstat_updated() isn't being called concurrently with
      cgroup_rstat_flush_locked(), its run time is pretty short. When
      both are called concurrently, the cgroup_rstat_updated() run time
      can spike to a pretty high value due to high cpu_lock hold time in
      cgroup_rstat_flush_locked(). This can be problematic if the task calling
      cgroup_rstat_updated() is a realtime task running on an isolated CPU
      with a strict latency requirement. The cgroup_rstat_updated() call can
      happen when there is a page fault even though the task is running in
      user space most of the time.
      
      The percpu cpu_lock is used to protect the update tree -
      updated_next and updated_children. This protection is only needed when
      cgroup_rstat_cpu_pop_updated() is being called. The subsequent flushing
      operation which can take a much longer time does not need that protection
      as it is already protected by cgroup_rstat_lock.
      
      To reduce the cpu_lock hold time, we need to perform all the
      cgroup_rstat_cpu_pop_updated() calls up front with the lock
      released afterward before doing any flushing. This patch adds a new
      cgroup_rstat_updated_list() function to return a singly linked list of
      cgroups to be flushed.
      
      Some instrumentation code is added to measure the cpu_lock hold time
      from right after lock acquisition to right after lock release. A
      parallel kernel build on a 2-socket x86-64 server is used as the
      benchmarking tool for measuring the lock hold time.
      
      The maximum cpu_lock hold times before and after the patch are 100us
      and 29us respectively. So the worst case time is reduced to about 30%
      of the original. However, there may be some OS or hardware noise like
      NMI or SMI in the test system that can worsen the worst case value.
      Such noise is usually tuned out in a real production environment to
      get a better result.
      
      OTOH, the lock hold time frequency distribution should give a better
      idea of the performance benefit of the patch.  Below is the frequency
      distribution before and after the patch:
      
           Hold time        Before patch       After patch
           ---------        ------------       -----------
             0-01 us           804,139         13,738,708
            01-05 us         9,772,767          1,177,194
            05-10 us         4,595,028              4,984
            10-15 us           303,481              3,562
            15-20 us            78,971              1,314
            20-25 us            24,583                 18
            25-30 us             6,908                 12
            30-40 us             8,015
            40-50 us             2,192
            50-60 us               316
            60-70 us                43
            70-80 us                 7
            80-90 us                 2
              >90 us                 3
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      e76d28bd
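      The general pattern behind the commit above, collecting work under a
      short lock and processing it after release, can be sketched in
      userspace C. This is only an analogy under stated assumptions: struct
      item, struct pending and the pending_*() helpers are hypothetical, a
      C11 atomic_flag spinlock stands in for the percpu cpu_lock, and summing
      values stands in for the expensive flush.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item; not a kernel structure. */
struct item {
    int value;
    struct item *next;
};

/* A pending list guarded by a tiny spinlock standing in for cpu_lock. */
struct pending {
    atomic_flag lock;
    struct item *head;
};

static void pending_lock(struct pending *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void pending_unlock(struct pending *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Producer side: short critical section that links one item in. */
static void pending_add(struct pending *p, struct item *it)
{
    pending_lock(p);
    it->next = p->head;
    p->head = it;
    pending_unlock(p);
}

/*
 * Consumer side: hold the lock only long enough to detach the whole list,
 * then do the slow per-item work with the lock released.
 */
static long pending_flush(struct pending *p)
{
    struct item *list;
    long sum = 0;

    pending_lock(p);
    list = p->head;
    p->head = NULL;
    pending_unlock(p);

    for (struct item *it = list; it; it = it->next)
        sum += it->value;   /* stands in for the expensive flush */
    return sum;
}
```

      A producer calling pending_add() while a flush is in progress waits
      only for the brief detach step, not for the per-item work, which is
      the effect the hold-time table in the commit reports.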
    • cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask · 72c6303a
      Waiman Long authored
      To make CPUs in an isolated cpuset partition closer in isolation to
      the boot time isolated CPUs specified in the "isolcpus" boot command
      line option, we need to take those CPUs out of the workqueue unbound
      cpumask so that work functions from the unbound workqueues won't run
      on those CPUs.  Otherwise, they will interfere with the user tasks
      running on those isolated CPUs.
      
      With the introduction of the workqueue_unbound_exclude_cpumask() helper
      function in an earlier commit, those isolated CPUs can now be taken
      out from the workqueue unbound cpumask.
      
      This patch also updates cgroup-v2.rst to mention that isolated
      CPUs will be excluded from unbound workqueue cpumask as well as
      updating test_cpuset_prs.sh to verify the correctness of the new
      *cpuset.cpus.isolated file, if available via cgroup_debug option.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      72c6303a
    • cgroup/cpuset: Keep track of CPUs in isolated partitions · 11e5f407
      Waiman Long authored
      Add a new internal isolated_cpus mask to keep track of the CPUs that are in
      isolated partitions. Expose that new cpumask as a new root-only control file
      ".cpuset.cpus.isolated".
      
      tj: Updated patch description to reflect dropping __DEBUG__ prefix.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      11e5f407
    • selftests/cgroup: Minor code cleanup and reorganization of test_cpuset_prs.sh · 14060dfc
      Waiman Long authored
      Minor cleanup of test matrix and relocation of test_isolated() function
      to prepare for the next patch. There is no functional change.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      14060dfc
    • workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask · fe28f631
      Waiman Long authored
      When the "isolcpus" boot command line option is used to add a set
      of isolated CPUs, those CPUs will be excluded automatically from
      wq_unbound_cpumask to avoid running work functions from unbound
      workqueues.
      
      Recently cpuset has been extended to allow the creation of partitions
      of isolated CPUs dynamically. To make it closer to the "isolcpus"
      in functionality, the CPUs in those isolated cpuset partitions should be
      excluded from wq_unbound_cpumask as well. This can be done currently by
      explicitly writing to the workqueue's cpumask sysfs file after creating
      the isolated partitions. However, this process can be error prone.
      
      Ideally, the cpuset code should be allowed to request the workqueue code
      to exclude those isolated CPUs from wq_unbound_cpumask so that this
      operation can be done automatically and the isolated CPUs will be
      returned to wq_unbound_cpumask after the destruction of the isolated
      cpuset partitions.
      
      This patch adds a new workqueue_unbound_exclude_cpumask() function to
      enable that. This new function will exclude the specified isolated
      CPUs from wq_unbound_cpumask. To be able to restore those isolated
      CPUs back after the destruction of isolated cpuset partitions, a new
      wq_requested_unbound_cpumask is added to store the user provided unbound
      cpumask either from the boot command line options or from writing to
      the cpumask sysfs file. This new cpumask provides the basis for CPU
      exclusion.
      
      To enable users to understand how the wq_unbound_cpumask is being
      modified internally, this patch also exposes the newly introduced
      wq_requested_unbound_cpumask as well as a wq_isolated_cpumask to
      store the cpumask to be excluded from wq_unbound_cpumask as read-only
      sysfs files.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      fe28f631
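      The relationship between the three masks named in the commit above can
      be modelled with integers. This is a minimal sketch, assuming 64-bit
      bitmaps in place of kernel cpumasks; struct wq_masks and exclude_cpus()
      are hypothetical, and the case where the exclusion would empty the
      effective mask is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical model of the three masks named above, as 64-bit bitmaps. */
struct wq_masks {
    uint64_t requested;  /* what the user asked for (boot option or sysfs) */
    uint64_t isolated;   /* CPUs currently excluded by isolated partitions */
    uint64_t unbound;    /* effective mask: requested minus isolated */
};

/*
 * Record a new exclusion set and recompute the effective mask from the
 * requested mask, so that clearing the exclusion restores every CPU.
 */
static void exclude_cpus(struct wq_masks *m, uint64_t isolated)
{
    m->isolated = isolated;
    m->unbound = m->requested & ~isolated;
}
```

      Keeping the requested mask separate is what makes restoration possible:
      the effective mask is always derived from it rather than edited in
      place, so calling exclude_cpus() with an empty set returns the
      effective mask to the requested one.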
    • Atul Kumar Pant's avatar
      421fc858
  6. 09 Nov, 2023 10 commits
    • cgroup: Add a new helper for cgroup1 hierarchy · aecd408b
      Yafang Shao authored
      A new helper is added for cgroup1 hierarchy:
      
      - task_get_cgroup1
        Acquires the associated cgroup of a task within a specific cgroup1
        hierarchy. The cgroup1 hierarchy is identified by its hierarchy ID.
      
      This helper function is added to facilitate the tracing of tasks within
      a particular container or cgroup dir in BPF programs. It's important to
      note that this helper is designed specifically for cgroup1 only.
      
      tj: Use irqsave/restore as suggested by Hou Tao <houtao@huaweicloud.com>.
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Cc: Hou Tao <houtao@huaweicloud.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      aecd408b
    • cgroup: Add annotation for holding namespace_sem in current_cgns_cgroup_from_root() · 0008454e
      Yafang Shao authored
      When I initially examined the function current_cgns_cgroup_from_root(), I
      was perplexed by its lack of holding cgroup_mutex. However, after Michal
      explained the reason[0] to me, I realized that it already holds the
      namespace_sem. I believe this intricacy could also confuse others, so it
      would be advisable to include an annotation for clarification.
      
      After we replace the cgroup_mutex with RCU read lock, if current doesn't
      hold the namespace_sem, the root cgroup will be NULL. So let's add a
      WARN_ON_ONCE() for it.
      
      [0]. https://lore.kernel.org/bpf/afdnpo3jz2ic2ampud7swd6so5carkilts2mkygcaw67vbw6yh@5b5mncf7qyet
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Cc: Michal Koutny <mkoutny@suse.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      0008454e
    • cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show() · 9067d900
      Yafang Shao authored
      The cgroup root_list is already RCU-safe. Therefore, we can replace the
      cgroup_mutex with the RCU read lock in some particular paths. This change
      will be particularly beneficial for frequent operations, such as
      `cat /proc/self/cgroup`, in a cgroup1-based container environment.
      
      I did stress tests with this change, as outlined below
      (with CONFIG_PROVE_RCU_LIST enabled):
      
      - Continuously mounting and unmounting named cgroups in some tasks,
        for example:
      
        cgrp_name=$1
        while true
        do
            mount -t cgroup -o none,name=$cgrp_name none /$cgrp_name
            umount /$cgrp_name
        done
      
      - Continuously triggering proc_cgroup_show() in some tasks concurrently,
        for example:
        while true; do cat /proc/self/cgroup > /dev/null; done
      
      They ran successfully after implementing this change, with no RCU
      warnings in dmesg.
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      9067d900
    • cgroup: Make operations on the cgroup root_list RCU safe · d23b5c57
      Yafang Shao authored
      At present, when we perform operations on the cgroup root_list, we must
      hold the cgroup_mutex, which is a relatively heavyweight lock. In reality,
      we can make operations on this list RCU-safe, eliminating the need to hold
      the cgroup_mutex during traversal. Modifications to the list only occur in
      the cgroup root setup and destroy paths, which should be infrequent in a
      production environment. In contrast, traversal may occur frequently.
      Therefore, making it RCU-safe would be beneficial.
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      d23b5c57
    • cgroup: Remove unnecessary list_empty() · 96a2b48e
      Yafang Shao authored
      The root hasn't been removed from the root_list, so the list can't be NULL.
      However, if it had been removed, attempting to destroy it once more is not
      possible. Let's replace this with WARN_ON_ONCE() for clarity.
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      96a2b48e
    • Merge tag 'input-for-v6.7-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · a12deb44
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - a number of input drivers have been converted to use facilities
         provided by the device core to instantiate driver-specific attributes
         instead of using devm_device_add_group() and similar APIs
      
       - platform input devices have been converted to use remove() callback
         returning void
      
       - a fix for use-after-free when tearing down a Synaptics RMI device
      
       - a few flexible arrays in input structures have been annotated with
         __counted_by to help hardening efforts
      
       - handling of vddio supply in cyttsp5 driver
      
       - other miscellaneous fixups
      
      * tag 'input-for-v6.7-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (86 commits)
        Input: walkera0701 - use module_parport_driver macro to simplify the code
        Input: synaptics-rmi4 - fix use after free in rmi_unregister_function()
        dt-bindings: input: fsl,scu-key: Document wakeup-source
        Input: cyttsp5 - add handling for vddio regulator
        dt-bindings: input: cyttsp5: document vddio-supply
        Input: tegra-kbc - use device_get_match_data()
        Input: Annotate struct ff_device with __counted_by
        Input: axp20x-pek - avoid needless newline removal
        Input: mt - annotate struct input_mt with __counted_by
        Input: leds - annotate struct input_leds with __counted_by
        Input: evdev - annotate struct evdev_client with __counted_by
        Input: synaptics-rmi4 - replace deprecated strncpy
        Input: wm97xx-core - convert to platform remove callback returning void
        Input: wm831x-ts - convert to platform remove callback returning void
        Input: ti_am335x_tsc - convert to platform remove callback returning void
        Input: sun4i-ts - convert to platform remove callback returning void
        Input: stmpe-ts - convert to platform remove callback returning void
        Input: pcap_ts - convert to platform remove callback returning void
        Input: mc13783_ts - convert to platform remove callback returning void
        Input: mainstone-wm97xx - convert to platform remove callback returning void
        ...
      a12deb44
    • Merge tag 'for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · ace92fd9
      Linus Torvalds authored
      Pull more i2c updates from Wolfram Sang:
       "This contains one patch which slipped through the cracks (iproc), a
        core sanitizing improvement as the new memdup_array_user() helper went
        upstream (i2c-dev), and two driver bugfixes (designware, cp2615)"
      
      * tag 'for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: cp2615: Fix 'assignment to __be16' warning
        i2c: dev: copy userspace array safely
        i2c: designware: Disable TX_EMPTY irq while waiting for block length byte
        i2c: iproc: handle invalid slave state
      ace92fd9
    • Merge tag 'linux-watchdog-6.7-rc1' of git://www.linux-watchdog.org/linux-watchdog · 12418ece
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - add support for Amlogic C3 and S4 SoCs
      
       - add IT8613 ID
      
       - add MSM8226 and MSM8974 compatibles
      
       - other small fixes and improvements
      
      * tag 'linux-watchdog-6.7-rc1' of git://www.linux-watchdog.org/linux-watchdog: (24 commits)
        dt-bindings: watchdog: Add support for Amlogic C3 and S4 SoCs
        watchdog: mlx-wdt: Parameter desctiption warning fix
        watchdog: aspeed: Add support for aspeed,reset-mask DT property
        dt-bindings: watchdog: aspeed-wdt: Add aspeed,reset-mask property
        watchdog: apple: Deactivate on suspend
        dt-bindings: watchdog: qcom-wdt: Add MSM8226 and MSM8974 compatibles
        dt-bindings: watchdog: fsl-imx7ulp-wdt: Add 'fsl,ext-reset-output'
        wdog: imx7ulp: Enable wdog int_en bit for watchdog any reset
        drivers: watchdog: marvell_gti: Program the max_hw_heartbeat_ms
        drivers: watchdog: marvell_gti: fix zero pretimeout handling
        watchdog: marvell_gti: Replace of_platform.h with explicit includes
        watchdog: imx_sc_wdt: continue if the wdog already enabled
        watchdog: st_lpc: Use device_get_match_data()
        watchdog: wdat_wdt: Add timeout value as a param in ping method
        watchdog: gpio_wdt: Make use of device properties
        sbsa_gwdt: Calculate timeout with 64-bit math
        watchdog: ixp4xx: Make sure restart always works
        watchdog: it87_wdt: add IT8613 ID
        watchdog: marvell_gti_wdt: Fix error code in probe()
        Watchdog: marvell_gti_wdt: Remove redundant dev_err_probe() for platform_get_irq()
        ...
      12418ece
    • Merge tag 'pwm/for-6.7-rc1' of... · f3bfe643
      Linus Torvalds authored
      Merge tag 'pwm/for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "This contains a few fixes and a bunch of cleanups, a lot of which is
        in preparation for Uwe's character device support that may be ready in
        time for the next merge window"
      
      * tag 'pwm/for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (37 commits)
        pwm: samsung: Document new member .channel in struct samsung_pwm_chip
        pwm: bcm2835: Add support for suspend/resume
        pwm: brcmstb: Checked clk_prepare_enable() return value
        pwm: brcmstb: Utilize appropriate clock APIs in suspend/resume
        pwm: pxa: Explicitly include correct DT includes
        pwm: cros-ec: Simplify using devm_pwmchip_add() and dev_err_probe()
        pwm: samsung: Consistently use the same name for driver data
        pwm: vt8500: Simplify using devm functions
        pwm: sprd: Simplify using devm_pwmchip_add() and dev_err_probe()
        pwm: sprd: Provide a helper to cast a chip to driver data
        pwm: spear: Simplify using devm functions
        pwm: mtk-disp: Simplify using devm_pwmchip_add()
        pwm: imx-tpm: Simplify using devm functions
        pwm: brcmstb: Simplify using devm functions
        pwm: bcm2835: Simplify using devm functions
        pwm: bcm-iproc: Simplify using devm functions
        pwm: Adapt sysfs API documentation to reality
        pwm: dwc: add PWM bit unset in get_state call
        pwm: dwc: make timer clock configurable
        pwm: dwc: split pci out of core driver
        ...
      f3bfe643
    • Merge tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 4bbdb725
      Linus Torvalds authored
      Pull iommu updates from Joerg Roedel:
       "Core changes:
         - Make default-domains mandatory for all IOMMU drivers
         - Remove group refcounting
         - Add generic_single_device_group() helper and consolidate drivers
         - Cleanup map/unmap ops
         - Scaling improvements for the IOVA rcache depot
         - Convert dart & iommufd to the new domain_alloc_paging()
      
        ARM-SMMU:
         - Device-tree binding update:
             - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
         - SMMUv2:
             - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
         - SMMUv3:
             - Large refactoring of the context descriptor code to move the CD
               table into the master, paving the way for '->set_dev_pasid()'
               support on non-SVA domains
         - Minor cleanups to the SVA code
      
        Intel VT-d:
         - Enable debugfs to dump domain attached to a pasid
         - Remove an unnecessary inline function
      
        AMD IOMMU:
         - Initial patches for SVA support (not complete yet)
      
        S390 IOMMU:
         - DMA-API conversion and optimized IOTLB flushing
      
        And some smaller fixes and improvements"
      
      * tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits)
        iommu/dart: Remove the force_bypass variable
        iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()
        iommu/dart: Convert to domain_alloc_paging()
        iommu/dart: Move the blocked domain support to a global static
        iommu/dart: Use static global identity domains
        iommufd: Convert to alloc_domain_paging()
        iommu/vt-d: Use ops->blocked_domain
        iommu/vt-d: Update the definition of the blocking domain
        iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain
        Revert "iommu/vt-d: Remove unused function"
        iommu/amd: Remove DMA_FQ type from domain allocation path
        iommu: change iommu_map_sgtable to return signed values
        iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
        iommu/vt-d: debugfs: Support dumping a specified page table
        iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
        iommu/vt-d: debugfs: Dump entry pointing to huge page
        iommu/vt-d: Remove unused function
        iommu/arm-smmu-v3-sva: Remove bond refcount
        iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle
        iommu/arm-smmu-v3: Rename cdcfg to cd_table
        ...
      4bbdb725
  7. 08 Nov, 2023 17 commits
    • Merge tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 6bc986ab
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Bugfixes:
      
         - SUNRPC:
             - re-probe the target RPC port after an ECONNRESET error
             - handle allocation errors from rpcb_call_async()
             - fix a use-after-free condition in rpc_pipefs
             - fix up various checks for timeouts
      
         - NFSv4.1:
             - Handle NFS4ERR_DELAY errors during session trunking
             - fix SP4_MACH_CRED protection for pnfs IO
      
         - NFSv4:
             - Ensure that we test all delegations when the server notifies
               us that it may have revoked some of them
      
        Features:
      
         - Allow knfsd processes to break out of NFS4ERR_DELAY loops when
           re-exporting NFSv4.x by setting appropriate values for the
           'delay_retrans' module parameter
      
         - nfs: Convert nfs_symlink() to use a folio"
      
      * tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: Convert nfs_symlink() to use a folio
        SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
        NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO
        SUNRPC: Add an IS_ERR() check back to where it was
        NFSv4.1: fix handling NFS4ERR_DELAY when testing for session trunking
        nfs41: drop dependency between flexfiles layout driver and NFSv3 modules
        NFSv4: fairly test all delegations on a SEQ4_ revocation
        SUNRPC: SOFTCONN tasks should time out when on the sending list
        SUNRPC: Force close the socket when a hard error is reported
        SUNRPC: Don't skip timeout checks in call_connect_status()
        SUNRPC: ECONNRESET might require a rebind
        NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
        NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY
      6bc986ab
    • Merge tag 'exfat-for-6.7-rc1-part2' of... · 67c0afb6
      Linus Torvalds authored
      Merge tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
      
      Pull exfat updates from Namjae Jeon:
      
       - Fix an issue where exfat timestamps are not updated, caused by the
         new timestamp accessor function patch
      
      * tag 'exfat-for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
        exfat: fix ctime is not updated
        exfat: fix setting uninitialized time to ctime/atime
      67c0afb6
    • Merge tag 'xfs-6.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 34f76326
      Linus Torvalds authored
      Pull xfs updates from Chandan Babu:
      
       - Realtime device subsystem:
          - Cleanup usage of xfs_rtblock_t and xfs_fsblock_t data types
          - Replace open coded conversions between rt blocks and rt extents
            with calls to static inline helpers
          - Replace open coded realtime geometry computation and macros with
            helper functions
          - CPU usage optimizations for realtime allocator
          - Misc bug fixes associated with Realtime device
      
       - Allow read operations to execute while an FICLONE ioctl is being
         serviced
      
       - Misc bug fixes:
          - Alert user when xfs_droplink() encounters an inode with a link
            count of zero
          - Handle the case where the allocator could return zero extents when
            servicing an fallocate request
      
      * tag 'xfs-6.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (40 commits)
        xfs: allow read IO and FICLONE to run concurrently
        xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
        xfs: introduce protection for drop nlink
        xfs: don't look for end of extent further than necessary in xfs_rtallocate_extent_near()
        xfs: don't try redundant allocations in xfs_rtallocate_extent_near()
        xfs: limit maxlen based on available space in xfs_rtallocate_extent_near()
        xfs: return maximum free size from xfs_rtany_summary()
        xfs: invert the realtime summary cache
        xfs: simplify rt bitmap/summary block accessor functions
        xfs: simplify xfs_rtbuf_get calling conventions
        xfs: cache last bitmap block in realtime allocator
        xfs: use accessor functions for summary info words
        xfs: consolidate realtime allocation arguments
        xfs: create helpers for rtsummary block/wordcount computations
        xfs: use accessor functions for bitmap words
        xfs: create helpers for rtbitmap block/wordcount computations
        xfs: create a helper to handle logging parts of rt bitmap/summary blocks
        xfs: convert rt summary macros to helpers
        xfs: convert open-coded xfs_rtword_t pointer accesses to helper
        xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros
        ...
      34f76326
    • Konstantin Ryabitsev's avatar
      MAINTAINERS: update lists.linuxfoundation.org migrated lists · 6d795e2a
      Konstantin Ryabitsev authored
      The mailman-2 system behind lists.linux[-]foundation.org is being
      retired, so the lists are being migrated to lists.linux.dev.
      
      Since both domains belong to LF and setting up proper forwards is
      possible, the old addresses will continue to work for a while, but all
      new patches should be sent to the new canonical addresses for each list.
      Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d795e2a
    • Linus Torvalds's avatar
      Merge tag 's390-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 1995a536
      Linus Torvalds authored
      Pull more s390 updates from Vasily Gorbik:
      
       - Get rid of s390 specific use of two PTEs per 4KB page with complex
         half-used pages tracking. Using full 4KB pages for 2KB PTEs increases
         the memory footprint of page tables but drastically simplifies mm code,
         removing a common blocker for common code changes and adaptations
      
       - Simplify and rework "cmma no-dat" handling. This is a follow up for
         recent fixes which prevent potential incorrect guest TLB flushes
      
       - Add perf user stack unwinding as well as USER_STACKTRACE support for
         user space built with -mbackchain compile option
      
       - Add a few missing conversions from tlb_remove_table to tlb_remove_ptdesc
      
       - Fix crypto cards vanishing in a secure execution environment due to
         asynchronous errors
      
       - Avoid reporting crypto cards or queues in check-stop state as online
      
       - Fix null-ptr dereference in AP bus code triggered by early config
         change via SCLP
      
       - Couple of stability improvements in AP queue interrupt handling
      
      * tag 's390-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/mm: make pte_free_tlb() similar to pXd_free_tlb()
        s390/mm: use compound page order to distinguish page tables
        s390/mm: use full 4KB page for 2KB PTE
        s390/cmma: rework no-dat handling
        s390/cmma: move arch_set_page_dat() to header file
        s390/cmma: move set_page_stable() and friends to header file
        s390/cmma: move parsing of cmma kernel parameter to early boot code
        s390/cmma: cleanup inline assemblies
        s390/ap: fix vanishing crypto cards in SE environment
        s390/zcrypt: don't report online if card or queue is in check-stop state
        s390: add USER_STACKTRACE support
        s390/perf: implement perf_callchain_user()
        s390/ap: fix AP bus crash on early config change callback invocation
        s390/ap: re-enable interrupt for AP queues
        s390/ap: rework to use irq info from ap queue status
        s390/mm: add missing conversion to use ptdescs
      1995a536
    • Linus Torvalds's avatar
      Merge tag 'rcu-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks · 90450a06
      Linus Torvalds authored
      Pull RCU fixes from Frederic Weisbecker:
      
       - Fix a lock inversion between scheduler and RCU introduced in
         v6.2-rc4. The scenario could trigger on any user of RCU_NOCB
         (mostly Android but also nohz_full)
      
       - Fix PF_IDLE semantic changes introduced in v6.6-rc3 breaking
         some RCU-Tasks and RCU-Tasks-Trace expectations as to what
         exactly is an idle task. This resulted in potential spurious
         stalls and warnings.
      
      * tag 'rcu-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
        rcu/tasks-trace: Handle new PF_IDLE semantics
        rcu/tasks: Handle new PF_IDLE semantics
        rcu: Introduce rcu_cpu_online()
        rcu: Break rcu_node_0 --> &rq->__lock order
      90450a06
    • Linus Torvalds's avatar
      Merge tag 'memblock-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · 447cec03
      Linus Torvalds authored
      Pull memblock update from Mike Rapoport:
       "Report failures when memblock_can_resize is not set.
      
        Numerous memblock reservations at early boot may exhaust static
        memblock.reserved array and it is unnoticed because most of the
        callers don't check memblock_reserve() return value.
      
        In this case the system will crash later, but the reason is hard to
        identify.
      
        Replace return of an error with panic() when memblock.reserved is
        exhausted before it can be resized"
      
      * tag 'memblock-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
        memblock: report failures when memblock_can_resize is not set
      447cec03
    • Linus Torvalds's avatar
      Merge tag 'kgdb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · c1ef4df1
      Linus Torvalds authored
      Pull kgdb updates from Daniel Thompson:
       "Just two patches for you this time!
      
         - During a panic, flush the console before entering kgdb.
      
           This makes things a little easier to comprehend, especially if an
           NMI backtrace was triggered on all CPUs just before we enter the
           panic routines
      
         - Correcting a couple of misleading (a.k.a. plain wrong) comments"
      
      * tag 'kgdb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Corrects comment for kdballocenv
        kgdb: Flush console before entering kgdb on panic
      c1ef4df1
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d46392bb
      Linus Torvalds authored
      Pull RISC-V updates from Palmer Dabbelt:
      
       - Support for cbo.zero in userspace
      
       - Support for CBOs on ACPI-based systems
      
       - A handful of improvements for the T-Head cache flushing ops
      
       - Support for software shadow call stacks
      
       - Various cleanups and fixes
      
      * tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
        RISC-V: hwprobe: Fix vDSO SIGSEGV
        riscv: configs: defconfig: Enable configs required for RZ/Five SoC
        riscv: errata: prefix T-Head mnemonics with th.
        riscv: put interrupt entries into .irqentry.text
        riscv: mm: Update the comment of CONFIG_PAGE_OFFSET
        riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause
        riscv/mm: Fix the comment for swap pte format
        RISC-V: clarify the QEMU workaround in ISA parser
        riscv: correct pt_level name via pgtable_l5/4_enabled
        RISC-V: Provide pgtable_l5_enabled on rv32
        clocksource: timer-riscv: Increase rating of clock_event_device for Sstc
        clocksource: timer-riscv: Don't enable/disable timer interrupt
        lkdtm: Fix CFI_BACKWARD on RISC-V
        riscv: Use separate IRQ shadow call stacks
        riscv: Implement Shadow Call Stack
        riscv: Move global pointer loading to a macro
        riscv: Deduplicate IRQ stack switching
        riscv: VMAP_STACK overflow detection thread-safe
        RISC-V: cacheflush: Initialize CBO variables on ACPI systems
        RISC-V: ACPI: RHCT: Add function to get CBO block sizes
        ...
      d46392bb
    • Bence Csókás's avatar
      i2c: cp2615: Fix 'assignment to __be16' warning · bdba49cb
      Bence Csókás authored
      While the preamble field _is_ technically big-endian, its value is always 0x2A2A,
      which is the same in either endianness. However, to avoid generating a warning,
      we should still call `htons()` explicitly.
      Signed-off-by: Bence Csókás <bence98@sch.bme.hu>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      bdba49cb
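
The endianness argument in the commit above can be checked with a minimal userspace sketch. It assumes POSIX `htons()` from `<arpa/inet.h>`; the names `PREAMBLE_VALUE`, `swap16()`, `is_endian_neutral()` and `preamble_be()` are hypothetical and do not come from the cp2615 driver.

```c
#include <arpa/inet.h> /* htons(), POSIX */
#include <stdint.h>

/* Illustrative name; the value 0x2A2A is the one quoted in the commit message. */
#define PREAMBLE_VALUE 0x2A2A

/* Swap the two bytes of a 16-bit value unconditionally. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Nonzero when v has the same representation in both byte orders. */
int is_endian_neutral(uint16_t v)
{
    return swap16(v) == v;
}

/* The explicit conversion documents intent even though the value is unchanged. */
uint16_t preamble_be(void)
{
    return htons(PREAMBLE_VALUE);
}
```

Because both bytes of 0x2A2A are 0x2A, `preamble_be()` returns the same value on little-endian and big-endian hosts, so the explicit call only serves to satisfy the type checker.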
    • Philipp Stanner's avatar
      i2c: dev: copy userspace array safely · cc9c5423
      Philipp Stanner authored
      i2c-dev.c utilizes memdup_user() to copy a userspace array. This is done
      without an overflow check.
      
      Use the new wrapper memdup_array_user() to copy the array more safely.
      Suggested-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Philipp Stanner <pstanner@redhat.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      cc9c5423
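
The kind of overflow check the commit above refers to can be sketched in userspace C. This is an analogue under stated assumptions, not the kernel's memdup_array_user() implementation: `dup_array_checked()` is a hypothetical helper that uses `malloc()` and `memcpy()` in place of a copy from userspace.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate an array of n elements of the given size,
 * refusing the request when n * size would overflow size_t.
 */
void *dup_array_checked(const void *src, size_t n, size_t size)
{
    size_t bytes;
    void *dst;

    /* Reject before multiplying, so the product can never wrap around. */
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;

    bytes = n * size;
    dst = malloc(bytes ? bytes : 1); /* avoid a zero-sized allocation */
    if (dst && bytes)
        memcpy(dst, src, bytes);
    return dst;
}
```

Without the check, a large element count could make `n * size` wrap to a small number, so the buffer would be allocated far smaller than the caller expects.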
    • Tam Nguyen's avatar
      i2c: designware: Disable TX_EMPTY irq while waiting for block length byte · e8183fa1
      Tam Nguyen authored
      During SMBus block data read process, we have seen high interrupt rate
      because of TX_EMPTY irq status while waiting for block length byte (the
      first data byte after the address phase). The interrupt handler does not
      do anything because the internal state is kept as STATUS_WRITE_IN_PROGRESS.
      Hence, we should disable TX_EMPTY IRQ until I2C DesignWare receives
      first data byte from I2C device, then re-enable it to resume SMBus
      transaction.
      
      It takes 0.789 ms for the host to receive the data length from the slave.
      Without the patch, the TX_EMPTY interrupt calls i2c_dw_isr() 99 times.
      With the patch applied, it is not called at all.
      
      Cc: stable@vger.kernel.org
      Co-developed-by: Chuong Tran <chuong@os.amperecomputing.com>
      Signed-off-by: Chuong Tran <chuong@os.amperecomputing.com>
      Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      e8183fa1
    • Roman Bacik's avatar
      i2c: iproc: handle invalid slave state · ba15a143
      Roman Bacik authored
      Add the code to handle an invalid state when both bits S_RX_EVENT
      (indicating a transaction) and S_START_BUSY (indicating the end
      of transaction - transition of START_BUSY from 1 to 0) are set in
      the interrupt status register during a slave read.
      Signed-off-by: Roman Bacik <roman.bacik@broadcom.com>
      Fixes: 1ca1b451 ("i2c: iproc: handle Master aborted error")
      Acked-by: Ray Jui <ray.jui@broadcom.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      ba15a143
    • Linus Torvalds's avatar
      Merge tag 'pm-6.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 30523014
      Linus Torvalds authored
      Pull more power management updates from Rafael Wysocki:
       "These add new hardware support to a cpufreq driver and fix cpupower
        utility documentation:
      
         - Add support for several Qualcomm SoC versions to the Qualcomm
           cpufreq driver (Robert Marko, Varadarajan Narayanan)
      
         - Fix a reference to a removed document in the cpupower utility
           documentation (Vegard Nossum)"
      
      * tag 'pm-6.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: qcom-nvmem: Introduce cpufreq for ipq95xx
        cpufreq: qcom-nvmem: Enable cpufreq for ipq53xx
        cpufreq: qcom-nvmem: add support for IPQ8074
        cpupower: fix reference to nonexistent document
      30523014
    • Linus Torvalds's avatar
      Merge tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm · 25b63770
      Linus Torvalds authored
      Pull more drm updates from Dave Airlie:
       "Geert pointed out I missed the renesas reworks in my main pull, so
        this pull contains the renesas next work for atomic conversion and DT
        support.
      
        It also contains a bunch of amdgpu and some small ssd13xx fixes.
      
        renesas:
         - atomic conversion
         - DT support
      
        ssd13xx:
         - dt binding fix for ssd132x
         - Initialize ssd130x crtc_state to NULL.
      
        amdgpu:
         - Fix RAS support check
         - RAS fixes
         - MES fixes
         - SMU13 fixes
         - Contiguous memory allocation fix
         - BACO fixes
         - GPU reset fixes
         - Min power limit fixes
         - GFX11 fixes
         - USB4/TB hotplug fixes
         - ARM regression fix
         - GFX9.4.3 fixes
         - KASAN/KCSAN stack size check fixes
         - SR-IOV fixes
         - SMU14 fixes
         - PSP13 fixes
         - Display blend fixes
         - Flexible array size fixes
      
        amdkfd:
         - GPUVM fix
      
        radeon:
         - Flexible array size fixes"
      
      * tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
        drm/amd/display: Enable fast update on blendTF change
        drm/amd/display: Fix blend LUT programming
        drm/amd/display: Program plane color setting correctly
        drm/amdgpu: Query and report boot status
        drm/amdgpu: Add psp v13 function to query boot status
        drm/amd/swsmu: remove fw version check in sw_init.
        drm/amd/swsmu: update smu v14_0_0 driver if and metrics table
        drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks
        drm/amdgpu: Optimize the asic type fix code
        drm/amdgpu: fix GRBM read timeout when do mes_self_test
        drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
        drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs
        drm/amdgpu: Attach eviction fence on alloc
        drm/amdkfd: Improve amdgpu_vm_handle_moved
        drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2
        drm/amd/display: Avoid NULL dereference of timing generator
        drm/amdkfd: Update cache info for GFX 9.4.3
        drm/amdkfd: Populate cache info for GFX 9.4.3
        drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
        drm/amdgpu/smu13: drop compute workload workaround
        ...
      25b63770
    • Linus Torvalds's avatar
      Merge tag 'regmap-fix-v6.7-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · eaec7c98
      Linus Torvalds authored
      Pull regmap fix from Mark Brown:
       "One fix here, for an interaction between noinc registers and caches.
      
        If a device uses noinc registers (which is rare) then we could corrupt
        registers after the noinc register in the cache"
      
      * tag 'regmap-fix-v6.7-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: prevent noinc writes from clobbering cache
      eaec7c98
    • Linus Torvalds's avatar
      Merge tag 'rproc-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · b8dd631f
      Linus Torvalds authored
      Pull remoteproc updates from Bjorn Andersson:
       "Support for controlling the second core in Mediatek's SCP dual-core
        setup is introduced.
      
        Support for audio, compute and modem DSPs on Qualcomm SM6375, and the
        audio DSP in SC7180 are introduced. The peripheral NoC clock is
        dropped from MSM8996 modem DSP, as this is handled through the
        interconnect provider.
      
        In the zynqmp driver the setup for TCM memory, and device address
        translation thereof, when operating in lockstep mode is corrected.
      
        A few bug fixes and cleanups are introduced across the ST and STM32
        remoteproc drivers"
      
      * tag 'rproc-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (28 commits)
        remoteproc: st: Fix sometimes uninitialized ret in st_rproc_probe()
        remoteproc: st: Use device_get_match_data()
        remoteproc: zynqmp: Change tcm address translation method
        remoteproc: mediatek: Refactor single core check and fix retrocompatibility
        remoteproc: qcom: q6v5-mss: Remove PNoC clock from 8996 MSS
        dt-bindings: remoteproc: qcom,msm8996-mss-pil: Remove PNoC clock
        dt-bindings: remoteproc: qcom,adsp: Remove AGGRE2 clock
        remoteproc: qcom: pas: Add SM6375 MPSS
        remoteproc: qcom: pas: Add SM6375 ADSP & CDSP
        dt-bindings: remoteproc: qcom,sm6375-pas: Document remoteprocs
        dt-bindings: remoteproc: pru: Add Interrupt property
        remoteproc: qcom: pas: Add sc7180 adsp
        dt-bindings: remoteproc: qcom: sc7180-pas: Add ADSP compatible
        arm64: dts: mediatek: Update the node name of SCP rpmsg subnode
        remoteproc: zynqmp: fix TCM carveouts in lockstep mode
        remoteproc: mediatek: Refine ipi handler error message
        remoteproc: mediatek: Report watchdog crash to all cores
        remoteproc: mediatek: Handle MT8195 SCP core 1 watchdog timeout
        remoteproc: mediatek: Setup MT8195 SCP core 1 SRAM offset
        remoteproc: mediatek: Remove dependency of MT8195 SCP L2TCM power control on dual-core SCP
        ...
      b8dd631f