1. 07 Sep, 2022 1 commit
  2. 06 Sep, 2022 2 commits
    • cgroup: Remove CFTYPE_PRESSURE · 8a693f77
      Tejun Heo authored
      CFTYPE_PRESSURE is used to flag PSI related files so that they are not
      created if PSI is disabled during boot. It's a bit weird to use a generic
      flag to mark a specific file type. Let's instead move the PSI files into their
      own cftypes array and add/rm them conditionally. This is a bit more code but
      cleaner.
      
      No userland visible changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • cgroup: Improve cftype add/rm error handling · 0083d27b
      Tejun Heo authored
      Let's track whether a cftype is currently added or not using a new flag
      __CFTYPE_ADDED so that duplicate operations fail safely, and consistently
      allow using empty cftypes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
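
As a rough illustration of the bookkeeping described in the commit above, the following Python sketch models an "added" flag that makes duplicate add/remove calls fail instead of silently corrupting state. It is not the kernel code: the `CfType` class, the flag value, and the specific error codes returned are assumptions chosen for the example.

```python
import errno

# Hypothetical flag bit; the value is illustrative only.
CFTYPE_ADDED = 1 << 0


class CfType:
    """Toy stand-in for a control-file type entry."""

    def __init__(self, name):
        self.name = name
        self.flags = 0


def add_cftypes(cfts):
    """Mark every entry as added; refuse if any entry is already added."""
    if not cfts:
        return 0  # an empty array is accepted and is a no-op
    if any(cft.flags & CFTYPE_ADDED for cft in cfts):
        return -errno.EBUSY  # assumed error code for a duplicate add
    for cft in cfts:
        cft.flags |= CFTYPE_ADDED
    return 0


def rm_cftypes(cfts):
    """Clear the added flag; refuse if the array was never added."""
    if not cfts:
        return 0
    if not (cfts[0].flags & CFTYPE_ADDED):
        return -errno.ENOENT  # assumed error code for a duplicate remove
    for cft in cfts:
        cft.flags &= ~CFTYPE_ADDED
    return 0
```

With the flag in place, the second of two identical calls returns an error rather than repeating the operation.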
  3. 04 Sep, 2022 12 commits
    • kselftest/cgroup: Add cpuset v2 partition root state test · a8c52eba
      Waiman Long authored
      Add a test script test_cpuset_prs.sh with a helper program wait_inotify
      for exercising the cpuset v2 partition root state code.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst · 8cbfdc24
      Waiman Long authored
      Update Documentation/admin-guide/cgroup-v2.rst on the newly introduced
      "isolated" cpuset partition type as well as other changes made in other
      cpuset patches.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule · d7c8142d
      Waiman Long authored
      Currently, a change to "cpuset.cpus" of a partition root is not allowed if
      it violates the sibling cpu exclusivity rule when the check is done
      in the validate_change() function. That is inconsistent with the
      other cpuset changes that are always allowed but may make a partition
      invalid.
      
      Update the cpuset code to allow cpumask change even if it violates the
      sibling cpu exclusivity rule, but invalidate the partition instead
      just like the other changes. However, other sibling partitions with
      conflicting cpumasks will also be invalidated in order not to violate
      the exclusivity rule. This behavior is specific to this partition
      rule violation.
      
      Note that a previous commit has made sibling cpu exclusivity rule check
      the last check of validate_change(). So if -EINVAL is returned, we can
      be sure that sibling cpu exclusivity rule violation is the only rule
      that is broken.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Relocate a code block in validate_change() · 74027a65
      Waiman Long authored
      This patch moves down the exclusive cpu and memory check in
      validate_change(). There is no functional change.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Show invalid partition reason string · 7476a636
      Waiman Long authored
      There are a number of different reasons which can cause a partition to
      become invalid. A user seeing an invalid partition may not know exactly
      why. To help users better understand the underlying reason, the
      cpuset.cpus.partition control file, when read, will now report the
      reason why a partition became invalid. When a partition does become
      invalid, reading the control file will show "root invalid (<reason>)"
      where <reason> is a string that describes why the partition is invalid.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
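
The "root invalid (<reason>)" format described in the commit above lends itself to a small parser. The sketch below is one possible way for a userspace tool to split such a string into its parts; the function name, the `PartitionState` tuple, and the assumption that the same format applies to every partition type are illustrative and not taken from the kernel sources.

```python
import re
from typing import NamedTuple, Optional


class PartitionState(NamedTuple):
    ptype: str             # "member", "root" or "isolated"
    valid: bool            # False when the state carries "invalid"
    reason: Optional[str]  # text inside the parentheses, if any


# Assumed layout: "<type>" or "<type> invalid (<reason>)".
_STATE_RE = re.compile(
    r"^(?P<ptype>member|root|isolated)"
    r"(?P<invalid> invalid(?: \((?P<reason>.*)\))?)?$"
)


def parse_partition_state(text: str) -> PartitionState:
    """Parse the contents of a cpuset.cpus.partition file."""
    match = _STATE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized partition state: {text!r}")
    return PartitionState(
        ptype=match.group("ptype"),
        valid=match.group("invalid") is None,
        reason=match.group("reason"),
    )
```

A caller could pass it the text read from a cgroup's `cpuset.cpus.partition` file and log `reason` whenever `valid` is false.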
    • cgroup/cpuset: Add a new isolated cpus.partition type · f28e2244
      Waiman Long authored
      Cpuset v1 uses the sched_load_balance control file to determine if load
      balancing should be enabled.  Cpuset v2 gets rid of sched_load_balance
      as its use may require disabling load balancing at cgroup root.
      
      For workloads that require very low latency like DPDK, the latency
      jitters caused by periodic load balancing may exceed the desired
      latency limit.
      
      When cpuset v2 is in use, the only way to avoid this latency cost is to
      use the "isolcpus=" kernel boot option to isolate a set of CPUs. After
      the kernel boot, however, there is no way to add or remove CPUs from
      this isolated set. For workloads that are more dynamic in nature, that
      means users have to provision enough CPUs for the worst case situation
      resulting in excess idle CPUs.
      
      To address this issue for cpuset v2, a new cpuset.cpus.partition type
      "isolated" is added which allows the creation of a cpuset partition
      without load balancing. This will allow system administrators to
      dynamically adjust the size of isolated partition to the current need
      of the workload without rebooting the system.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
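
A minimal sketch of how an administrator's script might request an isolated partition, based on the commit message above. The helper name `set_partition` is hypothetical, and the example writes into a temporary directory as a stand-in for a real cgroup v2 directory, so it only demonstrates the file names and values involved, not kernel behavior.

```python
import tempfile
from pathlib import Path

# Partition types named in the commit messages above.
VALID_TYPES = ("member", "root", "isolated")


def set_partition(cgroup_dir, cpus, ptype="isolated"):
    """Write cpuset.cpus and cpuset.cpus.partition, then read the state back."""
    if ptype not in VALID_TYPES:
        raise ValueError(f"unknown partition type: {ptype!r}")
    cgroup = Path(cgroup_dir)
    (cgroup / "cpuset.cpus").write_text(cpus + "\n")
    (cgroup / "cpuset.cpus.partition").write_text(ptype + "\n")
    # On a real cgroup, the value read back may differ from the value
    # written, for example if the partition turns out to be invalid.
    return (cgroup / "cpuset.cpus.partition").read_text().strip()


if __name__ == "__main__":
    # Stand-in directory; a real cgroup path is system specific.
    with tempfile.TemporaryDirectory() as fake_cgroup:
        print(set_partition(fake_cgroup, "2-3"))
```

Resizing the isolated set later would amount to writing a new CPU list to `cpuset.cpus`, which is the dynamic adjustment the commit describes.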
    • cgroup/cpuset: Relax constraints to partition & cpus changes · f0af1bfc
      Waiman Long authored
      Currently, enabling a partition root is only allowed if all the
      constraints of a valid partition are satisfied. Even changes to
      "cpuset.cpus" may not be allowed in some cases. Moreover, there are
      limits to changes made to a parent cpuset if it is a valid partition
      root. This is contrary to the general cgroup v2 philosophy.
      
      This patch relaxes the constraints of changing the state of "cpuset.cpus"
      and "cpuset.cpus.partition". Now all valid changes ("member" or "root")
      to "cpuset.cpus.partition" are allowed even if there are child cpusets
      underneath it.
      
      Trying to make a cpuset a partition root, however, will cause its state
      to become invalid if the following constraints of a valid partition
      root are not satisfied.
      
       1) The "cpuset.cpus" is non-empty and exclusive.
       2) The parent cpuset is a valid partition root.
       3) The "cpuset.cpus" overlaps parent's "cpuset.cpus".
      
      Similarly, almost all changes to "cpuset.cpus" are allowed with the
      exception that if the underlying CS_CPU_EXCLUSIVE flag is set, the
      exclusivity rule will still apply.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
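
The three constraints listed in the commit above can be modeled with plain sets. This is a simplified sketch, not the kernel's validation logic: CPUs are represented as Python sets of integers, "exclusive" is interpreted as "shares no CPU with any sibling", and the function and parameter names are made up for the example.

```python
def partition_root_problems(cpus, parent_cpus, parent_is_valid_root, sibling_cpus):
    """Return the list of violated constraints; an empty list means valid."""
    problems = []
    # 1) "cpuset.cpus" must be non-empty and exclusive.
    if not cpus:
        problems.append("cpuset.cpus is empty")
    if any(cpus & other for other in sibling_cpus):
        problems.append("cpuset.cpus is not exclusive")
    # 2) The parent cpuset must be a valid partition root.
    if not parent_is_valid_root:
        problems.append("parent is not a valid partition root")
    # 3) "cpuset.cpus" must overlap the parent's "cpuset.cpus".
    if not (cpus & parent_cpus):
        problems.append("cpuset.cpus does not overlap parent's cpuset.cpus")
    return problems
```

Under the relaxed rules, a non-empty result would correspond to the partition becoming invalid rather than to the write being rejected.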
    • cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective · e2d59900
      Waiman Long authored
      Currently, a partition root cannot have empty "cpuset.cpus.effective".
      As a result, a parent partition root cannot distribute out all its
      CPUs to child partitions with no CPUs left. However in most cases,
      there shouldn't be any tasks associated with intermediate nodes of the
      default hierarchy. So the current rule is too restrictive and can waste
      valuable CPU resource.
      
      To address this issue, we are now allowing a partition to have empty
      "cpuset.cpus.effective" as long as it has no task. Since cpuset is
      threaded, no-internal-process rule does not apply. So it is possible
      to have tasks in a partition root with child sub-partitions even though
      that should be uncommon.
      
      A parent partition with no task can now have all its CPUs distributed out
      to its child partitions. The top cpuset always has some housekeeping
      tasks running, so its list of effective CPUs can't be empty.
      
      Once a partition with empty "cpuset.cpus.effective" is formed, no
      new task can be moved into it until "cpuset.cpus.effective" becomes
      non-empty.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Miscellaneous cleanups & add helper functions · 18065ebe
      Waiman Long authored
      The partition root state (PRS) macro names do not currently match the
      external names. Change them to match the external names and add helper
      functions to read or change the state.
      
      Shorten the cpuset argument of update_parent_subparts_cpumask() to cs
      to match other cpuset functions.
      
      Remove the new_prs argument from notify_partition_change() as the
      cs->partition_root_state has already been set to new_prs before it
      is called.
      
      There is no functional change.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset · ec5fbdfb
      Waiman Long authored
      Previously, update_tasks_cpumask() was not supposed to be called on the
      top cpuset. With cpuset partitions that take CPUs away from the top
      cpuset, adjusting the cpus_mask of the tasks in the top cpuset is
      necessary. Percpu kthreads, however, are ignored.
      
      Fixes: ee8dde0c ("cpuset: Add new v2 cpuset.sched.partition flag")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: add pids.peak interface for pids controller · 5251c6c4
      Josh Don authored
      pids.peak tracks the high watermark of usage for number of pids. This
      helps give a better baseline on which to set pids.max. Polling
      pids.current isn't really feasible, since it would potentially miss
      short-lived spikes.
      
      This interface is analogous to memory.peak.
      Signed-off-by: Josh Don <joshdon@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
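
To show why a high watermark catches what polling can miss, here is a toy model of the counter described in the commit above. The `PidsCounter` class and its method names are invented for illustration; the real accounting lives in the kernel's pids controller.

```python
class PidsCounter:
    """Toy model of a usage counter with a high watermark."""

    def __init__(self):
        self.current = 0  # analogous to pids.current
        self.peak = 0     # analogous to pids.peak

    def charge(self, count=1):
        """Account for newly created pids and update the watermark."""
        self.current += count
        self.peak = max(self.peak, self.current)

    def uncharge(self, count=1):
        """Account for exited pids; the watermark is left untouched."""
        self.current -= count


if __name__ == "__main__":
    counter = PidsCounter()
    counter.charge(3)     # steady-state workload
    counter.charge(47)    # short-lived spike to 50 pids
    counter.uncharge(47)  # the spike ends before anyone polls
    # A poll at this point sees only the steady state,
    # while the watermark still records the spike.
    print(counter.current, counter.peak)  # prints: 3 50
```

The watermark gives a baseline for choosing `pids.max` that reflects the spike even though it was over before the poll.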
    • cgroup: Remove data-race around cgrp_dfl_visible · dc79ec1b
      Tejun Heo authored
      There's a seemingly harmless data-race around cgrp_dfl_visible detected by
      kernel concurrency sanitizer. Let's remove it by throwing WRITE/READ_ONCE at
      it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
      Cc: Gabriel Ryan <gabe@cs.columbia.edu>
      Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Link: https://lore.kernel.org/netdev/20220819072256.fn7ctciefy4fc4cu@wittgenstein/
  4. 29 Aug, 2022 1 commit
  5. 26 Aug, 2022 5 commits
  6. 25 Aug, 2022 1 commit
  7. 23 Aug, 2022 1 commit
  8. 17 Aug, 2022 1 commit
    • cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock · 4f7e7236
      Tejun Heo authored
      Bringing up a CPU may involve creating and destroying tasks which requires
      read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
      cpus_read_lock(). However, cpuset's ->attach(), which may be called with
      threadgroup_rwsem write-locked, also wants to disable CPU hotplug and
      acquires cpus_read_lock(), leading to a deadlock.
      
      Fix it by guaranteeing that ->attach() is always called with CPU hotplug
      disabled and removing cpus_read_lock() call from cpuset_attach().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>
      Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>
      Fixes: 05c7b7a9 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
      Cc: stable@vger.kernel.org # v5.17+
  9. 15 Aug, 2022 4 commits
  10. 14 Aug, 2022 10 commits
    • Linux 6.0-rc1 · 568035b0
      Linus Torvalds authored
    • radix-tree: replace gfp.h inclusion with gfp_types.h · 9f162193
      Yury Norov authored
      Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
      have gfp_types.h for this.
      
      Fixes powerpc allmodconfig build:
      
         In file included from include/linux/nodemask.h:97,
                          from include/linux/mmzone.h:17,
                          from include/linux/gfp.h:7,
                          from include/linux/radix-tree.h:12,
                          from include/linux/idr.h:15,
                          from include/linux/kernfs.h:12,
                          from include/linux/sysfs.h:16,
                          from include/linux/kobject.h:20,
                          from include/linux/pci.h:35,
                          from arch/powerpc/kernel/prom_init.c:24:
         include/linux/random.h: In function 'add_latent_entropy':
      >> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
            25 |         add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
               |                                              ^~~~~~~~~~~~~~
               |                                              add_latent_entropy
         include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in
      Reported-by: kernel test robot <lkp@intel.com>
      CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 74cbb480
      Linus Torvalds authored
      Pull vfs lseek fix from Al Viro:
       "Fix proc_reg_llseek() breakage. Always had been possible if somebody
        left NULL ->proc_lseek, became a practical issue now"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        take care to handle NULL ->proc_lseek()
    • take care to handle NULL ->proc_lseek() · 3f61631d
      Al Viro authored
      Easily done now, just by clearing FMODE_LSEEK in ->f_mode
      during proc_reg_open() for such entries.
      
      Fixes: 868941b1 "fs: remove no_llseek"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Merge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 5d6a0f4d
      Linus Torvalds authored
      Pull more xen updates from Juergen Gross:
      
       - fix the handling of the "persistent grants" feature negotiation
         between Xen blkfront and Xen blkback drivers
      
       - a cleanup of xen.config and adding xen.config to Xen section in
         MAINTAINERS
      
       - support HVMOP_set_evtchn_upcall_vector, which is more compliant to
         "normal" interrupt handling than the global callback used up to now
      
       - further small cleanups
      
      * tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections
        xen: remove XEN_SCRUB_PAGES in xen.config
        xen/pciback: Fix comment typo
        xen/xenbus: fix return type in xenbus_file_read()
        xen-blkfront: Apply 'feature_persistent' parameter when connect
        xen-blkback: Apply 'feature_persistent' parameter when connect
        xen-blkback: fix persistent grants negotiation
        x86/xen: Add support for HVMOP_set_evtchn_upcall_vector
    • Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of... · 96f86ff0
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tool updates from Arnaldo Carvalho de Melo:
      
       - 'perf c2c' now supports ARM64, adjust its output to cope with
         differences with what is in x86_64. Now go find false sharing on
         ARM64 (at least Neoverse) as well!
      
       - Refactor the JSON processing, making the output more compact and thus
         reducing the size of the resulting perf binary
      
       - Improvements for 'perf offcpu' profiling, including tracking child
         processes
      
       - Update Intel JSON metrics and events files for broadwellde,
         broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
         knightslanding, sapphirerapids, skylakex and snowridgex
      
       - Add 'perf stat' JSON output and a 'perf test' entry for it
      
       - Ignore memfd and anonymous mmap events if jitdump present
      
       - Refactor 'perf test' shell tests allowing subdirs
      
       - Fix an error handling path in 'parse_perf_probe_command()'
      
       - Fixes for the guest Intel PT tracing patchkit in the 1st batch of
         this merge window
      
       - Print debuginfod queries if -v option is used, to explain delays in
         processing when debuginfo servers are enabled to fetch DSOs with
         richer symbol tables
      
       - Improve error message for 'perf record -p not_existing_pid'
      
       - Fix openssl and libbpf feature detection
      
       - Add PMU pai_crypto event description for IBM z16 on 'perf list'
      
       - Fix typos and duplicated words on comments in various places
      
      * tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
        perf test: Refactor shell tests allowing subdirs
        perf vendor events: Update events for snowridgex
        perf vendor events: Update events and metrics for skylakex
        perf vendor events: Update metrics for sapphirerapids
        perf vendor events: Update events for knightslanding
        perf vendor events: Update metrics for jaketown
        perf vendor events: Update metrics for ivytown
        perf vendor events: Update events and metrics for icelakex
        perf vendor events: Update events and metrics for haswellx
        perf vendor events: Update events and metrics for cascadelakex
        perf vendor events: Update events and metrics for broadwellx
        perf vendor events: Update metrics for broadwellde
        perf jevents: Fold strings optimization
        perf jevents: Compress the pmu_events_table
        perf metrics: Copy entire pmu_event in find metric
        perf pmu-events: Hide the pmu_events
        perf pmu-events: Don't assume pmu_event is an array
        perf pmu-events: Move test events/metrics to JSON
        perf test: Use full metric resolution
        perf pmu-events: Hide pmu_events_map
        ...
    • Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · d785610f
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
         CPUs trap on it rather than ignoring it as they should.
      
       - Fix ftrace when building with clang, which was broken by some
         refactoring.
      
       - A couple of other minor fixes.
      
      Thanks to Christophe Leroy, Naveen N.  Rao, Nick Desaulniers, Ondrej
      Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.
      
      * tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/kexec: Fix build failure from uninitialised variable
        powerpc/ppc-opcode: Fix PPC_RAW_TW()
        powerpc64/ftrace: Fix ftrace for clang builds
        powerpc: Make eh value more explicit when using lwarx
        powerpc: Don't hide eh field of lwarx behind a macro
        powerpc: Fix eh field when calling lwarx on PPC32
    • Merge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · aea23e7c
      Linus Torvalds authored
      Pull /proc/mounts fix from Al Viro:
       "Fix for /proc/mounts escaping - escape the '#' character too"
      
      * tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        vfs: escape hash as well
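
For readers unfamiliar with this kind of escaping, the sketch below shows one common scheme: replacing each special character with a backslash followed by its three-digit octal code. Whether the kernel uses exactly this character set and encoding for /proc/mounts is an assumption here; the function name and the default `special` set are chosen for illustration, with '#' included as the fix above describes.

```python
def escape_mount_field(text, special=" \t\n\\#"):
    """Replace each special character with a backslash and its octal code."""
    return "".join(
        "\\%03o" % ord(char) if char in special else char
        for char in text
    )


if __name__ == "__main__":
    # A space becomes \040 and a hash becomes \043.
    print(escape_mount_field("my disk#1"))  # prints: my\040disk\0431
```

A plausible motivation is that tools reading fstab-style files treat '#' as the start of a comment, so an unescaped hash in a field could cause the rest of the line to be ignored.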
    • Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · 332019e2
      Linus Torvalds authored
      Pull more cifs updates from Steve French:
      
       - two fixes for stable, one for a lock length miscalculation, and
         another fixes a lease break timeout bug
      
       - improvement to handle leases, allows the close timeout to be
         configured more safely
      
       - five restructuring/cleanup patches
      
      * tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
        cifs: Add constructor/destructors for tcon->cfid
        SMB3: fix lease break timeout when multiple deferred close handles for the same file.
        smb3: allow deferred close timeout to be configurable
        cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
        cifs: Move cached-dir functions into a separate file
        cifs: Remove {cifs,nfs}_fscache_release_page()
        cifs: fix lock length calculation
    • afs: Enable multipage folio support · 8549a263
      David Howells authored
      Enable multipage folio support for the afs filesystem.
      
      Support has already been implemented in netfslib, fscache and cachefiles
      and in most of afs, but I've waited for Matthew Wilcox's latest folio
      changes.
      
      Note that it does require a change to afs_write_begin() to return the
      correct subpage.  This is a "temporary" change as we're working on
      getting rid of the need for ->write_begin() and ->write_end()
      completely, at least as far as network filesystems are concerned - but
      it doesn't prevent afs from making use of the capability.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: kafs-testing@auristor.com
      Cc: Marc Dionne <marc.dionne@auristor.com>
      Cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 13 Aug, 2022 2 commits