1. 08 Nov, 2018 12 commits
    • cpuset: Expose cpuset.cpus.subpartitions with cgroup_debug · 5cf8114d
      Waiman Long authored
      For debugging purposes, it is useful to expose the content of
      subparts_cpus as a read-only file to check that the code works correctly.
      However, subparts_cpus will not be used at all in most use cases. So
      adding a new cpuset file that clutters the cgroup directory may not be
      desirable.  This is now being done by using the hidden "cgroup_debug"
      kernel command line option to expose a new "cpuset.cpus.subpartitions"
      file.
      
      That option was originally used by the debug controller to expose
      itself when configured into the kernel. This is now extended to set an
      internal flag used by cgroup_addrm_files(). A new CFTYPE_DEBUG flag
      can now be used to specify that a cgroup file should only be created
      when the "cgroup_debug" option is specified.
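
      The gating described above can be pictured with a small user-space
      model. This is only a sketch, not the kernel's implementation: the
      names CFTYPE_DEBUG, "cgroup_debug" and the file name come from the
      commit message, while visible_files(), the flag value and the table
      layout are hypothetical.

```python
# Hypothetical model: the flag value is illustrative, not the kernel's constant.
CFTYPE_DEBUG = 1 << 0

# (file name, flags) pairs standing in for a controller's file table.
CPUSET_FILES = [
    ("cpuset.cpus", 0),
    ("cpuset.cpus.effective", 0),
    ("cpuset.cpus.subpartitions", CFTYPE_DEBUG),
]


def visible_files(files, cgroup_debug):
    """Return the file names that would be created for the given boot option."""
    return [
        name
        for name, flags in files
        # Debug-only files are skipped unless "cgroup_debug" was given.
        if cgroup_debug or not (flags & CFTYPE_DEBUG)
    ]
```

      With cgroup_debug unset, the model lists only the two regular files;
      with it set, "cpuset.cpus.subpartitions" appears as well.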
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Add documentation about the new "cpuset.sched.partition" flag · 90e92f2d
      Waiman Long authored
      The cgroup-v2.rst file is updated to document the purpose of the new
      "cpuset.sched.partition" flag and its usage.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Use descriptive text when reading/writing cpuset.sched.partition · bb5b553c
      Waiman Long authored
      Currently, cpuset.sched.partition returns the values 0, 1 or -1 on
      read. A person who is not familiar with the partition code may not
      understand what they mean.
      
      In order to make cpuset.sched.partition more user-friendly, it will
      now display the following descriptive text on read:
      
        "root" - A partition root (top cpuset of a partition)
        "member" - A non-root member of a partition
        "root invalid" - An invalid partition root
      
      Note that there is at least one partition in the whole cgroup hierarchy.
      The top cpuset is the root of that partition.  Each of the remaining
      cpusets is either a root, if it starts a new partition, or a member of one.
      
      The cpuset.sched.partition file will now also accept "root" and
      "member" besides 1 and 0 as valid input values. The "root invalid"
      value is internal only and cannot be written to the file.
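
      The read and write mappings described above can be sketched as
      follows. This is a hedged illustration only: the strings and the
      numeric values 1, 0 and -1 come from the commit message, while the
      constant and function names are hypothetical.

```python
# Numeric states from the commit message; the constant names are hypothetical.
STATE_ROOT_INVALID, STATE_MEMBER, STATE_ROOT = -1, 0, 1

# Text shown on read.
_STATE_TEXT = {
    STATE_ROOT: "root",
    STATE_MEMBER: "member",
    STATE_ROOT_INVALID: "root invalid",
}

# Values accepted on write; "root invalid" is deliberately absent.
_WRITABLE = {
    "root": STATE_ROOT,
    "1": STATE_ROOT,
    "member": STATE_MEMBER,
    "0": STATE_MEMBER,
}


def show_partition(state):
    """Return the descriptive text for a partition state."""
    return _STATE_TEXT[state]


def parse_partition(text):
    """Map user input to a state, rejecting anything not writable."""
    key = text.strip()
    if key not in _WRITABLE:
        raise ValueError(f"invalid partition value: {key!r}")
    return _WRITABLE[key]
```

      In this model, writing "root invalid" raises ValueError, which mirrors
      the rule that the error state cannot be set from user space.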
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Expose cpus.effective and mems.effective on cgroup v2 root · 5776cecc
      Waiman Long authored
      Because setting "cpuset.sched.partition" in a direct child of root
      can remove CPUs from the root's effective CPU list, it makes sense to
      know which CPUs are left in the root cgroup for scheduling purposes.
      So the "cpuset.cpus.effective" control file is now exposed in the v2
      cgroup root.
      
      For consistency, the "cpuset.mems.effective" control file is exposed
      as well.
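
      As an illustration of how the newly exposed file might be consumed
      from user space, the sketch below parses a CPU list such as "0-3,8".
      It assumes the file uses the usual comma-separated range format and
      that cgroup v2 is mounted at /sys/fs/cgroup; both the mount point and
      the helper names are assumptions, not part of the commit.

```python
def parse_cpu_list(text):
    """Parse a CPU list like "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs
        first, _, last = part.partition("-")
        # A single CPU ("8") has no "-", so last is empty.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def read_effective_cpus(cgroup_root="/sys/fs/cgroup"):
    """Read the root's effective CPUs (the mount point is an assumption)."""
    with open(f"{cgroup_root}/cpuset.cpus.effective") as f:
        return parse_cpu_list(f.read())
```

      Only plain ranges and single CPUs are handled here; any other list
      syntax would need extra parsing.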
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Make generate_sched_domains() work with partition · 0ccea8fe
      Waiman Long authored
      The generate_sched_domains() function is modified to make it work
      correctly with the newly introduced subparts_cpus mask for scheduling
      domains generation.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Make CPU hotplug work with partition · 4b842da2
      Waiman Long authored
      When there is a CPU hotplug event (CPU online or offline), the partitions
      may need to be reconfigured and regenerated. So code is added to the
      hotplug functions to make them work with the new subparts_cpus mask and
      compute the right effective_cpus for each of the affected cpusets.
      A hotplug event may also change the state of a partition root from a
      real one to an erroneous one or vice versa.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Track cpusets that use parent's effective_cpus · 4716909c
      Waiman Long authored
      In the default hierarchy, a cpuset will use the parent's effective_cpus
      if none of the requested CPUs can be granted from the parent. That can
      be a problem if a parent is a partition root with child partition
      roots. Changes to a parent's effective_cpus list due to changes in a
      child partition root may not be properly reflected in a child cpuset
      that uses the parent's effective_cpus, because the cpu_exclusive rule
      of a partition root will not guard against that.
      
      In order to avoid the mismatch, two new tracking variables are added to
      the cpuset structure to track if a cpuset uses parent's effective_cpus
      and the number of child cpusets that use its effective_cpus. So
      whenever cpumask changes are made to a parent, it will also check to
      see if it has other children cpusets that use its effective_cpus and
      call update_cpumasks_hier() if that is the case.
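
      The bookkeeping described above can be modelled in a few lines. This
      is a simplified sketch, not the kernel code: the class, its field
      names (use_parent_ecpus, child_ecpus_count) and its methods are
      hypothetical, and only the "child inherits the parent's
      effective_cpus" case is refreshed.

```python
class Cpuset:
    """Toy cpuset tracking who falls back to the parent's effective CPUs."""

    def __init__(self, cpus_allowed, parent=None):
        self.cpus_allowed = set(cpus_allowed)
        self.parent = parent
        self.children = []
        self.use_parent_ecpus = False   # this cpuset inherits from its parent
        self.child_ecpus_count = 0      # children inheriting from this cpuset
        self.effective_cpus = set(cpus_allowed)
        if parent is not None:
            parent.children.append(self)
            self.update_effective()

    def update_effective(self):
        """Recompute effective_cpus and keep the parent's counter in sync."""
        granted = self.cpus_allowed & self.parent.effective_cpus
        inherits = not granted
        if inherits != self.use_parent_ecpus:
            self.parent.child_ecpus_count += 1 if inherits else -1
            self.use_parent_ecpus = inherits
        self.effective_cpus = (
            set(self.parent.effective_cpus) if inherits else granted
        )

    def set_effective(self, cpus):
        """Change this cpuset's effective CPUs and refresh inheriting children."""
        self.effective_cpus = set(cpus)
        if self.child_ecpus_count:
            for child in self.children:
                if child.use_parent_ecpus:
                    child.update_effective()
```

      For example, a child that requests only CPU 8 under a root holding
      CPUs 0-3 inherits {0, 1, 2, 3}; shrinking the root to {2, 3} then
      updates the child because the root's counter is non-zero.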
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Add an error state to cpuset.sched.partition · 3881b861
      Waiman Long authored
      When external events like CPU offlining or user events like changing
      the cpu list of an ancestor cpuset happen, update_cpumasks_hier()
      will be called to update the effective cpus of each of the affected
      cpusets. That will then call update_parent_subparts_cpumask() if
      partitions are impacted.
      
      Currently, these events may cause update_parent_subparts_cpumask()
      to return an error if none of the requested cpus are available or if
      the partition would consume all the cpus in the parent partition root.
      Handling these errors is problematic as the states may become inconsistent.
      
      Instead of letting update_parent_subparts_cpumask() return an error, a
      new error state (-1) is added to the partition_root_state flag to
      designate that the partition is no longer valid. In other words, it is
      no longer a real partition root, but the CS_CPU_EXCLUSIVE flag will
      still be set as it can be changed back to a real one if a favorable
      change happens later on.
      
      This new error state is set internally and the user cannot write this new
      value to "cpuset.sched.partition".
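
      The state change described above can be sketched as a pure function.
      This is an illustration under stated assumptions, not the kernel
      logic: the two invalidating conditions come from the commit message,
      while the constant names, revalidate() and its parameters are
      hypothetical. Here parent_cpus stands for the CPUs the parent
      partition root could offer to this cpuset after the event.

```python
# Numeric states from the commit message; the constant names are hypothetical.
ROOT_INVALID, MEMBER, ROOT = -1, 0, 1


def revalidate(state, requested, parent_cpus):
    """Return the partition state after an external event changed parent_cpus."""
    if state == MEMBER:
        return MEMBER  # only partition roots can become invalid
    available = requested & parent_cpus
    # Invalid if nothing can be granted, or if the child would take
    # every CPU of the parent partition root.
    valid = bool(available) and available != parent_cpus
    return ROOT if valid else ROOT_INVALID
```

      Because the function is also applied to an already invalid root, the
      state returns to a real root once a favorable change makes the
      request satisfiable again.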
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Add new v2 cpuset.sched.partition flag · ee8dde0c
      Waiman Long authored
      A new cpuset.sched.partition boolean flag is added to cpuset v2.
      This new flag, if set, indicates that the cgroup is the root of a
      new scheduling domain or partition that includes itself and all its
      descendants except those that are scheduling domain roots themselves
      and their descendants.
      
      With this new flag, one can directly create as many partitions as
      necessary without ever using the v1 trick of turning off load balancing
      in specific cpusets to create partitions as a side effect.
      
      This new flag is owned by the parent and will cause the CPUs in the
      cpuset to be removed from the effective CPUs of its parent.
      
      This is implemented internally by adding a new subparts_cpus mask that
      holds the CPUs belonging to child partitions so that:
      
              subparts_cpus | effective_cpus = cpus_allowed
              subparts_cpus & effective_cpus = 0
      
      This new flag can only be turned on in a cpuset if its parent is a
      partition root itself. The state of this flag cannot be changed if the
      cpuset has children.
      
      Once turned on, further changes to "cpuset.cpus" are allowed as long
      as there is at least one CPU left that can be granted from the parent
      and a child partition root cannot use up all the CPUs in the parent's
      effective_cpus.
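
      The two mask invariants given above can be checked with plain Python
      sets. This is a hedged sketch: the mask names come from the commit
      message, but make_partition_root() is hypothetical, and for
      simplicity it requires the child's CPUs to be a strict subset of the
      parent's current effective CPUs.

```python
def make_partition_root(cpus_allowed, subparts_cpus, child_cpus):
    """Move child_cpus from the parent's effective CPUs into subparts_cpus.

    Returns the parent's new (subparts_cpus, effective_cpus) pair.
    """
    effective_cpus = cpus_allowed - subparts_cpus
    if not child_cpus or not child_cpus <= effective_cpus:
        raise ValueError("requested CPUs cannot be granted from the parent")
    if child_cpus == effective_cpus:
        raise ValueError("child would use up all the parent's effective CPUs")
    new_subparts = subparts_cpus | child_cpus
    new_effective = cpus_allowed - new_subparts
    return new_subparts, new_effective
```

      For a parent with cpus_allowed {0, 1, 2, 3} and a child taking
      {2, 3}, the result is subparts_cpus {2, 3} and effective_cpus {0, 1},
      so the union is cpus_allowed and the intersection is empty.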
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Simply allocation and freeing of cpumasks · bf92370c
      Waiman Long authored
      The previous commit introduces a new subparts_cpus mask into the cpuset
      data structure and a new tmpmasks structure.  Managing the allocation
      and freeing of those cpumasks is becoming more complex.
      
      So a number of helper functions are added to simplify and streamline
      the management of those cpumasks. To make it simple, all the cpumasks
      are now pre-cleared on allocation.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Define data structures to support scheduling partition · 58b74842
      Waiman Long authored
      From a cpuset point of view, a scheduling partition is a group of
      cpusets with their own set of exclusive CPUs that are not shared by
      other tasks outside the scheduling partition.
      
      In the legacy hierarchy, scheduling partitions are supported indirectly
      via the right use of the load balancing and exclusive CPUs flags,
      which is not intuitive and can be hard to use.
      
      To fully support the concept of scheduling partitions in the default
      hierarchy, we need to add some new fields to the cpuset structure as
      well as a new tmpmasks structure. The latter is used to pre-allocate
      cpumasks in the top-level cpuset functions and so avoid memory
      allocation in inner functions, where an allocation failure may leave
      a cpuset in an inconsistent state.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Enable cpuset controller in default hierarchy · 4ec22e9c
      Waiman Long authored
      Given that thread mode was merged in 4.14, it is now time to enable
      cpuset in the default hierarchy (cgroup v2), as it is clearly
      threaded.
      
      The cpuset controller has experienced feature creep since its
      introduction more than a decade ago. Besides the core cpus and mems
      control files to limit cpus and memory nodes, there are a bunch of
      additional features that can be controlled from the userspace. Some of
      the features are of doubtful usefulness and may not be actively used.
      
      This patch enables cpuset controller in the default hierarchy with
      a minimal set of features, namely just the cpus and mems and their
      effective_* counterparts.  We can certainly add more features to the
      default hierarchy later if there is a real user need for them.
      
      Alternatively, with the unified hierarchy, it may make more sense
      to move some of those additional cpuset features, if desired, to the
      memory controller or maybe to the cpu controller instead of keeping
      them in cpuset.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 05 Nov, 2018 1 commit
  3. 04 Nov, 2018 9 commits
    • Linux 4.20-rc1 · 65102238
      Linus Torvalds authored
    • Merge tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs · 42bd06e9
      Linus Torvalds authored
      Pull UBIFS updates from Richard Weinberger:
      
       - Full filesystem authentication feature, UBIFS is now able to have the
         whole filesystem structure authenticated plus user data encrypted and
         authenticated.
      
       - Minor cleanups
      
      * tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs: (26 commits)
        ubifs: Remove unneeded semicolon
        Documentation: ubifs: Add authentication whitepaper
        ubifs: Enable authentication support
        ubifs: Do not update inode size in-place in authenticated mode
        ubifs: Add hashes and HMACs to default filesystem
        ubifs: authentication: Authenticate super block node
        ubifs: Create hash for default LPT
        ubfis: authentication: Authenticate master node
        ubifs: authentication: Authenticate LPT
        ubifs: Authenticate replayed journal
        ubifs: Add auth nodes to garbage collector journal head
        ubifs: Add authentication nodes to journal
        ubifs: authentication: Add hashes to index nodes
        ubifs: Add hashes to the tree node cache
        ubifs: Create functions to embed a HMAC in a node
        ubifs: Add helper functions for authentication support
        ubifs: Add separate functions to init/crc a node
        ubifs: Format changes for authentication support
        ubifs: Store read superblock node
        ubifs: Drop write_node
        ...
    • Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 4710e789
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
        Bugfix:
         - Fix build issues on architectures that don't provide 64-bit cmpxchg
      
        Cleanups:
         - Fix a spelling mistake"
      
      * tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFS: fix spelling mistake, EACCESS -> EACCES
        SUNRPC: Use atomic(64)_t for seq_send(64)
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 35e74524
      Linus Torvalds authored
      Pull more timer updates from Thomas Gleixner:
       "A set of commits for the new C-SKY architecture timers"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        dt-bindings: timer: gx6605s SOC timer
        clocksource/drivers/c-sky: Add gx6605s SOC system timer
        dt-bindings: timer: C-SKY Multi-processor timer
        clocksource/drivers/c-sky: Add C-SKY SMP timer
    • Merge tag 'ntb-4.20' of git://github.com/jonmason/ntb · 04578e84
      Linus Torvalds authored
      Pull NTB updates from Jon Mason:
       "Fairly minor changes and bug fixes:
      
        NTB IDT thermal changes and hook into hwmon, ntb_netdev clean-up of
        private struct, and a few bug fixes"
      
      * tag 'ntb-4.20' of git://github.com/jonmason/ntb:
        ntb: idt: Alter the driver info comments
        ntb: idt: Discard temperature sensor IRQ handler
        ntb: idt: Add basic hwmon sysfs interface
        ntb: idt: Alter temperature read method
        ntb_netdev: Simplify remove with client device drvdata
        NTB: transport: Try harder to alloc an aligned MW buffer
        ntb: ntb_transport: Mark expected switch fall-throughs
        ntb: idt: Set PCIe bus address to BARLIMITx
        NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks
        ntb: intel: fix return value for ndev_vec_mask()
        ntb_netdev: fix sleep time mismatch
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 71e56028
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "A memory (under-)allocation fix and a comment fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/topology: Fix off by one bug
        sched/rt: Update comment in pick_next_task_rt()
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 601a8807
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "A number of fixes and some late updates:
      
         - make in_compat_syscall() behavior on x86-32 similar to other
           platforms, this touches a number of generic files but is not
           intended to impact non-x86 platforms.
      
         - objtool fixes
      
         - PAT preemption fix
      
         - paravirt fixes/cleanups
      
         - cpufeatures updates for new instructions
      
         - earlyprintk quirk
      
         - make microcode version in sysfs world-readable (it is already
           world-readable in procfs)
      
         - minor cleanups and fixes"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        compat: Cleanup in_compat_syscall() callers
        x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
        objtool: Support GCC 9 cold subfunction naming scheme
        x86/numa_emulation: Fix uniform-split numa emulation
        x86/paravirt: Remove unused _paravirt_ident_32
        x86/mm/pat: Disable preemption around __flush_tlb_all()
        x86/paravirt: Remove GPL from pv_ops export
        x86/traps: Use format string with panic() call
        x86: Clean up 'sizeof x' => 'sizeof(x)'
        x86/cpufeatures: Enumerate MOVDIR64B instruction
        x86/cpufeatures: Enumerate MOVDIRI instruction
        x86/earlyprintk: Add a force option for pciserial device
        objtool: Support per-function rodata sections
        x86/microcode: Make revision and processor flags world-readable
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01897f3e
      Linus Torvalds authored
      Pull perf updates and fixes from Ingo Molnar:
       "These are almost all tooling updates: 'perf top', 'perf trace' and
        'perf script' fixes and updates, a UAPI header sync with the merge
        window versions, license marker updates, much improved Sparc support
        from David Miller, and a number of fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
        perf intel-pt/bts: Calculate cpumode for synthesized samples
        perf intel-pt: Insert callchain context into synthesized callchains
        perf tools: Don't clone maps from parent when synthesizing forks
        perf top: Start display thread earlier
        tools headers uapi: Update linux/if_link.h header copy
        tools headers uapi: Update linux/netlink.h header copy
        tools headers: Sync the various kvm.h header copies
        tools include uapi: Update linux/mmap.h copy
        perf trace beauty: Use the mmap flags table generated from headers
        perf beauty: Wire up the mmap flags table generator to the Makefile
        perf beauty: Add a generator for MAP_ mmap's flag constants
        tools include uapi: Update asound.h copy
        tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies
        tools include uapi: Update linux/fs.h copy
        perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
        perf cs-etm: Correct CPU mode for samples
        perf unwind: Take pgoff into account when reporting elf to libdwfl
        perf top: Do not use overwrite mode by default
        perf top: Allow disabling the overwrite mode
        perf trace: Beautify mount's first pathname arg
        ...
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e9ebc215
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "An irqchip driver fix and a memory (over-)allocation fix"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function
        irq/matrix: Fix memory overallocation
  4. 03 Nov, 2018 18 commits