1. 20 Oct, 2022 12 commits
    • internal: add may_write_xattr() · 56851bc9
      Christian Brauner authored
      Split out the generic checks of whether an inode allows writing xattrs.
      Since security.* and system.* xattrs don't have any further restrictions
      at the vfs level, and since we're going to split out posix acls into a
      dedicated api, we will use this helper to check whether we can write
      posix acls.
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
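      A rough userspace sketch of the kind of generic, namespace-independent
      checks such a helper can perform follows. The specific checks (immutable,
      append-only, and ownership that cannot be mapped into the mount's
      idmapping), `struct inode_model` and `may_write_xattr_model()` are
      assumptions for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inode: only the state the checks read. */
struct inode_model {
	bool immutable;    /* models an immutable inode */
	bool append_only;  /* models an append-only inode */
	bool unmapped_ids; /* models an owner that can't be mapped into the mount's idmapping */
};

/*
 * Namespace-independent checks: they apply to any xattr name, including
 * system.posix_acl_*, so a dedicated posix acl api can reuse them.
 */
static int may_write_xattr_model(const struct inode_model *inode)
{
	if (inode->immutable || inode->append_only)
		return -EPERM;
	if (inode->unmapped_ids)
		return -EPERM;
	return 0;
}
```

      Keeping these checks free of any xattr-name logic is what lets both the
      generic xattr path and a posix acl path share them.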
    • evm: add post set acl hook · a56df5d5
      Christian Brauner authored
      The security_inode_post_setxattr() hook is used by security modules to
      update their own security.* xattrs. Consequently, none of the security
      modules operate on posix acls there, so we don't need an additional
      security hook for after posix acls have been set.
      
      However, the integrity subsystem wants to be informed about posix acl
      changes in order to reset the EVM status flag.
      
      -> evm_inode_post_setxattr()
         -> evm_update_evmxattr()
            -> evm_calc_hmac()
               -> evm_calc_hmac_or_hash()
      
      and evm_calc_hmac_or_hash() walks the global list of protected xattr
      names evm_config_xattrnames. This global list can be modified via
      /sys/security/integrity/evm/evm_xattrs. The write to "evm_xattrs" is
      restricted to security.* xattrs, and the default xattrs in
      evm_config_xattrnames only contain security.* xattrs as well.
      
      So the actual value of posix acls is currently completely irrelevant
      to evm during evm_inode_post_setxattr(), and frankly it should stay that
      way in the future to not cause the vfs any more headaches. If the
      actual posix acl values ever do matter, evm shouldn't operate on the
      binary void blob and hack around in the uapi struct anyway. Instead, it
      should add a dedicated hook that takes a struct posix_acl argument,
      passing the posix acls in the proper vfs format.
      
      For now it is sufficient to make evm_inode_post_set_acl() a wrapper
      around evm_inode_post_setxattr() not passing any actual values down.
      This will cause the hashes to be updated as before.
      Reviewed-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
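      The wrapper described above can be modelled in userspace roughly as
      follows. `struct dentry` and `struct posix_acl` are stubs, the name list
      is an assumed example of security.* defaults, and the hook bodies only
      record what they were given; this sketches the call shape, not EVM's
      implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dentry { const char *d_name; };      /* hypothetical stand-in */
struct posix_acl { unsigned int a_count; }; /* hypothetical stand-in */

/* Assumed example of a default protected-name list: security.* names only. */
static const char *const evm_config_xattrnames[] = {
	"security.selinux",
	"security.SMACK64",
	"security.ima",
	"security.capability",
};

static int hmac_updates;       /* how often the modelled HMAC was refreshed */
static const void *last_value; /* value handed to the post-setxattr hook */

/* Does @name take part in the (modelled) HMAC calculation? */
static int evm_protected_xattr_model(const char *name)
{
	size_t n = sizeof(evm_config_xattrnames) / sizeof(evm_config_xattrnames[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(name, evm_config_xattrnames[i]) == 0)
			return 1;
	return 0;
}

/* Model of evm_inode_post_setxattr(): only records the call. */
static void evm_inode_post_setxattr(struct dentry *dentry, const char *name,
				    const void *value, size_t size)
{
	(void)dentry;
	(void)name;
	(void)size;
	last_value = value;
	hmac_updates++;
}

/* The wrapper: forward the acl name, but no value. */
static void evm_inode_post_set_acl(struct dentry *dentry, const char *acl_name,
				   struct posix_acl *kacl)
{
	(void)kacl;
	evm_inode_post_setxattr(dentry, acl_name, NULL, 0);
}
```

      Because "system.posix_acl_access" is not in the modelled list, passing
      NULL instead of the acl bytes loses nothing the hash would have used.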
    • integrity: implement get and set acl hook · e61b135f
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      So far posix acls were passed as a void blob to the security and
      integrity modules. Some of them, like evm, then proceed to interpret the
      void pointer and convert it into the kernel-internal struct posix_acl
      representation to perform their integrity checking magic. This is
      obviously pretty problematic as that requires knowledge that only the
      vfs is guaranteed to have and has led to various bugs. Add a proper
      security hook for setting posix acls and pass down the posix acls in
      their appropriate vfs format instead of hacking it through a void
      pointer stored in the uapi format.
      
      I spent considerable time in the security module and integrity
      infrastructure and audited all codepaths. EVM is the only part that
      really has restrictions based on the actual posix acl values passed
      through it (e.g., i_mode). Before this dedicated hook, EVM used to
      translate from the uapi posix acl format sent to it in the form of a
      void pointer into the vfs format. This is not a good thing. Instead of
      hacking around in the uapi struct, give EVM the posix acls in the
      appropriate vfs format and perform sane permission checks that mirror
      what it used to do in the generic xattr hook.
      
      IMA doesn't have any restrictions on posix acls. When posix acls are
      changed it just wants to update its appraisal status to trigger an EVM
      revalidation.
      
      The removal of posix acls is equivalent to passing NULL to the posix set
      acl hooks. This is the same as before through the generic xattr api.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
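      A minimal sketch of "removal is equivalent to passing NULL", assuming a
      remove hook that simply forwards to the set hook. All types and
      `*_model` functions here are hypothetical userspace stand-ins, not the
      integrity subsystem's code.

```c
#include <assert.h>
#include <stddef.h>

struct dentry { const char *d_name; };      /* hypothetical stand-in */
struct posix_acl { unsigned int a_count; }; /* hypothetical stand-in */

static int set_calls;         /* how often the set hook ran */
static int last_acl_was_null; /* did the last call carry a NULL acl? */

/* Hypothetical set-acl hook: a NULL @kacl is treated as removal. */
static int integrity_set_acl_model(struct dentry *dentry, const char *acl_name,
				   struct posix_acl *kacl)
{
	(void)dentry;
	(void)acl_name;
	set_calls++;
	last_acl_was_null = (kacl == NULL);
	return 0;
}

/* Removal forwards to the set hook with a NULL acl. */
static int integrity_remove_acl_model(struct dentry *dentry, const char *acl_name)
{
	return integrity_set_acl_model(dentry, acl_name, NULL);
}
```

      Routing removal through the set hook keeps a single place where the
      integrity checks live.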
    • smack: implement get, set and remove acl hook · 44faac01
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      So far posix acls were passed as a void blob to the security and
      integrity modules. Some of them, like evm, then proceed to interpret the
      void pointer and convert it into the kernel-internal struct posix_acl
      representation to perform their integrity checking magic. This is
      obviously pretty problematic as that requires knowledge that only the
      vfs is guaranteed to have and has led to various bugs. Add a proper
      security hook for setting posix acls and pass down the posix acls in
      their appropriate vfs format instead of hacking it through a void
      pointer stored in the uapi format.
      
      I spent considerable time in the security module infrastructure and
      audited all codepaths. Smack has no restrictions based on the posix
      acl values passed through it. The capability hook doesn't need to be
      called either because it only has restrictions on security.* xattrs. So
      these all become very simple hooks for smack.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
      Reviewed-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
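      The point that these hooks never look at the acl contents can be
      sketched as below. The access bits, `struct object_model` and the
      `smack_*_model()` functions are hypothetical; mapping get to a read
      check and set/remove to a write check is an assumption about how such
      label-based hooks are typically wired, not Smack's source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { ACC_READ = 1, ACC_WRITE = 2 }; /* hypothetical access bits */

struct posix_acl; /* never dereferenced: the hooks ignore acl contents */

/* Hypothetical: the accesses the current subject holds on the object's label. */
struct object_model { unsigned int granted; };

static int check_access(const struct object_model *obj, unsigned int want)
{
	return (obj->granted & want) == want ? 0 : -EACCES;
}

static int smack_get_acl_model(const struct object_model *obj, const char *acl_name)
{
	(void)acl_name;
	return check_access(obj, ACC_READ);
}

static int smack_set_acl_model(const struct object_model *obj, const char *acl_name,
			       const struct posix_acl *kacl)
{
	(void)acl_name;
	(void)kacl; /* the acl value plays no part in the decision */
	return check_access(obj, ACC_WRITE);
}

static int smack_remove_acl_model(const struct object_model *obj, const char *acl_name)
{
	(void)acl_name;
	return check_access(obj, ACC_WRITE);
}
```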
    • selinux: implement get, set and remove acl hook · 1bdeb218
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      So far posix acls were passed as a void blob to the security and
      integrity modules. Some of them, like evm, then proceed to interpret the
      void pointer and convert it into the kernel-internal struct posix_acl
      representation to perform their integrity checking magic. This is
      obviously pretty problematic as that requires knowledge that only the
      vfs is guaranteed to have and has led to various bugs. Add a proper
      security hook for setting posix acls and pass down the posix acls in
      their appropriate vfs format instead of hacking it through a void
      pointer stored in the uapi format.
      
      I spent considerable time in the security module infrastructure and
      audited all codepaths. SELinux has no restrictions based on the posix
      acl values passed through it. The capability hook doesn't need to be
      called either because it only has restrictions on security.* xattrs. So
      these are all fairly simple hooks for SELinux.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • security: add get, remove and set acl hook · 72b3897e
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      So far posix acls were passed as a void blob to the security and
      integrity modules. Some of them, like evm, then proceed to interpret the
      void pointer and convert it into the kernel-internal struct posix_acl
      representation to perform their integrity checking magic. This is
      obviously pretty problematic as that requires knowledge that only the
      vfs is guaranteed to have and has led to various bugs. Add a proper
      security hook for setting posix acls and pass down the posix acls in
      their appropriate vfs format instead of hacking it through a void
      pointer stored in the uapi format.
      
      In the next patches we implement the hooks for the few security modules
      that do actually have restrictions on posix acls.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
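      How a single security_inode_set_acl()-style entry point can fan out to
      several modules is sketched below as a list of hooks where the first
      error wins. This is a simplified model assuming that dispatch rule; the
      hook type and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dentry { const char *d_name; };      /* hypothetical stand-in */
struct posix_acl { unsigned int a_count; }; /* hypothetical stand-in */

typedef int (*set_acl_hook_t)(struct dentry *dentry, const char *acl_name,
			      struct posix_acl *kacl);

static int counting_calls;

static int allow_hook(struct dentry *d, const char *n, struct posix_acl *a)
{
	(void)d; (void)n; (void)a;
	return 0;
}

static int deny_hook(struct dentry *d, const char *n, struct posix_acl *a)
{
	(void)d; (void)n; (void)a;
	return -EPERM;
}

static int counting_hook(struct dentry *d, const char *n, struct posix_acl *a)
{
	(void)d; (void)n; (void)a;
	counting_calls++;
	return 0;
}

/* Run each registered module's hook in order; the first error wins. */
static int security_inode_set_acl_model(const set_acl_hook_t *hooks, size_t nr,
					struct dentry *dentry, const char *acl_name,
					struct posix_acl *kacl)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = hooks[i](dentry, acl_name, kacl);

		if (ret)
			return ret;
	}
	return 0;
}
```

      Modules without restrictions on posix acls simply register no hook, so
      they never appear in the list.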
    • 9p: implement set acl method · 079da629
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      In order to build a type-safe posix acl api around get and set acl, we
      need all filesystems to implement get and set acl.
      
      So far 9p implemented a ->get_inode_acl() operation that didn't require
      access to the dentry in order to allow (limited) permission checking via
      posix acls in the vfs. Now that we have get and set acl inode operations
      that take a dentry argument we can give 9p get and set acl inode
      operations.
      
      This is mostly a light refactoring of the codepaths currently used in the
      9p posix acl xattr handler. After we have fully implemented the posix acl
      api and switched the vfs over to it, the 9p specific posix acl xattr
      handler and associated code will be removed.
      
      Note, until the vfs has been switched to the new posix acl api this
      patch is a non-functional change.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • 9p: implement get acl method · 6cd4d4e8
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      In order to build a type-safe posix acl api around get and set acl, we
      need all filesystems to implement get and set acl.
      
      So far 9p implemented a ->get_inode_acl() operation that didn't require
      access to the dentry in order to allow (limited) permission checking via
      posix acls in the vfs. Now that we have get and set acl inode operations
      that take a dentry argument we can give 9p get and set acl inode
      operations.
      
      This is mostly a refactoring of the codepaths currently used in the 9p
      posix acl xattr handler. After we have fully implemented the posix acl api and
      switched the vfs over to it, the 9p specific posix acl xattr handler and
      associated code will be removed.
      
      Note, until the vfs has been switched to the new posix acl api this
      patch is a non-functional change.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • cifs: implement set acl method · dc1af4c4
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      In order to build a type-safe posix acl api around get and set acl, we
      need all filesystems to implement get and set acl.
      
      So far cifs wasn't able to implement get and set acl inode operations
      because it needs access to the dentry. Now that we extended the set acl
      inode operation to take a dentry argument and added a new get acl inode
      operation that takes a dentry argument we can let cifs implement get and
      set acl inode operations.
      
      This is mostly a copy and paste of the codepaths currently used in cifs'
      posix acl xattr handler. After we have fully implemented the posix acl
      api and switched the vfs over to it, the cifs specific posix acl xattr
      handler and associated code will be removed and the code duplication
      will go away.
      
      Note, until the vfs has been switched to the new posix acl api this
      patch is a non-functional change.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • cifs: implement get acl method · bd9684b0
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      In order to build a type-safe posix acl api around get and set acl, we
      need all filesystems to implement get and set acl.
      
      So far cifs wasn't able to implement get and set acl inode operations
      because it needs access to the dentry. Now that we extended the set acl
      inode operation to take a dentry argument and added a new get acl inode
      operation that takes a dentry argument we can let cifs implement get and
      set acl inode operations.
      
      This is mostly a copy and paste of the codepaths currently used in cifs'
      posix acl xattr handler. After we have fully implemented the posix acl
      api and switched the vfs over to it, the cifs specific posix acl xattr
      handler and associated code will be removed and the code duplication
      will go away.
      
      Note, until the vfs has been switched to the new posix acl api this
      patch is a non-functional change.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: add new get acl method · 7420332a
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      Since some filesystems rely on the dentry being available to them when
      getting and setting posix acls (e.g., 9p and cifs), they cannot rely on
      the old get acl inode operation to retrieve posix acls and need to
      implement their own custom handlers because of that.
      
      In a previous patch we renamed the old get acl inode operation to
      ->get_inode_acl(). We decided to rename it and implement a new one since
      ->get_inode_acl() is called from generic_permission() and inode_permission(),
      both of which can be called during a filesystem's ->permission()
      handler. So simply passing a dentry argument to ->get_acl() would have
      amounted to also having to pass a dentry argument to ->permission(). We
      avoided that change.
      
      This adds a new ->get_acl() inode operation that takes a dentry
      argument and that filesystems such as 9p, cifs, and overlayfs can
      implement to get posix acls.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
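      A trimmed, hypothetical model of the two getters side by side: a
      cifs-like filesystem leaves `get_inode_acl` unset and only implements the
      dentry-based `get_acl`. The struct, `remote_get_acl()` and
      `get_acl_model()` are illustrative names; the real inode operations also
      take an idmapping argument that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct inode { unsigned long i_ino; };                        /* hypothetical stub */
struct dentry { struct inode *d_inode; const char *d_name; }; /* hypothetical stub */
struct posix_acl { unsigned int a_count; };                   /* hypothetical stub */

/* Trimmed model of an inode operations table with both acl getters. */
struct inode_ops_model {
	/* permission-checking path: only an inode, may be called in RCU mode */
	struct posix_acl *(*get_inode_acl)(struct inode *inode, int type, bool rcu);
	/* get/set acl api path: receives the dentry */
	struct posix_acl *(*get_acl)(struct dentry *dentry, int type);
};

static struct posix_acl remote_acl = { 3 };
static const char *looked_up; /* name the filesystem used for its request */

/* A cifs-like filesystem: it needs the dentry (its name) to ask the server. */
static struct posix_acl *remote_get_acl(struct dentry *dentry, int type)
{
	(void)type;
	looked_up = dentry->d_name;
	return &remote_acl;
}

static const struct inode_ops_model remote_ops = {
	.get_inode_acl = NULL, /* acls are not used for permission checking */
	.get_acl = remote_get_acl,
};

/* Hypothetical api helper: only the dentry-based getter is consulted. */
static struct posix_acl *get_acl_model(const struct inode_ops_model *ops,
				       struct dentry *dentry, int type)
{
	if (!ops->get_acl)
		return NULL; /* this model reports "no acl support" as NULL */
	return ops->get_acl(dentry, type);
}
```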
    • fs: rename current get acl method · cac2f8b8
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      The current inode operation for getting posix acls takes an inode
      argument, but various filesystems (e.g., 9p, cifs, overlayfs) need access
      to the dentry. In contrast to the ->set_acl() inode operation, we cannot
      simply extend ->get_acl() to take a dentry argument. The ->get_acl()
      inode operation is called from:
      
      acl_permission_check()
      -> check_acl()
         -> get_acl()
      
      which is part of generic_permission() which in turn is part of
      inode_permission(). Both generic_permission() and inode_permission() are
      called in the ->permission() handler of various filesystems (e.g.,
      overlayfs). So simply passing a dentry argument to ->get_acl() would
      amount to also having to pass a dentry argument to ->permission(). We
      should avoid this unnecessary change.
      
      So instead of extending the existing inode operation, rename it from
      ->get_acl() to ->get_inode_acl() and later add a ->get_acl() method that
      takes a dentry argument and that filesystems needing access to the
      dentry can implement instead of ->get_inode_acl(). Filesystems like
      cifs, which allow setting and getting posix acls but don't use them for
      permission checking during lookup, can simply not implement
      ->get_inode_acl().
      
      This is intended to be a non-functional change.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
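      The consequence of not implementing ->get_inode_acl() can be sketched
      with a simplified permission check: with no acl available, the check
      falls back to the mode bits. This assumes an -EAGAIN convention for "no
      acl, use the mode"; the collapsed single-mask `struct posix_acl` and the
      `*_model()` names are hypothetical simplifications of check_acl() and
      acl_permission_check().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct posix_acl { unsigned int allowed; }; /* hypothetical: collapsed to one mask */
struct inode;
typedef struct posix_acl *(*get_inode_acl_t)(struct inode *inode, int type, bool rcu);

struct inode {
	get_inode_acl_t get_inode_acl; /* stands in for i_op->get_inode_acl */
	unsigned int mode_bits;        /* mode bits that apply to the caller */
};

/* No ->get_inode_acl() or no acl: report -EAGAIN so the caller uses the mode. */
static int check_acl_model(struct inode *inode, unsigned int mask)
{
	struct posix_acl *acl;

	if (!inode->get_inode_acl)
		return -EAGAIN;
	acl = inode->get_inode_acl(inode, 0, false);
	if (!acl)
		return -EAGAIN;
	return (acl->allowed & mask) == mask ? 0 : -EACCES;
}

static int acl_permission_check_model(struct inode *inode, unsigned int mask)
{
	int ret = check_acl_model(inode, mask);

	if (ret != -EAGAIN)
		return ret;
	return (inode->mode_bits & mask) == mask ? 0 : -EACCES;
}

static struct posix_acl deny_acl = { 0 };

static struct posix_acl *deny_get_inode_acl(struct inode *inode, int type, bool rcu)
{
	(void)inode; (void)type; (void)rcu;
	return &deny_acl;
}
```

      An inode with no ->get_inode_acl() is judged purely on its mode bits,
      which is the "simply not implement" case for filesystems like cifs.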
  2. 19 Oct, 2022 2 commits
    • fs: pass dentry to set acl method · 138060ba
      Christian Brauner authored
      The current way of setting and getting posix acls through the generic
      xattr interface is error prone and type unsafe. The vfs needs to
      interpret and fix up posix acls before storing or reporting them to
      userspace. Various hacks exist to make this work. The code is hard to
      understand and difficult to maintain in its current form. Instead of
      making this work by hacking posix acls through xattr handlers we are
      building a dedicated posix acl api around the get and set inode
      operations. This removes a lot of hackiness and makes the codepaths
      easier to maintain. A lot of background can be found in [1].
      
      Since some filesystems rely on the dentry being available to them when
      setting posix acls (e.g., 9p and cifs), they cannot rely on the set acl
      inode operation. But since ->set_acl() is required in order to use the
      generic posix acl xattr handlers, filesystems that do not implement this
      inode operation cannot use the handler and need to implement their own
      dedicated posix acl handlers.
      
      Update the ->set_acl() inode method to take a dentry argument. This
      allows all filesystems to rely on ->set_acl().
      
      As far as I can tell all codepaths can be switched to rely on the dentry
      instead of just the inode. Note that the original motivation for passing
      the dentry separately from the inode, instead of just the dentry, in the
      xattr handlers was security modules that call
      security_d_instantiate(). This hook is called during
      d_instantiate_new(), d_add(), __d_instantiate_anon(), and
      d_splice_alias() to initialize the inode's security context and possibly
      to set security.* xattrs. Since this only affects security.* xattrs this
      is completely irrelevant for posix acls.
      
      Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
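      Why a dentry helps can be shown with a cifs-like filesystem that
      addresses objects by path: the path is built by walking dentry parents,
      which an inode alone cannot do. `remote_set_acl()`, `build_path()` and
      the stub structs are hypothetical, and the real ->set_acl() additionally
      takes an idmapping argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stub: just enough of a dentry to recover a path. */
struct dentry {
	const char *d_name;
	struct dentry *d_parent; /* NULL for the root in this model */
};
struct posix_acl { unsigned int a_count; }; /* hypothetical stub */

static char last_path[256]; /* path the filesystem would send to the server */

/* Build "/a/b" by walking parents; an inode alone doesn't know its name. */
static void build_path(const struct dentry *dentry, char *buf, size_t len)
{
	if (!dentry->d_parent) {
		buf[0] = '\0';
		return;
	}
	build_path(dentry->d_parent, buf, len);
	strncat(buf, "/", len - strlen(buf) - 1);
	strncat(buf, dentry->d_name, len - strlen(buf) - 1);
}

/* A cifs-like ->set_acl() that needs the dentry to address the object. */
static int remote_set_acl(struct dentry *dentry, struct posix_acl *acl, int type)
{
	(void)acl;
	(void)type;
	build_path(dentry, last_path, sizeof(last_path));
	return 0;
}
```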
    • orangefs: rework posix acl handling when creating new filesystem objects · 4053d250
      Christian Brauner authored
      When creating new filesystem objects, orangefs used to create posix acls
      after it had created and inserted a new inode. This made it necessary to
      call posix_acl_chmod() on the newly created inode in case the mode of the
      inode would be changed by the posix acls.
      
      Instead of doing it this way, calculate the correct mode directly before
      actually creating the inode. So we first create posix acls, then pass
      the mode that posix acls mandate into the orangefs getattr helper and
      calculate the correct mode. This is needed so we can simply change
      posix_acl_chmod() to take a dentry instead of an inode argument in the
      next patch.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
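      Computing the mode up front can be sketched as masking the requested
      mode with the parent's default acl, taking the group class from the mask
      entry when one exists. `struct default_acl_model` and
      `mode_from_default_acl()` are hypothetical, simplified stand-ins for the
      generic posix acl creation logic; they are not orangefs code.

```c
#include <assert.h>

/* Hypothetical, trimmed default acl: rwx permission triplets (07 = rwx). */
struct default_acl_model {
	unsigned int user_obj;  /* owner entry */
	unsigned int group_obj; /* owning group entry */
	unsigned int mask;      /* mask entry, used only if has_mask */
	unsigned int other;     /* other entry */
	int has_mask;
};

/*
 * Mode for a new inode, computed before it is created: each permission class
 * of the requested mode is masked by the matching acl entry; the group class
 * uses the mask entry when there is one. Non-permission bits pass through.
 */
static unsigned int mode_from_default_acl(unsigned int mode,
					  const struct default_acl_model *acl)
{
	unsigned int group = acl->has_mask ? acl->mask : acl->group_obj;
	unsigned int allowed = (acl->user_obj << 6) | (group << 3) | acl->other;

	return (mode & ~0777u) | (mode & allowed);
}
```

      For example, a requested 0666 under a default acl of user::rwx,
      group::r-x, other::--- yields 0640, so the inode can be created with its
      final mode and no follow-up chmod is needed.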
  3. 16 Oct, 2022 10 commits
    • Linux 6.1-rc1 · 9abf2313
      Linus Torvalds authored
    • Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · f1947d7c
      Linus Torvalds authored
      Pull more random number generator updates from Jason Donenfeld:
       "This time with some large scale treewide cleanups.
      
        The intent of this pull is to clean up the way callers fetch random
        integers. The current rules for doing this right are:
      
         - If you want a secure or an insecure random u64, use get_random_u64()
      
         - If you want a secure or an insecure random u32, use get_random_u32()
      
           The old function prandom_u32() has been deprecated for a while
           now and is just a wrapper around get_random_u32(). Same for
           get_random_int().
      
         - If you want a secure or an insecure random u16, use get_random_u16()
      
         - If you want a secure or an insecure random u8, use get_random_u8()
      
         - If you want secure or insecure random bytes, use get_random_bytes().
      
           The old function prandom_bytes() has been deprecated for a while
           now and has long been a wrapper around get_random_bytes().
      
         - If you want a non-uniform random u32, u16, or u8 bounded by a
           certain open interval maximum, use prandom_u32_max()
      
           I say "non-uniform", because it doesn't do any rejection sampling
           or divisions. Hence, it stays within the prandom_*() namespace, not
           the get_random_*() namespace.
      
           I'm currently investigating a "uniform" function for 6.2. We'll see
           what comes of that.
      
        By applying these rules uniformly, we get several benefits:
      
         - By using prandom_u32_max() with an upper-bound that the compiler
           can prove at compile-time is ≤65536 or ≤256, internally
           get_random_u16() or get_random_u8() is used, which wastes fewer
           batched random bytes, and hence has higher throughput.
      
         - By using prandom_u32_max() instead of %, when the upper-bound is
           not a constant, division is still avoided, because
           prandom_u32_max() uses a faster multiplication-based trick instead.
      
         - By using get_random_u16() or get_random_u8() in cases where the
           return value is intended to indeed be a u16 or a u8, we waste fewer
           batched random bytes, and hence have higher throughput.
      
        This series was originally done by hand while I was on an airplane
        without Internet. Later, Kees and I worked on retroactively figuring
        out what could be done with Coccinelle and what had to be done
        manually, and then we split things up based on that.
      
        So while this touches a lot of files, the actual amount of code that's
        hand fiddled is comfortably small"
      
      * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        prandom: remove unused functions
        treewide: use get_random_bytes() when possible
        treewide: use get_random_u32() when possible
        treewide: use get_random_{u8,u16}() when possible, part 2
        treewide: use get_random_{u8,u16}() when possible, part 1
        treewide: use prandom_u32_max() when possible, part 2
        treewide: use prandom_u32_max() when possible, part 1
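      The multiplication-based trick that replaces the division can be
      sketched as follows. `scale_u32()` is a hypothetical helper that takes
      the random value as a parameter so the mapping is deterministic to test;
      it shows the technique, not the kernel's prandom_u32_max() source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit random value into [0, ep_ro) with one 64-bit multiplication
 * and a shift instead of a modulo. Not perfectly uniform: there is no
 * rejection sampling.
 */
static uint32_t scale_u32(uint32_t rnd, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)rnd * ep_ro) >> 32);
}
```

      For example, scale_u32(0x80000000, 10) returns 5, since the input sits
      halfway through the 32-bit range. Because 2^32 is generally not a
      multiple of ep_ro, some outputs are slightly more likely than others,
      which is the non-uniformity the pull message mentions.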
    • Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of... · 8636df94
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
      
       - Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
         when using bperf (perf BPF based counters) with cgroups.
      
       - Support HiSilicon PCIe Performance Monitoring Unit (PMU), which
         monitors bandwidth, latency, bus utilization and buffer occupancy.
      
         Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
      
       - User space tasks can migrate between CPUs, so when tracing selected
         CPUs, system-wide sideband is still needed; fix this in the setup of
         Intel PT on hybrid systems.
      
       - Fix metricgroups title message in 'perf list'; it should state that
         the metric groups are to be used with the '-M' option, not '-e'.
      
       - Sync the msr-index.h copy with the kernel sources, adding support for
         using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
         as decoding it when printing the MSR tracepoint arguments.
      
       - Fix program header size and alignment when generating a JIT ELF in
         'perf inject'.
      
       - Add multiple new Intel PT 'perf test' entries, including a jitdump
         one.
      
       - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
         running on PowerPC due to an invalid topology number in that arch.
      
       - Fix the 'perf test' for arm_coresight failures on the ARM Juno
         system.
      
       - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
         option to the or expression expected in the intercepted
         perf_event_open() syscall.
      
       - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
         'perf annotate' asm parser.
      
       - Fix 'perf mem record -C' option processing: it was being chopped up
         when preparing the underlying 'perf record -e mem-events' and thus
         being ignored, requiring '-- -C CPUs' as a workaround.
      
       - Improvements and tidy ups for 'perf test' shell infra.
      
       - Fix Intel PT information printing segfault in uClibc, where a NULL
         format was being passed to fprintf.
      
      * tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
        tools arch x86: Sync the msr-index.h copy with the kernel sources
        perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
        perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
        perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
        perf tests stat+json_output: Include sanity check for topology
        perf tests stat+csv_output: Include sanity check for topology
        perf intel-pt: Fix system_wide dummy event for hybrid
        perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
        perf test: Fix attr tests for PERF_FORMAT_LOST
        perf test: test_intel_pt.sh: Add 9 tests
        perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
        perf test: test_intel_pt.sh: Add jitdump test
        perf test: test_intel_pt.sh: Tidy some alignment
        perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
        perf test: test_intel_pt.sh: Tidy some perf record options
        perf test: test_intel_pt.sh: Fix return checking again
        perf: Skip and warn on unknown format 'configN' attrs
        perf list: Fix metricgroups title message
        perf mem: Fix -C option behavior for perf mem record
        perf annotate: Add missing condition flags for arm64
        ...
    • Merge tag 'kbuild-fixes-v6.1' of... · 2df76606
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
         combination of Clang >= 14 and GAS <= 2.35.
      
       - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
         the package size.
      
       - Fix modpost error under build environments using musl.
      
       - Make *.ll files keep value names for easier debugging.

       - Fix single directory build.
      
       - Prevent RISC-V from selecting the broken DWARF5 support when Clang
         and GAS are used together.
      
      * tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
        kbuild: fix single directory build
        kbuild: add -fno-discard-value-names to cmd_cc_ll_c
        scripts/clang-tools: Convert clang-tidy args to list
        modpost: put modpost options before argument
        kbuild: Stop including vmlinux.bz2 in the rpm's
        Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
        Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
    • Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 2fcd8f10
      Linus Torvalds authored
      Pull more clk updates from Stephen Boyd:
       "This is the final part of the clk patches for this merge window.
      
        The clk rate range series needed another week to fully bake. Maxime
        fixed the bug that broke clk notifiers and prevented this from being
        included in the first pull request. He also added a unit test on top
        to make sure it doesn't break so easily again. The majority of the
        series fixes up how the clk_set_rate_*() APIs work, particularly
        around when the rate constraints are dropped and how they move around
        when reparenting clks. Overall it's a much needed improvement to the
        clk rate range APIs that used to be pretty broken if you looked
        sideways.
      
        Beyond the core changes there are a few driver fixes for a compilation
        issue or improper data causing clks to fail to register or have the
        wrong parents. These are good to get in before the first -rc so that
        the system actually boots on the affected devices"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
        clk: tegra: Fix Tegra PWM parent clock
        clk: at91: fix the build with binutils 2.27
        clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
        clk: mediatek: clk-mux: Add .determine_rate() callback
        clk: tests: Add tests for notifiers
        clk: Update req_rate on __clk_recalc_rates()
        clk: tests: Add missing test case for ranges
        clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
        clk: Introduce the clk_hw_get_rate_range function
        clk: Zero the clk_rate_request structure
        clk: Stop forwarding clk_rate_requests to the parent
        clk: Constify clk_has_parent()
        clk: Introduce clk_core_has_parent()
        clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
        clk: Add our request boundaries in clk_core_init_rate_req
        clk: Introduce clk_hw_init_rate_request()
        clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
        clk: Change clk_core_init_rate_req prototype
        clk: Set req_rate on reparenting
        clk: Take into account uncached clocks in clk_set_rate_range()
        ...
    • Merge tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · b08cd744
      Linus Torvalds authored
      Pull more cifs updates from Steve French:
      
       - fix a regression in guest mounts to old servers
      
       - improvements to directory leasing (caching directory entries safely
         beyond the root directory)
      
       - symlink improvement (reducing roundtrips needed to process symlinks)
      
       - an lseek fix (for a problem where some dir entries could be skipped)
      
       - improved ioctl for returning more detailed information on directory
         change notifications
      
       - clarify multichannel interface query warning
      
       - cleanup fix (for better aligning buffers using ALIGN and round_up)
      
       - a compounding fix
      
       - fix some uninitialized variable bugs found by Coverity and the kernel
         test robot
      
      * tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: improve SMB3 change notification support
        cifs: lease key is uninitialized in two additional functions when smb1
        cifs: lease key is uninitialized in smb1 paths
        smb3: must initialize two ACL struct fields to zero
        cifs: fix double-fault crash during ntlmssp
        cifs: fix static checker warning
        cifs: use ALIGN() and round_up() macros
        cifs: find and use the dentry for cached non-root directories also
        cifs: enable caching of directories for which a lease is held
        cifs: prevent copying past input buffer boundaries
        cifs: fix uninitialised var in smb2_compound_op()
        cifs: improve symlink handling for smb2+
        smb3: clarify multichannel warning
        cifs: fix regression in very old smb1 mounts
        cifs: fix skipping to incorrect offset in emit_cached_dirents
    • Revert "cpumask: fix checking valid cpu range". · 80493877
      Tetsuo Handa authored
      This reverts commit 78e5a339 ("cpumask: fix checking valid cpu range").
      
      syzbot is hitting the WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning in
      cpu_max_bits_warn() [1] because commit 78e5a339 ("cpumask: fix checking
      valid cpu range") is broken. That patch hits WARN_ON_ONCE() when, e.g.,
      reading /proc/cpuinfo, because passing "cpu + 1" instead of "cpu"
      trivially reaches the cpu == nr_cpumask_bits condition.
      
      Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
      it was not fixed immediately. As a result, that patch was sent to
      linux.git before its author recognized the problem, and syzbot has been
      failing to test changes in linux.git since 2022/10/10 [3].
      
      Andrew Jones proposed a fix for the x86 and riscv architectures [4], but
      [2] and [5] indicate that the affected locations are not limited to arch
      code. The longer it takes to find and fix the affected locations, the
      less tested (and the harder to bisect and fix) the kernel will be
      before release.
      
      We should have inspected and fixed basically all cpumask users before
      applying that patch.  We should not crash kernels in order to ask
      existing cpumask users to update their code, even if only in the
      CONFIG_DEBUG_PER_CPU_MAPS=y case.
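      A minimal userspace model of the "cpu + 1" failure described above, not
      the kernel's cpumask code: NR_BITS, next_set_bit() and walk_all() are
      hypothetical names. Any walk that calls the "next" helper with the last
      valid CPU trips a range check applied to n + 1, while the same check
      applied to n never fires.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for nr_cpumask_bits: an 8-CPU system. */
#define NR_BITS 8u

static unsigned int warn_count;

/* Models the cpu_max_bits_warn() check: warn when cpu is out of range. */
static void max_bits_warn(unsigned int cpu, unsigned int bits)
{
	if (cpu >= bits) {
		warn_count++;
		fprintf(stderr, "WARN: cpu %u >= %u\n", cpu, bits);
	}
}

/*
 * Return the next set bit after n, or NR_BITS if there is none.
 * check_next selects which value gets range-checked:
 *   false: n itself, so the last valid CPU is an acceptable argument
 *   true:  n + 1, as in the reverted patch
 */
static unsigned int next_set_bit(unsigned long mask, unsigned int n, bool check_next)
{
	max_bits_warn(check_next ? n + 1 : n, NR_BITS);
	for (unsigned int i = n + 1; i < NR_BITS; i++)
		if (mask & (1ul << i))
			return i;
	return NR_BITS;
}

/* Visit every set bit, like a for_each_cpu()-style walk; returns the count. */
static unsigned int walk_all(unsigned long mask, bool check_next)
{
	unsigned int cpu = 0, visited = 0;

	while (cpu < NR_BITS && !(mask & (1ul << cpu)))
		cpu++;
	while (cpu < NR_BITS) {
		visited++;
		cpu = next_set_bit(mask, cpu, check_next);
	}
	return visited;
}
```

      With all eight bits set, the n + 1 variant warns exactly once, on the
      call made with the last CPU; with only the low four bits set it never
      reaches the boundary and stays silent.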
      
      Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
      Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
      Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
      Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4]
      Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
      Reported-by: Andrew Jones <ajones@ventanamicro.com>
      Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Yury Norov <yury.norov@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5 · 0a6de78c
      Nathan Chancellor authored
      When building a RISC-V kernel with DWARF5 debug info using clang and
      the GNU assembler, several instances of the following error appear:
      
        /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
      
      Dumping the .s file reveals these .uleb128 directives come from
      .debug_loc and .debug_ranges:
      
        .Ldebug_loc0:
                .byte   4                               # DW_LLE_offset_pair
                .uleb128 .Lfunc_begin0-.Lfunc_begin0    #   starting offset
                .uleb128 .Ltmp1-.Lfunc_begin0           #   ending offset
                .byte   1                               # Loc expr size
                .byte   90                              # DW_OP_reg10
                .byte   0                               # DW_LLE_end_of_list
      
        .Ldebug_ranges0:
                .byte   4                               # DW_RLE_offset_pair
                .uleb128 .Ltmp6-.Lfunc_begin0           #   starting offset
                .uleb128 .Ltmp27-.Lfunc_begin0          #   ending offset
                .byte   4                               # DW_RLE_offset_pair
                .uleb128 .Ltmp28-.Lfunc_begin0          #   starting offset
                .uleb128 .Ltmp30-.Lfunc_begin0          #   ending offset
                .byte   0                               # DW_RLE_end_of_list
      
      There is an outstanding binutils issue to support a non-constant operand
      to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
      be any movement on it, due to concerns over how it would work with
      linker relaxation.
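      ULEB128 is a variable-length encoding, so the number of bytes a
      .uleb128 directive occupies depends on the value it encodes. One way to
      see why a symbol-difference operand is awkward under linker relaxation:
      if relaxation shrinks code so that a delta crosses a 7-bit boundary,
      the encoded size changes. The sketch below is a standard DWARF ULEB128
      encoder/decoder written from the DWARF definition, not code taken from
      GAS; the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode value as ULEB128: 7 payload bits per byte, least significant group
 * first, high bit set on every byte except the last. out needs room for up
 * to 10 bytes (enough for any uint64_t). Returns the number of bytes written.
 */
static size_t uleb128_encode(uint64_t value, uint8_t *out)
{
	size_t n = 0;

	do {
		uint8_t byte = value & 0x7f;

		value >>= 7;
		if (value != 0)
			byte |= 0x80;	/* more bytes follow */
		out[n++] = byte;
	} while (value != 0);
	return n;
}

/* Decode a well-formed ULEB128 value; *len receives the bytes consumed. */
static uint64_t uleb128_decode(const uint8_t *in, size_t *len)
{
	uint64_t result = 0;
	unsigned int shift = 0;
	size_t n = 0;
	uint8_t byte;

	do {
		byte = in[n++];
		result |= (uint64_t)(byte & 0x7f) << shift;
		shift += 7;
	} while (byte & 0x80);
	*len = n;
	return result;
}
```

      For example, an offset of 127 fits in one byte but 128 needs two, so
      a delta such as .Ltmp1-.Lfunc_begin0 cannot be given a fixed size until
      its final value is known.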
      
      To avoid these build errors, prevent DWARF5 from being selected when
      using clang and an assembler that does not have support for these symbol
      deltas, which can be easily checked in Kconfig with as-instr plus the
      small test program from the dwz test suite from the binutils issue.
      
      Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
      Link: https://github.com/ClangBuiltLinux/linux/issues/1719
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: fix single directory build · 3753af77
      Masahiro Yamada authored
      Commit f110e5a2 ("kbuild: refactor single builds of *.ko") was wrong.
      
      KBUILD_MODULES _is_ needed for single builds.
      
      Otherwise, "make foo/bar/baz/" does not build module objects at all.
      
      Fixes: f110e5a2 ("kbuild: refactor single builds of *.ko")
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: David Sterba <dsterba@suse.com>
    • Merge tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 1501278b
      Linus Torvalds authored
      Pull slab hotfix from Vlastimil Babka:
       "A single fix for the common-kmalloc series, for warnings on mips and
        sparc64 reported by Guenter Roeck"
      
      * tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
  4. 15 Oct, 2022 16 commits
    • Merge tag 'for-linus' of https://github.com/openrisc/linux · 36d8a3ed
      Linus Torvalds authored
      Pull OpenRISC updates from Stafford Horne:
       "I have relocated to London so not much work from me while I get
        settled.
      
        Still, OpenRISC picked up two patches in this window:
      
         - Fix for kernel page table walking from Jann Horn
      
         - MAINTAINER entry cleanup from Palmer Dabbelt"
      
      * tag 'for-linus' of https://github.com/openrisc/linux:
        MAINTAINERS: git://github -> https://github.com for openrisc
        openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
    • Merge tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 41410965
      Linus Torvalds authored
      Pull pci fix from Bjorn Helgaas:
       "Revert the attempt to distribute spare resources to unconfigured
        hotplug bridges at boot time.
      
        This fixed some dock hot-add scenarios, but Jonathan Cameron reported
        that it broke a topology with a multi-function device where one
        function was a Switch Upstream Port and the other was an Endpoint"
      
      * tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        Revert "PCI: Distribute available resources for root buses, too"
    • mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation · e36ce448
      Hyeonggon Yoo authored
      After commit d6a71648 ("mm/slab: kmalloc: pass requests larger than
      order-1 page to page allocator"), SLAB passes large (> PAGE_SIZE * 2)
      requests to the buddy allocator, as SLUB does.

      SLAB has been using kmalloc caches to allocate the freelist_idx_t array
      for off-slab caches. But after that commit, freelist_size can be bigger
      than KMALLOC_MAX_CACHE_SIZE.

      Instead of using a pointer to a kmalloc cache, use kmalloc_node() and
      only check whether the kmalloc cache is off-slab during
      calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no
      looping condition can occur, because the freelist_idx_t array is then
      allocated directly from the buddy allocator.
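      A back-of-the-envelope sketch of why a fixed kmalloc cache pointer no
      longer works for every freelist size. It assumes 4 KiB pages and takes
      the order-1 (PAGE_SIZE * 2) limit from the text above; the SKETCH_*
      constants and both helpers are hypothetical, and the real
      calculate_slab_order() logic is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for this sketch only: 4 KiB pages, order-1 kmalloc cache limit. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_KMALLOC_MAX_CACHE_SIZE (2u * SKETCH_PAGE_SIZE)

/*
 * Off-slab freelist size: one freelist_idx_t per object. idx_size is 1 or 2,
 * depending on whether freelist_idx_t is unsigned char or unsigned short in
 * the configuration being modelled.
 */
static size_t freelist_size(size_t nr_objs, size_t idx_size)
{
	return nr_objs * idx_size;
}

/* True when a kmalloc() of this size would bypass the kmalloc caches. */
static bool goes_to_buddy(size_t size)
{
	return size > SKETCH_KMALLOC_MAX_CACHE_SIZE;
}
```

      Under these assumptions, 8192 objects with 2-byte indices need 16 KiB,
      a size for which no kmalloc cache exists, so kmalloc_node() has to hand
      the request to the page allocator instead.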
      
      Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
      Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
      Fixes: d6a71648 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    • MAINTAINERS: git://github -> https://github.com for openrisc · 34a0bac0
      Palmer Dabbelt authored
      GitHub deprecated the git:// links about a year ago, so let's move to
      the https:// URLs instead.
      Reported-by: Conor Dooley <conor.dooley@microchip.com>
      Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
      Signed-off-by: Stafford Horne <shorne@gmail.com>
    • smb3: improve SMB3 change notification support · e3e94634
      Steve French authored
      Change notification is a feature supported by most servers, but the
      current ioctl to request notification when a directory is changed does
      not return information about what changed (even though the server
      returns it in the SMB3 change notify response); it simply returns when
      there is a change.
      
      This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify
      information structure which includes the name of the file(s) that
      changed and why. See MS-SMB2 2.2.35 for details on the individual
      filter flags and the file_notify_information structure returned.
      
      To use it, pass in the following structure (with enough space to fit
      at least one file_notify_information structure):

      #include <stdbool.h>
      #include <stdint.h>

      struct __attribute__((__packed__)) smb3_notify_info {
             uint32_t completion_filter;
             bool     watch_tree;
             uint32_t data_len;
             uint8_t  data[];
      };

      using CIFS_IOC_NOTIFY_INFO (0xc009cf0b), or equivalently
      _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info).
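      The ioctl number above can be cross-checked by hand. This sketch
      assumes the asm-generic ioctl layout used by x86 and arm64 (powerpc,
      mips and sparc encode direction and size differently) and a
      CIFS_IOCTL_MAGIC of 0xCF, which is what the 0xcf byte in 0xc009cf0b
      implies. ioc_generic() is a hypothetical helper; real code would simply
      use _IOWR() from <linux/ioctl.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Request layout from the commit message; packed, so 4 + 1 + 4 = 9 bytes. */
struct __attribute__((__packed__)) smb3_notify_info {
	uint32_t completion_filter;
	bool     watch_tree;
	uint32_t data_len;
	uint8_t  data[];
};

#define CIFS_IOCTL_MAGIC 0xCF	/* implied by the 0xcf byte of 0xc009cf0b */
#define IOC_DIR_READ_WRITE 3u	/* _IOC_READ | _IOC_WRITE in the generic layout */

/*
 * Generic ioctl number layout: bits 31-30 direction, 29-16 argument size,
 * 15-8 type (magic), 7-0 command number.
 */
static uint32_t ioc_generic(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
	return (dir << 30) | (size << 16) | (type << 8) | nr;
}
```

      The 9-byte size is why the packed attribute matters: without it,
      padding after watch_tree would make the structure 12 bytes and the
      resulting ioctl number would differ.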
      
      The ioctl will block until the server detects a change to that
      directory or its subdirectories (if watch_tree is set).
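      The notify data is a chain of FILE_NOTIFY_INFORMATION records, defined
      in MS-FSCC 2.7.1 and referenced from the MS-SMB2 change notify
      response. A minimal parsing sketch, assuming the caller already holds
      the returned notify data and its length; walk_notify_info() is a
      hypothetical helper that narrows UTF-16LE names to ASCII purely for
      display.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Walk FILE_NOTIFY_INFORMATION records (MS-FSCC 2.7.1):
 *   u32 NextEntryOffset  byte offset to the next record, 0 for the last
 *   u32 Action           1 added, 2 removed, 3 modified, 4/5 renamed old/new name
 *   u32 FileNameLength   length of FileName in bytes
 *   FileName             UTF-16LE, not NUL-terminated
 * Prints one line per record. Returns the record count, or -1 if a record
 * would run past len.
 */
static int walk_notify_info(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	for (;;) {
		if (len - off < 12)
			return -1;

		const uint8_t *rec = buf + off;
		uint32_t next = get_le32(rec);
		uint32_t action = get_le32(rec + 4);
		uint32_t name_len = get_le32(rec + 8);

		if (name_len > len - off - 12)
			return -1;
		printf("action %u: ", (unsigned int)action);
		for (uint32_t i = 0; i + 1 < name_len; i += 2) {
			unsigned int ch = rec[12 + i] | (unsigned int)rec[13 + i] << 8;

			putchar(ch < 0x80 ? (int)ch : '?');
		}
		putchar('\n');
		count++;

		if (next == 0)
			return count;
		if (next > len - off)
			return -1;
		off += next;
	}
}
```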
      Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: lease key is uninitialized in two additional functions when smb1 · 2bff0659
      Steve French authored
      cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized
      in smb1 mounts. It is cleaner to set the lease key to zero in these
      places where leases are not supported (smb1 cannot return lease keys,
      so the field was uninitialized).
      
      Addresses-Coverity: 1514207 ("Uninitialized scalar variable")
      Addresses-Coverity: 1514331 ("Uninitialized scalar variable")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: lease key is uninitialized in smb1 paths · 625b60d4
      Steve French authored
      It is cleaner to set the lease key to zero in the places where leases are
      not supported (smb1 cannot return lease keys, so the field was uninitialized).
      
      Addresses-Coverity: 1513994 ("Uninitialized scalar variable")
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • smb3: must initialize two ACL struct fields to zero · f09bd695
      Steve French authored
      Coverity spotted that we were not initializing Stbz1 and Stbz2 to
      zero in create_sd_buf.
      
      Addresses-Coverity: 1513848 ("Uninitialized scalar variable")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: fix double-fault crash during ntlmssp · b854b4ee
      Paulo Alcantara authored
      The crash occurred because we were calling memzero_explicit() on an
      already freed sess_data::iov[1] (ntlmsspblob) in sess_free_buffer().
      
      Fix this by not calling memzero_explicit() on sess_data::iov[1], as
      that is already handled by the callers.
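      The fix itself just drops the redundant wipe. As a general
      illustration of the ownership rule it restores (exactly one path wipes
      and frees a sensitive buffer), here is a userspace sketch rather than
      the cifs code: session_buf, zero_explicit() and session_buf_release()
      are hypothetical, and zero_explicit() is a volatile-store stand-in for
      the kernel's memzero_explicit().

```c
#include <stdlib.h>
#include <string.h>

/* Volatile stores keep the compiler from eliding the wipe before free(). */
static void zero_explicit(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Hypothetical stand-in for one sensitive sess_data::iov slot. */
struct session_buf {
	unsigned char *blob;
	size_t len;
};

/*
 * The single owner wipes, frees, and clears the pointer, so any later
 * cleanup path sees NULL instead of touching freed memory.
 */
static void session_buf_release(struct session_buf *s)
{
	if (!s->blob)
		return;
	zero_explicit(s->blob, s->len);
	free(s->blob);
	s->blob = NULL;
	s->len = 0;
}
```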
      
      Fixes: a4e430c8 ("cifs: replace kfree() with kfree_sensitive() for sensitive data")
      Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · a3a36565
      Arnaldo Carvalho de Melo authored
      To pick up the changes in:
      
        b8d1d163 ("x86/apic: Don't disable x2APIC if locked")
        ca5b7c0d ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
      
      Addressing these tools/perf build warnings:
      
          diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
          Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
      
      This makes the beautification scripts pick up some new entries:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2022-10-14 18:06:34.294561729 -0300
        +++ after	2022-10-14 18:06:41.285744044 -0300
        @@ -264,6 +264,7 @@
         	[0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
         	[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
         	[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
        +	[0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
         	[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
         	[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
         	[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
        $
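      The generated table is a sparse C array indexed by MSR number minus a
      base offset, so a lookup is a bounds check plus an array access. A
      reduced sketch using the entries from the diff above; the
      X86_64_SPECIFIC_MSRS_OFFSET value here is an assumption chosen to keep
      the array small, not the value the script generates, and msr_name() /
      msr_by_name() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed base for this sketch only; the script derives the real one. */
#define X86_64_SPECIFIC_MSRS_OFFSET 0xc0000100u

/* Entries copied from the diff above; unnamed slots stay NULL. */
static const char *const x86_64_specific_msrs[] = {
	[0xc0000102 - X86_64_SPECIFIC_MSRS_OFFSET] = "KERNEL_GS_BASE",
	[0xc0000103 - X86_64_SPECIFIC_MSRS_OFFSET] = "TSC_AUX",
	[0xc0000104 - X86_64_SPECIFIC_MSRS_OFFSET] = "AMD64_TSC_RATIO",
	[0xc000010e - X86_64_SPECIFIC_MSRS_OFFSET] = "AMD64_LBR_SELECT",
	[0xc000010f - X86_64_SPECIFIC_MSRS_OFFSET] = "AMD_DBG_EXTN_CFG",
};

#define NR_MSR_SLOTS (sizeof(x86_64_specific_msrs) / sizeof(x86_64_specific_msrs[0]))

/* MSR number -> name, or NULL when outside the table or in a gap. */
static const char *msr_name(uint32_t msr)
{
	if (msr < X86_64_SPECIFIC_MSRS_OFFSET ||
	    msr - X86_64_SPECIFIC_MSRS_OFFSET >= NR_MSR_SLOTS)
		return NULL;
	return x86_64_specific_msrs[msr - X86_64_SPECIFIC_MSRS_OFFSET];
}

/* Name -> MSR number by linear scan; returns false if the name is unknown. */
static bool msr_by_name(const char *name, uint32_t *msr)
{
	for (size_t i = 0; i < NR_MSR_SLOTS; i++) {
		if (x86_64_specific_msrs[i] &&
		    strcmp(x86_64_specific_msrs[i], name) == 0) {
			*msr = X86_64_SPECIFIC_MSRS_OFFSET + (uint32_t)i;
			return true;
		}
	}
	return false;
}
```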
      
      Now one can trace system-wide, asking to see backtraces to where that MSR
      is being read/written; see this example with a previous update:
      
        # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
        ^C#
      
      If we use -v (verbose mode) we can see what it does behind the scenes:
      
        # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
        Using CPUID AuthenticAMD-25-21-0
        0x6a0
        0x6a8
        New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
        0x6a0
        0x6a8
        New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
        mmap size 528384B
        ^C#
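      The -v output shows the symbolic MSR names in the filter being replaced
      by their numbers before the filter is installed. A simplified sketch of
      that substitution step, not perf's actual filter code:
      rewrite_msr_filter() and its three-entry table are hypothetical, with
      values taken from the output in this message.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name table; values come from the -v output. */
static const struct {
	const char *name;
	unsigned int msr;
} msr_names[] = {
	{ "IA32_SPEC_CTRL",   0x48  },
	{ "IA32_U_CET",       0x6a0 },
	{ "IA32_INT_SSP_TAB", 0x6a8 },
};

/* Look up an identifier of length len (not NUL-terminated) in the table. */
static bool lookup_msr(const char *name, size_t len, unsigned int *msr)
{
	for (size_t i = 0; i < sizeof(msr_names) / sizeof(msr_names[0]); i++) {
		if (strlen(msr_names[i].name) == len &&
		    memcmp(msr_names[i].name, name, len) == 0) {
			*msr = msr_names[i].msr;
			return true;
		}
	}
	return false;
}

/*
 * Copy filter to out, replacing identifiers that name a known MSR with their
 * hex value; everything else is copied unchanged. Returns false if out is
 * too small for the result plus its terminating NUL.
 */
static bool rewrite_msr_filter(const char *filter, char *out, size_t out_size)
{
	size_t o = 0;
	const char *p = filter;

	if (out_size == 0)
		return false;
	while (*p) {
		const char *src = p;
		size_t src_len = 1;
		char num[16];

		if (isalpha((unsigned char)*p) || *p == '_') {
			unsigned int msr;

			while (isalnum((unsigned char)*p) || *p == '_')
				p++;
			src_len = (size_t)(p - src);
			if (lookup_msr(src, src_len, &msr)) {
				src_len = (size_t)snprintf(num, sizeof(num), "%#x", msr);
				src = num;
			}
		} else {
			p++;
		}
		if (o + src_len >= out_size)
			return false;
		memcpy(out + o, src, src_len);
		o += src_len;
	}
	out[o] = '\0';
	return true;
}
```

      perf additionally appends the common_pid exclusions visible in the
      output; that part is left out of the sketch.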
      
      Example with a frequent msr:
      
        # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
        Using CPUID AuthenticAMD-25-21-0
        0x48
        New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
        0x48
        New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
        mmap size 528384B
        Looking at the vmlinux_path (8 entries long)
        symsrc__init: build id mismatch for vmlinux.
        Using /proc/kcore for kernel data
        Using /proc/kallsyms for symbols
           0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             __switch_to_xtra ([kernel.kallsyms])
                                             __switch_to ([kernel.kallsyms])
                                             __schedule ([kernel.kallsyms])
                                             schedule ([kernel.kallsyms])
                                             futex_wait_queue_me ([kernel.kallsyms])
                                             futex_wait ([kernel.kallsyms])
                                             do_futex ([kernel.kallsyms])
                                             __x64_sys_futex ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                             __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
           0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             __switch_to_xtra ([kernel.kallsyms])
                                             __switch_to ([kernel.kallsyms])
                                             __schedule ([kernel.kallsyms])
                                             schedule_idle ([kernel.kallsyms])
                                             do_idle ([kernel.kallsyms])
                                             cpu_startup_entry ([kernel.kallsyms])
                                             secondary_startup_64_no_verify ([kernel.kallsyms])
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet · 5e91e57e
      Qi Liu authored
      Add support for using 'perf report --dump-raw-trace' to parse PTT packets.
      
      Example usage:
      
      The output will contain raw PTT data and its textual representation,
      for example (8DW format):
      
      0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0
      ref: 0xa5d50c725  idx: 0  tid: -1  cpu: 0
      .
      . ... HISI PTT data: size 4194304 bytes
      .  00000000: 00 00 00 00                                 Prefix
      .  00000004: 08 20 00 60                                 Header DW0
      .  00000008: ff 02 00 01                                 Header DW1
      .  0000000c: 20 08 00 00                                 Header DW2
      .  00000010: 10 e7 44 ab                                 Header DW3
      .  00000014: 2a a8 1e 01                                 Time
      .  00000020: 00 00 00 00                                 Prefix
      .  00000024: 01 00 00 60                                 Header DW0
      .  00000028: 0f 1e 00 01                                 Header DW1
      .  0000002c: 04 00 00 00                                 Header DW2
      .  00000030: 40 00 81 02                                 Header DW3
      .  00000034: ee 02 00 00                                 Time
      ....
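      The 8DW layout in the dump is regular: each packet occupies 32 bytes,
      and its first six dwords are labelled Prefix, Header DW0-DW3 and Time.
      A sketch of a printer that reproduces that shape, assuming the two
      trailing dwords of each packet are simply not displayed, as in the
      sample; dump_ptt_8dw() is a hypothetical helper, not the perf decoder,
      and it does not handle the 4DW format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PTT_8DW_PKT_SIZE 32	/* 8 dwords per packet, as in the dump above */

/* Labels for the dwords the sample output shows, in order. */
static const char *const ptt_8dw_labels[] = {
	"Prefix", "Header DW0", "Header DW1", "Header DW2", "Header DW3", "Time",
};

/*
 * Print each labelled dword of every complete 8DW packet in buf, in the
 * ".  offset: bytes  label" shape of the sample. Returns the packet count;
 * a trailing partial packet is ignored.
 */
static size_t dump_ptt_8dw(const uint8_t *buf, size_t len)
{
	size_t pkts = 0;

	for (size_t off = 0; off + PTT_8DW_PKT_SIZE <= len; off += PTT_8DW_PKT_SIZE) {
		for (size_t i = 0; i < sizeof(ptt_8dw_labels) / sizeof(ptt_8dw_labels[0]); i++) {
			const uint8_t *dw = buf + off + 4 * i;

			printf(".  %08zx: %02x %02x %02x %02x    %s\n", off + 4 * i,
			       (unsigned int)dw[0], (unsigned int)dw[1],
			       (unsigned int)dw[2], (unsigned int)dw[3],
			       ptt_8dw_labels[i]);
		}
		pkts++;
	}
	return pkts;
}
```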
      
      This patch only adds basic parsing support, according to the definition
      of the PTT packet described in Documentation/trace/hisi-ptt.rst. The
      fields of each packet can be further decoded following the PCIe
      specification's definition of the TLP packet.
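      As an illustration of that further decoding, the sketch below extracts
      Fmt, Type and Length from a TLP header DW0 using the PCIe field
      positions (Fmt in bits 31:29, Type in 28:24, Length in 9:0). It assumes
      the four bytes printed for Header DW0 are the dword stored
      little-endian; that byte order is an assumption, not something this
      commit states. decode_tlp_dw0() is a hypothetical helper.

```c
#include <stdint.h>

/* Selected TLP header DW0 fields. */
struct tlp_dw0 {
	unsigned int fmt;	/* header size (3DW/4DW) and whether data follows */
	unsigned int type;
	unsigned int length;	/* payload length in dwords; 0 encodes 1024 */
};

/* Assumes b[] holds DW0 exactly as printed in the dump, little-endian. */
static struct tlp_dw0 decode_tlp_dw0(const uint8_t b[4])
{
	uint32_t dw0 = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
		       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
	struct tlp_dw0 d = {
		.fmt = (dw0 >> 29) & 0x7,
		.type = (dw0 >> 24) & 0x1f,
		.length = dw0 & 0x3ff,
	};

	return d;
}
```

      Under that byte-order assumption, both sample DW0 values (08 20 00 60
      and 01 00 00 60) decode to Fmt 0b011 and Type 0, i.e. a memory write
      with a 4DW header, with payload lengths of 8 and 1 dwords.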
      Signed-off-by: Qi Liu <liuqi115@huawei.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Liu <liuqi6124@gmail.com>
      Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Prime <prime.zeng@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: linuxarm@huawei.com
      Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver · 057381a7
      Qi Liu authored
      The HiSilicon PCIe Tune and Trace device (PTT) can dynamically tune the
      PCIe link's events and trace TLP headers.

      This patch adds support for the PTT device in the perf tool, so users
      can use 'perf record' to get TLP header trace data.
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Qi Liu <liuqi115@huawei.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Acked-by: John Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Liu <liuqi6124@gmail.com>
      Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Prime <prime.zeng@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: linuxarm@huawei.com
      Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf auxtrace arm: Refactor event list iteration in auxtrace_record__init() · 45a3975f
      Qi Liu authored
      Add find_pmu_for_event() and use it to simplify the logic in
      auxtrace_record__init(). find_pmu_for_event() will be reused in
      subsequent patches.
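      A minimal sketch of what a helper like find_pmu_for_event() does,
      assuming it matches an event to a PMU by comparing the event's
      perf_event_attr type with each PMU's dynamically assigned type. The
      structs are hypothetical, trimmed stand-ins for perf's struct perf_pmu
      and struct evsel, and the signature is illustrative only.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct perf_pmu and struct evsel. */
struct pmu {
	const char *name;
	unsigned int type;	/* type number the kernel assigned to this PMU */
};

struct event {
	unsigned int attr_type;	/* perf_event_attr.type of the event */
};

/* Return the PMU whose type matches the event's attr type, or NULL. */
static struct pmu *find_pmu_for_event(struct pmu **pmus, size_t nr,
				      const struct event *ev)
{
	if (!pmus)
		return NULL;
	for (size_t i = 0; i < nr; i++)
		if (pmus[i] && pmus[i]->type == ev->attr_type)
			return pmus[i];
	return NULL;
}
```

      Factoring the loop out this way lets each auxtrace PMU (CoreSight, SPE,
      and now PTT) be found with one call instead of a separate inline loop.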
      Reviewed-by: John Garry <john.garry@huawei.com>
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: Qi Liu <liuqi115@huawei.com>
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qi Liu <liuqi6124@gmail.com>
      Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
      Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Cc: Suzuki Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zeng Prime <prime.zeng@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: linuxarm@huawei.com
      Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests stat+json_output: Include sanity check for topology · 58d4802a
      Athira Rajeev authored
      The testcase stat+json_output.sh fails on powerpc:
      
      	86: perf stat JSON output linter : FAILED!
      
      The testcase "stat+json_output.sh" verifies the perf stat JSON output.
      It covers aggregation modes such as per-socket, per-core, per-die and
      -A (no_aggr mode), along with a few other tests. It counts the expected
      fields for various commands. For example, with -A (i.e., AGGR_NONE
      mode), it expects 7 fields in the output, with "CPU" as the first
      field. Similarly, for per-socket mode, it expects the first field of
      the result to be the socket id. The testcase compares the result with
      the expected count.
      
      The values for socket, die, core and cpu are fetched from the topology
      directory:
      
        /sys/devices/system/cpu/cpu*/topology.
      
      For example, the socket value is read from the "physical_package_id"
      file in the topology directory (see cpu__get_topology_int() in
      util/cpumap.c).
      
      If the topology information cannot be fetched on a platform, the
      values are set to -1. For example, on the powerpc pSeries platform,
      "physical_package_id" is restricted and not exposed, so -1 is
      assigned.
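      A minimal C sketch of the lookup described above, assuming the sysfs
      layout quoted in this message. topology_path() and read_topology_int()
      are hypothetical stand-ins, not perf's cpu__get_topology_int(): the
      base directory is a parameter so the sketch can be exercised outside
      sysfs, and every failure (missing file, restricted access, unparsable
      contents) is folded into -1, the value the testcase ends up seeing.

```c
#include <assert.h>
#include <stdio.h>

/* Build "<cpu_dir>/cpu<N>/topology/<name>"; return -1 if it does not fit. */
static int topology_path(char *buf, size_t len, const char *cpu_dir,
			 int cpu, const char *name)
{
	int n = snprintf(buf, len, "%s/cpu%d/topology/%s", cpu_dir, cpu, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one integer from a topology file. Any failure (file missing,
 * not readable, or not a number) is reported as -1.
 */
static int read_topology_int(const char *path)
{
	FILE *f;
	int value;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

      On a system that exposes its topology, passing
      "/sys/devices/system/cpu", a CPU number and "physical_package_id"
      yields the socket id; on a platform that restricts the file, the
      caller sees -1.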
      
      The perf code checks for a valid cpu id in "aggr_printout"
      (stat-display.c), which displays these fields. So, when the topology
      values are not exposed, the first field of the output is left empty.
      This causes the testcase to fail, as it counts the number of fields in
      the output.
      
      In the case of -A (AGGR_NONE mode), the testcase expects 7 fields in
      the output, but because some topology files yield -1, only 6 fields
      are printed. Hence the testcase reports a failure due to the mismatch
      in the number of fields in the output.
      
      This patch adds a topology sanity check to the testcase, which skips
      the test if a -1 value is found.
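      The sanity check can be pictured as a scan over per-CPU topology
      values: if any CPU reports -1, the field counts cannot match, so the
      test is skipped rather than failed. A hedged C sketch over an
      already-collected array; topology_missing() is a hypothetical name,
      and the real check lives in the shell testcase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when topology-dependent checks should be skipped, i.e. when
 * at least one CPU reported -1 for its topology id.
 */
static bool topology_missing(const int *ids, size_t nr_cpus)
{
	assert(ids != NULL || nr_cpus == 0);
	for (size_t i = 0; i < nr_cpus; i++) {
		if (ids[i] == -1)
			return true;
	}
	return false;
}
```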
      
      Fixes: 0c343af2 ("perf test: JSON format checking")
      Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
      Suggested-by: Ian Rogers <irogers@google.com>
      Suggested-by: James Clark <james.clark@arm.com>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Claire Jensen <cjense@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests stat+csv_output: Include sanity check for topology · cd400f6f
      Athira Rajeev authored
      The testcase stat+csv_output.sh fails on powerpc:
      
      	84: perf stat CSV output linter: FAILED!
      
      The testcase "stat+csv_output.sh" verifies the perf stat CSV output. It
      covers aggregation modes such as per-socket, per-core, per-die and -A
      (no_aggr mode), along with a few other tests. It counts the expected
      fields for various commands. For example, with -A (i.e., AGGR_NONE
      mode), it expects 7 fields in the output, with "CPU" as the first
      field. Similarly, for per-socket mode, it expects the first field of
      the result to be the socket id. The testcase compares the result with
      the expected count.
      
      The values for socket, die, core and cpu are fetched from the topology
      directory:
      
        /sys/devices/system/cpu/cpu*/topology.
      
      For example, the socket value is read from the "physical_package_id"
      file in the topology directory (see cpu__get_topology_int() in
      util/cpumap.c).
      
      If the topology information cannot be fetched on a platform, the
      values are set to -1. For example, on the powerpc pSeries platform,
      "physical_package_id" is restricted and not exposed, so -1 is
      assigned.
      
      The perf code checks for a valid cpu id in "aggr_printout"
      (stat-display.c), which displays these fields. So, when the topology
      values are not exposed, the first field of the output is left empty.
      This causes the testcase to fail, as it counts the number of fields in
      the output.
      
      In the case of -A (AGGR_NONE mode), the testcase expects 7 fields in
      the output, but because some topology files yield -1, only 6 fields
      are printed. Hence the testcase reports a failure due to the mismatch
      in the number of fields in the output.
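      To see how a guarded CPU column turns 7 CSV fields into 6, here is a
      hypothetical C sketch: format_record() only emits "CPU<n>" when the id
      is valid, in the spirit of the check in "aggr_printout", and
      count_csv_fields() counts separators the way a field-count check
      might. The six placeholder columns are invented for illustration and
      are not perf's real CSV layout; quoting is not handled, and the
      separator is passed in because perf stat takes it from -x.

```c
#include <assert.h>
#include <stdio.h>

/* Count CSV fields: n separators delimit n + 1 fields (no quoting). */
static int count_csv_fields(const char *line, char sep)
{
	int fields = 1;

	assert(line != NULL);
	for (; *line; line++) {
		if (*line == sep)
			fields++;
	}
	return fields;
}

/*
 * Format one -A style record. The "CPU<n>" column is printed only for a
 * valid (non-negative) id; the six columns after it are placeholders.
 * Returns the snprintf() result (length that would have been written).
 */
static int format_record(char *buf, size_t len, int cpu_id)
{
	static const char rest[] = "v1,v2,v3,v4,v5,v6";

	if (cpu_id >= 0)
		return snprintf(buf, len, "CPU%d,%s", cpu_id, rest);
	return snprintf(buf, len, "%s", rest);
}
```

      With cpu_id = 3 the record has 7 fields; with cpu_id = -1 the CPU
      column disappears and only 6 remain, which is the mismatch the
      testcase trips over.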
      
      This patch adds a topology sanity check to the testcase, which skips
      the test if a -1 value is found.
      
      Fixes: 7473ee56 ("perf test: Add checking for perf stat CSV output.")
      Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
      Suggested-by: Ian Rogers <irogers@google.com>
      Suggested-by: James Clark <james.clark@arm.com>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Claire Jensen <cjense@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Fix system_wide dummy event for hybrid · 6cef7dab
      Adrian Hunter authored
      User space tasks can migrate between CPUs, so when tracing selected
      CPUs, system-wide sideband is still needed. However,
      evlist->core.has_user_cpus is not set in the hybrid case, so check the
      target cpu_list instead.
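      A hedged C sketch of the decision this fix changes. The struct is a
      hypothetical model: has_user_cpus stands in for
      evlist->core.has_user_cpus and cpu_list for the target cpu_list (NULL
      when no CPUs were selected). Only the choice of which field to test
      comes from this commit message; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the recording code can consult. */
struct record_state {
	bool has_user_cpus;	/* stands in for evlist->core.has_user_cpus */
	const char *cpu_list;	/* stands in for the target cpu_list, or NULL */
};

/* Before the fix: relies on a flag that is not set in the hybrid case. */
static bool need_sideband_old(const struct record_state *s)
{
	assert(s != NULL);
	return s->has_user_cpus;
}

/*
 * After the fix: user space tasks can migrate to CPUs outside the selected
 * set, so whenever the user selected CPUs, track sideband system-wide.
 */
static bool need_sideband_new(const struct record_state *s)
{
	assert(s != NULL);
	return s->cpu_list != NULL;
}
```

      In the hybrid case the old check returns false even though CPUs were
      selected; testing cpu_list gives the intended answer regardless of how
      the evlist flag was set up.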
      
      Fixes: 7d189cad ("perf intel-pt: Track sideband system-wide when needed")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>