1. 18 Jan, 2022 13 commits
    • bpf: af_unix: Use batching algorithm in bpf unix iter. · 855d8e77
      Kuniyuki Iwashima authored
      The commit 04c7820b ("bpf: tcp: Bpf iter batching and lock_sock")
      introduces the batching algorithm to iterate TCP sockets with more
      consistency.
      
      This patch uses the same algorithm to iterate AF_UNIX sockets.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Link: https://lore.kernel.org/r/20220113002849.4384-3-kuniyu@amazon.co.jp
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
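The batching idea referred to in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `struct sock_node`, `struct batch`, and `batch_fill()` are hypothetical names, locking is reduced to comments, and the kernel implementation differs in detail.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one socket in a hash bucket. */
struct sock_node {
    int id;
    struct sock_node *next;
};

/* Hypothetical batch: a snapshot of one bucket's entries. */
struct batch {
    int *ids;    /* snapshot storage */
    size_t len;  /* entries currently stored */
    size_t cap;  /* allocated capacity */
};

/*
 * Snapshot every entry of one bucket into the batch.
 * Returns 0 on success, -1 on allocation failure.
 * A kernel iterator would hold the bucket lock only while walking the
 * list, and would release it before the entries are shown.
 */
static int batch_fill(struct batch *b, const struct sock_node *head)
{
    size_t need = 0;
    const struct sock_node *n;

    /* (lock held in a real iterator) count the entries */
    for (n = head; n; n = n->next)
        need++;

    if (need > b->cap) {
        /* (a real iterator would drop the lock, grow, retake it, rescan) */
        int *ids = realloc(b->ids, need * sizeof(*ids));

        if (!ids)
            return -1;
        b->ids = ids;
        b->cap = need;
    }

    /* copy the whole bucket so it can be processed without the lock */
    b->len = 0;
    for (n = head; n; n = n->next)
        b->ids[b->len++] = n->id;
    return 0;
}
```

Because the whole bucket is captured before anything is shown, each entry of the bucket is visited once even if the list changes afterwards.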
    • af_unix: Refactor unix_next_socket(). · 4408d55a
      Kuniyuki Iwashima authored
      Currently, unix_next_socket() is overloaded depending on the 2nd argument.
      If it is NULL, unix_next_socket() returns the first socket in the hash.  If
      not NULL, it returns the next socket in the same hash list or the first
      socket in the next non-empty hash list.
      
      This patch refactors unix_next_socket() into two functions unix_get_first()
      and unix_get_next().  unix_get_first() newly acquires a lock and returns
      the first socket in the list.  unix_get_next() returns the next socket in a
      list or releases a lock and falls back to unix_get_first().
      
      In the following patch, bpf iter holds entire sockets in a list and always
      releases the lock before .show().  It always calls unix_get_first() to
      acquire a lock in each iteration.  So, this patch makes the change easier
      to follow.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Link: https://lore.kernel.org/r/20220113002849.4384-2-kuniyu@amazon.co.jp
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
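The split into a "first" and a "next" function described in this commit can be sketched as follows. This is a minimal userspace model, not the kernel code: `tbl_get_first()`, `tbl_get_next()`, `struct node`, and `NBUCKETS` are hypothetical names, and the lock handling is only mentioned in comments.

```c
#include <stddef.h>

#define NBUCKETS 4  /* hypothetical table size */

struct node {
    int id;
    struct node *next;
};

/*
 * Return the first entry at or after bucket *pos, updating *pos.
 * In the refactored kernel code, this is the step that takes the lock.
 */
static struct node *tbl_get_first(struct node *tbl[NBUCKETS], size_t *pos)
{
    for (; *pos < NBUCKETS; (*pos)++) {
        if (tbl[*pos])
            return tbl[*pos];
    }
    return NULL;
}

/*
 * Return the next entry in the same list, or fall back to
 * tbl_get_first() on the following bucket. The kernel version
 * releases the current lock before falling back.
 */
static struct node *tbl_get_next(struct node *tbl[NBUCKETS],
                                 struct node *cur, size_t *pos)
{
    if (cur->next)
        return cur->next;
    (*pos)++;
    return tbl_get_first(tbl, pos);
}
```

Keeping the two cases in separate functions means a caller that always restarts from a bucket boundary only ever needs the "first" function.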
    • Merge branch 'Introduce unstable CT lookup helpers' · 2a1aff60
      Alexei Starovoitov authored
      Kumar Kartikeya says:
      
      ====================
      
      This series adds unstable conntrack lookup helpers using BPF kfunc support.  The
      patch adding the lookup helper is based off of Maxim's recent patch to aid in
      rebasing their series on top of this, all adjusted to work with module kfuncs [0].
      
        [0]: https://lore.kernel.org/bpf/20211019144655.3483197-8-maximmi@nvidia.com
      
      To enable returning a reference to struct nf_conn, the verifier is extended to
      support reference tracking for PTR_TO_BTF_ID, and kfunc is extended with support
      for working as acquire/release functions, similar to existing BPF helpers. kfunc
      returning pointer (limited to PTR_TO_BTF_ID in the kernel) can also return a
      PTR_TO_BTF_ID_OR_NULL now, typically needed when acquiring a resource can fail.
      kfunc can also receive PTR_TO_CTX and PTR_TO_MEM (with some limitations) as
      arguments now. There is also support for passing a mem, len pair as argument
      to kfunc now. In such cases, passing pointer to unsized type (void) is also
      permitted.
      
      Please see individual commits for details.
      
      Changelog:
      ----------
      v7 -> v8:
      v7: https://lore.kernel.org/bpf/20220111180428.931466-1-memxor@gmail.com
      
       * Move enum btf_kfunc_hook to btf.c (Alexei)
       * Drop verbose log for unlikely failure case in __find_kfunc_desc_btf (Alexei)
       * Remove unnecessary barrier in register_btf_kfunc_id_set (Alexei)
       * Switch macro in bpf_nf test to __always_inline function (Alexei)
      
      v6 -> v7:
      v6: https://lore.kernel.org/bpf/20220102162115.1506833-1-memxor@gmail.com
      
       * Drop try_module_get_live patch, use flag in btf_module struct (Alexei)
       * Add comments and expand commit message detailing why we have to concatenate
         and sort vmlinux kfunc BTF ID sets (Alexei)
       * Use bpf_testmod for testing btf_try_get_module race (Alexei)
       * Use bpf_prog_type for both btf_kfunc_id_set_contains and
         register_btf_kfunc_id_set calls (Alexei)
       * In case of module set registration, directly assign set (Alexei)
       * Add CONFIG_USERFAULTFD=y to selftest config
       * Fix other nits
      
      v5 -> v6:
      v5: https://lore.kernel.org/bpf/20211230023705.3860970-1-memxor@gmail.com
      
       * Fix for a bug in btf_try_get_module leading to use-after-free
       * Drop *kallsyms_on_each_symbol loop, reinstate register_btf_kfunc_id_set (Alexei)
       * btf_free_kfunc_set_tab now takes struct btf, and handles resetting tab to NULL
       * Check return value btf_name_by_offset for param_name
       * Instead of using tmp_set, use btf->kfunc_set_tab directly, and simplify cleanup
      
      v4 -> v5:
      v4: https://lore.kernel.org/bpf/20211217015031.1278167-1-memxor@gmail.com
      
       * Move nf_conntrack helpers code to its own separate file (Toke, Pablo)
       * Remove verifier callbacks, put btf_id_sets in struct btf (Alexei)
        * Convert the in-kernel users away from the old API
       * Change len__ prefix convention to __sz suffix (Alexei)
       * Drop parent_ref_obj_id patch (Alexei)
      
      v3 -> v4:
      v3: https://lore.kernel.org/bpf/20211210130230.4128676-1-memxor@gmail.com
      
       * Guard unstable CT helpers with CONFIG_DEBUG_INFO_BTF_MODULES
       * Move addition of prog_test test kfuncs to selftest commit
       * Move negative kfunc tests to test_verifier suite
       * Limit struct nesting depth to 4, which should be enough for now
      
      v2 -> v3:
      v2: https://lore.kernel.org/bpf/20211209170929.3485242-1-memxor@gmail.com
      
       * Fix build error for !CONFIG_BPF_SYSCALL (Patchwork)
      
      RFC v1 -> v2:
      v1: https://lore.kernel.org/bpf/20211030144609.263572-1-memxor@gmail.com
      
       * Limit PTR_TO_MEM support to pointer to scalar, or struct with scalars (Alexei)
       * Use btf_id_set for checking acquire, release, ret type null (Alexei)
       * Introduce opts struct for CT helpers, move int err parameter to it
       * Add l4proto as parameter to CT helper's opts, remove separate tcp/udp helpers
       * Add support for mem, len argument pair to kfunc
       * Allow void * as pointer type for mem, len argument pair
       * Extend selftests to cover new additions to kfuncs
       * Copy ref_obj_id to PTR_TO_BTF_ID dst_reg on btf_struct_access, test it
       * Fix other misc nits, bugs, and expand commit messages
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for race in btf_try_get_module · 46565696
      Kumar Kartikeya Dwivedi authored
      This adds a complete test case to ensure we never take references to
      modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
      ensures we never access btf->kfunc_set_tab in an inconsistent state.
      
      The test uses userfaultfd to artificially widen the race.
      
      When run on an unpatched kernel, it leads to the following splat:
      
      [root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
      [   55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
      [   55.499206] #PF: supervisor read access in kernel mode
      [   55.499855] #PF: error_code(0x0000) - not-present page
      [   55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
      [   55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
      [   55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G           OE     5.16.0-rc4+ #151
      [   55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
      [   55.504777] Workqueue: events bpf_prog_free_deferred
      [   55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
      [   55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
      [   55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
      [   55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
      [   55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
      [   55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
      [   55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
      [   55.515418] FS:  0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
      [   55.516705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
      [   55.518672] PKRU: 55555554
      [   55.519022] Call Trace:
      [   55.519483]  <TASK>
      [   55.519884]  module_put.part.0+0x2a/0x180
      [   55.520642]  bpf_prog_free_deferred+0x129/0x2e0
      [   55.521478]  process_one_work+0x4fa/0x9e0
      [   55.522122]  ? pwq_dec_nr_in_flight+0x100/0x100
      [   55.522878]  ? rwlock_bug.part.0+0x60/0x60
      [   55.523551]  worker_thread+0x2eb/0x700
      [   55.524176]  ? __kthread_parkme+0xd8/0xf0
      [   55.524853]  ? process_one_work+0x9e0/0x9e0
      [   55.525544]  kthread+0x23a/0x270
      [   55.526088]  ? set_kthread_struct+0x80/0x80
      [   55.526798]  ret_from_fork+0x1f/0x30
      [   55.527413]  </TASK>
      [   55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
      [   55.530846] CR2: fffffbfff802548b
      [   55.531341] ---[ end trace 1af41803c054ad6d ]---
      [   55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
      [   55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
      [   55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
      [   55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
      [   55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
      [   55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
      [   55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
      [   55.542108] FS:  0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
      [   55.543260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
      [   55.545317] PKRU: 55555554
      [   55.545671] note: kworker/0:2[83] exited with preempt_count 1
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Extend kfunc selftests · c1ff181f
      Kumar Kartikeya Dwivedi authored
      Use the prog_test kfuncs to test the referenced PTR_TO_BTF_ID kfunc
      support, and PTR_TO_CTX, PTR_TO_MEM argument passing support. Also
      test the various failure cases for invalid kfunc prototypes.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-10-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test_verifier support to fixup kfunc call insns · 0201b807
      Kumar Kartikeya Dwivedi authored
      This allows us to add tests (esp. negative tests) where we only want to
      ensure the program doesn't pass through the verifier, and also verify
      the error. The next commit will add the tests making use of this.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-9-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test for unstable CT lookup API · 87091063
      Kumar Kartikeya Dwivedi authored
      This tests that we return errors as documented, and also that the kfunc
      calls work from both XDP and TC hooks.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-8-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF · b4c2b959
      Kumar Kartikeya Dwivedi authored
      This change adds conntrack lookup helpers using the unstable kfunc call
      interface for the XDP and TC-BPF hooks. The primary usecase is
      implementing a synproxy in XDP, see Maxim's patchset [0].
      
      Export get_net_ns_by_id as nf_conntrack_bpf.c needs to call it.
      
      This object is only built when CONFIG_DEBUG_INFO_BTF_MODULES is enabled.
      
        [0]: https://lore.kernel.org/bpf/20211019144655.3483197-1-maximmi@nvidia.com

      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-7-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add reference tracking support to kfunc · 5c073f26
      Kumar Kartikeya Dwivedi authored
      This patch adds verifier support for PTR_TO_BTF_ID return type of kfunc
      to be a reference, by reusing acquire_reference_state/release_reference
      support for existing in-kernel bpf helpers.
      
      We make use of the three kfunc types:
      
      - BTF_KFUNC_TYPE_ACQUIRE
        Return true if kfunc_btf_id is an acquire kfunc.  This will
        acquire_reference_state for the returned PTR_TO_BTF_ID (this is the
        only allowed return value). Note that acquire kfunc must always return a
        PTR_TO_BTF_ID{_OR_NULL}, otherwise the program is rejected.
      
      - BTF_KFUNC_TYPE_RELEASE
        Return true if kfunc_btf_id is a release kfunc.  This will release the
        reference to the passed in PTR_TO_BTF_ID which has a reference state
        (from earlier acquire kfunc).
        The btf_check_func_arg_match returns the regno (of argument register,
        hence > 0) if the kfunc is a release kfunc, and a proper referenced
        PTR_TO_BTF_ID is being passed to it.
        This is similar to how helper call check uses bpf_call_arg_meta to
        store the ref_obj_id that is later used to release the reference.
        Similar to in-kernel helper, we only allow passing one referenced
        PTR_TO_BTF_ID as an argument. It can also be passed in to normal
        kfunc, but in case of release kfunc there must always be one
        PTR_TO_BTF_ID argument that is referenced.
      
      - BTF_KFUNC_TYPE_RET_NULL
        For kfunc returning PTR_TO_BTF_ID, tells if it can be NULL, hence
        force caller to mark the pointer not null (using check) before
        accessing it. Note that taking into account the case fixed by commit
        93c230e3 ("bpf: Enforce id generation for all may-be-null register type")
        we assign a non-zero id for mark_ptr_or_null_reg logic. Later, if more
        return types are supported by kfunc, which have a _OR_NULL variant, it
        might be better to move this id generation under a common
        reg_type_may_be_null check, similar to the case in the commit.
      
      Referenced PTR_TO_BTF_ID is currently only limited to kfunc, but can be
      extended in the future to other BPF helpers as well.  For now, we can
      rely on the btf_struct_ids_match check to ensure we get the pointer to
      the expected struct type. In the future, care needs to be taken to avoid
      ambiguity for reference PTR_TO_BTF_ID passed to release function, in
      case multiple candidates can release same BTF ID.
      
      e.g. there might be two release kfuncs (or kfunc and helper):
      
      foo(struct abc *p);
      bar(struct abc *p);
      
      ... such that both release a PTR_TO_BTF_ID with btf_id of struct abc. In
      this case we would need to track the acquire function corresponding to
      the release function to avoid type confusion, and store this information
      in the register state so that an incorrect program can be rejected. This
      is not a problem right now, hence it is left as an exercise for the
      future patch introducing such a case in the kernel.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-6-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
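The acquire/release rule described in this commit can be modelled with a small userspace sketch. It is a deliberately simplified toy, not the verifier: `struct ref_state`, `ref_acquire()`, `ref_release()`, and `ref_exit_ok()` are hypothetical names introduced for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 8  /* hypothetical limit for the toy model */

/* Much simplified model of the set of references a program holds. */
struct ref_state {
    int ids[MAX_REFS];
    size_t cnt;
    int next_id;
};

/*
 * Model an acquire kfunc: record a new reference and return its
 * id (> 0), or -1 if the table is full.
 */
static int ref_acquire(struct ref_state *s)
{
    if (s->cnt == MAX_REFS)
        return -1;
    s->ids[s->cnt++] = ++s->next_id;
    return s->next_id;
}

/* Model a release kfunc: succeed only for a currently held reference. */
static bool ref_release(struct ref_state *s, int id)
{
    for (size_t i = 0; i < s->cnt; i++) {
        if (s->ids[i] == id) {
            /* swap-remove: order of held references does not matter */
            s->ids[i] = s->ids[--s->cnt];
            return true;
        }
    }
    return false;
}

/* A program may exit only if no reference is left unreleased. */
static bool ref_exit_ok(const struct ref_state *s)
{
    return s->cnt == 0;
}
```

In this model, a double release fails and a program that exits while still holding a reference is rejected, which is the behaviour the commit message describes for referenced PTR_TO_BTF_ID.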
    • bpf: Introduce mem, size argument pair support for kfunc · d583691c
      Kumar Kartikeya Dwivedi authored
      BPF helpers can associate two adjacent arguments together to pass memory
      of certain size, using ARG_PTR_TO_MEM and ARG_CONST_SIZE arguments.
      Since we don't use bpf_func_proto for kfunc, we need to leverage BTF to
      implement similar support.
      
      The ARG_CONST_SIZE processing for helpers is refactored into a common
      check_mem_size_reg helper that is shared with kfunc as well. kfunc
      ptr_to_mem support follows logic similar to global functions, where
      verification is done as if pointer is not null, even when it may be
      null.
      
      This leads to a simple to follow rule for writing kfunc: always check
      the argument pointer for NULL, except when it is PTR_TO_CTX. Note that the
      PTR_TO_CTX case is only safe when the helper expecting pointer to
      program ctx is not exposed to other programs where same struct is not
      ctx type. In that case, the type check will fall through to other cases
      and would permit passing other types of pointers, possibly NULL at
      runtime.
      
      Currently, we require the size argument to be suffixed with "__sz" in
      the parameter name. This information is then recorded in kernel BTF and
      verified during function argument checking. In the future we can use BTF
      tagging instead, and modify the kernel function definitions. This will
      be a purely kernel-side change.
      
      This allows us to have some form of backwards compatibility for
      structures that are passed in to the kernel function with their size,
      and allow variable length structures to be passed in if they are
      accompanied by a size parameter.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-5-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
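The "__sz" naming convention from this commit amounts to a suffix check on the parameter name. The sketch below shows one way such a check could look in plain C; `param_is_mem_size()` is a hypothetical helper written for this illustration and is not claimed to match the kernel's function.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a parameter name ends in "__sz".
 * Hypothetical helper for illustration only.
 */
static bool param_is_mem_size(const char *name)
{
    static const char suffix[] = "__sz";
    size_t len = strlen(name);
    size_t slen = sizeof(suffix) - 1;

    /* a name shorter than the suffix cannot end with it */
    if (len < slen)
        return false;
    return strcmp(name + len - slen, suffix) == 0;
}
```

Under this convention, a pair such as `void *mem, int mem__sz` (names chosen here as an example) would be treated as a memory region together with its size.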
    • bpf: Remove check_kfunc_call callback and old kfunc BTF ID API · b202d844
      Kumar Kartikeya Dwivedi authored
      Completely remove the old code for check_kfunc_call to help it work
      with modules, and also the callback itself.
      
      The previous commit adds infrastructure to register all sets and put
      them in vmlinux or module BTF, and concatenates all related sets
      organized by the hook and the type. Once populated, these sets remain
      immutable for the lifetime of the struct btf.
      
      Also, since we don't need the 'owner' module anywhere when doing
      check_kfunc_call, drop the 'btf_modp' module parameter from
      find_kfunc_desc_btf.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Populate kfunc BTF ID sets in struct btf · dee872e1
      Kumar Kartikeya Dwivedi authored
      This patch prepares the kernel to support putting all kinds of kfunc BTF
      ID sets in the struct btf itself. The various kernel subsystems will
      make register_btf_kfunc_id_set call in the initcalls (for built-in code
      and modules).
      
      The 'hook' is one of the many program types, e.g. XDP and TC/SCHED_CLS,
      STRUCT_OPS, and 'types' are check (allowed or not), acquire, release,
      and ret_null (with PTR_TO_BTF_ID_OR_NULL return type).
      
      A maximum of BTF_KFUNC_SET_MAX_CNT (32) kfunc BTF IDs are permitted in a
      set of certain hook and type for vmlinux sets, since they are allocated
      on demand, and otherwise set as NULL. Module sets can only be registered
      once per hook and type, hence they are directly assigned.
      
      A new btf_kfunc_id_set_contains function is exposed for use in the
      verifier. This new method is faster than the existing list searching
      method, and is also automatic. It also lets other code not care whether
      the set is unallocated or not.
      
      Note that module code can only do a single register_btf_kfunc_id_set call
      per hook. This is why sorting is only done for in-kernel vmlinux sets,
      because there might be multiple sets for the same hook and type that
      must be concatenated, hence sorting them is required to ensure bsearch
      in btf_id_set_contains continues to work correctly.
      
      Next commit will update the kernel users to make use of this
      infrastructure.
      
      Finally, add __maybe_unused annotation for BTF ID macros for the
      !CONFIG_DEBUG_INFO_BTF case, so that they don't produce warnings during
      build time.
      
      The previous patch is also needed to provide synchronization against
      initialization for module BTF's kfunc_set_tab introduced here, as
      described below:
      
        The kfunc_set_tab pointer in struct btf is write-once (if we consider
        the registration phase (comprised of multiple register_btf_kfunc_id_set
        calls) as a single operation). In this sense, once it has been fully
        prepared, it isn't modified, only used for lookup (from the verifier
        context).
      
        For btf_vmlinux, it is initialized fully during the do_initcalls phase,
        which happens fairly early in the boot process, before any processes are
        present. This also eliminates the possibility of bpf_check being called
        at that point, thus relieving us of ensuring any synchronization between
        the registration and lookup function (btf_kfunc_id_set_contains).
      
        However, the case for module BTF is a bit tricky. The BTF is parsed,
        prepared, and published from the MODULE_STATE_COMING notifier callback.
        After this, the module initcalls are invoked, where our registration
        function will be called to populate the kfunc_set_tab for module BTF.
      
        At this point, BTF may be available to userspace while its corresponding
        module is still initializing. A BTF fd can then be passed to verifier
        using bpf syscall (e.g. for kfunc call insn).
      
        Hence, there is a race window where verifier may concurrently try to
        lookup the kfunc_set_tab. To prevent this race, we must ensure the
        operations are serialized, or wait for the __init functions to
        complete.
      
        In the earlier registration API, this race was alleviated as verifier
        bpf_check_mod_kfunc_call didn't find the kfunc BTF ID until it was added
        by the registration function (called usually at the end of module __init
        function after all module resources have been initialized). If the
        verifier made the check_kfunc_call before kfunc BTF ID was added to the
        list, it would fail verification (saying call isn't allowed). The
        access to list was protected using a mutex.
      
        Now, it would still fail verification, but for a different reason
        (returning ENXIO due to the failed btf_try_get_module call in
        add_kfunc_call), because if the __init call is in progress the module
        will be in the middle of MODULE_STATE_COMING -> MODULE_STATE_LIVE
        transition, and the BTF_MODULE_F_LIVE flag for btf_module instance will
        not be set, so the btf_try_get_module call will fail.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-3-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
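The reason concatenated sets must be re-sorted before a binary search can be illustrated with a short userspace sketch. Everything here is an assumption made for illustration: `struct id_set`, `id_set_add()`, and `id_set_contains()` are hypothetical names, and only the limit of 32 entries is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SET_MAX_CNT 32  /* mirrors the limit quoted in the commit message */

/* Hypothetical fixed-capacity ID set. */
struct id_set {
    uint32_t cnt;
    uint32_t ids[SET_MAX_CNT];
};

/* Three-way comparison of two IDs for qsort() and bsearch(). */
static int cmp_id(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/*
 * Append 'n' IDs to 'set' and re-sort so that bsearch() stays valid.
 * Returns 0 on success, -1 if the combined set would exceed the limit.
 */
static int id_set_add(struct id_set *set, const uint32_t *add, uint32_t n)
{
    if (n > SET_MAX_CNT - set->cnt)
        return -1;
    memcpy(set->ids + set->cnt, add, n * sizeof(*add));
    set->cnt += n;
    qsort(set->ids, set->cnt, sizeof(set->ids[0]), cmp_id);
    return 0;
}

/* Binary search; only correct because id_set_add() keeps ids sorted. */
static bool id_set_contains(const struct id_set *set, uint32_t id)
{
    return bsearch(&id, set->ids, set->cnt,
                   sizeof(set->ids[0]), cmp_id) != NULL;
}
```

If two individually sorted sets were merely appended to each other, the result would generally not be sorted and the binary search could miss entries, which is why the sort step follows every concatenation in this sketch.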
    • bpf: Fix UAF due to race between btf_try_get_module and load_module · 18688de2
      Kumar Kartikeya Dwivedi authored
      While working on code to populate kfunc BTF ID sets for module BTF from
      its initcall, I noticed that by the time the initcall is invoked, the
      module BTF can already be seen by userspace (and the BPF verifier). The
      existing btf_try_get_module calls try_module_get which only fails if
      mod->state == MODULE_STATE_GOING, i.e. it can increment module reference
      when module initcall is happening in parallel.
      
      Currently, BTF parsing happens from MODULE_STATE_COMING notifier
      callback. At this point, the module initcalls have not been invoked.
      The notifier callback parses and prepares the module BTF, allocates an
      ID, which publishes it to userspace, and then adds it to the btf_modules
      list allowing the kernel to invoke btf_try_get_module for the BTF.
      
      However, at this point, the module has not been fully initialized (i.e.
      its initcalls have not finished). The code in module.c can still fail
      and free the module, without caring for other users. However, nothing
      stops btf_try_get_module from succeeding between the state transition
      from MODULE_STATE_COMING to MODULE_STATE_LIVE.
      
      This leads to a use-after-free issue when a BPF program loads
      successfully during the state transition, load_module's do_init_module
      call fails and frees the module, and the BPF program fd on close calls
      module_put for the freed module. A later patch adds a test case to
      verify that we don't regress in this area.
      
      There are multiple points after prepare_coming_module (in load_module)
      where failure can occur and module loading can return error. We
      illustrate and test for the race using the last point where it can
      practically occur (in module __init function).
      
      An illustration of the race:
      
      CPU 0                           CPU 1
      			  load_module
      			    notifier_call(MODULE_STATE_COMING)
      			      btf_parse_module
      			      btf_alloc_id	// Published to userspace
      			      list_add(&btf_mod->list, btf_modules)
      			    mod->init(...)
      ...				^
      bpf_check		        |
      check_pseudo_btf_id             |
        btf_try_get_module            |
          returns true                |  ...
      ...                             |  module __init in progress
      return prog_fd                  |  ...
      ...                             V
      			    if (ret < 0)
      			      free_module(mod)
      			    ...
      close(prog_fd)
       ...
       bpf_prog_free_deferred
        module_put(used_btf.mod) // use-after-free
      
      We fix this issue by setting a flag BTF_MODULE_F_LIVE, from the notifier
      callback when MODULE_STATE_LIVE state is reached for the module, so that
      we return NULL from btf_try_get_module for modules that are not fully
      formed. Since try_module_get already checks that module is not in
      MODULE_STATE_GOING state, and that is the only transition a live module
      can make before being removed from btf_modules list, this is enough to
      close the race and prevent the bug.
      
      A later selftest patch crafts the race condition artificially to verify
      that it has been fixed, and that verifier fails to load program (with
      ENXIO).
      
      Lastly, a couple of comments:
      
       1. Even if this race didn't exist, it seems more appropriate to only
          access resources (ksyms and kfuncs) of a fully formed module which
          has been initialized completely.
      
       2. This patch was born out of need for synchronization against module
          initcall for the next patch, so it is needed for correctness even
          without the aforementioned race condition. The BTF resources
          initialized by module initcall are set up once and then only looked
          up, so just waiting until the initcall has finished ensures correct
          behavior.
      
      Fixes: 541c3bad ("bpf: Support BPF ksym variables in kernel modules")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-2-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
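The fix described in this commit, refusing references until a "live" flag has been set, can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the kernel code: `struct fake_btf_module`, `enum mod_state`, `MOD_F_LIVE`, `on_state_change()`, and `try_get()` are hypothetical stand-ins for the module state machine, BTF_MODULE_F_LIVE, the notifier callback, and btf_try_get_module.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the module states named in the commit. */
enum mod_state { MOD_COMING, MOD_LIVE, MOD_GOING };

#define MOD_F_LIVE 0x1  /* stands in for BTF_MODULE_F_LIVE */

struct fake_btf_module {
    enum mod_state state;
    unsigned int flags;
    int refcnt;
};

/* Notifier model: mark the entry live only once the LIVE state is reached. */
static void on_state_change(struct fake_btf_module *m, enum mod_state st)
{
    m->state = st;
    if (st == MOD_LIVE)
        m->flags |= MOD_F_LIVE;
}

/*
 * Take a reference only for a fully initialized module that is not
 * going away. A module that is still COMING is refused, so a later
 * init failure cannot leave a dangling reference behind.
 */
static bool try_get(struct fake_btf_module *m)
{
    if (!(m->flags & MOD_F_LIVE) || m->state == MOD_GOING)
        return false;
    m->refcnt++;
    return true;
}
```

In the model, a caller racing with module initialization simply fails to obtain a reference, which corresponds to the verifier rejecting the program instead of holding a pointer to a module that may be freed.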
  2. 15 Jan, 2022 3 commits
  3. 13 Jan, 2022 13 commits
  4. 11 Jan, 2022 8 commits
    • Merge tag 'devprop-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fe8152b3
      Linus Torvalds authored
      Pull device properties framework updates from Rafael Wysocki:
       "These update the handling of software nodes and graph properties, and
        the MAINTAINERS entry for the former.
      
        Specifics:
      
         - Remove device_add_properties() which does not work correctly if
           software nodes holding additional device properties are shared or
           reused (Heikki Krogerus).
      
         - Fix nargs_prop property handling for software nodes (Clément
           Léger).
      
         - Update documentation of ACPI device properties (Sakari Ailus).
      
         - Update the handling of graph properties in the generic framework to
           match the DT case (Sakari Ailus).
      
         - Update software nodes entry in MAINTAINERS (Andy Shevchenko)"
      
      * tag 'devprop-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        software node: Update MAINTAINERS data base
        software node: fix wrong node passed to find nargs_prop
        device property: Drop fwnode_graph_get_remote_node()
        device property: Use fwnode_graph_for_each_endpoint() macro
        device property: Implement fwnode_graph_get_endpoint_count()
        Documentation: ACPI: Update references
        Documentation: ACPI: Fix data node reference documentation
        device property: Fix documentation for FWNODE_GRAPH_DEVICE_DISABLED
        device property: Fix fwnode_graph_devcon_match() fwnode leak
        device property: Remove device_add_properties() API
        driver core: Don't call device_remove_properties() from device_del()
        PCI: Convert to device_create_managed_software_node()
    • Merge tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fe2437cc
      Linus Torvalds authored
      Pull thermal control updates from Rafael Wysocki:
       "These add a new driver for Renesas RZ/G2L TSU, update a few existing
        thermal control drivers and clean up the tmon utility.
      
        Specifics:
      
         - Add new TSU driver and DT bindings for the Renesas RZ/G2L platform
           (Biju Das).
      
         - Fix missing check when calling reset_control_deassert() in the
           rz2gl thermal driver (Biju Das).
      
         - In preparation for FORTIFY_SOURCE performing compile-time and
           run-time field bounds checking for memcpy(), avoid intentionally
           writing across neighboring fields in the int340x thermal control
           driver (Kees Cook).
      
         - Fix RFIM mailbox write commands handling in the int340x thermal
           control driver (Sumeet Pawnikar).
      
         - Fix PM issue occurring in the iMX thermal control driver during
           suspend/resume by implementing PM runtime support in it (Oleksij
           Rempel).
      
         - Add 'const' annotation to thermal_cooling_ops in the Intel
           powerclamp driver (Rikard Falkeborn).
      
         - Fix missing ADC bit set in the iMX8MP thermal driver to enable the
           sensor (Paul Gerber).
      
         - Drop unused local variable definition from tmon (ran jianping)"
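
The int340x item above (grouping the fields that a single memcpy() is meant to cover) can be illustrated in user space. This is a minimal sketch, not the driver's code: `DEMO_STRUCT_GROUP`, `struct demo_regs` and `demo_copy_limits()` are hypothetical names, and the macro is a simplified imitation of the kernel's struct_group() helper. It assumes a C11 compiler for anonymous structs and unions.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified imitation of struct_group(): the same members are reachable
 * directly (regs.low) and as one named sub-struct (regs.limits), so a
 * memcpy() can target the named group instead of spilling across fields.
 */
#define DEMO_STRUCT_GROUP(NAME, ...) \
	union { \
		struct { __VA_ARGS__ }; \
		struct { __VA_ARGS__ } NAME; \
	}

/* Hypothetical register layout: one field outside the group, two inside. */
struct demo_regs {
	int id;
	DEMO_STRUCT_GROUP(limits,
		int low;
		int high;
	);
};

/* Copy both limits in one call; the destination is exactly the group. */
static void demo_copy_limits(struct demo_regs *dst, const int src[2])
{
	memcpy(&dst->limits, src, sizeof(dst->limits));
}
```

Because the destination of the copy is `dst->limits` rather than the first field of the run, a bounds-checking memcpy() sees an object whose size matches the copy length.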
      
      * tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/drivers/int340x: Fix RFIM mailbox write commands
        thermal/drivers/rz2gl: Add error check for reset_control_deassert()
        thermal/drivers/imx8mm: Enable ADC when enabling monitor
        thermal/drivers: Add TSU driver for RZ/G2L
        dt-bindings: thermal: Document Renesas RZ/G2L TSU
        thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
        thermal/drivers/imx: Implement runtime PM support
        thermal: tools: tmon: remove unneeded local variable
        thermal: int340x: Use struct_group() for memcpy() region
      fe2437cc
    • Linus Torvalds's avatar
      Merge tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b35b6d4d
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "The most significant change here is the addition of a new cpufreq
        'P-state' driver for AMD processors as a better replacement for the
        venerable acpi-cpufreq driver.
      
        There are also other cpufreq updates (in the core, intel_pstate, ARM
        drivers), PM core updates (mostly related to adding new macros for
        declaring PM operations which should make the lives of driver
        developers somewhat easier), and a bunch of assorted fixes and
        cleanups.
      
        Summary:
      
         - Add new P-state driver for AMD processors (Huang Rui).
      
         - Fix initialization of min and max frequency QoS requests in the
           cpufreq core (Rafael Wysocki).
      
         - Fix EPP handling on Alder Lake in intel_pstate (Srinivas
           Pandruvada).
      
         - Make intel_pstate update cpuinfo.max_freq when notified of HWP
           capabilities changes and drop a redundant function call from that
           driver (Rafael Wysocki).
      
         - Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
           Stephen Boyd, Vladimir Zapolskiy).
      
         - Fix double devm_remap() in the Mediatek cpufreq driver (Hector
           Yuan).
      
         - Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
           Luba).
      
         - Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).
      
         - Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
      
         - Fix two comments in cpuidle code (Jason Wang, Yang Li).
      
         - Allow model-specific normal EPB value to be used in the intel_epb
           sysfs attribute handling code (Srinivas Pandruvada).
      
         - Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
      
         - Add safety net to supplier device release in the runtime PM core
           code (Rafael Wysocki).
      
         - Capture device status before disabling runtime PM for it (Rafael
           Wysocki).
      
         - Add new macros for declaring PM operations to allow drivers to
           avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
           update some drivers to use these macros (Paul Cercueil).
      
         - Allow ACPI hardware signature to be honoured during restore from
           hibernation (David Woodhouse).
      
         - Update outdated operating performance points (OPP) documentation
           (Tang Yizhou).
      
         - Reduce log severity for informative message regarding frequency
           transition failures in devfreq (Tzung-Bi Shih).
      
         - Add DRAM frequency controller devfreq driver for Allwinner sunXi
           SoCs (Samuel Holland).
      
         - Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
           Bergmann).
      
         - Add support for new layout of Psys PowerLimit Register on SPR to
           the Intel RAPL power capping driver (Zhang Rui).
      
         - Fix typo in a comment in idle_inject.c (Jason Wang).
      
         - Remove unused function definition from the DTPM (Dynamic Thermal
           Power Management) power capping framework (Daniel Lezcano).
      
         - Reduce DTPM trace verbosity (Daniel Lezcano)"
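
The idea behind the new PM-operation macros mentioned above (keeping callbacks visible to the compiler while avoiding #ifdef guards) can be sketched in plain C. This is an illustration under stated assumptions, not the kernel's definitions: `DEMO_CONFIG_PM`, `DEMO_PM_PTR()`, `struct demo_pm_ops` and the callbacks are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch standing in for a Kconfig option. */
#ifndef DEMO_CONFIG_PM
#define DEMO_CONFIG_PM 1
#endif

/*
 * Yield the pointer when the option is on and NULL when it is off.
 * The callbacks are always compiled and type-checked; when the option
 * is off they become unreferenced and the compiler may discard them.
 */
#define DEMO_PM_PTR(ptr) (DEMO_CONFIG_PM ? (ptr) : NULL)

/* Hypothetical operations table with two callbacks. */
struct demo_pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

static int demo_suspend(void) { return 0; }
static int demo_resume(void) { return 0; }

static const struct demo_pm_ops demo_ops = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};

/* A driver would store this pointer; no #ifdef is needed at the use site. */
static const struct demo_pm_ops *demo_driver_pm(void)
{
	return DEMO_PM_PTR(&demo_ops);
}
```

Building with `-DDEMO_CONFIG_PM=0` makes `demo_driver_pm()` return NULL without touching the driver code itself.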
      
      * tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
        x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
        cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
        cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
        cpuidle: use default_groups in kobj_type
        x86: intel_epb: Allow model specific normal EPB value
        MAINTAINERS: Add AMD P-State driver maintainer entry
        Documentation: amd-pstate: Add AMD P-State driver introduction
        cpufreq: amd-pstate: Add AMD P-State performance attributes
        cpufreq: amd-pstate: Add AMD P-State frequencies attributes
        cpufreq: amd-pstate: Add boost mode support for AMD P-State
        cpufreq: amd-pstate: Add trace for AMD P-State module
        cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
        cpufreq: amd-pstate: Add fast switch function for AMD P-State
        cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
        ACPI: CPPC: Add CPPC enable register function
        ACPI: CPPC: Check present CPUs for determining _CPC is valid
        ACPI: CPPC: Implement support for SystemIO registers
        x86/msr: Add AMD CPPC MSR definitions
        x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
        cpufreq: use default_groups in kobj_type
        ...
      b35b6d4d
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · bca21755
      Linus Torvalds authored
      Pull ACPI updates from Rafael Wysocki:
       "These are usual ACPICA code updates (although there are more of them
        than in the last few releases), a noticeable EC driver update (which
        mostly consists of cleanups, though), the device enumeration quirks
        handling rework from Hans, some updates eliminating unnecessary CPU
        cache flushing in some places (processor idle and system-wide PM code)
        and a bunch of assorted cleanups and fixes.
      
        Specifics:
      
         - Update ACPICA code in the kernel to the 20211217 upstream release
           including the following changes:
      
            - iASL/Disassembler: Additional support for NHLT table (Bob
              Moore).
            - Change a return_ACPI_STATUS (AE_BAD_PARAMETER) (Bob Moore).
            - Fix a couple of warnings under MSVC (Bob Moore).
            - iASL: Add TDEL table to both compiler/disassembler (Bob Moore).
            - iASL/NHLT table: "Specific Data" field support (Bob Moore).
            - Use original data_table_region pointer for accesses (Jessica
              Clarke).
            - Use original pointer for virtual origin tables (Jessica Clarke).
            - Macros: Remove ACPI_PHYSADDR_TO_PTR (Jessica Clarke).
            - Avoid subobject buffer overflow when validating RSDP signature
              (Jessica Clarke).
            - iASL: Add support for AGDI table (Ilkka Koskinen).
            - Hardware: Do not flush CPU cache when entering S4 and S5 (Kirill
              A. Shutemov).
            - Expand the ACPI_ACCESS_ definitions (Mark Langsdorf).
            - Utilities: Avoid deleting the same object twice in a row (Rafael
              Wysocki).
            - Executer: Fix REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
              (Rafael Wysocki).
            - Fix AEST Processor generic resource substructure data field byte
              length (Shuuichirou Ishii).
            - Fix wrong interpretation of PCC address (Sudeep Holla).
            - Add support for PCC Opregion special context data (Sudeep
              Holla).
      
         - Implement OperationRegion handler for PCC Type 3 subtype (Sudeep
           Holla).
      
         - Introduce acpi_fetch_acpi_dev() as a replacement for
           acpi_bus_get_device() and use it in the ACPI subsystem (Rafael
           Wysocki).
      
         - Avoid using _CID for device enumeration if _HID is missing or
           invalid (Rafael Wysocki).
      
         - Rework quirk handling during ACPI device enumeration and add some
           new quirks for known broken platforms (Hans de Goede).
      
         - Avoid unnecessary or redundant CPU cache flushing during system PM
           transitions (Kirill A. Shutemov).
      
         - Add PM debug messages related to power resources (Rafael Wysocki).
      
         - Fix kernel-doc comment in the PCI host bridge ACPI driver (Yang
           Li).
      
         - Rework flushing of EC work while suspended to idle and clean up the
           handling of events in the ACPI EC driver (Rafael Wysocki).
      
         - Prohibit ec_sys module parameter write_support from being used when
           the system is locked down (Hans de Goede).
      
         - Make the ACPI processor thermal driver use cpufreq_cpu_get() to
           check for presence of cpufreq policy (Manfred Spraul).
      
         - Avoid unnecessary CPU cache flushing in the ACPI processor idle
           driver (Kirill A. Shutemov).
      
         - Replace kernel.h with the necessary inclusions in the ACPI
           processor driver (Andy Shevchenko).
      
         - Use swap() instead of open coding it in the ACPI processor idle
           driver (Guo Zhengkui).
      
         - Fix the handling of defective LPAT in the ACPI xpower PMIC driver
           and clean up some definitions of PMIC data structures (Hans de
           Goede).
      
         - Fix outdated comment in the ACPI DPTF driver (Sumeet Pawnikar).
      
         - Add AEST to the list of known ACPI table signatures (Shuuichirou
           Ishii).
      
         - Make ACPI NUMA code take hotpluggable memblocks into account when
           CONFIG_MEMORY_HOTPLUG is not set (Vitaly Kuznetsov).
      
         - Use default_groups in kobj_type in the ACPI sysfs code (Greg
           Kroah-Hartman).
      
         - Rearrange _CPC structure documentation (Andy Shevchenko).
      
         - Drop an always true check from the ACPI thermal driver (Adam
           Borowski).
      
         - Add new "not charging" quirk for Lenovo ThinkPads to the ACPI
           battery driver (Thomas Weißschuh)"
      
      * tag 'acpi-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
        ACPI: PCC: Implement OperationRegion handler for the PCC Type 3 subtype
        ACPI / x86: Skip AC and battery devices on x86 Android tablets with broken DSDTs
        ACPI / x86: Introduce an acpi_quirk_skip_acpi_ac_and_battery() helper
        ACPI: processor: thermal: avoid cpufreq_get_policy()
        serdev: Do not instantiate serdevs on boards with known bogus DSDT entries
        i2c: acpi: Do not instantiate I2C-clients on boards with known bogus DSDT entries
        ACPI / x86: Add acpi_quirk_skip_[i2c_client|serdev]_enumeration() helpers
        ACPI: scan: Create platform device for BCM4752 and LNV4752 ACPI nodes
        PCI/ACPI: Fix acpi_pci_osc_control_set() kernel-doc comment
        ACPI: battery: Add the ThinkPad "Not Charging" quirk
        ACPI: sysfs: use default_groups in kobj_type
        ACPICA: Update version to 20211217
        ACPICA: iASL/NHLT table: "Specific Data" field support
        ACPICA: iASL: Add suppport for AGDI table
        ACPICA: iASL: Add TDEL table to both compiler/disassembler
        ACPICA: Fixed a couple of warnings under MSVC
        ACPICA: Change a return_ACPI_STATUS (AE_BAD_PARAMETER)
        ACPICA: Hardware: Do not flush CPU cache when entering S4 and S5
        ACPICA: Add support for PCC Opregion special context data
        ACPICA: Fix wrong interpretation of PCC address
        ...
      bca21755
    • Linus Torvalds's avatar
      netfilter: nf_tables: don't use 'data_size' uninitialized · 63045bfd
      Linus Torvalds authored
      Commit 2c865a8a ("netfilter: nf_tables: add rule blob layout") never
      initialized the new 'data_size' variable.
      
      I'm not sure how it ever worked, but it might have worked almost by
      accident - gcc seems to occasionally miss these kinds of 'variable used
      uninitialized' situations, but I've seen it do so because it ended up
      zero-initializing them due to some other simplification.
      
      But clang is very unhappy about it all, and correctly reports
      
          net/netfilter/nf_tables_api.c:8278:4: error: variable 'data_size' is uninitialized when used here [-Werror,-Wuninitialized]
                                  data_size += sizeof(*prule) + rule->dlen;
                                  ^~~~~~~~~
          net/netfilter/nf_tables_api.c:8263:30: note: initialize the variable 'data_size' to silence this warning
                  unsigned int size, data_size;
                                              ^
                                               = 0
          1 error generated.
      
      and this fix just initializes 'data_size' to zero before the loop.
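
The class of bug described above can be reduced to a few lines. This is a minimal sketch, not the nf_tables code: `struct demo_rule`, `DEMO_HDR_SIZE` and `demo_blob_size()` are hypothetical names chosen only to mirror the shape of the accumulation loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a rule with a variable-length payload. */
struct demo_rule {
	unsigned int dlen;
};

/* Hypothetical fixed per-rule header size in bytes. */
#define DEMO_HDR_SIZE 8u

/* Sum the space needed for n rules: header plus payload for each one. */
static unsigned int demo_blob_size(const struct demo_rule *rules, size_t n)
{
	/* Without "= 0" the first += would read an indeterminate value. */
	unsigned int data_size = 0;
	size_t i;

	for (i = 0; i < n; i++)
		data_size += DEMO_HDR_SIZE + rules[i].dlen;

	return data_size;
}
```

An accumulator updated only with `+=` has no defining store before its first read, which is why the initializer has to come before the loop.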
      
      Fixes: 2c865a8a ("netfilter: nf_tables: add rule blob layout")
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      63045bfd
    • Linus Torvalds's avatar
      Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 8efd0d9c
      Linus Torvalds authored
      Pull networking updates from Jakub Kicinski:
       "Core
        ----
      
         - Defer freeing TCP skbs to the BH handler, whenever possible, or at
           least perform the freeing outside of the socket lock section to
           decrease cross-CPU allocator work and improve latency.
      
         - Add netdevice refcount tracking to locate sources of netdevice and
           net namespace refcount leaks.
      
         - Make Tx watchdog less intrusive - avoid pausing Tx and restarting
           all queues from a single CPU removing latency spikes.
      
         - Various small optimizations throughout the stack from Eric Dumazet.
      
         - Make netdev->dev_addr[] constant, force modifications to go via
           appropriate helpers to allow us to keep addresses in ordered data
           structures.
      
         - Replace unix_table_lock with per-hash locks, improving performance
           of bind() calls.
      
         - Extend skb drop tracepoint with a drop reason.
      
         - Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
      
        BPF
        ---
      
         - New helpers:
            - bpf_find_vma(), find and inspect VMAs for profiling use cases
            - bpf_loop(), runtime-bounded loop helper trading some execution
              time for much faster (if at all converging) verification
            - bpf_strncmp(), improve performance, avoid compiler flakiness
            - bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
              for tracing programs, all inlined by the verifier
      
         - Support BPF relocations (CO-RE) in the kernel loader.
      
         - Further the support for BTF_TYPE_TAG annotations.
      
         - Allow access to local storage in sleepable helpers.
      
         - Convert verifier argument types to a composable form with different
           attributes which can be shared across types (ro, maybe-null).
      
         - Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
           creating new, extensible ones where missing and deprecating those
           to be removed.
      
        Protocols
        ---------
      
         - WiFi (mac80211/cfg80211):
            - notify user space about long "come back in N" AP responses,
              allow it to react to such temporary rejections
            - allow non-standard VHT MCS 10/11 rates
            - use coarse time in airtime fairness code to save CPU cycles
      
         - Bluetooth:
            - rework of HCI command execution serialization to use a common
              queue and work struct, and improve handling errors reported in
              the middle of a batch of commands
            - rework HCI event handling to use skb_pull_data, avoiding packet
              parsing pitfalls
            - support AOSP Bluetooth Quality Report
      
         - SMC:
            - support net namespaces, following the RDMA model
            - improve connection establishment latency by pre-clearing buffers
            - introduce TCP ULP for automatic redirection to SMC
      
         - Multi-Path TCP:
            - support ioctls: SIOCINQ, OUTQ, and OUTQNSD
            - support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
              IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
            - support cmsgs: TCP_INQ
            - improvements in the data scheduler (assigning data to subflows)
            - support fastclose option (quick shutdown of the full MPTCP
              connection, similar to TCP RST in regular TCP)
      
         - MCTP (Management Component Transport) over serial, as defined by
           DMTF spec DSP0253 - "MCTP Serial Transport Binding".
      
        Driver API
        ----------
      
         - Support timestamping on bond interfaces in active/passive mode.
      
         - Introduce generic phylink link mode validation for drivers which
           don't have any quirks and where MAC capability bits fully express
           what's supported. Allow PCS layer to participate in the validation.
           Convert a number of drivers.
      
         - Add support to set/get size of buffers on the Rx rings and size of
           the tx copybreak buffer via ethtool.
      
         - Support offloading TC actions as first-class citizens rather than
           only as attributes of filters, improve sharing and device resource
           utilization.
      
         - WiFi (mac80211/cfg80211):
            - support forwarding offload (ndo_fill_forward_path)
            - support for background radar detection hardware
            - SA Query Procedures offload on the AP side
      
        New hardware / drivers
        ----------------------
      
         - tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
           real-time requirements for isochronous communication with protocols
           like OPC UA Pub/Sub.
      
         - Qualcomm BAM-DMUX WWAN - driver for data channels of modems
           integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
           (qcom_bam_dmux).
      
         - Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
           with support for bridging, VLANs and multicast forwarding
           (lan966x).
      
         - iwlmei driver for co-operating between Intel's WiFi driver and
           Intel's Active Management Technology (AMT) devices.
      
         - mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
      
         - Bluetooth:
            - MediaTek MT7921 SDIO devices
            - Foxconn MT7922A
            - Realtek RTL8852AE
      
        Drivers
        -------
      
         - Significantly improve performance in the datapaths of: lan78xx,
           ax88179_178a, lantiq_xrx200, bnxt.
      
         - Intel Ethernet NICs:
            - igb: support PTP/time PEROUT and EXTTS SDP functions on
              82580/i354/i350 adapters
            - ixgbevf: new PF -> VF mailbox API which avoids the risk of
              mailbox corruption with ESXi
            - iavf: support configuration of VLAN features of finer
              granularity, stacked tags and filtering
            - ice: PTP support for new E822 devices with sub-ns precision
            - ice: support firmware activation without reboot
      
         - Mellanox Ethernet NICs (mlx5):
            - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
            - support TC forwarding when tunnel encap and decap happen between
              two ports of the same NIC
            - dynamically size and allow disabling various features to save
              resources for running in embedded / SmartNIC scenarios
      
         - Broadcom Ethernet NICs (bnxt):
            - use page frag allocator to improve Rx performance
            - expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
      
         - Other Ethernet NICs:
            - amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
      
         - Microsoft cloud/virtual NIC (mana):
            - add XDP support (PASS, DROP, TX)
      
         - Mellanox Ethernet switches (mlxsw):
            - initial support for Spectrum-4 ASICs
            - VxLAN with IPv6 underlay
      
         - Marvell Ethernet switches (prestera):
            - support flower flow templates
            - add basic IP forwarding support
      
         - NXP embedded Ethernet switches (ocelot & felix):
            - support Per-Stream Filtering and Policing (PSFP)
            - enable cut-through forwarding between ports by default
            - support FDMA to improve packet Rx/Tx to CPU
      
         - Other embedded switches:
            - hellcreek: improve trapping management (STP and PTP) packets
            - qca8k: support link aggregation and port mirroring
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - qca6390, wcn6855: enable 802.11 power save mode in station mode
            - BSS color change support
            - WCN6855 hw2.1 support
            - 11d scan offload support
            - scan MAC address randomization support
            - full monitor mode, only supported on QCN9074
            - qca6390/wcn6855: report signal and tx bitrate
            - qca6390: rfkill support
            - qca6390/wcn6855: regdb.bin support
      
         - Intel WiFi (iwlwifi):
            - support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
              in cooperation with the BIOS
            - support for Optimized Connectivity Experience (OCE) scan
            - support firmware API version 68
            - lots of preparatory work for the upcoming Bz device family
      
         - MediaTek WiFi (mt76):
            - Specific Absorption Rate (SAR) support
            - mt7921: 160 MHz channel support
      
         - RealTek WiFi (rtw88):
            - Specific Absorption Rate (SAR) support
            - scan offload
      
         - Other WiFi NICs
            - ath10k: support fetching (pre-)calibration data from nvmem
            - brcmfmac: configure keep-alive packet on suspend
            - wcn36xx: beacon filter support"
      
      * tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
        tcp: tcp_send_challenge_ack delete useless param `skb`
        net/qla3xxx: Remove useless DMA-32 fallback configuration
        rocker: Remove useless DMA-32 fallback configuration
        hinic: Remove useless DMA-32 fallback configuration
        lan743x: Remove useless DMA-32 fallback configuration
        net: enetc: Remove useless DMA-32 fallback configuration
        cxgb4vf: Remove useless DMA-32 fallback configuration
        cxgb4: Remove useless DMA-32 fallback configuration
        cxgb3: Remove useless DMA-32 fallback configuration
        bnx2x: Remove useless DMA-32 fallback configuration
        et131x: Remove useless DMA-32 fallback configuration
        be2net: Remove useless DMA-32 fallback configuration
        vmxnet3: Remove useless DMA-32 fallback configuration
        bna: Simplify DMA setting
        net: alteon: Simplify DMA setting
        myri10ge: Simplify DMA setting
        qlcnic: Simplify DMA setting
        net: allwinner: Fix print format
        page_pool: remove spinlock in page_pool_refill_alloc_cache()
        amt: fix wrong return type of amt_send_membership_update()
        ...
      8efd0d9c
    • Linus Torvalds's avatar
      Merge tag 'media/v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 9bcbf894
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - New sensor driver: ov5693
      
       - A new driver for STM32 Chrom-ART Accelerator
      
       - Added V4L2 core helper functions for VP9 codec
      
       - Hantro driver has gained support for VP9 codecs
      
       - Added support for Maxim MAX96712 Quad GMSL2 Deserializer
      
       - The staging atomisp driver has gained lots of improvements, fixes and
         cleanups. It now works with userptr
      
       - Lots of random driver improvements as usual
      
      * tag 'media/v5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (397 commits)
        media: ipu3-cio2: Add support for instantiating i2c-clients for VCMs
        media: ipu3-cio2: Call cio2_bridge_init() before anything else
        media: ipu3-cio2: Defer probing until the PMIC is fully setup
        media: hantro: Add support for Allwinner H6
        media: dt-bindings: allwinner: document H6 Hantro G2 binding
        media: hantro: Convert imx8m_vpu_g2_irq to helper
        media: hantro: move postproc enablement for old cores
        media: hantro: vp9: add support for legacy register set
        media: hantro: vp9: use double buffering if needed
        media: hantro: add support for reset lines
        media: hantro: Fix probe func error path
        media: i2c: hi846: use pm_runtime_force_suspend/resume for system suspend
        media: i2c: hi846: check return value of regulator_bulk_disable()
        media: hi556: Support device probe in non-zero ACPI D state
        media: ov5675: Support device probe in non-zero ACPI D state
        media: imx208: Support device probe in non-zero ACPI D state
        media: ov2740: support device probe in non-zero ACPI D state
        media: ov5670: Support device probe in non-zero ACPI D state
        media: ov8856: support device probe in non-zero ACPI D state
        media: ov8865: Disable only enabled regulators on error path
        ...
      9bcbf894
    • Linus Torvalds's avatar
      Revert "drm/amd/display: Fix for otg synchronization logic" · 75b950ef
      Linus Torvalds authored
      This reverts commit a896f870.
      
      It causes odd flickering on my Radeon RX580 (PCI ID 1002:67df rev e7,
      subsystem ID 1da2:e353).
      
      Bisected right to this commit, and reverting it fixes things.
      
      Link: https://lore.kernel.org/all/CAHk-=wg9hDde_L3bK9tAfdJ4N=TJJ+SjO3ZDONqH5=bVoy_Mzg@mail.gmail.com/
      Cc: Alex Deucher <alexdeucher@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Harry Wentland <harry.wentland@amd.com>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Jun Lei <Jun.Lei@amd.com>
      Cc: Mustapha Ghaddar <mustapha.ghaddar@amd.com>
      Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
      Cc: meenakshikumar somasundaram <meenakshikumar.somasundaram@amd.com>
      Cc: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      75b950ef
  5. 10 Jan, 2022 3 commits
    • Linus Torvalds's avatar
      Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm · 8d0749b4
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "Highlights are support for privacy screens found in new laptops, a
        bunch of nomodeset refactoring, and i915 enables ADL-P systems by
        default, while starting to add RPL-S support.
      
        vmwgfx adds GEM and support for OpenGL 4.3 features in userspace.
      
        Lots of internal refactorings around dma reservations, and lots of
        driver refactoring as well.
      
        Summary:
      
        core:
         - add privacy screen support
         - move nomodeset option into drm subsystem
         - clean up nomodeset handling in drivers
         - make drm_irq.c legacy
         - fix stack_depot name conflicts
         - remove DMA_BUF_SET_NAME ioctl restrictions
         - sysfs: send hotplug event
         - replace several DRM_* logging macros with drm_*
         - move hashtable to legacy code
         - add error return from gem_create_object
         - cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER
         - kernel.h related include cleanups
         - support XRGB2101010 source buffers
      
        ttm:
         - don't include drm hashtable
         - stop pruning fences after wait
         - documentation updates
      
        dma-buf:
         - add dma_resv selftest
         - add debugfs helpers
         - remove dma_resv_get_excl_unlocked
         - documentation
         - make fences mandatory in dma_resv_add_excl_fence
      
        dp:
         - add link training delay helpers
      
        gem:
         - link shmem/cma helpers into separate modules
         - use dma_resv iterator
         - import dma-buf namespace into gem helper modules
      
        scheduler:
         - fence grab fix
         - lockdep fixes
      
        bridge:
         - switch to managed MIPI DSI helpers
         - register and attach during probe fixes
         - convert to YAML in several places.
      
        panel:
         - add a bunch of new panels
      
        simpledrm:
         - support FB_DAMAGE_CLIPS
         - support virtual screen sizes
         - add Apple M1 support
      
        amdgpu:
         - enable seamless boot for DCN 3.01
         - runtime PM fixes
         - use drm_kms_helper_connector_hotplug_event
         - get all fences at once
         - use generic drm fb helpers
         - PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes
         - add smart trace buffer (STB) for supported GPUs
         - display debugfs entries
         - new SMU debug option
         - Documentation update
      
        amdkfd:
         - IP discovery enumeration refactor
         - interface between driver fixes
         - SVM fixes
         - kfd uapi header to define some sysfs bitfields.
      
        i915:
         - support VESA panel backlights
         - enable ADL-P by default
         - add eDP privacy screen support
         - add Raptor Lake S (RPL-S) support
         - DG2 page table support
         - lots of GuC/HuC fw refactoring
         - refactored i915->gt interfaces
         - CD clock squashing support
         - enable 10-bit gamma support
         - update ADL-P DMC fw to v2.14
         - enable runtime PM autosuspend by default
         - ADL-P DSI support
         - per-lane DP drive settings for ICL+
         - add support for pipe C/D DMC firmware
         - Atomic gamma LUT updates
         - remove CCS FB stride restrictions on ADL-P
         - VRR platform support for display 11
         - add support for display audio codec keepalive
         - lots of display refactoring
         - fix runtime PM handling during PXP suspend
         - improved eviction performance with async TTM moves
         - async VMA unbinding improvements
         - VMA locking refactoring
         - improved error capture robustness
         - use per device iommu checks
         - drop bits stealing from i915_sw_fence function ptr
         - remove dma_resv_prune
         - add IC cache invalidation on DG2
      
        nouveau:
         - crc fixes
         - validate LUTs in atomic check
         - set HDMI AVI RGB quant to full
      
        tegra:
         - buffer objects reworks for dma-buf compat
         - NVDEC driver uAPI support
         - power management improvements
      
        etnaviv:
         - IOMMU enabled system support
         - fix > 4GB command buffer mapping
         - close a DoS vector
         - fix spurious GPU resets
      
        ast:
         - fix i2c initialization
      
        rcar-du:
         - DSI output support
      
        exynos:
         - replace legacy gpio interface
         - implement generic GEM object mmap
      
        msm:
         - dpu plane state cleanup in prep for multirect
         - dpu debugfs cleanups
         - dp support for sc7280
         - a506 support
         - removal of struct_mutex
         - remove old eDP sub-driver
      
        anx7625:
         - support MIPI DSI input
         - support HDMI audio
         - fix reading EDID
      
        lvds:
         - fix bridge DT bindings
      
        megachips:
         - probe both bridges before registering
      
        dw-hdmi:
         - allow interlace on bridge
      
        ps8640:
         - enable runtime PM
         - support aux-bus
      
        tx358768:
         - enable reference clock
         - add pulse mode support
      
        ti-sn65dsi86:
         - use regmap bulk write
         - add PWM support
      
        etnaviv:
         - get all fences at once
      
        gma500:
         - gem object cleanups
      
        kmb:
         - enable fb console
      
        radeon:
         - use dma_resv_wait_timeout
      
        rockchip:
         - add DSP hold timeout
         - suspend/resume fixes
         - PLL clock fixes
         - implement mmap in GEM object functions
         - use generic fbdev emulation
      
        sun4i:
         - use CMA helpers without vmap support
      
        vc4:
         - fix HDMI-CEC hang when display is off
         - power on HDMI controller while disabling
         - support 4K@60Hz modes
         - support 10-bit YUV 4:2:0 output
      
        vmwgfx:
         - fix leak on probe errors
         - fail probing on broken hosts
         - new placement for MOB page tables
         - hide internal BOs from userspace
         - implement GEM support
         - implement GL 4.3 support
      
        virtio:
         - overflow fixes
      
        xen:
         - implement mmap as GEM object function
      
        omapdrm:
         - fix scatterlist export
         - support virtual planes
      
        mediatek:
         - MT8192 support
         - CMDQ refinement"
      
      * tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits)
        drm/amdgpu: no DC support for headless chips
        drm/amd/display: fix dereference before NULL check
        drm/amdgpu: always reset the asic in suspend (v2)
        drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
        drm/amd/display: Fix the uninitialized variable in enable_stream_features()
        drm/amdgpu: fix runpm documentation
        amdgpu/pm: Make sysfs pm attributes as read-only for VFs
        drm/amdgpu: save error count in RAS poison handler
        drm/amdgpu: drop redundant semicolon
        drm/amd/display: get and restore link res map
        drm/amd/display: support dynamic HPO DP link encoder allocation
        drm/amd/display: access hpo dp link encoder only through link resource
        drm/amd/display: populate link res in both detection and validation
        drm/amd/display: define link res and make it accessible to all link interfaces
        drm/amd/display: 3.2.167
        drm/amd/display: [FW Promotion] Release 0.0.98
        drm/amd/display: Undo ODM combine
        drm/amd/display: Add reg defs for DCN303
        drm/amd/display: Changed pipe split policy to allow for multi-display pipe split
        drm/amd/display: Set optimize_pwr_state for DCN31
        ...
      8d0749b4
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-kunit-5.17-rc1' of... · bf4eebf8
      Linus Torvalds authored
      Merge tag 'linux-kselftest-kunit-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit updates from Shuah Khan:
       "This consists of several fixes and enhancements. A few highlights:
      
         - --kconfig_add option allows easily tweaking kunitconfigs
      
         - make build subcommand can reconfigure if needed
      
         - doesn't error on tests without test plans
      
         - doesn't crash if no parameters are generated
      
         - defaults --jobs to # of CPUs
      
         - reports test parameter results as (K)TAP subtests"
      
      * tag 'linux-kselftest-kunit-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kunit: tool: Default --jobs to number of CPUs
        kunit: tool: fix newly introduced typechecker errors
        kunit: tool: make `build` subcommand also reconfigure if needed
        kunit: tool: delete kunit_parser.TestResult type
        kunit: tool: use dataclass instead of collections.namedtuple
        kunit: tool: suggest using decode_stacktrace.sh on kernel crash
        kunit: tool: reconfigure when the used kunitconfig changes
        kunit: tool: revamp message for invalid kunitconfig
        kunit: tool: add --kconfig_add to allow easily tweaking kunitconfigs
        kunit: tool: move Kconfig read_from_file/parse_from_string to package-level
        kunit: tool: print parsed test results fully incrementally
        kunit: Report test parameter results as (K)TAP subtests
        kunit: Don't crash if no parameters are generated
        kunit: tool: Report an error if any test has no subtests
        kunit: tool: Do not error on tests without test plans
        kunit: add run_checks.py script to validate kunit changes
        Documentation: kunit: remove claims that kunit is a mocking framework
        kunit: tool: fix --json output for skipped tests
      bf4eebf8
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-next-5.17-rc1' of... · 4369b3ce
      Linus Torvalds authored
      Merge tag 'linux-kselftest-next-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest update from Shuah Khan:
       "Fixes to build errors, false negatives, and several code cleanups,
        including the ARRAY_SIZE cleanup that removes 25+ duplicate
        ARRAY_SIZE defines from individual tests"
      
      * tag 'linux-kselftest-next-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/vm: remove ARRAY_SIZE define from individual tests
        selftests/timens: remove ARRAY_SIZE define from individual tests
        selftests/sparc64: remove ARRAY_SIZE define from adi-test
        selftests/seccomp: remove ARRAY_SIZE define from seccomp_benchmark
        selftests/rseq: remove ARRAY_SIZE define from individual tests
        selftests/net: remove ARRAY_SIZE define from individual tests
        selftests/landlock: remove ARRAY_SIZE define from common.h
        selftests/ir: remove ARRAY_SIZE define from ir_loopback.c
        selftests/core: remove ARRAY_SIZE define from close_range_test.c
        selftests/cgroup: remove ARRAY_SIZE define from cgroup_util.h
        selftests/arm64: remove ARRAY_SIZE define from vec-syscfg.c
        tools: fix ARRAY_SIZE defines in tools and selftests hdrs
        selftests: cgroup: build error multiple outpt files
        selftests/move_mount_set_group remove unneeded conversion to bool
        selftests/mount: remove unneeded conversion to bool
        selftests: harness: avoid false negatives if test has no ASSERTs
        selftests/ftrace: make kprobe profile testcase description unique
        selftests: clone3: clone3: add case CLONE3_ARGS_NO_TEST
        selftests: timers: Remove unneeded semicolon
        kselftests: timers:Remove unneeded semicolon
      4369b3ce