1. 22 Jun, 2021 3 commits
    • bpf: Fix null ptr deref with mixed tail calls and subprogs · 7506d211
      John Fastabend authored
      The sub-program's prog->aux->poke_tab[] is populated in jit_subprogs() and
      then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
      individual JITs. The index of the poke_tab[] entry to use is stored in
      insn->imm by the code that adds it to that array slot. The JIT then uses
      imm to find the right entry for an individual instruction. In the x86
      bpf_jit_comp.c this is done by calling emit_bpf_tail_call_direct() with
      the poke_tab[] entry selected by the imm value.
      
      However, we observed the null-ptr-deref below when mixing tail call
      programs with subprog programs. For this to happen we just need to
      mix bpf-2-bpf calls and tail calls with some extra calls or instructions
      that will be patched later by one of the fixup routines. So what's
      happening?
      
      Before fixup_call_args() (where the JIT op is done), various code
      patching is done by do_misc_fixups(). This may increase the insn
      count, for example when we patch map_lookup_elem using the
      map_gen_lookup hook. This does two things. First, it means the
      instruction index, insn_idx field, of a tail call instruction will
      move by a 'delta'.
      
      In verifier code,
      
       struct bpf_jit_poke_descriptor desc = {
        .reason = BPF_POKE_REASON_TAIL_CALL,
        .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
        .tail_call.key = bpf_map_key_immediate(aux),
        .insn_idx = i + delta,
       };
      
      Second, the subprog start values subprog_info[i].start are updated
      with the delta, and any poke descriptor index is also updated with
      the delta in adjust_poke_descs(). If we look at adjust_subprog_starts(),
      though, we see a start is only adjusted when the patch occurs before
      that subprogram's start:
      
              /* NOTE: fake 'exit' subprog should be updated as well. */
              for (i = 0; i <= env->subprog_cnt; i++) {
                      if (env->subprog_info[i].start <= off)
                              continue;
      
      Earlier subprograms are not changed because their start values
      do not move. But adjust_poke_descs() applies offset + delta
      indiscriminately, so poke descriptors may be corrupted.
      
      Then, in jit_subprogs(), we only populate a subprogram's poke_tab[]
      when the above insn_idx is less than the next subprogram's start.
      Because insn_idx was corrupted above, we might incorrectly conclude
      that a poke descriptor is not used in a subprogram and omit it.
      Finally, when the JIT runs, it dereferences poke_tab while emitting
      the tail call instruction and, because the earlier step omitted the
      poke descriptor, crashes as shown below.
      
      The fix is straightforward with the above context: simply move the
      same logic from adjust_subprog_starts() into adjust_poke_descs() and
      only adjust insn_idx when needed.
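      A minimal standalone sketch of the offset bookkeeping, assuming one
      instruction at 'off' is replaced by 'len' instructions, so everything
      after it moves by len - 1. The struct and function names are simplified,
      hypothetical stand-ins for the verifier's types, not the kernel's actual
      code; the point is that the poke descriptor update uses the same "skip if
      at or before off" rule as the subprogram starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the verifier's bookkeeping. */
struct poke_desc {
	unsigned int insn_idx;   /* index of the tail call instruction */
};

struct subprog {
	unsigned int start;      /* first instruction of the subprogram */
};

/* One insn at 'off' became 'len' insns: entries after 'off' move by len - 1. */
static void adjust_subprog_starts(struct subprog *sp, size_t cnt,
				  unsigned int off, unsigned int len)
{
	for (size_t i = 0; i < cnt; i++) {
		if (sp[i].start <= off)
			continue;            /* at or before the patch: unchanged */
		sp[i].start += len - 1;
	}
}

/* Fixed version: same rule, instead of shifting every descriptor. */
static void adjust_poke_descs(struct poke_desc *tab, size_t cnt,
			      unsigned int off, unsigned int len)
{
	for (size_t i = 0; i < cnt; i++) {
		if (tab[i].insn_idx <= off)
			continue;
		tab[i].insn_idx += len - 1;
	}
}
```

      With the old unconditional shift, a large patch placed after a tail call
      could push that tail call's insn_idx past the next subprogram's start,
      which is exactly the condition under which jit_subprogs() drops the
      descriptor.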
      
      [   82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
      [   82.623001] loop10: detected capacity change from 0 to 8
      [   88.487424] ==================================================================
      [   88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
      [   88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
      [   88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G          I       5.12.0+ #386
      [   88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
      [   88.487490] Call Trace:
      [   88.487498]  dump_stack+0x93/0xc2
      [   88.487515]  kasan_report.cold+0x5f/0xd8
      [   88.487530]  ? do_jit+0x184a/0x3290
      [   88.487542]  do_jit+0x184a/0x3290
       ...
      [   88.487709]  bpf_int_jit_compile+0x248/0x810
       ...
      [   88.487765]  bpf_check+0x3718/0x5140
       ...
      [   88.487920]  bpf_prog_load+0xa22/0xf10
      
      Fixes: a748c697 ("bpf: propagate poke descriptors to subprograms")
      Reported-by: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Fix integer overflow in argument calculation for bpf_map_area_alloc · 7dd5d437
      Bui Quang Minh authored
      On 32-bit architectures, the result of sizeof() is a 32-bit integer, so
      the expression becomes a multiplication of two 32-bit integers, which
      can overflow. As a result, bpf_map_area_alloc() may allocate less
      memory than needed.
      
      Fix this by casting one operand to u64.
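      The overflow can be reproduced in userspace by forcing the
      multiplication into 32 bits. In this sketch uint32_t models a 32-bit
      size_t and the function names are hypothetical; the second function
      shows the effect of casting one operand to a 64-bit type before
      multiplying.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 32-bit size_t: the product is computed (and wraps) in 32 bits. */
static uint64_t bucket_bytes_32bit(uint32_t n_buckets, uint32_t elem_size)
{
	return n_buckets * elem_size;
}

/* Casting one operand promotes the whole multiplication to 64 bits. */
static uint64_t bucket_bytes_u64(uint32_t n_buckets, uint32_t elem_size)
{
	return (uint64_t)n_buckets * elem_size;
}
```

      For example, 2^30 buckets of 8 bytes wrap to 0 in the 32-bit version,
      so an allocator would be asked for far less memory than the map needs.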
      
      Fixes: 0d2c4f96 ("bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps")
      Fixes: 99c51064 ("devmap: Use bpf_map_area_alloc() for allocating hash buckets")
      Fixes: 546ac1ff ("bpf: add devmap, a map for storing net device references")
      Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210613143440.71975-1-minhquangbui99@gmail.com
    • bpf: Fix regression on BPF_OBJ_GET with non-O_RDWR flags · 5dec6d96
      Maciej Żenczykowski authored
      This reverts commit d37300ed ("bpf: program: Refuse non-O_RDWR flags
      in BPF_OBJ_GET"). It breaks Android userspace, which expects to be able
      to fetch programs with just read permissions.
      
      See: https://cs.android.com/android/platform/superproject/+/master:frameworks/libs/net/common/native/bpf_syscall_wrappers/include/BpfSyscallWrappers.h;drc=7005c764be23d31fa1d69e826b4a2f6689a8c81e;l=124
      
      Side-note: another option to fix it would be to extend bpf_prog_new_fd()
      and to pass in the used file mode flags in the same way as we do for maps
      via bpf_map_new_fd(). That way, they would end up in anon_inode_getfd()
      and thus be retained for prog fd operations with the bpf() syscall. Right
      now these flags are not checked for progs since progs are immutable for
      their lifetime (as opposed to maps, which can be updated from user space).
      In the future this could change with new features, but at that point
      it's still fine to do the bpf_prog_new_fd() extension when needed. For a
      simple stable fix, a revert is less churn.
      
      Fixes: d37300ed ("bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GET")
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      [ Daniel: added side-note to commit message ]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Acked-by: Greg Kroah-Hartman <gregkh@google.com>
      Link: https://lore.kernel.org/bpf/20210618105526.265003-1-zenczykowski@gmail.com
  2. 21 Jun, 2021 8 commits
  3. 18 Jun, 2021 3 commits
    • bpf: Fix libelf endian handling in resolv_btfids · 61e8aeda
      Tony Ambardar authored
      The vmlinux ".BTF_ids" ELF section is declared in btf_ids.h to hold a list
      of zero-filled BTF IDs, which is then patched at link-time with correct
      values by resolve_btfids. The section is flagged as "allocable" to preclude
      compression, but notably the section contents (BTF IDs) are untyped.
      
      When patching the BTF IDs, resolve_btfids writes in host-native endianness
      and relies on libelf for any required translation on reading and updating
      vmlinux. However, since the type of the .BTF_ids section content defaults
      to ELF_T_BYTE (i.e. unsigned char), no translation occurs. This results in
      incorrect patched values when cross-compiling to non-native endianness,
      and can manifest as kernel Oops and test failures which are difficult to
      troubleshoot [1].
      
      Explicitly set the type of patched data to ELF_T_WORD, the architecture-
      neutral ELF type corresponding to the u32 BTF IDs. This enables libelf to
      transparently perform any needed endian conversions.
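      What the ELF_T_WORD translation buys can be illustrated without libelf.
      The hypothetical helpers below store and load a 32-bit BTF ID in an
      explicit byte order. With ELF_T_BYTE the host-order bytes are copied
      untouched, which corresponds to writing in the host's order and reading
      in the target's. This only models the conversion; it is not libelf's API.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in an explicit byte order (big_endian != 0 for BE). */
static void store_u32(uint8_t out[4], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;
		out[i] = (uint8_t)(v >> shift);
	}
}

/* Load it back the way a kernel of the given byte order would see it. */
static uint32_t load_u32(const uint8_t in[4], int big_endian)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;
		v |= (uint32_t)in[i] << shift;
	}
	return v;
}
```

      A little-endian build host writing ID 0x1234 untranslated into a
      big-endian vmlinux leaves the target reading 0x34120000, which is the
      kind of wrong ID that later shows up as an Oops.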
      
      Fixes: fbbb68de ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
      Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Frank Eigler <fche@redhat.com>
      Cc: Mark Wielaard <mark@klomp.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/CAPGftE_eY-Zdi3wBcgDfkz_iOr1KF10n=9mJHm1_a_PykcsoeA@mail.gmail.com [1]
      Link: https://lore.kernel.org/bpf/20210618061404.818569-1-Tony.Ambardar@gmail.com
    • xsk: Fix broken Tx ring validation · f654fae4
      Magnus Karlsson authored
      Fix broken Tx ring validation for AF_XDP. The commit under the Fixes
      tag fixed an off-by-one error in the validation but introduced
      another error. Descriptors are now let through even if they straddle a
      chunk boundary, which they are not allowed to do in aligned mode. Worse,
      they are let through even if they straddle the end of the umem
      itself, tricking the kernel into reading data outside the allowed umem
      region, which might or might not be mapped at all.
      
      Fix this by reintroducing the old code, but subtracting one from the
      length to fix the off-by-one error that the original patch was
      addressing. The test chunk != chunk_end makes sure packets do not
      straddle chunk boundaries. Note that packets of zero length are
      allowed in the interface, hence the check that the length is
      non-zero.
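      A standalone model of the restored aligned-mode check, assuming a
      power-of-two chunk size. The struct and function names are hypothetical
      simplifications of the description above, not the kernel's
      xsk_buff_pool code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct umem_model {
	uint64_t chunk_size;  /* assumed to be a power of two */
	uint64_t size;        /* total umem size in bytes */
};

static bool aligned_desc_ok(const struct umem_model *u, uint64_t addr, uint32_t len)
{
	uint64_t mask = ~(u->chunk_size - 1);
	uint64_t chunk = addr & mask;

	/* Zero-length packets are legal, so only check the last byte
	 * (addr + len - 1, hence the "minus one") when there is one. */
	if (len) {
		uint64_t chunk_end = (addr + len - 1) & mask;

		if (chunk != chunk_end)
			return false;   /* straddles a chunk boundary */
	}

	/* The chunk, and therefore the whole packet, must lie in the umem. */
	return chunk < u->size;
}
```

      A descriptor with len == chunk_size starting at a chunk base is accepted
      (the case the earlier fix targeted), while one that starts mid-chunk
      with the same length, or that begins past the last chunk, is rejected.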
      
      Fixes: ac31565c ("xsk: Fix for xp_aligned_validate_desc() when len == chunk_size")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Björn Töpel <bjorn@kernel.org>
      Link: https://lore.kernel.org/bpf/20210618075805.14412-1-magnus.karlsson@gmail.com
    • xsk: Fix missing validation for skb and unaligned mode · 2f996198
      Magnus Karlsson authored
      Fix a missing validation of a Tx descriptor when executing in skb mode
      and the umem is in unaligned mode. A descriptor could point to a
      buffer straddling the end of the umem, thus effectively tricking the
      kernel into reading outside the allowed umem region. This could lead to a
      kernel crash if that part of memory is not mapped.
      
      In zero-copy mode, the descriptor validation code rejects such
      descriptors by checking a bit in the DMA address that tells us if the
      next page is physically contiguous or not. For the last page in the
      umem, this bit is not set, therefore any descriptor pointing to a
      packet straddling this last page boundary will be rejected. However,
      the skb path does not use this bit since it copies out data and can do
      so to two different pages. (It also does not have the array of DMA
      addresses, so it cannot even store this bit.) The code simply reported
      that the packet is always physically contiguous. Unfortunately, this is
      also reported for the last page in the umem, which means that packets
      crossing the end of the umem are allowed when they should not be.
      
      Fix this by introducing a check for this in the SKB path only, not
      penalizing the zero-copy path.
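      The split between the two paths can be sketched as follows. The names
      are hypothetical, a 4 KiB page size is assumed, and next_contig stands
      in for the per-page "next page is contiguous" DMA bit that only the
      zero-copy path has; the skb path instead compares against the end of
      the umem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size */

struct pool_model {
	uint64_t umem_size;         /* bytes */
	const bool *next_contig;    /* zero-copy only; NULL in skb mode */
};

/* Returns true if the descriptor must be rejected. */
static bool desc_crosses_bad_page(const struct pool_model *p, uint64_t addr, uint32_t len)
{
	bool crosses = (addr & (MODEL_PAGE_SIZE - 1)) + len > MODEL_PAGE_SIZE;

	if (!crosses)
		return false;

	if (p->next_contig)   /* zero-copy: consult the per-page bit */
		return !p->next_contig[addr / MODEL_PAGE_SIZE];

	/* skb path: data is copied, so only the end of the umem matters */
	return addr + len > p->umem_size;
}
```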
      
      Fixes: 2b43470a ("xsk: Introduce AF_XDP buffer allocation API")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn@kernel.org>
      Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com
  4. 17 Jun, 2021 5 commits
    • net: qed: Fix memcpy() overflow of qed_dcbx_params() · 1c200f83
      Kees Cook authored
      The source (&dcbx_info->operational.params) and dest
      (&p_hwfn->p_dcbx_info->set.config.params) are both struct qed_dcbx_params
      (560 bytes), not struct qed_dcbx_admin_params (564 bytes), which is used
      as the memcpy() size.
      
      However, it seems that the layout of struct qed_dcbx_operational_params
      (dcbx_info->operational) matches the 4-byte difference of struct
      qed_dcbx_admin_params (p_hwfn->p_dcbx_info->set.config): 3 bytes of
      padding and 1 byte for "valid".
      
      On the assumption that the size is wrong (rather than the source structure
      type), adjust the memcpy() size argument to be 4 bytes smaller and add
      a BUILD_BUG_ON() to validate any changes to the structure sizes.
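      A userspace sketch of the corrected copy, using hypothetical structs
      sized like the ones in the commit message (560 bytes, plus 1 byte of
      "valid" and 3 bytes of padding for the admin variant) and C11
      _Static_assert as a stand-in for BUILD_BUG_ON(). The copy is sized by
      the destination field rather than by the enclosing struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins with the sizes quoted above. */
struct dcbx_params {
	uint32_t words[140];          /* 560 bytes */
};

struct dcbx_admin_params {
	struct dcbx_params params;
	bool valid;                   /* 1 byte + 3 bytes tail padding = 564 */
};

/* Compile-time guard in the spirit of BUILD_BUG_ON(). */
_Static_assert(sizeof(struct dcbx_params) == 560, "params size changed");

static void copy_params(struct dcbx_admin_params *dst, const struct dcbx_params *src)
{
	/* 560 bytes, not sizeof(struct dcbx_admin_params) == 564. */
	memcpy(&dst->params, src, sizeof(dst->params));
}
```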
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_eem: fix tx fixup skb leak · c3b26fdf
      Linyu Yuan authored
      When usbnet transmits an skb, eem fixes it up in eem_tx_fixup().
      If skb_copy_expand() fails, it returns NULL and
      usbnet_start_xmit() has no chance to free the original skb.

      Fix it by freeing the original skb in eem_tx_fixup() first,
      then checking the copy status; if it failed, return NULL to usbnet.
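      The ownership rule behind the fix can be sketched with a hypothetical
      buffer type and a live-allocation counter standing in for skbs and
      their allocator. The fixup callee always consumes its input, so a
      failed copy no longer leaves the original buffer without an owner.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type standing in for an skb. */
struct buf {
	size_t len;
	unsigned char data[];
};

static int live_bufs;   /* counts outstanding allocations */

static struct buf *buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b) + len);

	if (b) {
		b->len = len;
		memset(b->data, 0, len);
		live_bufs++;
	}
	return b;
}

static void buf_free(struct buf *b)
{
	if (b) {
		free(b);
		live_bufs--;
	}
}

/* Model of the fixed tx_fixup: it owns 'in' and always frees it. */
static struct buf *tx_fixup(struct buf *in, size_t headroom, int simulate_failure)
{
	struct buf *out = simulate_failure ? NULL : buf_alloc(in->len + headroom);

	if (out)
		memcpy(out->data + headroom, in->data, in->len);
	buf_free(in);   /* free the original first, success or not */
	return out;     /* NULL on failure; nothing is leaked */
}
```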
      
      Fixes: 9f722c09 ("usbnet: CDC EEM support (v5)")
      Signed-off-by: Linyu Yuan <linyyuan@codeaurora.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2021-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · bc39f679
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-06-16
      
      This series introduces some fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hamradio: fix memory leak in mkiss_close · 7edcc682
      Pavel Skripkin authored
      My local syzbot instance hit a memory leak in
      mkiss_open() [1]. The problem was a missing
      free_netdev() in mkiss_close().

      In mkiss_open() the netdevice is allocated and then
      registered, but in mkiss_close() the netdevice was
      only unregistered, not freed.
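      The missing step is the teardown counterpart of alloc-then-register. A
      minimal sketch with a hypothetical device struct, where calloc()/free()
      stand in for alloc_netdev()/free_netdev() and a flag stands in for
      registration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a net_device. */
struct netdev_model {
	bool registered;
};

static int live_netdevs;   /* counts devices that were never freed */

static struct netdev_model *model_open(void)
{
	struct netdev_model *dev = calloc(1, sizeof(*dev));   /* "alloc_netdev()" */

	if (!dev)
		return NULL;
	live_netdevs++;
	dev->registered = true;                              /* "register_netdev()" */
	return dev;
}

static void model_close(struct netdev_model *dev)
{
	dev->registered = false;                             /* "unregister_netdev()" */
	free(dev);                                           /* "free_netdev()": the missing step */
	live_netdevs--;
}
```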
      
      Fail log:
      
      BUG: memory leak
      unreferenced object 0xffff8880281ba000 (size 4096):
        comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
        hex dump (first 32 bytes):
          61 78 30 00 00 00 00 00 00 00 00 00 00 00 00 00  ax0.............
          00 27 fa 2a 80 88 ff ff 00 00 00 00 00 00 00 00  .'.*............
        backtrace:
          [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0
          [<ffffffff8706e7e8>] alloc_netdev_mqs+0x98/0xe80
          [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
          [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
          [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
          [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
          [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
          [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
          [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      BUG: memory leak
      unreferenced object 0xffff8880141a9a00 (size 96):
        comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
        hex dump (first 32 bytes):
          e8 a2 1b 28 80 88 ff ff e8 a2 1b 28 80 88 ff ff  ...(.......(....
          98 92 9c aa b0 40 02 00 00 00 00 00 00 00 00 00  .....@..........
        backtrace:
          [<ffffffff8709f68b>] __hw_addr_create_ex+0x5b/0x310
          [<ffffffff8709fb38>] __hw_addr_add_ex+0x1f8/0x2b0
          [<ffffffff870a0c7b>] dev_addr_init+0x10b/0x1f0
          [<ffffffff8706e88b>] alloc_netdev_mqs+0x13b/0xe80
          [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
          [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
          [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
          [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
          [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
          [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
          [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      BUG: memory leak
      unreferenced object 0xffff8880219bfc00 (size 512):
        comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
        hex dump (first 32 bytes):
          00 a0 1b 28 80 88 ff ff 80 8f b1 8d ff ff ff ff  ...(............
          80 8f b1 8d ff ff ff ff 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0
          [<ffffffff8706eec7>] alloc_netdev_mqs+0x777/0xe80
          [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
          [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
          [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
          [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
          [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
          [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
          [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      BUG: memory leak
      unreferenced object 0xffff888029b2b200 (size 256):
        comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0
          [<ffffffff8706f062>] alloc_netdev_mqs+0x912/0xe80
          [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
          [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
          [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
          [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
          [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
          [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
          [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 815f62bf ("[PATCH] SMP rewrite of mkiss")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Fix an error handling path in 'be_probe()' · c19c8c0e
      Christophe JAILLET authored
      If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
      must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
      call, as already done in the remove function.
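      A sketch of the usual kernel goto-unwind pattern this fix follows, with
      hypothetical helpers standing in for pci_enable_pcie_error_reporting()
      and pci_disable_pcie_error_reporting(). Any later failure jumps to a
      label that undoes the earlier step, mirroring the remove function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool aer_enabled;   /* hypothetical device state */

static void enable_error_reporting(void)  { aer_enabled = true; }
static void disable_error_reporting(void) { aer_enabled = false; }

static int model_probe(int fail_later_step)
{
	int err;

	enable_error_reporting();

	/* Stand-in for any later setup step that can fail. */
	err = fail_later_step ? -ENOMEM : 0;
	if (err)
		goto disable_aer;

	return 0;

disable_aer:
	disable_error_reporting();   /* undo the step that succeeded */
	return err;
}
```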
      
      Fixes: d6b6d987 ("be2net: use PCIe AER capability")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 16 Jun, 2021 21 commits
    • net/mlx5: Reset mkey index on creation · 0232fc2d
      Aya Levin authored
      Reset only the index part of the mkey and keep the variant part. On
      devlink reload, the driver recreates mkeys, so the mkey index may change.
      While trying to preserve the variant part of the mkey, the driver
      mistakenly merged the new mkey index with the current value. In case of
      a devlink reload, the current value of the index part is dirty, so the
      index may be corrupted.
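      Assuming the mlx5 layout in which the low 8 bits of an mkey hold the
      variant and the upper 24 bits hold the index, the difference between
      merging and resetting is plain bit arithmetic. The helper names below
      are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 31..8 = index, bits 7..0 = variant. */
static uint32_t idx_to_mkey(uint32_t idx)   { return idx << 8; }
static uint32_t mkey_variant(uint32_t mkey) { return mkey & 0xff; }

/* Buggy: ORs the new index into whatever index bits are already there. */
static uint32_t set_index_merge(uint32_t mkey, uint32_t idx)
{
	return mkey | idx_to_mkey(idx);
}

/* Fixed: keep only the variant, then install the new index. */
static uint32_t set_index_reset(uint32_t mkey, uint32_t idx)
{
	return mkey_variant(mkey) | idx_to_mkey(idx);
}
```

      With a stale index of 0x12 left over from before the reload and a new
      index of 0x21, merging yields index 0x33 while resetting yields 0x21.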
      
      Fixes: 54c62e13 ("{IB,net}/mlx5: Setup mkey variant before mr create command invocation")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Amir Tzin <amirtz@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Don't create devices during unload flow · a5ae8fc9
      Dmytro Linkin authored
      Running the devlink reload command for a port in switchdev mode causes
      resource corruption: the driver can't release the allocated EQ and
      reclaim memory pages, because the "rdma" auxiliary device has added CQs,
      which block the EQ from deletion.
      The erroneous sequence happens during the reload-down phase, as follows:
      
      1. detach device - suspends auxiliary devices which support it and
         destroys the others. During this step "eth-rep" and "rdma-rep" are
         destroyed and "eth" is suspended.
      2. disable SRIOV - moves the device to legacy mode; as part of the
         disablement, drivers are rescanned. This step adds the "rdma"
         auxiliary device.
      3. destroy EQ table - <failure>.
      
      The driver shouldn't create any device during unload flows. To handle
      that, implement an MLX5_PRIV_FLAGS_DETACH flag: set it on device detach
      and clear it on device attach. If the flag is set, a driver rescan is a no-op.
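      A minimal model of the flag, with hypothetical names: detach sets a bit,
      attach clears it, and a rescan while the bit is set creates nothing.
      Detach is simplified here to dropping all auxiliary devices.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PRIV_FLAGS_DETACH (1u << 0)   /* hypothetical flag bit */

struct dev_model {
	unsigned int priv_flags;
	int aux_devices;   /* auxiliary devices currently present */
};

static void model_detach(struct dev_model *d)
{
	d->priv_flags |= MODEL_PRIV_FLAGS_DETACH;
	d->aux_devices = 0;   /* simplified: suspend/destroy everything */
}

static void model_attach(struct dev_model *d)
{
	d->priv_flags &= ~MODEL_PRIV_FLAGS_DETACH;
}

static void model_rescan(struct dev_model *d)
{
	if (d->priv_flags & MODEL_PRIV_FLAGS_DETACH)
		return;           /* unload in progress: create nothing */
	d->aux_devices++;     /* e.g. the "rdma" device would be added here */
}
```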
      
      Fixes: a925b5e3 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix STEv1 incorrect L3 decapsulation padding · 65fb7d10
      Alex Vesker authored
      L3 decapsulation of small inner packets, shorter than 64 bytes,
      was done incorrectly. Small packets carry extra padding in L2,
      which should not be included in the L3 length. The issue was that
      after decapL3 the extra L2 padding caused an update of the L3
      length.

      To avoid this issue, the new header is pushed to the beginning
      of the packet (offset 0), which should not cause a HW reparse
      and update of the L3 length.
      
      Fixes: c349b413 ("net/mlx5: DR, Add STEv1 modify header logic")
      Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: SF_DEV, remove SF device on invalid state · c7d6c19b
      Parav Pandit authored
      When auxiliary bus autoprobe is disabled and the SF is in the ACTIVE
      state, on SF port deletion it transitions ACTIVE->ALLOCATED->INVALID.

      By the time the VHCA event handler queries the state, it has already
      transitioned to the INVALID state.

      In this scenario, the event handler failed to delete the SF device.

      Fix it by deleting the SF device when the SF state is INVALID.
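      A hypothetical sketch of the event handler's decision, using the state
      names from the description. The point is that the handler must remove
      the device whether it observes ALLOCATED or, as in the race above,
      already INVALID; the real handler's structure may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical SF states, named after the commit message. */
enum sf_state { SF_INVALID, SF_ALLOCATED, SF_ACTIVE };

struct sf_model {
	bool dev_present;   /* auxiliary SF device exists */
};

static void on_vhca_event(struct sf_model *sf, enum sf_state state)
{
	switch (state) {
	case SF_ACTIVE:
		sf->dev_present = true;    /* add the SF device */
		break;
	case SF_ALLOCATED:
	case SF_INVALID:               /* the fix: INVALID also removes it */
		sf->dev_present = false;
		break;
	}
}
```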
      
      Fixes: 90d010b8 ("net/mlx5: SF, Add auxiliary device support")
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Allow setting GUID for host PF vport · ca36fc4d
      Parav Pandit authored
      The E-Switch should be able to set the GUID of the host PF vport.
      Currently it returns an error, which results in the error below
      when a user attempts to configure the MAC address of the PF of an
      external controller.
      
      $ devlink port function set pci/0000:03:00.0/196608 \
         hw_addr 00:00:00:11:22:33
      
      mlx5_core 0000:03:00.0: mlx5_esw_set_vport_mac_locked:1876:(pid 6715):\
      "Failed to set vport 0 node guid, err = -22.
      RDMA_CM will not function properly for this VF."
      
      The check for vport zero is no longer needed.
      
      Fixes: 330077d1 ("net/mlx5: E-switch, Supporting setting devlink port function mac address")
      Signed-off-by: Yuval Avnery <yuvalav@nvidia.com>
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Bodong Wang <bodong@nvidia.com>
      Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Read PF mac address · bbc8222d
      Parav Pandit authored
      The external controller PF's MAC address is not read from the device
      during vport setup. Failing to read it results in all zeros being shown
      to the user, while the factory-programmed MAC is a valid value.
      
      $ devlink port show eth1 -jp
      {
          "port": {
              "pci/0000:03:00.0/196608": {
                  "type": "eth",
                  "netdev": "eth1",
                  "flavour": "pcipf",
                  "controller": 1,
                  "pfnum": 0,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:00:00"
                  }
              }
          }
      }
      
      Hence, read it when enabling a vport.
      
      After the fix,
      
      $ devlink port show eth1 -jp
      {
          "port": {
              "pci/0000:03:00.0/196608": {
                  "type": "eth",
                  "netdev": "eth1",
                  "flavour": "pcipf",
                  "controller": 1,
                  "pfnum": 0,
                  "splittable": false,
                  "function": {
                      "hw_addr": "98:03:9b:a0:60:11"
                  }
              }
          }
      }
      
      Fixes: f099fde1 ("net/mlx5: E-switch, Support querying port function mac address")
      Signed-off-by: Bodong Wang <bodong@nvidia.com>
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Check that driver was probed prior attaching the device · 2058cc9c
      Leon Romanovsky authored
      The device can be requested to be attached despite not being probed.
      This situation is possible if devlink reload races with module removal,
      and the following kernel panic is the outcome of such a race.
      
       mlx5_core 0000:00:09.0: firmware version: 4.7.9999
       mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
       BUG: unable to handle page fault for address: fffffffffffffff0
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 3218067 P4D 3218067 PUD 321a067 PMD 0
       Oops: 0000 [#1] SMP KASAN NOPTI
       CPU: 7 PID: 250 Comm: devlink Not tainted 5.12.0-rc2+ #2836
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:mlx5_attach_device+0x80/0x280 [mlx5_core]
       Code: f8 48 c1 e8 03 42 80 3c 38 00 0f 85 80 01 00 00 48 8b 45 68 48 8d 78 f0 48 89 fe 48 c1 ee 03 42 80 3c 3e 00 0f 85 70 01 00 00 <48> 8b 40 f0 48 85 c0 74 0d 48 89 ef ff d0 85 c0 0f 85 84 05 0e 00
       RSP: 0018:ffff8880129675f0 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff827407f1
       RDX: 1ffff110011336cf RSI: 1ffffffffffffffe RDI: fffffffffffffff0
       RBP: ffff888008e0c000 R08: 0000000000000008 R09: ffffffffa0662ee7
       R10: fffffbfff40cc5dc R11: 0000000000000000 R12: ffff88800ea002e0
       R13: ffffed1001d459f7 R14: ffffffffa05ef4f8 R15: dffffc0000000000
       FS:  00007f51dfeaf740(0000) GS:ffff88806d5c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: fffffffffffffff0 CR3: 000000000bc82006 CR4: 0000000000370ea0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        mlx5_load_one+0x117/0x1d0 [mlx5_core]
        devlink_reload+0x2d5/0x520
        ? devlink_remote_reload_actions_performed+0x30/0x30
        ? mutex_trylock+0x24b/0x2d0
        ? devlink_nl_cmd_reload+0x62b/0x1070
        devlink_nl_cmd_reload+0x66d/0x1070
        ? devlink_reload+0x520/0x520
        ? devlink_nl_pre_doit+0x64/0x4d0
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        ? mutex_lock_io_nested+0x1130/0x1130
        ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
        ? security_capable+0x51/0x90
        genl_rcv_msg+0x27f/0x4a0
        ? genl_get_cmd+0x3c0/0x3c0
        ? lock_acquire+0x1a9/0x6d0
        ? devlink_reload+0x520/0x520
        ? lock_release+0x6c0/0x6c0
        netlink_rcv_skb+0x11d/0x340
        ? genl_get_cmd+0x3c0/0x3c0
        ? netlink_ack+0x9f0/0x9f0
        ? lock_release+0x1f9/0x6c0
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        ? netlink_attachskb+0x730/0x730
        ? _copy_from_iter_full+0x178/0x650
        ? __alloc_skb+0x113/0x2b0
        netlink_sendmsg+0x6f1/0xbd0
        ? netlink_unicast+0x700/0x700
        ? netlink_unicast+0x700/0x700
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        ? __x64_sys_getpeername+0xb0/0xb0
        ? copy_page_range+0x2300/0x2300
        ? __up_read+0x1a1/0x7b0
        ? do_user_addr_fault+0x219/0xdc0
        __x64_sys_sendto+0xdd/0x1b0
        ? syscall_enter_from_user_mode+0x1d/0x50
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f51dffb514a
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
       RSP: 002b:00007ffcaef22e78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f51dffb514a
       RDX: 0000000000000030 RSI: 000055750daf2440 RDI: 0000000000000003
       RBP: 000055750daf2410 R08: 00007f51e0081200 R09: 000000000000000c
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       Modules linked in: mlx5_core(-) ptp pps_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [last unloaded: mlx5_ib]
       CR2: fffffffffffffff0
       ---[ end trace 7789831bfe74fa42 ]---
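      A hypothetical sketch of the guard described above: attaching skips any
      auxiliary device whose driver was never probed (or has already been
      removed) instead of dereferencing it. The trace's fault address,
      fffffffffffffff0, appears consistent with a read through a NULL base
      pointer, but the exact check in the patch may differ from this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an auxiliary device and its bound driver. */
struct aux_driver_model {
	int (*resume)(void);
};

struct aux_dev_model {
	const struct aux_driver_model *driver;   /* NULL if not probed/bound */
};

static int resumed;   /* counts successful resume calls */

static int count_resume(void)
{
	resumed++;
	return 0;
}

static int attach_one(struct aux_dev_model *adev)
{
	/* Skip devices without a bound driver instead of dereferencing NULL. */
	if (!adev->driver)
		return 0;
	return adev->driver->resume ? adev->driver->resume() : 0;
}
```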
      
      Fixes: a925b5e3 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix error path for set HCA defaults · 94a4b841
      Leon Romanovsky authored
      If mlx5_core_set_hca_defaults() failed, we used the wrong goto
      label for the error unwind flow.
      
      Fixes: 5bef709d ("net/mlx5: Enable host PF HCA after eswitch is initialized")
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Kees Cook's avatar
      r8169: Avoid memcpy() over-reading of ETH_SS_STATS · da5ac772
      Kees Cook authored
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally reading across neighboring array fields.
      
      The memcpy() is copying the entire structure, not just the first array.
      Adjust the source argument so the compiler can do appropriate bounds
      checking.
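      The ETH_SS_STATS names in these drivers live in a two-dimensional `char [][ETH_GSTRING_LEN]` table. A minimal sketch, assuming a hypothetical `demo_gstrings` table and a row length of 32 (the value of ETH_GSTRING_LEN in the ethtool UAPI); it illustrates the idea and does not claim to reproduce the exact driver diff. Passing `*table` hands memcpy() a pointer to the first row only, so copying `sizeof(table)` bytes from it reads across the neighboring rows; passing `table` itself names the whole object being copied.

```c
#include <string.h>

/* Assumed to match ETH_GSTRING_LEN from <linux/ethtool.h>. */
#define DEMO_GSTRING_LEN 32

/* Hypothetical stats-name table in the shape drivers use. */
static const char demo_gstrings[][DEMO_GSTRING_LEN] = {
	"tx_packets",
	"rx_packets",
	"tx_errors",
};

#define DEMO_NUM_STATS (sizeof(demo_gstrings) / sizeof(demo_gstrings[0]))

/* Copies every stat name into data, which must hold
 * DEMO_NUM_STATS * DEMO_GSTRING_LEN bytes. */
static void demo_get_strings(char *data)
{
	/*
	 * Over-reading form: memcpy(data, *demo_gstrings, sizeof(demo_gstrings))
	 * passes a pointer to the first 32-byte row, so a FORTIFY_SOURCE build
	 * with field bounds checking sees a read larger than that row.
	 * Passing the whole array makes the source size match the copy size.
	 */
	memcpy(data, demo_gstrings, sizeof(demo_gstrings));
}
```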
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Kees Cook's avatar
      sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS · 224004fb
      Kees Cook authored
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally reading across neighboring array fields.
      
      The memcpy() is copying the entire structure, not just the first array.
      Adjust the source argument so the compiler can do appropriate bounds
      checking.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Kees Cook's avatar
      r8152: Avoid memcpy() over-reading of ETH_SS_STATS · 99718abd
      Kees Cook authored
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally reading across neighboring array fields.
      
      The memcpy() is copying the entire structure, not just the first array.
      Adjust the source argument so the compiler can do appropriate bounds
      checking.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Andrea Righi's avatar
      selftests: net: use bash to run udpgro_fwd test case · 1b29df0e
      Andrea Righi authored
      udpgro_fwd.sh contains many bash-specific operators ("[[", "local -r"),
      but it runs under /bin/sh; in some distros /bin/sh is mapped to
      /bin/dash, which doesn't support such operators.
      
      Force the test to use /bin/bash explicitly and prevent false positive
      test failures.
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock · a494bd64
      Eric Dumazet authored
      Although unix_may_send(sk, osk) is called while osk is locked, it
      appears unix_release_sock() can overwrite unix_peer() after this lock
      has been released, making KCSAN unhappy.

      Changing unix_release_sock() to access/change unix_peer() before the
      lock is released should fix this issue.
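      A minimal userspace analog of the locking pattern, assuming a pthread mutex in place of unix_state_lock() and a hypothetical `struct demo_sock` in place of the real `struct unix_sock`. The releasing side clears its own peer pointer while still holding its own lock, which is the same lock a sender holds when it reads that pointer, so the write can no longer race with the read.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a datagram socket; not struct unix_sock. */
struct demo_sock {
	pthread_mutex_t lock;	/* plays the role of unix_state_lock() */
	struct demo_sock *peer;	/* plays the role of unix_peer() */
};

/* Sender side: may sk send to other? Reads other->peer under other->lock. */
static bool demo_may_send(struct demo_sock *sk, struct demo_sock *other)
{
	bool ok;

	pthread_mutex_lock(&other->lock);
	ok = other->peer == NULL || other->peer == sk;
	pthread_mutex_unlock(&other->lock);
	return ok;
}

/* Release side: detach the peer while still holding the lock, so the
 * write cannot race with the locked read in demo_may_send(). */
static struct demo_sock *demo_release(struct demo_sock *sk)
{
	struct demo_sock *old_peer;

	pthread_mutex_lock(&sk->lock);
	old_peer = sk->peer;
	sk->peer = NULL;	/* cleared inside the critical section */
	pthread_mutex_unlock(&sk->lock);

	return old_peer;	/* caller drops its reference afterwards */
}
```

      On older glibc versions this needs to be built with `-pthread`.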
      
      BUG: KCSAN: data-race in unix_dgram_sendmsg / unix_release_sock
      
      write to 0xffff88810465a338 of 8 bytes by task 20852 on cpu 1:
       unix_release_sock+0x4ed/0x6e0 net/unix/af_unix.c:558
       unix_release+0x2f/0x50 net/unix/af_unix.c:859
       __sock_release net/socket.c:599 [inline]
       sock_close+0x6c/0x150 net/socket.c:1258
       __fput+0x25b/0x4e0 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0xae/0x130 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:209
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:302
       do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88810465a338 of 8 bytes by task 20888 on cpu 0:
       unix_may_send net/unix/af_unix.c:189 [inline]
       unix_dgram_sendmsg+0x923/0x1610 net/unix/af_unix.c:1712
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
       __do_sys_sendmmsg net/socket.c:2519 [inline]
       __se_sys_sendmmsg net/socket.c:2516 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xffff888167905400 -> 0x0000000000000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 20888 Comm: syz-executor.0 Not tainted 5.13.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Andrea Righi's avatar
      selftests: net: veth: make test compatible with dash · 0fd158b8
      Andrea Righi authored
      veth.sh is a shell script that uses /bin/sh; some distros (Ubuntu, for
      example) use dash as /bin/sh, and in this case the test reports the
      following error:
      
       # ./veth.sh: 21: local: -r: bad variable name
       # ./veth.sh: 21: local: -r: bad variable name
      
      This happens because dash doesn't support the option "-r" with local.
      
      Moreover, if the bpf object is missing, the script exits with -1,
      which is an illegal number for dash:
      
       exit: Illegal number: -1
      
      Change the script to be compatible both with bash and dash and prevent
      the errors above.
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'net-packet-data-races' · 1d2ac203
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net/packet: annotate data races
      
      KCSAN sent two reports about data races in af_packet.
      Nothing serious, but worth fixing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      net/packet: annotate accesses to po->ifindex · e032f7c9
      Eric Dumazet authored
      Like the prior patch, we need to annotate lockless accesses to po->ifindex.
      For instance, packet_getname() is reading po->ifindex (twice) while
      another thread is able to change po->ifindex.
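      A minimal sketch of the annotation, assuming simplified userspace stand-ins `DEMO_READ_ONCE()`/`DEMO_WRITE_ONCE()` (a volatile access that forces exactly one load or store; the kernel's READ_ONCE()/WRITE_ONCE() add further checks) and a hypothetical `struct demo_packet_sock`. Loading the field once into a local means every later use in the reader sees the same value, even if a concurrent bind changes it.

```c
/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(); requires GNU C
 * (__typeof__). Not the kernel's definitions. */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical subset of struct packet_sock. */
struct demo_packet_sock {
	int ifindex;
};

/* Hypothetical getname-style reader: reports the bound ifindex and
 * returns nonzero if the socket is bound. The field is loaded once, so
 * both uses below agree even if a bind runs concurrently. */
static int demo_getname(struct demo_packet_sock *po, int *out_ifindex)
{
	int ifindex = DEMO_READ_ONCE(po->ifindex);

	*out_ifindex = ifindex;
	return ifindex != 0;
}

/* Writer side, as a bind path would update the field. */
static void demo_bind(struct demo_packet_sock *po, int ifindex)
{
	DEMO_WRITE_ONCE(po->ifindex, ifindex);
}
```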
      
      KCSAN reported:
      
      BUG: KCSAN: data-race in packet_do_bind / packet_getname
      
      write to 0xffff888143ce3cbc of 4 bytes by task 25573 on cpu 1:
       packet_do_bind+0x420/0x7e0 net/packet/af_packet.c:3191
       packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
       __sys_bind+0x200/0x290 net/socket.c:1637
       __do_sys_bind net/socket.c:1648 [inline]
       __se_sys_bind net/socket.c:1646 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1646
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888143ce3cbc of 4 bytes by task 25578 on cpu 0:
       packet_getname+0x5b/0x1a0 net/packet/af_packet.c:3525
       __sys_getsockname+0x10e/0x1a0 net/socket.c:1887
       __do_sys_getsockname net/socket.c:1902 [inline]
       __se_sys_getsockname net/socket.c:1899 [inline]
       __x64_sys_getsockname+0x3e/0x50 net/socket.c:1899
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0x00000001
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25578 Comm: syz-executor.5 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      net/packet: annotate accesses to po->bind · c7d2ef5d
      Eric Dumazet authored
      tpacket_snd(), packet_snd(), packet_getname() and packet_seq_show()
      can read po->num without holding a lock. This means other threads
      can change po->num at the same time.
      
      KCSAN complained about this known fact [1].
      Add READ_ONCE()/WRITE_ONCE() to address the issue.
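      A sketch of the send-side use, assuming the same kind of simplified volatile stand-ins for READ_ONCE()/WRITE_ONCE() and a hypothetical `struct demo_psock` with a 16-bit `num` field. It models, roughly, a sender that falls back to the bound protocol when the caller supplies no address: the bind path stores `num` with a single annotated write and the send path takes a single annotated snapshot.

```c
#include <stdint.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(); GNU C __typeof__. */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical subset of struct packet_sock. */
struct demo_psock {
	uint16_t num;	/* bound protocol (network byte order in the kernel) */
};

/* Bind path: one annotated store of the new protocol. */
static void demo_do_bind(struct demo_psock *po, uint16_t proto)
{
	DEMO_WRITE_ONCE(po->num, proto);
}

/* Send path: use the caller's protocol if an address was given,
 * otherwise a single lockless snapshot of the bound protocol. */
static uint16_t demo_send_proto(struct demo_psock *po, int has_addr,
				uint16_t addr_proto)
{
	if (has_addr)
		return addr_proto;
	return DEMO_READ_ONCE(po->num);
}
```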
      
      [1] BUG: KCSAN: data-race in packet_do_bind / packet_sendmsg
      
      write to 0xffff888131a0dcc0 of 2 bytes by task 24714 on cpu 0:
       packet_do_bind+0x3ab/0x7e0 net/packet/af_packet.c:3181
       packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
       __sys_bind+0x200/0x290 net/socket.c:1637
       __do_sys_bind net/socket.c:1648 [inline]
       __se_sys_bind net/socket.c:1646 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1646
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888131a0dcc0 of 2 bytes by task 24719 on cpu 1:
       packet_snd net/packet/af_packet.c:2899 [inline]
       packet_sendmsg+0x317/0x3570 net/packet/af_packet.c:3040
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2433
       __do_sys_sendmsg net/socket.c:2442 [inline]
       __se_sys_sendmsg net/socket.c:2440 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2440
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000 -> 0x1200
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 24719 Comm: syz-executor.5 Not tainted 5.13.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-5.13-20210616' of... · e82a35ae
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-5.13-20210616' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-06-16
      
      this is a pull request of 4 patches for net/master.
      
      The first patch is by Oleksij Rempel and fixes a use-after-free found
      by syzbot in the j1939 stack.

      The next patch is by Tetsuo Handa and fixes a hung task detected by
      syzbot in the bcm, raw and isotp protocols.

      Norbert Slusarek's patch fixes an infoleak in bcm's struct
      bcm_msg_head.
      
      Pavel Skripkin's patch fixes a memory leak in the mcba_usb driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Chengyang Fan's avatar
      net: ipv4: fix memory leak in ip_mc_add1_src · d8e29730
      Chengyang Fan authored
      BUG: memory leak
      unreferenced object 0xffff888101bc4c00 (size 32):
        comm "syz-executor527", pid 360, jiffies 4294807421 (age 19.329s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
          01 00 00 00 00 00 00 00 ac 14 14 bb 00 00 02 00 ................
        backtrace:
          [<00000000f17c5244>] kmalloc include/linux/slab.h:558 [inline]
          [<00000000f17c5244>] kzalloc include/linux/slab.h:688 [inline]
          [<00000000f17c5244>] ip_mc_add1_src net/ipv4/igmp.c:1971 [inline]
          [<00000000f17c5244>] ip_mc_add_src+0x95f/0xdb0 net/ipv4/igmp.c:2095
          [<000000001cb99709>] ip_mc_source+0x84c/0xea0 net/ipv4/igmp.c:2416
          [<0000000052cf19ed>] do_ip_setsockopt net/ipv4/ip_sockglue.c:1294 [inline]
          [<0000000052cf19ed>] ip_setsockopt+0x114b/0x30c0 net/ipv4/ip_sockglue.c:1423
          [<00000000477edfbc>] raw_setsockopt+0x13d/0x170 net/ipv4/raw.c:857
          [<00000000e75ca9bb>] __sys_setsockopt+0x158/0x270 net/socket.c:2117
          [<00000000bdb993a8>] __do_sys_setsockopt net/socket.c:2128 [inline]
          [<00000000bdb993a8>] __se_sys_setsockopt net/socket.c:2125 [inline]
          [<00000000bdb993a8>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2125
          [<000000006a1ffdbd>] do_syscall_64+0x40/0x80 arch/x86/entry/common.c:47
          [<00000000b11467c4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      In commit 24803f38 ("igmp: do not remove igmp souce list info when set
      link down"), the ip_mc_clear_src() in ip_mc_destroy_dev() was removed,
      because it was also called in igmpv3_clear_delrec().
      
      Rough callgraph:
      
      inetdev_destroy
      -> ip_mc_destroy_dev
           -> igmpv3_clear_delrec
              -> ip_mc_clear_src
      -> RCU_INIT_POINTER(dev->ip_ptr, NULL)
      
      However, the ip_mc_clear_src() called in igmpv3_clear_delrec() doesn't
      release in_dev->mc_list->sources, and RCU_INIT_POINTER() assigns NULL
      to dev->ip_ptr. As a result, in_dev cannot be obtained through
      inetdev_by_index(), so in_dev->mc_list->sources cannot be released by
      ip_mc_del1_src() in sock_close(). The rough call sequence is:
      
      sock_close
      -> __sock_release
         -> inet_release
            -> ip_mc_drop_socket
               -> inetdev_by_index
               -> ip_mc_leave_src
                  -> ip_mc_del_src
                     -> ip_mc_del1_src
      
      So we still need to call ip_mc_clear_src() in ip_mc_destroy_dev() to free
      in_dev->mc_list->sources.
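      A simplified model of why the extra call matters, assuming hypothetical `demo_*` structures and a `live_sources` counter in place of the real `struct in_device`/`struct ip_mc_list` and slab allocations. Destroying the device must free each group's source list itself, because once `dev->ip_ptr` is NULL the socket-close path can no longer reach `in_dev` to do it.

```c
#include <stdlib.h>

/* Hypothetical simplified structures; not the kernel's in_device/ip_mc_list. */
struct demo_src {
	struct demo_src *next;
};

struct demo_mc {
	struct demo_mc *next;
	struct demo_src *sources;
};

struct demo_in_dev {
	struct demo_mc *mc_list;
};

struct demo_dev {
	struct demo_in_dev *ip_ptr;
};

static int live_sources;	/* number of currently allocated demo_src */

static void demo_add_src(struct demo_mc *mc)
{
	struct demo_src *s = calloc(1, sizeof(*s));

	if (!s)
		return;
	s->next = mc->sources;
	mc->sources = s;
	live_sources++;
}

/* Analogue of ip_mc_clear_src(): frees one group's source list. */
static void demo_clear_src(struct demo_mc *mc)
{
	while (mc->sources) {
		struct demo_src *s = mc->sources;

		mc->sources = s->next;
		free(s);
		live_sources--;
	}
}

/* Analogue of ip_mc_destroy_dev() with the fix: clear every group's
 * sources before the device pointer is detached. */
static void demo_destroy_dev(struct demo_dev *dev)
{
	struct demo_in_dev *in_dev = dev->ip_ptr;

	for (struct demo_mc *mc = in_dev->mc_list; mc; mc = mc->next)
		demo_clear_src(mc);

	dev->ip_ptr = NULL;	/* like RCU_INIT_POINTER(dev->ip_ptr, NULL) */
}
```

      Without the demo_clear_src() loop, `live_sources` would stay nonzero after demo_destroy_dev(), which is the modeled equivalent of the kmemleak report above.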
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info ...")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
      Acked-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'fec-ptp-fixes' · c0d982bf
      David S. Miller authored
      Joakim Zhang says:
      
      ====================
      net: fixes for fec ptp
      
      Small fixes for fec ptp.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Joakim Zhang's avatar
      net: fec_ptp: fix issue caused by refactor the fec_devtype · d2376564
      Joakim Zhang authored
      Commit da722186 ("net: fec: set GPR bit on suspend by DT configuration.")
      refactored fec_devtype, so the PTP driver needs to be adjusted accordingly.
      
      Fixes: da722186 ("net: fec: set GPR bit on suspend by DT configuration.")
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>