1. 28 Feb, 2022 7 commits
    • xsk: Fix race at socket teardown · 18b1ab7a
      Magnus Karlsson authored
      Fix a race in the xsk socket teardown code that can lead to a NULL pointer
      dereference splat. The current xsk unbind code in xsk_unbind_dev() starts by
      setting xs->state to XSK_UNBOUND, sets xs->dev to NULL and then waits for any
      NAPI processing to terminate using synchronize_net(). After that, the release
      code starts to tear down the socket state and free allocated memory.
      
        BUG: kernel NULL pointer dereference, address: 00000000000000c0
        PGD 8000000932469067 P4D 8000000932469067 PUD 0
        Oops: 0000 [#1] PREEMPT SMP PTI
        CPU: 25 PID: 69132 Comm: grpcpp_sync_ser Tainted: G          I       5.16.0+ #2
        Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.2.10 03/09/2015
        RIP: 0010:__xsk_sendmsg+0x2c/0x690
        [...]
        RSP: 0018:ffffa2348bd13d50 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000040 RCX: ffff8d5fc632d258
        RDX: 0000000000400000 RSI: ffffa2348bd13e10 RDI: ffff8d5fc5489800
        RBP: ffffa2348bd13db0 R08: 0000000000000000 R09: 00007ffffffff000
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5fc5489800
        R13: ffff8d5fcb0f5140 R14: ffff8d5fcb0f5140 R15: 0000000000000000
        FS:  00007f991cff9400(0000) GS:ffff8d6f1f700000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000c0 CR3: 0000000114888005 CR4: 00000000001706e0
        Call Trace:
        <TASK>
        ? aa_sk_perm+0x43/0x1b0
        xsk_sendmsg+0xf0/0x110
        sock_sendmsg+0x65/0x70
        __sys_sendto+0x113/0x190
        ? debug_smp_processor_id+0x17/0x20
        ? fpregs_assert_state_consistent+0x23/0x50
        ? exit_to_user_mode_prepare+0xa5/0x1d0
        __x64_sys_sendto+0x29/0x30
        do_syscall_64+0x3b/0xc0
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      There are two problems with the current code. First, setting xs->dev to NULL
      before waiting for all users to stop using the socket is not correct. The
      entries to the data plane functions xsk_poll(), xsk_sendmsg(), and
      xsk_recvmsg() are all guarded by a test that xs->state is XSK_BOUND; if it is
      not, they return right away. But one process might have passed this test and
      still not have reached the point where it uses xs->dev. In this interim, a
      second process executing xsk_unbind_dev() might set xs->dev to NULL, which
      will lead to a crash in the first process. The solution here is simply to
      remove this NULL assignment, since it is no longer needed. Before
      commit 42fddcc7 ("xsk: use state member for socket synchronization"),
      xs->dev was the gatekeeper that admitted processes into the data plane
      functions, but that commit replaced it with the state variable xs->state.
      
      The second problem is that synchronize_net() does not wait for processes in
      xsk_poll(), xsk_sendmsg(), or xsk_recvmsg() to complete, because only the
      ndo_xsk_wakeup call was inside an RCU critical section. This means that the
      state these functions rely on might be cleaned up prematurely. This can
      happen when the notifier is called (at driver unload, for example), as it
      uses xsk_unbind_dev(). Solve this by extending the RCU critical region from
      just the ndo_xsk_wakeup call to the whole of the functions mentioned above,
      so that both the test of xs->state == XSK_BOUND and the last use of any
      member of xs are covered by the RCU critical section. This guarantees that
      when synchronize_net() completes, no process is left executing xsk_poll(),
      xsk_sendmsg(), or xsk_recvmsg(), and the state can be cleaned up safely.
      Note that we need to drop the RCU lock for the skb xmit path, as it uses
      functions that might sleep. Because of this, we have to retest xs->state
      after grabbing the mutex that protects the skb xmit code from, among other
      things, an xsk_unbind_dev() executed concurrently from the notifier.
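
      A minimal userspace sketch of the two fixes, under simplifying assumptions: a
      reader counter stands in for the RCU read-side section (the kernel uses
      rcu_read_lock()/rcu_read_unlock() and synchronize_net()), and an atomic_flag
      spinlock stands in for the mutex guarding the skb xmit path. All fake_* names
      are hypothetical; this is a model of the idea, not the kernel code. The state
      test and the last use of xs->dev share one read section, xs->dev is never
      cleared, the "sleeping" slow path leaves the section and retests the state
      under the lock, and unbind waits for in-flight readers before anything would
      be torn down.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

enum fake_state { FAKE_BOUND, FAKE_UNBOUND };

struct fake_dev { int ifindex; };

struct fake_xsk {
	atomic_int state;
	struct fake_dev *dev;   /* unbind no longer sets this to NULL */
	atomic_int readers;     /* hypothetical stand-in for RCU read-side sections */
	atomic_flag lock;       /* hypothetical stand-in for the xmit mutex */
};

static void fake_read_lock(struct fake_xsk *xs)   { atomic_fetch_add(&xs->readers, 1); }
static void fake_read_unlock(struct fake_xsk *xs) { atomic_fetch_sub(&xs->readers, 1); }

static void fake_mutex_lock(struct fake_xsk *xs)
{
	while (atomic_flag_test_and_set(&xs->lock))
		; /* spin until the flag was previously clear */
}

static void fake_mutex_unlock(struct fake_xsk *xs) { atomic_flag_clear(&xs->lock); }

/* Slow path that "may sleep": runs outside the read section, so retest state under the lock. */
static int fake_generic_xmit(struct fake_xsk *xs)
{
	int ret;

	fake_mutex_lock(xs);
	if (atomic_load(&xs->state) != FAKE_BOUND)
		ret = -ENXIO;
	else
		ret = xs->dev->ifindex;   /* returns the ifindex just to show dev is usable */
	fake_mutex_unlock(xs);
	return ret;
}

/* Data-plane entry: the state test and every use of xs->dev stay inside one read section. */
static int fake_sendmsg(struct fake_xsk *xs, int need_slow_path)
{
	int ret;

	fake_read_lock(xs);
	if (atomic_load(&xs->state) != FAKE_BOUND) {
		fake_read_unlock(xs);
		return -ENXIO;
	}
	if (need_slow_path) {
		fake_read_unlock(xs);     /* must not "sleep" inside the read section */
		return fake_generic_xmit(xs);
	}
	ret = xs->dev->ifindex;       /* last use of xs before leaving the section */
	fake_read_unlock(xs);
	return ret;
}

/* Unbind: flip the state under the lock, then wait for in-flight readers (cf. synchronize_net()). */
static void fake_unbind(struct fake_xsk *xs)
{
	fake_mutex_lock(xs);
	atomic_store(&xs->state, FAKE_UNBOUND);
	fake_mutex_unlock(xs);
	while (atomic_load(&xs->readers) != 0)
		; /* spin until no reader can still be acting on FAKE_BOUND */
	/* Only now would it be safe to free what the data plane uses. */
}
```

      With sequentially consistent atomics, a reader either observes FAKE_UNBOUND
      or has already been counted when fake_unbind() checks the counter, so the
      wait cannot miss it.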
      
      Fixes: 42fddcc7 ("xsk: use state member for socket synchronization")
      Reported-by: Elza Mathew <elza.mathew@intel.com>
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn@kernel.org>
      Link: https://lore.kernel.org/bpf/20220228094552.10134-1-magnus.karlsson@gmail.com
    • bpf: Remove Lorenz Bauer from L7 BPF maintainers · f54eeae9
      Lorenz Bauer authored
      I'm leaving my position at Cloudflare and therefore won't have the
      necessary time and insight to maintain the sockmap code. It's in
      more capable hands with Jakub anyways.
      Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220222103925.25802-2-lmb@cloudflare.com
    • net: ipa: fix a build dependency · caef14b7
      Alex Elder authored
      An IPA build problem arose in the linux-next tree the other day.
      The problem is that a recent commit adds a new dependency on some
      code, and the Kconfig file for IPA doesn't reflect that dependency.
      As a result, some configurations can fail to build (particularly
      when COMPILE_TEST is enabled).
      
      The recent patch adds calls to qmp_get(), qmp_put(), and qmp_send(),
      and those functions are built only when the QCOM_AOSS_QMP config option
      is enabled. If that symbol is not defined, stubs are defined instead, so
      we just need to ensure that QCOM_AOSS_QMP is either compatible with
      QCOM_IPA or not defined.
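
      A sketch of the stub pattern described above, in userspace C with
      hypothetical names: HAVE_FAKE_QMP stands in for CONFIG_QCOM_AOSS_QMP, and
      fake_qmp_get()/fake_qmp_put()/fake_qmp_send() stand in for qmp_get(),
      qmp_put(), and qmp_send(). When the provider is not configured, inline stubs
      let callers compile and link anyway. These stubs return NULL and -ENODEV for
      simplicity; the real header's stubs may differ (for example by returning
      error pointers). What stubs cannot fix is a built-in caller (QCOM_IPA=y)
      referencing symbols that live in a module (QCOM_AOSS_QMP=m); the common
      Kconfig idiom for ruling that out is
      "depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_QCOM_AOSS_QMP; left undefined so the stubs are used. */
/* #define HAVE_FAKE_QMP 1 */

struct fake_qmp;   /* opaque handle owned by the provider */

#ifdef HAVE_FAKE_QMP
/* Real implementations would live in the provider (possibly a module). */
struct fake_qmp *fake_qmp_get(void);
void fake_qmp_put(struct fake_qmp *qmp);
int fake_qmp_send(struct fake_qmp *qmp, const void *data, size_t len);
#else
/* Stubs: callers still build and link, and simply see "no such device". */
static inline struct fake_qmp *fake_qmp_get(void) { return NULL; }
static inline void fake_qmp_put(struct fake_qmp *qmp) { (void)qmp; }
static inline int fake_qmp_send(struct fake_qmp *qmp, const void *data, size_t len)
{
	(void)qmp;
	(void)data;
	(void)len;
	return -ENODEV;
}
#endif

/* Hypothetical caller that treats the provider as optional. */
static int fake_request_retention(void)
{
	static const char msg[] = "retain";
	struct fake_qmp *qmp = fake_qmp_get();
	int ret;

	if (!qmp)
		return 0;   /* provider absent: nothing to request */
	ret = fake_qmp_send(qmp, msg, sizeof(msg));
	fake_qmp_put(qmp);
	return ret;
}
```
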
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Fixes: 34a08176 ("net: ipa: request IPA register values be retained")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Tested-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: firestream: check the return value of ioremap() in fs_init() · d4e26aae
      Jia-Ju Bai authored
      The function ioremap() in fs_init() can fail, so its return value should
      be checked.
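
      A userspace sketch of the check, assuming a hypothetical fake_ioremap() that,
      like ioremap(), returns NULL on failure (it uses malloc() so the example
      runs; real code would unmap with iounmap()). The point is that the mapping
      is tested before any register access goes through it. The -ENOMEM return is
      illustrative and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch to simulate a failed mapping. */
static bool fake_ioremap_should_fail;

/* Hypothetical stand-in for ioremap(): returns NULL on failure. */
static void *fake_ioremap(unsigned long phys, size_t size)
{
	(void)phys;
	if (fake_ioremap_should_fail)
		return NULL;
	return malloc(size);
}

struct fake_fs_dev { void *base; };

static int fake_fs_init(struct fake_fs_dev *dev, unsigned long phys)
{
	dev->base = fake_ioremap(phys, 0x1000);
	if (!dev->base)      /* the kind of check the patch adds */
		return -ENOMEM;
	/* ... only now is it safe to touch registers through dev->base ... */
	return 0;
}
```
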
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sparx5: Add #include to remove warning · 90d40252
      Casper Andersson authored
      main.h uses NUM_TARGETS from main_regs.h but does not include it.
      The missing include currently causes no errors, because every file
      that includes main.h includes main_regs.h first. But since main.h
      depends on main_regs.h, it should always include it.
      Signed-off-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Joacim Zetterling <joacim.zetterling@westermo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Fix cleanup when register ULP fails · 4d08b7b5
      Tony Lu authored
      This patch calls smc_ib_unregister_client() when tcp_register_ulp()
      fails, to make sure the IB client registration is cleaned up.
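
      A sketch of the goto-based unwind pattern this fix follows, with hypothetical
      fake_* functions standing in for smc_ib_register_client(),
      smc_ib_unregister_client(), and tcp_register_ulp(). Each failure jumps to a
      label that undoes only the steps that already succeeded; the missing step
      was the IB unregistration after a failed ULP registration. The -EEXIST error
      is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical registration state standing in for the real subsystems. */
static bool ib_client_registered;
static bool ulp_should_fail;

static int fake_ib_register_client(void) { ib_client_registered = true; return 0; }
static void fake_ib_unregister_client(void) { ib_client_registered = false; }
static int fake_tcp_register_ulp(void) { return ulp_should_fail ? -EEXIST : 0; }

static int fake_smc_init(void)
{
	int rc;

	rc = fake_ib_register_client();
	if (rc)
		goto out;

	rc = fake_tcp_register_ulp();
	if (rc)
		goto out_ib;      /* undo the IB registration that already succeeded */

	return 0;

out_ib:
	fake_ib_unregister_client();
out:
	return rc;
}
```
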
      
      Fixes: d7cd421d ("net/smc: Introduce TCP ULP support")
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: ensure we call ipv6_mc_down() at most once · 9995b408
      j.nixdorf@avm.de authored
      There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
      either the network device is actually going down, or IPv6 was disabled
      on the interface.
      
      If either of them stays down while the other is toggled, we repeatedly
      call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
      calling the corresponding ipv6_mc_up() in between. This will cause a
      new entry in idev->mc_tomb to be allocated for each multicast group
      the interface is subscribed to, which in turn leaks one struct ifmcaddr6
      per nontrivial multicast group the interface is subscribed to.
      
      The following reproducer will leak at least $n objects:
      
      ip addr add ff2e::4242/32 dev eth0 autojoin
      sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
      for i in $(seq 1 $n); do
      	ip link set up eth0; ip link set down eth0
      done
      
      Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
      sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
      can also be used to create a nontrivial idev->mc_list, which will then
      leak objects given the right up/down sequence.
      
      Taking both sources of NETDEV_DOWN events into account, the interface's
      IPv6 state should be considered:
      
       - not ready if the network interface is not ready OR IPv6 is disabled
         for it
       - ready if the network interface is ready AND IPv6 is enabled for it
      
      The functions ipv6_mc_up() and ipv6_mc_down() should only be run when this
      state changes.
      
      Implement this by remembering when the IPv6 state is ready, and only
      run ipv6_mc_down() if it actually changed from ready to not ready.
      
      The other direction (not ready -> ready) already works correctly, because:
      
       - the code path triggered by NETDEV_UP / NETDEV_CHANGE interface
         notifications returns early if IPv6 is disabled,
       - the code path triggered by setting disable_ipv6=0 skips fully
         initializing the interface as long as addrconf_link_ready(dev)
         returns false, and
       - calling ipv6_mc_up() repeatedly does not leak anything.
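
      A sketch of the "remember readiness, act only on transitions" idea, under
      simplifying assumptions: readiness is modeled as link up AND IPv6 enabled, a
      remembered flag stores the last value, and counters stand in for calls to
      ipv6_mc_down()/ipv6_mc_up(). The struct and function names are hypothetical,
      and the actual patch may track readiness differently; unlike the patch, which
      only changes the down direction, the sketch gates both directions for
      symmetry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interface; not the kernel's struct inet6_dev. */
struct fake_idev {
	bool link_up;         /* network device is up/ready */
	bool ipv6_disabled;   /* net.ipv6.conf.<dev>.disable_ipv6 */
	bool ready;           /* remembered "IPv6 state ready" flag */
	int mc_down_calls;    /* stand-in for ipv6_mc_down() invocations */
	int mc_up_calls;      /* stand-in for ipv6_mc_up() invocations */
};

/* Ready only if the link is ready AND IPv6 is enabled. */
static bool fake_compute_ready(const struct fake_idev *d)
{
	return d->link_up && !d->ipv6_disabled;
}

/* Call after either input changes; acts only when the combined state changes. */
static void fake_update_state(struct fake_idev *d)
{
	bool now = fake_compute_ready(d);

	if (d->ready && !now)
		d->mc_down_calls++;   /* ready -> not ready: tear down exactly once */
	else if (!d->ready && now)
		d->mc_up_calls++;     /* not ready -> ready */
	d->ready = now;
}
```

      Replaying the shell reproducer against this model, toggling the link while
      IPv6 stays disabled never makes the interface ready, so the teardown counter
      stays at one.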
      
      Fixes: 3ce62a84 ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
      Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Feb, 2022 1 commit
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 519ca6fa
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-02-25
      
      This series contains updates to the iavf driver only.
      
      Slawomir fixes stability issues that can be seen when stressing the
      driver using a large number of VFs with a multitude of operations.
      Among the fixes are reworking mutexes to provide more effective locking,
      ensuring initialization is complete before teardown, preventing
      operations which could race while removing the driver, stopping certain
      tasks from being queued when the device is down, and adding a missing
      mutex unlock.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Feb, 2022 32 commits