1. 24 Jun, 2023 14 commits
    • Kuniyuki Iwashima's avatar
      af_unix: Call scm_recv() only after scm_set_cred(). · 3f5f118b
      Kuniyuki Iwashima authored
      syzkaller hit a WARN_ON_ONCE(!scm->pid) in scm_pidfd_recv().
      
      In unix_stream_read_generic(), if there is no skb in the queue, we could
      bail out of the do-while loop without calling scm_set_cred():
      
        1. No skb in the queue
        2. sk is non-blocking
             or
           shutdown(sk, RCV_SHUTDOWN) is called concurrently
             or
           peer calls close()
      
      If the socket is configured with SO_PASSCRED or SO_PASSPIDFD, scm_recv()
      would populate cmsg with garbage.
      
      Let's not call scm_recv() unless there is skb to receive.
      
      WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_pidfd_recv include/net/scm.h:138 [inline]
      WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177
      Modules linked in:
      CPU: 1 PID: 3245 Comm: syz-executor.1 Not tainted 6.4.0-rc5-01219-gfa0e21fa #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:scm_pidfd_recv include/net/scm.h:138 [inline]
      RIP: 0010:scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177
      Code: 67 fd e9 55 fd ff ff e8 4a 70 67 fd e9 7f fd ff ff e8 40 70 67 fd e9 3e fb ff ff e8 36 70 67 fd e9 02 fd ff ff e8 8c 3a 20 fd <0f> 0b e9 fe fb ff ff e8 50 70 67 fd e9 2e f9 ff ff e8 46 70 67 fd
      RSP: 0018:ffffc90009af7660 EFLAGS: 00010216
      RAX: 00000000000000a1 RBX: ffff888041e58a80 RCX: ffffc90003852000
      RDX: 0000000000040000 RSI: ffffffff842675b4 RDI: 0000000000000007
      RBP: ffffc90009af7810 R08: 0000000000000007 R09: 0000000000000013
      R10: 00000000000000f8 R11: 0000000000000001 R12: ffffc90009af7db0
      R13: 0000000000000000 R14: ffff888041e58a88 R15: 1ffff9200135eecc
      FS:  00007f6b7113f640(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6b7111de38 CR3: 0000000012a6e002 CR4: 0000000000770ee0
      PKRU: 55555554
      Call Trace:
       <TASK>
       unix_stream_read_generic+0x5fe/0x1f50 net/unix/af_unix.c:2830
       unix_stream_recvmsg+0x194/0x1c0 net/unix/af_unix.c:2880
       sock_recvmsg_nosec net/socket.c:1019 [inline]
       sock_recvmsg+0x188/0x1d0 net/socket.c:1040
       ____sys_recvmsg+0x210/0x610 net/socket.c:2712
       ___sys_recvmsg+0xff/0x190 net/socket.c:2754
       do_recvmmsg+0x25d/0x6c0 net/socket.c:2848
       __sys_recvmmsg net/socket.c:2927 [inline]
       __do_sys_recvmmsg net/socket.c:2950 [inline]
       __se_sys_recvmmsg net/socket.c:2943 [inline]
       __x64_sys_recvmmsg+0x224/0x290 net/socket.c:2943
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7f6b71da2e5d
      Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
      RSP: 002b:00007f6b7113ecc8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
      RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007f6b71da2e5d
      RDX: 0000000000000007 RSI: 0000000020006600 RDI: 000000000000000b
      RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000120 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000006e R14: 00007f6b71e03530 R15: 0000000000000000
       </TASK>
      
      Fixes: 5e2ff670 ("scm: add SO_PASSPIDFD and SCM_PIDFD")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
      Link: https://lore.kernel.org/r/20230622184351.91544-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3f5f118b
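The guard described in the af_unix commit above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct model_scm, struct model_queue and model_stream_read() are hypothetical names, not kernel code, and the model reduces the receive path to "set credentials when an skb is dequeued, build the control message only if that happened".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's scm cookie; not the real struct. */
struct model_scm {
	int pid;        /* 0 models "scm->pid was never set" */
	bool cred_set;  /* true once the modelled scm_set_cred() has run */
};

/* Hypothetical receive queue: one sender PID per queued skb. */
struct model_queue {
	const int *sender_pids;
	size_t len;
};

/*
 * Consume every queued entry and return how many were read.
 * *cmsg_pid receives the sender PID if a control message is built,
 * or -1 if the control-message step was skipped.
 */
static int model_stream_read(const struct model_queue *q, bool passcred,
			     int *cmsg_pid)
{
	struct model_scm scm = { .pid = 0, .cred_set = false };
	size_t i = 0;

	assert(q != NULL && cmsg_pid != NULL);
	*cmsg_pid = -1;

	do {
		if (i >= q->len)
			break;  /* nothing queued: leave the loop, scm untouched */
		if (!scm.cred_set) {
			scm.pid = q->sender_pids[i];  /* models scm_set_cred() */
			scm.cred_set = true;
		}
		i++;
	} while (i < q->len);

	/* Guard: only build the control message if credentials were set. */
	if (passcred && scm.cred_set)
		*cmsg_pid = scm.pid;

	return (int)i;
}
```

With an empty queue the model reads nothing and leaves *cmsg_pid at -1, instead of reporting the never-initialised PID 0.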
    • Jakub Kicinski's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 1c78eb87
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-06-22 (iavf)
      
      This series contains updates to iavf driver only.
      
      Przemek defers removal of the previous primary MAC address until the
      result of adding its replacement is known. He also does some cleanup by
      removing unused functions and making applicable functions static.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        iavf: make functions static where possible
        iavf: remove some unused functions and pointless wrappers
        iavf: fix err handling for MAC replace
      ====================
      
      Link: https://lore.kernel.org/r/20230622165914.2203081-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1c78eb87
    • Randy Dunlap's avatar
      revert "s390/net: lcs: use IS_ENABLED() for kconfig detection" · 6a11af7c
      Randy Dunlap authored
      The referenced patch is causing build errors when ETHERNET=y and
      FDDI=m. While we work out the preferred patch(es), revert this patch
      to make the pain go away.
      
      Fixes: 12827233 ("s390/net: lcs: use IS_ENABLED() for kconfig detection")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: lore.kernel.org/r/202306202129.pl0AqK8G-lkp@intel.com
      Cc: Alexandra Winter <wintera@linux.ibm.com>
      Cc: Wenjia Zhang <wenjia@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230622155409.27311-1-rdunlap@infradead.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6a11af7c
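The revert above does not spell out why ETHERNET=y together with FDDI=m breaks the build. One relevant detail is that the kernel's IS_ENABLED() is true for both =y and =m, while IS_BUILTIN() is true only for =y. The sketch below re-creates that behaviour in userspace. The MY_* and helper macro names are made up for this example, and the two CONFIG_* defines only model what Kconfig would emit for this configuration.

```c
#include <assert.h>

/* Model a configuration with ETHERNET=y and FDDI=m. */
#define CONFIG_ETHERNET 1
#define CONFIG_FDDI_MODULE 1

/*
 * If the option expands to 1, pasting yields ARG_PLACEHOLDER_1, which
 * expands to "0," and shifts a 1 into the second argument position.
 * Otherwise the pasted token is junk and the second argument stays 0.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)

#define MY_IS_BUILTIN(option) IS_DEFINED_1(option)
#define MY_IS_MODULE(option) IS_DEFINED_1(option##_MODULE)
#define MY_IS_ENABLED(option) (MY_IS_BUILTIN(option) || MY_IS_MODULE(option))

static int ethernet_builtin(void) { return MY_IS_BUILTIN(CONFIG_ETHERNET); }
static int fddi_builtin(void)     { return MY_IS_BUILTIN(CONFIG_FDDI); }
static int fddi_module(void)      { return MY_IS_MODULE(CONFIG_FDDI); }
static int fddi_enabled(void)     { return MY_IS_ENABLED(CONFIG_FDDI); }

/* True when FDDI code is "enabled" yet would not be linked into vmlinux. */
static int fddi_enabled_but_not_builtin(void)
{
	assert(fddi_builtin() == 0 || fddi_builtin() == 1);
	return fddi_enabled() && !fddi_builtin();
}
```

In this model, code guarded by the enabled check is compiled in even though FDDI support would live in a module. Whether that is the exact mechanism behind the reported build errors is an assumption; the commit message does not say.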
    • Giulio Benetti's avatar
      28e219ae
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · a685d0df
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2023-06-23
      
      We've added 49 non-merge commits during the last 24 day(s) which contain
      a total of 70 files changed, 1935 insertions(+), 442 deletions(-).
      
      The main changes are:
      
      1) Extend bpf_fib_lookup helper to allow passing the route table ID,
         from Louis DeLosSantos.
      
      2) Fix regsafe() in verifier to call check_ids() for scalar registers,
         from Eduard Zingerman.
      
      3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and()
         and a rework of bpf_cpumask_any*() kfuncs. Additionally,
         add selftests, from David Vernet.
      
      4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings,
         from Gilad Sever.
      
      5) Change bpf_link_put() to use workqueue unconditionally to fix it
         under PREEMPT_RT, from Sebastian Andrzej Siewior.
      
      6) Follow-ups to address issues in the bpf_refcount shared ownership
         implementation, from Dave Marchevsky.
      
      7) A few general refactorings to BPF map and program creation permissions
         checks which were part of the BPF token series, from Andrii Nakryiko.
      
      8) Various fixes for benchmark framework and add a new benchmark
         for BPF memory allocator to BPF selftests, from Hou Tao.
      
      9) Documentation improvements around iterators and trusted pointers,
         from Anton Protopopov.
      
      10) Small cleanup in verifier to improve allocated object check,
          from Daniel T. Lee.
      
      11) Improve performance of bpf_xdp_pointer() by avoiding access
          to shared_info when XDP packet does not have frags,
          from Jesper Dangaard Brouer.
      
      12) Silence a harmless syzbot-reported warning in btf_type_id_size(),
          from Yonghong Song.
      
      13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper,
          from Jarkko Sakkinen.
      
      14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS,
          from Viktor Malik.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
        bpf, docs: Document existing macros instead of deprecated
        bpf, docs: BPF Iterator Document
        selftests/bpf: Fix compilation failure for prog vrf_socket_lookup
        selftests/bpf: Add vrf_socket_lookup tests
        bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings
        bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint
        bpf: Factor out socket lookup functions for the TC hookpoint.
        selftests/bpf: Set the default value of consumer_cnt as 0
        selftests/bpf: Ensure that next_cpu() returns a valid CPU number
        selftests/bpf: Output the correct error code for pthread APIs
        selftests/bpf: Use producer_cnt to allocate local counter array
        xsk: Remove unused inline function xsk_buff_discard()
        bpf: Keep BPF_PROG_LOAD permission checks clear of validations
        bpf: Centralize permissions checks for all BPF map types
        bpf: Inline map creation logic in map_create() function
        bpf: Move unprivileged checks into map_create() and bpf_prog_load()
        bpf: Remove in_atomic() from bpf_link_put().
        selftests/bpf: Verify that check_ids() is used for scalars in regsafe()
        bpf: Verify scalar ids mapping in regsafe() using check_ids()
        selftests/bpf: Check if mark_chain_precision() follows scalar ids
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a685d0df
    • Jakub Kicinski's avatar
      Merge branch 'mlxsw-maintain-candidate-rifs' · d1d29a42
      Jakub Kicinski authored
      Petr Machata says:
      
      ====================
      mlxsw: Maintain candidate RIFs
      
      The mlxsw driver currently makes the assumption that the user applies
      configuration in a bottom-up manner. Thus netdevices need to be added to
      the bridge before IP addresses are configured on that bridge or SVI added
      on top of it. Enslaving a netdevice to another netdevice that already has
      uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
      it is rather easy to get into situations where the offloaded configuration
      is just plain wrong.
      
      As an example, take a front panel port, configure an IP address: it gets a
      RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
      port from the bridge again, but the RIF never comes back. There are a
      number of similar situations, where changing the configuration there and
      back utterly breaks the offload.
      
      The situation is going to be made better by implementing a range of replays
      and post-hoc offloads.
      
      This patch set lays the ground for replay of next hops. The particular
      issue that it deals with is that currently, driver-specific bookkeeping for
      next hops is hooked off RIF objects, which come and go across the lifetime
      of a netdevice. We would rather keep these objects at an entity that
      mirrors the lifetime of the netdevice itself. That way they are at hand and
      can be offloaded when a RIF is eventually created.
      
      To that end, with this patchset, mlxsw keeps a hash table of CRIFs:
      candidate RIFs, persistent handles for netdevices that mlxsw deems
      potentially interesting. The lifetime of a CRIF matches that of the
      underlying netdevice, and thus a RIF can always assume a CRIF exists. A
      CRIF is where next hops are kept, and when RIF is created, these next hops
      can be easily offloaded. (Previously only the next hops created after the
      RIF was created were offloaded.)
      
      - Patches #1 and #2 are minor adjustments.
      - In patches #3 and #4, add CRIF bookkeeping.
      - In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF,
        the corresponding CRIF is easy to look up.
      - Patch #6 is a clean-up allowed by the previous patches.
      - Patches #7 and #8 move next hop tracking to CRIFs.
      
      No observable effects are intended as of yet. This will be useful once
      there is support for RIF creation for netdevices that become mlxsw uppers,
      which will come in following patch sets.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d1d29a42
    • Petr Machata's avatar
      mlxsw: spectrum_router: Track next hops at CRIFs · 9464a3d6
      Petr Machata authored
      Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
      reason is that eventually, next hops for mlxsw uppers should be offloaded
      and unoffloaded on demand as a netdevice becomes an upper, or stops being
      one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
      netdevice is not an mlxsw upper. CRIFs are kept track of throughout the
      netdevice lifetime.
      
      Correspondingly, track at each next hop not its RIF, but its CRIF (from
      which a RIF can always be deduced).
      
      Note that now that next hops are tracked at a CRIF, it is not necessary to
      move each over to a new RIF when it is necessary to edit a RIF. Therefore
      drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
      call mlxsw_sp_nexthop_rif_update() directly.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9464a3d6
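The idea of hanging next hops off a CRIF rather than a RIF, as in the commit above, can be sketched roughly as follows. This is a userspace toy, not the driver's data structures: struct toy_crif, struct toy_nexthop, toy_nexthop_add() and toy_crif_set_rif() are hypothetical names, and "offloaded" is reduced to a boolean. It shows the intended end state, where next hops added before a RIF exists can be picked up once one appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_crif;

/* A next hop records its CRIF, not its RIF. */
struct toy_nexthop {
	struct toy_crif *crif;
	struct toy_nexthop *next;
	bool offloaded;
};

/* Lives as long as the netdevice; a RIF may come and go meanwhile. */
struct toy_crif {
	bool has_rif;
	struct toy_nexthop *nexthops;
};

/* Track a next hop at the CRIF; offload it only if a RIF exists now. */
static void toy_nexthop_add(struct toy_crif *crif, struct toy_nexthop *nh)
{
	assert(crif != NULL && nh != NULL);
	nh->crif = crif;
	nh->next = crif->nexthops;
	crif->nexthops = nh;
	nh->offloaded = crif->has_rif;
}

/*
 * A RIF was created (has_rif = true) or destroyed (has_rif = false).
 * Walk the CRIF's list and update every next hop in place.
 * Returns how many next hops changed state.
 */
static int toy_crif_set_rif(struct toy_crif *crif, bool has_rif)
{
	struct toy_nexthop *nh;
	int changed = 0;

	assert(crif != NULL);
	crif->has_rif = has_rif;
	for (nh = crif->nexthops; nh; nh = nh->next) {
		if (nh->offloaded != has_rif) {
			nh->offloaded = has_rif;
			changed++;
		}
	}
	return changed;
}
```

Because the list stays on the CRIF, nothing has to be migrated between RIF objects when a RIF is replaced; only the offload state is refreshed.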
    • Petr Machata's avatar
      mlxsw: spectrum_router: Split nexthop finalization to two stages · a285d664
      Petr Machata authored
      Nexthop finalization consists of two steps: the part where the offload is
      removed, because the backing RIF is now gone; and the part where the
      association to the RIF is severed.
      
      Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
      unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
      be called independently.
      
      Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
      vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
      historical happenstance than a conscious decision. The two cleanups do not
      depend on each other, and this change should have no observable effects.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a285d664
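The two-stage finalization described in the commit above might look like this in miniature. The toy_* names are hypothetical and the state is reduced to a pointer and a flag; the sketch only shows the split into an "unoffload" step that can be called on its own and a "sever the association" step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rif {
	int index;
};

struct toy_nexthop {
	struct toy_rif *rif;  /* association with the backing RIF */
	bool offloaded;       /* whether the next hop is programmed in hardware */
};

/* Stage 1: the backing RIF is gone, so drop the offload. */
static void toy_nexthop_rif_gone(struct toy_nexthop *nh)
{
	assert(nh != NULL);
	nh->offloaded = false;
}

/* Stage 2: sever the association to the RIF. */
static void toy_nexthop_rif_fini(struct toy_nexthop *nh)
{
	assert(nh != NULL);
	nh->rif = NULL;
}

/* Full finalization is simply both stages in order. */
static void toy_nexthop_fini(struct toy_nexthop *nh)
{
	toy_nexthop_rif_gone(nh);
	toy_nexthop_rif_fini(nh);
}
```

Calling only toy_nexthop_rif_gone() leaves the association in place, which is the property that makes the first stage reusable independently.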
    • Petr Machata's avatar
      mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index · bdc0b78e
      Petr Machata authored
      A previous patch added a pointer to loopback CRIF to the router data
      structure. That makes the loopback RIF index redundant, as everything
      necessary can be derived from the CRIF. Drop the field and adjust the code
      accordingly.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bdc0b78e
    • Petr Machata's avatar
      mlxsw: spectrum_router: Link CRIFs to RIFs · aa21242b
      Petr Machata authored
      When a RIF is about to be created, the registration of the netdevice that
      it should be associated with must have been seen in the past, and a CRIF
      created. Therefore make this a hard requirement by looking up the CRIF
      during RIF creation, and complaining loudly when there isn't one.
      
      This then allows keeping a link between a RIF and its corresponding
      CRIF (and back, as the relationship is one-to-at-most-one), so do that.
      
      The CRIF will later be useful as the objects tracked there will be
      offloaded lazily as a result of RIF creation.
      
      CRIFs are created when an "interesting" netdevice is registered, and
      destroyed after such device is unregistered. CRIFs are supposed to already
      exist when a RIF creation request arises, and exist at least as long as
      that RIF exists. This makes for a simple invariant: it is always safe to
      dereference CRIF pointer from "its" RIF.
      
      To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
      event is delivered. The reason is that if a RIF's netdevice has an IPv6
      address, removal of this address is notified in an atomic block. To remove
      the RIF, the IPv6 removal handler schedules a work item. It must be safe
      for this work item to access the associated CRIF as well.
      
      Thus when a netdevice that backs the CRIF is removed, if it still has a
      RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
      this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aa21242b
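The lifetime rule from the commit above (a CRIF must outlive its RIF, so freeing is deferred through a can_destroy flag) can be modelled in a few lines. Everything here is a hypothetical userspace sketch: the toy_* functions are not driver APIs, and the deferred work item is represented by simply calling toy_rif_destroy() later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_rif;

struct toy_crif {
	struct toy_rif *rif;  /* at most one RIF per CRIF */
	bool can_destroy;     /* netdevice is gone; free once the RIF goes */
};

struct toy_rif {
	struct toy_crif *crif;  /* always valid while the RIF exists */
};

static struct toy_crif *toy_crif_create(void)
{
	return calloc(1, sizeof(struct toy_crif));
}

/* RIF creation requires an existing, unlinked CRIF; otherwise it fails. */
static struct toy_rif *toy_rif_create(struct toy_crif *crif)
{
	struct toy_rif *rif;

	if (!crif || crif->rif)
		return NULL;
	rif = calloc(1, sizeof(*rif));
	if (!rif)
		return NULL;
	rif->crif = crif;
	crif->rif = rif;
	return rif;
}

/* Netdevice unregistered. Returns true if the CRIF was freed right away. */
static bool toy_crif_netdev_gone(struct toy_crif *crif)
{
	assert(crif != NULL);
	if (crif->rif) {
		crif->can_destroy = true;  /* defer: the RIF still points here */
		return false;
	}
	free(crif);
	return true;
}

/* RIF torn down later. Returns true if it also collected the CRIF. */
static bool toy_rif_destroy(struct toy_rif *rif)
{
	struct toy_crif *crif;
	bool collected = false;

	assert(rif != NULL && rif->crif != NULL);
	crif = rif->crif;
	crif->rif = NULL;
	free(rif);
	if (crif->can_destroy) {
		free(crif);
		collected = true;
	}
	return collected;
}
```

The invariant the sketch preserves is that rif->crif can be dereferenced at any point between toy_rif_create() and toy_rif_destroy(), regardless of when the netdevice disappears.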
    • Petr Machata's avatar
      mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF · 78126cfd
      Petr Machata authored
      CRIFs are generally not maintained for loopback RIFs. However, the RIF for
      the default VRF is used for offloading of blackhole nexthops. Nexthops
      expect to have a valid CRIF. Therefore in this patch, add code to maintain
      CRIF for the loopback RIF as well.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      78126cfd
    • Petr Machata's avatar
      mlxsw: spectrum_router: Maintain a hash table of CRIFs · 4796c287
      Petr Machata authored
      CRIFs are objects that mlxsw maintains for netdevices that may not have an
      associated RIF (i.e. they may not have been instantiated in the ASIC), but
      if indeed they do not, it is quite possible they will in the future. These
      netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are
      created include e.g. bridges, LAGs, or front panel ports. The idea is that
      next hops would be kept at CRIFs, not RIFs, and thus it would be easier to
      offload and unoffload the entities that have been added before the RIF was
      created.
      
      In this patch, add the code for low-level CRIF maintenance: create and
      destroy, and keep in a table keyed by the netdevice pointer for easy
      recall.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4796c287
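A table of CRIFs keyed by the netdevice pointer, as the commit above describes, could be approximated in userspace like this. The sketch uses a small fixed-size chained hash table purely for illustration; the bucket count, the toy_* names and the hash function are all assumptions, and the driver's real table implementation is not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_CRIF_BUCKETS 16

struct toy_netdev {
	const char *name;
};

struct toy_crif {
	const struct toy_netdev *dev;  /* key: the netdevice pointer itself */
	struct toy_crif *next;         /* bucket chain */
};

struct toy_crif_table {
	struct toy_crif *buckets[TOY_CRIF_BUCKETS];
};

/* Hash the pointer value; the low bits are dropped as likely alignment. */
static size_t toy_crif_hash(const struct toy_netdev *dev)
{
	return (size_t)(((uintptr_t)dev >> 4) % TOY_CRIF_BUCKETS);
}

static struct toy_crif *toy_crif_lookup(const struct toy_crif_table *t,
					const struct toy_netdev *dev)
{
	struct toy_crif *c;

	assert(t != NULL);
	for (c = t->buckets[toy_crif_hash(dev)]; c; c = c->next)
		if (c->dev == dev)
			return c;
	return NULL;
}

/* Create and insert a CRIF; returns NULL if one already exists or on OOM. */
static struct toy_crif *toy_crif_create(struct toy_crif_table *t,
					const struct toy_netdev *dev)
{
	size_t b = toy_crif_hash(dev);
	struct toy_crif *c;

	if (toy_crif_lookup(t, dev))
		return NULL;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->dev = dev;
	c->next = t->buckets[b];
	t->buckets[b] = c;
	return c;
}

/* Unlink and free the CRIF for dev, if there is one. */
static void toy_crif_destroy(struct toy_crif_table *t,
			     const struct toy_netdev *dev)
{
	struct toy_crif **pp = &t->buckets[toy_crif_hash(dev)];

	while (*pp) {
		if ((*pp)->dev == dev) {
			struct toy_crif *victim = *pp;

			*pp = victim->next;
			free(victim);
			return;
		}
		pp = &(*pp)->next;
	}
}
```

Keying on the pointer means a lookup needs nothing but the netdevice a notifier event already carries.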
    • Petr Machata's avatar
      mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF · f3c85eed
      Petr Machata authored
      The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the
      function mentioned in the subject. As such it forms an external interface
      of the router code.
      
      In future patches we will want to maintain connection between RIFs and the
      CRIFs (introduced in the next patch) that back them. That will not hold
      for the VRF-based loopback netdevices, so the whole CRIF business can be
      kept hidden from the rest of mlxsw.
      
      But for the main VRF loopback RIF we do want to keep the RIF-CRIF
      connection, because that RIF is used for blackhole next hops, and the next
      hop code can be kept simpler for assuming rif->crif is valid.
      
      Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback
      RIF. This being an internal function will take the CRIF argument anyway.
      Furthermore, the function does not lock, which is not necessary at this
      point in code yet.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f3c85eed
    • Petr Machata's avatar
  2. 23 Jun, 2023 26 commits