1. 17 Feb, 2022 40 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6b5567b1
      Jakub Kicinski authored
      No conflicts.

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8b97cae3
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - dsa: lantiq_gswip: fix use after free in gswip_remove()
      
         - smc: avoid overwriting the copies of clcsock callback functions
      
        Current release - new code bugs:
      
         - iwlwifi:
            - fix use-after-free when no FW is present
            - mei: fix the pskb_may_pull check in ipv4
            - mei: retry mapping the shared area
            - mvm: don't feed the hardware RFKILL into iwlmei
      
        Previous releases - regressions:
      
         - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
      
         - tipc: fix wrong publisher node address in link publications
      
         - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
           assertion
      
         - bgmac: make idm and nicpm resource optional again
      
         - atl1c: fix tx timeout after link flap
      
        Previous releases - always broken:
      
         - vsock: remove vsock from connected table when connect is
           interrupted by a signal
      
         - ping: change destination interface checks to match raw sockets
      
         - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
           semantics (and null-deref) after SO_RESERVE_MEM was added
      
         - ipv6: make exclusive flowlabel checks per-netns
      
         - bonding: force carrier update when releasing slave
      
         - sched: limit TC_ACT_REPEAT loops
      
         - bridge: multicast: notify switchdev driver whenever MC processing
           gets disabled because of max entries reached
      
         - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
      
         - iwlwifi: fix locking when "HW not ready"
      
         - phy: mediatek: remove PHY mode check on MT7531
      
         - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
      
         - dsa: lan9303:
            - fix polarity of reset during probe
            - fix accelerated VLAN handling"
      
      * tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        bonding: force carrier update when releasing slave
        nfp: flower: netdev offload check for ip6gretap
        ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
        ipv4: fix data races in fib_alias_hw_flags_set
        net: dsa: lan9303: add VLAN IDs to master device
        net: dsa: lan9303: handle hwaccel VLAN tags
        vsock: remove vsock from connected table when connect is interrupted by a signal
        Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
        ping: fix the dif and sdif check in ping_lookup
        net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
        net: sched: limit TC_ACT_REPEAT loops
        tipc: fix wrong notification node addresses
        net: dsa: lantiq_gswip: fix use after free in gswip_remove()
        ipv6: per-netns exclusive flowlabel checks
        net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
        CDC-NCM: avoid overflow in sanity checking
        mctp: fix use after free
        net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
        bonding: fix data-races around agg_select_timer
        dpaa2-eth: Initialize mutex used in one step timestamping path
        ...
    • bonding: force carrier update when releasing slave · a6ab75ce
      Zhang Changzhong authored
      In __bond_release_one(), bond_set_carrier() is only called when the bond
      device has no slaves. Therefore, if we remove the up slave from a master
      with two slaves and keep the down slave, the master will remain up.

      Fix this by moving bond_set_carrier() out of the
      if (!bond_has_slaves(bond)) statement.
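
The effect of moving the carrier update can be sketched with a small userspace model. This is only an illustration under stated assumptions: the `toy_*` names and the two-slot bond are hypothetical and are not the bonding driver's real structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bonding driver's structures. */
struct toy_slave {
    bool link_up;
    bool enslaved;
};

struct toy_bond {
    struct toy_slave slaves[2];
    bool carrier_on;
};

static bool toy_bond_has_slaves(const struct toy_bond *bond)
{
    for (size_t i = 0; i < 2; i++)
        if (bond->slaves[i].enslaved)
            return true;
    return false;
}

/* Carrier is on only if at least one enslaved device has link. */
static void toy_bond_set_carrier(struct toy_bond *bond)
{
    bool on = false;

    for (size_t i = 0; i < 2; i++)
        if (bond->slaves[i].enslaved && bond->slaves[i].link_up)
            on = true;
    bond->carrier_on = on;
}

/* Release one slave; 'fixed' selects the old or the new placement
 * of the carrier update. */
static void toy_bond_release(struct toy_bond *bond, size_t idx, bool fixed)
{
    assert(idx < 2);
    bond->slaves[idx].enslaved = false;

    if (fixed) {
        /* New behaviour: always recompute the carrier state. */
        toy_bond_set_carrier(bond);
    } else if (!toy_bond_has_slaves(bond)) {
        /* Old behaviour: recompute only when the bond is empty. */
        toy_bond_set_carrier(bond);
    }
}
```

With one down slave and one up slave, releasing the up slave leaves `carrier_on` stale in the old placement and clears it in the new one, which mirrors the reproducer in this commit.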
      
      Reproducer:
      $ insmod bonding.ko mode=0 miimon=100 max_bonds=2
      $ ifconfig bond0 up
      $ ifenslave bond0 eth0 eth1
      $ ifconfig eth0 down
      $ ifenslave -d bond0 eth1
      $ cat /proc/net/bonding/bond0
      
      Fixes: ff59c456 ("[PATCH] bonding: support carrier state for master")
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • fs/file_table: fix adding missing kmemleak_not_leak() · a3580ac9
      Luis Chamberlain authored
      Commit b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl
      to its own file") fixed a regression, however it failed to add a
      kmemleak_not_leak().
      
      Fixes: b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
      Reported-by: Tong Zhang <ztong0001@gmail.com>
      Cc: Tong Zhang <ztong0001@gmail.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of... · 2dd3a8a1
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix corrupt inject files when only last branch option is enabled with
         ARM CoreSight ETM
      
       - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12
      
       - Defer freeing string after possible strlen() on it in the BPF loader,
         found by gcc 12
      
       - Avoid early exit in 'perf trace' due to SIGCHLD from non-workload
         processes
      
       - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
         initialization
      
       - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf
      
       - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
         the CPU iterator
      
       - Sync linux/perf_event.h UAPI with the kernel sources
      
       - Update Jiri Olsa's email address in MAINTAINERS
      
      * tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf bpf: Defer freeing string after possible strlen() on it
        perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
        libsubcmd: Fix use-after-free for realloc(..., 0)
        libperf: Fix perf_cpu_map__for_each_cpu macro
        perf cs-etm: Fix corrupt inject files when only last branch option is enabled
        perf cs-etm: No-op refactor of synth opt usage
        libperf: Fix 32-bit build for tests uint64_t printf
        tools headers UAPI: Sync linux/perf_event.h with the kernel sources
        perf trace: Avoid early exit due SIGCHLD from non-workload processes
        MAINTAINERS: Update Jiri's email address
    • Merge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · edbd6c62
      Linus Torvalds authored
      Pull module fix from Luis Chamberlain:
       "Fixes module decompression when CONFIG_SYSFS=n
      
        The only fix trickled down for v5.17-rc cycle so far is the fix for
        module decompression when CONFIG_SYSFS=n. This was reported through
        0-day"
      
      * tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        module: fix building with sysfs disabled
    • nfp: flower: netdev offload check for ip6gretap · 7dbcda58
      Danie du Toit authored
      IPv6 GRE tunnels are not being offloaded because of a missing
      netdev offload check. IPv6 GRE tunnel offloading was added
      previously, but this check was not included. Adding the
      ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
      
      Fixes: f7536ffb ("nfp: flower: Allow ipv6gretap interface for offloading")
      Signed-off-by: Danie du Toit <danie.dutoit@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt · d95d6320
      Eric Dumazet authored
      Because fib6_info_hw_flags_set() is called without any synchronization,
      all accesses to f6i->offload, f6i->trap and f6i->offload_failed
      need some basic protection like READ_ONCE()/WRITE_ONCE().
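
As a rough userspace illustration of what such marked accesses look like, the sketch below uses volatile casts. The `TOY_*` macros and `toy_fib6_info` are assumptions made for this example. They are much simpler than the kernel's READ_ONCE()/WRITE_ONCE(), and they rely on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: force exactly one load or store through a
 * volatile-qualified lvalue. */
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical route entry carrying only the three hardware flags. */
struct toy_fib6_info {
    uint8_t offload;
    uint8_t trap;
    uint8_t offload_failed;
};

/* Writer side: each flag is published with a single marked store. */
static void toy_hw_flags_set(struct toy_fib6_info *f6i, uint8_t offload,
                             uint8_t trap, uint8_t offload_failed)
{
    assert(f6i != NULL);
    TOY_WRITE_ONCE(f6i->offload, offload);
    TOY_WRITE_ONCE(f6i->trap, trap);
    TOY_WRITE_ONCE(f6i->offload_failed, offload_failed);
}

/* Reader side: a lockless reader uses a single marked load. */
static int toy_is_offloaded(const struct toy_fib6_info *f6i)
{
    assert(f6i != NULL);
    return TOY_READ_ONCE(f6i->offload) != 0;
}
```

The marked accesses do not order anything and do not make the three flags update atomically as a group. They only keep the compiler from tearing, fusing, or repeating the individual loads and stores.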
      
      BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
      
      read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
       fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
       fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
       fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
       __ip6_del_rt net/ipv6/route.c:3876 [inline]
       ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
       __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
       ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
       __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
       ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
       inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
       __sock_release net/socket.c:650 [inline]
       sock_close+0x6c/0x150 net/socket.c:1318
       __fput+0x295/0x520 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0x8e/0x110 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
       fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
       nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
       nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
       nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
       nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
       nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x22 -> 0x2a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 0c5fcf9e ("IPv6: Add "offload failed" indication to routes")
      Fixes: bb3c4ab9 ("ipv6: Add "offload" and "trap" indications to routes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Amit Cohen <amcohen@nvidia.com>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: fix data races in fib_alias_hw_flags_set · 9fcf986c
      Eric Dumazet authored
      fib_alias_hw_flags_set() can be used by concurrent threads,
      and is only RCU protected.
      
      We need to annotate accesses to the following fields of struct fib_alias:

          offload, trap, offload_failed

      Because of READ_ONCE()/WRITE_ONCE() limitations, make these
      fields u8.
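
The limitation can be illustrated in userspace: a bitfield has no address, so a marked access through a pointer cannot target a single flag, and flags packed into one storage unit are rewritten together. The layouts and `toy_*` helpers below are assumptions for illustration, not the actual struct fib_alias definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed layout (assumed for illustration): the flags share one storage
 * unit, and taking the address of a member, e.g. &p.trap, does not compile. */
struct toy_alias_packed {
    unsigned int offload:1, trap:1, offload_failed:1;
};

/* Split layout: each flag has its own byte and its own address. */
struct toy_alias_split {
    uint8_t offload;
    uint8_t trap;
    uint8_t offload_failed;
};

/* Marked single-byte store, standing in for WRITE_ONCE(). */
static void toy_set_flag(uint8_t *flag, uint8_t val)
{
    assert(flag != NULL);
    *(volatile uint8_t *)flag = val;
}

/* Marked single-byte load, standing in for READ_ONCE(). */
static uint8_t toy_get_flag(const uint8_t *flag)
{
    assert(flag != NULL);
    return *(const volatile uint8_t *)flag;
}
```

With the split layout, a store to `trap` touches only that byte, so a concurrent reader of `offload` never observes a read-modify-write of its neighbour.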
      
      BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
      
      read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
       fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
       fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e8-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 90b93f1b ("ipv4: Add "offload" and "trap" indications to routes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: lan9303: add VLAN IDs to master device · 430065e2
      Mans Rullgard authored
      If the master device does VLAN filtering, the IDs used by the switch
      must be added for any frames to be received.  Do this in the
      port_enable() function, and remove them in port_disable().
      
      Fixes: a1292595 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: lan9303: handle hwaccel VLAN tags · 017b355b
      Mans Rullgard authored
      Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
      use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
      where the VLAN tag has already been consumed by the master.
      
      Fixes: a1292595 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mm: don't try to NUMA-migrate COW pages that have other uses · 80d47f5d
      Linus Torvalds authored
      Oded Gabbay reports that enabling NUMA balancing causes corruption with
      his Gaudi accelerator test load:
      
       "All the details are in the bug, but the bottom line is that somehow,
        this patch causes corruption when the numa balancing feature is
        enabled AND we don't use process affinity AND we use GUP to pin pages
        so our accelerator can DMA to/from system memory.
      
        Either disabling numa balancing, using process affinity to bind to
        specific numa-node or reverting this patch causes the bug to
        disappear"
      
      and Oded bisected the issue to commit 09854ba9 ("mm: do_wp_page()
      simplification").
      
      Now, the NUMA balancing shouldn't actually be changing the writability
      of a page, and as such shouldn't matter for COW.  But it appears it
      does.  Suspicious.
      
      However, regardless of that, the condition for enabling NUMA faults in
      change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
      decide if a COW page should be NUMA-protected or not, and that makes
      absolutely no sense.
      
      The number of mappings a page has is irrelevant: not only does GUP get a
      reference to a page as in Oded's case, but the other mappings might be
      paged out and the only reference to them would be in the page count.
      
      Since we should never try to NUMA-balance a page that we can't move
      anyway due to other references, just fix the code to use 'page_count()'.
      Oded confirms that that fixes his issue.
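
The difference between the two tests can be modelled with a toy page that carries only the two counters being contrasted. The struct and helper names are hypothetical. The real check lives in change_pte_range() and operates on struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: just the two counters the commit message contrasts. */
struct toy_page {
    int mapcount;   /* number of page-table mappings */
    int refcount;   /* all references, including GUP pins */
};

/* Old test: looks only at mappings, so a page pinned through GUP
 * still looks like a NUMA-balancing candidate. */
static bool toy_numa_candidate_old(const struct toy_page *p)
{
    assert(p != NULL);
    return p->mapcount == 1;
}

/* New test: any extra reference means the page cannot be moved,
 * so it is skipped. */
static bool toy_numa_candidate_new(const struct toy_page *p)
{
    assert(p != NULL);
    return p->refcount == 1;
}
```

A page that is mapped once and also pinned for DMA has a mapcount of 1 but a refcount of 2. Only the second test leaves it alone.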
      
      Now, this does imply that something in NUMA balancing ends up changing
      page protections (other than the obvious one of making the page
      inaccessible to get the NUMA faulting information).  Otherwise the COW
      simplification wouldn't matter - since doing the GUP on the page would
      make sure it's writable.
      
      The cause of that permission change would be good to figure out too,
      since it clearly results in spurious COW events - but fixing the
      nonsensical test that just happened to work before is obviously the
      CorrectThing(tm) to do regardless.
      
      Fixes: 09854ba9 ("mm: do_wp_page() simplification")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
      Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/
      Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vsock: remove vsock from connected table when connect is interrupted by a signal · b9208492
      Seth Forshee authored
      vsock_connect() expects that the socket could already be in the
      TCP_ESTABLISHED state when the connecting task wakes up with a signal
      pending. If this happens the socket will be in the connected table, and
      it is not removed when the socket state is reset. In this situation it's
      common for the process to retry connect(), and if the connection is
      successful the socket will be added to the connected table a second
      time, corrupting the list.
      
      Prevent this by calling vsock_remove_connected() if a signal is received
      while waiting for a connection. This is harmless if the socket is not in
      the connected table, and if it is in the table then removing it will
      prevent list corruption from a double add.
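
The idea of an unconditional, idempotent removal before a retry can be sketched with a minimal circular list. The `toy_*` names are assumptions for this example and do not reproduce the vsock code or the kernel's list helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct toy_node {
    struct toy_node *prev, *next;
};

static void toy_list_init(struct toy_node *n)
{
    n->prev = n->next = n;
}

static bool toy_linked(const struct toy_node *n)
{
    return n->next != n;
}

/* Insert n right after head. Adding a node that is already on the list
 * would corrupt the links, which is the bug the commit describes. */
static void toy_list_add(struct toy_node *n, struct toy_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Safe whether or not the node is currently on a list. */
static void toy_remove_connected(struct toy_node *n)
{
    assert(n != NULL);
    if (!toy_linked(n))
        return;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    toy_list_init(n);
}

static size_t toy_list_len(const struct toy_node *head)
{
    size_t len = 0;

    for (const struct toy_node *p = head->next; p != head; p = p->next)
        len++;
    return len;
}

/* Signal-interrupted connect: drop the socket from the table before
 * returning, so a later retry adds it exactly once. */
static void toy_connect_interrupted(struct toy_node *sock)
{
    toy_remove_connected(sock);
}
```

Because the removal is a no-op for a node that is not linked, it can be called on the interrupted path without first checking which state the socket reached.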
      
      Note for backporting: this patch requires d5afa82c ("vsock: correct
      removal of socket from the list"), which is in all current stable trees
      except 4.9.y.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname" · 6aba04ee
      Jonas Gorski authored
      This reverts commit 3710e809.
      
      Since idm_base and nicpm_base are still optional resources not present
      on all platforms, this breaks the driver for everything except Northstar
      2 (which has both).
      
      The same change was already reverted once with 755f5738 ("net:
      broadcom: fix a mistake about ioremap resource").
      
      So let's do it again.
      
      Fixes: 3710e809 ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
      Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
      [florian: Added comments to explain the resources are optional]
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6/addrconf: ensure addrconf_verify_rtnl() has completed · be6b41c1
      Eric Dumazet authored
      Before freeing the hash table in addrconf_exit_net(),
      we need to make sure the work queue has completed,
      or risk NULL dereference or UAF.
      
      Thus, use cancel_delayed_work_sync() to enforce this.
      We do not hold RTNL in addrconf_exit_net(), making this safe.
      
      Fixes: 8805d13f ("ipv6/addrconf: use one delayed work per netns")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220216182037.3742-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: allow out-of-order netdev unregistration · faab39f6
      Jakub Kicinski authored
      Sprinkle for each loops to allow netdevices to be unregistered
      out of order, as their refs are released.
      
      This prevents problems caused by dependencies between netdevs
      which want to release references in their ->priv_destructor.
      See commit d6ff94af ("vlan: move dev_put into vlan_dev_uninit")
      for example.
      
      Eric has removed the only known ordering requirement in
      commit c002496b ("Merge branch 'ipv6-loopback'")
      so let's try this and see if anything explodes...

      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/20220215225310.3679266-2-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: transition netdev reg state earlier in run_todo · ae68db14
      Jakub Kicinski authored
      In prep for unregistering netdevs out of order move the netdev
      state validation and change outside of the loop.
      
      While at it modernize this code and use WARN() instead of
      pr_err() + dump_stack().

      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/20220215225310.3679266-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ping: fix the dif and sdif check in ping_lookup · 35a79e64
      Xin Long authored
      When 'ping' changes to use PING socket instead of RAW socket by:
      
         # sysctl -w net.ipv4.ping_group_range="0 100"
      
      There is another regression when matching sk_bound_dev_if and dif:
      the RAW socket uses inet_iif(), while the PING socket lookup uses
      skb->dev->ifindex. The commands below fail because of this:
      
        # ip link add dummy0 type dummy
        # ip link set dummy0 up
        # ip addr add 192.168.111.1/24 dev dummy0
        # ping -I dummy0 192.168.111.1 -c1
      
      The issue was also reported on:
      
        https://github.com/iputils/iputils/issues/104
      
      It was fixed in iputils the wrong way, by not binding to the device
      when the destination IP is on that device, which causes some
      kselftests to fail, as Jianlin noticed.

      This patch uses inet(6)_iif and inet(6)_sdif to get dif and sdif
      for the PING socket, keeping it consistent with the RAW socket.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 · 21e8a963
      Daniele Palmas authored
      Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990
      0x1071 composition in order to avoid bind error.

      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ping6-SOL_IPV6' · 4d449bdc
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: ping6: support setting basic SOL_IPV6 options via cmsg
      
      Support for IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG on ICMPv6
      sockets and associated tests. I have no immediate plans to
      implement IPV6_FLOWINFO and all the extension header stuff.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: basic test for IPV6_2292* · a22982c3
      Jakub Kicinski authored
      Add a basic test to make sure ping sockets don't crash
      with IPV6_2292* options.

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: test IPV6_HOPLIMIT · 05ae83d5
      Jakub Kicinski authored
      Test setting IPV6_HOPLIMIT via setsockopt and cmsg
      across socket types.
      
      Output without the kernel support (this series):
      
        Case HOPLIMIT ICMP cmsg - packet data returned 1, expected 0
        Case HOPLIMIT ICMP diff - packet data returned 1, expected 0

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: test IPV6_TCLASS · 9657ad09
      Jakub Kicinski authored
      Test setting IPV6_TCLASS via setsockopt and cmsg
      across socket types.
      
      Output without the kernel support (this series):
      
        Case TCLASS ICMP cmsg - packet data returned 1, expected 0
        Case TCLASS ICMP cmsg - rejection returned 0, expected 1
        Case TCLASS ICMP diff - pass returned 1, expected 0
        Case TCLASS ICMP diff - packet data returned 1, expected 0
        Case TCLASS ICMP diff - rejection returned 0, expected 1

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: test IPV6_DONTFRAG · 6f97c7c6
      Jakub Kicinski authored
      Test setting IPV6_DONTFRAG via setsockopt and cmsg
      across socket types.
      
      Output without the kernel support (this series):
      
          Case DONTFRAG ICMP setsock returned 0, expected 1
          Case DONTFRAG ICMP cmsg returned 0, expected 1
          Case DONTFRAG ICMP both returned 0, expected 1
          Case DONTFRAG ICMP diff returned 0, expected 1
        FAIL - 4/24 cases failed

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ping6: support setting basic SOL_IPV6 options via cmsg · 13651224
      Jakub Kicinski authored
      Support setting IPV6_HOPLIMIT, IPV6_TCLASS, IPV6_DONTFRAG
      during sendmsg via SOL_IPV6 cmsgs.
      
      tclass and dontfrag are init'ed from struct ipv6_pinfo in
      ipcm6_init_sk(), while hlimit is inited to -1, so we need
      to handle it being populated via cmsg explicitly.
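
From userspace, passing such an option per message means attaching a control message to the `msghdr` given to sendmsg(). The helper below is a sketch of that for IPV6_HOPLIMIT. The function name `toy_set_hoplimit_cmsg` is an assumption for this example and is not part of the series or its selftests.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Fill 'msg' with a single IPV6_HOPLIMIT control message stored in 'buf'.
 * 'buf' should be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int toy_set_hoplimit_cmsg(struct msghdr *msg, void *buf,
                                 size_t buflen, int hoplimit)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL && buf != NULL);
    if (buflen < CMSG_SPACE(sizeof(hoplimit)))
        return -1;

    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(hoplimit));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IPV6;   /* same value as SOL_IPV6 on Linux */
    cmsg->cmsg_type = IPV6_HOPLIMIT;
    cmsg->cmsg_len = CMSG_LEN(sizeof(hoplimit));
    memcpy(CMSG_DATA(cmsg), &hoplimit, sizeof(hoplimit));
    return 0;
}
```

The prepared `msghdr` would then be handed to sendmsg() on an ICMPv6 ping socket, for example one created with socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6). IPV6_TCLASS and IPV6_DONTFRAG follow the same pattern with a different `cmsg_type`.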
      
      Leave extension headers and flowlabel unimplemented.
      Those are slightly more laborious to test and users
      seem to primarily care about IPV6_TCLASS.

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'switchdev-BRENTRY' · d54f16c7
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Remove BRENTRY checks from switchdev drivers
      
      As discussed here:
      https://patchwork.kernel.org/project/netdevbpf/patch/20220214233111.1586715-2-vladimir.oltean@nxp.com/#24738869
      
      no switchdev driver makes use of VLAN port objects that lack the
      BRIDGE_VLAN_INFO_BRENTRY flag. Notifying them in the first place rather
      seems like an omission of commit 9c86ce2c ("net: bridge: Notify
      about bridge VLANs").
      
      Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev
      master VLANs without BRENTRY flag") that was just merged, the bridge no
      longer notifies switchdev upon creation of these VLANs, so we can remove
      the checks from drivers.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ti: cpsw: remove guards against !BRIDGE_VLAN_INFO_BRENTRY · 5edb65ea
      Vladimir Oltean authored
      Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev
      master VLANs without BRENTRY flag"), the bridge no longer emits
      switchdev notifiers for VLANs that don't have the
      BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
      Remove them.

      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ti: am65-cpsw-nuss: remove guards against !BRIDGE_VLAN_INFO_BRENTRY · 1d21c327
      Vladimir Oltean authored
      Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev
      master VLANs without BRENTRY flag"), the bridge no longer emits
      switchdev notifiers for VLANs that don't have the
      BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
      Remove them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d21c327
    • Vladimir Oltean's avatar
      net: sparx5: remove guards against !BRIDGE_VLAN_INFO_BRENTRY · 318994d3
      Vladimir Oltean authored
      Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev
      master VLANs without BRENTRY flag"), the bridge no longer emits
      switchdev notifiers for VLANs that don't have the
      BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
      Remove them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      318994d3
    • Vladimir Oltean's avatar
      net: lan966x: remove guards against !BRIDGE_VLAN_INFO_BRENTRY · ba43b547
      Vladimir Oltean authored
      Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev
      master VLANs without BRENTRY flag"), the bridge no longer emits
      switchdev notifiers for VLANs that don't have the
      BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
      Remove them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba43b547
    • Vladimir Oltean's avatar
      mlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY · ddaff504
      Vladimir Oltean authored
      Since commit 3116ad06 ("net: bridge: vlan: don't notify to switchdev
      master VLANs without BRENTRY flag"), the bridge no longer emits
      switchdev notifiers for VLANs that don't have the
      BRIDGE_VLAN_INFO_BRENTRY flag, so these checks are dead code.
      Remove them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ddaff504
    • David S. Miller's avatar
      Merge branch 'ptp-over-udp-dsa' · 5da1033b
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Support PTP over UDP with the ocelot-8021q DSA tagging protocol
      
      The alternative tag_8021q-based tagger for Ocelot switches, added here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210129010009.3959398-1-olteanv@gmail.com/
      
      gained support for PTP over L2 here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210213223801.1334216-1-olteanv@gmail.com/
      
      mostly as a minimum viable requirement. That PTP support was mostly
      self-contained code that installed some rules to replicate PTP packets
      on the CPU queue, in felix_setup_mmio_filtering().
      
      However ocelot-8021q starts to look more interesting for general purpose
      usage, so it is now time to reduce the technical debt by integrating the
      PTP traps used by Felix for tag_8021q with the rest of the Ocelot driver.
      
      There is further consolidation of traps to be done. The cookies used by
      MRP traps overlap with the cookies used for tag_8021q PTP traps, so
      those features could not be used at the same time.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5da1033b
    • Vladimir Oltean's avatar
      net: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets · 29940ce3
      Vladimir Oltean authored
      DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the
      expectation is that TX checksumming is offloaded and not done in
      software.
      
      Normally the DSA master takes care of this, but packets handled by
      ocelot_defer_xmit() are a very special exception, because they are
      actually injected into the switch through register-based MMIO. So the
      DSA master is not involved at all for these packets => no one calculates
      the checksum.
      
      This allows PTP over UDP to work using the ocelot-8021q tagging
      protocol.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29940ce3
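As a rough illustration of what "calculating the checksum in software" means for a packet that bypasses the DSA master, the sketch below computes the standard RFC 1071 Internet checksum in userspace C. The struct pkt type and deferred_xmit_prepare() are hypothetical and do not reflect the kernel's sk_buff API. The UDP pseudo-header is deliberately left out to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer (big-endian 16-bit words). */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)data[0] << 8;
	while (sum >> 16)		/* fold carries into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Hypothetical packet: csum_offloaded means "checksum left for hardware". */
struct pkt {
	uint8_t *buf;
	size_t len;
	size_t csum_offset;	/* where the 16-bit checksum field lives */
	bool csum_offloaded;
};

/* If nobody downstream will fill in the checksum, do it here in software. */
static void deferred_xmit_prepare(struct pkt *p)
{
	uint16_t c;

	if (!p->csum_offloaded)
		return;

	p->buf[p->csum_offset] = 0;
	p->buf[p->csum_offset + 1] = 0;
	c = inet_checksum(p->buf, p->len);
	p->buf[p->csum_offset] = (uint8_t)(c >> 8);
	p->buf[p->csum_offset + 1] = (uint8_t)(c & 0xff);
	p->csum_offloaded = false;
}
```

After deferred_xmit_prepare() runs, recomputing inet_checksum() over the whole buffer yields 0, which is the usual receiver-side validity check.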
    • Vladimir Oltean's avatar
      net: dsa: felix: update destinations of existing traps with ocelot-8021q · 99348004
      Vladimir Oltean authored
      Historically, the felix DSA driver has installed special traps such that
      PTP over L2 works with the ocelot-8021q tagging protocol; commit
      0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP
      timestamping") has the details.
      
      Then the ocelot switch library also gained more comprehensive support
      for PTP traps through commit 96ca08c0 ("net: mscc: ocelot: set up
      traps for PTP packets").
      
      Right now, PTP over L2 works using ocelot-8021q via the traps it has set
      for itself, but nothing else does. Consolidating the two code blocks
      would make ocelot-8021q gain support for PTP over L4 and tc-flower
      traps, and at the same time avoid some code and TCAM duplication.
      
      The traps are similar in intent, but different in execution, so some
      explanation is required. The traps set up by felix_setup_mmio_filtering()
      are VCAP IS1 filters, which have a PAG that chains them to a VCAP IS2
      filter, and the IS2 is where the 'trap' action resides. The traps set up
      by ocelot_trap_add(), on the other hand, have a single filter, in VCAP
      IS2. The reason for chaining VCAP IS1 and IS2 in Felix was to ensure
      that the hardcoded traps take precedence and cannot be overridden by the
      Ocelot switch library.
      
      So in principle, the PTP traps needed for ocelot-8021q in the Felix
      driver can rely on ocelot_trap_add(), but the filters need to be patched
      to account for a quirk that LS1028A has: the quirk_no_xtr_irq described
      in commit 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP
      timestamping"). Live-patching is done by iterating through the trap list
      every time we know it has been updated, and transforming a trap into a
      redirect + CPU copy if ocelot-8021q is in use.
      
      Making the DSA ocelot-8021q tagger work with the Ocelot traps means we
      can eliminate the dedicated OCELOT_VCAP_IS1_TAG_8021Q_PTP_MMIO and
      OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO cookies. To minimize the patch delta,
      OCELOT_VCAP_IS2_MRP_TRAP takes the place of OCELOT_VCAP_IS2_TAG_8021Q_PTP_MMIO
      (the alternative would have been to left-shift all cookie numbers by 1).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99348004
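The live-patching step described above can be sketched roughly as follows. All types and field names here are hypothetical simplifications, not the Ocelot VCAP structures. The take_ts flag stands for the "needs PTP timestamping" annotation from this series. The behaviour in the non-8021q branch (a plain copy to the CPU port module) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum mask_mode { MASK_PERMIT, MASK_REDIRECT };	/* hypothetical */

/* Hypothetical, simplified trap entry kept in a singly linked list. */
struct trap {
	bool take_ts;		/* packet needs an RX timestamp */
	enum mask_mode mode;
	uint32_t port_mask;	/* destination ports when redirecting */
	bool cpu_copy;		/* also copy to the CPU port module */
	struct trap *next;
};

/* Walk the trap list and rewrite each action for the tagger in use. */
static void update_trap_destinations(struct trap *list, bool using_tag_8021q,
				     int cpu_port)
{
	for (struct trap *t = list; t; t = t->next) {
		if (using_tag_8021q) {
			/* Trap becomes a redirect to the Ethernet CPU port... */
			t->mode = MASK_REDIRECT;
			t->port_mask = UINT32_C(1) << cpu_port;
			/* ...plus a CPU copy only if a timestamp is needed. */
			t->cpu_copy = t->take_ts;
		} else {
			/* Assumed default: plain trap to the CPU port module. */
			t->mode = MASK_PERMIT;
			t->port_mask = 0;
			t->cpu_copy = true;
		}
	}
}
```

Re-running the walk whenever the list or the tagging protocol changes keeps every trap consistent without the caller tracking which entries were already patched.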
    • Vladimir Oltean's avatar
      net: dsa: felix: remove dead code in felix_setup_mmio_filtering() · d78637a8
      Vladimir Oltean authored
      There has been some controversy related to the sanity check that a CPU
      port exists, and commit e8b1d769 ("net: dsa: felix: Fix memory leak
      in felix_setup_mmio_filtering") even "corrected" an apparent memory leak
      as static analysis tools see it.
      
      However, the check is completely dead code, since the earliest point at
      which felix_setup_mmio_filtering() can be called is:
      
      felix_pci_probe
      -> dsa_register_switch
         -> dsa_switch_probe
            -> dsa_tree_setup
               -> dsa_tree_setup_cpu_ports
                  -> dsa_tree_setup_default_cpu
                     -> contains the "DSA: tree %d has no CPU port\n" check
               -> dsa_tree_setup_master
                  -> dsa_master_setup
                     -> sysfs_create_group(&dev->dev.kobj, &dsa_group);
                        -> makes tagging_store() callable
                           -> dsa_tree_change_tag_proto
                              -> dsa_tree_notify
                                 -> dsa_switch_event
                                    -> dsa_switch_change_tag_proto
                                       -> ds->ops->change_tag_protocol
                                          -> felix_change_tag_protocol
                                             -> felix_set_tag_protocol
                                                -> felix_setup_tag_8021q
                                                   -> felix_setup_mmio_filtering
                                                      -> breaks at first CPU port
      
      So probing would have failed earlier if there wasn't any CPU port
      defined.
      
      To avoid all confusion, delete the dead code and replace it with a
      comment.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d78637a8
    • Vladimir Oltean's avatar
      net: mscc: ocelot: annotate which traps need PTP timestamping · 9d75b881
      Vladimir Oltean authored
      The ocelot switch library does not need this information, but the felix
      DSA driver does.
      
      As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line
      for packet extraction, so to be notified that a PTP packet needs to be
      dequeued, it receives that packet also over Ethernet, by setting up a
      packet trap. The Felix driver needs to install special kinds of traps
      for packets in need of RX timestamps, such that the packets are
      replicated both over Ethernet and over the CPU port module.
      
      But the Ocelot switch library sets up more traps than just those for
      PTP event messages; it also traps PTP general messages, MRP control
      messages etc. Those packets don't need PTP timestamps, so there's no
      reason for the Felix driver to send them to the CPU port module.
      
      By knowing which traps need PTP timestamps, the Felix driver can
      adjust the traps installed using ocelot_trap_add() such that only those
      will actually get delivered to the CPU port module.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d75b881
    • Vladimir Oltean's avatar
      net: mscc: ocelot: keep traps in a list · e42bd4ed
      Vladimir Oltean authored
      When using the ocelot-8021q tagging protocol, the CPU port isn't
      configured as an NPI port, but is a regular port. A "trap to CPU"
      operation is therefore actually a "redirect" operation, and DSA needs
      to set up the trapping action one way or another, depending on the
      tagging protocol in use.
      
      To ease DSA's work of modifying the action, keep all currently installed
      traps in a list, so that DSA can live-patch them when the tagging
      protocol changes.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e42bd4ed
    • Vladimir Oltean's avatar
      net: dsa: felix: use DSA port iteration helpers · 2960bb14
      Vladimir Oltean authored
      Use the port iteration helpers, which avoid the quadratic complexity
      associated with calling dsa_to_port() indirectly through
      dsa_is_unused_port() and dsa_is_cpu_port().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2960bb14
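The complexity argument can be seen in a small userspace model. It assumes that looking a port up by index walks a linked list, so a per-index predicate inside an index loop costs O(n) per call and O(n²) overall, whereas walking the list once is O(n). The names below are hypothetical and only mimic the shape of the DSA helpers. The steps counter exists purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum port_type { PORT_UNUSED, PORT_CPU, PORT_USER };

struct port {
	int index;
	enum port_type type;
	struct port *next;
};

static unsigned long steps;	/* list nodes visited, for illustration */

/* Lookup by index walks the list: O(n) per call. */
static struct port *to_port(struct port *head, int index)
{
	for (struct port *p = head; p; p = p->next) {
		steps++;
		if (p->index == index)
			return p;
	}
	return NULL;
}

static bool is_cpu_port(struct port *head, int index)
{
	struct port *p = to_port(head, index);

	return p && p->type == PORT_CPU;
}

/* Index loop + per-index lookup: quadratic in the number of ports. */
static int count_cpu_ports_by_index(struct port *head, int num_ports)
{
	int n = 0;

	for (int i = 0; i < num_ports; i++)
		if (is_cpu_port(head, i))
			n++;
	return n;
}

/* Single walk over the list: linear in the number of ports. */
static int count_cpu_ports_by_walk(struct port *head)
{
	int n = 0;

	for (struct port *p = head; p; p = p->next) {
		steps++;
		if (p->type == PORT_CPU)
			n++;
	}
	return n;
}
```

With four ports in index order, the index-based loop visits 1 + 2 + 3 + 4 = 10 nodes while the single walk visits 4.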
    • Vladimir Oltean's avatar
      net: mscc: ocelot: avoid overlap in VCAP IS2 between PTP and MRP traps · 85ea0daa
      Vladimir Oltean authored
      OCELOT_VCAP_IS2_TAG_8021Q_TXVLAN overlaps with OCELOT_VCAP_IS2_MRP_REDIRECT.
      To avoid this, make OCELOT_VCAP_IS2_MRP_REDIRECT take the cookie region
      from N to 2 * N - 1 (where N is ocelot->num_phys_ports).
      
      To avoid any risk that the singleton (not per port) VCAP IS2 filters
      overlap with per-port VCAP IS2 filters, we must ensure that the number
      of singleton filters is smaller than the number of physical ports.
      This is true right now, but may change in the future as switches with
      fewer ports get supported, or more singleton filters get added. So to be
      future-proof, let's move the singleton filters to the end of the range,
      where they won't overlap with anything to their right.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85ea0daa
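One way to picture the resulting cookie layout, with N physical ports, is sketched below. The function names are hypothetical. The message states only the N to 2 * N - 1 region for MRP redirects and that singleton filters go at the end. Placing the per-port TXVLAN cookies at 0 to N - 1 and starting the singletons at 2 * N are assumptions made for this illustration.

```c
/* Per-port tag_8021q TX VLAN filters: assumed to occupy 0 .. n-1. */
static unsigned long cookie_tag_8021q_txvlan(unsigned int n, unsigned int port)
{
	(void)n;
	return port;
}

/* Per-port MRP redirect filters: n .. 2n-1, as stated in the message. */
static unsigned long cookie_mrp_redirect(unsigned int n, unsigned int port)
{
	return (unsigned long)n + port;
}

/* Singleton (not per-port) filters: assumed to start at 2n, after all
 * per-port regions, so adding more of them cannot collide with a port. */
static unsigned long cookie_singleton(unsigned int n, unsigned int idx)
{
	return 2ul * n + idx;
}
```

Because the singleton region is open-ended on the right, its size no longer has to stay below the number of physical ports.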
    • Vladimir Oltean's avatar
      net: mscc: ocelot: use a single VCAP filter for all MRP traps · b9bace6e
      Vladimir Oltean authored
      The MRP assist code installs a VCAP IS2 trapping rule for each port, but
      since the key and the action are the same and only the ingress port mask
      differs, there isn't any need to do this. We can save some space in the
      TCAM by using a single filter and adjusting the ingress port mask.
      
      Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this
      purpose.
      
      Now that the cookies are no longer per port, we need to change the
      allocation scheme such that MRP traps use a fixed number.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9bace6e
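The idea of one shared filter with an adjustable ingress port mask can be sketched as below. This is a simplified userspace model with hypothetical names, not the actual ocelot_trap_add() / ocelot_trap_del() implementation. It assumes the filter is installed when the first port joins and removed when the last port leaves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared trap: one TCAM entry, many ingress ports. */
struct trap_filter {
	bool installed;
	uint32_t ingress_port_mask;
};

/* Add a port to the shared filter, installing it on first use. */
static void sketch_trap_add(struct trap_filter *f, int port)
{
	f->ingress_port_mask |= UINT32_C(1) << port;
	f->installed = true;
}

/* Remove a port; drop the filter once no port uses it any more. */
static void sketch_trap_del(struct trap_filter *f, int port)
{
	f->ingress_port_mask &= ~(UINT32_C(1) << port);
	if (!f->ingress_port_mask)
		f->installed = false;
}
```

Compared with one rule per port, this keeps TCAM usage constant regardless of how many ports enable the trap.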