1. 22 Feb, 2022 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 5663b854
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      This is fixing up the use without proper initialization in patch 5/5
      
      -o-
      
      Hi,
      
      The following patchset contains Netfilter fixes for net:
      
      1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.
      
      2) Fix incorrect flow action array size in nf_tables.
      
      3) Unregister flowtable hooks from netns exit path.
      
      4) Fix missing limit object release, from Florian Westphal.
      
      5) Memleak in nf_tables object update path, also from Florian.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: fix memory leak during stateful obj update · dad3bdee
      Florian Westphal authored
      stateful objects can be updated from the control plane.
      The transaction logic allocates a temporary object for this purpose.
      
      The ->init function was called for this object, so plain kfree() leaks
      resources. We must call ->destroy function of the object.
      
      nft_obj_destroy does this, but it also decrements the module refcount,
      which the update path doesn't increment.
      
      To avoid special-casing the update object release, do module_get for
      the update case too and release it via nft_obj_destroy().
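
      The balance described above can be sketched in userspace C. This is an
      illustration only, not the kernel code: struct module_stub, struct obj_stub
      and every function name below are hypothetical stand-ins, and obj_destroy()
      only loosely follows the shape attributed to nft_obj_destroy() (call
      ->destroy, drop the module reference, free the object).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's module and object types. */
struct module_stub {
	int refcnt;
};

struct obj_stub {
	struct module_stub *owner;
	void *priv;		/* resource allocated by the init callback */
};

static int live_priv;		/* outstanding init allocations */

static int obj_init(struct obj_stub *obj)
{
	obj->priv = malloc(16);
	if (!obj->priv)
		return -1;
	live_priv++;
	return 0;
}

/* Common release helper: destroy callback, module put, free. */
static void obj_destroy(struct obj_stub *obj)
{
	free(obj->priv);
	live_priv--;
	assert(obj->owner->refcnt > 0);
	obj->owner->refcnt--;
	free(obj);
}

/* Update path: take a module reference so obj_destroy() stays balanced. */
static struct obj_stub *obj_new_for_update(struct module_stub *owner)
{
	struct obj_stub *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->owner = owner;
	if (obj_init(obj) < 0) {
		free(obj);
		return NULL;
	}
	owner->refcnt++;
	return obj;
}
```

      Releasing the temporary object with a plain free() would skip the destroy
      step and leave live_priv non-zero, which is the leak the message describes.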
      
      Fixes: d62d0ba9 ("netfilter: nf_tables: Introduce stateful object update operation")
      Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 21 Feb, 2022 4 commits
    • netfilter: nft_limit: fix stateful object memory leak · 1a58f84e
      Florian Westphal authored
      We need to provide a destroy callback to release the extra fields.
      
      Fixes: 3b9e2ea6 ("netfilter: nft_limit: move stateful fields out of expression data")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: unregister flowtable hooks on netns exit · 6069da44
      Pablo Neira Ayuso authored
      Unregister flowtable hooks before they are released via
      nf_tables_flowtable_destroy(), otherwise the hook core reports a UAF.
      
      BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
      Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666
      
      CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
       dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106
       print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       __kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450
       kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450
       nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
       __nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429
       nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571
       nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232
       nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430
       nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
       nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652
       nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652
      
      __nft_release_hook() calls nft_unregister_flowtable_net_hooks() which
      only unregisters the hooks, then after RCU grace period, it is
      guaranteed that no packets add new entries to the flowtable (no flow
      offload rules and flowtable hooks are reachable from packet path), so it
      is safe to call nf_flow_table_free() which cleans up the remaining
      entries from the flowtable (both software and hardware) and it unbinds
      the flow_block.
      
      Fixes: ff4bf2f4 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
      Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: mdio-ipq4019: add delay after clock enable · b6ad6261
      Baruch Siach authored
      Experimentation shows that PHY detect might fail when the code attempts
      MDIO bus read immediately after clock enable. Add delay to stabilize the
      clock before bus access.
      
      PHY detect failure started to show after commit 7590fc6f ("net:
      mdio: Demote probed message to debug print"), which removed a coincidental
      delay between clock enable and bus access.

      10ms is meant to match the time it takes to send the probed message over
      UART at 115200 bps. This might be a far overshoot.
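
      As a rough check of the 10ms figure, the arithmetic can be written out as
      below. The helpers are hypothetical, and the 10 bits per byte figure assumes
      8N1 framing, which the message does not state. Under that assumption, 10ms
      at 115200 bps corresponds to about 115 characters on the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit per byte. */
#define UART_BITS_PER_BYTE 10u

/* Microseconds needed to shift out nbytes at the given baud rate. */
static uint64_t uart_tx_time_us(uint64_t nbytes, uint64_t baud)
{
	assert(baud > 0);
	return nbytes * UART_BITS_PER_BYTE * 1000000u / baud;
}

/* Whole bytes that fit into a time budget at the given baud rate. */
static uint64_t uart_bytes_in_us(uint64_t usecs, uint64_t baud)
{
	return usecs * baud / (UART_BITS_PER_BYTE * 1000000u);
}
```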
      
      Fixes: 23a890d4 ("net: mdio: Add the reset function for IPQ MDIO driver")
      Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gso: do not skip outer ip header in case of ipip and net_failover · cc20cced
      Tao Liu authored
      We encountered a TCP drop issue in our cloud environment. A packet GROed on
      the host is forwarded to a VM virtio_net NIC with net_failover enabled. The VM
      acts as an IPVS LB with ipip encapsulation. The full path is:
      host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
       -> ipip encap -> net_failover tx -> virtio_net tx
      
      When net_failover transmits an ipip packet (gso_type = 0x0103, which means
      SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is performed
      because it supports TSO and GSO_IPXIP4. But network_header points to the
      inner IP header.
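
      The decoding of gso_type = 0x0103 can be checked with a small bitmask
      sketch. The three constants below are assumptions chosen to be consistent
      with the decoding given in the message (bits 0, 1 and 8); the authoritative
      values are the SKB_GSO_* definitions in include/linux/skbuff.h of the tree
      in use.

```c
#include <assert.h>

/* Assumed bit positions, consistent with 0x0103 as decoded in the message. */
#define GSO_TCPV4	(1u << 0)
#define GSO_DODGY	(1u << 1)
#define GSO_IPXIP4	(1u << 8)

/* Non-zero when every bit in flag is set in gso_type. */
static int gso_has(unsigned int gso_type, unsigned int flag)
{
	assert(flag != 0);
	return (gso_type & flag) == flag;
}

/* Bits in gso_type that are not covered by the three flags above. */
static unsigned int gso_other_bits(unsigned int gso_type)
{
	return gso_type & ~(GSO_TCPV4 | GSO_DODGY | GSO_IPXIP4);
}
```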
      
      Call Trace:
       tcp4_gso_segment        ------> return NULL
       inet_gso_segment        ------> inner iph, network_header points to
       ipip_gso_segment
       inet_gso_segment        ------> outer iph
       skb_mac_gso_segment
      
      Afterwards, when virtio_net transmits the packet, only the inner IP header is
      modified and the outer one is left unchanged. The packet will be dropped by
      the remote host.
      
      Call Trace:
       inet_gso_segment        ------> inner iph, outer iph is skipped
       skb_mac_gso_segment
       __skb_gso_segment
       validate_xmit_skb
       validate_xmit_skb_list
       sch_direct_xmit
       __qdisc_run
       __dev_queue_xmit        ------> virtio_net
       dev_hard_start_xmit
       __dev_queue_xmit        ------> net_failover
       ip_finish_output2
       ip_output
       iptunnel_xmit
       ip_tunnel_xmit
       ipip_tunnel_xmit        ------> ipip
       dev_hard_start_xmit
       __dev_queue_xmit
       ip_finish_output2
       ip_output
       ip_forward
       ip_rcv
       __netif_receive_skb_one_core
       netif_receive_skb_internal
       napi_gro_receive
       receive_buf
       virtnet_poll
       net_rx_action
      
      The root cause of this issue is specific to the rare combination of
      SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
      SKB_GSO_DODGY is set by the external virtio_net. We need to reset the network
      header when callbacks.gso_segment() returns NULL.
      
      This patch also includes ipv6_gso_segment(), considering SIT, etc.
      
      Fixes: cb32f511 ("ipip: add GSO/TSO support")
      Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Feb, 2022 9 commits
  4. 19 Feb, 2022 15 commits
    • net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held · 8940e6b6
      Vladimir Oltean authored
      If the DSA master doesn't support IFF_UNICAST_FLT, then the following
      call path is possible:
      
      dsa_slave_switchdev_event_work
      -> dsa_port_host_fdb_add
         -> dev_uc_add
            -> __dev_set_rx_mode
               -> __dev_set_promiscuity
      
      Since the blamed commit, dsa_slave_switchdev_event_work() no longer
      holds rtnl_lock(), which triggers the ASSERT_RTNL() from
      __dev_set_promiscuity().
      
      Taking rtnl_lock() around dev_uc_add() is impossible, because all the
      code paths that call dsa_flush_workqueue() do so from contexts where the
      rtnl_mutex is already held - so this would lead to an instant deadlock.
      
      dev_uc_add() in itself doesn't require the rtnl_mutex for protection.
      There is this comment in __dev_set_rx_mode() which assumes so:
      
      		/* Unicast addresses changes may only happen under the rtnl,
      		 * therefore calling __dev_set_promiscuity here is safe.
      		 */
      
      but it is from commit 4417da66 ("[NET]: dev: secondary unicast
      address support") dated June 2007, and in the meantime, commit
      f1f28aa3 ("netdev: Add addr_list_lock to struct net_device."), dated
      July 2008, has added &dev->addr_list_lock to protect this instead of the
      global rtnl_mutex.
      
      Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection,
      but it is the uncommon path of what we typically expect dev_uc_add()
      to do. So since only the uncommon path requires rtnl_lock(), just check
      ahead of time whether dev_uc_add() would result in a call to
      __dev_set_promiscuity(), and handle that condition separately.
      
      DSA already configures the master interface to be promiscuous if the
      tagger requires this. We can extend this to also cover the case where
      the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT),
      and on the premise that we'd end up making it promiscuous during
      operation anyway, either if a DSA slave has a non-inherited MAC address,
      or if the bridge notifies local FDB entries for its own MAC address, the
      address of a station learned on a foreign port, etc.
      
      Fixes: 0faf890f ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
      Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: fix bridging with more than two member ports · 3d00827a
      Svenning Sørensen authored
      Commit b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      plugged a packet leak between ports that were members of different bridges.
      Unfortunately, this broke another use case, namely that of more than two
      ports that are members of the same bridge.
      
      After that commit, when a port is added to a bridge, hardware bridging
      between other member ports of that bridge will be cleared, preventing
      packet exchange between them.
      
      Fix by ensuring that the Port VLAN Membership bitmap includes any existing
      ports in the bridge, not just the port being added.
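
      The idea of the fix can be sketched as follows. This is not the driver
      code: struct port_stub and bridge_member_mask() are hypothetical, and a
      real driver may need to add further ports to the mask. The point is only
      that the membership bitmap is built from every port currently in the
      bridge rather than from the joining port alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port state: which bridge the port belongs to. */
struct port_stub {
	int bridge_id;		/* -1 when the port is in no bridge */
};

/*
 * Build a membership bitmap with one bit per port that is a member of
 * bridge_id. Callers are expected to pass a valid (non-negative) id.
 */
static uint32_t bridge_member_mask(const struct port_stub *ports,
				   size_t nports, int bridge_id)
{
	uint32_t mask = 0;
	size_t i;

	assert(nports <= 32);
	for (i = 0; i < nports; i++)
		if (ports[i].bridge_id == bridge_id)
			mask |= UINT32_C(1) << i;
	return mask;
}
```

      With three ports in the same bridge, the mask for that bridge has three
      bits set, so adding the third port does not clear forwarding between the
      first two.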
      
      Fixes: b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      Signed-off-by: Svenning Sørensen <sss@secomea.com>
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Christophe Leroy authored
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reasons.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance critical
      checksum helpers that's counter-productive.
      
      The problem mainly arises when selecting CONFIG_CC_OPTIMIZE_FOR_SIZE.
      Because those helpers are 'static inline' in header files, you suddenly find
      them duplicated many times in the resulting vmlinux.

      Here is a typical example when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMIZE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h
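
      A userspace sketch of the technique, assuming GCC or Clang: the
      always_inline attribute forces the helper into each caller instead of
      leaving the decision to the optimizer. The force_inline macro and the
      _sketch names are hypothetical. The arithmetic is a ones' complement
      add with end-around carry, matching the not/addc/addze sequence in the
      disassembly above, but it is not copied from net/checksum.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro; the kernel spells this __always_inline. */
#define force_inline static inline __attribute__((always_inline))

/* Ones' complement addition with end-around carry. */
force_inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Subtraction is addition of the bitwise complement. */
force_inline uint32_t csum_sub_sketch(uint32_t csum, uint32_t addend)
{
	return csum_add_sketch(csum, ~addend);
}

/* Small usage example: adding and then subtracting a value round-trips. */
static void csum_example(void)
{
	uint32_t sum = csum_add_sketch(5u, 3u);

	assert(csum_sub_sketch(sum, 3u) == 5u);
}
```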
      
      vmlinux size is even reduced by 256 bytes with this patch:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 0033fced
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-02-18
      
      This series contains updates to ice driver only.
      
      Wojciech fixes protocol matching for slow-path switchdev so that all
      packets are correctly redirected.
      
      Michal removes accidental unconditional setting of l4 port filtering
      flag.
      
      Jake adds locking to protect VF reset and removal to fix various issues
      that can be encountered when they race with each other.
      
      Tom Rix propagates an error and initializes a struct to resolve reported
      Clang issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fixes' · 90141edc
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fix address advertisement races and stabilize tests
      
      Patches 1, 2, and 7 modify two self tests to give consistent, accurate
      results by fixing timing issues and accounting for syncookie behavior.
      
      Patches 3-6 fix two races in overlapping address advertisement send and
      receive. Associated self tests are updated, including addition of two
      MIBs to enable testing and tracking dropped address events.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: be more conservative with cookie MPJ limits · e35f885b
      Paolo Abeni authored
      Since commit 2843ff6f ("mptcp: remote addresses fullmesh"), an
      MPTCP client can attempt to create multiple MPJ subflows simultaneously.

      In such a scenario the server, when syncookies are enabled, could end up
      accepting incoming MPJ SYNs even above the configured subflow limit, as
      that limit can be enforced reliably only after subflow creation, which
      with syncookies happens only after reception of the 3rd ACK.

      As a consequence the related self-test case sporadically fails, as it
      verifies that the server always accepts the expected number of MPJ SYNs.

      Address the issue by relaxing the constraint on the number of MPJ SYNs.
      Note that the check on the accepted number of MPJ 3rd ACKs remains intact.
      
      Fixes: 2843ff6f ("mptcp: remote addresses fullmesh")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: more robust signal race test · 6ef84b15
      Paolo Abeni authored
      The in-kernel MPTCP PM implementation can process only a single
      incoming add address option at any given time. In the
      mentioned test the server can surpass that limit. Let the
      setup cope with that by allowing a faster add_addr retransmission.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254
      Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add mibs counter for ignored incoming options · f73c1194
      Paolo Abeni authored
      The MPTCP in-kernel path manager has some constraints on processing
      incoming address announces, so in edge scenarios it can
      end up dropping (ignoring) some of them.
      
      The above is not very limiting in practice since such scenarios are
      very uncommon and MPTCP will recover due to ADD_ADDR retransmissions.
      
      This patch adds a few MIB counters to account for such drop events
      to allow easier introspection of the critical scenarios.
      
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix race in incoming ADD_ADDR option processing · 837cf45d
      Paolo Abeni authored
      If an MPTCP endpoint receives multiple consecutive incoming
      ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite
      the current remote address value after the PM lock is released
      in mptcp_pm_nl_add_addr_received() and before that address
      is echoed.

      Fix the issue by caching the remote address value a little earlier
      and always using the cached value after releasing the PM lock.
      
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix race in overlapping signal events · 98247bc1
      Paolo Abeni authored
      After commit a88c9e49 ("mptcp: do not block subflows
      creation on errors"), if a signal address races with a failing
      subflow creation, the subflow creation failure control path
      can trigger the selection of the next address to be announced
      while the current announce is still pending.

      This causes the unintended suppression of the ADD_ADDR
      announce.

      Fix the issue by skipping the to-be-suppressed announce before it
      marks an endpoint as already used. The relevant announce
      will be triggered again when the current one completes.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: improve 'fair usage on close' stability · 5b31dda7
      Paolo Abeni authored
      The mentioned test has to wait for a subflow creation failure.
      The current code looks for TCP sockets in TW state and sometimes
      misses the relevant event. Switch to a more stable check, looking
      for the associated mib counter.
      
      Fixes: 46e967d1 ("selftests: mptcp: add tests for subflow creation failure")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/257
      Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: fix diag instability · 0cd33c5f
      Paolo Abeni authored
      Instead of waiting an arbitrary amount of time for the MPTCP
      MP_CAPABLE handshake to complete, explicitly wait for the relevant
      socket to enter the established state.

      Additionally, let the data transfer application use the slowest
      transfer mode available (-r), to cope with very slow hosts or
      high jitter caused by hosting VMs.
      
      Fixes: df62f2ec ("selftests/mptcp: add diag interface tests")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258
      Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() · 3a14d088
      Christophe JAILLET authored
      ida_simple_get() returns an id between min (0) and max (NFP_MAX_MAC_INDEX)
      inclusive.
      So NFP_MAX_MAC_INDEX (0xff) is a valid id.
      
      In order for the error handling path to work correctly, the 'invalid'
      value for 'ida_idx' should not be in the 0..NFP_MAX_MAC_INDEX range,
      inclusive.
      
      So set it to -1.
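
      Why the sentinel must lie outside the valid range can be shown with a toy
      allocator. Everything below is a hypothetical userspace model, not the
      kernel IDA API or the nfp driver: id_get() hands out ids in
      0..MAX_MAC_INDEX inclusive, and cleanup() releases an id unless it equals
      the chosen 'invalid' value.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MAC_INDEX 0xff

static bool id_used[MAX_MAC_INDEX + 1];

/* Toy allocator: lowest free id in [0, MAX_MAC_INDEX], or -1 when full. */
static int id_get(void)
{
	int i;

	for (i = 0; i <= MAX_MAC_INDEX; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			return i;
		}
	}
	return -1;
}

static void id_put(int id)
{
	assert(id >= 0 && id <= MAX_MAC_INDEX);
	id_used[id] = false;
}

static int ids_in_use(void)
{
	int i, n = 0;

	for (i = 0; i <= MAX_MAC_INDEX; i++)
		n += id_used[i];
	return n;
}

/* Error path: release the id only if it differs from the sentinel. */
static void cleanup(int ida_idx, int invalid)
{
	if (ida_idx != invalid)
		id_put(ida_idx);
}
```

      When the allocator returns 0xff and the sentinel is also 0xff, cleanup()
      skips the release and the id leaks. With a sentinel of -1 the same id is
      released.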
      
      Fixes: 20cce886 ("nfp: flower: enable MAC address sharing for offloadable devs")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220218131535.100258-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Subash Abhinov Kasiviswanathan's avatar
    • net: mvpp2: always set port pcs ops · 5a2aba71
      Jeremy Linton authored
      Booting a MACCHIATObin with 5.17, the system OOPs with
      a null pointer deref when the network is started. This
      is caused by the pcs->ops structure being null in
      mvpp2_acpi_start() when it tries to call pcs_config().
      
      Hoisting the code which sets pcs_gmac.ops and pcs_xlg.ops,
      assuring they are always set, fixes the problem.
      
      The OOPs looks like:
      [   18.687760] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000010
      [   18.698561] Mem abort info:
      [   18.698564]   ESR = 0x96000004
      [   18.698567]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   18.709821]   SET = 0, FnV = 0
      [   18.714292]   EA = 0, S1PTW = 0
      [   18.718833]   FSC = 0x04: level 0 translation fault
      [   18.725126] Data abort info:
      [   18.729408]   ISV = 0, ISS = 0x00000004
      [   18.734655]   CM = 0, WnR = 0
      [   18.738933] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000111bbf000
      [   18.745409] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
      [   18.752235] Internal error: Oops: 96000004 [#1] SMP
      [   18.757134] Modules linked in: rfkill ip_set nf_tables nfnetlink qrtr sunrpc vfat fat omap_rng fuse zram xfs crct10dif_ce mvpp2 ghash_ce sbsa_gwdt phylink xhci_plat_hcd ahci_plam
      [   18.773481] CPU: 0 PID: 681 Comm: NetworkManager Not tainted 5.17.0-0.rc3.89.fc36.aarch64 #1
      [   18.781954] Hardware name: Marvell                         Armada 7k/8k Family Board      /Armada 7k/8k Family Board      , BIOS EDK II Jun  4 2019
      [   18.795222] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   18.802213] pc : mvpp2_start_dev+0x2b0/0x300 [mvpp2]
      [   18.807208] lr : mvpp2_start_dev+0x298/0x300 [mvpp2]
      [   18.812197] sp : ffff80000b4732c0
      [   18.815522] x29: ffff80000b4732c0 x28: 0000000000000000 x27: ffffccab38ae57f8
      [   18.822689] x26: ffff6eeb03065a10 x25: ffff80000b473a30 x24: ffff80000b4735b8
      [   18.829855] x23: 0000000000000000 x22: 00000000000001e0 x21: ffff6eeb07b6ab68
      [   18.837021] x20: ffff6eeb07b6ab30 x19: ffff6eeb07b6a9c0 x18: 0000000000000014
      [   18.844187] x17: 00000000f6232bfe x16: ffffccab899b1dc0 x15: 000000006a30f9fa
      [   18.851353] x14: 000000003b77bd50 x13: 000006dc896f0e8e x12: 001bbbfccfd0d3a2
      [   18.858519] x11: 0000000000001528 x10: 0000000000001548 x9 : ffffccab38ad0fb0
      [   18.865685] x8 : ffff80000b473330 x7 : 0000000000000000 x6 : 0000000000000000
      [   18.872851] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000b4732f8
      [   18.880017] x2 : 000000000000001a x1 : 0000000000000002 x0 : ffff6eeb07b6ab68
      [   18.887183] Call trace:
      [   18.889637]  mvpp2_start_dev+0x2b0/0x300 [mvpp2]
      [   18.894279]  mvpp2_open+0x134/0x2b4 [mvpp2]
      [   18.898483]  __dev_open+0x128/0x1e4
      [   18.901988]  __dev_change_flags+0x17c/0x1d0
      [   18.906187]  dev_change_flags+0x30/0x70
      [   18.910038]  do_setlink+0x278/0xa7c
      [   18.913540]  __rtnl_newlink+0x44c/0x7d0
      [   18.917391]  rtnl_newlink+0x5c/0x8c
      [   18.920892]  rtnetlink_rcv_msg+0x254/0x314
      [   18.925006]  netlink_rcv_skb+0x48/0x10c
      [   18.928858]  rtnetlink_rcv+0x24/0x30
      [   18.932449]  netlink_unicast+0x290/0x2f4
      [   18.936386]  netlink_sendmsg+0x1d0/0x41c
      [   18.940323]  sock_sendmsg+0x60/0x70
      [   18.943825]  ____sys_sendmsg+0x248/0x260
      [   18.947762]  ___sys_sendmsg+0x74/0xa0
      [   18.951438]  __sys_sendmsg+0x64/0xcc
      [   18.955027]  __arm64_sys_sendmsg+0x30/0x40
      [   18.959140]  invoke_syscall+0x50/0x120
      [   18.962906]  el0_svc_common.constprop.0+0x4c/0xf4
      [   18.967629]  do_el0_svc+0x30/0x9c
      [   18.970958]  el0_svc+0x28/0xb0
      [   18.974025]  el0t_64_sync_handler+0x10c/0x140
      [   18.978400]  el0t_64_sync+0x1a4/0x1a8
      [   18.982078] Code: 52800004 b9416262 aa1503e0 52800041 (f94008a5)
      [   18.988196] ---[ end trace 0000000000000000 ]---
      
      Fixes: cff05632 ("net: mvpp2: use .mac_select_pcs() interface")
      Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
      Reviewed-by: Marcin Wojtas <mw@semihalf.com>
      Link: https://lore.kernel.org/r/20220214231852.3331430-1-jeremy.linton@arm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 18 Feb, 2022 10 commits
    • ice: initialize local variable 'tlv' · 5950bdc8
      Tom Rix authored
      Clang static analysis reports this issue:
      ice_common.c:5008:21: warning: The left expression of the compound
        assignment is an uninitialized value. The computed value will
        also be garbage
        ldo->phy_type_low |= ((u64)buf << (i * 16));
        ~~~~~~~~~~~~~~~~~ ^
      
      When called from ice_cfg_phy_fec() ldo is the uninitialized local
      variable tlv.  So initialize.
      
      Fixes: ea78ce4d ("ice: add link lenient and default override support")
      Signed-off-by: Tom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: check the return of ice_ptp_gettimex64 · ed22d9c8
      Tom Rix authored
      Clang static analysis reports this issue
      time64.h:69:50: warning: The left operand of '+'
        is a garbage value
        set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec,
                                             ~~~~~~~~~~ ^
      In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now'
      is set by ice_ptp_gettimex64().  This function can fail
      with -EBUSY, so 'now' can have a garbage value.
      So check the return.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Signed-off-by: Tom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ed22d9c8
    • ice: fix concurrent reset and removal of VFs · fadead80
      Jacob Keller authored
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
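
      As a rough illustration of serializing two flows on one lock, here is
      a userspace sketch using a pthread mutex. Everything in it is an
      assumption made for illustration: the struct, the 'removed' flag and
      the function names are hypothetical and do not mirror the driver's
      data structures.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-VF state guarded by a single lock. */
struct vf_sketch {
	pthread_mutex_t cfg_lock;
	bool removed;
	int resets;
};

/* Reset flow: runs entirely under cfg_lock and refuses a removed VF. */
static int vf_reset_sketch(struct vf_sketch *vf)
{
	int ret = 0;

	pthread_mutex_lock(&vf->cfg_lock);
	if (vf->removed)
		ret = -1;
	else
		vf->resets++;
	pthread_mutex_unlock(&vf->cfg_lock);
	return ret;
}

/* Removal flow: takes the same lock, so it cannot overlap a reset. */
static void vf_remove_sketch(struct vf_sketch *vf)
{
	pthread_mutex_lock(&vf->cfg_lock);
	vf->removed = true;
	pthread_mutex_unlock(&vf->cfg_lock);
}
```

      Because both flows take the same mutex, a reset either completes
      before removal starts or observes that removal has already happened.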
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      fadead80
    • ice: fix setting l4 port flag when adding filter · 932645c2
      Michal Swiatkowski authored
      The filter flag for the non-encapsulated l4 port field is accidentally
      always set, even if the user wants to add an encapsulated l4 port field.
      
      Remove this unnecessary flag setting.
      
      Fixes: 9e300987 ("ice: VXLAN and Geneve TC support")
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      932645c2
    • ice: Match on all profiles in slow-path · b70bc066
      Wojciech Drewek authored
      In switchdev mode, slow-path rules need to match all protocols, in order
      to correctly redirect unfiltered or missed packets to the uplink. To set
      this up for the virtual function to uplink flow, the rule that redirects
      packets to the control VSI must have the tunnel type set to
      ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set,
      ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all
      profiles would be selected for this rule, resulting in the desired
      behavior. Without this change slow-path would not work with
      tunnel protocols.
      
      Fixes: 8b032a55 ("ice: low level support for tunnels")
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      b70bc066
    • net: ll_temac: check the return value of devm_kmalloc() · b352c346
      Xiaoke Wang authored
      devm_kmalloc() returns a pointer to allocated memory on success and
      NULL on failure. lp->indirect_lock is allocated by devm_kmalloc()
      but is used without checking the result. Check the returned pointer
      to prevent a potential invalid memory access.
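
      A userspace sketch of the check, assuming a hypothetical probe
      function and struct. The allocator is passed in as a function pointer
      purely so that the failure path can be exercised; the driver itself
      calls devm_kmalloc() directly.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's private data. */
struct lp_sketch {
	int *indirect_lock;
};

/* Allocate the lock and fail the probe cleanly if allocation fails. */
static int probe_sketch(struct lp_sketch *lp, void *(*alloc)(size_t))
{
	lp->indirect_lock = alloc(sizeof(*lp->indirect_lock));
	if (!lp->indirect_lock)
		return -ENOMEM;	/* never dereference a NULL result */

	*lp->indirect_lock = 0;	/* safe: the pointer was checked above */
	return 0;
}

/* Test helper that simulates an out-of-memory condition. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```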
      
      Fixes: f14f5c11 ("net: ll_temac: Support indirect_mutex share within TEMAC IP")
      Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b352c346
    • net-timestamp: convert sk->sk_tskey to atomic_t · a1cdec57
      Eric Dumazet authored
      UDP sendmsg() can be lockless, which causes all kinds
      of data races.
      
      This patch converts sk->sk_tskey to remove one of these races.
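
      The effect of such a conversion can be sketched in userspace with C11
      atomics. The struct and helper below are hypothetical and use
      <stdatomic.h> rather than the kernel's atomic_t API; they only show a
      read-and-increment performed as one indivisible step instead of a
      separate plain load and store.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket's timestamp key counter. */
struct sock_sketch {
	atomic_uint tskey;
};

/*
 * Return the key for this datagram and advance the counter atomically,
 * so two lockless senders can never observe the same value.
 */
static uint32_t next_tskey(struct sock_sketch *sk)
{
	return atomic_fetch_add_explicit(&sk->tskey, 1, memory_order_relaxed);
}
```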
      
      BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
      
      read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
       __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
       __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000054d -> 0x0000054e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1cdec57
    • sr9700: sanity check for packet length · e9da0b56
      Oliver Neukum authored
      A malicious device can leak heap data to user space by
      providing bogus frame lengths. Introduce a sanity check.
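
      A minimal sketch of this kind of check. The frame layout (a 3-byte
      header carrying a little-endian 16-bit length) and all names are
      assumptions made for illustration, not a description of the sr9700
      receive format.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN_SKETCH 3	/* hypothetical: status byte + 16-bit length */

/*
 * Return the payload length reported by the device, or -1 if that length
 * would reach past the end of the buffer that was actually received.
 */
static int frame_payload_len(const uint8_t *buf, size_t buf_len)
{
	size_t len;

	if (buf_len < HDR_LEN_SKETCH)
		return -1;

	len = (size_t)buf[1] | ((size_t)buf[2] << 8);
	if (len > buf_len - HDR_LEN_SKETCH)
		return -1;	/* device-supplied length is not trusted */

	return (int)len;
}
```
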
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9da0b56
    • net/sched: act_ct: Fix flow table lookup after ct clear or switching zones · 2f131de3
      Paul Blakey authored
      Flow table lookup is skipped if the packet either went through the
      ct clear action (which sets the IP_CT_UNTRACKED flag on the packet),
      or is switching zones while a connection is already associated with
      the packet. This results in no SW offload of the connection,
      and in the connection not being removed from the flow table on
      TCP teardown (fin/rst packet).
      
      To fix the above, remove these unnecessary checks in flow
      table lookup.
      
      Fixes: 46475bb2 ("net/sched: act_ct: Software offload of established flows")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f131de3
    • net-sysfs: add check for netdevice being present to speed_show · 4224cfd7
      suresh kumar authored
      When bringing down the netdevice or during system shutdown, a panic
      can be triggered while accessing the sysfs path because the device is
      already removed.
      
          [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
          [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
          ...
          [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
          [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
      
          crash> bt
          ...
          PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
          ...
           #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
              [exception RIP: dma_pool_alloc+0x1ab]
              RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
              RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
              RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
              RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
              R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
              R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
          #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
          #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
          #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
          #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
          #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
          #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
          #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
          #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
          #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
          #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
          #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
          #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
          #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
          #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
          #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
          #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
      
          crash> net_device.state ffff89443b0c0000
            state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
      
      To prevent this scenario, we also make sure that the netdevice is present.
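
      The guard can be sketched in userspace as below. The struct, its
      fields and the function are hypothetical stand-ins for the netdevice
      state checks; the sketch only shows refusing to query the driver
      unless the device is both running and present.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant netdevice state. */
struct netdev_sketch {
	bool running;
	bool present;
	int speed;
};

/* Format the link speed only if the device is running and still present. */
static int speed_show_sketch(const struct netdev_sketch *dev,
			     char *buf, size_t len)
{
	if (!(dev->running && dev->present))
		return -EINVAL;	/* do not call into a removed device */

	return snprintf(buf, len, "%d\n", dev->speed);
}
```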
      Signed-off-by: suresh kumar <suresh2514@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4224cfd7