1. 21 Feb, 2023 5 commits
    • netfilter: xt_length: use skb len to match in length_mt6 · 05c07c0c
      Xin Long authored
      For IPv6 Jumbo packets, ipv6_hdr(skb)->payload_len is always 0, and
      the real payload length (> 65535) is stored in the hbh exthdr. With a
      length of 0 for jumbo packets, the length match may give the wrong
      result.
      
      To fix this, we can just use skb->len instead of parsing exthdrs: the
      hbh exthdr has already been parsed in ip6_rcv_core() and
      br_validate_ipv6() before length_mt6() is reached, and the packet has
      been trimmed there according to the correct IPv6 (ext)hdr length, so
      skb->len can be trusted in length_mt6().
      
      Note that this patch is especially needed after the IPv6 BIG TCP was
      supported in kernel, which is using IPv6 Jumbo packets. Besides, to
      match the packets greater than 65535 more properly, a v1 revision of
      xt_length may be needed to extend "min, max" to u32 in the future,
      and for now the IPv6 Jumbo packets can be matched by:
      
        # ip6tables -m length ! --length 0:65535
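
A minimal userspace sketch of why the header field mismatches for jumbo packets while the skb length does not. The struct and function names here (pkt_model, length_match_hdr, length_match_skb) are hypothetical stand-ins for illustration, not the kernel's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40u /* size of the fixed IPv6 header */

/* Hypothetical stand-in for the two packet fields this sketch needs. */
struct pkt_model {
	uint16_t payload_len; /* IPv6 header field; 0 for jumbo packets */
	uint32_t skb_len;     /* total packet length as seen by the stack */
};

/* Old approach: derive the length from the IPv6 header field. */
bool length_match_hdr(const struct pkt_model *p, uint16_t min,
		      uint16_t max, bool invert)
{
	assert(p);
	uint32_t len = (uint32_t)p->payload_len + IPV6_HDR_LEN;

	return (len >= min && len <= max) ^ invert;
}

/* New approach: compare against the packet's total length. */
bool length_match_skb(const struct pkt_model *p, uint16_t min,
		      uint16_t max, bool invert)
{
	assert(p);
	uint32_t len = p->skb_len;

	return (len >= min && len <= max) ^ invert;
}
```

With a modelled jumbo packet (payload_len 0, skb_len 70000) and the inverted range 0:65535 from the rule above, the header-based variant sees a length of 40 and does not match, while the skb-based variant matches.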
      
      Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536")
      Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ebtables: fix table blob use-after-free · e58a171d
      Florian Westphal authored
      We are not allowed to return an error at this point.
      Looking at the code it looks like ret is always 0 at this
      point, but it's not.
      
      t = find_table_lock(net, repl->name, &ret, &ebt_mutex);
      
      ... this can return a valid table, with ret != 0.
      
      This bug causes update of table->private with the new
      blob, but then frees the blob right away in the caller.
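
The failure mode can be modelled in a few lines. This is a simplified sketch under stated assumptions: table_model, find_table_model and replace_blob_model are hypothetical names, and the real ebtables code is more involved:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a table and its private rule blob. */
struct table_model {
	const char *name;
	void *private_blob;
};

/*
 * Models a lookup that can hand back a valid table while *error
 * still holds a non-zero value, as the commit message describes.
 */
struct table_model *find_table_model(struct table_model *t, int stale_err,
				     int *error)
{
	*error = t ? stale_err : -ENOENT;
	return t;
}

/*
 * Installs newblob. Once the blob is installed the function must report
 * success, otherwise the caller would free a blob the table still uses.
 */
int replace_blob_model(struct table_model *t, void *newblob, int stale_err)
{
	int ret;
	struct table_model *found = find_table_model(t, stale_err, &ret);

	if (!found)
		return ret; /* genuine failure: nothing installed yet */

	found->private_blob = newblob; /* commit point */
	return 0; /* never propagate the stale ret from the lookup */
}
```

Returning the stale ret after the commit point would reproduce the bug in this model: the caller would treat the call as failed and free a blob that the table still references.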
      
      Syzbot report:
      
      BUG: KASAN: vmalloc-out-of-bounds in __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
      Read of size 4 at addr ffffc90005425000 by task kworker/u4:4/74
      Workqueue: netns cleanup_net
      Call Trace:
       kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
       __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
       ebt_unregister_table+0x35/0x40 net/bridge/netfilter/ebtables.c:1372
       ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
       cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
      ...
      
      ip(6)tables appears to be ok (ret should be 0 at this point) but make
      this more obvious.
      
      Fixes: c58dd2dd ("netfilter: Can't fail and free after table replacement")
      Reported-by: syzbot+f61594de72d6705aea03@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ip6t_rpfilter: Fix regression with VRF interfaces · efb056e5
      Phil Sutter authored
      When calling ip6_route_lookup() for the packet arriving on the VRF
      interface, the result is always the real (slave) interface. Expect this
      when validating the result.
      
      Fixes: acc641ab ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: fix rmmod double-free race · e6d57e9f
      Florian Westphal authored
      nf_conntrack_hash_check_insert() callers free the ct entry directly, via
      nf_conntrack_free.
      
      This isn't safe anymore because
      nf_conntrack_hash_check_insert() might place the entry into the conntrack
      table and then delete the entry again because it found that a conntrack
      extension has been removed at the same time.
      
      In this case, the just-added entry is removed again and an error is
      returned to the caller.
      
      Problem is that another cpu might have picked up this entry and
      incremented its reference count.
      
      This results in a use-after-free/double-free, once by the other cpu and
      once by the caller of nf_conntrack_hash_check_insert().
      
      Fix this by making nf_conntrack_hash_check_insert() not fail anymore
      after the insertion, just like before the 'Fixes' commit.
      
      This is safe because a racing nf_ct_iterate() has to wait for us
      to release the conntrack hash spinlocks.
      
      While at it, make the function return -EAGAIN in the rmmod (genid
      changed) case, this makes nfnetlink replay the command (suggested
      by Pablo Neira).
      
      Fixes: c56716c6 ("netfilter: extensions: introduce extension genid count")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack() · ac489398
      Hangyu Hua authored
      nf_ct_put() needs to be called to release the reference taken by
      nf_conntrack_find_get(), to avoid a refcount leak when
      nf_conntrack_hash_check_insert() fails.
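
The get/put pairing on the error path can be sketched as follows. The types and helpers (ref_model, ref_get, ref_put, create_with_master) are hypothetical and only model the reference counting, not the conntrack API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for the master ct. */
struct ref_model {
	int refcnt;
};

void ref_get(struct ref_model *r)
{
	assert(r);
	r->refcnt++;
}

void ref_put(struct ref_model *r)
{
	assert(r && r->refcnt > 0);
	r->refcnt--;
}

/*
 * Takes a reference on master, then attempts an insertion that the
 * caller can force to fail. On failure the reference is dropped again,
 * which is the step that was missing before the fix.
 */
int create_with_master(struct ref_model *master, int insert_fails)
{
	ref_get(master); /* models the reference from the lookup */

	if (insert_fails) {
		ref_put(master); /* models the added put on the error path */
		return -1;
	}

	return 0; /* on success the new entry keeps the reference */
}
```

In this model a failed insertion leaves the master's count unchanged, while a successful one raises it by one for the lifetime of the new entry.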
      
      Fixes: 7d367e06 ("netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 09 Feb, 2023 1 commit
  3. 07 Feb, 2023 3 commits
    • selftests: ocelot: tc_flower_chains: make test_vlan_ingress_modify() more comprehensive · bbb253b2
      Vladimir Oltean authored
      We have two IS1 filters of the OCELOT_VCAP_KEY_ANY key type (the one with
      "action vlan pop" and the one with "action vlan modify") and one of the
      OCELOT_VCAP_KEY_IPV4 key type (the one with "action skbedit priority").
      But we have no IS1 filter with the OCELOT_VCAP_KEY_ETYPE key type, and
      there was an uncaught breakage there.
      
      To increase test coverage, convert one of the OCELOT_VCAP_KEY_ANY
      filters to OCELOT_VCAP_KEY_ETYPE, by making the filter also match on the
      MAC SA of the traffic sent by mausezahn, $h1_mac.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230205192409.1796428-2-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q" · f964f839
      Vladimir Oltean authored
      Alternative short title: don't instruct the hardware to match on
      EtherType with "protocol 802.1Q" flower filters. It doesn't work for the
      reasons detailed below.
      
      With a command such as the following:
      
      tc filter add dev $swp1 ingress chain $(IS1 2) pref 3 \
      	protocol 802.1Q flower skip_sw vlan_id 200 src_mac $h1_mac \
      	action vlan modify id 300 \
      	action goto chain $(IS2 0 0)
      
      the created filter is set by ocelot_flower_parse_key() to be of type
      OCELOT_VCAP_KEY_ETYPE, and etype is set to {value=0x8100, mask=0xffff}.
      This gets propagated all the way to is1_entry_set() which commits it to
      hardware (the VCAP_IS1_HK_ETYPE field of the key). Compare this to the
      case where src_mac isn't specified - the key type is OCELOT_VCAP_KEY_ANY,
      and is1_entry_set() doesn't populate VCAP_IS1_HK_ETYPE.
      
      The problem is that for VLAN-tagged frames, the hardware interprets the
      ETYPE field as holding the encapsulated VLAN protocol. So the above
      filter will only match those packets which have an encapsulated protocol
      of 0x8100, rather than all packets with VLAN ID 200 and the given src_mac.
      
      The reason why this is allowed to occur is because, although we have a
      block of code in ocelot_flower_parse_key() which sets "match_protocol"
      to false when VLAN keys are present, that code executes too late.
      There is another block of code, which executes for Ethernet addresses,
      and has a "goto finished_key_parsing" and skips the VLAN header parsing.
      By skipping it, "match_protocol" remains with the value it was
      initialized with, i.e. "true", and "proto" is set to f->common.protocol,
      or 0x8100.
      
      The concept of ignoring some keys rather than erroring out when they are
      present but can't be offloaded is dubious in itself, but is present
      since the initial commit fe3490e6 ("net: mscc: ocelot: Hardware
      ofload for tc flower filter"), and it's outside of the scope of this
      patch to change that.
      
      The problem was introduced when the driver started to interpret the
      flower filter's protocol, and populate the VCAP filter's ETYPE field
      based on it.
      
      To fix this, it is sufficient to move the code that parses the VLAN keys
      earlier than the "goto finished_key_parsing" instruction. This will
      ensure that if we have a flower filter with both VLAN and Ethernet
      address keys, it won't match on ETYPE 0x8100, because the VLAN key
      parsing sets "match_protocol = false".
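
The ordering problem can be reproduced in a small model. This is a sketch, not the driver's code: flower_model, key_model and parse_key_model are hypothetical, and the vlan_first flag merely selects between the old and the fixed ordering:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of which keys a flower filter carries. */
struct flower_model {
	bool has_vlan;
	bool has_eth_addrs;
	uint16_t protocol; /* e.g. 0x8100 for "protocol 802.1Q" */
};

/* Hypothetical result: whether the hardware key matches on ETYPE. */
struct key_model {
	bool match_etype;
	uint16_t etype;
};

void parse_key_model(const struct flower_model *f, bool vlan_first,
		     struct key_model *out)
{
	bool match_protocol = true;

	assert(f && out);

	if (vlan_first && f->has_vlan)
		match_protocol = false; /* fixed order: VLAN keys first */

	if (f->has_eth_addrs)
		goto finished_key_parsing; /* skips everything below */

	if (!vlan_first && f->has_vlan)
		match_protocol = false; /* old order: may never run */

finished_key_parsing:
	out->match_etype = match_protocol;
	out->etype = match_protocol ? f->protocol : 0;
}
```

For a filter with both VLAN and Ethernet address keys, the old ordering leaves match_etype set with etype 0x8100, while the fixed ordering clears it.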
      
      Fixes: 86b956de ("net: mscc: ocelot: support matching on EtherType")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230205192409.1796428-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-aware · 0b6d6425
      Vladimir Oltean authored
      Frank reports that in a mt7530 setup where some ports are standalone and
      some are in a VLAN-aware bridge, 8021q uppers of the standalone ports
      lose their VLAN tag on xmit, as seen by the link partner.
      
      This seems to occur because once the other ports join the VLAN-aware
      bridge, mt7530_port_vlan_filtering() also calls
      mt7530_port_set_vlan_aware(ds, cpu_dp->index), and this affects the way
      that the switch processes the traffic of the standalone port.
      
      Relevant is the PVC_EG_TAG bit. The MT7530 documentation says about it:
      
      EG_TAG: Incoming Port Egress Tag VLAN Attribution
      0: disabled (system default)
      1: consistent (keep the original ingress tag attribute)
      
      My interpretation is that this setting applies on the ingress port, and
      "disabled" is basically the normal behavior, where the egress tag format
      of the packet (tagged or untagged) is decided by the VLAN table
      (MT7530_VLAN_EGRESS_UNTAG or MT7530_VLAN_EGRESS_TAG).
      
      But there is also an option of overriding the system default behavior,
      and for the egress tagging format of packets to be decided not by the
      VLAN table, but simply by copying the ingress tag format (if ingress was
      tagged, egress is tagged; if ingress was untagged, egress is untagged;
      aka "consistent"). This is useful in 2 scenarios:
      
      - VLAN-unaware bridge ports will always encounter a miss in the VLAN
        table. They should forward a packet as-is, though. So we use
        "consistent" there. See commit e045124e ("net: dsa: mt7530: fix
        tagged frames pass-through in VLAN-unaware mode").
      
      - Traffic injected from the CPU port. The operating system is in god
        mode; if it wants a packet to exit as VLAN-tagged, it sends it as
        VLAN-tagged. Otherwise it sends it as VLAN-untagged*.
      
      *This is true only if we don't consider the bridge TX forwarding offload
      feature, which mt7530 doesn't support.
      
      So for now, make the CPU port always stay in "consistent" mode to allow
      software VLANs to be forwarded to their egress ports with the VLAN tag
      intact, and not stripped.
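
The interpretation of EG_TAG given above can be written down as a small decision function. This is a sketch of that interpretation only; the enum and function names are hypothetical and no register access is modelled:

```c
#include <assert.h>
#include <stdbool.h>

/* The two EG_TAG settings quoted from the documentation above. */
enum eg_tag_mode {
	EG_TAG_DISABLED = 0,   /* egress format decided by the VLAN table */
	EG_TAG_CONSISTENT = 1, /* egress format copies the ingress format */
};

/*
 * Returns true when the frame leaves the egress port VLAN-tagged.
 * ingress_mode is the EG_TAG setting of the port the frame came in on.
 */
bool egress_is_tagged(enum eg_tag_mode ingress_mode, bool ingress_tagged,
		      bool vlan_table_says_tagged)
{
	assert(ingress_mode == EG_TAG_DISABLED ||
	       ingress_mode == EG_TAG_CONSISTENT);

	if (ingress_mode == EG_TAG_CONSISTENT)
		return ingress_tagged;

	return vlan_table_says_tagged;
}
```

With the CPU port as the ingress port in "consistent" mode, a frame the operating system sends tagged stays tagged even when the VLAN table says untagged; in "disabled" mode the same frame would leave untagged.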
      
      Link: https://lore.kernel.org/netdev/trinity-e6294d28-636c-4c40-bb8b-b523521b00be-1674233135062@3c-app-gmx-bs36/
      Fixes: e045124e ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode")
      Reported-by: Frank Wunderlich <frank-w@public-files.de>
      Tested-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230205140713.1609281-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 06 Feb, 2023 4 commits
    • net: USB: Fix wrong-direction WARNING in plusb.c · 811d5811
      Alan Stern authored
      The syzbot fuzzer detected a bug in the plusb network driver: A
      zero-length control-OUT transfer was treated as a read instead of a
      write.  In modern kernels this error provokes a WARNING:
      
      usb 1-1: BOGUS control dir, pipe 80000280 doesn't match bRequestType c0
      WARNING: CPU: 0 PID: 4645 at drivers/usb/core/urb.c:411
      usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411
      Modules linked in:
      CPU: 1 PID: 4645 Comm: dhcpcd Not tainted
      6.2.0-rc6-syzkaller-00050-g9f266cca #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
      01/12/2023
      RIP: 0010:usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411
      ...
      Call Trace:
       <TASK>
       usb_start_wait_urb+0x101/0x4b0 drivers/usb/core/message.c:58
       usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
       usb_control_msg+0x320/0x4a0 drivers/usb/core/message.c:153
       __usbnet_read_cmd+0xb9/0x390 drivers/net/usb/usbnet.c:2010
       usbnet_read_cmd+0x96/0xf0 drivers/net/usb/usbnet.c:2068
       pl_vendor_req drivers/net/usb/plusb.c:60 [inline]
       pl_set_QuickLink_features drivers/net/usb/plusb.c:75 [inline]
       pl_reset+0x2f/0xf0 drivers/net/usb/plusb.c:85
       usbnet_open+0xcc/0x5d0 drivers/net/usb/usbnet.c:889
       __dev_open+0x297/0x4d0 net/core/dev.c:1417
       __dev_change_flags+0x587/0x750 net/core/dev.c:8530
       dev_change_flags+0x97/0x170 net/core/dev.c:8602
       devinet_ioctl+0x15a2/0x1d70 net/ipv4/devinet.c:1147
       inet_ioctl+0x33f/0x380 net/ipv4/af_inet.c:979
       sock_do_ioctl+0xcc/0x230 net/socket.c:1169
       sock_ioctl+0x1f8/0x680 net/socket.c:1286
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The fix is to call usbnet_write_cmd() instead of usbnet_read_cmd() and
      remove the USB_DIR_IN flag.
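
A simplified model of the direction check behind the WARNING, assuming that the USB core treats a control transfer as OUT unless the IN bit is set and the data stage is non-empty. The function names are hypothetical; only the USB_DIR_IN value (0x80) is the standard one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USB_DIR_IN 0x80 /* direction bit in bRequestType */

/* Direction a control transfer is expected to have (true means OUT). */
bool control_expected_out(uint8_t bRequestType, uint16_t wLength)
{
	return !(bRequestType & USB_DIR_IN) || wLength == 0;
}

/* True when the pipe direction disagrees with the setup packet. */
bool control_dir_mismatch(bool pipe_is_out, uint8_t bRequestType,
			  uint16_t wLength)
{
	return pipe_is_out != control_expected_out(bRequestType, wLength);
}

/* Walks through the situation described in the report above. */
void control_dir_selfcheck(void)
{
	/* Before: receive pipe, bRequestType c0, zero-length data stage. */
	assert(control_dir_mismatch(false, 0xc0, 0));
	/* After: send pipe, USB_DIR_IN removed (0x40), zero length. */
	assert(!control_dir_mismatch(true, 0x40, 0));
}
```

In this model a zero-length request is always expected on a send pipe, which is why switching to the write helper and dropping USB_DIR_IN removes the mismatch.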
      
      Reported-and-tested-by: syzbot+2a0e7abd24f1eb90ce25@syzkaller.appspotmail.com
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Fixes: 090ffa9d ("[PATCH] USB: usbnet (9/9) module for pl2301/2302 cables")
      CC: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/00000000000052099f05f3b3e298@google.com/
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: fix PTP init/deinit not checking all ports · d7d94b26
      Casper Andersson authored
      Check all ports instead of just port_count ports. PTP init was only
      checking ports 0 to port_count. If the hardware ports are not mapped
      starting from 0 then they would be missed, e.g. if only ports 20-30 were
      mapped it would attempt to init ports 0-10, resulting in NULL pointers
      when attempting to timestamp. Now it will init all mapped ports.
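
The difference between the two loops can be sketched with a sparse port table. MODEL_MAX_PORTS, port_model and switch_model are hypothetical and chosen for illustration; they are not the driver's definitions:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PORTS 64 /* hypothetical number of hardware port slots */

struct port_model {
	int ptp_ready;
};

struct switch_model {
	struct port_model *ports[MODEL_MAX_PORTS]; /* NULL when unmapped */
	int port_count;                            /* number of mapped ports */
};

/* Old behaviour: only slots 0..port_count-1 are visited. */
int ptp_init_first_n(struct switch_model *sw)
{
	int done = 0;

	assert(sw);
	for (int i = 0; i < sw->port_count; i++) {
		if (!sw->ports[i])
			continue;
		sw->ports[i]->ptp_ready = 1;
		done++;
	}
	return done;
}

/* New behaviour: every slot is visited and unmapped ones are skipped. */
int ptp_init_all(struct switch_model *sw)
{
	int done = 0;

	assert(sw);
	for (int i = 0; i < MODEL_MAX_PORTS; i++) {
		if (!sw->ports[i])
			continue;
		sw->ports[i]->ptp_ready = 1;
		done++;
	}
	return done;
}
```

With eleven ports mapped at slots 20 to 30, the first loop initialises none of them, while the second initialises all eleven.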
      
      Fixes: 70dfe25c ("net: sparx5: Update extraction/injection for timestamping")
      Signed-off-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • uapi: add missing ip/ipv6 header dependencies for linux/stddef.h · 03702d4d
      Herton R. Krzesinski authored
      Since commit 58e0be1e ("net: use struct_group to copy ip/ipv6
      header addresses"), ip and ipv6 headers started to use the __struct_group
      definition, which is defined at include/uapi/linux/stddef.h. However,
      linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h,
      which breaks build of xskxceiver bpf selftest if you install the uapi
      headers in the system:
      
      $ make V=1 xskxceiver -C tools/testing/selftests/bpf
      ...
      make: Entering directory '(...)/tools/testing/selftests/bpf'
      gcc -g -O0 -rdynamic -Wall -Werror (...)
      In file included from xskxceiver.c:79:
      /usr/include/linux/ip.h:103:9: error: expected specifier-qualifier-list before ‘__struct_group’
        103 |         __struct_group(/* no tag */, addrs, /* no attrs */,
            |         ^~~~~~~~~~~~~~
      ...
      
      Include the missing <linux/stddef.h> dependency in ip.h and do the
      same for the ipv6.h header.
      
      Fixes: 58e0be1e ("net: use struct_group to copy ip/ipv6 header addresses")
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Reviewed-by: Carlos O'Donell <carlos@redhat.com>
      Tested-by: Carlos O'Donell <carlos@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: make sure used and confirmed times are valid · c1d2ecdf
      Julian Anastasov authored
      Entries can linger in the cache without a timer for days, thanks to
      the gc_thresh1 limit. As a result, without traffic, the confirmed
      time can become outdated and appear to be in the future. Later,
      on traffic, NUD_STALE entries can switch to NUD_DELAY and start
      the timer, which can see the invalid confirmed time and wrongly
      switch to NUD_REACHABLE state instead of NUD_PROBE. As a result,
      the timer is set many days in the future. This is more visible on
      32-bit platforms, with higher HZ values.
      
      Why is this a problem? While we expect unused entries to expire,
      such entries stay in REACHABLE state for too long, locked in the
      cache. They are not expired normally, only when the cache is full.
      
      Problem and the wrong state change reported by Zhang Changzhong:
      
      172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE
      
      350520 seconds have elapsed since this entry was last updated, but it is
      still in the REACHABLE state (base_reachable_time_ms is 30000),
      preventing lladdr from being updated through probe.
      
      Fix it by ensuring timer is started with valid used/confirmed
      times. Considering the valid time range is LONG_MAX jiffies,
      we try not to go too much in the past while we are in
      DELAY/PROBE state. There are also places that need
      used/updated times to be validated while timer is not running.
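
The wraparound effect can be illustrated with a 32-bit tick counter. This is a sketch: time_after32_model mirrors the idea of the kernel's time_after() comparison, while clamp_stamp is a hypothetical helper and not the logic of the actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when tick value a is later than b. Only meaningful while the
 * two values are less than 2^31 ticks apart; beyond that, an old
 * timestamp starts to look as if it were in the future.
 */
bool time_after32_model(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical helper: keep stamp within [now - max_age, now], in the
 * wraparound sense, so that later comparisons stay valid.
 */
uint32_t clamp_stamp(uint32_t stamp, uint32_t now, uint32_t max_age)
{
	assert(max_age < 0x80000000u);

	if (time_after32_model(stamp, now))
		return now - max_age; /* too old: it wrapped into the future */
	if (time_after32_model(now - max_age, stamp))
		return now - max_age; /* older than the allowed window */
	return stamp;
}
```

A timestamp that is more than 2^31 ticks old compares as later than the current tick, which corresponds to the outdated confirmed time described above; clamping it before the timer starts restores a consistent ordering.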
      
      Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 04 Feb, 2023 7 commits
  6. 03 Feb, 2023 1 commit
  7. 02 Feb, 2023 19 commits