1. 05 Apr, 2022 4 commits
  2. 04 Apr, 2022 11 commits
  3. 03 Apr, 2022 2 commits
  4. 02 Apr, 2022 2 commits
  5. 01 Apr, 2022 17 commits
    • David S. Miller's avatar
      Merge branch 'nexthop-route-deletye-warning' · 37391cc8
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: ipv4: fix nexthop route delete warning
      
      The first patch fixes a warning that can be triggered by deleting a
      nexthop route and specifying a device (more info in its commit msg).
      And the second patch adds a selftest for that case.
      
      Chose this way to fix it because we should match when deleting without
      nh spec and should fail when deleting a nexthop route with old-style nh
      spec because nexthop objects are managed separately, e.g.:
      $ ip r show 1.2.3.4/32
      1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
      
      $ ip r del 1.2.3.4/32
      $ ip r del 1.2.3.4/32 nhid 12
      <both should work>
      
      $ ip r del 1.2.3.4/32 dev dummy0
      <should fail with ESRCH>
      
      v2: addded more to patch 01's commit message
          adjusted the test comment in patch 02
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      37391cc8
    • Nikolay Aleksandrov's avatar
      selftests: net: add delete nexthop route warning test · 392baa33
      Nikolay Aleksandrov authored
      Add a test which causes a WARNING on kernels which treat a
      nexthop route like a normal route when comparing for deletion and a
      device is specified. That is, a route is found but we hit a warning while
      matching it. The warning is from fib_info_nh() in include/net/nexthop.h
      because we run it on a fib_info with nexthop object. The call chain is:
       inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
      nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
      the fib_info and triggering the warning).
      
      Repro steps:
       $ ip nexthop add id 12 via 172.16.1.3 dev veth1
       $ ip route add 172.16.101.1/32 nhid 12
       $ ip route delete 172.16.101.1/32 dev veth1
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      392baa33
    • Nikolay Aleksandrov's avatar
      net: ipv4: fix route with nexthop object delete warning · 6bf92d70
      Nikolay Aleksandrov authored
      FRR folks have hit a kernel warning[1] while deleting routes[2] which is
      caused by trying to delete a route pointing to a nexthop id without
      specifying nhid but matching on an interface. That is, a route is found
      but we hit a warning while matching it. The warning is from
      fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
      with nexthop object. The call chain is:
       inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
      nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
      the fib_info and triggering the warning). The fix is to not do any
      matching in that branch if the fi has a nexthop object because those are
      managed separately. I.e. we should match when deleting without nh spec and
      should fail when deleting a nexthop route with old-style nh spec because
      nexthop objects are managed separately, e.g.:
       $ ip r show 1.2.3.4/32
       1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
      
       $ ip r del 1.2.3.4/32
       $ ip r del 1.2.3.4/32 nhid 12
       <both should work>
      
       $ ip r del 1.2.3.4/32 dev dummy0
       <should fail with ESRCH>
      
      [1]
       [  523.462226] ------------[ cut here ]------------
       [  523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
       [  523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
       [  523.462274]  videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
       [  523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P           OE     5.16.18-200.fc35.x86_64 #1
       [  523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
       [  523.462303] RIP: 0010:fib_nh_match+0x210/0x460
       [  523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
       [  523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
       [  523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
       [  523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
       [  523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
       [  523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
       [  523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
       [  523.462311] FS:  00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
       [  523.462313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
       [  523.462315] Call Trace:
       [  523.462316]  <TASK>
       [  523.462320]  fib_table_delete+0x1a9/0x310
       [  523.462323]  inet_rtm_delroute+0x93/0x110
       [  523.462325]  rtnetlink_rcv_msg+0x133/0x370
       [  523.462327]  ? _copy_to_iter+0xb5/0x6f0
       [  523.462330]  ? rtnl_calcit.isra.0+0x110/0x110
       [  523.462331]  netlink_rcv_skb+0x50/0xf0
       [  523.462334]  netlink_unicast+0x211/0x330
       [  523.462336]  netlink_sendmsg+0x23f/0x480
       [  523.462338]  sock_sendmsg+0x5e/0x60
       [  523.462340]  ____sys_sendmsg+0x22c/0x270
       [  523.462341]  ? import_iovec+0x17/0x20
       [  523.462343]  ? sendmsg_copy_msghdr+0x59/0x90
       [  523.462344]  ? __mod_lruvec_page_state+0x85/0x110
       [  523.462348]  ___sys_sendmsg+0x81/0xc0
       [  523.462350]  ? netlink_seq_start+0x70/0x70
       [  523.462352]  ? __dentry_kill+0x13a/0x180
       [  523.462354]  ? __fput+0xff/0x250
       [  523.462356]  __sys_sendmsg+0x49/0x80
       [  523.462358]  do_syscall_64+0x3b/0x90
       [  523.462361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [  523.462364] RIP: 0033:0x7f24552aa337
       [  523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
       [  523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       [  523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
       [  523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
       [  523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
       [  523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
       [  523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
       [  523.462373]  </TASK>
       [  523.462374] ---[ end trace ba537bc16f6bf4ed ]---
      
      [2] https://github.com/FRRouting/frr/issues/6412
      
      Fixes: 4c7e8084 ("ipv4: Plumb support for nexthop object in a fib_info")
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6bf92d70
    • Randy Dunlap's avatar
      net: micrel: fix KS8851_MLL Kconfig · c3efcedd
      Randy Dunlap authored
      KS8851_MLL selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL,
      so make KS8851_MLL also depend on PTP_1588_CLOCK_OPTIONAL since
      'select' does not follow any dependency chains.
      
      Fixes kconfig warning and build errors:
      
      WARNING: unmet direct dependencies detected for MICREL_PHY
        Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
        Selected by [y]:
        - KS8851_MLL [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && HAS_IOMEM [=y]
      
      ld: drivers/net/phy/micrel.o: in function `lan8814_ts_info':
      micrel.c:(.text+0xb35): undefined reference to `ptp_clock_index'
      ld: drivers/net/phy/micrel.o: in function `lan8814_probe':
      micrel.c:(.text+0x2586): undefined reference to `ptp_clock_register'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3efcedd
    • David S. Miller's avatar
      Merge branch 'MCTP-fixes' · f41bdd49
      David S. Miller authored
      Matt Johnston says:
      
      ====================
      MCTP fixes
      
      The following are fixes for the mctp core and mctp-i2c driver.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f41bdd49
    • Matt Johnston's avatar
      mctp: Use output netdev to allocate skb headroom · 4a9dda1c
      Matt Johnston authored
      Previously the skb was allocated with headroom MCTP_HEADER_MAXLEN,
      but that isn't sufficient if we are using devs that are not MCTP
      specific.
      
      This also adds a check that the smctp_halen provided to sendmsg for
      extended addressing is the correct size for the netdev.
      
      Fixes: 833ef3b9 ("mctp: Populate socket implementation")
      Reported-by: default avatarMatthew Rinaldi <mjrinal@g.clemson.edu>
      Signed-off-by: default avatarMatt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4a9dda1c
    • Matt Johnston's avatar
      mctp i2c: correct mctp_i2c_header_create result · 8ce40a2f
      Matt Johnston authored
      header_ops.create should return the length of the header,
      instead mctp_i2c_head_create() returned 0.
      This didn't cause any problem because the MCTP stack accepted
      0 as success.
      
      Fixes: f5b8abf9 ("mctp i2c: MCTP I2C binding driver")
      Signed-off-by: default avatarMatt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ce40a2f
    • Matt Johnston's avatar
      mctp: Fix check for dev_hard_header() result · 60be976a
      Matt Johnston authored
      dev_hard_header() returns the length of the header, so
      we need to test for negative errors rather than non-zero.
      
      Fixes: 889b7da2 ("mctp: Add initial routing framework")
      Signed-off-by: default avatarMatt Johnston <matt@codeconstruct.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      60be976a
    • David S. Miller's avatar
      Merge branch 'ice-fixups' · 4298a62f
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      ice-fixups
      
      This series handles a handful of cleanups for the ice
      driver.  Ivan fixed a problem on the VSI during a release,
      fixing a MAC address setting, and a broken IFF_ALLMULTI
      handling.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4298a62f
    • Ivan Vecera's avatar
      ice: Fix broken IFF_ALLMULTI handling · 1273f895
      Ivan Vecera authored
      Handling of all-multicast flag and associated multicast promiscuous
      mode is broken in ice driver. When an user switches allmulticast
      flag on or off the driver checks whether any VLANs are configured
      over the interface (except default VLAN 0).
      
      If any extra VLANs are registered it enables multicast promiscuous
      mode for all these VLANs (including default VLAN 0) using
      ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
      multicast packets tagged with known VLAN ID or untagged are received
      and multicast packets tagged with unknown VLAN ID ignored.
      
      If no extra VLANs are registered (so only VLAN 0 exists) it enables
      multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC
      look-up type. In this situation any multicast packets including
      tagged ones are received.
      
      The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:
      
      ice_vsi_sync_fltr() {
        ...
        if (changed_flags & IFF_ALLMULTI) {
          if (netdev->flags & IFF_ALLMULTI) {
            if (vsi->num_vlans > 1)
              ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
            else
              ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
          } else {
            if (vsi->num_vlans > 1)
              ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
            else
              ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
          }
        }
        ...
      }
      
      The code above depends on value vsi->num_vlan that specifies number
      of VLANs configured over the interface (including VLAN 0) and
      this is problem because that value is modified in NDO callbacks
      ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().
      
      Scenario 1:
      1. ip link set ens7f0 allmulticast on
      2. ip link add vlan10 link ens7f0 type vlan id 10
      3. ip link set ens7f0 allmulticast off
      4. ip link set ens7f0 allmulticast on
      
      [1] In this scenario IFF_ALLMULTI is enabled and the driver calls
          ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs
          multicast promisc rule with non-VLAN look-up type.
      [2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2
      [3] Command switches IFF_ALLMULTI off and the driver calls
          ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this
          call is effectively NOP because it looks for multicast promisc
          rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such
          rules exist. So the all-multicast remains enabled silently
          in hardware.
      [4] Command tries to switch IFF_ALLMULTI on and the driver calls
          ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call
          fails (-EEXIST) because non-VLAN multicast promisc rule already
          exists.
      
      Scenario 2:
      1. ip link add vlan10 link ens7f0 type vlan id 10
      2. ip link set ens7f0 allmulticast on
      3. ip link add vlan20 link ens7f0 type vlan id 20
      4. ip link del vlan10 ; ip link del vlan20
      5. ip link set ens7f0 allmulticast off
      
      [1] VLAN with ID 10 is added and vsi->num_vlan==2
      [2] Command switches IFF_ALLMULTI on and driver installs multicast
          promisc rules with VLAN look-up type for VLAN 0 and 10
      [3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast
          promisc rules is added for this new VLAN so the interface does
          not receive MC packets from VLAN 20
      [4] Both VLANs are removed but multicast rule for VLAN 10 remains
          installed so interface receives multicast packets from VLAN 10
      [5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1
          the driver tries to remove multicast promisc rule for VLAN 0
          with non-VLAN look-up that does not exist.
          All-multicast looks disabled from user point of view but it
          is partially enabled in HW (interface receives all multicast
          packets either untagged or tagged with VLAN ID 10)
      
      To resolve these issues the patch introduces these changes:
      1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and
         ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed
         and IFF_ALLMULTI is enabled an appropriate multicast promisc
         rule for that VLAN ID is added/removed.
      2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added
         so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up
         type for existing multicast promisc rule for VLAN 0 is updated
         to ICE_MCAST_VLAN_PROMISC_BITS.
      3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed
         so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up
         type for existing multicast promisc rule for VLAN 0 is updated
         to ICE_MCAST_PROMISC_BITS.
      4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
         bit protection to avoid races with ice_vsi_sync_fltr() that runs
         in ice_service_task() context.
      5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed.
      6. Error messages added to ice_fltr_*_vsi_promisc() helper functions
         to avoid them in their callers
      7. Small improvements to increase readability
      
      Fixes: 5eda8afd ("ice: Add support for PF/VF promiscuous mode")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarAlice Michael <alice.michael@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1273f895
    • Ivan Vecera's avatar
      ice: Fix MAC address setting · 2c0069f3
      Ivan Vecera authored
      Commit 2ccc1c1c ("ice: Remove excess error variables") merged
      the usage of 'status' and 'err' variables into single one in
      function ice_set_mac_address(). Unfortunately this causes
      a regression when call of ice_fltr_add_mac() returns -EEXIST because
      this return value does not indicate an error in this case but
      value of 'err' remains to be -EEXIST till the end of the function
      and is returned to caller.
      
      Prior mentioned commit this does not happen because return value of
      ice_fltr_add_mac() was stored to 'status' variable first and
      if it was -EEXIST then 'err' remains to be zero.
      
      Fix the problem by reset 'err' to zero when ice_fltr_add_mac()
      returns -EEXIST.
      
      Fixes: 2ccc1c1c ("ice: Remove excess error variables")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Acked-by: default avatarAlexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: default avatarAlice Michael <alice.michael@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c0069f3
    • Ivan Vecera's avatar
      ice: Clear default forwarding VSI during VSI release · bd8c624c
      Ivan Vecera authored
      VSI is set as default forwarding one when promisc mode is set for
      PF interface, when PF is switched to switchdev mode or when VF
      driver asks to enable allmulticast or promisc mode for the VF
      interface (when vf-true-promisc-support priv flag is off).
      The third case is buggy because in that case VSI associated with
      VF remains as default one after VF removal.
      
      Reproducer:
      1. Create VF
         echo 1 > sys/class/net/ens7f0/device/sriov_numvfs
      2. Enable allmulticast or promisc mode on VF
         ip link set ens7f0v0 allmulticast on
         ip link set ens7f0v0 promisc on
      3. Delete VF
         echo 0 > sys/class/net/ens7f0/device/sriov_numvfs
      4. Try to enable promisc mode on PF
         ip link set ens7f0 promisc on
      
      Although it looks that promisc mode on PF is enabled the opposite
      is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC
      handling first checks if any other VSI is set as default forwarding
      one and if so the function does not do anything. At this point
      it is not possible to enable promisc mode on PF without re-probe
      device.
      
      To resolve the issue this patch clear default forwarding VSI
      during ice_vsi_release() when the VSI to be released is the default
      one.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Reviewed-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: default avatarAlice Michael <alice.michael@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bd8c624c
    • Vladimir Oltean's avatar
      Revert "net: dsa: stop updating master MTU from master.c" · 066dfc42
      Vladimir Oltean authored
      This reverts commit a1ff94c2.
      
      Switch drivers that don't implement ->port_change_mtu() will cause the
      DSA master to remain with an MTU of 1500, since we've deleted the other
      code path. In turn, this causes a regression for those systems, where
      MTU-sized traffic can no longer be terminated.
      
      Revert the change taking into account the fact that rtnl_lock() is now
      taken top-level from the callers of dsa_master_setup() and
      dsa_master_teardown(). Also add a comment in order for it to be
      absolutely clear why it is still needed.
      
      Fixes: a1ff94c2 ("net: dsa: stop updating master MTU from master.c")
      Reported-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarLuiz Angelo Daros de Luca <luizluca@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      066dfc42
    • Jean-Philippe Brucker's avatar
      skbuff: fix coalescing for page_pool fragment recycling · 1effe8ca
      Jean-Philippe Brucker authored
      Fix a use-after-free when using page_pool with page fragments. We
      encountered this problem during normal RX in the hns3 driver:
      
      (1) Initially we have three descriptors in the RX queue. The first one
          allocates PAGE1 through page_pool, and the other two allocate one
          half of PAGE2 each. Page references look like this:
      
                      RX_BD1 _______ PAGE1
                      RX_BD2 _______ PAGE2
                      RX_BD3 _________/
      
      (2) Handle RX on the first descriptor. Allocate SKB1, eventually added
          to the receive queue by tcp_queue_rcv().
      
      (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
          netif_receive_skb():
      
          netif_receive_skb(SKB2)
            ip_rcv(SKB2)
              SKB3 = skb_clone(SKB2)
      
          SKB2 and SKB3 share a reference to PAGE2 through
          skb_shinfo()->dataref. The other ref to PAGE2 is still held by
          RX_BD3:
      
                            SKB2 ---+- PAGE2
                            SKB3 __/   /
                      RX_BD3 _________/
      
       (3b) Now while handling TCP, coalesce SKB3 with SKB1:
      
            tcp_v4_rcv(SKB3)
              tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
              kfree_skb_partial(SKB3)
                skb_release_data(SKB3)                // drops one dataref
      
                            SKB1 _____ PAGE1
                                 \____
                            SKB2 _____ PAGE2
                                       /
                      RX_BD3 _________/
      
          In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
          PAGE2, where it should instead have increased the page_pool frag
          reference, pp_frag_count. Without coalescing, when releasing both
          SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
          when releasing SKB1 and SKB2, two references to PAGE2 will be
          dropped, resulting in underflow.
      
       (3c) Drop SKB2:
      
            af_packet_rcv(SKB2)
              consume_skb(SKB2)
                skb_release_data(SKB2)                // drops second dataref
                  page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
      
                            SKB1 _____ PAGE1
                                 \____
                                       PAGE2
                                       /
                      RX_BD3 _________/
      
      (4) Userspace calls recvmsg()
          Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
          release the SKB3 page as well:
      
          tcp_eat_recv_skb(SKB1)
            skb_release_data(SKB1)
              page_pool_return_skb_page(PAGE1)
              page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
      
      (5) PAGE2 is freed, but the third RX descriptor was still using it!
          In our case this causes IOMMU faults, but it would silently corrupt
          memory if the IOMMU was disabled.
      
      Change the logic that checks whether pp_recycle SKBs can be coalesced.
      We still reject differing pp_recycle between 'from' and 'to' SKBs, but
      in order to avoid the situation described above, we also reject
      coalescing when both 'from' and 'to' are pp_recycled and 'from' is
      cloned.
      
      The new logic allows coalescing a cloned pp_recycle SKB into a page
      refcounted one, because in this case the release (4) will drop the right
      reference, the one taken by skb_try_coalesce().
      
      Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
      Suggested-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1effe8ca
    • Eyal Birger's avatar
      vrf: fix packet sniffing for traffic originating from ip tunnels · 012d69fb
      Eyal Birger authored
      in commit 04893908
      ("vrf: add mac header for tunneled packets when sniffer is attached")
      an Ethernet header was cooked for traffic originating from tunnel devices.
      
      However, the header is added based on whether the mac_header is unset
      and ignores cases where the device doesn't expose a mac header to upper
      layers, such as in ip tunnels like ipip and gre.
      
      Traffic originating from such devices still appears garbled when capturing
      on the vrf device.
      
      Fix by observing whether the original device exposes a header to upper
      layers, similar to the logic done in af_packet.
      
      In addition, skb->mac_len needs to be adjusted after adding the Ethernet
      header for the skb_push/pull() surrounding dev_queue_xmit_nit() to work
      on these packets.
      
      Fixes: 04893908 ("vrf: add mac header for tunneled packets when sniffer is attached")
      Signed-off-by: default avatarEyal Birger <eyal.birger@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      012d69fb
    • Ziyang Xuan's avatar
      net/tls: fix slab-out-of-bounds bug in decrypt_internal · 9381fe8c
      Ziyang Xuan authored
      The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
      tls_set_sw_offload(). The return value of crypto_aead_ivsize()
      for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
      memory space will trigger slab-out-of-bounds bug as following:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
      Read of size 16 at addr ffff888114e84e60 by task tls/10911
      
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_report.cold+0x5e/0x5db
       ? decrypt_internal+0x385/0xc40 [tls]
       kasan_report+0xab/0x120
       ? decrypt_internal+0x385/0xc40 [tls]
       kasan_check_range+0xf9/0x1e0
       memcpy+0x20/0x60
       decrypt_internal+0x385/0xc40 [tls]
       ? tls_get_rec+0x2e0/0x2e0 [tls]
       ? process_rx_list+0x1a5/0x420 [tls]
       ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
       decrypt_skb_update+0x9d/0x400 [tls]
       tls_sw_recvmsg+0x3c8/0xb50 [tls]
      
      Allocated by task 10911:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       tls_set_sw_offload+0x2eb/0xa20 [tls]
       tls_setsockopt+0x68c/0x700 [tls]
       __sys_setsockopt+0xfe/0x1b0
      
      Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
      when memcpy() iv value in TLS_1_3_VERSION scenario.
      
      Fixes: f295b3ae ("net/tls: Add support of AES128-CCM based ciphers")
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9381fe8c
    • Taehee Yoo's avatar
      net: sfc: add missing xdp queue reinitialization · 059a47f1
      Taehee Yoo authored
      After rx/tx ring buffer size is changed, kernel panic occurs when
      it acts XDP_TX or XDP_REDIRECT.
      
      When tx/rx ring buffer size is changed(ethtool -G), sfc driver
      reallocates and reinitializes rx and tx queues and their buffer
      (tx_queue->buffer).
      But it misses reinitializing xdp queues(efx->xdp_tx_queues).
      So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized
      tx_queue->buffer.
      
      A new function efx_set_xdp_channels() is separated from efx_set_channels()
      to handle only xdp queues.
      
      Splat looks like:
         BUG: kernel NULL pointer dereference, address: 000000000000002a
         #PF: supervisor write access in kernel mode
         #PF: error_code(0x0002) - not-present page
         PGD 0 P4D 0
         Oops: 0002 [#4] PREEMPT SMP NOPTI
         RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc]
         CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D           5.17.0+ #55 e8beeee8289528f11357029357cf
         Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80
         RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297
         RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc]
         RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870
         RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0
         RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000
         R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040
         R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0
         FS:  0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80
         CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0
         RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297
         PKRU: 55555554
         RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870
         RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700
         RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000
         R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040
         R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700
         FS:  0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0
         PKRU: 55555554
         Call Trace:
          <IRQ>
          efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
          ? enqueue_task_fair+0x95/0x550
          efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
      
      Fixes: 3990a8ff ("sfc: allocate channels for XDP tx queues")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      059a47f1
  6. 31 Mar, 2022 4 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 2975dbdc
      Linus Torvalds authored
      Pull more networking updates from Jakub Kicinski:
       "Networking fixes and rethook patches.
      
        Features:
      
         - kprobes: rethook: x86: replace kretprobe trampoline with rethook
      
        Current release - regressions:
      
         - sfc: avoid null-deref on systems without NUMA awareness in the new
           queue sizing code
      
        Current release - new code bugs:
      
         - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
      
         - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
           interface is down
      
        Previous releases - always broken:
      
         - openvswitch: correct neighbor discovery target mask field in the
           flow dump
      
         - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
      
         - rxrpc: fix call timer start racing with call destruction
      
         - rxrpc: fix null-deref when security type is rxrpc_no_security
      
         - can: fix UAF bugs around echo skbs in multiple drivers
      
        Misc:
      
         - docs: move netdev-FAQ to the 'process' section of the
           documentation"
      
      * tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
        vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
        openvswitch: Add recirc_id to recirc warning
        rxrpc: fix some null-ptr-deref bugs in server_key.c
        rxrpc: Fix call timer start racing with call destruction
        net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
        net: hns3: fix the concurrency between functions reading debugfs
        docs: netdev: move the netdev-FAQ to the process pages
        docs: netdev: broaden the new vs old code formatting guidelines
        docs: netdev: call out the merge window in tag checking
        docs: netdev: add missing back ticks
        docs: netdev: make the testing requirement more stringent
        docs: netdev: add a question about re-posting frequency
        docs: netdev: rephrase the 'should I update patchwork' question
        docs: netdev: rephrase the 'Under review' question
        docs: netdev: shorten the name and mention msgid for patch status
        docs: netdev: note that RFC postings are allowed any time
        docs: netdev: turn the net-next closed into a Warning
        docs: netdev: move the patch marking section up
        docs: netdev: minor reword
        docs: netdev: replace references to old archives
        ...
      2975dbdc
    • Linus Torvalds's avatar
      Merge tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 93235e3d
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
      
       - Missing Kconfig dependency on arm that leads to boot failure
      
       - x86 SLS fixes
      
       - Reference leak in the stm32 driver
      
      * tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: x86/sm3 - Fixup SLS
        crypto: x86/poly1305 - Fixup SLS
        crypto: x86/chacha20 - Avoid spurious jumps to other functions
        crypto: stm32 - fix reference leak in stm32_crc_remove
        crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
      93235e3d
    • Eric Dumazet's avatar
      vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices · 9d570741
      Eric Dumazet authored
      vxlan_vnifilter_dump_dev() assumes it is called only
      for vxlan devices. Make sure it is the case.
      
      BUG: KASAN: slab-out-of-bounds in vxlan_vnifilter_dump_dev+0x9a0/0xb40 drivers/net/vxlan/vxlan_vnifilter.c:349
      Read of size 4 at addr ffff888060d1ce70 by task syz-executor.3/17662
      
      CPU: 0 PID: 17662 Comm: syz-executor.3 Tainted: G        W         5.17.0-syzkaller-12888-g77c9387c #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x495 mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       vxlan_vnifilter_dump_dev+0x9a0/0xb40 drivers/net/vxlan/vxlan_vnifilter.c:349
       vxlan_vnifilter_dump+0x3ff/0x650 drivers/net/vxlan/vxlan_vnifilter.c:428
       netlink_dump+0x4b5/0xb70 net/netlink/af_netlink.c:2270
       __netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2375
       netlink_dump_start include/linux/netlink.h:245 [inline]
       rtnetlink_rcv_msg+0x70c/0xb80 net/core/rtnetlink.c:5953
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f87b8e89049
      
      Fixes: f9c4bb0b ("vxlan: vni filtering support on collect metadata device")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarRoopa Prabhu <roopa@nvidia.com>
      Link: https://lore.kernel.org/r/20220330194643.2706132-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9d570741
    • Stéphane Graber's avatar
      openvswitch: Add recirc_id to recirc warning · ea07af2e
      Stéphane Graber authored
      When hitting the recirculation limit, the kernel would currently log
      something like this:
      
      [   58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action
      
      Which isn't all that useful to debug as we only have the interface name
      to go on but can't track it down to a specific flow.
      
      With this change, we now instead get:
      
      [   58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action (recirc_id=0x9e)
      
      Which can now be correlated with the flow entries from OVS.
      Suggested-by: default avatarFrode Nordahl <frode.nordahl@canonical.com>
      Signed-off-by: default avatarStéphane Graber <stgraber@ubuntu.com>
      Tested-by: default avatarStephane Graber <stgraber@ubuntu.com>
      Acked-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20220330194244.3476544-1-stgraber@ubuntu.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ea07af2e