1. 17 Mar, 2023 1 commit
  2. 16 Mar, 2023 15 commits
    • David S. Miller's avatar
      Merge branch 'virtio_net-xdp-bugs' · 04504793
      David S. Miller authored
      Xuan Zhuo says:
      
      ====================
      virtio_net: fix two bugs related to XDP
      
      This patch set fixes two bugs related to XDP.
      These two patch is not associated.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04504793
    • Xuan Zhuo's avatar
      virtio_net: free xdp shinfo frags when build_skb_from_xdp_buff() fails · 1a3bd6ea
      Xuan Zhuo authored
      build_skb_from_xdp_buff() may return NULL, in this case
      we need to free the frags of xdp shinfo.
      
      Fixes: fab89baf ("virtio-net: support multi-buffer xdp")
      Signed-off-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a3bd6ea
    • Xuan Zhuo's avatar
      virtio_net: fix page_to_skb() miss headroom · fa0f1ba7
      Xuan Zhuo authored
      Because headroom is not passed to page_to_skb(), this causes the shinfo
      exceeds the range. Then the frags of shinfo are changed by other process.
      
      [  157.724634] stack segment: 0000 [#1] PREEMPT SMP NOPTI
      [  157.725358] CPU: 3 PID: 679 Comm: xdp_pass_user_f Tainted: G            E      6.2.0+ #150
      [  157.726401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/4
      [  157.727820] RIP: 0010:skb_release_data+0x11b/0x180
      [  157.728449] Code: 44 24 02 48 83 c3 01 39 d8 7e be 48 89 d8 48 c1 e0 04 41 80 7d 7e 00 49 8b 6c 04 30 79 0c 48 89 ef e8 89 b
      [  157.730751] RSP: 0018:ffffc90000178b48 EFLAGS: 00010202
      [  157.731383] RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000000
      [  157.732270] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888100dd0b00
      [  157.733117] RBP: 5d5d76010f6e2408 R08: ffff888100dd0b2c R09: 0000000000000000
      [  157.734013] R10: ffffffff82effd30 R11: 000000000000a14e R12: ffff88810981ffc0
      [  157.734904] R13: ffff888100dd0b00 R14: 0000000000000002 R15: 0000000000002310
      [  157.735793] FS:  00007f06121d9740(0000) GS:ffff88842fcc0000(0000) knlGS:0000000000000000
      [  157.736794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  157.737522] CR2: 00007ffd9a56c084 CR3: 0000000104bda001 CR4: 0000000000770ee0
      [  157.738420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  157.739283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  157.740146] PKRU: 55555554
      [  157.740502] Call Trace:
      [  157.740843]  <IRQ>
      [  157.741117]  kfree_skb_reason+0x50/0x120
      [  157.741613]  __udp4_lib_rcv+0x52b/0x5e0
      [  157.742132]  ip_protocol_deliver_rcu+0xaf/0x190
      [  157.742715]  ip_local_deliver_finish+0x77/0xa0
      [  157.743280]  ip_sublist_rcv_finish+0x80/0x90
      [  157.743834]  ip_list_rcv_finish.constprop.0+0x16f/0x190
      [  157.744493]  ip_list_rcv+0x126/0x140
      [  157.744952]  __netif_receive_skb_list_core+0x29b/0x2c0
      [  157.745602]  __netif_receive_skb_list+0xed/0x160
      [  157.746190]  ? udp4_gro_receive+0x275/0x350
      [  157.746732]  netif_receive_skb_list_internal+0xf2/0x1b0
      [  157.747398]  napi_gro_receive+0xd1/0x210
      [  157.747911]  virtnet_receive+0x75/0x1c0
      [  157.748422]  virtnet_poll+0x48/0x1b0
      [  157.748878]  __napi_poll+0x29/0x1b0
      [  157.749330]  net_rx_action+0x27a/0x340
      [  157.749812]  __do_softirq+0xf3/0x2fb
      [  157.750298]  do_softirq+0xa2/0xd0
      [  157.750745]  </IRQ>
      [  157.751563]  <TASK>
      [  157.752329]  __local_bh_enable_ip+0x6d/0x80
      [  157.753178]  virtnet_xdp_set+0x482/0x860
      [  157.754159]  ? __pfx_virtnet_xdp+0x10/0x10
      [  157.755129]  dev_xdp_install+0xa4/0xe0
      [  157.756033]  dev_xdp_attach+0x20b/0x5e0
      [  157.756933]  do_setlink+0x82e/0xc90
      [  157.757777]  ? __nla_validate_parse+0x12b/0x1e0
      [  157.758744]  rtnl_setlink+0xd8/0x170
      [  157.759549]  ? mod_objcg_state+0xcb/0x320
      [  157.760328]  ? security_capable+0x37/0x60
      [  157.761209]  ? security_capable+0x37/0x60
      [  157.762072]  rtnetlink_rcv_msg+0x145/0x3d0
      [  157.762929]  ? ___slab_alloc+0x327/0x610
      [  157.763754]  ? __alloc_skb+0x141/0x170
      [  157.764533]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [  157.765422]  netlink_rcv_skb+0x58/0x110
      [  157.766229]  netlink_unicast+0x21f/0x330
      [  157.766951]  netlink_sendmsg+0x240/0x4a0
      [  157.767654]  sock_sendmsg+0x93/0xa0
      [  157.768434]  ? sockfd_lookup_light+0x12/0x70
      [  157.769245]  __sys_sendto+0xfe/0x170
      [  157.770079]  ? handle_mm_fault+0xe9/0x2d0
      [  157.770859]  ? preempt_count_add+0x51/0xa0
      [  157.771645]  ? up_read+0x3c/0x80
      [  157.772340]  ? do_user_addr_fault+0x1e9/0x710
      [  157.773166]  ? kvm_read_and_reset_apf_flags+0x49/0x60
      [  157.774087]  __x64_sys_sendto+0x29/0x30
      [  157.774856]  do_syscall_64+0x3c/0x90
      [  157.775518]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [  157.776382] RIP: 0033:0x7f06122def70
      
      Fixes: 18117a84 ("virtio-net: remove xdp related info from page_to_skb()")
      Signed-off-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fa0f1ba7
    • Rob Herring's avatar
      net: Use of_property_read_bool() for boolean properties · 1a87e641
      Rob Herring authored
      It is preferred to use typed property access functions (i.e.
      of_property_read_<type> functions) rather than low-level
      of_get_property/of_find_property functions for reading properties.
      Convert reading boolean properties to of_property_read_bool().
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
      Acked-by: default avatarKalle Valo <kvalo@kernel.org>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Acked-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Reviewed-by: default avatarWei Fang <wei.fang@nxp.com>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a87e641
    • David S. Miller's avatar
      Merge branch 'net-dsa-marvell-mtu-reporting' · 65d63e82
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix MTU reporting for Marvell DSA switches where we can't change it
      
      As explained in patch 2, the driver doesn't know how to change the MTU
      on MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290, and there
      is a regression where it actually reports an MTU value below the
      Ethernet standard (1500).
      
      Fixing that shows another issue where DSA is unprepared to be told that
      a switch supports an MTU of only 1500, and still errors out. That is
      addressed by patch 1.
      
      Testing was not done on "real" hardware, but on a different Marvell DSA
      switch, with code modified such that the driver doesn't know how to
      change the MTU on that, either.
      
      A key assumption is that these switches don't need any MTU configuration
      to pass full MTU-sized, DSA-tagged packets, which seems like a
      reasonable assumption to make. My 6390 and 6190 switches, with
      .port_set_jumbo_size commented out, certainly don't seem to have any
      problem passing MTU-sized traffic, as can be seen in this iperf3 session
      captured with tcpdump on the DSA master:
      
      $MAC > $MAC, Marvell DSA mode Forward, dev 2, port 8, untagged, VID 1000,
      	FPri 0, ethertype IPv4 (0x0800), length 1518:
      	10.0.0.69.49590 > 10.0.0.1.5201: Flags [.], seq 81088:82536,
      	ack 1, win 502, options [nop,nop,TS val 2221498829 ecr 3012859850],
      	length 1448
      
      I don't want to go all the way and say that the adjustment made by
      commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") is completely unnecessary, just that
      there's an equally good chance that the switches with unknown MTU
      configuration procedure "just work".
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      65d63e82
    • Vladimir Oltean's avatar
      net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290 · 7e951737
      Vladimir Oltean authored
      There are 3 classes of switch families that the driver is aware of, as
      far as mv88e6xxx_change_mtu() is concerned:
      
      - MTU configuration is available per port. Here, the
        chip->info->ops->port_set_jumbo_size() method will be present.
      
      - MTU configuration is global to the switch. Here, the
        chip->info->ops->set_max_frame_size() method will be present.
      
      - We don't know how to change the MTU. Here, none of the above methods
        will be present.
      
      Switch families MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290
      fall in category 3.
      
      The blamed commit has adjusted the MTU for all 3 categories by EDSA_HLEN
      (8 bytes), resulting in a new maximum MTU of 1492 being reported by the
      driver for these switches.
      
      I don't have the hardware to test, but I do have a MV88E6390 switch on
      which I can simulate this by commenting out its .port_set_jumbo_size
      definition from mv88e6390_ops. The result is this set of messages at
      probe time:
      
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 1
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 2
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 3
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 4
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 5
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 6
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 7
      mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 8
      
      It is highly implausible that there exist Ethernet switches which don't
      support the standard MTU of 1500 octets, and this is what the DSA
      framework says as well - the error comes from dsa_slave_create() ->
      dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN).
      
      But the error messages are alarming, and it would be good to suppress
      them.
      
      As a consequence of this unlikeliness, we reimplement mv88e6xxx_get_max_mtu()
      and mv88e6xxx_change_mtu() on switches from the 3rd category as follows:
      the maximum supported MTU is 1500, and any request to set the MTU to a
      value larger than that fails in dev_validate_mtu().
      
      Fixes: b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e951737
    • Vladimir Oltean's avatar
      net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu() · 636e8adf
      Vladimir Oltean authored
      Currently, when dsa_slave_change_mtu() is called on a user port where
      dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code
      will stumble upon this check:
      
      	if (new_master_mtu > mtu_limit)
      		return -ERANGE;
      
      because new_master_mtu is adjusted for the tagger overhead but mtu_limit
      is not.
      
      But it would be good if the logic went through, for example if the DSA
      master really depends on an MTU adjustment to accept DSA-tagged frames.
      
      To make the code pass through the check, we need to adjust mtu_limit for
      the overhead as well, if the minimum restriction was caused by the DSA
      user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU
      are always offset by the protocol overhead.
      
      Currently no drivers return 1500 .port_max_mtu(), but this is only
      temporary and a bug in itself - mv88e6xxx should have done that, but
      since commit b9c587fe ("dsa: mv88e6xxx: Include tagger overhead when
      setting MTU for DSA and CPU ports") it no longer does. This is a
      preparation for fixing that.
      
      Fixes: bfcb8132 ("net: dsa: configure the MTU for switch ports")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      636e8adf
    • Maciej Fijalkowski's avatar
      ice: xsk: disable txq irq before flushing hw · b830c964
      Maciej Fijalkowski authored
      ice_qp_dis() intends to stop a given queue pair that is a target of xsk
      pool attach/detach. One of the steps is to disable interrupts on these
      queues. It currently is broken in a way that txq irq is turned off
      *after* HW flush which in turn takes no effect.
      
      ice_qp_dis():
      -> ice_qvec_dis_irq()
      --> disable rxq irq
      --> flush hw
      -> ice_vsi_stop_tx_ring()
      -->disable txq irq
      
      Below splat can be triggered by following steps:
      - start xdpsock WITHOUT loading xdp prog
      - run xdp_rxq_info with XDP_TX action on this interface
      - start traffic
      - terminate xdpsock
      
      [  256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018
      [  256.319560] #PF: supervisor read access in kernel mode
      [  256.324775] #PF: error_code(0x0000) - not-present page
      [  256.329994] PGD 0 P4D 0
      [  256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G           OE      6.2.0-rc5+ #51
      [  256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice]
      [  256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44
      [  256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206
      [  256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f
      [  256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80
      [  256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000
      [  256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000
      [  256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600
      [  256.421990] FS:  0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
      [  256.430207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0
      [  256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  256.457770] PKRU: 55555554
      [  256.460529] Call Trace:
      [  256.463015]  <TASK>
      [  256.465157]  ? ice_xmit_zc+0x6e/0x150 [ice]
      [  256.469437]  ice_napi_poll+0x46d/0x680 [ice]
      [  256.473815]  ? _raw_spin_unlock_irqrestore+0x1b/0x40
      [  256.478863]  __napi_poll+0x29/0x160
      [  256.482409]  net_rx_action+0x136/0x260
      [  256.486222]  __do_softirq+0xe8/0x2e5
      [  256.489853]  ? smpboot_thread_fn+0x2c/0x270
      [  256.494108]  run_ksoftirqd+0x2a/0x50
      [  256.497747]  smpboot_thread_fn+0x1c1/0x270
      [  256.501907]  ? __pfx_smpboot_thread_fn+0x10/0x10
      [  256.506594]  kthread+0xea/0x120
      [  256.509785]  ? __pfx_kthread+0x10/0x10
      [  256.513597]  ret_from_fork+0x29/0x50
      [  256.517238]  </TASK>
      
      In fact, irqs were not disabled and napi managed to be scheduled and run
      while xsk_pool pointer was still valid, but SW ring of xdp_buff pointers
      was already freed.
      
      To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also
      while at it, remove redundant ice_clean_rx_ring() call - this is handled
      in ice_qp_clean_rings().
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: default avatarLarysa Zaremba <larysa.zaremba@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b830c964
    • David S. Miller's avatar
      Merge branch 'net-virtio-vsock' · 1a200b51
      David S. Miller authored
      1a200b51
    • Arseniy Krasnov's avatar
      test/vsock: copy to user failure test · 7e699d2a
      Arseniy Krasnov authored
      This adds SOCK_STREAM and SOCK_SEQPACKET tests for invalid buffer case.
      It tries to read data to NULL buffer (data already presents in socket's
      queue), then uses valid buffer. For SOCK_STREAM second read must return
      data, because skbuff is not dropped, but for SOCK_SEQPACKET skbuff will
      be dropped by kernel, and 'recv()' will return EAGAIN.
      Signed-off-by: default avatarArseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e699d2a
    • Arseniy Krasnov's avatar
      virtio/vsock: don't drop skbuff on copy failure · 8daaf39f
      Arseniy Krasnov authored
      This returns behaviour of SOCK_STREAM read as before skbuff usage. When
      copying to user fails current skbuff won't be dropped, but returned to
      sockets's queue. Technically instead of 'skb_dequeue()', 'skb_peek()' is
      called and when skbuff becomes empty, it is removed from queue by
      '__skb_unlink()'.
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: default avatarArseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Acked-by: default avatarBobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8daaf39f
    • Arseniy Krasnov's avatar
      virtio/vsock: remove redundant 'skb_pull()' call · 6825e6b4
      Arseniy Krasnov authored
      Since we now no longer use 'skb->len' to update credit, there is no sense
      to update skbuff state, because it is used only once after dequeue to
      copy data and then will be released.
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: default avatarArseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Acked-by: default avatarBobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6825e6b4
    • Arseniy Krasnov's avatar
      virtio/vsock: don't use skbuff state to account credit · 07770616
      Arseniy Krasnov authored
      'skb->len' can vary when we partially read the data, this complicates the
      calculation of credit to be updated in 'virtio_transport_inc_rx_pkt()/
      virtio_transport_dec_rx_pkt()'.
      
      Also in 'virtio_transport_dec_rx_pkt()' we were miscalculating the
      credit since 'skb->len' was redundant.
      
      For these reasons, let's replace the use of skbuff state to calculate new
      'rx_bytes'/'fwd_cnt' values with explicit value as input argument. This
      makes code more simple, because it is not needed to change skbuff state
      before each call to update 'rx_bytes'/'fwd_cnt'.
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Signed-off-by: default avatarArseniy Krasnov <AVKrasnov@sberdevices.ru>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Acked-by: default avatarBobby Eshleman <bobby.eshleman@bytedance.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07770616
    • Vladimir Oltean's avatar
      net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol() · cd356010
      Vladimir Oltean authored
      Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
      acquire phydev->lock, but the mscc phy driver implementations,
      vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
      resulting in a deadlock.
      
      $ ip link set swp3 down
      ============================================
      WARNING: possible recursive locking detected
      mscc_felix 0000:00:00.5 swp3: Link is Down
      --------------------------------------------
      ip/375 is trying to acquire lock:
      ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4
      
      but task is already holding lock:
      ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&dev->lock);
        lock(&dev->lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by ip/375:
       #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
       #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
      
      Call trace:
       __mutex_lock+0x98/0x454
       mutex_lock_nested+0x2c/0x38
       vsc85xx_wol_get+0x2c/0xf4
       phy_ethtool_get_wol+0x50/0x6c
       phy_suspend+0x84/0xcc
       phy_state_machine+0x1b8/0x27c
       phy_stop+0x70/0x154
       phylink_stop+0x34/0xc0
       dsa_port_disable_rt+0x2c/0xa4
       dsa_slave_close+0x38/0xec
       __dev_close_many+0xc8/0x16c
       __dev_change_flags+0xdc/0x218
       dev_change_flags+0x24/0x6c
       do_setlink+0x234/0xea4
       __rtnl_newlink+0x46c/0x878
       rtnl_newlink+0x50/0x7c
       rtnetlink_rcv_msg+0x16c/0x58c
      
      Removing the mutex_lock(&phydev->lock) calls from the driver restores
      the functionality.
      
      Fixes: 2f987d48 ("net: phy: Add locks to ethtool functions")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      cd356010
    • Shawn Bohrer's avatar
      veth: Fix use after free in XDP_REDIRECT · 7c101318
      Shawn Bohrer authored
      Commit 718a18a0 ("veth: Rework veth_xdp_rcv_skb in order
      to accept non-linear skb") introduced a bug where it tried to
      use pskb_expand_head() if the headroom was less than
      XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
      which will later allow consume_skb() to free the skb while is it still
      in use by AF_XDP.
      
      Previously if the headroom was less than XDP_PACKET_HEADROOM we
      continued on to allocate a new skb from pages so this restores that
      behavior.
      
      BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
      Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
      
      CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
      Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
      Call Trace:
        <TASK>
        dump_stack_lvl+0x34/0x48
        print_report+0x170/0x473
        ? __xsk_rcv+0x18d/0x2c0
        kasan_report+0xad/0x130
        ? __xsk_rcv+0x18d/0x2c0
        kasan_check_range+0x149/0x1a0
        memcpy+0x20/0x60
        __xsk_rcv+0x18d/0x2c0
        __xsk_map_redirect+0x1f3/0x490
        ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
        xdp_do_redirect+0x5ca/0xd60
        veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
        ? __netif_receive_skb_list_core+0x671/0x920
        ? veth_xdp+0x670/0x670 [veth]
        veth_xdp_rcv+0x304/0xa20 [veth]
        ? do_xdp_generic+0x150/0x150
        ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
        ? _raw_spin_lock_bh+0xe0/0xe0
        ? newidle_balance+0x887/0xe30
        ? __perf_event_task_sched_in+0xdb/0x800
        veth_poll+0x139/0x571 [veth]
        ? veth_xdp_rcv+0xa20/0xa20 [veth]
        ? _raw_spin_unlock+0x39/0x70
        ? finish_task_switch.isra.0+0x17e/0x7d0
        ? __switch_to+0x5cf/0x1070
        ? __schedule+0x95b/0x2640
        ? io_schedule_timeout+0x160/0x160
        __napi_poll+0xa1/0x440
        napi_threaded_poll+0x3d1/0x460
        ? __napi_poll+0x440/0x440
        ? __kthread_parkme+0xc6/0x1f0
        ? __napi_poll+0x440/0x440
        kthread+0x2a2/0x340
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x22/0x30
        </TASK>
      
      Freed by task 148640:
        kasan_save_stack+0x23/0x50
        kasan_set_track+0x21/0x30
        kasan_save_free_info+0x2a/0x40
        ____kasan_slab_free+0x169/0x1d0
        slab_free_freelist_hook+0xd2/0x190
        __kmem_cache_free+0x1a1/0x2f0
        skb_release_data+0x449/0x600
        consume_skb+0x9f/0x1c0
        veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
        veth_xdp_rcv+0x304/0xa20 [veth]
        veth_poll+0x139/0x571 [veth]
        __napi_poll+0xa1/0x440
        napi_threaded_poll+0x3d1/0x460
        kthread+0x2a2/0x340
        ret_from_fork+0x22/0x30
      
      The buggy address belongs to the object at ffff888976250000
        which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 340 bytes inside of
        2048-byte region [ffff888976250000, ffff888976250800)
      
      The buggy address belongs to the physical page:
      page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
      head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
      raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
        ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      > ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
        ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 718a18a0 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
      Signed-off-by: default avatarShawn Bohrer <sbohrer@cloudflare.com>
      Acked-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: default avatarToshiaki Makita <toshiaki.makita1@gmail.com>
      Acked-by: default avatarToke Høiland-Jørgensen <toke@kernel.org>
      Link: https://lore.kernel.org/r/20230314153351.2201328-1-sbohrer@cloudflare.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7c101318
  3. 15 Mar, 2023 16 commits
    • David S. Miller's avatar
      Merge branch 'mtk_eth_soc-SGMII-fixes' · 75014826
      David S. Miller authored
      Daniel Golle says:
      
      ====================
      net: ethernet: mtk_eth_soc: minor SGMII fixes
      
      This small series brings two minor fixes for the SGMII unit found in
      MediaTek's router SoCs.
      
      The first patch resets the PCS internal state machine on major
      configuration changes, just like it is also done in MediaTek's SDK.
      
      The second patch makes sure we only write values and restart AN if
      actually needed, thus preventing unnesseray loss of an existing link
      in some cases.
      
      Both patches have previously been submitted as part of the series
      "net: ethernet: mtk_eth_soc: various enhancements" which grew a bit
      too big and it has correctly been criticized that some of the patches
      should rather go as fixes to net-next.
      
      This new series tries to address this.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      75014826
    • Daniel Golle's avatar
      net: ethernet: mtk_eth_soc: only write values if needed · 6e933a80
      Daniel Golle authored
      Only restart auto-negotiation and write link timer if actually
      necessary. This prevents losing the link in case of minor
      changes.
      
      Fixes: 7e538372 ("net: ethernet: mediatek: Re-add support SGMII")
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Tested-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e933a80
    • Daniel Golle's avatar
      net: ethernet: mtk_eth_soc: reset PCS state · 611e2dab
      Daniel Golle authored
      Reset the internal PCS state machine when changing interface mode.
      This prevents confusing the state machine when changing interface
      modes, e.g. from SGMII to 2500Base-X or vice-versa.
      
      Fixes: 7e538372 ("net: ethernet: mediatek: Re-add support SGMII")
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Tested-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      611e2dab
    • Szymon Heidrich's avatar
      net: usb: smsc75xx: Limit packet length to skb->len · d8b22831
      Szymon Heidrich authored
      Packet length retrieved from skb data may be larger than
      the actual socket buffer length (up to 9026 bytes). In such
      case the cloned skb passed up the network stack will leak
      kernel memory contents.
      
      Fixes: d0cad871 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
      Signed-off-by: default avatarSzymon Heidrich <szymon.heidrich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d8b22831
    • David S. Miller's avatar
      Merge branch 'net-smc-fixes' · fd6ad75f
      David S. Miller authored
      Wenjia Zhang says:
      
      ====================
      net/smc: Fixes 2023-03-01
      
      The 1st patch solves the problem that CLC message initialization was
      not properly reversed in error handling path. And the 2nd one fixes
      the possible deadlock triggered by cancel_delayed_work_sync().
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd6ad75f
    • Stefan Raspl's avatar
      net/smc: Fix device de-init sequence · 9d876d3e
      Stefan Raspl authored
      CLC message initialization was not properly reversed in error handling path.
      Reported-and-suggested-by: default avatarAlexander Gordeev <agordeev@linux.ibm.com>
      Signed-off-by: default avatarStefan Raspl <raspl@linux.ibm.com>
      Signed-off-by: default avatarWenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d876d3e
    • Wenjia Zhang's avatar
      net/smc: fix deadlock triggered by cancel_delayed_work_syn() · 13085e1b
      Wenjia Zhang authored
      The following LOCKDEP was detected:
      		Workqueue: events smc_lgr_free_work [smc]
      		WARNING: possible circular locking dependency detected
      		6.1.0-20221027.rc2.git8.56bc5b569087.300.fc36.s390x+debug #1 Not tainted
      		------------------------------------------------------
      		kworker/3:0/176251 is trying to acquire lock:
      		00000000f1467148 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0},
      			at: __flush_workqueue+0x7a/0x4f0
      		but task is already holding lock:
      		0000037fffe97dc8 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0},
      			at: process_one_work+0x232/0x730
      		which lock already depends on the new lock.
      		the existing dependency chain (in reverse order) is:
      		-> #4 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}:
      		       __lock_acquire+0x58e/0xbd8
      		       lock_acquire.part.0+0xe2/0x248
      		       lock_acquire+0xac/0x1c8
      		       __flush_work+0x76/0xf0
      		       __cancel_work_timer+0x170/0x220
      		       __smc_lgr_terminate.part.0+0x34/0x1c0 [smc]
      		       smc_connect_rdma+0x15e/0x418 [smc]
      		       __smc_connect+0x234/0x480 [smc]
      		       smc_connect+0x1d6/0x230 [smc]
      		       __sys_connect+0x90/0xc0
      		       __do_sys_socketcall+0x186/0x370
      		       __do_syscall+0x1da/0x208
      		       system_call+0x82/0xb0
      		-> #3 (smc_client_lgr_pending){+.+.}-{3:3}:
      		       __lock_acquire+0x58e/0xbd8
      		       lock_acquire.part.0+0xe2/0x248
      		       lock_acquire+0xac/0x1c8
      		       __mutex_lock+0x96/0x8e8
      		       mutex_lock_nested+0x32/0x40
      		       smc_connect_rdma+0xa4/0x418 [smc]
      		       __smc_connect+0x234/0x480 [smc]
      		       smc_connect+0x1d6/0x230 [smc]
      		       __sys_connect+0x90/0xc0
      		       __do_sys_socketcall+0x186/0x370
      		       __do_syscall+0x1da/0x208
      		       system_call+0x82/0xb0
      		-> #2 (sk_lock-AF_SMC){+.+.}-{0:0}:
      		       __lock_acquire+0x58e/0xbd8
      		       lock_acquire.part.0+0xe2/0x248
      		       lock_acquire+0xac/0x1c8
      		       lock_sock_nested+0x46/0xa8
      		       smc_tx_work+0x34/0x50 [smc]
      		       process_one_work+0x30c/0x730
      		       worker_thread+0x62/0x420
      		       kthread+0x138/0x150
      		       __ret_from_fork+0x3c/0x58
      		       ret_from_fork+0xa/0x40
      		-> #1 ((work_completion)(&(&smc->conn.tx_work)->work)){+.+.}-{0:0}:
      		       __lock_acquire+0x58e/0xbd8
      		       lock_acquire.part.0+0xe2/0x248
      		       lock_acquire+0xac/0x1c8
      		       process_one_work+0x2bc/0x730
      		       worker_thread+0x62/0x420
      		       kthread+0x138/0x150
      		       __ret_from_fork+0x3c/0x58
      		       ret_from_fork+0xa/0x40
      		-> #0 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}:
      		       check_prev_add+0xd8/0xe88
      		       validate_chain+0x70c/0xb20
      		       __lock_acquire+0x58e/0xbd8
      		       lock_acquire.part.0+0xe2/0x248
      		       lock_acquire+0xac/0x1c8
      		       __flush_workqueue+0xaa/0x4f0
      		       drain_workqueue+0xaa/0x158
      		       destroy_workqueue+0x44/0x2d8
      		       smc_lgr_free+0x9e/0xf8 [smc]
      		       process_one_work+0x30c/0x730
      		       worker_thread+0x62/0x420
      		       kthread+0x138/0x150
      		       __ret_from_fork+0x3c/0x58
      		       ret_from_fork+0xa/0x40
      		other info that might help us debug this:
      		Chain exists of:
      		  (wq_completion)smc_tx_wq-00000000#2
      	  	  --> smc_client_lgr_pending
      		  --> (work_completion)(&(&lgr->free_work)->work)
      		 Possible unsafe locking scenario:
      		       CPU0                    CPU1
      		       ----                    ----
      		  lock((work_completion)(&(&lgr->free_work)->work));
      		                   lock(smc_client_lgr_pending);
      		                   lock((work_completion)
      					(&(&lgr->free_work)->work));
      		  lock((wq_completion)smc_tx_wq-00000000#2);
      		 *** DEADLOCK ***
      		2 locks held by kworker/3:0/176251:
      		 #0: 0000000080183548
      			((wq_completion)events){+.+.}-{0:0},
      				at: process_one_work+0x232/0x730
      		 #1: 0000037fffe97dc8
      			((work_completion)
      			 (&(&lgr->free_work)->work)){+.+.}-{0:0},
      				at: process_one_work+0x232/0x730
      		stack backtrace:
      		CPU: 3 PID: 176251 Comm: kworker/3:0 Not tainted
      		Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
      		Call Trace:
      		 [<000000002983c3e4>] dump_stack_lvl+0xac/0x100
      		 [<0000000028b477ae>] check_noncircular+0x13e/0x160
      		 [<0000000028b48808>] check_prev_add+0xd8/0xe88
      		 [<0000000028b49cc4>] validate_chain+0x70c/0xb20
      		 [<0000000028b4bd26>] __lock_acquire+0x58e/0xbd8
      		 [<0000000028b4cf6a>] lock_acquire.part.0+0xe2/0x248
      		 [<0000000028b4d17c>] lock_acquire+0xac/0x1c8
      		 [<0000000028addaaa>] __flush_workqueue+0xaa/0x4f0
      		 [<0000000028addf9a>] drain_workqueue+0xaa/0x158
      		 [<0000000028ae303c>] destroy_workqueue+0x44/0x2d8
      		 [<000003ff8029af26>] smc_lgr_free+0x9e/0xf8 [smc]
      		 [<0000000028adf3d4>] process_one_work+0x30c/0x730
      		 [<0000000028adf85a>] worker_thread+0x62/0x420
      		 [<0000000028aeac50>] kthread+0x138/0x150
      		 [<0000000028a63914>] __ret_from_fork+0x3c/0x58
      		 [<00000000298503da>] ret_from_fork+0xa/0x40
      		INFO: lockdep is turned off.
      ===================================================================
      
      This deadlock occurs because cancel_delayed_work_sync() waits for
      the work(&lgr->free_work) to finish, while the &lgr->free_work
      waits for the work(lgr->tx_wq), which needs the sk_lock-AF_SMC, that
      is already used under the mutex_lock.
      
      The solution is to use cancel_delayed_work() instead, which kills
      off a pending work.
      
      Fixes: a52bcc91 ("net/smc: improve termination processing")
      Signed-off-by: default avatarWenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: default avatarJan Karcher <jaka@linux.ibm.com>
      Reviewed-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      13085e1b
    • Ido Schimmel's avatar
      mlxsw: spectrum: Fix incorrect parsing depth after reload · 35c35692
      Ido Schimmel authored
      Spectrum ASICs have a configurable limit on how deep into the packet
      they parse. By default, the limit is 96 bytes.
      
      There are several cases where this parsing depth is not enough and there
      is a need to increase it. For example, timestamping of PTP packets and a
      FIB multipath hash policy that requires hashing on inner fields. The
      driver therefore maintains a reference count that reflects the number of
      consumers that require an increased parsing depth.
      
      During reload_down() the parsing depth reference count does not
      necessarily drop to zero, but the parsing depth itself is restored to
      the default during reload_up() when the firmware is reset. It is
      therefore possible to end up in situations where the driver thinks that
      the parsing depth was increased (reference count is non-zero), when it
      is not.
      
      Fix by making sure that all the consumers that increase the parsing
      depth reference count also decrease it during reload_down().
      Specifically, make sure that when the routing code is de-initialized it
      drops the reference count if it was increased because of a FIB multipath
      hash policy that requires hashing on inner fields.
      
      Add a warning if the reference count is not zero after the driver was
      de-initialized and explicitly reset it to zero during initialization for
      good measures.
      
      Fixes: 2d91f080 ("mlxsw: spectrum: Add infrastructure for parsing configuration")
      Reported-by: default avatarMaksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/9c35e1b3e6c1d8f319a2449d14e2b86373f3b3ba.1678727526.git.petrm@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      35c35692
    • Lorenzo Bianconi's avatar
      veth: rely on rtnl_dereference() instead of on rcu_dereference() in veth_set_xdp_features() · 5ce76fe1
      Lorenzo Bianconi authored
      Fix the following kernel warning in veth_set_xdp_features routine
      relying on rtnl_dereference() instead of on rcu_dereference():
      
      =============================
      WARNING: suspicious RCU usage
      6.3.0-rc1-00144-g064d7052 #149 Not tainted
      -----------------------------
      drivers/net/veth.c:1265 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/135:
      (net/core/rtnetlink.c:6172)
      
      stack backtrace:
      CPU: 1 PID: 135 Comm: ip Not tainted 6.3.0-rc1-00144-g064d7052 #149
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
      04/01/2014
      Call Trace:
       <TASK>
      dump_stack_lvl (lib/dump_stack.c:107)
      lockdep_rcu_suspicious (include/linux/context_tracking.h:152)
      veth_set_xdp_features (drivers/net/veth.c:1265 (discriminator 9))
      veth_newlink (drivers/net/veth.c:1892)
      ? veth_set_features (drivers/net/veth.c:1774)
      ? kasan_save_stack (mm/kasan/common.c:47)
      ? kasan_save_stack (mm/kasan/common.c:46)
      ? kasan_set_track (mm/kasan/common.c:52)
      ? alloc_netdev_mqs (include/linux/slab.h:737)
      ? rcu_read_lock_sched_held (kernel/rcu/update.c:125)
      ? trace_kmalloc (include/trace/events/kmem.h:54)
      ? __xdp_rxq_info_reg (net/core/xdp.c:188)
      ? alloc_netdev_mqs (net/core/dev.c:10657)
      ? rtnl_create_link (net/core/rtnetlink.c:3312)
      rtnl_newlink_create (net/core/rtnetlink.c:3440)
      ? rtnl_link_get_net_capable.constprop.0 (net/core/rtnetlink.c:3391)
      __rtnl_newlink (net/core/rtnetlink.c:3657)
      ? lock_downgrade (kernel/locking/lockdep.c:5321)
      ? rtnl_link_unregister (net/core/rtnetlink.c:3487)
      rtnl_newlink (net/core/rtnetlink.c:3671)
      rtnetlink_rcv_msg (net/core/rtnetlink.c:6174)
      ? rtnl_link_fill (net/core/rtnetlink.c:6070)
      ? mark_usage (kernel/locking/lockdep.c:4914)
      ? mark_usage (kernel/locking/lockdep.c:4914)
      netlink_rcv_skb (net/netlink/af_netlink.c:2574)
      ? rtnl_link_fill (net/core/rtnetlink.c:6070)
      ? netlink_ack (net/netlink/af_netlink.c:2551)
      ? lock_acquire (kernel/locking/lockdep.c:467)
      ? net_generic (include/linux/rcupdate.h:805)
      ? netlink_deliver_tap (include/linux/rcupdate.h:805)
      netlink_unicast (net/netlink/af_netlink.c:1340)
      ? netlink_attachskb (net/netlink/af_netlink.c:1350)
      netlink_sendmsg (net/netlink/af_netlink.c:1942)
      ? netlink_unicast (net/netlink/af_netlink.c:1861)
      ? netlink_unicast (net/netlink/af_netlink.c:1861)
      sock_sendmsg (net/socket.c:727)
      ____sys_sendmsg (net/socket.c:2501)
      ? kernel_sendmsg (net/socket.c:2448)
      ? __copy_msghdr (net/socket.c:2428)
      ___sys_sendmsg (net/socket.c:2557)
      ? mark_usage (kernel/locking/lockdep.c:4914)
      ? do_recvmmsg (net/socket.c:2544)
      ? lock_acquire (kernel/locking/lockdep.c:467)
      ? find_held_lock (kernel/locking/lockdep.c:5159)
      ? __lock_release (kernel/locking/lockdep.c:5345)
      ? __might_fault (mm/memory.c:5625)
      ? lock_downgrade (kernel/locking/lockdep.c:5321)
      ? __fget_light (include/linux/atomic/atomic-arch-fallback.h:227)
      __sys_sendmsg (include/linux/file.h:31)
      ? __sys_sendmsg_sock (net/socket.c:2572)
      ? rseq_get_rseq_cs (kernel/rseq.c:275)
      ? lockdep_hardirqs_on_prepare.part.0 (kernel/locking/lockdep.c:4263)
      do_syscall_64 (arch/x86/entry/common.c:50)
      entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7f0d1aadeb17
      Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e
      fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00
      f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      
      Fixes: fccca038 ("veth: take into account device reconfiguration for xdp_features flag")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/netdev/cover.1678364612.git.lorenzo@kernel.org/T/#me4c9d8e985ec7ebee981cfdb5bc5ec651ef4035dSigned-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Reported-by: syzbot+c3d0d9c42d59ff644ea6@syzkaller.appspotmail.com
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/r/dfd6a9a7d85e9113063165e1f47b466b90ad7b8a.1678748579.git.lorenzo@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5ce76fe1
    • Zheng Wang's avatar
      nfc: st-nci: Fix use after free bug in ndlc_remove due to race condition · 5000fe6c
      Zheng Wang authored
      This bug influences both st_nci_i2c_remove and st_nci_spi_remove.
      Take st_nci_i2c_remove as an example.
      
      In st_nci_i2c_probe, it called ndlc_probe and bound &ndlc->sm_work
      with llt_ndlc_sm_work.
      
      When it calls ndlc_recv or timeout handler, it will finally call
      schedule_work to start the work.
      
      When we call st_nci_i2c_remove to remove the driver, there
      may be a sequence as follows:
      
      Fix it by finishing the work before cleanup in ndlc_remove
      
      CPU0                  CPU1
      
                          |llt_ndlc_sm_work
      st_nci_i2c_remove   |
        ndlc_remove       |
           st_nci_remove  |
           nci_free_device|
           kfree(ndev)    |
      //free ndlc->ndev   |
                          |llt_ndlc_rcv_queue
                          |nci_recv_frame
                          |//use ndlc->ndev
      
      Fixes: 35630df6 ("NFC: st21nfcb: Add driver for STMicroelectronics ST21NFCB NFC chip")
      Signed-off-by: default avatarZheng Wang <zyytlz.wz@163.com>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20230312160837.2040857-1-zyytlz.wz@163.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5000fe6c
    • Jakub Kicinski's avatar
      Merge branch 'tcp-fix-bind-regression-for-dual-stack-wildcard-address' · cf18d55e
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      tcp: Fix bind() regression for dual-stack wildcard address.
      
      The first patch fixes the regression reported in [0], and the second
      patch adds a test for similar cases to catch future regression.
      
      [0]: https://lore.kernel.org/netdev/e21bf153-80b0-9ec0-15ba-e04a4ad42c34@redhat.com/
      ====================
      
      Link: https://lore.kernel.org/r/20230312031904.4674-1-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      cf18d55e
    • Kuniyuki Iwashima's avatar
      selftest: Add test for bind() conflicts. · 13715acf
      Kuniyuki Iwashima authored
      The test checks if (IPv4, IPv6) address pair properly conflict or not.
      
        * IPv4
          * 0.0.0.0
          * 127.0.0.1
      
        * IPv6
          * ::
          * ::1
      
      If the IPv6 address is [::], the second bind() always fails.
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13715acf
    • Kuniyuki Iwashima's avatar
      tcp: Fix bind() conflict check for dual-stack wildcard address. · d9ba9934
      Kuniyuki Iwashima authored
      Paul Holzinger reported [0] that commit 5456262d ("net: Fix
      incorrect address comparison when searching for a bind2 bucket")
      introduced a bind() regression.  Paul also gave a nice repro that
      calls two types of bind() on the same port, both of which now
      succeed, but the second call should fail:
      
        bind(fd1, ::, port) + bind(fd2, 127.0.0.1, port)
      
      The cited commit added address family tests in three functions to
      fix the uninit-value KMSAN report. [1]  However, the test added to
      inet_bind2_bucket_match_addr_any() removed a necessary conflict
      check; the dual-stack wildcard address no longer conflicts with
      an IPv4 non-wildcard address.
      
      If tb->family is AF_INET6 and sk->sk_family is AF_INET in
      inet_bind2_bucket_match_addr_any(), we still need to check
      if tb has the dual-stack wildcard address.
      
      Note that the IPv4 wildcard address does not conflict with
      IPv6 non-wildcard addresses.
      
      [0]: https://lore.kernel.org/netdev/e21bf153-80b0-9ec0-15ba-e04a4ad42c34@redhat.com/
      [1]: https://lore.kernel.org/netdev/CAG_fn=Ud3zSW7AZWXc+asfMhZVL5ETnvuY44Pmyv4NPv-ijN-A@mail.gmail.com/
      
      Fixes: 5456262d ("net: Fix incorrect address comparison when searching for a bind2 bucket")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reported-by: default avatarPaul Holzinger <pholzing@redhat.com>
      Link: https://lore.kernel.org/netdev/CAG_fn=Ud3zSW7AZWXc+asfMhZVL5ETnvuY44Pmyv4NPv-ijN-A@mail.gmail.com/Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarPaul Holzinger <pholzing@redhat.com>
      Reviewed-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d9ba9934
    • Heiner Kallweit's avatar
      net: phy: smsc: bail out in lan87xx_read_status if genphy_read_status fails · c22c3bbf
      Heiner Kallweit authored
      If genphy_read_status fails then further access to the PHY may result
      in unpredictable behavior. To prevent this bail out immediately if
      genphy_read_status fails.
      
      Fixes: 4223dbff ("net: phy: smsc: Re-enable EDPD mode for LAN87xx")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/026aa4f2-36f5-1c10-ab9f-cdb17dda6ac4@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c22c3bbf
    • Eric Dumazet's avatar
      net: tunnels: annotate lockless accesses to dev->needed_headroom · 4b397c06
      Eric Dumazet authored
      IP tunnels can apparently update dev->needed_headroom
      in their xmit path.
      
      This patch takes care of three tunnels xmit, and also the
      core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA()
      helpers.
      
      More changes might be needed for completeness.
      
      BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit
      
      read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1:
      ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      
      write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0:
      ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:444 [inline]
      NF_HOOK include/linux/netfilter.h:302 [inline]
      mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820
      mld_send_cr net/ipv6/mcast.c:2121 [inline]
      mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653
      process_one_work+0x3e6/0x750 kernel/workqueue.c:2390
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537
      kthread+0x1ac/0x1e0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x0dd4 -> 0x0e14
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5f-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
      Workqueue: mld mld_ifc_work
      
      Fixes: 8eb30be0 ("ipv6: Create ip6_tnl_xmit")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4b397c06
    • Dave Ertman's avatar
      ice: avoid bonding causing auxiliary plug/unplug under RTNL lock · 248401cb
      Dave Ertman authored
      RDMA is not supported in ice on a PF that has been added to a bonded
      interface. To enforce this, when an interface enters a bond, we unplug
      the auxiliary device that supports RDMA functionality.  This unplug
      currently happens in the context of handling the netdev bonding event.
      This event is sent to the ice driver under RTNL context.  This is causing
      a deadlock where the RDMA driver is waiting for the RTNL lock to complete
      the removal.
      
      Defer the unplugging/re-plugging of the auxiliary device to the service
      task so that it is not performed under the RTNL lock context.
      
      Cc: stable@vger.kernel.org # 6.1.x
      Reported-by: default avatarJaroslav Pulchart <jaroslav.pulchart@gooddata.com>
      Link: https://lore.kernel.org/netdev/CAK8fFZ6A_Gphw_3-QMGKEFQk=sfCw1Qmq0TVZK3rtAi7vb621A@mail.gmail.com/
      Fixes: 5cb1ebdb ("ice: Fix race condition during interface enslave")
      Fixes: 4eace75e ("RDMA/irdma: Report the correct link speed")
      Signed-off-by: default avatarDave Ertman <david.m.ertman@intel.com>
      Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230310194833.3074601-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      248401cb
  4. 14 Mar, 2023 4 commits
  5. 13 Mar, 2023 3 commits
  6. 11 Mar, 2023 1 commit