1. 25 Apr, 2022 21 commits
    • tcp: make sure treq->af_specific is initialized · ba5a4fdd
      Eric Dumazet authored
      syzbot complained about a recent change in TCP stack,
      hitting a NULL pointer [1]
      
      tcp request sockets have an af_specific pointer, which
      was used before the blamed change only for SYNACK generation
      in non SYNCOOKIE mode.
      
      TCP request sockets momentarily created when the third packet
      arrives from the client in SYNCOOKIE mode were not using
      treq->af_specific.
      
      Make sure this field is populated, in the same way normal
      TCP request sockets do in tcp_conn_request().
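
As a rough user-space illustration of this class of bug (not kernel code), consider an object that carries a pointer to a per-address-family operations table, built by two different paths, one of which forgets to set the pointer. Every struct and function name below is hypothetical, and the value 40 is an arbitrary placeholder:

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-address-family operations table. */
struct af_ops {
    int (*header_len)(void);
};

static int v6_header_len(void) { return 40; } /* arbitrary placeholder value */
static const struct af_ops v6_ops = { .header_len = v6_header_len };

/* Hypothetical stand-in for a request socket. */
struct req {
    const struct af_ops *af_specific;
};

/* Normal path: the ops pointer is always populated. */
void req_init_normal(struct req *r) { r->af_specific = &v6_ops; }

/* Cookie path before the fix: the pointer is left NULL. */
void req_init_cookie_buggy(struct req *r) { r->af_specific = NULL; }

/* Cookie path after the fix: same initialization as the normal path. */
void req_init_cookie_fixed(struct req *r) { r->af_specific = &v6_ops; }

/* Consumer of the pointer. It returns -1 on NULL only so that this model
 * can be exercised safely; the kernel code dereferenced the pointer. */
int child_header_len(const struct req *r)
{
    if (!r->af_specific)
        return -1;
    return r->af_specific->header_len();
}
```

The point of the sketch is that the consumer is shared by both construction paths, so every path has to initialize the field the consumer relies on.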
      
      [1]
      TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
      general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe48 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48
      RSP: 0018:ffffc90000de0588 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100
      RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008
      RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0
      R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267
       tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207
       cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline]
       tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486
       tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725
       ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422
       ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473
       dst_input include/net/dst.h:461 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       process_backlog+0x3a0/0x7c0 net/core/dev.c:5847
       __napi_poll+0xb3/0x6e0 net/core/dev.c:6413
       napi_poll net/core/dev.c:6480 [inline]
       net_rx_action+0x8ec/0xc60 net/core/dev.c:6567
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
      
      Fixes: 5b0b9e4c ("tcp: md5: incorrect tcp_header_len for incoming connections")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT · 4bfe744f
      Eric Dumazet authored
      I had this bug sitting for too long in my pile, it is time to fix it.
      
      Thanks to Doug Porter for reminding me of it!
      
      We had various attempts in the past, including commit
      0cbe6a8f ("tcp: remove SOCK_QUEUE_SHRUNK"),
      but the issue is that TCP stack currently only generates
      EPOLLOUT from input path, when tp->snd_una has advanced
      and skb(s) cleaned from rtx queue.
      
      If a flow has a big RTT, and/or receives SACKs, it is possible
      that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
      and no more data can be sent until tp->snd_una finally advances.
      
      What is needed is to also check if POLLOUT needs to be generated
      whenever tp->snd_nxt is advanced, from output path.
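
A simplified model of the condition discussed above, assuming the stream counts as writable while the not-yet-sent byte count stays below the low-water mark. The struct and function names are hypothetical, and the real kernel check has more inputs than this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced view of the sender state. */
struct tcp_model {
    uint32_t write_seq;     /* next sequence number the application will queue */
    uint32_t snd_nxt;       /* next sequence number to put on the wire */
    uint32_t notsent_lowat; /* threshold in bytes */
};

/* Bytes queued but not yet sent; unsigned arithmetic wraps modulo 2^32. */
uint32_t notsent_bytes(const struct tcp_model *tp)
{
    return tp->write_seq - tp->snd_nxt;
}

bool stream_writable(const struct tcp_model *tp)
{
    return notsent_bytes(tp) < tp->notsent_lowat;
}

/* Output path: sending len bytes advances snd_nxt. Returns true when this
 * advance is what made the stream writable, i.e. when a wakeup is due. */
bool advance_snd_nxt(struct tcp_model *tp, uint32_t len)
{
    bool was_writable = stream_writable(tp);

    tp->snd_nxt += len;
    return !was_writable && stream_writable(tp);
}
```

In this model the writable state can flip purely because snd_nxt moved, with no ACK involved, which is why a check on the output path is needed in addition to the one on the input path.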
      
      This bug triggers more often after an idle period, as
      we do not receive ACK for at least one RTT. tcp_notsent_lowat
      could be a fraction of what CWND and pacing rate would allow to
      send during this RTT.
      
      In a followup patch, I will remove the bogus call
      to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
      from tcp_check_space(). Fact that we have decided to generate
      an EPOLLOUT does not mean the application has immediately
      refilled the transmit queue. This optimistic call
      might have been the reason the bug seemed not too serious.
      
      Tested:
      
      200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
      
      $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
      $ cat bench_rr.sh
      SUM=0
      for i in {1..10}
      do
       V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
       echo $V
       SUM=$(($SUM + $V))
      done
      echo SUM=$SUM
      
      Before patch:
      $ bench_rr.sh
      130000000
      80000000
      140000000
      140000000
      140000000
      140000000
      130000000
      40000000
      90000000
      110000000
      SUM=1140000000
      
      After patch:
      $ bench_rr.sh
      430000000
      590000000
      530000000
      450000000
      450000000
      350000000
      450000000
      490000000
      480000000
      460000000
      SUM=4680000000  # This is 410 % of the value before patch.
      
      Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Doug Porter <dsp@fb.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridge · 1fcb8fb3
      Vladimir Oltean authored
      DSA, through dsa_port_bridge_leave(), first notifies the port of the
      fact that it left a bridge, then, if that bridge was VLAN-aware, it
      notifies the port of the change in VLAN awareness state, towards
      VLAN-unaware mode.
      
      So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge
      is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct
      ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on
      that port.
      
      In a way this structure correctly reflects the reality, but by design,
      VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge
      VLAN list of the driver, but managed separately.
      
      Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on
      several sanity checks that did not expect to have this VID there.
      For example, after we leave a VLAN-aware bridge and we re-join it, we
      can no longer program egress-tagged VLANs to hardware:
      
       # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
       # ip link set swp0 master br0
       # ip link set swp0 nomaster
       # ip link set swp0 master br0
       # bridge vlan add dev swp0 vid 100
      Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
      
      But this configuration is in fact supported by the hardware, since we
      could use OCELOT_PORT_TAG_NATIVE. According to its comment:
      
      /* all VLANs except the native VLAN and VID 0 are egress-tagged */
      
      yet when assessing the eligibility for this mode, we do not check for
      VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that
      ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0
      doesn't have a bridge VLAN structure.
      
      The way I identify the problem is that ocelot_port_vlan_filtering(false)
      only means to call ocelot_add_vlan_unaware_pvid() when we dynamically
      turn off VLAN awareness for a bridge we are under, and the PVID changes
      from the bridge PVID to a reserved PVID based on the bridge number.
      
      Since OCELOT_STANDALONE_PVID is statically added to the VLAN table
      during ocelot_vlan_init() and never removed afterwards, calling
      ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve
      any purpose.
      
      Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0)
      when we're resetting VLAN awareness after leaving the bridge, to become
      a standalone port.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: ignore VID 0 added by 8021q module · 9323ac36
      Vladimir Oltean authored
      Both the felix DSA driver and ocelot switchdev driver declare
      dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*,
      so the 8021q module will add VID 0 to our RX filter when the port goes
      up, to ensure 802.1p traffic is not dropped.
      
      We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which
      deliberately does not have a struct ocelot_bridge_vlan associated with
      it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init().
      
      If we allow external calls to modify VID 0, we reach the following
      situation:
      
       # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
       # ip link set swp0 master br0
       # ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false
       # bridge vlan
      port              vlan-id
      swp0              1 PVID Egress Untagged # the bridge also adds VID 1
      br0               1 PVID Egress Untagged
       # bridge vlan add dev swp0 vid 100 untagged
      Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN.
      
      This configuration should have been accepted, because
      ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE.
      Yet it isn't, because we have an entry in ocelot->vlans which says
      VID 0 should be egress-tagged, something the hardware can't do.
      
      Fix this by suppressing additions/deletions on VID 0 and managing this
      VLAN exclusively using OCELOT_STANDALONE_PVID.
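
The idea of suppressing external changes to VID 0 can be sketched as an early return in the add path. This is a toy model with hypothetical names and a fixed-size array, not the driver's data structures, and it does not handle duplicate VIDs:

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_STANDALONE_PVID 0u /* plays the role of OCELOT_STANDALONE_PVID */
#define MODEL_MAX_VLANS 8u       /* hypothetical capacity */

/* Hypothetical stand-in for the driver's list of bridge VLANs. */
struct vlan_list_model {
    uint16_t vid[MODEL_MAX_VLANS];
    size_t count;
};

/* Returns 0 on success (including the ignored VID 0 case), -1 if full. */
int model_vlan_add(struct vlan_list_model *l, uint16_t vid)
{
    if (vid == MODEL_STANDALONE_PVID)
        return 0; /* report success, but never track VID 0 in the list */
    if (l->count == MODEL_MAX_VLANS)
        return -1;
    l->vid[l->count++] = vid;
    return 0;
}
```

Returning success rather than an error keeps callers such as the 8021q module working while leaving the list free of the reserved VID.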
      
      *DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware
      bridge. Ocelot declares it unconditionally for some reason.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: flood multicast to CPU when slave has IFF_PROMISC · 7c762e70
      Vladimir Oltean authored
      Certain DSA switches can eliminate flooding to the CPU when none of the
      ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by
      synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call
      which normally comes from the bridge driver via switchdev.
      
      The bridge port flags and IFF_PROMISC|IFF_ALLMULTI have slightly
      different semantics, and due to inattention/lack of proper testing, the
      IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but
      not unknown multicast.
      
      This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD
      in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means
      that packets should not be filtered regardless of their MAC DA.
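
A minimal sketch of the flag mapping described above. The constants are illustrative values defined locally, not the kernel's, and the treatment of IFF_ALLMULTI (multicast flooding only) is an assumption made for the example:

```c
/* Illustrative flag values only; not copied from kernel headers. */
#define MODEL_IFF_PROMISC    0x1u
#define MODEL_IFF_ALLMULTI   0x2u
#define MODEL_BR_FLOOD       0x1u /* flood unknown unicast */
#define MODEL_BR_MCAST_FLOOD 0x2u /* flood unknown multicast */

/* Map interface flags to the flooding flags requested for the CPU port. */
unsigned int host_flood_flags(unsigned int dev_flags)
{
    unsigned int br_flags = 0;

    /* Promiscuous mode: nothing may be filtered by MAC DA, so unknown
     * unicast must be flooded. */
    if (dev_flags & MODEL_IFF_PROMISC)
        br_flags |= MODEL_BR_FLOOD;

    /* Unknown multicast is wanted in promiscuous mode as well as in
     * all-multicast mode. */
    if (dev_flags & (MODEL_IFF_PROMISC | MODEL_IFF_ALLMULTI))
        br_flags |= MODEL_BR_MCAST_FLOOD;

    return br_flags;
}
```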
      
      Fixes: 7569459a ("net: dsa: manage flooding on the CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode · 31c417c9
      Peilin Ye authored
      As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
      collect_md mode is racy for [IP6]GRE[TAP] devices.  Consider the
      following sequence of events:
      
      1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
         add ... external".  "ip" ignores "[o]seq" if "external" is specified,
         so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
         it uses lockless TX);
      2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
         bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
      3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
      
      	gre_build_header(skb, tun_hlen,
      			 flags, protocol,
      			 tunnel_id_to_key32(tun_info->key.tun_id),
      			 (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
      					      : 0);   ^^^^^^^^^^^^^^^^^
      
      Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
      try to do this tunnel->o_seqno++ in parallel, which is racy.  Fix it by
      making o_seqno atomic_t.
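
A user-space analogue of the approach, using C11 atomics in place of the kernel's atomic_t. The struct and function names are hypothetical; the sketch only shows why a fetch-and-add is safe where a plain ++ is not:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the tunnel's private state. */
struct tunnel_model {
    atomic_uint o_seqno;
};

/* Returns the sequence number for the current packet and advances the
 * counter as one indivisible read-modify-write, so two concurrent callers
 * can never observe the same value. */
uint32_t next_seqno(struct tunnel_model *t)
{
    return (uint32_t)atomic_fetch_add(&t->o_seqno, 1u);
}
```

With a plain o_seqno++, two CPUs could both load the same value before either stores the increment, producing duplicate sequence numbers. The atomic version removes duplicates, but it does not order the packets themselves on the wire.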
      
      As mentioned by Eric Dumazet in commit b790e01a ("ip_gre: lockless
      xmit"), making o_seqno atomic_t increases "chance for packets being out
      of order at receiver" when NETIF_F_LLTX is on.
      
      Maybe a better fix would be:
      
      1. Do not ignore "oseq" in external mode.  Users MUST specify "oseq" if
         they want the kernel to allow sequencing of outgoing packets;
      2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
         with "oseq".
      
      Unfortunately, that would break userspace.
      
      We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
      do it in separate patches to keep this fix minimal.

      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Fixes: 77a5196a ("gre: add sequence number for collect md mode.")
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: Make o_seqno start from 0 in native mode · fde98ae9
      Peilin Ye authored
      For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
      native mode.  According to RFC 2890 2.2., "The first datagram is sent
      with a sequence number of 0."  Fix it.
      
      It is worth mentioning that o_seqno already starts from 0 in collect_md
      mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
      where tunnel->o_seqno is passed to gre_build_header() before getting
      incremented.
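
The difference between the two modes comes down to whether the counter is bumped before or after its value is used. A tiny sketch with hypothetical helper names, assuming the counter starts at 0:

```c
#include <stdint.h>

/* Increment first, then use: the first packet carries 1. */
uint32_t seq_increment_then_use(uint32_t *o_seqno)
{
    return ++*o_seqno;
}

/* Use first, then increment: the first packet carries 0, which matches
 * the RFC 2890 wording quoted above. */
uint32_t seq_use_then_increment(uint32_t *o_seqno)
{
    return (*o_seqno)++;
}
```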
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: Make o_seqno start from 0 in native mode · ff827beb
      Peilin Ye authored
      For GRE and GRETAP devices, currently o_seqno starts from 1 in native
      mode.  According to RFC 2890 2.2., "The first datagram is sent with a
      sequence number of 0."  Fix it.
      
      It is worth mentioning that o_seqno already starts from 0 in collect_md
      mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to
      gre_build_header() before getting incremented.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lan966x: fix a couple off by one bugs · 9810c58c
      Dan Carpenter authored
      The lan966x->ports[] array has lan966x->num_phys_ports elements.  These
      are assigned in lan966x_probe().  That means the > comparison should be
      changed to >=.
      
      The first off by one check is harmless but the second one could lead to
      an out of bounds access and a crash.
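
For an array of num elements, the valid indices are 0 through num - 1, so an index equal to num must be rejected. A small sketch of the two comparisons, with hypothetical function names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy check: rejects only idx > num, so idx == num slips through and
 * would index one element past the end of the array. */
bool port_index_ok_buggy(size_t idx, size_t num)
{
    return !(idx > num);
}

/* Fixed check: rejects idx >= num, leaving exactly 0..num-1 as valid. */
bool port_index_ok_fixed(size_t idx, size_t num)
{
    return !(idx >= num);
}
```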
      
      Fixes: 5ccd66e0 ("net: lan966x: add support for interrupts from analyzer")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: sync err code when tcp connection was refused · 4e2e65e2
      liuyacan authored
      In the current implementation, when TCP initiates a connection
      to an unavailable [ip,port], ECONNREFUSED is stored in the
      TCP socket but not in the SMC socket. However, some apps (like curl)
      use getsockopt(,,SO_ERROR,,) to get the error information, so
      they miss the error and behave strangely.
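
For reference, this is roughly how an application reads the pending error after a non-blocking connect(). The helper name is hypothetical; getsockopt() with SOL_SOCKET and SO_ERROR is the standard sockets API:

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Returns the pending error on fd (for example ECONNREFUSED), 0 if there
 * is none, or -1 if getsockopt() itself failed (errno is set).
 * Reading SO_ERROR also clears the pending error on the socket. */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

An application that relies on this call sees 0 if the refused connection was recorded only on an internal socket and never propagated to the socket it actually holds.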
      
      Fixes: 50717a37 ("net/smc: nonblocking connect rework")
      Signed-off-by: liuyacan <liuyacan@corp.netease.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Add missing fwnode_handle_put in hns_mac_init · e85f8a9f
      Peng Wu authored
      In one of the error paths of the device_for_each_child_node() loop
      in hns_mac_init, add missing call to fwnode_handle_put.

      Signed-off-by: Peng Wu <wupeng58@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns3-fixes' · c4c89a6a
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some fixes for -net
      
      This series adds some fixes for the HNS3 ethernet driver.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add return value for mailbox handling in PF · c59d6062
      Jian Shen authored
      Currently, some query mailboxes are sent from the VF to the PF,
      and the VF waits for the PF's handling result. Handling of the
      HCLGE_MBX_GET_QID_IN_PF and HCLGE_MBX_GET_RSS_KEY mailboxes may fail
      when the input parameter is invalid, but the prototype of their
      handler functions is void. In this case, the PF always returns success
      to the VF, which may cause the VF to get an incorrect result.

      Fix it by adding a return value to these functions.
      
      Fixes: 63b1279d ("net: hns3: check queue id range before using")
      Fixes: 532cfc0d ("net: hns3: add a check for index in hclge_get_rss_key()")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add validity check for message data length · 7d413735
      Jian Shen authored
      Add a validity check for the message data length in function
      hclge_send_mbx_msg() to avoid an unexpected overflow.
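
The general shape of such a check is a length comparison before the copy into a fixed-size buffer. The sketch below uses a hypothetical message struct, a hypothetical capacity, and an assumed -EINVAL error code; it does not reproduce the driver's actual types or return values:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MBX_MAX_DATA 8 /* hypothetical payload capacity in bytes */

/* Hypothetical stand-in for a mailbox message. */
struct mbx_msg_model {
    uint8_t data[MODEL_MBX_MAX_DATA];
    uint16_t len;
};

/* Copies len bytes into the message, refusing anything that would not fit
 * instead of writing past the end of the buffer. */
int mbx_fill(struct mbx_msg_model *m, const void *src, size_t len)
{
    if (len > sizeof(m->data))
        return -EINVAL;
    memcpy(m->data, src, len);
    m->len = (uint16_t)len;
    return 0;
}
```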
      
      Fixes: dde1a86e ("net: hns3: Add mailbox support to PF driver")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: modify the return code of hclge_get_ring_chain_from_mbx · 48009e99
      Jie Wang authored
      Currently, function hclge_get_ring_chain_from_mbx will return -ENOMEM if
      ring_num is bigger than HCLGE_MBX_MAX_RING_CHAIN_PARAM_NUM. It is better to
      return -EINVAL for the invalid parameter case.
      
      So this patch fixes it by returning -EINVAL in this abnormal branch.
      
      Fixes: 5d02a58d ("net: hns3: fix for buffer overflow smatch warning")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix error log of tx/rx tqps stats · 123521b6
      Peng Li authored
      The comments in function hclge_comm_tqps_update_stats are not right,
      so fix them.
      
      Fixes: 287db5c4 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse")
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: align the debugfs output to the left · 1ec1968e
      Hao Chen authored
      For the debugfs nodes rx/tx_queue_info and rx/tx_bd_info, the output is
      aligned to the right, which is inconsistent with the output of the other
      debugfs nodes, so align their output to the left to make it uniform.
      
      Fixes: 907676b1 ("net: hns3: use tx bounce buffer for small packets")
      Fixes: e44c495d ("net: hns3: refactor queue info of debugfs")
      Fixes: 77e91848 ("net: hns3: refactor dump bd info of debugfs")
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: clear inited state and stop client after failed to register netdev · e98365af
      Jian Shen authored
      If registering the netdev fails, the driver needs to clear the INITED
      state and stop the client; otherwise problems may occur when this runs
      concurrently with the driver's uninitialization process.
      
      Fixes: a289a7e5 ("net: hns3: put off calling register_netdev() until client initialize complete")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 5220a525
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix incorrect printing of memory size of IPVS connection hash table,
         from Pengcheng Yang.
      
      2) Fix spurious EEXIST errors in nft_set_rbtree.
      
      3) Remove leftover empty flowtable file, from Rongguang Wei.
      
      4) Fix ip6_route_me_harder() with vrf driver, from Martin Willi.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: Update ip6_route_me_harder to consider L3 domain · 8ddffdb9
      Martin Willi authored
      The commit referenced below fixed packet re-routing if Netfilter mangles
      a routing key property of a packet and the packet is routed in a VRF L3
      domain. The fix, however, addressed IPv4 re-routing, only.
      
      This commit applies the same behavior for IPv6. While at it, untangle
      the nested ternary operator to make the code more readable.
      
      Fixes: 6d8b49c3 ("netfilter: Update ip_route_me_harder to consider L3 domain")
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: flowtable: Remove the empty file · b9b1e0da
      Rongguang Wei authored
      CONFIG_NF_FLOW_TABLE_IPV4 has already been removed, and its real user
      has been removed as well (nf_flow_table_ipv4.c is empty).
      
      Fixes: c42ba429 ("netfilter: flowtable: remove ipv4/ipv6 modules")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 23 Apr, 2022 1 commit
    • sctp: check asoc strreset_chunk in sctp_generate_reconf_event · 165e3e17
      Xin Long authored
      A null pointer reference issue can be triggered when the response of a
      stream reconf request arrives after the timer is triggered, such as:
      
        send Incoming SSN Reset Request --->
        CPU0:
         reconf timer is triggered,
         go to the handler code before hold sk lock
                                  <--- reply with Outgoing SSN Reset Request
        CPU1:
         process Outgoing SSN Reset Request,
         and set asoc->strreset_chunk to NULL
        CPU0:
         continue the handler code, hold sk lock,
         and try to hold asoc->strreset_chunk, crash!
      
      In Ying Xu's testing, the call trace is:
      
        [ ] BUG: kernel NULL pointer dereference, address: 0000000000000010
        [ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp]
        [ ] Call Trace:
        [ ]  <IRQ>
        [ ]  sctp_sf_send_reconf+0x2c/0x100 [sctp]
        [ ]  sctp_do_sm+0xa4/0x220 [sctp]
        [ ]  sctp_generate_reconf_event+0xbd/0xe0 [sctp]
        [ ]  call_timer_fn+0x26/0x130
      
      This patch is to fix it by returning from the timer handler if asoc
      strreset_chunk is already set to NULL.
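
A single-threaded model of the guard, assuming the check runs after the lock has been taken so that the pointer cannot change underneath it. The struct and function names are hypothetical, and locking is omitted from the sketch:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the chunk and the association. */
struct chunk_model {
    int refcnt;
};

struct asoc_model {
    struct chunk_model *strreset_chunk;
};

/* Timer handler body. Returns 1 if the chunk was held for retransmission,
 * 0 if the request was already answered and there is nothing left to do. */
int reconf_timer_handler(struct asoc_model *asoc)
{
    /* The response may have arrived between timer expiry and this point,
     * in which case the pointer has already been cleared. */
    if (!asoc->strreset_chunk)
        return 0;

    asoc->strreset_chunk->refcnt++; /* models taking a reference */
    return 1;
}
```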
      
      Fixes: 7b9438de ("sctp: add stream reconf timer")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Apr, 2022 12 commits
  4. 21 Apr, 2022 2 commits
    • Merge tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 59f0c244
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from xfrm and can.
      
        Current release - regressions:
      
         - rxrpc: restore removed timer deletion
      
        Current release - new code bugs:
      
         - gre: fix device lookup for l3mdev use-case
      
         - xfrm: fix egress device lookup for l3mdev use-case
      
        Previous releases - regressions:
      
         - sched: cls_u32: fix netns refcount changes in u32_change()
      
         - smc: fix sock leak when release after smc_shutdown()
      
         - xfrm: limit skb_page_frag_refill use to a single page
      
         - eth: atlantic: invert deep par in pm functions, preventing null
           derefs
      
         - eth: stmmac: use readl_poll_timeout_atomic() in atomic state
      
        Previous releases - always broken:
      
         - gre: fix skb_under_panic on xmit
      
         - openvswitch: fix OOB access in reserve_sfa_size()
      
         - dsa: hellcreek: calculate checksums in tagger
      
         - eth: ice: fix crash in switchdev mode
      
         - eth: igc:
            - fix infinite loop in release_swfw_sync
            - fix scheduling while atomic"
      
      * tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
        drivers: net: hippi: Fix deadlock in rr_close()
        selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets
        selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
        nfc: MAINTAINERS: add Bug entry
        net: stmmac: Use readl_poll_timeout_atomic() in atomic state
        doc/ip-sysctl: add bc_forwarding
        netlink: reset network and mac headers in netlink_dump()
        net: mscc: ocelot: fix broken IP multicast flooding
        net: dsa: hellcreek: Calculate checksums in tagger
        net: atlantic: invert deep par in pm functions, preventing null derefs
        can: isotp: stop timeout monitoring when no first frame was sent
        bonding: do not discard lowest hash bit for non layer3+4 hashing
        net: lan966x: Make sure to release ptp interrupt
        ipv6: make ip6_rt_gc_expire an atomic_t
        net: Handle l3mdev in ip_tunnel_init_flow
        l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu
        net/sched: cls_u32: fix possible leak in u32_init_knode()
        net/sched: cls_u32: fix netns refcount changes in u32_change()
        powerpc: Update MAINTAINERS for ibmvnic and VAS
        net: restore alpha order to Ethernet devices in config
        ...
    • drivers: net: hippi: Fix deadlock in rr_close() · bc6de287
      Duoming Zhou authored
      There is a deadlock in rr_close(), which is shown below:
      
         (Thread 1)                |      (Thread 2)
                                   | rr_open()
      rr_close()                   |  add_timer()
       spin_lock_irqsave() //(1)   |  (wait a time)
       ...                         | rr_timer()
       del_timer_sync()            |  spin_lock_irqsave() //(2)
       (wait timer to stop)        |  ...
      
      We hold rrpriv->lock at position (1) in thread 1 and use
      del_timer_sync() to wait for the timer to stop, but the timer handler
      also needs rrpriv->lock at position (2) in thread 2.
      As a result, rr_close() will block forever.

      This patch moves del_timer_sync() out of the region protected by
      spin_lock_irqsave(), so that the timer handler can obtain
      the lock it needs.

      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Link: https://lore.kernel.org/r/20220417125519.82618-1-duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  5. 20 Apr, 2022 4 commits
    • Merge tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa · b2534357
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix patching CPU selection in patch_text
      
       - fix potential deadlock in ISS platform serial driver
      
       - fix potential register clobbering in coprocessor exception handler
      
      * tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix a7 clobbering in coprocessor context load/store
        arch: xtensa: platforms: Fix deadlock in rs_close()
        xtensa: patch_text: Fixup last cpu should be master
    • Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 10c5f102
      Linus Torvalds authored
      Pull erofs fixes from Gao Xiang:
       "One patch to fix a use-after-free race related to the on-stack
        z_erofs_decompressqueue, which happens very rarely but needs to be
        fixed properly soon.
      
        The other patch fixes some sysfs Sphinx warnings"
      
      * tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors
        erofs: fix use-after-free of on-stack io[]
    • Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array" · 906f9040
      Linus Torvalds authored
      This reverts commit 5a519c8f.
      
      It turns out that making the pipe almost arbitrarily large has some
      rather unexpected downsides.  The kernel test robot reports a kernel
      warning that is due to pipe->max_usage now growing to the point where
      the iter_file_splice_write() buffer allocation can no longer be
      satisfied as a slab allocation, and the
      
              int nbufs = pipe->max_usage;
              struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
                                              GFP_KERNEL);
      
      code sequence there will now always fail as a result.
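
The arithmetic behind this kind of failure is an element count multiplied by an element size exceeding what one contiguous allocation can provide. A rough sketch, with the limit passed in as a parameter because the real slab limit depends on kernel configuration; array_fits is a hypothetical helper, not a kernel function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if n elements of elem_size bytes fit within limit bytes. The
 * multiplication is checked for overflow first, in the spirit of what a
 * calloc-style allocator has to do. */
bool array_fits(size_t n, size_t elem_size, size_t limit)
{
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return false; /* n * elem_size would overflow size_t */
    return n * elem_size <= limit;
}
```

With an assumed 16-byte element and an assumed 4 MiB limit, 16 elements fit easily while 2^20 elements (16 MiB) do not, so a count that is allowed to grow without bound eventually stops fitting.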
      
      That code could be modified to use kvcalloc() too, but I feel very
      uncomfortable making those kinds of changes for a very niche use case
      that really should have other options than make these kinds of
      fundamental changes to pipe behavior.
      
      Maybe the CRIU process dumping should be multi-threaded, and use
      multiple pipes and multiple cores, rather than try to use one larger
      pipe to minimize splice() calls.

      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: __memcpy_flushcache: fix wrong alignment if size > 2^32 · a6823e4e
      Mikulas Patocka authored
      The first "if" condition in __memcpy_flushcache is supposed to align the
      "dest" variable to 8 bytes and copy data up to this alignment.  However,
      this condition may misbehave if "size" is greater than 4GiB.
      
      The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both
      arguments to unsigned int and selects the smaller one.  However, the
      cast truncates high bits in "size" and it results in misbehavior.
      
      For example:
      
      	suppose that size == 0x100000001, dest == 0x200000002
      	min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0x6) == 0x1;
      	...
      	dest += 0x1;
      
      so we copy just one byte and dest remains unaligned.
      
      This patch fixes the bug by replacing unsigned with size_t.
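
The truncation can be reproduced in user space with fixed-width types. The helpers below are hypothetical analogues of min_t(unsigned, ...) and min_t(size_t, ...), assuming a 32-bit unsigned int and a 64-bit size_t:

```c
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

/* Analogue of min_t(unsigned, ...): both operands are cast to 32 bits
 * before the comparison, so the high bits of size are lost. */
uint64_t head_len_truncating(uint64_t size, uint64_t dest)
{
    uint32_t a = (uint32_t)size;
    uint32_t b = (uint32_t)(ALIGN_UP(dest, 8) - dest);

    return a < b ? a : b;
}

/* Analogue of min_t(size_t, ...): full-width comparison. */
uint64_t head_len_full(uint64_t size, uint64_t dest)
{
    uint64_t b = ALIGN_UP(dest, 8) - dest;

    return size < b ? size : b;
}
```

For size 0x100000001 and dest 0x200000002, the truncating version yields 1, which leaves dest unaligned after the head copy, while the full-width version yields the 6 bytes needed to reach the next 8-byte boundary.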
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>