1. 19 Aug, 2021 11 commits
  2. 18 Aug, 2021 11 commits
  3. 17 Aug, 2021 4 commits
  4. 16 Aug, 2021 10 commits
    • Lahav Schlesinger's avatar
      vrf: Reset skb conntrack connection on VRF rcv · 09e856d5
      Lahav Schlesinger authored
      To fix the "reverse-NAT" for replies.
      
      When a packet is sent over a VRF, the POST_ROUTING hooks are called
      twice: Once from the VRF interface, and once from the "actual"
      interface the packet will be sent from:
      1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
           This causes the POST_ROUTING hooks to run.
      2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.
      
      Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
      second vrf_l3_rcv() calls them again.
      
      As an example, consider the following SNAT rule:
      > iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1
      
      In this case sending over a VRF will create 2 conntrack entries.
      The first is from the VRF interface, which performs the IP SNAT.
      The second will run the SNAT, but since the "expected reply" will remain
      the same, conntrack randomizes the source port of the packet:
      e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
      rules are:
      udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
      udp      17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
      
      i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
      SNAT-ed from 10000 --> 61033.
      
      But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
      conntrack entry is matched:
      udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
      udp      17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
      
      And a "port 61033 unreachable" ICMP packet is sent back.
      
      The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
      the skb already has a conntrack flow attached to it, which means
      nf_conntrack_in() will not resolve the flow again.
      
      This means only the dest port is "reverse-NATed" (61033 -> 10000) but
      the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
      not received.
      This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.
      
      The fix is then to reset the flow when skb is received on a VRF, to let
      conntrack resolve the flow again (which now will hit the earlier flow).
      
      To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
        running 'bash ./repro'):
        $ cat run_in_A1.py
        import logging
        logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
        from scapy.all import *
        import argparse
      
        def get_packet_to_send(udp_dst_port, msg_name):
            return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
                IP(src='3.3.3.3', dst='2.2.2.2')/ \
                UDP(sport=53, dport=udp_dst_port)/ \
                Raw(f'{msg_name}\x0012345678901234567890')
      
        parser = argparse.ArgumentParser()
        parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
                            help="From run_in_A3.py")
        parser.add_argument('-socket_port', dest="socket_port", type=str,
                            required=True, help="From run_in_A3.py")
        parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
                            help="From script")
      
        args, _ = parser.parse_known_args()
        iface_mac = args.iface_mac
        socket_port = int(args.socket_port)
        v1_mac = args.v1_mac
      
        print(f'Source port before NAT: {socket_port}')
      
        while True:
            pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
            if 0 == len(pkts):
                print('Something failed, rerun the script :(', flush=True)
                break
            pkt = pkts[0]
            if not pkt.haslayer('UDP'):
                continue
      
            pkt_sport = pkt.getlayer('UDP').sport
            print(f'Source port after NAT: {pkt_sport}', flush=True)
      
            pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
            sendp(pkt_to_send, '_v0', verbose=False) # Will not be received
      
            pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
            sendp(pkt_to_send, '_v0', verbose=False)
            break
      
        $ cat run_in_A2.py
        import socket
        import netifaces
      
        print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
              flush=True)
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                     str('vrf_1' + '\0').encode('utf-8'))
        s.connect(('3.3.3.3', 53))
        print(f'{s. getsockname()[1]}', flush=True)
        s.settimeout(5)
      
        while True:
            try:
                # Periodically send in order to keep the conntrack entry alive.
                s.send(b'a'*40)
                resp = s.recvfrom(1024)
                msg_name = resp[0].decode('utf-8').split('\0')[0]
                print(f"Got {msg_name}", flush=True)
            except Exception as e:
                pass
      
        $ cat repro.sh
        ip netns del A1 2> /dev/null
        ip netns del A2 2> /dev/null
        ip netns add A1
        ip netns add A2
      
        ip -n A1 link add _v0 type veth peer name _v1 netns A2
        ip -n A1 link set _v0 up
      
        ip -n A2 link add e00000 type bond
        ip -n A2 link add lo0 type dummy
        ip -n A2 link add vrf_1 type vrf table 10001
        ip -n A2 link set vrf_1 up
        ip -n A2 link set e00000 master vrf_1
      
        ip -n A2 addr add 1.1.1.1/24 dev e00000
        ip -n A2 link set e00000 up
        ip -n A2 link set _v1 master e00000
        ip -n A2 link set _v1 up
        ip -n A2 link set lo0 up
        ip -n A2 addr add 2.2.2.2/32 dev lo0
      
        ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
        ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001
      
        ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
      	SNAT --to-source 2.2.2.2 -o vrf_1
      
        sleep 5
        ip netns exec A2 python3 run_in_A2.py > x &
        XPID=$!
        sleep 5
      
        IFACE_MAC=`sed -n 1p x`
        SOCKET_PORT=`sed -n 2p x`
        V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
        ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
                ${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
        sleep 5
      
        kill -9 $XPID
        wait $XPID 2> /dev/null
        ip netns del A1
        ip netns del A2
        tail x -n 2
        rm x
        set +x
      
      Fixes: 73e20b76 ("net: vrf: Add support for PREROUTING rules on vrf device")
      Signed-off-by: default avatarLahav Schlesinger <lschlesinger@drivenets.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      09e856d5
    • Dan Carpenter's avatar
      net: iosm: Prevent underflow in ipc_chnl_cfg_get() · 4f3f2e3f
      Dan Carpenter authored
      The bounds check on "index" doesn't catch negative values.  Using
      ARRAY_SIZE() directly is more readable and more robust because it prevents
      negative values for "index".  Fortunately we only pass valid values to
      ipc_chnl_cfg_get() so this patch does not affect runtime.
      Reported-by: default avatarSolomon Ucko <solly.ucko@gmail.com>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarM Chetan Kumar <m.chetan.kumar@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f3f2e3f
    • David S. Miller's avatar
      Merge branch 'bnxt_en-fixes' · 517c54d2
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: 2 bug fixes
      
      The first one disables aRFS/NTUPLE on an older broken firmware version.
      The second one adds missing memory barriers related to completion ring
      handling.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      517c54d2
    • Michael Chan's avatar
      bnxt_en: Add missing DMA memory barriers · 828affc2
      Michael Chan authored
      Each completion ring entry has a valid bit to indicate that the entry
      contains a valid completion event.  The driver's main poll loop
      __bnxt_poll_work() has the proper dma_rmb() to make sure the valid
      bit of the next entry has been checked before proceeding further.
      But when we call bnxt_rx_pkt() to process the RX event, the RX
      completion event consists of two completion entries and only the
      first entry has been checked to be valid.  We need the same barrier
      after checking the next completion entry.  Add missing dma_rmb()
      barriers in bnxt_rx_pkt() and other similar locations.
      
      Fixes: 67a95e20 ("bnxt_en: Need memory barrier when processing the completion ring.")
      Reported-by: default avatarLance Richardson <lance.richardson@broadcom.com>
      Reviewed-by: default avatarAndy Gospodarek <gospo@broadcom.com>
      Reviewed-by: default avatarLance Richardson <lance.richardson@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      828affc2
    • Michael Chan's avatar
      bnxt_en: Disable aRFS if running on 212 firmware · 976e52b7
      Michael Chan authored
      212 firmware broke aRFS, so disable it.  Traffic may stop after ntuple
      filters are inserted and deleted by the 212 firmware.
      
      Fixes: ae10ae74 ("bnxt_en: Add new hardware RFS mode.")
      Reviewed-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      976e52b7
    • Shai Malin's avatar
      qed: Fix null-pointer dereference in qed_rdma_create_qp() · d33d19d3
      Shai Malin authored
      Fix a possible null-pointer dereference in qed_rdma_create_qp().
      
      Changes from V2:
      - Revert checkpatch fixes.
      Reported-by: default avatarTOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarShai Malin <smalin@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d33d19d3
    • Shai Malin's avatar
      qed: qed ll2 race condition fixes · 37110237
      Shai Malin authored
      Avoiding qed ll2 race condition and NULL pointer dereference as part
      of the remove and recovery flows.
      
      Changes form V1:
      - Change (!p_rx->set_prod_addr).
      - qed_ll2.c checkpatch fixes.
      
      Change from V2:
      - Revert "qed_ll2.c checkpatch fixes".
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarShai Malin <smalin@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      37110237
    • Xin Long's avatar
      tipc: call tipc_wait_for_connect only when dlen is not 0 · 7387a72c
      Xin Long authored
      __tipc_sendmsg() is called to send SYN packet by either tipc_sendmsg()
      or tipc_connect(). The difference is in tipc_connect(), it will call
      tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting
      is done. So there's no need to wait in __tipc_sendmsg() for this case.
      
      This patch is to fix it by calling tipc_wait_for_connect() only when dlen
      is not 0 in __tipc_sendmsg(), which means it's called by tipc_connect().
      
      Note this also fixes the failure in tipcutils/test/ptts/:
      
        # ./tipcTS &
        # ./tipcTC 9
        (hang)
      
      Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
      Reported-by: default avatarShuang Li <shuali@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7387a72c
    • Andy Shevchenko's avatar
      ptp_pch: Restore dependency on PCI · 55c8fca1
      Andy Shevchenko authored
      During the swap dependency on PCH_GBE to selection PTP_1588_CLOCK_PCH
      incidentally dropped the implicit dependency on the PCI. Restore it.
      
      Fixes: 18d359ce ("pch_gbe, ptp_pch: Fix the dependency direction between these drivers")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55c8fca1
    • Pavel Skripkin's avatar
      net: 6pack: fix slab-out-of-bounds in decode_data · 19d1532a
      Pavel Skripkin authored
      Syzbot reported slab-out-of bounds write in decode_data().
      The problem was in missing validation checks.
      
      Syzbot's reproducer generated malicious input, which caused
      decode_data() to be called a lot in sixpack_decode(). Since
      rx_count_cooked is only 400 bytes and noone reported before,
      that 400 bytes is not enough, let's just check if input is malicious
      and complain about buffer overrun.
      
      Fail log:
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in drivers/net/hamradio/6pack.c:843
      Write of size 1 at addr ffff888087c5544e by task kworker/u4:0/7
      
      CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.6.0-rc3-syzkaller #0
      ...
      Workqueue: events_unbound flush_to_ldisc
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
       __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506
       kasan_report+0x12/0x20 mm/kasan/common.c:641
       __asan_report_store1_noabort+0x17/0x20 mm/kasan/generic_report.c:137
       decode_data.part.0+0x23b/0x270 drivers/net/hamradio/6pack.c:843
       decode_data drivers/net/hamradio/6pack.c:965 [inline]
       sixpack_decode drivers/net/hamradio/6pack.c:968 [inline]
      
      Reported-and-tested-by: syzbot+fc8cd9a673d4577fb2e4@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Reviewed-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19d1532a
  5. 14 Aug, 2021 1 commit
  6. 13 Aug, 2021 3 commits