1. 02 Nov, 2021 12 commits
    • Revert "net: avoid double accounting for pure zerocopy skbs" · 84882cf7
      Jakub Kicinski authored
      This reverts commit f1a456f8.
      
        WARNING: CPU: 1 PID: 6819 at net/core/skbuff.c:5429 skb_try_coalesce+0x78b/0x7e0
        CPU: 1 PID: 6819 Comm: xxxxxxx Kdump: loaded Tainted: G S                5.15.0-04194-gd852503f7711 #16
        RIP: 0010:skb_try_coalesce+0x78b/0x7e0
        Code: e8 2a bf 41 ff 44 8b b3 bc 00 00 00 48 8b 7c 24 30 e8 19 c0 41 ff 44 89 f0 48 03 83 c0 00 00 00 48 89 44 24 40 e9 47 fb ff ff <0f> 0b e9 ca fc ff ff 4c 8d 70 ff 48 83 c0 07 48 89 44 24 38 e9 61
        RSP: 0018:ffff88881f449688 EFLAGS: 00010282
        RAX: 00000000fffffe96 RBX: ffff8881566e4460 RCX: ffffffff82079f7e
        RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8881566e47b0
        RBP: ffff8881566e46e0 R08: ffffed102619235d R09: ffffed102619235d
        R10: ffff888130c91ae3 R11: ffffed102619235c R12: ffff88881f4498a0
        R13: 0000000000000056 R14: 0000000000000009 R15: ffff888130c91ac0
        FS:  00007fec2cbb9700(0000) GS:ffff88881f440000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fec1b060d80 CR3: 00000003acf94005 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <IRQ>
         tcp_try_coalesce+0xeb/0x290
         ? tcp_parse_options+0x610/0x610
         ? mark_held_locks+0x79/0xa0
         tcp_queue_rcv+0x69/0x2f0
         tcp_rcv_established+0xa49/0xd40
         ? tcp_data_queue+0x18a0/0x18a0
         tcp_v6_do_rcv+0x1c9/0x880
         ? rt6_mtu_change_route+0x100/0x100
         tcp_v6_rcv+0x1624/0x1830
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8a33dcc2
      Jakub Kicinski authored
      Merge in the fixes we had queued in case there was another -rc.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · b7b98f86
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2021-11-01
      
      We've added 181 non-merge commits during the last 28 day(s) which contain
      a total of 280 files changed, 11791 insertions(+), 5879 deletions(-).
      
      The main changes are:
      
      1) Fix bpf verifier propagation of 64-bit bounds, from Alexei.
      
      2) Parallelize bpf test_progs, from Yucong and Andrii.
      
      3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus.
      
      4) Improve bpf selftests on s390, from Ilya.
      
      5) bloomfilter bpf map type, from Joanne.
      
      6) Big improvements to JIT tests especially on Mips, from Johan.
      
      7) Support kernel module function calls from bpf, from Kumar.
      
      8) Support typeless and weak ksym in light skeleton, from Kumar.
      
      9) Disallow unprivileged bpf by default, from Pawan.
      
      10) BTF_KIND_DECL_TAG support, from Yonghong.
      
      11) Various bpftool cleanups, from Quentin.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits)
        libbpf: Deprecate AF_XDP support
        kbuild: Unify options for BTF generation for vmlinux and modules
        selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
        bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
        bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
        selftests/bpf: Fix also no-alu32 strobemeta selftest
        bpf: Add missing map_delete_elem method to bloom filter map
        selftests/bpf: Add bloom map success test for userspace calls
        bpf: Add alignment padding for "map_extra" + consolidate holes
        bpf: Bloom filter map naming fixups
        selftests/bpf: Add test cases for struct_ops prog
        bpf: Add dummy BPF STRUCT_OPS for test purpose
        bpf: Factor out helpers for ctx access checking
        bpf: Factor out a helper to prepare trampoline for struct_ops prog
        selftests, bpf: Fix broken riscv build
        riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h
        tools, build: Add RISC-V to HOSTARCH parsing
        riscv, bpf: Increase the maximum number of iterations
        selftests, bpf: Add one test for sockmap with strparser
        selftests, bpf: Fix test_txmsg_ingress_parser error
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'make-neighbor-eviction-controllable-by-userspace' · 52fa3ee0
      Jakub Kicinski authored
      James Prestwood says:
      
      ====================
      Make neighbor eviction controllable by userspace
      ====================
      
      Link: https://lore.kernel.org/r/20211101173630.300969-1-prestwoj@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: add arp_ndisc_evict_nocarrier · f86ca07e
      James Prestwood authored
      This tests the sysctl options for ARP/ND:
      
      /net/ipv4/conf/<iface>/arp_evict_nocarrier
      /net/ipv4/conf/all/arp_evict_nocarrier
      /net/ipv6/conf/<iface>/ndisc_evict_nocarrier
      /net/ipv6/conf/all/ndisc_evict_nocarrier
      Signed-off-by: James Prestwood <prestwoj@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter · 18ac597a
      James Prestwood authored
      In most situations the neighbor discovery cache should be cleared on a
      NOCARRIER event which is currently done unconditionally. But for wireless
      roams the neighbor discovery cache can and should remain intact since
      the underlying network has not changed.
      
      This patch introduces a sysctl option ndisc_evict_nocarrier which can
      be disabled by a wireless supplicant during a roam. This allows packets
      to be sent after a roam immediately without having to wait for
      neighbor discovery.
      
      A user reported roughly a 1 second delay after a roam before packets
      could be sent out (note, on IPv4). This delay was due to the ARP
      cache being cleared. During testing of this same scenario using IPv6
      no delay was noticed, but regardless there is no reason to clear
      the ndisc cache for wireless roams.
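A supplicant-side toggle for this option might look like the sketch below. It is illustrative only: the helper names `evict_sysctl_path` and `write_sysctl_flag` are hypothetical, and the path layout assumes the usual `/proc/sys` mount together with the `net/ipv6/conf/<iface>/ndisc_evict_nocarrier` entry named in this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<root>/net/<family>/conf/<iface>/<knob>" into buf.
 * Returns 0 on success, -1 if iface is unusable or buf is too small.
 * The root is a parameter (hypothetical design choice) so the helper can
 * be pointed at a scratch directory instead of /proc/sys.
 */
int evict_sysctl_path(char *buf, size_t len, const char *root,
                      const char *family, const char *iface,
                      const char *knob)
{
    int n;

    assert(buf && root && family && iface && knob);

    /* Reject empty names and anything that could escape the conf dir. */
    if (iface[0] == '\0' || strchr(iface, '/'))
        return -1;

    n = snprintf(buf, len, "%s/net/%s/conf/%s/%s", root, family, iface, knob);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0\n" or "1\n" to the file at path. Returns 0 on success. */
int write_sysctl_flag(const char *path, int enabled)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(enabled ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With root `/proc/sys`, family `ipv6` and knob `ndisc_evict_nocarrier`, a supplicant could write 0 before starting a roam and presumably restore 1 afterwards. The IPv4 option is reached the same way with `ipv4` and `arp_evict_nocarrier`.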
      Signed-off-by: James Prestwood <prestwoj@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: arp: introduce arp_evict_nocarrier sysctl parameter · fcdb44d0
      James Prestwood authored
      This change introduces a new sysctl parameter, arp_evict_nocarrier.
      When set (default) the ARP cache will be cleared on a NOCARRIER event.
      This new option has been defaulted to '1' which maintains existing
      behavior.
      
      Clearing the ARP cache on NOCARRIER is relatively new, introduced by:
      
      commit 859bd2ef
      Author: David Ahern <dsahern@gmail.com>
      Date:   Thu Oct 11 20:33:49 2018 -0700
      
          net: Evict neighbor entries on carrier down
      
      The reason for this change is to prevent the ARP cache from being
      cleared when a wireless device roams. Specifically for wireless roams
      the ARP cache should not be cleared because the underlying network has not
      changed. Clearing the ARP cache in this case can introduce significant
      delays sending out packets after a roam.
      
      A user reported such a situation here:
      
      https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/
      
      After some investigation it was found that the kernel was holding onto
      packets until ARP finished which resulted in this 1 second delay. It
      was also found that the first ARP who-has was never responded to,
      which is actually what causes the delay. This change is more or less
      working around this behavior, but again, there is no reason to clear
      the cache on a roam anyways.
      
      As for the unanswered who-has, we know the packet made it OTA since
      it was seen while monitoring. Why it never received a response is
      unknown. In any case, since this is a problem on the AP side of things
      all that can be done is to work around it until it is solved.
      
      Some background on testing/reproducing the packet delay:
      
      Hardware:
       - 2 access points configured for Fast BSS Transition (Though I don't
         see why regular reassociation wouldn't have the same behavior)
       - Wireless station running IWD as supplicant
       - A device on network able to respond to pings (I used one of the APs)
      
      Procedure:
       - Connect to first AP
       - Ping once to establish an ARP entry
       - Start a tcpdump
       - Roam to second AP
       - Wait for operstate UP event, and note the timestamp
       - Start pinging
      
      Results:
      
      Below is the tcpdump output after UP. The interface was recorded going
      UP at 10:42:01.432875.
      
      10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
      10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
      10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
      10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
      10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
      10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
      10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
      10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
      10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
      10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
      10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
      10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
      10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
      10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
      10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64
      
      You can see the first ARP who-has went out very quickly after UP, but
      was never responded to. Nearly a second later the kernel retries and
      gets a response. Only then do the ping packets go out. If an ARP entry
      is manually added prior to UP (after the cache is cleared) it is seen
      that the first ping is never responded to, so it's not only an issue with
      ARP but with data packets in general.
      
      As mentioned prior, the wireless interface was also monitored to verify
      the ping/ARP packet made it OTA which was observed to be true.
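The delays in the capture can be checked with a little timestamp arithmetic. The sketch below is only a worked calculation over the stamps quoted above. The helper names `stamp_to_us` and `delta_us` are hypothetical, and the parser assumes tcpdump's default `HH:MM:SS.uuuuuu` form with exactly six fractional digits.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "HH:MM:SS.uuuuuu" into microseconds since midnight.
 * Returns -1 if the string does not match that shape.
 */
long long stamp_to_us(const char *s)
{
    int h, m, sec, us;

    if (sscanf(s, "%2d:%2d:%2d.%6d", &h, &m, &sec, &us) != 4)
        return -1;
    return (((long long)h * 60 + m) * 60 + sec) * 1000000LL + us;
}

/* Microseconds elapsed between two stamps taken on the same day. */
long long delta_us(const char *from, const char *to)
{
    long long a = stamp_to_us(from);
    long long b = stamp_to_us(to);

    assert(a >= 0 && b >= 0);
    return b - a;
}
```

Applied to the trace, `delta_us("10:42:01.432875", "10:42:01.461871")` gives 28996 us between UP and the first who-has, the retry follows 1036105 us after that, the reply arrives 9186 us after the retry, and the first echo request leaves 1074310 us after UP. That is consistent with the roughly one second delay reported.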
      Signed-off-by: James Prestwood <prestwoj@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • libbpf: Deprecate AF_XDP support · 0b170456
      Magnus Karlsson authored
      Deprecate AF_XDP support in libbpf ([0]). This has been moved to
      libxdp as it is a better fit for that library. The AF_XDP support only
      uses the public libbpf functions and can therefore just use libbpf as
      a library from libxdp. The libxdp APIs are exactly the same, so
      migrating should only require linking with libxdp instead of libbpf
      for the AF_XDP functionality. If not, please submit a bug report.
      Linking with both
      libraries is supported but make sure you link in the correct order so
      that the new functions in libxdp are used instead of the deprecated
      ones in libbpf.
      
      Libxdp can be found at https://github.com/xdp-project/xdp-tools.
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/270
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20211029090111.4733-1-magnus.karlsson@gmail.com
    • kbuild: Unify options for BTF generation for vmlinux and modules · 9741e07e
      Jiri Olsa authored
      Using new PAHOLE_FLAGS variable to pass extra arguments to
      pahole for both vmlinux and modules BTF data generation.
      
      Adding new scripts/pahole-flags.sh script that detects and
      prints pahole options.
      
      [ fixed issues found by kernel test robot ]
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211029125729.70002-1-jolsa@kernel.org
    • selftests/bpf: Add a testcase for 64-bit bounds propagation issue. · 0869e507
      Alexei Starovoitov authored
      ./test_progs-no_alu32 -vv -t twfw
      
      Before the 64-bit_into_32-bit fix:
      19: (25) if r1 > 0x3f goto pc+6
       R1_w=inv(id=0,umax_value=63,var_off=(0x0; 0xff),s32_max_value=255,u32_max_value=255)
      
      and eventually:
      
      invalid access to map value, value_size=8 off=7 size=8
      R6 max value is outside of the allowed memory range
      libbpf: failed to load object 'no_alu32/twfw.o'
      
      After the fix:
      19: (25) if r1 > 0x3f goto pc+6
       R1_w=inv(id=0,umax_value=63,var_off=(0x0; 0x3f))
      
      verif_twfw:OK
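The tighter `var_off` in the "after" log is what a range-to-tnum conversion yields once `umax_value=63` is taken into account. The sketch below is a simplified stand-alone model based on my understanding of the kernel's `tnum_range()`. The `_sketch` names are hypothetical and this is not the verifier's code.

```c
#include <assert.h>
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, the rest come from value. */
struct tnum_sketch {
    uint64_t value;
    uint64_t mask;
};

/* 1-based position of the highest set bit, 0 when x == 0. */
static int fls64_sketch(uint64_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Smallest tnum of this shape that covers every value in [min, max]. */
struct tnum_sketch tnum_range_sketch(uint64_t min, uint64_t max)
{
    uint64_t chi = min ^ max;       /* bits in which min and max differ */
    int bits = fls64_sketch(chi);
    struct tnum_sketch t;
    uint64_t delta;

    assert(min <= max);

    if (bits > 63) {                /* top bit differs: nothing is known */
        t.value = 0;
        t.mask = UINT64_MAX;
        return t;
    }
    delta = (1ULL << bits) - 1;     /* every bit up to the highest differing one */
    t.value = min & ~delta;
    t.mask = delta;
    return t;
}
```

For the range [0, 63] this gives value 0x0 and mask 0x3f, matching the log after the fix, while [0, 255] gives mask 0xff, the looser result seen before it.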
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20211101222153.78759-3-alexei.starovoitov@gmail.com
    • bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. · 388e2c0b
      Alexei Starovoitov authored
      Similar to the unsigned bounds propagation fix, fix signed bounds.
      The 'Fixes' tag is a hint. There is no security bug here.
      The verifier was too conservative.
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20211101222153.78759-2-alexei.starovoitov@gmail.com
    • bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. · b9979db8
      Alexei Starovoitov authored
      Before this fix:
      166: (b5) if r2 <= 0x1 goto pc+22
      from 166 to 189: R2=invP(id=1,umax_value=1,var_off=(0x0; 0xffffffff))
      
      After this fix:
      166: (b5) if r2 <= 0x1 goto pc+22
      from 166 to 189: R2=invP(id=1,umax_value=1,var_off=(0x0; 0x1))
      
      While processing BPF_JLE the reg_set_min_max() would set true_reg->umax_value = 1
      and call __reg_combine_64_into_32(true_reg).
      
      Without the fix it would not pass the condition:
      if (__reg64_bound_u32(reg->umin_value) && __reg64_bound_u32(reg->umax_value))
      
      since umin_value == 0 at this point.
      Before commit 10bf4e83 the umin was incorrectly ignored.
      The commit 10bf4e83 fixed the correctness issue, but pessimized
      propagation of 64-bit min max into 32-bit min max and corresponding var_off.
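The effect of the boundary check can be modelled in isolation. The sketch below assumes the pre-fix helper rejected the boundary value 0 through a strict comparison, which is what the explanation above implies. The struct and function names are hypothetical stand-ins, not the verifier's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the register bounds involved. */
struct bounds_sketch {
    uint64_t umin_value, umax_value;        /* 64-bit unsigned bounds */
    uint32_t u32_min_value, u32_max_value;  /* 32-bit unsigned bounds */
};

/* Assumed pre-fix behaviour: the boundary values themselves are rejected. */
bool bound_u32_strict(uint64_t a)
{
    return a > 0 && a < UINT32_MAX;
}

/* Assumed post-fix behaviour: any value representable in 32 bits passes. */
bool bound_u32_inclusive(uint64_t a)
{
    return a <= UINT32_MAX;
}

/*
 * Copy the 64-bit bounds into the 32-bit ones when both ends pass the
 * supplied check. Returns true if propagation happened.
 */
bool combine_64_into_32_sketch(struct bounds_sketch *r,
                               bool (*fits)(uint64_t))
{
    assert(r && fits);

    if (fits(r->umin_value) && fits(r->umax_value)) {
        r->u32_min_value = (uint32_t)r->umin_value;
        r->u32_max_value = (uint32_t)r->umax_value;
        return true;
    }
    return false;
}
```

With `umin_value == 0` and `umax_value == 1`, the strict check refuses to propagate and the 32-bit maximum stays at its previous value, whereas the inclusive check narrows it to 1. That narrower bound is what allows `var_off` to tighten to `(0x0; 0x1)`.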
      
      Fixes: 10bf4e83 ("bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20211101222153.78759-1-alexei.starovoitov@gmail.com
  2. 01 Nov, 2021 28 commits