1. 13 Feb, 2019 1 commit
    • netfilter: reject: skip csum verification for protocols that don't support it · 7fc38225
      Alin Nastac authored
      Some protocols have other means of verifying payload integrity
      (AH, ESP, SCTP), while others are incompatible with the nf_ip(6)_checksum
      implementation because their checksum is either optional or might be
      partial (UDPLITE, DCCP, GRE). Because nf_ip(6)_checksum was used
      to validate packets, ip(6)tables REJECT rules could not
      generate ICMP(v6) errors for the protocols mentioned above.
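
The protocol split described above can be sketched as a small predicate. This is a userspace illustration only: the function name `reject_should_verify_csum` and the `PROTO_*` constants are hypothetical (the values are the IANA protocol numbers), and the sketch does not claim to reproduce the helper the commit actually adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers, defined locally so the sketch is self-contained. */
enum {
    PROTO_DCCP    = 33,
    PROTO_GRE     = 47,
    PROTO_ESP     = 50,
    PROTO_AH      = 51,
    PROTO_SCTP    = 132,
    PROTO_UDPLITE = 136
};

/*
 * Hypothetical helper: returns true when a generic pseudo-header
 * checksum check is meaningful for the given transport protocol.
 */
bool reject_should_verify_csum(uint8_t proto)
{
    switch (proto) {
    /* Payload integrity is verified by other means. */
    case PROTO_AH:
    case PROTO_ESP:
    case PROTO_SCTP:
        return false;
    /* Checksum is optional or may cover only part of the payload. */
    case PROTO_UDPLITE:
    case PROTO_DCCP:
    case PROTO_GRE:
        return false;
    default:
        return true;
    }
}
```

A caller would consult the predicate first and run the checksum validation only when it returns true, so that REJECT can still answer packets of the listed protocols.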
      
      This commit also fixes the incorrect pseudo-header protocol used
      for IPv4 packets that carry transport protocols other than TCP or
      UDP (the pseudo-header used protocol 0 instead of the proper value).
      Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
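
To see why the pseudo-header protocol value matters, here is a minimal userspace sketch of an Internet checksum over an IPv4 pseudo-header plus transport segment. The names `csum_fold`, `csum_add_buf` and `l4_csum` are hypothetical and this is not the kernel's nf_ip_checksum code; addresses are passed in host byte order and `len` is assumed to be at most 65535.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add a buffer to the accumulator as big-endian 16-bit words. */
static uint32_t csum_add_buf(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8; /* pad the odd trailing byte */
    return sum;
}

/*
 * Checksum over pseudo-header (saddr, daddr, zero byte, proto, length)
 * followed by the segment. Returns 0 when the checksum embedded in the
 * segment is consistent with the given pseudo-header.
 */
uint16_t l4_csum(uint32_t saddr, uint32_t daddr, uint8_t proto,
                 const uint8_t *seg, size_t len)
{
    uint32_t sum = 0;

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += proto;          /* the field that was wrongly 0 */
    sum += (uint32_t)len;
    sum = csum_add_buf(sum, seg, len);
    return (uint16_t)~csum_fold(sum);
}
```

Because the protocol number is part of the sum, a segment whose checksum was computed with its real protocol no longer verifies when the validator uses protocol 0.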
  2. 12 Feb, 2019 1 commit
    • netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm · 13f5251f
      Chieh-Min Wang authored
      Bridge (br_flood) or broadcast/multicast packets can be cloned while
      their skb still carries an unconfirmed conntrack, which breaks the rule
      that an unconfirmed skb->_nfct is never shared.  With nfqueue running on
      my system, the race can be easily reproduced, with the following warning
      call trace:
      
      [13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P        W       4.4.60 #7744
      [13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
      [13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14)
      [13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8)
      [13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0)
      [13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24)
      [13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618)
      [13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc)
      [13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
      [13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
      [13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c)
      [13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
      [13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
      [13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4)
      [13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac)
      [13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170)
      [13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420)
      [13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248)
      [13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0)
      [13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c)
      [13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368)
      [13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44)
      [13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200)
      [13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64)
      [13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34)
      
      The original code only triggered the warning and did nothing else. This
      caused the shared conntrack to be moved to the dying list and the packet
      to be dropped (nf_ct_resolve_clash returns NF_DROP for a dying conntrack).
      
      - Steps to reproduce:
      
      +----------------------------+
      |          br0(bridge)       |
      |                            |
      +-+---------+---------+------+
        | eth0|   | eth1|   | eth2|
        |     |   |     |   |     |
        +--+--+   +--+--+   +---+-+
           |         |          |
           |         |          |
        +--+-+     +-+--+    +--+-+
        | PC1|     | PC2|    | PC3|
        +----+     +----+    +----+
      
      iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass
      
      Note: our nfq userspace program sets a mark on packets whose connection
      has already been processed.
      
      PC1 sends broadcast packets simulated by hping3:
      
      hping3 --rand-source --udp 192.168.1.255 -i u100
      
      - The broadcast race flow is as follows:
      
      br_handle_frame
        BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
        // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
        br_handle_frame_finish
          // check if this packet is broadcast
          br_flood_forward
            br_flood
              list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
                maybe_deliver
                  deliver_clone
                    skb = skb_clone(skb)
                    __br_forward
                      BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
                      // queue in our nfq and received by our userspace program
                      // goto __nf_conntrack_confirm with process context on CPU 1
          br_pass_frame_up
            BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
            // goto __nf_conntrack_confirm with softirq context on CPU 0
      
      Conntrack confirmation can happen at both the INPUT and POSTROUTING
      stages, so with NFQUEUE running, skbs whose skb->_nfct points to the
      same unconfirmed conntrack can race on different cores.
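
The race can be modeled in userspace as two confirm attempts on one shared entry. The commit message does not spell out the fix, so the sketch below is an assumption: it treats "already confirmed" as a condition to detect atomically and answer with a drop verdict. The names `model_ct`, `model_confirm` and the `VERDICT_*` values are hypothetical, and a C11 atomic flag stands in for the kernel's locking.

```c
#include <stdatomic.h>

enum verdict { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

/* Toy stand-in for a conntrack entry shared by two cloned packets. */
struct model_ct {
    atomic_flag confirmed;
};

/*
 * Hypothetical confirm step: only the first caller confirms the entry.
 * A second caller (for example the clone handled on another CPU) sees
 * the flag already set and drops its packet instead of confirming again.
 */
enum verdict model_confirm(struct model_ct *ct)
{
    if (atomic_flag_test_and_set(&ct->confirmed))
        return VERDICT_DROP;   /* lost the race: entry already confirmed */
    return VERDICT_ACCEPT;     /* won the race: entry is now confirmed */
}
```

In this model exactly one of the two paths in the flow above (process context on one CPU, softirq context on another) gets VERDICT_ACCEPT, regardless of which runs first.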
      
      This patch fixes a repeating kernel splat; it is now only displayed
      once.
      Signed-off-by: Chieh-Min Wang <chiehminw@synology.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 11 Feb, 2019 3 commits
  4. 04 Feb, 2019 3 commits
  5. 29 Jan, 2019 8 commits
    • netfilter: nf_tables: add NFTA_RULE_POSITION_ID to nla_policy · 0604628b
      Florian Westphal authored
      Fixes: 75dd48e2 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • enetc: include linux/vmalloc.h for vzalloc etc · bbcbf2ee
      Stephen Rothwell authored
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · ec7146db
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2019-01-29
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Teach verifier dead code removal, this also allows for optimizing /
         removing conditional branches around dead code and to shrink the
         resulting image. Code store constrained architectures like nfp would
         have a hard time doing this at JIT level, from Jakub.
      
      2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
         code generation for 32-bit sub-registers. Evaluation shows that this
         can result in code reduction of ~5-20% compared to 64 bit-only code
         generation. Also add implementation for most JITs, from Jiong.
      
      3) Add support for __int128 types in BTF which is also needed for
         vmlinux's BTF conversion to work, from Yonghong.
      
      4) Add a new command to bpftool in order to dump a list of BPF-related
         parameters from the system or for a specific network device e.g. in
         terms of available prog/map types or helper functions, from Quentin.
      
      5) Add AF_XDP sock_diag interface for querying sockets from user
         space which provides information about the RX/TX/fill/completion
         rings, umem, memory usage etc, from Björn.
      
      6) Add skb context access for skb_shared_info->gso_segs field, from Eric.
      
      7) Add support for testing flow dissector BPF programs by extending
         existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.
      
      8) Split BPF kselftest's test_verifier into various subgroups of tests
         in order to better deal with merge conflicts in this area, from Jakub.
      
      9) Add support for queue/stack manipulations in bpftool, from Stanislav.
      
      10) Document BTF, from Yonghong.
      
      11) Dump supported ELF section names in libbpf on program load
          failure, from Taeung.
      
      12) Silence a false positive compiler warning in verifier's BTF
          handling, from Peter.
      
      13) Fix help string in bpftool's feature probing, from Prashant.
      
      14) Remove duplicate includes in BPF kselftests, from Yue.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 343917b4
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for your net-next tree:
      
      1) Introduce a hashtable to speed up object lookups, from Florian Westphal.
      
      2) Make direct calls to built-in extension, also from Florian.
      
      3) Call helper before confirming the conntrack as it used to be originally,
         from Florian.
      
      4) Call request_module() to autoload br_netfilter when physdev is used
         to relax the dependency, also from Florian.
      
      5) Allow to insert rules at a given position ID that is internal to the
         batch, from Phil Sutter.
      
      6) Several patches to replace conntrack indirections by direct calls,
         and to reduce modularization, from Florian. This also includes
         several follow up patches to deal with minor fallout from this
         rework.
      
      7) Use RCU from conntrack gre helper, from Florian.
      
      8) GRE conntrack module becomes built-in into nf_conntrack, from Florian.
      
      9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(),
         from Florian.
      
      10) Unify sysctl handling at the core of nf_conntrack, from Florian.
      
      11) Provide modparam to register conntrack hooks.
      
      12) Allow to match on the interface kind string, from wenxu.
      
      13) Remove several exported symbols, no longer required now that
          a bit of de-modularization work has been done, from Florian.
      
      14) Remove built-in map support in the hash extension, this can be
          done with the existing userspace infrastructure, from laura.
      
      15) Remove indirection to calculate checksums in IPVS, from Matteo Croce.
      
      16) Use call wrappers for indirection in IPVS, also from Matteo.
      
      17) Remove superfluous __percpu parameter in nft_counter, patch from
          Luc Van Oostenryck.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-flow-dissector-tests' · 3d2af27a
      Daniel Borkmann authored
      Stanislav Fomichev says:
      
      ====================
      This patch series adds support for testing flow dissector BPF programs
      by extending the already existing BPF_PROG_TEST_RUN. The goal is to have
      a packet as an input and `struct bpf_flow_keys` as an output. That way
      we can easily test flow dissector programs' behavior. I've also modified
      existing test_progs.c test to do a simple flow dissector run as well.
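
The "packet in, flow keys out" contract can be illustrated with a toy userspace dissector. Everything below is a hypothetical sketch: `toy_flow_keys` and `toy_dissect` are invented names, the struct is not the kernel's flow keys layout, and only untagged Ethernet plus IPv4 is handled. It does not use BPF_PROG_TEST_RUN itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical output record for one dissected packet. */
struct toy_flow_keys {
    uint16_t nhoff;    /* offset of the network header */
    uint16_t thoff;    /* offset of the transport header */
    uint16_t n_proto;  /* EtherType, host byte order */
    uint8_t  ip_proto; /* IPv4 protocol field */
    uint32_t saddr;    /* host byte order */
    uint32_t daddr;    /* host byte order */
    uint16_t sport;    /* 0 unless TCP or UDP */
    uint16_t dport;    /* 0 unless TCP or UDP */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 for truncated or unsupported packets. */
int toy_dissect(const uint8_t *pkt, size_t len, struct toy_flow_keys *keys)
{
    size_t ihl;

    if (len < 14 + 20)
        return -1;                      /* Ethernet + minimal IPv4 header */
    if (rd16(pkt + 12) != 0x0800)
        return -1;                      /* not IPv4 */
    if ((pkt[14] >> 4) != 4)
        return -1;                      /* bad IP version */
    ihl = (size_t)(pkt[14] & 0x0f) * 4;
    if (ihl < 20 || len < 14 + ihl + 4)
        return -1;                      /* need room for the port fields */

    keys->nhoff = 14;
    keys->thoff = (uint16_t)(14 + ihl);
    keys->n_proto = 0x0800;
    keys->ip_proto = pkt[14 + 9];
    keys->saddr = rd32(pkt + 14 + 12);
    keys->daddr = rd32(pkt + 14 + 16);
    keys->sport = 0;
    keys->dport = 0;
    if (keys->ip_proto == 6 || keys->ip_proto == 17) {
        keys->sport = rd16(pkt + keys->thoff);
        keys->dport = rd16(pkt + keys->thoff + 2);
    }
    return 0;
}
```

A test in this style feeds a fixed packet buffer to the dissector and compares the returned fields against expected values, which is the same shape of check the series describes for flow dissector programs.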
      
      * first patch introduces new __skb_flow_bpf_dissect to simplify
        sharing between __skb_flow_dissect and BPF_PROG_TEST_RUN
      * second patch adds actual BPF_PROG_TEST_RUN support
      * third patch adds example usage to the selftests
      
      v3:
      * rebased on top of latest bpf-next
      
      v2:
      * loop over 'kattr->test.repeat' inside of
        bpf_prog_test_run_flow_dissector, don't reuse
        bpf_test_run/bpf_test_run_one
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector · bf0f0fd9
      Stanislav Fomichev authored
      Use existing pkt_v4 and pkt_v6 to make sure flow_keys are what we want.
      
      Also, add new bpf_flow_load routine (and flow_dissector_load.h header)
      that loads bpf_flow.o program and does all required setup.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add BPF_PROG_TEST_RUN support for flow dissector · b7a1848e
      Stanislav Fomichev authored
      The input is packet data, the output is struct bpf_flow_keys. This should
      make it easy to test flow dissector programs without elaborate
      setup.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/flow_dissector: move bpf case into __skb_flow_bpf_dissect · c8aa7038
      Stanislav Fomichev authored
      This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.
      
      No functional changes.
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  6. 28 Jan, 2019 24 commits