1. 06 May, 2019 3 commits
  2. 05 May, 2019 3 commits
    • netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast · 43c8f131
      Taehee Yoo authored
      rhashtable_insert_fast() may return an error value when memory
      allocation fails, but flow_offload_add() does not check for errors.
      This patch just adds missing error checking.
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
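
The error-checking pattern described in this commit can be sketched in plain userspace C. This is only an illustration, not the kernel patch: `toy_table`, `toy_insert()` and `toy_flow_add()` are hypothetical names, and the sketch assumes the caller performs two inserts and must undo the first one if the second fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TABLE_SLOTS 4   /* tiny capacity so the failure path is easy to reach */

/* Hypothetical fixed-size table standing in for a hash table. */
struct toy_table {
    int keys[TABLE_SLOTS];
    size_t used;
};

/* Hypothetical insert that can fail: returns 0 on success, -ENOMEM when full. */
int toy_insert(struct toy_table *t, int key)
{
    assert(t != NULL);
    if (t->used == TABLE_SLOTS)
        return -ENOMEM;
    t->keys[t->used++] = key;
    return 0;
}

/* Inserts two entries and checks each result instead of ignoring it. */
int toy_flow_add(struct toy_table *t, int first_key, int second_key)
{
    int err;

    err = toy_insert(t, first_key);
    if (err < 0)
        return err;

    err = toy_insert(t, second_key);
    if (err < 0) {
        t->used--;      /* roll back the first insert before reporting failure */
        return err;
    }
    return 0;
}
```

Propagating the error to the caller, rather than silently continuing, keeps the table from ending up with only half of a pair inserted.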
    • netfilter: nf_tables: fix base chain stat rcu_dereference usage · edbd82c5
      Florian Westphal authored
      Following splat gets triggered when nfnetlink monitor is running while
      xtables-nft selftests are running:
      
      net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
      other info that might help us debug this:
      
      1 lock held by xtables-nft-mul/27006:
       #0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
      Call Trace:
       nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
       nf_tables_chain_notify+0xf8/0x1a0
       nf_tables_commit+0x165c/0x1740
      
      nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
      or from the transaction path if a userspace process subscribed to nftables
      notifications.
      
      In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
      hold transaction mutex so the pointer can be NULLed right after the check.
      Just unconditionally fetch the value, then have the helper return
      immediately if it's NULL.
      
      In the notification case we don't hold the rcu read lock, but updates are
      prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
      aware of this.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack_h323: restore boundary check correctness · f5e85ce8
      Jakub Jankowski authored
      Since commit bc7d811a ("netfilter: nf_ct_h323: Convert
      CHECK_BOUND macro to function"), NAT traversal for H.323
      doesn't work, failing to parse H323-UserInformation.
      nf_h323_error_boundary() compares contents of the bitstring,
      not the addresses, preventing valid H.323 packets from being
      conntrack'd.
      
      This looks like an oversight from when CHECK_BOUND macro was
      converted to a function.
      
      To fix it, stop dereferencing bs->cur and bs->end.
      
      Fixes: bc7d811a ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
      Signed-off-by: Jakub Jankowski <shasta@toxcorp.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
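
The difference between comparing addresses and comparing contents can be shown with a simplified sketch. The `cur` and `end` field names come from the commit text; the struct layout and the `would_overrun()` helper are assumptions made for illustration and are not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified cursor over a byte buffer (layout assumed for this sketch). */
struct bitstr {
    const unsigned char *cur;   /* current read position */
    const unsigned char *end;   /* one past the last valid byte */
};

/*
 * Returns 1 if reading `bytes` more bytes would run past the buffer.
 *
 * The check works on positions only. A broken variant of the form
 *     *bs->cur + bytes > *bs->end
 * would compare the byte values stored at those positions, so the
 * result would depend on the packet contents rather than its length.
 */
int would_overrun(const struct bitstr *bs, size_t bytes)
{
    assert(bs->cur <= bs->end);
    return bytes > (size_t)(bs->end - bs->cur);
}
```

Expressing the check as a comparison against the remaining length avoids forming a pointer beyond the end of the buffer, which is a choice of this sketch rather than something the commit states.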
  3. 30 Apr, 2019 7 commits
    • netfilter: nf_flow_table: check ttl value in flow offload data path · 33cc3c0c
      Taehee Yoo authored
      nf_flow_offload_ip_hook() and nf_flow_offload_ipv6_hook() do not check the
      ttl value, so a ttl value overflow may occur.
      
      Fixes: 97add9f0 ("netfilter: flow table support for IPv4")
      Fixes: 09952107 ("netfilter: flow table support for IPv6")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
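
Why an unchecked ttl is a problem can be illustrated with 8-bit arithmetic. The sketch below is not the kernel code: the function names, the `verdict` enum and the `ttl <= 1` threshold are assumptions chosen to show the idea of refusing to decrement a ttl that is about to expire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes: keep the packet on the fast path or hand it back. */
enum verdict { FAST_PATH, SLOW_PATH };

/* Unchecked decrement: an 8-bit ttl of 0 wraps around to 255. */
uint8_t decrement_unchecked(uint8_t ttl)
{
    return (uint8_t)(ttl - 1);
}

/*
 * Checked variant: a ttl of 0 or 1 is left untouched and the packet is
 * handed to the slow path, which is assumed to deal with expiry.
 */
enum verdict forward_checked(uint8_t *ttl)
{
    assert(ttl != NULL);
    if (*ttl <= 1)
        return SLOW_PATH;
    (*ttl)--;
    return FAST_PATH;
}
```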
    • netfilter: nf_flow_table: fix netdev refcnt leak · 26a302af
      Taehee Yoo authored
      flow_offload_alloc() calls nf_route() to get a dst_entry. Internally,
      nf_route() calls ip_route_output_key(), which allocates a dst_entry and
      holds a reference to it. So the dst_entry should be released with
      dst_release() if nf_route() is successful.
      
      Otherwise, the netns exit routine cannot finish and the following
      message is printed:
      
      [  257.490952] unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
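
The hold/release pairing behind this fix can be sketched with a toy reference counter. All names here (`toy_dst`, `toy_route_lookup()`, `toy_dst_release()`, `toy_flow_setup()`) are hypothetical; the sketch only assumes that a successful lookup takes a reference that the caller must drop on every exit path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted route entry. */
struct toy_dst {
    int refcnt;
};

/* Stand-in for a route lookup: takes a reference on success, returns 0. */
int toy_route_lookup(struct toy_dst *dst)
{
    assert(dst != NULL);
    dst->refcnt++;
    return 0;
}

/* Drops one reference previously taken by a lookup. */
void toy_dst_release(struct toy_dst *dst)
{
    assert(dst != NULL && dst->refcnt > 0);
    dst->refcnt--;
}

/* Uses the route, then drops the lookup reference whether or not the work succeeded. */
int toy_flow_setup(struct toy_dst *dst, int fail_midway)
{
    int err = toy_route_lookup(dst);

    if (err < 0)
        return err;                 /* lookup failed: no reference was taken */

    err = fail_midway ? -1 : 0;     /* stand-in for the work done with the route */

    toy_dst_release(dst);           /* balance the reference from the lookup */
    return err;
}
```

If the release call were missing, the counter would grow by one per call and never return to its starting value, which is the kind of imbalance the "waiting for lo to become free" message points to.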
    • netfilter: nft_flow_offload: add entry to flowtable after confirmation · 270a8a29
      Pablo Neira Ayuso authored
      This is fixing flow offload for UDP traffic where packets only follow
      one single direction.
      
      The flow_offload_fixup_tcp() mechanism works fine when the
      offloaded entry remains in SYN_RECV state, given that sequence tracking is
      reset and that conntrack handles syn+ack packets as a retransmission, i.e.
      
      	sES + synack => sIG
      
      for reply traffic.
      
      Fixes: a3c90f7a ("netfilter: nf_tables: flow offload expression")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: delay chain policy update until transaction is complete · 66293c46
      Florian Westphal authored
      When we process a long ruleset of the form
      
      chain input {
         type filter hook input priority filter; policy drop;
         ...
      }
      
      then the base chain gets registered early on, and we continue to
      process/validate the next messages coming in the same transaction.
      
      Problem is that if the base chain policy is 'drop', it will take effect
      immediately, which causes all traffic to get blocked until the
      transaction completes or is aborted.
      
      Fix this by deferring the policy until the transaction has been
      processed and all of the rules have been flagged as active.
      
      Reported-by: Jann Haber <jann.haber@selfnet.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
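
Deferring a policy change until commit can be sketched as a small staging mechanism. This is a toy model, not the nf_tables implementation: `toy_chain` and the three helper functions are hypothetical names, and the sketch assumes a single pending policy per chain.

```c
#include <assert.h>
#include <stddef.h>

enum policy { POLICY_ACCEPT, POLICY_DROP };

/* Hypothetical chain with an active policy and an optional staged one. */
struct toy_chain {
    enum policy active;     /* policy seen by the packet path */
    enum policy pending;    /* policy requested by the open transaction */
    int has_pending;
};

/* Records the requested policy without touching the active one. */
void toy_chain_stage_policy(struct toy_chain *c, enum policy p)
{
    assert(c != NULL);
    c->pending = p;
    c->has_pending = 1;
}

/* Commit: the staged policy takes effect only now. */
void toy_chain_commit(struct toy_chain *c)
{
    assert(c != NULL);
    if (c->has_pending) {
        c->active = c->pending;
        c->has_pending = 0;
    }
}

/* Abort: the staged policy is dropped and the active one is unchanged. */
void toy_chain_abort(struct toy_chain *c)
{
    assert(c != NULL);
    c->has_pending = 0;
}
```

With this split, a 'drop' policy staged early in a long transaction has no effect on traffic until the commit step runs.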
    • ipv6/flowlabel: wait rcu grace period before put_pid() · 6c0afef5
      Eric Dumazet authored
      syzbot was able to catch a use-after-free read in pid_nr_ns() [1]
      
      ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid
      but fl_free() releases fl->owner.pid before rcu grace period is started.
      
      [1]
      
      BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
      Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087
      
      CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       pid_nr_ns+0x128/0x140 kernel/pid.c:407
       ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
       seq_read+0xad3/0x1130 fs/seq_file.c:268
       proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
       do_loop_readv_writev fs/read_write.c:701 [inline]
       do_loop_readv_writev fs/read_write.c:688 [inline]
       do_iter_read+0x4a9/0x660 fs/read_write.c:922
       vfs_readv+0xf0/0x160 fs/read_write.c:984
       kernel_readv fs/splice.c:358 [inline]
       default_file_splice_read+0x475/0x890 fs/splice.c:413
       do_splice_to+0x12a/0x190 fs/splice.c:876
       splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
       do_sendfile+0x597/0xd00 fs/read_write.c:1443
       __do_sys_sendfile64 fs/read_write.c:1498 [inline]
       __se_sys_sendfile64 fs/read_write.c:1490 [inline]
       __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
      RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
      R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff
      
      Allocated by task 17543:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc mm/kasan/common.c:497 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc mm/slab.c:3393 [inline]
       kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
       alloc_pid+0x55/0x8f0 kernel/pid.c:168
       copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
       copy_process kernel/fork.c:1709 [inline]
       _do_fork+0x257/0xfd0 kernel/fork.c:2226
       __do_sys_clone kernel/fork.c:2333 [inline]
       __se_sys_clone kernel/fork.c:2327 [inline]
       __x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 7789:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
       __cache_free mm/slab.c:3499 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3765
       put_pid.part.0+0x111/0x150 kernel/pid.c:111
       put_pid+0x20/0x30 kernel/pid.c:105
       fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
       ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
       call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
       expire_timers kernel/time/timer.c:1362 [inline]
       __run_timers kernel/time/timer.c:1681 [inline]
       __run_timers kernel/time/timer.c:1649 [inline]
       run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
       __do_softirq+0x266/0x95a kernel/softirq.c:293
      
      The buggy address belongs to the object at ffff888094012a00
       which belongs to the cache pid_2 of size 88
      The buggy address is located 4 bytes inside of
       88-byte region [ffff888094012a00, ffff888094012a58)
      The buggy address belongs to the page:
      page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
      raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
      >ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
                         ^
       ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
      
      Fixes: 4f82f457 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach · 1d3fd8a1
      Stephen Suryaputra authored
      When there is no route to an IPv6 dest addr, skb_dst(skb) points
      to the loopback dev if IP6CB(skb)->iif is enslaved to a vrf. This
      causes Ip6InNoRoutes to be incremented on the loopback dev. It also
      causes the lookup in icmpv6_send() to fail, so the dest unreachable is
      not sent and Ip6OutNoRoutes gets incremented on the loopback dev.
      
      To reproduce:
      * Gateway configuration:
              ip link add dev vrf_258 type vrf table 258
              ip link set dev enp0s9 master vrf_258
              ip addr add 66::1/64 dev enp0s9
              ip -6 route add unreachable default metric 8192 table 258
              sysctl -w net.ipv6.conf.all.forwarding=1
              sysctl -w net.ipv6.conf.enp0s9.forwarding=1
      * Sender configuration:
              ip addr add 66::2/64 dev enp0s9
              ip -6 route add default via 66::1
      and ping 67::1 for example from the sender.
      
      Fix this by counting on the original netdev and resetting the skb dst to
      force a fresh lookup.
      
      v2: Fix typo of destination address in the repro steps.
      v3: Simplify the loopback check (per David Ahern) and use reverse
          Christmas tree format (per David Miller).
      
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add sanity tests in tcp_add_backlog() · ca2fe295
      Eric Dumazet authored
      Richard and Bruno both reported that my commit added a bug,
      and Bruno was able to determine the problem came when a segment
      with a FIN flag was coalesced to a prior one in the tcp backlog queue.
      
      It turns out the header prediction in tcp_rcv_established()
      looks back to TCP headers in the packet, not in the metadata
      (aka TCP_SKB_CB(skb)->tcp_flags)
      
      The fast path in tcp_rcv_established() is not supposed to
      handle a FIN flag (it does not call tcp_fin())
      
      Therefore we need to make sure to propagate the FIN flag,
      so that the coalesced packet does not go through the fast path,
      the same as a GRO packet carrying a FIN flag.
      
      While we are at it, make sure we do not coalesce packets with
      RST or SYN, or if they do not have ACK set.
      
      Many thanks to Richard and Bruno for pinpointing the bad commit,
      and to Richard for providing a first version of the fix.
      
      Fixes: 4f693b55 ("tcp: implement coalescing on backlog queue")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
      Reported-by: Bruno Prémont <bonbons@sysophe.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
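
The flag rules described in this commit can be sketched as two small helpers. The flag values are the standard TCP header bits; the function names are hypothetical, and OR-ing every flag of both segments is a simplification of this sketch rather than a description of the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define FLAG_FIN 0x01
#define FLAG_SYN 0x02
#define FLAG_RST 0x04
#define FLAG_ACK 0x10

/*
 * Returns 1 if two adjacent segments may be merged: neither may carry
 * SYN or RST, and both must have ACK set.
 */
int can_coalesce(uint8_t tail_flags, uint8_t new_flags)
{
    if ((tail_flags | new_flags) & (FLAG_SYN | FLAG_RST))
        return 0;
    if (!(tail_flags & new_flags & FLAG_ACK))
        return 0;
    return 1;
}

/*
 * Flags of the merged segment. A FIN on either input survives the merge,
 * so a receiver inspecting the merged flags can keep it off a fast path
 * that does not handle FIN.
 */
uint8_t coalesced_flags(uint8_t tail_flags, uint8_t new_flags)
{
    assert(can_coalesce(tail_flags, new_flags));
    return (uint8_t)(tail_flags | new_flags);
}
```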
  4. 29 Apr, 2019 3 commits
  5. 28 Apr, 2019 4 commits
  6. 27 Apr, 2019 7 commits
  7. 26 Apr, 2019 12 commits
  8. 24 Apr, 2019 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · cd8dead0
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Just the usual assortment of small'ish fixes:
      
         1) Conntrack timeout is sometimes not initialized properly, from
            Alexander Potapenko.
      
         2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid
            undefined behavior. From ZhangXiaoxu.
      
         3) des1 field of descriptor in stmmac driver is initialized with the
            wrong variable. From Yue Haibing.
      
         4) Increase mlxsw pci sw reset timeout a little bit more, from Ido
            Schimmel.
      
         5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng.
      
         6) Fallback refcount fix in TLS code, from Jakub Kicinski.
      
         7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy.
      
         8) Fix recursive locking in team driver, from Hangbin Liu.
      
         9) Fix tls_set_device_offload_rx() deadlock, from Jakub Kicinski.
      
        10) Don't use napi_alloc_frag() outside of softirq context in socionext
            driver, from Ilias Apalodimas.
      
        11) MAC address increment overflow in ncsi, from Tao Ren.
      
        12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun.
      
        13) ipv4_link_failure has to validate the headers that are actually
            there because RAW sockets can pass in arbitrary garbage, from Eric
            Dumazet"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
        ipv4: add sanity checks in ipv4_link_failure()
        net/rose: fix unbound loop in rose_loopback_timer()
        rxrpc: fix race condition in rxrpc_input_packet()
        net: rds: exchange of 8K and 1M pool
        net: vrf: Fix operation not supported when set vrf mac
        net/ncsi: handle overflow when incrementing mac address
        net: socionext: replace napi_alloc_frag with the netdev variant on init
        net: atheros: fix spelling mistake "underun" -> "underrun"
        spi: ST ST95HF NFC: declare missing of table
        spi: Micrel eth switch: declare missing of table
        net: stmmac: move stmmac_check_ether_addr() to driver probe
        netfilter: fix nf_l4proto_log_invalid to log invalid packets
        netfilter: never get/set skb->tstamp
        netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
        Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
        dt-bindings: add an explanation for internal phy-mode
        net/tls: don't leak IV and record seq when offload fails
        net/tls: avoid potential deadlock in tls_set_device_offload_rx()
        selftests/net: correct the return value for run_afpackettests
        team: fix possible recursive locking when add slaves
        ...