1. 20 Oct, 2018 34 commits
  2. 18 Oct, 2018 6 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.134 · 1d326a94
      Greg Kroah-Hartman authored
      1d326a94
    • Dan Carpenter's avatar
      ipv4: frags: precedence bug in ip_expire() · 5a0f340f
      Dan Carpenter authored
      (commit 70837ffe upstream)
      
      We accidentally removed the parentheses here, but they are required
      because '!' has higher precedence than '&'.
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a0f340f
    • Taehee Yoo's avatar
      ip: frags: fix crash in ip_do_fragment() · 85e59af9
      Taehee Yoo authored
      commit 5d407b07 upstream
      
      A kernel crash occurrs when defragmented packet is fragmented
      in ip_do_fragment().
      In defragment routine, skb_orphan() is called and
      skb->ip_defrag_offset is set. but skb->sk and
      skb->ip_defrag_offset are same union member. so that
      frag->sk is not NULL.
      Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
      defragmented packet is fragmented.
      
      test commands:
         %iptables -t nat -I POSTROUTING -j MASQUERADE
         %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
      
      splat looks like:
      [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
      [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
      [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
      [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
      [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
      [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
      [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
      [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
      [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
      [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
      [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
      [  261.198158] Call Trace:
      [  261.199018]  ? dst_output+0x180/0x180
      [  261.205011]  ? save_trace+0x300/0x300
      [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
      [  261.213034]  ? sched_clock_local+0xd4/0x140
      [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
      [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
      [  261.227014]  ? find_held_lock+0x39/0x1c0
      [  261.233008]  ip_finish_output+0x51d/0xb50
      [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
      [  261.250152]  ? rcu_is_watching+0x77/0x120
      [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
      [  261.261033]  ? nf_hook_slow+0xb1/0x160
      [  261.265007]  ip_output+0x1c7/0x710
      [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
      [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.282996]  ? nf_hook_slow+0xb1/0x160
      [  261.287007]  raw_sendmsg+0x21f9/0x4420
      [  261.291008]  ? dst_output+0x180/0x180
      [  261.297003]  ? sched_clock_cpu+0x126/0x170
      [  261.301003]  ? find_held_lock+0x39/0x1c0
      [  261.306155]  ? stop_critical_timings+0x420/0x420
      [  261.311004]  ? check_flags.part.36+0x450/0x450
      [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.326142]  ? cyc2ns_read_end+0x10/0x10
      [  261.330139]  ? raw_bind+0x280/0x280
      [  261.334138]  ? sched_clock_cpu+0x126/0x170
      [  261.338995]  ? check_flags.part.36+0x450/0x450
      [  261.342991]  ? __lock_acquire+0x4500/0x4500
      [  261.348994]  ? inet_sendmsg+0x11c/0x500
      [  261.352989]  ? dst_output+0x180/0x180
      [  261.357012]  inet_sendmsg+0x11c/0x500
      [ ... ]
      
      v2:
       - clear skb->sk at reassembly routine.(Eric Dumarzet)
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85e59af9
    • Peter Oskolkov's avatar
      ip: process in-order fragments efficiently · 4077ddb2
      Peter Oskolkov authored
      This patch changes the runtime behavior of IP defrag queue:
      incoming in-order fragments are added to the end of the current
      list/"run" of in-order fragments at the tail.
      
      On some workloads, UDP stream performance is substantially improved:
      
      RX: ./udp_stream -F 10 -T 2 -l 60
      TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60
      
      with this patchset applied on a 10Gbps receiver:
      
        throughput=9524.18
        throughput_units=Mbit/s
      
      upstream (net-next):
      
        throughput=4608.93
        throughput_units=Mbit/s
      Reported-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit a4fd284a)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4077ddb2
    • Peter Oskolkov's avatar
      ip: add helpers to process in-order fragments faster. · e9e4ac48
      Peter Oskolkov authored
      This patch introduces several helper functions/macros that will be
      used in the follow-up patch. No runtime changes yet.
      
      The new logic (fully implemented in the second patch) is as follows:
      
      * Nodes in the rb-tree will now contain not single fragments, but lists
        of consecutive fragments ("runs").
      
      * At each point in time, the current "active" run at the tail is
        maintained/tracked. Fragments that arrive in-order, adjacent
        to the previous tail fragment, are added to this tail run without
        triggering the re-balancing of the rb-tree.
      
      * If a fragment arrives out of order with the offset _before_ the tail run,
        it is inserted into the rb-tree as a single fragment.
      
      * If a fragment arrives after the current tail fragment (with a gap),
        it starts a new "tail" run, as is inserted into the rb-tree
        at the end as the head of the new run.
      
      skb->cb is used to store additional information
      needed here (suggested by Eric Dumazet).
      Reported-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      (cherry picked from commit 353c9cb3)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9e4ac48
    • Peter Oskolkov's avatar
      ip: use rb trees for IP frag queue. · 10043954
      Peter Oskolkov authored
      (commit fa0f5273 upstream)
      
      Similar to TCP OOO RX queue, it makes sense to use rb trees to store
      IP fragments, so that OOO fragments are inserted faster.
      
      Tested:
      
      - a follow-up patch contains a rather comprehensive ip defrag
        self-test (functional)
      - ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
          netstat --statistics
          Ip:
              282078937 total packets received
              0 forwarded
              0 incoming packets discarded
              946760 incoming packets delivered
              18743456 requests sent out
              101 fragments dropped after timeout
              282077129 reassemblies required
              944952 packets reassembled ok
              262734239 packet reassembles failed
         (The numbers/stats above are somewhat better re:
          reassemblies vs a kernel without this patchset. More
          comprehensive performance testing TBD).
      Reported-by: default avatarJann Horn <jannh@google.com>
      Reported-by: default avatarJuha-Matti Tilli <juha-matti.tilli@iki.fi>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      10043954