1. 07 Oct, 2013 3 commits
    • net: fix unsafe set_memory_rw from softirq · d45ed4a4
      Alexei Starovoitov authored
      On an x86 system with net.core.bpf_jit_enable = 1, running
      
      sudo tcpdump -i eth1 'tcp port 22'
      
      causes the following lockdep warning:
      [   56.766097]  Possible unsafe locking scenario:
      [   56.766097]
      [   56.780146]        CPU0
      [   56.786807]        ----
      [   56.793188]   lock(&(&vb->lock)->rlock);
      [   56.799593]   <Interrupt>
      [   56.805889]     lock(&(&vb->lock)->rlock);
      [   56.812266]
      [   56.812266]  *** DEADLOCK ***
      [   56.812266]
      [   56.830670] 1 lock held by ksoftirqd/1/13:
      [   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
      [   56.849757]
      [   56.849757] stack backtrace:
      [   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
      [   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
      [   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
      [   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
      [   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
      [   56.923006] Call Trace:
      [   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
      [   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
      [   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
      [   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
      [   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
      [   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
      [   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
      [   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
      [   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
      [   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
      [   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
      [   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
      [   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
      [   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
      [   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
      [   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
      [   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
      [   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
      [   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
      [   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
      [   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
      [   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
      [   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
      [   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70
      
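      The trace shows set_memory_rw() being called from bpf_jit_free() in
      softirq context, so defer the free to a work queue. The JITed filter
      memory cannot be reused to hold the work_struct, since it is read-only,
      so use the original BPF insns memory instead.
      
      Defer the kfree() of sk_filter until the JIT has finished freeing.
      
      Tested on x86_64 and i386.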
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
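The ordering described in the commit message above (queue the work from the RCU callback, then release the JIT image and only afterwards the filter) can be modelled outside the kernel. This is a minimal Python sketch, not kernel code: `Filter`, `rcu_callback`, `workqueue`, and `run_workqueue` are hypothetical names standing in for sk_filter, the RCU callback, and a work queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical stand-in for sk_filter, optionally with a JITed image."""
    name: str
    jited: bool = True


workqueue = deque()  # stand-in for a work queue drained in process context
events = []          # records the order in which things are released


def rcu_callback(flt: Filter) -> None:
    """Models the softirq-context callback: it must not do the unsafe work."""
    if flt.jited:
        # Defer both the JIT image release and the filter free.
        workqueue.append(flt)
        events.append(f"queued {flt.name}")
    else:
        # Nothing unsafe to do, so the filter can be freed right away.
        events.append(f"freed filter {flt.name}")


def run_workqueue() -> None:
    """Models the worker, which runs later where the unsafe call is allowed."""
    while workqueue:
        flt = workqueue.popleft()
        events.append(f"freed jit image of {flt.name}")
        events.append(f"freed filter {flt.name}")  # only after the JIT free


rcu_callback(Filter("f0"))
rcu_callback(Filter("f1", jited=False))
run_workqueue()
print(events)
```

The point of the model is only the ordering: the filter object has to stay alive until the deferred work has run, because the work item lives inside it.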
    • ipv6: Allow the MTU of ipip6 tunnel to be set below 1280 · 582442d6
      Oussama Ghorbel authored
      The (inner) MTU of an ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
      However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
      More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530
      
      This patch checks the minimum MTU of an IPv6 tunnel according to these rules:
      - If the tunnel is configured in ipip6 mode, the minimum MTU is 68.
      - If the tunnel is configured in ip6ip6 or any mode, the minimum MTU is 1280.
      Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
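The two rules in the commit message above can be written as a small lower-bound check. This is a Python sketch of the rule only, not the kernel implementation: the function names are hypothetical, the value used for "any" mode is an assumption, and the upper MTU bound is deliberately left out.

```python
IPPROTO_IPIP = 4   # IPv4 payload, i.e. "ipip6" mode
IPPROTO_IPV6 = 41  # IPv6 payload, i.e. "ip6ip6" mode
PROTO_ANY = 0      # assumption: placeholder value for "any" mode

IPV4_MIN_MTU = 68    # minimum MTU for IPv4
IPV6_MIN_MTU = 1280  # minimum MTU for IPv6


def min_tunnel_mtu(proto: int) -> int:
    """Lower MTU bound for a tunnel, chosen by its payload protocol."""
    if proto == IPPROTO_IPIP:
        return IPV4_MIN_MTU
    return IPV6_MIN_MTU


def mtu_is_allowed(proto: int, new_mtu: int) -> bool:
    """Checks only the lower bound described in the commit message."""
    return new_mtu >= min_tunnel_mtu(proto)


print(mtu_is_allowed(IPPROTO_IPIP, 576))   # ipip6 tunnel, small MTU
print(mtu_is_allowed(IPPROTO_IPV6, 576))   # ip6ip6 tunnel, same MTU
```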
    • netif_set_xps_queue: make cpu mask const · 3573540c
      Michael S. Tsirkin authored
      virtio wants to pass in cpumask_of(cpu), make parameter
      const to avoid build warnings.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Oct, 2013 1 commit
    • tcp: do not forget FIN in tcp_shifted_skb() · 5e8a402f
      Eric Dumazet authored
      Yuchung found the following problem:
      
       There are bugs in the SACK processing code, in the merging part in
       tcp_shift_skb_data(), that incorrectly reset or ignore the sacked
       skb's FIN flag. When a receiver first SACKs the FIN sequence and later
       throws away its ofo queue (e.g., SACK reneging), the sender stops
       retransmitting the FIN flag and hangs forever.
      
      The following packetdrill test can be used to reproduce the bug.
      
      $ cat sack-merge-bug.pkt
      `sysctl -q net.ipv4.tcp_fack=0`
      
      // Establish a connection and send 10 MSS.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +.000 bind(3, ..., ...) = 0
      +.000 listen(3, 1) = 0
      
      +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.001 < . 1:1(0) ack 1 win 1024
      +.000 accept(3, ..., ...) = 4
      
      +.100 write(4, ..., 12000) = 12000
      +.000 shutdown(4, SHUT_WR) = 0
      +.000 > . 1:10001(10000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257
      +.000 > FP. 10001:12001(2000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
      // SACK reneg
      +.050 < . 1:1(0) ack 12001 win 257
      +0 %{ print "unacked: ",tcpi_unacked }%
      +5 %{ print "" }%
      
      Two things went wrong: a typo swapped the left and right operands of
      one OR operation, and the code forgot to advance end_seq when the
      merged skb carried FIN.
      
      The bug was added in 2.6.29 by commit 832d11c5
      ("tcp: Try to restore large SKBs while SACK processing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
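The two mistakes named in the commit message above can be illustrated with a toy model of merging one segment into the previous one. This is a Python sketch under stated assumptions, not the kernel code: `Seg` and `merge_into_prev` are hypothetical names, and the model assumes the flags must be OR-ed into the surviving (previous) segment and that a FIN occupies one sequence number beyond the payload.

```python
from dataclasses import dataclass

TCPHDR_FIN = 0x01
TCPHDR_PSH = 0x08


@dataclass
class Seg:
    seq: int        # first sequence number covered
    end_seq: int    # one past the last sequence number covered
    data_len: int   # payload bytes held by this segment
    flags: int = 0


def merge_into_prev(prev: Seg, skb: Seg) -> None:
    """Shift all payload of skb into prev, keeping flags and end_seq right."""
    shifted = skb.data_len
    prev.end_seq += shifted
    prev.data_len += shifted
    skb.seq += shifted
    skb.data_len = 0
    # The flags go into prev, the segment that survives the merge.
    prev.flags |= skb.flags
    # A FIN takes one sequence number that the payload shift did not cover.
    if skb.flags & TCPHDR_FIN:
        prev.end_seq += 1


# Sequence numbers loosely follow the packetdrill script: the last
# segment carries 1000 bytes plus FIN, so it ends at 12002.
prev = Seg(seq=10001, end_seq=11001, data_len=1000)
last = Seg(seq=11001, end_seq=12002, data_len=1000,
           flags=TCPHDR_FIN | TCPHDR_PSH)
merge_into_prev(prev, last)
print(prev)
```

Without the final increment, the merged segment would end at 12001 and the FIN would no longer be accounted for, which is the hang the report describes.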
  3. 03 Oct, 2013 4 commits
  4. 02 Oct, 2013 21 commits
  5. 01 Oct, 2013 11 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c31eeace
      Linus Torvalds authored
      Pull networking changes from David Miller:
      
       1) Multiply in netfilter IPVS can overflow when calculating destination
          weight.  From Simon Kirby.
      
       2) Use after free fixes in IPVS from Julian Anastasov.
      
       3) SFC driver bug fixes from Daniel Pieczko.
      
       4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.
      
       5) Locking and encapsulation fixes to serial line CAN driver, from
          Andre Naujoks.
      
       6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
          Eilon Greenstein, and Ariel Elior.
      
       7) In lapb, if no other packets are outstanding, T1 timeouts actually
          stall things and no packet gets sent.  Fix from Josselin Costanzi.
      
       8) ICMP redirects should not make it to the socket error queues, from
          Duan Jiong.
      
       9) Fix bugs in skge DMA mapping error handling, from Mikulas Patocka.
      
      10) Fix setting of VLAN priority field on via-rhine driver, from Roger
          Luethi.
      
      11) Fix TX stalls and VLAN promisc programming in be2net driver from
          Ajit Khaparde.
      
      12) Packet padding doesn't get handled correctly in new usbnet SG
          support code, from Ming Lei.
      
      13) Fix races in netdevice teardown wrt. network namespace closing.
          From Eric W. Biederman.
      
      14) Fix potential missed initialization of net_secret if no TCP
          connections are opened.  From Eric Dumazet.
      
      15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
          Aleksander Morgado.
      
      16) skb_cow_head() can change skb->data and thus packet header pointers,
          don't use stale ip_hdr reference in ip_tunnel code.
      
      17) Backend state transition handling fixes in xen-netback, from Paul
          Durrant.
      
      18) Packet offset for AH protocol is handled wrong in flow dissector,
          from Eric Dumazet.
      
      19) Taking down an fq packet scheduler instance can leave stale packets
          in the queues, fix from Eric Dumazet.
      
      20) Fix performance regressions introduced by TCP Small Queues.  From
          Eric Dumazet.
      
      21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
          Hannes Frederic Sowa.
      
      22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
          reference to the ipv4/ipv6 specific network device state, so use the
          reference put that will check and release the object if the
          reference hits zero.  From Salam Noureddine.
      
      23) Fix memory corruption in ip_tunnel driver, and use skb_push()
          instead of __skb_push() so that similar bugs are less hard to find.
          From Steffen Klassert.
      
      24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
          Nicolas Dichtel.
      
      25) fq scheduler doesn't accurately rate limit in certain circumstances,
          from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
        pkt_sched: fq: rate limiting improvements
        ip6tnl: allow to use rtnl ops on fb tunnel
        sit: allow to use rtnl ops on fb tunnel
        ip_tunnel: Remove double unregister of the fallback device
        ip_tunnel_core: Change __skb_push back to skb_push
        ip_tunnel: Add fallback tunnels to the hash lists
        ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
        qlcnic: Fix SR-IOV configuration
        ll_temac: Reset dma descriptors indexes on ndo_open
        skbuff: size of hole is wrong in a comment
        ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
        ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
        ethernet: moxa: fix incorrect placement of __initdata tag
        ipv6: gre: correct calculation of max_headroom
        powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
        Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
        bonding: Fix broken promiscuity reference counting issue
        tcp: TSQ can use a dynamic limit
        dm9601: fix IFF_ALLMULTI handling
        pkt_sched: fq: qdisc dismantle fixes
        ...
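Item 1 of the pull request above mentions a multiply that can overflow when destination weights are calculated. The general failure mode can be shown by emulating 32-bit signed arithmetic in Python. This is only an illustration: the function names are hypothetical, the numbers are made up, and the comparison is a generic "overhead times weight" product rather than the actual IPVS code.

```python
def to_s32(x: int) -> int:
    """Wrap an integer the way a 32-bit signed multiply result would wrap."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def heavier_32bit(overhead_a: int, weight_a: int,
                  overhead_b: int, weight_b: int) -> bool:
    """Compare two products after each has wrapped to 32 bits."""
    return to_s32(overhead_a * weight_a) > to_s32(overhead_b * weight_b)


def heavier_wide(overhead_a: int, weight_a: int,
                 overhead_b: int, weight_b: int) -> bool:
    """Same comparison with products wide enough not to wrap."""
    return overhead_a * weight_a > overhead_b * weight_b


# Illustrative values: 768000 * 3000 exceeds 2**31 - 1.
args = (768_000, 3_000, 256, 3_000)
print(heavier_32bit(*args), heavier_wide(*args))
```

With these values the wrapped product becomes negative, so the 32-bit comparison gives the opposite answer to the wide one.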
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 0b936842
      Linus Torvalds authored
      Pull sparc fix from David Miller:
       "Just a single bug fix to a regression added during some strlcpy()
        conversions"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix buggy strlcpy() conversion in ldom_reboot().
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 517bf8fc
      Linus Torvalds authored
      Pull vfs lru leak fix from Al Viro:
       "The fix in "super: fix for destroy lrus" didn't - they need to be
        destroyed, all right, but that's the wrong place..."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/super.c: fix lru_list leak for real
    • Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · 77c4ad8e
      Linus Torvalds authored
      Pull two KVM fixes from Gleb Natapov.
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
        ARM: kvm: rename cpu_reset to avoid name clash
    • fs/super.c: fix lru_list leak for real · c2d22ecd
      Al Viro authored
      Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
      the right place is destroy_super().  As it is, we leak them if sget()
      decides that new superblock it has allocated (and never shown to
      anybody) isn't needed and should be freed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • pkt_sched: fq: rate limiting improvements · 0eab5eb7
      Eric Dumazet authored
      FQ rate limiting suffers from two problems, reported
      by Steinar:
      
      1) FQ enforces a delay when the flow quantum is exhausted in order
      to reduce CPU overhead. But if packets are small, the current
      delay computation is slightly wrong, and observed rates can
      be too high.
      
      Steinar had this problem because he disabled TSO and GSO,
      and the default FQ quantum is 2*1514.
      
      (Of course, I hope the recent TSO auto-sizing changes will make
      it unnecessary to disable TSO in the first place.)
      
      2) maxrate was not used for forwarded flows (skbs not attached
      to a socket).
      
      Tested:
      
      tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
      netperf -H lpq84 -l 1000 &
      sleep 10 ; tc -s qdisc show dev eth0
      qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
       quantum 3028 initial_quantum 15140 maxrate 8000Kbit
       Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
       rate 7831Kbit 653pps backlog 7570b 5p requeues 0
        44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
        0 gc, 0 highprio, 5545 throttled
      
      lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
      09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
      09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
      09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
      09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>
      Reported-by: Steinar H. Gunderson <sesse@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
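The numbers in the test output above can be sanity-checked with simple arithmetic: how long does it take to send one default quantum (2*1514 bytes) at the configured maxrate of 8 Mbit/s? This Python sketch is not the FQ delay computation itself; `send_delay_ns` is a hypothetical helper, and it assumes each 1448-byte segment occupies 1514 bytes on the wire.

```python
NSEC_PER_SEC = 1_000_000_000


def send_delay_ns(nbytes: int, rate_bits_per_sec: int) -> int:
    """Time needed to send nbytes at the given rate, in nanoseconds."""
    return nbytes * 8 * NSEC_PER_SEC // rate_bits_per_sec


quantum = 2 * 1514     # default FQ quantum from the commit message
maxrate = 8_000_000    # "maxrate 8Mbit" from the tc command line

print(send_delay_ns(quantum, maxrate))  # delay for one full quantum
print(send_delay_ns(100, maxrate))      # delay for a small 100-byte packet
```

One quantum at this rate takes about 3.03 ms, which is in line with the roughly 3.1 ms spacing between packet pairs in the tcpdump trace and with the reported "next packet delay" of about 2.98 ms.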
    • ip6tnl: allow to use rtnl ops on fb tunnel · bb814094
      Nicolas Dichtel authored
      rtnl ops were introduced by c075b130 ("ip6tnl: advertise tunnel param via
      rtnl"), but I forgot to assign rtnl ops to fb tunnels.
      
      Now that it is done, we must remove the explicit call to
      unregister_netdevice_queue(), because the fallback tunnel is added to the queue
      in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
      is valid since commit 0bd87628 ("ip6tnl: add x-netns support")).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: allow to use rtnl ops on fb tunnel · 205983c4
      Nicolas Dichtel authored
      rtnl ops were introduced by ba3e3f50 ("sit: advertise tunnel param via
      rtnl"), but I forgot to assign rtnl ops to fb tunnels.
      
      Now that it is done, we must remove the explicit call to
      unregister_netdevice_queue(), because the fallback tunnel is added to the queue
      in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
      is valid since commit 5e6700b3 ("sit: add support of x-netns")).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ip_tunnel' · 9cb17124
      David S. Miller authored
      ip_tunnel bug fixes from Steffen Klassert.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_tunnel: Remove double unregister of the fallback device · cfe4a536
      Steffen Klassert authored
      When queueing the netdevices for removal, we queue the
      fallback device twice in ip_tunnel_destroy(). The first
      time when we queue all netdevices in the namespace and
      then again explicitly. Fix this by removing the explicit
      queueing of the fallback device.
      
      Bug was introduced when network namespace support was added
      with commit 6c742e71 ("ipip: add x-netns support").
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
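The double queueing described in the commit message above is easy to reproduce in a toy model. This Python sketch is not the kernel code: `Dev`, `destroy_buggy`, and `destroy_fixed` are hypothetical names, and devices are matched by a plain string standing in for rtnl_link_ops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dev:
    name: str
    link_ops: str  # stand-in for the device's rtnl_link_ops


def destroy_buggy(netns_devs, fallback, ops):
    """Queues every matching device, then the fallback device again."""
    kill_list = [dev for dev in netns_devs if dev.link_ops == ops]
    kill_list.append(fallback)  # explicit second queueing
    return kill_list


def destroy_fixed(netns_devs, fallback, ops):
    """The walk over the namespace already covers the fallback device."""
    return [dev for dev in netns_devs if dev.link_ops == ops]


fb = Dev("tunl0", "ipip")
devs = [fb, Dev("tun1", "ipip"), Dev("eth0", "other")]

print(destroy_buggy(devs, fb, "ipip").count(fb))
print(destroy_fixed(devs, fb, "ipip").count(fb))
```

The same reasoning applies to the ip6tnl and sit changes in this list: once the fallback tunnel has its rtnl ops assigned, the walk over all netdevices picks it up, so an extra explicit call would queue it a second time.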
    • ip_tunnel_core: Change __skb_push back to skb_push · 78a3694d
      Steffen Klassert authored
      Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
      moved the IP header installation to iptunnel_xmit() and
      changed skb_push() to __skb_push(). Unlike skb_push(),
      __skb_push() does not check for headroom underflow, which makes
      possible bugs hard to track down, so change it back to skb_push().
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
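Why the checked variant makes bugs easier to find can be seen in a small model of a buffer with headroom. This is a Python sketch, not the sk_buff implementation: `Buf`, `push`, and `push_unchecked` are hypothetical names that mimic the difference between a push that verifies the headroom and one that does not.

```python
class Buf:
    """Toy packet buffer: `head` is the start, `data` the current header."""

    def __init__(self, headroom: int) -> None:
        self.head = 0
        self.data = headroom

    def push(self, n: int) -> int:
        """Checked push: fails loudly when the headroom is too small."""
        if self.data - n < self.head:
            raise RuntimeError("push below buffer head")
        self.data -= n
        return self.data

    def push_unchecked(self, n: int) -> int:
        """Unchecked push: silently moves `data` in front of `head`."""
        self.data -= n
        return self.data


buf = Buf(headroom=16)
print(buf.push_unchecked(20))  # data offset is now negative, no error

try:
    Buf(headroom=16).push(20)
except RuntimeError as err:
    print("caught:", err)
```

In the unchecked case the corruption only shows up later and somewhere else, whereas the checked push stops at the call that caused it.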