1. 15 Oct, 2014 33 commits
    • Joe Savage's avatar
      USB: serial: cp210x: added Ketra N1 wireless interface support · 8cfcc3eb
      Joe Savage authored
      commit bfc2d7df upstream.
      
      Added support for Ketra N1 wireless interface, which uses the
      Silicon Labs' CP2104 USB to UART bridge with customized PID 8946.
      Signed-off-by: default avatarJoe Savage <joe.savage@goketra.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cfcc3eb
    • Lu Baolu's avatar
      USB: Add device quirk for ASUS T100 Base Station keyboard · 5b1ab22a
      Lu Baolu authored
      commit ddbe1fca upstream.
      
      This full-speed USB device generates spurious remote wakeup event
      as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result,
      Linux can't enter system suspend and S0ix power saving modes once
      this keyboard is used.
      
      This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk.
      With this quirk set, wakeup capability will be ignored during
      device configure.
      
      This patch could be back-ported to kernels as old as 2.6.39.
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b1ab22a
    • Per Hurtig's avatar
      tcp: fixing TLP's FIN recovery · 53fd1aa4
      Per Hurtig authored
      [ Upstream commit bef1909e ]
      
      Fix to a problem observed when losing a FIN segment that does not
      contain data.  In such situations, TLP is unable to recover from
      *any* tail loss and instead adds at least PTO ms to the
      retransmission process, i.e., RTO = RTO + PTO.
      Signed-off-by: default avatarPer Hurtig <per.hurtig@kau.se>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53fd1aa4
    • Vlad Yasevich's avatar
      sctp: handle association restarts when the socket is closed. · ce8c5039
      Vlad Yasevich authored
      [ Upstream commit bdf6fa52 ]
      
      Currently association restarts do not take into consideration the
      state of the socket.  When a restart happens, the current assocation
      simply transitions into established state.  This creates a condition
      where a remote system, through a the restart procedure, may create a
      local association that is no way reachable by user.  The conditions
      to trigger this are as follows:
        1) Remote does not acknoledge some data causing data to remain
           outstanding.
        2) Local application calls close() on the socket.  Since data
           is still outstanding, the association is placed in SHUTDOWN_PENDING
           state.  However, the socket is closed.
        3) The remote tries to create a new association, triggering a restart
           on the local system.  The association moves from SHUTDOWN_PENDING
           to ESTABLISHED.  At this point, it is no longer reachable by
           any socket on the local system.
      
      This patch addresses the above situation by moving the newly ESTABLISHED
      association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
      the COOKIE-ACK chunk.  This way, the restarted associate immidiately
      enters the shutdown procedure and forces the termination of the
      unreachable association.
      Reported-by: default avatarDavid Laight <David.Laight@aculab.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce8c5039
    • Joe Lawrence's avatar
      team: avoid race condition in scheduling delayed work · 5c4b226c
      Joe Lawrence authored
      [ Upstream commit 47549650 ]
      
      When team_notify_peers and team_mcast_rejoin are called, they both reset
      their respective .count_pending atomic variable. Then when the actual
      worker function is executed, the variable is atomically decremented.
      This pattern introduces a potential race condition where the
      .count_pending rolls over and the worker function keeps rescheduling
      until .count_pending decrements to zero again:
      
      THREAD 1                           THREAD 2
      
      ========                           ========
      team_notify_peers(teamX)
        atomic_set count_pending = 1
        schedule_delayed_work
                                         team_notify_peers(teamX)
                                         atomic_set count_pending = 1
      team_notify_peers_work
        atomic_dec_and_test
          count_pending = 0
        (return)
                                         schedule_delayed_work
                                         team_notify_peers_work
                                         atomic_dec_and_test
                                           count_pending = -1
                                         schedule_delayed_work
                                         (repeat until count_pending = 0)
      
      Instead of assigning a new value to .count_pending, use atomic_add to
      tack-on the additional desired worker function invocations.
      Signed-off-by: default avatarJoe Lawrence <joe.lawrence@stratus.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e ("team: add support for sending multicast rejoins")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c4b226c
    • Nicolas Dichtel's avatar
      ip6_gre: fix flowi6_proto value in xmit path · 79152e44
      Nicolas Dichtel authored
      [ Upstream commit 3be07244 ]
      
      In xmit path, we build a flowi6 which will be used for the output route lookup.
      We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
      protocol should be IPPROTO_GRE.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarMatthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79152e44
    • KY Srinivasan's avatar
      hyperv: Fix a bug in netvsc_start_xmit() · 60fa7e97
      KY Srinivasan authored
      [ Upstream commit dedb845d ]
      
      After the packet is successfully sent, we should not touch the skb
      as it may have been freed. This patch is based on the work done by
      Long Li <longli@microsoft.com>.
      
      In this version of the patch I have fixed issues pointed out by David.
      David, please queue this up for stable.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Tested-by: default avatarLong Li <longli@microsoft.com>
      Tested-by: default avatarSitsofe Wheeler <sitsofe@yahoo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60fa7e97
    • Eric Dumazet's avatar
      gro: fix aggregation for skb using frag_list · 48a6420f
      Eric Dumazet authored
      [ Upstream commit 73d3fe6d ]
      
      In commit 8a29111c ("net: gro: allow to build full sized skb")
      I added a regression for linear skb that traditionally force GRO
      to use the frag_list fallback.
      
      Erez Shitrit found that at most two segments were aggregated and
      the "if (skb_gro_len(p) != pinfo->gso_size)" test was failing.
      
      This is because pinfo at this spot still points to the last skb in the
      chain, instead of the first one, where we find the correct gso_size
      information.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 8a29111c ("net: gro: allow to build full sized skb")
      Reported-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48a6420f
    • Soren Brinkmann's avatar
      Revert "net/macb: add pinctrl consumer support" · e31689a6
      Soren Brinkmann authored
      [ Upstream commit 9026968a ]
      
      This reverts commit 8ef29f8a.
      The driver core already calls pinctrl_get() and claims the default
      state. There is no need to replicate this in the driver.
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e31689a6
    • Vlad Yasevich's avatar
      macvtap: Fix race between device delete and open. · e714dac8
      Vlad Yasevich authored
      [ Upstream commit 40b8fe45 ]
      
      In macvtap device delete and open calls can race and
      this causes a list curruption of the vlan queue_list.
      
      The race intself is triggered by the idr accessors
      that located the vlan device.  The device is stored
      into and removed from the idr under both an rtnl and
      a mutex.  However, when attempting to locate the device
      in idr, only a mutex is taken.  As a result, once cpu
      perfoming a delete may take an rtnl and wait for the mutex,
      while another cput doing an open() will take the idr
      mutex first to fetch the device pointer and later take
      an rtnl to add a queue for the device which may have
      just gotten deleted.
      
      With this patch, we now hold the rtnl for the duration
      of the macvtap_open() call thus making sure that
      open will not race with delete.
      
      CC: Michael S. Tsirkin <mst@redhat.com>
      CC: Jason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e714dac8
    • Steffen Klassert's avatar
      xfrm: Generate queueing routes only from route lookup functions · 8f20fcf0
      Steffen Klassert authored
      [ Upstream commit b8c203b2 ]
      
      Currently we genarate a queueing route if we have matching policies
      but can not resolve the states and the sysctl xfrm_larval_drop is
      disabled. Here we assume that dst_output() is called to kill the
      queued packets. Unfortunately this assumption is not true in all
      cases, so it is possible that these packets leave the system unwanted.
      
      We fix this by generating queueing routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: a0073fe1 ("xfrm: Add a state resolution packet queue")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f20fcf0
    • Steffen Klassert's avatar
      xfrm: Generate blackhole routes only from route lookup functions · 0845e2d0
      Steffen Klassert authored
      [ Upstream commit f92ee619 ]
      
      Currently we genarate a blackhole route route whenever we have
      matching policies but can not resolve the states. Here we assume
      that dst_output() is called to kill the balckholed packets.
      Unfortunately this assumption is not true in all cases, so
      it is possible that these packets leave the system unwanted.
      
      We fix this by generating blackhole routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0845e2d0
    • Vlad Yasevich's avatar
      tg3: Allow for recieve of full-size 8021AD frames · 152fc44a
      Vlad Yasevich authored
      [ Upstream commit 7d3083ee ]
      
      When receiving a vlan-tagged frame that still contains
      a vlan header, the length of the packet will be greater
      then MTU+ETH_HLEN since it will account of the extra
      vlan header.  TG3 checks this for the case for 802.1Q,
      but not for 802.1ad.  As a result, full sized 802.1ad
      frames get dropped by the card.
      
      Add a check for 802.1ad protocol when receving full
      sized frames.
      Suggested-by: default avatarPrashant Sreedharan <prashant@broadcom.com>
      CC: Prashant Sreedharan <prashant@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      152fc44a
    • Vlad Yasevich's avatar
      tg3: Work around HW/FW limitations with vlan encapsulated frames · f28909e6
      Vlad Yasevich authored
      [ Upstream commit 476c1885 ]
      
      TG3 appears to have an issue performing TSO and checksum offloading
      correclty when the frame has been vlan encapsulated (non-accelrated).
      In these cases, tcp checksum is not correctly updated.
      
      This patch attempts to work around this issue.  After the patch,
      802.1ad vlans start working correctly over tg3 devices.
      
      CC: Prashant Sreedharan <prashant@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f28909e6
    • Francesco Ruggeri's avatar
      net: allow macvlans to move to net namespace · 72c01e6f
      Francesco Ruggeri authored
      [ Upstream commit 0d0162e7 ]
      
      I cannot move a macvlan interface created on top of a bonding interface
      to a different namespace:
      
      % ip netns add dummy0
      % ip link add link bond0 mac0 type macvlan
      % ip link set mac0 netns dummy0
      RTNETLINK answers: Invalid argument
      %
      
      The problem seems to be that commit f9399814 ("bonding: Don't allow
      bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
      on bonding interfaces, and commit 797f87f8 ("macvlan: fix netdev
      feature propagation from lower device") causes macvlan interfaces
      to inherit its features from the lower device.
      
      NETIF_F_NETNS_LOCAL should not be inherited from the lower device
      by a macvlan.
      Patch tested on 3.16.
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Acked-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72c01e6f
    • Vlad Yasevich's avatar
      bridge: Fix br_should_learn to check vlan_enabled · 8b30313e
      Vlad Yasevich authored
      [ Upstream commit c095f248 ]
      
      As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will
      not be initialized in br_should_learn() as that function
      is called only from br_handle_local_finish().  That is
      an input handler for link-local ethernet traffic so it perfectly
      correct to check br->vlan_enabled here.
      
      Reported-by: Toshiaki Makita<toshiaki.makita1@gmail.com>
      Fixes: 20adfa1a bridge: Check if vlan filtering is enabled only once.
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b30313e
    • Vlad Yasevich's avatar
      bridge: Check if vlan filtering is enabled only once. · 00397b67
      Vlad Yasevich authored
      [ Upstream commit 20adfa1a ]
      
      The bridge code checks if vlan filtering is enabled on both
      ingress and egress.   When the state flip happens, it
      is possible for the bridge to currently be forwarding packets
      and forwarding behavior becomes non-deterministic.  Bridge
      may drop packets on some interfaces, but not others.
      
      This patch solves this by caching the filtered state of the
      packet into skb_cb on ingress.  The skb_cb is guaranteed to
      not be over-written between the time packet entres bridge
      forwarding path and the time it leaves it.  On egress, we
      can then check the cached state to see if we need to
      apply filtering information.
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00397b67
    • WANG Cong's avatar
      ipv6: restore the behavior of ipv6_sock_ac_drop() · d7c5b263
      WANG Cong authored
      [ Upstream commit de185ab4 ]
      
      It is possible that the interface is already gone after joining
      the list of anycast on this interface as we don't hold a refcount
      for the device, in this case we are safe to ignore the error.
      
      What's more important, for API compatibility we should not
      change this behavior for applications even if it were correct.
      
      Fixes: commit a9ed4a29 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7c5b263
    • Nikolay Aleksandrov's avatar
      bonding: fix div by zero while enslaving and transmitting · 9ede8fd5
      Nikolay Aleksandrov authored
      [ Upstream commit 9a72c2da ]
      
      The problem is that the slave is first linked and slave_cnt is
      incremented afterwards leading to a div by zero in the modes that use it
      as a modulus. What happens is that in bond_start_xmit()
      bond_has_slaves() is used to evaluate further transmission and it becomes
      true after the slave is linked in, but when slave_cnt is used in the xmit
      path it is still 0, so fetch it once and transmit based on that. Since
      it is used only in round-robin and XOR modes, the fix is only for them.
      Thanks to Eric Dumazet for pointing out the fault in my first try to fix
      this.
      
      Call trace (took it out of net-next kernel, but it's the same with net):
      [46934.330038] divide error: 0000 [#1] SMP
      [46934.330041] Modules linked in: bonding(O) 9p fscache
      snd_hda_codec_generic crct10dif_pclmul
      [46934.330041] bond0: Enslaving eth1 as an active interface with an up
      link
      [46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
      ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
      serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
      snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
      virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
      pata_acpi floppy [last unloaded: bonding]
      [46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
      3.17.0-rc4+ #27
      [46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
      ffff88005b728000
      [46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
      bond_start_xmit+0x1c3/0x450 [bonding]
      [46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
      [46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
      000000000000002a
      [46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
      ffff88004b077940
      [46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
      ffff88004a83e000
      [46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
      ffff88004b3f0500
      [46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
      ffff88004b077940
      [46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
      knlGS:0000000000000000
      [46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
      00000000000406e0
      [46934.330069] Stack:
      [46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
      ffff88004b3f0500
      [46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
      ffff88005b72b8a0
      [46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
      ffff88005b302000
      [46934.330073] Call Trace:
      [46934.330077]  [<ffffffff811e6169>] ?
      __kmalloc_node_track_caller+0x119/0x300
      [46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
      [46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
      [46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
      [46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
      [46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
      [46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
      [46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
      [46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
      [46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
      [46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
      [46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
      [46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
      [46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
      [46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
      [46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
      [46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
      [46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
      [46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
      [46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
      [46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
      [46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
      [46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
      [46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
      [46934.330115] bond0: Adding slave eth2
      [46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
      [46934.330118]  [<ffffffff81603248>] ?
      move_addr_to_kernel.part.20+0x28/0x80
      [46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
      [46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
      [46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
      [46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
      [46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
      [46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
      [46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
      [46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
      [46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
      1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
      89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
      [46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
      [bonding]
      [46934.330146]  RSP <ffff88005b72b7f8>
      
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      Fixes: 278b2083 ("bonding: initial RCU conversion")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ede8fd5
    • Sabrina Dubroca's avatar
      ipv6: fix rtnl locking in setsockopt for anycast and multicast · 4c163b4a
      Sabrina Dubroca authored
      [ Upstream commit a9ed4a29 ]
      
      Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
      triggers the assertion in addrconf_join_solict()/addrconf_leave_solict()
      
      ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
      take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
      ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
      calling ipv6_dev_mc_inc/dec.
      
      This patch moves ASSERT_RTNL() up a level in the call stack.
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reported-by: default avatarTommi Rantala <tt.rantala@gmail.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c163b4a
    • Guillaume Nault's avatar
      l2tp: fix race while getting PMTU on PPP pseudo-wire · 5f3a420d
      Guillaume Nault authored
      [ Upstream commit eed4d839 ]
      
      Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.
      
      The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
      could return NULL if tunnel->sock->sk_dst_cache was reset just before the
      call, thus making dst_mtu() dereference a NULL pointer:
      
      [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
      [ 1937.664005] Oops: 0000 [#1] SMP
      [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
      [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G           O   3.17.0-rc1 #1
      [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
      [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
      [ 1937.664005] RIP: 0010:[<ffffffffa049db88>]  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005] RSP: 0018:ffff8800c43c7de8  EFLAGS: 00010282
      [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
      [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
      [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
      [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
      [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
      [ 1937.664005] FS:  00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [ 1937.664005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
      [ 1937.664005] Stack:
      [ 1937.664005]  ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
      [ 1937.664005]  ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
      [ 1937.664005]  ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
      [ 1937.664005] Call Trace:
      [ 1937.664005]  [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
      [ 1937.664005]  [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
      [ 1937.664005]  [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
      [ 1937.664005]  [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
      [ 1937.664005]  [<ffffffff81309196>] SYSC_connect+0x87/0xb1
      [ 1937.664005]  [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
      [ 1937.664005]  [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
      [ 1937.664005]  [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 1937.664005]  [<ffffffff8114c262>] ? spin_lock+0x9/0xb
      [ 1937.664005]  [<ffffffff813092b4>] SyS_connect+0x9/0xb
      [ 1937.664005]  [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
      [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
      [ 1937.664005] RIP  [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
      [ 1937.664005]  RSP <ffff8800c43c7de8>
      [ 1937.664005] CR2: 0000000000000020
      [ 1939.559375] ---[ end trace 82d44500f28f8708 ]---
      
      Fixes: f34c4a35 ("l2tp: take PMTU from tunnel UDP socket")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f3a420d
    • Gerhard Stenzel's avatar
      vxlan: fix incorrect initializer in union vxlan_addr · ed6d9919
      Gerhard Stenzel authored
      [ Upstream commit a45e92a5 ]
      
      The first initializer in the following
      
              union vxlan_addr ipa = {
                  .sin.sin_addr.s_addr = tip,
                  .sa.sa_family = AF_INET,
              };
      
      is optimised away by the compiler, due to the second initializer,
      therefore initialising .sin.sin_addr.s_addr always to 0.
      This results in netlink messages indicating a L3 miss never contain the
      missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
      The problem affects user space programs relying on an IP address being
      sent as part of a netlink message indicating a L3 miss.
      
      Changing
                  .sa.sa_family = AF_INET,
      to
                  .sin.sin_family = AF_INET,
      fixes the problem.
      Signed-off-by: default avatarGerhard Stenzel <gerhard.stenzel@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed6d9919
    • Jiri Benc's avatar
      openvswitch: fix panic with multiple vlan headers · 5bfbaf50
      Jiri Benc authored
      [ Upstream commit 2ba5af42 ]
      
      When there are multiple vlan headers present in a received frame, the first
      one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
      skb beyond the VLAN TPID may be still non-linear, including the inner TCI
      and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
      does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
      executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
      
      This leads to two things:
      
      1. Part of the resulting ethernet header is in the non-linear part of the
         skb. When eth_type_trans is called later as the result of
         OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
         is in fact accessing random data when it reads past the TPID.
      
      2. network_header points into the ethernet header instead of behind it.
         mac_len is set to a wrong value (10), too.
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bfbaf50
    • Eric Dumazet's avatar
      packet: handle too big packets for PACKET_V3 · 454116bc
      Eric Dumazet authored
      [ Upstream commit dc808110 ]
      
      af_packet can currently overwrite kernel memory by out of bound
      accesses, because it assumed a [new] block can always hold one frame.
      
      This is not generally the case, even if most existing tools do it right.
      
      This patch clamps too long frames as API permits, and issue a one time
      error on syslog.
      
      [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
      
      In this example, packet header tp_snaplen was set to 3966,
      and tp_len was set to 5042 (skb->len)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      454116bc
    • Neal Cardwell's avatar
      tcp: fix ssthresh and undo for consecutive short FRTO episodes · 867f806f
      Neal Cardwell authored
      [ Upstream commit 0c9ab092 ]
      
      Fix TCP FRTO logic so that it always notices when snd_una advances,
      indicating that any RTO after that point will be a new and distinct
      loss episode.
      
      Previously there was a very specific sequence that could cause FRTO to
      fail to notice a new loss episode had started:
      
      (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
      (2) receiver ACKs packet 1
      (3) FRTO sends 2 more packets
      (4) RTO timer fires again (should start a new loss episode)
      
      The problem was in step (3) above, where tcp_process_loss() returned
      early (in the spot marked "Step 2.b"), so that it never got to the
      logic to clear icsk_retransmits. Thus icsk_retransmits stayed
      non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
      icsk_retransmits, decide that this RTO is not a new episode, and
      decide not to cut ssthresh and remember the current cwnd and ssthresh
      for undo.
      
      There were two main consequences to the bug that we have
      observed. First, ssthresh was not decreased in step (4). Second, when
      there was a series of such FRTO (1-4) sequences that happened to be
      followed by an FRTO undo, we would restore the cwnd and ssthresh from
      before the entire series started (instead of the cwnd and ssthresh
      from before the most recent RTO). This could result in cwnd and
      ssthresh being restored to values much bigger than the proper values.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      867f806f
    • Neal Cardwell's avatar
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 97b477f8
      Neal Cardwell authored
      [ Upstream commit 4fab9071 ]
      
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Diagnosed-by: default avatarWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97b477f8
    • Shmulik Ladkani's avatar
      sit: Fix ipip6_tunnel_lookup device matching criteria · 8e27e5cd
      Shmulik Ladkani authored
      [ Upstream commit bc8fc7b8 ]
      
      As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"),
      when looking up a tunnel, tunnel's underlying interface (t->parms.link)
      is verified to match incoming traffic's ingress device.
      
      However the comparison was incorrectly based on skb->dev->iflink.
      
      Instead, dev->ifindex should be used, which correctly represents the
      interface from which the IP stack hands the ipip6 packets.
      
      This allows setting up sit tunnels bound to vlan interfaces (otherwise
      incoming ipip6 traffic on the vlan interface was dropped due to
      ipip6_tunnel_lookup match failure).
      Signed-off-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e27e5cd
    • Andrey Vagin's avatar
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · d7bdb8e9
      Andrey Vagin authored
      [ Upstream commit 9d186cac ]
      
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7bdb8e9
    • Neerav Parikh's avatar
      i40e: Don't stop driver probe when querying DCB config fails · 46b6abb9
      Neerav Parikh authored
      Commit id: 014269ff in Linus's tree
      should be queued up for stable 3.14 & 3.15 since the i40e driver will
      not load when DCB is enabled, unless this patch is applied.
      
      In case of any AQ command to query port's DCB configuration fails
      during driver's probe time; the probe fails and returns an error.
      
      This patch prevents this issue by continuing the driver probe even
      when an error is returned.
      
      Also, added an error message to dump the AQ error status to show what
      error caused the failure to get the DCB configuration from firmware.
      
      Change-ID: Ifd5663512588bca684069bb7d4fb586dd72221af
      Signed-off-by: default avatarNeerav Parikh <neerav.parikh@intel.com>
      Signed-off-by: default avatarCatherine Sullivan <catherine.sullivan@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46b6abb9
    • Stanislaw Gruszka's avatar
      myri10ge: check for DMA mapping errors · b3804066
      Stanislaw Gruszka authored
      [ Upstream commit 10545937 ]
      
      On IOMMU systems DMA mapping can fail, we need to check for
      that possibility.
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3804066
    • Vlad Yasevich's avatar
      net: Always untag vlan-tagged traffic on input. · ff81e63f
      Vlad Yasevich authored
      [ Upstream commit 0d5501c1 ]
      
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is build only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configued.  The host VM is running on
      doest not have VLAN support and the outging interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I've only access to the TG3 hardware in this configuration thus have
      only tested this with TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffere from a similar issue.
      
      The patch attempt to fix this another way.  It moves the vlan header
      stipping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualizatoin host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff81e63f
    • Jiri Benc's avatar
      rtnetlink: fix VF info size · 7f97ac52
      Jiri Benc authored
      [ Upstream commit 945a3676 ]
      
      Commit 1d8faf48 ("net/core: Add VF link state control") added new
      attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size
      of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we
      may trigger warnings in rtnl_getlink and similar functions when many VF
      links are enabled, as the information does not fit into the allocated skb.
      
      Fixes: 1d8faf48 ("net/core: Add VF link state control")
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f97ac52
    • Daniel Borkmann's avatar
      netlink: reset network header before passing to taps · c5c2f898
      Daniel Borkmann authored
      [ Upstream commit 4e48ed88 ]
      
      netlink doesn't set any network header offset thus when the skb is
      being passed to tap devices via dev_queue_xmit_nit(), it emits klog
      false positives due to it being unset like:
      
        ...
        [  124.990397] protocol 0000 is buggy, dev nlmon0
        [  124.990411] protocol 0000 is buggy, dev nlmon0
        ...
      
      So just reset the network header before passing to the device; for
      packet sockets that just means nothing will change - mac and net
      offset hold the same value just as before.
      Reported-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5c2f898
  2. 09 Oct, 2014 7 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.14.21 · 89161fe9
      Greg Kroah-Hartman authored
      89161fe9
    • Linus Torvalds's avatar
      mm: don't pointlessly use BUG_ON() for sanity check · c56af023
      Linus Torvalds authored
      commit 50f5aa8a upstream.
      
      BUG_ON() is a big hammer, and should be used _only_ if there is some
      major corruption that you cannot possibly recover from, making it
      imperative that the current process (and possibly the whole machine) be
      terminated with extreme prejudice.
      
      The trivial sanity check in the vmacache code is *not* such a fatal
      error.  Recovering from it is absolutely trivial, and using BUG_ON()
      just makes it harder to debug for no actual advantage.
      
      To make matters worse, the placement of the BUG_ON() (only if the range
      check matched) actually makes it harder to hit the sanity check to begin
      with, so _if_ there is a bug (and we just got a report from Srivatsa
      Bhat that this can indeed trigger), it is harder to debug not just
      because the machine is possibly dead, but because we don't have better
      coverage.
      
      BUG_ON() must *die*.  Maybe we should add a checkpatch warning for it,
      because it is simply just about the worst thing you can ever do if you
      hit some "this cannot happen" situation.
      Reported-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c56af023
    • Davidlohr Bueso's avatar
      mm: per-thread vma caching · efb5fea2
      Davidlohr Bueso authored
      commit 615d6e87 upstream.
      
      This patch is a continuation of efforts trying to optimize find_vma(),
      avoiding potentially expensive rbtree walks to locate a vma upon faults.
      The original approach (https://lkml.org/lkml/2013/11/1/410), where the
      largest vma was also cached, ended up being too specific and random,
      thus further comparison with other approaches were needed.  There are
      two things to consider when dealing with this, the cache hit rate and
      the latency of find_vma().  Improving the hit-rate does not necessarily
      translate in finding the vma any faster, as the overhead of any fancy
      caching schemes can be too high to consider.
      
      We currently cache the last used vma for the whole address space, which
      provides a nice optimization, reducing the total cycles in find_vma() by
      up to 250%, for workloads with good locality.  On the other hand, this
      simple scheme is pretty much useless for workloads with poor locality.
      Analyzing ebizzy runs shows that, no matter how many threads are
      running, the mmap_cache hit rate is less than 2%, and in many situations
      below 1%.
      
      The proposed approach is to replace this scheme with a small per-thread
      cache, maximizing hit rates at a very low maintenance cost.
      Invalidations are performed by simply bumping up a 32-bit sequence
      number.  The only expensive operation is in the rare case of a seq
      number overflow, where all caches that share the same address space are
      flushed.  Upon a miss, the proposed replacement policy is based on the
      page number that contains the virtual address in question.  Concretely,
      the following results are seen on an 80 core, 8 socket x86-64 box:
      
      1) System bootup: Most programs are single threaded, so the per-thread
         scheme does improve ~50% hit rate by just adding a few more slots to
         the cache.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 50.61%   | 19.90            |
      | patched        | 73.45%   | 13.58            |
      +----------------+----------+------------------+
      
      2) Kernel build: This one is already pretty good with the current
         approach as we're dealing with good locality.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 75.28%   | 11.03            |
      | patched        | 88.09%   | 9.31             |
      +----------------+----------+------------------+
      
      3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 70.66%   | 17.14            |
      | patched        | 91.15%   | 12.57            |
      +----------------+----------+------------------+
      
      4) Ebizzy: There's a fair amount of variation from run to run, but this
         approach always shows nearly perfect hit rates, while baseline is just
         about non-existent.  The amounts of cycles can fluctuate between
         anywhere from ~60 to ~116 for the baseline scheme, but this approach
         reduces it considerably.  For instance, with 80 threads:
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 1.06%    | 91.54            |
      | patched        | 99.97%   | 14.18            |
      +----------------+----------+------------------+
      
      [akpm@linux-foundation.org: fix nommu build, per Davidlohr]
      [akpm@linux-foundation.org: document vmacache_valid() logic]
      [akpm@linux-foundation.org: attempt to untangle header files]
      [akpm@linux-foundation.org: add vmacache_find() BUG_ON]
      [hughd@google.com: add vmacache_valid_mm() (from Oleg)]
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: adjust and enhance comments]
      Signed-off-by: default avatarDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: default avatarMichel Lespinasse <walken@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Tested-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efb5fea2
    • Christoph Lameter's avatar
      vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state() · 264a8ae7
      Christoph Lameter authored
      commit 83da7510 upstream.
      
      Seems to be called with preemption enabled.  Therefore it must use
      mod_zone_page_state instead.
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Reported-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      264a8ae7
    • Vladimir Davydov's avatar
      mm: vmscan: shrink_slab: rename max_pass -> freeable · 8e524793
      Vladimir Davydov authored
      commit d5bc5fd3 upstream.
      
      The name `max_pass' is misleading, because this variable actually keeps
      the estimate number of freeable objects, not the maximal number of
      objects we can scan in this pass, which can be twice that.  Rename it to
      reflect its actual meaning.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e524793
    • Vladimir Davydov's avatar
      mm: vmscan: respect NUMA policy mask when shrinking slab on direct reclaim · 12f2f0ba
      Vladimir Davydov authored
      commit 99120b77 upstream.
      
      When direct reclaim is executed by a process bound to a set of NUMA
      nodes, we should scan only those nodes when possible, but currently we
      will scan kmem from all online nodes even if the kmem shrinker is NUMA
      aware.  That said, binding a process to a particular NUMA node won't
      prevent it from shrinking inode/dentry caches from other nodes, which is
      not good.  Fix this.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12f2f0ba
    • Jens Axboe's avatar
      mm/filemap.c: avoid always dirtying mapping->flags on O_DIRECT · 6591981a
      Jens Axboe authored
      commit 7fcbbaf1 upstream.
      
      In some testing I ran today (some fio jobs that spread over two nodes),
      we end up spending 40% of the time in filemap_check_errors().  That
      smells fishy.  Looking further, this is basically what happens:
      
      blkdev_aio_read()
          generic_file_aio_read()
              filemap_write_and_wait_range()
                  if (!mapping->nr_pages)
                      filemap_check_errors()
      
      and filemap_check_errors() always attempts two test_and_clear_bit() on
      the mapping flags, thus dirtying it for every single invocation.  The
      patch below tests each of these bits before clearing them, avoiding this
      issue.  In my test case (4-socket box), performance went from 1.7M IOPS
      to 4.0M IOPS.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6591981a