1. 29 Sep, 2015 10 commits
    • net/mlx5e: Priv state flag not rolled-back upon netdev open error · 343b29f3
      Achiad Shochat authored
      The private mlx5 state flag that indicates that the netdev is
      opened is set at the beginning of the netdev open flow.
      In case an error occurred later in the mlx5 netdev open flow, this
      flag was not cleared, so it remained set although the netdev was
      actually closed.
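      A minimal C sketch of the set-then-roll-back pattern this fix describes, assuming a simplified driver: struct priv, open_channels() and netdev_open() are hypothetical stand-ins, not the mlx5e code, and a boolean models the private state flag. An error after the flag is set jumps to a label that clears it again.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical private state; 'opened' models the state flag. */
      struct priv {
          bool opened;
      };

      /* Hypothetical later step of the open flow; 'fail' forces the error path. */
      static int open_channels(struct priv *p, bool fail)
      {
          (void)p;
          return fail ? -1 : 0;    /* -1 stands in for an errno such as -ENOMEM */
      }

      static int netdev_open(struct priv *p, bool fail)
      {
          int err;

          p->opened = true;        /* flag set at the beginning of the open flow */

          err = open_channels(p, fail);
          if (err)
              goto err_clear_state;

          return 0;

      err_clear_state:
          p->opened = false;       /* roll back: the netdev is not actually open */
          return err;
      }

      int main(void)
      {
          struct priv p = { .opened = false };
          int err;

          err = netdev_open(&p, true);
          printf("failed open: err=%d opened=%d\n", err, p.opened);
          assert(!p.opened);

          err = netdev_open(&p, false);
          printf("good open:   err=%d opened=%d\n", err, p.opened);
          assert(p.opened);
          return 0;
      }
      ```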
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: bpf_jit_disasm: make get_last_jit_image return unsigned · 4de61ba2
      Andrzej Hajda authored
      The function always returns non-negative values.

      The problem has been detected using the proposed semantic patch
      scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
      
      [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
      Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: avoid reorders for TFO passive connections · 7c85af88
      Eric Dumazet authored
      We found that a TCP Fast Open passive connection was vulnerable
      to reorders, as the exchange might look like:
      
      [1] C -> S S <FO ...> <request>
      [2] S -> C S. ack request <options>
      [3] S -> C . <answer>
      
      Packets [2] and [3] can be generated at almost the same time.
      
      If C receives the 3rd packet before the 2nd, it will drop it as
      the socket is in SYN_SENT state and expects a SYNACK.
      
      S will have to retransmit the answer.
      
      Current OOO (out-of-order) avoidance in Linux is defeated because
      SYNACK packets are attached to the LISTEN socket, while DATA packets
      are attached to the children. They might be sent by different CPUs,
      and different TX queues might be selected.
      
      It turns out that for TFO, we create a child, which is a
      full-blown socket in TCP_SYN_RECV state, so we can simply attach
      the SYNACK packet to this socket.
      
      This means that at the time tcp_sendmsg() pushes a DATA packet,
      skb->ooo_okay will be set iff the SYNACK packet has been sent
      and its TX has completed.
      
      This removes the reorder source at the host level.
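      A toy C model of why charging both packets to one socket removes the reorder. Assumptions, not kernel code: struct sock_model and pick_tx_queue() are hypothetical, the model counts in-flight packets (only an approximation of the kernel's own accounting behind skb->ooo_okay), and each CPU is assumed to prefer its own TX queue. The cached queue may only change while nothing from the socket is still in flight.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Toy socket: not the kernel's struct sock. */
      struct sock_model {
          int inflight;        /* packets queued but not yet TX-completed */
          int cached_queue;    /* -1 until a TX queue has been chosen */
      };

      /* Choose a TX queue for a packet sent from a CPU whose preferred queue is
       * cpu_queue. The cached queue may only change when ooo_okay holds, i.e.
       * when nothing from this socket is still in flight. */
      static int pick_tx_queue(struct sock_model *sk, int cpu_queue)
      {
          bool ooo_okay = (sk->inflight == 0);

          if (sk->cached_queue < 0 || ooo_okay)
              sk->cached_queue = cpu_queue;
          sk->inflight++;
          return sk->cached_queue;
      }

      int main(void)
      {
          /* Before: SYNACK charged to the listener, DATA to the child. */
          struct sock_model listener = { 0, -1 };
          struct sock_model child = { 0, -1 };
          int synack_q = pick_tx_queue(&listener, 0);    /* sent from CPU 0 */
          int data_q = pick_tx_queue(&child, 1);         /* sent from CPU 1 */
          printf("listener+child: SYNACK q%d, DATA q%d\n", synack_q, data_q);

          /* After: both packets charged to the child, SYNACK not yet completed. */
          struct sock_model child2 = { 0, -1 };
          synack_q = pick_tx_queue(&child2, 0);
          data_q = pick_tx_queue(&child2, 1);
          printf("child only:     SYNACK q%d, DATA q%d\n", synack_q, data_q);
          assert(synack_q == data_q);
          return 0;
      }
      ```

      With a single socket, both packets land on the same queue, and that queue's FIFO order keeps the SYNACK ahead of the DATA.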
      
      We also removed the export of tcp_try_fastopen(), as it is no
      longer called from IPv6.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · eae93fe4
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-09-28
      
      This series contains updates to i40e, i40evf and igb to resolve issues
      seen and reported by Red Hat.
      
      Kiran moves i40e_get_head() in preparation for the refactor of the Tx
      timeout logic, so that it can be used in other areas of the driver.
      Kiran also refactors the driver timeout logic so that, the first time
      the driver detects a hang, it issues a writeback request to the
      hardware via a software interrupt.  This was needed because the driver
      was too aggressive in resetting a hung queue.
      
      Shannon adds the GRE protocol to the transmit checksum encoding.
      
      Anjali fixes an issue of forcing writeback too often, which prevented
      us from benefiting from NAPI.  We now disable force writeback in the
      clean routine for X710 and XL710 adapters.  The X722 adapters do not
      enable the interrupt to force a writeback and benefit from WB_ON_ITR,
      so force WB is left enabled for those adapters.  Anjali also fixes a
      possible deadlock where sync_vsi_filters() can be called either
      directly under RTNL or through the timer subtask without RTNL; the
      flow now checks whether we are already under RTNL before trying to
      grab it.
      
      Stefan Assmann provides a fix for igb where SR-IOV was not getting
      enabled properly, leading to a NULL pointer dereference if the max_vfs
      module parameter is specified.  This is prevented by setting the
      IGB_FLAG_HAS_MSIX bit before calling igb_probe_vfs().
      
      v2: added "i40e: Fix for recursive RTNL lock during PROMISC change" patch
          to the series, as it resolves another issue seen and reported by
          Red Hat.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igb: assume MSI-X interrupts during initialization · cbfe360a
      Stefan Assmann authored
      In igb_sw_init(), the sequence of calls was changed from

          igb_init_queue_configuration()
          igb_init_interrupt_scheme()
          igb_probe_vfs()

      to

          igb_probe_vfs()
          igb_init_queue_configuration()
          igb_init_interrupt_scheme()
      
      This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
      during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
      get enabled properly, and we hit a NULL pointer dereference if the
      max_vfs module parameter is specified (adapter->vf_data does not get
      allocated, so accessing the structure crashes).
      
      [    7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [    7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
      [    7.419370] PGD 0
      [    7.419373] Oops: 0002 [#1] SMP
      [    7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
      [    7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
      [    7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
      [...]
      [    7.419431] Call Trace:
      [    7.419442]  [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
      [    7.419447]  [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
      
      Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
      igb_probe_vfs(). The real interrupt capabilities will be checked during
      igb_init_interrupt_scheme() so this is safe to do.
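      A simplified C model of the ordering problem and the fix. struct adapter, HAS_MSIX, probe_vfs(), init_interrupt_scheme() and sw_init() are hypothetical stand-ins for the igb code named above. The model assumes that when MSI-X turns out to be unavailable, the later interrupt setup clears the flag and frees vf_data, which is what makes assuming MSI-X up front safe.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define HAS_MSIX (1u << 0)   /* stand-in for IGB_FLAG_HAS_MSIX */

      struct adapter {
          unsigned int flags;
          int *vf_data;            /* allocated only when SR-IOV gets enabled */
      };

      /* Hypothetical: SR-IOV (and vf_data) is only set up with MSI-X. */
      static void probe_vfs(struct adapter *a, int max_vfs)
      {
          if (!(a->flags & HAS_MSIX) || max_vfs <= 0)
              return;
          a->vf_data = calloc((size_t)max_vfs, sizeof(*a->vf_data));
      }

      /* Hypothetical: the real capability check runs later; without MSI-X it
       * clears the flag and undoes the SR-IOV setup. */
      static void init_interrupt_scheme(struct adapter *a, int msix_supported)
      {
          if (msix_supported)
              return;
          a->flags &= ~HAS_MSIX;
          free(a->vf_data);
          a->vf_data = NULL;
      }

      static void sw_init(struct adapter *a, int max_vfs, int msix_supported)
      {
          a->flags |= HAS_MSIX;    /* the fix: assume MSI-X before probing VFs */
          probe_vfs(a, max_vfs);
          init_interrupt_scheme(a, msix_supported);
      }

      int main(void)
      {
          struct adapter a = { 0, NULL };

          sw_init(&a, 7, 1);
          printf("vf_data %s\n", a.vf_data ? "allocated" : "NULL");
          assert(a.vf_data != NULL);
          free(a.vf_data);
          return 0;
      }
      ```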
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix for recursive RTNL lock during PROMISC change · 30e2561b
      Anjali Singhai authored
      The sync_vsi_filters function can be called directly under RTNL
      or through the timer subtask without one. This was causing a deadlock.
      
      If sync_vsi_filters was called from a thread that already held the
      lock, and the PROMISC setting was changed in another thread, we would
      execute the PROMISC change in the thread that already held the lock,
      alongside the other filter update. On a VEB, the PROMISC change
      requires a reset, which must be called under RTNL.
      
      Earlier, the driver would call reset for a PROMISC change without
      checking whether we were already under RTNL and would try to grab it,
      causing a deadlock. This patch changes the flow to see if we are
      already under RTNL before trying to grab it.
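      A minimal sketch of the "only grab the lock if we do not already hold it" flow, assuming the caller passes a flag saying whether it holds the lock. How the actual i40e patch conveys this is not shown here; the toy lock, sync_filters() and its parameters are hypothetical. The toy lock asserts instead of deadlocking, so a recursive acquisition is caught immediately.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Toy non-recursive lock standing in for RTNL. */
      static bool rtnl_held;

      static void toy_rtnl_lock(void)
      {
          /* A real non-recursive lock would deadlock here; the toy aborts. */
          assert(!rtnl_held);
          rtnl_held = true;
      }

      static void toy_rtnl_unlock(void)
      {
          assert(rtnl_held);
          rtnl_held = false;
      }

      /* The caller states whether it already holds the lock, so the PROMISC
       * reset path only grabs it when it is not held yet. Returns the number
       * of resets performed. */
      static int sync_filters(bool promisc_changed, bool caller_holds_rtnl)
      {
          int resets = 0;

          if (promisc_changed) {
              if (!caller_holds_rtnl)
                  toy_rtnl_lock();
              assert(rtnl_held);   /* the reset must run under the lock */
              resets++;
              if (!caller_holds_rtnl)
                  toy_rtnl_unlock();
          }
          return resets;
      }

      int main(void)
      {
          /* Path 1: called directly with the lock already held. */
          toy_rtnl_lock();
          sync_filters(true, true);
          toy_rtnl_unlock();

          /* Path 2: called from the timer/service task without the lock. */
          sync_filters(true, false);

          printf("no recursive locking; lock held at exit: %d\n", rtnl_held);
          return 0;
      }
      ```

      A caller-supplied flag is used here rather than a plain "is the lock held?" test, because knowing that the lock is held does not say which thread holds it.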
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Kiran Patil <kiran.patil@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix RS bit update in Tx path and disable force WB workaround · 58044743
      Anjali Singhai authored
      This patch fixes the issue of forcing WB too often, which prevented us
      from benefiting from NAPI.

      Without this patch, we were forcing WB/arming the interrupt too often,
      taking away the benefits of NAPI and hurting performance.
      
      With this patch, we disable force WB in the clean routine for X710
      and XL710 adapters. X722 adapters do not enable the interrupt to force
      a WB and benefit from WB_ON_ITR, hence force WB is left enabled
      for those adapters.
      For XL710 and X710 adapters, if fewer than 4 packets are pending,
      a software interrupt triggered from the service task will force a WB.
      
      This patch also changes the conditions for setting the RS bit, as
      described in the code comments. This optimizes when the HW does a tail
      bump and when it does a WB. It also optimizes when we do a wmb.
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: add GRE tunnel type to csum encoding · c1d1791d
      Shannon Nelson authored
      Make sure the Tx checksum encoder knows about the GRE protocol and sets
      the descriptor flag appropriately.
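      A sketch of the kind of mapping a Tx checksum encoder needs, assuming a switch on the outer header's L4 protocol. The enum values and csum_tunnel_type() are hypothetical and are not the i40e descriptor encodings; IPPROTO_UDP, IPPROTO_GRE and IPPROTO_TCP come from <netinet/in.h>. A protocol missing from such a switch falls through to the default, and the descriptor is then not marked as tunneled.

      ```c
      #include <assert.h>
      #include <netinet/in.h>   /* IPPROTO_UDP, IPPROTO_GRE, IPPROTO_TCP */
      #include <stdio.h>

      /* Hypothetical tunnel-type values; not the i40e descriptor encodings. */
      enum tunnel_type {
          TUNNEL_NONE = 0,
          TUNNEL_UDP  = 1,
          TUNNEL_GRE  = 2,
      };

      /* Map the outer L4 protocol of an encapsulated packet to the tunnel
       * type the Tx descriptor should carry. */
      static enum tunnel_type csum_tunnel_type(int outer_l4_proto)
      {
          switch (outer_l4_proto) {
          case IPPROTO_UDP:
              return TUNNEL_UDP;   /* e.g. VXLAN or Geneve over UDP */
          case IPPROTO_GRE:
              return TUNNEL_GRE;   /* GRE, per this commit */
          default:
              return TUNNEL_NONE;  /* unknown: descriptor not marked as tunneled */
          }
      }

      int main(void)
      {
          printf("UDP -> %d, GRE -> %d, TCP -> %d\n",
                 (int)csum_tunnel_type(IPPROTO_UDP),
                 (int)csum_tunnel_type(IPPROTO_GRE),
                 (int)csum_tunnel_type(IPPROTO_TCP));
          assert(csum_tunnel_type(IPPROTO_GRE) == TUNNEL_GRE);
          return 0;
      }
      ```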
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e/i40evf: refactor tx timeout logic · b03a8c1f
      Kiran Patil authored
      This patch modifies the driver timeout logic by issuing a writeback
      request via a software interrupt to the hardware the first time the
      driver detects a hang. The driver was too aggressive in resetting a hung
      queue, so back that off by removing the logic that brings the netdevice
      down after too many hangs, and move the function to the service task.
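      A small C state machine modelling the "writeback first" idea. All names are hypothetical, and escalating to a reset only when the ring is still hung after the writeback request is an assumption made for illustration; the commit only states that the first detection triggers a writeback and that the logic to bring the netdevice down was removed.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      enum hang_action { HANG_NONE, HANG_FORCE_WB, HANG_RESET };

      struct tx_ring_model {
          bool wb_requested;   /* a writeback was already requested for this hang */
      };

      /* Called periodically, e.g. from a service task, for one Tx ring. */
      static enum hang_action check_hang(struct tx_ring_model *r, bool hung)
      {
          if (!hung) {
              r->wb_requested = false;     /* progress seen: start over */
              return HANG_NONE;
          }
          if (!r->wb_requested) {
              r->wb_requested = true;      /* first detection: request writeback */
              return HANG_FORCE_WB;
          }
          return HANG_RESET;               /* assumed escalation if still hung */
      }

      int main(void)
      {
          struct tx_ring_model ring = { false };
          bool samples[] = { true, false, true, true };
          static const char *names[] = { "none", "force writeback", "reset" };

          for (int i = 0; i < 4; i++)
              printf("hung=%d -> %s\n", samples[i],
                     names[check_hang(&ring, samples[i])]);
          return 0;
      }
      ```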
      
      Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea
      Signed-off-by: Kiran Patil <kiran.patil@intel.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Move i40e_get_head into header file · 1e6d6f8c
      Kiran Patil authored
      A later patch needs to call i40e_get_head from multiple files; prepare
      for this by moving the function into a header file.
      Signed-off-by: Kiran Patil <kiran.patil@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  2. 28 Sep, 2015 1 commit
  3. 27 Sep, 2015 5 commits
    • Merge branch 'vxlan-ipv4-ipv6' · 8f350437
      David S. Miller authored
      Jiri Benc says:
      
      ====================
      vxlan: support both IPv4 and IPv6 sockets
      
      Note: this needs net merged into net-next in order to apply.
      
      It's currently not easy enough to work with metadata-based vxlan
      tunnels. In particular, it's necessary to create separate network
      interfaces for IPv4 and IPv6 tunneling. Assigning an IPv6 address to an
      IPv4 interface is allowed, yet won't do what's expected. With route-based
      tunneling, one has to take care to use the vxlan interface opened with
      the correct family. Other users of this (openvswitch) would always need
      to create two vxlan interfaces.

      Furthermore, there's no sane API for creating an IPv6 metadata-based
      vxlan interface.

      This patchset simplifies this by opening both an IPv4 and an IPv6
      socket if the vxlan interface has the metadata flag
      (IFLA_VXLAN_COLLECT_METADATA) set. Assignment of addresses etc. works
      as expected after this.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: support both IPv4 and IPv6 sockets in a single vxlan device · b1be00a6
      Jiri Benc authored
      For a metadata-based vxlan interface, open both an IPv4 and an IPv6
      socket. This is much more user-friendly: it's not necessary to create
      two vxlan interfaces and pay attention to using the right one in
      routing rules.
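      A toy C model of the socket selection, assuming a metadata device opens both families while a plain device opens only the family it was configured for. Socket handles are plain integers, and vxlan_dev_model, vxlan_open_model() and vxlan_tx_sock() are hypothetical, not the vxlan driver's API.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>
      #include <sys/socket.h>   /* AF_INET, AF_INET6 */

      /* Toy device: socket "handles" are ints, -1 meaning not opened. */
      struct vxlan_dev_model {
          bool collect_metadata;
          int family;          /* family configured for a non-metadata device */
          int sock4;
          int sock6;
      };

      static void vxlan_open_model(struct vxlan_dev_model *v)
      {
          bool ipv6 = (v->family == AF_INET6);

          v->sock4 = -1;
          v->sock6 = -1;
          if (ipv6 || v->collect_metadata)
              v->sock6 = 6;    /* stand-in for an IPv6 UDP socket */
          if (!ipv6 || v->collect_metadata)
              v->sock4 = 4;    /* stand-in for an IPv4 UDP socket */
      }

      /* Pick the socket that matches the tunnel destination's family. */
      static int vxlan_tx_sock(const struct vxlan_dev_model *v, int dst_family)
      {
          return dst_family == AF_INET6 ? v->sock6 : v->sock4;
      }

      int main(void)
      {
          struct vxlan_dev_model md = { .collect_metadata = true, .family = AF_INET };

          vxlan_open_model(&md);
          printf("metadata dev: v4 sock %d, v6 sock %d\n",
                 vxlan_tx_sock(&md, AF_INET), vxlan_tx_sock(&md, AF_INET6));
          assert(vxlan_tx_sock(&md, AF_INET6) != -1);
          return 0;
      }
      ```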
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: make vxlan_sock_add and vxlan_sock_release complementary · 205f356d
      Jiri Benc authored
      Make vxlan_sock_add both allocate the socket and attach it to vxlan_dev.
      Let vxlan_sock_release accept vxlan_dev as its parameter instead of
      vxlan_sock.

      This makes vxlan_sock_add and vxlan_sock_release complementary. It
      reduces code duplication in the next patch.
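      A minimal sketch of such a complementary pair, with hypothetical types and names rather than the vxlan driver's: the add function both allocates the socket and attaches it to the device, and the release function takes the device, so an error path that has the device can simply call release on it.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical types; not the vxlan driver's structs. */
      struct vsock_model {
          int refcnt;
      };

      struct vdev_model {
          struct vsock_model *sock;    /* attached socket, or NULL */
      };

      /* Allocate the socket AND attach it to the device. */
      static int vsock_add(struct vdev_model *dev)
      {
          struct vsock_model *s = malloc(sizeof(*s));

          if (!s)
              return -1;
          s->refcnt = 1;
          dev->sock = s;
          return 0;
      }

      /* Takes the device, mirroring vsock_add(): detach, then drop the socket.
       * Safe to call when nothing is attached. */
      static void vsock_release(struct vdev_model *dev)
      {
          struct vsock_model *s = dev->sock;

          dev->sock = NULL;
          if (s && --s->refcnt == 0)
              free(s);
      }

      int main(void)
      {
          struct vdev_model dev = { NULL };

          if (vsock_add(&dev) == 0)
              printf("attached: %s\n", dev.sock ? "yes" : "no");
          vsock_release(&dev);
          printf("after release: %s\n", dev.sock ? "attached" : "detached");
          vsock_release(&dev);   /* second call is a no-op */
          return 0;
      }
      ```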
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 8139cp: Fix GSO MSS handling · 8b7a7048
      David Woodhouse authored
      When fixing the TSO support, I noticed we just mask ->gso_size with the
      MSSMask value and don't care about the consequences.
      
      Provide a .ndo_features_check() method which drops the NETIF_F_TSO
      feature for any skb which would exceed the maximum, and thus forces it
      to be segmented by software.
      
      Then we can stop the masking in cp_start_xmit(), and just WARN if the
      maximum is exceeded, which should now never happen.
      
      Finally, Francois Romieu noticed that we didn't even have the right
      value for MSSMask anyway; it should be 0x7ff (11 bits), not 0xfff.
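      A sketch of the features-check idea in plain C. features_check() is a simplified, hypothetical stand-in for a .ndo_features_check() method (the real callback receives the skb, the device and the feature set), and FEAT_TSO stands in for NETIF_F_TSO. It also shows why silent masking was harmful: 2100 & 0x7ff is 52, so an oversized MSS would be programmed as a much smaller one.

      ```c
      #include <assert.h>
      #include <stdio.h>

      #define MSS_MASK 0x7ffu          /* 11-bit MSS field, per the commit */
      #define FEAT_TSO (1u << 0)       /* stand-in for NETIF_F_TSO */

      /* Simplified features check: drop TSO for any packet whose gso_size does
       * not fit the MSS field, so the stack segments it in software instead. */
      static unsigned int features_check(unsigned int gso_size, unsigned int features)
      {
          if (gso_size > MSS_MASK)
              features &= ~FEAT_TSO;
          return features;
      }

      int main(void)
      {
          unsigned int sizes[] = { 1448, 2047, 2100 };

          for (int i = 0; i < 3; i++)
              printf("gso_size %u: TSO %s, masked value %u\n", sizes[i],
                     (features_check(sizes[i], FEAT_TSO) & FEAT_TSO) ? "kept" : "dropped",
                     sizes[i] & MSS_MASK);
          return 0;
      }
      ```

      For 2100 the masked value printed is 52, while the check drops TSO so that the packet is segmented in software instead.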
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 8139cp: Enable offload features by default · 5a58f227
      David Woodhouse authored
      I fixed TSO. Hardware checksum and scatter/gather also appear to be
      working correctly both on real hardware and in QEMU's emulation.
      
      Let's enable them by default and see if anyone screams...
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 26 Sep, 2015 4 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4963ed48
      David S. Miller authored
      Conflicts:
      	net/ipv4/arp.c
      
      The net/ipv4/arp.c conflict was one commit adding a new
      local variable while another commit was deleting one.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 518a7cb6
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
          instead of cloning them.  From Daniel Borkmann.
      
       2) When converting classical BPF into eBPF, fix the setting of the
          source reg to BPF_REG_X.  From Tycho Andersen.
      
       3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
          Linus Lüssing.
      
       4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.
      
       5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
          Roopa Prabhu.
      
       6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.
      
       7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
          Florian Westphal.
      
       8) Check DMA mapping errors in bna driver, from Ivan Vecera.
      
       9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
          misclearing of interrupt status in TX timeout handler, etc.) from
          David Woodhouse.
      
      10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
          Hugne.
      
      11) Fix autobind races et al. in netlink code, from Herbert Xu with
          help from Tejun Heo and others.
      
      12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.
      
      13) Fix various races in timewait timer and reqsk_queue_hash_req, from
          Eric Dumazet.
      
      14) Fix array overruns in mac80211, from Johannes Berg and Dan
          Carpenter.
      
      15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.
      
      16) Fix race between poll_one_napi and napi_disable, from Neil Horman.
      
      17) Fix byte order in geneve tunnel port config, from John W Linville.
      
      18) Fix handling of ARP replies over lightweight tunnels, from Jiri
          Benc.
      
      19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
          Kok and Roopa Prabhu.
      
      20) Several reference count handling bug fixes in the PHY/MDIO layer
          from Russell King.
      
      21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.
      
      22) Fix crash in icmp_route_lookup(), from David Ahern.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
        net: Fix panic in icmp_route_lookup
        net: update docbook comment for __mdiobus_register()
        ppp: fix lockdep splat in ppp_dev_uninit()
        net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
        phy: marvell: add link partner advertised modes
        net: fix net_device refcounting
        phy: add phy_device_remove()
        phy: fixed-phy: properly validate phy in fixed_phy_update_state()
        net: fix phy refcounting in a bunch of drivers
        of_mdio: fix MDIO phy device refcounting
        phy: add proper phy struct device refcounting
        phy: fix mdiobus module safety
        net: dsa: fix of_mdio_find_bus() device refcount leak
        phy: fix of_mdio_find_bus() device refcount leak
        ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
        ip6_gre: Reduce log level in ip6gre_err() to debug
        fib_rules: fix fib rule dumps across multiple skbs
        bnx2x: byte swap rss_key to comply to Toeplitz specs
        net: revert "net_sched: move tp->root allocation into fw_init()"
        lwtunnel: remove source and destination UDP port config option
        ...
    • net: Fix panic in icmp_route_lookup · bdb06cbf
      David Ahern authored
      Andrey reported a panic:
      
      [ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
      [ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
      [ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
      [ 7249.865637] Oops: 0000 [#1]
      ...
      [ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-999-generic #201509220155
      [ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
      [ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
      [ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
      [ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
      [ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
      [ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
      [ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
      [ 7249.867141] Stack:
      [ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00 f3920240 f0c69180
      [ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000 e659266c f483ba54
      [ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0 c16b0e00 00000064
      [ 7249.867448] Call Trace:
      [ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
      [ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
      [ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
      [ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
      [ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
      [ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
      [ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
      [ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
      [ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0 [nf_conntrack]
      [ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
      [ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
      [ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
      [ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
      [ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
      [ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
      [ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
      [ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
      [ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
      [ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
      [ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
      [ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
      [ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
      [ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
      [ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
      [ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
      [ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
      [ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
      ...
      
      Prior to the VRF change, the oif was not set in the flow struct, so the
      VRF support should really have only added the vrf_master_ifindex lookup.
      
      Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
      Cc: Andrey Melnikov <temnota.am@gmail.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: update docbook comment for __mdiobus_register() · 59f06978
      Russell King authored
      Update the docbook comment for __mdiobus_register() to include the new
      module owner argument.  This resolves a warning found by the 0-day
      builder.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 25 Sep, 2015 20 commits