1. 06 Sep, 2014 18 commits
    • Merge branch 'tcp' · 3aff5017
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: deduplicate TCP_SKB_CB(skb)->when
      
      TCP_SKB_CB(skb)->when has different meaning in output and input paths.
      
      In output path, it contains a timestamp.
      In input path, it contains an ISN, chosen by tcp_timewait_state_process()
      
      Its usage in output path is obsolete after usec timestamping.
      Let's simplify and clean this up.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove TCP_SKB_CB(skb)->when · 7faee5c0
      Eric Dumazet authored
      After commit 740b0f18 ("tcp: switch rtt estimations to usec resolution"),
      we no longer need to maintain timestamps in two different fields.
      
      TCP_SKB_CB(skb)->when can be removed, as the same information sits in
      skb_mstamp.stamp_jiffies.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: introduce TCP_SKB_CB(skb)->tcp_tw_isn · 04317daf
      Eric Dumazet authored
      TCP_SKB_CB(skb)->when has different meaning in output and input paths.
      
      In output path, it contains a timestamp.
      In input path, it contains an ISN, chosen by tcp_timewait_state_process()
      
      Let's add a different name to ease code comprehension.
      
      Note that the 'when' field will disappear in the following patch,
      as skb_mstamp already contains the timestamp, so the anonymous
      union will promptly disappear as well.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
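
As a rough illustration of the aliasing described in this commit message, the sketch below shows a control block in which one 32-bit slot can be read either as an output-path timestamp or as an input-path ISN. The struct `demo_cb` and its helpers are hypothetical names chosen for this example; this is not the kernel's `struct tcp_skb_cb` layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-packet control block (not the kernel layout). */
struct demo_cb {
    uint32_t seq;
    union {                      /* anonymous union: both names share one slot */
        uint32_t when;           /* output path: transmit timestamp */
        uint32_t tcp_tw_isn;     /* input path: ISN chosen for a TIME_WAIT reuse */
    };
};

/* Input-path code can now use a name that says what the value means. */
static void demo_set_tw_isn(struct demo_cb *cb, uint32_t isn)
{
    cb->tcp_tw_isn = isn;
}

static uint32_t demo_get_tw_isn(const struct demo_cb *cb)
{
    return cb->tcp_tw_isn;
}
```

Because both members occupy the same storage, the rename costs no space; it only makes the input-path meaning explicit at each call site.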
    • Merge branch 'eth_get_headlen' · 2ba38943
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      net: Drop get_headlen functions in favor of generic function
      
      This series replaces the igb_get_headlen and ixgbe_get_headlen functions
      with a generic function named eth_get_headlen.
      
      I have done some performance testing on ixgbe with 258 byte frames since
      the calls are only used on frames larger than 256 bytes and have seen no
      significant difference in CPU utilization.
      
      v2: renamed __skb_get_poff to skb_get_poff
          renamed ___skb_get_poff to __skb_get_poff
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: use new eth_get_headlen interface · 8496e338
      Alexander Duyck authored
      Update ixgbe to drop the ixgbe_get_headlen function in favor of eth_get_headlen.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igb: use new eth_get_headlen interface · 24cd23d3
      Alexander Duyck authored
      Update igb to drop the igb_get_headlen function in favor of eth_get_headlen.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add function for parsing the header length out of linear ethernet frames · 56193d1b
      Alexander Duyck authored
      This patch updates some of the flow_dissector api so that it can be used to
      parse the length of ethernet buffers stored in fragments.  Most of the
      changes needed were to __skb_get_poff as it needed to be updated to support
      sending a linear buffer instead of a skb.
      
      I have split __skb_get_poff into two functions, the first is skb_get_poff
      and it retains the functionality of the original __skb_get_poff.  The other
      function is __skb_get_poff which now works much like __skb_flow_dissect in
      relation to skb_flow_dissect in that it provides the same functionality but
      works with just a data buffer and hlen instead of needing an skb.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
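
The split described in this commit message (a core routine that takes only a data pointer and a length, plus a thin wrapper that keeps the object-based calling convention) might be sketched as follows. All `demo_*` names are hypothetical, and the parser handles only the Ethernet header and a single 802.1Q tag, which is far less than the real flow dissector does.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN    14u      /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN   4u       /* one 802.1Q tag */
#define DEMO_ETH_P_8021Q 0x8100u

/* Buffer-based core: needs only a pointer and a length, no packet object. */
static size_t demo_buf_headlen(const uint8_t *data, size_t len)
{
    uint16_t proto;

    if (len < DEMO_ETH_HLEN)
        return len;               /* too short to parse: treat it all as header */

    proto = (uint16_t)((data[12] << 8) | data[13]);  /* EtherType is big-endian */
    if (proto == DEMO_ETH_P_8021Q && len >= DEMO_ETH_HLEN + DEMO_VLAN_HLEN)
        return DEMO_ETH_HLEN + DEMO_VLAN_HLEN;
    return DEMO_ETH_HLEN;
}

/* Hypothetical packet object standing in for an skb. */
struct demo_pkt {
    const uint8_t *data;
    size_t headlen;
};

/* Object-based wrapper: keeps the old calling convention and delegates. */
static size_t demo_pkt_headlen(const struct demo_pkt *pkt)
{
    return demo_buf_headlen(pkt->data, pkt->headlen);
}
```

A driver that holds a received frame in a page fragment can call the buffer-based routine directly, while existing callers that already have a packet object keep using the wrapper.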
    • Merge branch 'timestamping' · 2c048e64
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      This change makes it so that the core path for the phy timestamping logic
      is shared between skb_tx_tstamp and skb_complete_tx_timestamp.  In addition
      it provides a means of using the same skb clone type path in non phy
      timestamping drivers.
      
      The main motivation for this is to enable non-phy drivers to be able to
      manipulate tx timestamp skbs for such things as putting them in lists or
      setting aside buffer in the context block.
      
      v2: Incorporated suggested changes from Willem de Bruijn and Eric Dumazet
           dropped unneeded comment
           restored order of hwtstamp vs swtstamp
           added destructor for skb
          Dropped usage of skb_complete_tx_timestamp as a kfree_skb w/ destructor
      
      v3: Updated destructor handling and dealt with socket reference counting issues
      
      v4: Split out combining destructors into a separate patch
      ====================
    • net: merge cases where sock_efree and sock_edemux are the same function · 82eabd9e
      Alexander Duyck authored
      Since sock_efree and sock_edemux are essentially the same code for non-TCP
      sockets and for the case where CONFIG_INET is not defined, we can combine
      the code or replace the call to sock_edemux in several spots.  As a result
      we can avoid a bit of unnecessary code or code duplication.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: Make the clone operation stand-alone from phy timestamping · 62bccb8c
      Alexander Duyck authored
      The phy timestamping takes a different path than the regular timestamping
      does in that it will create a clone first so that the packets needing to be
      timestamped can be placed in a queue, or the context block could be used.
      
      In order to support these use cases I am pulling the core of the code out
      so it can be used in other drivers beyond just phy devices.
      
      In addition I have added a destructor named sock_efree which is meant to
      provide a simple way for dropping the reference to skb exceptions that
      aren't part of either the receive or send windows for the socket, and I
      have removed some duplication in spots where this destructor could be used
      in place of sock_edemux.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: Merge shared code between phy and regular timestamping · 37846ef0
      Alexander Duyck authored
      This change merges the shared bits that exist between skb_tx_tstamp and
      skb_complete_tx_timestamp.  By doing this we can avoid the two diverging as
      there were already changes pushed into skb_tx_tstamp that hadn't made it
      into the other function.
      
      In addition this resolves issues with the fact that
      skb_complete_tx_timestamp was included in linux/skbuff.h even though it was
      only compiled in if phy timestamping was enabled.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: harden fnhe_hashfun() · d546c621
      Eric Dumazet authored
      Let's make this hash function a bit more secure, as ICMP attacks are
      still in the wild.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: fix allocation error in test · 18a47e6d
      Willem de Bruijn authored
      A buffer is incorrectly zeroed to the length of the pointer. If
      cfg_payload_len < sizeof(void *) this can overwrite unrelated memory.
      The buffer contents are never read, so no need to zero.
      
      Fixes: 8fe2f761 ("net-timestamp: expand documentation")
      Reported-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
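
The class of bug described in this commit message can be illustrated with a small sketch. The test program's actual code is not reproduced here; `demo_wrong_zero_len` and `demo_alloc_zeroed` are hypothetical helpers that only show why `sizeof` applied to a pointer gives the wrong length.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Buggy pattern: sizeof(buf) is the size of the pointer itself,
 * not the size of the allocation it points to.
 */
static size_t demo_wrong_zero_len(const char *buf)
{
    return sizeof(buf);
}

/* Safe pattern: zero exactly the number of bytes that were allocated. */
static char *demo_alloc_zeroed(size_t payload_len)
{
    char *buf = malloc(payload_len);

    if (buf)
        memset(buf, 0, payload_len);
    return buf;
}
```

With a payload shorter than a pointer, a `memset` sized by `demo_wrong_zero_len` would write past the end of the allocation. The commit itself takes the simpler route of dropping the zeroing entirely, since the buffer contents are never read.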
    • hyperv: NULL dereference on error · b1c84927
      Dan Carpenter authored
      We try to call free_netvsc_device(net_device) when "net_device" is NULL.
      It leads to an Oops.
      
      Fixes: f90251c8 ('hyperv: Increase the buffer length for netvsc_channel_cb()')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · a77f9a28
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-09-04
      
      This series contains updates to i40e, i40evf, ixgbe and ixgbevf.
      
      Catherine adds dual speed module support to i40e.  Updates i40e to allow
      the user to change link settings when the link is down.
      
      Serey renames i40e_ndo_set_vf_spoofck() to i40e_ndo_set_vf_spoofchk()
      to be more consistent with what is defined in netdev and removes an
      unnecessary variable assignment.
      
      Jesse makes a malicious driver detection warning only print if extended
      driver string is enabled for i40e.  Fixes a panic under traffic load when
      resetting or if/whenever there was a Tx-timeout because we were enabling
      the Tx queue too early.
      
      Anjali fixes an issue when PF reset fails, where we were trying to restart
      the admin queue which has not been setup at that point.  This resolves an
      occasional kernel panic when PF reset fails for some reason.
      
      Ethan Zhao replaces the use of a local i40e_vfs_are_assigned() with the
      global kernel pci_vfs_assigned() for i40e.
      
      Alex cleans up the FDB handling for ixgbe.  This change makes it so that
      the behavior for FDB handling is consistent between both the SR-IOV and
      non-SR-IOV cases.  The main change is that we perform bounds checking on
      the number of SR-IOV addresses regardless of if SR-IOV is enabled or not
      as we can only support a certain number of addresses in the hardware.
      
      Emil extends the pending Tx work check to the VF interfaces, where the
      driver initiates a reset of the interface on link loss with pending Tx
      work in order to clear the rings.  Introduces a delay for 82599 VFs of
      at least 500 usecs to make sure the VFLINKS value is correct, since this
      bit tends to flap when a DA or SFP+ cable is disconnected.
      
      Jacob adds code comments in ixgbe to make it more obvious that we are
      resetting features based on the fact that we do not have MSI-X enabled,
      and cannot use the previous settings.  Also resolves a kernel NULL
      pointer dereference by limiting the combined total of MACVLAN and
      SR-IOV VFs, since the hardware has a limited number of pools available
      (64).  Previously, no checks were in place to limit the number of
      accelerated MACVLAN devices based on the number of pools, which would
      be ok since there was already a limit for these well below the number of
      available pools.  However, SR-IOV uses the very same pools, therefore
      we need to ensure that the total number of pools does not exceed the
      number of pools available in the hardware.
      
      v2:
       - clean up code comment in patch 5 by replacing "an" with "auto
         negotiation" based on feedback from Sergei Shtylyov
       - removed unnecessary parentheses around function call in patch 8
         based on feedback from Sergei Shtylyov
      ====================
    • net: ethernet: cpsw: improve interrupt lookup logic in cpsw_probe() · c2b32e58
      Daniel Mack authored
      Simplify the interrupt resource lookup code in cpsw_probe() by the
      following:
      
       * Only look at the first member of the resource. As the driver only
         works for DT-enabled platforms anyway, a resource of type
         IORESOURCE_IRQ will only contain one single entry
         (res->start == res->end), so there is no need for the iteration.
      
       * Add a bounds check to avoid overflows if we are passed more than
         ARRAY_SIZE(priv->irqs_table) resources.
      
       * Assign 'ret' with the return value of devm_request_irq() so that
         cpsw_probe() returns the appropriate error code.
      
       * If devm_request_irq() fails, report the error code in the log
         message.
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Acked-by: Mugunthan V N <mugunthanvnm@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
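
The bounds check from the list above can be sketched in isolation. This is a minimal illustration, not the driver's code: `struct demo_priv`, `demo_add_irq`, the table size of 4, and the `-1` return value are assumptions made for the example (kernel code would return a negative errno).

```c
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical private data with a fixed-size IRQ table. */
struct demo_priv {
    int irqs_table[4];
    size_t num_irqs;
};

/* Record one IRQ number, refusing to write past the end of the table. */
static int demo_add_irq(struct demo_priv *priv, int irq)
{
    if (priv->num_irqs >= DEMO_ARRAY_SIZE(priv->irqs_table))
        return -1;                /* more resources than table slots */

    priv->irqs_table[priv->num_irqs++] = irq;
    return 0;
}
```

Checking against the array size before the store means that an unexpected extra resource produces an error return rather than a write past the end of the table.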
    • ipv4: fix a race in update_or_create_fnhe() · caa41527
      Eric Dumazet authored
      nh_exceptions is effectively used under RCU, but lacks proper
      barriers. Between kzalloc() and the setting of nh->nh_exceptions,
      we need a proper memory barrier.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Signed-off-by: David S. Miller <davem@davemloft.net>
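
The ordering requirement in this commit message (the zeroed allocation must be visible before the pointer to it is) can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel fix: `demo_table`, `struct demo_bucket` and `demo_get_or_create` are hypothetical, and writers are assumed to be serialized by a lock that is not shown.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical hash bucket; a zeroed bucket is a valid empty bucket. */
struct demo_bucket {
    int chain_len;
};

/* Shared pointer that lock-free readers load. Starts out NULL. */
static _Atomic(struct demo_bucket *) demo_table;

/* Writer side (assumed serialized by an external lock). */
static struct demo_bucket *demo_get_or_create(size_t nbuckets)
{
    struct demo_bucket *t =
        atomic_load_explicit(&demo_table, memory_order_acquire);

    if (!t) {
        t = calloc(nbuckets, sizeof(*t));
        if (!t)
            return NULL;
        /*
         * Release store: the zeroed contents are ordered before the
         * pointer becomes visible, so a reader that sees the pointer
         * through an acquire load also sees initialized buckets.
         */
        atomic_store_explicit(&demo_table, t, memory_order_release);
    }
    return t;
}
```

Without the release ordering, a reader on another CPU could observe the new pointer before the zeroing of the memory it points to, and walk a bucket containing stale data.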
  2. 05 Sep, 2014 18 commits
  3. 04 Sep, 2014 4 commits
    • ixgbe: limit combined total of macvlan and SR-IOV VFs · aac2f1bf
      Jacob Keller authored
      Hardware has a limited number of pools available (64). Previously, no
      checks were in place to limit the number of accelerated macvlan devices
      based on the number of pools. Normally this would be ok, because there
      was already a limit for these well below the number of available pools.
      However, SR-IOV uses the very same pools. Therefore, we need to ensure
      that the total number of pools (number of VFs plus the number of non-VF
      pools in use for accelerated macvlans) does not exceed the number of
      pools available in hardware.
      
      This patch resolves a kernel NULL pointer dereference caused by the following commands:
      
$ modprobe ixgbe max_vfs=63
$ ethtool -K eth2 l2-fwd-offload on
$ ip link add link eth2 macvlan0 type macvlan
$ ip link set dev macvlan0 up
      
      [  992.950080] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
      [  992.951109] IP: [<ffffffffa003b71e>] ixgbe_disable_fwd_ring+0x1e/0xf0 [ixgbe]
      [  992.951684] PGD 22a80e067 PUD 232e9b067 PMD 0
      [  992.952389] Oops: 0000 [#1] SMP
      [  992.953014] Modules linked in: nfsd lockd nfs_acl exportfs auth_rpcgss oid_registry sunrpc bridge stp llc vhost_net macvtap macvlan vhost tun kvm_intel kvm ioatdma ixgbe mdio igb dca
      [  992.956042] CPU: 2 PID: 11928 Comm: ifconfig Not tainted 3.16.0-rc6-net-next-07-29-2014-FCoE+ #1
      [  992.956915] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
      [  992.957791] task: ffff8804341c0000 ti: ffff8801d7dc8000 task.ti: ffff8801d7dc8000
      [  992.958660] RIP: 0010:[<ffffffffa003b71e>]  [<ffffffffa003b71e>] ixgbe_disable_fwd_ring+0x1e/0xf0 [ixgbe]
      [  992.959613] RSP: 0018:ffff8801d7dcbbb8  EFLAGS: 00010286
      [  992.960093] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
      [  992.960575] RDX: ffff880232eb7000 RSI: 0000000000000000 RDI: ffff88022dc05800
      [  992.961059] RBP: ffff8801d7dcbbd8 R08: 0000000000000000 R09: 0000000000000000
      [  992.961541] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88022ec20980
      [  992.962023] R13: ffff880232eb7000 R14: 0000000000000001 R15: 0000000000000001
      [  992.962508] FS:  00007fab264887a0(0000) GS:ffff880237640000(0000) knlGS:0000000000000000
      [  992.963378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  992.963858] CR2: 0000000000000056 CR3: 000000022a939000 CR4: 00000000001427e0
      [  992.964340] Stack:
      [  992.964806]  ffff88022ec28840 ffff88022ec20980 ffff88022dc05800 ffff880232eb7000
      [  992.965976]  ffff8801d7dcbc28 ffffffffa003bae8 ffff8801d7dcbbe8 0000000000000400
      [  992.967147]  000000000000000d ffff88022ec20980 ffff88022ec20000 ffff88022dc05800
      [  992.968319] Call Trace:
      [  992.968795]  [<ffffffffa003bae8>] ixgbe_fwd_ring_up+0x88/0x280 [ixgbe]
      [  992.969284]  [<ffffffffa0041d83>] ixgbe_fwd_add+0x173/0x220 [ixgbe]
      [  992.969767]  [<ffffffffa015056c>] macvlan_open+0x1bc/0x230 [macvlan]
      [  992.970256]  [<ffffffff816b8de7>] __dev_open+0xd7/0x150
      [  992.970735]  [<ffffffff816b8bd7>] __dev_change_flags+0xa7/0x170
      [  992.971220]  [<ffffffff816b8ccb>] dev_change_flags+0x2b/0x70
      [  992.971703]  [<ffffffff817471b2>] devinet_ioctl+0x602/0x6d0
      [  992.972184]  [<ffffffff81748168>] inet_ioctl+0x78/0x90
      [  992.972666]  [<ffffffff816a143b>] sock_do_ioctl+0x2b/0x70
      [  992.973146]  [<ffffffff816a14ed>] sock_ioctl+0x6d/0x260
      [  992.973627]  [<ffffffff811ad3b4>] do_vfs_ioctl+0x84/0x540
      [  992.974109]  [<ffffffff811a4c81>] ? final_putname+0x21/0x50
      [  992.974593]  [<ffffffff818725d5>] ? sysret_check+0x22/0x5d
      [  992.975073]  [<ffffffff811ad901>] SyS_ioctl+0x91/0xa0
      [  992.975550]  [<ffffffff818725a9>] system_call_fastpath+0x16/0x1b
      [  992.976026] Code: ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 4c 8b a7 08 02 00 00 <44> 0f b6 6e 56 44 03 af 14 02 00 00 4c 89 e7 e8 5e f2 ff ff be
      [  992.982261] RIP  [<ffffffffa003b71e>] ixgbe_disable_fwd_ring+0x1e/0xf0 [ixgbe]
      [  992.983212]  RSP <ffff8801d7dcbbb8>
      [  992.983681] CR2: 0000000000000056
      [  992.984248] ---[ end trace 9f54802b5cc3638b ]---
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: add comment noting recalculation of queues · eec66731
      Jacob Keller authored
      Since we previously called ixgbe_set_num_queues just prior to attempting
      to set our interrupt scheme, it may be non-obvious why we have to call
      it again inside the function. Add a comment which helps make it more
      obvious that we are resetting features based on the fact that we do not
      have MSI-X enabled, and cannot use the previous settings.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbevf: introduce delay for checking VFLINKS on 82599 · b8a2ca19
      Emil Tantilov authored
      VFLINKS.LINKUP bit tends to flap when a DA or SFP+ cable is disconnected.
      It can take up to 500 usecs for the LINKUP bit to be correct.
      
      This patch resolves the issue by introducing a delay for 82599 VFs of at
      least 500 usecs to make sure the VFLINKS value is correct.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: reset interface on link loss with pending Tx work from the VF · 07923c17
      Emil Tantilov authored
      ixgbe initiates a reset of the interface on link loss with pending Tx work
      in order to clear the rings.
      
      This patch extends the pending Tx work check to the VF interfaces with the
      same purpose.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>