1. 24 Jun, 2015 22 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 3a07bd6f
      David S. Miller authored
      Conflicts:
      	drivers/net/ethernet/mellanox/mlx4/main.c
      	net/packet/af_packet.c
      
      Both conflicts were cases of simple overlapping changes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: inet_diag: export IPV6_V6ONLY sockopt · 20462155
      Phil Sutter authored
      For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
      exported to userspace. It indicates whether a socket bound to in6addr_any
      listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
      listed by e.g. 'ss -l -4'.
      
      This patch is accompanied by an appropriate one for iproute2 to enable
      the additional information in 'ss -e'.
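
The flag being exported is the one userspace sets with the IPV6_V6ONLY socket option. As a minimal userspace sketch (not part of the patch; `demo_v6only_roundtrip` is a hypothetical helper name), the value can be set and read back like this:

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set IPV6_V6ONLY on a fresh AF_INET6 socket and read it back.
 * Returns the value reported by getsockopt(), or -1 if any call fails
 * (for example when the host has no IPv6 support).
 */
static int demo_v6only_roundtrip(int want)
{
    int got = -1;
    socklen_t len = sizeof(got);
    int fd;

    assert(want == 0 || want == 1);

    fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &want, sizeof(want)) < 0 ||
        getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &got, &len) < 0)
        got = -1;

    close(fd);
    return got;
}
```

A socket with the option cleared and bound to in6addr_any is the case the message describes: it also accepts IPv4 connections, yet is not shown by 'ss -l -4'.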
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: troubleshoot unexpected bits in des0 & des1 · f1590670
      Alexey Brodkin authored
      The current implementation of the descriptor init procedure only
      takes care of setting/clearing the ownership flag in the
      "des0"/"des1" fields, while it is perfectly possible to get
      unexpected bits set because of the following factors:
      
       [1] On driver probe underlying memory allocated with
           dma_alloc_coherent() might not be zeroed and so
           it will be filled with garbage.
      
       [2] During driver operation some bits could be set by SD/MMC
           controller (for example error flags etc).
      
      And unexpected and/or randomly set flags in "des0"/"des1"
      fields may lead to unpredictable behavior of GMAC DMA block.
      
      This change addresses both items above with:
      
       [1] Use of dma_zalloc_coherent() instead of simple
           dma_alloc_coherent() to make sure allocated memory is
           zeroed. That shouldn't affect performance because
           this allocation only happens once on driver probe.
      
       [2] Do explicit zeroing of both "des0" and "des1" fields
           of all buffer descriptors during initialization of
           DMA transfer.
      
      While at it, fix the indentation of the dma_free_coherent()
      counterpart as well.
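
The second item can be pictured with a simplified userspace sketch. The descriptor layout, the ownership bit position and all `demo_` names below are illustrative assumptions, not the driver's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; the real stmmac layout differs. */
struct demo_desc {
    uint32_t des0;
    uint32_t des1;
    uint32_t des2;
    uint32_t des3;
};

#define DEMO_DES0_OWN (1u << 31) /* illustrative ownership bit */

/*
 * Initialise a descriptor ring: clear des0/des1 completely first, then set
 * only the bits that are wanted, so stale or random flags cannot survive.
 */
static void demo_init_ring(struct demo_desc *ring, size_t n, int owned_by_dma)
{
    assert(ring != NULL);

    for (size_t i = 0; i < n; i++) {
        ring[i].des0 = 0;
        ring[i].des1 = 0;
        if (owned_by_dma)
            ring[i].des0 |= DEMO_DES0_OWN;
    }
}
```

Only des0 and des1 are touched, mirroring the description above; the remaining fields are left to the code that attaches buffers.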
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: arc-linux-dev@synopsys.com
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv4-nexthop-link-status' · f389a40e
      David S. Miller authored
      Andy Gospodarek says:
      
      ====================
      changes to make ipv4 routing table aware of next-hop link status
      
      This series adds the ability to have the Linux kernel track whether or
      not a particular route should be used based on the link-status of the
      interface associated with the next-hop.
      
      Before this patch any link-failure on an interface that was serving as a
      gateway for some systems could result in those systems being isolated
      from the rest of the network as the stack would continue to attempt to
      send frames out of an interface that is actually linked-down.  When the
      kernel is responsible for all forwarding, it should also be responsible
      for taking action when the traffic can no longer be forwarded -- there
      is no real need to outsource link-monitoring to userspace anymore.
      
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will not only report to
      userspace that the link is down, but it will also report to userspace
      that a route is dead.  This will signal to userspace that the route will
      not be selected.
      
      With the new sysctls set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the global or interface sysctl is not set, the following output would
      be expected when p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      If the dead flag does not appear there should be no expectation that the
      kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches: first to add linkdown flag and
      second to add new sysctl settings.  Also took suggestion from Alex to
      simplify code by only checking sysctl during fib lookup and suggestion
      from Scott to add a per-interface sysctl.  Added iproute2 patch to
      recognize and print linkdown flag.
      
      v3: Code cleanups along with reverse-path checks suggested by Alex and
      small fixes related to problems found when multipath was disabled.
      
      v4: Drop binary sysctls
      
      v5: Whitespace and variable declaration fixups suggested by Dave
      
      v6: Style changes noticed by Dave and checkpatch suggestions.
      
      v7: Last checkpatch fixup.
      
      When this was discussed in Ottawa earlier this year, some preferred not
      to have a configuration option and to make this behavior the default,
      since "it was time to do this."  I wanted to propose the config option
      anyway, to preserve the current behavior for those that desire it.
      I'll happily remove it if Dave and Linus approve.
      
      An IPv6 implementation is also needed (DECnet too!), but I wanted to
      start with the IPv4 implementation to get people comfortable with the
      idea before moving forward.  If this is accepted the IPv6 implementation
      can be posted shortly.
      
      There was also a request for switchdev support for this, but that will
      be posted as a followup as switchdev does not currently handle dead
      next-hops in a multi-path case and I felt that infra needed to be added
      first.
      
      FWIW, we have been running the original version of this series with a
      global sysctl and our customers have been happily using a backported
      version for IPv4 and IPv6 for >6 months.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek authored
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will report to userspace that
      a route is dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
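
The selection behaviour shown in the listings above can be approximated by the following userspace sketch. It is not the kernel's fib lookup code: the structure, the flag values and the lowest-metric rule are simplifying assumptions chosen to reproduce the 90.0.0.0/24 example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values standing in for RTNH_F_DEAD / RTNH_F_LINKDOWN. */
#define DEMO_F_DEAD     0x01u
#define DEMO_F_LINKDOWN 0x10u

struct demo_nexthop {
    const char *dev;
    int metric;
    unsigned int flags;
};

/*
 * Return the usable nexthop with the lowest metric, or NULL if none is left.
 * Nexthops whose link is down are skipped only when ignore_linkdown is set,
 * which models the 'ignore_routes_with_linkdown' sysctl.
 */
static const struct demo_nexthop *
demo_select(const struct demo_nexthop *nh, size_t n, int ignore_linkdown)
{
    const struct demo_nexthop *best = NULL;

    assert(nh != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (nh[i].flags & DEMO_F_DEAD)
            continue;
        if (ignore_linkdown && (nh[i].flags & DEMO_F_LINKDOWN))
            continue;
        if (best == NULL || nh[i].metric < best->metric)
            best = &nh[i];
    }
    return best;
}
```

With p8p1 (metric 1) marked link-down and p7p1 (metric 2) up, the sketch picks p7p1 when the option is set and p8p1 when it is not.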
      
      v2: Split kernel changes into 2 patches; this one actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: track link-status of ipv4 nexthops · 8a3d0316
      Andy Gospodarek authored
      Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
      reachable via an interface where carrier is off.  No action is taken,
      but additional flags are passed to userspace to indicate carrier status.
      
      This also includes a cleanup to fib_disable_ip to more clearly indicate
      what event made the function call to replace the more cryptic force
      option previously used.
      
      v2: Split out kernel functionality into 2 patches, this patch simply
      sets and clears new nexthop flag RTNH_F_LINKDOWN.
      
      v3: Cleanups suggested by Alex as well as a bug noticed in
      fib_sync_down_dev and fib_sync_up when multipath was not enabled.
      
      v5: Whitespace and variable declaration fixups suggested by Dave.
      
      v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
      all.
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: ignore unsupported bridge flags · 5c8079d0
      Vivien Didelot authored
      switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
      attributes, but a driver doesn't need to implement this in order to get
      bridge link information.
      
      So error out only on errors different than -EOPNOTSUPP.
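
The error-handling pattern described here can be sketched in userspace as follows. The callback type and the `demo_` functions are hypothetical stand-ins for a driver's attribute getter, not the switchdev API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver callback that reports bridge flags. */
typedef int (*demo_attr_get_fn)(unsigned long *flags);

static int demo_get_ok(unsigned long *flags)          { *flags = 0x3; return 0; }
static int demo_get_unsupported(unsigned long *flags) { (void)flags; return -EOPNOTSUPP; }
static int demo_get_broken(unsigned long *flags)      { (void)flags; return -EIO; }

/*
 * Query the optional attribute. A driver that does not implement it is not
 * an error: flags simply stay 0. Any other failure is propagated.
 */
static int demo_bridge_getlink(demo_attr_get_fn get, unsigned long *flags)
{
    int err;

    assert(get != NULL && flags != NULL);

    *flags = 0;
    err = get(flags);
    if (err && err != -EOPNOTSUPP)
        return err;
    return 0;
}
```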
      
      (This is a follow-up patch for 7d4f8d87.)
      
      Fixes: 8793d0a6 ("switchdev: add new switchdev_port_bridge_getlink")
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Cavium: Fix MAC address setting in shutdown state · bd049a90
      Pavel Fedin authored
      This bug pops up with NetworkManager on Fedora 21. NetworkManager tends to
      stop the interface (nicvf_stop() is called) before changing settings. In
      stopped state MAC cannot be sent to a PF. However, when the interface is
      restarted (nicvf_open() is called), we ping the PF using NIC_MBOX_MSG_READY
      message, and the PF replies back with old MAC address, overriding what we
      had after MAC setting from userspace. As a result, we cannot set MAC
      address using NetworkManager.
      
      This patch introduces special tracking of MAC change in stopped state so
      that the correct new MAC address is sent to a PF when the interface is reopened.
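
The tracking idea can be modelled with a small userspace sketch. The structures and the `demo_` functions are hypothetical and only imitate the VF/PF exchange described above; the real driver communicates over a mailbox:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

struct demo_pf {
    unsigned char vf_mac[DEMO_ETH_ALEN]; /* MAC the PF believes the VF has */
};

struct demo_vf {
    unsigned char mac[DEMO_ETH_ALEN];
    bool up;
    bool set_mac_pending; /* MAC changed while the interface was stopped */
};

static void demo_send_mac(struct demo_pf *pf, const struct demo_vf *vf)
{
    memcpy(pf->vf_mac, vf->mac, DEMO_ETH_ALEN);
}

/* Userspace changes the MAC; it can only reach the PF while the VF is up. */
static void demo_set_mac(struct demo_vf *vf, struct demo_pf *pf,
                         const unsigned char *mac)
{
    assert(mac != NULL);
    memcpy(vf->mac, mac, DEMO_ETH_ALEN);
    if (vf->up)
        demo_send_mac(pf, vf);
    else
        vf->set_mac_pending = true;
}

/* On open, push a pending MAC instead of adopting the PF's stale copy. */
static void demo_open(struct demo_vf *vf, struct demo_pf *pf)
{
    if (vf->set_mac_pending) {
        demo_send_mac(pf, vf);
        vf->set_mac_pending = false;
    } else {
        memcpy(vf->mac, pf->vf_mac, DEMO_ETH_ALEN);
    }
    vf->up = true;
}
```

Without the pending flag, `demo_open()` would always take the `else` branch and overwrite the address set from userspace, which is the bug the message describes.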
      Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Stephen Rothwell's avatar
    • ip: report the original address of ICMP messages · 34b99df4
      Julian Anastasov authored
      ICMP messages can trigger ICMP and local errors. In this case
      serr->port is 0 and starting from Linux 4.0 we do not return
      the original target address to the error queue readers.
      Add function to define which errors provide addr_offset.
      With this fix my ping command is not silent anymore.
      
      Fixes: c247f053 ("ip: fix error queue empty skb handling")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx-next' · 12d4ae9d
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      Mellanox NIC drivers update, June 23 2015
      
      This series has two fixes from Eran to his recent SRIOV counters work in
      mlx4 and few more updates from Saeed and Achiad to the mlx5 Ethernet
      code. All fixes here relate to net-next code, so no need for -stable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Prefetch skb data on RX · 99611ba1
      Saeed Mahameed authored
      Prefetch the first cache line of the buffer pointed to by
      the skb linear data.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Pop cq outside mlx5e_get_cqe · a1f5a1a8
      Achiad Shochat authored
      Separate mlx5e_get_cqe() from mlx5_cqwq_pop(); this improves
      code readability and CQ buffer management.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Remove mlx5e_cq.sqrq back-pointer · e3391054
      Achiad Shochat authored
      Use container_of() instead.
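
For readers unfamiliar with the idiom, here is a userspace sketch of how container_of() makes a back-pointer unnecessary. The macro is a simplified version without the kernel's type checking, and the `demo_` structures are hypothetical, not the mlx5e ones:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_cq {
    int id;
    /* no back-pointer to the owning queue is stored here */
};

struct demo_sq {
    int txq;
    struct demo_cq cq; /* embedded, so its address determines the owner's */
};

/* Recover the enclosing send queue from a pointer to its embedded CQ. */
static struct demo_sq *demo_cq_to_sq(struct demo_cq *cq)
{
    assert(cq != NULL);
    return demo_container_of(cq, struct demo_sq, cq);
}
```

This only works because the CQ is embedded in the owning structure; a CQ allocated separately would still need an explicit pointer.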
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Remove extra spaces · 8ca56ce3
      Achiad Shochat authored
      Coding Style fix, remove extra spaces.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Avoid TX CQE generation if more xmit packets expected · 059ba072
      Achiad Shochat authored
      In order to save PCI BW consumed by TX CQEs and to reduce the amount of
      CPU cache misses caused by TX CQE reading, we request TX CQE generation
      only when skb->xmit_more=0.
      
      As a consequence of the above, a single TX CQE may now indicate the
      transmission completion of multiple TX SKBs.
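
The batching effect can be illustrated with a small userspace model. The counters and `demo_xmit()` are hypothetical; they only count how many completions would be requested, and do not reflect the driver's WQE handling:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_sq {
    unsigned int posted;      /* packets handed to the hardware */
    unsigned int cqes;        /* completions requested so far */
    unsigned int unsignalled; /* packets posted since the last requested CQE */
};

/*
 * Post one packet. A completion is requested only when no further packets
 * are expected (xmit_more == false). Returns the number of packets that
 * completion will cover, or 0 if none was requested.
 */
static unsigned int demo_xmit(struct demo_sq *sq, bool xmit_more)
{
    unsigned int covered;

    assert(sq != NULL);

    sq->posted++;
    sq->unsignalled++;
    if (xmit_more)
        return 0;

    covered = sq->unsignalled;
    sq->cqes++;
    sq->unsignalled = 0;
    return covered;
}
```

Three packets sent as a burst, with only the last one having xmit_more clear, produce a single completion that stands for all three.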
      
      This also handles a problem introduced in commit b1b8105ebf41 "net/mlx5e:
      Support NETIF_F_SG" where we didn't ask for NOP completions while the
      driver didn't have the proper code to handle this case.
      
      Fixes: b1b8105ebf41 ('net/mlx5e: Support NETIF_F_SG')
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion · 9fc59306
      Achiad Shochat authored
      NOP completion SKBs are always NULL.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() · ef583d03
      Achiad Shochat authored
      It is already assigned at mlx5e_build_rq_param()
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them · fb6c6f25
      Saeed Mahameed authored
      Instead of counting number of gso fragments, we can use
      skb_shinfo(skb)->gso_segs.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues · 03289b88
      Saeed Mahameed authored
      To save per-packet calculations, we use the following static mappings:
      1) priv {channel, tc} to netdev txq (used @mlx5e_select_queue())
      2) netdev txq to priv sq (used @mlx5e_xmit())
      
      Thanks to these static mappings, no more need for a separate implementation
      of ndo_start_xmit when multiple TCs are configured.
      We believe the performance improvement of such separation would be negligible, if any.
      The previous way of dynamically calculating the above mappings required
      allocating more TX queues than actually used (@alloc_etherdev_mqs()),
      which is now no longer needed.
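
A sketch of what such precomputed tables might look like follows. The structure names, the array bounds and the `txq = channel + tc * num_channels` layout are assumptions made for illustration; the message itself does not spell out the layout:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CHANNELS 8
#define DEMO_MAX_TC       4

struct demo_sq {
    int channel;
    int tc;
};

struct demo_priv {
    int num_channels;
    int num_tc;
    /* mapping 1: {channel, tc} -> netdev txq index */
    int channeltc_to_txq[DEMO_MAX_CHANNELS][DEMO_MAX_TC];
    /* mapping 2: netdev txq index -> send queue */
    struct demo_sq txq_to_sq[DEMO_MAX_CHANNELS * DEMO_MAX_TC];
};

/* Fill both tables once at setup so the per-packet path is a plain lookup. */
static void demo_build_maps(struct demo_priv *priv, int num_channels, int num_tc)
{
    assert(priv != NULL);
    assert(num_channels > 0 && num_channels <= DEMO_MAX_CHANNELS);
    assert(num_tc > 0 && num_tc <= DEMO_MAX_TC);

    priv->num_channels = num_channels;
    priv->num_tc = num_tc;

    for (int ch = 0; ch < num_channels; ch++) {
        for (int tc = 0; tc < num_tc; tc++) {
            int txq = ch + tc * num_channels; /* assumed layout */

            priv->channeltc_to_txq[ch][tc] = txq;
            priv->txq_to_sq[txq].channel = ch;
            priv->txq_to_sq[txq].tc = tc;
        }
    }
}
```

Because every {channel, tc} pair maps to exactly one txq, the number of TX queues needed is num_channels * num_tc and no spare queues have to be allocated.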
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device · f1a3badb
      Eran Ben Elisha authored
      Under SRIOV, the port rx/tx bytes/packets statistics should be read
      from the HW instead of using the PF netdevice SW accounting. This is
      needed in order to get the full port statistics and not just the PF's
      own ones.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Fix off-by-four in ethtool · 9a2abf5a
      Eran Ben Elisha authored
      NUM_ALL_STATS was not updated with the four new entries; NUM_FLOW_STATS
      was updated instead. That caused an off-by-four for all counters below
      pf_*_*. Fix it.
      
      Fixes: b42de4d0 ('net/mlx4_en: Show PF own statistics via ethtool')
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Jun, 2015 18 commits