1. 07 Jun, 2016 16 commits
    • Jamal Hadi Salim's avatar
      net sched actions: introduce timestamp for firsttime use · 53eb440f
      Jamal Hadi Salim authored
      Useful to know when the action was first used for accounting
      (and debugging)
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53eb440f
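      A minimal illustrative sketch (not the patch itself) of how such a
      first-use timestamp can be latched on the first match; the struct and
      field names below are assumptions for illustration only:

      #include <linux/jiffies.h>

      /* Illustrative only: latch a first-use jiffies timestamp once. */
      struct example_tcf_tm {
              unsigned long install;   /* when the action was installed */
              unsigned long lastuse;   /* last time the action matched  */
              unsigned long firstuse;  /* 0 until the first match       */
      };

      static inline void example_tm_update(struct example_tcf_tm *tm)
      {
              unsigned long now = jiffies;

              tm->lastuse = now;
              if (!tm->firstuse)
                      tm->firstuse = now;  /* recorded once; handy for accounting/debugging */
      }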
    • Amir Vadai's avatar
      net/sched: cls_flower: Introduce support in SKIP SW flag · e69985c6
      Amir Vadai authored
      To have a filter processed only by hardware, the skip_sw flag should
      be supplied. This complements the already existing skip_hw flag (the
      filter is processed by software only). If no flag is specified, the
      filter is processed by both software and hardware.

      If only hardware-offloaded filters exist, fl_classify() will return
      without doing anything.

      A follow-up userspace patch will be sent once the kernel patch is
      accepted.
      
      Example:
      
      tc filter add dev enp0s9 protocol ip prio 20 parent ffff: \
      	flower \
      		ip_proto 6 \
      		indev enp0s9 \
      		skip_sw \
      	action skbedit mark 0x1234
      Signed-off-by: Amir Vadai <amirva@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e69985c6
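      The flag semantics above can be summarised in a small check;
      TCA_CLS_FLAGS_SKIP_HW and TCA_CLS_FLAGS_SKIP_SW are the real UAPI bits,
      while the helper itself is a hypothetical sketch rather than the
      cls_flower code:

      #include <linux/errno.h>
      #include <linux/types.h>
      #include <linux/pkt_cls.h>   /* TCA_CLS_FLAGS_SKIP_HW, TCA_CLS_FLAGS_SKIP_SW */

      /* no flag  -> install in both software and hardware
       * skip_hw  -> software only
       * skip_sw  -> hardware only
       * both     -> the filter would live nowhere, so reject it
       */
      static int example_check_cls_flags(u32 flags)
      {
              if ((flags & TCA_CLS_FLAGS_SKIP_HW) &&
                  (flags & TCA_CLS_FLAGS_SKIP_SW))
                      return -EINVAL;
              return 0;
      }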
    • David S. Miller's avatar
      Merge branch 'qed-iov-fw-reqs' · 919f274f
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      qed: IOV series - relax firmware requirements
      
      In order for VFs to work, the current implementation demands that the
      VF's required storm firmware be exactly the version loaded by the PF,
      which is a very harsh requirement.
      This patch series is intended to relax this -
      the recently submitted firmware is intended to be forward/backward
      compatible in its fastpath [the slowpath is configured by the PF on
      behalf of the VF], so VFs only need to share the same major fastpath
      HSI in order to work.

      Most of the other patches in this series extend the driver's current
      forward compatibility to reduce the chance of breaking PF/VF
      compatibility in the future. A few are unrelated IOV changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      919f274f
    • Yuval Mintz's avatar
      qed: PF to reply to unknown messages · 54fdd80f
      Yuval Mintz authored
      If a future VF were to send the PF an unknown message, the PF today
      would not send a reply. This would have two bad effects:
        a. The VF would have to time out on the request.
        b. If the VF were to send an additional message to the PF, firmware
           would mark it as malicious.

      Instead, if there's a valid reply-address on the message, let the PF
      answer and tell the VF it doesn't recognize the message.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54fdd80f
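      A schematic sketch of the behaviour described above; every name here
      (types, message ids, helpers) is hypothetical and only illustrates the
      "always reply, even to unknown requests" idea:

      static void example_pf_handle_vf_msg(struct example_pf *pf,
                                           struct example_vf *vf, u32 msg_id)
      {
              switch (msg_id) {
              case EXAMPLE_MSG_ACQUIRE:
                      example_pf_handle_acquire(pf, vf);
                      break;
              /* ... other known request types ... */
              default:
                      /* Unknown (possibly future) request: still send a reply
                       * so the VF neither times out nor gets marked malicious
                       * by firmware on its next message. */
                      example_pf_send_reply(pf, vf, EXAMPLE_STATUS_NOT_SUPPORTED);
                      break;
              }
      }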
    • Yuval Mintz's avatar
      qed: PF enforce MAC limitation of VFs · 8246d0b4
      Yuval Mintz authored
      The only MAC-related limitation the PF enforces on its VFs today is
      when it has set a forced unicast MAC address for them, in which case
      they can't configure other unicast addresses.
      Specifically, the PF isn't enforcing the number of MAC addresses a VF
      can configure, regardless of the number of such filters agreed upon by
      the PF and VF during the acquisition process.

      The PF's shadow-config is now extended to also contain information
      about its VFs' unicast address configuration, allowing such
      enforcement.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8246d0b4
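      A sketch of the kind of enforcement described above; the shadow-config
      layout and names are assumptions for illustration, not the qed
      structures:

      #include <linux/etherdevice.h>   /* ETH_ALEN, ether_addr_copy() */

      /* Reject a VF unicast-MAC add once the count agreed upon during the
       * acquisition process has been reached. */
      static int example_pf_vf_add_mac(struct example_vf_shadow *shadow,
                                       const u8 *mac)
      {
              if (shadow->num_macs >= shadow->acquired_mac_quota)
                      return -ENOSPC;   /* over the negotiated limit */

              ether_addr_copy(shadow->macs[shadow->num_macs++], mac);
              return 0;
      }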
    • Yuval Mintz's avatar
      qed: Move doorbell calculation from VF to PF · 5040acf5
      Yuval Mintz authored
      Today, the VF knows its queues' context-ids and calculates the doorbell
      address on its own when opening its queues.
      The doorbell configuration in HW may be changed by the PF at some point
      in the future [HW has several configurable features that can affect
      doorbell addresses, e.g., DPM support]; this would break compatibility
      with older VFs, as their calculated doorbell addresses would be
      incorrect for such a configuration.

      In order to avoid such a backward-compatibility failure, let the PF
      calculate the doorbell offset based on the context-id and pass that
      to the VF.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5040acf5
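      A sketch of the idea: the PF derives the doorbell offset from the
      queue's context-id and returns it in the queue-start reply, so the VF
      never needs to know the HW doorbell layout. The names and the offset
      formula are illustrative assumptions, not the qed implementation:

      static void example_pf_start_vf_queue(struct example_vf *vf, u32 queue_cid,
                                            struct example_queue_start_resp *resp)
      {
              /* The PF owns the doorbell layout, so a future change (e.g. DPM
               * support) only affects this one calculation instead of every
               * already-deployed VF. */
              resp->db_offset = EXAMPLE_DB_BASE + queue_cid * EXAMPLE_DB_STRIDE;
      }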
    • Yuval Mintz's avatar
      qed: Make PF more robust against malicious VF · 41086467
      Yuval Mintz authored
      There are several requests the VF can make toward the PF which the
      driver would pass to firmware without first checking their validity -
      specifically, opening queues and updating vports. Such configurations
      might cause the firmware to assert.

      This adds validation of the legality of said configurations on the PF
      side before passing them onward to firmware via ramrod.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      41086467
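      A sketch of the kind of validation described above; names and limits
      are illustrative, not the driver's:

      static int example_pf_validate_queue_start(const struct example_vf *vf,
                                                 u16 queue_id, u16 sb_index)
      {
              if (queue_id >= vf->num_queues)
                      return -EINVAL;   /* queue the VF never acquired */
              if (sb_index >= vf->num_sbs)
                      return -EINVAL;   /* bogus status-block index */
              return 0;                 /* safe to turn into a ramrod */
      }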
    • Yuval Mintz's avatar
      qed: PF-VF resource negotiation · 1cf2b1a9
      Yuval Mintz authored
      One of the goals of the VF's first message to the PF [acquire]
      is to learn the number of resources available to it [MACs, VLANs,
      etc.]. This is done via negotiation - the VF requests a set of
      resources, which the PF either approves or rejects, sending a smaller
      set of resources as an alternative. In the latter case, the VF is then
      expected to either abort the probe or re-send the acquire message with
      fewer requested resources.

      While this infrastructure has existed since the initial submission of
      qed SRIOV support, it is in fact completely non-operational - the PF
      isn't really looking into the resources the VF has asked for and is
      never going to reply to the VF that it lacks resources.

      This patch addresses this flow, fixing it and allowing the PF and VF
      to actually agree on a set of resources.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cf2b1a9
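      A sketch of the negotiation described above: the PF clamps each
      requested resource to what it can grant and, if anything was reduced,
      tells the VF so it can either abort the probe or re-send the acquire
      with the smaller set. All names and the message format are
      illustrative assumptions:

      #include <linux/kernel.h>   /* min() */
      #include <linux/types.h>

      struct example_resc { u16 num_macs, num_vlans; };

      enum example_acquire_rc { EXAMPLE_ACQUIRE_OK, EXAMPLE_ACQUIRE_RESC_REDUCED };

      static enum example_acquire_rc
      example_pf_negotiate(const struct example_resc *req,
                           const struct example_resc *avail,
                           struct example_resc *resp)
      {
              resp->num_macs  = min(req->num_macs,  avail->num_macs);
              resp->num_vlans = min(req->num_vlans, avail->num_vlans);

              if (resp->num_macs != req->num_macs ||
                  resp->num_vlans != req->num_vlans)
                      return EXAMPLE_ACQUIRE_RESC_REDUCED;
              return EXAMPLE_ACQUIRE_OK;
      }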
    • Yuval Mintz's avatar
      qed: Relax VF firmware requirements · 1fe614d1
      Yuval Mintz authored
      The current driver requires an exact match between VF and PF storm
      firmware; any difference would fail the VF acquire message, causing
      the VF probe to be aborted.

      While there are still dependencies between the two, the recent FW
      submission has relaxed the match requirement - instead of an exact
      match, there's now a 'fastpath' HSI major/minor scheme, where VFs and
      PFs that match in their major number can co-exist even if their minor
      numbers differ.

      To accommodate this change, some changes in the vf-start init flow had
      to be made, as the VF start ramrod now has to be sent only after the
      PF learns which fastpath HSI its VF requires.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fe614d1
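      The relaxed check boils down to comparing only the fastpath HSI major
      numbers; a minimal sketch (names are illustrative):

      #include <linux/types.h>

      static bool example_vf_hsi_compatible(u8 pf_fp_major, u8 vf_fp_major)
      {
              /* Minor numbers may differ by design; only the fastpath HSI
               * major number has to match for the VF to be accepted. */
              return pf_fp_major == vf_fp_major;
      }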
    • Eric Dumazet's avatar
      net: get rid of spin_trylock() in net_tx_action() · 3bcb846c
      Eric Dumazet authored
      Note: Tom Herbert posted almost the same patch 3 months back, but for
      different reasons.

      The reasons we want to get rid of this spin_trylock() are:

      1) Under high qdisc pressure, the spin_trylock() has almost no
      chance to succeed.

      2) We loop multiple times in the softirq handler, eventually reaching
      the max retry count (10), and we schedule ksoftirqd.

      Since we want to adhere more strictly to ksoftirqd being woken up in
      the future (https://lwn.net/Articles/687617/), it is better to avoid
      spurious wakeups.

      3) Calls to __netif_reschedule() dirty the cache line containing
      q->next_sched, slowing down the qdisc owner.

      4) RT kernels cannot use the spin_trylock() here.

      With the help of busylock, we get the qdisc spinlock fast enough, and
      the trylock trick brings only a performance penalty.

      Depending on the qdisc setup, I observed a gain of up to 19% in qdisc
      performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel).

      ("mpstat -I SCPU 1" is much happier now.)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3bcb846c
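      The shape of the change, as a schematic sketch rather than the literal
      diff: the opportunistic trylock-plus-reschedule pattern is replaced by
      an unconditional spin_lock(), relying on the qdisc busylock to keep the
      hold time short.

      #include <linux/spinlock.h>
      #include <net/pkt_sched.h>   /* qdisc_run() */

      static void example_tx_action_run_qdisc(struct Qdisc *q,
                                              spinlock_t *root_lock)
      {
              /* Old pattern (schematic): spin_trylock() and, on failure,
               * reschedule the qdisc - under pressure the trylock almost
               * never wins, dirtying q->next_sched and waking ksoftirqd.
               * New pattern: just take the lock; busylock already bounds
               * how long root_lock is held. */
              spin_lock(root_lock);
              qdisc_run(q);
              spin_unlock(root_lock);
      }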
    • Jason Wang's avatar
      vhost_net: stop polling socket during rx processing · 8241a1e4
      Jason Wang authored
      We don't stop polling the socket during rx processing; this leads to
      unnecessary wakeups from lower-layer net devices (e.g.
      sock_def_readable() from tun), and rx is slowed down as a result. This
      patch avoids that by stopping socket polling during rx processing. A
      small drawback is some extra overhead in the light-load case because
      of the added start/stop polling, but a single netperf TCP_RR run does
      not notice any change. In a super-heavy-load case, e.g. using pktgen
      to inject packets into the guest, we get about an ~8.8% improvement
      in pps:

      before: ~1240000 pkt/s
      after:  ~1350000 pkt/s
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8241a1e4
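      A sketch of the polling pattern described above; the start/stop
      helpers are hypothetical names standing in for the driver's internal
      ones:

      static void example_handle_rx(struct example_net *net)
      {
              example_rx_poll_stop(net);    /* no socket wakeups while draining */

              while (example_rx_process_one(net))
                      ;                     /* consume what is already queued */

              example_rx_poll_start(net);   /* re-arm wakeups for new packets */
      }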
    • Bhaktipriya Shridhar's avatar
      net: ethernet: cavium: liquidio: request_manager: Remove create_workqueue · aaa76724
      Bhaktipriya Shridhar authored
      alloc_workqueue() replaces the deprecated create_workqueue().

      A dedicated workqueue has been used since the work item
      (&db_wq->wk.work, which maps to check_db_timeout) is involved
      in normal device operation. WQ_MEM_RECLAIM has been set to guarantee
      forward progress under memory pressure, which is a requirement here.
      Since there are only a fixed number of work items, an explicit
      concurrency limit is unnecessary.

      flush_workqueue() is unnecessary since destroy_workqueue() itself calls
      drain_workqueue(), which flushes repeatedly until the workqueue
      becomes empty.
      Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aaa76724
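      The replacement pattern described here (and in the response_manager
      entry below) looks roughly like the following sketch; the workqueue
      name and wrapper functions are illustrative, while alloc_workqueue(),
      WQ_MEM_RECLAIM and destroy_workqueue() are the real APIs:

      #include <linux/workqueue.h>

      static struct workqueue_struct *example_wq;

      static int example_setup_wq(void)
      {
              /* WQ_MEM_RECLAIM guarantees forward progress under memory
               * pressure; a max_active of 0 means the default, which is fine
               * for a small, fixed number of work items. */
              example_wq = alloc_workqueue("example_db_wq", WQ_MEM_RECLAIM, 0);
              return example_wq ? 0 : -ENOMEM;
      }

      static void example_teardown_wq(void)
      {
              /* destroy_workqueue() drains the queue itself, so no separate
               * flush_workqueue() call is needed. */
              destroy_workqueue(example_wq);
      }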
    • Bhaktipriya Shridhar's avatar
      net: ethernet: cavium: liquidio: response_manager: Remove create_workqueue · 523a61b4
      Bhaktipriya Shridhar authored
      alloc_workqueue() replaces the deprecated create_workqueue().

      A dedicated workqueue has been used since the work item
      (&cwq->wk.work, which maps to oct_poll_req_completion) is involved
      in normal device operation. WQ_MEM_RECLAIM has been set to guarantee
      forward progress under memory pressure, which is a requirement here.
      Since there are only a fixed number of work items, an explicit
      concurrency limit is unnecessary.

      flush_workqueue() is unnecessary since destroy_workqueue() itself calls
      drain_workqueue(), which flushes repeatedly until the workqueue
      becomes empty. Hence the call to flush_workqueue() has been dropped.
      Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      523a61b4
    • Aaron Conole's avatar
      virtio-net: Add initial MTU advice feature · 14de9d11
      Aaron Conole authored
      This commit adds the feature bit and associated mtu device entry for
      the virtio network device. When a virtio device comes up, it checks
      the feature bit for the VIRTIO_NET_F_MTU feature. If the feature bit
      is set, the driver reads the advised MTU and uses it as the initial
      value.
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14de9d11
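      A sketch of the probe-time logic described above: if the device offers
      VIRTIO_NET_F_MTU, read the advised MTU from config space and use it as
      the initial value. The surrounding probe code is omitted and the
      config-field layout is assumed; virtio_has_feature() and
      virtio_cread16() are the real helpers:

      #include <linux/netdevice.h>
      #include <linux/virtio_config.h>
      #include <linux/virtio_net.h>    /* VIRTIO_NET_F_MTU, struct virtio_net_config */

      static void example_read_advised_mtu(struct virtio_device *vdev,
                                           struct net_device *dev)
      {
              if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU))
                      dev->mtu = virtio_cread16(vdev,
                                                offsetof(struct virtio_net_config,
                                                         mtu));
      }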
  2. 06 Jun, 2016 12 commits
    • David S. Miller's avatar
      net: Revert vrf-local changes. · 3d9dc408
      David S. Miller authored
      This reverts commit 2fb7ea45.
      
      It results in build errors because ip6_input is not a
      symbol exported to modules.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d9dc408
    • David S. Miller's avatar
      Merge branch 'vrf-local' · 2fb7ea45
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: vrf: Add support for local traffic to local addresses
      
      Add support for locally originated traffic to VRF-local addresses,
      be it addresses on enslaved devices or addresses on the VRF device:
      
      $ ip addr show dev red
      33: red: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
          link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
          inet 1.1.1.1/32 scope global red
             valid_lft forever preferred_lft forever
          inet6 1111:1::1/128 scope global
             valid_lft forever preferred_lft forever
      
      $ ip addr show dev eth1
      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
          link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
          inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
             valid_lft forever preferred_lft forever
          inet6 2100:1::1/120 scope global
             valid_lft forever preferred_lft forever
          inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
             valid_lft forever preferred_lft forever
      
      $ ping -c1 -I red 10.100.1.1
          ping: Warning: source address might be selected on device other than red.
          PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
          64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
      
      $ ping -c1 -I red 1.1.1.1
      PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
      64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms
      
      --- 1.1.1.1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms
      
      $ ping6 -c1 -I red  2100:1::1
      ping6: Warning: source address might be selected on device other than red.
      PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
      64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms
      
      --- 2100:1::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms
      
      $ ping6 -c1 -I red 1111::1
      PING 1111::1(1111::1) from 1111:1::1 red: 56 data bytes
      64 bytes from 1111::1: icmp_seq=1 ttl=64 time=0.187 ms
      
      --- 1111::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms
      
      This change also enables use of loopback address on the VRF device:
      $ ip addr add dev red 127.0.0.1/8
      
      $ ping -c1 -I red 127.0.0.1
      PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
      64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fb7ea45
    • David Ahern's avatar
      net: vrf: ipv6 support for local traffic to local addresses · 625b47b5
      David Ahern authored
      Add support for locally originated traffic to VRF-local IPv6 addresses.
      Similar to IPv4, a local dst is set on the skb and the packet is
      reinserted with a call to netif_rx. With this patch, ping, tcp and udp
      packets to a local IPv6 address are successfully routed:
      
          $ ip addr show dev eth1
          4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
              link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
              inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
                 valid_lft forever preferred_lft forever
              inet6 2100:1::1/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
                 valid_lft forever preferred_lft forever
      
          $ ping6 -c1 -I red 2100:1::1
          ping6: Warning: source address might be selected on device other than red.
          PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
          64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms
      
      ip6_input is exported so the VRF driver can use it for the dst input
      function. The dst_alloc function for IPv4 defaults to setting the input and
      output functions; IPv6's does not. VRF does not need to duplicate the Rx path
      so just export the ipv6 input function.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      625b47b5
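      A small sketch of the described use of the newly exported ip6_input():
      the VRF driver can install it as the input handler of its local IPv6
      dst so that packets re-injected via netif_rx() take the normal
      local-delivery path. The setup function around it is illustrative:

      #include <net/ip6_fib.h>   /* struct rt6_info */
      #include <net/ipv6.h>      /* ip6_input() */

      static void example_setup_local_rt6(struct rt6_info *rt6)
      {
              /* ip6_input() is what this change exports to modules; IPv6's
               * dst_alloc does not set dst.input by default, so the driver
               * assigns it explicitly. */
              rt6->dst.input = ip6_input;
      }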
    • David Ahern's avatar
      net: vrf: ipv4 support for local traffic to local addresses · 671cd19a
      David Ahern authored
      Add support for locally originated traffic to VRF-local addresses. If
      the destination device for an skb is the loopback or VRF device, set
      its dst to a local version of the VRF's cached dst_entry and call
      netif_rx to insert the packet onto the rx queue - similar to what is
      done for loopback. This patch handles IPv4 support; a follow-on patch
      handles IPv6.

      With this patch, ping, tcp and udp packets to a local IPv4 address are
      successfully routed:
      
          $ ip addr show dev eth1
          4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
              link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
              inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
                 valid_lft forever preferred_lft forever
              inet6 2100:1::1/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
                 valid_lft forever preferred_lft forever
      
          $ ping -c1 -I red 10.100.1.1
          ping: Warning: source address might be selected on device other than red.
          PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
          64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms
      
      This patch also enables use of IPv4 loopback address on the VRF device:
          $ ip addr add dev red 127.0.0.1/8
      
          $ ping -c1 -I red 127.0.0.1
          PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
          64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      671cd19a
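      A sketch of the re-injection described above: when the destination is a
      VRF-local address, point the skb at the VRF's local dst and hand it
      back to the stack with netif_rx(), much like loopback does. The lookup
      helper is an illustrative name; skb_dst_set(), dst_clone() and
      netif_rx() are the real APIs:

      #include <linux/netdevice.h>
      #include <net/dst.h>

      static int example_vrf_local_xmit(struct sk_buff *skb,
                                        struct net_device *vrf_dev)
      {
              struct dst_entry *dst = example_vrf_get_local_dst(vrf_dev);

              skb_dst_drop(skb);
              skb_dst_set(skb, dst_clone(dst));   /* local version of the cached dst */
              return netif_rx(skb);
      }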
    • David Ahern's avatar
      net: vrf: Minor refactoring for local address patches · 09fcf916
      David Ahern authored
      Move the stripping of the ethernet header from is_ip_tx_frame into the
      ipv4 and ipv6 outbound functions. If the packet is destined to a local
      address the header is retained since the packet is sent back to netif_rx.
      
      Collapse vrf_send_v4_prep into vrf_process_v4_outbound.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09fcf916
    • David S. Miller's avatar
      Merge branch 'hv_netvsc-cleanups' · b94eb2ce
      David S. Miller authored
      Vitaly Kuznetsov says:
      
      ====================
      hv_netvsc: cleanup after untangling the pointer mess
      
      Changes since v1:
      - resend when net-next is open [David Miller]
      - rebased to current net-next.
      
      After we made the traversal of our internal structures explicit, it
      became obvious that some functions take arguments they don't need just
      to do redundant pointer travel and get to what they really need, while
      their callers already have the required information.

      This is just a cleanup series with no functional changes intended. It
      doesn't pretend to be complete; additional cleanup of other functions
      may follow.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b94eb2ce
    • Vitaly Kuznetsov's avatar
      hv_netvsc: pass struct net_device to rndis_filter_set_offload_params() · 426d9541
      Vitaly Kuznetsov authored
      The only caller, rndis_filter_device_add(), already has the
      'struct net_device' pointer.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      426d9541
    • Vitaly Kuznetsov's avatar
      hv_netvsc: pass struct net_device to rndis_filter_set_device_mac() · e834da9a
      Vitaly Kuznetsov authored
      We unpack 'struct net_device' in netvsc_set_mac_addr() to get the
      'struct hv_device' pointer, which we then use in
      rndis_filter_set_device_mac() to get back to 'struct net_device'.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e834da9a
    • Vitaly Kuznetsov's avatar
      hv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}() · 2f5fa6c8
      Vitaly Kuznetsov authored
      Both rndis_filter_open() and rndis_filter_close() use struct hv_device
      only to reach struct netvsc_device, and all callers already have it.
      While at it, rename net_device to nvdev in rndis_filter_open(), as
      net_device is misleading.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f5fa6c8
    • Vitaly Kuznetsov's avatar
      hv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers · 2625466d
      Vitaly Kuznetsov authored
      Make it easier to get 'struct netvsc_device' from 'struct net_device' and
      'struct hv_device' by introducing inline helpers.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2625466d
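      A sketch of what such helpers can look like; the private-data field
      name (nvdev) and the hv_get_drvdata() payload are assumptions for
      illustration rather than a copy of the driver code:

      static inline struct netvsc_device *
      example_net_device_to_netvsc_device(struct net_device *ndev)
      {
              struct net_device_context *ctx = netdev_priv(ndev);

              return ctx->nvdev;   /* assumed field name */
      }

      static inline struct netvsc_device *
      example_hv_device_to_netvsc_device(struct hv_device *hdev)
      {
              /* hv_get_drvdata() is assumed to hold the struct net_device */
              return example_net_device_to_netvsc_device(hv_get_drvdata(hdev));
      }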
    • Vitaly Kuznetsov's avatar
      hv_netvsc: remove redundant assignment in netvsc_recv_callback() · 4baa994d
      Vitaly Kuznetsov authored
      net_device_ctx is assigned at the very beginning of the function, and
      the 'net' pointer doesn't change.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4baa994d
    • Michal Kubeček's avatar
      net: disable fragment reassembly if high_thresh is zero · 30759219
      Michal Kubeček authored
      Before commit 6d7b857d ("net: use lib/percpu_counter API for
      fragmentation mem accounting"), setting the reassembly high threshold
      to 0 prevented fragment reassembly, as the first fragment would always
      be evicted before the second could be added to the queue. While
      inefficient, some users apparently relied on this method.

      Since the commit mentioned above, a percpu counter is used for
      reassembly memory accounting, and its high batch size avoids taking the
      slow path in the most common scenarios. As a result, a whole full-sized
      packet can be reassembled without the percpu counter's main counter
      changing its value, so that even with high_thresh set to 0, fragmented
      packets can still be reassembled and processed.

      Add an explicit check preventing reassembly if the high threshold is
      zero.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30759219
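      A sketch of the explicit check described above: with the percpu
      counter's batching, accounted memory can read as zero even
      mid-reassembly, so a high_thresh of 0 has to short-circuit queue
      creation explicitly. The surrounding lookup function is schematic;
      frag_mem_limit() and the netns_frags fields are the real ones:

      #include <net/inet_frag.h>

      static struct inet_frag_queue *example_frag_find(struct netns_frags *nf)
      {
              if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh)
                      return NULL;   /* reassembly disabled or over budget */

              /* ... normal hash lookup / queue creation would follow ... */
              return NULL;
      }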
  3. 05 Jun, 2016 12 commits