1. 28 Nov, 2017 6 commits
    • Merge branch 'mlxsw-GRE-offloading-fixes' · e2549970
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: GRE offloading fixes
      
      Petr says:
      
      This patchset fixes a couple of bugs in offloading GRE tunnels in the
      mlxsw driver.
      
      Patch #1 fixes a problem that local routes pointing at a GRE tunnel
      device are offloaded even if that netdevice is down.
      
      Patch #2 detects that as a result of moving a GRE netdevice to a
      different VRF, two tunnels now have a conflict of local addresses,
      something that the mlxsw driver can't offload.
      
      Patch #3 fixes a FIB abort caused by forming a route pointing at a
      GRE tunnel that is eligible for offloading but already onloaded.
      
      Patch #4 fixes a problem that next hops migrated to a new RIF kept the
      old RIF reference, which went dangling shortly afterwards.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Update nexthop RIF on update · 09dbf629
      Petr Machata authored
      The function mlxsw_sp_nexthop_rif_update() walks the list of nexthops
      associated with a RIF, and updates the corresponding entries in the
      switch. It is used in particular when a tunnel underlay netdevice moves
      to a different VRF, and all the nexthops are migrated over to a new RIF.
      The problem is that each nexthop holds a reference to its RIF, and that
      is not updated. So after the old RIF is gone, further activity on these
      nexthops (such as downing the underlay netdevice) dereferences a
      dangling pointer.
      
      Fix the issue by updating rif of impacted nexthops before calling
      mlxsw_sp_nexthop_rif_update().
      
      Fixes: 0c5f1cd5 ("mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Handle encap to demoted tunnels · d97cda5f
      Petr Machata authored
      Some tunnels that are offloadable on their own can nonetheless be
      demoted to slow path if their local address is in conflict with that of
      another tunnel. When a route is formed for such a tunnel,
      mlxsw_sp_nexthop_ipip_init() fails to find the corresponding IPIP entry,
      and that triggers a FIB abort.
      
      Resolve the problem by not assuming that a tunnel for which
      mlxsw_sp_ipip_ops.can_offload() holds also automatically has an IPIP
      entry.
      
      Fixes: af641713 ("mlxsw: spectrum_router: Onload conflicting tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Demote tunnels on VRF migration · cab43d9c
      Petr Machata authored
      The mlxsw driver currently doesn't offload GRE tunnels if they have the
      same local address and use the same underlay VRF. When such a situation
      arises, the tunnels in conflict are demoted to slow path.
      
      However, the current code only verifies this condition on tunnel
      creation and tunnel change, not when a tunnel is moved to a different
      VRF. When the tunnel has no bound device, underlay and overlay are the
      same. Thus moving a tunnel moves the underlay as well, and that can
      cause local address conflict.
      
      So modify mlxsw_sp_netdevice_ipip_ol_vrf_event() to check if there are
      any conflicting tunnels, and demote them if yes.
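
The conflict rule described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct tunnel`, `tunnels_conflict()` and `tunnel_move_vrf()` are hypothetical names and are not the mlxsw driver's data structures or functions. The sketch shows why the check has to be repeated after a VRF move: the move itself can create a conflict that did not exist when the tunnels were created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel record; not the mlxsw data structure. */
struct tunnel {
	uint32_t local_addr; /* IPv4 local address */
	uint32_t ul_vrf;     /* underlay VRF (table) id */
	bool offloaded;      /* false means demoted to slow path */
};

/* Two tunnels conflict if they share local address and underlay VRF. */
static bool tunnels_conflict(const struct tunnel *a, const struct tunnel *b)
{
	return a->local_addr == b->local_addr && a->ul_vrf == b->ul_vrf;
}

/*
 * Move tunnel `idx` to `new_vrf`, then demote the moved tunnel and every
 * tunnel that now conflicts with it. Returns the number of tunnels that
 * changed from offloaded to demoted.
 */
static size_t tunnel_move_vrf(struct tunnel *t, size_t n, size_t idx,
			      uint32_t new_vrf)
{
	size_t demoted = 0;

	assert(idx < n);
	t[idx].ul_vrf = new_vrf;

	for (size_t i = 0; i < n; i++) {
		if (i == idx || !tunnels_conflict(&t[i], &t[idx]))
			continue;
		if (t[i].offloaded) {
			t[i].offloaded = false;
			demoted++;
		}
		if (t[idx].offloaded) {
			t[idx].offloaded = false;
			demoted++;
		}
	}
	return demoted;
}
```

With three tunnels where two share a local address but sit in different VRFs, moving one of them into the other's VRF demotes exactly those two and leaves the third offloaded.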
      
      Fixes: af641713 ("mlxsw: spectrum_router: Onload conflicting tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Offload decap only for up tunnels · 57c77ce4
      Petr Machata authored
      When a new local route is added, an IPIP entry is looked up to determine
      whether the route should be offloaded as a tunnel decap or as a trap.
      That decision should take into account whether the tunnel netdevice in
      question is actually IFF_UP, and only install a decap offload if it is.
      
      Fixes: 0063587d ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 32f0160c
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2017-11-27
      
      This series contains updates to e1000, e1000e and i40e.
      
      Gustavo A. R. Silva fixes a sizeof() issue where we were taking the size of
      the pointer rather than the size of the object it points to.
      
      Sasha does a follow up fix to a previous fix for buffer overrun, to resolve
      community feedback from David Laight and the use of magic numbers.
      
      Amritha fixes the reporting of error codes when adding a cloud filter
      fails.
      
      Ahmad Fatoum brushes the dust off the e1000 driver to fix a code comment
      and debug message which were incorrect about what the code was really doing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 27 Nov, 2017 16 commits
  3. 26 Nov, 2017 2 commits
    • openvswitch: fix the incorrect flow action alloc size · 67c8d22a
      zhangliping authored
      If we want to add a datapath flow that has more than 500 vxlan output
      actions, we get the following error reports:
        openvswitch: netlink: Flow action size 32832 bytes exceeds max
        openvswitch: netlink: Flow action size 32832 bytes exceeds max
        openvswitch: netlink: Actions may not be safe on all matching packets
        ... ...
      
      It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
      this is not the root cause. For example, for a vxlan output action, we need
      about 60 bytes for the nlattr, but after it is converted to the flow
      action, it only occupies 24 bytes. This means that we can still support
      more than 1000 vxlan output actions for a single datapath flow under the
      current 32k max limitation.
      
      So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
      shouldn't report EINVAL but should carry on, as the size check can be
      done by reserve_sfa_size().
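
The arithmetic behind this can be checked with a small userspace sketch. The per-action sizes (about 60 bytes as netlink attributes, 24 bytes once converted) and the 32k limit are taken from the message above; the helper names are hypothetical and the 60-byte figure is only approximate, so treat the exact thresholds as illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS_BUFSIZE     ((size_t)32 * 1024) /* 32k limit from the text */
#define NLATTR_BYTES_PER_OUTPUT ((size_t)60)        /* approximate, from the text */
#define SFA_BYTES_PER_OUTPUT    ((size_t)24)        /* converted size, from the text */

/* Size of n vxlan output actions as they arrive over netlink. */
static size_t netlink_size(size_t n_outputs)
{
	return n_outputs * NLATTR_BYTES_PER_OUTPUT;
}

/* Size of the same actions after conversion to flow actions. */
static size_t converted_size(size_t n_outputs)
{
	return n_outputs * SFA_BYTES_PER_OUTPUT;
}

/* A check on the netlink input size rejects flows that would still fit. */
static bool rejected_by_input_size(size_t n_outputs)
{
	return netlink_size(n_outputs) > MAX_ACTIONS_BUFSIZE;
}

/* A check on the converted size is the one that matters. */
static bool fits_after_conversion(size_t n_outputs)
{
	return converted_size(n_outputs) <= MAX_ACTIONS_BUFSIZE;
}

/* Largest number of output actions that fits once converted. */
static size_t max_outputs(void)
{
	return MAX_ACTIONS_BUFSIZE / SFA_BYTES_PER_OUTPUT;
}
```

Under these figures a flow with 1000 outputs is about 60000 bytes on the wire, so an input-size check rejects it, yet it needs only 24000 bytes after conversion.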
      Signed-off-by: zhangliping <zhangliping02@baidu.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: datapath: fix data type in queue_gso_packets · 2734166e
      Gustavo A. R. Silva authored
      gso_type is being used in binary AND operations together with SKB_GSO_UDP.
      The issue is that variable gso_type is of type unsigned short and
      SKB_GSO_UDP expands to more than 16 bits:
      
      SKB_GSO_UDP = 1 << 16
      
      This makes any binary AND operation between gso_type and SKB_GSO_UDP
      always evaluate to zero, making some code unreachable and likely causing
      undesired behavior.
      
      Fix this by changing the data type of variable gso_type to unsigned int.
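
The truncation can be reproduced outside the kernel. The sketch below assumes `unsigned short` is 16 bits wide, as on common Linux platforms, and uses the `SKB_GSO_UDP` value quoted above; the two helper functions are hypothetical and exist only to contrast the variable types.

```c
#include <assert.h>
#include <stdbool.h>

#define SKB_GSO_UDP (1 << 16) /* value quoted in the commit message */

/* Buggy variant: the flags are first stored in a 16-bit variable. */
static bool is_udp_gso_short(unsigned int skb_gso_type)
{
	unsigned short gso_type = (unsigned short)skb_gso_type; /* bit 16 is lost */

	return (gso_type & SKB_GSO_UDP) != 0; /* always false */
}

/* Fixed variant: unsigned int is wide enough to keep bit 16. */
static bool is_udp_gso_int(unsigned int skb_gso_type)
{
	unsigned int gso_type = skb_gso_type;

	return (gso_type & SKB_GSO_UDP) != 0;
}
```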
      
      Addresses-Coverity-ID: 1462223
      Fixes: 0c19f846 ("net: accept UFO datagrams from tuntap and packet")
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 25 Nov, 2017 6 commits
  5. 24 Nov, 2017 10 commits
    • cfg80211: select CRYPTO_SHA256 if needed · 01a95b21
      Johannes Berg authored
      When regulatory database certificates are built-in, they're
      currently using the SHA256 digest algorithm, so add that to
      the build in that case.
      
      Also add a note that for custom certificates, one may need
      to add the right algorithms.
      Reported-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • forcedeth: replace pci_unmap_page with dma_unmap_page · ca43a0c7
      Zhu Yanjun authored
      The function pci_unmap_page is obsolete. So it is replaced with
      the function dma_unmap_page.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Joe Jin <joe.jin@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 5f109b94
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Fixes and improvements
      
      Here's a set of patches that fix and improve some stuff in the AF_RXRPC
      protocol:
      
      The patches are:
      
       (1) Unlock mutex returned by rxrpc_accept_call().
      
       (2) Don't set connection upgrade by default.
      
       (3) Differentiate the call->user_mutex used by the kernel from that used
           by userspace calling sendmsg() to avoid lockdep warnings.
      
       (4) Delay terminal ACK transmission to a work queue so that it can be
           replaced by the next call if there is one.
      
       (5) Split the call parameters from the connection parameters so that more
           call-specific parameters can be passed through.
      
       (6) Fix the call timeouts to work the same as for other RxRPC/AFS
           implementations.
      
       (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly
           so that they can be discarded or can represent more packets.
      
       (8) Use RTT to calculate certain protocol timeouts.
      
       (9) Add a timeout to detect lost ACK/DATA packets.
      
      (10) Add a keepalive function so that we ping the peer if we haven't
           transmitted for a short while, thereby keeping intervening firewall
           routes open.
      
      (11) Make service endpoints expire like they're supposed to so that the UDP
           port can be reused.
      
      (12) Fix connection expiry timers to make cleanup happen in a more timely
           fashion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix conn expiry timers · 3d18cbb7
      David Howells authored
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix service endpoint expiry · f859ab61
      David Howells authored
      Make RxRPC service endpoints expire like they're supposed to by the
      following means:
      
       (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
           global service conn timeout, otherwise the first rxrpc_net struct to
           die will cause connections on all others to expire immediately from
           then on.
      
       (2) Mark local service endpoints for which the socket has been closed
           (->service_closed) so that the expiration timeout can be much
           shortened for service and client connections going through that
           endpoint.
      
       (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
           count reaches 1, not 0, as idle conns have a 1 count.
      
       (4) The accumulator for the earliest time we might want to schedule for
           should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
           the comparison functions use signed arithmetic.
      
       (5) Simplify the expiration handling, adding the expiration value to the
           idle timestamp each time rather than keeping track of the time in the
           past before which the idle timestamp must go to be expired.  This is
           much easier to read.
      
       (6) Ignore the timeouts if the net namespace is dead.
      
       (7) Restart the service reaper work item rather than the client reaper.
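
Point (4) is easy to reproduce in userspace. The sketch below copies the kernel's `time_before()` comparison and `MAX_JIFFY_OFFSET` definition into plain C; `earliest_expiry()` is a hypothetical helper standing in for the reaper's accumulator loop. Because the comparison is done on the signed difference, an accumulator started at `ULONG_MAX` can look "earlier" than every real expiry time and so never gets updated.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace copy of the kernel definition from include/linux/jiffies.h. */
#define MAX_JIFFY_OFFSET ((LONG_MAX >> 1) - 1)

/*
 * Wraparound-safe "a is before b", as in the kernel: compare the signed
 * difference (relies on the usual two's complement conversion).
 */
static bool time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Find the earliest expiry, starting the accumulator at `init`. */
static unsigned long earliest_expiry(unsigned long init,
				     const unsigned long *expiries, size_t n)
{
	unsigned long earliest = init;

	assert(expiries != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (time_before(expiries[i], earliest))
			earliest = expiries[i];
	return earliest;
}
```

With the current time at 1000 and expiries at 1500 and 1200, starting from `ULONG_MAX` leaves the accumulator untouched, while starting from `1000 + MAX_JIFFY_OFFSET` yields 1200.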
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add keepalive for a call · 415f44e4
      David Howells authored
      We need to transmit a packet every so often to act as a keepalive for the
      peer (which has a timeout from the last time it received a packet) and also
      to prevent any intervening firewalls from closing the route.
      
      Do this by resetting a timer every time we transmit a packet.  If the timer
      ever expires, we transmit a PING ACK packet and thereby also elicit a PING
      RESPONSE ACK from the other side - which prevents our last-rx timeout from
      expiring.
      
      The timer is set to 1/6 of the last-rx timeout so that we can detect the
      other side going away if it misses 6 replies in a row.
      
      This is particularly necessary for servers where the processing of the
      service function may take a significant amount of time.
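
The timer logic described above can be sketched as follows. This is an illustration only: `struct call_timers` and the helpers are hypothetical, times are plain millisecond counters rather than jiffies, and counter wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call timer state, in milliseconds. */
struct call_timers {
	unsigned long last_rx_timeout_ms; /* peer is declared dead after this */
	unsigned long keepalive_at_ms;    /* when the next PING ACK is due */
};

/* The keepalive interval is 1/6 of the last-rx timeout. */
static unsigned long keepalive_interval_ms(const struct call_timers *c)
{
	return c->last_rx_timeout_ms / 6;
}

/* Every transmission pushes the keepalive deadline out again. */
static void note_transmit(struct call_timers *c, unsigned long now_ms)
{
	assert(c != NULL);
	c->keepalive_at_ms = now_ms + keepalive_interval_ms(c);
}

/* True if nothing was transmitted for a whole keepalive interval. */
static bool keepalive_due(const struct call_timers *c, unsigned long now_ms)
{
	return now_ms >= c->keepalive_at_ms;
}
```

With a 12-second last-rx timeout the interval is 2 seconds, so six consecutive pings have to go unanswered before the peer's silence reaches the full timeout.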
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add a timeout for detecting lost ACKs/lost DATA · bd1fdf8c
      David Howells authored
      Add an extra timeout that is set/updated when we send a DATA packet that
      has the request-ack flag set.  This allows us to detect if we don't get an
      ACK in response to the latest flagged packet.
      
      The ACK packet is adjudged to have been lost if it doesn't turn up within
      2*RTT of the transmission.
      
      If the timeout occurs, we schedule the sending of a PING ACK to find out
      the state of the other side.  If a new DATA packet is ready to go sooner,
      we cancel the sending of the ping and set the request-ack flag on that
      instead.
      
      If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
      we had at the time of the ping transmission, we adjudge all the DATA
      packets sent between the response tx_top and the ping-time tx_top to have
      been lost and retransmit immediately.
      
      Rather than sending a PING ACK, we could just pick a DATA packet and
      speculatively retransmit that with request-ack set.  It should result in
      either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu of
      the PING-RESPONSE ACK mentioned above.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Express protocol timeouts in terms of RTT · beb8e5e4
      David Howells authored
      Express protocol timeouts for data retransmission and deferred ack
      generation in terms of RTT rather than specified timeouts once we have
      sufficient RTT samples.
      
      For the moment, this requires just one RTT sample to be able to use this
      for ack deferral and two for data retransmission.
      
      The data retransmission timeout is set at RTT*1.5 and the ACK deferral
      timeout is set at RTT.
      
      Note that the calculated timeout is limited to a minimum of 4ns to make
      sure it doesn't happen too quickly.
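
The rules above can be written out as a small userspace sketch. The names are hypothetical, the fallback values stand for whatever fixed timeouts apply before enough samples exist, and the minimum is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RTT estimator state. */
struct rtt_state {
	uint64_t rtt_ns;      /* current RTT estimate */
	unsigned int samples; /* number of RTT samples collected */
};

static uint64_t clamp_min(uint64_t value, uint64_t min)
{
	return value < min ? min : value;
}

/* Data retransmission: RTT * 1.5 once two samples exist. */
static uint64_t resend_timeout_ns(const struct rtt_state *s,
				  uint64_t fallback_ns, uint64_t min_ns)
{
	assert(s != NULL);
	if (s->samples < 2)
		return fallback_ns;
	return clamp_min(s->rtt_ns + s->rtt_ns / 2, min_ns);
}

/* ACK deferral: RTT once one sample exists. */
static uint64_t ack_delay_ns(const struct rtt_state *s,
			     uint64_t fallback_ns, uint64_t min_ns)
{
	assert(s != NULL);
	if (s->samples < 1)
		return fallback_ns;
	return clamp_min(s->rtt_ns, min_ns);
}
```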
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Don't transmit DELAY ACKs immediately on proposal · 8637abaa
      David Howells authored
      Don't transmit a DELAY ACK immediately on proposal when the Rx window is
      rotated, but rather defer it to the work function.  This means that we have
      a chance to queue/consume more received packets before we actually send the
      DELAY ACK, or even cancel it entirely, thereby reducing the number of
      packets transmitted.
      
      We do, however, want to continue sending other types of packet immediately,
      particularly REQUESTED ACKs, as they may be used for RTT calculation by the
      other side.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix call timeouts · a158bdd3
      David Howells authored
      Fix the rxrpc call expiration timeouts and make them settable from
      userspace.  By analogy with other rx implementations, there should be three
      timeouts:
      
       (1) "Normal timeout"
      
           This is set for all calls and is triggered if we haven't received any
           packets from the peer in a while.  It is measured from the last time
           we received any packet on that call.  This is not reset by any
           connection packets (such as CHALLENGE/RESPONSE packets).
      
           If a service operation takes a long time, the server should generate
           PING ACKs at an interval that's substantially less than the normal
           timeout so as to keep both sides alive.  This is set at 1/6 of the
           normal timeout.
      
       (2) "Idle timeout"
      
           This is set only for a service call and is triggered if we stop
           receiving the DATA packets that comprise the request data.  It is
           measured from the last time we received a DATA packet.
      
       (3) "Hard timeout"
      
           This can be set for a call and specifies the maximum lifetime of that
           call.  It should not be specified by default, since some operations
           (such as volume transfer) take a long time.
      
      Allow userspace to set/change the timeouts on a call with sendmsg, using a
      control message:
      
      	RXRPC_SET_CALL_TIMEOUTS
      
      The data to the message is a number of 32-bit words, not all of which need
      be given:
      
      	u32 hard_timeout;	/* sec from first packet */
      	u32 idle_timeout;	/* msec from data Rx */
      	u32 normal_timeout;	/* msec from packet Rx */
      
      This can be set in combination with any other sendmsg() that affects a
      call.
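
As a rough illustration of the userspace side, the sketch below builds such a control message with the standard `CMSG_*` macros. Several things here are assumptions to verify against the kernel's rxrpc UAPI header: the `SOL_RXRPC` fallback value, and `EXAMPLE_SET_CALL_TIMEOUTS`, which is a placeholder for the real control message type constant. The helper `build_timeouts_cmsg()` is hypothetical; it only fills the buffer and does not call sendmsg().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values; take the real ones from the rxrpc UAPI header. */
#ifndef SOL_RXRPC
#define SOL_RXRPC 272
#endif
#define EXAMPLE_SET_CALL_TIMEOUTS 13 /* placeholder for the cmsg type */

/*
 * Fill `buf` with one control message carrying the first `n` (1 to 3)
 * timeouts in the order hard, idle, normal. `buf` must be suitably
 * aligned for struct cmsghdr. Returns the length to use as
 * msg_controllen, or 0 if `buf` is too small.
 */
static size_t build_timeouts_cmsg(void *buf, size_t buflen,
				  const uint32_t *timeouts, size_t n)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;
	size_t datalen = n * sizeof(uint32_t);

	assert(n >= 1 && n <= 3);
	if (buflen < CMSG_SPACE(datalen))
		return 0;

	memset(&msg, 0, sizeof(msg));
	memset(buf, 0, CMSG_SPACE(datalen));
	msg.msg_control = buf;
	msg.msg_controllen = CMSG_SPACE(datalen);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = EXAMPLE_SET_CALL_TIMEOUTS;
	cmsg->cmsg_len = CMSG_LEN(datalen);
	memcpy(CMSG_DATA(cmsg), timeouts, datalen);

	return msg.msg_controllen;
}
```

Passing fewer than three words sets only the leading timeouts, which matches the statement that not all of them need be given.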
      Signed-off-by: David Howells <dhowells@redhat.com>