1. 16 Nov, 2019 17 commits
    • bonding: symmetric ICMP transmit · df98be06
      Matteo Croce authored
      A bond with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
      to balance packets between slaves. On some network errors, we receive an ICMP
      error packet from the remote host or from a router. If it is sent by a router,
      its source IP can differ from the remote host's. Additionally, ICMP has no port
      numbers, so a layer3+4 bond computes a different hash than for the previous
      packets. Either condition can send the packet through a different interface
      than the other packets of the same flow:
      
          # tcpdump -qltnni veth0 |sed 's/^/0: /' &
          # tcpdump -qltnni veth1 |sed 's/^/1: /' &
          # hping3 -2 192.168.0.2 -p 9
          0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
      
      An ICMP error packet contains the header of the packet that caused the network
      error, so inspect that header and match the flow against it; this lets us send
      the ICMP via the same interface as the previous packets in the flow.
      Move the IP and port dissection code into a generic function, bond_flow_ip(),
      and, when dissecting an ICMP error packet, call it again with the adjusted offset.
      
          # hping3 -2 192.168.0.2 -p 9
          1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
          1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
          0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
          0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
          0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
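
      A minimal user-space sketch of the dissection logic described above, for
      illustration only: parse an IPv4 header and, if the packet is an ICMP error,
      run the same parser again at the offset of the quoted header, so the error is
      keyed (and hashed) like the packet that triggered it. struct flow_key,
      parse_ip(), dissect() and l34_hash() are hypothetical names, not the kernel's
      struct flow_keys or bond_flow_ip(); the hash only loosely imitates the
      layer3+4 policy, and IPv6/ICMPv6 and VLANs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key; not the kernel's struct flow_keys. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Parse the IPv4 header at buf + *off, plus TCP/UDP ports if present,
 * and advance *off past the IP header. Returns 0 on success, -1 if the
 * buffer is too short or not IPv4. */
static int parse_ip(const uint8_t *buf, size_t len, size_t *off,
		    struct flow_key *fk)
{
	const uint8_t *ip;
	size_t ihl;

	if (len < *off + 20)
		return -1;
	ip = buf + *off;
	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if ((ip[0] >> 4) != 4 || ihl < 20 || len < *off + ihl)
		return -1;
	fk->proto = ip[9];
	fk->saddr = rd32(ip + 12);
	fk->daddr = rd32(ip + 16);
	fk->sport = 0;
	fk->dport = 0;
	*off += ihl;
	/* The ports are the first 4 bytes of a TCP (6) or UDP (17) header. */
	if ((fk->proto == 6 || fk->proto == 17) && len >= *off + 4) {
		fk->sport = rd16(buf + *off);
		fk->dport = rd16(buf + *off + 2);
	}
	return 0;
}

/* ICMPv4 error types: destination unreachable, source quench, redirect,
 * time exceeded, parameter problem. */
static int is_icmp_error(uint8_t type)
{
	return type == 3 || type == 4 || type == 5 || type == 11 || type == 12;
}

/* Fill fk from an IPv4 packet; for ICMP errors, key on the quoted header. */
static int dissect(const uint8_t *buf, size_t len, struct flow_key *fk)
{
	size_t off = 0;

	if (parse_ip(buf, len, &off, fk))
		return -1;
	if (fk->proto == 1 && len >= off + 8 && is_icmp_error(buf[off])) {
		off += 8;	/* skip the 8-byte ICMP header */
		return parse_ip(buf, len, &off, fk);
	}
	return 0;
}

/* Simplified layer3+4 style hash: ports mixed with both addresses. */
static uint32_t l34_hash(const struct flow_key *fk)
{
	uint32_t h = (uint32_t)fk->sport << 16 | fk->dport;

	h ^= fk->saddr ^ fk->daddr;
	h ^= h >> 16;
	h ^= h >> 8;
	return h >> 1;
}
```

      With the UDP probe 192.168.0.1.2251 > 192.168.0.2.9 from the first capture
      and the matching "port 9 unreachable" error, dissect() yields the same key for
      both packets, so l34_hash() returns the same value for them.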
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: omit error check from of_get_phy_mode · 4214fa1e
      Horatiu Vultur authored
      Commit 0c65b2b9 ("net: of_get_phy_mode: Change API to solve
      int/unit warnings") changed the declaration of of_get_phy_mode().
      It now returns an error code, and if the node contains neither the
      'phy-mode' nor the 'phy-connection-type' property, it returns -EINVAL
      and sets the phy_interface_t to PHY_INTERFACE_MODE_NA.
      
      The Ocelot VSC7514 has 4 internal PHYs whose PHY interface is
      PHY_INTERFACE_MODE_NA. Because of_get_phy_mode() already assigns
      PHY_INTERFACE_MODE_NA to phy_mode on error, there is no need to add
      the error check.
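
      The argument relies on the callee writing a fallback value to its output
      parameter even when it fails. Below is a mock of that contract; struct
      mock_node, get_phy_mode() and enum mock_phy_mode are hypothetical stand-ins
      for the device-tree node, of_get_phy_mode() and phy_interface_t, not the
      real definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few phy_interface_t values. */
enum mock_phy_mode { MOCK_MODE_NA, MOCK_MODE_SGMII, MOCK_MODE_QSGMII };

/* Hypothetical device-tree node: either has a 'phy-mode' string or not. */
struct mock_node {
	const char *phy_mode;	/* NULL when the property is absent */
};

/* Same contract as described above: returns 0 or a negative error, and
 * always writes *mode, falling back to MOCK_MODE_NA on failure. */
static int get_phy_mode(const struct mock_node *np, enum mock_phy_mode *mode)
{
	*mode = MOCK_MODE_NA;
	if (!np->phy_mode)
		return -EINVAL;
	if (strcmp(np->phy_mode, "sgmii") == 0)
		*mode = MOCK_MODE_SGMII;
	else if (strcmp(np->phy_mode, "qsgmii") == 0)
		*mode = MOCK_MODE_QSGMII;
	else
		return -EINVAL;
	return 0;
}

/* Port setup: an internal PHY has no property, and MOCK_MODE_NA is the
 * wanted result in that case, so the return value is deliberately ignored. */
static enum mock_phy_mode port_phy_mode(const struct mock_node *np)
{
	enum mock_phy_mode mode;

	(void)get_phy_mode(np, &mode);
	return mode;
}
```

      Ignoring the return value is only safe because the output is written
      unconditionally; a caller that needed to reject a missing property would
      still have to check it.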
      
      Updates for v2:
       - drop the error check, because of_get_phy_mode already sets phy_interface
         to PHY_INTERFACE_MODE_NA in case of error.
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: allow fast GRO for skbs with Ethernet header in head · 8aef998d
      Alexander Lobakin authored
      Commit 78d3fd0b ("gro: Only use skb_gro_header for completely
      non-linear packets"), back in May 2009 (v2.6.31-rc1), changed the
      original condition '!skb_headlen(skb)' to
      'skb->mac_header == skb->tail' in gro_reset_offset(), saying: "Since
      the drivers that need this optimisation all provide completely
      non-linear packets" (note that this condition later became the current
      'skb_mac_header(skb) == skb_tail_pointer(skb)' with commit
      ced14f68 ("net: Correct comparisons and calculations using
      skb->tail and skb-transport_header"), without any functional changes).
      
      For v5.4-rc7, the rough number of users of each receive method is:
      1) napi_gro_frags: 14
      2) napi_gro_receive with skb->head containing (most of) payload: 83
      3) napi_gro_receive with skb->head containing all the headers: 20
      4) napi_gro_receive with skb->head containing only Ethernet header: 2
      
      With the current condition, fast GRO using NAPI_GRO_CB(skb)->frag0
      is available only in case [1].
      Packets pushed by [2] and [3] go through the 'slow' path, but that
      is not a problem for them, as they already contain all the needed
      headers in skb->head, so pskb_may_pull() only moves skb->data.
      
      The layout of skbs in case [4] at the moment of
      dev_gro_receive() is identical to that of skbs that came through [1],
      as napi_frags_skb() pulls the Ethernet header into skb->head. The only
      difference is that the condition above is always false for them,
      because skb_put() and friends irreversibly alter the tail pointer.
      They also go through the 'slow' path, but now every single
      pskb_may_pull() in every single .gro_receive() will call the *really*
      slow __pskb_pull_tail() to pull headers into the head. This significantly
      decreases overall performance for no apparent reason.
      
      The only two users of method [4] are:
      * drivers/staging/qlge
      * drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)
      
      Note that wireless drivers can't use [1] (napi_gro_frags()), at
      least for now, and the mac80211 stack always performs pushes and
      pulls anyway, so for them the performance hit is unavoidable.
      
      At the time of v2.6.31 the change was necessary (which is why I
      don't add a "Fixes:" tag), but it has been obsolete since
      skb_gro_mac_header() was removed in commit a50e233c ("net-gro:
      restore frag0 optimization"), so we can simply revert the condition
      in gro_reset_offset() to let skbs from [4] take the 'fast' path,
      just like in case [1].
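
      To make the two conditions concrete, the sketch below models only the
      offsets that matter when the condition in gro_reset_offset() is evaluated:
      the start of the MAC header, skb->data and skb->tail. struct skb_model and
      its helpers are hypothetical simplifications, not kernel code; the real
      check also looks at the skb's page fragments, which are not modelled, and
      the example layouts (including the 40-byte IPv4 + TCP header) are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14	/* Ethernet header length */

/* Hypothetical model of an skb's linear area: offsets from skb->head.
 * Paged fragments are not modelled. */
struct skb_model {
	size_t mac_header;	/* start of the Ethernet header */
	size_t data;		/* skb->data */
	size_t tail;		/* skb->tail */
};

/* Bytes in the linear area after data, as skb_headlen() reports. */
static size_t headlen(const struct skb_model *s)
{
	return s->tail - s->data;
}

/* Condition introduced in v2.6.31: MAC header starts exactly at tail. */
static bool fast_gro_old(const struct skb_model *s)
{
	return s->mac_header == s->tail;
}

/* Restored condition: the linear area holds nothing beyond data. */
static bool fast_gro_new(const struct skb_model *s)
{
	return headlen(s) == 0;
}

/* Assumed layouts at the time the condition is evaluated. */
/* [1] napi_gro_frags(): empty linear area. */
static const struct skb_model case_frags = { 0, 0, 0 };
/* [4] only the Ethernet header in head, already pulled. */
static const struct skb_model case_eth_only = { 0, ETH_HLEN, ETH_HLEN };
/* [3] all headers in head, e.g. 40 bytes of IPv4 + TCP (assumed). */
static const struct skb_model case_all_hdrs = { 0, ETH_HLEN, ETH_HLEN + 40 };
```

      Only case [4] changes its answer: the old condition rejects it because tail
      sits 14 bytes after the MAC header, while the restored one accepts it
      because no linear data is left after skb->data.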
      
      This was tested on a 600 MHz MIPS CPU with a custom driver. Compared
      to net-next, this patch gave method [4] boosts of up to 40 Mbps in
      both directions, bringing its overall performance relatively close
      to [1] (without it, [4] is the slowest).
      
      v2:
      - Add more references and explanations to commit message
      - Fix some typos ibid
      - No functional changes
      Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnx2x-Remove-function-casts' · f92e88db
      David S. Miller authored
      Kees Cook says:
      
      ====================
      bnx2x: Remove function casts
      
      In order to make the entire kernel usable under Clang's Control Flow
      Integrity protections, function prototype casts need to be avoided
      because they will trip CFI checks at runtime (i.e. a mismatch between
      the caller's expected function prototype and the destination function's
      prototype). Many of these cases can be found with -Wcast-function-type,
      which found that bnx2x had a bunch of needless (or at least confusing)
      function casts. This series removes them all.
      ====================
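
      To illustrate what removing such casts looks like, here is a hypothetical
      ops table (struct phy_ops and the callbacks are made-up names, not bnx2x's
      actual definitions). Instead of forcing a function with a different
      prototype into a slot with a cast, each callback is given exactly the slot's
      prototype, so indirect calls match the callee and -Wcast-function-type has
      nothing to report; an unused slot is simply NULL, which needs no cast.

```c
#include <assert.h>
#include <stddef.h>

struct phy;	/* forward declaration for the callback prototypes */

/* Hypothetical callback types, one per ops slot. */
typedef void (*hw_reset_t)(struct phy *phy);
typedef int (*read_status_t)(struct phy *phy);

struct phy_ops {
	hw_reset_t hw_reset;
	read_status_t read_status;
	void (*link_reset)(struct phy *phy);
};

struct phy {
	const struct phy_ops *ops;
	int resets;
	int link_up;
};

/* Previously this might have returned int and been stored with a
 * '(hw_reset_t)' cast; now its prototype matches the slot exactly. */
static void my_hw_reset(struct phy *phy)
{
	phy->resets++;
}

static int my_read_status(struct phy *phy)
{
	return phy->link_up;
}

static const struct phy_ops my_ops = {
	.hw_reset = my_hw_reset,	/* no cast needed */
	.read_status = my_read_status,
	.link_reset = NULL,		/* NULL converts implicitly */
};

/* Typical caller: optional slots are checked before use. */
static void phy_link_reset(struct phy *phy)
{
	if (phy->ops->link_reset)
		phy->ops->link_reset(phy);
}
```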
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Remove hw_reset_t function casts · 548e5ffe
      Kees Cook authored
      All .hw_reset callbacks except bnx2x_84833_hw_reset_phy() use a
      void return type. No callers of .hw_reset check a return value and
      bnx2x_84833_hw_reset_phy() unconditionally returns 0. Remove all
      hw_reset_t casts and fix the return type to void.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Remove format_fw_ver_t function casts · 26658f6b
      Kees Cook authored
      The return values for format_fw_ver_t callbacks are supposed to be
      "int", not "u8". Ultimately, the top-level caller doesn't actually check
      the return value at all, but just clean this all up anyway and fix the
      prototypes so that casts are no longer needed.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Remove config_init_t function casts · 3e19d1f2
      Kees Cook authored
      No callers of .config_init check return values. Remove the casting and
      change all callbacks to have the correct function prototype.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Remove read_status_t function casts · 2c855d73
      Kees Cook authored
      The function casts for .read_status callbacks end up casting some int
      return values to u8. This seems to be bug-prone (-EINVAL being returned
      into something that appears to be true/false), but fixing the function
      prototypes doesn't change the existing behavior. Fix the return values
      to remove the casts.
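
      Why the u8 return type is bug-prone can be shown in isolation: a negative
      errno narrowed to an unsigned 8-bit value becomes non-zero, so a caller that
      treats the result as true/false reads the error as 'true'. The helper names
      below are hypothetical, and since EINVAL's numeric value is platform-defined
      (22 on Linux), the sketch checks the property rather than a specific number.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status read that fails with a negative errno. */
static int read_status_int(void)
{
	return -EINVAL;
}

/* Caller-side view through a u8 return type: only the low 8 bits survive. */
static uint8_t read_status_u8(void)
{
	return (uint8_t)read_status_int();
}

/* A caller that treats the status as a boolean. */
static bool link_looks_up(uint8_t status)
{
	return status != 0;
}
```

      On Linux, -EINVAL (-22) becomes 234 after narrowing, which any boolean
      test treats as true.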
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Drop redundant callback function casts · 86c1fe88
      Kees Cook authored
      NULL is already "void *" so it will auto-cast in assignments and
      initializers. Additionally, all the callbacks for .link_reset,
      .config_loopback, .set_link_led, and .phy_specific_func are already
      correct. No casting is needed for these, so remove them.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: update TSN Qbv PSPEED set according to adjust link speed · 2e47cb41
      Po Liu authored
      ENETC has a register field, PSPEED, that indicates the hardware link
      speed. For Qbv scheduling purposes, the PSPEED field must be updated
      with the port speed; otherwise, if PSPEED and the PHY speed do not
      match, a frame occupying the MAC may fail to free its gate slot. So
      update PSPEED whenever the link is adjusted; this is implemented in
      the adjust_link callback.
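
      The update boils down to a read-modify-write of the speed field on every
      link change. In the sketch below the register is modelled as a plain 32-bit
      variable, and the PSPEED_* shift, mask and encodings, struct port and
      port_adjust_link() are placeholders, not the real ENETC layout; the actual
      values must come from the hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder field layout: NOT the real ENETC encoding. */
#define PSPEED_SHIFT	8
#define PSPEED_MASK	(0xfu << PSPEED_SHIFT)
#define PSPEED_10M	(0u << PSPEED_SHIFT)
#define PSPEED_100M	(1u << PSPEED_SHIFT)
#define PSPEED_1000M	(2u << PSPEED_SHIFT)
#define PSPEED_2500M	(4u << PSPEED_SHIFT)

struct port {
	uint32_t pmr;	/* stand-in for the memory-mapped port mode register */
	int speed;	/* last speed programmed, in Mbps */
};

static uint32_t pspeed_for(int speed_mbps)
{
	switch (speed_mbps) {
	case 2500: return PSPEED_2500M;
	case 1000: return PSPEED_1000M;
	case 100:  return PSPEED_100M;
	default:   return PSPEED_10M;	/* 10 Mbps and unknown speeds */
	}
}

/* Called from the adjust_link path: keep the speed field in sync with the
 * PHY while leaving all other bits of the register untouched. */
static void port_adjust_link(struct port *p, int speed_mbps)
{
	if (p->speed == speed_mbps)
		return;
	p->pmr = (p->pmr & ~PSPEED_MASK) | pspeed_for(speed_mbps);
	p->speed = speed_mbps;
}
```

      Skipping the write when the speed is unchanged is a choice of this sketch,
      not something the patch description specifies.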
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enetc: Configure the Time-Aware Scheduler via tc-taprio offload · 34c6adf1
      Po Liu authored
      ENETC supports time-based egress shaping in hardware according to
      IEEE 802.1Qbv. This patch enables Qbv through the hardware offload
      of the tc-taprio qdisc.
      Also move the cbdr writeback handling up a level, since the control
      BD ring may write data back to the control BD ring.
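
      For context, an 802.1Qbv schedule is a cyclic gate control list: each entry
      opens a set of traffic-class gates for a time interval, and tc-taprio hands
      the driver a base time, a cycle time and such entries when offloading. The
      sketch computes which gates are open at a given time, assuming the cycle
      time equals the sum of the intervals; struct gcl_entry and gates_open_at()
      are illustrative names, and this is not ENETC's hardware descriptor format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gcl_entry {
	uint8_t gate_mask;	/* bit n set: traffic class n may transmit */
	uint32_t interval_ns;	/* how long this entry stays active */
};

/* Returns the gate mask in effect at time t_ns for a schedule starting at
 * base_ns whose cycle is the sum of the entry intervals. */
static uint8_t gates_open_at(const struct gcl_entry *gcl, size_t n,
			     uint64_t base_ns, uint64_t t_ns)
{
	uint64_t cycle = 0, offset;
	size_t i;

	for (i = 0; i < n; i++)
		cycle += gcl[i].interval_ns;
	/* Simplification: report all gates closed before the schedule starts. */
	if (cycle == 0 || t_ns < base_ns)
		return 0;
	offset = (t_ns - base_ns) % cycle;
	for (i = 0; i < n; i++) {
		if (offset < gcl[i].interval_ns)
			return gcl[i].gate_mask;
		offset -= gcl[i].interval_ns;
	}
	return 0;	/* not reached: offset < cycle */
}
```

      For example, a list that opens only class 0 for 300 us and then classes 1-7
      for 700 us repeats every 1 ms from the base time.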
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: do not release pool until inflight == 0. · c3f812ce
      Jonathan Lemon authored
      The page pool keeps track of the number of pages in flight, and
      it isn't safe to remove the pool until all pages are returned.
      
      Disallow removing the pool until all pages are back, so the pool
      is always available for page producers.
      
      Make the page pool responsible for its own delayed destruction
      instead of relying on XDP, so the page pool can be used without
      the xdp memory model.
      
      When all pages are returned, free the pool and notify xdp if the
      pool is registered with the xdp memory system.  Have the callback
      perform a table walk since some drivers (cpsw) may share the pool
      among multiple xdp_rxq_info.
      
      Note that the increment of pages_state_release_cnt may result in
      inflight == 0, resulting in the pool being released.
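
      The inflight accounting can be sketched with two free-running counters:
      pages handed out (hold) and pages returned (release), with inflight being
      their difference computed modulo 2^32 so it survives wrap-around. Freeing is
      deferred until a destroy has been requested and inflight reaches zero, which
      can happen on the final release, as noted above. struct pool_model and its
      functions are hypothetical; the real page_pool also has to handle
      concurrency and periodic retries, which this single-threaded sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pool_model {
	uint32_t hold_cnt;	/* pages handed out (free-running) */
	uint32_t release_cnt;	/* pages returned (free-running) */
	bool destroy_requested;
	bool freed;
};

/* Difference modulo 2^32, so counter wrap-around is harmless. */
static int32_t pool_inflight(const struct pool_model *p)
{
	return (int32_t)(p->hold_cnt - p->release_cnt);
}

static void pool_try_free(struct pool_model *p)
{
	if (p->destroy_requested && !p->freed && pool_inflight(p) == 0)
		p->freed = true;	/* here: free memory, notify XDP, ... */
}

static void pool_alloc_page(struct pool_model *p)
{
	/* Assumption of this sketch: no new allocations after destroy. */
	assert(!p->destroy_requested);
	p->hold_cnt++;
}

/* Returning the last outstanding page after destroy frees the pool. */
static void pool_release_page(struct pool_model *p)
{
	p->release_cnt++;
	pool_try_free(p);
}

/* Request destruction; frees immediately only if nothing is inflight. */
static void pool_destroy(struct pool_model *p)
{
	p->destroy_requested = true;
	pool_try_free(p);
}
```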
      
      Fixes: d956a048 ("xdp: force mem allocator removal and periodic warning")
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'smc-last-part-of-termination-improvements' · 3af7ff93
      David S. Miller authored
      Karsten Graul says:
      
      ====================
      last part of termination improvements
      
      Patches 1 and 2 finish the set of termination patches, introducing
      a reboot handler that terminates all link groups. Patch 3 adds an
      rcu_barrier before the module is unloaded, and patch 4 is cleanup.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: remove unused constant · ab8536ca
      Ursula Braun authored
      The constant SMC_CLOSE_WAIT_LISTEN_CLCSOCK_TIME is still defined, but it
      has not been used since commit 3d502067 ("net/smc: simplify wait when
      closing listen socket"). Remove it.
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: use rcu_barrier() on module unload · 4ead9c96
      Ursula Braun authored
      Add rcu_barrier() to make sure no RCU readers or callbacks are
      pending when the module is unloaded.
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: guarantee removal of link groups in reboot · a33a803c
      Ursula Braun authored
      On reboot, it should be guaranteed that all link groups are cleaned
      up and freed.
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: introduce bookkeeping of SMCR link groups · 6dabd405
      Ursula Braun authored
      When the smc module is unloaded, return from the exit routine only
      once all link groups are freed.
      When an IB device is removed, return from device removal only once
      all link groups belonging to this device are freed.
      Introduce counters for the total number of SMCR link groups and for
      the number of SMCR links per IB device. smc module unloading
      continues only when the total number of SMCR link groups is zero. IB
      device removal continues only when the number of SMCR links for that
      IB device has dropped to zero.
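
      The wait-until-zero pattern can be sketched in user space with a counter
      guarded by a mutex and a condition variable: teardown blocks until the last
      link group is freed. This is an analogy only; kernel code would typically
      use atomic counters and wait queues instead, and struct lgr_counter and its
      functions are hypothetical names. The same shape applies to the per-device
      link counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct lgr_counter {
	pthread_mutex_t lock;
	pthread_cond_t zero;
	int count;	/* link groups currently alive */
};

static void lgr_get(struct lgr_counter *c)
{
	pthread_mutex_lock(&c->lock);
	c->count++;
	pthread_mutex_unlock(&c->lock);
}

static void lgr_put(struct lgr_counter *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->count > 0);
	if (--c->count == 0)
		pthread_cond_broadcast(&c->zero);
	pthread_mutex_unlock(&c->lock);
}

/* Module exit / device removal: return only when every group is gone. */
static void lgr_wait_all_freed(struct lgr_counter *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->count != 0)
		pthread_cond_wait(&c->zero, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Demo helper: drops two references from another thread. */
static void *free_two(void *arg)
{
	lgr_put(arg);
	lgr_put(arg);
	return NULL;
}
```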
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Nov, 2019 23 commits