25 Apr, 2018: 5 commits
24 Apr, 2018: 35 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 24cac700
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix rtnl deadlock in ipvs, from Julian Anastasov.
      
       2) s390 qeth fixes from Julian Wiedmann (control IO completion stalls,
          bad MAC address update sequence, request side races on command IO
          timeouts).
      
       3) Handle seq_file overflow properly in l2tp, from Guillaume Nault.
      
       4) Fix VLAN priority mappings in cpsw driver, from Ivan Khoronzhuk.
      
       5) Packet scheduler ife action fixes (malformed TLV lengths, etc.) from
          Alexander Aring.
      
       6) Fix out of bounds access in tcp md5 option parser, from Jann Horn.
      
       7) Missing netlink attribute policies in rtm_ipv6_policy table, from
          Eric Dumazet.
      
       8) Missing socket address length checks in l2tp and pppoe connect, from
          Guillaume Nault.
      
       9) Fix netconsole over team and bonding, from Xin Long.
      
      10) Fix race with AF_PACKET socket state bitfields, from Willem de
          Bruijn.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
        ice: Fix insufficient memory issue in ice_aq_manage_mac_read
        sfc: ARFS filter IDs
        net: ethtool: Add missing kernel doc for FEC parameters
        packet: fix bitfield update race
        ice: Do not check INTEVENT bit for OICR interrupts
        ice: Fix incorrect comment for action type
        ice: Fix initialization for num_nodes_added
        igb: Fix the transmission mode of queue 0 for Qav mode
        ixgbevf: ensure xdp_ring resources are free'd on error exit
        team: fix netconsole setup over team
        amd-xgbe: Only use the SFP supported transceiver signals
        amd-xgbe: Improve KR auto-negotiation and training
        amd-xgbe: Add pre/post auto-negotiation phy hooks
        pppoe: check sockaddr length in pppoe_connect()
        l2tp: check sockaddr length in pppol2tp_connect()
        net: phy: marvell: clear wol event before setting it
        ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
        bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
        tcp: don't read out-of-bounds opsize
        ibmvnic: Clean actual number of RX or TX pools
        ...
    • liquidio: Swap VF representor Tx and Rx statistics · 16f4faa4
      Srinivas Jampala authored
      Swap the VF representor's Tx and Rx interface statistics: the
      representor is a virtual switchdev port, so traffic transmitted by the
      VM is received by the VF representor, and vice versa.
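A minimal sketch of the swap, assuming a simple four-counter stats structure. struct if_stats and vf_rep_fill_stats() are hypothetical names, not liquidio's actual types or functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface counters (not liquidio's structures). */
struct if_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/*
 * Fill the representor's stats from the VF's counters. The representor
 * is the switch-side end of the link: what the VM transmits, the
 * representor receives, and vice versa.
 */
void vf_rep_fill_stats(const struct if_stats *vf, struct if_stats *rep)
{
	assert(vf != NULL && rep != NULL);
	rep->rx_packets = vf->tx_packets;
	rep->rx_bytes   = vf->tx_bytes;
	rep->tx_packets = vf->rx_packets;
	rep->tx_bytes   = vf->rx_bytes;
}
```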
      Signed-off-by: Srinivas Jampala <srinivasa.jampala@cavium.com>
      Acked-by: Derek Chickles <derek.chickles@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: fix LOCKDEP issue in rt6_remove_exception_rt() · 091311de
      Eric Dumazet authored
      rt6_remove_exception_rt() is called under rcu_read_lock() only.
      
      We lock rt6_exception_lock a bit later, so we do not hold
      rt6_exception_lock yet.
      
      Fixes: 8a14e46f ("net/ipv6: Fix missing rcu dereferences on from")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · d19efb72
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-04-24
      
      This series contains fixes to ixgbevf, igb and ice drivers.
      
      Colin Ian King fixes the return value on error for the new XDP support
      that went into ixgbevf for 4.17.
      
      Vinicius provides a fix for queue 0 for igb, which was not receiving all
      the credits it needed when QAV mode was enabled.
      
      Anirudh provides several fixes for the new ice driver, starting with
      properly initializing num_nodes_added to zero. He also fixes a code
      comment to better reflect what the code actually does, and switches
      OICR interrupt detection to a more reliable method.

      Md Fahad fixes the ice driver to allocate the right amount of memory
      when reading and storing the device's MAC addresses. The device can have
      up to 2 MAC addresses (LAN and WoL); while WoL is not currently
      supported, we need to ensure it can be handled properly when support is
      added.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: remove redundant second null check on sgout · 95ad7544
      Colin Ian King authored
      A duplicated null check on sgout is redundant as it is known to be
      already true because of the identical earlier check. Remove it.
      Detected by cppcheck:
      
      net/tls/tls_sw.c:696: (warning) Identical inner 'if' condition is always
      true.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fsl/fman_port: remove redundant check on port->rev_info.major · 080aadda
      Colin Ian King authored
      The check port->rev_info.major >= 6 is being performed twice, thus
      the inner second check is always true and is redundant, hence it
      can be removed. Detected by cppcheck.
      
      [drivers/net/ethernet/freescale/fman/fman_port.c:1394]: (warning)
      Identical inner 'if' condition is always true.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: Fix insufficient memory issue in ice_aq_manage_mac_read · d6fef10c
      Md Fahad Iqbal Polash authored
      For the MAC read operation, the device can return up to two (LAN and WoL)
      MAC addresses. Without access to adequate memory, the device will return
      an error. Fix this by allocating the right amount of memory. Also add
      logic to detect the LAN MAC address and copy it into the port_info
      structure. Note that the WoL MAC address is currently ignored, as the WoL
      feature isn't supported yet.
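The fix can be sketched in userspace C as follows. The entry layout, the MAC_TYPE_* values, fake_fw_read() and read_lan_mac() are all hypothetical stand-ins, not the ice admin queue format or driver code; the point is only that the buffer is sized for both possible addresses and that the LAN entry is picked out by type while the WoL entry is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_MAC_ENTRIES 2 /* LAN + WoL: room for everything the device may return */

/* Hypothetical address types and entry layout. */
enum mac_type { MAC_TYPE_LAN = 0, MAC_TYPE_WOL = 1 };

struct mac_entry {
	uint8_t type; /* enum mac_type */
	uint8_t addr[ETH_ALEN];
};

/* Stand-in for the device: fails if the buffer cannot hold every entry. */
static int fake_fw_read(struct mac_entry *buf, size_t n_entries, size_t *n_out)
{
	static const struct mac_entry fw[MAX_MAC_ENTRIES] = {
		{ MAC_TYPE_WOL, { 0x02, 0, 0, 0, 0, 0x02 } },
		{ MAC_TYPE_LAN, { 0x02, 0, 0, 0, 0, 0x01 } },
	};

	if (n_entries < MAX_MAC_ENTRIES)
		return -1; /* insufficient memory, as in the original bug */
	memcpy(buf, fw, sizeof(fw));
	*n_out = MAX_MAC_ENTRIES;
	return 0;
}

/* Read all addresses, copy the LAN one into lan_addr, ignore WoL. */
int read_lan_mac(uint8_t lan_addr[ETH_ALEN])
{
	struct mac_entry *buf;
	size_t i, n = 0;
	int ret = -1;

	assert(lan_addr != NULL);
	buf = calloc(MAX_MAC_ENTRIES, sizeof(*buf));
	if (!buf)
		return -1;
	if (fake_fw_read(buf, MAX_MAC_ENTRIES, &n) == 0) {
		for (i = 0; i < n; i++) {
			if (buf[i].type == MAC_TYPE_LAN) {
				memcpy(lan_addr, buf[i].addr, ETH_ALEN);
				ret = 0;
				break;
			}
		}
	}
	free(buf);
	return ret;
}
```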
      
      Fixes: dc49c772 ("ice: Get MAC/PHY/link info and scheduler topology")
      Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • qed: Fix copying 2 strings · c7d852e3
      Denis Bolotin authored
      The strscpy() call was introduced by a recent fix ("net: qed: use correct
      strncpy() size") to avoid passing the length of the source buffer to
      strncpy() and to guarantee null termination.
      However, it misses the goal of overwriting only the first 3 characters of
      "???_BIG_RAM" and "???_RAM" while keeping the rest of the string, because
      strscpy() always null-terminates the destination.
      Use strncpy() with a length of 3, without null termination.
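A small userspace illustration of the intended copy; set_mem_prefix() and the "BRB"/"BTB" prefixes are made up for the example. strncpy() with a count of 3 copies exactly three bytes from a prefix of at least three characters and writes no terminator, so the suffix survives:

```c
#include <assert.h>
#include <string.h>

/*
 * Replace the "???" placeholder at the start of name with a 3-character
 * prefix, leaving the rest of name untouched. strncpy() with a count of
 * 3 copies exactly 3 bytes here and does not add a terminating NUL.
 */
void set_mem_prefix(char *name, const char *prefix)
{
	assert(strlen(name) >= 3 && strlen(prefix) >= 3);
	strncpy(name, prefix, 3);
}
```

By contrast, a copy that always NUL-terminates, given a destination size of 4, would turn "???_BIG_RAM" into "BRB" and lose the suffix.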
      Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: ARFS filter IDs · f8d62037
      Edward Cree authored
      Associate an arbitrary ID with each ARFS filter, so that filters can be
      properly queried for expiry. The association is maintained in a hash
      table, which is protected by a spinlock.
      
      v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
      v2: fixed uninitialised variable (thanks davem and lkp-robot).
      
      Fixes: 3af0f342 ("sfc: replace asynchronous filter operations")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipconfig-NTP-server-support-bug-fixes-documentation-improvements' · bc0fbc66
      David S. Miller authored
      Chris Novakovic says:
      
      ====================
      ipconfig: NTP server support, bug fixes, documentation improvements
      
      This series (against net-next) makes various improvements to ipconfig:
      
       - Patch #1 correctly documents the behaviour of parameter 4 in the
         "ip=" and "nfsaddrs=" command line parameters.
       - Patch #2 tidies up the printk()s for reporting configured name
         servers.
       - Patch #3 fixes a bug in autoconfiguration via BOOTP whereby the IP
         addresses of IEN-116 name servers are requested from the BOOTP
         server, rather than those of DNS name servers.
       - Patch #4 requests the number of DNS servers specified by
         CONF_NAMESERVERS_MAX when autoconfiguring via BOOTP, rather than
         hardcoding it to 2.
       - Patch #5 fully documents the contents and format of /proc/net/pnp in
         Documentation/filesystems/nfs/nfsroot.txt.
       - Patch #6 fixes a bug whereby bogus information is written to
         /proc/net/pnp when ipconfig is not used.
       - Patch #7 creates a new procfs directory for ipconfig-related
         configuration reports at /proc/net/ipconfig.
       - Patch #8 allows for NTP servers to be configured (manually on the
         kernel command line or automatically via DHCP), enabling systems with
         an NFS root filesystem to synchronise their clock before mounting
         their root filesystem. NTP server IP addresses are written to
         /proc/net/ipconfig/ntp_servers.
      
      Changes from v1:
      
       - David requested that a new directory /proc/net/ipconfig be created to
         contain ipconfig-related configuration reports, which is implemented
         in the new patch #7. NTP server IPs are now written to this directory
         instead of /proc/net/ntp in the new patch #8.
       - Cong and David both requested that the modification to CREDITS be
         dropped. This patch has been removed from the series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: Write NTP server IPs to /proc/net/ipconfig/ntp_servers · c04d2cb2
      Chris Novakovic authored
      Distributed filesystems are most effective when the server and client
      clocks are synchronised. Embedded devices often use NFS for their
      root filesystem but typically do not contain an RTC, so the clocks of
      the NFS server and the embedded device will be out-of-sync when the root
      filesystem is mounted (and may not be synchronised until late in the
      boot process).
      
      Extend ipconfig with the ability to export IP addresses of NTP servers
      it discovers to /proc/net/ipconfig/ntp_servers. They can be supplied as
      follows:
      
       - If ipconfig is configured manually via the "ip=" or "nfsaddrs="
         kernel command line parameters, one NTP server can be specified in
         the new "<ntp0-ip>" parameter.
       - If ipconfig is autoconfigured via DHCP, request DHCP option 42 in
         the DHCPDISCOVER message, and record the IP addresses of up to three
         NTP servers sent by the responding DHCP server in the subsequent
         DHCPOFFER message.
      
      ipconfig will only write the NTP server IP addresses it discovers to
      /proc/net/ipconfig/ntp_servers, one per line (in the order received from
      the DHCP server, if DHCP autoconfiguration is used); making use of these
      NTP servers is the responsibility of a user space process (e.g. an
      initrd/initram script that invokes an NTP client before mounting an NFS
      root filesystem).
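As a rough sketch of the output format described above (one address per line, in the order received), assuming up to three IPv4 addresses held in host byte order and an all-ones value for unused slots. format_ntp_servers() is a hypothetical helper, not the ipconfig implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_NTP_SERVERS 3
#define ADDR_NONE 0xffffffffu /* unused slot (assumed sentinel) */

/*
 * Write the known NTP servers into buf, one dotted-quad per line, in
 * slot order, skipping unused slots. Returns the number of bytes
 * written, excluding the terminating NUL; stops at the last complete
 * line if buf is too small.
 */
size_t format_ntp_servers(const uint32_t servers[MAX_NTP_SERVERS],
			  char *buf, size_t len)
{
	size_t off = 0;
	int i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < MAX_NTP_SERVERS; i++) {
		uint32_t a = servers[i];
		int n;

		if (a == ADDR_NONE)
			continue;
		n = snprintf(buf + off, len - off, "%u.%u.%u.%u\n",
			     (unsigned int)(a >> 24),
			     (unsigned int)((a >> 16) & 0xff),
			     (unsigned int)((a >> 8) & 0xff),
			     (unsigned int)(a & 0xff));
		if (n < 0 || (size_t)n >= len - off) {
			buf[off] = '\0'; /* drop the truncated partial line */
			break;
		}
		off += (size_t)n;
	}
	return off;
}
```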
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: Create /proc/net/ipconfig directory · 4d019b3f
      Chris Novakovic authored
      To allow ipconfig to report IP configuration details to user space
      processes without cluttering /proc/net, create a new subdirectory
      /proc/net/ipconfig. All files containing IP configuration details should
      be written to this directory.
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: Correctly initialise ic_nameservers · 300eec7c
      Chris Novakovic authored
      ic_nameservers, which stores the list of name servers discovered by
      ipconfig, is initialised (i.e. has all of its elements set to NONE, or
      0xffffffff) by ic_nameservers_predef() in the following scenarios:
      
       - before the "ip=" and "nfsaddrs=" kernel command line parameters are
         parsed (in ip_auto_config_setup());
       - before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in
         order to clear any values that may have been set after parsing "ip="
         or "nfsaddrs=" and are no longer needed.
      
      This means that ic_nameservers_predef() is not called when neither "ip="
      nor "nfsaddrs=" is specified on the kernel command line. In this
      scenario, every element in ic_nameservers remains set to 0x00000000,
      which is indistinguishable from ANY and causes pnp_seq_show() to write
      the following (bogus) information to /proc/net/pnp:
      
        #MANUAL
        nameserver 0.0.0.0
        nameserver 0.0.0.0
        nameserver 0.0.0.0
      
      This is potentially problematic for systems that blindly link
      /etc/resolv.conf to /proc/net/pnp.
      
      Ensure that ic_nameservers is also initialised when neither "ip=" nor
      "nfsaddrs=" is specified by calling ic_nameservers_predef() in
      ip_auto_config(), but only when ip_auto_config_setup() was not called
      earlier. This causes the following to be written to /proc/net/pnp, and
      is consistent with what gets written when ipconfig is configured
      manually but no name servers are specified on the kernel command line:
      
        #MANUAL
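The difference between zero-initialised storage and predefined NONE entries can be sketched as below. nameservers_predef() and pnp_show() are hypothetical simplifications of ic_nameservers_predef() and pnp_seq_show(); the sketch always prints the #MANUAL header and assumes host-order addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CONF_NAMESERVERS_MAX 3
#define ADDR_NONE 0xffffffffu /* "not set"; 0 would read as ANY (0.0.0.0) */

/* Zero-initialised, like a static array that was never predefined. */
static uint32_t nameservers[CONF_NAMESERVERS_MAX];

/* Mark every slot as unset, in the spirit of ic_nameservers_predef(). */
void nameservers_predef(void)
{
	int i;

	for (i = 0; i < CONF_NAMESERVERS_MAX; i++)
		nameservers[i] = ADDR_NONE;
}

/* Write a /proc/net/pnp-style report, skipping slots that are unset. */
void pnp_show(char *buf, size_t len)
{
	size_t off;
	int i;

	assert(buf != NULL && len > 0);
	off = (size_t)snprintf(buf, len, "#MANUAL\n");
	for (i = 0; i < CONF_NAMESERVERS_MAX && off < len; i++) {
		uint32_t a = nameservers[i];

		if (a == ADDR_NONE)
			continue;
		off += (size_t)snprintf(buf + off, len - off,
					"nameserver %u.%u.%u.%u\n",
					(unsigned int)(a >> 24),
					(unsigned int)((a >> 16) & 0xff),
					(unsigned int)((a >> 8) & 0xff),
					(unsigned int)(a & 0xff));
	}
}
```

Calling pnp_show() before nameservers_predef() reproduces the bogus "nameserver 0.0.0.0" lines; calling it afterwards yields only the header.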
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: Document /proc/net/pnp · 8b0b37c5
      Chris Novakovic authored
      Fully document the format used by the /proc/net/pnp file written by
      ipconfig, explain where its values originate from, and clarify that the
      tertiary name server IP and DNS domain name are only written to the file
      when autoconfiguration is used.
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: BOOTP: Request CONF_NAMESERVERS_MAX name servers · de1fa15b
      Chris Novakovic authored
      When ipconfig is autoconfigured via BOOTP, the request packet
      initialised by ic_bootp_init_ext() always allocates 8 bytes for the name
      server option, limiting the BOOTP server to responding with at most 2
      name servers even though ipconfig in fact supports an arbitrary number
      of name servers (as defined by CONF_NAMESERVERS_MAX, which is currently
      3).
      
      Only request name servers in the request packet if CONF_NAMESERVERS_MAX
      is positive (to comply with [1, §3.8]), and allocate enough space in the
      packet for CONF_NAMESERVERS_MAX name servers to indicate the maximum
      number we can accept in response.
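A minimal sketch of the request option, assuming CONF_NAMESERVERS_MAX is 3 and using tag 6 (Domain Name Server, RFC 2132 section 3.8). bootp_add_dns_request() is a hypothetical helper, not ic_bootp_init_ext() itself. The option length advertises room for CONF_NAMESERVERS_MAX 4-byte addresses, and the option is skipped when that count is not positive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONF_NAMESERVERS_MAX 3
#define BOOTP_TAG_DNS 6 /* RFC 2132 section 3.8, "Domain Name Server" */

/*
 * Append a Domain Name Server request to a BOOTP vendor-extension area
 * at e and return the new write position. The payload is zeroed space
 * for the server to fill in.
 */
uint8_t *bootp_add_dns_request(uint8_t *e)
{
	assert(e != NULL);
	if (CONF_NAMESERVERS_MAX > 0) { /* RFC 2132: minimum length is 4 */
		*e++ = BOOTP_TAG_DNS;
		*e++ = 4 * CONF_NAMESERVERS_MAX;
		memset(e, 0, 4 * CONF_NAMESERVERS_MAX);
		e += 4 * CONF_NAMESERVERS_MAX;
	}
	return e;
}
```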
      
      [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions":
          https://tools.ietf.org/rfc/rfc2132.txt
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: BOOTP: Don't request IEN-116 name servers · 4e1a8af2
      Chris Novakovic authored
      When ipconfig is autoconfigured via BOOTP, the request packet
      initialised by ic_bootp_init_ext() allocates 8 bytes for tag 5 ("Name
      Server" [1, §3.7]), but tag 5 in the response isn't processed by
      ic_do_bootp_ext(). Instead, allocate the 8 bytes to tag 6 ("Domain Name
      Server" [1, §3.8]), which is processed by ic_do_bootp_ext(), and appears
      to have been the intended tag to request.
      
      This won't cause any breakage for existing users, as tag 5 responses
      provided by BOOTP servers weren't being processed anyway.
      
      [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions":
          https://tools.ietf.org/rfc/rfc2132.txt
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: Tidy up reporting of name servers · e18bdc83
      Chris Novakovic authored
      Commit 5e953778 ("ipconfig: add nameserver IPs to kernel-parameter ip=")
      adds the IP addresses of discovered name servers to the summary printed
      by ipconfig when configuration is complete. It appears the intention in
      ip_auto_config() was to print the name servers on a new line (especially
      given the spacing and lack of comma before "nameserver0="), but they're
      actually printed on the same line as the NFS root filesystem
      configuration summary:
      
        [    0.686186] IP-Config: Complete:
        [    0.686226]      device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1
        [    0.686328]      host=test, domain=example.com, nis-domain=(none)
        [    0.686386]      bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=     nameserver0=10.0.0.1
      
      This makes it harder to read and parse ipconfig's output. Instead, print
      the name servers on a separate line:
      
        [    0.791250] IP-Config: Complete:
        [    0.791289]      device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1
        [    0.791407]      host=test, domain=example.com, nis-domain=(none)
        [    0.791475]      bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=
        [    0.791476]      nameserver0=10.0.0.1
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipconfig: Document setting of NIS domain name · 660de409
      Chris Novakovic authored
      ic_do_bootp_ext() is responsible for parsing the "ip=" and "nfsaddrs="
      kernel parameters. If a "." character is found in parameter 4 (the
      client's hostname), everything before the first "." is used as the
      hostname, and everything after it is used as the NIS domain name (but
      not necessarily the DNS domain name).
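The splitting rule can be sketched as follows. split_host_param() and its buffer-size arguments are hypothetical, and the sketch clears the NIS domain when no "." is present, which is an assumption rather than documented ipconfig behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split parameter 4 at the first '.': everything before it becomes the
 * hostname, everything after it the NIS domain name. Both outputs are
 * always NUL-terminated and truncated if they do not fit.
 */
void split_host_param(const char *param,
		      char *host, size_t host_len,
		      char *nis_domain, size_t nis_len)
{
	const char *dot = strchr(param, '.');
	size_t n = dot ? (size_t)(dot - param) : strlen(param);

	assert(host_len > 0 && nis_len > 0);
	if (n >= host_len)
		n = host_len - 1;
	memcpy(host, param, n);
	host[n] = '\0';

	nis_domain[0] = '\0';
	if (dot) {
		strncpy(nis_domain, dot + 1, nis_len - 1);
		nis_domain[nis_len - 1] = '\0';
	}
}
```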
      
      Document this behaviour in Documentation/filesystems/nfs/nfsroot.txt,
      as it is not made explicit.
      Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: Add missing kernel doc for FEC parameters · d805c520
      Florian Fainelli authored
      While adding support for ethtool::get_fecparam and set_fecparam, the
      kernel-doc for these functions was missed; add it.
      
      Fixes: 1a5f3da2 ("net: ethtool: add support for forward error correction modes")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rhash-cleanups' · 5cb5ce33
      David S. Miller authored
      NeilBrown says:
      
      ====================
      A few rhashtables cleanups
      
      2 patches fix documentation
      1 fixes a bug in rhashtable_walk_start()
      1 improves rhashtable_walk stability.
      All reviewed and Acked.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: improve rhashtable_walk stability when stop/start used. · 5d240a89
      NeilBrown authored
      When a walk of an rhashtable is interrupted with rhashtable_walk_stop()
      and then rhashtable_walk_start(), the location to restart from is based
      on a 'skip' count in the current hash chain, and this can be incorrect
      if insertions or deletions have happened. This does not happen when
      the walk is not stopped and started, as iter->p is a placeholder which
      is safe to use while holding the RCU read lock.
      
      In rhashtable_walk_start() we can revalidate that 'p' is still in the
      same hash chain.  If it isn't then the current method is still used.
      
      With this patch, if a rhashtable walker ensures that the current
      object remains in the table over a stop/start period (possibly by
      elevating the reference count if that is sufficient), it can be sure
      that a walk will not miss objects that were in the hashtable for the
      whole time of the walk.
      
      rhashtable_walk_start() may not find the object even though it is
      still in the hashtable if a rehash has moved it to a new table.  In
      this case it will (eventually) get -EAGAIN and will need to proceed
      through the whole table again to be sure to see everything at least
      once.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: reset iter when rhashtable_walk_start sees new table · b41cc04b
      NeilBrown authored
      The documentation claims that when rhashtable_walk_start_check()
      detects a resize event, it will rewind back to the beginning
      of the table.  This is not true.  We need to set ->slot and
      ->skip to be zero for it to be true.
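A heavily simplified sketch of the rewind, assuming the walker records which table it was positioned in. struct walk_iter and walk_start_check() are hypothetical and omit the locking and RCU handling of the real rhashtable_walk_start_check(); returning -EAGAIN mirrors the resize result mentioned for rhashtable_walk_start() above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct table {
	int id;
};

/* Hypothetical walker state (not struct rhashtable_iter). */
struct walk_iter {
	const struct table *walker_tbl; /* table the walk was positioned in */
	unsigned int slot;              /* bucket to resume from */
	unsigned int skip;              /* entries to skip in that bucket */
};

/*
 * Restart a stopped walk. If the table was replaced by a resize, rewind
 * to the very beginning by zeroing slot and skip, and return -EAGAIN so
 * the caller knows it may see some objects twice.
 */
int walk_start_check(struct walk_iter *iter, const struct table *cur)
{
	assert(iter != NULL && cur != NULL);
	if (iter->walker_tbl != cur) {
		iter->walker_tbl = cur;
		iter->slot = 0;
		iter->skip = 0;
		return -EAGAIN;
	}
	return 0;
}
```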
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() · 82266e98
      NeilBrown authored
      Neither rhashtable_walk_enter() nor rhltable_walk_enter() sleeps, though
      they do take a spinlock without irq protection.
      So revise the comments to accurately state the contexts in which
      these functions can be called.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: remove outdated comments about grow_decision etc · 0c6f69a5
      NeilBrown authored
      grow_decision and shrink_decision no longer exist, so remove
      the remaining references to them.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: only call tp->af_specific->md5_lookup() for md5 sockets · 8c2320e8
      Eric Dumazet authored
      RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive,
      even though they return no result for sockets without md5 keys.
      Omit the calls for such sockets.
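The idea can be sketched with a simplified socket structure. struct sock_sketch, its md5sig_info member, md5_key_for() and the counting demo lookup are hypothetical; the NULL test stands in for checking whether any MD5 keys are configured before making the indirect call.

```c
#include <assert.h>
#include <stddef.h>

struct sock_sketch;

/* Stand-in for tp->af_specific; md5_lookup is the costly indirect call. */
struct md5_ops {
	const void *(*md5_lookup)(const struct sock_sketch *sk);
};

/* Hypothetical, heavily simplified TCP socket. */
struct sock_sketch {
	const struct md5_ops *af_specific;
	const void *md5sig_info; /* NULL when no MD5 keys are configured */
};

/*
 * Only pay for the indirect call when the socket has MD5 keys; without
 * keys the lookup could only return NULL anyway.
 */
const void *md5_key_for(const struct sock_sketch *sk)
{
	assert(sk != NULL);
	if (sk->md5sig_info == NULL)
		return NULL;
	return sk->af_specific->md5_lookup(sk);
}

/* Demo lookup that counts how often it is called. */
static int lookup_calls;
static const int dummy_key = 42;

static const void *demo_lookup(const struct sock_sketch *sk)
{
	(void)sk;
	lookup_calls++;
	return &dummy_key;
}

static const struct md5_ops demo_ops = { demo_lookup };
```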
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: fix bitfield update race · a6361f0c
      Willem de Bruijn authored
      Updates to the bitfields in struct packet_sock are not atomic.
      Serialize these read-modify-write cycles.
      
      Move po->running into a separate variable. Its writes are protected by
      po->bind_lock (except for one startup case at packet_create). Also
      replace a textual precondition warning with lockdep annotation.
      
      All others are set only in packet_setsockopt. Serialize these
      updates by holding the socket lock. Analogous to other field updates,
      also hold the lock when testing whether a ring is active (pg_vec).
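A userspace sketch of the race and its fix, assuming pthreads. Several 1-bit flags share one storage word, so each update is a read-modify-write of the whole word; serializing the setters under one mutex (standing in for the socket lock) keeps concurrent updates to different flags from overwriting each other. struct psock and its helpers are hypothetical, not struct packet_sock, and the separate handling of po->running is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in: the flags share one storage word. */
struct psock {
	pthread_mutex_t lock; /* plays the role of the socket lock */
	unsigned int auxdata : 1;
	unsigned int origdev : 1;
	unsigned int has_vnet_hdr : 1;
};

enum psock_flag { PSOCK_AUXDATA, PSOCK_ORIGDEV, PSOCK_VNET_HDR };

/* Every bitfield update rewrites the shared word, so serialize them. */
void psock_set_flag(struct psock *po, enum psock_flag f, unsigned int val)
{
	assert(po != NULL);
	pthread_mutex_lock(&po->lock);
	switch (f) {
	case PSOCK_AUXDATA:
		po->auxdata = !!val;
		break;
	case PSOCK_ORIGDEV:
		po->origdev = !!val;
		break;
	case PSOCK_VNET_HDR:
		po->has_vnet_hdr = !!val;
		break;
	}
	pthread_mutex_unlock(&po->lock);
}

struct toggle_arg {
	struct psock *po;
	enum psock_flag flag;
};

/* Toggle one flag many times, then leave it set. */
static void *toggle_thread(void *p)
{
	struct toggle_arg *a = p;
	int i;

	for (i = 0; i < 100000; i++)
		psock_set_flag(a->po, a->flag, (unsigned int)(i & 1));
	psock_set_flag(a->po, a->flag, 1);
	return NULL;
}

/* Two threads update different flags; returns 1 if both end up set. */
int psock_demo(void)
{
	struct psock po = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct toggle_arg a = { &po, PSOCK_AUXDATA };
	struct toggle_arg b = { &po, PSOCK_ORIGDEV };
	pthread_t t1, t2;

	if (pthread_create(&t1, NULL, toggle_thread, &a) != 0)
		return -1;
	if (pthread_create(&t2, NULL, toggle_thread, &b) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&po.lock);
	return po.auxdata && po.origdev && !po.has_vnet_hdr;
}
```

Without the mutex, one thread could read the word, the other could set its flag, and the first could then write back its stale copy, silently clearing the other flag.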
      
      Fixes: 8dc41944 ("[PACKET]: Add optional checksum computation for recvmsg")
      Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
      Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: Do not check INTEVENT bit for OICR interrupts · 30d84397
      Ben Shelton authored
      According to the hardware spec, checking the INTEVENT bit isn't a
      reliable way to detect if an OICR interrupt has occurred. This is
      because this bit can be cleared by the hardware/firmware before the
      interrupt service routine has run. So instead, just check for OICR
      events every time.
      
      Fixes: 940b61af ("ice: Initialize PF and setup miscellaneous interrupt")
      Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix incorrect comment for action type · 34357a90
      Anirudh Venkataramanan authored
      Action type 5 defines large action generic values. Fix the comment to
      reflect that.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix initialization for num_nodes_added · d332a38c
      Anirudh Venkataramanan authored
      ice_sched_add_nodes_to_layer is used recursively, and so we start
      with num_nodes_added being 0. This way, in case of an error or if
      num_nodes is NULL, the function just returns 0 to indicate that no
      nodes were added.
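A minimal sketch of the pattern, assuming a simple capacity-limited layer. struct layer and add_nodes_to_layer() are hypothetical, not the ice scheduler code. The out-parameter is zeroed first, so every early return reports 0 nodes added:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical layer with a fixed number of node slots. */
struct layer {
	unsigned int used;
	unsigned int cap;
};

/*
 * Add up to num_nodes nodes, one per recursion level. *num_added is set
 * to 0 before any early return, so callers never read a stale count.
 */
int add_nodes_to_layer(struct layer *l, unsigned int num_nodes,
		       unsigned int *num_added)
{
	unsigned int more = 0;
	int ret;

	assert(num_added != NULL);
	*num_added = 0;
	if (!l)
		return -EINVAL;
	if (num_nodes == 0)
		return 0;
	if (l->used >= l->cap)
		return -ENOSPC;

	l->used++; /* add one node here, the rest recursively */
	ret = add_nodes_to_layer(l, num_nodes - 1, &more);
	*num_added = 1 + more;
	return ret;
}
```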
      
      Fixes: 5513b920 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: Fix the transmission mode of queue 0 for Qav mode · 2707df97
      Vinicius Costa Gomes authored
      When Qav mode is enabled, queue 0 should be kept on Stream Reservation
      mode. From the i210 datasheet, section 8.12.19:
      
      "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
      Qav." ("QueueMode 1b" represents the Stream Reservation mode)
      
      The solution is to give queue 0 all the credits it might need, so
      it has priority over queue 1.
      
      A situation where this can happen is when cbs is "installed" only on
      queue 1, leaving queue 0 alone. For example:
      
      $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
            map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

      $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
            hicredit 30 sendslope -980000 idleslope 20000 offload 1
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbevf: ensure xdp_ring resources are free'd on error exit · 39035bfd
      Colin Ian King authored
      The current error handling for failed resource setup for xdp_ring
      data breaks out of the loop and returns 0, indicating that everything
      was OK when in fact it is not. Fix this by exiting via the error exit
      label err_setup_tx, which cleans up the resources correctly and
      returns an error status.
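The error-exit pattern can be sketched as below. The ring array, setup_ring(), free_ring() and the err_setup label are hypothetical stand-ins for the ixgbevf code and its err_setup_tx label. On failure the loop jumps to a label that frees the rings already set up and returns the error, rather than breaking out and reporting success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_RINGS 4

/* Hypothetical ring resources (not ixgbevf's structures). */
struct ring {
	void *desc;
};

static struct ring rings[NUM_RINGS];
static int fail_at = -1; /* ring index whose setup fails, for the demo */

static int setup_ring(int i)
{
	assert(i >= 0 && i < NUM_RINGS);
	if (i == fail_at)
		return -ENOMEM;
	rings[i].desc = malloc(64);
	return rings[i].desc ? 0 : -ENOMEM;
}

static void free_ring(int i)
{
	free(rings[i].desc);
	rings[i].desc = NULL;
}

/* Set up all rings; on failure, unwind and propagate the error. */
int setup_all_rings(void)
{
	int i, err;

	for (i = 0; i < NUM_RINGS; i++) {
		err = setup_ring(i);
		if (err)
			goto err_setup;
	}
	return 0;

err_setup:
	/* free rings i-1 down to 0, which were set up successfully */
	while (i--)
		free_ring(i);
	return err;
}
```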
      
      Detected by CoverityScan, CID#1466879 ("Logically dead code")
      
      Fixes: 21092e9c ("ixgbevf: Add support for XDP_TX action")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Revert "net: init sk_cookie for inet socket" · a06ac0d6
      Yafang Shao authored
      This reverts commit c6849a3a ("net: init sk_cookie for inet socket").

      Per discussion with Eric, updating sock_net(sk)->cookie_gen invalidates
      the whole cache line, which is shared by all CPUs; this may cause a
      significant performance hit.

      Below is the data from Eric, measured when running a synflood test:
      "Performance is reduced from ~5 Mpps to ~3.8 Mpps with 16 RX queues on
      my host".

      Revert the commit to prevent this cache line false sharing.
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-DIM-tx' · 8399743a
      David S. Miller authored
      Tal Gilboa says:
      
      ====================
      Introduce adaptive TX interrupt moderation to net DIM
      
      Net DIM is a library designed for dynamic interrupt moderation. It was
      implemented and optimized with receive-side interrupts in mind, since these
      are usually the CPU-expensive ones. This patch-set introduces adaptive transmit
      interrupt moderation to net DIM, complete with usage in the mlx5e driver.
      Using adaptive TX behavior reduces the interrupt rate in multiple scenarios.
      Furthermore, it is essential for increasing bandwidth in cases where payload
      aggregation is required.
      
      v3: Remove "inline" from functions in .c files (requested by DaveM). Revert
      adding "enabled" field from struct net_dim and applied mlx5e structural
      suggestions (suggested by SaeedM).
      
      v2: Rebase over proper tree.
      
      v1: Fix compilation issues due to missed function renaming.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Enable adaptive-TX moderation · cbce4f44
      Tal Gilboa authored
      Add support for adaptive TX moderation. This greatly reduces the TX
      interrupt rate and increases bandwidth, mostly for TCP over the ARM
      architecture (see below). There is a slight degradation for single-stream
      TCP with very large message sizes (x86): in this case, any moderation of
      transmitted packets reduces bandwidth because the TCP output limit is
      hit. Since this is a synthetic case, the change is still worth doing.
      
      Performance improvement (ConnectX-4Lx 40GbE, ARM)
      TCP 64B bandwidth with 1-50 streams increased 6-35%.
      TCP 64B bandwidth with 100-500 streams increased 20-70%.
      
      Performance improvement (ConnectX-5 100GbE, x86)
      Bandwidth: increased up to 40% (1024B with 10s of streams).
      Interrupt rate: reduced up to 50% (1024B with 1000s of streams).
      
      Performance degradation (ConnectX-5 100GbE, x86)
      Bandwidth: up to 10% decrease single stream TCP (1MB message size from
      51Gb/s to 47Gb/s).
      Signed-off-by: Tal Gilboa <talgi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/dim: Support adaptive TX moderation · 623ad755
      Tal Gilboa authored
      Interrupt moderation for TX traffic requires different profiles than RX
      interrupt moderation. The main goal here is to reduce the interrupt rate
      and allow better payload aggregation by keeping SKBs in the TX queue a bit
      longer. Ping-pong traffic would get a profile with a short timer, so
      latency wouldn't increase in these scenarios. There might be a slight
      degradation in bandwidth for a single stream with large message sizes,
      since net.ipv4.tcp_limit_output_bytes limits the allowed TX traffic, but
      with many streams performance is always improved.
      Signed-off-by: Tal Gilboa <talgi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>