1. 13 Feb, 2018 22 commits
  2. 12 Feb, 2018 13 commits
    • Alexander Duyck's avatar
      i40e/i40evf: Add support for new mechanism of updating adaptive ITR · a0073a4b
      Alexander Duyck authored
      This patch replaces the existing mechanism for determining the correct
      value to program for adaptive ITR with yet another new and more
      complicated approach.
      
      The basic idea from a 30K foot view is that this new approach will push the
      Rx interrupt moderation up so that by default it starts in low latency and
      is gradually pushed up into a higher latency setup as long as doing so
      increases the number of packets processed, if the number of packets drops
      to 4 to 1 per packet we will reset and just base our ITR on the size of the
      packets being received. For Tx we leave it floating at a high interrupt
      delay and do not pull it down unless we start processing more than 112
      packets per interrupt. If we start exceeding that we will cut our interrupt
      rates in half until we are back below 112.
      
      The side effect of these patches are that we will be processing more
      packets per interrupt. This is both a good and a bad thing as it means we
      will not be blocking processing in the case of things like pktgen and XDP,
      but we will also be consuming a bit more CPU in the cases of things such as
      network throughput tests using netperf.
      
      One delta from this versus the ixgbe version of the changes is that I have
      made the interrupt moderation a bit more aggressive when we are in bulk
      mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The
      main motivation behind moving this is to address the fact that we need to
      update less frequently, and have more fine grained control due to the
      separate Tx and Rx ITR times.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      a0073a4b
    • Alexander Duyck's avatar
      i40e/i40evf: Split container ITR into current_itr and target_itr · 556fdfd6
      Alexander Duyck authored
      This patch is mostly prep-work for replacing the current approach to
      programming the dynamic aka adaptive ITR. Specifically here what we are
      doing is splitting the Tx and Rx ITR each into two separate values.
      
      The first value current_itr represents the current value of the register.
      
      The second value target_itr represents the desired value of the register.
      
      The general plan by doing this is to allow for deferring the update of the
      ITR value under certain circumstances. For now we will work with what we
      have, but in the future I hope to change the behavior so that we always
      only update one ITR at a time using some simple logic to determine which
      ITR requires an update.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      556fdfd6
    • Alexander Duyck's avatar
      i40evf: Correctly populate rxitr_idx and txitr_idx · d4942d58
      Alexander Duyck authored
      While testing code for the recent ITR changes I found that updating the Tx
      ITR appeared to have no effect with everything defaulting to the Rx ITR. A
      bit of digging narrowed it down the fact that we were asking the PF to
      associate all causes with ITR 0 as we weren't populating the itr_idx values
      for either Rx or Tx.
      
      To correct it I have added the configuration for these values to this
      patch. In addition I did some minor clean-up to just add a local pointer
      for the vector map instead of dereferencing it based off of the index
      repeatedly. In my opinion this makes the resultant code a bit more readable
      and saves us a few characters.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      d4942d58
    • Alexander Duyck's avatar
      i40e/i40evf: Use usec value instead of reg value for ITR defines · 92418fb1
      Alexander Duyck authored
      Instead of using the register value for the defines when setting up the
      ring ITR we can just use the actual values and avoid the use of shifts and
      macros to translate between the values we have and the values we want.
      
      This helps to make the code more readable as we can quickly translate from
      one value to the other.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      92418fb1
    • Denys Vlasenko's avatar
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko authored
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b2c45d4
    • Alexander Duyck's avatar
      i40e/i40evf: Don't bother setting the CLEARPBA bit · 4ff17929
      Alexander Duyck authored
      The CLEARPBA bit in the dynamic interrupt control register actually has
      no effect either way on the hardware. As per errata 28 in the XL710
      specification update the interrupt is actually cleared any time the
      register is written with the INTENA_MSK bit set to 0. As such the act of
      toggling the enable bit actually will trigger the interrupt being
      cleared and could lead to potential lost events if auto-masking is
      not enabled.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      4ff17929
    • Alexander Duyck's avatar
      i40e/i40evf: Clean-up of bits related to using q_vector->reg_idx · 8b99b117
      Alexander Duyck authored
      This patch is a further clean-up related to the change over to using
      q_vector->reg_idx when accessing the ITR registers. Specifically the code
      appears to have several other spots where we were computing the register
      offset manually and this resulted in errors in a few spots.
      
      Specifically in the i40evf functions for mapping queues to vectors it
      appears we may have had an off by 1 error since (v_idx - 1) for the first
      q_vector with an index of 0 would result in us returning -1 if I am not
      mistaken.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      8b99b117
    • Alan Brady's avatar
      i40e: use changed_flags to check I40E_FLAG_DISABLE_FW_LLDP · fe09ed0e
      Alan Brady authored
      Currently in i40e_set_priv_flags we use new_flags to check for the
      I40E_FLAG_DISABLE_FW_LLDP flag.  This is an issue for a few a reasons.
      DISABLE_FW_LLDP is persistent across reboots/driver reloads.  This means
      we need some way to detect if FW LLDP is enabled on init.  We do this by
      trying to init_dcb and if it fails with EPERM we know LLDP is disabled
      in FW.
      
      This could be a problem on older FW versions or NPAR enabled PFs because
      there are situations where the FW could disable LLDP, but they do _not_
      support using this flag to change it.  If we do end up in this
      situation, the flag will be set, then when the user tries to change any
      priv flags, the driver thinks the user is trying to disable FW LLDP on a
      FW that doesn't support it and essentially forbids any priv flag
      changes.
      
      The fix is simple, instead of checking if this flag is set, we should be
      checking if the user is trying to _change_ the flag on unsupported FW
      versions.
      
      This patch also adds a comment explaining that the cmpxchg is the point
      of no return.  Once we put the new flags into pf->flags we can't back
      out.
      Signed-off-by: default avatarAlan Brady <alan.brady@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      fe09ed0e
    • Paweł Jabłoński's avatar
      i40e: Warn when setting link-down-on-close while in MFP · 17b4d25c
      Paweł Jabłoński authored
      This patch adds a warning message when the link-down-on-close flag is
      setting on. The warning is printed only on MFP devices
      Signed-off-by: default avatarPaweł Jabłoński <pawel.jablonski@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      17b4d25c
    • Filip Sadowski's avatar
      i40e: Add delay after EMP reset for firmware to recover · 1fa51a65
      Filip Sadowski authored
      This patch adds necessary delay for 4.33 firmware to recover after
      EMP reset. Without this patch driver occasionally reinitializes
      structures too quickly to communicate with firmware after EMP reset
      causing AdminQ to timeout.
      Signed-off-by: default avatarFilip Sadowski <filip.sadowski@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      1fa51a65
    • Alexander Duyck's avatar
      i40e/i40evf: Clean up logic for adaptive ITR · 71dc3719
      Alexander Duyck authored
      The logic for dynamic ITR update is confusing at best as there were odd
      paths chosen for how to find the rings associated with a given queue based
      on the vector index and other inconsistencies throughout the code.
      
      This patch is an attempt to clean up the logic so that we can more easily
      understand what is going on. Specifically if there is a Rx or Tx ring that
      is enabled in dynamic mode on the q_vector it is allowed to override the
      other side of the interrupt moderation. While it isn't correct all this
      patch is doing is cleaning up the logic for now so that when we come
      through and fix it we can more easily identify that this is wrong.
      
      The other big change made here is that we replace references to:
      	vsi->rx_rings[q_vector->v_idx]->itr_setting
      with:
      	q_vector->rx.ring->itr_setting
      
      The general idea is we can avoid the long pointer chase since just
      accessing q_vector->rx.ring is a single pointer access versus having to
      chase down vsi->rx_rings, and then finding the pointer in the array, and
      finally chasing down the itr_setting from there.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      71dc3719
    • Alexander Duyck's avatar
      i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx · 40588ca6
      Alexander Duyck authored
      The rings are already split out into Tx and Rx rings so it doesn't make
      sense to have any single ring store both a Tx and Rx itr_setting value.
      Since that is the case drop the pair in favor of storing just a single ITR
      value.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      40588ca6
    • Alan Brady's avatar
      i40e: fix typo in function description · 11a350c9
      Alan Brady authored
      'bufer' should be 'buffer'
      Signed-off-by: default avatarAlan Brady <alan.brady@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      11a350c9
  3. 11 Feb, 2018 5 commits
    • Linus Torvalds's avatar
      Linux 4.16-rc1 · 7928b2cb
      Linus Torvalds authored
      7928b2cb
    • Al Viro's avatar
      unify {de,}mangle_poll(), get rid of kernel-side POLL... · 7a163b21
      Al Viro authored
      except, again, POLLFREE and POLL_BUSY_LOOP.
      
      With this, we finally get to the promised end result:
      
       - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
         stray instances of ->poll() still using those will be caught by
         sparse.
      
       - eventpoll.c and select.c warning-free wrt __poll_t
      
       - no more kernel-side definitions of POLL... - userland ones are
         visible through the entire kernel (and used pretty much only for
         mangle/demangle)
      
       - same behavior as after the first series (i.e. sparc et.al. epoll(2)
         working correctly).
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a163b21
    • Linus Torvalds's avatar
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
    • Linus Torvalds's avatar
      Merge branch 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ee5daa13
      Linus Torvalds authored
      Pull more poll annotation updates from Al Viro:
       "This is preparation to solving the problems you've mentioned in the
        original poll series.
      
        After this series, the kernel is ready for running
      
            for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
                  L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
                  for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
            done
      
        as a for bulk search-and-replace.
      
        After that, the kernel is ready to apply the patch to unify
        {de,}mangle_poll(), and then get rid of kernel-side POLL... uses
        entirely, and we should be all done with that stuff.
      
        Basically, that's what you suggested wrt KPOLL..., except that we can
        use EPOLL... instead - they already are arch-independent (and equal to
        what is currently kernel-side POLL...).
      
        After the preparations (in this series) switch to returning EPOLL...
        from ->poll() instances is completely mechanical and kernel-side
        POLL... can go away. The last step (killing kernel-side POLL... and
        unifying {de,}mangle_poll() has to be done after the
        search-and-replace job, since we need userland-side POLL... for
        unified {de,}mangle_poll(), thus the cherry-pick at the last step.
      
        After that we will have:
      
         - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
           ->poll() still using those will be caught by sparse.
      
         - eventpoll.c and select.c warning-free wrt __poll_t
      
         - no more kernel-side definitions of POLL... - userland ones are
           visible through the entire kernel (and used pretty much only for
           mangle/demangle)
      
         - same behavior as after the first series (i.e. sparc et.al. epoll(2)
           working correctly)"
      
      * 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        annotate ep_scan_ready_list()
        ep_send_events_proc(): return result via esed->res
        preparation to switching ->poll() to returning EPOLL...
        add EPOLLNVAL, annotate EPOLL... and event_poll->event
        use linux/poll.h instead of asm/poll.h
        xen: fix poll misannotation
        smc: missing poll annotations
      ee5daa13
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa · 3fc928dc
      Linus Torvalds authored
      Pull xtense fix from Max Filippov:
       "Build fix for xtensa architecture with KASAN enabled"
      
      * tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix build with KASAN
      3fc928dc