1. 29 Nov, 2016 8 commits
    • xprtrdma: Update documenting comment · 289400af
      Chuck Lever authored
      Clean up: If reset fails, FRMRs are no longer abandoned, rather
      they are released immediately. Update the comment to reflect this.
      
      Fixes: 2ffc871a ('xprtrdma: Release orphaned MRs immediately')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Refactor FRMR invalidation · a100fda1
      Chuck Lever authored
      Clean up: After some recent updates, clarifications can be made to
      the FRMR invalidation logic.
      
      - Both the remote and local invalidation case mark the frmr INVALID,
        so make that a common path.
      
      - Manage the WR list more "tastefully" by replacing the conditional
        that discriminates between the list head and ->next pointers.
      
      - Use mw->mw_handle in all cases, since that has the same value as
        f->fr_mr->rkey, and is already in cache.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
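The second point of the FRMR invalidation refactoring (building the WR list without a conditional that tells the list head apart from the ->next pointers) matches a common C idiom: keep a pointer to the slot that receives the next element. The following is a minimal userspace sketch of that idiom, not the xprtrdma code; `struct wr` and `chain_wrs()` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a work request; not the kernel's struct ib_send_wr. */
struct wr {
	int id;
	struct wr *next;
};

/*
 * Chain n work requests in array order and return the head.
 * "tail" always points at the pointer that should receive the next
 * element: first the head itself, afterwards the previous ->next.
 * No branch is needed to treat the first element specially.
 */
static struct wr *chain_wrs(struct wr *wrs, size_t n)
{
	struct wr *first = NULL;
	struct wr **tail = &first;
	size_t i;

	for (i = 0; i < n; i++) {
		wrs[i].next = NULL;
		*tail = &wrs[i];
		tail = &wrs[i].next;
	}
	return first;
}
```

With an empty input the function returns NULL, which a caller could use to skip posting anything.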
    • xprtrdma: Avoid calls to ro_unmap_safe() · 48016dce
      Chuck Lever authored
      Micro-optimization: Most of the time, calls to ro_unmap_safe are
      expensive no-ops. Call only when there is work to do.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Address coverity complaint about wait_for_completion() · 109b88ab
      Chuck Lever authored
      > ** CID 114101:  Error handling issues  (CHECKED_RETURN)
      > /net/sunrpc/xprtrdma/verbs.c: 355 in rpcrdma_create_id()
      
      Commit 5675add3 ("RPC/RDMA: harden connection logic against
      missing/late rdma_cm upcalls.") replaced wait_for_completion() calls
      with these two call sites.
      
      The original wait_for_completion() calls were added in the initial
      commit of verbs.c, which was commit c56c65fb ("RPCRDMA: rpc rdma
      verbs interface implementation"), but these returned void.
      
      rpcrdma_create_id() is called by the RDMA connect worker, which
      probably won't ever be interrupted. It is also called by
      rpcrdma_ia_open which is in the synchronous mount path, and ^C is
      possible there.
      
      Add a bit of logic at those two call sites to return if the waits
      return ERESTARTSYS.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • SUNRPC: Proper metric accounting when RPC is not transmitted · ae09531d
      Chuck Lever authored
      I noticed recently that during an xfstests run on a krb5i mount, the
      retransmit count for certain operations had gone negative, and the
      backlog value became unreasonably large. I recall that Andy has
      pointed this out to me in the past.
      
      When call_refresh fails to find a valid credential for an RPC, the
      RPC exits immediately without sending anything on the wire. This
      leaves rq_ntrans, rq_xtime, and rq_rtt set to zero.
      
      The solution for om_queue is to not add to the RPC's running backlog
      queue total whenever rq_xtime is zero.
      
      For om_ntrans, it's a bit more difficult. A zero rq_ntrans causes
      om_ops to become larger than om_ntrans. The design of the RPC
      metrics API assumes that ntrans will always be equal to or larger
      than the ops count. The result is that when an RPC fails to find
      credentials, the RPC operation's reported retransmit count, which is
      computed in user space as the difference between ops and ntrans,
      goes negative.
      
      Ideally the kernel API should report a separate retransmit and
      "exited before initial transmission" metric, so that user space can
      sort out the difference properly.
      
      To avoid kernel API changes and changes to the way rq_ntrans is used
      when performing transport locking, account for untransmitted RPCs
      so that om_ntrans keeps up with om_ops: always add one or more to
      om_ntrans.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
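The om_ntrans accounting described in the SUNRPC commit above can be illustrated with a small userspace sketch. It assumes a simplified `struct op_metrics` holding only the two counters named in the message; `count_rpc()` and `retransmits()` are hypothetical helpers, and the clamp to one is one reading of "always add one or more", not necessarily the exact kernel change.

```c
/* Hypothetical, simplified per-operation counters; field names follow the commit text. */
struct op_metrics {
	unsigned long om_ops;
	unsigned long om_ntrans;
};

/* Account one completed RPC that was transmitted rq_ntrans times (0 if never sent). */
static void count_rpc(struct op_metrics *m, int rq_ntrans)
{
	m->om_ops++;
	/* Always add at least one, so om_ntrans never falls behind om_ops. */
	m->om_ntrans += rq_ntrans > 1 ? (unsigned long)rq_ntrans : 1;
}

/* Retransmit count as user space would derive it from the two counters. */
static long retransmits(const struct op_metrics *m)
{
	return (long)m->om_ntrans - (long)m->om_ops;
}
```

In this model an RPC that exits before its first transmission is indistinguishable from one sent exactly once, which is the trade-off the message accepts in order to avoid a kernel API change.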
    • xprtrdma: Support for SG_GAP devices · 5e9fc6a0
      Chuck Lever authored
      Some devices (such as the Mellanox CX-4) can register, under a
      single R_key, a set of memory regions that are not contiguous. When
      this is done, all the segments in a Reply list, say, can then be
      invalidated in a single LocalInv Work Request (or via Remote
      Invalidation, which can invalidate exactly one R_key when completing
      a Receive).
      
      This means a single FastReg WR is used to register, and one or zero
      LocalInv WRs can invalidate, the memory involved with RDMA transfers
      on behalf of an RPC.
      
      In addition, xprtrdma constructs some Reply chunks from three or
      more segments. By registering them with SG_GAP, only one segment
      is needed for the Reply chunk, allowing the whole chunk to be
      invalidated remotely.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Make FRWR send queue entry accounting more accurate · 8d38de65
      Chuck Lever authored
      Verbs providers may perform house-keeping on the Send Queue during
      each signaled send completion. It is necessary therefore for a verbs
      consumer (like xprtrdma) to occasionally force a signaled send
      completion if it runs unsignaled most of the time.
      
      xprtrdma does not require signaled completions for Send or FastReg
      Work Requests, but does signal some LocalInv Work Requests. To
      ensure that Send Queue house-keeping can run before the Send Queue
      is more than half-consumed, xprtrdma forces a signaled completion
      on occasion by counting the number of Send Queue Entries it
      consumes. It currently does this by counting each ib_post_send as
      one Entry.
      
      Commit c9918ff5 ("xprtrdma: Add ro_unmap_sync method for FRWR")
      introduced the ability for frwr_op_unmap_sync to post more than one
      Work Request with a single post_send. Thus the underlying assumption
      of one Send Queue Entry per ib_post_send is no longer true.
      
      Also, FastReg Work Requests are currently never signaled. They
      should be signaled once in a while, just as Send is, to keep the
      accounting of consumed SQEs accurate.
      
      While we're here, convert the CQCOUNT macros to the currently
      preferred kernel coding style, which is inline functions.
      
      Fixes: c9918ff5 ("xprtrdma: Add ro_unmap_sync method for FRWR")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
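The Send Queue accounting described in the FRWR commit above can be sketched as a counter that is charged per Work Request rather than per post. This is a userspace illustration only: `struct ep`, `ep_init()` and `ep_charge()` are hypothetical names, and using half the queue depth as the budget is an assumption taken from the "more than half-consumed" wording.

```c
#include <stdbool.h>

/* Hypothetical endpoint state for deciding when to force a signaled completion. */
struct ep {
	int budget;   /* reload value: here, half the Send Queue depth */
	int count;    /* SQEs that may still be consumed unsignaled */
};

static void ep_init(struct ep *ep, int sq_depth)
{
	ep->budget = sq_depth / 2;
	ep->count = ep->budget;
}

/*
 * Charge nr_wrs Send Queue Entries for one post. Returns true when the
 * caller should request a signaled completion on this post, and reloads
 * the counter in that case.
 */
static bool ep_charge(struct ep *ep, int nr_wrs)
{
	ep->count -= nr_wrs;
	if (ep->count > 0)
		return false;
	ep->count = ep->budget;
	return true;
}
```

Passing the real number of chained Work Requests as `nr_wrs` is what keeps the count honest when a single post carries several of them.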
    • xprtrdma: Cap size of callback buffer resources · 62aee0e3
      Chuck Lever authored
      When the inline threshold size is set to large values (say, 32KB)
      any NFSv4.1 CB request from the server gets a reply with status
      NFS4ERR_RESOURCE.
      
      Looks like there are some upper layer assumptions about the maximum
      size of a reply (for example, in process_op). Cap the size of the
      NFSv4 client's reply resources at a page.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 27 Nov, 2016 5 commits
  3. 26 Nov, 2016 17 commits
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · e3480312
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Here is a revert and two bugfixes for the I2C designware driver.
      
        Please note that we are still hunting down a regression for the
        i2c-octeon driver. While there is a fix pending, we have unclear
        feedback from the testers currently. An rc8 would be quite helpful
        for this case"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        Revert "i2c: designware: do not disable adapter after transfer"
        i2c: designware: fix rx fifo depth tracking
        i2c: designware: report short transfers
    • Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · a56f3eb2
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "This resolves the ksyms issues by reverting the commit which
        introduced the breakage"
      
      There was what I consider to be a better fix, but it's late in the rc
      game, so I'll take the revert.
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        Revert "arm: move exports to definitions"
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a0d60e62
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix leak in fsl/fman driver, from Dan Carpenter.
      
       2) Call flow dissector initcall earlier than any networking driver can
          register and start to use it, from Eric Dumazet.
      
       3) Some dup header fixes from Geliang Tang.
      
       4) TIPC link monitoring compat fix from Jon Paul Maloy.
      
       5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
          Florian Fainelli.
      
       6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
          Mashak.
      
       7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
        tipc: resolve connection flow control compatibility problem
        mvpp2: use correct size for memset
        net/mlx5: drop duplicate header delay.h
        net: ieee802154: drop duplicate header delay.h
        ibmvnic: drop duplicate header seq_file.h
        fsl/fman: fix a leak in tgec_free()
        net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
        tipc: improve sanity check for received domain records
        tipc: fix compatibility bug in link monitoring
        net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
        dwc_eth_qos: drop duplicate headers
        net sched filters: fix filter handle ID in tfilter_notify_chain()
        net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
        bnxt: do not busy-poll when link is down
        udplite: call proper backlog handlers
        ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
        net/mlx4_en: Free netdev resources under state lock
        net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
        rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
        bnxt_en: Fix a VXLAN vs GENEVE issue
        ...
    • Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 30e2b7cf
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
      
       - Fix a crash that occurs at driver initialization if the memory region
         is already busy (request_mem_region() fails).
      
       - Fix a vma validation check that mistakenly allows a private device-
         dax mapping to be established. Device-dax explicitly forbids private
         mappings so it can guarantee a given fault granularity and backing
         memory type.
      
       Both of these fixes have soaked in -next and are tagged for -stable.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        device-dax: fail all private mapping attempts
        device-dax: check devm_nsio_enable() return value
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · fc13ca19
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "Four fixes for bugs found by syzkaller on x86, all for stable"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: check for pic and ioapic presence before use
        KVM: x86: fix out-of-bounds accesses of rtc_eoi map
        KVM: x86: drop error recovery in em_jmp_far and em_ret_far
        KVM: x86: fix out-of-bounds access in lapic
    • Merge tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 39c15737
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fixes marked for stable:
         - Set missing wakeup bit in LPCR on POWER9
         - Fix the early OPAL console wrappers
         - Fixup kernel read only mapping
      
        Fixes for code merged this cycle:
         - Fix missing CRCs, add more asm-prototypes.h declarations"
      
      * tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm: Fixup kernel read only mapping
        powerpc/boot: Fix the early OPAL console wrappers
        powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
        powerpc: Set missing wakeup bit in LPCR on POWER9
    • tipc: resolve connection flow control compatibility problem · 6998cc6e
      Jon Paul Maloy authored
      In commit 10724cc7 ("tipc: redesign connection-level flow control")
      we replaced the previous message based flow control with one based on
      1k blocks. In order to ensure backwards compatibility the mechanism
      falls back to using message as base unit when it senses that the peer
      doesn't support the new algorithm. The default flow control window,
      i.e., how many units can be sent before the sender blocks and waits
      for an acknowledge (aka advertisement) is 512. This was tested against
      the previous version, which uses an acknowledge frequency of one ack per
      256 received messages, and found to work fine.
      
      However, we missed the fact that versions older than Linux 3.15 use an
      acknowledge frequency of 512, which is exactly the limit where a 4.6+
      sender will stop and wait for acknowledge. This would also work fine if
      it weren't for the fact that if the first sent message on a 4.6+ server
      side is an empty SYNACK, this one is also counted as a sent message,
      while it is not counted as a received message on a legacy 3.15-receiver.
      This leads to the sender always being one step ahead of the receiver, a
      scenario causing the sender to block after 512 sent messages, while the
      receiver only has registered 511 read messages. Hence, the legacy
      receiver is not triggered to send an acknowledge, with a permanently
      blocked sender as result.
      
      We solve this deadlock by simply allowing the sender to send one more
      message before it blocks, i.e., by making a minimal change to the
      condition used for determining connection congestion.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
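The off-by-one described in the TIPC flow control commit above can be reproduced with a small counting model. This is a userspace sketch under the numbers given in the message (window 512, legacy acknowledge frequency 512); `sender_deadlocks()` and its `extra` parameter are hypothetical and do not mirror the actual TIPC congestion test.

```c
#include <stdbool.h>

#define WINDOW   512  /* sender's flow control window, in messages */
#define ACK_FREQ 512  /* legacy receiver acknowledges every 512 received messages */

/*
 * Returns true if the sender ends up blocked with no acknowledge on the way.
 * "extra" is how many messages beyond WINDOW the sender may have outstanding
 * before it blocks (0 models the old condition, 1 the relaxed one).
 */
static bool sender_deadlocks(int extra)
{
	int sent_unacked = 1;   /* the empty SYNACK is counted by the sender ... */
	int received = 0;       /* ... but not by the legacy receiver */

	while (sent_unacked < WINDOW + extra) {
		sent_unacked++;         /* sender transmits one data message */
		received++;             /* receiver reads it */
		if (received == ACK_FREQ)
			return false;       /* receiver acknowledges; sender can go on */
	}
	return true;                /* sender blocked, receiver still waiting */
}
```

With `extra` set to 0 the sender stops at 512 outstanding messages while the receiver has counted only 511; allowing one more message lets the receiver reach its acknowledge threshold.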
    • mvpp2: use correct size for memset · e8f967c3
      Arnd Bergmann authored
      gcc-7 detects a short memset in mvpp2, introduced in the original
      merge of the driver:
      
      drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_cls_init':
      drivers/net/ethernet/marvell/mvpp2.c:3296:2: error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]
      
      The result seems to be that we write uninitialized data into the
      flow table registers, although we did not get any warning about
      that uninitialized data usage.
      
      Using sizeof() lets us initialize the entire array instead.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
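The class of bug reported for mvpp2 above (a memset length given as an element count instead of a byte count) can be shown in isolation. The sketch below is generic userspace C, not the driver code; `NENTRIES` and `zeroed_entries()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define NENTRIES 8

/* Count how many leading entries are zero after clearing nbytes bytes of the table. */
static size_t zeroed_entries(size_t nbytes)
{
	unsigned int table[NENTRIES];
	size_t i;

	memset(table, 0xff, sizeof(table));  /* fill with a recognizable pattern */
	memset(table, 0, nbytes);            /* the call under test */
	for (i = 0; i < NENTRIES; i++)
		if (table[i] != 0)
			break;
	return i;
}
```

Calling `zeroed_entries(NENTRIES)` clears only the first 8 bytes, so most of the table keeps its old contents, whereas `zeroed_entries(NENTRIES * sizeof(unsigned int))`, the equivalent of `sizeof(table)`, clears every entry.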
    • net/mlx5: drop duplicate header delay.h · 5e7dfeb7
      Geliang Tang authored
      Drop duplicate header delay.h from mlx5/core/main.c.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Acked-by: Matan Barak <matanb@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ieee802154: drop duplicate header delay.h · 8f8a8b13
      Geliang Tang authored
      Drop duplicate header delay.h from adf7242.c.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: drop duplicate header seq_file.h · 4ee12efa
      Geliang Tang authored
      Drop duplicate header seq_file.h from ibmvnic.c.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fsl/fman: fix a leak in tgec_free() · 1f1e70ef
      Dan Carpenter authored
      We set "tgec->cfg" to NULL before passing it to kfree().  There is no
      need to set it to NULL at all.  Let's just delete it.
      
      Fixes: 57ba4c9b ("fsl/fman: Add FMan MAC support")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
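The leak fixed in tgec_free() above comes from clearing a pointer before handing it to the free routine, which then receives NULL and does nothing. A minimal userspace sketch of the pattern follows; it uses malloc()/free() in place of the kernel allocator, and `struct mac`, `counted_alloc()`, `counted_free()` and both `mac_free*()` functions are hypothetical names for illustration.

```c
#include <stdlib.h>

static int live_allocs;  /* number of buffers currently allocated */

static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);             /* free(NULL) is a harmless no-op */
}

struct cfg { int speed; };
struct mac { struct cfg *cfg; };

/* Buggy order: the pointer is cleared first, so nothing is freed. */
static void mac_free_leaky(struct mac *m)
{
	m->cfg = NULL;
	counted_free(m->cfg);
}

/* Fixed: the assignment is simply dropped and the buffer is released. */
static void mac_free(struct mac *m)
{
	counted_free(m->cfg);
}
```

After `mac_free_leaky()` the `live_allocs` counter is unchanged, which is how the sketch makes the otherwise silent leak visible.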
    • net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS · 8006f6bf
      Miroslav Lichvar authored
      The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
      command and likewise it shouldn't require the CAP_NET_ADMIN capability.
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve sanity check for received domain records · d876a4d2
      Jon Paul Maloy authored
      In commit 35c55c98 ("tipc: add neighbor monitoring framework") we
      added a data area to the link monitor STATE messages under the
      assumption that previous versions did not use any such data area.
      
      For versions older than Linux 4.3 this assumption is not correct. In
      those versions, all STATE messages sent out from a node inadvertently
      contain a 16 byte data area containing a string; a leftover from
      previous RESET messages which were using this during the setup phase.
      This string serves no purpose in STATE messages, and should not be there.
      
      Unfortunately, this data area is delivered to the link monitor
      framework, where a sanity check catches that it is not a correct domain
      record, and drops it. It also issues a rate limited warning about the
      event.
      
      Since such events occur much more frequently than anticipated, we now
      choose to remove the warning in order to not fill the kernel log with
      useless contents. We also make the sanity check stricter, to further
      reduce the risk that such data is inadvertently admitted.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix compatibility bug in link monitoring · f7967556
      Jon Paul Maloy authored
      commit 81729810 ("tipc: fix link priority propagation") introduced a
      compatibility problem between TIPC versions newer than Linux 4.6 and
      those older than Linux 4.4. In versions later than 4.4, link STATE
      messages only contain a non-zero link priority value when the sender
      wants the receiver to change its priority. This has the effect that the
      receiver resets itself in order to apply the new priority. This works
      well, and is consistent with the said commit.
      
      However, in versions older than 4.4 a valid link priority is present in
      all sent link STATE messages, leading to cyclic link establishment and
      reset on the 4.6+ node.
      
      We fix this by adding a test that the received value should not only
      be valid, but also differ from the current value in order to cause the
      receiving link endpoint to reset.
      Reported-by: Amar Nv <amar.nv005@gmail.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
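The added test from the link priority commit above reduces to a two-part condition. The sketch below is a hypothetical helper, not the TIPC source; it assumes that "valid" can be modelled as "non-zero", following the message's statement that newer senders only put a non-zero priority in a STATE message when they want a change.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a received STATE message should make
 * the link endpoint reset to apply a new priority. A received value of 0 is
 * treated here as "no change requested" (an assumption for this sketch).
 */
static bool must_reset_for_priority(unsigned int received_prio,
				    unsigned int current_prio)
{
	return received_prio != 0 && received_prio != current_prio;
}
```

A peer that repeats the current priority in every STATE message, as the older versions do, no longer triggers a reset under this condition.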
    • net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented · 97db8afa
      Andrew Lunn authored
      The mvneta driver advertises it supports IFF_UNICAST_FLT. However, it
      actually does not. The hardware probably does support it, but there is
      no code to configure the filter. As a quick and simple fix, remove the
      flag. This will cause the core to fall back to promiscuous mode.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Fixes: b50b72de ("net: mvneta: enable features before registering the driver")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'parisc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 3ad0e83c
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "On parisc we were still seeing occasional random segmentation faults
        and memory corruption on SMP machines. Dave Anglin then looked again
        at the TLB related code and found two issues in the PCI DMA and
        generic TLB flush functions.
      
        Then, in our startup code we had some timing of the cache and TLB
        functions to calculate a threshold when to use a complete TLB/cache
        flush or just to flush a specific range. This code produced a race
        with newly started CPUs and thus led to occasional kernel crashes
        (due to stale TLB/cache entries). The patch by Dave fixes this issue
        by flushing the local caches before starting secondary CPUs and by
        removing the race.
      
        The last problem fixed by this series is that we quite often suffered
        from hung tasks and self-detected stalls on the CPUs. It was somehow
        clear that this was related to the (in v4.7) newly introduced cr16
        clocksource and the own implementation of sched_clock(). I replaced
        the open-coded sched_clock() function and switched to the generic
        sched_clock() implementation which seems to have fixed this issue as
        well.
      
        All patches have been successfully tested on a variety of machines,
        including our debian buildd servers.
      
        All patches (besides the small pr_cont fix) are tagged for stable
        releases"
      
      * 'parisc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Also flush data TLB in flush_icache_page_asm
        parisc: Fix race in pci-dma.c
        parisc: Switch to generic sched_clock implementation
        parisc: Fix races in parisc_setup_cache_timing()
        parisc: Fix printk continuations in system detection
  4. 25 Nov, 2016 10 commits