1. 05 Oct, 2016 13 commits
    • rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase · d044da4b
      David Howells authored
      Don't request an ACK on the last DATA packet of a call's Tx phase as for a
      client there will be a reply packet or some sort of ACK to shift phase.  If
      the ACK is requested, OpenAFS sends a REQUESTED-ACK ACK with soft-ACKs in
      it and doesn't follow up with a hard-ACK.
      
      If we don't set the flag, OpenAFS will send a DELAY ACK that hard-ACKs the
      reply data, thereby allowing the call to terminate cleanly.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Need to produce an ACK for service op if op takes a long time · b090aef7
      David Howells authored
      We need to generate a DELAY ACK from the service end of an operation if we
      start doing the actual operation work and it takes longer than expected.
      This will hard-ACK the request data and allow the client to release its
      resources.
      
      To make this work:
      
       (1) We have to set the ack timer and propose an ACK when the call moves to
           the RXRPC_CALL_SERVER_ACK_REQUEST and clear the pending ACK and cancel
           the timer when we start transmitting the reply (the first DATA packet
           of the reply implicitly ACKs the request phase).
      
       (2) It must be possible to set the timer when the caller is holding
           call->state_lock, so split the lock-getting part of the timer function
           out.
      
       (3) Add trace notes for the ACK we're requesting and the timer we clear.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Check for fatal error when in waiting for ack state · b38e61fc
      David Howells authored
      When it's in the waiting-for-ACK state, the AFS filesystem needs to check
      the result of rxrpc_kernel_recv_data() any time it is notified to see if it
      is indicating a fatal error.  If this is the case, it needs to mark the
      call completed otherwise the call just sits there and never goes away.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Return negative error code to kernel service · 7c93deba
      David Howells authored
      In rxrpc_kernel_recv_data(), when we return the error number incurred by a
      failed call, we must negate it before returning it as it's stored as
      positive (that's what we have to pass back to userspace).
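
      A minimal sketch of that sign convention, assuming a hypothetical
      struct demo_call with a positive error field. The names are illustrative
      only and are not the actual rxrpc structures or functions:

```c
#include <errno.h>

/* Hypothetical stand-in for a call record; the real rxrpc call struct differs. */
struct demo_call {
    int error;  /* stored as a positive errno value, e.g. ECONNRESET */
};

/* Paths that report to userspace pass the positive number back as stored. */
static int demo_error_for_userspace(const struct demo_call *call)
{
    return call->error;
}

/* In-kernel callers expect the usual negative errno convention. */
static int demo_error_for_kernel_service(const struct demo_call *call)
{
    return -call->error;
}
```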
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add missing notification · 80f91864
      David Howells authored
      The call's background processor work item needs to notify the socket when
      it completes a call so that recvmsg() or the AFS fs can deal with it.
      Without this, call expiry isn't handled.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Queue the call on expiry · 8f6ab339
      David Howells authored
      When a call expires, it must be queued for the background processor to deal
      with otherwise a service call that is improperly terminated will just sit
      there awaiting an ACK and won't expire.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Partially handle OpenAFS's improper termination of calls · c6b1c58f
      David Howells authored
      OpenAFS doesn't always correctly terminate client calls that it makes -
      this includes calls the OpenAFS servers make to the cache manager service.
      It should end the client call with either:
      
       (1) An ACK that has firstPacket set to one greater than the seq number of
           the reply DATA packet with the LAST_PACKET flag set (thereby
           hard-ACK'ing all packets).  nAcks should be 0 and acks[] should be
           empty (ie. no soft-ACKs).
      
       (2) An ACKALL packet.
      
      OpenAFS, though, may send an ACK packet with firstPacket set to the last
      seq number or less and soft-ACKs listed for all packets up to and including
      the last DATA packet.
      
      The transmitter, however, is obliged to keep the call live and the
      soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is
      permitted to drop any merely soft-ACK'd packet and request retransmission
      by sending an ACK packet with a NACK in it.
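
      The proper-termination condition in (1) can be sketched as a predicate.
      This is an illustration under assumptions: struct demo_ack and
      demo_ack_ends_call() are hypothetical names that only mirror the
      firstPacket and nAcks fields mentioned above, not the rxrpc code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the ACK fields named in the text. */
struct demo_ack {
    uint32_t first_packet;  /* firstPacket: one greater than the last hard-ACK'd seq */
    uint8_t  n_acks;        /* nAcks: number of soft-ACK entries in acks[] */
};

/*
 * Return true if the ACK hard-ACKs every packet up to and including the
 * last DATA packet (sequence number last_seq) and carries no soft-ACKs.
 */
static bool demo_ack_ends_call(const struct demo_ack *ack, uint32_t last_seq)
{
    return ack->first_packet == last_seq + 1 && ack->n_acks == 0;
}
```

      An ACK with firstPacket at or below the last sequence number, or with
      soft-ACKs listed, fails this test, so the transmitter must keep the
      DATA packets around.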
      
      Further, OpenAFS will also terminate a client call by beginning the next
      client call on the same connection channel.  This implicitly completes the
      previous call.
      
      This patch handles implicit ACK of a call on a channel by the reception of
      the first packet of the next call on that channel.
      
      If another call doesn't come along to implicitly ACK a call, then we have
      to time the call out.  There are some bugs there that will be addressed in
      subsequent patches.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs · 7edef5a1
      David Howells authored
      Separate the output of PING ACKs from the output of other sorts of ACK so
      that if we receive a PING ACK and schedule transmission of a PING RESPONSE
      ACK, the response doesn't get cancelled by a PING ACK we happen to be
      scheduling transmission of at the same time.
      
      If a PING RESPONSE gets lost, the other side might just sit there waiting
      for it and refuse to proceed otherwise.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix warning by splitting rxrpc_send_call_packet() · e2fbba62
      David Howells authored
      Split rxrpc_send_call_packet() to separate ACK generation (which is more
      complicated) from ABORT generation.  This simplifies the code a bit and
      fixes the following warning:
      
      In file included from ../net/rxrpc/output.c:20:0:
      net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
      net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      net/rxrpc/output.c:103:24: note: 'top' was declared here
      net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Only ping for lost reply in client call · b18523ef
      David Howells authored
      When a reply is deemed lost, we send a ping to find out whether the other
      end received all the request data packets we sent.  This should be limited
      to client calls; we shouldn't do this on service calls.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix oops on incoming call to serviceless endpoint · 9dfa1237
      David Howells authored
      If a call comes in to a local endpoint that isn't listening for any
      incoming calls at the moment, an oops will happen.  We need to check that
      the local endpoint's service pointer isn't NULL before we dereference it.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix duplicate const · 07ed6e95
      David Howells authored
      Remove a duplicate const keyword.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Accesses of rxrpc_local::service need to be RCU managed · 0f4b111a
      David Howells authored
      struct rxrpc_local->service is marked __rcu - this means that accesses of
      it need to be managed using RCU wrappers.  There are two such places in
      rxrpc_release_sock() where the value is checked and cleared.  Fix this by
      using the appropriate wrappers.
      Signed-off-by: David Howells <dhowells@redhat.com>
  2. 04 Oct, 2016 27 commits
    • Merge branch 'ncsi-next' · 9a8dd213
      David S. Miller authored
      Gavin Shan says:
      
      ====================
      net/ncsi: NCSI Improvement and bug fixes
      
      This series of patches improves the NCSI stack according to the comments
      I received after the NCSI code was merged to 4.8-rc1:
      
        * PATCH[1/8] fixes the build warning caused by xchg() with ia64-linux-gcc.
          The atomic operations are removed. The NCSI's lock should be taken when
          reading or updating its state and chained state.
        * Channel ID (0x1f) is the reserved one and it cannot be a valid channel
          ID. So we needn't try to probe a channel whose ID is 0x1f. PATCH[2/8]
          and PATCH[3/8] address this issue.
        * The request IDs are assigned in round-robin fashion, but it's broken.
          PATCH[4/8] makes it work.
        * PATCH[5/8] and PATCH[6/8] rework the channel monitoring to improve the
          code readability and its robustness.
        * PATCH[7/8] and PATCH[8/8] introduce ncsi_stop_dev() so that the network
          device can be closed and opened afterwards. No error will be seen.
      
      Changelog
      =========
      v2:
        * The NCSI's lock is taken when reading or updating its state as the
          {READ,WRITE}_ONCE() isn't reliable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/faraday: Stop NCSI device on shutdown · 2c15f25b
      Gavin Shan authored
      This stops the NCSI device when closing the network device so that the
      NCSI device can be reenabled later.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Introduce ncsi_stop_dev() · c0cd1ba4
      Gavin Shan authored
      This introduces ncsi_stop_dev(), as the counterpart to ncsi_start_dev(),
      to stop the NCSI device so that it can be reenabled in future. This
      API should be called when the network device driver is going to
      shut down the device. The function does three things: it stops
      the channel monitoring, resets channels to the inactive state, and
      reports NCSI link down.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Rework the channel monitoring · 83afdc6a
      Gavin Shan authored
      The original NCSI channel monitoring was implemented based on a
      backoff algorithm: the GLS response should be received in the
      specified interval. Otherwise, the channel is regarded as dead
      and failover should be taken if current channel is an active one.
      There are several problems in the implementation: (A) On BCM5718,
      we found when the IID (Instance ID) in the GLS command packet
      changes from 255 to 1, the response corresponding to IID#1 never
      comes in. It means we cannot make the unfair judgement that the
      channel is dead when one response is missed. (B) The code's
      readability should be improved. (C) We should do failover when
      current channel is active one and the channel monitoring should
      be marked as disabled before doing failover.
      
      This reworks the channel monitoring to address all of the above issues.
      The fields for channel monitoring are put into a separate struct
      and the state of channel monitoring is predefined. The channel
      is regarded as alive if the network controller responds to one of
      two GLS commands or both of them within 5 seconds.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Allow to extend NCSI request properties · a0509cbe
      Gavin Shan authored
      There is only one NCSI request property for now: whether the response to
      the sent command needs to drive the workqueue or not. So we had one
      field (@driven) for the purpose. We lost the flexibility to extend
      NCSI request properties.
      
      This replaces @driven with @flags and @req_flags in NCSI request
      and NCSI command argument struct. Each bit of the newly introduced
      field can be used for one property. No functional changes introduced.
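
      A minimal sketch of the one-bit-per-property idea. The flag names and
      struct demo_request are assumptions made for illustration and are not
      taken from the NCSI sources:

```c
#include <stdbool.h>

/* Hypothetical flag values; each bit stands for one request property. */
#define DEMO_REQ_FLAG_EVENT_DRIVEN     (1u << 0)  /* response drives the workqueue */
#define DEMO_REQ_FLAG_FUTURE_PROPERTY  (1u << 1)  /* placeholder for a later property */

struct demo_request {
    unsigned int flags;  /* replaces a single-purpose boolean such as @driven */
};

/* Test one property without disturbing the others. */
static bool demo_request_is_driven(const struct demo_request *req)
{
    return (req->flags & DEMO_REQ_FLAG_EVENT_DRIVEN) != 0;
}
```

      New properties then only need a new bit, not a new struct field.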
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Rework request index allocation · a15af54f
      Gavin Shan authored
      The NCSI request index (struct ncsi_request::id) is put into the instance
      ID (IID) field when sending an NCSI command packet. It was designed so
      that the available IDs are given out in round-robin fashion.
      @ndp->request_id was introduced to represent the next available ID, but
      it has been used as the number of successively allocated IDs. That breaks
      the round-robin design. Besides, we shouldn't put 0 into the NCSI command
      packet's IID field, meaning ID#0 should be reserved according to section
      6.3.1.1 in the NCSI spec (v1.1.0).
      
      This fixes above two issues. With it applied, the available IDs will
      be assigned in round-robin fashion and ID#0 won't be assigned.
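
      The intended behaviour can be sketched as follows. This is a userspace
      model under assumptions, not the driver code: struct demo_dev, the
      function names, and the table size of 256 (IDs 1..255) are hypothetical:

```c
#include <stdbool.h>

#define DEMO_REQ_START_IDX  1    /* ID 0 is reserved and never handed out */
#define DEMO_NUM_REQS       256  /* assumed table size; usable IDs are 1..255 */

struct demo_dev {
    bool used[DEMO_NUM_REQS];
    int  next_id;                /* next candidate ID, always in 1..255 */
};

static void demo_dev_init(struct demo_dev *d)
{
    for (int i = 0; i < DEMO_NUM_REQS; i++)
        d->used[i] = false;
    d->next_id = DEMO_REQ_START_IDX;
}

/* Step to the following ID, wrapping from 255 back to 1 (never 0). */
static int demo_next(int id)
{
    return (id + 1 < DEMO_NUM_REQS) ? id + 1 : DEMO_REQ_START_IDX;
}

/* Return a free ID in round-robin order, or -1 if all are in use. */
static int demo_alloc_id(struct demo_dev *d)
{
    int id = d->next_id;

    for (int tries = 0; tries < DEMO_NUM_REQS - 1; tries++) {
        if (!d->used[id]) {
            d->used[id] = true;
            d->next_id = demo_next(id);  /* cursor moves past the ID just given */
            return id;
        }
        id = demo_next(id);
    }
    return -1;
}

static void demo_free_id(struct demo_dev *d, int id)
{
    d->used[id] = false;
}
```

      Because the cursor always advances, a freed ID is not reused until the
      allocator has cycled through the rest of the range.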
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Don't probe on the reserved channel ID (0x1f) · 55e02d08
      Gavin Shan authored
      We needn't send CIS (Clear Initial State) command to the NCSI
      reserved channel (0x1f) in the enumeration. We shouldn't receive
      a valid response from CIS on NCSI channel 0x1f.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Introduce NCSI_RESERVED_CHANNEL · bc7e0f50
      Gavin Shan authored
      This defines NCSI_RESERVED_CHANNEL as the reserved NCSI channel
      ID (0x1f). No logical changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Avoid unused-value build warning from ia64-linux-gcc · d8cedaab
      Gavin Shan authored
      xchg() is used to set NCSI channel's state in order for consistent
      access to the state. xchg()'s return value should be used. Otherwise,
      one build warning will be raised (with -Wunused-value) as below message
      indicates. It is reported by ia64-linux-gcc (GCC) 4.9.0.
      
       net/ncsi/ncsi-manage.c: In function 'ncsi_channel_monitor':
       arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is \
       not used [-Wunused-value]
        ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
         ^
       net/ncsi/ncsi-manage.c:202:3: note: in expansion of macro 'xchg'
        xchg(&nc->state, NCSI_CHANNEL_INACTIVE);
      
      This removes the atomic access to the NCSI channel's state to avoid the
      above build warning. We have to hold the channel's lock when its state is
      read or updated. No functional changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add netdev all_adj_list refcnt propagation to fix panic · 93409033
      Andrew Collins authored
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
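
      A small userspace model of the idea, under assumptions: the types and
      functions below are hypothetical and only illustrate why the reference
      count taken at link time must match the number of later removals. They
      are not the netdev adjacency code:

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ADJ 8

struct demo_adj {
    const char *dev;  /* name of the adjacent device */
    int ref_nr;       /* number of paths through which it is reachable */
};

struct demo_adj_list {
    struct demo_adj e[DEMO_MAX_ADJ];
    int n;
};

static struct demo_adj *demo_adj_find(struct demo_adj_list *l, const char *dev)
{
    for (int i = 0; i < l->n; i++)
        if (strcmp(l->e[i].dev, dev) == 0)
            return &l->e[i];
    return NULL;
}

/* Add ref_nr references to dev, creating the entry if needed. */
static int demo_adj_link(struct demo_adj_list *l, const char *dev, int ref_nr)
{
    struct demo_adj *a = demo_adj_find(l, dev);

    if (a) {
        a->ref_nr += ref_nr;
        return 0;
    }
    if (l->n == DEMO_MAX_ADJ)
        return -1;
    l->e[l->n].dev = dev;
    l->e[l->n].ref_nr = ref_nr;
    l->n++;
    return 0;
}

/* Drop ref_nr references; -1 models removing an entry that is already gone. */
static int demo_adj_unlink(struct demo_adj_list *l, const char *dev, int ref_nr)
{
    struct demo_adj *a = demo_adj_find(l, dev);

    if (!a)
        return -1;
    a->ref_nr -= ref_nr;
    if (a->ref_nr <= 0) {
        *a = l->e[l->n - 1];  /* move the last entry into the hole */
        l->n--;
    }
    return 0;
}
```

      In the model, linking eth0 into mac0's list with a count of 2 (one per
      path through testbr) lets both removals succeed; linking with a count
      of 1 makes the second removal fail.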
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Add Edge-rate driver for Microsemi PHYs. · a4cc96d1
      Raju Lakkaraju authored
      Edge-rate:
      As system and networking speeds increase, a signal's output transition,
      also known as the edge rate or slew rate (V/ns), takes on greater importance
      because high-speed signals come with a price. That price is an assortment of
      interference problems like ringing on the line, signal overshoot and
      undershoot, extended signal settling times, crosstalk noise, transmission
      line reflections, false signal detection by the receiving device and
      electromagnetic interference (EMI) -- all of which can negate the potential
      gains designers are seeking when they try to increase system speeds through
      the use of higher performance logic devices. The fact is, faster signaling
      edge rates can cause a higher level of electrical noise or other types of
      interference that can actually lead to slower line speeds and lower maximum
      system frequencies. This parameter allows board designers to change the
      drive strength, and thereby change the EMI behavior.
      
      The edge-rate parameters (vddmac, edge-slowdown) are read from the Device Tree.
      
      Tested on Beaglebone Black with VSC 8531 PHY.
      Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vmxnet3: Wake queue from reset work · 277964e1
      Benjamin Poirier authored
      vmxnet3_reset_work() expects tx queues to be stopped (via
      vmxnet3_quiesce_dev -> netif_tx_disable). However, this races with the
      netif_wake_queue() call in netif_tx_timeout() such that the driver's
      start_xmit routine may be called unexpectedly, triggering one of the BUG_ON
      in vmxnet3_map_pkt with a stack trace like this:
      
      RIP: 0010:[<ffffffffa00cf4bc>] vmxnet3_map_pkt+0x3ac/0x4c0 [vmxnet3]
       [<ffffffffa00cf7e0>] vmxnet3_tq_xmit+0x210/0x4e0 [vmxnet3]
       [<ffffffff813ab144>] dev_hard_start_xmit+0x2e4/0x4c0
       [<ffffffff813c956e>] sch_direct_xmit+0x17e/0x1e0
       [<ffffffff813c96a7>] __qdisc_run+0xd7/0x130
       [<ffffffff813a6a7a>] net_tx_action+0x10a/0x200
       [<ffffffff810691df>] __do_softirq+0x11f/0x260
       [<ffffffff81472fdc>] call_softirq+0x1c/0x30
       [<ffffffff81004695>] do_softirq+0x65/0xa0
       [<ffffffff81069b89>] local_bh_enable_ip+0x99/0xa0
       [<ffffffffa031ff36>] destroy_conntrack+0x96/0x110 [nf_conntrack]
       [<ffffffff813d65e2>] nf_conntrack_destroy+0x12/0x20
       [<ffffffff8139c6d5>] skb_release_head_state+0xb5/0xf0
       [<ffffffff8139d299>] skb_release_all+0x9/0x20
       [<ffffffff8139cfe9>] __kfree_skb+0x9/0x90
       [<ffffffffa00d0069>] vmxnet3_quiesce_dev+0x209/0x340 [vmxnet3]
       [<ffffffffa00d020a>] vmxnet3_reset_work+0x6a/0xa0 [vmxnet3]
       [<ffffffff8107d7cc>] process_one_work+0x16c/0x350
       [<ffffffff810804fa>] worker_thread+0x17a/0x410
       [<ffffffff810848c6>] kthread+0x96/0xa0
       [<ffffffff81472ee4>] kernel_thread_helper+0x4/0x10
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 0438e3c8
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2016-10-02
      
      This series contains updates to fm10k only.
      
      Jake fixes an issue where PTP applications requesting software timestamps
      may complain that the requested mode is not supported, so add a generic
      callback for those drivers that have software transmit timestamp support
      enabled.  Then provides a trivial cleanup where some code was not wrapped
      properly.  Got to make sure that code looks good within an 80-character limit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: avoid NULL pointer dereference and recursive errors on early PCI error · edfc23ee
      Guilherme G Piccoli authored
      Although rare, it's possible to hit a PCI error early on device
      probe, meaning possibly some structs are not entirely initialized,
      and some might even be completely uninitialized, leading to a NULL
      pointer dereference.
      
      The i40e driver currently behaves badly if the device hits
      such an early PCI error: firstly, the struct i40e_pf might not be
      attached to pci_dev yet, leading to a NULL pointer dereference on
      access to pf->state.
      
      Even checking if the struct is NULL and avoiding the access in that
      case isn't enough, since the driver cannot recover from PCI error
      that early; in our experiments we saw multiple failures on kernel
      log, like:
      
        [549.664] i40e 0007:01:00.1: Initial pf_reset failed: -15
        [549.664] i40e: probe of 0007:01:00.1 failed with error -15
        [...]
        [871.644] i40e 0007:01:00.1: The driver for the device stopped because the
        device firmware failed to init. Try updating your NVM image.
        [871.644] i40e: probe of 0007:01:00.1 failed with error -32
        [...]
        [872.516] i40e 0007:01:00.0: ARQ: Unknown event 0x0000 ignored
      
      Between the first probe failure (error -15) and the second (error -32)
      another PCI error happened due to the first bad probe. Also, driver
      started to flood console with those ARQ event messages.
      
      This patch will prevent these issues by allowing error recovery
      mechanism to remove the failed device from the system instead of
      trying to recover from early PCI errors during device probe.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com>
      Acked-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 2f8fab7a
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-10-03
      
      This series contains fixes to i40e only.
      
      Stefan Assmann provides the changes in this series to resolve an issue
      where when we run out of MSIx vectors, iWARP gets disabled automatically.
      First adds a check for "no vectors left" during MSIx vector allocation
      for VMDq, which will prevent more vectors being allocated than available.
      Then fixed the MSIx vector redistribution when we reach the hardware limit
      for vectors so that additional features like VMDq, iWARP, etc do not get
      starved for vectors because the PF is hogging all the resources.  Lastly,
      fix the issue for flow director by moving the check for the reaching the
      vector limit earlier in the code so that a decision can be made on
      disabling flow director.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-qedr-infrastructure' · b462d22b
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      qed*: Add qedr infrastructure support
      
      In the last couple of weeks we've been sending RFCs for the qedr
      driver - the RoCE driver for QLogic FastLinQ 4xxxx line of adapters.
      Latest RFC can be found at [1].
      
      At Doug's advice [2], we've decided to split the series into two:
       - first part contains the qed backbone that's necessary for all the
      configurations relating to the qedr driver, as well as the qede
      infrastructure that is used for communication between the qedr and qede.
       - Second part consists of the actual qedr driver and introduces almost
      no changes to qed/qede.
      
      This is the first of said two parts. The second half would be sent
      later this week.
      
      The only 'oddity' in the division is the Kconfig options -
      As this series introduces both LL2 and QEDR-based logic in qed/qede,
      I wanted to add the CONFIG_INFINIBAND_QEDR option here [with default n].
      Otherwise, a lot of the code introduced would be dead-code [won't even
      be compiled] until qedr is accepted.
      As a result I've placed the config option in an odd place - under
      qlogic's Kconfig. The second series would then remove that option
      and add it in its correct place under the infiniband Kconfig.
      [I'm fine with pushing it there to begin with, but I didn't want to
      'contaminate' non-qlogic configuration files with half-baked options].
      
      Dave - I don't think you were E-mailed with Doug's suggestion.
      I think the notion was to have the two halves accepted side-by-side,
      but actually the first has no dependency issues, so it's also
      possible to simply take this first to net-next, and push the qedr
      into rdma once it's merged. But it's basically up to you and Doug;
      We'd align with whatever suits you best.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add RoCE ll2 & GSI support · abd49676
      Ram Amrani authored
      Add the RoCE-specific LL2 logic [as well as GSI support] over
      the 'generic' LL2 interface.
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add support for memory registration verbs · ee8eaea3
      Ram Amrani authored
      Add slowpath configuration support for user, dma and memory
      regions registration.
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add support for QP verbs · f1093940
      Ram Amrani authored
      Add support for the slowpath configurations of Queue Pair verbs
      which adds, deletes, modifies and queries Queue Pairs.
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: PD,PKEY and CQ verb support · c295f86e
      Ram Amrani authored
      Add support for the configurations of the protection domain and
      completion queues.
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add support for RoCE hw init · 51ff1725
      Ram Amrani authored
      This adds the backbone required for the various HW initializations
      which are necessary for the qedr driver - FW notification, resource
      initializations, etc.
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qede: Add qedr framework · cee9fbd8
      Ram Amrani authored
      Adds a skeletal implementation of the qede RoCE driver -
      The qedr has some dependencies on the state of the underlying base
      interface. This adds some logic required for mutual registrations
      and the ability to pass updates on 'interesting' events.
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add Light L2 support · 0a7fb11c
      Yuval Mintz authored
      Other protocols besides the networking driver need the ability
      to pass some L2 traffic, usually [although not exclusively] for the
      purpose of some management traffic.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
      Signed-off-by: Ram Amrani <Ram.Amrani@caviumnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: fix sideband flow director vector allocation · abd97a94
      Stefan Assmann authored
      Currently if the MSI-X vector limit is reached the sideband flow
      director gets disabled. That is a bit too early to make that decision, as
      vectors may get re-distributed. So move the check further back.
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: fix MSI-X vector redistribution if hw limit is reached · 4ce20abc
      Stefan Assmann authored
      The driver allocates 1 vector per CPU thread and the current hardware
      limit for vectors is 129 per PF. On systems with 128 or more threads
      this currently means all vectors are used by the PF leaving no room for
      additional features like VMDq, iWARP, etc...
      The code that should redistribute the vectors in this case is broken and
      never triggers. Fixed the code so that it actually triggers if the
      hardware limit is reached and adjust the number of queue pairs
      accordingly.
      Also the number of initially requested iWARP vectors was not properly
      saved when the vector limit was reached, and therefore always zero.
      
      Comparison with debug statement.
      Before:
      i40e 0000:2d:00.0: VMDq disabled, not enough MSI-X vectors
      i40e 0000:2d:00.0: IWARP disabled, not enough MSI-X vectors
      i40e 00.0 MSI-X vector distribution: PF 128, VMDq 0, FDSB 0, iWARP 0
      After:
      i40e 0000:2d:00.0: MSI-X vector limit reached, attempting to redistribute vectors
      i40e 00.0 MSI-X vector distribution: PF 78, VMDq 8, FDSB 0, iWARP 42
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: check if vectors are already depleted when doing VMDq allocation · 9ca57e97
      Stefan Assmann authored
      During MSI-X vector allocation for VMDq, a check for "no vectors left"
      was missing; add it. This prevents more vectors from being allocated than
      are available.
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ptp: Fix resource leak in case of error · b9118b72
      Christophe Jaillet authored
      A call to 'ida_simple_remove()' is missing in the error handling path.
      
      This has been spotted with the following coccinelle script, which tries to
      detect missing 'ida_simple_remove()' calls in error handling paths.
      
      ///////////////
      @@
      expression x;
      identifier l;
      @@
      
      *   x = ida_simple_get(...);
          ...
          if (...) {
          ...
          }
          ...
          if (...) {
             ...
             goto l;
          }
          ...
      *   l: ... when != ida_simple_remove(...);
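
      The class of bug the script looks for can be modelled in userspace.
      This is a sketch under assumptions: demo_id_get() and demo_id_put() are
      hypothetical stand-ins for an ID allocator, and demo_clock_register()
      is an invented function, not the ptp code:

```c
#include <stdbool.h>

#define DEMO_MAX_IDS 4

static bool demo_id_used[DEMO_MAX_IDS];

/* Hand out the lowest free ID, or -1 if none is left. */
static int demo_id_get(void)
{
    for (int i = 0; i < DEMO_MAX_IDS; i++) {
        if (!demo_id_used[i]) {
            demo_id_used[i] = true;
            return i;
        }
    }
    return -1;
}

static void demo_id_put(int id)
{
    demo_id_used[id] = false;
}

/*
 * Register a hypothetical clock. fail_late simulates a later setup step
 * failing after the ID has already been allocated.
 */
static int demo_clock_register(bool fail_late)
{
    int id, err;

    id = demo_id_get();
    if (id < 0)
        return -1;

    if (fail_late) {
        err = -2;
        goto no_setup;
    }
    return id;

no_setup:
    demo_id_put(id);  /* the release that is easy to forget on this path */
    return err;
}
```

      Without the demo_id_put() call under the label, every failed
      registration would permanently consume one ID.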
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>