1. 03 May, 2020 12 commits
  2. 02 May, 2020 9 commits
  3. 01 May, 2020 19 commits
    • selftests/bpf: Use reno instead of dctcp · 57dc6f3b
      Stanislav Fomichev authored
      Andrey pointed out that we can use reno instead of dctcp for the CC
      tests and drop the CONFIG_TCP_CONG_DCTCP=y requirement.
      
      Fixes: beecf11b ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr")
      Suggested-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
    • net/smc: llc_add_link_work to handle ADD_LINK LLC requests · b45e7f98
      Karsten Graul authored
      Introduce a work that is scheduled when a new ADD_LINK LLC request is
      received. The work will call either the SMC client or SMC server
      ADD_LINK processing.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: allocate index for a new link · 8574cf40
      Karsten Graul authored
      Add smc_llc_alloc_alt_link() to find a free link index for a new link,
      depending on the new link group type. And update constants for the
      maximum number of links to 3 (2 symmetric and 1 dangling asymmetric link).
      These maximum numbers are the same as used by other implementations of the
      SMC-R protocol.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
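The idea of finding a free link index can be sketched as follows. This is a minimal illustration, not the kernel code: it assumes a fixed array of 3 link slots (as the commit message states) and leaves out the dependence on the new link group type. The names `SMC_LINKS_PER_LGR_MAX` and `find_free_link_index`, and the convention that `None` marks an unused slot, are assumptions made for this example.

```python
from typing import List, Optional

# Assumption: 3 link slots per link group (2 symmetric + 1 asymmetric),
# as stated in the commit message. The constant name is illustrative.
SMC_LINKS_PER_LGR_MAX = 3


def find_free_link_index(links: List[Optional[str]]) -> Optional[int]:
    """Return the first unused slot index, or None if every slot is taken.

    An unused slot is modeled as None (hypothetical convention).
    """
    for idx, link in enumerate(links[:SMC_LINKS_PER_LGR_MAX]):
        if link is None:
            return idx
    return None


# One link in use, two slots free: the first free slot is index 1.
print(find_free_link_index(["link-a", None, None]))
```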
    • net/smc: introduce smc_pnet_find_alt_roce() · 6c868a3e
      Karsten Graul authored
      Introduce a new function in smc_pnet.c that searches for an alternate
      IB device, using an existing link group and a primary IB device. The
      alternate IB device needs to be active and must have the same PNETID
      as the link group.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
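The selection rule described above (active, same PNETID as the link group, not the primary device) can be sketched like this. The `IbDevice` model, its field names and `find_alt_device` are hypothetical and only illustrate the matching conditions; the sketch treats each device as a single unit and ignores per-port details.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class IbDevice:
    # Hypothetical model of an IB device; field names are illustrative.
    name: str
    pnetid: str
    active: bool


def find_alt_device(devices: Iterable[IbDevice], primary: IbDevice,
                    lgr_pnetid: str) -> Optional[IbDevice]:
    """Pick an active device with the link group's PNETID, skipping the primary."""
    for dev in devices:
        if dev == primary:
            continue
        if dev.active and dev.pnetid == lgr_pnetid:
            return dev
    return None


primary = IbDevice("ib0", "NET1", True)
candidates = [
    primary,
    IbDevice("ib1", "NET2", True),   # wrong PNETID
    IbDevice("ib2", "NET1", False),  # not active
    IbDevice("ib3", "NET1", True),   # matches
]
print(find_alt_device(candidates, primary, "NET1"))
```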
    • net/smc: remove DELETE LINK processing from smc_core.c · 33d20330
      Karsten Graul authored
      Support for multiple links makes the former DELETE LINK processing
      obsolete which sent one DELETE_LINK LLC message for each single link.
      Remove this processing from smc_core.c.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: take link down instead of terminating the link group · 87523930
      Karsten Graul authored
      Use the introduced link down processing in all places where the link
      group is terminated and take down the affected link only.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: add smcr_port_err() and smcr_link_down() processing · 541afa10
      Karsten Graul authored
      Call smcr_port_err() when an IB event reports an inactive IB device.
      smcr_port_err() calls smcr_link_down() for all affected links.
      smcr_link_down() either triggers the local DELETE_LINK processing, or
      sends a DELETE_LINK LLC message to the SMC server to initiate the
      processing.
      The old handler function smc_port_terminate() is removed.
      Add helper smcr_link_down_cond() to take a link down conditionally, and
      smcr_link_down_cond_sched() to schedule the link_down processing to a
      work.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: add smcr_port_add() and smcr_link_up() processing · 1f90a05d
      Karsten Graul authored
      Call smcr_port_add() when an IB event reports a new active IB device.
      smcr_port_add() will start a work which either triggers the local
      ADD_LINK processing, or sends an ADD_LINK LLC message to the SMC server
      to initiate the processing.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: remember PNETID of IB device for later device matching · 35dcf7ec
      Karsten Graul authored
      The PNETID is needed to find an alternate link for a link group.
      Save the PNETID of the link that is used to create the link group for
      later device matching.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: mutex to protect the lgr against parallel reconfigurations · d5500667
      Karsten Graul authored
      Introduce llc_conf_mutex in the link group which is used to protect the
      buffers and lgr states against parallel link reconfiguration.
      This ensures that new connections do not start to register buffers with
      the links of a link group when link creation or termination is running.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: extend smc_llc_send_add_link() and smc_llc_send_delete_link() · fbed3b37
      Karsten Graul authored
      All LLC sends are done from worker context only, so remove the prep
      functions which were used to build the message before it was sent, and
      add the function content into the respective send function
      smc_llc_send_add_link() and smc_llc_send_delete_link().
      Extend smc_llc_send_add_link() to include the qp_mtu value in the LLC
      message, which is needed to establish a link after the initial link was
      created. Extend smc_llc_send_delete_link() to contain a link_id and a
      reason code for the link deletion in the LLC message, which is needed
      when a specific link should be deleted.
      And add the list of existing DELETE_LINK reason codes.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: map and register buffers for a new link · fb33d277
      Karsten Graul authored
      Introduce support to map and register all current buffers for a new
      link. smcr_buf_map_lgr() will map used buffers for a new link and
      smcr_buf_reg_lgr() can be called to register used buffers on the
      IB device of the new link.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: unmapping of buffers to support multiple links · 4a3641c1
      Karsten Graul authored
      With the support of multiple links that are created and cleared there
      is a need to unmap one link from all current buffers. Add unmapping by
      link and by rmb. And make smcr_link_clear() available to be called from
      the LLC layer.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: multiple link support for rmb buffer registration · 7562a13d
      Karsten Graul authored
      The CONFIRM_RKEY LLC processing handles all links in one LLC message.
      Move the call to this processing out of smcr_link_reg_rmb() which does
      processing per link, into smcr_lgr_reg_rmbs() which is responsible for
      link group level processing. Move smcr_link_reg_rmb() into module
      smc_core.c.
      From af_smc.c now call smcr_lgr_reg_rmbs() to register new rmbs on all
      available links.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Introduce-a-flow-gate-control-action-and-apply-IEEE' · 47c0b580
      David S. Miller authored
      Po Liu says:
      
      ====================
      Introduce a flow gate control action and apply IEEE
      
      Changes from V4:
      ----------------
      0001:
      Fixes and modifications following Vlad Buslov's suggestions:
      - Change spin_lock_bh() to spin_lock() since tcf_gate_act() already
      runs in softirq context.
      - Remove the spin lock protection in the ops->cleanup function.
      - Enable CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING checking,
      then fix the kzalloc flag type and a lock deadlock.
      - Change the kzalloc() flag from GFP_KERNEL to GFP_ATOMIC since the
      function runs under spin_lock protection.
      - Change the hrtimer mode from HRTIMER_MODE_ABS to
      HRTIMER_MODE_ABS_SOFT to avoid a deadlock.
      
      0002:
      Fixes and modifications following Vlad Buslov's suggestions:
      - Remove all rcu_read_lock protection since there are no RCU
      parameters.
      - Enable CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING checking,
      then check the kzalloc sleeping flag.
      - Change kzalloc to kcalloc for the array allocation and change the
      GFP_KERNEL flag to GFP_ATOMIC since the function holds a spin_lock.
      
      0003:
      - No changes.
      
      0004:
      - Rephrase the commit comments, acked by Claudiu Manoil.
      
      Changes from V3:
      ----------------
      0001:
      
      Fixes and modifications following Vlad Buslov's suggestions:
      - Remove struct gate_action and move its parameters into struct
      tcf_gate, aligned with the tc_action parameters. This avoids
      allocating RCU-type memory through a pointer.
      - Remove the spin_lock entry_lock, which is no longer needed; use the
      tcf_lock that the system provides.
      - Provide lockdep protection for the status parameters in
      tcf_gate_act().
      - Remove the warning for a cycletime input of 0 and return an error
      directly.
      
      And:
      - Remove Qci related description in the Kconfig for gate action.
      
      0002:
      - Fix the rcu_read_lock protection range, as suggested by Vlad Buslov.
      
      0003:
      - No changes.
      
      0004:
      - Fix the bug where the gate maxoct wildcard condition was not included.
      - Fix the basetime calculation for a time in the past, reported by
      Vladimir Oltean.
      
      Changes from V2:
      0001: No changes.
      0002: No changes.
      0003: No changes.
      0004: Fix the vlan id filter parameter and add a filter in the driver
      that rejects src mac FF-FF-FF-FF-FF-FF.
      
      Changes from V1:
      ----------------
      0000: Update the description to make it clearer.
      0001: Removed the 'add update dropped stats' patch; it will be provided
      as a standalone pull request.
      0001: Update the commit description to make it clearer, acked by Jiri Pirko.
      0002: No changes
      0003: Fix some code style, acked by Jiri Pirko.
      0004: Fix the enetc_psfp_enable/disable parameter type, acked by test robot
      
      iproute2 command patches:
        Not attached to this patch series; a separate pull request will
      follow after the kernel accepts these patches.
      
      Changes from RFC:
      -----------------
      0000: Reduce to 5 patches and remove the 4 max frame size offload and
      flow metering in the policing offload action; only keep the gate action
      offloading implementation.
      0001: No changes.
      0002:
       - Fix the kfree leak, acked by Jakub Kicinski and Cong Wang
       - License fix from Jakub Kicinski and Stephen Hemminger
       - Update the example in the commit, acked by Vinicius Costa Gomes
       - Fix the RCU protection in tcf_gate_act(), acked by Vinicius
      
      0003: No changes
      0004: No changes
      0005:
       Acked by Vinicius Costa Gomes
       - Use refcount kernel lib
       - Update stream gate check code position
       - Update the reduce ref names to be clearer
      
      iproute2 command patches:
      0000: Update license expression and add gate id
      0001: Add tc action gate man page
      
      --------------------------------------------------------------------
      These patches add the stream gate action of IEEE 802.1Qci (Per-Stream
      Filtering and Policing), with software support and hardware offload
      support in tc flower, and implement the stream identify, stream
      filtering and stream gate filtering actions in the NXP ENETC ethernet
      driver.
      Per-Stream Filtering and Policing (PSFP) specifies flow policing and
      filtering for ingress flows, and has three main parts:
       1. The stream filter instance table consists of an ordered list of
      stream filters that determine the filtering and policing actions that
      are to be applied to frames received on a specific stream. The main
      elements are the stream gate id, the flow metering id and the maximum
      SDU size.
       2. The stream gate function sets up a gate list to control the
      open/close state of the ingress traffic class. While the gate is open
      the flow can pass; while the gate is closed frames are dropped. The
      user sets a basetime to tell the gate when to start running the entry
      list, after which the hardware repeats the list periodically. There is
      no comparable qdisc action support.
       3. Flow metering is a two-rate, two-bucket, three-color marker for
      policing the frames. Flow metering instances are as specified by the
      algorithm in MEF 10.3. The closest qdisc action is the policing
      action.
      
      The first patch introduces an ingress frame flow control gate action
      for point 2. The tc gate action maintains the open/close state gate
      list, allowing flows to pass when the gate is open. Each gate action
      may police one or more qdisc filters. When the start time arrives, the
      driver repeats the gate list periodically. If the user assigns a start
      time in the past, the driver calculates a new future start time from
      the cycletime of the gate list.
      
      The 0002 patch introduces the gate flow hardware offloading.
      
      The 0003 patch adds support for switching the tc flower offloading
      on/off with ethtool.
      
      The 0004 patch implements the stream identify, stream filtering and
      stream gate filtering actions in the NXP ENETC ethernet driver. The tc
      filter command provides filtering keys with the MAC address and VLAN
      id. These keys are set in a stream identify instance entry. The stream
      gate instance entry refers to the gate action parameters. The stream
      filter instance entry refers to the stream gate index and assigns a
      stream handle value that matches the stream identify instance.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: add tc flower psfp offload driver · 888ae5a3
      Po Liu authored
      This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
      function. Four main feature blocks implement flow policing and
      filtering for ingress flows with the IEEE 802.1Qci features: stream
      identify (defined in P802.1cb, but needed for 802.1Qci), stream
      filtering, stream gate and flow metering. Each function block holds
      many entries, addressed by index, to which parameters are assigned.
      A frame is first filtered by stream identify, then flows into the
      stream filter block through the handle shared between stream identify
      and stream filtering. It then flows into the stream gate that the
      stream filtering entry assigns, where it is policed by the gate and
      (optionally) limited by the max SDU in the filter block. Finally it is
      policed by the flow metering block, whose index is chosen in the
      filtering block.
      So each entry of a block may be linked from many upstream entries:
      assigning the same index means several streams share the same
      stream filtering, stream gate or flow metering entry.
      To implement these features, each stream is filtered by
      source/destination MAC address, for some streams combined with the
      VLAN id, and is treated as one flow chain. The chain is identified by
      the chain_index that already exists in the tc filter concept. The
      driver maintains this chain together with the gate modules. A stream
      filter entry is created from the gate index, the (optional) flow meter
      entry id and a priority value.
      Offloading only transfers the gate action and flow filtering
      parameters. The driver creates one stream filter entry (or finds an
      existing one with the same gate id, flow meter id and priority) and
      sets it in the hardware, so stream filtering does not need to be
      transferred by the action offloading.
      This architecture mirrors the relationship between tc filters and
      actions: the tc filter maintains the list for each flow feature by
      keys, and the actions are maintained in the action list.
      
      Below is an example using tc commands:
      > tc qdisc add dev eth0 ingress
      > ip link set eth0 address 10:00:80:00:00:00
      > tc filter add dev eth0 parent ffff: protocol ip chain 11 \
      	flower skip_sw dst_mac 10:00:80:00:00:00 \
      	action gate index 10 \
      	sched-entry open 200000000 1 8000000 \
      	sched-entry close 100000000 -1 -1
      
      The command sets dst_mac 10:00:80:00:00:00 at index 11 of the stream
      identify module, then sets gate index 10 of the stream gate module.
      It keeps the gate open for 200ms, limits the traffic volume to 8MB in
      this sched-entry, and directs the frames to ingress queue 1.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
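The "create or find an existing entry" step for stream filter entries can be sketched as a small lookup table keyed by gate id, flow meter id and priority. This is only an illustration of the sharing described in the commit message: the class `StreamFilterTable`, its methods, the sequential index allocation and the reference counting are assumptions for this example and are not taken from the driver.

```python
from typing import Dict, Optional, Tuple

# Key: (gate id, flow meter id or None, priority). All names are hypothetical.
FilterKey = Tuple[int, Optional[int], int]


class StreamFilterTable:
    def __init__(self) -> None:
        self._index_by_key: Dict[FilterKey, int] = {}
        self._refs: Dict[int, int] = {}
        self._next_index = 0

    def get(self, gate_id: int, meter_id: Optional[int], prio: int) -> int:
        """Return the index of a matching entry, creating it if needed."""
        key = (gate_id, meter_id, prio)
        index = self._index_by_key.get(key)
        if index is None:
            # No entry with this gate/meter/priority yet: allocate a new one.
            index = self._next_index
            self._next_index += 1
            self._index_by_key[key] = index
            self._refs[index] = 0
        # Each stream that uses the entry takes one reference.
        self._refs[index] += 1
        return index

    def refcount(self, index: int) -> int:
        return self._refs[index]


table = StreamFilterTable()
first = table.get(gate_id=10, meter_id=None, prio=1)
second = table.get(gate_id=10, meter_id=None, prio=1)   # shares the entry
other = table.get(gate_id=12, meter_id=None, prio=1)    # different gate
print(first, second, other, table.refcount(first))
```

Two streams that name the same gate, meter and priority end up on one shared entry, which matches the statement that one entry of a block may be linked from many upstream entries.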
    • net: enetc: add hw tc hw offload features for PSPF capability · 79e49982
      Po Liu authored
      This patch lets ethtool enable/disable the tc flower offload
      features. The ENETC hardware has the PSFP feature for per-stream
      policing. When the tc hw offloading feature is enabled, the driver
      enables the IEEE 802.1Qci feature. This only sets the register enable
      bit for the feature; it does not enable any per-stream filtering,
      stream gate or stream identify entry, but it reads the capabilities
      of each feature.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: schedule: add action gate offloading · d29bdd69
      Po Liu authored
      Add the gate action to the flow action entry. Add the gate parameters
      in tc_setup_flow_action() to the entries of the flow_action_entry
      array provided to the driver.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qos: introduce a gate control flow action · a51c328d
      Po Liu authored
      Introduce an ingress frame gate control flow action.
      The tc gate action works like this:
      Assume there is a gate that lets specified ingress frames pass during
      certain time slots and drops them during others. The tc filter selects
      the ingress frames, and the tc gate action specifies in which time
      slots these frames are passed to the device and in which time slots
      they are dropped.
      The tc gate action provides an entry list that tells how long the gate
      stays open and how long it stays closed. The gate action also assigns
      a start time that tells when the entry list starts. The driver then
      repeats the gate entry list cyclically.
      For the software simulation, the gate action requires the user to
      assign a clock type.
      
      Below is a configuration example in user space. The tc filter matches
      a stream whose source IP address is 192.168.0.20, and the gate action
      has two time slots. One keeps the gate open for 200ms to let frames
      pass; the other keeps the gate closed for 100ms so frames are dropped.
      Once the ingress frames exceed a total of 8000000 bytes, the excess
      frames are dropped in that 200000000ns time slot.
      
      > tc qdisc add dev eth0 ingress
      
      > tc filter add dev eth0 parent ffff: protocol ip \
      	   flower src_ip 192.168.0.20 \
      	   action gate index 2 clockid CLOCK_TAI \
      	   sched-entry open 200000000 -1 8000000 \
      	   sched-entry close 100000000 -1 -1
      
      > tc chain del dev eth0 ingress chain 0
      
      "sched-entry" follows the taprio naming style. The gate state is
      "open" or "close", followed by the period in nanoseconds. The next
      item is the internal priority value, which selects the ingress queue
      the frames are put in; "-1" means wildcard. The last value is optional
      and specifies the maximum number of MSDU octets that are permitted to
      pass the gate during the specified time interval.
      If base-time is not set it defaults to 0, so the start time is
      ((N + 1) * cycletime), the earliest such time in the future.
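The sched-entry semantics described above can be modeled in a few lines. This is a minimal sketch, not the kernel implementation: the `SchedEntry` class, `entry_at` and `frame_passes` are hypothetical names, and the sketch assumes that the cycle time equals the sum of the entry intervals and that the octet budget applies per interval.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedEntry:
    is_open: bool          # gate state: "open" or "close"
    interval_ns: int       # how long the entry lasts
    ipv: int = -1          # internal priority value, -1 = wildcard
    max_octets: int = -1   # octet budget for the interval, -1 = wildcard


def entry_at(entries: List[SchedEntry], offset_ns: int) -> SchedEntry:
    """Return the entry that is active offset_ns after the schedule start."""
    # Assumption: the cycle time is the sum of all entry intervals.
    cycle = sum(e.interval_ns for e in entries)
    pos = offset_ns % cycle
    for entry in entries:
        if pos < entry.interval_ns:
            return entry
        pos -= entry.interval_ns
    raise AssertionError("unreachable: pos always falls inside the cycle")


def frame_passes(entry: SchedEntry, octets_so_far: int, frame_len: int) -> bool:
    """A frame passes if the gate is open and the octet budget is not exceeded."""
    if not entry.is_open:
        return False
    if entry.max_octets < 0:
        return True
    return octets_so_far + frame_len <= entry.max_octets


# The two entries from the example: open 200ms with an 8000000 octet budget,
# then close for 100ms.
schedule = [SchedEntry(True, 200_000_000, -1, 8_000_000),
            SchedEntry(False, 100_000_000)]
print(entry_at(schedule, 250_000_000).is_open)
```

With this schedule, an offset of 250ms falls in the close entry, and an offset of 310ms has wrapped into the next cycle and falls in the open entry again.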
      
      The example below filters a stream whose destination MAC address is
      10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
      action. The gate action runs with a single close time slot, which
      means the gate always stays closed. The total cycle time is
      200000000ns. The start time is calculated as:
      
       1357000000000 + (N + 1) * cycletime
      
      The first such value that lies in the future becomes the start time.
      The cycletime is 200000000ns in this case.
      
      > tc filter add dev eth0 parent ffff:  protocol ip \
      	   flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
      	   action gate index 12 base-time 1357000000000 \
      	   sched-entry close 200000000 -1 -1 \
      	   clockid CLOCK_TAI
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
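The start-time rule from the commit message, base-time + (N + 1) * cycletime with the smallest N that gives a future time, can be worked through as follows. The function name `gate_start_time` is hypothetical, and two details are assumptions for this sketch: a base-time that is already in the future is used unchanged, and a non-positive cycletime is rejected.

```python
def gate_start_time(base_ns: int, cycle_ns: int, now_ns: int) -> int:
    """Return the first base + (N + 1) * cycle that lies after now."""
    if cycle_ns <= 0:
        raise ValueError("cycletime must be positive")
    if base_ns > now_ns:
        # Assumption: a base-time in the future is used as the start time.
        return base_ns
    # N counts the complete cycles that have elapsed since base-time.
    n = (now_ns - base_ns) // cycle_ns
    return base_ns + (n + 1) * cycle_ns


# base-time 1357000000000 and cycletime 200000000ns, as in the example above.
# "now" is an arbitrary illustrative value 450ms after base-time, so N = 2.
print(gate_start_time(1_357_000_000_000, 200_000_000, 1_357_450_000_000))
```

With the illustrative clock value, two full cycles have elapsed, so the start time is base-time plus three cycles, 1357600000000.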