1. 23 Apr, 2019 3 commits
  2. 15 Apr, 2019 1 commit
  3. 08 Apr, 2019 11 commits
    • xfrm: store xfrm_mode directly, not its address · c9500d7b
      Florian Westphal authored
      This structure is now only 4 bytes, so it's more efficient
      to cache a copy rather than its address.
      
      No significant size difference in allmodconfig vmlinux.
      
      With non-modular kernel that has all XFRM options enabled, this
      series reduces vmlinux image size by ~11kb. All xfrm_mode
      indirections are gone and all modes are built-in.
      
      before (ipsec-next master):
          text      data      bss         dec   filename
      21071494   7233140 11104324    39408958   vmlinux.master
      
      after this series:
      21066448   7226772 11104324    39397544   vmlinux.patched
      
      With allmodconfig kernel, the size increase is only 362 bytes,
      even though all the xfrm config options removed in this series are
      modular.
      
      before:
          text      data     bss      dec   filename
      15731286   6936912 4046908 26715106   vmlinux.master
      
      after this series:
      15731492   6937068  4046908  26715468 vmlinux
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: make xfrm modes builtin · 4c145dce
      Florian Westphal authored
      After the previous changes, xfrm_mode no longer contains any function
      pointers, and the modules defining such a struct contain no code except
      init/exit functions that register the xfrm_mode struct with the xfrm core.
      
      Just place the xfrm modes in the core and remove the modules;
      the run-time xfrm_mode register/unregister functionality is removed.
      
      Before:
      
          text    data     bss      dec filename
          7523     200    2364    10087 net/xfrm/xfrm_input.o
         40003     628     440    41071 net/xfrm/xfrm_state.o
      15730338 6937080 4046908 26714326 vmlinux
      
      After:
      
          7389     200    2364    9953  net/xfrm/xfrm_input.o
         40574     656     440   41670  net/xfrm/xfrm_state.o
      15730084 6937068 4046908 26714060 vmlinux
      
      The xfrm*_mode_{transport,tunnel,beet} modules are gone.
      
      v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
          ones rather than removing them.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove afinfo pointer from xfrm_mode · 733a5fac
      Florian Westphal authored
      Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from
      ipv6 in case of CONFIG_IPV6=m.
      
      This change has virtually no effect on vmlinux size, but it reduces
      afinfo size and allows followup patch to make xfrm modes const.
      
      v2: mark if (afinfo) tests as likely (Sabrina)
          re-fetch afinfo according to inner_mode in xfrm_prepare_input().
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove output2 indirection from xfrm_mode · 1de70830
      Florian Westphal authored
      similar to previous patch: no external module dependencies,
      so we can avoid the indirection by placing this in the core.
      
      This change removes the last indirection from xfrm_mode and the
      xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore.
      
      Before:
         text    data     bss     dec     hex filename
         3957     136       0    4093     ffd net/xfrm/xfrm_output.o
          587      44       0     631     277 net/ipv4/xfrm4_mode_beet.o
          649      32       0     681     2a9 net/ipv4/xfrm4_mode_tunnel.o
          625      44       0     669     29d net/ipv6/xfrm6_mode_beet.o
          599      32       0     631     277 net/ipv6/xfrm6_mode_tunnel.o
      After:
         text    data     bss     dec     hex filename
         5359     184       0    5543    15a7 net/xfrm/xfrm_output.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_beet.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_tunnel.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_beet.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_tunnel.o
      
      v2: fold the *encap_add functions into xfrm*_prepare_output
          preserve (move) output2 comment (Sabrina)
          use x->outer_mode->encap, not inner
          fix a build breakage on ppc (kbuild robot)
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove input2 indirection from xfrm_mode · b3284df1
      Florian Westphal authored
      No external dependencies on any module, place this in the core.
      Increase is about 1800 byte for xfrm_input.o.
      
      The beet helpers get added to internal header, as they can be reused
      from xfrm_output.c in the next patch (kernel contains several
      copies of them in the xfrm{4,6}_mode_beet.c files).
      
      Before:
         text    data     bss     dec filename
         5578     176    2364    8118 net/xfrm/xfrm_input.o
         1180      64       0    1244 net/ipv4/xfrm4_mode_beet.o
          171      40       0     211 net/ipv4/xfrm4_mode_transport.o
         1163      40       0    1203 net/ipv4/xfrm4_mode_tunnel.o
         1083      52       0    1135 net/ipv6/xfrm6_mode_beet.o
          172      40       0     212 net/ipv6/xfrm6_mode_ro.o
          172      40       0     212 net/ipv6/xfrm6_mode_transport.o
         1056      40       0    1096 net/ipv6/xfrm6_mode_tunnel.o
      
      After:
         text    data     bss     dec filename
         7373     200    2364    9937 net/xfrm/xfrm_input.o
          587      44       0     631 net/ipv4/xfrm4_mode_beet.o
          171      32       0     203 net/ipv4/xfrm4_mode_transport.o
          649      32       0     681 net/ipv4/xfrm4_mode_tunnel.o
          625      44       0     669 net/ipv6/xfrm6_mode_beet.o
          172      32       0     204 net/ipv6/xfrm6_mode_ro.o
          172      32       0     204 net/ipv6/xfrm6_mode_transport.o
          599      32       0     631 net/ipv6/xfrm6_mode_tunnel.o
      
      v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix
          AF_UNSPEC selector breakage (bisected by Benedict Wong)
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove gso_segment indirection from xfrm_mode · 7613b92b
      Florian Westphal authored
      These functions are small and we only have versions for tunnel
      and transport mode for ipv4 and ipv6 respectively.
      
      Just place the 'transport or tunnel' conditional in the protocol
      specific function instead of using an indirection.
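As a rough illustration of replacing an indirection with a conditional, the following user-space sketch dispatches on the mode directly. All names (`mode_sketch`, `segment_transport`, `segment_tunnel`, `segment_sketch`) are hypothetical, and the helper bodies are placeholders rather than real segmentation logic:

```c
#include <assert.h>

/* Hypothetical mode identifiers; the kernel uses its own constants. */
enum mode_sketch { MODE_TRANSPORT, MODE_TUNNEL, MODE_OTHER };

/* Placeholder per-mode helpers standing in for the segmentation code. */
static int segment_transport(int len)
{
	return len;
}

static int segment_tunnel(int len, int outer_hdr_len)
{
	return len - outer_hdr_len;
}

/*
 * Direct dispatch: the protocol-specific function tests the mode itself
 * instead of calling through a function pointer stored in a mode struct.
 * Returns -1 for modes that have no segmentation helper.
 */
static int segment_sketch(enum mode_sketch mode, int len, int outer_hdr_len)
{
	assert(len >= outer_hdr_len);

	switch (mode) {
	case MODE_TRANSPORT:
		return segment_transport(len);
	case MODE_TUNNEL:
		return segment_tunnel(len, outer_hdr_len);
	default:
		return -1;
	}
}
```

With only two cases per protocol, the switch stays short and lets the compiler inline the helpers, which an indirect call would normally prevent.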
      
      Before:
          3226       12       0     3238   net/ipv4/esp4_offload.o
          7004      492       0     7496   net/ipv4/ip_vti.o
          3339       12       0     3351   net/ipv6/esp6_offload.o
         11294      460       0    11754   net/ipv6/ip6_vti.o
          1180       72       0     1252   net/ipv4/xfrm4_mode_beet.o
           428       48       0      476   net/ipv4/xfrm4_mode_transport.o
          1271       48       0     1319   net/ipv4/xfrm4_mode_tunnel.o
          1083       60       0     1143   net/ipv6/xfrm6_mode_beet.o
           172       48       0      220   net/ipv6/xfrm6_mode_ro.o
           429       48       0      477   net/ipv6/xfrm6_mode_transport.o
          1164       48       0     1212   net/ipv6/xfrm6_mode_tunnel.o
      15730428  6937008 4046908 26714344   vmlinux
      
      After:
          3461       12       0     3473   net/ipv4/esp4_offload.o
          7000      492       0     7492   net/ipv4/ip_vti.o
          3574       12       0     3586   net/ipv6/esp6_offload.o
         11295      460       0    11755   net/ipv6/ip6_vti.o
          1180       64       0     1244   net/ipv4/xfrm4_mode_beet.o
           171       40       0      211   net/ipv4/xfrm4_mode_transport.o
          1163       40       0     1203   net/ipv4/xfrm4_mode_tunnel.o
          1083       52       0     1135   net/ipv6/xfrm6_mode_beet.o
           172       40       0      212   net/ipv6/xfrm6_mode_ro.o
           172       40       0      212   net/ipv6/xfrm6_mode_transport.o
          1056       40       0     1096   net/ipv6/xfrm6_mode_tunnel.o
      15730424  6937008 4046908 26714340   vmlinux
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove xmit indirection from xfrm_mode · 303c5fab
      Florian Westphal authored
      There are only two versions (tunnel and transport). The ip/ipv6 versions
      differ only in sizeof(iphdr) vs ipv6hdr.
      
      Place this in the core and use x->outer_mode->encap type to call the
      correct adjustment helper.
      
      Before:
         text   data    bss     dec      filename
      15730311  6937008 4046908 26714227 vmlinux
      
      After:
      15730428  6937008 4046908 26714344 vmlinux
      
      (about 117 byte increase)
      
      v2: use family from x->outer_mode, not inner
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove output indirection from xfrm_mode · 0c620e97
      Florian Westphal authored
      Same as input indirection.  Only exception: we need to export
      xfrm_outer_mode_output for pktgen.
      
      Increases size of vmlinux by about 163 byte:
      Before:
         text    data     bss     dec      filename
      15730208  6936948 4046908 26714064   vmlinux
      
      After:
      15730311  6937008 4046908 26714227   vmlinux
      
      xfrm_inner_extract_output has no more external callers, make it static.
      
      v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
          add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
          add WARN_ON_ONCE for 'call AF_INET6 related output function, but
          CONFIG_IPV6=n' case.
          make xfrm_inner_extract_output static
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove input indirection from xfrm_mode · c2d305e5
      Florian Westphal authored
      No need for any indirection or abstraction here, both functions
      are pretty much the same and quite small, they also have no external
      dependencies.
      
      xfrm_prepare_input can then be made static.
      
      With allmodconfig build, size increase of vmlinux is 25 byte:
      
      Before:
         text   data     bss     dec      filename
      15730207  6936924 4046908 26714039  vmlinux
      
      After:
      15730208  6936948 4046908 26714064 vmlinux
      
      v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca)
          change copied comment to refer to transport and network header,
          not skb->{h,nh}, which don't exist anymore. (Sabrina)
          make xfrm_prepare_input static (Eyal Birger)
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: prefer family stored in xfrm_mode struct · b45714b1
      Florian Westphal authored
      Now that we have the family available directly in the
      xfrm_mode struct, we can use that and avoid one extra dereference.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: place af number into xfrm_mode struct · b262a695
      Florian Westphal authored
      This will be useful to know if we're supposed to decode ipv4 or ipv6.
      
      While at it, make the unregister function return void, all module_exit
      functions did just BUG(); there is never a point in doing error checks
      if there is no way to handle such error.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  4. 24 Mar, 2019 4 commits
  5. 22 Mar, 2019 3 commits
    • genetlink: make policy common to family · 3b0f31f2
      Johannes Berg authored
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
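The arithmetic behind those numbers can be sketched with simplified structs. The types below (`op_before`, `op_after`, `family_after`, `policy_sketch`) are hypothetical and much smaller than the real genetlink structures. Assuming 8-byte pointers, the 832-byte reduction corresponds to 104 per-op pointers, and the net change is 824 bytes once the single family-level pointer is added back:

```c
#include <assert.h>
#include <stddef.h>

struct policy_sketch;	/* opaque, hypothetical policy type */

/* Before: every op carries its own policy pointer. */
struct op_before {
	int cmd;
	int flags;
	const struct policy_sketch *policy;
};

/* After: ops carry no policy pointer ... */
struct op_after {
	int cmd;
	int flags;
};

/* ... and the family holds a single pointer shared by all ops. */
struct family_after {
	const struct op_after *ops;
	size_t n_ops;
	const struct policy_sketch *policy;
};

/*
 * Net bytes saved for n_ops commands: the per-op size difference times
 * the number of ops, minus the one pointer added to the family.
 */
static long bytes_saved(size_t n_ops)
{
	assert(sizeof(struct op_before) > sizeof(struct op_after));

	return (long)(n_ops * (sizeof(struct op_before) - sizeof(struct op_after)))
		- (long)sizeof(const struct policy_sketch *);
}
```

On an ABI with 8-byte pointers, `bytes_saved(104)` evaluates to 824, matching the `dec` delta in the table above.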
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: use netif_start_queue instead of netif_wake_queue in rtl8169_start_xmit · 601ed4d6
      Heiner Kallweit authored
      Replace the call to netif_wake_queue in rtl8169_start_xmit with
      netif_start_queue. We don't need to actually wake up the queue because
      we are still mid-transmit; we only need to reset the bit so that it
      doesn't prevent the next transmit.
      (Description shamelessly copied from a mail sent by Alex.)
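The difference between the two calls can be modelled in user space. This is a simplified sketch with hypothetical names (`txq_sketch`, `txq_start`, `txq_wake`), not the kernel implementation: "start" only clears the stopped bit, while "wake" clears it and also requests a reschedule if the queue had been stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a TX queue. */
struct txq_sketch {
	bool stopped;		/* models the "driver stopped the queue" bit */
	unsigned int resched;	/* counts how often a reschedule was requested */
};

static void txq_stop(struct txq_sketch *q)
{
	assert(q != NULL);
	q->stopped = true;
}

/* Models "start": only clear the stopped bit. */
static void txq_start(struct txq_sketch *q)
{
	assert(q != NULL);
	q->stopped = false;
}

/* Models "wake": clear the bit and, if it was set, request a reschedule. */
static void txq_wake(struct txq_sketch *q)
{
	bool was_stopped;

	assert(q != NULL);
	was_stopped = q->stopped;
	q->stopped = false;
	if (was_stopped)
		q->resched++;
}
```

In the transmit path the caller is already sending, so the extra reschedule request modelled by `txq_wake` buys nothing; clearing the bit is enough.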
      Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: aquantia: add downshift support · 110a2432
      Heiner Kallweit authored
      Aquantia PHYs of the AQR107 family support the downshift feature.
      Add support for it as a standard PHY tunable so that it can be controlled
      via ethtool.
      The AQCS109 supports a proprietary 2-pair 1Gbps mode. If two such PHYs
      are connected to each other with a 2-pair cable, they may not be able
      to establish a link if both advertise modes > 1Gbps.
      
      v2:
      - add downshift event detection
      - warn if downshift occurred
      - read downshifted rate from vendor register
      - enable downshift per default on all AQR107 family members
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 21 Mar, 2019 18 commits
    • Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock' · 1d965c4d
      David S. Miller authored
      Vlad Buslov says:
      
      ====================
      Refactor flower classifier to remove dependency on rtnl lock
      
      Currently, all netlink protocol handlers for updating rules, actions and
      qdiscs are protected with single global rtnl lock which removes any
      possibility for parallelism. This patch set is a third step to remove
      rtnl lock dependency from TC rules update path.
      
      Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
      TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
      already registered with this flag and only take rtnl lock when qdisc or
      classifier requires it. Classifiers can indicate that their ops
      callbacks don't require caller to hold rtnl lock by setting the
      TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor
      flower classifier to support unlocked execution and register it with
      unlocked flag.
      
      This patch set implements following changes to make flower classifier
      concurrency-safe:
      
      - Implement reference counting for individual filters. Change fl_get to
        take reference to filter. Implement tp->ops->put callback that was
        introduced in cls API patch set to release reference to flower filter.
      
      - Use tp->lock spinlock to protect internal classifier data structures
        from concurrent modification.
      
      - Handle concurrent tcf proto deletion by returning EAGAIN, which will
        cause cls API to retry and create new proto instance or return error
        to the user (depending on message type).
      
      - Handle concurrent insertion of filter with same priority and handle by
        returning EAGAIN, which will cause cls API to lookup filter again and
        process it accordingly to netlink message flags.
      
      - Extend flower mask with reference counting and protect masks list with
        masks_lock spinlock.
      
      - Prevent concurrent mask insertion by inserting temporary value to
        masks hash table. This is necessary because mask initialization is a
        sleeping operation and cannot be done while holding tp->lock.
      
      Both chain level and classifier level conflicts are resolved by
      returning -EAGAIN to cls API, which results in a restart of the whole operation.
      This retry mechanism is a result of fine-grained locking approach used
      in this and previous changes in series and is necessary to allow
      concurrent updates on same chain instance. Alternative approach would be
      to lock the whole chain while updating filters on any of child tp's,
      adding and removing classifier instances from the chain. However, since
      most CPU-intensive parts of filter update code are specifically in
      classifier code and its dependencies (extensions and hw offloads), such
      approach would negate most of the gains introduced by this change and
      previous changes in the series when updating same chain instance.
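The retry mechanism can be sketched as a loop around the update step. Everything here is a hypothetical user-space model (`tp_sketch`, `tp_insert_sketch`, `update_with_retry`, and the `max_tries` bound are assumptions made for the sketch, not cls API behaviour): the update returns -EAGAIN when it finds the instance being deleted, and the caller replays the whole operation against a fresh instance.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classifier instance state for the sketch. */
struct tp_sketch {
	int deleting;	/* set when a concurrent delete is in progress */
	int nfilters;
};

/* Hypothetical update step: refuses to touch a dying instance. */
static int tp_insert_sketch(struct tp_sketch *tp)
{
	if (tp->deleting)
		return -EAGAIN;
	tp->nfilters++;
	return 0;
}

/*
 * Hypothetical caller: on -EAGAIN, stand in a fresh instance and replay
 * the whole operation. max_tries only bounds the sketch.
 */
static int update_with_retry(struct tp_sketch *tp, int max_tries)
{
	int err = -EAGAIN;

	assert(max_tries > 0);
	while (max_tries-- > 0) {
		err = tp_insert_sketch(tp);
		if (err != -EAGAIN)
			break;
		/* Stand-in for "look up or create a new proto instance". */
		tp->deleting = 0;
		tp->nfilters = 0;
	}
	return err;
}
```

The cost of an occasional replay is what the fine-grained approach trades for not holding a chain-wide lock across the expensive classifier code.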
      
      Tcf hw offloads API is not changed by this patch set and still requires
      caller to hold rtnl lock. Refactored flower classifier tracks rtnl lock
      state by means of 'rtnl_held' flag provided by cls API and obtains the
      lock before calling hw offloads. Following patch set will lift this
      restriction and refactor cls hw offloads API to support unlocked
      execution.
      
      With these changes flower classifier is safely registered with
      TCF_PROTO_OPS_DOIT_UNLOCKED flag in last patch.
      
      Changes from V2 to V3:
      - Rebase on latest net-next
      
      Changes from V1 to V2:
      - Extend cover letter with explanation about retry mechanism.
      - Rebase on current net-next.
      - Patch 1:
        - Use rcu_dereference_raw() for tp->root dereference.
        - Update comment in fl_head_dereference().
      - Patch 2:
        - Remove redundant check in fl_change error handling code.
        - Add empty line between error check and new handle assignment.
      - Patch 3:
        - Refactor loop in fl_get_next_filter() to improve readability.
      - Patch 4:
        - Refactor __fl_delete() to improve readability.
      - Patch 6:
        - Fix comment in fl_check_assign_mask().
      - Patch 9:
        - Extend commit message.
        - Fix error code in comment.
      - Patch 11:
        - Fix fl_hw_replace_filter() to always release rtnl lock in error
          handlers.
      - Patch 12:
        - Don't take rtnl lock before calling __fl_destroy_filter() in
          workqueue context.
        - Extend commit message with explanation why flower still takes rtnl
          lock before calling hardware offloads API.
      
      Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3>
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: set unlocked flag for flower proto ops · 92149190
      Vlad Buslov authored
      Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its
      ops callbacks don't require caller to hold rtnl lock. Don't take rtnl lock
      in fl_destroy_filter_work() that is executed on workqueue instead of being
      called by cls API and is not affected by setting
      TCF_PROTO_OPS_DOIT_UNLOCKED. Rtnl mutex is still manually taken by flower
      classifier before calling hardware offloads API that has not been updated
      for unlocked execution.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: track rtnl lock state · c24e43d8
      Vlad Buslov authored
      Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag
      to internal functions that need to know rtnl lock state. Take rtnl lock
      before calling tcf APIs that require it (hw offload, bind filter, etc.).
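The conditional locking pattern can be sketched in user space as follows. The lock is modelled by a nesting counter, and all names (`big_lock`, `locked_only_api`, `call_offload_sketch`) are hypothetical; the sketch only shows a flag deciding whether the callee must take and release the lock itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a global lock; counts nesting instead of blocking. */
static int big_lock_depth;

static void big_lock(void)
{
	big_lock_depth++;
}

static void big_unlock(void)
{
	assert(big_lock_depth > 0);
	big_lock_depth--;
}

/* Hypothetical API that must only be called with the lock held exactly once. */
static int locked_only_api(void)
{
	assert(big_lock_depth == 1);
	return 0;
}

/* Take the lock only when the caller does not already hold it. */
static int call_offload_sketch(bool lock_held)
{
	int err;

	if (!lock_held)
		big_lock();
	err = locked_only_api();
	if (!lock_held)
		big_unlock();
	return err;
}
```

Releasing only what the function itself acquired keeps the caller's lock state unchanged on every path, including error returns.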
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: protect flower classifier state with spinlock · 3d81e711
      Vlad Buslov authored
      struct tcf_proto was extended with spinlock to be used by classifiers
      instead of global rtnl lock. Use it to protect shared flower classifier
      data structures (handle_idr, mask hashtable and list) and fields of
      individual filters that can be accessed concurrently. This patch set uses
      tcf_proto->lock as per instance lock that protects all filters on
      tcf_proto.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: handle concurrent tcf proto deletion · 272ffaad
      Vlad Buslov authored
      Without rtnl lock protection tcf proto can be deleted concurrently. Check
      tcf proto 'deleting' flag after taking tcf spinlock to verify that no
      concurrent deletion is in progress. Return EAGAIN error if concurrent
      deletion detected, which will cause caller to retry and possibly create new
      instance of tcf proto.
      
      Retry mechanism is a result of fine-grained locking approach used in this
      and previous changes in series and is necessary to allow concurrent updates
      on same chain instance. Alternative approach would be to lock the whole
      chain while updating filters on any of child tp's, adding and removing
      classifier instances from the chain. However, since most CPU-intensive
      parts of filter update code are specifically in classifier code and its
      dependencies (extensions and hw offloads), such approach would negate most
      of the gains introduced by this change and previous changes in the series
      when updating same chain instance.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: handle concurrent filter insertion in fl_change · 9a2d9389
      Vlad Buslov authored
      Check if user specified a handle and another filter with the same handle
      was inserted concurrently. Return EAGAIN to retry filter processing (in
      case it is an overwrite request).
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: protect masks list with spinlock · 259e60f9
      Vlad Buslov authored
      Protect modifications of flower masks list with spinlock to remove
      dependency on rtnl lock and allow concurrent access.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: handle concurrent mask insertion · 195c234d
      Vlad Buslov authored
      Without rtnl lock protection masks with same key can be inserted
      concurrently. Insert temporary mask with reference count zero to masks
      hashtable. This will cause any concurrent modifications to retry.
      
      Wait for rcu grace period to complete after removing temporary mask from
      masks hashtable to accommodate concurrent readers.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: add reference counter to flower mask · f48ef4d5
      Vlad Buslov authored
      Extend fl_flow_mask structure with reference counter to allow parallel
      modification without relying on rtnl lock. Use rcu read lock to safely
      lookup mask and increment reference counter in order to accommodate
      concurrent deletes.
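One common way to make a lookup safe against concurrent deletes is to take a reference only if the counter is still non-zero. The sketch below uses C11 atomics in user space; `mask_sketch`, `mask_get_sketch` and `mask_put_sketch` are hypothetical names, and the sketch does not model the RCU read-side protection mentioned above:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct mask_sketch {
	atomic_uint refcnt;
};

/* Take a reference only if the object is still live (count > 0). */
static bool mask_get_sketch(struct mask_sketch *m)
{
	unsigned int old = atomic_load(&m->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&m->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool mask_put_sketch(struct mask_sketch *m)
{
	unsigned int old = atomic_fetch_sub(&m->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

A lookup that finds an object whose count has already reached zero treats it as gone, so a concurrent delete can never be undone by a late reader.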
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: track filter deletion with flag · b2552b8c
      Vlad Buslov authored
      In order to prevent double deletion of filter by concurrent tasks when rtnl
      lock is not used for synchronization, add 'deleted' filter field. Check
      value of this field when modifying filters and return error if concurrent
      deletion is detected.
      
      Refactor __fl_delete() to accept pointer to 'last' boolean as argument,
      and return error code as function return value instead. This is necessary
      to signal concurrent filter delete to caller.
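The shape of such a delete helper can be sketched in user space. The names (`filter_sketch`, `head_sketch`, `filter_delete_sketch`) and the choice of -ENOENT are assumptions made for the sketch. In real concurrent code the flag test and update would run under the classifier lock; the sketch is single-threaded and only shows the interface: 'last' is reported through a pointer and the return value carries the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter with a 'deleted' marker. */
struct filter_sketch {
	bool deleted;
};

/* Hypothetical classifier head that tracks how many filters remain. */
struct head_sketch {
	int nfilters;
};

/*
 * Delete a filter once. Reports through *last whether the head became
 * empty, and returns an error code if the filter was already deleted.
 */
static int filter_delete_sketch(struct head_sketch *head,
				struct filter_sketch *f, bool *last)
{
	assert(last != NULL);
	*last = false;

	if (f->deleted)
		return -ENOENT;

	f->deleted = true;
	head->nfilters--;
	*last = (head->nfilters == 0);
	return 0;
}
```

A second delete of the same filter fails without touching the head, which is how a caller learns that another task won the race.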
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: introduce reference counting for filters · 06177558
      Vlad Buslov authored
      Extend flower filters with reference counting in order to remove dependency
      on rtnl lock in flower ops and allow to modify filters concurrently.
      Reference to flower filter can be taken/released concurrently as soon as it
      is marked as 'unlocked' by last patch in this series. Use atomic reference
      counter type to make concurrent modifications safe.
      
      Always take reference to flower filter while working with it:
      - Modify fl_get() to take reference to filter.
      - Implement tp->put() callback as fl_put() function to allow cls API to
      release reference taken by fl_get().
      - Modify fl_change() to assume that caller holds reference to fold and take
      reference to fnew.
      - Take reference to filter while using it in fl_walk().
      
      Implement helper functions to get/put filter reference counter.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: refactor fl_change · 620da486
      Vlad Buslov authored
      In preparation for using the classifier spinlock instead of relying on
      the external rtnl lock, rearrange code in fl_change. The goal is to group
      the code which changes classifier state in a single block in order to
      allow following commits in this set to protect it from parallel
      modification with tp->lock. Data structures that require tp->lock
      protection are the mask hashtable, the filters list, and the classifier
      handle_idr.
      
      fl_hw_replace_filter() is a sleeping function and cannot be called while
      holding a spinlock. In order to execute the whole sequence of changes to
      shared classifier data structures atomically, call fl_hw_replace_filter()
      before modifying them.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      620da486
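The ordering constraint described in the commit above (do the work that may sleep first, then make all shared-state changes in one locked block) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the flower code: demo_cls, demo_hw_replace(), and demo_change() are hypothetical names, the lock is a simple C11 atomic_flag spin loop, and the lock_held field is a plain bool that is only meaningful in a single-threaded demonstration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical classifier state; the lock stands in for tp->lock. */
struct demo_cls {
    atomic_flag lock;
    bool lock_held;   /* debugging aid, valid for single-threaded demos only */
    int nfilters;     /* shared state protected by the lock */
    int last_handle;  /* shared state protected by the lock */
};

static void demo_cls_init(struct demo_cls *c)
{
    atomic_flag_clear(&c->lock);
    c->lock_held = false;
    c->nfilters = 0;
    c->last_handle = 0;
}

static void demo_lock(struct demo_cls *c)
{
    while (atomic_flag_test_and_set(&c->lock))
        ; /* spin until the flag is released */
    c->lock_held = true;
}

static void demo_unlock(struct demo_cls *c)
{
    c->lock_held = false;
    atomic_flag_clear(&c->lock);
}

/* Stand-in for a call that may sleep: it must never run under the lock. */
static int demo_hw_replace(struct demo_cls *c, int handle)
{
    assert(!c->lock_held);
    return handle < 0 ? -1 : 0;
}

/* Do the sleeping step first, then apply every state change in one block. */
static int demo_change(struct demo_cls *c, int handle)
{
    int err = demo_hw_replace(c, handle);

    if (err)
        return err; /* nothing was modified, so nothing to roll back */

    demo_lock(c);
    c->nfilters++;
    c->last_handle = handle;
    demo_unlock(c);
    return 0;
}
```

One consequence of this ordering, visible in the sketch, is that a failure in the sleeping step leaves the shared state untouched, so no rollback under the lock is needed.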
    • Vlad Buslov's avatar
      net: sched: flower: don't check for rtnl on head dereference · e474619a
      Vlad Buslov authored
      The flower classifier only changes the root pointer during init and
      destroy. Cls API implements reference counting for tcf_proto, so there is
      no danger of concurrent access to tp when it is being destroyed, even
      without the protection provided by rtnl lock.
      
      Implement a new function fl_head_dereference() to dereference tp->root
      without checking for rtnl lock. Use it in all flower functions that obtain
      the head pointer instead of rtnl_dereference().
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e474619a
    • Jakub Kicinski's avatar
      nfp: remove defines for unused control bits · 31f1a0e3
      Jakub Kicinski authored
      The NFP driver ABI contains bits for L2 switching which were never
      implemented in their initially envisioned form.
      
      Remove the defines, and open up the possibility of
      reclaiming the bits for other uses.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      31f1a0e3
    • David S. Miller's avatar
      Merge branch 'rhashtable-cleanups' · 143eb9ac
      David S. Miller authored
      NeilBrown says:
      
      ====================
      Two clean-ups for rhashtable.
      
      These two patches make small improvements to
      rhashtable, but are otherwise unrelated.
      
      Thanks to Herbert, Miguel, and Paul for the review.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      143eb9ac
    • NeilBrown's avatar
      rhashtable: rename rht_for_each*continue as *from. · f7ad68bf
      NeilBrown authored
      The pattern set by list.h is that for_each..continue()
      iterators start at the next entry after the given one,
      while for_each..from() iterators start at the given
      entry.
      
      The rht_for_each*continue() iterators are documented as though they
      start at the 'next' entry, but actually start at the given entry,
      and they are used expecting that behaviour.
      So fix the documentation and change the names to *from for consistency
      with list.h.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7ad68bf
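The naming distinction in the commit above (a "from" iterator visits the given entry, a "continue" iterator starts at the entry after it) can be shown with a small sketch on a plain singly linked list. The macros and types here (demo_node, demo_for_each_from, demo_for_each_continue) are hypothetical illustrations and are not the list.h or rhashtable macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked node, used only for this illustration. */
struct demo_node {
    int val;
    struct demo_node *next;
};

/* "from": visit the given entry first, then every entry after it. */
#define demo_for_each_from(pos) \
    for (; (pos) != NULL; (pos) = (pos)->next)

/* "continue": skip the given entry and start at the one after it. */
#define demo_for_each_continue(pos) \
    for ((pos) = (pos)->next; (pos) != NULL; (pos) = (pos)->next)

/* Sum the values starting at 'pos' itself. */
static int demo_sum_from(struct demo_node *pos)
{
    int sum = 0;

    demo_for_each_from(pos)
        sum += pos->val;
    return sum;
}

/* Sum the values starting after 'pos'; 'pos' must not be NULL. */
static int demo_sum_continue(struct demo_node *pos)
{
    int sum = 0;

    assert(pos != NULL);
    demo_for_each_continue(pos)
        sum += pos->val;
    return sum;
}
```

For a list holding 1, 2, 3, starting both helpers at the middle node gives 5 for the "from" variant (2 + 3) and 3 for the "continue" variant (3 only), which is the difference the rename makes explicit.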
    • NeilBrown's avatar
      rhashtable: don't hold lock on first table throughout insertion. · 4feb7c7a
      NeilBrown authored
      rhashtable_try_insert() currently holds a lock on the bucket in
      the first table, while also locking buckets in subsequent tables.
      This is unnecessary and looks like a hold-over from some earlier
      version of the implementation.
      
      As insert and remove always lock a bucket in each table in turn, and
      as insert only inserts in the final table, there cannot be any races
      that are not covered by simply locking a bucket in each table in turn.
      
      When an insert call reaches that last table it can be sure that there
      is no matching entry in any other table as it has searched them all, and
      insertion never happens anywhere but in the last table.  The fact that
      the code tests for the existence of future_tbl while holding a lock on
      the relevant bucket ensures that two threads inserting the same key
      will make compatible decisions about which is the "last" table.
      
      This simplifies the code and allows the ->rehash field to be
      discarded.
      
      We still need a way to ensure that a dead bucket_table is never
      re-linked by rhashtable_walk_stop().  This can be achieved by calling
      call_rcu() inside the locked region, and checking with
      rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the
      bucket table is empty and dead.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4feb7c7a
    • David S. Miller's avatar
      Merge branch 'net-phy-Move-Omega-PHY-entry-to-Cygnus-PHY-driver' · 83b038db
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: phy: Move Omega PHY entry to Cygnus PHY driver
      
      In order to pave the way for adding some specific Omega PHY features
      that may not be desirable on other products covered by the bcm7xxx PHY
      driver, split the Omega PHY entry into the Cygnus PHY driver such that
      the PHY drivers are reflective of product lines/business units
      maintaining them within Broadcom.
      
      No functional changes intended.
      ====================
      Acked-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83b038db