1. 20 Aug, 2019 23 commits
    • ice: Do not always bring up PF VSI in ice_ena_vsi() · e6c45149
      Tony Nguyen authored
      During rebuild ice_ena_vsi() is called to recover the VSI state.
      This function assumes the PF VSI is always to be enabled, however,
      it's possible that during reset/rebuild the interface can be
      brought down.  If this occurs, we can attempt to bring up the PF
      VSI on a downed interface which can lead to various crashes. If
      the interface is not running, do not bring up the associated VSI.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: allow empty Rx descriptors · ac6f733a
      Mitch Williams authored
      In some circumstances, the hardware will hand us a receive descriptor
      which has no data attached, but is otherwise valid. The receive code was
      improperly ignoring these descriptors, which resulted in an infinite loop.
      
      To fix this, change the receive code to process all descriptors,
      regardless of the size of the associated data. Add checks to the
      memory-handling functions to allow for zero size.
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix kernel hang with DCB reset in CEE mode · 7829570e
      Usha Ketineni authored
      This patch fixes the set local MIB AQ call failures in the DCB rebuild path
      by setting the defaults for the ETS recommended DCB configuration. Also,
      the willing bits for the DCB configuration need to be set correctly. Resets
      work fine in IEEE mode, where the ETS recommended DCB configuration is
      populated, but not in CEE mode.
      Without this patch, a PFR causes the kernel to hang in CEE mode.
      Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Set WB_ON_ITR when we don't re-enable interrupts · 2ab28bb0
      Brett Creeley authored
      Currently when busy polling is enabled we aren't setting/enabling
      WB_ON_ITR in the driver. This doesn't break the driver, but it does
      cause issues. If we don't enable WB_ON_ITR mode we will still get
      write-backs from hardware during polling when a cache line has been
      filled, but if a cache line is not filled we will not get the
      write-back because WB_ON_ITR is not set. Fix this by enabling
      WB_ON_ITR in the driver when interrupts are disabled.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: fix set pause param autoneg check · f1a4a66d
      Paul Greenwalt authored
      When ETHTOOL_GLINKSETTINGS is defined, get pause param reports the
      SW-configured setting in pause->autoneg; when it is not defined, get
      pause param reports the link status instead. Set pause param needs to
      compare pause->autoneg against the same source that get pause param uses
      when it blocks the user from changing autoneg through the set pause
      param option; otherwise the user may be incorrectly blocked from
      changing the Rx|Tx pause settings.
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Restructure VFs initialization flows · d82dd83d
      Akeem G Abodunrin authored
      This patch restructures how VFs are configured and how their resources
      are allocated. Instead of freeing resources that were never allocated and
      resetting empty VFs that were never created, the new flow just allocates
      resources for the requested number of VFs, based on availability.

      During VF initialization, the global interrupt is disabled, then rearmed
      after the MSIX vectors for the VFs have been obtained. This allows
      immediate mailbox communication; previously it was delayed until later,
      and VF-to-PF communication fell back to polling instead of using the
      actual interrupt. The issue manifested when creating a higher number of
      VFs (128 VFs) per PF.
      Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Assume that more than one Rx queue is rare in ice_napi_poll · 9118fcd5
      Brett Creeley authored
      Currently we divide budget by the number of Rx queues per Rx ring
      container in ice_napi_poll even if there is only 1. This is an
      unnecessary divide for the normal case of 1 Rx ring per Rx ring
      container. Fix this by using an unlikely() call in the case where we
      actually need to divide.
      
      Also, we will always set budget_per_ring even if there are no Rx rings
      in the Rx ring container so we don't need to initialize it to 0.
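The budget split described in this message can be sketched in userspace as follows. This is a minimal illustration under stated assumptions, not the driver's code: the helper name `budget_per_ring`, the `unlikely()` stand-in (built on the GCC/Clang `__builtin_expect` builtin), and the floor of 1 per ring are all assumed for the example.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's unlikely() hint (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: how much of the NAPI budget each Rx ring in one
 * ring container may use.
 */
int budget_per_ring(int budget, int num_rx_rings)
{
	int per_ring;

	assert(budget > 0 && num_rx_rings >= 0);

	/* Rare case: several Rx rings share the container, so divide. */
	if (unlikely(num_rx_rings > 1)) {
		per_ring = budget / num_rx_rings;
		/* Assumed floor: never hand a ring a zero budget. */
		return per_ring > 0 ? per_ring : 1;
	}

	/* Common case (zero or one ring): no divide, whole budget. */
	return budget;
}
```

Because both branches return a value, the result is always set, which mirrors the message's point that no zero initialization is needed.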
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Use the software based tail when checking for hung Tx ring · c1ddf1f5
      Brett Creeley authored
      Currently in ice_get_tx_pending we try to read a Tx ring's tail. This is
      then compared with the software based head (next_to_clean) to determine
      if we have pending work. This will never work because reading of the Tx
      ring's tail is no longer supported. Fix this by using the software based
      tail (next_to_use) to determine if there is pending work.
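A minimal sketch of computing pending work purely from the software indices, assuming a simple circular ring. The struct `tx_ring_state` and the function `tx_pending` are hypothetical names for illustration; only `next_to_clean` and `next_to_use` come from the message above.

```c
#include <assert.h>

/* Hypothetical ring state: only the software-maintained indices matter. */
struct tx_ring_state {
	unsigned int count;         /* number of descriptors in the ring */
	unsigned int next_to_clean; /* software head */
	unsigned int next_to_use;   /* software tail */
};

/* Number of descriptors queued by software but not yet cleaned. */
unsigned int tx_pending(const struct tx_ring_state *ring)
{
	unsigned int head = ring->next_to_clean;
	unsigned int tail = ring->next_to_use;

	assert(head < ring->count && tail < ring->count);

	if (head == tail)
		return 0; /* nothing outstanding */

	/* If the tail has wrapped past the end, add the ring size back. */
	return head < tail ? tail - head : tail + ring->count - head;
}
```

No register read is involved, so the check keeps working even where reading the hardware tail is unsupported.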
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Merge tag 'wireless-drivers-next-for-davem-2019-08-19' of... · 932630fa
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2019-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for 5.4
      
      First set of patches for 5.4.
      
      Major changes:
      
      brcmfmac
      
      * enable 160 MHz channel support
      
      rt2x00
      
      * add support for PLANEX GW-USMicroN USB device
      
      rtw88
      
      * add Bluetooth coexistence support
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-support-per-endpoint-auth-and-asconf-flags' · 5483ecef
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: support per endpoint auth and asconf flags
      
      This patchset mostly does 3 things:
      
        1. add per endpoint asconf flag and use asconf flag properly
           and add SCTP_ASCONF_SUPPORTED sockopt.
        2. use auth flag properly and add SCTP_AUTH_SUPPORTED sockopt.
        3. remove the 'global feature switch' to discard chunks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: remove net sctp.x_enable working as a global switch · 2f757634
      Xin Long authored
      The netns sctp feature flags shouldn't work as a global switch;
      that is more of a firewall/netfilter job. Also, it will break an
      asoc, as chunks are discarded or accepted incorrectly when net
      sctp.x_enable is changed after the asoc is created.

      Since each chunk type's processing function checks the
      corresponding asoc's feature flag, this 'global switch' should be
      removed, and net sctp.x_enable will only work as the default
      feature flags for future sctp sockets/endpoints.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add SCTP_AUTH_SUPPORTED sockopt · 56dd525a
      Xin Long authored
      The SCTP_AUTH_SUPPORTED sockopt is used to set an endpoint's auth
      flag. With this feature, each endpoint will have its own
      flag for its future asoc's auth_capable, instead of the netns
      auth flag.

      Note that when the ep's auth_enable is enabled, the endpoint's
      auth related data should be initialized. If asconf_enable
      is also set, SCTP_CID_ASCONF/SCTP_CID_ASCONF_ACK should
      be added into auth_chunk_list.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add sctp_auth_init and sctp_auth_free · 03f96127
      Xin Long authored
      This patch is to factor out sctp_auth_init and sctp_auth_free
      functions, and sctp_auth_init will also be used in the next
      patch for SCTP_AUTH_SUPPORTED sockopt.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: use ep and asoc auth_enable properly · 219f9ea4
      Xin Long authored
      sctp has per endpoint auth flag and per asoc auth flag, and
      the asoc one should be checked when coming to asoc and the
      endpoint one should be checked when coming to endpoint.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add SCTP_ASCONF_SUPPORTED sockopt · df2c71ff
      Xin Long authored
      SCTP_ASCONF_SUPPORTED sockopt is used to set endpoint's asconf
      flag. With this feature, each endpoint will have its own flag
      for its future asoc's asconf_capable, instead of netns asconf
      flag.
      
      Note that when both ep's asconf_enable and auth_enable are
      enabled, SCTP_CID_ASCONF and SCTP_CID_ASCONF_ACK should be
      added into auth_chunk_list.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: check asoc peer.asconf_capable before processing asconf · a2eeacc8
      Xin Long authored
      asconf chunks should be dropped when the asoc doesn't support
      asconf feature.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: not set peer.asconf_capable in sctp_association_init · bb2ded26
      Xin Long authored
      asoc->peer.asconf_capable is to be set during handshake, and its
      value should be initialized to 0. net->sctp.addip_noauth will be
      checked in sctp_process_init when processing INIT_ACK on client
      and COOKIE_ECHO on server.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add asconf_enable in struct sctp_endpoint · 4e27428f
      Xin Long authored
      This patch is to make addip/asconf flag per endpoint,
      and its value is initialized by the per netns flag,
      net->sctp.addip_enable.
      
      It also replaces the checks of net->sctp.addip_enable
      with ep->asconf_enable in some places.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove empty inet_exit_net · af809709
      Li RongQing authored
      Pointer members of an object with static storage duration, if not
      explicitly initialized, will be initialized to a NULL pointer. The
      net namespace API checks that this pointer is not NULL before using it,
      so it is safe to remove the function.
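The C rule this message relies on can be illustrated with a small standalone sketch. All names here (`struct demo_ops`, `demo_net_ops`, `demo_run_exit`) are hypothetical stand-ins, not the kernel's types or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table, loosely modeled on an init/exit hook pair. */
struct demo_ops {
	int (*init)(void);
	void (*exit)(void);
};

static int demo_init(void)
{
	return 0;
}

/*
 * Static storage duration: .exit is not named in the initializer, so the
 * C standard guarantees it starts out as a null pointer.
 */
static struct demo_ops demo_net_ops = {
	.init = demo_init,
};

/*
 * Caller-side pattern: run the exit hook only if one was provided.
 * Returns 1 if the hook ran, 0 if it was absent.
 */
int demo_run_exit(const struct demo_ops *ops)
{
	assert(ops != NULL);
	if (ops->exit) {
		ops->exit();
		return 1;
	}
	return 0;
}
```

With the caller checking the pointer, an empty exit function adds nothing beyond leaving the member unset.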
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ns-plugin-fixes' · 196640a6
      David S. Miller authored
      Vlad Buslov says:
      
      ====================
      Fix problems with using ns plugin
      
      Recent changes to plugin architecture broke some of the tests when running tdc
      without specifying a test group. Fix tests incompatible with ns plugin and
      modify tests to not reuse interface name of ns veth interface for dummy
      interface.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: concurrency: wrap piped rule update commands · 14b54ac4
      Vlad Buslov authored
      Concurrent tests use several commands to update rules in parallel: 'find'
      prints names of batch files in tmp directory and pipes result to 'xargs'
      which runs instance of tc per batch file in parallel. This breaks when used
      with ns plugin that adds 'ip netns exec $NS' prefix to the command, which
      causes only first command in pipe to be executed in namespace:
      
      =====> Test e41d: Add 1M flower filters with 10 parallel tc instances
      -----> prepare stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [/bin/mkdir tmp] list [['/bin/mkdir', 'tmp']]
      adjust_command:  return command [ip netns exec tcut /bin/mkdir tmp]
      command "ip netns exec tcut /bin/mkdir tmp"
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [/sbin/tc qdisc add dev ens1f0 ingress] list [['/sbin/tc', 'qdisc', 'add', 'dev', 'ens1f0', 'ingress']]
      adjust_command:  return command [ip netns exec tcut /sbin/tc qdisc add dev ens1f0 ingress]
      command "ip netns exec tcut /sbin/tc qdisc add dev ens1f0 ingress"
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [./tdc_multibatch.py ens1f0 tmp 100000 10 add] list [['./tdc_multibatch.py', 'ens1f0', 'tmp', '100000', '10', 'add']]
      adjust_command:  return command [ip netns exec tcut ./tdc_multibatch.py ens1f0 tmp 100000 10 add]
      command "ip netns exec tcut ./tdc_multibatch.py ens1f0 tmp 100000 10 add"
      -----> execute stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is execute; inserting netns stuff in command [find tmp/add* -print | xargs -n 1 -P 10 /sbin/tc -b] list [['find', 'tmp/add*', '-print', '|', 'xargs', '-n', '1', '-P', '10', '/sbin/tc', '-b']]
      adjust_command:  return command [ip netns exec tcut find tmp/add* -print | xargs -n 1 -P 10 /sbin/tc -b]
      command "ip netns exec tcut find tmp/add* -print | xargs -n 1 -P 10 /sbin/tc -b"
      exit: 123
      exit: 0
      Cannot find device "ens1f0"
      Cannot find device "ens1f0"
      Command failed tmp/add_0:1
      Command failed tmp/add_1:1
      Cannot find device "ens1f0"
      Command failed tmp/add_2:1
      Cannot find device "ens1f0"
      Command failed tmp/add_4:1
      Cannot find device "ens1f0"
      Command failed tmp/add_3:1
      Cannot find device "ens1f0"
      Command failed tmp/add_5:1
      Cannot find device "ens1f0"
      Command failed tmp/add_6:1
      Cannot find device "ens1f0"
      Command failed tmp/add_8:1
      Cannot find device "ens1f0"
      Command failed tmp/add_7:1
      Cannot find device "ens1f0"
      Command failed tmp/add_9:1
      
      Fix the issue by executing whole compound command in namespace by wrapping
      it in 'bash -c' invocation.
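The wrapping idea can be sketched as a small string-building helper. Without the wrapper, the invoking shell splits the line at `|`, so only `find` receives the `ip netns exec` prefix, as the log above shows. The function `wrap_in_netns` is hypothetical, and the single-quote style is an assumption of this sketch rather than the quoting used in the actual test files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "ip netns exec <ns> bash -c '<cmd>'" into out,
 * so the whole pipeline runs inside the namespace.
 * Returns 0 on success, -1 if cmd contains a single quote (not handled in
 * this sketch) or the result does not fit in out.
 */
int wrap_in_netns(char *out, size_t outlen, const char *ns, const char *cmd)
{
	int n;

	assert(out != NULL && ns != NULL && cmd != NULL);

	/* A quote inside cmd would end the quoted string early. */
	if (strchr(cmd, '\'') != NULL)
		return -1;

	n = snprintf(out, outlen, "ip netns exec %s bash -c '%s'", ns, cmd);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Passing the compound command as one quoted argument means the pipe is interpreted by the `bash` started inside the namespace, not by the outer shell.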
      
      Fixes: 489ce2f4 ("tc-testing: Restore original behaviour for namespaces in tdc")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: use dedicated DUMMY interface name for dummy dev · c11a99e7
      Vlad Buslov authored
      A lot of tests reuse the $DEV1 veth name for naming the dummy device. This
      causes problems when tdc is invoked without specifying a test group and
      tries to execute all tests. In this case tdc instantiates the ns plugin,
      which creates the veth pair once before running tests. However, if any of
      the tests that reuse $DEV1 runs before a test that depends on the ns
      plugin, it will delete $DEV1 as part of its teardown section:
      
      =====> Test 3b88: Delete ingress qdisc twice
      -----> prepare stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [/sbin/ip link add dev v0p1 type dummy || /bin/true] list [['/sbin/ip', 'link', 'add', 'dev', 'v0p1', 'type', 'dummy', '||', '/bin/true']]
      adjust_command:  return command [ip netns exec tcut /sbin/ip link add dev v0p1 type dummy || /bin/true]
      command "ip netns exec tcut /sbin/ip link add dev v0p1 type dummy || /bin/true"
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [/sbin/tc qdisc add dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'add', 'dev', 'v0p1', 'ingress']]
      adjust_command:  return command [ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress]
      command "ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress"
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [/sbin/tc qdisc del dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'del', 'dev', 'v0p1', 'ingress']]
      adjust_command:  return command [ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress]
      command "ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress"
      -----> execute stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is execute; inserting netns stuff in command [/sbin/tc qdisc del dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'del', 'dev', 'v0p1', 'ingress']]
      adjust_command:  return command [ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress]
      command "ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress"
      -----> verify stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is verify; inserting netns stuff in command [/sbin/tc qdisc show dev v0p1] list [['/sbin/tc', 'qdisc', 'show', 'dev', 'v0p1']]
      adjust_command:  return command [ip netns exec tcut /sbin/tc qdisc show dev v0p1]
      command "ip netns exec tcut /sbin/tc qdisc show dev v0p1"
      -----> teardown stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is teardown; inserting netns stuff in command [/sbin/ip link del dev v0p1 type dummy] list [['/sbin/ip', 'link', 'del', 'dev', 'v0p1', 'type', 'dummy']]
      adjust_command:  return command [ip netns exec tcut /sbin/ip link del dev v0p1 type dummy]
      command "ip netns exec tcut /sbin/ip link del dev v0p1 type dummy"
      
      After this ns-dependent tests will fail because dev doesn't exist:
      
      =====> Test 901f: Add fw filter with prio at 32-bit maxixum
      -----> prepare stage
      ns/SubPlugin.adjust_command
      adjust_command:  stage is setup; inserting netns stuff in command [/sbin/tc qdisc add dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'add', 'dev', 'v0p1', 'ingress']]
      adjust_command:  return command [ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress]
      command "ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress"
      
      -----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 ingress"
      
      -----> prepare stage *** Error message: "Cannot find device "v0p1"
      "
      returncode 1; expected [0]
      
      -----> prepare stage *** Aborting test run.
      
      <_io.BufferedReader name=3> *** stdout ***
      
      <_io.BufferedReader name=5> *** stderr ***
      "-----> prepare stage" did not complete successfully
      Exception <class '__main__.PluginMgrTestFail'> ('setup', None, '"-----> prepare stage" did not complete successfully') (caught in test_runner, running test 477 901f Add fw filter with prio at 32-bit maxixum stage setup)
      ---------------
      traceback
        File "./tdc.py", line 371, in test_runner
          res = run_one_test(pm, args, index, tidx)
        File "./tdc.py", line 272, in run_one_test
          prepare_env(args, pm, 'setup', "-----> prepare stage", tidx["setup"])
        File "./tdc.py", line 247, in prepare_env
          '"{}" did not complete successfully'.format(prefix))
      ---------------
      
      Fix the issue by introducing standalone $DUMMY config variable and
      substitute all usage of $DEV1 in tests that don't depend on ns plugin.
      
      Fixes: 489ce2f4 ("tc-testing: Restore original behaviour for namespaces in tdc")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: fix accessing skb after napi_gro_receive · 6636fb31
      Hayes Wang authored
      Fix accessing skb after napi_gro_receive which is caused by
      commit 47922fcd ("r8152: support skb_add_rx_frag").
      
      Fixes: 47922fcd ("r8152: support skb_add_rx_frag")
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Aug, 2019 7 commits
    • Merge branch 'RTL8125-EEE' · 44b3769b
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: realtek: support NBase-T MMD EEE registers on RTL8125
      
      Add missing EEE-related constants, including the new MMD EEE registers
      for NBase-T / 802.3bz. Based on that emulate the new 802.3bz MMD EEE
      registers for 2.5Gbps EEE on RTL8125.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: support NBase-T MMD EEE registers on RTL8125 · edde25e5
      Heiner Kallweit authored
      Emulate the 802.3bz MMD EEE registers for 2.5Gbps EEE on RTL8125.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add EEE-related constants · 99b60d56
      Heiner Kallweit authored
      Add EEE-related constants. This includes the new MMD EEE registers for
      NBase-T / 802.3bz.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: flow_offload: convert block_ing_cb_list to regular list type · 607f625b
      Vlad Buslov authored
      RCU list block_ing_cb_list is protected by rcu read lock in
      flow_block_ing_cmd() and with flow_indr_block_ing_cb_lock mutex in all
      functions that use it. However, flow_block_ing_cmd() needs to call blocking
      functions while iterating block_ing_cb_list, which leads to the following
      suspicious RCU usage warning:
      
      [  401.510948] =============================
      [  401.510952] WARNING: suspicious RCU usage
      [  401.510993] 5.3.0-rc3+ #589 Not tainted
      [  401.510996] -----------------------------
      [  401.511001] include/linux/rcupdate.h:265 Illegal context switch in RCU read-side critical section!
      [  401.511004]
                     other info that might help us debug this:
      
      [  401.511008]
                     rcu_scheduler_active = 2, debug_locks = 1
      [  401.511012] 7 locks held by test-ecmp-add-v/7576:
      [  401.511015]  #0: 00000000081d71a5 (sb_writers#4){.+.+}, at: vfs_write+0x166/0x1d0
      [  401.511037]  #1: 000000002bd338c3 (&of->mutex){+.+.}, at: kernfs_fop_write+0xef/0x1b0
      [  401.511051]  #2: 00000000c921c634 (kn->count#317){.+.+}, at: kernfs_fop_write+0xf7/0x1b0
      [  401.511062]  #3: 00000000a19cdd56 (&dev->mutex){....}, at: sriov_numvfs_store+0x6b/0x130
      [  401.511079]  #4: 000000005425fa52 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x30/0x140
      [  401.511092]  #5: 00000000c5822793 (rtnl_mutex){+.+.}, at: unregister_netdevice_notifier+0x35/0x140
      [  401.511101]  #6: 00000000c2f3507e (rcu_read_lock){....}, at: flow_block_ing_cmd+0x5/0x130
      [  401.511115]
                     stack backtrace:
      [  401.511121] CPU: 21 PID: 7576 Comm: test-ecmp-add-v Not tainted 5.3.0-rc3+ #589
      [  401.511124] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  401.511127] Call Trace:
      [  401.511138]  dump_stack+0x85/0xc0
      [  401.511146]  ___might_sleep+0x100/0x180
      [  401.511154]  __mutex_lock+0x5b/0x960
      [  401.511162]  ? find_held_lock+0x2b/0x80
      [  401.511173]  ? __tcf_get_next_chain+0x1d/0xb0
      [  401.511179]  ? mark_held_locks+0x49/0x70
      [  401.511194]  ? __tcf_get_next_chain+0x1d/0xb0
      [  401.511198]  __tcf_get_next_chain+0x1d/0xb0
      [  401.511251]  ? uplink_rep_async_event+0x70/0x70 [mlx5_core]
      [  401.511261]  tcf_block_playback_offloads+0x39/0x160
      [  401.511276]  tcf_block_setup+0x1b0/0x240
      [  401.511312]  ? mlx5e_rep_indr_setup_tc_cb+0xca/0x290 [mlx5_core]
      [  401.511347]  ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core]
      [  401.511359]  tc_indr_block_get_and_ing_cmd+0x11b/0x1e0
      [  401.511404]  ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core]
      [  401.511414]  flow_block_ing_cmd+0x7e/0x130
      [  401.511453]  ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core]
      [  401.511462]  __flow_indr_block_cb_unregister+0x7f/0xf0
      [  401.511502]  mlx5e_nic_rep_netdevice_event+0x75/0xb0 [mlx5_core]
      [  401.511513]  unregister_netdevice_notifier+0xe9/0x140
      [  401.511554]  mlx5e_cleanup_rep_tx+0x6f/0xe0 [mlx5_core]
      [  401.511597]  mlx5e_detach_netdev+0x4b/0x60 [mlx5_core]
      [  401.511637]  mlx5e_vport_rep_unload+0x71/0xc0 [mlx5_core]
      [  401.511679]  esw_offloads_disable+0x5b/0x90 [mlx5_core]
      [  401.511724]  mlx5_eswitch_disable.cold+0xdf/0x176 [mlx5_core]
      [  401.511759]  mlx5_device_disable_sriov+0xab/0xb0 [mlx5_core]
      [  401.511794]  mlx5_core_sriov_configure+0xaf/0xd0 [mlx5_core]
      [  401.511805]  sriov_numvfs_store+0xf8/0x130
      [  401.511817]  kernfs_fop_write+0x122/0x1b0
      [  401.511826]  vfs_write+0xdb/0x1d0
      [  401.511835]  ksys_write+0x65/0xe0
      [  401.511847]  do_syscall_64+0x5c/0xb0
      [  401.511857]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  401.511862] RIP: 0033:0x7fad892d30f8
      [  401.511868] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 96 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
      [  401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8
      [  401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001
      [  401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a
      [  401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002
      [  401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740
      
      To fix the described incorrect RCU usage, convert block_ing_cb_list from
      RCU list to regular list and protect it with flow_indr_block_ing_cb_lock
      mutex in flow_block_ing_cmd().
      
      Fixes: 1150ab0f ("flow_offload: support get multi-subsystem block")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 446bf64b
      David S. Miller authored
      Merge conflict of mlx5 resolved using instructions in merge
      commit 9566e650.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 06821504
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
        1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.
      
        2) Several kTLS fixes in mlx5 driver, from Tariq Toukan.
      
        3) Fix severe performance regression due to lack of SKB coalescing of
           fragments during local delivery, from Guillaume Nault.
      
        4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.
      
        5) Fix batched events in skbedit packet action, from Roman Mashak.
      
        6) Propagate VLAN TX offload to hw_enc_features in bond and team
           drivers, from Yue Haibing.
      
        7) RXRPC local endpoint refcounting fix and read after free in
           rxrpc_queue_local(), from David Howells.
      
        8) Fix endian bug in ibmveth multicast list handling, from Thomas
           Falcon.
      
        9) Oops, make nlmsg_parse() wrap around the correct function,
           __nlmsg_parse not __nla_parse(). Fix from David Ahern.
      
       10) Memleak in sctp_send_reset_streams(), from Zheng Bin.
      
       11) Fix memory leak in cxgb4, from Wenwen Wang.
      
       12) Yet another race in AF_PACKET, from Eric Dumazet.
      
       13) Fix false detection of retransmit failures in tipc, from Tuong
           Lien.
      
       14) Use after free in ravb_tstamp_skb, from Tho Vu.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
        ravb: Fix use-after-free ravb_tstamp_skb
        netfilter: nf_tables: map basechain priority to hardware priority
        net: sched: use major priority number as hardware priority
        wimax/i2400m: fix a memory leak bug
        net: cavium: fix driver name
        ibmvnic: Unmap DMA address of TX descriptor buffers after use
        bnxt_en: Fix to include flow direction in L2 key
        bnxt_en: Use correct src_fid to determine direction of the flow
        bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
        bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
        bnxt_en: Improve RX doorbell sequence.
        bnxt_en: Fix VNIC clearing logic for 57500 chips.
        net: kalmia: fix memory leaks
        cx82310_eth: fix a memory leak bug
        bnx2x: Fix VF's VLAN reconfiguration in reload.
        Bluetooth: Add debug setting for changing minimum encryption key size
        tipc: fix false detection of retransmit failures
        lan78xx: Fix memory leaks
        MAINTAINERS: r8169: Update path to the driver
        MAINTAINERS: PHY LIBRARY: Update files in the record
        ...
    • keys: Fix description size · 555df336
      David Howells authored
      The maximum key description size is 4095.  Commit f771fde8 ("keys:
      Simplify key description management") inadvertently reduced that to 255
      and made sizes between 256 and 4095 work weirdly, and any size whereby
      size & 255 == 0 would cause an assertion in __key_link_begin() at the
      following line:
      
      	BUG_ON(index_key->desc_len == 0);
      
      This can be fixed by simply increasing the size of desc_len in struct
      keyring_index_key to a u16.
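The truncation can be reproduced in isolation. This sketch assumes the length was being stored in an 8-bit field, which is what the 255 limit and the `size & 255 == 0` condition suggest; the helper names `store_len_u8` and `store_len_u16` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a description length in an assumed 8-bit field. */
uint8_t store_len_u8(size_t len)
{
	assert(len <= 4095); /* documented maximum description size */
	return (uint8_t)len; /* keeps only len & 255 */
}

/* Store the same length in a 16-bit field, as the fix does. */
uint16_t store_len_u16(size_t len)
{
	assert(len <= 4095);
	return (uint16_t)len; /* 4095 fits, so nothing is lost */
}
```

A length of 256 stored in the 8-bit field becomes 0, which is the value the `BUG_ON()` above rejects; lengths such as 300 silently turn into 44.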
      
      Note the argument length test in keyutils only checked empty
      descriptions and descriptions with a size around the limit (ie.  4095)
      and not for all the values in between, so it missed this.  This has been
      addressed and
      
      	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa
      
      now exhaustively tests all possible lengths of type, description and
      payload and then some.
      
      The assertion failure looks something like:
      
       kernel BUG at security/keys/keyring.c:1245!
       ...
       RIP: 0010:__key_link_begin+0x88/0xa0
       ...
       Call Trace:
        key_create_or_update+0x211/0x4b0
        __x64_sys_add_key+0x101/0x200
        do_syscall_64+0x5b/0x1e0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      It can be triggered by:
      
      	keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s
      
      Fixes: f771fde8 ("keys: Simplify key description management")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 18 Aug, 2019 10 commits