1. 19 May, 2022 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 7dc02d7f
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Reduce number of hardware offload retries from flowtable datapath
         which might hog system with retries, from Felix Fietkau.
      
      2) Skip neighbour lookup for PPPoE device, fill_forward_path() already
         provides this, and set the destination address from fill_forward_path()
         for PPPoE device, also from Felix.
      
      4) When combining PPPoE on top of a VLAN device, set info->outdev to the
         PPPoE device so software offload works, from Felix.
      
      5) Fix TCP teardown flowtable state, races with conntrack gc might result
         in resetting the state to ESTABLISHED and the time to one day. Joint
         work with Oz Shlomo and Sven Auhagen.
      
      6) Call dst_check() from flowtable datapath to check if dst is stale
         instead of doing it from garbage collector path.
      
      7) Disable register tracking infrastructure, either user-space or
         kernel need to pre-fetch keys unconditionally, otherwise register
         tracking assumes data is already available in a register that might
         well not be there, leading to incorrect reductions.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: disable expression reduction infra
        netfilter: flowtable: move dst_check to packet path
        netfilter: flowtable: fix TCP flow teardown
        netfilter: nft_flow_offload: fix offload with pppoe + vlan
        net: fix dev_fill_forward_path with pppoe + bridge
        netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
        netfilter: flowtable: fix excessive hw offload attempts after failure
      ====================
      
      Link: https://lore.kernel.org/r/20220518213841.359653-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 18 May, 2022 27 commits
    • netfilter: nf_tables: disable expression reduction infra · 9e539c5b
      Pablo Neira Ayuso authored
      Either userspace or kernelspace needs to pre-fetch keys unconditionally
      before comparisons for this to work. Otherwise, register tracking data
      is misleading and it might result in reducing expressions whose data is
      not yet in registers.
      
      First expression is also guaranteed to be evaluated always, however,
      certain expressions break before writing data to registers, before
      comparing the data, leaving the register in undetermined state.
      
      This patch disables this infrastructure for now.
      
      Fixes: b2d30654 ("netfilter: nf_tables: do not reduce read-only expressions")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: flowtable: move dst_check to packet path · 2738d9d9
      Ritaro Takenaka authored
      Fixes sporadic IPv6 packet loss when flow offloading is enabled.
      
      IPv6 route GC and flowtable GC are not synchronized.
      When dst_cache becomes stale and a packet passes through the flow before
      the flowtable GC tears it down, the packet can be dropped.
      So, it is necessary to check dst every time in the packet path.
      
      Fixes: 227e1e4d ("netfilter: nf_flowtable: skip device lookup from interface index")
      Signed-off-by: Ritaro Takenaka <ritarot634@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
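The idea of validating the cached route on every packet, rather than waiting for a garbage collector pass, can be sketched in userspace C. This is only an illustrative model: `route_entry`, `routing_table`, `flow_entry`, `route_is_valid()` and `flow_forward()` are hypothetical names invented for the sketch, and the generation counter merely stands in for whatever staleness test the kernel's dst_check() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel objects; not the real structures. */
struct route_entry {
    uint32_t cookie;            /* generation seen when the route was cached */
};

struct routing_table {
    uint32_t generation;        /* bumped whenever cached routes become stale */
};

struct flow_entry {
    struct route_entry cached_route;
    bool teardown;              /* set once the flow must leave the fast path */
};

enum verdict { FAST_PATH, SLOW_PATH };

/* Plays the role of a dst_check()-style test: is the cached route current? */
static bool route_is_valid(const struct routing_table *rt,
                           const struct route_entry *re)
{
    return re->cookie == rt->generation;
}

/* Per-packet hook: validate the cached route before forwarding with it. */
static enum verdict flow_forward(const struct routing_table *rt,
                                 struct flow_entry *flow)
{
    assert(rt != NULL && flow != NULL);

    if (flow->teardown || !route_is_valid(rt, &flow->cached_route)) {
        flow->teardown = true;  /* a later GC pass can free the entry */
        return SLOW_PATH;       /* regular routing handles this packet */
    }
    return FAST_PATH;
}
```

In this model a packet that arrives after the route generation changes is handed to the slow path immediately instead of being forwarded over a stale route until the next GC run.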
    • netfilter: flowtable: fix TCP flow teardown · e5eaac2b
      Pablo Neira Ayuso authored
      This patch addresses three possible problems:
      
      1. ct gc may race to undo the timeout adjustment of the packet path, leaving
         the conntrack entry in place with the internal offload timeout (one day).
      
      2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
         timeout is reached before the flow offload del.
      
      3. tcp ct is always set to ESTABLISHED with a very long timeout
         in flow offload teardown/delete even though the state might be already
         CLOSED. Also as a remark we cannot assume that the FIN or RST packet
         is hitting flow table teardown as the packet might get bumped to the
         slow path in nftables.
      
      This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
      conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
      state transition.
      
      Moreover, return the connection's ownership to conntrack upon teardown
      by clearing the offload flag and fixing the established timeout value.
      The flow table GC thread will asynchronously free the flow table and
      hardware offload entries.

      Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows,
      which is also misleading since the flow is back to classic conntrack
      path.
      
      If nf_ct_delete() removes the entry from the conntrack table, then it
      calls nf_ct_put() which decrements the refcnt. This is not a problem
      because the flowtable holds a reference to the conntrack object from
      flow_offload_alloc() path which is released via flow_offload_free().
      
      This patch also updates nft_flow_offload to skip packets in SYN_RECV
      state. Since we might miss or bump packets to slow path, we do not know
      what will happen there while we are still in SYN_RECV, this patch
      postpones offload up to the next packet which also aligns to the
      existing behaviour in tc-ct.
      
      flow_offload_teardown() does not reset the existing tcp state from
      flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bumped to the
      slow path might have already updated the state to CLOSE/FIN.
      
      Joint work with Oz and Sven.
      
      Fixes: 1e5b2471 ("netfilter: nf_flow_table: teardown flow timeout race")
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: ftgmac100: Disable hardware checksum on AST2600 · 6fd45e79
      Joel Stanley authored
      The AST2600 when using the i210 NIC over NC-SI has been observed to
      produce incorrect checksum results with specific MTU values. This was
      first observed when sending data across a long distance set of networks.
      
      On a local network, the following test was performed using a 1MB file of
      random data.
      
      On the receiver run this script:
      
       #!/bin/bash
       while [ 1 ]; do
              # Zero the stats
              nstat -r  > /dev/null
              nc -l 9899 > test-file
              # Check for checksum errors
              TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
              if [ -z "$TcpInCsumErrors" ]; then
                      echo No TcpInCsumErrors
              else
                      echo TcpInCsumErrors = $TcpInCsumErrors
              fi
       done
      
      On an AST2600 system:
      
       # nc <IP of  receiver host> 9899 < test-file
      
      The test was repeated with various MTU values:
      
       # ip link set mtu 1410 dev eth0
      
      The observed results:
      
       1500 - good
       1434 - bad
       1400 - good
       1410 - bad
       1420 - good
      
      The test was repeated after disabling tx checksumming:
      
       # ethtool -K eth0 tx-checksumming off
      
      And all MTU values tested resulted in transfers without error.
      
      An issue with the driver cannot be ruled out, however there has been no
      bug discovered so far.
      
      David has done the work to take the original bug report of slow data
      transfer between long distance connections and triaged it down to this
      test case.
      
      The vendor suspects this is a hardware issue when using NC-SI. The
      fixes line refers to the patch that introduced AST2600 support.
      Reported-by: David Wilder <wilder@us.ibm.com>
      Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igb: skip phy status check where unavailable · 942d2ad5
      Kevin Mitchell authored
      igb_read_phy_reg() will silently return, leaving phy_data untouched, if
      hw->ops.read_reg isn't set. Depending on the uninitialized value of
      phy_data, this led to the phy status check either succeeding immediately
      or looping continuously for 2 seconds before emitting a noisy err-level
      timeout. This message went out to the console even though there was no
      actual problem.
      
      Instead, first check if there is a read_reg function pointer. If not,
      proceed without trying to check the phy status register.
      
      Fixes: b72f3f72 ("igb: When GbE link up, wait for Remote receiver status condition")
      Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
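The guard described in the igb commit above (skip the status poll when no register accessor is installed) can be illustrated with a small userspace sketch. Everything here is hypothetical: `phy_ops`, `STATUS_REG`, `REMOTE_RX_OK`, `phy_status_ok()` and the two test doubles are invented for the example and do not reproduce the igb driver's structures or register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ops table; read_reg may legitimately be NULL. */
struct phy_ops {
    int (*read_reg)(uint32_t offset, uint16_t *data);
};

#define STATUS_REG   0x0Au        /* illustrative register offset */
#define REMOTE_RX_OK (1u << 12)   /* illustrative "receiver ready" bit */

/* Returns true when link setup may proceed. */
static bool phy_status_ok(const struct phy_ops *ops, int max_polls)
{
    uint16_t phy_data = 0;        /* never test an uninitialized value */

    assert(ops != NULL);

    /* No accessor installed: there is nothing to poll, so skip the check. */
    if (ops->read_reg == NULL)
        return true;

    for (int i = 0; i < max_polls; i++) {
        if (ops->read_reg(STATUS_REG, &phy_data) == 0 &&
            (phy_data & REMOTE_RX_OK))
            return true;
    }
    return false;                 /* genuine timeout worth reporting */
}

/* Test doubles standing in for real hardware accessors. */
static int read_ready(uint32_t offset, uint16_t *data)
{
    (void)offset;
    *data = REMOTE_RX_OK;
    return 0;
}

static int read_never_ready(uint32_t offset, uint16_t *data)
{
    (void)offset;
    *data = 0;
    return 0;
}
```

With a NULL `read_reg` the function returns at once, so the timeout path is reached only when a real accessor keeps reporting "not ready".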
    • nfc: pn533: Fix buggy cleanup order · b8cedb70
      Lin Ma authored
      When removing the pn533 device (i2c or USB), there is a logic error. The
      original code first cancels the worker (flush_delayed_work) and then
      destroys the workqueue (destroy_workqueue), leaving the timer the last
      one to be deleted (del_timer). This results in a possible race condition
      in a multi-core preemptible kernel. That is, if the cleanup
      (pn53x_common_clean) is concurrently run with the timer handler
      (pn533_listen_mode_timer), the timer can queue the poll_work to the
      already destroyed workqueue, causing use-after-free.
      
      This patch reorders the cleanup: it uses del_timer_sync to make sure
      the handler has finished before the routine destroys the workqueue.
      Note that the timer cannot be activated by the worker again.
      
      static void pn533_wq_poll(struct work_struct *work)
      ...
       rc = pn533_send_poll_frame(dev);
       if (rc)
         return;
      
       if (cur_mod->len == 0 && dev->poll_mod_count > 1)
         mod_timer(&dev->listen_timer, ...);
      
      That is, the mod_timer can be called only when pn533_send_poll_frame()
      returns no error, which is impossible because the device is detaching
      and the lower driver should return ENODEV code.
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-checksums' · 575fb4fb
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fix checksum byte order on little-endian
      
      These patches address a bug in the byte ordering of MPTCP checksums on
      little-endian architectures. The __sum16 type is always big endian, but
      was being cast to u16 and then byte-swapped (on little-endian archs)
      when reading/writing the checksum field in MPTCP option headers.
      
      MPTCP checksums are off by default, but are enabled if one or both peers
      request it in the SYN/SYNACK handshake.
      
      The corrected code is verified to interoperate between big-endian and
      little-endian machines.
      
      Patch 1 fixes the checksum byte order, patch 2 partially mitigates
      interoperation with peers sending bad checksums by falling back to TCP
      instead of resetting the connection.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: Do TCP fallback on early DSS checksum failure · ae66fb2b
      Mat Martineau authored
      RFC 8684 section 3.7 describes several opportunities for a MPTCP
      connection to "fall back" to regular TCP early in the connection
      process, before it has been confirmed that MPTCP options can be
      successfully propagated on all SYN, SYN/ACK, and data packets. If a peer
      acknowledges the first received data packet with a regular TCP header
      (no MPTCP options), fallback is allowed.
      
      If the recipient of that first data packet finds a MPTCP DSS checksum
      error, this provides an opportunity to fail gracefully with a TCP
      fallback rather than resetting the connection (as might happen if a
      checksum failure were detected later).
      
      This commit modifies the checksum failure code to attempt fallback on
      the initial subflow of a MPTCP connection, only if it's a failure in the
      first data mapping. In cases where the peer initiates the connection,
      requests checksums, is the first to send data, and the peer is sending
      incorrect checksums (see
      https://github.com/multipath-tcp/mptcp_net-next/issues/275), this allows
      the connection to proceed as TCP rather than reset.
      
      Fixes: dd8bcd17 ("mptcp: validate the data checksum")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
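The fallback rule stated in the commit above (initial subflow only, and only for a failure in the first data mapping) reduces to a small decision function. The sketch below is a model of that rule as worded in the message, not the MPTCP implementation; `subflow_state`, its fields, `csum_action` and `on_dss_csum_failure()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subflow state, reduced to what the rule needs. */
struct subflow_state {
    bool is_initial_subflow;   /* false for subflows that joined later */
    bool first_data_mapping;   /* true while the first mapping is checked */
};

enum csum_action { CSUM_FALLBACK_TO_TCP, CSUM_RESET };

/* Decide how to react to a DSS checksum failure on this subflow. */
static enum csum_action on_dss_csum_failure(const struct subflow_state *sf)
{
    assert(sf != NULL);

    /* Early failure on the initial subflow: degrade gracefully to TCP. */
    if (sf->is_initial_subflow && sf->first_data_mapping)
        return CSUM_FALLBACK_TO_TCP;

    /* Any later failure, or one on an additional subflow: reset. */
    return CSUM_RESET;
}
```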
    • mptcp: fix checksum byte order · ba2c89e0
      Paolo Abeni authored
      The MPTCP code typecasts the checksum value to u16 and
      then converts it to big endian while storing the value into
      the MPTCP option.
      
      As a result, the wire encoding for little endian hosts is
      wrong, and that causes interoperability issues with other
      implementations or hosts with different endianness.

      Address the issue by writing the unmodified __sum16 value into the packet.
      
      MPTCP checksum is disabled by default, interoperating with systems
      with bad mptcp-level csum encoding should cause fallback to TCP.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/275
      Fixes: c5b39e26 ("mptcp: send out checksum for DSS")
      Fixes: 390b95a5 ("mptcp: receive checksum for DSS")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
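Why an extra big-endian conversion corrupts the checksum on little-endian hosts can be reproduced in userspace. The sketch below is a standalone model, not the MPTCP code: `inet_csum()`, `store_csum_unmodified()`, `store_csum_as_be16()` and `host_is_little_endian()` are hypothetical helpers. It relies on a known property of the Internet checksum: when 16-bit words are summed as the host loads them, the result stored back in host order already has the correct wire encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum over an even-length buffer, summing 16-bit words in
 * host byte order. Stored back in host order, the result is wire-correct. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        uint16_t word;

        memcpy(&word, buf + i, sizeof(word));
        sum += word;
    }
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fixed behaviour: copy the checksum into the packet unmodified. */
static void store_csum_unmodified(uint8_t *field, uint16_t csum)
{
    memcpy(field, &csum, sizeof(csum));
}

/* Old behaviour: treat the checksum as a host integer and emit it big
 * endian, which swaps the two bytes on little-endian hosts. */
static void store_csum_as_be16(uint8_t *field, uint16_t csum)
{
    field[0] = (uint8_t)(csum >> 8);
    field[1] = (uint8_t)(csum & 0xff);
}

static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, sizeof(first));
    return first == 1;
}
```

A receiver validates by running `inet_csum()` over the data including the checksum field and expecting zero. In this model the unmodified store passes that check on any host, while the extra conversion fails it on little-endian hosts.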
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 680b8926
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-05-17
      
      This series contains updates to ice driver only.
      
      Arkadiusz prevents writing of timestamps when rings are being
      configured to resolve null pointer dereference.
      
      Paul changes a delayed call to baseline statistics to occur immediately
      which was causing misreporting of statistics due to the delay.
      
      Michal fixes incorrect restoration of interrupt moderation settings.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 089403a3
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2022-05-18
      
      1) Fix "disable_policy" flag use when arriving from different devices.
         From Eyal Birger.
      
      2) Fix error handling of pfkey_broadcast in function pfkey_process.
         From Jiasheng Jiang.
      
      3) Check the encryption module availability consistency in pfkey.
         From Thomas Bartschies.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2022-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 765d1216
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2022-05-17
      
      This series provides bug fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: af_key: check encryption module availability consistency · 015c44d7
      Thomas Bartschies authored
      Since the recent introduction of support for the SM3 and SM4 algos for
      IPsec, the kernel produces invalid pfkey acquire messages when the
      corresponding modules are disabled. This happens because the availability
      of the algos wasn't checked in all necessary functions.
      This patch adds these checks.
      Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • net: af_key: add check for pfkey_broadcast in function pfkey_process · 4dc2a5a8
      Jiasheng Jiang authored
      If skb_clone() returns null pointer, pfkey_broadcast() will
      return error.
      Therefore, it is better to check the return value of
      pfkey_broadcast() and return an error if it fails.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
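The change described above follows a common pattern: propagate a helper's error instead of discarding it. A minimal userspace sketch of that pattern, where `broadcast_msg()`, `handle_request()` and both `process_msg_*()` functions are hypothetical stand-ins for the af_key functions and `-ENOBUFS` is only an illustrative error code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for a broadcast helper that fails when it cannot clone the
 * message buffer. The error code is chosen for illustration only. */
static int broadcast_msg(bool clone_ok)
{
    return clone_ok ? 0 : -ENOBUFS;
}

/* Stand-in for the rest of message processing; always succeeds here. */
static int handle_request(void)
{
    return 0;
}

/* Before: the broadcast result is dropped, so a failure goes unnoticed. */
static int process_msg_unchecked(bool clone_ok)
{
    (void)broadcast_msg(clone_ok);
    return handle_request();
}

/* After: a failed broadcast stops processing and is reported upward. */
static int process_msg_checked(bool clone_ok)
{
    int err = broadcast_msg(clone_ok);

    if (err)
        return err;
    return handle_request();
}
```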
    • net/mlx5: Drain fw_reset when removing device · 16d42d31
      Shay Drory authored
      In case fw sync reset is called in parallel to device removal, the device
      might get stuck in the following deadlock:
               CPU 0                        CPU 1
               -----                        -----
                                        remove_one
                                         uninit_one (locks intf_state_mutex)
      mlx5_sync_reset_now_event()
      work in fw_reset->wq.
       mlx5_enter_error_state()
        mutex_lock (intf_state_mutex)
                                         cleanup_once
                                          fw_reset_cleanup()
                                           destroy_workqueue(fw_reset->wq)
      
      Drain the fw_reset WQ, and make sure no new work is being queued, before
      entering uninit_one().
      The drain is done before devlink_unregister() since fw_reset, in some
      flows, uses the devlink API devlink_remote_reload_actions_performed().
      
      Fixes: 38b9f903 ("net/mlx5: Handle sync reset request event")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: CT: Fix setting flow_source for smfs ct tuples · 04c551ba
      Paul Blakey authored
      The cited patch sets flow_source to ANY, overriding the provided spec
      flow_source and defeating the optimization done by commit c9c079b4
      ("net/mlx5: CT: Set flow source hint from provided tuple device").
      
      To fix the above, set the dr_rule flow_source from provided flow spec.
      
      Fixes: 3ee61ebb ("net/mlx5: CT: Add software steering ct flow steering provider")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: CT: Fix support for GRE tuples · 8e1dcf49
      Paul Blakey authored
      The cited commit removed support for GRE tuples when software steering was enabled.
      
      To bring back support for GRE tuples, add GRE ipv4/ipv6 matchers.
      
      Fixes: 3ee61ebb ("net/mlx5: CT: Add software steering ct flow steering provider")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Remove HW-GRO from reported features · 6bbd7230
      Gal Pressman authored
      We got reports of certain HW-GRO flows causing kernel call traces, which
      might be related to firmware. To be on the safe side, disable the
      feature for now and re-enable it once a driver/firmware fix is found.
      
      Fixes: 83439f3c ("net/mlx5e: Add HW-GRO offload")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Properly block HW GRO when XDP is enabled · b0617e7b
      Maxim Mikityanskiy authored
      HW GRO is incompatible and mutually exclusive with XDP and XSK. However,
      the needed checks are only made when enabling XDP. If HW GRO is enabled
      when XDP is already active, the command will succeed, and XDP will be
      skipped in the data path, although still enabled.
      
      This commit fixes the bug by checking the XDP and XSK status in
      mlx5e_fix_features and disabling HW GRO if XDP is enabled.
      
      Fixes: 83439f3c ("net/mlx5e: Add HW-GRO offload")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Properly block LRO when XDP is enabled · cf6e34c8
      Maxim Mikityanskiy authored
      LRO is incompatible and mutually exclusive with XDP. However, the needed
      checks are only made when enabling XDP. If LRO is enabled when XDP is
      already active, the command will succeed, and XDP will be skipped in the
      data path, although still enabled.
      
      This commit fixes the bug by checking the XDP status in
      mlx5e_fix_features and disabling LRO if XDP is enabled.
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
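The two commits above share one idea: the incompatibility must be enforced when features are changed, not only when XDP is attached. A "fix features" callback that masks out incompatible bits can be sketched as follows. This is a simplified model under stated assumptions; the `FEAT_*` bits, `nic_state` and `nic_fix_features()` are hypothetical and do not mirror the real netdev feature flags or mlx5e_fix_features().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bits; not the kernel's NETIF_F_* values. */
#define FEAT_LRO    (1u << 0)
#define FEAT_GRO_HW (1u << 1)
#define FEAT_RXCSUM (1u << 2)

/* Hypothetical device state relevant to the decision. */
struct nic_state {
    bool xdp_prog_attached;
    bool xsk_pool_active;
};

/* Takes the feature set the user requested and returns the subset that is
 * safe to apply given the current XDP/XSK state. */
static uint32_t nic_fix_features(const struct nic_state *nic,
                                 uint32_t requested)
{
    uint32_t features = requested;

    assert(nic != NULL);

    if (nic->xdp_prog_attached)
        features &= ~(FEAT_LRO | FEAT_GRO_HW);  /* both conflict with XDP */
    if (nic->xsk_pool_active)
        features &= ~FEAT_GRO_HW;               /* HW GRO conflicts with XSK */

    return features;
}
```

Because the mask is applied on every feature change, turning on LRO or HW GRO while XDP is already active silently drops the conflicting bit instead of leaving XDP enabled but skipped in the data path.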
    • net/mlx5e: Block rx-gro-hw feature in switchdev mode · 15a5078c
      Aya Levin authored
      When the driver is in switchdev mode and rx-gro-hw is set, the RQ needs
      special CQE handling. Till then, block setting of rx-gro-hw feature in
      switchdev mode, to avoid failure while setting the feature due to
      failure while opening the RQ.
      
      Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Wrap mlx5e_trap_napi_poll into rcu_read_lock · 37916974
      Maxim Mikityanskiy authored
      The body of mlx5e_napi_poll is wrapped into rcu_read_lock to be able to
      read the XDP program pointer using rcu_dereference. However, the trap RQ
      NAPI doesn't use rcu_read_lock, because the trap RQ works only in the
      non-linear mode, and mlx5e_skb_from_cqe_nonlinear, until recently,
      didn't support XDP and didn't call rcu_dereference.
      
      Starting from the cited commit, mlx5e_skb_from_cqe_nonlinear supports
      XDP and calls rcu_dereference, but mlx5e_trap_napi_poll doesn't wrap it
      into rcu_read_lock. It leads to RCU-lockdep warnings like this:
      
          WARNING: suspicious RCU usage
      
      This commit fixes the issue by adding an rcu_read_lock to
      mlx5e_trap_napi_poll, similarly to mlx5e_napi_poll.
      
      Fixes: ea5d49bd ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Ignore modify TTL on RX if device doesn't support it · 785d7ed2
      Yevgeny Kliteynik authored
      When modifying TTL, packet's csum has to be recalculated.
      Due to HW issue in ConnectX-5, csum recalculation for modify
      TTL on RX is supported through a work-around that is specifically
      enabled by configuration.
      If the work-around isn't enabled, rather than adding an unsupported
      action the modify TTL action on RX should be ignored.
      Ignoring modify TTL action might result in zero actions, so in such
      cases we will not convert the match STE to modify STE, as it is done
      by FW in DMFS.
      
      This patch fixes an issue where modify TTL action was ignored both
      on RX and TX instead of only on RX.
      
      Fixes: 4ff725e1 ("net/mlx5: DR, Ignore modify TTL if device doesn't support it")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
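The corrected rule in the commit above is direction-sensitive: the modify-TTL action is dropped only on RX, and only when the checksum work-around is not enabled. A small sketch of that predicate, where `direction`, `dev_caps`, its field and `keep_modify_ttl_action()` are hypothetical names and not the software steering code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum direction { DIR_RX, DIR_TX };

/* Hypothetical capability flag for the RX checksum work-around. */
struct dev_caps {
    bool rx_modify_ttl_workaround;
};

/* Returns true if the modify-TTL action should be kept in the rule. */
static bool keep_modify_ttl_action(const struct dev_caps *caps,
                                   enum direction dir)
{
    assert(caps != NULL);

    /* Only RX depends on the work-around; TX is always supported. */
    if (dir == DIR_RX && !caps->rx_modify_ttl_workaround)
        return false;
    return true;
}
```

The earlier behaviour the commit fixes corresponds to ignoring `dir` and returning false whenever the work-around is missing, which also dropped the action on TX.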
    • net/mlx5: Initialize flow steering during driver probe · b3388697
      Shay Drory authored
      Currently, software objects of flow steering are created and destroyed
      during reload flow. In case a device is unloaded, the following error
      is printed during grace period:
      
       mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95):
          Driver is in error state. Unloading
      
      As a solution to fix use-after-free bugs, where we try to access
      these objects, when reading the value of flow_steering_mode devlink
      param[1], let's split flow steering creation and destruction into two
      routines:
          * init and cleanup: memory, cache, and pools allocation/free.
          * create and destroy: namespaces initialization and cleanup.
      
      While at it, re-order the cleanup function to mirror the init function.
      
      [1]
      Kasan trace:
      
      [  385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0
      [  385.119849 ] Read of size 4 at addr ffff888104b79308 by task bash/291
      [  385.119849 ]
      [  385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ #2
      [  385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
      [  385.119849 ] Call Trace:
      [  385.119849 ]  <TASK>
      [  385.119849 ]  dump_stack_lvl+0x6e/0x91
      [  385.119849 ]  print_address_description.constprop.0+0x1f/0x160
      [  385.119849 ]  ? mlx5_devlink_fs_mode_get+0x3b/0xa0
      [  385.119849 ]  ? mlx5_devlink_fs_mode_get+0x3b/0xa0
      [  385.119849 ]  kasan_report.cold+0x83/0xdf
      [  385.119849 ]  ? devlink_param_notify+0x20/0x190
      [  385.119849 ]  ? mlx5_devlink_fs_mode_get+0x3b/0xa0
      [  385.119849 ]  mlx5_devlink_fs_mode_get+0x3b/0xa0
      [  385.119849 ]  devlink_nl_param_fill+0x18a/0xa50
      [  385.119849 ]  ? _raw_spin_lock_irqsave+0x8d/0xe0
      [  385.119849 ]  ? devlink_flash_update_timeout_notify+0xf0/0xf0
      [  385.119849 ]  ? __wake_up_common+0x4b/0x1e0
      [  385.119849 ]  ? preempt_count_sub+0x14/0xc0
      [  385.119849 ]  ? _raw_spin_unlock_irqrestore+0x28/0x40
      [  385.119849 ]  ? __wake_up_common_lock+0xe3/0x140
      [  385.119849 ]  ? __wake_up_common+0x1e0/0x1e0
      [  385.119849 ]  ? __sanitizer_cov_trace_const_cmp8+0x27/0x80
      [  385.119849 ]  ? __rcu_read_unlock+0x48/0x70
      [  385.119849 ]  ? kasan_unpoison+0x23/0x50
      [  385.119849 ]  ? __kasan_slab_alloc+0x2c/0x80
      [  385.119849 ]  ? memset+0x20/0x40
      [  385.119849 ]  ? __sanitizer_cov_trace_const_cmp4+0x25/0x80
      [  385.119849 ]  devlink_param_notify+0xce/0x190
      [  385.119849 ]  devlink_unregister+0x92/0x2b0
      [  385.119849 ]  remove_one+0x41/0x140
      [  385.119849 ]  pci_device_remove+0x68/0x140
      [  385.119849 ]  ? pcibios_free_irq+0x10/0x10
      [  385.119849 ]  __device_release_driver+0x294/0x3f0
      [  385.119849 ]  device_driver_detach+0x82/0x130
      [  385.119849 ]  unbind_store+0x193/0x1b0
      [  385.119849 ]  ? subsys_interface_unregister+0x270/0x270
      [  385.119849 ]  drv_attr_store+0x4e/0x70
      [  385.119849 ]  ? drv_attr_show+0x60/0x60
      [  385.119849 ]  sysfs_kf_write+0xa7/0xc0
      [  385.119849 ]  kernfs_fop_write_iter+0x23a/0x2f0
      [  385.119849 ]  ? sysfs_kf_bin_read+0x160/0x160
      [  385.119849 ]  new_sync_write+0x311/0x430
      [  385.119849 ]  ? new_sync_read+0x480/0x480
      [  385.119849 ]  ? _raw_spin_lock+0x87/0xe0
      [  385.119849 ]  ? __sanitizer_cov_trace_cmp4+0x25/0x80
      [  385.119849 ]  ? security_file_permission+0x94/0xa0
      [  385.119849 ]  vfs_write+0x4c7/0x590
      [  385.119849 ]  ksys_write+0xf6/0x1e0
      [  385.119849 ]  ? __x64_sys_read+0x50/0x50
      [  385.119849 ]  ? fpregs_assert_state_consistent+0x99/0xa0
      [  385.119849 ]  do_syscall_64+0x3d/0x90
      [  385.119849 ]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  385.119849 ] RIP: 0033:0x7fc36ef38504
      [  385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f
      80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f
      05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
      [  385.119849 ] RSP: 002b:00007ffde0ff3d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  385.119849 ] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc36ef38504
      [  385.119849 ] RDX: 000000000000000c RSI: 00007fc370521040 RDI: 0000000000000001
      [  385.119849 ] RBP: 00007fc370521040 R08: 00007fc36f00b8c0 R09: 00007fc36ee4b740
      [  385.119849 ] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc36f00a760
      [  385.119849 ] R13: 000000000000000c R14: 00007fc36f005760 R15: 000000000000000c
      [  385.119849 ]  </TASK>
      [  385.119849 ]
      [  385.119849 ] Allocated by task 65:
      [  385.119849 ]  kasan_save_stack+0x1e/0x40
      [  385.119849 ]  __kasan_kmalloc+0x81/0xa0
      [  385.119849 ]  mlx5_init_fs+0x11b/0x1160
      [  385.119849 ]  mlx5_load+0x13c/0x220
      [  385.119849 ]  mlx5_load_one+0xda/0x160
      [  385.119849 ]  mlx5_recover_device+0xb8/0x100
      [  385.119849 ]  mlx5_health_try_recover+0x2f9/0x3a1
      [  385.119849 ]  devlink_health_reporter_recover+0x75/0x100
      [  385.119849 ]  devlink_health_report+0x26c/0x4b0
      [  385.275909 ]  mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0
      [  385.275909 ]  process_one_work+0x520/0x970
      [  385.275909 ]  worker_thread+0x378/0x950
      [  385.275909 ]  kthread+0x1bb/0x200
      [  385.275909 ]  ret_from_fork+0x1f/0x30
      [  385.275909 ]
      [  385.275909 ] Freed by task 65:
      [  385.275909 ]  kasan_save_stack+0x1e/0x40
      [  385.275909 ]  kasan_set_track+0x21/0x30
      [  385.275909 ]  kasan_set_free_info+0x20/0x30
      [  385.275909 ]  __kasan_slab_free+0xfc/0x140
      [  385.275909 ]  kfree+0xa5/0x3b0
      [  385.275909 ]  mlx5_unload+0x2e/0xb0
      [  385.275909 ]  mlx5_unload_one+0x86/0xb0
      [  385.275909 ]  mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf
      [  385.275909 ]  process_one_work+0x520/0x970
      [  385.275909 ]  worker_thread+0x378/0x950
      [  385.275909 ]  kthread+0x1bb/0x200
      [  385.275909 ]  ret_from_fork+0x1f/0x30
      [  385.275909 ]
      [  385.275909 ] The buggy address belongs to the object at ffff888104b79300
      [  385.275909 ]  which belongs to the cache kmalloc-128 of size 128
      [  385.275909 ] The buggy address is located 8 bytes inside of
      [  385.275909 ]  128-byte region [ffff888104b79300, ffff888104b79380)
      [  385.275909 ] The buggy address belongs to the page:
      [  385.275909 ] page:00000000de44dd39 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104b78
      [  385.275909 ] head:00000000de44dd39 order:1 compound_mapcount:0
      [  385.275909 ] flags: 0x8000000000010200(slab|head|zone=2)
      [  385.275909 ] raw: 8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0
      [  385.275909 ] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [  385.275909 ] page dumped because: kasan: bad access detected
      [  385.275909 ]
      [  385.275909 ] Memory state around the buggy address:
      [  385.275909 ]  ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
      [  385.275909 ]  ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  385.275909 ] >ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  385.275909 ]                       ^
      [  385.275909 ]  ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  385.275909 ]  ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  385.275909 ]]
      
      Fixes: e890acd5 ("net/mlx5: Add devlink flow_steering_mode parameter")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table · 2c5fc6cd
      Maor Dickman authored
      To support multi-destination FTEs with SW steering, a FW table is
      created with a single FTE that has multiple actions, and the SW
      steering rule forwards to it. When creating this table, the flow
      source isn't set according to the original FTE.
      
      Fix this by passing the original FTE flow source to the created
      FW table.
      
      Fixes: 34583bee ("net/mlx5: DR, Create multi-destination table for SW-steering use")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc · 23dd4581
      Duoming Zhou authored
      Sleep-in-atomic-context bugs occur when a request to the secure
      element of st-nci times out. The root cause is that nci_skb_alloc()
      is called with GFP_KERNEL in st_nci_se_wt_timeout(), which is a timer
      handler. The call paths that could trigger the bugs are shown below:
      
          (interrupt context 1)
      st_nci_se_wt_timeout
        nci_hci_send_event
          nci_hci_send_data
            nci_skb_alloc(..., GFP_KERNEL) //may sleep
      
         (interrupt context 2)
      st_nci_se_wt_timeout
        nci_hci_send_event
          nci_hci_send_data
            nci_send_data
              nci_queue_tx_data_frags
                nci_skb_alloc(..., GFP_KERNEL) //may sleep
      
      This patch changes the allocation mode of nci_skb_alloc from GFP_KERNEL
      to GFP_ATOMIC in order to prevent sleeping in atomic context. With the
      GFP_ATOMIC flag, the memory allocation can be used in atomic context.
      
      Fixes: ed06aeef ("nfc: st-nci: Rename st21nfcb to st-nci")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20220517012530.75714-1-duoming@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/qla3xxx: Fix a test in ql_reset_work() · 5361448e
      Christophe JAILLET authored
      test_bit() tests whether a single bit is set.
      Here the logic seems intended to check whether bit QL_RESET_PER_SCSI
      (i.e. 4) OR bit QL_RESET_START (i.e. 3) is set.
      
      In fact, it checks whether bit 7 (4 | 3 = 7) is set, that is to say
      QL_ADAPTER_UP.
      
      This looks harmless, because this bit is likely to be set, and when the
      ql_reset_work() delayed work is scheduled in ql3xxx_isr() (the only place
      that schedules this work), QL_RESET_START or QL_RESET_PER_SCSI is set.
      
      This has been spotted by smatch.
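
The mix-up between bit numbers and bit masks can be reproduced in user space. The sketch below is only an illustration: `demo_test_bit()` is a simplified stand-in for the kernel's test_bit(), the two `reset_requested_*()` helpers are hypothetical, and the bit numbers are the ones quoted in the message above.

```c
#include <stdbool.h>

/* Bit numbers as quoted in the commit message. */
enum {
	QL_RESET_START    = 3,
	QL_RESET_PER_SCSI = 4,
	QL_ADAPTER_UP     = 7,
};

/* Simplified user-space stand-in for the kernel's test_bit(). */
static bool demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Buggy form: ORs the bit *numbers* (4 | 3 = 7), so only bit 7 is tested. */
static bool reset_requested_buggy(const unsigned long *flags)
{
	return demo_test_bit(QL_RESET_PER_SCSI | QL_RESET_START, flags);
}

/* Intended form: test each bit on its own and OR the results. */
static bool reset_requested_fixed(const unsigned long *flags)
{
	return demo_test_bit(QL_RESET_PER_SCSI, flags) ||
	       demo_test_bit(QL_RESET_START, flags);
}
```

With only QL_RESET_START set, the buggy helper reports false and the fixed one reports true; with only QL_ADAPTER_UP set, the results are reversed.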
      
      Fixes: 5a4faa87 ("[PATCH] qla3xxx NIC driver")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/80e73e33f390001d9c0140ffa9baddf6466a41a2.1652637337.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 17 May, 2022 8 commits
    • ice: Fix interrupt moderation settings getting cleared · bf13502e
      Michal Wilczynski authored
      Adaptive-rx and Adaptive-tx are interrupt moderation settings
      that can be enabled/disabled using ethtool:
      ethtool -C ethX adaptive-rx on/off adaptive-tx on/off
      
      Unfortunately those settings are getting cleared after
      changing number of queues, or in ethtool world 'channels':
      ethtool -L ethX rx 1 tx 1
      
      Clearing was happening due to introduction of bit fields
      in ice_ring_container struct. This way only itr_setting
      bits were rebuilt during ice_vsi_rebuild_set_coalesce().
      
      Introduce an anonymous struct of bit fields and create a
      union to refer to them as a single variable.
      This way the variable can be easily saved and restored.
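
The union-over-bit-fields idea can be sketched as follows. This is a simplified user-space illustration, not the driver's definition: `demo_ring_container`, its field names and widths, and the two helpers are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified ring container. The real ice_ring_container
 * has more members, and its field names and widths may differ.
 */
struct demo_ring_container {
	union {
		struct {
			unsigned int itr_setting:13; /* throttle value */
			unsigned int itr_reserved:2;
			unsigned int itr_mode:1;     /* adaptive on/off */
		};
		uint32_t itr_settings; /* all bit fields viewed as one value */
	};
};

/* Save every bit field with a single read. */
static uint32_t demo_save_coalesce(const struct demo_ring_container *rc)
{
	return rc->itr_settings;
}

/* Restore every bit field with a single write, e.g. after a rebuild. */
static void demo_restore_coalesce(struct demo_ring_container *rc,
				  uint32_t saved)
{
	rc->itr_settings = saved;
}
```

Because the save and restore paths touch the whole word, a bit field added later is carried along without having to update either helper.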
      
      Fixes: 61dc79ce ("ice: Restore interrupt throttle settings after VSI rebuild")
      Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix possible under reporting of ethtool Tx and Rx statistics · 31b6298f
      Paul Greenwalt authored
      The hardware statistics counters are not cleared during resets, so the
      driver's first access initializes the baseline and subsequent
      reads are for reporting the counters. The statistics counters are read
      during the watchdog subtask when the interface is up. If the baseline
      is not initialized before the interface is up, then there can be a brief
      window in which some traffic can be transmitted/received before the
      initial baseline reading takes place.
      
      Directly initialize ethtool statistics in driver open so the baseline will
      be initialized when the interface is up, and any dropped packets
      incremented before the interface is up won't be reported.
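
The baseline mechanism described above can be sketched as follows. The `demo_stat` type and `demo_stat_update()` helper are hypothetical and ignore counter width and wrap-around handling; they only show why the moment of the first read matters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-counter bookkeeping, not the driver's structure. */
struct demo_stat {
	bool baseline_loaded;
	uint64_t baseline; /* raw hardware value at the first read */
	uint64_t reported; /* value exposed to the user */
};

/*
 * The hardware counter is never cleared, so the first read only records
 * the baseline; later reads report the delta relative to it.
 */
static void demo_stat_update(struct demo_stat *s, uint64_t hw_counter)
{
	if (!s->baseline_loaded) {
		s->baseline = hw_counter;
		s->baseline_loaded = true;
	}
	s->reported = hw_counter - s->baseline;
}
```

If the first call happens at open time with a raw value of 1000 and a later read sees 1250, the reported value is 250. If the first call is delayed until the raw value is already 1250, those 250 packets are absorbed into the baseline and never reported.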
      
      Fixes: 28dc1b86 ("ice: ignore dropped packets during init")
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix crash when writing timestamp on RX rings · 4503cc7f
      Arkadiusz Kubalewski authored
      Do not allow writing timestamps on RX rings while the PF is being
      configured. When the PF is being configured, RX rings can be freed or
      rebuilt. If timestamps are updated at the same time, the kernel will
      crash by dereferencing a NULL RX ring pointer.
      
      PID: 1449   TASK: ff187d28ed658040  CPU: 34  COMMAND: "ice-ptp-0000:51"
       #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be
       #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d
       #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd
       #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54
       #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4
       #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c
       #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4
       #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e
          [exception RIP: ice_ptp_update_cached_phctime+91]
          RIP: ffffffffc076db8b  RSP: ff1966a94a713e98  RFLAGS: 00010246
          RAX: 16e3db9c6b7ccae4  RBX: ff187d269dd3c180  RCX: ff187d269cd4d018
          RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
          RBP: ff187d269cfcc644   R8: ff187d339b9641b0   R9: 0000000000000000
          R10: 0000000000000002  R11: 0000000000000000  R12: ff187d269cfcc648
          R13: ffffffff9f128784  R14: ffffffff9d101b70  R15: ff187d269cfcc640
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice]
       #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b
       #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d
       #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f
      
      Fixes: 77a78115 ("ice: enable receive hardware timestamping")
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
      Tested-by: Dave Cain <dcain@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup() · edf410cb
      Zixuan Fu authored
      In vmxnet3_rq_create(), when dma_alloc_coherent() fails,
      vmxnet3_rq_destroy() is called. It sets rq->rx_ring[i].base to NULL. Then
      vmxnet3_rq_create() returns an error to its callers vmxnet3_rq_create_all()
      -> vmxnet3_change_mtu(). Then vmxnet3_change_mtu() calls
      vmxnet3_force_close() -> dev_close() in error handling code. And the driver
      calls vmxnet3_close() -> vmxnet3_quiesce_dev() -> vmxnet3_rq_cleanup_all()
      -> vmxnet3_rq_cleanup(). In vmxnet3_rq_cleanup(),
      rq->rx_ring[ring_idx].base is accessed, but this variable is NULL, causing
      a NULL pointer dereference.
      
      To fix this possible bug, an if statement is added to check whether
      rq->rx_ring[0].base is NULL in vmxnet3_rq_cleanup() and exit early if so.
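
The early-exit guard can be sketched in user space as follows. The `demo_*` types and the cleanup body are hypothetical stand-ins; only the idea of checking the first ring's base pointer before touching any descriptor comes from the message above.

```c
#include <stddef.h>

#define DEMO_NUM_RINGS 2

/* Hypothetical, simplified ring and queue types. */
struct demo_ring {
	int *base;         /* descriptor array, NULL once destroyed */
	unsigned int size; /* number of descriptors */
};

struct demo_rx_queue {
	struct demo_ring rx_ring[DEMO_NUM_RINGS];
};

/* Returns the number of descriptors that were cleaned. */
static unsigned int demo_rq_cleanup(struct demo_rx_queue *rq)
{
	unsigned int cleaned = 0;

	/* Rings were never created or were already destroyed: nothing to do. */
	if (!rq->rx_ring[0].base)
		return 0;

	for (unsigned int r = 0; r < DEMO_NUM_RINGS; r++) {
		for (unsigned int i = 0; i < rq->rx_ring[r].size; i++) {
			rq->rx_ring[r].base[i] = 0; /* would fault on a NULL base */
			cleaned++;
		}
	}
	return cleaned;
}
```

Calling the helper on a queue whose rings have a NULL base returns 0 instead of dereferencing the pointer, even though the recorded ring size is still nonzero.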
      
      The error log in our fault-injection testing is shown as follows:
      
      [   65.220135] BUG: kernel NULL pointer dereference, address: 0000000000000008
      ...
      [   65.222633] RIP: 0010:vmxnet3_rq_cleanup_all+0x396/0x4e0 [vmxnet3]
      ...
      [   65.227977] Call Trace:
      ...
      [   65.228262]  vmxnet3_quiesce_dev+0x80f/0x8a0 [vmxnet3]
      [   65.228580]  vmxnet3_close+0x2c4/0x3f0 [vmxnet3]
      [   65.228866]  __dev_close_many+0x288/0x350
      [   65.229607]  dev_close_many+0xa4/0x480
      [   65.231124]  dev_close+0x138/0x230
      [   65.231933]  vmxnet3_force_close+0x1f0/0x240 [vmxnet3]
      [   65.232248]  vmxnet3_change_mtu+0x75d/0x920 [vmxnet3]
      ...
      
      Fixes: d1a890fa ("net: VMware virtual Ethernet NIC driver: vmxnet3")
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
      Link: https://lore.kernel.org/r/20220514050711.2636709-1-r33s3n6@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf() · 9e7fef95
      Zixuan Fu authored
      In vmxnet3_rq_alloc_rx_buf(), when dma_map_single() fails, rbi->skb is
      freed immediately. Similarly, in another branch, when dma_map_page() fails,
      rbi->page is also freed. In the two cases, vmxnet3_rq_alloc_rx_buf()
      returns an error to its callers vmxnet3_rq_init() -> vmxnet3_rq_init_all()
      -> vmxnet3_activate_dev(). Then vmxnet3_activate_dev() calls
      vmxnet3_rq_cleanup_all() in error handling code, and rbi->skb or rbi->page
      are freed again in vmxnet3_rq_cleanup_all(), causing use-after-free bugs.
      
      To fix these possible bugs, rbi->skb and rbi->page should be cleared after
      they are freed.
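
The clear-after-free pattern can be sketched in user space as follows. The `demo_*` names are hypothetical, malloc()/free() stand in for the skb allocation and release, and the `mapping_fails` flag simulates a DMA mapping failure.

```c
#include <stdlib.h>

/* Hypothetical, simplified RX buffer bookkeeping. */
struct demo_rx_buf_info {
	void *skb;
};

/* Allocates a buffer; returns 0 on success and -1 on failure. */
static int demo_alloc_rx_buf(struct demo_rx_buf_info *rbi, int mapping_fails)
{
	rbi->skb = malloc(64);
	if (!rbi->skb)
		return -1;

	if (mapping_fails) {
		free(rbi->skb);
		rbi->skb = NULL; /* the fix: leave no stale pointer behind */
		return -1;
	}
	return 0;
}

/* Error-path cleanup: frees only buffers that are still owned. */
static void demo_rq_cleanup(struct demo_rx_buf_info *rbi)
{
	if (rbi->skb) {
		free(rbi->skb);
		rbi->skb = NULL;
	}
}
```

Without the NULL assignment in the failure branch, the later cleanup call would see a stale pointer and free the same buffer a second time.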
      
      The error log in our fault-injection testing is shown as follows:
      
      [   14.319016] BUG: KASAN: use-after-free in consume_skb+0x2f/0x150
      ...
      [   14.321586] Call Trace:
      ...
      [   14.325357]  consume_skb+0x2f/0x150
      [   14.325671]  vmxnet3_rq_cleanup_all+0x33a/0x4e0 [vmxnet3]
      [   14.326150]  vmxnet3_activate_dev+0xb9d/0x2ca0 [vmxnet3]
      [   14.326616]  vmxnet3_open+0x387/0x470 [vmxnet3]
      ...
      [   14.361675] Allocated by task 351:
      ...
      [   14.362688]  __netdev_alloc_skb+0x1b3/0x6f0
      [   14.362960]  vmxnet3_rq_alloc_rx_buf+0x1b0/0x8d0 [vmxnet3]
      [   14.363317]  vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
      [   14.363661]  vmxnet3_open+0x387/0x470 [vmxnet3]
      ...
      [   14.367309]
      [   14.367412] Freed by task 351:
      ...
      [   14.368932]  __dev_kfree_skb_any+0xd2/0xe0
      [   14.369193]  vmxnet3_rq_alloc_rx_buf+0x71e/0x8d0 [vmxnet3]
      [   14.369544]  vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
      [   14.369883]  vmxnet3_open+0x387/0x470 [vmxnet3]
      [   14.370174]  __dev_open+0x28a/0x420
      [   14.370399]  __dev_change_flags+0x192/0x590
      [   14.370667]  dev_change_flags+0x7a/0x180
      [   14.370919]  do_setlink+0xb28/0x3570
      [   14.371150]  rtnl_newlink+0x1160/0x1740
      [   14.371399]  rtnetlink_rcv_msg+0x5bf/0xa50
      [   14.371661]  netlink_rcv_skb+0x1cd/0x3e0
      [   14.371913]  netlink_unicast+0x5dc/0x840
      [   14.372169]  netlink_sendmsg+0x856/0xc40
      [   14.372420]  ____sys_sendmsg+0x8a7/0x8d0
      [   14.372673]  __sys_sendmsg+0x1c2/0x270
      [   14.372914]  do_syscall_64+0x41/0x90
      [   14.373145]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      ...
      
      Fixes: 5738a09d ("vmxnet3: fix checks for dma mapping errors")
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
      Link: https://lore.kernel.org/r/20220514050656.2636588-1-r33s3n6@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • xfrm: set dst dev to blackhole_netdev instead of loopback_dev in ifdown · 4d33ab08
      Xin Long authored
      The global blackhole_netdev has replaced the pernet loopback_dev as the
      device given to objects that hold a netdev on ifdown in many places in
      ipv4 and ipv6 since commit 8d7017fd ("blackhole_netdev: use
      blackhole_netdev to invalidate dst entries").
      
      Especially after commit faab39f6 ("net: allow out-of-order netdev
      unregistration"), it's no longer safe to use loopback_dev, which may be
      freed before other netdevs.
      
      This patch sets dst dev to blackhole_netdev instead of loopback_dev
      in ifdown.
      
      v1->v2:
        - add Fixes tag as Eric suggested.
      
      Fixes: faab39f6 ("net: allow out-of-order netdev unregistration")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/e8c87482998ca6fcdab214f5a9d582899ec0c648.1652665047.git.lucien.xin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: systemport: Fix an error handling path in bcm_sysport_probe() · ef6b1cd1
      Christophe JAILLET authored
      If devm_clk_get_optional() fails, we still need to go through the error
      handling path.
      
      Add the missing goto.
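
The shape of the fix can be sketched in user space as follows. This does not reproduce bcm_sysport_probe(): `demo_dev`, `demo_get_clk()`, and the error value are hypothetical, and malloc()/free() stand in for the resources acquired earlier in the probe.

```c
#include <stdlib.h>

/* Hypothetical device state, not the driver's private structure. */
struct demo_dev {
	void *priv;
};

/* Hypothetical stand-in for a clock lookup that can fail. */
static int demo_get_clk(int fail)
{
	return fail ? -5 : 0;
}

static int demo_probe(struct demo_dev *dev, int clk_fails)
{
	int ret;

	dev->priv = malloc(32);
	if (!dev->priv)
		return -1;

	ret = demo_get_clk(clk_fails);
	if (ret)
		goto err_free; /* a plain "return ret" here would leak dev->priv */

	return 0;

err_free:
	free(dev->priv);
	dev->priv = NULL;
	return ret;
}

static void demo_remove(struct demo_dev *dev)
{
	free(dev->priv);
	dev->priv = NULL;
}
```

Routing every late failure through the same label keeps the unwinding in one place, so a newly added failure point only needs the goto.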
      
      Fixes: 6328a126 ("net: systemport: Manage Wake-on-LAN clock")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/99d70634a81c229885ae9e4ee69b2035749f7edc.1652634040.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: lan966x: Fix assignment of the MAC address · af8ca6ea
      Horatiu Vultur authored
      The following two scenarios were failing for lan966x.
      1. If the port had the address X and the same address was assigned
         again, the HW just removed this address, because it first tries to
         learn the new address and then deletes the old one. As they are the
         same, the HW removes it.
      2. If the port eth0 was assigned the same address as one of the other
         ports, eth1, then when assigning the original address back to eth0
         the HW deleted the address of eth1.
      
      Case 1 is fixed by checking whether the port already has the same
      address, while case 2 is fixed by checking whether the address is used
      by any other port.
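
The two checks can be sketched with a toy MAC table in user space. Everything named `demo_*` is hypothetical and only models the learn-then-delete ordering described above; it is not the lan966x driver code.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_PORTS 2
#define DEMO_TABLE 8
#define DEMO_ALEN  6

/* Toy model of the per-port addresses and the hardware MAC table. */
struct demo_hw {
	unsigned char port_addr[DEMO_PORTS][DEMO_ALEN];
	unsigned char table[DEMO_TABLE][DEMO_ALEN];
	bool used[DEMO_TABLE];
};

static bool demo_table_has(const struct demo_hw *hw, const unsigned char *a)
{
	for (int i = 0; i < DEMO_TABLE; i++)
		if (hw->used[i] && !memcmp(hw->table[i], a, DEMO_ALEN))
			return true;
	return false;
}

static void demo_table_learn(struct demo_hw *hw, const unsigned char *a)
{
	if (demo_table_has(hw, a))
		return;
	for (int i = 0; i < DEMO_TABLE; i++) {
		if (!hw->used[i]) {
			memcpy(hw->table[i], a, DEMO_ALEN);
			hw->used[i] = true;
			return;
		}
	}
}

static void demo_table_forget(struct demo_hw *hw, const unsigned char *a)
{
	for (int i = 0; i < DEMO_TABLE; i++)
		if (hw->used[i] && !memcmp(hw->table[i], a, DEMO_ALEN))
			hw->used[i] = false;
}

static bool demo_used_elsewhere(const struct demo_hw *hw, int port,
				const unsigned char *a)
{
	for (int p = 0; p < DEMO_PORTS; p++)
		if (p != port && !memcmp(hw->port_addr[p], a, DEMO_ALEN))
			return true;
	return false;
}

static void demo_set_mac(struct demo_hw *hw, int port,
			 const unsigned char *new_addr)
{
	unsigned char old[DEMO_ALEN];

	/* Case 1: same address again, so learn + delete would remove it. */
	if (!memcmp(hw->port_addr[port], new_addr, DEMO_ALEN))
		return;

	memcpy(old, hw->port_addr[port], DEMO_ALEN);
	demo_table_learn(hw, new_addr);
	memcpy(hw->port_addr[port], new_addr, DEMO_ALEN);

	/* Case 2: keep the old entry if another port still uses it. */
	if (!demo_used_elsewhere(hw, port, old))
		demo_table_forget(hw, old);
}
```

In this model, re-assigning a port's current address leaves the table untouched, and moving eth0 away from an address shared with eth1 keeps eth1's entry in place.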
      
      Fixes: e18aba89 ("net: lan966x: add mactable support")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Link: https://lore.kernel.org/r/20220513180030.3076793-1-horatiu.vultur@microchip.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 16 May, 2022 4 commits