1. 16 Sep, 2021 4 commits
    • Paolo Abeni's avatar
      igc: fix tunnel offloading · 40ee363c
      Paolo Abeni authored
      Checking tunnel offloading, it turns out that offloading doesn't work
      as expected.  The following script allows to reproduce the issue.
      Call it as `testscript DEVICE LOCALIP REMOTEIP NETMASK'
      
      === SNIP ===
      if [ $# -ne 4 ]
      then
        echo "Usage $0 DEVICE LOCALIP REMOTEIP NETMASK"
        exit 1
      fi
      DEVICE="$1"
      LOCAL_ADDRESS="$2"
      REMOTE_ADDRESS="$3"
      NWMASK="$4"
      echo "Driver: $(ethtool -i ${DEVICE} | awk '/^driver:/{print $2}') "
      ethtool -k "${DEVICE}" | grep tx-udp
      echo
      echo "Set up NIC and tunnel..."
      ip addr add "${LOCAL_ADDRESS}/${NWMASK}" dev "${DEVICE}"
      ip link set "${DEVICE}" up
      sleep 2
      ip link add vxlan1 type vxlan id 42 \
      		   remote "${REMOTE_ADDRESS}" \
      		   local "${LOCAL_ADDRESS}" \
      		   dstport 0 \
      		   dev "${DEVICE}"
      ip addr add fc00::1/64 dev vxlan1
      ip link set vxlan1 up
      sleep 2
      rm -f vxlan.pcap
      echo "Running tcpdump and iperf3..."
      ( nohup tcpdump -i any -w vxlan.pcap >/dev/null 2>&1 ) &
      sleep 2
      iperf3 -c fc00::2 >/dev/null
      pkill tcpdump
      echo
      echo -n "Max. Paket Size: "
      tcpdump -r vxlan.pcap -nnle 2>/dev/null \
      | grep "${LOCAL_ADDRESS}.*> ${REMOTE_ADDRESS}.*OTV" \
      | awk '{print $8}' | awk -F ':' '{print $1}' \
      | sort -n | tail -1
      echo
      ip link del vxlan1
      ip addr del ${LOCAL_ADDRESS}/${NWMASK} dev "${DEVICE}"
      === SNAP ===
      
      The expected outcome is
      
        Max. Paket Size: 64904
      
      This is what you see on igb, the code igc has been taken from.
      However, on igc the output is
      
        Max. Paket Size: 1516
      
      so the GSO aggregate packets are segmented by the kernel before calling
      igc_xmit_frame.  Inside the subsequent call to igc_tso, the check for
      skb_is_gso(skb) fails and the function returns prematurely.
      
      It turns out that this occurs because the feature flags aren't set
      entirely correctly in igc_probe.  In contrast to the original code
      from igb_probe, igc_probe neglects to set the flags required to allow
      tunnel offloading.
      
      Setting the same flags as igb fixes the issue on igc.
      
      Fixes: 34428dff ("igc: Add GSO partial support")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Tested-by: default avatarCorinna Vinschen <vinschen@redhat.com>
      Acked-by: default avatarSasha Neftin <sasha.neftin@intel.com>
      Tested-by: default avatarNechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      40ee363c
    • Eli Cohen's avatar
      net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert · 7c3a0a01
      Eli Cohen authored
      Remove the assert from the callback priv lookup function since it does
      not require RTNL lock and is already protected by flow_indr_block_lock.
      
      This will avoid warnings from being emitted to dmesg if the driver
      registers its callback after an ingress qdisc was created for a
      netdevice.
      
      The warnings started after the following patch was merged:
      commit 74fc4f82 ("net: Fix offloading indirect devices dependency on qdisc order creation")
      Signed-off-by: default avatarEli Cohen <elic@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c3a0a01
    • Adam Borowski's avatar
      net: wan: wanxl: define CROSS_COMPILE_M68K · 84fb7dfc
      Adam Borowski authored
      It was used but never set.  The hardcoded value from before the dawn of
      time was non-standard; the usual name for cross-tools is $TRIPLET-$TOOL
      Signed-off-by: default avatarAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84fb7dfc
    • Xiang wangx's avatar
      selftests: nci: replace unsigned int with int · 98dc68f8
      Xiang wangx authored
      Should not use comparison of unsigned expressions < 0.
      Signed-off-by: default avatarXiang wangx <wangxiang@cdjrlc.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98dc68f8
  2. 15 Sep, 2021 5 commits
    • Vladimir Oltean's avatar
      net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports · a57d8c21
      Vladimir Oltean authored
      Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
      messages appear:
      
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
      mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2
      
      (and similarly for other ports)
      
      What happens is that DSA has a policy "even if there are bugs, let's at
      least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
      dp->mdbs lists, which are supposed to be empty.
      
      But deleting that cleanup code, the warnings go away.
      
      => the FDB and MDB lists (used for refcounting on shared ports, aka CPU
      and DSA ports) will eventually be empty, but are not empty by the time
      we tear down those ports. Aka we are deleting them too soon.
      
      The addresses that DSA complains about are host-trapped addresses: the
      local addresses of the ports, and the MAC address of the bridge device.
      
      The problem is that offloading those entries happens from a deferred
      work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
      races with the teardown of the CPU and DSA ports where the refcounting
      is kept.
      
      In fact, not only it races, but fundamentally speaking, if we iterate
      through the port list linearly, we might end up tearing down the shared
      ports even before we delete a DSA user port which has a bridge upper.
      
      So as it turns out, we need to first tear down the user ports (and the
      unused ones, for no better place of doing that), then the shared ports
      (the CPU and DSA ports). In between, we need to ensure that all work
      items scheduled by our switchdev handlers (which only run for user
      ports, hence the reason why we tear them down first) have finished.
      
      Fixes: 161ca59d ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a57d8c21
    • Vladimir Oltean's avatar
      Revert "net: phy: Uniform PHY driver access" · 301de697
      Vladimir Oltean authored
      This reverts commit 3ac8eed6, which did
      more than it said on the box, and not only it replaced to_phy_driver
      with phydev->drv, but it also removed the "!drv" check, without actually
      explaining why that is fine.
      
      That patch in fact breaks suspend/resume on any system which has PHY
      devices with no drivers bound.
      
      The stack trace is:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
      pc : mdio_bus_phy_suspend+0xd8/0xec
      lr : dpm_run_callback+0x38/0x90
      Call trace:
       mdio_bus_phy_suspend+0xd8/0xec
       dpm_run_callback+0x38/0x90
       __device_suspend+0x108/0x3cc
       dpm_suspend+0x140/0x210
       dpm_suspend_start+0x7c/0xa0
       suspend_devices_and_enter+0x13c/0x540
       pm_suspend+0x2a4/0x330
      
      Examples why that assumption is not fine:
      
      - There is an MDIO bus with a PHY device that doesn't have a specific
        PHY driver loaded, because mdiobus_register() automatically creates a
        PHY device for it but there is no specific PHY driver in the system.
        Normally under those circumstances, the generic PHY driver will be
        bound lazily to it (at phy_attach_direct time). But some Ethernet
        drivers attach to their PHY at .ndo_open time. Until then it, the
        to-be-driven-by-genphy PHY device will not have a driver. The blamed
        patch amounts to saying "you need to open all net devices before the
        system can suspend, to avoid the NULL pointer dereference".
      
      - There is any raw MDIO device which has 'plausible' values in the PHY
        ID registers 2 and 3, which is located on an MDIO bus whose driver
        does not set bus->phy_mask = ~0 (which prevents auto-scanning of PHY
        devices). An example could be a MAC's internal MDIO bus with PCS
        devices on it, for serial links such as SGMII. PHY devices will get
        created for those PCSes too, due to that MDIO bus auto-scanning, and
        although those PHY devices are not used, they do not bother anybody
        either. PCS devices are usually managed in Linux as raw MDIO devices.
        Nonetheless, they do not have a PHY driver, nor does anybody attempt
        to connect to them (because they are not a PHY), and therefore this
        patch breaks that.
      
      The goal itself of the patch is questionable, so I am going for a
      straight revert. to_phy_driver does not seem to have a need to be
      replaced by phydev->drv, in fact that might even trigger code paths
      which were not given too deep of a thought.
      
      For instance:
      
      phy_probe populates phydev->drv at the beginning, but does not clean it
      up on any error (including EPROBE_DEFER). So if the phydev driver
      requests probe deferral, phydev->drv will remain populated despite there
      being no driver bound.
      
      If a system suspend starts in between the initial probe deferral request
      and the subsequent probe retry, we will be calling the phydev->drv->suspend
      method, but _before_ any phydev->drv->probe call has succeeded.
      
      That is to say, if the phydev->drv is allocating any driver-private data
      structure in ->probe, it pretty much expects that data structure to be
      available in ->suspend. But it may not. That is a pretty insane
      environment to present to PHY drivers.
      
      In the code structure before the blamed patch, mdio_bus_phy_may_suspend
      would just say "no, don't suspend" to any PHY device which does not have
      a driver pointer _in_the_device_structure_ (not the phydev->drv). That
      would essentially ensure that ->suspend will never get called for a
      device that has not yet successfully completed probe. This is the code
      structure the patch is returning to, via the revert.
      
      Fixes: 3ac8eed6 ("net: phy: Uniform PHY driver access")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20210914140515.2311548-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      301de697
    • Vladimir Oltean's avatar
      net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup · 6a52e733
      Vladimir Oltean authored
      DSA supports connecting to a phy-handle, and has a fallback to a non-OF
      based method of connecting to an internal PHY on the switch's own MDIO
      bus, if no phy-handle and no fixed-link nodes were present.
      
      The -ENODEV error code from the first attempt (phylink_of_phy_connect)
      is what triggers the second attempt (phylink_connect_phy).
      
      However, when the first attempt returns a different error code than
      -ENODEV, this results in an unbalance of calls to phylink_create and
      phylink_destroy by the time we exit the function. The phylink instance
      has leaked.
      
      There are many other error codes that can be returned by
      phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
      So this is a practical issue too.
      
      Fixes: aab9c406 ("net: dsa: Plug in PHYLINK support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6a52e733
    • Randy Dunlap's avatar
      ptp: dp83640: don't define PAGE0 · 7366c23f
      Randy Dunlap authored
      Building dp83640.c on arch/parisc/ produces a build warning for
      PAGE0 being redefined. Since the macro is not used in the dp83640
      driver, just make it a comment for documentation purposes.
      
      In file included from ../drivers/net/phy/dp83640.c:23:
      ../drivers/net/phy/dp83640_reg.h:8: warning: "PAGE0" redefined
          8 | #define PAGE0                     0x0000
                       from ../drivers/net/phy/dp83640.c:11:
      ../arch/parisc/include/asm/page.h:187: note: this is the location of the previous definition
        187 | #define PAGE0   ((struct zeropage *)__PAGE_OFFSET)
      
      Fixes: cb646e2b ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Richard Cochran <richard.cochran@omicron.at>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210913220605.19682-1-rdunlap@infradead.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7366c23f
    • Adrian Bunk's avatar
      bnx2x: Fix enabling network interfaces without VFs · 52ce14c1
      Adrian Bunk authored
      This function is called to enable SR-IOV when available,
      not enabling interfaces without VFs was a regression.
      
      Fixes: 65161c35 ("bnx2x: Fix missing error code in bnx2x_iov_init_one()")
      Signed-off-by: default avatarAdrian Bunk <bunk@kernel.org>
      Reported-by: default avatarYunQiang Su <wzssyqa@gmail.com>
      Tested-by: default avatarYunQiang Su <wzssyqa@gmail.com>
      Cc: stable@vger.kernel.org
      Acked-by: default avatarShai Malin <smalin@marvell.com>
      Link: https://lore.kernel.org/r/20210912190523.27991-1-bunk@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      52ce14c1
  3. 14 Sep, 2021 4 commits
    • Eric Dumazet's avatar
      Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" · d198b277
      Eric Dumazet authored
      This reverts commit d7807a9a.
      
      As mentioned in https://lkml.org/lkml/2021/9/13/1819
      5 years old commit 91948309 ("ipv4: fix memory leaks in ip_cmsg_send() callers")
      was a correct fix.
      
        ip_cmsg_send() can loop over multiple cmsghdr()
      
        If IP_RETOPTS has been successful, but following cmsghdr generates an error,
        we do not free ipc.ok
      
        If IP_RETOPTS is not successful, we have freed the allocated temporary space,
        not the one currently in ipc.opt.
      
      Sure, code could be refactored, but let's not bring back old bugs.
      
      Fixes: d7807a9a ("Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d198b277
    • zhenggy's avatar
      tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() · 4f884f39
      zhenggy authored
      Commit 10d3be56 ("tcp-tso: do not split TSO packets at retransmit
      time") may directly retrans a multiple segments TSO/GSO packet without
      split, Since this commit, we can no longer assume that a retransmitted
      packet is a single segment.
      
      This patch fixes the tp->undo_retrans accounting in tcp_sacktag_one()
      that use the actual segments(pcount) of the retransmitted packet.
      
      Before that commit (10d3be56), the assumption underlying the
      tp->undo_retrans-- seems correct.
      
      Fixes: 10d3be56 ("tcp-tso: do not split TSO packets at retransmit time")
      Signed-off-by: default avatarzhenggy <zhenggy@chinatelecom.cn>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f884f39
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 2865ba82
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-09-14
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 7 non-merge commits during the last 13 day(s) which contain
      a total of 18 files changed, 334 insertions(+), 193 deletions(-).
      
      The main changes are:
      
      1) Fix mmap_lock lockdep splat in BPF stack map's build_id lookup, from Yonghong Song.
      
      2) Fix BPF cgroup v2 program bypass upon net_cls/prio activation, from Daniel Borkmann.
      
      3) Fix kvcalloc() BTF line info splat on oversized allocation attempts, from Bixuan Cui.
      
      4) Fix BPF selftest build of task_pt_regs test for arm64/s390, from Jean-Philippe Brucker.
      
      5) Fix BPF's disasm.{c,h} to dual-license so that it is aligned with bpftool given the former
         is a build dependency for the latter, from Daniel Borkmann with ACKs from contributors.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2865ba82
    • Eric Dumazet's avatar
      net-caif: avoid user-triggerable WARN_ON(1) · 550ac9c1
      Eric Dumazet authored
      syszbot triggers this warning, which looks something
      we can easily prevent.
      
      If we initialize priv->list_field in chnl_net_init(),
      then always use list_del_init(), we can remove robust_list_del()
      completely.
      
      WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 robust_list_del net/caif/chnl_net.c:67 [inline]
      WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375
      Modules linked in:
      CPU: 0 PID: 3233 Comm: syz-executor.3 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:robust_list_del net/caif/chnl_net.c:67 [inline]
      RIP: 0010:chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375
      Code: 89 eb e8 3a a3 ba f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 bf 01 00 00 48 81 fb 00 14 4e 8d 48 8b 2b 75 d0 e8 17 a3 ba f8 <0f> 0b 5b 5d 41 5c 41 5d e9 0a a3 ba f8 4c 89 e3 e8 02 a3 ba f8 4c
      RSP: 0018:ffffc90009067248 EFLAGS: 00010202
      RAX: 0000000000008780 RBX: ffffffff8d4e1400 RCX: ffffc9000fd34000
      RDX: 0000000000040000 RSI: ffffffff88bb6e49 RDI: 0000000000000003
      RBP: ffff88802cd9ee08 R08: 0000000000000000 R09: ffffffff8d0e6647
      R10: ffffffff88bb6dc2 R11: 0000000000000000 R12: ffff88803791ae08
      R13: dffffc0000000000 R14: 00000000e600ffce R15: ffff888073ed3480
      FS:  00007fed10fa0700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2c322000 CR3: 00000000164a6000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       register_netdevice+0xadf/0x1500 net/core/dev.c:10347
       ipcaif_newlink+0x4c/0x260 net/caif/chnl_net.c:468
       __rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458
       rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       __sys_sendto+0x21c/0x320 net/socket.c:2036
       __do_sys_sendto net/socket.c:2048 [inline]
       __se_sys_sendto net/socket.c:2044 [inline]
       __x64_sys_sendto+0xdd/0x1b0 net/socket.c:2044
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: cc36a070 ("net-caif: add CAIF netdevice")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      550ac9c1
  4. 13 Sep, 2021 20 commits
  5. 12 Sep, 2021 3 commits
  6. 11 Sep, 2021 1 commit
    • Jesper Nilsson's avatar
      net: stmmac: allow CSR clock of 300MHz · 08dad2f4
      Jesper Nilsson authored
      The Synopsys Ethernet IP uses the CSR clock as a base clock for MDC.
      The divisor used is set in the MAC_MDIO_Address register field CR
      (Clock Rate)
      
      The divisor is there to change the CSR clock into a clock that falls
      below the IEEE 802.3 specified max frequency of 2.5MHz.
      
      If the CSR clock is 300MHz, the code falls back to using the reset
      value in the MAC_MDIO_Address register, as described in the comment
      above this code.
      
      However, 300MHz is actually an allowed value and the proper divider
      can be estimated quite easily (it's just 1Hz difference!)
      
      A CSR frequency of 300MHz with the maximum clock rate value of 0x5
      (STMMAC_CSR_250_300M, a divisor of 124) gives somewhere around
      ~2.42MHz which is below the IEEE 802.3 specified maximum.
      
      For the ARTPEC-8 SoC, the CSR clock is this problematic 300MHz,
      and unfortunately, the reset-value of the MAC_MDIO_Address CR field
      is 0x0.
      
      This leads to a clock rate of zero and a divisor of 42, and gives an
      MDC frequency of ~7.14MHz.
      
      Allow CSR clock of 300MHz by making the comparison inclusive.
      Signed-off-by: default avatarJesper Nilsson <jesper.nilsson@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      08dad2f4
  7. 10 Sep, 2021 3 commits
    • Yonghong Song's avatar
      bpf, mm: Fix lockdep warning triggered by stack_map_get_build_id_offset() · 2f1aaf3e
      Yonghong Song authored
      Currently the bpf selftest "get_stack_raw_tp" triggered the warning:
      
        [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
        [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
        [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
        [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
        [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
        [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
        [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
        [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
        [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
        [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
        [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
        [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
        [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
        [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 1411.304512] Call Trace:
        [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
        [ 1411.304528]  __bpf_get_stack+0x18f/0x230
        [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
        [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
        [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
        [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
        [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
        [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
        [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
        [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
        [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
        [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
        [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
        [ 1411.312291]  do_syscall_64+0x15/0x80
        [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
        [ 1411.315626] Call Trace:
        [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
      
      To reproduce, first build `test_progs` binary:
      
        make -C tools/testing/selftests/bpf -j60
      
      and then run the binary at tools/testing/selftests/bpf directory:
      
        ./test_progs -t get_stack_raw_tp
      
      The warning is due to commit 5b78ed24 ("mm/pagemap: add mmap_assert_locked()
      annotations to find_vma*()") which added mmap_assert_locked() in find_vma()
      function. The mmap_assert_locked() function asserts that mm->mmap_lock needs
      to be held. But this is not the case for bpf_get_stack() or bpf_get_stackid()
      helper (kernel/bpf/stackmap.c), which uses mmap_read_trylock_non_owner()
      instead. Since mm->mmap_lock is not held in bpf_get_stack[id]() use case,
      the above warning is emitted during test run.
      
      This patch fixed the issue by (1). using mmap_read_trylock() instead of
      mmap_read_trylock_non_owner() to satisfy lockdep checking in find_vma(), and
      (2). droping lockdep for mmap_lock right before the irq_work_queue(). The
      function mmap_read_trylock_non_owner() is also removed since after this
      patch nobody calls it any more.
      
      Fixes: 5b78ed24 ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
      Suggested-by: default avatarJason Gunthorpe <jgg@ziepe.ca>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarLiam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Luigi Rizzo <lrizzo@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: linux-mm@kvack.org
      Link: https://lore.kernel.org/bpf/20210909155000.1610299-1-yhs@fb.com
      2f1aaf3e
    • Colin Ian King's avatar
      qlcnic: Remove redundant initialization of variable ret · 666eb96d
      Colin Ian King authored
      The variable ret is being initialized with a value that is never read, it
      is being updated later on. The assignment is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      666eb96d
    • Shai Malin's avatar
      qed: Handle management FW error · 20e100f5
      Shai Malin authored
      Handle MFW (management FW) error response in order to avoid a crash
      during recovery flows.
      
      Changes from v1:
      - Add "Fixes tag".
      
      Fixes: tag 5e7ba042 ("qed: Fix reading stale configuration information")
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarShai Malin <smalin@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      20e100f5