  1. 22 Sep, 2020 10 commits
    • net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported · 8f0bcd19
      Tariq Toukan authored
      The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc
      is updated from all rings by using atomic ops.
      This set of stats is used only in the FPGA TLS use case, not in
      the Connect-X TLS one, where regular per-ring counters are used.
      
      Do not expose them in the Connect-X use case, as this would cause
      counter duplication. For example, tx_tls_drop_no_sync_data would
      appear twice in the ethtool stats.
      
      Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() · b521105b
      Alaa Hleihel authored
      The cited commit started to reuse function mlx5e_update_ndo_stats() for
      the representors as well.
      However, the function is hard-coded to work on mlx5e_nic_stats_grps only.
      Due to this issue, the representors statistics were not updated in the
      output of "ip -s".
      
      Fix it to work with the correct group by extracting it from the caller's
      profile.
      
      Also, while at it and since this function became generic, move it to
      en_stats.c and rename it accordingly.
      
      Fixes: 8a236b15 ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra")
      Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix multicast counter not up-to-date in "ip -s" · 47c97e6b
      Ron Diskin authored
      Currently the FW does not generate events for counters other than error
      counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s
      uses) might run in atomic context, while the FW interface is non-atomic.
      Thus, 'ip' is not allowed to issue FW commands, so it will only display
      cached counters in the driver.
      
      Add a SW counter (mcast_packets) in the driver to count rx multicast
      packets. The counter also counts broadcast packets, as we consider it a
      special case of multicast.
      Use the counter value when calling "ip -s"/"ifconfig".
      
      Fixes: f62b8bb8 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
      Signed-off-by: Ron Diskin <rondi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
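      A minimal userspace sketch of the counting rule in the commit above,
      not the driver's code. It assumes frames are classified by the group
      bit of the destination MAC address; the struct and function names
      are hypothetical. Broadcast (ff:ff:ff:ff:ff:ff) has the group bit
      set, so it is counted as a special case of multicast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached software statistics, for illustration only. */
struct rx_sw_stats {
    uint64_t packets;
    uint64_t mcast_packets;
};

/* A destination MAC is a group (multicast) address when the least
 * significant bit of its first octet is set. Broadcast is all-ones,
 * so it also satisfies this test. */
static bool dst_is_multicast(const uint8_t dst[6])
{
    return (dst[0] & 0x01) != 0;
}

/* Account one received frame in the cached software counters, so a
 * reader in atomic context never needs to query the firmware. */
static void rx_account(struct rx_sw_stats *stats, const uint8_t dst[6])
{
    assert(stats && dst);
    stats->packets++;
    if (dst_is_multicast(dst))
        stats->mcast_packets++;
}
```

      Keeping the counter in software means a reader such as "ip -s" only
      has to read a cached value.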
    • net/mlx5e: Fix endianness when calculating pedit mask first bit · 82198d8b
      Maor Dickman authored
      The field mask value is provided in network byte order and has to
      be converted to host byte order before calculating pedit mask
      first bit.
      
      Fixes: 88f30bbc ("net/mlx5e: Bit sized fields rewrite support")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
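      The byte-order point in the commit above can be sketched in
      userspace as follows. This is an illustration under assumptions,
      not the driver's implementation: the helper names are hypothetical
      and ntohl() stands in for the kernel's byte-order conversion.

```c
#include <arpa/inet.h>  /* ntohl(), htonl() */
#include <assert.h>
#include <stdint.h>

/* Index (0-based, counted from the least significant bit) of the lowest
 * set bit in a non-zero host-order value. */
static int lowest_set_bit(uint32_t v)
{
    int bit = 0;

    assert(v != 0);
    while ((v & 1u) == 0) {
        v >>= 1;
        bit++;
    }
    return bit;
}

/* Hypothetical helper: first set bit of a mask supplied in network byte
 * order, or -1 for an empty mask. The conversion must come first; on a
 * little-endian host, scanning the raw value would examine the bytes in
 * reversed order and generally return a different index. */
static int mask_first_bit(uint32_t mask_be)
{
    uint32_t mask = ntohl(mask_be);

    return mask ? lowest_set_bit(mask) : -1;
}
```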
    • net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported · 6cec0229
      Maor Dickman authored
      The cited commit creates a peer miss group during switchdev mode
      initialization in order to handle miss packets correctly while in VF
      LAG mode. This is done regardless of whether the FW supports such
      groups, which could cause rule setup to fail later on.
      
      Fix by adding a FW capability check before creating the peer
      groups/rules.
      
      Fixes: ac004b83 ("net/mlx5e: E-Switch, Add peer miss rules")
      Signed-off-by: Maor Dickman <maord@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Raed Salem <raeds@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: CT: Fix freeing ct_label mapping · 4c8594ad
      Roi Dayan authored
      Add the missing mapping remove call when removing a ct rule,
      as the mapping was allocated when the ct rule was added with ct_label.
      A mapping remove call is also missing in the error flow.
      
      Fixes: 54b154ec ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Eli Britstein <elibr@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready · 12a240a4
      Jianbo Liu authored
      When deleting a vxlan flow rule under multipath, tun_info in parse_attr
      is not freed when the rule is not ready.
      
      Fixes: ef06c9ee ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath")
      Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Use synchronize_rcu to sync with NAPI · 9c25a22d
      Maxim Mikityanskiy authored
      As described in the previous commit, napi_synchronize doesn't quite fit
      the purpose when we just need to wait until the currently running NAPI
      quits. Its implementation waits until NAPI is not running by polling and
      waiting for 1ms in between. In cases where we need to deactivate one
      queue (e.g., recovery flows) or where we deactivate them one-by-one
      (deactivate channel flow), we may get stuck in napi_synchronize forever
      if other queues keep NAPI active, causing a soft lockup. Depending on
      kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result
      in a kernel panic.
      
      To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap
      the whole NAPI in rcu_read_lock.
      
      Fixes: acc6c595 ("net/mlx5e: Split open/close channels to stages")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Use RCU to protect rq->xdp_prog · fe45386a
      Maxim Mikityanskiy authored
      Currently, the RQs are temporarily deactivated while hot-replacing the
      XDP program, and napi_synchronize is used to make sure rq->xdp_prog is
      not in use. However, napi_synchronize is not ideal: instead of waiting
      till the end of a NAPI cycle, it polls and waits until NAPI is not
      running, sleeping for 1ms between the periodic checks. Under heavy
      workloads, this loop will never end, which may even lead to a kernel
      panic if the kernel detects the hangup. Such workloads include XSK TX
      and possibly also heavy RX (XSK or normal).
      
      The fix is inspired by commit 326fe02d ("net/mlx4_en: protect
      ring->xdp_prog with rcu_read_lock"). As mlx5e_xdp_handle is already
      protected by rcu_read_lock, and bpf_prog_put uses call_rcu to free the
      program, there is no need for additional synchronization if proper RCU
      functions are used to access the pointer. This patch converts all
      accesses to rq->xdp_prog to use RCU functions.
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix FTE cleanup · cefc2355
      Maor Gottlieb authored
      Currently, when an FTE is allocated, its refcount is decreased to 0
      so that it is not a standalone steering object, and every rule
      (destination) of the FTE increases the refcount.
      When mlx5_cleanup_fs is called while not all rules were deleted by the
      steering users, it hits a refcount underflow on the FTE once clean_tree
      calls tree_remove_node after the deleted rules have already decreased
      the refcount to 0.
      
      The FTE is no longer destroyed implicitly when the last rule
      (destination) is deleted. mlx5_del_flow_rules avoids that by increasing
      the refcount on the FTE and destroying it explicitly after all rules
      were deleted. So we can avoid the refcount underflow by making the FTE
      a standalone object. In addition, del_hw_func needs to be set for the
      FTE so the HW object is destroyed when the FTE is deleted from the
      cleanup_tree flow.
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 2 PID: 15715 at lib/refcount.c:28 refcount_warn_saturate+0xd9/0xe0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       tree_put_node+0xf2/0x140 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x5f/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x5f/0xf0 [mlx5_core]
       mlx5_cleanup_fs+0x26/0x270 [mlx5_core]
       mlx5_unload+0x2e/0xa0 [mlx5_core]
       mlx5_unload_one+0x51/0x120 [mlx5_core]
       mlx5_devlink_reload_down+0x51/0x90 [mlx5_core]
       devlink_reload+0x39/0x120
       ? devlink_nl_cmd_reload+0x43/0x220
       genl_rcv_msg+0x1e4/0x420
       ? genl_family_rcv_msg_attrs_parse+0x100/0x100
       netlink_rcv_skb+0x47/0x110
       genl_rcv+0x24/0x40
       netlink_unicast+0x217/0x2f0
       netlink_sendmsg+0x30f/0x430
       sock_sendmsg+0x30/0x40
       __sys_sendto+0x10e/0x140
       ? handle_mm_fault+0xc4/0x1f0
       ? do_page_fault+0x33f/0x630
       __x64_sys_sendto+0x24/0x30
       do_syscall_64+0x48/0x130
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 718ce4d6 ("net/mlx5: Consolidate update FTE for all removal changes")
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 18 Sep, 2020 2 commits
  3. 17 Sep, 2020 11 commits
    • Merge branch 'net-phy-Unbind-fixes' · 0dfdbc74
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: phy: Unbind fixes
      
      This patch series fixes a couple of issues with the unbinding of the PHY
      drivers and then bringing down a network interface. The first is a NULL
      pointer de-reference and the second was an incorrect warning being
      triggered.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Do not warn in phy_stop() on PHY_DOWN · 5116a8ad
      Florian Fainelli authored
      When phy_is_started() was added to catch incorrect PHY states,
      phy_stop() would not be qualified against PHY_DOWN. It is possible to
      reach that state when the PHY driver has been unbound and the network
      device is then brought down.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Avoid NPD upon phy_detach() when driver is unbound · c2b727df
      Florian Fainelli authored
      If we have unbound the PHY driver prior to calling phy_detach() (often
      via phy_disconnect()) then we can cause a NULL pointer de-reference
      accessing the driver owner member. The steps to reproduce are:
      
      echo unimac-mdio-0:01 > /sys/class/net/eth0/phydev/driver/unbind
      ip link set eth0 down
      
      Fixes: cafe8df8 ("net: phy: Fix lack of reference count on PHY driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add and use message type for tunnel info reply · 19a83d36
      Michal Kubecek authored
      Tunnel offload info code uses ETHTOOL_MSG_TUNNEL_INFO_GET message type (cmd
      field in genetlink header) for replies to tunnel info netlink request, i.e.
      the same value as the request has. This is a problem because we are using
      two separate enums for userspace to kernel and kernel to userspace message
      types so that this ETHTOOL_MSG_TUNNEL_INFO_GET (28) collides with
      ETHTOOL_MSG_CABLE_TEST_TDR_NTF which is what message type 28 means for
      kernel to userspace messages.
      
      As the tunnel info request reached mainline in 5.9 merge window, we should
      still be able to fix the reply message type without breaking backward
      compatibility.
      
      Fixes: c7d759eb ("ethtool: add tunnel info interface")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
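      A small sketch of why reusing the request value in a reply is a
      problem when each direction has its own enum. All names and values
      here are hypothetical and chosen for illustration; they are not the
      real ethtool netlink constants.

```c
#include <assert.h>

/* Hypothetical userspace-to-kernel message types. */
enum demo_user_to_kernel {
    DEMO_U2K_NONE,
    DEMO_U2K_LINK_GET,
    DEMO_U2K_TUNNEL_GET,        /* value 2 */
};

/* Hypothetical kernel-to-userspace message types. */
enum demo_kernel_to_user {
    DEMO_K2U_NONE,
    DEMO_K2U_LINK_GET_REPLY,
    DEMO_K2U_CABLE_NTF,         /* value 2: same number, different meaning */
    DEMO_K2U_TUNNEL_GET_REPLY,  /* dedicated reply type */
};

/* Buggy variant: echo the request's cmd value back in the reply header. */
static int reply_cmd_echo(enum demo_user_to_kernel request)
{
    return (int)request;
}

/* Fixed variant: answer a tunnel request with its own reply type taken
 * from the kernel-to-userspace enum. */
static int reply_cmd_dedicated(enum demo_user_to_kernel request)
{
    assert(request == DEMO_U2K_TUNNEL_GET);
    return DEMO_K2U_TUNNEL_GET_REPLY;
}
```

      With the echo variant, a receiver that decodes the reply using the
      kernel-to-userspace enum sees an unrelated notification type.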
    • drivers/net/wan/hdlc: Set skb->protocol before transmitting · 9fb030a7
      Xie He authored
      This patch sets skb->protocol before transmitting frames on the HDLC
      device, so that a user listening on the HDLC device with an AF_PACKET
      socket will see outgoing frames' sll_protocol field correctly set and
      consistent with that of incoming frames.
      
      1. Control frames in hdlc_cisco and hdlc_ppp
      
      When these drivers send control frames, skb->protocol is not set.
      
      This value should be set to htons(ETH_P_HDLC), because when receiving
      control frames, their skb->protocol is set to htons(ETH_P_HDLC).
      
      When receiving, hdlc_type_trans in hdlc.h is called, which then calls
      cisco_type_trans or ppp_type_trans. The skb->protocol of control frames
      is set to htons(ETH_P_HDLC) so that the control frames can be received
      by hdlc_rcv in hdlc.c, which calls cisco_rx or ppp_rx to process the
      control frames.
      
      2. hdlc_fr
      
      When this driver sends control frames, skb->protocol is set to internal
      values used in this driver.
      
      When this driver sends data frames (from upper stacked PVC devices),
      skb->protocol is the same as that of the user data packet being sent on
      the upper PVC device (for normal PVC devices), or is htons(ETH_P_802_3)
      (for Ethernet-emulating PVC devices).
      
      However, skb->protocol for both control frames and data frames should be
      set to htons(ETH_P_HDLC), because when receiving, all frames received on
      the HDLC device will have their skb->protocol set to htons(ETH_P_HDLC).
      
      When receiving, hdlc_type_trans in hdlc.h is called, and because this
      driver doesn't provide a type_trans function in struct hdlc_proto,
      all frames will have their skb->protocol set to htons(ETH_P_HDLC).
      The frames are then received by hdlc_rcv in hdlc.c, which calls fr_rx
      to process the frames (control frames are consumed and data frames
      are re-received on upper PVC devices).
      
      Cc: Krzysztof Halasa <khc@pm.waw.pl>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/wan/lapbether: Make skb->protocol consistent with the header · 83f9a9c8
      Xie He authored
      This driver is a virtual driver stacked on top of Ethernet interfaces.
      
      When this driver transmits data on the Ethernet device, the skb->protocol
      setting is inconsistent with the Ethernet header prepended to the skb.
      
      This causes a user listening on the Ethernet interface with an AF_PACKET
      socket, to see different sll_protocol values for incoming and outgoing
      frames, because incoming frames would have this value set by parsing the
      Ethernet header.
      
      This patch changes the skb->protocol value for outgoing Ethernet frames,
      making it consistent with the Ethernet header prepended. This makes a
      user listening on the Ethernet device with an AF_PACKET socket, to see
      the same sll_protocol value for incoming and outgoing frames.
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix memory leak during module unload · f4a26a9b
      Raju Rangoju authored
      Fix the memory leak in mps during the module unload
      path by freeing mps reference entries if the list
      adapter->mps_ref is not already empty.
      
      Fixes: 28b38705 ("cxgb4: Re-work the logic for mps refcounting")
      Signed-off-by: Raju Rangoju <rajur@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Add validation for untrusted Hyper-V values · 44144185
      Andres Beltran authored
      For additional robustness in the face of Hyper-V errors or malicious
      behavior, validate all values that originate from packets that Hyper-V
      has sent to the guest in the host-to-guest ring buffer. Ensure that
      invalid values cannot cause indexing off the end of an array, or
      subvert an existing validation via integer overflow. Ensure that
      outgoing packets do not have any leftover guest memory that has not
      been zeroed out.
      Signed-off-by: Andres Beltran <lkmlabelt@gmail.com>
      Co-developed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
      Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: netdev@vger.kernel.org
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
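      The two classes of check named in the commit above (array indexing
      and validation that survives integer overflow) can be sketched
      generically as follows. This is an illustration under assumptions,
      not the netvsc code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when [offset, offset + len) lies inside a buffer of buf_len bytes.
 * Comparing len against the remaining space avoids computing
 * offset + len, which could wrap around and defeat the check. */
static bool range_in_buffer(uint32_t offset, uint32_t len, uint32_t buf_len)
{
    if (offset > buf_len)
        return false;
    return len <= buf_len - offset;
}

/* Read table[idx] only after checking the untrusted index against the
 * table size. Returns false and leaves *out untouched when out of range. */
static bool table_lookup(const uint32_t *table, size_t nelems,
                         uint32_t idx, uint32_t *out)
{
    assert(table != NULL && out != NULL);
    if (idx >= nelems)
        return false;
    *out = table[idx];
    return true;
}
```

      A naive "offset + len <= buf_len" test would accept offset
      0xfffffff0 with len 0x20, because the 32-bit sum wraps to 0x10.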
    • net: dsa: microchip: ksz8795: really set the correct number of ports · fd944dc2
      Matthias Schiffer authored
      The KSZ9477 and KSZ8795 use the port_cnt field differently: For the
      KSZ9477, it includes the CPU port(s), while for the KSZ8795, it doesn't.
      
      It would be a good cleanup to make the handling of both drivers match,
      but as a first step, fix the recently broken assignment of num_ports in
      the KSZ8795 driver (which completely broke probing, as the CPU port
      index was always failing the num_ports check).
      
      Fixes: af199a1a ("net: dsa: microchip: set the correct number of ports")
      Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
      Reviewed-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • geneve: add transport ports in route lookup for geneve · 34beb215
      Mark Gray authored
      This patch adds transport ports information for route lookup so that
      IPsec can select Geneve tunnel traffic to do encryption. This is
      needed for OVS/OVN IPsec with encrypted Geneve tunnels.
      
      This can be tested by configuring a host-host VPN using an IKE
      daemon and specifying port numbers. For example, for an
      Openswan-type configuration, the following parameters should be
      configured on both hosts and IPsec set up as per normal:
      
      $ cat /etc/ipsec.conf
      
      conn in
      ...
      left=$IP1
      right=$IP2
      ...
      leftprotoport=udp/6081
      rightprotoport=udp
      ...
      conn out
      ...
      left=$IP1
      right=$IP2
      ...
      leftprotoport=udp
      rightprotoport=udp/6081
      ...
      
      The tunnel can then be set up using "ip" on both hosts (but
      changing the relevant IP addresses):
      
      $ ip link add tun type geneve id 1000 remote $IP2
      $ ip addr add 192.168.0.1/24 dev tun
      $ ip link set tun up
      
      This can then be tested by pinging from $IP1:
      
      $ ping 192.168.0.2
      
      Without this patch the traffic is unencrypted on the wire.
      
      Fixes: 2d07dc79 ("geneve: add initial netdev driver for GENEVE tunnels")
      Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
      Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: kerneldoc fixes · 5f1ab0f4
      Lu Wei authored
      Fix some parameter descriptions and spelling mistakes.
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 Sep, 2020 3 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d5d325ea
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-09-15
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 12 non-merge commits during the last 19 day(s) which contain
      a total of 10 files changed, 47 insertions(+), 38 deletions(-).
      
      The main changes are:
      
      1) docs/bpf fixes, from Andrii.
      
      2) ld_abs fix, from Daniel.
      
      3) socket casting helpers fix, from Martin.
      
      4) hash iterator fixes, from Yonghong.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix a rcu warning for bpffs map pretty-print · ce880cb8
      Yonghong Song authored
      Running selftest
        ./test_btf -p
      the kernel had the following warning:
        [   51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
        [   51.529217] Modules linked in:
        [   51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
        [   51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
        [   51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
        ...
        [   51.542826] Call Trace:
        [   51.543119]  map_seq_next+0x53/0x80
        [   51.543528]  seq_read+0x263/0x400
        [   51.543932]  vfs_read+0xad/0x1c0
        [   51.544311]  ksys_read+0x5f/0xe0
        [   51.544689]  do_syscall_64+0x33/0x40
        [   51.545116]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The related source code in kernel/bpf/hashtab.c:
        709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
        710 {
        711         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
        712         struct hlist_nulls_head *head;
        713         struct htab_elem *l, *next_l;
        714         u32 hash, key_size;
        715         int i = 0;
        716
        717         WARN_ON_ONCE(!rcu_read_lock_held());
      
      In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key()
      without holding rcu_read_lock(), hence causing the above warning.
      To fix the issue, surround map->ops->map_get_next_key() with the RCU read lock.
      
      Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap")
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com
    • bpf: Bpf_skc_to_* casting helpers require a NULL check on sk · 8c33dadc
      Martin KaFai Lau authored
      The bpf_skc_to_* type casting helpers are available to
      BPF_PROG_TYPE_TRACING.  The traced PTR_TO_BTF_ID may be NULL.
      For example, the skb->sk may be NULL.  Thus, these casting helpers
      need to check "!sk" also and this patch fixes them.
      
      Fixes: 0d4fad3e ("bpf: Add bpf_skc_to_udp6_sock() helper")
      Fixes: 478cfbdf ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
      Fixes: af7ec138 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200915182959.241101-1-kafai@fb.com
  5. 15 Sep, 2020 5 commits
    • ipv4: Update exception handling for multipath routes via same device · 2fbc6e89
      David Ahern authored
      Kfir reported that pmtu exceptions are not created properly for
      deployments where multipath routes use the same device.
      
      After some digging I see 2 compounding problems:
      1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
         the route lookup. This is the second use case where this has
         been a problem (the first is related to use of vti devices with
         VRF). I can not find any reason for the oif to be changed after the
         lookup; the code goes back to the start of git. It does not seem
         logical so remove it.
      
      2. fib_lookups for exceptions do not call fib_select_path to handle
         multipath route selection based on the hash.
      
      The end result is that the fib_lookup used to add the exception
      always creates it using the first leg of the route.
      
      An example topology showing the problem:
      
                       |  host1
                   +------+
                   | eth0 |  .209
                   +------+
                       |
                   +------+
           switch  | br0  |
                   +------+
                       |
             +---------+---------+
             | host2             |  host3
         +------+             +------+
         | eth0 | .250        | eth0 | 192.168.252.252
         +------+             +------+
      
         +-----+             +-----+
         | vti | .2          | vti | 192.168.247.3
         +-----+             +-----+
             \                  /
       =================================
       tunnels
               192.168.247.1/24
      
      for h in host1 host2 host3; do
              ip netns add ${h}
              ip -netns ${h} link set lo up
              ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
      done
      
      ip netns add switch
      ip -netns switch li set lo up
      ip -netns switch link add br0 type bridge stp 0
      ip -netns switch link set br0 up
      
      for n in 1 2 3; do
              ip -netns switch link add eth-sw type veth peer name eth-h${n}
              ip -netns switch li set eth-h${n} master br0 up
              ip -netns switch li set eth-sw netns host${n} name eth0
      done
      
      ip -netns host1 addr add 192.168.252.209/24 dev eth0
      ip -netns host1 link set dev eth0 up
      ip -netns host1 route add 192.168.247.0/24 \
              nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0
      
      ip -netns host2 addr add 192.168.252.250/24 dev eth0
      ip -netns host2 link set dev eth0 up
      
      ip -netns host3 addr add 192.168.252.252/24 dev eth0
      ip -netns host3 link set dev eth0 up
      
      ip netns add tunnel
      ip -netns tunnel li set lo up
      ip -netns tunnel li add br0 type bridge
      ip -netns tunnel li set br0 up
      for n in $(seq 11 20); do
              ip -netns tunnel addr add dev br0 192.168.247.${n}/24
      done
      
      for n in 2 3
      do
              ip -netns tunnel link add vti${n} type veth peer name eth${n}
              ip -netns tunnel link set eth${n} mtu 1360 master br0 up
              ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
              ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
      done
      ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3
      
      ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
      ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
      ip -netns host1 ro ls cache
      
      Before this patch the cache always shows exceptions against the first
      leg in the multipath route; 192.168.252.250 per this example. Since the
      hash has an initial random seed, you may need to vary the final octet
      more than what is listed. In my tests, using addresses between 11 and 19
      usually found 1 that used both legs.
      
      With this patch, the cache will have exceptions for both legs.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions")
      Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tipc: kerneldoc fixes · 2e5117ba
      Lu Wei authored
      Fix parameter description of tipc_link_bc_create()
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 16ad3f40 ("tipc: introduce variable window congestion control")
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: update MAINTAINERS · d3f2ef18
      Dany Madden authored
      Update supporters for IBM Power SRIOV Virtual NIC Device Driver.
      Thomas Falcon is moving on to other work. Dany Madden, Lijun Pan
      and Sukadev Bhattiprolu are the current supporters.
      Signed-off-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs/bpf: Remove source code links · 65dce596
      Andrii Nakryiko authored
      Make the path to bench_ringbufs.c plain text, not a special link.
      
      Fixes: 97abb2b3 ("docs/bpf: Add BPF ring buffer design notes")
      Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200915005031.2748397-1-andriin@fb.com
    • xsk: Fix number of pinned pages/umem size discrepancy · 2b1667e5
      Björn Töpel authored
      For AF_XDP sockets, there was a discrepancy between the number of
      pinned pages and the size of the umem region.
      
      The size of the umem region is used to validate the AF_XDP descriptor
      addresses. The logic that pinned the pages covered by the region only
      took whole pages into consideration, creating a mismatch between the
      size and pinned pages. A user could then pass AF_XDP addresses outside
      the range of pinned pages, but still within the size of the region,
      crashing the kernel.
      
      This change correctly calculates the number of pages to be
      pinned. Further, the size check for the aligned mode is
      simplified. Now the code simply checks if the size is divisible by the
      chunk size.
      
      Fixes: bbff2f32 ("xsk: new descriptor addressing scheme")
      Reported-by: Ciara Loftus <ciara.loftus@intel.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: Ciara Loftus <ciara.loftus@intel.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200910075609.7904-1-bjorn.topel@gmail.com
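      The arithmetic described in the commit above can be sketched as
      follows. This is a userspace illustration under assumptions: the
      page size is fixed at 4096 bytes and the names are hypothetical,
      so it is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Number of pages needed to cover size bytes. A trailing partial page
 * still needs to be pinned, so the division rounds up. */
static uint64_t pages_to_pin(uint64_t size)
{
    uint64_t npgs = size / SKETCH_PAGE_SIZE;

    if (size % SKETCH_PAGE_SIZE)
        npgs++;
    return npgs;
}

/* Aligned-mode size check: the region must be a whole number of chunks. */
static bool aligned_size_ok(uint64_t size, uint32_t chunk_size)
{
    assert(chunk_size != 0);
    return size % chunk_size == 0;
}
```

      Counting only whole pages would cover 4096 bytes of a 4097-byte
      region, leaving the last byte inside the region but outside the
      pinned range.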
  6. 14 Sep, 2020 9 commits