1. 10 Jul, 2020 22 commits
    • David S. Miller's avatar
      Merge branch 'mlxsw-Various-fixes' · 1195c7ce
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Various fixes
      
      Fix two issues found by syzkaller.
      
      Patch #1 removes inappropriate usage of WARN_ON() following memory
      allocation failure. Constantly triggered when syzkaller injects faults.
      
      Patch #2 fixes a use-after-free that can be triggered by 'devlink dev
      info' following a failed devlink reload.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1195c7ce
    • Ido Schimmel's avatar
      mlxsw: pci: Fix use-after-free in case of failed devlink reload · c4317b11
      Ido Schimmel authored
      In case devlink reload failed, it is possible to trigger a
      use-after-free when querying the kernel for device info via 'devlink dev
      info' [1].
      
      This happens because as part of the reload error path the PCI command
      interface is de-initialized and its mailboxes are freed. When the
      devlink '->info_get()' callback is invoked the device is queried via the
      command interface and the freed mailboxes are accessed.
      
      Fix this by initializing the command interface once during probe and not
      during every reload.
      
      This is consistent with the other bus used by mlxsw (i.e., 'mlxsw_i2c')
      and also allows user space to query the running firmware version (for
      example) from the device after a failed reload.
      
      [1]
      BUG: KASAN: use-after-free in memcpy include/linux/string.h:406 [inline]
      BUG: KASAN: use-after-free in mlxsw_pci_cmd_exec+0x177/0xa60 drivers/net/ethernet/mellanox/mlxsw/pci.c:1675
      Write of size 4096 at addr ffff88810ae32000 by task syz-executor.1/2355
      
      CPU: 1 PID: 2355 Comm: syz-executor.1 Not tainted 5.8.0-rc2+ #29
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xf6/0x16e lib/dump_stack.c:118
       print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       check_memory_region_inline mm/kasan/generic.c:186 [inline]
       check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
       memcpy+0x39/0x60 mm/kasan/common.c:106
       memcpy include/linux/string.h:406 [inline]
       mlxsw_pci_cmd_exec+0x177/0xa60 drivers/net/ethernet/mellanox/mlxsw/pci.c:1675
       mlxsw_cmd_exec+0x249/0x550 drivers/net/ethernet/mellanox/mlxsw/core.c:2335
       mlxsw_cmd_access_reg drivers/net/ethernet/mellanox/mlxsw/cmd.h:859 [inline]
       mlxsw_core_reg_access_cmd drivers/net/ethernet/mellanox/mlxsw/core.c:1938 [inline]
       mlxsw_core_reg_access+0x2f6/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1985
       mlxsw_reg_query drivers/net/ethernet/mellanox/mlxsw/core.c:2000 [inline]
       mlxsw_devlink_info_get+0x17f/0x6e0 drivers/net/ethernet/mellanox/mlxsw/core.c:1090
       devlink_nl_info_fill.constprop.0+0x13c/0x2d0 net/core/devlink.c:4588
       devlink_nl_cmd_info_get_dumpit+0x246/0x460 net/core/devlink.c:4648
       genl_lock_dumpit+0x85/0xc0 net/netlink/genetlink.c:575
       netlink_dump+0x515/0xe50 net/netlink/af_netlink.c:2245
       __netlink_dump_start+0x53d/0x830 net/netlink/af_netlink.c:2353
       genl_family_rcv_msg_dumpit.isra.0+0x296/0x300 net/netlink/genetlink.c:638
       genl_family_rcv_msg net/netlink/genetlink.c:733 [inline]
       genl_rcv_msg+0x78d/0x9d0 net/netlink/genetlink.c:753
       netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0x150/0x190 net/socket.c:672
       ____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
       ___sys_sendmsg+0xff/0x170 net/socket.c:2417
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
       do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: a9c8336f ("mlxsw: core: Add support for devlink info command")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4317b11
    • Ido Schimmel's avatar
      mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON() · d9d54202
      Ido Schimmel authored
      We should not trigger a warning when a memory allocation fails. Remove
      the WARN_ON().
      
      The warning is constantly triggered by syzkaller when it is injecting
      faults:
      
      [ 2230.758664] FAULT_INJECTION: forcing a failure.
      [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0
      [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28
      ...
      [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0
      [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ...
      [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28
      [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      
      Fixes: 3057224e ("mlxsw: spectrum_router: Implement FIB offload in deferred work")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9d54202
    • David S. Miller's avatar
      Merge branch 'macb-WOL-fixes' · f9f41e3d
      David S. Miller authored
      Nicolas Ferre says:
      
      ====================
      net: macb: Wake-on-Lan magic packet fixes and GEM handling
      
      Here is a split series to fix WoL magic-packet on the current macb driver. Only
      fixes in this one based on current net/master.
      
      Changes in v5:
      - Addressed the error code returned by phylink_ethtool_set_wol() as suggested
        by Russell.
        If PHY handles WoL, MAC doesn't stay in the way.
      - Removed Florian's tag on 3/5 because of the above changes.
      - Correct the "Fixes" tag on 1/5.
      
      Changes in v4:
      - Pure bug fix series for 'net'. GEM addition and MACB update removed: will be
        sent later.
      
      Changes in v3:
      - Revert some of the v2 changes done in macb_resume(). Now the resume function
        supports in-depth re-configuration of the controller in order to deal with
        deeper sleep states. Basically as it was before changes introduced by this
        series
      - Tested for non-regression with our deeper Power Management mode which cuts
        power to the controller completely
      
      Changes in v2:
      - Add patch 4/7 ("net: macb: fix macb_suspend() by removing call to netif_carrier_off()")
        needed for keeping phy state consistent
      - Add patch 5/7 ("net: macb: fix call to pm_runtime in the suspend/resume functions") that prevent
        putting the macb in runtime pm suspend mode when WoL is used
      - Collect review tags on 3 first patches from Florian: Thanks!
      - Review of macb_resume() function
      - Addition of pm_wakeup_event() in both MACB and GEM WoL IRQ handlers
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9f41e3d
    • Nicolas Ferre's avatar
      net: macb: fix call to pm_runtime in the suspend/resume functions · 6c8f85ca
      Nicolas Ferre authored
      The calls to pm_runtime_force_suspend/resume() functions are only
      relevant if the device is not configured to act as a WoL wakeup source.
      Add the device_may_wakeup() test before calling them.
      
      Fixes: 3e2a5e15 ("net: macb: add wake-on-lan support via magic packet")
      Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Harini Katakam <harini.katakam@xilinx.com>
      Cc: Sergio Prado <sergio.prado@e-labworks.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c8f85ca
    • Nicolas Ferre's avatar
      net: macb: fix macb_suspend() by removing call to netif_carrier_off() · 64febc5e
      Nicolas Ferre authored
      As we now use the phylink call to phylink_stop() in the non-WoL path,
      there is no need for this call to netif_carrier_off() anymore. It can
      disturb the underlying phylink FSM.
      
      Fixes: 7897b071 ("net: macb: convert to phylink")
      Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Harini Katakam <harini.katakam@xilinx.com>
      Cc: Antoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      64febc5e
    • Nicolas Ferre's avatar
      net: macb: fix macb_get/set_wol() when moving to phylink · 253fe094
      Nicolas Ferre authored
      Keep previous function goals and integrate phylink actions to them.
      
      phylink_ethtool_get_wol() is not enough to figure out if Ethernet driver
      supports Wake-on-Lan.
      Initialization of "supported" and "wolopts" members is done in phylink
      function, no need to keep them in calling function.
      
      phylink_ethtool_set_wol() return value is considered and determines
      if the MAC has to handle WoL or not. The case where the PHY doesn't
      implement WoL leads to the MAC configuring it to provide this feature.
      
      Fixes: 7897b071 ("net: macb: convert to phylink")
      Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Harini Katakam <harini.katakam@xilinx.com>
      Cc: Antoine Tenart <antoine.tenart@bootlin.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      253fe094
    • Nicolas Ferre's avatar
      net: macb: mark device wake capable when "magic-packet" property present · ced4799d
      Nicolas Ferre authored
      Change the way the "magic-packet" DT property is handled in the
      macb_probe() function, matching DT binding documentation.
      Now we mark the device as "wakeup capable" instead of calling the
      device_init_wakeup() function that would enable the wakeup source.
      
      For Ethernet WoL, enabling the wakeup_source is done by
      using ethtool and associated macb_set_wol() function that
      already calls device_set_wakeup_enable() for this purpose.
      
      That would reduce power consumption by cutting more clocks if
      "magic-packet" property is set but WoL is not configured by ethtool.
      
      Fixes: 3e2a5e15 ("net: macb: add wake-on-lan support via magic packet")
      Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Harini Katakam <harini.katakam@xilinx.com>
      Cc: Sergio Prado <sergio.prado@e-labworks.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ced4799d
    • Nicolas Ferre's avatar
      net: macb: fix wakeup test in runtime suspend/resume routines · 515a10a7
      Nicolas Ferre authored
      Use the proper struct device pointer to check if the wakeup flag
      and wakeup source are positioned.
      Use the one passed by function call which is equivalent to
      &bp->dev->dev.parent.
      
      It's preventing the trigger of a spurious interrupt in case the
      Wake-on-Lan feature is used.
      
      Fixes: d54f89af ("net: macb: Add pm runtime support")
      Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
      Cc: Harini Katakam <harini.katakam@xilinx.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      515a10a7
    • Davide Caratti's avatar
      bnxt_en: fix NULL dereference in case SR-IOV configuration fails · c8b1d743
      Davide Caratti authored
      we need to set 'active_vfs' back to 0, if something goes wrong during the
      allocation of SR-IOV resources: otherwise, further VF configurations will
      wrongly assume that bp->pf.vf[x] are valid memory locations, and commands
      like the ones in the following sequence:
      
       # echo 2 >/sys/bus/pci/devices/${ADDR}/sriov_numvfs
       # ip link set dev ens1f0np0 up
       # ip link set dev ens1f0np0 vf 0 trust on
      
      will cause a kernel crash similar to this:
      
       bnxt_en 0000:3b:00.0: not enough MMIO resources for SR-IOV
       BUG: kernel NULL pointer dereference, address: 0000000000000014
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP PTI
       CPU: 43 PID: 2059 Comm: ip Tainted: G          I       5.8.0-rc2.upstream+ #871
       Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 2.2.11 06/13/2019
       RIP: 0010:bnxt_set_vf_trust+0x5b/0x110 [bnxt_en]
       Code: 44 24 58 31 c0 e8 f5 fb ff ff 85 c0 0f 85 b6 00 00 00 48 8d 1c 5b 41 89 c6 b9 0b 00 00 00 48 c1 e3 04 49 03 9c 24 f0 0e 00 00 <8b> 43 14 89 c2 83 c8 10 83 e2 ef 45 84 ed 49 89 e5 0f 44 c2 4c 89
       RSP: 0018:ffffac6246a1f570 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000b
       RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff98b28f538900
       RBP: ffff98b28f538900 R08: 0000000000000000 R09: 0000000000000008
       R10: ffffffffb9515be0 R11: ffffac6246a1f678 R12: ffff98b28f538000
       R13: 0000000000000001 R14: 0000000000000000 R15: ffffffffc05451e0
       FS:  00007fde0f688800(0000) GS:ffff98baffd40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000014 CR3: 000000104bb0a003 CR4: 00000000007606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        do_setlink+0x994/0xfe0
        __rtnl_newlink+0x544/0x8d0
        rtnl_newlink+0x47/0x70
        rtnetlink_rcv_msg+0x29f/0x350
        netlink_rcv_skb+0x4a/0x110
        netlink_unicast+0x21d/0x300
        netlink_sendmsg+0x329/0x450
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x204/0x280
        ___sys_sendmsg+0x88/0xd0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x47/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Reported-by: default avatarFei Liu <feliu@redhat.com>
      CC: Jonathan Toppins <jtoppins@redhat.com>
      CC: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Acked-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8b1d743
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 45ae836f
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-07-09
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 4 non-merge commits during the last 1 day(s) which contain
      a total of 4 files changed, 26 insertions(+), 15 deletions(-).
      
      The main changes are:
      
      1) fix crash in libbpf on 32-bit archs, from Jakub and Andrii.
      
      2) fix crash when l2tp and bpf_sk_reuseport conflict, from Martin.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45ae836f
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2020-07-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ca68d563
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2020-07-02
      
      This series introduces some fixes to mlx5 driver.
      
      V1->v2:
       - Drop "ip -s" patch and mirred device hold reference patch.
       - Will revise them in a later submission.
      
      Please pull and let me know if there is any problem.
      
      For -stable v5.2
       ('net/mlx5: Fix eeprom support for SFP module')
      
      For -stable v5.4
       ('net/mlx5e: Fix 50G per lane indication')
      
      For -stable v5.5
       ('net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash')
       ('net/mlx5e: Fix VXLAN configuration restore after function reload')
      
      For -stable v5.7
       ('net/mlx5e: CT: Fix memory leak in cleanup')
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca68d563
    • Jakub Bogusz's avatar
      libbpf: Fix libbpf hashmap on (I)LP32 architectures · b2f9f153
      Jakub Bogusz authored
      On ILP32, 64-bit result was shifted by value calculated for 32-bit long type
      and returned value was much outside hashmap capacity.
      As advised by Andrii Nakryiko, this patch uses different hashing variant for
      architectures with size_t shorter than long long.
      
      Fixes: e3b92422 ("libbpf: add resizable non-thread safe internal hashmap")
      Signed-off-by: default avatarJakub Bogusz <qboosh@pld-linux.org>
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200709225723.1069937-1-andriin@fb.com
      b2f9f153
    • Eli Britstein's avatar
      net/mlx5e: CT: Fix memory leak in cleanup · eb32b3f5
      Eli Britstein authored
      CT entries are deleted via a workqueue from netfilter. If removing the
      module before that, the rules are cleaned by the driver itself, but the
      memory entries for them are not freed. Fix that.
      
      Fixes: ac991b48 ("net/mlx5e: CT: Offload established flows")
      Signed-off-by: default avatarEli Britstein <elibr@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      eb32b3f5
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix port buffers cell size value · 88b3d5c9
      Eran Ben Elisha authored
      Device unit for port buffers size, xoff_threshold and xon_threshold is
      cells. Fix a bug in driver where cell unit size was hard-coded to
      128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
      versions.
      
      Driver to read cell size from SBCAM register and translate bytes to cell
      units accordingly.
      
      In order to fix the bug, this patch exposes SBCAM (Shared buffer
      capabilities mask) layout and defines.
      
      If SBCAM.cap_cell_size is valid, use it for all bytes to cells
      calculations. If not valid, fallback to 128.
      
      Cell size do not change on the fly per device. Instead of issuing SBCAM
      access reg command every time such translation is needed, cache it in
      mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
      as a param to every function that needs bytes to cells translation.
      
      While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
      en_dcbnl.c, as it is only used by that file.
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      88b3d5c9
    • Aya Levin's avatar
      net/mlx5e: Fix 50G per lane indication · 6a1cf4e4
      Aya Levin authored
      Some released FW versions mistakenly don't set the capability that 50G
      per lane link-modes are supported for VFs (ptys_extended_ethernet
      capability bit). When the capability is unset, read
      PTYS.ext_eth_proto_capability (always reliable).
      If PTYS.ext_eth_proto_capability is valid (has a non-zero value)
      conclude that the HCA supports 50G per lane. Otherwise, conclude that
      the HCA doesn't support 50G per lane.
      
      Fixes: a08b4ed1 ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      6a1cf4e4
    • Aya Levin's avatar
      net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash · f4aebbfb
      Aya Levin authored
      After function reload, CPU mapping used by aRFS RX is broken, leading to
      a kernel panic. Fix by moving initialization of rx_cpu_rmap from
      netdev_init to netdev_attach. IRQ table is re-allocated on mlx5_load,
      but netdev is not re-initialize.
      
      Trace of the panic:
      [ 22.055672] general protection fault, probably for non-canonical address 0x785634120000ff1c: 0000 [#1] SMP PTI
      [ 22.065010] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.7.0-rc2-for-upstream-perf-2020-04-21_16-34-03-31 #1
      [ 22.067967] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [ 22.071174] RIP: 0010:get_rps_cpu+0x267/0x300
      [ 22.075692] RSP: 0018:ffffc90000244d60 EFLAGS: 00010202
      [ 22.076888] RAX: ffff888459b0e400 RBX: 0000000000000000 RCX:0000000000000007
      [ 22.078364] RDX: 0000000000008884 RSI: ffff888467cb5b00 RDI:0000000000000000
      [ 22.079815] RBP: 00000000ff342b27 R08: 0000000000000007 R09:0000000000000003
      [ 22.081289] R10: ffffffffffffffff R11: 00000000000070cc R12:ffff888454900000
      [ 22.082767] R13: ffffc90000e5a950 R14: ffffc90000244dc0 R15:0000000000000007
      [ 22.084190] FS: 0000000000000000(0000) GS:ffff88846fc80000(0000)knlGS:0000000000000000
      [ 22.086161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 22.087427] CR2: ffffffffffffffff CR3: 0000000464426003 CR4:0000000000760ee0
      [ 22.088888] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
      [ 22.090336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
      [ 22.091764] PKRU: 55555554
      [ 22.092618] Call Trace:
      [ 22.093442] <IRQ>
      [ 22.094211] ? kvm_clock_get_cycles+0xd/0x10
      [ 22.095272] netif_receive_skb_list_internal+0x258/0x2a0
      [ 22.096460] gro_normal_list.part.137+0x19/0x40
      [ 22.097547] napi_complete_done+0xc6/0x110
      [ 22.098685] mlx5e_napi_poll+0x190/0x670 [mlx5_core]
      [ 22.099859] net_rx_action+0x2a0/0x400
      [ 22.100848] __do_softirq+0xd8/0x2a8
      [ 22.101829] irq_exit+0xa5/0xb0
      [ 22.102750] do_IRQ+0x52/0xd0
      [ 22.103654] common_interrupt+0xf/0xf
      [ 22.104641] </IRQ>
      
      Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      f4aebbfb
    • Aya Levin's avatar
      net/mlx5e: Fix VXLAN configuration restore after function reload · b3c2ed21
      Aya Levin authored
      When detaching netdev, remove vxlan port configuration using
      udp_tunnel_drop_rx_info. During function reload, configuration will be
      restored using udp_tunnel_get_rx_info. This ensures sync between
      firmware and driver. Use udp_tunnel_get_rx_info even if its physical
      interface is down.
      
      Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      b3c2ed21
    • Vlad Buslov's avatar
      net/mlx5e: Fix usage of rcu-protected pointer · c1aea9e1
      Vlad Buslov authored
      In mlx5e_configure_flower() flow pointer is protected by rcu read lock.
      However, after cited commit the pointer is being used outside of rcu read
      block. Extend the block to protect all pointer accesses.
      
      Fixes: 553f9328 ("net/mlx5e: Support tc block sharing for representors")
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      c1aea9e1
    • Vlad Buslov's avatar
      net/mxl5e: Verify that rpriv is not NULL · 2fb15e72
      Vlad Buslov authored
      In helper function is_flow_rule_duplicate_allowed() verify that rpviv
      pointer is not NULL before dereferencing it. This can happen when device is
      in NIC mode and leads to following crash:
      
      [90444.046419] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [90444.048149] #PF: supervisor read access in kernel mode
      [90444.049781] #PF: error_code(0x0000) - not-present page
      [90444.051386] PGD 80000003d35a4067 P4D 80000003d35a4067 PUD 3d35a3067 PMD 0
      [90444.053051] Oops: 0000 [#1] SMP PTI
      [90444.054683] CPU: 16 PID: 31736 Comm: tc Not tainted 5.8.0-rc1+ #1157
      [90444.056340] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [90444.058079] RIP: 0010:mlx5e_configure_flower+0x3aa/0x9b0 [mlx5_core]
      [90444.059753] Code: 24 50 49 8b 95 08 02 00 00 48 b8 00 08 00 00 04 00 00 00 48 21 c2 48 39 c2 74 0a 41 f6 85 0d 02 00 00 20 74 16 48 8b 44 24 20 <48> 8b 00 66 83 78 20 ff 74 07 4d 89 aa e0 00 00 00 48 83 7d 28 00
      [90444.063232] RSP: 0018:ffffabe9c61ff768 EFLAGS: 00010246
      [90444.065014] RAX: 0000000000000000 RBX: ffff9b13c4c91e80 RCX: 00000000000093fa
      [90444.066784] RDX: 0000000400000800 RSI: 0000000000000000 RDI: 000000000002d5e0
      [90444.068533] RBP: ffff9b174d308468 R08: 0000000000000000 R09: ffff9b17d63003f0
      [90444.070285] R10: ffff9b17ea288600 R11: 0000000000000000 R12: ffffabe9c61ff878
      [90444.072032] R13: ffff9b174d300000 R14: ffffabe9c61ffbb8 R15: ffff9b174d300880
      [90444.073760] FS:  00007f3c23775480(0000) GS:ffff9b13efc80000(0000) knlGS:0000000000000000
      [90444.075492] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [90444.077266] CR2: 0000000000000000 CR3: 00000003e2a60002 CR4: 00000000001606e0
      [90444.079024] Call Trace:
      [90444.080753]  tc_setup_cb_add+0xca/0x1e0
      [90444.082415]  fl_hw_replace_filter+0x15f/0x1f0 [cls_flower]
      [90444.084119]  fl_change+0xa59/0x13dc [cls_flower]
      [90444.085772]  ? wait_for_completion+0xa8/0xf0
      [90444.087364]  tc_new_tfilter+0x3f5/0xa60
      [90444.088960]  rtnetlink_rcv_msg+0xeb/0x360
      [90444.090514]  ? __d_lookup_done+0x76/0xe0
      [90444.092034]  ? proc_alloc_inode+0x16/0x70
      [90444.093560]  ? prep_new_page+0x8c/0xf0
      [90444.095048]  ? _cond_resched+0x15/0x30
      [90444.096483]  ? rtnl_calcit.isra.0+0x110/0x110
      [90444.097907]  netlink_rcv_skb+0x49/0x110
      [90444.099289]  netlink_unicast+0x191/0x230
      [90444.100629]  netlink_sendmsg+0x243/0x480
      [90444.101984]  sock_sendmsg+0x5e/0x60
      [90444.103305]  ____sys_sendmsg+0x1f3/0x260
      [90444.104597]  ? copy_msghdr_from_user+0x5c/0x90
      [90444.105916]  ? __mod_lruvec_state+0x3c/0xe0
      [90444.107210]  ___sys_sendmsg+0x81/0xc0
      [90444.108484]  ? do_filp_open+0xa5/0x100
      [90444.109732]  ? handle_mm_fault+0x117b/0x1e00
      [90444.110970]  ? __check_object_size+0x46/0x147
      [90444.112205]  ? __check_object_size+0x136/0x147
      [90444.113402]  __sys_sendmsg+0x59/0xa0
      [90444.114587]  do_syscall_64+0x4d/0x90
      [90444.115782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [90444.116953] RIP: 0033:0x7f3c2393b7b8
      [90444.118101] Code: Bad RIP value.
      [90444.119240] RSP: 002b:00007ffc6ad8e6c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [90444.120408] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3c2393b7b8
      [90444.121583] RDX: 0000000000000000 RSI: 00007ffc6ad8e740 RDI: 0000000000000003
      [90444.122750] RBP: 000000005eea0c3a R08: 0000000000000001 R09: 00007ffc6ad8e68c
      [90444.123928] R10: 0000000000404fa8 R11: 0000000000000246 R12: 0000000000000001
      [90444.125073] R13: 0000000000000000 R14: 00007ffc6ad92a00 R15: 00000000004866a0
      [90444.126221] Modules linked in: act_skbedit act_tunnel_key act_mirred bonding vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_r
      apl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlxfw kvm act_ct nf_flow_table nf_nat nf_conntrack irqbypass crct10dif_pclmul nf_defrag_ipv6 igb ipmi_ssif libcrc32c crc32_pclmul crc32c_intel ipmi_si nf_defrag_ipv4 ptp ghash_clmulni_intel mei_me ses iTCO_wdt i2c_i801 pps_core
      ioatdma iTCO_vendor_support joydev mei enclosure intel_cstate i2c_smbus wmi dca ipmi_devintf intel_uncore lpc_ich ipmi_msghandler pcspkr acpi_pad acpi_power_meter ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
      [90444.136253] CR2: 0000000000000000
      [90444.137621] ---[ end trace 924af62aa2b151bd ]---
      
      Fixes: 553f9328 ("net/mlx5e: Support tc block sharing for representors")
      Reported-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      2fb15e72
    • Vu Pham's avatar
      net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode · 01f3d5db
      Vu Pham authored
      Refactoring eswitch ingress acl codes accidentally inserts extra
      memset zero that removes vlan and/or qos setting in legacy mode.
      
      Fixes: 07bab950 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
      Signed-off-by: default avatarVu Pham <vuhuong@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      01f3d5db
    • Eran Ben Elisha's avatar
      net/mlx5: Fix eeprom support for SFP module · 47afbdd2
      Eran Ben Elisha authored
      Fix eeprom SFP query support by setting i2c_addr, offset and page number
      correctly. Unlike QSFP modules, SFP eeprom params are as follow:
      - i2c_addr is 0x50 for offset 0 - 255 and 0x51 for offset 256 - 511.
      - Page number is always zero.
      - Page offset is always relative to zero.
      
      As part of eeprom query, query the module ID (SFP / QSFP*) via helper
      function to set the params accordingly.
      
      In addition, change mlx5_qsfp_eeprom_page() input type to be u16 to avoid
      unnecessary casting.
      
      Fixes: a708fb7b ("net/mlx5e: ethtool, Add support for EEPROM high pages query")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      47afbdd2
  2. 09 Jul, 2020 8 commits
    • Cong Wang's avatar
      cgroup: Fix sock_cgroup_data on big-endian. · 14b032b8
      Cong Wang authored
      In order for no_refcnt and is_data to be the lowest order two
      bits in the 'val' we have to pad out the bitfield of the u8.
      
      Fixes: ad0f75e5 ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()")
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14b032b8
    • Lorenz Bauer's avatar
      selftests: bpf: Fix detach from sockmap tests · f43cb0d6
      Lorenz Bauer authored
      Fix sockmap tests which rely on old bpf_prog_dispatch behaviour.
      In the first case, the tests check that detaching without giving
      a program succeeds. Since these are not the desired semantics,
      invert the condition. In the second case, the clean up code doesn't
      supply the necessary program fds.
      
      Fixes: bb0de313 ("bpf: sockmap: Require attach_bpf_fd when detaching a program")
      Reported-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20200709115151.75829-1-lmb@cloudflare.com
      f43cb0d6
    • Christoph Paasch's avatar
      tcp: make sure listeners don't initialize congestion-control state · ce69e563
      Christoph Paasch authored
      syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
      tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
      just copies all the memory, the allocated pointer will be copied as
      well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
      If now the socket will be destroyed before the congestion-control
      has properly been initialized (through a call to tcp_init_transfer), we
      will end up freeing memory that does not belong to that particular
      socket, opening the door to a double-free:
      
      [   11.413102] ==================================================================
      [   11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
      [   11.415329]
      [   11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
      [   11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.418148] Call Trace:
      [   11.418534]  <IRQ>
      [   11.418834]  dump_stack+0x7d/0xb0
      [   11.419297]  print_address_description.constprop.0+0x1a/0x210
      [   11.422079]  kasan_report_invalid_free+0x51/0x80
      [   11.423433]  __kasan_slab_free+0x15e/0x170
      [   11.424761]  kfree+0x8c/0x230
      [   11.425157]  tcp_cleanup_congestion_control+0x58/0xd0
      [   11.425872]  tcp_v4_destroy_sock+0x57/0x5a0
      [   11.426493]  inet_csk_destroy_sock+0x153/0x2c0
      [   11.427093]  tcp_v4_syn_recv_sock+0xb29/0x1100
      [   11.427731]  tcp_get_cookie_sock+0xc3/0x4a0
      [   11.429457]  cookie_v4_check+0x13d0/0x2500
      [   11.433189]  tcp_v4_do_rcv+0x60e/0x780
      [   11.433727]  tcp_v4_rcv+0x2869/0x2e10
      [   11.437143]  ip_protocol_deliver_rcu+0x23/0x190
      [   11.437810]  ip_local_deliver+0x294/0x350
      [   11.439566]  __netif_receive_skb_one_core+0x15d/0x1a0
      [   11.441995]  process_backlog+0x1b1/0x6b0
      [   11.443148]  net_rx_action+0x37e/0xc40
      [   11.445361]  __do_softirq+0x18c/0x61a
      [   11.445881]  asm_call_on_stack+0x12/0x20
      [   11.446409]  </IRQ>
      [   11.446716]  do_softirq_own_stack+0x34/0x40
      [   11.447259]  do_softirq.part.0+0x26/0x30
      [   11.447827]  __local_bh_enable_ip+0x46/0x50
      [   11.448406]  ip_finish_output2+0x60f/0x1bc0
      [   11.450109]  __ip_queue_xmit+0x71c/0x1b60
      [   11.451861]  __tcp_transmit_skb+0x1727/0x3bb0
      [   11.453789]  tcp_rcv_state_process+0x3070/0x4d3a
      [   11.456810]  tcp_v4_do_rcv+0x2ad/0x780
      [   11.457995]  __release_sock+0x14b/0x2c0
      [   11.458529]  release_sock+0x4a/0x170
      [   11.459005]  __inet_stream_connect+0x467/0xc80
      [   11.461435]  inet_stream_connect+0x4e/0xa0
      [   11.462043]  __sys_connect+0x204/0x270
      [   11.465515]  __x64_sys_connect+0x6a/0xb0
      [   11.466088]  do_syscall_64+0x3e/0x70
      [   11.466617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   11.467341] RIP: 0033:0x7f56046dc469
      [   11.467844] Code: Bad RIP value.
      [   11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      [   11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
      [   11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
      [   11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
      [   11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
      [   11.474321]
      [   11.474527] Allocated by task 4884:
      [   11.475031]  save_stack+0x1b/0x40
      [   11.475548]  __kasan_kmalloc.constprop.0+0xc2/0xd0
      [   11.476182]  tcp_cdg_init+0xf0/0x150
      [   11.476744]  tcp_init_congestion_control+0x9b/0x3a0
      [   11.477435]  tcp_set_congestion_control+0x270/0x32f
      [   11.478088]  do_tcp_setsockopt.isra.0+0x521/0x1a00
      [   11.478744]  __sys_setsockopt+0xff/0x1e0
      [   11.479259]  __x64_sys_setsockopt+0xb5/0x150
      [   11.479895]  do_syscall_64+0x3e/0x70
      [   11.480395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   11.481097]
      [   11.481321] Freed by task 4872:
      [   11.481783]  save_stack+0x1b/0x40
      [   11.482230]  __kasan_slab_free+0x12c/0x170
      [   11.482839]  kfree+0x8c/0x230
      [   11.483240]  tcp_cleanup_congestion_control+0x58/0xd0
      [   11.483948]  tcp_v4_destroy_sock+0x57/0x5a0
      [   11.484502]  inet_csk_destroy_sock+0x153/0x2c0
      [   11.485144]  tcp_close+0x932/0xfe0
      [   11.485642]  inet_release+0xc1/0x1c0
      [   11.486131]  __sock_release+0xc0/0x270
      [   11.486697]  sock_close+0xc/0x10
      [   11.487145]  __fput+0x277/0x780
      [   11.487632]  task_work_run+0xeb/0x180
      [   11.488118]  __prepare_exit_to_usermode+0x15a/0x160
      [   11.488834]  do_syscall_64+0x4a/0x70
      [   11.489326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Wei Wang fixed a part of these CDG-malloc issues with commit c1201444
      ("tcp: memset ca_priv data to 0 properly").
      
      This patch here fixes the listener-scenario: We make sure that listeners
      setting the congestion-control through setsockopt won't initialize it
      (thus CDG never allocates on listeners). For those who use AF_UNSPEC to
      reuse a socket, tcp_disconnect() is changed to cleanup afterwards.
      
      (The issue can be reproduced at least down to v4.4.x.)
      
      Cc: Wei Wang <weiwan@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Signed-off-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce69e563
    • Martin KaFai Lau's avatar
      bpf: net: Avoid incorrect bpf_sk_reuseport_detach call · c9a368f1
      Martin KaFai Lau authored
      bpf_sk_reuseport_detach is currently called when sk->sk_user_data
      is not NULL.  It is incorrect because sk->sk_user_data may not be
      managed by the bpf's reuseport_array.  It has been reported in [1] that,
      the bpf_sk_reuseport_detach() which is called from udp_lib_unhash() has
      corrupted the sk_user_data managed by l2tp.
      
      This patch solves it by using another bit (defined as SK_USER_DATA_BPF)
      of the sk_user_data pointer value.  It marks that a sk_user_data is
      managed/owned by BPF.
      
      The patch depends on a PTRMASK introduced in
      commit f1ff5ce2 ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
      
      [ Note: sk->sk_user_data is used by bpf's reuseport_array only when a sk is
        added to the bpf's reuseport_array.
        i.e. doing setsockopt(SO_REUSEPORT) and having "sk->sk_reuseport == 1"
        alone will not stop sk->sk_user_data being used by other means. ]
      
      [1]: https://lore.kernel.org/netdev/20200706121259.GA20199@katalix.com/
      
      Fixes: 5dc4c4b7 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
      Reported-by: default avatarJames Chapman <jchapman@katalix.com>
      Reported-by: syzbot+9f092552ba9a5efca5df@syzkaller.appspotmail.com
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarJames Chapman <jchapman@katalix.com>
      Acked-by: default avatarJames Chapman <jchapman@katalix.com>
      Link: https://lore.kernel.org/bpf/20200709061110.4019316-1-kafai@fb.com
      c9a368f1
    • Martin KaFai Lau's avatar
      bpf: net: Avoid copying sk_user_data of reuseport_array during sk_clone · f3dda7a6
      Martin KaFai Lau authored
      It makes little sense for copying sk_user_data of reuseport_array during
      sk_clone_lock().  This patch reuses the SK_USER_DATA_NOCOPY bit introduced in
      commit f1ff5ce2 ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
      It is used to mark the sk_user_data is not supposed to be copied to its clone.
      
      Although the cloned sk's sk_user_data will not be used/freed in
      bpf_sk_reuseport_detach(), this change can still allow the cloned
      sk's sk_user_data to be used by some other means.
      
      Freeing the reuseport_array's sk_user_data does not require a rcu grace
      period.  Thus, the existing rcu_assign_sk_user_data_nocopy() is not
      used.
      
      Fixes: 5dc4c4b7 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20200709061104.4018798-1-kafai@fb.com
      f3dda7a6
    • Michal Kubecek's avatar
      ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit() · 365f9ae4
      Michal Kubecek authored
      If the genlmsg_put() call in ethnl_default_dumpit() fails, we bail out
      without checking if we already have some messages in current skb like we do
      with ethnl_default_dump_one() failure later. Therefore if existing messages
      almost fill up the buffer so that there is not enough space even for
      netlink and genetlink header, we lose all prepared messages and return and
      error.
      
      Rather than duplicating the skb->len check, move the genlmsg_put(),
      genlmsg_cancel() and genlmsg_end() calls into ethnl_default_dump_one().
      This is also more logical as all message composition will be in
      ethnl_default_dump_one() and only iteration logic will be left in
      ethnl_default_dumpit().
      
      Fixes: 728480f1 ("ethtool: default handlers for GET requests")
      Reported-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      365f9ae4
    • Cong Wang's avatar
      net_sched: fix a memory leak in atm_tc_init() · 306381ae
      Cong Wang authored
      When tcf_block_get() fails inside atm_tc_init(),
      atm_tc_put() is called to release the qdisc p->link.q.
      But the flow->ref prevents it to do so, as the flow->ref
      is still zero.
      
      Fix this by moving the p->link.ref initialization before
      tcf_block_get().
      
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      306381ae
    • Sudarsana Reddy Kalluru's avatar
      qed: Populate nvm-file attributes while reading nvm config partition. · 13cf8aab
      Sudarsana Reddy Kalluru authored
      NVM config file address will be modified when the MBI image is upgraded.
      Driver would return stale config values if user reads the nvm-config
      (via ethtool -d) in this state. The fix is to re-populate nvm attribute
      info while reading the nvm config values/partition.
      
      Changes from previous version:
      -------------------------------
      v3: Corrected the formatting in 'Fixes' tag.
      v2: Added 'Fixes' tag.
      
      Fixes: 1ac4329a ("qed: Add configuration information to register dump and debug data")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: default avatarIgor Russkikh <irusskikh@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      13cf8aab
  3. 08 Jul, 2020 10 commits
    • Rahul Lakkireddy's avatar
      cxgb4: fix all-mask IP address comparison · 76c4d85c
      Rahul Lakkireddy authored
      Convert all-mask IP address to Big Endian, instead, for comparison.
      
      Fixes: f286dd8e ("cxgb4: use correct type for all-mask IP address comparison")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76c4d85c
    • Hamish Martin's avatar
      tipc: fix retransmission on unicast links · a34f8291
      Hamish Martin authored
      A scenario has been observed where a 'bc_init' message for a link is not
      retransmitted if it fails to be received by the peer. This leads to the
      peer never establishing the link fully and it discarding all other data
      received on the link. In this scenario the message is lost in transit to
      the peer.
      
      The issue is traced to the 'nxt_retr' field of the skb not being
      initialised for links that aren't a bc_sndlink. This leads to the
      comparison in tipc_link_advance_transmq() that gates whether to attempt
      retransmission of a message performing in an undesirable way.
      Depending on the relative value of 'jiffies', this comparison:
          time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)
      may return true or false given that 'nxt_retr' remains at the
      uninitialised value of 0 for non bc_sndlinks.
      
      This is most noticeable shortly after boot when jiffies is initialised
      to a high value (to flush out rollover bugs) and we compare a jiffies of,
      say, 4294940189 to zero. In that case time_before returns 'true' leading
      to the skb not being retransmitted.
      
      The fix is to ensure that all skbs have a valid 'nxt_retr' time set for
      them and this is achieved by refactoring the setting of this value into
      a central function.
      With this fix, transmission losses of 'bc_init' messages do not stall
      the link establishment forever because the 'bc_init' message is
      retransmitted and the link eventually establishes correctly.
      
      Fixes: 382f598f ("tipc: reduce duplicate packets for unicast traffic")
      Acked-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarHamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a34f8291
    • Xin Long's avatar
      l2tp: remove skb_dst_set() from l2tp_xmit_skb() · 27d53323
      Xin Long authored
      In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
      skb's dst. However, it will eventually call inet6_csk_xmit() or
      ip_queue_xmit() where skb's dst will be overwritten by:
      
         skb_dst_set_noref(skb, dst);
      
      without releasing the old dst in skb. Then it causes dst/dev refcnt leak:
      
        unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      This can be reproduced by simply running:
      
        # modprobe l2tp_eth && modprobe l2tp_ip
        # sh ./tools/testing/selftests/net/l2tp.sh
      
      So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
      should be dropped. This patch is to fix it by removing skb_dst_set()
      from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
      
      Fixes: 3557baab ("[L2TP]: PPP over L2TP driver core")
      Reported-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJames Chapman <jchapman@katalix.com>
      Tested-by: default avatarJames Chapman <jchapman@katalix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27d53323
    • David S. Miller's avatar
      Merge branch 'net-smc-fixes' · 1412bb2b
      David S. Miller authored
      Karsten Graul says:
      
      ====================
      net/smc: fixes 2020-07-08
      
      Please apply the following patch series for smc to netdev's net tree.
      
      The patches fix problems found during more testing of SMC
      functionality, resulting in hang conditions and unneeded link
      deactivations. The clc module was hardened to be prepared for
      possible future SMCD versions.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1412bb2b
    • Ursula Braun's avatar
      net/smc: tolerate future SMCD versions · fb4f7926
      Ursula Braun authored
      CLC proposal messages of future SMCD versions could be larger than SMCD
      V1 CLC proposal messages.
      To enable toleration in SMC V1 the receival of CLC proposal messages
      is adapted:
      * accept larger length values in CLC proposal
      * check trailing eye catcher for incoming CLC proposal with V1 length only
      * receive the whole CLC proposal even in cases it does not fit into the
        V1 buffer
      
      Fixes: e7b7a64a ("smc: support variable CLC proposal messages")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb4f7926
    • Ursula Braun's avatar
      net/smc: switch smcd_dev_list spinlock to mutex · 82087c03
      Ursula Braun authored
      The similar smc_ib_devices spinlock has been converted to a mutex.
      Protecting the smcd_dev_list by a mutex is possible as well. This
      patch converts the smcd_dev_list spinlock to a mutex.
      
      Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      82087c03
    • Ursula Braun's avatar
      net/smc: fix sleep bug in smc_pnet_find_roce_resource() · 92f3cb0e
      Ursula Braun authored
      Tests showed this BUG:
      [572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
      [572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
      [572555.252879] INFO: lockdep is turned off.
      [572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
      [572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
      [572555.252887] Call Trace:
      [572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
      [572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
      [572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
      [572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
      [572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
      [572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
      [572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
      [572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
      [572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
      [572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
      [572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
      [572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
      [572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
      [572555.253030] INFO: lockdep is turned off.
      
      Function smc_pnet_find_rdma_dev() might be called from
      smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
      spinlock while calling infiniband op get_netdev(). At least for mlx5
      the get_netdev operation wants mutex serialization, which conflicts
      with the smc_ib_devices spinlock.
      This patch switches the smc_ib_devices spinlock into a mutex to
      allow sleeping when calling get_netdev().
      
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      92f3cb0e
    • Karsten Graul's avatar
      net/smc: fix work request handling · b7eede75
      Karsten Graul authored
      Wait for pending sends only when smc_switch_conns() found a link to move
      the connections to. Do not wait during link freeing, this can lead to
      permanent hang situations. And refuse to provide a new tx slot on an
      unusable link.
      
      Fixes: c6f02ebe ("net/smc: switch connections to alternate link")
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b7eede75
    • Karsten Graul's avatar
      net/smc: separate LLC wait queues for flow and messages · 6778a6be
      Karsten Graul authored
      There might be races in scenarios where both SMC link groups are on the
      same system. Prevent that by creating separate wait queues for LLC flows
      and messages. Switch to non-interruptable versions of wait_event() and
      wake_up() for the llc flow waiter to make sure the waiters get control
      sequentially. Fine tune the llc_flow_lock to include the assignment of
      the message. Write to system log when an unexpected message was
      dropped. And remove an extra indirection and use the existing local
      variable lgr in smc_llc_enqueue().
      
      Fixes: 555da9af ("net/smc: add event-based llc_flow framework")
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6778a6be
    • Dmitry Bogdanov's avatar
      net: atlantic: fix ip dst and ipv6 address filters · a42e6aee
      Dmitry Bogdanov authored
      This patch fixes ip dst and ipv6 address filters.
      There were 2 mistakes in the code, which led to the issue:
      * invalid register was used for ipv4 dst address;
      * incorrect write order of dwords for ipv6 addresses.
      
      Fixes: 23e7a718 ("net: aquantia: add rx-flow filter definitions")
      Signed-off-by: default avatarDmitry Bogdanov <dbogdanov@marvell.com>
      Signed-off-by: default avatarMark Starovoytov <mstarovoitov@marvell.com>
      Signed-off-by: default avatarAlexander Lobakin <alobakin@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a42e6aee