1. 30 Jul, 2020 2 commits
    • igb: reinit_locked() should be called with rtnl_lock · 024a8168
      Francesco Ruggeri authored
      We observed two panics involving races with igb_reset_task.
      The first panic is caused by this race condition:
      
      	kworker			reboot -f
      
      	igb_reset_task
      	igb_reinit_locked
      	igb_down
      	napi_synchronize
      				__igb_shutdown
      				igb_clear_interrupt_scheme
      				igb_free_q_vectors
      				igb_free_q_vector
      				adapter->q_vector[v_idx] = NULL;
      	napi_disable
      	Panics trying to access
      	adapter->q_vector[v_idx].napi_state
      
      The second panic (a divide error) is caused by this race:
      
      kworker		reboot -f	tx packet
      
      igb_reset_task
      		__igb_shutdown
      		rtnl_lock()
      		...
      		igb_clear_interrupt_scheme
      		igb_free_q_vectors
      		adapter->num_tx_queues = 0
      		...
      		rtnl_unlock()
      rtnl_lock()
      igb_reinit_locked
      igb_down
      igb_up
      netif_tx_start_all_queues
      				dev_hard_start_xmit
      				igb_xmit_frame
      				igb_tx_queue_mapping
      				Panics on
      				r_idx % adapter->num_tx_queues
      
      This commit applies to igb_reset_task the same changes that
      were applied to ixgbe in commit 2f90b865 ("ixgbe: this patch
      adds support for DCB to the kernel and ixgbe driver"),
      commit 8f4c5c9f ("ixgbe: reinit_locked() should be called with
      rtnl_lock") and commit 88adce4e ("ixgbe: fix possible race in
      reset subtask").
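
The idea behind both races (take the RTNL lock in the reset task, then re-check the device state before reinitializing) can be modelled in userspace. The sketch below is a single-threaded model, not driver code: `toy_adapter`, `toy_rtnl_lock()`, `toy_shutdown()`, `toy_reset_task()` and the flag names are hypothetical stand-ins, and the exact state check the driver performs is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter state touched by both races. */
struct toy_adapter {
	bool down;          /* set once shutdown has torn the device down */
	bool resetting;     /* set while another reset is in progress */
	int num_tx_queues;  /* zeroed when the interrupt scheme is cleared */
	int reinit_count;   /* how many times the reinit actually ran */
};

/* Toy lock: it only detects unbalanced use; the real rtnl_lock() blocks. */
static bool toy_rtnl_held;

static void toy_rtnl_lock(void)
{
	assert(!toy_rtnl_held);
	toy_rtnl_held = true;
}

static void toy_rtnl_unlock(void)
{
	assert(toy_rtnl_held);
	toy_rtnl_held = false;
}

/* Models the shutdown path: tears down the queues while holding the lock. */
static void toy_shutdown(struct toy_adapter *a)
{
	toy_rtnl_lock();
	a->down = true;
	a->num_tx_queues = 0;
	toy_rtnl_unlock();
}

/* Models the reset task: lock first, then re-check state before reinit. */
static bool toy_reset_task(struct toy_adapter *a)
{
	bool ran = false;

	toy_rtnl_lock();
	if (!a->down && !a->resetting) {
		a->reinit_count++;  /* stands in for the down/up cycle */
		ran = true;
	}
	toy_rtnl_unlock();
	return ran;
}
```

In this model a reset that runs after shutdown becomes a no-op, so nothing restarts the TX queues while `num_tx_queues` is 0.
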
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • e1000e: continue to init PHY even when failed to disable ULP · 1050242f
      Aaron Ma authored
      After commit e086ba2f ("e1000e: disable s0ix entry and exit flows
      for ME systems"), the ThinkPad P14s always fails to disable ULP
      through the ME. Commit 0c80cdbf ("e1000e: Warn if disabling ULP
      failed") then breaks out of PHY init:
      
      error log:
      [   42.364753] e1000e 0000:00:1f.6 enp0s31f6: Failed to disable ULP
      [   42.524626] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Unicast Packet
      [   42.822476] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
      
      When s0ix is disabled, E1000_FWSM_ULP_CFG_DONE will never be 1.
      If PHY init continues as it did before, the device works as before.
      iperf test results are good too.
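
The behaviour described above (warn when the ULP disable step fails, but carry on with the rest of PHY init) can be sketched as a small userspace model. All names here (`toy_status`, `toy_disable_ulp()`, `toy_finish_phy_init()`, `toy_init_phy()`) are hypothetical and do not correspond to driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum toy_status { TOY_OK = 0, TOY_ERR_PHY = -1 };

/* Stand-in for the ULP disable step; fails if completion is never reported. */
static enum toy_status toy_disable_ulp(bool cfg_done_reported)
{
	return cfg_done_reported ? TOY_OK : TOY_ERR_PHY;
}

/* Stand-in for the remaining PHY setup; counts how often it ran. */
static enum toy_status toy_finish_phy_init(int *steps_run)
{
	assert(steps_run != NULL);
	(*steps_run)++;
	return TOY_OK;
}

/* Warn on a ULP failure, then continue instead of returning early. */
static enum toy_status toy_init_phy(bool cfg_done_reported, int *steps_run)
{
	if (toy_disable_ulp(cfg_done_reported) != TOY_OK)
		fprintf(stderr, "Failed to disable ULP\n");

	return toy_finish_phy_init(steps_run);
}
```
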
      
      Fixes: 0c80cdbf ("e1000e: Warn if disabling ULP failed")
      Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  2. 29 Jul, 2020 22 commits
    • ibmvnic: Fix IRQ mapping disposal in error path · 27a2145d
      Thomas Falcon authored
      RX queue IRQ mappings are disposed in both the TX IRQ and RX IRQ
      error paths. Fix this and dispose of TX IRQ mappings correctly in
      case of an error.
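
A minimal userspace sketch of the unwinding this message describes, where each error label disposes of its own kind of mapping. The structure `toy_irqs`, the function `toy_init_irqs()` and the `fail_at` parameter are hypothetical and exist only to simulate a failure partway through setup.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_QUEUES 8

/* Hypothetical per-queue record of which IRQ mappings currently exist. */
struct toy_irqs {
	bool tx_mapped[TOY_MAX_QUEUES];
	bool rx_mapped[TOY_MAX_QUEUES];
};

/*
 * Map n_tx TX IRQs, then n_rx RX IRQs. fail_at is the index (counted across
 * TX, then RX) at which a simulated request fails, or -1 for no failure.
 * Returns 0 on success, or -1 with every mapping undone.
 */
static int toy_init_irqs(struct toy_irqs *q, int n_tx, int n_rx, int fail_at)
{
	int i, j;

	assert(n_tx <= TOY_MAX_QUEUES && n_rx <= TOY_MAX_QUEUES);

	for (i = 0; i < n_tx; i++) {
		if (i == fail_at)
			goto err_tx;
		q->tx_mapped[i] = true;
	}
	for (i = 0; i < n_rx; i++) {
		if (n_tx + i == fail_at)
			goto err_rx;
		q->rx_mapped[i] = true;
	}
	return 0;

err_rx:
	/* Undo only the RX mappings created so far. */
	for (j = 0; j < i; j++)
		q->rx_mapped[j] = false;
	i = n_tx;  /* every TX mapping exists at this point */
err_tx:
	/* Undo the TX mappings, not the RX ones. */
	for (j = 0; j < i; j++)
		q->tx_mapped[j] = false;
	return -1;
}
```
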
      
      Fixes: ea22d51a ("ibmvnic: simplify and improve driver probe function")
      Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fixes' · 5d104a5f
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw fixes
      
      This patch set contains various fixes for mlxsw.
      
      Patches #1-#2 fix two trap related issues introduced in the previous cycle.
      
      Patches #3-#5 fix rare use-after-frees discovered by syzkaller. After
      over a week of fuzzing with the fixes, the bugs did not reproduce.
      
      Patch #6 from Amit fixes an issue in the ethtool selftest that was
      recently discovered after running the test on a new platform that
      supports only 1Gbps and 10Gbps speeds.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: ethtool: Fix test when only two speeds are supported · 10fef9ca
      Amit Cohen authored
      The test case check_highest_speed_is_chosen() configures $h1 to
      advertise a subset of its supported speeds and checks that $h2 chooses
      the highest speed from the subset.
      
      To find the common advertised speeds between $h1 and $h2,
      common_speeds_get() is called.
      
      Currently, the first speed returned from common_speeds_get() is removed
      claiming "h1 does not advertise this speed". The claim is wrong because
      the function is called after $h1 already advertised a subset of speeds.
      
      In case $h1 supports only two speeds, it will advertise a single speed
      which will be later removed because of the previously mentioned bug.
      This results in the test needlessly failing. When more than two speeds
      are supported this is not an issue because the first advertised speed
      is the lowest one.
      
      Fix this by not removing any speed from the list of commonly advertised
      speeds.
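
The effect of dropping the first common speed can be shown with a small model. The selftest itself is a shell script; the C function `toy_highest_speed()` below and its `skip` parameter are hypothetical and only illustrate the arithmetic of the bug.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the highest entry of speeds[skip..n-1], or 0 when that range is
 * empty. skip = 1 models the old behaviour of discarding the first entry;
 * skip = 0 models keeping the whole list.
 */
static unsigned int toy_highest_speed(const unsigned int *speeds, size_t n,
				      size_t skip)
{
	unsigned int best = 0;
	size_t i;

	assert(speeds != NULL || n == 0);

	for (i = skip; i < n; i++)
		if (speeds[i] > best)
			best = speeds[i];
	return best;
}
```

With a single advertised speed, skipping the first entry leaves nothing to choose from; with a longer list sorted from lowest to highest, the skipped entry is never the highest, which matches the observation that the bug only shows on two-speed ports.
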
      
      Fixes: 64916b57 ("selftests: forwarding: Add speed and auto-negotiation test")
      Reported-by: Danielle Ratson <danieller@mellanox.com>
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Fix use-after-free in router init / de-init · 5515c344
      Ido Schimmel authored
      Several notifiers are registered as part of router initialization.
      Since some of these notifiers are registered before the end of the
      initialization, it is possible for them to access uninitialized or freed
      memory when processing notifications [1].
      
      Additionally, some of these notifiers queue work items on a workqueue.
      If these work items are executed after the router was de-initialized,
      they will access freed memory.
      
      Fix both problems by moving the registration of the notifiers to the end
      of the router initialization and flush the work queue after they are
      unregistered.
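
The ordering described above (register notifiers last during init; on de-init, unregister first, flush queued work, then release state) can be modelled as follows. This is a userspace sketch with hypothetical names (`toy_router`, `toy_event()`, `toy_flush_work()`), not the driver's code; `bad_accesses` counts what would be a use of uninitialized or freed memory in the real driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a router with one notifier and a work queue. */
struct toy_router {
	bool initialized;      /* internal state is safe to touch */
	bool notifier_active;  /* events may be delivered */
	int pending_work;      /* queued work items not yet run */
	int bad_accesses;      /* events or work that ran against invalid state */
};

/* An event arrives; it is only seen while the notifier is registered. */
static void toy_event(struct toy_router *r)
{
	if (!r->notifier_active)
		return;
	if (!r->initialized)
		r->bad_accesses++;
	r->pending_work++;
}

/* Run all queued work; counts work that runs after teardown. */
static void toy_flush_work(struct toy_router *r)
{
	while (r->pending_work > 0) {
		if (!r->initialized)
			r->bad_accesses++;
		r->pending_work--;
	}
}

static void toy_router_init(struct toy_router *r)
{
	r->initialized = true;      /* finish all setup first ... */
	r->notifier_active = true;  /* ... and register the notifier last */
}

static void toy_router_fini(struct toy_router *r)
{
	r->notifier_active = false; /* stop new events */
	toy_flush_work(r);          /* drain work queued by earlier events */
	assert(r->pending_work == 0);
	r->initialized = false;     /* only now release the state */
}
```
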
      
      [1]
      BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:938 [inline]
      BUG: KASAN: use-after-free in __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103
      Read of size 8 at addr ffff888038c3a6e0 by task kworker/u4:1/61
      
      CPU: 1 PID: 61 Comm: kworker/u4:1 Not tainted 5.8.0-rc2+ #36
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      Workqueue: mlxsw_core_ordered mlxsw_sp_inet6addr_event_work
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xf6/0x16e lib/dump_stack.c:118
       print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       __mutex_lock_common kernel/locking/mutex.c:938 [inline]
       __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103
       mlxsw_sp_inet6addr_event_work+0xb3/0x1b0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7123
       process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
       worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
       kthread+0x355/0x470 kernel/kthread.c:291
       ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
      
      Allocated by task 1298:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc mm/kasan/common.c:494 [inline]
       __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       mlxsw_sp_router_init+0xb2/0x1d20 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:8074
       mlxsw_sp_init+0xbd8/0x3ac0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2932
       __mlxsw_core_bus_device_register+0x657/0x10d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1375
       mlxsw_core_bus_device_register drivers/net/ethernet/mellanox/mlxsw/core.c:1436 [inline]
       mlxsw_devlink_core_bus_device_reload_up+0xcd/0x150 drivers/net/ethernet/mellanox/mlxsw/core.c:1133
       devlink_reload net/core/devlink.c:2959 [inline]
       devlink_reload+0x281/0x3b0 net/core/devlink.c:2944
       devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987
       genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:736 [inline]
       genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753
       netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0x150/0x190 net/socket.c:672
       ____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
       ___sys_sendmsg+0xff/0x170 net/socket.c:2417
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
       do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 1348:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       kasan_set_free_info mm/kasan/common.c:316 [inline]
       __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
       slab_free_hook mm/slub.c:1474 [inline]
       slab_free_freelist_hook mm/slub.c:1507 [inline]
       slab_free mm/slub.c:3072 [inline]
       kfree+0xe6/0x320 mm/slub.c:4063
       mlxsw_sp_fini+0x340/0x4e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3132
       mlxsw_core_bus_device_unregister+0x16c/0x6d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1474
       mlxsw_devlink_core_bus_device_reload_down+0x8e/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:1123
       devlink_reload+0xc6/0x3b0 net/core/devlink.c:2952
       devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987
       genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:736 [inline]
       genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753
       netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0x150/0x190 net/socket.c:672
       ____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
       ___sys_sendmsg+0xff/0x170 net/socket.c:2417
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
       do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff888038c3a000
       which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 1760 bytes inside of
       2048-byte region [ffff888038c3a000, ffff888038c3a800)
      The buggy address belongs to the page:
      page:ffffea0000e30e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea0000e30e00 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x100000000010200(slab|head)
      raw: 0100000000010200 dead000000000100 dead000000000122 ffff88806c40c000
      raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888038c3a580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888038c3a600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff888038c3a680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff888038c3a700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888038c3a780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 965fa8e6 ("mlxsw: spectrum_router: Make RIF deletion more robust")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Free EMAD transactions using kfree_rcu() · 3c8ce24b
      Ido Schimmel authored
      The lifetime of EMAD transactions (i.e., 'struct mlxsw_reg_trans') is
      managed using RCU. They are freed using kfree_rcu() once the transaction
      ends.
      
      However, in case the transaction failed it is freed immediately after being
      removed from the active transactions list. This is problematic because it is
      still possible for a different CPU to dereference the transaction from an RCU
      read-side critical section while traversing the active transaction list in
      mlxsw_emad_rx_listener_func(), in which case a use-after-free is triggered
      [1].
      
      Fix this by freeing the transaction after a grace period by calling
      kfree_rcu().
      
      [1]
      BUG: KASAN: use-after-free in mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671
      Read of size 8 at addr ffff88800b7964e8 by task syz-executor.2/2881
      
      CPU: 0 PID: 2881 Comm: syz-executor.2 Not tainted 5.8.0-rc4+ #44
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xf6/0x16e lib/dump_stack.c:118
       print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671
       mlxsw_core_skb_receive+0x571/0x700 drivers/net/ethernet/mellanox/mlxsw/core.c:2061
       mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
       mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
       tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
       __do_softirq+0x223/0x964 kernel/softirq.c:292
       asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711
       </IRQ>
       __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
       run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
       do_softirq_own_stack+0x109/0x140 arch/x86/kernel/irq_64.c:77
       invoke_softirq kernel/softirq.c:387 [inline]
       __irq_exit_rcu kernel/softirq.c:417 [inline]
       irq_exit_rcu+0x16f/0x1a0 kernel/softirq.c:429
       sysvec_apic_timer_interrupt+0x4e/0xd0 arch/x86/kernel/apic/apic.c:1091
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587
      RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:85 [inline]
      RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
      RIP: 0010:_raw_spin_unlock_irqrestore+0x3b/0x40 kernel/locking/spinlock.c:191
      Code: e8 2a c3 f4 fc 48 89 ef e8 12 96 f5 fc f6 c7 02 75 11 53 9d e8 d6 db 11 fd 65 ff 0d 1f 21 b3 56 5b 5d c3 e8 a7 d7 11 fd 53 9d <eb> ed 0f 1f 00 55 48 89 fd 65 ff 05 05 21 b3 56 ff 74 24 08 48 8d
      RSP: 0018:ffff8880446ffd80 EFLAGS: 00000286
      RAX: 0000000000000006 RBX: 0000000000000286 RCX: 0000000000000006
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa94ecea9
      RBP: ffff888012934408 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: fffffbfff57be301 R12: 1ffff110088dffc1
      R13: ffff888037b817c0 R14: ffff88802442415a R15: ffff888024424000
       __do_sys_perf_event_open+0x1b5d/0x2bd0 kernel/events/core.c:11874
       do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x473dbd
      Code: Bad RIP value.
      RSP: 002b:00007f21e5e9cc28 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
      RAX: ffffffffffffffda RBX: 000000000057bf00 RCX: 0000000000473dbd
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000040
      RBP: 000000000057bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000003 R11: 0000000000000246 R12: 000000000057bf0c
      R13: 00007ffd0493503f R14: 00000000004d0f46 R15: 00007f21e5e9cd80
      
      Allocated by task 871:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc mm/kasan/common.c:494 [inline]
       __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       mlxsw_core_reg_access_emad+0x70/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1812
       mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991
       mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130
       update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173
       process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
       worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
       kthread+0x355/0x470 kernel/kthread.c:291
       ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
      
      Freed by task 871:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       kasan_set_free_info mm/kasan/common.c:316 [inline]
       __kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
       slab_free_hook mm/slub.c:1474 [inline]
       slab_free_freelist_hook mm/slub.c:1507 [inline]
       slab_free mm/slub.c:3072 [inline]
       kfree+0xe6/0x320 mm/slub.c:4052
       mlxsw_core_reg_access_emad+0xd45/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1819
       mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991
       mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130
       update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173
       process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
       worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
       kthread+0x355/0x470 kernel/kthread.c:291
       ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
      
      The buggy address belongs to the object at ffff88800b796400
       which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 232 bytes inside of
       512-byte region [ffff88800b796400, ffff88800b796600)
      The buggy address belongs to the page:
      page:ffffea00002de500 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea00002de500 order:2 compound_mapcount:0 compound_pincount:0
      flags: 0x100000000010200(slab|head)
      raw: 0100000000010200 dead000000000100 dead000000000122 ffff88806c402500
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88800b796380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88800b796400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88800b796480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                ^
       ffff88800b796500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88800b796580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: caf7297e ("mlxsw: core: Introduce support for asynchronous EMAD register access")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Increase scope of RCU read-side critical section · 7d8e8f34
      Ido Schimmel authored
      The lifetime of the Rx listener item ('rxl_item') is managed using RCU,
      but it is dereferenced outside of an RCU read-side critical section,
      which can lead to a use-after-free.
      
      Fix this by increasing the scope of the RCU read-side critical section.
      
      Fixes: 93c1edb2 ("mlxsw: Introduce Mellanox switch driver core")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Use different trap group for externally routed packets · ec4f5b36
      Ido Schimmel authored
      Cited commit mistakenly removed the trap group for externally routed
      packets (e.g., via the management interface) and grouped locally routed
      and externally routed packet traps under the same group, thereby
      subjecting them to the same policer.
      
      This can result in problems, for example, when FRR is restarted and
      suddenly all transient traffic is trapped to the CPU because of a
      default route through the management interface. Locally routed packets
      required to re-establish a BGP connection will never reach the CPU and
      the routing tables will not be re-populated.
      
      Fix this by using a different trap group for externally routed packets.
      
      Fixes: 8110668e ("mlxsw: spectrum_trap: Register layer 3 control traps")
      Reported-by: Alex Veber <alexve@mellanox.com>
      Tested-by: Alex Veber <alexve@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Allow programming link-local host routes · 89ab5331
      Ido Schimmel authored
      Cited commit added the ability to program link-local prefix routes to
      the ASIC so that relevant packets are routed and trapped correctly.
      
      However, host routes were not included in the change and thus not
      programmed to the ASIC. This can result in packets being trapped via an
      external route trap instead of a local route trap as in IPv4.
      
      Fix this by programming all the link-local routes to the ASIC.
      
      Fixes: 10d3757f ("mlxsw: spectrum_router: Allow programming link-local prefix routes")
      Reported-by: Alex Veber <alexve@mellanox.com>
      Tested-by: Alex Veber <alexve@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Silence suspicious RCU usage warning · 83f35228
      Ido Schimmel authored
      fib_trie_unmerge() is called with RTNL held, but not from an RCU
      read-side critical section. This leads to the following warning [1] when
      the FIB alias list in a leaf is traversed with
      hlist_for_each_entry_rcu().
      
      Since the function is always called with RTNL held and since
      modification of the list is protected by RTNL, simply use
      hlist_for_each_entry() and silence the warning.
      
      [1]
      WARNING: suspicious RCU usage
      5.8.0-rc4-custom-01520-gc1f937f3f83b #30 Not tainted
      -----------------------------
      net/ipv4/fib_trie.c:1867 RCU-list traversed in non-reader section!!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/164:
       #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
      
      stack backtrace:
      CPU: 0 PID: 164 Comm: ip Not tainted 5.8.0-rc4-custom-01520-gc1f937f3f83b #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
      Call Trace:
       dump_stack+0x100/0x184
       lockdep_rcu_suspicious+0x153/0x15d
       fib_trie_unmerge+0x608/0xdb0
       fib_unmerge+0x44/0x360
       fib4_rule_configure+0xc8/0xad0
       fib_nl_newrule+0x37a/0x1dd0
       rtnetlink_rcv_msg+0x4f7/0xbd0
       netlink_rcv_skb+0x17a/0x480
       rtnetlink_rcv+0x22/0x30
       netlink_unicast+0x5ae/0x890
       netlink_sendmsg+0x98a/0xf40
       ____sys_sendmsg+0x879/0xa00
       ___sys_sendmsg+0x122/0x190
       __sys_sendmsg+0x103/0x1d0
       __x64_sys_sendmsg+0x7d/0xb0
       do_syscall_64+0x54/0xa0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fc80a234e97
      Code: Bad RIP value.
      RSP: 002b:00007ffef8b66798 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc80a234e97
      RDX: 0000000000000000 RSI: 00007ffef8b66800 RDI: 0000000000000003
      RBP: 000000005f141b1c R08: 0000000000000001 R09: 0000000000000000
      R10: 00007fc80a2a8ac0 R11: 0000000000000246 R12: 0000000000000001
      R13: 0000000000000000 R14: 00007ffef8b67008 R15: 0000556fccb10020
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Ensure FDB dump is performed under RCU · b5141915
      Ido Schimmel authored
      The commit cited below removed the RCU read-side critical section from
      rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked
      without RCU protection.
      
      This results in the following warning [1] in the VXLAN driver, which
      relied on the callback being invoked from an RCU read-side critical
      section.
      
      Fix this by calling rcu_read_lock() in the VXLAN driver, as already done
      in the bridge driver.
      
      [1]
      WARNING: suspicious RCU usage
      5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
      -----------------------------
      drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by bridge/166:
       #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090
      
      stack backtrace:
      CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
      Call Trace:
       dump_stack+0x100/0x184
       lockdep_rcu_suspicious+0x153/0x15d
       vxlan_fdb_dump+0x51e/0x6d0
       rtnl_fdb_dump+0x4dc/0xad0
       netlink_dump+0x540/0x1090
       __netlink_dump_start+0x695/0x950
       rtnetlink_rcv_msg+0x802/0xbd0
       netlink_rcv_skb+0x17a/0x480
       rtnetlink_rcv+0x22/0x30
       netlink_unicast+0x5ae/0x890
       netlink_sendmsg+0x98a/0xf40
       __sys_sendto+0x279/0x3b0
       __x64_sys_sendto+0xe6/0x1a0
       do_syscall_64+0x54/0xa0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fe14fa2ade0
      Code: Bad RIP value.
      RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
      RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
      RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: 5e6d2435 ("bridge: netlink dump interface at par with brctl")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: bareudp: Corrected description of bareudp module. · 1ed06dbc
      Martin Varghese authored
      Removed redundant words.
      
      Fixes: 571912c6 ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bareudp: forbid mixing IP and MPLS in multiproto mode · 302d201b
      Guillaume Nault authored
      In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and
      IPv6 packets regardless of the bareudp ethertype. In practice, this
      lets an IP tunnel send multicast MPLS packets, or an MPLS tunnel send
      IPv6 packets.
      
      We need to restrict the test further, so that the multiproto mode only
      enables
        * IPv6 for IPv4 tunnels,
        * or multicast MPLS for unicast MPLS tunnels.
      
      To improve clarity, the protocol validation is moved to its own
      function, where each logical test has its own condition.
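
The validation rules listed above, with one condition per logical test, can be sketched as a standalone function. This is an illustration under stated assumptions rather than the driver's function: `toy_bareudp`, `toy_proto_valid()` and the `TOY_ETH_P_*` constants are hypothetical names, and the EtherType values are compared in host byte order here, whereas the driver works with network byte order values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* EtherType values, kept in host byte order for this sketch. */
#define TOY_ETH_P_IP      0x0800
#define TOY_ETH_P_IPV6    0x86DD
#define TOY_ETH_P_MPLS_UC 0x8847
#define TOY_ETH_P_MPLS_MC 0x8848

/* Hypothetical stand-in for the tunnel device configuration. */
struct toy_bareudp {
	uint16_t ethertype;     /* protocol the tunnel was created for */
	bool multi_proto_mode;  /* multiproto mode enabled */
};

static bool toy_proto_valid(const struct toy_bareudp *dev, uint16_t proto)
{
	assert(dev != NULL);

	/* The tunnel's own protocol is always accepted. */
	if (proto == dev->ethertype)
		return true;

	/* Anything else requires multiproto mode. */
	if (!dev->multi_proto_mode)
		return false;

	/* Multicast MPLS is only allowed on unicast MPLS tunnels. */
	if (dev->ethertype == TOY_ETH_P_MPLS_UC && proto == TOY_ETH_P_MPLS_MC)
		return true;

	/* IPv6 is only allowed on IPv4 tunnels. */
	if (dev->ethertype == TOY_ETH_P_IP && proto == TOY_ETH_P_IPV6)
		return true;

	return false;
}
```
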
      
      v2: s/ntohs/htons/
      
      Fixes: 4b5f6723 ("net: Special handling for IP & MPLS.")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fix nexthop refcnt leak when creating ipv6 route info · 706ec919
      Xiyu Yang authored
      ip6_route_info_create() invokes nexthop_get(), which increases the
      refcount of the "nh".
      
      When ip6_route_info_create() returns, local variable "nh" becomes
      invalid, so the refcount should be decreased to keep refcount balanced.
      
      The reference counting issue happens in one exception handling path of
      ip6_route_info_create(). When nexthops cannot be used with source
      routing, the function forgets to decrease the refcnt increased by
      nexthop_get(), causing a refcnt leak.
      
      Fix this issue by pulling up the error handling for the case where
      nexthops cannot be used with source routing.
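
One way to read "pulling up" the error handling is to run the rejecting check before the reference is taken, so the error path has nothing to release. The sketch below models that reading in userspace; `toy_nexthop`, `toy_nexthop_get()`, `toy_route_create()` and the `has_src_prefix` parameter are hypothetical, and the ordering shown is an assumption about the fix rather than a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted nexthop object. */
struct toy_nexthop {
	int refcnt;
};

static void toy_nexthop_get(struct toy_nexthop *nh)
{
	assert(nh->refcnt > 0);  /* caller must already hold a reference */
	nh->refcnt++;
}

/*
 * Create a route that may reference nh. has_src_prefix models a route that
 * uses source routing, which is rejected when a nexthop object is supplied.
 * Returns 0 on success (the route holds one extra reference when nh is set)
 * or -1 on error (no extra reference is held).
 */
static int toy_route_create(struct toy_nexthop *nh, bool has_src_prefix)
{
	if (nh != NULL) {
		/* Reject first, so the error path has no reference to drop. */
		if (has_src_prefix)
			return -1;
		toy_nexthop_get(nh);
	}
	return 0;
}
```
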
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Fix-bugs-in-Octeontx2-netdev-driver' · fa662d78
      David S. Miller authored
      Subbaraya Sundeep says:
      
      ====================
      Fix bugs in Octeontx2 netdev driver
      
      There are problems in the existing Octeontx2
      netdev drivers like missing cancel_work for the
      reset task, missing lock in reset task and
      missing unregister_netdev in driver remove.
      This patch set fixes the above problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa662d78
    • Subbaraya Sundeep's avatar
      octeontx2-pf: Unregister netdev at driver remove · ed543f5c
      Subbaraya Sundeep authored
      Added unregister_netdev in the driver remove
      function. Generally unregister_netdev is called
      after disabling all the device interrupts but here
      it is called before disabling device mailbox
      interrupts. The reason behind this is VF needs
      mailbox interrupt to communicate with its PF to
      clean up its resources during otx2_stop.
      otx2_stop disables packet I/O and queue interrupts
      first and by using mailbox interrupt communicates
      to PF to free VF resources. Hence this patch
      calls unregister_netdev just before
      disabling mailbox interrupts.
      
      Fixes: 3184fb5b ("octeontx2-vf: Virtual function driver support")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed543f5c
    • Subbaraya Sundeep's avatar
      octeontx2-pf: cancel reset_task work · c0376f47
      Subbaraya Sundeep authored
      During driver exit cancel the queued
      reset_task work in VF driver.
      
      Fixes: 3184fb5b ("octeontx2-vf: Virtual function driver support")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0376f47
    • Subbaraya Sundeep's avatar
      octeontx2-pf: Fix reset_task bugs · 948a6633
      Subbaraya Sundeep authored
      Two bugs exist in the reset_task code of the
      PF driver. One is the missing protection against
      the network stack's ndo_open and ndo_close.
      The other is the missing cancel_work.
      This patch fixes both problems.
      
      Fixes: 4ff7d148 ("octeontx2-pf: Error handling support")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      948a6633
    • Jakub Kicinski's avatar
      mlx4: disable device on shutdown · 3cab8c65
      Jakub Kicinski authored
      It appears that not disabling a PCI device on .shutdown may lead to
      a Hardware Error with particular (perhaps buggy) BIOS versions:
      
          mlx4_en: eth0: Close port called
          mlx4_en 0000:04:00.0: removed PHC
          reboot: Restarting system
          {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
          {1}[Hardware Error]: event severity: fatal
          {1}[Hardware Error]:  Error 0, type: fatal
          {1}[Hardware Error]:   section_type: PCIe error
          {1}[Hardware Error]:   port_type: 4, root port
          {1}[Hardware Error]:   version: 1.16
          {1}[Hardware Error]:   command: 0x4010, status: 0x0143
          {1}[Hardware Error]:   device_id: 0000:00:02.2
          {1}[Hardware Error]:   slot: 0
          {1}[Hardware Error]:   secondary_bus: 0x04
          {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f06
          {1}[Hardware Error]:   class_code: 000604
          {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
          {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000
          {1}[Hardware Error]:   aer_uncor_severity: 0x00062030
          {1}[Hardware Error]:   TLP Header: 40000018 040000ff 791f4080 00000000
      [hw error repeats]
          Kernel panic - not syncing: Fatal hardware error!
          CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1
          Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017
      
      Fix the mlx4 driver.
      
      This is a very similar problem to what had been fixed in:
      commit 0d98ba8d ("scsi: hpsa: disable device during shutdown")
      to address https://bugzilla.kernel.org/show_bug.cgi?id=199779.
      
      Fixes: 2ba5fbd6 ("net/mlx4_core: Handle AER flow properly")
      Reported-by: Jake Lawrence <lawja@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3cab8c65
    • David S. Miller's avatar
      Merge branch 'rhashtable-Fix-unprotected-RCU-dereference-in-__rht_ptr' · a7ef23e5
      David S. Miller authored
      Herbert Xu says:
      
      ====================
      rhashtable: Fix unprotected RCU dereference in __rht_ptr
      
      This patch series fixes an unprotected dereference in __rht_ptr.
      The first patch is a minimal fix that does not use the correct
      RCU markings but is suitable for backport, and the second patch
      cleans up the RCU markings.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7ef23e5
    • Herbert Xu's avatar
      rhashtable: Restore RCU marking on rhash_lock_head · ce9b362b
      Herbert Xu authored
      This patch restores the RCU marking on bucket_table->buckets as
      it really does need RCU protection.  Its removal had led to a fatal
      bug.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce9b362b
    • Herbert Xu's avatar
      rhashtable: Fix unprotected RCU dereference in __rht_ptr · 1748f6a2
      Herbert Xu authored
      The rcu_dereference call in rht_ptr_rcu is completely bogus because
      we've already dereferenced the value in __rht_ptr and operated on it.
      This causes potential double readings which could be fatal.  The RCU
      dereference must occur prior to the comparison in __rht_ptr.
      
      This patch changes the order of RCU dereference so that it is done
      first and the result is then fed to __rht_ptr.  The RCU marking
      changes have been minimised using casts which will be removed in
      a follow-up patch.
      
      Fixes: ba6306e3 ("rhashtable: Remove RCU marking from...")
      Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1748f6a2
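The ordering problem described in the rhashtable commit above (test one read of the bucket word, then dereference a second read) can be sketched in user-space C11. This is only an illustration under stated assumptions: `struct node` and `bucket_head()` are hypothetical names, bit 0 is assumed to be a lock bit, and an acquire load stands in for `rcu_dereference()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hash-chain node. */
struct node {
    int key;
};

/*
 * Bit 0 of the bucket word is assumed to be a lock bit. The word is
 * loaded exactly once; the masking and the NULL test both operate on
 * that single copy, so a concurrent writer cannot make the tested
 * value and the returned value disagree.
 */
struct node *bucket_head(_Atomic uintptr_t *bkt)
{
    uintptr_t v;

    assert(bkt != NULL);
    v = atomic_load_explicit(bkt, memory_order_acquire);
    v &= ~(uintptr_t)1;          /* strip the lock bit from the copy */
    return v ? (struct node *)v : NULL;
}
```

Reading once and feeding the result to the helper mirrors the order the commit message describes: dereference first, compare afterwards.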
    • René van Dorst's avatar
      net: ethernet: mtk_eth_soc: Always call mtk_gmac0_rgmii_adjust() for mt7623 · 19016d93
      René van Dorst authored
      Modify mtk_gmac0_rgmii_adjust() so it can always be called.
      mtk_gmac0_rgmii_adjust() sets-up the TRGMII clocks.
      Signed-off-by: René van Dorst <opensource@vdorst.com>
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Tested-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19016d93
  3. 28 Jul, 2020 16 commits
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2020-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · b5cd55b3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes-2020-07-28
      
      This series introduces some fixes to mlx5 driver.
      v1->v2:
       - Drop the "Hold reference on mirred devices" patch, until Or's
         comments are addressed.
       - Improve "Modify uplink state" patch commit message per Or's request.
      
      Please pull and let me know if there is any problem.
      
      For -Stable:
      
      For -stable v4.9
       ('net/mlx5e: Fix error path of device attach')
      
      For -stable v4.15
       ('net/mlx5: Verify Hardware supports requested ptp function on a given
      pin')
      
      For -stable v5.3
       ('net/mlx5e: Modify uplink state on interface up/down')
      
      For -stable v5.4
       ('net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev')
       ('net/mlx5: E-switch, Destroy TSAR when fail to enable the mode')
      
      For -stable v5.5
       ('net/mlx5: E-switch, Destroy TSAR after reload interface')
      
      For -stable v5.7
       ('net/mlx5: Fix a bug of using ptp channel index as pin index')
      ====================
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b5cd55b3
    • David S. Miller's avatar
      Merge branch 'net-lan78xx-fix-NULL-deref-and-memory-leak' · 2ff34c90
      David S. Miller authored
      Johan Hovold says:
      
      ====================
      net: lan78xx: fix NULL deref and memory leak
      
      The first two patches fix a NULL-pointer dereference at probe that can
      be triggered by a malicious device and a small transfer-buffer memory
      leak, respectively.
      
      For another subsystem I would have marked them:
      
      	Cc: stable@vger.kernel.org	# 4.3
      
      The third one replaces the driver's current broken endpoint lookup
      helper, which could end up accepting incomplete interfaces and whose
      results weren't even used.
      
      Johan
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ff34c90
    • Johan Hovold's avatar
      net: lan78xx: replace bogus endpoint lookup · ea060b35
      Johan Hovold authored
      Drop the bogus endpoint-lookup helper which could end up accepting
      interfaces based on endpoints belonging to unrelated altsettings.
      
      Note that the returned bulk pipes and interrupt endpoint descriptor
      were never actually used. Instead the bulk-endpoint numbers are
      hardcoded to 1 and 2 (matching the specification), while the interrupt-
      endpoint descriptor was assumed to be the third descriptor created by
      USB core.
      
      Try to bring some order to this by dropping the bogus lookup helper and
      adding the missing endpoint sanity checks while keeping the interrupt-
      descriptor assumption for now.
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea060b35
    • Johan Hovold's avatar
      net: lan78xx: fix transfer-buffer memory leak · 63634aa6
      Johan Hovold authored
      The interrupt URB transfer-buffer was never freed on disconnect or after
      probe errors.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63634aa6
    • Johan Hovold's avatar
      net: lan78xx: add missing endpoint sanity check · 8d8e95fd
      Johan Hovold authored
      Add the missing endpoint sanity check to prevent a NULL-pointer
      dereference should a malicious device lack the expected endpoints.
      
      Note that the driver has a broken endpoint-lookup helper,
      lan78xx_get_endpoints(), which can end up accepting interfaces in an
      altsetting without endpoints as long as *some* altsetting has a bulk-in
      and a bulk-out endpoint.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d8e95fd
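A minimal sketch of the kind of endpoint sanity check the lan78xx commit above describes. The `struct altsetting` type and `probe_check_endpoints()` are hypothetical simplifications, and the expected count of three (bulk-in, bulk-out, interrupt) is an assumption drawn from the commit text rather than from the driver source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of the active USB altsetting. */
struct altsetting {
    unsigned int num_endpoints;
};

/* Assumed layout: bulk-in, bulk-out and one interrupt endpoint. */
#define EXPECTED_ENDPOINTS 3u

/* Refuse to bind when the active altsetting lacks the endpoints. */
int probe_check_endpoints(const struct altsetting *cur)
{
    assert(cur != NULL);
    if (cur->num_endpoints < EXPECTED_ENDPOINTS)
        return -ENODEV;
    return 0;
}
```

Checking the altsetting that is actually in use, rather than any altsetting on the interface, is what keeps later descriptor accesses from hitting a NULL pointer.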
    • Rustam Kovhaev's avatar
      usb: hso: check for return value in hso_serial_common_create() · e911e99a
      Rustam Kovhaev authored
      In case of an error, tty_register_device_attr() returns ERR_PTR();
      add an IS_ERR() check.
      
      Reported-and-tested-by: syzbot+67b2bd0e34f952d0321e@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?extid=67b2bd0e34f952d0321e
      Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e911e99a
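The ERR_PTR()/IS_ERR() convention mentioned in the hso commit above can be demonstrated in user space. The three helpers below are simplified re-implementations of the kernel idiom, not the kernel headers, and `register_device()` and `serial_create()` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space approximations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical registration helper that fails by returning ERR_PTR(). */
void *register_device(int should_fail)
{
    static int device;

    return should_fail ? ERR_PTR(-ENOMEM) : (void *)&device;
}

/* The caller must test with IS_ERR(); a plain NULL test would miss it. */
int serial_create(int should_fail)
{
    void *dev = register_device(should_fail);

    if (IS_ERR(dev))
        return (int)PTR_ERR(dev);

    assert(dev != NULL);
    return 0;
}
```

An error pointer is non-NULL, so code that only checks for NULL treats the failure as success and later dereferences an invalid address.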
    • Alaa Hleihel's avatar
      net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev · 350a6324
      Alaa Hleihel authored
      After the cited commit, function 'mlx5_eswitch_set_vport_vlan' started
      to acquire esw->state_lock.
      However, esw is not defined for VF devices, hence attempting to set vf
      VLANID on a VF dev will cause a kernel panic.
      
      Fix it by moving up the (redundant) esw validation from function
      '__mlx5_eswitch_set_vport_vlan' since the rest of the callers now have
      and use a valid esw.
      
      For example with vf device eth4:
       # ip link set dev eth4 vf 0 vlan 0
      
      Trace of the panic:
       [  411.409842] BUG: unable to handle page fault for address: 00000000000011b8
       [  411.449745] #PF: supervisor read access in kernel mode
       [  411.452348] #PF: error_code(0x0000) - not-present page
       [  411.454938] PGD 80000004189c9067 P4D 80000004189c9067 PUD 41899a067 PMD 0
       [  411.458382] Oops: 0000 [#1] SMP PTI
       [  411.460268] CPU: 4 PID: 5711 Comm: ip Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
       [  411.462447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       [  411.464158] RIP: 0010:__mutex_lock+0x4e/0x940
       [  411.464928] Code: fd 41 54 49 89 f4 41 52 53 89 d3 48 83 ec 70 44 8b 1d ee 03 b0 01 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 45 85 db 75 0a <48> 3b 7f 60 0f 85 7e 05 00 00 49 8d 45 68 41 56 41 b8 01 00 00 00
       [  411.467678] RSP: 0018:ffff88841fcd74b0 EFLAGS: 00010246
       [  411.468562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
       [  411.469715] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000001158
       [  411.470812] RBP: ffff88841fcd7550 R08: ffffffffa00fa1ce R09: 0000000000000000
       [  411.471835] R10: ffff88841fcd7570 R11: 0000000000000000 R12: 0000000000000002
       [  411.472862] R13: 0000000000001158 R14: ffffffffa00fa1ce R15: 0000000000000000
       [  411.474004] FS:  00007faee7ca6b80(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
       [  411.475237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  411.476129] CR2: 00000000000011b8 CR3: 000000041909c006 CR4: 0000000000360ea0
       [  411.477260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [  411.478340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [  411.479332] Call Trace:
       [  411.479760]  ? __nla_validate_parse.part.6+0x57/0x8f0
       [  411.482825]  ? mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
       [  411.483804]  mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
       [  411.484733]  mlx5e_set_vf_vlan+0x41/0x50 [mlx5_core]
       [  411.485545]  do_setlink+0x613/0x1000
       [  411.486165]  __rtnl_newlink+0x53d/0x8c0
       [  411.486791]  ? mark_held_locks+0x49/0x70
       [  411.487429]  ? __lock_acquire+0x8fe/0x1eb0
       [  411.488085]  ? rcu_read_lock_sched_held+0x52/0x60
       [  411.488998]  ? kmem_cache_alloc_trace+0x16d/0x2d0
       [  411.489759]  rtnl_newlink+0x47/0x70
       [  411.490357]  rtnetlink_rcv_msg+0x24e/0x450
       [  411.490978]  ? netlink_deliver_tap+0x92/0x3d0
       [  411.491631]  ? validate_linkmsg+0x330/0x330
       [  411.492262]  netlink_rcv_skb+0x47/0x110
       [  411.492852]  netlink_unicast+0x1ac/0x270
       [  411.493551]  netlink_sendmsg+0x336/0x450
       [  411.494209]  sock_sendmsg+0x30/0x40
       [  411.494779]  ____sys_sendmsg+0x1dd/0x1f0
       [  411.495378]  ? copy_msghdr_from_user+0x5c/0x90
       [  411.496082]  ___sys_sendmsg+0x87/0xd0
       [  411.496683]  ? lock_acquire+0xb9/0x3a0
       [  411.497322]  ? lru_cache_add+0x5/0x170
       [  411.497944]  ? find_held_lock+0x2d/0x90
       [  411.498568]  ? handle_mm_fault+0xe46/0x18c0
       [  411.499205]  ? __sys_sendmsg+0x51/0x90
       [  411.499784]  __sys_sendmsg+0x51/0x90
       [  411.500341]  do_syscall_64+0x59/0x2e0
       [  411.500938]  ? asm_exc_page_fault+0x8/0x30
       [  411.501609]  ? rcu_read_lock_sched_held+0x52/0x60
       [  411.502350]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
       [  411.503093] RIP: 0033:0x7faee73b85a7
       [  411.503654] Code: Bad RIP value.
      
      Fixes: 0e18134f ("net/mlx5e: Eswitch, use state_lock to synchronize vlan change")
      Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      350a6324
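The crash in the mlx5e VLAN commit above comes from taking a lock that lives inside an object which does not exist on VF devices. A minimal sketch of validating the object before touching its lock follows; `struct eswitch`, `struct device`, the `lock_held` flag and the `-EPERM` return value are all assumptions made for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical eswitch with a toy lock flag instead of a real mutex. */
struct eswitch {
    int lock_held;
    int vlan;
};

/* On a VF device the esw pointer is assumed to be NULL. */
struct device {
    struct eswitch *esw;
};

int set_vport_vlan(struct device *dev, int vlan)
{
    struct eswitch *esw = dev->esw;

    /* Validate before locking: the lock lives inside *esw. */
    if (!esw)
        return -EPERM;

    assert(!esw->lock_held);
    esw->lock_held = 1;   /* stands in for mutex_lock(&esw->state_lock) */
    esw->vlan = vlan;
    esw->lock_held = 0;   /* stands in for mutex_unlock() */
    return 0;
}
```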
    • Ron Diskin's avatar
      net/mlx5e: Modify uplink state on interface up/down · 7d0314b1
      Ron Diskin authored
      When setting the PF interface up/down, notify the firmware to update
      uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.
      
      This behavior will prevent sending traffic out on uplink port when PF is
      down, such as sending traffic from a VF interface which is still up.
      Currently when calling mlx5e_open/close(), the driver only sends PAOS
      command to notify the firmware to set the physical port state to
      up/down, however, it is not sufficient. When VF is in "auto" state, it
      follows the uplink state, which was not updated on mlx5e_open/close()
      before this patch.
      
      When switchdev mode is enabled and uplink representor is first enabled,
      set the uplink port state value back to its FW default "AUTO".
      
      Fixes: 63bfd399 ("net/mlx5e: Send PAOS command on interface up/down")
      Signed-off-by: Ron Diskin <rondi@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      7d0314b1
    • Eran Ben Elisha's avatar
      net/mlx5: Query PPS pin operational status before registering it · ed56d749
      Eran Ben Elisha authored
      In a special configuration, a ConnectX6-Dx pin pps-out might be activated
      when driver is loaded. Fix the driver to always read the operational pin
      mode when registering it, and advertise it accordingly.
      
      Fixes: ee7f1220 ("net/mlx5e: Implement 1PPS support")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      ed56d749
    • Raed Salem's avatar
      net/mlx5e: Fix slab-out-of-bounds in mlx5e_rep_is_lag_netdev · 21083309
      Raed Salem authored
      mlx5e_rep_is_lag_netdev is used as first check as part of netdev events
      handler for bond device of non-uplink representors, this handler can get
      any netdevice under the same network namespace of mlx5e netdevice. Current
      code treats the netdev as mlx5e netdev and only later on verifies this,
      hence causes the following Kasan trace:
      [15402.744990] ==================================================================
      [15402.746942] BUG: KASAN: slab-out-of-bounds in mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
      [15402.749009] Read of size 8 at addr ffff880391f3f6b0 by task ovs-vswitchd/5347
      
      [15402.752065] CPU: 7 PID: 5347 Comm: ovs-vswitchd Kdump: loaded Tainted: G    B      O     --------- -t - 4.18.0-g3dcc204d291d-dirty #1
      [15402.755349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [15402.757600] Call Trace:
      [15402.758968]  dump_stack+0x71/0xab
      [15402.760427]  print_address_description+0x6a/0x270
      [15402.761969]  kasan_report+0x179/0x2d0
      [15402.763445]  ? mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
      [15402.765121]  mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
      [15402.766782]  mlx5e_rep_esw_bond_netevent+0x129/0x620 [mlx5_core]
      
      Fix by deferring the offending access until after the netdev verification check.
      
      Fixes: 7e51891a ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      21083309
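The slab-out-of-bounds commit above is an instance of interpreting a generic object's private data before confirming the object's type. A small sketch of the corrected order, with hypothetical types (`struct netdev`, `struct rep_priv`, `rep_ops`) that only loosely model the kernel structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; its address identifies the owning driver. */
struct netdev_ops {
    int unused;
};

struct netdev {
    const struct netdev_ops *ops;
    void *priv;
};

/* Private data layout that is valid only for this driver's devices. */
struct rep_priv {
    int is_lag_port;
};

const struct netdev_ops rep_ops = { 0 };

int rep_is_lag_netdev(const struct netdev *dev)
{
    const struct rep_priv *priv;

    /* Verify ownership first; a foreign device has a different layout. */
    if (dev->ops != &rep_ops)
        return 0;

    /* Only now is it safe to interpret the private area. */
    priv = dev->priv;
    assert(priv != NULL);
    return priv->is_lag_port;
}
```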
    • Eran Ben Elisha's avatar
      net/mlx5: Verify Hardware supports requested ptp function on a given pin · 071995c8
      Eran Ben Elisha authored
      Fix a bug where driver did not verify Hardware pin capabilities for
      PTP functions.
      
      Fixes: ee7f1220 ("net/mlx5e: Implement 1PPS support")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      071995c8
    • Eran Ben Elisha's avatar
      net/mlx5: Fix a bug of using ptp channel index as pin index · 88c8cf92
      Eran Ben Elisha authored
      On PTP mlx5_ptp_enable(on=0) flow, driver mistakenly used channel index
      as pin index.
      
      After ptp patch marked in fixes tag was introduced, driver can freely
      call ptp_find_pin() as part of the .enable() callback.
      
      Fix driver mlx5_ptp_enable(on=0) flow to always use ptp_find_pin(). With
      that, Driver will use the correct pin index in mlx5_ptp_enable(on=0) flow.
      
      In addition, when initializing the pins, always set channel to zero. As
      all pins can be attached to all channels, let ptp_set_pinfunc() move
      them between the channels.
      
      For stable branches, this fix should be applied only to kernels that
      include both patches in the Fixes tags. Otherwise, mlx5_ptp_enable(on=0)
      will be stuck on pincfg_mux.
      
      Fixes: 62582a7e ("ptp: Avoid deadlocks in the programmable pin code.")
      Fixes: ee7f1220 ("net/mlx5e: Implement 1PPS support")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      88c8cf92
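The difference between a channel index and a pin index, which the PTP commit above turns on, can be shown with a lookup in the spirit of ptp_find_pin(). This is a user-space sketch: `find_pin()`, `struct pin_desc` and the enum values are hypothetical and do not reproduce the kernel's types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pin functions. */
enum pin_func { PIN_NONE, PIN_EXTTS, PIN_PEROUT };

/* Each pin records which function and channel it is attached to. */
struct pin_desc {
    enum pin_func func;
    unsigned int chan;
};

/*
 * Return the index of the pin currently assigned to (func, chan),
 * or -1 when no pin matches. The channel number is a search key,
 * never an array index.
 */
int find_pin(const struct pin_desc *pins, int n_pins,
             enum pin_func func, unsigned int chan)
{
    assert(pins != NULL);
    for (int i = 0; i < n_pins; i++)
        if (pins[i].func == func && pins[i].chan == chan)
            return i;
    return -1;
}
```

In the test data for this sketch, channel 0 of the external-timestamp function sits on pin 2, so using the channel number directly as a pin index would have selected the wrong pin.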
    • Maor Dickman's avatar
      net/mlx5e: Fix missing cleanup of ethtool steering during rep rx cleanup · 0e2e7aa5
      Maor Dickman authored
      The cited commit added initialization of ethtool steering during
      representor rx initialization without cleaning it up in representor
      rx cleanup. This may cause stale ethtool flows to remain after
      moving back from switchdev mode to legacy mode.
      
      Fixed by calling ethtool steering cleanup during rep rx cleanup.
      
      Fixes: 6783e8b2 ("net/mlx5e: Init ethtool steering for representors")
      Signed-off-by: Maor Dickman <maord@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      0e2e7aa5
    • Aya Levin's avatar
      net/mlx5e: Fix error path of device attach · 5cd39b6e
      Aya Levin authored
      On failure to attach the netdev, fix the rollback by re-setting the
      device's state back to MLX5E_STATE_DESTROYING.
      
      Failing to attach doesn't stop statistics polling via .ndo_get_stats64.
      In this case, although the device is not attached, it falsely continues
      to query the firmware for counters. Setting the device's state back to
      MLX5E_STATE_DESTROYING prevents the firmware counters query.
      
      Fixes: 26e59d80 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5cd39b6e
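The rollback described in the device-attach commit above can be sketched as a state flag that the statistics path consults. Everything here is a simplified assumption: `struct priv`, `attach()`, `get_stats()`, the `-EIO` code and the idea that attach clears the flag on entry are illustrative, not taken from the driver source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define STATE_DESTROYING (1u << 0)

/* Hypothetical per-device state. */
struct priv {
    unsigned int state;
    int fw_queries;   /* counts simulated firmware counter queries */
};

/* Attach clears the flag on entry and must restore it on failure. */
int attach(struct priv *p, int hw_init_ok)
{
    assert(p != NULL);
    p->state &= ~STATE_DESTROYING;

    if (!hw_init_ok) {
        p->state |= STATE_DESTROYING;   /* the rollback step */
        return -EIO;
    }
    return 0;
}

/* The stats path skips the firmware while the device is not attached. */
void get_stats(struct priv *p)
{
    if (p->state & STATE_DESTROYING)
        return;
    p->fw_queries++;
}
```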
    • Maor Gottlieb's avatar
      net/mlx5: Fix forward to next namespace · 59f8f7c8
      Maor Gottlieb authored
      The steering tree is as follows (nic RX as example):
                         ---------
                         |root_ns|
                         ---------
                             |
              -------------------------------
              |              |              |
         ----------     ----------      ---------
         |p(prio)0|     |   p1   |      |   pn  |
         ----------     ----------      ---------
              |              |
      -----------------  ---------------
      |ns(e.g. bypass)|  |ns(e.g. lag) |
      -----------------  ---------------
       |     |     |
      ----  ----  ----
      |p0|  |p1|  |pn|
      ----  ----  ----
       |
      ----
      |FT|
      ----
      
      find_next_chained_ft(prio) returns the first flow table in the next
      priority. If prio is a parent of a flow table then it returns the first
      flow table in the next priority in the same namespace, else if prio
      is parent of namespace, then it should return the first flow table
      in the next namespace. Currently, if the user requests to forward to
      the next namespace, the code calls find_next_chained_ft with the prio
      of the next namespace and not the prio of the namespace itself.
      
      Fixes: 9254f8ed ("net/mlx5: Add support in forward to namespace")
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      59f8f7c8
    • Parav Pandit's avatar
      net/mlx5: E-switch, Destroy TSAR after reload interface · 0c2600c6
      Parav Pandit authored
      When eswitch offloads is enabled, TSAR is created before reloading
      the interfaces.
      However when eswitch offloads mode is disabled, TSAR is disabled before
      reloading the interfaces.
      
      To keep the eswitch enable/disable sequence as mirror, destroy TSAR
      after reloading the interfaces.
      
      Fixes: 1bd27b11 ("net/mlx5: Introduce E-switch QoS management")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      0c2600c6