1. 22 Apr, 2020 11 commits
  2. 21 Apr, 2020 8 commits
    • net: stmmac: Enable SERDES power up/down sequence · b9663b7c
      Voon Weifeng authored
      This patch enables the Intel SERDES power up/down sequence. The SERDES
      converts 8/10-bit data to an SGMII signal. Below is an example of the
      HW configuration for SGMII mode. The SERDES is located in the PHY IF
      in the diagram below.
      
      <-----------------GBE Controller---------->|<--External PHY chip-->
      +----------+         +----+            +---+           +----------+
      |   EQoS   | <-GMII->| DW | < ------ > |PHY| <-SGMII-> | External |
      |   MAC    |         |xPCS|            |IF |           | PHY      |
      +----------+         +----+            +---+           +----------+
             ^               ^                 ^                ^
             |               |                 |                |
             +---------------------MDIO-------------------------+
      
      PHY IF configuration and status registers are accessible through
      mdio address 0x15, which is defined as mdio_adhoc_addr. During D0,
      the driver needs to power up the PHY IF by changing the power state
      to P0. Likewise, for D3, the driver sets the PHY IF power state to P3.
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: broadcom: convert to devm_platform_ioremap_resource_byname() · d7a5502b
      Dejin Zheng authored
      Use the function devm_platform_ioremap_resource_byname() to simplify
      source code which calls the functions platform_get_resource_byname()
      and devm_ioremap_resource(). Also remove a few error messages which
      became unnecessary with this refactoring.
      Suggested-by: Markus Elfring <Markus.Elfring@web.de>
      Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: fix null dereference in macvlan_device_event() · 4dee15b4
      Taehee Yoo authored
      macvlan_device_event() uses list_first_entry_or_null().
      This function returns a null pointer if the list has no node,
      but the macvlan module doesn't check for the null pointer,
      so a null-ptr-deref can occur.
      
            bond0
              |
         +----+-----+
         |          |
      macvlan0   macvlan1
         |          |
       dummy0     dummy1
      
      The problem scenario, if dummy1 is removed:
      1. ->dellink() of dummy1 is called.
      2. NETDEV_UNREGISTER of dummy1 notification is sent to macvlan module.
      3. ->dellink() of macvlan1 is called.
      4. NETDEV_UNREGISTER of macvlan1 notification is sent to bond module.
      5. __bond_release_one() is called and it internally calls
         dev_set_mac_address().
      6. dev_set_mac_address() calls the ->ndo_set_mac_address() of macvlan1,
         which is macvlan_set_mac_address().
      7. macvlan_set_mac_address() calls the dev_set_mac_address() with dummy1.
      8. NETDEV_CHANGEADDR of dummy1 is sent to macvlan module.
      9. In the macvlan_device_event(), it calls list_first_entry_or_null().
      At this point, dummy1 and macvlan1 were removed.
      So, list_first_entry_or_null() will return NULL.
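
      The missing check can be illustrated outside the kernel. The sketch
      below is a minimal userspace model, not the macvlan code: the list
      helpers are simplified stand-ins for <linux/list.h>, and struct
      vlan_entry and first_vlan_id() are hypothetical names used only for
      illustration.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Yields the first entry, or NULL when the list is empty. */
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? \
	 container_of((head)->next, type, member) : NULL)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical stand-in for a device hanging off a lower port. */
struct vlan_entry {
	int id;
	struct list_head list;
};

/* Returns the id of the first entry, or -1 when the list is empty. */
static int first_vlan_id(struct list_head *vlans)
{
	struct vlan_entry *vlan;

	vlan = list_first_entry_or_null(vlans, struct vlan_entry, list);
	if (!vlan)	/* without this check, vlan->id dereferences NULL */
		return -1;
	return vlan->id;
}
```

      Calling first_vlan_id() on a freshly initialised list takes the NULL
      branch, which corresponds to step 9 above once dummy1 and macvlan1
      are gone.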
      
      Test commands:
          ip netns add nst
          ip netns exec nst ip link add bond0 type bond
          for i in {0..10}
          do
              ip netns exec nst ip link add dummy$i type dummy
              ip netns exec nst ip link add macvlan$i link dummy$i \
                      type macvlan mode passthru
              ip netns exec nst ip link set macvlan$i master bond0
          done
          ip netns del nst
      
      Splat looks like:
      [   40.585687][  T146] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEI
      [   40.587249][  T146] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      [   40.588342][  T146] CPU: 1 PID: 146 Comm: kworker/u8:2 Not tainted 5.7.0-rc1+ #532
      [   40.589299][  T146] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   40.590469][  T146] Workqueue: netns cleanup_net
      [   40.591045][  T146] RIP: 0010:macvlan_device_event+0x4e2/0x900 [macvlan]
      [   40.591905][  T146] Code: 00 00 00 00 00 fc ff df 80 3c 06 00 0f 85 45 02 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff d2
      [   40.594126][  T146] RSP: 0018:ffff88806116f4a0 EFLAGS: 00010246
      [   40.594783][  T146] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   40.595653][  T146] RDX: 0000000000000000 RSI: ffff88806547ddd8 RDI: ffff8880540f1360
      [   40.596495][  T146] RBP: ffff88804011a808 R08: fffffbfff4fb8421 R09: fffffbfff4fb8421
      [   40.597377][  T146] R10: ffffffffa7dc2107 R11: 0000000000000000 R12: 0000000000000008
      [   40.598186][  T146] R13: ffff88804011a000 R14: ffff8880540f1000 R15: 1ffff1100c22de9a
      [   40.599012][  T146] FS:  0000000000000000(0000) GS:ffff888067800000(0000) knlGS:0000000000000000
      [   40.600004][  T146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   40.600665][  T146] CR2: 00005572d3a807b8 CR3: 000000005fcf4003 CR4: 00000000000606e0
      [   40.601485][  T146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   40.602461][  T146] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   40.603443][  T146] Call Trace:
      [   40.603871][  T146]  ? nf_tables_dump_setelem+0xa0/0xa0 [nf_tables]
      [   40.604587][  T146]  ? macvlan_uninit+0x100/0x100 [macvlan]
      [   40.605212][  T146]  ? __module_text_address+0x13/0x140
      [   40.605842][  T146]  notifier_call_chain+0x90/0x160
      [   40.606477][  T146]  dev_set_mac_address+0x28e/0x3f0
      [   40.607117][  T146]  ? netdev_notify_peers+0xc0/0xc0
      [   40.607762][  T146]  ? __module_text_address+0x13/0x140
      [   40.608440][  T146]  ? notifier_call_chain+0x90/0x160
      [   40.609097][  T146]  ? dev_set_mac_address+0x1f0/0x3f0
      [   40.609758][  T146]  dev_set_mac_address+0x1f0/0x3f0
      [   40.610402][  T146]  ? __local_bh_enable_ip+0xe9/0x1b0
      [   40.611071][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
      [   40.611823][  T146]  ? netdev_notify_peers+0xc0/0xc0
      [   40.612461][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
      [   40.613213][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
      [   40.613963][  T146]  ? __local_bh_enable_ip+0xe9/0x1b0
      [   40.614631][  T146]  ? bond_time_in_interval.isra.31+0x90/0x90 [bonding]
      [   40.615484][  T146]  ? __bond_release_one+0x9f0/0x12c0 [bonding]
      [   40.616230][  T146]  __bond_release_one+0x9f0/0x12c0 [bonding]
      [   40.616949][  T146]  ? bond_enslave+0x47c0/0x47c0 [bonding]
      [   40.617642][  T146]  ? lock_downgrade+0x730/0x730
      [   40.618218][  T146]  ? check_flags.part.42+0x450/0x450
      [   40.618850][  T146]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   40.619519][  T146]  ? trace_hardirqs_on+0x30/0x180
      [   40.620117][  T146]  ? wait_for_completion+0x250/0x250
      [   40.620754][  T146]  bond_netdev_event+0x822/0x970 [bonding]
      [   40.621460][  T146]  ? __module_text_address+0x13/0x140
      [   40.622097][  T146]  notifier_call_chain+0x90/0x160
      [   40.622806][  T146]  rollback_registered_many+0x660/0xcf0
      [   40.623522][  T146]  ? netif_set_real_num_tx_queues+0x780/0x780
      [   40.624290][  T146]  ? notifier_call_chain+0x90/0x160
      [   40.624957][  T146]  ? netdev_upper_dev_unlink+0x114/0x180
      [   40.625686][  T146]  ? __netdev_adjacent_dev_unlink_neighbour+0x30/0x30
      [   40.626421][  T146]  ? mutex_is_locked+0x13/0x50
      [   40.627016][  T146]  ? unregister_netdevice_queue+0xf2/0x240
      [   40.627663][  T146]  unregister_netdevice_many.part.134+0x13/0x1b0
      [   40.628362][  T146]  default_device_exit_batch+0x2d9/0x390
      [   40.628987][  T146]  ? unregister_netdevice_many+0x40/0x40
      [   40.629615][  T146]  ? dev_change_net_namespace+0xcb0/0xcb0
      [   40.630279][  T146]  ? prepare_to_wait_exclusive+0x2e0/0x2e0
      [   40.630943][  T146]  ? ops_exit_list.isra.9+0x97/0x140
      [   40.631554][  T146]  cleanup_net+0x441/0x890
      [ ... ]
      
      Fixes: e289fd28 ("macvlan: fix the problem when mac address changes for passthru mode")
      Reported-by: syzbot+5035b1f9dc7ea4558d5a@syzkaller.appspotmail.com
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • e1000: remove unneeded conversion to bool · c95576a3
      Jason Yan authored
      The '==' expression itself is bool, no need to convert it to bool again.
      This fixes the following coccicheck warning:
      
      drivers/net/ethernet/intel/e1000/e1000_main.c:1479:44-49: WARNING:
      conversion to bool not needed here
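
      As an illustration of the pattern this kind of warning refers to,
      here is a small sketch. The enum and function names are hypothetical
      and are not taken from the e1000 driver.

```c
#include <stdbool.h>

/* Hypothetical link states, used only for illustration. */
enum link_state { LINK_DOWN, LINK_UP };

/* Redundant form: the ternary converts a 0/1 result to bool again. */
static bool is_up_verbose(enum link_state s)
{
	return (s == LINK_UP) ? true : false;
}

/* Equivalent form: the comparison already yields 0 or 1. */
static bool is_up(enum link_state s)
{
	return s == LINK_UP;
}
```

      Both functions return the same value for every input, so dropping
      the ternary does not change behaviour.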
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: Remove unneeded conversion to bool · 7ff4f063
      Jason Yan authored
      The '==' expression itself is bool, no need to convert it to bool again.
      This fixes the following coccicheck warning:
      
      drivers/net/ethernet/intel/i40e/i40e_main.c:1614:52-57: WARNING:
      conversion to bool not needed here
      drivers/net/ethernet/intel/i40e/i40e_main.c:11439:52-57: WARNING:
      conversion to bool not needed here
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: Remove unneeded conversion to bool · e9a9e519
      Jason Yan authored
      The '==' expression itself is bool, no need to convert it to bool again.
      This fixes the following coccicheck warning:
      
      drivers/ptp/ptp_ines.c:403:55-60: WARNING: conversion to bool not
      needed here
      drivers/ptp/ptp_ines.c:404:55-60: WARNING: conversion to bool not
      needed here
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cgroup, netclassid: remove double cond_resched · 526f3d96
      Jiri Slaby authored
      Commit 018d26fc ("cgroup, netclassid: periodically release file_lock
      on classid") added a second cond_resched to write_classid indirectly by
      update_classid_task. Remove the one in write_classid.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Dmitry Yakunin <zeil@yandex-team.ru>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 76fc6a9a
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) flow_block_cb memleak in nf_flow_table_offload_del_cb(), from Roi Dayan.
      
      2) Fix error path handling in nf_nat_inet_register_fn(), from Hillf Danton.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Apr, 2020 16 commits
    • David S. Miller's avatar
    • net/mlx5e: Get the latest values from counters in switchdev mode · dcdf4ce0
      Zhu Yanjun authored
      In the switchdev mode, when running "cat
      /sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
      accessed to get the latest values. But currently this command cannot
      get the correct values from ppcnt.

      According to the firmware manual, the 802_3 data layout should be
      set in the ppcnt register before getting the 802_3 counters.

      When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
      run, the monitor counters are tested before the 802_3 data layout is
      set in the ppcnt register. The test result decides whether the
      802_3 data layout is updated.

      However, the monitor counters do not support monitoring the rx/tx
      stats of 802_3 in switchdev mode, so changes in the rx/tx counters
      do not trigger the monitor counters. As a result, the 802_3 data
      layout is not updated in the ppcnt register, and the command cannot
      get the latest values from the ppcnt register with the 802_3 data layout.
      
      Fixes: 5c7e8bbb ("net/mlx5e: Use monitor counters for update stats")
      Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Kconfig: convert imply usage to weak dependency · 96c34151
      Saeed Mahameed authored
      MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
      MLXFW and PCI_HYPERV_INTERFACE.
      
      This was useful to force vxlan, ptp, etc. to be reachable to mlx5
      regardless of their config states.

      Due to the changes in the cited commit below, the semantics of 'imply'
      were changed to not force any restriction on the implied config.

      As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
      would result in undefined references, as VXLAN now would stay as 'm'.

      To fix this we change MLX5_CORE to have a weak dependency on
      these modules/configs and make sure they are reachable, by adding:
      depends on symbol || !symbol.

      For example: with VXLAN=m and MLX5_CORE=y, this will force MLX5_CORE to m.
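
      The effect of "symbol || !symbol" can be worked through with
      Kconfig's tristate arithmetic, where n < m < y, "!" is y minus the
      value, "||" is the maximum, "&&" is the minimum, and a dependency
      acts as an upper limit. The following is a small model of that
      arithmetic, not Kconfig itself; all identifiers are hypothetical.

```c
/* Kconfig tristate values: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(TRI_Y - a);
}

static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* Upper limit imposed by "depends on dep || !dep". */
static enum tristate weak_dep_limit(enum tristate dep)
{
	return tri_or(dep, tri_not(dep));
}

/* Value a symbol ends up with when the user requests `requested`. */
static enum tristate resolve(enum tristate requested, enum tristate dep)
{
	return tri_and(requested, weak_dep_limit(dep));
}
```

      With dep = m the limit is max(m, m) = m, so a requested y is capped
      at m. With dep = n or dep = y the limit is y, so the symbol is not
      restricted.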
      
      Fixes: def2fbff ("kconfig: allow symbols implied by y to become m")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
    • net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns · e7e0004a
      Maxim Mikityanskiy authored
      XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
      ICOSQ. When the application floods the driver with wakeup requests by
      calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
      the XSK ICOSQ may overflow.
      
      Multiple NOPs are not required and won't accelerate the process, so
      avoid posting a second NOP if there is one already on the way. This way
      we also avoid increasing the queue size (which might not help anyway).
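
      The idea of posting at most one NOP at a time can be sketched with a
      single atomic flag. This is a userspace model using C11 atomics under
      assumed names (struct wakeup_queue, wakeup_request(),
      wakeup_complete()); it is not the mlx5e implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the wakeup queue state. */
struct wakeup_queue {
	atomic_flag nop_pending;	/* set while a NOP is in flight */
	unsigned int nops_posted;	/* number of NOPs actually posted */
};

#define WAKEUP_QUEUE_INIT { ATOMIC_FLAG_INIT, 0 }

/* Called per wakeup request; returns true only if it posted a NOP. */
static bool wakeup_request(struct wakeup_queue *q)
{
	if (atomic_flag_test_and_set(&q->nop_pending))
		return false;	/* one is already on the way */
	q->nops_posted++;
	return true;
}

/* Called when the NOP completes, so the next wakeup may post again. */
static void wakeup_complete(struct wakeup_queue *q)
{
	atomic_flag_clear(&q->nop_pending);
}
```

      However many requests arrive between a post and its completion, the
      queue holds at most one outstanding NOP, so its size does not need
      to grow with the request rate.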
      
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: CT: Change idr to xarray to protect parallel tuple id allocation · 70840b66
      Paul Blakey authored
      After allowing parallel tuple insertion, we get the following trace:
      
      [ 5505.142249] ------------[ cut here ]------------
      [ 5505.148155] WARNING: CPU: 21 PID: 13313 at lib/radix-tree.c:581 delete_node+0x16c/0x180
      [ 5505.295553] CPU: 21 PID: 13313 Comm: kworker/u50:22 Tainted: G           OE     5.6.0+ #78
      [ 5505.304824] Hardware name: Supermicro Super Server/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 5505.313740] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_flow_table]
      [ 5505.323257] RIP: 0010:delete_node+0x16c/0x180
      [ 5505.349862] RSP: 0018:ffffb19184eb7b30 EFLAGS: 00010282
      [ 5505.356785] RAX: 0000000000000000 RBX: ffff904ac95b86d8 RCX: ffff904b6f938838
      [ 5505.365190] RDX: 0000000000000000 RSI: ffff904ac954b908 RDI: ffff904ac954b920
      [ 5505.373628] RBP: ffff904b4ac13060 R08: 0000000000000001 R09: 0000000000000000
      [ 5505.382155] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000
      [ 5505.390527] R13: ffffb19184eb7bfc R14: ffff904b6bef5800 R15: ffff90482c1203c0
      [ 5505.399246] FS:  0000000000000000(0000) GS:ffff904c2fc80000(0000) knlGS:0000000000000000
      [ 5505.408621] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5505.415739] CR2: 00007f5d27006010 CR3: 0000000058c10006 CR4: 00000000001626e0
      [ 5505.424547] Call Trace:
      [ 5505.428429]  idr_alloc_u32+0x7b/0xc0
      [ 5505.433803]  mlx5_tc_ct_entry_add_rule+0xbf/0x950 [mlx5_core]
      [ 5505.441354]  ? mlx5_fc_create+0x23c/0x370 [mlx5_core]
      [ 5505.448225]  mlx5_tc_ct_block_flow_offload+0x874/0x10b0 [mlx5_core]
      [ 5505.456278]  ? mlx5_tc_ct_block_flow_offload+0x63d/0x10b0 [mlx5_core]
      [ 5505.464532]  nf_flow_offload_tuple.isra.21+0xc5/0x140 [nf_flow_table]
      [ 5505.472286]  ? __kmalloc+0x217/0x2f0
      [ 5505.477093]  ? flow_rule_alloc+0x1c/0x30
      [ 5505.482117]  flow_offload_work_handler+0x1d0/0x290 [nf_flow_table]
      [ 5505.489674]  ? process_one_work+0x17c/0x580
      [ 5505.494922]  process_one_work+0x202/0x580
      [ 5505.500082]  ? process_one_work+0x17c/0x580
      [ 5505.505696]  worker_thread+0x4c/0x3f0
      [ 5505.510458]  kthread+0x103/0x140
      [ 5505.514989]  ? process_one_work+0x580/0x580
      [ 5505.520616]  ? kthread_bind+0x10/0x10
      [ 5505.525837]  ret_from_fork+0x3a/0x50
      [ 5505.570841] ---[ end trace 07995de9c56d6831 ]---
      
      This happens from parallel deletes/adds to idr, as idr isn't protected.
      Fix that by using xarray as the tuple_ids allocator instead of idr.
      
      Fixes: 7da182a9 ("netfilter: flowtable: Use work entry per offload command")
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix failing fw tracer allocation on s390 · a019b361
      Niklas Schnelle authored
      On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
      allocation as done for the firmware tracer will always fail.
      
      Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
      that copies the debug data into the trace array and there is no need for
      the allocation to be contiguous in physical memory. We can therefore use
      kvzalloc() instead of kzalloc() and get rid of the large contiguous
      allocation.
      
      Fixes: f53aaa31 ("net/mlx5: FW tracer, implement tracer logic")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • team: fix hang in team_mode_get() · 1c30fbc7
      Taehee Yoo authored
      When team mode is changed or set, the team_mode_get() is called to check
      whether the mode module is inserted or not. If the mode module is not
      inserted, it calls the request_module().
      In the request_module(), it creates a child process, which is
      the "modprobe" process, and waits for the child process to finish.
      At this point, the following locks are held.
      down_read(&cb_lock); by genl_rcv()
          genl_lock(); by genl_rcv_msg()
              rtnl_lock(); by team_nl_cmd_options_set()
                  mutex_lock(&team->lock); by team_nl_team_get()
      
      Concurrently, the team module could be removed by rmmod or "modprobe -r".
      The __exit function of the team module is team_module_exit(), which calls
      team_nl_fini(), and it tries to acquire the following locks.
      down_write(&cb_lock);
          genl_lock();
      Because of the genl_lock() and cb_lock, this process can't finish
      before the request_module() routine does.

      The problem scenario.
      CPU0                                     CPU1
      team_mode_get
          request_module()
                                               modprobe -r team_mode_roundrobin
                                                           team <--(B)
              modprobe team <--(A)
                  team_mode_roundrobin
      
      By request_module(), the "modprobe team_mode_roundrobin" command
      will be executed. At this point, the modprobe process will decide
      that the team module should be inserted before team_mode_roundrobin,
      because the team module is being removed.

      The module infrastructure does not allow insert/remove operations
      on the same module to be executed concurrently.
      So, (A) waits for (B) but (B) also waits for (A) because of locks,
      and the hang occurs at this point.
      
      Test commands:
          while :
          do
              teamd -d &
              killall teamd &
              modprobe -rv team_mode_roundrobin &
          done
      
      The approach of this patch is to hold the reference count of the team
      module if the team module is compiled as a module. If the reference count
      of the team module is not zero while request_module() is being called,
      the team module will not be removed at that moment,
      so the above scenario cannot occur.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fix-races-on-accept' · 0b943d90
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      mptcp: fix races on accept()
      
      This series includes some fixes for accept() races which may cause inconsistent
      MPTCP socket status and oops. Please see the individual patches for the
      technical details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: drop req socket remote_key* fields · fca5c82c
      Paolo Abeni authored
      We don't need them, as we can use the current ingress opt
      data instead. Setting them in syn_recv_sock() may cause
      inconsistent mptcp socket status, as per previous commit.
      
      Fixes: cc7972ea ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: avoid flipping mp_capable field in syn_recv_sock() · 4c8941de
      Paolo Abeni authored
      If multiple CPUs race on the same req_sock in syn_recv_sock(),
      flipping the mp_capable field can cause inconsistent child socket status.
      
      When racing, the CPU losing the req ownership may still change
      the mptcp request socket mp_capable flag while the CPU owning
      the request is cloning the socket, leaving the child socket with
      'is_mptcp' set but no 'mp_capable' flag.
      
      Such a socket will stay with the 'conn' field cleared, leading to an oops
      in a later mptcp callback.

      Address the issue by tracking the fallback status in a local variable.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: handle mptcp listener destruction via rcu · 5e20087d
      Florian Westphal authored
      The following splat can occur during self test:
      
       BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160
       Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808
      
        subflow_data_ready+0x156/0x160
        tcp_child_process+0x6a3/0xb30
        tcp_v4_rcv+0x2231/0x3730
        ip_protocol_deliver_rcu+0x5c/0x860
        ip_local_deliver_finish+0x220/0x360
        ip_local_deliver+0x1c8/0x4e0
        ip_rcv_finish+0x1da/0x2f0
        ip_rcv+0xd0/0x3c0
        __netif_receive_skb_one_core+0xf5/0x160
        __netif_receive_skb+0x27/0x1c0
        process_backlog+0x21e/0x780
        net_rx_action+0x35f/0xe90
        do_softirq+0x4c/0x50
        [..]
      
      This occurs when accessing subflow_ctx->conn.
      
      The problem is that tcp_child_process() calls the listen socket's
      sk_data_ready() notification, but it doesn't hold the listener
      lock.  Another cpu calling close() on the listener will then cause
      the refcount to transition to 0.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix large delays in PTP synchronization · bd019427
      Rahul Lakkireddy authored
      Fetching PTP sync information from mailbox is slow and can take
      up to 10 milliseconds. Reduce this unnecessary delay by directly
      reading the information from the corresponding registers.
      
      Fixes: 9c33e420 ("cxgb4: Add PTP Hardware Clock (PHC) support")
      Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com>
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array · f0212a5e
      Marc Zyngier authored
      Running with KASAN on a VIM3L system leads to the following splat
      when probing the Ethernet device:
      
      ==================================================================
      BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8
      Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139
      CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G            E     5.7.0-rc1-00101-g8624b7577b9c #781
      Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020
      Call trace:
       dump_backtrace+0x0/0x2a0
       show_stack+0x20/0x30
       dump_stack+0xec/0x148
       print_address_description.isra.12+0x70/0x35c
       __kasan_report+0xfc/0x1d4
       kasan_report+0x4c/0x68
       __asan_load4+0x9c/0xd8
       _get_maxdiv+0x74/0xd8
       clk_divider_bestdiv+0x74/0x5e0
       clk_divider_round_rate+0x80/0x1a8
       clk_core_determine_round_nolock.part.9+0x9c/0xd0
       clk_core_round_rate_nolock+0xf0/0x108
       clk_hw_round_rate+0xac/0xf0
       clk_factor_round_rate+0xb8/0xd0
       clk_core_determine_round_nolock.part.9+0x9c/0xd0
       clk_core_round_rate_nolock+0xf0/0x108
       clk_core_round_rate_nolock+0xbc/0x108
       clk_core_set_rate_nolock+0xc4/0x2e8
       clk_set_rate+0x58/0xe0
       meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b]
       platform_drv_probe+0x78/0xd8
       really_probe+0x158/0x610
       driver_probe_device+0x140/0x1b0
       device_driver_attach+0xa4/0xb0
       __driver_attach+0xcc/0x1c8
       bus_for_each_dev+0xf4/0x168
       driver_attach+0x3c/0x50
       bus_add_driver+0x238/0x2e8
       driver_register+0xc8/0x1e8
       __platform_driver_register+0x88/0x98
       meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b]
       do_one_initcall+0xa8/0x328
       do_init_module+0xe8/0x368
       load_module+0x3300/0x36b0
       __do_sys_finit_module+0x120/0x1a8
       __arm64_sys_finit_module+0x4c/0x60
       el0_svc_common.constprop.2+0xe4/0x268
       do_el0_svc+0x98/0xa8
       el0_svc+0x24/0x68
       el0_sync_handler+0x12c/0x318
       el0_sync+0x158/0x180
      
      The buggy address belongs to the variable:
       div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b]
      
      Memory state around the buggy address:
       ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00
       ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa
      >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa
                                                                   ^
       ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa
       ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa
      ==================================================================
      
      Digging into this indeed shows that the clock divider array is
      lacking a final fence, and that the clock subsystem goes off into the
      weeds. Oh well.
      
      Let's add the empty structure that indicates the end of the array.
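
      Why the terminator matters can be shown with a small userspace
      model. The struct mirrors the val/div shape of a clock divider
      table, but table_max_div() is a simplified stand-in for the kernel's
      table walk and the divider values are hypothetical.

```c
/* Userspace model of a divider table entry. */
struct clk_div_table {
	unsigned int val;
	unsigned int div;
};

/*
 * Walks the table until it reaches an entry with div == 0.
 * The loop has no other bound, so a table without that final
 * entry is read past its end.
 */
static unsigned int table_max_div(const struct clk_div_table *table)
{
	const struct clk_div_table *clkt;
	unsigned int maxdiv = 0;

	for (clkt = table; clkt->div; clkt++)
		if (clkt->div > maxdiv)
			maxdiv = clkt->div;
	return maxdiv;
}

/* Hypothetical divider values; the last entry is the terminator. */
static const struct clk_div_table example_div_table[] = {
	{ .val = 2, .div = 2 },
	{ .val = 3, .div = 3 },
	{ .val = 7, .div = 7 },
	{ 0, 0 },	/* end of array */
};
```

      Removing the final { 0, 0 } entry makes the loop continue into
      whatever follows the array, which is the kind of
      global-out-of-bounds read KASAN reports above.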
      
      Fixes: bd6f4854 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix restrict IPV6_ADDRFORM operation · 82c9ae44
      John Haxby authored
      Commit b6f61189 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a
      problem found by syzbot, but an unfortunate logic error meant that it
      also broke IPV6_ADDRFORM.
      
      Rearrange the checks so that the earlier test is just one of the series
      of checks made before moving the socket from IPv6 to IPv4.
      
      Fixes: b6f61189 ("ipv6: restrict IPV6_ADDRFORM operation")
      Signed-off-by: John Haxby <john.haxby@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Omit superfluous error message in bcm_sysport_probe() · bdbe05b3
      Tang Bin authored
      In the function bcm_sysport_probe(), when getting the IRQ fails, the
      function platform_get_irq() already logs an error message, so remove
      the redundant message here.
      Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce
      Tonghao Zhang authored
      syzbot wrote:
      | =============================
      | WARNING: suspicious RCU usage
      | 5.7.0-rc1+ #45 Not tainted
      | -----------------------------
      | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
      |
      | other info that might help us debug this:
      | rcu_scheduler_active = 2, debug_locks = 1
      | ...
      |
      | stack backtrace:
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
      | Workqueue: netns cleanup_net
      | Call Trace:
      | ...
      | ovs_ct_exit
      | ovs_exit_net
      | ops_exit_list.isra.7
      | cleanup_net
      | process_one_work
      | worker_thread
      
      To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
      lockdep_ovsl_is_held as optional lockdep expression.
      
      Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
      Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Yi-Hung Wei <yihung.wei@gmail.com>
      Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27de77ce
  4. 19 Apr, 2020 1 commit
    • Hillf Danton's avatar
      netfilter: nat: fix error handling upon registering inet hook · b4faef17
      Hillf Danton authored
      syzbot reported the following warning.
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106
      nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x35 kernel/panic.c:582
       report_bug+0x27b/0x2f0 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:175 [inline]
       fixup_bug arch/x86/kernel/traps.c:170 [inline]
       do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
       do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
      Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb <0f> 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20
      RSP: 0018:ffffc90005487208 EFLAGS: 00010246
      RAX: 0000000000040000 RBX: 0000000000000004 RCX: ffffc9001444a000
      RDX: 0000000000040000 RSI: ffffffff865c94a2 RDI: 0000000000000005
      RBP: ffff88808b5cf000 R08: ffff8880a2620140 R09: fffffbfff14bcd79
      R10: ffffc90005487208 R11: fffffbfff14bcd78 R12: 0000000000000000
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
       nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline]
       nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline]
       nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023
       nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline]
       nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981
       nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235
       nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      To silence it, unregister the NFPROTO_IPV6 hook instead of the
      NFPROTO_INET hook when registering the NFPROTO_IPV4 hook fails.
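
The rollback described above can be sketched in userspace C. This is only
an illustration of the pattern, not the kernel code: the enum, the
register_family()/unregister_family() helpers, the fail_ipv4 knob and the
warnings counter are all hypothetical stand-ins for the real netfilter
hook API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NFPROTO_* families. */
enum family { FAM_IPV4, FAM_IPV6, FAM_INET, FAM_COUNT };

static bool registered[FAM_COUNT]; /* which families currently have a hook */
static bool fail_ipv4;             /* knob to simulate an IPv4 failure */
static int warnings;               /* counts unregistering a missing hook */

static int register_family(enum family f)
{
    assert(f < FAM_COUNT);
    if (f == FAM_IPV4 && fail_ipv4)
        return -1;
    registered[f] = true;
    return 0;
}

static void unregister_family(enum family f)
{
    assert(f < FAM_COUNT);
    if (!registered[f]) {
        /* Stands in for the WARNING in the report above. */
        warnings++;
        return;
    }
    registered[f] = false;
}

/* An "inet" registration is an IPv6 hook plus an IPv4 hook. */
static int register_inet(void)
{
    int ret = register_family(FAM_IPV6);

    if (ret)
        return ret;

    ret = register_family(FAM_IPV4);
    if (ret)
        /* Undo exactly what was registered: IPv6, not FAM_INET. */
        unregister_family(FAM_IPV6);

    return ret;
}
```

Passing FAM_INET to unregister_family() in the error path would bump the
warnings counter in this sketch, because no hook was ever registered under
that family.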
      Reported-by: syzbot <syzbot+33e06702fd6cffc24c40@syzkaller.appspotmail.com>
      Fixes: d164385e ("netfilter: nat: add inet family nat support")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b4faef17
  5. 18 Apr, 2020 4 commits
    • Eric Dumazet's avatar
      tcp: cache line align MAX_TCP_HEADER · 9bacd256
      Eric Dumazet authored
      TCP stack is dumb in how it cooks its output packets.
      
      Depending on the MAX_HEADER value, we might choose a bad ending point
      for the headers.
      
      If we align the end of TCP headers to a cache line boundary, we
      make sure to always use the smallest number of cache lines,
      which always helps.
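
A rough userspace sketch of why the alignment matters follows. It assumes
64-byte cache lines and a buffer that itself starts on a cache line
boundary; align_up() and cache_lines_touched() are hypothetical helpers
written for this illustration, and the byte counts in the note below are
made-up example values rather than kernel constants.

```c
#include <assert.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static unsigned int align_up(unsigned int x, unsigned int align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (x + align - 1u) & ~(align - 1u);
}

/*
 * Headers are written backwards from the end of the reserved headroom.
 * Count how many cache lines the byte range [end - len, end) touches,
 * with offsets measured from a cache-line-aligned buffer start.
 */
static unsigned int cache_lines_touched(unsigned int end, unsigned int len,
                                        unsigned int line)
{
    assert(len > 0 && len <= end && line > 0);
    return (end - 1u) / line - (end - len) / line + 1u;
}
```

With a headroom that ends at byte 176, a 64-byte header block covers bytes
112 to 175 and straddles two cache lines. Rounding the end up with
align_up(176, 64) gives 192, and the same 64 bytes then sit in a single
line.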
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bacd256
    • David S. Miller's avatar
      Merge branch 'mptcp-fixes' · 56e639e6
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      mptcp: fix 'attempt to release socket in state...' splats
      
      These two patches fix error handling corner-cases where
      inet_sock_destruct gets called for a mptcp_sk that is not in TCP_CLOSE
      state.  This results in unwanted error printks from the network stack.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56e639e6
    • Florian Westphal's avatar
      mptcp: fix 'Attempt to release TCP socket in state' warnings · 9f5ca6a5
      Florian Westphal authored
      We need to set sk_state to CLOSED, otherwise we get the following:
      
      IPv4: Attempt to release TCP socket in state 3 00000000b95f109e
      IPv4: Attempt to release TCP socket in state 10 00000000b95f109e
      
      First one is from inet_sock_destruct(), second one from
      mptcp_sk_clone failure handling.  Setting sk_state to CLOSED isn't
      enough, we also need to orphan sk so it has DEAD flag set.
      Otherwise, a very similar warning is printed from inet_sock_destruct().
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f5ca6a5
    • Florian Westphal's avatar
      mptcp: fix splat when incoming connection is never accepted before exit/close · df1036da
      Florian Westphal authored
      The following snippet (replicated from the syzkaller reproducer)
      generates the warning "IPv4: Attempt to release TCP socket in state 1".
      
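      #include <sys/socket.h>
      #include <netinet/in.h>

      int main(void) {
       /* sin_family 2 is AF_INET; raw values are as seen on a little-endian host */
       struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20,
                                   .sin_addr.s_addr = 0x010000e0, }; /* 224.0.0.1 */
       struct sockaddr_in sin2 = { .sin_family = 2,
                                   .sin_addr.s_addr = 0x0100007f, }; /* 127.0.0.1 */
       struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20,
                                   .sin_addr.s_addr = 0x0100007f, }; /* 127.0.0.1 */
       /* AF_INET, SOCK_STREAM, protocol 0x106 (262, IPPROTO_MPTCP) */
       int r0 = socket(0x2, 0x1, 0x106);
       int r1 = socket(0x2, 0x1, 0x106);

       bind(r1, (void *)&sin1, sizeof(sin1));
       connect(r1, (void *)&sin2, sizeof(sin2));
       listen(r1, 3);
       /* the oversized address length is kept from the reproducer */
       return connect(r0, (void *)&sin3, 0x4d);
      }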
      
      The reason is that the newly generated mptcp socket is closed via the
      ulp release of the tcp listener socket when its accept backlog gets
      purged.
      
      To fix this, delay setting the ESTABLISHED state until after userspace
      calls accept, and use an mptcp-specific destructor.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df1036da