1. 31 Mar, 2019 10 commits
    • Xin Long's avatar
      tipc: handle the err returned from cmd header function · 2ac695d1
      Xin Long authored
      Syzbot found a crash:
      
        BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
        Call Trace:
          tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
          __tipc_nl_compat_dumpit+0x59e/0xda0 net/tipc/netlink_compat.c:215
          tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:280
          tipc_nl_compat_handle net/tipc/netlink_compat.c:1226 [inline]
          tipc_nl_compat_recv+0x1b5f/0x2750 net/tipc/netlink_compat.c:1265
          genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
          genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
          netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
          genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
          netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
          netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
          netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
          sock_sendmsg_nosec net/socket.c:622 [inline]
          sock_sendmsg net/socket.c:632 [inline]
      
        Uninit was created at:
          __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
          alloc_skb include/linux/skbuff.h:1012 [inline]
          netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
          netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
          sock_sendmsg_nosec net/socket.c:622 [inline]
          sock_sendmsg net/socket.c:632 [inline]
      
      It was supposed to be fixed on commit 974cb0e3 ("tipc: fix uninit-value
      in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req)
      in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called
      ahead of tipc_nl_compat_name_table_dump().
      
      However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd
      header function. It means even when the check added in that fix fails, it
      won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be
      triggered again.
      
      So this patch is to add the process for the err returned from cmd header
      function in tipc_nl_compat_dumpit().
      
      Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2ac695d1
    • Xin Long's avatar
      tipc: check link name with right length in tipc_nl_compat_link_set · 8c63bf9a
      Xin Long authored
      A similar issue as fixed by Patch "tipc: check bearer name with right
      length in tipc_nl_compat_bearer_enable" was also found by syzbot in
      tipc_nl_compat_link_set().
      
      The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
      offsetof(struct tipc_link_config, name)'.
      
      Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c63bf9a
    • Xin Long's avatar
      tipc: check bearer name with right length in tipc_nl_compat_bearer_enable · 6f07e5f0
      Xin Long authored
      Syzbot reported the following crash:
      
      BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961
        memchr+0xce/0x110 lib/string.c:961
        string_is_valid net/tipc/netlink_compat.c:176 [inline]
        tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401
        __tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline]
        tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline]
        tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265
        genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
        genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:622 [inline]
        sock_sendmsg net/socket.c:632 [inline]
      
      Uninit was created at:
        __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
        alloc_skb include/linux/skbuff.h:1012 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
        netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
        sock_sendmsg_nosec net/socket.c:622 [inline]
        sock_sendmsg net/socket.c:632 [inline]
      
      It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
      it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
      also includes priority and disc_domain length.
      
      This patch is to fix it by checking it with a right length:
      'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
      
      Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f07e5f0
    • David S. Miller's avatar
      Merge branch 'net-stmmac-fix-handling-of-oversized-frames' · d3de85a5
      David S. Miller authored
      Aaro Koskinen says:
      
      ====================
      net: stmmac: fix handling of oversized frames
      
      I accidentally had MTU size mismatch (9000 vs. 1500) in my network,
      and I noticed I could kill a system using stmmac & 1500 MTU simply
      by pinging it with "ping -s 2000 ...".
      
      While testing a fix I encountered also some other issues that need fixing.
      
      I have tested these only with enhanced descriptors, so the normal
      descriptor changes need a careful review.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d3de85a5
    • Aaro Koskinen's avatar
      net: stmmac: don't log oversized frames · 057a0c56
      Aaro Koskinen authored
      This is log is harmful as it can trigger multiple times per packet. Delete
      it.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      057a0c56
    • Aaro Koskinen's avatar
      net: stmmac: fix dropping of multi-descriptor RX frames · 8ac0c24f
      Aaro Koskinen authored
      Packets without the last descriptor set should be dropped early. If we
      receive a frame larger than the DMA buffer, the HW will continue using the
      next descriptor. Driver mistakes these as individual frames, and sometimes
      a truncated frame (without the LD set) may look like a valid packet.
      
      This fixes a strange issue where the system replies to 4098-byte ping
      although the MTU/DMA buffer size is set to 4096, and yet at the same
      time it's logging an oversized packet.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ac0c24f
    • Aaro Koskinen's avatar
      net: stmmac: don't overwrite discard_frame status · 1b746ce8
      Aaro Koskinen authored
      If we have error bits set, the discard_frame status will get overwritten
      by checksum bit checks, which might set the status back to good one.
      Fix by checking the COE status only if the frame is good.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b746ce8
    • Aaro Koskinen's avatar
      net: stmmac: don't stop NAPI processing when dropping a packet · 07b39753
      Aaro Koskinen authored
      Currently, if we drop a packet, we exit from NAPI loop before the budget
      is consumed. In some situations this will make the RX processing stall
      e.g. when flood pinging the system with oversized packets, as the
      errorneous packets are not dropped efficiently.
      
      If we drop a packet, we should just continue to the next one as long as
      the budget allows.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07b39753
    • Aaro Koskinen's avatar
      net: stmmac: ratelimit RX error logs · 972c9be7
      Aaro Koskinen authored
      Ratelimit RX error logs.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      972c9be7
    • Aaro Koskinen's avatar
      net: stmmac: use correct DMA buffer size in the RX descriptor · 583e6361
      Aaro Koskinen authored
      We always program the maximum DMA buffer size into the receive descriptor,
      although the allocated size may be less. E.g. with the default MTU size
      we allocate only 1536 bytes. If somebody sends us a bigger frame, then
      memory may get corrupted.
      
      Fix by using exact buffer sizes.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      583e6361
  2. 30 Mar, 2019 2 commits
  3. 29 Mar, 2019 28 commits
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2019-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 19c84744
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-03-29
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.11
      ('net/mlx5: Decrease default mr cache size')
      
      For -stable v4.12
      ('net/mlx5e: Add a lock on tir list')
      
      For -stable v4.13
      ('net/mlx5e: Fix error handling when refreshing TIRs')
      
      For -stable v4.18
      ('net/mlx5e: Update xon formula')
      
      For -stable v4.19
      ('net: mlx5: Add a missing check on idr_find, free buf')
      ('net/mlx5e: Update xoff formula')
      
      net-next merge Note:
      When merged with net-next the following simple conflict will appear,
      
      drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
      
      ++<<<<<<< HEAD (net)
       + *   max_mtu: netdev's max_mtu
      ++=======
      +  *    @mtu: device's MTU
      ++>>>>>>> net-next
      
      To resolve: just replace the line in net-next
      *    @mtu: device's MTU
      to
      *    @max_mtu: netdev's max_mtu
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19c84744
    • David S. Miller's avatar
      Revert "cxgb4: Update 1.23.3.0 as the latest firmware supported." · ec915f47
      David S. Miller authored
      This reverts commit 4d31c4fa.
      
      Accidently applied this to the wrong tree.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ec915f47
    • Vishal Kulkarni's avatar
      cxgb4: Update 1.23.3.0 as the latest firmware supported. · 4d31c4fa
      Vishal Kulkarni authored
      Change t4fw_version.h to update latest firmware version
      number to 1.23.3.0.
      Signed-off-by: default avatarVishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d31c4fa
    • Li RongQing's avatar
      net: ethtool: not call vzalloc for zero sized memory request · 3d883026
      Li RongQing authored
      NULL or ZERO_SIZE_PTR will be returned for zero sized memory
      request, and derefencing them will lead to a segfault
      
      so it is unnecessory to call vzalloc for zero sized memory
      request and not call functions which maybe derefence the
      NULL allocated memory
      
      this also fixes a possible memory leak if phy_ethtool_get_stats
      returns error, memory should be freed before exit
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Reviewed-by: default avatarWang Li <wangli39@baidu.com>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d883026
    • Jakub Kicinski's avatar
      net: tls: prevent false connection termination with offload · c43ac97b
      Jakub Kicinski authored
      Only decrypt_internal() performs zero copy on rx, all paths
      which don't hit decrypt_internal() must set zc to false,
      otherwise tls_sw_recvmsg() may return 0 causing the application
      to believe that that connection got closed.
      
      Currently this happens with device offload when new record
      is first read from.
      
      Fixes: d069b780 ("tls: Fix tls_device receive")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Reported-by: default avatarDavid Beckett <david.beckett@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c43ac97b
    • Haiyang Zhang's avatar
      hv_netvsc: Fix unwanted wakeup after tx_disable · 1b704c4a
      Haiyang Zhang authored
      After queue stopped, the wakeup mechanism may wake it up again
      when ring buffer usage is lower than a threshold. This may cause
      send path panic on NULL pointer when we stopped all tx queues in
      netvsc_detach and start removing the netvsc device.
      
      This patch fix it by adding a tx_disable flag to prevent unwanted
      queue wakeup.
      
      Fixes: 7b2ee50c ("hv_netvsc: common detach logic")
      Reported-by: default avatarMohammed Gamal <mgamal@redhat.com>
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1b704c4a
    • Konstantin Khorenko's avatar
      bonding: show full hw address in sysfs for slave entries · 18bebc6d
      Konstantin Khorenko authored
      Bond expects ethernet hwaddr for its slave, but it can be longer than 6
      bytes - infiniband interface for example.
      
       # cat /sys/devices/<skipped>/net/ib0/address
       80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1
      
       # cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr
       80:00:02:08:fe:80
      
      So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.
      Signed-off-by: default avatarKonstantin Khorenko <khorenko@virtuozzo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18bebc6d
    • Eli Britstein's avatar
      net/mlx5e: Consider tunnel type for encap contexts · 7f1a546e
      Eli Britstein authored
      The driver allocates an encap context based on the tunnel properties,
      and reuse that context for all flows using the same tunnel properties.
      Commit df2ef3bf ("net/mlx5e: Add GRE protocol offloading")
      introduced another tunnel protocol other than the single VXLAN
      previously supported. A flow that uses a tunnel with the same tunnel
      properties but with a different tunnel type (GRE vs VXLAN for example)
      would mistakenly reuse the previous alocated context, causing the
      traffic to be sent with the wrong encapsulation. Fix that by
      considering the tunnel type for encap contexts.
      
      Fixes: df2ef3bf ("net/mlx5e: Add GRE protocol offloading")
      Signed-off-by: default avatarEli Britstein <elibr@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      7f1a546e
    • Huy Nguyen's avatar
      net/mlx5e: Update xon formula · e28408e9
      Huy Nguyen authored
      Set xon = xoff - netdev's max_mtu.
      netdev's max_mtu will give enough time for the pause frame to
      arrive at the sender.
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      e28408e9
    • Huy Nguyen's avatar
      net/mlx5e: Update xoff formula · 5ec983e9
      Huy Nguyen authored
      Set minimum speed in xoff threshold formula to 40Gbps
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      5ec983e9
    • Huy Nguyen's avatar
      net/mlx5: E-Switch, fix syndrome (0x678139) when turn on vepa · 36acf63a
      Huy Nguyen authored
      Make sure the struct mlx5_flow_destination is zero before
      filling in the field.
      
      Fixes: 8da202b2 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      36acf63a
    • Omri Kahalon's avatar
      net/mlx5: E-Switch, Fix esw manager vport indication for more vport commands · eca4a928
      Omri Kahalon authored
      Traditionally, the PF (Physical Function) which resides on vport 0 was
      the E-switch manager. Since the ECPF (Embedded CPU Physical Function),
      which resides on vport 0xfffe, was introduced as the E-Switch manager,
      the assumption that the E-switch manager is on vport 0 is incorrect.
      
      Since the eswitch code already uses the actual vport value, all we
      need is to always set other_vport=1.
      Signed-off-by: default avatarOmri Kahalon <omrik@mellanox.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      eca4a928
    • Roi Dayan's avatar
      net/mlx5: E-Switch, Protect from invalid memory access in offload fdb table · 5c1d260e
      Roi Dayan authored
      The esw offloads structures share a union with the legacy mode structs.
      Reset the offloads struct to zero in init to protect from null
      assumptions made by the legacy mode code.
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      5c1d260e
    • Tonghao Zhang's avatar
      net/mlx5e: Correctly use the namespace type when allocating pedit action · 84be899f
      Tonghao Zhang authored
      The capacity of FDB offloading and NIC offloading table are
      different, and when allocating the pedit actions, we should
      use the correct namespace type.
      
      Fixes: c500c86b ("net/mlx5e: support for two independent packet edit actions")
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      84be899f
    • Roi Dayan's avatar
      net/mlx5: E-Switch, Fix access to invalid memory when toggling esw modes · 8a91ad93
      Roi Dayan authored
      The esw fdb table has a union of legacy and offloads members.
      So if we were in a certain esw mode we could set some memebers and not
      set null which is fine as on destroy path and don't care.
      But then moving from legacy to switchdev a second time, the cleanup flow
      of legacy mode checks if a struct member was in use if it's not null so
      we need to make sure to reset the code to null when we init legacy mode.
      
      Fixes: 8da202b2 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
      Signed-off-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      8a91ad93
    • Aya Levin's avatar
      net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys · dd1b9e09
      Aya Levin authored
      Allow configuration of legacy link-modes even when extended link-modes
      are supported. This requires reading of legacy advertisement even when
      extended link-modes are supported. Since legacy and extended
      advertisement are mutually excluded, wait for empty reply from extended
      advertisement before reading legacy advertisement.
      
      Fixes: 6a897372 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      dd1b9e09
    • Aya Levin's avatar
      net/mlx5: ethtool, Fix type analysis of advertised link-mode · 8d047bf5
      Aya Levin authored
      Ethtool option set_link_ksettings allows setting of legacy link-modes
      or extended link-modes. Refine the decision of which type of link-modes
      is set.
      
      Fixes: 6a897372 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
      Signed-off-by: default avatarAya Levin <ayal@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      8d047bf5
    • Yuval Avnery's avatar
      net/mlx5e: Add a lock on tir list · 80a2a902
      Yuval Avnery authored
      Refresh tirs is looping over a global list of tirs while netdevs are
      adding and removing tirs from that list. That is why a lock is
      required.
      
      Fixes: 724b2aa1 ("net/mlx5e: TIRs management refactoring")
      Signed-off-by: default avatarYuval Avnery <yuvalav@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      80a2a902
    • Aditya Pakki's avatar
      net: mlx5: Add a missing check on idr_find, free buf · 8e949363
      Aditya Pakki authored
      idr_find() can return a NULL value to 'flow' which is used without a
      check. The patch adds a check to avoid potential NULL pointer dereference.
      
      In case of mlx5_fpga_sbu_conn_sendmsg() failure, free buf allocated
      using kzalloc.
      
      Fixes: ab412e1d ("net/mlx5: Accel, add TLS rx offload routines")
      Signed-off-by: default avatarAditya Pakki <pakki001@umn.edu>
      Reviewed-by: default avatarYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      8e949363
    • Dmytro Linkin's avatar
      net/mlx5e: Allow IPv4 ttl & IPv6 hop_limit rewrite for all L4 protocols · 8998576b
      Dmytro Linkin authored
      For some protocols we are not allowing IP header rewrite offload, since
      the HW is not capable to properly adjust the l4 checksum. However, TTL
      & HOPLIMIT modification can be done for all IP protocols, because they
      are not part of the pseudo header taken into account for checksum.
      
      Fixes: 73867881 ("drivers: net: use flow action infrastructure")
      Signed-off-by: default avatarDmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      8998576b
    • Gavi Teitz's avatar
      net/mlx5e: Fix error handling when refreshing TIRs · bc87a003
      Gavi Teitz authored
      Previously, a false positive would be caught if the TIRs list is
      empty, since the err value was initialized to -ENOMEM, and was only
      updated if a TIR is refreshed. This is resolved by initializing the
      err value to zero.
      
      Fixes: b676f653 ("net/mlx5e: Refactor refresh TIRs")
      Signed-off-by: default avatarGavi Teitz <gavi@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      bc87a003
    • Artemy Kovalyov's avatar
      net/mlx5: Decrease default mr cache size · e8b26b21
      Artemy Kovalyov authored
      Delete initialization of high order entries in mr cache to decrease initial
      memory footprint. When required, the administrator can populate the
      entries with memory keys via the /sys interface.
      
      This approach is very helpful to significantly reduce the per HW function
      memory footprint in virtualization environments such as SRIOV.
      
      Fixes: 9603b61d ("mlx5: Move pci device handling from mlx5_ib to mlx5_core")
      Signed-off-by: default avatarArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: default avatarMoni Shoua <monis@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reported-by: default avatarShalom Toledo <shalomt@mellanox.com>
      Acked-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      e8b26b21
    • Jesper Dangaard Brouer's avatar
      xdp: fix cpumap redirect SKB creation bug · 676e4a6f
      Jesper Dangaard Brouer authored
      We want to avoid leaking pointer info from xdp_frame (that is placed in
      top of frame) like commit 6dfb970d ("xdp: avoid leaking info stored in
      frame data on page reuse"), and followup commit 97e19cce ("bpf:
      reserve xdp_frame size in xdp headroom") that reserve this headroom.
      
      These changes also affected how cpumap constructed SKBs, as xdpf->headroom
      size changed, the skb data starting point were in-effect shifted with 32
      bytes (sizeof xdp_frame). This was still okay, as the cpumap frame_size
      calculation also included xdpf->headroom which were reduced by same amount.
      
      A bug was introduced in commit 77ea5f4c ("bpf/cpumap: make sure
      frame_size for build_skb is aligned if headroom isn't"), where the
      xdpf->headroom became part of the SKB_DATA_ALIGN rounding up. This
      round-up to find the frame_size is in principle still correct as it does
      not exceed the 2048 bytes frame_size (which is max for ixgbe and i40e),
      but the 32 bytes offset of pkt_data_start puts this over the 2048 bytes
      limit. This cause skb_shared_info to spill into next frame. It is a little
      hard to trigger, as the SKB need to use above 15 skb_shinfo->frags[] as
      far as I calculate. This does happen in practise for TCP streams when
      skb_try_coalesce() kicks in.
      
      KASAN can be used to detect these wrong memory accesses, I've seen:
       BUG: KASAN: use-after-free in skb_try_coalesce+0x3cb/0x760
       BUG: KASAN: wild-memory-access in skb_release_data+0xe2/0x250
      
      Driver veth also construct a SKB from xdp_frame in this way, but is not
      affected, as it doesn't reserve/deduct the room (used by xdp_frame) from
      the SKB headroom. Instead is clears the pointers via xdp_scrub_frame(),
      and allows SKB to use this area.
      
      The fix in this patch is to do like veth and instead allow SKB to (re)use
      the area occupied by xdp_frame, by clearing via xdp_scrub_frame().  (This
      does kill the idea of the SKB being able to access (mem) info from this
      area, but I guess it was a bad idea anyhow, and it was already killed by
      the veth changes.)
      
      Fixes: 77ea5f4c ("bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      676e4a6f
    • Alexander Lobakin's avatar
      net: core: netif_receive_skb_list: unlist skb before passing to pt->func · 9a5a90d1
      Alexander Lobakin authored
      __netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
      it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
      setup) crashes like:
      
      [ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
      [ 88.618666] Oops[#1]:
      [ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
      [ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
      [ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
      [ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
      [ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
      [ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
      [ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
      [ 88.665793] $24 : 00000000 8011e658
      [ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
      [ 88.677427] Hi : 00000003
      [ 88.680622] Lo : 7b5b4220
      [ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
      [ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
      [ 88.696734] Status: 10000404	IEp
      [ 88.700422] Cause : 50000008 (ExcCode 02)
      [ 88.704874] BadVA : 0000000e
      [ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
      [ 88.713005] Modules linked in:
      [ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
      [ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
      [ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
      [ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
      [ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
      [ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
      [ 88.771770] ...
      [ 88.774483] Call Trace:
      [ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0
      [ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188
      [ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4
      [ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0
      [ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140
      [ 88.805526] [<805a68f4>] ip_forward+0x364/0x560
      [ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4
      [ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58
      [ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac
      [ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c
      [ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394
      [ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828
      [ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc
      [ 88.850835] [<806ad300>] __do_softirq+0x178/0x338
      [ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100
      [ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144
      [ 88.866477] [<80105974>] handle_int+0x14c/0x158
      [ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40
      [ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a
      [ 88.887332]
      [ 88.888982] ---[ end trace eb863d007da11cf1 ]---
      [ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
      [ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fix this by pulling skb off the sublist and zeroing skb->next pointer
      before calling ptype callback.
      
      Fixes: 88eb1944 ("net: core: propagate SKB lists through packet_type lookup")
      Reviewed-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a5a90d1
    • Mao Wenan's avatar
      net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock(). · cb66ddd1
      Mao Wenan authored
      When it is to cleanup net namespace, rds_tcp_exit_net() will call
      rds_tcp_kill_sock(), if t_sock is NULL, it will not call
      rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
      connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
      net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
      and reference 'net' which has already been freed.
      
      In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
      sock->ops->connect, but if connect() is failed, it will call
      rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
      failed, rds_connect_worker() will try to reconnect all the time, so
      rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
      connections.
      
      Therefore, the condition !tc->t_sock is not needed if it is going to do
      cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
      NULL, and there is on other path to cancel cp_conn_w and free
      connection. So this patch is to fix this.
      
      rds_tcp_kill_sock():
      ...
      if (net != c_net || !tc->t_sock)
      ...
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      
      ==================================================================
      BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
      net/ipv4/af_inet.c:340
      Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
      
      CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
      Hardware name: linux,dummy-virt (DT)
      Workqueue: krdsd rds_connect_worker
      Call trace:
       dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x120/0x188 lib/dump_stack.c:113
       print_address_description+0x68/0x278 mm/kasan/report.c:253
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x21c/0x348 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
       inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
       __sock_create+0x4f8/0x770 net/socket.c:1276
       sock_create_kern+0x50/0x68 net/socket.c:1322
       rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
       rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      Allocated by task 687:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2705 [inline]
       slab_alloc mm/slub.c:2713 [inline]
       kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
       kmem_cache_zalloc include/linux/slab.h:697 [inline]
       net_alloc net/core/net_namespace.c:384 [inline]
       copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
       create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
       ksys_unshare+0x340/0x628 kernel/fork.c:2577
       __do_sys_unshare kernel/fork.c:2645 [inline]
       __se_sys_unshare kernel/fork.c:2643 [inline]
       __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
      
      Freed by task 264:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
       kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
       slab_free_hook mm/slub.c:1370 [inline]
       slab_free_freelist_hook mm/slub.c:1397 [inline]
       slab_free mm/slub.c:2952 [inline]
       kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
       net_free net/core/net_namespace.c:400 [inline]
       net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
       net_drop_ns net/core/net_namespace.c:406 [inline]
       cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      The buggy address belongs to the object at ffff8003496a3f80
       which belongs to the cache net_namespace of size 7872
      The buggy address is located 1796 bytes inside of
       7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
      The buggy address belongs to the page:
      page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
      index:0x0 compound_mapcount: 0
      flags: 0xffffe0000008100(slab|head)
      raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
      raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 467fa153("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarMao Wenan <maowenan@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb66ddd1
    • Andrea Righi's avatar
      openvswitch: fix flow actions reallocation · f28cd2af
      Andrea Righi authored
      The flow action buffer can be resized if it's not big enough to contain
      all the requested flow actions. However, this resize doesn't take into
      account the new requested size, the buffer is only increased by a factor
      of 2x. This might be not enough to contain the new data, causing a
      buffer overflow, for example:
      
      [   42.044472] =============================================================================
      [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
      [   42.046415] -----------------------------------------------------------------------------
      
      [   42.047715] Disabling lock debugging due to kernel taint
      [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
      [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
      [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
      
      [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc                          ........
      [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00  kkkkkkkk....l...
      [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6  l...........x...
      [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
      [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.059061] Redzone 8bf2c4a5: 00 00 00 00                                      ....
      [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
      
      Fix by making sure the new buffer is properly resized to contain all the
      requested data.
      
      BugLink: https://bugs.launchpad.net/bugs/1813244Signed-off-by: default avatarAndrea Righi <andrea.righi@canonical.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f28cd2af
    • David S. Miller's avatar
      Merge branch 'nfp-fix-retcode-and-disable-netpoll-on-representors' · 577dd43a
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: fix retcode and disable netpoll on representors
      
      This series avoids a potential crash on nfp representor devices
      when netpoll is in use.  If transmitting the frame through underlying
      vNIC fails we'd return an error code (by passing on error code from
      __dev_queue_xmit()) and cause double free in netpoll code.
      
      Fix the error code and disable netpoll on reprs altogether.
      IRQ-safety of locking the queues and calling __dev_queue_xmit()
      is questionable.
      
      Big thanks to John Hurley for debugging and narrowing down
      the trace log after I gave up! :)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      577dd43a
    • Jakub Kicinski's avatar
      nfp: disable netpoll on representors · c3e1f7ff
      Jakub Kicinski authored
      NFP reprs are software device on top of the PF's vNIC.
      The comment above __dev_queue_xmit() sayeth:
      
       When calling this method, interrupts MUST be enabled.  This is because
       the BH enable code must have IRQs enabled so that it will not deadlock.
      
      For netconsole we can't guarantee IRQ state, let's just
      disable netpoll on representors to be on the safe side.
      
      When the initial implementation of NFP reprs was added by the
      commit 5de73ee4 ("nfp: general representor implementation")
      .ndo_poll_controller was required for netpoll to be enabled.
      
      Fixes: ac3d9dd0 ("netpoll: make ndo_poll_controller() optional")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3e1f7ff