1. 31 Mar, 2021 11 commits
      net/mlx5e: Guarantee room for XSK wakeup NOP on async ICOSQ · 3ff3874f
      Tariq Toukan authored
      XSK wakeup flow triggers an IRQ by posting a NOP WQE and hitting
      the doorbell on the async ICOSQ.
      It maintains its state so that it doesn't issue another NOP WQE
      if it has an outstanding one already.
      
      For this flow to work properly, the NOP post must not fail.
      Make sure to reserve room for the NOP WQE in all WQE posts to the
      async ICOSQ.
      
      Fixes: 8d94b590 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
      Fixes: 1182f365 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
      Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
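      A minimal sketch of the reservation rule, assuming a ring with free-running 16-bit producer/consumer counters and a one-slot NOP. All `model_*` names are hypothetical and this is not the mlx5e code: every regular post must leave room for its own WQEs plus one NOP, so the wakeup NOP, of which at most one is outstanding, always fits.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the async ICOSQ: a ring of `size` WQE slots
 * tracked by free-running 16-bit producer (pc) and consumer (cc) counters. */
struct model_icosq {
	uint16_t pc;
	uint16_t cc;
	uint16_t size;
};

/* Assumed size of the XSK wakeup NOP, in WQE slots. */
#define MODEL_NOP_SLOTS 1

/* Free slots; unsigned 16-bit wraparound keeps pc - cc correct. */
static uint16_t model_icosq_room(const struct model_icosq *sq)
{
	return (uint16_t)(sq->size - (uint16_t)(sq->pc - sq->cc));
}

/* Every other poster (e.g. kTLS RX) must leave room for one NOP. */
static int model_icosq_post(struct model_icosq *sq, uint16_t slots)
{
	if (model_icosq_room(sq) < slots + MODEL_NOP_SLOTS)
		return -ENOSPC;
	sq->pc += slots;
	return 0;
}

/* The wakeup NOP may consume the reserved slot, so it never fails.
 * At most one NOP is outstanding, so one reserved slot is enough. */
static void model_icosq_post_nop(struct model_icosq *sq)
{
	assert(model_icosq_room(sq) >= MODEL_NOP_SLOTS);
	sq->pc += MODEL_NOP_SLOTS;
}
```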
      net/mlx5e: Consider geneve_opts for encap contexts · 929a2fad
      Dima Chumak authored
      The current algorithm for encap keys is a legacy of the initial VXLAN
      implementation and doesn't take into account all possible fields of a
      tunnel. For example, a Geneve tunnel may carry additional TLV options,
      but they are ignored when comparing encap keys, so a rule can be
      attached to an incorrect encap entry.
      
      Fix that by introducing an encap_info_equal() operation in
      struct mlx5e_tc_tunnel. The Geneve tunnel type uses a custom
      implementation, which extends the generic algorithm and considers
      options if they are set.
      
      Fixes: 7f1a546e ("net/mlx5e: Consider tunnel type for encap contexts")
      Signed-off-by: Dima Chumak <dchumak@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
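      A sketch of the per-tunnel comparison idea, using a hypothetical flattened key (`struct model_encap_key`) rather than the real mlx5e structures: the generic check compares only tunnel fields, while the Geneve variant extends it with the TLV options, so two keys that differ only in their options no longer compare equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified encap key; not the mlx5e structures. */
struct model_encap_key {
	uint32_t dst_ip;
	uint16_t dst_port;
	uint32_t tun_id;
	uint8_t opts_len;      /* bytes of Geneve TLV options in use */
	uint8_t opts[32];
};

/* Generic comparison: tunnel fields only, options ignored. */
static bool model_encap_info_equal_generic(const struct model_encap_key *a,
					   const struct model_encap_key *b)
{
	return a->dst_ip == b->dst_ip && a->dst_port == b->dst_port &&
	       a->tun_id == b->tun_id;
}

/* Geneve variant: extends the generic check with the TLV options. */
static bool model_encap_info_equal_geneve(const struct model_encap_key *a,
					  const struct model_encap_key *b)
{
	if (!model_encap_info_equal_generic(a, b) || a->opts_len != b->opts_len)
		return false;
	assert(a->opts_len <= sizeof(a->opts));
	return memcmp(a->opts, b->opts, a->opts_len) == 0;
}
```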
      net/mlx5: Don't request more than supported EQs · a7b76002
      Daniel Jurgens authored
      Calculating the number of completion EQs based on the number of
      available IRQ vectors doesn't work now that all async EQs share one IRQ.
      Thus the max number of EQs can be exceeded on systems with more than
      approximately 256 CPUs. Take this into account when calculating the
      number of available completion EQs.
      
      Fixes: 81bfa206 ("net/mlx5: Use a single IRQ for all async EQs")
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
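      A hypothetical sketch of a bounded calculation, assuming the shared async IRQ takes one vector and that the device limit `max_eqs` also counts the async EQs. The parameter names and the numbers in the checks are illustrative, not taken from mlx5.

```c
#include <assert.h>

/* Hypothetical helper: all async EQs share one IRQ vector, so the
 * remaining vectors could each drive a completion EQ, but the device's
 * EQ limit also counts the async EQs. */
static int model_num_comp_eqs(int irq_vecs, int max_eqs, int num_async_eqs)
{
	int by_irqs = irq_vecs - 1;            /* one vector is the shared async IRQ */
	int by_eqs = max_eqs - num_async_eqs;  /* EQs left under the device limit */

	assert(num_async_eqs >= 0 && num_async_eqs <= max_eqs);
	return by_irqs < by_eqs ? by_irqs : by_eqs;
}
```

      On a small system the IRQ vectors are the bound; with many CPUs the device EQ limit takes over.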
      net/mlx5e: kTLS, Fix RX counters atomicity · 6f4fdd53
      Tariq Toukan authored
      Some TLS RX counters increment per socket/connection, and are not
      protected against parallel modifications from several cores.
      Switch them to atomic counters by taking them out of the RQ stats into
      the global atomic TLS stats.
      
      In this patch, we touch 'rx_tls_ctx/del', which count the number of
      device-offloaded RX TLS connections added/deleted.
      These counters are updated in the add/del callbacks, out of the fast
      data-path.
      
      This change is not needed for counters that increment only in NAPI
      context, as they are protected by the NAPI mechanism.
      Keep them as tls_* counters under 'struct mlx5e_rq_stats'.
      
      Fixes: 76c1e1ac ("net/mlx5e: kTLS, Add kTLS RX stats")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
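      A userspace sketch of the split, using C11 `<stdatomic.h>` as a stand-in for the kernel's own atomic types; the struct, field, and function names are hypothetical. Relaxed ordering is assumed to be enough because these are pure statistics: no increment may be lost, but nothing else is ordered against them.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared by all connections; add/del callbacks may run on any CPU. */
struct model_tls_sw_stats {
	atomic_ulong rx_tls_ctx;
	atomic_ulong rx_tls_del;
};

/* Per-RQ counters are only updated from that RQ's NAPI context, which
 * already serialises them, so plain integers are enough. */
struct model_rq_stats {
	unsigned long tls_decrypted_packets;
};

static void model_ktls_add_rx(struct model_tls_sw_stats *s)
{
	atomic_fetch_add_explicit(&s->rx_tls_ctx, 1, memory_order_relaxed);
}

static void model_ktls_del_rx(struct model_tls_sw_stats *s)
{
	atomic_fetch_add_explicit(&s->rx_tls_del, 1, memory_order_relaxed);
}

static void model_napi_count_decrypted(struct model_rq_stats *rq)
{
	rq->tls_decrypted_packets++;
}
```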
      net/mlx5e: kTLS, Fix TX counters atomicity · a51bce96
      Tariq Toukan authored
      Some TLS TX counters increment per socket/connection, and are not
      protected against parallel modifications from several cores.
      Switch them to atomic counters by taking them out of the SQ stats into
      the global atomic TLS stats.
      
      In this patch, we touch a single counter 'tx_tls_ctx' that counts the
      number of device-offloaded TX TLS connections added.
      Now that this counter can be increased without having the SQ
      context in hand, move it to the mlx5e_ktls_add_tx() callback where it
      really belongs, out of the fast data-path.
      
      This change is not needed for counters that increment only in NAPI
      context or under the TX lock, as they are already protected.
      Keep them as tls_* counters under 'struct mlx5e_sq_stats'.
      
      Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      net/mlx5: E-switch, Create vport miss group only if src rewrite is supported · e929e3da
      Maor Dickman authored
      The send-to-vport miss group was added to support traffic
      recirculation to the root table with metadata source rewrite.
      However, this group is also created when source rewrite isn't supported.

      Fix this by creating the send-to-vport miss group only if source
      rewrite is supported by FW.
      
      Fixes: 8e404fef ("net/mlx5e: Match recirculated packet miss in slow table using reg_c1")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      net/mlx5e: Fix ethtool indication of connector type · 3211434d
      Aya Levin authored
      Use the connector_type read from the PTYS register when it is valid,
      based on the corresponding capability bit.
      
      Fixes: 5b4793f8 ("net/mlx5e: Add support for reading connector type from PTYS")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      net/mlx5: Delete auxiliary bus driver eth-rep first · 1f90aedf
      Maor Dickman authored
      The auxiliary bus driver deletion flow deletes the eth driver first
      and then the eth-rep driver, but the eth-rep devices' resources
      depend on the eth device.

      Fix this by changing the deletion order of the auxiliary bus drivers
      to delete the eth-rep driver first and the eth driver after it.
      
      Fixes: 601c10c8 ("net/mlx5: Delete custom device management logic")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
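      A sketch of the dependency rule, with hypothetical names: tearing down in the reverse of registration order removes eth-rep before the eth driver it depends on. The commit itself only states the eth-rep-first order; the general reverse-order rule is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: drivers in registration order. eth-rep devices use
 * resources owned by the eth device, so eth-rep must be removed first. */
static const char *const model_drivers[] = { "eth", "eth-rep" };
#define MODEL_NUM_DRIVERS (sizeof(model_drivers) / sizeof(model_drivers[0]))

/* Fill `out` with the teardown order: the reverse of registration. */
static void model_teardown_order(const char *out[])
{
	for (size_t i = 0; i < MODEL_NUM_DRIVERS; i++)
		out[i] = model_drivers[MODEL_NUM_DRIVERS - 1 - i];
}
```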
      net/mlx5e: Fix mapping of ct_label zero · d24f847e
      Ariel Levkovich authored
      ct_label 0 is the default label of every flow, and therefore
      there can be rules that match on ct_label=0 without a prior
      rule that set the ct_label to this value.
      
      The ct_label value is not used directly in the HW rules and
      instead it is mapped to some id within a defined range and this
      id is used to set and match the metadata register which carries
      the ct_label.
      
      If we have a rule that matches on ct_label=0, the hw rule will
      perform matching on a value that is != 0 because of the mapping
      from label to id. Since the metadata register default value is
      0 and it was never set before to anything else by an action that
      sets the ct_label, there will always be a mismatch between that
      register and the value in the rule.
      
      To support such rules, ct_label 0 is forcibly mapped to id 0
      so that it matches the metadata register's default value of 0.
      
      Fixes: 54b154ec ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
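      A minimal sketch of the forced mapping, assuming a small linear table (hypothetical `model_label_map`) in place of the driver's mapping infrastructure: an all-zero 128-bit label always maps to id 0, and every other label gets an id starting at 1, so a match on ct_label=0 agrees with the register's default value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_LABELS 8

/* Hypothetical label->id mapper; a 128-bit ct_label is four u32 words. */
struct model_label_map {
	uint32_t labels[MODEL_MAX_LABELS][4];
	int count;   /* entries in use; the id of entry i is i + 1 */
};

/* Returns the id for `label`, allocating one if needed; -1 if full. */
static int model_label_to_id(struct model_label_map *m, const uint32_t label[4])
{
	static const uint32_t zero[4];

	/* Force label 0 to id 0 so a match on ct_label=0 agrees with the
	 * metadata register's default value of 0. */
	if (memcmp(label, zero, sizeof(zero)) == 0)
		return 0;

	for (int i = 0; i < m->count; i++)
		if (memcmp(m->labels[i], label, sizeof(zero)) == 0)
			return i + 1;

	if (m->count == MODEL_MAX_LABELS)
		return -1;
	memcpy(m->labels[m->count], label, sizeof(zero));
	return ++m->count;
}
```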
      net: ensure mac header is set in virtio_net_hdr_to_skb() · 61431a59
      Eric Dumazet authored
      Commit 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      added a call to dev_parse_header_protocol() but mac_header is not yet set.
      
      This means that eth_hdr() reads complete garbage, and syzbot complained about it [1].

      This patch resets mac_header earlier, to get more coverage of this change.
      
      Audit of virtio_net_hdr_to_skb() callers shows that this change should be safe.
      
      [1]
      
      BUG: KASAN: use-after-free in eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282
      Read of size 2 at addr ffff888017a6200b by task syz-executor313/8409
      
      CPU: 1 PID: 8409 Comm: syz-executor313 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
       __kasan_report mm/kasan/report.c:399 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
       eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282
       dev_parse_header_protocol include/linux/netdevice.h:3177 [inline]
       virtio_net_hdr_to_skb.constprop.0+0x99d/0xcd0 include/linux/virtio_net.h:83
       packet_snd net/packet/af_packet.c:2994 [inline]
       packet_sendmsg+0x2325/0x52b0 net/packet/af_packet.c:3031
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       sock_no_sendpage+0xf3/0x130 net/core/sock.c:2860
       kernel_sendpage.part.0+0x1ab/0x350 net/socket.c:3631
       kernel_sendpage net/socket.c:3628 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:947
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1940 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      
      Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Balazs Nemeth <bnemeth@redhat.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: phy: broadcom: Only advertise EEE for supported modes · c056d480
      Florian Fainelli authored
      We should not be advertising EEE for modes that we do not support;
      correct that oversight by looking at the PHY device's supported link modes.
      
      Fixes: 99cec8a4 ("net: phy: broadcom: Allow enabling or disabling of EEE")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
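      A sketch of the masking idea with hypothetical bit values (not the kernel's link-mode numbering): the advertised EEE set is the intersection of the EEE-capable modes and the modes this PHY supports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link-mode bits; not the kernel's ETHTOOL_LINK_MODE_* numbering. */
#define MODEL_MODE_100BASET_FULL  (1u << 0)
#define MODEL_MODE_1000BASET_FULL (1u << 1)
#define MODEL_MODE_10GBASET_FULL  (1u << 2)

/* Advertise EEE only for modes that are both EEE-capable and supported
 * by this PHY instance. */
static uint32_t model_eee_advertise(uint32_t eee_modes, uint32_t phy_supported)
{
	return eee_modes & phy_supported;
}
```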
  2. 30 Mar, 2021 7 commits
      nfp: flower: ignore duplicate merge hints from FW · 2ea538db
      Yinjun Zhang authored
      A merge hint message needs some time to process before the merged
      flow actually reaches the firmware, during which we may get duplicate
      merge hints if there're more than one packet that hit the pre-merged
      flow. And processing duplicate merge hints will cost extra host_ctx's
      which are a limited resource.
      
      Avoid the duplicate merge by using hash table to store the sub_flows
      to be merged.
      
      Fixes: 8af56f40 ("nfp: flower: offload merge flows")
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
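      A hypothetical sketch of the deduplication, using a fixed-size open-addressing set keyed by the pair of sub-flow cookies instead of the driver's real hash table; removing an entry once its merge completes is omitted. A repeated hint finds its key already present and is ignored, so it never consumes another host_ctx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MERGE_SLOTS 64   /* must be a power of two */

/* Hypothetical pending-merge set keyed by the two sub-flow cookies. */
struct model_merge_table {
	uint64_t key[MODEL_MERGE_SLOTS][2];
	bool used[MODEL_MERGE_SLOTS];
};

static unsigned int model_hash(uint64_t a, uint64_t b)
{
	uint64_t h = a * 0x9E3779B97F4A7C15ull ^ b;
	return (unsigned int)(h ^ (h >> 32)) & (MODEL_MERGE_SLOTS - 1);
}

/* Returns true if the hint is new and was recorded, false if it is a
 * duplicate (or the table is full); the caller skips the hint on false. */
static bool model_merge_hint_insert(struct model_merge_table *t, uint64_t a, uint64_t b)
{
	unsigned int i = model_hash(a, b);

	for (unsigned int n = 0; n < MODEL_MERGE_SLOTS;
	     n++, i = (i + 1) & (MODEL_MERGE_SLOTS - 1)) {
		if (!t->used[i]) {           /* empty slot: record the new pair */
			t->used[i] = true;
			t->key[i][0] = a;
			t->key[i][1] = b;
			return true;
		}
		if (t->key[i][0] == a && t->key[i][1] == b)
			return false;            /* duplicate hint */
	}
	return false;
}
```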
      net: let skb_orphan_partial wake-up waiters. · 9adc89af
      Paolo Abeni authored
      Currently the mentioned helper can end up freeing the socket wmem
      without waking up any processes waiting for more write memory.

      If the partially orphaned skb is attached to a UDP (or raw) socket,
      the lack of wake-up can hang user space.

      Even for TCP sockets, not calling the sk destructor could have bad
      effects on TSQ.
      
      Address the issue using skb_orphan to release the sk wmem before
      setting the new sock_efree destructor. Additionally bundle the
      whole ownership update in a new helper, so that later other
      potential users could avoid duplicate code.
      
      v1 -> v2:
       - use skb_orphan() instead of sort of open coding it (Eric)
       - provide a helper for the ownership change (Eric)
      
      Fixes: f6ba8d33 ("netem: fix skb_orphan_partial()")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
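      A heavily simplified model of the ownership change, with hypothetical `model_*` types: the old destructor runs first (uncharging wmem and waking writers, which is what skb_orphan() triggers), and only then is the skb re-owned with a plain socket reference that an efree-style destructor drops. It sketches the ordering only and is not the kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified socket and buffer. */
struct model_sock {
	int refcnt;
	int wmem_alloc;     /* bytes charged to the socket's send budget */
	int wakeups;        /* how many times writers were woken */
};

struct model_skb {
	struct model_sock *sk;
	int truesize;
	void (*destructor)(struct model_skb *skb);
};

/* Write-space destructor: uncharge wmem and wake writers. */
static void model_sock_wfree(struct model_skb *skb)
{
	skb->sk->wmem_alloc -= skb->truesize;
	skb->sk->wakeups++;
}

/* efree-style destructor: only drops the socket reference. */
static void model_sock_efree(struct model_skb *skb)
{
	skb->sk->refcnt--;
}

/* Run the current destructor (which may wake waiters), then detach. */
static void model_skb_orphan(struct model_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Bundled ownership change: release wmem through the old destructor
 * first, then keep the socket alive with a plain reference. */
static void model_skb_change_owner(struct model_skb *skb, struct model_sock *sk)
{
	model_skb_orphan(skb);
	if (sk->refcnt > 0) {
		sk->refcnt++;
		skb->sk = sk;
		skb->destructor = model_sock_efree;
	}
}
```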
      sch_htb: fix null pointer dereference on a null new_q · ae81feb7
      Yunjian Wang authored
      Currently, if new_q is NULL, the NULL new_q pointer will be
      dereferenced when 'q->offload' is true. Fix this by adding braces
      around htb_parent_to_leaf_offload() so that it is only called when
      new_q is not NULL.
      
      Addresses-Coverity: ("Dereference after null check")
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
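      A sketch of the bug class with hypothetical helpers. The exact buggy statement order is an assumption drawn from the description: without braces, only the first statement after `if (new_q)` is guarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; both helpers dereference their argument. */
struct model_qdisc {
	int lockdep_set;
	int attached;
};

static void model_set_lockdep_class(struct model_qdisc *q) { q->lockdep_set = 1; }
static void model_parent_to_leaf(struct model_qdisc *q) { q->attached = 1; }

/* Assumed buggy shape: without braces only the first call is guarded.
 *
 *	if (new_q)
 *		model_set_lockdep_class(new_q);
 *	model_parent_to_leaf(new_q);     // dereferences NULL
 *
 * Fixed shape: braces put both calls under the NULL check. */
static void model_delete_last_child(int offload, struct model_qdisc *new_q)
{
	if (offload) {
		if (new_q) {
			model_set_lockdep_class(new_q);
			model_parent_to_leaf(new_q);
		}
	}
}
```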
      net: qrtr: Fix memory leak on qrtr_tx_wait failure · 8a03dd92
      Loic Poulain authored
      qrtr_tx_wait does not check for radix_tree_insert failure, leaving
      the 'flow' object unreferenced (and leaked) after qrtr_tx_wait
      returns. Fix that by releasing flow on radix_tree_insert failure.
      
      Fixes: 5fdeb0d3 ("net: qrtr: Implement outgoing flow control")
      Reported-by: syzbot+739016799a89c530b32a@syzkaller.appspotmail.com
      Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
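      A hypothetical userspace model of the fixed allocation path, with a small array standing in for the radix tree: when insertion fails, nothing else references the new flow, so it is freed and NULL is returned instead of leaking it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_flow { int pending; };

#define MODEL_TREE_SLOTS 4
/* Hypothetical stand-in for the radix tree: keys past the end fail with
 * -ENOMEM, occupied keys with -EEXIST. */
static struct model_flow *model_tree[MODEL_TREE_SLOTS];

static int model_tree_insert(unsigned int key, struct model_flow *f)
{
	if (key >= MODEL_TREE_SLOTS)
		return -ENOMEM;
	if (model_tree[key])
		return -EEXIST;
	model_tree[key] = f;
	return 0;
}

static struct model_flow *model_get_flow(unsigned int key)
{
	struct model_flow *flow = key < MODEL_TREE_SLOTS ? model_tree[key] : NULL;

	if (!flow) {
		flow = calloc(1, sizeof(*flow));
		if (flow && model_tree_insert(key, flow)) {
			/* Insert failed: nothing references flow, so free it. */
			free(flow);
			flow = NULL;
		}
	}
	return flow;
}
```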
      net: sched: bump refcount for new action in ACT replace mode · 6855e821
      Kumar Kartikeya Dwivedi authored
      Currently, action creation using the ACT API in replace mode is buggy.
      When invoked for the non-existent action index 42,
      
      	tc action replace action bpf obj foo.o sec <xyz> index 42
      
      the kernel creates the action, fills up the netlink response, and then
      just deletes the action after notifying userspace.
      
      	tc action show action bpf
      
      doesn't list the action.
      
      This happens due to the following sequence when ovr = 1 (replace mode)
      is enabled:
      
      tcf_idr_check_alloc is used to atomically check and either obtain a
      reference for an existing action at the index, or reserve the index
      slot using a dummy entry (ERR_PTR(-EBUSY)).

      This is necessary because pointers to these actions are held after
      the idrinfo lock is dropped: we still need to insert the actions and
      notify userspace by dumping their attributes. Finally, we drop the
      reference we took using the tcf_action_put_many call in
      tcf_action_add. However, when a new action is created because the
      index is free, its refcount is only one. Paired with the put_many
      call, this leads to the kernel setting up the action, notifying
      userspace of its creation, and then tearing it down. Existing actions
      still hold their original reference, so they remain unaffected.
      
      Fortunately, due to the rtnl_lock serialization requirement, such an
      action with refcount == 1 will not be concurrently deleted by anything
      else; at most, the CLS API can move its refcount up and down by binding
      to it after it has been published from tcf_idr_insert_many. Since the
      refcount is at least one until the put_many call, the CLS API cannot
      delete it. Also, the __tcf_action_put release path already ensures a
      deterministic outcome (either a new action will be created or the
      existing action will be reused if the CLS API tries to bind to the
      action concurrently) due to idr lock serialization.
      
      We fix this by setting the refcount of newly created actions to 2 in
      ACT API replace mode. A relaxed store suffices, as the action only
      becomes visible after the tcf_idr_insert_many call.
      
      Note that in case of creation or overwriting using CLS API only (i.e.
      bind = 1), overwriting existing action object is not allowed, and any
      such request is silently ignored (without error).
      
      For an existing action, the refcount bump in the tcf_idr_check_alloc
      call there pairs with the tcf_exts_destroy call made from the owner
      module for the same action. In case of action creation, there is no
      existing action, so no tcf_exts_destroy callback happens.
      
      This means no code changes for CLS API.
      
      Fixes: cae422f3 ("net: sched: use reference counting action init")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
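      A toy model of the reference counting in replace mode, with hypothetical names: without the fix, a newly created action starts at 1 and the final put destroys it; starting it at 2 leaves one reference after the put, the same as for an existing action.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action object; only the reference count matters here. */
struct model_action {
	int refcnt;
	bool alive;
};

static void model_action_put(struct model_action *a)
{
	if (--a->refcnt == 0)
		a->alive = false;              /* action torn down */
}

/* Simplified ACT API replace-mode flow for one action index:
 * look up or create, insert + notify, then drop the temporary reference. */
static void model_act_replace(struct model_action *a, bool exists, bool with_fix)
{
	if (exists) {
		a->refcnt++;                   /* reference taken for the existing action */
	} else {
		a->alive = true;
		a->refcnt = with_fix ? 2 : 1;  /* the fix: pre-pay the put below */
	}
	/* ... insert into the IDR and notify userspace ... */
	model_action_put(a);               /* the put_many step in tcf_action_add */
}
```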
      net/ncsi: Avoid channel_monitor hrtimer deadlock · 03cb4d05
      Milton Miller authored
      Calling ncsi_stop_channel_monitor from channel_monitor is a guaranteed
      deadlock on SMP because stop calls del_timer_sync on the timer that
      invoked channel_monitor as its timer function.
      
      Recognise the inherent race of marking the monitor disabled before
      deleting the timer by just returning if enabled was cleared. After
      a timeout (the default case; the state is reset to START when a
      response is received), just set monitor.enabled to false.
      
      If the channel has an entry on the channel_queue list, or if the
      state is not ACTIVE or INACTIVE, then warn and mark the timer stopped
      and don't restart, as the locking is broken somehow.
      
      Fixes: 0795fb20 ("net/ncsi: Stop monitor if channel times out or is inactive")
      Signed-off-by: Milton Miller <miltonm@us.ibm.com>
      Signed-off-by: Eddie James <eajames@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
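      A hypothetical, timer-free model of the callback logic: it never stops its own timer synchronously, it returns early if the monitor was already disabled, and on timeout it only clears `enabled` and does not re-arm. The threshold is illustrative, and the warning path for a queued or non-ACTIVE/INACTIVE channel is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical monitor state; the threshold is illustrative. */
#define MODEL_MONITOR_START   0
#define MODEL_MONITOR_TIMEOUT 3

struct model_monitor {
	bool enabled;
	int state;
	bool timer_armed;
};

/* Timer callback. It must not call a "stop" helper that synchronously
 * deletes this same timer: that would wait for the running callback,
 * i.e. itself, to finish. It only updates state and decides on re-arming. */
static void model_channel_monitor(struct model_monitor *m)
{
	m->timer_armed = false;            /* this expiry has fired */
	if (!m->enabled)
		return;                        /* lost the race with a stop: just return */

	if (++m->state >= MODEL_MONITOR_TIMEOUT) {
		m->enabled = false;            /* timed out: mark disabled, no re-arm */
		return;
	}
	m->timer_armed = true;             /* poll again later */
}

/* A response from the channel resets the monitor state. */
static void model_monitor_response(struct model_monitor *m)
{
	m->state = MODEL_MONITOR_START;
}
```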
      ethernet/netronome/nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx · 6e5a03bc
      Lv Yunlong authored
      In nfp_bpf_ctrl_msg_rx, if
      nfp_ccm_get_type(skb) == NFP_CCM_TYPE_BPF_BPF_EVENT is true, the skb
      is freed, but it is still used afterwards by nfp_ccm_rx(&bpf->ccm, skb).

      This patch adds a return after the skb is freed.
      
      Fixes: bcf0cafa ("nfp: split out common control message handling code")
      Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
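      A userspace sketch of the control flow, with hypothetical types and handlers: once the event path has consumed (freed) the skb, the function returns instead of falling through to the generic receive path.

```c
#include <assert.h>
#include <stdlib.h>

enum { MODEL_TYPE_BPF_EVENT = 1, MODEL_TYPE_REPLY = 2 };

struct model_skb { int type; };

static int model_event_count, model_rx_count;

/* Hypothetical event handler; it consumes the skb. */
static void model_handle_event(struct model_skb *skb)
{
	model_event_count++;
	free(skb);
}

/* Hypothetical generic receive path; it also consumes the skb. */
static void model_ccm_rx(struct model_skb *skb)
{
	model_rx_count++;
	free(skb);
}

static void model_ctrl_msg_rx(struct model_skb *skb)
{
	if (skb->type == MODEL_TYPE_BPF_EVENT) {
		model_handle_event(skb);
		return;              /* the fix: skb is gone, don't touch it again */
	}
	model_ccm_rx(skb);
}
```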
  3. 29 Mar, 2021 22 commits