1. 05 Apr, 2021 2 commits
    • Maciej Żenczykowski's avatar
      net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind() · 630e4576
      Maciej Żenczykowski authored
      Found by virtue of ipv6 raw sockets not honouring the per-socket
      IP{,V6}_FREEBIND setting.
      
      Based on hits found via:
        git grep '[.]ip_nonlocal_bind'
      We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT,
      and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured
      FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl.
      
      The helper is defined as:
        static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) {
          return net->ipv6.sysctl.ip_nonlocal_bind || inet->freebind || inet->transparent;
        }
      so this change only widens the accepted opt-outs and is thus a clean bugfix.
      
      I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug,
      but IMHO this should be applied to stable kernels as far back as possible.
      As such I'm adding a 'fixes' tag with the commit that originally added the helper,
      which happened in 4.19.  Backporting to older LTS kernels (at least 4.9 and 4.14)
      would presumably require open-coding it or backporting the helper as well.
      
      Other possibly relevant commits:
        v4.18-rc6-1502-g83ba4645 net: add helpers checking if socket can be bound to nonlocal address
        v4.18-rc6-1431-gd0c1f011 net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind
        v4.14-rc5-271-gb71d21c2 sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND
        v4.7-rc7-1883-g9b974202 sctp: support ipv6 nonlocal bind
        v4.1-12247-g35a256fe ipv6: Nonlocal bind
      
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Fixes: 83ba4645 ("net: add helpers checking if socket can be bound to nonlocal address")
      Signed-off-by: default avatarMaciej Żenczykowski <maze@google.com>
      Reviewed-By: default avatarLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      630e4576
    • Ilya Maximets's avatar
      openvswitch: fix send of uninitialized stack memory in ct limit reply · 4d51419d
      Ilya Maximets authored
      'struct ovs_zone_limit' has more members than initialized in
      ovs_ct_limit_get_default_limit().  The rest of the memory is a random
      kernel stack content that ends up being sent to userspace.
      
      Fix that by using designated initializer that will clear all
      non-specified fields.
      
      Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
      Signed-off-by: default avatarIlya Maximets <i.maximets@ovn.org>
      Acked-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d51419d
  2. 02 Apr, 2021 3 commits
  3. 01 Apr, 2021 17 commits
  4. 31 Mar, 2021 17 commits
    • Ong Boon Leong's avatar
      xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model · 622d1369
      Ong Boon Leong authored
      xdp_return_frame() may be called outside of NAPI context to return
      xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with
      napi_direct = false. For page_pool memory model, __xdp_return() calls
      xdp_return_frame_no_direct() unconditionally and below false negative
      kernel BUG throw happened under preempt-rt build:
      
      [  430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884
      [  430.451678] caller is __xdp_return+0x1ff/0x2e0
      [  430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G     U      E     5.12.0-rc2+ #45
      
      Changes in v2:
       - This patch fixes the issue by making xdp_return_frame_no_direct() is
         only called if napi_direct = true, as recommended for better by
         Jesper Dangaard Brouer. Thanks!
      
      Fixes: 2539650f ("xdp: Helpers for disabling napi_direct of xdp_return_frame")
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      622d1369
    • Eric Dumazet's avatar
      Revert "net: correct sk_acceptq_is_full()" · c609e6aa
      Eric Dumazet authored
      This reverts commit f211ac15.
      
      We had similar attempt in the past, and we reverted it.
      
      History:
      
      64a14651 [NET]: Revert incorrect accept queue backlog changes.
      8488df89 [NET]: Fix bugs in "Whether sock accept queue is full" checking
      
      I am adding a fat comment so that future attempts will
      be much harder.
      
      Fixes: f211ac15 ("net: correct sk_acceptq_is_full()")
      Cc: iuyacan <yacanliu@163.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c609e6aa
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2021-03-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 9dc22c0d
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-03-31
      
      This series introduces some fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9dc22c0d
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · c9170f13
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2021-03-31
      
      1) Fix ipv4 pmtu checks for xfrm anf vti interfaces.
         From Eyal Birger.
      
      2) There are situations where the socket passed to
         xfrm_output_resume() is not the same as the one
         attached to the skb. Use the socket passed to
         xfrm_output_resume() to avoid lookup failures
         when xfrm is used with VRFs.
         From Evan Nimmo.
      
      3) Make the xfrm_state_hash_generation sequence counter per
         network namespace because but its write serialization
         lock is also per network namespace. Write protection
         is insufficient otherwise.
         From Ahmed S. Darwish.
      
      4) Fixup sctp featue flags when used with esp offload.
         From Xin Long.
      
      5) xfrm BEET mode doesn't support fragments for inner packets.
         This is a limitation of the protocol, so no fix possible.
         Warn at least to notify the user about that situation.
         From Xin Long.
      
      6) Fix NULL pointer dereference on policy lookup when
         namespaces are uses in combination with esp offload.
      
      7) Fix incorrect transformation on esp offload when
         packets get segmented at layer 3.
      
      8) Fix some user triggered usages of WARN_ONCE in
         the xfrm compat layer.
         From Dmitry Safonov.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9170f13
    • Lv Yunlong's avatar
      net/rds: Fix a use after free in rds_message_map_pages · bdc2ab5c
      Lv Yunlong authored
      In rds_message_map_pages, the rm is freed by rds_message_put(rm).
      But rm is still used by rm->data.op_sg in return value.
      
      My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is
      freed to avoid the uaf.
      
      Fixes: 7dba9203 ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()")
      Signed-off-by: default avatarLv Yunlong <lyl2019@mail.ustc.edu.cn>
      Reviewed-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bdc2ab5c
    • Tong Zhu's avatar
      neighbour: Disregard DEAD dst in neigh_update · d47ec7a0
      Tong Zhu authored
      After a short network outage, the dst_entry is timed out and put
      in DST_OBSOLETE_DEAD. We are in this code because arp reply comes
      from this neighbour after network recovers. There is a potential
      race condition that dst_entry is still in DST_OBSOLETE_DEAD.
      With that, another neighbour lookup causes more harm than good.
      
      In best case all packets in arp_queue are lost. This is
      counterproductive to the original goal of finding a better path
      for those packets.
      
      I observed a worst case with 4.x kernel where a dst_entry in
      DST_OBSOLETE_DEAD state is associated with loopback net_device.
      It leads to an ethernet header with all zero addresses.
      A packet with all zero source MAC address is quite deadly with
      mac80211, ath9k and 802.11 block ack.  It fails
      ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx
      queue (ath_tx_complete_aggr). BAW (block ack window) is not
      updated. BAW logic is damaged and ath9k transmission is disabled.
      Signed-off-by: default avatarTong Zhu <zhutong@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d47ec7a0
    • Tariq Toukan's avatar
      net/mlx5e: Guarantee room for XSK wakeup NOP on async ICOSQ · 3ff3874f
      Tariq Toukan authored
      XSK wakeup flow triggers an IRQ by posting a NOP WQE and hitting
      the doorbell on the async ICOSQ.
      It maintains its state so that it doesn't issue another NOP WQE
      if it has an outstanding one already.
      
      For this flow to work properly, the NOP post must not fail.
      Make sure to reserve room for the NOP WQE in all WQE posts to the
      async ICOSQ.
      
      Fixes: 8d94b590 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
      Fixes: 1182f365 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
      Fixes: 0419d8c9 ("net/mlx5e: kTLS, Add kTLS RX resync support")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      3ff3874f
    • Dima Chumak's avatar
      net/mlx5e: Consider geneve_opts for encap contexts · 929a2fad
      Dima Chumak authored
      Current algorithm for encap keys is legacy from initial vxlan
      implementation and doesn't take into account all possible fields of a
      tunnel. For example, for a Geneve tunnel, which may have additional TLV
      options, they are ignored when comparing encap keys and a rule can be
      attached to an incorrect encap entry.
      
      Fix that by introducing encap_info_equal() operation in
      struct mlx5e_tc_tunnel. Geneve tunnel type uses custom implementation,
      which extends generic algorithm and considers options if they are set.
      
      Fixes: 7f1a546e ("net/mlx5e: Consider tunnel type for encap contexts")
      Signed-off-by: default avatarDima Chumak <dchumak@nvidia.com>
      Reviewed-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      929a2fad
    • Daniel Jurgens's avatar
      net/mlx5: Don't request more than supported EQs · a7b76002
      Daniel Jurgens authored
      Calculating the number of compeltion EQs based on the number of
      available IRQ vectors doesn't work now that all async EQs share one IRQ.
      Thus the max number of EQs can be exceeded on systems with more than
      approximately 256 CPUs. Take this into account when calculating the
      number of available completion EQs.
      
      Fixes: 81bfa206 ("net/mlx5: Use a single IRQ for all async EQs")
      Signed-off-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Reviewed-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a7b76002
    • Tariq Toukan's avatar
      net/mlx5e: kTLS, Fix RX counters atomicity · 6f4fdd53
      Tariq Toukan authored
      Some TLS RX counters increment per socket/connection, and are not
      protected against parallel modifications from several cores.
      Switch them to atomic counters by taking them out of the RQ stats into
      the global atomic TLS stats.
      
      In this patch, we touch 'rx_tls_ctx/del' that count the number of
      device-offloaded RX TLS connections added/deleted.
      These counters are updated in the add/del callbacks, out of the fast
      data-path.
      
      This change is not needed for counters that increment only in NAPI
      context, as they are protected by the NAPI mechanism.
      Keep them as tls_* counters under 'struct mlx5e_rq_stats'.
      
      Fixes: 76c1e1ac ("net/mlx5e: kTLS, Add kTLS RX stats")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      6f4fdd53
    • Tariq Toukan's avatar
      net/mlx5e: kTLS, Fix TX counters atomicity · a51bce96
      Tariq Toukan authored
      Some TLS TX counters increment per socket/connection, and are not
      protected against parallel modifications from several cores.
      Switch them to atomic counters by taking them out of the SQ stats into
      the global atomic TLS stats.
      
      In this patch, we touch a single counter 'tx_tls_ctx' that counts the
      number of device-offloaded TX TLS connections added.
      Now that this counter can be increased without the for having the SQ
      context in hand, move it to the mlx5e_ktls_add_tx() callback where it
      really belongs, out of the fast data-path.
      
      This change is not needed for counters that increment only in NAPI
      context or under the TX lock, as they are already protected.
      Keep them as tls_* counters under 'struct mlx5e_sq_stats'.
      
      Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a51bce96
    • Maor Dickman's avatar
      net/mlx5: E-switch, Create vport miss group only if src rewrite is supported · e929e3da
      Maor Dickman authored
      Create send to vport miss group was added in order to support traffic
      recirculation to root table with metadata source rewrite.
      This group is created also in case source rewrite isn't supported.
      
      Fixed by creating send to vport miss group only if source rewrite is
      supported by FW.
      
      Fixes: 8e404fef ("net/mlx5e: Match recirculated packet miss in slow table using reg_c1")
      Signed-off-by: default avatarMaor Dickman <maord@nvidia.com>
      Reviewed-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      e929e3da
    • Aya Levin's avatar
      net/mlx5e: Fix ethtool indication of connector type · 3211434d
      Aya Levin authored
      Use connector_type read from PTYS register when it's valid, based on
      corresponding capability bit.
      
      Fixes: 5b4793f8 ("net/mlx5e: Add support for reading connector type from PTYS")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarEran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      3211434d
    • Maor Dickman's avatar
      net/mlx5: Delete auxiliary bus driver eth-rep first · 1f90aedf
      Maor Dickman authored
      Delete auxiliary bus drivers flow deletes the eth driver
      first and then the eth-reps driver but eth-reps devices resources
      are depend on eth device.
      
      Fixed by changing the delete order of auxiliary bus drivers to delete
      the eth-rep driver first and after it the eth driver.
      
      Fixes: 601c10c8 ("net/mlx5: Delete custom device management logic")
      Signed-off-by: default avatarMaor Dickman <maord@nvidia.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      1f90aedf
    • Ariel Levkovich's avatar
      net/mlx5e: Fix mapping of ct_label zero · d24f847e
      Ariel Levkovich authored
      ct_label 0 is a default label each flow has and therefore
      there can be rules that match on ct_label=0 without a prior
      rule that set the ct_label to this value.
      
      The ct_label value is not used directly in the HW rules and
      instead it is mapped to some id within a defined range and this
      id is used to set and match the metadata register which carries
      the ct_label.
      
      If we have a rule that matches on ct_label=0, the hw rule will
      perform matching on a value that is != 0 because of the mapping
      from label to id. Since the metadata register default value is
      0 and it was never set before to anything else by an action that
      sets the ct_label, there will always be a mismatch between that
      register and the value in the rule.
      
      To support such rule, a forced mapping of ct_label 0 to id=0
      is done so that it will match the metadata register default
      value of 0.
      
      Fixes: 54b154ec ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
      Signed-off-by: default avatarAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      d24f847e
    • Eric Dumazet's avatar
      net: ensure mac header is set in virtio_net_hdr_to_skb() · 61431a59
      Eric Dumazet authored
      Commit 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      added a call to dev_parse_header_protocol() but mac_header is not yet set.
      
      This means that eth_hdr() reads complete garbage, and syzbot complained about it [1]
      
      This patch resets mac_header earlier, to get more coverage about this change.
      
      Audit of virtio_net_hdr_to_skb() callers shows that this change should be safe.
      
      [1]
      
      BUG: KASAN: use-after-free in eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282
      Read of size 2 at addr ffff888017a6200b by task syz-executor313/8409
      
      CPU: 1 PID: 8409 Comm: syz-executor313 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
       __kasan_report mm/kasan/report.c:399 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
       eth_header_parse_protocol+0xdc/0xe0 net/ethernet/eth.c:282
       dev_parse_header_protocol include/linux/netdevice.h:3177 [inline]
       virtio_net_hdr_to_skb.constprop.0+0x99d/0xcd0 include/linux/virtio_net.h:83
       packet_snd net/packet/af_packet.c:2994 [inline]
       packet_sendmsg+0x2325/0x52b0 net/packet/af_packet.c:3031
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       sock_no_sendpage+0xf3/0x130 net/core/sock.c:2860
       kernel_sendpage.part.0+0x1ab/0x350 net/socket.c:3631
       kernel_sendpage net/socket.c:3628 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:947
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1940 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      
      Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Balazs Nemeth <bnemeth@redhat.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61431a59
    • Florian Fainelli's avatar
      net: phy: broadcom: Only advertise EEE for supported modes · c056d480
      Florian Fainelli authored
      We should not be advertising EEE for modes that we do not support,
      correct that oversight by looking at the PHY device supported linkmodes.
      
      Fixes: 99cec8a4 ("net: phy: broadcom: Allow enabling or disabling of EEE")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c056d480
  5. 30 Mar, 2021 1 commit