1. 08 Jul, 2020 10 commits
    • Rahul Lakkireddy's avatar
      cxgb4: fix all-mask IP address comparison · 76c4d85c
      Rahul Lakkireddy authored
      Convert all-mask IP address to Big Endian, instead, for comparison.
      
      Fixes: f286dd8e ("cxgb4: use correct type for all-mask IP address comparison")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76c4d85c
    • Hamish Martin's avatar
      tipc: fix retransmission on unicast links · a34f8291
      Hamish Martin authored
      A scenario has been observed where a 'bc_init' message for a link is not
      retransmitted if it fails to be received by the peer. This leads to the
      peer never establishing the link fully and it discarding all other data
      received on the link. In this scenario the message is lost in transit to
      the peer.
      
      The issue is traced to the 'nxt_retr' field of the skb not being
      initialised for links that aren't a bc_sndlink. This leads to the
      comparison in tipc_link_advance_transmq() that gates whether to attempt
      retransmission of a message performing in an undesirable way.
      Depending on the relative value of 'jiffies', this comparison:
          time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)
      may return true or false given that 'nxt_retr' remains at the
      uninitialised value of 0 for non bc_sndlinks.
      
      This is most noticeable shortly after boot when jiffies is initialised
      to a high value (to flush out rollover bugs) and we compare a jiffies of,
      say, 4294940189 to zero. In that case time_before returns 'true' leading
      to the skb not being retransmitted.
      
      The fix is to ensure that all skbs have a valid 'nxt_retr' time set for
      them and this is achieved by refactoring the setting of this value into
      a central function.
      With this fix, transmission losses of 'bc_init' messages do not stall
      the link establishment forever because the 'bc_init' message is
      retransmitted and the link eventually establishes correctly.
      
      Fixes: 382f598f ("tipc: reduce duplicate packets for unicast traffic")
      Acked-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarHamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a34f8291
    • Xin Long's avatar
      l2tp: remove skb_dst_set() from l2tp_xmit_skb() · 27d53323
      Xin Long authored
      In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
      skb's dst. However, it will eventually call inet6_csk_xmit() or
      ip_queue_xmit() where skb's dst will be overwritten by:
      
         skb_dst_set_noref(skb, dst);
      
      without releasing the old dst in skb. Then it causes dst/dev refcnt leak:
      
        unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      This can be reproduced by simply running:
      
        # modprobe l2tp_eth && modprobe l2tp_ip
        # sh ./tools/testing/selftests/net/l2tp.sh
      
      So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
      should be dropped. This patch is to fix it by removing skb_dst_set()
      from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
      
      Fixes: 3557baab ("[L2TP]: PPP over L2TP driver core")
      Reported-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJames Chapman <jchapman@katalix.com>
      Tested-by: default avatarJames Chapman <jchapman@katalix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27d53323
    • David S. Miller's avatar
      Merge branch 'net-smc-fixes' · 1412bb2b
      David S. Miller authored
      Karsten Graul says:
      
      ====================
      net/smc: fixes 2020-07-08
      
      Please apply the following patch series for smc to netdev's net tree.
      
      The patches fix problems found during more testing of SMC
      functionality, resulting in hang conditions and unneeded link
      deactivations. The clc module was hardened to be prepared for
      possible future SMCD versions.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1412bb2b
    • Ursula Braun's avatar
      net/smc: tolerate future SMCD versions · fb4f7926
      Ursula Braun authored
      CLC proposal messages of future SMCD versions could be larger than SMCD
      V1 CLC proposal messages.
      To enable toleration in SMC V1 the receival of CLC proposal messages
      is adapted:
      * accept larger length values in CLC proposal
      * check trailing eye catcher for incoming CLC proposal with V1 length only
      * receive the whole CLC proposal even in cases it does not fit into the
        V1 buffer
      
      Fixes: e7b7a64a ("smc: support variable CLC proposal messages")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb4f7926
    • Ursula Braun's avatar
      net/smc: switch smcd_dev_list spinlock to mutex · 82087c03
      Ursula Braun authored
      The similar smc_ib_devices spinlock has been converted to a mutex.
      Protecting the smcd_dev_list by a mutex is possible as well. This
      patch converts the smcd_dev_list spinlock to a mutex.
      
      Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      82087c03
    • Ursula Braun's avatar
      net/smc: fix sleep bug in smc_pnet_find_roce_resource() · 92f3cb0e
      Ursula Braun authored
      Tests showed this BUG:
      [572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
      [572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
      [572555.252879] INFO: lockdep is turned off.
      [572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
      [572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
      [572555.252887] Call Trace:
      [572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
      [572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
      [572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
      [572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
      [572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
      [572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
      [572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
      [572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
      [572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
      [572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
      [572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
      [572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
      [572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
      [572555.253030] INFO: lockdep is turned off.
      
      Function smc_pnet_find_rdma_dev() might be called from
      smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
      spinlock while calling infiniband op get_netdev(). At least for mlx5
      the get_netdev operation wants mutex serialization, which conflicts
      with the smc_ib_devices spinlock.
      This patch switches the smc_ib_devices spinlock into a mutex to
      allow sleeping when calling get_netdev().
      
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      92f3cb0e
    • Karsten Graul's avatar
      net/smc: fix work request handling · b7eede75
      Karsten Graul authored
      Wait for pending sends only when smc_switch_conns() found a link to move
      the connections to. Do not wait during link freeing, this can lead to
      permanent hang situations. And refuse to provide a new tx slot on an
      unusable link.
      
      Fixes: c6f02ebe ("net/smc: switch connections to alternate link")
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b7eede75
    • Karsten Graul's avatar
      net/smc: separate LLC wait queues for flow and messages · 6778a6be
      Karsten Graul authored
      There might be races in scenarios where both SMC link groups are on the
      same system. Prevent that by creating separate wait queues for LLC flows
      and messages. Switch to non-interruptable versions of wait_event() and
      wake_up() for the llc flow waiter to make sure the waiters get control
      sequentially. Fine tune the llc_flow_lock to include the assignment of
      the message. Write to system log when an unexpected message was
      dropped. And remove an extra indirection and use the existing local
      variable lgr in smc_llc_enqueue().
      
      Fixes: 555da9af ("net/smc: add event-based llc_flow framework")
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6778a6be
    • Dmitry Bogdanov's avatar
      net: atlantic: fix ip dst and ipv6 address filters · a42e6aee
      Dmitry Bogdanov authored
      This patch fixes ip dst and ipv6 address filters.
      There were 2 mistakes in the code, which led to the issue:
      * invalid register was used for ipv4 dst address;
      * incorrect write order of dwords for ipv6 addresses.
      
      Fixes: 23e7a718 ("net: aquantia: add rx-flow filter definitions")
      Signed-off-by: default avatarDmitry Bogdanov <dbogdanov@marvell.com>
      Signed-off-by: default avatarMark Starovoytov <mstarovoitov@marvell.com>
      Signed-off-by: default avatarAlexander Lobakin <alobakin@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a42e6aee
  2. 07 Jul, 2020 18 commits
    • Shannon Nelson's avatar
      ionic: centralize queue reset code · 086c18f2
      Shannon Nelson authored
      The queue reset pattern is used in a couple different places,
      only slightly different from each other, and could cause
      issues if one gets changed and the other didn't.  This puts
      them together so that only one version is needed, yet each
      can have slighty different effects by passing in a pointer
      to a work function to do whatever configuration twiddling is
      needed in the middle of the reset.
      
      This specifically addresses issues seen where under loops
      of changing ring size or queue count parameters we could
      occasionally bump into the netdev watchdog.
      
      v2: added more commit message commentary
      
      Fixes: 4d03e00a ("ionic: Add initial ethtool support")
      Signed-off-by: default avatarShannon Nelson <snelson@pensando.io>
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      086c18f2
    • Toke Høiland-Jørgensen's avatar
      vlan: consolidate VLAN parsing code and limit max parsing depth · 469acedd
      Toke Høiland-Jørgensen authored
      Toshiaki pointed out that we now have two very similar functions to extract
      the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
      that the unbounded parsing loop makes it possible for maliciously crafted
      packets to loop through potentially hundreds of tags.
      
      Fix both of these issues by consolidating the two parsing functions and
      limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
      switch over __vlan_get_protocol() to use skb_header_pointer() instead of
      pskb_may_pull(), to avoid the possible side effects of the latter and keep
      the skb pointer 'const' through all the parsing functions.
      
      v2:
      - Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)
      Reported-by: default avatarToshiaki Makita <toshiaki.makita1@gmail.com>
      Reported-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Fixes: d7bf2ebe ("sched: consistently handle layer3 header accesses in the presence of VLANs")
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      469acedd
    • Alexander Lobakin's avatar
      net: qed: fix buffer overflow on ethtool -d · da328711
      Alexander Lobakin authored
      When generating debug dump, driver firstly collects all data in binary
      form, and then performs per-feature formatting to human-readable if it
      is supported.
      
      For ethtool -d, this is roughly incorrect for two reasons. First of all,
      drivers should always provide only original raw dumps to Ethtool without
      any changes.
      The second, and more critical, is that Ethtool's output buffer size is
      strictly determined by ethtool_ops::get_regs_len(), and all data *must*
      fit in it. The current version of driver always returns the size of raw
      data, but the size of the formatted buffer exceeds it in most cases.
      This leads to out-of-bound writes and memory corruption.
      
      Address both issues by adding an option to return original, non-formatted
      debug data, and using it for Ethtool case.
      
      v2:
       - Expand commit message to make it more clear;
       - No functional changes.
      
      Fixes: c965db44 ("qed: Add support for debug data collection")
      Signed-off-by: default avatarAlexander Lobakin <alobakin@marvell.com>
      Signed-off-by: default avatarIgor Russkikh <irusskikh@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      da328711
    • Linus Lüssing's avatar
      bridge: mcast: Fix MLD2 Report IPv6 payload length check · 5fc6266a
      Linus Lüssing authored
      Commit e57f6185 ("net: bridge: mcast: fix stale nsrcs pointer in
      igmp3/mld2 report handling") introduced a bug in the IPv6 header payload
      length check which would potentially lead to rejecting a valid MLD2 Report:
      
      The check needs to take into account the 2 bytes for the "Number of
      Sources" field in the "Multicast Address Record" before reading it.
      And not the size of a pointer to this field.
      
      Fixes: e57f6185 ("net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling")
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5fc6266a
    • Martin Varghese's avatar
      net: Added pointer check for dst->ops->neigh_lookup in dst_neigh_lookup_skb · 394de110
      Martin Varghese authored
      The packets from tunnel devices (eg bareudp) may have only
      metadata in the dst pointer of skb. Hence a pointer check of
      neigh_lookup is needed in dst_neigh_lookup_skb
      
      Kernel crashes when packets from bareudp device is processed in
      the kernel neighbour subsytem.
      
      [  133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  133.385240] #PF: supervisor instruction fetch in kernel mode
      [  133.385828] #PF: error_code(0x0010) - not-present page
      [  133.386603] PGD 0 P4D 0
      [  133.386875] Oops: 0010 [#1] SMP PTI
      [  133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G        W         5.8.0-rc2+ #15
      [  133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  133.391076] RIP: 0010:0x0
      [  133.392401] Code: Bad RIP value.
      [  133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
      [  133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
      [  133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
      [  133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
      [  133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
      [  133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
      [  133.401667] FS:  00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
      [  133.402412] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
      [  133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  133.404933] Call Trace:
      [  133.405169]  <IRQ>
      [  133.405367]  __neigh_update+0x5a4/0x8f0
      [  133.405734]  arp_process+0x294/0x820
      [  133.406076]  ? __netif_receive_skb_core+0x866/0xe70
      [  133.406557]  arp_rcv+0x129/0x1c0
      [  133.406882]  __netif_receive_skb_one_core+0x95/0xb0
      [  133.407340]  process_backlog+0xa7/0x150
      [  133.407705]  net_rx_action+0x2af/0x420
      [  133.408457]  __do_softirq+0xda/0x2a8
      [  133.408813]  asm_call_on_stack+0x12/0x20
      [  133.409290]  </IRQ>
      [  133.409519]  do_softirq_own_stack+0x39/0x50
      [  133.410036]  do_softirq+0x50/0x60
      [  133.410401]  __local_bh_enable_ip+0x50/0x60
      [  133.410871]  ip_finish_output2+0x195/0x530
      [  133.411288]  ip_output+0x72/0xf0
      [  133.411673]  ? __ip_finish_output+0x1f0/0x1f0
      [  133.412122]  ip_send_skb+0x15/0x40
      [  133.412471]  raw_sendmsg+0x853/0xab0
      [  133.412855]  ? insert_pfn+0xfe/0x270
      [  133.413827]  ? vvar_fault+0xec/0x190
      [  133.414772]  sock_sendmsg+0x57/0x80
      [  133.415685]  __sys_sendto+0xdc/0x160
      [  133.416605]  ? syscall_trace_enter+0x1d4/0x2b0
      [  133.417679]  ? __audit_syscall_exit+0x1d9/0x280
      [  133.418753]  ? __prepare_exit_to_usermode+0x5d/0x1a0
      [  133.419819]  __x64_sys_sendto+0x24/0x30
      [  133.420848]  do_syscall_64+0x4d/0x90
      [  133.421768]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  133.422833] RIP: 0033:0x7fe013689c03
      [  133.423749] Code: Bad RIP value.
      [  133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [  133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03
      [  133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003
      [  133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010
      [  133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
      [  133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080
      [  133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod
      [  133.444045] CR2: 0000000000000000
      [  133.445082] ---[ end trace f4aeee1958fd1638 ]---
      [  133.446236] RIP: 0010:0x0
      [  133.447180] Code: Bad RIP value.
      [  133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
      [  133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
      [  133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
      [  133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
      [  133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
      [  133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
      [  133.456520] FS:  00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
      [  133.458046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
      [  133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  133.463697] Kernel panic - not syncing: Fatal exception in interrupt
      [  133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [  133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fixes: aaa0c23c ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug")
      Signed-off-by: default avatarMartin Varghese <martin.varghese@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      394de110
    • wenxu's avatar
      net/sched: act_ct: add miss tcf_lastuse_update. · 8367b3ab
      wenxu authored
      When tcf_ct_act execute the tcf_lastuse_update should
      be update or the used stats never update
      
      filter protocol ip pref 3 flower chain 0
      filter protocol ip pref 3 flower chain 0 handle 0x1
        eth_type ipv4
        dst_ip 1.1.1.1
        ip_flags frag/firstfrag
        skip_hw
        not_in_hw
       action order 1: ct zone 1 nat pipe
        index 1 ref 1 bind 1 installed 103 sec used 103 sec
       Action statistics:
       Sent 151500 bytes 101 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
       cookie 4519c04dc64a1a295787aab13b6a50fb
      Signed-off-by: default avatarwenxu <wenxu@ucloud.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8367b3ab
    • Sebastian Andrzej Siewior's avatar
      net/mlx5e: Do not include rwlock.h directly · f0b594df
      Sebastian Andrzej Siewior authored
      rwlock.h should not be included directly. Instead linux/splinlock.h
      should be included. Including it directly will break the RT build.
      
      Fixes: 549c243e ("net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c")
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Acked-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f0b594df
    • Paolo Abeni's avatar
      mptcp: fix DSS map generation on fin retransmission · 9c29e361
      Paolo Abeni authored
      The RFC 8684 mandates that no-data DATA FIN packets should carry
      a DSS with 0 sequence number and data len equal to 1. Currently,
      on FIN retransmission we re-use the existing mapping; if the previous
      fin transmission was part of a partially acked data packet, we could
      end-up writing in the egress packet a non-compliant DSS.
      
      The above will be detected by a "Bad mapping" warning on the receiver
      side.
      
      This change addresses the issue explicitly checking for 0 len packet
      when adding the DATA_FIN option.
      
      Fixes: 6d0060f6 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
      Reported-by: syzbot+42a07faa5923cfaeb9c9@syzkaller.appspotmail.com
      Tested-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9c29e361
    • Sabrina Dubroca's avatar
      ipv4: fill fl4_icmp_{type,code} in ping_v4_sendmsg · 5eff0690
      Sabrina Dubroca authored
      IPv4 ping sockets don't set fl4.fl4_icmp_{type,code}, which leads to
      incomplete IPsec ACQUIRE messages being sent to userspace. Currently,
      both raw sockets and IPv6 ping sockets set those fields.
      
      Expected output of "ip xfrm monitor":
          acquire proto esp
            sel src 10.0.2.15/32 dst 8.8.8.8/32 proto icmp type 8 code 0 dev ens4
            policy src 10.0.2.15/32 dst 8.8.8.8/32
              <snip>
      
      Currently with ping sockets:
          acquire proto esp
            sel src 10.0.2.15/32 dst 8.8.8.8/32 proto icmp type 0 code 0 dev ens4
            policy src 10.0.2.15/32 dst 8.8.8.8/32
              <snip>
      
      The Libreswan test suite found this problem after Fedora changed the
      value for the sysctl net.ipv4.ping_group_range.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: default avatarPaul Wouters <pwouters@redhat.com>
      Tested-by: default avatarPaul Wouters <pwouters@redhat.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5eff0690
    • Tobias Waldekranz's avatar
      net: ethernet: fec: prevent tx starvation under high rx load · 7cdaa4cc
      Tobias Waldekranz authored
      In the ISR, we poll the event register for the queues in need of
      service and then enter polled mode. After this point, the event
      register will never be read again until we exit polled mode.
      
      In a scenario where a UDP flow is routed back out through the same
      interface, i.e. "router-on-a-stick" we'll typically only see an rx
      queue event initially. Once we start to process the incoming flow
      we'll be locked polled mode, but we'll never clean the tx rings since
      that event is never caught.
      
      Eventually the netdev watchdog will trip, causing all buffers to be
      dropped and then the process starts over again.
      
      Rework the NAPI poll to keep trying to consome the entire budget as
      long as new events are coming in, making sure to service all rx/tx
      queues, in priority order, on each pass.
      
      Fixes: 4d494cdc ("net: fec: change data structure to support multiqueue")
      Signed-off-by: default avatarTobias Waldekranz <tobias@waldekranz.com>
      Tested-by: default avatarFugang Duan <fugang.duan@nxp.com>
      Reviewed-by: default avatarFugang Duan <fugang.duan@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cdaa4cc
    • Tom Rix's avatar
      net: sky2: initialize return of gm_phy_read · 28b18e4e
      Tom Rix authored
      clang static analysis flags this garbage return
      
      drivers/net/ethernet/marvell/sky2.c:208:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
              return v;
              ^~~~~~~~
      
      static inline u16 gm_phy_read( ...
      {
      	u16 v;
      	__gm_phy_read(hw, port, reg, &v);
      	return v;
      }
      
      __gm_phy_read can return without setting v.
      
      So handle similar to skge.c's gm_phy_read, initialize v.
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28b18e4e
    • Cong Wang's avatar
      cgroup: fix cgroup_sk_alloc() for sk_clone_lock() · ad0f75e5
      Cong Wang authored
      When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
      copied, so the cgroup refcnt must be taken too. And, unlike the
      sk_alloc() path, sock_update_netprioidx() is not called here.
      Therefore, it is safe and necessary to grab the cgroup refcnt
      even when cgroup_sk_alloc is disabled.
      
      sk_clone_lock() is in BH context anyway, the in_interrupt()
      would terminate this function if called there. And for sk_alloc()
      skcd->val is always zero. So it's safe to factor out the code
      to make it more readable.
      
      The global variable 'cgroup_sk_alloc_disabled' is used to determine
      whether to take these reference counts. It is impossible to make
      the reference counting correct unless we save this bit of information
      in skcd->val. So, add a new bit there to record whether the socket
      has already taken the reference counts. This obviously relies on
      kmalloc() to align cgroup pointers to at least 4 bytes,
      ARCH_KMALLOC_MINALIGN is certainly larger than that.
      
      This bug seems to be introduced since the beginning, commit
      d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
      tried to fix it but not compeletely. It seems not easy to trigger until
      the recent commit 090e28b2
      ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
      
      Fixes: bd1060a1 ("sock, cgroup: add sock->sk_cgroup")
      Reported-by: default avatarCameron Berkenpas <cam@neo-zeon.de>
      Reported-by: default avatarPeter Geis <pgwipeout@gmail.com>
      Reported-by: default avatarLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reported-by: default avatarDaniël Sonck <dsonck92@gmail.com>
      Reported-by: default avatarZhang Qiang <qiang.zhang@windriver.com>
      Tested-by: default avatarCameron Berkenpas <cam@neo-zeon.de>
      Tested-by: default avatarPeter Geis <pgwipeout@gmail.com>
      Tested-by: default avatarThomas Lamprecht <t.lamprecht@proxmox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad0f75e5
    • David Ahern's avatar
      ipv6: Fix use of anycast address with loopback · aea23c32
      David Ahern authored
      Thomas reported a regression with IPv6 and anycast using the following
      reproducer:
      
          echo 1 >  /proc/sys/net/ipv6/conf/all/forwarding
          ip -6 a add fc12::1/16 dev lo
          sleep 2
          echo "pinging lo"
          ping6 -c 2 fc12::
      
      The conversion of addrconf_f6i_alloc to use ip6_route_info_create missed
      the use of fib6_is_reject which checks addresses added to the loopback
      interface and sets the REJECT flag as needed. Update fib6_is_reject for
      loopback checks to handle RTF_ANYCAST addresses.
      
      Fixes: c7a1ce39 ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
      Reported-by: thomas.gambier@nexedi.com
      Signed-off-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aea23c32
    • AceLan Kao's avatar
      net: usb: qmi_wwan: add support for Quectel EG95 LTE modem · f815dd5c
      AceLan Kao authored
      Add support for Quectel Wireless Solutions Co., Ltd. EG95 LTE modem
      
      T:  Bus=01 Lev=01 Prnt=01 Port=02 Cnt=02 Dev#=  5 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=2c7c ProdID=0195 Rev=03.18
      S:  Manufacturer=Android
      S:  Product=Android
      C:  #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA
      I:  If#=0x0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      I:  If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      I:  If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      I:  If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      I:  If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      Signed-off-by: default avatarAceLan Kao <acelan.kao@canonical.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f815dd5c
    • David S. Miller's avatar
      Merge branch 'net-ipa-fix-warning-reported-errors' · 92cffd48
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: fix warning-reported errors
      
      Building the kernel with W=1 produces numerous warnings for the IPA
      code.  Some of those warnings turn out to flag real problems, and
      this series fixes them.  The first patch fixes the most important
      ones, but the second and third are problems I think are worth
      treating as bugs as well.
      
      Note:  I'll happily combine any of these if someone prefers that.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      92cffd48
    • Alex Elder's avatar
      net: ipa: include declarations in "ipa_gsi.c" · a21c1f02
      Alex Elder authored
      Include "ipa_gsi.h" in "ipa_gsi.c", so the public functions are
      defined before they are used in "ipa_gsi.c".  This addresses some
      warnings that are reported with a "W=1" build.
      
      Fixes: c3f398b1 ("soc: qcom: ipa: IPA interface to GSI")
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a21c1f02
    • Alex Elder's avatar
      net: ipa: declare struct types in "ipa_gsi.h" · 3c90e95b
      Alex Elder authored
      Pointers to two struct types are used in "ipa_gsi.h", without those
      struct types being forward-declared.  Add these declarations.
      
      Fixes: c3f398b1 ("soc: qcom: ipa: IPA interface to GSI")
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3c90e95b
    • Alex Elder's avatar
      net: ipa: fix QMI structure definition bugs · 74478ea4
      Alex Elder authored
      Building with "W=1" did exactly what it was supposed to do, namely
      point out some suspicious-looking code to be verified not to contain
      bugs.
      
      Some QMI message structures defined in "ipa_qmi_msg.c" contained
      some bad field names (duplicating the "elem_size" field instead of
      defining the "offset" field), almost certainly due to copy/paste
      errors that weren't obvious in a scan of the code.  Fix these bugs.
      
      Fixes: 530f9216 ("soc: qcom: ipa: AP/modem communications")
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      74478ea4
  3. 06 Jul, 2020 10 commits
  4. 05 Jul, 2020 2 commits
    • David S. Miller's avatar
      Merge branch 'net-rmnet-fix-interface-leak-for-rmnet-module' · 0f57a1e5
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: rmnet: fix interface leak for rmnet module
      
      There are two problems in rmnet module that they occur the leak of
      a lower interface.
      The symptom is the same, which is the leak of a lower interface.
      But there are two different real problems.
      This patchset is to fix these real problems.
      
      1. Do not allow to have different two modes.
      As a lower interface of rmnet, there are two modes that they are VND
      and BRIDGE.
      One interface can have only one mode.
      But in the current rmnet, there is no code to prevent to have
      two modes in one lower interface.
      So, interface leak occurs.
      
      2. Do not allow to add multiple bridge interfaces.
      rmnet can have only two bridge interface.
      If an additional bridge interface is tried to be attached,
      rmnet should deny it.
      But there is no code to do that.
      So, interface leak occurs.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0f57a1e5
    • Taehee Yoo's avatar
      net: rmnet: do not allow to add multiple bridge interfaces · 2fb2799a
      Taehee Yoo authored
      rmnet can have only two bridge interface.
      One of them is a link interface and another one is added by
      the master operation.
      rmnet interface shouldn't allow adding additional
      bridge interfaces by mater operation.
      But, there is no code to deny additional interfaces.
      So, interface leak occurs.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add dummy2 type dummy
          ip link add rmnet0 link dummy0 type rmnet mux_id 1
          ip link set dummy1 master rmnet0
          ip link set dummy2 master rmnet0
          ip link del rmnet0
      
      In the above test command, the dummy0 was attached to rmnet as VND mode.
      Then, dummy1 was attached to rmnet0 as BRIDGE mode.
      At this point, dummy0 mode is switched from VND to BRIDGE automatically.
      Then, dummy2 is attached to rmnet as BRIDGE mode.
      At this point, rmnet0 should deny this operation.
      But, rmnet0 doesn't deny this.
      So that below splat occurs when the rmnet0 interface is deleted.
      
      Splat looks like:
      [  186.684787][    C2] WARNING: CPU: 2 PID: 1009 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0
      [  186.684788][    C2] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_x
      [  186.684805][    C2] CPU: 2 PID: 1009 Comm: ip Not tainted 5.8.0-rc1+ #621
      [  186.684807][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  186.684808][    C2] RIP: 0010:rollback_registered_many+0x986/0xcf0
      [  186.684811][    C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5
      [  186.684812][    C2] RSP: 0018:ffff8880cd9472e0 EFLAGS: 00010287
      [  186.684815][    C2] RAX: ffff8880cc56da58 RBX: ffff8880ab21c000 RCX: ffffffff9329d323
      [  186.684816][    C2] RDX: 1ffffffff2be6410 RSI: 0000000000000008 RDI: ffffffff95f32080
      [  186.684818][    C2] RBP: dffffc0000000000 R08: fffffbfff2be6411 R09: fffffbfff2be6411
      [  186.684819][    C2] R10: ffffffff95f32087 R11: 0000000000000001 R12: ffff8880cd947480
      [  186.684820][    C2] R13: ffff8880ab21c0b8 R14: ffff8880cd947400 R15: ffff8880cdf10640
      [  186.684822][    C2] FS:  00007f00843890c0(0000) GS:ffff8880d4e00000(0000) knlGS:0000000000000000
      [  186.684823][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  186.684825][    C2] CR2: 000055b8ab1077b8 CR3: 00000000ab612006 CR4: 00000000000606e0
      [  186.684826][    C2] Call Trace:
      [  186.684827][    C2]  ? lockdep_hardirqs_on_prepare+0x379/0x540
      [  186.684829][    C2]  ? netif_set_real_num_tx_queues+0x780/0x780
      [  186.684830][    C2]  ? rmnet_unregister_real_device+0x56/0x90 [rmnet]
      [  186.684831][    C2]  ? __kasan_slab_free+0x126/0x150
      [  186.684832][    C2]  ? kfree+0xdc/0x320
      [  186.684834][    C2]  ? rmnet_unregister_real_device+0x56/0x90 [rmnet]
      [  186.684835][    C2]  unregister_netdevice_many.part.135+0x13/0x1b0
      [  186.684836][    C2]  rtnl_delete_link+0xbc/0x100
      [ ... ]
      [  238.440071][ T1009] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1
      
      Fixes: 037f9cdf ("net: rmnet: use upper/lower device infrastructure")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fb2799a