Commits · b8b79c414eca4e9bcab645e02cb92c48db974ce9 · Kirill Smelkov / linux

21 Jun, 2021 16 commits

net: dsa: mv88e6xxx: Fix adding vlan 0 · b8b79c41

Eldar Gasanov authored Jun 21, 2021

8021q module adds vlan 0 to all interfaces when it starts.
When 8021q module is loaded it isn't possible to create bond
with mv88e6xxx interfaces, bonding module dipslay error
"Couldn't add bond vlan ids", because it tries to add vlan 0
to slave interfaces.

There is unexpected behavior in the switch. When a PVID
is assigned to a port the switch changes VID to PVID
in ingress frames with VID 0 on the port. Expected
that the switch doesn't assign PVID to tagged frames
with VID 0. But there isn't a way to change this behavior
in the switch.

Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
Signed-off-by: Eldar Gasanov <eldargasanov2@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8b79c41

vsock: notify server to shutdown when client has pending signal · c7ff9cff

Longpeng(Mike) authored Jun 21, 2021

The client's sk_state will be set to TCP_ESTABLISHED if the server
replay the client's connect request.

However, if the client has pending signal, its sk_state will be set
to TCP_CLOSE without notify the server, so the server will hold the
corrupt connection.

            client                        server

1. sk_state=TCP_SYN_SENT         |
2. call ->connect()              |
3. wait reply                    |
                                 | 4. sk_state=TCP_ESTABLISHED
                                 | 5. insert to connected list
                                 | 6. reply to the client
7. sk_state=TCP_ESTABLISHED      |
8. insert to connected list      |
9. *signal pending* <--------------------- the user kill client
10. sk_state=TCP_CLOSE           |
client is exiting...             |
11. call ->release()             |
     virtio_transport_close
      if (!(sk->sk_state == TCP_ESTABLISHED ||
	      sk->sk_state == TCP_CLOSING))
		return true; *return at here, the server cannot notice the connection is corrupt*

So the client should notify the peer in this case.

Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Norbert Slusarek <nslusarek@gmx.net>
Cc: Andra Paraschiv <andraprs@amazon.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lkml.org/lkml/2021/5/17/418Signed-off-by: lixianming <lixianming5@huawei.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7ff9cff

net: mana: Fix a memory leak in an error handling path in 'mana_create_txq()' · b9078845

Christophe JAILLET authored Jun 20, 2021

If this test fails we must free some resources as in all the other error
handling paths of this function.

Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9078845

Merge branch 'nnicstar-fixes' · 4f35dabb

David S. Miller authored Jun 21, 2021

Zheyu Ma says:

====================
atm: nicstar: fix two bugs about error handling

Zheyu Ma (2):
  atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
  atm: nicstar: register the interrupt handler in the right place
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4f35dabb

atm: nicstar: register the interrupt handler in the right place · 70b639dc

Zheyu Ma authored Jun 20, 2021

Because the error handling is sequential, the application of resources
should be carried out in the order of error handling, so the operation
of registering the interrupt handler should be put in front, so as not
to free the unregistered interrupt handler during error handling.

This log reveals it:

[    3.438724] Trying to free already-free IRQ 23
[    3.439060] WARNING: CPU: 5 PID: 1 at kernel/irq/manage.c:1825 free_irq+0xfb/0x480
[    3.440039] Modules linked in:
[    3.440257] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142
[    3.440793] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[    3.441561] RIP: 0010:free_irq+0xfb/0x480
[    3.441845] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80
[    3.443121] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086
[    3.443483] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000
[    3.443972] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff
[    3.444462] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003
[    3.444950] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[    3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8
[    3.444994] FS:  0000000000000000(0000) GS:ffff88817bd40000(0000) knlGS:0000000000000000
[    3.444994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.444994] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0
[    3.444994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.444994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.444994] Call Trace:
[    3.444994]  ns_init_card_error+0x18e/0x250
[    3.444994]  nicstar_init_one+0x10d2/0x1130
[    3.444994]  local_pci_probe+0x4a/0xb0
[    3.444994]  pci_device_probe+0x126/0x1d0
[    3.444994]  ? pci_device_remove+0x100/0x100
[    3.444994]  really_probe+0x27e/0x650
[    3.444994]  driver_probe_device+0x84/0x1d0
[    3.444994]  ? mutex_lock_nested+0x16/0x20
[    3.444994]  device_driver_attach+0x63/0x70
[    3.444994]  __driver_attach+0x117/0x1a0
[    3.444994]  ? device_driver_attach+0x70/0x70
[    3.444994]  bus_for_each_dev+0xb6/0x110
[    3.444994]  ? rdinit_setup+0x40/0x40
[    3.444994]  driver_attach+0x22/0x30
[    3.444994]  bus_add_driver+0x1e6/0x2a0
[    3.444994]  driver_register+0xa4/0x180
[    3.444994]  __pci_register_driver+0x77/0x80
[    3.444994]  ? uPD98402_module_init+0xd/0xd
[    3.444994]  nicstar_init+0x1f/0x75
[    3.444994]  do_one_initcall+0x7a/0x3d0
[    3.444994]  ? rdinit_setup+0x40/0x40
[    3.444994]  ? rcu_read_lock_sched_held+0x4a/0x70
[    3.444994]  kernel_init_freeable+0x2a7/0x2f9
[    3.444994]  ? rest_init+0x2c0/0x2c0
[    3.444994]  kernel_init+0x13/0x180
[    3.444994]  ? rest_init+0x2c0/0x2c0
[    3.444994]  ? rest_init+0x2c0/0x2c0
[    3.444994]  ret_from_fork+0x1f/0x30
[    3.444994] Kernel panic - not syncing: panic_on_warn set ...
[    3.444994] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142
[    3.444994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[    3.444994] Call Trace:
[    3.444994]  dump_stack+0xba/0xf5
[    3.444994]  ? free_irq+0xfb/0x480
[    3.444994]  panic+0x155/0x3ed
[    3.444994]  ? __warn+0xed/0x150
[    3.444994]  ? free_irq+0xfb/0x480
[    3.444994]  __warn+0x103/0x150
[    3.444994]  ? free_irq+0xfb/0x480
[    3.444994]  report_bug+0x119/0x1c0
[    3.444994]  handle_bug+0x3b/0x80
[    3.444994]  exc_invalid_op+0x18/0x70
[    3.444994]  asm_exc_invalid_op+0x12/0x20
[    3.444994] RIP: 0010:free_irq+0xfb/0x480
[    3.444994] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80
[    3.444994] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086
[    3.444994] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000
[    3.444994] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff
[    3.444994] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003
[    3.444994] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[    3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8
[    3.444994]  ? vprintk_func+0x71/0x110
[    3.444994]  ns_init_card_error+0x18e/0x250
[    3.444994]  nicstar_init_one+0x10d2/0x1130
[    3.444994]  local_pci_probe+0x4a/0xb0
[    3.444994]  pci_device_probe+0x126/0x1d0
[    3.444994]  ? pci_device_remove+0x100/0x100
[    3.444994]  really_probe+0x27e/0x650
[    3.444994]  driver_probe_device+0x84/0x1d0
[    3.444994]  ? mutex_lock_nested+0x16/0x20
[    3.444994]  device_driver_attach+0x63/0x70
[    3.444994]  __driver_attach+0x117/0x1a0
[    3.444994]  ? device_driver_attach+0x70/0x70
[    3.444994]  bus_for_each_dev+0xb6/0x110
[    3.444994]  ? rdinit_setup+0x40/0x40
[    3.444994]  driver_attach+0x22/0x30
[    3.444994]  bus_add_driver+0x1e6/0x2a0
[    3.444994]  driver_register+0xa4/0x180
[    3.444994]  __pci_register_driver+0x77/0x80
[    3.444994]  ? uPD98402_module_init+0xd/0xd
[    3.444994]  nicstar_init+0x1f/0x75
[    3.444994]  do_one_initcall+0x7a/0x3d0
[    3.444994]  ? rdinit_setup+0x40/0x40
[    3.444994]  ? rcu_read_lock_sched_held+0x4a/0x70
[    3.444994]  kernel_init_freeable+0x2a7/0x2f9
[    3.444994]  ? rest_init+0x2c0/0x2c0
[    3.444994]  kernel_init+0x13/0x180
[    3.444994]  ? rest_init+0x2c0/0x2c0
[    3.444994]  ? rest_init+0x2c0/0x2c0
[    3.444994]  ret_from_fork+0x1f/0x30
[    3.444994] Dumping ftrace buffer:
[    3.444994]    (ftrace buffer empty)
[    3.444994] Kernel Offset: disabled
[    3.444994] Rebooting in 1 seconds..
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

70b639dc

atm: nicstar: use 'dma_free_coherent' instead of 'kfree' · 6a1e5a4a

Zheyu Ma authored Jun 20, 2021

When 'nicstar_init_one' fails, 'ns_init_card_error' will be executed for
error handling, but the correct memory free function should be used,
otherwise it will cause an error. Since 'card->rsq.org' and
'card->tsq.org' are allocated using 'dma_alloc_coherent' function, they
should be freed using 'dma_free_coherent'.

Fix this by using 'dma_free_coherent' instead of 'kfree'

This log reveals it:

[    3.440294] kernel BUG at mm/slub.c:4206!
[    3.441059] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    3.441430] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #141
[    3.441986] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[    3.442780] RIP: 0010:kfree+0x26a/0x300
[    3.443065] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0
[    3.443396] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246
[    3.443396] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000
[    3.443396] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6
[    3.443396] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001
[    3.443396] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000
[    3.443396] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160
[    3.443396] FS:  0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
[    3.443396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.443396] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0
[    3.443396] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.443396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.443396] Call Trace:
[    3.443396]  ns_init_card_error+0x12c/0x220
[    3.443396]  nicstar_init_one+0x10d2/0x1130
[    3.443396]  local_pci_probe+0x4a/0xb0
[    3.443396]  pci_device_probe+0x126/0x1d0
[    3.443396]  ? pci_device_remove+0x100/0x100
[    3.443396]  really_probe+0x27e/0x650
[    3.443396]  driver_probe_device+0x84/0x1d0
[    3.443396]  ? mutex_lock_nested+0x16/0x20
[    3.443396]  device_driver_attach+0x63/0x70
[    3.443396]  __driver_attach+0x117/0x1a0
[    3.443396]  ? device_driver_attach+0x70/0x70
[    3.443396]  bus_for_each_dev+0xb6/0x110
[    3.443396]  ? rdinit_setup+0x40/0x40
[    3.443396]  driver_attach+0x22/0x30
[    3.443396]  bus_add_driver+0x1e6/0x2a0
[    3.443396]  driver_register+0xa4/0x180
[    3.443396]  __pci_register_driver+0x77/0x80
[    3.443396]  ? uPD98402_module_init+0xd/0xd
[    3.443396]  nicstar_init+0x1f/0x75
[    3.443396]  do_one_initcall+0x7a/0x3d0
[    3.443396]  ? rdinit_setup+0x40/0x40
[    3.443396]  ? rcu_read_lock_sched_held+0x4a/0x70
[    3.443396]  kernel_init_freeable+0x2a7/0x2f9
[    3.443396]  ? rest_init+0x2c0/0x2c0
[    3.443396]  kernel_init+0x13/0x180
[    3.443396]  ? rest_init+0x2c0/0x2c0
[    3.443396]  ? rest_init+0x2c0/0x2c0
[    3.443396]  ret_from_fork+0x1f/0x30
[    3.443396] Modules linked in:
[    3.443396] Dumping ftrace buffer:
[    3.443396]    (ftrace buffer empty)
[    3.458593] ---[ end trace 3c6f8f0d8ef59bcd ]---
[    3.458922] RIP: 0010:kfree+0x26a/0x300
[    3.459198] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0
[    3.460499] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246
[    3.460870] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000
[    3.461371] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6
[    3.461873] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001
[    3.462372] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000
[    3.462871] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160
[    3.463368] FS:  0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
[    3.463949] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.464356] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0
[    3.464856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.465356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.465860] Kernel panic - not syncing: Fatal exception
[    3.466370] Dumping ftrace buffer:
[    3.466616]    (ftrace buffer empty)
[    3.466871] Kernel Offset: disabled
[    3.467122] Rebooting in 1 seconds..
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a1e5a4a

Merge branch 'mptcp-sdeq-fixes' · 0d0f2a36

David S. Miller authored Jun 21, 2021

Mat Martineau says:

====================
mptcp: 32-bit sequence number improvements

MPTCP-level sequence numbers are 64 bits, but RFC 8684 allows use of
32-bit sequence numbers in the DSS option to save header space. Those
32-bit numbers are the least significant bits of the full 64-bit
sequence number, so the receiver must infer the correct upper 32 bits.

These two patches improve the logic for determining the full 64-bit
sequence numbers when the 32-bit truncated version has wrapped around.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0d0f2a36

mptcp: fix 32 bit DSN expansion · 5957a890

Paolo Abeni authored Jun 18, 2021

The current implementation of 32 bit DSN expansion is buggy.
After the previous patch, we can simply reuse the newly
introduced helper to do the expansion safely.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/120
Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5957a890

mptcp: fix bad handling of 32 bit ack wrap-around · 1502328f

Paolo Abeni authored Jun 18, 2021

When receiving 32 bits DSS ack from the peer, the MPTCP need
to expand them to 64 bits value. The current code is buggy
WRT detecting 32 bits ack wrap-around: when the wrap-around
happens the current unsigned 32 bit ack value is lower than
the previous one.

Additionally check for possible reverse wrap and make the helper
visible, so that we could re-use it for the next patch.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/204
Fixes: cc9d2566 ("mptcp: update per unacked sequence on pkt reception")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1502328f

tls: prevent oversized sendfile() hangs by ignoring MSG_MORE · d452d48b

Jakub Kicinski authored Jun 18, 2021

We got multiple reports that multi_chunk_sendfile test
case from tls selftest fails. This was sort of expected,
as the original fix was never applied (see it in the first
Link:). The test in question uses sendfile() with count
larger than the size of the underlying file. This will
make splice set MSG_MORE on all sendpage calls, meaning
TLS will never close and flush the last partial record.

Eric seem to have addressed a similar problem in
commit 35f9c09f ("tcp: tcp_sendpages() should call tcp_push() once")
by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE
MSG_SENDPAGE_NOTLAST is not set on the last call
of a "pipefull" of data (PIPE_DEF_BUFFERS == 16,
so every 16 pages or whenever we run out of data).

Having a break every 16 pages should be fine, TLS
can pack exactly 4 pages into a record, so for
aligned reads there should be no difference,
unaligned may see one extra record per sendpage().

Sticking to TCP semantics seems preferable to modifying
splice, but we can revisit it if real life scenarios
show a regression.
Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/
Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d452d48b

Merge tag 'linux-can-fixes-for-5.13-20210619' of... · d52f9b22

David S. Miller authored Jun 21, 2021

Merge tag 'linux-can-fixes-for-5.13-20210619' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-06-19

this is a pull request of 5 patches for net/master.

The first patch is by Thadeu Lima de Souza Cascardo and fixes a
potential use-after-free in the CAN broadcast manager socket, by
delaying the release of struct bcm_op after synchronize_rcu().

Oliver Hartkopp's patch fixes a similar potential user-after-free in
the CAN gateway socket by synchronizing RCU operations before removing
gw job entry.

Another patch by Oliver Hartkopp fixes a potential use-after-free in
the ISOTP socket by omitting unintended hrtimer restarts on socket
release.

Oleksij Rempel's patch for the j1939 socket fixes a potential
use-after-free by setting the SOCK_RCU_FREE flag on the socket.

The last patch is by Pavel Skripkin and fixes a use-after-free in the
ems_usb CAN driver.

All patches are intended for stable and have stable@v.k.o on Cc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d52f9b22

Merge tag 'wireless-drivers-2021-06-19' of... · 0d98ec87

David S. Miller authored Jun 21, 2021

Merge tag 'wireless-drivers-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.13

Only one important fix for an mwifiex regression.

mwifiex

* fix deadlock during rmmod or firmware reset, regression from
  cfg80211 RTNL changes in v5.12-rc1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0d98ec87

hv_netvsc: Set needed_headroom according to VF · 536ba2e0

Haiyang Zhang authored Jun 18, 2021

Set needed_headroom according to VF if VF needs a bigger
headroom.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

536ba2e0

net/netif_receive_skb_core: Use migrate_disable() · 2b4cd14f

Sebastian Andrzej Siewior authored Jun 17, 2021

The preempt disable around do_xdp_generic() has been introduced in
commit
   bbbe211c ("net: rcu lock and preempt disable missing around generic xdp")

For BPF it is enough to use migrate_disable() and the code was updated
as it can be seen in commit
   3c58482a ("bpf: Provide bpf_prog_run_pin_on_cpu() helper")

This is a leftover which was not converted.

Use migrate_disable() before invoking do_xdp_generic().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b4cd14f

net: sched: add barrier to ensure correct ordering for lockless qdisc · 89837eb4

Yunsheng Lin authored Jun 17, 2021

The spin_trylock() was assumed to contain the implicit
barrier needed to ensure the correct ordering between
STATE_MISSED setting/clearing and STATE_MISSED checking
in commit a90c57f2 ("net: sched: fix packet stuck
problem for lockless qdisc").

But it turns out that spin_trylock() only has load-acquire
semantic, for strongly-ordered system(like x86), the compiler
barrier implicitly contained in spin_trylock() seems enough
to ensure the correct ordering. But for weakly-orderly system
(like arm64), the store-release semantic is needed to ensure
the correct ordering as clear_bit() and test_bit() is store
operation, see queued_spin_lock().

So add the explicit barrier to ensure the correct ordering
for the above case.

Fixes: a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

89837eb4

vrf: do not push non-ND strict packets with a source LLA through packet taps again · 603113c5

Antoine Tenart authored Jun 18, 2021

Non-ND strict packets with a source LLA go through the packet taps
again, while non-ND strict packets with other source addresses do not,
and we can see a clone of those packets on the vrf interface (we should
not). This is due to a series of changes:

Commit 6f12fa77[1] made non-ND strict packets not being pushed again
in the packet taps. This changed with commit 205704c6[2] for those
packets having a source LLA, as they need a lookup with the orig_iif.

The issue now is those packets do not skip the 'vrf_ip6_rcv' function to
the end (as the ones without a source LLA) and go through the check to
call packet taps again. This check was changed by commit 6f12fa77[1]
and do not exclude non-strict packets anymore. Packets matching
'need_strict && !is_ndisc && is_ll_src' are now being sent through the
packet taps again. This can be seen by dumping packets on the vrf
interface.

Fix this by having the same code path for all non-ND strict packets and
selectively lookup with the orig_iif for those with a source LLA. This
has the effect to revert to the pre-205704c6[2] condition, which
should also be easier to maintain.

[1] 6f12fa77 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
[2] 205704c6 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")

Fixes: 205704c6 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

603113c5

19 Jun, 2021 11 commits

net: can: ems_usb: fix use-after-free in ems_usb_disconnect() · ab4a0b8f

Pavel Skripkin authored Jun 17, 2021

In ems_usb_disconnect() dev pointer, which is netdev private data, is
used after free_candev() call:
| 	if (dev) {
| 		unregister_netdev(dev->netdev);
| 		free_candev(dev->netdev);
|
| 		unlink_all_urbs(dev);
|
| 		usb_free_urb(dev->intr_urb);
|
| 		kfree(dev->intr_in_buffer);
| 		kfree(dev->tx_msg_buffer);
| 	}

Fix it by simply moving free_candev() at the end of the block.

Fail log:
| BUG: KASAN: use-after-free in ems_usb_disconnect
| Read of size 8 at addr ffff88804e041008 by task kworker/1:2/2895
|
| CPU: 1 PID: 2895 Comm: kworker/1:2 Not tainted 5.13.0-rc5+ #164
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.4
| Workqueue: usb_hub_wq hub_event
| Call Trace:
|     dump_stack (lib/dump_stack.c:122)
|     print_address_description.constprop.0.cold (mm/kasan/report.c:234)
|     kasan_report.cold (mm/kasan/report.c:420 mm/kasan/report.c:436)
|     ems_usb_disconnect (drivers/net/can/usb/ems_usb.c:683 drivers/net/can/usb/ems_usb.c:1058)

Fixes: 702171ad ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface")
Link: https://lore.kernel.org/r/20210617185130.5834-1-paskripkin@gmail.com
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

ab4a0b8f

can: j1939: j1939_sk_init(): set SOCK_RCU_FREE to call sk_destruct() after RCU is done · 22c696fe

Oleksij Rempel authored Jun 17, 2021

Set SOCK_RCU_FREE to let RCU to call sk_destruct() on completion.
Without this patch, we will run in to j1939_can_recv() after priv was
freed by j1939_sk_release()->j1939_sk_sock_destruct()

Fixes: 25fe97cb ("can: j1939: move j1939_priv_put() into sk_destruct callback")
Link: https://lore.kernel.org/r/20210617130623.12705-1-o.rempel@pengutronix.de
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reported-by: syzbot+bdf710cfc41c186fdff3@syzkaller.appspotmail.com
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

22c696fe

can: isotp: isotp_release(): omit unintended hrtimer restart on socket release · 14a4696b

Oliver Hartkopp authored Jun 18, 2021

When closing the isotp socket, the potentially running hrtimers are
canceled before removing the subscription for CAN identifiers via
can_rx_unregister().

This may lead to an unintended (re)start of a hrtimer in
isotp_rcv_cf() and isotp_rcv_fc() in the case that a CAN frame is
received by isotp_rcv() while the subscription removal is processed.

However, isotp_rcv() is called under RCU protection, so after calling
can_rx_unregister, we may call synchronize_rcu in order to wait for
any RCU read-side critical sections to finish. This prevents the
reception of CAN frames after hrtimer_cancel() and therefore the
unintended (re)start of the hrtimers.

Link: https://lore.kernel.org/r/20210618173713.2296-1-socketcan@hartkopp.net
Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

14a4696b

can: gw: synchronize rcu operations before removing gw job entry · fb8696ab

Oliver Hartkopp authored Jun 18, 2021

can_can_gw_rcv() is called under RCU protection, so after calling
can_rx_unregister(), we have to call synchronize_rcu in order to wait
for any RCU read-side critical sections to finish before removing the
kmem_cache entry with the referenced gw job entry.

Link: https://lore.kernel.org/r/20210618173645.2238-1-socketcan@hartkopp.net
Fixes: c1aabdf3 ("can-gw: add netlink based CAN routing")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

fb8696ab

can: bcm: delay release of struct bcm_op after synchronize_rcu() · d5f9023f

Thadeu Lima de Souza Cascardo authored Jun 19, 2021

can_rx_register() callbacks may be called concurrently to the call to
can_rx_unregister(). The callbacks and callback data, though, are
protected by RCU and the struct sock reference count.

So the callback data is really attached to the life of sk, meaning
that it should be released on sk_destruct. However, bcm_remove_op()
calls tasklet_kill(), and RCU callbacks may be called under RCU
softirq, so that cannot be used on kernels before the introduction of
HRTIMER_MODE_SOFT.

However, bcm_rx_handler() is called under RCU protection, so after
calling can_rx_unregister(), we may call synchronize_rcu() in order to
wait for any RCU read-side critical sections to finish. That is,
bcm_rx_handler() won't be called anymore for those ops. So, we only
free them, after we do that synchronize_rcu().

Fixes: ffd980f9 ("[CAN]: Add broadcast manager (bcm) protocol")
Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com
Cc: linux-stable <stable@vger.kernel.org>
Reported-by: syzbot+0f7e7e5e2f4f40fa89c0@syzkaller.appspotmail.com
Reported-by: Norbert Slusarek <nslusarek@gmx.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

d5f9023f

Merge branch 'ezchip-fixes' · dda2626b

David S. Miller authored Jun 19, 2021

Pavel Skripkin says:

====================
net: ethernat: ezchip: bug fixing and code improvments

While manual code reviewing, I found some error in ezchip driver.
Two of them looks very dangerous:
  1. use-after-free in nps_enet_remove
      Accessing netdev private data after free_netdev()

  2. wrong error handling of platform_get_irq()
      It can cause passing negative irq to request_irq()

Also, in 2nd patch I removed redundant check to increase execution
speed and make code more straightforward.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dda2626b

net: ethernet: ezchip: fix error handling · 0de449d5

Pavel Skripkin authored Jun 18, 2021

As documented at drivers/base/platform.c for platform_get_irq:

 * Gets an IRQ for a platform device and prints an error message if finding the
 * IRQ fails. Device drivers should check the return value for errors so as to
 * not pass a negative integer value to the request_irq() APIs.

So, the driver should check that platform_get_irq() return value
is _negative_, not that it's equal to zero, because -ENXIO (return
value from request_irq() if irq was not found) will
pass this check and it leads to passing negative irq to request_irq()

Fixes: 0dd07709 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0de449d5

net: ethernet: ezchip: remove redundant check · 4ae85b23

Pavel Skripkin authored Jun 18, 2021

err varibale will be set everytime, when code gets
into this path. This check will just slowdown the execution
and that's all.

Fixes: 0dd07709 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ae85b23

net: ethernet: ezchip: fix UAF in nps_enet_remove · e4b8700e

Pavel Skripkin authored Jun 18, 2021

priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after netif_napi_del()
call.

Fixes: 0dd07709 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4b8700e

net: ethernet: aeroflex: fix UAF in greth_of_remove · e3a5de6d

Pavel Skripkin authored Jun 18, 2021

static int greth_of_remove(struct platform_device *of_dev)
{
...
	struct greth_private *greth = netdev_priv(ndev);
...
	unregister_netdev(ndev);
	free_netdev(ndev);

	of_iounmap(&of_dev->resource[0], greth->regs, resource_size(&of_dev->resource[0]));
...
}

greth is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing greth
pointer. So, fix it by moving free_netdev() after of_iounmap()
call.

Fixes: d4c41139 ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3a5de6d

Merge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9ed13a17

Linus Torvalds authored Jun 18, 2021

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
  bluetooth, netfilter and can.

  Current release - regressions:

   - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
     to fix modifying offloaded qdiscs

   - lantiq: net: fix duplicated skb in rx descriptor ring

   - rtnetlink: fix regression in bridge VLAN configuration, empty info
     is not an error, bot-generated "fix" was not needed

   - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
     creation

  Current release - new code bugs:

   - ethtool: fix NULL pointer dereference during module EEPROM dump via
     the new netlink API

   - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
     queue should not be visible to the stack

   - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs

   - mlx5e: verify dev is present in get devlink port ndo, avoid a panic

  Previous releases - regressions:

   - neighbour: allow NUD_NOARP entries to be force GCed

   - further fixes for fallout from reorg of WiFi locking (staging:
     rtl8723bs, mac80211, cfg80211)

   - skbuff: fix incorrect msg_zerocopy copy notifications

   - mac80211: fix NULL ptr deref for injected rate info

   - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs

  Previous releases - always broken:

   - bpf: more speculative execution fixes

   - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local

   - udp: fix race between close() and udp_abort() resulting in a panic

   - fix out of bounds when parsing TCP options before packets are
     validated (in netfilter: synproxy, tc: sch_cake and mptcp)

   - mptcp: improve operation under memory pressure, add missing
     wake-ups

   - mptcp: fix double-lock/soft lookup in subflow_error_report()

   - bridge: fix races (null pointer deref and UAF) in vlan tunnel
     egress

   - ena: fix DMA mapping function issues in XDP

   - rds: fix memory leak in rds_recvmsg

  Misc:

   - vrf: allow larger MTUs

   - icmp: don't send out ICMP messages with a source address of 0.0.0.0

   - cdc_ncm: switch to eth%d interface naming"

* tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
  net: ethernet: fix potential use-after-free in ec_bhf_remove
  selftests/net: Add icmp.sh for testing ICMP dummy address responses
  icmp: don't send out ICMP messages with a source address of 0.0.0.0
  net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
  net: ll_temac: Fix TX BD buffer overwrite
  net: ll_temac: Add memory-barriers for TX BD access
  net: ll_temac: Make sure to free skb when it is completely used
  MAINTAINERS: add Guvenc as SMC maintainer
  bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
  bnxt_en: Fix TQM fastpath ring backing store computation
  bnxt_en: Rediscover PHY capabilities after firmware reset
  cxgb4: fix wrong shift.
  mac80211: handle various extensible elements correctly
  mac80211: reset profile_periodicity/ema_ap
  cfg80211: avoid double free of PMSR request
  cfg80211: make certificate generation more robust
  mac80211: minstrel_ht: fix sample time check
  net: qed: Fix memcpy() overflow of qed_dcbx_params()
  net: cdc_eem: fix tx fixup skb leak
  net: hamradio: fix memory leak in mkiss_close
  ...

9ed13a17

18 Jun, 2021 13 commits

Merge tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6fab154a

Linus Torvalds authored Jun 18, 2021

Pull btrfs fix from David Sterba:
 "One more fix, for a space accounting bug in zoned mode. It happens
  when a block group is switched back rw->ro and unusable bytes (due to
  zoned constraints) are subtracted twice.

  It has user visible effects so I consider it important enough for late
  -rc inclusion and backport to stable"

* tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: fix negative space_info->bytes_readonly

6fab154a

Merge tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 728a748b

Linus Torvalds authored Jun 18, 2021

Pull PCI fixes from Bjorn Helgaas:

 - Clear 64-bit flag for host bridge windows below 4GB to fix a resource
   allocation regression added in -rc1 (Punit Agrawal)

 - Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter)

 - Avoid secondary bus resets on TI KeyStone C667X devices (Antti
   Järvinen)

 - Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni)

 - Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun)

 - Avoid broken ATS on AMD Navi14 GPU (Evan Quan)

 - Trust Broadcom BCM57414 NIC to isolate functions even though it
   doesn't advertise ACS support (Sriharsha Basavapatna)

 - Work around AMD RS690 BIOSes that don't configure DMA above 4GB
   (Mikel Rychliski)

 - Fix panic during PIO transfer on Aardvark controller (Pali Rohár)

* tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: aardvark: Fix kernel panic during PIO transfer
  PCI: Add AMD RS690 quirk to enable 64-bit DMA
  PCI: Add ACS quirk for Broadcom BCM57414 NIC
  PCI: Mark AMD Navi14 GPU ATS as broken
  PCI: Work around Huawei Intelligent NIC VF FLR erratum
  PCI: Mark some NVIDIA GPUs to avoid bus reset
  PCI: Mark TI C667X to avoid bus reset
  PCI: tegra194: Fix MCFG quirk build regressions
  PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB

728a748b

afs: Re-enable freezing once a page fault is interrupted · 9620ad86

Matthew Wilcox (Oracle) authored Jun 16, 2021

If a task is killed during a page fault, it does not currently call
sb_end_pagefault(), which means that the filesystem cannot be frozen
at any time thereafter.  This may be reported by lockdep like this:

====================================
WARNING: fsstress/10757 still has locks held!
5.13.0-rc4-build4+ #91 Not tainted
------------------------------------
1 lock held by fsstress/10757:
 #0: ffff888104eac530
 (
sb_pagefaults

as filesystem freezing is modelled as a lock.

Fix this by removing all the direct returns from within the function,
and using 'ret' to indicate whether we were interrupted or successful.

Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9620ad86

net: ethernet: fix potential use-after-free in ec_bhf_remove · 9cca0c2d

Pavel Skripkin authored Jun 18, 2021

static void ec_bhf_remove(struct pci_dev *dev)
{
...
	struct ec_bhf_priv *priv = netdev_priv(net_dev);

	unregister_netdev(net_dev);
	free_netdev(net_dev);

	pci_iounmap(dev, priv->dma_io);
	pci_iounmap(dev, priv->io);
...
}

priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after pci_iounmap()
calls.

Fixes: 6af55ff5 ("Driver for Beckhoff CX5020 EtherCAT master module.")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cca0c2d

Merge tag 'mac80211-for-net-2021-06-18' of... · 0d1dc9e1

David S. Miller authored Jun 18, 2021

Merge tag 'mac80211-for-net-2021-06-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of straggler fixes:
 * a minstrel HT sample check fix
 * peer measurement could double-free on races
 * certificate file generation at build time could
   sometimes hang
 * some parameters weren't reset between connections
   in mac80211
 * some extensible elements were treated as non-
   extensible, possibly causuing bad connections
   (or failures) if the AP adds data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0d1dc9e1

selftests/net: Add icmp.sh for testing ICMP dummy address responses · 7e9838b7

Toke Høiland-Jørgensen authored Jun 18, 2021

This adds a new icmp.sh selftest for testing that the kernel will respond
correctly with an ICMP unreachable message with the dummy (192.0.0.8)
source address when there are no IPv4 addresses configured to use as source
addresses.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e9838b7

icmp: don't send out ICMP messages with a source address of 0.0.0.0 · 32182747

Toke Høiland-Jørgensen authored Jun 18, 2021

When constructing ICMP response messages, the kernel will try to pick a
suitable source address for the outgoing packet. However, if no IPv4
addresses are configured on the system at all, this will fail and we end up
producing an ICMP message with a source address of 0.0.0.0. This can happen
on a box routing IPv4 traffic via v6 nexthops, for instance.

Since 0.0.0.0 is not generally routable on the internet, there's a good
chance that such ICMP messages will never make it back to the sender of the
original packet that the ICMP message was sent in response to. This, in
turn, can create connectivity and PMTUd problems for senders. Fortunately,
RFC7600 reserves a dummy address to be used as a source for ICMP
messages (192.0.0.8/32), so let's teach the kernel to substitute that
address as a last resort if the regular source address selection procedure
fails.

Below is a quick example reproducing this issue with network namespaces:

ip netns add ns0
ip l add type veth peer netns ns0
ip l set dev veth0 up
ip a add 10.0.0.1/24 dev veth0
ip a add fc00:dead:cafe:42::1/64 dev veth0
ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2
ip -n ns0 l set dev veth0 up
ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0
ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1
ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0
ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1
tcpdump -tpni veth0 -c 2 icmp &
ping -w 1 10.1.0.1 > /dev/null
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64
IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
2 packets captured
2 packets received by filter
0 packets dropped by kernel

With this patch the above capture changes to:
IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64
IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: Juliusz Chroboczek <jch@irif.fr>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32182747

net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY · f6396341

Esben Haabendal authored Jun 18, 2021

As documented in Documentation/networking/driver.rst, the ndo_start_xmit
method must not return NETDEV_TX_BUSY under any normal circumstances, and
as recommended, we simply stop the tx queue in advance, when there is a
risk that the next xmit would cause a NETDEV_TX_BUSY return.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6396341

net: ll_temac: Fix TX BD buffer overwrite · c364df24

Esben Haabendal authored Jun 18, 2021

Just as the initial check, we need to ensure num_frag+1 buffers available,
as that is the number of buffers we are going to use.

This fixes a buffer overflow, which might be seen during heavy network
load. Complete lockup of TEMAC was reproducible within about 10 minutes of
a particular load.

Fixes: 84823ff8 ("net: ll_temac: Fix race condition causing TX hang")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c364df24

net: ll_temac: Add memory-barriers for TX BD access · 28d9fab4

Esben Haabendal authored Jun 18, 2021

Add a couple of memory-barriers to ensure correct ordering of read/write
access to TX BDs.

In xmit_done, we should ensure that reading the additional BD fields are
only done after STS_CTRL_APP0_CMPLT bit is set.

When xmit_done marks the BD as free by setting APP0=0, we need to ensure
that the other BD fields are reset first, so we avoid racing with the xmit
path, which writes to the same fields.

Finally, making sure to read APP0 of next BD after the current BD, ensures
that we see all available buffers.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28d9fab4

net: ll_temac: Make sure to free skb when it is completely used · 6aa32217

Esben Haabendal authored Jun 18, 2021

With the skb pointer piggy-backed on the TX BD, we have a simple and
efficient way to free the skb buffer when the frame has been transmitted.
But in order to avoid freeing the skb while there are still fragments from
the skb in use, we need to piggy-back on the TX BD of the skb, not the
first.

Without this, we are doing use-after-free on the DMA side, when the first
BD of a multi TX BD packet is seen as completed in xmit_done, and the
remaining BDs are still being processed.

Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6aa32217

MAINTAINERS: add Guvenc as SMC maintainer · 35036d69

Karsten Graul authored Jun 18, 2021

Add Guvenc as maintainer for Shared Memory Communications (SMC)
Sockets.

Cc: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35036d69

Merge branch 'bnxt_en-fixes' · b6a258c1

David S. Miller authored Jun 18, 2021

Michael Chan says:

====================
bnxt_en: Bug fixes

This patchset includes 3 small bug fixes to reinitialize PHY capabilities
after firmware reset, setup the chip's internal TQM fastpath ring
backing memory properly for RoCE traffic, and to free ethtool related
memory if driver probe fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b6a258c1