Commits · 0b943d907984e587721be873d47f614ed8edaf35 · Kirill Smelkov / linux

20 Apr, 2020 9 commits

Merge branch 'mptcp-fix-races-on-accept' · 0b943d90

David S. Miller authored Apr 20, 2020

Paolo Abeni says:

====================
mptcp: fix races on accept()

This series includes some fixes for accept() races which may cause inconsistent
MPTCP socket status and oops. Please see the individual patches for the
technical details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0b943d90

mptcp: drop req socket remote_key* fields · fca5c82c

Paolo Abeni authored Apr 20, 2020

We don't need them, as we can use the current ingress opt
data instead. Setting them in syn_recv_sock() may causes
inconsistent mptcp socket status, as per previous commit.

Fixes: cc7972ea ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fca5c82c

mptcp: avoid flipping mp_capable field in syn_recv_sock() · 4c8941de

Paolo Abeni authored Apr 20, 2020

If multiple CPUs races on the same req_sock in syn_recv_sock(),
flipping such field can cause inconsistent child socket status.

When racing, the CPU losing the req ownership may still change
the mptcp request socket mp_capable flag while the CPU owning
the request is cloning the socket, leaving the child socket with
'is_mptcp' set but no 'mp_capable' flag.

Such socket will stay with 'conn' field cleared, heading to oops
in later mptcp callback.

Address the issue tracking the fallback status in a local variable.

Fixes: 58b09919 ("mptcp: create msk early")
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c8941de

mptcp: handle mptcp listener destruction via rcu · 5e20087d

Florian Westphal authored Apr 20, 2020

Following splat can occur during self test:

 BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160
 Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808

  subflow_data_ready+0x156/0x160
  tcp_child_process+0x6a3/0xb30
  tcp_v4_rcv+0x2231/0x3730
  ip_protocol_deliver_rcu+0x5c/0x860
  ip_local_deliver_finish+0x220/0x360
  ip_local_deliver+0x1c8/0x4e0
  ip_rcv_finish+0x1da/0x2f0
  ip_rcv+0xd0/0x3c0
  __netif_receive_skb_one_core+0xf5/0x160
  __netif_receive_skb+0x27/0x1c0
  process_backlog+0x21e/0x780
  net_rx_action+0x35f/0xe90
  do_softirq+0x4c/0x50
  [..]

This occurs when accessing subflow_ctx->conn.

Problem is that tcp_child_process() calls listen sockets'
sk_data_ready() notification, but it doesn't hold the listener
lock.  Another cpu calling close() on the listener will then cause
transition of refcount to 0.

Fixes: 58b09919 ("mptcp: create msk early")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e20087d

cxgb4: fix large delays in PTP synchronization · bd019427

Rahul Lakkireddy authored Apr 20, 2020

Fetching PTP sync information from mailbox is slow and can take
up to 10 milliseconds. Reduce this unnecessary delay by directly
reading the information from the corresponding registers.

Fixes: 9c33e420 ("cxgb4: Add PTP Hardware Clock (PHC) support")
Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd019427

net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array · f0212a5e

Marc Zyngier authored Apr 18, 2020

Running with KASAN on a VIM3L systems leads to the following splat
when probing the Ethernet device:

==================================================================
BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8
Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139
CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G            E     5.7.0-rc1-00101-g8624b7577b9c #781
Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020
Call trace:
 dump_backtrace+0x0/0x2a0
 show_stack+0x20/0x30
 dump_stack+0xec/0x148
 print_address_description.isra.12+0x70/0x35c
 __kasan_report+0xfc/0x1d4
 kasan_report+0x4c/0x68
 __asan_load4+0x9c/0xd8
 _get_maxdiv+0x74/0xd8
 clk_divider_bestdiv+0x74/0x5e0
 clk_divider_round_rate+0x80/0x1a8
 clk_core_determine_round_nolock.part.9+0x9c/0xd0
 clk_core_round_rate_nolock+0xf0/0x108
 clk_hw_round_rate+0xac/0xf0
 clk_factor_round_rate+0xb8/0xd0
 clk_core_determine_round_nolock.part.9+0x9c/0xd0
 clk_core_round_rate_nolock+0xf0/0x108
 clk_core_round_rate_nolock+0xbc/0x108
 clk_core_set_rate_nolock+0xc4/0x2e8
 clk_set_rate+0x58/0xe0
 meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b]
 platform_drv_probe+0x78/0xd8
 really_probe+0x158/0x610
 driver_probe_device+0x140/0x1b0
 device_driver_attach+0xa4/0xb0
 __driver_attach+0xcc/0x1c8
 bus_for_each_dev+0xf4/0x168
 driver_attach+0x3c/0x50
 bus_add_driver+0x238/0x2e8
 driver_register+0xc8/0x1e8
 __platform_driver_register+0x88/0x98
 meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b]
 do_one_initcall+0xa8/0x328
 do_init_module+0xe8/0x368
 load_module+0x3300/0x36b0
 __do_sys_finit_module+0x120/0x1a8
 __arm64_sys_finit_module+0x4c/0x60
 el0_svc_common.constprop.2+0xe4/0x268
 do_el0_svc+0x98/0xa8
 el0_svc+0x24/0x68
 el0_sync_handler+0x12c/0x318
 el0_sync+0x158/0x180

The buggy address belongs to the variable:
 div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b]

Memory state around the buggy address:
 ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00
 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa
>ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa
                                                             ^
 ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa
 ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa
==================================================================

Digging into this indeed shows that the clock divider array is
lacking a final fence, and that the clock subsystems goes in the
weeds. Oh well.

Let's add the empty structure that indicates the end of the array.

Fixes: bd6f4854 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0212a5e

ipv6: fix restrict IPV6_ADDRFORM operation · 82c9ae44

John Haxby authored Apr 18, 2020

Commit b6f61189 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a
problem found by syzbot an unfortunate logic error meant that it
also broke IPV6_ADDRFORM.

Rearrange the checks so that the earlier test is just one of the series
of checks made before moving the socket from IPv6 to IPv4.

Fixes: b6f61189 ("ipv6: restrict IPV6_ADDRFORM operation")
Signed-off-by: John Haxby <john.haxby@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

82c9ae44

net: systemport: Omit superfluous error message in bcm_sysport_probe() · bdbe05b3

Tang Bin authored Apr 18, 2020

In the function bcm_sysport_probe(), when get irq failed, the function
platform_get_irq() logs an error message, so remove redundant message
here.
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bdbe05b3

net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce

Tonghao Zhang authored Apr 17, 2020

syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread

To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.

Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

27de77ce

18 Apr, 2020 12 commits

tcp: cache line align MAX_TCP_HEADER · 9bacd256

Eric Dumazet authored Apr 17, 2020

TCP stack is dumb in how it cooks its output packets.

Depending on MAX_HEADER value, we might chose a bad ending point
for the headers.

If we align the end of TCP headers to cache line boundary, we
make sure to always use the smallest number of cache lines,
which always help.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bacd256

Merge branch 'mptcp-fixes' · 56e639e6

David S. Miller authored Apr 18, 2020

Florian Westphal says:

====================
mptcp: fix 'attempt to release socket in state...' splats

These two patches fix error handling corner-cases where
inet_sock_destruct gets called for a mptcp_sk that is not in TCP_CLOSE
state.  This results in unwanted error printks from the network stack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

56e639e6

mptcp: fix 'Attempt to release TCP socket in state' warnings · 9f5ca6a5

Florian Westphal authored Apr 17, 2020

We need to set sk_state to CLOSED, else we will get following:

IPv4: Attempt to release TCP socket in state 3 00000000b95f109e
IPv4: Attempt to release TCP socket in state 10 00000000b95f109e

First one is from inet_sock_destruct(), second one from
mptcp_sk_clone failure handling.  Setting sk_state to CLOSED isn't
enough, we also need to orphan sk so it has DEAD flag set.
Otherwise, a very similar warning is printed from inet_sock_destruct().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f5ca6a5

mptcp: fix splat when incoming connection is never accepted before exit/close · df1036da

Florian Westphal authored Apr 17, 2020

Following snippet (replicated from syzkaller reproducer) generates
warning: "IPv4: Attempt to release TCP socket in state 1".

int main(void) {
 struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20,
                             .sin_addr.s_addr = 0x010000e0, };
 struct sockaddr_in sin2 = { .sin_family = 2,
	                     .sin_addr.s_addr = 0x0100007f, };
 struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20,
	                     .sin_addr.s_addr = 0x0100007f, };
 int r0 = socket(0x2, 0x1, 0x106);
 int r1 = socket(0x2, 0x1, 0x106);

 bind(r1, (void *)&sin1, sizeof(sin1));
 connect(r1, (void *)&sin2, sizeof(sin2));
 listen(r1, 3);
 return connect(r0, (void *)&sin3, 0x4d);
}

Reason is that the newly generated mptcp socket is closed via the ulp
release of the tcp listener socket when its accept backlog gets purged.

To fix this, delay setting the ESTABLISHED state until after userspace
calls accept and via mptcp specific destructor.

Fixes: 58b09919 ("mptcp: create msk early")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

df1036da

net/mlx4_en: avoid indirect call in TX completion · 310660a1

Eric Dumazet authored Apr 15, 2020

Commit 9ecc2d86 ("net/mlx4_en: add xdp forwarding and data write support")
brought another indirect call in fast path.

Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
when/if CONFIG_RETPOLINE=y
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

310660a1

ipv6: rpl: fix full address compression · 62e69776

Alexander Aring authored Apr 15, 2020

This patch makes it impossible that cmpri or cmpre values are set to the
value 16 which is not possible, because these are 4 bit values. We
currently run in an overflow when assigning the value 16 to it.

According to the standard a value of 16 can be interpreted as a full
elided address which isn't possible to set as compression value. A reason
why this cannot be set is that the current ipv6 header destination address
should never show up inside the segments of the rpl header. In this case we
run in a overflow and the address will have no compression at all. Means
cmpri or compre is set to 0.

As we handle cmpri and cmpre sometimes as unsigned char or 4 bit value
inside the rpl header the current behaviour ends in an invalid header
format. This patch simple use the best compression method if we ever run
into the case that the destination address is showed up inside the rpl
segments. We avoid the overflow handling and the rpl header is still valid,
even when we have the destination address inside the rpl segments.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62e69776

net: stmmac: Fix sub-second increment · 91a2559c

Julien Beraud authored Apr 15, 2020

In fine adjustement mode, which is the current default, the sub-second
    increment register is the number of nanoseconds that will be added to
    the clock when the accumulator overflows. At each clock cycle, the
    value of the addend register is added to the accumulator.
    Currently, we use 20ns = 1e09ns / 50MHz as this value whatever the
    frequency of the ptp clock actually is.
    The adjustment is then done on the addend register, only incrementing
    every X clock cycles X being the ratio between 50MHz and ptp_clock_rate
    (addend = 2^32 * 50MHz/ptp_clock_rate).
    This causes the following issues :
    - In case the frequency of the ptp clock is inferior or equal to 50MHz,
      the addend value calculation will overflow and the default
      addend value will be set to 0, causing the clock to not work at
      all. (For instance, for ptp_clock_rate = 50MHz, addend = 2^32).
    - The resolution of the timestamping clock is limited to 20ns while it
      is not needed, thus limiting the accuracy of the timestamping to
      20ns.

    Fix this by setting sub-second increment to 2e09ns / ptp_clock_rate.
    It will allow to reach the minimum possible frequency for
    ptp_clk_ref, which is 5MHz for GMII 1000Mps Full-Duplex by setting the
    sub-second-increment to a higher value. For instance, for 25MHz, it
    gives ssinc = 80ns and default_addend = 2^31.
    It will also allow to use a lower value for sub-second-increment, thus
    improving the timestamping accuracy with frequencies higher than
    100MHz, for instance, for 200MHz, ssinc = 10ns and default_addend =
    2^31.

v1->v2:
 - Remove modifications to the calculation of default addend, which broke
 compatibility with clock frequencies for which 2000000000 / ptp_clk_freq
 is not an integer.
 - Modify description according to discussions.
Signed-off-by: Julien Beraud <julien.beraud@orolia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91a2559c

net: stmmac: fix enabling socfpga's ptp_ref_clock · 15ce3060

Julien Beraud authored Apr 15, 2020

There are 2 registers to write to enable a ptp ref clock coming from the
fpga.
One that enables the usage of the clock from the fpga for emac0 and emac1
as a ptp ref clock, and the other to allow signals from the fpga to reach
emac0 and emac1.
Currently, if the dwmac-socfpga has phymode set to PHY_INTERFACE_MODE_MII,
PHY_INTERFACE_MODE_GMII, or PHY_INTERFACE_MODE_SGMII, both registers will
be written and the ptp ref clock will be set as coming from the fpga.
Separate the 2 register writes to only enable signals from the fpga to
reach emac0 or emac1 when ptp ref clock is not coming from the fpga.
Signed-off-by: Julien Beraud <julien.beraud@orolia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15ce3060

wimax/i2400m: Fix potential urb refcnt leak · 7717cbec

Xiyu Yang authored Apr 15, 2020

i2400mu_bus_bm_wait_for_ack() invokes usb_get_urb(), which increases the
refcount of the "notif_urb".

When i2400mu_bus_bm_wait_for_ack() returns, local variable "notif_urb"
becomes invalid, so the refcount should be decreased to keep refcount
balanced.

The issue happens in all paths of i2400mu_bus_bm_wait_for_ack(), which
forget to decrease the refcnt increased by usb_get_urb(), causing a
refcnt leak.

Fix this issue by calling usb_put_urb() before the
i2400mu_bus_bm_wait_for_ack() returns.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7717cbec

tipc: Fix potential tipc_node refcnt leak in tipc_rcv · de058420

Xiyu Yang authored Apr 15, 2020

tipc_rcv() invokes tipc_node_find() twice, which returns a reference of
the specified tipc_node object to "n" with increased refcnt.

When tipc_rcv() returns or a new object is assigned to "n", the original
local reference of "n" becomes invalid, so the refcount should be
decreased to keep refcount balanced.

The issue happens in some paths of tipc_rcv(), which forget to decrease
the refcnt increased by tipc_node_find() and will cause a refcnt leak.

Fix this issue by calling tipc_node_put() before the original object
pointed by "n" becomes invalid.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de058420

tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv · 441870ee

Xiyu Yang authored Apr 15, 2020

tipc_crypto_rcv() invokes tipc_aead_get(), which returns a reference of
the tipc_aead object to "aead" with increased refcnt.

When tipc_crypto_rcv() returns, the original local reference of "aead"
becomes invalid, so the refcount should be decreased to keep refcount
balanced.

The issue happens in one error path of tipc_crypto_rcv(). When TIPC
message decryption status is EINPROGRESS or EBUSY, the function forgets
to decrease the refcnt increased by tipc_aead_get() and causes a refcnt
leak.

Fix this issue by calling tipc_aead_put() on the error path when TIPC
message decryption status is EINPROGRESS or EBUSY.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

441870ee

net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node · d03f2284

Xiyu Yang authored Apr 15, 2020

nr_add_node() invokes nr_neigh_get_dev(), which returns a local
reference of the nr_neigh object to "nr_neigh" with increased refcnt.

When nr_add_node() returns, "nr_neigh" becomes invalid, so the refcount
should be decreased to keep refcount balanced.

The issue happens in one normal path of nr_add_node(), which forgets to
decrease the refcnt increased by nr_neigh_get_dev() and causes a refcnt
leak. It should decrease the refcnt before the function returns like
other normal paths do.

Fix this issue by calling nr_neigh_put() before the nr_add_node()
returns.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d03f2284

16 Apr, 2020 11 commits

Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of... · 2fcd8014

Linus Torvalds authored Apr 16, 2020

Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome-platform fixes from Benson Leung:
 "Two small fixes for cros_ec_sensorhub_ring.c, addressing issues
  introduced in the cros_ec_sensorhub FIFO support commit"

* tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_sensorhub: Add missing '\n' in log messages
  platform/chrome: cros_ec_sensorhub: Off by one in cros_sensorhub_send_sample()

2fcd8014

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c8372665

Linus Torvalds authored Apr 16, 2020

Pull networking fixes from David Miller:

 1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel.

 2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet.

 3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing.

 4) Adjust icmp6 message source address selection when routes have a
    preferred source address set, from Tim Stallard.

 5) Be sure to validate HSR protocol version when creating new links,
    from Taehee Yoo.

 6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in
    non-initial namespaces, from Michael Weiß.

 7) Missing release firmware call in mlx5, from Eran Ben Elisha.

 8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from
    Taehee Yoo.

 9) Fix pause frame negotiation in marvell phy driver, from Clemens
    Gruber.

10) Record RX queue early enough in tun packet paths such that XDP
    programs will see the correct RX queue index, from Gilberto Bertin.

11) Fix double unlock in mptcp, from Florian Westphal.

12) Fix offset overflow in ARM bpf JIT, from Luke Nelson.

13) marvell10g needs to soft reset PHY when coming out of low power
    mode, from Russell King.

14) Fix MTU setting regression in stmmac for some chip types, from
    Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
  amd-xgbe: Use __napi_schedule() in BH context
  mISDN: make dmril and dmrim static
  net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
  net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
  tipc: fix incorrect increasing of link window
  Documentation: Fix tcp_challenge_ack_limit default value
  net: tulip: make early_486_chipsets static
  dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
  ipv6: remove redundant assignment to variable err
  net/rds: Use ERR_PTR for rds_message_alloc_sgs()
  net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
  selftests/bpf: Check for correct program attach/detach in xdp_attach test
  libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
  libbpf: Always specify expected_attach_type on program load if supported
  xsk: Add missing check on user supplied headroom size
  mac80211: fix channel switch trigger from unknown mesh peer
  mac80211: fix race in ieee80211_register_hw()
  net: marvell10g: soft-reset the PHY when coming out of low power
  net: marvell10g: report firmware version
  net/cxgb4: Check the return from t4_query_params properly
  ...

c8372665

amd-xgbe: Use __napi_schedule() in BH context · d518691c

Sebastian Andrzej Siewior authored Apr 16, 2020

The driver uses __napi_schedule_irqoff() which is fine as long as it is
invoked with disabled interrupts by everybody. Since the commit
mentioned below the driver may invoke xgbe_isr_task() in tasklet/softirq
context. This may lead to list corruption if another driver uses
__napi_schedule_irqoff() in IRQ context.

Use __napi_schedule() which safe to use from IRQ and softirq context.

Fixes: 85b85c85 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d518691c

mISDN: make dmril and dmrim static · 05eab4f3

Jason Yan authored Apr 15, 2020

Fix the following sparse warning:

drivers/isdn/hardware/mISDN/mISDNisar.c:746:12: warning: symbol 'dmril'
was not declared. Should it be static?
drivers/isdn/hardware/mISDN/mISDNisar.c:749:12: warning: symbol 'dmrim'
was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05eab4f3

net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes · 806fd188

Florian Fainelli authored Apr 14, 2020

After commit bfcb8132 ("net: dsa:
configure the MTU for switch ports") my Lamobo R1 platform which uses
an allwinner,sun7i-a20-gmac compatible Ethernet MAC started to fail
by rejecting a MTU of 1536. The reason for that is that the DMA
capabilities are not readable on this version of the IP, and there
is also no 'tx-fifo-depth' property being provided in Device Tree. The
property is documented as optional, and is not provided.

Chen-Yu indicated that the FIFO sizes are 4KB for TX and 16KB for RX, so
provide these values through platform data as an immediate fix until
various Device Tree sources get updated accordingly.

Fixes: eaf4fac4 ("net: stmmac: Do not accept invalid MTU values")
Suggested-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

806fd188

net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode · e045124e

DENG Qingfang authored Apr 14, 2020

In VLAN-unaware mode, the Egress Tag (EG_TAG) field in Port VLAN
Control register must be set to Consistent to let tagged frames pass
through as is, otherwise their tags will be stripped.

Fixes: 83163f7d ("net: dsa: mediatek: add VLAN support for MT7530")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e045124e

Merge tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 9786cab6

Linus Torvalds authored Apr 16, 2020

Pull SELinux fix from Paul Moore:
 "One small SELinux fix to ensure we cleanup properly on an error
  condition"

* tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: free str on error in str_read()

9786cab6

Merge tag 'ceph-for-5.7-rc2' of git://github.com/ceph/ceph-client · 3fa84bf9

Linus Torvalds authored Apr 16, 2020

Pull ceph fixes from Ilya Dryomov:

 - a set of patches for a deadlock on "rbd map" error path

 - a fix for invalid pointer dereference and uninitialized variable use
   on asynchronous create and unlink error paths.

* tag 'ceph-for-5.7-rc2' of git://github.com/ceph/ceph-client:
  ceph: fix potential bad pointer deref in async dirops cb's
  rbd: don't mess with a page vector in rbd_notify_op_lock()
  rbd: don't test rbd_dev->opts in rbd_dev_image_release()
  rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
  rbd: avoid a deadlock on header_rwsem when flushing notifies

3fa84bf9

Merge tag 'trace-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 4ede1259

Linus Torvalds authored Apr 16, 2020

Pull tracing fix from Steven Rostedt:
 "This fixes a small race between allocating a snapshot buffer and
  setting the snapshot trigger.

  On a slow machine, the trigger can occur before the snapshot is
  allocated causing a warning to be displayed in the ring buffer, and no
  snapshot triggering. Reversing the allocation and the enabling of the
  trigger fixes the problem"

* tag 'trace-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation

4ede1259

keys: Fix proc_keys_next to increase position index · 86d32f9a

Vasily Averin authored Apr 14, 2020

If seq_file .next function does not change position index,
read after some lseek can generate unexpected output:

    $ dd if=/proc/keys bs=1  # full usual output
    0f6bfdf5 I--Q---     2 perm 3f010000  1000  1000 user      4af2f79ab8848d0a: 740
    1fb91b32 I--Q---     3 perm 1f3f0000  1000 65534 keyring   _uid.1000: 2
    27589480 I--Q---     1 perm 0b0b0000     0     0 user      invocation_id: 16
    2f33ab67 I--Q---   152 perm 3f030000     0     0 keyring   _ses: 2
    33f1d8fa I--Q---     4 perm 3f030000  1000  1000 keyring   _ses: 1
    3d427fda I--Q---     2 perm 3f010000  1000  1000 user      69ec44aec7678e5a: 740
    3ead4096 I--Q---     1 perm 1f3f0000  1000 65534 keyring   _uid_ses.1000: 1
    521+0 records in
    521+0 records out
    521 bytes copied, 0,00123769 s, 421 kB/s

But a read after lseek in middle of last line results in the partial
last line and then a repeat of the final line:

    $ dd if=/proc/keys bs=500 skip=1
    dd: /proc/keys: cannot skip to specified offset
    g   _uid_ses.1000: 1
    3ead4096 I--Q---     1 perm 1f3f0000  1000 65534 keyring   _uid_ses.1000: 1
    0+1 records in
    0+1 records out
    97 bytes copied, 0,000135035 s, 718 kB/s

and a read after lseek beyond end of file results in the last line being
shown:

    $ dd if=/proc/keys bs=1000 skip=1   # read after lseek beyond end of file
    dd: /proc/keys: cannot skip to specified offset
    3ead4096 I--Q---     1 perm 1f3f0000  1000 65534 keyring   _uid_ses.1000: 1
    0+1 records in
    0+1 records out
    76 bytes copied, 0,000119981 s, 633 kB/s

See https://bugzilla.kernel.org/show_bug.cgi?id=206283

Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code ...")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

86d32f9a

Merge tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 00086336

Linus Torvalds authored Apr 15, 2020

Pull EFI fixes from Ingo Molnar:
 "Misc EFI fixes, including the boot failure regression caused by the
  BSS section not being cleared by the loaders"

* tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Revert struct layout change to fix kexec boot regression
  efi/x86: Don't remap text<->rodata gap read-only for mixed mode
  efi/x86: Fix the deletion of variables in mixed mode
  efi/libstub/file: Merge file name buffers to reduce stack usage
  Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements
  efi/arm: Deal with ADR going out of range in efi_enter_kernel()
  efi/x86: Always relocate the kernel for EFI handover entry
  efi/x86: Move efi stub globals from .bss to .data
  efi/libstub/x86: Remove redundant assignment to pointer hdr
  efi/cper: Use scnprintf() for avoiding potential buffer overflow

00086336

15 Apr, 2020 8 commits

tipc: fix incorrect increasing of link window · edadedf1

Tuong Lien authored Apr 15, 2020

In commit 16ad3f40 ("tipc: introduce variable window congestion
control"), we allow link window to change with the congestion avoidance
algorithm. However, there is a bug that during the slow-start if packet
retransmission occurs, the link will enter the fast-recovery phase, set
its window to the 'ssthresh' which is never less than 300, so the link
window suddenly increases to that limit instead of decreasing.

Consequently, two issues have been observed:

- For broadcast-link: it can leave a gap between the link queues that a
new packet will be inserted and sent before the previous ones, i.e. not
in-order.

- For unicast: the algorithm does not work as expected, the link window
jumps to the slow-start threshold whereas packet retransmission occurs.

This commit fixes the issues by avoiding such the link window increase,
but still decreasing if the 'ssthresh' is lowered.

Fixes: 16ad3f40 ("tipc: introduce variable window congestion control")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

edadedf1

Documentation: Fix tcp_challenge_ack_limit default value · 5309960e

Cambda Zhu authored Apr 15, 2020

The default value of tcp_challenge_ack_limit has been changed from
100 to 1000 and this patch fixes its documentation.
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5309960e

net: tulip: make early_486_chipsets static · ae5a44bb

Jason Yan authored Apr 15, 2020

Fix the following sparse warning:

drivers/net/ethernet/dec/tulip/tulip_core.c:1280:28: warning: symbol
'early_486_chipsets' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae5a44bb

dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400 · c8322754

Johan Jonker authored Apr 15, 2020

The description below is already in use in
'rk3228-evb.dts', 'rk3229-xms6.dts' and 'rk3328.dtsi'
but somehow never added to a document, so add
"ethernet-phy-id1234.d400", "ethernet-phy-ieee802.3-c22"
for ethernet-phy nodes on Rockchip platforms to
'ethernet-phy.yaml'.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8322754

ipv6: remove redundant assignment to variable err · 672e2477

Colin Ian King authored Apr 16, 2020

The variable err is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

672e2477

selinux: free str on error in str_read() · af15f14c

Ondrej Mosnacek authored Apr 14, 2020

In [see "Fixes:"] I missed the fact that str_read() may give back an
allocated pointer even if it returns an error, causing a potential
memory leak in filename_trans_read_one(). Fix this by making the
function free the allocated string whenever it returns a non-zero value,
which also makes its behavior more obvious and prevents repeating the
same mistake in the future.
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1461665 ("Resource leaks")
Fixes: c3a27611 ("selinux: optimize storage of filename transitions")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>

af15f14c

net/rds: Use ERR_PTR for rds_message_alloc_sgs() · 7dba9203

Jason Gunthorpe authored Apr 14, 2020

Returning the error code via a 'int *ret' when the function returns a
pointer is very un-kernely and causes gcc 10's static analysis to choke:

net/rds/message.c: In function ‘rds_message_map_pages’:
net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  358 |   return ERR_PTR(ret);

Use a typical ERR_PTR return instead.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7dba9203

net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge · 87b0f983

Vladimir Oltean authored Apr 14, 2020

To rehash a previous explanation given in commit 1c44ce56 ("net:
mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is
up"), the switch driver operates the in a mode where a single VLAN can
be transmitted as untagged on a particular egress port. That is the
"native VLAN on trunk port" use case.

The configuration for this native VLAN is driven in 2 ways:
 - Set the egress port rewriter to strip the VLAN tag for the native
   VID (as it is egress-untagged, after all).
 - Configure the ingress port to drop untagged and priority-tagged
   traffic, if there is no native VLAN. The intention of this setting is
   that a trunk port with no native VLAN should not accept untagged
   traffic.

Since both of the above configurations for the native VLAN should only
be done if VLAN awareness is requested, they are actually done from the
ocelot_port_vlan_filtering function, after the basic procedure of
toggling the VLAN awareness flag of the port.

But there's a problem with that simplistic approach: we are trying to
juggle with 2 independent variables from a single function:
 - Native VLAN of the port - its value is held in port->vid.
 - VLAN awareness state of the port - currently there are some issues
   here, more on that later*.
The actual problem can be seen when enslaving the switch ports to a VLAN
filtering bridge:
 0. The driver configures a pvid of zero for each port, when in
    standalone mode. While the bridge configures a default_pvid of 1 for
    each port that gets added as a slave to it.
 1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true.
    The VLAN-filtering-dependent portion of the native VLAN
    configuration is done, considering that the native VLAN is 0.
 2. The bridge calls ocelot_vlan_add with vid=1, pvid=true,
    untagged=true. The native VLAN changes to 1 (change which gets
    propagated to hardware).
 3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the
    VLAN-filtering-dependent portion of the native VLAN configuration,
    for the new native VLAN of 1. One can notice that after toggling "ip
    link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0
    type bridge vlan_filtering 1", the new native VLAN finally makes it
    through and untagged traffic finally starts flowing again. But
    obviously that shouldn't be needed.

So it is clear that 2 independent variables need to both re-trigger the
native VLAN configuration. So we introduce the second variable as
ocelot_port->vlan_aware.

*Actually both the DSA Felix driver and the Ocelot driver already had
each its own variable:
 - Ocelot: ocelot_port_private->vlan_aware
 - Felix: dsa_port->vlan_filtering
but the common Ocelot library needs to work with a single, common,
variable, so there is some refactoring done to move the vlan_aware
property from the private structure into the common ocelot_port
structure.

Fixes: 97bb69e1 ("net: mscc: ocelot: break apart ocelot_vlan_port_apply")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87b0f983