Commits · 9420e8ade4353a6710908ffafa23ecaf1caa0123 · Kirill Smelkov / linux

26 Mar, 2020 1 commit

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 9420e8ad

Linus Torvalds authored Mar 26, 2020

Pull rdma fixes from Jason Gunthorpe:
 "A small set of late-rc patches, mostly fixes for various crashers,
  some syzkaller fixes and a mlx5 HW limitation:

   - Several MAINTAINERS updates

   - Memory leak regression in ODP

   - Several fixes for syzkaller related crashes. Google recently taught
     syzkaller to create the software RDMA devices

   - Crash fixes for HFI1

   - Several fixes for mlx5 crashes

   - Prevent unprivileged access to an unsafe mlx5 HW resource"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx5: Block delay drop to unprivileged users
  RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
  RDMA/core: Ensure security pkey modify is not lost
  MAINTAINERS: Clean RXE section and add Zhu as RXE maintainer
  IB/hfi1: Ensure pq is not left on waitlist
  IB/rdmavt: Free kernel completion queue when done
  RDMA/mad: Do not crash if the rdma device does not have a umad interface
  RDMA/core: Fix missing error check on dev_set_name()
  RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SET
  RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
  MAINTAINERS: Update maintainers for HISILICON ROCE DRIVER
  RDMA/odp: Fix leaking the tgid for implicit ODP

9420e8ad

25 Mar, 2020 13 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1b649e0b

Linus Torvalds authored Mar 25, 2020

Pull networking fixes from David Miller:

 1) Fix deadlock in bpf_send_signal() from Yonghong Song.

 2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.

 3) Add missing locking in iwlwifi mvm code, from Avraham Stern.

 4) Fix MSG_WAITALL handling in rxrpc, from David Howells.

 5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
    Wang.

 6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.

 7) cls_route removes the wrong filter during change operations, from
    Cong Wang.

 8) Reject unrecognized request flags in ethtool netlink code, from
    Michal Kubecek.

 9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
    Doug Berger.

10) Don't leak ct zone template in act_ct during replace, from Paul
    Blakey.

11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
    Blakey.

12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
    Lakkireddy.

13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
    Dumazet.

14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
    also from Eric Dumazet.

15) Restrict macsec to ethernet devices, from Willem de Bruijn.

16) Fix reference leak in some ethtool *_SET handlers, from Michal
    Kubecek.

17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
    Kallweit.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
  net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
  net: ena: Add PCI shutdown handler to allow safe kexec
  selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
  selftests/net: add missing tests to Makefile
  r8169: re-enable MSI on RTL8168c
  net: phy: mdio-bcm-unimac: Fix clock handling
  cxgb4/ptp: pass the sign of offset delta in FW CMD
  net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
  net: cbs: Fix software cbs to consider packet sending time
  net/mlx5e: Do not recover from a non-fatal syndrome
  net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
  net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
  net/mlx5e: Enhance ICOSQ WQE info fields
  net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
  selftests: netfilter: add nfqueue test case
  netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
  netfilter: nft_fwd_netdev: validate family and chain type
  netfilter: nft_set_rbtree: Detect partial overlaps on insertion
  netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
  netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
  ...

1b649e0b

Merge tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1dfb642b

Linus Torvalds authored Mar 25, 2020

Pull GPIO fixes from Linus Walleij:

 - One core quirk by myself to fix the .irq_disable() semantics when the
   gpiolib core takes over this callback.

 - The rest is an elaborate series of four patches fixing Intel laptop
   ACPI wakeup quirks.

* tag 'gpio-v5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
  gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
  gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
  gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
  gpiolib: Fix irq_disable() semantics

1dfb642b

Merge tag 'wireless-drivers-2020-03-25' of... · 2910594f

David S. Miller authored Mar 25, 2020

Merge tag 'wireless-drivers-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.6

Fourth, and last, set of fixes for v5.6. Just two important fixes to
iwlwifi regressions.

iwlwifi

* fix GEO_TX_POWER_LIMIT command on certain devices which caused
  firmware to crash during initialisation

* add back device ids for three devices which were accidentally
  removed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2910594f

net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build · 2c64605b

Pablo Neira Ayuso authored Mar 25, 2020

net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
    net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
      pkt->skb->tc_redirected = 1;
              ^~
    net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
      pkt->skb->tc_from_ingress = 1;
              ^~

To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.

This patch adds skb_set_redirected() that sets on the redirected bit
on the skbuff, it specifies if the packet was redirect from ingress
and resets the timestamp (timestamp reset was originally missing in the
netfilter bugfix).

Fixes: bcfabee1 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c64605b

net: ena: Add PCI shutdown handler to allow safe kexec · 428c4913

Guilherme G. Piccoli authored Mar 20, 2020

Currently ENA only provides the PCI remove() handler, used during rmmod
for example. This is not called on shutdown/kexec path; we are potentially
creating a failure scenario on kexec:

(a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
instead pci_device_shutdown() clears the master bit of the PCI device,
stopping all DMA transactions;

(b) Kexec reboot happens and the device gets enabled again, likely having
its FW with that DMA transaction buffered; then it may trigger the (now
invalid) memory operation in the new kernel, corrupting kernel memory area.

This patch aims to prevent this, by implementing a shutdown() handler
quite similar to the remove() one - the difference being the handling
of the netdev, which is unregistered on remove(), but following the
convention observed in other drivers, it's only detached on shutdown().

This prevents an odd issue in AWS Nitro instances, in which after the 2nd
kexec the next one will fail with an initrd corruption, caused by a wild
DMA write to invalid kernel memory. The lspci output for the adapter
present in my instance is:

00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
Adapter (ENA) [1d0f:ec20]
Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

428c4913

selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED · c085dbfb

Hangbin Liu authored Mar 25, 2020

The lib files should not be defined as TEST_PROGS, or we will run them
in run_kselftest.sh.

Also remove ethtool_lib.sh exec permission.

Fixes: 81573b18 ("selftests/net/forwarding: add Makefile to install tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c085dbfb

selftests/net: add missing tests to Makefile · 919a23e9

Hangbin Liu authored Mar 25, 2020

Find some tests are missed in Makefile by running:
for file in $(ls *.sh); do grep -q $file Makefile || echo $file; done
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

919a23e9

Merge tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · e2cf67f6

Linus Torvalds authored Mar 25, 2020

Pull zonefs fix from Damien Le Moal:
 "A single fix from me to correctly handle the size of read-only zone
  files"

* tag 'zonefs-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonfs: Fix handling of read-only zones

e2cf67f6

RDMA/mlx5: Block delay drop to unprivileged users · ba80013f

Maor Gottlieb authored Mar 22, 2020

It has been discovered that this feature can globally block the RX port,
so it should be allowed for highly privileged users only.

Fixes: 03404e8a("IB/mlx5: Add support to dropless RQ")
Link: https://lore.kernel.org/r/20200322124906.1173790-1-leon@kernel.orgSigned-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

ba80013f

zonfs: Fix handling of read-only zones · ccf4ad7d

Damien Le Moal authored Mar 20, 2020

The write pointer of zones in the read-only consition is defined as
invalid by the SCSI ZBC and ATA ZAC specifications. It is thus not
possible to determine the correct size of a read-only zone file on
mount. Fix this by handling read-only zones in the same manner as
offline zones by disabling all accesses to the zone (read and write)
and initializing the inode size of the read-only zone to 0).

For zones found to be in the read-only condition at runtime, only
disable write access to the zone and keep the size of the zone file to
its last updated value to allow the user to recover previously written
data.

Also fix zonefs documentation file to reflect this change.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

ccf4ad7d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 6f000f98

David S. Miller authored Mar 24, 2020

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) A new selftest for nf_queue, from Florian Westphal. This test
   covers two recent fixes: 07f8e4d0 ("tcp: also NULL skb->dev
   when copy was needed") and b738a185 ("tcp: ensure skb->dev is
   NULL before leaving TCP stack").

2) The fwd action breaks with ifb. For safety in next extensions,
   make sure the fwd action only runs from ingress until it is extended
   to be used from a different hook.

3) The pipapo set type now reports EEXIST in case of subrange overlaps.
   Update the rbtree set to validate range overlaps, so far this
   validation is only done only from userspace. From Stefano Brivio.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6f000f98

Merge tag 'mlx5-fixes-2020-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 7e566df6

David S. Miller authored Mar 24, 2020

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-03-24

This series introduces some fixes to mlx5 driver.

From Aya, Fixes to the RX error recovery flows
From Leon, Fix IB capability mask

Please pull and let me know if there is any problem.

For -stable v5.5
 ('net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure')

For -stable v5.4
 ('net/mlx5e: Fix ICOSQ recovery flow with Striding RQ')
 ('net/mlx5e: Do not recover from a non-fatal syndrome')
 ('net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset')
 ('net/mlx5e: Enhance ICOSQ WQE info fields')

The above patch ('net/mlx5e: Enhance ICOSQ WQE info fields')
will fail to apply cleanly on v5.4 due to a trivial contextual conflict,
but it is an important fix, do I need to do something about it or just
assume Greg will know how to handle this ?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7e566df6

r8169: re-enable MSI on RTL8168c · f13bc681

Heiner Kallweit authored Mar 24, 2020

The original change fixed an issue on RTL8168b by mimicking the vendor
driver behavior to disable MSI on chip versions before RTL8168d.
This however now caused an issue on a system with RTL8168c, see [0].
Therefore leave MSI disabled on RTL8168b, but re-enable it on RTL8168c.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1792839

Fixes: 003bd5b4 ("r8169: don't use MSI before RTL8168d")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f13bc681

24 Mar, 2020 26 commits

net: phy: mdio-bcm-unimac: Fix clock handling · c312c781

Andre Przywara authored Mar 24, 2020

The DT binding for this PHY describes an *optional* clock property.
Due to a bug in the error handling logic, we are actually ignoring this
clock *all* of the time so far.

Fix this by using devm_clk_get_optional() to handle this clock properly.

Fixes: b78ac6ec ("net: phy: mdio-bcm-unimac: Allow configuring MDIO clock divider")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c312c781

cxgb4/ptp: pass the sign of offset delta in FW CMD · 50e0d28d

Raju Rangoju authored Mar 24, 2020

cxgb4_ptp_fineadjtime() doesn't pass the signedness of offset delta
in FW_PTP_CMD. Fix it by passing correct sign.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50e0d28d

net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop · e80f40cb

Vladimir Oltean authored Mar 24, 2020

Not only did this wheel did not need reinventing, but there is also
an issue with it: It doesn't remove the VLAN header in a way that
preserves the L2 payload checksum when that is being provided by the DSA
master hw. It should recalculate checksum both for the push, before
removing the header, and for the pull afterwards. But the current
implementation is quite dizzying, with pulls followed immediately
afterwards by pushes, the memmove is done before the push, etc. This
makes a DSA master with RX checksumming offload to print stack traces
with the infamous 'hw csum failure' message.

So remove the dsa_8021q_remove_header function and replace it with
something that actually works with inet checksumming.

Fixes: d4619336 ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e80f40cb

net: cbs: Fix software cbs to consider packet sending time · 961d0e5b

Zh-yuan Ye authored Mar 24, 2020

Currently the software CBS does not consider the packet sending time
when depleting the credits. It caused the throughput to be
Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|) where
Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope
is expected. In order to fix the issue above, this patch takes the time
when the packet sending completes into account by moving the anchor time
variable "last" ahead to the send completion time upon transmission and
adding wait when the next dequeue request comes before the send
completion time of the previous packet.

changelog:
V2->V3:
 - remove unnecessary whitespace cleanup
 - add the checks if port_rate is 0 before division

V1->V2:
 - combine variable "send_completed" into "last"
 - add the comment for estimate of the packet sending

Fixes: 585d763a ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
Signed-off-by: Zh-yuan Ye <ye.zh-yuan@socionext.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

961d0e5b

RDMA/mlx5: Fix access to wrong pointer while performing flush due to error · 950bf4f1

Leon Romanovsky authored Mar 18, 2020

The main difference between send and receive SW completions is related to
separate treatment of WQ queue. For receive completions, the initial index
to be flushed is stored in "tail", while for send completions, it is in
deleted "last_poll".

  CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
  Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
  NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
  REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
  MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
  CFAR: c00800000e586af0 IRQMASK: 0
  GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
  GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
  GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
  GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
  GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
  GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
  NIP [c000003c7c00a000] 0xc000003c7c00a000
  LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
  Call Trace:
  [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
  [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
  [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
  [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
  [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
  [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80

Fixes: 8e3b6883 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

950bf4f1

RDMA/core: Ensure security pkey modify is not lost · 2d47fbac

Mike Marciniszyn authored Mar 13, 2020

The following modify sequence (loosely based on ipoib) will lose a pkey
modifcation:

- Modify (pkey index, port)
- Modify (new pkey index, NO port)

After the first modify, the qp_pps list will have saved the pkey and the
unit on the main list.

During the second modify, get_new_pps() will fetch the port from qp_pps
and read the new pkey index from qp_attr->pkey_index.  The state will
still be zero, or IB_PORT_PKEY_NOT_VALID. Because of the invalid state,
the new values will never replace the one in the qp pps list, losing the
new pkey.

This happens because the following if statements will never correct the
state because the first term will be false. If the code had been executed,
it would incorrectly overwrite valid values.

  if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
	  new_pps->main.state = IB_PORT_PKEY_VALID;

  if (!(qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) && qp_pps) {
	  new_pps->main.port_num = qp_pps->main.port_num;
	  new_pps->main.pkey_index = qp_pps->main.pkey_index;
	  if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
		  new_pps->main.state = IB_PORT_PKEY_VALID;
  }

Fix by joining the two if statements with an or test to see if qp_pps is
non-NULL and in the correct state.

Fixes: 1dd01788 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://lore.kernel.org/r/20200313124704.14982.55907.stgit@awfm-01.aw.intel.comReviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2d47fbac

MAINTAINERS: Clean RXE section and add Zhu as RXE maintainer · 1fa70778

Leon Romanovsky authored Mar 12, 2020

Zhu Yanjun contributed many patches to RXE and expressed genuine interest
in improve RXE even more. Let's add him as a maintainer.

Link: https://lore.kernel.org/r/20200312083658.29603-1-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

1fa70778

net/mlx5e: Do not recover from a non-fatal syndrome · 187a9830

Aya Levin authored Mar 19, 2020

For non-fatal syndromes like LOCAL_LENGTH_ERR, recovery shouldn't be
triggered. In these scenarios, the RQ is not actually in ERR state.
This misleads the recovery flow which assumes that the RQ is really in
error state and no more completions arrive, causing crashes on bad page
state.

Fixes: 8276ea13 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

187a9830

net/mlx5e: Fix ICOSQ recovery flow with Striding RQ · e239c6d6

Aya Levin authored Mar 16, 2020

In striding RQ mode, the buffers of an RX WQE are first
prepared and posted to the HW using a UMR WQEs via the ICOSQ.
We maintain the state of these in-progress WQEs in the RQ
SW struct.

In the flow of ICOSQ recovery, the corresponding RQ is not
in error state, hence:

- The buffers of the in-progress WQEs must be released
  and the RQ metadata should reflect it.
- Existing RX WQEs in the RQ should not be affected.

For this, wrap the dealloc of the in-progress WQEs in
a function, and use it in the ICOSQ recovery flow
instead of mlx5e_free_rx_descs().

Fixes: be5323c8 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

e239c6d6

net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset · 39369fd5

Aya Levin authored Mar 12, 2020

When resetting the RQ (moving RQ state from RST to RDY), the driver
resets the WQ's SW metadata.
In striding RQ mode, we maintain a field that reflects the actual
expected WQ head (including in progress WQEs posted to the ICOSQ).
It was mistakenly not reset together with the WQ. Fix this here.

Fixes: 8276ea13 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

39369fd5

net/mlx5e: Enhance ICOSQ WQE info fields · 1de0306c

Aya Levin authored Mar 09, 2020

Add number of WQEBBs (WQE's Basic Block) to WQE info struct. Set the
number of WQEBBs on WQE post, and increment the consumer counter (cc)
on completion.

In case of error completions, the cc was mistakenly not incremented,
keeping a gap between cc and pc (producer counter). This failed the
recovery flow on the ICOSQ from a CQE error which timed-out waiting for
the cc and pc to meet.

Fixes: be5323c8 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

1de0306c

net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure · 306f354c

Leon Romanovsky authored Mar 16, 2020

The cap_mask1 isn't protected by field_select and not listed among RW
fields, but it is required to be written to properly initialize ports
in IB virtualization mode.

Link: https://lore.kernel.org/linux-rdma/88bab94d2fd72f3145835b4518bc63dda587add6.camel@redhat.com
Fixes: ab118da4 ("net/mlx5: Don't write read-only fields in MODIFY_HCA_VPORT_CONTEXT command")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

306f354c

selftests: netfilter: add nfqueue test case · a64d558d

Florian Westphal authored Mar 23, 2020

Add a test case to check nf queue infrastructure.
Could be extended in the future to also cover serialization of
conntrack, uid and secctx attributes in nfqueue.

For now, this checks that 'queue bypass' works, that a queue rule with
no bypass option blocks traffic and that userspace receives the expected
number of packets.
For this we add two queues and hook all of
prerouting/input/forward/output/postrouting.

Packets get queued twice with a dummy base chain in between:
This passes with current nf tree, but reverting
commit 946c0d8e ("netfilter: nf_queue: fix reinject verdict handling")
makes this trip (it processes 30 instead of expected 20 packets).

v2: update config file with queue and other options missing/needed for
other tests.
v3: also test with tcp, this reveals problem with commit
28f8bfd1 ("netfilter: Support iif matches in POSTROUTING"), due to
skb->dev pointing at another skb in the retransmit rbtree (skb->dev
aliases to rbnode child).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a64d558d

netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress · bcfabee1

Pablo Neira Ayuso authored Mar 23, 2020

Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet.
Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress
path after leaving the ifb egress path.

This patch inconditionally sets on these two skb fields that are
meaningful to the ifb driver. The existing forward action is guaranteed
to run from ingress path.

Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bcfabee1

netfilter: nft_fwd_netdev: validate family and chain type · 76a109fa

Pablo Neira Ayuso authored Mar 23, 2020

Make sure the forward action is only used from ingress.

Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

76a109fa

netfilter: nft_set_rbtree: Detect partial overlaps on insertion · 7c84d414

Stefano Brivio authored Mar 22, 2020

...and return -ENOTEMPTY to the front-end in this case, instead of
proceeding. Currently, nft takes care of checking for these cases
and not sending them to the kernel, but if we drop the set_overlap()
call in nft we can end up in situations like:

 # nft add table t
 # nft add set t s '{ type inet_service ; flags interval ; }'
 # nft add element t s '{ 1 - 5 }'
 # nft add element t s '{ 6 - 10 }'
 # nft add element t s '{ 4 - 7 }'
 # nft list set t s
 table ip t {
 	set s {
 		type inet_service
 		flags interval
 		elements = { 1-3, 4-5, 6-7 }
 	}
 }

This change has the primary purpose of making the behaviour
consistent with nft_set_pipapo, but is also functional to avoid
inconsistent behaviour if userspace sends overlapping elements for
any reason.

v2: When we meet the same key data in the tree, as start element while
    inserting an end element, or as end element while inserting a start
    element, actually check that the existing element is active, before
    resetting the overlap flag (Pablo Neira Ayuso)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7c84d414

netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start() · 6f7c9caf

Stefano Brivio authored Mar 22, 2020

Replace negations of nft_rbtree_interval_end() with a new helper,
nft_rbtree_interval_start(), wherever this helps to visualise the
problem at hand, that is, for all the occurrences except for the
comparison against given flags in __nft_rbtree_get().

This gets especially useful in the next patch.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6f7c9caf

netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion · 0eb4b5ee

Stefano Brivio authored Mar 22, 2020

...and return -ENOTEMPTY to the front-end on collision, -EEXIST if
an identical element already exists. Together with the previous patch,
element collision will now be returned to the user as -EEXIST.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0eb4b5ee

netfilter: nf_tables: Allow set back-ends to report partial overlaps on insertion · 8c2d45b2

Pablo Neira Ayuso authored Mar 22, 2020

Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it
might indicate that a given element (including intervals) already exists as
such, or that the new element would clash with existing ones.

If identical elements already exist, the front-end is ignoring this without
returning error, in case NLM_F_EXCL is not set. However, if the new element
can't be inserted due an overlap, we should report this to the user.

To this purpose, allow set back-ends to return -ENOTEMPTY on collision with
existing elements, translate that to -EEXIST, and return that to userspace,
no matter if NLM_F_EXCL was set.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8c2d45b2

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 76ccd234

Linus Torvalds authored Mar 24, 2020

Pull perf tooling fixes from Ingo Molnar:
 "A handful of tooling fixes all across the map, no kernel changes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools headers uapi: Update linux/in.h copy
  perf probe: Do not depend on dwfl_module_addrsym()
  perf probe: Fix to delete multiple probe event
  perf parse-events: Fix reading of invalid memory in event parsing
  perf python: Fix clang detection when using CC=clang-version
  perf map: Fix off by one in strncpy() size argument
  tools: Let O= makes handle a relative path with -C option

76ccd234

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3f3ee43a

Linus Torvalds authored Mar 24, 2020

Pull x86 fix from Ingo Molnar:
 "A build fix with certain Kconfig combinations"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Fix CONFIG_EFI=n build

3f3ee43a

Merge tag 'dmaengine-fix-5.6' of git://git.infradead.org/users/vkoul/slave-dma · c6ac7188

Linus Torvalds authored Mar 24, 2020

Pull dmaengine fixes from Vinod Koul:
 "Late fixes in dmaengine for v5.6:

   - move .device_release missing log warning to debug

   - couple of maintainer entries for HiSilicon and IADX drivers

   - off-by-one fix for idxd driver

   - documentation warning fixes

   - TI k3 dma error handling fix"

* tag 'dmaengine-fix-5.6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: ti: k3-udma-glue: Fix an error handling path in 'k3_udma_glue_cfg_rx_flow()'
  MAINTAINERS: Add maintainer for HiSilicon DMA engine driver
  dmaengine: idxd: fix off by one on cdev dwq refcount
  MAINTAINERS: rectify the INTEL IADX DRIVER entry
  dmaengine: move .device_release missing log warning to debug level
  docs: dmaengine: provider.rst: get rid of some warnings

c6ac7188

gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model · 0c625ccf

Hans de Goede authored Mar 02, 2020

There are at least 3 models of the HP x2 10 models:

Bay Trail SoC + AXP288 PMIC
Cherry Trail SoC + AXP288 PMIC
Cherry Trail SoC + TI PMIC

Like on the other HP x2 10 models we need to ignore wakeup for ACPI GPIO
events on the external embedded-controller pin to avoid spurious wakeups
on the HP x2 10 CHT + AXP288 model too.

This commit adds an extra DMI based quirk for the HP x2 10 CHT + AXP288
model, ignoring wakeups for ACPI GPIO events on the EC interrupt pin
on this model. This fixes spurious wakeups from suspend on this model.

Fixes: aa23ca3d ("gpiolib: acpi: Add honor_wakeup module-option + quirk mechanism")
Reported-and-tested-by: Marc Lehmann <schmorp@schmorp.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200302111225.6641-4-hdegoede@redhat.comAcked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

0c625ccf

selftests/net/forwarding: add Makefile to install tests · 81573b18

Vadym Kochan authored Mar 23, 2020

Add missing Makefile for net/forwarding tests and include it to
the targets list, otherwise forwarding tests are not installed
in case of cross-compilation.
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

81573b18

ethtool: fix reference leak in some *_SET handlers · 2f599ec4

Michal Kubecek authored Mar 22, 2020

Andrew noticed that some handlers for *_SET commands leak a netdev
reference if required ethtool_ops callbacks do not exist. A simple
reproducer would be e.g.

  ip link add veth1 type veth peer name veth2
  ethtool -s veth1 wol g
  ip link del veth1

Make sure dev_put() is called when ethtool_ops check fails.

v2: add Fixes tags

Fixes: a53f3d41 ("ethtool: set link settings with LINKINFO_SET request")
Fixes: bfbcfe20 ("ethtool: set link modes related data with LINKMODES_SET request")
Fixes: e54d04e3 ("ethtool: set message mask with DEBUG_SET request")
Fixes: 8d425b19 ("ethtool: set wake-on-lan settings with WOL_SET request")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f599ec4

net: dsa: Fix duplicate frames flooded by learning · 0e62f543

Florian Fainelli authored Mar 22, 2020

When both the switch and the bridge are learning about new addresses,
switch ports attached to the bridge would see duplicate ARP frames
because both entities would attempt to send them.

Fixes: 5037d532 ("net: dsa: add Broadcom tag RX/TX handler")
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e62f543