Commits · 7a078d2d18801bba7bde7337a823d7342299acf7 · Kirill Smelkov / linux

02 Nov, 2020 1 commit

libbpf, hashmap: Fix undefined behavior in hash_bits · 7a078d2d

Ian Rogers authored Oct 29, 2020

If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.

Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.

Fixes: e3b92422 ("libbpf: add resizable non-thread safe internal hashmap")
Suggested-by: Andrii Nakryiko <andriin@fb.com>,
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com

7a078d2d

30 Oct, 2020 1 commit

bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE · 080b6f40

Ard Biesheuvel authored Oct 28, 2020

Commit 3193c083 ("bpf: Disable GCC -fgcse optimization for
___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
function scope __attribute__((optimize("-fno-gcse"))), to disable a
GCC specific optimization that was causing trouble on x86 builds, and
was not expected to have any positive effect in the first place.

However, as the GCC manual documents, __attribute__((optimize))
is not for production use, and results in all other optimization
options to be forgotten for the function in question. This can
cause all kinds of trouble, but in one particular reported case,
it causes -fno-asynchronous-unwind-tables to be disregarded,
resulting in .eh_frame info to be emitted for the function.

This reverts commit 3193c083, and instead, it disables the -fgcse
optimization for the entire source file, but only when building for
X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
original commit states that CONFIG_RETPOLINE=n triggers the issue,
whereas CONFIG_RETPOLINE=y performs better without the optimization,
so it is kept disabled in both cases.

Fixes: 3193c083 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20201028171506.15682-2-ardb@kernel.org

080b6f40

29 Oct, 2020 4 commits

tools, bpftool: Remove two unused variables. · 0698ac66

Ian Rogers authored Oct 27, 2020

Avoid an unused variable warning.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201027233646.3434896-2-irogers@google.com

0698ac66

tools, bpftool: Avoid array index warnings. · 1e6f5dcc

Ian Rogers authored Oct 27, 2020

The bpf_caps array is shorter without CAP_BPF, avoid out of bounds reads
if this isn't defined. Working around this avoids -Wno-array-bounds with
clang.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201027233646.3434896-1-irogers@google.com

1e6f5dcc

xsk: Fix possible memory leak at socket close · e5e1a4bc

Magnus Karlsson authored Oct 27, 2020

Fix a possible memory leak at xsk socket close that is caused by the
refcounting of the umem object being wrong. The reference count of the
umem was decremented only after the pool had been freed. Note that if
the buffer pool is destroyed, it is important that the umem is
destroyed after the pool, otherwise the umem would disappear while the
driver is still running. And as the buffer pool needs to be destroyed
in a work queue, the umem is also (if its refcount reaches zero)
destroyed after the buffer pool in that same work queue.

What was missing is that the refcount also needs to be decremented
when the pool is not freed and when the pool has not even been
created. The first case happens when the refcount of the pool is
higher than 1, i.e. it is still being used by some other socket using
the same device and queue id. In this case, it is safe to decrement
the refcount of the umem outside of the work queue as the umem will
never be freed because the refcount of the umem is always greater than
or equal to the refcount of the buffer pool. The second case is if the
buffer pool has not been created yet, i.e. the socket was closed
before it was bound but after the umem was created. In this case, it
is safe to destroy the umem outside of the work queue, since there is
no pool that can use it by definition.

Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
Reported-by: syzbot+eb71df123dc2be2c1456@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1603801921-2712-1-git-send-email-magnus.karlsson@gmail.com

e5e1a4bc

bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs · 821f5c90

Andrii Nakryiko authored Oct 28, 2020

Forward-declare struct bpf_redir_neigh in bpf_helper_defs.h to avoid
compiler warning about unknown structs.

Fixes: ba452c9e ("bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20201028181204.111241-1-andrii@kernel.org

821f5c90

27 Oct, 2020 2 commits

samples/bpf: Set rlimit for memlock to infinity in all samples · c66dca98

Toke Høiland-Jørgensen authored Oct 27, 2020

The memlock rlimit is a notorious source of failure for BPF programs. Most
of the samples just set it to infinity, but a few used a lower limit. The
problem with unconditionally setting a lower limit is that this will also
override the limit if the system-wide setting is *higher* than the limit
being set, which can lead to failures on systems that lock a lot of memory,
but set 'ulimit -l' to unlimited before running a sample.

One fix for this is to only conditionally set the limit if the current
limit is lower, but it is simpler to just unify all the samples and have
them all set the limit to infinity.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20201026233623.91728-1-toke@redhat.com

c66dca98

bpf: Fix -Wshadow warnings · 343a3e8b

Arnd Bergmann authored Oct 26, 2020

There are thousands of warnings about one macro in a W=2 build:

  include/linux/filter.h:561:6: warning: declaration of 'ret' shadows a previous local [-Wshadow]

Prefix all the locals in that macro with __ to avoid most of
these warnings.

Fixes: 492ecee8 ("bpf: enable program stats")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201026162110.3710415-1-arnd@kernel.org

343a3e8b

26 Oct, 2020 1 commit

selftest/bpf: Fix profiler test using CO-RE relocation for enums · 47254777

Andrii Nakryiko authored Oct 22, 2020

Instead of hard-coding invalid pids_cgrp_id, use Kconfig to detect the
presence of that enum value and CO-RE to capture its actual value in the
hosts's kernel.

Fixes: 03d4d13f ("selftests/bpf: Add profiler test")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201022202739.3667367-1-andrii@kernel.org

47254777

24 Oct, 2020 8 commits

tcp: Prevent low rmem stalls with SO_RCVLOWAT. · 435ccfa8

Arjun Roy authored Oct 23, 2020

With SO_RCVLOWAT, under memory pressure,
it is possible to enter a state where:

1. We have not received enough bytes to satisfy SO_RCVLOWAT.
2. We have not entered buffer pressure (see tcp_rmem_pressure()).
3. But, we do not have enough buffer space to accept more packets.

In this case, we advertise 0 rwnd (due to #3) but the application does
not drain the receive queue (no wakeup because of #1 and #2) so the
flow stalls.

Modify the heuristic for SO_RCVLOWAT so that, if we are advertising
rwnd<=rcv_mss, force a wakeup to prevent a stall.

Without this patch, setting tcp_rmem to 6143 and disabling TCP
autotune causes a stalled flow. With this patch, no stall occurs. This
is with RPC-style traffic with large messages.

Fixes: 03f45c88 ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

435ccfa8

net: ucc_geth: Drop extraneous parentheses in comparison · dab23422

Michael Ellerman authored Oct 23, 2020

Clang warns about the extra parentheses in this comparison:

  drivers/net/ethernet/freescale/ucc_geth.c:1361:28:
  warning: equality comparison with extraneous parentheses
    if ((ugeth->phy_interface == PHY_INTERFACE_MODE_SGMII))
         ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

It seems clear the intent here is to do a comparison not an
assignment, so drop the extra parentheses to avoid any confusion.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023033236.3296988-1-mpe@ellerman.id.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dab23422

Merge branch 'ionic-memory-usage-fixes' · 0c3b7f4b

Jakub Kicinski authored Oct 23, 2020

Shannon Nelson says:

====================
ionic: memory usage fixes

This patchset addresses some memory leaks and incorrect
io reads.
====================

Link: https://lore.kernel.org/r/20201022235531.65956-1-snelson@pensando.ioSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0c3b7f4b

ionic: fix mem leak in rx_empty · 0c32a28e

Shannon Nelson authored Oct 22, 2020

The sentinel descriptor entry was getting missed in the
traverse of the ring from head to tail, so change to a
loop of 0 to the end.

Fixes: f1d2e894 ("ionic: use index not pointer for queue tracking")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0c32a28e

ionic: no rx flush in deinit · 43ecf7b4

Shannon Nelson authored Oct 22, 2020

Kmemleak pointed out to us that ionic_rx_flush() is sending
skbs into napi_gro_XXX with a disabled napi context, and these
end up getting lost and leaked.  We can safely remove the flush.

Fixes: 0f3154e6 ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

43ecf7b4

ionic: clean up sparse complaints · d701ec32

Shannon Nelson authored Oct 22, 2020

The sparse complaints around the static_asserts were obscuring
more useful complaints.  So, don't check the static_asserts,
and fix the remaining sparse complaints.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d701ec32

chelsio/chtls: fix tls record info to user · 4f3391ce

Vinay Kumar Yadav authored Oct 23, 2020

chtls_pt_recvmsg() receives a skb with tls header and subsequent
skb with data, need to finalize the data copy whenever next skb
with tls header is available. but here current tls header is
overwritten by next available tls header, ends up corrupting
user buffer data. fixing it by finalizing current record whenever
next skb contains tls header.

v1->v2:
- Improved commit message.

Fixes: 17a7d24a ("crypto: chtls - generic handling of data and hdr")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4f3391ce

net: ipa: command payloads already mapped · df833050

Alex Elder authored Oct 21, 2020

IPA transactions describe actions to be performed by the IPA
hardware. Three cases use IPA transactions: transmitting a socket
buffer; providing a page to receive packet data; and issuing an IPA
immediate command. An IPA transaction contains a scatter/gather
list (SGL) to hold the set of actions to be performed.

We map buffers in the SGL for DMA at the time they are added to the
transaction. For skb TX transactions, we fill the SGL with a call
to skb_to_sgvec(). Page RX transactions involve a single page
pointer, and that is recorded in the SGL with sg_set_page(). In
both of these cases we then map the SGL for DMA with a call to
dma_map_sg().

Immediate commands are different. The payload for an immediate
command comes from a region of coherent DMA memory, which must
*not* be mapped for DMA. For that reason, gsi_trans_cmd_add()
sort of hand-crafts each SGL entry added to a command transaction.

This patch fixes a problem with the code that crafts the SGL entry
for an immediate command. Previously a portion of the SGL entry was
updated using sg_set_buf(). However this is not valid because it
includes a call to virt_to_page() on the buffer, but the command
buffer pointer is not a linear address.

Since we never actually map the SGL for command transactions, there
are very few fields in the SGL we need to fill. Specifically, we
only need to record the DMA address and the length, so they can be
used by __gsi_trans_commit() to fill a TRE. We additionally need to
preserve the SGL flags so for_each_sg() still works. For that we
can simply assign a null page pointer for command SGL entries.

Fixes: 9dd441e4 ("soc: qcom: ipa: GSI transactions")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201022010029.11877-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

df833050

23 Oct, 2020 23 commits

Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 3cb12d27

Linus Torvalds authored Oct 23, 2020

Pull networking fixes from Jakub Kicinski:
 "Cross-tree/merge window issues:

   - rtl8150: don't incorrectly assign random MAC addresses; fix late in
     the 5.9 cycle started depending on a return code from a function
     which changed with the 5.10 PR from the usb subsystem

  Current release regressions:

   - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
     crashes at probe when control vq was not negotiated/available

  Previous release regressions:

   - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
     bus, only first device would be probed correctly

   - nexthop: Fix performance regression in nexthop deletion by
     effectively switching from recently added synchronize_rcu() to
     synchronize_rcu_expedited()

   - netsec: ignore 'phy-mode' device property on ACPI systems; the
     property is not populated correctly by the firmware, but firmware
     configures the PHY so just keep boot settings

  Previous releases - always broken:

   - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
     bulk transfers getting "stuck"

   - icmp: randomize the global rate limiter to prevent attackers from
     getting useful signal

   - r8169: fix operation under forced interrupt threading, make the
     driver always use hard irqs, even on RT, given the handler is light
     and only wants to schedule napi (and do so through a _irqoff()
     variant, preferably)

   - bpf: Enforce pointer id generation for all may-be-null register
     type to avoid pointers erroneously getting marked as null-checked

   - tipc: re-configure queue limit for broadcast link

   - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
     tunnels

   - fix various issues in chelsio inline tls driver

  Misc:

   - bpf: improve just-added bpf_redirect_neigh() helper api to support
     supplying nexthop by the caller - in case BPF program has already
     done a lookup we can avoid doing another one

   - remove unnecessary break statements

   - make MCTCP not select IPV6, but rather depend on it"

* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  tcp: fix to update snd_wl1 in bulk receiver fast path
  net: Properly typecast int values to set sk_max_pacing_rate
  netfilter: nf_fwd_netdev: clear timestamp in forwarding path
  ibmvnic: save changed mac address to adapter->mac_addr
  selftests: mptcp: depends on built-in IPv6
  Revert "virtio-net: ethtool configurable RXCSUM"
  rtnetlink: fix data overflow in rtnl_calcit()
  net: ethernet: mtk-star-emac: select REGMAP_MMIO
  net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
  net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
  bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
  bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
  bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
  mptcp: depends on IPV6 but not as a module
  sfc: move initialisation of efx->filter_sem to efx_init_struct()
  mpls: load mpls_gso after mpls_iptunnel
  net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
  net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
  net: dsa: bcm_sf2: make const array static, makes object smaller
  mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
  ...

3cb12d27

Merge tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 0adc313c

Linus Torvalds authored Oct 23, 2020

Pull gfs2 updates from Andreas Gruenbacher:

 - Use iomap for non-journaled buffered I/O. This largely eliminates
   buffer heads on filesystems where the block size matches the page
   size. Many thanks to Christoph Hellwig for this patch!

 - Fixes for some more journaled data filesystem bugs, found by running
   xfstests with data journaling on for all files (chattr +j $MNT) (Bob
   Peterson)

 - gfs2_evict_inode refactoring (Bob Peterson)

 - Use the statfs data in the journal during recovery instead of reading
   it in from the local statfs inodes (Abhi Das)

 - Several other minor fixes by various people

* tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
  gfs2: Recover statfs info in journal head
  gfs2: lookup local statfs inodes prior to journal recovery
  gfs2: Add fields for statfs info in struct gfs2_log_header_host
  gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
  gfs2: Eliminate gl_vm
  gfs2: Only access gl_delete for iopen glocks
  gfs2: Fix comments to glock_hash_walk
  gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
  gfs2: Ignore journal log writes for jdata holes
  gfs2: simplify gfs2_block_map
  gfs2: Only set PageChecked if we have a transaction
  gfs2: don't lock sd_ail_lock in gfs2_releasepage
  gfs2: make gfs2_ail1_empty_one return the count of active items
  gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
  gfs2: enhance log_blocks trace point to show log blocks free
  gfs2: add missing log_blocks trace points in gfs2_write_revokes
  gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
  gfs2: add validation checks for size of superblock
  gfs2: use-after-free in sysfs deregistration
  gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
  ...

0adc313c

Merge tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6 · 0613ed91

Linus Torvalds authored Oct 23, 2020

Pull cifs updates from Steve French:

 - add support for recognizing special file types (char/block/fifo/
   symlink) for files created by Linux on WSL (a format we plan to move
   to as the default for creating special files on Linux, as it has
   advantages over the other current option, the SFU format) in readdir.

 - fix double queries to root directory when directory leases not
   supported (e.g. Samba)

 - fix querying mode bits (modefromsid mount option) for special file
   types

 - stronger encryption (gcm256), disabled by default until tested more
   broadly

 - allow querying owner when server reports 'well known SID' on query
   dir with SMB3.1.1 POSIX extensions

* tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (30 commits)
  SMB3: add support for recognizing WSL reparse tags
  cifs: remove bogus debug code
  smb3.1.1: fix typo in compression flag
  cifs: move smb version mount options into fs_context.c
  cifs: move cache mount options to fs_context.ch
  cifs: move security mount options into fs_context.ch
  cifs: add files to host new mount api
  smb3: do not try to cache root directory if dir leases not supported
  smb3: fix stat when special device file and mounted with modefromsid
  cifs: Print the address and port we are connecting to in generic_ip_connect()
  SMB3: Resolve data corruption of TCP server info fields
  cifs: make const array static, makes object smaller
  SMB3.1.1: Fix ids returned in POSIX query dir
  smb3: add dynamic trace point to trace when credits obtained
  smb3.1.1: do not fail if no encryption required but server doesn't support it
  cifs: Return the error from crypt_message when enc/dec key not found.
  smb3.1.1: set gcm256 when requested
  smb3.1.1: rename nonces used for GCM and CCM encryption
  smb3.1.1: print warning if server does not support requested encryption type
  smb3.1.1: add new module load parm enable_gcm_256
  ...

0613ed91

Merge tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c4728cfb

Linus Torvalds authored Oct 23, 2020

Pull clone/dedupe/remap code refactoring from Darrick Wong:
 "Move the generic file range remap (aka reflink and dedupe) functions
  out of mm/filemap.c and fs/read_write.c and into fs/remap_range.c to
  reduce clutter in the first two files"

* tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  vfs: move the generic write and copy checks out of mm
  vfs: move the remap range helpers to remap_range.c
  vfs: move generic_remap_checks out of mm

c4728cfb

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f9a705ad

Linus Torvalds authored Oct 23, 2020

Pull KVM updates from Paolo Bonzini:
 "For x86, there is a new alternative and (in the future) more scalable
  implementation of extended page tables that does not need a reverse
  map from guest physical addresses to host physical addresses.

  For now it is disabled by default because it is still lacking a few of
  the existing MMU's bells and whistles. However it is a very solid
  piece of work and it is already available for people to hammer on it.

  Other updates:

  ARM:
   - New page table code for both hypervisor and guest stage-2
   - Introduction of a new EL2-private host context
   - Allow EL2 to have its own private per-CPU variables
   - Support of PMU event filtering
   - Complete rework of the Spectre mitigation

  PPC:
   - Fix for running nested guests with in-kernel IRQ chip
   - Fix race condition causing occasional host hard lockup
   - Minor cleanups and bugfixes

  x86:
   - allow trapping unknown MSRs to userspace
   - allow userspace to force #GP on specific MSRs
   - INVPCID support on AMD
   - nested AMD cleanup, on demand allocation of nested SVM state
   - hide PV MSRs and hypercalls for features not enabled in CPUID
   - new test for MSR_IA32_TSC writes from host and guest
   - cleanups: MMU, CPUID, shared MSRs
   - LAPIC latency optimizations ad bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
  kvm: x86/mmu: NX largepage recovery for TDP MMU
  kvm: x86/mmu: Don't clear write flooding count for direct roots
  kvm: x86/mmu: Support MMIO in the TDP MMU
  kvm: x86/mmu: Support write protection for nesting in tdp MMU
  kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
  kvm: x86/mmu: Support dirty logging for the TDP MMU
  kvm: x86/mmu: Support changed pte notifier in tdp MMU
  kvm: x86/mmu: Add access tracking for tdp_mmu
  kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
  kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
  kvm: x86/mmu: Add TDP MMU PF handler
  kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
  kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
  KVM: Cache as_id in kvm_memory_slot
  kvm: x86/mmu: Add functions to handle changed TDP SPTEs
  kvm: x86/mmu: Allocate and free TDP MMU roots
  kvm: x86/mmu: Init / Uninit the TDP MMU
  kvm: x86/mmu: Introduce tdp_iter
  KVM: mmu: extract spte.h and spte.c
  KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
  ...

f9a705ad

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 9313f802

Linus Torvalds authored Oct 23, 2020

Pull virtio updates from Michael Tsirkin:
 "vhost, vdpa, and virtio cleanups and fixes

  A very quiet cycle, no new features"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS: add URL for virtio-mem
  vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
  vringh: fix __vringh_iov() when riov and wiov are different
  vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
  s390: virtio: PV needs VIRTIO I/O device protection
  virtio: let arch advertise guest's memory access restrictions
  vhost_vdpa: Fix duplicate included kernel.h
  vhost: reduce stack usage in log_used
  virtio-mem: Constify mem_id_table
  virtio_input: Constify id_table
  virtio-balloon: Constify id_table
  vdpa/mlx5: Fix failure to bring link up
  vdpa/mlx5: Make use of a specific 16 bit endianness API

9313f802

Merge tag 'tag-chrome-platform-for-v5.10' of... · 090a7d04

Linus Torvalds authored Oct 23, 2020

Merge tag 'tag-chrome-platform-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform updates from Benson Leung:
 "cros-ec:
   - Error code cleanup across cros-ec by Guenter
   - Remove cros_ec_cmd_xfer in favor of cros_ec_cmd_xfer_status

  cros_ec_typec:
   - Landed initial USB4 support in typec connector class driver for
     cros_ec
   - Role switch bugfix on disconnect, and reordering configuration
     steps

  cros_ec_lightbar:
   - Fix buffer outsize and result for get_lightbar_version

  misc:
   - Remove config MFD_CROS_EC, now that transition from MFD is complete
   - Enable KEY_LEFTMETA in new location on arm based cros-ec-keyboard
     keymap"

* tag 'tag-chrome-platform-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  ARM: dts: cros-ec-keyboard: Add alternate keymap for KEY_LEFTMETA
  platform/chrome: Use kobj_to_dev() instead of container_of()
  platform/chrome: cros_ec_proto: Drop cros_ec_cmd_xfer()
  platform/chrome: cros_ec_proto: Update cros_ec_cmd_xfer() call-sites
  platform/chrome: Kconfig: Remove the transitional MFD_CROS_EC config
  platform/chrome: cros_ec_lightbar: Reduce ligthbar get version command
  platform/chrome: cros_ec_trace: Add fields to command traces
  platform/chrome: cros_ec_typec: Re-order connector configuration steps
  platform/chrome: cros_ec_typec: Avoid setting usb role twice during disconnect
  platform/chrome: cros_ec_typec: Send enum values to usb_role_switch_set_role()
  platform/chrome: cros_ec_typec: USB4 support
  pwm: cros-ec: Simplify EC error handling
  platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes
  platform/input: cros_ec: Replace -ENOTSUPP with -ENOPROTOOPT
  pwm: cros-ec: Accept more error codes from cros_ec_cmd_xfer_status
  platform/chrome: cros_ec_sysfs: Report range of error codes from EC
  cros_ec_lightbar: Accept more error codes from cros_ec_cmd_xfer_status
  iio: cros_ec: Accept -EOPNOTSUPP as 'not supported' error code

090a7d04

Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block · 4a22709e

Linus Torvalds authored Oct 23, 2020

Pull arch task_work cleanups from Jens Axboe:
 "Two cleanups that don't fit other categories:

   - Finally get the task_work_add() cleanup done properly, so we don't
     have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
     all callers, and also fixes up the documentation for
     task_work_add().

   - While working on some TIF related changes for 5.11, this
     TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
     duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
  task_work: cleanup notification modes
  tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

4a22709e

Merge tag 'arc-5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 0a14d764

Linus Torvalds authored Oct 23, 2020

Pull ARC fix from Vineet Gupta:
 "I found a snafu in perf driver which made it into 5.9-rc4 and the fix
  should go in now than wait"

* tag 'arc-5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: perf: redo the pct irq missing in device-tree handling

0a14d764

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 032c7ed9

Linus Torvalds authored Oct 23, 2020

Pull more arm64 updates from Will Deacon:
 "A small selection of further arm64 fixes and updates. Most of these
  are fixes that came in during the merge window, with the exception of
  the HAVE_MOVE_PMD mremap() speed-up which we discussed back in 2018
  and somehow forgot to enable upstream.

   - Improve performance of Spectre-v2 mitigation on Falkor CPUs (if
     you're lucky enough to have one)

   - Select HAVE_MOVE_PMD. This has been shown to improve mremap()
     performance, which is used heavily by the Android runtime GC, and
     it seems we forgot to enable this upstream back in 2018.

   - Ensure linker flags are consistent between LLVM and BFD

   - Fix stale comment in Spectre mitigation rework

   - Fix broken copyright header

   - Fix KASLR randomisation of the linear map

   - Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)"

Link: https://lore.kernel.org/kvmarm/20181108181201.88826-3-joelaf@google.com/

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: proton-pack: Update comment to reflect new function name
  arm64: spectre-v2: Favour CPU-specific mitigation at EL2
  arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
  arm64: Fix a broken copyright header in gen_vdso_offsets.sh
  arm64: mremap speedup - Enable HAVE_MOVE_PMD
  arm64: mm: use single quantity to represent the PA to VA translation
  arm64: reject prctl(PR_PAC_RESET_KEYS) on compat tasks

032c7ed9

gfs2: Recover statfs info in journal head · bedb0f05

Abhi Das authored Oct 20, 2020

Apply the outstanding statfs changes in the journal head to the
master statfs file. Zero out the local statfs file for good measure.

Previously, statfs updates would be read in from the local statfs inode and
synced to the master statfs inode during recovery.

We now use the statfs updates in the journal head to update the master statfs
inode instead of reading in from the local statfs inode. To preserve backward
compatibility with kernels that can't do this, we still need to keep the
local statfs inode up to date by writing changes to it. At some point in the
future, we can do away with the local statfs inodes altogether and keep the
statfs changes solely in the journal.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

bedb0f05

gfs2: lookup local statfs inodes prior to journal recovery · 97fd734b

Abhi Das authored Oct 20, 2020

We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

97fd734b

kvm: x86/mmu: NX largepage recovery for TDP MMU · 29cf0f50

Ben Gardon authored Oct 14, 2020

When KVM maps a largepage backed region at a lower level in order to
make it executable (i.e. NX large page shattering), it reduces the TLB
performance of that region. In order to avoid making this degradation
permanent, KVM must periodically reclaim shattered NX largepages by
zapping them and allowing them to be rebuilt in the page fault handler.

With this patch, the TDP MMU does not respect KVM's rate limiting on
reclaim. It traverses the entire TDP structure every time. This will be
addressed in a future patch.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-21-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29cf0f50

kvm: x86/mmu: Don't clear write flooding count for direct roots · daa5b6c1

Ben Gardon authored Oct 14, 2020

Direct roots don't have a write flooding count because the guest can't
affect that paging structure. Thus there's no need to clear the write
flooding count on a fast CR3 switch for direct roots.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-20-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

daa5b6c1

kvm: x86/mmu: Support MMIO in the TDP MMU · 95fb5b02

Ben Gardon authored Oct 14, 2020

In order to support MMIO, KVM must be able to walk the TDP paging
structures to find mappings for a given GFN. Support this walk for
the TDP MMU.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

v2: Thanks to Dan Carpenter and kernel test robot for finding that root
was used uninitialized in get_mmio_spte.
Signed-off-by: Ben Gardon <bgardon@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20201014182700.2888246-19-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

95fb5b02

kvm: x86/mmu: Support write protection for nesting in tdp MMU · 46044f72

Ben Gardon authored Oct 14, 2020

To support nested virtualization, KVM will sometimes need to write
protect pages which are part of a shadowed paging structure or are not
writable in the shadowed paging structure. Add a function to write
protect GFN mappings for this purpose.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-18-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46044f72

kvm: x86/mmu: Support disabling dirty logging for the tdp MMU · 14881998

Ben Gardon authored Oct 14, 2020

Dirty logging ultimately breaks down MMU mappings to 4k granularity.
When dirty logging is no longer needed, these granaular mappings
represent a useless performance penalty. When dirty logging is disabled,
search the paging structure for mappings that could be re-constituted
into a large page mapping. Zap those mappings so that they can be
faulted in again at a higher mapping level.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-17-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

14881998

kvm: x86/mmu: Support dirty logging for the TDP MMU · a6a0b05d

Ben Gardon authored Oct 14, 2020

Dirty logging is a key feature of the KVM MMU and must be supported by
the TDP MMU. Add support for both the write protection and PML dirty
logging modes.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-16-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a6a0b05d

kvm: x86/mmu: Support changed pte notifier in tdp MMU · 1d8dd6b3

Ben Gardon authored Oct 14, 2020

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
a hook and handle the change_pte MMU notifier.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-15-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1d8dd6b3

kvm: x86/mmu: Add access tracking for tdp_mmu · f8e14497

Ben Gardon authored Oct 14, 2020

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. The
main Linux MM uses the access tracking MMU notifiers for swap and other
features. Add hooks to handle the test/flush HVA (range) family of
MMU notifiers.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-14-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f8e14497

kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU · 063afacd

Ben Gardon authored Oct 14, 2020

In order to interoperate correctly with the rest of KVM and other Linux
subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
hooks to handle the invalidate range family of MMU notifiers.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-13-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

063afacd

kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU · 89c0fd49

Ben Gardon authored Oct 14, 2020

Attach struct kvm_mmu_pages to every page in the TDP MMU to track
metadata, facilitate NX reclaim, and enable inproved parallelism of MMU
operations in future patches.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-12-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

89c0fd49

kvm: x86/mmu: Add TDP MMU PF handler · bb18842e

Ben Gardon authored Oct 14, 2020

Add functions to handle page faults in the TDP MMU. These page faults
are currently handled in much the same way as the x86 shadow paging
based MMU, however the ordering of some operations is slightly
different. Future patches will add eager NX splitting, a fast page fault
handler, and parallel page faults.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-11-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bb18842e