Commits · 3e3eabe26dc88692d34cf76ca0e0dd331481cc15 · Kirill Smelkov / linux

17 Sep, 2023 8 commits

x86/boot: Increase section and file alignment to 4k/512 · 3e3eabe2

Ard Biesheuvel authored Sep 15, 2023

Align x86 with other EFI architectures, and increase the section
alignment to the EFI page size (4k), so that firmware is able to honour
the section permission attributes and map code read-only and data
non-executable.

There are a number of requirements that have to be taken into account:
- the sign tools get cranky when there are gaps between sections in the
  file view of the image
- the virtual offset of each section must be aligned to the image's
  section alignment
- the file offset *and size* of each section must be aligned to the
  image's file alignment
- the image size must be aligned to the section alignment
- each section's virtual offset must be greater than or equal to the
  size of the headers.

In order to meet all these requirements, while avoiding the need for
lots of padding to accommodate the .compat section, the latter is placed
at an arbitrary offset towards the end of the image, but aligned to the
minimum file alignment (512 bytes). The space before the .text section
is therefore distributed between the PE header, the .setup section and
the .compat section, leaving no gaps in the file coverage, making the
signing tools happy.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-18-ardb@google.com
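
As a rough illustration of the alignment rules listed in this commit, here is a minimal Python sketch that checks a section table against them. The Section class, the check_layout() helper and the example offsets are hypothetical. They are not taken from the kernel's header.S or build tool. The "no gaps" rule is modeled here as file coverage that continues contiguously from the end of the headers, which is an assumption.

```python
from dataclasses import dataclass

SECTION_ALIGN = 0x1000  # EFI page size (4 KiB)
FILE_ALIGN = 0x200      # minimum PE file alignment (512 bytes)


@dataclass
class Section:
    name: str
    virt_addr: int   # section RVA (VirtualAddress)
    raw_offset: int  # file offset (PointerToRawData)
    raw_size: int    # size in the file (SizeOfRawData)


def check_layout(sections, headers_size, image_size):
    """Return a list of violated rules; an empty list means the layout is OK."""
    problems = []
    if image_size % SECTION_ALIGN:
        problems.append("image size not aligned to section alignment")
    next_offset = headers_size  # file coverage must continue without gaps
    for s in sorted(sections, key=lambda sec: sec.raw_offset):
        if s.virt_addr % SECTION_ALIGN:
            problems.append(f"{s.name}: virtual offset not section-aligned")
        if s.raw_offset % FILE_ALIGN or s.raw_size % FILE_ALIGN:
            problems.append(f"{s.name}: file offset/size not file-aligned")
        if s.virt_addr < headers_size:
            problems.append(f"{s.name}: virtual offset below headers size")
        if s.raw_offset != next_offset:
            problems.append(f"{s.name}: gap or overlap in file view")
        next_offset = s.raw_offset + s.raw_size
    return problems


# Illustrative layout: 512-byte headers, then .setup, .text and .data.
layout = [
    Section(".setup", 0x1000, 0x0200, 0x0E00),
    Section(".text", 0x2000, 0x1000, 0x3000),
    Section(".data", 0x5000, 0x4000, 0x0200),
]
print(check_layout(layout, headers_size=0x200, image_size=0x6000))  # []
```

In this example, the virtual and file offsets of each section differ. That is allowed: only the alignment and coverage rules have to hold.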

x86/boot: Split off PE/COFF .data section · 34951f3c

Ard Biesheuvel authored Sep 15, 2023

Describe the code and data of the decompressor binary using separate
.text and .data PE/COFF sections, so that we will be able to map them
using restricted permissions once we increase the section and file
alignment sufficiently. This avoids the need for memory mappings that
are writable and executable at the same time, which is something that
is best avoided for security reasons.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-17-ardb@google.com
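
To make the writable-and-executable point concrete, the following Python sketch checks PE/COFF section Characteristics words for mappings that are both writable and executable. The IMAGE_SCN_* values are the standard ones from the PE/COFF specification. The combined "before" section and the helper are illustrative assumptions, not the kernel's actual header values.

```python
# Standard section characteristic flags from the PE/COFF specification.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

WX = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE


def is_writable_and_executable(characteristics: int) -> bool:
    """True if a section requests a mapping that is both writable and executable."""
    return (characteristics & WX) == WX


# One section covering both code and data (illustrative "before" layout).
combined = (IMAGE_SCN_CNT_CODE | IMAGE_SCN_CNT_INITIALIZED_DATA
            | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE)

# Separate .text and .data sections (illustrative "after" layout).
text = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE
data = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE

for name, flags in [("combined", combined), (".text", text), (".data", data)]:
    print(f"{name:9} 0x{flags:08x} W+X={is_writable_and_executable(flags)}")
```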

x86/boot: Drop PE/COFF .reloc section · fa575052

Ard Biesheuvel authored Sep 15, 2023

Ancient buggy EFI loaders may have required a .reloc section to be
present at some point in time, but this has not been true for a long
time so the .reloc section can just be dropped.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-16-ardb@google.com

x86/boot: Construct PE/COFF .text section from assembler · efa089e6

Ard Biesheuvel authored Sep 15, 2023

Now that the size of the setup block is visible to the assembler, it is
possible to populate the PE/COFF header fields from the asm code
directly, instead of poking the values into the binary using the build
tool. This will make it easier to reorganize the section layout without
having to tweak the build tool in lockstep.

This change has no impact on the resulting bzImage binary.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-15-ardb@google.com

x86/boot: Derive file size from _edata symbol · aeb92067

Ard Biesheuvel authored Sep 15, 2023

Tweak the linker script so that the value of _edata represents the
decompressor binary's file size rounded up to the appropriate alignment.
This removes the need to calculate it in the build tool, and will make
it easier to refer to the file size from the header directly in
subsequent changes to the PE header layout.

While adding _edata to the sed regex that parses the compressed
vmlinux's symbol list, tweak the regex a bit for conciseness.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_STUB=y.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-14-ardb@google.com

x86/boot: Define setup size in linker script · 093ab258

Ard Biesheuvel authored Sep 15, 2023

The setup block contains the real mode startup code that is used when
booting from a legacy BIOS, along with the boot_params/setup_data that
is used by legacy x86 bootloaders to pass the command line and initial
ramdisk parameters, among other things.

The setup block also contains the PE/COFF header of the entire combined
image, which includes the compressed kernel image, the decompressor and
the EFI stub.

This PE header describes the layout of the executable image in memory,
and currently, the fact that the setup block precedes it makes it rather
fiddly to get the right values into the right place in the final image.

Let's make things a bit easier by defining the setup_size in the linker
script so it can be referenced from the asm code directly, rather than
having to rely on the build tool to calculate it. For the time being,
add 64 bytes of fixed padding for the .reloc and .compat sections - this
will be removed in a subsequent patch after the PE/COFF header has been
reorganized.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_MIXED=y.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-13-ardb@google.com

x86/boot: Set EFI handover offset directly in header asm · eac95634

Ard Biesheuvel authored Sep 15, 2023

The offsets of the EFI handover entrypoints are available to the
assembler when constructing the header, so there is no need to set them
from the build tool afterwards.

This change has no impact on the resulting bzImage binary.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-12-ardb@google.com

x86/boot: Grab kernel_info offset from zoffset header directly · 2e765c02

Ard Biesheuvel authored Sep 15, 2023

Instead of parsing zoffset.h and poking the kernel_info offset value
into the header from the build tool, just grab the value directly in the
asm file that describes this header.

This change has no impact on the resulting bzImage binary.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915171623.655440-11-ardb@google.com

15 Sep, 2023 7 commits

x86/boot: Drop references to startup_64 · b618d31f

Ard Biesheuvel authored Sep 12, 2023

The x86 boot image generation tool assigns a default value to startup_64
and subsequently parses the actual value from zoffset.h, but it never
actually uses the value anywhere. So remove this code.

This change has no impact on the resulting bzImage binary.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-25-ardb@google.com

x86/boot: Drop redundant code setting the root device · 7448e8e5

Ard Biesheuvel authored Sep 12, 2023

The root device defaults to 0,0 and is no longer configurable at build
time [0], so there is no need for the build tool to ever write to this
field.

[0] 079f85e6 ("x86, build: Do not set the root_dev field in bzImage")

This change has no impact on the resulting bzImage binary.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-23-ardb@google.com

x86/boot: Omit compression buffer from PE/COFF image memory footprint · 8eace5b3

Ard Biesheuvel authored Sep 12, 2023

Now that the EFI stub decompresses the kernel and hands over to the
decompressed image directly, there is no longer a need to provide a
decompression buffer as part of the .BSS allocation of the PE/COFF
image. It also means the PE/COFF image can be loaded anywhere in memory,
and setting the preferred image base is unnecessary. So drop the
handling of this from the header and from the build tool.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-22-ardb@google.com

x86/boot: Remove the 'bugger off' message · 768171d7

Ard Biesheuvel authored Sep 12, 2023

Ancient (pre-2003) x86 kernels could boot from a floppy disk straight from
the BIOS, using a small real mode boot stub at the start of the image
where the BIOS would expect the boot record (or boot block) to appear.

Due to its limitations (kernel size < 1 MiB, no support for IDE, USB or
El Torito floppy emulation), this support was dropped, and a Linux-aware
bootloader is now always required to boot the kernel from a legacy BIOS.

To smooth this transition, the boot stub was not removed entirely, but
replaced with one that just prints an error message telling the user to
install a bootloader.

As it is unlikely that anyone doing direct floppy boot with such an
ancient kernel is going to upgrade to v6.5+ and expect that this boot
method still works, printing this message is kind of pointless, and so
it should be possible to remove the logic that emits it.

Let's free up this space so it can be used to expand the PE header in a
subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@google.com

x86/efi: Drop alignment flags from PE section headers · bfab35f5

Ard Biesheuvel authored Sep 12, 2023

The section header flags for alignment are documented in the PE/COFF
spec as being applicable to PE object files only, not to PE executables
such as the Linux bzImage, so let's drop them from the PE header.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-20-ardb@google.com
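
The alignment flags in question are the IMAGE_SCN_ALIGN_* values stored in bits 20-23 of a section's Characteristics field. The Python sketch below decodes and clears that field. The constant values follow the PE/COFF specification. The example section flags are made up for illustration and are not necessarily the ones the kernel's header used.

```python
# Alignment field of the PE/COFF section Characteristics word (bits 20-23).
IMAGE_SCN_ALIGN_MASK = 0x00F00000
IMAGE_SCN_ALIGN_16BYTES = 0x00500000

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000


def section_alignment(characteristics: int):
    """Decode the IMAGE_SCN_ALIGN_* field; None means no alignment flag is set."""
    field = (characteristics & IMAGE_SCN_ALIGN_MASK) >> 20
    return 1 << (field - 1) if field else None


def drop_alignment_flags(characteristics: int) -> int:
    """Clear the alignment field, which only has meaning in object files."""
    return characteristics & ~IMAGE_SCN_ALIGN_MASK


text = (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
        | IMAGE_SCN_ALIGN_16BYTES)
print(hex(text), section_alignment(text))        # 0x60500020 16
cleaned = drop_alignment_flags(text)
print(hex(cleaned), section_alignment(cleaned))  # 0x60000020 None
```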

x86/efi: Disregard setup header of loaded image · 7e502622

Ard Biesheuvel authored Sep 12, 2023

The native EFI entrypoint does not take a struct boot_params from the
loader, but instead, it constructs one from scratch, using the setup
header data placed at the start of the image.

This setup header is placed in a way that permits legacy loaders to
manipulate the contents (i.e., to pass the kernel command line or the
address and size of an initial ramdisk), but EFI boot does not use it in
that way - it only copies the contents that were placed there at build
time, but EFI loaders will not (and should not) manipulate the setup
header to configure the boot. (Commit 63bf28ce "efi: x86: Wipe
setup_data on pure EFI boot" deals with some of the fallout of using
setup_data in a way that breaks EFI boot.)

Given that none of the non-zero values that are copied from the setup
header into the EFI stub's struct boot_params are relevant to the boot
now that the EFI stub no longer enters via the legacy decompressor, the
copy can be omitted altogether.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-19-ardb@google.com

x86/efi: Drop EFI stub .bss from .data section · 5f51c5d0

Ard Biesheuvel authored Sep 12, 2023

Now that the EFI stub always zero-initializes its BSS section upon entry,
there is no longer a need to place the BSS symbols carried by the stub
into the .data section.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-18-ardb@google.com

14 Sep, 2023 5 commits

Merge tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9fdfb15a

Linus Torvalds authored Sep 14, 2023

Pull networking fixes from Paolo Abeni:
 "Quite unusually, this does not contain any fixes coming from subtrees
  (nf, ebpf, wifi, etc).

  Current release - regressions:

   - bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()

  Previous releases - regressions:

   - ipv4: fix one memleak in __inet_del_ifa()

   - tcp: fix bind() regressions for v4-mapped-v6 addresses.

   - tls: do not free tls_rec on async operation in
     bpf_exec_tx_verdict()

   - dsa: fixes for SJA1105 FDB regressions

   - veth: update XDP feature set when bringing up device

   - igb: fix hangup when enabling SR-IOV

  Previous releases - always broken:

   - kcm: fix memory leak in error path of kcm_sendmsg()

   - smc: fix data corruption in smcr_port_add

   - microchip: fix possible memory leak for vcap_dup_rule()"

* tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
  kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
  net: renesas: rswitch: Add spin lock protection for irq {un}mask
  net: renesas: rswitch: Fix unmasking irq condition
  igb: clean up in all error paths when enabling SR-IOV
  ixgbe: fix timestamp configuration code
  selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c.
  selftest: tcp: Move expected_errno into each test case in bind_wildcard.c.
  selftest: tcp: Fix address length in bind_wildcard.c.
  tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
  tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
  tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
  ipv6: fix ip6_sock_set_addr_preferences() typo
  veth: Update XDP feature set when bringing up device
  net: macb: fix sleep inside spinlock
  net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
  net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988
  net: ethernet: mtk_eth_soc: fix uninitialized variable
  kcm: Fix memory leak in error path of kcm_sendmsg()
  r8152: check budget for r8152_poll()
  net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset
  ...

kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). · a22730b1

Kuniyuki Iwashima authored Sep 11, 2023

syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88b
("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by
updating kcm_tx_msg(head)->last_skb if partial data is copied so that the
following sendmsg() will resume from the skb.

However, we cannot know how many bytes were copied when we get the error.
Thus, we could mess up the MSG_MORE queue.

When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we
do for UDP with udp_flush_pending_frames().

Even without this change, when the error occurred, the following sendmsg()
resumed from a wrong skb and the queue was messed up.  However, we have
yet to get such a report, and only syzkaller stumbled on it.  So, this
can be changed safely.

Note this does not change SOCK_SEQPACKET behaviour.

Fixes: c821a88b ("kcm: Fix memory leak in error path of kcm_sendmsg()")
Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
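
Here is a toy Python model of the behaviour this commit aims for. When a MSG_MORE datagram cannot be completed, the partially built datagram is discarded rather than resumed. DgramSender and its failure injection (a None chunk) are hypothetical. They only mimic the idea of purging the queue and do not reflect kcm's actual data structures.

```python
class DgramSender:
    """Toy model of a datagram socket that batches data with MSG_MORE."""

    def __init__(self):
        self.pending = []  # fragments queued by earlier MSG_MORE sends
        self.sent = []     # completed datagrams

    def sendmsg(self, chunks, more=False):
        try:
            for chunk in chunks:
                if chunk is None:  # stand-in for a failed copy from user space
                    raise OSError("copy failed")
                self.pending.append(chunk)
        except OSError:
            # We cannot tell how much of this call made it into the queue,
            # so drop the whole partial datagram instead of resuming from it.
            self.pending.clear()
            raise
        if not more:
            self.sent.append(b"".join(self.pending))
            self.pending.clear()


s = DgramSender()
s.sendmsg([b"abc"], more=True)           # start a datagram with MSG_MORE
try:
    s.sendmsg([b"de", None], more=True)  # fails after copying b"de"
except OSError:
    pass
s.sendmsg([b"xyz"])                      # starts a fresh datagram
print(s.sent)                            # [b'xyz']
```

Without the purge in the except branch, the last call would emit b"abcdexyz". That datagram mixes data from a failed send and a successful one.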

Merge branch 'net-renesas-rswitch-fix-a-lot-of-redundant-irq-issue' · 96f7dc69

Paolo Abeni authored Sep 14, 2023

Yoshihiro Shimoda says:

====================
net: renesas: rswitch: Fix a lot of redundant irq issue

After this patch series was applied, many of the redundant interrupts
no longer occur.

For example, when running "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider:
 Before the patches were applied: about 800,000 interrupts
 After the patches were applied: about 100,000 interrupts
====================

Link: https://lore.kernel.org/r/20230912014936.3175430-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: renesas: rswitch: Add spin lock protection for irq {un}mask · c4f922e8

Yoshihiro Shimoda authored Sep 12, 2023

Add spin lock protection for irq {un}mask registers' control.

After napi_complete_done() and this protection were applied,
many of the redundant interrupts no longer occur.

For example, when running "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider:
 Before the patches were applied: about 800,000 interrupts
 After the patches were applied: about 100,000 interrupts

Fixes: 3590918b ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: renesas: rswitch: Fix unmasking irq condition · e7b1ef29

Yoshihiro Shimoda authored Sep 12, 2023

Fix unmasking irq condition by using napi_complete_done(). Otherwise,
redundant interrupts happen.

Fixes: 3590918b ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

13 Sep, 2023 14 commits

Merge tag 'pmdomain-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm · aed8aee1

Linus Torvalds authored Sep 13, 2023

Pull genpm / pmdomain rename from Ulf Hansson:
 "This renames the genpd subsystem to pmdomain.

  As discussed on LKML, using 'genpd' as the name of a subsystem isn't
  very self-explanatory, and the acronym itself, which means Generic PM
  Domain, is known only by a limited group of people.

  The suggestion to improve the situation is to rename the subsystem to
  'pmdomain', for which there seems to be a good consensus.

  Ideally it should indicate that its purpose is to manage Power Domains
  or 'PM domains', as they are often also called in Linux kernel
  terminology"

* tag 'pmdomain-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: Rename the genpd subsystem to pmdomain

Merge tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · 23f108dc

Linus Torvalds authored Sep 13, 2023

Pull tpm fix from Jarkko Sakkinen.

* tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Fix typo in tpmrm class definition

Merge tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 847165d7

Linus Torvalds authored Sep 13, 2023

Pull parisc architecture fixes from Helge Deller:

 - fix reference to exported symbols for parisc64 [Masahiro Yamada]

 - Block-TLB (BTLB) support on 32-bit CPUs

 - sparse and build-warning fixes

* tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  linux/export: fix reference to exported functions for parisc64
  parisc: BTLB: Initialize BTLB tables at CPU startup
  parisc: firmware: Simplify calling non-PA20 functions
  parisc: BTLB: _edata symbol has to be page aligned for BTLB support
  parisc: BTLB: Add BTLB insert and purge firmware function wrappers
  parisc: BTLB: Clear possibly existing BTLB entries
  parisc: Prepare for Block-TLB support on 32-bit kernel
  parisc: shmparam.h: Document aliasing requirements of PA-RISC
  parisc: irq: Make irq_stack_union static to avoid sparse warning
  parisc: drivers: Fix sparse warning
  parisc: iosapic.c: Fix sparse warnings
  parisc: ccio-dma: Fix sparse warnings
  parisc: sba-iommu: Fix sparse warnigs
  parisc: sba: Fix compile warning wrt list of SBA devices
  parisc: sba_iommu: Fix build warning if procfs if disabled

Merge tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 99214f67

Linus Torvalds authored Sep 13, 2023

Pull tracing fixes from Steven Rostedt:

 - Add missing LOCKDOWN checks for eventfs callers

   When LOCKDOWN is active for tracing, it causes inconsistent state
   when some functions succeed and others fail.

 - Use dput() to free the top level eventfs descriptor

   There was a race between accesses and freeing it.

 - Fix a long-standing bug that eventfs exposed due to changing timings
   by dynamically creating files. That is, if an event file is opened
   for an instance, there's nothing preventing the instance from being
   removed, which will make accessing the files cause use-after-free
   bugs.

 - Fix a ring buffer race that happens when iterating over the ring
   buffer while writers are active. Check to make sure not to read the
   event metadata if it's beyond the end of the ring buffer sub buffer.

 - Fix the print trigger that disappeared because the test to create it
   was looking for the event dir field being filled, but now it has the
   "ef" field filled for the eventfs structure.

 - Remove the unused "dir" field from the event structure.

 - Fix the field order of struct trace_dynamic_info, which had the
   offset and len fields backwards with respect to endianness.

 - Fix NULL pointer dereference with eventfs_remove_rec()

   If an allocation fails in one of the eventfs_add_*() functions, the
   caller of it in event_subsystem_dir() or event_create_dir() assigns
   the result to the structure. But it's assigning the ERR_PTR and not
   NULL. This was passed to eventfs_remove_rec(), which expects either a
   good pointer or NULL, not an ERR_PTR. The fix is to not assign the
   ERR_PTR to the structure, but to keep it NULL on error.

 - Fix list_for_each_rcu() to use list_for_each_srcu() in
   dcache_dir_open_wrapper(). An earlier version of the code used RCU,
   but because it had to call sleepable code, it was changed to use
   SRCU, and one of the list iterations was missed.

 - Fix the synthetic event print function to use "as_u64" instead of
   passing in a pointer to the union. To fix big/little endian issues,
   the u64 that represented several types was turned into a union to
   define the types properly.

* tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
  tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()
  tracing/synthetic: Print out u64 values properly
  tracing/synthetic: Fix order of struct trace_dynamic_info
  selftests/ftrace: Fix dependencies for some of the synthetic event tests
  tracing: Remove unused trace_event_file dir field
  tracing: Use the new eventfs descriptor for print trigger
  ring-buffer: Do not attempt to read past "commit"
  tracefs/eventfs: Free top level files on removal
  ring-buffer: Avoid softlockup in ring_buffer_resize()
  tracing: Have event inject files inc the trace array ref count
  tracing: Have option files inc the trace array ref count
  tracing: Have current_trace inc the trace array ref count
  tracing: Have tracing_max_latency inc the trace array ref count
  tracing: Increase trace array ref count on enable and filter files
  tracefs/eventfs: Use dput to free the toplevel events directory
  tracefs/eventfs: Add missing lockdown checks
  tracefs: Add missing lockdown check to tracefs_create_dir()
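
The ERR_PTR/NULL confusion behind the eventfs_remove_rec() fix can be modeled in a few lines of Python. This sketch treats pointers as unsigned 64-bit integers. It mirrors the kernel convention that the top MAX_ERRNO (4095) addresses encode errors. remove_rec() is a hypothetical stand-in, not the eventfs code.

```python
import errno

MAX_ERRNO = 4095
PTR_MASK = (1 << 64) - 1  # model pointers as unsigned 64-bit integers
NULL = 0


def err_ptr(error: int) -> int:
    """Encode a negative errno in a pointer value, in the spirit of ERR_PTR()."""
    return error & PTR_MASK


def is_err(ptr: int) -> bool:
    """Mirror IS_ERR(): the top MAX_ERRNO addresses are reserved for errors."""
    return ptr >= (-MAX_ERRNO & PTR_MASK)


def remove_rec(ptr: int) -> str:
    """Hypothetical cleanup routine that accepts NULL or a valid pointer only."""
    if ptr == NULL:
        return "nothing to free"
    # A NULL check alone does not catch an ERR_PTR value: it is non-zero.
    if is_err(ptr):
        return "would dereference an error pointer"
    return "freed"


result = err_ptr(-errno.ENOMEM)  # an allocation failure reported as ERR_PTR
print(remove_rec(result))        # buggy caller stored the ERR_PTR as-is
stored = NULL if is_err(result) else result
print(remove_rec(stored))        # fixed caller keeps NULL on error
```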

igb: clean up in all error paths when enabling SR-IOV · bc6ed2fa

Corinna Vinschen authored Sep 11, 2023

After commit 50f30349 ("igb: Enable SR-IOV after reinit"), removing
the igb module could hang or crash (depending on the machine) when the
module has been loaded with the max_vfs parameter set to some value != 0.

On one test machine with a dual-port 82580, this hang occurred:

[  232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[  233.093257] igb 0000:41:00.1: IOV Disabled
[  233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[  233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[  233.352248] igb 0000:41:00.0:   device [8086:1516] error status/mask=00100000
[  233.361088] igb 0000:41:00.0:    [20] UnsupReq               (First)
[  233.368183] igb 0000:41:00.0: AER:   TLP Header: 40000001 0000040f cdbfc00c c
[  233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[  233.388779] igb 0000:41:00.1:   device [8086:1516] error status/mask=00100000
[  233.397629] igb 0000:41:00.1:    [20] UnsupReq               (First)
[  233.404736] igb 0000:41:00.1: AER:   TLP Header: 40000001 0000040f cdbfc00c c
[  233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[  233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[  233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[  234.157244] igb 0000:41:00.0: IOV Disabled
[  371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[  371.627489]       Not tainted 6.4.0-dirty #2
[  371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[  371.641000] task:irq/35-aerdrv   state:D stack:0     pid:257   ppid:2      f0
[  371.650330] Call Trace:
[  371.653061]  <TASK>
[  371.655407]  __schedule+0x20e/0x660
[  371.659313]  schedule+0x5a/0xd0
[  371.662824]  schedule_preempt_disabled+0x11/0x20
[  371.667983]  __mutex_lock.constprop.0+0x372/0x6c0
[  371.673237]  ? __pfx_aer_root_reset+0x10/0x10
[  371.678105]  report_error_detected+0x25/0x1c0
[  371.682974]  ? __pfx_report_normal_detected+0x10/0x10
[  371.688618]  pci_walk_bus+0x72/0x90
[  371.692519]  pcie_do_recovery+0xb2/0x330
[  371.696899]  aer_process_err_devices+0x117/0x170
[  371.702055]  aer_isr+0x1c0/0x1e0
[  371.705661]  ? __set_cpus_allowed_ptr+0x54/0xa0
[  371.710723]  ? __pfx_irq_thread_fn+0x10/0x10
[  371.715496]  irq_thread_fn+0x20/0x60
[  371.719491]  irq_thread+0xe6/0x1b0
[  371.723291]  ? __pfx_irq_thread_dtor+0x10/0x10
[  371.728255]  ? __pfx_irq_thread+0x10/0x10
[  371.732731]  kthread+0xe2/0x110
[  371.736243]  ? __pfx_kthread+0x10/0x10
[  371.740430]  ret_from_fork+0x2c/0x50
[  371.744428]  </TASK>

The reproducer was a simple script:

  #!/bin/sh
  for i in `seq 1 5`; do
    modprobe -rv igb
    modprobe -v igb max_vfs=1
    sleep 1
    modprobe -rv igb
  done

It turned out that this could only be reproduced on 82580 (quad and
dual-port), but not on 82576, i350 and i210.  Further debugging showed
that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
dev->is_physfn is 0 on 82580.

Prior to commit 50f30349 ("igb: Enable SR-IOV after reinit"),
igb_enable_sriov() jumped into the "err_out" cleanup branch.  After this
commit it only returned the error code.

So the cleanup didn't take place, and the incorrect VF setup in the
igb_adapter structure fooled the igb driver into assuming that VFs have
been set up where no VF actually existed.

Fix this problem by cleaning up again if pci_enable_sriov() fails.

Fixes: 50f30349 ("igb: Enable SR-IOV after reinit")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
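
Below is a simplified Python model of the cleanup pattern this fix restores. Software VF state is set up before the hardware call, so it must be undone on every failure path. The field names echo igb's adapter structure for readability. The functions and SriovError are hypothetical and do not reproduce the driver.

```python
class SriovError(Exception):
    """Hypothetical error type for the stub below."""


class Adapter:
    def __init__(self):
        self.vfs_allocated_count = 0  # how many VFs the driver thinks exist
        self.vf_data = None           # per-VF bookkeeping


def enable_sriov_hw(num_vfs: int, is_physfn: bool) -> None:
    """Hypothetical stand-in for pci_enable_sriov(): fails on a non-PF device."""
    if not is_physfn:
        raise SriovError("device is not an SR-IOV physical function")


def enable_sriov(adapter: Adapter, num_vfs: int, is_physfn: bool) -> None:
    # Software state is set up before the hardware call...
    adapter.vf_data = [{} for _ in range(num_vfs)]
    adapter.vfs_allocated_count = num_vfs
    try:
        enable_sriov_hw(num_vfs, is_physfn)
    except SriovError:
        # ...so every error path must undo it. Otherwise later code, such as
        # module removal, acts on VFs that were never created.
        adapter.vf_data = None
        adapter.vfs_allocated_count = 0
        raise


a = Adapter()
try:
    enable_sriov(a, 1, is_physfn=False)  # like the 82580 case described above
except SriovError:
    pass
print(a.vfs_allocated_count, a.vf_data)  # 0 None
```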

ixgbe: fix timestamp configuration code · 3c44191d

Vadim Fedorenko authored Sep 11, 2023

The commit in the Fixes: tag introduced flags to control the status of
hardware configuration while processing packets. At the same time, another
structure is used to provide the timestamper configuration to user-space
applications. The way it was coded makes these structures go out of sync
easily. The issue is easy to reproduce on 82599 chips:

[root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 12

The eth0 device is properly configured to timestamp any PTPv2 events.

[root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
current settings:
tx_type 1
rx_filter 12
SIOCSHWTSTAMP failed: Numerical result out of range
The requested time stamping mode is not supported by the hardware.

The error is properly returned because the HW doesn't support
timestamping all packets. But adapter->flags is cleared of the timestamp
flags even though no HW configuration was done. From that point on, no RX
timestamps are received by the user-space application, but the
configuration still shows the previous, good values:

[root@hostname ~]# hwstamp_ctl -i eth0
current settings:
tx_type 1
rx_filter 12

Fix the issue by applying new flags only when the HW was actually
configured.

Fixes: a9763f3c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
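
Here is a simplified Python model of the fix. It computes the new RX timestamp flags locally and commits them, together with the user-visible configuration, only after the request has been accepted. The rx_filter constants match linux/net_tstamp.h. The flag bits, the Timestamper class and the rule that only NONE and PTP_V2_EVENT are supported are assumptions for illustration.

```python
import errno
import os

# rx_filter values from the Linux UAPI header linux/net_tstamp.h
HWTSTAMP_FILTER_NONE = 0
HWTSTAMP_FILTER_ALL = 1
HWTSTAMP_FILTER_PTP_V2_EVENT = 12

# Hypothetical driver flag bits consulted by the RX path
FLAG_RX_TSTAMP = 1 << 0
FLAG_RX_TSTAMP_IN_REG = 1 << 1
TSTAMP_FLAGS = FLAG_RX_TSTAMP | FLAG_RX_TSTAMP_IN_REG


class Timestamper:
    def __init__(self):
        self.flags = 0  # what the RX path looks at
        # what user space reads back
        self.config = {"tx_type": 0, "rx_filter": HWTSTAMP_FILTER_NONE}

    def set_config(self, tx_type: int, rx_filter: int) -> None:
        # Work on a local copy so a rejected request leaves self.flags alone.
        new_flags = self.flags & ~TSTAMP_FLAGS
        if rx_filter == HWTSTAMP_FILTER_PTP_V2_EVENT:
            new_flags |= TSTAMP_FLAGS
        elif rx_filter != HWTSTAMP_FILTER_NONE:
            # Assumed unsupported in this model (e.g. FILTER_ALL): fail early.
            raise OSError(errno.ERANGE, os.strerror(errno.ERANGE))
        # The hardware would be programmed here; only then commit both views.
        self.flags = new_flags
        self.config = {"tx_type": tx_type, "rx_filter": rx_filter}


ts = Timestamper()
ts.set_config(1, HWTSTAMP_FILTER_PTP_V2_EVENT)
try:
    ts.set_config(1, HWTSTAMP_FILTER_ALL)
except OSError as exc:
    print("rejected:", exc.strerror)
print(ts.flags == TSTAMP_FLAGS, ts.config)  # True {'tx_type': 1, 'rx_filter': 12}
```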

pmdomain: Rename the genpd subsystem to pmdomain · e2ad626f

Ulf Hansson authored Sep 13, 2023

It has been pointed out that naming a subsystem "genpd" isn't very
self-explanatory, and the acronym itself, which means Generic PM Domain, is
known only by a limited group of people.

To improve the situation, let's rename the subsystem to pmdomain, which
ideally should indicate that this is about so-called Power Domains, or
"PM domains" as they are often also called in Linux kernel terminology.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230912221127.487327-1-ulf.hansson@linaro.org

Merge branch 'tcp-bind-fixes' · ab6c4ec8

David S. Miller authored Sep 13, 2023

Kuniyuki Iwashima says:

====================
tcp: Fix bind() regression for v4-mapped-v6 address

Since bhash2 was introduced, bind() is broken in two cases related
to v4-mapped-v6 address.

This series fixes the regression and adds tests to cover the cases.

Changes:
  v2:
    * Added patch 1 to factorise duplicated comparison (Eric Dumazet)

  v1: https://lore.kernel.org/netdev/20230911165106.39384-1-kuniyu@amazon.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c. · 8637d8e8

Kuniyuki Iwashima authored Sep 11, 2023

We add these 8 test cases in bind_wildcard.c to check bind() conflicts.

  1st bind()          2nd bind()
  ---------           ---------
  0.0.0.0             ::FFFF:0.0.0.0
  ::FFFF:0.0.0.0      0.0.0.0
  0.0.0.0             ::FFFF:127.0.0.1
  ::FFFF:127.0.0.1    0.0.0.0
  127.0.0.1           ::FFFF:0.0.0.0
  ::FFFF:0.0.0.0      127.0.0.1
  127.0.0.1           ::FFFF:127.0.0.1
  ::FFFF:127.0.0.1    127.0.0.1

All tests passed without bhash2, and with bhash2 plus this series.

 Before bhash2:
  $ uname -r
  6.0.0-rc1-00393-g0bf73255
  $ ./bind_wildcard
  ...
  # PASSED: 16 / 16 tests passed.

 Just after bhash2:
  $ uname -r
  6.0.0-rc1-00394-g28044fc1
  $ ./bind_wildcard
  ...
  ok 15 bind_wildcard.v4_local_v6_v4mapped_local.v4_v6
  not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4
  # FAILED: 15 / 16 tests passed.

 On net.git:
  $ ./bind_wildcard
  ...
  not ok 14 bind_wildcard.v4_local_v6_v4mapped_any.v6_v4
  not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4
  # FAILED: 13 / 16 tests passed.

 With this series:
  $ ./bind_wildcard
  ...
  # PASSED: 16 / 16 tests passed.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftest: tcp: Move expected_errno into each test case in bind_wildcard.c. · 2895d879

Kuniyuki Iwashima authored Sep 11, 2023

This is a preparation patch for the following patch.

Let's define expected_errno in each test case so that we can add other test
cases easily.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftest: tcp: Fix address length in bind_wildcard.c. · 0071d155

Kuniyuki Iwashima authored Sep 11, 2023

The selftest passes the IPv6 address length for an IPv4 address.
We should pass the correct length.

Note that inet_bind_sk() does not check if the size is larger than
sizeof(struct sockaddr_in), so there is no real bug in this
selftest.
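
Python's socket.bind() derives the address length from the address
tuple, so this mistake cannot be made from Python. To make the two
lengths concrete, the sketch below mirrors the Linux layouts of
struct sockaddr_in and struct sockaddr_in6 with ctypes. The class
names and addrlen_for() are hypothetical; the selftest itself is C.

```python
import ctypes
import socket


class SockaddrIn(ctypes.Structure):
    """Mirror of Linux struct sockaddr_in (16 bytes)."""
    _fields_ = [
        ("sin_family", ctypes.c_ushort),   # sa_family_t
        ("sin_port", ctypes.c_uint16),     # network byte order
        ("sin_addr", ctypes.c_uint8 * 4),  # struct in_addr
        ("sin_zero", ctypes.c_uint8 * 8),  # padding
    ]


class SockaddrIn6(ctypes.Structure):
    """Mirror of Linux struct sockaddr_in6 (28 bytes)."""
    _fields_ = [
        ("sin6_family", ctypes.c_ushort),
        ("sin6_port", ctypes.c_uint16),
        ("sin6_flowinfo", ctypes.c_uint32),
        ("sin6_addr", ctypes.c_uint8 * 16),
        ("sin6_scope_id", ctypes.c_uint32),
    ]


def addrlen_for(family: int) -> int:
    """Return the addrlen to pass to bind() for the given family."""
    if family == socket.AF_INET:
        return ctypes.sizeof(SockaddrIn)
    if family == socket.AF_INET6:
        return ctypes.sizeof(SockaddrIn6)
    raise ValueError(f"unsupported family: {family}")


print(addrlen_for(socket.AF_INET), addrlen_for(socket.AF_INET6))  # 16 28
```

Passing 28 for an AF_INET address is the mistake described above; as
the commit notes, inet_bind_sk() tolerates a larger size, which is why
the selftest still worked.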

Fixes: 13715acf ("selftest: Add test for bind() conflicts.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address. · c48ef9c4

Kuniyuki Iwashima authored Sep 11, 2023

Since bhash2 was introduced, the example below does not work as expected.
These two bind() calls should conflict, but the 2nd bind() now succeeds.

  from socket import *

  s1 = socket(AF_INET6, SOCK_STREAM)
  s1.bind(('::ffff:127.0.0.1', 0))

  s2 = socket(AF_INET, SOCK_STREAM)
  s2.bind(('127.0.0.1', s1.getsockname()[1]))

During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find()
fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates
a new tb2 for the 2nd socket.  Then, we call inet_csk_bind_conflict() that
checks conflicts in the new tb2 by inet_bhash2_conflict().  However, the
new tb2 does not include the 1st socket, thus the bind() finally succeeds.

In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has
the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find()
returns the 1st socket's tb2.
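
The address comparison this paragraph calls for can be modelled in a
few lines of Python. This is a simplified sketch written from the
description above, not the kernel code: buckets and sockets are
reduced to (family, address) pairs, the netns/port/l3mdev checks are
left out, and bucket_addr_matches() is a hypothetical name.

```python
import ipaddress
import socket


def bucket_addr_matches(tb_family: int, tb_addr: str,
                        sk_family: int, sk_addr: str) -> bool:
    """Does a bind2 bucket's address match the socket being bound?

    Simplified model: netns, port and l3mdev checks are omitted.
    """
    if tb_family != sk_family:
        if sk_family == socket.AF_INET:
            # The case the fix adds: an AF_INET6 bucket holding the
            # v4-mapped form of the IPv4 address being bound.
            mapped = ipaddress.IPv6Address(tb_addr).ipv4_mapped
            return mapped is not None and mapped == ipaddress.IPv4Address(sk_addr)
        # An AF_INET6 socket vs. an AF_INET bucket is not matched here.
        return False
    return ipaddress.ip_address(tb_addr) == ipaddress.ip_address(sk_addr)
```

In the reproducer above, the AF_INET socket binding 127.0.0.1 now
matches the AF_INET6 bucket created for ::ffff:127.0.0.1, so the
conflict check runs against the 1st socket.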

Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1,
the 2nd bind() fails properly for the same reason mentioned in the previous
commit.

Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix bind() regression for v4-mapped-v6 wildcard address. · aa99e5f8

Kuniyuki Iwashima authored Sep 11, 2023

Andrei Vagin reported a bind() regression with strace logs.

If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.

  from socket import *

  s1 = socket(AF_INET6, SOCK_STREAM)
  s1.bind(('::ffff:0.0.0.0', 0))

  s2 = socket(AF_INET, SOCK_STREAM)
  s2.bind(('127.0.0.1', s1.getsockname()[1]))

During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
if tb has the v4-mapped-v6 wildcard address.
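
A simplified Python model of the wildcard check, as a sketch only. The
function name bucket_covers_any() is hypothetical, netns/port/l3mdev
checks are omitted, and the dual-stack "::" branch is an assumption
based on the word "still" above; the v4-mapped wildcard branch is the
case this commit describes.

```python
import ipaddress
import socket


def bucket_covers_any(tb_family: int, tb_addr: str, sk_family: int) -> bool:
    """Does bucket tb hold a wildcard address relevant to a socket
    of family sk_family? Simplified model, not the kernel code."""
    tb_ip = ipaddress.ip_address(tb_addr)
    if tb_family != sk_family:
        if sk_family == socket.AF_INET:
            # AF_INET6 bucket seen from an AF_INET socket:
            # "::" is assumed to be checked already, and
            # "::ffff:0.0.0.0" is the v4-mapped wildcard the fix adds.
            mapped = tb_ip.ipv4_mapped
            return tb_ip.is_unspecified or (
                mapped is not None and mapped.is_unspecified)
        return False
    # Same family: 0.0.0.0 or ::.
    return tb_ip.is_unspecified
```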

The example above does not work after commit 5456262d ("net: Fix
incorrect address comparison when searching for a bind2 bucket"), but
that commit is not the one to blame.

Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
as 0.0.0.0, and the sequence above worked by chance.  Technically, this
case has been broken since bhash2 was introduced.

Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
the 2nd bind() fails properly because we fall back to using bhash to
detect conflicts for the v4-mapped-v6 address.

Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Andrei Vagin <avagin@google.com>
Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any). · c6d27706

Kuniyuki Iwashima authored Sep 11, 2023

This is a prep patch to make the following patches, which touch
inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any(), cleaner.

Both functions have duplicated comparisons for netns, port, and l3mdev.
Let's factorise them.
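
To illustrate the shape of this refactoring, here is a Python sketch
in which the netns, port and l3mdev comparisons live in one helper
shared by both matchers. Everything here (Bucket, _base_match(),
bucket_match(), bucket_match_addr_any()) is hypothetical, uses strings
for netns and addresses, and omits the per-family address logic the
real functions contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical stand-in for the inet_bind2_bucket fields compared."""
    net: str
    port: int
    l3mdev: int
    addr: str


def _base_match(tb: Bucket, net: str, port: int, l3mdev: int) -> bool:
    # The sk_family-independent part, written once instead of twice.
    return tb.net == net and tb.port == port and tb.l3mdev == l3mdev


def bucket_match(tb: Bucket, net: str, port: int, l3mdev: int,
                 addr: str) -> bool:
    return _base_match(tb, net, port, l3mdev) and tb.addr == addr


def bucket_match_addr_any(tb: Bucket, net: str, port: int,
                          l3mdev: int) -> bool:
    return _base_match(tb, net, port, l3mdev) and tb.addr in ("0.0.0.0", "::")
```
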
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Sep, 2023 6 commits

tpm: Fix typo in tpmrm class definition · ea72883a

Justin M. Forbes authored Sep 12, 2023

Commit d2e8071b ("tpm: make all 'class' structures const")
unfortunately introduced a typo in the name of the tpmrm class.

Fixes: d2e8071b ("tpm: make all 'class' structures const")
Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

Merge tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 3669558b

Linus Torvalds authored Sep 12, 2023

Pull btrfs fixes from David Sterba:

 - several fixes for handling directory item (inserting, removing,
   iteration, error handling)

 - fix transaction commit stalls when auto relocation is running and
   blocks other tasks that want to commit

 - fix a build error when DEBUG is enabled

 - fix lockdep warning in inode number lookup ioctl

 - fix race when finishing block group creation

 - remove link to obsolete wiki in several files

* tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org
  btrfs: assert delayed node locked when removing delayed item
  btrfs: remove BUG() after failure to insert delayed dir index item
  btrfs: improve error message after failure to add delayed dir index item
  btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folio
  btrfs: check for BTRFS_FS_ERROR in pending ordered assert
  btrfs: fix lockdep splat and potential deadlock after failure running delayed items
  btrfs: do not block starts waiting on previous transaction commit
  btrfs: release path before inode lookup during the ino lookup ioctl
  btrfs: fix race between finishing block group creation and its item update

Merge tag 'platform-drivers-x86-v6.6-2' of... · 2c758cef

Linus Torvalds authored Sep 12, 2023

Merge tag 'platform-drivers-x86-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - various platform/mellanox fixes

 - one new DMI quirk for asus-wmi

* tag 'platform-drivers-x86-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Support 2023 ROG X16 tablet mode
  platform/mellanox: NVSW_SN2201 should depend on ACPI
  platform/mellanox: mlxbf-bootctl: add NET dependency into Kconfig
  platform/mellanox: mlxbf-pmc: Fix reading of unprogrammed events
  platform/mellanox: mlxbf-pmc: Fix potential buffer overflows
  platform/mellanox: mlxbf-tmfifo: Drop jumbo frames
  platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors

ipv6: fix ip6_sock_set_addr_preferences() typo · 8cdd9f1a

Eric Dumazet authored Sep 11, 2023

The second argument of ip6_sock_set_addr_preferences() should be an integer.

SUNRPC attempts to set IPV6_PREFER_SRC_PUBLIC were
translated to IPV6_PREFER_SRC_TMP.
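
The arithmetic behind this mistranslation can be sketched in Python,
assuming the parameter had been declared as a C bool. The flag values
are from the Linux UAPI header <linux/in6.h>; ctypes.c_bool stands in
for C's bool conversion, which maps any non-zero value to 1. The
helpers through_bool() and through_int() are hypothetical.

```python
import ctypes

# Flag values from <linux/in6.h>.
IPV6_PREFER_SRC_TMP = 0x0001
IPV6_PREFER_SRC_PUBLIC = 0x0002


def through_bool(prefs: int) -> int:
    """Pass an int through a C bool parameter and back out as an int."""
    return int(ctypes.c_bool(prefs).value)


def through_int(prefs: int) -> int:
    """Pass an int through a C int parameter (the fixed signature)."""
    return ctypes.c_int(prefs).value


print(through_bool(IPV6_PREFER_SRC_PUBLIC) == IPV6_PREFER_SRC_TMP)    # True
print(through_int(IPV6_PREFER_SRC_PUBLIC) == IPV6_PREFER_SRC_PUBLIC)  # True
```

Any non-zero preference collapses to 1, and 1 happens to be
IPV6_PREFER_SRC_TMP.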

Fixes: 18d5ad62 ("ipv6: add ip6_sock_set_addr_preferences")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230911154213.713941-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'linux-kselftest-next-6.6-rc2' of... · a747acc0

Linus Torvalds authored Sep 12, 2023

Merge tag 'linux-kselftest-next-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

 - kselftest runner script to propagate SIGTERM to runner child
   to avoid kselftest hang

 - install symlinks required for test execution to avoid test
   failures

 - kselftest dependency checker script argument parsing

* tag 'linux-kselftest-next-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: Keep symlinks, when possible
  selftests: fix dependency checker script
  kselftest/runner.sh: Propagate SIGTERM to runner child
  selftests/ftrace: Correctly enable event in instance-event.tc

Merge tag 'linux-kselftest-kunit-6.6-rc2' of... · fb52c87a

Linus Torvalds authored Sep 12, 2023

Merge tag 'linux-kselftest-kunit-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fixes from Shuah Khan:
 "Fixes to possible memory leak, null-ptr-deref, wild-memory-access, and
  error path bugs"

* tag 'linux-kselftest-kunit-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Fix possible memory leak in kunit_filter_suites()
  kunit: Fix possible null-ptr-deref in kunit_parse_glob_filter()
  kunit: Fix the wrong err path and add goto labels in kunit_filter_suites()
  kunit: Fix wild-memory-access bug in kunit_free_suite_set()
  kunit: test: Make filter strings in executor_test writable
