- 17 Jan, 2023 2 commits
-
-
Heiko Carstens authored
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Heiko Carstens authored
GCC 11.1.0 and 11.2.0 generate a false-positive warning when compiling the kernel e.g. with allmodconfig:

arch/s390/kernel/setup.c: In function ‘setup_lowcore_dat_on’:
./include/linux/fortify-string.h:57:33: error: ‘__builtin_memcpy’ reading 128 bytes from a region of size 0 [-Werror=stringop-overread]
...
arch/s390/kernel/setup.c:526:9: note: in expansion of macro ‘memcpy’
  526 |         memcpy(abs_lc->cregs_save_area, S390_lowcore.cregs_save_area,
      |         ^~~~~~

This could be addressed by using absolute_pointer() with the S390_lowcore macro, but this is not a good idea since it generates worse code for performance-critical paths. Therefore simply use a for loop to copy the array in question and get rid of the warning.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
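For illustration, a minimal sketch of the element-wise replacement, assuming the 16-entry cregs_save_area array layout described above; the copy is semantically identical to the memcpy(), but gives the compiler no fortified string routine to warn about:

	int i;

	for (i = 0; i < ARRAY_SIZE(abs_lc->cregs_save_area); i++)
		abs_lc->cregs_save_area[i] = S390_lowcore.cregs_save_area[i];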
-
- 13 Jan, 2023 17 commits
-
-
Vineeth Vijayan authored
Remove Cornelia's email address from the file as suggested by her. List the linux-s390 mailing-list address as the primary contact instead. Link: https://lore.kernel.org/linux-s390/8735d0oiq6.fsf@redhat.com/ Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Christophe JAILLET authored
The commit in Fixes: has switched the order of a sysfs_create_group() and a kzalloc(). It correctly removed the now-useless kfree() but forgot to add a sysfs_remove_group() in the case of (unlikely) memory allocation failure. Add it now. Fixes: 260f3ea1 ("s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/d0c0a35eec4fa87cb7f3910d8ac4dc0f7dc9008a.1659283738.git.christophe.jaillet@wanadoo.fr Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
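The resulting error path presumably follows the usual unwind pattern; a sketch with illustrative identifiers (the actual attribute group and device names may differ):

	ret = sysfs_create_group(&apdev->device.kobj, &vfio_queue_attr_group);
	if (ret)
		return ret;
	q = kzalloc(sizeof(*q), GFP_KERNEL);
	if (!q) {
		/* undo the already-created group on allocation failure */
		sysfs_remove_group(&apdev->device.kobj, &vfio_queue_attr_group);
		return -ENOMEM;
	}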
-
Heiko Carstens authored
Move __amode31_base declaration to proper header file to get rid of arch/s390/boot/startup.c:24:15: warning: symbol '__amode31_base' was not declared. Should it be static? Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Move the Absolute Lowcore Area allocation to the decompressor. As a result, the get_abs_lowcore() and put_abs_lowcore() access brackets become really straightforward and no longer require complex execution-context analysis or tackling of LAP and interrupts. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Move the Real Memory Copy Area allocation to the decompressor. As a result, the memcpy_real() and memcpy_real_iter() movers become usable from the very moment the kernel starts. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Currently the decompressor sets up only identity mapping. Allow adding more address range types as a prerequisite for allocation of kernel fixed mappings. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
The identity mapping is created in the decompressor; there is no need to have the same functionality in the kasan setup code. Thus, remove it. Also remove the 4KB-pages check for the first 1MB, since there is no need to take care of the lowcore pages. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
As the kernel is executed in DAT-on mode only, remove the unnecessary DAT bit check together with the dead code. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
The setup of the kernel virtual address space is spread throughout the sources, boot stages and config options like this:

1. The available physical memory regions are queried and stored as mem_detect information for later use in the decompressor.

2. Based on the physical memory availability, the virtual memory layout is established in the decompressor.

3. If CONFIG_KASAN is disabled, the kernel paging setup code populates kernel pgtables and turns DAT mode on. It uses the information stored at step [1].

4. If CONFIG_KASAN is enabled, the kernel early boot kasan setup populates kernel pgtables and turns DAT mode on. It uses the information stored at step [1]. The kasan setup creates the early_pg_dir directory and directly overwrites swapper_pg_dir entries to make shadow memory pages available.

Move the kernel virtual memory setup to the decompressor and start the kernel with DAT turned on right from the very first instruction. That completely eliminates the boot phase in which the kernel runs in DAT-off mode, simplifies the overall design and consolidates the pgtables setup.

The identity mapping is created in the decompressor, while kasan shadow mappings are still created by the early boot kernel code. Share the existing kasan memory allocator with the decompressor. It decreases the size of a newly requested memory block from pgalloc_pos and ensures that the kernel image is not overwritten. The pgalloc_low and pgalloc_pos pointers are made preserved boot variables for that.

Use the bootdata infrastructure to set up the swapper_pg_dir and invalid_pg_dir directories used by the kernel later. The interim early_pg_dir directory established by the kasan initialization code gets eliminated as a result.

As the kernel runs in DAT-on mode only, the PSW_KERNEL_BITS define gets the PSW_MASK_DAT bit by default. Additionally, the setup_lowcore_dat_off() and setup_lowcore_dat_on() routines get merged, since there is no DAT-off mode stage anymore.

The memory mappings are created with RW+X protection, which allows the early boot code to set up all necessary data and services for the kernel being booted. Just before paging is enabled, the memory protection is changed to RO+X for text, RO+NX for read-only data and RW+NX for kernel data and the identity mapping.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Detect and enable memory facilities, which is a prerequisite for the pgtables setup in the decompressor. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Similar to the existing PAGE_KERNEL_EXEC and SEGMENT_KERNEL_EXEC memory protections, add a REGION3_KERNEL_EXEC attribute that can be set on PUD pgtable entries. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Convert the setup of pgtable entries to use the set_pXe_bit() helpers, as this is the preferred way in MM code. Locally introduce a pgprot_clear_bit() helper, which is, strictly speaking, a generic function. However, the only comparable helper is the x86-private pgprot_clear_protnone_bits(), which does a similar thing, so do not make this one public either. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
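Given the description, the local helper is presumably a one-liner along these lines (a sketch, not the verbatim patch):

	static inline pgprot_t pgprot_clear_bit(pgprot_t pgprot, unsigned long bit)
	{
		return __pgprot(pgprot_val(pgprot) & ~bit);
	}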
-
Alexander Gordeev authored
Avoid duplicate IS_ENABLED(CONFIG_KASAN_VMALLOC) condition check. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Fix the variable initialization coding style and set up the zero pgtable the same way the region and segment pgtables are set up. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
The kasan early boot memory allocators operate on pgalloc_pos and segment_pos physical address pointers, but fail to convert them to the corresponding virtual pointers. Currently this is not a problem, since virtual and physical addresses on s390 are the same. Nevertheless, should they ever differ, this would cause an invalid pointer access. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Commit ada1da31 ("s390/sclp: sort out physical vs virtual pointers usage") fixed the notion of virtual address for sclp_early_sccb pointer. However, it did not take into account that kasan_early_init() can also output messages and sclp_early_sccb should be adjusted by the time kasan_early_init() is called. Currently it is not a problem, since virtual and physical addresses on s390 are the same. Nevertheless, should they ever differ, this would cause an invalid pointer access. Fixes: ada1da31 ("s390/sclp: sort out physical vs virtual pointers usage") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
Move declarations to the appropriate header files. Instead of cryptic casting, directly assign the struct vmlinux_info type to the _vmlinux_info linker script variable - which it actually is. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
- 11 Jan, 2023 4 commits
-
-
Heiko Carstens authored
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Heiko Carstens authored
Use READ_ONCE() before cmpxchg() to prevent the compiler from generating code that fetches the to-be-compared old value from memory several times. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
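A minimal sketch of the pattern the fix applies (SOME_FLAG is illustrative):

	do {
		old = READ_ONCE(*ptr);	/* fetch the old value exactly once */
		new = old | SOME_FLAG;
	} while (cmpxchg(ptr, old, new) != old);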
-
Heiko Carstens authored
Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only dereferenced once by using READ_ONCE(). Otherwise the compiler could generate incorrect code. Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
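The affected loop presumably takes this shape after the fix (a sketch, simplified from the macro body; "op" stands for the macro's operator parameter):

	ptr__ = raw_cpu_ptr(&(pcp));
	do {
		old__ = READ_ONCE(*ptr__);	/* was a plain *ptr__ dereference */
		new__ = old__ op (val);
	} while (cmpxchg(ptr__, old__, new__) != old__);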
-
Heiko Carstens authored
The current cmpxchg_double() loops within the perf hw sampling code do not have READ_ONCE() semantics to read the old value from memory. This allows the compiler to generate code which reads the "old" value several times from memory, which again allows for inconsistencies. For example:

	/* Reset trailer (using compare-double-and-swap) */
	do {
		te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK;
		te_flags |= SDB_TE_ALERT_REQ_MASK;
	} while (!cmpxchg_double(&te->flags, &te->overflow,
				 te->flags, te->overflow,
				 te_flags, 0ULL));

The compiler could generate code where te->flags used within the cmpxchg_double() call may be refetched from memory and is not necessarily identical to the previously read version which was used to generate te_flags, which in turn means that an incorrect update could happen.

Fix this by adding READ_ONCE() semantics to all cmpxchg_double() loops. Given that READ_ONCE() cannot generate code on s390 which atomically reads 16 bytes, use a private compare-and-swap-double implementation to achieve that. Also replace cmpxchg_double() with the private implementation to be able to re-use the old value within the loops. As a side effect this converts the whole code to only use bit fields to read and modify bits within the hws trailer header.

Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/linux-s390/Y71QJBhNTIatvxUT@osiris/T/#ma14e2a5f7aa8ed4b94b6f9576799b3ad9c60f333 Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
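A hedged sketch of the fixed shape - the trailer_header union, read_trailer_header() and cdsg() names are illustrative stand-ins for the private compare-and-swap-double implementation the patch introduces:

	union trailer_header old, prev, new;	/* 16-byte header as one value */

	prev.val = read_trailer_header(te);	/* one initial 16-byte fetch */
	do {
		old.val = prev.val;		/* reuse, never refetch from memory */
		new.val = old.val;
		new.f = 0;			/* clear SDB_TE_BUFFER_FULL */
		new.a = 1;			/* request the alert */
		prev.val = cdsg(&te->header.val, old.val, new.val);
	} while (prev.val != old.val);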
-
- 10 Jan, 2023 3 commits
-
-
Heiko Carstens authored
Add the missing header include to get rid of

arch/s390/crypto/arch_random.c:15:1: warning: symbol 's390_arch_random_available' was not declared. Should it be static?
arch/s390/crypto/arch_random.c:17:12: warning: symbol 's390_arch_random_counter' was not declared. Should it be static?

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Heiko Carstens authored
Fix this for allmodconfig:

drivers/s390/char/con3270.c:43:24: error: 'condev' defined but not used [-Werror=unused-variable]
 static struct tty3270 *condev;
                        ^~~~~~

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: c17fe081 ("s390/3270: unify con3270 + tty3270") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Alexander Egorenkov authored
This commit addresses the following erroneous situation with file-based kdump executed on a system with a valid IPL report.

On s390, a kdump kernel, its initrd and an IPL report, if present, are loaded into a special memory region reserved at boot - crashkernel. When a system crashes and kdump was activated before, the purgatory code is entered first, which swaps the crashkernel and [0 - crashkernel size] memory regions. Only after that is the kdump kernel entered. For this reason, the pointer to an IPL report in lowcore must point to the IPL report after the swap, and not to the address of the IPL report that was located in the crashkernel memory region before the swap. Failing to do so makes the kdump decompressor try to read memory from the crashkernel memory region, which already contains the production kernel's memory.

The situation described above caused spontaneous kdump failures/hangs on systems where Secure IPL is activated, because on such systems an IPL report is always present. In that case the kdump decompressor tried to parse an IPL report, which frequently led to illegal memory accesses, because an IPL report contains addresses to various data.

Cc: <stable@vger.kernel.org> Fixes: 99feaa71 ("s390/kexec_file: Create ipl report and pass to next kernel") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
- 09 Jan, 2023 14 commits
-
-
Xu Panda authored
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL-terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Link: https://lore.kernel.org/r/202301052024349365834@zte.com.cn Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
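For reference, a minimal usage sketch: strscpy() bounds the copy, always NUL-terminates the destination, and returns -E2BIG instead of silently overrunning on truncation.

	char name[16];

	if (strscpy(name, src, sizeof(name)) < 0)
		/* source was truncated; name is still NUL-terminated */;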
-
Eric Farman authored
By this point, all the pieces are in place to properly support a 2K Format-2 IDAL, and to convert a guest Format-1 IDAL to the 2K Format-2 variety. Let's remove the fence that prohibits them, and allow a guest to submit them if desired. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Eric Farman authored
The vfio_pin_pages() interface allows contiguous pages to be pinned as a single request, which is great for the 4K pages that are normally processed. Old IDA formats operate on 2K chunks, which makes this logic more difficult. Since these formats are rare, let's just invoke the page pinning one at a time, instead of trying to group them. We can rework this code at a later date if needed. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Eric Farman authored
There are two scenarios that need to be addressed here. First, an ORB that does NOT have the Format-2 IDAL bit set could have both a direct-addressed CCW and an indirect-data-address CCW chained together. This means that the IDA CCW will contain a Format-1 IDAL, and can be easily converted to a 2K Format-2 IDAL. But it also means that the direct-addressed CCW needs to be converted to the same 2K Format-2 IDAL for consistency with the ORB settings. Secondly, a Format-1 IDAL is comprised of 31-bit addresses. Thus, we need to cast this IDAL to a pointer of ints while populating the list of addresses that are sent to vfio. Since the result of both of these is the use of the 2K IDAL variants, and the output of vfio-ccw is always a Format-2 IDAL (in order to use 64-bit addresses), make sure that the correct control bit gets set in the ORB when these scenarios occur. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
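The cast boils down to walking the list as 32-bit words; a sketch with illustrative variable names (idaws, idaw_nr), feeding the pa_iova array mentioned elsewhere in this series:

	u32 *idaws_f1 = (u32 *)idaws;	/* Format-1 IDAL: 31-bit addresses */

	for (i = 0; i < idaw_nr; i++)
		pa_iova[i] = idaws_f1[i];	/* zero-extends to 64 bits */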
-
Eric Farman authored
Today, we allocate memory for a list of IDAWs, and if the CCW being processed contains an IDAL we read that data from the guest into that space. We then copy each IDAW into the pa_iova array, or fabricate that pa_iova array with a list of addresses based on a direct-addressed CCW. Combine the reading of the guest IDAL with the creation of a pseudo-IDAL for direct-addressed CCWs, so that both CCW types have a "guest" IDAL that can be populated straight into the pa_iova array. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Eric Farman authored
The idal_nr_words() routine works well for 4K IDAWs, but lost its ability to handle the old 2K formats with the removal of 31-bit builds in commit 5a79859a ("s390: remove 31 bit support"). Since there's nothing preventing a guest from generating this IDAW format, let's re-introduce the math for them and use both when calculating the number of IDAWs based on the bits specified in the ORB. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
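The 2K math presumably mirrors the existing 4K idal_nr_words() with a halved block size; a sketch (constant and function names illustrative):

	#define IDA_2K_BLOCK_SIZE	(1UL << 11)	/* 2K IDAW block */

	static inline unsigned int idal_2k_nr_words(void *vaddr, unsigned int length)
	{
		return ((__pa(vaddr) & (IDA_2K_BLOCK_SIZE - 1)) + length +
			(IDA_2K_BLOCK_SIZE - 1)) >> 11;
	}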
-
Eric Farman authored
The intention is to read the first IDAW to determine the starting location of an I/O operation, knowing that the second and any/all subsequent IDAWs will be aligned per architecture. But, this read receives 64-bits of data, which is the size of a Format-2 IDAW. In the event that Format-1 IDAWs are presented, adjust the size of the read to 32-bits. The data will end up occupying the upper word of the target iova variable, so shift it down to the lower word for use as an address. (By definition, this IDAW format uses a 31-bit address, so the "sign" bit will always be off and there is no concern about sign extension.) Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
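In effect (a sketch; idaw_is_format1 is an illustrative flag, and vfio_dma_rw() is the read primitive used by this series):

	u64 iova = 0;
	size_t len = idaw_is_format1 ? sizeof(u32) : sizeof(u64);

	ret = vfio_dma_rw(vdev, idaw_addr, &iova, len, false);
	if (!ret && idaw_is_format1)
		iova >>= 32;	/* the 32-bit IDAW landed in the upper word */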
-
Eric Farman authored
The rules of an IDAW are fairly simple: Each one can move no more than a defined amount of data, must not cross the boundary defined by that length, and must be aligned to that length as well. The first IDAW in a list is special, in that it does not need to adhere to that alignment, but the other rules still apply. Thus, by reading the first IDAW in a list, the number of IDAWs that will comprise a data transfer of a particular size can be calculated. Let's factor out the reading of that first IDAW with the logic that calculates the length of the list, to simplify the rest of the routine that handles the individual IDAWs. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
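For example, with 4K blocks, a first IDAW that points 1K into its block followed by a 10K transfer spans (1K + 10K) rounded up to full blocks, i.e. 3 blocks, and therefore needs 3 IDAWs.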
-
Eric Farman authored
There are two possible ways in which the list of addresses that gets passed to vfio is calculated. One is from a guest IDAL, which would be an array of (probably) non-contiguous addresses. The other is built from contiguous pages that follow the starting address provided by ccw->cda. page_array_alloc() attempts to simplify things by pre-populating this array from the starting address, but that's not needed for a CCW with an IDAL anyway, so it doesn't need to be in the allocator. Move it to the caller in the non-IDAL case, since it will be overwritten when reading the guest IDAL. Remove the initialization of the pa_page output pointers, since it won't be explicitly needed for either case. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Eric Farman authored
The allocation of our page_array struct calculates the number of 4K pages that would be needed to hold a certain number of bytes. But, since the number of pages that will be pinned is also calculated by the length of the IDAL, this logic is unnecessary. Let's pass that information in directly, and avoid the math within the allocator. Also, let's make this two allocations instead of one, to make it apparent what's happening within here. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Eric Farman authored
Everything about this allocation is harder than necessary, since the memory allocation is already aligned to our needs. Break them apart for readability, instead of doing the funky arithmetic. Of the structures that are involved, only ch_ccw needs the GFP_DMA flag, so the others can be allocated without it. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
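The split presumably ends up something like this (a sketch; field names as used in the driver, sizes illustrative):

	chain->ch_ccw = kcalloc(len, sizeof(*chain->ch_ccw),
				GFP_DMA | GFP_KERNEL);	/* device-visible CCWs */
	chain->ch_pa = kcalloc(len, sizeof(*chain->ch_pa),
			       GFP_KERNEL);		/* bookkeeping only */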
-
Eric Farman authored
The act of processing a fetched CCW has two components:

1) Process a Transfer-in-channel (TIC) CCW
2) Process any other CCW

The former needs to look at whether the TIC jumps backwards into the current channel program or forwards into a new segment, while the latter just processes the CCW data address itself. Rather than passing the chain segment and index within it to the handlers for the above, and requiring each to calculate the elements it needs, simply pass the needed pointers directly. For the TIC, that means the CCW being processed and the location of the entire channel program which holds all segments. For the other CCWs, the page_array pointer is also needed to perform the page pinning, etc. While at it, rename ccwchain_fetch_direct to _ccw, to indicate what it is. The name "_direct" is historical, when it used to process a direct-addressed CCW, but IDAs are processed here too. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-
Eric Farman authored
It was suggested [1] that we replace the old copy_from_iova() routine (which pins a page, does a memcpy, and unpins the page) with the newer vfio_dma_rw() interface. This has a modest improvement in the overall time spent through the fsm_io_request() path, and simplifies some of the code to boot.

[1] https://lore.kernel.org/r/20220706170553.GK693670@nvidia.com/

Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
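The replacement collapses the pin/copy/unpin sequence into a single call; a sketch of its use (buffer and length names illustrative):

	/* read "len" bytes at guest address "iova" into "buf" */
	ret = vfio_dma_rw(vdev, iova, buf, len, false);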
-
Eric Farman authored
The output of vfio_ccw is always a Format-2 IDAL, but the code that explicitly sets this is buried in cp_init(). In fact the input is often already a Format-2 IDAL, and would be rejected (via the check in ccwchain_calc_length()) if it weren't, so explicitly setting it doesn't do much. Setting it way down here only makes it impossible to make decisions in support of other IDAL formats. Let's move that to where the rest of the ORB is set up, so that the CCW processing in cp_prefetch() is performed according to the contents of the unmodified guest ORB. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
-