Commit 85fbf15b authored by Linus Torvalds

Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:
 "The main changes were:

   - Extend the boot protocol to allow future extensions without hitting
     the setup_header size limit.

   - Add quirk to devicetree systems to disable the RTC unless it's
     listed as a supported device.

   - Fix ld.lld linker pedantry"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Introduce setup_indirect
  x86/boot: Introduce kernel_info.setup_type_max
  x86/boot: Introduce kernel_info
  x86/init: Allow DT configured systems to disable RTC at boot time
  x86/realmode: Explicitly set entry point via ENTRY in linker script
parents fd261590 b3c72fc9
......@@ -68,8 +68,25 @@ Protocol 2.12 (Kernel 3.8) Added the xloadflags field and extension fields
Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in
              xloadflags to support booting a 64-bit kernel from 32-bit
              EFI

Protocol 2.14 BURNT BY INCORRECT COMMIT
              ae7e1238e68f2a472a125673ab506d49158c1889
              (x86/boot: Add ACPI RSDP address to setup_header)
              DO NOT USE!!! ASSUME SAME AS 2.13.

Protocol 2.15 (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max.
============= ============================================================

.. note::
     The protocol version number should be changed only if the setup header
     is changed. There is no need to update the version number if boot_params
     or kernel_info are changed. Additionally, it is recommended to use
     xloadflags (in this case the protocol version number should not be
     updated either) or kernel_info to communicate supported Linux kernel
     features to the boot loader. Due to very limited space available in
     the original setup header every update to it should be considered
     with great care. Starting from the protocol 2.15 the primary way to
     communicate things to the boot loader is the kernel_info.

Memory Layout
=============
......@@ -207,6 +224,7 @@ Offset/Size Proto Name Meaning
0258/8 2.10+ pref_address Preferred loading address
0260/4 2.10+ init_size Linear memory required during initialization
0264/4 2.11+ handover_offset Offset of handover entry point
0268/4 2.15+ kernel_info_offset Offset of the kernel_info
=========== ======== ===================== ============================================
.. note::
......@@ -809,6 +827,47 @@ Protocol: 2.09+
sure to consider the case where the linked list already contains
entries.
The setup_data is a bit awkward to use for extremely large data objects,
both because the setup_data header has to be adjacent to the data object
and because it has a 32-bit length field. However, it is important that
intermediate stages of the boot process have a way to identify which
chunks of memory are occupied by kernel data.
Thus setup_indirect struct and SETUP_INDIRECT type were introduced in
protocol 2.15::

  struct setup_indirect {
    __u32 type;
    __u32 reserved;  /* Reserved, must be set to zero. */
    __u64 len;
    __u64 addr;
  };

The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be
SETUP_INDIRECT itself since making the setup_indirect a tree structure
could require a lot of stack space in something that needs to parse it
and stack space can be limited in boot contexts.

Let's give an example of how to point to SETUP_E820_EXT data using
setup_indirect. In this case setup_data and setup_indirect will look like
this::

  struct setup_data {
    __u64 next = 0 or <addr_of_next_setup_data_struct>;
    __u32 type = SETUP_INDIRECT;
    __u32 len = sizeof(setup_indirect);
    __u8 data[sizeof(setup_indirect)] = struct setup_indirect {
      __u32 type = SETUP_INDIRECT | SETUP_E820_EXT;
      __u32 reserved = 0;
      __u64 len = <len_of_SETUP_E820_EXT_data>;
      __u64 addr = <addr_of_SETUP_E820_EXT_data>;
    }
  }

.. note::
     SETUP_INDIRECT | SETUP_NONE objects cannot be properly distinguished
     from SETUP_INDIRECT itself, so boot loaders cannot provide objects of
     this kind.

============ ============
Field name: pref_address
Type: read (reloc)
......@@ -855,6 +914,121 @@ Offset/size: 0x264/4
See EFI HANDOVER PROTOCOL below for more details.
============ ==================
Field name: kernel_info_offset
Type: read
Offset/size: 0x268/4
Protocol: 2.15+
============ ==================
This field is the offset from the beginning of the kernel image to the
kernel_info. The kernel_info structure is embedded in the Linux image
in the uncompressed protected mode region.
The kernel_info
===============
The relationships between the headers are analogous to the various data
sections::

  setup_header = .data
  boot_params/setup_data = .bss

What is missing from the above list? That's right::

  kernel_info = .rodata

We have been (ab)using .data for things that could go into .rodata or .bss for
a long time, for lack of alternatives and -- especially early on -- inertia.
Also, the BIOS stub is responsible for creating boot_params, so it isn't
available to a BIOS-based loader (setup_data is, though).
setup_header is permanently limited to 144 bytes due to the reach of the
2-byte jump field, which doubles as a length field for the structure, combined
with the size of the "hole" in struct boot_params that a protected-mode loader
or the BIOS stub has to copy it into. It is currently 119 bytes long, which
leaves us with 25 very precious bytes. This isn't something that can be fixed
without revising the boot protocol entirely, breaking backwards compatibility.
boot_params proper is limited to 4096 bytes, but can be arbitrarily extended
by adding setup_data entries. It cannot be used to communicate properties of
the kernel image, because it is .bss and has no image-provided content.
kernel_info solves this by providing an extensible place for information about
the kernel image. It is readonly, because the kernel cannot rely on a
bootloader copying its contents anywhere, but that is OK; if it becomes
necessary it can still contain data items that an enabled bootloader would be
expected to copy into a setup_data chunk.
All kernel_info data should be part of this structure. Fixed-size data has to
be put before the kernel_info_var_len_data label. Variable-size data has to be
put after the kernel_info_var_len_data label. Each chunk of variable-size data
has to be prefixed with a header/magic and its size, e.g.::

  kernel_info:
          .ascii  "LToP"          /* Header, Linux top (structure). */
          .long   kernel_info_var_len_data - kernel_info
          .long   kernel_info_end - kernel_info
          .long   0x01234567      /* Some fixed size data for the bootloaders. */
  kernel_info_var_len_data:
  example_struct:                 /* Some variable size data for the bootloaders. */
          .ascii  "0123"          /* Header/Magic. */
          .long   example_struct_end - example_struct
          .ascii  "Struct"
          .long   0x89012345
  example_struct_end:
  example_strings:                /* Some variable size data for the bootloaders. */
          .ascii  "ABCD"          /* Header/Magic. */
          .long   example_strings_end - example_strings
          .asciz  "String_0"
          .asciz  "String_1"
  example_strings_end:
  kernel_info_end:

This way the kernel_info is a self-contained blob.

.. note::
     Each variable-size data header/magic can be any 4-character string,
     without \0 at the end of the string, that does not collide with
     existing variable-length data headers/magics.
Details of the kernel_info Fields
=================================
============ ========
Field name: header
Offset/size: 0x0000/4
============ ========
Contains the magic number "LToP" (0x506f544c).
============ ========
Field name: size
Offset/size: 0x0004/4
============ ========
This field contains the size of the kernel_info including kernel_info.header.
It does not count kernel_info.kernel_info_var_len_data size. This field should be
used by the bootloaders to detect supported fixed size fields in the kernel_info
and the beginning of kernel_info.kernel_info_var_len_data.
============ ========
Field name: size_total
Offset/size: 0x0008/4
============ ========
This field contains the size of the kernel_info including kernel_info.header
and kernel_info.kernel_info_var_len_data.
============ ==============
Field name: setup_type_max
Offset/size: 0x000c/4
============ ==============
This field contains the maximal allowed type for setup_data and setup_indirect structs.
The Image Checksum
==================
......
......@@ -87,7 +87,7 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
quiet_cmd_zoffset = ZOFFSET $@
cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
......
......@@ -72,8 +72,8 @@ $(obj)/../voffset.h: vmlinux FORCE
$(obj)/misc.o: $(obj)/../voffset.h
-vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
-$(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
+vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o \
+$(obj)/misc.o $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
$(obj)/piggy.o $(obj)/cpuflags.o
vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
......
......@@ -459,6 +459,18 @@ static bool mem_avoid_overlap(struct mem_vector *img,
is_overlapping = true;
}
if (ptr->type == SETUP_INDIRECT &&
((struct setup_indirect *)ptr->data)->type != SETUP_INDIRECT) {
avoid.start = ((struct setup_indirect *)ptr->data)->addr;
avoid.size = ((struct setup_indirect *)ptr->data)->len;
if (mem_overlaps(img, &avoid) && (avoid.start < earliest)) {
*overlap = avoid;
earliest = overlap->start;
is_overlapping = true;
}
}
ptr = (struct setup_data *)(unsigned long)ptr->next;
}
......
/* SPDX-License-Identifier: GPL-2.0 */
#include <asm/bootparam.h>
.section ".rodata.kernel_info", "a"
.global kernel_info
kernel_info:
/* Header, Linux top (structure). */
.ascii "LToP"
/* Size. */
.long kernel_info_var_len_data - kernel_info
/* Size total. */
.long kernel_info_end - kernel_info
/* Maximal allowed type for setup_data and setup_indirect structs. */
.long SETUP_TYPE_MAX
kernel_info_var_len_data:
/* Empty for time being... */
kernel_info_end:
......@@ -300,7 +300,7 @@ _start:
# Part 2 of the header, from the old setup.S
.ascii "HdrS" # header signature
-.word 0x020d # header version number (>= 0x0105)
+.word 0x020f # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
realmode_swtch: .word 0, 0 # default_switch, SETUPSEG
......@@ -567,6 +567,7 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr
init_size: .long INIT_SIZE # kernel initialization size
handover_offset: .long 0 # Filled in by build.c
kernel_info_offset: .long 0 # Filled in by build.c
# End of setup header #####################################################
......
......@@ -56,6 +56,7 @@ u8 buf[SETUP_SECT_MAX*512];
unsigned long efi32_stub_entry;
unsigned long efi64_stub_entry;
unsigned long efi_pe_entry;
unsigned long kernel_info;
unsigned long startup_64;
/*----------------------------------------------------------------------*/
......@@ -321,6 +322,7 @@ static void parse_zoffset(char *fname)
PARSE_ZOFS(p, efi32_stub_entry);
PARSE_ZOFS(p, efi64_stub_entry);
PARSE_ZOFS(p, efi_pe_entry);
PARSE_ZOFS(p, kernel_info);
PARSE_ZOFS(p, startup_64);
p = strchr(p, '\n');
......@@ -410,6 +412,9 @@ int main(int argc, char ** argv)
efi_stub_entry_update();
/* Update kernel_info offset. */
put_unaligned_le32(kernel_info, &buf[0x268]);
crc = partial_crc32(buf, i, crc);
if (fwrite(buf, 1, i, dest) != i)
die("Writing setup failed");
......
......@@ -2,7 +2,7 @@
#ifndef _ASM_X86_BOOTPARAM_H
#define _ASM_X86_BOOTPARAM_H
-/* setup_data types */
+/* setup_data/setup_indirect types */
#define SETUP_NONE 0
#define SETUP_E820_EXT 1
#define SETUP_DTB 2
......@@ -11,6 +11,11 @@
#define SETUP_APPLE_PROPERTIES 5
#define SETUP_JAILHOUSE 6
#define SETUP_INDIRECT (1<<31)
/* SETUP_INDIRECT | max(SETUP_*) */
#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE)
/* ram_size flags */
#define RAMDISK_IMAGE_START_MASK 0x07FF
#define RAMDISK_PROMPT_FLAG 0x8000
......@@ -49,6 +54,14 @@ struct setup_data {
__u8 data[0];
};
/* extensible setup indirect data node */
struct setup_indirect {
__u32 type;
__u32 reserved; /* Reserved, must be set to zero. */
__u64 len;
__u64 addr;
};
struct setup_header {
__u8 setup_sects;
__u16 root_flags;
......@@ -88,6 +101,7 @@ struct setup_header {
__u64 pref_address;
__u32 init_size;
__u32 handover_offset;
__u32 kernel_info_offset;
} __attribute__((packed));
struct sys_desc_table {
......
......@@ -999,6 +999,17 @@ void __init e820__reserve_setup_data(void)
data = early_memremap(pa_data, sizeof(*data));
e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
e820__range_update_kexec(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
if (data->type == SETUP_INDIRECT &&
((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
e820__range_update(((struct setup_indirect *)data->data)->addr,
((struct setup_indirect *)data->data)->len,
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
e820__range_update_kexec(((struct setup_indirect *)data->data)->addr,
((struct setup_indirect *)data->data)->len,
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
}
pa_data = data->next;
early_memunmap(data, sizeof(*data));
}
......
......@@ -44,7 +44,12 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
if (count > node->len - pos)
count = node->len - pos;
-pa = node->paddr + sizeof(struct setup_data) + pos;
+pa = node->paddr + pos;
/* Is it direct data or invalid indirect one? */
if (!(node->type & SETUP_INDIRECT) || node->type == SETUP_INDIRECT)
pa += sizeof(struct setup_data);
p = memremap(pa, count, MEMREMAP_WB);
if (!p)
return -ENOMEM;
......@@ -108,9 +113,17 @@ static int __init create_setup_data_nodes(struct dentry *parent)
goto err_dir;
}
if (data->type == SETUP_INDIRECT &&
((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
node->paddr = ((struct setup_indirect *)data->data)->addr;
node->type = ((struct setup_indirect *)data->data)->type;
node->len = ((struct setup_indirect *)data->data)->len;
} else {
node->paddr = pa_data;
node->type = data->type;
node->len = data->len;
}
create_setup_data_node(d, no, node);
pa_data = data->next;
......
......@@ -100,7 +100,12 @@ static int __init get_setup_data_size(int nr, size_t *size)
if (!data)
return -ENOMEM;
if (nr == i) {
if (data->type == SETUP_INDIRECT &&
((struct setup_indirect *)data->data)->type != SETUP_INDIRECT)
*size = ((struct setup_indirect *)data->data)->len;
else
*size = data->len;
memunmap(data);
return 0;
}
......@@ -130,6 +135,9 @@ static ssize_t type_show(struct kobject *kobj,
if (!data)
return -ENOMEM;
if (data->type == SETUP_INDIRECT)
ret = sprintf(buf, "0x%x\n", ((struct setup_indirect *)data->data)->type);
else
ret = sprintf(buf, "0x%x\n", data->type);
memunmap(data);
return ret;
......@@ -142,7 +150,7 @@ static ssize_t setup_data_data_read(struct file *fp,
loff_t off, size_t count)
{
int nr, ret = 0;
-u64 paddr;
+u64 paddr, len;
struct setup_data *data;
void *p;
......@@ -157,19 +165,28 @@ static ssize_t setup_data_data_read(struct file *fp,
if (!data)
return -ENOMEM;
-if (off > data->len) {
+if (data->type == SETUP_INDIRECT &&
+((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
+paddr = ((struct setup_indirect *)data->data)->addr;
+len = ((struct setup_indirect *)data->data)->len;
+} else {
+paddr += sizeof(*data);
+len = data->len;
+}
+if (off > len) {
ret = -EINVAL;
goto out;
}
-if (count > data->len - off)
-count = data->len - off;
+if (count > len - off)
+count = len - off;
if (!count)
goto out;
ret = count;
-p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
+p = memremap(paddr, len, MEMREMAP_WB);
if (!p) {
ret = -ENOMEM;
goto out;
......
......@@ -438,6 +438,12 @@ static void __init memblock_x86_reserve_range_setup_data(void)
while (pa_data) {
data = early_memremap(pa_data, sizeof(*data));
memblock_reserve(pa_data, sizeof(*data) + data->len);
if (data->type == SETUP_INDIRECT &&
((struct setup_indirect *)data->data)->type != SETUP_INDIRECT)
memblock_reserve(((struct setup_indirect *)data->data)->addr,
((struct setup_indirect *)data->data)->len);
pa_data = data->next;
early_memunmap(data, sizeof(*data));
}
......
......@@ -31,6 +31,28 @@ static int __init iommu_init_noop(void) { return 0; }
static void iommu_shutdown_noop(void) { }
bool __init bool_x86_init_noop(void) { return false; }
void x86_op_int_noop(int cpu) { }
static __init int set_rtc_noop(const struct timespec64 *now) { return -EINVAL; }
static __init void get_rtc_noop(struct timespec64 *now) { }
static __initconst const struct of_device_id of_cmos_match[] = {
{ .compatible = "motorola,mc146818" },
{}
};
/*
* Allow devicetree configured systems to disable the RTC by setting the
* corresponding DT node's status property to disabled. Code is optimized
* out for CONFIG_OF=n builds.
*/
static __init void x86_wallclock_init(void)
{
struct device_node *node = of_find_matching_node(NULL, of_cmos_match);
if (node && !of_device_is_available(node)) {
x86_platform.get_wallclock = get_rtc_noop;
x86_platform.set_wallclock = set_rtc_noop;
}
}
/*
* The platform setup functions are preset with the default functions
......@@ -73,7 +95,7 @@ struct x86_init_ops x86_init __initdata = {
.timers = {
.setup_percpu_clockev = setup_boot_APIC_clock,
.timer_init = hpet_time_init,
-.wallclock_init = x86_init_noop,
+.wallclock_init = x86_wallclock_init,
},
.iommu = {
......
......@@ -626,6 +626,17 @@ static bool memremap_is_setup_data(resource_size_t phys_addr,
paddr_next = data->next;
len = data->len;
if ((phys_addr > paddr) && (phys_addr < (paddr + len))) {
memunmap(data);
return true;
}
if (data->type == SETUP_INDIRECT &&
((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
paddr = ((struct setup_indirect *)data->data)->addr;
len = ((struct setup_indirect *)data->data)->len;
}
memunmap(data);
if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
......
......@@ -11,6 +11,7 @@
OUTPUT_FORMAT("elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(pa_text_start)
SECTIONS
{
......