1. 24 Aug, 2023 10 commits
    • document while_each_thread(), change first_tid() to use for_each_thread() · dce8f8ed
      Oleg Nesterov authored
      Add the comment to explain that while_each_thread(g,t) is not rcu-safe
      unless g is stable (e.g.  current).  Even if g is a group leader and thus
      can't exit before t, t or another sub-thread can exec and remove g from
      the thread_group list.
      
      The only lockless user of while_each_thread() is first_tid() and it is
      fine in that it can't loop forever, yet for_each_thread() looks better and
      I am going to change while_each_thread/next_thread.
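
The hazard described above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: the `Node` class and the helpers `unlink()`, `while_each_style()` and `for_each_style()` are hypothetical names, and `unlink()` assumes RCU-style removal, where the removed node keeps its forward pointer but no other node points to it any more.

```python
class Node:
    """Toy list node; `next` stands in for a kernel list forward pointer."""
    def __init__(self, name):
        self.name = name
        self.next = self


def make_ring(names):
    """Build a circular singly linked ring from a list of names."""
    nodes = [Node(n) for n in names]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.next = b
    return nodes


def unlink(nodes, victim):
    """Assumed RCU-style removal: neighbours skip `victim`, victim.next stays."""
    for n in nodes:
        if n is not victim and n.next is victim:
            n.next = victim.next


def while_each_style(g, start, limit=1000):
    """Walk from `start` until the walk returns to `g` (the loop anchor).

    Returns the visited names, or None if `g` never reappears within
    `limit` steps (the toy stand-in for a loop that would not terminate).
    """
    visited, t = [], start
    for _ in range(limit):
        visited.append(t.name)
        t = t.next
        if t is g:
            return visited
    return None


def for_each_style(head, limit=1000):
    """Walk from a stable list head that is never unlinked."""
    visited, t = [], head.next
    for _ in range(limit):
        if t is head:
            return visited
        visited.append(t.name)
        t = t.next
    return None
```

In this model, a walk anchored at `g` stops terminating once `g` has been unlinked, while a walk anchored at a stable head still ends. That mirrors the reasoning in the comment being added; it says nothing about the real locking rules in the kernel.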
      
      Link: https://lkml.kernel.org/r/20230823170806.GA11724@redhat.com
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • drivers/char/mem.c: shrink character device's devlist[] array · ed1af26c
      Alexey Dobriyan authored
      Merge padding, shrinking "struct memdev" from 32 bytes to 24 bytes
      on 64-bit.
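
The effect of merging padding can be reproduced with `ctypes`. This is a sketch under assumptions: the commit message does not list the members of "struct memdev", so the field names, types and ordering below are hypothetical stand-ins (two pointers, one 4-byte and one 2-byte integer) chosen to show how reordering alone can take a structure from 32 to 24 bytes on a 64-bit ABI.

```python
import ctypes


class MemdevBefore(ctypes.Structure):
    # Assumed layout: the 2-byte member sits between two pointers, so the
    # compiler pads it out to pointer alignment.
    _fields_ = [
        ("name", ctypes.c_char_p),   # 8 bytes on 64-bit
        ("mode", ctypes.c_ushort),   # 2 bytes + 6 bytes padding
        ("fops", ctypes.c_void_p),   # 8 bytes
        ("fmode", ctypes.c_uint),    # 4 bytes + 4 bytes tail padding
    ]


class MemdevAfter(ctypes.Structure):
    # Assumed layout after reordering: the small members share one slot.
    _fields_ = [
        ("name", ctypes.c_char_p),   # 8 bytes
        ("fops", ctypes.c_void_p),   # 8 bytes
        ("fmode", ctypes.c_uint),    # 4 bytes
        ("mode", ctypes.c_ushort),   # 2 bytes + 2 bytes tail padding
    ]


def memdev_sizes():
    """Return (size before, size after) for the two assumed layouts."""
    return ctypes.sizeof(MemdevBefore), ctypes.sizeof(MemdevAfter)
```

On a platform with 8-byte pointers, `memdev_sizes()` gives the 32 and 24 bytes quoted in the commit message; on 32-bit platforms the numbers differ.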
      
      Link: https://lkml.kernel.org/r/fe4d62ab-2427-4635-b9f4-467853fb63e3@p183
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/crash: optimize CPU changes · 543cd4c5
      Eric DeVolder authored
      crash_prepare_elf64_headers() writes into the elfcorehdr an ELF PT_NOTE
      for all possible CPUs.  As such, subsequent changes to CPUs (ie.  hot
      un/plug, online/offline) do not need to rewrite the elfcorehdr.
      
      The kimage->file_mode term covers kdump images loaded via the
      kexec_file_load() syscall.  Since crash_prepare_elf64_headers() wrote the
      initial elfcorehdr, no update to the elfcorehdr is needed for CPU changes.
      
      The kimage->elfcorehdr_updated term covers kdump images loaded via the
      kexec_load() syscall.  At least one memory or CPU change must occur to
      cause crash_prepare_elf64_headers() to rewrite the elfcorehdr. 
      Afterwards, no update to the elfcorehdr is needed for CPU changes.
      
      This code is intentionally *NOT* hoisted into crash_handle_hotplug_event()
      as it would prevent the arch-specific handler from running for CPU
      changes.  This would break PPC, for example, which needs to update other
      information besides the elfcorehdr, on CPU changes.
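
The two terms described above can be modelled as a single predicate. This is a userspace sketch, not the kernel implementation: `KImage`, `CPU_ACTIONS` and `arch_handle_hotplug()` are hypothetical names, and the model sets `elfcorehdr_updated` inside the arch handler purely for brevity.

```python
from dataclasses import dataclass

# Hypothetical action names for the four kinds of hotplug event.
CPU_ACTIONS = {"add_cpu", "remove_cpu"}


@dataclass
class KImage:
    file_mode: bool                   # True if loaded via kexec_file_load()
    elfcorehdr_updated: bool = False  # True once the kernel rewrote the header
    rewrites: int = 0                 # Counts elfcorehdr rewrites in this model


def arch_handle_hotplug(image, action):
    """Return True if the elfcorehdr was rewritten for this event."""
    # All possible CPUs are already described once the kernel has written
    # the elfcorehdr, so further CPU changes need no rewrite.
    if action in CPU_ACTIONS and (image.file_mode or image.elfcorehdr_updated):
        return False
    image.rewrites += 1
    image.elfcorehdr_updated = True
    return True
```

With a `kexec_load()`-style image (`file_mode=False`), the first CPU or memory event triggers a rewrite and later CPU events are skipped; memory events always rewrite. Keeping the check in the arch handler matches the commit's point that other architectures may still need to run their own handler for CPU changes.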
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-9-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() · a396d0f8
      Eric DeVolder authored
      The function crash_prepare_elf64_headers() generates the elfcorehdr which
      describes the CPUs and memory in the system for the crash kernel.  In
      particular, it writes out ELF PT_NOTEs for memory regions and the CPUs in
      the system.
      
      With respect to the CPUs, the current implementation utilizes
      for_each_present_cpu() which means that as CPUs are added and removed, the
      elfcorehdr must again be updated to reflect the new set of CPUs.
      
      The reasoning behind the move to use for_each_possible_cpu(), is:
      
      - At kernel boot time, all percpu crash_notes are allocated for all
        possible CPUs; that is, crash_notes are not allocated dynamically
        when CPUs are plugged/unplugged. Thus the crash_notes for each
        possible CPU are always available.
      
      - The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU.
        Changing to for_each_possible_cpu() is valid as the crash_notes
        pointed to by each CPU PT_NOTE are present and always valid.
      
      Furthermore, examining a common crash processing path of:
      
       kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
                 elfcorehdr      /proc/vmcore     vmcore
      
      reveals how the ELF CPU PT_NOTEs are utilized:
      
      - Upon panic, each CPU is sent an IPI and shuts itself down, recording
       its state in its crash_notes. When all CPUs are shut down, the
       crash kernel is launched with a pointer to the elfcorehdr.
      
      - The crash kernel via linux/fs/proc/vmcore.c does not examine or
       use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.
      
      - The makedumpfile utility uses /proc/vmcore and reads the CPU
       PT_NOTEs to craft a nr_cpus variable, which is reported in a
       header but otherwise generally unused. Makedumpfile creates the
       vmcore.
      
      - The 'crash' dump analyzer does not appear to reference the CPU
       PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
       symbols and directly examines those structure contents from vmcore
       memory. From that information it is able to determine which CPUs
       are present and online, and locate the corresponding crash_notes.
       Said differently, it appears that 'crash' analyzer does not rely
       on the ELF PT_NOTEs for CPUs; rather it obtains the information
       directly via kernel symbols and the memory within the vmcore.
      
      (There may be other vmcore generating and analysis tools that do use these
      PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most common
      solution.)
      
      This results in the benefit of having all CPUs described in the
      elfcorehdr, and therefore reducing the need to re-generate the elfcorehdr
      on CPU changes, at the small expense of an additional 56 bytes per PT_NOTE
      for not-present-but-possible CPUs.
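
The 56-byte figure corresponds to the size of one ELF64 program header. The sketch below checks that size with `struct` and computes the extra space for possible-but-not-present CPUs; the helper name `extra_note_bytes()` and the CPU counts in the usage note are illustrative assumptions.

```python
import struct

# Elf64_Phdr: p_type and p_flags are 32-bit, followed by six 64-bit fields
# (p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align).
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def extra_note_bytes(possible_cpus, present_cpus):
    """Extra elfcorehdr bytes needed to describe not-present CPUs."""
    if present_cpus > possible_cpus:
        raise ValueError("present CPUs cannot exceed possible CPUs")
    return (possible_cpus - present_cpus) * ELF64_PHDR.size
```

For a system with 8 possible and 4 present CPUs, `extra_note_bytes(8, 4)` is 4 program headers' worth of space, which is small next to the cost of regenerating the elfcorehdr on every CPU change.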
      
      On systems where kexec_file_load() syscall is utilized, all the above is
      valid.  On systems where kexec_load() syscall is utilized, there may be
      the need for the elfcorehdr to be regenerated once.  The reason being that
      some archs only populate the 'present' CPUs from the
      /sys/devices/system/cpu entries, which the userspace 'kexec' utility uses
      to generate the userspace-supplied elfcorehdr.  In this situation, one
      memory or CPU change will rewrite the elfcorehdr via the
      crash_prepare_elf64_headers() function and now all possible CPUs will be
      described, just as with kexec_file_load() syscall.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-8-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Suggested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: hotplug support for kexec_load() · a72bbec7
      Eric DeVolder authored
      The hotplug support for kexec_load() requires changes to the userspace
      kexec-tools and a little extra help from the kernel.
      
      Given a kdump capture kernel loaded via kexec_load(), and a subsequent
      hotplug event, the crash hotplug handler finds the elfcorehdr and rewrites
      it to reflect the hotplug change.  That is the desired outcome, however,
      at kernel panic time, the purgatory integrity check fails (because the
      elfcorehdr changed), and the capture kernel does not boot and no vmcore is
      generated.
      
      Therefore, the userspace kexec-tools/kexec must indicate to the kernel
      that the elfcorehdr can be modified (because the kexec excluded the
      elfcorehdr from the digest, and sized the elfcorehdr memory buffer
      appropriately).
      
      To facilitate hotplug support with kexec_load():
       - a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
         safe for the kernel to modify the kexec_load()'d elfcorehdr
       - the /sys/kernel/crash_elfcorehdr_size node communicates the
         preferred size of the elfcorehdr memory buffer
       - The sysfs crash_hotplug nodes (ie.
         /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
         take into account kexec_file_load() vs kexec_load() and
         KEXEC_UPDATE_ELFCOREHDR.
         This is critical so that the udev rule processing of crash_hotplug
         is all that is needed to determine if the userspace unload-then-load
         of the kdump image is to be skipped, or not. The proposed udev
         rule change looks like:
         # The kernel updates the crash elfcorehdr for CPU and memory changes
         SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
         SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
      The table below indicates the behavior of kexec_load()'d kdump image
      updates (with the new udev crash_hotplug rule in place):
      
       Kernel | Kexec old | Kexec new
       -------+-----------+----------
       Old    |     a     |     a
       New    |     a     |     b
      
      where kexec 'old' and 'new' indicate whether kexec-tools has the needed
      modifications for the crash hotplug feature, and kernel 'old' and 'new'
      indicate whether the kernel supports this crash hotplug feature.
      
      Behavior 'a' indicates the unload-then-reload of the entire kdump image. 
      For the kexec 'old' column, the unload-then-reload occurs due to the
      missing flag KEXEC_UPDATE_ELFCOREHDR.  An 'old' kernel (with 'new' kexec)
      does not present the crash_hotplug sysfs node, which leads to the
      unload-then-reload of the kdump image.
      
      Behavior 'b' indicates the desired optimized behavior of the kernel
      directly modifying the elfcorehdr and avoiding the unload-then-reload of
      the kdump image.
      
      If the udev rule is not updated with the crash_hotplug node check, then
      regardless of whether the kernel or kexec is new or old, the kdump image
      continues to be unloaded and then reloaded on hotplug changes.
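
The behavior matrix, including the case of an unmodified udev rule, reduces to one condition. The function below is a sketch that encodes only what the text states; the name `kdump_update_behavior()` and its parameters are hypothetical.

```python
def kdump_update_behavior(kernel_new, kexec_new, udev_rule_updated=True):
    """Return 'b' for in-kernel elfcorehdr updates, 'a' for unload-then-reload.

    kernel_new:        the kernel supports the crash hotplug feature
    kexec_new:         kexec-tools passes KEXEC_UPDATE_ELFCOREHDR
    udev_rule_updated: the udev rule checks the crash_hotplug sysfs node
    """
    if udev_rule_updated and kernel_new and kexec_new:
        return "b"
    # Any missing piece falls back to the existing userspace reload.
    return "a"
```

Because every incomplete combination falls back to behavior 'a', the three components can be rolled out in any order without breaking kdump.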
      
      To fully support crash hotplug feature, there needs to be a rollout of
      kernel, kexec-tools and udev rule changes.  However, the order of the
      rollout of these pieces does not matter; kexec_load()'d kdump images still
      function for hotplug as-is.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-7-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Suggested-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • x86/crash: add x86 crash hotplug support · ea53ad9c
      Eric DeVolder authored
      When CPU or memory is hot un/plugged, or off/onlined, the crash
      elfcorehdr, which describes the CPUs and memory in the system, must also
      be updated.
      
      A new elfcorehdr is generated from the available CPUs and memory and
      replaces the existing elfcorehdr.  The segment containing the elfcorehdr
      is identified at run-time in crash_core:crash_handle_hotplug_event().
      
      No modifications to purgatory (see 'kexec: exclude elfcorehdr from the
      segment digest') or boot_params (as the elfcorehdr= capture kernel command
      line parameter pointer remains unchanged and correct) are needed, just
      elfcorehdr.
      
      For kexec_file_load(), the elfcorehdr segment size is based on NR_CPUS and
      CRASH_MAX_MEMORY_RANGES in order to accommodate a growing number of CPU
      and memory resources.
      
      For kexec_load(), the userspace kexec utility needs to size the elfcorehdr
      segment in the same/similar manner.
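
A userspace tool sizing the elfcorehdr buffer could follow a calculation like the one below. The exact formula used by the kernel is not given in the text, so this is only a plausible sketch: the helper name, the `fixed_headers` parameter (a guess at non-CPU, non-memory program headers) and the inclusion of the ELF header are all assumptions. Only the ELF64 structure sizes are standard.

```python
ELF64_EHDR_SIZE = 64  # size of Elf64_Ehdr
ELF64_PHDR_SIZE = 56  # size of Elf64_Phdr


def elfcorehdr_buffer_size(nr_cpus, max_memory_ranges, fixed_headers=2):
    """Worst-case buffer size: one program header per CPU and memory range.

    fixed_headers is an assumed count of additional program headers that
    are always present regardless of hotplug state.
    """
    phdrs = fixed_headers + nr_cpus + max_memory_ranges
    return ELF64_EHDR_SIZE + phdrs * ELF64_PHDR_SIZE
```

Sizing for the maximum up front is what lets the kernel overwrite the elfcorehdr in place as CPUs and memory are added, without relocating the segment.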
      
      To accommodate kexec_load() syscall in the absence of kexec_file_load()
      syscall support, prepare_elf_headers() and dependents are moved outside of
      CONFIG_KEXEC_FILE.
      
      [eric.devolder@oracle.com: correct unused function build error]
        Link: https://lkml.kernel.org/r/20230821182644.2143-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230814214446.6659-6-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: memory and CPU hotplug sysfs attributes · 88a6f899
      Eric DeVolder authored
      Introduce the crash_hotplug attribute for memory and CPUs for use by
      userspace.  These attributes directly facilitate the udev rule for
      managing userspace re-loading of the crash kernel upon hot un/plug
      changes.
      
      For memory, expose the crash_hotplug attribute to the
      /sys/devices/system/memory directory.  For example:
      
       # udevadm info --attribute-walk /sys/devices/system/memory/memory81
        looking at device '/devices/system/memory/memory81':
          KERNEL=="memory81"
          SUBSYSTEM=="memory"
          DRIVER==""
          ATTR{online}=="1"
          ATTR{phys_device}=="0"
          ATTR{phys_index}=="00000051"
          ATTR{removable}=="1"
          ATTR{state}=="online"
          ATTR{valid_zones}=="Movable"
      
        looking at parent device '/devices/system/memory':
          KERNELS=="memory"
          SUBSYSTEMS==""
          DRIVERS==""
          ATTRS{auto_online_blocks}=="offline"
          ATTRS{block_size_bytes}=="8000000"
          ATTRS{crash_hotplug}=="1"
      
      For CPUs, expose the crash_hotplug attribute to the
      /sys/devices/system/cpu directory. For example:
      
       # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
        looking at device '/devices/system/cpu/cpu0':
          KERNEL=="cpu0"
          SUBSYSTEM=="cpu"
          DRIVER=="processor"
          ATTR{crash_notes}=="277c38600"
          ATTR{crash_notes_size}=="368"
          ATTR{online}=="1"
      
        looking at parent device '/devices/system/cpu':
          KERNELS=="cpu"
          SUBSYSTEMS==""
          DRIVERS==""
          ATTRS{crash_hotplug}=="1"
          ATTRS{isolated}==""
          ATTRS{kernel_max}=="8191"
          ATTRS{nohz_full}=="  (null)"
          ATTRS{offline}=="4-7"
          ATTRS{online}=="0-3"
          ATTRS{possible}=="0-7"
          ATTRS{present}=="0-3"
      
      With these sysfs attributes in place, it is possible to efficiently
      instruct the udev rule to skip crash kernel reloading for kernels
      configured with crash hotplug support.
      
      For example, the following is the proposed udev rule change for RHEL
      system 98-kexec.rules (as the first lines of the rule file):
      
       # The kernel updates the crash elfcorehdr for CPU and memory changes
       SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
       SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
      When examined in the context of 98-kexec.rules, the above rules test if
      crash_hotplug is set, and if so, the userspace initiated
      unload-then-reload of the crash kernel is skipped.
      
      CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU
      and CONFIG_MEMORY_HOTPLUG kernel config options.  If an architecture
      supports, for example, memory hotplug but not CPU hotplug, then the
      /sys/devices/system/memory/crash_hotplug attribute file is present, but
      the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be
      present.  Thus the udev rule skips userspace processing of memory hot
      un/plug events, but the udev rule will evaluate false for CPU events, thus
      allowing userspace to process CPU hot un/plug events (ie the
      unload-then-reload of the kdump capture kernel).
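
The same decision the udev rule makes can be sketched as a direct read of the sysfs attribute. The helper names `crash_hotplug_enabled()` and `userspace_should_reload()` are hypothetical, and the `sysfs_root` parameter exists only so the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def crash_hotplug_enabled(subsystem, sysfs_root="/sys/devices/system"):
    """Return True if <sysfs_root>/<subsystem>/crash_hotplug reads "1".

    A missing attribute file (for example, no CPU hotplug support) is
    treated the same as "0".
    """
    attr = Path(sysfs_root) / subsystem / "crash_hotplug"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        return False


def userspace_should_reload(subsystem, sysfs_root="/sys/devices/system"):
    """Userspace reloads kdump only when the kernel does not handle it."""
    return not crash_hotplug_enabled(subsystem, sysfs_root)
```

Calling `userspace_should_reload("memory")` and `userspace_should_reload("cpu")` separately reflects the split described above: one subsystem can be handled by the kernel while the other still needs the userspace unload-then-reload.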
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec: exclude elfcorehdr from the segment digest · f7cc804a
      Eric DeVolder authored
      When a crash kernel is loaded via the kexec_file_load() syscall, the
      kernel places the various segments (ie crash kernel, crash initrd,
      boot_params, elfcorehdr, purgatory, etc) in memory.  For those
      architectures that utilize purgatory, a hash digest of the segments is
      calculated for integrity checking.  The digest is embedded into the
      purgatory image prior to placing in memory.
      
      Updates to the elfcorehdr in response to CPU and memory changes would
      cause the purgatory integrity checking to fail (at crash time, and no
      vmcore created).  Therefore, the elfcorehdr segment is explicitly excluded
      from the purgatory digest, enabling updates to the elfcorehdr while also
      avoiding the need to recompute the hash digest and reload purgatory.
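
The effect of excluding one segment from the digest can be shown with a small userspace model. This is a sketch only: `segment_digest()` and the segment names are illustrative, and SHA-256 over concatenated segment contents is an assumption about the digest, not a description of how purgatory actually computes it.

```python
import hashlib


def segment_digest(segments, exclude=("elfcorehdr",)):
    """Hash the contents of all segments except those named in `exclude`.

    segments: iterable of (name, bytes) pairs in load order.
    """
    h = hashlib.sha256()
    for name, data in segments:
        if name in exclude:
            continue  # Excluded segments may change without breaking the check
        h.update(data)
    return h.hexdigest()
```

If the elfcorehdr bytes change after loading, `segment_digest()` returns the same value as before, whereas a digest computed with `exclude=()` would differ and the integrity check would fail at crash time.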
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Suggested-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: add generic infrastructure for crash hotplug support · 24726275
      Eric DeVolder authored
      To support crash hotplug, a mechanism is needed to update the crash
      elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/onlining).
      The crash elfcorehdr describes the CPUs and memory to be written into the
      vmcore.
      
      To track CPU changes, callbacks are registered with the cpuhp mechanism
      via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN).  The crash hotplug
      elfcorehdr update has no explicit ordering requirement (relative to other
      cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. 
      CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a
      new state for crash hotplug.  Also, CPUHP_BP_PREPARE_DYN is the last state
      in the PREPARE group, just prior to the STARTING group, which is very
      close to the CPU starting up in a plug/online situation, or stopping in an
      unplug/offline situation.  This minimizes the window of time during an
      actual plug/online or unplug/offline situation in which the elfcorehdr
      would be inaccurate.  Note that for a CPU being unplugged or offlined, the
      CPU will still be present in the list of CPUs generated by
      crash_prepare_elf64_headers().  However, there is no need to explicitly
      omit the CPU, see justification in 'crash: change
      crash_prepare_elf64_headers() to for_each_possible_cpu()'.
      
      To track memory changes, a notifier is registered to capture the memblock
      MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
      
      The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
      which performs needed tasks and then dispatches the event to the
      architecture specific arch_crash_handle_hotplug_event() to update the
      elfcorehdr with the current state of CPUs and memory.  During the process,
      the kexec_lock is held.
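
The flow from CPU callbacks and memory notifiers to the architecture handler can be sketched as a small dispatcher. This is a userspace model, not kernel code: the class `CrashHotplugModel`, its method names and the action strings are hypothetical, a `threading.Lock` stands in for the kexec_lock, and the "no crash image loaded" check is an assumed example of the "needed tasks" mentioned above.

```python
import threading


class CrashHotplugModel:
    def __init__(self, arch_handler):
        self.kexec_lock = threading.Lock()  # stand-in for the kernel's kexec_lock
        self.arch_handler = arch_handler    # arch-specific elfcorehdr update
        self.crash_image_loaded = False

    def handle_event(self, action, cpu=None):
        """Generic handler: hold the lock, then dispatch to the arch handler."""
        with self.kexec_lock:
            if not self.crash_image_loaded:
                return False  # Nothing to update without a loaded image
            self.arch_handler(action, cpu)
            return True

    # CPU hotplug callbacks
    def cpu_online(self, cpu):
        return self.handle_event("add_cpu", cpu)

    def cpu_offline(self, cpu):
        return self.handle_event("remove_cpu", cpu)

    # Memory notifier: only MEM_ONLINE and MEM_OFFLINE are of interest
    def memory_notify(self, event):
        if event == "MEM_ONLINE":
            return self.handle_event("add_memory")
        if event == "MEM_OFFLINE":
            return self.handle_event("remove_memory")
        return False
```

Routing both event sources through one generic handler keeps the locking and bookkeeping in a single place, leaving each architecture to supply only the elfcorehdr update itself.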
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: move a few code bits to setup support of crash hotplug · 6f991cc3
      Eric DeVolder authored
      Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.
      
      Once the kdump service is loaded, if changes to CPUs or memory occur,
      either by hot un/plug or off/onlining, the crash elfcorehdr must also be
      updated.
      
      The elfcorehdr describes to kdump the CPUs and memory in the system, and
      any inaccuracies can result in a vmcore with missing CPU context or memory
      regions.
      
      The current solution utilizes udev to initiate an unload-then-reload of
      the kdump image (eg.  kernel, initrd, boot_params, purgatory and
      elfcorehdr) by the userspace kexec utility.  In the original post I
      outlined the significant performance problems related to offloading this
      activity to userspace.
      
      This patchset introduces a generic crash handler that registers with the
      CPU and memory notifiers.  Upon CPU or memory changes, from either hot
      un/plug or off/onlining, this generic handler is invoked and performs
      important housekeeping, for example obtaining the appropriate lock, and
      then invokes an architecture specific handler to do the appropriate
      elfcorehdr update.
      
      Note the description in patch 'crash: change crash_prepare_elf64_headers()
      to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
      enables further optimizations related to CPU plug/unplug/online/offline
      performance of elfcorehdr updates.
      
      In the case of x86_64, the arch specific handler generates a new
      elfcorehdr, and overwrites the old one in memory; thus no involvement with
      userspace needed.
      
      To realize the benefits of this patchset, or to test it, one must make a
      couple of minor changes to userspace:
      
       - Prevent udev from updating kdump crash kernel on hot un/plug changes.
         Add the following as the first lines to the RHEL udev rule file
         /usr/lib/udev/rules.d/98-kexec.rules:
      
         # The kernel updates the crash elfcorehdr for CPU and memory changes
         SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
         SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
         With this changeset applied, the two rules match for CPU and memory
         change events and jump to kdump_reload_end, thus skipping the userspace
         unload-then-reload of kdump.
      
       - Change to the kexec_file_load for loading the kdump kernel:
         Eg. on RHEL: in /usr/bin/kdumpctl, change to:
          standard_kexec_args="-p -d -s"
         which adds the -s to select kexec_file_load() syscall.
      
      This kernel patchset also supports kexec_load() with a modified kexec
      userspace utility.  A working changeset to the kexec userspace utility is
      posted to the kexec-tools mailing list here:
      
       http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
      
      To use the kexec-tools patch, apply, build and install kexec-tools, then
      change the kdumpctl's standard_kexec_args to replace the -s with
      --hotplug.  The removal of -s reverts to the kexec_load syscall and the
      addition of --hotplug invokes the changes put forth in the kexec-tools
      patch.
      
      
      This patch (of 8):
      
      The crash hotplug support leans on the work for the kexec_file_load()
      syscall.  To also support the kexec_load() syscall, a few bits of code
      need to be moved outside of CONFIG_KEXEC_FILE.  As such, these bits are
      moved out of kexec_file.c and into a common location crash_core.c.
      
      In addition, struct crash_mem and crash_notes were moved to new locales so
      that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.
      
      No functionality change intended.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 21 Aug, 2023 20 commits
  3. 18 Aug, 2023 10 commits