1. 24 Aug, 2023 4 commits
      crash: memory and CPU hotplug sysfs attributes · 88a6f899
      Eric DeVolder authored
      Introduce the crash_hotplug attribute for memory and CPUs for use by
      userspace.  These attributes directly facilitate the udev rule for
      managing userspace re-loading of the crash kernel upon hot un/plug
      changes.
      
      For memory, expose the crash_hotplug attribute in the
      /sys/devices/system/memory directory.  For example:
      
       # udevadm info --attribute-walk /sys/devices/system/memory/memory81
        looking at device '/devices/system/memory/memory81':
          KERNEL=="memory81"
          SUBSYSTEM=="memory"
          DRIVER==""
          ATTR{online}=="1"
          ATTR{phys_device}=="0"
          ATTR{phys_index}=="00000051"
          ATTR{removable}=="1"
          ATTR{state}=="online"
          ATTR{valid_zones}=="Movable"
      
        looking at parent device '/devices/system/memory':
          KERNELS=="memory"
          SUBSYSTEMS==""
          DRIVERS==""
          ATTRS{auto_online_blocks}=="offline"
          ATTRS{block_size_bytes}=="8000000"
          ATTRS{crash_hotplug}=="1"
      
      For CPUs, expose the crash_hotplug attribute in the
      /sys/devices/system/cpu directory.  For example:
      
       # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
        looking at device '/devices/system/cpu/cpu0':
          KERNEL=="cpu0"
          SUBSYSTEM=="cpu"
          DRIVER=="processor"
          ATTR{crash_notes}=="277c38600"
          ATTR{crash_notes_size}=="368"
          ATTR{online}=="1"
      
        looking at parent device '/devices/system/cpu':
          KERNELS=="cpu"
          SUBSYSTEMS==""
          DRIVERS==""
          ATTRS{crash_hotplug}=="1"
          ATTRS{isolated}==""
          ATTRS{kernel_max}=="8191"
          ATTRS{nohz_full}=="  (null)"
          ATTRS{offline}=="4-7"
          ATTRS{online}=="0-3"
          ATTRS{possible}=="0-7"
          ATTRS{present}=="0-3"
      
      With these sysfs attributes in place, it is possible to efficiently
      instruct the udev rule to skip crash kernel reloading for kernels
      configured with crash hotplug support.
      
      For example, the following is the proposed udev rule change for RHEL
      system 98-kexec.rules (as the first lines of the rule file):
      
       # The kernel updates the crash elfcorehdr for CPU and memory changes
       SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
       SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
      When examined in the context of 98-kexec.rules, the above rules test
      whether crash_hotplug is set, and if so, the userspace-initiated
      unload-then-reload of the crash kernel is skipped.
      
      CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU
      and CONFIG_MEMORY_HOTPLUG kernel config options.  If an architecture
      supports, for example, memory hotplug but not CPU hotplug, then the
      /sys/devices/system/memory/crash_hotplug attribute file is present, but
      the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be
      present.  Thus the udev rule skips userspace processing of memory hot
      un/plug events, but the udev rule will evaluate false for CPU events, thus
      allowing userspace to process CPU hot un/plug events (i.e., the
      unload-then-reload of the kdump capture kernel).
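
The per-subsystem decision described above can be sketched in userspace as follows. This is a minimal illustration, not the udev implementation: the helper names (kernel_handles_hotplug, needs_userspace_reload, make_fake_sysfs) are hypothetical, and the demo runs against a throwaway directory tree that imitates sysfs rather than the real /sys.

```python
import tempfile
from pathlib import Path


def kernel_handles_hotplug(subsystem: str, sysfs_root: Path = Path("/sys")) -> bool:
    """Return True if <sysfs_root>/devices/system/<subsystem>/crash_hotplug reads "1".

    A missing attribute file (for example, the matching hotplug config option
    is disabled) counts as False, like a udev rule that fails to match.
    """
    attr = sysfs_root / "devices" / "system" / subsystem / "crash_hotplug"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        return False


def needs_userspace_reload(subsystem: str, sysfs_root: Path = Path("/sys")) -> bool:
    """Userspace reloads the crash kernel only when the kernel does not update it."""
    return not kernel_handles_hotplug(subsystem, sysfs_root)


def make_fake_sysfs(root: Path, memory: bool, cpu: bool) -> None:
    """Build a throwaway tree imitating the two sysfs directories (demo only)."""
    for name, present in (("memory", memory), ("cpu", cpu)):
        directory = root / "devices" / "system" / name
        directory.mkdir(parents=True, exist_ok=True)
        if present:
            (directory / "crash_hotplug").write_text("1\n")


if __name__ == "__main__":
    # Imitate an architecture with memory hotplug support but no CPU hotplug.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp)
        make_fake_sysfs(fake, memory=True, cpu=False)
        for subsystem in ("memory", "cpu"):
            print(subsystem, "needs userspace reload:",
                  needs_userspace_reload(subsystem, fake))
```

Treating a missing file the same as a non-"1" value keeps the sketch aligned with the rule semantics above: only a positive match skips the reload.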
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      kexec: exclude elfcorehdr from the segment digest · f7cc804a
      Eric DeVolder authored
      When a crash kernel is loaded via the kexec_file_load() syscall, the
      kernel places the various segments (i.e., crash kernel, crash initrd,
      boot_params, elfcorehdr, purgatory, etc.) in memory.  For those
      architectures that utilize purgatory, a hash digest of the segments is
      calculated for integrity checking.  The digest is embedded into the
      purgatory image before the image is placed in memory.
      
      Updates to the elfcorehdr in response to CPU and memory changes would
      cause the purgatory integrity check to fail at crash time, so no vmcore
      would be created.  Therefore, the elfcorehdr segment is explicitly excluded
      from the purgatory digest, enabling updates to the elfcorehdr while also
      avoiding the need to recompute the hash digest and reload purgatory.
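
The effect of excluding one segment from the digest can be illustrated with a small userspace model. This is only a sketch of the idea, not the kernel's code: the function name segment_digest, the segment contents, and the use of SHA-256 via hashlib are assumptions made for illustration.

```python
import hashlib
from typing import FrozenSet, Mapping


def segment_digest(
    segments: Mapping[str, bytes],
    excluded: FrozenSet[str] = frozenset({"elfcorehdr"}),
) -> str:
    """Hash every segment except the excluded ones, in insertion order."""
    digest = hashlib.sha256()
    for name, data in segments.items():
        if name in excluded:
            continue  # skipped, so later in-place updates cannot break the check
        digest.update(data)
    return digest.hexdigest()


if __name__ == "__main__":
    # Made-up segment contents; real segments are binary images.
    segments = {
        "kernel": b"crash kernel image",
        "initrd": b"crash initrd",
        "elfcorehdr": b"cpus=0-3 mem=80-81",
    }
    before = segment_digest(segments)
    segments["elfcorehdr"] = b"cpus=0-3 mem=80-82"  # simulated hotplug update
    after = segment_digest(segments)
    print("digest unchanged after elfcorehdr update:", before == after)
```

Passing an empty excluded set to the same function shows the failure mode the commit avoids: the digest then changes whenever the elfcorehdr bytes change.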
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Suggested-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      crash: add generic infrastructure for crash hotplug support · 24726275
      Eric DeVolder authored
      To support crash hotplug, a mechanism is needed to update the crash
      elfcorehdr upon CPU or memory changes (e.g., hot un/plug or off/onlining).
      The crash elfcorehdr describes the CPUs and memory to be written into the
      vmcore.
      
      To track CPU changes, callbacks are registered with the cpuhp mechanism
      via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN).  The crash hotplug
      elfcorehdr update has no explicit ordering requirement (relative to other
      cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. 
      CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a
      new state for crash hotplug.  Also, CPUHP_BP_PREPARE_DYN is the last state
      in the PREPARE group, just prior to the STARTING group, which is very
      close to the CPU starting up in a plug/online situation, or stopping in an
      unplug/offline situation.  This minimizes the window of time during an
      actual plug/online or unplug/offline situation in which the elfcorehdr
      would be inaccurate.  Note that for a CPU being unplugged or offlined, the
      CPU will still be present in the list of CPUs generated by
      crash_prepare_elf64_headers().  However, there is no need to explicitly
      omit the CPU; see the justification in 'crash: change
      crash_prepare_elf64_headers() to for_each_possible_cpu()'.
      
      To track memory changes, a notifier is registered to capture the memblock
      MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
      
      The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
      which performs needed tasks and then dispatches the event to the
      architecture specific arch_crash_handle_hotplug_event() to update the
      elfcorehdr with the current state of CPUs and memory.  During the process,
      the kexec_lock is held.
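
The control flow described above (event sources funnel into one generic handler, which takes a lock and then calls an architecture hook) can be modelled in a few lines. This is a toy userspace model under stated assumptions, not kernel code: the class and function names (SystemState, CrashHotplug, rebuild_header) are hypothetical, a threading.Lock stands in for kexec_lock, and the "header" is just a tuple of possible CPUs and online memory blocks.

```python
import threading
from typing import Callable, List, Set, Tuple

Header = Tuple[Tuple[int, ...], Tuple[int, ...]]


class SystemState:
    """Toy stand-in for the machine: possible CPUs and online memory blocks."""

    def __init__(self, possible_cpus: int, memory_blocks: Set[int]) -> None:
        self.possible_cpus = possible_cpus
        self.memory_blocks = set(memory_blocks)


class CrashHotplug:
    """Model of the generic handler: take the lock, then call the arch hook."""

    def __init__(self, state: SystemState,
                 arch_handler: Callable[[SystemState], Header]) -> None:
        self._lock = threading.Lock()  # plays the role of kexec_lock
        self._state = state
        self._arch_handler = arch_handler
        self.elfcorehdr: Header = arch_handler(state)
        self.events: List[str] = []

    def handle_hotplug_event(self, event: str) -> None:
        """Record the event and regenerate the header while holding the lock."""
        with self._lock:
            self.events.append(event)
            self.elfcorehdr = self._arch_handler(self._state)

    # Event sources; in the kernel these are cpuhp callbacks and a memory notifier.
    def cpu_event(self, cpu: int, online: bool) -> None:
        self.handle_hotplug_event(f"cpu{cpu} {'online' if online else 'offline'}")

    def memory_event(self, block: int, online: bool) -> None:
        if online:
            self._state.memory_blocks.add(block)
        else:
            self._state.memory_blocks.discard(block)
        self.handle_hotplug_event(f"memory{block} {'online' if online else 'offline'}")


def rebuild_header(state: SystemState) -> Header:
    """Hypothetical arch hook: list all possible CPUs and the online memory blocks."""
    cpus = tuple(range(state.possible_cpus))
    return cpus, tuple(sorted(state.memory_blocks))


if __name__ == "__main__":
    crash = CrashHotplug(SystemState(4, {80, 81}), rebuild_header)
    crash.memory_event(82, online=True)
    crash.cpu_event(3, online=False)
    print(crash.elfcorehdr)
```

The hook lists every possible CPU rather than only online ones, which mirrors the note above that an offlined CPU need not be removed from the header.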
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      crash: move a few code bits to setup support of crash hotplug · 6f991cc3
      Eric DeVolder authored
      Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.
      
      Once the kdump service is loaded, if changes to CPUs or memory occur,
      either by hot un/plug or off/onlining, the crash elfcorehdr must also be
      updated.
      
      The elfcorehdr describes to kdump the CPUs and memory in the system, and
      any inaccuracies can result in a vmcore with missing CPU context or memory
      regions.
      
      The current solution utilizes udev to initiate an unload-then-reload of
      the kdump image (e.g., kernel, initrd, boot_params, purgatory and
      elfcorehdr) by the userspace kexec utility.  In the original post I
      outlined the significant performance problems related to offloading this
      activity to userspace.
      
      This patchset introduces a generic crash handler that registers with the
      CPU and memory notifiers.  Upon CPU or memory changes, from either hot
      un/plug or off/onlining, this generic handler is invoked and performs
      important housekeeping, for example obtaining the appropriate lock, and
      then invokes an architecture specific handler to do the appropriate
      elfcorehdr update.
      
      Note the description in patch 'crash: change crash_prepare_elf64_headers()
      to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
      enables further optimizations related to CPU plug/unplug/online/offline
      performance of elfcorehdr updates.
      
      In the case of x86_64, the arch-specific handler generates a new
      elfcorehdr and overwrites the old one in memory; thus no userspace
      involvement is needed.
      
      To test this patchset and realize its benefits, one must make a couple
      of minor changes to userspace:
      
       - Prevent udev from updating kdump crash kernel on hot un/plug changes.
         Add the following as the first lines to the RHEL udev rule file
         /usr/lib/udev/rules.d/98-kexec.rules:
      
         # The kernel updates the crash elfcorehdr for CPU and memory changes
         SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
         SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
         With this changeset applied, the two rules match CPU and memory
         change events and jump to kdump_reload_end, thus skipping the
         userspace unload-then-reload of kdump.
      
       - Switch to kexec_file_load() for loading the kdump kernel.
         E.g., on RHEL, in /usr/bin/kdumpctl, change to:
          standard_kexec_args="-p -d -s"
         which adds -s to select the kexec_file_load() syscall.
      
      This kernel patchset also supports kexec_load() with a modified kexec
      userspace utility.  A working changeset to the kexec userspace utility is
      posted to the kexec-tools mailing list here:
      
       http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
      
      To use the kexec-tools patch, apply, build and install kexec-tools, then
      change the kdumpctl's standard_kexec_args to replace the -s with
      --hotplug.  The removal of -s reverts to the kexec_load syscall and the
      addition of --hotplug invokes the changes put forth in the kexec-tools
      patch.
      
      
      This patch (of 8):
      
      The crash hotplug support leans on the work for the kexec_file_load()
      syscall.  To also support the kexec_load() syscall, a few bits of code
      need to be moved outside of CONFIG_KEXEC_FILE.  As such, these bits are
      moved out of kexec_file.c and into a common location, crash_core.c.
      
      In addition, struct crash_mem and crash_notes were moved to new locations
      so that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.
      
      No functionality change intended.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 21 Aug, 2023 20 commits
  3. 18 Aug, 2023 16 commits