Commit af9c191a authored by Linus Torvalds

Merge tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - tracing/ring-buffer: persistent buffer across reboots

   This allows for the tracing instance ring buffer to stay persistent
   across reboots. The way this is done is by adding to the kernel
   command line:

     trace_instance=boot_map@0x285400000:12M

   This will reserve 12 megabytes at the address 0x285400000, and then
   map the tracing instance "boot_map" ring buffer to that memory. This
   will appear as a normal instance in the tracefs system:

     /sys/kernel/tracing/instances/boot_map

   A user could enable tracing in that instance, and on reboot or kernel
   crash, if the memory is not wiped by the firmware, it will recreate
   the trace in that instance. For example, if one was debugging the
   shutdown path of a kernel reboot:

     # cd /sys/kernel/tracing
     # echo function > instances/boot_map/current_tracer
     # reboot
     [..]
     # cd /sys/kernel/tracing
     # tail instances/boot_map/trace
           swapper/0-1       [000] d..1.   164.549800: restore_boot_irq_mode <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549801: native_restore_boot_irq_mode <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549802: disconnect_bsp_APIC <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549811: hpet_disable <-native_machine_shutdown
           swapper/0-1       [000] d..1.   164.549812: iommu_shutdown_noop <-native_machine_restart
           swapper/0-1       [000] d..1.   164.549813: native_machine_emergency_restart <-__do_sys_reboot
           swapper/0-1       [000] d..1.   164.549813: tboot_shutdown <-native_machine_emergency_restart
           swapper/0-1       [000] d..1.   164.549820: acpi_reboot <-native_machine_emergency_restart
           swapper/0-1       [000] d..1.   164.549821: acpi_reset <-acpi_reboot
           swapper/0-1       [000] d..1.   164.549822: acpi_os_write_port <-acpi_reboot

   On reboot, the buffer is examined to make sure it is valid. The
   validation check even steps through every event to make sure the meta
   data of the event is correct. If any test fails, it will simply reset
   the buffer, and the buffer will be empty on boot.

 - Allow the tracing persistent boot buffer to use the "reserve_mem"
   option

   Instead of having the admin find a physical address to store the
   persistent buffer, which can be very tedious if they have to
   administer several different machines, allow them to use the
   "reserve_mem" option that will find a location for them. It is not as
   reliable, because with KASLR the kernel may be loaded at a different
   location each boot, which can make the reserved memory end up at an
   inconsistent address. Booting with "nokaslr" makes reserve_mem more
   reliable.

 - Have the function graph tracer handle offsets from a previous boot

   The ring buffer output from a previous boot may contain different
   addresses due to KASLR. Have the function graph tracer handle these
   by applying the delta between the previous boot's address space and
   the current one.
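
   In essence (taken from the trace_functions_graph.c hunk below), a
   recorded address is shifted by the saved text delta before it is
   symbolized:

     func = call->func + iter->tr->text_delta;
     trace_seq_printf(s, "%ps() {\n", (void *)func);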

 - Only reset the saved meta offset when the buffer is started or reset

   The persistent memory meta data holds the address space information
   of the previous boot, so that the delta needed to make function
   tracing work can be calculated. This used to be updated with the new
   address space as soon as it was read. But if the buffer was not used
   during a given boot, then on the next reboot the delta would be
   calculated against that intermediate boot instead of the boot that
   actually wrote the data into the ring buffer, causing the functions
   not to be shown. Do not save the address space information of the
   current kernel until recording actually starts.

 - Add a magic number to validate the meta data

   Add a magic value to the meta data that can also be used for
   validation. The validator of the previous buffer does not strictly
   need it, but it guards against a newer kernel whose meta data happens
   to have the same format, passes the validator, yet is used
   differently. The magic number can also serve as a "version" of the
   meta data.

 - Align user space mapped ring buffer sub-buffers to improve TLB
   entries

   Linus pointed out that the mapped ring buffer sub-buffers were
   misaligned relative to the meta page, so if the sub-buffers were
   bigger than PAGE_SIZE, the TLB could not use bigger entries.

 - Add new kernel command line flag "traceoff" to disable tracing on
   boot for instances

   If tracing is enabled for a boot instance, there needs to be a way to
   disable it at boot so that new events do not get entered into the
   ring buffer and mixed with events from a previous boot, as that can
   be confusing.
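
   For example:

     reserve_mem=12M:4096:trace trace_instance=boot_map^traceoff@trace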

 - Allow trace_printk() to go to other instances

   Currently, trace_printk() can only go to the top level instance. When
   debugging with a persistent buffer, it is really useful to be able to
   have trace_printk() go to that buffer instead, so that its output is
   available after a crash.
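
   After boot, an instance can be made the destination via its
   "trace_printk_dest" option:

     echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest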

 - Do not use "bin_printk()" for traces to a boot instance

   bin_printk() saves only a pointer to the printk format in the ring
   buffer, as the reader of the buffer still has access to the format
   string. But that is not the case if the buffer is from a previous
   boot. If the trace_printk() is going to a "persistent" buffer, use
   the slower version that writes the printk format itself into the
   buffer.

 - Add command line option to allow trace_printk() to go to an instance

   Allow the kernel command line to define which instance the
   trace_printk() goes to, instead of forcing the admin to set it for
   every boot via the tracefs options.
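
   For example:

     reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace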

 - Start a document that explains how to use tracefs to debug the kernel

 - Add some more kernel selftests to test user mapped ring buffer

* tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (28 commits)
  selftests/ring-buffer: Handle meta-page bigger than the system
  selftests/ring-buffer: Verify the entire meta-page padding
  tracing/Documentation: Start a document on how to debug with tracing
  tracing: Add option to set an instance to be the trace_printk destination
  tracing: Have trace_printk not use binary prints if boot buffer
  tracing: Allow trace_printk() to go to other instance buffers
  tracing: Add "traceoff" flag to boot time tracing instances
  ring-buffer: Align meta-page to sub-buffers for improved TLB usage
  ring-buffer: Add magic and struct size to boot up meta data
  ring-buffer: Don't reset persistent ring-buffer meta saved addresses
  tracing/fgraph: Have fgraph handle previous boot function addresses
  tracing: Allow boot instances to use reserve_mem boot memory
  tracing: Fix ifdef of snapshots to not prevent last_boot_info file
  ring-buffer: Use vma_pages() helper function
  tracing: Fix NULL vs IS_ERR() check in enable_instances()
  tracing: Add last boot delta offset for stack traces
  tracing: Update function tracing output for previous boot buffer
  tracing: Handle old buffer mappings for event strings and functions
  tracing/ring-buffer: Add last_boot_info file to boot instance
  ring-buffer: Save text and data locations in mapped meta data
  ...
parents dd609b8a 75d7ff9a
...@@ -6808,6 +6808,51 @@
the same thing would happen if it was left off). The irq_handler_entry
event, and all events under the "initcall" system.

Flags can be added to the instance to modify its behavior when it is
created. The flags are separated by '^'.

The available flags are:

  traceoff    - Have tracing disabled in the instance after it is created.
  traceprintk - Have trace_printk() write into this trace instance
                (note, "printk" and "trace_printk" can also be used)

  trace_instance=foo^traceoff^traceprintk,sched,irq

The flags must come before the defined events.

If memory has been reserved (see memmap for x86), the instance
can use that memory:

  memmap=12M$0x284500000 trace_instance=boot_map@0x284500000:12M

The above will create a "boot_map" instance that uses the physical
memory at 0x284500000 that is 12 megabytes in size. The per-CPU buffers of
that instance will be split up accordingly.

Alternatively, the memory can be reserved by the reserve_mem option:

  reserve_mem=12M:4096:trace trace_instance=boot_map@trace

This will reserve 12 megabytes at boot up with a 4096 byte alignment
and place the ring buffer in this memory. Note that due to KASLR, the
memory may not be at the same location each time, which will not preserve
the buffer content.

Also note that the layout of the ring buffer data may change between
kernel versions, in which case the validator will fail and reset the ring
buffer if the layout is not the same as the previous kernel.

If the ring buffer is used for persistent bootups and has events enabled,
it is recommended to disable tracing so that events from a previous boot do
not mix with events of the current boot (unless you are debugging a random
crash at boot up):

  reserve_mem=12M:4096:trace trace_instance=boot_map^traceoff^traceprintk@trace,sched,irq

See also Documentation/trace/debugging.rst

trace_options=[option-list]
		[FTRACE] Enable or disable tracer options at boot.
		The option-list is a comma delimited list of options
...
==============================
Using the tracer for debugging
==============================
Copyright 2024 Google LLC.
:Author: Steven Rostedt <rostedt@goodmis.org>
:License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2)
- Written for: 6.12
Introduction
------------
The tracing infrastructure can be very useful for debugging the Linux
kernel. This document is a place to add various methods of using the tracer
for debugging.
First, make sure that the tracefs file system is mounted::

  $ sudo mount -t tracefs tracefs /sys/kernel/tracing
Using trace_printk()
--------------------
trace_printk() is a very lightweight utility that can be used in any context
inside the kernel, with the exception of "noinstr" sections. It can be used
in normal, softirq, interrupt and even NMI context. The trace data is
written to the tracing ring buffer in a lockless way. To make it even
lighter weight, when possible, it will only record the pointer to the format
string, and save the raw arguments into the buffer. The format and the
arguments will be post processed when the ring buffer is read. This way the
trace_printk() format conversions are not done during the hot path, where
the trace is being recorded.
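
As a minimal sketch (the surrounding function and its arguments here are
hypothetical, only trace_printk() itself is real), it is used just like
printk()::

  /* Hypothetical example function, for illustration only. */
  static void my_irq_handler(int irq, u64 status)
  {
          /*
           * Writes into the tracing ring buffer locklessly. When
           * possible, only the format pointer and the raw arguments are
           * recorded; the string is rendered when the buffer is read.
           */
          trace_printk("irq=%d status=%llx\n", irq, status);
  }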
trace_printk() is meant only for debugging, and should never be added into
a subsystem of the kernel. If you need debugging traces, add trace events
instead. If a trace_printk() is found in the kernel, the following will
appear in dmesg::
**********************************************************
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
** **
** trace_printk() being used. Allocating extra memory. **
** **
** This means that this is a DEBUG kernel and it is **
** unsafe for production use. **
** **
** If you see this message and you are not debugging **
** the kernel, report this immediately to your vendor! **
** **
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
**********************************************************
Debugging kernel crashes
------------------------
There are various methods of acquiring the state of the system when a kernel
crash occurs. This could be from the oops message in printk, or one could
use kexec/kdump. But these only show what happened at the time of the crash.
It can also be very useful to know what happened leading up to the crash.
The tracing ring buffer, by default, is a circular buffer that will
overwrite older events with newer ones. When a crash happens, the content of
the ring buffer will be all the events that led up to the crash.
There are several kernel command line parameters that can be used to help in
this. The first is "ftrace_dump_on_oops". This will dump the tracing ring
buffer to the console when an oops occurs. This can be useful if the console
is being logged somewhere. If a serial console is used, it may be prudent to
make sure the ring buffer is relatively small, otherwise the dumping of the
ring buffer may take several minutes to hours to finish. Here's an example
of the kernel command line::

  ftrace_dump_on_oops trace_buf_size=50K
Note, the tracing buffer is made up of per CPU buffers where each of these
buffers is broken up into sub-buffers that are by default PAGE_SIZE. The
trace_buf_size option above sets each of the per CPU buffers to 50K, so, on
a machine with 8 CPUs, that's actually 400K total.
Persistent buffers across boots
-------------------------------
If the system memory allows it, the tracing ring buffer can be placed at a
specific location in memory. If the location is the same across boots and
the memory is not modified, the tracing buffer can be retrieved from the
following boot. There are two ways to reserve memory for the use of the ring
buffer.
The more reliable way (on x86) is to reserve memory with the "memmap" kernel
command line option and then use that memory for the trace_instance. This
requires a bit of knowledge of the physical memory layout of the system. The
advantage of using this method is that the memory for the ring buffer will
always be at the same location::

  memmap=12M$0x284500000 trace_instance=boot_map@0x284500000:12M
The memmap above reserves 12 megabytes of memory at the physical memory
location 0x284500000. Then the trace_instance option will create a trace
instance "boot_map" at that same location with the same amount of memory
reserved. As the ring buffer is broken up into per CPU buffers, the 12
megabytes will be split evenly between those CPUs. If you have 8 CPUs, each
per CPU ring buffer will be 1.5 megabytes in size. Note, that also includes
meta data, so the amount of memory actually used by the ring buffer will be
slightly smaller.
Another more generic but less robust way to allocate a ring buffer mapping
at boot is with the "reserve_mem" option::

  reserve_mem=12M:4096:trace trace_instance=boot_map@trace
The reserve_mem option above will find 12 megabytes that are available at
boot up, and align the allocation to 4096 bytes. It will label this memory
as "trace", which can then be referenced by later command line options.
The trace_instance option creates a "boot_map" instance and will use the
memory reserved by reserve_mem that was labeled as "trace". This method is
more generic but may not be as reliable. Due to KASLR, the memory reserved
by reserve_mem may not end up at the same location on every boot. If that
happens, the ring buffer will not be from the previous boot and will be
reset.
Sometimes a larger alignment can keep KASLR from moving the reserve_mem
location around. With a larger alignment, you may find that the buffer is
placed more consistently::

  reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace
On boot up, the memory reserved for the ring buffer is validated. It will go
through a series of tests to make sure that the ring buffer contains valid
data. If it does, the buffer is set up and made available to read from the
instance. If it fails any of the tests, the entire ring buffer is cleared
and initialized as new.
The layout of this mapped memory may not be consistent from kernel to
kernel, so only the same kernel is guaranteed to work if the mapping is
preserved. Switching to a different kernel version may find a different
layout and mark the buffer as invalid.
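
As an illustrative sketch (the names and fields below are hypothetical, not
the actual implementation), the validation can be thought of as a magic
number and structure size check over the saved meta data::

  /* Hypothetical meta data layout, for illustration only. */
  struct boot_meta {
          unsigned int  magic;        /* constant identifying the layout */
          unsigned int  struct_size;  /* sizeof(struct boot_meta) when written */
          unsigned long text_addr;    /* kernel text address when recorded */
          unsigned long data_addr;    /* kernel data address when recorded */
  };

  #define BOOT_META_MAGIC 0x527bffe5  /* hypothetical value */

  static bool boot_meta_valid(struct boot_meta *meta)
  {
          /* A layout change shows up as a magic or size mismatch. */
          return meta->magic == BOOT_META_MAGIC &&
                 meta->struct_size == sizeof(*meta);
  }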
Using trace_printk() in the boot instance
-----------------------------------------
By default, the content of trace_printk() goes into the top level tracing
instance, but that instance is never preserved across boots. To have the
trace_printk() content, and some other internal tracing (like stack dumps),
go to the preserved buffer instead, either set the instance to be the
trace_printk() destination from the kernel command line, or set it after
boot up via the trace_printk_dest option.
After boot up::

  echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest

From the kernel command line::

  reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace

If setting it from the kernel command line, it is recommended to also
disable tracing with the "traceoff" flag, and enable tracing after boot up.
Otherwise the trace from the most recent boot will be mixed with the trace
from the previous boot, which can make it confusing to read.
...@@ -1186,6 +1186,18 @@ Here are the available options:

  trace_printk
	Can disable trace_printk() from writing into the buffer.

  trace_printk_dest
	Set to have trace_printk() and similar internal tracing functions
	write into this instance. Note, only one trace instance can have
	this set. Setting this flag clears it from the instance that
	previously had it set. By default, the top level instance has this
	flag set, and will get it set again if another instance sets and
	then clears it.

	This flag cannot be cleared by the top level instance, as it is the
	default instance. The only way the top level instance loses this
	flag is by it being set in another instance.

  annotate
	It is sometimes confusing when the CPU buffers are full
	and one CPU buffer had a lot of events recently, thus
...
...@@ -89,6 +89,14 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer,
struct trace_buffer *
__ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key);

struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags,
					       int order, unsigned long start,
					       unsigned long range_size,
					       struct lock_class_key *key);

bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text,
				 long *data);

/*
 * Because the ring buffer is generic, if other users of the ring buffer get
 * traced by ftrace, it can produce lockdep warnings. We need to keep each
...@@ -100,6 +108,18 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
		__ring_buffer_alloc((size), (flags), &__key);	\
	})

/*
 * Because the ring buffer is generic, if other users of the ring buffer get
 * traced by ftrace, it can produce lockdep warnings. We need to keep each
 * ring buffer's lock class separate.
 */
#define ring_buffer_alloc_range(size, flags, order, start, range_size)	\
({									\
	static struct lock_class_key __key;				\
	__ring_buffer_alloc_range((size), (flags), (order), (start),	\
				  (range_size), &__key);		\
})

typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
		     ring_buffer_cond_fn cond, void *data);
...
...@@ -336,7 +336,6 @@ struct trace_array {
	bool			allocated_snapshot;
	spinlock_t		snapshot_trigger_lock;
	unsigned int		snapshot;
	unsigned int		mapped;
	unsigned long		max_latency;
#ifdef CONFIG_FSNOTIFY
	struct dentry		*d_max_latency;
...@@ -344,6 +343,13 @@ struct trace_array {
	struct irq_work		fsnotify_irqwork;
#endif
#endif
	/* The below is for memory mapped ring buffer */
	unsigned int		mapped;
	unsigned long		range_addr_start;
	unsigned long		range_addr_size;
	long			text_delta;
	long			data_delta;

	struct trace_pid_list	__rcu *filtered_pids;
	struct trace_pid_list	__rcu *filtered_no_pids;
	/*
...@@ -423,7 +429,8 @@ struct trace_array {
};

enum {
	TRACE_ARRAY_FL_GLOBAL	= BIT(0),
	TRACE_ARRAY_FL_BOOT	= BIT(1),
};

extern struct list_head ftrace_trace_arrays;
...@@ -644,6 +651,8 @@ trace_buffer_lock_reserve(struct trace_buffer *buffer,
			  unsigned long len,
			  unsigned int trace_ctx);

int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu);

struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
					    struct trace_array_cpu *data);
...@@ -1312,6 +1321,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
		C(IRQ_INFO,		"irq-info"),		\
		C(MARKERS,		"markers"),		\
		C(EVENT_FORK,		"event-fork"),		\
		C(TRACE_PRINTK,		"trace_printk_dest"),	\
		C(PAUSE_ON_TRACE,	"pause-on-trace"),	\
		C(HASH_PTR,		"hash-ptr"),	/* Print hashed pointer */ \
		FUNCTION_FLAGS					\
...
...@@ -544,6 +544,8 @@ print_graph_irq(struct trace_iterator *iter, unsigned long addr,
	struct trace_seq *s = &iter->seq;
	struct trace_entry *ent = iter->ent;

	addr += iter->tr->text_delta;

	if (addr < (unsigned long)__irqentry_text_start ||
	    addr >= (unsigned long)__irqentry_text_end)
		return;
...@@ -710,6 +712,7 @@ print_graph_entry_leaf(struct trace_iterator *iter,
	struct ftrace_graph_ret *graph_ret;
	struct ftrace_graph_ent *call;
	unsigned long long duration;
	unsigned long func;
	int cpu = iter->cpu;
	int i;
...@@ -717,6 +720,8 @@ print_graph_entry_leaf(struct trace_iterator *iter,
	call = &entry->graph_ent;
	duration = graph_ret->rettime - graph_ret->calltime;

	func = call->func + iter->tr->text_delta;

	if (data) {
		struct fgraph_cpu_data *cpu_data;
...@@ -747,10 +752,10 @@ print_graph_entry_leaf(struct trace_iterator *iter,
	 * enabled.
	 */
	if (flags & __TRACE_GRAPH_PRINT_RETVAL)
		print_graph_retval(s, graph_ret->retval, true, (void *)func,
				   !!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
	else
		trace_seq_printf(s, "%ps();\n", (void *)func);

	print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET,
			cpu, iter->ent->pid, flags);
...@@ -766,6 +771,7 @@ print_graph_entry_nested(struct trace_iterator *iter,
	struct ftrace_graph_ent *call = &entry->graph_ent;
	struct fgraph_data *data = iter->private;
	struct trace_array *tr = iter->tr;
	unsigned long func;
	int i;

	if (data) {
...@@ -788,7 +794,9 @@ print_graph_entry_nested(struct trace_iterator *iter,
	for (i = 0; i < call->depth * TRACE_GRAPH_INDENT; i++)
		trace_seq_putc(s, ' ');

	func = call->func + iter->tr->text_delta;

	trace_seq_printf(s, "%ps() {\n", (void *)func);

	if (trace_seq_has_overflowed(s))
		return TRACE_TYPE_PARTIAL_LINE;
...@@ -863,6 +871,8 @@ check_irq_entry(struct trace_iterator *iter, u32 flags,
	int *depth_irq;
	struct fgraph_data *data = iter->private;

	addr += iter->tr->text_delta;

	/*
	 * If we are either displaying irqs, or we got called as
	 * a graph event and private data does not exist,
...@@ -990,11 +1000,14 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
	unsigned long long duration = trace->rettime - trace->calltime;
	struct fgraph_data *data = iter->private;
	struct trace_array *tr = iter->tr;
	unsigned long func;
	pid_t pid = ent->pid;
	int cpu = iter->cpu;
	int func_match = 1;
	int i;

	func = trace->func + iter->tr->text_delta;

	if (check_irq_return(iter, flags, trace->depth))
		return TRACE_TYPE_HANDLED;
...@@ -1033,7 +1046,7 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
	 * function-retval option is enabled.
	 */
	if (flags & __TRACE_GRAPH_PRINT_RETVAL) {
		print_graph_retval(s, trace->retval, false, (void *)func,
				   !!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
	} else {
		/*
...@@ -1046,7 +1059,7 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
		if (func_match && !(flags & TRACE_GRAPH_PRINT_TAIL))
			trace_seq_puts(s, "}\n");
		else
			trace_seq_printf(s, "} /* %ps */\n", (void *)func);
	}

	/* Overrun */
...
...@@ -990,8 +990,11 @@ enum print_line_t trace_nop_print(struct trace_iterator *iter, int flags,
}

static void print_fn_trace(struct trace_seq *s, unsigned long ip,
			   unsigned long parent_ip, long delta, int flags)
{
	ip += delta;
	parent_ip += delta;

	seq_print_ip_sym(s, ip, flags);

	if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) {
...@@ -1009,7 +1012,7 @@ static enum print_line_t trace_fn_trace(struct trace_iterator *iter, int flags,

	trace_assign_type(field, iter->ent);

	print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags);
	trace_seq_putc(s, '\n');

	return trace_handle_return(s);
...@@ -1230,6 +1233,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter,
	struct trace_seq *s = &iter->seq;
	unsigned long *p;
	unsigned long *end;
	long delta = iter->tr->text_delta;

	trace_assign_type(field, iter->ent);

	end = (unsigned long *)((long)iter->ent + iter->ent_size);
...@@ -1242,7 +1246,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter,
			break;

		trace_seq_puts(s, " => ");
		seq_print_ip_sym(s, (*p) + delta, flags);
		trace_seq_putc(s, '\n');
	}
...@@ -1587,10 +1591,13 @@ static enum print_line_t trace_print_print(struct trace_iterator *iter,
{
	struct print_entry *field;
	struct trace_seq *s = &iter->seq;
	unsigned long ip;

	trace_assign_type(field, iter->ent);

	ip = field->ip + iter->tr->text_delta;

	seq_print_ip_sym(s, ip, flags);
	trace_seq_printf(s, ": %s", field->buf);

	return trace_handle_return(s);
...@@ -1674,7 +1681,7 @@ trace_func_repeats_print(struct trace_iterator *iter, int flags,

	trace_assign_type(field, iter->ent);

	print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags);
	trace_seq_printf(s, " (repeats: %u, last_ts:", field->count);
	trace_print_time(s, iter,
			 iter->ts - FUNC_REPEATS_GET_DELTA_TS(field));
...
...@@ -92,12 +92,22 @@ int tracefs_cpu_map(struct tracefs_cpu_map_desc *desc, int cpu)
	if (desc->cpu_fd < 0)
		return -ENODEV;

again:
	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, desc->cpu_fd, 0);
	if (map == MAP_FAILED)
		return -errno;

	desc->meta = (struct trace_buffer_meta *)map;

	/* the meta-page is bigger than the original mapping */
	if (page_size < desc->meta->meta_struct_len) {
		int meta_page_size = desc->meta->meta_page_size;

		munmap(desc->meta, page_size);
		page_size = meta_page_size;
		goto again;
	}

	return 0;
}
...@@ -228,6 +238,20 @@ TEST_F(map, data_mmap)
	data = mmap(NULL, data_len, PROT_READ, MAP_SHARED,
		    desc->cpu_fd, meta_len);
	ASSERT_EQ(data, MAP_FAILED);

	/* Verify meta-page padding */
	if (desc->meta->meta_page_size > getpagesize()) {
		data_len = desc->meta->meta_page_size;
		data = mmap(NULL, data_len,
			    PROT_READ, MAP_SHARED, desc->cpu_fd, 0);
		ASSERT_NE(data, MAP_FAILED);

		for (int i = desc->meta->meta_struct_len;
		     i < desc->meta->meta_page_size; i += sizeof(int))
			ASSERT_EQ(*(int *)(data + i), 0);

		munmap(data, data_len);
	}
}

FIXTURE(snapshot) {
...