    user_events: Add minimal support for trace_event into ftrace · 7f5a08c7
    Beau Belgrave authored
    Minimal support for interacting with dynamic events, trace_event and
    ftrace. Core outline of flow between user process, ioctl and trace_event
    APIs.
    
    User mode processes that wish to use trace events to get data into
    ftrace, perf, eBPF, etc. are limited to uprobes today. The user events
    feature enables an ABI for user mode processes to create and write to
    trace events that are isolated from kernel level trace events. This
    enables a faster path for tracing user mode data and opens trace
    events to managed code, where stub locations are dynamic.
    
    User processes often want to trace only when it's useful. To enable this,
    a set of pages is mapped into the user process space that indicates the
    current state of the user events that have been registered. User
    processes can check if their event is hooked to a trace/probe, and if it
    is, emit the event data out via the write() syscall.
    
    Two new files are introduced into tracefs to accomplish this:
    user_events_status - This file is mmap'd into participating user mode
    processes to indicate event status.
    
    user_events_data - This file is opened and register/delete ioctls are
    issued on it to create/open/delete trace events that can be used for tracing.
    
    The typical scenario is to mmap user_events_status at process start. Processes
    then register the events they plan to use via the REG ioctl. The ioctl reads
    and updates the passed-in user_reg struct. The status_index of the struct
    identifies the byte in the status page to check for that event. The
    write_index of the struct identifies that event when writing out to
    the fd that was used for the ioctl call. The data must always begin with this
    index when writing out data for an event. Data can be written either by
    write() or by writev().
    
    For example, in memory:
    int index;
    char data[];
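
    When a single write() is used instead of writev(), the index and the
    event data have to sit in one contiguous buffer. A minimal sketch of
    that packing follows; pack_event() is a hypothetical helper for
    illustration and is not part of the user_events ABI.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pack a write index and payload into one contiguous buffer, following the
 * "int index; char data[];" layout described above.
 *
 * Returns the number of bytes to hand to write(), or 0 if the result does
 * not fit in buf. pack_event() is a hypothetical name, not a kernel API.
 */
static size_t pack_event(void *buf, size_t buf_len, int write_index,
                         const void *payload, size_t payload_len)
{
    size_t total = sizeof(write_index) + payload_len;

    /* Reject size_t overflow and buffers that are too small. */
    if (total < payload_len || total > buf_len)
        return 0;

    /* The index always comes first, then the event data. */
    memcpy(buf, &write_index, sizeof(write_index));
    memcpy((char *)buf + sizeof(write_index), payload, payload_len);
    return total;
}
```

    The returned length would then be passed to write() on the fd that was
    used for the register ioctl.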
    
    Pseudocode example of typical usage:
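    struct user_reg reg = {0};
    char payload[] = "hello";	/* example event data */
    
    /* Map the status page; the mapping stays valid after close(). */
    int page_fd = open("user_events_status", O_RDWR);
    char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
    close(page_fd);
    
    int data_fd = open("user_events_data", O_RDWR);
    
    /* Register the event; the ioctl updates reg with both indexes. */
    reg.size = sizeof(reg);
    reg.name_args = (__u64)"test";
    
    ioctl(data_fd, DIAG_IOCSREG, &reg);
    int status_id = reg.status_index;
    int write_id = reg.write_index;
    
    /* The write index must come first, followed by the event data. */
    struct iovec io[2];
    io[0].iov_base = &write_id;
    io[0].iov_len = sizeof(write_id);
    io[1].iov_base = payload;
    io[1].iov_len = sizeof(payload);
    
    /* Only issue the syscall when the event is hooked to a trace/probe. */
    if (page_data[status_id])
    	writev(data_fd, io, 2);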
    
    User events are also exposed via the dynamic_events tracefs file for
    both create and delete. Current status is exposed via the user_events_status
    tracefs file.
    
    Simple example to register a user event via dynamic_events:
    	echo u:test >> dynamic_events
    	cat dynamic_events
    	u:test
    
    If an event is hooked to a probe, the probe hooked shows up:
    	echo 1 > events/user_events/test/enable
    	cat user_events_status
    	1:test # Used by ftrace
    
    	Active: 1
    	Busy: 1
    	Max: 4096
    
    If an event is not hooked to a probe, no probe status shows up:
    	echo 0 > events/user_events/test/enable
    	cat user_events_status
    	1:test
    
    	Active: 1
    	Busy: 0
    	Max: 4096
    
    Users can describe the trace event using the following format:
    	name[:FLAG1[,FLAG2...]] [field1[;field2...]]
    
    Each field has the following format:
    	type name
    
    Example for char array with a size of 20 named msg:
    	echo 'u:detailed char[20] msg' >> dynamic_events
    	cat dynamic_events
    	u:detailed char[20] msg
    
    Data offsets are based on the data written out via write() and will be
    updated to reflect the correct offset in the trace_event fields. For dynamic
    data it is recommended to use the new __rel_loc data type. This type will be
    the same as __data_loc, but the offset is relative to this entry. This allows
    user_events to not worry about what common fields are being inserted before
    the data.
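
    As an illustration of what a __rel_loc value might carry, the sketch
    below assumes the usual __data_loc packing (length in the upper 16
    bits, offset in the lower 16 bits). That packing is an assumption here,
    as is the exact point the offset is measured from; both should be
    checked against the kernel headers. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for a __rel_loc style value.
 * Assumption: same packing as __data_loc, i.e. (length << 16) | offset,
 * with the offset being relative to the entry rather than absolute.
 */
static uint32_t rel_loc_pack(uint16_t offset, uint16_t len)
{
    return ((uint32_t)len << 16) | offset;
}

/* Extract the relative offset (lower 16 bits). */
static uint16_t rel_loc_offset(uint32_t loc)
{
    return (uint16_t)(loc & 0xffff);
}

/* Extract the data length (upper 16 bits). */
static uint16_t rel_loc_len(uint32_t loc)
{
    return (uint16_t)(loc >> 16);
}
```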
    
    The above format is valid for both the ioctl and the dynamic_events file.
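
    A process that registers events programmatically may want to assemble
    this description string before handing it to the ioctl. A minimal
    sketch, covering only the name and field list (no flags), is shown
    here; build_event_desc() is a hypothetical helper, not part of the ABI.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "name field1;field2..." into buf.
 * Returns 0 on success, -1 if the description does not fit.
 * build_event_desc() is a hypothetical name used for illustration.
 */
static int build_event_desc(char *buf, size_t buf_len, const char *name,
                            const char *const *fields, size_t nfields)
{
    size_t used;
    int n = snprintf(buf, buf_len, "%s", name);

    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < nfields; i++) {
        /* A space separates the name from the fields, ';' separates fields. */
        n = snprintf(buf + used, buf_len - used, "%s%s",
                     i == 0 ? " " : ";", fields[i]);
        if (n < 0 || (size_t)n >= buf_len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

    With the name "detailed" and the single field "char[20] msg", this
    produces the same text as the dynamic_events example above, minus the
    "u:" prefix.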
    
    Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com
    
    Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
    Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>