  1. 12 Dec, 2022 1 commit
  2. 10 Dec, 2022 22 commits
  3. 09 Dec, 2022 1 commit
  4. 29 Nov, 2022 1 commit
  5. 28 Nov, 2022 1 commit
  6. 24 Nov, 2022 14 commits
    • ftrace: Avoid needless updates of the ftrace function call · bd604f3d
      Steven Rostedt (Google) authored
      Song Shuai reported:
      
          The list func (ftrace_ops_list_func) will be patched first
          before the transition between old and new calls are set,
          which fixed the race described in this commit `59338f75`.
      
          While ftrace_trace_function changes from the list func to a
          ftrace_ops func, like unregistering the klp_ops to leave the only
          global_ops in ftrace_ops_list, the ftrace_[regs]_call will be
          replaced with the list func although it already exists. So there
          should be a condition to avoid this.
      
      He also suggested using another variable to keep track of what the
      ftrace function is set to. But this can be simplified by using a helper
      function that does the same with a static variable.
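
      The idea can be sketched in user-space C as follows. This is only an
      illustration under stated assumptions: update_trace_func(),
      patch_call_site(), list_func() and ops_func() are hypothetical
      stand-ins, not the kernel's functions, and the "patching" is just a
      counter.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*trace_func_t)(void);

/* Counts how many times the (simulated) call site was patched. */
static int patch_count;

/* Hypothetical stand-in for the expensive step of rewriting the call site. */
static void patch_call_site(trace_func_t func)
{
    assert(func != NULL);
    patch_count++;
}

/* Patch only when the target differs from the last one installed. */
static void update_trace_func(trace_func_t func)
{
    /* Remembers the last installed target across calls. */
    static trace_func_t saved_func;

    if (func == saved_func)
        return; /* nothing changed, skip the needless update */

    saved_func = func;
    patch_call_site(func);
}

/* Two hypothetical targets, standing in for the list func and an ops func. */
static void list_func(void) {}
static void ops_func(void) {}
```

      Calling update_trace_func(list_func) twice in a row patches only once;
      the second call returns early because the static variable already holds
      the same target.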
      
      Link: https://lore.kernel.org/lkml/20221026132039.2236233-1-suagrfillet@gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20221122180905.737b6f52@gandalf.local.home

      Reported-by: Song Shuai <suagrfillet@gmail.com>
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      bd604f3d
    • tracing: Optimize event type allocation with IDA · 96e6122c
      Zheng Yejian authored
      After commit 060fa5c8 ("tracing/events: reuse trace event ids after
      overflow"), trace events with a dynamic type are linked in the list
      'ftrace_event_list' through the field 'trace_event.list'. When the
      maximum event type number is used up, the type number of a freed event
      can be reused by traversing 'ftrace_event_list'.

      Instead, using IDA to manage the available type numbers makes the code
      simpler, and the field 'trace_event.list' can then be dropped.

      Since 'struct trace_event' is used in static tracepoints, dropping
      'trace_event.list' makes vmlinux smaller. In a local test with about
      2000 tracepoints, vmlinux shrank by about 64KB:
        before: -rwxrwxr-x 1 root root 76669448 Nov  8 17:14 vmlinux
        after:  -rwxrwxr-x 1 root root 76604176 Nov  8 17:15 vmlinux
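
      What an ID allocator buys here can be sketched in user-space C: the
      lowest free number is handed out and a freed number becomes reusable,
      with no list to walk. This is a toy bitmap, not the kernel IDA (which
      is used through calls such as ida_alloc_range() and ida_free()); the
      names id_alloc(), id_free() and the ID_MIN/ID_MAX range are
      hypothetical and chosen only for the demo.

```c
#include <assert.h>
#include <stdint.h>

#define ID_MIN   4                      /* hypothetical first dynamic id */
#define ID_MAX   67                     /* hypothetical last id, kept small */
#define ID_COUNT (ID_MAX - ID_MIN + 1)  /* 64 ids, one bit each */

/* Bit i set means id (ID_MIN + i) is in use. */
static uint64_t used_bits;

/* Return the lowest free id, or -1 when every id is taken. */
static int id_alloc(void)
{
    for (int i = 0; i < ID_COUNT; i++) {
        if (!(used_bits & (UINT64_C(1) << i))) {
            used_bits |= UINT64_C(1) << i;
            return ID_MIN + i;
        }
    }
    return -1;
}

/* Release an id so that a later id_alloc() can hand it out again. */
static void id_free(int id)
{
    assert(id >= ID_MIN && id <= ID_MAX);
    used_bits &= ~(UINT64_C(1) << (id - ID_MIN));
}
```

      After allocating three ids and freeing the middle one, the next
      allocation returns that freed id again, which is the reuse behavior
      the commit relies on.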
      
      Link: https://lkml.kernel.org/r/20221110020319.1259291-1-zhengyejian1@huawei.com

      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      96e6122c
    • tracing: Make tracepoint_print_iter static · a76d4648
      Xiu Jianfeng authored
      After change in commit 42391745 ("tracing: Make tracepoint_printk a
      static_key"), this symbol is not used outside of the file, so mark it
      static.
      
      Link: https://lkml.kernel.org/r/20221122091456.72055-1-xiujianfeng@huawei.com

      Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      a76d4648
    • tracing/perf: Use strndup_user instead of kzalloc/strncpy_from_user · 9430cd62
      Chuang Wang authored
      This patch uses strndup_user instead of kzalloc + strncpy_from_user,
      which makes the code more concise.
      
      Link: https://lkml.kernel.org/r/20221121080831.707409-1-nashuiliang@gmail.com

      Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      9430cd62
    • Documentation/osnoise: Add osnoise/options documentation · 67543cd6
      Daniel Bristot de Oliveira authored
      Add the documentation about the osnoise/options file, along
      with an explanation about the OSNOISE_WORKLOAD option.
      
      Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org
      
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      67543cd6
    • tracing/osnoise: Add OSNOISE_WORKLOAD option · 30838fcd
      Daniel Bristot de Oliveira authored
      The osnoise tracer is not only a tracer, and a set of tracepoints,
      but also a workload dispatcher.
      
      In preparation for having other workloads, e.g., in user-space,
      add an option to avoid dispatching the workload.
      
      By not dispatching the workload, the osnoise: tracepoints become
      generic events to measure the execution time of *any* task on Linux.
      
      For example:
      
        # cd /sys/kernel/tracing/
        # cat osnoise/options
        DEFAULTS OSNOISE_WORKLOAD
        # echo NO_OSNOISE_WORKLOAD > osnoise/options
        # cat osnoise/options
        NO_DEFAULTS NO_OSNOISE_WORKLOAD
        # echo osnoise > set_event
        # echo osnoise > current_tracer
        # tail -8 trace
            make-94722   [002] d..3.  1371.794507: thread_noise:     make:94722 start 1371.794302286 duration 200897 ns
              sh-121042  [020] d..3.  1371.794534: thread_noise:       sh:121042 start 1371.781610976 duration 8943683 ns
            make-121097  [005] d..3.  1371.794542: thread_noise:     make:121097 start 1371.794481522 duration 60444 ns
           <...>-40      [005] d..3.  1371.794550: thread_noise: migration/5:40 start 1371.794542256 duration 7154 ns
          <idle>-0       [018] dNh2.  1371.794554: irq_noise: reschedule:253 start 1371.794553547 duration 40 ns
          <idle>-0       [018] dNh2.  1371.794561: irq_noise: local_timer:236 start 1371.794556222 duration 4890 ns
          <idle>-0       [018] .Ns2.  1371.794563: softirq_noise:    SCHED:7 start 1371.794561803 duration 992 ns
          <idle>-0       [018] d..3.  1371.794566: thread_noise: swapper/18:0 start 1371.781368110 duration 13191798 ns
      
      In preparation for the rtla exec_time tracer/tool and
      rtla osnoise --user option.
      
      Link: https://lkml.kernel.org/r/f5cfbd37aefd419eefe9243b4d2fc38ed5753fe4.1668692096.git.bristot@kernel.org
      
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      30838fcd
    • tracing/osnoise: Add osnoise/options file · b179d48b
      Daniel Bristot de Oliveira authored
      Add the tracing/osnoise/options file to control
      osnoise/timerlat tracer features. It is a single
      file to contain multiple features, similar to
      the sched/features file.
      
      Reading the file displays a list of options. Writing
      the OPTION_NAME enables it, writing NO_OPTION_NAME disables
      it.
      
      DEFAULTS is a special option that resets the options
      to their default values.
      
      It uses a bitmask to keep track of the status of the option. When
      needed, we can add a list of static keys, but for now
      it does not justify the memory increase.
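
      The write side of such a file can be sketched in user-space C with a
      plain bitmask. This is a minimal illustration, not the kernel code:
      the option names FEATURE_A and FEATURE_B, the default mask, the
      option_write() helper and the choice to reject "NO_DEFAULTS" are all
      assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option set; DEFAULTS is the special "reset" word. */
enum { OPT_DEFAULTS, OPT_FEATURE_A, OPT_FEATURE_B, OPT_COUNT };

static const char *const option_names[OPT_COUNT] = {
    "DEFAULTS", "FEATURE_A", "FEATURE_B",
};

/* Assumed default: only FEATURE_A is enabled. */
#define DEFAULT_MASK (1UL << OPT_FEATURE_A)

/* One bit per option tracks whether it is currently enabled. */
static unsigned long options = DEFAULT_MASK;

/* Handle one written word: "NAME" enables, "NO_NAME" disables. */
static int option_write(const char *word)
{
    int enable = 1;

    assert(word != NULL);
    if (strncmp(word, "NO_", 3) == 0) {
        enable = 0;
        word += 3;
    }

    for (int i = 0; i < OPT_COUNT; i++) {
        if (strcmp(word, option_names[i]) != 0)
            continue;
        if (i == OPT_DEFAULTS) {
            if (!enable)
                return -1;          /* assumption: NO_DEFAULTS is rejected */
            options = DEFAULT_MASK; /* reset everything to the defaults */
        } else if (enable) {
            options |= 1UL << i;
        } else {
            options &= ~(1UL << i);
        }
        return 0;
    }
    return -1; /* unknown option name */
}
```

      Writing "FEATURE_B" sets its bit, "NO_FEATURE_A" clears one, and
      "DEFAULTS" restores the initial mask. A bitmask keeps the state in a
      single word, which matches the memory argument made above.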
      
      Link: https://lkml.kernel.org/r/f8d34aefdb225d2603fcb4c02a120832a0cd3339.1668692096.git.bristot@kernel.org
      
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      b179d48b
    • ring_buffer: Remove unused "event" parameter · 04aabc32
      Song Chen authored
      After commit a389d86f ("ring-buffer: Have nested events still record
      running time stamp"), the "event" parameter is no longer used in either
      ring_buffer_unlock_commit() or rb_commit(). Best to remove it.
      
      Link: https://lkml.kernel.org/r/1666274811-24138-1-git-send-email-chensong_2000@189.cn

      Signed-off-by: Song Chen <chensong_2000@189.cn>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      04aabc32
    • tracing: Add trace_trigger kernel command line option · a01fdc89
      Steven Rostedt (Google) authored
      Allow triggers to be enabled at kernel boot up. For example:
      
        trace_trigger="sched_switch.stacktrace if prev_state == 2"
      
      The above will enable the stacktrace trigger on top of the sched_switch
      event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
      at boot up, a stacktrace will trigger and be recorded in the tracing ring
      buffer every time the sched_switch happens where the previous state is
      TASK_UNINTERRUPTIBLE.
      
      Another useful trigger would be "traceoff" which can stop tracing on an
      event if a field of the event matches a certain value defined by the
      filter ("if" statement).
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home

      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      a01fdc89
    • tracing: Add __cpumask to denote a trace event field that is a cpumask_t · 8230f27b
      Steven Rostedt (Google) authored
      The trace events have a __bitmask field that can be used for anything
      that requires bitmasks. Although currently it is only used for CPU
      masks, it could be used in the future for any type of bitmasks.
      
      There is some user space tooling that wants to know if a field is a CPU
      mask and not just some random unsigned long bitmask. Introduce
      "__cpumask()" helper functions that work the same as the current
      __bitmask() helpers but display in the format file:
      
        field:__data_loc cpumask_t *[] mask;    offset:36;      size:4; signed:0;
      
      Instead of:
      
        field:__data_loc unsigned long[] mask;  offset:32;      size:4; signed:0;
      
      The main difference is the type. Instead of "unsigned long" it is
      "cpumask_t *". Note, this type field needs to be a real type in the
      __dynamic_array() logic that both __cpumask and __bitmask use, but the
      comparison field requires it to be a scalar type whereas cpumask_t is a
      structure (non-scalar). But everything works when making it a pointer.
      
      Valentin added changes to remove the need of passing in "nr_bits" and the
      __cpumask will always use nr_cpumask_bits as its size.
      
      Link: https://lkml.kernel.org/r/20221014080456.1d32b989@rorschach.local.home

      Requested-by: Valentin Schneider <vschneid@redhat.com>
      Reviewed-by: Valentin Schneider <vschneid@redhat.com>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      8230f27b
    • ftrace: Clean comments related to FTRACE_OPS_FL_PER_CPU · 78a01feb
      Zheng Yejian authored
      Commit b3a88803 ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") didn't
      completely remove the comments related to FTRACE_OPS_FL_PER_CPU.
      
      Link: https://lkml.kernel.org/r/20221025153923.1995973-1-zhengyejian1@huawei.com
      
      Fixes: b3a88803 ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      78a01feb
    • tracing: Free buffers when a used dynamic event is removed · 4313e5a6
      Steven Rostedt (Google) authored
      After 65536 dynamic events have been added and removed, the "type" field
      of the event then uses the first type number that is available (not
      currently used by other events). A type number is the identifier of the
      binary blobs in the tracing ring buffer (known as events) to map them to
      logic that can parse the binary blob.
      
      The issue is that if a dynamic event (like a kprobe event) is traced and
      is in the ring buffer, and then that event is removed (because it is
      dynamic, which means it can be created and destroyed), and another
      dynamic event is then created that has the same number, that new event's
      logic for parsing the binary blob will be used on the old event's data.
      
      To show how this can be an issue, the following can crash the kernel:
      
       # cd /sys/kernel/tracing
       # for i in `seq 65536`; do
           echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
       # done
      
      For every iteration of the above, writing to kprobe_events removes the
      old event and creates a new one (with the same format), and increases
      the type number to the next available one until the type number goes
      past 65535, which is the max number for the 16 bit type. After it
      reaches that number, the logic to allocate a new number simply looks for
      the next available number. When a dynamic event is removed, that number
      is then available to be reused by the next dynamic event created. That is,
      once the above reaches the max number, the number assigned to the event in
      that loop will remain the same.

      Now that means deleting one dynamic event and creating another will reuse
      the previous event's type number. This is where bad things can happen.
      After the above loop finishes, the kprobes/foo event reads the
      do_sys_openat2 function call's first parameter as an integer.
      
       # echo 1 > kprobes/foo/enable
       # cat /etc/passwd > /dev/null
       # cat trace
                   cat-2211    [005] ....  2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
       # echo 0 > kprobes/foo/enable
      
      Now if we delete the kprobe and create a new one that reads a string:
      
       # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
      
      And now we can read the trace:
      
       # cat trace
              sendmail-1942    [002] .....   530.136320: foo: (do_sys_openat2+0x0/0x240) arg1=             cat-2046    [004] .....   530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                  bash-1515    [007] .....   534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
      
      And dmesg has:
      
      ==================================================================
      BUG: KASAN: use-after-free in string+0xd4/0x1c0
      Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
      
       CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       Call Trace:
        <TASK>
        dump_stack_lvl+0x5b/0x77
        print_report+0x17f/0x47b
        kasan_report+0xad/0x130
        string+0xd4/0x1c0
        vsnprintf+0x500/0x840
        seq_buf_vprintf+0x62/0xc0
        trace_seq_printf+0x10e/0x1e0
        print_type_string+0x90/0xa0
        print_kprobe_event+0x16b/0x290
        print_trace_line+0x451/0x8e0
        s_show+0x72/0x1f0
        seq_read_iter+0x58e/0x750
        seq_read+0x115/0x160
        vfs_read+0x11d/0x460
        ksys_read+0xa9/0x130
        do_syscall_64+0x3a/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
       RIP: 0033:0x7fc2e972ade2
       Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
       RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
       RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
       RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
       RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
       R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
       R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
        </TASK>
      
       The buggy address belongs to the physical page:
       page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
       flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
       raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
       raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                      ^
        ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ==================================================================
      
      This was found when Zheng Yejian sent a patch to convert the event type
      number assignment to use IDA, which gives the next available number, and
      this bug showed up in the fuzz testing by Yujie Liu and the kernel test
      robot. But after further analysis, I found that this behavior is the same
      as when the event type numbers go past the 16bit max (and the above shows
      that).
      
      Modules have a similar issue, but it is dealt with by setting a
      "WAS_ENABLED" flag when a module event is enabled. When the module is
      freed, if any of its events were enabled, the ring buffer that holds that
      event is also cleared, to prevent reading stale events. The same can be
      done for dynamic events.
      
      If any dynamic event that is being removed was enabled, then make sure the
      buffers they were enabled in are now cleared.
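
      The fix can be illustrated with a small user-space C model. It is a
      sketch under assumptions, not the kernel implementation: the
      fixed-size buffer, struct dyn_event and the event_enable(),
      event_record(), event_remove() and records_of_type() helpers are
      hypothetical names. The point is that removing an event that was ever
      enabled clears the buffer, so a new event that reuses the type number
      never sees the old event's records.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SLOTS 8

/* A recorded blob: the type number says which parser to use. */
struct record {
    int type;
    unsigned long payload;
};

static struct record buffer[BUF_SLOTS];
static int buffered;

/* A dynamic event with a "was enabled at some point" flag. */
struct dyn_event {
    int type;
    bool was_enabled;
};

static void event_enable(struct dyn_event *ev)
{
    ev->was_enabled = true; /* sticky: stays set until removal */
}

static void event_record(const struct dyn_event *ev, unsigned long payload)
{
    assert(ev->was_enabled && buffered < BUF_SLOTS);
    buffer[buffered].type = ev->type;
    buffer[buffered].payload = payload;
    buffered++;
}

static void buffer_reset(void)
{
    memset(buffer, 0, sizeof(buffer));
    buffered = 0;
}

/* Removing an event that may have written records clears the buffer. */
static void event_remove(struct dyn_event *ev)
{
    if (ev->was_enabled)
        buffer_reset();
    ev->was_enabled = false;
}

/* Count buffered records that would be parsed with this type's logic. */
static int records_of_type(int type)
{
    int n = 0;

    for (int i = 0; i < buffered; i++)
        if (buffer[i].type == type)
            n++;
    return n;
}
```

      With this in place, creating a second event with the same type number
      after event_remove() finds zero records of that type, so no stale blob
      is interpreted with the wrong format.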
      
      Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home
      Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/
      
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Depends-on: e18eb878 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
      Depends-on: 5448d44c ("tracing: Add unified dynamic event framework")
      Depends-on: 6212dd29 ("tracing/kprobes: Use dyn_event framework for kprobe events")
      Depends-on: 065e63f9 ("tracing: Only have rmmod clear buffers that its events were active in")
      Depends-on: 575380da ("tracing: Only clear trace buffer on module unload if event was traced")
      Fixes: 77b44d1b ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
      Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reported-by: Yujie Liu <yujie.liu@intel.com>
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      4313e5a6
    • tracing: Add tracing_reset_all_online_cpus_unlocked() function · e18eb878
      Steven Rostedt (Google) authored
      Currently tracing_reset_all_online_cpus() requires the
      trace_types_lock to be held. But only one caller of this function
      actually has that lock held before calling it, and the other just takes
      the lock so that it can call it. More users of this function are needed
      where the lock is not held.

      Add a tracing_reset_all_online_cpus_unlocked() function for the one use
      case that calls it with the lock already held, and also add a
      lockdep_assert to make sure it is held when called.
      
      Then have tracing_reset_all_online_cpus() take the lock internally, such
      that callers do not need to worry about taking it.
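
      The split into a locking wrapper and an "_unlocked" worker can be
      sketched in user-space C. This is only a model: the boolean toy lock,
      the per-CPU counters and the names reset_all(), reset_all_unlocked(),
      toy_lock() and toy_unlock() are hypothetical, and a plain assert()
      stands in for the lockdep check.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy lock: a flag is enough to model "held" versus "not held". */
static bool lock_held;

/* Pretend per-CPU buffers, holding a count of entries each. */
static int cpu_buffers[NR_CPUS] = { 3, 1, 4, 1 };

static void toy_lock(void)
{
    assert(!lock_held);
    lock_held = true;
}

static void toy_unlock(void)
{
    assert(lock_held);
    lock_held = false;
}

/* Worker: the caller must already hold the lock. */
static void reset_all_unlocked(void)
{
    assert(lock_held); /* stands in for the lockdep assertion */
    for (int i = 0; i < NR_CPUS; i++)
        cpu_buffers[i] = 0;
}

/* Wrapper: takes the lock itself, so callers need not worry about it. */
static void reset_all(void)
{
    toy_lock();
    reset_all_unlocked();
    toy_unlock();
}

static int total_entries(void)
{
    int sum = 0;

    for (int i = 0; i < NR_CPUS; i++)
        sum += cpu_buffers[i];
    return sum;
}
```

      A caller that does not hold the lock uses reset_all(); a caller that
      already holds it uses reset_all_unlocked() directly and keeps the lock
      afterwards.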
      
      Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      e18eb878
    • tracing: Fix race where histograms can be called before the event · ef38c79a
      Steven Rostedt (Google) authored
      Commit 94eedf3d ("tracing: Fix race where eprobes can be called before
      the event") fixed an issue where if an event is soft disabled, and the
      trigger is being added, there's a small window where the event sees that
      there's a trigger but does not see that it requires reading the event yet,
      and then calls the trigger with the record == NULL.
      
      This could be solved by adding memory barriers in the hot path, or by
      making sure that all the triggers requiring a record check for NULL. The
      latter was chosen.
      
      Commit 94eedf3d set the eprobe trigger handle to check for NULL, but
      the same needs to be done with histograms.
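
      The chosen approach amounts to an early return in the handler, as in
      this user-space C sketch. The struct sample record, the counters and
      the hist_trigger() and hist_average() helpers are hypothetical and
      only illustrate the pattern; they are not the kernel's histogram code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record the trigger wants to read. */
struct sample {
    int value;
};

static long hist_total; /* sum of all values seen */
static int hist_hits;   /* number of records accounted */

/* Trigger handler: tolerate being called before a record exists. */
static void hist_trigger(const struct sample *rec)
{
    if (!rec)
        return; /* called in the race window, nothing to account */

    hist_total += rec->value;
    hist_hits++;
}

/* Average of the accounted values; only valid after at least one hit. */
static long hist_average(void)
{
    assert(hist_hits > 0);
    return hist_total / hist_hits;
}
```

      Calling hist_trigger(NULL) leaves the counters untouched, so the hot
      path needs no extra barrier; the cost is one pointer check per call.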
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
      Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      ef38c79a