1. 10 Dec, 2022 15 commits
    • ring-buffer: Handle resize in early boot up · 88ca6a71
      Steven Rostedt authored
      With the new command line option that allows trace event triggers to be
      added at boot, the "snapshot" trigger will allocate the snapshot buffer
      very early, when interrupts cannot be enabled. Allocating the ring buffer
      is not the problem, but the trigger also resizes it, and the resize code
      does synchronization that cannot be performed at early boot.
      
      To handle this, first change the raw_spin_lock_irq() in rb_insert_pages()
      to raw_spin_lock_irqsave(), such that the unlocking of that spin lock will
      not enable interrupts.
      
      Next, where it calls schedule_work_on(), disable migration and check if
      the CPU to update is the current CPU. If so, perform the work directly;
      otherwise re-enable migration and call schedule_work_on() for the CPU
      that is being updated. rb_insert_pages() just needs to run on the CPU
      that it is updating, and needs neither preemption nor interrupts
      disabled when it is called.
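
The dispatch rule described above can be modeled in user space. The following Python sketch is not kernel code: `update_pages`, `run_direct`, and `schedule_on` are hypothetical stand-ins for the decision made around schedule_work_on(), and migration control appears only as comments.

```python
def update_pages(target_cpu, current_cpu, run_direct, schedule_on):
    """Model of the dispatch rule: do the work directly when the CPU being
    updated is the one we are already running on, otherwise hand it off.

    All names here are hypothetical stand-ins, not kernel symbols.
    """
    # (kernel: migration is disabled before this comparison so that the
    #  current CPU cannot change underneath it)
    if target_cpu == current_cpu:
        run_direct(target_cpu)
        return "direct"
    # (kernel: migration is re-enabled before the work is scheduled)
    schedule_on(target_cpu)
    return "scheduled"


if __name__ == "__main__":
    log = []
    for cpu in (0, 1):
        how = update_pages(cpu, 0, log.append, log.append)
        print(f"cpu {cpu}: {how}")
```

The point of the direct path is that nothing has to be scheduled at all, which matters when the scheduler-based path cannot be used yet at early boot.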
      
      Link: https://lore.kernel.org/lkml/Y5J%2FCajlNh1gexvo@google.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20221209101151.1fec1167@gandalf.local.home
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      
      Fixes: a01fdc89 ("tracing: Add trace_trigger kernel command line option")
      Reported-by: Ross Zwisler <zwisler@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Ross Zwisler <zwisler@google.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/hist: Fix issue of losting command info in error_log · 608c6ed3
      Zheng Yejian authored
      When some constructed invalid 'trigger' command is input, the command
      info in 'error_log' is lost [1].
      
      The root cause is that there is a path on which event_hist_trigger_parse()
      is called recursively once and 'last_cmd', which saves the original
      command, is cleared. Later calls of hist_err() then no longer record the
      original command info:
      
        event_hist_trigger_parse() {
          last_cmd_set()  // <1> 'last_cmd' saves the original command here first
          create_actions() {
            onmatch_create() {
              action_create() {
                trace_action_create() {
                  trace_action_create_field_var() {
                    create_field_var_hist() {
                      event_hist_trigger_parse() {  // <2> called recursively once
                        hist_err_clear()  // <3> 'last_cmd' is cleared here
                      }
                      hist_err()  // <4> Original command can no longer be found!
      
      Since 'glob' is an empty string in the recursive call, we can check for
      that and bypass the call of hist_err_clear() to solve it.
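
The effect of the nested call, and of bypassing the clear when 'glob' is empty, can be reproduced with a toy model. This Python sketch only mimics the bookkeeping shown in the call tree above; the class `HistTriggerModel`, the helper `logged_command`, and the flag `bypass_clear_on_empty_glob` are hypothetical and do not reflect the actual kernel implementation.

```python
class HistTriggerModel:
    """Toy model of the 'last_cmd' bookkeeping; not the kernel code."""

    def __init__(self, bypass_clear_on_empty_glob):
        self.bypass_clear_on_empty_glob = bypass_clear_on_empty_glob
        self.last_cmd = None
        self.error_log = []

    def hist_err_clear(self):
        self.last_cmd = None

    def hist_err(self, message):
        # Record the error together with whatever command is remembered.
        self.error_log.append((message, self.last_cmd))

    def event_hist_trigger_parse(self, glob, command):
        if glob:
            self.last_cmd = command                # <1> save original command
            self.event_hist_trigger_parse("", "")  # <2> nested call, empty glob
            self.hist_err("Couldn't find field")   # <4> report an error
        elif not self.bypass_clear_on_empty_glob:
            self.hist_err_clear()                  # <3> clears 'last_cmd'


def logged_command(bypass):
    """Return the command text recorded with the first logged error."""
    model = HistTriggerModel(bypass)
    model.event_hist_trigger_parse("hist", "hist:keys=next_pid")
    return model.error_log[0][1]


if __name__ == "__main__":
    print("without bypass:", logged_command(False))
    print("with bypass:   ", logged_command(True))
```

Without the bypass the logged error carries no command, which corresponds to the empty "Command:" entries in the error_log output of [1].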
      
      [1]
       # cd /sys/kernel/tracing
       # echo "my_synth_event int v1; int v2; int v3;" >> synthetic_events
       # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger
       # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
      pid,pid1)" >> events/sched/sched_switch/trigger
       # cat error_log
      [  8.405018] hist:sched:sched_switch: error: Couldn't find synthetic event
        Command:
      hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
                                                                ^
      [  8.816902] hist:sched:sched_switch: error: Couldn't find field
        Command:
      hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
                                ^
      [  8.816902] hist:sched:sched_switch: error: Couldn't parse field variable
        Command:
      hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1)
                                ^
      [  8.999880] : error: Couldn't find field
        Command:
                 ^
      [  8.999880] : error: Couldn't parse field variable
        Command:
                 ^
      [  8.999880] : error: Couldn't find field
        Command:
                 ^
      [  8.999880] : error: Couldn't create histogram for field
        Command:
                 ^
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221207135326.3483216-1-zhengyejian1@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <zanussi@kernel.org>
      Fixes: f404da6e ("tracing: Add 'last error' error facility for hist triggers")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Fix issue of missing one synthetic field · ff4837f7
      Zheng Yejian authored
      The maximum number of synthetic fields supported is defined as
      SYNTH_FIELDS_MAX, whose value is currently 64, but generating a
      synthetic event with 64 fields actually fails, for example when executing:
      
        # echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\
         int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\
         int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\
         int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\
         int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\
         int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\
         int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\
         int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\
         int v63; int v64" >> /sys/kernel/tracing/synthetic_events
      
      Correct the field counting to fix it.
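
Typing the 64-field definition by hand is error-prone, so a reproducer can generate it. The Python sketch below only builds the command string written to synthetic_events in the example above; the helper name `synth_event_command` is hypothetical, and the value 64 is taken from the commit message.

```python
SYNTH_FIELDS_MAX = 64  # value quoted in the commit message


def synth_event_command(name, n_fields):
    """Build a synthetic event definition with n_fields 'int' fields,
    named v1..vN as in the example above."""
    fields = "; ".join(f"int v{i}" for i in range(1, n_fields + 1))
    return f"{name} {fields}"


if __name__ == "__main__":
    # The resulting text is what would be appended to
    # /sys/kernel/tracing/synthetic_events (requires root, not done here).
    print(synth_event_command("my_synth_event", SYNTH_FIELDS_MAX))
```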
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221207091557.3137904-1-zhengyejian1@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: c9e759b1 ("tracing: Rework synthetic event command parsing")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx' · 82470f7d
      Zheng Yejian authored
      When generating a synthetic event with many params and then creating a
      trace action for it [1], a kernel panic happens [2].
      
      This is because in trace_action_create() 'data->n_params' can be up to
      SYNTH_FIELDS_MAX (current value is 64), and the array 'data->var_ref_idx'
      keeps indices into the array 'hist_data->var_refs' for each synthetic
      event param, but the length of 'data->var_ref_idx' is TRACING_MAP_VARS_MAX
      (current value is 16). An out-of-bound write therefore happens when
      'data->n_params' is more than 16. In this case, 'data->match_data.event'
      is overwritten, which eventually causes the panic.
      
      To solve the issue, adjust the length of 'data->var_ref_idx' to be
      SYNTH_FIELDS_MAX and add sanity checks to avoid the out-of-bound write.
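
The kind of sanity check described here can be illustrated in user space. In this Python sketch the helper `store_var_ref_idx` is hypothetical and only models "refuse to write past the end of the index array"; the two constants use the values quoted in the commit message.

```python
import errno

SYNTH_FIELDS_MAX = 64      # values quoted in the commit message
TRACING_MAP_VARS_MAX = 16


def store_var_ref_idx(var_ref_idx, slot, value):
    """Hypothetical helper: store value at slot only if the slot exists.

    Returns 0 on success and -EINVAL when the slot is out of bounds,
    instead of silently writing past the end as the unchecked C code did.
    """
    if slot < 0 or slot >= len(var_ref_idx):
        return -errno.EINVAL
    var_ref_idx[slot] = value
    return 0


if __name__ == "__main__":
    old_array = [0] * TRACING_MAP_VARS_MAX   # length before the fix
    new_array = [0] * SYNTH_FIELDS_MAX       # length after the fix
    print(store_var_ref_idx(old_array, 16, 5))  # 17th param does not fit
    print(store_var_ref_idx(new_array, 62, 5))  # 63rd param fits
```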
      
      [1]
       # cd /sys/kernel/tracing/
       # echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\
      int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\
      int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\
      int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\
      int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\
      int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\
      int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\
      int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\
      int v63" >> synthetic_events
       # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="bash"' >> \
      events/sched/sched_waking/trigger
       # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\
      pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
      pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
      pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\
      pid,pid,pid,pid,pid,pid,pid,pid,pid)" >> events/sched/sched_switch/trigger
      
      [2]
      BUG: unable to handle page fault for address: ffff91c900000000
      PGD 61001067 P4D 61001067 PUD 0
      Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 2 PID: 322 Comm: bash Tainted: G        W          6.1.0-rc8+ #229
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      RIP: 0010:strcmp+0xc/0x30
      Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee
      c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14
      07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3
      RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000
      RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000
      RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000
      R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580
      R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538
      FS:  00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0
      Call Trace:
       <TASK>
       __find_event_file+0x55/0x90
       action_create+0x76c/0x1060
       event_hist_trigger_parse+0x146d/0x2060
       ? event_trigger_write+0x31/0xd0
       trigger_process_regex+0xbb/0x110
       event_trigger_write+0x6b/0xd0
       vfs_write+0xc8/0x3e0
       ? alloc_fd+0xc0/0x160
       ? preempt_count_add+0x4d/0xa0
       ? preempt_count_add+0x70/0xa0
       ksys_write+0x5f/0xe0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f1d1d0cf077
      Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e
      fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00
      f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74
      RSP: 002b:00007ffcebb0e568 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000143 RCX: 00007f1d1d0cf077
      RDX: 0000000000000143 RSI: 00005639265aa7e0 RDI: 0000000000000001
      RBP: 00005639265aa7e0 R08: 000000000000000a R09: 0000000000000142
      R10: 000056392639c017 R11: 0000000000000246 R12: 0000000000000143
      R13: 00007f1d1d1ae6a0 R14: 00007f1d1d1aa4a0 R15: 00007f1d1d1a98a0
       </TASK>
      Modules linked in:
      CR2: ffff91c900000000
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:strcmp+0xc/0x30
      Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee
      c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14
      07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3
      RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000
      RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000
      RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000
      R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580
      R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538
      FS:  00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221207035143.2278781-1-zhengyejian1@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: d380dcde ("tracing: Fix now invalid var_ref_vals assumption in trace action")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/hist: Fix wrong return value in parse_action_params() · 2cc6a528
      Zheng Yejian authored
      When the number of synth fields is more than SYNTH_FIELDS_MAX,
      parse_action_params() should return -EINVAL.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221207034635.2253990-1-zhengyejian1@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: c282a386 ("tracing: Add 'onmatch' hist trigger action support")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • x86/mm/kmmio: Use rcu_read_lock_sched_notrace() · 20fb6c99
      Steven Rostedt authored
      The mmiotrace tracer is "special". Its purpose is to help reverse engineer
      binary drivers by removing the memory allocated by the driver. When the
      driver goes to access it, a fault occurs, and the mmiotracer records what
      the driver was doing and then does the work on its behalf by single
      stepping through the process.
      
      But to achieve this ability, it must do some special things. One is to
      take the rcu_read_lock() when the fault occurs, and then release it in the
      breakpoint that is single stepping. This makes lockdep unhappy, as it
      changes the state of RCU from within an exception that is not contained in
      that exception, and we get a nasty splat from lockdep.
      
      Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant
      has the same grace period as normal RCU. This is basically the same as
      rcu_read_lock() but does not make lockdep complain about it.
      
      Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221209134144.04f33626@gandalf.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Karol Herbst <karolherbst@gmail.com>
      Cc: Pekka Paalanen <ppaalanen@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • x86/mm/kmmio: Switch to arch_spin_lock() · 4994e387
      Steven Rostedt authored
      The mmiotrace tracer is "special". Its purpose is to help reverse engineer
      binary drivers by removing the memory allocated by the driver. When the
      driver goes to access it, a fault occurs, and the mmiotracer records what
      the driver was doing and then does the work on its behalf by single
      stepping through the process.
      
      But to achieve this ability, it must do some special things. One is it
      needs to grab a lock while in the breakpoint handler. This is considered
      an NMI state, and then lockdep warns that the lock is being held in both
      an NMI state (really a breakpoint handler) and also in normal context.
      
      As the breakpoint/NMI state only happens when the driver is accessing
      memory, there's no concern of a race condition against the setup and
      tear-down of mmiotracer.
      
      To make lockdep and mmiotrace work together, convert the locks used in the
      breakpoint handler into arch_spin_lock().
      
      Link: https://lkml.kernel.org/r/20221206191229.656244029@goodmis.org
      Link: https://lore.kernel.org/lkml/20221201213126.620b7dd3@gandalf.local.home/
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Karol Herbst <karolherbst@gmail.com>
      Cc: Pekka Paalanen <ppaalanen@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Fix complicated dependency of CONFIG_TRACER_MAX_TRACE · e25e43a4
      Masami Hiramatsu (Google) authored
      Both CONFIG_OSNOISE_TRACER and CONFIG_HWLAT_TRACER partially enable the
      CONFIG_TRACER_MAX_TRACE code, but that is complicated and has
      introduced a bug: the tracing_max_lat_fops data structure is declared
      outside of #ifdefs, but it is defined only when CONFIG_TRACER_MAX_TRACE=y
      or CONFIG_HWLAT_TRACER=y, so if only CONFIG_OSNOISE_TRACER=y, that
      declaration becomes a definition(!).
      
      To fix this issue and avoid repeating a similar problem, make
      CONFIG_OSNOISE_TRACER and CONFIG_HWLAT_TRACER always enable
      CONFIG_TRACER_MAX_TRACE. This has three benefits:
      - Fix the tracing_max_lat_fops bug
      - Simplify the #ifdefs
      - CONFIG_TRACER_MAX_TRACE code is either fully enabled or not at all.
      
      Link: https://lore.kernel.org/linux-trace-kernel/167033628155.4111793.12185405690820208159.stgit@devnote3
      
      Fixes: 424b650f ("tracing: Fix missing osnoise tracer on max_latency")
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: stable@vger.kernel.org
      Reported-by: David Howells <dhowells@redhat.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Link: https://lore.kernel.org/all/166992525941.1716618.13740663757583361463.stgit@warthog.procyon.org.uk/ (original thread and v1)
      Link: https://lore.kernel.org/all/202212052253.VuhZ2ulJ-lkp@intel.com/T/#u (v1 error report)
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/probes: Handle system names with hyphens · 575b76cb
      Steven Rostedt (Google) authored
      When creating probe names, a check is done to make sure the name follows
      basic C variable naming rules: it starts with an alphabetic character or
      underscore, and the remaining characters are alphanumeric or underscore.
      
      But system names do not have any true naming conventions, as they are
      created by the TRACE_SYSTEM macro and nothing tests what they are.
      The "xhci-hcd" trace events have a '-' in the system name. Trying to
      attach an eprobe to one of these trace points fails because the hyphen
      in the system name violates the variable naming convention that the
      eprobe checks enforce.
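
The naming rule, together with a relaxed variant that tolerates hyphens for system names, can be sketched as follows. This is a user-space Python model, not the kernel's checker: the function `is_good_name` and its `allow_hyphen` parameter are hypothetical, and it assumes that a hyphen is still not accepted as the first character.

```python
def is_good_name(name, allow_hyphen=False):
    """Model of a C-identifier style check with an optional hyphen allowance.

    Assumption: the first character must be an ASCII letter or '_', and a
    hyphen is only ever accepted after the first character.
    """
    if not name:
        return False
    first = name[0]
    if not (first == "_" or (first.isascii() and first.isalpha())):
        return False
    for ch in name[1:]:
        if ch == "_" or (ch.isascii() and ch.isalnum()):
            continue
        if allow_hyphen and ch == "-":
            continue
        return False
    return True


if __name__ == "__main__":
    for system in ("sched", "xhci-hcd", "9lives"):
        print(system, is_good_name(system), is_good_name(system, allow_hyphen=True))
```

Keeping the strict rule as the default and relaxing it only where a system name is being checked leaves event names subject to the original convention.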
      
      Allow hyphens in the system name so that eprobes can attach to the
      "xhci-hcd" trace events.
      
      Link: https://lore.kernel.org/all/Y3eJ8GiGnEvVd8%2FN@macondo/
      Link: https://lore.kernel.org/linux-trace-kernel/20221122122345.160f5077@gandalf.local.home
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 5b7a9622 ("tracing/probe: Check event/group naming rule at parsing")
      Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • trace/kprobe: remove duplicated calls of ring_buffer_event_data · fff1787a
      Song Chen authored
      Function __kprobe_trace_func calls ring_buffer_event_data to get the
      ring buffer event data; however, this has already been done by the
      preceding call to trace_event_buffer_reserve. The same applies to
      __kretprobe_trace_func.
      
      This patch removes those duplicated calls.
      
      Link: https://lore.kernel.org/all/1666145478-4706-1-git-send-email-chensong_2000@189.cn/
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Song Chen <chensong_2000@189.cn>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    • tracing: Add nohitcount option for suppressing display of raw hitcount · ccf47f5c
      Masami Hiramatsu (Google) authored
      Add 'nohitcount' ('NOHC' for short) option for suppressing display of
      the raw hitcount column in the histogram.
      Note that you must specify at least one value other than the raw
      'hitcount' when you specify this nohitcount option.
      
        # cd /sys/kernel/debug/tracing/
        # echo hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:NOHC > \
              events/sched/sched_stat_runtime/trigger
        # sleep 10
        # cat events/sched/sched_stat_runtime/hist
       # event histogram
       #
       # trigger info: hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:size=2048:nohitcount  [active]
       #
      
       { pid:          8 }  runtime (%):   3.02  runtime: #
       { pid:         14 }  runtime (%):   2.25  runtime:
       { pid:         16 }  runtime (%):   2.25  runtime:
       { pid:         26 }  runtime (%):   0.17  runtime:
       { pid:         61 }  runtime (%):  11.52  runtime: ####
       { pid:         67 }  runtime (%):   1.56  runtime:
       { pid:         68 }  runtime (%):   0.84  runtime:
       { pid:         76 }  runtime (%):   0.92  runtime:
       { pid:        117 }  runtime (%):   2.50  runtime: #
       { pid:        146 }  runtime (%):  49.88  runtime: ####################
       { pid:        157 }  runtime (%):  16.63  runtime: ######
       { pid:        158 }  runtime (%):   8.38  runtime: ###
      
      Link: https://lore.kernel.org/linux-trace-kernel/166610814787.56030.4980636083486339906.stgit@devnote2
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Tested-by: Tom Zanussi <zanussi@kernel.org>
    • tracing: Add .graph suffix option to histogram value · a2c54256
      Masami Hiramatsu (Google) authored
      Add the .graph suffix, which shows a bar graph of the histogram value.
      
      For example, the following shows a bar graph of the runtime
      histogram for each task.
      
      ------
        # cd /sys/kernel/debug/tracing/
        # echo hist:keys=pid:vals=runtime.graph:sort=pid > \
         events/sched/sched_stat_runtime/trigger
        # sleep 10
        # cat events/sched/sched_stat_runtime/hist
       # event histogram
       #
       # trigger info: hist:keys=pid:vals=hitcount,runtime.graph:sort=pid:size=2048 [active]
       #
      
       { pid:         14 } hitcount:          2  runtime:
       { pid:         16 } hitcount:          8  runtime:
       { pid:         26 } hitcount:          1  runtime:
       { pid:         57 } hitcount:          3  runtime:
       { pid:         61 } hitcount:         20  runtime: ###
       { pid:         66 } hitcount:          2  runtime:
       { pid:         70 } hitcount:          3  runtime:
       { pid:         72 } hitcount:          2  runtime:
       { pid:        145 } hitcount:         14  runtime: ####################
       { pid:        152 } hitcount:          5  runtime: #######
       { pid:        153 } hitcount:          2  runtime: ####
      
       Totals:
           Hits: 62
           Entries: 11
           Dropped: 0
      -------
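
One plausible way to produce bars like those in the output above is to scale each value against the largest one. The Python sketch below assumes a maximum bar width of 20 characters, which matches the longest bar in the sample but is not confirmed by the commit message; the function `bar_graph` is hypothetical and the kernel's rounding may differ.

```python
def bar_graph(values, width=20):
    """Map each value to a '#' bar scaled against the largest value.

    Assumption: bars are proportional to value / max(values), truncated to
    whole characters, with at most `width` characters.
    """
    top = max(values.values(), default=0)
    if top <= 0:
        return {key: "" for key in values}
    return {key: "#" * (val * width // top) for key, val in values.items()}


if __name__ == "__main__":
    # Hypothetical runtimes, not taken from the output above.
    sample = {145: 1000, 152: 350, 61: 150, 14: 10}
    for pid, bar in bar_graph(sample).items():
        print(f"{{ pid: {pid:10d} }} runtime: {bar}")
```

Small values produce an empty bar under this scaling, which is consistent with the many rows in the sample that show no '#' at all.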
      
      Link: https://lore.kernel.org/linux-trace-kernel/166610813953.56030.10944148382315789485.stgit@devnote2
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Tested-by: Tom Zanussi <zanussi@kernel.org>
    • tracing: Add .percent suffix option to histogram values · abaa5258
      Masami Hiramatsu (Google) authored
      Add the .percent suffix option to show the histogram values as percentages.
      This feature is useful when we need to understand the overall trend
      for histograms of large values.
      E.g. this shows the runtime percentage for each task.
      
      ------
        # cd /sys/kernel/debug/tracing/
        # echo hist:keys=pid:vals=hitcount,runtime.percent:sort=pid > \
          events/sched/sched_stat_runtime/trigger
        # sleep 10
        # cat events/sched/sched_stat_runtime/hist
       # event histogram
       #
       # trigger info: hist:keys=pid:vals=hitcount,runtime.percent:sort=pid:size=2048 [active]
       #
      
       { pid:          8 } hitcount:          7  runtime (%):   4.14
       { pid:         14 } hitcount:          5  runtime (%):   3.69
       { pid:         16 } hitcount:         11  runtime (%):   3.41
       { pid:         61 } hitcount:         41  runtime (%):  19.75
       { pid:         65 } hitcount:          4  runtime (%):   1.48
       { pid:         70 } hitcount:          6  runtime (%):   3.60
       { pid:         72 } hitcount:          2  runtime (%):   1.10
       { pid:        144 } hitcount:         10  runtime (%):  32.01
       { pid:        151 } hitcount:          8  runtime (%):  22.66
       { pid:        152 } hitcount:          2  runtime (%):   8.10
      
       Totals:
           Hits: 96
           Entries: 10
           Dropped: 0
      -----
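
The percentage column can be understood as each entry's share of the column total. The Python sketch below shows that arithmetic under the assumption that the total is the sum over all entries; the function `percentages` is hypothetical, and the kernel's exact rounding is not specified in the commit message.

```python
def percentages(values):
    """Return each value as a percentage of the sum, rounded to 2 decimals."""
    total = sum(values.values())
    if total == 0:
        return {key: 0.0 for key in values}
    return {key: round(100.0 * val / total, 2) for key, val in values.items()}


if __name__ == "__main__":
    # Hypothetical runtimes in nanoseconds, not taken from the output above.
    sample = {144: 3_200_000, 151: 2_300_000, 61: 2_000_000, 8: 500_000}
    for pid, pct in percentages(sample).items():
        print(f"{{ pid: {pid:10d} }} runtime (%): {pct:6.2f}")
```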
      
      Link: https://lore.kernel.org/linux-trace-kernel/166610813077.56030.4238090506973562347.stgit@devnote2
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Reviewed-by: Tom Zanussi <zanussi@kernel.org>
      Tested-by: Tom Zanussi <zanussi@kernel.org>
    • tracing: Allow multiple hitcount values in histograms · 5f2e094e
      Tom Zanussi authored
      The hitcount is treated specially in the histograms - since it's
      always expected to be there regardless of whether the user specified
      anything or not, it's always added as the first histogram value.
      
      Currently the code doesn't allow it to be added more than once as a
      value, which is inconsistent with all the other possible values.  It
      would seem to be a pointless thing to want to do, but other features
      being added such as percent and graph modifiers don't work properly
      with the current hitcount restrictions.
      
      Fix this by allowing multiple hitcounts to be added.
      
      Link: https://lore.kernel.org/linux-trace-kernel/166610812248.56030.16754785928712505251.stgit@devnote2
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Tested-by: Tom Zanussi <zanussi@kernel.org>
  2. 09 Dec, 2022 1 commit
  3. 29 Nov, 2022 1 commit
  4. 28 Nov, 2022 1 commit
  5. 24 Nov, 2022 14 commits
    • ftrace: Avoid needless updates of the ftrace function call · bd604f3d
      Steven Rostedt (Google) authored
      Song Shuai reported:
      
          The list func (ftrace_ops_list_func) will be patched first
          before the transition between old and new calls are set,
          which fixed the race described in this commit `59338f75`.
      
          While ftrace_trace_function changes from the list func to a
          ftrace_ops func, like unregistering the klp_ops to leave the only
          global_ops in ftrace_ops_list, the ftrace_[regs]_call will be
          replaced with the list func although it already exists. So there
          should be a condition to avoid this.
      
      Song Shuai suggested using another variable to keep track of what the
      ftrace function is set to. But this could be simplified by using a helper
      function that does the same with a static variable.
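
The idea of a helper that remembers the last value it applied, and skips the update when nothing changes, can be sketched in user space. In this Python model the closure state plays the role of the static variable; `make_updater` and `patch` are hypothetical names, and nothing here reflects the actual ftrace code.

```python
def make_updater(patch):
    """Return an update function that calls patch(func) only when func
    differs from the one applied last time."""
    unset = object()
    last = [unset]  # stands in for a function-local static variable

    def update(func):
        if last[0] is not unset and last[0] == func:
            return False          # already set to this function, skip
        patch(func)
        last[0] = func
        return True

    return update


if __name__ == "__main__":
    patched = []
    update = make_updater(patched.append)
    for func in ("list_func", "list_func", "ops_func"):
        print(func, "patched" if update(func) else "skipped")
```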
      
      Link: https://lore.kernel.org/lkml/20221026132039.2236233-1-suagrfillet@gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20221122180905.737b6f52@gandalf.local.home
      Reported-by: Song Shuai <suagrfillet@gmail.com>
      Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Optimize event type allocation with IDA · 96e6122c
      Zheng Yejian authored
      After commit 060fa5c8 ("tracing/events: reuse trace event ids after
      overflow"), trace events with dynamic type are linked in the list
      'ftrace_event_list' through the field 'trace_event.list'. Then, when the
      maximum event type number is used up, it is possible to reuse the type
      number of a freed event by traversing 'ftrace_event_list'.
      
      Instead, using IDA to manage the available type numbers makes the code
      simpler, and the field 'trace_event.list' can then be dropped.
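
What an ID allocator buys over walking a list can be seen in a small model. The Python class below is a toy stand-in, not the kernel's IDA: `IdAllocator` is a hypothetical name, it assumes the allocator hands out the smallest free number in a fixed range, and it does not guard against freeing the same ID twice.

```python
import heapq


class IdAllocator:
    """Toy allocator that returns the smallest free ID in [first, last]."""

    def __init__(self, first, last):
        self.next_new = first   # smallest ID never handed out so far
        self.last = last
        self.freed = []         # min-heap of IDs that were released

    def alloc(self):
        if self.freed:
            # Freed IDs are always below next_new, so the heap minimum
            # is the smallest free ID overall.
            return heapq.heappop(self.freed)
        if self.next_new > self.last:
            return None         # range exhausted
        self.next_new += 1
        return self.next_new - 1

    def free(self, ident):
        heapq.heappush(self.freed, ident)


if __name__ == "__main__":
    ids = IdAllocator(10, 12)
    print([ids.alloc() for _ in range(4)])
    ids.free(11)
    print(ids.alloc())
```

With the allocator tracking free numbers itself, the events no longer need to be linked together just so that a free type number can be found.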
      
      Since 'struct trace_event' is used in static tracepoints, dropping
      'trace_event.list' makes vmlinux smaller. In a local test with about
      2000 tracepoints, vmlinux shrank by about 64KB:
        before: -rwxrwxr-x 1 root root 76669448 Nov  8 17:14 vmlinux
        after:  -rwxrwxr-x 1 root root 76604176 Nov  8 17:15 vmlinux
      
      Link: https://lkml.kernel.org/r/20221110020319.1259291-1-zhengyejian1@huawei.com
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: Make tracepoint_print_iter static · a76d4648
      Xiu Jianfeng authored
      After the change in commit 42391745 ("tracing: Make tracepoint_printk a
      static_key"), this symbol is not used outside of the file, so mark it
      static.
      
      Link: https://lkml.kernel.org/r/20221122091456.72055-1-xiujianfeng@huawei.com
      Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/perf: Use strndup_user instead of kzalloc/strncpy_from_user · 9430cd62
      Chuang Wang authored
      This patch uses strndup_user instead of kzalloc + strncpy_from_user,
      which makes the code more concise.
      
      Link: https://lkml.kernel.org/r/20221121080831.707409-1-nashuiliang@gmail.com
      Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Documentation/osnoise: Add osnoise/options documentation · 67543cd6
      Daniel Bristot de Oliveira authored
      Add the documentation about the osnoise/options file, along
      with an explanation about the OSNOISE_WORKLOAD option.
      
      Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org
      
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      67543cd6
    • Daniel Bristot de Oliveira's avatar
      tracing/osnoise: Add OSNOISE_WORKLOAD option · 30838fcd
      Daniel Bristot de Oliveira authored
      The osnoise tracer is not only a tracer, and a set of tracepoints,
      but also a workload dispatcher.
      
      In preparation for having other workloads, e.g., in user-space,
      add an option to avoid dispatching the workload.
      
      By not dispatching the workload, the osnoise: tracepoints become
      generic events to measure the execution time of *any* task on Linux.
      
      For example:
      
        # cd /sys/kernel/tracing/
        # cat osnoise/options
        DEFAULTS OSNOISE_WORKLOAD
        # echo NO_OSNOISE_WORKLOAD > osnoise/options
        # cat osnoise/options
        NO_DEFAULTS NO_OSNOISE_WORKLOAD
        # echo osnoise > set_event
        # echo osnoise > current_tracer
        # tail -8 trace
            make-94722   [002] d..3.  1371.794507: thread_noise:     make:94722 start 1371.794302286 duration 200897 ns
              sh-121042  [020] d..3.  1371.794534: thread_noise:       sh:121042 start 1371.781610976 duration 8943683 ns
            make-121097  [005] d..3.  1371.794542: thread_noise:     make:121097 start 1371.794481522 duration 60444 ns
           <...>-40      [005] d..3.  1371.794550: thread_noise: migration/5:40 start 1371.794542256 duration 7154 ns
          <idle>-0       [018] dNh2.  1371.794554: irq_noise: reschedule:253 start 1371.794553547 duration 40 ns
          <idle>-0       [018] dNh2.  1371.794561: irq_noise: local_timer:236 start 1371.794556222 duration 4890 ns
          <idle>-0       [018] .Ns2.  1371.794563: softirq_noise:    SCHED:7 start 1371.794561803 duration 992 ns
          <idle>-0       [018] d..3.  1371.794566: thread_noise: swapper/18:0 start 1371.781368110 duration 13191798 ns
      
      In preparation for the rtla exec_time tracer/tool and
      rtla osnoise --user option.
      
      Link: https://lkml.kernel.org/r/f5cfbd37aefd419eefe9243b4d2fc38ed5753fe4.1668692096.git.bristot@kernel.org
      
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      30838fcd
    • Daniel Bristot de Oliveira's avatar
      tracing/osnoise: Add osnoise/options file · b179d48b
      Daniel Bristot de Oliveira authored
      Add the tracing/osnoise/options file to control
      osnoise/timerlat tracer features. It is a single
      file to contain multiple features, similar to
      the sched/features file.
      
      Reading the file displays a list of options. Writing
      the OPTION_NAME enables it, writing NO_OPTION_NAME disables
      it.
      
      DEFAULTS is a special option that resets the options
      to their default values.
      
      It uses a bitmask to keep track of the status of the option. When
      needed, we can add a list of static keys, but for now
      it does not justify the memory increase.
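The bitmask scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the tracer's code: the names `options_write`, `options_show`, and `OPT_*` are hypothetical, and rejecting `NO_DEFAULTS` on write is an assumption. The display rule (DEFAULTS is shown only while the mask equals the default mask) follows the sample output quoted elsewhere in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option table; only OSNOISE_WORKLOAD is named in this log. */
enum { OPT_DEFAULTS, OPT_OSNOISE_WORKLOAD, OPT_MAX };

static const char *const opt_names[OPT_MAX] = { "DEFAULTS", "OSNOISE_WORKLOAD" };

/* Bits that are set when every option has its default value. */
#define OPT_DEFAULT_MASK (1UL << OPT_OSNOISE_WORKLOAD)

/*
 * Apply one written token such as "OSNOISE_WORKLOAD" or
 * "NO_OSNOISE_WORKLOAD" to the mask.
 * Returns 0 on success, -1 if the token is not recognized.
 */
int options_write(unsigned long *mask, const char *token)
{
	int enable = 1;

	if (strncmp(token, "NO_", 3) == 0) {
		enable = 0;
		token += 3;
	}

	for (int i = 0; i < OPT_MAX; i++) {
		if (strcmp(token, opt_names[i]) != 0)
			continue;
		if (i == OPT_DEFAULTS) {
			if (!enable)
				return -1; /* assumption: NO_DEFAULTS is not writable */
			*mask = OPT_DEFAULT_MASK;
		} else if (enable) {
			*mask |= 1UL << i;
		} else {
			*mask &= ~(1UL << i);
		}
		return 0;
	}
	return -1;
}

/* Render the option list, prefixing disabled options with "NO_". */
void options_show(unsigned long mask, char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (int i = 0; i < OPT_MAX; i++) {
		/* DEFAULTS is a pseudo-option: shown as set only if nothing changed. */
		int set = (i == OPT_DEFAULTS) ? (mask == OPT_DEFAULT_MASK)
					      : !!(mask & (1UL << i));
		int n = snprintf(buf + off, len - off, "%s%s%s",
				 off ? " " : "", set ? "" : "NO_", opt_names[i]);

		if (n < 0 || (size_t)n >= len - off)
			break; /* output truncated */
		off += (size_t)n;
	}
}
```

A single unsigned long holds every option, which is the memory trade-off the message mentions: no per-option static key is needed until the checks become hot enough to justify one.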
      
      Link: https://lkml.kernel.org/r/f8d34aefdb225d2603fcb4c02a120832a0cd3339.1668692096.git.bristot@kernel.org
      
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      b179d48b
    • Song Chen's avatar
      ring_buffer: Remove unused "event" parameter · 04aabc32
      Song Chen authored
      After commit a389d86f ("ring-buffer: Have nested events still record
      running time stamp"), the "event" parameter is no longer used in either
      ring_buffer_unlock_commit() or rb_commit(). Best to remove it.
      
      Link: https://lkml.kernel.org/r/1666274811-24138-1-git-send-email-chensong_2000@189.cn
      Signed-off-by: Song Chen <chensong_2000@189.cn>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      04aabc32
    • Steven Rostedt (Google)'s avatar
      tracing: Add trace_trigger kernel command line option · a01fdc89
      Steven Rostedt (Google) authored
      Allow triggers to be enabled at kernel boot up. For example:
      
        trace_trigger="sched_switch.stacktrace if prev_state == 2"
      
      The above will enable the stacktrace trigger on top of the sched_switch
      event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
      at boot up, a stacktrace will trigger and be recorded in the tracing ring
      buffer every time the sched_switch happens where the previous state is
      TASK_UNINTERRUPTIBLE.
      
      Another useful trigger would be "traceoff" which can stop tracing on an
      event if a field of the event matches a certain value defined by the
      filter ("if" statement).
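To make the shape of such an entry concrete, here is a minimal userspace sketch that splits one entry into its event, trigger, and optional filter. It assumes entries look like the example above ("event.trigger", optionally followed by " if " and a filter). The function `parse_boot_trigger` and `struct boot_trigger` are hypothetical names, and the kernel's own parsing may differ in detail.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for one parsed entry. */
struct boot_trigger {
	char event[64];
	char trigger[64];
	char filter[128];	/* empty string when no "if" clause is given */
};

/*
 * Parse one "<event>.<trigger>[ if <filter>]" entry.
 * Returns 0 on success, -1 if the entry is malformed or too long.
 */
int parse_boot_trigger(const char *s, struct boot_trigger *out)
{
	const char *dot = strchr(s, '.');
	const char *cond;
	size_t ev_len, trig_len;

	if (!dot || dot == s)
		return -1;	/* no event name before the dot */

	ev_len = (size_t)(dot - s);
	cond = strstr(dot + 1, " if ");
	trig_len = cond ? (size_t)(cond - (dot + 1)) : strlen(dot + 1);

	if (trig_len == 0 || ev_len >= sizeof(out->event) ||
	    trig_len >= sizeof(out->trigger))
		return -1;

	memcpy(out->event, s, ev_len);
	out->event[ev_len] = '\0';
	memcpy(out->trigger, dot + 1, trig_len);
	out->trigger[trig_len] = '\0';

	out->filter[0] = '\0';
	if (cond) {
		const char *f = cond + strlen(" if ");

		if (strlen(f) >= sizeof(out->filter))
			return -1;
		strcpy(out->filter, f);
	}
	return 0;
}
```

With this split, "sched_switch.stacktrace if prev_state == 2" yields the event "sched_switch", the trigger "stacktrace", and the filter "prev_state == 2"; a "traceoff" entry would parse the same way.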
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      a01fdc89
    • Steven Rostedt (Google)'s avatar
      tracing: Add __cpumask to denote a trace event field that is a cpumask_t · 8230f27b
      Steven Rostedt (Google) authored
      The trace events have a __bitmask field that can be used for anything
      that requires bitmasks. Although currently it is only used for CPU
      masks, it could be used in the future for any type of bitmasks.
      
      There is some user space tooling that wants to know if a field is a CPU
      mask and not just some random unsigned long bitmask. Introduce
      "__cpumask()" helper functions that work the same as the current
      __bitmask() helpers but display in the format file:
      
        field:__data_loc cpumask_t *[] mask;    offset:36;      size:4; signed:0;
      
      Instead of:
      
        field:__data_loc unsigned long[] mask;  offset:32;      size:4; signed:0;
      
      The main difference is the type. Instead of "unsigned long" it is
      "cpumask_t *". Note, this type field needs to be a real type in the
      __dynamic_array() logic that both __cpumask and __bitmask use, but the
      comparison field requires it to be a scalar type whereas cpumask_t is a
      structure (non-scalar). But everything works when making it a pointer.
      
      Valentin added changes to remove the need of passing in "nr_bits" and the
      __cpumask will always use nr_cpumask_bits as its size.
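As an illustration of how user space tooling might use the new type string, the sketch below checks whether a "field:" line from a format file declares a CPU mask. It relies only on the two format lines quoted above; `field_is_cpumask` is a hypothetical helper, not part of any existing library.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if a format-file "field:" line declares a dynamic cpumask_t
 * field, 0 otherwise. Only the type prefix is inspected, so the field
 * name, offset, and size are ignored.
 */
int field_is_cpumask(const char *line)
{
	static const char type[] = "__data_loc cpumask_t *[]";
	const char *p = strstr(line, "field:");
	const char *end;

	if (!p)
		return 0;
	p += strlen("field:");

	end = strchr(p, ';');	/* the type and name end at the first ';' */
	if (!end || (size_t)(end - p) < sizeof(type) - 1)
		return 0;

	return strncmp(p, type, sizeof(type) - 1) == 0;
}
```

A generic bitmask field still reports "unsigned long[]", so the same check returns 0 for it and the tool can fall back to plain bitmask handling.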
      
      Link: https://lkml.kernel.org/r/20221014080456.1d32b989@rorschach.local.home
      Requested-by: Valentin Schneider <vschneid@redhat.com>
      Reviewed-by: Valentin Schneider <vschneid@redhat.com>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      8230f27b
    • Zheng Yejian's avatar
      ftrace: Clean comments related to FTRACE_OPS_FL_PER_CPU · 78a01feb
      Zheng Yejian authored
      Commit b3a88803 ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") didn't
      completely remove the comments related to FTRACE_OPS_FL_PER_CPU.
      
      Link: https://lkml.kernel.org/r/20221025153923.1995973-1-zhengyejian1@huawei.com
      
      Fixes: b3a88803 ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      78a01feb
    • Steven Rostedt (Google)'s avatar
      tracing: Free buffers when a used dynamic event is removed · 4313e5a6
      Steven Rostedt (Google) authored
      After 65536 dynamic events have been added and removed, the "type" field
      of the event then uses the first type number that is available (not
      currently used by other events). A type number is the identifier of the
      binary blobs in the tracing ring buffer (known as events) to map them to
      logic that can parse the binary blob.
      
      The issue is that a dynamic event (like a kprobe event) can be traced and
      still be in the ring buffer when that event is removed (because it is
      dynamic, it can be created and destroyed). If another dynamic event is
      then created with the same type number, the new event's logic for parsing
      the binary blob will be used on the old event's data.
      
      To show how this can be an issue, the following can crash the kernel:
      
       # cd /sys/kernel/tracing
       # for i in `seq 65536`; do
           echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
       # done
      
      For every iteration of the above, the write to kprobe_events will
      remove the old event and create a new one (with the same format) and
      increase the type number to the next available one, until the type number
      goes past 65535, which is the max number for the 16 bit type. After it
      reaches that number, the logic to allocate a new number simply looks for
      the next available number. When a dynamic event is removed, that number
      is then available to be reused by the next dynamic event created. That is,
      once the above reaches the max number, the number assigned to the event in
      that loop will remain the same.
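The allocation behaviour described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel's allocator: `struct type_alloc` and its helpers are hypothetical, and `TYPE_FIRST_DYNAMIC` is an assumed starting value. It shows why the remove-and-create loop settles on one number once the 16 bit space has been walked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TYPE_FIRST_DYNAMIC 1	/* assumption: first number open to dynamic events */
#define TYPE_MAX 65535		/* largest value of the 16 bit type field */

struct type_alloc {
	bool used[TYPE_MAX + 1];
	int next;		/* next never-used number; > TYPE_MAX once exhausted */
};

void type_alloc_init(struct type_alloc *a)
{
	memset(a, 0, sizeof(*a));
	a->next = TYPE_FIRST_DYNAMIC;
}

/* Hand out a type number, or -1 if every number is in use. */
int type_alloc_get(struct type_alloc *a)
{
	/* Fast path: keep counting up while fresh numbers remain. */
	if (a->next <= TYPE_MAX) {
		int t = a->next++;

		a->used[t] = true;
		return t;
	}

	/* Slow path: counter exhausted, reuse the first free number. */
	for (int t = TYPE_FIRST_DYNAMIC; t <= TYPE_MAX; t++) {
		if (!a->used[t]) {
			a->used[t] = true;
			return t;
		}
	}
	return -1;
}

/* Release a number when its event is removed. */
void type_alloc_put(struct type_alloc *a, int t)
{
	a->used[t] = false;
}
```

Repeating put followed by get more than 65536 times exhausts the counter. From then on every get returns the number that was just released, so stale records in the ring buffer carry the same type number as the newly created event.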
      
      That means deleting one dynamic event and creating another will reuse
      the previous event's type number. This is where bad things can happen.
      After the above loop finishes, the kprobes/foo event reads the
      do_sys_openat2 function call's first parameter as an integer.
      
       # echo 1 > kprobes/foo/enable
       # cat /etc/passwd > /dev/null
       # cat trace
                   cat-2211    [005] ....  2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
                   cat-2211    [005] ....  2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
       # echo 0 > kprobes/foo/enable
      
      Now if we delete the kprobe and create a new one that reads a string:
      
       # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
      
      And now we can cat the trace:
      
       # cat trace
              sendmail-1942    [002] .....   530.136320: foo: (do_sys_openat2+0x0/0x240) arg1=             cat-2046    [004] .....   530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                   cat-2046    [004] .....   530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
                  bash-1515    [007] .....   534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
      
      And dmesg has:
      
      ==================================================================
      BUG: KASAN: use-after-free in string+0xd4/0x1c0
      Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
      
       CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       Call Trace:
        <TASK>
        dump_stack_lvl+0x5b/0x77
        print_report+0x17f/0x47b
        kasan_report+0xad/0x130
        string+0xd4/0x1c0
        vsnprintf+0x500/0x840
        seq_buf_vprintf+0x62/0xc0
        trace_seq_printf+0x10e/0x1e0
        print_type_string+0x90/0xa0
        print_kprobe_event+0x16b/0x290
        print_trace_line+0x451/0x8e0
        s_show+0x72/0x1f0
        seq_read_iter+0x58e/0x750
        seq_read+0x115/0x160
        vfs_read+0x11d/0x460
        ksys_read+0xa9/0x130
        do_syscall_64+0x3a/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
       RIP: 0033:0x7fc2e972ade2
       Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
       RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
       RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
       RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
       RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
       R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
       R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
        </TASK>
      
       The buggy address belongs to the physical page:
       page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
       flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
       raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
       raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                      ^
        ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ==================================================================
      
      This was found when Zheng Yejian sent a patch to convert the event type
      number assignment to use IDA, which gives the next available number, and
      this bug showed up in the fuzz testing by Yujie Liu and the kernel test
      robot. But after further analysis, I found that this behavior is the same
      as when the event type numbers go past the 16bit max (and the above shows
      that).
      
      Modules have a similar issue, which is dealt with by setting a
      "WAS_ENABLED" flag when a module event is enabled. When the module is
      freed, if any of its events were enabled, the ring buffer that holds that
      event is also cleared, to prevent reading stale events. The same can be
      done for dynamic events.
      
      If any dynamic event that is being removed was enabled, then make sure the
      buffers they were enabled in are now cleared.
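The idea of the fix can be sketched as a small userspace model. It is an illustration only: `struct trace_buf`, `struct dyn_event`, and the helper functions are hypothetical, the buffer is a flat array instead of a per-CPU ring buffer, and "clearing" is reduced to resetting a count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SLOTS 8

/* Toy trace buffer: each record stores only the type number of its event. */
struct trace_buf {
	int types[BUF_SLOTS];
	size_t count;
};

struct dyn_event {
	int type;
	bool enabled;
	bool was_enabled;	/* sticky: stays set once the event was ever enabled */
};

void event_enable(struct dyn_event *ev)
{
	ev->enabled = true;
	ev->was_enabled = true;
}

/* Record one occurrence of the event; a disabled event writes nothing. */
void event_record(const struct dyn_event *ev, struct trace_buf *buf)
{
	if (!ev->enabled || buf->count == BUF_SLOTS)
		return;
	buf->types[buf->count++] = ev->type;
}

/*
 * Remove the event and return its type number for reuse. If the event
 * was ever enabled, the buffer may hold its records, so clear it before
 * the number can be handed to a different event.
 */
int event_remove(struct dyn_event *ev, struct trace_buf *buf)
{
	if (ev->was_enabled)
		buf->count = 0;
	ev->enabled = false;
	return ev->type;
}
```

An event that was never enabled cannot have written records, so removing it leaves the buffer untouched. That keeps the cost of clearing limited to the cases where stale data could actually exist.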
      
      Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home
      Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/
      
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Depends-on: e18eb878 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
      Depends-on: 5448d44c ("tracing: Add unified dynamic event framework")
      Depends-on: 6212dd29 ("tracing/kprobes: Use dyn_event framework for kprobe events")
      Depends-on: 065e63f9 ("tracing: Only have rmmod clear buffers that its events were active in")
      Depends-on: 575380da ("tracing: Only clear trace buffer on module unload if event was traced")
      Fixes: 77b44d1b ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
      Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reported-by: Yujie Liu <yujie.liu@intel.com>
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      4313e5a6
    • Steven Rostedt (Google)'s avatar
      tracing: Add tracing_reset_all_online_cpus_unlocked() function · e18eb878
      Steven Rostedt (Google) authored
      Currently tracing_reset_all_online_cpus() requires the
      trace_types_lock to be held. But only one caller of this function actually
      has that lock held before calling it, and the other just takes the lock so
      that it can call it. More users of this function are needed where the lock
      is not held.
      
      Add a tracing_reset_all_online_cpus_unlocked() function for the one use
      case that calls it with the lock already held, and also add a
      lockdep_assert to make sure it is held when called.
      
      Then have tracing_reset_all_online_cpus() take the lock internally, such
      that callers do not need to worry about taking it.
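The locked/unlocked split is a common pattern and can be sketched in userspace C. This is a single-threaded model under stated assumptions: `struct sim_lock` stands in for a real mutex, a plain assert() stands in for the lockdep assertion, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a mutex in this single-threaded sketch. */
struct sim_lock {
	bool held;
};

struct sim_lock types_lock;
int reset_count;	/* counts how many resets actually ran */

void sim_lock_acquire(struct sim_lock *l)
{
	assert(!l->held);	/* would deadlock with a real non-recursive mutex */
	l->held = true;
}

void sim_lock_release(struct sim_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* "_unlocked" variant: does not lock; the caller must already hold it. */
void reset_all_unlocked(void)
{
	assert(types_lock.held);	/* plays the role of the lockdep assertion */
	reset_count++;
}

/* Plain variant: takes the lock itself, so callers need not care. */
void reset_all(void)
{
	sim_lock_acquire(&types_lock);
	reset_all_unlocked();
	sim_lock_release(&types_lock);
}
```

Callers that already hold the lock use the unlocked variant, and everyone else calls the plain one. Calling the plain variant with the lock held trips the assertion in the acquire helper, which mirrors the deadlock a real mutex would cause.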
      
      Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
      
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      e18eb878
    • Steven Rostedt (Google)'s avatar
      tracing: Fix race where histograms can be called before the event · ef38c79a
      Steven Rostedt (Google) authored
      Commit 94eedf3d ("tracing: Fix race where eprobes can be called before
      the event") fixed an issue where if an event is soft disabled, and the
      trigger is being added, there's a small window where the event sees that
      there's a trigger but does not see that it requires reading the event yet,
      and then calls the trigger with the record == NULL.
      
      This could be solved with adding memory barriers in the hot path, or to
      make sure that all the triggers requiring a record check for NULL. The
      latter was chosen.
      
      Commit 94eedf3d set the eprobe trigger handle to check for NULL, but
      the same needs to be done with histograms.
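The chosen approach, a NULL check in each trigger that needs the record, can be sketched as follows. The types and the function `hist_trigger` here are hypothetical userspace stand-ins, not the kernel's histogram code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an event record and a histogram. */
struct record {
	unsigned long value;
};

struct hist {
	unsigned long hits;
	unsigned long sum;
};

/*
 * Trigger handler that needs the event record. During the race window
 * it can be called with rec == NULL, so it bails out instead of
 * dereferencing the pointer.
 */
void hist_trigger(struct hist *h, const struct record *rec)
{
	if (!rec)
		return;

	h->hits++;
	h->sum += rec->value;
}
```

The check costs one predictable branch inside the trigger itself, which is why it can be preferable to adding memory barriers to the event hot path that every event pays for.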
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
      Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      ef38c79a
  6. 22 Nov, 2022 3 commits
  7. 21 Nov, 2022 1 commit
  8. 20 Nov, 2022 4 commits
    • Linus Torvalds's avatar
      Merge tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · c6c67bf9
      Linus Torvalds authored
      Pull tracing/probes fixes from Steven Rostedt:
      
       - Fix possible NULL pointer dereference on trace_event_file in
         kprobe_event_gen_test_exit()
      
       - Fix NULL pointer dereference for trace_array in
         kprobe_event_gen_test_exit()
      
       - Fix memory leak of filter string for eprobes
      
       - Fix a possible memory leak in rethook_alloc()
      
       - Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case which
         can cause a possible use-after-free
      
       - Fix warning in eprobe filter creation
      
       - Fix eprobe filter creation as it picked the wrong event for the
         fields
      
      * tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/eprobe: Fix eprobe filter to make a filter correctly
        tracing/eprobe: Fix warning in filter creation
        kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
        rethook: fix a potential memleak in rethook_alloc()
        tracing/eprobe: Fix memory leak of filter string
        tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
        tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
      c6c67bf9
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 5239ddeb
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix polling to block on watermark like the reads do, as user space
         applications get confused when the select says read is available, and
         then the read blocks
      
       - Fix accounting of ring buffer dropped pages as it is what is used to
         determine if the buffer is empty or not
      
       - Fix memory leak in tracing_read_pipe()
      
       - Fix struct trace_array warning about being declared in parameters
      
       - Fix accounting of ftrace pages used in output at start up.
      
       - Fix allocation of dyn_ftrace pages by subtracting one from order
         instead of dividing it by 2
      
       - Static analyzer found a case where a pointer was being used outside of
         a NULL check (rb_head_page_deactivate())
      
       - Fix possible NULL pointer dereference if kstrdup() fails in
         ftrace_add_mod()
      
       - Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
      
       - Fix bad pointer dereference in register_synth_event() on error path
      
       - Remove unused __bad_type_size() method
      
       - Fix possible NULL pointer dereference of entry in list 'tr->err_log'
      
       - Fix NULL pointer dereference race if eprobe is called before the event
         setup
      
      * tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix race where eprobes can be called before the event
        tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'
        tracing: Remove unused __bad_type_size() method
        tracing: Fix wild-memory-access in register_synth_event()
        tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
        ftrace: Fix null pointer dereference in ftrace_add_mod()
        ring_buffer: Do not deactivate non-existant pages
        ftrace: Optimize the allocation for mcount entries
        ftrace: Fix the possible incorrect kernel message
        tracing: Fix warning on variable 'struct trace_array'
        tracing: Fix memory leak in tracing_read_pipe()
        ring-buffer: Include dropped pages in counting dirty patches
        tracing/ring-buffer: Have polling block on watermark
      5239ddeb
    • Steven Rostedt (Google)'s avatar
      tracing: Fix race where eprobes can be called before the event · 94eedf3d
      Steven Rostedt (Google) authored
      The flag that tells the event to call its triggers after reading the event
      is set for eprobes after the eprobe is enabled. This leads to a race where
      the eprobe may be triggered at the beginning of the event where the record
      information is NULL. The eprobe then dereferences the NULL record causing
      a NULL kernel pointer bug.
      
      Test for a NULL record to keep this from happening.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home
      
      Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      94eedf3d
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 894909f9
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Do not hold fpregs lock when inheriting FPU permissions because the
         fpregs lock disables preemption on RT but fpu_inherit_perms() does
         spin_lock_irq(), which, on RT, uses rtmutexes and they need to be
         preemptible.
      
       - Check the page offset and the length of the data supplied by
         userspace for overflow when specifying a set of pages to add to an
         SGX enclave
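The kind of overflow check the SGX item describes can be illustrated with a generic userspace sketch. This is not the kernel's sgx_validate_offset_length(): the function name, the page size constant, and the exact set of checks are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the sketch */

/*
 * Validate a user-supplied (offset, length) pair against a region size.
 * Returns 0 if the range is page aligned, non-empty, and inside the
 * region; -1 otherwise.
 */
int validate_offset_length(uint64_t offset, uint64_t length,
			   uint64_t region_size)
{
	if (offset % SKETCH_PAGE_SIZE || length % SKETCH_PAGE_SIZE)
		return -1;
	if (length == 0)
		return -1;

	/* Unsigned addition wraps around; a wrapped sum is below offset. */
	if (offset + length < offset)
		return -1;

	if (offset + length > region_size)
		return -1;

	return 0;
}
```

Without the wrap check, a huge offset plus a small length would wrap to a small sum and pass the bounds comparison even though the range lies far outside the region.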
      
      * tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Drop fpregs lock before inheriting FPU permissions
        x86/sgx: Add overflow check in sgx_validate_offset_length()
      894909f9