1. 10 Nov, 2021 1 commit
    • Steven Rostedt (VMware)'s avatar
      ring-buffer: Protect ring_buffer_reset() from reentrancy · 51d15794
      Steven Rostedt (VMware) authored
      The resetting of the entire ring buffer use to simply go through and reset
      each individual CPU buffer that had its own protection and synchronization.
      But this was very slow, due to performing a synchronization for each CPU.
      The code was reshuffled to do one disabling of all CPU buffers, followed
      by a single RCU synchronization, and then the resetting of each of the CPU
      buffers. But unfortunately, the mutex that prevented multiple occurrences
      of resetting the buffer was not moved to the upper function, and there is
      nothing to protect from it.
      
      Take the ring buffer mutex around the global reset.
      
      Cc: stable@vger.kernel.org
      Fixes: b23d7a5f ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
      Reported-by: default avatar"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      51d15794
  2. 02 Nov, 2021 5 commits
  3. 01 Nov, 2021 9 commits
  4. 30 Oct, 2021 1 commit
  5. 29 Oct, 2021 1 commit
  6. 27 Oct, 2021 13 commits
    • Robin H. Johnson's avatar
      tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together · e531e90b
      Robin H. Johnson authored
      Running endpoint security solutions like Sentinel1 that use perf-based
      tracing heavily lead to this repeated dump complaining about dockerd.
      The default value of 2048 is nowhere near not large enough.
      
      Using the prior patch "tracing: show size of requested buffer", we get
      "perf buffer not large enough, wanted 6644, have 6144", after repeated
      up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all,
      so below is the trace for 6K.
      
      I'm wondering if this value should be selectable at boot time, but this
      is a good starting point.
      
      ```
      ------------[ cut here ]------------
      perf buffer not large enough, wanted 6644, have 6144
      WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x8c/0xa0
      Modules linked in: [..]
      CPU: 1 PID: 4997 Comm: sh Tainted: G                T 5.13.13-x86_64-00039-gb3959163488e #63
      Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019
      RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0
      Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89
      RSP: 0018:ffffb922026b7d58 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff9da5ee012000 RCX: 0000000000000027
      RDX: ffff9da881657828 RSI: 0000000000000001 RDI: ffff9da881657820
      RBP: 00000000000019f4 R08: 0000000000000000 R09: ffffb922026b7b80
      R10: ffffb922026b7b78 R11: ffffffff91dda688 R12: 000000000000000f
      R13: ffff9da5ee012108 R14: ffff9da8816570a0 R15: ffffb922026b7e30
      FS:  00007f420db1a080(0000) GS:ffff9da881640000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000060 CR3: 00000002504a8006 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       kprobe_perf_func+0x11e/0x270
       ? do_execveat_common.isra.0+0x1/0x1c0
       ? do_execveat_common.isra.0+0x5/0x1c0
       kprobe_ftrace_handler+0x10e/0x1d0
       0xffffffffc03aa0c8
       ? do_execveat_common.isra.0+0x1/0x1c0
       do_execveat_common.isra.0+0x5/0x1c0
       __x64_sys_execve+0x33/0x40
       do_syscall_64+0x6b/0xc0
       ? do_syscall_64+0x11/0xc0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f420dc1db37
      Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffd4e8b4e38 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f420dc1db37
      RDX: 0000564338d1e740 RSI: 0000564338d32d50 RDI: 0000564338d28f00
      RBP: 0000564338d28f00 R08: 0000564338d32d50 R09: 0000000000000020
      R10: 00000000000001b6 R11: 0000000000000246 R12: 0000564338d28f00
      R13: 0000564338d32d50 R14: 0000564338d1e740 R15: 0000564338d28c60
      ---[ end trace 83ab3e8e16275e49 ]---
      ```
      
      Link: https://lkml.kernel.org/r/20210831043723.13481-2-robbat2@gentoo.orgSigned-off-by: default avatarRobin H. Johnson <robbat2@gentoo.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      e531e90b
    • Robin H. Johnson's avatar
      tracing: Show size of requested perf buffer · a90afe8d
      Robin H. Johnson authored
      If the perf buffer isn't large enough, provide a hint about how large it
      needs to be for whatever is running.
      
      Link: https://lkml.kernel.org/r/20210831043723.13481-1-robbat2@gentoo.orgSigned-off-by: default avatarRobin H. Johnson <robbat2@gentoo.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      a90afe8d
    • Steven Rostedt (VMware)'s avatar
      bootconfig: Initialize ret in xbc_parse_tree() · 39d9c1c1
      Steven Rostedt (VMware) authored
      The do while loop continues while ret is zero, but ret is never
      initialized. The check for ret in the loop at the while should always be
      initialized, but if an empty string were to be passed in, q would be NULL
      and p would be '\0', and it would break out of the loop without ever
      setting ret.
      
      Set ret to zero, and then xbc_verify_tree() would be called and catch that
      it is an empty tree and report the proper error.
      
      Link: https://lkml.kernel.org/r/20211027105753.6ab9da5f@gandalf.local.home
      
      Fixes: bdac5c2b ("bootconfig: Allocate xbc_data inside xbc_init()")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Reported-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      39d9c1c1
    • 王贇's avatar
      ftrace: do CPU checking after preemption disabled · d33cc657
      王贇 authored
      With CONFIG_DEBUG_PREEMPT we observed reports like:
      
        BUG: using smp_processor_id() in preemptible
        caller is perf_ftrace_function_call+0x6f/0x2e0
        CPU: 1 PID: 680 Comm: a.out Not tainted
        Call Trace:
         <TASK>
         dump_stack_lvl+0x8d/0xcf
         check_preemption_disabled+0x104/0x110
         ? optimize_nops.isra.7+0x230/0x230
         ? text_poke_bp_batch+0x9f/0x310
         perf_ftrace_function_call+0x6f/0x2e0
         ...
         __text_poke+0x5/0x620
         text_poke_bp_batch+0x9f/0x310
      
      This telling us the CPU could be changed after task is preempted, and
      the checking on CPU before preemption will be invalid.
      
      Since now ftrace_test_recursion_trylock() will help to disable the
      preemption, this patch just do the checking after trylock() to address
      the issue.
      
      Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Jisheng Zhang <jszhang@kernel.org>
      Reported-by: default avatarAbaci <abaci@linux.alibaba.com>
      Signed-off-by: default avatarMichael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      d33cc657
    • 王贇's avatar
      ftrace: disable preemption when recursion locked · ce5e4803
      王贇 authored
      As the documentation explained, ftrace_test_recursion_trylock()
      and ftrace_test_recursion_unlock() were supposed to disable and
      enable preemption properly, however currently this work is done
      outside of the function, which could be missing by mistake.
      
      And since the internal using of trace_test_and_set_recursion()
      and trace_clear_recursion() also require preemption disabled, we
      can just merge the logical.
      
      This patch will make sure the preemption has been disabled when
      trace_test_and_set_recursion() return bit >= 0, and
      trace_clear_recursion() will enable the preemption if previously
      enabled.
      
      Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com
      
      CC: Petr Mladek <pmladek@suse.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Jisheng Zhang <jszhang@kernel.org>
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Miroslav Benes <mbenes@suse.cz>
      Reported-by: default avatarAbaci <abaci@linux.alibaba.com>
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarMichael Wang <yun.wang@linux.alibaba.com>
      [ Removed extra line in comment - SDR ]
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      ce5e4803
    • Kalesh Singh's avatar
      tracing/histogram: Document expression arithmetic and constants · 2d2f6d4b
      Kalesh Singh authored
      Histogram expressions now support division, and multiplication in
      addition to the already supported subtraction and addition operators.
      
      Numeric constants can also be used in a hist trigger expressions
      or assigned to a variable and used by refernce in an expression.
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-9-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Reviewed-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      2d2f6d4b
    • Kalesh Singh's avatar
      tracing/histogram: Optimize division by a power of 2 · 722eddaa
      Kalesh Singh authored
      The division is a slow operation. If the divisor is a power of 2, use a
      shift instead.
      
      Results were obtained using Android's version of perf (simpleperf[1]) as
      described below:
      
      1. hist_field_div() is modified to call 2 test functions:
         test_hist_field_div_[not]_optimized(); passing them the
         same args. Use noinline and volatile to ensure these are
         not optimized out by the compiler.
      2. Create a hist event trigger that uses division:
            events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>'
               >> trigger
            events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x'
               >> trigger
      3. Run Android's lmkd_test[2] to generate rss_stat events, and
         record CPU samples with Android's simpleperf:
            simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g
               -f 2000 -o perf.data
      
      == Results ==
      
      Divisor is a power of 2 (divisor == 32):
      
         test_hist_field_div_not_optimized  | 8,717,091 cpu-cycles
         test_hist_field_div_optimized      | 1,643,137 cpu-cycles
      
      If the divisor is a power of 2, the optimized version is ~5.3x faster.
      
      Divisor is not a power of 2 (divisor == 33):
      
         test_hist_field_div_not_optimized  | 4,444,324 cpu-cycles
         test_hist_field_div_optimized      | 5,497,958 cpu-cycles
      
      If the divisor is not a power of 2, as expected, the optimized version is
      slightly slower (~24% slower).
      
      [1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md
      [2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Suggested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      722eddaa
    • Kalesh Singh's avatar
      tracing/histogram: Covert expr to const if both operands are constants · f47716b7
      Kalesh Singh authored
      If both operands of a hist trigger expression are constants, convert the
      expression to a constant. This optimization avoids having to perform the
      same calculation multiple times and also saves on memory since the
      merged constants are represented by a single struct hist_field instead
      or multiple.
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Suggested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      f47716b7
    • Kalesh Singh's avatar
      tracing/histogram: Simplify handling of .sym-offset in expressions · c5eac6ee
      Kalesh Singh authored
      The '-' in .sym-offset can confuse the hist trigger arithmetic
      expression parsing. Simplify the handling of this by replacing the
      'sym-offset' with 'symXoffset'. This allows us to correctly evaluate
      expressions where the user may have inadvertently added a .sym-offset
      modifier to one of the operands in an expression, instead of bailing
      out. In this case the .sym-offset has no effect on the evaluation of the
      expression. The only valid use of the .sym-offset is as a hist key
      modifier.
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Suggested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      c5eac6ee
    • Kalesh Singh's avatar
      tracing: Fix operator precedence for hist triggers expression · 9710b2f3
      Kalesh Singh authored
      The current histogram expression evaluation logic evaluates the
      expression from right to left. This can lead to incorrect results
      if the operations are not associative (as is the case for subtraction
      and, the now added, division operators).
      	e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2
      	     64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2
      
      Division and multiplication are currently limited to single operation
      expression due to operator precedence support not yet implemented.
      
      Rework the expression parsing to support the correct evaluation of
      expressions containing operators of different precedences; and fix
      the associativity error by evaluating expressions with operators of
      the same precedence from left to right.
      
      Examples:
              (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \
                        >> event/trigger
              (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger
              (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger
              (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Reviewed-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      9710b2f3
    • Kalesh Singh's avatar
      tracing: Add division and multiplication support for hist triggers · bcef0441
      Kalesh Singh authored
      Adds basic support for division and multiplication operations for
      hist trigger variable expressions.
      
      For simplicity this patch only supports, division and multiplication
      for a single operation expression (e.g. x=$a/$b), as currently
      expressions are always evaluated right to left. This can lead to some
      incorrect results:
      
      	e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger
      
      	     8-4-2 should evaluate to 2 i.e. (8-4)-2
      	     but currently x evaluate to  6 i.e. 8-(4-2)
      
      Multiplication and division in sub-expressions will work correctly, once
      correct operator precedence support is added (See next patch in this
      series).
      
      For the undefined case of division by 0, the histogram expression
      evaluates to (u64)(-1). Since this cannot be detected when the
      expression is created, it is the responsibility of the user to be
      aware and account for this possibility.
      
      Examples:
      	echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \
                         >> event/trigger
      
      	echo 'hist:keys=common_pid:y=5*$b' \
                         >> event/trigger
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      bcef0441
    • Kalesh Singh's avatar
      tracing: Add support for creating hist trigger variables from literal · 52cfb373
      Kalesh Singh authored
      Currently hist trigger expressions don't support the use of numeric
      literals:
      	e.g. echo 'hist:keys=common_pid:x=$y-1234'
      		--> is not valid expression syntax
      
      Having the ability to use numeric constants in hist triggers supports
      a wider range of expressions for creating variables.
      
      Add support for creating trace event histogram variables from numeric
      literals.
      
      	e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger
      
      A negative numeric constant is created, using unary minus operator
      (parentheses are required).
      
      	e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger
      
      Constants can be used with division/multiplication (added in the
      next patch in this series) to implement granularity filters for frequent
      trace events. For instance we can limit emitting the rss_stat
      trace event to when there is a 512KB cross over in the rss size:
      
        # Create a synthetic event to monitor instead of the high frequency
        # rss_stat event
        echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
      	int member; long size' >> tracing/synthetic_events
      
        # Create a hist trigger that emits the synthetic rss_stat_throttled
        # event only when the rss size crosses a 512KB boundary.
        echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket)
            .rss_stat_throttled(mm_id,curr,member,size)'
              >> events/kmem/rss_stat/trigger
      
      A use case for using constants with addition/subtraction is not yet
      known, but for completeness the use of constants are supported for all
      operators.
      
      Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.comSigned-off-by: default avatarKalesh Singh <kaleshsingh@google.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      52cfb373
    • Masami Hiramatsu's avatar
      selftests/ftrace: Stop tracing while reading the trace file by default · 25b95138
      Masami Hiramatsu authored
      Stop tracing while reading the trace file by default, to prevent
      the test results while checking it and to avoid taking a long time
      to check the result.
      If there is any testcase which wants to test the tracing while reading
      the trace file, please override this setting inside the test case.
      
      This also recovers the pause-on-trace when clean it up.
      
      Link: https://lkml.kernel.org/r/163529053143.690749.15365238954175942026.stgit@devnote2Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      25b95138
  7. 26 Oct, 2021 10 commits