  1. 03 Aug, 2020 5 commits
    • kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler · 0cb2f137
      Muchun Song authored
      We found a case of kernel panic on our server. The stack trace is as
      follows (some irrelevant information omitted):
      
        BUG: kernel NULL pointer dereference, address: 0000000000000080
        RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
        RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
        RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
        RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
        R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <IRQ>
         ftrace_ops_assist_func+0x56/0xe0
         ftrace_call+0x5/0x34
         tcpa_statistic_send+0x5/0x130 [ttcp_engine]
      
      The tcpa_statistic_send is the function being kprobed. After analysis,
      the root cause is that the fourth parameter regs of kprobe_ftrace_handler
      is NULL. Why is regs NULL? We use the crash tool to analyze the kdump.
      
        crash> dis tcpa_statistic_send -r
               <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>
      
      The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller.
      So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler
      is NULL. In theory, we should call the ftrace_regs_caller instead of the
      ftrace_caller. After in-depth analysis, we found a reproducible path.
      
        Write a simple kernel module which starts a periodic timer. The
        timer's handler is named 'kprobe_test_timer_handler'. The module
        name is kprobe_test.ko.
      
        1) insmod kprobe_test.ko
        2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
        3) echo 0 > /proc/sys/kernel/ftrace_enabled
        4) rmmod kprobe_test
        5) stop the kprobe started in step 2)
        6) insmod kprobe_test.ko
        7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
      
      In step 4), we mark the kprobe as GONE but do not disarm it. Step 5)
      also does not disarm the kprobe when unregistering it, so the ip is
      never removed from the filter. In this case, when the module loads
      again in step 6), we replace the code with ftrace_caller via
      ftrace_module_enable(). When we register the kprobe again, we do not
      replace ftrace_caller with ftrace_regs_caller because ftrace was
      disabled in step 3). So step 7) triggers the kernel panic. Fix
      this problem by disarming the kprobe when the module is going away.
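
The stale filter entry described above can be pictured with a toy userspace model. This is only a sketch: `toy_kprobe` and every `toy_*` helper are hypothetical names invented here, and the model tracks nothing but the GONE flag and whether the probed address is still in the ftrace filter.

```c
#include <stdbool.h>

/* Toy state for one ftrace-based kprobe; not the kernel's struct kprobe. */
struct toy_kprobe {
	bool gone;          /* owning module was unloaded (GONE) */
	bool ip_in_filter;  /* probed address still in the ftrace filter */
};

/* Arming adds the probed address to the ftrace filter. */
static void toy_arm(struct toy_kprobe *p)
{
	p->gone = false;
	p->ip_in_filter = true;
}

/* Step 4: module unload. With the fix, the probe is disarmed here. */
static void toy_module_going(struct toy_kprobe *p, bool disarm_on_going)
{
	p->gone = true;
	if (disarm_on_going)
		p->ip_in_filter = false;
}

/* Step 5: unregistering a probe that is already GONE skips the disarm. */
static void toy_unregister(struct toy_kprobe *p)
{
	if (!p->gone)
		p->ip_in_filter = false;
}

/* Returns true if a stale filter entry survives steps 4 and 5. */
static bool toy_stale_entry_left(bool disarm_on_going)
{
	struct toy_kprobe p;

	toy_arm(&p);
	toy_module_going(&p, disarm_on_going);
	toy_unregister(&p);
	return p.ip_in_filter;
}
```

Without the disarm at unload time the entry survives both steps, which is the precondition for the panic in step 7).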
      
      Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com
      
      Cc: stable@vger.kernel.org
      Fixes: ae6aa16f ("kprobes: introduce ftrace based optimization")
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
      Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ftrace: Fix ftrace_trace_task return value · c58b6b03
      Josef Bacik authored
      I was attempting to use pid filtering with function_graph, but it wasn't
      allowing anything to make it through.  Turns out ftrace_trace_task
      returns false if ftrace_ignore_pid is not-empty, which isn't correct
      anymore.  We're now setting it to FTRACE_PID_IGNORE if we need to ignore
      that pid, otherwise it's set to the pid (which is weird considering the
      name) or to FTRACE_PID_TRACE.  Fix the check to check for !=
      FTRACE_PID_IGNORE.  With this we can now use function_graph with pid
      filtering.
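
The change in the check can be sketched in userspace as follows. The numeric values of the two sentinels are assumptions made for this example, and `trace_task_old` / `trace_task_fixed` are illustrative names, not the kernel functions.

```c
#include <stdbool.h>

/* Sentinel values; the numeric values are assumptions for this sketch. */
enum {
	FTRACE_PID_IGNORE = -1,  /* do not trace this task */
	FTRACE_PID_TRACE  = -2,  /* trace this task, no specific pid stored */
};

/* Old check: any non-zero value, including a pid to trace, meant "skip". */
static bool trace_task_old(int ftrace_ignore_pid)
{
	return !ftrace_ignore_pid;
}

/* Fixed check: only the explicit IGNORE sentinel suppresses tracing. */
static bool trace_task_fixed(int ftrace_ignore_pid)
{
	return ftrace_ignore_pid != FTRACE_PID_IGNORE;
}
```

With the old check a stored pid such as 1234 is rejected, which matches the symptom of nothing making it through the filter.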
      
      Link: https://lkml.kernel.org/r/20200725005048.1790-1-josef@toxicpanda.com
      
      Fixes: 717e3f5e ("ftrace: Make function trace pid filtering a bit more exact")
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracepoint: Use __used attribute definitions from compiler_attributes.h · 1c39d761
      Nick Desaulniers authored
      Just a small cleanup while I was touching this header.
      compiler_attributes.h does feature detection of these __attributes__(())
      and provides more concise ways to invoke them.
      
      Link: https://lkml.kernel.org/r/20200730224555.2142154-3-ndesaulniers@google.com

      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracepoint: Mark __tracepoint_string's __used · f3751ad0
      Nick Desaulniers authored
      __tracepoint_string's have their string data stored in .rodata, and an
      address to that data stored in the "__tracepoint_str" section. Functions
      that refer to those strings refer to the symbol of the address. Compiler
      optimization can replace those address references with references
      directly to the string data. If the address doesn't appear to have other
      uses, then it appears dead to the compiler and is removed. This can
      break the /tracing/printk_formats sysfs node which iterates the
      addresses stored in the "__tracepoint_str" section.
      
      Like other strings stored in custom sections in this header, mark these
      __used to inform the compiler that there are other non-obvious users of
      the address, so they should still be emitted.
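
A minimal userspace sketch of the same pattern, assuming GCC or Clang on an ELF target with GNU ld style `__start_`/`__stop_` section symbols. The section name `demo_str` and all `demo_*` names are hypothetical; the kernel's section is "__tracepoint_str".

```c
#include <stddef.h>

/* Hypothetical section name; the kernel uses "__tracepoint_str". */
#define DEMO_SECTION "demo_str"

/*
 * Each pointer lives in the custom section. Without "used", a compiler
 * may drop a pointer that has no visible references in this file.
 */
#define DEMO_STRING(name, str) \
	static const char *name __attribute__((used, section(DEMO_SECTION))) = str

DEMO_STRING(demo_a, "first event");
DEMO_STRING(demo_b, "second event");

/* The linker provides these bounds for sections named like C identifiers. */
extern const char *__start_demo_str[];
extern const char *__stop_demo_str[];

/* Count the entries the way a section-walking reader would see them. */
static size_t demo_count_strings(void)
{
	return (size_t)(__stop_demo_str - __start_demo_str);
}
```

Nothing in the file refers to `demo_a` or `demo_b` by name; the only consumer is the section walk, which is the "non-obvious user" the attribute is meant to protect.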
      
      Link: https://lkml.kernel.org/r/20200730224555.2142154-2-ndesaulniers@google.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: 102c9323 ("tracing: Add __tracepoint_string() to export string pointers")
      Reported-by: Tim Murray <timmurray@google.com>
      Reported-by: Simon MacMullen <simonmacm@google.com>
      Suggested-by: Greg Hackmann <ghackmann@google.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • trace : Have tracing buffer info use kvzalloc instead of kzalloc · 0f69dae4
      Zhaoyang Huang authored
      High-order memory allocations within trace can trigger the OOM killer;
      use kvzalloc instead.
      
      Below is the call stack we ran into on an Android system. The scenario
      happens when traced_probes is woken up to fetch a large quantity of
      trace data, even though free memory is higher than watermark_low.
      
      traced_probes invoked oom-killer: gfp_mask=0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),  order=2, oom_score_adj=-1
      
      traced_probes cpuset=system-background mems_allowed=0
      CPU: 3 PID: 588 Comm: traced_probes Tainted: G        W  O    4.14.181 #1
      Hardware name: Generic DT based system
      (unwind_backtrace) from [<c010d824>] (show_stack+0x20/0x24)
      (show_stack) from [<c0b2e174>] (dump_stack+0xa8/0xec)
      (dump_stack) from [<c027d584>] (dump_header+0x9c/0x220)
      (dump_header) from [<c027cfe4>] (oom_kill_process+0xc0/0x5c4)
      (oom_kill_process) from [<c027cb94>] (out_of_memory+0x220/0x310)
      (out_of_memory) from [<c02816bc>] (__alloc_pages_nodemask+0xff8/0x13a4)
      (__alloc_pages_nodemask) from [<c02a6a1c>] (kmalloc_order+0x30/0x48)
      (kmalloc_order) from [<c02a6a64>] (kmalloc_order_trace+0x30/0x118)
      (kmalloc_order_trace) from [<c0223d7c>] (tracing_buffers_open+0x50/0xfc)
      (tracing_buffers_open) from [<c02e6f58>] (do_dentry_open+0x278/0x34c)
      (do_dentry_open) from [<c02e70d0>] (vfs_open+0x50/0x70)
      (vfs_open) from [<c02f7c24>] (path_openat+0x5fc/0x169c)
      (path_openat) from [<c02f75c4>] (do_filp_open+0x94/0xf8)
      (do_filp_open) from [<c02e7650>] (do_sys_open+0x168/0x26c)
      (do_sys_open) from [<c02e77bc>] (SyS_openat+0x34/0x38)
      (SyS_openat) from [<c0108bc0>] (ret_fast_syscall+0x0/0x28)
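
The `order=2` in the report means the page allocator was asked for 2^2 = 4 physically contiguous pages. The helpers below sketch that arithmetic; the 4 KiB page size used in the usage note is an assumption, since the log does not state it, and both function names are illustrative.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size. */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Number of physically contiguous pages requested at a given order. */
static size_t pages_for_order(unsigned int order)
{
	return (size_t)1 << order;
}
```

Assuming 4 KiB pages, any request between 8 KiB + 1 byte and 16 KiB lands at order 2. kvzalloc sidesteps the contiguity requirement by falling back to virtually contiguous memory when such a block cannot be found.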
      
      Link: https://lkml.kernel.org/r/1596155265-32365-1-git-send-email-zhaoyang.huang@unisoc.com
      
      Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  2. 31 Jul, 2020 2 commits
  3. 30 Jul, 2020 3 commits
  4. 09 Jul, 2020 2 commits
  5. 02 Jul, 2020 4 commits
    • ring-buffer: Do not trigger a WARN if clock going backwards is detected · 29ce2451
      Steven Rostedt (VMware) authored
      After tweaking the ring buffer to be a bit faster, a warning is triggering
      on one of my machines, and causing my tests to fail. This warning is caused
      when the delta (current time stamp minus previous time stamp), is larger
      than the max time held by the ring buffer (59 bits).
      
      If the clock were to go backwards slightly, this would then easily trigger
      this warning. On the machine where it triggered, the clock did go backwards
      by around 450 nanoseconds, and this happened after a recalibration of the
      TSC clock. Now that the ring buffer is faster, it detects this, and because
      the delta that is used is larger than the max, the warning is triggered and
      my test fails.
      
      To handle the clock going backwards, look at the saved before and after time
      stamps. If they are the same, it means that the current event did not
      interrupt another event, and that those time stamps are of a previous event
      that was recorded. If the max delta is triggered, look at those time stamps,
      make sure they are the same, then use them to compare with the current
      timestamp. If the current timestamp is less than the before/after time
      stamps, then that means the clock being used went backward.
      
      Print out a message that this has happened, but do not warn about it (and
      only print the message once).
      
      Still do the warning if the delta is indeed larger than what can be used.
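
The decision described above can be sketched as a small classifier. This is an illustration under the stated 59-bit limit, not the kernel code; `check_delta` and the `DELTA_*` names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DELTA (1ULL << 59)  /* largest delta the ring buffer can hold */

enum delta_verdict {
	DELTA_OK,              /* delta fits, nothing to report */
	DELTA_CLOCK_BACKWARDS, /* print a one-time message, no WARN */
	DELTA_TOO_BIG,         /* still warn: delta really is too large */
};

/* Classify a delta using the saved before/after stamps and the current ts. */
static enum delta_verdict check_delta(uint64_t delta, uint64_t before,
				      uint64_t after, uint64_t ts)
{
	if (delta <= MAX_DELTA)
		return DELTA_OK;
	/* Equal stamps: no event was interrupted, they describe a past event. */
	if (before == after && ts < before)
		return DELTA_CLOCK_BACKWARDS;
	return DELTA_TOO_BIG;
}
```

A clock that steps back by 450 nanoseconds makes the unsigned subtraction wrap to a value far above the limit, so the first test fails and the stamp comparison identifies the cause.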
      
      Also remove the unneeded KERN_WARNING from the WARN_ONCE() print.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Call trace_clock_local() directly for RETPOLINE kernels · bbeba3e5
      Steven Rostedt (VMware) authored
      After doing some benchmarks and examining the code, I found that the ring
      buffer clock calls were quite expensive, and noticed that it uses
      retpolines. This is because the ring buffer clock is programmable, and can
      be set. But in most cases it simply uses the fastest ns unit clock which is
      the trace_clock_local(). For RETPOLINE builds, checking if the ring buffer
      clock is set to trace_clock_local() and then calling it directly has brought
      the time of an event on my i7 box from an average of 93 nanoseconds an event
      down to 83 nanoseconds an event, and the minimum time from 81 nanoseconds to
      68 nanoseconds!
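
The idea of checking the function pointer and then calling the common clock directly can be sketched in userspace like this. The `demo_*` functions and their return values are stand-ins invented for the example; only the pointer-compare-then-direct-call shape is taken from the description above.

```c
#include <stdint.h>

typedef uint64_t (*clock_fn)(void);

/* Stand-ins for the kernel clocks; return values are made up. */
static uint64_t demo_clock_local(void)  { return 1000; }
static uint64_t demo_clock_global(void) { return 2000; }

/*
 * Compare the pointer against the common clock and call it by name,
 * so the compiler can emit a direct call instead of an indirect one.
 */
static uint64_t demo_time_stamp(clock_fn clock)
{
	if (clock == demo_clock_local)
		return demo_clock_local();  /* direct call, no retpoline */
	return clock();                     /* indirect call */
}
```

The result is identical on both branches; only the kind of call instruction differs, which is where the saving on retpoline builds comes from.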
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Move the add_timestamp into its own function · 74e87937
      Steven Rostedt (VMware) authored
      Make a helper function rb_add_timestamp() that moves the adding of the
      extended time stamps into its own function. Also, remove the noinline and
      inline for the functions it calls, as recent benchmarks suggest they do not
      make a difference (just let gcc decide).
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Consolidate add_timestamp to remove some branches · 58fbc3c6
      Steven Rostedt (VMware) authored
      Reorganize the logic a little to handle adding the absolute time stamp,
      extended and forced time stamps, in such a way as to remove a branch or two.
      This is just a micro optimization.
      
      Also add before and after time stamps to the rb_event_info structure to
      display those values in the rb_check_timestamps() code, if something were to
      go wrong.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  6. 30 Jun, 2020 10 commits
    • ring-buffer: Mark the !tail (crossing a page) as unlikely · 75b21c6d
      Steven Rostedt (VMware) authored
      It is the uncommon case where an event crosses a sub buffer boundary (page),
      so mark that check at the end of reserving an event as unlikely.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU · b23d7a5f
      Nicholas Piggin authored
      On a 144 thread system, `perf ftrace` takes about 20 seconds to start
      up, due to calling synchronize_rcu() for each CPU.
      
        cat /proc/108560/stack
          0xc0003e7eb336f470
          __switch_to+0x2e0/0x480
          __wait_rcu_gp+0x20c/0x220
          synchronize_rcu+0x9c/0xc0
          ring_buffer_reset_cpu+0x88/0x2e0
          tracing_reset_online_cpus+0x84/0xe0
          tracing_open+0x1d4/0x1f0
      
      On a system with 10x more threads, it starts to become an annoyance.
      
      Batch these up so we disable all the per-cpu buffers first, then
      synchronize_rcu() once, then reset each of the buffers. This brings
      the time down to about 0.5s.
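
The difference between the two orderings can be sketched by counting simulated grace periods. Everything here is a stand-in: `demo_synchronize_rcu` only increments a counter, and the disable and reset steps are left as comments.

```c
static unsigned int grace_periods;  /* simulated synchronize_rcu() calls */

static void demo_synchronize_rcu(void)
{
	grace_periods++;
}

/* Old scheme: disable, wait, reset, one CPU at a time. */
static unsigned int reset_per_cpu(unsigned int nr_cpus)
{
	grace_periods = 0;
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		/* disable buffer[cpu] */
		demo_synchronize_rcu();
		/* reset buffer[cpu] */
	}
	return grace_periods;
}

/* Batched scheme: disable all, wait once, then reset all. */
static unsigned int reset_batched(unsigned int nr_cpus)
{
	grace_periods = 0;
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		/* disable buffer[cpu] */
	}
	demo_synchronize_rcu();
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		/* reset buffer[cpu] */
	}
	return grace_periods;
}
```

For 144 CPUs the first form waits for 144 grace periods and the second for one, which is the source of the drop from about 20 seconds to about 0.5 seconds reported above.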
      
      Link: https://lkml.kernel.org/r/20200625053403.2386972-1-npiggin@gmail.com
      
      Tested-by: Anton Blanchard <anton@ozlabs.org>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit · 10464b4a
      Steven Rostedt (VMware) authored
      A discussion of the new time algorithm, which lets nested events still
      have proper time keeping, noted that it requires local64_t atomic
      operations. Mathieu was concerned about the performance this would have
      on 32 bit machines, as in most cases, atomic 64 bit operations on them
      can be expensive.
      
      As the ring buffer's timing needs do not require full features of local64_t,
      a wrapper is made to implement a new rb_time_t operation that uses two longs
      on 32 bit machines but still uses the local64_t operations on 64 bit
      machines. There's a switch that can be made in the file to force 64 bit to
      use the 32 bit version just for testing purposes.
      
      Not all reads need to succeed: a read may fail if it happens while the
      stamp being read is in the process of being updated. The requirement is
      that all reads must succeed that were done by an interrupting event (where
      this event was interrupted by another event that did the write), or if the
      event itself did the write first. That is: rb_time_set(t, x) followed by
      rb_time_read(t) will always succeed (even if it gets interrupted by another
      event that writes to t). The result of the read will be either the previous
      set, or a set performed by an interrupting event.
      
      If the read is done by an event that interrupted another event that was in
      the process of setting the time stamp, and no other event came along to
      write to that time stamp, it will fail and the rb_time_read() will return
      that it failed (the value to read will be undefined).
      
      A set will always write to the time stamp and return with a valid time
      stamp, such that any read after it will be valid.
      
      A cmpxchg may fail if it interrupted an event that was in the process of
      updating the time stamp just like the reads do. Other than that, it will act
      like a normal cmpxchg.
      
      The way this works is that the rb_time_t is made of three fields. A cnt,
      that gets updated atomically every time a modification is made. A top that
      represents the most significant 30 bits of the time, and a bottom to
      represent the least significant 30 bits of the time. Notice, that the time
      value is only 60 bits long (where the ring buffer only uses 59 bits, which
      gives us 18 years of nanoseconds!).
      
      The top two bits of both the top and bottom is a 2 bit counter that gets set
      by the value of the least two significant bits of the cnt. A read of the top
      and the bottom where both the top and bottom have the same most significant
      top 2 bits, are considered a match and a valid 60 bit number can be created
      from it. If they do not match, then the number is considered invalid, and
      this must only happen if an event interrupted another event in the midst of
      updating the time stamp.
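
The tagging scheme can be sketched with plain 32-bit integers. This is a single-context illustration only: there are no atomics, and `demo_rb_time` and the `demo_*` helpers are hypothetical names that follow the field description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIME_SHIFT    30
#define TIME_VAL_MASK ((1u << TIME_SHIFT) - 1)

/* Illustrative layout; field names follow the description above. */
struct demo_rb_time {
	uint32_t cnt;     /* bumped on every modification */
	uint32_t top;     /* 2-bit tag + most significant 30 bits */
	uint32_t bottom;  /* 2-bit tag + least significant 30 bits */
};

/* Place the low two bits of cnt above a 30-bit value. */
static uint32_t demo_tag(uint32_t val, uint32_t cnt)
{
	return (val & TIME_VAL_MASK) | ((cnt & 3u) << TIME_SHIFT);
}

/* Single-context store of a 60-bit time value. */
static void demo_time_set(struct demo_rb_time *t, uint64_t val)
{
	t->cnt++;
	t->top = demo_tag((uint32_t)(val >> TIME_SHIFT), t->cnt);
	t->bottom = demo_tag((uint32_t)val, t->cnt);
}

/* Returns false when the two tags disagree (a torn update). */
static bool demo_time_read(const struct demo_rb_time *t, uint64_t *val)
{
	if ((t->top >> TIME_SHIFT) != (t->bottom >> TIME_SHIFT))
		return false;
	*val = ((uint64_t)(t->top & TIME_VAL_MASK) << TIME_SHIFT) |
	       (t->bottom & TIME_VAL_MASK);
	return true;
}
```

A reader that sees `top` from one update and `bottom` from another finds different tags and reports failure instead of returning a mixed value.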
      
      This is only used for 32 bit machines as 64 bit machines can get better
      performance out of the local64_t. This has been tested heavily by forcing 64
      bit to use this logic.
      
      Link: https://lore.kernel.org/r/20200625225345.18cf5881@oasis.local.home
      Link: http://lkml.kernel.org/r/20200629025259.309232719@goodmis.org
      
      Inspired-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Incorporate absolute timestamp into add_timestamp logic · 7c4b4a51
      Steven Rostedt (VMware) authored
      Instead of calling out the absolute test each time to check if the
      ring buffer wants absolute time stamps for all its recording, incorporate it
      with the add_timestamp field and turn it into flags for faster processing
      between wanting an absolute tag and needing to force one.
      
      Link: http://lkml.kernel.org/r/20200629025259.154892368@goodmis.org
      
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Have nested events still record running time stamp · a389d86f
      Steven Rostedt (VMware) authored
      Up until now, if an event is interrupted while it is recorded by an
      interrupt, and that interrupt records events, the time of those events will
      all be the same. This is because events only record the delta of the time
      since the previous event (or beginning of a page), and to handle updating
      the time keeping for that of nested events is extremely racy. After years of
      thinking about this and several failed attempts, I finally have a solution
      to solve this puzzle.
      
      The problem is that you need to atomically calculate the delta and then
      update the time stamp you made the delta from, as well as then record it
      into the buffer, all this while at any time an interrupt can come in and
      do the same thing. This is easy to solve with heavy weight atomics, but that
      would be detrimental to the performance of the ring buffer. The current
      state of affairs sacrificed the time deltas for nested events for
      performance.
      
      The reason for previous failed attempts at solving this puzzle was because I
      was trying to completely avoid slow atomic operations like cmpxchg. I finally
      came to the conclusion that always avoiding cmpxchg is not possible, which is why
      those previous attempts always failed. But it is possible to pick one path
      (the most common case) and avoid cmpxchg in that path, which is the "fast
      path". The most common case is that an event will not be interrupted and
      have other events added into it. An event can detect if it has
      interrupted another event, and for these cases we can make it the slow
      path and use the heavy operations like cmpxchg.
      
      One more player was added to the game that made this possible, and that is
      the "absolute timestamp" (by Tom Zanussi) that allows us to inject a full 59
      bit time stamp. (Of course this breaks if a machine is running for more than
      18 years without a reboot!).
      
      There's barrier() placements around for being paranoid, even when they
      are not needed because of other atomic functions near by. But those
      should not hurt, as if they are not needed, they basically become a nop.
      
      Note, this also makes the race window much smaller, which means there
      are fewer slow paths to slow down the performance.
      
      The basic idea is that there's two main paths taken.
      
       1) Not being interrupted between time stamps and reserving buffer space.
          In this case, the time stamps taken are true to the location in the
          buffer.
      
       2) Was interrupted by another path between taking time stamps and reserving
          buffer space.
      
      The objective is to know what the delta is from the last reserved location
      in the buffer.
      
      As it is possible to detect if an event is interrupting another event before
      reserving data, space is added to the length to be reserved to inject a full
      time stamp along with the event being reserved.
      
      When an event is not interrupted, the write stamp is always the time of the
      last event written to the buffer.
      
      In path 1, there's two sub paths we care about:
      
       a) The event did not interrupt another event.
       b) The event interrupted another event.
      
      In case a, as the write stamp was read and known to be correct, the delta
      between the current time stamp and the write stamp is the delta between the
      current event and the previously recorded event.
      
      In case b, extra space was reserved to just put the full time stamp into the
      buffer. Which is done, as stated, in this path the time stamp taken is known
      to match the location in the buffer.
      
      In path 2, there's also two sub paths we care about:
      
       a) The event was not interrupted by another event since it reserved space
          on the buffer and re-reading the write stamp.
       b) The event was interrupted by another event.
      
      In case a, the write stamp is that of the last event that interrupted this
      event between taking the time stamps and reserving. As no event came in
      after re-reading the write stamp, the write stamp is known to be the time of
      the event directly before this event, and the delta can be computed between
      the new time stamp and the write stamp.
      
      In case b, one or more events came in between reserving the event and
      re-reading the write stamp. Since this event's buffer reservation is between
      other events at this path, there's no way to know what the delta is. But
      because an event interrupted this event after it started, it's fine to just
      give a zero delta, and take the same time stamp as the events that happened
      within the event being recorded.
      
      Here's the implementation of the design of this solution:
      
       All this is per cpu, and only needs to worry about nested events (not
       parallel events).
      
      The players:
      
       write_tail: The index in the buffer where new events can be written to.
           It is incremented via local_add() to reserve space for a new event.
      
       before_stamp: A time stamp set by all events before reserving space.
      
       write_stamp: A time stamp updated by events after it has successfully
           reserved space.
      
      	/* Save the current position of write */
       [A]	w = local_read(write_tail);
      	barrier();
      	/* Read both before and write stamps before touching anything */
      	before = local_read(before_stamp);
      	after = local_read(write_stamp);
      	barrier();
      
      	/*
      	 * If before and after are the same, then this event is not
      	 * interrupting a time update. If it is, then reserve space for adding
      	 * a full time stamp (this can turn into a time extend which is
      	 * just an extended time delta but fill up the extra space).
      	 */
      	if (after != before)
      		abs = true;
      
      	ts = clock();
      
      	/* Now update the before_stamp (everyone does this!) */
       [B]	local_set(before_stamp, ts);
      
      	/* Now reserve space on the buffer */
       [C]	write = local_add_return(len, write_tail);
      
      	/* Set tail to be where this event's data is */
      	tail = write - len;
      
       	if (w == tail) {
      
      		/* Nothing interrupted this between A and C */
       [D]		local_set(write_stamp, ts);
      		barrier();
       [E]		save_before = local_read(before_stamp);
      
       		if (!abs) {
      			/* This did not interrupt a time update */
      			delta = ts - after;
      		} else {
      			delta = ts; /* The full time stamp will be in use */
      		}
      		if (ts != save_before) {
      			/* slow path - Was interrupted between C and E */
      			/* The update to write_stamp could have overwritten the update to
      			 * it by the interrupting event, but before and after should be
      			 * the same for all completed top events */
      			after = local_read(write_stamp);
      			if (save_before > after)
      				local_cmpxchg(write_stamp, after, save_before);
      		}
      	} else {
      		/* slow path - Interrupted between A and C */
      
      		after = local_read(write_stamp);
      		temp_ts = clock();
      		barrier();
       [F]		if (write == local_read(write_tail) && after < temp_ts) {
      			/* This was not interrupted since C and F
      			 * The last write_stamp is still valid for the previous event
      			 * in the buffer. */
      			delta = temp_ts - after;
      			/* OK to keep this new time stamp */
      			ts = temp_ts;
      		} else {
      			/* Interrupted between C and F
      			 * Well, there's no use to try to know what the time stamp
      			 * is for the previous event. Just set delta to zero and
      			 * be the same time as that event that interrupted us before
      			 * the reservation of the buffer. */
      
      			delta = 0;
      		}
      		/* No need to use full timestamps here */
      		abs = 0;
      	}
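
The uninterrupted part of the pseudocode (steps [A] through [D]) can be turned into a small single-threaded model. It is a sketch only: there are no interrupts and no atomics, `demo_rb`, `demo_event` and `demo_reserve` are hypothetical names, and an interrupted outer event is simulated by changing `before_stamp` by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-CPU state from "The players" above, as plain integers. */
struct demo_rb {
	uint64_t write_tail;
	uint64_t before_stamp;
	uint64_t write_stamp;
};

struct demo_event {
	uint64_t tail;   /* where this event's data starts */
	uint64_t delta;  /* value recorded with the event */
	bool abs;        /* true if a full time stamp is injected */
};

/* Uninterrupted reservation: steps [A] through [D] with no nesting. */
static struct demo_event demo_reserve(struct demo_rb *rb, uint64_t ts,
				      uint64_t len)
{
	struct demo_event ev;
	uint64_t before = rb->before_stamp;
	uint64_t after = rb->write_stamp;

	/* Differing stamps mean we interrupted an event mid-update. */
	ev.abs = (after != before);

	rb->before_stamp = ts;            /* [B] */
	rb->write_tail += len;            /* [C] */
	ev.tail = rb->write_tail - len;
	rb->write_stamp = ts;             /* [D] */

	ev.delta = ev.abs ? ts : ts - after;
	return ev;
}
```

When both stamps agree the event records a plain delta from the previous write stamp (path 1, case a). When an outer event has already performed [B] but not [D], the nested event sees the mismatch and records the full time stamp instead (path 1, case b).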
      
      Link: https://lkml.kernel.org/r/20200625094454.732790f7@oasis.local.home
      Link: https://lore.kernel.org/r/20200627010041.517736087@goodmis.org
      Link: http://lkml.kernel.org/r/20200629025258.957440797@goodmis.org
      
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: Move pipe reference to trace array instead of current_tracer · 7ef282e0
      Steven Rostedt (VMware) authored
      If a process has the trace_pipe open on a trace_array, the current tracer
      for that trace array should not be changed. This was originally enforced by a
      global lock, but when instances were introduced, it was moved to the
      current_trace. But this structure is shared by all instances, and a
      trace_pipe is for a single instance. There's no reason that a process that
      has trace_pipe open on one instance should prevent another instance from
      changing its current tracer. Move the reference counter to the trace_array
      instead.
      
      This is marked as "Fixes" but is more of a clean up than a true fix.
      Backport if you want, but it's not critical.
      
      Fixes: cf6ab6d9 ("tracing: Add ref count to tracer for when they are being read by pipe")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: not necessary to define DEFINE_EVENT_PRINT to be empty again · e6bc5b3f
      Wei Yang authored
      After the previous cleanup, DEFINE_EVENT_PRINT's definition has no
      relationship with DEFINE_EVENT. So after we re-define DEFINE_EVENT, it
      is not necessary to define DEFINE_EVENT_PRINT to be empty again.
      
      Link: http://lkml.kernel.org/r/20200612092844.56107-5-richard.weiyang@linux.alibaba.com
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: define DEFINE_EVENT_PRINT not related to DEFINE_EVENT · 61df16fc
      Wei Yang authored
      The current definition defines DEFINE_EVENT_PRINT to be DEFINE_EVENT.
      Actually, at this point DEFINE_EVENT is already an empty macro. Let's
      cut the relationship between DEFINE_EVENT_PRINT and DEFINE_EVENT.
      
      Link: http://lkml.kernel.org/r/20200612092844.56107-4-richard.weiyang@linux.alibaba.com
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: not necessary re-define DEFINE_EVENT_PRINT · b6f9eb87
      Wei Yang authored
      The definition of DEFINE_EVENT_PRINT is not changed after the previous one,
      so it is not necessary to re-define it in the same form.
      
      Link: http://lkml.kernel.org/r/20200612092844.56107-3-richard.weiyang@linux.alibaba.com
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracing: not necessary to undefine DEFINE_EVENT again · e8cf9c8c
      Wei Yang authored
      After un-defining DEFINE_EVENT in Stage 2, DEFINE_EVENT is not defined to a
      specific form. It is not necessary to un-define it again.
      
      Let's skip this.
      
      Link: http://lkml.kernel.org/r/20200612092844.56107-2-richard.weiyang@linux.alibaba.com
      
      Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  7. 29 Jun, 2020 4 commits
  8. 28 Jun, 2020 10 commits
    • Linux 5.8-rc3 · 9ebcfadb
      Linus Torvalds authored
    • Merge tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · f7db192b
      Linus Torvalds authored
      Pull ARM OMAP fixes from Arnd Bergmann:
       "The OMAP developers are particularly active at hunting down
        regressions, so this is a separate branch with OMAP specific
        fixes for v5.8:
      
        As Tony explains
          "The recent display subsystem (DSS) related platform data changes
           caused display related regressions for suspend and resume. Looks
           like I only tested suspend and resume before dropping the legacy
           platform data, and forgot to test it after dropping it. Turns out
           the main issue was that we no longer have platform code calling
           pm_runtime_suspend for DSS like we did for the legacy platform data
           case, and that fix is still being discussed on the dri-devel list
           and will get merged separately. The DSS related testing exposed a
           pile of other display related issues that also need fixing
           though":
      
         - Fix ti-sysc optional clock handling and reset status checks for
           devices that reset automatically in idle like DSS
      
         - Ignore ti-sysc clockactivity bit unless separately requested to
           avoid unexpected performance issues
      
         - Init ti-sysc framedonetv_irq to true and disable for am4
      
         - Avoid duplicate DSS reset for legacy mode with dts data
      
         - Remove LCD timings for am4 as they cause warnings now that we're
           using generic panels
      
        Other OMAP changes from Tony include:
      
         - Fix omap_prm reset deassert as we still have drivers setting the
           pm_runtime_irq_safe() flag
      
         - Flush posted write for ti-sysc enable and disable
      
         - Fix droid4 spi related errors with spi flags
      
         - Fix am335x USB range and a typo for softreset
      
         - Fix dra7 timer nodes for clocks for IPU and DSP
      
         - Drop duplicate mailboxes after mismerge for dra7
      
         - Prevent pocketbeagle header line signal from accidentally setting
           micro-SD write protection signal by removing the default mux

         - Fix NFSroot flakiness after resume for duovero by switching the
           smsc911x gpio interrupt back to level sensitive
      
         - Fix regression for omap4 clockevent source after recent system
           timer changes
      
         - Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
           phy-mode
      
         - One patch to convert am3/am4 DT files to use the regular sdhci-omap
           driver instead of the old hsmmc driver, this was meant for the
           merge window but got lost in the process"
      
      * tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
        ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
        ARM: dts: Fix omap4 system timer source clocks
        ARM: dts: Fix duovero smsc interrupt for suspend
        ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect
        Revert "bus: ti-sysc: Increase max softreset wait"
        ARM: dts: am437x-epos-evm: remove lcd timings
        ARM: dts: am437x-gp-evm: remove lcd timings
        ARM: dts: am437x-sk-evm: remove lcd timings
        ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
        ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
        ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
        ARM: dts: Fix am33xx.dtsi USB ranges length
        bus: ti-sysc: Increase max softreset wait
        ARM: OMAP2+: Fix legacy mode dss_reset
        bus: ti-sysc: Fix uninitialized framedonetv_irq
        bus: ti-sysc: Ignore clockactivity unless specified as a quirk
        bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
        ARM: dts: omap4-droid4: Fix spi configuration and increase rate
        bus: ti-sysc: Flush posted write on enable and disable
        soc: ti: omap-prm: use atomic iopoll instead of sleeping one
        ...
      f7db192b
    • Linus Torvalds's avatar
      Merge tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · e44b59cd
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Here are a couple of bug fixes, mostly for devicetree files
      
        NXP i.MX:
         - Use correct voltage on some i.MX8M board device trees to avoid
           hardware damage
         - Code fixes for a compiler warning and incorrect reference counting,
           both harmless.
         - Fix the i.MX8M SoC driver to correctly identify imx8mp
         - Fix watchdog configuration in imx6ul-kontron device tree.
      
        Broadcom:
         - A small regression fix for the Raspberry-Pi firmware driver
         - A Kconfig change to use the correct timer driver on Northstar
         - A DT fix for the Luxul XWC-2000 machine
         - Two more DT fixes for NSP SoCs
      
        STMicroelectronics STI:
         - Revert one broken patch for L2 cache configuration
      
        ARM Versatile Express:
         - Fix a regression by reverting a broken DT cleanup
      
        TEE drivers:
         - MAINTAINERS: change tee mailing list"
      
      * tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        Revert "ARM: sti: Implement dummy L2 cache's write_sec"
        soc: imx8m: fix build warning
        ARM: imx6: add missing put_device() call in imx6q_suspend_init()
        ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
        soc: imx8m: Correct i.MX8MP UID fuse offset
        ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
        ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
        arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
        arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
        arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range
        ARM: dts: NSP: Correct FA2 mailbox node
        ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()
        MAINTAINERS: change tee mailing list
        ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
        ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP
        ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000
        arm: dts: vexpress: Move mcc node back into motherboard node
      e44b59cd
    • Linus Torvalds's avatar
      Merge tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 668f532d
      Linus Torvalds authored
      Pull timer fix from Ingo Molnar:
       "A single DocBook fix"
      
      * tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timekeeping: Fix kerneldoc system_device_crosststamp & al
      668f532d
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ae71d4bf
      Linus Torvalds authored
      Pull perf fix from Ingo Molnar:
       "A single Kbuild dependency fix"
      
      * tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/rapl: Fix RAPL config variable bug
      ae71d4bf
    • Linus Torvalds's avatar
      Merge tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bc53f67d
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
      
       - Fix build regression on GCC v4.8 and older
      
       - Robustness fix for TPM log parsing code
      
       - kobject refcount fix for the ESRT parsing code
      
       - Two efivarfs fixes to make it behave more like an ordinary file
         system
      
       - Style fixup for zero length arrays
      
       - Fix a regression in path separator handling in the initrd loader
      
       - Fix a missing prototype warning
      
       - Add some kerneldoc headers for newly introduced stub routines
      
       - Allow support for SSDT overrides via EFI variables to be disabled
      
       - Report CPU mode and MMU state upon entry for 32-bit ARM
      
       - Use the correct stack pointer alignment when entering from mixed mode
      
      * tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/libstub: arm: Print CPU boot mode and MMU state at boot
        efi/libstub: arm: Omit arch specific config table matching array on arm64
        efi/x86: Setup stack correctly for efi_pe_entry
        efi: Make it possible to disable efivar_ssdt entirely
        efi/libstub: Descriptions for stub helper functions
        efi/libstub: Fix path separator regression
        efi/libstub: Fix missing-prototype warning for skip_spaces()
        efi: Replace zero-length array and use struct_size() helper
        efivarfs: Don't return -EINTR when rate-limiting reads
        efivarfs: Update inode modification time for successful writes
        efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
        efi/tpm: Verify event log header before parsing
        efi/x86: Fix build with gcc 4
      bc53f67d
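      The "zero length arrays" item in the EFI pull above refers to replacing
      a trailing data[0] member with a C99 flexible array member and sizing
      allocations with struct_size(). The sketch below is a user-space model
      of that pattern, not the EFI code: struct blob, blob_size() and
      blob_alloc() are hypothetical names, and blob_size() only imitates the
      documented behaviour of struct_size(), which saturates to SIZE_MAX on
      overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record using a flexible array member instead of data[0]. */
struct blob {
	size_t count;
	uint32_t data[];	/* flexible array member, must be last */
};

/*
 * Hypothetical stand-in for struct_size(): header size plus count
 * elements, saturating to SIZE_MAX on overflow so that a following
 * allocation fails instead of being undersized.
 */
size_t blob_size(size_t count)
{
	if (count > (SIZE_MAX - sizeof(struct blob)) / sizeof(uint32_t))
		return SIZE_MAX;
	return sizeof(struct blob) + count * sizeof(uint32_t);
}

/* Allocates a zeroed blob with room for count elements, or returns NULL. */
struct blob *blob_alloc(size_t count)
{
	size_t bytes = blob_size(count);
	struct blob *b;

	if (bytes == SIZE_MAX)
		return NULL;
	b = calloc(1, bytes);
	if (b)
		b->count = count;
	return b;
}
```

      Keeping the size computation in one helper means the overflow check
      cannot be forgotten at individual allocation sites.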
    • Linus Torvalds's avatar
      Merge tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 91a9a90d
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
       "The most anticipated fix in this pull request is probably the horrible
        build fix for the RANDSTRUCT fail that didn't make -rc2. Also included
        is the cleanup that removes those BUILD_BUG_ON()s and replaces them with
        ugly unions.
      
        Also included is the try_to_wake_up() race fix that was first
        triggered by Paul's RCU-torture runs, but was independently hit by
        Dave Chinner's fstest runs as well"
      
      * tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/cfs: change initial value of runnable_avg
        smp, irq_work: Continue smp_call_function*() and irq_work*() integration
        sched/core: s/WF_ON_RQ/WQ_ON_CPU/
        sched/core: Fix ttwu() race
        sched/core: Fix PI boosting between RT and DEADLINE tasks
        sched/deadline: Initialize ->dl_boosted
        sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption
        sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail
      91a9a90d
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 098c7938
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - AMD Memory bandwidth counter width fix, by Babu Moger.
      
       - Use the proper length type in the 32-bit truncate() syscall variant,
         by Jiri Slaby.
      
       - Reinit IA32_FEAT_CTL during wakeup to fix the case where after
         resume, VMXON would #GP due to VMX not being properly enabled, by
         Sean Christopherson.
      
       - Fix a static checker warning in the resctrl code, by Dan Carpenter.
      
       - Add a CR4 pinning mask for bits which cannot change after boot, by
         Kees Cook.
      
       - Align the start of the loop of __clear_user() to 16 bytes, to improve
         performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.
      
      * tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm/64: Align start of __clear_user() loop to 16-bytes
        x86/cpu: Use pinning mask for CR4 bits needing to be 0
        x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
        x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
        syscalls: Fix offset type of ksys_ftruncate()
        x86/resctrl: Fix memory bandwidth counter width for AMD
      098c7938
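      The resctrl item in the x86 pull above concerns a NULL vs IS_ERR()
      mismatch. The sketch below models that general class of bug in user
      space and is not the resctrl code: err_ptr(), is_err() and ptr_err()
      imitate the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), while lookup()
      and its table are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* error codes occupy the top 4095 address values */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if ptr carries an encoded errno rather than a valid address. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Recover the negative errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int table[2] = { 10, 20 };

/* Hypothetical lookup that reports failure as an error pointer, never NULL. */
int *lookup(int idx)
{
	if (idx < 0 || idx >= 2)
		return err_ptr(-ENOENT);
	return &table[idx];
}

/* Mismatched check: a NULL test never fires for an error pointer. */
int null_check_catches_failure(int idx)
{
	return lookup(idx) == NULL;
}

/* Matching check: is_err() recognises the encoded errno. */
int is_err_check_catches_failure(int idx)
{
	return is_err(lookup(idx));
}
```

      A caller and callee have to agree on one failure convention; either
      side can be changed, as long as both use the same one.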
    • Linus Torvalds's avatar
      Merge tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c141b30e
      Linus Torvalds authored
      Pull RCU-vs-KCSAN fixes from Borislav Petkov:
       "A single commit that uses "arch_" atomic operations to avoid the
        instrumentation that comes with the non-"arch_" versions.
      
        In preparation for that commit, it also has another commit that makes
        these "arch_" atomic operations available to generic code.
      
        Without these commits, KCSAN users can see pointless errors"
      
      * tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rcu: Fixup noinstr warnings
        locking/atomics: Provide the arch_atomic_ interface to generic code
      c141b30e
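      The split between "arch_" and non-"arch_" atomic operations described
      in the pull above can be modelled in user space as a raw operation plus
      a wrapper that calls an instrumentation hook first. This is only a
      sketch using C11 atomics; instrument_access() and the counter_read()
      functions are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>

/* Counts how often the hypothetical sanitizer hook ran. */
static int instrumentation_hits;

/* Hypothetical stand-in for a KCSAN-style check on a memory access. */
static void instrument_access(const void *addr)
{
	(void)addr;
	instrumentation_hits++;
}

/* "arch_" flavour: the raw operation with no instrumentation. */
int arch_counter_read(atomic_int *v)
{
	return atomic_load_explicit(v, memory_order_relaxed);
}

/* Plain flavour: run the instrumentation hook, then the raw operation. */
int counter_read(atomic_int *v)
{
	instrument_access(v);
	return arch_counter_read(v);
}

/* Reports how many accesses were instrumented so far. */
int instrumentation_count(void)
{
	return instrumentation_hits;
}
```

      Code that must not be instrumented calls the "arch_" flavour directly
      and so never reaches the hook.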
    • Linus Torvalds's avatar
      Merge tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7ecb59a5
      Linus Torvalds authored
      Pull objtool fixes from Borislav Petkov:
       "Three fixes from Peter Zijlstra suppressing KCOV instrumentation in
        noinstr sections.
      
        Peter Zijlstra says:
          "Address KCOV vs noinstr. There is no function attribute to
           selectively suppress KCOV instrumentation, instead teach objtool
           to NOP out the calls in noinstr functions"
      
        This cures a bunch of KCOV crashes (as used by syzkaller)"
      
      * tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Fix noinstr vs KCOV
        objtool: Provide elf_write_{insn,reloc}()
        objtool: Clean up elf_write() condition
      7ecb59a5