1. 10 Jun, 2014 3 commits
  2. 07 Jun, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix memory leak on instance deletion · a9fcaaac
      Steven Rostedt (Red Hat) authored
      When an instance is created, it also gets a snapshot ring buffer
      allocated (with minimum of pages). But when it is deleted the snapshot
      buffer is not. There was a helper function added to match the allocation
      of these ring buffers to a way to free them, but it wasn't used by
      the deletion of an instance. Using that helper function solves this
      memory leak.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      a9fcaaac
  3. 06 Jun, 2014 5 commits
  4. 05 Jun, 2014 1 commit
    • Yoshihiro YUNOMAE's avatar
      tracing: Introduce saved_cmdlines_size file · 939c7a4f
      Yoshihiro YUNOMAE authored
      Introduce saved_cmdlines_size file for changing the number of saved pid-comms.
      saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but
      'no-existing processes' names are often lost in saved_cmdlines when we
      read the trace data. So, by introducing saved_cmdlines_size file, we can
      now change the 128 command names saved to something much larger if needed.
      
      When we write a value to saved_cmdlines_size, the number of the value will
      be stored in pid-comm list:
      
      	# echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size
      
      Here, 1024 command names can be stored. The default number is 128 and the maximum
      number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we
      want to avoid losing any command names, we need to set 32768 to
      saved_cmdlines_size.
      
      We can read the maximum number of the list:
      
      	# cat /sys/kernel/debug/tracing/saved_cmdlines_size
      	128
      
      Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevelSigned-off-by: default avatarYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      939c7a4f
  5. 04 Jun, 2014 2 commits
  6. 03 Jun, 2014 2 commits
  7. 02 Jun, 2014 1 commit
    • Minchan Kim's avatar
      tracing: Print max callstack on stacktrace bug · e3172181
      Minchan Kim authored
      While I played with my own feature(ex, something on the way to reclaim),
      the kernel would easily oops. I guessed that the reason had to do with
      stack overflow and wanted to prove it.
      
      I discovered the stack tracer which proved to be very useful for me but
      the kernel would oops before my user program gather the information via
      "watch cat /sys/kernel/debug/tracing/stack_trace" so I couldn't get any
      message from that. What I needed was to have the stack tracer emit the
      kernel stack usage before it does the oops so I could find what was
      hogging the stack.
      
      This patch shows the callstack of max stack usage right before an oops so
      we can find a culprit.
      
      So, the result is as follows.
      
      [ 1116.522206] init: lightdm main process (1246) terminated with status 1
      [ 1119.922916] init: failsafe-x main process (1272) terminated with status 1
      [ 3887.728131] kworker/u24:1 (6637) used greatest stack depth: 256 bytes left
      [ 6397.629227] cc1 (9554) used greatest stack depth: 128 bytes left
      [ 7174.467392]         Depth    Size   Location    (47 entries)
      [ 7174.467392]         -----    ----   --------
      [ 7174.467785]   0)     7248     256   get_page_from_freelist+0xa7/0x920
      [ 7174.468506]   1)     6992     352   __alloc_pages_nodemask+0x1cd/0xb20
      [ 7174.469224]   2)     6640       8   alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]   3)     6632     168   new_slab+0x2c5/0x370
      [ 7174.469413]   4)     6464       8   __slab_alloc+0x3a9/0x501
      [ 7174.469413]   5)     6456      80   __kmalloc+0x1cb/0x200
      [ 7174.469413]   6)     6376     376   vring_add_indirect+0x36/0x200
      [ 7174.469413]   7)     6000     144   virtqueue_add_sgs+0x2e2/0x320
      [ 7174.469413]   8)     5856     288   __virtblk_add_req+0xda/0x1b0
      [ 7174.469413]   9)     5568      96   virtio_queue_rq+0xd3/0x1d0
      [ 7174.469413]  10)     5472     128   __blk_mq_run_hw_queue+0x1ef/0x440
      [ 7174.469413]  11)     5344      16   blk_mq_run_hw_queue+0x35/0x40
      [ 7174.469413]  12)     5328      96   blk_mq_insert_requests+0xdb/0x160
      [ 7174.469413]  13)     5232     112   blk_mq_flush_plug_list+0x12b/0x140
      [ 7174.469413]  14)     5120     112   blk_flush_plug_list+0xc7/0x220
      [ 7174.469413]  15)     5008      64   io_schedule_timeout+0x88/0x100
      [ 7174.469413]  16)     4944     128   mempool_alloc+0x145/0x170
      [ 7174.469413]  17)     4816      96   bio_alloc_bioset+0x10b/0x1d0
      [ 7174.469413]  18)     4720      48   get_swap_bio+0x30/0x90
      [ 7174.469413]  19)     4672     160   __swap_writepage+0x150/0x230
      [ 7174.469413]  20)     4512      32   swap_writepage+0x42/0x90
      [ 7174.469413]  21)     4480     320   shrink_page_list+0x676/0xa80
      [ 7174.469413]  22)     4160     208   shrink_inactive_list+0x262/0x4e0
      [ 7174.469413]  23)     3952     304   shrink_lruvec+0x3e1/0x6a0
      [ 7174.469413]  24)     3648      80   shrink_zone+0x3f/0x110
      [ 7174.469413]  25)     3568     128   do_try_to_free_pages+0x156/0x4c0
      [ 7174.469413]  26)     3440     208   try_to_free_pages+0xf7/0x1e0
      [ 7174.469413]  27)     3232     352   __alloc_pages_nodemask+0x783/0xb20
      [ 7174.469413]  28)     2880       8   alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  29)     2872     200   __page_cache_alloc+0x13f/0x160
      [ 7174.469413]  30)     2672      80   find_or_create_page+0x4c/0xb0
      [ 7174.469413]  31)     2592      80   ext4_mb_load_buddy+0x1e9/0x370
      [ 7174.469413]  32)     2512     176   ext4_mb_regular_allocator+0x1b7/0x460
      [ 7174.469413]  33)     2336     128   ext4_mb_new_blocks+0x458/0x5f0
      [ 7174.469413]  34)     2208     256   ext4_ext_map_blocks+0x70b/0x1010
      [ 7174.469413]  35)     1952     160   ext4_map_blocks+0x325/0x530
      [ 7174.469413]  36)     1792     384   ext4_writepages+0x6d1/0xce0
      [ 7174.469413]  37)     1408      16   do_writepages+0x23/0x40
      [ 7174.469413]  38)     1392      96   __writeback_single_inode+0x45/0x2e0
      [ 7174.469413]  39)     1296     176   writeback_sb_inodes+0x2ad/0x500
      [ 7174.469413]  40)     1120      80   __writeback_inodes_wb+0x9e/0xd0
      [ 7174.469413]  41)     1040     160   wb_writeback+0x29b/0x350
      [ 7174.469413]  42)      880     208   bdi_writeback_workfn+0x11c/0x480
      [ 7174.469413]  43)      672     144   process_one_work+0x1d2/0x570
      [ 7174.469413]  44)      528     112   worker_thread+0x116/0x370
      [ 7174.469413]  45)      416     240   kthread+0xf3/0x110
      [ 7174.469413]  46)      176     176   ret_from_fork+0x7c/0xb0
      [ 7174.469413] ------------[ cut here ]------------
      [ 7174.469413] kernel BUG at kernel/trace/trace_stack.c:174!
      [ 7174.469413] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 7174.469413] Dumping ftrace buffer:
      [ 7174.469413]    (ftrace buffer empty)
      [ 7174.469413] Modules linked in:
      [ 7174.469413] CPU: 0 PID: 440 Comm: kworker/u24:0 Not tainted 3.14.0+ #212
      [ 7174.469413] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 7174.469413] Workqueue: writeback bdi_writeback_workfn (flush-253:0)
      [ 7174.469413] task: ffff880034170000 ti: ffff880029518000 task.ti: ffff880029518000
      [ 7174.469413] RIP: 0010:[<ffffffff8112336e>]  [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
      [ 7174.469413] RSP: 0000:ffff880029518290  EFLAGS: 00010046
      [ 7174.469413] RAX: 0000000000000030 RBX: 000000000000002f RCX: 0000000000000000
      [ 7174.469413] RDX: 0000000000000000 RSI: 000000000000002f RDI: ffffffff810b7159
      [ 7174.469413] RBP: ffff8800295182f0 R08: ffffffffffffffff R09: 0000000000000000
      [ 7174.469413] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff82768dfc
      [ 7174.469413] R13: 000000000000f2e8 R14: ffff8800295182b8 R15: 00000000000000f8
      [ 7174.469413] FS:  0000000000000000(0000) GS:ffff880037c00000(0000) knlGS:0000000000000000
      [ 7174.469413] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 7174.469413] CR2: 00002acd0b994000 CR3: 0000000001c0b000 CR4: 00000000000006f0
      [ 7174.469413] Stack:
      [ 7174.469413]  0000000000000000 ffffffff8114fdb7 0000000000000087 0000000000001c50
      [ 7174.469413]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [ 7174.469413]  0000000000000002 ffff880034170000 ffff880034171028 0000000000000000
      [ 7174.469413] Call Trace:
      [ 7174.469413]  [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
      [ 7174.469413]  [<ffffffff816eee3f>] ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
      [ 7174.469413]  [<ffffffff810a23fa>] ? __bfs+0x11a/0x270
      [ 7174.469413]  [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
      [ 7174.469413]  [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
      [ 7174.469413]  [<ffffffff8119092f>] ? alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  [<ffffffff811507fd>] __alloc_pages_nodemask+0x1cd/0xb20
      [ 7174.469413]  [<ffffffff810a4de6>] ? check_irq_usage+0x96/0xe0
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  [<ffffffff81199cd5>] ? new_slab+0x2c5/0x370
      [ 7174.469413]  [<ffffffff81199cd5>] new_slab+0x2c5/0x370
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff816db002>] __slab_alloc+0x3a9/0x501
      [ 7174.469413]  [<ffffffff8119af8b>] ? __kmalloc+0x1cb/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8119af8b>] __kmalloc+0x1cb/0x200
      [ 7174.469413]  [<ffffffff8141de10>] ? vring_add_indirect+0x200/0x200
      [ 7174.469413]  [<ffffffff8141dc46>] vring_add_indirect+0x36/0x200
      [ 7174.469413]  [<ffffffff8141e402>] virtqueue_add_sgs+0x2e2/0x320
      [ 7174.469413]  [<ffffffff8148e35a>] __virtblk_add_req+0xda/0x1b0
      [ 7174.469413]  [<ffffffff8148e503>] virtio_queue_rq+0xd3/0x1d0
      [ 7174.469413]  [<ffffffff8134aa0f>] __blk_mq_run_hw_queue+0x1ef/0x440
      [ 7174.469413]  [<ffffffff8134b0d5>] blk_mq_run_hw_queue+0x35/0x40
      [ 7174.469413]  [<ffffffff8134b7bb>] blk_mq_insert_requests+0xdb/0x160
      [ 7174.469413]  [<ffffffff8134be5b>] blk_mq_flush_plug_list+0x12b/0x140
      [ 7174.469413]  [<ffffffff81342237>] blk_flush_plug_list+0xc7/0x220
      [ 7174.469413]  [<ffffffff816e60ef>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
      [ 7174.469413]  [<ffffffff816e16e8>] io_schedule_timeout+0x88/0x100
      [ 7174.469413]  [<ffffffff816e1665>] ? io_schedule_timeout+0x5/0x100
      [ 7174.469413]  [<ffffffff81149415>] mempool_alloc+0x145/0x170
      [ 7174.469413]  [<ffffffff8109baf0>] ? __init_waitqueue_head+0x60/0x60
      [ 7174.469413]  [<ffffffff811e246b>] bio_alloc_bioset+0x10b/0x1d0
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184110>] get_swap_bio+0x30/0x90
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184660>] __swap_writepage+0x150/0x230
      [ 7174.469413]  [<ffffffff810ab405>] ? do_raw_spin_unlock+0x5/0xa0
      [ 7174.469413]  [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
      [ 7174.469413]  [<ffffffff81184515>] ? __swap_writepage+0x5/0x230
      [ 7174.469413]  [<ffffffff81184782>] swap_writepage+0x42/0x90
      [ 7174.469413]  [<ffffffff8115ae96>] shrink_page_list+0x676/0xa80
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff8115b872>] shrink_inactive_list+0x262/0x4e0
      [ 7174.469413]  [<ffffffff8115c1c1>] shrink_lruvec+0x3e1/0x6a0
      [ 7174.469413]  [<ffffffff8115c4bf>] shrink_zone+0x3f/0x110
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff8115c9e6>] do_try_to_free_pages+0x156/0x4c0
      [ 7174.469413]  [<ffffffff8115cf47>] try_to_free_pages+0xf7/0x1e0
      [ 7174.469413]  [<ffffffff81150db3>] __alloc_pages_nodemask+0x783/0xb20
      [ 7174.469413]  [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
      [ 7174.469413]  [<ffffffff81145c0f>] ? __page_cache_alloc+0x13f/0x160
      [ 7174.469413]  [<ffffffff81145c0f>] __page_cache_alloc+0x13f/0x160
      [ 7174.469413]  [<ffffffff81146c6c>] find_or_create_page+0x4c/0xb0
      [ 7174.469413]  [<ffffffff811463e5>] ? find_get_page+0x5/0x130
      [ 7174.469413]  [<ffffffff812837b9>] ext4_mb_load_buddy+0x1e9/0x370
      [ 7174.469413]  [<ffffffff81284c07>] ext4_mb_regular_allocator+0x1b7/0x460
      [ 7174.469413]  [<ffffffff81281070>] ? ext4_mb_use_preallocated+0x40/0x360
      [ 7174.469413]  [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
      [ 7174.469413]  [<ffffffff81287eb8>] ext4_mb_new_blocks+0x458/0x5f0
      [ 7174.469413]  [<ffffffff8127d83b>] ext4_ext_map_blocks+0x70b/0x1010
      [ 7174.469413]  [<ffffffff8124e6d5>] ext4_map_blocks+0x325/0x530
      [ 7174.469413]  [<ffffffff81253871>] ext4_writepages+0x6d1/0xce0
      [ 7174.469413]  [<ffffffff812531a0>] ? ext4_journalled_write_end+0x330/0x330
      [ 7174.469413]  [<ffffffff811539b3>] do_writepages+0x23/0x40
      [ 7174.469413]  [<ffffffff811d2365>] __writeback_single_inode+0x45/0x2e0
      [ 7174.469413]  [<ffffffff811d36ed>] writeback_sb_inodes+0x2ad/0x500
      [ 7174.469413]  [<ffffffff811d39de>] __writeback_inodes_wb+0x9e/0xd0
      [ 7174.469413]  [<ffffffff811d40bb>] wb_writeback+0x29b/0x350
      [ 7174.469413]  [<ffffffff81057c3d>] ? __local_bh_enable_ip+0x6d/0xd0
      [ 7174.469413]  [<ffffffff811d6e9c>] bdi_writeback_workfn+0x11c/0x480
      [ 7174.469413]  [<ffffffff81070610>] ? process_one_work+0x170/0x570
      [ 7174.469413]  [<ffffffff81070672>] process_one_work+0x1d2/0x570
      [ 7174.469413]  [<ffffffff81070610>] ? process_one_work+0x170/0x570
      [ 7174.469413]  [<ffffffff81071bb6>] worker_thread+0x116/0x370
      [ 7174.469413]  [<ffffffff81071aa0>] ? manage_workers.isra.19+0x2e0/0x2e0
      [ 7174.469413]  [<ffffffff81078e53>] kthread+0xf3/0x110
      [ 7174.469413]  [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
      [ 7174.469413]  [<ffffffff816ef0ec>] ret_from_fork+0x7c/0xb0
      [ 7174.469413]  [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
      [ 7174.469413] Code: c0 49 bc fc 8d 76 82 ff ff ff ff e8 44 5a 5b 00 31 f6 8b 05 95 2b b3 00 48 39 c6 7d 0e 4c 8b 04 f5 20 5f c5 81 49 83 f8 ff 75 11 <0f> 0b 48 63 05 71 5a 64 01 48 29 c3 e9 d0 fd ff ff 48 8d 5e 01
      [ 7174.469413] RIP  [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
      [ 7174.469413]  RSP <ffff880029518290>
      [ 7174.469413] ---[ end trace c97d325b36b718f3 ]---
      
      Link: http://lkml.kernel.org/p/1401683592-1651-1-git-send-email-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e3172181
  8. 30 May, 2014 5 commits
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Move locking of trace_cmdline_lock into start/stop seq calls · 4c27e756
      Steven Rostedt (Red Hat) authored
      With the conversion of the saved_cmdlines output to use seq_read, there
      is now a race between accessing the values of the saved_cmdlines and
      the writing to them. The trace_cmdline_lock needs to be taken at
      the start and stop of the seq calls.
      
      A new __trace_find_cmdline() call is created to allow for the look up
      to happen without taking the lock.
      
      Fixes: 42584c81 tracing: Have saved_cmdlines use the seq_read infrastructure
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4c27e756
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Try again for saved cmdline if failed due to locking · 379cfdac
      Steven Rostedt (Red Hat) authored
      In order to prevent the saved cmdline cache from being filled when
      tracing is not active, the comms are only recorded after a trace event
      is recorded.
      
      The problem is, a comm can fail to be recorded if the trace_cmdline_lock
      is held. That lock is taken via a trylock to allow it to happen from
      any context (including NMI). If the lock fails to be taken, the comm
      is skipped. No big deal, as we will try again later.
      
      But! Because of the code that was added to only record after an event,
      we may not try again later as the recording is made as a oneshot per
      event per CPU.
      
      Only disable the recording of the comm if the comm is actually recorded.
      
      Fixes: 7ffbd48d "tracing: Cache comms only after an event occurred"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      379cfdac
    • Yoshihiro YUNOMAE's avatar
      tracing: Have saved_cmdlines use the seq_read infrastructure · 42584c81
      Yoshihiro YUNOMAE authored
      Current tracing_saved_cmdlines_read() implementation is naive; It allocates
      a large buffer, constructs output data to that buffer for each read
      operation, and then copies a portion of the buffer to the user space
      buffer. This has several issues such as slow memory allocation, high
      CPU usage, and even corruption of the output data.
      
      The seq_read infrastructure is made to handle this type of work.
      By converting it to use seq_read() the code becomes smaller, simplified,
      as well as correct.
      
      Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevelSigned-off-by: default avatarHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      42584c81
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add tracepoint benchmark tracepoint · 81dc9f0e
      Steven Rostedt (Red Hat) authored
      In order to help benchmark the time tracepoints take, a new config
      option is added called CONFIG_TRACEPOINT_BENCHMARK. When this option
      is set a tracepoint is created called "benchmark:benchmark_event".
      When the tracepoint is enabled, it kicks off a kernel thread that
      goes into an infinite loop (calling cond_sched() to let other tasks
      run), and calls the tracepoint. Each iteration will record the time
      it took to write to the tracepoint and the next iteration that
      data will be passed to the tracepoint itself. That is, the tracepoint
      will report the time it took to do the previous tracepoint.
      The string written to the tracepoint is a static string of 128 bytes
      to keep the time the same. The initial string is simply a write of
      "START". The second string records the cold cache time of the first
      write which is not added to the rest of the calculations.
      
      As it is a tight loop, it benchmarks as hot cache. That's fine because
      we care most about hot paths that are probably in cache already.
      
      An example of the output:
      
           START
           first=3672 [COLD CACHED]
           last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
           last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
           last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
           last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
           last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
           last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      81dc9f0e
    • Steven Rostedt's avatar
      tracing: Print nasty banner when trace_printk() is in use · 2184db46
      Steven Rostedt authored
      trace_printk() is used to debug fast paths within the kernel. Places
      that gets called in any context (interrupt or NMI) or thousands of
      times a second. Something you do not want to do with a printk().
      
      In order to make it completely lockless as it needs a temporary buffer
      to handle some of the string formatting, a page is created per cpu for
      every context (four per cpu; normal, softirq, irq, NMI).
      
      Since trace_printk() should only be used for debugging purposes,
      there's no reason to waste memory on these buffers on a production
      system. That means, trace_printk() should never be used unless a
      developer is debugging their kernel. There's macro magic to allocate
      the buffers if trace_printk() is used anywhere in the kernel.
      
      To help enforce that trace_printk() isn't used outside of development,
      when it is used, a nasty banner is displayed on bootup (or when a module
      is loaded that uses trace_printk() and the kernel core does not).
      
      Here's the banner:
      
       **********************************************************
       **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
       **                                                      **
       ** trace_printk() being used. Allocating extra memory.  **
       **                                                      **
       ** This means that this is a DEBUG kernel and it is     **
       ** unsafe for produciton use.                           **
       **                                                      **
       ** If you see this message and you are not debugging    **
       ** the kernel, report this immediately to your vendor!  **
       **                                                      **
       **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
       **********************************************************
      
      That should hopefully keep developers from trying to sneak in a
      trace_printk() or two.
      
      Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.homeSigned-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2184db46
  9. 21 May, 2014 2 commits
  10. 15 May, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks · 4449bf92
      Steven Rostedt (Red Hat) authored
      Being able to show a cpumask of events can be useful as some events
      may affect only some CPUs. There is no standard way to record the
      cpumask and converting it to a string is rather expensive during
      the trace as traces happen in hotpaths. It would be better to record
      the raw event mask and be able to parse it at print time.
      
      The following macros were added for use with the TRACE_EVENT() macro:
      
        __bitmask()
        __assign_bitmask()
        __get_bitmask()
      
      To test this, I added this to the sched_migrate_task event, which
      looked like this:
      
      TRACE_EVENT(sched_migrate_task,
      
      	TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
      
      	TP_ARGS(p, dest_cpu, cpus),
      
      	TP_STRUCT__entry(
      		__array(	char,	comm,	TASK_COMM_LEN	)
      		__field(	pid_t,	pid			)
      		__field(	int,	prio			)
      		__field(	int,	orig_cpu		)
      		__field(	int,	dest_cpu		)
      		__bitmask(	cpumask, num_possible_cpus()	)
      	),
      
      	TP_fast_assign(
      		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
      		__entry->pid		= p->pid;
      		__entry->prio		= p->prio;
      		__entry->orig_cpu	= task_cpu(p);
      		__entry->dest_cpu	= dest_cpu;
      		__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
      	),
      
      	TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
      		  __entry->comm, __entry->pid, __entry->prio,
      		  __entry->orig_cpu, __entry->dest_cpu,
      		  __get_bitmask(cpumask))
      );
      
      With the output of:
      
              ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
           migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
                   awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
           migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
      
      Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
      Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.homeSuggested-by: default avatarJavi Merino <javi.merino@arm.com>
      Tested-by: default avatarJavi Merino <javi.merino@arm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4449bf92
  11. 14 May, 2014 8 commits
  12. 07 May, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add trace_<tracepoint>_enabled() function · 7c65bbc7
      Steven Rostedt (Red Hat) authored
      There are some code paths in the kernel that need to do some preparations
      before it calls a tracepoint. As that code is worthless overhead when
      the tracepoint is not enabled, it would be prudent to have that code
      only run when the tracepoint is active. To accomplish this, all tracepoints
      now get a static inline function called "trace_<tracepoint-name>_enabled()"
      which returns true when the tracepoint is enabled and false otherwise.
      
      As an added bonus, that function uses the static_key of the tracepoint
      such that no branch is needed.
      
        if (trace_mytracepoint_enabled()) {
      	arg = process_tp_arg();
      	trace_mytracepoint(arg);
        }
      
      Will keep the "process_tp_arg()" (which may be expensive to run) from
      being executed when the tracepoint isn't enabled.
      
      It's best to encapsulate the tracepoint itself in the if statement
      just to keep races. For example, if you had:
      
        if (trace_mytracepoint_enabled())
      	arg = process_tp_arg();
        trace_mytracepoint(arg);
      
      There's a chance that the tracepoint could be enabled just after the
      if statement, and arg will be undefined when calling the tracepoint.
      
      Link: http://lkml.kernel.org/r/20140506094407.507b6435@gandalf.local.homeAcked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7c65bbc7
  13. 06 May, 2014 1 commit
  14. 05 May, 2014 1 commit
    • Frederic Weisbecker's avatar
      tracing: Remove myself as a tracing maintainer · 3415c28c
      Frederic Weisbecker authored
      It has been a while since I last sent a tracing patch. I always keep an
      eye on tracing evolutions and contributions in general but given
      how busy I am with nohz, isolation and more generally core cleanups stuff,
      I seldom have time left to provide deep reviews of tracing patches nor
      simply for reviews to begin with.
      
      I've been very lucky to start kernel development on a very young subsystem
      with tons of low hanging fruits back in 2008. Given that it deals with
      a lot of tricky stuffs all around (sched, timers, irq, preemption, NMIs,
      SMP, RCU, ....) I basically learned everything there.
      
      Steve has been doing most of the incredible work these last years.
      Thanks a lot!
      
      Of course consider me always available to help on tracing if any hard
      days happen.
      
      Link: http://lkml.kernel.org/p/1399131991-13216-1-git-send-email-fweisbec@gmail.com
      
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      3415c28c
  15. 02 May, 2014 1 commit
  16. 30 Apr, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Remove mock up poll wait function · b1169cc6
      Steven Rostedt (Red Hat) authored
      Now that the ring buffer has a built in way to wake up readers
      when there's data, using irq_work such that it is safe to do it
      in any context. But it was still using the old "poor man's"
      wait polling that checks every 1/10 of a second to see if it
      should wake up a waiter. This makes the latency for a wake up
      excruciatingly long. No need to do that anymore.
      
      Completely remove the different wait_poll types from the tracers
      and have them all use the default one now.
      Reported-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      b1169cc6
  17. 29 Apr, 2014 1 commit
  18. 24 Apr, 2014 2 commits
  19. 21 Apr, 2014 1 commit