Commit e36f9e16, authored by Nikita V. Shirokov, committed by yonghong-song

[profile.py]: adding support to collect profile only from specified CPU (#1891)

* [profile.py]: adding support to collect profile only from specified CPU

Summary:
Sometimes it is useful to collect stacks from a single CPU only, for
example when one core is saturated while the others are not and you
want to know what is going on there. This diff adds that ability.
(Network-related code is one example of a single core being saturated,
as there is usually a 1:1 mapping between RX queue and CPU.)

example of generated code w/ CPU specified:

./tools/profile.py -C 14 2 --ebpf
Sampling at 49 Hertz of all threads by user + kernel stack for 2 secs.

struct key_t {
    u32 pid;
    u64 kernel_ip;
    u64 kernel_ret_ip;
    int user_stack_id;
    int kernel_stack_id;
    char name[TASK_COMM_LEN];
};
BPF_HASH(counts, struct key_t);
BPF_STACK_TRACE(stack_traces, 16384);

// This code gets a bit complex. Probably not suitable for casual hacking.

int do_perf_event(struct bpf_perf_event_data *ctx) {

    if (bpf_get_smp_processor_id() != 14)
        return 0;

    u32 pid = bpf_get_current_pid_tgid() >> 32;
...

and w/o

./tools/profile.py  2 --ebpf
Sampling at 49 Hertz of all threads by user + kernel stack for 2 secs.

struct key_t {
    u32 pid;
    u64 kernel_ip;
    u64 kernel_ret_ip;
    int user_stack_id;
    int kernel_stack_id;
    char name[TASK_COMM_LEN];
};
BPF_HASH(counts, struct key_t);
BPF_STACK_TRACE(stack_traces, 16384);

// This code gets a bit complex. Probably not suitable for casual hacking.

int do_perf_event(struct bpf_perf_event_data *ctx) {

    u32 pid = bpf_get_current_pid_tgid() >> 32;
    if (!(1))
        return 0;
...
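
The two listings above differ only in the injected CPU check. A minimal sketch of how such a check could be spliced into the program text before compilation follows. The `CPU_FILTER` placeholder, the shortened template, and the `render_bpf_text` helper are hypothetical names for illustration and are not taken from profile.py; the diff further down passes `cpu=` to `attach_perf_event` rather than templating the check.

```python
# Hypothetical, shortened template: only the CPU check mirrors the listing above.
BPF_TEMPLATE = """
int do_perf_event(struct bpf_perf_event_data *ctx) {
    CPU_FILTER
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    return 0;
}
"""


def render_bpf_text(cpu: int = -1) -> str:
    """Return the program text, with a CPU check only when cpu >= 0."""
    if cpu >= 0:
        check = "if (bpf_get_smp_processor_id() != %d)\n        return 0;" % cpu
    else:
        check = ""  # no filter: samples from every CPU are kept
    return BPF_TEMPLATE.replace("CPU_FILTER", check)


if __name__ == "__main__":
    print(render_bpf_text(14))
```

Calling `render_bpf_text()` with no argument leaves the function body without any CPU check, which corresponds to the second listing.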

* addressing comments

* adding change in man
parent 5965fdc2
@@ -54,6 +54,9 @@ Show stacks from kernel space only (no user space stacks).
The maximum number of unique stack traces that the kernel will count (default
16384). If the sampled count exceeds this, a warning will be printed.
.TP
\-C cpu
Collect stacks only from specified cpu.
.TP
duration
Duration to trace, in seconds.
.SH EXAMPLES
@@ -102,6 +102,8 @@ parser.add_argument("--stack-storage-size", default=16384,
parser.add_argument("duration", nargs="?", default=99999999,
    type=positive_nonzero_int,
    help="duration of trace, in seconds")
parser.add_argument("-C", "--cpu", type=int, default=-1,
    help="cpu number to run profile on")
parser.add_argument("--ebpf", action="store_true",
    help=argparse.SUPPRESS)
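
As a rough illustration of how the new option parses, here is a self-contained sketch using only `argparse`. The `build_parser` helper is a hypothetical name, and plain `int` stands in for the tool's `positive_nonzero_int` type, which is not shown in this hunk.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical cut-down parser holding only the two options discussed here."""
    parser = argparse.ArgumentParser(description="sketch of the -C/--cpu option")
    # Assumption: int replaces positive_nonzero_int for brevity.
    parser.add_argument("duration", nargs="?", default=99999999, type=int,
        help="duration of trace, in seconds")
    # -1 acts as the "no CPU selected" sentinel, as in the hunk above.
    parser.add_argument("-C", "--cpu", type=int, default=-1,
        help="cpu number to run profile on")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-C", "14", "2"])
    print(args.cpu, args.duration)
```

With no arguments, `cpu` stays at `-1` and `duration` at `99999999`, so later code can test `args.cpu >= 0` to see whether a CPU was requested.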
@@ -230,6 +232,8 @@ sample_context = "%s%d %s" % (("", sample_freq, "Hertz") if sample_freq
if not args.folded:
    print("Sampling at %s of %s by %s stack" %
        (sample_context, thread_context, stack_context), end="")
    if args.cpu >= 0:
        print(" on CPU#{}".format(args.cpu), end="")
    if duration < 99999999:
        print(" for %d secs." % duration)
    else:
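
The banner logic in this hunk can be sketched as a pure function, which makes it easy to check against the sample output elsewhere on this page. The `sampling_banner` name and its parameters are hypothetical; the text of the `else` branch is inferred from the "Hit Ctrl-C to end." sample lines and is an assumption, since the hunk is cut off at that point.

```python
def sampling_banner(sample_context, thread_context, stack_context,
                    cpu=-1, duration=99999999):
    """Build the status line that the hunk above prints piece by piece."""
    banner = "Sampling at %s of %s by %s stack" % (
        sample_context, thread_context, stack_context)
    if cpu >= 0:
        # Only mention a CPU when one was requested with -C.
        banner += " on CPU#{}".format(cpu)
    if duration < 99999999:
        banner += " for %d secs." % duration
    else:
        # Assumed wording, taken from the sample output lines.
        banner += "... Hit Ctrl-C to end."
    return banner


if __name__ == "__main__":
    print(sampling_banner("49 Hertz", "all threads", "user + kernel",
                          cpu=7, duration=2))
```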
@@ -244,7 +248,7 @@ if debug or args.ebpf:
 b = BPF(text=bpf_text)
 b.attach_perf_event(ev_type=PerfType.SOFTWARE,
     ev_config=PerfSWConfig.CPU_CLOCK, fn_name="do_perf_event",
-    sample_period=sample_period, sample_freq=sample_freq)
+    sample_period=sample_period, sample_freq=sample_freq, cpu=args.cpu)
 # signal handler
 def signal_ignore(signal, frame):
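
The effect of forwarding `cpu=args.cpu` can be pictured with a small model of the selection rule. This is a sketch under the assumption that a negative `cpu` means "attach on every online CPU" and a non-negative value means "attach on that CPU only"; `cpus_to_attach` is a hypothetical helper and does not call BCC.

```python
def cpus_to_attach(cpu, online_cpus):
    """Model which CPUs receive the perf event for a given cpu argument."""
    if cpu >= 0:
        # A specific CPU was requested: sample only there.
        return [cpu]
    # Default sentinel (-1): sample on every online CPU.
    return list(online_cpus)


if __name__ == "__main__":
    print(cpus_to_attach(7, range(16)))
    print(cpus_to_attach(-1, range(4)))
```

Filtering at attach time means the other CPUs never fire the event at all, whereas a check inside the BPF program would still run on every CPU and discard most samples.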
@@ -573,6 +573,41 @@ Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
You can also restrict profiling to just kernel stacks (-K) or user stacks (-U).
For example, just user stacks:
# ./profile -C 7 2
Sampling at 49 Hertz of all threads by user + kernel stack on CPU#7 for 2 secs.
PyEval_EvalFrameEx
[unknown]
[unknown]
- python (2827439)
1
PyDict_GetItem
[unknown]
- python (2827439)
1
[unknown]
- python (2827439)
1
PyEval_EvalFrameEx
[unknown]
[unknown]
- python (2827439)
1
PyEval_EvalFrameEx
- python (2827439)
1
[unknown]
[unknown]
- python (2827439)
In this example, a Python application was busy-looping on a single core (CPU #7),
and stack traces were collected only from that CPU.
# ./profile -U
Sampling at 49 Hertz of all threads by user stack... Hit Ctrl-C to end.
^C
@@ -733,6 +768,7 @@ optional arguments:
--stack-storage-size STACK_STORAGE_SIZE
the number of unique stack traces that can be stored
and displayed (default 2048)
-C CPU, --cpu CPU cpu number to run profile on
examples:
./profile # profile stack traces at 49 Hertz until Ctrl-C