Commit fd93dc04 authored by Brendan Gregg

tcplife: switch to the new sock:inet_sock_set_state tracepoint

parent 913450f1
@@ -10,10 +10,10 @@ duration, and throughput for the session. This is useful for workload
 characterisation and flow accounting: identifying what connections are
 happening, with the bytes transferred.
-This tool works using the tcp:tcp_set_state tracepoint if it exists, added
-to Linux 4.15, and switches to using kernel dynamic tracing for older kernels.
-Only TCP state changes are traced, so it is expected that the overhead of
-this tool is much lower than typical send/receive tracing.
+This tool works using the sock:inet_sock_set_state tracepoint if it exists,
+added to Linux 4.16, and switches to using kernel dynamic tracing for older
+kernels. Only TCP state changes are traced, so it is expected that the
+overhead of this tool is much lower than typical send/receive tracing.
 Since this uses BPF, only the root user can use this tool.
 .SH REQUIREMENTS
...
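As an aside (not part of the commit): whether the new tracepoint is available on a given kernel can be confirmed from Python before choosing which probe text to load. The following is a minimal sketch assuming bcc is installed; the tracefs path shown is the common default mount and may differ on some systems.

import os
from bcc import BPF

# Preferred check, and the one the tool itself uses further down in this diff:
has_tp = BPF.tracepoint_exists("sock", "inet_sock_set_state")

# Equivalent manual check against tracefs (assumed default mount point):
fmt = "/sys/kernel/debug/tracing/events/sock/inet_sock_set_state/format"

print("tracepoint available:", has_tp, "| format file present:", os.path.exists(fmt))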
@@ -6,8 +6,9 @@
 #
 # USAGE: tcplife [-h] [-C] [-S] [-p PID] [interval [count]]
 #
-# This uses the tcp:tcp_set_state tracepoint if it exists (added to
-# Linux 4.15), else it uses kernel dynamic tracing of tcp_set_state().
+# This uses the sock:inet_sock_set_state tracepoint if it exists (added to
+# Linux 4.16, and replacing the earlier tcp:tcp_set_state), else it uses
+# kernel dynamic tracing of tcp_set_state().
 #
 # While throughput counters are emitted, they are fetched in a low-overhead
 # manner: reading members of the tcp_info struct on TCP close. ie, we do not
@@ -110,9 +111,9 @@ BPF_HASH(whoami, struct sock *, struct id_t);
 #
 # XXX: The following is temporary code for older kernels, Linux 4.14 and
-# older. It uses kprobes to instrument tcp_set_state(). On Linux 4.15 and
-# later, the tcp:tcp_set_state tracepoint should be used instead, as is
-# done by the code that follows this. In the distant future (2021?), this
+# older. It uses kprobes to instrument tcp_set_state(). On Linux 4.16 and
+# later, the sock:inet_sock_set_state tracepoint should be used instead, as
+# is done by the code that follows this. In the distant future (2021?), this
 # kprobe code can be removed. This is why there is so much code
 # duplication: to make removal easier.
 #
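For readers unfamiliar with the fallback path described in the comment above, here is a minimal standalone sketch (not part of this commit; assumes bcc and a pre-4.16 kernel): bcc auto-attaches any C function named kprobe__<kernel_function>, so the older path instruments tcp_set_state() directly and has to read the address family out of the struct sock itself, as the removed lines later in this diff do.

from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>
#define KBUILD_MODNAME "kprobe_sketch"
#include <net/sock.h>

// auto-attached by bcc to the kernel's tcp_set_state()
int kprobe__tcp_set_state(struct pt_regs *ctx, struct sock *sk, int state)
{
    u16 family = 0;
    bpf_probe_read(&family, sizeof(family), &sk->__sk_common.skc_family);
    bpf_trace_printk("family %d, new state %d\\n", family, state);
    return 0;
}
"""

BPF(text=prog).trace_print()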
@@ -235,10 +236,13 @@ int kprobe__tcp_set_state(struct pt_regs *ctx, struct sock *sk, int state)
 """
 bpf_text_tracepoint = """
-TRACEPOINT_PROBE(tcp, tcp_set_state)
+TRACEPOINT_PROBE(sock, inet_sock_set_state)
 {
+    if (args->protocol != IPPROTO_TCP)
+        return 0;
     u32 pid = bpf_get_current_pid_tgid() >> 32;
-    // sk is mostly used as a UUID, once for skc_family, and two tcp stats:
+    // sk is mostly used as a UUID, and for two tcp stats:
     struct sock *sk = (struct sock *)args->skaddr;
     // lport is either used in a filter here, or later
@@ -310,10 +314,7 @@ TRACEPOINT_PROBE(tcp, tcp_set_state)
     bpf_probe_read(&rx_b, sizeof(rx_b), &tp->bytes_received);
     bpf_probe_read(&tx_b, sizeof(tx_b), &tp->bytes_acked);
-    u16 family = 0;
-    bpf_probe_read(&family, sizeof(family), &sk->__sk_common.skc_family);
-    if (family == AF_INET) {
+    if (args->family == AF_INET) {
         struct ipv4_data_t data4 = {.span_us = delta_us,
             .rx_b = rx_b, .tx_b = tx_b};
         data4.ts_us = bpf_ktime_get_ns() / 1000;
@@ -354,7 +355,7 @@ TRACEPOINT_PROBE(tcp, tcp_set_state)
 }
 """
-if (BPF.tracepoint_exists("tcp", "tcp_set_state")):
+if (BPF.tracepoint_exists("sock", "inet_sock_set_state")):
     bpf_text += bpf_text_tracepoint
 else:
     bpf_text += bpf_text_kprobe
...
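To see the new tracepoint path on its own, the following is a minimal standalone sketch (not part of this commit; assumes bcc and Linux 4.16 or later). It attaches to sock:inet_sock_set_state, applies the same IPPROTO_TCP filter the commit adds, and simply prints each TCP state transition.

#!/usr/bin/python
from bcc import BPF

prog = """
#include <uapi/linux/ptrace.h>
#define KBUILD_MODNAME "tp_sketch"
#include <linux/tcp.h>
#include <net/sock.h>

TRACEPOINT_PROBE(sock, inet_sock_set_state)
{
    // the tracepoint fires for more than TCP, so filter as the tool does
    if (args->protocol != IPPROTO_TCP)
        return 0;
    bpf_trace_printk("TCP state %d -> %d\\n", args->oldstate, args->newstate);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing TCP state changes... Hit Ctrl-C to end.")
b.trace_print()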