Commit 799cda9b authored by Dror Kronstein, committed by GitHub

Merge branch 'master' into master

parents 86ec63fc ba404cfe
...@@ -68,7 +68,12 @@ if(NOT DEFINED BCC_KERNEL_MODULES_SUFFIX) ...@@ -68,7 +68,12 @@ if(NOT DEFINED BCC_KERNEL_MODULES_SUFFIX)
endif() endif()
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall") set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall") # iterate over all available directories in LLVM_INCLUDE_DIRS to
# generate a correctly tokenized list of parameters
foreach(ONE_LLVM_INCLUDE_DIR ${LLVM_INCLUDE_DIRS})
set(CXX_ISYSTEM_DIRS "${CXX_ISYSTEM_DIRS} -isystem ${ONE_LLVM_INCLUDE_DIR}")
endforeach()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall ${CXX_ISYSTEM_DIRS}")
endif() endif()
add_subdirectory(examples) add_subdirectory(examples)
......
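The `foreach()` loop in the hunk above turns a CMake list (semicolon-separated) into one `-isystem <dir>` pair per entry, which is presumably what the comment means by "correctly tokenized". The following is a minimal C sketch of that string-building step only. `build_isystem_flags` is a hypothetical helper invented for illustration; it is not part of bcc or CMake.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch only: mimics the foreach() loop. Takes a CMake-style
 * semicolon-separated list and appends " -isystem <dir>" for each entry.
 * build_isystem_flags is a hypothetical name, not a bcc or CMake function.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int build_isystem_flags(const char *dirs, char *out, size_t out_len) {
  size_t used = 0;

  if (out_len == 0)
    return -1;
  out[0] = '\0';

  while (*dirs) {
    size_t len = strcspn(dirs, ";"); /* length of the current list entry */

    if (len > 0) {
      int n = snprintf(out + used, out_len - used, " -isystem %.*s",
                       (int)len, dirs);
      if (n < 0 || (size_t)n >= out_len - used)
        return -1; /* output would be truncated */
      used += (size_t)n;
    }
    dirs += len;
    if (*dirs == ';')
      dirs++; /* skip the list separator */
  }
  return 0;
}
```

Called with a two-entry list, the helper emits both directories, each with its own `-isystem`. Pasting the raw list after a single `-isystem` would give the flag to the first directory only.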
...@@ -15,7 +15,7 @@ More detail for each below. ...@@ -15,7 +15,7 @@ More detail for each below.
## Examples ## Examples
These are grouped into subdirectories (networking, tracing). Your example can either be a Python program with embedded C (eg, tracing/strlen_count.py), or separate Python and C files (eg, tracing/bitehist.*). These are grouped into subdirectories (networking, tracing). Your example can either be a Python program with embedded C (eg, tracing/strlen_count.py), or separate Python and C files (eg, tracing/vfsreadlat.*).
As said earlier: keep it short, neat, and documented (code comments). As said earlier: keep it short, neat, and documented (code comments).
......
...@@ -120,6 +120,7 @@ Examples: ...@@ -120,6 +120,7 @@ Examples:
- tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt). - tools/[tcpconnect](tools/tcpconnect.py): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt).
- tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt). - tools/[tcpconnlat](tools/tcpconnlat.py): Trace TCP active connection latency (connect()). [Examples](tools/tcpconnlat_example.txt).
- tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt). - tools/[tcpretrans](tools/tcpretrans.py): Trace TCP retransmits and TLPs. [Examples](tools/tcpretrans_example.txt).
- tools/[tcptop](tools/tcptop.py): Summarize TCP send/recv throughput by host. Top for TCP. [Examples](tools/tcptop_example.txt).
- tools/[tplist](tools/tplist.py): Display kernel tracepoints or USDT probes and their formats. [Examples](tools/tplist_example.txt). - tools/[tplist](tools/tplist.py): Display kernel tracepoints or USDT probes and their formats. [Examples](tools/tplist_example.txt).
- tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt) - tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt)
- tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt). - tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
......
# BPF Features by Linux Kernel Version # BPF Features by Linux Kernel Version
Major milestone releases: 4.1, 4.4. ## eBPF support
## 3.18 Kernel version | Commit
---------------|-------
3.15 | [bd4cf0ed331a](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8)
- bpf syscall. ## JIT compiling
## 3.19 Feature / Architecture | Kernel version | Commit
-----------------------|----------------|-------
x86\_64 | 3.16 | [622582786c9e](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=622582786c9e041d0bd52bde201787adeab249f8)
ARM64 | 3.18 | [e54bcde3d69d](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e54bcde3d69d40023ae77727213d14f920eb264a)
s390 | 4.1 | [054623105728](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=054623105728b06852f077299e2bf1bf3d5f2b0b)
Constant blinding for JIT machines | 4.7 | [4f3446bb809f](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4f3446bb809f20ad56cadf712e6006815ae7a8f9)
PowerPC64 | 4.8 | [156d0e290e96](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=156d0e290e969caba25f1851c52417c14d141b24)
- socket support: bpf can attach to sockets. ## Main features
## 4.1 Feature | Kernel version | Commit
--------|----------------|-------
`AF_PACKET` (libpcap/tcpdump, `cls_bpf` classifier, netfilter's `xt_bpf`, team driver's load-balancing mode…) | 3.15 | [bd4cf0ed331a](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8)
Kernel helpers | 3.15 | [bd4cf0ed331a](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8)
`bpf()` syscall | 3.18 | [99c55f7d47c0](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=99c55f7d47c0dc6fc64729f37bf435abf43f4c60)
Tables (_a.k.a._ Maps; details below) | 3.18 | [99c55f7d47c0](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=99c55f7d47c0dc6fc64729f37bf435abf43f4c60)
BPF attached to sockets | 3.19 | [89aa075832b0](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=89aa075832b0da4402acebd698d0411dcc82d03e)
BPF attached to `kprobes` | 4.1 | [2541517c32be](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2541517c32be2531e0da59dfd7efc1ce844644f5)
`cls_bpf` / `act_bpf` for `tc` | 4.1 | [e2e9b6541dd4](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e2e9b6541dd4b31848079da80fe2253daaafb549)
Tail calls | 4.2 | [04fd61ab36ec](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=04fd61ab36ec065e194ab5e74ae34a5240d992bb)
Non-root programs on sockets | 4.4 | [1be7f75d1668](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1be7f75d1668d6296b80bf35dcf6762393530afc)
Persistent maps and programs (virtual FS) | 4.4 | [b2197755b263](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b2197755b2633e164a439682fb05a9b5ea48f706)
`tc`'s `direct_action` (`da`) mode | 4.4 | [045efa82ff56](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=045efa82ff563cd4e656ca1c2e354fa5bf6bbda4)
`tc`'s `clsact` qdisc | 4.5 | [1f211a1b929c](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1f211a1b929c804100e138c5d3d656992cfd5622)
BPF attached to tracepoints | 4.7 | [98b5c2c65c29](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=98b5c2c65c2951772a8fc661f50d675e450e8bce)
Direct packet access | 4.7 | [969bf05eb3ce](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=969bf05eb3cedd5a8d4b7c346a85c2ede87a6d6d)
XDP (see below) | 4.8 | [6a773a15a1e8](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=6a773a15a1e8874e5eccd2f29190c31085912c95)
BPF attached to perf events | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=0515e5999a466dfe6e1924f460da599bb6821487) | [](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=)
- kprobe support: BPF programs can now instrument any kernel function via kernel dynamic tracing. ## Tables (_a.k.a._ Maps)
## 4.3 Table type | Kernel version | Commit
-----------|----------------|-------
Hash | 3.19 | [0f8e4bd8a1fc](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0f8e4bd8a1fc8c4185f1630061d0a1f2d197a475)
Array | 3.19 | [28fbcfa08d8e](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=28fbcfa08d8ed7c5a50d41a0433aad222835e8e3)
Tail call (`PROG_ARRAY`) | 4.2 | [04fd61ab36ec](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=04fd61ab36ec065e194ab5e74ae34a5240d992bb)
Perf events | 4.3 | [ea317b267e9d](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ea317b267e9d03a8241893aa176fba7661d07579)
Per-CPU hash | 4.6 | [824bd0ce6c7c](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=824bd0ce6c7c43a9e1e210abf124958e54d88342)
Per-CPU array | 4.6 | [a10423b87a7e](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a10423b87a7eae75da79ce80a8d9475047a674ee)
Stack trace | 4.6 | [d5a3b1f69186](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d5a3b1f691865be576c2bffa708549b8cdccda19)
cgroup array | 4.8 | [4ed8ec521ed5](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4ed8ec521ed57c4e207ad464ca0388776de74d4b)
Text string | _To be done?_ |
Variable-length maps | _To be done?_ |
- debug string support: bpf_trace_printk() supports strings. ## XDP
## 4.4 Feature / Driver | Kernel version | Commit
-----------------|----------------|-------
XDP core architecture | 4.8 | [6a773a15a1e8](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=6a773a15a1e8874e5eccd2f29190c31085912c95)
Action: drop | 4.8 | [6a773a15a1e8](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=6a773a15a1e8874e5eccd2f29190c31085912c95)
Action: pass on to stack | 4.8 | [6a773a15a1e8](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=6a773a15a1e8874e5eccd2f29190c31085912c95)
Action: direct forwarding (on same port) | 4.8 | [6ce96ca348a9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=6ce96ca348a9e949f8c43f4d3e98db367d93cffd)
Direct packet data write | 4.8 | [4acf6c0b84c9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=4acf6c0b84c91243c705303cd9ff16421914150d)
Mellanox `mlx4` driver | 4.8 | [47a38e155037](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=47a38e155037f417c5740e24ccae6482aedf4b68)
Mellanox `mlx5` driver | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=86994156c736978d113e7927455d4eeeb2128b9f) | [](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=)
`e1000` driver | ? | [](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=)
- bpf_perf_event_output: used by many tools that print per-event output. Eg, opensnoop. ## Helpers
- unprivileged BPF for sockets: non-root usage for socket-based programs.
## 4.6 Alphabetical order
- stack traces (BPF_MAP_TYPE_STACK_TRACE): for capturing stack traces as keys in maps. Eg, stackcount. Helper | Kernel version | Commit
-------|----------------|-------
## 4.7 `BPF_FUNC_clone_redirect()` | 4.2 | [3896d655f4d4](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8)
`BPF_FUNC_csum_diff()` | 4.6 | [7d672345ed29](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7d672345ed295b1356a5d9f7111da1d1d7d65867)
- tracepoint support (BPF_PROG_TYPE_TRACEPOINT): BPF programs can now use static kernel tracepoints. `BPF_FUNC_csum_update()` | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=36bbef52c7eb646ed6247055a2acd3851e317857) | [](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=)
`BPF_FUNC_current_task_under_cgroup()` | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=60d20f9195b260bdf0ac10c275ae9f6016f9c069) | [](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=)
`BPF_FUNC_get_cgroup_classid()` | 4.3 | [8d20aabe1c76](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8d20aabe1c76cccac544d9fcc3ad7823d9e98a2d)
`BPF_FUNC_get_current_comm()` | 4.2 | [ffeedafbf023](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89)
`BPF_FUNC_get_current_pid_tgid()` | 4.2 | [ffeedafbf023](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89)
`BPF_FUNC_get_current_task()` | 4.8 | [606274c5abd8](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=606274c5abd8e245add01bc7145a8cbb92b69ba8)
`BPF_FUNC_get_current_uid_gid()` | 4.2 | [ffeedafbf023](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89)
`BPF_FUNC_get_hash_recalc()` | 4.8 | [13c5c240f789](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=13c5c240f789bbd2bcacb14a23771491485ae61f)
`BPF_FUNC_get_prandom_u32()` | 4.1 | [03e69b508b6f](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=03e69b508b6f7c51743055c9f61d1dfeadf4b635)
`BPF_FUNC_get_route_realm()` | 4.4 | [c46646d0484f](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c46646d0484f5d08e2bede9b45034ba5b8b489cc)
`BPF_FUNC_get_smp_processor_id()` | 4.1 | [c04167ce2ca0](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c04167ce2ca0ecaeaafef006cb0d65cf01b68e42)
`BPF_FUNC_get_stackid()` | 4.6 | [d5a3b1f69186](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d5a3b1f691865be576c2bffa708549b8cdccda19)
`BPF_FUNC_ktime_get_ns()` | 4.1 | [d9847d310ab4](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d9847d310ab4003725e6ed1822682e24bd406908)
`BPF_FUNC_l3_csum_replace()` | 4.1 | [91bc4822c3d6](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=91bc4822c3d61b9bb7ef66d3b77948a4f9177954)
`BPF_FUNC_l4_csum_replace()` | 4.1 | [91bc4822c3d6](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=91bc4822c3d61b9bb7ef66d3b77948a4f9177954)
`BPF_FUNC_map_delete_elem()` | 3.19 | [d0003ec01c66](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d0003ec01c667b731c139e23de3306a8b328ccf5)
`BPF_FUNC_map_lookup_elem()` | 3.19 | [d0003ec01c66](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d0003ec01c667b731c139e23de3306a8b328ccf5)
`BPF_FUNC_map_update_elem()` | 3.19 | [d0003ec01c66](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d0003ec01c667b731c139e23de3306a8b328ccf5)
`BPF_FUNC_perf_event_output()` | 4.4 | [a43eec304259](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a43eec304259a6c637f4014a6d4767159b6a3aa3)
`BPF_FUNC_perf_event_read()` | 4.3 | [35578d798400](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=35578d7984003097af2b1e34502bc943d40c1804)
`BPF_FUNC_probe_read()` | 4.1 | [2541517c32be](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2541517c32be2531e0da59dfd7efc1ce844644f5)
`BPF_FUNC_probe_write_user()` | 4.8 | [96ae52279594](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=96ae52279594470622ff0585621a13e96b700600)
`BPF_FUNC_redirect()` | 4.4 | [27b29f63058d](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6)
`BPF_FUNC_set_hash_invalid()` | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=7a4b28c6cc9ffac50f791b99cc7e46106436e5d8) | [](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=)
`BPF_FUNC_skb_change_proto()` | 4.8 | [6578171a7ff0](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6578171a7ff0c31dc73258f93da7407510abf085)
`BPF_FUNC_skb_change_tail()` | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=5293efe62df81908f2e90c9820c7edcc8e61f5e9) | [](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=)
`BPF_FUNC_skb_change_type()` | 4.8 | [d2485c4242a8](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d2485c4242a826fdf493fd3a27b8b792965b9b9e)
`BPF_FUNC_skb_get_tunnel_key()` | 4.3 | [d3aa45ce6b94](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3aa45ce6b94c65b83971257317867db13e5f492)
`BPF_FUNC_skb_get_tunnel_opt()` | 4.6 | [14ca0751c96f](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14ca0751c96f8d3d0f52e8ed3b3236f8b34d3460)
`BPF_FUNC_skb_load_bytes()` | 4.5 | [05c74e5e53f6](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=05c74e5e53f6cb07502c3e6a820f33e2777b6605)
`BPF_FUNC_skb_pull_data()` | [4.9](https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=36bbef52c7eb646ed6247055a2acd3851e317857) | [](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=)
`BPF_FUNC_skb_set_tunnel_key()` | 4.3 | [d3aa45ce6b94](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3aa45ce6b94c65b83971257317867db13e5f492)
`BPF_FUNC_skb_set_tunnel_opt()` | 4.6 | [14ca0751c96f](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14ca0751c96f8d3d0f52e8ed3b3236f8b34d3460)
`BPF_FUNC_skb_store_bytes()` | 4.1 | [91bc4822c3d6](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=91bc4822c3d61b9bb7ef66d3b77948a4f9177954)
`BPF_FUNC_skb_under_cgroup()` | 4.8 | [4a482f34afcc](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4a482f34afcc162d8456f449b137ec2a95be60d8)
`BPF_FUNC_skb_vlan_pop()` | 4.3 | [4e10df9a60d9](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4e10df9a60d96ced321dd2af71da558c6b750078)
`BPF_FUNC_skb_vlan_push()` | 4.3 | [4e10df9a60d9](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4e10df9a60d96ced321dd2af71da558c6b750078)
`BPF_FUNC_tail_call()` | 4.2 | [04fd61ab36ec](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=04fd61ab36ec065e194ab5e74ae34a5240d992bb)
`BPF_FUNC_trace_printk()` | 4.1 | [9c959c863f82](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9c959c863f8217a2ff3d7c296e8223654d240569)
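One way to use the Helpers table is as a lookup: given a kernel version, is a helper expected to exist? The sketch below encodes a handful of rows copied from the table. The names `KVER`, `helper_reqs` and `helper_available` are hypothetical, and the version encoding merely imitates the kernel's `KERNEL_VERSION` macro. Distribution kernels may backport features, so treat the answer as a hint rather than a guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Encode major.minor like the kernel's KERNEL_VERSION macro, patch level 0. */
#define KVER(major, minor) (((major) << 16) | ((minor) << 8))

struct helper_req {
  const char *name;      /* helper name without the BPF_FUNC_ prefix */
  unsigned int min_kver; /* first mainline kernel listed in the table */
};

/* A few rows copied from the Helpers table; the selection is illustrative. */
static const struct helper_req helper_reqs[] = {
  {"map_lookup_elem",   KVER(3, 19)},
  {"probe_read",        KVER(4, 1)},
  {"perf_event_output", KVER(4, 4)},
  {"get_stackid",       KVER(4, 6)},
  {"get_current_task",  KVER(4, 8)},
};

/*
 * Returns 1 if the helper is listed for running_kver or earlier,
 * 0 if the kernel is older than the listed version,
 * and -1 if the helper is not in this (partial) table.
 */
static int helper_available(const char *name, unsigned int running_kver) {
  size_t n = sizeof(helper_reqs) / sizeof(helper_reqs[0]);

  for (size_t i = 0; i < n; i++) {
    if (strcmp(helper_reqs[i].name, name) == 0)
      return running_kver >= helper_reqs[i].min_kver ? 1 : 0;
  }
  return -1;
}
```

For example, `helper_available("get_stackid", KVER(4, 4))` reports the helper as missing, which matches the table: stack traces arrive in 4.6.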
...@@ -29,7 +29,8 @@ int handle_ingress(struct __sk_buff *skb) { ...@@ -29,7 +29,8 @@ int handle_ingress(struct __sk_buff *skb) {
struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet)); struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
struct bpf_tunnel_key tkey = {}; struct bpf_tunnel_key tkey = {};
bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_skb_get_tunnel_key(skb, &tkey,
offsetof(struct bpf_tunnel_key, remote_ipv6[1]), 0);
int *ifindex = vni2if.lookup(&tkey.tunnel_id); int *ifindex = vni2if.lookup(&tkey.tunnel_id);
if (ifindex) { if (ifindex) {
...@@ -63,7 +64,8 @@ int handle_egress(struct __sk_buff *skb) { ...@@ -63,7 +64,8 @@ int handle_egress(struct __sk_buff *skb) {
u32 zero = 0; u32 zero = 0;
tkey.tunnel_id = dst_host->tunnel_id; tkey.tunnel_id = dst_host->tunnel_id;
tkey.remote_ipv4 = dst_host->remote_ipv4; tkey.remote_ipv4 = dst_host->remote_ipv4;
bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_skb_set_tunnel_key(skb, &tkey,
offsetof(struct bpf_tunnel_key, remote_ipv6[1]), 0);
lock_xadd(&dst_host->tx_pkts, 1); lock_xadd(&dst_host->tx_pkts, 1);
} else { } else {
struct bpf_tunnel_key tkey = {}; struct bpf_tunnel_key tkey = {};
...@@ -73,7 +75,8 @@ int handle_egress(struct __sk_buff *skb) { ...@@ -73,7 +75,8 @@ int handle_egress(struct __sk_buff *skb) {
return 1; return 1;
tkey.tunnel_id = dst_host->tunnel_id; tkey.tunnel_id = dst_host->tunnel_id;
tkey.remote_ipv4 = dst_host->remote_ipv4; tkey.remote_ipv4 = dst_host->remote_ipv4;
bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_skb_set_tunnel_key(skb, &tkey,
offsetof(struct bpf_tunnel_key, remote_ipv6[1]), 0);
} }
bpf_clone_redirect(skb, cfg->tunnel_ifindex, 0/*egress*/); bpf_clone_redirect(skb, cfg->tunnel_ifindex, 0/*egress*/);
return 1; return 1;
......
...@@ -19,7 +19,8 @@ BPF_TABLE("hash", int, struct tunnel_key, if2tunkey, 1024); ...@@ -19,7 +19,8 @@ BPF_TABLE("hash", int, struct tunnel_key, if2tunkey, 1024);
int handle_ingress(struct __sk_buff *skb) { int handle_ingress(struct __sk_buff *skb) {
struct bpf_tunnel_key tkey = {}; struct bpf_tunnel_key tkey = {};
struct tunnel_key key; struct tunnel_key key;
bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_skb_get_tunnel_key(skb, &tkey,
offsetof(struct bpf_tunnel_key, remote_ipv6[1]), 0);
key.tunnel_id = tkey.tunnel_id; key.tunnel_id = tkey.tunnel_id;
key.remote_ipv4 = tkey.remote_ipv4; key.remote_ipv4 = tkey.remote_ipv4;
...@@ -57,7 +58,8 @@ int handle_egress(struct __sk_buff *skb) { ...@@ -57,7 +58,8 @@ int handle_egress(struct __sk_buff *skb) {
if (key_p) { if (key_p) {
tkey.tunnel_id = key_p->tunnel_id; tkey.tunnel_id = key_p->tunnel_id;
tkey.remote_ipv4 = key_p->remote_ipv4; tkey.remote_ipv4 = key_p->remote_ipv4;
bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_skb_set_tunnel_key(skb, &tkey,
offsetof(struct bpf_tunnel_key, remote_ipv6[1]), 0);
bpf_clone_redirect(skb, cfg->tunnel_ifindex, 0/*egress*/); bpf_clone_redirect(skb, cfg->tunnel_ifindex, 0/*egress*/);
} }
return 1; return 1;
......
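The tunnel hunks above replace `sizeof(tkey)` with `offsetof(struct bpf_tunnel_key, remote_ipv6[1])` as the size argument. The commit does not say why; one plausible reading is that the examples only fill `tunnel_id` and `remote_ipv4`, so they pass a size covering just those leading fields instead of the whole struct. The sketch below shows the difference between the two expressions. `struct tunnel_key_sketch` is a hypothetical stand-in whose layout is an assumption; check `linux/bpf.h` on the target kernel for the real definition.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct bpf_tunnel_key. The field layout is an
 * assumption made for illustration and may differ between kernel versions.
 */
struct tunnel_key_sketch {
  uint32_t tunnel_id;
  union {
    uint32_t remote_ipv4;
    uint32_t remote_ipv6[4];
  };
  uint8_t  tunnel_tos;
  uint8_t  tunnel_ttl;
  uint16_t tunnel_ext;
  uint32_t tunnel_label;
};

/* Size used by the updated examples: everything before remote_ipv6[1]. */
static size_t compat_key_size(void) {
  return offsetof(struct tunnel_key_sketch, remote_ipv6[1]);
}

/* Size used by the old examples: the whole struct. */
static size_t full_key_size(void) {
  return sizeof(struct tunnel_key_sketch);
}
```

With this layout the shorter size covers `tunnel_id` plus the first 32 bits of the address union, which is exactly where `remote_ipv4` lives. It also stays the same if fields are appended to the struct later, whereas `sizeof` grows with it.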
images/bcc_tracing_tools_2016.png (binary image changed: 265 KB → 254 KB)
...@@ -92,10 +92,9 @@ The expression(s) to capture. ...@@ -92,10 +92,9 @@ The expression(s) to capture.
These are the values that are assigned to the histogram or raw event collection. These are the values that are assigned to the histogram or raw event collection.
You may use the parameters directly, or valid C expressions that involve the You may use the parameters directly, or valid C expressions that involve the
parameters, such as "size % 10". parameters, such as "size % 10".
Tracepoints may access a special structure called "tp" that is formatted according Tracepoints may access a special structure called "args" that is formatted
to the tracepoint format (which you can obtain using tplist). For example, the according to the tracepoint format (which you can obtain using tplist).
block:block_rq_complete tracepoint can access tp.nr_sector. You may also use the For example, the block:block_rq_complete tracepoint can access args->nr_sector.
members of the "tp" struct directly, e.g. "nr_sector" instead of "tp.nr_sector".
USDT probes may access the arguments defined by the tracing program in the USDT probes may access the arguments defined by the tracing program in the
special arg1, arg2, ... variables. To obtain their types, use the tplist tool. special arg1, arg2, ... variables. To obtain their types, use the tplist tool.
Return probes can use the argument values received by the Return probes can use the argument values received by the
......
...@@ -9,9 +9,9 @@ on who deleted the file, the file age, and the file name. The intent is to ...@@ -9,9 +9,9 @@ on who deleted the file, the file age, and the file name. The intent is to
provide information on short-lived files, for debugging or performance provide information on short-lived files, for debugging or performance
analysis. analysis.
This works by tracing the kernel vfs_create() and vfs_delete() functions using This works by tracing the kernel vfs_create() and vfs_delete() functions (and
dynamic tracing, and will need updating to match any changes to these maybe more, see the source) using dynamic tracing, and will need updating to
functions. match any changes to these functions.
This makes use of a Linux 4.5 feature (bpf_perf_event_output()); This makes use of a Linux 4.5 feature (bpf_perf_event_output());
for kernels older than 4.5, see the version under tools/old, for kernels older than 4.5, see the version under tools/old,
......
.TH tcptop 8 "2016-09-13" "USER COMMANDS"
.SH NAME
tcptop \- Summarize TCP send/recv throughput by host. Top for TCP.
.SH SYNOPSIS
.B tcptop [\-h] [\-C] [\-S] [\-p PID] [interval] [count]
.SH DESCRIPTION
This is top for TCP sessions.
This summarizes TCP send/receive Kbytes by host, and prints a summary that
refreshes, along with other system-wide metrics.
This uses dynamic tracing of kernel TCP send/receive functions, and will
need to be updated to match kernel changes.
The traced TCP functions are usually called at a lower rate than
per-packet functions, and therefore have lower overhead. The traced data is
summarized in-kernel using a BPF map to further reduce overhead. At very high
TCP event rates, the overhead may still be measurable. See the OVERHEAD
section for more details.
Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-h
Print USAGE message.
.TP
\-C
Don't clear the screen.
.TP
\-S
Don't print the system summary line (load averages).
.TP
\-p PID
Trace this PID only.
.TP
interval
Interval between updates, seconds (default 1).
.TP
count
Number of interval summaries (default is many).
.SH EXAMPLES
.TP
Summarize TCP throughput by active sessions, 1 second refresh:
#
.B tcptop
.TP
Don't clear the screen (rolling output), and 5 second summaries:
#
.B tcptop \-C 5
.TP
Trace PID 181 only, and don't clear the screen:
#
.B tcptop \-Cp 181
.SH FIELDS
.TP
loadavg:
The contents of /proc/loadavg
.TP
PID
Process ID.
.TP
COMM
Process name.
.TP
LADDR
Local address (IPv4), and TCP port
.TP
RADDR
Remote address (IPv4), and TCP port
.TP
LADDR6
Local address (IPv6), and TCP port
.TP
RADDR6
Remote address (IPv6), and TCP port
.TP
RX_KB
Received Kbytes
.TP
TX_KB
Transmitted Kbytes
.SH OVERHEAD
This traces all send/receives in TCP, high in the TCP/IP stack (close to the
application) which are usually called at a lower rate than per-packet
functions, lowering overhead. It also summarizes data in-kernel to further
reduce overhead. These techniques help, but there may still be measurable
overhead at high send/receive rates, eg, ~13% of one CPU at 100k events/sec.
Use funccount to count the kprobes in the tool to find out this rate, as the
overhead is relative to the rate. Some sample production servers tested found
total TCP event rates of 4k to 15k per second, and the CPU overhead at these
rates ranged from 0.5% to 2.0% of one CPU. If your send/receive rate is low
(eg, <1000/sec) then the overhead is expected to be negligible. Test in a lab
environment first.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH INSPIRATION
top(1) by William LeFebvre
.SH SEE ALSO
tcpconnect(8), tcpaccept(8)
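The OVERHEAD section of the tcptop man page gives one calibration point (about 13% of one CPU at 100k events/sec) and says the overhead is relative to the event rate. Assuming the relationship is roughly linear, which the man page does not state, a back-of-the-envelope estimate can be sketched as follows. `estimate_overhead_pct` is a hypothetical helper, not part of bcc.

```c
/*
 * Rough linear estimate of tcptop CPU overhead, in percent of one CPU.
 * Calibration point taken from the man page: ~13% at 100,000 events/sec.
 * Linearity is an assumption of this sketch; measure on your own system.
 */
static double estimate_overhead_pct(double events_per_sec) {
  const double calib_pct = 13.0;
  const double calib_rate = 100000.0;

  return events_per_sec * calib_pct / calib_rate;
}
```

Plugging in the production rates quoted in the man page, 4k events/sec gives about 0.52% and 15k events/sec gives about 1.95%, in line with the 0.5% to 2.0% range reported there.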
...@@ -93,11 +93,9 @@ format specifier replacements may be any C expressions, and may refer to the ...@@ -93,11 +93,9 @@ format specifier replacements may be any C expressions, and may refer to the
same special keywords as in the predicate (arg1, arg2, etc.). same special keywords as in the predicate (arg1, arg2, etc.).
In tracepoints, both the predicate and the arguments may refer to the tracepoint In tracepoints, both the predicate and the arguments may refer to the tracepoint
format structure, which is stored in the special "tp" variable. For example, the format structure, which is stored in the special "args" variable. For example, the
block:block_rq_complete tracepoint can print or filter by tp.nr_sector. To block:block_rq_complete tracepoint can print or filter by args->nr_sector. To
discover the format of your tracepoint, use the tplist tool. Note that you can discover the format of your tracepoint, use the tplist tool.
also use the members of the "tp" struct directly, e.g "nr_sector" instead of
"tp.nr_sector".
In USDT probes, the arg1, ..., argN variables refer to the probe's arguments. In USDT probes, the arg1, ..., argN variables refer to the probe's arguments.
To determine which arguments your probe has, use the tplist tool. To determine which arguments your probe has, use the tplist tool.
...@@ -126,7 +124,7 @@ Trace returns from the readline function in bash and print the return value as a ...@@ -126,7 +124,7 @@ Trace returns from the readline function in bash and print the return value as a
.TP .TP
Trace the block:block_rq_complete tracepoint and print the number of sectors completed: Trace the block:block_rq_complete tracepoint and print the number of sectors completed:
# #
.B trace 't:block:block_rq_complete """%d sectors"", nr_sector' .B trace 't:block:block_rq_complete """%d sectors"", args->nr_sector'
.TP .TP
Trace the pthread_create USDT probe from the pthread library and print the address of the thread's start function: Trace the pthread_create USDT probe from the pthread library and print the address of the thread's start function:
# #
......
...@@ -63,7 +63,7 @@ target_link_libraries(bcc-static b_frontend clang_frontend bcc-loader-static ${c ...@@ -63,7 +63,7 @@ target_link_libraries(bcc-static b_frontend clang_frontend bcc-loader-static ${c
install(TARGETS bcc-shared LIBRARY COMPONENT libbcc install(TARGETS bcc-shared LIBRARY COMPONENT libbcc
DESTINATION ${CMAKE_INSTALL_LIBDIR}) DESTINATION ${CMAKE_INSTALL_LIBDIR})
install(FILES bpf_common.h bpf_module.h bcc_syms.h ../libbpf.h COMPONENT libbcc install(FILES bpf_common.h bpf_module.h bcc_syms.h libbpf.h COMPONENT libbcc
DESTINATION include/bcc) DESTINATION include/bcc)
install(DIRECTORY compat/linux/ COMPONENT libbcc install(DIRECTORY compat/linux/ COMPONENT libbcc
DESTINATION include/bcc/compat/linux DESTINATION include/bcc/compat/linux
......
...@@ -26,8 +26,23 @@ void *bcc_usdt_new_frompid(int pid); ...@@ -26,8 +26,23 @@ void *bcc_usdt_new_frompid(int pid);
void *bcc_usdt_new_frompath(const char *path); void *bcc_usdt_new_frompath(const char *path);
void bcc_usdt_close(void *usdt); void bcc_usdt_close(void *usdt);
struct bcc_usdt {
const char *provider;
const char *name;
const char *bin_path;
uint64_t semaphore;
int num_locations;
int num_arguments;
};
typedef void (*bcc_usdt_cb)(struct bcc_usdt *);
void bcc_usdt_foreach(void *usdt, bcc_usdt_cb callback);
int bcc_usdt_enable_probe(void *, const char *, const char *); int bcc_usdt_enable_probe(void *, const char *, const char *);
const char *bcc_usdt_genargs(void *); const char *bcc_usdt_genargs(void *);
const char *bcc_usdt_get_probe_argctype(
void *ctx, const char* probe_name, const int arg_index
);
typedef void (*bcc_usdt_uprobe_cb)(const char *, const char *, uint64_t, int); typedef void (*bcc_usdt_uprobe_cb)(const char *, const char *, uint64_t, int);
void bcc_usdt_foreach_uprobe(void *usdt, bcc_usdt_uprobe_cb callback); void bcc_usdt_foreach_uprobe(void *usdt, bcc_usdt_uprobe_cb callback);
......
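The new `bcc_usdt_foreach()` entry point takes a `bcc_usdt_cb` callback and hands it one `struct bcc_usdt` per probe. The sketch below shows what such a callback might look like. The struct and typedef are copied from the header hunk above; `fake_usdt_foreach`, `count_and_print` and the two counters are hypothetical names, and the stand-in iterator exists only so the sketch runs without linking against libbcc.

```c
#include <stdint.h>
#include <stdio.h>

/* Copied from the bcc_usdt.h hunk above. */
struct bcc_usdt {
  const char *provider;
  const char *name;
  const char *bin_path;
  uint64_t semaphore;
  int num_locations;
  int num_arguments;
};
typedef void (*bcc_usdt_cb)(struct bcc_usdt *);

/* The callback type carries no user pointer, so state lives in globals. */
static int probes_seen;
static int total_locations;

/* Example callback: print one line per probe and keep running totals. */
static void count_and_print(struct bcc_usdt *probe) {
  probes_seen++;
  total_locations += probe->num_locations;
  printf("%s:%s in %s (%d locations, %d arguments)\n", probe->provider,
         probe->name, probe->bin_path, probe->num_locations,
         probe->num_arguments);
}

/*
 * Hypothetical stand-in for bcc_usdt_foreach(): walks a caller-supplied
 * array instead of a real USDT context. Not part of bcc.
 */
static void fake_usdt_foreach(struct bcc_usdt *probes, int n, bcc_usdt_cb cb) {
  for (int i = 0; i < n; i++)
    cb(&probes[i]);
}
```

With the real library the call would presumably be `bcc_usdt_foreach(ctx, count_and_print)`, where `ctx` comes from `bcc_usdt_new_frompid()` or `bcc_usdt_new_frompath()`. Since `Context::each` fills a local `struct bcc_usdt` for every probe, a callback should copy any strings it needs rather than keep the pointers.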
...@@ -13,8 +13,8 @@ ...@@ -13,8 +13,8 @@
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
#include "cc/bpf_module.h" #include "bpf_common.h"
#include "cc/bpf_common.h" #include "bpf_module.h"
extern "C" { extern "C" {
void * bpf_module_create_b(const char *filename, const char *proto_filename, unsigned flags) { void * bpf_module_create_b(const char *filename, const char *proto_filename, unsigned flags) {
......
...@@ -24,6 +24,7 @@ ...@@ -24,6 +24,7 @@
#include "bcc_proc.h" #include "bcc_proc.h"
#include "usdt.h" #include "usdt.h"
#include "vendor/tinyformat.hpp" #include "vendor/tinyformat.hpp"
#include "bcc_usdt.h"
namespace USDT { namespace USDT {
...@@ -255,6 +256,19 @@ bool Context::enable_probe(const std::string &probe_name, ...@@ -255,6 +256,19 @@ bool Context::enable_probe(const std::string &probe_name,
return p && p->enable(fn_name); return p && p->enable(fn_name);
} }
void Context::each(each_cb callback) {
for (const auto &probe : probes_) {
struct bcc_usdt info = {0};
info.provider = probe->provider().c_str();
info.bin_path = probe->bin_path().c_str();
info.name = probe->name().c_str();
info.semaphore = probe->semaphore();
info.num_locations = probe->num_locations();
info.num_arguments = probe->num_arguments();
callback(&info);
}
}
void Context::each_uprobe(each_uprobe_cb callback) { void Context::each_uprobe(each_uprobe_cb callback) {
for (auto &p : probes_) { for (auto &p : probes_) {
if (!p->enabled()) if (!p->enabled())
...@@ -288,7 +302,6 @@ Context::~Context() { ...@@ -288,7 +302,6 @@ Context::~Context() {
} }
extern "C" { extern "C" {
#include "bcc_usdt.h"
void *bcc_usdt_new_frompid(int pid) { void *bcc_usdt_new_frompid(int pid) {
USDT::Context *ctx = new USDT::Context(pid); USDT::Context *ctx = new USDT::Context(pid);
...@@ -331,6 +344,19 @@ const char *bcc_usdt_genargs(void *usdt) { ...@@ -331,6 +344,19 @@ const char *bcc_usdt_genargs(void *usdt) {
return storage_.c_str(); return storage_.c_str();
} }
const char *bcc_usdt_get_probe_argctype(
void *ctx, const char* probe_name, const int arg_index
) {
USDT::Probe *p = static_cast<USDT::Context *>(ctx)->get(probe_name);
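// Static storage keeps the buffer alive after return; a local std::string
// would be destroyed here and leave the caller with a dangling pointer.
static std::string res;
res = p ? p->get_arg_ctype(arg_index) : "";
return res.c_str();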
}
void bcc_usdt_foreach(void *usdt, bcc_usdt_cb callback) {
USDT::Context *ctx = static_cast<USDT::Context *>(usdt);
ctx->each(callback);
}
void bcc_usdt_foreach_uprobe(void *usdt, bcc_usdt_uprobe_cb callback) { void bcc_usdt_foreach_uprobe(void *usdt, bcc_usdt_uprobe_cb callback) {
USDT::Context *ctx = static_cast<USDT::Context *>(usdt); USDT::Context *ctx = static_cast<USDT::Context *>(usdt);
ctx->each_uprobe(callback); ctx->each_uprobe(callback);
......
...@@ -23,6 +23,8 @@ ...@@ -23,6 +23,8 @@
#include "syms.h" #include "syms.h"
#include "vendor/optional.hpp" #include "vendor/optional.hpp"
struct bcc_usdt;
namespace USDT { namespace USDT {
using std::experimental::optional; using std::experimental::optional;
...@@ -148,9 +150,13 @@ public: ...@@ -148,9 +150,13 @@ public:
size_t num_locations() const { return locations_.size(); } size_t num_locations() const { return locations_.size(); }
size_t num_arguments() const { return locations_.front().arguments_.size(); } size_t num_arguments() const { return locations_.front().arguments_.size(); }
uint64_t semaphore() const { return semaphore_; }
uint64_t address(size_t n = 0) const { return locations_[n].address_; } uint64_t address(size_t n = 0) const { return locations_[n].address_; }
bool usdt_getarg(std::ostream &stream); bool usdt_getarg(std::ostream &stream);
std::string get_arg_ctype(int arg_index) {
return largest_arg_type(arg_index);
}
bool need_enable() const { return semaphore_ != 0x0; } bool need_enable() const { return semaphore_ != 0x0; }
bool enable(const std::string &fn_name); bool enable(const std::string &fn_name);
...@@ -194,6 +200,9 @@ public: ...@@ -194,6 +200,9 @@ public:
bool enable_probe(const std::string &probe_name, const std::string &fn_name); bool enable_probe(const std::string &probe_name, const std::string &fn_name);
bool generate_usdt_args(std::ostream &stream); bool generate_usdt_args(std::ostream &stream);
typedef void (*each_cb)(struct bcc_usdt *);
void each(each_cb callback);
typedef void (*each_uprobe_cb)(const char *, const char *, uint64_t, int); typedef void (*each_uprobe_cb)(const char *, const char *, uint64_t, int);
void each_uprobe(each_uprobe_cb callback); void each_uprobe(each_uprobe_cb callback);
}; };
......
...@@ -27,7 +27,6 @@ basestring = (unicode if sys.version_info[0] < 3 else str) ...@@ -27,7 +27,6 @@ basestring = (unicode if sys.version_info[0] < 3 else str)
from .libbcc import lib, _CB_TYPE, bcc_symbol from .libbcc import lib, _CB_TYPE, bcc_symbol
from .table import Table from .table import Table
from .tracepoint import Tracepoint
from .perf import Perf from .perf import Perf
from .usyms import ProcessSymbols from .usyms import ProcessSymbols
...@@ -120,9 +119,9 @@ class BPF(object): ...@@ -120,9 +119,9 @@ class BPF(object):
return filename return filename
@staticmethod @staticmethod
def _find_exe(cls, bin_path): def find_exe(bin_path):
""" """
_find_exe(bin_path) find_exe(bin_path)
Traverses the PATH environment variable, looking for the first Traverses the PATH environment variable, looking for the first
directory that contains an executable file named bin_path, and directory that contains an executable file named bin_path, and
...@@ -149,7 +148,7 @@ class BPF(object): ...@@ -149,7 +148,7 @@ class BPF(object):
return None return None
def __init__(self, src_file="", hdr_file="", text=None, cb=None, debug=0, def __init__(self, src_file="", hdr_file="", text=None, cb=None, debug=0,
cflags=[], usdt=None): cflags=[], usdt_contexts=[]):
"""Create a new BPF module with the given source code. """Create a new BPF module with the given source code.
Note: Note:
...@@ -179,7 +178,15 @@ class BPF(object): ...@@ -179,7 +178,15 @@ class BPF(object):
self.tables = {} self.tables = {}
cflags_array = (ct.c_char_p * len(cflags))() cflags_array = (ct.c_char_p * len(cflags))()
for i, s in enumerate(cflags): cflags_array[i] = s.encode("ascii") for i, s in enumerate(cflags): cflags_array[i] = s.encode("ascii")
if usdt and text: text = usdt.get_text() + text if text:
for usdt_context in usdt_contexts:
usdt_text = usdt_context.get_text()
if usdt_text is None:
raise Exception("can't generate USDT probe arguments; " +
"possible cause is missing pid when a " +
"probe in a shared object has multiple " +
"locations")
text = usdt_text + text
if text: if text:
self.module = lib.bpf_module_create_c_from_string(text.encode("ascii"), self.module = lib.bpf_module_create_c_from_string(text.encode("ascii"),
...@@ -197,7 +204,8 @@ class BPF(object): ...@@ -197,7 +204,8 @@ class BPF(object):
if not self.module: if not self.module:
raise Exception("Failed to compile BPF module %s" % src_file) raise Exception("Failed to compile BPF module %s" % src_file)
if usdt: usdt.attach_uprobes(self) for usdt_context in usdt_contexts:
usdt_context.attach_uprobes(self)
# If any "kprobe__" or "tracepoint__" prefixed functions were defined, # If any "kprobe__" or "tracepoint__" prefixed functions were defined,
# they will be loaded and attached here. # they will be loaded and attached here.
......
...@@ -157,7 +157,26 @@ lib.bcc_usdt_enable_probe.argtypes = [ct.c_void_p, ct.c_char_p, ct.c_char_p] ...@@ -157,7 +157,26 @@ lib.bcc_usdt_enable_probe.argtypes = [ct.c_void_p, ct.c_char_p, ct.c_char_p]
lib.bcc_usdt_genargs.restype = ct.c_char_p lib.bcc_usdt_genargs.restype = ct.c_char_p
lib.bcc_usdt_genargs.argtypes = [ct.c_void_p] lib.bcc_usdt_genargs.argtypes = [ct.c_void_p]
_USDT_CB = ct.CFUNCTYPE(None, ct.c_char_p, ct.c_char_p, ct.c_ulonglong, ct.c_int) lib.bcc_usdt_get_probe_argctype.restype = ct.c_char_p
lib.bcc_usdt_get_probe_argctype.argtypes = [ct.c_void_p, ct.c_char_p, ct.c_int]
class bcc_usdt(ct.Structure):
_fields_ = [
('provider', ct.c_char_p),
('name', ct.c_char_p),
('bin_path', ct.c_char_p),
('semaphore', ct.c_ulonglong),
('num_locations', ct.c_int),
('num_arguments', ct.c_int),
]
_USDT_CB = ct.CFUNCTYPE(None, ct.POINTER(bcc_usdt))
lib.bcc_usdt_foreach.restype = None
lib.bcc_usdt_foreach.argtypes = [ct.c_void_p, _USDT_CB]
_USDT_PROBE_CB = ct.CFUNCTYPE(None, ct.c_char_p, ct.c_char_p,
ct.c_ulonglong, ct.c_int)
lib.bcc_usdt_foreach_uprobe.restype = None lib.bcc_usdt_foreach_uprobe.restype = None
lib.bcc_usdt_foreach_uprobe.argtypes = [ct.c_void_p, _USDT_CB] lib.bcc_usdt_foreach_uprobe.argtypes = [ct.c_void_p, _USDT_PROBE_CB]
# Copyright 2016 Sasha Goldshtein
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ctypes as ct
import multiprocessing
import os
import re
class Tracepoint(object):
enabled_tracepoints = []
trace_root = "/sys/kernel/debug/tracing"
event_root = os.path.join(trace_root, "events")
@classmethod
def _any_tracepoints_enabled(cls):
return len(cls.enabled_tracepoints) > 0
@classmethod
def generate_decl(cls):
if not cls._any_tracepoints_enabled():
return ""
return "\nBPF_HASH(__trace_di, u64, u64);\n"
@classmethod
def generate_entry_probe(cls):
if not cls._any_tracepoints_enabled():
return ""
return """
int __trace_entry_update(struct pt_regs *ctx)
{
u64 tid = bpf_get_current_pid_tgid();
u64 val = PT_REGS_PARM1(ctx);
__trace_di.update(&tid, &val);
return 0;
}
"""
def __init__(self, category, event, tp_id):
self.category = category
self.event = event
self.tp_id = tp_id
self._retrieve_struct_fields()
def _retrieve_struct_fields(self):
self.struct_fields = []
format_lines = Tracepoint.get_tpoint_format(self.category,
self.event)
for line in format_lines:
match = re.search(r'field:([^;]*);.*size:(\d+);', line)
if match is None:
continue
parts = match.group(1).split()
field_name = parts[-1:][0]
field_type = " ".join(parts[:-1])
field_size = int(match.group(2))
if "__data_loc" in field_type:
continue
if field_name.startswith("common_"):
continue
self.struct_fields.append((field_type, field_name))
def _generate_struct_fields(self):
text = ""
for field_type, field_name in self.struct_fields:
text += " %s %s;\n" % (field_type, field_name)
return text
def generate_struct(self):
self.struct_name = self.event + "_trace_entry"
return """
struct %s {
u64 __do_not_use__;
%s
};
""" % (self.struct_name, self._generate_struct_fields())
def _generate_struct_locals(self):
text = ""
for field_type, field_name in self.struct_fields:
if field_type == "char" and field_name.endswith(']'):
# Special case for 'char whatever[N]', should
# be assigned to a 'char *'
field_type = "char *"
field_name = re.sub(r'\[\d+\]$', '', field_name)
text += " %s %s = tp.%s;\n" % (
field_type, field_name, field_name)
return text
def generate_get_struct(self):
return """
u64 tid = bpf_get_current_pid_tgid();
u64 *di = __trace_di.lookup(&tid);
if (di == 0) { return 0; }
struct %s tp = {};
bpf_probe_read(&tp, sizeof(tp), (void *)*di);
%s
""" % (self.struct_name, self._generate_struct_locals())
@classmethod
def enable_tracepoint(cls, category, event):
tp_id = cls.get_tpoint_id(category, event)
if tp_id == -1:
raise ValueError("no such tracepoint found: %s:%s" %
(category, event))
Perf.perf_event_open(tp_id, ptype=Perf.PERF_TYPE_TRACEPOINT)
new_tp = Tracepoint(category, event, tp_id)
cls.enabled_tracepoints.append(new_tp)
return new_tp
@staticmethod
def get_tpoint_id(category, event):
evt_dir = os.path.join(Tracepoint.event_root, category, event)
try:
return int(
open(os.path.join(evt_dir, "id")).read().strip())
except:
return -1
@staticmethod
def get_tpoint_format(category, event):
evt_dir = os.path.join(Tracepoint.event_root, category, event)
try:
return open(os.path.join(evt_dir, "format")).readlines()
except:
return ""
@classmethod
def attach(cls, bpf):
if cls._any_tracepoints_enabled():
bpf.attach_kprobe(event="tracing_generic_entry_update",
fn_name="__trace_entry_update")
...@@ -12,34 +12,71 @@ ...@@ -12,34 +12,71 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from .libbcc import lib, _USDT_CB from .libbcc import lib, _USDT_CB, _USDT_PROBE_CB
class USDTProbe(object):
def __init__(self, usdt):
self.provider = usdt.provider
self.name = usdt.name
self.bin_path = usdt.bin_path
self.semaphore = usdt.semaphore
self.num_locations = usdt.num_locations
self.num_arguments = usdt.num_arguments
def __str__(self):
return "%s %s:%s [sema 0x%x]\n %d location(s)\n %d argument(s)" % \
(self.bin_path, self.provider, self.name, self.semaphore,
self.num_locations, self.num_arguments)
def short_name(self):
return "%s:%s" % (self.provider, self.name)
class USDT(object): class USDT(object):
def __init__(self, pid=None, path=None): def __init__(self, pid=None, path=None):
if pid: if pid and pid != -1:
self.pid = pid self.pid = pid
self.context = lib.bcc_usdt_new_frompid(pid) self.context = lib.bcc_usdt_new_frompid(pid)
if self.context == None: if self.context == None:
raise Exception("USDT failed to instrument PID %d" % pid) raise Exception("USDT failed to instrument PID %d" % pid)
elif path: elif path:
self.path = path self.path = path
self.context = lib.bcc_usdt_new_frompath(path) self.context = lib.bcc_usdt_new_frompath(path)
if self.context == None: if self.context == None:
raise Exception("USDT failed to instrument path %s" % path) raise Exception("USDT failed to instrument path %s" % path)
else:
raise Exception("either a pid or a binary path must be specified")
def enable_probe(self, probe, fn_name): def enable_probe(self, probe, fn_name):
if lib.bcc_usdt_enable_probe(self.context, probe, fn_name) != 0: if lib.bcc_usdt_enable_probe(self.context, probe, fn_name) != 0:
raise Exception("failed to enable probe '%s'" % probe) raise Exception(("failed to enable probe '%s'; a possible cause " +
"can be that the probe requires a pid to enable") %
probe)
def get_text(self): def get_text(self):
return lib.bcc_usdt_genargs(self.context) return lib.bcc_usdt_genargs(self.context)
def get_probe_arg_ctype(self, probe_name, arg_index):
return lib.bcc_usdt_get_probe_argctype(
self.context, probe_name, arg_index)
def enumerate_probes(self):
probes = []
def _add_probe(probe):
probes.append(USDTProbe(probe.contents))
lib.bcc_usdt_foreach(self.context, _USDT_CB(_add_probe))
return probes
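The callback-collect pattern used by enumerate_probes can be tried without libbcc. The following is a minimal sketch under stated assumptions: `ProbeInfo`, `PROBE_CB`, and `fake_foreach` are hypothetical stand-ins for `bcc_usdt`, `_USDT_CB`, and `lib.bcc_usdt_foreach`, and only a subset of the struct fields is mirrored. It copies the fields inside the callback because, as in `Context::each`, the struct passed to the callback is only valid for the duration of that call.

```python
import ctypes as ct


class ProbeInfo(ct.Structure):
    # Hypothetical stand-in mirroring a subset of the bcc_usdt fields.
    _fields_ = [
        ("provider", ct.c_char_p),
        ("name", ct.c_char_p),
        ("num_arguments", ct.c_int),
    ]


# Callback type: void (*)(struct ProbeInfo *)
PROBE_CB = ct.CFUNCTYPE(None, ct.POINTER(ProbeInfo))


def fake_foreach(records, callback):
    """Stand-in for the C iterator: reuses one struct for every call."""
    info = ProbeInfo()
    for provider, name, nargs in records:
        info.provider = provider
        info.name = name
        info.num_arguments = nargs
        callback(ct.byref(info))


def enumerate_probes(records):
    probes = []

    def _add_probe(probe):
        # Copy the values out now; the pointed-to struct is overwritten
        # on the next iteration.
        p = probe.contents
        probes.append((p.provider.decode(), p.name.decode(), p.num_arguments))

    fake_foreach(records, PROBE_CB(_add_probe))
    return probes
```

Storing `probe.contents` itself instead of copying its fields would leave every list entry pointing at the same reused struct, which is presumably why USDTProbe copies each attribute in its constructor.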
# This is called by the BPF module's __init__ when it realizes that there
# is a USDT context and probes need to be attached.
def attach_uprobes(self, bpf): def attach_uprobes(self, bpf):
probes = [] probes = []
def _add_probe(binpath, fn_name, addr, pid): def _add_probe(binpath, fn_name, addr, pid):
probes.append((binpath, fn_name, addr, pid)) probes.append((binpath, fn_name, addr, pid))
lib.bcc_usdt_foreach_uprobe(self.context, _USDT_CB(_add_probe)) lib.bcc_usdt_foreach_uprobe(self.context, _USDT_PROBE_CB(_add_probe))
for (binpath, fn_name, addr, pid) in probes: for (binpath, fn_name, addr, pid) in probes:
bpf.attach_uprobe(name=binpath, fn_name=fn_name, addr=addr, pid=pid) bpf.attach_uprobe(name=binpath, fn_name=fn_name,
addr=addr, pid=pid)
...@@ -56,15 +56,6 @@ TEST_CASE("test finding a probe in our own process", "[usdt]") { ...@@ -56,15 +56,6 @@ TEST_CASE("test finding a probe in our own process", "[usdt]") {
} }
#endif // HAVE_SDT_HEADER #endif // HAVE_SDT_HEADER
static size_t countsubs(const std::string &str, const std::string &sub) {
size_t count = 0;
for (size_t offset = str.find(sub); offset != std::string::npos;
offset = str.find(sub, offset + sub.length())) {
++count;
}
return count;
}
class ChildProcess { class ChildProcess {
pid_t pid_; pid_t pid_;
......
file(GLOB C_FILES *.c) file(GLOB C_FILES *.c)
file(GLOB PY_FILES *.py) file(GLOB PY_FILES *.py)
file(GLOB TXT_FILES *.txt) file(GLOB TXT_FILES *.txt)
list(REMOVE_ITEM TXT_FILES "CMakeLists.txt")
foreach(FIL ${PY_FILES}) foreach(FIL ${PY_FILES})
get_filename_component(FIL_WE ${FIL} NAME_WE) get_filename_component(FIL_WE ${FIL} NAME_WE)
install(PROGRAMS ${FIL} DESTINATION share/bcc/tools RENAME ${FIL_WE}) install(PROGRAMS ${FIL} DESTINATION share/bcc/tools RENAME ${FIL_WE})
......
...@@ -4,7 +4,7 @@ ...@@ -4,7 +4,7 @@
# parameter values as a histogram or frequency count. # parameter values as a histogram or frequency count.
# #
# USAGE: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] # USAGE: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL]
# [-n COUNT] [-v] [-T TOP] # [-n COUNT] [-v] [-c] [-T TOP]
# [-C specifier [specifier ...]] # [-C specifier [specifier ...]]
# [-H specifier [specifier ...]] # [-H specifier [specifier ...]]
# [-I header [header ...]] # [-I header [header ...]]
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
# Licensed under the Apache License, Version 2.0 (the "License") # Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein. # Copyright (C) 2016 Sasha Goldshtein.
from bcc import BPF, Tracepoint, Perf, USDT from bcc import BPF, USDT
from time import sleep, strftime from time import sleep, strftime
import argparse import argparse
import re import re
...@@ -175,8 +175,10 @@ u64 __time = bpf_ktime_get_ns(); ...@@ -175,8 +175,10 @@ u64 __time = bpf_ktime_get_ns();
self._bail("no exprs specified") self._bail("no exprs specified")
self.exprs = exprs.split(',') self.exprs = exprs.split(',')
def __init__(self, bpf, type, specifier): def __init__(self, tool, type, specifier):
self.pid = bpf.args.pid self.usdt_ctx = None
self.pid = tool.args.pid
self.cumulative = tool.args.cumulative or False
self.raw_spec = specifier self.raw_spec = specifier
self._validate_specifier() self._validate_specifier()
...@@ -193,15 +195,11 @@ u64 __time = bpf_ktime_get_ns(); ...@@ -193,15 +195,11 @@ u64 __time = bpf_ktime_get_ns();
self.library = "" # kernel self.library = "" # kernel
self.tp_category = parts[1] self.tp_category = parts[1]
self.tp_event = self.function self.tp_event = self.function
self.tp = Tracepoint.enable_tracepoint(
self.tp_category, self.tp_event)
self.function = "perf_trace_" + self.function
elif self.probe_type == "u": elif self.probe_type == "u":
self.library = parts[1] self.library = parts[1]
self.probe_func_name = "%s_probe%d" % \ self.probe_func_name = "%s_probe%d" % \
(self.function, Probe.next_probe_index) (self.function, Probe.next_probe_index)
bpf.enable_usdt_probe(self.function, self._enable_usdt_probe()
fn_name=self.probe_func_name)
else: else:
self.library = parts[1] self.library = parts[1]
self.is_user = len(self.library) > 0 self.is_user = len(self.library) > 0
...@@ -242,8 +240,10 @@ u64 __time = bpf_ktime_get_ns(); ...@@ -242,8 +240,10 @@ u64 __time = bpf_ktime_get_ns();
(self.function, Probe.next_probe_index) (self.function, Probe.next_probe_index)
Probe.next_probe_index += 1 Probe.next_probe_index += 1
def close(self): def _enable_usdt_probe(self):
pass self.usdt_ctx = USDT(path=self.library, pid=self.pid)
self.usdt_ctx.enable_probe(
self.function, self.probe_func_name)
def _substitute_exprs(self): def _substitute_exprs(self):
def repl(expr): def repl(expr):
...@@ -262,12 +262,17 @@ u64 __time = bpf_ktime_get_ns(); ...@@ -262,12 +262,17 @@ u64 __time = bpf_ktime_get_ns();
else: else:
return "%s v%d;\n" % (self.expr_types[i], i) return "%s v%d;\n" % (self.expr_types[i], i)
def _generate_usdt_arg_assignment(self, i):
expr = self.exprs[i]
if self.probe_type == "u" and expr[0:3] == "arg":
return (" u64 %s = 0;\n" +
" bpf_usdt_readarg(%s, ctx, &%s);\n") % \
(expr, expr[3], expr)
else:
return ""
def _generate_field_assignment(self, i): def _generate_field_assignment(self, i):
text = "" text = self._generate_usdt_arg_assignment(i)
if self.probe_type == "u" and self.exprs[i][0:3] == "arg":
text = (" u64 %s;\n" +
" bpf_usdt_readarg(%s, ctx, &%s);\n") % \
(self.exprs[i], self.exprs[i][3], self.exprs[i])
if self._is_string(self.expr_types[i]): if self._is_string(self.expr_types[i]):
return (text + " bpf_probe_read(&__key.v%d.s," + return (text + " bpf_probe_read(&__key.v%d.s," +
" sizeof(__key.v%d.s), (void *)%s);\n") % \ " sizeof(__key.v%d.s), (void *)%s);\n") % \
...@@ -291,8 +296,9 @@ u64 __time = bpf_ktime_get_ns(); ...@@ -291,8 +296,9 @@ u64 __time = bpf_ktime_get_ns();
def _generate_key_assignment(self): def _generate_key_assignment(self):
if self.type == "hist": if self.type == "hist":
return "%s __key = %s;\n" % \ return self._generate_usdt_arg_assignment(0) + \
(self.expr_types[0], self.exprs[0]) ("%s __key = %s;\n" % \
(self.expr_types[0], self.exprs[0]))
else: else:
text = "struct %s_key_t __key = {};\n" % \ text = "struct %s_key_t __key = {};\n" % \
self.probe_hash_name self.probe_hash_name
...@@ -320,8 +326,10 @@ u64 __time = bpf_ktime_get_ns(); ...@@ -320,8 +326,10 @@ u64 __time = bpf_ktime_get_ns();
program = "" program = ""
probe_text = """ probe_text = """
DATA_DECL DATA_DECL
""" + (
int PROBENAME(struct pt_regs *ctx SIGNATURE) "TRACEPOINT_PROBE(%s, %s)" % (self.tp_category, self.tp_event) \
if self.probe_type == "t" \
else "int PROBENAME(struct pt_regs *ctx SIGNATURE)") + """
{ {
PID_FILTER PID_FILTER
PREFIX PREFIX
...@@ -343,10 +351,7 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE) ...@@ -343,10 +351,7 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
# value we collected when entering the function: # value we collected when entering the function:
self._replace_entry_exprs() self._replace_entry_exprs()
if self.probe_type == "t": if self.probe_type == "p" and len(self.signature) > 0:
program += self.tp.generate_struct()
prefix += self.tp.generate_get_struct()
elif self.probe_type == "p" and len(self.signature) > 0:
# Only entry uprobes/kprobes can have user-specified # Only entry uprobes/kprobes can have user-specified
# signatures. Other probes force it to (). # signatures. Other probes force it to ().
signature = ", " + self.signature signature = ", " + self.signature
...@@ -371,7 +376,7 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE) ...@@ -371,7 +376,7 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
def _attach_u(self): def _attach_u(self):
libpath = BPF.find_library(self.library) libpath = BPF.find_library(self.library)
if libpath is None: if libpath is None:
libpath = BPF._find_exe(self.library) libpath = BPF.find_exe(self.library)
if libpath is None or len(libpath) == 0: if libpath is None or len(libpath) == 0:
self._bail("unable to find library %s" % self.library) self._bail("unable to find library %s" % self.library)
...@@ -387,7 +392,9 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE) ...@@ -387,7 +392,9 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
pid=self.pid or -1) pid=self.pid or -1)
def _attach_k(self): def _attach_k(self):
if self.probe_type == "r" or self.probe_type == "t": if self.probe_type == "t":
pass # Nothing to do for tracepoints
elif self.probe_type == "r":
self.bpf.attach_kretprobe(event=self.function, self.bpf.attach_kretprobe(event=self.function,
fn_name=self.probe_func_name) fn_name=self.probe_func_name)
else: else:
...@@ -443,10 +450,10 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE) ...@@ -443,10 +450,10 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
if self.type == "freq": if self.type == "freq":
print(self.label or self.raw_spec) print(self.label or self.raw_spec)
print("\t%-10s %s" % ("COUNT", "EVENT")) print("\t%-10s %s" % ("COUNT", "EVENT"))
data = sorted(data.items(), key=lambda kv: kv[1].value) sdata = sorted(data.items(), key=lambda kv: kv[1].value)
if top is not None: if top is not None:
data = data[-top:] sdata = sdata[-top:]
for key, value in data: for key, value in sdata:
# Print some nice values if the user didn't # Print some nice values if the user didn't
# specify an expression to probe # specify an expression to probe
if self.is_default_expr: if self.is_default_expr:
...@@ -463,6 +470,8 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE) ...@@ -463,6 +470,8 @@ int PROBENAME(struct pt_regs *ctx SIGNATURE)
label = self.label or (self._display_expr(0) label = self.label or (self._display_expr(0)
if not self.is_default_expr else "retval") if not self.is_default_expr else "retval")
data.print_log2_hist(val_type=label) data.print_log2_hist(val_type=label)
if not self.cumulative:
data.clear()
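The effect of the cumulative switch can be illustrated outside BPF. This is a rough sketch only: `run_intervals` is a hypothetical helper, and a `Counter` stands in for the BPF table that argdist prints and then optionally clears at each interval.

```python
from collections import Counter


def run_intervals(batches, cumulative=False):
    """Return what would be displayed after each interval."""
    counts = Counter()
    reports = []
    for batch in batches:
        counts.update(batch)          # events observed in this interval
        reports.append(dict(counts))  # snapshot of what gets printed
        if not cumulative:
            counts.clear()            # start the next interval from zero
    return reports
```

With two intervals `["a", "a"]` and `["a", "b"]`, the default mode reports only the second interval's events the second time, while `cumulative=True` keeps adding to the earlier totals.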
def __str__(self): def __str__(self):
return self.label or self.raw_spec return self.label or self.raw_spec
...@@ -526,10 +535,10 @@ argdist -C 'p:c:fork()#fork calls' ...@@ -526,10 +535,10 @@ argdist -C 'p:c:fork()#fork calls'
Count fork() calls in libc across all processes Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible Can also use funccount.py, which is easier and more flexible
argdist -H 't:block:block_rq_complete():u32:tp.nr_sector' argdist -H 't:block:block_rq_complete():u32:args->nr_sector'
Print histogram of number of sectors in completing block I/O requests Print histogram of number of sectors in completing block I/O requests
argdist -C 't:irq:irq_handler_entry():int:tp.irq' argdist -C 't:irq:irq_handler_entry():int:args->irq'
Aggregate interrupts by interrupt request (IRQ) Aggregate interrupts by interrupt request (IRQ)
argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337 argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
...@@ -563,6 +572,8 @@ argdist -p 2780 -z 120 \\ ...@@ -563,6 +572,8 @@ argdist -p 2780 -z 120 \\
help="number of outputs") help="number of outputs")
parser.add_argument("-v", "--verbose", action="store_true", parser.add_argument("-v", "--verbose", action="store_true",
help="print resulting BPF program code before executing") help="print resulting BPF program code before executing")
parser.add_argument("-c", "--cumulative", action="store_true",
help="do not clear histograms and freq counts at each interval")
parser.add_argument("-T", "--top", type=int, parser.add_argument("-T", "--top", type=int,
help="number of top results to show (not applicable to " + help="number of top results to show (not applicable to " +
"histograms)") "histograms)")
...@@ -590,11 +601,6 @@ argdist -p 2780 -z 120 \\ ...@@ -590,11 +601,6 @@ argdist -p 2780 -z 120 \\
print("at least one specifier is required") print("at least one specifier is required")
exit() exit()
def enable_usdt_probe(self, probe_name, fn_name):
if not self.usdt_ctx:
self.usdt_ctx = USDT(pid=self.args.pid)
self.usdt_ctx.enable_probe(probe_name, fn_name)
def _generate_program(self): def _generate_program(self):
bpf_source = """ bpf_source = """
struct __string_t { char s[%d]; }; struct __string_t { char s[%d]; };
...@@ -605,17 +611,18 @@ struct __string_t { char s[%d]; }; ...@@ -605,17 +611,18 @@ struct __string_t { char s[%d]; };
bpf_source += "#include <%s>\n" % include bpf_source += "#include <%s>\n" % include
bpf_source += BPF.generate_auto_includes( bpf_source += BPF.generate_auto_includes(
map(lambda p: p.raw_spec, self.probes)) map(lambda p: p.raw_spec, self.probes))
bpf_source += Tracepoint.generate_decl()
bpf_source += Tracepoint.generate_entry_probe()
for probe in self.probes: for probe in self.probes:
bpf_source += probe.generate_text() bpf_source += probe.generate_text()
if self.args.verbose: if self.args.verbose:
if self.usdt_ctx: print(self.usdt_ctx.get_text()) for text in [probe.usdt_ctx.get_text() \
for probe in self.probes if probe.usdt_ctx]:
print(text)
print(bpf_source) print(bpf_source)
self.bpf = BPF(text=bpf_source, usdt=self.usdt_ctx) usdt_contexts = [probe.usdt_ctx
for probe in self.probes if probe.usdt_ctx]
self.bpf = BPF(text=bpf_source, usdt_contexts=usdt_contexts)
def _attach(self): def _attach(self):
Tracepoint.attach(self.bpf)
for probe in self.probes: for probe in self.probes:
probe.attach(self.bpf) probe.attach(self.bpf)
if self.args.verbose: if self.args.verbose:
...@@ -637,12 +644,6 @@ struct __string_t { char s[%d]; }; ...@@ -637,12 +644,6 @@ struct __string_t { char s[%d]; };
count_so_far >= self.args.count: count_so_far >= self.args.count:
exit() exit()
def _close_probes(self):
for probe in self.probes:
probe.close()
if self.args.verbose:
print("closed probe: " + str(probe))
def run(self): def run(self):
try: try:
self._create_probes() self._create_probes()
...@@ -654,7 +655,6 @@ struct __string_t { char s[%d]; }; ...@@ -654,7 +655,6 @@ struct __string_t { char s[%d]; };
traceback.print_exc() traceback.print_exc()
elif sys.exc_info()[0] is not SystemExit: elif sys.exc_info()[0] is not SystemExit:
print(sys.exc_info()[1]) print(sys.exc_info()[1])
self._close_probes()
if __name__ == "__main__": if __name__ == "__main__":
Tool().run() Tool().run()
...@@ -10,7 +10,7 @@ various functions. ...@@ -10,7 +10,7 @@ various functions.
For example, suppose you want to find what allocation sizes are common in For example, suppose you want to find what allocation sizes are common in
your application: your application:
# ./argdist -p 2420 -C 'p:c:malloc(size_t size):size_t:size' # ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
[01:42:29] [01:42:29]
p:c:malloc(size_t size):size_t:size p:c:malloc(size_t size):size_t:size
COUNT EVENT COUNT EVENT
...@@ -43,7 +43,7 @@ probed and its value was 16, repeatedly. ...@@ -43,7 +43,7 @@ probed and its value was 16, repeatedly.
Now, suppose you wanted a histogram of buffer sizes passed to the write() Now, suppose you wanted a histogram of buffer sizes passed to the write()
function across the system: function across the system:
# ./argdist -H 'p:c:write(int fd, void *buf, size_t len):size_t:len' # ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
[01:45:22] [01:45:22]
p:c:write(int fd, void *buf, size_t len):size_t:len p:c:write(int fd, void *buf, size_t len):size_t:len
len : count distribution len : count distribution
...@@ -81,7 +81,7 @@ bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes. ...@@ -81,7 +81,7 @@ bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
But these are writes across the board -- what if you wanted to focus on writes But these are writes across the board -- what if you wanted to focus on writes
to STDOUT? to STDOUT?
# ./argdist -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1' # ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
[01:47:17] [01:47:17]
p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
len : count distribution len : count distribution
...@@ -232,7 +232,7 @@ multiple microseconds per byte. ...@@ -232,7 +232,7 @@ multiple microseconds per byte.
You could also group results by more than one field. For example, __kmalloc You could also group results by more than one field. For example, __kmalloc
takes an additional flags parameter that describes how to allocate memory: takes an additional flags parameter that describes how to allocate memory:
# ./argdist -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size' # ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
[03:42:29] [03:42:29]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
COUNT EVENT COUNT EVENT
...@@ -264,29 +264,23 @@ certain kinds of allocations or visually group them together. ...@@ -264,29 +264,23 @@ certain kinds of allocations or visually group them together.
argdist also has basic support for kernel tracepoints. It is sometimes more argdist also has basic support for kernel tracepoints. It is sometimes more
convenient to use tracepoints because they are documented and don't vary a lot convenient to use tracepoints because they are documented and don't vary a lot
between kernel versions like function signatures tend to. For example, let's between kernel versions. For example, let's trace the net:net_dev_start_xmit
trace the net:net_dev_start_xmit tracepoint and print the interface name that tracepoint and print out the protocol field from the tracepoint structure:
is transmitting:
# argdist -C 't:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name' -n 2 # argdist -C 't:net:net_dev_start_xmit():u16:args->protocol'
[05:01:10] [13:01:49]
t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name t:net:net_dev_start_xmit():u16:args->protocol
COUNT EVENT COUNT EVENT
4 c->name = eth0 8 args->protocol = 2048
[05:01:11] ^C
t:net:net_dev_start_xmit(void *a, void *b, struct net_device *c):char*:c->name
COUNT EVENT
6 c->name = lo
92 c->name = eth0
Note that to determine the necessary function signature you need to look at the Note that to discover the format of the net:net_dev_start_xmit tracepoint, you
TP_PROTO declaration in the kernel headers. For example, the net_dev_start_xmit use the tplist tool (tplist -v net:net_dev_start_xmit).
tracepoint is defined in the include/trace/events/net.h header file.
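For readers who want to inspect a tracepoint's fields programmatically, the sketch below parses the lines of a tracepoint format description with the same regular expression the removed tracepoint.py used. It is an illustration under assumptions: `parse_format` is a hypothetical helper, and the sample lines (including their offsets) are made up for the example rather than copied from a real kernel.

```python
import re

# Same pattern as the removed tracepoint.py: capture "type name" and size.
FIELD_RE = re.compile(r'field:([^;]*);.*size:(\d+);')


def parse_format(lines):
    """Return (ctype, name, size) for each non-common tracepoint field."""
    fields = []
    for line in lines:
        match = FIELD_RE.search(line)
        if match is None:
            continue
        parts = match.group(1).split()
        name = parts[-1]
        ctype = " ".join(parts[:-1])
        # Skip bookkeeping fields and dynamically located data.
        if name.startswith("common_") or "__data_loc" in ctype:
            continue
        fields.append((ctype, name, int(match.group(2))))
    return fields


# Illustrative input only; offsets are invented for this example.
SAMPLE = [
    "name: net_dev_start_xmit\n",
    "\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;\n",
    "\tfield:u16 protocol;\toffset:16;\tsize:2;\tsigned:0;\n",
]
```

On a live system the lines would come from the event's format file under the tracing events directory, which typically requires root to read.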
Here's a final example that finds how many write() system calls are performed Here's a final example that finds how many write() system calls are performed
by each process on the system: by each process on the system:
# argdist -C 'p:c:write():int:$PID;write per process' -n 2 # argdist -c -C 'p:c:write():int:$PID;write per process' -n 2
[06:47:18] [06:47:18]
write by process write by process
COUNT EVENT COUNT EVENT
...@@ -305,8 +299,8 @@ USAGE message: ...@@ -305,8 +299,8 @@ USAGE message:
# argdist -h # argdist -h
usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v] usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
[-T TOP] [-H [specifier [specifier ...]]] [-c] [-T TOP] [-H [specifier [specifier ...]]]
[-C [specifier [specifier ...]]] [-I [header [header ...]]] [-C [specifier [specifier ...]]] [-I [header [header ...]]]
Trace a function and display a summary of its parameter values. Trace a function and display a summary of its parameter values.
...@@ -320,6 +314,7 @@ optional arguments: ...@@ -320,6 +314,7 @@ optional arguments:
-n COUNT, --number COUNT -n COUNT, --number COUNT
number of outputs number of outputs
-v, --verbose print resulting BPF program code before executing -v, --verbose print resulting BPF program code before executing
-c, --cumulative do not clear histograms and freq counts at each interval
-T TOP, --top TOP number of top results to show (not applicable to -T TOP, --top TOP number of top results to show (not applicable to
histograms) histograms)
-H [specifier [specifier ...]], --histogram [specifier [specifier ...]] -H [specifier [specifier ...]], --histogram [specifier [specifier ...]]
...@@ -387,10 +382,10 @@ argdist -C 'p:c:fork()#fork calls' ...@@ -387,10 +382,10 @@ argdist -C 'p:c:fork()#fork calls'
Count fork() calls in libc across all processes Count fork() calls in libc across all processes
Can also use funccount.py, which is easier and more flexible Can also use funccount.py, which is easier and more flexible
argdist -H 't:block:block_rq_complete():u32:tp.nr_sector' argdist -H 't:block:block_rq_complete():u32:args->nr_sector'
Print histogram of number of sectors in completing block I/O requests Print histogram of number of sectors in completing block I/O requests
argdist -C 't:irq:irq_handler_entry():int:tp.irq' argdist -C 't:irq:irq_handler_entry():int:args->irq'
Aggregate interrupts by interrupt request (IRQ) Aggregate interrupts by interrupt request (IRQ)
argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337 argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
......
...@@ -82,13 +82,14 @@ int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry) ...@@ -82,13 +82,14 @@ int trace_unlink(struct pt_regs *ctx, struct inode *dir, struct dentry *dentry)
delta = (bpf_ktime_get_ns() - *tsp) / 1000000; delta = (bpf_ktime_get_ns() - *tsp) / 1000000;
birth.delete(&dentry); birth.delete(&dentry);
if (dentry->d_iname[0] == 0) if (dentry->d_name.len == 0)
return 0; return 0;
if (bpf_get_current_comm(&data.comm, sizeof(data.comm)) == 0) { if (bpf_get_current_comm(&data.comm, sizeof(data.comm)) == 0) {
data.pid = pid; data.pid = pid;
data.delta = delta; data.delta = delta;
bpf_probe_read(&data.fname, sizeof(data.fname), dentry->d_iname); bpf_probe_read(&data.fname, sizeof(data.fname),
(void *)dentry->d_name.name);
} }
events.perf_submit(ctx, &data, sizeof(data)); events.perf_submit(ctx, &data, sizeof(data));
...@@ -119,6 +120,9 @@ if debug: ...@@ -119,6 +120,9 @@ if debug:
# initialize BPF # initialize BPF
b = BPF(text=bpf_text) b = BPF(text=bpf_text)
b.attach_kprobe(event="vfs_create", fn_name="trace_create") b.attach_kprobe(event="vfs_create", fn_name="trace_create")
# newer kernels (say, 4.8) may not fire vfs_create, so record (or overwrite)
# the timestamp in security_inode_create():
b.attach_kprobe(event="security_inode_create", fn_name="trace_create")
b.attach_kprobe(event="vfs_unlink", fn_name="trace_unlink") b.attach_kprobe(event="vfs_unlink", fn_name="trace_unlink")
# header # header
......
...@@ -40,6 +40,8 @@ examples = """examples: ...@@ -40,6 +40,8 @@ examples = """examples:
./offcputime # trace off-CPU stack time until Ctrl-C ./offcputime # trace off-CPU stack time until Ctrl-C
./offcputime 5 # trace for 5 seconds only ./offcputime 5 # trace for 5 seconds only
./offcputime -f 5 # 5 seconds, and output in folded format ./offcputime -f 5 # 5 seconds, and output in folded format
./offcputime -m 1000 # trace only events that last more than 1000 usec.
./offcputime -M 10000 # trace only events that last less than 10000 usec.
./offcputime -p 185 # only trace threads for PID 185 ./offcputime -p 185 # only trace threads for PID 185
./offcputime -t 188 # only trace thread 188 ./offcputime -t 188 # only trace thread 188
./offcputime -u # only trace user threads (no kernel) ./offcputime -u # only trace user threads (no kernel)
...@@ -78,6 +80,12 @@ parser.add_argument("--stack-storage-size", default=1024, ...@@ -78,6 +80,12 @@ parser.add_argument("--stack-storage-size", default=1024,
parser.add_argument("duration", nargs="?", default=99999999, parser.add_argument("duration", nargs="?", default=99999999,
type=positive_nonzero_int, type=positive_nonzero_int,
help="duration of trace, in seconds") help="duration of trace, in seconds")
parser.add_argument("-m", "--min-block-time", default=1,
type=positive_nonzero_int,
help="the amount of time in microseconds over which we store traces (default 1)")
parser.add_argument("-M", "--max-block-time", default=(1<<64)-1,
type=positive_nonzero_int,
help="the amount of time in microseconds under which we store traces (default U64_MAX)")
args = parser.parse_args() args = parser.parse_args()
if args.pid and args.tgid: if args.pid and args.tgid:
parser.error("specify only one of -p and -t") parser.error("specify only one of -p and -t")
...@@ -93,7 +101,8 @@ bpf_text = """ ...@@ -93,7 +101,8 @@ bpf_text = """
#include <uapi/linux/ptrace.h> #include <uapi/linux/ptrace.h>
#include <linux/sched.h> #include <linux/sched.h>
#define MINBLOCK_US 1 #define MINBLOCK_US MINBLOCK_US_VALUEULL
#define MAXBLOCK_US MAXBLOCK_US_VALUEULL
struct key_t { struct key_t {
u32 pid; u32 pid;
...@@ -129,7 +138,7 @@ int oncpu(struct pt_regs *ctx, struct task_struct *prev) { ...@@ -129,7 +138,7 @@ int oncpu(struct pt_regs *ctx, struct task_struct *prev) {
u64 delta = bpf_ktime_get_ns() - *tsp; u64 delta = bpf_ktime_get_ns() - *tsp;
start.delete(&pid); start.delete(&pid);
delta = delta / 1000; delta = delta / 1000;
if (delta < MINBLOCK_US) { if ((delta < MINBLOCK_US) || (delta > MAXBLOCK_US)) {
return 0; return 0;
} }
...@@ -170,6 +179,8 @@ bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter) ...@@ -170,6 +179,8 @@ bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter)
# set stack storage size # set stack storage size
bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size)) bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size))
bpf_text = bpf_text.replace('MINBLOCK_US_VALUE', str(args.min_block_time))
bpf_text = bpf_text.replace('MAXBLOCK_US_VALUE', str(args.max_block_time))
# handle stack args # handle stack args
kernel_stack_get = "stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID)" kernel_stack_get = "stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID)"
......
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# tcptop Summarize TCP send/recv throughput by host.
# For Linux, uses BCC, eBPF. Embedded C.
#
# USAGE: tcptop [-h] [-C] [-S] [-p PID] [interval [count]]
#
# This uses dynamic tracing of kernel functions, and will need to be updated
# to match kernel changes.
#
# WARNING: This traces all send/receives at the TCP level, and while it
# summarizes data in-kernel to reduce overhead, there may still be some
# overhead at high TCP send/receive rates (eg, ~13% of one CPU at 100k TCP
# events/sec. This is not the same as packet rate: funccount can be used to
# count the kprobes below to find out the TCP rate). Test in a lab environment
# first. If your send/receive rate is low (eg, <1k/sec) then the overhead is
# expected to be negligible.
#
# ToDo: Fit output to screen size (top X only) in default (not -C) mode.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 02-Sep-2016 Brendan Gregg Created this.
from __future__ import print_function
from bcc import BPF
import argparse
from socket import inet_ntop, AF_INET, AF_INET6
from struct import pack
from time import sleep, strftime
from subprocess import call
import ctypes as ct
# arguments
examples = """examples:
./tcptop # trace TCP send/recv by host
./tcptop -C # don't clear the screen
./tcptop -p 181 # only trace PID 181
"""
parser = argparse.ArgumentParser(
description="Summarize TCP send/recv throughput by host",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=examples)
parser.add_argument("-C", "--noclear", action="store_true",
help="don't clear the screen")
parser.add_argument("-S", "--nosummary", action="store_true",
help="skip system summary line")
parser.add_argument("-p", "--pid",
help="trace this PID only")
parser.add_argument("interval", nargs="?", default=1,
help="output interval, in seconds (default 1)")
parser.add_argument("count", nargs="?", default=99999999,
help="number of outputs")
args = parser.parse_args()
countdown = int(args.count)
if args.interval and int(args.interval) == 0:
print("ERROR: interval 0. Exiting.")
exit()
debug = 0
# linux stats
loadavg = "/proc/loadavg"
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
struct ipv4_key_t {
u32 pid;
u32 saddr;
u32 daddr;
u16 lport;
u16 dport;
};
BPF_HASH(ipv4_send_bytes, struct ipv4_key_t);
BPF_HASH(ipv4_recv_bytes, struct ipv4_key_t);
struct ipv6_key_t {
u32 pid;
// workaround until unsigned __int128 support:
u64 saddr0;
u64 saddr1;
u64 daddr0;
u64 daddr1;
u16 lport;
u16 dport;
};
BPF_HASH(ipv6_send_bytes, struct ipv6_key_t);
BPF_HASH(ipv6_recv_bytes, struct ipv6_key_t);
int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk,
struct msghdr *msg, size_t size)
{
u32 pid = bpf_get_current_pid_tgid();
FILTER
u16 dport = 0, family = sk->__sk_common.skc_family;
u64 *val, zero = 0;
if (family == AF_INET) {
struct ipv4_key_t ipv4_key = {.pid = pid};
ipv4_key.saddr = sk->__sk_common.skc_rcv_saddr;
ipv4_key.daddr = sk->__sk_common.skc_daddr;
ipv4_key.lport = sk->__sk_common.skc_num;
dport = sk->__sk_common.skc_dport;
ipv4_key.dport = ntohs(dport);
val = ipv4_send_bytes.lookup_or_init(&ipv4_key, &zero);
(*val) += size;
} else if (family == AF_INET6) {
struct ipv6_key_t ipv6_key = {.pid = pid};
bpf_probe_read(&ipv6_key.saddr0, sizeof(ipv6_key.saddr0),
&sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32[0]);
bpf_probe_read(&ipv6_key.saddr1, sizeof(ipv6_key.saddr1),
&sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32[2]);
bpf_probe_read(&ipv6_key.daddr0, sizeof(ipv6_key.daddr0),
&sk->__sk_common.skc_v6_daddr.in6_u.u6_addr32[0]);
bpf_probe_read(&ipv6_key.daddr1, sizeof(ipv6_key.daddr1),
&sk->__sk_common.skc_v6_daddr.in6_u.u6_addr32[2]);
ipv6_key.lport = sk->__sk_common.skc_num;
dport = sk->__sk_common.skc_dport;
ipv6_key.dport = ntohs(dport);
val = ipv6_send_bytes.lookup_or_init(&ipv6_key, &zero);
(*val) += size;
}
// else drop
return 0;
}
/*
* tcp_recvmsg() would be obvious to trace, but is less suitable because:
* - we'd need to trace both entry and return, to have both sock and size
* - misses tcp_read_sock() traffic
* we'd much prefer tracepoints once they are available.
*/
int kprobe__tcp_cleanup_rbuf(struct pt_regs *ctx, struct sock *sk, int copied)
{
u32 pid = bpf_get_current_pid_tgid();
FILTER
u16 dport = 0, family = sk->__sk_common.skc_family;
u64 *val, zero = 0;
if (family == AF_INET) {
struct ipv4_key_t ipv4_key = {.pid = pid};
ipv4_key.saddr = sk->__sk_common.skc_rcv_saddr;
ipv4_key.daddr = sk->__sk_common.skc_daddr;
ipv4_key.lport = sk->__sk_common.skc_num;
dport = sk->__sk_common.skc_dport;
ipv4_key.dport = ntohs(dport);
val = ipv4_recv_bytes.lookup_or_init(&ipv4_key, &zero);
(*val) += copied;
} else if (family == AF_INET6) {
struct ipv6_key_t ipv6_key = {.pid = pid};
bpf_probe_read(&ipv6_key.saddr0, sizeof(ipv6_key.saddr0),
&sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32[0]);
bpf_probe_read(&ipv6_key.saddr1, sizeof(ipv6_key.saddr1),
&sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32[2]);
bpf_probe_read(&ipv6_key.daddr0, sizeof(ipv6_key.daddr0),
&sk->__sk_common.skc_v6_daddr.in6_u.u6_addr32[0]);
bpf_probe_read(&ipv6_key.daddr1, sizeof(ipv6_key.daddr1),
&sk->__sk_common.skc_v6_daddr.in6_u.u6_addr32[2]);
ipv6_key.lport = sk->__sk_common.skc_num;
dport = sk->__sk_common.skc_dport;
ipv6_key.dport = ntohs(dport);
val = ipv6_recv_bytes.lookup_or_init(&ipv6_key, &zero);
(*val) += copied;
}
// else drop
return 0;
}
"""
# code substitutions
if args.pid:
bpf_text = bpf_text.replace('FILTER',
'if (pid != %s) { return 0; }' % args.pid)
else:
bpf_text = bpf_text.replace('FILTER', '')
if debug:
print(bpf_text)
def pid_to_comm(pid):
try:
comm = open("/proc/%d/comm" % pid, "r").read().rstrip()
return comm
except IOError:
return str(pid)
# initialize BPF
b = BPF(text=bpf_text)
ipv4_send_bytes = b["ipv4_send_bytes"]
ipv4_recv_bytes = b["ipv4_recv_bytes"]
ipv6_send_bytes = b["ipv6_send_bytes"]
ipv6_recv_bytes = b["ipv6_recv_bytes"]
print('Tracing... Output every %s secs. Hit Ctrl-C to end' % args.interval)
# output
exiting = 0
while (1):
try:
if args.interval:
sleep(int(args.interval))
else:
sleep(99999999)
except KeyboardInterrupt:
exiting = 1
# header
if args.noclear:
print()
else:
call("clear")
if not args.nosummary:
with open(loadavg) as stats:
print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))
# IPv4: build dict of all seen keys
keys = ipv4_recv_bytes
for k, v in ipv4_send_bytes.items():
if k not in keys:
keys[k] = v
if keys:
print("%-6s %-12s %-21s %-21s %6s %6s" % ("PID", "COMM",
"LADDR", "RADDR", "RX_KB", "TX_KB"))
# output
for k, v in reversed(sorted(keys.items(), key=lambda keys: keys[1].value)):
send_kbytes = 0
if k in ipv4_send_bytes:
send_kbytes = int(ipv4_send_bytes[k].value / 1024)
recv_kbytes = 0
if k in ipv4_recv_bytes:
recv_kbytes = int(ipv4_recv_bytes[k].value / 1024)
print("%-6d %-12.12s %-21s %-21s %6d %6d" % (k.pid,
pid_to_comm(k.pid),
inet_ntop(AF_INET, pack("I", k.saddr)) + ":" + str(k.lport),
inet_ntop(AF_INET, pack("I", k.daddr)) + ":" + str(k.dport),
recv_kbytes, send_kbytes))
ipv4_send_bytes.clear()
ipv4_recv_bytes.clear()
# IPv6: build dict of all seen keys
keys = ipv6_recv_bytes
for k, v in ipv6_send_bytes.items():
if k not in keys:
keys[k] = v
if keys:
# more than 80 chars, sadly.
print("\n%-6s %-12s %-32s %-32s %6s %6s" % ("PID", "COMM",
"LADDR6", "RADDR6", "RX_KB", "TX_KB"))
# output
for k, v in reversed(sorted(keys.items(), key=lambda keys: keys[1].value)):
send_kbytes = 0
if k in ipv6_send_bytes:
send_kbytes = int(ipv6_send_bytes[k].value / 1024)
recv_kbytes = 0
if k in ipv6_recv_bytes:
recv_kbytes = int(ipv6_recv_bytes[k].value / 1024)
print("%-6d %-12.12s %-32s %-32s %6d %6d" % (k.pid,
pid_to_comm(k.pid),
inet_ntop(AF_INET6, pack("QQ", k.saddr0, k.saddr1)) + ":" +
str(k.lport),
inet_ntop(AF_INET6, pack("QQ", k.daddr0, k.daddr1)) + ":" +
str(k.dport),
recv_kbytes, send_kbytes))
ipv6_send_bytes.clear()
ipv6_recv_bytes.clear()
countdown -= 1
if exiting or countdown == 0:
exit()
Demonstrations of tcptop, the Linux eBPF/bcc version.
tcptop summarizes throughput by host and port. Eg:
# tcptop
Tracing... Output every 1 secs. Hit Ctrl-C to end
<screen clears>
19:46:24 loadavg: 1.86 2.67 2.91 3/362 16681
PID COMM LADDR RADDR RX_KB TX_KB
16648 16648 100.66.3.172:22 100.127.69.165:6684 1 0
16647 sshd 100.66.3.172:22 100.127.69.165:6684 0 2149
14374 sshd 100.66.3.172:22 100.127.69.165:25219 0 0
14458 sshd 100.66.3.172:22 100.127.69.165:7165 0 0
PID COMM LADDR6 RADDR6 RX_KB TX_KB
16681 sshd fe80::8a3:9dff:fed5:6b19:22 fe80::8a3:9dff:fed5:6b19:16606 1 1
16679 ssh fe80::8a3:9dff:fed5:6b19:16606 fe80::8a3:9dff:fed5:6b19:22 1 1
16680 sshd fe80::8a3:9dff:fed5:6b19:22 fe80::8a3:9dff:fed5:6b19:16606 0 0
This example output shows two listings of TCP connections, for IPv4 and IPv6.
If there is only traffic for one of these, then only one group is shown.
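As a rough illustration of how the LADDR/RADDR columns can be produced, the sketch below turns raw address integers into "address:port" strings using the same pack() formats that tcptop.py passes to inet_ntop(). The helper names format_v4 and format_v6 are hypothetical and not part of the tool, and the sample integers are built in user space purely for demonstration.

```python
from socket import inet_aton, inet_ntop, inet_pton, AF_INET, AF_INET6
from struct import pack, unpack


def format_v4(addr, port):
    # addr is a 32-bit integer whose in-memory bytes are the IPv4 address
    # in network order; pack("I", ...) restores those bytes unchanged.
    return inet_ntop(AF_INET, pack("I", addr)) + ":" + str(port)


def format_v6(addr0, addr1, port):
    # The IPv6 address is carried as two 64-bit halves.
    return inet_ntop(AF_INET6, pack("QQ", addr0, addr1)) + ":" + str(port)


# Build sample integers from textual addresses (demonstration only).
v4 = unpack("I", inet_aton("100.66.3.172"))[0]
v6_a, v6_b = unpack("QQ", inet_pton(AF_INET6, "fe80::8a3:9dff:fed5:6b19"))

print(format_v4(v4, 22))
print(format_v6(v6_a, v6_b, 22))
```

Because the integers are unpacked and repacked with the same native format, the round trip does not depend on the host byte order.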
The output in each listing is sorted by total throughput (send then receive),
and when printed it is rounded down (floor) to whole Kbytes. The example output
shows PID 16647, sshd, transmitted 2149 Kbytes during the tracing interval.
The other IPv4 sessions had such low throughput they rounded to zero.
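The sorting and rounding described above can be sketched in plain Python. This is a minimal illustration, not the tool's code: it assumes ordinary dicts of byte counts in place of BPF tables, uses made-up session labels, and reads "send then receive" as sorting by sent bytes first and received bytes second. The function name summarize is hypothetical.

```python
def summarize(send_bytes, recv_bytes):
    # Union of every session key seen in either direction.
    keys = set(send_bytes) | set(recv_bytes)
    rows = []
    for k in keys:
        tx = send_bytes.get(k, 0)
        rx = recv_bytes.get(k, 0)
        # Floor division drops any partial Kbyte, so small counts show as 0.
        rows.append((k, rx // 1024, tx // 1024, tx, rx))
    # Highest sender first; received bytes break ties (an assumed ordering).
    rows.sort(key=lambda r: (r[3], r[4]), reverse=True)
    return [(k, rx_kb, tx_kb) for k, rx_kb, tx_kb, _, _ in rows]


# Made-up byte counts for three sessions.
send = {"sshd:6684": 2149 * 1024 + 500, "sshd:25219": 300}
recv = {"sshd:6684": 100, "other:6684": 1500}

for row in summarize(send, recv):
    print("%-12s RX_KB=%d TX_KB=%d" % row)
```

A session that moved only a few hundred bytes still appears in the listing, but with 0 in both columns.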
All TCP sessions, including over loopback, are included.
The session with the process name (COMM) of 16648 is really a short-lived
process with PID 16648 where we didn't catch the process name when printing
the output. If this behavior is a serious issue for you, you can modify the
tool's code to include bpf_get_current_comm() in the key structs, so that it's
fetched during the event and will always be seen. I did it this way to start
with, but it was measurably increasing the overhead of this tool, so I switched
to the asynchronous model.
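The asynchronous lookup can be illustrated with a small sketch modeled on the pid_to_comm() helper in tcptop.py: the name is read from /proc at print time, and if the process has already exited the PID itself is shown instead. The very large PID below is an assumption chosen only because no such process should exist.

```python
import os


def pid_to_comm(pid):
    # Resolve the process name when printing; fall back to the PID
    # if the process has already gone (or /proc is unavailable).
    try:
        with open("/proc/%d/comm" % pid, "r") as f:
            return f.read().rstrip()
    except IOError:
        return str(pid)


print(pid_to_comm(os.getpid()))   # a live process: normally its name
print(pid_to_comm(2**31 - 1))     # no such process: the PID as a string
```

This keeps the per-event cost in the kernel low, at the price of occasionally printing a bare PID for short-lived processes.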
The overhead is relative to TCP event rate (the rate of tcp_sendmsg() and
tcp_recvmsg() or tcp_cleanup_rbuf()). Due to buffering, this should be lower
than the packet rate. You can measure the rate of these using funccount.
Some sample production servers tested found total rates of 4k to 15k per
second. The CPU overhead at these rates ranged from 0.5% to 2.0% of one CPU.
Your workloads may have higher rates, and therefore higher overhead, or
lower rates.
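For a rough feel of what a given event rate might cost, the sketch below scales linearly from the figure quoted in the tool's header comment (about 13% of one CPU at 100k TCP events/sec). The linear model and the function name estimated_cpu_pct are assumptions made for illustration; real overhead depends on the system and should be measured.

```python
def estimated_cpu_pct(events_per_sec, ref_pct=13.0, ref_rate=100000):
    # Linear extrapolation from one reference point (an assumption,
    # not a measured model).
    return ref_pct * events_per_sec / ref_rate


for rate in (4000, 15000):
    print("%6d events/sec -> ~%.2f%% of one CPU"
          % (rate, estimated_cpu_pct(rate)))
```

Under that assumption, rates of 4k and 15k per second come out at roughly 0.5% and 2% of one CPU, which is in line with the production figures quoted above.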
I much prefer not clearing the screen, so that historic output is in the
scroll-back buffer, and patterns or intermittent issues can be better seen.
You can do this with -C:
# tcptop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end
20:27:12 loadavg: 0.08 0.02 0.17 2/367 17342
PID COMM LADDR RADDR RX_KB TX_KB
17287 17287 100.66.3.172:22 100.127.69.165:57585 3 1
17286 sshd 100.66.3.172:22 100.127.69.165:57585 0 1
14374 sshd 100.66.3.172:22 100.127.69.165:25219 0 0
20:27:13 loadavg: 0.08 0.02 0.17 1/367 17342
PID COMM LADDR RADDR RX_KB TX_KB
17286 sshd 100.66.3.172:22 100.127.69.165:57585 1 7761
14374 sshd 100.66.3.172:22 100.127.69.165:25219 0 0
20:27:14 loadavg: 0.08 0.02 0.17 2/365 17347
PID COMM LADDR RADDR RX_KB TX_KB
17286 17286 100.66.3.172:22 100.127.69.165:57585 1 2501
14374 sshd 100.66.3.172:22 100.127.69.165:25219 0 0
20:27:15 loadavg: 0.07 0.02 0.17 2/367 17403
PID COMM LADDR RADDR RX_KB TX_KB
17349 17349 100.66.3.172:22 100.127.69.165:10161 3 1
17348 sshd 100.66.3.172:22 100.127.69.165:10161 0 1
14374 sshd 100.66.3.172:22 100.127.69.165:25219 0 0
20:27:16 loadavg: 0.07 0.02 0.17 1/367 17403
PID COMM LADDR RADDR RX_KB TX_KB
17348 sshd 100.66.3.172:22 100.127.69.165:10161 3333 0
14374 sshd 100.66.3.172:22 100.127.69.165:25219 0 0
20:27:17 loadavg: 0.07 0.02 0.17 2/366 17409
PID COMM LADDR RADDR RX_KB TX_KB
17348 17348 100.66.3.172:22 100.127.69.165:10161 6909 2
You can disable the loadavg summary line with -S if needed.
USAGE:
# tcptop -h
usage: tcptop.py [-h] [-C] [-S] [-p PID] [interval] [count]
Summarize TCP send/recv throughput by host
positional arguments:
interval output interval, in seconds (default 1)
count number of outputs
optional arguments:
-h, --help show this help message and exit
-C, --noclear don't clear the screen
-S, --nosummary skip system summary line
-p PID, --pid PID trace this PID only
examples:
./tcptop # trace TCP send/recv by host
./tcptop -C # don't clear the screen
./tcptop -p 181 # only trace PID 181
...@@ -13,7 +13,7 @@ import os ...@@ -13,7 +13,7 @@ import os
import re import re
import sys import sys
from bcc import USDTReader from bcc import USDT
trace_root = "/sys/kernel/debug/tracing" trace_root = "/sys/kernel/debug/tracing"
event_root = os.path.join(trace_root, "events") event_root = os.path.join(trace_root, "events")
...@@ -21,7 +21,7 @@ event_root = os.path.join(trace_root, "events") ...@@ -21,7 +21,7 @@ event_root = os.path.join(trace_root, "events")
parser = argparse.ArgumentParser(description= parser = argparse.ArgumentParser(description=
"Display kernel tracepoints or USDT probes and their formats.", "Display kernel tracepoints or USDT probes and their formats.",
formatter_class=argparse.RawDescriptionHelpFormatter) formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("-p", "--pid", type=int, default=-1, help= parser.add_argument("-p", "--pid", type=int, default=None, help=
"List USDT probes in the specified process") "List USDT probes in the specified process")
parser.add_argument("-l", "--lib", default="", help= parser.add_argument("-l", "--lib", default="", help=
"List USDT probes in the specified library or executable") "List USDT probes in the specified library or executable")
...@@ -65,23 +65,23 @@ def print_tracepoints(): ...@@ -65,23 +65,23 @@ def print_tracepoints():
print_tpoint(category, event) print_tpoint(category, event)
def print_usdt(pid, lib): def print_usdt(pid, lib):
reader = USDTReader(bin_path=lib, pid=pid) reader = USDT(path=lib, pid=pid)
probes_seen = [] probes_seen = []
for probe in reader.probes: for probe in reader.enumerate_probes():
probe_name = "%s:%s" % (probe.provider, probe.name) probe_name = probe.short_name()
if not args.filter or fnmatch.fnmatch(probe_name, args.filter): if not args.filter or fnmatch.fnmatch(probe_name, args.filter):
if probe_name in probes_seen: if probe_name in probes_seen:
continue continue
probes_seen.append(probe_name) probes_seen.append(probe_name)
if args.variables: if args.variables:
print(probe.display_verbose()) print(probe)
else: else:
print("%s %s:%s" % (probe.bin_path, print("%s %s:%s" % (probe.bin_path,
probe.provider, probe.name)) probe.provider, probe.name))
if __name__ == "__main__": if __name__ == "__main__":
try: try:
if args.pid != -1 or args.lib != "": if args.pid or args.lib != "":
print_usdt(args.pid, args.lib) print_usdt(args.pid, args.lib)
else: else:
print_tracepoints() print_tracepoints()
......
...@@ -17,25 +17,18 @@ $ tplist -l basic_usdt ...@@ -17,25 +17,18 @@ $ tplist -l basic_usdt
/home/vagrant/basic_usdt basic_usdt:loop_iter /home/vagrant/basic_usdt basic_usdt:loop_iter
/home/vagrant/basic_usdt basic_usdt:end_main /home/vagrant/basic_usdt basic_usdt:end_main
The loop_iter probe sounds interesting. What are the locations of that The loop_iter probe sounds interesting. How many arguments are available?
probe, and which variables are available?
$ tplist '*loop_iter' -l basic_usdt -v $ tplist '*loop_iter' -l basic_usdt -v
/home/vagrant/basic_usdt basic_usdt:loop_iter [sema 0x601036] /home/vagrant/basic_usdt basic_usdt:loop_iter [sema 0x601036]
location 0x400550 raw args: -4@$42 8@%rax 2 location(s)
4 signed bytes @ constant 42 2 argument(s)
8 unsigned bytes @ register %rax
location 0x40056f raw args: 8@-8(%rbp) 8@%rax
8 unsigned bytes @ -8(%rbp)
8 unsigned bytes @ register %rax
This output indicates that the loop_iter probe is used in two locations This output indicates that the loop_iter probe is used in two locations
in the basic_usdt executable. The first location passes a constant value, in the basic_usdt executable, and that it has two arguments. Fortunately,
42, to the probe. The second location passes a variable value located at the argdist and trace tools understand the probe format and can print out
an offset from the %rbp register. Don't worry -- you don't have to trace the arguments automatically -- you can refer to them as arg1, arg2, and
the register values yourself. The argdist and trace tools understand the so on.
probe format and can print out the arguments automatically -- you can
refer to them as arg1, arg2, and so on.
Try to explore with some common libraries on your system and see if they Try to explore with some common libraries on your system and see if they
contain USDT probes. Here are two examples you might find interesting: contain USDT probes. Here are two examples you might find interesting:
......
...@@ -9,7 +9,8 @@ ...@@ -9,7 +9,8 @@
# Licensed under the Apache License, Version 2.0 (the "License") # Licensed under the Apache License, Version 2.0 (the "License")
# Copyright (C) 2016 Sasha Goldshtein. # Copyright (C) 2016 Sasha Goldshtein.
from bcc import BPF, Tracepoint, Perf, USDT from bcc import BPF, USDT
from functools import partial
from time import sleep, strftime from time import sleep, strftime
import argparse import argparse
import re import re
...@@ -58,9 +59,12 @@ class Probe(object): ...@@ -58,9 +59,12 @@ class Probe(object):
cls.first_ts = Time.monotonic_time() cls.first_ts = Time.monotonic_time()
cls.pid = args.pid or -1 cls.pid = args.pid or -1
def __init__(self, probe, string_size): def __init__(self, probe, string_size, kernel_stack, user_stack):
self.usdt = None
self.raw_probe = probe self.raw_probe = probe
self.string_size = string_size self.string_size = string_size
self.kernel_stack = kernel_stack
self.user_stack = user_stack
Probe.probe_count += 1 Probe.probe_count += 1
self._parse_probe() self._parse_probe()
self.probe_num = Probe.probe_count self.probe_num = Probe.probe_count
...@@ -134,10 +138,8 @@ class Probe(object): ...@@ -134,10 +138,8 @@ class Probe(object):
if self.probe_type == "t": if self.probe_type == "t":
self.tp_category = parts[1] self.tp_category = parts[1]
self.tp_event = parts[2] self.tp_event = parts[2]
self.tp = Tracepoint.enable_tracepoint(
self.tp_category, self.tp_event)
self.library = "" # kernel self.library = "" # kernel
self.function = "perf_trace_%s" % self.tp_event self.function = "" # generated from TRACEPOINT_PROBE
elif self.probe_type == "u": elif self.probe_type == "u":
self.library = parts[1] self.library = parts[1]
self.usdt_name = parts[2] self.usdt_name = parts[2]
...@@ -145,30 +147,15 @@ class Probe(object): ...@@ -145,30 +147,15 @@ class Probe(object):
# We will discover the USDT provider by matching on # We will discover the USDT provider by matching on
# the USDT name in the specified library # the USDT name in the specified library
self._find_usdt_probe() self._find_usdt_probe()
self._enable_usdt_probe()
else: else:
self.library = parts[1] self.library = parts[1]
self.function = parts[2] self.function = parts[2]
def _enable_usdt_probe(self):
if self.usdt.need_enable():
if Probe.pid == -1:
self._bail("probe needs pid to enable")
self.usdt.enable(Probe.pid)
def _disable_usdt_probe(self):
if self.probe_type == "u" and self.usdt.need_enable():
self.usdt.disable(Probe.pid)
def close(self):
self._disable_usdt_probe()
def _find_usdt_probe(self): def _find_usdt_probe(self):
reader = USDTReader(bin_path=self.library) self.usdt = USDT(path=self.library, pid=Probe.pid)
for probe in reader.probes: for probe in self.usdt.enumerate_probes():
if probe.name == self.usdt_name: if probe.name == self.usdt_name:
self.usdt = probe return # Found it, will enable later
return
self._bail("unrecognized USDT probe %s" % self.usdt_name) self._bail("unrecognized USDT probe %s" % self.usdt_name)
def _parse_filter(self, filt): def _parse_filter(self, filt):
...@@ -219,7 +206,8 @@ class Probe(object): ...@@ -219,7 +206,8 @@ class Probe(object):
def _replace_args(self, expr): def _replace_args(self, expr):
for alias, replacement in Probe.aliases.items(): for alias, replacement in Probe.aliases.items():
# For USDT probes, we replace argN values with the # For USDT probes, we replace argN values with the
# actual arguments for that probe. # actual arguments for that probe obtained using special
# bpf_readarg_N macros emitted at BPF construction.
if alias.startswith("arg") and self.probe_type == "u": if alias.startswith("arg") and self.probe_type == "u":
continue continue
expr = expr.replace(alias, replacement) expr = expr.replace(alias, replacement)
...@@ -249,6 +237,10 @@ class Probe(object): ...@@ -249,6 +237,10 @@ class Probe(object):
] ]
for i in range(0, len(self.types)): for i in range(0, len(self.types)):
self._generate_python_field_decl(i, fields) self._generate_python_field_decl(i, fields)
if self.kernel_stack:
fields.append(("kernel_stack_id", ct.c_int))
if self.user_stack:
fields.append(("user_stack_id", ct.c_int))
return type(self.python_struct_name, (ct.Structure,), return type(self.python_struct_name, (ct.Structure,),
dict(_fields_=fields)) dict(_fields_=fields))
...@@ -273,12 +265,19 @@ class Probe(object): ...@@ -273,12 +265,19 @@ class Probe(object):
# construct the final display string. # construct the final display string.
self.events_name = "%s_events" % self.probe_name self.events_name = "%s_events" % self.probe_name
self.struct_name = "%s_data_t" % self.probe_name self.struct_name = "%s_data_t" % self.probe_name
self.stacks_name = "%s_stacks" % self.probe_name
stack_table = "BPF_STACK_TRACE(%s, 1024);" % self.stacks_name \
if (self.kernel_stack or self.user_stack) else ""
data_fields = "" data_fields = ""
for i, field_type in enumerate(self.types): for i, field_type in enumerate(self.types):
data_fields += " " + \ data_fields += " " + \
self._generate_field_decl(i) self._generate_field_decl(i)
kernel_stack_str = " int kernel_stack_id;" \
if self.kernel_stack else ""
user_stack_str = " int user_stack_id;" \
if self.user_stack else ""
text = """ text = """
struct %s struct %s
{ {
...@@ -286,26 +285,57 @@ struct %s ...@@ -286,26 +285,57 @@ struct %s
u32 pid; u32 pid;
char comm[TASK_COMM_LEN]; char comm[TASK_COMM_LEN];
%s %s
%s
%s
}; };
BPF_PERF_OUTPUT(%s); BPF_PERF_OUTPUT(%s);
%s
""" """
return text % (self.struct_name, data_fields, self.events_name) return text % (self.struct_name, data_fields,
kernel_stack_str, user_stack_str,
self.events_name, stack_table)
def _generate_field_assign(self, idx): def _generate_field_assign(self, idx):
field_type = self.types[idx] field_type = self.types[idx]
expr = self.values[idx] expr = self.values[idx].strip()
text = ""
if self.probe_type == "u" and expr[0:3] == "arg":
text = (" u64 %s = 0;\n" +
" bpf_usdt_readarg(%s, ctx, &%s);\n") % \
(expr, expr[3], expr)
if field_type == "s": if field_type == "s":
return """ return text + """
if (%s != 0) { if (%s != 0) {
bpf_probe_read(&__data.v%d, sizeof(__data.v%d), (void *)%s); bpf_probe_read(&__data.v%d, sizeof(__data.v%d), (void *)%s);
} }
""" % (expr, idx, idx, expr) """ % (expr, idx, idx, expr)
if field_type in Probe.fmt_types: if field_type in Probe.fmt_types:
return " __data.v%d = (%s)%s;\n" % \ return text + " __data.v%d = (%s)%s;\n" % \
(idx, Probe.c_type[field_type], expr) (idx, Probe.c_type[field_type], expr)
self._bail("unrecognized field type %s" % field_type) self._bail("unrecognized field type %s" % field_type)
def _generate_usdt_filter_read(self):
text = ""
if self.probe_type == "u":
for arg, _ in Probe.aliases.items():
if not (arg.startswith("arg") and (arg in self.filter)):
continue
arg_index = int(arg.replace("arg", ""))
arg_ctype = self.usdt.get_probe_arg_ctype(
self.usdt_name, arg_index)
if not arg_ctype:
self._bail("Unable to determine type of {} "
"in the filter".format(arg))
text += """
{} {}_filter;
bpf_usdt_readarg({}, ctx, &{}_filter);
""".format(arg_ctype, arg, arg_index, arg)
self.filter = self.filter.replace(
arg, "{}_filter".format(arg))
return text
def generate_program(self, include_self): def generate_program(self, include_self):
data_decl = self._generate_data_decl() data_decl = self._generate_data_decl()
# kprobes don't have built-in pid filters, so we have to add # kprobes don't have built-in pid filters, so we have to add
...@@ -324,24 +354,34 @@ BPF_PERF_OUTPUT(%s); ...@@ -324,24 +354,34 @@ BPF_PERF_OUTPUT(%s);
pid_filter = "" pid_filter = ""
prefix = "" prefix = ""
qualifier = ""
signature = "struct pt_regs *ctx" signature = "struct pt_regs *ctx"
if self.probe_type == "t":
data_decl += self.tp.generate_struct()
prefix = self.tp.generate_get_struct()
elif self.probe_type == "u":
signature += ", int __loc_id"
prefix = self.usdt.generate_usdt_cases(
pid=Probe.pid if Probe.pid != -1 else None)
qualifier = "static inline"
data_fields = "" data_fields = ""
for i, expr in enumerate(self.values): for i, expr in enumerate(self.values):
data_fields += self._generate_field_assign(i) data_fields += self._generate_field_assign(i)
text = """ stack_trace = ""
%s int %s(%s) if self.user_stack:
stack_trace += """
__data.user_stack_id = %s.get_stackid(
ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK
);""" % self.stacks_name
if self.kernel_stack:
stack_trace += """
__data.kernel_stack_id = %s.get_stackid(
ctx, BPF_F_REUSE_STACKID
);""" % self.stacks_name
if self.probe_type == "t":
heading = "TRACEPOINT_PROBE(%s, %s)" % \
(self.tp_category, self.tp_event)
ctx_name = "args"
else:
heading = "int %s(%s)" % (self.probe_name, signature)
ctx_name = "ctx"
text = heading + """
{ {
%s
%s %s
%s %s
if (!(%s)) return 0; if (!(%s)) return 0;
...@@ -351,18 +391,15 @@ BPF_PERF_OUTPUT(%s); ...@@ -351,18 +391,15 @@ BPF_PERF_OUTPUT(%s);
__data.pid = bpf_get_current_pid_tgid(); __data.pid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
%s %s
%s.perf_submit(ctx, &__data, sizeof(__data)); %s
%s.perf_submit(%s, &__data, sizeof(__data));
return 0; return 0;
} }
""" """
text = text % (qualifier, self.probe_name, signature, text = text % (pid_filter, prefix,
pid_filter, prefix, self.filter, self._generate_usdt_filter_read(), self.filter,
self.struct_name, data_fields, self.events_name) self.struct_name, data_fields,
stack_trace, self.events_name, ctx_name)
if self.probe_type == "u":
self.usdt_thunk_names = []
text += self.usdt.generate_usdt_thunks(
self.probe_name, self.usdt_thunk_names)
return data_decl + "\n" + text return data_decl + "\n" + text
...@@ -378,7 +415,16 @@ BPF_PERF_OUTPUT(%s); ...@@ -378,7 +415,16 @@ BPF_PERF_OUTPUT(%s);
else: # self.probe_type == 't' else: # self.probe_type == 't'
return self.tp_event return self.tp_event
def print_event(self, cpu, data, size): def print_stack(self, bpf, stack_id, pid):
if stack_id < 0:
print(" %d" % stack_id)
return
stack = list(bpf.get_table(self.stacks_name).walk(stack_id))
for addr in stack:
print(" %016x %s" % (addr, bpf.sym(addr, pid)))
def print_event(self, bpf, cpu, data, size):
# Cast as the generated structure type and display # Cast as the generated structure type and display
# according to the format string in the probe. # according to the format string in the probe.
event = ct.cast(data, ct.POINTER(self.python_struct)).contents event = ct.cast(data, ct.POINTER(self.python_struct)).contents
...@@ -391,6 +437,15 @@ BPF_PERF_OUTPUT(%s); ...@@ -391,6 +437,15 @@ BPF_PERF_OUTPUT(%s);
(time[:8], event.pid, event.comm[:12], (time[:8], event.pid, event.comm[:12],
self._display_function(), msg)) self._display_function(), msg))
if self.user_stack:
print(" User Stack Trace:")
self.print_stack(bpf, event.user_stack_id, event.pid)
if self.kernel_stack:
print(" Kernel Stack Trace:")
self.print_stack(bpf, event.kernel_stack_id, -1)
if self.user_stack or self.kernel_stack:
print("")
Probe.event_count += 1 Probe.event_count += 1
if Probe.max_events is not None and \ if Probe.max_events is not None and \
Probe.event_count >= Probe.max_events: Probe.event_count >= Probe.max_events:
...@@ -402,30 +457,28 @@ BPF_PERF_OUTPUT(%s); ...@@ -402,30 +457,28 @@ BPF_PERF_OUTPUT(%s);
else: else:
self._attach_u(bpf) self._attach_u(bpf)
self.python_struct = self._generate_python_data_decl() self.python_struct = self._generate_python_data_decl()
bpf[self.events_name].open_perf_buffer(self.print_event) callback = partial(self.print_event, bpf)
bpf[self.events_name].open_perf_buffer(callback)
def _attach_k(self, bpf): def _attach_k(self, bpf):
if self.probe_type == "r": if self.probe_type == "r":
bpf.attach_kretprobe(event=self.function, bpf.attach_kretprobe(event=self.function,
fn_name=self.probe_name) fn_name=self.probe_name)
elif self.probe_type == "p" or self.probe_type == "t": elif self.probe_type == "p":
bpf.attach_kprobe(event=self.function, bpf.attach_kprobe(event=self.function,
fn_name=self.probe_name) fn_name=self.probe_name)
# Note that tracepoints don't need an explicit attach
def _attach_u(self, bpf): def _attach_u(self, bpf):
libpath = BPF.find_library(self.library) libpath = BPF.find_library(self.library)
if libpath is None: if libpath is None:
# This might be an executable (e.g. 'bash') # This might be an executable (e.g. 'bash')
libpath = BPF._find_exe(self.library) libpath = BPF.find_exe(self.library)
if libpath is None or len(libpath) == 0: if libpath is None or len(libpath) == 0:
self._bail("unable to find library %s" % self.library) self._bail("unable to find library %s" % self.library)
if self.probe_type == "u": if self.probe_type == "u":
for i, location in enumerate(self.usdt.locations): pass # Was already enabled by the BPF constructor
bpf.attach_uprobe(name=libpath,
addr=location.address,
fn_name=self.usdt_thunk_names[i],
pid=Probe.pid)
elif self.probe_type == "r": elif self.probe_type == "r":
bpf.attach_uretprobe(name=libpath, bpf.attach_uretprobe(name=libpath,
sym=self.function, sym=self.function,
...@@ -459,7 +512,7 @@ trace 'r::__kmalloc (retval == 0) "kmalloc failed!"
Trace returns from __kmalloc which returned a null pointer
trace 'r:c:malloc (retval) "allocated = %p", retval
Trace returns from malloc and print non-NULL allocated buffers
-trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
+trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
trace 'u:pthread:pthread_create (arg4 != 0)'
Trace the USDT probe pthread_create when its 4th argument is non-zero
...@@ -482,6 +535,10 @@ trace 'u:pthread:pthread_create (arg4 != 0)' ...@@ -482,6 +535,10 @@ trace 'u:pthread:pthread_create (arg4 != 0)'
help="number of events to print before quitting") help="number of events to print before quitting")
parser.add_argument("-o", "--offset", action="store_true", parser.add_argument("-o", "--offset", action="store_true",
help="use relative time from first traced message") help="use relative time from first traced message")
parser.add_argument("-K", "--kernel-stack", action="store_true",
help="output kernel stack trace")
parser.add_argument("-U", "--user_stack", action="store_true",
help="output user stack trace")
parser.add_argument(metavar="probe", dest="probes", nargs="+", parser.add_argument(metavar="probe", dest="probes", nargs="+",
help="probe specifier (see examples)") help="probe specifier (see examples)")
self.args = parser.parse_args() self.args = parser.parse_args()
...@@ -491,7 +548,8 @@ trace 'u:pthread:pthread_create (arg4 != 0)' ...@@ -491,7 +548,8 @@ trace 'u:pthread:pthread_create (arg4 != 0)'
self.probes = [] self.probes = []
for probe_spec in self.args.probes: for probe_spec in self.args.probes:
self.probes.append(Probe( self.probes.append(Probe(
probe_spec, self.args.string_size)) probe_spec, self.args.string_size,
self.args.kernel_stack, self.args.user_stack))
def _generate_program(self): def _generate_program(self):
self.program = """ self.program = """
...@@ -501,8 +559,6 @@ trace 'u:pthread:pthread_create (arg4 != 0)' ...@@ -501,8 +559,6 @@ trace 'u:pthread:pthread_create (arg4 != 0)'
""" """
self.program += BPF.generate_auto_includes( self.program += BPF.generate_auto_includes(
map(lambda p: p.raw_probe, self.probes)) map(lambda p: p.raw_probe, self.probes))
self.program += Tracepoint.generate_decl()
self.program += Tracepoint.generate_entry_probe()
for probe in self.probes: for probe in self.probes:
self.program += probe.generate_program( self.program += probe.generate_program(
self.args.include_self) self.args.include_self)
...@@ -511,8 +567,18 @@ trace 'u:pthread:pthread_create (arg4 != 0)' ...@@ -511,8 +567,18 @@ trace 'u:pthread:pthread_create (arg4 != 0)'
print(self.program) print(self.program)
def _attach_probes(self): def _attach_probes(self):
self.bpf = BPF(text=self.program) usdt_contexts = []
Tracepoint.attach(self.bpf) for probe in self.probes:
if probe.usdt:
# USDT probes must be enabled before the BPF object
# is initialized, because that's where the actual
# uprobe is being attached.
probe.usdt.enable_probe(
probe.usdt_name, probe.probe_name)
if self.args.verbose:
print(probe.usdt.get_text())
usdt_contexts.append(probe.usdt)
self.bpf = BPF(text=self.program, usdt_contexts=usdt_contexts)
for probe in self.probes: for probe in self.probes:
if self.args.verbose: if self.args.verbose:
print(probe) print(probe)
...@@ -530,12 +596,6 @@ trace 'u:pthread:pthread_create (arg4 != 0)' ...@@ -530,12 +596,6 @@ trace 'u:pthread:pthread_create (arg4 != 0)'
while True: while True:
self.bpf.kprobe_poll() self.bpf.kprobe_poll()
def _close_probes(self):
for probe in self.probes:
probe.close()
if self.args.verbose:
print("closed probe: " + str(probe))
def run(self): def run(self):
try: try:
self._create_probes() self._create_probes()
...@@ -547,7 +607,6 @@ trace 'u:pthread:pthread_create (arg4 != 0)' ...@@ -547,7 +607,6 @@ trace 'u:pthread:pthread_create (arg4 != 0)'
traceback.print_exc() traceback.print_exc()
elif sys.exc_info()[0] is not SystemExit: elif sys.exc_info()[0] is not SystemExit:
print(sys.exc_info()[1]) print(sys.exc_info()[1])
self._close_probes()
if __name__ == "__main__": if __name__ == "__main__":
Tool().run() Tool().run()
...@@ -84,15 +84,15 @@ trace has also some basic support for kernel tracepoints. For example, let's
trace the block:block_rq_complete tracepoint and print out the number of sectors
transferred:
-# trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
+# trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
TIME PID COMM FUNC -
01:23:51 0 swapper/0 block_rq_complete sectors=8
01:23:55 10017 kworker/u64: block_rq_complete sectors=1
01:23:55 0 swapper/0 block_rq_complete sectors=8
^C
-To discover the tracepoint structure format (which you can refer to as the "tp"
-variable), use the tplist tool. For example:
+To discover the tracepoint structure format (which you can refer to as the "args"
+pointer variable), use the tplist tool. For example:
# tplist -v block:block_rq_complete
block:block_rq_complete
...@@ -102,7 +102,7 @@ block:block_rq_complete
int errors;
char rwbs[8];
-This output tells you that you can use "tp.dev", "tp.sector", etc. in your
+This output tells you that you can use "args->dev", "args->sector", etc. in your
predicate and trace arguments.
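The switch from "tp." to "args->" follows from how the new generate_program picks the function heading and context name for each probe type. A minimal standalone sketch of that selection, assuming a hypothetical helper name (probe_heading) and hypothetical probe names; the format strings themselves are the ones in the diff:

```python
def probe_heading(probe_type, probe_name, tp_category=None, tp_event=None,
                  signature="struct pt_regs *ctx"):
    """Return (heading, ctx_name) for the generated BPF C function.

    Hypothetical helper: it mirrors the branch in generate_program,
    it is not part of trace.py itself.
    """
    if probe_type == "t":
        # Tracepoints use the TRACEPOINT_PROBE macro, whose context
        # pointer is called "args" (hence args->nr_sector in specifiers).
        heading = "TRACEPOINT_PROBE(%s, %s)" % (tp_category, tp_event)
        ctx_name = "args"
    else:
        # All other probe types get a plain function taking pt_regs.
        heading = "int %s(%s)" % (probe_name, signature)
        ctx_name = "ctx"
    return heading, ctx_name


# "probe_block_1" is an illustrative probe name, not taken from the commit.
heading, ctx_name = probe_heading("t", "probe_block_1",
                                  "block", "block_rq_complete")
print(heading)
print(ctx_name)
```

The returned ctx_name is what the generated code passes as the first argument of perf_submit.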
As a final example, let's trace open syscalls for a specific process. By
...@@ -169,7 +169,7 @@ trace 'r::__kmalloc (retval == 0) "kmalloc failed!"
Trace returns from __kmalloc which returned a null pointer
trace 'r:c:malloc (retval) "allocated = %p", retval
Trace returns from malloc and print non-NULL allocated buffers
-trace 't:block:block_rq_complete "sectors=%d", tp.nr_sector'
+trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
trace 'u:pthread:pthread_create (arg4 != 0)'
Trace the USDT probe pthread_create when its 4th argument is non-zero
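For a USDT filter such as (arg4 != 0), the new _generate_usdt_filter_read emits a local variable plus a bpf_usdt_readarg call, then rewrites the filter to use that variable. A rough standalone sketch of the rewrite, assuming a hypothetical function name and a caller-supplied dict of argument C types (in trace.py the type comes from the USDT context, and "uint64_t" below is only an illustrative value):

```python
def usdt_filter_read(filter_expr, arg_ctypes):
    """Return (c_text, rewritten_filter) for a USDT probe filter.

    Hypothetical helper. arg_ctypes maps a 1-based argument index to
    its C type. Like the code in the diff, this uses plain substring
    replacement, so it assumes single-digit argument indexes.
    """
    text = ""
    for arg_index, arg_ctype in arg_ctypes.items():
        arg = "arg%d" % arg_index
        if arg not in filter_expr:
            continue
        # Declare a local and read the USDT argument into it.
        text += "%s %s_filter;\n" % (arg_ctype, arg)
        text += "bpf_usdt_readarg(%d, ctx, &%s_filter);\n" % (arg_index, arg)
        # Point the filter at the local instead of the raw argN name.
        filter_expr = filter_expr.replace(arg, "%s_filter" % arg)
    return text, filter_expr


text, new_filter = usdt_filter_read("(arg4 != 0)", {4: "uint64_t"})
print(text)
print(new_filter)
```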
......