
calltop's Introduction

calltop - eBPF powered tracing tool

This program provides a lightweight, real-time view of system calls and traces Python/Java/PHP/Ruby function calls.

It uses eBPF (Linux only) to trace and report statistics on syscalls and on function/method calls made by Python, Java, PHP, or Ruby programs. By default it traces every system call of every process, then prints the results in a top-like manner.

You can also trace your application calls by selecting its pid within the tool.


Features

  • display the number, rate and latency of system calls and function/method calls.
  • top-like output.
    • increase / decrease refresh rate
    • sort stats (pid, process name, total count, rate)
    • reset stats
  • filtering at the command line:
    • on the function name
    • on PIDs
    • on process names
  • filtering in the tool
    • filter dynamically on process name, command line, pid, system call, or function.
  • batch mode (for your scripts)
  • trace userspace application functions.

Features on the roadmap

  • integration with graphing tools
  • detailed stats on a given function

How to use this tool

The usage output covers most of what you need:

$ sudo ./calltop.py -h
usage: calltop.py [-h] [-e SYSCALL] [-i INTERVAL] [-p PID] [-c COMM]
                  [--no-latency] [-b]

It prints realtime view of the Linux syscalls but also languages method calls.
It uses eBPF to do the tracing. So it is working only on Linux.

optional arguments:
  -h, --help            show this help message and exit
  -e SYSCALL, --syscall SYSCALL
                        -e open,read,write,sendto. Used to trace ONLY specific
                        syscalls. It uses kprobe. Without this option
                        TRACEPOINT are used to get all syscalls.
  -i INTERVAL, --interval INTERVAL
                        Set the interval in sec : -i 0.5
  -p PID, --pid PID     Filter on pids : --pid 10001,10002,10003
  -c COMM, --comm COMM  Filter on comm : --comm nginx,memcache,redis
  --no-latency          Do not display latency of the functions you trace. It
                        saves a few nanoseconds per call.
  -b, --batch           Print output in batch mode

Then, while the tool is running, you can:

  • filter dynamically (process name, pid, system call, or function): [f] key. Type the filter and press ENTER.
    • The filter should look like this:
      • pid:1234
      • sys:read,comm:nginx
      • sys:bpf,comm:calltop
      • fn:print_body,comm:calltop,pid:1234
  • trace function/method calls from Python/Java/PHP/Ruby in your app: [t] key. Type the pid of the process you want to trace and press ENTER to validate. It attaches USDT probes to this pid.
  • reset the data: [z] key
  • print the command line: [c] key
  • sort processes with the arrow keys (right and left)
    • you can sort on pid, process name, rate and total number of calls.
    • you can also reverse the sort order (increasing or decreasing) by pressing the R key.
  • sort the stats within each process. You can sort on:
    • function name
    • latency
    • call/s
    • Total
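
To make the filter syntax above concrete, here is a minimal sketch of how such an expression could be parsed. The function name `parse_filter` and the set of accepted keys are assumptions taken from the examples in this README, not the tool's actual implementation.

```python
def parse_filter(expr):
    """Parse 'key:value,key:value' (e.g. 'sys:read,comm:nginx') into a dict."""
    # Keys seen in the README examples; the real tool may accept others.
    allowed = {"pid", "sys", "comm", "fn"}
    result = {}
    for term in expr.split(","):
        term = term.strip()
        if not term:
            continue
        key, sep, value = term.partition(":")
        if not sep or key not in allowed or not value:
            raise ValueError("invalid filter term: %r" % term)
        # PIDs are numeric; every other value is kept as text.
        result[key] = int(value) if key == "pid" else value
    return result
```

For example, `parse_filter("sys:read,comm:nginx")` yields a dictionary with one entry per key, which the display loop could then match against each row.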

Installation

This tool is written in Python and only needs the bcc packages. See the section below.

iovisor/bcc packages

It requires the following packages:

$ dpkg  -l | grep -e bpfcc
ii  bpfcc-tools           0.12.0-2       all          tools for BPF Compiler Collection (BCC)
ii  libbpfcc              0.12.0-2       amd64        shared library for BPF Compiler Collection (BCC)
ii  python-bpfcc          0.12.0-2       all          Python wrappers for BPF Compiler Collection (BCC)
ii  python3-bpfcc         0.12.0-2       all          Python 3 wrappers for BPF Compiler Collection (BCC)

You will need to install the packages above. bcc is already packaged in the major Linux distributions, though package names may vary between distributions. The minimum version is 0.12.0. If your distribution does not package v0.12+, you can follow the detailed installation guide of the iovisor/bcc project.

Docker Images

Dockerfiles are provided to help you test the tool. Inside a container you are limited to syscall tracing: Python tracing is not supported because the container cannot see the host processes.

To build and run the container, use the command lines below. You need to be root and add the --privileged flag to run it.

docker build . -t calltop --file [email protected]
sudo docker run -it -v /sys:/sys  -v /lib/modules:/lib/modules -v /usr/src/:/usr/src --privileged calltop

Have fun !

Developed by Emilien Gobillot


calltop's Issues

[Feature] add possibility to pause print display

Is your feature request related to a problem? Please describe.
Sometimes you want to read a piece of information but the screen refreshes too quickly. Pressing the space key should stop the screen refresh; a second press on the space key leaves pause mode.

Describe the solution you'd like

  • Pause/unpause with space key.
  • During the pause, the program keeps reading the data from the map but does not print anything. In effect, pause = skip print_body().
  • There is no pause in batch mode.

[BUG] script does not find ebpf.c if not called from its directory

Description
The path used to load ebpf.c does not work when the script is called from outside its directory.

To Reproduce
sudo ./calltop/calltop.py -p 6413
Traceback (most recent call last):
File "./calltop/calltop.py", line 844, in <module>
main(display)
File "./calltop/calltop.py", line 826, in main
b = create_and_load_bpf(syscall_list, latency)
File "./calltop/calltop.py", line 662, in create_and_load_bpf
with open('ebpf.c', 'r') as ebpf_code:
IOError: [Errno 2] No such file or directory: 'ebpf.c'
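
One possible fix, sketched under the assumption that ebpf.c lives next to calltop.py: build the path from the script's own location instead of the current working directory. The helper name `ebpf_source_path` is hypothetical.

```python
import os


def ebpf_source_path(script_file, name="ebpf.c"):
    """Return the path of `name` next to the script, whatever the current directory is."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, name)

# Inside calltop.py this would be used as:
#     with open(ebpf_source_path(__file__), 'r') as ebpf_code: ...
```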

[Feature] Show active dynamic filter on process name

Is your feature request related to a problem? Please describe.
The interactive view gives no way to tell whether a filter is in place.
This is especially annoying as the reset action does not clear the filter (but that is a separate bug report).

Describe the solution you'd like
The fact that a filter is active should be explicit and visible, and possibly the filter rule as well.
Maybe added to the bottom line.

Describe alternatives you've considered
Alternative ideas to display it :

  • add a 'filtered' word in the column title for the processes.
  • adding a second line at the top, providing the filter value

[Feature] Add a column for cumulated time

Is your feature request related to a problem? Please describe.
We have the cumulated number of calls, but not the cumulated time.

Describe the solution you'd like
One more column

[Feature] PID filtering has to be done in eBPF

PID filtering has to be done in eBPF
The PID filtering at the command line is currently done in user space. To reduce the tracing overhead, filtering in eBPF is a better solution.

eBPF filtering
The idea is to pass the pid(s) to the eBPF program and, if the current pid does not match, return 0.
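
A common way to do this with bcc is to generate the filter as C text and substitute it into the program before compiling. The sketch below assumes a `PID_FILTER` placeholder and a `pid` variable in the eBPF source; both names, and the `TEMPLATE` fragment, are hypothetical and do not come from calltop's ebpf.c.

```python
def build_pid_filter(pids):
    """Return a C snippet that makes the probe return early for unlisted PIDs."""
    if not pids:
        return ""  # no filter: trace every process
    checks = " && ".join("pid != %d" % int(p) for p in pids)
    return "if (%s) { return 0; }" % checks


# Hypothetical fragment of an eBPF program. The upper 32 bits of
# bpf_get_current_pid_tgid() hold the process id as seen from user space.
TEMPLATE = """
u32 pid = bpf_get_current_pid_tgid() >> 32;
PID_FILTER
"""

# The resulting text would then be passed to BPF(text=prog).
prog = TEMPLATE.replace("PID_FILTER", build_pid_filter([10001, 10002]))
```

With this approach, unmatched processes leave the probe after a few instructions and never touch the map.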

[BUG] Debug option (-d) does nothing

Describe the bug
The debug CLI option (-d) does nothing. According to the inline documentation (-h), it outputs 'eBPF code'.

Environment

  • Distribution : Archlinux
  • Kernel : Linux 5.6.3-arch1-1 #1 SMP PREEMPT x86_64 GNU/Linux
  • Version or revision : git clone, hash 8ee8654
  • package dependencies version : bcc 0.13.0-1, bcc-tools 0.13.0-1, python-bcc 0.13.0-1

[Feature] Add shortcut keys to jump to the top/bottom of the display

When you scroll down or up and want to come back to the top or bottom, it is tedious to move back.
Imagine you have scrolled down a lot: to come back to the top, you will have to scroll up a lot too.

Solution
One shortcut key to go to the first line, one to go to the last one.

[Feature] Add percentile on the interval

Display the minimum / maximum latency for every stats collected during the interval.

Computing the 95th or 5th percentile is trickier, since it has to be done in eBPF and has to be memory and time efficient.

[enhancement] Lookup and insert complexity is O(n); make it O(1)

Lookup and insert complexity is O(n); make it O(1)

The first implementation of CtCollection is a list of Doc.

  • This was a first, naive approach. The drawback: it is not suitable when there are many (1000+) docs in the collection.

A hash table is better suited

  • Use a dictionary instead of a list
  • The key will be "pid+process_name"
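
A minimal sketch of the dictionary-based collection. It uses a `(pid, comm)` tuple as the key rather than a concatenated string, and a plain dict as a stand-in for Doc; the method name `get_or_create` is an assumption, not calltop's actual API.

```python
class CtCollection:
    """Collection of per-process docs indexed by (pid, process name)."""

    def __init__(self):
        self._docs = {}

    def get_or_create(self, pid, comm):
        """Average O(1) lookup and insert, instead of scanning a list."""
        key = (pid, comm)
        doc = self._docs.get(key)
        if doc is None:
            # Stand-in for the real Doc object.
            doc = {"pid": pid, "comm": comm, "stats": {}}
            self._docs[key] = doc
        return doc

    def __len__(self):
        return len(self._docs)
```

A tuple key avoids ambiguous concatenations (for example pid 1 with name "2x" versus pid 12 with name "x").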

Output in batch mode includes log messages

Description

When using batch mode, output includes log messages in addition to expected data.
It could confuse output parsing and make integration more difficult.

I've noticed error messages and more informational messages as well.

To Reproduce
Steps to reproduce the behavior:

  1. Run calltop in batch mode: 'sudo timeout 30 ./calltop.py -b > /tmp/calltop.out'
  2. Look at output, in particular its beginning

Expected behavior
Output only contains data, each batch separated with an empty line.
Not error or other information messages.
Error messages should be sent to stderr.

Actual behaviour
Error/info messages at the beginning of the file:
head /tmp/calltop.out
Failed to attach to kprobe b'__x64_sys_all'
b'Collecting first data ...'
Pid Process name Function latency(us) Call/s Total
569 chromium read 0 1 1
569 chromium poll 0 3 3

Environment

  • kernel: Linux 5.6.2-arch1-2 #1 SMP PREEMPT x86_64 GNU/Linux
  • Distrib: Archlinux x86_64, stable
  • Version or revision: git clone, 8ee8654
  • package dependencies version: bcc 0.13.0-1, bcc-tools 0.13.0-1, python-bcc 0.13.0-1

Additional context
Maybe the best solution would be to add an option for batch output, so that it can be sent to a file.
At the very least, all error and info messages should be redirected to stderr (?)
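
The separation requested here can be sketched as follows: diagnostics go to stderr, and only formatted data reaches stdout. The helper names `log` and `format_batch` are hypothetical.

```python
import sys


def log(message, stream=None):
    """Send diagnostics to stderr so they never mix with batch data."""
    print(message, file=stream or sys.stderr)


def format_batch(rows):
    """Format one batch: one line per row, followed by an empty separator line."""
    lines = [" ".join(str(col) for col in row) for row in rows]
    return "\n".join(lines) + "\n\n"

# Usage in batch mode:
#     log("Collecting first data ...")
#     sys.stdout.write(format_batch(rows))
```

With this split, `sudo ./calltop.py -b > /tmp/calltop.out` would capture data only, while messages stay visible on the terminal.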

[Feature] Add filter on process name

It would be great to filter on process name:
just enter the process name or a part of it, and display only the processes whose name matches.
The solution expected
Like the htop F4 filter, or something similar.

[Feature] Header line is not always visible.

Header Line is no longer visible when scrolling down
The header line is only visible on the first screen; as soon as we scroll down, it disappears. This is not convenient.

Solution
Always print the header line.

key [f] mapped for dynamic filter is not documented

Describe the bug
It is apparently possible to dynamically filter while using calltop in interactive mode.
But how to use it is hard to guess! I discovered (by accident) that it was the F key.

To Reproduce
Steps to reproduce the behavior:

  1. Run calltop in interactive mode: 'sudo ./calltop.py -l'
  2. Look for keymapping on last line
  3. Quit and look for key bindings in the embedded help: './calltop.py -h'
  4. Look for documentation: 'cat README.md'

Expected behavior
2. Key mapped for filtering is listed in the bottom line

Actual behaviour
2. Key mapped for filtering is listed in the bottom line, as other keys mapped to other actions
3. Nothing about key mapping
4. Nothing about key mapping

Environment

  • kernel: [i.e uname -a output]
  • Distrib [e.g. ubuntu 18.10, Debian 9]
  • Version or revision : git clone, 8ee8654
  • package dependencies version [i.e bpfcc-tools version

Additional context
It may not be relevant to document it everywhere, but at least once, and in a way consistent with the other mapped keys.

[Feature] display latency should be done by default

Is your feature request related to a problem? Please describe.
I think users want the average latency per function by default. So we should remove the -l flag for latency and make it the default behaviour.

Describe the solution you'd like
calltop.py will display latency metrics.
calltop.py --no-latency won't display latency metrics

Describe alternatives you've considered
Do we need the --no-latency flag? The only reason to keep it is performance, as latency needs a second eBPF probe attached on return.

[BUG] Project requires external dependency python-psutil

Describe the bug
calltop.py imports psutil, an external dependency that is not mentioned in the documentation.

To Reproduce
Steps to reproduce the behavior:

  1. Run './calltop.py'
  2. Execution fails due to missing dependency
    '''$ ./calltop.py
    Traceback (most recent call last):
    File "./calltop.py", line 23, in <module>
    import psutil
    ModuleNotFoundError: No module named 'psutil'
    '''

Expected behavior
Execution does not fail.
Or this dependency is explained in documentation.

Environment

  • kernel: Linux tperson 5.8.5-arch1-1 #1 SMP PREEMPT Thu, 27 Aug 2020 18:53:02 +0000 x86_64 GNU/Linux
  • Distrib: Archlinux

[minor] monotonic_time() returns time in sec

monotonic_time() should return time in nanoseconds

As described in its docstring, monotonic_time() should return the time in nanoseconds.

The time actually returned is in seconds.
Please fix it.
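
A minimal sketch of a nanosecond version, assuming Python 3 (the existing implementation may rely on a different clock source):

```python
import time


def monotonic_time():
    """Return a monotonic timestamp in nanoseconds, as an integer."""
    if hasattr(time, "monotonic_ns"):  # available since Python 3.7
        return time.monotonic_ns()
    # Fallback for Python 3.3 to 3.6: convert seconds to nanoseconds.
    return int(time.monotonic() * 1e9)
```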

[BUG] Sorting options are confusing and too limited

Describe the bug
Interactive mode provides a way to sort based on different criteria.
But it seems very limited and seems to work in a quite confusing way.

Expected behavior
It should be possible to change and choose ascending or descending sort.

Additional information
When sorting by "total count" or by "Call/s", the resulting order does not seem very logical.

It seems to me that processes are first ordered by their total calls (all functions summed up), which is a value that is not shown! Maybe you should add a "process summary" line for each process?

Even so, the order seems wrong to me (?). I would expect that, for each process, lines are ordered by descending call/s or total, but that is apparently not the case.

So it is unclear whether the sort applies to processes, to the functions of each process, or to both (somehow).

[Feature] Print command line instead of comm when a special key is pressed

Is your feature request related to a problem? Please describe.
For certain processes like python, the tool only prints python, which is not useful to clearly identify the application.

Describe the solution you'd like
2 solutions :

  • Like in top: press the 'c' key to expand the command line
  • Like with ps: add an argument to the calltop command line to get the command line of every process

Use the pid to collect the command line from /proc/PID/cmdline.
Note that you will need to replace the NUL characters with spaces. See the output below.

cat /proc/3285/cmdline| tr '\0' ' ' 
/usr/lib/firefox/firefox -contentproc -childID 3 -isForBrowser -prefsLen 7169 -prefMapSize 215831 -parentBuildID 20200403064753 -greomni /usr/lib/firefox/omni.ja -appomni /usr/lib/firefox/browser/omni.ja -appdir /usr/lib/firefox/browser 3081 true tab
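
The same conversion can be sketched in Python. The function names are hypothetical; the only behaviour assumed about /proc/PID/cmdline is the NUL-separated format shown above.

```python
def format_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/PID/cmdline into a printable string."""
    return raw.replace(b"\0", b" ").decode("utf-8", errors="replace").strip()


def read_cmdline(pid):
    """Return the command line of `pid`, or '' if the process is gone or unreadable."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            return format_cmdline(f.read())
    except (IOError, OSError):
        return ""
```

Returning an empty string lets the display fall back to comm for short-lived processes and for kernel threads, whose cmdline is empty.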

[Enhancement] make calltop.py compatible with python3

Description
calltop.py works with python2.7.
python3-bpfcc makes it possible to use python3 with bcc.

To Reproduce
python3 calltop.py
screen is empty; nothing is reported.

Expected behavior
works like with python2.7

Environment
bpfcc-tools 0.8.0-4
bpftrace 0.8+git60-gccac69c2239b-1
libbpfcc 0.8.0-4
python-bpfcc 0.8.0-4
python3-bpfcc 0.8.0-4

Additional context
install python 3 for bcc
sudo apt-get install python3-bpfcc

Dynamic filter is not cleared on reset

Describe the bug
In interactive mode, pressing the reset key (z) does not reset the dynamic filter on processes

To Reproduce
Steps to reproduce the behavior:

  1. Run 'sudo ./latencyTop.py -l'
  2. Filter on one process : press 'f' then a process name (e.g. 'firefox')
  3. Reset : press 'z'

Expected behavior
3. reset action also resets the dynamic filter on process name.

Environment

  • Version or revision: git clone, 8ee8654

[BUG] Add support for terminals without color support

Describe the bug
calltop does not work in a terminal without color support.

To Reproduce
Steps to reproduce the behavior:

  1. Use a terminal that does not support colors : run xterm -cm
  2. run ./calltop.py
  3. Error :
    Traceback (most recent call last):
    File "./calltop.py", line 821, in <module>
    display.printHeader("Exiting...")
    NameError: name 'display' is not defined

Expected behavior
No error

Environment

  • kernel: [i.e uname -a output]
  • Distrib [e.g. ubuntu 18.10, Debian 9]
  • Version or revision
  • package dependencies version [i.e bpfcc-tools version

Additional context
The terminal does not support colors.
Please use black and white in this case.

[Feature] add batch mode

add batch mode
Add a -b option to enable a batch mode. It is useful to record the output to a text file.

expected output
Keep the same output, but print the new stats to stdout at the given rate.

[BUG] with --no-latency flag, latency displayed is 0.00

Describe the bug
When we do not want to collect latency, the latency should not be displayed.

To Reproduce
python3 ./calltop.py --no-latency

Expected behavior

2 solutions:

  • instead of 0.00, display '-' or N/A, or leave it blank (preferred solution)
  • Do not display the column. Not ideal in batch mode if the output is used with awk: the Call/s column index would differ depending on whether latency is wanted or not, so the complexity is pushed to awk.
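
The preferred solution could look like the sketch below, which keeps the column in place so that awk field indexes stay stable. The function name and the two-decimal format are assumptions.

```python
def format_latency(latency_us, latency_enabled):
    """Return the latency cell: a number when collected, '-' otherwise."""
    if not latency_enabled or latency_us is None:
        return "-"
    return "%.2f" % latency_us
```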

[enhancement] make call rate more accurate

Metrics precision is the most important
The call rate computation is a bit too naïve. It suffers from delays in the main loop, which happen when there is a lot of data in the bpf map.
Today call_rate = call_nb / static_sampling_interv ==> rps = stSysStats.cntPerIntvl / self.refreshIntvl

Get a more accurate sampling interval per call
The interval should be the time between two read accesses. If the main loop slows down, the sampling interval will be bigger and will reflect the right call rate.

Do it in user space
Avoid doing it in the eBPF code. The fewer instructions, the lower the tracing impact.
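
A user-space sketch of this idea: divide the call count by the time actually elapsed since the previous read instead of the configured refresh interval. The class name `RateMeter` is hypothetical, and `time.monotonic` assumes Python 3.

```python
import time


class RateMeter:
    """Compute calls per second from the time actually elapsed between two reads."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = clock()

    def rate(self, calls_since_last_read):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        if elapsed <= 0:
            return 0.0  # avoid dividing by zero on back-to-back reads
        return calls_since_last_read / elapsed
```

If the main loop takes 2 seconds instead of the configured 1 second, 100 calls are reported as 50 calls/s rather than 100.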

stats inactivity is set at 5 sec; seems too much

stats inactivity threshold is set at 5 sec; seems too much

Inactive stats are cleared in the ebpf map

An inactive stat is a per-process stat that has not been updated for a given amount of time. To free eBPF map entries, we clear inactive stats, as they are less likely to be accessed concurrently.
This design is a workaround for the non-atomicity of operations on eBPF maps. The less we touch the map, the better the atomicity (though it is not guaranteed).

So far the inactivity threshold is set at 5 seconds. This is a bit too much.
Let's try 1 second; 1 second is already an eternity.
It will put more pressure on old stats in the map:

  • we will save cpu
  • we do not take too many risks

[Feature] Trace more languages

Is your feature request related to a problem? Please describe.
Tracing python is cool, tracing more languages is super cool 👍

Describe the solution you'd like

  • Auto detect the language behind a PID, so we do not need to specify the language of a process
  • Attach probes to the right functions in order to collect latency and rate

[doc] add a page in the wiki to help building python3 with dtrace support

With Ubuntu 20.04, the python3 package comes with the dtrace support that allows tracing with USDT (eBPF). That is not the case for previous Ubuntu versions, and other Linux distributions may have the same problem. It is important to describe how to build Python with dtrace enabled (--with-dtrace flag).
A wiki page would do the job.

[Feature] Help new users identify that root privileges are mandatory

The exception below is often caused by missing root privileges

Traceback (most recent call last):
File "./calltop.py", line 910, in <module>
main(display)
File "./calltop.py", line 892, in main
b = create_and_load_bpf(syscall_list, latency)
File "./calltop.py", line 762, in create_and_load_bpf
b = BPF(text=prog)
File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 321, in __init__
raise Exception("Failed to compile BPF text")
Exception: Failed to compile BPF text

Describe the solution you'd like
A more explicit message would be helpful :

It fails compiling and load the eBPF. Do you have root access ?
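
One way to produce such a message, sketched with hypothetical helper names: wrap the BPF load and add a hint when the effective user id is not 0. `loader` stands for whatever callable builds the program, for instance `lambda: BPF(text=prog)`.

```python
import os
import sys


def explain_bpf_failure(error, euid):
    """Build a readable message, with a root hint when not running as root."""
    msg = "Failed to compile and load the eBPF program: %s" % error
    if euid != 0:
        msg += "\nThis usually requires root privileges. Try running with sudo."
    return msg


def load_bpf(loader):
    """Call `loader` and exit with a readable message instead of a traceback."""
    try:
        return loader()
    except Exception as exc:
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit(explain_bpf_failure(exc, os.geteuid()))
```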

[BUG] Error when running '--help'

Describe the bug
Running the --help fails. Note that other commands work nicely.

To Reproduce
Steps to reproduce the behavior:

  1. Install python-psutil (if needed)
  2. Run './calltop.py -h'
  3. Execution fails
    '''
    $ ./calltop.py -h
    Traceback (most recent call last):
    File "./calltop.py", line 1173, in main
    args = parser.parse_args()
    File "/usr/lib/python3.8/argparse.py", line 1768, in parse_args
    args, argv = self.parse_known_args(args, namespace)
    File "/usr/lib/python3.8/argparse.py", line 1800, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
    File "/usr/lib/python3.8/argparse.py", line 2006, in _parse_known_args
    start_index = consume_optional(start_index)
    File "/usr/lib/python3.8/argparse.py", line 1946, in consume_optional
    take_action(action, args, option_string)
    File "/usr/lib/python3.8/argparse.py", line 1874, in take_action
    action(self, namespace, argument_values, option_string)
    File "/usr/lib/python3.8/argparse.py", line 1044, in __call__
    parser.print_help()
    File "/usr/lib/python3.8/argparse.py", line 2494, in print_help
    self._print_message(self.format_help(), file)
    File "/usr/lib/python3.8/argparse.py", line 2478, in format_help
    return formatter.format_help()
    File "/usr/lib/python3.8/argparse.py", line 282, in format_help
    help = self._root_section.format_help()
    File "/usr/lib/python3.8/argparse.py", line 213, in format_help
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 213, in <listcomp>
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 213, in format_help
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 213, in <listcomp>
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 530, in _format_action
    help_lines = self._split_lines(help_text, help_width)
    File "/usr/lib/python3.8/argparse.py", line 634, in _split_lines
    text = self._whitespace_matcher.sub(' ', text).strip()
    TypeError: cannot use a string pattern on a bytes-like object
    '''

Expected behavior
Help message is provided
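
The TypeError at the bottom of the traceback is raised while argparse wraps an option's help text, which suggests that a `help=` value reaches argparse as bytes under Python 3. A minimal sketch of a fix, assuming that is the cause (the helper `to_text` and the sample strings are illustrative):

```python
import argparse


def to_text(value):
    """Decode bytes to str so argparse can wrap the help text."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def build_parser():
    parser = argparse.ArgumentParser(
        description=to_text(b"Real-time view of syscalls"))
    parser.add_argument("-b", "--batch", action="store_true",
                        help=to_text(b"Print output in batch mode"))
    return parser
```

Writing the help strings as plain `str` literals in the first place would make the helper unnecessary.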

Environment

  • kernel: Linux tperson 5.8.5-arch1-1 #1 SMP PREEMPT Thu, 27 Aug 2020 18:53:02 +0000 x86_64 GNU/Linux
  • Distrib: Archlinux
  • Git master : 69e6c3c
  • package dependencies version : python 3.8.5-1
