calltop's Issues

[Feature] Trace more languages

Is your feature request related to a problem? Please describe.
Tracing Python is cool, tracing more languages is super cool 👍

Describe the solution you'd like

  • Auto detect the language behind a PID, so we do not need to specify the language of a process
  • Attach probes to the right functions in order to collect latency and rate

[BUG] Sorting options are confusing and too limited

Describe the bug
Interactive mode provides a way to sort based on different criteria.
But it seems very limited and works in a quite confusing way.

Expected behavior
It should be possible to change and choose ascending or descending sort.

Additional information
When sorting by "total count" or by "Call/s", the resulting order does not seem very logical.

It seems to me that processes are first ordered by their total calls (all functions summed up), which is a value that is not shown! Maybe you should add a "process summary" line for each process?

Even so, the order seems wrong to me (?). I would expect that, for each process, lines are ordered by descending Call/s or total, but that is apparently not the case.

So it is unclear whether sort applies to processes, functions for each process, or both (somehow).

[Feature] PID filtering has to be done in eBPF

PID filtering has to be done in eBPF
The PID filtering requested on the command line is currently done in user space. To reduce the tracing impact, filtering in eBPF is a better solution.

eBPF filtering
The idea is to use the PID(s) in the eBPF code and, if the current PID does not match, return 0.
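
A minimal sketch of how the user-space script could generate such a filter, assuming the eBPF C source is assembled as text before being handed to bcc. The function name build_pid_filter and the C variable __pid are hypothetical; bpf_get_current_pid_tgid() is the standard BPF helper whose upper 32 bits hold the process ID.

```python
def build_pid_filter(pids):
    """Return a C snippet that exits the probe early for untracked PIDs.

    Hypothetical helper: the snippet is meant to be pasted at the top of
    each probe function in the eBPF source text.
    """
    if not pids:
        return ""  # no filter requested: trace every process
    # The probe continues only if the current PID equals one of the wanted PIDs.
    checks = " && ".join("__pid != %d" % int(p) for p in pids)
    return (
        "u32 __pid = bpf_get_current_pid_tgid() >> 32;\n"
        "if (%s) { return 0; }\n" % checks
    )
```

With a single PID, build_pid_filter([6413]) yields a two-line C fragment that returns 0 for every other process, so unwanted events never reach user space.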

[minor] monotonic_time() returns time in sec

monotonic_time() should return time in nanoseconds

As described in its docstring, monotonic_time() should return the time in nanoseconds.

The time actually returned is in seconds.
Please fix it.
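
One possible fix, as a sketch only: the current body of monotonic_time() is not shown in this report, so the version below is an assumption built on the standard time module (time.monotonic_ns() requires Python 3.7 or later).

```python
import time


def monotonic_time():
    """Return a monotonic timestamp in nanoseconds, as an int."""
    if hasattr(time, "monotonic_ns"):
        # Python 3.7+: integer nanoseconds, no float rounding.
        return time.monotonic_ns()
    # Older Python 3: convert the float value expressed in seconds.
    return int(time.monotonic() * 10 ** 9)
```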

[Feature] Add a column for cumulated time

Is your feature request related to a problem? Please describe.
We have the cumulated number of calls, but not the cumulated time.

Describe the solution you'd like
One more column

[Feature] Print command line instead of comm when a special key is pressed

Is your feature request related to a problem? Please describe.
For certain processes such as python, the tool only prints python, which is not useful to clearly identify the application.

Describe the solution you'd like
2 solutions:

  • Like in top: press the 'c' key to expand the command line
  • Like with ps: add an argument to the calltop command line to get the command line of every process

Use the PID to read the command line from /proc/PID/cmdline.
Note that you will need to replace each NUL character with a space. See the output below:

cat /proc/3285/cmdline| tr '\0' ' ' 
/usr/lib/firefox/firefox -contentproc -childID 3 -isForBrowser -prefsLen 7169 -prefMapSize 215831 -parentBuildID 20200403064753 -greomni /usr/lib/firefox/omni.ja -appomni /usr/lib/firefox/browser/omni.ja -appdir /usr/lib/firefox/browser 3081 true tab
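
The same lookup could be done from Python. This is a sketch; the function names format_cmdline and read_cmdline are hypothetical and not taken from calltop.

```python
def format_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/PID/cmdline into a string."""
    return raw.replace(b"\0", b" ").decode("utf-8", errors="replace").strip()


def read_cmdline(pid):
    """Return the command line of a process, or '' if it cannot be read."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as cmdline_file:
            return format_cmdline(cmdline_file.read())
    except OSError:
        # The process may have exited, or /proc may not be available.
        return ""
```

An empty result (kernel threads also have an empty cmdline) could fall back to the comm name that is displayed today.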

[BUG] Debug option (-d) does nothing

Describe the bug
The debug CLI option (-d) does nothing. According to the inline documentation (-h), it outputs 'eBPF code'.

Environment

  • Distribution : Archlinux
  • Kernel : Linux 5.6.3-arch1-1 #1 SMP PREEMPT x86_64 GNU/Linux
  • Version or revision : git clone, hash 8ee8654
  • package dependencies version : bcc 0.13.0-1, bcc-tools 0.13.0-1, python-bcc 0.13.0-1

[BUG] Add support for terminals without color support

Describe the bug
calltop does not work on terminals without color support.

To Reproduce
Steps to reproduce the behavior:

  1. Use a terminal that does not support colors : run xterm -cm
  2. run ./calltop.py
  3. Error :
    Traceback (most recent call last):
      File "./calltop.py", line 821, in <module>
        display.printHeader("Exiting...")
    NameError: name 'display' is not defined

Expected behavior
No error

Environment

  • kernel: [i.e uname -a output]
  • Distrib [e.g. ubuntu 18.10, Debian 9]
  • Version or revision
  • package dependencies version [i.e bpfcc-tools version

Additional context
The terminal does not support colors.
Please fall back to black and white in this case.

[enhancement] make call rate more accurate

Metrics precision is the most important
The call rate computation is a bit too naïve. It suffers from delays in the main loop, which happen when there is a lot of data in the BPF map.
Today: call_rate = call_nb / static_sampling_interval, i.e. rps = stSysStats.cntPerIntvl / self.refreshIntvl

Get a more accurate sampling interval per call
The interval should be the time between 2 consecutive reads. If the main loop slows down, the measured sampling interval gets bigger and the call rate stays correct.

Do it in user space
Avoid doing it in the eBPF code. The fewer instructions there, the lower the tracing impact.
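
A user-space sketch of the idea, assuming the rate is computed each time the map is read. The class name RateMeter and its clock parameter are hypothetical and exist only for illustration.

```python
import time


class RateMeter:
    """Compute calls per second from the time really elapsed between reads."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_read = clock()

    def rate(self, count_in_interval):
        """Return count / measured interval, then start a new interval."""
        now = self._clock()
        elapsed = now - self._last_read
        self._last_read = now
        if elapsed <= 0:
            return 0.0  # avoid dividing by zero on back-to-back reads
        return count_in_interval / elapsed
```

If the main loop takes 2 seconds instead of the configured 1 second, 10 calls are then reported as 5 calls/s rather than 10.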

[Feature] Add percentile on the interval

Display the minimum / maximum latency for every stat collected during the interval.

Computing the 95th or 5th percentile is a bit more tricky since it has to be done in eBPF and has to be memory and time efficient.

[Feature] Add shortcut keys to jump to the top/bottom of the display

When you scroll down or up and want to come back to the top or bottom, it is tedious to move back.
Imagine you scroll down a lot: if you want to come back to the top, you will have to scroll up a lot too.

Solution.
One shortcut key to go to the first line, one to go to the last one.

Dynamic filter is not cleared on reset

Describe the bug
In interactive mode, pressing the reset key (z) does not reset the dynamic filter on processes

To Reproduce
Steps to reproduce the behavior:

  1. Run 'sudo ./latencyTop.py -l'
  2. Filter on one process : press 'f' then a process name (e.g. 'firefox')
  3. Reset : press 'z'

Expected behavior
3. reset action also resets the dynamic filter on process name.

Environment

  • Version or revision: git clone, 8ee8654

[Feature] Show active dynamic filter on process name

Is your feature request related to a problem? Please describe.
The interactive view does not make it possible to tell whether a filter is in place or not.
This is especially annoying as the reset action does not clear the filter (but that is in a separate bug report).

Describe the solution you'd like
The fact that a filter is active should be explicit and visible, and possibly the filter rule as well.
Maybe add it to the bottom line.

Describe alternatives you've considered
Alternative ideas to display it :

  • add a 'filtered' word in the column title for the processes.
  • adding a second line at the top, providing the filter value

[BUG] Project requires external dependency python-psutil

Describe the bug
calltop.py imports psutil, so execution fails when this module is not installed.

To Reproduce
Steps to reproduce the behavior:

  1. Run './calltop.py'
  2. Execution fails due to missing dependency
    '''$ ./calltop.py
    Traceback (most recent call last):
    File "./calltop.py", line 23, in <module>
    import psutil
    ModuleNotFoundError: No module named 'psutil'
    '''

Expected behavior
Execution does not fail.
Or this dependency is explained in documentation.

Environment

  • kernel: Linux tperson 5.8.5-arch1-1 #1 SMP PREEMPT Thu, 27 Aug 2020 18:53:02 +0000 x86_64 GNU/Linux
  • Distrib: Archlinux

[Feature] Add filter on process name

It would be great to filter on process name:
just enter the process name or a part of it, and only the matching process names are displayed.
The solution expected
Like the htop F4 filter, or something similar.

[Feature] display latency should be done by default

Is your feature request related to a problem? Please describe.
I think users want the average latency per function by default. So we should remove the -l flag for latency and make it the default behaviour.

Describe the solution you'd like
calltop.py will display latency metrics.
calltop.py --no-latency won't display latency metrics

Describe alternatives you've considered
Do we need the --no-latency flag? The only reason to keep it is performance, as latency needs a second eBPF probe attached on return.

[Feature] Help new users identify that root privileges are mandatory

The exception below is often caused by running without root privileges.

Traceback (most recent call last):
  File "./calltop.py", line 910, in <module>
    main(display)
  File "./calltop.py", line 892, in main
    b = create_and_load_bpf(syscall_list, latency)
  File "./calltop.py", line 762, in create_and_load_bpf
    b = BPF(text=prog)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 321, in __init__
    raise Exception("Failed to compile BPF text")
Exception: Failed to compile BPF text

Describe the solution you'd like
A more explicit message would be helpful:

Failed to compile and load the eBPF program. Do you have root access?
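
A sketch of how such a hint could be produced, assuming the check is made when loading the eBPF program fails. The names ROOT_HINT, is_root and explain_bpf_failure are hypothetical, and the hint wording only follows the spirit of the message proposed above.

```python
import os

# Illustrative wording, not an existing calltop message.
ROOT_HINT = "Failed to compile and load the eBPF program. Do you have root access?"


def is_root(euid=None):
    """Return True when the effective user ID is 0 (root)."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def explain_bpf_failure(error, euid=None):
    """Build the message to show when loading the eBPF program fails."""
    if is_root(euid):
        return str(error)  # root already: the cause is something else
    return "%s\n%s" % (error, ROOT_HINT)
```

The call to BPF(text=prog) could then be wrapped in a try/except that prints explain_bpf_failure(exc) to stderr before exiting.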

[doc] add a page in the wiki to help building python3 with dtrace support

With Ubuntu 20.04, the python3 package comes with dtrace support, which allows tracing with USDT (eBPF). That is not the case for previous Ubuntu versions, and other Linux distributions may have the same problem. It is important to describe how to build Python with dtrace enabled (--with-dtrace flag).
A wiki page would do the job.

[BUG] with --no-latency flag, latency displayed is 0.00

Describe the bug
When we do not want to collect latency, the latency should not be displayed.

To Reproduce
python3 ./calltop.py --no-latency

Expected behavior

2 solutions:

  • Instead of 0.00, display '-' or N/A, or leave it blank (preferred solution)
  • Do not display the column. Not ideal in batch mode if the output is processed with awk: the Call/s column index would differ depending on whether latency is collected, so the complexity is pushed to awk.
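
The preferred solution could look like the sketch below. The function name format_latency and its parameters are hypothetical; the two-decimal format is only inferred from the 0.00 shown today.

```python
def format_latency(latency_us, latency_enabled):
    """Return the text for the latency column.

    A placeholder keeps the column count stable for tools such as awk
    when latency is not collected.
    """
    if not latency_enabled:
        return "-"
    return "%.2f" % latency_us
```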

[BUG] script does not find ebpf.c if not called from its directory

Description
The path used to load ebpf.c does not work when the script is called from outside its directory.

To Reproduce
sudo ./calltop/calltop.py -p 6413
Traceback (most recent call last):
  File "./calltop/calltop.py", line 844, in <module>
    main(display)
  File "./calltop/calltop.py", line 826, in main
    b = create_and_load_bpf(syscall_list, latency)
  File "./calltop/calltop.py", line 662, in create_and_load_bpf
    with open('ebpf.c', 'r') as ebpf_code:
IOError: [Errno 2] No such file or directory: 'ebpf.c'
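
A common way to avoid this is to resolve ebpf.c relative to the script itself rather than to the current working directory. A minimal sketch, where the helper name path_beside_script is hypothetical:

```python
import os


def path_beside_script(script_path, filename="ebpf.c"):
    """Return the path of a file located in the same directory as the script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, filename)
```

Inside calltop.py, the failing call would then become open(path_beside_script(__file__), 'r').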

[Feature] add possibility to pause print display

Is your feature request related to a problem? Please describe.
Sometimes you want to read a piece of information but it is refreshed too quickly. Pressing the space key should stop the screen refresh; a second press on the space key leaves pause mode.

Describe the solution you'd like

  • Pause/unpause with space key.
  • During the pause, the program keeps reading the data from the map but does not print anything. In practice, pause = skip print_body().
  • There is no pause in batch mode
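
The three points above can be sketched as a small piece of state consulted by the main loop. The class name DisplayState and its methods are hypothetical; only print_body() is named in this request.

```python
class DisplayState:
    """Track whether the interactive display is paused."""

    def __init__(self):
        self.paused = False

    def handle_key(self, key):
        """Toggle pause when the space key is pressed; ignore other keys."""
        if key == ord(" "):
            self.paused = not self.paused

    def should_print(self, batch_mode=False):
        """Batch mode always prints; interactive mode prints unless paused."""
        return batch_mode or not self.paused
```

The main loop would keep reading the map on every iteration and call print_body() only when should_print() returns True.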

[BUG] Error when running '--help'

Describe the bug
Running the --help fails. Note that other commands work nicely.

To Reproduce
Steps to reproduce the behavior:

  1. Install python-psutil (if needed)
  2. Run './calltop.py -h'
  3. Execution fails
    '''
    $ ./calltop.py -h
    Traceback (most recent call last):
    File "./calltop.py", line 1173, in main
    args = parser.parse_args()
    File "/usr/lib/python3.8/argparse.py", line 1768, in parse_args
    args, argv = self.parse_known_args(args, namespace)
    File "/usr/lib/python3.8/argparse.py", line 1800, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
    File "/usr/lib/python3.8/argparse.py", line 2006, in _parse_known_args
    start_index = consume_optional(start_index)
    File "/usr/lib/python3.8/argparse.py", line 1946, in consume_optional
    take_action(action, args, option_string)
    File "/usr/lib/python3.8/argparse.py", line 1874, in take_action
    action(self, namespace, argument_values, option_string)
    File "/usr/lib/python3.8/argparse.py", line 1044, in __call__
    parser.print_help()
    File "/usr/lib/python3.8/argparse.py", line 2494, in print_help
    self._print_message(self.format_help(), file)
    File "/usr/lib/python3.8/argparse.py", line 2478, in format_help
    return formatter.format_help()
    File "/usr/lib/python3.8/argparse.py", line 282, in format_help
    help = self._root_section.format_help()
    File "/usr/lib/python3.8/argparse.py", line 213, in format_help
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 213, in <listcomp>
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 213, in format_help
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 213, in <listcomp>
    item_help = join([func(*args) for func, args in self.items])
    File "/usr/lib/python3.8/argparse.py", line 530, in _format_action
    help_lines = self._split_lines(help_text, help_width)
    File "/usr/lib/python3.8/argparse.py", line 634, in _split_lines
    text = self._whitespace_matcher.sub(' ', text).strip()
    TypeError: cannot use a string pattern on a bytes-like object
    '''

Expected behavior
Help message is provided

Environment

  • kernel: Linux tperson 5.8.5-arch1-1 #1 SMP PREEMPT Thu, 27 Aug 2020 18:53:02 +0000 x86_64 GNU/Linux
  • Distrib: Archlinux
  • Git master : 69e6c3c
  • package dependencies version : python 3.8.5-1
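
The TypeError is raised while argparse wraps a help text, which suggests that at least one option was declared with a bytes help string. The sketch below shows one possible fix under that assumption; as_text and build_parser are hypothetical names, and the help wording is illustrative (only the -d flag comes from the tool).

```python
import argparse


def as_text(value):
    """Decode bytes to str; argparse can only format str help texts."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def build_parser():
    """Build a parser whose help strings are guaranteed to be str."""
    parser = argparse.ArgumentParser(prog="calltop.py")
    # The bytes literal stands in for a help text wrongly declared as bytes.
    parser.add_argument("-d", action="store_true",
                        help=as_text(b"print eBPF code"))
    return parser
```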

Output in batch mode includes log messages

Description

When using batch mode, output includes log messages in addition to expected data.
It could confuse output parsing and make integration more difficult.

I've noticed error messages and more informational messages as well.

To Reproduce
Steps to reproduce the behavior:

  1. Run calltop in batch mode: 'sudo timeout 30 ./calltop.py -b > /tmp/calltop.out'
  2. Look at output, in particular its beginning

Expected behavior
Output only contains data, each batch separated with an empty line.
No error or other informational messages.
Error messages should be sent to stderr.

Actual behaviour
Error/info messages at the beginning of the file:
head /tmp/calltop.out
Failed to attach to kprobe b'__x64_sys_all'
b'Collecting first data ...'
Pid Process name Function latency(us) Call/s Total
569 chromium read 0 1 1
569 chromium poll 0 3 3

Environment

  • kernel: Linux 5.6.2-arch1-2 #1 SMP PREEMPT x86_64 GNU/Linux
  • Distrib: Archlinux x86_64, stable
  • Version or revision: git clone, 8ee8654
  • package dependencies version: bcc 0.13.0-1, bcc-tools 0.13.0-1, python-bcc 0.13.0-1

Additional context
Maybe the best solution would be to add an option parameter for batch output, to be able to send it to a file.
And at least, to redirect all errors and info messages to stderr (?)
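
A sketch of the stderr redirection, assuming messages go through a single helper. The names to_text, log and emit_row are hypothetical. Decoding bytes first also avoids the b'...' prefix visible in the output above.

```python
import sys


def to_text(message):
    """Return the message as str, decoding bytes if needed."""
    if isinstance(message, bytes):
        return message.decode("utf-8", errors="replace")
    return str(message)


def log(message):
    """Write an error or informational message to stderr."""
    print(to_text(message), file=sys.stderr)


def emit_row(row):
    """Write a data row to stdout, the only stream batch consumers parse."""
    print(to_text(row))
```

With this split, 'sudo ./calltop.py -b > /tmp/calltop.out' would capture only data rows, while messages remain visible on the terminal.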

[Feature] add batch mode

add batch mode
Add option -b to enable a batch mode. It is useful to record output to text file.

expected output
Keep the same output but print the new stats to stdout at the given rate.

[enhancement] Lookup and insert complexity is O(n); make it O(1)

Lookup and insert complexity is O(n); make it O(1)

The first implementation of a CtCollection is a list of Doc.

  • This was a first naive approach. The cons: not suitable when there are many (1000+) docs in the collection.

A hash table is better suited

  • Use a dictionary instead of a list
  • The key will be "pid+process_name"
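
A sketch of the dictionary-based collection. The real CtCollection and Doc interfaces are not shown in this issue, so the method names below are assumptions, and the key is modelled as a (pid, process_name) tuple as one reading of "pid+process_name".

```python
class CtCollection:
    """Collection of docs with average O(1) lookup and insert."""

    def __init__(self):
        self._docs = {}

    def insert(self, pid, process_name, doc):
        """Add or replace the doc stored for this process."""
        self._docs[(pid, process_name)] = doc

    def lookup(self, pid, process_name):
        """Return the doc for this process, or None if it is unknown."""
        return self._docs.get((pid, process_name))

    def __len__(self):
        return len(self._docs)
```

A tuple key avoids building a string on every access and cannot collide when a process name happens to contain the separator.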

key [f] mapped for dynamic filter is not documented

Describe the bug
It is apparently possible to dynamically filter while using calltop in interactive mode.
But how to use it is hard to guess! I discovered (by accident) that it was the F key.

To Reproduce
Steps to reproduce the behavior:

  1. Run calltop in interactive mode: 'sudo ./calltop.py -l'
  2. Look for keymapping on last line
  3. Quit and look for key bindings in the embedded help: './calltop.py -h'
  4. Look for documentation: 'cat README.md'

Expected behavior
2. Key mapped for filtering is listed in the bottom line

Actual behaviour
2. Key mapped for filtering is not listed in the bottom line, unlike the keys mapped to other actions
3. Nothing about key mapping
4. Nothing about key mapping

Environment

  • kernel: [i.e uname -a output]
  • Distrib [e.g. ubuntu 18.10, Debian 9]
  • Version or revision : git clone, 8ee8654
  • package dependencies version [i.e bpfcc-tools version

Additional context
It may not be relevant to document it everywhere, but at least once, and in a way consistent with the other mapped keys.

[Enhancement] make calltop.py compatible with python3

Description
calltop.py works with python2.7.
python3-bpfcc makes it possible to use python3 with bcc.

To Reproduce
python3 calltop.py
The screen is empty; nothing is reported.

Expected behavior
works like with python2.7

Environment
bpfcc-tools 0.8.0-4
bpftrace 0.8+git60-gccac69c2239b-1
libbpfcc 0.8.0-4
python-bpfcc 0.8.0-4
python3-bpfcc 0.8.0-4

Additional context
install python 3 for bcc
sudo apt-get install python3-bpfcc

stats inactivity is set at 5 sec; seems too much

stats inactivity threshold is set at 5 sec; seems too much

Inactive stats are cleared in the ebpf map

An inactive stat is a stat related to a process that has not been updated for a given amount of time. To free eBPF map entries, we clear the inactive stats, since concurrent access to this data is less likely.
This design is a workaround for the non-atomicity of operations on eBPF maps. The less we touch the map, the better the atomicity (but it is not guaranteed).

So far the inactivity threshold is set at 5 seconds. This is a bit too much.
Let's try 1 second; 1 second is already an eternity.
It will put more pressure on old stats in the map:

  • we will save CPU
  • we do not take too many risks

[Feature] Header line is not always visible.

Header Line is no longer visible when scrolling down
The header line is only visible on the first screen; as soon as we scroll down, it disappears. This is not convenient.

Solution
Always print the header line.
