retis's Introduction

Retis

Tracing packets in the Linux networking stack, using eBPF and interfacing with control and data paths such as OvS or Netfilter.

Visit the online documentation for more details.

[demo animation]

Quick start

An overview and some examples can be found in the Documentation, but note that the --help flag documents most of what Retis can do.

$ retis --help
...
$ retis <command> --help
...

Examples

Drop monitoring

Listing packets being dropped by the kernel, with the associated stack trace and drop reason:

$ retis -p dropmon collect
00:42:00 [INFO] 4 probe(s) loaded

3392678938917 [nc] 2311 [tp] skb:kfree_skb drop (NO_SOCKET)
    bpf_prog_3a0ef5414c2f6fca_sd_devices+0xa0ad
    bpf_prog_3a0ef5414c2f6fca_sd_devices+0xa0ad
    bpf_trace_run3+0x52
    kfree_skb_reason+0x8f
    tcp_v6_rcv+0x77
    ip6_protocol_deliver_rcu+0x6b
    ip6_input_finish+0x43
    __netif_receive_skb_one_core+0x62
    process_backlog+0x85
    __napi_poll+0x28
    net_rx_action+0x2a4
    __do_softirq+0xd1
    do_softirq.part.0+0x5f
    __local_bh_enable_ip+0x68
    __dev_queue_xmit+0x293
    ip6_finish_output2+0x2a3
    ip6_finish_output+0x160
    ip6_xmit+0x2c0
    inet6_csk_xmit+0xe9
    __tcp_transmit_skb+0x534
    tcp_connect+0xaf6
    tcp_v6_connect+0x515
    __inet_stream_connect+0x103
    inet_stream_connect+0x3a
    __sys_connect+0xa8
    __x64_sys_connect+0x18
    do_syscall_64+0x5d
    entry_SYSCALL_64_after_hwframe+0x72
  if 1 (lo) rxif 1 ::1.60634 > ::1.80 ttl 64 label 0x9c404 len 40 proto TCP (6) flags [S] seq 3918324244 win 65476
...

Monitoring packets dropped by netfilter

The exact nft rule can be retrieved using nft -a list table ....

$ retis -p nft-dropmon collect --allow-system-changes
00:42:00 [INFO] 4 probe(s) loaded

3443313082998 [swapper/0] 0 [k] __nft_trace_packet
    __nft_trace_packet+0x1
    nft_do_chain+0x3ef
    nft_do_chain_inet+0x54
    nf_hook_slow+0x42
    ip_local_deliver+0xd0
    ip_sublist_rcv_finish+0x7e
    ip_sublist_rcv+0x186
    ip_list_rcv+0x13d
    __netif_receive_skb_list_core+0x29d
    netif_receive_skb_list_internal+0x1d1
    napi_complete_done+0x72
    virtnet_poll+0x3ce
    __napi_poll+0x28
    net_rx_action+0x2a4
    __do_softirq+0xd1
    __irq_exit_rcu+0xbe
    common_interrupt+0x86
    asm_common_interrupt+0x26
    pv_native_safe_halt+0xf
    default_idle+0x9
    default_idle_call+0x2c
    do_idle+0x226
    cpu_startup_entry+0x1d
    __pfx_kernel_init+0x0
    arch_call_rest_init+0xe
    start_kernel+0x71e
    x86_64_start_reservations+0x18
    x86_64_start_kernel+0x96
    __pfx_verify_cpu+0x0
  if 2 (eth0) rxif 2 172.16.42.1.52294 > 172.16.42.2.8080 ttl 64 tos 0x0 id 37968 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1971640626 win 64240
  table firewalld (1) chain filter_IN_FedoraServer (202) handle 215 drop
...
$ nft -a list table inet firewalld
...
	chain filter_IN_FedoraServer { # handle 202
...
		jump filter_INPUT_POLICIES_post # handle 214
		meta l4proto { icmp, ipv6-icmp } accept # handle 273
		reject with icmpx admin-prohibited # handle 215         <- This one
	}
...

Installation

Retis can be installed from COPR for rpm-compatible distributions, from a container image or from sources.

COPR

RPM packages for Fedora (currently supported releases including Rawhide), RHEL (>= 8) and EPEL (>= 8) are available.

$ dnf -y copr enable @retis/retis
$ dnf -y install retis
$ retis --help

Or on older distributions,

$ yum -y copr enable @retis/retis
$ yum -y install retis
$ retis --help

Container image

The preferred way to run Retis in a container is using the provided retis_in_container.sh script:

$ curl -O https://raw.githubusercontent.com/retis-org/retis/main/tools/retis_in_container.sh
$ chmod +x retis_in_container.sh
$ ./retis_in_container.sh --help

The Retis container can also be run manually:

$ podman run --privileged --rm -it --pid=host \
      --cap-add SYS_ADMIN --cap-add BPF --cap-add SYSLOG \
      -v /sys/kernel/btf:/sys/kernel/btf:ro \
      -v /sys/kernel/debug:/sys/kernel/debug:ro \
      -v /boot/config-$(uname -r):/kconfig:ro \
      -v $(pwd):/data:rw \
      quay.io/retis/retis:latest --help
  • docker can be used in place of podman in the above.

  • When running on CoreOS, Fedora Silverblue and friends, replace -v /boot/config-$(uname -r):/kconfig:ro with -v /lib/modules/$(uname -r)/config:/kconfig:ro in the above.

The /data mount point in the container allows storing persistent data for future use (e.g. events logged using the -o cli option).

From sources

For details on how to build retis, visit the documentation.

Limitations

Known and current limitations:

  • By default Retis does not modify the system (e.g. load kernel modules, change the configuration, add a firewall rule). This is done on purpose, but it might mean some prerequisites are missing if not added manually. The only example for now is the nft module, which requires a specific nft rule to be inserted. If that rule is not there, no nft event will be reported. To allow Retis to modify the system, use the --allow-system-changes option when running the collect command. See retis collect --help for further details about the changes applied to the system.

  • Retis operates mainly on struct sk_buff objects, meaning a good part of locally generated traffic can't be traced at the moment. E.g. locally generated traffic from a container can be traced once it exits the container.

  • Combining profiles might fail if flags are used multiple times or if some arguments are incompatible. Use with care.

Additional notes (not strictly limitations):

  • Filtering & tracking packets being modified can only work if the packet is seen at least once in a form where it can be matched against the filter. E.g. tracking SNATed packets only in skb:consume_skb with a filter on the original address won't generate any event.

  • As explained in the filtering section, filters are eventually translated to eBPF instructions. Currently, the maximum size of an eBPF filter is 4096 instructions (see the example after this list).

  • Some fields present in the packet might not be reported when probes are early in the stack, while being shown in later ones. This is because Retis probes rely on the networking stack's knowledge of the packet; if some parts weren't processed yet, they can't be reported. E.g. TCP ports won't be reported from kprobe:ip_rcv.
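
As an example of the filters mentioned above, and assuming the pcap-like -f / --filter-packet option described in the filtering documentation, a collection can be restricted to a flow with:

$ retis collect -f 'tcp port 80'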


retis's Issues

Error reporting from BPF

Implement an error reporting mechanism (in a dedicated map?) for retrieving errors from BPF. This could for example be used to detect if the event map is full and an event is being ignored, or if we can't find an entry in a map for various reasons.

Freplace support

Kernel probes will implement a way for other modules to add eBPF hooks to parse extra arguments and augment the events. A good solution would be to use freplace and an XDP-dispatcher-like logic.

At the moment it seems libbpf_rs does not support freplace, so extra work might be needed.

Simple post-processing command to group and reorder events

Support an initial (default?) post-processing command which would group and reorder events based on (at least) the skb tracking data and the event timestamps. This will be quite handy to understand a packet's life in the networking stack. Some kind of formatting might also be needed to provide a nice user interface; a minimal sketch of the grouping step follows the list below.

Some options we might consider supporting:

  • Provide parameters to match packets, e.g. a starting and/or ending timestamp.
  • A parameter to only display data coming from a set of collectors / functions / having a specific field, field value, etc.
  • Select how to order packets: buffer address, skb address, first timestamp, current one, etc.
  • Having a “diff” view, showing for a given skb only the fields that have changed compared to the previous event.
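
A minimal sketch of that grouping and ordering step (the event fields here are illustrative, not the actual event format):

use std::collections::BTreeMap;

struct Event {
    track_id: u64, // skb tracking data
    timestamp: u64,
    data: String,
}

/// Order all events by timestamp, then group them into per-packet series
/// keyed by the skb tracking id.
fn group_and_sort(mut events: Vec<Event>) -> BTreeMap<u64, Vec<Event>> {
    events.sort_by_key(|e| e.timestamp);
    let mut series: BTreeMap<u64, Vec<Event>> = BTreeMap::new();
    for event in events {
        series.entry(event.track_id).or_default().push(event);
    }
    series
}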

Add a way to load raw instructions

We might want the ability to load raw insns programs (mostly for filtering).
Libbpf doesn't expose any wrapper for that, as it targets ELF files.
Implementing a small Rust module to do that seems to be the best option at the moment.
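
A minimal sketch of what such a module could do, going through the raw bpf_prog_load() binding from libbpf-sys (illustrative only; the exact binding names and integer types depend on the libbpf-sys version):

use std::os::fd::{FromRawFd, OwnedFd};

use anyhow::{bail, Result};

/// Load a raw array of BPF instructions (e.g. a generated filter) and
/// return the resulting program fd. Sketch only: the program type,
/// license and options are illustrative.
fn load_raw_insns(insns: &[libbpf_sys::bpf_insn]) -> Result<OwnedFd> {
    let license = std::ffi::CString::new("GPL")?;
    let fd = unsafe {
        libbpf_sys::bpf_prog_load(
            libbpf_sys::BPF_PROG_TYPE_SOCKET_FILTER,
            std::ptr::null(), // no program name
            license.as_ptr(),
            insns.as_ptr(),
            insns.len() as _,
            std::ptr::null(), // no extra options
        )
    };
    if fd < 0 {
        bail!("bpf_prog_load() failed: {fd}");
    }
    // SAFETY: we own the freshly returned fd.
    Ok(unsafe { OwnedFd::from_raw_fd(fd) })
}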

Attach to sockets

We might want to hook to sockets and have early/late filtering on packets. This could allow better reconstructing a packet's lifetime in the Tx path, and provide extra information in Rx.

The corresponding BPF program types (BPF_PROG_TYPE_SK_MSG/SKB) have access to either struct __sk_buff or struct sk_msg_md. This should probably be split into two issues when assigned.

Cmdline parsing

Collectors should have a way to register cmdline arguments and retrieve their values when the program starts.

A possible solution would be to use clap, with Option<Vec<clap::Arg>> when registering a collector and Option<clap::ArgMatches> as one of its init() arguments.
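
A rough sketch of that shape; the Collector trait methods (register_cli, init) and the ovs-socket argument are hypothetical names for illustration, not the actual interface:

use anyhow::Result;
use clap::{Arg, ArgMatches, Command};

/// Hypothetical collector-side interface for cmdline registration.
trait Collector {
    /// Arguments the collector wants to expose, if any.
    fn register_cli(&self) -> Option<Vec<Arg>>;
    /// Called at startup with the matches for the registered arguments.
    fn init(&mut self, matches: Option<&ArgMatches>) -> Result<()>;
}

struct OvsCollector;

impl Collector for OvsCollector {
    fn register_cli(&self) -> Option<Vec<Arg>> {
        Some(vec![Arg::new("ovs-socket")
            .long("ovs-socket")
            .help("Path to the OVS unix socket (illustrative)")])
    }

    fn init(&mut self, matches: Option<&ArgMatches>) -> Result<()> {
        if let Some(path) = matches.and_then(|m| m.get_one::<String>("ovs-socket")) {
            println!("using OVS socket {path}");
        }
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut collector = OvsCollector;
    let mut cmd = Command::new("retis");
    if let Some(args) = collector.register_cli() {
        cmd = cmd.args(args);
    }
    let matches = cmd.get_matches();
    collector.init(Some(&matches))
}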

CI: launch runtime tests in a VM

Runtime tests[1] are currently skipped by default but could be run in a controlled VM. The reason we skip them for now is that they require privileged capabilities.

[1] # cargo test --features=test_cap_bpf

Allow modules to define their own ebpf programs?

While writing #90 I found myself doing what seemed to be an abuse of the Probe+Hook system.

It might be specific to the OVS module or it might be needed by other modules; that's something to discuss, but to me it was clear that some of the hooks I was attaching to some probes didn't need to be hooks.

The main use case was to add a small ebpf program that creates some context (and, say, stores it in a map) and maybe another program later on that clears it. These programs do not send events but need to share the map fd with the hook that will retrieve this context to enrich the event it sends.

Some open questions:

  • First, of course: is this use case something we want to allow?
  • Would filtering be available? How?
  • Would the module open, load and attach its own program, or would it register the program somehow with the core infrastructure so we can centrally track what we are attaching where?

skb tracking

A way to uniquely identify packets is required so we can reconstruct their life later on.

Net stack collector

Collector that should fill the events with networking stack generic data (info about the skb, interfaces, netns, etc).

BTF library

A BTF library to parse and expose data types and functions. It must be able to read the BTF information from multiple sources, as we might need it for various targets (kernel, OVS, etc).

OVS module makes the tool fail if the OVS daemon isn't installed

If the tool is started in an environment where there is no OVS daemon, it will report an error and always fail. We should let the tool continue working despite those kinds of issues; otherwise the default --collectors option will often not work.

At the same time, not failing when we do expect OVS events would not be good. A solution might be to add another cli option to decide whether or not those kinds of issues are acceptable. This option could be used in profiles to keep the user experience OK. But there might be other solutions.

Firewalling collector

Investigate and see if we can support a collector retrieving firewalling data. The use case would be for example to link a packet being dropped to an installed rule.

Core collector module

An interface for collectors to implement is needed so we can drive them in batches, as well as a way to register them to a group.
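
A minimal sketch of such an interface and group registry (all names are hypothetical):

use std::collections::HashMap;

use anyhow::{bail, Result};

/// Hypothetical interface every collector implements so the core can
/// drive them in batches.
trait Collector {
    fn name(&self) -> &'static str;
    fn start(&mut self) -> Result<()>;
    fn stop(&mut self) -> Result<()>;
}

/// A group of collectors, driven together.
#[derive(Default)]
struct Group {
    collectors: HashMap<&'static str, Box<dyn Collector>>,
}

impl Group {
    fn register(&mut self, collector: Box<dyn Collector>) -> Result<()> {
        let name = collector.name();
        if self.collectors.insert(name, collector).is_some() {
            bail!("collector {name} already registered");
        }
        Ok(())
    }

    fn start_all(&mut self) -> Result<()> {
        for collector in self.collectors.values_mut() {
            collector.start()?;
        }
        Ok(())
    }
}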

Runtime discovery of the system characteristics

Runtime discovery of what is running, in which version, etc. might be handy for:

  • Automatically starting default collectors which have a dependency on a daemon, a userspace part, or a loaded kernel module.
  • Allowing collectors to change their behavior based on the versions found on a system. Capabilities might be restricted.
  • An automatic mode enabling all collectors able to retrieve information from what is running on a given machine.

Support for loading external BPF object in hooks

We could support external BPF objects and load them into hooks. Those external objects could be useful to 1) have a collection of small utilities for users to load in addition to the core features, and 2) let users compile and provide their own hooks for finer inspection of the stack, as many debugging sessions end up looking for very specific information.

Things to consider:

  • Build environment so they can use our hook infrastructure (trace context, helpers, hook definition, etc.) and build compatible objects.
  • Command line argument to provide hooks. We will also need to support the targeted vs generic hooking mechanism.
  • Logic to report data to the Rust userspace part and add it to the events.
  • How to distribute external hooks, if we do provide a collection of useful ones. (Which would be good and let users augment the tool capabilities over time). Extra care would be needed to avoid loading objects using outdated APIs.

allow to retrieve backtraces for specific symbols

It could be convenient, for a given packet, to be able to show a backtrace.
This needs some investigation first.
Probably not every probe that matches the packet should generate events (a symbol whitelist might make sense).

Interface for probe hooks to access data

BPF probes access data in a probe-specific way, usually using a dedicated context structure. For hooks to safely access this data later on, an interface is required to both pass the data across hooks and allow them to query for a specific structure or argument number.

Profile support

Instead of letting users find out about all the cmdline arguments they should use in a given situation (which collectors to use, where to probe, what extra data should be retrieved, etc.), they could use profiles. Profiles would be a set of cmdline options for a specific use case, such as "let's inspect the TCP stack".

As discussed in the initial proposal for #10, profiles should reuse the cmdline parsing logic. There are however things to consider:

  • Argument priority? Should profiles override user arguments, or the other way around? Should we issue a warning?
  • How to define profiles. They should be embedded within the tool for easy deployment. We also want to support external ones.

Unmarshaler registration and execution should be done in two separate steps

While working on #62 we've added a Cache to the Unmarshalers. The whole idea is to allow Unmarshalers to keep some state.
I tried to implement it in a more natural way, using a struct, i.e. having Unmarshaler be a Trait and having a default implementation of that trait for Fn .... It works nicely except for one (big) problem: the list of unmarshalers is sent to a different thread while being updated (registered), so it must be sent as immutable.

We should refactor this code to make it a two-step process, so we can move ownership of the entire Unmarshaler list to the unmarshaler thread and turn the unmarshalers into stateful structs.
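
Sketching that two-step flow with hypothetical names: registration happens single-threaded on a builder, then the finished list is moved wholesale into the polling thread, where the unmarshalers are free to mutate their own state:

use std::collections::HashMap;
use std::thread;

use anyhow::Result;

/// A stateful unmarshaler: turns a raw event section into a parsed value.
trait Unmarshaler: Send {
    fn unmarshal(&mut self, raw: &[u8]) -> Result<String>;
}

#[derive(Default)]
struct Registry {
    unmarshalers: HashMap<u32, Box<dyn Unmarshaler>>,
}

impl Registry {
    /// Step 1: registration, done while still single-threaded.
    fn register(&mut self, owner: u32, unmarshaler: Box<dyn Unmarshaler>) {
        self.unmarshalers.insert(owner, unmarshaler);
    }

    /// Step 2: ownership of the whole list moves to the event thread,
    /// so each unmarshaler can freely mutate its own state.
    fn start(self) -> thread::JoinHandle<()> {
        thread::spawn(move || {
            let mut unmarshalers = self.unmarshalers;
            // Event loop sketch: dispatch raw sections by owner id.
            if let Some(u) = unmarshalers.get_mut(&1) {
                let _ = u.unmarshal(&[0u8; 4]);
            }
        })
    }
}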

skb collector

The skb collector will be responsible for installing probes on functions / tracepoints having an skb as one of their parameters. It won't process much by itself and will delegate the event augmentation to other collectors (OVS, net, ...) by allowing them to provide hooks.

It should support kprobes, fexit and raw tracepoints at minimum.

Some of the skb internals are topics covered by dedicated issues and part of the logic might be shared with other collectors.

Checksum collector for skb (and more?)

Support loading a hook to recompute the checksum of packets and report the result & all related info (if any).

Alternatively if #30 is supported this could probably come as an external BPF object.

Allow to specify a custom btf file path

CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y

Both are not always available in major distros. For example, in RHEL 8, CONFIG_DEBUG_INFO_BTF_MODULES is unset.
It might be useful to let the user specify a path to a list of raw BTF files.

Report dropped events and errors

As the number of events increases we will drop events at some point.
One typical place where events will be dropped is when reserving ringbuf space. Reporting will give users a hint that they might want to increase the ringbuf size (which should be an option).

A possible implementation could be to create a map that can be indexed by probes (to know which events were lost) and whose values can be increased when we fail to allocate ring buffer space or hooks return errors.

When interrupted, the ProbeManager should read that map and report its contents.
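
A userspace sketch of that reporting, assuming a hypothetical counters map keyed by probe id (u32 -> u64); the calls are from the libbpf-rs Map API, but the map name, key layout and required imports are illustrative and version-dependent:

use anyhow::Result;
use libbpf_rs::{Map, MapFlags};

/// Dump the per-probe lost-event counters when collection stops.
/// `counters` is a hypothetical BPF hash map: u32 probe id -> u64 count.
fn report_lost_events(counters: &Map) -> Result<()> {
    for key in counters.keys() {
        if let Some(val) = counters.lookup(&key, MapFlags::ANY)? {
            let id = u32::from_ne_bytes(key[..4].try_into()?);
            let lost = u64::from_ne_bytes(val[..8].try_into()?);
            if lost > 0 {
                eprintln!("probe {id}: {lost} event(s) lost");
            }
        }
    }
    Ok(())
}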

User experience review

Before releasing, let's do a UX review and check cmdline options, help, documentation, consistency, etc.

Fexit probe support

The current logic to replace hooks in loaded BPF objects uses fexit under the hood. As we can't for now use fexit on fexit functions, we do not support hooking to fexit probes.

We should investigate and see if there is a way to support this somehow. This would be handy for retrieving function retvals. One option would be to use fexit only for the retval retrieval while still allowing hooks to be attached to that function using kprobe.

On the technical side, handling fexit probes dynamically should look like the logic we currently have for raw tracepoints.

Add a contributing doc

Could be in CONTRIBUTING.md and should contain pointers on how to contribute, what to check before the CI does, etc. Another aspect would be to write a small example (with explanations) on how to write a collector.

Please drop below any raw information that should be part of it.

Support conntrack

We should have a module dumping the conntrack table every so often. This could give us the ability to:

  • Track NATed packets matching one of our filters.
  • Simply monitor the status of a tracked connection, in order to see what flags are set and compare that kind of info with the expectation.

ovs collector panicked while creating a bridge

A panic was observed while creating an OVS bridge with the tool already running.
Below is the trace:

RUST_BACKTRACE=full ./target/debug/packet-tracer collect -c ovs
18:01:52 [INFO] Attaching probe to usdt /usr/local/sbin/ovs-vswitchd:dpif_netlink_operate__:op_flow_execute
thread '<unnamed>' panicked at 'attempt to subtract with overflow', src/core/user/proc.rs:384:62
stack backtrace:
   0:     0x55a2cf25d2c0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb280c2b0faedb192
   1:     0x55a2cf27a93e - core::fmt::write::h30e0b7ef777337ad
   2:     0x55a2cf25ac35 - std::io::Write::write_fmt::h86627e30c2b512b3
   3:     0x55a2cf25d085 - std::sys_common::backtrace::print::h7ed0882ed869c236
   4:     0x55a2cf25e90f - std::panicking::default_hook::{{closure}}::h9a127e13324a150a
   5:     0x55a2cf25e64a - std::panicking::default_hook::hf8f07fa1688cedd2
   6:     0x55a2cf25f008 - std::panicking::rust_panic_with_hook::he6d410a49c1deab2
   7:     0x55a2cf25ed61 - std::panicking::begin_panic_handler::{{closure}}::h3a4af972edd4df52
   8:     0x55a2cf25d76c - std::sys_common::backtrace::__rust_end_short_backtrace::h04151587e1857959
   9:     0x55a2cf25eac2 - rust_begin_unwind
  10:     0x55a2cef548d3 - core::panicking::panic_fmt::h5085b5d784b56c67
  11:     0x55a2cef549ad - core::panicking::panic::h699f7acfe9b26bc1
  12:     0x55a2cef89d1c - packet_tracer::core::user::proc::Process::get_note_from_symbol::h42392f1df81038e3
                               at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/user/proc.rs:384:62
  13:     0x55a2cef99fef - packet_tracer::core::probe::user::user::register_unmarshaler::{{closure}}::h7174ef40370fcc82
                               at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/probe/user/user.rs:98:24
  14:     0x55a2cefc75f5 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h0eaa2820cd54c2e2
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/alloc/src/boxed.rs:2001:9
  15:     0x55a2cefe6120 - packet_tracer::core::events::bpf::parse_raw_event::h299ad2fecfa5aede
                               at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:262:25
  16:     0x55a2ceffa733 - packet_tracer::core::events::bpf::BpfEvents::start_polling::{{closure}}::h94dc8d997d7c95c3
                               at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:132:31
  17:     0x55a2cf018e34 - libbpf_rs::ringbuf::RingBufferBuilder::call_sample_cb::h6d1234f729d02c26
                               at /home/pvalerio/.cargo/git/checkouts/libbpf-rs-a64433d6203387de/52ab250/libbpf-rs/src/ringbuf.rs:128:9
  18:     0x55a2cf04cd31 - ringbuf_process_ring
                               at /home/pvalerio/.cargo/registry/src/github.com-1ecc6299db9ec823/libbpf-sys-1.0.4+v1.0.1/libbpf/src/ringbuf.c:231:11
  19:     0x55a2cf04ce31 - ring_buffer__poll
                               at /home/pvalerio/.cargo/registry/src/github.com-1ecc6299db9ec823/libbpf-sys-1.0.4+v1.0.1/libbpf/src/ringbuf.c:288:9
  20:     0x55a2cf018ee4 - libbpf_rs::ringbuf::RingBuffer::poll::h211593462a5b2144
                               at /home/pvalerio/.cargo/git/checkouts/libbpf-rs-a64433d6203387de/52ab250/libbpf-rs/src/ringbuf.rs:157:28
  21:     0x55a2ceffacfc - packet_tracer::core::events::bpf::BpfEvents::start_polling::{{closure}}::h016bc39abb859420
                               at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:158:17
  22:     0x55a2cefa51a1 - std::sys_common::backtrace::__rust_begin_short_backtrace::hafb370250f6afa6b
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/sys_common/backtrace.rs:121:18
  23:     0x55a2cef78e01 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hc858349d835e0303
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/thread/mod.rs:551:17
  24:     0x55a2cefe3e51 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hdb80aa3a4e01895b
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/core/src/panic/unwind_safe.rs:271:9
  25:     0x55a2cef8bd11 - std::panicking::try::do_call::hfc928c58770113b0
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panicking.rs:483:40
  26:     0x55a2cef8bebb - __rust_try
  27:     0x55a2cef8ba5f - std::panicking::try::hc2c9b75d3499bfe0
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panicking.rs:447:19
  28:     0x55a2cef7c401 - std::panic::catch_unwind::h72fc8bbca879c25b
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panic.rs:137:14
  29:     0x55a2cef7872c - std::thread::Builder::spawn_unchecked_::{{closure}}::h6537250acbf18c1c
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/thread/mod.rs:550:30
  30:     0x55a2cefea8ee - core::ops::function::FnOnce::call_once{{vtable.shim}}::h09a1881c17915317
                               at /builddir/build/BUILD/rustc-1.66.1-src/library/core/src/ops/function.rs:251:5
  31:     0x55a2cf261b53 - std::sys::unix::thread::Thread::new::thread_start::hfad602368217ab7c
  32:     0x7f3686b8e12d - start_thread
  33:     0x7f3686c0fbc0 - clone3

Support latency measurements

With events coming from different functions and subsystems for the same packets, we might be able to perform some latency measurements. This however is not a subject that can be overlooked, so a proper investigation is required.

Initial OVS collector

We need a collector hooking into the OVS data/control paths to gather OVS-specific information. The exact scope is yet to be defined.

Cluster wide tracing

Allow the event collection part to run on multiple machines, with the generated events retrieved at a single point. For this to happen some kind of synchronization (including timestamps) and data passing is required.

For example, trace-cmd supports something similar.

Documentation on how to write collectors and hooks

We should have at least a starting page with pointers and examples on how to write collectors and hooks. That will be required to allow easier external contributions. If we support external hooks, the hook documentation might also be used for that (see #30).

Kernel symbols support (kallsyms)

An interface to manipulate kernel symbols exposed by /proc/kallsyms is required to convert symbol names to their addresses as well as the opposite.
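
A self-contained sketch of such an interface, parsing /proc/kallsyms into two lookup tables (note that reading the real addresses requires enough privileges, otherwise they show up as zero):

use std::collections::HashMap;
use std::fs;

use anyhow::Result;

/// Bidirectional symbol <-> address lookup built from /proc/kallsyms.
struct Kallsyms {
    by_name: HashMap<String, u64>,
    by_addr: HashMap<u64, String>,
}

impl Kallsyms {
    fn parse() -> Result<Self> {
        let mut by_name = HashMap::new();
        let mut by_addr = HashMap::new();
        // Each line looks like: "ffffffffb20000f0 T kfree_skb_reason".
        for line in fs::read_to_string("/proc/kallsyms")?.lines() {
            let mut fields = line.split_whitespace();
            let (Some(addr), Some(_type), Some(name)) =
                (fields.next(), fields.next(), fields.next())
            else {
                continue;
            };
            let addr = u64::from_str_radix(addr, 16)?;
            by_name.insert(name.to_string(), addr);
            by_addr.insert(addr, name.to_string());
        }
        Ok(Self { by_name, by_addr })
    }

    fn addr_of(&self, name: &str) -> Option<u64> {
        self.by_name.get(name).copied()
    }

    fn name_of(&self, addr: u64) -> Option<&str> {
        self.by_addr.get(&addr).map(String::as_str)
    }
}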

Support for handling the events (display on stdout, write to a file) after they are processed

Once an event is retrieved and processed we can provide it to the user. No post-processing is done at this point, as we need all events for that; such things will be done offline.

Things to consider:

  • Output the events to stdout, without (much) processing. Options are to use raw json, or a slightly better format for reading.
  • The output might be controlled a little:
    • Select the collectors events are coming from.
    • Select specific fields to display. Here an option might be to have a formatting cmdline option such as --format "{timestamp} ksym: {ksym}"; or simpler options such as --show-field timestamp,ksym. We shouldn't support both though (for maintainability reasons).
  • Write the raw json output to a file for later use. This will be required by post-processing commands later on.

For an initial support, only a raw output to stdout might be possible. That is fine; if so, please split this issue.

This depends on #8.

Post-processing in a Python interpreter

Investigate and, if possible, implement a post-processing command to convert events into Python objects and let the user manipulate them in a launched Python interpreter.

Better handle containers

The tool could automatically report more user-friendly info on containers:

  • Mapping the netns of events to container names/ids.
  • Using BPF_PROG_TYPE_CGROUP_SKB to see packets going in/out of cgroups.
  • Allowing to target a container for automatic inspection, something like --container <id>.

add bash completion script

There's a chance we'll end up having a decent number of options, and adding a bash completion file could be a nice-to-have.

Allow user provided probe hooks

Allow users to provide their own BPF object file and load it as a hook in the probes. For this to work the following topics need to be covered:

  • Provide a way to compile BPF objects using our interfaces and specific data.
  • Allow loading external objects.
  • Having generic logic to fill events with generic data and to retrieve it.

Event reporting and handling

This issue might be split once someone starts working on it.

A solution is needed to report events from probes and to digest them into a known format (json?). Possible solutions are splitting the event reporting logic per-collector, or sharing one with more generic capabilities.
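
For the shared, more generic option, a sketch using serde could look like this (the event fields are illustrative, not the actual format):

use serde::Serialize;

/// Generic event envelope; per-collector data hangs off of it.
#[derive(Serialize)]
struct Event {
    timestamp: u64,
    probe: String,
    /// Collector-specific sections, already serialized to JSON values.
    sections: std::collections::HashMap<String, serde_json::Value>,
}

fn main() -> anyhow::Result<()> {
    let mut sections = std::collections::HashMap::new();
    sections.insert("skb".to_string(), serde_json::json!({ "len": 40 }));
    let event = Event {
        timestamp: 3392678938917,
        probe: "skb:kfree_skb".to_string(),
        sections,
    };
    println!("{}", serde_json::to_string(&event)?);
    Ok(())
}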

Packet filtering

One of the key features is matching on packets. A solution is required both in the core tool (to accept user-provided filters and model them) and in the collectors to perform the actual matching.

Please split this issue into sub-ones if needed.

Remove libbpf-rs `Send` workaround

Currently we have to work around the fact that libbpf-rs does not mark certain objects (e.g. Map or RingBuf) as Send.
We should:

  • send a PR to libbpf-rs to add it
  • remove the workaround
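
The workaround in question is the classic newtype with a manual unsafe Send implementation; a generic sketch of the pattern (not necessarily Retis' exact code):

/// Wrapper asserting that the inner object is only ever used from one
/// thread at a time, even though the compiler can't prove it.
struct SendWrapper<T>(T);

// SAFETY: only sound if the wrapped object is never accessed
// concurrently from multiple threads; upholding that is on the caller.
unsafe impl<T> Send for SendWrapper<T> {}

impl<T> SendWrapper<T> {
    fn new(inner: T) -> Self {
        Self(inner)
    }

    fn get(&self) -> &T {
        &self.0
    }
}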

Attach to qdiscs

Using BPF_PROG_TYPE_SCHED_CLS. The specificity is that a qdisc needs to be provided as an argument for the eBPF program to be attached. This could require the user (or the tool) to attach the right kind of qdisc on some interfaces, and could modify the way the system works.

The data these probes would have access to is struct __sk_buff.
