Git Product home page Git Product logo

Comments (7)

brendangregg avatar brendangregg commented on May 14, 2024

this has been done. closing ticket.

from bcc.

drzaeus77 avatar drzaeus77 commented on May 14, 2024

Actually, this is probably working on accident. I was delinquent in not responding to this issue with an explanation earlier, and I didn't notice this in the code change a few days back.

It is not advised to return an undefined value, since it will be used by the different kernel hookpoints (socket, kprobe, etc.) in different ways. In the case of socket, the return value is used to denote whether to capture the packet to userspace or not. In the case of kprobe, the return value is used to determine whether the perf event that may be attached to the same kprobe should be recorded or not. So, an undefined return value may cause quirky perf behavior, if they are running concurrently.

I would propose that a void return value should only be allowed for function types that have special handling to mangle the return value to be 0 or whichever sane default is appropriate for that function type. I can add this support for kprobes.

from bcc.

brendangregg avatar brendangregg commented on May 14, 2024

Not sure I understand what you mean: so the k[ret]probe__ functions could be made void safe, but others should return something properly?

from bcc.

drzaeus77 avatar drzaeus77 commented on May 14, 2024

Let me rephrase. All BPF programs must leave a valid value in R0. All kernel hooks that run a BPF program interpret R0 in some way. If we let BCC allow void programs, it will have to internally stuff in a default return value, since we don't want undefined behavior. The choice of 0 for the default value is the most universal one, and since we are focusing on probes I feel comfortable allowing this default for those types of programs (non-zero is the error/skip case for probes), but not as comfortable for the others. For instance, for SCHED_CLS, the return value currently has a specific meaning depending on how the program is stitched in the tc qdisc/filter/action chain. Additionally, non-kprobe program types usually aren't one-liners, so it isn't a great burden to keep program return type required.

from bcc.

brendangregg avatar brendangregg commented on May 14, 2024

I'm fine with returning 0 for now, eg, changing hello_world (and others) back to:

from bcc import BPF

BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()

We'd just need to be clear that it has value and what it does. I'm still not clear what happens if I return 1 in this example. hello_world.py needs to be 100% explainable, and make 100% sense. :)

from bcc.

drzaeus77 avatar drzaeus77 commented on May 14, 2024

I just got a better explanation from Alexei, so now hopefully my explanation makes more sense as well.

Let's try an example with hello_world.

$ python -c "from bcc import BPF; BPF(text='int kprobe__sys_clone(void *ctx) { return 0; }').trace_print()"

Now in another shell:

$ perf list | grep sys_clone
  kprobes:p_sys_clone                                [Tracepoint event]
$ perf record -e kprobes:p_sys_clone -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.360 MB perf.data ]
$ perf report --stdio
Error:
The perf.data file has no samples!

Return 1 instead.

$ python -c "from bcc import BPF; BPF(text='int kprobe__sys_clone(void *ctx) { return 1; }').trace_print()"
$ perf report --stdio
...
# Samples: 47  of event 'kprobes:p_sys_clone'
...
    78.72%  byobu-status  [kernel.vmlinux]  [k] sys_clone
    17.02%  tmux          [kernel.vmlinux]  [k] sys_clone
     4.26%  bash          [kernel.vmlinux]  [k] sys_clone

So what changed? Similar to attaching a bpf program to a socket, a kprobe bpf program acts like a filter for other subscribers to the same event. Returning 0 tells the kprobe infrastructure to return immediately after running the bpf program, whereas non-zero allows the other subscribers to run, thus paying the cost of allocating space in the trace buffer for this event and submitting it (for details see kernel/trace/trace_kprobe.c:kprobe_perf_func()). This last step can be costly, but it can also be powerful.

If we are using bpf programs in a standalone mode, where they do all of the work and coordinate with userspace purely through bpf maps and trace_printks, then it is not necessary or performant to pay the cost of submitting the event to the buffer only to discard it later. Currently this is the case for all of our tools and examples.

Having said that, there is a great future opportunity to use this return value in conjunction with other perf events to intelligently trace with the full power of perf.

I don't think this answer gets towards a 100% explainable and 100% sensical user story, but hopefully it makes sense to you. How should we spin it for the one-liner case without getting into the technical weeds?

from bcc.

brendangregg avatar brendangregg commented on May 14, 2024

I'm going to return the "return 0"s, given the explanations above, and close this ticket.

from bcc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.