Comments (10)
For scaling to multiple cores on slow targets, the observer is serialized and sent to the other nodes, which can then evaluate it themselves. That way there is no need to re-run the target on each node.
If you don't need to scale like that, in theory you can ignore serialization and set the nodes to AlwaysUnique.
from libafl.
So if the nodes are AlwaysUnique, we can still have multiple cores for the fuzzer, but the inputs will need to be reevaluated for each instance, correct? So if an instance discovers a new input, it will have to be reevaluated in a different node, correct?
from libafl.
Wouldn't it maybe make sense to have the Observer not be serializable, but instead make it have a serializable state that can be sent across instances? I presume this would make the API a bit cleaner, and it would also become easier to check whether observers may be shared, since we could encode this in the type system via a HasObserverState trait or something. Just some ideas, since I'm still getting familiar with the LibAFL design.
from libafl.
So if the nodes are AlwaysUnique, we can still have multiple cores for the fuzzer, but the inputs will need to be reevaluated for each instance, correct? So if an instance discovers a new input, it will have to be reevaluated in a different node, correct?
Yes. But that's not really recommended, of course. I don't see a reason why an observer wouldn't be serializable; maybe you're doing things in the Observer that belong in a Feedback?
Wouldn't it maybe make sense to have the Observer not be serializable, but instead make it have a serializable state that can be sent across instances?
The thing that should be serializable is the observer's state after an execution, so you are roughly describing the way it works now. Where exactly are you stuck?
from libafl.
So my understanding of an Observer is, that it gathers some kind of information from the target and then provides this to the feedback in its raw form. The Feedback can then do some processing on this data to reduce it to a true or false decision.
If this is the case, then the observer needs some kind of access to the target in order to perform the observation.
In the case of the map observer, this (usually?) happens in the form of a reference to the shared memory map of the target coverage. What I'm not quite sure about is what happens to this data on serialization. As far as I understand it right now, this data will be converted into an owned vector, and upon deserialization the observer no longer has any capability to actually observe the target. Is my understanding correct so far?
So the observer is basically frozen upon being sent over the wire, and the other node can use it to do some calculations with the feedback modules, correct?
If this is the case, then I don't have any problems with implementing this, since any internal objects that are not relevant to the observer's state can be ignored during serde. However, I am still convinced (assuming my understanding is correct) that this is not modeled correctly. I would have expected that, for example, the observer emits a kind of "observation" which can either be sent over the wire or be used directly by the feedback modules.
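To make the "frozen on the wire" behaviour concrete, here is a minimal sketch using only the standard library. It assumes a hypothetical `FrozenMapObserver` whose map is either shared with the target or an owned copy; `MapData`, `snapshot`, and `restore` are illustrative names, not LibAFL APIs, and the `Rc<RefCell<Vec<u8>>>` is only a stand-in for a shared memory map. The byte copy stands in for whatever serde would do.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the coverage map held by an observer.
pub enum MapData {
    /// Shared with the code that writes coverage (stand-in for shared memory).
    Live(Rc<RefCell<Vec<u8>>>),
    /// Owned copy created when a snapshot is restored; no link to any target.
    Frozen(Vec<u8>),
}

/// Hypothetical observer; this is not LibAFL's map observer.
pub struct FrozenMapObserver {
    pub map: MapData,
}

impl FrozenMapObserver {
    /// Copy the current map contents into an owned buffer (the "serialize" step).
    pub fn snapshot(&self) -> Vec<u8> {
        match &self.map {
            MapData::Live(shared) => shared.borrow().clone(),
            MapData::Frozen(owned) => owned.clone(),
        }
    }

    /// Rebuild an observer from a snapshot (the "deserialize" step).
    /// The result holds the data but can no longer see the target.
    pub fn restore(bytes: Vec<u8>) -> Self {
        Self { map: MapData::Frozen(bytes) }
    }

    /// True only while the observer is still connected to the shared map.
    pub fn is_live(&self) -> bool {
        matches!(self.map, MapData::Live(_))
    }
}
```

A restored observer keeps reporting the values it was sent with, even if the original shared map changes afterwards, which is the sense in which it is frozen.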
PS: I appreciate the quick replies so far. Thank you!
from libafl.
So my understanding of an Observer is, that it gathers some kind of information from the target and then provides this to the feedback in its raw form. The Feedback can then do some processing on this data to reduce it to a true or false decision. If this is the case, then the observer needs some kind of access to the target in order to perform the observation.
Correct
In the case of the map observer, this (usually?) happens in the form of a reference to the shared memory map of the target coverage. What I'm not quite sure about is what happens to this data on serialization. As far as I understand it right now, this data will be converted into an owned vector, and upon deserialization the observer no longer has any capability to actually observe the target. Is my understanding correct so far?
Correct as well :)
So the observer is basically frozen upon being sent over the wire, and the other node can use it to do some calculations with the feedback modules, correct?
Also correct, yes
If this is the case, then I don't have any problems with implementing this, since any internal objects that are not relevant to the observer's state can be ignored during serde. However, I am still convinced (assuming my understanding is correct) that this is not modeled correctly. I would have expected that, for example, the observer emits a kind of "observation" which can either be sent over the wire or be used directly by the feedback modules.
This is true; for example, the map observer content can be used directly or sent over the wire. That we don't always expose the map as a vec is simply an optimization - we don't want to memcopy the coverage map for each execution. Keep in mind that the serialization (commonly) only takes place when a new interesting testcase is found and stored - and shared with other nodes.
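As a rough illustration of that optimization, the sketch below reads each execution's map through a borrow and copies it only when the execution turns out to be interesting. `NoveltyFeedback` and `collect_snapshots` are hypothetical names for this example, and the novelty rule (an index hit for the first time) is a simplification, not LibAFL's actual feedback logic.

```rust
/// Hypothetical feedback: remembers which map indices were ever hit.
pub struct NoveltyFeedback {
    seen: Vec<bool>,
}

impl NoveltyFeedback {
    pub fn new(len: usize) -> Self {
        Self { seen: vec![false; len] }
    }

    /// Reads the map through a borrow (no copy) and returns true if an
    /// index was hit that had never been hit before.
    pub fn is_interesting(&mut self, map: &[u8]) -> bool {
        let mut novel = false;
        for (seen, &hits) in self.seen.iter_mut().zip(map) {
            if hits > 0 && !*seen {
                *seen = true;
                novel = true;
            }
        }
        novel
    }
}

/// Runs the feedback over the map of each execution and copies the map
/// only for interesting executions. Returns the copies that would be
/// stored and shared with other nodes.
pub fn collect_snapshots(runs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let len = runs.first().map_or(0, |m| m.len());
    let mut feedback = NoveltyFeedback::new(len);
    let mut shared = Vec::new();
    for map in runs {
        if feedback.is_interesting(map) {
            // The only copy on this path: taken for interesting runs only.
            shared.push(map.clone());
        }
    }
    shared
}
```

Uninteresting executions never pay for a copy here; only the executions that would be stored anyway do.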
PS: I appreciate the quick replies so far. Thank you!
Hope I can help :)
from libafl.
Alright, I think I now understand all the concepts as you intended them. But don't you think it would make sense to separate the serialization trait into an Observation object that is returned from the Observer? There don't need to be any copies here, I think, since any data that is kept in the Observer can simply be borrowed for as long as the Observation is needed. I.e. the Observer could have a function like fn observation(&self) -> &Observation. The Observation itself can simply contain the data, or a reference to the data which is kept in the Observer. This would be an implementation detail. On serialization we need to copy the data anyway (especially if you only have a borrow in the observer to begin with, like with the MapObserver). I don't think this would incur a significant overhead (maybe one or two pointer derefs?), but it would more clearly reflect what is actually done.
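A minimal sketch of that proposal, assuming a hypothetical `LendingObserver` that owns its data and lends it out as an `Observation`. The names and the `record` helper are made up for illustration and do not reflect LibAFL's traits.

```rust
/// Hypothetical data produced by one execution.
#[derive(Clone, Debug, PartialEq)]
pub struct Observation {
    pub hits: Vec<u8>,
}

/// Hypothetical observer that owns its observation and lends it out.
pub struct LendingObserver {
    current: Observation,
}

impl LendingObserver {
    pub fn new(len: usize) -> Self {
        Self { current: Observation { hits: vec![0; len] } }
    }

    /// Records a hit at `idx` (stand-in for real instrumentation).
    pub fn record(&mut self, idx: usize) {
        self.current.hits[idx] = self.current.hits[idx].saturating_add(1);
    }

    /// Borrow the observation; no copy is made here.
    pub fn observation(&self) -> &Observation {
        &self.current
    }
}

/// A feedback-like consumer that only needs the borrowed observation.
pub fn any_hit(obs: &Observation) -> bool {
    obs.hits.iter().any(|&h| h > 0)
}
```

In this sketch a copy happens only when the caller explicitly clones the borrowed `Observation`, for example to send it elsewhere; the local consumer works on the borrow.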
However, I don't have a good overview of what changes this would cascade into. What is your opinion on this?
For my purposes it will suffice to just write a custom Serialize implementation that serializes the data behind the Rc and deserializes it as an object that is not shared anymore (since it will not need to interact with the executor, as far as I understood everything). I can live with that :)
from libafl.
It's a hot code path that executes a few hundred times a second depending on the target, but every single pointer deref counts in general.
And I'm not sure it would make it easier for new users, since there are more moving parts to understand?
What do you think?
from libafl.
Well, from my point of view as a "new" user with a lot of fuzzing background, it wasn't exactly clear to me what consequences the serialization has and whether it matters that any references to other components still behave in a meaningful way after deserialization.
If we split this into Observer, which does the observing, and Observation, which is used for further processing and can be sent to other fuzzers via serialization, then it would be clearer what the responsibilities are.
For the case of the MapObserver you could even directly implement the Observation trait on the map and return a reference to the map. This shouldn't differ too much from the current amount of derefs you need to do, since you have to deref to the map either way.
I think the only option here is to actually implement it and then check the performance. Everything else is just speculation imho.
Side note: I don't even think we would need an extra trait for that, as it can simply be represented as an associated type in the Observer trait. Maybe?
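The associated-type variant could look roughly like the sketch below. `ObserverWithObservation`, `SliceMapObserver`, and `count_hit_edges` are hypothetical names; LibAFL's real Observer trait differs. Here the map observer lends out a plain byte slice as its observation, so no extra wrapper type or trait is involved.

```rust
/// Hypothetical trait with the observation as an associated type.
pub trait ObserverWithObservation {
    /// What feedbacks read, and what would be serialized for other nodes.
    type Observation: ?Sized;

    /// Borrow the most recent observation without copying it.
    fn observation(&self) -> &Self::Observation;
}

/// Hypothetical map observer that owns its map for simplicity.
pub struct SliceMapObserver {
    pub map: Vec<u8>,
}

impl ObserverWithObservation for SliceMapObserver {
    /// The map itself is the observation; no wrapper type is needed.
    type Observation = [u8];

    fn observation(&self) -> &[u8] {
        &self.map
    }
}

/// A feedback-like consumer that is generic over any observer whose
/// observation is a byte slice. Counts the entries with at least one hit.
pub fn count_hit_edges<O>(observer: &O) -> usize
where
    O: ObserverWithObservation<Observation = [u8]>,
{
    observer.observation().iter().filter(|&&h| h > 0).count()
}
```

Whether this costs anything measurable on the hot path would still have to be benchmarked, as noted above; the sketch only shows that the type-level split is expressible.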
from libafl.
No clue, @andreafioraldi what do you think?
from libafl.