Comments (18)
In my understanding, all I need to do in _handle_local_function is to pass the state on and take steps in the function, since it's an internal function and we could access its graph.
Yes, that's right!
Thus, I'd like to pass the caller's rd_state to the first node of that function, and take forward analysis.
I am not too sure of what you mean by "take forward analysis", and "pass... to the first node of that function".
Note that ReachingDefinitionsAnalysis can be run on a function directly! You don't have to manually manage the state between the blocks of that function.
So I wonder, which APIs should I focus on to take that propagation of rd_state during nodes?
As per my previous comment, I think you should focus on RDA's API, in particular the ways to instantiate it with a Function as subject, an init_state, and passing the "parent"'s function_handler, visited_blocks, dep_graph, etc., for the "child" RDA to update those structures as it needs.
from bits_of_static_binary_analysis.
ReachingDefinitionsAnalysis.all_definitions is a dictionary (not a set), where keys are "locations" (triple ("node"|"insn", CodeLocation, OP_BEFORE|OP_AFTER)), and values are LiveDefinitions. In other terms, .all_definitions gives you all the LiveDefinitions you recorded at any point you asked it to!
BTW, ReachingDefinitionsAnalysis.all_definitions is a set, and ReachingDefinitionsAnalysis.observed_results is a dictionary.
from bits_of_static_binary_analysis.
how to resolve regs?
I can't share the code I have for that. As you realised, it makes use of the calling convention recovery capabilities of angr to correlate Atom with Definitions.
If you need help on calling convention analysis, you can ask on the angr Slack.
MemLocationAtom <- Make_MemLoc_Atom(addr:resolve(%rdi), size:resolve(%rsi))
rd_state.AddDef(MemLocationAtom)
Yes, that's the idea.
Note that:
- If rdi points to the stack or the heap, you might need to create a SpOffset, or HeapAddress (VS a MemoryLocation);
- What you named addDef is already implemented as rd_state.kill_and_add_definition() (as you noted).
rd_state.AddUse(resolve(%rdi))
Kinda. rd_state.add_use_by_def() is to be used here, because resolve(%rdi) should return the relevant (used at the callsite) Definition for rdi.
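To make the fgets idea above concrete, here is a minimal, self-contained sketch. It is a toy model, not angr code: ToyState and ToyDefinition are hypothetical stand-ins, and the two methods only borrow the names kill_and_add_definition and add_use_by_def from the discussion; their signatures here are assumptions and may not match angr's.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToyDefinition:
    """Hypothetical stand-in for a definition: what is defined, where, with which values."""
    atom: str
    codeloc: int
    data: Tuple[int, ...] = ()


@dataclass
class ToyState:
    """Hypothetical stand-in for the analysis state handed to a handler."""
    live: Dict[str, ToyDefinition] = field(default_factory=dict)
    uses: List[Tuple[ToyDefinition, int]] = field(default_factory=list)

    def kill_and_add_definition(self, atom: str, codeloc: int, data=()) -> None:
        # A new definition of an atom replaces (kills) the previous one.
        self.live[atom] = ToyDefinition(atom, codeloc, tuple(data))

    def add_use_by_def(self, definition: ToyDefinition, codeloc: int) -> None:
        # Record that an existing definition is used at this code location.
        self.uses.append((definition, codeloc))


def handle_fgets(state: ToyState, codeloc: int):
    """fgets(char *buffer, int size, FILE *stream): uses rdi and rsi, defines the buffer."""
    buffer_def = state.live["rdi"]  # definition reaching the call site for rdi
    size_def = state.live["rsi"]    # definition reaching the call site for rsi
    for definition in (buffer_def, size_def):
        state.add_use_by_def(definition, codeloc)
    address, size = buffer_def.data[0], size_def.data[0]
    state.kill_and_add_definition(f"mem[{address:#x}:{size}]", codeloc)
    return True, state


state = ToyState()
state.kill_and_add_definition("rdi", 0x400500, [0x7FFE0000])
state.kill_and_add_definition("rsi", 0x400504, [16])
handled, state = handle_fgets(state, 0x400510)
```

The point of the sketch is the order of operations: resolve the argument registers to the definitions that reach the call, record those as uses, then create the new memory definition from the data they carry.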
how to resolve the register to generate the potential data? In EngineVex, data could be resolved by _expr, but how to calculate here, in _handle_glibc_function?
Definitions have a .data attribute, which is a DataSet (some wrapper around a Python set) containing the data.
In your demo, you imported _libc_decls and used parameter_type = _libc_decls[sink_name].args[parameter_position] to determine the type of parameter. I wonder if that's necessary?
This was for specific handling, as described in a comment in the code where it's used.
Maybe you don't need those 🤷
from bits_of_static_binary_analysis.
Read the code carefully: state, self._visited_blocks, self._dep_graph = engine.process(...) means that engine.process returns a triple, whose first element is assigned to the state variable.
So a copy is passed to the child. The child updates the copy, and returns it. The parent replaces its state with the updated copy.
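The copy-then-rebind pattern described above can be reproduced in a few lines of plain Python. This is only a sketch: ToyState and toy_process are hypothetical names standing in for the real state class and for engine.process.

```python
import copy
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class ToyState:
    """Hypothetical stand-in for an analysis state."""
    definitions: Set[str] = field(default_factory=set)

    def copy(self) -> "ToyState":
        return copy.deepcopy(self)


def toy_process(state: ToyState) -> ToyState:
    """Stand-in for the engine: updates the state it receives and returns it."""
    state.definitions.add("mem@buffer")
    return state


def run_on_node(state: ToyState) -> Tuple[ToyState, ToyState]:
    original = state
    state = state.copy()        # the engine only ever sees a copy
    state = toy_process(state)  # the local name is rebound to the updated copy
    return original, state


before, after = run_on_node(ToyState({"reg@rdi"}))
```

Here `before` still holds only "reg@rdi", while `after` also holds "mem@buffer": the new facts are only visible through the returned value, which is why a handler has to return the updated state rather than rely on the object it was given.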
from bits_of_static_binary_analysis.
Hi @Icegrave0391,
This is a question that comes often, so I wrote a blog post to help people trying to get function handlers working: Handle function calls during static analysis in angr.
Closing this issue, but don't hesitate to pursue the discussion here if the aforementioned article does not answer your questions. It will be helpful for other people as well :)
from bits_of_static_binary_analysis.
Thanks!
By the way, I'd like to ask about some specific APIs to implement _handle_local_function. In my understanding, all I need to do in _handle_local_function is to pass the state on and take steps in the function, since it's an internal function and we could access its graph. Thus, I'd like to pass the caller's rd_state to the first node of that function, and take forward analysis. (Hope I got that right😢)
So I wonder, which APIs should I focus on to take that propagation of rd_state during nodes?
from bits_of_static_binary_analysis.
Thanks to your advice, I thought I had finished handle_local_function. However, I noticed that in engine_vex, after handle_local_function, it's necessary to return a rd_state (the child RDA's state, I think).
Since the rd_state is not a property of ReachingDefinitionAnalysis, my idea is to construct a result child_rd_state from the child_rda, from all of the definitions. The code fragment is as follows.
child_rda = self.project.analyses.ReachingDefinitions(
    subject=local_function,
    func_graph=local_function.graph,
    max_iterations=parent_rda._max_iterations,
    track_tmps=parent_rda._track_tmps,
    observation_points=parent_rda._observation_points,
    init_state=parent_rdstate,
    cc=local_function.calling_convention,
    function_handler=self,
    call_stack=parent_rda._call_stack,  # callstack <- [parent callstack] + [subject function]
    maximum_local_call_depth=parent_rda._maximum_local_call_depth,
    observe_all=parent_rda._observe_all,
    canonical_size=parent_rda._canonical_size
)
# construct child's rd_state
child_rdstate = ReachingDefinitionsState(
    arch=self.project.arch, subject=child_rda.subject, track_tmps=child_rda._track_tmps,
    analysis=child_rda,
    live_definitions=child_rda.all_definitions,
    canonical_size=child_rda._canonical_size
)
return True, child_rdstate, child_rda.visited_blocks, child_rda.dep_graph
However, I noticed the parameter live_definitions, which by default is generated as a LiveDefinitions. But in my way, child_rda.all_definitions is a set.
Finally, it seems that the definitions weren't passed on well, and I'm not sure if I have to instantiate a LiveDefinitions object and add those definitions manually to pass that parameter.
from bits_of_static_binary_analysis.
Since the rd_state is not a property of ReachingDefinitionAnalysis, my idea is to construct a result child_rd_state from the child_rda, from all of the definitions.
ReachingDefinitionsState is passed along, and updated by ReachingDefinitionAnalysis.
(The underlying design idea is to de-correlate the state of the analysis from the manipulations on this state. angr is a big WIP from the design perspective, and some boundaries aren't very clear / moving.)
So, you don't have to re-create a ReachingDefinitionsState, you already have one, given as a parameter to your handler.
Your handler should update the state.live_definitions, with the merge of the LiveDefinitions at the exit site of the function (they will represent the facts gathered by analysis of the function), and return state.
However, I noticed the parameter live_definitions, which by default is generated as a LiveDefinitions. But in my way, child_rda.all_definitions is a set.
Because they are two different things, although related.
- LiveDefinitions is a "unit": a set of facts about definitions and uses (and values) at a given point of the program;
- ReachingDefinitionsAnalysis.all_definitions is a dictionary (not a set), where keys are "locations" (triple ("node"|"insn", CodeLocation, OP_BEFORE|OP_AFTER)), and values are LiveDefinitions. In other terms, .all_definitions gives you all the LiveDefinitions you recorded at any point you asked it to!
Finally, it seems that the definitions weren't passed on well, and I'm not sure if I have to instantiate a LiveDefinitions object and add those definitions manually to pass that parameter.
At what point didn't it get "passed on" well?
Did you use observation_points, or observe_all when instantiating ReachingDefinitionsAnalysis?
If not, then it's not recording any facts it computes!
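The effect of observation points can be illustrated with a toy analysis that computes facts at every step but only keeps the ones it was asked to observe. Everything below is hypothetical (ToyAnalysis, the constants, the two-step "program"); it only mirrors the idea of a dictionary keyed by ("node"|"insn", address, OP_BEFORE|OP_AFTER) triples.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

OP_BEFORE, OP_AFTER = 0, 1  # toy constants, chosen for this sketch only
ObservationPoint = Tuple[str, int, int]


class ToyAnalysis:
    """Computes facts for two fake nodes, recording them only where requested."""

    def __init__(self,
                 observation_points: Optional[Iterable[ObservationPoint]] = None,
                 observe_all: bool = False) -> None:
        self._points: Set[ObservationPoint] = set(observation_points or [])
        self._observe_all = observe_all
        self.observed_results: Dict[ObservationPoint, Set[str]] = {}
        self._run()

    def _observe(self, point: ObservationPoint, facts: Set[str]) -> None:
        if self._observe_all or point in self._points:
            self.observed_results[point] = set(facts)  # snapshot, not a reference

    def _run(self) -> None:
        facts: Set[str] = set()
        for address, defined_atom in [(0x1000, "rdi"), (0x1010, "rax")]:
            self._observe(("node", address, OP_BEFORE), facts)
            facts.add(defined_atom)
            self._observe(("node", address, OP_AFTER), facts)


silent = ToyAnalysis()
targeted = ToyAnalysis(observation_points=[("node", 0x1010, OP_AFTER)])
everything = ToyAnalysis(observe_all=True)
```

`silent` computes the same facts as the other two but exposes none of them, which is the situation described above when neither option is set.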
from bits_of_static_binary_analysis.
Hi @Pamplemousse!
Thanks to your advice, I successfully worked out handle_local_function()! And I still want to ask some questions about _handle_extern_function (sorry for being loquacious🤐).
In my understanding, as your presentation showed, I should simulate the behavior of glibc functions. And since the parameters could be recovered to Register(Atom) according to calling_convention, all I need to do is to add defs & uses of those atoms according to the function semantics.
Here, I'd like to post a pseudocode of _handle_fputs, _handle_fgets and ask if that's correct:
handle_fgets(...){ # fgets(char *buffer, int size, FILE * stream)
    # how to resolve regs?
    MemLocationAtom <- Make_MemLoc_Atom(addr:resolve(%rdi), size:resolve(%rsi))
    rd_state.AddDef(MemLocationAtom)
    # or just rd_state.AddDef(RegisterAtom(%rdi),RegisterAtom(%rsi)) ?
}
handle_fputs(...){ # fputs(char *buffer, FILE * stream)
    rd_state.AddUse(resolve(%rdi))
}
So, I think, rd_state.kill_and_add_definitions(), rd_state.add_use() (maybe those are the top level api and I wouldn't have to manipulate KeyedRegions) could help.
However, I have some questions about those APIs:
- how to resolve the register to generate the potential data? In EngineVex, data could be resolved by _expr, but how to calculate here, in _handle_glibc_function?
- I noticed an API rd_state.add_use_by_def(), which is never used, so I wonder if there is any situation where it could be used?
- In your demo, you imported _libc_decls and used parameter_type = _libc_decls[sink_name].args[parameter_position] to determine the type of parameter. I wonder if that's necessary?
Thanks for your patience! 🙇
from bits_of_static_binary_analysis.
Since the rd_state is not a property of ReachingDefinitionAnalysis, my idea is to construct a result child_rd_state from the child_rda, from all of the definitions.
ReachingDefinitionsState is passed along, and updated by ReachingDefinitionAnalysis.
(The underlying design idea is to de-correlate the state of the analysis from the manipulations on this state. angr is a big WIP from the design perspective, and some boundaries aren't very clear / moving.)
So, you don't have to re-create a ReachingDefinitionsState, you already have one, given as a parameter to your handler.
Your handler should update the state.live_definitions, with the merge of the LiveDefinitions at the exit site of the function (they will represent the facts gathered by analysis of the function), and return state.
Hmmm... Did I misunderstand something? In _run_on_node (reaching_definitions.py line 333)
def _run_on_node(self, node, state: ReachingDefinitionsState):
    ...
    state = state.copy()
    state, self._visited_blocks, self._dep_graph = engine.process(
        state,
        block=block,
        fail_fast=self._fail_fast,
        visited_blocks=self._visited_blocks,
        dep_graph=self._dep_graph,
    )
I think the state passed to engine is a copy of the original state... So the original state may not be updated?
from bits_of_static_binary_analysis.
Hi @Pamplemousse,
I think I'm quite confused about updating the state.live_definitions at the exit of handle_local_function, as you said before:
Since the rd_state is not a property of ReachingDefinitionAnalysis, my idea is to construct a result child_rd_state from the child_rda, from all of the definitions.
ReachingDefinitionsState is passed along, and updated by ReachingDefinitionAnalysis.
(The underlying design idea is to de-correlate the state of the analysis from the manipulations on this state. angr is a big WIP from the design perspective, and some boundaries aren't very clear / moving.)
So, you don't have to re-create a ReachingDefinitionsState, you already have one, given as a parameter to your handler.
Your handler should update the state.live_definitions, with the merge of the LiveDefinitions at the exit site of the function (they will represent the facts gathered by analysis of the function), and return state.
As you said, rd_state is always passed on and self-updated during RDA, and during handling several external functions, state.kill_and_add_definitions() should also update those LiveDefinitions. Thus I used to think that there is no need to merge live_definitions manually.
However, I ran RDA on this simple program:
void read_file(char * buffer, char * filepath){
    ...
    fgets(buffer, count, fp); // <- handle_fgets(): Add a definition of MemoryLocationAtom()
    ...
}
int main(){ // <- start RDA at main function
    ....
    read_file(BUFFER, FILE); // <- handle_local_function(): start child RDA here
    ...
}
When handle_fgets() is executed, a new memory_definition is created. However, when it returns to child_rda:
def handle_local_function(...):
    ...
    child_rda = self.project.analyses.ReachingDefinitions(
        subject=local_function,
        func_graph=local_function.graph,
        max_iterations=parent_rda._max_iterations,
        track_tmps=parent_rda._track_tmps,
        observation_points=parent_rda._observation_points,
        init_state=parent_rdstate,
        cc=local_function.calling_convention,
        function_handler=self,
        call_stack=parent_rda._call_stack,  # callstack <- [parent callstack] + [subject function]
        maximum_local_call_depth=parent_rda._maximum_local_call_depth,
        observe_all=parent_rda._observe_all,
        visited_blocks=parent_rda._visited_blocks,
        dep_graph=state.dep_graph,
        canonical_size=parent_rda._canonical_size
    )
    # construct child's rd_state
    # merge
    print(f"after handle local function: {local_function.name}, state: {parent_rdstate}")
    child_rdstate = parent_rdstate
Here I simply pass the parent_state as the result state.
But I found that the memory_definition is missing. Why wasn't it updated during the process? (As the state.kill_and_add_definition() in handle_fgets() will update its live_definitions)
So, it's necessary to merge those live_definitions manually. I created a local_function_stack to record each child RDA's live_definitions, and I plan to maintain a global dictionary of each local function's definitions and finally merge them:
class NaiveHandler(FunctionHandler):
    g_local_live_definitions: Dict[int, LiveDefinitions] = {}

    def __init__(self):
        self._analyses = None
        self._local_func_stack = []
        self.project: 'angr.Project' = None

    @property
    def current_local_func_addr(self):
        if len(self._local_func_stack):
            return self._local_func_stack[-1]
        return 0

    def push_local_func_stack(self, v: int):
        self._local_func_stack.append(v)

    def pop_local_func_stack(self):
        self._local_func_stack.pop()

    def update_live_definitions(self, local_func_addr, live_definitions: LiveDefinitions):
        if not local_func_addr in NaiveHandler.g_local_live_definitions.keys():
            NaiveHandler.g_local_live_definitions[local_func_addr] = live_definitions
        else:
            origin_defs: LiveDefinitions = NaiveHandler.g_local_live_definitions[local_func_addr]
            NaiveHandler.g_local_live_definitions[local_func_addr] = origin_defs.merge(live_definitions)

    def clear_live_definitions(self, local_func_addr, live_defs: LiveDefinitions):
        if local_func_addr in NaiveHandler.g_local_live_definitions.keys():
            NaiveHandler.g_local_live_definitions.pop(local_func_addr)
        NaiveHandler.g_local_live_definitions[local_func_addr] = live_defs

    def get_live_definitions(self, local_func_addr):
        try:
            live_defs = NaiveHandler.g_local_live_definitions[local_func_addr]
        except KeyError:
            raise KeyError(f"Error in function {hex(local_func_addr)}")
        return live_defs

    def hook(self, analysis):
        """
        Hook is just to pass the parent's RDA
        :param analysis:
        :return:
        """
        self._analyses = analysis  # parent rda
        self.project = analysis.project
        return self

    def handle_local_function(self, state: 'ReachingDefinitionsState', function_address: int, call_stack: List,
                              maximum_local_call_depth: int, visited_blocks: Set[int], dep_graph: 'DepGraph',
                              src_ins_addr: Optional[int] = None,
                              codeloc: Optional['CodeLocation'] = None):
        local_function = self.project.kb.functions.function(addr=function_address)
        print(f"handling local function: {local_function.name}")
        # 1. get parent's rd-state & rda
        parent_rdstate = state
        parent_rda = self._analyses
        # import IPython; IPython.embed()
        # 1.1 init local_function live_definitions
        self.push_local_func_stack(function_address)
        self.clear_live_definitions(self.current_local_func_addr, parent_rdstate.live_definitions)
        # 2. pass the parent's structures and execute child RDA
        child_rda = self.project.analyses.ReachingDefinitions(
            subject=local_function,
            func_graph=local_function.graph,
            max_iterations=parent_rda._max_iterations,
            track_tmps=parent_rda._track_tmps,
            observation_points=parent_rda._observation_points,
            init_state=parent_rdstate,
            cc=local_function.calling_convention,
            function_handler=self,
            call_stack=parent_rda._call_stack,  # callstack <- [parent callstack] + [subject function]
            maximum_local_call_depth=parent_rda._maximum_local_call_depth,
            observe_all=parent_rda._observe_all,
            visited_blocks=parent_rda._visited_blocks,
            dep_graph=state.dep_graph,
            canonical_size=parent_rda._canonical_size
        )
        # construct child's rd_state
        # merge
        print(f"after handle local function: {local_function.name}, state: {parent_rdstate}")
        child_rdstate = parent_rdstate
        new_defs = self.get_live_definitions(self.current_local_func_addr)
        child_rdstate.live_definitions = child_rdstate.live_definitions.merge(new_defs)
        self.pop_local_func_stack()
        # update previous
        if self.current_local_func_addr:
            self.update_live_definitions(self.current_local_func_addr, new_defs)
        # return (executed_rda, child_rdstate, visited_blocks, dep_graph)
        # import ipdb; ipdb.set_trace()
        return True, child_rdstate, child_rda.visited_blocks, child_rda.dep_graph

    def handle_fgets(self, state: 'ReachingDefinitionsState', codeloc: 'CodeLocation'):
        ...
        state.AddMemLocDef()
        # <--- manually add to the global's function live_definitions
        self.update_live_definitions(self.current_local_func_addr, state.live_definitions)
        return True, state
However, the result is not what I expected...
from bits_of_static_binary_analysis.
So I should record each live_definitions at the exit point of the external functions (like fgets, fputs...), and manually merge those live_definitions into the child_rd_state after the child_rda is executed?
from bits_of_static_binary_analysis.
Merging confusion
As you said, rd_state is always passed on and self-updated during RDA, and during handling several external functions, state.kill_and_add_definitions() should also update those LiveDefinitions.
Correct!
Thus I used to think that there is no need to merge live_definitions manually.
You forgot something: LiveDefinitions is a set of facts at a (unique) given point of the program.
What you are expecting is the LiveDefinitions "at the end of the function".
What about functions that have several exit blocks (several return statements for example)?
Then "the end of the function" is not a unique point of the program (remember that a "point" is a triple ("node"|"insn", CodeLocation, OP_BEFORE|OP_AFTER))!
In data-flow analysis, when you have several states and you want a single one, you apply the join operation (see https://docs.google.com/presentation/d/13SDNRKHblo2xenczp9m6rQahigtwygmUcrBhZ-G3gvo/edit#slide=id.g9976451435_0_25).
The LiveDefinitions that you have at the end of the function is the merge of the LiveDefinitions you have for each exit site.
This should only happen inside the handle_local_function!
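As a rough illustration of that join, here is a self-contained sketch in which the facts at a program point are modelled as a plain dictionary from atom names to the set of definition sites that may reach that point. The representation and the names (ToyLiveDefs, join, state_at_function_end) are assumptions made for the example; the real LiveDefinitions is richer.

```python
from functools import reduce
from typing import Dict, FrozenSet, List

# Toy model: atom name -> definition sites that may reach the program point.
ToyLiveDefs = Dict[str, FrozenSet[str]]


def join(left: ToyLiveDefs, right: ToyLiveDefs) -> ToyLiveDefs:
    """Join two states: a definition reaches the merge point if it reaches either input."""
    merged = dict(left)
    for atom, definitions in right.items():
        merged[atom] = merged.get(atom, frozenset()) | definitions
    return merged


def state_at_function_end(exit_states: List[ToyLiveDefs]) -> ToyLiveDefs:
    """Fold the states observed at every exit site into a single state."""
    return reduce(join, exit_states, {})


# Two return statements: only the second path goes through the fgets call.
exit_early: ToyLiveDefs = {"rax": frozenset({"0x401010"})}
exit_late: ToyLiveDefs = {
    "rax": frozenset({"0x401040"}),
    "buffer": frozenset({"fgets@0x401030"}),
}
merged = state_at_function_end([exit_early, exit_late])
```

After the join, `rax` has two possible definitions and `buffer` has the one created on the path through fgets, which is the single "end of function" state the caller can continue from.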
Your code
But I found that the memory_definition is missing. Why wasn't it updated during the process? (As the state.kill_and_add_definition() in handle_fgets() will update its live_definitions)
You should try to locate the problem (breakpoints are very useful to debug handlers): Do you see the definitions being added in handle_fgets? Where do they disappear?
So, it's necessary to merge those live_definitions manually
Not the way you did it. See first section.
from bits_of_static_binary_analysis.
Confusion about self-updated LiveDefinitions
I stepped through the source code with a debugger, and found something strange about rd_state. My source code is as follows.
void read_file(char * buffer, char * filepath){
    ...
    fgets(buffer, count, fp); // <- handle_fgets(): Add a definition of MemoryLocationAtom()
    ...
}
int main(){ // <- start RDA at main function
    ....
    read_file(BUFFER, FILE); // <- handle_local_function(): start child RDA here
    ...
}
The general analysis steps are:
- start RDA(subject=main)
- In main RDA: start RDA(subject=read_file); as we encounter the function read_file, handle_local_function will create the child RDA of that function.
- In read_file RDA: start RDA(subject=fgets); when we reach the PLT stub of fgets, handle_local_function will create the child RDA of call fgets@plt.
- In fgets@plt RDA: trigger handle_fgets, and create a MemoryLocation type definition in the rd_state.
I think, when the child RDA returns, all those new definitions (like the MemoryLocation in the fgets@plt RDA) should be passed on. However, the result is as follows:
- before creating the fgets@plt RDA, the initial state is as follows:
- after creating the fgets@plt RDA and triggering handle_fgets, the result state is as follows: As the figure shows, a memdef has been created.
- when fgets is handled and the fgets@plt RDA is done, the memdef in the result state has disappeared?
Note that my code of handle_local_function is:
def handle_local_function(self, state: 'ReachingDefinitionsState', function_address: int, call_stack: List,
                          maximum_local_call_depth: int, visited_blocks: Set[int], dep_graph: 'DepGraph',
                          src_ins_addr: Optional[int] = None,
                          codeloc: Optional['CodeLocation'] = None):
    local_function = self.project.kb.functions.function(addr=function_address)
    # 1. get parent's rd-state & rda
    parent_rdstate = state
    parent_rda = self._analyses
    # 2. pass the parent's structures and execute child RDA
    # <--- we are here when creating the fgets@plt RDA
    child_rda = self.project.analyses.ReachingDefinitions(
        subject=local_function,
        func_graph=local_function.graph,
        max_iterations=parent_rda._max_iterations,
        track_tmps=parent_rda._track_tmps,
        observation_points=parent_rda._observation_points,
        init_state=parent_rdstate,
        cc=local_function.calling_convention,
        function_handler=self,
        call_stack=parent_rda._call_stack,  # callstack <- [parent callstack] + [subject function]
        maximum_local_call_depth=parent_rda._maximum_local_call_depth,
        observe_all=parent_rda._observe_all,
        visited_blocks=parent_rda._visited_blocks,
        dep_graph=parent_rdstate.dep_graph,
        canonical_size=parent_rda._canonical_size
    )
    # construct child's rd_state
    # <--- we are here after executing handle_fgets, which means the fgets@plt RDA has finished
    child_rdstate = parent_rdstate
    return True, child_rdstate, child_rda.visited_blocks, child_rda.dep_graph
Here let's just ignore the fact that I didn't merge the LiveDefinitions of potentially different exits, and just focus on the problem of the self-updating of the state's LiveDefinitions.
The result:
So I'm puzzled that the memdef seems not to have been updated in parent_rdstate during the process?
from bits_of_static_binary_analysis.
Which means that I failed to copy the state of the child RDA to the parent RDA. And why didn't that init_state, which is passed from the parent RDA to the child RDA, get updated during the child RDA and returned to the parent?
from bits_of_static_binary_analysis.
Sorry for asking you those stupid questions above...🤦♂️ I've now hidden them.
It's just because the rd_state parameter is immutable and the value outside of the function won't be changed (a very very stupid question lol).
And I must leverage the live_definitions at the observation_points which represent the function.endpoints (just as you said) and merge them.
Sorry again for wasting your time on such a stupid question 🙇
from bits_of_static_binary_analysis.
Hi @Pamplemousse ,
I have another question about function handler:
I read the source code at engine_vex and found the _handle_function:
def _handle_function(self, func_addr: Optional[DataSet], **kwargs):
    skip_cc = self._handle_function_core(func_addr, **kwargs)
    if not skip_cc:
        self._handle_function_cc(func_addr)
Here, skip_cc is the executed_rda returned by my customized function_handler, and I think _handle_function_cc just follows the calling_convention and takes several steps:
1. add uses for arguments
2. kill return value registers
3. caller-saving registers
4. pop return address if necessary
So, in my own _handle_external_function handlers, should I always return executed_rda=True and skip that _handle_function_cc? And if that's true, do I need to do those steps manually (especially steps 2 and 4; I don't know why it's necessary to kill the return value register, or whether it's necessary to manipulate the stack pointer myself)?
Moreover, I wonder if it's necessary to take RDA from the beginning of a function? I tried to pass a function's sub-graph to RDA and I'm not sure if that will go right. And during executing, it warns me that "Invalid number of values for stack pointer. Stack is probably unbalanced. This indicates serious problems with function handlers"...
from bits_of_static_binary_analysis.
_handle_function_cc is a "minimal implementation" of a handler, in case you did not provide your own, or that your handler says it could not manage to work out the dependencies (for whatever reason) by returning (False, ...).
Minimal in the sense that it creates the necessary dependencies between a function's arguments and its return, without any knowledge of the internals.
So, if your handler does update the state itself, it should return True.
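The contract between a custom handler and the fallback can be sketched without angr. In this toy version the "state" is just a list of strings describing what was recorded; handle_function, minimal_cc_handling and the handler table are hypothetical names that only mirror the control flow of the _handle_function snippet quoted earlier in the thread.

```python
from typing import Callable, Dict, List, Tuple

ToyState = List[str]
Handler = Callable[[ToyState], Tuple[bool, ToyState]]


def minimal_cc_handling(state: ToyState, name: str) -> ToyState:
    """Fallback: generic effects derived from the calling convention only."""
    return state + [f"{name}: generic argument uses and return-value kill"]


def handle_function(state: ToyState, name: str, handlers: Dict[str, Handler]) -> ToyState:
    handler = handlers.get(name)
    handled = False
    if handler is not None:
        handled, state = handler(state)
    if not handled:  # no handler, or the handler reported that it could not do the work
        state = minimal_cc_handling(state, name)
    return state


def handle_fgets(state: ToyState) -> Tuple[bool, ToyState]:
    # This handler models the function itself, so it reports True.
    return True, state + ["fgets: buffer defined from stream"]


def handle_mystery(state: ToyState) -> Tuple[bool, ToyState]:
    # This handler gives up, so the fallback runs.
    return False, state


handlers: Dict[str, Handler] = {"fgets": handle_fgets, "mystery": handle_mystery}
fgets_result = handle_function([], "fgets", handlers)
mystery_result = handle_function([], "mystery", handlers)
puts_result = handle_function([], "puts", handlers)
```

Returning True from a handler that did not actually update the state would therefore skip the only code that records the call's effects.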
Moreover, I wonder if it's necessary to take RDA from the beginning of a function? I tried to pass a function's sub-graph to RDA and I'm not sure if that will go right.
There are no hard requirements; it depends on what you want to do and the results you expect.
But keep in mind that RDA has to interpret VEX statements to construct the dependencies!
In your case, if your sub-graph does not include the function prologue for example, RDA cannot "guess" part of the stack content, and sanity checks will fail when returning from the function...
from bits_of_static_binary_analysis.