Comments (5)
Yeah, the current approach is pretty awful. In granary, I had a way of extracting and wrapping most functions from the Linux kernel and libc. I even started to use the same script to generate the std_defs.txt files from preprocessed system header files. Still, the current format leaves a lot to be desired, though I think we can adapt this script.
Here's the gist of the idea, which builds directly on what granary did: use the cparser script to generate a header file containing all system function declarations and their dependent types (this is doable, modulo a few parse errors). An alternative, possibly better, solution is to use Clang for this. Either way, we want to produce a bitcode file that declares every external function, link it with the lifted bitcode, and then find a way to adapt the arguments (similar to what we do now).
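Whatever does the parsing (the cparser script or Clang), each external ultimately needs at least a name, a count of fixed arguments, and a variadic flag. A minimal sketch of pulling that out of one already-preprocessed prototype string, written in Python since get_cfg.py is Python. `ExternSummary` and `summarize_prototype` are hypothetical names, and this toy only handles simple declarations; real system headers need a real C parser.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternSummary:
    """Hypothetical per-function summary; field names are illustrative."""
    name: str
    num_fixed_args: int  # named parameters, excluding "..."
    is_variadic: bool


def summarize_prototype(proto: str) -> ExternSummary:
    """Summarize one simple C prototype, e.g. "int printf(const char *, ...);".

    Only commas at parenthesis depth 1 separate parameters, so a
    function-pointer parameter counts once. Declarators such as functions
    returning function pointers are NOT handled by this toy.
    """
    open_idx = proto.index("(")
    match = re.search(r"([A-Za-z_]\w*)\s*$", proto[:open_idx])
    if match is None:
        raise ValueError(f"no function name in {proto!r}")

    params, depth, current = [], 0, []
    for ch in proto[open_idx:]:
        if ch == "(":
            depth += 1
            if depth == 1:
                continue  # skip the outermost '('
        elif ch == ")":
            depth -= 1
            if depth == 0:
                params.append("".join(current).strip())
                break
        elif ch == "," and depth == 1:
            params.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    else:
        raise ValueError(f"unbalanced parentheses in {proto!r}")

    is_variadic = bool(params) and params[-1] == "..."
    # "(void)" and "()" both mean no named parameters here.
    fixed = [p for p in params if p not in ("", "void", "...")]
    return ExternSummary(match.group(1), len(fixed), is_variadic)


for proto in ("int printf(const char *fmt, ...);", "void abort(void);"):
    print(summarize_prototype(proto))
```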
Varargs are indeed a problem. Right now we punt on them and pretend that functions like printf take 12 to 16 arguments. It's kind of ugly.
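The punt works because a variadic callee such as printf only reads as many arguments as its format string asks for; C specifies that excess arguments to the printf family are evaluated but ignored. A toy Python model of the idea, assuming the System V x86-64 convention (six integer argument registers, then stack slots). The dict-based state, `gather_punted_args`, and `toy_printf` are hypothetical stand-ins, not mcsema code, and the model ignores floating-point varargs, which travel in XMM registers with %al holding an upper bound on how many are used.

```python
# System V x86-64 integer/pointer argument registers, in order.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def gather_punted_args(state, stack, num_slots=12):
    """Read num_slots register-sized values without knowing the true arity:
    the six integer argument registers first, then 8-byte stack slots.
    `state` maps register names to values; `stack` holds the words starting
    at the first stack argument. Slots the caller never set are just
    whatever happens to be there; the callee is expected to ignore them."""
    regs = [state[r] for r in SYSV_INT_ARG_REGS[:num_slots]]
    return regs + list(stack[: num_slots - len(regs)])


def toy_printf(fmt, *args):
    """Stand-in for a variadic callee: consumes only as many arguments as the
    format needs (only %d is understood here) and ignores the rest."""
    used = fmt.count("%d")
    return fmt % tuple(args[:used])


state = {"rdi": "%d + %d = %d", "rsi": 2, "rdx": 3, "rcx": 5, "r8": 0, "r9": 0}
args = gather_punted_args(state, [0] * 8)
print(len(args), toy_printf(*args))
```

In the model the format string sits directly in rdi; in a lifted program rdi would hold a pointer to it.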
In terms of inline assembly, what do you mean? The easiest solution by far is not to model argument passing explicitly. Instead, use the existing detach mechanisms with a callback. This would sidestep the problem entirely: the callback does the usual marshalling of the reg state struct into native state.
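In mcsema the detach path is real machine-level code, but the marshalling it performs can be modelled in a few lines. A conceptual Python sketch, assuming System V x86-64 argument registers and a known argument count; `call_native_detached`, the dict-based register state, and the list-based stack are all hypothetical.

```python
from typing import Callable, Dict, List

SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def call_native_detached(state: Dict[str, int], stack: List[int],
                         native_fn: Callable[..., int], num_args: int) -> None:
    """Toy model of a detach-style external call.

    1. Marshal: copy arguments out of the lifted register state, plus stack
       words for arguments beyond the sixth, into a native call.
    2. Call the native function.
    3. Unmarshal: write the result into the lifted RAX and emulate `ret`
       by popping the return address into RIP.
    """
    reg_args = [state[r] for r in SYSV_INT_ARG_REGS[:num_args]]
    # stack[0] is the return address pushed by the lifted `call`;
    # stack arguments start right above it.
    extra = max(0, num_args - len(SYSV_INT_ARG_REGS))
    stack_args = stack[1 : 1 + extra]
    result = native_fn(*reg_args, *stack_args)
    state["rax"] = result & MASK64  # keep RAX a 64-bit value
    state["rip"] = stack.pop(0)     # emulated `ret`
    state["rsp"] += 8


state = {"rdi": 7, "rsi": 5, "rsp": 0x1000}
stack = [0x401000]
call_native_detached(state, stack, lambda a, b: a - b, 2)
print(state["rax"], hex(state["rip"]))
```

Because the callee sees an ordinary native call, nothing about its argument passing has to be modelled in the lifted code itself.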
from mcsema.
The detach mechanism is what I meant by inline assembly (I forgot that portion of the code existed).
Since the function definition list is used in get_cfg.py, I'm hesitant to suggest a bitcode file as the interchange format for this data, but LLVM function prototypes definitely seem like the way to go.
I don't think the bitcode and function-list approaches are mutually exclusive. That is, cfg_to_bc can consult whatever get_cfg.py put into the CFG proto (via the text defs), but if it has better information from a prototype in its bitcode, it could prefer that.
This would bring mcsema slightly closer to remill, insofar as remill starts from a "base" input bitcode module (in remill's case, one containing all instruction semantics) and extends it from there. In this case, cfg_to_bc could be instructed to take in a path to a bitcode file and lift into that, rather than creating a new llvm::Module.
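The "prefer the bitcode prototype, fall back to the text defs" rule is a simple lookup order. A minimal Python sketch; `ExternInfo` and `resolve_extern` are hypothetical names, and the example entries (including a punted 12-argument printf in the text defs) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExternInfo:
    """Hypothetical record describing how to call one external."""
    num_args: int
    is_variadic: bool
    source: str  # where the information came from


def resolve_extern(name: str,
                   cfg_defs: Dict[str, ExternInfo],
                   library_protos: Dict[str, ExternInfo]) -> Optional[ExternInfo]:
    """Prefer a prototype found in the base/library bitcode module; otherwise
    fall back to what get_cfg.py recorded in the CFG from the text defs."""
    if name in library_protos:
        return library_protos[name]
    return cfg_defs.get(name)


cfg_defs = {"printf": ExternInfo(12, False, "text defs"),
            "puts": ExternInfo(1, False, "text defs")}
library_protos = {"printf": ExternInfo(1, True, "bitcode")}
print(resolve_extern("printf", cfg_defs, library_protos))
```

Keeping the text defs as the fallback means binaries without a library module still lift exactly as before.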
Thoughts?
The master branch, when using --explicit_args, has a new option, --library /path/to/bitcode.bc, which takes an IR or bitcode file containing declarations of the lifted program's externals.
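Since the option accepts textual IR as well as bitcode, one way to produce such a file is to emit `declare` lines directly. A rough Python sketch, assuming per-external (name, fixed-argument count, variadic) tuples are already available; the helper names are hypothetical, and typing every argument and return value as i64 is a deliberate simplification. Declarations generated by Clang from the real headers would carry accurate pointer and floating-point types.

```python
import re
from typing import Iterable, Tuple

# Characters LLVM accepts in an unquoted global name.
_PLAIN_NAME = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def ir_declaration(name: str, num_fixed_args: int, is_variadic: bool) -> str:
    """Emit one textual LLVM IR declaration with every argument and the
    return value modelled as i64. Names outside the plain-identifier set
    (e.g. versioned symbols) are quoted; names containing quote or
    backslash characters are not escaped by this sketch."""
    symbol = name if _PLAIN_NAME.fullmatch(name) else f'"{name}"'
    params = ["i64"] * num_fixed_args
    if is_variadic:
        params.append("...")
    return f"declare i64 @{symbol}({', '.join(params)})"


def library_module(externs: Iterable[Tuple[str, int, bool]]) -> str:
    """Join declarations into the text of a .ll file."""
    return "\n".join(ir_declaration(*e) for e in externs) + "\n"


print(library_module([("printf", 1, True), ("abort", 0, False)]), end="")
```

The resulting text can be saved as a .ll file and passed via --library.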
Closing this: external functions have changed significantly, and ABI libraries (now in progress) should take care of complex argument scenarios.