falconre / falcon
Binary Analysis Framework in Rust
License: Apache License 2.0
This project looks so cool. Is there any plan to support ARM and AArch64 binaries? :)
Program received signal SIGSEGV, Segmentation fault.
0x000055555589f221 in falcon_capstone::capstone::Instr::new () at src/capstone.rs:192
192 let detail = unsafe { *instr.detail };
(gdb) bt
#0 0x000055555589f221 in falcon_capstone::capstone::Instr::new ()
at src/capstone.rs:192
#1 falcon_capstone::capstone::InstrBuf::get () at src/capstone.rs:395
#2 0x0000555555858ea7 in falcon::translator::x86::translator::translate_block
() at lib/translator/x86/translator.rs:90
#3 0x0000555555831f7f in falcon::translator::x86::{impl#3}::translate_block ()
at lib/translator/x86/mod.rs:54
#4 falcon::translator::Translator::translate_function_extended<falcon::translator::x86::Amd64> () at lib/translator/mod.rs:163
#5 0x0000555555784c49 in falcon::loader::Loader::program_verbose<falcon::loader::elf::elf_linker::ElfLinker> ()
at /home/godtex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/falcon-0.5.5/lib/loader/mod.rs:150
#6 falcon::loader::Loader::program_recursive_verbose<falcon::loader::elf::elf_linker::ElfLinker> ()
at /home/godtex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/falcon-0.5.5/lib/loader/mod.rs:198
#7 falcon::loader::Loader::program_recursive<falcon::loader::elf::elf_linker::ElfLinker> ()
at /home/godtex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/falcon-0.5.5/lib/loader/mod.rs:169
This will allow users to implement architectures separately from Falcon, and use those architectures with Falcon.
The MIPS instructions lwl, lwr, swl, and swr are complicated. They should have test cases.
When retrieving the Loader::program of cyberblogger from https://github.com/trailofbits/cb-multios (compiled with clang 8.0.1), the error Err("Index does not exist for set_entry") is returned from the line control_flow_graph.set_entry(block_indices[&function_address].0)?; at the end of translate_function_extended.
This occurs because, for the function at address 0x45008, the translate_block call in the x86 decoder returns immediately due to a CS_ERR_OK. This leaves the translation block with no instructions, which results in a bogus insertion into block_indices at block_indices.insert(*result.0, (block_entry, block_exit)); in translate_function_extended. Both block_entry and block_exit are 0 there, because nothing was inserted into the overall function CFG and the block_entry variable was never set.
My hacky fix for an empty translation block is up at https://github.com/2over12/falcon
I simply check whether a block has no instructions and, if so, add a new empty CFG to the block. There probably should not be an empty function, though. It is currently unclear to me why falcon_capstone immediately returns a CS_ERR_OK, but it is also probably worth handling an empty function in some sort of sane way.
Currently the lock prefix is ignored. There may be a better way...
The rdtsc instruction is currently translated to a nop. We may want to handle this at some point in the future.
falcon::translator::Arch::translate_function should take some sort of trait for Memory. This will allow various backings for memory to be passed to translate_function. Specifically, we can implement this trait for symbolic memory, and have executors concretize and create new functions on the fly.
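One possible shape for such a trait, as a minimal sketch. The names TranslationMemory, SparseMemory, and read_code are hypothetical and are not part of Falcon's API; read_code only stands in for a translator entry point that is generic over its memory backing.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait: anything that can hand the translator bytes.
trait TranslationMemory {
    /// Returns the byte at `address`, or `None` if it is unmapped.
    fn get_u8(&self, address: u64) -> Option<u8>;
}

/// One concrete backing: a sparse map from address to byte.
struct SparseMemory {
    bytes: BTreeMap<u64, u8>,
}

impl TranslationMemory for SparseMemory {
    fn get_u8(&self, address: u64) -> Option<u8> {
        self.bytes.get(&address).copied()
    }
}

/// Stand-in for a translator entry point. Reads up to `max` contiguous
/// bytes starting at `address`, stopping at the first unmapped byte.
fn read_code<M: TranslationMemory>(memory: &M, address: u64, max: usize) -> Vec<u8> {
    (0..max as u64)
        .map_while(|offset| memory.get_u8(address.checked_add(offset)?))
        .collect()
}
```

A symbolic memory could implement the same trait by concretizing bytes on demand, which is the use case described above.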
Would make development easier since many operations are already implemented out of the box
This project is very, very, relevant to my interests.
I see from the blog post you are using Capstone as the disassembler, which implements the ARM architecture. I'd like to take a stab at an implementation of ARM - at least for ARMv6 in Thumb mode (my current target).
I will use this issue to track an attempt at adding a module in falcon/lib/translator to map the Capstone ARM instruction API to Falcon IL.
If you've thought of this already and have ideas of where it might go wrong please chime in! I'm just familiarizing myself with the codebase now.
Need the ability to load PE binaries
This repository will include just the core falcon framework.
In-line rust documentation for:
This is falcon's tracking issue for our docs.rs documentation failing to build.
docs.rs issue is here: rust-lang/docs.rs#1351
If docs.rs doesn't work, we'll host the docs elsewhere. However, my first choice is to make docs.rs work.
Several of the translator tests are currently failing when falcon is built with the capstone4 feature.
Need the equivalent of the x86 ElfLinker for MIPS binaries. The linking process for ELF binaries can be cleaned up in general.
falcon::engine::SymbolicEngine should not save assertions if those assertions are just all constants. Only assertions over symbolic variables should be saved.
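A rough sketch of the check, under heavy assumptions: Expr is a toy stand-in for il::Expression, and Engine and add_assertion are hypothetical names rather than SymbolicEngine's real interface.

```rust
/// Toy expression type; a stand-in for Falcon's il::Expression.
enum Expr {
    Constant(u64),
    Variable(String),
    Add(Box<Expr>, Box<Expr>),
    CmpEq(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// True when no symbolic variable appears anywhere in the expression.
    fn is_all_constants(&self) -> bool {
        match self {
            Expr::Constant(_) => true,
            Expr::Variable(_) => false,
            Expr::Add(lhs, rhs) | Expr::CmpEq(lhs, rhs) => {
                lhs.is_all_constants() && rhs.is_all_constants()
            }
        }
    }
}

/// Hypothetical engine that only keeps a list of assertions.
struct Engine {
    assertions: Vec<Expr>,
}

impl Engine {
    /// Saves the assertion only if it mentions a symbolic variable.
    /// Returns whether the assertion was saved.
    fn add_assertion(&mut self, assertion: Expr) -> bool {
        if assertion.is_all_constants() {
            return false;
        }
        self.assertions.push(assertion);
        true
    }
}
```

A real engine would presumably still want to evaluate a constant-only assertion once, since one that evaluates to false makes the state unsatisfiable; this sketch only shows the filtering.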
il::control_flow_graph::merge merges adjacent blocks together. It's primarily used immediately after a function is translated. It's slow because of poor logic, and that logic needs to be improved.
Symbolic memory should be paged, with reference counted pages. Forking state will call clone() over these pages, and be exceptionally fast. Writing to a page will call Rc::make_mut, producing a copy at the time of the write.
This should greatly reduce symbolic fork times, and greatly reduce memory usage.
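The copy-on-write behaviour can be sketched as follows. This is a simplification, not Falcon's memory model: pages hold concrete bytes rather than symbolic expressions, and PagedMemory, Page, and PAGE_SIZE are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

const PAGE_SIZE: u64 = 4096;

#[derive(Clone)]
struct Page {
    bytes: Vec<u8>,
}

/// Cloning this struct clones only the map of Rc handles, not the pages.
#[derive(Clone, Default)]
struct PagedMemory {
    pages: BTreeMap<u64, Rc<Page>>,
}

impl PagedMemory {
    fn store(&mut self, address: u64, value: u8) {
        let page = self
            .pages
            .entry(address / PAGE_SIZE)
            .or_insert_with(|| Rc::new(Page { bytes: vec![0; PAGE_SIZE as usize] }));
        // make_mut copies the page only if another state still shares it.
        Rc::make_mut(page).bytes[(address % PAGE_SIZE) as usize] = value;
    }

    fn load(&self, address: u64) -> Option<u8> {
        self.pages
            .get(&(address / PAGE_SIZE))
            .map(|page| page.bytes[(address % PAGE_SIZE) as usize])
    }
}
```

After a fork, a write to one page in the child copies that page alone; every other page stays shared with the parent.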
Allow deserializing and serializing results of constants analysis with Serde. This will allow analysis to be conducted once and then saved.
We already have 32-bit x86 over Capstone. Port that lifter to 64-bit x86.
Wanted to try this out (nice work btw!)
Do you have a simple cli program that takes a binary as input and outputs whatever?
If not, I'd suggest adding a simple prototype/reference program in lib/bin/main.rs so it could be easy to try out, or perhaps something in examples/
Keep up the good work!
Currently il::Constant only handles values up to 64-bit. This 64-bit restriction may be enforced elsewhere through the code base. Falcon should handle values up to an arbitrary number of bits.
Hello!
So I've been reading the project, really great work and I'm really excited for some of the stuff you're doing, can't wait to see more, no matter what gets decided!
On that note, as you guessed from the title, I'm hoping it might be possible to consolidate 0-N things from this, panopticon, and a theoretical new memory interval crate that I want to write, as well as some other things.
This is a huge, huge topic, and I likely won't hit on a lot of the points, but just getting the ball rolling is good I think, if only to see if you're interested, where you're headed with things, etc.
If you're not interested at all, that is totally fine of course :) Just wanted to see what you think
So, for starters (and probably most controversially), reading through your source, particularly the il module, there is so much that I think could be refactored (along with panopticon) into a generic function/il rust crate.
I say controversial because it will likely be hard/tedious, but i do think it would be (extremely) beneficial.
It would also require probably the deepest amount of coordination, which could be hard.
Nevertheless, I think some prime candidates are the il, and the function objects. If we could somehow make Function<IL>, where IL is the intermediate language used, this could have really really cool benefits.
It's hard for me to state how great this could be if we were able to swap out ILs at will. It also just seems right from an engineering perspective, similar to backends on a compiler.
As it stands now in both codebases, I think this modification is almost trivially possible - except - the disassembler aspect.
But this isn't necessarily bad news!
For almost the exact reasons in 1-3, I think it would be really cool to allow function (or whatever it ends up being) to also be generic in the disassembler, allowing a more robust disassembler implementation (like capstone), or a home grown solution like panopticon, etc.
Again, the benefit here is experimentation: we can try different disassemblers with different IR backends, etc.
Doing this I think will require sketching out what a generic function + a generic disassembler would look like, and what would be the most flexible, and hence requires the most cooperation and assessment of current codebases dependencies and expectations etc., but long term I think it would be really cool, it would allow all our work to be pooled together and hence we'd all benefit.
While I think this will be the hardest part to refactor, coordinate and get right, I think it will actually have the most benefit; of course, this is just my opinion though :)
I don't think this is controversial at all, and I think it would be invaluable. I want something like this already for bingrep, panopticon needs it, and i'm sure falcon could use it too.
Basically the idea is a:
[x..y) -> Value
Which is a data structure that's created after the parser pass (or whenever you want, as long as you can send it a goblin binary), and which initially gets filled up with segment/section data: which ranges, what the name of the segment is, and perhaps what "kind of data" is there. We'd figure out what we want for a segment datatype, what information we'd need, etc. And of course, if it's a central crate, when we need something new, we just extend it and everyone gets the benefit.
Similarly, and this would be the tricky part and where I want feedback, downstream users could also extend the memory ranges with their own tagging data, like [0xbeef..0xdead) -> FunctionRange, etc.
Even if some fancy runtime extendable type doesn't work out, even if we just agree on an enum in this crate which downstream clients use, I think this would be great code reuse and benefit everyone all around.
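To make the [x..y) -> Value idea concrete, here is a small sketch of the "agree on an enum" variant. RangeMap and Tag are hypothetical names, the map rejects overlapping ranges for simplicity, and nothing here reflects an existing crate's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag type that downstream users would agree on.
#[derive(Debug, PartialEq)]
enum Tag {
    Segment(String),
    FunctionRange,
}

/// Maps half-open address ranges [start, end) to a value.
struct RangeMap<V> {
    // Keyed by range start; each entry stores (end, value).
    ranges: BTreeMap<u64, (u64, V)>,
}

impl<V> RangeMap<V> {
    fn new() -> Self {
        RangeMap { ranges: BTreeMap::new() }
    }

    /// Inserts [start, end) -> value. Returns false, inserting nothing,
    /// if the range is empty or overlaps an existing range.
    fn insert(&mut self, start: u64, end: u64, value: V) -> bool {
        if start >= end {
            return false;
        }
        // Only the last range starting before `end` can overlap.
        if let Some((_, (existing_end, _))) = self.ranges.range(..end).next_back() {
            if *existing_end > start {
                return false;
            }
        }
        self.ranges.insert(start, (end, value));
        true
    }

    /// Looks up the value whose range contains `address`.
    fn get(&self, address: u64) -> Option<&V> {
        self.ranges
            .range(..=address)
            .next_back()
            .filter(|(_, (end, _))| address < *end)
            .map(|(_, (_, value))| value)
    }
}
```

A runtime-extensible design would swap the enum for a trait object or a generic parameter; the lookup logic would stay the same.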
Your loader looks really awesome!!! So I've been trying to get other people to help create a relocator crate for a while, but no one is really interested in this stuff.
Anyway, at some future date I think panopticon wants to have this. So I've wanted to turn https://github.com/m4b/dryad into a library for quite some time. Basically I like working on that project and I'll find any excuse; also all that code going to waste would be sad.
So I'd like to propose potentially fusing falcon's runtime loader here with dryad, or vice versa, perhaps dryad becomes a lib, or i rip out parts of it via copy paste, whatever, and then that crate is refactored to be a library which downstream consumers like falcon and panopticon (and whoever really, who knows the applications!) can use it as their runtime linking and loading system.
The initial issue is that I'll have to put the asm usage in dryad and bare functions behind feature flags, as it requires nightly, and it's not nice to force that on downstream clients (which would be sad, since it's a pure rust toolchain dynamic linker that way!)
Anyway, that's my suggestion for 3 different things I think are candidate usecases to refactor out into shared dependencies for great good. I'm sure there are many other opportunities as well.
Let me know what you're thinking; as you can tell, I'm of the persuasion we should combine all of our powers and take over the universe.
Thanks for reading this far, I know, it was a lot :)
/cc @flanfly
This issue exists as a place to discuss IL changes for Falcon 0.5.
Constant - No change
Scalar
Expression - No change
Operation - il::Operation::Conditional(Box<il::Operation>), which allows for a conditionally executed instruction.
Instruction - No Change
Block - No Change
Edge - No Change
ControlFlowGraph - No Change
Function - No Change
Program - No Change
Other Ideas:
We can create an operation, il::Operation::Delay(usize, Box<Operation>), but allowing for operations with arbitrary delays makes the implementation of analyses and control-flow recovery more difficult. We would need to create some sort of "Executor," which could be used by analyses, and which kept track of operations in delay slots/pipelines.
I am less sure how to incorporate this. Currently, il::Instruction corresponds nicely to a single instruction. In some architectures, however, multiple instructions can be executed simultaneously. We can either lift these addresses to an il::Instruction, and at the il::Instruction level mark whether the instruction is parallel or not, or we can create an il::Operation::Parallel(Vec<il::Operation>). I'm again worried about creating an il::Operation::Parallel(Vec<il::Operation>), because it may make implementing analyses more challenging. We would need to integrate this into an "Executor" of sorts which managed all of this for us.
Often times we Nop out instructions we aren't concerned with. Specific to this conversation is the NOPing of branch instructions. People would like to retain this information in an optional fashion.
There are a couple ways to do this.
A Placeholder Operation, which is another operation people need to consider while doing analyses.
A placeholder: Option<il::Operation> field for Nops. This becomes a "bonus add-on" modification that should not affect anything as is.
Serialization of il, beginning with Program, Function, Block, Instruction, and Operation. We can go with either json or, as I am going to recommend, bincode. Json is nice because it allows the entire IL to still be easily serializable. Bincode is nice because anything that is Rust can be encoded with bincode (json only allows things which can be converted to strings as map indices).
The num_bigint crate is slow. Due to requirements in handling amd64 instructions, Falcon moved to a big integer library to support operands > 64 bits in width. This will also be required for SSE/AVX instructions in the future.
However, because il::Constant is now only backed by num_bigint::BigUint, this incurs unacceptable slowdowns during operations such as lifting entire binaries. il::Constant requires more sophisticated logic to back operations over faster u64-native operations when appropriate.
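One way the fast-path idea could look, sketched with the standard library only. Value is a hypothetical type; the Vec<u64> limbs stand in for num_bigint::BigUint, and the sketch ignores the bit width that il::Constant carries, so sums are not truncated.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    /// Fits in a machine word: the fast path.
    Small(u64),
    /// Little-endian 64-bit limbs; a stand-in for num_bigint::BigUint.
    Big(Vec<u64>),
}

impl Value {
    fn limbs(&self) -> Vec<u64> {
        match self {
            Value::Small(value) => vec![*value],
            Value::Big(limbs) => limbs.clone(),
        }
    }

    /// Collapses back to Small whenever the value fits in 64 bits.
    fn normalize(mut limbs: Vec<u64>) -> Value {
        while limbs.len() > 1 && limbs.last() == Some(&0) {
            limbs.pop();
        }
        match limbs.len() {
            0 => Value::Small(0),
            1 => Value::Small(limbs[0]),
            _ => Value::Big(limbs),
        }
    }

    fn add(&self, other: &Value) -> Value {
        // Fast path: a native add when both operands fit and do not overflow.
        if let (Value::Small(a), Value::Small(b)) = (self, other) {
            if let Some(sum) = a.checked_add(*b) {
                return Value::Small(sum);
            }
        }
        // Slow path: limb-wise addition with carry propagation.
        let (a, b) = (self.limbs(), other.limbs());
        let length = a.len().max(b.len());
        let mut out = Vec::with_capacity(length + 1);
        let mut carry = false;
        for i in 0..length {
            let x = a.get(i).copied().unwrap_or(0);
            let y = b.get(i).copied().unwrap_or(0);
            let (partial, carry_a) = x.overflowing_add(y);
            let (sum, carry_b) = partial.overflowing_add(carry as u64);
            out.push(sum);
            carry = carry_a || carry_b;
        }
        if carry {
            out.push(1);
        }
        Value::normalize(out)
    }
}
```

The point of the design is that most constants seen while lifting fit in 64 bits, so they would never touch the slow path.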
&Option<T> is not as ergonomic as Option<&T>, which is usually just a call to as_ref() away.
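For illustration, a getter in both shapes. The Instruction struct and its comment field here are hypothetical and only serve to show the two signatures.

```rust
/// Hypothetical struct with one optional field.
struct Instruction {
    comment: Option<String>,
}

impl Instruction {
    /// Less ergonomic: callers get a reference to the Option itself.
    fn comment_ref(&self) -> &Option<String> {
        &self.comment
    }

    /// Preferred shape: an optional reference to the contents.
    fn comment(&self) -> Option<&String> {
        self.comment.as_ref()
    }
}

/// With Option<&T>, combinators chain directly on the returned value.
fn comment_len(instruction: &Instruction) -> usize {
    instruction.comment().map(|comment| comment.len()).unwrap_or(0)
}
```

Callers holding the &Option<String> form can still reach the second form by calling as_ref() on it.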
falcon::engine is probably not the best place for expr_to_smtlib2
Falcon needs a good-and-proper Error enum. This will touch almost all of the codebase.
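As a starting point for discussion, a hand-rolled enum using only the standard library might look like this. The variant names are invented for illustration and do not reflect Falcon's actual error cases.

```rust
use std::fmt;

/// Hypothetical error enum; the variants are placeholders.
#[derive(Debug)]
enum Error {
    Io(std::io::Error),
    UnsupportedArchitecture(String),
    GraphIndexNotFound(usize),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(error) => write!(f, "io error: {}", error),
            Error::UnsupportedArchitecture(name) => {
                write!(f, "unsupported architecture: {}", name)
            }
            Error::GraphIndexNotFound(index) => {
                write!(f, "graph index {} does not exist", index)
            }
        }
    }
}

impl std::error::Error for Error {
    /// Exposes the wrapped error, if any, to callers walking the chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::Io(error) => Some(error),
            _ => None,
        }
    }
}

/// Lets `?` convert io errors into this enum automatically.
impl From<std::io::Error> for Error {
    fn from(error: std::io::Error) -> Self {
        Error::Io(error)
    }
}
```

Callers can then match on a variant instead of comparing error strings.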
This is a very heavy lift, and is being forever/continually pushed to the right.
Re: http://reversing.io/posts/the-il-nop/
This is about the most insightful write-up on different IRs I have ever read. After the ramblings of humble me of course ;-).
I can't claim that I have actually used many different IRs, but I've had enough of LLVM IR, and just looking at the rest of the crowd was enough too. When the time came to wave LLVM IR bye-bye, I figured that readability and familiarity should be the properties of the utmost importance for the IR (all the other properties, like expressiveness and preciseness of semantics, are obvious). That is how PseudoC came about: https://github.com/pfalcon/ScratchABlock/blob/master/docs/PseudoC-spec.md
And Falcon IR gets quite a section in an IR zoo I maintain: https://github.com/pfalcon/ScratchABlock/blob/master/docs/ir-why-not.md