Comments (18)
Yesterday I finished a commit that implements this proposal. Hopefully it will be public soon. The second reference to get_this_ptr_for_call() is in a place where the original code can still be called. This turned out not to be the primary source of the memory problem, but when the new code is public, memory consumption should be much improved.
from pharos.
We're still establishing "normal" memory expectations as well. I currently have a test running with 990,975 facts, and I've been in the Prolog phase for a couple of days at 12.5GB. It's still making steady progress, although I (re)started it using Prolog from the command-line, so that may not accurately reflect the usual C++ mode of operation. Obviously, memory consumption varies according to the specific program being analyzed.
We recently observed that there's a very large number of factNOTMergeClasses(A, B) facts related to the worst of our Prolog-phase memory problems. It's possible to end up with a significant fraction of N^2 facts, where N is the number of methods. Unfortunately, we're not sure this can be prevented, but we've been trying.
As for the sudden growth of memory right at the end, that just came to my attention a couple of days ago. Perhaps, as in your case, the improvements in other areas made this defect more obvious. It appears to be caused by the "final" rules in Prolog, and we're investigating, since that shouldn't happen.
Overall, about the most detailed answer I can provide is that things should be several times better than before but still not great. Some of the problems are becoming much more difficult to "fix".
I've reported this upstream to the ROSE developers, and there's some discussion occurring about the problem. This is a non-trivial change, so it probably won't appear any time soon, but there is some optimism that the issue can be addressed eventually.
There are some additional details about these options in the man pages. If you've already read those, your input program might be too large or your system might be too small. ;-) For large "real" programs, we've had to use as much as 256GB of RAM, which is obviously not great. Reducing the memory footprint of OOAnalyzer remains an important goal for us, and we continue to work on it.
As for the limits not working, one possibility is that there has been a lot of "confusion" over what the values returned by getrusage() are. If you're not on Ubuntu or RedHat, it's possible that we have a "memory unit conversion" problem. That has usually been the problem for us when the limits didn't work as intended (though that's not happening for us now). See: https://github.com/cmu-sei/pharos/blob/master/libpharos/limit.cpp#L179 Some tests on your system that occasionally print the values may reveal the bug.
My suggestion would be to start with a smaller executable, and build confidence in how much RAM is required for given input programs to have a better feel for what can and cannot be accomplished. Our original usage scenario was largely focused on malware, which historically has not involved large programs (hundreds of classes and many thousands of methods). We have gotten OOAnalyzer to work on such programs, but not without a lot of RAM and CPU.
> As for the limits not working, one possibility is that there has been a lot of "confusion" over what the values returned by getrusage() are.

I'm not sure ru_maxrss is the right value to use either. It only considers physical memory, but swap space can let you go much further (and works well for this tool).
I've left this issue open because reducing the memory consumption is an important priority for us. I looked at the problem some more today, and I think the problem is in get_this_ptr_for_call() here:
https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L44
The state being used in that call probably has a ton of memory associated with it. It is in turn used (only) from here:
https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L483
As the comment a few lines above remarks, that is called from the finish() pass of OOAnalyzer, after all of the functions have been analyzed. As a result, none of the call states can be freed until all of the functions have been analyzed; in fact, none of them are being freed "early" at all, because there's not much point given the current design.
But what we could do is compute the values from get_this_ptr_for_call() for each call during the per-function analysis pass, cache just the this_ptr answer from calling get_this_ptr_for_call(), then free the rest of the state that we got from get_output_state() by calling set_output_state() to set the pointer to Sawyer::Nothing. This probably frees a lot of memory, because most of the state is held through reference-counted smart pointers. Then the final pass in find_passed_func_offsets() would need to get the values we cached earlier and complete the analysis without referencing the state.
I know roughly what to do, and plan on doing so eventually, but I may not be able to get to it for a few more days. I figured that, having reviewed the details of the problem, I should write them down in case you're more motivated than I am to try and address it in the meantime. :-)
Also regarding the comment on ru_maxrss and swap "working well", that's probably because other than needing to visit the "call state" once more for each call to fetch the this pointer out of it, we're not really accessing any of that state memory ever again. So most of the RAM gets sent to swap, and then never gets paged back in again. In our original design when we just wanted it to work at all, we didn't attempt to free much of anything that we thought might be useful later on. :-(
> Also regarding the comment on ru_maxrss and swap "working well", that's probably because other than needing to visit the "call state" once more for each call to fetch the this pointer out of it, we're not really accessing any of that state memory ever again. So most of the RAM gets sent to swap, and then never gets paged back in again.
Exactly 😄
> I've left this issue open because reducing the memory consumption is an important priority for us. I looked at the problem some more today, and I think the problem is in get_this_ptr_for_call() here:
The partitioning phase also leaves some memory behind. If you kill the process and re-run it using the serialized data, memory consumption is down by a few GBs.
> I've left this issue open because reducing the memory consumption is an important priority for us. I looked at the problem some more today, and I think the problem is in get_this_ptr_for_call() here:
> https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L44
> The state being used in that call probably has a ton of memory associated with it. It is in turn used (only) from here:
> https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L483
Looks like it's also used here:
Line 371 in 8789a3a
Not to draw needless attention to our shameful bug, but here was the real fix:
https://github.com/cmu-sei/pharos/blob/master/libpharos/funcs.cpp#L1058
I think the change I proposed and implemented in this issue saved another 10-15% of RAM on top of that and worked roughly as I expected, though there were some complications, resulting in the other changes in usage.cpp, because of an .at() call on a key that wasn't found.
I'm going to close this issue now. If there are other concerns about memory consumption, we should probably open a new issue.
What is the expected memory consumption now depending on the exe size / number of facts?
Still seeing lots of GBs held onto when entering the Prolog stage, even though the 250,000 exported facts are only 15MB in text form. And after finishing, it allocates lots of memory again, just to build up the JSON tree?
Memory limits and usage in Prolog have been greatly improved. Re-open if still an issue.
Btw., it might be worth providing a hint somewhere that a reasonably sized application will require multiple hundreds of GB of RAM, so people don't waste a few hours figuring that out on their own. 😄
> Btw. might be worth providing some hint somewhere, that some reasonably sized application will require multiple hundreds GB of RAM, so people don't waste a few hours to figure that out on their own.
@eXpl0it3r I am not aware of any application that should require that much RAM. Can you file a new issue for this?
Maybe I was a bit too vague with my description, but I tested a 40MB executable, and ooanalyzer froze the Docker container after hitting its 14GB RAM limit, and quit when I set --maximum-memory, all before it had even reached 30% of PRT2[MARCH]. So a linear extrapolation would put that at over 256GB of RAM required...
If this is not expected behavior, I'm happy to open a new issue. 🙂
Sadly, for a 40MB input executable, it is quite likely that multiple (many?) gigabytes of RAM are required. This issue was discussed once before here:
where I provided some more explanation. For a 17MB input executable, the problem was bad. For a 40MB input executable, it will be even worse. Unfortunately, there's not much we can do about the issue without some additional assistance from ROSE developers, since the change from an all-in-memory approach to one where instruction disassemblies are discarded during partitioning is a non-trivial change. In some ways, this is a standard memory versus run-time performance trade off. It may seem like a simple problem, but because any instruction found later during disassembly can jump into any earlier instruction or basic block, there's no simple algorithm to determine when you're "done" with an instruction or basic block. As a result, something more sophisticated like a least recently used (LRU) cache for automatically discarding instructions (and possibly disassembling them again later) is required.
As mentioned in the previous ticket, each 3-4 byte instruction becomes a fairly large and detailed collection of instruction and operand objects. Hopefully that explains why so much memory is required. Some users have had good success by just providing lots of swap memory, since most of the instructions aren't referenced very frequently, and once pages are pushed to disk, they're probably not brought back in too often. We'll talk to the ROSE developers to discuss what changes might be possible.
Thanks for the fast and detailed response. I might look into the swap memory option. So far been more just testing the waters, so there's no time pressure from my side. 👍
My recommendation is to start with some smallish (<100 kilobyte) object oriented executables. OOAnalyzer is a complicated tool, and it's best to understand the entire pipeline from start to finish on small executables before trying big ones. Big executables require lots of "tricks" discussed in the various Github issues, such as partitioner serialization, disabling semantics during partitioning, using the stock ROSE partitioner, using multi-threading, and separating the C++ analysis phase from the Prolog analysis phase (most recently using beta SWI Prolog scripts) in order to get useful results. Not all of these tricks are strictly required, but I wouldn't personally attempt to analyze a 40MB executable without most of them. We're constantly working to improve the process, and some of these features may be automatic in future releases. Frankly, people are using (attempting to use?) OOAnalyzer on bigger and bigger programs all the time, which is uncovering new problems and gradually making the tool better as well.