sei-ccohen commented on August 13, 2024

Yesterday I finished a commit that implemented this proposal. Hopefully it will be public soon. The second reference to get_this_ptr_for_call() is in a place where the original code can still be called. This turned out not to be the primary source of the memory problem, but when the new code is public, memory consumption should be much improved.

from pharos.

sei-ccohen commented on August 13, 2024

We're still establishing "normal" memory expectations as well. I currently have a test running with 990,975 facts, and I've been in the Prolog phase for a couple of days at 12.5GB. It's still making steady progress, although I (re)started it using Prolog from the command-line, so that may not accurately reflect the usual C++ mode of operation. Obviously, memory consumption varies according to the specific program being analyzed.

We recently observed that there's a very large number of factNOTMergeClasses(A, B) facts related to the worst of our Prolog phase memory problems. It's possible to end up with some significant fraction of N^2 facts, where N is the number of methods. Unfortunately, we're not sure this can be prevented, but we've been trying.

As for the sudden growth of memory right at the end, that just came to my attention a couple of days ago. Perhaps, like you, the improvements in other areas made this defect more obvious. It appears to be caused by the "final" rules in Prolog, and we're investigating, since that shouldn't happen.

Overall, about the most detailed answer I can provide is that things should be several times better than before but still not great. Some of the problems are becoming much more difficult to "fix".

sei-ccohen commented on August 13, 2024

I've reported this upstream to the ROSE developers, and there's some discussion occurring about the problem. This is a non-trivial change, so it probably won't appear any time soon, but there is some optimism that the issue can be addressed eventually.

sei-ccohen commented on August 13, 2024

There are some additional details about these options in the man pages. If you've already read those, your input program might be too large or your system might be too small. ;-) For large "real" programs, we've had to use as much as 256GB of RAM, which is obviously not great. Memory usage reduction is an important goal for us, and we continue to work on reducing the memory footprint of OOAnalyzer.

As for the limits not working, one possibility is that there has been a lot of "confusion" over what the values returned by getrusage() are. If you're not on Ubuntu or RedHat, it's possible that we have a "memory unit conversion" problem. That has usually been the problem for us when the limits didn't work as intended (which is not happening for us now). See: https://github.com/cmu-sei/pharos/blob/master/libpharos/limit.cpp#L179 Some tests on your system that occasionally print the values may reveal the bug.

My suggestion would be to start with a smaller executable, and build confidence in how much RAM is required for given input programs to have a better feel for what can and cannot be accomplished. Our original usage scenario was largely focused on malware, which historically has not involved large programs (hundreds of classes and many thousands of methods). We have gotten OOAnalyzer to work on such programs, but not without a lot of RAM and CPU.

Trass3r commented on August 13, 2024

As for the limits not working, one possibility is that there has been a lot of "confusion" over what the values returned by getrusage() are.

I'm not sure ru_maxrss is the right value to use either. It only considers physical memory but swap space can let you go much further (and works well for this tool).

sei-ccohen commented on August 13, 2024

I've left this issue open because reducing the memory consumption is an important priority for us. I looked at the problem some more today, and I think the problem is in get_this_ptr_for_call() here:

https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L44

The state being used in that call probably has a ton of memory associated with it. It is in turn used (only) from here:

https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L483

As the comment a few lines above remarks, that is called from the finish() pass of OOAnalyzer, i.e. after all of the functions have been analyzed. As a result, none of the call states can be freed until all of the functions have been analyzed, and in fact none of them are freed "early" at all, because there's not much point given the current design.

But what we could do is compute the values from get_this_ptr_for_call() for each call during the per-function analysis pass, cache just the this_ptr answer, then free the rest of the state obtained from get_output_state() by calling set_output_state() to set the pointer to Sawyer::Nothing. This probably frees a lot of memory, because most of the state is held by reference-counted smart pointers. The final pass in find_passed_func_offsets() would then read the values we cached earlier and complete the analysis without referencing the state.

I know roughly what to do, and plan on doing so eventually, but I may not be able to get to it for a few more days. I figured that, now that I had reviewed the details of the problem, I should write them down in case you're more motivated than I am to try to address it in the meantime. :-)

sei-ccohen commented on August 13, 2024

Also regarding the comment on ru_maxrss and swap "working well", that's probably because other than needing to visit the "call state" once more for each call to fetch the this pointer out of it, we're not really accessing any of that state memory ever again. So most of the RAM gets sent to swap, and then never gets paged back in again. In our original design when we just wanted it to work at all, we didn't attempt to free much of anything that we thought might be useful later on. :-(

Trass3r commented on August 13, 2024

Also regarding the comment on ru_maxrss and swap "working well", that's probably because other than needing to visit the "call state" once more for each call to fetch the this pointer out of it, we're not really accessing any of that state memory ever again. So most of the RAM gets sent to swap, and then never gets paged back in again.

Exactly 😄

I've left this issue open because reducing the memory consumption is an important priority for us. I looked at the problem some more today, and I think the problem is in get_this_ptr_for_call() here:

The partitioning phase also leaves some memory behind. If you kill the process and re-run it using the serialized data, memory consumption is down a few GBs.

Trass3r commented on August 13, 2024

I've left this issue open because reducing the memory consumption is an important priority for us. I looked at the problem some more today, and I think the problem is in get_this_ptr_for_call() here:

https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L44

The state being used in that call probably has a ton of memory associated with it. It is in turn used (only) from here:

https://github.com/cmu-sei/pharos/blob/master/libpharos/method.cpp#L483

Looks like it's also used here:

SymbolicValuePtr this_ptr = get_this_ptr_for_call(cd);

sei-ccohen commented on August 13, 2024

Not to draw needless attention to our shameful bug, but here was the real fix:

https://github.com/cmu-sei/pharos/blob/master/libpharos/funcs.cpp#L1058

I think the change I proposed and implemented in this issue saved another 10-15% of RAM on top of that, and it worked roughly as I expected, but there were some complications, resulting in the other changes in usage.cpp, because of an .at() call whose key wasn't found.

I'm going to close this issue now. If there are other concerns about memory consumption, we should probably open a new issue.

Trass3r commented on August 13, 2024

What is the expected memory consumption now depending on the exe size / number of facts?
I'm still seeing many GBs held onto when entering the Prolog stage, even though the 250,000 exported facts are only 15MB in text form. And after finishing, it allocates lots of memory again, just to build up the JSON tree?

sei-eschwartz commented on August 13, 2024

Memory limits and usage in Prolog have been greatly improved. Re-open if still an issue.

eXpl0it3r commented on August 13, 2024

Btw. it might be worth providing a hint somewhere that some reasonably sized applications will require multiple hundreds of GB of RAM, so people don't waste a few hours figuring that out on their own. 😄

sei-eschwartz commented on August 13, 2024

Btw. it might be worth providing a hint somewhere that some reasonably sized applications will require multiple hundreds of GB of RAM, so people don't waste a few hours figuring that out on their own.

@eXpl0it3r I am not aware of any application that should require that much RAM. Can you file a new issue for this?

eXpl0it3r commented on August 13, 2024

Maybe I was a bit too vague with my description, but I tested a 40MB executable, and ooanalyzer froze the Docker container after hitting its 14GB RAM limit, and quit when I set --maximum-memory, all before it had even reached 30% of PRT2[MARCH]. So a linear extrapolation would put it at over 256GB of RAM required...

If this is not expected behavior, I'm happy to open a new issue. 🙂

cfcohen commented on August 13, 2024

Sadly, for a 40MB input executable, it is quite likely that multiple (many?) gigabytes of RAM are required. This issue was discussed once before here:

#78 (comment)

where I provided some more explanation. For a 17MB input executable, the problem was bad. For a 40MB input executable, it will be even worse. Unfortunately, there's not much we can do about the issue without some additional assistance from ROSE developers, since the change from an all-in-memory approach to one where instruction disassemblies are discarded during partitioning is a non-trivial change. In some ways, this is a standard memory versus run-time performance trade off. It may seem like a simple problem, but because any instruction found later during disassembly can jump into any earlier instruction or basic block, there's no simple algorithm to determine when you're "done" with an instruction or basic block. As a result, something more sophisticated like a least recently used (LRU) cache for automatically discarding instructions (and possibly disassembling them again later) is required.

As mentioned in the previous ticket, each 3-4 byte instruction becomes a fairly large and detailed collection of instruction and operand objects. Hopefully that explains why so much memory is required. Some users have had good success by just providing lots of swap memory, since most of the instructions aren't referenced very frequently, and once pages are pushed to disk, they're probably not brought back too often. We'll talk to the ROSE developers to discuss what changes might be possible.

eXpl0it3r commented on August 13, 2024

Thanks for the fast and detailed response. I might look into the swap memory option. So far I've just been testing the waters, so there's no time pressure from my side. 👍

cfcohen commented on August 13, 2024

My recommendation is to start with some smallish (<100 kilobyte) object oriented executables. OOAnalyzer is a complicated tool, and it's best to understand the entire pipeline from start to finish on small executables before trying big ones. Big executables require lots of "tricks" discussed in the various Github issues, such as partitioner serialization, disabling semantics during partitioning, using the stock ROSE partitioner, using multi-threading, and separating the C++ analysis phase from the Prolog analysis phase (most recently using beta SWI Prolog scripts) in order to get useful results. Not all of these tricks are strictly required, but I wouldn't personally attempt to analyze a 40MB executable without most of them. We're constantly working to improve the process, and some of these features may be automatic in future releases. Frankly, people are using (attempting to use?) OOAnalyzer on bigger and bigger programs all the time, which is uncovering new problems and gradually making the tool better as well.
