
Comments (13)

marirs commented on June 25, 2024

Yes, that's the MongoDB document size limit. If the document crosses the 16MB limit, Mongo cannot save it. So the next step Cuckoo takes is to see if it can delete some key and then attempt the save again, but with no luck here. So that particular analysis will not be saved into Mongo. If the JSON report was enabled, you should still have report.json inside storage/analysis//reports, but it won't be displayed in the UI.

This happens sometimes when you have a lot of reporting data, which then exceeds the Mongo document size limit.
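For context, MongoDB enforces a hard 16MB limit on a single BSON document. A minimal sketch (not CAPE code; the report dict is just a stand-in) of checking a result set against that limit with pymongo's bson module before attempting the save:

    import bson  # ships with pymongo

    MAX_BSON_SIZE = 16 * 1024 * 1024  # MongoDB's per-document limit

    report = {"target": "sample.exe", "procdump": []}  # stand-in for the analysis results

    try:
        encoded_size = len(bson.encode(report))       # PyMongo >= 3.9
    except AttributeError:
        encoded_size = len(bson.BSON.encode(report))  # older PyMongo releases

    if encoded_size >= MAX_BSON_SIZE:
        print("Report of %d bytes will not fit in a single MongoDB document" % encoded_size)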

To counter this somewhat, compressing the results was the solution.

If you have pulled the latest from Kevin's repo, you can enable compressresults in reporting.conf (a sketch of the relevant entry is below), restart Cuckoo, and try that sample again.
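A sketch of the kind of reporting.conf entry meant here; the section name is assumed to mirror the compressresults.py module, so check your own reporting.conf for the exact name used by your version:

    [compressresults]
    enabled = yes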

Let us know how that goes :)

enzok commented on June 25, 2024

This already has compressresults enabled. I'm just curious why the delete failed. Could the delete failure be handled more gracefully, so that just the offending results key is omitted instead of the whole save failing?

kevoreilly commented on June 25, 2024

Hi enzok, I agree this failure should be handled more gracefully. I'll try and work out a way to do this - if you can share a sample hash please do.

enzok commented on June 25, 2024

I modified mongodb.py with this code to remedy the issue (starting at ~ line 182):

    try:
        self.db.analysis.save(report)
    except InvalidDocument as e:
        parent_key, psize = self.debug_dict_size(report)[0]
        if not self.options.get("fix_large_docs", False):
            # Just log the error and problem keys
            log.error(str(e))
            log.error("Largest parent key: %s (%d MB)" % (parent_key, int(psize) / 1048576))
        else:
            # Delete the problem keys and check for more
            error_saved = True
            while error_saved:
                if type(report) == list:
                    report = report[0]

                try:
                    if type(report[parent_key]) == list:
                        for j, parent_dict in enumerate(report[parent_key]):
                            child_key, csize = self.debug_dict_size(parent_dict)[0]
                            del report[parent_key][j][child_key]
                            log.warn("results['%s']['%s'] deleted due to >16MB" % (parent_key, child_key))
                    else:
                        child_key, csize = self.debug_dict_size(report[parent_key])[0]
                        del report[parent_key][child_key]
                        log.warn("results['%s']['%s'] deleted due to >16MB" % (parent_key, child_key))

                    try:
                        self.db.analysis.save(report)
                        error_saved = False
                    except InvalidDocument as e:
                        parent_key, psize = self.debug_dict_size(report)[0]
                        log.error(str(e))
                        log.error("Largest parent key: %s (%d MB)" % (parent_key, int(psize) / 1048576))
                except Exception as e:
                    log.error("Failed to delete child key: %s" % str(e))
                    error_saved = False

    self.conn.close()
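For readers unfamiliar with the helper used above: debug_dict_size is taken here to return the top-level keys of a dict ranked by serialized size, largest first, which is why indexing with [0] picks the biggest offender. A rough sketch of that assumed behaviour (the actual method in mongodb.py may differ):

    import json

    def debug_dict_size(dct):
        # In mongodb.py this is a method (self.debug_dict_size); shown standalone here.
        if isinstance(dct, list):
            dct = dct[0]
        # (key, approximate serialized size in bytes), largest first
        sizes = [(key, len(json.dumps(value, default=str))) for key, value in dct.items()]
        return sorted(sizes, key=lambda item: item[1], reverse=True)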

Correct me if I'm wrong, but I don't believe that procdump results are being compressed. I think when there are too many yara strings, the results grow too large.

kevoreilly commented on June 25, 2024

Ah yes, I will look at adding compression to procdump output too, as well as implementing the fix you have kindly posted above.

Thanks for your help.

kevoreilly commented on June 25, 2024

I have now pushed this fix and enabled compression for procdump. Please let me know if this fixes (or alleviates) this issue.

enzok commented on June 25, 2024

Thank you.

enzok commented on June 25, 2024

Will compressing the report results affect the Elasticsearch DB (search only)? I noticed I'm now getting serialization errors when storing data into Elasticsearch.

kevoreilly commented on June 25, 2024

Hmm possibly - I vaguely recall seeing problems previously with Elasticsearch and compression. Any chance you could provide some more details to help me try and narrow it down?

enzok commented on June 25, 2024

It appears that the compressed data doesn't serialize. I added the following code to the elasticsearchdb.py reporting module, and it solved the issue.

import json
import zlib

~ line 137:

        try:
            report["summary"] = json.loads(zlib.decompress(results.get("behavior", {}).get("summary")))
        except:
            report["summary"] = results.get("behavior", {}).get("summary")

marirs commented on June 25, 2024

I would rather do it this way:
Since you don't want the compressed results to sit in Elastic, and the views can in any case detect whether the results are compressed or not, you could change the order of these two processing modules:

elasticsearchdb.py, line 25:
    change order = 9998 to order = 9997

compressresults.py, line 27:
    change order = 9997 to order = 9998

This way, compressresults will run after the Elasticsearch reporting has been done.
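For anyone unfamiliar with the mechanism: reporting modules carry an order attribute and are assumed to run in ascending order, so giving elasticsearchdb the lower value means it indexes the results before they are compressed. A toy illustration of that sorting (hypothetical module list):

    # Hypothetical (name, order) pairs; in Cuckoo/CAPE each reporting class defines order
    modules = [("compressresults", 9998), ("elasticsearchdb", 9997)]

    for name, order in sorted(modules, key=lambda m: m[1]):
        print("running %s (order %d)" % (name, order))
    # elasticsearchdb runs first, so it receives the uncompressed results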

enzok commented on June 25, 2024

That works for me. I completely forgot about being able to set the order.

kevoreilly commented on June 25, 2024

Ah fantastic - thanks both for finding and fixing this. I will make this change now.
