Comments (13)
Yes, that's the MongoDB document limit. If the report crosses the 16MB limit, Mongo cannot save it. The next step Cuckoo takes is to see whether it can delete some key and then attempt the save again, but no luck there either, so that particular analysis will not be saved into Mongo. If the JSON report was enabled, you should still have report.json inside storage/analysis//reports, but it won't be displayed in the UI.
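The limit being hit is BSON's 16MB per-document cap. A quick way to see which top-level key is responsible is to rank the keys by serialized size, which is the same idea as the debug_dict_size() helper used below (this is just a sketch using JSON length as a rough stand-in for BSON size; largest_keys is my name, not CAPE's):

```python
import json

MONGO_DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's 16MB BSON document cap

def largest_keys(report):
    # Rank top-level keys by serialized size, largest first --
    # the same idea as the debug_dict_size() helper in the fix below.
    sizes = [(key, len(json.dumps(value, default=str)))
             for key, value in report.items()]
    return sorted(sizes, key=lambda kv: kv[1], reverse=True)

report = {"behavior": {"summary": ["x"] * 100000}, "info": {"id": 1}}
key, size = largest_keys(report)[0]
# 'behavior' comes out on top; compare size against MONGO_DOC_LIMIT
```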
This happens sometimes when you have a lot of reporting data, which exceeds the Mongo document size limit.
The compressresults module was added to counter this somewhat.
If you have pulled the latest from Kevin's repo, you can try enabling compressresults in reporting.conf, restarting Cuckoo, and running that sample again.
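For reference, enabling it is a one-line change. The section below matches the stock reporting.conf layout as I understand it, but double-check the section name against your own copy:

```
[compressresults]
enabled = yes
```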
Let us know how that goes :)
from cape.
This already has compressresults enabled. I'm just curious why the delete failed. Could the delete failure be handled more gracefully, so that just the offending results key is omitted instead of the whole save failing?
from cape.
Hi enzok, I agree this failure should be handled more gracefully. I'll try to work out a way to do this - if you can share a sample hash, please do.
from cape.
I modified mongodb.py with this code to remedy the issue (starting at ~ line 182):
try:
    self.db.analysis.save(report)
except InvalidDocument as e:
    parent_key, psize = self.debug_dict_size(report)[0]
    if not self.options.get("fix_large_docs", False):
        # Just log the error and problem keys
        log.error(str(e))
        log.error("Largest parent key: %s (%d MB)" % (parent_key, int(psize) / 1048576))
    else:
        # Delete the problem keys and check for more
        error_saved = True
        while error_saved:
            if type(report) == list:
                report = report[0]
            try:
                if type(report[parent_key]) == list:
                    for j, parent_dict in enumerate(report[parent_key]):
                        child_key, csize = self.debug_dict_size(parent_dict)[0]
                        del report[parent_key][j][child_key]
                        log.warn("results['%s']['%s'] deleted due to >16MB" % (parent_key, child_key))
                else:
                    child_key, csize = self.debug_dict_size(report[parent_key])[0]
                    del report[parent_key][child_key]
                    log.warn("results['%s']['%s'] deleted due to >16MB" % (parent_key, child_key))
                # Retry the save; if it still fails, find the next largest key
                try:
                    self.db.analysis.save(report)
                    error_saved = False
                except InvalidDocument as e:
                    parent_key, psize = self.debug_dict_size(report)[0]
                    log.error(str(e))
                    log.error("Largest parent key: %s (%d MB)" % (parent_key, int(psize) / 1048576))
            except Exception as e:
                log.error("Failed to delete child key: %s" % str(e))
                error_saved = False

self.conn.close()
Correct me if I'm wrong, but I don't believe procdump results are being compressed. I think when there are too many Yara strings, the results grow too large.
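For anyone following along, the compression in question is essentially zlib over a JSON-serialized subtree of the results, so an oversized key can fit under the 16MB cap. A minimal sketch of the idea (the helper names here are mine, not CAPE's, and the sample data is made up):

```python
import json
import zlib

def compress_subtree(value):
    # Serialize the subtree to JSON, then zlib-compress the bytes.
    # This is the trick that lets an oversized key (e.g. a behavior
    # summary or procdump Yara strings) fit under Mongo's 16MB cap.
    return zlib.compress(json.dumps(value).encode("utf-8"))

def decompress_subtree(blob):
    # Reverse of the above; consumers must know the field is compressed.
    return json.loads(zlib.decompress(blob).decode("utf-8"))

summary = {"files": ["a.exe"] * 1000, "keys": ["HKLM\\Software\\Run"] * 1000}
blob = compress_subtree(summary)
# Repetitive report data like this compresses very well.
```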
from cape.
Ah yes, I will look at adding compression to procdump output too, as well as implementing the fix you have kindly posted above.
Thanks for your help.
from cape.
I have now pushed this fix and enabled compression for procdump. Please let me know if this fixes (or alleviates) this issue.
from cape.
Thank you.
from cape.
Will compressing the report results affect the Elasticsearch db (search only)? I noticed I'm now getting serialization errors when storing data into Elasticsearch.
from cape.
Hmm, possibly - I vaguely recall seeing problems with Elasticsearch and compression previously. Any chance you could provide some more details to help me narrow it down?
from cape.
It appears that the compressed data doesn't serialize. I added the following code to the elasticsearchdb.py reporting module and it solved the issue.
import json
import zlib
~ line 137:
try:
    report["summary"] = json.loads(zlib.decompress(results.get("behavior", {}).get("summary")))
except Exception:
    report["summary"] = results.get("behavior", {}).get("summary")
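The serialization error makes sense: after compressresults runs, the field holds zlib-compressed bytes rather than a dict, and raw bytes are not JSON-serializable when the Elasticsearch client builds its request. The failure and the fallback pattern can be exercised in isolation, with no ES client involved (sketch only, example data is mine):

```python
import json
import zlib

summary = {"files": ["a.exe"], "keys": ["HKLM\\Software\\Run"]}

# What compressresults leaves behind: zlib-compressed JSON bytes.
compressed = zlib.compress(json.dumps(summary).encode("utf-8"))

# Raw bytes can't be JSON-encoded -- this is the serialization error.
try:
    json.dumps({"summary": compressed})
    serializable = True
except TypeError:
    serializable = False

def load_summary(value):
    # Same fallback pattern as the snippet above: decompress if the
    # value is compressed bytes, otherwise pass it through untouched.
    try:
        return json.loads(zlib.decompress(value))
    except Exception:
        return value
```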
from cape.
I would rather do it this way:
Since you don't want the compressed results to sit in Elastic, and the views can in any case detect whether the results are compressed, you could swap the order of these two reporting modules:
elasticsearchdb.py
Line 25:
order = 9998
Change to order = 9997
compressresults.py
Line 27:
order = 9997
Change to order = 9998
This way compressresults will run after elasticsearch has reported.
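The order attribute works because the reporting engine runs the enabled modules sorted ascending by it. A minimal sketch of that mechanism, with the swapped values from above (the class names and base class are illustrative, not CAPE's actual code):

```python
class Report:
    # Reporting modules carry an 'order' attribute; lower runs first.
    order = 9999

class ElasticSearchDB(Report):
    order = 9997  # runs first, so it still sees uncompressed results

class CompressResults(Report):
    order = 9998  # compresses afterwards, before the Mongo save

modules = [CompressResults(), ElasticSearchDB()]
run_order = [type(m).__name__ for m in sorted(modules, key=lambda m: m.order)]
# ElasticSearchDB now comes before CompressResults
```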
from cape.
That works for me. I completely forgot about being able to set the order.
from cape.
Ah fantastic - thanks both for finding and fixing this. I will make this change now.
from cape.