I've gone through most of the signatures in this repository and noticed that "marks" (the self.data entries) are named inconsistently.
Below are the mark names that I've found across the signatures:
['<any virustotal vendor>', '<not a dictionary, just a file path>', '<sha1>', '<sha256>', 'Affid', 'Beacon', 'Buffer', 'C2', 'Campaign', 'ClassName', 'Copy', 'Creates', 'Decoy Document', 'DeletedFile', 'Domain:Port', 'DynamicLoader', 'Event', 'File Move on Reboot', 'HTTPMethod:URI', 'Injection', 'Interacts', 'KernelExploitAttempt', 'KernelExploitBase', 'Key', 'Likely to allow modification of', 'Lure', 'Note', 'Object', 'Payload', 'Payment', 'Process', 'Process executing suspicious JavaScript', 'Program', 'Redirect', 'Regkey', 'SMTP', 'Spam', 'SuspiciousDynamicFunction', 'URL', 'User-Agent', 'Version', 'Window', 'added', 'addit', 'anomalous_version', 'anomaly', 'appends_email', 'appends_new_extension', 'aslr bypass', 'attachment', 'authenticode error', 'author_format', 'begining_of_ransom_message', 'binary', 'browser_inject', 'cmdline', 'command', 'connectivity_check', 'content', 'content_anomaly', 'cookie', 'copy', 'country', 'created_process', 'creation_anomaly', 'cscript_exe', 'cve', 'cve2009_3459', 'cve_2012-0507', 'cve_2012-4681', 'cve_2012-5076', 'cve_2013-0422', 'cve_2013-0431', 'cve_2013-1493', 'cve_2013-2423', 'cve_2013-2460', 'cve_2013-2465', 'cve_2013-2471', 'data', 'data_after_eof', 'data_being_encrypted', 'decoded_base64_string', 'disables_system_recovery', 'disguised_executable', 'domain', 'driver_testsigning', 'drops_unknown_mimetypes', 'embedded content', 'encoded_pe', 'execute', 'fake_useragent', 'file', 'file name', 'file_modifications', 'flash load', 'handlename', 'heap spray', 'http', 'ie_martian', 'ignorefailures', 'injections', 'ioc', 'ip', 'ip address', 'javascript_object', 'jscript_exe', 'key', 'large_attribute', 'last_saved_format', 'lsass credential dumping', 'lsass read access', 'malicious_author', 'mass file_deletion', 'message', 'mimic_dest', 'mimic_source', 'mmbot', 'modified_drive', 'modified_name', 'modified_path', 'mshta_exe', 'mutex', 'mysterious_kernel_module', 'new_appended_file_extension', 'no_pages', 'numerical_author', 
'numerical_last_saved', 'obfuscation_reflection', 'office file', 'office_cve_2021_40444', 'office_dl_write_exe', 'office_martian', 'open_action', 'original_name', 'original_path', 'parameter', 'path', 'pattern', 'payload', 'pdbpath', 'percent_match', 'physical drive access', 'pid', 'postscript', 'process', 'reg_query_name', 'regkey', 'regkeyval', 'request', 'section', 'security_permissions', 'self_read', 'serialized_object', 'service', 'servicename', 'short_author_format', 'short_last_saved_format', 'sign', 'signature', 'single_page', 'smtp_header', 'string_length', 'suspicious_deviceiocontrol_ioctl_use', 'system_event_object', 'task', 'unhook', 'unlinked', 'unnamed_driver', 'uri', 'url', 'user-agent', 'window', 'written_content', 'wscript_exe', 'xfa_object']
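As you can see, a number of these names would be duplicates if marks were compared case-insensitively (for example 'Process'/'process', 'URL'/'url', 'Key'/'key', 'Regkey'/'regkey'); I will work on fixing this. More generally, because there is no standard for naming marks, the names have become unpredictable, which makes it hard to automate anything that consumes CAPEv2 signature output.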
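To show what the case-insensitive duplicates look like, here is a minimal sketch that groups mark names by their case-folded form. The function name `case_insensitive_duplicates` and the `sample` list are my own illustration (the sample is a small subset of the names above), not code from the repository.

```python
from collections import defaultdict


def case_insensitive_duplicates(names):
    """Return {folded_name: [variants]} for names that differ only by case."""
    groups = defaultdict(set)
    for name in names:
        # casefold() is a more aggressive lower() intended for caseless matching
        groups[name.casefold()].add(name)
    # Keep only the groups that have more than one spelling
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Small illustrative subset of the mark names listed above
sample = ["Process", "process", "URL", "url", "Key", "key", "cmdline", "command", "mutex"]
duplicates = case_insensitive_duplicates(sample)

for folded, variants in sorted(duplicates.items()):
    print(folded, variants)
```

Running the same function over the full list would give the set of names that need to be merged first.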
I will also work on renaming marks that have an obvious, more commonly used synonym (cmdline -> command, etc.).
If there could be some work done to standardize these mark names, whether it be through generalization, constants, or helper methods, I'd appreciate it :)
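As a rough idea of what the constants plus helper method approach could look like, here is a sketch. Everything in it is hypothetical: `MarkKey`, `ALIASES`, `normalize_mark_key` and `make_mark` are names I made up for illustration, nothing like them exists in CAPEv2 as far as I know, and the set of canonical names is only a tiny example.

```python
from enum import Enum


class MarkKey(str, Enum):
    """Hypothetical canonical mark names (illustrative subset only)."""
    COMMAND = "command"
    PROCESS = "process"
    REGKEY = "regkey"
    URL = "url"


# Hypothetical synonym table: less common spelling -> canonical name
ALIASES = {"cmdline": MarkKey.COMMAND}


def normalize_mark_key(name):
    """Map any spelling of a mark name to its canonical string, or raise ValueError."""
    folded = name.casefold()
    if folded in ALIASES:
        return ALIASES[folded].value
    # Enum lookup by value raises ValueError for unknown names
    return MarkKey(folded).value


def make_mark(name, value):
    """Build a single-entry dict with a canonical key, suitable for appending to a list."""
    return {normalize_mark_key(name): value}


data = []
data.append(make_mark("cmdline", "powershell.exe -enc ..."))
data.append(make_mark("URL", "http://example.com"))
print(data)
```

With something like this, a signature that uses an unknown mark name fails loudly instead of silently adding a new spelling, and consumers only need to know the canonical names.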