
statiStrings

      _        _   _ ____  _        _
  ___| |_ __ _| |_(_) ___|| |_ _ __(_)_ __   __ _ ___
 / __| __/ _` | __| \___ \| __| '__| | '_ \ / _` / __|
 \__ \ || (_| | |_| |___) | |_| |  | | | | | (_| \__ \
 |___/\__\__,_|\__|_|____/ \__|_|  |_|_| |_|\__, |___/
 		       			    |___/
 YARA Rule Strings Statistics Calculator
 Shelly Raban (Sh3llyR), February 2021, Version 0.1
Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contact
  5. Acknowledgements

About The Project

statiStrings is a strings statistics calculator for YARA rules.

The goal is to aid malware research by:

  • Finding common and unique strings within malware samples
  • Finding common strings within clean files
  • Saving time by finding the common characteristics of malware samples automatically

This tool helps you write better, more precise YARA rules for malware detection and malware hunting, based on custom databases of malicious and clean files.

For a given YARA rule and a directory of files, this tool returns the prevalence of each string from the rule in the matched files from the directory.
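The counting idea can be sketched in a few lines with yara-python. This is a minimal illustration, not statiStrings' actual source. It assumes that a directory is walked recursively, that a string matching several times in one file counts once for that file, and that unreadable files still count as scanned. The helper names `count_prevalence` and `scan_directory` are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable, List, Set, Tuple


def count_prevalence(per_file_hits: Iterable[Set[str]]) -> Counter:
    """Count in how many files each string identifier (e.g. "$n_echo") matched.

    A string that matches several times inside one file still counts once
    for that file, so the result is per-file prevalence, not a hit count.
    """
    counts: Counter = Counter()
    for hits in per_file_hits:
        counts.update(set(hits))
    return counts


def scan_directory(rule_path: str, test_dir: str) -> Tuple[List[Set[str]], int]:
    """Match every file under test_dir (walked recursively, an assumption).

    Returns the set of matched string identifiers per file and the number
    of files scanned. yara-python is imported here so that count_prevalence
    can be used without it.
    """
    import yara

    rules = yara.compile(filepath=rule_path)
    per_file_hits: List[Set[str]] = []
    scanned = 0
    for root, _dirs, files in os.walk(test_dir):
        for name in files:
            scanned += 1
            hits: Set[str] = set()
            try:
                matches = rules.match(os.path.join(root, name))
            except yara.Error:
                matches = []  # unreadable file: counted as scanned, no hits
            for match in matches:
                for s in match.strings:
                    # yara-python >= 4.3 yields StringMatch objects with an
                    # .identifier attribute; older releases yield
                    # (offset, identifier, data) tuples.
                    hits.add(s.identifier if hasattr(s, "identifier") else s[1])
            per_file_hits.append(hits)
    return per_file_hits, scanned
```

Calling `count_prevalence(scan_directory(rule, directory)[0])` gives a per-string file count comparable to the sum (-t s) output. Note that YARA only reports strings for files where the rule's condition is true, which is why a generic condition such as "any of them" is useful for this kind of survey.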

Built With

  • Python
  • yara-python

Getting Started

To use this tool, you must have Python installed.

Installation

Install yara-python

pip install yara-python

Clone the repo

git clone https://github.com/Sh3llyR/statiStrings.git

Usage

 usage: statiStrings.py [-h] [-y YARA_RULE] [-d TEST_DIR] [-t OUTPUT_TYPE]

 YARA Rule Strings Statistics Generator and Malware Research Helper

 optional arguments:
   -h, --help      show this help message and exit
   -y YARA_RULE    Path to the YARA Rule
   -d TEST_DIR     Path to the Directory of Files to be Scanned
   -t OUTPUT_TYPE  Output Type: s (sum - number of files in which each string
                   from the YARA rule occurred) / p (percentage - percent of
                   files in which each string from the YARA rule occurred).
                   Default is s
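The options above and the two output types can be mirrored with argparse. This sketch is a hypothetical reconstruction for illustration, not the tool's own code. `build_parser` and `format_results` are assumed names, and the two-decimal rounding is inferred from the example results rather than taken from the source.

```python
import argparse
from typing import Dict, Mapping, Union


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options in the usage text above."""
    parser = argparse.ArgumentParser(
        description="YARA Rule Strings Statistics Generator and Malware Research Helper"
    )
    parser.add_argument("-y", dest="yara_rule", help="Path to the YARA Rule")
    parser.add_argument("-d", dest="test_dir", help="Path to the Directory of Files to be Scanned")
    parser.add_argument(
        "-t",
        dest="output_type",
        metavar="OUTPUT_TYPE",
        choices=["s", "p"],
        default="s",
        help="Output Type: s (sum) / p (percentage). Default is s",
    )
    return parser


def format_results(
    counts: Mapping[str, int], files_scanned: int, output_type: str = "s"
) -> Dict[str, Union[int, str]]:
    """Return raw file counts ("s") or the percent of scanned files ("p").

    Percentages are rounded to two decimals and printed without trailing
    zeros, which matches values such as '5.1%' in the example output.
    """
    if output_type == "s":
        return dict(counts)
    if files_scanned <= 0:
        raise ValueError("no files were scanned")
    return {k: f"{round(v / files_scanned * 100, 2)}%" for k, v in counts.items()}
```

As a check against the example in the next section, $n_echo matched in 26 of 157 files, and 26 / 157 * 100 ≈ 16.56, which is the '16.56%' reported with -t p.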

Usage example

Researching common strings in malicious batch scripts: First, I wrote a YARA rule containing many commands found in malicious scripts, with the very generic condition "any of them". Then, I ran this tool with that rule against a directory of malicious scripts (shown in the following example). Finally, I ran it against a directory of clean scripts. After going through the results for both the clean and the malicious scripts, I was able to:

  1. Group the strings of the YARA rule into suspicious ($s_...), for example tskill, and noisy ($n_...), for example echo.
  2. Create a condition for my rule that catches the malicious samples but not the clean samples, minimizing false positives.
  • python statiStrings.py -y .\batch_commands.yar -d .\batch_samples -t s
  • Results:
     {'$s_ren': 1, '$n_set': 8, '$s_mem': 1, '$s_reg_add': 8, '$s_taskkill': 4, '$n_exit': 9, '$s_maybe_block_sites_hosts_file': 1, '$s_move': 2, '$s_attrib': 6, '$n_copy': 6, '$n_start': 10, '$n_type': 7, '$n_echo': 26, '$n_reg': 11, '$s_aes': 1, '$s_cscript': 1, '$s_change_mouse_settings': 1, '$n_net': 3, '$n_find': 6, '$s_infinite_loop': 2, '$s_shutdown': 9, '$n_del': 6, '$n_goto': 12, '$s_generic_bat_maybe_copy_itself': 5, '$n_ipconfig': 2, '$n_maybe_time_change': 5, '$n_system': 2, '$s_tskill': 3, '$s_cpu_damage': 1, '$s_erase': 3, '$s_make_random_folders': 1, '$s_sleep': 4, '$n_bat_maybe_copy_itself': 9}
     Number of files scanned: 157
  • python statiStrings.py -y .\batch_commands.yar -d .\batch_samples -t p
  • Results:
     {'$s_maybe_block_sites_hosts_file': '0.64%', '$s_sleep': '2.55%', '$s_shutdown': '5.73%', '$s_attrib': '3.82%', '$s_change_mouse_settings': '0.64%', '$n_maybe_time_change': '3.18%', '$s_erase': '1.91%', '$s_move': '1.27%', '$n_net': '1.91%', '$s_aes': '0.64%', '$n_reg': '7.01%', '$n_system': '1.27%', '$n_set': '5.1%', '$s_cscript': '0.64%', '$n_find': '3.82%', '$s_generic_bat_maybe_copy_itself': '3.18%', '$s_cpu_damage': '0.64%', '$n_goto': '7.64%', '$s_tskill': '1.91%', '$s_ren': '0.64%', '$s_mem': '0.64%', '$n_type': '4.46%', '$s_taskkill': '2.55%', '$n_exit': '5.73%', '$n_echo': '16.56%', '$s_infinite_loop': '1.27%', '$n_start': '6.37%', '$s_make_random_folders': '0.64%', '$n_bat_maybe_copy_itself': '5.73%', '$n_ipconfig': '1.27%', '$s_reg_add': '5.1%', '$n_del': '3.82%', '$n_copy': '3.82%'}
     Number of files scanned: 157
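The comparison in steps 1 and 2 can be partly automated once you have the results for both the malicious and the clean directory. This is a sketch of one possible approach, not the author's method. The $s_/$n_ prefixes come from the example. The helper names, the "largest prevalence gap first" ranking heuristic, and the generated "any of (...)" condition are illustrative assumptions. The inputs are the sum (-t s) dictionaries plus each run's "Number of files scanned".

```python
from typing import Dict, List, Mapping, Tuple


def split_by_prefix(counts: Mapping[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split results into suspicious ($s_...) and noisy ($n_...) strings."""
    suspicious = {k: v for k, v in counts.items() if k.startswith("$s_")}
    noisy = {k: v for k, v in counts.items() if k.startswith("$n_")}
    return suspicious, noisy


def malicious_only(malicious: Mapping[str, int], clean: Mapping[str, int]) -> List[str]:
    """Strings seen in at least one malicious file and in no clean file."""
    return sorted(k for k, v in malicious.items() if v > 0 and clean.get(k, 0) == 0)


def rank_by_gap(
    malicious: Mapping[str, int], clean: Mapping[str, int], mal_total: int, clean_total: int
) -> List[str]:
    """Order strings by (malicious prevalence - clean prevalence), largest first."""
    gap = {
        k: malicious.get(k, 0) / mal_total - clean.get(k, 0) / clean_total
        for k in set(malicious) | set(clean)
    }
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(gap, key=lambda k: (-gap[k], k))


def any_of_condition(identifiers: List[str]) -> str:
    """Build a YARA 'any of (...)' condition from the chosen identifiers."""
    return "any of (" + ", ".join(identifiers) + ")"
```

Strings returned by `malicious_only` are natural candidates for the rule's condition. Noisy strings that are common in clean files are better combined with other strings than used alone, because they tend to cause false positives.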

Contact

LinkedIn

Project Link: https://github.com/Sh3llyR/statiStrings

Acknowledgements
