
android-malware-analysis's Introduction

Getting an API Key

AndroTotal has simplified the process for getting an API key. Log in or create an account at http://andrototal.org/, then open your profile settings. The API tab there contains your key.

This repository contains a set of scripts to automate the process of gathering data from malware samples, training a machine learning model on that data, and plotting its classification accuracy.

  1. Make a copy of config-template.ini called config.ini and edit it.

  2. Ensure that the "tools" subdirectory has been initialized ("$ git submodule update --init tools")

  3. Either use get_samples.py to download samples or copy them into "all_apks" from another source. If you're using get_samples.py, you can monitor it in another shell by running watch "ls -l *.apk | wc -l"

  4. sort_malicious.py uses andrototal.org to sort them into "malicious_apk" and "benign_apk" folders. You can monitor it in another shell by running watch "ls -l benign_apk/*.apk | wc -l && ls -l malicious_apk/*.apk | wc -l"

  5. extract_apks_parallel.sh unpacks the .apk files into folders and processes some of the data therein. You can monitor it in another shell by running watch "wc -l benign_apk/valid_apks.txt; wc -l malicious_apk/valid_apks.txt"

  6. Run one of the following scripts to generate feature vectors:

    • parse_xml.py for permissions. "app_permission_vectors.json" is generated
    • parse_maline_output.py for syscalls. "app_syscall_vectors.json" is generated. You will have to run maline first for this to work.
    • parse_disassembled.py for API calls. "app_method_vectors.json" is generated
    • parse_ssdeep.py for fuzzy hashes. "app_hash_vectors.json" is generated. You will have to run ssdeep first for this to work.
    • combine_features.py for a combination of the top weighted features. "app_feature_vectors.json" is generated. This only works if you've previously trained a network on the specified features, and the feature weights files are named appropriately.
  7. Run $ run_trials.sh app_feature_vectors.json (or whichever JSON file you want). This runs the tensorflow_learn.py script (where the ML happens) a number of times and puts the results into a folder. It also runs plot_data.py and match_features.py to create a plot and a list of top weighted features, respectively.

  8. Change the parameters or input data and repeat step 6. It should be non-destructive so you can compare the results of different runs.

Note: If you want to use an SVM instead of a neural network, use sklearn_svm.py in place of tensorflow_learn.py. You can also use sklearn_tree.py to train a decision tree.
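
The watch commands in steps 3 and 4 count .apk files by piping ls into wc. A rough Python equivalent is sketched below for anyone who prefers a script. The folder names come from the steps above; the function names count_apks and sorting_progress are made up for this example and are not part of the repository.

```python
from pathlib import Path


def count_apks(directory):
    """Return the number of .apk files directly inside `directory` (0 if it is missing)."""
    path = Path(directory)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.glob("*.apk") if entry.is_file())


def sorting_progress(root="."):
    """Count APKs in the three folders named in steps 3 and 4."""
    root = Path(root)
    folders = ("all_apks", "benign_apk", "malicious_apk")
    return {name: count_apks(root / name) for name in folders}


if __name__ == "__main__":
    # Print one line per folder; rerun (or wrap in `watch`) to follow progress.
    for folder, total in sorting_progress().items():
        print(f"{folder}: {total}")
```

Unlike the shell version, this returns 0 for a folder that does not exist yet instead of printing an ls error.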

android-malware-analysis's People

Contributors

mwleeds, thesecmaven

android-malware-analysis's Issues

API Key is not valid

When I requested a list of samples by submission date using my api_key, I got "Cannot complete the request, reason: API Key is not valid". However, the same api_key works fine for scanning and analyzing files. Have you ever encountered this problem?

ValueError: range() arg 3 must not be zero

Hi,
I keep getting the following error whenever I run ./run_trial.sh app_permission_vectors.json. Can you help, please? Thanks.

  File "tensorflow_learn.py", line 85, in <module>
    main()
  File "tensorflow_learn.py", line 58, in main
    malicious_app_name_chunks = list(chunks(malicious_app_names, math.floor(len(dataset['apps']) / NUM_CHUNKS)))
  File "tensorflow_learn.py", line 81, in chunks
    for i in range(0, len(l), n):
ValueError: range() arg 3 must not be zero
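
Judging from the traceback alone, one plausible cause is that the dataset holds fewer apps than NUM_CHUNKS, so math.floor(len(dataset['apps']) / NUM_CHUNKS) evaluates to 0 and is passed to range() as the step. The sketch below shows a guarded version. The body of chunks is an assumption (only its for line appears in the traceback), and chunk_size is a hypothetical helper, not a function in tensorflow_learn.py.

```python
import math


def chunks(items, size):
    """Yield successive slices of `items`, each `size` elements long (assumed body)."""
    if size < 1:
        raise ValueError(f"chunk size must be at least 1, got {size}")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def chunk_size(num_apps, num_chunks):
    """Floor-divide as in the traceback, but never return 0."""
    return max(1, math.floor(num_apps / num_chunks))


if __name__ == "__main__":
    names = ["app_a", "app_b", "app_c"]
    # With 3 apps and 10 chunks (example numbers), the unguarded size would be 0.
    print(list(chunks(names, chunk_size(len(names), 10))))
```

If this is the cause, gathering more samples or lowering NUM_CHUNKS should also avoid the error without any code change.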

Help Request

Hi,
After I ran the model and got an accuracy of 89% (100 clean files, 100 malicious), how can I test the same model against different APK files to determine whether they are malicious based on their features, without manually putting them in safe/malicious folders? Help will be appreciated, thanks.

Where can I get AMA api key?

I tried to run this repository as a test, but I could not get an AMA API key, so I get a KeyError when I run it.
So, where can I get an AMA API key?
Thanks~

Stupid as I am ... I read the PDF file and found the API key at http://andrototal.org/api_key. Thanks~
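
For anyone hitting a similar KeyError: it may come from an option that was left empty or dropped when config.ini was copied from config-template.ini. The sketch below compares the two files and lists what is still unset. It assumes both are standard INI files readable by configparser. The section and option names in the usage example are hypothetical; the real ones are in config-template.ini.

```python
import configparser


def _load(text):
    """Parse INI text into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def unset_options(template_text, config_text):
    """List (section, option) pairs from the template that are absent or empty in the config."""
    template, config = _load(template_text), _load(config_text)
    problems = []
    for section in template.sections():
        for option in template.options(section):
            # fallback="" covers both a missing section and a missing option.
            value = config.get(section, option, raw=True, fallback="")
            if not value.strip():
                problems.append((section, option))
    return problems


if __name__ == "__main__":
    # Hypothetical contents, for illustration only.
    template = "[AMA]\napi_key =\n[paths]\nout = results\n"
    config = "[AMA]\napi_key =\n[paths]\nout = results\n"
    print(unset_options(template, config))
```

In practice the two strings would be read from config-template.ini and config.ini before running any of the scripts.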

Value Error

Hi, I keep getting this error when I run my Python Flask code:
ValueError: range() arg 3 must not be zero

groups = [students[i:i+4] for i in range(1, len(students), group_size)]
ValueError: range() arg 3 must not be zero
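
The message means group_size is 0 when this line runs. The line also slices a fixed width of 4 regardless of group_size and starts at index 1, which skips the first student. Whether that start index was intentional is not clear from the post, so the sketch below assumes it was not. The function name make_groups is made up for this example.

```python
def make_groups(students, group_size):
    """Split `students` into consecutive groups of `group_size` (the last may be shorter)."""
    if group_size < 1:
        raise ValueError(f"group_size must be at least 1, got {group_size}")
    # Start at 0 so nobody is skipped, and slice by the same width as the step.
    return [students[i:i + group_size] for i in range(0, len(students), group_size)]


if __name__ == "__main__":
    print(make_groups(["Ann", "Bo", "Cy", "Di", "Ed"], 2))
```

If group_size comes from a web form, it is worth checking that it was converted to an integer and is not empty or 0 before this call.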
