
vinayakakv / android-malware-detection


Android Malware Detection with Graph Convolutional Networks using Function Call Graph and its Derivatives.

License: GNU General Public License v3.0

Python 4.02% Jupyter Notebook 95.95% Dockerfile 0.03%
android android-malware-detection android-malware-analysis deep-learning graph-deep-learning dgl pytorch-lightning pytorch source-code-analysis

android-malware-detection's Introduction

AndMal-Detect

Android Malware Detection using Function Call Graphs and Graph Convolutional Networks

What?

Research carried out by me (Vinayaka K V) during my MTech (Research) degree in the Department of IT, NITK.

The objectives of the research were:

  1. To evaluate whether GCNs are effective in detecting Android malware using FCGs, and which GCN algorithm is best for this task.
  2. To enhance the FCGs by incorporating the callback information obtained from the framework code, and to evaluate them against the normal FCGs.

Code organization

The code achieving the first objective is on the master (current) branch, while the code achieving the second objective is on the experiment branch.

Methodology

Architecture

Datasets

Stored in the /data folder. Currently, it contains the SHA256 hashes of the APKs in the training and testing splits.
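Since the splits are identified by SHA256, a downloaded APK can be checked against its listed hash before processing. The following is a minimal sketch using only the standard library; the function names are hypothetical and are not part of this repository, and the format of the files in /data is not assumed here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the upper-case hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large APKs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path, expected_sha256):
    """Compare case-insensitively, since hash lists differ in letter case."""
    return sha256_of_file(path) == expected_sha256.strip().upper()
```

A file whose digest does not match its listed hash was probably truncated or mislabeled during download and is best fetched again.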

APK Size Balancer

Computes the histogram of APK sizes and adds APKs to the bins where the number of APKs differs greatly between the classes.

Note: The provided dataset is already APK Size balanced 🥳
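The balancing idea can be illustrated with a small sketch that bins APK sizes per class and reports which class is short in each bin. This is not the repository's implementation: the class names "benign" and "malware", the 1 MB bin width, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def size_bin(size_bytes, bin_width):
    """Map an APK size in bytes to the index of its histogram bin."""
    return size_bytes // bin_width


def size_deficits(benign_sizes, malware_sizes, bin_width=1_000_000):
    """For each size bin, report which class is short and by how many APKs."""
    benign = Counter(size_bin(s, bin_width) for s in benign_sizes)
    malware = Counter(size_bin(s, bin_width) for s in malware_sizes)
    deficits = {}
    for b in sorted(set(benign) | set(malware)):
        diff = benign[b] - malware[b]
        if diff > 0:
            deficits[b] = ("malware", diff)   # malware needs `diff` more APKs
        elif diff < 0:
            deficits[b] = ("benign", -diff)   # benign needs `-diff` more APKs
    return deficits
```

The returned mapping tells how many additional APKs of which class would have to be obtained for each size range; a threshold on the deficit could be added to act only on large imbalances.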

FCG Extractor

Implemented in scripts/process_dataset.py.

The class FeatureExtractors provides two public methods:

  1. get_user_features() - Returns a 15-bit feature vector for internal methods
  2. get_api_features() - Returns a one-hot feature vector for external methods

The method process extracts the FCG and assigns node features.
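To illustrate what a one-hot feature vector for an external method can look like, here is a minimal sketch. It is not the code of get_api_features(): the vocabulary of package prefixes, the extra "unknown" slot, and the function name are assumptions chosen for the example.

```python
def api_one_hot(class_name, vocabulary):
    """One-hot encode an external method's class by its package prefix.

    `vocabulary` is a list of package prefixes (hypothetical here). The
    vector has one extra final slot for classes matching no prefix.
    """
    vector = [0] * (len(vocabulary) + 1)
    for index, prefix in enumerate(vocabulary):
        if class_name.startswith(prefix):
            vector[index] = 1
            return vector
    vector[-1] = 1  # no prefix matched: mark the "unknown" slot
    return vector


# Hypothetical vocabulary, using the Dalvik-style class names androguard reports.
VOCABULARY = ["Landroid/telephony/", "Ljava/net/", "Ljavax/crypto/"]
print(api_one_hot("Ljava/net/URL;", VOCABULARY))
```

Exactly one position is set for every input, so nodes for external methods always carry a valid one-hot vector.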

Node Count Balancer

Balances the dataset so that the node count distribution of the APKs between the classes is exactly the same.

Implemented in scripts/split_dataset.py.

Note: The provided dataset is already node-count balanced to ensure reproducibility 🤩
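One way to make the node count distribution identical between classes is to keep, for every node count, the same number of APKs from each class. The sketch below shows that idea under stated assumptions: the input tuple layout, the random downsampling, and the function name are hypothetical, and scripts/split_dataset.py may work differently.

```python
import random
from collections import defaultdict


def balance_by_node_count(samples, seed=0):
    """Downsample so every node count occurs equally often in each class.

    `samples` is a list of (apk_id, label, node_count) tuples.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: defaultdict(list))
    for apk_id, label, node_count in samples:
        groups[node_count][label].append(apk_id)
    labels = sorted({label for _, label, _ in samples})
    kept = []
    for node_count in sorted(groups):
        by_label = groups[node_count]
        # The smallest class decides how many APKs survive for this node count.
        quota = min(len(by_label.get(label, [])) for label in labels)
        for label in labels:
            for apk_id in rng.sample(by_label.get(label, []), quota):
                kept.append((apk_id, label, node_count))
    return kept
```

Node counts that appear in only one class are dropped entirely, which is the price of requiring exactly matching distributions.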

GCN Classifier

A multi-layer GCN with a dense layer at the end.

Implemented in core/model.py
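The shape of such a classifier can be sketched in a few lines. The example below deliberately uses NumPy rather than DGL and PyTorch so that it runs without the repository's dependencies; it is not the model in core/model.py. The symmetric normalization, ReLU activation, mean readout, and sigmoid output are assumptions, and the graph is treated as undirected for simplicity.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, conv_weights, dense_weight, dense_bias):
    """Run stacked graph convolutions, pool the nodes, apply a dense layer."""
    a_norm = normalize_adjacency(adj)
    h = features
    for weight in conv_weights:
        h = np.maximum(a_norm @ h @ weight, 0.0)  # graph convolution + ReLU
    graph_embedding = h.mean(axis=0)              # mean readout over all nodes
    logit = graph_embedding @ dense_weight + dense_bias
    return 1.0 / (1.0 + np.exp(-logit))           # sigmoid: score in (0, 1)


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
features = rng.normal(size=(3, 4))
conv_weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]
score = gcn_forward(adj, features, conv_weights, rng.normal(size=8), 0.0)
print(float(score))
```

Each convolution mixes a node's features with those of its neighbours, so stacking more layers lets information travel further along the call graph before the readout.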

The Execution Pipeline

  1. Obtain the APKs from AndroZoo using the given SHA256 hashes

  2. Build the container (either singularity or docker), and get into its shell

  3. Run scripts/process_dataset.py on the downloaded dataset

     # Add --override to overwrite existing processed files
     # Add --dry to perform a dry run
     python process_dataset.py \
         --source-dir <source_directory> \
         --dest-dir <dest_directory>

  4. Train the model! For configuration, refer to the section below.

     python train_model.py
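For the first pipeline step, AndroZoo serves individual APKs by SHA256 to holders of an API key. The sketch below only builds the request URL; the endpoint and parameter names reflect AndroZoo's public documentation as I understand it and should be treated as an assumption to verify, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the current AndroZoo documentation before use.
ANDROZOO_DOWNLOAD_URL = "https://androzoo.uni.lu/api/download"


def androzoo_url(sha256, api_key):
    """Build the download URL for one APK identified by its SHA256."""
    query = urlencode({"apikey": api_key, "sha256": sha256.strip()})
    return f"{ANDROZOO_DOWNLOAD_URL}?{query}"
```

The resulting URL can be passed to any HTTP client, once per hash in the split files, with the API key obtained from AndroZoo.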

Configuration

The configuration is achieved using Hydra. Look into config/conf.yaml for available configuration options.

Any configuration option can be overridden in the command line. As an example, to change the number of convolution layers to 2, invoke the program as

    python train_model.py model.convolution_count=2

You can also perform a sweep, for example,

    python train_model.py --multirun \
        model.convolution_count=0,1,2,3 \
        model.convolution_algorithm=GraphConv,SAGEConv,TAGConv,SGConv,DotGatConv \
        features=degree,method_attributes,method_summary

to train the model in all possible configurations! 🥳
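Hydra's default sweeper launches one job per combination of the swept values, i.e. their Cartesian product. The sketch below reproduces that expansion with the standard library to show how many runs the sweep above produces; it does not call Hydra, and expand_sweep is a hypothetical helper.

```python
from itertools import product

sweep = {
    "model.convolution_count": [0, 1, 2, 3],
    "model.convolution_algorithm": [
        "GraphConv", "SAGEConv", "TAGConv", "SGConv", "DotGatConv",
    ],
    "features": ["degree", "method_attributes", "method_summary"],
}


def expand_sweep(sweep):
    """Return one list of key=value overrides per combination of values."""
    keys = list(sweep)
    return [
        [f"{key}={value}" for key, value in zip(keys, values)]
        for values in product(*(sweep[key] for key in keys))
    ]


jobs = expand_sweep(sweep)
print(len(jobs))          # 60 = 4 * 5 * 3
print(" ".join(jobs[0]))  # model.convolution_count=0 model.convolution_algorithm=GraphConv features=degree
```

With 60 training runs, it is worth checking the available compute before launching the full sweep.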

Stack

  • androguard - for FCG extraction and feature assignment
  • pytorch - for neural networks
  • dgl - for GCN modules
  • pytorch-lightning - for organization and pipeline 💖
  • hydra - for configuring experiments
  • wandb - for tracking experiments 🔥

Cite as

The research paper corresponding to this work is available at IEEE Xplore. If you find this work helpful and use it, please cite it as

    @INPROCEEDINGS{9478141,
            author={V, Vinayaka K and D, Jaidhar C},
            booktitle={2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC)},
            title={Android Malware Detection using Function Call Graph with Graph Convolutional Networks},
            year={2021},
            volume={},
            number={},
            pages={279-287},
            doi={10.1109/ICSCCC51823.2021.9478141}
    }

android-malware-detection's Issues

Please

Hello, my graduation project is the same as yours, and I really really need models, but I can't train it. I hope you will please give me pre-trained model. My email address is: [email protected]. Thank you so much.

Please share me the document

Hey guy, can you share me the document you used?
I see you said "The research paper corresponding to this work is available at IEEE Xplore"
But It's too expensive for me. Please, thank you so much

Error

Hello, I just runed your code in folder "notebook", but I met this problem: "Kernel restarted: c9cfc80d-88b0-46db-a5c5-65a550491687" when I run "train.fit()"
Did you met this problem?
How can I fix it?
