
jast's Introduction

JaSt - JS AST-Based Analysis

This repository contains the code for the DIMVA'18 paper: "JaSt: Fully Syntactic Detection of Malicious (Obfuscated) JavaScript".
Please note that in its current state, the code is a proof of concept (PoC) and not a production-ready API.

Features

Our implementation, which aims at detecting malicious JavaScript samples, is divided into several packages with distinct functionalities:

  • js for the detection of valid JavaScript code;
  • features for the extraction of specific features from JavaScript inputs;
  • clustering for the classification and clustering of JavaScript documents.

JavaScript Detection Tool

This tool distinguishes JavaScript samples that respect the grammar defined by ECMA International from broken JavaScript and from files not written in JavaScript.
To use this tool: python3 <path-of-js/is_js.py> --help.

Classification and Clustering of JavaScript Inputs

An AST-based analysis of JavaScript samples can be performed. This study is based on a frequency analysis of the n-grams present in the considered files.

  • Detection of malicious JavaScript documents.
    To use this tool:
  1. python3 <path-of-clustering/learner.py> --help;
  2. python3 <path-of-clustering/updater.py> --help;
  3. python3 <path-of-clustering/classifier.py> --help.
  • Clustering of JavaScript samples into k (configurable) families.
    To use this tool: python3 <path-of-clustering/cluster.py> --help.
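
The frequency analysis of n-grams described in this section can be illustrated with a minimal sketch. It assumes the AST has already been flattened into a list of syntactic unit names. The `units` list and the `ngram_frequencies` helper are hypothetical and are not taken from the JaSt code base.

```python
from collections import Counter


def ngram_frequencies(units, n):
    """Return the relative frequency of each n-gram in a sequence of units."""
    # Slide a window of length n over the sequence.
    ngrams = [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]
    total = len(ngrams)
    if total == 0:
        return {}
    counts = Counter(ngrams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sequence of syntactic unit names, for illustration only.
units = ["Program", "VariableDeclaration", "VariableDeclarator",
         "Identifier", "Literal", "VariableDeclaration"]
frequencies = ngram_frequencies(units, 2)
```

Because only unit names are counted, identifier values chosen by an obfuscator do not influence the resulting frequency vector in this sketch.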

How to use it?

  • The system requirements are given in install.sh;
  • To launch the main function of a package, see the previous Section Features;
  • Application examples of our modules are given in examples.pdf;
  • The complete documentation can be consulted using Python's built-in function help(<name-of-the-module>), or can be generated in HTML format with pydoc: python3 <path-of-pydoc> -w <name-of-the-module>.

Cite this work

If you use JaSt for academic research, you are highly encouraged to cite the following paper:

@inproceedings{fass2018jast,
    author="Fass, Aurore and Krawczyk, Robert P. and Backes, Michael and Stock, Ben",
    title="{\textsc{JaSt}: Fully Syntactic Detection of Malicious (Obfuscated) JavaScript}",
    booktitle="Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment~(DIMVA)",
    year="2018"
}

Abstract:

JavaScript is a browser scripting language initially created to enhance the interactivity of web sites and to improve their user-friendliness. However, as it offloads the work to the user's browser, it can be used to engage in malicious activities such as Crypto Mining, Drive-by Download attacks, or redirections to web sites hosting malicious software. Given the prevalence of such nefarious scripts, the anti-virus industry has increased the focus on their detection. The attackers, in turn, make increasing use of obfuscation techniques, so as to hinder analysis and the creation of corresponding signatures. Yet these malicious samples share syntactic similarities at an abstract level, which enables to bypass obfuscation and detect even unknown malware variants.

In this paper, we present JaSt, a low-overhead solution that combines the extraction of features from the abstract syntax tree with a random forest classifier to detect malicious JavaScript instances. It is based on a frequency analysis of specific patterns, which are either predictive of benign or of malicious samples. Even though the analysis is entirely static, it yields a high detection accuracy of almost 99.5% and has a low false-negative rate of 0.54%.

Disclaimer

This is a research work and its intent is to publish the results, to make the results verifiable, and to promote information security. It does not necessarily reflect the official policy or position of the Bundesamt für Sicherheit in der Informationstechnik (BSI). Although the author has made every effort to ensure that the information in this publication was correct, the author does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause.

The content provided by the user on GitHub does not reflect the opinion of the BSI.
Use of the provided content is at the user's own risk. The BSI cannot accept any liability for the correctness, completeness, or currency of this content.
The BSI is not responsible and assumes no liability whatsoever for damages, including direct, indirect, incidental, or consequential damages, or damages to be specifically determined in advance, that allegedly arise from the use of the content.

External tools

Esprima, created and maintained by Ariya Hidayat, has been used to perform both lexical and syntactic analysis of JavaScript files.

"Copyright JS Foundation and other contributors, https://js.foundation/

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."

jast's People

Contributors

aurore54f, pararaum

jast's Issues

Remove default arguments with fixed paths

Some functions have default parameters that look like fileDir = '/home/...'. Are these default arguments really needed? If so, a relative path would be better, since the current version hard-codes the home directory.
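
One way to avoid a hard-coded home directory, sketched under assumptions: resolve the default relative to the module's own location and let callers override it. The names `DEFAULT_FILE_DIR`, `resolve_file_dir`, and the `data` directory are hypothetical and do not come from the JaSt code.

```python
import os

# Hypothetical default: a "data" directory next to this module,
# instead of an absolute path under a specific user's home directory.
try:
    _BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    # __file__ is undefined in some interactive contexts; fall back to the cwd.
    _BASE_DIR = os.getcwd()
DEFAULT_FILE_DIR = os.path.join(_BASE_DIR, "data")


def resolve_file_dir(file_dir=None):
    """Return the directory to use, falling back to the relative default."""
    if file_dir is not None:
        return os.path.abspath(file_dir)
    return DEFAULT_FILE_DIR
```

Using `None` as the default keeps the function signature free of any machine-specific path.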

Missing ground truth?

Is there a way to supply ground truth to the classifier? The example provided works fine, but does output the message:

WARNING: No ground truth given: unable to evaluate the accuracy of the classifier's predictions

Difference with JStap

Nice job!

I read your JStap code and just wanted to quickly understand: what are the differences between JStap and this project?

Thank you

Some questions about the code and the publication

Hello,
I'm new to classifying malicious and benign JavaScript, and I'm very interested in this project and the paper. However, a few points confuse me. Please help.

  1. Table 1 in the publication gives all possible n-grams. In my opinion, there should be 26 1-grams and 26^2 2-grams, but the table shows different numbers. Could you explain why?
  2. For example, a declaration line "var goodluck" can be parsed as "Identifier: Name: "goodluck"". In the project, is the n-gram method applied only to "Identifier: Name" or to the complete string including "goodluck"? Because of obfuscation, including "goodluck" does not seem to be a good idea.

Thanks very much!

Returning an undefined variable on exception

In NGramsRepresentation.py:

	# return DicoIntToNGrams.dicoIntToNGrams[str(i)];
	try:
		ngram = dico[str(i)];
		return ngram;
	except KeyError as e:
		print('The key ' + str(e) + ' is not in the dictionary.')
		pass;
        return ngram;

On an exception ngram is not defined and the return statement will fail.
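
A possible fix, as a sketch: return a sentinel such as `None` when the key is missing, so the function never reaches a `return` with an unbound name. The function name `int_to_ngram` and the explicit `dico` parameter are assumptions for illustration; the actual signature in NGramsRepresentation.py may differ.

```python
def int_to_ngram(dico, i):
    """Look up the n-gram stored under the string form of i, or None if absent."""
    try:
        return dico[str(i)]
    except KeyError as e:
        # str(e) is the repr of the missing key.
        print('The key ' + str(e) + ' is not in the dictionary.')
        return None
```

Callers then need to handle `None` explicitly; re-raising the exception would be an alternative if a missing key indicates a programming error.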
