JS Analyzer

This is a project to analyze obfuscated JS code using abstract interpretation and optimizations similar to compiler optimizations.

This is a work in progress / proof of concept and is very incomplete. Many essential things are handled incorrectly or not at all (exceptions, some OOP features, async functions, etc.). It is also not optimized for analysis speed.

Setup

It requires Python 3 or PyPy3 with the esprima module. Using PyPy3 is recommended for performance reasons.

Before use, run make in the project directory to compile jseval.so.

Usage

./do.sh <input JS file without the .js extension>

It produces a file named yourfile-out.js.

You can test it on realexemple1.js: it produces a decoded version of the expression (a URL string) assigned to the hook function at the end of the file.

JSAnalyzer in action

This is an excerpt from real obfuscated malware, shown "as is" (it has only been automatically indented). Strings are obfuscated and replaced by function calls, and the control flow has been flattened:

[Figure: Obfuscated code]

After processing by JSAnalyzer, the strings are recovered and the control flow is clarified. We can see that the code searches for running instances of Discord:

[Figure: Processed code]

This is an excerpt from another obfuscated JS file served by a website. Strings are encrypted with RC4, the control flow is flattened, and the code uses eval() calls:

[Figure: Obfuscated code]

JSAnalyzer automatically interprets the RC4 decryption, emulates the eval() calls and clarifies the control flow. This excerpt shows the exchange of requests with the server:

[Figure: Processed code]
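For context, RC4 is a very short stream cipher, which is why obfuscators can inline it as a string decoder. The sketch below is the textbook algorithm written in Python, not code taken from the sample or from JSAnalyzer; the function name rc4 is chosen here for illustration. Because the key and the ciphertext are both literals in the obfuscated file, every step of this computation is a constant expression that an abstract interpreter can evaluate.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: the same function encrypts and decrypts."""
    # Key-scheduling algorithm: permute 0..255 according to the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation: XOR each input byte with a keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
recovered = rc4(b"Key", ciphertext)  # applying RC4 twice restores the input
```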

How it works

The obfuscated JS is processed in 4 steps:

  • Parsing the JS into an Abstract Syntax Tree (AST) (done by the esprima module)
  • Abstract interpretation of the AST to find constant expressions (done by analyze.py)
  • Code transformations on the AST, similar to compiler optimizations (done by transform.py)
  • Output of the transformed JS (done by prettyprint.js using the escodegen module)
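The four steps can be pictured on a single statement. The following is only a sketch under stated assumptions: the AST is written by hand in the ESTree shape that esprima produces (so no parser is needed to run it), and the functions analyze, transform, generate and deobfuscate are hypothetical names for illustration, not the project's actual code.

```python
import json

# Step 1 (parsing) is assumed to have produced this tree for:  y = 32 + 10
ast = {
    "type": "AssignmentExpression",
    "operator": "=",
    "left": {"type": "Identifier", "name": "y"},
    "right": {
        "type": "BinaryExpression",
        "operator": "+",
        "left": {"type": "Literal", "value": 32},
        "right": {"type": "Literal", "value": 10},
    },
}


def analyze(node):
    """Step 2: annotate nodes whose value is statically known with a 'const' key."""
    kind = node["type"]
    if kind == "Literal":
        node["const"] = node["value"]
    elif kind == "BinaryExpression":
        analyze(node["left"])
        analyze(node["right"])
        # Only numeric '+' is handled in this sketch.
        if node["operator"] == "+" and "const" in node["left"] and "const" in node["right"]:
            node["const"] = node["left"]["const"] + node["right"]["const"]
    elif kind == "AssignmentExpression":
        analyze(node["right"])


def transform(node):
    """Step 3: replace every annotated expression with a Literal node."""
    if "const" in node:
        return {"type": "Literal", "value": node["const"]}
    if node["type"] == "AssignmentExpression":
        return dict(node, right=transform(node["right"]))
    if node["type"] == "BinaryExpression":
        return dict(node, left=transform(node["left"]), right=transform(node["right"]))
    return node


def generate(node):
    """Step 4: print the (tiny) supported subset of the AST back as JS."""
    kind = node["type"]
    if kind == "Literal":
        return json.dumps(node["value"])
    if kind == "Identifier":
        return node["name"]
    return f'{generate(node["left"])} {node["operator"]} {generate(node["right"])}'


def deobfuscate(tree):
    analyze(tree)
    return generate(transform(tree))


result = deobfuscate(ast)  # "y = 42"
```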

[Figure: Workflow]

Abstract interpretation

The general idea

This works by analyzing the program to identify constant expressions (that is, expressions for which the value can be determined without actually executing the program). For instance, let's consider the following example:

x = 10;
if (x == 10) {
	y = 32 + x;
}

console.log(x);  /* x is constant (10) */
console.log(y);  /* y is constant (42) */

if (Math.random()*10 < 5) {
	a = 10;
	b = 20;
	console.log(b); /* b is constant (20) */
} else {
	a = 10;
	b = 30;
	console.log(b); /* b is constant (30) */
}
console.log(a); /* a is constant (10) */
console.log(b); /* b is not constant (it cannot be determined whether b is 20 or 30 without running the program) */

JSAnalyzer replaces each constant expression with its value, resulting in this output:

x = 10
if (true){
  y = 42
}
console.log(10)
console.log(42)
if (Math.random() * 10 < 5){
  a = 10
  b = 20
  console.log(20)
}
else
{
  a = 10
  b = 30
  console.log(30)
}
console.log(10)
console.log(b)

Constant expression detection

JSAnalyzer detects constant expressions by performing an abstract interpretation of the program in the constants domain.

It is essentially the same as interpreting the program normally, except that each expression evaluates to an abstract value. An abstract value is either a concrete value (a number, a string, ...) or a special value named Top, which means that the value of the expression cannot be determined without running the program. Top values are generated when the analyzer encounters something that cannot be determined statically (for instance, a statement involving I/O or a network operation) and are propagated through the computations that use them.
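A minimal sketch of evaluation in the constants domain might look as follows. The tuple-based expression encoding, the names TOP, OPS and abstract_eval, and the treatment of unbound variables as Top are all assumptions made for this illustration; the operators use Python semantics rather than JavaScript's coercion rules.

```python
import operator


class Top:
    """Abstract value: 'cannot be determined without running the program'."""

    def __repr__(self):
        return "Top"


TOP = Top()

# Binary operators supported by this sketch (Python semantics, not JS coercions).
OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt, "==": operator.eq}


def abstract_eval(expr, env):
    """Return a concrete value, or TOP if the value is not statically known."""
    tag = expr[0]
    if tag == "const":    # ("const", 32)
        return expr[1]
    if tag == "var":      # ("var", "x"); unbound variables are treated as TOP here
        return env.get(expr[1], TOP)
    if tag == "unknown":  # stands for I/O, Math.random(), network calls, ...
        return TOP
    left = abstract_eval(expr[1], env)
    right = abstract_eval(expr[2], env)
    if left is TOP or right is TOP:
        return TOP        # Top propagates through the computation
    return OPS[tag](left, right)


env = {"x": 10}
# 32 + x  evaluates to the concrete value 42
known = abstract_eval(("+", ("const", 32), ("var", "x")), env)
# Math.random() * 10 < 5  evaluates to Top
unknown = abstract_eval(("<", ("*", ("unknown",), ("const", 10)), ("const", 5)), env)
```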

When the execution path diverges on an unknown condition (for instance, an if whose test evaluates to Top), the analyzer processes each path separately and merges the resulting states afterwards. State merging keeps the variables that have the same value along every path and sets the others to Top.
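State merging can be sketched as a function over variable environments. This is an illustration only: states are modeled as plain dicts, merge_states and TOP are hypothetical names, and treating a variable that is missing from one path as Top is an assumption of the sketch.

```python
class Top:
    """Abstract value for 'unknown without running the program'."""

    def __repr__(self):
        return "Top"


TOP = Top()


def merge_states(*states):
    """Keep variables that agree on every path; set all others to TOP."""
    merged = {}
    for name in sorted(set().union(*states)):
        # A variable that is absent on some path counts as unknown (assumption).
        values = [state.get(name, TOP) for state in states]
        first = values[0]
        agree = all(
            v is not TOP and type(v) is type(first) and v == first for v in values
        )
        merged[name] = first if agree else TOP
    return merged


# The two branches of the if (Math.random()*10 < 5) example:
then_state = {"a": 10, "b": 20}
else_state = {"a": 10, "b": 30}
merged = merge_states(then_state, else_state)  # a stays 10, b becomes Top
```

The type check keeps values such as 1 and True apart, which compare equal in Python but are distinct values in the analyzed program.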

Loops are unrolled as long as the loop condition is true, up to a configurable number of iterations (default: 1000).
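Bounded unrolling can be sketched as below. The text does not say what the analyzer does when the limit is reached or when the condition becomes unknown, so this sketch makes a conservative assumption and sets every variable to Top in those cases. The names unroll_loop, cond and body are hypothetical.

```python
class Top:
    """Abstract value for 'unknown without running the program'."""

    def __repr__(self):
        return "Top"


TOP = Top()


def unroll_loop(cond, body, state, max_iterations=1000):
    """Abstractly run `while (cond) body`, giving up after max_iterations."""
    state = dict(state)
    unrolled = 0
    while True:
        test = cond(state)
        if test is not TOP and not test:
            return state  # the loop condition became false: exact result
        if test is TOP or unrolled == max_iterations:
            # Assumption of this sketch: forget everything when giving up.
            return {name: TOP for name in state}
        state = body(state)
        unrolled += 1


# Models:  i = 0; s = ""; while (i < 3) { s += "ab"; i++; }
start = {"i": 0, "s": ""}
cond = lambda st: TOP if st["i"] is TOP else st["i"] < 3
body = lambda st: {**st, "s": st["s"] + "ab", "i": st["i"] + 1}

exact = unroll_loop(cond, body, start)                        # {"i": 3, "s": "ababab"}
gave_up = unroll_loop(cond, body, start, max_iterations=2)    # every variable is Top
```

This is how string-building loops in obfuscated code can collapse to a single constant: as long as the condition stays concrete, each iteration is simply executed.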

Project code organization

The project is organized in several files:

  • analyzer.py: main program
  • config.py: user-editable configuration file
  • abstract.py: defines classes for abstract values and their operations: JSTop (Top), JSClosure (closures and functions), JSObject (objects and arrays), JSRef (references to objects and arrays), JSUndefNaN (represents undefined or NaN), JSPrimitive (represents a primitive value such as a number or a string), and JSSimFct (built-in JS functions implemented in Python)
  • plugin_manager.py: defines the plugin manager. Plugins live in the plugins/ subdirectory, and can define behavior for unary and binary operators, as well as built-in JS functions.
  • interpreter.py: defines the "main" part of the interpreter. It processes the abstract syntax tree (AST) and interprets the program using abstract values for each AST element.
  • output.py: defines the pretty-printer / output generator. It is executed after the interpreter and outputs the resulting JS, where each constant expression is replaced with its value.

Code Transforms

The optimizations used are standard ones found in any good compiler textbook (dead code and dead variable elimination, loop unrolling, etc.).
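As an illustration of one such transform, the sketch below removes dead branches once the test of an if statement has been replaced by a boolean literal. It assumes ESTree-shaped dicts (IfStatement, BlockStatement, Literal), only handles true and false literals, and uses the hypothetical function names simplify and wrap; it is not the project's transform code.

```python
def wrap(stmts):
    """Turn a statement list back into a single statement."""
    if len(stmts) == 1:
        return stmts[0]
    return {"type": "BlockStatement", "body": stmts}


def simplify(stmt):
    """Return the list of statements that replaces stmt."""
    if stmt["type"] == "BlockStatement":
        body = [s for child in stmt["body"] for s in simplify(child)]
        return [{"type": "BlockStatement", "body": body}]
    if stmt["type"] == "IfStatement":
        test = stmt["test"]
        if test["type"] == "Literal" and isinstance(test["value"], bool):
            # The test is constant: keep only the branch that can execute.
            taken = stmt["consequent"] if test["value"] else stmt.get("alternate")
            return simplify(taken) if taken else []
        # Unknown test: keep the if, but simplify inside both branches.
        alternate = stmt.get("alternate")
        return [dict(
            stmt,
            consequent=wrap(simplify(stmt["consequent"])),
            alternate=wrap(simplify(alternate)) if alternate else None,
        )]
    return [stmt]


# Placeholder statements; "label" is only there to tell them apart.
keep = {"type": "ExpressionStatement", "label": "y = 42"}
drop = {"type": "ExpressionStatement", "label": "y = 0"}

# if (true) { y = 42 } else { y = 0 }
if_stmt = {
    "type": "IfStatement",
    "test": {"type": "Literal", "value": True},
    "consequent": {"type": "BlockStatement", "body": [keep]},
    "alternate": {"type": "BlockStatement", "body": [drop]},
}
simplified = simplify(if_stmt)  # only the block containing `keep` remains
```

The surviving branch is kept as a block rather than flattened into the parent, so that block-scoped declarations (let, const) keep their original scope.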

Contributors

rmonat
