Git Product home page Git Product logo

vagrant-pallang's Introduction

Towards Vulnerability Discovery Using Staged Program Analysis

This is a vagrant box that demos Melange, a heuristics based program analysis tool that eagerly flags multiple bug classes in source code. It is vaguely like lint, very vaguely if you look into it closely. Try it!

Demo: Provisioning and running

Pre-requisites

On Host:

  • vagrant (Tested with v1.7.2)
  • virtualbox (Tested with v4.3.14)
  • C++ codebase to analyze, preferably something you are familiar with

For running stage 2 analysis in the box, you'll need

  • VT-x enabled on host
  • Sufficient RAM [4G RAM allocated to guest]
  • lscpu shows at least 2 CPUs

Note

If you don't intend to run the pass in the guest, it is safe to comment out the following portions of Vagrantfile. Besides, if your host architecture does not support VT-x and/or you don't have much compute power to spare, this is a must.

#  config.vm.provider "virtualbox" do |v|
#        v.name = "pallang-vm"
#        v.customize ["modifyvm", :id, "--cpuexecutioncap", "90"]
#        v.memory = 4096
#        v.cpus = 2
#  end

Command line

Clone box config

user@host:~$ git clone https://github.com/bshastry/vagrant-pallang.git
user@host:~$ cd vagrant-pallang

Next, bring up the box.

Bring up box and fetch demo tarball

user@host:<Vagrant-pallang-dir>$ vagrant up
## This gives you an ssh shell in the box with X11 forwarding
user@host:<Vagrant-pallang-dir>$ vagrant ssh
## Fetches demo code. Might take a while. Roughly 350M of data over the network.
vagrant@precise64:~$ /vagrant/scripts/fetch.sh

This will fetch the demo code into $HOME/demo of the vagrant box. At this point, the box is ready for analyzing code. The analysis is performed over two stages. In stage 1, source code analysis is performed alongside compilation. In stage 2, IR code is analyzed. More on this later.

Run Stage 1 analysis

First, clone a c++ repo of your choice.

## Clone something
vagrant@precise64:~$ mkdir code; cd code
vagrant@precise64:~/code$ git clone $something

## Configure or something like that
vagrant@precise64:~/code/something$ pscan-build ./configure

## Make or something like that
vagrant@precise64:~/code/something$ pscan-build make

Note that analysis+compilation is possibly an order of magnitude slower than native compilation only. Expect high latency for large codebases. If you want this to be less intrusive on host processes, lower the max cpu execution cap in Vagrantfile, like so:

#        v.customize ["modifyvm", :id, "--cpuexecutioncap", "20"]

In addition, use screen/tmux to manage analysis sessions in the box. This way, you can leave box while analysis happens, peeking into the box at your leisure via vagrant ssh from the root vagrant directory.

Output of analysis i.e., Clang SA bug reports will be placed in ./scan-build-out relative to $something directory. If you want stuff to persist on host PC after powering off vagrant box (via vagrant halt or vagrant destroy), do everything relative to /vagrant since that directory is explicitly synced with namesake on host prior to box poweroff.

Once analysis is complete, you can view an indexed summary of analysis results via scan-view:

vagrant@precise64:~/code/something$ scan-view ./scan-build-out/$DIRNAME

Next, you are ready to run second and final analysis against IR code. But before that you need to extract IR code from a native executable or library.

Run Stage 2 analysis

First, please locate analysis target, and then run extract-bc against it, like so:

vagrant@precise64:~/code/$something/PATH_TO_TARGET$ extract-bc -v target

This creates an LLVM module from the native library/executable. For the magic behind this, consult whole program llvm. Next, we run the second and final stage analysis.

vagrant@precise64:~/code/$something$ mkdir pallang; cd pallang
vagrant@precise64:~/code/$something$ pallang $PATH_TO_SCAN_BUILD_REPORTS $PATH_TO_BC_TARGET --wpa &

The filename pass.*.txt is going to contain pass results for bug reports analyzed. A summary of analysis will be printed to a namesake file once stage 2 analysis is complete.

Please note that LLVM analysis can be I/O and CPU intensive. Analysis of bug reports against huge LLVM modules (hundreds of MBs) require ~2 hours for analysis.

Box contents

  • precise64 box
    • vim, git, clang-3.6, llvm-3.6, gcc-4.8, make, firefox installed
    • analysis infra
      • prebuilt llvm and patched clang binaries
      • prebuilt analysis plug-in provisioned as a shared library
      • Source code for llvm-pass and pallang script

Citation

@inproceedings{shastry2016towards,
  title={Towards Vulnerability Discovery Using Staged Program Analysis},
  author={Shastry, Bhargava and Yamaguchi, Fabian and Rieck, Konrad and Seifert, Jean-Pierre},
  booktitle={13th International Conference, DIMVA 2016, San Sebasti{\'a}n, Spain, July 7-8, 2016, Proceedings},
  volume={97212016},
  number={Lecture Notes in Computer Science},
  pages={78--97},
  year={2016},
  organization={Springer International Publishing}
}

Acknowledgements

Misc

Archive

vagrant-pallang's People

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.