
jTrans

This repo is the official code of jTrans: Jump-Aware Transformer for Binary Code Similarity Detection.

Illustrating the performance of the proposed jTrans

News

  • [2022/7/7] We updated BinaryCorp with the original binaries.
  • [2022/6/18] We released the code and models of jTrans.
  • [2022/6/9] We released the preprocessing code and BinaryCorp, the dataset used in our paper.
  • [2022/5/26] jTrans is now on arXiv.

Get Started

Prerequisites

  • Linux (macOS and Windows are not currently officially supported)
  • Python 3.8+
  • PyTorch 1.10+
  • CUDA 10.2+
  • IDA Pro 7.5+ (only used for dataset processing)

Quick Start

a. Create a conda virtual environment and activate it.

conda create -n jtrans python=3.8 pandas tqdm -y
conda activate jtrans

b. Install PyTorch and other packages.

conda install pytorch cudatoolkit=11.0 -c pytorch
python -m pip install simpletransformers networkx pyelftools
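After installing, it can help to confirm which of the packages named in steps a and b are visible to the active interpreter. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the jTrans repository, and it assumes the conda `pytorch` package registers itself under the distribution name `torch`.

```python
from importlib import metadata

# Distribution names taken from the conda/pip commands above.
# Assumption: the conda package "pytorch" is visible as the distribution "torch".
REQUIRED = ["torch", "simpletransformers", "networkx", "pyelftools", "pandas", "tqdm"]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

Any entry reported as not installed points back to the corresponding `conda install` or `pip install` line.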

c. Get code and models of jTrans.

git clone https://github.com/vul337/jTrans.git && cd jTrans

Download experiments.tar.gz and models.tar.gz and extract them.

tar -xzvf experiments.tar.gz && tar -xzvf models.tar.gz
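If you prefer to script this step, the same extraction can be sketched in Python with the standard `tarfile` module. The helper `extract_archive` is a hypothetical name introduced here for illustration; it fails early when an archive has not been downloaded yet.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="."):
    """Extract a gzip-compressed tarball into dest and return the member names."""
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError(f"{archive} not found; download it first")
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Only extract archives obtained from a source you trust.
        tar.extractall(dest)
    return names
```

Calling `extract_archive("experiments.tar.gz")` and `extract_archive("models.tar.gz")` from the repository root mirrors the `tar` command above.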

d. Get the BinaryCorp dataset.

Download the processed dataset from this link.

e. Fine-tune new models on BinaryCorp.

python finetune.py -h

f. Evaluation

python eval_save.py -h
python fasteval.py -h

To evaluate jTrans on BinaryCorp-3M after extracting experiments.tar.gz, run:

python fasteval.py

g. Try jTrans on your own binaries

Make sure you have IDA Pro 7.5+ and follow the instructions in datautils. After extracting features from your binaries, you can run jTrans on them by following the usage shown in eval_save.py.
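Once each function has been mapped to an embedding vector, similarity detection reduces to ranking candidate functions by how close their embeddings are to a query. The sketch below assumes cosine similarity as the score and uses made-up three-dimensional vectors; the names `cosine` and `rank_candidates` are hypothetical and are not the interface of eval_save.py.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, pool):
    """Return (name, score) pairs from pool sorted by descending similarity to query."""
    scored = [(name, cosine(query, vec)) for name, vec in pool.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings, invented for illustration only.
pool = {
    "func_a": [1.0, 0.0, 0.0],
    "func_b": [0.6, 0.8, 0.0],
    "func_c": [0.0, 0.0, 1.0],
}
ranking = rank_candidates([1.0, 0.1, 0.0], pool)
print(ranking[0][0])  # func_a
```

In practice the vectors would come from the model rather than being written by hand, and the pool would hold one entry per candidate function.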

Dataset

  • We present a new large-scale and diversified dataset, BinaryCorp, for the task of binary code similarity detection.
  • The description of the dataset can be found here, and we give an example of using BinaryCorp.
  • If you need to use features that we do not provide in advance, such as call graphs, you can download the raw binaries from here.

Acknowledgement

This project would not be possible without multiple great open-source codebases. We list some notable examples below.

Bibtex

If this work or the BinaryCorp dataset is helpful for your research, please consider citing the following BibTeX entry.

@article{wang2022jtrans,
  title={jTrans: Jump-Aware Transformer for Binary Code Similarity},
  author={Wang, Hao and Qu, Wenjie and Katz, Gilad and Zhu, Wenyu and Gao, Zeyu and Qiu, Han and Zhuge, Jianwei and Zhang, Chao},
  journal={arXiv preprint arXiv:2205.12713},
  year={2022}
}
