chariot-testing's Introduction

Generating Python Mutants from Bug Fixes using Neural Machine Translation

This repository contains the code for the paper "Generating Python Mutants from Bug Fixes using Neural Machine Translation" by Sergen AŞIK and Uğur YAYAN.

Dataset

The Dataset folder contains the Python scripts used to create the dataset. The dataset is created in the following steps:

  • commit_to_diff.py (Extraction) downloads the diff files of the commits listed in the commit list file and saves them to the diffTexts folder. Each line of the commit list file holds a commit id and the URL of its repository, in the following format:
commit_id1,repository_url1
commit_id2,repository_url2
...
commit_idn,repository_urln
  • diff_to_tp.py (Transform) splits the text files that contain the commit diffs, one file at a time. For each file it calls the sep_file function to get the buggy and fixed versions of the code. Each buggy and fixed source code pair must receive the same number, so the script uses global variables to keep the numbering consistent.

  • edit_actions.py (Classifier) extracts the edit actions that turn the buggy code into the fixed code. The source code is translated to ASTs, and the actions are obtained with xml_diff. The actions are saved in the mutation_types folder.

  • abstraction.py (Abstraction) abstracts the source code to reduce the vocabulary size.

  • ex_abs_main.py is the main script of the dataset creation process. It calls the other scripts in the correct order.
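
The Transform, Classifier, and Abstraction steps above can be sketched roughly as follows. This is an illustration only, not the repository's code: the function names split_diff, classify_edit, and abstract_code are hypothetical, the classifier uses line-level difflib opcodes instead of the AST-based xml_diff named above, and the placeholder names VAR_n, NUM_n, and STR_n are assumptions.

```python
import difflib
import io
import keyword
import tokenize


def split_diff(diff_text):
    """Rebuild buggy (pre-fix) and fixed (post-fix) lines from a single-file unified diff."""
    buggy, fixed, in_hunk = [], [], False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False          # a new file header starts
        elif line.startswith("@@"):
            in_hunk = True           # hunk body follows
        elif not in_hunk:
            continue                 # skip "--- a/..." and "+++ b/..." headers
        elif line.startswith("-"):
            buggy.append(line[1:])   # removed by the fix: buggy side only
        elif line.startswith("+"):
            fixed.append(line[1:])   # added by the fix: fixed side only
        elif line.startswith(" "):
            buggy.append(line[1:])   # context line: present on both sides
            fixed.append(line[1:])
    return buggy, fixed


def classify_edit(buggy, fixed):
    """Label the buggy-to-fixed change as Insert, Delete, Update, or None."""
    opcodes = difflib.SequenceMatcher(None, buggy, fixed).get_opcodes()
    tags = {tag for tag, *_ in opcodes if tag != "equal"}
    if not tags:
        return "None"
    if tags == {"insert"}:
        return "Insert"
    if tags == {"delete"}:
        return "Delete"
    return "Update"                  # any replacement or mixed change


def abstract_code(source):
    """Replace identifiers and literals with numbered placeholders."""
    placeholders, counters, out = {}, {"VAR": 0, "NUM": 0, "STR": 0}, []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            kind = "VAR"
        elif tok.type == tokenize.NUMBER:
            kind = "NUM"
        elif tok.type == tokenize.STRING:
            kind = "STR"
        elif tok.type in (tokenize.NAME, tokenize.OP):
            out.append(tok.string)   # keywords and operators stay as they are
            continue
        else:
            continue                 # layout tokens and comments are dropped here
        key = (kind, tok.string)
        if key not in placeholders:
            counters[kind] += 1
            placeholders[key] = f"{kind}_{counters[kind]}"
        out.append(placeholders[key])
    return " ".join(out)


# Hypothetical sample diff, used only to exercise the sketch.
SAMPLE_DIFF = """diff --git a/calc.py b/calc.py
--- a/calc.py
+++ b/calc.py
@@ -1,2 +1,2 @@
 def area(w, h):
-    return w + h
+    return w * h
"""

buggy_lines, fixed_lines = split_diff(SAMPLE_DIFF)
print(classify_edit(buggy_lines, fixed_lines))
print(abstract_code("\n".join(fixed_lines) + "\n"))
```

Because the same identifier always maps to the same placeholder within one snippet, the abstracted code keeps its data flow while the vocabulary shrinks. Note that this sketch also abstracts built-in names; a real abstraction step would likely keep a list of names to preserve.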

Dataset Interface

Requirements

Python 3.8 or above is required. You can install the requirements using the following command:

pip install -r Dataset/requirements.txt

Usage

To start the dataset creation process, first create a commit list file in the format described in the Dataset section (one commit_id,repository_url pair per line). Then run:

python Dataset/main.py
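
As a rough illustration of how a commit list in that format could be read, here is a minimal sketch. The helper names read_commit_list and diff_url are hypothetical and not taken from the repository. The URL pattern assumes the repositories are hosted on GitHub, which serves a plain-text diff when ".diff" is appended to a commit URL; other hosts may differ.

```python
import csv
import io


def read_commit_list(handle):
    """Yield (commit_id, repository_url) pairs, skipping blank or short lines."""
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        yield row[0].strip(), row[1].strip()


def diff_url(commit_id, repo_url):
    """Build the diff URL for a commit (assumption: GitHub-style URLs)."""
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[:-4]             # drop a trailing ".git" from clone URLs
    return f"{base}/commit/{commit_id}.diff"


# Hypothetical commit list content; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "abc123,https://github.com/example/project\n"
    "\n"
    "def456,https://github.com/example/other.git\n"
)
for commit_id, repo_url in read_commit_list(sample):
    print(diff_url(commit_id, repo_url))
```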

Transformer Model

The Transformer is a neural network architecture that solves sequence-to-sequence problems using attention mechanisms. Here it translates fixed source code into buggy source code, producing the mutants. The model is implemented with the PyTorch library and trained on the dataset created in the previous step. Its architecture is shown in the following figure:

Figure: Transformer Architecture
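
To make the attention mechanism mentioned above concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, the core operation of the Transformer. It uses plain Python lists for readability and is not the repository's PyTorch implementation; the function names softmax and attention are chosen for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-length vectors."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output


# Two identical keys receive equal weight, so the result is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

In a translation setting, each output position attends over the encoded input tokens in this way, so the model can focus on the part of the fixed code that is relevant to the token it is about to emit.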

Requirements

Python 3.8 or above is required. You can install the requirements using the following command:

pip install -r Model/requirements.txt

Training

Let's see how to train the Transformer model from scratch using the code in this repo. First, download the dataset. The dataset folder contains the following subfolders:

  • Update: contains the update mutation type source and target files
  • Delete: contains the delete mutation type source and target files
  • Insert: contains the insert mutation type source and target files

The files are separated into two parts: Formatted and Unformatted. Formatted files use special tokens such as NEWLINE, INDENT, and DEDENT. Unformatted files do not use these special tokens. We used the formatted files in our experiments.
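
To illustrate what a formatted sequence might look like, the sketch below turns Python source into a flat token string with explicit NEWLINE, INDENT, and DEDENT markers. It relies on the standard tokenize module, which emits token types with exactly these names. Whether the dataset was produced this way is an assumption, and the function name to_formatted is hypothetical.

```python
import io
import tokenize

# Layout token types that become explicit marker words.
STRUCTURAL = {
    tokenize.NEWLINE: "NEWLINE",
    tokenize.INDENT: "INDENT",
    tokenize.DEDENT: "DEDENT",
}
# Token types left out of the sequence in this sketch.
SKIPPED = {tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}


def to_formatted(source):
    """Return the source as one space-separated token sequence with layout markers."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in STRUCTURAL:
            tokens.append(STRUCTURAL[tok.type])
        elif tok.type not in SKIPPED:
            tokens.append(tok.string)
    return " ".join(tokens)


print(to_formatted("def f(x):\n    return x\n"))
# prints: def f ( x ) : NEWLINE INDENT return x NEWLINE DEDENT
```

Keeping the markers lets a single-line token sequence preserve Python's block structure, which would otherwise be lost once the whitespace is flattened.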

Then train the model:

python Model/main.py

License

MIT
