CCT5: A Code-Change-Oriented Pre-Trained Model

We note that a recent work from NTU also focuses on pre-trained models for code changes. Because the two works were conducted at around the same time, we did not discuss that paper in our Related Work section. We apologize for the omission and encourage readers to look at that paper as well.

Getting Started

Requirements

    pytorch=2.0.0;
    torchvision=0.15.1;
    torchaudio;
    datasets==1.16.1;
    transformers==4.21.1;
    tensorboard==2.12.2;
    tree-sitter==0.19.1;
    nltk=3.8.1;
    scipy=1.10.1;

Install the above requirements manually or execute the following script:

bash scripts/setup.sh
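
If you install the requirements manually, a quick way to confirm that the environment matches the pinned versions is to compare them against what is actually installed. The following is a minimal sketch, not part of the repository. The distribution names are assumptions (PyTorch is looked up as "torch", and "tree-sitter" is looked up under that name), torchaudio is skipped because the list above gives no version for it, and the helper names are hypothetical.

```python
from importlib import metadata

# Pins copied from the requirements list above. The keys are assumed
# distribution names (e.g. PyTorch is installed as "torch").
PINNED = {
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "datasets": "1.16.1",
    "transformers": "4.21.1",
    "tensorboard": "2.12.2",
    "tree-sitter": "0.19.1",
    "nltk": "3.8.1",
    "scipy": "1.10.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # A local build tag such as "2.0.0+cu117" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the file prints one line per package whose installed version differs from the pin, and prints nothing when everything matches.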

Download and preprocess

  1. Download the dataset and models:
bash scripts/download.sh
  2. Prepare the dataset for pre-training [optional]:
bash scripts/prepare_dataset.sh

Pre-train the model [optional]

bash scripts/pre-train.sh -g [GPU_ID]

Task 1: Commit Message Generation

bash scripts/finetune_msggen.sh -g [GPU_ID] -l [cpp/csharp/java/javascript/python/fira]

The released checkpoint may perform better than reported in the paper. If evaluation during fine-tuning takes too long, you can lower the "--evaluate_sample_size" parameter, which sets the number of validation-set cases used during evaluation.
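
To illustrate what capping the evaluation set means, here is a minimal sketch of drawing a fixed-size, reproducible subset of validation cases. It is not the repository's implementation: the function name, the fixed seed, and the convention that a non-positive size means "use the whole set" are all assumptions made for this example.

```python
import random


def subsample_for_eval(examples, evaluate_sample_size, seed=0):
    """Return at most `evaluate_sample_size` examples, chosen reproducibly.

    A non-positive size, or one that is at least the size of the set,
    returns every example (an assumption of this sketch).
    """
    examples = list(examples)
    if evaluate_sample_size <= 0 or evaluate_sample_size >= len(examples):
        return examples
    # A fixed seed keeps the subset identical across evaluation rounds,
    # so scores from different checkpoints stay comparable.
    rng = random.Random(seed)
    return rng.sample(examples, evaluate_sample_size)
```

A smaller sample shortens each evaluation round, at the cost of a noisier estimate of validation performance.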

To evaluate the performance of a specific checkpoint, add the flag "-e" followed by the checkpoint path:

bash scripts/finetune_msggen.sh -g [GPU_ID] -l [cpp/csharp/java/javascript/python/fira] -e [path_to_model]

Note that if [path_to_model] is blank, this script will automatically evaluate our released checkpoint.

Task 2: Just-in-Time Comment Update

bash scripts/finetune_cup.sh -g [GPU_ID]

As in Task 1, to evaluate a specific checkpoint, add the flag "-e" followed by the checkpoint path.

Additionally, we have released the outputs of CCT5 and the baselines, stored in results/CommentUpdate. To evaluate a result file, execute the following script with its path as path_to_result_file:

bash scripts/eval_cup_res.sh --filepath [path_to_result_file]

Task 3: Just-in-Time Defect Prediction

Only semantic features:

Fine-tune:

bash scripts/finetune_jitdp_SF.sh -g [GPU_ID]

Evaluate:

bash scripts/finetune_jitdp_SF.sh -g [GPU_ID] -e [path_to_model]

Semantic features + expert features:

Fine-tune:

bash scripts/finetune_jitdp_SF_EF.sh -g [GPU_ID]

Evaluate:

bash scripts/finetune_jitdp_SF_EF.sh -g [GPU_ID] -e [path_to_model]

Task 4: Code Change Quality Estimation

Fine-tune:

bash scripts/finetune_QE.sh -g [GPU_ID]

Evaluate:

bash scripts/finetune_QE.sh -g [GPU_ID] -e [path_to_model]

Task 5: Review Generation

Fine-tune:

bash scripts/finetune_CodeReview.sh -g [GPU_ID]

Evaluate:

bash scripts/finetune_CodeReview.sh -g [GPU_ID] -e [path_to_model]
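
All five tasks share the same command-line pattern: "-g" selects the GPU, "-e" switches to evaluating a checkpoint, and Task 1 additionally takes "-l" for the dataset language. For scripted experiments, a small wrapper can assemble these commands. The sketch below is not part of the repository. It uses only the script paths and flags listed above, and the task keys and function names are hypothetical.

```python
import subprocess

# Script paths copied from the commands in this README; the dictionary
# keys are labels invented for this sketch.
SCRIPTS = {
    "msggen": "scripts/finetune_msggen.sh",
    "cup": "scripts/finetune_cup.sh",
    "jitdp_sf": "scripts/finetune_jitdp_SF.sh",
    "jitdp_sf_ef": "scripts/finetune_jitdp_SF_EF.sh",
    "qe": "scripts/finetune_QE.sh",
    "review": "scripts/finetune_CodeReview.sh",
}
MSGGEN_LANGS = {"cpp", "csharp", "java", "javascript", "python", "fira"}


def build_command(task, gpu_id, lang=None, checkpoint=None):
    """Build the argument list for one fine-tuning or evaluation run."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    cmd = ["bash", SCRIPTS[task], "-g", str(gpu_id)]
    if task == "msggen":
        # Only the commit message generation script documents "-l".
        if lang not in MSGGEN_LANGS:
            raise ValueError(f"lang must be one of {sorted(MSGGEN_LANGS)}")
        cmd += ["-l", lang]
    elif lang is not None:
        raise ValueError("-l is only documented for the msggen task")
    if checkpoint is not None:
        # "-e [path_to_model]" evaluates the checkpoint instead of fine-tuning.
        cmd += ["-e", checkpoint]
    return cmd


def run_task(task, gpu_id, lang=None, checkpoint=None):
    """Run the command from the repository root; raise if the script fails."""
    subprocess.run(build_command(task, gpu_id, lang, checkpoint), check=True)
```

For example, `build_command("msggen", 0, lang="java")` produces the same command as `bash scripts/finetune_msggen.sh -g 0 -l java`.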

Credit

We reused some code from open-source repositories. We would like to extend our gratitude to the following repositories:

  1. CodeT5
  2. CodeBERT
  3. NatGen

Citation

@inproceedings{lin2023cct5,
  title={CCT5: A Code-Change-Oriented Pre-Trained Model},
  author={Lin, Bo and Wang, Shangwen and Liu, Zhongxin and Liu, Yepang and Xia, Xin and Mao, Xiaoguang},
  booktitle={Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year={2023}
}
