CatIss is an intelligent tool for automatic categorization of issue reports based on the RoBERTa model.

License: Apache License 2.0

Jupyter Notebook 100.00%
nlp calssification transformers machine-learning natural-language-processing fine-tuning pretrained-models roberta-model software-engineering issue-management

CatIss

Good news! CatIss is the winner of the NLBSE'22 tool competition! 🥇 🏆 🎉

This repository contains the source code, notebooks, model, and datasets used for training CatIss, an intelligent tool for automatic categorization of issue reports based on the RoBERTa model. I first deduplicated, cleaned, and truncated the datasets provided for the NLBSE 2022 Tool Competition (held at the First International Workshop on Natural Language-based Software Engineering) [1], then fine-tuned RoBERTa for four epochs on the cleaned training set. CatIss achieves an F1-score of 87.2% (micro average) on the provided test set.
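The deduplicate, clean, and truncate steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the specific cleaning rules (dropping fenced code and URLs, lowercasing), the `MAX_TOKENS` limit, the field names `title`, `body`, and `label`, and the helper names are all assumptions made for the example.

```python
import re

# Hypothetical truncation length; the README does not state the real limit.
MAX_TOKENS = 200


def clean_text(text: str) -> str:
    """Apply assumed cleaning rules: drop fenced code and URLs, squeeze whitespace."""
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"https?://\S+", " ", text)                  # URLs
    text = re.sub(r"\s+", " ", text)                           # repeated whitespace
    return text.strip().lower()


def truncate(text: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep only the first max_tokens whitespace-separated tokens."""
    return " ".join(text.split()[:max_tokens])


def preprocess(issues):
    """Clean and truncate each issue, then drop empty and duplicate entries."""
    seen = set()
    cleaned = []
    for issue in issues:
        text = truncate(clean_text(f"{issue['title']} {issue['body']}"))
        key = (text, issue["label"])
        if not text or key in seen:
            continue  # skip empty texts and exact duplicates after cleaning
        seen.add(key)
        cleaned.append({"text": text, "label": issue["label"]})
    return cleaned
```

Deduplicating after cleaning, as in this sketch, also catches issues that differ only in whitespace or links. Whether the original pipeline orders the steps this way is not stated in this README.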

Shared Model and Data

The saved model and cleaned datasets are shared publicly on Google Drive:

https://drive.google.com/drive/folders/1jgV4U41-2acctpc6jH5DWL3fF5V6bKF8?usp=sharing

System Information

Experiments were conducted on a machine running 64-bit Ubuntu 16.04, with two GeForce RTX 2080 GPUs, an AMD Ryzen Threadripper 1920X CPU, and 64 GB of RAM. Training took four hours and 20 minutes. Note that preprocessing the datasets significantly reduces the training cost while maintaining prediction accuracy.

Tool Paper Abstract

This paper describes CatIss, an automatic Categorizer of Issue reports built upon the Transformer-based pre-trained RoBERTa model. CatIss classifies issue reports into three main categories: bug report, enhancement/feature request, and question. First, the datasets provided for the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset. Evaluating CatIss on more than 80 thousand issue reports from GitHub indicates that it performs very well, surpassing the competition baseline, TicketTagger [2, 3], and achieving an 87.2% F1-score (micro average). Additionally, as CatIss is trained on a wide set of repositories, it is a generic prediction model and is therefore applicable to any unseen software project or to projects with little historical data. Scripts for cleaning the datasets, training CatIss, and evaluating the model are publicly available. CatIss is based on our recent work published in the Empirical Software Engineering journal (EMSE), which I will also be presenting at the 44th International Conference on Software Engineering (ICSE'22) in the Journal First Track [4].
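To make the reported metric concrete, here is a small sketch of a micro-averaged F1-score over the three categories. It pools true positives, false positives, and false negatives across all classes before computing precision and recall. The label strings in `LABELS` and the function name are assumptions for illustration, and this is not the competition's evaluation script.

```python
# Assumed label strings for the three categories named in the abstract.
LABELS = ("bug", "enhancement", "question")


def micro_f1(y_true, y_pred, labels=LABELS):
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute F1 once."""
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1  # predicted this label, but the truth differs
            elif truth == label:
                fn += 1  # missed an instance of this label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = ["bug", "bug", "enhancement", "question"]
y_pred = ["bug", "enhancement", "enhancement", "question"]
print(micro_f1(y_true, y_pred))  # 0.75
```

When every issue has exactly one label and all labels are included, each misclassification counts as one false positive and one false negative, so the micro-averaged F1 equals plain accuracy (3 of 4 correct in the example).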

References

[1] Kallis, R., Chaparro, O., Di Sorbo, A., & Panichella, S. (2022). NLBSE'22 tool competition. In Proceedings of the 1st International Workshop on Natural Language-based Software Engineering (NLBSE'22).

[2] Kallis, R., Di Sorbo, A., Canfora, G., & Panichella, S. (2019, September). Ticket tagger: Machine learning driven issue classification. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) (pp. 406-409). IEEE.

[3] Kallis, R., Di Sorbo, A., Canfora, G., & Panichella, S. (2021). Predicting issue types on GitHub. Science of Computer Programming, 205, 102598.

[4] Izadi, M., Akbari, K., & Heydarnoori, A. (2022). Predicting the objective and priority of issue reports in software repositories. Empirical Software Engineering, 27(2), 1-37.

How to Cite

If you use CatIss in your work, please cite it as follows:

Izadi, M., CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers, In Proceedings of The 1st International Workshop on Natural Language-based Software Engineering (NLBSE'22), page (to appear), 2022.
