unsupervised-nlp-research-project's Introduction

Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology.

This repository contains the code and other files used in my bachelor's thesis.

Abstract: With the ever-growing number of published scientific articles, it is increasingly challenging for researchers to find, review and use relevant research. This study explores the potential of unsupervised text classification models, specifically a zero-shot classification model (GPTNLI) and a similarity-based classification model (Lbl2vec), to streamline the literature review process. These models predict an article's methodological approach from basic information such as the title, keywords and abstract, which can then serve as an additional filter in scientific database searches. To accomplish this, an extensive and well-structured definition is established for each class. The findings demonstrate that the GPTNLI model using GPT-4 outperforms the other models in accuracy and F1 score while showing less variability in its performance. A binomial test shows that the model performs significantly better than a random-guess strategy. Although these results are promising, the study has limitations, for instance the small test datasets and the lack of a cost-benefit analysis. Future research could improve the models' performance by incorporating more sections of each article, further fine-tuning, and adding self-learning capabilities.
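A similarity-based classifier such as Lbl2vec places documents and class labels in a shared embedding space and assigns each document the label whose vector is most similar to it by cosine similarity. The sketch below shows only that final assignment step, assuming the embeddings already exist. The class names and vectors are made-up toy values for illustration, not output of Lbl2vec or the thesis code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_label(doc_vec: list[float], label_vecs: dict[str, list[float]]) -> tuple[str, dict[str, float]]:
    """Return the label with the highest cosine similarity, plus all scores."""
    scores = {label: cosine(doc_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical methodology classes and toy 3-dimensional label embeddings.
label_vecs = {
    "quantitative": [0.9, 0.1, 0.0],
    "qualitative": [0.1, 0.9, 0.1],
    "mixed methods": [0.5, 0.5, 0.2],
}

# Toy embedding of one article (e.g. built from its title, keywords and abstract).
doc_vec = [0.8, 0.2, 0.1]

label, scores = assign_label(doc_vec, label_vecs)
print(label)
print({k: round(s, 3) for k, s in scores.items()})
```

Because the decision is a plain nearest-label lookup, no labelled training data is needed. Classification quality therefore depends on how well the label vectors capture each class definition.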
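Comparing a classifier against random guessing with a binomial test amounts to asking how likely it is to get at least the observed number of correct predictions if each prediction were right only with probability 1/k (k classes). Below is a minimal one-sided exact test in plain Python. The test-set size, number of correct predictions and number of classes are hypothetical values chosen for illustration, not figures from the thesis. SciPy's `scipy.stats.binomtest(k, n, p, alternative="greater")` computes the same p-value.

```python
from math import comb


def binomial_test_greater(successes: int, trials: int, p_null: float) -> float:
    """One-sided exact binomial test: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical setup: 3 methodology classes, so random guessing is right 1/3 of the time.
n_articles = 30  # hypothetical test-set size
n_correct = 18   # hypothetical number of correct predictions
p_random = 1 / 3

p_value = binomial_test_greater(n_correct, n_articles, p_random)
print(f"p = {p_value:.4g}")
print("better than random at alpha = 0.05" if p_value < 0.05 else "not significant at alpha = 0.05")
```

A one-sided alternative fits this question because the hypothesis is that the model is better than random, not merely different from it.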

tareksakhi / unsupervised-nlp-research-project