Fine-tuning and Prompt-learning on Commonsense Causal Reasoning (CCR)

The second project of EPFL Machine learning course (CS-433). We choose the ML4Science option for the machine learning project.

Team Members

Yiyang Feng: [email protected]
Naisong Zhou: [email protected]
Yuheng Lu: [email protected]

Abstract

Commonsense Causal Reasoning (CCR) aims to understand and reason about the cause-and-effect relationships in the world. The COPA dataset is widely used to evaluate the performance of systems in CCR tasks. In this project, we define the COPA CCR task into two sub-tasks: the original classification task and the cause/effect generation task. We then implement fine-tuning models and the prompt learning model GPT-3 on these sub-tasks. Finally, we compare the performance between these models, and the results have shown that these models learn some commonsense causal relationship, and the performance of GPT-3 with prompt learning is significantly better on both tasks. We analyze the superior performance of GPT-3 may be due to more of its large number of model parameters and massive pre-trained dataset than prompt learning itself. However, the ability of few-shot learning is still important for its efficiency in down-stream adaptation.

Install requirements

conda create -n mlproject2 python=3.8 jupyterlab=3.2 numpy=1.23.5 transformers=4.20.1 tqdm=4.64.1 evaluate=0.3.0 pandas=1.5.2 wandb=0.12.21 datasets=2.7.1
conda activate mlproject2

For the PyTorch environment, just reference the website and choose the suitable version according your cuda version.

Dataset

COPA, short for Choice Of Plausible Alternatives, is a widely used dataset in evaluating performance in CCR tasks. Here is an example for illustration:

It has five fields. One premise, two choices, one question and one label. The original COPA task is just a binary classification task.

We don't have to download the dataset manually, just import from the class datasets.

from datasets import load_dataset
copa = load_dataset("super_glue", "copa")

Task Definition

We divide the CCR task into two sub-tasks: text classification and generation. Here is the illustration of the two subtasks.

classification	generation

Experiment Settings

We study the fine-tuning and prompt-learning method for these tasks, the two main methods for current language models.

Classification Task

We compare bert-base-uncased, roberta-base, xlm-roberta-base, albert-base-v2, albert-large-v2, and GPT-3-text-davinci-003.

For fine-tuning models, run:

sh copa_classification.sh

For prompt-learning models, run the notebook copa_classification_prompt_learning.ipynb. Add a key api_key.txt in the main folder for storing the OpenAI API key.

Result:

Italics: best fine-tuning performance
Bold: best performance

	BERT -base -uncased	RoBERTa -base	XLM -RoBERTa -base	ALBERT -base -v2	ALBERT -large -v2	GPT-3 -175B
Accuracy	74.6±1.2	68.4±1.2	64.2±1.6	69.6±0.8	74.6±1.2	92.0±0.6
Precision	74.3±1.2	68.4±1.3	66.1±2.3	69.6±1.0	74.4±1.2	92.4±0.7
Recall	74.4±1.3	68.5±1.3	65.4±1.1	69.8±1.0	74.3±1.2	91.6±0.6
F1-score	74.4±1.2	68.3±1.6	64.1±1.5	69.5±0.9	74.3±1.2	91.8±0.6

Generation Task

We compare bert-base-uncased, roberta-base, xlm-roberta-base, and bart-base.

For fine-tuning models, run:

sh copa_generation.sh

For prompt-learning models, run the notebook copa_generation_prompt_learning.ipynb. Add a key api_key.txt in the main folder as above.

Result:

Italics: best fine-tuning performance
Bold: best performance

	BERT -base -uncased	RoBERTa -base	XLM -RoBERTa -base	BART -base	GPT-3 -175B
BLEU-1	2.0%	26.7%	0	31.2%	39.2%
BLEU-2	0.5%	5.9%	0	11.8%	23.3%
BLEU-3	0	0	0	6.2%	14.3%
BLEU-4	0	0	0	3.5%	14.8%
METEOR	8.9%	16.5%	0.8%	23.3%	24.0%
ROUGE-L	4.3%	15.0%	0.8%	22.8%	15.1%
CIDEr	0.001	0.109	0.003	0.399	0.892

Files

copa_classification.py : CCR classification script for the fine-tuning models.
copa_generation.py : CCR generation script for the fine-tuning models.
copa_classification.sh : configuration file for the fine-tuning models on CCR classification. comments in the shell script show how to use the script.
copa_generation.sh : configuration file for the fine-tuning models on CCR generation. comments in the shell script show how to use the script.
report.pdf : our report for this project.
copa_classification_prompt_learning.ipynb : CCR classification notebook for the prompt-learning models.
copa_generation_prompt_learning.ipynb : CCR generation notebook for the prompt-learning models.
api_key.txt: your secret OpenAI API key for running the GPT-3 model.
cider : folder for CIDEr score. (no supported in the official libraries.)
- cider_scorer.py : cider score computation.
- cider.py : define a class for computing cider score.

cs-433 / ml-project-2-404notfound2 Goto Github PK

ml-project-2-404notfound2's Introduction

Fine-tuning and Prompt-learning on Commonsense Causal Reasoning (CCR)

Team Members

Abstract

Install requirements

Dataset

Task Definition

Experiment Settings

Classification Task

Generation Task

Files

ml-project-2-404notfound2's People

Contributors

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent