Code repository for Active PETs. This repository would not be possible without the previous open-source projects ALPS and PET.
Our main contribution is a weighted ensemble of PETs, which is used to actively sample the most beneficial examples from an unlabelled pool.
This readme mainly describes how to use the code. To reproduce the experiments reported in the paper, please see the readme file under `scripts`.
- Create a virtual environment with Python 3.7+
- Run the following command:

```shell
pip install -r requirements.txt
```
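For example, one common way to set this up (assuming `python3` 3.7+ is on your PATH; the environment name `venv` is just a convention):

```shell
python3 -m venv venv              # create the virtual environment
source venv/bin/activate          # activate it
pip install -r requirements.txt   # install the repository's dependencies
```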
The repository is organized into the following subfolders:

- `data`: folder for datasets
- `src_pet`: source code for simulating active learning
- `pet`: core code for PETs
- `scripts`: scripts for running experiments
- `pets`: saved models from running experiments
- `results`: results of active learning experiments

All commands below should be run from the top-level directory `activepets`.
To simply train a PET model on the full training dataset, run

```shell
bash scripts/train.sh
```

After training, the model will be saved under a subdirectory called `base` in the `pets` directory. Results on the dev set will be saved in `eval_results.txt`.
You may modify the parameters (such as model type, task, and seed) in `scripts/train.sh` by configuring the variables at the top of the script.
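For illustration, the configurable block at the top of `scripts/train.sh` might look roughly like this (the variable names and values below are hypothetical; check the script itself for the actual ones):

```shell
# Hypothetical configuration sketch -- not the script's actual contents.
MODEL_TYPE=roberta-base   # pretrained model to fine-tune
TASK_NAME=<your-task>     # dataset under data/
SEED=42                   # random seed for reproducibility
```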
To simulate active learning with various strategies, without the ensemble, run

```shell
bash scripts/active_train.sh
```

This script samples data for a fixed number of iterations and fine-tunes the model on the sampled data at each iteration. Each fine-tuned model is saved under a subdirectory called `{strategy}_{size}`, where `strategy` is the active learning strategy used to sample data and `size` is the number of examples used to fine-tune the model. Results on the dev set will be saved in `eval_results.txt`.
To modify the parameters in `scripts/active_train.sh`, configure the variables at the top of the script. Please read the instructions below for more information.
To simulate active learning with the PET ensemble, run

```shell
bash scripts/active_commitee.sh
```

Here are the naming conventions for the strategies from the paper:

- Random sampling: `rand`
- BADGE: `badge`
- CAL: `cal`
- ALPS: `alps`
- Active-PETs: `activepets`
So, whenever you want to use Active-PETs, pass `activepets` as input to the commands above.
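For example, a committee run with Active-PETs might be set up as follows (the `STRATEGY` variable name is an assumption for illustration; check the top of `scripts/active_commitee.sh` for the variable actually used to select the strategy):

```shell
# Hypothetical: edit the strategy variable at the top of the script,
# choosing one of rand, badge, cal, alps, activepets:
#   STRATEGY=activepets
# then run:
bash scripts/active_commitee.sh
```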
To set the number of examples sampled at each iteration, configure the variable `INCREMENT`. To set the maximum total number of examples sampled, configure the variable `MAX_SIZE`. The number of iterations is `MAX_SIZE / INCREMENT`.
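As a quick sanity check of that arithmetic (the values below are illustrative, not the scripts' defaults):

```shell
# Illustrative values; the real defaults are set at the top of
# scripts/active_train.sh and scripts/active_commitee.sh.
INCREMENT=50                           # examples sampled per iteration
MAX_SIZE=500                           # total examples to sample
ITERATIONS=$((MAX_SIZE / INCREMENT))   # integer division in bash
echo "iterations: $ITERATIONS"
```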
I am interested in extending this work in various ways and welcome any collaboration; for example, there is plenty of room to improve its efficiency.

Xia Zeng ([email protected])