
SAS: Self-Augmentation Strategy for Language Model Pre-training

This repository contains the official PyTorch implementation of the paper "SAS: Self-Augmentation Strategy for Language Model Pre-training", based on Huggingface Transformers version 4.3.0.

Only SAS without the disentangled attention mechanism has been released so far; the remaining components are to be updated.


File structure

  • train.py: The file for pre-training.
  • run_glue.py: The file for finetuning.
  • models
    • modeling_sas.py: The main algorithm for the SAS.
    • trainer_sas.py: Inherited from the Huggingface Transformers Trainer, modified mainly for data processing.
  • utils: It includes all the utilities.
    • data_collator_sas.py: It includes the details about self-augmentations.
  • The remaining files are supporting utilities.
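
The collator's core idea in SAS is that one network serves as both generator and discriminator: masked tokens are filled with the model's own MLM samples, and the same model is then trained to detect which tokens were replaced. The toy collation step below illustrates only the shape of that data flow; it is a hypothetical sketch (random replacements stand in for MLM samples), not the logic in data_collator_sas.py:

```python
import random

def self_augment(tokens, vocab, mask_rate=0.15, mask_token="[MASK]"):
    """Toy self-augmentation: mask positions, fill them with sampled
    replacements, and emit replaced-token-detection labels."""
    random.seed(0)  # deterministic for illustration only
    positions = [i for i in range(len(tokens)) if random.random() < mask_rate]

    masked = list(tokens)     # generator (MLM) input
    augmented = list(tokens)  # discriminator (RTD) input
    for i in positions:
        masked[i] = mask_token
        # SAS samples from the model's own MLM head; we sample randomly here.
        augmented[i] = random.choice(vocab)

    labels = [int(a != t) for a, t in zip(augmented, tokens)]  # 1 = replaced
    return masked, augmented, labels
```

In the real collator the replacement distribution comes from the model itself, so the discriminator's task gets harder as pre-training progresses.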

How to

Download and Install

  • Clone this repository.
  • Download the wiki-corpus dataset and store it in the data folder. Currently we only provide a trial dataset of 1 million sentences; the full dataset can be pre-processed following BERT. Details to be released.
  • (Optional) Create an environment via conda from the provided environment.yml
    • Alternatively, install the packages manually:
      • Python==3.9, pytorch==1.10.0, transformers==4.3.0, etc.
    # Clone package
    git clone git@github.com:fei960922/SAS-Self-Augmentation-Strategy.git
    cd SAS-Self-Augmentation-Strategy

    # Establish the environment.
    conda env create -f environment.yml 
    conda activate cssl

    # Download dataset and checkpoint
    wget http://www.stat.ucla.edu/~yifeixu/sas/wiki_corpus_1M.npy
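
After downloading, the corpus can be sanity-checked with NumPy. The round-trip below is a hypothetical illustration of the .npy format with a toy token-id array; the real layout of wiki_corpus_1M.npy is defined by the repo's preprocessing:

```python
import os
import tempfile
import numpy as np

# Toy stand-in for wiki_corpus_1M.npy: two "sentences" of four token ids each.
toy = np.array([[101, 2023, 2003, 102], [101, 7592, 999, 102]], dtype=np.int64)
path = os.path.join(tempfile.gettempdir(), "toy_corpus.npy")
np.save(path, toy)

# mmap_mode="r" avoids loading a large corpus into memory at once.
corpus = np.load(path, mmap_mode="r")
print(corpus.shape)  # (2, 4)
```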

Train from scratch

    # Run default setting 
    bash script/pretrain.sh

    # Run custom setting
    python train.py

    # Starting from checkpoint 
    python train.py --start_from_checkpoint 1 --pretrain_path {PATH_TO_CHECKPOINT}
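
The checkpoint command above suggests an argument interface along these lines; a hypothetical sketch (only the flag names come from the command line above — defaults, types, and help strings are assumptions, not the repo's actual train.py):

```python
import argparse

parser = argparse.ArgumentParser(description="SAS pre-training (sketch)")
parser.add_argument("--start_from_checkpoint", type=int, default=0,
                    help="set to 1 to resume from a saved checkpoint")
parser.add_argument("--pretrain_path", type=str, default=None,
                    help="directory containing the checkpoint to resume from")

# Parse the flags from the README command (path is illustrative).
args = parser.parse_args(["--start_from_checkpoint", "1",
                          "--pretrain_path", "checkpoints/sas-small"])
print(args.start_from_checkpoint, args.pretrain_path)
```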

Calculate GLUE scores

    # Running this script automatically downloads the GLUE dataset.
    bash finetune.sh MNLI 0 sas-base output_dir 5e-5 32 4 42
    bash finetune.sh MNLI 0 sas-small output_dir 1e-4 32 4 42

Pre-trained models

| Model | Description | Download |
| --- | --- | --- |
| SAS_small | SAS with the same architecture as the ELECTRA-small discriminator | SAS_small |
| SAS_DA_small | SAS_small with disentangled attention | SAS_DA_small |
| SAS_DA_base | SAS with the same architecture as the ELECTRA-base discriminator, plus disentangled attention | SAS_DA_base |

GLUE Scores (Dev)

| Model | MNLI | QQP | QNLI | SST2 | CoLA | STSB | MRPC | RTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAS_small | 81.82 | 90.14 | 89.21 | 90.13 | 61.33 | 87.55 | 87.25 | 66.06 |
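
A common single-number summary of GLUE results (not reported in the table itself) is the unweighted average of the dev scores, which for SAS_small works out as follows:

```python
# SAS_small dev scores from the table above.
scores = {"MNLI": 81.82, "QQP": 90.14, "QNLI": 89.21, "SST2": 90.13,
          "CoLA": 61.33, "STSB": 87.55, "MRPC": 87.25, "RTE": 66.06}

avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 81.69
```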


Issues

Downloaded model-type=deberta but generated model-type=sas

I generated a model using your pretraining script, but its config differs significantly from the model downloaded from your GitHub. The SAS_DA_base model has model_type "deberta" and architecture "SADebertaForPretraining". I downloaded this model and used it as a checkpoint for the pretraining script. The model output by the pretraining script has model_type "sas" and architecture "SasForPreTraining".

When I try to load the second one into Huggingface, it reports that there is no match for model type "sas". If I use the deberta model type instead, I get the warning "You are using a model of type sas to instantiate a model of type deberta. This is not supported for all configurations of models and can yield errors." followed by a list of unused weights (apparently all of them).

Why is the model generated with the pretraining script different from the model posted on the Github page?

How do you load a model of type "sas" with Huggingface?

SAS_DA_base config:
    {
        "architectures": [
            "SADebertaForPretraining"
        ],
        "attention_probs_dropout_prob": 0.1,
        "embedding_size": 768,
        "hidden_act": "gelu",
        "hidden_dropout_prob": 0.1,
        "hidden_size": 768,
        "initializer_range": 0.02,
        "intermediate_size": 3072,
        "layer_norm_eps": 1e-07,
        "max_position_embeddings": 512,
        "max_relative_positions": -1,
        "model_type": "deberta",
        "num_attention_heads": 12,
        "num_hidden_layers": 12,
        "pad_token_id": 0,
        "pooler_dropout": 0,
        "pooler_hidden_act": "gelu",
        "pooler_hidden_size": 768,
        "pos_att_type": [
            "c2p",
            "p2c"
        ],
        "position_biased_input": false,
        "relative_attention": true,
        "type_vocab_size": 0,
        "vocab_size": 30522
    }

Config output by the pretraining script:
    {
        "absolute_position_embedding": 1,
        "architectures": [
            "SasForPreTraining"
        ],
        "attention_probs_dropout_prob": 0.1,
        "augmentation_copies": 1,
        "augmentation_temperature": 1,
        "cold_start_epochs": 1.0,
        "debug_config": {
            "debugActivationInterval": 100000000,
            "debugExtraMetrics": 1,
            "debugGradOverflowInterval": 100,
            "debugMemStatsInterval": 1000,
            "debugMultiTasksConflictInterval": 1000,
            "logging_steps": 200
        },
        "dis_weight": "50-50",
        "dis_weight_scheduler": 4,
        "dynamic_masking": 0,
        "embedding_size": 768,
        "gen_weight": 1,
        "hidden_act": "gelu",
        "hidden_dropout_prob": 0.1,
        "hidden_size": 768,
        "initializer_range": 0.02,
        "intermediate_size": 3072,
        "layer_norm_eps": 1e-12,
        "max_position_embeddings": 128,
        "model_type": "sas",
        "num_attention_heads": 12,
        "num_hidden_layers": 12,
        "pad_token_id": 0,
        "position_embedding_type": [
            "absolute"
        ],
        "relative_position_embedding": 0,
        "summary_activation": "gelu",
        "summary_last_dropout": 0.1,
        "summary_type": "first",
        "summary_use_proj": true,
        "transformers_version": "4.3.0",
        "type_vocab_size": 2,
        "vocab_size": 30522
    }
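
The mismatch is easiest to see by diffing the two configs programmatically; a small sketch using a subset of the fields shown above:

```python
def diff_configs(a, b):
    """Return keys whose values differ between two config dicts
    (a missing key shows up as None on that side)."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

# Subsets of the downloaded (deberta) and generated (sas) configs above.
downloaded = {"model_type": "deberta", "max_position_embeddings": 512,
              "layer_norm_eps": 1e-07, "relative_attention": True}
generated = {"model_type": "sas", "max_position_embeddings": 128,
             "layer_norm_eps": 1e-12, "dynamic_masking": 0}

for key, (old, new) in diff_configs(downloaded, generated).items():
    print(f"{key}: {old} -> {new}")
```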
