
fast-bert's Introduction

Fast-Bert

License: Apache 2.0 · PyPI version · Python 3.6, 3.7

New - Learning Rate Finder for Text Classification Training (borrowed with thanks from https://github.com/davidtvs/pytorch-lr-finder)

Supports LAMB optimizer for faster training. Please refer to https://arxiv.org/abs/1904.00962 for the paper on LAMB optimizer.

Supports BERT and XLNet for both Multi-Class and Multi-Label text classification.

Fast-Bert is a deep learning library that allows developers and data scientists to train and deploy BERT and XLNet based models for natural language processing tasks, beginning with text classification.

FastBert is built on the solid foundations provided by the excellent Hugging Face BERT PyTorch library, is inspired by fast.ai, and strives to make cutting-edge deep learning accessible to the vast community of machine learning practitioners.

With FastBert, you will be able to:

  1. Train (more precisely fine-tune) BERT, RoBERTa and XLNet text classification models on your custom dataset.

  2. Tune model hyper-parameters such as epochs, learning rate, batch size, optimiser schedule and more.

  3. Save and deploy trained model for inference (including on AWS Sagemaker).

Fast-Bert supports both multi-class and multi-label text classification for the models listed below; in due course it will also support other NLU tasks such as Named Entity Recognition, Question Answering and custom corpus fine-tuning.

  1. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

  2. XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov and Quoc V. Le.

  3. RoBERTa (from Facebook), a Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du et al.

  4. DistilBERT (from HuggingFace), released together with the blogpost Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT by Victor Sanh, Lysandre Debut and Thomas Wolf.

Installation

This repo is tested on Python 3.6+.

With pip

Fast-Bert can be installed with pip as follows:

pip install fast-bert

From source

Clone the repository and run:

pip install [--editable] .

or

pip install git+https://github.com/kaushaltrivedi/fast-bert.git

You will also need to install NVIDIA Apex.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Usage

Text Classification

1. Create a DataBunch object

The DataBunch object takes training, validation and test CSV files and converts the data into an internal representation for BERT, RoBERTa, DistilBERT or XLNet. It also instantiates the correct data loaders based on the device profile, batch_size and max_seq_length.

from fast_bert.data_cls import BertDataBunch

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='bert')

File format for train.csv and val.csv

index text label
0 Looking through the other comments, I'm amazed that there aren't any warnings to potential viewers of what they have to look forward to when renting this garbage. First off, I rented this thing with the understanding that it was a competently rendered Indiana Jones knock-off. neg
1 I've watched the first 17 episodes and this series is simply amazing! I haven't been this interested in an anime series since Neon Genesis Evangelion. This series is actually based off an h-game, which I'm not sure if it's been done before or not, I haven't played the game, but from what I've heard it follows it very well pos
2 This movie is nothing short of a dark, gritty masterpiece. I may be biased, as the Apartheid era is an area I've always felt for. pos

If the column names differ from the usual text and label, provide them via the DataBunch text_col and label_col parameters, as shown in the sketch below.
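For example, a minimal sketch assuming hypothetical column names review_body and sentiment (all other arguments mirror the call above):

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='review_body',   # hypothetical name of the text column
                          label_col='sentiment',    # hypothetical name of the label column
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='bert')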

labels.csv will contain a list of all unique labels. In this case the file will contain:

pos
neg

For multi-label classification, labels.csv will contain all possible labels:

toxic
severe_toxic
obscene
threat
insult
identity_hate

The file train.csv will then contain one column for each label, with each column value being either 0 or 1. Don't forget to change multi_label=True for multi-label classification in BertDataBunch.

id text toxic severe_toxic obscene threat insult identity_hate
0 Why the edits made under my username Hardcore Metallica Fan were reverted? 0 0 0 0 0 0
0 I will mess you up 1 0 0 1 0 0

label_col will be a list of label column names. In this case it will be:

['toxic','severe_toxic','obscene','threat','insult','identity_hate']
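
Putting it together, a minimal multi-label sketch for the toxic-comments example above; it reuses the BertDataBunch signature shown earlier, with multi_label=True and label_col set to the list of label columns:

label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col=label_cols,      # one column per label in train.csv
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=True,          # switch on multi-label mode
                          model_type='bert')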

Tokenizer

You can either create a tokenizer object and pass it to DataBunch or you can pass the model name as tokenizer and DataBunch will automatically download and instantiate an appropriate tokenizer object.

For example, to use the XLNet base cased model, set the tokenizer parameter to 'xlnet-base-cased'. The DataBunch will then automatically download and instantiate an XLNetTokenizer with the vocabulary for the xlnet-base-cased model.
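
A sketch of passing a pre-built tokenizer object instead of a name; the XLNetTokenizer import is shown from the Hugging Face transformers package, which may be pytorch_transformers instead, depending on the version installed alongside fast-bert:

from transformers import XLNetTokenizer  # or: from pytorch_transformers import XLNetTokenizer

# Build the tokenizer explicitly and hand the object to the DataBunch
xlnet_tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer=xlnet_tokenizer,   # tokenizer object instead of a model name
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='xlnet')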

Model Type

Fast-Bert supports XLNet, RoBERTa and BERT based classification models. Set the model_type parameter to 'bert', 'roberta' or 'xlnet' in order to initialise the appropriate DataBunch object.
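
For instance, a RoBERTa DataBunch pairs model_type='roberta' with a matching tokenizer name (a sketch reusing the paths and files from the example above):

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='roberta-base',   # tokenizer name from the same model family
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='roberta')       # must match the chosen tokenizer/model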

2. Create a Learner Object

BertLearner is the ‘learner’ object that holds everything together. It encapsulates the key logic for the lifecycle of the model such as training, validation and inference.

The learner object takes the databunch created earlier as input, along with some other parameters such as the location of one of the pretrained models, FP16 training, and the multi_gpu and multi_label options.

The learner class contains the logic for the training loop, validation loop, optimiser strategies and key metrics calculation. This helps developers focus on their custom use cases without worrying about these repetitive activities.

At the same time, the learner object is flexible enough to be customised, either through its parameters or by creating a subclass of BertLearner and redefining the relevant methods.

from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy
import logging
import torch

logger = logging.getLogger()
device_cuda = torch.device("cuda")
metrics = [{'name': 'accuracy', 'function': accuracy}]

learner = BertLearner.from_pretrained_model(
						databunch,
						pretrained_path='bert-base-uncased',
						metrics=metrics,
						device=device_cuda,
						logger=logger,
						output_dir=OUTPUT_DIR,
						finetuned_wgts_path=None,
						warmup_steps=500,
						multi_gpu=True,
						is_fp16=True,
						multi_label=False,
						logging_steps=50)
parameter description
databunch DataBunch object created earlier
pretrained_path directory containing the pretrained model files, or the name of one of the pretrained models, e.g. bert-base-uncased, xlnet-large-cased, etc.
metrics list of metric functions that you want the model to calculate on the validation set, e.g. accuracy, fbeta, etc.
device torch.device of type cuda or cpu
logger logger object
output_dir directory where the model saves trained artefacts, tokenizer vocabulary and TensorBoard files
finetuned_wgts_path location of a fine-tuned language model (experimental feature)
warmup_steps number of training warm-up steps for the scheduler
multi_gpu whether multiple GPUs are available, e.g. if running on an AWS p3.8xlarge instance
is_fp16 FP16 training
multi_label multi-label classification
logging_steps number of steps between each TensorBoard metrics calculation. Set it to 0 to disable TensorBoard logging. Setting this value too low will slow down training, as the model is evaluated each time the metrics are logged

3. Find the optimal learning rate

The learning rate is one of the most important hyperparameters for model training. We have incorporated the learning rate finder that was proposed by Leslie Smith and later built into the fastai library.

learner.lr_find(start_lr=1e-5, optimizer_type='lamb')

The code is heavily borrowed from David Silva's pytorch-lr-finder library.

Learning rate range test

4. Train the model

learner.fit(epochs=6,
			lr=6e-5,
			validate=True, 	# Evaluate the model after each epoch
			schedule_type="warmup_cosine",
			optimizer_type="lamb")

Fast-Bert now supports the LAMB optimizer. Due to the speed of training, we have set LAMB as the default optimizer. You can switch back to AdamW by setting optimizer_type to 'adamw', as shown below.
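
For example, the same fit call with AdamW instead of the default LAMB:

learner.fit(epochs=6,
            lr=6e-5,
            validate=True,
            schedule_type="warmup_cosine",
            optimizer_type="adamw")   # switch back from LAMB to AdamW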

5. Save trained model artifacts

learner.save_model()

Model artefacts will be persisted in the output_dir/'model_out' path provided to the learner object. The following files will be persisted:

File name description
pytorch_model.bin trained model weights
spiece.model sentence tokenizer vocabulary (for xlnet models)
vocab.txt wordpiece tokenizer vocabulary (for bert models)
special_tokens_map.json special tokens mappings
config.json model config
added_tokens.json list of new tokens

As the model artefacts are all stored in the same folder, you will be able to instantiate the learner object to run inference by pointing pretrained_path to this location.
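
For example, a sketch that reloads the saved artefacts for inference; it reuses the databunch, metrics, device_cuda, logger and OUTPUT_DIR objects defined earlier:

# Point pretrained_path at the folder written by learner.save_model()
trained_model_path = OUTPUT_DIR/'model_out'

learner = BertLearner.from_pretrained_model(
                        databunch,
                        pretrained_path=trained_model_path,
                        metrics=metrics,
                        device=device_cuda,
                        logger=logger,
                        output_dir=OUTPUT_DIR,
                        finetuned_wgts_path=None,
                        warmup_steps=500,
                        multi_gpu=True,
                        is_fp16=True,
                        multi_label=False,
                        logging_steps=50)

predictions = learner.predict_batch(['sample text to classify'])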

6. Model Inference

If you already have a Learner object with a trained model instantiated, just call the predict_batch method on the learner object with the list of text data:

texts = ['I really love the Netflix original movies',
		 'this movie is not worth watching']
predictions = learner.predict_batch(texts)

If you have a persisted trained model and just want to run inference on it, use the second approach, i.e. the predictor object.

from fast_bert.prediction import BertClassificationPredictor

MODEL_PATH = OUTPUT_DIR/'model_out'

predictor = BertClassificationPredictor(
				model_path=MODEL_PATH,
				label_path=LABEL_PATH, # location for labels.csv file
				multi_label=False,
				model_type='xlnet',
				do_lower_case=False,
				device=None) # set custom torch.device, defaults to cuda if available

# Single prediction
single_prediction = predictor.predict("just get me result for this text")

# Batch predictions
texts = [
	"this is the first text",
	"this is the second text"
	]

multiple_predictions = predictor.predict_batch(texts)

Language Model Fine-tuning

A useful way to apply BERT based models to custom datasets is to first fine-tune the language model on the custom dataset, an approach also followed by fast.ai's ULMFiT. The idea is to start with a pre-trained model and further train it on the raw text of the custom dataset. We use the masked LM task to fine-tune the language model.

This section will describe the usage of FastBert to finetune the language model.

1. Import the necessary libraries

The necessary objects are stored in the files with '_lm' suffix.

# Language model Databunch
from fast_bert.data_lm import BertLMDataBunch
# Language model learner
from fast_bert.learner_lm import BertLMLearner

import torch  # needed below for torch.cuda.device_count()
from pathlib import Path
from box import Box

2. Define parameters and setup datapaths

# Box is a nice wrapper to create an object from a json dict
args = Box({
    "seed": 42,
    "task_name": 'imdb_reviews_lm',
    "model_name": 'roberta-base',
    "model_type": 'roberta',
    "train_batch_size": 16,
    "learning_rate": 4e-5,
    "num_train_epochs": 20,
    "fp16": True,
    "fp16_opt_level": "O2",
    "warmup_steps": 1000,
    "logging_steps": 0,
    "max_seq_length": 512,
    "multi_gpu": True if torch.cuda.device_count() > 1 else False
})

DATA_PATH = Path('../lm_data/')
LOG_PATH = Path('../logs')
MODEL_PATH = Path('../lm_model_{}/'.format(args.model_type))

DATA_PATH.mkdir(exist_ok=True)
MODEL_PATH.mkdir(exist_ok=True)
LOG_PATH.mkdir(exist_ok=True)

3. Create DataBunch object

The BertLMDataBunch class contains a static method from_raw_corpus that takes a list of raw texts and creates a DataBunch for the language model learner.

The method first preprocesses the text list by removing HTML tags, extra spaces and so on, and then creates the files lm_train.txt and lm_val.txt. These files are used for training and evaluating the language model fine-tuning task.

The next step is to featurize the texts. The text is tokenized, numericalized and split into blocks of 512 tokens (including special tokens).

databunch_lm = BertLMDataBunch.from_raw_corpus(
					data_dir=DATA_PATH,
					text_list=texts,
					tokenizer=args.model_name,
					batch_size_per_gpu=args.train_batch_size,
					max_seq_length=args.max_seq_length,
                    multi_gpu=args.multi_gpu,
                    model_type=args.model_type,
                    logger=logger)

As this step can take some time based on the size of your custom dataset's text, the featurized data will be cached in pickled files in the data_dir/lm_cache folder.

The next time, instead of using the from_raw_corpus method, you may want to directly instantiate the DataBunch object as shown below:

databunch_lm = BertLMDataBunch(
						data_dir=DATA_PATH,
						tokenizer=args.model_name,
                        batch_size_per_gpu=args.train_batch_size,
                        max_seq_length=args.max_seq_length,
                        multi_gpu=args.multi_gpu,
                        model_type=args.model_type,
                        logger=logger)

4. Create the LM Learner object

BertLearner is the ‘learner’ object that holds everything together. It encapsulates the key logic for the lifecycle of the model such as training, validation and inference.

The learner object takes the databunch created earlier as input, along with some other parameters such as the location of one of the pretrained models, FP16 training, and the multi_gpu and multi_label options.

The learner class contains the logic for the training loop, validation loop, and optimizer strategies. This helps developers focus on their custom use cases without worrying about these repetitive activities.

At the same time, the learner object is flexible enough to be customized, either through its parameters or by creating a subclass of BertLearner and redefining the relevant methods.

learner = BertLMLearner.from_pretrained_model(
							dataBunch=databunch_lm,
							pretrained_path=args.model_name,
							output_dir=MODEL_PATH,
							metrics=[],
							device=device,
							logger=logger,
							multi_gpu=args.multi_gpu,
							logging_steps=args.logging_steps,
							fp16_opt_level=args.fp16_opt_level)

5. Train the model

learner.fit(epochs=6,
			lr=6e-5,
			validate=True, 	# Evaluate the model after each epoch
			schedule_type="warmup_cosine",
			optimizer_type="lamb")

Fast-Bert now supports the LAMB optimizer. Due to the speed of training, we have set LAMB as the default optimizer. You can switch back to AdamW by setting optimizer_type to 'adamw'.

6. Save trained model artifacts

learner.save_model()

Model artefacts will be persisted in the output_dir/'model_out' path provided to the learner object. The following files will be persisted:

File name description
pytorch_model.bin trained model weights
spiece.model sentence tokenizer vocabulary (for xlnet models)
vocab.txt wordpiece tokenizer vocabulary (for bert models)
special_tokens_map.json special tokens mappings
config.json model config
added_tokens.json list of new tokens

The pytorch_model.bin file contains the fine-tuned weights, and you can point the classification task learner object to this file through the finetuned_wgts_path parameter, as sketched below.
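
For example, a sketch of a classification learner that starts from the fine-tuned language model weights; the names (classification databunch, metrics, device, logger, OUTPUT_DIR) are assumed from the earlier text classification section:

# Path to the LM-finetuned weights saved by the LM learner
FINETUNED_PATH = MODEL_PATH/'model_out'/'pytorch_model.bin'

learner = BertLearner.from_pretrained_model(
                        databunch,                        # classification DataBunch (not the LM one)
                        pretrained_path=args.model_name,  # base pretrained model, e.g. 'roberta-base'
                        metrics=metrics,
                        device=device,
                        logger=logger,
                        output_dir=OUTPUT_DIR,
                        finetuned_wgts_path=FINETUNED_PATH,
                        warmup_steps=500,
                        multi_gpu=args.multi_gpu,
                        is_fp16=args.fp16,
                        multi_label=False,
                        logging_steps=0)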

Amazon SageMaker Support

The purpose of this library is to let you train and deploy production-grade models. As transformer models require expensive GPUs to train, I have added support for training and deploying models on AWS SageMaker.

The repository contains the docker image and code for building BERT based classification models in Amazon SageMaker.

Please refer to my blog post Train and Deploy the Mighty BERT based NLP models using FastBert and Amazon SageMaker, which provides a detailed explanation of using SageMaker with FastBert.

Citation

Please include a mention of this library and the HuggingFace pytorch-transformers library, and a link to the present repository, if you use this work in a published or open-source project.

Also include my blogs on this topic:

fast-bert's People

Contributors

4ertovo4ka, aaronbriel, anreu, benfielding, bharatr21, cclauss, connect2ajith, danduma, darshanpatel11, ddelange, enzoampil, harmanpreet93, itssimon, jmcarlock, kaushaltrivedi, kirankunapuli, lingdoc, pawel-kranzberg, piercefreeman, shukanat, sufianj, tbastable, trivedigaurav, trivedikaushal, v-ko, washcycle, wwwehr


fast-bert's Issues

module 'torch.distributed' has no attribute 'init_process_group'

Running the following code results in the following error,

databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer, 
                          train_file='train.csv', val_file='valid.csv', label_file='labels.csv',
                          bs=args['train_batch_size'], maxlen=args['max_seq_length'], 
                          multi_gpu=multi_gpu, multi_label=False)
    373                 train_sampler = RandomSampler(train_data)
    374             else:
--> 375                 torch.distributed.init_process_group(backend="nccl", 
    376                                      init_method = "tcp://localhost:23459",
    377                                      rank=0, world_size=1)

AttributeError: module 'torch.distributed' has no attribute 'init_process_group'

weights not initialized when saving/loading

When I train a fastbert model and save it using save_and_reload(), the model output is not consistent with the model's output before saving.

code to reproduce:

from fast_bert import BertClassificationPredictor


databunch = BertDataBunch(args['data_dir'], LABEL_PATH, tokenizer, train_file='train.csv', val_file='val.csv',
                      test_data=test_df['content'].tolist(),
                      text_col="content", label_col=label_cols,
                      bs=args['train_batch_size'], maxlen=args['max_seq_length'], 
                      multi_gpu=True, multi_label=True)
databunch.save()

metrics = []
metrics.append({'name': 'accuracy_thresh', 'function': accuracy_thresh})
metrics.append({'name': 'roc_auc', 'function': roc_auc})
metrics.append({'name': 'fbeta', 'function': fbeta})
metrics.append({'name': 'accuracy_single', 'function': accuracy_multilabel})

learner = BertLearner.from_pretrained_model(databunch, BERT_PRETRAINED_PATH, metrics, device, logger, 
                                            finetuned_wgts_path=FINETUNED_PATH, 
                                            is_fp16=args['fp16'], loss_scale=args['loss_scale'], 
                                            multi_gpu=True,  multi_label=True,)
learner.fit(4, lr=args['learning_rate'], schedule_type="warmup_cosine_hard_restarts",validate=True)

#save prediction on test set
prediction_before_saving = learner.predict_batch(test_df['content'].tolist())

model_path = os.getcwd()+'/fastBertModels'
model_name = 'fastBert_split_'+str(idx)+'_test'
learner.save_and_reload(model_path,model_name)
predictor = BertClassificationPredictor(model_path=model_path+'/'+model_name+'.bin', pretrained_path = BERT_PRETRAINED_PATH, label_path = LABEL_PATH, multi_label=True)

#save prediction on test set (again)
prediction_after_loading = predictor.predict_batch(test_df['content'].tolist())

#remove column names from predictions 
prediction_before_saving = [[x[0][1],x[1][1]] for x in prediction_before_saving]
prediction_after_loading = [[x[0][1],x[1][1]] for x in prediction_after_loading]


for x,y in zip(prediction_before_saving,prediction_after_loading):
    print(x==y,x,y)

I also get a bunch of warnings regarding the bert model weights when I run save_and_reload(), as well as when I load the model into a BertClassificationPredictor. I suspect this to be the culprit (example below).

 05/28/2019 22:13:30 - INFO - pytorch_pretrained_bert.modeling -   loading archive file uncased_L-12_H-768_A-12 from cache at uncased_L-12_H-768_A-12
05/28/2019 22:13:30 - INFO - pytorch_pretrained_bert.modeling -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
05/28/2019 22:13:36 - INFO - pytorch_pretrained_bert.modeling -   Weights of BertForMultiLabelSequenceClassification not initialized from pretrained model: ['bert.embeddings.word_embeddings.weight', 'bert.embeddings.position_embeddings.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.embeddings.LayerNorm.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.layer.1.attention.output.dense.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 
'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.value.bias', 
'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.weight', 'classifier.bias']
05/28/2019 22:13:36 - INFO - pytorch_pretrained_bert.modeling -   Weights from pretrained model not used in BertForMultiLabelSequenceClassification: ['module.bert.embeddings.word_embeddings.weight', 'module.bert.embeddings.position_embeddings.weight', 'module.bert.embeddings.token_type_embeddings.weight', 'module.bert.embeddings.LayerNorm.weight', 'module.bert.embeddings.LayerNorm.bias', 'module.bert.encoder.layer.0.attention.self.query.weight', 'module.bert.encoder.layer.0.attention.self.query.bias', 'module.bert.encoder.layer.0.attention.self.key.weight', 'module.bert.encoder.layer.0.attention.self.key.bias', 'module.bert.encoder.layer.0.attention.self.value.weight', 'module.bert.encoder.layer.0.attention.self.value.bias', 'module.bert.encoder.layer.0.attention.output.dense.weight', 'module.bert.encoder.layer.0.attention.output.dense.bias', 'module.bert.encoder.layer.0.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.0.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.0.intermediate.dense.weight', 'module.bert.encoder.layer.0.intermediate.dense.bias', 'module.bert.encoder.layer.0.output.dense.weight', 'module.bert.encoder.layer.0.output.dense.bias', 'module.bert.encoder.layer.0.output.LayerNorm.weight', 'module.bert.encoder.layer.0.output.LayerNorm.bias', 'module.bert.encoder.layer.1.attention.self.query.weight', 'module.bert.encoder.layer.1.attention.self.query.bias', 'module.bert.encoder.layer.1.attention.self.key.weight', 'module.bert.encoder.layer.1.attention.self.key.bias', 'module.bert.encoder.layer.1.attention.self.value.weight', 'module.bert.encoder.layer.1.attention.self.value.bias', 'module.bert.encoder.layer.1.attention.output.dense.weight', 'module.bert.encoder.layer.1.attention.output.dense.bias', 'module.bert.encoder.layer.1.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.1.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.1.intermediate.dense.weight', 'module.bert.encoder.layer.1.intermediate.dense.bias', 'module.bert.encoder.layer.1.output.dense.weight', 'module.bert.encoder.layer.1.output.dense.bias', 'module.bert.encoder.layer.1.output.LayerNorm.weight', 'module.bert.encoder.layer.1.output.LayerNorm.bias', 'module.bert.encoder.layer.2.attention.self.query.weight', 'module.bert.encoder.layer.2.attention.self.query.bias', 'module.bert.encoder.layer.2.attention.self.key.weight', 'module.bert.encoder.layer.2.attention.self.key.bias', 'module.bert.encoder.layer.2.attention.self.value.weight', 'module.bert.encoder.layer.2.attention.self.value.bias', 'module.bert.encoder.layer.2.attention.output.dense.weight', 'module.bert.encoder.layer.2.attention.output.dense.bias', 'module.bert.encoder.layer.2.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.2.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.2.intermediate.dense.weight', 'module.bert.encoder.layer.2.intermediate.dense.bias', 'module.bert.encoder.layer.2.output.dense.weight', 'module.bert.encoder.layer.2.output.dense.bias', 'module.bert.encoder.layer.2.output.LayerNorm.weight', 'module.bert.encoder.layer.2.output.LayerNorm.bias', 'module.bert.encoder.layer.3.attention.self.query.weight', 'module.bert.encoder.layer.3.attention.self.query.bias', 'module.bert.encoder.layer.3.attention.self.key.weight', 'module.bert.encoder.layer.3.attention.self.key.bias', 'module.bert.encoder.layer.3.attention.self.value.weight', 'module.bert.encoder.layer.3.attention.self.value.bias', 'module.bert.encoder.layer.3.attention.output.dense.weight', 
'module.bert.encoder.layer.3.attention.output.dense.bias', 'module.bert.encoder.layer.3.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.3.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.3.intermediate.dense.weight', 'module.bert.encoder.layer.3.intermediate.dense.bias', 'module.bert.encoder.layer.3.output.dense.weight', 'module.bert.encoder.layer.3.output.dense.bias', 'module.bert.encoder.layer.3.output.LayerNorm.weight', 'module.bert.encoder.layer.3.output.LayerNorm.bias', 'module.bert.encoder.layer.4.attention.self.query.weight', 'module.bert.encoder.layer.4.attention.self.query.bias', 'module.bert.encoder.layer.4.attention.self.key.weight', 'module.bert.encoder.layer.4.attention.self.key.bias', 'module.bert.encoder.layer.4.attention.self.value.weight', 'module.bert.encoder.layer.4.attention.self.value.bias', 'module.bert.encoder.layer.4.attention.output.dense.weight', 'module.bert.encoder.layer.4.attention.output.dense.bias', 'module.bert.encoder.layer.4.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.4.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.4.intermediate.dense.weight', 'module.bert.encoder.layer.4.intermediate.dense.bias', 'module.bert.encoder.layer.4.output.dense.weight', 'module.bert.encoder.layer.4.output.dense.bias', 'module.bert.encoder.layer.4.output.LayerNorm.weight', 'module.bert.encoder.layer.4.output.LayerNorm.bias', 'module.bert.encoder.layer.5.attention.self.query.weight', 'module.bert.encoder.layer.5.attention.self.query.bias', 'module.bert.encoder.layer.5.attention.self.key.weight', 'module.bert.encoder.layer.5.attention.self.key.bias', 'module.bert.encoder.layer.5.attention.self.value.weight', 'module.bert.encoder.layer.5.attention.self.value.bias', 'module.bert.encoder.layer.5.attention.output.dense.weight', 'module.bert.encoder.layer.5.attention.output.dense.bias', 'module.bert.encoder.layer.5.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.5.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.5.intermediate.dense.weight', 'module.bert.encoder.layer.5.intermediate.dense.bias', 'module.bert.encoder.layer.5.output.dense.weight', 'module.bert.encoder.layer.5.output.dense.bias', 'module.bert.encoder.layer.5.output.LayerNorm.weight', 'module.bert.encoder.layer.5.output.LayerNorm.bias', 'module.bert.encoder.layer.6.attention.self.query.weight', 'module.bert.encoder.layer.6.attention.self.query.bias', 'module.bert.encoder.layer.6.attention.self.key.weight', 'module.bert.encoder.layer.6.attention.self.key.bias', 'module.bert.encoder.layer.6.attention.self.value.weight', 'module.bert.encoder.layer.6.attention.self.value.bias', 'module.bert.encoder.layer.6.attention.output.dense.weight', 'module.bert.encoder.layer.6.attention.output.dense.bias', 'module.bert.encoder.layer.6.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.6.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.6.intermediate.dense.weight', 'module.bert.encoder.layer.6.intermediate.dense.bias', 'module.bert.encoder.layer.6.output.dense.weight', 'module.bert.encoder.layer.6.output.dense.bias', 'module.bert.encoder.layer.6.output.LayerNorm.weight', 'module.bert.encoder.layer.6.output.LayerNorm.bias', 'module.bert.encoder.layer.7.attention.self.query.weight', 'module.bert.encoder.layer.7.attention.self.query.bias', 'module.bert.encoder.layer.7.attention.self.key.weight', 'module.bert.encoder.layer.7.attention.self.key.bias', 'module.bert.encoder.layer.7.attention.self.value.weight', 
'module.bert.encoder.layer.7.attention.self.value.bias', 'module.bert.encoder.layer.7.attention.output.dense.weight', 'module.bert.encoder.layer.7.attention.output.dense.bias', 'module.bert.encoder.layer.7.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.7.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.7.intermediate.dense.weight', 'module.bert.encoder.layer.7.intermediate.dense.bias', 'module.bert.encoder.layer.7.output.dense.weight', 'module.bert.encoder.layer.7.output.dense.bias', 'module.bert.encoder.layer.7.output.LayerNorm.weight', 'module.bert.encoder.layer.7.output.LayerNorm.bias', 'module.bert.encoder.layer.8.attention.self.query.weight', 'module.bert.encoder.layer.8.attention.self.query.bias', 'module.bert.encoder.layer.8.attention.self.key.weight', 'module.bert.encoder.layer.8.attention.self.key.bias', 'module.bert.encoder.layer.8.attention.self.value.weight', 'module.bert.encoder.layer.8.attention.self.value.bias', 'module.bert.encoder.layer.8.attention.output.dense.weight', 'module.bert.encoder.layer.8.attention.output.dense.bias', 'module.bert.encoder.layer.8.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.8.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.8.intermediate.dense.weight', 'module.bert.encoder.layer.8.intermediate.dense.bias', 'module.bert.encoder.layer.8.output.dense.weight', 'module.bert.encoder.layer.8.output.dense.bias', 'module.bert.encoder.layer.8.output.LayerNorm.weight', 'module.bert.encoder.layer.8.output.LayerNorm.bias', 'module.bert.encoder.layer.9.attention.self.query.weight', 'module.bert.encoder.layer.9.attention.self.query.bias', 'module.bert.encoder.layer.9.attention.self.key.weight', 'module.bert.encoder.layer.9.attention.self.key.bias', 'module.bert.encoder.layer.9.attention.self.value.weight', 'module.bert.encoder.layer.9.attention.self.value.bias', 'module.bert.encoder.layer.9.attention.output.dense.weight', 'module.bert.encoder.layer.9.attention.output.dense.bias', 'module.bert.encoder.layer.9.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.9.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.9.intermediate.dense.weight', 'module.bert.encoder.layer.9.intermediate.dense.bias', 'module.bert.encoder.layer.9.output.dense.weight', 'module.bert.encoder.layer.9.output.dense.bias', 'module.bert.encoder.layer.9.output.LayerNorm.weight', 'module.bert.encoder.layer.9.output.LayerNorm.bias', 'module.bert.encoder.layer.10.attention.self.query.weight', 'module.bert.encoder.layer.10.attention.self.query.bias', 'module.bert.encoder.layer.10.attention.self.key.weight', 'module.bert.encoder.layer.10.attention.self.key.bias', 'module.bert.encoder.layer.10.attention.self.value.weight', 'module.bert.encoder.layer.10.attention.self.value.bias', 'module.bert.encoder.layer.10.attention.output.dense.weight', 'module.bert.encoder.layer.10.attention.output.dense.bias', 'module.bert.encoder.layer.10.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.10.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.10.intermediate.dense.weight', 'module.bert.encoder.layer.10.intermediate.dense.bias', 'module.bert.encoder.layer.10.output.dense.weight', 'module.bert.encoder.layer.10.output.dense.bias', 'module.bert.encoder.layer.10.output.LayerNorm.weight', 'module.bert.encoder.layer.10.output.LayerNorm.bias', 'module.bert.encoder.layer.11.attention.self.query.weight', 'module.bert.encoder.layer.11.attention.self.query.bias', 'module.bert.encoder.layer.11.attention.self.key.weight', 
'module.bert.encoder.layer.11.attention.self.key.bias', 'module.bert.encoder.layer.11.attention.self.value.weight', 'module.bert.encoder.layer.11.attention.self.value.bias', 'module.bert.encoder.layer.11.attention.output.dense.weight', 'module.bert.encoder.layer.11.attention.output.dense.bias', 'module.bert.encoder.layer.11.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.11.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.11.intermediate.dense.weight', 'module.bert.encoder.layer.11.intermediate.dense.bias', 'module.bert.encoder.layer.11.output.dense.weight', 'module.bert.encoder.layer.11.output.dense.bias', 'module.bert.encoder.layer.11.output.LayerNorm.weight', 'module.bert.encoder.layer.11.output.LayerNorm.bias', 'module.bert.pooler.dense.weight', 'module.bert.pooler.dense.bias', 'module.classifier.weight', 'module.classifier.bias']

roc_auc

When I tried to use a metric roc_auc, I got an error:
ValueError: Found input variables with inconsistent numbers of samples: [64, 128]

"train_batch_size": 64, "eval_batch_size": 64,

multi_label=False

Problem with multiclass model

When I tried to run the model for a multi-class problem, it throws the following after training and running evaluation:

RuntimeError Traceback (most recent call last)
1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
     52     if len(types) <= 1:
---> 53         return orig_fn(*args, **kwargs)
     54     elif len(types) == 2 and types == set(['HalfTensor', 'FloatTensor']):
     55         new_args = utils.casted_args(cast_fn,
RuntimeError: The size of tensor a (4) must match the size of tensor b (74) at non-singleton dimension 1

The metric I have used is fbeta.

Multiple Output Predictions

Hello,

Is it possible to create a model that uses pre-trained BERT (or any other model) and feeds data from multiple datasets to predict multiple outputs?

Example, which I have 4 text datasets:
Dataset A contains [ ValueA, ValueB, ValueC ]
Dataset B contains [ ValueA, ValueB, ValueC, ValueD, ValueE, ValueF ]
Dataset C contains [ ValueA, ValueB ]
Dataset D contains [ ValueD, ValueE, ValueF ]

Since all of them are in English, I hope to use BERT to enhance the similarity between datasets.

Approaches that I thought:

  • Create a general y, and add 0. to empty fields which I don't have for it. In this case, my prediction would be [ ValueA, ValueB, ValueC, ValueD, ValueE, ValueF ]

Unsupported operand type(s) for /: 'str' and 'str'

When I tried to run the example, I got the following error:

databunch = BertDataBunch('./data/', './data/',
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='bert',
                          no_cache=True)


TypeError Traceback (most recent call last)
in
11 multi_label=False,
12 model_type='bert',
---> 13 no_cache=True)

/data/miniconda3/envs/pt/lib/python3.7/site-packages/fast_bert/data_cls.py in init(self, data_dir, label_dir, tokenizer, train_file, val_file, test_data, label_file, text_col, label_col, batch_size_per_gpu, max_seq_length, multi_gpu, multi_label, backend, model_type, logger, clear_cache, no_cache)
288 self.tokenizer = tokenizer
289 self.data_dir = data_dir
--> 290 self.cache_dir = data_dir/'cache'
291 self.max_seq_length = max_seq_length
292 self.batch_size_per_gpu = batch_size_per_gpu

TypeError: unsupported operand type(s) for /: 'str' and 'str'
Could you help me to deal with that?

F1-score always 0

I use metrics as [{'name': 'F1-score', 'function': F1}] and run the sample data for 4 epochs.

However, after each epoch the F1 score is 0. What's wrong?

from fast_bert.learner import *
from fast_bert.metrics import *
from pytorch_pretrained_bert.tokenization import BertTokenizer

from bert_data import *

import torch
from fastai.text import *
import datetime

run_start_time = datetime.datetime.today().strftime('%Y-%m-%d_%H-%M-%S')

LOG_PATH=Path('logs/')  
MODEL_PATH=Path('models/') 

if not LOG_PATH.exists():
  LOG_PATH.mkdir()
import logging

args = {
    "run_text": "my_test",
    "max_seq_length": 512,
    "do_lower_case": True,
    "train_batch_size": 16,
    "learning_rate": 6e-5,
    "num_train_epochs": 12.0,
    "warmup_proportion": 0.002,
    "local_rank": -1,
    "gradient_accumulation_steps": 1,
    "fp16": True,
    "loss_scale": 128
}

logfile = str(LOG_PATH/'log-{}-{}.txt'.format(run_start_time, args["run_text"]))

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
    datefmt='%m/%d/%Y %H:%M:%S',
    handlers=[
        logging.FileHandler(logfile),
        logging.StreamHandler(sys.stdout)
    ])

logger = logging.getLogger()

device = torch.device('cuda')

if torch.cuda.device_count() > 1:
    multi_gpu = True
else:
    multi_gpu = False
    
print('multi_gpu={}'.format('True' if multi_gpu else 'False'))

DATA_PATH = Path('data/sample/data/')     
LABEL_PATH = Path('data/sample/labels')  

BERT_PRETRAINED_MODEL = "bert/bert-base-uncased"

args["do_lower_case"] = True
args["train_batch_size"] = 16
args["learning_rate"] = 6e-5
args["max_seq_length"] = 512
args["fp16"] = True

tokenizer = BertTokenizer.from_pretrained(BERT_PRETRAINED_MODEL, 
                                          do_lower_case=args['do_lower_case'])

label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer, train_file='train.csv', val_file='valid.csv',
                          test_data='test.csv', label_file="labels.csv",
                          text_col="comment_text", label_col=label_cols,
                          bs=args['train_batch_size'], maxlen=args['max_seq_length'], 
                          multi_gpu=multi_gpu, multi_label=True)

#metrics = [{'name': 'accuracy', 'function': accuracy_multilabel}]                          
#metrics = [{'name': 'roc_auc', 'function': roc_auc}]                          
metrics = [{'name': 'F1-score', 'function': F1}]                          
learner = BertLearner.from_pretrained_model(databunch, BERT_PRETRAINED_MODEL, metrics, device, logger, 
                                            is_fp16=args['fp16'], loss_scale=args['loss_scale'], 
                                            multi_gpu=multi_gpu,  multi_label=True)
learner.fit(4, lr=args['learning_rate'], schedule_type="warmup_linear") 

Saving bin file from learner and then loading it

I used learner.save_and_reload to save my model, and an output file pretrained_bert.bin was produced. How can I use this .bin file and classify with learner.predict_batches()? I have been stuck for ages and I don't know how.

notebook not working out of the box

I'm trying to just get the included toxicity notebook to work from a fresh clone and am having some issues:

  1. Out of the box, the data & labels directory are pointing to the wrong place and the DataBunch is using filenames that are not part of the repo. These are fixed easily enough.

  2. It would help if there was a pointer to where to get the PyTorch pretrained model uncased_L-12_H-768_A-12. There is a Google download which will not work with the from_pretrained_model cell:

FileNotFoundError: [Errno 2] No such file or directory: '../../bert/bert-models/uncased_L-12_H-768_A-12/pytorch_model.bin'

I have been able to get past this step by using 'bert-base-uncased' instead of BERT_PRETRAINED_PATH as the model spec in the tokenizer and from_pretrained_model steps.

  3. Once I get everything loaded, RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 7.43 GiB total capacity; 6.91 GiB already allocated; 10.94 MiB free; 24.36 MiB cached)

This is a standard 8G GPU compute engine instance on GCP. Advice on how to not run out of memory would help the tutorial a lot.

[Question]:comparison of DistilBERT

I was checking the memory consumption of RoBERTa and DistilBERT. I found there is no significant difference in memory usage, although inference time is around 1 sec for DistilBERT and 2 sec for RoBERTa.
Memory usage on CPU:
Port 9000: DistilBERT
Port 9002: RoBERTa

Have you seen any significant change in memory usage, @kaushaltrivedi?

unresolved problem

/usr/local/lib/python3.6/dist-packages/fast_bert/learner_cls.py in fit(self, epochs, lr, validate, schedule_type, optimizer_type)
211 def fit(self, epochs, lr, validate=True, schedule_type="warmup_cosine", optimizer_type='lamb'):
212
--> 213 tensorboard_dir = self.output_dir/'tensorboard'
214 tensorboard_dir.mkdir(exist_ok=True)
215 print(tensorboard_dir)

TypeError: unsupported operand type(s) for /: 'str' and 'str'

Unable to use learner.fit() because of Apex dependencies

Hi, I'm trying to follow the notebook example provided in this repo with some of my own data. However, when I go to fit the model, I get the following:


ModuleNotFoundError Traceback (most recent call last)
~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fast_bert/learner.py in get_optimizer(self, lr, num_train_steps, schedule_type)
197 try:
--> 198 from apex.optimizers import FP16_Optimizer
199 from apex.optimizers import FusedAdam

ModuleNotFoundError: No module named 'apex.optimizers'


I have installed Apex correctly using NVIDIA's documentation, and the Apex directory appears the same as in their repo, which leads me to think it's a fast-bert issue. I am using an AWS instance (ml.p3.8xlarge), and my environment is conda_pytorch_p36.

Thanks in advance for any help,

Darren

Could the lamb optimizer be used in ImageNet classification?

Thank you for your contribution.
As the paper says, the LAMB optimizer can also be used for ImageNet classification. I am trying to incorporate the LAMB implementation here into my own code. Could the optimizer you contributed here also be applied to this kind of classification?
Many thanks.

High confidence for False Positive results

I have trained a multi-class text classifier using BERT. I am getting accuracy around 90%. The only issue is that the model classifies out-of-domain sentences with a very high confidence score (e.g. 0.9954564).
I have seen other models (like space supervised) classify out-of-domain sentences with very low confidence, which helps to detect them. Is there any method to solve this problem?

learner.fit and learner.validate - AttributeError: 'Tensor' object has no attribute 'bool'

/content/xlnet_cased_L-12_H-768_A-12/output/tensorboard
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
09/08/2019 14:37:51 - INFO - root - ***** Running training *****
09/08/2019 14:37:51 - INFO - root - Num examples = 1000
09/08/2019 14:37:51 - INFO - root - Num Epochs = 6
09/08/2019 14:37:51 - INFO - root - Total train batch size (w. parallel, distributed & accumulation) = 8
09/08/2019 14:37:51 - INFO - root - Gradient Accumulation steps = 1
09/08/2019 14:37:51 - INFO - root - Total optimization steps = 750
0.00% [0/6 00:00<00:00]
100.00% [125/125 04:24<00:00]
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
09/08/2019 14:42:16 - INFO - root - Running evaluation
09/08/2019 14:42:16 - INFO - root - Num examples = 1000
09/08/2019 14:42:16 - INFO - root - Batch size = 8
100.00% [125/125 01:19<00:00]

AttributeError Traceback (most recent call last)
in ()
----> 1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)

2 frames
/usr/local/lib/python3.6/dist-packages/fast_bert/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
29 if sigmoid:
30 y_pred = y_pred.sigmoid()
---> 31 return ((y_pred > thresh) == y_true.bool()).float().mean().item()
32 # return np.mean(((y_pred>thresh)==y_true.byte()).float().cpu().numpy(), axis=1).sum()
33

AttributeError: 'Tensor' object has no attribute 'bool'

Save model weights on epoch with best score

It would be nice to have an option to save the model with the best validation score for a given metric.
It would also be nice to have a hook that runs arbitrary code at the end of each epoch.

Incomplete class Learner(object)

Hi @kaushaltrivedi. Thanks so much for creating this library, it's great.

I was using it a few days ago and it worked well. But now I'm getting an import error from fast_bert/learner.py. I think it's due to an incomplete class Learner(object):. The complete message is below:

File "/usr/local/lib/python3.6/dist-packages/fast_bert/learner.py", line 61 class BertLearner(object): ^ IndentationError: expected an indented block

Can't read in train.csv

Hi,

I'm trying to test out fast-bert, and when I set up a train.csv file as follows:
index text label
0 test neg
2 test2 pos

It is a tab-separated file; I get the following error:

Traceback (most recent call last):
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4729, in get_value
return libindex.get_value_box(s, key)
File "pandas/_libs/index.pyx", line 51, in pandas._libs.index.get_value_box
File "pandas/_libs/index.pyx", line 47, in pandas._libs.index.get_value_at
File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "bert.py", line 17, in
model_type='bert')
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 332, in init
train_file, text_col=text_col, label_col=label_col)
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 222, in get_train_examples
return self._create_examples(data_df, "train", text_col=text_col, label_col=label_col)
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 257, in _create_examples
return list(df.apply(lambda row: InputExample(guid=row.index, text_a=row[text_col], label=str(row[label_col])), axis=1))
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 6906, in apply
return op.get_result()
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
return self.apply_standard()
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 292, in apply_standard
self.apply_series_generator()
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 321, in apply_series_generator
results[i] = self.f(v)
File "/home/w3pt/.local/lib/python3.7/site-packages/fast_bert/data_cls.py", line 257, in
return list(df.apply(lambda row: InputExample(guid=row.index, text_a=row[text_col], label=str(row[label_col])), axis=1))
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/series.py", line 1064, in getitem
result = self.index.get_value(self, key)
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4737, in get_value
raise e1
File "/home/w3pt/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4723, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('text', 'occurred at index 0')

Code:
from fast_bert.data_cls import BertDataBunch
from pathlib import Path
DATA_PATH = Path('./')
LABEL_PATH = Path('./')

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='bert')

Am I doing something wrong?
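
A hedged guess at the cause: BertDataBunch reads the files with pandas' default comma separator, so a tab-separated train.csv is parsed as a single column and row['text'] then raises the KeyError above. One way to check and fix is to re-save the file as a comma-separated CSV with explicit text and label headers:

import pandas as pd

df = pd.read_csv('train.csv', sep='\t')   # read the existing tab-separated file
print(df.columns)                         # should show the 'text' and 'label' columns
df.to_csv('train.csv', index=False)       # rewrite it comma-separated, keeping the header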

Target size not the same as input size

Hi,

Target size (torch.Size([0, 6])) must be the same as input size (torch.Size([32, 6]))

Below is the code.

databunch = BertDataBunch('fast-bert/sample_data/multi_label_toxic_comments/data',
                          'fast-bert/sample_data/multi_label_toxic_comments/label',
                          tokenizer,
                          train_file='train_sample.csv', val_file='val_sample.csv',
                          label_file='labels.csv', label_col=None,
                          bs=args['train_batch_size'], maxlen=args['max_seq_length'],
                          multi_gpu=multi_gpu, multi_label=True)

metrics = []
metrics.append({'name': 'accuracy', 'function': accuracy})

learner = BertLearner.from_pretrained_model(databunch, 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz', metrics, device, logger=None,
finetuned_wgts_path=None,
is_fp16=args['fp16'], loss_scale=args['loss_scale'],
multi_gpu=multi_gpu, multi_label=True)

learner.fit(1, lr=args['learning_rate'],
schedule_type="warmup_cosine_hard_restarts")

BertLearner.from_pretrained_model stuck

Everything works perfectly until I want to create the BertLearner.
When I run the following cell:
learner = BertLearner.from_pretrained_model(databunch, 'bert-base-multilingual-uncased', metrics, device, logger, finetuned_wgts_path=None, is_fp16=args['fp16'], loss_scale=args['loss_scale'], multi_gpu=multi_gpu, multi_label=False)

the cell gets stuck loading.
The logger gives me the following hints:

`07/17/2019 10:05:36 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz from cache at /home/ec2-user/.pytorch_pretrained_bert/437da855f7aeb6dcc47ee03b11ac55bfbc069d31354f6867f3b298aad8429925.dd2dce7e7331017693bd2230dbc8015b12a975201a420a856a6efbf7ae9d84c5
07/17/2019 10:05:36 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /home/ec2-user/.pytorch_pretrained_bert/437da855f7aeb6dcc47ee03b11ac55bfbc069d31354f6867f3b298aad8429925.dd2dce7e7331017693bd2230dbc8015b12a975201a420a856a6efbf7ae9d84c5 to temp dir /tmp/tmp5yuiacnx
07/17/2019 10:05:43 - INFO - pytorch_pretrained_bert.modeling - Model config {
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 105879
}

07/17/2019 10:05:48 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
07/17/2019 10:05:48 - INFO - pytorch_pretrained_bert.modeling - Weights from pretrained model not used in BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']`

Can't see any metrics while training

Earlier I was able to see accuracy and F-beta scores while training the model, but now I can't see anything. The model just completes its epochs without printing anything.
Any suggestions?
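
The per-epoch metrics are written through the Python logger passed to the learner, so if nothing shows up it may simply be that INFO-level messages are being filtered, or that no metrics list was passed. A minimal sketch, assuming the classification learner API from the README:

import logging
from fast_bert.metrics import accuracy, fbeta

logging.basicConfig(level=logging.INFO)   # make the INFO-level training/eval logs visible
logger = logging.getLogger()

metrics = [{'name': 'accuracy', 'function': accuracy},
           {'name': 'fbeta', 'function': fbeta}]
# pass `metrics=metrics` and `logger=logger` to BertLearner.from_pretrained_model(...)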

Attention Weights

Does this return the attention weights that can be obtained from the BERT model through PyTorch-Transformers?

Accuracy_multilabel function probably incorrect

Hi,

def accuracy_multilabel(y_pred:Tensor, y_true:Tensor, sigmoid:bool=True):
    if sigmoid: y_pred = y_pred.sigmoid()
    outputs = np.argmax(y_pred, axis=1)
    real_vals = np.argmax(y_true, axis=1)
    return np.mean(outputs.numpy() == real_vals.numpy())

in this block.

This piece of code seems incorrect, as the shape of y_pred and y_true is (batch_size, class_space).
Taking np.argmax with axis=1 returns a single class index for each sample, which is what we do for multi-class classification.

However, in multi-class classification we don't normally apply sigmoid to y_pred, although doing so is not wrong.

This function looks much more like accuracy_multiclass than accuracy_multilabel.
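
For comparison, a hedged sketch of what a thresholded multi-label accuracy could look like instead (essentially the element-wise behaviour of accuracy_thresh rather than an argmax):

import torch

def accuracy_multilabel_thresh(y_pred, y_true, thresh=0.5, sigmoid=True):
    # treat each label independently: threshold, then average the element-wise matches
    if sigmoid:
        y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    return (y_pred == y_true.float()).float().mean().item()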

Error with multi_label=False in BertDataBunch

I am trying to detect lies in text: the person is either telling the truth or lying.

So this is not a multi-label problem, and therefore my BertDataBunch looks like this:


databunch = BertDataBunch(args['data_dir'], LABEL_PATH, tokenizer, train_file='train.csv', val_file='val.csv',
                          test_data='test.csv',
                          text_col="content", label_col=label_cols,
                          bs=args['train_batch_size'], maxlen=args['max_seq_length'], 
                          multi_gpu=multi_gpu, multi_label=False)

However, I then get a KeyError:

'lie 0\nName: 0, dtype: object'
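
A hedged guess: the key in the error is the string form of a whole pandas row/Series, which suggests label_col is still being passed the list label_cols. With multi_label=False it should be the name of the single label column; a sketch, assuming that column is called 'lie':

databunch = BertDataBunch(args['data_dir'], LABEL_PATH, tokenizer,
                          train_file='train.csv',
                          val_file='val.csv',
                          test_data='test.csv',
                          label_file='labels.csv',
                          text_col='content',
                          label_col='lie',               # assumption: single column with truth/lie labels
                          batch_size_per_gpu=args['train_batch_size'],
                          max_seq_length=args['max_seq_length'],
                          multi_gpu=multi_gpu,
                          multi_label=False)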

Runtime Crashes on Google Colab

I was trying to create a DataBunch on Google Colab, using the Sentiment140 Twitter dataset. But no matter what batch size I use, the runtime always crashes. I tried every batch size from 2 to 256 and it crashes every single time. Can anyone please help me solve this issue?

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='xlnet-base-cased',
                          train_file='df_train2.csv',
                          val_file='df_valid2.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=2,
                          max_seq_length=128,
                          multi_gpu=False,
                          multi_label=False,
                          model_type='xlnet')

This is the code where it crashes.

Inference on CPU crashes

I'm unable to load a trained model for inference on my Mac, which doesn't have an NVIDIA GPU.
I think it is because of this line; it should be wrapped in a check that CUDA is available before being called.
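
A hedged sketch of the kind of guard the issue is asking for: choose the device, and only query CUDA, when a GPU is actually present.

import torch

if torch.cuda.is_available():
    device = torch.device('cuda')
    device_id = torch.cuda.current_device()
else:
    device = torch.device('cpu')
    device_id = -1   # assumption: a sentinel meaning "no GPU available"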

More than 6 multi-labels possible?

I'm trying to train fast-bert on a custom multi-labelled dataset (10 labels). It works perfectly when I strip my dataset down to only 6 labels (the same number as the provided toxic comments dataset), but when I switch to more or fewer labels, I get the following error:

Traceback (most recent call last):
  File "multilabel.py", line 149, in <module>
    learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 271, in fit
    outputs = self.model(**inputs)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/fast_bert/modeling.py", line 194, in forward
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1, self.num_labels))
RuntimeError: shape '[-1, 10]' is invalid for input of size 36

It seems like fast-bert is hard-coded to work with exactly 6 multi-labels, especially considering the errors I get when changing the batch size (with 10 labels in my dataset):

batch_size = 2 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 12
(2 batch_size * 6 (labels?) = 12?)

batch_size = 4 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 24
(4 batch_size * 6 (labels?) = 24?)

batch_size = 6 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 36
(6 batch_size * 6 (labels?) = 36?)

batch_size = 8 --> RuntimeError: shape '[-1, 10]' is invalid for input of size 48
(8 batch_size * 6 (labels?) = 48?)

Any ideas how I can get fast-bert to use more than 6 labels?
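
A hedged reading of the sizes above: each failing shape is batch_size × 6, which suggests the labels tensor still contains only 6 values per example, i.e. the DataBunch was built from 6 label columns while the model head expects 10. One thing to check is that labels.csv and label_col both list all 10 labels (names below are placeholders):

label_cols = ['label_0', 'label_1', 'label_2', 'label_3', 'label_4',
              'label_5', 'label_6', 'label_7', 'label_8', 'label_9']   # placeholders for your 10 labels

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='bert-base-uncased',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',   # must list the same 10 labels, one per line
                          text_col='text',
                          label_col=label_cols,
                          multi_gpu=False,
                          multi_label=True)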

Torch not compiled with Cuda enabled

from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy
import logging
import torch   # needed for torch.device below

logger = logging.getLogger()
device_cuda = torch.device('cpu') #torch.device("cuda")
metrics = [{'name': 'accuracy', 'function': accuracy}]

learner = BertLearner.from_pretrained_model(
databunch,
pretrained_path='bert-base-uncased',
metrics=metrics,
device=device_cuda,
logger=logger,
output_dir=MODEL_PATH,
finetuned_wgts_path=None,
warmup_steps=500,
multi_gpu=multi_gpu,
is_fp16=True,
multi_label=False,
logging_steps=50)


AssertionError Traceback (most recent call last)
in
19 is_fp16=True,
20 multi_label=False,
---> 21 logging_steps=50)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fast_bert/learner_cls.py in from_pretrained_model(dataBunch, pretrained_path, output_dir, metrics, device, logger, finetuned_wgts_path, multi_gpu, is_fp16, loss_scale, warmup_steps, fp16_opt_level, grad_accumulation_steps, multi_label, max_grad_norm, adam_epsilon, logging_steps)
67 model = model_class[0].from_pretrained(pretrained_path, config=config)
68
---> 69 device_id = torch.cuda.current_device()
70 model.to(device)
71

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/init.py in current_device()
349 def current_device():
350 r"""Returns the index of a currently selected device."""
--> 351 _lazy_init()
352 return torch._C._cuda_getDevice()
353

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/init.py in _lazy_init()
160 raise RuntimeError(
161 "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 162 _check_driver()
163 torch._C._cuda_init()
164 _cudart = _load_cudart()

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/init.py in _check_driver()
73 def _check_driver():
74 if not hasattr(torch._C, '_cuda_isDriverSufficient'):
---> 75 raise AssertionError("Torch not compiled with CUDA enabled")
76 if not torch._C._cuda_isDriverSufficient():
77 if torch._C._cuda_getDriverVersion() == 0:

AssertionError: Torch not compiled with CUDA enabled

I can't run the model on macOS, and I was wondering if I can train without using CUDA?

Error at learner.fit while running the new toxic multilabel sample notebook

learner.fit(args.num_train_epochs, args.learning_rate, validate=True)

RuntimeError Traceback (most recent call last)
in
----> 1 learner.fit(args.num_train_epochs, args.learning_rate, validate=True)

~/.conda/envs/fastbert/lib/python3.6/site-packages/fast_bert/learner_cls.py in fit(self, epochs, lr, validate, schedule_type, optimizer_type)
311 # Evaluate the model after every epoch
312 if validate:
--> 313 results = self.validate()
314 for key, value in results.items():
315 self.logger.info("eval_{} after epoch {}: {}: ".format(key, (epoch + 1), value))

~/.conda/envs/fastbert/lib/python3.6/site-packages/fast_bert/learner_cls.py in validate(self)
382 # Evaluation metrics
383 for metric in self.metrics:
--> 384 validation_scores[metric['name']] = metric['function'](all_logits, all_labels)
385
386 results = {'loss': eval_loss }

~/.conda/envs/fastbert/lib/python3.6/site-packages/fast_bert/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
29 if sigmoid:
30 y_pred = y_pred.sigmoid()
---> 31 return ((y_pred > thresh) == y_true.byte()).float().mean().item()
32 # return np.mean(((y_pred>thresh)==y_true.byte()).float().cpu().numpy(), axis=1).sum()
33

~/.conda/envs/fastbert/lib/python3.6/site-packages/apex/amp/wrap.py in wrapper(*args, **kwargs)
51
52 if len(types) <= 1:
---> 53 return orig_fn(*args, **kwargs)
54 elif len(types) == 2 and types == set(['HalfTensor', 'FloatTensor']):
55 new_args = utils.casted_args(cast_fn,

RuntimeError: Expected object of scalar type Bool but got scalar type Byte for argument #2 'other'

TypeError: unsupported operand type(s) for /: 'str' and 'str'

Hi,
I'm getting
TypeError: unsupported operand type(s) for /: 'str' and 'str'
when calling BertDataBunch. I'm actually surprised it works for others, because line 294 of data_cls.py applies the divide operator to two strings:

292 self.tokenizer = tokenizer
293 self.data_dir = data_dir
--> 294 self.cache_dir = data_dir/'cache'
295 self.max_seq_length = max_seq_length
296 self.batch_size_per_gpu = batch_size_per_gpu
Thanks!
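
The `/` in data_dir/'cache' is pathlib's path-join operator, so it only works when the paths are pathlib.Path objects; passing plain strings for DATA_PATH / LABEL_PATH produces exactly this TypeError. A minimal sketch:

from pathlib import Path

DATA_PATH = Path('./data')      # instead of DATA_PATH = './data'
LABEL_PATH = Path('./labels')   # instead of LABEL_PATH = './labels'
print(DATA_PATH / 'cache')      # data/cache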

logits as a result

What do we need to specify for the labels if we need logits as a result?

AttributeError: 'str' object has no attribute 'input_ids'

Hi, I ran into this error. How can I solve it?

Traceback (most recent call last):
File "fastBertDemo.py", line 23, in
model_type='bert')
File "/usr/local/python3/lib/python3.6/site-packages/fast_bert/data_cls.py", line 332, in init
train_dataset = self.get_dataset_from_examples(train_examples, 'train')
File "/usr/local/python3/lib/python3.6/site-packages/fast_bert/data_cls.py", line 431, in get_dataset_from_examples
all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
File "/usr/local/python3/lib/python3.6/site-packages/fast_bert/data_cls.py", line 431, in
all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
AttributeError: 'str' object has no attribute 'input_ids'

NameError: name 'threshold' is not defined


NameError Traceback (most recent call last)
in ()
8 from pytorch_pretrained_bert.tokenization import BertTokenizer
9
---> 10 from fast_bert.data import BertDataBunch
11 from fast_bert.learner import BertLearner
12 from fast_bert.metrics import accuracy, accuracy_thresh, fbeta, roc_auc

/opt/conda/lib/python3.6/site-packages/fast_bert/init.py in ()
1 from .modeling import BertForMultiLabelSequenceClassification
2 from .data import BertDataBunch, InputExample, InputFeatures, MultiLabelTextProcessor, convert_examples_to_features
----> 3 from .metrics import accuracy, accuracy_thresh, fbeta, roc_auc, accuracy_multilabel
4 from .learner import BertLearner
5 from .prediction import BertClassificationPredictor

/opt/conda/lib/python3.6/site-packages/fast_bert/metrics.py in ()
54 return roc_auc["micro"]
55
---> 56 def Hamming_loss(y_pred:Tensor, y_true:Tensor, sigmoid:bool = True, thresh:float = threshold, sample_weight = None):
57 if sigmoid: y_pred = y_pred.sigmoid()
58 y_pred = (y_pred > thresh).float()

NameError: name 'threshold' is not defined
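
The default argument thresh:float = threshold references a name that is never defined, so the whole metrics module fails to import. A hedged local patch sketch: replace the default with a literal value and finish the body with scikit-learn's hamming_loss (an assumption about what the original function intends):

from sklearn.metrics import hamming_loss

def Hamming_loss(y_pred, y_true, sigmoid=True, thresh=0.5, sample_weight=None):
    if sigmoid:
        y_pred = y_pred.sigmoid()
    y_pred = (y_pred > thresh).float()
    return hamming_loss(y_true.cpu().numpy(), y_pred.cpu().numpy(), sample_weight=sample_weight)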

RuntimeError: The size of tensor a (2) must match the size of tensor b (9833) at non-singleton dimension

I followed your instructions using my own data.
Since the batch size was too big for my data, I changed it to 6.

Then I got this error during evaluation:

08/23/2019 17:50:14 - INFO - root - Running evaluation [49/5955 00:37<1:15:53]
08/23/2019 17:50:14 - INFO - root - Num examples = 9833
08/23/2019 17:50:14 - INFO - root - Batch size = 6
Traceback (most recent call last):
  File "train_fast_bert_doc_rerank.py", line 81, in <module>
    optimizer_type="lamb"
  File "/usr/local/lib/python3.6/site-packages/fast_bert/learner_cls.py", line 295, in fit
    results = self.validate()
  File "/usr/local/lib/python3.6/site-packages/fast_bert/learner_cls.py", line 382, in validate
    validation_scores[metric['name']] = metric['function'](all_logits, all_labels)
  File "/usr/local/lib/python3.6/site-packages/fast_bert/metrics.py", line 31, in accuracy_thresh
    return ((y_pred > thresh) == y_true.byte()).float().mean().item()
RuntimeError: The size of tensor a (2) must match the size of tensor b (9833) at non-singleton dimension
Could you help me?
Thank you in advance
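
A hedged reading of the error: accuracy_thresh compares the thresholded logits of shape (num_examples, 2) element-wise against a label vector of shape (9833,), which cannot broadcast; that metric is aimed at multi-label, multi-hot targets. For a single-label, two-class task the argmax-based accuracy metric matches the label shape:

from fast_bert.metrics import accuracy

metrics = [{'name': 'accuracy', 'function': accuracy}]
# pass this list when building the learner, instead of a metric list containing accuracy_thresh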

Unable to load pretrained model using BertLearner

Getting "TypeError: init_weights() takes 1 positional argument but 2 were given" when running the below code for any of bert, xlnet model. Please note that this code was working couple of days back.

learner = BertLearner.from_pretrained_model(
databunch,
pretrained_path='bert-base-uncased',#xlnet-large-cased, bert-base-uncased
metrics=metrics,
device=device_cuda,
logger=logger,
output_dir=OUTPUT_DIR,
finetuned_wgts_path=None,
warmup_steps=500,
multi_gpu=True,
is_fp16=True,
multi_label=True,
logging_steps=50)

TypeError: __init__() got an unexpected keyword argument 'max_grad_norm'

Hi

I have an issue when running
learner.fit(epochs=6,
lr=6e-5,
validate=True, # Evaluate the model after each epoch
schedule_type="warmup_linear")

using the following learner object:

logger = logging.getLogger()
device_cuda = torch.device("cuda")
metrics = [{'name': 'accuracy', 'function': accuracy}]

learner = BertLearner.from_pretrained_model(
databunch,
pretrained_path='bert-base-uncased',
metrics=metrics,
device=device_cuda,
logger=logger,
#output_dir=OUTPUT_DIR,
finetuned_wgts_path=None,
#warmup_steps=500,
multi_gpu=True,
is_fp16=True,
multi_label=False,
max_grad_norm=1.0)


TypeError Traceback (most recent call last)
in
2 lr=6e-5,
3 validate=True, # Evaluate the model after each epoch
----> 4 schedule_type="warmup_linear")

~/.conda/envs/transformers/lib/python3.7/site-packages/fast_bert/learner.py in fit(self, epochs, lr, validate, schedule_type)
462
463 if self.use_amp_optimizer == False:
--> 464 self.fit_old(epochs, lr, validate=validate, schedule_type=schedule_type)
465 return
466

~/.conda/envs/transformers/lib/python3.7/site-packages/fast_bert/learner.py in fit_old(self, epochs, lr, validate, schedule_type)
573 num_train_steps = int(len(self.data.train_dl) / self.grad_accumulation_steps * epochs)
574 if self.optimizer is None:
--> 575 self.optimizer, self.schedule = self.get_optimizer_old(lr , num_train_steps)
576
577 t_total = num_train_steps

~/.conda/envs/transformers/lib/python3.7/site-packages/fast_bert/learner.py in get_optimizer_old(self, lr, num_train_steps, schedule_type)
233 lr=lr,
234 bias_correction=False,
--> 235 max_grad_norm=1.0)
236
237 if self.loss_scale == 0:

TypeError: init() got an unexpected keyword argument 'max_grad_norm'

Does anyone know how to fix this?
Thanks!

Finetune on all layers

How can I train this model while fine-tuning all layers?

args = Box({
    "run_text": "multilabel toxic comments with freezable layers",
    "train_size": -1,
    "val_size": -1,
    "log_path": LOG_PATH,
    "full_data_dir": DATA_PATH,
    "data_dir": DATA_PATH,
    "task_name": "toxic_classification_lib",
    "no_cuda": False,
    "bert_model": BERT_PRETRAINED_PATH,
    "output_dir": OUTPUT_PATH,
    "max_seq_length": 512,
    "do_train": True,
    "do_eval": True,
    "do_lower_case": True,
    "train_batch_size": 8,
    "eval_batch_size": 16,
    "learning_rate": 5e-5,
    "num_train_epochs": 4,
    "warmup_proportion": 0.0,
    "local_rank": -1,
    "seed": 42,
    "gradient_accumulation_steps": 1,
    "optimize_on_cpu": False,
    "fp16": True,
    "fp16_opt_level": "O1",
    "weight_decay": 0.0,
    "adam_epsilon": 1e-8,
    "max_grad_norm": 1.0,
    "max_steps": -1,
    "warmup_steps": 500,
    "logging_steps": 50,
    "eval_all_checkpoints": True,
    "overwrite_output_dir": True,
    "overwrite_cache": False,
    "loss_scale": 128,
    "task_name": 'intent',
    "model_name": 'bert-base-uncased',
    "model_type": 'bert'
})

databunch = BertDataBunch(args['data_dir'], LABEL_PATH, args.model_name,
                          train_file='train.csv',
                          val_file='val.csv',
                          test_data='test.csv',
                          text_col="text",
                          label_col=label_cols,
                          batch_size_per_gpu=args['train_batch_size'],
                          max_seq_length=args['max_seq_length'],
                          multi_gpu=args.multi_gpu,
                          multi_label=True,
                          model_type=args.model_type)

learner = BertLearner.from_pretrained_model(databunch, args.model_name,
                                            metrics=metrics,
                                            device=device,
                                            logger=logger,
                                            output_dir=args.output_dir,
                                            finetuned_wgts_path=FINETUNED_PATH,
                                            warmup_steps=args.warmup_steps,
                                            multi_gpu=args.multi_gpu,
                                            is_fp16=args.fp16,
                                            multi_label=True,
                                            logging_steps=0)
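
As far as I can tell the learner fine-tunes the full model unless layers are explicitly frozen; a quick hedged check is to count trainable versus frozen parameter tensors on the underlying model before calling learner.fit:

trainable = [n for n, p in learner.model.named_parameters() if p.requires_grad]
frozen = [n for n, p in learner.model.named_parameters() if not p.requires_grad]
print(f"trainable tensors: {len(trainable)}, frozen tensors: {len(frozen)}")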

500+ multi-labels only predicts zeros

This might not be an issue with fast-bert itself, but I'll give it a shot here anyway. I now have a dataset with 500+ labels. At first fast-bert predicts various values between 0 and 1 for every label, which seems fine, but the more I train it the more it predicts only zeros for everything. Logically that seems expected, since only about 1 in 500 labels is positive for each example while the rest are zeros. Is there a way to fix this? Can I change the loss function somehow, perhaps by introducing class weights to really penalise false negatives?
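
One common approach to this kind of extreme label sparsity is exactly that: weight the positive class in the loss. This is plain PyTorch rather than a fast-bert switch, so wiring it in would mean overriding the model's loss; a hedged sketch of the idea with BCEWithLogitsLoss and pos_weight:

import torch
import torch.nn as nn

num_labels = 500
# assumption: pos_counts / neg_counts would be computed from your training labels;
# the numbers below are placeholders for a roughly 1-positive-in-500 label distribution
pos_counts = torch.full((num_labels,), 10.0)
neg_counts = torch.full((num_labels,), 4990.0)
pos_weight = neg_counts / pos_counts.clamp(min=1.0)   # up-weights positive examples per label

loss_fct = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_labels)    # dummy model outputs for a batch of 8
targets = torch.zeros(8, num_labels)   # dummy multi-hot labels
targets[:, 0] = 1.0
print(loss_fct(logits, targets).item())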

How to train on unsupervised data only, to get domain-specific embedding representations

I saw that this module can train on a labelled dataset, but I have a huge corpus of unlabelled text data in the form of sentences. I just want to train a language model on my data so it learns domain-specific word and sentence representations (embeddings), so that I can use those embeddings for downstream unsupervised tasks. Do you have any idea how I can train a pretrained BERT model on my corpus? Thank you.
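
If your fast-bert version ships the language-model fine-tuning modules (fast_bert.data_lm and fast_bert.learner_lm), something along these lines may work; treat the exact class and parameter names as assumptions that can differ between versions:

from fast_bert.data_lm import BertLMDataBunch
from fast_bert.learner_lm import BertLMLearner

# texts: a plain Python list of raw, unlabelled sentences/documents from your domain
databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir=DATA_PATH,
    text_list=texts,
    tokenizer='bert-base-uncased',
    batch_size_per_gpu=16,
    max_seq_length=512,
    multi_gpu=False,
    model_type='bert',
    logger=logger)

lm_learner = BertLMLearner.from_pretrained_model(
    dataBunch=databunch_lm,
    pretrained_path='bert-base-uncased',
    output_dir=MODEL_PATH,
    metrics=[],
    device=device_cuda,
    logger=logger,
    multi_gpu=False,
    logging_steps=50)

lm_learner.fit(epochs=4, lr=4e-5, validate=True)
lm_learner.save_model()   # the saved weights could then be passed as finetuned_wgts_path to a classifier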
