Multi-label text classification using BERT

Jupyter Notebook 100.00%

multi-label-text-classification-using-bert's Issues

don't understand this line

Hello,
I don't understand this line:

x_test = test[125000:140000]
It gives me: x text = Empty DataFrame
Columns: [000idbert22889, Hola saber si chaqueta repelente agua referencia, 0.0, 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.0.10, 0.0.11, 0.0.12, 0.0.13, 0.0.14, 0.0.15, 0.0.16, 0.0.17, 0.0.18, 0.0.19, 21.0, 0.0.20, 0.0.21, 0.0.22, 0.0.23, 0.0.24, 0.0.25, 0.0.26, 0.0.27, 0.0.28, 0.0.29, 0.0.30, 0.0.31, 0.0.32, 0.0.33, 0.0.34, 0.0.35, 0.0.36, 0.0.37, 0.0.38, 0.0.39, 0.0.40, 0.0.41, 0.0.42, 0.0.43, 0.0.44, 0.0.45, 0.0.46, 0.0.47, 0.0.48, 0.0.49, 0.0.50, 0.0.51, 0.0.52, 0.0.53, 0.0.54, 0.0.55, 0.0.56, 0.0.57, 0.0.58, 0.0.59, 0.0.60, 0.0.61, 0.0.62, 0.0.63, 0.0.64, 0.0.65, 0.0.66, 0.0.67, 0.0.68, 0.0.69, 0.0.70, 0.0.71, 0.0.72, 0.0.73, 0.0.74, 0.0.75, 0.0.76, 0.0.77, 0.0.78, 0.0.79, 0.0.80, 0.0.81, 0.0.82, 0.0.83, 0.0.84, 0.0.85, 0.0.86, 0.0.87, 0.0.88, 0.0.89, 0.0.90, 0.0.91, 0.0.92, 0.0.93, 0.0.94, 0.0.95, 0.0.96, ...]
Index: []

when I print it, and this error:
Traceback (most recent call last):
File "Bert.py", line 904, in <module>
dff.columns = BERTtest.LABEL_COLUMNS
File "/home/tf/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5080, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
File "/home/tf/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 638, in _set_axis
self._data.set_axis(axis, labels)
File "/home/tf/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 155, in set_axis
'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 114 elements
Any idea?

thanks
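
The `Empty DataFrame` output suggests that `test` has fewer than 125000 rows, so the slice `[125000:140000]` selects nothing and the later column assignment fails with "Expected axis has 0 elements". Below is a minimal sketch in plain Python of a guard that makes this visible early. `checked_slice_bounds` is a hypothetical helper, not part of the notebook, and a list stands in for the DataFrame; the same bounds could be applied to a DataFrame slice.

```python
def checked_slice_bounds(n_rows, start, stop):
    """Clip [start, stop) to the available rows and fail loudly if nothing is left.

    Hypothetical helper, not part of the notebook.
    """
    clipped_start = max(0, min(start, n_rows))
    clipped_stop = max(clipped_start, min(stop, n_rows))
    if clipped_start == clipped_stop:
        raise ValueError(
            "slice [%d:%d] is empty: only %d rows available" % (start, stop, n_rows)
        )
    return clipped_start, clipped_stop


# Python slicing never raises for out-of-range bounds; it silently returns nothing.
rows = list(range(1000))  # stand-in for a 1000-row test set
silent_result = rows[125000:140000]

# The guard turns the silent empty result into an explicit error.
try:
    checked_slice_bounds(len(rows), 125000, 140000)
    guard_message = None
except ValueError as err:
    guard_message = str(err)

# A slice that only partly overlaps the data is clipped to what exists.
start, stop = checked_slice_bounds(len(rows), 900, 1400)
```

Checking `len(test)` before choosing the slice bounds would show whether the hard-coded indices fit the dataset being used.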

train.tf_record not found error

Hi, can you tell me where you got the file "train.tf_record"?

Unable to run on TPU

Hi,
Thanks for this code! I am having trouble running it on a TPU. I get the following error:

/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/model_fn.py in _validate_scaffold(scaffold)
619 if not isinstance(scaffold, monitored_session.Scaffold):
620 raise TypeError(
--> 621 'scaffold must be tf.train.Scaffold. Given: {}'.format(scaffold))
622 return scaffold
623

TypeError: scaffold must be tf.train.Scaffold. Given: <function model_fn_builder.<locals>.model_fn.<locals>.tpu_scaffold at 0x7f450a472bf8>

Does anyone know how to resolve this?

How to save this model for serving?

Hi, thanks for the great article.

Can you explain how to save the estimator for serving?

missing files.

Hi, does anyone know where to find these two files, or how to create them?
eval_file = os.path.join('./working', "eval.tf_record")
train_file = os.path.join('./working', "train.tf_record")
The code refers to these two files.
Thanks
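
Judging from the traceback in the "Error while logging info" issue on this page, where `file_based_convert_examples_to_features` opens a `TFRecordWriter` on `train_file`, these files appear to be outputs that the notebook writes itself rather than something to download. The sketch below only makes sure the working directory exists and reports which record files still need to be generated. `missing_record_files` is a hypothetical helper, and a temporary directory stands in for `./working`.

```python
import os
import tempfile


def missing_record_files(working_dir, names=("train.tf_record", "eval.tf_record")):
    """Create working_dir if needed and list the record files not yet written.

    Hypothetical helper; the file names are the ones quoted in the issue.
    """
    os.makedirs(working_dir, exist_ok=True)
    return [
        name for name in names
        if not os.path.isfile(os.path.join(working_dir, name))
    ]


# Demo in a temporary directory so nothing is written to the real ./working.
demo_dir = os.path.join(tempfile.mkdtemp(), "working")
before = missing_record_files(demo_dir)

# Simulate the conversion step having written the training file.
open(os.path.join(demo_dir, "train.tf_record"), "wb").close()
after = missing_record_files(demo_dir)
```

If a file is reported missing, re-running the notebook cell that converts examples to features would be the first thing to try.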

Not running with TF 2

Is this supposed to run with TF2? I tried running it in a virtual env on a Mac with no success. Any help would be greatly appreciated.

Faster Prediction

Hi, your code is super clean and intuitive, thanks for the good work. I have run it and found the saved model under the "working/output" dir.
However, I am trying to use this in a presentation. I noticed that every call to predict reloads the model, which costs 5+ seconds. Can you suggest a way to avoid feeding input_fn into predict each time, so that I can get real-time predictions (one sentence at a time)?

I have tried fast_predict, but it doesn't seem to work either.
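
One pattern often used for this is to start a single prediction stream and keep it alive, feeding it new inputs through a generator instead of calling predict again. The sketch below shows only the control flow in plain Python. `KeepAlivePredictor` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in for the estimator's predict call. With a real Estimator, the `input_fn` would need to wrap such a generator (for example through `tf.data.Dataset.from_generator`) and avoid batching or prefetching more than one example, which this sketch does not cover.

```python
import queue


class KeepAlivePredictor:
    """Feed inputs one at a time into a single long-lived prediction stream."""

    def __init__(self, predict_fn):
        # predict_fn takes an iterator of inputs and returns a lazy iterator of
        # outputs, one per input. It is called once, so any setup inside it
        # (for example loading a model) also happens once.
        self._inputs = queue.Queue()
        self._outputs = predict_fn(self._feed())

    def _feed(self):
        # Wait for the next input; None is the shutdown signal.
        while True:
            item = self._inputs.get()
            if item is None:
                return
            yield item

    def predict(self, item):
        self._inputs.put(item)
        return next(self._outputs)

    def close(self):
        self._inputs.put(None)


stats = {"loads": 0}


def fake_predict(inputs):
    """Stand-in for a slow-to-start predictor; counts how often it 'loads'."""
    stats["loads"] += 1  # pretend this is the 5+ second model load
    for text in inputs:
        yield len(text.split())  # dummy "prediction": number of words


predictor = KeepAlivePredictor(fake_predict)
first = predictor.predict("this is one sentence")
second = predictor.predict("another input")
```

Both calls go through the same stream, so the simulated load runs once no matter how many sentences are predicted.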

Using Different Dataset format.

Hi, thank you for your wonderful tutorial.
I was trying to train the model on a slightly different dataset.
My data has label values in the range 0.0 to 1.0, as shown below.

So basically it's a regression task combined with classification: I have to predict how much of each emotion a text input contains.

I have implemented your notebook and retrained it; the result looks something like the following.

Here is my implementation notebook.
Am I doing something wrong?
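
The loss that appears in the tracebacks on this page, `tf.nn.sigmoid_cross_entropy_with_logits`, is defined for any label value between 0 and 1, so soft labels are a valid input to it. The sketch below re-implements the numerically stable formula from the TensorFlow documentation in plain Python, as an illustration rather than the notebook's code, and checks that for a soft label the loss is smallest when the predicted probability equals that label. `soft_label` and the helper names are made up for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy(logit, label):
    """Stable form of the logistic loss: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


soft_label = 0.3  # illustrative target, e.g. "30% of this emotion"

# The logit whose sigmoid equals the soft label.
best_logit = math.log(soft_label / (1.0 - soft_label))

loss_at_best = sigmoid_cross_entropy(best_logit, soft_label)
loss_nearby = [
    sigmoid_cross_entropy(best_logit + delta, soft_label) for delta in (-1.0, 1.0)
]
```

Note that with soft labels the minimum loss is not zero, so a training loss that plateaus above zero is not by itself a sign of a bug.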

ModuleNotFoundError: No module named 'tensorflow.contrib'

ModuleNotFoundError Traceback (most recent call last)
in ()
1 import bert
----> 2 from bert import run_classifier
3 from bert import optimization
4 from bert import tokenization
5 from bert import modeling

1 frames
/usr/local/lib/python3.6/dist-packages/bert/modeling.py in ()
27 import six
28 import tensorflow.compat.v1 as tf
---> 29 from tensorflow.contrib import layers as contrib_layers
30
31

ModuleNotFoundError: No module named 'tensorflow.contrib'
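
`tensorflow.contrib` was removed in TensorFlow 2.0, so this import can only succeed on a 1.x installation. A small sketch of a version check that could run before importing `bert`; `has_tf1_api` is a hypothetical helper, and it takes the version string as an argument so that it can be tried without TensorFlow installed.

```python
def has_tf1_api(version):
    """Return True for a TensorFlow version string below 2.0.

    Hypothetical helper: tf.contrib exists only in releases before 2.0.
    """
    major = int(version.split(".")[0])
    return major < 2


examples = {v: has_tf1_api(v) for v in ("1.15.0", "2.0.0", "2.1.0")}
```

In a notebook this would be called as `has_tf1_api(tf.__version__)`. If it returns False, installing a 1.x release of TensorFlow is the usual workaround for code written against `tf.contrib`.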

Why are all probabilities in the range 0.1 - 0.2 with my own data?

I trained a multi-label classifier with 10 labels (about 1000 training examples), but the predicted probability for every label, on every query, falls between 0.1 and 0.2. When I use the toxic_comments dataset, the output probabilities look normal. Is this due to the data size?

logits and label shapes are not compatible error

My task has 11 labels. I've changed LABEL_COLUMNS according to my task, but got this error while running:

ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in merge_with(self, other)
    930         for i, dim in enumerate(self._dims):
--> 931           new_dims.append(dim.merge_with(other[i]))
    932         return TensorShape(new_dims)

~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in merge_with(self, other)
    310       return NotImplemented
--> 311     self.assert_is_compatible_with(other)
    312     if self._value is None:

~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in assert_is_compatible_with(self, other)
    274       raise ValueError("Dimensions %s and %s are not compatible" %
--> 275                        (self, other))
    276 

ValueError: Dimensions 6 and 11 are not compatible

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_impl.py in sigmoid_cross_entropy_with_logits(_sentinel, labels, logits, name)
    167     try:
--> 168       labels.get_shape().merge_with(logits.get_shape())
    169     except ValueError:

~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in merge_with(self, other)
    933       except ValueError:
--> 934         raise ValueError("Shapes %s and %s are not compatible" % (self, other))
    935 

ValueError: Shapes (32, 6) and (32, 11) are not compatible

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-1-e917108fd425> in <module>
    670 print(f'Beginning Training!')
    671 current_time = datetime.now()
--> 672 estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
    673 print("Training took time ", datetime.now() - current_time)
    674 

~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
    368 
    369       saving_listeners = _check_listeners_type(saving_listeners)
--> 370       loss = self._train_model(input_fn, hooks, saving_listeners)
    371       logging.info('Loss for final step: %s.', loss)
    372       return self

~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
   1159       return self._train_model_distributed(input_fn, hooks, saving_listeners)
   1160     else:
-> 1161       return self._train_model_default(input_fn, hooks, saving_listeners)
   1162 
   1163   def _train_model_default(self, input_fn, hooks, saving_listeners):

~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
   1189       worker_hooks.extend(input_hooks)
   1190       estimator_spec = self._call_model_fn(
-> 1191           features, labels, ModeKeys.TRAIN, self.config)
   1192       global_step_tensor = training_util.get_global_step(g)
   1193       return self._train_with_estimator_spec(estimator_spec, worker_hooks,

~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _call_model_fn(self, features, labels, mode, config)
   1147 
   1148     logging.info('Calling model_fn.')
-> 1149     model_fn_results = self._model_fn(features=features, **kwargs)
   1150     logging.info('Done calling model_fn.')
   1151 

<ipython-input-1-e917108fd425> in model_fn(features, labels, mode, params)
    567         (total_loss, per_example_loss, logits, probabilities) = create_model(
    568             bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,
--> 569             num_labels, use_one_hot_embeddings)
    570 
    571         tvars = tf.trainable_variables()

<ipython-input-1-e917108fd425> in create_model(bert_config, is_training, input_ids, input_mask, segment_ids, labels, num_labels, use_one_hot_embeddings)
    527         labels = tf.cast(labels, tf.float32)
    528         tf.logging.info("num_labels:{};logits:{};labels:{}".format(num_labels, logits, labels))
--> 529         per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    530         loss = tf.reduce_mean(per_example_loss)
    531 

~\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_impl.py in sigmoid_cross_entropy_with_logits(_sentinel, labels, logits, name)
    169     except ValueError:
    170       raise ValueError("logits and labels must have the same shape (%s vs %s)" %
--> 171                        (logits.get_shape(), labels.get_shape()))
    172 
    173     # The logistic loss formula from above is

ValueError: logits and labels must have the same shape ((32, 11) vs (32, 6))
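
The final message says the logits have 11 columns, matching the new LABEL_COLUMNS, while the labels still arrive with 6. That suggests the number 6 is still hard-coded somewhere in the input pipeline, for example where the label feature is written or parsed; this is an assumption, since that code is not shown in the issue. The sketch below is a plain-Python check that could run on the label rows before training. `check_label_width` and the placeholder column names are hypothetical.

```python
# Placeholder names for an 11-label task; the real names come from the dataset.
LABEL_COLUMNS = ["label_%d" % i for i in range(11)]


def check_label_width(label_rows, label_columns):
    """Raise if any example's label vector does not match len(label_columns)."""
    expected = len(label_columns)
    for index, row in enumerate(label_rows):
        if len(row) != expected:
            raise ValueError(
                "example %d has %d labels, expected %d" % (index, len(row), expected)
            )
    return expected


good_rows = [[0] * 11, [1] * 11]
stale_rows = [[0] * 6]  # what a pipeline still hard-coded to 6 labels would produce

width = check_label_width(good_rows, LABEL_COLUMNS)
try:
    check_label_width(stale_rows, LABEL_COLUMNS)
    mismatch = None
except ValueError as err:
    mismatch = str(err)
```

Replacing every literal 6 in the data-writing and data-reading code with `len(LABEL_COLUMNS)` would keep the two shapes in step.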

Error while logging info on 21st line in notebook

TypeError Traceback (most recent call last)
in ()
1 file_based_convert_examples_to_features(
----> 2 train_examples, MAX_SEQ_LENGTH, tokenizer, train_file)
3 tf.logging.info("***** Running training *****")
4 tf.logging.info(" Num examples = %d", len(train_examples))
5 tf.logging.info(" Num steps = %d", num_train_steps)

in file_based_convert_examples_to_features(examples, max_seq_length, tokenizer, output_file)
108 writer = tf.python_io.TFRecordWriter(output_file)
109
--> 110 for (ex_index, example) in enumerate(examples):
111 #if ex_index % 10000 == 0:
112 #tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))

TypeError: 'NoneType' object is not iterable
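
The TypeError means that `train_examples` was `None` when it reached `file_based_convert_examples_to_features`. The cell that builds it is not shown, so the cause here is a guess; one common way to end up with `None` is a helper that builds a list but never returns it. A minimal sketch of that failure mode and of an early check follows; all function names are hypothetical.

```python
def build_examples_buggy(texts):
    examples = [text.strip() for text in texts]
    # No return statement: the function implicitly returns None.


def build_examples(texts):
    examples = [text.strip() for text in texts]
    return examples


def require_examples(examples, name):
    """Fail with a clear message instead of a later 'NoneType' error."""
    if examples is None:
        raise ValueError("%s is None; check the step that creates it" % name)
    return examples


buggy_result = build_examples_buggy([" a comment "])
fixed_result = require_examples(build_examples([" a comment "]), "train_examples")

try:
    require_examples(buggy_result, "train_examples")
    check_message = None
except ValueError as err:
    check_message = str(err)
```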

logits and labels are of different shape

I am new to this and am trying to classify a self-made dataset into 15 classes, but I get this error:

Traceback (most recent call last):
File "train.py", line 656, in <module>
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "train.py", line 263, in model_fn
num_labels, use_one_hot_embeddings)
File "train.py", line 223, in create_model
per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py", line 168, in sigmoid_cross_entropy_with_logits
(logits.get_shape(), labels.get_shape()))
ValueError: logits and labels must have the same shape ((32, 15) vs (32, 6))

Please let me know what I need to change in the model.
Thanks

Unable to run

I want to run the notebook, but I get an error.

NotFoundError when saving checkpoints

I tried running the Jupyter Notebook on Windows and got this during training:

NotFoundError: Failed to create a NewWriteableFile: ./working/output\model.ckpt-0_temp_75eddd18df2249f182707014e8ca29a8/part-00000-of-00001.data-00000-of-00001.tempstate2063351034841590876 : The system cannot find the path specified.
; No such process

In the ./working/output folder I find two files though:

model.ckpt-0_temp_75eddd18df2249f182707014e8ca29a8
graph.pbtxt

Could there be something wrong with the backslashes in this part perhaps:

OUTPUT_DIR = "./working/output"
run_config = tf.estimator.RunConfig(model_dir=OUTPUT_DIR, save_summary_steps=SAVE_SUMMARY_STEPS, keep_checkpoint_max=1, save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)
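
The failing path in the error message mixes separators (`./working/output\model.ckpt-0...`). One thing to try, offered as an untested assumption rather than a confirmed fix, is to pass an absolute, normalized, already existing directory as `OUTPUT_DIR`. `prepare_output_dir` is a hypothetical helper, and a temporary directory stands in for the project folder.

```python
import os
import tempfile


def prepare_output_dir(path):
    """Return an absolute, normalized version of path and make sure it exists."""
    absolute = os.path.abspath(path)  # also normalizes separators for the OS
    os.makedirs(absolute, exist_ok=True)
    return absolute


# Demo under a temporary base directory instead of the real project folder.
base = tempfile.mkdtemp()
output_dir = prepare_output_dir(os.path.join(base, "working", "output"))
```

In the notebook this would become `OUTPUT_DIR = prepare_output_dir("./working/output")` before the `RunConfig` is created.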

AttributeError: ...has no attribute 'Optimizer'

What should I do to fix this attribute error?

AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'Optimizer'

code:

import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization
from bert import modeling

My tensorflow version is 2.0.0, but I got the same error on 2.1.
bert-tensorflow version 1.0.1

How to plot confusion matrix

Hi there,

Can you give me some suggestions on how to plot a confusion matrix using your code?
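
A multi-label model does not have one confusion matrix in the usual sense; a common approach is to build a separate 2x2 table (true/false positives and negatives) for each label after thresholding the predicted probabilities. The sketch below computes those counts in plain Python. `per_label_confusion`, the 0.5 threshold, the two label names, and the sample numbers are illustrative assumptions, not values from the notebook.

```python
def per_label_confusion(y_true, y_prob, label_names, threshold=0.5):
    """Count tp/fp/fn/tn separately for each label.

    y_true: rows of 0/1 labels; y_prob: rows of predicted probabilities.
    """
    counts = {name: {"tp": 0, "fp": 0, "fn": 0, "tn": 0} for name in label_names}
    for true_row, prob_row in zip(y_true, y_prob):
        for name, truth, prob in zip(label_names, true_row, prob_row):
            predicted = prob >= threshold
            if predicted and truth:
                key = "tp"
            elif predicted and not truth:
                key = "fp"
            elif truth:
                key = "fn"
            else:
                key = "tn"
            counts[name][key] += 1
    return counts


labels = ["toxic", "insult"]  # illustrative label names
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_prob = [[0.9, 0.2], [0.6, 0.7], [0.3, 0.8], [0.1, 0.4]]

confusion = per_label_confusion(y_true, y_prob, labels)
```

Each label's four counts can then be drawn as a small 2x2 heatmap with whatever plotting library is already in use.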

javaidnabi31 / multi-label-text-classification-using-bert
