javaidnabi31 / multi-label-text-classification-using-bert
Multi Label text classification using bert
Hello,
I don't understand this line:
x_test = test[125000:140000]
It gives me: x_test = Empty DataFrame
Columns: [000idbert22889, Hola saber si chaqueta repelente agua referencia, 0.0, 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.0.10, 0.0.11, 0.0.12, 0.0.13, 0.0.14, 0.0.15, 0.0.16, 0.0.17, 0.0.18, 0.0.19, 21.0, 0.0.20, 0.0.21, 0.0.22, 0.0.23, 0.0.24, 0.0.25, 0.0.26, 0.0.27, 0.0.28, 0.0.29, 0.0.30, 0.0.31, 0.0.32, 0.0.33, 0.0.34, 0.0.35, 0.0.36, 0.0.37, 0.0.38, 0.0.39, 0.0.40, 0.0.41, 0.0.42, 0.0.43, 0.0.44, 0.0.45, 0.0.46, 0.0.47, 0.0.48, 0.0.49, 0.0.50, 0.0.51, 0.0.52, 0.0.53, 0.0.54, 0.0.55, 0.0.56, 0.0.57, 0.0.58, 0.0.59, 0.0.60, 0.0.61, 0.0.62, 0.0.63, 0.0.64, 0.0.65, 0.0.66, 0.0.67, 0.0.68, 0.0.69, 0.0.70, 0.0.71, 0.0.72, 0.0.73, 0.0.74, 0.0.75, 0.0.76, 0.0.77, 0.0.78, 0.0.79, 0.0.80, 0.0.81, 0.0.82, 0.0.83, 0.0.84, 0.0.85, 0.0.86, 0.0.87, 0.0.88, 0.0.89, 0.0.90, 0.0.91, 0.0.92, 0.0.93, 0.0.94, 0.0.95, 0.0.96, ...]
Index: []
when I print it, followed by this error:
Traceback (most recent call last):
File "Bert.py", line 904, in <module>
dff.columns = BERTtest.LABEL_COLUMNS
File "/home/tf/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 5080, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
File "/home/tf/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 638, in _set_axis
self._data.set_axis(axis, labels)
File "/home/tf/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 155, in set_axis
'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 114 elements
Any ideas?
Thanks
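One possible cause, assuming `test` has fewer than 125,000 rows: a positional slice that starts past the end of a DataFrame returns an empty frame instead of raising, just as list slicing does, and assigning the 114 label names to the resulting empty frame then fails. A minimal sketch of deriving the bounds from the actual row count follows; `tail_slice_bounds` is a hypothetical helper, not part of the repository, and a plain list stands in for the DataFrame.

```python
def tail_slice_bounds(n_rows, n_wanted):
    """Return (start, stop) selecting the last n_wanted rows, clamped to n_rows."""
    if n_rows < 0 or n_wanted < 0:
        raise ValueError("row counts must be non-negative")
    start = max(0, n_rows - n_wanted)
    return start, n_rows


rows = list(range(1000))          # stand-in for a 1,000-row DataFrame
print(len(rows[125000:140000]))   # an out-of-range slice is empty, not an error

start, stop = tail_slice_bounds(len(rows), 150)
held_out = rows[start:stop]
print(start, stop, len(held_out))
```

With a DataFrame, the same bounds would be computed from `len(test)` and used as `test[start:stop]`.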
Hi, can you tell me where you got the file "train.tf_record"?
Hi,
Thanks for this code! I am having trouble running it on a TPU. I get the following error:
/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/model_fn.py in _validate_scaffold(scaffold)
619 if not isinstance(scaffold, monitored_session.Scaffold):
620 raise TypeError(
--> 621 'scaffold must be tf.train.Scaffold. Given: {}'.format(scaffold))
622 return scaffold
623
TypeError: scaffold must be tf.train.Scaffold. Given: <function model_fn_builder.<locals>.model_fn.<locals>.tpu_scaffold at 0x7f450a472bf8>
Does anyone know how to resolve this?
Hi, thanks for the great article.
Can you show me how to save the estimator for serving?
Hi, does anyone know where to find or how to create these two files?
eval_file = os.path.join('./working', "eval.tf_record")
train_file = os.path.join('./working', "train.tf_record")
The code refers to these two files.
Thanks
Is this supposed to run with TF2? I tried running it in a virtual env on a Mac with no success. Any help would be greatly appreciated.
Hi, your code is super clean and intuitive, thanks for the good work. I have run it and found the saved model under the "working/output" dir.
However, I am trying to use this in a presentation. I noticed that every call to predict reloads the model, which costs 5+ seconds. Can you provide some advice so that I don't need to feed input_fn into predict and can get real-time predictions (one input sentence at a time)?
I have tried fast_predict, but that doesn't seem to work either.
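For reference, the general idea behind keeping a predictor "warm" can be sketched without TensorFlow: call the prediction generator once, feed it from a queue, and pull one result per input. Everything below is an assumption for illustration; `KeepAlivePredictor` and `fake_predict_stream` are hypothetical names, and the stub only stands in for a real `estimator.predict` call whose `input_fn` would wrap the feeding generator (for example via `tf.data.Dataset.from_generator`).

```python
import queue


class KeepAlivePredictor:
    """Feed one input at a time to a single long-lived prediction generator."""

    def __init__(self, predict_stream):
        self._inputs = queue.Queue()
        # predict_stream is called exactly once and must consume inputs lazily.
        self._outputs = predict_stream(self._feed())

    def _feed(self):
        while True:
            item = self._inputs.get()
            if item is None:  # sentinel: stop feeding
                return
            yield item

    def predict(self, item):
        self._inputs.put(item)
        return next(self._outputs)

    def close(self):
        self._inputs.put(None)


load_count = 0


def fake_predict_stream(items):
    """Hypothetical stand-in for estimator.predict: 'loads' once, then streams."""
    global load_count
    load_count += 1  # the expensive model load would happen here
    for text in items:
        yield len(text.split())


predictor = KeepAlivePredictor(fake_predict_stream)
print(predictor.predict("hello world"))
print(predictor.predict("one more sentence here"))
print(load_count)
```

Whether this pattern behaves the same with the estimator in this repository is untested here; exporting the model for serving is another route worth considering.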
Hi, thank you for your wonderful tutorial.
I was trying to train the model on a slightly different dataset.
My labels are values in the range 0.0 to 1.0, as shown below.
So basically it's a regression task along with classification: I have to tell how much of each emotion a text input has.
I have implemented your notebook and retrained it; the result looks something like the following.
Here is my implementation notebook.
Am I doing something wrong?
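A note on the loss, as a sketch rather than a diagnosis: the sigmoid cross-entropy used for multi-label training is defined for any target in [0, 1], and for a soft target it is smallest when the predicted probability equals that target. The pure-Python version below uses the numerically stable form documented for `tf.nn.sigmoid_cross_entropy_with_logits`; the target value 0.3 is made up. One thing worth checking, as an assumption about the pipeline, is whether the labels are written or parsed as integers anywhere, since that would truncate fractional values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy(logit, target):
    """Numerically stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


target = 0.3  # soft label, e.g. an emotion intensity (made-up value)
best_logit = math.log(target / (1.0 - target))  # sigmoid(best_logit) == target

for logit in (-2.0, best_logit, 2.0):
    print(round(sigmoid(logit), 3), round(sigmoid_cross_entropy(logit, target), 4))
```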
ModuleNotFoundError Traceback (most recent call last)
in ()
1 import bert
----> 2 from bert import run_classifier
3 from bert import optimization
4 from bert import tokenization
5 from bert import modeling
1 frames
/usr/local/lib/python3.6/dist-packages/bert/modeling.py in ()
27 import six
28 import tensorflow.compat.v1 as tf
---> 29 from tensorflow.contrib import layers as contrib_layers
30
31
ModuleNotFoundError: No module named 'tensorflow.contrib'
I trained a multi-label classifier with 10 labels (about 1,000 training examples), but the predicted probability for every query and every label falls between 0.1 and 0.2. When I use the toxic_comments dataset, the output probabilities look normal. Is this due to the data size?
My task has 11 labels. I've changed LABEL_COLUMNS accordingly, but I get this error when running:
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in merge_with(self, other)
930 for i, dim in enumerate(self._dims):
--> 931 new_dims.append(dim.merge_with(other[i]))
932 return TensorShape(new_dims)
~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in merge_with(self, other)
310 return NotImplemented
--> 311 self.assert_is_compatible_with(other)
312 if self._value is None:
~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in assert_is_compatible_with(self, other)
274 raise ValueError("Dimensions %s and %s are not compatible" %
--> 275 (self, other))
276
ValueError: Dimensions 6 and 11 are not compatible
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_impl.py in sigmoid_cross_entropy_with_logits(_sentinel, labels, logits, name)
167 try:
--> 168 labels.get_shape().merge_with(logits.get_shape())
169 except ValueError:
~\Anaconda3\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py in merge_with(self, other)
933 except ValueError:
--> 934 raise ValueError("Shapes %s and %s are not compatible" % (self, other))
935
ValueError: Shapes (32, 6) and (32, 11) are not compatible
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-1-e917108fd425> in <module>
670 print(f'Beginning Training!')
671 current_time = datetime.now()
--> 672 estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
673 print("Training took time ", datetime.now() - current_time)
674
~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
368
369 saving_listeners = _check_listeners_type(saving_listeners)
--> 370 loss = self._train_model(input_fn, hooks, saving_listeners)
371 logging.info('Loss for final step: %s.', loss)
372 return self
~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
1159 return self._train_model_distributed(input_fn, hooks, saving_listeners)
1160 else:
-> 1161 return self._train_model_default(input_fn, hooks, saving_listeners)
1162
1163 def _train_model_default(self, input_fn, hooks, saving_listeners):
~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
1189 worker_hooks.extend(input_hooks)
1190 estimator_spec = self._call_model_fn(
-> 1191 features, labels, ModeKeys.TRAIN, self.config)
1192 global_step_tensor = training_util.get_global_step(g)
1193 return self._train_with_estimator_spec(estimator_spec, worker_hooks,
~\Anaconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py in _call_model_fn(self, features, labels, mode, config)
1147
1148 logging.info('Calling model_fn.')
-> 1149 model_fn_results = self._model_fn(features=features, **kwargs)
1150 logging.info('Done calling model_fn.')
1151
<ipython-input-1-e917108fd425> in model_fn(features, labels, mode, params)
567 (total_loss, per_example_loss, logits, probabilities) = create_model(
568 bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,
--> 569 num_labels, use_one_hot_embeddings)
570
571 tvars = tf.trainable_variables()
<ipython-input-1-e917108fd425> in create_model(bert_config, is_training, input_ids, input_mask, segment_ids, labels, num_labels, use_one_hot_embeddings)
527 labels = tf.cast(labels, tf.float32)
528 tf.logging.info("num_labels:{};logits:{};labels:{}".format(num_labels, logits, labels))
--> 529 per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
530 loss = tf.reduce_mean(per_example_loss)
531
~\Anaconda3\lib\site-packages\tensorflow_core\python\ops\nn_impl.py in sigmoid_cross_entropy_with_logits(_sentinel, labels, logits, name)
169 except ValueError:
170 raise ValueError("logits and labels must have the same shape (%s vs %s)" %
--> 171 (logits.get_shape(), labels.get_shape()))
172
173 # The logistic loss formula from above is
ValueError: logits and labels must have the same shape ((32, 11) vs (32, 6))
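The message says the logits already have width 11 while the labels arriving at the loss still have width 6, which suggests a label width of 6 is fixed somewhere other than LABEL_COLUMNS (for instance where the label feature is parsed from the TFRecord; that location is an assumption). A minimal sketch of deriving the width from one place and checking it early; the label names and `check_label_width` are hypothetical.

```python
LABEL_COLUMNS = ["label_%d" % i for i in range(11)]  # hypothetical 11-label task
NUM_LABELS = len(LABEL_COLUMNS)  # single source of truth for the label width


def check_label_width(batch_labels, num_labels):
    """Raise early if any label row does not have num_labels entries."""
    for row_index, row in enumerate(batch_labels):
        if len(row) != num_labels:
            raise ValueError(
                "row %d has %d labels, expected %d" % (row_index, len(row), num_labels)
            )
    return True


good_batch = [[0] * NUM_LABELS for _ in range(4)]
print(check_label_width(good_batch, NUM_LABELS))

stale_batch = [[0] * 6 for _ in range(4)]  # width left over from a 6-label setup
try:
    check_label_width(stale_batch, NUM_LABELS)
except ValueError as err:
    print(err)
```

Any place that declares the shape of the label feature would then use `NUM_LABELS` instead of a literal.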
TypeError Traceback (most recent call last)
in ()
1 file_based_convert_examples_to_features(
----> 2 train_examples, MAX_SEQ_LENGTH, tokenizer, train_file)
3 tf.logging.info("***** Running training *****")
4 tf.logging.info(" Num examples = %d", len(train_examples))
5 tf.logging.info(" Num steps = %d", num_train_steps)
in file_based_convert_examples_to_features(examples, max_seq_length, tokenizer, output_file)
108 writer = tf.python_io.TFRecordWriter(output_file)
109
--> 110 for (ex_index, example) in enumerate(examples):
111 #if ex_index % 10000 == 0:
112 #tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))
TypeError: 'NoneType' object is not iterable
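The TypeError means `train_examples` was `None` by the time it reached `file_based_convert_examples_to_features`. One common way this happens in Python, offered here only as a guess, is a helper that builds the list but never returns it. The sketch below shows that failure mode and a guard that reports it earlier; all three function names are hypothetical.

```python
def create_examples_buggy(rows):
    examples = [row.strip() for row in rows]
    # Missing "return examples": the caller receives None.


def create_examples(rows):
    examples = [row.strip() for row in rows]
    return examples


def require_examples(examples, name="train_examples"):
    """Fail with a clear message before the examples reach the TFRecord writer."""
    if examples is None:
        raise ValueError(
            "%s is None; check that the function building it returns a list" % name
        )
    return examples


rows = [" first text ", " second text "]
print(create_examples_buggy(rows))
print(len(require_examples(create_examples(rows))))
```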
I am new to this and am trying to classify a self-made dataset into 15 classes, but I get this error:
Traceback (most recent call last):
File "train.py", line 656, in <module>
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "train.py", line 263, in model_fn
num_labels, use_one_hot_embeddings)
File "train.py", line 223, in create_model
per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/nn_impl.py", line 168, in sigmoid_cross_entropy_with_logits
(logits.get_shape(), labels.get_shape()))
ValueError: logits and labels must have the same shape ((32, 15) vs (32, 6))
Please let me know what to change in the model.
Thanks
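Because the loss here compares logits and labels element by element, each example needs a label vector with one entry per class, so 15 entries in this case. A minimal sketch of that encoding, assuming each example comes with a list of integer class ids; `to_multi_hot` is a hypothetical helper, not part of the repository.

```python
NUM_CLASSES = 15


def to_multi_hot(class_ids, num_classes=NUM_CLASSES):
    """Turn a list of class ids into a 0/1 vector of length num_classes."""
    vector = [0] * num_classes
    for class_id in class_ids:
        if not 0 <= class_id < num_classes:
            raise ValueError("class id %d outside 0..%d" % (class_id, num_classes - 1))
        vector[class_id] = 1
    return vector


print(to_multi_hot([3]))      # a single-class example
print(to_multi_hot([0, 14]))  # an example carrying two labels
```

The width used when the labels are written and read back would need to be 15 as well; where that value is set in train.py is not visible from the traceback.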
I tried running the Jupyter Notebook on Windows and got this during training:
NotFoundError: Failed to create a NewWriteableFile: ./working/output\model.ckpt-0_temp_75eddd18df2249f182707014e8ca29a8/part-00000-of-00001.data-00000-of-00001.tempstate2063351034841590876 : The system cannot find the path specified.
; No such process
In the ./working/output folder I find two files though:
Could there be something wrong with the backslashes in this part perhaps:
OUTPUT_DIR = "./working/output"
run_config = tf.estimator.RunConfig(model_dir=OUTPUT_DIR, save_summary_steps=SAVE_SUMMARY_STEPS, keep_checkpoint_max=1, save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)
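The path in the error mixes "/" and "\" and is also quite long, so either the separators or the Windows path length limit could be involved; which one applies is not clear from the message. A sketch of building a normalized absolute path and creating the directory up front is shown below. `prepare_output_dir` is a hypothetical helper, and a temporary directory stands in for the project root.

```python
import os
import tempfile


def prepare_output_dir(base_dir, *parts):
    """Build a normalized absolute path and make sure the directory exists."""
    path = os.path.abspath(os.path.join(base_dir, *parts))
    os.makedirs(path, exist_ok=True)
    return path


# A temporary directory stands in for the project root in this example.
with tempfile.TemporaryDirectory() as project_root:
    output_dir = prepare_output_dir(project_root, "working", "output")
    print(os.path.isdir(output_dir), os.path.isabs(output_dir))
```

The returned path would then be passed as `model_dir` in place of the relative "./working/output" string. A shorter base directory keeps the checkpoint temp paths shorter too.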
What should I do to fix this attribute error?
AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'Optimizer'
code:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization
from bert import modeling
My tensorflow version is 2.0.0, but I got the same error on 2.1.
bert-tensorflow version 1.0.1
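Some context on the versions: `tf.train.Optimizer` is a TensorFlow 1.x API (in 2.x it is only reachable as `tf.compat.v1.train.Optimizer`), and `tf.contrib` was removed in 2.0, so TF1-style BERT code fails on import under 2.0 or 2.1. A small sketch of a version guard follows; both function names are hypothetical, and in practice the string would come from `tf.__version__`.

```python
def major_version(version_string):
    """Return the leading integer of a version string such as '2.0.0'."""
    return int(version_string.split(".")[0])


def is_tf1(tf_version):
    """True for TensorFlow 1.x, which still ships tf.contrib and tf.train.Optimizer."""
    return major_version(tf_version) == 1


for version in ("1.15.0", "2.0.0", "2.1.0"):
    print(version, is_tf1(version))
```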
Hi there,
Can you give me some suggestions on how to plot a confusion matrix using your code?
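For a multi-label model there is usually one 2x2 confusion matrix per label rather than a single matrix. A minimal sketch that computes them from true labels and predicted probabilities follows; the 0.5 threshold, the toy data, and `per_label_confusion` are assumptions for illustration, not part of the repository.

```python
def per_label_confusion(y_true, y_prob, threshold=0.5):
    """Return one [[TN, FP], [FN, TP]] matrix per label column."""
    num_labels = len(y_true[0])
    matrices = [[[0, 0], [0, 0]] for _ in range(num_labels)]
    for true_row, prob_row in zip(y_true, y_prob):
        for label, (truth, prob) in enumerate(zip(true_row, prob_row)):
            predicted = 1 if prob >= threshold else 0
            matrices[label][truth][predicted] += 1
    return matrices


y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_prob = [[0.9, 0.2], [0.6, 0.7], [0.4, 0.8], [0.1, 0.3]]
for label, matrix in enumerate(per_label_confusion(y_true, y_prob)):
    print(label, matrix)
```

Each 2x2 matrix can then be drawn as a heatmap with whatever plotting library is already in use.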