conversationai / conversationai-models
A repository to house model building experiments and tools that are part of the Conversation AI effort.
License: Apache License 2.0
This work should be done after the current changes in the notebook land.
Added tests in 33199cc
These are not elegant, but might help.
Determine what must change with the pipeline to allow training on datasets that don't fit in memory. Test this out with available Perspective datasets.
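One possible direction (a minimal sketch, assuming the data is already sharded into TFRecord files; the file pattern and feature names below are placeholders, not the actual pipeline's) is to stream records with tf.data rather than materializing the whole dataset in memory:

import tensorflow as tf

def streaming_input_fn(file_pattern, batch_size=64):
  # Stream TFRecords from disk or GCS instead of loading everything into memory.
  files = tf.data.Dataset.list_files(file_pattern)
  dataset = tf.data.TFRecordDataset(files)
  feature_spec = {
      'comment_text': tf.FixedLenFeature([], tf.string),
      'label': tf.FixedLenFeature([], tf.float32),
  }
  dataset = dataset.map(
      lambda record: tf.parse_single_example(record, feature_spec))
  # Shuffle within a bounded buffer so memory use stays constant.
  return dataset.shuffle(10000).repeat().batch(batch_size)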
Issue:
We currently depend on vocabularies, like glove embeddings, that are:
Proposed solution project:
Use https://github.com/tensorflow/transform to develop text preprocessing pipelines, e.g. to select tokens that occur sufficiently frequently, and create either random or smarter word embeddings for them.
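A rough sketch of what such a preprocessing_fn could look like (a hedged example; the feature names and frequency threshold are placeholders, not a worked-out pipeline):

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # Tokenize the raw text and keep only tokens that occur at least
  # `frequency_threshold` times; rarer tokens fall into an OOV bucket.
  tokens = tf.string_split(inputs['comment_text'])
  token_ids = tft.compute_and_apply_vocabulary(
      tokens, frequency_threshold=5, num_oov_buckets=1)
  return {'token_ids': token_ids, 'label': inputs['label']}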
Looks like ml-engine supports hyperparameter tuning. It would be great to integrate with that.
Docs: https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview
It's highly repeatable, and I suspect that because 10K * batch size is approximately the input size, we are hitting some bad behavior when the dataset hits repeat(). Not sure how to test; maybe with a no-op trainer.
Examples:
"Comet lets you track code, experiments, and results on ML projects. It’s fast, simple, and free for open source projects."
Looks cool and (they claim!) easy to set up.
As a way to further evaluate these models, it would be nice to have a flag that will score a subset of the test data using the Perspective API. I'm imagining outputting results that have
- comment_id
- comment_text
- y_class (e.g. 'toxic', 'obscene', etc.)
- y_gold (if available)
- y_prob (e.g. 0.89, 0.03, etc.)
- perspective_api_prob
- y_prob - perspective_api_prob
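A minimal sketch of the Perspective API call such a flag could wrap (assuming the googleapiclient library and an API key; error handling, batching, and rate limiting are omitted):

from googleapiclient import discovery

def perspective_score(comment_text, api_key, attribute='TOXICITY'):
  # Returns the Perspective API summary score for a single comment.
  client = discovery.build('commentanalyzer', 'v1alpha1', developerKey=api_key)
  request = {
      'comment': {'text': comment_text},
      'requestedAttributes': {attribute: {}},
  }
  response = client.comments().analyze(body=request).execute()
  return response['attributeScores'][attribute]['summaryScore']['value']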
Hi Flavien,
Given the fantastic changes you've made to the framework to ease training, deployment, and evaluation of models, would you have time to take a look at the README file in the experiments/ folder and update it to help others use the codebase?
Thanks!
Code from this commit.
Implement hyperparameter tuning in Keras
It would be nice if each directory that contains a SavedModel from one training run also included a JSON (?) file with the accuracy, AUC, FPR, FNR, etc. for the model on the held-out data.
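A minimal sketch of writing that file (the metric names and file name are assumptions, not what the pipeline produces today):

import json
import os
import tensorflow as tf

def write_eval_metrics(export_dir, metrics):
  # `metrics` is e.g. the dict returned by Estimator.evaluate();
  # tf.gfile handles both local paths and gs:// paths.
  path = os.path.join(export_dir, 'eval_metrics.json')
  with tf.gfile.GFile(path, 'w') as f:
    f.write(json.dumps({k: float(v) for k, v in metrics.items()}, indent=2))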
We currently have a script that can load a SavedModel object from an Estimator model and use it to evaluate new data. This involves loading a saved VocabularyProcessor to pre-process new data, loading the SavedModel, and running the new data through the model.
We'd like to add similar functionality for a Keras model. This will mean:
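At a high level, a sketch of that flow on the Keras side might look like the following (a hedged example assuming the tokenizer is saved with pickle and the model with model.save(); all names and paths are placeholders):

import pickle
from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences

# Restore the tokenizer and model that were saved at training time.
with open('/path/to/tokenizer.pkl', 'rb') as f:
  tokenizer = pickle.load(f)
model = load_model('/path/to/keras_model.h5')

def score(texts, max_len=300):
  # Pre-process raw text the same way as during training, then run the model.
  sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=max_len)
  return model.predict(sequences)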
@sorensenjs points out that tf Datasets can read a wide variety of formats. It may be too restrictive to only read the TFRecord format.
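For example, a CSV input could be streamed with the same tf.data machinery (a minimal sketch; the column layout and defaults are made up):

import tensorflow as tf

def csv_input_fn(file_pattern, batch_size=64):
  # Each row is assumed to be: comment_text,label (with a header line per file).
  files = tf.data.Dataset.list_files(file_pattern)
  dataset = files.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))

  def parse_line(line):
    text, label = tf.decode_csv(line, record_defaults=[[''], [0.0]])
    return {'comment_text': text}, label

  return dataset.map(parse_line).batch(batch_size)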
Currently, tensorboard works locally but runs into a 403 error when running on the cloud.
For simplicity, instead of taking a TF Example with a single text feature, we should just input the tensor directly. This means the tf hub and char models should use tf.estimator.export.build_raw_serving_input_receiver_fn in place of build_parsing_serving_input_receiver_fn, which would eliminate at least one of the blocking ops for tensorflowjs. #222
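A minimal sketch of the raw serving input function (assuming TF 1.x; the feature name 'text' is a placeholder for whatever the models expect):

import tensorflow as tf

# Serve the raw string tensor directly, so the exported graph needs no
# tf.Example parsing (ParseExample) op.
features = {'text': tf.placeholder(dtype=tf.string, shape=[None], name='text')}
serving_input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(features)
# estimator.export_savedmodel(export_dir, serving_input_fn)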
The ClusterSpec class will allow us to switch between training on CPU and GPU. Right now we only train on CPU, but it should be easy to use GPU with ml-engine.
Docs: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec
In the Model implementation, we do not guarantee that the model arguments are compatible with the CMLE model (in particular the signature of this model). Errors are spotted when collecting predictions after a batch prediction job.
It seems that this script has a potential solution (look for "(Optional) Inspect the model binaries with the SavedModel CLI ").
It would be a nice feature to help the user initialize a Model instance.
Since we have access to the code from the Winner's of the Kaggle competition, let's try to add their models to this framework. This will also test our ability to build a framework that is robust to quickly incorporating models from external sources.
When running the tf_gru_attention model on the many communities data, the graph runs for a few hundred steps before failing with the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_0 is double, but expects int64
[[Node: PyFunc = PyFuncTin=[DT_STRING], Tout=[DT_INT64], token="pyfunc_0", _device="/device:CPU:0"]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?], [?], [?,?], [?]], output_types=[DT_FLOAT, DT_INT32, DT_INT64, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
The script used to run this model was:
GCS_RESOURCES="gs://kaggle-model-experiments/resources"
python -m tf_trainer.tf_gru_attention.run \
  --train_path="${GCS_RESOURCES}/transfer_learning_data/many_communities/20181105_train.tfrecord" \
  --validate_path="${GCS_RESOURCES}/transfer_learning_data/many_communities/20181105_validate.tfrecord" \
  --embeddings_path="${GCS_RESOURCES}/glove.6B/glove.6B.100d.txt" \
  --model_dir="tf_gru_attention_local_model_dir" \
  --labels="removed" \
  --label_dtypes="int"
The Dawid Skene training pipeline currently doesn't write out any checkpoints, so you need to wait until training has finished before checking the results. And something could fail at the end and you'd lose all the results. Not ideal.
FiLM: Visual Reasoning with a General Conditioning Layer
(https://arxiv.org/abs/1709.07871)
The Kaggle competition requires the submissions be formatted like this:
id,toxic,severe_toxic,obscene,threat,insult,identity_hate
6044863,0.5,0.5,0.5,0.5,0.5,0.5
6102620,0.5,0.5,0.5,0.5,0.5,0.5
14563293,0.5,0.5,0.5,0.5,0.5,0.5
21086297,0.5,0.5,0.5,0.5,0.5,0.5
We're not actually competing in the competition, but it would be good to output our predictions in the same format so we can test our scoring scripts.
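A minimal sketch of writing predictions in that format (assuming predictions arrive as a dict from comment id to six per-class probabilities; the names below are placeholders):

import csv

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def write_submission(path, predictions):
  # `predictions` maps comment id -> list of six probabilities, one per label.
  with open(path, 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['id'] + LABELS)
    for comment_id, probs in predictions.items():
      writer.writerow([comment_id] + ['%.4f' % p for p in probs])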
Paper: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (https://arxiv.org/abs/1703.03400)
Compare the effectiveness of CNN with attention against LSTM with attention. Metrics for comparison could include:
Estimator uses the most recent model by default, see: https://www.tensorflow.org/get_started/checkpoints ; Note that while checkpoints store model weights, the whole graph + weights (aka models) can be restored - this looks like the right abstraction, and may obviate the need for build_parsing_serving_input_receiver_fn, which exports a model that takes a TF.Example proto as input.
Something like (Thanks to @dborkan for the pointers!):
feature_spec = {'sentence': tf.FixedLenFeature(dtype=tf.string, shape=1)}
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
# Note: `estimator` below is an instance of the TF Estimator class.
estimator.export_savedmodel(<destination_directory>, serving_input_fn)
This seems to fit naturally into the base_model.py abstraction. To be figured out: what's the right way to specify the appropriate checkpoint to use?
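One option, continuing the snippet above (a sketch assuming the Estimator API's checkpoint_path argument; whether this fits base_model.py cleanly is the open question):

# Export from a specific checkpoint rather than whatever is most recent.
ckpt = tf.train.latest_checkpoint(model_dir)  # or a hand-picked checkpoint path
estimator.export_savedmodel(export_dir, serving_input_fn, checkpoint_path=ckpt)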
I was using TensorBoard and getting fairly erratic eval points when training my model (e.g. one at 500 steps, the next one at 7k steps). Am I misinterpreting our train_with_eval function?
If not, given that tf's train_with_eval preserves where it left off (as of tensorflow/tensorflow#19062 (comment)), can we stick it in a loop so we can get more eval points? I wrote some code that seems to work in this branch: https://github.com/conversationai/conversationai-models/tree/train-eval-loop
But may be missing something with how checkpoints interact with when evaluation happens.
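A minimal sketch of the loop in question (assuming a plain Estimator; estimator, train_input_fn, and eval_input_fn stand in for the real objects, and the step counts are placeholders):

# Alternate short training runs with evaluation so TensorBoard gets regular
# eval points; each train() call resumes from the latest checkpoint.
total_steps = 100000
eval_every_steps = 1000
for _ in range(total_steps // eval_every_steps):
  estimator.train(input_fn=train_input_fn, steps=eval_every_steps)
  metrics = estimator.evaluate(input_fn=eval_input_fn)
  print(metrics)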
Build a hierarchical attention model using this pipeline. Reference: https://www.cs.cmu.edu/~hovy/papers/16HLT-hierarchical-attention-networks.pdf
See PR #127
This should be a small code change, but involves reading the documentation on how to set this up in GCP. https://cloud.google.com/ml-engine/docs/using-gpus
It would also be great to add a flag to the python script to switch between GPUs and CPUs.
Apparently it is common practice to use convolutional filters of different sizes, but the CNN implementation in this repo does not do this. So we should add it.
See the run.sh in keras_cnn for an example
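A minimal sketch of the multiple-filter-size idea in Keras (the filter sizes, counts, and vocabulary/sequence dimensions are illustrative, not tuned values):

from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Concatenate, Dense
from keras.models import Model

# One Conv1D branch per filter size, max-pooled and concatenated.
inputs = Input(shape=(300,), dtype='int32')
embedded = Embedding(input_dim=50000, output_dim=100)(inputs)
pooled = []
for size in [3, 4, 5]:
  conv = Conv1D(filters=128, kernel_size=size, activation='relu')(embedded)
  pooled.append(GlobalMaxPooling1D()(conv))
merged = Concatenate()(pooled)
outputs = Dense(1, activation='sigmoid')(merged)
model = Model(inputs=inputs, outputs=outputs)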
In the spirit of open source, it would be nice to include some sample data that people can use to run the dawid-skene code out of the box. I think a subset of the wikipedia data used for the Kaggle competition would be good.
The char model, when converted with
tensorflowjs_converter --output_node_names='frac_neg/predictions/probabilities' --input_format=tf_saved_model experiments/tf_char_cnn_local_model_dir/100000/1545431873/ experiments/tf_char_cnn_local_model_dir/100000/tfjs/
produces the following:
ValueError: Unsupported Ops in the model before optimization
DecodeRaw, ParseExample, StringSplit
Currently, the Keras model training pipeline assumes you have a GloVe embeddings file at /local_data/glove.6B/glove.6B.100d.txt. It doesn't look like we have any documentation about where to get those embeddings and where to put them.
We should:
Prototypical Networks for Few-shot Learning
(https://arxiv.org/abs/1703.05175)
This test currently fails due to:
Traceback (most recent call last):
File "/usr/local/google/home/sorenj/github/conversationai-models/experiments/tf_trainer/common/tfrecord_input_test.py", line 75, in test_TFRecordInput_rounded
round_labels=True)
TypeError: __init__() got an unexpected keyword argument 'feature_preprocessor_init'
but the problem is deeper than just a change in parameter name.
The test begins failing with commit 2a08943
Recent attempts to train the cloud keras_gru_attention model via run.ml_engine.sh have been failing with the following error:
Non-OK-status: status_ status: Failed precondition: could not dlopen DSO: libcupti.so.9.0; dlerror: libcupti.so.9.0: cannot open shared object file: No such file or directory
It is unclear when this bug began but a keras_gru_attention model was successfully trained on July 12, 2018.
Goal: make it easier to run all tests, setup CI testing, linting, etc.
Reading CSV files is tough, but it's often useful to look through the test data and predictions beyond just looking at the accuracy metrics. One solution is to write a sample of the predictions in a HTML format that we can add some basic styling to so it's easy to read. That way we can go from new model -> analyzing results really quickly.
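A minimal sketch of one way to do this (assuming pandas is available; the column names in the DataFrame are placeholders):

import pandas as pd

def write_predictions_html(predictions_df, path, n=200):
  # Dump a readable sample of predictions (e.g. comment_text, y_gold, y_prob)
  # as a standalone HTML table with some light styling.
  sample = predictions_df.sample(min(n, len(predictions_df)))
  table = sample.to_html(index=False)
  style = '<style>td, th {padding: 4px; font-family: sans-serif;}</style>'
  with open(path, 'w') as f:
    f.write('<html><head>%s</head><body>%s</body></html>' % (style, table))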