senwu / emmental
A deep learning framework for building multimodal multi-task learning systems.
Home Page: https://emmental.readthedocs.io
License: MIT License
Hey,
Can you provide an example of how to train multi-task classification such as:
Is your feature request related to a problem? Please describe.
Many models use validation loss as a checkpointing metric.
Describe the solution you'd like
Add native support for validation loss as a metric.
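A minimal sketch of what native support could compute, assuming the model reports a mean loss per validation batch. The helper names `mean_validation_loss` and `is_improvement` are hypothetical, not part of Emmental's API; the metric key follows the `model/all/valid/loss` format that appears in the checkpointing report elsewhere in these issues.

```python
from typing import Iterable, Optional, Tuple


def mean_validation_loss(batch_results: Iterable[Tuple[float, int]]) -> float:
    """Return the per-example mean loss over a validation split.

    Each item is (mean loss of one batch, number of examples in that batch).
    Weighting by batch size keeps a short final batch from skewing the average.
    """
    total_loss = 0.0
    total_count = 0
    for batch_loss, batch_size in batch_results:
        total_loss += batch_loss * batch_size
        total_count += batch_size
    if total_count == 0:
        raise ValueError("Validation split is empty; cannot compute a loss.")
    return total_loss / total_count


def is_improvement(new: float, best: Optional[float], mode: str = "min") -> bool:
    """Decide whether `new` beats the best value seen so far for this metric."""
    if best is None:
        return True  # the first evaluation always becomes the best checkpoint
    if mode == "min":
        return new < best
    if mode == "max":
        return new > best
    raise ValueError(f"Unknown mode {mode!r}; expected 'min' or 'max'.")
```

A checkpointer would then store `{"model/all/valid/loss": mean_validation_loss(...)}` after each evaluation and compare it with `is_improvement(..., mode="min")`, since lower loss is better.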
Support wandb in Emmental for logging.
Is your feature request related to a problem? Please describe.
Right now, if I try to checkpoint with a metric that is not implemented, checkpointing silently fails.
Describe the solution you'd like
An exception raised when the checkpoint metric is unrecognized.
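One way to get that exception, sketched under the assumption that the checkpointer can see the set of metric names produced by the first evaluation: validate the configured `checkpoint_metric` dict against it and fail loudly. `validate_checkpoint_metric` is a hypothetical helper, not an existing Emmental function.

```python
from typing import Dict, Iterable

VALID_MODES = {"min", "max"}


def validate_checkpoint_metric(
    checkpoint_metric: Dict[str, str], available_metrics: Iterable[str]
) -> None:
    """Raise instead of silently skipping checkpointing when a metric is unknown."""
    available = set(available_metrics)
    for name, mode in checkpoint_metric.items():
        if mode not in VALID_MODES:
            raise ValueError(
                f"Checkpoint metric {name!r} has mode {mode!r}; expected 'min' or 'max'."
            )
        if name not in available:
            raise ValueError(
                f"Unrecognized checkpoint metric {name!r}. "
                f"Metrics produced by evaluation: {sorted(available)}"
            )
```

Running the check once, right after the first evaluation, surfaces a typo in the metric name at the start of training rather than after the last epoch.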
Describe alternatives you've considered
Additional context
Describe the bug
When you train and test on two different machines, you have to transfer the trained model from one to the other.
I cannot fully save an EmmentalModel, possibly because commit 6908168 commented out the lines that persist variables such as task_names.
As a result, predicting with a loaded model raises a KeyError:
File "/Users/hiromu/miniconda3/envs/fonduer-mlflow/lib/python3.7/site-packages/emmental/model.py", line 213, in flow
for action in self.task_flows[task_name]:
KeyError: 'wiki'
To Reproduce
Steps to reproduce the behavior:
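from emmental.learner import EmmentalLearner
from emmental.model import EmmentalModel
from fonduer.learning.task import create_task
ATTRIBUTE = "wiki"
tasks = create_task(
    ATTRIBUTE, 2, F_train[0].shape[1], 2, emb_layer, model="LogisticRegression"
)
model = EmmentalModel(name=f"{ATTRIBUTE}_task")
for task in tasks:
    model.add_task(task)
emmental_learner = EmmentalLearner()
emmental_learner.learn(model, [train_dataloader])
model.save(model_path="model.pkl")
# Load the saved model into a fresh EmmentalModel (as on the second machine) and predict.
model = EmmentalModel()
model.load(model_path="model.pkl")
model.predict(test_dataloader, return_preds=True)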
Describe the solution you'd like
Support data parallel training.
If a user adds custom optimizers such that the optimizer or its parameters are functions, the call in tensorboard_writer.py that JSON-serializes Meta.config throws the error "Object of type function is not JSON serializable".
Steps to reproduce the behavior:
Add the following code before emmental.learn is called.
import emmental
def grouped_parameters(model):
    # Apply L2 weight decay to every parameter except biases and LayerNorm weights.
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    return [
        {
            "params": [
                p
                for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)
            ],
            "weight_decay": emmental.Meta.config["learner_config"][
                "optimizer_config"
            ]["l2"],
        },
        {
            "params": [
                p
                for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.0,
        },
    ]
# Storing a function in Meta.config is what later breaks JSON serialization.
emmental.Meta.config["learner_config"]["optimizer_config"][
    "parameters"
] = grouped_parameters
Then set the writer to tensorboard.
Expected behavior
No error. Emmental should handle non-JSON-serializable types in Meta.config.
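A minimal sketch of that expected behavior, not Emmental's actual writer code: Python's `json.dumps` accepts a `default=` callable that it invokes only for objects it cannot encode, so functions such as `grouped_parameters` can be logged as descriptive strings instead of raising. The helper names below are hypothetical.

```python
import json
from typing import Any


def _describe_unserializable(obj: Any) -> str:
    """json.dumps calls this only for objects it cannot encode natively."""
    name = getattr(obj, "__qualname__", None) or getattr(obj, "__name__", None)
    if name is not None:
        # Functions and classes: keep a readable name, e.g. "<function grouped_parameters>".
        return f"<{type(obj).__name__} {name}>"
    # Anything else (tensors, optimizer instances, ...): fall back to repr().
    return repr(obj)


def config_to_json(config: dict) -> str:
    """Serialize a config dict, replacing non-JSON values with descriptive strings."""
    return json.dumps(config, default=_describe_unserializable, indent=2)
```

The same `default=` argument could be added to whichever `json.dumps` call writes `Meta.config` in the TensorBoard writer; the logged config then records which function was used without trying to serialize it.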
Describe the solution you'd like
Support multi-label (slicing) classification
Is your feature request related to a problem? Please describe.
Fonduer depends on Emmental and Snorkel.
They conflict on the PyTorch version: Snorkel requires torch<1.2.0,>=1.1.0, while Emmental requires torch>=1.3.1,<2.0.0.
Describe the solution you'd like
While waiting for Snorkel to support PyTorch > 1.1.0, I'd like Emmental to support PyTorch 1.1.0 if that is relatively easy.
Describe alternatives you've considered
Upgrade Snorkel to support PyTorch > 1.1.0.
A few issues have been filed asking Snorkel to support PyTorch > 1.1.0:
snorkel-team/snorkel#1541 (on Jan 31, 2020)
snorkel-team/snorkel#1558 (on Mar 19, 2020)
Additional context
Is your feature request related to a problem? Please describe.
When using a bash script, arguments such as model_path are passed as strings. Emmental needs to handle the string "none" properly when model_path does not exist.
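A minimal sketch, assuming the script parses its flags with `argparse`: a `type=` converter turns the shell strings "none", "null", or an empty value into Python `None`, and a real path that is missing is reported explicitly. `str_or_none` and the `--model_path` flag here are illustrative, not Emmental's own argument parser.

```python
import argparse
import os
from typing import Optional


def str_or_none(value: str) -> Optional[str]:
    """argparse `type=` converter: map "none"/"null"/"" (any case) to Python None."""
    if value.strip().lower() in {"none", "null", ""}:
        return None
    return value


parser = argparse.ArgumentParser()
# Hypothetical flag mirroring the model_path argument from the issue.
parser.add_argument("--model_path", type=str_or_none, default=None)

args = parser.parse_args(["--model_path", "None"])
if args.model_path is None:
    print("No model_path given; skipping model loading.")
elif not os.path.exists(args.model_path):
    raise FileNotFoundError(f"model_path {args.model_path!r} does not exist")
```

Doing the conversion in the parser keeps the rest of the code from comparing against the literal string "None".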
Describe the solution you'd like
Support fp16 (half precision) training.
Is your feature request related to a problem? Please describe.
When you do training and test on two different machines, you have to transfer what's trained from one to another.
As described in https://github.com/HazyResearch/fonduer/blob/master/CHANGELOG.rst
# Collect word counter from candidates which is used in LSTM model.
word_counter = collect_word_counter(train_cands)
# Generate word embedding module for LSTM.
emb_layer = EmbeddingModule(
    word_counter=word_counter, word_dim=300, specials=special_tokens
)
...
# Build the test dataloader to do prediction.
test_dataloader = EmmentalDataLoader(
    task_to_label_dict={ATTRIBUTE: "labels"},
    dataset=FonduerDataset(
        ATTRIBUTE, test_cands[0], F_test[0], emb_layer.word2id, 2
    ),
    split="test",
    batch_size=100,
    shuffle=False,
)
emb_layer is trained using train_cands and is used later for test_dataloader.
While EmbeddingModule can load an embedding_file in __init__, it has no save method to persist what it has learned.
Describe the solution you'd like
Add a save() method, EmbeddingModule.save(), that writes the learned embeddings to an embedding_file.
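A minimal sketch of what such a save could write, assuming the `embedding_file` format read by `EmbeddingModule.__init__` is plain text with one token per line followed by its space-separated vector values (the common GloVe/word2vec text layout); check this against the actual loader before relying on it. `save_embedding_file` and `load_embedding_file` are hypothetical helpers, not Emmental methods.

```python
from typing import Dict, List, Sequence


def save_embedding_file(
    word2id: Dict[str, int], weights: Sequence[Sequence[float]], path: str
) -> None:
    """Write one line per token: the token, then its vector as space-separated floats."""
    with open(path, "w", encoding="utf-8") as f:
        # Write rows in index order so the file mirrors the embedding matrix.
        for word, idx in sorted(word2id.items(), key=lambda item: item[1]):
            values = " ".join(repr(float(x)) for x in weights[idx])
            f.write(f"{word} {values}\n")


def load_embedding_file(path: str) -> Dict[str, List[float]]:
    """Read the file back into a token -> vector mapping (round-trip check)."""
    embeddings: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip("\n").split(" ")
            embeddings[word] = [float(v) for v in values]
    return embeddings
```

On the training machine this could be called as `save_embedding_file(emb_layer.word2id, weight.detach().cpu().tolist(), "embeddings.txt")`, where `weight` is the trained weight matrix of the module's underlying `torch.nn.Embedding` (its attribute name depends on the module). Tokens that contain spaces would break this line format.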
Describe alternatives you've considered
Let me know if you have a better idea.
I think the evaluation code for segmentation tasks (specifically, model.eval and the scorer function) stores huge dicts and arrays of the ground-truth labels, predicted labels, and probabilities. Since each ground-truth label, predicted label, and probability mask is the same size as the input image, storing 3x the dataset size in intermediate dicts/arrays during evaluation quickly exhausts memory and kills the job, even for modestly sized val/test sets (e.g., >500 2D images). Even if the job keeps running, some evaluation variable stays in memory and occupies a large chunk of it during training.
It might be better to process the segmentation evaluation datasets (val/test) in batches. Instead of storing the val/test ground-truth and predicted labels in dicts/arrays, you can store only the scoring metrics (e.g., Dice). Additionally, if predictions are saved during evaluation, they should also be saved in batches instead of being held in memory all at once.
This problem is specific to tasks where the network outputs are large and you care about local metrics (e.g., Dice per image) rather than global metrics (e.g., accuracy over all images).
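A minimal sketch of the batched approach, using per-image Dice (2|P ∩ G| / (|P| + |G|)) as the example metric: each batch of masks is scored and then discarded, so memory holds only a running sum and count. `RunningDice` is a hypothetical class, not Emmental's scorer API.

```python
from typing import Iterable

import numpy as np


def dice_score(pred: np.ndarray, gold: np.ndarray) -> float:
    """Dice coefficient 2 * |P & G| / (|P| + |G|) for one binary mask pair."""
    pred = pred.astype(bool)
    gold = gold.astype(bool)
    denom = int(pred.sum()) + int(gold.sum())
    if denom == 0:
        return 1.0  # both masks empty: counted as perfect agreement (a convention)
    return 2.0 * int(np.logical_and(pred, gold).sum()) / denom


class RunningDice:
    """Accumulate per-image Dice batch by batch, keeping only a sum and a count."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, preds: Iterable[np.ndarray], golds: Iterable[np.ndarray]) -> None:
        # preds/golds: one batch of masks, e.g. arrays of shape (batch, H, W).
        for pred, gold in zip(preds, golds):
            self.total += dice_score(pred, gold)
            self.count += 1

    def compute(self) -> float:
        """Mean per-image Dice over everything seen so far."""
        return self.total / self.count if self.count else float("nan")
```

In an evaluation loop, `update` would be called with each batch's thresholded predictions and labels before they go out of scope; if predictions must be kept, each batch can be written to its own file (for example with `np.save`) instead of being appended to an in-memory list.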
Log which slicing functions were used for a given run: when iterating on slicing functions, it is hard to tell from the logs exactly which ones were used.
The easiest way to do this may be to copy the slicing_functions.py file to the log directory.
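A minimal sketch of the copy step; `archive_source_file` is a hypothetical helper, and the log directory is passed in explicitly rather than assumed.

```python
import os
import shutil


def archive_source_file(src_path: str, log_dir: str) -> str:
    """Copy a source file (e.g. slicing_functions.py) into the run's log directory."""
    os.makedirs(log_dir, exist_ok=True)
    dst_path = os.path.join(log_dir, os.path.basename(src_path))
    # copy2 also preserves the file's modification time, which helps when comparing runs.
    shutil.copy2(src_path, dst_path)
    return dst_path
```

Assuming the run's log directory is available as `emmental.Meta.log_path` after `emmental.init(...)`, the call would be `archive_source_file("slicing_functions.py", emmental.Meta.log_path)`, made once at the start of training.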
Describe the bug
The bug is that the wrong checkpoint_metric is used in load_best_model at the end of EmmentalLearner.learn. I believe this is because utils.merge doesn't delete entries; it only replaces them. This leaves multiple entries in logging_config.checkpointer_config.checkpoint_metric.
To Reproduce
Steps to reproduce the behavior:
from emmental import Meta
Meta.update_config(config={
    'learner_config': {
        'n_epochs': 2,
        'valid_split': 'valid',
        'optimizer_config': {'optimizer': 'adam', 'lr': 0.01, 'l2': 0.000},
        'lr_scheduler_config': {}
    },
    'logging_config': {
        'evaluation_freq': 1,
        'checkpointing': True,
        'checkpointer_config': {
            'checkpoint_metric': {
                'model/all/valid/loss': 'min'
            }
        }
    }
})
print(Meta.config['logging_config'])
Inspect logging_config.checkpointer_config.checkpoint_metric in the printed output. To see how this affects downstream tasks, run EmmentalLearner.learn:
...
model = EmmentalModel(name='model', tasks=tasks)
learner = EmmentalLearner()
learner.learn(model, dataloaders)  # dataloaders come from the elided setup above
Then inspect list(learner.logging_manager.checkpointer.checkpoint_metric.keys())[0], which is the value the Checkpointer.load_best_model function uses to determine whether a best model was found (checkpointer.py, line ~253). At this point the value from the default config appears instead of the value from the updated config.
Expected behavior
I expect the checkpoint metric I defined in the updated config to be used in Checkpointer.load_best_model.
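The report attributes the problem to `utils.merge` merging nested dicts key by key. A minimal sketch of one possible fix, not Emmental's actual code: a recursive merge that replaces the values of designated keys wholesale, so a user-supplied `checkpoint_metric` overrides the default instead of being added next to it. The `replace_keys` parameter is hypothetical, and the default metric name in the test is a placeholder.

```python
import copy
from typing import Any, Dict, Iterable


def merge(
    base: Dict[str, Any], update: Dict[str, Any], replace_keys: Iterable[str] = ()
) -> Dict[str, Any]:
    """Recursively merge `update` into a copy of `base`.

    Nested dicts are merged key by key, except for keys listed in `replace_keys`,
    whose value from `update` replaces the value in `base` entirely.
    """
    replace_keys = set(replace_keys)
    merged = copy.deepcopy(base)
    for key, value in update.items():
        both_dicts = isinstance(value, dict) and isinstance(merged.get(key), dict)
        if key in replace_keys or not both_dicts:
            merged[key] = copy.deepcopy(value)
        else:
            merged[key] = merge(merged[key], value, replace_keys)
    return merged
```

Without `replace_keys`, the merged `checkpoint_metric` keeps both the default entry and the user's entry, which matches the behavior described above; with `replace_keys=["checkpoint_metric"]`, only the user's metric survives while sibling settings are still merged normally.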
Environment