senwu / emmental


A deep learning framework for building multimodal multi-task learning systems.

Home Page: https://emmental.readthedocs.io

License: MIT License

Makefile 0.12% Python 99.88%

machine-learning multi-task-learning multimodality

emmental's People


pappagari codeaudit purpleseastar geoffreyangus stenpiren kiminh kuan-li curioustauseef hiromuhota maidenlane vishalbelsare woffett lorr1 mleszczy keawang tor4z krish240574 chrisaberger

emmental's Issues

Wrong checkpoint metric used in load_best_model

Describe the bug
The wrong checkpoint_metric is used in load_best_model at the end of EmmentalLearner.learn. I believe the cause is that utils.merge never deletes entries from the default config; it only adds or overwrites keys. This leaves multiple entries in logging_config.checkpointer_config.checkpoint_metric.
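The suspected cause can be illustrated with a minimal sketch. The `merge` function below is a hypothetical stand-in written for this illustration, not the actual utils.merge, and the default metric name `default/metric` is an assumed placeholder. It shows how a recursive merge that only adds or overwrites keys keeps the default metric first in the dictionary.

```python
from copy import deepcopy


def merge(base: dict, update: dict) -> dict:
    """Recursively merge `update` into a copy of `base`.

    Hypothetical stand-in for utils.merge: keys are added or
    overwritten, but nothing in `base` is ever deleted.
    """
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Assumed default config; the real default metric name may differ.
default_config = {
    "checkpointer_config": {"checkpoint_metric": {"default/metric": "max"}}
}
user_config = {
    "checkpointer_config": {"checkpoint_metric": {"model/all/valid/loss": "min"}}
}

merged = merge(default_config, user_config)
metrics = merged["checkpointer_config"]["checkpoint_metric"]

# Both metrics survive, and the default one comes first.
first_metric = list(metrics.keys())[0]
print(metrics)
print(first_metric)
```

Under these assumptions, any code that reads the first key of checkpoint_metric picks up the default metric rather than the user-supplied one.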

To Reproduce
Steps to reproduce the behavior:

Initialize an Emmental experiment
Run the following code snippet:

from emmental import Meta

Meta.update_config(config={
    'learner_config': {
        'n_epochs': 2,
        'valid_split': 'valid',
        'optimizer_config': {'optimizer': 'adam', 'lr': 0.01, 'l2': 0.000},
        'lr_scheduler_config': {}
    },
    'logging_config': {
        'evaluation_freq': 1,
        'checkpointing': True,
        'checkpointer_config': {
            'checkpoint_metric': {
                'model/all/valid/loss': 'min'
            }
        }
    }
})
print(Meta.config['logging_config'])

At this point, the printed config should show multiple entries in logging_config.checkpointer_config.checkpoint_metric. To see how this affects downstream tasks, run EmmentalLearner.learn:

...
model = EmmentalModel(name='model', tasks=tasks)
learner = EmmentalLearner()
# `dataloaders` is the list of dataloaders built in the elided setup above.
learner.learn(model, dataloaders)

Finally, print list(learner.logging_manager.checkpointer.checkpoint_metric.keys())[0]. This is the value that Checkpointer.load_best_model uses to decide whether a best model was found (checkpointer.py, line ~253). It shows the metric from the default config instead of the metric from the updated config.

Expected behavior
I expect the checkpoint metric I defined in the updated config to be used in Checkpointer.load_best_model.

Environment

OS: Ubuntu 16.04
Emmental Version: 0.0.4
Python 3.6

Support data parallel training

Describe the solution you'd like
Support data parallel training.

Throw exception if provided unsupported metric

Is your feature request related to a problem? Please describe.
Right now, if I try to checkpoint with a metric that is not implemented, checkpointing fails silently.

Describe the solution you'd like
Raise an exception when the checkpoint metric is unrecognized.
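One way the requested check could look is sketched below. The function name `validate_checkpoint_metric` and the `known_metrics` set are hypothetical and not part of Emmental; the sketch only assumes that checkpoint_metric maps a metric name to "min" or "max", as in the config shown in the first issue.

```python
def validate_checkpoint_metric(checkpoint_metric: dict, known_metrics: set) -> None:
    """Raise ValueError if a metric name is unknown or its mode is invalid.

    Hypothetical helper: `known_metrics` stands for whatever set of
    metric names the evaluation step can actually produce.
    """
    for name, mode in checkpoint_metric.items():
        if name not in known_metrics:
            raise ValueError(
                f"Unrecognized checkpoint metric {name!r}; "
                f"known metrics: {sorted(known_metrics)}"
            )
        if mode not in ("min", "max"):
            raise ValueError(
                f"Mode for {name!r} must be 'min' or 'max', got {mode!r}"
            )


# Assumed example metric names, for illustration only.
known_metrics = {"model/all/valid/loss", "model/all/valid/accuracy"}

# A recognized metric passes silently.
validate_checkpoint_metric({"model/all/valid/loss": "min"}, known_metrics)

# An unrecognized metric fails loudly instead of being ignored.
try:
    validate_checkpoint_metric({"model/all/valid/f2": "max"}, known_metrics)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Running such a check once, before training starts, would surface a typo in the metric name immediately rather than after the final epoch.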


Add a save() method to EmbeddingModule to save an embedding_file

Is your feature request related to a problem? Please describe.

When you train and test on two different machines, you have to transfer what was trained from one machine to the other.
As described in https://github.com/HazyResearch/fonduer/blob/master/CHANGELOG.rst:

# Collect word counter from candidates which is used in LSTM model.
word_counter = collect_word_counter(train_cands)

# Generate word embedding module for LSTM.
emb_layer = EmbeddingModule(
    word_counter=word_counter, word_dim=300, specials=special_tokens
)

...

# Build the test dataloader used for prediction.
test_dataloader = EmmentalDataLoader(
    task_to_label_dict={ATTRIBUTE: "labels"},
    dataset=FonduerDataset(
        ATTRIBUTE, test_cands[0], F_test[0], emb_layer.word2id, 2
    ),
    split="test",
    batch_size=100,
    shuffle=False,
)

emb_layer is trained using train_cands and is used later for test_dataloader.
EmbeddingModule can load an embedding_file in __init__, but it has no save method to persist what it has learned.

Describe the solution you'd like

Add a save() method to EmbeddingModule to save an embedding_file.
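A minimal sketch of what such a save step might do is shown below. It is not EmbeddingModule code: the helper names, the whitespace-separated text format, and the assumption that word ids are contiguous from 0 are all choices made for this illustration. Whether EmbeddingModule's embedding_file loader accepts this exact format is also an assumption.

```python
import os
import tempfile


def save_embedding_file(word2id, vectors, path):
    """Write one line per word: the word followed by its vector values.

    Assumes tokens contain no whitespace and ids run from 0 to n-1.
    """
    with open(path, "w", encoding="utf-8") as f:
        for word, idx in sorted(word2id.items(), key=lambda item: item[1]):
            values = " ".join(repr(float(v)) for v in vectors[idx])
            f.write(f"{word} {values}\n")


def load_embedding_file(path):
    """Read the file back into a word2id mapping and a list of vectors."""
    word2id, vectors = {}, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip("\n").split(" ")
            word2id[word] = len(vectors)
            vectors.append([float(v) for v in values])
    return word2id, vectors


# Toy vocabulary and vectors standing in for a trained embedding layer.
word2id = {"<unk>": 0, "wiki": 1, "page": 2}
vectors = [[0.0, 0.0], [0.25, -1.5], [3.0, 0.125]]

path = os.path.join(tempfile.mkdtemp(), "embeddings.txt")
save_embedding_file(word2id, vectors, path)
loaded_word2id, loaded_vectors = load_embedding_file(path)
```

With the file written on the training machine and copied to the test machine, the same word-to-id mapping can be rebuilt there without access to train_cands.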

Describe alternatives you've considered

Let me know if you have a better idea.


EmmentalModel is not fully saved

Describe the bug

When you train and test on two different machines, you have to transfer what was trained from one machine to the other.
I cannot fully save an EmmentalModel, possibly because commit 6908168 commented out the lines that persist variables such as task_names.
As a result, a KeyError is raised when you run a prediction with a loaded model.

  File "/Users/hiromu/miniconda3/envs/fonduer-mlflow/lib/python3.7/site-packages/emmental/model.py", line 213, in flow
    for action in self.task_flows[task_name]:
KeyError: 'wiki'
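The failure mode can be reproduced in isolation with a toy class. `TinyModel` below is a hypothetical stand-in, not EmmentalModel, and the `include_task_metadata` flag exists only to contrast the two behaviors: when only the weights are persisted, the task lookup fails after loading; when the task metadata is persisted too, it succeeds.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a multi-task model (not EmmentalModel)."""

    def __init__(self):
        self.weights = {}
        self.task_names = set()
        self.task_flows = {}

    def add_task(self, name, flow, weights):
        self.task_names.add(name)
        self.task_flows[name] = flow
        self.weights[name] = weights

    def save(self, path, include_task_metadata=True):
        state = {"weights": self.weights}
        if include_task_metadata:
            # Persist everything that prediction needs, not just weights.
            state["task_names"] = self.task_names
            state["task_flows"] = self.task_flows
        with open(path, "wb") as f:
            pickle.dump(state, f)

    def load(self, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        self.weights = state["weights"]
        self.task_names = state.get("task_names", set())
        self.task_flows = state.get("task_flows", {})

    def flow(self, task_name):
        return [action for action in self.task_flows[task_name]]


trained = TinyModel()
trained.add_task("wiki", flow=["embed", "classify"], weights=[0.1, 0.2])
tmp_dir = tempfile.mkdtemp()

# Weights only: the loaded model does not know the task.
partial_path = os.path.join(tmp_dir, "partial.pkl")
trained.save(partial_path, include_task_metadata=False)
partial = TinyModel()
partial.load(partial_path)
try:
    partial.flow("wiki")
    partial_error = None
except KeyError as exc:
    partial_error = exc

# Weights plus task metadata: the lookup succeeds.
full_path = os.path.join(tmp_dir, "full.pkl")
trained.save(full_path)
full = TinyModel()
full.load(full_path)
full_flow = full.flow("wiki")
```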

To Reproduce
Steps to reproduce the behavior:

Train an EmmentalModel

ATTRIBUTE = "wiki"
tasks = create_task(
    ATTRIBUTE, 2, F_train[0].shape[1], 2, emb_layer, model="LogisticRegression"
)

model = EmmentalModel(name=f"{ATTRIBUTE}_task")

for task in tasks:
    model.add_task(task)

emmental_learner = EmmentalLearner()
emmental_learner.learn(model, [train_dataloader])

Save the model

model.save(model_path="model.pkl")

Load the model

model = EmmentalModel()
model.load(model_path="model.pkl")

Predict using the model

model.predict(test_dataloader, return_preds=True)


Environment (please complete the following information):

Emmental Version: 0.0.4
Fonduer Version: 0.8.0-dev (ace413d0a687f8e9aa817389c8576d77c02baa59)


Memory problems with evaluating images

I think the evaluation code for segmentation tasks (specifically, model.eval and the scorer function) stores huge dicts and arrays of the ground truth labels, predicted labels, and probabilities. Each ground truth label, predicted label, and probability mask is the same size as the input image, so the intermediate dicts/arrays hold roughly 3x the dataset size during evaluation. This quickly eats up memory and kills the job, even for modestly sized val/test sets (e.g. >500 2D images). Even if the job keeps running, some evaluation variable stays in memory and occupies a big chunk of it during training.

It might be better to process the segmentation evaluation datasets (val/test) in batches. Instead of storing the val/test ground truth and predicted labels in dicts/arrays, you can store only the scoring metrics (e.g. Dice). Additionally, if the predictions are saved during evaluation, they should also be saved in batches instead of being held in memory all at once.

This problem is specific to tasks where the network outputs are large and you care about local metrics (e.g. Dice per image) rather than global metrics (e.g. accuracy over all images).
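The batched approach suggested above could be sketched as follows. This is not Emmental's evaluation code: the function names are hypothetical, masks are assumed to be flattened binary sequences, and the Dice score of two empty masks is taken to be 1.0 by convention. Only one float per image is retained; each batch of masks can be released as soon as it has been scored.

```python
from typing import Iterable, List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask, one 0/1 value per pixel


def dice_score(pred: Mask, target: Mask) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def evaluate_in_batches(
    batches: Iterable[Tuple[List[Mask], List[Mask]]]
) -> Tuple[List[float], float]:
    """Score each batch as it arrives and keep only per-image Dice values."""
    scores: List[float] = []
    for preds, targets in batches:
        scores.extend(dice_score(p, t) for p, t in zip(preds, targets))
        # The masks of this batch are no longer referenced after this point.
    mean = sum(scores) / len(scores) if scores else 0.0
    return scores, mean


def toy_batches():
    """Generator standing in for a dataloader plus model forward pass."""
    yield [[1, 1, 0, 0], [0, 0, 0, 0]], [[1, 0, 0, 0], [0, 0, 0, 0]]
    yield [[1, 1, 1, 1]], [[1, 1, 0, 0]]


per_image_dice, mean_dice = evaluate_in_batches(toy_batches())
print(per_image_dice, mean_dice)
```

Because the batches come from a generator, peak memory depends on the batch size rather than on the size of the whole val/test set.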

Support functions with Tensorboard Writer in Meta.config

Description of the bug

If a user adds a custom optimizer such that the optimizer or its parameters are functions, the call in tensorboard_writer.py that JSON-serializes Meta.config raises the error "Object of type function is not JSON serializable".

To Reproduce

Steps to reproduce the behavior:

Add this code before emmental.learn is called.

import emmental


def grouped_parameters(model):
    # Parameters whose names match these patterns get no weight decay.
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    return [
        {
            "params": [
                p
                for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)
            ],
            "weight_decay": emmental.Meta.config["learner_config"][
                "optimizer_config"
            ]["l2"],
        },
        {
            "params": [
                p
                for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.0,
        },
    ]


# Storing a function in the config is what triggers the serialization error.
emmental.Meta.config["learner_config"]["optimizer_config"][
    "parameters"
] = grouped_parameters

Then set the writer to tensorboard.

Expected behavior

No error. Emmental should handle non-JSON-serializable types in Meta.config.
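One possible way to tolerate such values is the `default` argument of the standard library's `json.dumps`, which is called for any object the encoder cannot handle. The sketch below is not the code in tensorboard_writer.py; the fallback function name and the string format it produces are assumptions, and the `grouped_parameters` stub only stands in for the user function from the reproduction steps.

```python
import json


def describe_unserializable(obj):
    """Fallback for json.dumps: turn unsupported objects into strings."""
    if callable(obj):
        return f"<function {getattr(obj, '__name__', repr(obj))}>"
    return repr(obj)


def grouped_parameters(model):
    """Stub standing in for a user-supplied parameter-grouping function."""
    return []


config = {
    "learner_config": {
        "optimizer_config": {"l2": 0.0, "parameters": grouped_parameters}
    }
}

# Without `default`, this call raises
# "TypeError: Object of type function is not JSON serializable".
config_json = json.dumps(config, default=describe_unserializable, indent=2)
print(config_json)
```

The function is logged by name only, so the written config is a readable record rather than something the function can be restored from.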

Environment (please complete the following information)

OS: Ubuntu
Emmental Version: 0.0.9dev

Support fp16 training

Describe the solution you'd like
Support fp16 (half precision) training.

Provide validation loss metric

Is your feature request related to a problem? Please describe.
Many models use validation loss as a checkpointing metric.

Describe the solution you'd like
Add native support for a validation loss metric.
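As a rough sketch of what such a metric computes, the validation loss can be taken as the example-weighted mean of the per-batch losses. The function name and input format below are hypothetical; the metric key `model/all/valid/loss` is the one used in the checkpointing config of the first issue, and reusing it here is an assumption about naming.

```python
def average_validation_loss(batch_losses):
    """Example-weighted mean of (mean_batch_loss, batch_size) pairs."""
    total_loss = 0.0
    total_examples = 0
    for mean_loss, batch_size in batch_losses:
        # Undo the per-batch averaging so unequal batches weigh correctly.
        total_loss += mean_loss * batch_size
        total_examples += batch_size
    if total_examples == 0:
        raise ValueError("No validation examples were provided.")
    return total_loss / total_examples


# Two validation batches of different sizes (toy numbers).
valid_loss = average_validation_loss([(0.5, 100), (1.0, 50)])
metric_dict = {"model/all/valid/loss": valid_loss}
print(metric_dict)
```

Weighting by batch size matters when the last batch is smaller than the others; a plain mean of batch means would overweight it.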

CelebA Multi-Task

Hey,
Can you provide an example of how to train multi-task classification on datasets such as:

Visual Domain Decathlon
CelebA

Support multi-label (slicing) classification

Describe the solution you'd like
Support multi-label (slicing) classification

Support wandb for logging

Description of the feature request

Support wandb in Emmental for logging.

wandb: https://wandb.ai/site/experiment-tracking

Describe the solution you'd like

While waiting for Snorkel to support PyTorch > 1.1.0, I'd like Emmental to support PyTorch 1.1.0 if it is relatively easy.

Describe alternatives you've considered

Upgrade Snorkel to support PyTorch > 1.1.0.
A few issues have been filed asking Snorkel to support PyTorch > 1.1.0:
snorkel-team/snorkel#1541 (on Jan 31, 2020)
snorkel-team/snorkel#1558 (on Mar 19, 2020)

