
emmental's People

Contributors

krandiash, lorr1, senwu

emmental's Issues

CelebA Multi-Task

Hey,
Can you provide an example of how to train a multi-task classification model on datasets such as (a rough sketch of the general pattern follows the list):

  • Visual Domain Decathlon
  • CelebA
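
For reference, a minimal, framework-agnostic sketch of the hard-parameter-sharing pattern such an example would demonstrate (all module and variable names here are hypothetical, not Emmental's API):

import torch
import torch.nn as nn

# Hard parameter sharing: one shared trunk, one head per attribute task.
class MultiTaskNet(nn.Module):
    def __init__(self, n_tasks: int, n_classes: int = 2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU()
        )
        self.heads = nn.ModuleList(
            nn.Linear(256, n_classes) for _ in range(n_tasks)
        )

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]

model = MultiTaskNet(n_tasks=2)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(8, 3, 64, 64)                       # fake batch of 64x64 images
ys = [torch.randint(0, 2, (8,)) for _ in range(2)]  # one label tensor per task
loss = sum(loss_fn(logits, y) for logits, y in zip(model(x), ys))
loss.backward()

An Emmental version would declare the shared trunk and per-task heads through its task abstraction rather than a single nn.Module.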

Provide validation loss metric

Is your feature request related to a problem? Please describe.
Many models use validation loss as a checkpointing metric.

Describe the solution you'd like
Add in native support for the validation loss metric
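
For concreteness, a minimal sketch of the quantity being requested: the average loss over the validation split, suitable as a checkpoint metric with mode 'min' (all names are hypothetical):

import torch

def validation_loss(model, dataloader, loss_fn):
    # Average loss over the validation split; assumes loss_fn uses
    # mean reduction over the batch.
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for X, Y in dataloader:
            total += loss_fn(model(X), Y).item() * len(Y)
            n += len(Y)
    return total / n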

Throw an exception if an unsupported metric is provided

Is your feature request related to a problem? Please describe.
Right now, if I try to checkpoint with a metric that is not implemented, checkpointing silently fails.

Describe the solution you'd like
An exception that catches cases where the metric is unrecognized.
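
A minimal sketch of such a guard, where SUPPORTED_METRICS is a hypothetical stand-in for the real metric registry:

SUPPORTED_METRICS = {"accuracy", "precision", "recall", "f1"}

def validate_checkpoint_metric(metric_name: str) -> None:
    # Fail loudly instead of silently skipping checkpointing.
    if metric_name not in SUPPORTED_METRICS:
        raise ValueError(
            f"Unrecognized checkpoint metric '{metric_name}'; "
            f"supported metrics are {sorted(SUPPORTED_METRICS)}."
        )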

EmmentalModel is not fully saved

Describe the bug

When you train and test on two different machines, you have to transfer the trained model from one to the other.
I cannot fully save an EmmentalModel, possibly because commit 6908168 commented out the lines that persist variables like task_names.
As a result, a KeyError occurs when you run a prediction with a loaded model.

File "/Users/hiromu/miniconda3/envs/fonduer-mlflow/lib/python3.7/site-packages/emmental/model.py", line 213, in flow
for action in self.task_flows[task_name]:
KeyError: 'wiki'

To Reproduce
Steps to reproduce the behavior:

  1. Train an EmmentalModel
ATTRIBUTE = "wiki"
tasks = create_task(
    ATTRIBUTE, 2, F_train[0].shape[1], 2, emb_layer, model="LogisticRegression"
)

model = EmmentalModel(name=f"{ATTRIBUTE}_task")

for task in tasks:
    model.add_task(task)

emmental_learner = EmmentalLearner()
emmental_learner.learn(model, [train_dataloader])
  2. Save the model
model.save(model_path="model.pkl")
  3. Load the model
model = EmmentalModel()
model.load(model_path="model.pkl")
  4. Predict using the model
model.predict(test_dataloader, return_preds=True)

Expected behavior
Prediction succeeds on the loaded model without a KeyError.

Environment (please complete the following information):

  • Emmental Version: 0.0.4
  • Fonduer Version: 0.8.0-dev (ace413d0a687f8e9aa817389c8576d77c02baa59)
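
A possible workaround until those attributes are persisted again is to pickle the entire model object instead of relying on EmmentalModel.save(); a minimal sketch:

import torch

# Pickling the whole object also captures attributes such as task_flows
# that EmmentalModel.save() currently drops.
torch.save(model, "model_full.pkl")

model = torch.load("model_full.pkl")
model.predict(test_dataloader, return_preds=True)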

Support functions with Tensorboard Writer in Meta.config

Description of the bug

If a user adds custom optimizers such that the optimizer or its parameters are functions, the call in tensorboard_writer.py that JSON-serializes Meta.config throws: Object of type function is not JSON serializable.

To Reproduce

Steps to reproduce the behavior:

Add this code before emmental_learner.learn is called.

def grouped_parameters(model):
  no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
  return [
      {
          "params": [
              p
              for n, p in model.named_parameters()
              if not any(nd in n for nd in no_decay)
          ],
          "weight_decay": emmental.Meta.config["learner_config"][
              "optimizer_config"
          ]["l2"],
      },
      {
          "params": [
              p
              for n, p in model.named_parameters()
              if any(nd in n for nd in no_decay)
          ],
          "weight_decay": 0.0,
      },
  ]

emmental.Meta.config["learner_config"]["optimizer_config"][
  "parameters"
] = grouped_parameters

And set the writer to be tensorboard.

Expected behavior

No error. Emmental should handle non-JSON-serializable types in Meta.config.
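
One minimal way to do this is to fall back to a string representation for anything the encoder cannot handle; a sketch (safe_dumps is a hypothetical helper):

import json

def safe_dumps(config):
    # The default hook is called only for objects json cannot encode,
    # e.g. functions, which are rendered by name instead of raising.
    return json.dumps(config, default=lambda o: getattr(o, "__name__", repr(o)))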

Environment (please complete the following information)

  • OS: Ubuntu
  • Emmental Version: 0.0.9dev

Support pytorch==1.1.0

Is your feature request related to a problem? Please describe.

Fonduer depends on Emmental and Snorkel.
They conflict on the PyTorch version: Snorkel requires torch<1.2.0,>=1.1.0, while Emmental requires torch>=1.3.1,<2.0.0.

Describe the solution you'd like

While waiting for Snorkel to support PyTorch > 1.1.0, I'd like Emmental to support PyTorch 1.1.0 if it is relatively easy.

Describe alternatives you've considered

Upgrade Snorkel to support PyTorch > 1.1.0. A few issues have already been filed asking for this:
snorkel-team/snorkel#1541 (Jan 31, 2020)
snorkel-team/snorkel#1558 (Mar 19, 2020)

Handle None for argparse

Is your feature request related to a problem? Please describe.
When using a bash script, arguments like model_path take a string as input. None needs to be handled properly when model_path does not exist.
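
A minimal sketch of one way to handle this, using a custom argparse type that maps the literal string "none" to Python None (nullable_str is a hypothetical helper):

import argparse

def nullable_str(value: str):
    # Treat "none"/"null"/"" (as easily produced by a bash script) as
    # Python None rather than a literal path string.
    return None if value.strip().lower() in ("", "none", "null") else value

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=nullable_str, default=None)
args = parser.parse_args(["--model_path", "None"])
assert args.model_path is None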

Add a save() method to EmbeddingModule to save an embedding_file

Is your feature request related to a problem? Please describe.

When you do training and test on two different machines, you have to transfer what's trained from one to another.
As described in https://github.com/HazyResearch/fonduer/blob/master/CHANGELOG.rst

# Collect word counter from candidates which is used in LSTM model.
word_counter = collect_word_counter(train_cands)

# Generate word embedding module for LSTM.
emb_layer = EmbeddingModule(
    word_counter=word_counter, word_dim=300, specials=special_tokens
)

...

# Create the test dataloader used for prediction.
test_dataloader = EmmentalDataLoader(
    task_to_label_dict={ATTRIBUTE: "labels"},
    dataset=FonduerDataset(
        ATTRIBUTE, test_cands[0], F_test[0], emb_layer.word2id, 2
    ),
    split="test",
    batch_size=100,
    shuffle=False,
)

emb_layer is trained using train_cands and is used later for test_dataloader.
While EmbeddingModule can load an embedding_file in __init__, it does not have a save method to persist what was learned.

Describe the solution you'd like

Add a save() method to EmbeddingModule to save an embedding_file.
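
A hedged sketch of what such a method might look like, assuming the module exposes a word2id dict and an nn.Embedding attribute (the attribute names and the word2vec-style on-disk format are assumptions to be checked against EmbeddingModule's loader):

def save(self, embedding_file: str) -> None:
    # Write one "word v1 v2 ... vd" line per vocabulary entry, preceded
    # by a "vocab_size dim" header, so the file can be loaded back.
    weights = self.embeddings.weight.detach().cpu().numpy()
    with open(embedding_file, "w") as f:
        f.write(f"{len(self.word2id)} {weights.shape[1]}\n")
        for word, idx in self.word2id.items():
            f.write(word + " " + " ".join(str(x) for x in weights[idx]) + "\n")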

Describe alternatives you've considered

Let me know if you have a better idea.

Memory problems with evaluating images

I think the evaluation code for segmentation tasks (specifically, model.eval and the scorer function) stores huge dicts and arrays of the ground-truth labels, predicted labels, and probabilities. Since each ground-truth label, predicted label, and probability mask is the same size as the input image, storing 3x the dataset size in intermediate dicts/arrays during evaluation quickly eats up memory and kills the job, even for modestly sized val/test sets (e.g., >500 2D images). Even if the job keeps running, some evaluation variable sits on a big chunk of memory throughout training.

It might be better to process the segmentation evaluation datasets (val/test) in batches. Instead of storing the val/test ground-truth and predicted labels in dicts/arrays, you can store just the scoring metrics (e.g., Dice), as in the sketch below. Additionally, if predictions are saved during evaluation, they should also be saved in batches rather than kept entirely in memory.
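
For instance, a rough sketch of batch-wise scoring where only per-image Dice scores are kept in memory (model and val_loader are hypothetical stand-ins for a binary segmentation setup):

import torch

def dice(pred, target, eps=1e-7):
    # Dice coefficient for a single binary mask pair.
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

# Accumulate only the per-image scores, never the full masks.
scores = []
with torch.no_grad():
    for images, masks in val_loader:
        preds = (model(images).sigmoid() > 0.5).float()
        scores.extend(dice(p, t).item() for p, t in zip(preds, masks))
mean_dice = sum(scores) / len(scores)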

This problem is specific to tasks where the network outputs are large and you care about local metrics (e.g., Dice per image) instead of global metrics (e.g., accuracy over all images).

Log slicing functions

Log which slicing functions were used for a given run: when iterating on slicing functions, it is hard to tell from the logs exactly which ones were used.

Easiest way to do this may be to copy the slicing_functions.py file to the log directory.
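
A minimal sketch of that approach, assuming Meta.log_path points at the current run's log directory:

import shutil

from emmental import Meta

# Snapshot the slicing functions next to the run's other logs so the
# exact functions used by this run can be recovered later.
shutil.copy2("slicing_functions.py", Meta.log_path)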

Wrong checkpoint metric used in load_best_model

Describe the bug
The wrong checkpoint_metric is used in load_best_model at the end of EmmentalLearner.learn. I believe this is because utils.merge doesn't delete entries, it just replaces them, which leaves multiple entries in logging_config.checkpointer_config.checkpoint_metric.
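
A small self-contained repro of that merge behavior, with merge as a simplified stand-in for utils.merge and a hypothetical default metric:

def merge(x, y):
    # Nested dicts are merged key-by-key, so keys present only in the
    # default config survive the update instead of being replaced.
    if not isinstance(x, dict) or not isinstance(y, dict):
        return y
    out = dict(x)
    for k, v in y.items():
        out[k] = merge(out.get(k, {}), v)
    return out

default = {"checkpoint_metric": {"model/train/all/loss": "min"}}
update = {"checkpoint_metric": {"model/all/valid/loss": "min"}}
print(merge(default, update))
# -> {'checkpoint_metric': {'model/train/all/loss': 'min',
#                           'model/all/valid/loss': 'min'}}  two entries remain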

To Reproduce
Steps to reproduce the behavior:

  1. Initialize an Emmental experiment
  2. Run the following code snippet:
Meta.update_config(config={
    'learner_config': {
        'n_epochs': 2,
        'valid_split': 'valid',
        'optimizer_config': {'optimizer': 'adam', 'lr': 0.01, 'l2': 0.000},
        'lr_scheduler_config': {}
    },
    'logging_config': {
        'evaluation_freq': 1,
        'checkpointing': True,
        'checkpointer_config': {
            'checkpoint_metric': {
                'model/all/valid/loss': 'min'
            }
        }
    }
})
print(Meta.config['logging_config'])
  3. At this point, it should be clear that there are multiple values in logging_config.checkpointer_config.checkpoint_metric. However, in order to see how this affects downstream tasks, run EmmentalLearner.learn
...
model = EmmentalModel(name='model', tasks=tasks)
learner = EmmentalLearner()
learner.learn(model, dataloaders)  # dataloaders as set up above
  4. Finally, print list(learner.logging_manager.checkpointer.checkpoint_metric.keys())[0], which shows the value used by the Checkpointer.load_best_model function to determine whether a best model was found (checkpointer.py, line ~253). The value from the default config (incorrectly) appears at this point instead of the value from the updated config.

Expected behavior
I expect the checkpoint metric I defined in the updated config to be used in Checkpointer.load_best_model.

Environment

  • OS: Ubuntu 16.04
  • Emmental Version: 0.0.4
  • Python 3.6
