# K-fold cross-validation driver: trains one model per fold and collects
# the per-fold test outputs/metrics keyed by the (stringified) fold index.
# Relies on names from the enclosing scope not visible in this chunk:
# ``data`` (a data manager), ``shuffle``, ``random_seed``, ``split_val``,
# ``split_labels``, ``train_kwargs``/``test_kwargs``, the ``outputs`` and
# ``metrics_val`` accumulators, and ``self`` (the experiment object).
if num_splits is None:
    # fall back to the common default of 10 folds when the caller
    # did not specify a count
    num_splits = 10
    logger.warning("num_splits not defined, using default value of \
                    10 splits instead ")

if isinstance(data.dataset, BaseLazyDataset):
    # lazy datasets load samples on access, so touching every sample
    # just to read its label is expensive — warn the user up front
    logger.warning("A lazy dataset is given for stratified kfold. \
                    Iterating over the dataset to extract labels for \
                    stratification may be a massive overhead")

# split over index positions; the actual samples are materialized later
# through ``get_subset``
split_idxs = list(range(len(data.dataset)))

# NOTE(review): the warning above talks about *stratified* kfold and
# ``split_labels`` is passed to ``split`` below, but plain ``KFold``
# ignores its ``y`` argument entirely, so no stratification happens —
# confirm whether ``StratifiedKFold`` was intended here.
fold = KFold(n_splits=num_splits, shuffle=shuffle,
             random_state=random_seed)

for idx, (_train_idxs, test_idxs) in enumerate(fold.split(split_idxs,
                                                          split_labels)):
    # extract data from single manager
    _train_data = data.get_subset(_train_idxs)
    _split_idxs = list(range(len(_train_data.dataset)))

    # carve a validation subset (fraction ``split_val``) out of this
    # fold's training portion; ``n_splits=1`` means the loop below
    # executes exactly once
    val_fold = ShuffleSplit(n_splits=1,
                            test_size=split_val,
                            random_state=random_seed)
    for train_idxs, val_idx in val_fold.split(_split_idxs):
        train_data = _train_data.get_subset(train_idxs)
        val_data = _train_data.get_subset(val_idx)

    test_data = data.get_subset(test_idxs)

    # update manager behavior for train and test case
    train_data.update_state_from_dict(train_kwargs)
    # validation uses the *test* configuration — presumably to disable
    # training-only behavior (e.g. augmentation); confirm against the
    # semantics of ``test_kwargs``
    val_data.update_state_from_dict(test_kwargs)
    test_data.update_state_from_dict(test_kwargs)

    # train a fresh model on this fold ...
    model = self.run(train_data, val_data,
                     num_epochs=num_epochs,
                     fold=idx,
                     **kwargs)
    # ... then evaluate it on the held-out test split of the fold
    _outputs, _metrics_val = self.test(
        self.params, model, test_data)

    # accumulate per-fold results under the fold index as string key
    outputs[str(idx)] = _outputs
    metrics_val[str(idx)] = _metrics_val

return outputs, metrics_val