
delira-dev / delira

Lightweight framework for fast prototyping and training deep neural networks with PyTorch and TensorFlow

Home Page: https://delira.rtfd.io

License: GNU Affero General Public License v3.0

Languages: Python 75.83%, Dockerfile 0.10%, Jupyter Notebook 23.71%, Shell 0.22%, TeX 0.14%
Topics: deep-learning, delira, machine-learning, medical-images, medical-imaging, pytorch, radiology, tensorflow

delira's People

Contributors: cclauss, gedoensmax, haarburger, justusschock, mibaumgartner, nkpmedia, orippler

delira's Issues

[FeatureRequest] Merge kfold types

Description
Add a KFold_predict in which stratification can be switched on or off, so that the existing k-fold variants can be merged.

Feature History
One option is to simply drop all label handling, something like this:
import logging

from sklearn.model_selection import KFold, ShuffleSplit

logger = logging.getLogger(__name__)

# body of the experiment's k-fold method; stratification is dropped, so no
# labels have to be extracted from the (possibly lazy) dataset
metrics_val = {}
outputs = {}

if num_splits is None:
    num_splits = 10
    logger.warning("num_splits not defined, using default value of "
                   "10 splits instead")

split_idxs = list(range(len(data.dataset)))

# recent scikit-learn versions raise a ValueError if random_state is set
# while shuffle is False
fold = KFold(n_splits=num_splits, shuffle=shuffle,
             random_state=random_seed if shuffle else None)

for idx, (_train_idxs, test_idxs) in enumerate(fold.split(split_idxs)):
    # extract data from single manager
    _train_data = data.get_subset(_train_idxs)
    _split_idxs = list(range(len(_train_data.dataset)))

    # split a validation set off the training part of this fold
    val_fold = ShuffleSplit(n_splits=1, test_size=split_val,
                            random_state=random_seed)

    for train_idxs, val_idxs in val_fold.split(_split_idxs):
        train_data = _train_data.get_subset(train_idxs)
        val_data = _train_data.get_subset(val_idxs)

    test_data = data.get_subset(test_idxs)

    # update manager behavior for train and test case
    train_data.update_state_from_dict(train_kwargs)
    val_data.update_state_from_dict(test_kwargs)
    test_data.update_state_from_dict(test_kwargs)
    model = self.run(train_data, val_data,
                     num_epochs=num_epochs,
                     fold=idx,
                     **kwargs)

    _outputs, _metrics_val = self.test(self.params, model, test_data)

    outputs[str(idx)] = _outputs
    metrics_val[str(idx)] = _metrics_val

return outputs, metrics_val
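Instead of deleting the labels, a single splitting function could take a `stratified` flag. The sketch below is stdlib-only so it runs without scikit-learn (in practice one would likely pick `KFold` or `StratifiedKFold` based on the flag); the function name `kfold_indices` and its parameters are hypothetical, not part of delira.

```python
import random
from collections import defaultdict


def kfold_indices(labels, num_splits=10, stratified=False, shuffle=False,
                  seed=None):
    """Yield (train_idxs, test_idxs) pairs for k-fold cross-validation.

    With stratified=True the samples of each label are spread round-robin
    over the folds, so each test fold roughly keeps the label proportions.
    With stratified=False only the number of labels is used.
    """
    idxs = list(range(len(labels)))
    rng = random.Random(seed)

    if stratified:
        # group sample indices by label (insertion order is preserved)
        by_label = defaultdict(list)
        for idx, label in zip(idxs, labels):
            by_label[label].append(idx)
        groups = list(by_label.values())
    else:
        # a single group: plain, unstratified k-fold
        groups = [idxs]

    folds = [[] for _ in range(num_splits)]
    offset = 0
    for group in groups:
        group = list(group)
        if shuffle:
            rng.shuffle(group)
        # continue the round-robin where the previous group stopped,
        # which keeps the fold sizes balanced
        for position, idx in enumerate(group):
            folds[(offset + position) % num_splits].append(idx)
        offset += len(group)

    for test_idxs in folds:
        test_set = set(test_idxs)
        train_idxs = [idx for idx in idxs if idx not in test_set]
        yield train_idxs, sorted(test_idxs)
```

The loop above would then iterate over `enumerate(kfold_indices(labels, num_splits, stratified=...))`, where `labels` only has to be extracted when stratification is switched on.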

[Bug] `PTNetworkTrainer.predict` drops last elements of `batchgen`

Description
The condition meant to handle the remaining samples is never fulfilled, so the last elements of `batchgen` are dropped.
If `batch_size` is modified, an internal `orig_batch_size` is required to compute the averages correctly (using ceiling division).
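A minimal plain-Python sketch (not delira's `PTNetworkTrainer` code) of why ceiling division keeps the final, smaller batch and why averages should then be weighted by the actual batch sizes rather than the nominal one; `iterate_batches` and `weighted_mean` are hypothetical helpers.

```python
import math


def iterate_batches(samples, batch_size):
    """Yield consecutive batches, including a smaller final batch."""
    # ceiling division: 10 samples with batch_size 4 -> 3 batches;
    # floor division (10 // 4 == 2) would silently drop the last 2 samples
    n_batches = math.ceil(len(samples) / batch_size)
    for batch_idx in range(n_batches):
        start = batch_idx * batch_size
        yield samples[start:start + batch_size]


def weighted_mean(batch_values, batch_sizes):
    """Average per-batch values weighted by the real batch sizes."""
    total = sum(batch_sizes)
    return sum(v * n for v, n in zip(batch_values, batch_sizes)) / total
```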

Environment

  • OS: Ubuntu 18.04
  • Python version: 3.6
  • delira version: 0.2.0-beta.1
  • How did you install delira? [source]

Reproduction
Run MNIST Classification example

[Bug] Error when calling `get_subset()` from datamanager

Calling `get_subset()` on a data manager raises a `TypeError` because `BlankDataset` receives multiple values for the `data` argument.
Furthermore, the internal attribute `_load_fn` does not match its counterpart `load_fn` in the `__init__` signature of `BlankDataset`. As a result, `BlankDataset` is missing one required argument (`load_fn`).

A simple fix is to pop the `data` key from the kwargs dictionary and to remove the leading underscores from the keys.

https://github.com/justusschock/delira/blob/3374ac6b86358b6910f3ac1a8c070888e8dc9c33/delira/data_loading/dataset.py#L86-L112

https://github.com/justusschock/delira/blob/3374ac6b86358b6910f3ac1a8c070888e8dc9c33/delira/data_loading/dataset.py#L148-L156
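The suggested fix can be sketched on a hypothetical, simplified stand-in for `BlankDataset`; the constructor signature below is an assumption for illustration, not delira's actual one.

```python
class BlankDataset:
    """Hypothetical stand-in for delira's BlankDataset (signature assumed)."""

    def __init__(self, data, load_fn, load_kwargs=None):
        self._data = list(data)
        self._load_fn = load_fn
        self._load_kwargs = load_kwargs or {}

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._load_fn(self._data[index], **self._load_kwargs)

    def get_subset(self, indices):
        # rebuild constructor kwargs from the private attributes and strip
        # the leading underscore so the names match the __init__ signature
        kwargs = {key.lstrip("_"): value for key, value in vars(self).items()}
        # "data" is passed explicitly below; leaving it in kwargs would
        # raise "got multiple values for argument 'data'"
        kwargs.pop("data")
        subset = [self._data[i] for i in indices]
        return type(self)(subset, **kwargs)
```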

Coverage

Fix coverage calculation in CI/CD

[FeatureRequest] Merge TfExperiment and PytorchExperiment

Description
Merge TfExperiment and PytorchExperiment to remove code duplication. Apparently only the handling of tf.Graph objects has to be moved into the Trainer or, alternatively, into the TfNetwork.

Proposal
Give each TfNetwork its own tf.Graph and pass it to its tf.Session. This should remove the need to reset the global default graph for every Trainer instance.

PEP8

Enable PEP8 checks in CI/CD

[FeatureRequest] Backend Choosing

Find a more robust way to determine the currently installed backends.

An environment variable may be shared across bash instances and conda installations, so it does not reliably reflect what is installed in the active environment.
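One possible approach, sketched under the assumption that a backend counts as installed when its top-level module can be found by the running interpreter: `importlib.util.find_spec` locates a module without importing it. The mapping `BACKEND_MODULES` and the function name are hypothetical.

```python
import importlib.util

# hypothetical mapping from backend name to the module that provides it
BACKEND_MODULES = {"torch": "torch", "tf": "tensorflow"}


def get_installed_backends(backend_modules=BACKEND_MODULES):
    """Return the backends whose top-level module can be found.

    find_spec only locates the module and does not import it, so the
    check is cheap and reflects the active interpreter/conda environment
    rather than an exported shell variable.
    """
    return sorted(name for name, module in backend_modules.items()
                  if importlib.util.find_spec(module) is not None)
```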

[FeatureRequest] Dataset API expect more than one path for a sample

Description
For data loading it is often important to have a ground truth path and a data path to pass to your loading function. At the moment it looks like this in BaseCacheDataset for example:

if isinstance(path, list):
    # iterate over all elements
    for p in tqdm(path, unit='samples', desc="Loading samples"):
        data.append(self._load_fn(p, **self._load_kwargs))

In lazy datasets, the same applies to the `__getitem__` method.

Proposal
In my opinion, we should add an asterisk in front of `p` so that lists of paths can also be passed to the load function.
Personally, I prefer using `**` in front of `p` and passing a list of dictionaries to the dataset, as this would also allow passing a label or anything else that is of interest for loading. But I am not sure which would be the more elegant way.
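Both options can coexist if the loader dispatches on the type of each sample. A minimal sketch, with hypothetical `load_samples` and `load_pair` functions standing in for the dataset's loading loop and a user-supplied load function:

```python
def load_samples(samples, load_fn, **load_kwargs):
    """Call load_fn once per sample, unpacking the sample if needed.

    - a single path is passed as is,
    - a list/tuple of paths is unpacked positionally (*sample),
    - a dict is unpacked as keyword arguments (**sample).
    """
    data = []
    for sample in samples:
        if isinstance(sample, dict):
            data.append(load_fn(**sample, **load_kwargs))
        elif isinstance(sample, (list, tuple)):
            data.append(load_fn(*sample, **load_kwargs))
        else:
            data.append(load_fn(sample, **load_kwargs))
    return data


def load_pair(img_path, gt_path=None, label=None):
    # hypothetical load function; a real one would read the files
    return {"data": img_path, "gt": gt_path, "label": label}
```

The dict form is the more flexible of the two, since the keys document which argument each path or label is meant for.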

[FeatureRequest] Dataset use w/o ground truth

Description
It would be nice to be able to use any Dataset without any given ground truth.

Feature History
I tried both not setting gt_extensions and giving it an empty list.
Neither worked, and I would prefer an empty list to be the default value.

Proposal
There would have to be a check in the `_is_valid_image_file` method whether any ground-truth file extensions are given.
If not, an image should be valid as long as the file exists.

Additional context
This could be a solution:

…
has_label = not self._gt_extensions
for ext in self._gt_extensions:
    label_file = fname.rsplit(".", maxsplit=1)[0] + ext
    if os.path.isfile(label_file):
        has_label = True

return is_valid_file and has_label
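A self-contained version of the same check as a free function, so it can be tried outside the dataset class. The function name and the `img_extensions` parameter are assumptions; an empty tuple is used as the immutable default for `gt_extensions`.

```python
import os


def is_valid_sample(fname, img_extensions, gt_extensions=()):
    """Check an image file and, if gt_extensions are given, its label file."""
    is_valid_file = (os.path.isfile(fname)
                     and fname.endswith(tuple(img_extensions)))
    # with no ground-truth extensions every existing image is valid
    has_label = not gt_extensions
    stem = fname.rsplit(".", maxsplit=1)[0]
    for ext in gt_extensions:
        if os.path.isfile(stem + ext):
            has_label = True
            break
    return is_valid_file and has_label
```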

[FeatureRequest] DatasetIterator

Make the dataset iterable (probably sufficient to simply add a __curr_index attribute and __next__ and __iter__ functions to AbstractDataset)
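An alternative to storing a `__curr_index` on the dataset is a generator-based `__iter__`. Storing the position on the dataset makes it its own iterator, so nested or concurrent loops share one position; also, double-underscore attributes are name-mangled (to `_AbstractDataset__curr_index`). A sketch on a hypothetical minimal stand-in for `AbstractDataset`:

```python
class AbstractDataset:
    """Hypothetical minimal dataset used to illustrate __iter__."""

    def __init__(self, samples):
        self.data = list(samples)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

    def __iter__(self):
        # a generator keeps the iteration state outside the dataset,
        # so several loops over the same dataset do not interfere
        for index in range(len(self)):
            yield self[index]
```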

[Bug] Multiple GPUs: Load state_dict

Description
Loading a state_dict at the beginning of training fails when training on multiple GPUs.

Environment

  • OS: Ubuntu 18
  • Python version: 3.7
  • delira version 0.3.0
  • How did you install delira? pip
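The report does not state the cause. One frequent source of such failures (an assumption here, not confirmed by the issue) is that `torch.nn.DataParallel` stores the wrapped model as its `module` attribute, so the keys of its `state_dict` carry a `module.` prefix that a plain model does not expect. A stdlib-only sketch of normalizing the keys before calling `load_state_dict`; the helper name is hypothetical.

```python
from collections import OrderedDict


def strip_data_parallel_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )
```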

Squash Commits

Squash the commits from cc661d1 up to the latest one into a single commit (probably via git rebase, since they are all for doc generation).

Tests

Write small unit tests for the PyTorch trainer, experiment and parameters.

[FeatureRequest] Define separate transformations for train and test data in k-fold

The k-fold method in the experiment class currently only works with data managers that have pre-defined transformations.
It would be great if one could define base transformations (e.g. pre-processing), which are applied to both the train and the test set, and train transformations (e.g. augmentation), which are only applied to the training set.

https://github.com/justusschock/delira/blob/3374ac6b86358b6910f3ac1a8c070888e8dc9c33/delira/training/experiment.py#L97-L113
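The requested split could look roughly like this; the plain callables stand in for whatever transforms the data managers accept, and `compose` / `build_fold_transforms` are hypothetical helpers, not delira API.

```python
def compose(*transforms):
    """Chain callables: each one receives the output of the previous."""
    def apply(sample):
        for transform in transforms:
            sample = transform(sample)
        return sample
    return apply


def build_fold_transforms(base_transforms, train_transforms):
    """Return (train_fn, test_fn) for one fold.

    base_transforms (e.g. pre-processing) are applied to both sets,
    train_transforms (e.g. augmentation) only to the training set.
    """
    train_fn = compose(*base_transforms, *train_transforms)
    test_fn = compose(*base_transforms)
    return train_fn, test_fn
```

Inside the k-fold loop, `train_fn` would be attached to the training subset and `test_fn` to the validation and test subsets of each fold.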

[FeatureRequest] Add logging frequency during training

Description
Provide a means to specify the logging frequency during training. With Visdom, all logged results are kept in memory (risking OOM), and with the tensorboard logger, disk I/O may otherwise become a training bottleneck.

Proposal
There are different ways to tackle this. One could be to utilize CombinedLogger from trixi. However, I'm unsure whether CombinedLogger allows for logging frequencies assigned per logged object (e.g. Scalars every iteration and images every Nth iteration), which is a feature I would like to have.
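Independent of trixi's `CombinedLogger` (whose per-object frequency support is the open question), per-kind frequencies can be sketched as a thin wrapper around any logging callable. `FrequencyLogger` and its parameters are hypothetical.

```python
class FrequencyLogger:
    """Forward a value to log_fn only every N-th call per kind of object."""

    def __init__(self, log_fn, frequencies, default_frequency=1):
        self._log_fn = log_fn
        self._frequencies = dict(frequencies)
        self._default = default_frequency
        self._counts = {}

    def log(self, kind, value):
        count = self._counts.get(kind, 0)
        self._counts[kind] = count + 1
        # log on the first call and then every `frequency` calls
        if count % self._frequencies.get(kind, self._default) == 0:
            self._log_fn(kind, value)
            return True
        return False
```

With `{"scalar": 1, "image": 3}`, scalars are forwarded on every iteration and images on every third one.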

Additional context
Solve this before implementing logging frequencies as part of #47.

[Bug] Mac OS X: Backend for TF not found

Description
I tried installing delira with and without a backend into a conda environment that already has TensorFlow 1.12 installed. When executing one of the tests, for example, I get this error:

python3 delira/tests/models/test_models_tf.py 
Traceback (most recent call last):
  File "/Users/mmueller/Documents/03_Programmieren/02_machine_learning/installation/lfb/delira/tests/models/test_models_tf.py", line 2, in <module>
    if "tf" in os.environ["DELIRA_BACKEND"]:
  File "/Users/mmueller/miniconda3/envs/ml/lib/python3.7/os.py", line 678, in __getitem__
    raise KeyError(key) from None
KeyError: 'DELIRA_BACKEND'

I am not sure whether I have to install a backend in addition to a normal TensorFlow installation. If so, this could be added to the README. Thankful for any help!

Environment

  • OS: Mac OS X 10.13.6
  • Python version: 3.7.2
  • delira version: 0.3.0

Reproduction
I installed TensorFlow 1.12 from this git repository following these instructions.
Now it is listed in conda as tensorflow but not as tensorflow-gpu, although GPU support works, which I think may be the cause of the error.

Additional context
I already tried changing the requirement to just tensorflow.
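The traceback comes from indexing `os.environ["DELIRA_BACKEND"]` directly, which raises `KeyError` whenever the variable is unset. A sketch of a more tolerant check for the test modules, keeping the original substring semantics; `backend_enabled` and the test class are hypothetical.

```python
import os
import unittest


def backend_enabled(name, env_var="DELIRA_BACKEND"):
    """Return True if `name` occurs in the backend environment variable.

    os.environ.get returns "" when the variable is unset, so the check
    yields False instead of raising KeyError.
    """
    return name in os.environ.get(env_var, "")


@unittest.skipUnless(backend_enabled("tf"), "TF backend not enabled")
class TfModelTest(unittest.TestCase):
    def test_placeholder(self):
        # hypothetical placeholder for the actual TF model tests
        self.assertTrue(True)
```

With this guard, a missing variable leads to skipped TF tests rather than an import-time crash.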

Refactoring Parameter class

I (and @ORippler) would like to refactor the HyperParameters class.

Currently, all parameters are split between HyperParameters and model_kwargs.
For the hyperparameter search, we must decide whether a parameter is variable or not, and we must combine the variable hyper_parameters with the variable model_kwargs and the fixed hyper_parameters with the fixed model_kwargs.

Our proposal is to unify them in a dict-like class Parameter (like our current HyperParameters class) with a few restrictions:

We would introduce fixed sections for model and trainer parameters and fixed sections for fixed and variable parameters.

The whole structure would look either like

Parameters = {
    "model": {
        "variable": {
            # params
        },
        "fixed": {
            # params
        },
    },
    "trainer": {
        "variable": {
            # params
        },
        "fixed": {
            # params
        },
    },
}

or like

Parameters = {
    "variable": {
        "model": {
            # params
        },
        "trainer": {
            # params
        },
    },
    "fixed": {
        "model": {
            # params
        },
        "trainer": {
            # params
        },
    },
}

What do you think, @haarburger @ORippler?
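Both layouts hold the same information and differ only in the nesting order, so a Parameter class could store one and expose the other. A sketch with hypothetical helpers `validate` (enforces the fixed sections) and `swap_levels` (converts between the two layouts):

```python
SECTIONS = {"model", "trainer"}
KINDS = {"variable", "fixed"}


def validate(params):
    """Check that params has exactly the two fixed nesting levels."""
    outer = set(params)
    if outer not in (SECTIONS, KINDS):
        raise ValueError(f"unexpected top-level keys: {sorted(outer)}")
    expected_inner = KINDS if outer == SECTIONS else SECTIONS
    for key, inner in params.items():
        if set(inner) != expected_inner:
            raise ValueError(f"unexpected keys below {key!r}: {sorted(inner)}")


def swap_levels(params):
    """Convert between the two proposed layouts by swapping the levels."""
    swapped = {}
    for outer_key, inner in params.items():
        for inner_key, values in inner.items():
            swapped.setdefault(inner_key, {})[outer_key] = values
    return swapped
```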

[FeatureRequest] kwargs for models

Previous versions of delira had a kwargs argument for models. This was quite useful; it would be great if it were available in the current version of delira as well.

[Bug] Build Environment

We need to set up the build environment again correctly, including docs, testing with unittest, and coverage.

Optimize Imports

Optimize imports to make delira framework-agnostic:

  • import torch and apex only if needed
  • wrap framework specific imports with try
  • write default optimizer wrapper to remove hard-dependency on apex
  • create issue in trixi repo to avoid hard PyTorch dependency
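The "wrap framework-specific imports with try" item could be factored into a small helper that returns `None` for missing packages and only fails when a feature actually needs them. A sketch with hypothetical names `optional_import` and `require`:

```python
import importlib


def optional_import(module_name):
    """Import a module if it is installed, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, name):
    """Raise a clear error at use time instead of at import time."""
    if module is None:
        raise ImportError(
            f"{name} is required for this feature but is not installed")
    return module


# usage in a backend-specific module, e.g.:
# apex = optional_import("apex")
# ... later, only on the code path that needs it:
# require(apex, "apex")
```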

[Bug] Tutorial AttributeError at indexing dataset

Description
tutorial throws AttributeError

Environment

  • OS: ubuntu 18.04
  • Python version: 3.7.1
  • delira version 0.3.1
  • How did you install delira? pip

Reproduction

from delira.data_loading import TorchvisionClassificationDataset
dataset_train = TorchvisionClassificationDataset("mnist", train=True, img_shape=(224, 224))
first_img = dataset_train[0]

Additional context

AttributeError                            Traceback (most recent call last)
 in 
----> 1 dataset_train[0]

/work/scratch/baehr/anaconda3/envs/delira/lib/python3.7/site-packages/delira/data_loading/dataset.py in __getitem__(self, index)
    772             data = self.data[index]
    773             data_dict = {"data": np.array(data[0]),
--> 774                          "label": data[1].numpy().reshape(1).astype(np.float32)}
    775 
    776             if self.one_hot:

AttributeError: 'int' object has no attribute 'numpy'
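The traceback shows that the label reaching line 774 is a plain `int`, which has no `.numpy()` method. A possible fix, sketched as a standalone hypothetical helper, is to convert with `np.asarray`, which accepts Python ints as well as objects implementing the array protocol (such as CPU tensors):

```python
import numpy as np


def label_to_array(label):
    """Convert a label to a float32 array of shape (1,)."""
    # np.asarray works for ints, NumPy scalars and array-like objects,
    # so there is no need to call .numpy() on the label
    return np.asarray(label).reshape(1).astype(np.float32)
```

In `__getitem__`, `data[1].numpy().reshape(1).astype(np.float32)` would then become `label_to_array(data[1])`.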

Update Readme

  • Update Readme and check all links, images and badges
  • Add links to prebuilt Docker images

[FeatureRequest] Dataset refactoring to more general version

Description
Some of the interfaces of the implemented datasets are too specific. It would be great to create more general datasets that don't impose any constraints on the user.

Proposal
AbstractDataset should only need a load_fn and a data_path (remove img-, label-extensions)

BaseCacheDataset should only need a load_fn and a data_path. Should be able to handle a single data_path where it searches the respective folder for sub directories/files and calls load_fn for all of them. Should be able to handle a list of paths and call load_fn for all of them. (Could be implemented in a single class or in two separate classes.)

BaseLazyDataset: same criterion as BaseCacheDataset

In order to replicate the current ability to load multiple extensions into a single sample an improved loading function would be helpful.
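A minimal sketch of the proposed `BaseCacheDataset` behavior, using a hypothetical class `CacheDataset` that only needs `data_path` and `load_fn`. A lazy variant would store the paths and call `load_fn` in `__getitem__` instead of in `__init__`.

```python
import os


class CacheDataset:
    """Hypothetical general cache dataset: only data_path and load_fn.

    data_path may be a directory (every entry inside it is loaded) or a
    list of paths (each path is loaded).
    """

    def __init__(self, data_path, load_fn, **load_kwargs):
        if isinstance(data_path, (list, tuple)):
            paths = list(data_path)
        else:
            # sorted for a deterministic sample order
            paths = sorted(os.path.join(data_path, entry)
                           for entry in os.listdir(data_path))
        self.data = [load_fn(path, **load_kwargs) for path in paths]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]
```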

Builds

Check builds for correctness
