continvvm / continuum



A clean and simple data loading library for Continual Learning

Home Page: https://continuum.readthedocs.io

License: MIT License

Python 99.89% Makefile 0.11%

continual-learning lifelong-learning incremental-learning online-learning pytorch dataloader dataset

continuum's Introduction

Continuum: Simple Management of Complex Continual Learning Scenarios

A PyTorch library for loading datasets in Continual Learning scenarios

Also known as Continual Learning, Lifelong Learning, Incremental Learning, etc.

Read the documentation.
Try Continuum on Colab!

Example:

Install from PyPI:

pip3 install continuum

And run!

from torch.utils.data import DataLoader

from continuum import ClassIncremental
from continuum.datasets import MNIST
from continuum.tasks import split_train_val

dataset = MNIST("my/data/path", download=True, train=True)
scenario = ClassIncremental(
    dataset,
    increment=1,
    initial_increment=5
)

print(f"Number of classes: {scenario.nb_classes}.")
print(f"Number of tasks: {scenario.nb_tasks}.")

for task_id, train_taskset in enumerate(scenario):
    train_taskset, val_taskset = split_train_val(train_taskset, val_split=0.1)
    train_loader = DataLoader(train_taskset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_taskset, batch_size=32, shuffle=True)

    for x, y, t in train_loader:
        # Do your cool stuff here
        pass
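
To see where the number of tasks comes from, here is a minimal sketch of the class split, assuming the first task receives initial_increment classes and every later task receives increment classes, in natural class order. The helper class_splits is hypothetical and not part of Continuum; it only mirrors the arithmetic.

```python
def class_splits(nb_classes, increment, initial_increment=0):
    """Return the class ids assigned to each task.

    Hypothetical helper, not part of Continuum. Assumes natural class order:
    the first task gets `initial_increment` classes (or `increment` if 0),
    every later task gets `increment` classes.
    """
    first = initial_increment if initial_increment > 0 else increment
    first = min(first, nb_classes)
    splits = [list(range(first))]
    start = first
    while start < nb_classes:
        end = min(start + increment, nb_classes)
        splits.append(list(range(start, end)))
        start = end
    return splits


splits = class_splits(nb_classes=10, increment=1, initial_increment=5)
print(f"Number of tasks: {len(splits)}.")
for task_id, classes in enumerate(splits):
    print(task_id, classes)
```

With the 10 MNIST classes, the first task holds classes 0 to 4 and the five remaining classes arrive one per task, which gives 6 tasks.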

Supported Types of Scenarios

Name	Acronym	Supported	Scenario
New Instances	NI	✅	Instances Incremental
New Classes	NC	✅	Classes Incremental
New Instances & Classes	NIC	✅	Data Incremental

Supported Datasets:

Most datasets from torchvision.datasets are supported; for the complete list, see the documentation page on datasets.

Furthermore, some "Meta"-datasets can be created from a numpy array, from any torchvision.datasets dataset, or from a folder for datasets with a tree-like structure. You can also combine several datasets to create dataset fellowships!

Indexing

All our continual loaders are iterable (i.e. you can loop over them with for) and are also indexable.

This means that clloader[2] returns the third task (indexing starts at 0). Likewise, to evaluate after each task on all tasks seen so far, use clloader_test[:n].
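
The following toy class sketches this iterable and indexable behaviour in plain Python. ToyScenario is a hypothetical stand-in and not Continuum's implementation; in particular, merging the samples of all sliced tasks into one list is an assumption made for illustration.

```python
class ToyScenario:
    """Toy stand-in for a continual loader; NOT Continuum's implementation."""

    def __init__(self, tasks):
        self._tasks = list(tasks)

    def __len__(self):
        return len(self._tasks)

    def __iter__(self):
        # Allows `for task in scenario`
        return iter(self._tasks)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Assumption: a slice merges the samples of all selected tasks
            merged = []
            for task in self._tasks[index]:
                merged.extend(task)
            return merged
        return self._tasks[index]


clloader_test = ToyScenario([["a0", "a1"], ["b0"], ["c0", "c1"], ["d0"]])
third_task = clloader_test[2]  # index starts at 0

for n in range(1, len(clloader_test) + 1):
    seen = clloader_test[:n]  # all tasks seen so far
    print(f"after task {n - 1}: evaluate on {len(seen)} samples")
```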

Example of Sample Images from a Continuum scenario

CIFAR10:


Task 0	Task 1	Task 2	Task 3	Task 4

MNIST Fellowship (MNIST + FashionMNIST + KMNIST):


Task 0	Task 1	Task 2

PermutedMNIST:


Task 0	Task 1	Task 2	Task 3	Task 4

RotatedMNIST:


Task 0	Task 1	Task 2	Task 3	Task 4

TransformIncremental + BackgroundSwap:


Task 0	Task 1	Task 2

Citation

If you find this library useful in your work, please consider citing it:

@misc{douillardlesort2021continuum,
  author={Douillard, Arthur and Lesort, Timothée},
  title={Continuum: Simple Management of Complex Continual Learning Scenarios},
  publisher={arXiv: 2102.06253},
  year={2021}
}

Maintainers

This project was started by a joint effort from Arthur Douillard & Timothée Lesort, and we are currently the two maintainers.

Feel free to contribute! If you want to propose new features, please create an issue.

Contributors: Lucas Caccia, Lucas Cecchi, Pau Rodriguez, Yury Antonov, psychicmario, fcld94, Ashok Arjun, Md Rifat Arefin, DanieleMugnai, Xiaohan Zou, Umberto Cappellazzo.

On PyPi

Our project is available on PyPi!

pip3 install continuum

Note that another project, a CI tool, previously used that name. It is now available as continuum_ci.


continuum's Issues

Create documentation webpage

https://readthedocs.org/

CIFAR-10/100 experiment of Zenke et al. (2017)

Hi @arthurdouillard,
Could you please tell me whether your data loader API supports the CIFAR-10/100 experiment? This is an experiment I found in a paper called "continual learning with hyper-networks". BTW, I found a module named Fellowship that seems to provide such a combination capability, but I am not sure. May I ask you to verify it?

Create a 'sample_batch' function in the TaskSet class

Know if sample comes from rehearsal or not

For now, rehearsal samples are mixed with others. We need a way to differentiate them.

Check that all datasets can be instantiated

#29 (comment)

We may have to create some dummy data for large datasets such as ImageNet.

CORe50

MultiNLI scenario triggers a torchvision ValueError: pic should be 2/3 dimensional. Got 1 dimensions.

Hi, I am investigating how continual learning can be implemented in my NLP classification project. I started with MultiNLI and tried to inspect the contents of train_loader with a for loop, but an error occurred:

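from torch.utils.data import DataLoader

from continuum import InstanceIncremental
from continuum.datasets import MultiNLI
from continuum.tasks import split_train_val

dataset = MultiNLI('./data')
scenario = InstanceIncremental(dataset)
for task_id, train_taskset in enumerate(scenario):
    print(task_id, dir(train_taskset))
    train_taskset, val_taskset = split_train_val(train_taskset, val_split=0.1)
    train_loader = DataLoader(train_taskset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_taskset, batch_size=32, shuffle=True)

    # Iterating over the loader is where the error is raised
    for x, y, t in train_loader:
        print(x, y, t)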

For full code:
https://colab.research.google.com/drive/1R8rYCo-0wzoiIUTE64Pko-GtfwbGGQ9C#scrollTo=R2SHCM83-Omg

Assigning each data point a task index

listing data index of each task from the beginning (for NI or NIC scenarios)

Differentiate train clloader from test clloader

Stream-51

Type of "class_order" in ClassIncremental class should be specified

An example in the parameters description could be nice

The version installed with pip still depends on imageio

The Continuum dataset should know whether we are in "train" or "test" mode; this should not change anything for the loader

The "train" argument should be removed from the loader,
and the init of the PyTorch dataset should take no arguments.

Test "shared_label_space" in TransformationIncremental

Audio data

AudioSet

TaskSet should not be sampled through TaskSet_Object.x, as it bypasses task-specific transformations

Code samples for InMemoryDataset in the README are out of date

Should be:

InMemoryDataset(x,y)

not

InMemoryDataset(x_train,y_train, x_test, y_test)

Validation split

Natural Language Processing

Language Modeling (forgot the ref)
Natural Language Inference (PROGRESSIVE MEMORY BANKS FOR INCREMENTAL DOMAIN ADAPTATION)
Topic classification

For visualization, samples should be randomly selected

Samples were selected so as to force balance between classes, but in that case we cannot detect class imbalance within tasks.
NB: This issue should be solved in #29
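
A minimal sketch of the random selection proposed here, using only the standard library. The function pick_random and the toy label list are hypothetical and not taken from Continuum's viz code. A uniform draw tends to reflect the class proportions of the task, whereas a forced balance hides them.

```python
import random
from collections import Counter


def pick_random(labels, nb_samples, seed=0):
    """Pick `nb_samples` distinct sample indexes uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(len(labels)), nb_samples)


# Toy imbalanced task: 90 samples of class 0, 10 samples of class 1
labels = [0] * 90 + [1] * 10
indexes = pick_random(labels, nb_samples=20)

# Class counts among the selected samples
counts = Counter(labels[i] for i in indexes)
print(counts)
```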

Error with Core50

There is an error when download is set to False

Core50("/data/douillard/CORe50", download=False, train=True)

However, it works with

Core50("/data/douillard/CORe50", download=True, train=True)

DataIncremental class should be implemented

NIC scenarios

Add a method to CLLoader to modify transformations

Semantic Segmentation

Need to support:

Sequential of Michieli
Disjoint and Overlap of Cermelli

For at least the datasets VOC and ADE20k.

AttributeError: 'TaskSet' object has no attribute 'open_image'

I have tested your snippet (provided below) on two devices and I received this error:

from torch.utils.data import DataLoader

from continuum import ClassIncremental
from continuum.datasets import MNIST
from continuum.task_set import split_train_val  # module path as shown in the traceback below

clloader = ClassIncremental(
    MNIST("my/data/path", download=True),
    increment=1,
    initial_increment=5,
    train=True  # a different loader for test
)

print(f"Number of classes: {clloader.nb_classes}.")
print(f"Number of tasks: {clloader.nb_tasks}.")

for task_id, train_dataset in enumerate(clloader):
    train_dataset, val_dataset = split_train_val(train_dataset)
    train_loader = DataLoader(train_dataset)
    val_loader = DataLoader(val_dataset)

and the error is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-c5312f5c2b3b> in <module>()
      1 for task_id, train_dataset in enumerate(clloader):
----> 2     train_dataset, val_dataset = split_train_val(train_dataset)
      3     train_loader = DataLoader(train_dataset)
      4     val_loader = DataLoader(val_dataset)

/usr/local/lib/python3.6/dist-packages/continuum/task_set.py in split_train_val(dataset, val_split)
    118 
    119     x, y = dataset.x, dataset.y
--> 120     train_dataset = TaskSet(x[train_indexes], y[train_indexes], dataset.trsf, dataset.open_image)
    121     val_dataset = TaskSet(x[val_indexes], y[val_indexes], dataset.trsf, dataset.open_image)
    122 

AttributeError: 'TaskSet' object has no attribute 'open_image'

May I ask what the problem is?

Question Regarding Permuted MNIST

Hi,

May I ask you to provide me a snippet for loading data related to the permuted MNIST scenario? I assumed that the following one does the same. Is it right?

from continuum.datasets import MNIST
from continuum import InstanceIncremental
clloader = InstanceIncremental(MNIST(args.data, download=True),
                                nb_tasks=10,
                                train=True)
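
For reference, the usual Permuted MNIST setup applies one fixed pixel permutation per task to every image, and as far as the snippet above shows, no permutation is requested there. Here is a minimal standard-library sketch of that permutation step. The helpers make_permutation and permute are hypothetical and not Continuum's API, and keeping task 0 unpermuted is an assumption.

```python
import random


def make_permutation(nb_pixels, seed):
    """Return a fixed, reproducible pixel permutation for one task."""
    rng = random.Random(seed)
    perm = list(range(nb_pixels))
    rng.shuffle(perm)
    return perm


def permute(flat_image, perm):
    """Reorder the pixels of a flattened image according to `perm`."""
    return [flat_image[i] for i in perm]


nb_pixels = 28 * 28
image = list(range(nb_pixels))  # stand-in for one flattened MNIST image

# Assumption: task 0 keeps the original pixel order, later tasks are permuted
perms = [list(range(nb_pixels))]
perms += [make_permutation(nb_pixels, seed=task_id) for task_id in range(1, 5)]

# The same image as it would appear in each of the 5 tasks
tasks = [permute(image, perm) for perm in perms]
```

Every image of a given task goes through the same permutation, so the labels stay unchanged while the input distribution differs from task to task.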

Speed test for transformed

Tests for transformed scenarios are currently quite slow because they apply transformations to the whole MNIST dataset.

We can speed it up by either:

using a small subset of MNIST
using synthetic data
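
The second option could look like the sketch below: a tiny synthetic dataset with MNIST-like shapes, built with the standard library only. The function make_synthetic_dataset and its default sizes are hypothetical choices, not existing test fixtures.

```python
import random


def make_synthetic_dataset(nb_samples=20, nb_classes=10, side=28, seed=0):
    """Build small random grayscale images and labels covering every class."""
    rng = random.Random(seed)
    x = [
        [[rng.randrange(256) for _ in range(side)] for _ in range(side)]
        for _ in range(nb_samples)
    ]
    # Cycle through the classes so that each one appears at least once
    y = [i % nb_classes for i in range(nb_samples)]
    return x, y


x, y = make_synthetic_dataset()
print(len(x), len(x[0]), len(x[0][0]), sorted(set(y)))
```

Once converted to numpy arrays, data like this could presumably be wrapped with InMemoryDataset(x, y), which is mentioned in another issue on this page.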

NI scenario: test for presence of all classes in all tasks

Reinforcement Learning

With the current version, the batch size can only be 1; it would be nice to be able to change it

imageio isn't listed as a dependency

Hey there, thanks for your work!

imageio is used in only one place, viz.py, and isn't listed as a dependency in setup.py. Therefore installing continuum via pip install continuum works fine, but importing it raises an error.
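
Besides listing imageio in setup.py, one common pattern is to import an optional package only when it is actually needed, so that importing the library does not fail. A minimal sketch, assuming a hypothetical helper named require (not part of Continuum):

```python
import importlib


def require(name, purpose):
    """Import `name` lazily and fail with a clear message if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as error:
        raise ImportError(
            f"'{name}' is required for {purpose}; "
            f"install it with `pip install {name}`."
        ) from error


# An installed module is returned as usual
json_module = require("json", "the demo")

# A missing module raises an explicit error only when it is requested
try:
    require("module_that_does_not_exist_xyz", "the demo")
except ImportError as error:
    message = str(error)
    print(message)
```

A plotting function could then call require("imageio", "plotting") in its body instead of importing imageio at module level.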

Stream-51 dataset and others

Are you planning on adding other datasets?

PyPi package name

Continuum is already taken on PyPi, by a CI project from 2014.

We need a name for the PyPi, alternative choices:

lifelong
deepcontinuum
torchcontinuum
...

EDIT: the owner of Continuum should give us soon the name.

Custom class order

Support of New Instances (NI)

Need #8

What about renaming _BaseCLLoader -> _BaseScenarioLoader?

I think it would make it clearer

Test "get_task_transformation"

For good visualization of data points, transformations need to be applied

Code example not up to date

https://continuum.readthedocs.io/en/latest/#quick-example

metric documentation should at least list all the metrics proposed

Your pip installation command installs a wrong version with an error!

Traceback (most recent call last):
File "main_cl_cifar10_100_scl_prototype_inference_end_to_end.py", line 23, in
from continuum.task_set import split_train_val
File "/home/mohammad/.local/lib/python3.6/site-packages/continuum/task_set.py", line 8, in
from continuum.viz import plot
ImportError: cannot import name 'plot'

from torch.utils.data import DataLoader

from continuum import ClassIncremental, split_train_val
from continuum.datasets import MNIST

clloader = ClassIncremental(
    MNIST("my/data/path", download=True),
    increment=1,
    initial_increment=5,
    train=True  # a different loader for test
)

print(f"Number of classes: {clloader.nb_classes}.")
print(f"Number of tasks: {clloader.nb_tasks}.")

for task_id, train_dataset in enumerate(clloader):
    train_dataset, val_dataset = split_train_val(train_dataset, val_split=0.1)
    train_loader = DataLoader(train_dataset)
    val_loader = DataLoader(val_dataset)
    for x, y in train_loader:
        print("Never gets here?")
        exit()
    # Do your cool stuff here

Number of classes: 10.
Number of tasks: 6.
Traceback (most recent call last):
  File "foo.py", line 20, in <module>
    for x, y in train_loader:
  File "/home/fabrice/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/fabrice/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/fabrice/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/fabrice/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/fabrice/Source/SSCL/utils/continuum/continuum/task_set.py", line 96, in __getitem__
    t = self.t[index]
TypeError: 'Compose' object does not support indexing