imbalanced-dataset-sampler's Introduction

imbalanced-dataset-sampler's People

Contributors

alec-schneider, borda, davinnovation, frankfundel, hwany-j, kousu, leetaehoon97, pre-commit-ci[bot], t-schanz, ufoym, wajihullahbaig, zimonitrome

imbalanced-dataset-sampler's Issues

AttributeError: 'ConcatDataset' object has no attribute 'img_norm_cfg'

When I run test.py, I get the following error:
File "tools/test.py", line 211, in <module>
main()
File "tools/test.py", line 181, in main
outputs = single_gpu_test(model, data_loader, args.show, args.log_dir)
File "tools/test.py", line 39, in single_gpu_test
model.module.show_result(data, result, dataset.img_norm_cfg, dataset='DOTA1_5')
AttributeError: 'ConcatDataset' object has no attribute 'img_norm_cfg'

How can I solve this problem?

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

Hi, I am using BERT for multi-label classification.
The dataset is imbalanced, so I use ImbalancedDatasetSampler as the sampler.

The training data has been tokenized.
Each sample has ids, a mask, and a label:

(tensor([ 101, 112, 872, 4761, 6887, 1914, 840, 1914, 7353, 6818, 3300, 784,
720, 1408, 136, 1506, 1506, 3300, 4788, 2357, 5456, 119, 119, 119,
4696, 4638, 741, 677, 1091, 4638, 872, 1420, 1521, 119, 119, 119,
872, 2157, 6929, 1779, 4788, 2357, 3221, 686, 4518, 677, 3297, 1920,
4638, 4788, 2357, 117, 1506, 1506, 117, 7745, 872, 4638, 1568, 2124,
3221, 6432, 2225, 1217, 2861, 4478, 4105, 2357, 3221, 686, 4518, 677,
3297, 1920, 4638, 4105, 2357, 1568, 119, 119, 119, 1506, 1506, 1506,
112, 112, 4268, 4268, 117, 1961, 4638, 1928, 1355, 5456, 106, 2769,
812, 1920, 2812, 7370, 3488, 2094, 6963, 6206, 5436, 677, 3341, 2769,
4692, 1168, 3312, 1928, 5361, 7027, 3300, 1928, 1355, 119, 119, 119,
671, 2137, 3221, 8584, 809, 1184, 1931, 1168, 4638, 117, 872, 6432,
3221, 679, 3221, 136, 138, 4495, 4567, 140, 102, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]),
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
tensor(0))

With the built-in samplers imported as follows,

from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

the DataLoader works fine:

batch_size=3
dataloader_train_o = DataLoader(
    dataset_train,
    sampler=RandomSampler(dataset_train),
    batch_size=batch_size,
    # **kwargs
)

However, when I replace the sampler with ImbalancedDatasetSampler:

batch_size=3
dataloader_train_o = DataLoader(
    dataset_train,
    sampler=ImbalancedDatasetSampler(dataset_train),
    batch_size=batch_size,
    # **kwargs
)

I get the error printed below:


ValueError Traceback (most recent call last)
File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3892, in DataFrame._ensure_valid_index(self, value)
3891 try:
-> 3892 value = Series(value)
3893 except (ValueError, NotImplementedError, TypeError) as err:

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\series.py:451, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
450 else:
--> 451 data = sanitize_array(data, index, dtype, copy)
453 manager = get_option("mode.data_manager")

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\construction.py:601, in sanitize_array(data, index, dtype, copy, raise_cast_failure, allow_2d)
599 subarr = maybe_infer_to_datetimelike(subarr)
--> 601 subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
603 if isinstance(subarr, np.ndarray):
604 # at this point we should have dtype be None or subarr.dtype == dtype

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\construction.py:652, in _sanitize_ndim(result, data, dtype, index, allow_2d)
651 return result
--> 652 raise ValueError("Data must be 1-dimensional")
653 if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
654 # i.e. PandasDtype("O")

ValueError: Data must be 1-dimensional

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
Input In [49], in <cell line: 5>()
2 from torchsampler import ImbalancedDatasetSampler
4 batch_size=3
5 dataloader_train_o = DataLoader(
6 dataset_train,
----> 7 sampler=ImbalancedDatasetSampler(dataset_train),
8 batch_size=batch_size,
9 # **kwargs
10 )
12 dataloader_validation_o = DataLoader(
13 dataset_val,
14 sampler=SequentialSampler(dataset_val),
15 batch_size=batch_size,
16 # **kwargs
17 )

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torchsampler\imbalanced.py:37, in ImbalancedDatasetSampler.__init__(self, dataset, labels, indices, num_samples, callback_get_label)
35 # distribution of classes in the dataset
36 df = pd.DataFrame()
---> 37 df["label"] = self._get_labels(dataset) if labels is None else labels
38 df.index = self.indices
39 df = df.sort_index()

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3655, in DataFrame.__setitem__(self, key, value)
3652 self._setitem_array([key], value)
3653 else:
3654 # set column
-> 3655 self._set_item(key, value)

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3832, in DataFrame._set_item(self, key, value)
3822 def _set_item(self, key, value) -> None:
3823 """
3824 Add series to DataFrame in specified column.
3825
(...)
3830 ensure homogeneity.
3831 """
-> 3832 value = self._sanitize_column(value)
3834 if (
3835 key in self.columns
3836 and value.ndim == 1
3837 and not is_extension_array_dtype(value)
3838 ):
3839 # broadcast across multiple columns if necessary
3840 if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:4528, in DataFrame._sanitize_column(self, value)
4515 def _sanitize_column(self, value) -> ArrayLike:
4516 """
4517 Ensures new columns (which go into the BlockManager as new blocks) are
4518 always copied and converted into an array.
(...)
4526 numpy.ndarray or ExtensionArray
4527 """
-> 4528 self._ensure_valid_index(value)
4530 # We should never get here with DataFrame value
4531 if isinstance(value, Series):

File D:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\pandas\core\frame.py:3894, in DataFrame._ensure_valid_index(self, value)
3892 value = Series(value)
3893 except (ValueError, NotImplementedError, TypeError) as err:
-> 3894 raise ValueError(
3895 "Cannot set a frame with no defined index "
3896 "and a value that cannot be converted to a Series"
3897 ) from err
3899 # GH31368 preserve name of index
3900 index_copy = value.index.copy()

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
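
The inner error ("Data must be 1-dimensional") suggests that the sampler's automatic label lookup returned something other than a flat list of labels. The traceback also shows that the constructor accepts a `labels` argument and only calls `_get_labels(dataset)` when `labels is None`. A possible workaround, assuming every item of `dataset_train` is an `(input_ids, attention_mask, label)` tuple like the one printed above, is to build the flat label list yourself. The helper name `extract_labels` and the toy data are hypothetical, and plain Python lists stand in for tensors so the sketch runs without PyTorch:

```python
def extract_labels(dataset, label_position=-1):
    """Return a flat (1-D) list with one integer label per sample.

    Assumes every item is a tuple whose last element is a scalar label,
    e.g. (input_ids, attention_mask, label). int() also accepts a
    zero-dimensional torch tensor such as tensor(0).
    """
    return [int(dataset[i][label_position]) for i in range(len(dataset))]


# Toy stand-in for the tokenized dataset: (ids, mask, label) tuples.
toy_dataset = [
    ([101, 872, 102, 0], [1, 1, 1, 0], 0),
    ([101, 4638, 102, 0], [1, 1, 1, 0], 0),
    ([101, 3221, 102, 0], [1, 1, 1, 0], 1),
]

labels = extract_labels(toy_dataset)
print(labels)  # [0, 0, 1]
```

With the real data this list would be passed as `ImbalancedDatasetSampler(dataset_train, labels=extract_labels(dataset_train))`, provided the installed version has the `labels` parameter shown in the traceback. Indexing every item can be slow on large datasets; if the labels already live in a single tensor, converting that tensor to a list once is likely cheaper.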

ConcatDataset support

Thanks for the great work!

I tried to combine two datasets using "dataset = dataset1 + dataset2", and it gives me this error:
AttributeError: 'ConcatDataset' object has no attribute 'get_labels'

Is there any workaround?
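
One possible workaround, sketched under the assumption that each of the combined datasets can report its own labels through a `get_labels()` method, is to chain the per-dataset labels in concatenation order and hand the result to the sampler. `concat_labels` and `ToyDataset` are hypothetical names used only for illustration:

```python
from itertools import chain


def concat_labels(datasets):
    """Chain the labels of several datasets in concatenation order.

    Assumes each dataset exposes get_labels() returning one label per
    sample, in the same order as its indices.
    """
    return list(chain.from_iterable(ds.get_labels() for ds in datasets))


class ToyDataset:
    """Hypothetical stand-in for a labelled map-style dataset."""

    def __init__(self, labels):
        self._labels = list(labels)

    def __len__(self):
        return len(self._labels)

    def get_labels(self):
        return self._labels


dataset1 = ToyDataset([0, 0, 0, 1])
dataset2 = ToyDataset([1, 2])
all_labels = concat_labels([dataset1, dataset2])
print(all_labels)  # [0, 0, 0, 1, 1, 2]
```

A `ConcatDataset` keeps its parts in its `datasets` attribute, so with a sampler version that accepts `callback_get_label` (called with the dataset, as in the sampler code posted in another issue here), something like `callback_get_label=lambda ds: concat_labels(ds.datasets)` may be enough.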

Error when calling ImbalancedDatasetSampler

The following error occurred when creating the DataLoader. I am working on Google Colab.

Code
train_dataset = DataLoader(train_dataset, sampler=ImbalancedDatasetSampler(train_dataset), batch_size = BATCH_SIZE, drop_last=True )

Error

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 5, in
train_dataset = DataLoader(train_dataset, sampler=ImbalancedDatasetSampler(train_dataset),batch_size = BATCH_SIZE,
File "/content/drive/My Drive/Research_Shanto/Shanto/Packages/imbalanced-dataset-sampler-master/torchsampler/imbalanced.py", line 32, in __init__
label = self._get_label(dataset, idx)
File "/content/drive/My Drive/Research_Shanto/Shanto/Packages/imbalanced-dataset-sampler-master/torchsampler/imbalanced.py", line 53, in _get_label
raise NotImplementedError
NotImplementedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 1823, in showtraceback
stb = value.render_traceback()
AttributeError: 'NotImplementedError' object has no attribute 'render_traceback'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py", line 1132, in get_records
return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py", line 313, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py", line 358, in _fixed_getinnerframes
records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
File "/usr/lib/python3.6/inspect.py", line 1490, in getinnerframes
frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
File "/usr/lib/python3.6/inspect.py", line 1448, in getframeinfo
filename = getsourcefile(frame) or getfile(frame)
File "/usr/lib/python3.6/inspect.py", line 696, in getsourcefile
if getattr(getmodule(object, filename), '__loader__', None) is not None:
File "/usr/lib/python3.6/inspect.py", line 725, in getmodule
file = getabsfile(object, _filename)
File "/usr/lib/python3.6/inspect.py", line 709, in getabsfile
return os.path.normcase(os.path.abspath(_filename))
File "/usr/lib/python3.6/posixpath.py", line 383, in abspath
cwd = os.getcwd()
OSError: [Errno 107] Transport endpoint is not connected

I really hope the community sees this issue and offers a solution.

Could you explain your way of sampling in detail?

Thanks so much for your implementation. But I have several questions:

  1. In the picture below, it seems that the class with fewer samples is sampled repeatedly, while the class with more samples is sub-sampled. So what is the difference between your method and the traditional method?

image

  2. In each epoch, is each image sampled only once? I ask because you mentioned that your method avoids creating a new balanced dataset.
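
One way to read the scheme, based on the sampler code quoted in other issues here (per-sample weights fed to `torch.multinomial` with `replacement=True`), is that every sample gets the weight 1 / (size of its class), so each class ends up with the same total probability mass. A minimal sketch of that arithmetic with made-up labels, using only the standard library:

```python
from collections import Counter

# Toy label list: 6 samples of class "a", 2 of class "b".
labels = ["a"] * 6 + ["b"] * 2

# Weight of a sample = 1 / (number of samples in its class).
class_counts = Counter(labels)
weights = [1.0 / class_counts[label] for label in labels]

# Probability of drawing a given class = its share of the total weight.
total = sum(weights)
class_probability = {
    cls: sum(w for w, label in zip(weights, labels) if label == cls) / total
    for cls in class_counts
}
print(class_probability)
```

Both classes come out at (approximately) one half. Because draws are made with replacement, a minority sample can appear several times in one pass while some majority samples do not appear at all; no new balanced dataset is stored, only the index stream changes.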

'MyDataset' object has no attribute 'get_labels'

When I try to use my own Dataset class, I get the error 'MyDataset' object has no attribute 'get_labels' and cannot proceed.

The Dataset and DataLoader code is as follows, and I see nothing unusual in it.
It processes the image data and label data in .npz format.

class MyDataset(data.Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        label = self.labels[index]

        if self.transform is not None:
            image = self.transform(image=image)["image"]

        return image, label

train_dataset = MyDataset(train_imgs, train_labels, transform=transform)
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               sampler=ImbalancedDatasetSampler(train_dataset),
                                               batch_size= batch_size,
                                               shuffle=True,
                                               num_workers=2)

Is there something wrong with the code?
I don't think it's a typo.

How can I fix it so that it works correctly?
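
The sampler code posted in another issue here falls back to `dataset.get_labels()` for a generic `torch.utils.data.Dataset`, so a likely fix is to add that method to the custom class. The sketch below is an assumption about what the sampler expects (one label per sample, in index order). The base class is omitted so the snippet runs without PyTorch, and the toy data stands in for the .npz arrays:

```python
class MyDataset:  # in the real code: class MyDataset(data.Dataset)
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        label = self.labels[index]
        if self.transform is not None:
            image = self.transform(image=image)["image"]
        return image, label

    def get_labels(self):
        # One label per sample, in index order; this is what the
        # sampler needs to compute class frequencies.
        return list(self.labels)


# Toy data standing in for the .npz arrays.
train_dataset = MyDataset(images=["img0", "img1", "img2"], labels=[0, 0, 1])
print(train_dataset.get_labels())  # [0, 0, 1]
```

Separately, `DataLoader` treats `sampler` and `shuffle=True` as mutually exclusive, so `shuffle=True` would need to be dropped once the sampler is accepted.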

Alternative for IterableDatasets

Hello.

I'm opening this issue to make users aware that I've just released a small package for sampling from IterableDatasets. It's thus complementary to this package which only works with batch datasets.

ERROR label_to_count not callable

Hi !

I noticed that there are some bugs introduced by the latest commit, ad50e22.

Steps to reproduce

`
import torch
import torchvision
from torchsampler import ImbalancedDatasetSampler

# `transform` and `args` are defined elsewhere in the reporter's script.
mnist = torchvision.datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader_b = torch.utils.data.DataLoader(
    mnist,
    sampler=ImbalancedDatasetSampler(mnist),
    batch_size=args.batch_size,
)
`

`
TypeError Traceback (most recent call last)
in
1 train_loader= torch.utils.data.DataLoader(
2 mnist,
----> 3 sampler=ImbalancedDatasetSampler(mnist),
4 batch_size=args.batch_size,
5 )

~/.local/lib/python3.8/site-packages/torchsampler/imbalanced.py in __init__(self, dataset, indices, num_samples, callback_get_label)
34 label_to_count = df["label"].value_counts()
35
---> 36 weights = 1.0 / label_to_count(df["label"])
37
38 self.weights = torch.DoubleTensor(weights)

TypeError: 'Series' object is not callable
`

I think label_to_count is now a pandas Series and therefore can't be called.

Any idea how to fix it? (I will give it a try soon.)

Too much time cost

Training is many times slower than before when I use this sampler.

(self.indices[i] for i in torch.multinomial(self.weights, self.num_samples, replacement=True))
It seems that this expression takes too much time!

Does anyone have a solution?

Is it possible to use this sampler with a segmentation model?

I have an imbalanced dataset consisting of 5 classes of images paired with pixel-wise annotated masks.
My 'mask' is an array that has the same pixel size as the image and class number assigned in the corresponding pixel.
But it seems the imbalanced-dataset-sampler is designed for "labels" rather than mask arrays.
Can I modify my dataset function to fit this sampler?
(my mask array only contains 0 and a specific class number at a time)

class Dataset(BaseDataset):

    def __init__(self,
                 image_df,
                 mask_list,
                 preprocessing = None,
                ):
        self.images_dir = image_df.all_path
        self.masks_dir = mask_list
        self.class_values = list(range(len(CLASSES)))
        self.preprocessing = preprocessing

    def __getitem__(self, i):
        image = cv2.imread(self.images_dir[i])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(self.masks_dir[i], 0)

        masks = [(mask == v) for v in self.class_values]
        mask = np.stack(masks, axis = -1).astype('float')

        if self.preprocessing:
            sample = self.preprocessing(image = image, mask = mask)
            image, mask = sample['image'], sample['mask']

        return image, mask

    def __len__(self):
        return len(self.images_dir)
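
Since each mask is described as containing only 0 and one class number, an image-level label could be derived as the largest pixel value of the raw mask (before the one-hot stacking in `__getitem__`). The following is a minimal sketch of that idea, not a tested integration; `mask_to_label` is a hypothetical helper and nested lists stand in for the arrays returned by `cv2.imread(path, 0)`:

```python
def mask_to_label(mask):
    """Return the single non-background class value found in a 2-D mask.

    Assumes the mask holds only 0 (background) and one class number,
    so the maximum pixel value identifies the class.
    """
    return int(max(max(row) for row in mask))


# Two toy 3x3 masks: one annotated with class 2, one with class 4.
masks = [
    [[0, 0, 0], [0, 2, 2], [0, 2, 0]],
    [[4, 4, 0], [0, 0, 0], [0, 0, 0]],
]
labels = [mask_to_label(m) for m in masks]
print(labels)  # [2, 4]
```

A `get_labels()` method on the dataset could return such a list, computed once up front. With NumPy arrays, `int(mask.max())` would be the natural equivalent of the nested `max`.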

Should apply dataset.target_transform

In the sampler, when getting the labels of the dataset and counting the frequencies of each, the dataset target transform should be applied. Target transforms sometimes change the label (e.g. by grouping multiple of the original classes) and might affect the frequency counts for each label.
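
To illustrate the effect, here is a small sketch in which a transform merges two of the original classes. The frequency counts, and therefore any inverse-frequency weights, differ before and after the transform. `transformed_labels` and `merge_classes` are hypothetical names, not part of the package:

```python
from collections import Counter


def transformed_labels(raw_labels, target_transform=None):
    """Apply the target transform (if any) before counting classes."""
    if target_transform is None:
        return list(raw_labels)
    return [target_transform(y) for y in raw_labels]


# Hypothetical transform that merges original classes 1 and 2 into class 1.
def merge_classes(y):
    return 1 if y in (1, 2) else y


raw = [0, 0, 0, 0, 1, 2]
print(Counter(raw))                                     # counts per original class
print(Counter(transformed_labels(raw, merge_classes)))  # counts per merged class
```

Until the sampler does this itself, a function like this could be supplied through `callback_get_label`, assuming the installed version supports that argument.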

better description of "epoch"

Thanks for the good implementation! I was a bit confused about the description so I'd like to comment.

Then in each epoch, the loader will sample the entire dataset and weigh your samples inversely to your class appearing probability.

But this is no longer what we normally call an epoch, right? We do not iterate over all data points in an epoch, because some data points that belong to majority classes are not used. Technically, for each "epoch" as defined by PyTorch, the loader draws as many data points as the original dataset contains, and each one is picked with a probability inversely proportional to its class frequency.
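
A small simulation makes the point concrete. It is only a sketch: `random.choices` stands in for weighted sampling with replacement, and the 900/100 class split is made up:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Toy dataset: 900 samples of class 0 and 100 samples of class 1.
labels = [0] * 900 + [1] * 100
counts = Counter(labels)
weights = [1.0 / counts[y] for y in labels]

# One "epoch": len(dataset) draws, with replacement.
n = len(labels)
drawn = random.choices(range(n), weights=weights, k=n)

unique_fraction = len(set(drawn)) / n
class_share = Counter(labels[i] for i in drawn)
print(f"distinct samples seen: {unique_fraction:.0%}")
print(f"draws per class: {dict(class_share)}")
```

The draws split roughly evenly between the two classes, while the share of distinct samples seen stays well below 100%, which is why one pass of the loader is not an epoch in the usual sense.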

pip install error

I followed your instructions to install the package:
git clone https://github.com/ufoym/imbalanced-dataset-sampler.git
cd imbalanced-dataset-sampler
python setup.py install
pip install .

But when I run "pip install .", I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/miniconda3/envs/pytorch/lib/python3.7/site-packages/torchsampler-0.1-py3.7.egg'

How can I resolve it?

Unable to use the package in google colab

I tried importing the package in Colab, but the error prompts me to install torchsampler. I then ran !pip install torchsampler and the following error popped up:

ERROR: Could not find a version that satisfies the requirement torchsampler (from versions: none)
ERROR: No matching distribution found for torchsampler

I would be really grateful for a quick fix.

Implementation for Pytorch-geometric dataset

I have added a few lines that allow the sampler to work with a PyTorch Geometric dataset. Since PyTorch Geometric data is stored as a list before being loaded by a PyTorch Geometric DataLoader, the modification is pretty simple.
Hope this could be helpful to someone.

Best,

Anna

`
from typing import Callable

import pandas as pd
import torch
import torch.utils.data
import torchvision


class ImbalancedDatasetSampler(torch.utils.data.sampler.Sampler):
    """Samples elements randomly from a given list of indices for imbalanced dataset

    Arguments:
        indices: a list of indices
        num_samples: number of samples to draw
        callback_get_label: a callback-like function which takes two arguments - dataset and index
    """

    def __init__(self, dataset, indices: list = None, num_samples: int = None, callback_get_label: Callable = None):
        # if indices is not provided, all elements in the dataset will be considered
        self.indices = list(range(len(dataset))) if indices is None else indices

        # define custom callback
        self.callback_get_label = callback_get_label

        # if num_samples is not provided, draw `len(indices)` samples in each iteration
        self.num_samples = len(self.indices) if num_samples is None else num_samples

        # distribution of classes in the dataset
        df = pd.DataFrame()
        df["label"] = self._get_labels(dataset)
        df.index = self.indices
        df = df.sort_index()

        label_to_count = df["label"].value_counts()

        weights = 1.0 / label_to_count[df["label"]]

        self.weights = torch.DoubleTensor(weights.to_list())

    def _get_labels(self, dataset):
        if self.callback_get_label:
            return self.callback_get_label(dataset)
        elif isinstance(dataset, torchvision.datasets.MNIST):
            return dataset.train_labels.tolist()
        elif isinstance(dataset, torchvision.datasets.ImageFolder):
            return [x[1] for x in dataset.imgs]
        elif isinstance(dataset, torchvision.datasets.DatasetFolder):
            return dataset.samples[:][1]
        elif isinstance(dataset, torch.utils.data.Subset):
            return dataset.dataset.imgs[:][1]
        elif isinstance(dataset, torch.utils.data.Dataset):
            return dataset.get_labels()
        elif isinstance(dataset, list):
            return [dataset[i].y.item() for i in range(len(dataset))]  # here is the modification
        else:
            raise NotImplementedError

    def __iter__(self):
        return (self.indices[i] for i in torch.multinomial(self.weights, self.num_samples, replacement=True))

    def __len__(self):
        return self.num_samples
`

Doesn't work with concatenated dataset

Using ImbalancedDatasetSampler on a dataset built with ConcatDataset([datasetA, datasetB, datasetC]) raises:

AttributeError: 'ConcatDataset' object has no attribute 'get_labels'

ModuleNotFoundError: No module named 'torchsampler'

Thanks for sharing! When I run
"from torchsampler import ImbalancedDatasetSampler"

ModuleNotFoundError Traceback (most recent call last)
in
8 # os.environ["CUDA_VISIBLE_DEVICES"]="1"
9
---> 10 from torchsampler import ImbalancedDatasetSampler

ModuleNotFoundError: No module named 'torchsampler'

I get this error. How can I solve this problem?

imbalanced data set not reading in correctly to torch

Hello,

I'm getting the following error trying to use the Imbalanced data sampler.


ValueError Traceback (most recent call last)
in
----> 1 train_loader = DataLoader(roof, batch_size=10, sampler = ImbalancedDatasetSampler)

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context)
217 if batch_size is not None and batch_sampler is None:
218 # auto_collation without custom batch_sampler
--> 219 batch_sampler = BatchSampler(sampler, batch_size, drop_last)
220
221 self.batch_size = batch_size

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\sampler.py in __init__(self, sampler, batch_size, drop_last)
184 raise ValueError("sampler should be an instance of "
185 "torch.utils.data.Sampler, but got sampler={}"
--> 186 .format(sampler))
187 if not isinstance(batch_size, _int_classes) or isinstance(batch_size, bool) or
188 batch_size <= 0:

ValueError: sampler should be an instance of torch.utils.data.Sampler, but got sampler=<class 'sampler.ImbalancedDatasetSampler'>

Was I supposed to save the sampler.py file in a special location? I saved it in my directory and it imports.
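
The message ends with `sampler=<class 'sampler.ImbalancedDatasetSampler'>`, which suggests that the class itself was passed to the DataLoader rather than an instance built from the dataset. The sketch below reproduces the instance check from the traceback with stand-in classes (not the real PyTorch ones) so that it runs on its own:

```python
class Sampler:
    """Stand-in for torch.utils.data.Sampler."""


class ImbalancedDatasetSampler(Sampler):
    """Stand-in for the real sampler; only stores the dataset."""

    def __init__(self, dataset):
        self.dataset = dataset


roof = ["sample0", "sample1"]  # toy dataset

# Passing the class: it is not an instance of Sampler, so the check fails.
print(isinstance(ImbalancedDatasetSampler, Sampler))        # False
# Passing an instance built from the dataset: the check passes.
print(isinstance(ImbalancedDatasetSampler(roof), Sampler))  # True
```

If that reading is right, the file location is not the problem, and the call would become `DataLoader(roof, batch_size=10, sampler=ImbalancedDatasetSampler(roof))`.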

Difference with WeightedRandomSampler

What is the difference between this sampler and WeightedRandomSampler in PyTorch?
Is the only difference that WeightedRandomSampler takes the weights and num_samples as input, whereas here we pass the dataset?

Thanks

Subset sampling entire dataset

Hi everyone,

I have a question concerning the use of subsets with this sampler. According to the code, it chooses samples from all entries in the parent dataset:

        elif isinstance(dataset, torch.utils.data.Subset):
            return dataset.dataset.imgs[:][1]

Shouldn't it only sample from the chosen subset, i.e. the entries listed in dataset.indices? When I run _get_labels as is, I get a length mismatch. Is my use of Subset unusual, or should this be changed? Returning only the labels corresponding to dataset.indices solved the problem for me:

        elif isinstance(dataset, torch.utils.data.Subset):
            return [dataset.dataset.imgs[ind][1] for ind in dataset.indices]
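
The same idea can be written independently of `.imgs`, for parent datasets that keep their labels elsewhere. This is a sketch with a hypothetical helper name and toy values; the key point is that the result has one label per subset entry, in the subset's own order:

```python
def subset_labels(parent_labels, subset_indices):
    """Labels of the subset, in the subset's own order."""
    return [parent_labels[i] for i in subset_indices]


parent_labels = [0, 0, 1, 1, 2, 2]   # one label per sample in the parent
subset_indices = [5, 0, 3]           # what Subset stores in .indices
print(subset_labels(parent_labels, subset_indices))  # [2, 0, 1]
```

The returned list has the same length as the subset, which avoids the length mismatch described above.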

NotImplemented Error while running ImbalancedDatasetSampler

I followed the steps in the README exactly, yet I am getting a NotImplementedError, with no explanation of the error.

Here's my code:
`
import numpy as np
import torch
from torchvision import transforms
from torchsampler import ImbalancedDatasetSampler

# melanoma_dataset and test_sampler are defined elsewhere (not shown in this report)
batch_size = 128
val_split = 0.2
shuffle_dataset = True
random_seed = 42

dataset_size = len(melanoma_dataset)
indices = list(range(dataset_size))
split = int(np.floor(val_split * dataset_size))
if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)
train_indices, test_indices = indices[split:], indices[:split]

train_loader = torch.utils.data.DataLoader(melanoma_dataset, batch_size=batch_size, sampler=ImbalancedDatasetSampler(melanoma_dataset))
test_loader = torch.utils.data.DataLoader(melanoma_dataset, batch_size=batch_size, sampler=test_sampler)
`
