mic-dkfz / batchgenerators

A framework for data augmentation for 2D and 3D image classification and segmentation

License: Apache License 2.0

batchgenerators's Issues

difference between 'augment_spatial' and 'augment_spatial_2'

As the title says: what are the differences between the two?
Which one should I use?
I am also confused about 'SpatialTransform' vs 'SpatialTransform_2'.
Thanks

Using 'channels_last' layout of batches

Hi!
First of all, thanks for that awesome project. I feel like it will really be an improvement over my current 3D data augmentation, as it offers a nicer (and modular) way to stack multiple transforms with individual probabilities.

My current solution (CNN, data augmentation, and all) is built entirely around a 'channels_last' data layout, where every batch has a shape (b, x, y, z, c). From the code I viewed, it appears that the augmentations of batchgenerators naturally use a (b, c, x, y, z) format.
Is there an elegant way to still keep my data layout and augment it correctly? (Of course, I could transpose my batches in a DataLoader derived from SlimDataLoaderBase and wrap/derive MultiThreadedAugmenter to revert the transposition, but this seems quite heavy-handed to me.)

Best regards,
Carlchristian
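
One lightweight option, sketched here under the assumption that the augmentations expect (b, c, x, y, z) arrays, is to move the channel axis right before and right after augmentation. The helper names below (`to_channels_first`, `to_channels_last`, `augment_channels_last`) are hypothetical and not part of batchgenerators; `augment_fn` stands in for whatever channels-first augmentation is applied.

```python
import numpy as np


def to_channels_first(batch):
    """(b, x, y, z, c) -> (b, c, x, y, z)."""
    return np.moveaxis(batch, -1, 1)


def to_channels_last(batch):
    """(b, c, x, y, z) -> (b, x, y, z, c)."""
    return np.moveaxis(batch, 1, -1)


def augment_channels_last(batch, augment_fn):
    """Run a channels-first augmentation on a channels-last batch.

    augment_fn is a hypothetical callable that takes and returns an
    array laid out as (b, c, x, y, z).
    """
    augmented = augment_fn(to_channels_first(batch))
    # moveaxis returns a view; make the result contiguous for the network
    return np.ascontiguousarray(to_channels_last(augmented))


batch = np.zeros((2, 8, 9, 10, 3), dtype=np.float32)
result = augment_channels_last(batch, lambda x: x + 1.0)
```

Because `np.moveaxis` only creates a view, the cost is essentially the single copy made by `np.ascontiguousarray` at the end.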

How to implement a customized enhancement function?

In some situations we need to implement our own enhancement algorithm, such as top-hat contrast enhancement. How can we implement it and plug it into MultiThreadedAugmenter?
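
A minimal sketch of what such a custom transform could look like, assuming transforms are callables that receive the batch dictionary as keyword arguments and return it (the `__call__(**data_dict)` pattern discussed elsewhere on this page). The class name and the percentile-based contrast stretch are illustrative stand-ins, not library code and not a top-hat filter; in the real library one would presumably derive from its transform base class and pass the instance to the augmenter alongside the other transforms.

```python
import numpy as np


class ContrastStretchTransform:
    """Hypothetical custom transform: per-sample, per-channel contrast stretch.

    Assumes data_dict['data'] has layout (b, c, x, y[, z]).
    """

    def __init__(self, lower_pct=1.0, upper_pct=99.0, data_key="data"):
        self.lower_pct = lower_pct
        self.upper_pct = upper_pct
        self.data_key = data_key

    def __call__(self, **data_dict):
        # work on a float64 copy so the input batch is left untouched
        data = data_dict[self.data_key].astype(np.float64)
        for b in range(data.shape[0]):
            for c in range(data.shape[1]):
                lo, hi = np.percentile(data[b, c], [self.lower_pct, self.upper_pct])
                if hi > lo:
                    # clip to the percentile window, then rescale to [0, 1]
                    data[b, c] = (np.clip(data[b, c], lo, hi) - lo) / (hi - lo)
        data_dict[self.data_key] = data.astype(np.float32)
        return data_dict


rng = np.random.RandomState(0)
batch = {"data": rng.rand(2, 1, 16, 16) * 100.0, "seg": np.zeros((2, 1, 16, 16))}
out = ContrastStretchTransform()(**batch)
```

Replacing the body of the inner loop with the desired enhancement is all that changes for a different algorithm; keys other than `data` pass through unmodified.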

Why the effect is not obvious?

@FabianIsensee Hi, thanks for your excellent work. I am trying to use data augmentation to train a network for brain tumor segmentation. I have tried many times to use spatial augmentations with my network (3D U-Net) during training, but in terms of Dice the effect on the test data is not obvious. The following code is applied only to the training dataset; the validation dataset is not augmented. The shapes of data and truth are (4,128,128,128) and (1,128,128,128). Can you give me some advice on how to improve it?

def augment_spatial_data(data, truth, affine,batch_size):
    data_list = list()
    for data_index in range(data.shape[0]):
        image = get_image(data[data_index], affine)
        data_list.append(image.get_data())
    data = np.asarray(data_list)
    data = np.tile(data[None], (batch_size, 1, 1, 1, 1)).astype(np.float32)
    truth_image = get_image(truth, affine).get_data()
    truth_image = np.tile(truth_image[None,None], (batch_size, 1, 1, 1, 1)).astype(np.float32)
    data,truth=augment_spatial(data,truth_image,patch_size=truth.shape,
                       patch_center_dist_from_border=np.array(truth.shape)//2)
    data = np.tile(data[0], (1, 1, 1, 1))
    truth = np.tile(truth[0, 0], (1, 1, 1))
    return data,truth

3D image and label augmentation tutorial

Thanks for the great work.
Would it be possible for you to provide a tutorial for 3D medical image and label augmentation?
Eg: BraTS dataset.

Looking forward to your reply.

Best regards,
Edward

Example.ipynb is not working!

Hi,

I have an issue with the .next attribute.

First, the DataLoader class in Example.ipynb is not working for me. Running:
batchgen = DataLoader(data.camera(), 4, None, False)
batch = batchgen.next()
returns this error:
AttributeError: 'DataLoader' object has no attribute 'next'

It can be fixed by renaming generate_train_batch in the DataLoader to next, but I am not sure this is the right way to do it. Later on,
multithreaded_generator = MultiThreadedAugmenter(batchgen, all_transforms, 4, 2, seeds=None)

also has no attribute .next, and when I call multithreaded_generator.generator.next(), which works, the transformations are not applied.

I don't know where the error comes from.

Best regards

Paul
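
For context, a sketch of the Python 3 iterator protocol that is likely behind this error: `obj.next()` is the Python 2 spelling, while Python 3 uses the built-in `next(obj)`, which calls `obj.__next__()`. The `TinyLoader` class below is a hypothetical stand-in, not the library's DataLoader; whether the installed batchgenerators classes define `__next__` is an assumption to verify.

```python
class TinyLoader:
    """Hypothetical stand-in for a data loader; not the library class."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def generate_train_batch(self):
        # always return the first batch_size items, for illustration only
        return {"data": self.data[: self.batch_size]}

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator protocol: next(obj) dispatches here
        return self.generate_train_batch()


batchgen = TinyLoader(list(range(10)), 4)
batch = next(batchgen)                    # Python 3 spelling
has_py2_next = hasattr(batchgen, "next")  # no Python 2 style method exists
```

With this pattern, calling `next(...)` on the outermost object, rather than reaching into an inner generator, keeps any wrapping logic such as transforms in the loop.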

Need help about the MultiThreadedAugmenter

Hi,
I am trying to emulate the brats2017DataLoader3D to build my own Kits2019DataLoader3D, but I encountered the following problem. My program stops at batch = next(tr_gen) when I use the MultiThreadedAugmenter (I tried the SingleThreadedAugmenter as well, which works fine on my laptop). I also tried to debug it, but nothing happened. When I looked at the processes on my computer with htop, I found many processes created by the program that were still there even after I had terminated it. Therefore, I was wondering whether there is a problem with my code or with the MultiThreadedAugmenter.

`if __name__ == "__main__":
preprocessed_folders = "./npy_data"
num_of_threads_for_kits19 = 4

patients = get_list_of_patients(preprocessed_data_folder=preprocessed_folders)

train, val = get_split_deterministic(patients, fold=0, num_splits=2, random_state=12345)

patch_size = (128, 160, 160)
batch_size = 4

# dataloader = Kits2019DataLoader3D(train, batch_size, patch_size, 1)

# batch = next(dataloader)
# try:
#     from batchviewer import view_batch
#     # batch viewer can show up to 4d tensors. We can show only one sample, but that should be sufficient here
#     view_batch(batch['data'][0], batch['seg'][0])
# except ImportError:
#     view_batch = None
#     print("you can visualize batches with batchviewer. It's a nice and handy tool. You can get it here: "
#           "https://github.com/FabianIsensee/BatchViewer")

# now we have some DataLoader. Let's go and get some augmentations

# first let's collect all shapes, you will see why later
shapes = [Kits2019DataLoader3D.load_patient(
    i)[0].shape[1:] for i in patients]
max_shape = np.max(shapes, 0)
max_shape = np.max((max_shape, patch_size), 0)

# we create a new instance of DataLoader. This one will return batches of shape max_shape. Cropping/padding is
# now done by SpatialTransform. If we do it this way we avoid border artifacts (the entire brain of all cases will
# be in the batch and SpatialTransform will use zeros which is exactly what we have outside the brain)
# this is viable here but not viable if you work with different data. If you work for example with CT scans that
# can be up to 500x500x500 voxels large then you should do this differently. There, instead of using max_shape you
# should estimate what shape you need to extract so that subsequent SpatialTransform does not introduce border
# artifacts
dataloader_train = Kits2019DataLoader3D(
    train, batch_size, patch_size, num_of_threads_for_kits19)

# during training I like to run a validation from time to time to see where I am standing. This is not a correct
# validation because just like training this is patch-based but it's good enough. We don't do augmentation for the
# validation, so patch_size is used as shape target here
dataloader_validation = Kits2019DataLoader3D(
    val, batch_size, patch_size, num_of_threads_for_kits19)

tr_transforms = get_train_transform(patch_size)

# tr_gen = SingleThreadedAugmenter(dataloader_train, transform=tr_transforms)
# val_gen = SingleThreadedAugmenter(dataloader_validation, transform=None)
# finally we can create multithreaded transforms that we can actually use for training
# we don't pin memory here because this is pytorch specific.
tr_gen = MultiThreadedAugmenter(dataloader_train, tr_transforms, num_processes=num_of_threads_for_kits19,
                                num_cached_per_queue=3,
                                seeds=None, pin_memory=False)
# we need fewer processes for validation because we don't apply transformations
val_gen = MultiThreadedAugmenter(dataloader_validation, None,
                                 num_processes=max(1, num_of_threads_for_kits19//2), num_cached_per_queue=1,
                                 seeds=None,
                                 pin_memory=False)

# lets start the MultiThreadedAugmenter. This is not necessary but allows them to start generating training
# batches while other things run in the main thread
tr_gen.restart()
val_gen.restart()

# now if this was a network training you would run epochs like this (remember tr_gen and val_gen generate
# infinite examples! Don't do "for batch in tr_gen:"!!!):
'''
num_batches_per_epoch = 10
num_validation_batches_per_epoch = 3
num_epochs = 5
# let's run this to get a time on how long it takes
time_per_epoch = []
start = time()
for epoch in range(num_epochs):
    start_epoch = time()
    for b in range(num_batches_per_epoch):
        batch = next(tr_gen)
        # do network training here with this batch

    for b in range(num_validation_batches_per_epoch):
        batch = next(val_gen)
        # run validation here
    end_epoch = time()
    time_per_epoch.append(end_epoch - start_epoch)
end = time()
total_time = end - start
print("Running %d epochs took a total of %.2f seconds with time per epoch being %s" %
      (num_epochs, total_time, str(time_per_epoch)))
      '''

# if you notice that you have CPU usage issues, reduce the probability with which the spatial transformations are
# applied in get_train_transform (down to 0.1 for example). SpatialTransform is the most expensive transform
from batchviewer import view_batch
# if you wish to visualize some augmented examples, install batchviewer and uncomment this
if view_batch is not None:
    for _ in range(4):
        batch = next(tr_gen)
        view_batch(batch['data'][0], batch['seg'][0])
else:
    print("Cannot visualize batches, install batchviewer first. It's a nice and handy tool. You can get it here: "
          "https://github.com/FabianIsensee/BatchViewer")`

Best,
Yucheng

Sorry I did not find the folder 'dldabg'

There is a command 'cd dldabg', but I could not find that folder.

a question about transformations

Hello,

I get the following error when I reproduce your code (MIC-DKFZ/batchgenerators):

mirror_transform = MirrorTransform(axes=(2, 3))
File "/home/xy/hb/MIC/batchgenerators-master/batchgenerators/transforms/spatial_transforms.py", line 199, in __init__
raise ValueError("MirrorTransform now takes the axes as the spatial dimensions. What previously was "
ValueError: MirrorTransform now takes the axes as the spatial dimensions. What previously was axes=(2, 3, 4) to mirror along all spatial dimensions of a 5d tensor (b, c, x, y, z) is now axes=(0, 1, 2). Please adapt your scripts accordingly

What is the reason? Looking forward to your reply!

Sincerely,
Bing

Why is augmentation so slow after switching to other data (ISLES2018)? I thought it was a bug

I changed the dataset to ISLES2018 and resized it to (32, 128, 128). When the data is generated in batches, it gets stuck. After debugging, I found that it works when SpatialTransform is removed. The data augmentation is very slow, but it was not slow with BraTS2017. Why is that? Can anyone help me?

Slowdown with latest release

Hi,

I just wanted to let you know, that I have some issues with the latest release.

The issues are mainly, that there is a massive slowdown in our CI/CD (from about 45-50 minutes for all jobs in the matrix up to 14 hours or more).

Here is a build with the latest release (0.19.4, installed from PyPi) and here is the same build with release 0.19.3 (installed from PyPi). These builds are completely identical (besides the batchgenerators version).

Unfortunately I did not have time to pinpoint the error (yet).

Best,
Justus

Missing Requirement: sklearn

Hi DKFZ Team,

Due to the examples and the utilities, you got a new requirement which isn't reflected by the requirements.txt file: sklearn is used for data splitting.

Since you probably don't want the whole package to have this requirement (since this is only needed for examples), I propose the following changes:

Move examples to the most outer scope and move utilities inside examples. This way, you could specify another requirements file for the examples.

And while working on this, I'd also recommend moving the tests to the outer scope and using the find_packages from setuptools inside the setup, since this makes your whole setup process a bit more flexible.

These are just minor modifications, but especially the missing requirement could prevent future releases from being correctly installed.

Best,
Justus

EDIT: In fact, this actually prevents batchgenerators from being installed correctly. If you agree with the proposed changes, I'd open a PR for this

Make elastic transform faster

Hey,

Thank you for this great library, and all the rest of the open source code of DKFZ !

I think I have a way to make the elastic transform (slightly) faster. The idea is to build a deformation grid of lower size than the image, and then to do linear interpolation to compute a full displacement field. Depending on image sizes it yields a speed improvement (I have 25% speed improvement on 128**3 images with a grid subsampled by a factor 20.)

Of course, it's not possible to apply this when sigma is too low compared to the subsampling, because it makes the gaussian filtering weird !

#subfact is a subsampling factor on the full image grid, typically 10 to have good perfs
momenta = np.random.uniform(size=(h // subfact, d // subfact, w // subfact)) * 2 - 1

velocity_field = gaussian_filter(momenta, self.elastic_sigma / self.subfact, mode='constant', cval=0) * self.elastic_alpha

coords[dim] += map_coordinates(velocity_field, coordinates_to_map, order=1, mode='constant', cval=0)

The effect of normalization before data augmentation

Hi, I am wondering whether normalizing the raw data (BraTS2018) first has an impact on some of the data augmentations.

Wrong names formatting on Zenodo

Hi,

I'm trying to cite you using the entry on Zenodo. The names are Isensee Fabian; Jäger Paul, etc. That way, the BibTeX format is not generated correctly. You should change it to Isensee, Fabian; Jäger, Paul, etc.

By the way, you can automate the releases and this stuff using a Zenodo config file, as NiBabel guys do.

Question about "patching"

I am coming to this package from Medical Detection Toolkit and the paper and the toolkit claim to do "patching" but it looks to me like what is called "patching" is just cropping (random or center).

Is there any way in this toolbox to take a large image/volume and return a batch of patches (say input a 100 x 100 image and return 4 25x25 patches)?
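
Independently of what the toolbox offers, non-overlapping tiling can be sketched with a plain NumPy reshape. The function name `extract_patches` is hypothetical. Note that a 100 x 100 image tiles into 16 patches of 25 x 25, not 4.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split a 2D array into non-overlapping square patches.

    Assumes both image dimensions are divisible by patch_size.
    Returns an array of shape (n_patches, patch_size, patch_size).
    """
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image shape must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # split each axis into (tile index, position inside tile)
    tiles = image.reshape(rows, patch_size, cols, patch_size)
    # bring the two tile indices together, then flatten them
    return tiles.swapaxes(1, 2).reshape(rows * cols, patch_size, patch_size)


image = np.arange(100 * 100).reshape(100, 100)
patches = extract_patches(image, 25)
```

The patches are ordered row by row, so `patches[5]` is the tile in the second row and second column.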

Support for NVIDIA Dali?

Hi,

are there any plans to support the Nvidia Dali library for fast GPU based augmentation in the future?

(https://github.com/NVIDIA/DALI, https://www.basicml.com/performance/2019/04/16/pytorch-data-augmentation-with-nvidia-dali)

Best
Andy

help

raise ValueError("MirrorTransform now takes the axes as the spatial dimensions. What previously was "
ValueError: MirrorTransform now takes the axes as the spatial dimensions. What previously was axes=(2, 3, 4) to mirror along all spatial dimensions of a 5d tensor (b, c, x, y, z) is now axes=(0, 1, 2). Please adapt your scripts accordingly.
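
Going only by the error message, axes are now counted from the first spatial dimension instead of from the start of the (b, c, x, y, z) tensor, so each old axis index shifts down by two. A small hypothetical helper (not part of the library) that performs this conversion:

```python
def convert_mirror_axes(old_axes, num_leading_dims=2):
    """Map old tensor-indexed axes, e.g. (2, 3, 4), to spatial axes (0, 1, 2)."""
    new_axes = tuple(a - num_leading_dims for a in old_axes)
    if any(a < 0 for a in new_axes):
        raise ValueError("old axes must refer to spatial dimensions only")
    return new_axes


new_axes_3d = convert_mirror_axes((2, 3, 4))  # all spatial dims of a 5D tensor
new_axes_2d = convert_mirror_axes((2, 3))     # both spatial dims of a 4D tensor
```

Under this reading, a call such as `MirrorTransform(axes=(2, 3))` for 2D data would become `MirrorTransform(axes=(0, 1))`.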

Can I install Batchgenerators in Anaconda Environment

Hi,
Can I install batchgenerators in an Anaconda environment?

How to cite batchgenerators

Hi DKFZ,

I wish to cite the batchgenerators repository in a publication, as it's a pretty awesome framework.
Is there a preferred way to do so?

Best,

Except multiprocess, all things work well on Windows

Dear DKFZ,

Thanks for the great repo.

I want to use this tool for offline augmentation on Win10, and I followed the code in examples/brats2017. Everything works well except the multiprocessing.

I have pasted the error information below. Would it be possible for you to tell me how to solve the problem?
My goal is offline augmentation. I do not need efficiency; I only want it to work.

def main():
    brats_preprocessed_folder = r"Pathto\BraTS2017_preprocessed"

    num_threads_for_brats_example = 1        
    patients = get_list_of_patients(brats_preprocessed_folder)    
    train, val = get_split_deterministic(patients, fold=0, num_splits=2, random_state=12345)
    
    patch_size = (128, 128, 128)
    batch_size = 2
    
    dataloader = BraTS2017DataLoader3D(train, batch_size, patch_size, 1)
    
    batch = next(dataloader)
 
    
    # first let's collect all shapes, you will see why later
    shapes = [BraTS2017DataLoader3D.load_patient(i)[0].shape[1:] for i in patients] 
    max_shape = np.max(shapes, 0) 
    max_shape = np.max((max_shape, patch_size), 0)
    

    # artifacts
    dataloader_train = BraTS2017DataLoader3D(train, batch_size, max_shape, 1)
    
    
    tr_transforms = get_train_transform(patch_size)
    
    tr_gen = MultiThreadedAugmenter(dataloader_train, tr_transforms, num_processes=num_threads_for_brats_example,
                                    num_cached_per_queue=3,
                                    seeds=None, pin_memory=False)
    
    tr_gen.restart()
    
    num_batches_per_epoch = 2
    num_epochs = 1
    # let's run this to get a time on how long it takes
    time_per_epoch = []
    start = time()
    for epoch in range(num_epochs):
        start_epoch = time()
        for b in range(num_batches_per_epoch):
            batch = next(tr_gen)
            print(batch['data'][0].shape)
            # do network training here with this batch

        end_epoch = time()
        time_per_epoch.append(end_epoch - start_epoch)
    end = time()
    total_time = end - start
    print("Running %d epochs took a total of %.2f seconds with time per epoch being %s" %
          (num_epochs, total_time, str(time_per_epoch)))

if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()
    main()

Following error occurred:

runfile('E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py')
Traceback (most recent call last):

  File "<ipython-input-1-4598568443c2>", line 1, in <module>
    runfile('E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py')

  File "D:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "D:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py", line 229, in <module>
    main()

  File "E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py", line 198, in main
    tr_gen.restart()

  File "E:\Data\DataAug\BatchGenerator\batchgenerators\dataloading\multi_threaded_augmenter.py", line 254, in restart
    self._start()

  File "E:\Data\DataAug\BatchGenerator\batchgenerators\dataloading\multi_threaded_augmenter.py", line 224, in _start
    self._processes[-1].start()

  File "D:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

  File "D:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "D:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "D:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)

  File "D:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 172, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)

AttributeError: module '__main__' has no attribute '__spec__'

I am looking forward to your reply.

Dataloader

Hi,
Thanks for this great work. I saw that you used multithread to build a high-performance data loader. What would be its advantage over Framework's Dataloader?

builtins

Hello! First of all, thanks for your code. It looks pretty good. But where is the module named 'builtins'? Thanks for your help.

The pin_memory option in the MultiThreadedAugmenter might have a memory leak

Hi,

It can be the case that the pin_memory option in the MultiThreadedAugmenter has a memory leak, causing the RAM to overflow. This could happen because the processes started with this command:

batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py

Lines 150 to 154 in 7b92930

if self.pin_memory:
    self.pin_memory_queue = thrQueue(2)
    self.pin_memory_thread = threading.Thread(target=pin_memory_loop, args=(self._queues, self.pin_memory_queue))
    self.pin_memory_thread.daemon = True
    self.pin_memory_thread.start()

are not properly stopped again.

In the DataLoader from PyTorch, the following description is given prior to shutting down the workers
(https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html):

Exit pin_memory_thread first because exiting workers may leave
corrupted data in worker_result_queue which pin_memory_thread reads from.

Maybe the problem can be solved if the workers are closed explicitly. I can try to fix it in the next couple of days.

Why are there 2 GaussianNoiseTransforms and 2 GammaTransforms in brats2017_dataloader_3D.py?

Dear FabianIsensee,

1. I have read the code of brats2017_dataloader_3D.py and saw that GammaTransform and GaussianNoiseTransform each appear twice. Why is that?

2. If all spatial transformations are applied with a probability of 0.2 per sample, there is no spatial augmentation with a probability of 0.8 per sample.
The other transformations are applied with a probability of 0.18 per sample, no augmentation with a probability of 0.75 per sample.
There are 6 transformations in total. This means that 1-(0.8x0.85x0.85x0.85x0.85x0.85) = 74% of samples will be augmented. Is this too much for these samples?

Hope you can reply me :) thanks !~~~
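
The arithmetic in the question can be checked with a few lines, assuming the transforms fire independently and using the factors that appear in the formula (0.8 once, 0.85 five times). The helper name is hypothetical.

```python
def fraction_augmented(probabilities):
    """Probability that at least one independent transform fires."""
    p_none = 1.0
    for p in probabilities:
        p_none *= (1.0 - p)
    return 1.0 - p_none


# one spatial transform at p=0.2 and five others at p=0.15,
# i.e. the factors 0.8 and 0.85 used in the formula above
p_any = fraction_augmented([0.2] + [0.15] * 5)
print(f"{p_any:.3f}")  # prints 0.645
```

With these factors the share of samples that receive at least one augmentation comes out at roughly 64.5%, somewhat lower than the 74% stated in the question.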

Can the generators be used to load and preprocess files on the fly?

From what I've read in the README is that the input data values must be arrays. Is there any functionality that allows me to have the values as a list of filenames so that the generator loads files on the fly (with augmentation)?

i.e.

# getting paths for data
from glob import glob
local_train_path = '/content/training/'
local_label_path = '/content/labels/'
# '**/' matches nested directories when recursive=True
train_paths = glob(local_train_path + '**/*.nii', recursive=True)
mask_paths = glob(local_label_path + '**/*.nii', recursive=True)
data_dict = {'data': train_paths, 'seg': mask_paths}

Thank you for your time and consideration.
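
One way this could work, sketched without relying on any specific library API: keep only the file paths in the loader and read the files inside the method that builds a batch. `LazyFileLoader` is a hypothetical class; the method name `generate_train_batch` mirrors the loaders quoted on this page. The sketch reads `.npy` files with `np.load`; for `.nii` files a NIfTI reader would have to be passed as `load_fn` (an assumption, not shown here).

```python
import os
import tempfile

import numpy as np


class LazyFileLoader:
    """Hypothetical loader that reads samples from disk when a batch is requested."""

    def __init__(self, image_paths, seg_paths, batch_size, load_fn=np.load, seed=None):
        assert len(image_paths) == len(seg_paths)
        self.image_paths = list(image_paths)
        self.seg_paths = list(seg_paths)
        self.batch_size = batch_size
        self.load_fn = load_fn  # swap in a NIfTI reader for .nii files
        self.rng = np.random.RandomState(seed)

    def generate_train_batch(self):
        idx = self.rng.choice(len(self.image_paths), self.batch_size, replace=True)
        # each file is assumed to hold one (c, x, y[, z]) array
        data = np.stack([self.load_fn(self.image_paths[i]) for i in idx])
        seg = np.stack([self.load_fn(self.seg_paths[i]) for i in idx])
        return {"data": data.astype(np.float32), "seg": seg.astype(np.float32)}

    def __iter__(self):
        return self

    def __next__(self):
        return self.generate_train_batch()


# tiny demonstration with temporary .npy files
tmp_dir = tempfile.mkdtemp()
image_paths, seg_paths = [], []
for i in range(3):
    image_paths.append(os.path.join(tmp_dir, f"img_{i}.npy"))
    seg_paths.append(os.path.join(tmp_dir, f"seg_{i}.npy"))
    np.save(image_paths[-1], np.full((1, 8, 8), i, dtype=np.float32))
    np.save(seg_paths[-1], np.zeros((1, 8, 8), dtype=np.uint8))

batch = next(LazyFileLoader(image_paths, seg_paths, batch_size=2, seed=0))
```

Only the files selected for the current batch are read, so memory use stays proportional to the batch size rather than to the dataset size.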

Python random seed

Hi, great work with Batchgenerators!
The MultiThreadedAugmenter leads to some non-deterministic behaviour even when it is seeded. After some investigation, I found that only the numpy.random seed is set by the producer. Additionally setting the random seed for the random package included in python fixed my issues. Maybe you could add that seed to the producer because in your brightness augmentation you are using random.uniform, too.

batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py

Lines 33 to 37 in af6dbab

def producer(queue, data_loader, transform, thread_id, seed, abort_event):
    try:
        np.random.seed(seed)
        data_loader.set_thread_id(thread_id)
        item = None

batchgenerators/batchgenerators/augmentations/color_augmentations.py

Lines 65 to 71 in e86abab

def augment_brightness_multiplicative(data_sample, multiplier_range=(0.5, 2), per_channel=True):
    multiplier = random.uniform(multiplier_range[0], multiplier_range[1])
    if not per_channel:
        data_sample *= multiplier
    else:
        for c in range(data_sample.shape[0]):
            multiplier = random.uniform(multiplier_range[0], multiplier_range[1])
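
A minimal sketch of the suggested fix: seed Python's `random` module together with `numpy.random` at the start of each worker. `seed_worker` and `draw` are hypothetical names used only for this illustration, not functions from the library.

```python
import random

import numpy as np


def seed_worker(seed):
    """Seed both RNGs that augmentation code may draw from."""
    random.seed(seed)
    np.random.seed(seed)


def draw():
    # one value from each generator, mimicking mixed use in augmentations
    return random.uniform(0.5, 2), float(np.random.uniform(0.5, 2))


seed_worker(1234)
first = draw()
seed_worker(1234)
second = draw()
```

After reseeding, both draws repeat exactly; seeding only `numpy.random` would leave the `random.uniform` value different from run to run.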

DataLoader for object detection

Is there an example of what data structure and key names should we use for object detection labels? I'm trying to use this together with medicaldetectiontoolkit and I couldn't find an object detection example pipeline, only for segmentation.

Using batchgenerators with TractSeg

Hi,

I am trying to run TractSeg training, and not sure about batchgenerators version I should use.
I have tried the master and got 'PadToMultipleTransform' missing. I have also tried 'tractseg_stable' branch and there I get 'FlipVectorAxisTransform' that is missing.

I will appreciate your help.

Thanks,
Ilya

example_ipynb: cannot import name 'Mirror'

Dear Fabian,
I tried to run the example but got an error :
ImportError: cannot import name 'Mirror'
I checked spatial_transforms.py but only found MirrorTransform.
I am confused and cannot fix it, hope you could help me.
Best,
Ce Liang

PyPI

Hi,
I've got a quick question:

Would it be possible to register batchgenerators on PyPI? This would be really helpful for developing packages that depend on it, since it would make a separate installation step unnecessary (one is currently required).

Thanks!
Justus

A question about spatial_transforms

Hi,

Thanks for sharing your augmentation code. I have a question about 'augment_channel_translation' in 'spatial_transforms.py'.

What is the purpose of this augmentation method? Is it meant to reduce the impact of poor registration between different modalities of one object? (Different modalities: e.g. RGB and depth in natural images; CT, MRI, and PET in medical images. One modality is one channel.)

Thanks.

Best,

Eric Kani

BatchGenerators on Windows error

Can't pickle local object 'MultiThreadedAugmenter._start.<locals>.producer'
Exception ignored in: <bound method MultiThreadedAugmenter.__del__ of <batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter object at 0x000001F3980FDA90>>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\batchgenerators\dataloading\multi_threaded_augmenter.py", line 144, in __del__
    self._finish()
  File "C:\ProgramData\Anaconda3\lib\site-packages\batchgenerators\dataloading\multi_threaded_augmenter.py", line 130, in _finish
    thread.terminate()
  File "C:\ProgramData\Anaconda3\Lib\multiprocessing\process.py", line 116, in terminate
    self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'
485.19 core.mod.core.ioUtil ERROR: File '-c' does not exist
485.21 WARNING: Failed to load command line argument: -c
485.21 core.mod.core.ioUtil ERROR: File 'from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=8892, pipe_handle=7904)' does not exist
485.22 WARNING: Failed to load command line argument: from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=8892, pipe_handle=7904)
485.23 core.mod.core.ioUtil ERROR: File '--multiprocessing-fork' does not exist
485.24 WARNING: Failed to load command line argument: --multiprocessing-fork

Single line installation in readme

The current installation guide is 3-line and assumes that the user has git installed. This might be rather minor but any package on GitHub can be installed in one line without git:

pip install https://github.com/MIC-DKFZ/batchgenerators/archive/master.zip

This might be preferable for the readme.

How to augment image and mask at the same time

I was trying to train a 3d Segmentation network

my dataloader yields:
data: torch.Size([4, 1, 128, 128, 128])
label: torch.Size([4, 1, 128, 128, 128])

Then I used crop augmentation, then I get:
data: torch.Size([4, 1, 48, 48, 48])
label: torch.Size([4, 1, 128, 128, 128])

'data' has been cropped, but 'label' has not.

How can I crop both in the same way?

Thanks!
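
Whatever transform is used, image and mask only stay aligned if the same crop offsets are applied to both. The sketch below shows that idea in plain NumPy; `random_crop_pair` is a hypothetical helper, not a library function. It may also be worth checking which dictionary key the crop transform expects for the mask: the examples on this page use 'seg', and a mask stored under a different key such as 'label' might simply be passed through untouched (an assumption to verify against the installed version).

```python
import numpy as np


def random_crop_pair(data, label, crop_size, rng=None):
    """Crop data and label (both (b, c, x, y, z)) with identical offsets per sample."""
    rng = rng if rng is not None else np.random.RandomState()
    b = data.shape[0]
    spatial = data.shape[2:]
    assert label.shape[0] == b and label.shape[2:] == spatial
    data_out = np.empty(data.shape[:2] + tuple(crop_size), dtype=data.dtype)
    label_out = np.empty(label.shape[:2] + tuple(crop_size), dtype=label.dtype)
    for i in range(b):
        # one set of offsets per sample, shared by image and mask
        starts = [rng.randint(0, s - c + 1) for s, c in zip(spatial, crop_size)]
        window = tuple(slice(st, st + c) for st, c in zip(starts, crop_size))
        data_out[i] = data[i][(slice(None),) + window]
        label_out[i] = label[i][(slice(None),) + window]
    return data_out, label_out


data = np.random.rand(4, 1, 64, 64, 64).astype(np.float32)
label = (data > 0.5).astype(np.uint8)
data_c, label_c = random_crop_pair(data, label, (48, 48, 48), np.random.RandomState(0))
```

Since the mask here is derived voxel-wise from the image, thresholding the cropped image reproduces the cropped mask exactly, which confirms the two crops are aligned.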

Missing Package: batchgenerators.utilities in setup.py

Hi,
"batchgenerators.utilities" is missing in setup.py.

After installing 0.19.1 with pip or with setuptools I run into the following problem:
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)] on win32
>>> import batchgenerators
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\batchgenerators\__init__.py", line 7, in <module>
    import batchgenerators.utilities
ModuleNotFoundError: No module named 'batchgenerators.utilities'

Have a nice day!

Augmentation on 3D Medical image

@FabianIsensee Hi, I'm interested in your work and am trying to augment 3D medical images in my project. I built this on the example you provide in the project; the code follows. In my backend it seems to go into an infinite loop and stays in the data augmentation phase: it has been running for about one day and has not started training.

class DataLoader(DataLoaderBase):
    def __init__(self, data, BATCH_SIZE=2, num_batches=None, seed=False):
        super(DataLoader, self).__init__(data, BATCH_SIZE, num_batches, seed)
        # data is now stored in self._data.
    def generate_train_batch(self):
        # usually you would now select random instances of your data. We only have one therefore we skip this
        img = self._data[0:4]
        seg_data = self._data[4]
        #  Our batch layout must be (b, c, x, y, z). Let's fix that
        img = np.tile(img[None], (self.BATCH_SIZE, 1, 1, 1, 1))
        seg = np.tile(seg_data[None, None], (self.BATCH_SIZE, 1, 1, 1, 1))
        print('img shape:', img.shape)
        print('seg shape:', seg.shape)
        # now construct the dictionary and return it. np.float32 cast because most networks take float
        return {'data': img.astype(np.float32), 'seg': seg.astype(np.float32)}

def batch_generator_augment(data,truth,affine,batch_size):
    data_list = list()
    for data_index in range(data.shape[0]):
        image = get_image(data[data_index], affine)
        data_list.append(image.get_data())
    truth_image = get_image(truth, affine).get_data()
    data_list.append(truth_image)
    data = np.asarray(data_list)
    print("data.shape:",data.shape)
    batchgen = DataLoader(data,batch_size, None, False)

    tr_transforms = []
    tr_transforms.append(DataChannelSelectionTransform([0, 1, 2, 3]))
    tr_transforms.append(MirrorTransform())
    tr_transforms.append(SpatialTransform(truth.shape, np.array(truth.shape) // 2,
                                     do_elastic_deform=True, alpha=(0., 1300.), sigma=(10., 13.),
                                     do_rotation=True, angle_x=(0., 2*np.pi), angle_y=(0., 2*np.pi),angle_z=(0., 2*np.pi),
                                     do_scale=True, scale=(0.75, 1.25), border_mode_data='nearest',
                                     border_cval_data=0, order_data=3, border_mode_seg='constant', border_cval_seg=0,
                                     order_seg=0, random_crop=True))
    tr_transforms.append(ContrastAugmentationTransform((0.3, 3.), True))
    tr_transforms.append(GammaTransform((0.6, 2), False))
    tr_transforms.append(BrightnessTransform(0.0, 0.1, True))
    tr_transforms.append(SegChannelSelectionTransform([0]))
    #tr_transforms.append(ConvertSegToOnehotTransform(range(3), 0, "seg"))
    #singlethreaded_generator = SingleThreadedAugmenter(batchgen, Compose(tr_transforms))
    # plot_batch(singlethreaded_generator.__next__())
    gen_train = MultiThreadedAugmenter(batchgen, Compose(tr_transforms), 5, 3,None)
    gen_train.restart()
    return convert_to_data(gen_train)

def convert_to_data(gen_train):
    # NOTE: only the last batch produced by the generator is returned
    for i, data_dict in enumerate(gen_train, start=1):
        print("cycle:", i)
        data = data_dict["data"].astype(np.float32)
        seg = data_dict["seg"]
        print('before data shape:', data.shape)
        print('before seg shape:', seg.shape)
    return data, seg

Increasing data augmentation throughput/performance

Dear DKFZ,

Depending on the batch size and augmentation types, I run into performance problems: on-the-fly augmentation significantly slows down my training. The CPU runs at 100% load but still cannot augment samples fast enough to keep the GPU busy.

I profiled your code and noticed that this is especially the case when scipy.ndimage.interpolation.map_coordinates is involved. I already changed the interpolation order to 1, but I still see a significant slowdown for large batch sizes and large patch volumes. Do you have any suggestions for improvements?

I thought about integrating ITK, which now has a Python wrapper that allows memory sharing between np.ndarray and itk.image objects, and performing the actual deformation in ITK with itk.WarpImageFilter. What are your thoughts on this? Have you perhaps tried it already?
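To put a number on how much the interpolation order matters on a given machine, something like the following could be used. It is only a sketch: the volume size, the random displacement field and the helper name `time_interpolation` are assumptions made for illustration, and it calls `scipy.ndimage.map_coordinates` directly rather than going through batchgenerators.

```python
import time

import numpy as np
from scipy.ndimage import map_coordinates


def time_interpolation(volume, coords, order, repeats=3):
    """Return the best wall-clock time (seconds) of map_coordinates at a given spline order."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        map_coordinates(volume, coords, order=order, mode="nearest")
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(0)
volume = rng.random((48, 48, 48)).astype(np.float32)  # small hypothetical patch

# Identity sampling grid plus a small random displacement, shape (3, 48, 48, 48)
grid = np.stack(
    np.meshgrid(*[np.arange(s) for s in volume.shape], indexing="ij")
).astype(np.float32)
coords = grid + rng.normal(scale=1.0, size=grid.shape).astype(np.float32)

timings = {order: time_interpolation(volume, coords, order) for order in (0, 1, 3)}
for order, seconds in timings.items():
    print(f"order={order}: {seconds * 1e3:.1f} ms")
```

The absolute numbers depend on the hardware, so the useful output is the ratio between orders at the patch size actually used in training.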

tests folder is installed to wrong place

It seems that the tests folder is installed to site-packages/tests. It should be installed to site-packages/batchgenerators/tests, or not installed at all.

use batchgenerators as ImageDataGenerator in keras

Dear Fabian Isensee,
Hi. I am new to Python and Keras. I use your batchgenerators: I gave the data loader the entire dataset and added some transformations. How can I use batchgenerators like ImageDataGenerator in Keras to train my model? Do I have to write a custom Keras generator around your generator?
Thank you in advance.
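One possible approach, sketched here under assumptions, is a plain Python generator that unpacks each batch dictionary into an `(inputs, targets)` tuple and moves the channel axis to the end. `keras_batches` and `fake_augmenter` are hypothetical names; `fake_augmenter` only stands in for a real augmenter that yields dictionaries with `'data'` and `'seg'` arrays in `(b, c, x, y)` layout.

```python
import numpy as np


def keras_batches(augmenter, data_key="data", seg_key="seg"):
    """Yield (inputs, targets) tuples in channels-last layout from dict batches."""
    for batch in augmenter:
        x = np.moveaxis(batch[data_key], 1, -1)  # (b, c, x, y) -> (b, x, y, c)
        y = np.moveaxis(batch[seg_key], 1, -1)
        yield x, y


def fake_augmenter(num_batches=2):
    """Hypothetical stand-in for an augmenter that yields batch dictionaries."""
    for _ in range(num_batches):
        yield {
            "data": np.zeros((4, 1, 32, 32), dtype=np.float32),
            "seg": np.zeros((4, 1, 32, 32), dtype=np.float32),
        }


x, y = next(keras_batches(fake_augmenter()))
print(x.shape, y.shape)  # (4, 32, 32, 1) (4, 32, 32, 1)
```

A generator of this kind can usually be handed to Keras training functions that accept Python generators. When the underlying augmenter is infinite, the number of steps per epoch has to be set explicitly.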

Reason for **kwargs

Hi! I'm enjoying exploring this library, awesome contribution! I had one question though: what was the rationale for using a **kwargs version of the **data_dict in the __call__ function of the transforms?

My use case is that I'm actually wanting to use your transforms through pytorch's Dataset and Dataloaders, but the data format for these is just a normal python positional argument, e.g., a data_dict instead of **data_dict. It's hard to get your transforms to play nice or inter-operate with pytorch's existing data loading functionality. If it won't alter functionality or usability, I'm wondering if you'd consider changing your transform __call__ functions to just accept a positional data_dict argument instead? That would allow it to interface with pytorch without altering existing functionality, I believe.

Of course, there may be a reason that I'm not aware of :)
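Until the signature question is settled, a thin adapter can bridge the two calling conventions. The sketch below assumes only what is described above, namely that a transform is called as `transform(**data_dict)` and returns a dictionary. `DictTransformAdapter` and `scale_data` are hypothetical names used for illustration.

```python
class DictTransformAdapter:
    """Wrap a transform that expects **data_dict so it accepts one positional dict."""

    def __init__(self, transform):
        self.transform = transform

    def __call__(self, data_dict):
        # Unpack the dictionary into keyword arguments for the wrapped transform
        return self.transform(**data_dict)


def scale_data(**data_dict):
    """Hypothetical stand-in for a keyword-style transform."""
    data_dict["data"] = [value * 2 for value in data_dict["data"]]
    return data_dict


adapted = DictTransformAdapter(scale_data)
result = adapted({"data": [1, 2, 3], "seg": [0, 1, 0]})
print(result["data"])  # [2, 4, 6]
```

Because the adapter takes and returns a single object, wrapped transforms can be chained by any composer that calls each transform with one positional argument.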

A question about compatibility about transforms

Hi,

The transforms are very comprehensive and useful, but I was wondering whether they are compatible with torchvision.transforms.Compose().
I noticed that batchgenerators also provides a Compose() function and a DataLoader. However, my work is mostly built on PyTorch and its related tools, and I hope to use the batchgenerators transforms in PyTorch's DataLoader to minimize my workload.

Best,
Yucheng

Syntax error after a recent commit

There may be a small syntax error in line 439 of batchgenerators/augmentations/utils.py since a recent commit.

Changing the line 439:
roi_masks = np.zeros((seg.shape[0], n_max_gt, *seg.shape[2:]))
to:
roi_masks = np.zeros((seg.shape[0], n_max_gt, seg.shape[2:]))
solved the problem.
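For context, starred unpacking inside a tuple display is only valid syntax on Python 3.5 and later, so the original line fails to parse on older interpreters. A version-independent way to build the same flat shape is tuple concatenation. The sketch below uses plain tuples and a hypothetical helper `roi_mask_shape`; it also shows that passing `seg.shape[2:]` as a single element gives a nested tuple, which is not the same shape.

```python
def roi_mask_shape(seg_shape, n_max_gt):
    """Build (batch, n_max_gt, *spatial) without starred unpacking."""
    return (seg_shape[0], n_max_gt) + tuple(seg_shape[2:])


seg_shape = (2, 1, 64, 64)  # hypothetical (b, c, x, y) segmentation shape

flat = roi_mask_shape(seg_shape, 5)
nested = (seg_shape[0], 5, seg_shape[2:])

print(flat)    # (2, 5, 64, 64)
print(nested)  # (2, 5, (64, 64))
```

The flat tuple is what an array constructor such as `np.zeros` expects as its shape argument.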

Can batchgenerators be used with Python 3.6?

Can batchgenerators be used with Python 3.6? Thank you!

test normalization

Dear Fabian Isensee,
I use the brats2017_preprocessing example. I found that the training samples are preprocessed (normalized) together with their labels, but the test data only have images. What should I do?
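If the normalization itself only needs a foreground mask and not the label, it can be computed from the image alone. The following is a minimal sketch under that assumption. It treats voxels equal to `background_value` as background, which may or may not match what the preprocessing example does, and `normalize_foreground` is a hypothetical helper.

```python
import numpy as np


def normalize_foreground(image, background_value=0.0):
    """Z-score normalize the foreground voxels of an image, leaving background untouched."""
    mask = image != background_value
    out = image.astype(np.float32, copy=True)
    if mask.any():
        mean = out[mask].mean()
        std = out[mask].std()
        out[mask] = (out[mask] - mean) / (std + 1e-8)
    return out


rng = np.random.default_rng(0)
image = np.zeros((16, 16, 16), dtype=np.float32)
image[4:12, 4:12, 4:12] = rng.uniform(50.0, 200.0, size=(8, 8, 8))

normalized = normalize_foreground(image)
foreground = normalized[image != 0]
print(float(foreground.mean()), float(foreground.std()))
```

The same function can be applied to training and test images, since it never looks at a segmentation.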

How to use batchgenerators for regression

I'm working in inverse problems and quite commonly I have a set of inputs (e.g. noisy volumes) and some outputs (noiseless volume). batchgenerators looks like a perfect fit for data augmentation, but it seems to be designed for image -> segmentation. Is there any way to work with image -> image problems, in that case how do I call the library?
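The core requirement for image -> image augmentation is that the input and the target receive exactly the same random spatial transform. A minimal sketch of that idea follows. It assumes the target volume is carried under the `'seg'` key of the batch dictionary and uses a hypothetical `paired_random_flip` as a stand-in for a real spatial transform. With a real interpolating transform, the settings for that key (for example `order_seg`) would need to suit continuous values, which should be checked against the library.

```python
import numpy as np


def paired_random_flip(batch, rng, axes=(2, 3)):
    """Apply identical random flips to batch['data'] and batch['seg']."""
    out = dict(batch)
    for axis in axes:
        if rng.random() < 0.5:
            out["data"] = np.flip(out["data"], axis=axis)
            out["seg"] = np.flip(out["seg"], axis=axis)
    return out


rng = np.random.default_rng(0)
clean = rng.random((2, 1, 16, 16)).astype(np.float32)  # noiseless target
noisy = clean + 0.1 * rng.standard_normal(clean.shape).astype(np.float32)  # input

batch = paired_random_flip({"data": noisy, "seg": clean}, rng)

# The residual is still the original noise, just flipped the same way as the images
residual = batch["data"] - batch["seg"]
print(residual.shape)
```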

Can't pickle local object 'MultiThreadedAugmenter._start.<locals>.producer

Hi,

I used batchgenerators for quite a while now and it usually worked fine. I just wrote a custom dataset (which doesn't do anything special - just loading images with skimage.io.imread, resizing them and loading labels from a preloaded list and from files with np.loadtxt).

Suddenly I got the following error:

File "/home/students/schock/Delira/delira/training/trainer.py", line 256, in _train_epoch
    for i, batch in pbar:
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/site-packages/tqdm/_tqdm.py", line 979, in __iter__
    for obj in iterable:
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 82, in __next__
    self._start()
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 122, in _start
    self._threads[-1].start()
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'MultiThreadedAugmenter._start.<locals>.producer'

Is this a known issue or are there any known workarounds? I honestly don't know why it happened this time and where the error comes from.

My setup:
Ubuntu 18.04
Python: 3.6 (Conda environment)

Thanks in advance!
Justus
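The traceback goes through `popen_spawn_posix.py`, which means the worker process is being started with a start method that pickles the process target instead of forking. A function defined inside another function cannot be pickled, and that is the error shown. The sketch below reproduces this with the standard library only. All function names in it are hypothetical and none of it is batchgenerators code.

```python
import multiprocessing as mp
import pickle


def top_level_producer(queue):
    """Module-level function: picklable by reference when run as a normal script."""
    queue.put("batch")


def make_local_producer():
    def producer(queue):  # local function, like '_start.<locals>.producer'
        queue.put("batch")
    return producer


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


print("local producer picklable:", is_picklable(make_local_producer()))  # False
print("top-level producer picklable:", is_picklable(top_level_producer))
print("available start methods:", mp.get_all_start_methods())
```

If something in the environment switched the start method away from `fork` (for example a call to `multiprocessing.set_start_method`), that would be a plausible place to look first.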

Need some advice

Hi,
Thanks for sharing your augmentation code. I'm studying medical image preprocessing and would like some advice.
In your paper An attempt at beating the 3D U-Net, you mentioned that you used scaling, rotation, brightness, contrast, gamma and Gaussian noise augmentations. Would you mind sharing some details about your strategy, such as whether these augmentations are applied separately or combined in a specific order?
I would appreciate your help very much!
Best,
Chen

PyPI version is ahead of github

Hi, the version of batchgenerators on PyPI is 0.19.3, but the version on GitHub is 0.19.2. Could they be synced? Thanks.

Odd scaling of slowdown when going from 2D to 3D

Hi,

I'm not sure if this is a bug, an error on my part, or just the way things are expected to be but I thought it was odd so I figured I'd bring it to your attention.

I've recently switched from looking at 2D images to 3D volumes, and I've found that the slowdown in augmenting (using SpatialTransform) is significantly higher than I would have expected.

I did some tests using dummy code from @FabianIsensee in #5 and compared the speed of the augmenter there using:

1. Dummy data size (32, 1, 256, 256) and patch_size (128, 128)
2. Dummy data size (32, 1, 256, 256, 256) and patch_size (128, 128, 128)

and found that with the SingleThreadedAugmenter, generation took ~0.2s/batch for test 1, and 57s/batch for test 2 -- a factor of about 300, rather than the 128 I was expecting.

Actual code here:

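from batchgenerators.transforms.color_transforms import \
    BrightnessMultiplicativeTransform
from batchgenerators.transforms.spatial_transforms import SpatialTransform
from batchgenerators.transforms.abstract_transforms import Compose
from batchgenerators.transforms.sample_normalization_transforms import \
    MeanStdNormalizationTransform
from batchgenerators.dataloading.data_loader import SlimDataLoaderBase
# the two augmenters used below were missing from the original imports
from batchgenerators.dataloading.single_threaded_augmenter import SingleThreadedAugmenter
from batchgenerators.dataloading.multi_threaded_augmenter import MultiThreadedAugmenter
import numpy as np
from time import time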

# First 2D

class DummyLoader(SlimDataLoaderBase):
    def __init__(self):
        super(DummyLoader, self).__init__(None, None, None)

    def generate_train_batch(self):
        return {'data': np.random.random((32, 1, 256, 256))}


transforms = Compose([
    BrightnessMultiplicativeTransform(multiplier_range=(0.7, 1.3),
                                      per_channel=True),

    SpatialTransform(patch_size=(128, 128),
                     do_elastic_deform=True,
                     alpha=(90., 750.),
                     sigma=(9., 11.),
                     do_scale=True,
                     random_crop=False,
                     do_rotation=False,
                     order_data=1,
                     border_mode_data='reflect'),

    MeanStdNormalizationTransform(mean=[0.485],
                                  std=[0.229])
])


single_threaded_gen = SingleThreadedAugmenter(DummyLoader(), transforms)
multi_threaded_gen_one_thread = MultiThreadedAugmenter(DummyLoader(), transforms, 1, 1, None)
multi_threaded_gen_eight_threads = MultiThreadedAugmenter(DummyLoader(), transforms, 8, 1, None)

num_batches_warmup = 16
num_batches_run = 16

print("Running 2D tests")

####### SingleThreadedAugmenter #######
# warmup
_ = [next(single_threaded_gen) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(single_threaded_gen) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with SingleThreadedAugmenter in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))


####### MultiThreadedAugmenter (1 thread) #######
# warmup
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (1 thread) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))


####### MultiThreadedAugmenter (8 threads) #######
# warmup
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (8 threads) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))

# Now 3D

class DummyLoader(SlimDataLoaderBase):
    def __init__(self):
        super(DummyLoader, self).__init__(None, None, None)

    def generate_train_batch(self):
        return {'data': np.random.random((32, 1, 256, 256, 256))}


transforms = Compose([
    BrightnessMultiplicativeTransform(multiplier_range=(0.7, 1.3),
                                      per_channel=True),

    SpatialTransform(patch_size=(128, 128, 128),
                     do_elastic_deform=True,
                     alpha=(90., 750.),
                     sigma=(9., 11.),
                     do_scale=True,
                     random_crop=False,
                     do_rotation=False,
                     order_data=1,
                     border_mode_data='reflect'),

    MeanStdNormalizationTransform(mean=[0.485],
                                  std=[0.229])
])


single_threaded_gen = SingleThreadedAugmenter(DummyLoader(), transforms)
multi_threaded_gen_one_thread = MultiThreadedAugmenter(DummyLoader(), transforms, 1, 1, None)
multi_threaded_gen_eight_threads = MultiThreadedAugmenter(DummyLoader(), transforms, 8, 1, None)

num_batches_warmup = 16
num_batches_run = 16

print("Running 3D tests")
####### SingleThreadedAugmenter #######
# warmup
_ = [next(single_threaded_gen) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(single_threaded_gen) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with SingleThreadedAugmenter in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))

####### MultiThreadedAugmenter (1 thread) #######
# warmup
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (1 thread) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))

####### MultiThreadedAugmenter (8 threads) #######
# warmup
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (8 threads) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))

and output here:

Running 2D tests
Generated 16 batches with SingleThreadedAugmenter in 3.183878 seconds; 0.198992 s/batch
Generated 16 batches with MultiThreadedAugmenter (1 thread) in 3.397480 seconds; 0.212343 s/batch
Generated 16 batches with MultiThreadedAugmenter (8 threads) in 0.509470 seconds; 0.031842 s/batch
Running 3D tests
Generated 16 batches with SingleThreadedAugmenter in 919.891915 seconds; 57.493245 s/batch
Generated 16 batches with MultiThreadedAugmenter (1 thread) in 931.741837 seconds; 58.233865 s/batch
Generated 16 batches with MultiThreadedAugmenter (8 threads) in 232.098892 seconds; 14.506181 s/batch
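The single-threaded timings above can be set against a rough cost model. The sketch below is a back-of-the-envelope calculation, not a profile. It assumes that cost scales with the number of output voxels and that linear interpolation reads 2^d neighbours per output voxel, so going from 2D to 3D doubles the work per voxel. Other costs, such as generating the 256x larger input batch, are ignored.

```python
def voxel_count(shape):
    """Number of elements in a patch of the given shape."""
    count = 1
    for size in shape:
        count *= size
    return count


patch_2d = (128, 128)
patch_3d = (128, 128, 128)

# Ratio of output voxels per sample: 128**3 / 128**2
output_ratio = voxel_count(patch_3d) / voxel_count(patch_2d)

# Linear interpolation touches 2**d neighbours per output voxel (assumed cost model)
neighbour_ratio = 2 ** len(patch_3d) / 2 ** len(patch_2d)

expected_ratio = output_ratio * neighbour_ratio

# Measured s/batch for SingleThreadedAugmenter, taken from the output above
measured_ratio = 57.493245 / 0.198992

print("output voxel ratio:", output_ratio)
print("with interpolation neighbours:", expected_ratio)
print("measured ratio:", round(measured_ratio, 1))
```

Under these assumptions the expected factor is 256 rather than 128, which is much closer to the measured slowdown.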

PS: Thanks so much for your work in making this excellent package! It really is absolutely fantastic

if self.pin_memory:
    self.pin_memory_queue = thrQueue(2)
    self.pin_memory_thread = threading.Thread(target=pin_memory_loop, args=(self._queues, self.pin_memory_queue))
    self.pin_memory_thread.daemon = True
    self.pin_memory_thread.start()

def producer(queue, data_loader, transform, thread_id, seed, abort_event):
    try:
        np.random.seed(seed)
        data_loader.set_thread_id(thread_id)
        item = None

def augment_brightness_multiplicative(data_sample, multiplier_range=(0.5, 2), per_channel=True):
    multiplier = random.uniform(multiplier_range[0], multiplier_range[1])
    if not per_channel:
        data_sample *= multiplier
    else:
        for c in range(data_sample.shape[0]):
            multiplier = random.uniform(multiplier_range[0], multiplier_range[1])