mic-dkfz / batchgenerators
A framework for data augmentation for 2D and 3D image classification and segmentation
License: Apache License 2.0
As in the title: what are the differences, and which one should I use?
I am also confused about 'SpatialTransform' vs. 'SpatialTransform_2'.
Thanks
Hi!
First of all, thanks for that awesome project. I feel like it will really be an improvement over my current 3D data augmentation, as it offers a nicer (and modular) way to stack multiple transforms with individual probabilities.
My current solution (CNN, data augmentation, and all) is built entirely around a 'channels_last' data layout, where every batch has a shape (b, x, y, z, c). From the code I viewed, it appears that the augmentations of batchgenerators naturally use a (b, c, x, y, z) format.
Is there an elegant way to still keep my data layout and augment it correctly? (Of course, I could transpose my batches in a DataLoader derived from SlimDataLoaderBase and wrap/derive MultiThreadedAugmenter to revert the transposition, but this seems quite heavy-handed to me.)
Best regards,
Carlchristian
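One lightweight option, sketched here under the assumption that only the axis order differs, is to move the channel axis right before and right after the augmentation pipeline. The helper names `to_channels_first` and `to_channels_last` are hypothetical and not part of batchgenerators; `np.moveaxis` returns a view, so the round trip does not copy the array.

```python
import numpy as np


def to_channels_first(batch):
    """Convert (b, x, y, z, c) to (b, c, x, y, z) without copying."""
    return np.moveaxis(batch, -1, 1)


def to_channels_last(batch):
    """Convert (b, c, x, y, z) back to (b, x, y, z, c) without copying."""
    return np.moveaxis(batch, 1, -1)


# Toy batch in the 'channels_last' layout described in the question.
batch_cl = np.zeros((2, 32, 48, 64, 4), dtype=np.float32)
batch_cf = to_channels_first(batch_cl)   # layout the augmentations expect
restored = to_channels_last(batch_cf)    # back to the network's layout
```

The two calls could wrap the augmenter's output in a small generator, so the rest of the training code keeps its layout.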
In some situations we need to implement our own enhancement algorithm, such as top-hat contrast enhancement. How can it be implemented and plugged into MultiThreadedAugmenter?
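A rough sketch of one way to do this, assuming the convention described elsewhere in these issues that a transform is a callable receiving the batch as `**data_dict` and returning the modified dictionary. `PerChannelFunctionTransform`, its parameters, and `subtract_channel_min` are hypothetical names for illustration; the class deliberately does not import batchgenerators, so in a real pipeline you would probably derive from the library's transform base class and add the instance to the list passed to `Compose`. A real top-hat filter (for example `scipy.ndimage.white_tophat`) could be passed as `fn` in place of the toy function.

```python
import numpy as np


class PerChannelFunctionTransform:
    """Apply `fn` to every channel of every sample with probability p_per_sample.

    Hypothetical helper that only mimics the `transform(**data_dict)` convention.
    """

    def __init__(self, fn, data_key="data", p_per_sample=1.0, rng=None):
        self.fn = fn
        self.data_key = data_key
        self.p_per_sample = p_per_sample
        self.rng = rng if rng is not None else np.random.default_rng()

    def __call__(self, **data_dict):
        data = data_dict[self.data_key]          # expected layout: (b, c, ...)
        for b in range(data.shape[0]):
            if self.rng.uniform() < self.p_per_sample:
                for c in range(data.shape[1]):
                    data[b, c] = self.fn(data[b, c])   # modified in place
        return data_dict


def subtract_channel_min(channel):
    """Toy 'enhancement' standing in for a real filter such as top-hat."""
    return channel - channel.min()


transform = PerChannelFunctionTransform(subtract_channel_min, p_per_sample=1.0)
batch = {"data": np.random.rand(2, 1, 8, 8, 8).astype(np.float32) + 5.0}
out = transform(**batch)
```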
@FabianIsensee Hi, thanks for your excellent work. I am using the data augmentation to train a brain tumor segmentation network (3D U-Net). I have tried the spatial augmentations many times during training, but in terms of Dice their effect on the test data is not obvious. The following code is applied only to the training dataset; the validation dataset is not augmented. The shapes of data and truth are (4,128,128,128) and (1,128,128,128). Can you give me some advice on how to improve it?
def augment_spatial_data(data, truth, affine, batch_size):
    data_list = list()
    for data_index in range(data.shape[0]):
        image = get_image(data[data_index], affine)
        data_list.append(image.get_data())
    data = np.asarray(data_list)
    data = np.tile(data[None], (batch_size, 1, 1, 1, 1)).astype(np.float32)
    truth_image = get_image(truth, affine).get_data()
    truth_image = np.tile(truth_image[None, None], (batch_size, 1, 1, 1, 1)).astype(np.float32)
    data, truth = augment_spatial(data, truth_image, patch_size=truth.shape,
                                  patch_center_dist_from_border=np.array(truth.shape) // 2)
    data = np.tile(data[0], (1, 1, 1, 1))
    truth = np.tile(truth[0, 0], (1, 1, 1))
    return data, truth
Thanks for the great work.
Would it be possible for you to provide a tutorial for 3D medical image and label augmentation?
Eg: BraTS dataset.
Looking forward to your reply.
Best regards,
Edward
Hi,
I have an issue with the .next attribute.
First, the DataLoader class is not working for me in Example.ipynb. Running
batchgen = DataLoader(data.camera(), 4, None, False)
batch = batchgen.next()
returns this error:
AttributeError: 'DataLoader' object has no attribute 'next'
It can be fixed by renaming generate_train_batch in the DataLoader to next, but I am not sure this is the right way to do it. Later on,
multithreaded_generator = MultiThreadedAugmenter(batchgen, all_transforms, 4, 2, seeds=None)
also has no attribute .next, and when I call multithreaded_generator.generator.next(), which works, the transformations are not applied.
I don't know where the error comes from.
Best regards
Paul
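For context: in Python 3 the iterator protocol uses `__next__`, which is called through the built-in `next(obj)`; a `.next()` method only exists on Python 2 iterators. The sketch below uses a hypothetical stand-in class (`ToyLoader`, not the library's DataLoader) and assumes the real loader exposes its batches through `__next__` in the same way.

```python
class ToyLoader:
    """Hypothetical stand-in for a loader that produces batches via __next__."""

    def __init__(self):
        self.counter = 0

    def generate_train_batch(self):
        self.counter += 1
        return {"data": [self.counter]}

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 looks up __next__; there is no .next() method.
        return self.generate_train_batch()


loader = ToyLoader()
first = next(loader)                     # the Python 3 way to request a batch
has_py2_next = hasattr(loader, "next")   # False on Python 3
```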
Hi,
I am trying to adapt the brats2017DataLoader3D into my own Kits2019DataLoader3D, but I ran into the following problem. My program hangs at batch = next(tr_gen) when I use the MultiThreadedAugmenter (I tried the SingleThreadedAugmenter as well, which works fine on my laptop). I also tried to debug it, but nothing happened. When I inspect the processes on my machine with htop, I see many processes created by the program that are still alive even after I have terminated it. I am therefore wondering whether the problem lies in my code or in the MultiThreadedAugmenter.
`if __name__ == "__main__":
    preprocessed_folders = "./npy_data"
    num_of_threads_for_kits19 = 4
    patients = get_list_of_patients(preprocessed_data_folder=preprocessed_folders)
    train, val = get_split_deterministic(patients, fold=0, num_splits=2, random_state=12345)
    patch_size = (128, 160, 160)
    batch_size = 4
    # dataloader = Kits2019DataLoader3D(train, batch_size, patch_size, 1)
    # batch = next(dataloader)
    # try:
    #     from batchviewer import view_batch
    #     # batch viewer can show up to 4d tensors. We can show only one sample, but that should be sufficient here
    #     view_batch(batch['data'][0], batch['seg'][0])
    # except ImportError:
    #     view_batch = None
    #     print("you can visualize batches with batchviewer. It's a nice and handy tool. You can get it here: "
    #           "https://github.com/FabianIsensee/BatchViewer")
    # now we have some DataLoader. Let's go and get some augmentations
    # first let's collect all shapes, you will see why later
    shapes = [Kits2019DataLoader3D.load_patient(
        i)[0].shape[1:] for i in patients]
    max_shape = np.max(shapes, 0)
    max_shape = np.max((max_shape, patch_size), 0)
    # we create a new instance of DataLoader. This one will return batches of shape max_shape. Cropping/padding is
    # now done by SpatialTransform. If we do it this way we avoid border artifacts (the entire brain of all cases will
    # be in the batch and SpatialTransform will use zeros which is exactly what we have outside the brain)
    # this is viable here but not viable if you work with different data. If you work for example with CT scans that
    # can be up to 500x500x500 voxels large then you should do this differently. There, instead of using max_shape you
    # should estimate what shape you need to extract so that subsequent SpatialTransform does not introduce border
    # artifacts
    dataloader_train = Kits2019DataLoader3D(
        train, batch_size, patch_size, num_of_threads_for_kits19)
    # during training I like to run a validation from time to time to see where I am standing. This is not a correct
    # validation because just like training this is patch-based but it's good enough. We don't do augmentation for the
    # validation, so patch_size is used as shape target here
    dataloader_validation = Kits2019DataLoader3D(
        val, batch_size, patch_size, num_of_threads_for_kits19)
    tr_transforms = get_train_transform(patch_size)
    # tr_gen = SingleThreadedAugmenter(dataloader_train, transform=tr_transforms)
    # val_gen = SingleThreadedAugmenter(dataloader_validation, transform=None)
    # finally we can create multithreaded transforms that we can actually use for training
    # we don't pin memory here because this is pytorch specific.
    tr_gen = MultiThreadedAugmenter(dataloader_train, tr_transforms, num_processes=num_of_threads_for_kits19,
                                    num_cached_per_queue=3,
                                    seeds=None, pin_memory=False)
    # we need fewer processes for validation because we don't apply transformations
    val_gen = MultiThreadedAugmenter(dataloader_validation, None,
                                     num_processes=max(1, num_of_threads_for_kits19 // 2), num_cached_per_queue=1,
                                     seeds=None,
                                     pin_memory=False)
    # lets start the MultiThreadedAugmenter. This is not necessary but allows them to start generating training
    # batches while other things run in the main thread
    tr_gen.restart()
    val_gen.restart()
    # now if this was a network training you would run epochs like this (remember tr_gen and val_gen generate
    # infinite examples! Don't do "for batch in tr_gen:"!!!):
    '''
    num_batches_per_epoch = 10
    num_validation_batches_per_epoch = 3
    num_epochs = 5
    # let's run this to get a time on how long it takes
    time_per_epoch = []
    start = time()
    for epoch in range(num_epochs):
        start_epoch = time()
        for b in range(num_batches_per_epoch):
            batch = next(tr_gen)
            # do network training here with this batch
        for b in range(num_validation_batches_per_epoch):
            batch = next(val_gen)
            # run validation here
        end_epoch = time()
        time_per_epoch.append(end_epoch - start_epoch)
    end = time()
    total_time = end - start
    print("Running %d epochs took a total of %.2f seconds with time per epoch being %s" %
          (num_epochs, total_time, str(time_per_epoch)))
    '''
    # if you notice that you have CPU usage issues, reduce the probability with which the spatial transformations are
    # applied in get_train_transform (down to 0.1 for example). SpatialTransform is the most expensive transform
    from batchviewer import view_batch
    # if you wish to visualize some augmented examples, install batchviewer and uncomment this
    if view_batch is not None:
        for _ in range(4):
            batch = next(tr_gen)
            view_batch(batch['data'][0], batch['seg'][0])
    else:
        print("Cannot visualize batches, install batchviewer first. It's a nice and handy tool. You can get it here: "
              "https://github.com/FabianIsensee/BatchViewer")`
Best,
Yucheng
There is a command 'cd dldabg', but I could not find that folder.
Hello,
I get the following error when I reproduce your code (MIC-DKFZ/batchgenerators):
mirror_transform = MirrorTransform(axes=(2, 3))
File "/home/xy/hb/MIC/batchgenerators-master/batchgenerators/transforms/spatial_transforms.py", line 199, in __init__
raise ValueError("MirrorTransform now takes the axes as the spatial dimensions. What previously was "
ValueError: MirrorTransform now takes the axes as the spatial dimensions. What previously was axes=(2, 3, 4) to mirror along all spatial dimensions of a 5d tensor (b, c, x, y, z) is now axes=(0, 1, 2). Please adapt your scripts accordingly
What is the reason? Looking forward to your reply!
Sincerely,
Bing
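Going by the error message itself, the axes are now counted over the spatial dimensions only, so each old index on a (b, c, x, y, z) tensor shifts down by two. A small sketch of that conversion; `convert_mirror_axes` is a hypothetical helper, not a library function.

```python
def convert_mirror_axes(old_axes, num_leading_dims=2):
    """Map axes counted on (b, c, x, y, z) to axes counted on (x, y, z)."""
    new_axes = []
    for axis in old_axes:
        if axis < num_leading_dims:
            raise ValueError(f"axis {axis} is a batch/channel axis, not a spatial one")
        new_axes.append(axis - num_leading_dims)
    return tuple(new_axes)


all_spatial = convert_mirror_axes((2, 3, 4))   # (0, 1, 2), as stated in the error message
from_issue = convert_mirror_axes((2, 3))       # (0, 1)
```

With the converted values, the call from the report would become `MirrorTransform(axes=(0, 1))`.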
I changed the dataset to ISLES2018 and resized it to (32, 128, 128). When the data are generated in batches, the program gets stuck. After debugging, I found that it works once SpatialTransform is removed. The data augmentation is very slow here, whereas it was not slow with BraTS2017. Why is that? Can anyone help me?
Hi,
I just wanted to let you know that I have some issues with the latest release.
The main issue is a massive slowdown in our CI/CD (from about 45-50 minutes for all jobs in the matrix to 14 hours or more).
Here is a build with the latest release (0.19.4, installed from PyPI) and here is the same build with release 0.19.3 (installed from PyPI). These builds are completely identical (besides the batchgenerators version).
Unfortunately, I have not had time to pinpoint the error yet.
Best,
Justus
Hi DKFZ Team,
Due to the examples and the utilities, you have a new requirement that isn't reflected in the requirements.txt file: sklearn is used for data splitting.
Since you probably don't want the whole package to have this requirement (it is only needed for the examples), I propose the following changes:
Move examples to the outermost scope and move utilities inside examples. This way, you could specify a separate requirements file for the examples.
While working on this, I'd also recommend moving the tests to the outer scope and using find_packages from setuptools inside the setup, since this makes the whole setup process a bit more flexible.
These are just minor modifications, but the missing requirement in particular could prevent future releases from being installed correctly.
Best,
Justus
EDIT: In fact, this actually prevents batchgenerators from being installed correctly. If you agree with the proposed changes, I'd open a PR for this.
Hey,
Thank you for this great library, and for all the other open source code from DKFZ!
I think I have a way to make the elastic transform (slightly) faster. The idea is to build the deformation grid at a lower resolution than the image and then use linear interpolation to compute the full displacement field. Depending on the image size this yields a speed improvement (I get a 25% speed-up on 128**3 images with a grid subsampled by a factor of 20).
Of course, this cannot be applied when sigma is too small relative to the subsampling factor, because the Gaussian filtering then no longer behaves sensibly.
#subfact is a subsampling factor on the full image grid, typically 10 to have good perfs
momenta = np.random.uniform(size=(h // subfact, d // subfact, w // subfact)) * 2 - 1
velocity_field = gaussian_filter(momenta, self.elastic_sigma / self.subfact, mode='constant', cval=0) * self.elastic_alpha
coords[dim] += map_coordinates(velocity_field, coordinates_to_map, order=1, mode='constant', cval=0)
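To make the idea concrete, here is a dependency-light sketch of the coarse-grid approach for a single displacement component. It is not the code from the snippet above: it uses NumPy only, with a hand-written separable Gaussian (zero padded) in place of `gaussian_filter` and per-axis `np.interp` in place of `map_coordinates`, and all function names are hypothetical. It assumes `sigma / subfact` stays reasonably large, which is the caveat already mentioned.

```python
import numpy as np


def smooth_along_axis(field, sigma, axis):
    """Gaussian-smooth `field` along one axis, treating values outside as zero."""
    radius = max(1, int(3.0 * sigma + 0.5))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    def conv(line):
        # 'full' convolution, then keep the part aligned with the input line
        return np.convolve(line, kernel, mode="full")[radius:radius + line.size]

    return np.apply_along_axis(conv, axis, field)


def upsample_along_axis(field, new_len, axis):
    """Linearly interpolate `field` to `new_len` samples along one axis."""
    old_pos = np.linspace(0.0, 1.0, field.shape[axis])
    new_pos = np.linspace(0.0, 1.0, new_len)
    return np.apply_along_axis(lambda line: np.interp(new_pos, old_pos, line), axis, field)


def coarse_displacement_component(shape, alpha, sigma, subfact, rng):
    """Random smooth displacement built on a grid `subfact` times coarser than `shape`."""
    coarse_shape = tuple(max(2, s // subfact) for s in shape)
    field = rng.uniform(-1.0, 1.0, size=coarse_shape)
    for axis in range(len(shape)):
        # sigma is given in full-resolution voxels, so rescale it for the coarse grid
        field = smooth_along_axis(field, sigma / subfact, axis)
    field = field * alpha
    for axis, new_len in enumerate(shape):
        field = upsample_along_axis(field, new_len, axis)
    return field


rng = np.random.default_rng(0)
displacement = coarse_displacement_component((64, 64, 64), alpha=100.0, sigma=16.0,
                                             subfact=8, rng=rng)
```

One such component would be generated per spatial dimension and added to the corresponding coordinate grid.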
Hi, does normalizing the raw data (BraTS2018) first have an impact on some of the data augmentations?
Hi,
I'm trying to cite you using the entry on Zenodo. The names are given as Isensee Fabian; Jäger Paul, etc., so the BibTeX entry is not generated correctly. They should be changed to Isensee, Fabian; Jäger, Paul, etc.
By the way, you can automate releases and metadata like this using a Zenodo config file, as the NiBabel developers do.
I am coming to this package from Medical Detection Toolkit and the paper and the toolkit claim to do "patching" but it looks to me like what is called "patching" is just cropping (random or center).
Is there any way in this toolbox to take a large image/volume and return a batch of patches (say, input a 100 x 100 image and return 4 25x25 patches)?
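Independently of what the toolbox offers, non-overlapping tiling can be done with a plain NumPy reshape. The sketch below is a minimal 2D version with a hypothetical helper name; it assumes the image size is divisible by the patch size. Note that a 100 x 100 image tiles into sixteen 25 x 25 patches or four 50 x 50 patches.

```python
import numpy as np


def tile_into_patches(image, patch_size):
    """Split a 2D image into non-overlapping patches, returned as (n, ph, pw)."""
    h, w = image.shape
    ph, pw = patch_size
    if h % ph or w % pw:
        raise ValueError("image size must be divisible by patch size")
    # (rows, ph, cols, pw) -> (rows, cols, ph, pw) -> (rows * cols, ph, pw)
    tiles = image.reshape(h // ph, ph, w // pw, pw).swapaxes(1, 2)
    return tiles.reshape(-1, ph, pw)


image = np.arange(100 * 100).reshape(100, 100)
patches_50 = tile_into_patches(image, (50, 50))   # 4 patches
patches_25 = tile_into_patches(image, (25, 25))   # 16 patches
```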
Hi,
are there any plans to support the Nvidia Dali library for fast GPU based augmentation in the future?
(https://github.com/NVIDIA/DALI, https://www.basicml.com/performance/2019/04/16/pytorch-data-augmentation-with-nvidia-dali)
Best
Andy
raise ValueError("MirrorTransform now takes the axes as the spatial dimensions. What previously was "
ValueError: MirrorTransform now takes the axes as the spatial dimensions. What previously was axes=(2, 3, 4) to mirror along all spatial dimensions of a 5d tensor (b, c, x, y, z) is now axes=(0, 1, 2). Please adapt your scripts accordingly.
Hi
Can I install batchgenerators in an Anaconda environment?
Hi DKFZ,
I would like to cite the batchgenerators repository in a publication, as it's a pretty awesome framework.
Any preferred way on how to do so?
Best,
Dear DKFZ,
Thanks for the great repo.
I want to use this tool for offline augmentation on Windows 10, and I followed the code in examples/brats2017. Everything works well except the multiprocessing.
I have pasted the error information below. Would it be possible for you to tell me how to solve the problem?
My goal is offline augmentation. I am not after efficiency; I only want it to work.
def main():
    brats_preprocessed_folder = r"Pathto\BraTS2017_preprocessed"
    num_threads_for_brats_example = 1
    patients = get_list_of_patients(brats_preprocessed_folder)
    train, val = get_split_deterministic(patients, fold=0, num_splits=2, random_state=12345)
    patch_size = (128, 128, 128)
    batch_size = 2
    dataloader = BraTS2017DataLoader3D(train, batch_size, patch_size, 1)
    batch = next(dataloader)
    # first let's collect all shapes, you will see why later
    shapes = [BraTS2017DataLoader3D.load_patient(i)[0].shape[1:] for i in patients]
    max_shape = np.max(shapes, 0)
    max_shape = np.max((max_shape, patch_size), 0)
    # artifacts
    dataloader_train = BraTS2017DataLoader3D(train, batch_size, max_shape, 1)
    tr_transforms = get_train_transform(patch_size)
    tr_gen = MultiThreadedAugmenter(dataloader_train, tr_transforms, num_processes=num_threads_for_brats_example,
                                    num_cached_per_queue=3,
                                    seeds=None, pin_memory=False)
    tr_gen.restart()
    num_batches_per_epoch = 2
    num_epochs = 1
    # let's run this to get a time on how long it takes
    time_per_epoch = []
    start = time()
    for epoch in range(num_epochs):
        start_epoch = time()
        for b in range(num_batches_per_epoch):
            batch = next(tr_gen)
            print(batch['data'][0].shape)
            # do network training here with this batch
        end_epoch = time()
        time_per_epoch.append(end_epoch - start_epoch)
    end = time()
    total_time = end - start
    print("Running %d epochs took a total of %.2f seconds with time per epoch being %s" %
          (num_epochs, total_time, str(time_per_epoch)))
if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()
    main()
Following error occurred:
runfile('E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py')
Traceback (most recent call last):
File "<ipython-input-1-4598568443c2>", line 1, in <module>
runfile('E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py')
File "D:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "D:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py", line 229, in <module>
main()
File "E:/Data/DataAug/BatchGenerator/brats2017_dataloader_3D.py", line 198, in main
tr_gen.restart()
File "E:\Data\DataAug\BatchGenerator\batchgenerators\dataloading\multi_threaded_augmenter.py", line 254, in restart
self._start()
File "E:\Data\DataAug\BatchGenerator\batchgenerators\dataloading\multi_threaded_augmenter.py", line 224, in _start
self._processes[-1].start()
File "D:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "D:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "D:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 172, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
I am looking forward to your reply.
Hi,
Thanks for this great work. I saw that you used multithread to build a high-performance data loader. What would be its advantage over Framework's Dataloader?
Hello! First of all, thanks for your code. It looks pretty good. But where is the module named 'builtins'? Thanks for your help.
Hi,
It may be that the pin_memory option in the MultiThreadedAugmenter has a memory leak, causing the RAM to overflow. This could happen because the processes started with this command:
batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py
Lines 150 to 154 in 7b92930
are not properly stopped again.
In the PyTorch DataLoader, the following comment is given just before the workers are shut down
(https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html):
Exit pin_memory_thread first because exiting workers may leave corrupted data in worker_result_queue which pin_memory_thread reads from.
Maybe the problem can be solved if the workers are closed explicitly. I can try to fix it in the next couple of days.
Dear FabianIsensee,
1. I have looked at the code of brats2017_dataloader_3D.py and saw that the same transforms appear twice (GammaTransform and GaussianNoiseTransform). Why is that?
2. The spatial transformations are all applied with a probability of 0.2 per sample, so no spatial augmentation happens with a probability of 0.8 per sample.
The other transformations are applied with a probability of 0.18 per sample, no augmentation with a probability of 0.75 per sample.
There are 6 transformations in total. This means that 1-(0.8x0.85x0.85x0.85x0.85x0.85) = 74% of the samples will be augmented. Is this too much?
I hope you can reply :) Thanks!
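As a quick check of the arithmetic, assuming the transforms are applied independently: the probability that a sample receives at least one augmentation is one minus the product of the individual "not applied" probabilities. Evaluating the expression exactly as written in the question gives roughly 64.5% rather than 74%. The variable names are illustrative only.

```python
import math

# "Not applied" probabilities exactly as they appear in the expression above:
# one spatial transform (0.8) and five other transforms (0.85 each).
p_not_applied = [0.8, 0.85, 0.85, 0.85, 0.85, 0.85]

p_untouched = math.prod(p_not_applied)   # sample passes through all six unchanged
p_augmented = 1.0 - p_untouched          # at least one transform is applied

print(f"{p_augmented:.3f}")  # 0.645
```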
From what I've read in the README, the input data values must be arrays. Is there any functionality that allows me to have the values as a list of filenames so that the generator loads files on the fly (with augmentation)?
i.e.
# getting paths for data
from glob import glob
local_train_path = '/content/training/'
local_label_path = '/content/labels/'
# '**/*.nii' is needed for recursive matching; '**.nii' behaves like '*.nii'
train_paths = glob(local_train_path + '**/*.nii', recursive=True)
mask_paths = glob(local_label_path + '**/*.nii', recursive=True)
data_dict = {'data': train_paths, 'seg': mask_paths}
Thank you for your time and consideration.
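One way this is commonly handled is to keep only the file paths in the loader and read the files inside the batch-generating method, so that loading happens on the fly. The sketch below is self-contained and does not import batchgenerators: `LazyFileLoader` and `load_fn` are hypothetical, `.npy` files stand in for NIfTI volumes, and in a real pipeline the same logic would presumably live in `generate_train_batch` of a loader subclass.

```python
import os
import tempfile

import numpy as np


class LazyFileLoader:
    """Hypothetical loader that stores paths and reads files only when a batch is requested."""

    def __init__(self, image_paths, seg_paths, batch_size, load_fn=np.load, seed=None):
        assert len(image_paths) == len(seg_paths)
        self.image_paths = list(image_paths)
        self.seg_paths = list(seg_paths)
        self.batch_size = batch_size
        self.load_fn = load_fn
        self.rng = np.random.default_rng(seed)

    def generate_train_batch(self):
        idx = self.rng.integers(0, len(self.image_paths), size=self.batch_size)
        # [None] adds a channel axis, giving the (b, c, x, y, z) layout
        data = np.stack([self.load_fn(self.image_paths[i])[None] for i in idx])
        seg = np.stack([self.load_fn(self.seg_paths[i])[None] for i in idx])
        return {"data": data.astype(np.float32), "seg": seg.astype(np.float32)}

    def __iter__(self):
        return self

    def __next__(self):
        return self.generate_train_batch()


# Toy files so the example runs anywhere.
tmp_dir = tempfile.mkdtemp()
image_paths, seg_paths = [], []
for i in range(3):
    image_paths.append(os.path.join(tmp_dir, f"img_{i}.npy"))
    seg_paths.append(os.path.join(tmp_dir, f"seg_{i}.npy"))
    np.save(image_paths[-1], np.random.rand(8, 8, 8))
    np.save(seg_paths[-1], np.zeros((8, 8, 8)))

loader = LazyFileLoader(image_paths, seg_paths, batch_size=2, seed=0)
batch = next(loader)
```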
Hi, great work with Batchgenerators!
The MultiThreadedAugmenter leads to some non-deterministic behaviour even when it is seeded. After some investigation, I found that only the numpy.random seed is set by the producer. Additionally setting the seed for Python's built-in random module fixed my issues. Maybe you could add that seed to the producer, because your brightness augmentation uses random.uniform, too.
batchgenerators/batchgenerators/augmentations/color_augmentations.py
Lines 65 to 71 in e86abab
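The fix described in this report can be sketched as follows; `seed_worker` is a hypothetical helper, not the library's producer code. It seeds both NumPy's global generator and Python's `random` module, since each keeps its own state.

```python
import random

import numpy as np


def seed_worker(seed):
    """Seed both random number sources a transform might draw from."""
    np.random.seed(seed)
    random.seed(seed)


def draw():
    """One draw from each source, as a mixed pipeline might do."""
    return random.uniform(0.0, 1.0), float(np.random.uniform(0.0, 1.0))


seed_worker(1234)
first = draw()
seed_worker(1234)
second = draw()   # identical to `first` because both generators were reseeded
```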
Is there an example of what data structure and key names should we use for object detection labels? I'm trying to use this together with medicaldetectiontoolkit and I couldn't find an object detection example pipeline, only for segmentation.
Hi,
I am trying to run TractSeg training, and not sure about batchgenerators version I should use.
I have tried the master and got 'PadToMultipleTransform' missing. I have also tried 'tractseg_stable' branch and there I get 'FlipVectorAxisTransform' that is missing.
I will appreciate your help.
Thanks,
Ilya
Dear Fabian,
I tried to run the example but got an error :
ImportError: cannot import name 'Mirror'
I checked spatial_transforms.py but only found MirrorTransform.
I am confused and cannot fix it; I hope you can help me.
Best,
Ce Liang
Hi,
I've got a quick question:
Would it be possible to register batchgenerators on PyPI? This would be really helpful for developing packages that depend on it, since it would make the separate installation step (needed right now) unnecessary.
Thanks!
Justus
Hi,
Thanks for sharing your augmentation code. I have a question about ''augment_channel_translation'' in ''spatial_transforms.py''.
What is the purpose of this augmentation method? Is it meant to reduce the impact of poor registration between different modalities of one object? (Different modalities: e.g. RGB and depth in natural images; CT, MRI and PET in medical images. Each modality is one channel.)
Thanks.
Best,
Eric Kani
Can't pickle local object 'MultiThreadedAugmenter._start.<locals>.producer'
Exception ignored in: <bound method MultiThreadedAugmenter.__del__ of <batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter object at 0x000001F3980FDA90>>
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\batchgenerators\dataloading\multi_threaded_augmenter.py", line 144, in __del__
self._finish()
File "C:\ProgramData\Anaconda3\lib\site-packages\batchgenerators\dataloading\multi_threaded_augmenter.py", line 130, in _finish
thread.terminate()
File "C:\ProgramData\Anaconda3\Lib\multiprocessing\process.py", line 116, in terminate
self._popen.terminate()
AttributeError: 'NoneType' object has no attribute 'terminate'
485.19 core.mod.core.ioUtil ERROR: File '-c' does not exist
485.21 WARNING: Failed to load command line argument: -c
485.21 core.mod.core.ioUtil ERROR: File 'from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=8892, pipe_handle=7904)' does not exist
485.22 WARNING: Failed to load command line argument: from multiprocessing.spawn import spawn_main; spawn_main(parent_pid=8892, pipe_handle=7904)
485.23 core.mod.core.ioUtil ERROR: File '--multiprocessing-fork' does not exist
485.24 WARNING: Failed to load command line argument: --multiprocessing-fork
The current installation guide is 3-line and assumes that the user has git installed. This might be rather minor but any package on GitHub can be installed in one line without git:
pip install https://github.com/MIC-DKFZ/batchgenerators/archive/master.zip
This might be preferable for the readme.
I was trying to train a 3D segmentation network.
My dataloader yields:
data: torch.Size([4, 1, 128, 128, 128])
label: torch.Size([4, 1, 128, 128, 128])
Then I applied crop augmentation and got:
data: torch.Size([4, 1, 48, 48, 48])
label: torch.Size([4, 1, 128, 128, 128])
'data' has been cropped, but 'label' has not.
How can I crop both in the same way?
Thanks!
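Whatever the cause in the pipeline, the underlying requirement is that the same slices are applied to both arrays. One thing worth checking, purely as a guess: the examples in these issues use the dictionary keys 'data' and 'seg', so an array stored under 'label' may simply be ignored by the transform. A minimal NumPy sketch of a paired crop with a hypothetical helper name, assuming (b, c, x, y, z) layout:

```python
import numpy as np


def random_crop_pair(data, label, crop_size, rng):
    """Crop `data` and `label` with identical, randomly placed spatial slices."""
    spatial = data.shape[2:]
    assert label.shape[2:] == spatial, "data and label must share spatial shape"
    starts = [int(rng.integers(0, s - c + 1)) for s, c in zip(spatial, crop_size)]
    slices = (slice(None), slice(None)) + tuple(
        slice(st, st + c) for st, c in zip(starts, crop_size))
    return data[slices], label[slices]


rng = np.random.default_rng(0)
data = rng.random((4, 1, 64, 64, 64))
label = data > 0.5                       # toy label derived from the data
data_crop, label_crop = random_crop_pair(data, label, (48, 48, 48), rng)
```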
Hi,
"batchgenerators.utilities" is missing in setup.py.
After installing 0.19.1 with pip or with setuptools I run into the following problem:
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)] on win32
>>> import batchgenerators
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "...\batchgenerators\__init__.py", line 7, in <module>
import batchgenerators.utilities
ModuleNotFoundError: No module named 'batchgenerators.utilities'
Have a nice day!
@FabianIsensee Hi, I'm interested in your work and am trying to augment 3D medical images with it. I built on the example you provide in the project; the code is below. On my machine it seems to go into an infinite loop and stays in the data augmentation phase: it has been running for about a day and training has not started.
class DataLoader(DataLoaderBase):
    def __init__(self, data, BATCH_SIZE=2, num_batches=None, seed=False):
        super(DataLoader, self).__init__(data, BATCH_SIZE, num_batches, seed)
        # data is now stored in self._data.
    def generate_train_batch(self):
        # usually you would now select random instances of your data. We only have one therefore we skip this
        img = self._data[0:4]
        seg_data = self._data[4]
        # Our batch layout must be (b, c, x, y, z). Let's fix that
        img = np.tile(img[None], (self.BATCH_SIZE, 1, 1, 1, 1))
        seg = np.tile(seg_data[None, None], (self.BATCH_SIZE, 1, 1, 1, 1))
        print('img shape:', img.shape)
        print('seg shape:', seg.shape)
        # now construct the dictionary and return it. np.float32 cast because most networks take float
        return {'data': img.astype(np.float32), 'seg': seg.astype(np.float32)}
def batch_generator_augment(data, truth, affine, batch_size):
    data_list = list()
    for data_index in range(data.shape[0]):
        image = get_image(data[data_index], affine)
        data_list.append(image.get_data())
    truth_image = get_image(truth, affine).get_data()
    data_list.append(truth_image)
    data = np.asarray(data_list)
    print("data.shape:", data.shape)
    batchgen = DataLoader(data, batch_size, None, False)
    tr_transforms = []
    tr_transforms.append(DataChannelSelectionTransform([0, 1, 2, 3]))
    tr_transforms.append(MirrorTransform())
    tr_transforms.append(SpatialTransform(truth.shape, np.array(truth.shape) // 2,
                                          do_elastic_deform=True, alpha=(0., 1300.), sigma=(10., 13.),
                                          do_rotation=True, angle_x=(0., 2*np.pi), angle_y=(0., 2*np.pi), angle_z=(0., 2*np.pi),
                                          do_scale=True, scale=(0.75, 1.25), border_mode_data='nearest',
                                          border_cval_data=0, order_data=3, border_mode_seg='constant', border_cval_seg=0,
                                          order_seg=0, random_crop=True))
    tr_transforms.append(ContrastAugmentationTransform((0.3, 3.), True))
    tr_transforms.append(GammaTransform((0.6, 2), False))
    tr_transforms.append(BrightnessTransform(0.0, 0.1, True))
    tr_transforms.append(SegChannelSelectionTransform([0]))
    #tr_transforms.append(ConvertSegToOnehotTransform(range(3), 0, "seg"))
    #singlethreaded_generator = SingleThreadedAugmenter(batchgen, Compose(tr_transforms))
    # plot_batch(singlethreaded_generator.__next__())
    gen_train = MultiThreadedAugmenter(batchgen, Compose(tr_transforms), 5, 3, None)
    gen_train.restart()
    return convert_to_data(gen_train)
def convert_to_data(gen_train):
    i = 0
    for data_dict in gen_train:
        print("cycle:", i+1)
        data = data_dict["data"].astype(np.float32)
        seg = data_dict["seg"]
        print('before data shape:', data.shape)
        print('before seg shape:', seg.shape)
    return data, seg
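As the example comments quoted earlier on this page warn, these augmenters produce batches indefinitely, so a plain `for` loop over one never finishes on its own. A generic sketch of pulling a fixed number of batches instead; `endless_batches` is a toy stand-in for the augmenter, not library code.

```python
import itertools


def endless_batches():
    """Toy stand-in for an augmenter that never raises StopIteration."""
    step = 0
    while True:
        step += 1
        yield {"data": step, "seg": -step}


gen = endless_batches()
first = next(gen)                                # take exactly one batch
next_three = list(itertools.islice(gen, 3))      # or a bounded number of batches
```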
Dear DKFZ,
Depending on the batch size and augmentation types, I experience performance problems where on-the-fly augmentation significantly slows down my training, i.e. the CPU is at 100% load but still cannot augment samples fast enough for the GPU to crunch through them.
I did some profiling of your code and noticed that this is especially the case when scipy.ndimage.interpolation.map_coordinates is involved. I already changed the interpolation order to 1, but still experience a significant slowdown for large batch sizes or large patch volumes. Do you have any suggestions for improvements?
I thought about integrating ITK, which now has a Python wrapper that allows memory sharing between np.ndarray and itk.image objects, and performing the actual deformation in ITK with itk.WarpImageFilter. What are your thoughts on this? Have you maybe tried it already?
It seems that the tests folder is installed to site-packages/tests. It should be installed to site-packages/batchgenerators/tests, or not installed at all.
Dear Fabian Isensee,
Hi. I am new to Python and Keras. I use your batchgenerators: I gave the data loader the entire dataset and added some transformations. How can I use it like Keras' ImageDataGenerator to train my model? Do I have to write a custom Keras generator around your generator?
Thank you in advance.
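A thin wrapper generator is usually enough. The sketch below assumes the augmenter yields dictionaries with 'data' and 'seg' arrays in (b, c, x, y, ...) layout, as in the examples on this page, and that the model expects channels-last `(inputs, targets)` tuples, which is what Keras accepts from a Python generator; `as_keras_generator` and `toy_augmenter` are hypothetical names.

```python
import numpy as np


def as_keras_generator(augmenter, channels_last=True):
    """Turn dict batches into (inputs, targets) tuples for a Keras-style training loop."""
    while True:
        batch = next(augmenter)
        x, y = batch["data"], batch["seg"]
        if channels_last:
            # (b, c, x, y, z) -> (b, x, y, z, c)
            x = np.moveaxis(x, 1, -1)
            y = np.moveaxis(y, 1, -1)
        yield x, y


def toy_augmenter():
    """Stand-in for the real augmenter so the example is self-contained."""
    while True:
        yield {"data": np.zeros((2, 4, 16, 16, 16), np.float32),
               "seg": np.zeros((2, 1, 16, 16, 16), np.float32)}


keras_gen = as_keras_generator(toy_augmenter())
x, y = next(keras_gen)
```

With a real model the generator would be passed along the lines of `model.fit(keras_gen, steps_per_epoch=...)`; the exact call depends on the Keras version.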
Hi! I'm enjoying exploring this library, awesome contribution! I had one question though: what was the rationale for using a **kwargs version of the data dictionary (**data_dict) in the __call__ function of the transforms?
My use case is that I want to use your transforms through PyTorch's Dataset and DataLoader, but there the data is passed as a normal positional argument, e.g. data_dict instead of **data_dict. This makes it hard to get your transforms to inter-operate with PyTorch's existing data loading functionality. If it won't alter functionality or usability, would you consider changing your transform __call__ functions to accept a positional data_dict argument instead? I believe that would allow them to interface with PyTorch without altering existing functionality.
Of course, there may be a reason that I'm not aware of :)
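A small adapter can bridge the two calling conventions without changing the library. This is only a sketch: `PositionalAdapter` and `toy_transform` are hypothetical, and it assumes the wrapped transform expects a leading batch axis, which the adapter adds and removes for single samples.

```python
import numpy as np


class PositionalAdapter:
    """Wrap a `transform(**data_dict)` callable so it accepts one positional dict."""

    def __init__(self, transform, add_batch_axis=True):
        self.transform = transform
        self.add_batch_axis = add_batch_axis

    def __call__(self, sample):
        if self.add_batch_axis:
            # single sample (c, ...) -> batch of one (1, c, ...)
            sample = {key: value[None] for key, value in sample.items()}
        out = self.transform(**sample)
        if self.add_batch_axis:
            out = {key: value[0] for key, value in out.items()}
        return out


def toy_transform(**data_dict):
    """Stand-in for a keyword-style transform."""
    data_dict["data"] = data_dict["data"] * 2.0
    return data_dict


adapter = PositionalAdapter(toy_transform)
sample = {"data": np.ones((1, 8, 8), np.float32), "seg": np.zeros((1, 8, 8), np.float32)}
result = adapter(sample)
```

An adapter of this kind could be called from a Dataset's `__getitem__`, where each sample is a plain dictionary.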
Hi,
The transforms are very comprehensive and useful, but I was wondering whether they are compatible with torchvision.transforms.Compose().
I noticed that batchgenerators also provides a Compose() function and a DataLoader. However, my work is mostly built on PyTorch and its related tools, and I hope to use the batchgenerators transforms in PyTorch's DataLoader, which would minimize my workload.
Best ,
Yucheng
There may be a small syntax error in line 439 of batchgenerators/augmentations/utils.py since a recent commit.
Changing the line 439:
roi_masks = np.zeros((seg.shape[0], n_max_gt, *seg.shape[2:]))
to:
roi_masks = np.zeros((seg.shape[0], n_max_gt, seg.shape[2:]))
solved the problem.
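For reference: the starred form `(a, b, *shape)` is only valid syntax on Python 3.5 and later, while the replacement shown above nests a tuple inside the shape, which NumPy rejects. A form that also works on older interpreters is plain tuple concatenation. The sketch uses a dummy `seg` array and an assumed `n_max_gt` value.

```python
import numpy as np

seg = np.zeros((2, 1, 16, 16))   # dummy segmentation batch (b, c, x, y)
n_max_gt = 3                     # assumed value, for illustration only

# Version-independent: concatenate tuples instead of star-unpacking.
roi_masks = np.zeros((seg.shape[0], n_max_gt) + tuple(seg.shape[2:]))

# The nested-tuple variant is not a valid shape.
try:
    np.zeros((seg.shape[0], n_max_gt, seg.shape[2:]))
    nested_tuple_ok = True
except (TypeError, ValueError):
    nested_tuple_ok = False
```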
Can batchgenerators be used with Python 3.6? Thank you!
Dear FabianIsensee:
I am using the brats2017_preprocessing example. I found that the training samples are normalized during preprocessing together with the label, but the test data only contain images. What should I do?
I'm working in inverse problems and quite commonly I have a set of inputs (e.g. noisy volumes) and some outputs (noiseless volume). batchgenerators looks like a perfect fit for data augmentation, but it seems to be designed for image -> segmentation. Is there any way to work with image -> image problems, in that case how do I call the library?
Hi,
I have used batchgenerators for quite a while now and it usually worked fine. I just wrote a custom dataset (which doesn't do anything special: it loads images with skimage.io.imread, resizes them, and loads labels from a preloaded list and from files with np.loadtxt).
Suddenly I got the following error:
File "/home/students/schock/Delira/delira/training/trainer.py", line 256, in _train_epoch
for i, batch in pbar:
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/site-packages/tqdm/_tqdm.py", line 979, in __iter__
for obj in iterable:
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 82, in __next__
self._start()
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 122, in _start
self._threads[-1].start()
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/home/temp/schock/anaconda3/envs/delira/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'MultiThreadedAugmenter._start.<locals>.producer'
Is this a known issue or are there any known workarounds? I honestly don't know why it happened this time and where the error comes from.
My setup:
Ubuntu 18.04
Python: 3.6 (Conda environment)
Thanks in advance!
Justus
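The last frames of the traceback above go through popen_spawn_posix and reduction.dump, which suggests the worker process is being started with the "spawn" start method, where the process object is pickled before launch. The standard pickle module cannot serialize a function defined inside another function, which matches the final error message. A minimal sketch of that behaviour, independent of batchgenerators (`is_picklable` and `make_local_producer` are hypothetical names used only for illustration):

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError, TypeError):
        return False


def make_local_producer():
    # Mimics the pattern named in the traceback: a function defined
    # inside another function (a stand-in, not the library's code).
    def producer():
        return 42
    return producer


if __name__ == "__main__":
    # A function that can be looked up by name pickles fine.
    print(is_picklable(len))
    # A local function cannot be looked up by name, so pickling fails.
    print(is_picklable(make_local_producer()))
```

If this matches the situation, checking whether anything in the environment calls multiprocessing.set_start_method("spawn") may be a useful first step. That is a guess based on the traceback, not a confirmed diagnosis.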
Hi,
Thanks for sharing your augmentation code. I'm studying medical image preprocessing and would like some advice.
In your paper "An attempt at beating the 3D U-Net", you mentioned that you made use of scaling, rotations, brightness, contrast, gamma and Gaussian noise augmentations. Would you mind sharing some details about your strategy, such as whether you apply these augmentations separately or combine them in a specific order?
I would appreciate it very much if you are willing to give me some help!
Best,
Chen
Hi, the version of batchgenerators on PyPI is 0.19.3, but the version on GitHub is 0.19.2. Could they be synced? Thanks.
Hi,
I'm not sure if this is a bug, an error on my part, or just the way things are expected to be, but I thought it was odd, so I figured I'd bring it to your attention.
I've recently switched from looking at 2D images to 3D volumes, and I've found that the slowdown in augmenting (using SpatialTransform) is significantly higher than I would have expected.
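I did some tests using dummy code from @FabianIsensee in #5 and compared the speed of the augmenter there using:
test 1: 2D batches of shape (32, 1, 256, 256) with patch size (128, 128)
test 2: 3D batches of shape (32, 1, 256, 256, 256) with patch size (128, 128, 128)
and found that with the SingleThreadedAugmenter, generation took ~0.2s/batch for test 1, and 57s/batch for test 2 -- a factor of about 300, rather than the 128 I was expecting.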
Actual code here:
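from batchgenerators.transforms.color_transforms import \
    BrightnessMultiplicativeTransform
from batchgenerators.transforms.spatial_transforms import SpatialTransform
from batchgenerators.transforms.abstract_transforms import Compose
from batchgenerators.transforms.sample_normalization_transforms import \
    MeanStdNormalizationTransform
from batchgenerators.dataloading.data_loader import SlimDataLoaderBase
# The two augmenters used below were not imported in the original snippet
from batchgenerators.dataloading.single_threaded_augmenter import \
    SingleThreadedAugmenter
from batchgenerators.dataloading.multi_threaded_augmenter import \
    MultiThreadedAugmenter
import numpy as np
from time import time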
# First 2D
class DummyLoader(SlimDataLoaderBase):
    def __init__(self):
        super(DummyLoader, self).__init__(None, None, None)

    def generate_train_batch(self):
        return {'data': np.random.random((32, 1, 256, 256))}
transforms = Compose([
    BrightnessMultiplicativeTransform(multiplier_range=(0.7, 1.3),
                                      per_channel=True),
    SpatialTransform(patch_size=(128, 128),
                     do_elastic_deform=True,
                     alpha=(90., 750.),
                     sigma=(9., 11.),
                     do_scale=True,
                     random_crop=False,
                     do_rotation=False,
                     order_data=1,
                     border_mode_data='reflect'),
    MeanStdNormalizationTransform(mean=[0.485],
                                  std=[0.229])
])
single_threaded_gen = SingleThreadedAugmenter(DummyLoader(), transforms)
multi_threaded_gen_one_thread = MultiThreadedAugmenter(DummyLoader(), transforms, 1, 1, None)
multi_threaded_gen_eight_threads = MultiThreadedAugmenter(DummyLoader(), transforms, 8, 1, None)
num_batches_warmup = 16
num_batches_run = 16
print("Running 2D tests")
####### SingleThreadedAugmenter #######
# warmup
_ = [next(single_threaded_gen) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(single_threaded_gen) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with SingleThreadedAugmenter in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))
####### MultiThreadedAugmenter (1 thread) #######
# warmup
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (1 thread) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))
####### MultiThreadedAugmenter (8 threads) #######
# warmup
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (8 threads) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))
# Now 3D
class DummyLoader(SlimDataLoaderBase):
    def __init__(self):
        super(DummyLoader, self).__init__(None, None, None)

    def generate_train_batch(self):
        return {'data': np.random.random((32, 1, 256, 256, 256))}
transforms = Compose([
    BrightnessMultiplicativeTransform(multiplier_range=(0.7, 1.3),
                                      per_channel=True),
    SpatialTransform(patch_size=(128, 128, 128),
                     do_elastic_deform=True,
                     alpha=(90., 750.),
                     sigma=(9., 11.),
                     do_scale=True,
                     random_crop=False,
                     do_rotation=False,
                     order_data=1,
                     border_mode_data='reflect'),
    MeanStdNormalizationTransform(mean=[0.485],
                                  std=[0.229])
])
single_threaded_gen = SingleThreadedAugmenter(DummyLoader(), transforms)
multi_threaded_gen_one_thread = MultiThreadedAugmenter(DummyLoader(), transforms, 1, 1, None)
multi_threaded_gen_eight_threads = MultiThreadedAugmenter(DummyLoader(), transforms, 8, 1, None)
num_batches_warmup = 16
num_batches_run = 16
print("Running 3D tests")
####### SingleThreadedAugmenter #######
# warmup
_ = [next(single_threaded_gen) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(single_threaded_gen) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with SingleThreadedAugmenter in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))
####### MultiThreadedAugmenter (1 thread) #######
# warmup
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_one_thread) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (1 thread) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))
####### MultiThreadedAugmenter (8 threads) #######
# warmup
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_warmup)]
# run
start = time()
_ = [next(multi_threaded_gen_eight_threads) for _ in range(num_batches_run)]
end = time()
print("Generated %d batches with MultiThreadedAugmenter (8 threads) in %f seconds; %f s/batch" % (num_batches_run, end - start, (end - start) / num_batches_run))
and output here:
Running 2D tests
Generated 16 batches with SingleThreadedAugmenter in 3.183878 seconds; 0.198992 s/batch
Generated 16 batches with MultiThreadedAugmenter (1 thread) in 3.397480 seconds; 0.212343 s/batch
Generated 16 batches with MultiThreadedAugmenter (8 threads) in 0.509470 seconds; 0.031842 s/batch
Running 3D tests
Generated 16 batches with SingleThreadedAugmenter in 919.891915 seconds; 57.493245 s/batch
Generated 16 batches with MultiThreadedAugmenter (1 thread) in 931.741837 seconds; 58.233865 s/batch
Generated 16 batches with MultiThreadedAugmenter (8 threads) in 232.098892 seconds; 14.506181 s/batch
PS: Thanks so much for your work in making this excellent package! It really is absolutely fantastic.
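To put the reported factor in context, the slowdown can be computed directly from the timings in the output above and set next to the growth in array size between the two tests. This is only arithmetic on the reported numbers and the shapes in the script, not a profile of the library:

```python
# Timings (seconds per batch) copied from the output above.
timings_2d = {"single": 0.198992, "multi_1": 0.212343, "multi_8": 0.031842}
timings_3d = {"single": 57.493245, "multi_1": 58.233865, "multi_8": 14.506181}

# Observed 3D / 2D slowdown for each augmenter configuration.
slowdown = {name: timings_3d[name] / timings_2d[name] for name in timings_2d}

# Per-sample element counts, from the shapes used in the two tests.
input_ratio = 256 ** 3 // 256 ** 2    # elements going into the transforms
output_ratio = 128 ** 3 // 128 ** 2   # elements in the output patch

if __name__ == "__main__":
    for name, factor in slowdown.items():
        print("%s: %.1fx slower in 3D" % (name, factor))
    print("input grows %dx, output patch grows %dx"
          % (input_ratio, output_ratio))
```

The single-threaded slowdown comes out at roughly 289. The input arrays grow by a factor of 256 and the output patches by 128, so a factor of 128 is only expected if the cost scales purely with the output patch size; whether that holds here is something a profile would have to show.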