Comments (13)
I've had people with random issues caused by conda environments. That's why I was asking =)
from batchgenerators.
I agree with @justusschock . You should probably sample difficult cases more often rather than augmenting them offline.
Hi @justusschock, I really appreciate your time and valuable help. Following your guidance, batchgenerators works well now. Thank you very much.
Mhm difficult. I don't have any experience with Windows. @justusschock maybe knows what to do?
Irrespective of this particular issue, I think data augmentation should not be done offline unless absolutely necessary. You can get a lot more variability if you do it online. BraTS is tough due to the four modalities, but you can train a 3D UNet with maybe ~5 CPU cores without problems when running data augmentation on the fly.
That's strange. I tested this on two separate Windows machines and it works on both of them.
I sometimes get a broken pipe and/or a lock due to race conditions if I try to access the same hdf5 file at the same time. That said, I have only seen this issue with hdf5, so it should not be a general batchgenerators issue.
Can you maybe check if the same applies to you (and maybe create a gist containing a minimum working example to check on this)?
Are you maybe using additional custom multiprocessing within your loader/dataset?
EDIT:
Why do you restart the generator directly after creating it?
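For the hdf5 race condition mentioned in this comment, one common way to avoid sharing a single file handle between worker processes is to open the file lazily inside each process. The sketch below illustrates the idea with a plain binary file as a stand-in; `PerProcessReader` is a hypothetical helper, not part of batchgenerators or h5py, and whether it helps in a given setup is an assumption to verify.

```python
import os


class PerProcessReader:
    """Hypothetical helper: opens the file once per process, on first use."""

    def __init__(self, path):
        self.path = path
        self._handle = None
        self._owner_pid = None

    def _get_handle(self):
        # Open a fresh handle if none exists yet, or if this object was
        # copied into a different process (e.g. a data loading worker).
        if self._handle is None or self._owner_pid != os.getpid():
            self._handle = open(self.path, "rb")
            self._owner_pid = os.getpid()
        return self._handle

    def read(self, offset, size):
        handle = self._get_handle()
        handle.seek(offset)
        return handle.read(size)

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None
            self._owner_pid = None

    def __getstate__(self):
        # Never send an open handle to another process when pickled.
        state = self.__dict__.copy()
        state["_handle"] = None
        state["_owner_pid"] = None
        return state
```

The same pattern would apply to an hdf5 file by replacing `open(self.path, "rb")` with the corresponding open call of the hdf5 library in use.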
Hi,
> Why do you restart the generator directly after creating it?

That is on me. The code is something I wrote. See, if you initialize the MTA it will not start generating batches right away. It will do so only after you request the first batch OR if you restart it. It is just a habit, but I usually initialize the data augmentation pipeline, then initialize the network and so on. If I restart the MTA, it will already start generating batches while the main process is busy with other things. It is a little more efficient. But at the end of the day it doesn't matter, because a training can run for days, so a few seconds at the start won't change much.
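The lazy-start behavior described here can be illustrated with a small standard-library sketch. This is not the MultiThreadedAugmenter implementation (which works with worker processes); `LazyPrefetcher` and its parameters are hypothetical names used only to show why calling `restart()` early lets batches fill up in the background.

```python
import queue
import threading

_END = object()  # sentinel marking the end of one pass over the data


class LazyPrefetcher:
    """Hypothetical sketch: background producer that starts lazily."""

    def __init__(self, make_batches, max_cached=4):
        self.make_batches = make_batches  # callable returning a fresh iterable
        self.max_cached = max_cached
        self._queue = None
        self._thread = None

    def restart(self):
        # Start producing right away. A previous producer, if any, is simply
        # abandoned in this sketch; real code would shut it down cleanly.
        q = queue.Queue(maxsize=self.max_cached)
        self._queue = q
        self._thread = threading.Thread(
            target=self._produce, args=(q,), daemon=True
        )
        self._thread.start()

    def _produce(self, q):
        for batch in self.make_batches():
            q.put(batch)  # blocks while the cache is full
        q.put(_END)

    def __iter__(self):
        return self

    def __next__(self):
        if self._queue is None:
            self.restart()  # lazy start on the first request
        item = self._queue.get()
        if item is _END:
            self._queue = None  # the next request starts a new pass
            raise StopIteration
        return item
```

Calling `restart()` right after construction means the queue is already filling while the main thread does other setup work; without it, production only begins at the first `next()` call.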
@JunMa11 how did you install your python environment? Conda?
@FabianIsensee you might want to include a restart at the end of the initialization. I think this would match the expected behavior better than having to restart it manually.
I also think the only thing that matters is how you installed the package itself, as conda simply provides an encapsulated environment as long as you don't install packages via conda (and I can confirm it works well with conda).
@FabianIsensee Yes, I installed the Python environment with conda.
Hi @justusschock, thank you very much for your reply.
> I only got this issue with hdf5 meaning it should not be a general batchgenerators issue. Can you maybe check if the same applies to you (and maybe create a gist containing a minimum working example to check on this)?

Sorry, I do not know what hdf5 means. I have provided an ErrorDemo to reproduce my error.
The environment is Windows 10, Python 3.6, Anaconda3-5.1.0-Windows-x86_64.
> Are you maybe using additional custom multiprocessing within your loader/dataset?

I do not have experience with multiprocessing. Would it be possible for you to give me more insight?
Hi, @FabianIsensee thanks for your comment on offline data augmentation.
> Irrespective of this particular issue I think data augmentation should not be done offline unless absolutely necessary. You can get a lot more variability if you do it online.

I agree with you that online augmentation gives more variability. My motivation for offline data augmentation is the following.
Looking at my whole tumor segmentation results on BraTS 2018, most of the cases get good results (Dice > 0.88), but a few "hard" cases get a very low Dice (0.6-0.7).
I want to do some offline data augmentation for the cases with a low Dice score and also do online data augmentation during training. In this way, I hope the network can learn these "hard" cases better.
Could you share your thoughts on this idea?
I'll test this on Monday. Unfortunately my local machine is running Linux.
If you are not familiar with multiprocessing, you most likely don't have a custom one. I thought you might have additional multiprocessing inside your BraTS2017DataLoader3D, which may have caused the problem, but this does not seem to be the case, so never mind :)
What else do you have installed inside your environment?
Regarding the augmentation: maybe it would be worth considering weighted sampling together with online augmentation to present hard cases more frequently?
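A minimal sketch of what such weighted sampling could look like, using only the standard library. The weighting rule (1 minus the validation Dice, with a small floor), the function names, and the case IDs are all assumptions for illustration and not something batchgenerators provides.

```python
import random


def hard_case_weights(dice_scores, floor=0.05):
    # Lower Dice gives a larger weight; the floor keeps easy cases in play.
    return {case: max(1.0 - dice, floor) for case, dice in dice_scores.items()}


def sample_batch(weights, batch_size, rng):
    # Draw case IDs with replacement, proportionally to their weights.
    cases = list(weights)
    return rng.choices(cases, weights=[weights[c] for c in cases], k=batch_size)


# Hypothetical per-case Dice scores from a previous validation run.
dice = {"case_a": 0.91, "case_b": 0.89, "case_c": 0.65}
weights = hard_case_weights(dice)
batch_ids = sample_batch(weights, batch_size=4, rng=random.Random(0))
```

The sampled IDs would then be loaded and passed through the usual online augmentation, so hard cases are seen more often but never as identical copies.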
Hi @justusschock, thanks for your quick reply.
These screenshots show the python packages in my environment.
Weighted sampling is a good idea that I missed. Thank you very much.
So I just got the time to test this and I absolutely can't reproduce the error.
I tried the script you provided (which should be similar to the one by @FabianIsensee ). The only thing I noticed: I had to clone the repo again manually, since you mixed the setup code with the actual implementation (probably you just copied it there to get the imports working without an install). After I did a clean clone and a clean install everything worked like a charm (even with multiple epochs).
The steps I did are:
- Create a new conda environment:
conda create -n batchgen_test python=3.6
- Activate the environment:
conda activate batchgen_test
- Clone the repo (this may have to be executed in Git Bash):
git clone https://github.com/MIC-DKFZ/batchgenerators
- Cd into the repo:
cd batchgenerators
- Install the repo locally:
pip install -e .
- Cd to the script to execute:
cd YOUR/PATH/HERE
- Execute the script:
python brats2017_dataloader_3D.py
and the output looked like this:
python brats2017_dataloader_3D.py
(4, 128, 128, 128)
(4, 128, 128, 128)
(4, 128, 128, 128)
(4, 128, 128, 128)
(4, 128, 128, 128)
(4, 128, 128, 128)
Running 3 epochs took a total of 38.92 seconds with time per epoch being [25.76951551437378, 3.648458957672119, 9.5059654712677]
The time is not representative since I'm running some heavily CPU-consuming tasks in parallel.
Can you maybe try this and confirm if this works?
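After the steps above, a quick way to confirm which copy of a package Python actually imports (for example, the fresh clone rather than files copied next to the script) is to look up its import location. The helper name `package_location` is hypothetical; `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util


def package_location(name):
    # Returns the file the top-level package would be imported from,
    # or None if the package cannot be found in this environment.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # For an editable install, this should point into the cloned repository.
    print(package_location("batchgenerators"))
```

If this prints `None`, the install step did not take effect in the active environment; if it prints a path outside the clone, another copy is shadowing it.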