Comments (10)
Are you running something else already on the GPU?
from tensorflow-wavenet.
Nope: fresh install, fresh reboot, running headless, and this is the only process using the GPU. Has anyone successfully trained the network in less than 3.5 GiB of GPU memory?
from tensorflow-wavenet.
Looks like it stopped on 2 particularly large input samples.
Did this happen immediately, or after a few iterations?
If it's just because of large samples, pull request #54 should fix this by always cutting them into fixed size pieces.
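As a rough illustration of what "cutting them into fixed size pieces" can mean, here is a minimal sketch. It is not the code from pull request #54; the function name `split_into_pieces` and the choice to drop a short trailing remainder are assumptions made for this example.

```python
def split_into_pieces(samples, piece_size):
    """Yield consecutive pieces of exactly piece_size samples.

    Assumption for this sketch: a trailing remainder shorter than
    piece_size is dropped, so every piece has the same length and
    per-step memory use stays bounded.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    for start in range(0, len(samples) - piece_size + 1, piece_size):
        yield samples[start:start + piece_size]


# Toy "audio" of 10 samples cut into pieces of 4 samples each.
pieces = list(split_into_pieces(list(range(10)), 4))
print(pieces)
```

With fixed-size pieces, one unusually long recording can no longer produce one unusually large tensor.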
You can also try setting BATCH_SIZE to 1, or reducing the number of dilated convolutional layers in wavenet_params.json.
3.5 GB isn't a lot, but should be enough to train the network with some tweaks.
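To make the "fewer dilated layers" tweak concrete, here is a hedged sketch. It assumes the config holds a `dilations` list and a `filter_width` value, and that the layers come in repeated stacks of 1, 2, 4, ..., 512; check your own wavenet_params.json before relying on these names or values. The helpers `receptive_field` and `trim_dilations` are hypothetical, and the receptive-field formula ignores any extra initial causal layer.

```python
def receptive_field(dilations, filter_width=2):
    """Samples of context covered by a stack of dilated causal convolutions."""
    return (filter_width - 1) * sum(dilations) + 1


def trim_dilations(params, keep_stacks, stack_len):
    """Return a copy of params that keeps only the first keep_stacks stacks."""
    trimmed = dict(params)
    trimmed["dilations"] = params["dilations"][:keep_stacks * stack_len]
    return trimmed


# Hypothetical config: 5 stacks of dilations 1, 2, 4, ..., 512.
stack = [2 ** i for i in range(10)]
params = {"filter_width": 2, "dilations": stack * 5}

smaller = trim_dilations(params, keep_stacks=2, stack_len=len(stack))
for name, cfg in [("original", params), ("trimmed", smaller)]:
    print(name, len(cfg["dilations"]), "layers,",
          receptive_field(cfg["dilations"], cfg["filter_width"]), "samples")
```

Fewer layers mean fewer activations to keep in GPU memory, at the cost of a shorter receptive field.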
Note that we haven't yet managed to reproduce the results from the wavenet paper, so it's not worth training the network unless you can invest the time to look for good hyperparameters.
from tensorflow-wavenet.
I always get an OOM on 8 GB of video RAM after 160 steps.
Seems like BATCH_SIZE = 1 solves the problem for me.
from tensorflow-wavenet.
BATCH_SIZE=1 allows the process to run for a few dozen steps before OOMing. I just tried adding a second identical GPU to increase total memory to 7 GB, but it behaves similarly. TF recognizes the second GPU, but I think we would need explicit placement to make it useful. I will now look into #54.
from tensorflow-wavenet.
@lelayf: Yeah, we're not making use of extra GPUs at the moment.
#54 should definitely help. You can adjust SAMPLE_SIZE downwards if you still run into problems.
from tensorflow-wavenet.
It seems the training is now stalling silently after a few hundred steps. I do not get OOMs. I tried a sample_size of 96000, then 64000, and finally ran it with no sample_size on the command line. I also tried different GPUs, a GRID K2 and a Tesla K80. In all cases the same silent stalling occurs.
from tensorflow-wavenet.
Could this be the same problem as in #65?
The audio pipeline stopped processing data after traversing the input files once.
Updating to one of the newer commits should fix that.
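The failure mode described above (a pipeline that goes quiet after one pass over the inputs) can be illustrated with a small sketch. This is not the repository's reader code; `iterate_once`, `iterate_forever`, and the file names are hypothetical and only show the difference between a single pass and a repeating one.

```python
import itertools


def iterate_once(files):
    """Single pass: the generator is exhausted after the last file."""
    for path in files:
        yield path


def iterate_forever(files):
    """Repeating pass: start over at the first file after the last one."""
    while True:
        for path in files:
            yield path


files = ["a.wav", "b.wav", "c.wav"]  # hypothetical input files

# A consumer asking for 7 items gets only 3 from the single pass...
print(list(itertools.islice(iterate_once(files), 7)))
# ...but all 7 from the repeating one.
print(list(itertools.islice(iterate_forever(files), 7)))
```

A training loop that waits on the single-pass version would block once the files run out, which would look like a silent stall rather than an error.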
from tensorflow-wavenet.
Closing this, as all mentioned issues should be fixed at this point.
Feel free to comment if you still experience problems with this.
from tensorflow-wavenet.
Getting the same OOM when running on a 61 GiB AWS instance, even with a sample size of 10,000.
commit: 3c973c0
python: 3.6
Run with --silence_threshold=0 on one p225 utterance from the original VCTK corpus.
>>> psutil.virtual_memory()
svmem(total=64389132288, available=60933558272, percent=5.4, used=2929561600, free=59257782272, active=3300839424, inactive=1334075392, buffers=73183232, cached=2128605184, shared=21499904, slab=143368192)
Exception:
2018-05-13 06:20:26.790439: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at slice_op.cc:154 : Resource exhausted: OOM when allocating tensor with shape[1,44050,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Storing checkpoint to ./logdir/train/2018-05-13T06-19-22 ... Done.
Traceback (most recent call last):
  File "/home/ubuntu/training/tensorflow-wavenet/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/ubuntu/training/tensorflow-wavenet/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/training/tensorflow-wavenet/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,38934,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: wavenet_1/loss/Slice = Slice[Index=DT_INT32, T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](wavenet_1/loss/Reshape, wavenet_1/loss/Slice/begin, wavenet_1/loss/Slice/size)]]
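For scale, the size of the tensors named in the messages above can be worked out from their shapes. This sketch assumes "type float" means 32-bit floats (4 bytes per element); the helper `tensor_bytes` is hypothetical.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: "type float" is a 32-bit float


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed for one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


# Shapes copied from the two OOM messages above.
for shape in [(1, 44050, 256), (1, 38934, 256)]:
    size = tensor_bytes(shape)
    print(shape, size, "bytes =", round(size / 2**20, 1), "MiB")
```

Each of these tensors is on the order of 40 MiB, which would suggest that the failing allocation was not large by itself and that most of the GPU memory was already held by earlier allocations.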
from tensorflow-wavenet.