Comments (7)
Sorry for the late reply @Smerity -- pulled the changes in your branch and it works on my code! Running on multi-GPU now with DataParallel on. Thanks for the tutorial on how to handle multi-GPU with the .compile(). Very fast.
from pytorch-qrnn.
I believe it’s a CuPy thing. You’d probably have more success running with DistributedDataParallel since each process has a totally separate CUDA handle, but maybe there’s a way to fix it here? What's the error?
Yep, CuPy thing for sure. My stack trace is below.
The code is very simple: torch.nn.Embedding on a list of IDs, which gets fed into a single-layer QRNN. It works fine in CPU mode, and in GPU mode on a single machine with DataParallel off, just calling .cuda() on the model.
Will try DistributedDataParallel in the meantime. Thanks!
  File "main.py", line 510, in train
    output = model(...)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 70, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/model_rnn.py", line 171, in forward
    output, hidden = self.rnn_jr(j_vec)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torchqrnn/qrnn.py", line 160, in forward
    input, hn = layer(input, None if hidden is None else hidden[i])
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torchqrnn/qrnn.py", line 95, in forward
    C = ForgetMult()(F, Z, hidden, use_cuda=self.use_cuda)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torchqrnn/forget_mult.py", line 172, in forward
    if hidden_init is None: return GPUForgetMult()(f, x) if use_cuda else CPUForgetMult()(f, x)
  File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torchqrnn/forget_mult.py", line 125, in forward
    self.forget_mult(grid=grid, block=(grid_hidden_size, 1), args=[result.data_ptr(), f.data_ptr(), x.data_ptr(), seq_size, batch_size, hidden_size], stream=self.stream)
  File "cupy/cuda/function.pyx", line 141, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 123, in cupy.cuda.function._launch
  File "cupy/cuda/driver.pyx", line 169, in cupy.cuda.driver.launchKernel
  File "cupy/cuda/driver.pyx", line 69, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
srun: error: hsw224: task 0: Exited with exit code 1
+1 for @jekbradbury's DistributedDataParallel suggestion for now. Unfortunately I'm swamped right now but I'll look at this after the ICLR deadline.
My guess as to the line that needs to be updated for such support is:
pytorch-qrnn/torchqrnn/forget_mult.py
Line 112 in 3aa5e72
For now ForgetMult assumes that the CUDA stream it sees when initially compiling the CUDA kernel is the correct one, and that stream is tied to a class variable. I did this as I wasn't sure how much setting up the stream would cost in performance, and I was only using it in a single-GPU setting.
Setting self.stream to the correct value either when constructing ForgetMult or in the forward pass (where presumably the current CUDA stream is "correct") could be the fix? Wait ... Hmm ... We might need to consider what's set on the GPU during the compile() step too.
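To make the idea above concrete, here is a minimal sketch of keeping one compiled kernel and stream per GPU instead of a single class-level one. It is only an illustration of the pattern, not the change that ended up in the repository: PerDeviceCache and fake_build are hypothetical names, and the builder is a stand-in for whatever would actually compile the kernel and capture the stream on a given device. The lock is there because, as the stack trace shows, DataParallel runs each replica in its own worker thread.

```python
import threading


class PerDeviceCache:
    """Lazily builds and caches one resource per GPU device ID (hypothetical helper)."""

    def __init__(self, build):
        # build: callable(device_id) -> resource. In real code this would
        # compile the kernel and look up the stream for that device (assumption).
        self._build = build
        self._resources = {}
        self._lock = threading.Lock()  # replicas may call get() from several threads

    def get(self, device_id):
        with self._lock:
            if device_id not in self._resources:
                self._resources[device_id] = self._build(device_id)
            return self._resources[device_id]


# Stand-in builder so the sketch runs without a GPU.
build_calls = []


def fake_build(device_id):
    build_calls.append(device_id)
    return {"device": device_id, "kernel": object(), "stream": object()}


cache = PerDeviceCache(fake_build)
first = cache.get(0)   # builds for device 0
again = cache.get(0)   # reuses the cached entry
other = cache.get(1)   # builds a separate entry for device 1
```

With this shape, the forward pass would ask the cache for the entry matching the device its inputs live on, so a replica on GPU 1 never launches with a handle created on GPU 0.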
Actually, scratch that, I have a version that appears to be working for DataParallel. @moscow25, interested in playing with it to see if I've missed anything?
It's in master of https://github.com/Smerity/pytorch-qrnn, where I've updated ForgetMult and also included, under examples, a test of multiple GPUs using DataParallel.
For quite large matrices and sequences (where I didn't want to go much larger as the single GPU runs out of memory):
Single
Time: 47.81051325798035
Two GPUs
Time: 31.48588228225708
Difference:
Variable containing:
0
[torch.cuda.FloatTensor of size 1 (GPU 0)]
where "difference" is the total sum difference between the result from the single GPU and two GPU runs.
Note: the speed-up could be even better (the single GPU sits at 100% utilization but the two GPUs sit at ~70% utilization when the batch is split) but then the experiment would take forever on a single GPU / run out of memory.
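For reference, the two timings above work out to roughly a 1.5x speed-up. A quick sketch of the arithmetic, using only the numbers reported in this comment ("efficiency" here is just speed-up divided by GPU count, a label added for illustration):

```python
# Timings copied from the run above (seconds).
single_gpu_seconds = 47.81051325798035
two_gpu_seconds = 31.48588228225708
num_gpus = 2

speedup = single_gpu_seconds / two_gpu_seconds
efficiency = speedup / num_gpus  # 1.0 would mean perfect linear scaling

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

That lines up reasonably well with the two GPUs sitting at about 70% utilization rather than 100%.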
Glad you could test it - and huzzah that it's working for you! ^_^ Any rough estimate of how much faster 4xGPU QRNN is than 4xLSTM? Eh, I'll settle for "very fast" anyway - really glad it's working for you ^_^
I'll merge this in now, but I still need to update the README to say this is done, either this week if I get ahead of my paper deadline this Friday (lol) or early next week.
Will update numbers here when I get them for the 4x GPU comparison with LSTM. Had trouble getting a P100 scheduled. I also need to get QRNN working for 2+ layers -- there's some size mismatch for that one, so it's not a "drop in replacement" for LSTM yet, but I'm sure I can fix it.