loudinthecloud / pytorch-ntm
Neural Turing Machines (NTM) - PyTorch Implementation
License: BSD 3-Clause "New" or "Revised" License
Hello,
I'm just trying to reproduce the results/graphs shown in the README.md for the "copy" task.
I am running this on the latest master branch:
Commit 5c5ce66376e8032c38ef4327ca381fee145f4d0f
./train.py --seed 1000 --task copy --checkpoint_interval 500 --checkpoint-path ./notebooks/copy -pbatch_size=15
NOTE: I used a batch_size of 15 instead of the default of 1, since it seems to lead to more stable convergence rates.
I then used the Python notebook to generate the three graphs shown in the README.md.
For convenience when comparing, I've included both the graphs I got and the graphs I expected (taken from the README.md).
(The expected behavior being that Outputs matches Targets.)
My setup is:
pip freeze reports the version as: torch==0.3.0.post4
/usr/lib/x86_64-linux-gnu/libcuda.so.384.111
/usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
/usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
/usr/local/cuda-9.0/lib64/libcusparse.so.9.0.176
/usr/local/cuda-9.0/lib64/libnvrtc.so.9.0.176
/usr/local/cuda-9.0/lib64/libnvToolsExt.so.1.0.0
Let me know if there is any other information I can provide.
I have an issue regarding training. When I launch the program, I get:
KeyError: "attribute 'mem_bias' already exists"
It seems that you define mem_bias twice in memory.py:
self.mem_bias = Variable(torch.Tensor(N, M))
self.register_buffer('mem_bias', self.mem_bias.data)
It is perhaps a problem with the PyTorch version (I have PyTorch 0.3).
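If it helps others: depending on the PyTorch version, assigning self.mem_bias as an attribute first and then calling register_buffer with the same name triggers exactly this KeyError. A minimal sketch of one way around it, registering the buffer in a single step without the intermediate attribute (the class and names here follow memory.py):

import torch
from torch import nn

class NTMMemory(nn.Module):
    def __init__(self, N, M):
        super(NTMMemory, self).__init__()
        self.N = N
        self.M = M
        # Register the buffer directly. Creating a `self.mem_bias`
        # attribute first and then calling register_buffer('mem_bias', ...)
        # raises KeyError: "attribute 'mem_bias' already exists".
        self.register_buffer('mem_bias', torch.Tensor(N, M))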
When I run this code snippet:
seq_len = 60
_, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
result = evaluate(model.net, model.criterion, x, y)
y_out = result['y_out']
I get the following error:
IndexError Traceback (most recent call last)
<ipython-input-41-127bd44fb490> in <module>()
1 seq_len = 60
2 _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
----> 3 result = evaluate(model.net, model.criterion, x, y)
4 y_out = result['y_out']
D:\GithubProjs\pytorch-ntm-master\train.py in evaluate(net, criterion, X, Y)
151
152 result = {
--> 153 'loss': loss.data[0],
154 'cost': cost / batch_size,
155 'y_out': y_out,
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
How can I solve this?
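This looks like a PyTorch version mismatch rather than a bug in the task code: since PyTorch 0.4, indexing a 0-dim tensor raises exactly this IndexError, and the idiomatic fix is tensor.item(), as the error message itself suggests. A sketch of the corresponding change in evaluate() in train.py:

result = {
    'loss': loss.item(),        # was loss.data[0]; .item() extracts the Python number
    'cost': cost / batch_size,  # unchanged
    'y_out': y_out,             # unchanged
}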
Dear author,
I've forked a repo (at https://github.com/marcwww/pytorch-ntm) from your work, mainly to test the model on longer sequences (for example, training on sequences of length 1 to 10 and testing on sequences of length 11 to 20).
The question is that the final test result after training without any evaluation at intermediate points differs from the result when intermediate evaluation is performed during training. The experiment setting of the repo is the latter one (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/train_test.py#L236).
In the forked repo, batches for testing are sampled in the same way as the ones for training (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/tasks/copytask_test.py#L16). I've tried to check whether the discrepancy comes from the intertwined sampling of training and testing batches by loading a pre-generated test set, but it did not help.
Could you please help me with this? Thanks a lot.
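One thing that may be worth ruling out (my own assumption, not something confirmed in either repo): if the intermediate evaluation draws anything from the same global RNG that the training dataloader uses, running it mid-training shifts the stream of training batches, which by itself changes the final result. A sketch of isolating evaluation sampling behind its own generator:

import torch

# Hypothetical sketch: give evaluation its own RNG so that running it
# mid-training cannot perturb the training batch stream.
eval_rng = torch.Generator().manual_seed(1234)

def sample_test_batch(batch_size, seq_width, seq_len):
    # Draw the random bit sequence from the dedicated generator rather
    # than the global one the training dataloader consumes.
    probs = torch.full((seq_len, batch_size, seq_width), 0.5)
    return torch.bernoulli(probs, generator=eval_rng)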
Hi, dear author:
def write(self, w, e, a):
    """write to memory (according to section 3.2)."""
    self.prev_mem = self.memory
    self.memory = Variable(torch.Tensor(self.batch_size, self.N, self.M))
    erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
    add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
    self.memory = self.prev_mem * (1 - erase) + add
In your write method, I don't understand why you create a new Variable(torch.Tensor(self.batch_size, self.N, self.M)) and then immediately overwrite it with the computed value.
Why not write it directly as follows:
def write(self, w, e, a):
    """write to memory (according to section 3.2)."""
    erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
    add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
    self.memory = self.memory * (1 - erase) + add
Hi,
I have tried to run the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller range (sequence_min_len=1, sequence_max_len=5), convergence is really slow (like the figure below), which is unexpected, since shorter sequences should be easier to learn. Do you have any idea why this happens and how to train on shorter sequences properly? Any suggestion is welcome.
Dear author,
I found your README very easy to follow, and the animation of the training process is vivid.
I wonder how you drew these pictures and the animation?
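In case it is useful, a minimal sketch of how such target/output bitmaps could be drawn with matplotlib (my own guess at the approach; the notebook in the repo has the actual code):

import matplotlib.pyplot as plt

# Hypothetical sketch: render target and output sequences as bitmaps,
# similar in spirit to the README figures.
def plot_copy_result(y_target, y_out):
    # y_target, y_out: arrays of shape (seq_len, width), values in [0, 1]
    fig, axes = plt.subplots(2, 1, figsize=(8, 3))
    axes[0].imshow(y_target.T, cmap='binary', aspect='auto')
    axes[0].set_title('Targets')
    axes[1].imshow(y_out.T, cmap='binary', aspect='auto')
    axes[1].set_title('Outputs')
    for ax in axes:
        ax.set_yticks([])
    plt.tight_layout()
    plt.show()

The training animation would then presumably be a GIF stitched together from one such frame per checkpoint (again, an assumption on my part).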
So people can train on GPU easily.
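For anyone trying this before GPU support lands, the usual PyTorch pattern is sketched below (nn.LSTM is my placeholder stand-in, not the repo's model):

import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = nn.LSTM(8, 100)   # placeholder model; the real net would be the NTM
net = net.to(device)    # moves parameters and registered buffers to the GPU
x = torch.randn(20, 1, 8, device=device)  # inputs must live on the same device
out, _ = net(x)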
Dear sir:
I have read your code and I really appreciate your work, but I have some questions.
At the start of each batch, the code just clears all the content in the memory and initializes it with mem_bias.
So what's the point of writing to and reading from memory? The memory is reset to mem_bias for each batch, and mem_bias itself is not updated, which means it never changes.
I just can't figure it out, and I would really appreciate it if you could answer my question.
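My reading of the code (not the author's reply): the reset to mem_bias only happens between sequences; within a sequence the write head changes the memory, so later reads return stored content rather than the bias. A tiny runnable illustration with made-up shapes:

import torch

N, M = 4, 3
mem_bias = torch.zeros(N, M)             # fixed starting content
memory = mem_bias.clone()                # reset at the start of each sequence
w = torch.tensor([1.0, 0.0, 0.0, 0.0])   # write weights over the N slots
a = torch.ones(M)                        # add vector from the controller
memory = memory + torch.ger(w, a)        # write: slot 0 now differs from the bias
r = memory.t() @ w                       # read: returns what was written, not the bias
print(r)                                 # tensor([1., 1., 1.])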
Dear sir:
Sorry for my rude words, but I do really want to know how to change the code when sequence lengths differ. It seems that the NTM needs all sequences to have the same length.
I tried to pad short sequences with zeros, but the NTM uses attention, and with zero-padding the prediction rate drops.
I want to know what I can do to overcome this problem. Please help me.
Best wishes to you.
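For what it's worth, one common workaround (my own suggestion, not something from this repo) is to pad within the batch but mask the padded timesteps out of the loss, so the zeros never contribute gradient. A minimal sketch:

import torch
from torch import nn

# Sketch: BCE loss computed only over valid (non-padded) timesteps.
def masked_bce(y_out, y_target, lengths):
    # y_out, y_target: (seq_len, batch, width); lengths: (batch,) valid lengths
    seq_len = y_target.size(0)
    # mask[t, b] = 1 while t < lengths[b], else 0
    mask = (torch.arange(seq_len).unsqueeze(1) < lengths.unsqueeze(0)).float()
    loss = nn.functional.binary_cross_entropy(y_out, y_target, reduction='none')
    loss = loss.mean(dim=-1) * mask   # zero out padded timesteps
    return loss.sum() / mask.sum()    # average over valid positions only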
Dear sir:
Viewing the code: each input batch of shape B x C is stored into a memory of shape B x N x M. The N write weights sum to 1, so the write effectively spreads the content C across the N memory slots. This is called blurry storage. But I don't know why this kind of storage is used. What's the advantage of it?
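My understanding of the advantage (from the NTM paper rather than this repo): blurry reads and writes are differentiable in the addressing weights, so the whole addressing mechanism can be trained with backpropagation, whereas a hard one-slot lookup could not be. A tiny illustration:

import torch

memory = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.5, 0.5]], requires_grad=True)
w = torch.softmax(torch.tensor([2.0, 0.1, 0.1]), dim=0)  # weights sum to 1
r = memory.t() @ w     # blurry read: a blend of all rows, dominated by row 0
r.sum().backward()     # gradients flow back through the soft addressing
print(memory.grad)     # every row receives gradient proportional to its weight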
Dear author:
Your NTM code is a nice piece of work, and its structure is concise and easy to follow. But I have a little confusion: why does each batch element have its own memory? Why not use a single memory for the whole batch, just like an LSTM, only with an expanded memory cell size? Could you help me resolve this confusion? Thank you very much!
Dear author:
I see that you initialize the read vector and the memory as:
self.register_buffer('mem_bias', torch.Tensor(N, M))
# Initialize memory bias
stdev = 1 / (np.sqrt(N + M))
nn.init.uniform_(self.mem_bias, -stdev, stdev)
and
init_r_bias = torch.randn(1, M).to('cuda') * 0.01
# the initial value of read vector is not optimized.
self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)
I wonder whether the initialization scheme makes a big difference, or whether I could just initialize both to torch.zeros()?
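For concreteness, the zero-initialized variant being asked about would look like this (a sketch of the alternative, not a recommendation; whether it trains as well is exactly the open question):

# Zero-initialized alternative (sketch): both buffers start at zero.
self.register_buffer('mem_bias', torch.zeros(N, M))
init_r_bias = torch.zeros(1, M)
self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)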