
pytorch-ntm's People

Contributors

loudinthecloud, marikgoldstein

pytorch-ntm's Issues

Cannot reproduce README.md graphs/results on copy task [Commit: 5c5ce66]

Hello,

I'm just trying to reproduce the results/graphs shown in the README.md for the "copy" task.

I am running this on the latest master branch:
Commit 5c5ce66376e8032c38ef4327ca381fee145f4d0f

How I trained my model:

./train.py --seed 1000 --task copy --checkpoint_interval 500 --checkpoint-path ./notebooks/copy -pbatch_size=15

NOTE: I used a batch_size of 15 instead of the default of 1, since it seems to lead to more stable convergence rates.

I then used the Python notebook to generate the 3 graphs shown in the README.md.
For convenience when comparing, I've included both the graphs I got and the graphs I expected (taken from the README.md).

Graph 1: Training convergence

I got: [image]
I expect: [image from README.md]

Graph 2: Training convergence (per sequence length)

I got: [image]
I expect: [image from README.md]

Graph 3: Evaluate

I got: [image]

(The expected graph is one where the Outputs match the Targets.)

Setup information

My setup is:

  • OS: Ubuntu 16.04.4 LTS
  • Python version: 3.6.4
  • PyTorch is installed using Anaconda, pip freeze reports the version as:
    torch==0.3.0.post4
  • CUDA/cuDNN versions/libraries in use by PyTorch at runtime:
    /usr/lib/x86_64-linux-gnu/libcuda.so.384.111
    /usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
    /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
    /usr/local/cuda-9.0/lib64/libcudnn.so.7.0.5
    /usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
    /usr/local/cuda-9.0/lib64/libcusparse.so.9.0.176
    /usr/local/cuda-9.0/lib64/libnvrtc.so.9.0.176
    /usr/local/cuda-9.0/lib64/libnvToolsExt.so.1.0.0

Let me know if there is any other information I can provide.

register_buffer

I have an issue regarding training: when I launch the program, I get:

KeyError: "attribute 'mem_bias' already exists"

It seems that you define mem_bias twice in memory.py:

self.mem_bias = Variable(torch.Tensor(N, M))
self.register_buffer('mem_bias', self.mem_bias.data)

It is perhaps a problem with the PyTorch version (I have PyTorch 0.3).
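
For reference, one way to avoid the duplicate-name error is to register the buffer directly instead of assigning the attribute first; the snippet quoted in a later issue below suggests the code now does exactly this. A minimal sketch, assuming PyTorch >= 0.4 (where nn.init.uniform_ exists):

    import numpy as np
    import torch
    from torch import nn

    class NTMMemory(nn.Module):
        # Sketch: register 'mem_bias' once, without first assigning
        # self.mem_bias, so register_buffer does not see an existing attribute.
        def __init__(self, N, M):
            super().__init__()
            self.N, self.M = N, M
            self.register_buffer('mem_bias', torch.Tensor(N, M))
            # The registered buffer is readable as self.mem_bias afterwards.
            stdev = 1 / (np.sqrt(N + M))
            nn.init.uniform_(self.mem_bias, -stdev, stdev)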

error in copy-task-plots.ipynb

When running this code snippet:

seq_len = 60
_, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
result = evaluate(model.net, model.criterion, x, y)
y_out = result['y_out']

the following error appears:

IndexError                                Traceback (most recent call last)
<ipython-input-41-127bd44fb490> in <module>()
      1 seq_len = 60
      2 _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
----> 3 result = evaluate(model.net, model.criterion, x, y)
      4 y_out = result['y_out']

D:\GithubProjs\pytorch-ntm-master\train.py in evaluate(net, criterion, X, Y)
    151 
    152     result = {
--> 153         'loss': loss.data[0],
    154         'cost': cost / batch_size,
    155         'y_out': y_out,

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

How can I solve it?
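
For what it's worth, the error message itself points at the fix: on PyTorch >= 0.4 the loss is a 0-dim tensor, so the indexing in evaluate() in train.py can be changed along these lines (a sketch based on the traceback above, not the maintainer's patch):

    # In evaluate() in train.py (around the line shown in the traceback),
    # use .item() instead of 0-dim indexing; other result fields stay as-is.
    result = {
        'loss': loss.item(),        # was: loss.data[0]
        'cost': cost / batch_size,
        'y_out': y_out,
    }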

Different results between testing during training and testing only at the end of training

Dear author,

I've forked a repo (at https://github.com/marcwww/pytorch-ntm) from your work, mainly to test the model on longer sequences (for example, training on sequences of length 1 to 10 and testing on sequences of length 11 to 20).

The issue is that the final test result after training without any intermediate testing differs from the final result when intermediate testing is performed during training. The experiment setting of the repo is the latter one (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/train_test.py#L236).

In the forked repo, test batches are sampled in the same way as training batches (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/tasks/copytask_test.py#L16). Actually, I've tried to check whether the difference comes from the intertwined sampling of training and test batches by loading a pre-generated test set, and it does not help.

Could you please help me with this? Thanks a lot.

Why create a new tensor?

Hi, dear author:

    def write(self, w, e, a):
        """write to memory (according to section 3.2)."""
        self.prev_mem = self.memory
        self.memory = Variable(torch.Tensor(self.batch_size, self.N, self.M))
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.prev_mem * (1 - erase) + add

In your write method, I don't understand why you create a new Variable(torch.Tensor(self.batch_size, self.N, self.M)) and then immediately overwrite it with the new value.
Why not write it directly as follows:

    def write(self, w, e, a):
        """write to memory (according to section 3.2).""" 
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.memory * (1 - erase) + add
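
For what it's worth, a small standalone check (with made-up shapes, not code from the repo) confirms that the extra allocation is overwritten before it is ever read, so the simplified version computes the same update:

    import torch

    # Illustrative shapes: batch of 2, N = 4 memory rows, M = 3 columns.
    w = torch.softmax(torch.randn(2, 4), dim=1)  # write weights over rows
    e = torch.sigmoid(torch.randn(2, 3))         # erase vector in [0, 1]
    a = torch.randn(2, 3)                        # add vector
    memory = torch.randn(2, 4, 3)

    erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))  # (2, 4, 3)
    add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))    # (2, 4, 3)

    # The freshly allocated tensor in the original write() is replaced by
    # this assignment immediately, so dropping it changes nothing.
    new_memory = memory * (1 - erase) + add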

Convergence is really slow with copy task when sequence length is smaller

Hi,

I have tried to run the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller range (sequence_min_len=1, sequence_max_len=5), convergence is really slow (as in the figure below), which is unexpected since shorter sequences should be learned faster. Do you have any idea why this happens and how to train on shorter sequences properly? Any suggestions are welcome.

[figure_1]

What's the point of using memory?

Dear sir:
I have read your code and I really appreciate your work, but I have some questions.

  1. self.register_buffer('mem_bias', torch.Tensor(N, M))  # mem_bias is registered as a buffer, which means it is not updated by the optimizer
  2. self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)  # self.memory is created from mem_bias to match the batch size
  3. For each batch, we run init_sequence(), which resets the memory via the reset function:
     self.batch_size = batch_size
     self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)

This just clears all the content in the memory and reinitializes it with mem_bias.
So what's the point of writing to and reading from memory? The memory is the same as mem_bias at the start of every batch, and mem_bias is not updated, which means it never changes.
I just cannot figure it out, and I would really appreciate it if you could answer my question.

How to change code when each sequence length is different

Dear sir:
Sorry for my rude words. But I really want to know how to change the code when sequence lengths differ. It seems that the NTM needs all sequence lengths to be the same.
I tried to pad short sequences with zeros, but the NTM uses attention, and with zero padding the prediction accuracy drops.
I want to know what I can do to overcome this problem. Please help me.
Best wishes to you.
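
A common workaround (not something this repo implements; the function name and shapes below are illustrative assumptions) is to pad all sequences in a batch to the same length and mask the padded positions out of the loss:

    import torch
    import torch.nn.functional as F

    def masked_bce_loss(y_out, y_target, lengths):
        # y_out, y_target: (seq_len, batch, width), zero-padded past each
        # sequence's true length; lengths: (batch,) tensor of true lengths.
        seq_len, batch, _ = y_target.shape
        steps = torch.arange(seq_len).unsqueeze(1)        # (seq_len, 1)
        mask = (steps < lengths.unsqueeze(0)).float()     # (seq_len, batch)
        loss = F.binary_cross_entropy(y_out, y_target, reduction='none')
        loss = loss.mean(dim=-1) * mask                   # zero out padded steps
        return loss.sum() / mask.sum()                    # mean over real steps

Masking like this should at least keep the padded steps out of the gradient, so padding no longer penalizes the model directly.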

Why use vague storage

Dear sir:
Looking at the code, each input batch of shape B x C is stored as B x N x M. All the N weights sum to 1, so it effectively spreads each C-dimensional vector across N memory rows. This is what I call vague storage. But I don't know why this kind of storage is used. What is the advantage of this storage?

Why does each batch have its own memory?

Dear author:
Your NTM code is a nice piece of work, and its structure is concise and easy to follow. But I am a little confused about why each batch has its own memory. Why not use a single memory for every batch, just like an LSTM but with an expanded memory cell size? Could you help me address this confusion? Thank you very much!

Why do the read vector and memory need to be initialized?

Dear author:
I found that you initialize the read vector and memory as:

		self.register_buffer('mem_bias', torch.Tensor(N, M))

		# Initialize memory bias
		stdev = 1 / (np.sqrt(N + M))
		nn.init.uniform_(self.mem_bias, -stdev, stdev)

and

init_r_bias = torch.randn(1, M).to('cuda') * 0.01
# the initial value of read vector is not optimized.
self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)

I wonder whether the initialization scheme makes a big difference, or whether I can just initialize everything with torch.zeros()?
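
For comparison, the zero-initialized variant being asked about would look roughly like this (an illustrative sketch, not a recommendation from the repo):

    import torch
    from torch import nn

    class ZeroInitBuffers(nn.Module):
        # Sketch: register the memory bias and read-vector biases as zeros
        # instead of small random values; whether this trains as well is
        # exactly the open question in this issue.
        def __init__(self, N, M, num_read_heads=1):
            super().__init__()
            self.register_buffer('mem_bias', torch.zeros(N, M))
            for i in range(num_read_heads):
                self.register_buffer('read{}_bias'.format(i), torch.zeros(1, M))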
