
pytorch-neucom's Issues

Compare with DeepMind's implementation

Hi @ypxie,

I just saw DeepMind released their Tensorflow/Sonnet implementation of the DNC,

https://github.com/deepmind/dnc

I was wondering if you're interested in revisiting this project to see how it compares with DeepMind's code? I spent quite a lot of time on your implementation previously; it would be a lot of fun to start working on it again 👍

Kind regards,

Ajay

Serialise and save the `DNC` at checkpoints

Hi,

I think the training progresses well, though there doesn't seem to be any further improvement after 10,000 iterations.

May I ask how to serialise and save the DNC at checkpoints? I set a checkpoint after 100 iterations and got the following error:

Using CUDA.
Iteration 0/50000
	Avg. Logistic Loss: 0.6931
Iteration 50/50000
	Avg. Logistic Loss: 0.6674
Iteration 100/50000
	Avg. Logistic Loss: 0.4560

Saving Checkpoint ... Traceback (most recent call last):
  File "train.py", line 183, in <module>
    torch.save(ncomputer, f)
  File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/site-packages/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/home/ajay/anaconda3/envs/rllab3/lib/python3.5/site-packages/torch/serialization.py", line 186, in _save
    pickler.dump(obj)
_pickle.PicklingError: Can't pickle <class 'memory.mem_tuple'>: attribute lookup mem_tuple on memory failed
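
For what it's worth, a common workaround (a sketch under the assumption that the failure comes from pickling the whole module, which drags in the locally defined `mem_tuple` namedtuple) is to save only the parameters with `state_dict()` rather than pickling the `DNC` object itself. `save_checkpoint` and `load_checkpoint` below are hypothetical helper names, not functions from the repo:

import torch

def save_checkpoint(ncomputer, path):
    # state_dict() holds only tensors, which pickle serialises
    # without needing to resolve memory.mem_tuple.
    torch.save(ncomputer.state_dict(), path)

def load_checkpoint(ncomputer, path):
    # Rebuild the DNC with the same constructor arguments first,
    # then restore the saved parameters into it.
    ncomputer.load_state_dict(torch.load(path))
    return ncomputer

Alternatively, defining `mem_tuple` at the top level of memory.py, with the namedtuple's typename matching the variable it is bound to, should make the original `torch.save(ncomputer, f)` call picklable.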

PS: Sometimes I get NaNs very early on and have to restart, and I also got NaNs after 40,000 iterations. I've seen this very often with Neural Turing Machines, so I guess it's inherent to these kinds of models?

Error running copy example

Using CPU.                   
Iteration 0/100000Traceback (most recent call last):
  File "train.py", line 177, in <module>
    output, _ = ncomputer(input_data)
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "../../neucom/dnc.py", line 108, in forward
    interface['erase_vector']
  File "../../neucom/memory.py", line 398, in write
    allocation_weight = self.get_allocation_weight(sorted_usage, free_list)
  File "../../neucom/memory.py", line 144, in get_allocation_weight
    flat_unordered_allocation_weight.cpu()
  File "/home/jramapuram/.venv/lib/python2.7/site-packages/torch/autograd/variable.py", line 629, in scatter_
    return Scatter(dim, True)(self, index, source)
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
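
If it helps, here is a minimal sketch of the pattern behind this error and the usual remedy, written against the pre-0.4 Variable API the traceback shows; none of this is the repo's actual code. `scatter_` is in-place, and autograd forbids in-place operations on leaf Variables that require grad, so taking a `clone()` first (which produces a non-leaf copy) makes the write legal:

import torch
from torch.autograd import Variable

target = Variable(torch.zeros(4), requires_grad=True)   # a leaf Variable
index = Variable(torch.LongTensor([2, 0, 3, 1]))
source = Variable(torch.rand(4), requires_grad=True)

# target.scatter_(0, index, source) here would raise:
# RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

safe = target.clone()            # clone() is tracked by autograd, so `safe` is not a leaf
safe.scatter_(0, index, source)  # the in-place write is now allowed
safe.sum().backward()            # gradients still reach `source`

Applied to memory.py, that would mean cloning (or otherwise rebuilding out-of-place) whatever tensor `flat_unordered_allocation_weight` is scattered into before calling `scatter_`.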

TypeError: norm received an invalid combination of arguments - got (int, bool), but expected one of:

(VENVpytorch2) mldl@mldlUB1604:~/ub16_prj/pytorch-NeuCom/tasks/Copy$ python train.py
Using CUDA.
Iteration 0/100000Traceback (most recent call last):
  File "train.py", line 177, in <module>
    output, _ = ncomputer(input_data)
  File "/home/mldl/ub16_prj/VENV_host/VENV_pytorch2/VENVpytorch2/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "../../neucom/dnc.py", line 108, in forward
    interface['erase_vector']
  File "../../neucom/memory.py", line 384, in write
    lookup_weight = self.get_content_address(memory_matrix, key, strength)
  File "../../neucom/memory.py", line 78, in get_content_address
    cos_dist = cosine_distance(memory_matrix, query_keys)
  File "../../neucom/utils.py", line 154, in cosine_distance
    memory_norm = torch.norm(memory_matrix, 2, dim=True)
  File "/home/mldl/ub16_prj/VENV_host/VENV_pytorch2/VENVpytorch2/local/lib/python2.7/site-packages/torch/autograd/variable.py", line 596, in norm
    return Norm(p, dim)(self)
  File "/home/mldl/ub16_prj/VENV_host/VENV_pytorch2/VENVpytorch2/local/lib/python2.7/site-packages/torch/autograd/_functions/reduce.py", line 192, in forward
    output = input.norm(self.norm_type, self.dim)
TypeError: norm received an invalid combination of arguments - got (int, bool), but expected one of:
 * no arguments
 * (float p)
 * (float p, int dim)
      didn't match because some of the arguments have invalid types: (int, bool)

(VENVpytorch2) mldl@mldlUB1604:~/ub16_prj/pytorch-NeuCom/tasks/Copy$
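
For reference, in this PyTorch version `norm()` only accepts the signatures listed in the error, so `dim` has to be an integer axis rather than a keyword set to `True`. A sketch of the likely fix (an assumption about what utils.py intends, not its actual code):

import torch
from torch.autograd import Variable

memory_matrix = Variable(torch.rand(4, 16, 32))  # hypothetical (batch, N, W) shape

# Buggy call from the traceback: dim=True is interpreted as a bool argument.
# memory_norm = torch.norm(memory_matrix, 2, dim=True)   # TypeError

# Working call: pass the axis to reduce over as an int.
memory_norm = torch.norm(memory_matrix, 2, 2)  # L2 norm along the last dimension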

backward through cumprod

Hey,
In the function get_allocation_weight() in memory.py, cumprod is used, but as far as I know this operation does not have autograd support yet and is still in a PR: https://github.com/pytorch/pytorch/pull/1439. So I'm really confused: how does the backward pass go through this operation for you? Are the resulting variables not in the computation graph?
Thanks in advance! Really nice repository :)
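
One workaround that was common before pytorch/pytorch#1439 landed (a sketch, not this repo's code) is to build the cumulative product out of ordinary multiplications, which autograd already differentiates:

import torch

def differentiable_cumprod(x, dim=0):
    # Cumulative product along `dim`, assembled from ops with autograd support.
    steps = [x.select(dim, i) for i in range(x.size(dim))]
    out = [steps[0]]
    for s in steps[1:]:
        out.append(out[-1] * s)   # running product, recorded in the graph
    return torch.stack(out, dim)

This is slower than a native cumprod, but it keeps every intermediate in the computation graph, so the backward pass goes through without the pending PR.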

Error: which PyTorch version?

mldl@mldlUB1604:~/ub16_prj/pytorch-NeuCom/tasks/Copy$ python train.py
Using CUDA.
Iteration 0/100000Traceback (most recent call last):
  File "train.py", line 177, in <module>
    output, _ = ncomputer(input_data)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "../../neucom/dnc.py", line 108, in forward
    interface['erase_vector']
  File "../../neucom/memory.py", line 384, in write
    lookup_weight = self.get_content_address(memory_matrix, key, strength)
  File "../../neucom/memory.py", line 78, in get_content_address
    cos_dist = cosine_distance(memory_matrix, query_keys)
  File "../../neucom/utils.py", line 154, in cosine_distance
    memory_norm = torch.norm(memory_matrix, 2, 2, keepdim=True)
TypeError: norm() got an unexpected keyword argument 'keepdim'
mldl@mldlUB1604:~/ub16_prj/pytorch-NeuCom/tasks/Copy$
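
The `keepdim` keyword only exists in newer PyTorch releases, so on older builds a compatibility shim is needed. A sketch (a hypothetical helper, assuming this version's `norm()` at least accepts `(float p, int dim)`):

import torch
from torch.autograd import Variable

def norm_keepdim(x, p, dim):
    # Behaves like torch.norm(x, p, dim, keepdim=True) on versions
    # whose norm() has no keepdim argument.
    out = torch.norm(x, p, dim)
    if out.dim() == x.dim() - 1:   # some old versions keep the dim already
        out = out.unsqueeze(dim)
    return out

memory_matrix = Variable(torch.rand(4, 16, 32))   # hypothetical shape
memory_norm = norm_keepdim(memory_matrix, 2, 2)   # shape (4, 16, 1)

Otherwise, upgrading to a PyTorch version whose norm() supports `keepdim` should make the existing call work.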

AssertionError: leaf variable was used in an inplace operation

Hi, thanks for sharing this; it's really cool that you managed to get it working 👍

I got an error though when trying to run the repo,

Iteration 0/100000Traceback (most recent call last):
  File "train.py", line 155, in <module>
    loss.backward()
  File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/autograd/variable.py", line 158, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/autograd/variable.py", line 201, in _do_backward
    "leaf variable was used in an inplace operation"
AssertionError: leaf variable was used in an inplace operation

I've seen this before, but I can't remember how to fix it. Can you suggest anything, please?
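
For reference, this is the same root cause as the `scatter_` issue above: some in-place operation writes into a leaf Variable that requires grad. A minimal sketch of the usual remedies (assumed code, not the repo's):

import torch
from torch.autograd import Variable

weights = Variable(torch.zeros(1, 8), requires_grad=True)  # a leaf Variable

# Remedy 1: clone before mutating; the clone is a non-leaf
# and may be written in place.
w = weights.clone()
w.fill_(1.0)

# Remedy 2: avoid mutation entirely and build the new value out-of-place.
w2 = weights + Variable(torch.ones(1, 8))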

Also, maybe there's a typo in the "how to run" part of the repo's cover page; I think it should be:

python tasks/Copy/train.py

Thanks a lot for your help 👍
