
Comments (14)

flybass commented on August 28, 2024

@nicholas-leonard Thanks for the advice (and the super fast response). I'll see if normalization makes a difference here; at least I'm using the modules correctly. I added clipping on the gradients and managed to get the MSE much lower than the baseline. However, I would still expect the LSTM to do a better job of learning addition (adding more hidden units seems like a lot of parameters for such a simple function). I'm working on an analogous problem with very long documents (but with word embeddings as the first layer). Are there any tips on initialization for these models? I see many use cases that initialize the weights to uniform values.
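
To be concrete, by clipping I mean something like rescaling the flattened gradients whenever their norm exceeds a threshold (a sketch; maxNorm is a hand-picked value, and rnn/lr are the model and learning rate from my training loop further down):

local params, gradParams = rnn:getParameters()  -- call once, outside the training loop
local maxNorm = 5                               -- hand-picked clipping threshold

-- after rnn:backward(...) and before the parameter update:
local norm = gradParams:norm()
if norm > maxNorm then
    gradParams:mul(maxNorm / norm)  -- rescale so the gradient norm equals maxNorm
end
rnn:updateParameters(lr)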

Thanks again,
Daniel

nicholas-leonard commented on August 28, 2024

Hi @shuokay, thank you for your question. If I understand correctly, you want to input a sequence of images (handwritten characters) into the LSTM to predict a sequence of characters (using SoftMax)?

shuokay commented on August 28, 2024

@nicholas-leonard thanks for your reply. The input is a sequence of points traced by each stroke of one character, not images, and the output is ONE character. ('Online' means you can get every point of each stroke exactly, while 'offline' means you cannot get the points, only an image of the result.)
I think it is a many-to-one task, as described on this page: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

nicholas-leonard commented on August 28, 2024

For many-to-one problems, you can use this kind of architecture:

rnn = nn.Sequential()
rnn:add(nn.Sequencer(nn.Recurrent(...)))
rnn:add(nn.SelectTable(-1))
rnn:add(nn.Linear(...))
rnn:add(nn.LogSoftMax())

So basically, the input is a sequence of points (coordinates or whatever) and the output is the log-likelihood of each character class. The key is SelectTable(-1), which keeps only the last step's rnn output for predicting the character.

shuokay commented on August 28, 2024

@nicholas-leonard thank you very much. Just one piece of advice: Torch and its packages need more detailed documentation. ^_^

shuokay commented on August 28, 2024

Hi @nicholas-leonard, I still get errors. My code looks like this:

require 'dp'
require 'torch'
rho=5
hiddenSize=100
inputSize=2
outputSize=10

lstm=nn.LSTM(inputSize, hiddenSize,rho)
model = nn.Sequential()
model:add(lstm)
model:add(nn.SelectTable(-1))
model:add(nn.Linear(hiddenSize,outputSize))
model:add(nn.LogSoftMax())

criterion=nn.ClassNLLCriterion()

input=torch.Tensor({{0,0,1,1,2,2,3,3}}) --the input points in one stroke (0,0),(1,1),(2,2),(3,3)
output=torch.Tensor({1,0,0,0,0,0,0,0,0,0})
model:forward(input[1])

and it gives these errors:

/home/gys/torch/install/share/lua/5.1/nn/Linear.lua:39: size mismatch, [10 x 2], [8] at /home/gys/torch/pkg/torch/lib/TH/generic/THTensorMath.c:527
stack traceback:
    [C]: in function 'addmv'
    /home/gys/torch/install/share/lua/5.1/nn/Linear.lua:39: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/ParallelTable.lua:12: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nnx/LSTM.lua:170: in function 'updateOutput'
    /home/gys/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    [string "require 'dp'..."]:15: in main chunk
    [C]: in function 'xpcall'
    /home/gys/torch/install/share/lua/5.1/itorch/main.lua:177: in function </home/gys/torch/install/share/lua/5.1/itorch/main.lua:143>
    /home/gys/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
    /home/gys/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
    /home/gys/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
    /home/gys/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
    /home/gys/torch/install/share/lua/5.1/itorch/main.lua:344: in main chunk
    [C]: in function 'require'
    [string "arg={'/home/gys/.ipython/profile_torch/securi..."]:1: in main chunk

nicholas-leonard commented on August 28, 2024

require 'dp'
require 'torch'

hiddenSize=100
inputSize=2
outputSize=10

lstm=nn.LSTM(inputSize, hiddenSize)
model = nn.Sequential()
model:add(nn.SplitTable(1,2))
model:add(nn.Sequencer(lstm))
model:add(nn.SelectTable(-1))
model:add(nn.Linear(hiddenSize,outputSize))
model:add(nn.LogSoftMax())

criterion=nn.ClassNLLCriterion()

input=torch.Tensor({{0,0},{1,1},{2,2},{3,3}}) --the input points in one stroke (0,0),(1,1),(2,2),(3,3)
output=torch.Tensor({1,0,0,0,0,0,0,0,0,0})
model:forward(input)
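
Note that nn.ClassNLLCriterion expects the target to be a class index (1 to outputSize), not a one-hot vector, so a full training step would look roughly like this (a sketch; the learning rate is illustrative):

target = 1                       -- class index of the character (not a one-hot vector)
output = model:forward(input)
err = criterion:forward(output, target)
gradOutput = criterion:backward(output, target)
model:backward(input, gradOutput)
model:updateParameters(0.1)      -- illustrative learning rate
model:zeroGradParameters()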

shuokay commented on August 28, 2024

@nicholas-leonard everything works now, thank you very much. I will close this issue.

flybass commented on August 28, 2024

@nicholas-leonard I have a question along similar lines. Do you mind telling me what is wrong with my training here? In this toy example, the RNN doesn't do well at learning the sum of a sequence.

-- more imports than necessary
require 'nn'
require 'cunn'
require 'rnn'
require 'dp'
require 'cutorch'

-- set to a small learning rate due to possible explosion
lr = .00001

-- roll lstm over the numbers in the sequence, select last output layer, apply linear model
rnn = nn.Sequential()
lstm = nn.FastLSTM( 1,500)
rnn:add( nn.Sequencer( lstm) )
rnn:add( nn.SelectTable(-1) )
rnn:add( nn.Linear(500,1) )

rnn:cuda()
criterion = nn.MSECriterion():cuda()


-- random numbers that are scaled to make the problem a little harder
inputs = torch.rand(100,200)
for i=1,100 do
    inputs[i] = inputs[i]* i
end


inputs = inputs:cuda()
targets = inputs:sum(2):cuda()

baseline = targets:std()^2

print( baseline ) --  usually around 8,600,000  
for i =1,10000 do
    rnn:training()
    errors = {}
    for j = 1,100 do
        -- get the input row (as a table of 1-element tensors) and its target
        local input = inputs[j]:split(1)
        local target = targets[j]
        -- print( target)
        local output = rnn:forward( input)
        local err = criterion:forward( output, target)
        table.insert( errors, err)
        local gradOutputs = criterion:backward(output, target)  
        rnn:backward(input, gradOutputs)
        rnn:updateParameters(lr)
        rnn:zeroGradParameters()
        rnn:forget()
    end
    print ( torch.mean( torch.Tensor( errors))   )
end

nicholas-leonard commented on August 28, 2024

@flybass Your problem is very hard. Maybe the model needs more capacity: you could try adding more hidden units (1000 instead of 500), and/or stacking another LSTM on top of the first one.
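
For example, a stacked variant could look roughly like this (a sketch; the sizes are illustrative):

rnn = nn.Sequential()
rnn:add(nn.Sequencer(nn.FastLSTM(1, 1000)))     -- first LSTM layer, with more hidden units
rnn:add(nn.Sequencer(nn.FastLSTM(1000, 1000)))  -- second LSTM layer stacked on top
rnn:add(nn.SelectTable(-1))                     -- keep only the last time step
rnn:add(nn.Linear(1000, 1))
rnn:cuda()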

You could also try more examples; 100 is not a lot. And the data is unbounded, so a sum could be 1 or it could be 10000. If you can keep it within a range, say between -1 and 1, that might help. To do so, you could normalize the dataset so that the minimum and maximum sums map to -1 and 1 respectively.
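
A minimal sketch of that normalization, reusing the targets tensor from your snippet (remember to undo the scaling when interpreting the predictions):

-- linearly rescale targets so the smallest sum maps to -1 and the largest to +1
local tmin, tmax = targets:min(), targets:max()
targets:add(-tmin):div(tmax - tmin):mul(2):add(-1)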

nicholas-leonard commented on August 28, 2024

@flybass You can try the default initialization, or getParameters():uniform(0.1) (or something around that number). It's empirical, so trial and error. But usually, I find uniform 0.1 to work for me.
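
For example (a sketch; assuming a symmetric range around zero is what is meant by uniform 0.1):

local params, gradParams = rnn:getParameters()
params:uniform(-0.1, 0.1)  -- re-initialize all weights and biases uniformly in [-0.1, 0.1]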

flybass commented on August 28, 2024

@nicholas-leonard
Hey Nick,

I spoke to a professor about this, and he suggested initializing the LSTM weights so that they are orthogonal. I'll compare that method.
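
For reference, a sketch of what that initialization could look like, using the QR decomposition of a random Gaussian matrix (Saxe et al., 2013). Applying it to the nn.Linear submodules is an assumption about how FastLSTM stores its weights:

-- orthogonal initialization of a 2D weight matrix via QR of a random Gaussian matrix
local function orthogonalInit(weight)
    local rows, cols = weight:size(1), weight:size(2)
    local z = torch.randn(math.max(rows, cols), math.min(rows, cols))
    local q, r = torch.qr(z)                 -- thin QR: the columns of q are orthonormal
    for j = 1, q:size(2) do                  -- sign correction using the diagonal of r
        if r[j][j] < 0 then q:select(2, j):mul(-1) end
    end
    if rows < cols then q = q:t() end
    weight:copy(q)
end

-- assumption: FastLSTM keeps its parameters in nn.Linear submodules
local linears = lstm:findModules('nn.Linear')
for _, linear in ipairs(linears) do
    orthogonalInit(linear.weight)
end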

nicholas-leonard commented on August 28, 2024

@flybass Let me know if that works best. Also, if it does work, you could add an initOrthogonal() to LSTM!

robotsorcerer commented on August 28, 2024

Is there a way to initialize the rnn for a one-to-many sequence? I have a system with a single input and six outputs. Everything is fine until I call gradInput = rnn:backward(inputs[step], gradOutputs[step]): it keeps complaining that the inputs should be equal in size to the gradOutputs, which I do not expect them to be. I noticed this is due to an assert built into the Linear and nn modules. Would you be able to look at the code below for me, please? Here is what I have:

require 'torch'
require 'nn'

ninputs      = 1
noutputs     = 6
nhiddens_rnn = 1
start        = 1                          -- the size of the recurrent output (excluding the batch dimension)
rnnInput     = nn.Linear(ninputs, start)  -- input module: maps the input to the recurrent state size
feedback     = nn.Linear(start, ninputs)  -- module that feeds the previous output back to the transfer module
transfer     = nn.ReLU()

-- RNN
r = nn.Recurrent(start,
                 rnnInput,                -- input module, from inputs to outputs
                 feedback,
                 transfer,
                 rho)

neunet = nn.Sequential()
            :add(r)
            :add(nn.Linear(nhiddens_rnn, noutputs))

neunet = nn.Sequencer(neunet)
print('rnn')
print(neunet)

I train in mini-batches as follows:

   local offsets = {}
    --form mini batch
    for t = 1, train_input:size()[1], 1 do

      for i = t, t+opt.batchSize-1 do
        table.insert(offsets, train_input[i])  
      end      
      offsets = torch.cat({offsets[1], offsets[2], offsets[3], offsets[4], 
                            offsets[5], offsets[6]})
     -- print('offsets b4'); print(offsets)
      offsets = torch.LongTensor():resize(offsets:size()[1]):copy(offsets)

      --BPTT
      local iter = 1

      -- 1. create a sequence of rho time-steps
      local inputs, targets = {}, {}
      for step = 1, rho do                              
        --batch of inputs
        inputs[step] = train_input:index(1, offsets)
        --batch of targets
        targets[step] = {train_out[1]:index(1, offsets), train_out[2]:index(1, offsets), 
                          train_out[3]:index(1, offsets), train_out[4]:index(1, offsets), 
                          train_out[5]:index(1, offsets), train_out[6]:index(1, offsets)}
        --increase offsets indices by 1
        offsets = train_input[{ {t+step, t+step+rho} }] 
        offsets = torch.LongTensor():resize(offsets:size()[1]):copy(offsets)
      end  

      --2. Forward sequence through rnn

      neunet:zeroGradParameters()
      neunet:forget()  --forget all past time steps

      local outputs, err = {}, 0
      local inputs_ = {}
      local targetsTable =    {}    

      for step = 1, rho do   
        table.insert(inputs_, inputs[step])
        outputs[step] = neunet:forward(inputs_)
        _, outputs[step] = catOut(outputs, step, noutputs, opt)

        --reshape output data

        _, targetsTable = catOut(targets, step, noutputs, opt) 
        err     = err + cost:forward(outputs[step], targetsTable)
        print('err', err)
      end
      print(string.format("Step %d, Loss error = %f ", iter, err ))

      --3. do backpropagation through time (Werbos, 1990; Rumelhart et al., 1986)
      local gradOutputs, gradInputs = {}, {}
      local inputs_bkwd = {}
      for step = rho, 1, -1 do  --we basically reverse order of forward calls              
        gradOutputs[step] = cost:backward(outputs[step], targets[step])

        --resize inputs before backward call
        inputs_bkwd = gradInputResize(inputs, step, noutputs, opt)
        gradInputs[step]  = neunet:backward(inputs_bkwd, gradOutputs[step])
        -- print('gradInputs'); print(gradInputs)
      end

      --4. update lr
      neunet:updateParameters(opt.rnnlearningRate)

      iter = iter + 1
    end

I am getting errors when I call neunet:backward(inputs_bkwd, gradOutputs[step]):

home/local/ANT/ogunmolu/torch/install/bin/luajit: ...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Linear.lua:75: size mismatch, m1: [6 x 1], m2: [6 x 1] at /home/local/ANT/ogunmolu/torch/pkg/torch/lib/TH/generic/THTensorMath.c:706
stack traceback:
    [C]: in function 'addmm'
    ...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
    ...T/ogunmolu/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
    ...NT/ogunmolu/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
    ...lu/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
    ...T/ogunmolu/torch/install/share/lua/5.1/rnn/Sequencer.lua:78: in function 'updateGradInput'
    ...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    rnn.lua:475: in function 'train'
    rnn.lua:688: in main chunk
    [C]: in function 'dofile'
    ...molu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406240

Would appreciate your help!

I left a gist here in case you are interested in the code.
