optim's Issues

rmsprop causing strange loss of accuracy part way through training

I've been using Adagrad normally, but I decided to try rmsprop to see if it improves accuracy. In our tests, rmsprop seemed to converge faster and to a higher maximum, so we started training our big models using it. However, I have noticed something strange during training. It seems that, at random, the accuracy will precipitously drop and the loss will suddenly shoot up. Sometimes I've even seen "infinity" in our testing results when this happens - as if one of the model parameters got accidentally changed to infinity, causing a cascade of failed calculations. See these results:

This is one of the first rmsprop runs:

decayed learning rate by a factor 0.97 to 0.012665023782736
iteration 6800/17090000, seq_length = 500, loss = 25.60889482, loss/seq_len = 0.02560889, gradnorm = 1.3048e+01. Time Elapsed: 3070 seconds
iteration 6850/17090000, seq_length = 500, loss = 35.99438245, loss/seq_len = 0.03599438, gradnorm = 1.7849e+01. Time Elapsed: 3158 seconds
iteration 6900/17090000, seq_length = 500, loss = 14.20753793, loss/seq_len = 0.01420754, gradnorm = 1.6731e+01. Time Elapsed: 3185 seconds
iteration 6950/17090000, seq_length = 500, loss = 31.02228065, loss/seq_len = 0.03102228, gradnorm = 2.1421e+01. Time Elapsed: 3205 seconds
decayed learning rate by a factor 0.97 to 0.012285073069254
iteration 7000/17090000, seq_length = 500, loss = 126072.68073179, loss/seq_len = 126.07268073, gradnorm = 9.3243e+03. Time Elapsed: 3183 seconds
iteration 7050/17090000, seq_length = 500, loss = 71258.54748077, loss/seq_len = 71.25854748, gradnorm = 9.2335e+03. Time Elapsed: 6792 seconds
iteration 7100/17090000, seq_length = 500, loss = 59993.95191604, loss/seq_len = 59.99395192, gradnorm = 8.9946e+03. Time Elapsed: 3071 seconds
iteration 7150/17090000, seq_length = 500, loss = 80161.97462837, loss/seq_len = 80.16197463, gradnorm = 9.0648e+03. Time Elapsed: 3223 seconds
decayed learning rate by a factor 0.97 to 0.011916520877176
iteration 7200/17090000, seq_length = 500, loss = 62363.37415352, loss/seq_len = 62.36337415, gradnorm = 6.3187e+03. Time Elapsed: 3077 seconds
iteration 7250/17090000, seq_length = 500, loss = 77396.41234885, loss/seq_len = 77.39641235, gradnorm = 6.3629e+03. Time Elapsed: 2930 seconds
iteration 7300/17090000, seq_length = 500, loss = 66974.65153092, loss/seq_len = 66.97465153, gradnorm = 5.9655e+03. Time Elapsed: 2989 seconds
iteration 7350/17090000, seq_length = 500, loss = 34369.91119689, loss/seq_len = 34.36991120, gradnorm = 5.8163e+03. Time Elapsed: 2813 seconds

Notice what happens around iteration 7000. The loss just shoots up all of a sudden. If I check the testing results, the testing loss is "infinity". It goes back to normal in subsequent iterations. At first I thought it was a rare hardware issue, but then a different model did the same thing:

Iteration   Time   Training Loss   Testing Loss   Testing # Correct   Testing # Wrong   Testing # Total   Accuracy
1000        3032   1.998393671     3.460828         8220              140937            149157             5.51
2000        3321   1.506352061     1.13135852     106180               42977            149157            71.19
3000        3389   0.6526988754    0.6081444923   126793               22364            149157            85.01
4000        3382   0.4032474733    0.4583896942   131588               17569            149157            88.22
5000        3075   2.197617545     17.48262351     60603               88554            149157            40.63

In this second example, I can see the point where the loss starts shooting up in the logs. It doesn't appear to be instantaneous - perhaps an error is made in one iteration that slowly cascades until it affects everything.

decayed learning rate by a factor 0.97 to 0.01825346
iteration 4400/17090000, seq_length = 500, loss = 0.38249470, gradnorm = 8.0499e+01. Time Elapsed: 3280 seconds
iteration 4450/17090000, seq_length = 500, loss = 0.37212085, gradnorm = 2.9393e+02. Time Elapsed: 3426 seconds
iteration 4500/17090000, seq_length = 500, loss = 0.36586265, gradnorm = 8.7689e+01. Time Elapsed: 3288 seconds
iteration 4550/17090000, seq_length = 500, loss = 0.35865728, gradnorm = 5.4034e+01. Time Elapsed: 3416 seconds
decayed learning rate by a factor 0.97 to 0.0177058562
iteration 4600/17090000, seq_length = 500, loss = 0.40036575, gradnorm = 7.8565e+01. Time Elapsed: 3327 seconds
iteration 4650/17090000, seq_length = 500, loss = 0.42660431, gradnorm = 2.2500e+02. Time Elapsed: 3309 seconds
iteration 4700/17090000, seq_length = 500, loss = 0.49915671, gradnorm = 4.2741e+03. Time Elapsed: 3237 seconds
iteration 4750/17090000, seq_length = 500, loss = 0.86534878, gradnorm = 3.5756e+03. Time Elapsed: 3251 seconds
decayed learning rate by a factor 0.97 to 0.017174680514
iteration 4800/17090000, seq_length = 500, loss = 1.24005108, gradnorm = 4.3706e+03. Time Elapsed: 3232 seconds
iteration 4850/17090000, seq_length = 500, loss = 1.22130984, gradnorm = 5.6758e+03. Time Elapsed: 3117 seconds
iteration 4900/17090000, seq_length = 500, loss = 6.12171381, gradnorm = 9.2302e+03. Time Elapsed: 3232 seconds
iteration 4950/17090000, seq_length = 500, loss = 11.80134205, gradnorm = 9.0186e+03. Time Elapsed: 3029 seconds
decayed learning rate by a factor 0.97 to 0.01665944009858
iteration 5000/17090000, seq_length = 500, loss = 17.11424646, gradnorm = 6.3805e+03. Time Elapsed: 3075 seconds

You can see the loss going down, and then it starts going up again slowly, which isn't totally unusual. But then it quickly spikes and never recovers! We didn't see any "infinities" in this run, but the same curious sudden change in loss is visible. I wouldn't be surprised if there actually was an infinity, but in one of the iterations in between where we don't record results.

Does anyone have any insight into what might be happening? I haven't ever seen something like this when using Adagrad - only with the models that we train using rmsprop.
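One mitigation worth sketching here (my own assumption, not something from the optim package): clip the gradient inside the closure before rmsprop sees it, so a single bad batch cannot blow up the adaptive denominator. originalFeval is a placeholder name for the existing closure.

-- sketch: wrap the existing closure and rescale the gradient when its norm
-- exceeds a threshold
local clip = 5
local function clippedFeval(x)
   local loss, grad = originalFeval(x)
   local norm = grad:norm()
   if norm > clip then
      grad:mul(clip / norm)
   end
   return loss, grad
end
-- then call optim.rmsprop(clippedFeval, params, rmspropConfig) as before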

How can I use the learningRates option in SGD?

I tried to give a vector of learning rates per layer, but it does not work. Here is my code:

local learningRates = {}
local params, gradParams = model:parameters()
print(params[1]:size())
for i = 1, #params do
   learningRates[i] = opt.LR
end
print("setting LR")
learningRates[#params] = opt.topLayerLR
-- print(learningRates)
learningRates = torch.Tensor(learningRates):reshape(#params, 1)

Could you help me use learningRates in the proper way?
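For what it's worth, here is a sketch of how I understand the option (an assumption based on reading sgd.lua, not official documentation): learningRates must have one entry per element of the flattened parameter vector returned by model:getParameters(), and it is multiplied element-wise with the gradient, so per-layer rates have to be expanded to per-element rates.

local params, gradParams = model:getParameters()   -- flattened vector
local layerParams = model:parameters()             -- per-layer tensors (views)
local learningRates = torch.Tensor(params:nElement())

local offset = 1
for i = 1, #layerParams do
   local n = layerParams[i]:nElement()
   local lr = (i == #layerParams) and opt.topLayerLR or opt.LR
   learningRates:narrow(1, offset, n):fill(lr)
   offset = offset + n
end

-- with per-element rates set, the global learningRate acts as a multiplier
sgdConfig = {learningRate = 1, learningRates = learningRates}
-- optim.sgd(feval, params, sgdConfig)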

ConfusionMatrix:batchAdd error in function indexAdd

Hello,

I would like to plot the confusion matrix for the resnet network https://github.com/facebook/fb.resnet.torch. I substituted the 1000-way classifier with a binary one.
My output is a tensor of size 32x2 and my target a tensor of size 32x1.

When I try to use ConfusionMatrix:batchAdd I get this error:

/home/jessica/torch/install/bin/luajit: ...ca/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: bad argument #1 to 'indexAdd' (out of range at /home/jessica/torch/pkg/torch/lib/TH/generic/THTensor.c:729)
stack traceback:
[C]: in function 'indexAdd'
...ca/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: in function 'batchAdd'
./train.lua:71: in function 'train'
main.lua:59: in main chunk
[C]: in function 'dofile'
...sica/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

I am new to Torch and Lua; could you help me understand what is going wrong? Thank you.
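In case it helps, a sketch of what usually fixes this shape combination (assuming output is the 32x2 score tensor, target the 32x1 label tensor, and confusion the optim.ConfusionMatrix instance): batchAdd expects 1-D targets whose values lie in 1..nclasses, so the extra dimension has to be squeezed, and 0/1 labels have to be shifted to 1/2.

local t = target:squeeze()           -- 32x1 -> 32
if t:min() == 0 then t = t + 1 end   -- 0/1 labels -> class indices 1/2
confusion:batchAdd(output, t)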

Optim not working using SplitTable with no updateGradInput method

Hi there,

I'm using Torch to implement the TransE model. I need to use a SplitTable, and I got an error like the one reported in torch/nn#568. I tried to redefine the updateGradInput method, but when using optim I get the following error in the backward pass:

/opt/torch/install/bin/luajit: /opt/torch/install/share/lua/5.1/optim/sgd.lua:82: inconsistent tensor size at /opt/torch/pkg/torch/lib/TH/generic/THTensorMath.c:500
stack traceback:
        [C]: in function 'add'
        /opt/torch/install/share/lua/5.1/optim/sgd.lua:82: in function 'optim_method'

I've tried training my model with a plain on-the-fly update procedure and it works. So I think the problem is related to optim; maybe it is not able to handle the unusual case of SplitTable's undefined updateGradInput method.
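A quick sanity check that narrows this down (a sketch, assuming a standard nn container): optim.sgd adds the flattened gradient into the flattened parameter vector, so the two must have exactly the same number of elements.

local params, gradParams = model:getParameters()
print(params:nElement(), gradParams:nElement())   -- must be equal for optim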

Thank you for your help.

Alessandro

Interested in Levenberg-Marquardt?

Hi, are the developers interested in adding the Levenberg-Marquardt algorithm to the package? I have implemented it recently in Torch. Practice has shown that it is one of the best optimization algorithms for common (not deep) neural nets.

Small logic error when ConfusionMatrix:batchAdd() is called

Hi, all~

This is really my first time using minibatch mode, since I've been sticking to minibatch=1 (SGD ;-) for a long time. Calling batchAdd() with a single input and a single label should be equivalent to add(), but it seems there is a small bug in batchAdd().


error case:
for network model

net = nn.Sequential()
....
-- expected to output 8 predictions
net:add(nn.Reshape(8))

and with input and label shown below

input = torch.FloatTensor(1,4,12,12)
target = torch.FloatTensor(1,1)

The output data format is torch.LongTensor(8) and torch.FloatTensor(1,1).
Then the prediction and the label are handled by different branches inside batchAdd().

Finally, we get pred as torch.LongTensor(4) but label as torch.FloatTensor(1).
As a result, an out-of-range error is thrown.
x_x


Summary:

  1. batchAdd() and add() work fine with minibatch > 2.
  2. nn.Reshape(8) squeezes the first dimension (1x8x1x1 --> 8), which is one cause of this bug.
  3. The dimension check in batchAdd() is not all-covering; maybe a mutual consistency check of preds and targets should be considered.

Just reporting the bug; if it's helpful to you, why not star the repo?
Happy hacking! ;-)
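A workaround sketch for the batch-of-one case (assuming output and target are the tensors described above and confusion is the ConfusionMatrix): reshape both so the batch dimension stays explicit before calling batchAdd.

local batchSize = target:nElement()   -- one label per sample
confusion:batchAdd(output:view(batchSize, -1), target:view(batchSize))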

module 'optim' not found

I have installed optim, but it could not be found.
lua: ./Network.lua:1: module 'optim' not found:
        no field package.preload['optim']
        no file '/usr/local/share/lua/5.2/optim.lua'
        no file '/usr/local/share/lua/5.2/optim/init.lua'
        no file '/usr/local/lib/lua/5.2/optim.lua'
        no file '/usr/local/lib/lua/5.2/optim/init.lua'
        no file '/usr/share/lua/5.2/optim.lua'
        no file '/usr/share/lua/5.2/optim/init.lua'
        no file './optim.lua'
        no file '/usr/local/lib/lua/5.2/optim.so'
        no file '/usr/lib/x86_64-linux-gnu/lua/5.2/optim.so'
        no file '/usr/lib/lua/5.2/optim.so'
        no file '/usr/local/lib/lua/5.2/loadall.so'
        no file './optim.so'
stack traceback:
        [C]: in function 'require'
        ./Network.lua:1: in main chunk
        [C]: in function 'require'
        ./AN4CTCTest.lua:4: in main chunk
        [C]: in ?
The installation output is the following:
sherrie@sherrie-PC:~/CTCSR$ luarocks install optim
Installing https://raw.githubusercontent.com/torch/rocks/master/optim-1.0.5-0.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/optim-1.0.5-0.rockspec... switching to 'build' mode
Cloning into 'optim'...
remote: Counting objects: 50, done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 50 (delta 10), reused 22 (delta 6), pack-reused 0
Receiving objects: 100% (50/50), 40.67 KiB | 0 bytes/s, done.
Resolving deltas: 100% (10/10), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/sherrie/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0" && make

-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/sherrie/torch/install
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_optim-1.0.5-0-9983/optim/build
cd build && make install
Install the project...
-- Install configuration: "Release"
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/checkgrad.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adagrad.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adadelta.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/polyinterp.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/rmsprop.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/lswolfe.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/nag.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adam.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/rprop.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/init.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/cg.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/sgd.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/fista.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adamax.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/asgd.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/lbfgs.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/Logger.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/ConfusionMatrix.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/cmaes.lua
Updating manifest for /home/sherrie/torch/install/lib/luarocks/rocks
optim 1.0.5-0 is now built and installed in /home/sherrie/torch/install/ (license: BSD)
Could you help me with this error?

Learning rate decay

In the function optim.sgd the learning rate decay is implemented this way:
line 71: local clr = lr / (1 + nevals*lrd)
where nevals is equal to state.evalCounter. However, evalCounter is incremented (on line 85) every time the optim method is called during training, i.e. at each mini-batch update. So nevals does not count epochs, but individual mini-batch updates.
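To make the effect concrete, a small illustration (a sketch with assumed values): with learningRateDecay = 1e-4 the effective rate is already halved after 10,000 mini-batch updates, which for many datasets is only a few epochs.

local lr, lrd = 0.01, 1e-4
for _, t in ipairs({0, 100, 1000, 10000}) do
   print(t, lr / (1 + t * lrd))   -- 0.01, ~0.0099, ~0.0091, 0.005
end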
Is that a bug or your intention?

Thanks,
Petr

multiple plots with optim.logger

Is there a way to have multiple plots in one figure? One plot works fine, but when I do something like
logger:add{['training loss'] = loss1}; logger:add{['test loss'] = loss2}
it gives an error. So I define two loggers, but the disadvantage is that they produce two figures instead of one.
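For reference, a sketch of the pattern that works with a single Logger (assuming both losses are available at the same logging step): name both series up front and add them in one call.

logger = optim.Logger('loss.log')
logger:setNames{'training loss', 'test loss'}
logger:style{'-', '-'}
-- once per epoch:
logger:add{loss1, loss2}
logger:plot()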

Question on rmsprop implementation

According to the TensorFlow implementation, it seems to me that line 53 of rmsprop.lua needs to be modified as follows:

state.tmp:sqrt(state.m):add(epsilon) --> state.tmp:sqrt(state.m+epsilon)

Is it okay to use the original one without modification when I try to train Inception-ResNet v2 from scratch using the same optimization parameters?

Issue with rmsprop

While training with sgd works fine for the exact same architecture, rmsprop throws the following error:

qlua: /home/psxab5/torch/install/share/lua/5.1/optim/rmsprop.lua:49: calling 'addcmul' on bad self (sizes do not match at /tmp/luarocks_cutorch-scm-1-4946/cutorch/lib/THC/THCTensorMath.cu:231)
stack traceback:
        [C]: at 0x7fcea4fe4e20
        [C]: in function 'addcmul'
        /home/psxab5/torch/install/share/lua/5.1/optim/rmsprop.lua:49: in function 'rmsprop'
        ./train.lua:62: in function 'train'
        main.lua:91: in main chunk
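One possible cause, sketched here as an assumption rather than a confirmed diagnosis: rmsprop creates its per-parameter buffers (state.m, state.tmp) sized to the parameter vector on the first call, so reusing a config/state table that was created for a differently sized model produces exactly this kind of size mismatch. Keeping one fresh state table per model avoids it (feval and params stand for the usual closure and flattened parameters).

local rmspropConfig = {learningRate = 1e-3}
local rmspropState  = {}               -- buffers are created on first use
for i = 1, nIterations do
   optim.rmsprop(feval, params, rmspropConfig, rmspropState)
end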

Is weight decay implemented correctly?

As far as I understand, the point of weight decay is to keep weights from getting too big (in absolute value).
Shouldn't weight decay be implemented with absolute values, to avoid negative values getting larger in magnitude? Otherwise, if we have a large negative weight, the gradient will push it to become even larger.

Current implementation:
dfdx:add(wd, x)

How it should be:
dfdx:add(wd, torch.abs(x))

This applies to the Adagrad and SGD weight decay.
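For reference, a short expansion of what the current line computes when combined with a plain sgd step (my own working, shown only to make the discussion concrete):

-- dfdx:add(wd, x)  =>  x_new = x - lr * (dfdx + wd * x)
--                            = (1 - lr*wd) * x - lr * dfdx
-- i.e. each weight, positive or negative, is scaled toward zero by (1 - lr*wd)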

Optim method runs on multi-core by default?

Hello,
I have a neural network built with the 'nn' package and trained with the 'optim' package.
When training my model, all CPUs are in use at 100%.
I wonder whether optim automatically finds all available CPUs and performs parallel processing?
If so, is there an option to configure the number of CPUs used for training?
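For what it's worth, a sketch of where the parallelism usually comes from: optim itself is plain Lua, while the tensor math underneath it (BLAS/OpenMP in torch) is what spreads across cores, and its thread count can be capped.

torch.setnumthreads(4)        -- limit the threads used by tensor operations
print(torch.getnumthreads())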

Thanks

Feature request: Don't show plot window in Logger

Currently, Logger:plot(...) always displays a plot window and optionally writes to a file, i.e. it's not possible to save a plot without displaying it. This might be inconvenient if running a batch of experiments, for example. I suggest introducing a new member variable self.showPlot = true to control the behaviour.

Feature Request: Adasecant

Hi,

This could be a good addition to your library:

Adasecant
"In this paper, we propose an adaptive learning rate algorithm, which utilizes stochastic curvature information of the loss function for automatically tuning the learning rates"

Link to pdf

New rockspec

I think it's time for a new rockspec.

Could we do without tags and just have a rockspec that gets updated as the repository gets updated?

ConfusionMatrix: Stochastic bug with batchAdd

I'm running into a bug that appears and disappears for no apparent reason in the use of ConfusionMatrix.batchAdd. See these two consecutive runs:

    $ th train.lua --printstep 20 --skiplog --cuda
    72 of 45000 training records will be unused per epoch.
    24 of 15000 validation records will be unused per epoch.
    [2016-10-14 18:40:09] Finished epoch = 1, batch = 20, with loss = 1.1431102752686.
    [2016-10-14 18:40:10] Finished epoch = 1, batch = 40, with loss = 0.98379397392273.
    [2016-10-14 18:40:10] Finished epoch = 1, batch = 60, with loss = 0.69640064239502.
    [2016-10-14 18:40:11] Finished epoch = 1, batch = 80, with loss = 0.53388464450836.
    [2016-10-14 18:40:11] Finished epoch = 1, batch = 100, with loss = 0.42102938890457.
    [2016-10-14 18:40:12] Finished epoch = 1, batch = 120, with loss = 0.69019424915314.
    [2016-10-14 18:40:13] Finished epoch = 1, batch = 140, with loss = 0.28126338124275.
    [2016-10-14 18:40:13] Finished epoch = 1, batch = 160, with loss = 0.31771036982536.
    [2016-10-14 18:40:14] Finished epoch = 1, batch = 180, with loss = 0.36902123689651.
    [2016-10-14 18:40:15] Finished epoch = 1, batch = 200, with loss = 0.15535597503185.
    [2016-10-14 18:40:15] Finished epoch = 1, batch = 220, with loss = 0.26898837089539.
    [2016-10-14 18:40:16] Finished epoch = 1, batch = 240, with loss = 0.2337928712368.
    [2016-10-14 18:40:16] Finished epoch = 1, batch = 260, with loss = 0.19574552774429.
    [2016-10-14 18:40:17] Finished epoch = 1, batch = 280, with loss = 0.37691986560822.
    [2016-10-14 18:40:18] Finished epoch = 1, batch = 300, with loss = 0.27491936087608.
    [2016-10-14 18:40:18] Finished epoch = 1, batch = 320, with loss = 0.36371386051178.
    [2016-10-14 18:40:19] Finished epoch = 1, batch = 340, with loss = 0.15922805666924.
    /home/asb/torch/install/bin/luajit: ...sb/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: bad argument #1 to
    'indexAdd' (out of range at /home/asb/torch/pkg/torch/lib/TH/generic/THTensor.c:729)
    stack traceback:
            [C]: in function 'indexAdd'
            ...sb/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: in function 'batchAdd'
            train.lua:153: in main chunk
            [C]: in function 'dofile'
            .../asb/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
            [C]: at 0x00405b80
    $ th train.lua --printstep 20 --skiplog --cuda
    72 of 45000 training records will be unused per epoch.
    24 of 15000 validation records will be unused per epoch.
    [2016-10-14 18:43:17] Finished epoch = 1, batch = 20, with loss = 2.0140626430511.
    [2016-10-14 18:43:18] Finished epoch = 1, batch = 40, with loss = 0.97827231884003.
    [2016-10-14 18:43:18] Finished epoch = 1, batch = 60, with loss = 0.62330090999603.
    [2016-10-14 18:43:19] Finished epoch = 1, batch = 80, with loss = 0.73870342969894.
    [2016-10-14 18:43:19] Finished epoch = 1, batch = 100, with loss = 0.61164426803589.
    [2016-10-14 18:43:20] Finished epoch = 1, batch = 120, with loss = 0.40717771649361.
    [2016-10-14 18:43:21] Finished epoch = 1, batch = 140, with loss = 0.46196541190147.
    [2016-10-14 18:43:21] Finished epoch = 1, batch = 160, with loss = 0.7626816034317.
    [2016-10-14 18:43:22] Finished epoch = 1, batch = 180, with loss = 0.42969378829002.
    [2016-10-14 18:43:23] Finished epoch = 1, batch = 200, with loss = 0.42102152109146.
    [2016-10-14 18:43:23] Finished epoch = 1, batch = 220, with loss = 0.34528177976608.
    [2016-10-14 18:43:24] Finished epoch = 1, batch = 240, with loss = 0.32393988966942.
    [2016-10-14 18:43:24] Finished epoch = 1, batch = 260, with loss = 0.25361078977585.
    [2016-10-14 18:43:25] Finished epoch = 1, batch = 280, with loss = 0.35111820697784.
    [2016-10-14 18:43:26] Finished epoch = 1, batch = 300, with loss = 0.35840207338333.
    [2016-10-14 18:43:26] Finished epoch = 1, batch = 320, with loss = 0.19336950778961.
    [2016-10-14 18:43:27] Finished epoch = 1, batch = 340, with loss = 0.23242954909801.
    Total accuracy of classifier at completion of epoch 1 = 92.062784433365.
    Mean accuracy across classes at completion of epoch 1 = 92.140758547009.
    [2016-10-14 18:43:29] Finished epoch = 2, batch = 20, with loss = 0.45858466625214.
    [2016-10-14 18:43:29] Finished epoch = 2, batch = 40, with loss = 0.22427660226822.
    [2016-10-14 18:43:30] Finished epoch = 2, batch = 60, with loss = 0.2953850030899.
    [2016-10-14 18:43:31] Finished epoch = 2, batch = 80, with loss = 0.2055009752512.

The first run fails while the other succeeds with no changes whatsoever.

Moreover, argument 1 of indexAdd, which the stack trace complains about, is hard-coded to the value 1, so I'm not sure how user code could even affect it.

My code is available here for reference.

Any ideas to debug this?
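One debugging sketch that helps with this kind of intermittent failure (an assumption, not a confirmed diagnosis, with outputs/targets/confusion standing for the tensors and matrix used in train.lua): a single target outside 1..nclasses in some batch makes the internal indexAdd go out of range, so asserting the range right before batchAdd pinpoints the offending batch.

assert(targets:min() >= 1 and targets:max() <= confusion.nclasses,
       string.format('target out of range: [%g, %g]', targets:min(), targets:max()))
confusion:batchAdd(outputs, targets)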

Thanks.

[Feature request] A suitable test problem for Adam

Could I request a test problem for the Adam optimizer, just to understand how it works better? Thanks 👍

I tried to use the new adam optimizer on the rosenbrock test problem in optim, as used for adagrad, but it doesn't seem to work. I can't get it to work with a range of different config parameters for rosenbrock, or for the ML problem I'm working on (simple copy tasks using the Neural Turing Machine).

I expect I've misunderstood the adam paper and I'm doing something wrong or really dumb. Does the objective necessarily have to be stochastic for adam to be applied?

If so, then I expect it wouldn't work for rosenbrock, or for the LSTM used in the Neural Turing Machine, which is deterministic?

My failed rosenbrock attempt

require 'torch'
require 'optim'
require 'rosenbrock'
require 'l2'

x = torch.Tensor(2):fill(0)
fx = {}
config_adagrad = {learningRate = 1e-1}

config_adam = {
   learningRate = 1e-6,
   beta1 = 0.01,
   beta2 = 0.001
}

for i = 1, 10001 do
   -- x, f = optim.adagrad(rosenbrock, x, config_adagrad)
   x, f = optim.adam(rosenbrock, x, config_adam)
   if (i-1) % 1000 == 0 then
      table.insert(fx, f[1])
   end
end

print()
print('Rosenbrock test')
print()
print('x='); print(x)
print('fx=')
for i = 1, #fx do print((i-1)*1000 + 1, fx[i]) end

OUTPUT

Rosenbrock test
x=
0.01 *
2.0243
0.0523
[torch.DoubleTensor of dimension 2]

fx=
1 1
1001 0.96578919291079
2001 0.96476690379526
3001 0.96406793828219
4001 0.96344671327807
5001 0.96285032358036
6001 0.96226242754115
7001 0.96167791102736
8001 0.96111328576091
9001 0.96050921334115
10001 0.9599246681675
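For comparison, a sketch with the paper's default moment coefficients and a larger step size (the values below are my assumptions, not a tuned recipe). With beta1 = 0.01 and beta2 = 0.001 the moving averages forget almost everything each step, so the update is roughly learningRate * sign(gradient), i.e. about 1e-6 per iteration, which matches the tiny progress above.

x = torch.Tensor(2):fill(0)
config_adam = {learningRate = 1e-2, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8}
for i = 1, 20000 do
   x, f = optim.adam(rosenbrock, x, config_adam)
end
print(x)   -- should now move toward the optimum at (1, 1)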

[Feature request] Implementation of RMSProp

Dear all,

are there any plans to include an implementation of RMSProp in the near future? I know that there are already several basic/toy implementations available elsewhere [1,2,3], but it would be nice to have a solid implementation included in the optim package, making it easier to use without having to install additional packages.

Thanks,
Michael

[1] https://github.com/w-cheng/optimx/blob/master/rmsprop.lua
[2] https://github.com/y0ast/VariationalDeconvnet/blob/master/rmsprop.lua
[3] https://github.com/kaishengtai/torch-ntm/blob/master/rmsprop.lua

Optim does not handle cloned weights

While playing with the example for MarginRankingCriterion at https://github.com/torch/nn/blob/master/doc/criterion.md#nn.MarginRankingCriterion, I noticed that optim does not seem to be able to handle cloned weights and biases, due to a size mismatch between the flattened parameters and the flattened gradient params. Here's a simple example:

require 'nn'
require 'optim'

p1_mlp = nn.Linear(5, 2)
p2_mlp = p1_mlp:clone('weight', 'bias')

prl = nn.ParallelTable()
prl:add(p1_mlp)
prl:add(p2_mlp)

mlp1 = nn.Sequential()
mlp1:add(prl)
mlp1:add(nn.DotProduct())

mlp2 = mlp1:clone('weight', 'bias')

mlpa = nn.Sequential()
prla = nn.ParallelTable()
prla:add(mlp1)
prla:add(mlp2)
mlpa:add(prla)

criterion = nn.MarginRankingCriterion(0.1)

x, y, z = torch.randn(5), torch.randn(5), torch.randn(5)

parameters, gradParameters = mlpa:getParameters()
print(parameters:size(), gradParameters:size()) -- show size difference

function feval(params)
    local pred = mlpa:forward({{x, y}, {x, z}})
    local err = criterion:forward(pred, 1)
    local gradCriterion = criterion:backward(pred, 1)
    mlpa:backward({{x, y}, {x, z}}, gradCriterion)
    return err, gradParameters
end
optErr = optim.sgd(feval, parameters, {learningRate=0.01})

Which gives me:

 12
[torch.LongStorage of size 1]
 48
[torch.LongStorage of size 1]
luajit: .../share/lua/5.1/optim/sgd.lua:81: inconsistent tensor size at /tmp/luarocks_torch-scm-1-5221/torch7/lib/TH/generic/THTensorMath.c:456
stack traceback:
        [C]: in function 'add'
        .../share/lua/5.1/optim/sgd.lua:81: in function 'sgd'
        example.lua:37: in main chunk
        [C]: in function 'dofile'
        .../lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406260

The example in the documentation uses updateParameters(), which handles the parameter updating differently from optim's addition: see e.g. https://github.com/torch/optim/blob/master/sgd.lua#L81 versus the following in nn/Module.lua:

for i=1,#params do
   params[i]:add(-learningRate, gradParams[i])
end

This is a bit confusing. Of course I could somehow resize the gradParameters in feval, but that doesn't seem to be the right way to do this? I would say that if something works with the simple backward-updateParameters loop it should also work with optim. Am I missing something here?
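A sketch of what I believe is going on (an assumption worth checking): clone('weight', 'bias') shares the weights but not the gradient tensors, so getParameters() flattens 12 shared parameters but 48 unshared gradients. Sharing the gradient tensors as well makes the two flattened vectors the same size.

p2_mlp = p1_mlp:clone('weight', 'bias', 'gradWeight', 'gradBias')
-- ...
mlp2 = mlp1:clone('weight', 'bias', 'gradWeight', 'gradBias')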

Is it possible to optimize a table of modules together?

I'm currently working on a project where there are two modules that need to be optimized, and these two modules are somewhat related to each other. I'm wondering if it is possible to optimize them together using optim? For example, could I write a feval function whose input is a table of parameters, { paramFromModule1, paramFromModule2 }, and which returns a table of grads, { gradsFromModule1, gradsFromModule2 }?
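A sketch of one pattern that works with the existing API (module1, module2 and loss are placeholders for your own modules and criterion output): put both modules into a container, flatten their parameters together, and hand optim the single flattened pair.

local container = nn.Container()
container:add(module1)
container:add(module2)
local params, gradParams = container:getParameters()   -- one flat vector each

local function feval(x)
   if x ~= params then params:copy(x) end
   gradParams:zero()
   -- run forward/backward through both modules; their gradients accumulate
   -- into the shared gradParams storage
   return loss, gradParams
end

optim.sgd(feval, params, {learningRate = 1e-2})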

New release, PLEASE

The current release 1.0.3-1 is bugged! Could we please release a new one where gnuplot has been definitely fixed (by my commit)? I'm getting this bug on every machine.

Please add ability to plot only specified categories.

Say we log time, train accuracy and test accuracy. We cannot plot time and accuracy on the same figure since the scales are different. I suggest adding the ability to select which categories to plot; in my example we probably want to plot only the accuracies.

Checking gradients on GPU?

This seems way too basic, but I've been looking at this for several hours and can't see a way around this:

I have a basic script which creates a network and checks the gradient. When run on the CPU using double-precision floats, the gradient check is good. On the GPU, because CudaTensor objects use single-precision floats, the analytic and numerical gradients don't match. This issue can also be replicated on the CPU by setting the tensor type to FloatTensor. I feel like I must be messing something up, but I really don't see anything.

Here's the script:

require 'nn'
require 'optim'
require 'cunn'

local inputSize = 10
local outputSize = 10
local batchSize = 5

function check_gpu()
  -- First we do everything on CPU

  -- If you uncomment this line, the CPU estimates will be inconsistent too
  -- torch.setdefaulttensortype('torch.FloatTensor') 

  local net = nn.Sequential()
    -- :add(nn.Sigmoid())
    :add(nn.Linear(inputSize, outputSize))
  local w,dw = net:getParameters()
  local inp = torch.randn(batchSize, inputSize)
  local tgt = torch.zeros(batchSize)
  for i=1,batchSize do
    local i1 = torch.random(1, inputSize)
    tgt[i] = i1
  end

  local crit = nn.CrossEntropyCriterion()
  crit.sizeAverage = false

  local feval = function(x)
    if x ~= w then w:copy(x) end
    local out = net:forward(inp)
    local ce = crit:forward(out, tgt)
    local gradOutput = crit:backward(out, tgt)
    net:backward(inp, gradOutput)

    return ce, dw
  end
  
  local diff_cpu = optim.checkgrad(feval, w, 1e-4)
  print ('on cpu', diff_cpu)

  -- Then we do everything on GPU
  net:cuda()
  -- inp = inp:type('torch.CudaDoubleTensor')
  -- tgt = tgt:type('torch.CudaDoubleTensor')
  inp = inp:cuda()
  tgt = tgt:cuda()
  w, dw = net:getParameters()
  dw:zero()
  crit:cuda()
  local diff_gpu = optim.checkgrad(feval, w, 1e-4)
  print ('on gpu now', diff_gpu)
end

check_gpu()

Here's some sample output:

on cpu	4.0399252194701e-10	
on gpu now	0.003014417524189	
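One thing worth keeping in mind (my reading of the situation, not a confirmed answer): with single-precision tensors the finite-difference estimate itself only has about 3-4 significant digits, so a relative difference on the order of 1e-3 is roughly the best checkgrad can report there. Sweeping the perturbation size makes this visible.

for _, eps in ipairs({1e-2, 1e-3, 1e-4}) do
   local diff = optim.checkgrad(feval, w, eps)
   print(eps, diff)
end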

polyinterp nans / wrong matrix shape

In the polyinterp function, cp is first an N-dim vector, and then is reshaped into an Nx2 matrix, only if no NaNs are found.

In the case where NaNs are found that statement crashes, as it expects an Nx2 matrix.

Not sure what the right behavior should be there. Is it normal that NaNs appear at that point?

Missing entries in Logger.add

Is it possible for Logger:add(s) to add only a subset of the variables? For instance, I would like to call Logger:setNames({'loss', 'epoch', 'batch'}) and then do Logger:add({loss=0.5, batch=10}). The unknown field epoch should be set to a default value, say zero.

I think the current version of Logger is designed to only take an ordered array as its argument (?). In the above example, if I do Logger:add({0.5, 10}), it will simply set epoch to 10 and leave batch blank. This behavior is quite a pain when logging lots of variables.

sensitivity / specificity error

require 'optim'
conf = optim.ConfusionMatrix(3)
conf:add(1,3)
conf:add(2,2)
conf:add(3,1)
conf:sensitivity() -- or conf:specificity() 

Here's a pull request that fixes it: #28

ConfusionMatrix:render() bug

This is related to pull request #63 : #63

There are many divisions happening in the render function.
The rendered image is full of zeros because self.mat is a LongTensor, so everything is rounded down to zero after dividing by the number of samples.

That commit should be reverted...

@soumith
@jonathantompson

Documentation

Each function should be documented in the README, instead of just inline. There should also be a link for each optimization function to the original paper.

Adagrad fails with GPU nets

Adagrad uses torch.sqrt applied to CudaTensors. Unfortunately, that's not supported so it fails with:

t7> =torch.sqrt(t,torch.CudaTensor(10))
expected arguments: [DoubleTensor] DoubleTensor | double
stack traceback:
[C]: at 0x7f1057ba8140
[C]: at 0x7f1057bc9070
[C]: at 0x7f1063c67960
t7>

cp variable in line 215 of polyinterp.lua is one dimensional

While training an autoencoder for mnist that uses optim.lbfgs, my program crashes at line 215 of polyinterp.lua with the error

bad argument #2 to '?' (too many indices provided at ~/torch/pkg/torch/generic/Tensor.c:894)

On inspection, I found that the cp variable is a one-dimensional tensor, but the program expects it to be two dimensional.

ConfusionMatrix instance calculates correct totalValid field value only after printing that particular ConfusionMatrix instance

I use optim 1.0.5-0 (on macOS).

Here's a Torch session as proof (note the value of totalValid before printing the confusion matrix instance, and its value after printing it):

th> foo = optim.ConfusionMatrix({'1', '2'})
[0.0001s]
th> foo:add(1,2)
[0.0001s]
th> foo.totalValid
0
[0.0000s]
th> foo:add(1,1)
[0.0000s]
th> foo.totalValid
0
[0.0001s]

th> print(foo)
ConfusionMatrix:
[[ 1 0] 100.000% [class: 1]
[ 1 0]] 0.000% [class: 2]

  • average row correct: 50%
  • average rowUcol correct (VOC measure): 25%
  • global correct: 50%
    [0.0002s]

th> foo.totalValid
0.5
[0.0000s]

I suppose that a ConfusionMatrix instance should calculate its totalValid field properly even if that instance hasn't been printed before the totalValid field is referenced.
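A workaround sketch, based on ConfusionMatrix's public updateValids() method: totalValid is only refreshed by updateValids(), which the print path happens to call, so calling it explicitly gives the correct value without printing.

foo:updateValids()
print(foo.totalValid)   -- 0.5, as expected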

Initialization of rmsprop exponential smoothing value 'm'

Currently the running 'mean square' variable 'm' of rmsprop is initialized to zero. If alpha is near one (e.g. the default value 0.99), the gradient is often divided by a number < 1 during the first few iterations. Especially at the beginning of the optimization it might be beneficial not to amplify the gradient too much (e.g. with the current implementation the learning rate has to be set to a much smaller value when using rmsprop compared to plain-vanilla sgd in order not to diverge; quite often I see extreme error values during the first few rmsprop steps).

A simple solution could be to initialize 'm' with 1, e.g. :fill(1) instead of :zero(), or to allow an initialization value to be specified in the optimization state/options.
A different approach could be to estimate the mean over N timesteps while not dividing, e.g. run a warm-up phase, or to use a boolean 'reset' flag that, when true, initializes the mean to the gradient values of the next batch (not averaging over multiple steps).
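A worked first step (a sketch, using the default alpha = 0.99) to make the amplification concrete:

-- m starts at zero, so after one update:
--   m    = (1 - alpha) * g^2        -- = 0.01 * g^2
--   den  = sqrt(m) + epsilon        -- = 0.1 * |g| + epsilon
--   step = lr * g / den             -- roughly 10 * lr, whatever |g| is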

BTW Does anybody know a publication dealing with a double-exponential smoothing rmsprop-alternative?

lswolfe does not work with CudaTensors

lswolfe uses torch.Tensor by default internally and does not work with CudaTensors. If you set the default tensor type to CudaTensor it still doesn't work. This prevents using line searches with lbfgs when using CudaTensors.

What is the correct way continue training after xth epoch?

I am currently using optim.adam to train my network. Let's say I train my network up to the xth epoch and save my model; what settings in the optim function should I save in order to continue training?

I notice that if I just load my saved model, the computed loss does not follow the trend (the loss actually goes back to the loss computed in the first epoch). There must be some settings I need to reload in order to get back to a similar loss.

The way I compare the results is by computing the loss at epoch x + n, having saved my model at epoch x. Then I reload the model saved at epoch x, train for n more epochs, and compare the computed loss.

Technically speaking they should be similar. I hope someone can shed some light on this issue.
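A sketch of a pattern that gives reproducible resumes (assuming the usual feval/params setup): adam keeps its moment estimates and step counter in the state table, so that table has to be checkpointed and restored together with the model.

local adamConfig = {learningRate = 1e-3}
local adamState  = {}                       -- holds m, v and the step counter t

-- every training step:
optim.adam(feval, params, adamConfig, adamState)

-- checkpoint at epoch x:
torch.save('model_x.t7', model)
torch.save('adam_state_x.t7', adamState)

-- resume later:
model = torch.load('model_x.t7')
params, gradParams = model:getParameters()
adamState = torch.load('adam_state_x.t7')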
