
pdnn's People

Contributors

maigoakisame, yajiemiao


pdnn's Issues

Training data

Hi, may I know how to get the training data for your PDNN?

memory error

Hi everyone,
I used the copy-feats binary from Kaldi to convert my ASCII features into .ark and .scp files.
Then I combined all the individual .scp files into a single one, which I called SmallSet0.scp:

SESS0003BLOCKA_06 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_06.ark:18
SESS0003BLOCKA_07 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_07.ark:18
SESS0003BLOCKA_08 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_08.ark:18
SESS0003BLOCKA_09 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_09.ark:18
SESS0003BLOCKA_10 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_10.ark:18
SESS0003BLOCKA_11 /home/ana/DB/SmallSet0/feat/SESS0003BLOCKA_11.ark:18

Then I tried to train 4 stacked RBMs using run_RBM.py and got the following memory error:

ana@ana-HP-EliteBook-Folio-9470m:~/PDNN/pdnn$ python /home/ana/PDNN/pdnn/cmds/run_RBM.py --train-data "/home/ana/DB/SmallSet0/feat/SmallSet0.scp,partition=600m,stream=true,random=true" --nnet-spec "215:1024:1024:43:1024" --wdir ./ --ptr-layer-number 4 --epoch-number 10 --batch-size 128 --learning-rate 0.08 --gbrbm-learning-rate 0.005 --momentum 0.5:0.9:5 --first_layer_type gb --param-output-file /home/ana/PDNN/Working_dir/rbm.mdl
[2015-07-28 23:06:57.528732] > ... initializing the model
Traceback (most recent call last):
File "/home/ana/PDNN/pdnn/cmds/run_RBM.py", line 62, in
cfg.init_data_reading(train_data_spec)
File "/home/ana/PDNN/pdnn/utils/rbm_config.py", line 65, in init_data_reading
self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
File "/home/ana/PDNN/pdnn/io_func/data_io.py", line 92, in read_dataset
data_reader.initialize_read(first_time_reading = True)
File "/home/ana/PDNN/pdnn/io_func/kaldi_io.py", line 102, in initialize_read
utt_id, utt_mat = self.read_next_utt()
File "/home/ana/PDNN/pdnn/io_func/kaldi_io.py", line 89, in read_next_utt
tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
MemoryError

What did I do wrong?
Best regards,
Ana

Typo in dropout_nnet.py

if self.l2_reg is not None:
    for i in xrange(self.n_layers):
        W = self.layers[i].W
        self.finetune_cost += self.l2_reg * T.sqr(W).sum()

'self.n_layers' should apparently be 'self.hidden_layers_number',
because the class has no 'n_layers' attribute.
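
A minimal sketch of the corrected loop (my reading of the report, assuming the class exposes hidden_layers_number, layers, l2_reg and finetune_cost, as the quoted snippet suggests):

    # Hypothetical corrected version of the L2-regularization loop in dropout_nnet.py.
    if self.l2_reg is not None:
        for i in xrange(self.hidden_layers_number):   # was: self.n_layers
            W = self.layers[i].W
            self.finetune_cost += self.l2_reg * T.sqr(W).sum()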

Increasingly negative loss in denoising autoencoder

Hi,
I am training a 3-layer stacked denoising autoencoder whose loss function differs slightly from the original.

I want an autoencoder in which each layer tries to reconstruct the 'global' input, i.e. the original input that was fed to the first layer (not the previous layer's output), while still taking the previous layer's output as its normal input.

I simply changed every occurrence of the parameter 'self.x' to 'self.x_global' in 'get_cost_updates' in layers/da.py.
self.x_global is the original input that was fed to the first layer (self.x in models/sda.py).

The training output looked like this:

[2016-12-04 01:36:48.845309] > ... training the model
[2016-12-04 01:37:01.856494] > layer 0, epoch 0, reconstruction cost 405.427734
[2016-12-04 01:37:15.579682] > layer 0, epoch 1, reconstruction cost 381.404175
[2016-12-04 01:37:29.242537] > layer 0, epoch 2, reconstruction cost 377.724701
[2016-12-04 01:37:43.045209] > layer 0, epoch 3, reconstruction cost 375.875977
[2016-12-04 01:37:56.615403] > layer 0, epoch 4, reconstruction cost 374.741211
[2016-12-04 01:38:11.105572] > layer 1, epoch 0, reconstruction cost -108174.476562
[2016-12-04 01:38:24.891239] > layer 1, epoch 1, reconstruction cost -334065.656250
[2016-12-04 01:38:38.807076] > layer 1, epoch 2, reconstruction cost -561826.187500
[2016-12-04 01:38:52.979225] > layer 1, epoch 3, reconstruction cost -790545.687500
[2016-12-04 01:39:07.143726] > layer 1, epoch 4, reconstruction cost -1019794.250000
[2016-12-04 01:39:21.975468] > layer 2, epoch 0, reconstruction cost -152930.156250
[2016-12-04 01:39:36.551489] > layer 2, epoch 1, reconstruction cost -460353.750000
[2016-12-04 01:39:51.328428] > layer 2, epoch 2, reconstruction cost -767839.625000
[2016-12-04 01:40:05.910295] > layer 2, epoch 3, reconstruction cost -1075358.750000
[2016-12-04 01:40:20.484577] > layer 2, epoch 4, reconstruction cost -1382889.500000

The reconstruction cost becomes increasingly negative.
Is this normal? What does it mean?

Here is my edited code (only self.x changed to self.x_global relative to the original).

self.x_global is self.x in models/sda.py (the original input).
###################################################
def get_last_cost_updates(self, corruption_level, learning_rate, momentum):
    """ This function computes the cost and the updates for one training step of the dA """

    tilde_x = self.get_corrupted_input(self.x, corruption_level)
    y = self.get_hidden_values(tilde_x)
    z = self.get_reconstructed_input(y)
    L = - T.sum(self.x_global * T.log(z) + (1 - self.x_global) * T.log(1 - z), axis=1)

    if self.reconstruct_activation is T.tanh:
        L = T.sqr(self.x_global - z).sum(axis=1)

    if self.sparsity_weight is not None:
        sparsity_level = T.extra_ops.repeat(self.sparsity, self.n_hidden)
        avg_act = y.mean(axis=0)

        kl_div = self.kl_divergence(sparsity_level, avg_act)

        cost = T.mean(L) + self.sparsity_weight * kl_div.sum()
    else:
        cost = T.mean(L)

    # compute the gradients of the cost of the `dA` with respect
    # to its parameters (derivative cost with respect to params)
    gparams = T.grad(cost, self.params)
    # generate the list of updates
    updates = collections.OrderedDict()
    for dparam, gparam in zip(self.delta_params, gparams):
        updates[dparam] = momentum * dparam - gparam * learning_rate
    for dparam, param in zip(self.delta_params, self.params):
        updates[param] = param + updates[dparam]

    return (cost, updates)

###################################################
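
One observation that may explain the sign (my own reasoning from the formula above, not a verified diagnosis of this run): the cross-entropy term is only bounded below when the target values lie in [0, 1]. If self.x_global contains raw feature values outside that range, the cost can decrease without bound, which would match the diverging numbers in the log. A tiny numerical illustration:

    # Numerical illustration only: the per-unit cross-entropy term used above,
    # -[x*log(z) + (1-x)*log(1-z)], stays non-negative for targets x in [0, 1]
    # but can become arbitrarily negative once x lies outside that range.
    import numpy

    def xent(x, z):
        return -(x * numpy.log(z) + (1.0 - x) * numpy.log(1.0 - z))

    print(xent(0.8, 0.80))   # ~0.50  -> target inside [0, 1], term non-negative
    print(xent(3.0, 0.99))   # ~-9.18 -> target outside [0, 1], term already negative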

Problems creating new datafiles

I have just cloned the pdnn package, verified that the mnist/mnist_rbm examples work, and I am trying to build some new examples in order to verify the pickle file creation before working on my real data.
First of all, I reproduced the example on the page
https://www.cs.cmu.edu/~ymiao/pdnntk/data.html
by writing the Python script that creates a sample file:


import cPickle, numpy, gzip
feature = numpy.array([[0.2, 0.3, 0.5, 1.4], [1.3, 2.1, 0.3, 0.1], [0.3, 0.5, 0.5, 1.4]], dtype = 'float32')
label = numpy.array([2, 0, 1])
with gzip.open('filename.pkl.gz', 'wb') as f:
    cPickle.dump((feature, label), f)
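
As a quick sanity check (my own addition, not part of the original report), the file can be read back with the same modules used to write it:

    # Hypothetical read-back check of the (feature, label) tuple written above;
    # the file name and layout follow the script in this report.
    import cPickle, gzip

    with gzip.open('filename.pkl.gz', 'rb') as f:
        feature, label = cPickle.load(f)

    print(feature.shape)   # (3, 4)
    print(feature.dtype)   # float32
    print(label.shape)     # (3,)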


The creation process was fine, but when I tried to run a simple DNN training using the following script


#!/bin/bash

# two variables you need to set
pdnndir=/home/guest-fac/tamburin/pdnn # pointer to PDNN
device=cpu # the device to be used. set it to "cpu" if you don't have GPUs

# export environment variables
export PYTHONPATH=$PYTHONPATH:$pdnndir
export THEANO_FLAGS=mode=FAST_RUN,device=$device,floatX=float32

rm *.tmp

# TRAIN DNN
python $pdnndir/cmds/run_DNN.py --train-data "filename.pkl.gz" --valid-data "filename.pkl.gz" --nnet-spec "4:5:3" --wdir ./ --param-output-file dnn.mdl --cfg-output-file dnn.cfg


I get the following output:

[2015-11-10 13:20:47.589817] > ... building the model
[2015-11-10 13:20:47.603441] > ... getting the finetuning functions
[2015-11-10 13:20:48.612798] > ... finetuning the model
/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
[2015-11-10 13:20:48.614276] > epoch 1, training error nan (%)
[2015-11-10 13:20:48.615054] > epoch 1, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.619409] > epoch 2, training error nan (%)
[2015-11-10 13:20:48.619491] > epoch 2, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.622980] > epoch 3, training error nan (%)
[2015-11-10 13:20:48.623059] > epoch 3, lrate 0.080000, validation error nan (%)
[2015-11-10 13:20:48.626443] > epoch 4, training error nan (%)

and nothing changes, ever.
Actually, I got this behavior with a lot of different datasets, but I reproduced it here with this simple example for clarity.
Any idea what the problem is?
I get this problem on MacOSX 10.10 with Python 2.7.10 and on Linux SMP Debian 3.16.7 with Python 2.7.9, so it should not depend on the local Python installation.
Any help is more than welcome.
Thanks!
Fabio

Decoding

Hi!
I have obtained a kaldi formatted model. How can I use this model for decoding?

Reg: Error while training PDNN using maxout

Hi all,

I was running PDNN with dropout+maxout parameters and found the following error during DNN training.
Please help me figure out how to solve this issue.

Using gpu device 0: Tesla K40m
[2016-04-28 06:54:31.804045] > ... building the model
Traceback (most recent call last):
File "pdnn/cmds/run_DNN.py", line 85, in
_file2nnet(dnn.layers, set_layer_num = ptr_layer_number, filename = ptr_file)
File "/home/kk/work/eng/pdnn/io_func/model_io.py", line 114, in file2nnet
layer.W.set_value(factor * np.asarray(string_2_array(nnet_dict[dict_a]), dtype=theano.config.floatX).reshape(mat_shape))
ValueError: total size of new array must be unchanged

Command used for training DNN is:

   $pythonCMD pdnn/cmds/run_DNN.py --train-data "$working_dir/train_tr95.pfile.*.gz,partition=2000m,random=true,stream=true" \
                                    --valid-data "$working_dir/train_cv05.pfile.*.gz,partition=600m,random=true,stream=true" \
                                    --nnet-spec "429:400:400:400:913" \
                                    --ptr-file $working_dir/dnn.ptr --ptr-layer-number $numHidden \
                                    --dropout-factor "0.2,0.2,0.2" \
                                    --activation "maxout:3" \
                                    --lrate "D:0.08:0.5:0.2,0.2:8" \
                                    --wdir $working_dir --kaldi-output-file $working_dir/dnn.nnet || exit 1;

Issue when Changing Number of Hidden Layers

I was playing with the examples and encountered a problem. I modified "784:1024:1024:10" in run.sh line 23 to be just "784:1024:10", and it no longer runs correctly. I noticed in the Theano debug output that the shapes of the two W's (for the hidden layer and the logistic layer) are (784, 1024) and (1024, 1024), while the delta_W's have shapes (784, 1024) and (1024, 10). So the delta_W's have the right shapes but the W's somehow have the wrong shapes. How could this be happening?

The dnn.classify output I get contains arrays where all the numbers are identical (solved: my dataset had a problem, sorry)

I used the parameters given in the example and replaced the dataset with my own, or alternatively with the following:
marker_train = np.array([[0.2, 0.3, 0.5, 1.4], [1.3, 2.1, 0.3, 0.1], [0.3, 0.5, 0.5, 1.4]], dtype = 'float32')
y_train = np.array([2, 0, 1])
marker_va = np.array([[0.2, 0.3, 0.8, 1.4], [1.3, 2.3, 0.3, 0.1], [0.3, 0.5, 0.2, 1.4]], dtype = 'float32')
y_va = np.array([4, 3, 1])
marker_te = np.array([[0.6, 0.3, 0.5, 1.4], [1.3, 2.3, 0.1, 0.1], [0.3, 0.5, 1.2, 1.4]], dtype = 'float32')
y_te = np.array([2, 0, 0])
with gzip.open('train.pickle.gz', 'wb') as f:cPickle.dump((marker_train, label), f)
The output:
aaa=gzip.open("/home/malab5/Software/pdnn/examples/mnist/dnn.classify.pickle.gz")
bbb = cPickle.load(aaa)

bbb
array([[ 1.],
[ 1.],
[ 1.]], dtype=float32)
The numbers are all exactly identical like this; I would like to know which parameter went wrong.

The training code:

#!/bin/bash

pdnndir=/home/malab5/Software/pdnn # pointer to PDNN
device=gpu0

# export environment variables
export PYTHONPATH=$PYTHONPATH:$pdnndir
export THEANO_FLAGS=mode=FAST_RUN,device=$device,floatX=float32

# train DNN model
echo "Training the DNN model ..."
python $pdnndir/cmds/run_DNN.py --train-data "train.pickle.gz,partition=600m,random=true" \
                                --valid-data "valid.pickle.gz,partition=600m,random=true" \
                                --nnet-spec "1000:1024:1024:1024:1024:1024:1000" \
                                --wdir ./ --param-output-file dnn.mdl

# train DNN model
echo "Training the DNN model ..."
python $pdnndir/cmds/run_DNN.py --train-data "train.pickle.gz" \
                                --valid-data "valid.pickle.gz" \
                                --nnet-spec "4:20:20:1" --wdir ./ \
                                --l2-reg 0.1 --lrate "C:10:20" --model-save-step 10 \
                                --param-output-file dnn.param --cfg-output-file dnn.cfg >& dnn.training.log

# classification on the testing data; -1 means the final layer, that is, the classification softmax layer
echo "Classifying with the DNN model ..."
python $pdnndir/cmds/run_Extract_Feats.py --data "test.pickle.gz" \
                                          --nnet-param dnn.param --nnet-cfg dnn.cfg \
                                          --output-file "dnn.classify.pickle.gz" --layer-index 10 \
                                          --batch-size 10 >& dnn.testing.log

python show_results.py dnn.classify.pickle.gz

Training output:
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
[2016-04-19 15:32:57.491671] > epoch 1, training error nan (%)
[2016-04-19 15:32:57.491899] > epoch 1, lrate 10.000000, validation error nan (%)
[2016-04-19 15:32:57.491951] > epoch 2, training error nan (%)
[2016-04-19 15:32:57.491981] > epoch 2, lrate 10.000000, validation error nan (%)
[2016-04-19 15:32:57.492019] > epoch 3, training error nan (%)
[2016-04-19 15:32:57.492051] > epoch 3, lrate 10.000000, validation error nan (%)
[2016-04-19 15:32:57.492092] > epoch 4, training error nan (%)
[2016-04-19 15:32:57.492117] > epoch 4, lrate 10.000000, validation error nan (%)

Test output:
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
/usr/local/lib/python2.7/dist-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
[2016-04-19 15:32:57.740441] > ... setting up the model and loading parameters
[2016-04-19 15:32:57.748392] > ... getting the feat-extraction function
Traceback (most recent call last):
File "/home/malab5/Software/pdnn/cmds/run_Extract_Feats.py", line 78, in
extract_func = model.build_extract_feat_function(layer_index)
File "/home/malab5/Software/pdnn/models/dnn.py", line 179, in build_extract_feat_function
out_da = theano.function([feat], self.layers[output_layer].output, updates = None, givens={self.x:feat}, on_unused_input='warn')
IndexError: list index out of range

Constant Folding Error

Hi,
I am new to PDNN.
I installed Theano on Windows 7 according to its instructions, using Miniconda.
Then, to verify the installation, I tried to run the "mnist" example.
After the data preparation, when running "run_dnn.py" I got the following error:

[2019-06-02 19:49:50.400000] > ... building the model
[2019-06-02 19:49:50.766000] > ... getting the finetuning functions
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: InplaceDimShuffle{x}(TensorConstant{1.0})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "C:\Users\MB\Miniconda2\lib\site-packages\theano\gof\opt.py", line 1982,
in process_node
replacements = lopt.transform(node)
......

The attached file is the complete log.
What is the root cause of this error, and how can I solve it?
Let me know if more information is required.

Thank you very much for your help.

log.txt

Finetuning the model

Hello again,
after fixing the learning rate problem, I'm struggling with the next one: I get to the "finetuning the model" step and then there is this error:

"Traceback (most recent call last):
File "pdnn/cmds/run_CNN.py", line 93, in
train_error = train_sgd(train_fn, cfg)
File "pdnn/learning/sgd.py", line 72, in train_sgd
train_error.append(train_fn(index=batch_index, learning_rate = learning_rate, momentum = momentum))
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 606, in call
storage_map=self.fn.storage_map)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 595, in call
outputs = self.fn()
ValueError: total size of new array must be unchanged
Apply node that caused the error: Reshape{4}(Subtensor{int64:int64:}.0, TensorConstant{[256 1 28 28]})
Inputs types: [TensorType(float64, matrix), TensorType(int64, vector)]
Inputs shapes: [(256, 40), (4,)]
Inputs strides: [(320, 8), (8,)]
Inputs values: ['not shown', array([256, 1, 28, 28])]"

I'm a bit puzzled by this, can you please help me?
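
A quick dimensional check of the failing reshape (my own reading of the traceback, not a confirmed diagnosis): the input batch has 256 x 40 values, while the reshape target 256 x 1 x 28 x 28 needs 256 x 784 values, so the sizes cannot match. This usually points at a --conv-nnet-spec input shape (here the MNIST-style 1x28x28) that does not correspond to the actual feature dimensionality (40).

    # Back-of-the-envelope check of the Reshape{4} failure reported above.
    batch, feat_dim = 256, 40            # input shape from the traceback
    target = (256, 1, 28, 28)            # reshape target from the traceback

    print(batch * feat_dim)              # 10240 elements available
    print(256 * 1 * 28 * 28)             # 200704 elements required -> mismatch, hence the error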

Accuracy problem

Hello, I'm trying to use pdnn for CNNs (a standard call, exactly as described in the documentation). The program gets to initializing the model and the finetuning functions, but then I get the following error:

Traceback (most recent call last):
File ".pdnn/cmds/run_CNN.py", line 88, in
batch_size=cfg.batch_size)
File "pdnn/models/cnn.py", line 140, in build_finetune_functions
(index + 1) * batch_size]})
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function.py", line 266, in function
profile=profile)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/pfunc.py", line 511, in pfunc
on_unused_input=on_unused_input)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 1466, in orig_function
defaults)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 1339, in create
defaults, self.unpack_single, self.return_none, self)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.py", line 338, in init
c.value = value
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/link.py", line 345, in set
self.storage[0] = self.type.filter(value, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/tensor/type.py", line 164, in filter
raise TypeError(err_msg, data)
TypeError: ('TensorType(float32, scalar) cannot store accurately value 0.0001, it would be represented as 9.99999974738e-05. If you do not mind this precision loss, you can: 1) explicitly convert your data to a numpy array of dtype float32, or 2) set "allow_input_downcast=True" when calling "function".', 0.0001, 'Container name "learning_rate"')

It's a bit cryptic to me, so I have no idea what to do (although I understand that the problem is that values cannot be stored the right way). Can you help me please?
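
For reference, a minimal sketch of the two workarounds that the error message itself suggests (my illustration in plain Theano, not a confirmed fix inside PDNN; the variable names are made up):

    # Two generic ways to avoid the float32 precision TypeError raised when a
    # plain Python float (0.0001) is passed for a float32 scalar input.
    import numpy
    import theano
    import theano.tensor as T

    lr = T.fscalar('learning_rate')
    x = T.fscalar('x')

    # Option 1: cast the value to float32 before passing it in.
    f1 = theano.function([lr, x], lr * x)
    print(f1(numpy.float32(0.0001), numpy.float32(2.0)))

    # Option 2: let the compiled function downcast its inputs automatically.
    f2 = theano.function([lr, x], lr * x, allow_input_downcast=True)
    print(f2(0.0001, 2.0))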

Error caused when using rectifier as the activation function

Dear Yajie.
I used pdnn for deep learning.
It works fine, but when I use 'rectifier' as the activation function, it causes the following error:

cPickle.dump(cfg, output, cPickle.HIGHEST_PROTOCOL)
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup builtin.function failed

I guess this error is caused by the rectifier function being defined as a lambda function.
Please give me some advice if you do not mind.
Regards.
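
A minimal sketch of the failure mode guessed at above (my own illustration, assuming the activation ends up stored as a lambda inside the config object that gets pickled): cPickle can serialize a named module-level function, but not a lambda.

    # Illustration only, not the actual PDNN code: pickling a lambda raises
    # PicklingError, while a named module-level function pickles fine.
    import cPickle

    rectifier_lambda = lambda x: x * (x > 0)

    def rectifier_func(x):
        return x * (x > 0)

    try:
        cPickle.dumps(rectifier_lambda, cPickle.HIGHEST_PROTOCOL)
    except cPickle.PicklingError as e:
        print("lambda cannot be pickled: %s" % e)

    print(len(cPickle.dumps(rectifier_func, cPickle.HIGHEST_PROTOCOL)) > 0)  # True

A common workaround in this situation is to define the activation as a regular named function at module level so that it can be pickled by reference.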

Is this an error?

ubgpu@ubgpu:~/github/pdnn/examples/mnist_rbm$ sudo ./run.sh
--2015-05-16 15:48:05-- http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
Resolving www.iro.umontreal.ca (www.iro.umontreal.ca)... 132.204.24.179
Connecting to www.iro.umontreal.ca (www.iro.umontreal.ca)|132.204.24.179|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16168813 (15M) [application/x-gzip]
Saving to: 'mnist.pkl.gz.6'

100%[===============================================================================================================================================================>] 16,168,813 2.98MB/s in 5.5s

2015-05-16 15:48:11 (2.81 MB/s) - 'mnist.pkl.gz.6' saved [16168813/16168813]

Preparing datasets ...
Traceback (most recent call last):
File "data_prep.py", line 9, in
train_set, valid_set, test_set = cPickle.load(f)
File "/usr/lib/python2.7/gzip.py", line 261, in read
self._read(readsize)
File "/usr/lib/python2.7/gzip.py", line 308, in _read
self._read_eof()
File "/usr/lib/python2.7/gzip.py", line 347, in _read_eof
hex(self.crc)))
IOError: CRC check failed 0x3d667246 != 0x7f91b084L
Training the RBM model ...
ubgpu@ubgpu:~/github/pdnn/examples/mnist_rbm$
ubgpu@ubgpu:~/github/pdnn/examples/mnist_rbm$

DNN training error

Hi,
I have a small dataset for classification (9 classes): 137 training samples with 20 features; the validation set has 13 samples. I have created the pickle files as described in your example.

But, when I run the command:
python $pdnndir/cmds/run_DNN.py --train-data "train.pickle.gz" \
    --valid-data "valid.pickle.gz" \
    --nnet-spec "20:1024:1024:9" --wdir ./ \
    --l2-reg 0.0001 --lrate "C:0.1:10" \
    --model-save-step 20 --param-output-file dnn.param \
    --cfg-output-file dnn.cfg > dnn.training.log

I got the following error:
Traceback (most recent call last):
File "/home/somu/pdnn/cmds/run_DNN.py", line 56, in
cfg.init_data_reading(train_data_spec, valid_data_spec)
File "/home/somu/pdnn/utils/network_config.py", line 94, in init_data_reading
self.train_sets, self.train_xy, self.train_x, self.train_y = read_dataset(train_dataset, train_dataset_args)
File "/home/somu/pdnn/io_func/data_io.py", line 83, in read_dataset
data_reader = PickleDataRead(file_path_list, read_opts)
File "/home/somu/pdnn/io_func/pickle_io.py", line 36, in init
self.pfile_path = pfile_path_list[0]
IndexError: list index out of range

Could you please help me correct this error?

Another thing I would like to know:
each training sample has 20 columns, so I used 20 as the first value of --nnet-spec "20:1024:1024:9", and 9 as the last value since I have 9 output classes. Is this correct?

Error to read PDNN nnet with kaldi

Hi all!
I'm trying to extract BNFs (steps_pdnn/make_bnf_feat.sh) using a nnet trained with PDNN that uses the "rectifier" activation function, and Kaldi's nnet-forward tells me that it does not recognize that component marker.

It seems that a PDNN nnet with ReLU can't be read by Kaldi. How can I convert the PDNN nnet model to a Kaldi nnet model so that the model type is recognized?

Thanks in advance,
Flavio

Multiple labels for 1 feature

Hello,

Is it possible to load, e.g., for a feature vector [1, 2, 3, 4, 5], a label vector [6, 7, 8, 9] instead of a single label?
So there will be a feature matrix, with each row corresponding to a row from a label matrix.
What modifications are required?

Thanks in advance!

`nan` parameters

Hi! Training with PDNN sometimes ends up with nan parameters. The command is as follows:
python run_DNN.py --train-data "train.pfile,partition=2000m,stream=true,random=true" --valid-data "valid.pfile,partition=200m,stream=true,random=true" --nnet-spec "1320:2048:2048:2048:2048:2048:1947" --lrate "D:0.8:0.5:0.1,0.1:6" --dropout-factor 0.2,0.2,0.2,0.2,0.2 --wdir model --kaldi-output-file model/model.dnn --model-save-step 1
What is the problem?

Dropout gradient and adjustment factor

Hi everyone,

I was wondering if anyone else is having problems using dropout. I got this error:

theano.gradient.NullTypeGradError: tensor.grad encountered a NaN. This variable is null because the grad method for input 4 () of the for{cpu,scan_fn} op is mathematically undefined. Depends on a shared variable.

Original code:
return theano_rng.binomial(n=1, p=1-p, size=hid_out.shape, dtype=theano.config.floatX) * hid_out

Tried a couple of things:

  1. When I remove the binomial sampling, i.e. "return hid_out", everything works fine:
  2. Tried writing a function with theano_rng, which worked too:
    x = T.vector()
    bin = theano_rng.binomial(n=1, p=1-p, size=x.shape, dtype=theano.config.floatX)
    y = T.dot(bin,x)
    dy = theano.grad(y,x)
  3. Tried using numpy random numbers, which worked as well:
    hid_out *= numpy.float32(numpy.random.binomial([numpy.ones(n_out,dtype=theano.config.floatX)],1-p))

😖

One more thing: if I'm not mistaken, we should compensate for the dropped units? I've changed the code slightly, below. Please let me know if this is not appropriate :)

return theano_rng.binomial(n=1, p=1-p, size=hid_out.shape, dtype=theano.config.floatX) * hid_out /(1-p)
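
For context, a small numpy sketch of why the 1/(1-p) factor keeps the expected activation unchanged (my own illustration of the usual 'inverted dropout' argument, not a statement about what PDNN does elsewhere):

    # Inverted dropout: scaling the kept units by 1/(1-p) preserves the expected
    # value of the layer output, so no extra rescaling is needed at test time.
    import numpy

    rng = numpy.random.RandomState(0)
    p = 0.5                                    # dropout probability
    hid_out = numpy.ones((100000, 1), dtype='float32')

    mask = rng.binomial(n=1, p=1 - p, size=hid_out.shape).astype('float32')
    dropped = mask * hid_out / (1 - p)

    print(hid_out.mean())    # 1.0
    print(dropped.mean())    # ~1.0 on average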

Precision loss error

I'm trying to train a net using some data, but shortly after starting training, I have this:
python2 pdnn/cmds/run_DNN.py --train-data "train.pickle.gz,random=false" --valid-data "valid.pickle.gz,random=false" --nnet-spec "64:256:256:64:1" --wdir ./ --param-output-file test1.mdl --lrate "C:0.1:200"
[2015-01-29 00:59:02.621588] > ... building the model
[2015-01-29 00:59:02.635272] > ... getting the finetuning functions
Traceback (most recent call last):
File "pdnn/cmds/run_DNN.py", line 93, in
batch_size=cfg.batch_size)
File "/home/me/RProp/pdnn/models/dnn.py", line 162, in build_finetune_functions
(index + 1) * batch_size]})
File "/usr/lib/python2.7/site-packages/theano/compile/function.py", line 223, in function
profile=profile)
File "/usr/lib/python2.7/site-packages/theano/compile/pfunc.py", line 512, in pfunc
on_unused_input=on_unused_input)
File "/usr/lib/python2.7/site-packages/theano/compile/function_module.py", line 1312, in orig_function
defaults)
File "/usr/lib/python2.7/site-packages/theano/compile/function_module.py", line 1192, in create
defaults, self.unpack_single, self.return_none, self)
File "/usr/lib/python2.7/site-packages/theano/compile/function_module.py", line 326, in init
c.value = value
File "/usr/lib/python2.7/site-packages/theano/gof/link.py", line 278, in set
self.storage[0] = self.type.filter(value, **kwargs)
File "/usr/lib/python2.7/site-packages/theano/tensor/type.py", line 152, in filter
raise TypeError(err_msg, data)
TypeError: ('TensorType(float32, scalar) cannot store accurately value 0.0001, it would be represented as 9.99999974738e-05. If you do not mind this precision loss, you can: 1) explicitly convert your data to a numpy array of dtype float32, or 2) set "allow_input_downcast=True" when calling "function".', 0.0001, 'Container name "learning_rate"')

The provided mnist example works.

parallel Conv net

Hi Yajie,
Is it possible to design a network with 4 parallel convolutional networks where the outputs of these four parallel networks are connected to a fully connected layer?
The 4 parallel networks are not connected (no shared weights) and each of them has its own input (let's say 4 different images are the inputs to these networks).

delta_W_rec dimensions

In the RNN code, shouldn't:
self.delta_W_rec = theano.shared(value = numpy.zeros((n_in,n_out),
dtype=theano.config.floatX), name='delta_W_rec')
be:
self.delta_W_rec = theano.shared(value = numpy.zeros((n_out,n_out),
dtype=theano.config.floatX), name='delta_W_rec')
?

does pdnn support soft class association?

Hello,

I want to know whether pdnn supports soft class labels.
For example, in the following 4-class classification problem:

Feature Vector              Class Label

[0.2, 0.3, 0.5, 1.4, 1.8, 2.5] [0.1 0.0 0.0 0.9]
[1.3, 2.1, 0.3, 0.1, 1.4, 0.9] [0.2 0.2 0.6 0.0]
[0.3, 0.5, 0.5, 1.4, 0.8, 1.4] [0.0 0.8 0.1 0.1]

the association of each feature vector with the classes is given by a normalized vector of association values.

Confusion around the RBM layer

Hi,

I am fairly new to implementing neural networks and was browsing through the RBM layer code when I found something I couldn't wrap my head around.

On line 129, the "self.sample_h_given_v" method takes v_rec_sigm and not v_rec_sample as input, i.e. the mean and not the actual sample.
Similarly, on line 180 the same method takes v_rec, in this case the pre_sigmoid_sample and not the actual Gaussian sample, as input.

In the same way, there is a difference when calculating the gradient of the vbias in the two cases:

  • T.sum(v_rec_sigm, axis=0) --- Using sum of the mean
  • T.sum(v_rec, axis=0) --- Using sum of the pre_sigmoid

(RBM): updates[self.delta_vbias] = momentum * self.delta_vbias + lr * (1.0/batch_size) * (T.sum(self.input, axis=0) - T.sum(v_rec_sigm, axis=0))
(GBRBM): updates[self.delta_vbias] = momentum * self.delta_vbias + lr * (1.0/batch_size) * (T.sum(self.input, axis=0) - T.sum(v_rec, axis=0))

In both cases it is probably just my lack of knowledge that is the problem, but I hoped someone could shed some light on it for me. Any pointers would be highly appreciated!

JSON.dump for saving nnet.tmp

Hi Yajie,
I'm wondering if there is a particular reason you used the JSON format to dump the nnet dictionary. As the network grows, this becomes a huge burden: a network with four parallel conv layers may need more than 1.5 GB to be saved, and you can imagine how slow it is to load this back into memory.
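
To put a rough number on the size concern (my own sketch with a made-up weight matrix, not PDNN's actual serialization code), a JSON text dump of float32 weights is several times larger than the equivalent binary dump:

    # Rough size comparison between a JSON text dump and a binary numpy dump of a
    # single 1024x1024 weight matrix; matrix and file name are illustrative only.
    import json
    import numpy

    W = numpy.random.randn(1024, 1024).astype('float32')

    json_str = json.dumps(W.tolist())     # one decimal string per weight
    numpy.save('W.npy', W)                # 4 bytes per weight

    print(len(json_str))                  # on the order of 20 MB of text
    print(W.nbytes)                       # 4194304 bytes (~4 MB) in binary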

about CNN

I am wondering if this CNN is designed just for 2D data like images? Does it reshape features directly into 2D and then start the training?

Stacked Autoencoders

Hi, I'm trying to use run_SdA.py with Kaldi I/O. I have two questions that are probably related:

  1. First, I don't need alignments (no label file), as the output has to be the clean version of the input features. How can I do this using run_SdA.py?

  2. I'm getting a memory error (despite setting the partition option to 50m). Is it because I'm not providing any Kaldi alignment file (labels)?

output scores along with class labels

Hi, can you please tell me if there is a quick switch/fix to output the highest score (of the output layer) in addition to the corresponding class label?

MTLDNN

Hi! I have checked MTL-PDNN. I expected that the final models would differ only in the last hidden layer and the softmax layer, but the final models for each task are completely different. Note that the iteration numbers are the same. Is this the expected behavior?

MemoryError in extracting binary ark file

I have an scp file with a corresponding ark file. Whenever I use the class in kaldi_feat.py, the read_next_utt() function gives me the following error:
MemoryError
I backtraced the error and obtained the following information:

tmp_mat = numpy.frombuffer(ark_read_buffer.read(rows * cols * 4), dtype=numpy.float32)
Traceback (most recent call last):
File "", line 1, in
MemoryError
Please note that the values of rows and cols are (1254238158, 5784298), so their product becomes a very large number.
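
For scale (my own arithmetic from the numbers in the report, not a verified diagnosis): the requested buffer is tens of petabytes, far beyond any real feature matrix, which usually indicates that the binary header was misread, e.g. a text-format ark being parsed as binary.

    # Back-of-the-envelope size of the allocation read_next_utt() attempts here.
    rows, cols = 1254238158, 5784298
    bytes_needed = rows * cols * 4            # float32 = 4 bytes per value
    print(bytes_needed)                       # ~2.9e16 bytes
    print(bytes_needed / float(1024 ** 5))    # ~25.8 PiB, so MemoryError is inevitable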

How to use extracted feature? or How to extract label from pfile?

Hello,

I used 'cmds/run_Extract_Feats.py' to extract features.
The extracted features have no label information, just values.
When I use the extracted features as input to a new DNN, I need label information (right?).
How can I use the extracted features as input to a new DNN?

or

How can I extract label information from a pfile? Or how can I convert a pfile to pickle format?
(I couldn't find this information on the internet; there is very little about pfiles online. :( )

Thanks in advance!

additional input

Hi,
Assume a conv network consisting of several conv layers and several FC layers. Is it possible to connect some of the inputs directly to an FC layer, i.e., can we skip the conv layers for some inputs?

hidden layers outputs

Hi all!
I ran the /kaldipdnn/run_timit/run-bnf-tandem.sh example. Is there any way to get the outputs of each hidden layer in the trained neural network (bnf.nnet)?

Was I doing something wrong? KeyError: 'W0'?

Hi,

After setting up by following the documentation, I changed directory to "/pdnn/examples/mnist" and executed
"sh run.sh". Although the error rates were different from those in the README.md, I believe this is just numerical round-off error, since the error rates kept decreasing iteration by iteration.

After run.sh was done, I re-executed "sh run.sh". In
def _file2nnet(layers, set_layer_num = -1, filename='nnet.in', factor=1.0):
the file 'nnet.tmp' was actually read in, and I got:

Traceback (most recent call last):
File "/tmp/ATLAS/linux_install/pdnn/cmds/run_DNN.py", line 87, in
_file2nnet(dnn.layers, filename = wdir + '/nnet.tmp')
File "/tmp/ATLAS/linux_install/pdnn/io_func/model_io.py", line 115, in _file2nnet
layer.W.set_value(factor * np.asarray(string_2_array(nnet_dict[dict_a]), dtype=theano.config.floatX).reshape(mat_shape))
KeyError: 'W0'

I checked the file 'nnet.tmp' and found key = 'W2' and key = 'W3', but there was no 'W0' or 'W1'. (I do see 'W0 00', ... and 'W1 0 0...)

Was I doing something wrong?

Best,
Henry

run_Extract_Feats.py for stacked RBMs?

run_Extract_Feats.py appears to work for DNN or CNN only. Don't ask why, but I need to run the stacked RBMs example; I trained the network but failed to extract features using run_Extract_Feats.py. Am I missing something, or is this simply not implemented?

Classification problem

I have done the training with 3 labels, but when I classify the data the result is always 66.66667%. Sorry for my bad English.

Error the argument valid_data has to be specified

Hello, I am trying to use your library and I get this error with the following command:

> python pdnn/cmds/run_MTL.py 
> --train-data "caco2_train.pkl|herg_train.pkl"\ 
> --valid-data "caco2_test.pkl|herg_test.pkl, partition=600m, random=True" \ 
> --task-number 2 --wdir /home/mariia\ 
> --shared-nnet-spec="4000:2000:1000:1000" --indiv-nnet-spec"1024:2000|1024:2000"\ --activation sigmoid/ \ 
> --param-output-file caco2_herg.mdl --cfg-output-file caco2_herg.cfg \ 
> --lrate="C:0.05:4" \ --momentum 0.9 \ --dropout-factor 0.25 0.25 0.25 0.1 

What is wrong? I don't understand...

ImportError: No module named models.dnn

I have successfully downloaded all dependencies required to run pdnn, but when I run the following command from the terminal

python pdnn/cmds/run_DNN.py --train-data "train.pickle.gz,partition=600m,random=true" \

it shows an import error message.

As I am new to this environment, I would appreciate your help in rectifying this error.

Regards,

Multi channel CNN

Hello
I tried to use the CNN part of PDNN.
I have two-channel speech filter-bank features from the AMI corpus
(40-dimensional filter-bank, delta and double-delta, and splicing of +-5 frames).
So the total dimension of my feature is 2x33x40.

I convert the features to a pfile and give only 1 label.
When I make the pfile I use the final.mdl file, which was created by training a single-channel system.
Add channel 1 and channel 2 filter-banks => create a pfile with 1 label.

--conv-nnet-spec "2x33x40:128,33x9,p1x3,f"
--nnet-spec "1024:1024:$num_pdfs" \

During CNN training the error decreases:
[2015-03-27 20:50:59.615275] > ... initializing the model
[2015-03-27 20:50:59.704674] > ... getting the finetuning functions
[2015-03-27 20:51:01.789192] > ... finetunning the model
[2015-03-28 02:06:13.517982] > epoch 1, training error 85.005036 (%)
[2015-03-28 02:15:04.781514] > epoch 1, lrate 0.080000, validation error 89.706542 (%)
[2015-03-28 07:30:39.580127] > epoch 2, training error 80.255429 (%)
[2015-03-28 07:39:31.307037] > epoch 2, lrate 0.080000, validation error 89.092275 (%)
[2015-03-28 12:54:14.694241] > epoch 3, training error 79.482440 (%)
[2015-03-28 13:03:04.171323] > epoch 3, lrate 0.080000, validation error 89.489965 (%)
[2015-03-28 18:17:20.099473] > epoch 4, training error 78.750167 (%)
[2015-03-28 18:26:11.689961] > epoch 4, lrate 0.080000, validation error 89.382604 (%)
[2015-03-28 23:41:57.655036] > epoch 5, training error 78.529532 (%)
[2015-03-28 23:50:46.840260] > epoch 5, lrate 0.080000, validation error 89.295978 (%)
[2015-03-29 05:05:29.906363] > epoch 6, training error 78.304295 (%)
[2015-03-29 05:14:21.794159] > epoch 6, lrate 0.040000, validation error 89.095217 (%)
[2015-03-29 05:14:32.717682] > ... the final PDNN model parameter is exp/pdnn/pfile/nnet.param
[2015-03-29 05:14:32.725299] > ... the final PDNN model config is exp/pdnn/pfile/nnet.cfg
[2015-03-29 05:14:45.070086] > ... the final Kaldi model (only FC layers) is exp/pdnn/pfile/dnn.nnet

But when I test to check the word error rate, the error rate is 92%.
What should I do to run a multi-channel CNN?

Interpreting dnn.param file

I just wanted to confirm whether I am interpreting the dnn.param file correctly.
So after computing the values for the output nodes (using the weights of the hidden layer and applying the specified activation function), the order of the output node labels is 0, 1, 2, 3, ..., n, i.e. the output node sequence is the same as the one we specify at input time, right?

kaldi data format

Hi, I am trying to build a DBN for language ID using PDNN. As there is a huge amount of data, I decided to use the Kaldi data format to structure my data. I use the copy-feats Kaldi binary to convert my ASCII features to .ark, but I don't know what to do with the labels.
I already have ASCII files with the phonetic frame labels; how do I convert them into .ali files?
Thanks in advance,
Ana

LSTM RNN

Is there any toolkit like PDNN, but using LSTM RNN?
