Hi, I trained the autoencoder successfully. However, when I was doin


ValueError: Shapes are not compatible when training transition model after autoencoder trained

commaai commented on September 22, 2024

ValueError: Shapes are not compatible when training transition model after autoencoder trained

from research.

Comments (26)

lxgen commented on September 22, 2024 5

I have solved the problem by changing the Keras version from 1.0.8 to 1.0.6.


lxgen commented on September 22, 2024 2

Have you solved the problem? I have the same problem as you.


jamesjackson commented on September 22, 2024 1

@andrewraharjo I'm seeing the same issue "ValueError: Filter must not be larger than the input: Filter: (8, 8) Input: (3, 160)" on an AWS GPU instance with TF 0.10 and Keras 1.1.0. I don't get the issue if running locally on a MacBook Pro with TF 0.9 and Keras 1.0.8. I'll try setting up the AWS instance with TF 0.9.


EderSantana commented on September 22, 2024 1

Guys, if you are using the newer TensorFlow and Keras releases, make sure to pass unroll=True as an input parameter to the RNN layers. I had this problem with other layers as well.
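As a rough illustration of what unrolling means (a plain-Python sketch, not the Keras implementation; `step`, `rnn_unrolled`, and the weights are hypothetical): with unroll=True the recurrence is expanded into a fixed number of step calls instead of a symbolic loop, so the number of timesteps has to be known up front.

```python
import math

def step(x_t, h_prev, w=0.5, u=0.1):
    """One recurrent step on scalars: h_t = tanh(w * x_t + u * h_prev)."""
    return math.tanh(w * x_t + u * h_prev)

def rnn_unrolled(inputs, h0=0.0):
    """Apply `step` once per timestep with an explicit Python loop.

    The loop length is len(inputs), fixed before the loop starts, which
    mirrors why an unrolled RNN needs a known sequence length.
    """
    h = h0
    outputs = []
    for x_t in inputs:  # one explicit step per timestep
        h = step(x_t, h)
        outputs.append(h)
    return outputs

outputs = rnn_unrolled([1.0, 0.0, -1.0])
print(len(outputs))
```

This may be why batch_input_shape with an explicit time dimension matters when unrolling is requested.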


chqsark commented on September 22, 2024

can someone take a look? thanks!


EderSantana commented on September 22, 2024

this is weird, see that you have two outputs with shape (?, 64, 512) and (9, 2, 512). They should be (64, 5, 512) and (64, 9, 512). But K.rnn is messing the shapes up. I'll check what is going on.

In case you want to investigate as well, here is where the bug should be happening https://github.com/commaai/research/blob/master/models/layers.py#L359-L374

What is your keras version by the way?


EderSantana commented on September 22, 2024

So, for now, the only place I can see that could be causing this problem is the consume_less RNN parameter in Keras. Try changing https://github.com/commaai/research/blob/master/models/transition.py#L41-L42 to:

model.add(DreamyRNN(output_dim=z_dim, output_length=out_leng-1, return_sequences=True,
                    activation="tanh", consume_less="not_cpu", batch_input_shape=(batch_size, time, z_dim)))

Unfortunately I can't reproduce your bug right now. But I'll give you more information as soon as I get an opportunity.


chqsark commented on September 22, 2024

@EderSantana Thanks a lot for the response. I've tried keras 1.0.6 and 1.0.8, tensorflow 0.9, and 0.10. All gave the same error. I still got the error after changing transition.py as you suggested.

I realized that comma.ai has a fork of keras. Should I use that instead of the original one? Or any specific branch of keras?


EderSantana commented on September 22, 2024

No, I tried this code on the public Keras release. I think the problem is with the recurrent layer's consume_less parameter, but I can't test it right now :(


chqsark commented on September 22, 2024

I just tried 'cpu', 'gpu', 'mem' for consume_less parameter. No luck :(

My server.py output is like this
guan.wang@Z440SJ-243:~/ml/comma/research$ ./server.py --time 60 --batch 64
INFO:__main__:server started
INFO:dask_generator:Loading 9 hdf5 buckets.
x 52722 | t 263583 | f 52722
x 58993 | t 294919 | f 58993
x 19731 | t 98719 | f 19731
x 56166 | t 280785 | f 56166
x 25865 | t 129344 | f 25865
x 85296 | t 426596 | f 85296
x 78463 | t 392182 | f 78463
x 30538 | t 152650 | f 30538
x 51691 | t 258571 | f 51691
training on 436627/459465 examples
INFO:dask_generator:camera files 9
4296.05 ms
X (64, 60, 3, 160, 320)
angle (64, 60, 1)
speed (64, 60, 1)


EderSantana commented on September 22, 2024

@chqsark thanks for the information. I'll continue investigating.


kamal94 commented on September 22, 2024

Suffering the same problem. Any thoughts?


chqsark commented on September 22, 2024

I also solved it by completely removing Keras and installing version 1.0.6. Previously I tried a virtualenv with 1.0.6 and it didn't work; maybe my package system messed it up. Now training runs. The only remaining issue is that the server side periodically prints the following:

Traceback (most recent call last):
File "/home/guan.wang/ml/comma/research/dask_generator.py", line 109, in datagen
X_batch[count] = x[i-es-time_len+1:i-es+1]
File "/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 419, in __getitem__
selection = sel.select(self.shape, args, dsid=self.id)
File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 91, in select
sel[args]
File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 258, in __getitem__
start, count, step, scalar = _handle_simple(self.shape,args)
File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 509, in _handle_simple
x,y,z = _translate_slice(arg, length)
File "/usr/lib/python2.7/dist-packages/h5py/_hl/selections.py", line 550, in _translate_slice
raise ValueError("Reverse-order selections are not allowed")
ValueError: Reverse-order selections are not allowed
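
A possible explanation for this traceback, offered as a sketch rather than a confirmed diagnosis: when `i-es-time_len+1` is negative while `i-es+1` is not, slice normalization wraps the start index around to the end of the dataset, so the stop lands before the start, which the h5py code in this traceback rejects. The helpers `window_bounds` and `is_reverse_order` and the numbers below are hypothetical; only the built-in `slice.indices` is used.

```python
def window_bounds(i, es, time_len, length):
    """Normalize the slice x[i-es-time_len+1 : i-es+1] for a dataset of `length` rows."""
    raw = slice(i - es - time_len + 1, i - es + 1)
    start, stop, _step = raw.indices(length)
    return start, stop

def is_reverse_order(start, stop):
    """True when the normalized stop precedes the start."""
    return stop < start

# Hypothetical numbers: a 60-frame window requested 3 frames before the start of a file.
start, stop = window_bounds(i=56, es=0, time_len=60, length=1000)
print(start, stop, is_reverse_order(start, stop))

# A window fully inside the file normalizes to an ordinary forward range.
ok_start, ok_stop = window_bounds(i=100, es=0, time_len=60, length=1000)
print(ok_start, ok_stop, is_reverse_order(ok_start, ok_stop))
```

If this is what happens, skipping indices whose window would start before row 0 should avoid the error.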


Yale323 commented on September 22, 2024

@EderSantana
I have the same situation. After installing Keras 1.0.6 and starting the transition training, there are two kinds of errors on the server side.
One is "ValueError: Reverse-order selections are not allowed":
Traceback (most recent call last):
File "/home/yale/research/dask_generator.py", line 109, in datagen
X_batch[count] = x[i-es-time_len+1:i-es+1]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in __getitem__
selection = sel.select(self.shape, args, dsid=self.id)
File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 92, in select
sel[args]
File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 259, in __getitem__
start, count, step, scalar = _handle_simple(self.shape,args)
File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 443, in _handle_simple
x,y,z = _translate_slice(arg, length)
File "/home/yale/anaconda2/envs/tensorflow/lib/python2.7/site-packages/h5py/_hl/selections.py", line 484, in _translate_slice
raise ValueError("Reverse-order selections are not allowed")
ValueError: Reverse-order selections are not allowed

The other is the "could not broadcast input array from shape (5,1) into shape (60,1)"
Traceback (most recent call last):
File "/home/yale/research/dask_generator.py", line 112, in datagen
angle_batch[count] = np.copy(angle[i-time_len+1:i+1])[:, None]
ValueError: could not broadcast input array from shape (5,1) into shape (60,1)

Is it related to the different Keras version?
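
One way a shorter-than-expected window like the (5,1) one above can arise (an assumption, not a confirmed cause): Python and NumPy slices are silently clipped at the end of the sequence, so a window that runs past the last sample returns fewer than `time_len` rows, and assigning it into a fixed `(time_len, 1)` slot fails. The sketch uses a plain list as a stand-in for the `angle` array; `window` and `safe_window` are hypothetical helpers.

```python
def window(seq, i, time_len):
    """Same indexing as angle[i-time_len+1 : i+1]; clipped silently at the end."""
    return seq[i - time_len + 1 : i + 1]

def safe_window(seq, i, time_len):
    """Return the window only if it is complete, else None so the caller can skip it."""
    if i - time_len + 1 < 0 or i + 1 > len(seq):
        return None
    return window(seq, i, time_len)

angle = list(range(100))          # stand-in for a 100-sample angle log
short = window(angle, 104, 60)    # window runs 5 samples past the end
print(len(short))                 # 55 rows instead of the expected 60
print(safe_window(angle, 104, 60))
print(len(safe_window(angle, 99, 60)))
```

A guard of this kind only hides the symptom; whether the index range itself is wrong for the angle log is a separate question.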


andrewraharjo commented on September 22, 2024

@chqsark how did you completely remove Keras? Was it outside your virtualenv, or inside your virtualenv or conda environment?

@EderSantana I got this issue
"/home/dev-box/anaconda2/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 246, in conv2d_shape padding)

File "/home/dev-box/anaconda2/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 184, in get2d_conv_output_size (row_stride, col_stride), padding_type)

File "/home/dev-box/anaconda2/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 149, in get_conv_output_size "Filter: %r Input: %r" % (filter_size, input_size))

ValueError: Filter must not be larger than the input: Filter: (8, 8) Input: (3, 160)

Keras 1.0.6 and TF 0.10, Theano 0.8.2
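
The input size (3, 160) in this message matches the channel and height axes of the X batch shape printed earlier in the thread, (64, 60, 3, 160, 320). That suggests, as an assumption not confirmed here, that a channels-first frame is being read with channels-last ordering, which in Keras 1.x is governed by the image_dim_ordering setting. The sketch below uses a hypothetical helper, `valid_conv_size`, and an assumed stride of 1 to show why an 8x8 filter fits one reading and not the other.

```python
def valid_conv_size(input_size, filter_size, stride=1):
    """Output size of a 'valid' (no padding) convolution along each spatial axis."""
    out = []
    for n, k in zip(input_size, filter_size):
        if k > n:
            raise ValueError(
                "Filter must not be larger than the input: "
                "Filter: %r Input: %r" % (filter_size, input_size))
        out.append((n - k) // stride + 1)
    return tuple(out)

frame = (3, 160, 320)  # channels, rows, cols, as in the X batch shape

# Channels-first reading: the spatial axes are the last two.
print(valid_conv_size(frame[1:], (8, 8)))

# Channels-last reading of the same array: the first two axes are taken as spatial.
try:
    valid_conv_size(frame[:2], (8, 8))
except ValueError as err:
    print(err)
```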


andrewraharjo commented on September 22, 2024

@jamesjackson It seems to be related to the current TF build. Check out this link. I haven't given up on TF yet, but the way I started training is with the Theano 0.8.2 backend, Keras 1.1.0, and cuDNN 5.1, though running cuDNN 5.0 is recommended. If you can't keep going with TF, try changing the backend in your keras.json to theano and updating your theanorc file by changing CPU to GPU. By the way, I'm not running on AWS; I use a stationary dev box.
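
The backend switch mentioned here can be sketched as follows. Keras 1.x reads its settings from ~/.keras/keras.json; to keep the example harmless it edits a copy in a temporary directory. The helper `set_backend` is a hypothetical name, and the starting values are assumed defaults, not taken from anyone's machine in this thread.

```python
import json
import os
import tempfile

def set_backend(config_path, backend, dim_ordering):
    """Rewrite the "backend" and "image_dim_ordering" keys of a keras.json file."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["backend"] = backend
    cfg["image_dim_ordering"] = dim_ordering
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# Work on a throwaway copy; the real file lives at ~/.keras/keras.json.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "keras.json")
with open(path, "w") as f:
    json.dump({"backend": "tensorflow", "image_dim_ordering": "tf",
               "floatx": "float32", "epsilon": 1e-07}, f)

cfg = set_backend(path, "theano", "th")
print(cfg["backend"], cfg["image_dim_ordering"])
```

The CPU-to-GPU change lives in a separate file, .theanorc, and is not touched by this sketch.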


jamesjackson commented on September 22, 2024

Thanks @andrewraharjo , @EderSantana

It appears to be an odd environmental issue related to the packaging and/or Anaconda. I tried several TF/Keras versions, and they all failed in the same way. Building from source and avoiding Anaconda does work.


andrewraharjo commented on September 22, 2024

@jamesjackson I was thinking along those lines earlier, and I checked with a buddy who installed Anaconda3 and set up a virtualenv for Python 2.7. He could run with TF, and I was confused about why Anaconda2 wouldn't work. Did you solve this problem by building from source and avoiding Anaconda?


EderSantana commented on September 22, 2024

As a note, my tensorflow was installed from source as well. (but I did use anaconda)


jamesjackson commented on September 22, 2024

@andrewraharjo Yeah, source-based without Anaconda is working.


chqsark commented on September 22, 2024

@jamesjackson Yes, source-based without Anaconda +1
@andrewraharjo Yes, I completely removed keras and reinstalled the right version.
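
Since several reports here came down to which Keras install was actually being imported, checking the version and file path of the imported module can help. This is a generic sketch: `describe_module` is a hypothetical helper, and it is demonstrated on the standard-library `json` module so that it runs anywhere. Pass "keras" instead on a machine where Keras is installed.

```python
import importlib

def describe_module(name):
    """Return (version, file path) of the module that `import name` resolves to."""
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return version, path

version, path = describe_module("json")
print(version, path)
```

A path outside the expected virtualenv or conda environment would indicate that a stale system-wide copy is shadowing the intended version.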


skywong1230 commented on September 22, 2024

I got the same error when running the code to train the transition model:

The error in server side:
Traceback (most recent call last):
File "/home/sky/research/dask_generator.py", line 112, in datagen
angle_batch[count] = np.copy(angle[i-time_len+1:i+1])[:, None]
ValueError: could not broadcast input array from shape (5,1) into shape (60,1)

Does anyone know the solution?


zhaohuaqing1993 commented on September 22, 2024

Have you run ./view_generative_model.py transition --name transition successfully?


pandamax commented on September 22, 2024

Traceback (most recent call last):
File "/home/deep-learning/research-master/dask_generator.py", line 112, in datagen
angle_batch[count] = np.copy(angle[i-time_len+1:i+1])[:, None]
ValueError: could not broadcast input array from shape (55,1) into shape (60,1)
The same problem occurred for me.


ahmedyahia3393 commented on September 22, 2024

I get this error (ValueError: bad marshal data (unknown type code)) when trying to execute view_steering_model.py.
Here is the output from the command prompt:

Traceback (most recent call last):
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 229, in func_load
raw_code = codecs.decode(code.encode('ascii'), 'base64')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 46: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "view_steering_model.py", line 94, in <module>
model = model_from_json(json.load(jfile))
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\models.py", line 349, in model_from_json
return layer_module.deserialize(config, custom_objects=custom_objects)
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\models.py", line 1349, in from_config
layer = layer_module.deserialize(conf, custom_objects=custom_objects)
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\layers\core.py", line 711, in from_config
function = func_load(config['function'], globs=globs)
File "C:\Users\lenovo\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 234, in func_load
code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)
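
For context on this error: the traceback shows Keras rebuilding a layer function with `marshal.loads`, and the Python documentation states that the marshal format is specific to the Python version that wrote it. A model JSON saved under one interpreter (for example Python 2.7) and loaded under another (the Anaconda3 path here) is one plausible cause, though nothing in this thread confirms it. The sketch below only demonstrates the same-version round trip; `dump_func` and `load_func` are hypothetical names, not the Keras helpers.

```python
import base64
import marshal
import types

def dump_func(func):
    """Serialize a function's bytecode to an ASCII string (marshal + base64)."""
    raw = marshal.dumps(func.__code__)
    return base64.b64encode(raw).decode("ascii")

def load_func(text, globs=None):
    """Rebuild a function from a dump_func string written by the same Python version."""
    raw = base64.b64decode(text.encode("ascii"))
    code = marshal.loads(raw)
    return types.FunctionType(code, globs if globs is not None else {})

scale = lambda x: x / 127.5 - 1.0   # example preprocessing function
restored = load_func(dump_func(scale))
print(restored(255.0))
```

If the version mismatch is the cause, re-saving the model under the interpreter that will load it is one way to sidestep the problem.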


kingxueyuf commented on September 22, 2024

Looks like the issue comes from Keras. Which version are you using?

