I'm running the encoder/demo.ipynb notebook with Python 2.7 and PyTorch '0.1.12_1'.

ValueError when encoding about infersent HOT 7 CLOSED

facebookresearch commented on June 2, 2024

ValueError when encoding

from infersent.

Comments (7)

aconneau commented on June 2, 2024

The error message you have comes from this:

import numpy as np
embeddings = [np.zeros((64, 4096)), np.zeros((64, 4096))]
embeddings = np.vstack(embeddings) # no error
embeddings = [np.zeros((64, 4096)), np.zeros((64, 4096)), np.zeros((64, 3412))]
embeddings = np.vstack(embeddings) # error
# -> ValueError: all the input array dimensions except for the concatenation axis must match exactly

For some reason, at least one element in "embeddings" does not have the expected size (batch_size=128, emb_dim=4096).

Just before the error in line 209, could you print the shape of each element in embeddings?

for batch in embeddings:
    print(batch.shape)

to see if we can spot the element with the wrong size.

What is in "sentences"? Can you check that you don't have an empty sentence?
What is the length of "sentences"?
Could you update pytorch to a more recent version and see if you still have the issue?
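
The shape check requested above can also be automated. The following is a minimal sketch, not code from the repository: `find_mismatched_batches` is a hypothetical helper name, and the small array sizes are stand-ins for the real (batch_size, 4096) batches. It reports every batch whose shape, ignoring the concatenation axis, differs from the first batch's.

```python
import numpy as np


def find_mismatched_batches(embeddings):
    """Return (index, shape) for each batch that np.vstack could not stack
    with the first batch, i.e. whose shape differs outside axis 0.

    Hypothetical helper for debugging; not part of InferSent.
    """
    if not embeddings:
        return []
    # np.vstack concatenates along axis 0, so every other axis must match.
    expected = np.asarray(embeddings[0]).shape[1:]
    return [
        (i, np.asarray(batch).shape)
        for i, batch in enumerate(embeddings)
        if np.asarray(batch).shape[1:] != expected
    ]


# Small stand-in arrays: two well-formed batches and one with a wrong width.
batches = [np.zeros((64, 8)), np.zeros((64, 8)), np.zeros((64, 5))]
print(find_mismatched_batches(batches))
```

Printing only the offending entries may be easier to read than one shape per batch when there are many batches.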

briandw commented on June 2, 2024

Thanks for the quick response.

I believe that I'm on the latest torch version of 0.1.12_1. Is there a later version?

The length of sentences is 9815, and there are no 0-length sentences in the array.

This is the output from just before the line 209:

Nb words kept : 128201/130068 (98.56 %)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 23, 4096)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-3d88dd6254e6> in <module>()
      1 tmp = sentences[:128]
----> 2 model.encode(sentences, tokenize=False, verbose=True)

/home/brian/InferSent/encoder/models.py in encode(self, sentences, bsize, tokenize, verbose)
    210         for batch in embeddings:
    211             print(batch.shape)
--> 212         embeddings = np.vstack(embeddings)
    213 
    214         # unsort

/home/brian/anaconda3/envs/py2/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
    235 
    236     """
--> 237     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    238 
    239 def hstack(tup):

ValueError: all the input array dimensions except for the concatenation axis must match exactly

aconneau commented on June 2, 2024

Oh ok I get it. Can you try to change the line in models.py here: https://github.com/facebookresearch/InferSent/blob/master/encoder/models.py#L67

emb = torch.max(sent_output, 0)[0]

into:

emb = torch.max(sent_output, 0)[0].squeeze(0)

and see if this works then?
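
To see why the extra squeeze should help, here is a minimal NumPy-only sketch of the shapes from the output above. It is an illustration under assumptions: `emb_dim = 8` is a small stand-in for 4096, and plain NumPy arrays stand in for the encoder's batches. With a leading axis of size 1, the batch axis sits at position 1, so the last batch (23 instead of 64) no longer matches; once the leading axis is removed, the batch axis is the concatenation axis and may differ.

```python
import numpy as np

emb_dim = 8  # small stand-in for 4096 to keep the example light

# Shapes as printed in the thread: a leading axis of size 1 on every batch.
old_style = [np.zeros((1, 64, emb_dim)), np.zeros((1, 23, emb_dim))]
try:
    np.vstack(old_style)
    stacked_ok = True
except ValueError:
    # Axis 1 differs (64 vs 23), so concatenating along axis 0 fails.
    stacked_ok = False

# Dropping the leading axis makes the batch axis the concatenation axis.
squeezed = [batch.squeeze(0) for batch in old_style]
embeddings = np.vstack(squeezed)

print(stacked_ok, embeddings.shape)
```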

briandw commented on June 2, 2024

That's working now. Thanks! I wonder why this didn't show up before?

aconneau commented on June 2, 2024

@briandw This issue is linked to a change in the default behavior of PyTorch reduction functions such as max, mean, and sum.

Say you have a tensor of size (23, 128, 4096). If you take torch.max (or torch.mean, etc.) over the first dimension, you get a tensor of size:

(128, 4096) for recent versions of PyTorch
(1, 128, 4096) for old versions of PyTorch

So your version of PyTorch is too old. I will update the requirements section of the README and add a special case in models.py to handle this case.

Thanks
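
The two behaviors described above can be imitated without PyTorch. The sketch below is an illustration under assumptions, not the actual fix: it uses NumPy's `keepdims` argument as a stand-in for the old and new PyTorch results, shrinks the embedding size to 16, and `drop_kept_dim` is a hypothetical guard name, not the code from the commit.

```python
import numpy as np


def drop_kept_dim(emb):
    """Remove the leading size-1 axis only if the reduction kept it.

    Hypothetical guard; expects either (1, batch, emb_dim) or (batch, emb_dim).
    """
    if emb.ndim == 3 and emb.shape[0] == 1:
        return emb.squeeze(0)
    return emb


# (seq_len, batch, emb_dim), with emb_dim shrunk from 4096 to 16.
sent_output = np.random.rand(23, 128, 16)

old_style = np.max(sent_output, axis=0, keepdims=True)  # shape (1, 128, 16)
new_style = np.max(sent_output, axis=0)                 # shape (128, 16)

print(old_style.shape, new_style.shape)
print(drop_kept_dim(old_style).shape, drop_kept_dim(new_style).shape)
```

Checking the number of dimensions before squeezing also leaves a 2-D result for a batch of one sentence untouched, whereas an unconditional squeeze of axis 0 would collapse a (1, emb_dim) array to (emb_dim,).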

aconneau commented on June 2, 2024

4b7f9ec

Pragtisood commented on June 2, 2024

I'm getting this error:
setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9815,) + inhomogeneous part.
It is raised by:
embeddings = infersent.encode(sentences, bsize=128, tokenize=False, verbose=True)
print('nb sentences encoded : {0}'.format(len(embeddings)))
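
This message is what recent NumPy versions raise when a regular array is built from sequences of different lengths. Where exactly it happens inside encode is not shown in this thread, so the following is only a sketch under the assumption that a list of tokenized sentences of varying length is being converted to an array. `to_object_array` is a hypothetical helper, not part of InferSent; it packs each sentence into one slot of a 1-D object array, which still supports reordering by an index array.

```python
import numpy as np


def to_object_array(sequences):
    """Pack variable-length sequences into a 1-D object array,
    one sequence per element. Hypothetical helper, not part of InferSent.
    """
    arr = np.empty(len(sequences), dtype=object)
    for i, seq in enumerate(sequences):
        arr[i] = seq  # stores the list itself in slot i
    return arr


# Toy tokenized sentences of different lengths.
sentences = [["a", "b"], ["c"], ["d", "e", "f"]]
arr = to_object_array(sentences)

# Reorder by decreasing length using an index array.
order = np.argsort([-len(s) for s in sentences])
print(arr.shape, [len(s) for s in arr[order]])
```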
