
Comments (7)

aconneau commented on June 2, 2024

The error message you have comes from this:

import numpy as np
embeddings = [np.zeros((64, 4096)), np.zeros((64, 4096))]
embeddings = np.vstack(embeddings) # no error
embeddings = [np.zeros((64, 4096)), np.zeros((64, 4096)), np.zeros((64, 3412))]
embeddings = np.vstack(embeddings) # error
# -> ValueError: all the input array dimensions except for the concatenation axis must match exactly

For some reason, one of the elements in "embeddings" is not of size (batch_size=128, emb_dim=4096). So there must be one or more elements whose size differs from (128, 4096).

  1. Just before the error at line 209, could you print the shape of each element in embeddings, to see if we can spot the element with the wrong size?

for batch in embeddings:
    print(batch.shape)

  2. What is in "sentences"? Can you check that you don't have an empty sentence?

  3. What is the length of "sentences"?

  4. Could you update pytorch to a more recent version and see if you still have the issue?
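
With many batches, reading every printed shape by eye is tedious. The check in the first question can be sketched as a small helper that reports only the batches np.vstack would reject. The name find_mismatched_batches and the toy shapes are hypothetical, chosen for illustration; they are not part of InferSent.

```python
import numpy as np


def find_mismatched_batches(batches):
    """Return (index, shape) for batches that np.vstack would reject.

    np.vstack concatenates along axis 0, so every dimension after the
    first must match. The first batch is used as the reference.
    """
    ref = np.atleast_2d(batches[0]).shape[1:]
    return [
        (i, b.shape)
        for i, b in enumerate(batches)
        if np.atleast_2d(b).shape[1:] != ref
    ]


# Toy data: a shorter last batch (23 rows) is fine, a different
# embedding width (5 instead of 8) is not.
batches = [np.zeros((64, 8)), np.zeros((64, 8)), np.zeros((23, 8)), np.zeros((64, 5))]
mismatched = find_mismatched_batches(batches)
print(mismatched)
```

Note that a batch with fewer rows is not a problem by itself; only a mismatch in the trailing dimensions makes the stacking fail.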


briandw commented on June 2, 2024

Thanks for the quick response.

I believe that I'm on the latest torch version of 0.1.12_1. Is there a later version?

The length of sentences is 9815 and there are no 0-length sentences in the array.

This is the output from just before the line 209:

Nb words kept : 128201/130068 (98.56 %)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 64, 4096)
(1, 23, 4096)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-3d88dd6254e6> in <module>()
      1 tmp = sentences[:128]
----> 2 model.encode(sentences, tokenize=False, verbose=True)

/home/brian/InferSent/encoder/models.py in encode(self, sentences, bsize, tokenize, verbose)
    210         for batch in embeddings:
    211             print(batch.shape)
--> 212         embeddings = np.vstack(embeddings)
    213 
    214         # unsort

/home/brian/anaconda3/envs/py2/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
    235 
    236     """
--> 237     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    238 
    239 def hstack(tup):

ValueError: all the input array dimensions except for the concatenation axis must match exactly


aconneau commented on June 2, 2024

Oh ok I get it. Can you try to change the line in models.py here: https://github.com/facebookresearch/InferSent/blob/master/encoder/models.py#L67

emb = torch.max(sent_output, 0)[0]

into:

emb = torch.max(sent_output, 0)[0].squeeze(0)

and see if this works then?
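
To see why the squeeze helps, here is a minimal NumPy sketch of the shapes printed above. It uses a small embedding width (16 instead of 4096) to stay light, and NumPy arrays stand in for the converted torch outputs; both choices are assumptions made for illustration.

```python
import numpy as np

# Batches carrying a spurious leading dimension of size 1,
# like the (1, 64, 4096) and (1, 23, 4096) shapes in the log.
full = np.zeros((1, 64, 16))
last = np.zeros((1, 23, 16))

# Stacking along axis 0 requires the other dimensions to match,
# but here axis 1 is 64 in one array and 23 in the other.
try:
    np.vstack([full, last])
    stacked_ok = True
except ValueError:
    stacked_ok = False

# Dropping the leading dimension turns the batch size into axis 0,
# which is the axis vstack concatenates along.
fixed = np.vstack([full.squeeze(0), last.squeeze(0)])
print(stacked_ok, fixed.shape)
```

This also suggests why the problem only appears when the last batch is smaller than the others: with equal batch sizes, the 3-D arrays stack without complaint.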


briandw commented on June 2, 2024

That's working now. Thanks! I wonder why this didn't show up before?


aconneau commented on June 2, 2024

@briandw So this is an issue linked to the change of policy in pytorch functions such as max, mean, sum etc.

Say you have a tensor of size (23, 128, 4096). If you take torch.max (or torch.mean, etc.) over the first dimension, you get a tensor of size:

(128, 4096) for recent versions of pytorch
(1, 128, 4096) for old versions of pytorch

So it means your version of pytorch is too old. I will update the requirement part in the README, and add an exception in the models.py to handle this case.

Thanks
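
The two behaviours described above can be illustrated with a NumPy analogy, where keepdims=True mimics the old PyTorch result and the default mimics the recent one. The helper to_2d is a hypothetical sketch of a version-tolerant check, not the code from the actual fix, and the embedding width is reduced to 16 to keep it light.

```python
import numpy as np

sent_output = np.zeros((23, 128, 16))

# Reduced dimension kept as size 1 (the old behaviour described above).
old_style = sent_output.max(axis=0, keepdims=True)
# Reduced dimension dropped (the recent behaviour described above).
new_style = sent_output.max(axis=0)


def to_2d(emb):
    """Return a (batch, dim) array whichever behaviour produced emb."""
    if emb.ndim == 3 and emb.shape[0] == 1:
        emb = emb.squeeze(0)
    assert emb.ndim == 2, "expected a (batch, dim) embedding"
    return emb


print(old_style.shape, new_style.shape, to_2d(old_style).shape)
```

Checking the number of dimensions rather than the library version keeps the guard independent of how a given release numbers itself.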


aconneau commented on June 2, 2024

4b7f9ec


Pragtisood commented on June 2, 2024

I'm getting this error:

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9815,) + inhomogeneous part.

when running:
embeddings = infersent.encode(sentences, bsize=128, tokenize=False, verbose=True)
print('nb sentences encoded : {0}'.format(len(embeddings)))
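
Newer NumPy releases raise this message when an array is built from sequences of unequal length, for example a list of tokenized sentences. Whether that is what happens inside encode here is an assumption; the thread does not confirm it. As a sketch of one possible workaround, the helper below (to_object_array, a hypothetical name) builds an object array explicitly so that sentences of different lengths can still be reordered by index.

```python
import numpy as np


def to_object_array(sequences):
    """Build a 1-D object array holding one Python sequence per element."""
    arr = np.empty(len(sequences), dtype=object)
    for i, seq in enumerate(sequences):
        arr[i] = seq
    return arr


# Toy tokenized sentences of unequal length.
sentences = [["a", "b", "c"], ["d"], ["e", "f"]]
arr = to_object_array(sentences)

# Reorder by decreasing length, which needs array-style fancy indexing.
order = np.argsort([-len(s) for s in sentences])
sorted_sentences = arr[order]
print(list(sorted_sentences))
```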

