
spm_toolkit's People

Contributors

lanwuwei, philipskokoh

spm_toolkit's Issues

some question with DecAtt

Thanks for sharing your great work!
I am trying to use DecAtt with the Quora dataset, but the model does not train the way yours does.
The prediction is always 0, so I printed the attention matrix:

    raw_attentions = torch.matmul(repr1, repr2)
    print(raw_attentions)

At the beginning, it is filled with large values like this:

    tensor([[[ 141.6384, 3.2162, 3.8512, ..., 3.0097, 146.9306, 152.0328],
             [ 3.8926, 0.2856, 0.3139, ..., 0.2734, 5.9284, 6.0491],
             [ 3.8910, 0.2930, 0.3183, ..., 0.2677, 7.0301, 6.9951],
             ...,

After many batches, it surprisingly becomes all zeros:

    tensor([[[ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             ...,
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.]],

            [[ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             [ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
             ...,

Clarifications regarding ESIM Model code

Hi

I have a few questions related to the ESIM model for NLI:

  1. The ESIM paper says all vectors are updated during training, including word vectors. Do the pretrained GloVe word embeddings also get updated in your ESIM model code?
  2. The file at ESIM/main_batch_snli.py uses only the ESIM model, while the file at ESIM/Tree_IM/main_snli.py uses both the ESIM and Tree LSTM models, right?
  3. The num_units in the ESIM LSTM is always equal to the pretrained embedding dimension. Is this required? I got dimension errors when I tried to change num_units in the LSTM.
  4. There is a default parameter max_sentence_length = 30, but it isn't used anywhere in the model_batch.py file. Does it have any significance? I thought max_len is the parameter that controls sequence length and can be modified in the main_snli.py file.

It would be great if you could clarify these points.
Thanks!

about STS task

First, I could not find the function 'readSTSdata()' in your code.
Second, when the model is initialized, the number of classes depends on the task, such as STS (6) or TrecQA (2), but in the forward function it ends up as 2 classes by default. Is that correct?
Finally, how do you solve the similarity problem with the classification code? Some of the code is not very clear.

some questions on PWIM

Running main.py throws "cannot import name 'load_word_vectors'".
Is this function missing?

Tokenize Method for Quora dataset

Hi, I am reimplementing the SSE model and I am confused about how you pre-processed quora_duplicate_questions.tsv:

  1. How did you generate /pytorch/DeepPairWiseWord/data/quora/a.tok and b.tok? Which tokenization method did you use?
  2. How did you split quora_duplicate_questions.tsv into train/dev/test sets? Did you use the same split as "Zhiguo Wang, Wael Hamza, and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. In Proceedings of IJCAI." and "Neural Paraphrase Identification of Questions with Noisy Pretraining"?

I would appreciate it if you could answer my questions. Thank you.

test acc for ESIM model on snli testset

I used your trained ESIM model to evaluate on the SNLI test set and got only 57.82% accuracy, which is far below the accuracy reported in the paper. I don't know why.

Test results on SNLI from trained model

I found a trained ESIM SNLI model in the Google Drive folder where the preprocessed data is available.
I loaded that model and tested it on the preprocessed test data, but got a test accuracy of only 58%. Can someone please tell me whether that model is fully trained, or whether I need to train a model from scratch and then test it?

A question about PWIM-main_sts

Traceback (most recent call last):
  File "main_sts.py", line 893, in <module>
    main(args)
  File "main_sts.py", line 466, in main
    combine_mode, lm_mode)#, corpus)
TypeError: __init__() takes exactly 24 arguments (23 given)

How can I solve this problem? It looks like the model is missing a deepCNN argument. What value should be assigned to it? Thanks.

load_word_vectors problems

I have torchtext 0.1.1 and Python 2.7, but the function "load_word_vectors" does not download the correct zip file (the 2.18 GB one).
When the download starts, it shows
"glove.840B.300d: 8.19kB [00:00, 9.51kB/s]"
and it always ends up with a bad zip file.

Am I missing anything?
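
A download that stops after a few kilobytes usually leaves a file that is not a valid zip archive. As a minimal sketch, the file can be checked with the standard library before it is handed to torchtext. The helper name `looks_like_complete_zip` and the size threshold are assumptions made for illustration and are not part of torchtext:

```python
import os
import zipfile


def looks_like_complete_zip(path, min_bytes=1024 * 1024):
    """Return True if `path` exists, is at least `min_bytes` long, and
    passes zipfile's integrity checks. `min_bytes` is an arbitrary
    illustrative threshold, not a value taken from torchtext."""
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) < min_bytes:
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

If the check fails, deleting the partial file and downloading the archive again (or fetching it manually) is a reasonable next step, since a cached broken file may otherwise be reused.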

ESIM model questions

I see that the model is set to train for 500 epochs. Do I really need to train for that many epochs? It will take a long time.
I would also like to know whether your experimental results are close to those in the original paper.
Thanks.

SSE Model pretrained word vectors loading

Hi Wuwei,

Thanks for sharing the code; I have learned a lot from it! I have a small question:

I found that for other models you use from torchtext.vocab import load_word_vectors to load pretrained word vectors, while for the SSE model you directly use torch.load(EMB_FILE) to load the pickled embeddings. However, I could not find the embedding files in the data folder you provided. Could you please upload this file?

Thanks!

Regards,
Shuailong

Inquiry on results in the original paper

Hi Wuwei,

Thanks for sharing the code. I have a question about the results in Table 4 of the original paper. Are those results (for example, the accuracy of 0.834 for PWIM on Quora) on the training set or the test set? I noticed that the results in Table 4 correspond closely to the training curves in Figure 3, so I just want to make sure. Thank you!

Preprocessing?

How did you generate a.toks and b.toks, and what kind of preprocessing did you apply to the questions?

about DecAtt

When I use this model for the WikiQA task, I find the batch list hard to understand.
Why do we need to re-sort by length? Also, the interval between entries of batch_list is not 32.
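
One possible explanation, stated as an assumption rather than a description of the repository's code: if examples are sorted by length and each batch may only contain sentences of a single length (so that no padding is needed), a batch closes either when it reaches 32 examples or when the length changes. That would make the batch boundaries irregular. A sketch with a hypothetical `make_batch_list` helper:

```python
def make_batch_list(lengths, batch_size=32):
    """Return (start, end) index pairs over `lengths`, which must already
    be sorted. A batch ends when it holds `batch_size` items or when the
    next item has a different length. Hypothetical helper for illustration."""
    batch_list = []
    start = 0
    for i in range(1, len(lengths) + 1):
        at_end = i == len(lengths)
        if at_end or lengths[i] != lengths[start] or i - start == batch_size:
            batch_list.append((start, i))
            start = i
    return batch_list


# 40 sentences of length 5, 3 of length 7, and 33 of length 9.
lengths = sorted([5] * 40 + [7] * 3 + [9] * 33)
batches = make_batch_list(lengths)
print(batches)  # [(0, 32), (32, 40), (40, 43), (43, 75), (75, 76)]
```

In this toy input only two of the five batches contain 32 examples; the others are cut short by a change in sentence length.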

What is quora_vocab_cased.pkl that ESIM/main_batch_quora.py needs

I found that quora_vocab_cased.pkl is loaded at line 155 of ESIM/main_batch_quora.py, at the beginning of the program.

When I tried to run the ESIM code, I got stuck on that file (quora_vocab_cased.pkl).
I found the file in the drive folder you shared in DecAtt/README.

However, I currently have my own sentence matching task. How can I generate a file like quora_vocab_cased.pkl?
I have not found any code that generates it.

A question about STS Task

Excuse me, I have a question about the PWIM model. Is it true that the more training rounds there are, the better the results on the STS task?

error message in main_snli.py

Hi,
I ran main_snli.py but got an error message in

    def create_batch(data,from_index, to_index):

I fixed it by changing the following line

    left_sents = torch.cat((dict[word].view(1, -1) for word in lsent))

to

    left_sents = torch.cat([dict[word].view(1, -1) for word in lsent])

Very small issue about grammar.

Both print "XXX" and print("XXX") are used in the program. I think it would be better to use either Python 2.7 syntax or Python 3.5 syntax consistently.

Why is there a RuntimeError in PWIM

RuntimeError: invalid argument 2: size '[-1 x 300]' is invalid for input with 270338015 elements at /pytorch/aten/src/TH/THStorage.c:37
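
The number in the message can be checked by hand: a tensor can only be viewed as '[-1 x 300]' if its element count is a multiple of 300, and 270338015 is not. A small sketch of that arithmetic in plain Python (the helper name `rows_for_view` is hypothetical):

```python
def rows_for_view(num_elements, row_width):
    """Return the row count that view(-1, row_width) would infer,
    or None when num_elements is not divisible by row_width."""
    rows, remainder = divmod(num_elements, row_width)
    return rows if remainder == 0 else None


print(rows_for_view(270338015, 300))  # None: 215 elements are left over
print(rows_for_view(270337800, 300))  # 901126
```

A leftover like this may point to an embedding file whose dimension or contents differ from what the loader expects, though the issue does not give enough detail to confirm that.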

PWIM Model

I fed the same sentence pair into the PWIM model several times, but the result is different each time. Why is this?

Train SNLI model with multiple GPU

Hi

I am trying to train the ESIM SNLI model on my own dataset, where some premises are very long (length around 5500). I increased the maximum length, but then my GPU cannot handle the data with a batch size of 64.
I have a multi-GPU environment and set the CUDA_VISIBLE_DEVICES variable to multiple GPUs, but the code still uses one GPU.
I also wrapped the model in DataParallel as shown here (https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html), as follows:

    model = ESIM(dim_word, 3, n_words, dim_word, pretrained_emb)
    model = nn.DataParallel(model)
    if torch.cuda.is_available():
        model = model.cuda()
        criterion = criterion.cuda()

I get this error when doing so:

    ValueError: Expected input batch_size (256) to match target batch_size (64)

I do not understand why the target batch size isn't changed when multiple GPUs are used.
Can anyone tell me how to use multiple GPUs to train the ESIM SNLI model? Or is there any other way to handle large sequence lengths in the model?
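
One possible cause, offered as an assumption since the issue does not show the input layout: nn.DataParallel splits every input tensor along dimension 0 and concatenates the replicas' outputs along dimension 0. If the model receives sequence-first input of shape (seq_len, batch), the split cuts the sequence instead of the batch, each replica still emits 64 rows, and with four GPUs the gathered output has 256 rows while the target keeps 64. The sketch below simulates only that shape arithmetic in plain Python; the four-GPU count, the sequence length of 100, and the function names are hypothetical.

```python
def chunk_sizes(dim_size, n_chunks):
    """Sizes produced by splitting `dim_size` into at most `n_chunks`
    pieces of ceil(dim_size / n_chunks) each, as torch.chunk does."""
    step = -(-dim_size // n_chunks)  # ceiling division
    sizes = []
    remaining = dim_size
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= step
    return sizes


def gathered_output_rows(input_shape, batch_dim, n_gpus):
    """Rows in the gathered output when dim 0 of the input is scattered
    and each replica emits one row per example along `batch_dim`."""
    total = 0
    for piece in chunk_sizes(input_shape[0], n_gpus):
        replica_shape = (piece,) + tuple(input_shape[1:])
        total += replica_shape[batch_dim]
    return total


# Sequence-first input (seq_len=100, batch=64): the sequence gets split.
print(gathered_output_rows((100, 64), batch_dim=1, n_gpus=4))  # 256
# Batch-first input (batch=64, seq_len=100): the batch gets split.
print(gathered_output_rows((64, 100), batch_dim=0, n_gpus=4))  # 64
```

If this is what happens here, making the batch the first dimension of every tensor passed to forward would keep the output and target sizes aligned. This is a guess based on the two numbers in the error message, not a confirmed diagnosis.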

Folder structure?

Hi! I'm trying to run ESIM and DecATT on Quora dataset. What folder structure and files are you using? If the files are not downloadable, please explain how/where they are generated. Thanks!

Why there is a layer named "linear_layer_intra"

Hi lanwu, thanks for your reproduction of DecAtt! I'm confused about the linear_layer_intra in _transformation_input. Is it defined anywhere else? I'd appreciate it if you could help with this. Thanks a lot!

Do you use pre-trained word embedding when training infer-sent model on Quora Dataset?

Hi, I am trying to reproduce your result in the paper "Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering", where you got 86.6% test accuracy on the Quora dataset with the InferSent model. I wonder whether you used pre-trained word embeddings or not, because the Facebook group uses pre-trained word embeddings in their paper "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data". Thanks.

code about ESIM

Is this problem related to torchtext?
ImportError: cannot import name 'load_word_vectors'
