
nli's Introduction

Enhanced LSTM for Natural Language Inference

Source code for "Enhanced LSTM for Natural Language Inference", runnable on GPU and CPU, based on Theano. If you use this code in any published research, please cite the following paper.

"Enhanced LSTM for Natural Language Inference" Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, Diana Inkpen. ACL (2017)

@InProceedings{Chen-Qian:2017:ACL,
  author    = {Chen, Qian and Zhu, Xiaodan and Ling, Zhenhua and Wei, Si and Jiang, Hui and Inkpen, Diana},
  title     = {Enhanced LSTM for Natural Language Inference},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)},
  month     = {July},
  year      = {2017},
  address   = {Vancouver},
  publisher = {ACL}
}

Qian Chen's homepage: http://home.ustc.edu.cn/~cq1231/

The code is modified from GitHub - nyu-dl/dl4mt-tutorial.

The code for the tree-LSTM version has been released. The tree-LSTM part is modified from GitHub - dallascard/TreeLSTM, but supports minibatches.

Dependencies

To run the code, you will need:

  • Python 2.7
  • Theano 0.8.2

Running the Script

  1. Download and preprocess the data:
cd data
bash fetch_and_preprocess.sh
  2. Train and test the ESIM model:
cd scripts/ESIM/
bash train.sh
  3. Train and test the TreeLSTM-IM model:
cd scripts/TreeLSTM-IM/
bash train.sh

The results are written to the log.txt file.

nli's People

Contributors

jabalazs, lukecq1231, uduse

nli's Issues

run on google colab

Hello!
I have Python 2 and Theano 0.8.2, and I want to run the project on Google Colab.
I encounter the following error:

Theano does not recognise this flag: CUDA_DIR
warnings.warn('Theano does not recognise this flag: {0}'.format(key))

I set device=cuda0, but then I see the following error:
ERROR (theano.sandbox.gpuarray): pygpu was configured but could not be imported

Now I run the code below:

!wget -c https://repo.continuum.io/archive/Anaconda2-5.1.0-Linux-x86_64.sh
!chmod +x Anaconda2-5.1.0-Linux-x86_64.sh
!bash ./Anaconda2-5.1.0-Linux-x86_64.sh -b -f -p /usr/local
!conda install theano pygpu

but I get the following error:
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
....

Please check that the project runs on Google Colab. I want to run the KIM and ESIM projects on Google Colab, but both give the same errors.
Please help me.

lstm_layer mask

Hello, I'm following your work and trying to reimplement ESIM in TensorFlow.
I noticed that in your lstm_layer() you mask c and h. I'm wondering how much the mask improves the model compared with no masking (just a basic LSTM).
And how much does ortho_weight help?
Thank you so much.
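For anyone reimplementing this, here is a minimal numpy sketch of what these two tricks usually look like in dl4mt-style code (the repository this code is modified from); the function names are illustrative, not necessarily this repo's exact API. Orthogonal initialization takes the left singular vectors of a random Gaussian matrix, and the mask carries the previous hidden state through padded timesteps.

```python
import numpy as np

def ortho_weight(ndim):
    # Orthogonal initialization: SVD of a random Gaussian matrix,
    # as used in dl4mt-style codebases.
    W = np.random.randn(ndim, ndim)
    u, _, _ = np.linalg.svd(W)
    return u.astype('float32')

def masked_step(m_t, h_t, h_prev):
    # Where the mask is 0 (a padded position), keep the previous
    # hidden state instead of the freshly computed one.
    return m_t[:, None] * h_t + (1. - m_t)[:, None] * h_prev
```

Without the mask, padded timesteps would keep updating the state, so variable-length sequences in a minibatch would end with corrupted final states.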

GPU

If I want to use the GPU, I run into this problem:
[error screenshot]

Google Colab uses a Tesla K80; what should I do to get the code running on the GPU?

ESIM using keras

Hi,
Since I don't have access to a GPU, I can't execute your code, but there is other code on GitHub that implements your model with the Keras library. Can you confirm whether the following code is correct?

"""
Implementation of ESIM(Enhanced LSTM for Natural Language Inference)
https://arxiv.org/abs/1609.06038
"""
import numpy as np
from keras.layers import *
from keras.activations import softmax
from keras.models import Model

def StaticEmbedding(embedding_matrix):
in_dim, out_dim = embedding_matrix.shape
return Embedding(in_dim, out_dim, weights=[embedding_matrix], trainable=False)

def subtract(input_1, input_2):
minus_input_2 = Lambda(lambda x: -x)(input_2)
return add([input_1, minus_input_2])

def aggregate(input_1, input_2, num_dense=300, dropout_rate=0.5):
feat1 = concatenate([GlobalAvgPool1D()(input_1), GlobalMaxPool1D()(input_1)])
feat2 = concatenate([GlobalAvgPool1D()(input_2), GlobalMaxPool1D()(input_2)])
x = concatenate([feat1, feat2])
x = BatchNormalization()(x)
x = Dense(num_dense, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(dropout_rate)(x)
x = Dense(num_dense, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(dropout_rate)(x)
return x

def align(input_1, input_2):
attention = Dot(axes=-1)([input_1, input_2])
w_att_1 = Lambda(lambda x: softmax(x, axis=1))(attention)
w_att_2 = Permute((2,1))(Lambda(lambda x: softmax(x, axis=2))(attention))
in1_aligned = Dot(axes=1)([w_att_1, input_1])
in2_aligned = Dot(axes=1)([w_att_2, input_2])
return in1_aligned, in2_aligned

def build_model(embedding_matrix, num_class=1, max_length=30, lstm_dim=300):
q1 = Input(shape=(max_length,))
q2 = Input(shape=(max_length,))

# Embedding
embedding = StaticEmbedding(embedding_matrix)
q1_embed = BatchNormalization(axis=2)(embedding(q1))
q2_embed = BatchNormalization(axis=2)(embedding(q2))

# Encoding
encode = Bidirectional(LSTM(lstm_dim, return_sequences=True))
q1_encoded = encode(q1_embed)
q2_encoded = encode(q2_embed)

# Alignment
q1_aligned, q2_aligned = align(q1_encoded, q2_encoded)

# Compare
q1_combined = concatenate([q1_encoded, q2_aligned, subtract(q1_encoded, q2_aligned), multiply([q1_encoded, q2_aligned])])
q2_combined = concatenate([q2_encoded, q1_aligned, subtract(q2_encoded, q1_aligned), multiply([q2_encoded, q1_aligned])]) 
compare = Bidirectional(LSTM(lstm_dim, return_sequences=True))
q1_compare = compare(q1_combined)
q2_compare = compare(q2_combined)

# Aggregate
x = aggregate(q1_compare, q2_compare)
x = Dense(num_class, activation='sigmoid')(x)

return Model(inputs=[q1, q2], outputs=x)

GitHub gist: https://gist.github.com/namakemono/b74547e82ef9307da9c29057c650cdf1
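For reference, the soft alignment computed by align corresponds to the attention equations in the ESIM paper: e_ij = a_i · b_j, with each sequence attending over the other under a softmax of the score matrix. A minimal numpy sketch of those equations, with made-up sequence lengths and dimensions:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# a: premise encodings (len_a, d), b: hypothesis encodings (len_b, d)
a = np.random.randn(5, 4)
b = np.random.randn(7, 4)

e = a @ b.T                       # attention scores, e[i, j] = a_i . b_j
a_tilde = softmax(e, axis=1) @ b  # each a_i as a weighted sum over b
b_tilde = softmax(e, axis=0).T @ a  # each b_j as a weighted sum over a
```

This follows the paper's formulation directly; the Keras gist expresses the same idea through Dot, Lambda, and Permute layers on batched tensors.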

NaN detected

Hi, I am getting a NaN detected error. Just running your scripts with minor adaptations, i.e.:

  • Theano 0.10-dev (bleeding edge)
  • Python 3 (basically just changing some print and xrange usages in the code)

Training runs fine until Epoch 5 Update 91000 ...

error

Hi, what is the cause of the following error?

[error screenshot]

Please respond soon.
Thanks

Training time

Hello, I am interested in this model and want to know the training time of one epoch on the SNLI dataset. Also, how many epochs does it need to reach convergence?

tree lstm

Hi,
I am checking the implementation, and I couldn't find the parts related to the tree-LSTM. Are you planning to release that part too?
Thanks

save model

Hello.
I reduced the amount of data so I could run on the CPU, but in the end the model did not run and I encountered the following error. What is the reason?
[error screenshot]

Question

Hello,
I want to run this code, but first I want to reduce the dataset to a small number of samples, for example 1000.
What changes should I make in the code? Which files should I change?
Thanks for your help
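One simple way to subsample (this helper is not part of the repository; the approach assumes the preprocessed data files are one example per line) is to truncate the files produced by fetch_and_preprocess.sh before training:

```python
from itertools import islice

def truncate_file(src, dst, n):
    # Copy only the first n lines of a one-example-per-line data file.
    # src/dst are hypothetical paths, not the repo's actual file names.
    with open(src) as fin, open(dst, 'w') as fout:
        for line in islice(fin, n):
            fout.write(line)
```

If the premise, hypothesis, and label files are stored separately, all of them would need to be truncated to the same n so the examples stay aligned.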
