arcii-for-matching-natural-language-sentences's Introduction

ARCII-for-Matching-Natural-Language-Sentences

A simple version of ARC-II model implemented in Keras.
Please refer to the paper: Convolutional Neural Network Architectures for Matching Natural Language Sentences

Quick Glance

  1. Input Data Format
  • Train set:
label	|q1	|q2
1	|Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.	|Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
0	|Yucaipa owned Dominick's before selling the chain to Safeway in 1998 for $2.5 billion.	|Yucaipa bought Dominick's in 1995 for $693 million and sold it to Safeway for $1.8 billion in 1998.
  • Test set:
q1	|q2
Amrozi accused his brother, whom he called "the witness", of deliberately distorting his evidence.	|Referring to him as only "the witness", Amrozi accused his brother of deliberately distorting his evidence.
Yucaipa owned Dominick's before selling the chain to Safeway in 1998 for $2.5 billion.	|Yucaipa bought Dominick's in 1995 for $693 million and sold it to Safeway for $1.8 billion in 1998.
  • Word Embedding:
word	|embedding (300-dimension)
Amrozi	|-0.54645991 2.28509140 ... -0.34052843 -2.01874685
chief	|-9.01635551 -3.80108356 ... 1.86873138 2.14706421
  2. Train the model
$ python arcii.py
  3. Loss and Accuracy
    Measured on a toy dataset copied from MatchZoo's toy example.
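The input formats listed under "Input Data Format" can be read with a few lines of Python. The following is a minimal sketch, not the loader used in arcii.py. It assumes that fields are tab-separated and that the `|` shown before each field in the listing is a rendering artifact that can be stripped. The helper names `split_fields`, `parse_train_line`, and `parse_embedding_line` are hypothetical.

```python
def split_fields(line):
    """Split a line on tabs and strip the leading '|' shown in the listing.

    Assumption: fields are tab-separated; the '|' is treated as decoration.
    """
    return [field.lstrip("|").strip() for field in line.rstrip("\n").split("\t")]


def parse_train_line(line):
    """Parse one train-set row into (label, q1, q2)."""
    label, q1, q2 = split_fields(line)
    return int(label), q1, q2


def parse_embedding_line(line):
    """Parse one embedding row into (word, list of floats)."""
    word, values = split_fields(line)
    return word, [float(v) for v in values.split()]


# Toy rows in the same shape as the listing (values shortened for illustration).
label, q1, q2 = parse_train_line("1\t|first sentence\t|second sentence")
word, vector = parse_embedding_line("chief\t|-9.01635551 -3.80108356 1.86873138")
print(label, q1, q2)
print(word, len(vector))
```

A test-set row has the same shape without the label column, so `split_fields` alone is enough to recover `q1` and `q2`.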

Requirements

  • Python 3.5
  • TensorFlow 1.8.0
  • Keras 2.1.6

To do list

  • Negative Sampling
  • Mask zero inputs

arcii-for-matching-natural-language-sentences's People

Contributors

wyu-du

arcii-for-matching-natural-language-sentences's Issues

layer1_input

If the batch size is 128, then q_embed holds the word-embedding features of the 128 questions, and d_embed holds the features of the 128 responses. After the concatenate function, the result does not look like a pairing of each question with its corresponding response. So how does the first layer perform one-dimensional convolution on this input? I don't understand this very well. Can you explain it?
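For reference, layer 1 as described in the referenced paper combines a window of words from one sentence with a window of words from the other sentence, for every pair of window positions, and applies a filter to the concatenated window. The following is a minimal pure-Python sketch of that idea for a single sentence pair. It is not the code in arcii.py: the names `window` and `layer1`, the toy values, and the single filter are assumptions, and padding and the paper's gating for all-zero windows are left out.

```python
def window(seq, start, size):
    """Concatenate `size` consecutive word vectors starting at `start`."""
    flat = []
    for vec in seq[start:start + size]:
        flat.extend(vec)
    return flat


def layer1(q, d, weights, bias, k=3):
    """Return a 2D grid; entry (i, j) combines window i of q with window j of d."""
    rows = len(q) - k + 1
    cols = len(d) - k + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Concatenation happens inside one sentence pair, not across pairs.
            z0 = window(q, i, k) + window(d, j, k)
            pre = sum(w * z for w, z in zip(weights, z0)) + bias
            row.append(max(0.0, pre))  # ReLU nonlinearity
        grid.append(row)
    return grid


# One (q, d) pair: 5 and 4 words with 2-dimensional toy embeddings.
q = [[0.1 * t, 1.0] for t in range(5)]
d = [[1.0, 0.2 * t] for t in range(4)]
k, dim = 3, 2
weights = [0.5] * (2 * k * dim)  # one filter over the concatenated windows
grid = layer1(q, d, weights, bias=0.0, k=k)
print(len(grid), len(grid[0]))  # 3 2
```

In this sketch a batch of 128 pairs would simply call `layer1` 128 times, once per pair, so the batch axis never mixes different questions and responses.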

Dataset

Hi, thank you for sharing the code!
However, there is still a problem with the dataset you provided. In corpus_preprocessed.txt, the indexes are plain numbers, whereas in relation_train.txt, the query indexes have the form Q followed by a number. This results in the following error:
    File "arcii.py", line 40, in get_texts
        texts1.append(all_words[t_].split(' '))
    KeyError: 'Q1'

Could you help update the dataset?
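A mismatch like this can be confirmed before training by listing the IDs that the relation file uses but the corpus does not define. The following is a small diagnostic sketch under stated assumptions: `all_words` is taken from the traceback and assumed to be a dict keyed by ID, while `find_missing_ids`, `relation_ids`, and the toy values are hypothetical stand-ins for the real files.

```python
def find_missing_ids(all_words, relation_ids):
    """Return the IDs used in the relation file that the corpus does not define."""
    return sorted({i for i in relation_ids if i not in all_words})


# Toy stand-ins for the two files (values are made up for illustration).
all_words = {"1": "12 7 33", "2": "5 18"}  # corpus keyed by plain numbers
relation_ids = ["Q1", "2", "Q1"]           # relation file mixing both forms
print(find_missing_ids(all_words, relation_ids))  # ['Q1']
```

An empty result means every ID in the relation file can be looked up in the corpus.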
