
text_matching's Introduction

text_matching

Text matching models

This project implements most of the mainstream text matching models and is continuously updated. For walkthroughs of the papers, see the post "Text Similarity: A Summary of Text Matching Models".

The dataset is QA_corpus, with 100k training pairs and 10k pairs each for validation and test.

The args.py file under each model's folder holds its hyperparameters.

Training: python train.py

Testing: python test.py

Word vectors: the models take different inputs. Some use plain character vectors only, some use character vectors plus word vectors, and there are both static word vectors (not updated during training) and dynamic word vectors (updated during training). All of these input forms are already wrapped; use them as follows.

For static word vectors, run python word2vec_gensim.py, which trains the word vectors with gensim.

For dynamic word vectors, run python word2vec.py, which trains the word vectors with TensorFlow. After training it saves the embedding matrix, the vocabulary, and a picture of the word vectors' relative positions in a 2-D projection. Outside a Windows 10 environment the picture may fail to save because of font issues.

Test-set results:

| Model | Loss | Acc | Input | Paper |
| --- | --- | --- | --- | --- |
| DSSM | 0.7613157 | 0.6864 | char vectors | DSSM |
| ConvNet | 0.6872447 | 0.6977 | char vectors | ConvNet |
| ESIM | 0.55444807 | 0.736 | char vectors | ESIM |
| ABCNN | 0.5771452 | 0.7503 | char vectors | ABCNN |
| BiMPM | 0.4852 | 0.764 | char vectors + static word vectors | BiMPM |
| DIIN | 0.48298636 | 0.7694 | char vectors + dynamic word vectors | DIIN |
| DRCN | 0.6549849 | 0.7811 | char vectors + static word vectors + dynamic word vectors + shared-word flag | DRCN |

The results above may not be each model's optimum, and the hyperparameter choices are not necessarily optimal either; if you want to use a model in a real project, tune the hyperparameters yourself.

text_matching's People

Contributors

terrifyzhao


text_matching's Issues

Missing load_data function

I tried running abcnn/train.py and got this error:

```
Traceback (most recent call last):
  File "d:\GithubProjs\text_matching-master\abcnn\train.py", line 8, in <module>
    from utils.load_data import load_data
ImportError: cannot import name 'load_data' from 'utils.load_data' (d:\GithubProjs\text_matching-master\abcnn\..\utils\load_data.py)
```

Opening utils/load_data.py, I found that there is no load_data function in it.

How can this be fixed?

About w2v_dynamic

Why train an embedding with TensorFlow? How is that different from just using gensim and then training incrementally?

Question about the input to the 5-layer stacked LSTMs in DRCN

My understanding of Eq. 6 in the paper is that the input to layer l at time step t is the concatenation of (1) the hidden vector of layer l-1 at time t, (2) the attention vector from layer l-1, and (3) the input of layer l-1 at time t.
But the code is as follows:

```python
for j in range(5):
    with tf.variable_scope(f'p_lstm_{i}{j}', reuse=None):
        p_state, _ = self.BiLSTM(tf.concat(p_state, axis=-1))
    with tf.variable_scope(f'p_lstm{i}_{j}' + str(i), reuse=None):
        h_state, _ = self.BiLSTM(tf.concat(h_state, axis=-1))

    p_state = tf.concat(p_state, axis=-1)
    h_state = tf.concat(h_state, axis=-1)
    # attention
    cosine = tf.divide(tf.matmul(p_state, tf.matrix_transpose(h_state)),
                       (tf.norm(p_state, axis=-1, keep_dims=True) * tf.norm(h_state, axis=-1, keep_dims=True)))
    att_matrix = tf.nn.softmax(cosine)
    p_attention = tf.matmul(att_matrix, h_state)
    h_attention = tf.matmul(att_matrix, p_state)

    # DenseNet
    p = tf.concat((p, p_state, p_attention), axis=-1)
    h = tf.concat((h, h_state, h_attention), axis=-1)
```

So the input to layer j should be p, not p_state. Am I understanding this correctly?

One more detail: is the output of each 5-layer stacked BiLSTM concatenated with the original character/word embeddings before being fed into the next 5-layer stacked BiLSTM? Figure 1 of the paper draws it that way, but the text doesn't seem to mention it.
The paper also has a pooling step after the four 5-layer BiLSTM blocks: if the output is (30, 100) (30 tokens, each with a 100-dim embedding), it is max-pooled column-wise into 100-dim p and q vectors, which are then combined as in Eq. 7 and passed through 3 dense layers.
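If the dense connection works as described above (after every block, the running representation is concatenated with the block's hidden states and its attention vectors), the input width of each block can be tracked with a short plain-Python sketch. The function name and sizes below are illustrative, not the repo's:

```python
def drcn_dense_inputs(emb_dim, lstm_out_dim, num_blocks):
    """Feature dimension fed into each stacked-BiLSTM block when, after every
    block, the running representation is concatenated (DenseNet-style) with
    the block's hidden states and its attention vectors, each of width
    lstm_out_dim."""
    dims = []
    d = emb_dim  # the first block sees only the original embeddings
    for _ in range(num_blocks):
        dims.append(d)
        d += 2 * lstm_out_dim  # concat(previous features, hidden, attention)
    return dims

print(drcn_dense_inputs(100, 256, 3))  # [100, 612, 1124]
```

The growing input width is exactly why feeding p (the accumulated features) rather than p_state (only the last hidden states) matters for the dense connection.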

bimpm attentive_match

```python
n = tf.norm(v1, axis=2, keep_dims=True) * tf.norm(v2, axis=2, keep_dims=True)
```

This looks wrong. Shouldn't it be the following?

```python
tf.norm(metric1, axis=-1, keep_dims=True) * tf.transpose(tf.norm(metric2, axis=-1, keep_dims=True), perm=[0, 2, 1])
```
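The shape argument can be checked with a small numpy sketch (assuming (batch, time, dim) inputs; the function name is illustrative). Without transposing the second norm, a (B, T1, 1) and a (B, T2, 1) tensor do not broadcast into the (B, T1, T2) matrix produced by the numerator:

```python
import numpy as np

def pairwise_cosine(v1, v2):
    """Cosine similarity between every pair of time steps.
    v1: (B, T1, D), v2: (B, T2, D) -> (B, T1, T2)."""
    num = np.matmul(v1, np.transpose(v2, (0, 2, 1)))   # (B, T1, T2)
    n1 = np.linalg.norm(v1, axis=-1, keepdims=True)    # (B, T1, 1)
    # transpose the second norm so the product broadcasts to (B, T1, T2)
    n2 = np.transpose(np.linalg.norm(v2, axis=-1, keepdims=True), (0, 2, 1))
    return num / (n1 * n2)

rng = np.random.default_rng(0)
v1, v2 = rng.normal(size=(2, 3, 4)), rng.normal(size=(2, 5, 4))
sim = pairwise_cosine(v1, v2)
print(sim.shape)  # (2, 3, 5)
```

With both norms kept at shape (B, T, 1), as in the quoted line, the elementwise product only works when T1 == T2, and even then it pairs the wrong norms.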

model predict

Can this model predict labels for a new dataset? How?

Problems with the pooling operation in ESIM

  1. The reduction is computed over the wrong dimension.
  2. Pooling should use the actual sequence lengths rather than reducing over the whole time axis; consider adding a mask.
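The masking idea can be sketched in numpy (shapes and names are illustrative, assuming padded (batch, time, dim) BiLSTM outputs and per-example true lengths):

```python
import numpy as np

def masked_pool(x, lengths):
    """x: (B, T, D) padded sequence outputs; lengths: (B,) true lengths.
    Returns masked max- and mean-pooled vectors, each of shape (B, D)."""
    B, T, D = x.shape
    mask = np.arange(T)[None, :] < lengths[:, None]     # (B, T), True on real steps
    x_masked = np.where(mask[:, :, None], x, -np.inf)   # padding can never win the max
    max_pool = x_masked.max(axis=1)
    mean_pool = (x * mask[:, :, None]).sum(axis=1) / lengths[:, None]
    return max_pool, mean_pool

x = np.array([[[1.0, 2.0], [3.0, 0.0], [99.0, 99.0]]])  # last step is padding
mx, mn = masked_pool(x, np.array([2]))
print(mx, mn)  # [[3. 2.]] [[2. 1.]]
```

Note how an unmasked reduce would let the padding value 99 dominate both the max and the mean.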

Question about Version

Hi! Thank you for your code. May I ask which versions (of TensorFlow, CUDA, and cuDNN) this code is meant for? It seems that running it under different versions raises different errors. Thank you!

A small suggestion about import paths

In train.py, avoid using ".." to refer to the parent directory; prefer something like os.path.abspath(os.path.dirname(os.path.dirname(__file__))), so the script resolves its parent path correctly no matter where it is run from.
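A minimal sketch of the suggestion (the helper name project_root is hypothetical, not part of the repo):

```python
import os

def project_root(file_path):
    """Absolute path of the parent of the directory containing file_path,
    independent of the current working directory."""
    here = os.path.dirname(os.path.abspath(file_path))
    return os.path.abspath(os.path.join(here, os.pardir))

# In train.py one could then do, e.g.:
#   import sys; sys.path.insert(0, project_root(__file__))
```

Because the path is anchored to the file itself rather than to the working directory, `python abcnn/train.py` and `cd abcnn && python train.py` resolve the same root.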

Wrong pooling axis in the ESIM model

In the pooling stage, the pooling over the BiLSTM output should be done along the sentence-length dimension, i.e. dimension 1, not dimension 2.

A bug in dssm's graph.py

I found in graph.py under dssm

the cosine function:

```python
def cosine(p, h):
    p_norm = tf.norm(p, axis=1, keepdims=True)
    h_norm = tf.norm(p, axis=1, keepdims=True)
```

The h_norm above should use "h" rather than "p", right?
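The effect of the slip is easy to demonstrate in numpy (a sketch, not the repo's code): with both norms taken from p, the denominator becomes ||p||² instead of ||p||·||h||, so the result is not a cosine and can leave [-1, 1].

```python
import numpy as np

def cosine_buggy(p, h):
    # reproduces the typo: both norms are computed from p
    n = np.linalg.norm(p, axis=1, keepdims=True) * np.linalg.norm(p, axis=1, keepdims=True)
    return (p * h).sum(axis=1, keepdims=True) / n

def cosine_fixed(p, h):
    n = np.linalg.norm(p, axis=1, keepdims=True) * np.linalg.norm(h, axis=1, keepdims=True)
    return (p * h).sum(axis=1, keepdims=True) / n

p = np.array([[1.0, 0.0]])
h = np.array([[2.0, 0.0]])
print(cosine_fixed(p, h))  # [[1.]]  parallel vectors
print(cosine_buggy(p, h))  # [[2.]]  outside the valid cosine range
```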
