insanelife / dssm
DSSM and Multi-View DSSM
Can you share this file? thanks!
Hello. Could you share the data format? I don't know how to prepare the training data.
_transform_2seq2bert_id
I see that this function concatenates the input for BERT. Shouldn't [SEP] be inserted between the two sentences?
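For reference, the usual BERT convention for a sentence pair is [CLS] A [SEP] B [SEP], with segment id 0 for the first part and 1 for the second. The following is a minimal sketch of that convention only. It is not the repository's `_transform_2seq2bert_id`; the function name `build_pair_input`, the `max_len` default, and the truncation strategy are assumptions made for illustration.

```python
def build_pair_input(tokens_a, tokens_b, max_len=32):
    """Pack two token lists as [CLS] A [SEP] B [SEP] with segment ids and a mask."""
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP] tokens")
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Reserve three positions for [CLS] and the two [SEP] markers.
    budget = max_len - 3
    # Trim the longer sequence one token at a time until the pair fits.
    while len(tokens_a) + len(tokens_b) > budget:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_mask = [1] * len(tokens)
    # Pad all three lists to max_len; padding positions get mask 0.
    pad = max_len - len(tokens)
    tokens += ["[PAD]"] * pad
    segment_ids += [0] * pad
    input_mask += [0] * pad
    return tokens, segment_ids, input_mask
```

The token strings would still need to be mapped to vocabulary ids before being fed to the model.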
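Hello, with the latest code I still get a data error. Any guidance would be appreciated, thanks.
As the title says: is this the full data from the Tianchi competition, or a sample of it?
Thanks
I'd like to ask: what is the code for mean pooling over all of BERT's output tokens, and how do I add it?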
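A minimal sketch of masked mean pooling, written in plain Python so it runs without any model. It assumes the token outputs for one sequence are given as a list of vectors and that `input_mask` marks real tokens with 1 and padding with 0. The name `masked_mean_pool` is hypothetical and does not come from the repository.

```python
def masked_mean_pool(token_vectors, input_mask):
    """Average token vectors over the positions where input_mask is 1."""
    if len(token_vectors) != len(input_mask):
        raise ValueError("token_vectors and input_mask must have equal length")
    hidden = len(token_vectors[0])
    totals = [0.0] * hidden
    count = 0
    for vec, keep in zip(token_vectors, input_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                totals[i] += value
    # Guard against an all-padding sequence to avoid division by zero.
    count = max(count, 1)
    return [t / count for t in totals]


# Two real tokens and one padding position; the padding vector is ignored.
pooled = masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
```

On a TensorFlow tensor of shape [batch, seq_len, hidden], the same idea would be to multiply by the mask expanded to the hidden dimension, sum over the sequence axis with tf.reduce_sum, and divide by the per-sequence mask sum.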
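Has anyone trained on the full dataset? Roughly how much memory does it need? And what about with a GPU?
Hello, could you tell me the format of the training samples (how the positive and negative samples are organized; is the input one query with 1 positive and 4 negative samples)? Also, did you use only unigrams for Chinese feature extraction? Could you leave an email address or other contact details? Thanks (by the way, I'm in Chengdu too, haha)
multi_view_data_input.py is missing. Could you send me a copy?
Could you tell me the shapes of the inputs query_in, positive_in and negative_in?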
Can you tell me the dataset format, or show a screenshot?
Later in the code you use data_sets.query_test_data, data_sets.doc_test_positive and data_sets.doc_test_negative, so I don't quite understand the format.
Thanks!
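Hello, can this be trained on Mac or Linux?
My email is [email protected]. Thanks.
I used the "siamese_bert" model on a dataset of 800,000 companies, at a ratio of 1 (positive) : 4 (negative), and the ranking by cosine similarity looks completely unreliable. I'm stuck.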
auc: 0.64
Accuracy: 0.75
In dssm.py, the code that computes the loss is:
with tf.name_scope('Loss'):
    # Train Loss
    cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=doc_label_batch, logits=cos_sim)
    losses = tf.reduce_sum(cross_entropy)
    tf.summary.scalar('loss', losses)
    pass
Is there a problem here? Why reduce_sum rather than reduce_mean?
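To make the difference concrete, here is a small sketch in plain Python. It uses the stable formula documented for tf.nn.sigmoid_cross_entropy_with_logits, max(x, 0) - x * z + log(1 + exp(-|x|)), and compares the two reductions. The helper names and the sample numbers are made up for illustration.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Per-example loss: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def batch_loss(logits, labels, reduction="mean"):
    """Reduce per-example losses with either a sum or a mean."""
    per_example = [sigmoid_cross_entropy(x, z) for x, z in zip(logits, labels)]
    total = sum(per_example)
    return total if reduction == "sum" else total / len(per_example)


logits = [0.8, -0.3]
labels = [1.0, 0.0]
# Repeating the same examples doubles the summed loss but leaves the mean unchanged.
sum_small = batch_loss(logits, labels, "sum")
sum_large = batch_loss(logits * 2, labels * 2, "sum")
mean_small = batch_loss(logits, labels, "mean")
mean_large = batch_loss(logits * 2, labels * 2, "mean")
```

Neither reduction is wrong in itself. With a sum, the loss and its gradient scale with the batch size, so the effective learning rate changes whenever the batch size does; a mean keeps them comparable across batch sizes.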
After 5 epochs of training the loss is still decreasing, but the softmax outputs have all become nan and the AUC has dropped to 0.5. Why?
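The cause in this particular run is not known from the report. One common source of nan in a softmax is numerical overflow or taking the log of a probability that has underflowed to zero. The sketch below shows the standard max-subtraction trick in plain Python; the function names are illustrative. In float32 tensor code an overflowing exp gives inf and inf / inf gives nan, whereas plain Python's math.exp raises OverflowError instead.

```python
import math


def stable_softmax(scores):
    """Softmax with the maximum subtracted first, so exp never overflows."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stable_log_softmax(scores):
    """Log-softmax computed directly, avoiding log(0) on underflowed probabilities."""
    shift = max(scores)
    log_total = math.log(sum(math.exp(s - shift) for s in scores))
    return [s - shift - log_total for s in scores]


# A naive exp(1000.0) would overflow; the shifted version stays finite.
probs = stable_softmax([1000.0, 999.0])
log_probs = stable_log_softmax([0.0, -1000.0])
```

If the loss is a log of a softmax output, computing the log-softmax in one step is usually safer than taking the log of the probabilities afterwards.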
import data_input raises ModuleNotFoundError: No module named 'data_input'
Where is data_input?
You need to train the model first and then run prediction. The training entry point is train.py.
Training (uses the LCQMC dataset by default):
python train.py --mode=train
Prediction:
python train.py --mode=train --file=$predict_file$
Test file format: q1\tq2 (two questions per line, separated by a tab), for example:
今天天气怎么样 今天温度怎么样
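A minimal sketch of reading a file in the q1\tq2 format described above. The function name `read_query_pairs` is hypothetical and is not the loader used by train.py; it only illustrates the one-pair-per-line, tab-separated layout.

```python
import io


def read_query_pairs(lines):
    """Parse lines of the form 'q1<TAB>q2' into a list of (q1, q2) tuples."""
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


# An in-memory file stands in for the real prediction file.
sample = io.StringIO("今天天气怎么样\t今天温度怎么样\n\n")
pairs = read_query_pairs(sample)
```

For a real file, the same function can be passed an open file handle, for example one opened with encoding="utf-8".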
Originally posted by @InsaneLife in #25 (comment)
👋 Hello, @InsaneLife - a potential high severity Deserialization of Untrusted Data vulnerability in your repository has been disclosed to us.
1️⃣ Visit https://huntr.dev/bounties/1-other-InsaneLife/dssm for more advisory information.
2️⃣ Sign-up to validate or speak to the researcher for more assistance.
3️⃣ Propose a patch or outsource it to our community - whoever fixes it gets paid.
Join us on our Discord and a member of our team will be happy to help! 🤗
Speak to a member of our team: @JamieSlome
This issue was automatically generated by huntr.dev - a bug bounty board for securing open source code.
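Hello, I see that in the part of the code that defines the loss function, you first compute the cosine between the query and the out_embedding of the positive and negative samples, then apply a softmax, and only the probability of the positive sample is used. Why not also add the negated probabilities of the negative samples?
With your definition of the loss, the negative-sample inputs could be dropped entirely.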
Line 78 in eefe42e
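One way to examine this question is with a small numeric sketch. It assumes the loss is the negative log of the positive sample's softmax probability, as the question describes. The smoothing factor `gamma` and its value are assumptions for illustration and are not taken from the repository. Under that assumption the negatives still enter through the softmax denominator, so they affect both the loss and its gradient.

```python
import math


def softmax_positive_loss(cos_pos, cos_negs, gamma=20.0):
    """-log P(positive | query), with P a softmax over gamma * cosine scores."""
    scores = [gamma * cos_pos] + [gamma * c for c in cos_negs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    shift = max(scores)
    log_denominator = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_denominator - scores[0]


# Same positive score, different negatives: the loss changes with the negatives.
loss_easy_negatives = softmax_positive_loss(0.8, [0.1, 0.1, 0.1, 0.1])
loss_hard_negatives = softmax_positive_loss(0.8, [0.7, 0.7, 0.7, 0.7])
# With no negatives the positive probability is 1, the loss is 0, and nothing is learned.
loss_no_negatives = softmax_positive_loss(0.8, [])
```

Because the softmax probabilities sum to 1, raising the positive probability necessarily lowers the total probability of the negatives, which is why a separate negative term is not strictly required in this formulation.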
Hi,
When I run dssm_rnn.py, the training loss is always nan, no matter how I change the learning rate.
I printed the variables in the model, and the embedding variable in word_embeddings_layer is where nan first shows up.
How can I deal with it? Thanks!
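The actual cause cannot be determined from this report. One thing worth checking in a cosine-based model is a zero-norm vector (for example, an all-padding input), which makes the cosine 0 / 0. In tensor code that produces nan and can propagate into the embedding weights through the gradient; in plain Python it raises ZeroDivisionError instead. The sketch below guards the denominator with a small epsilon. The name `safe_cosine` and the `eps` value are assumptions for illustration.

```python
import math


def safe_cosine(u, v, eps=1e-8):
    """Cosine similarity with the denominator clamped away from zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp the product of norms so a zero vector yields 0.0 instead of 0 / 0.
    return dot / max(norm_u * norm_v, eps)


zero_case = safe_cosine([0.0, 0.0], [1.0, 2.0])
same_direction = safe_cosine([1.0, 0.0], [2.0, 0.0])
```

Other checks that are commonly tried for nan losses include gradient clipping and inspecting the inputs for empty sequences or out-of-range token ids.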
Thanks!
Author, how do I run prediction?