junefeng / relationclassification-rl
Reinforcement Learning for Relation Classification from Noisy Data(AAAI2018)
Hi, I have found some duplicate data in train.txt. For example,
line 190245: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###
line 190246: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###
line 190667: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###
line 190668: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###
Each pair is identical in both entities and sentence. Are the duplicates intentional?
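A minimal sketch of how such exact duplicates could be listed, assuming train.txt stores one instance per line. The helper names `readLines` and `findDuplicateLines` are hypothetical and are not part of the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Reads every line of a stream (for example an std::ifstream on train.txt).
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Returns (line number, line number of first occurrence) for every line that
// exactly repeats an earlier one. Line numbers are 1-based.
std::vector<std::pair<std::size_t, std::size_t>> findDuplicateLines(
        const std::vector<std::string>& lines) {
    std::unordered_map<std::string, std::size_t> firstSeen;
    std::vector<std::pair<std::size_t, std::size_t>> duplicates;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::size_t lineNo = i + 1;
        auto result = firstSeen.emplace(lines[i], lineNo);
        if (!result.second) {
            // The stored occurrence always precedes the current line.
            assert(result.first->second < lineNo);
            duplicates.emplace_back(lineNo, result.first->second);
        }
    }
    return duplicates;
}
```

Only byte-for-byte repeats are reported; lines that differ in whitespace or casing would need normalizing first.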
Hello, while debugging the code I found that the two files data/pretrain/word2vec.txt and data/pretrain/pre_bestRL.txt do not exist. Could you provide them?
I tried to run testing with the pre-trained CNN weights provided in the "data/pretrain" directory. I did this by setting outString in main.cpp to the directory where the pre-trained weights are stored. After running this test, however, I obtain a PR curve very similar to the one obtained after joint training.
Can you clarify why this happens?
[Figure: PR curve of the pre-trained CNN]
[Figure: PR curve after joint training]
I could not find any information about environment setup for this project in the README. Could anyone who knows please explain? Thanks.
For example
matrixRelationDao = (float *)calloc(dimensionC*relationTotal, sizeof(float));
matrixW1Dao = (float*)calloc(dimensionC * dimension * window, sizeof(float));
matrixB1Dao = (float*)calloc(dimensionC, sizeof(float));
updateMatrixRelation = (float *)calloc(dimensionC*relationTotal, sizeof(float));
updateMatrixW1 = (float*)calloc(dimensionC * dimension * window, sizeof(float));
updateMatrixB1 = (float*)calloc(dimensionC, sizeof(float));
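While studying your paper, a question came up:
Your model/method breaks the original distribution of the training set and the testing set.
Other RL work adapts the model parameters to fit the data, so it does not change the training data or the testing data. This preserves the original distribution of the training set and the testing set.
The core of this paper, however, is to use RL to remove noisy bags from the original training data, changing the input data by way of the label Y. That is fine during training, and it does reduce the interference of noisy data with the classification model. But can the same be done at test time? The testing set has no labels, so how is a reward fed back to the policy module to remove bags from the testing set? How does the method then work in the testing phase?
I read the code and found that in the testing phase it does indeed apply the CNN directly to the test set for relation classification.
Thanks.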
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main rlpre 0.01
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main r 0.01
wordTotal= 114042
Word dimension= 50
Segmentation fault (core dumped)
mldl@ub1604:
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main rl 0.01
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main test
wordTotal= 114042
Word dimension= 50
Segmentation fault (core dumped)
mldl@ub1604:
wordTotal= 114042
Word dimension= 50
Segmentation fault (core dumped)
mldl@ub1604:~/ub16_prj/RelationClassification-RL$
There are 49,828 entities in the training set, but there are only 39,528 pre-trained entity embeddings.
Hi,
Do you have code for a TensorFlow implementation?
For the special reward setting in this work, a better policy will select the sentences in the bag that have a higher log P(r|x_i). The best outcome is then to find the maximum, which means selecting the single highest-scoring sentence in each bag and feeding it to the classifier for training. Is that correct?
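The intuition in this question can be checked on a toy bag. The sketch below assumes, as the question implies, that the bag-level reward is the mean of log P(r|x_i) over the selected sentences; the exact reward definition should be checked against the paper. The function names and the example values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean log-likelihood of the sentences picked by `mask`
// (bit i set means sentence i is selected). Assumed reward, see lead-in.
double meanLogLikelihood(const std::vector<double>& logProbs, unsigned mask) {
    double sum = 0.0;
    int count = 0;
    for (std::size_t i = 0; i < logProbs.size(); ++i) {
        if (mask & (1u << i)) {
            sum += logProbs[i];
            ++count;
        }
    }
    assert(count > 0);  // the empty selection has no defined mean
    return sum / count;
}

// Exhaustively scores every non-empty subset of a small bag and returns the
// bitmask of the subset with the highest mean log-likelihood.
unsigned bestSelection(const std::vector<double>& logProbs) {
    assert(!logProbs.empty() && logProbs.size() <= 20);
    const unsigned subsetCount = 1u << logProbs.size();
    unsigned best = 1u;
    double bestReward = meanLogLikelihood(logProbs, best);
    for (unsigned mask = 2u; mask < subsetCount; ++mask) {
        const double reward = meanLogLikelihood(logProbs, mask);
        if (reward > bestReward) {
            best = mask;
            bestReward = reward;
        }
    }
    return best;
}
```

Because the mean of any subset cannot exceed its largest element, the best subset under this assumed reward is the single highest-scoring sentence. Whether a trained, stochastic policy actually collapses to that choice is a separate question that this sketch does not settle.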
Hi,
There are 56 relations in sentences, but only 53 in train.txt. However, the three missing relations are not all rare:
/business/company/industry: 6 sentences
/people/ethnicity/includes_groups: 7 sentences
/people/ethnicity/people: 169 sentences
By contrast, four relations have only one sentence each:
/location/fr_region/capital
/business/shopping_center/owner
/business/shopping_center_owner/shopping_centers_owned
/location/mx_state/capital
So why were more frequent relations deleted while relations with only one sentence were kept?
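Per-relation counts like these can be reproduced with a short tally. This is a sketch that assumes each line is whitespace-separated and that the fifth field is the relation label, as in the train.txt lines quoted in the duplicate-data issue. `countSentencesPerRelation` is a hypothetical helper, not a function from the repository.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts how many lines carry each relation label.
// Assumed layout: head_id tail_id head_name tail_name relation sentence... ###END###
std::map<std::string, int> countSentencesPerRelation(std::istream& in) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string headId, tailId, headName, tailName, relation;
        // Lines with fewer than five fields are skipped.
        if (fields >> headId >> tailId >> headName >> tailName >> relation) {
            assert(!relation.empty());  // a successful >> never yields an empty token
            ++counts[relation];
        }
    }
    return counts;
}
```

Passing an `std::ifstream` opened on train.txt and printing the map would give one count per relation, which makes missing or singleton relations easy to spot.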
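Hello,
I downloaded the dataset RE.zip that you provided and found that the number of training sentences (570088) does not match the number reported in the paper (522611). I also found that this training set (570088) contains entity pairs that overlap with those in the test set. Is it appropriate to use it as the training set?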
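One way to measure the overlap is sketched below. It assumes that both files use a whitespace-separated layout whose first two fields are the entity IDs; this matches the train.txt lines quoted in the duplicate-data issue, but it is only an assumption for the test file. The names `readEntityPairs` and `countSharedPairs` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

using EntityPair = std::pair<std::string, std::string>;

// Collects the distinct (head_id, tail_id) pairs found in the first two
// fields of each line. Lines with fewer than two fields are skipped.
std::set<EntityPair> readEntityPairs(std::istream& in) {
    std::set<EntityPair> pairs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string headId, tailId;
        if (fields >> headId >> tailId) {
            pairs.emplace(headId, tailId);
        }
    }
    return pairs;
}

// Counts the test-set entity pairs that also occur in the training set.
std::size_t countSharedPairs(const std::set<EntityPair>& train,
                             const std::set<EntityPair>& test) {
    std::size_t shared = 0;
    for (const EntityPair& p : test) {
        if (train.count(p) > 0) {
            ++shared;
        }
    }
    assert(shared <= test.size());
    return shared;
}
```

Pairs are compared in order, so (a, b) and (b, a) count as different pairs; whether they should be merged depends on how the dataset defines an entity pair.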