
relationclassification-rl's People

Contributors

junefeng

relationclassification-rl's Issues

duplicate data in train.txt

Hi, I have found some duplicate data in train.txt. For example,
line 190245: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###
line 190246: m.053x3n m.0fnb4 shamsur_rahman dhaka /people/deceased_person/place_of_death these include '' the best poems of shamsur_rahman , '' published last year in new delhi ; and '' the devotee , the combatant : selected poems of shamsur_rahman , '' published in 2000 in dhaka . ###END###

line 190667: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###
line 190668: m.05fjf m.0xsbj new_jersey bound_brook /location/location/contains bound_brook is one of the oldest settlements in new_jersey , dating to 1681 . ###END###

They are identical in both entities and sentences. Is this duplication intentional?

Data file problem

Hello, while debugging the code I found that the two files data/pretrain/word2vec.txt and data/pretrain/pre_bestRL.txt do not exist. Could you provide them?

Testing with the pretrained CNN weights yields a PR curve similar to the one obtained after joint training

I ran testing with the pretrained CNN weights provided in the "data/pretrain" directory, by setting outString in main.cpp to the directory where the pretrained weights are stored. The resulting PR curve is very similar to the PR curve obtained after joint training.
Can you clarify why this happens?
PR curve of the pretrained CNN:
[image: cnn_pr]
PR curve after joint training:
[image: pr]

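Is it sentence-level or bag-level?

Hello, your paper talks about sentence-level relation extraction throughout, but I printed part of the output produced at test time, shown below:
[image]
This is clearly a bag-level evaluation: number of bags * 52 classes = 5027256 in total, and tot is the number of positive bags, 1950.
If training is sentence-level, why is the evaluation bag-level? (I also skimmed the training code. I do not know C++ well, but variables such as bags_train appear there, so is training bag-level as well? If so, does that not conflict with the paper?)
I hope you can answer this. Thank you very much!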
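To make the bag-level reading concrete, here is a small sketch that groups sentences into bags and counts the (bag, relation) candidates a bag-level evaluation would rank. It is not taken from the repository: the names `bagSizes` and `bagLevelCandidates` are hypothetical, the input is assumed to use the same whitespace-separated layout as the train.txt lines quoted in the duplicate-data issue, and keying a bag on the entity-id pair alone is an assumption. With the figures quoted in this issue, 5027256 / 52 gives 96678 bags.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using EntityPair = std::pair<std::string, std::string>;

// Hypothetical helper: groups sentences into bags keyed by
// (entity1 id, entity2 id) and returns the number of sentences per bag.
// Keying on the pair alone is an assumption, not the repository's definition.
std::map<EntityPair, std::size_t> bagSizes(std::istream& in) {
    std::map<EntityPair, std::size_t> sizes;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string e1Id, e2Id;
        if (fields >> e1Id >> e2Id) {  // skip blank or malformed lines
            ++sizes[EntityPair(e1Id, e2Id)];
        }
    }
    return sizes;
}

// Number of (bag, relation) candidates ranked in a bag-level evaluation.
std::size_t bagLevelCandidates(std::size_t bagCount, std::size_t relationCount) {
    return bagCount * relationCount;
}
```

A sentence-level evaluation would instead rank one candidate per (sentence, relation), so the total would scale with the number of sentences rather than the number of bags.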

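Environment and configuration

I could not find any information about the environment setup for this project in the README. Could someone who knows please explain? Thanks.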

What is the meaning of Dao? Is it the gradients?

For example:

matrixRelationDao = (float *)calloc(dimensionC*relationTotal, sizeof(float));
matrixW1Dao =  (float*)calloc(dimensionC * dimension * window, sizeof(float));
matrixB1Dao =  (float*)calloc(dimensionC, sizeof(float));

updateMatrixRelation = (float *)calloc(dimensionC*relationTotal, sizeof(float));
updateMatrixW1 =  (float*)calloc(dimensionC * dimension * window, sizeof(float));
updateMatrixB1 =  (float*)calloc(dimensionC, sizeof(float));

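A question about this work

While studying your paper, I ran into a question:
your model/method breaks the original distribution of the training set and the testing set.

Other RL work fits the data by changing the model parameters, which means the training data and testing data are left unchanged. That preserves the original distribution of the training set and the testing set.

The core of this paper, however, is to use RL to remove noisy bags from the original training data, so the label Y is used to change the input data. That is fine in the training phase, and it does reduce the interference of noisy data with the classification model. But can the same be done in the testing phase? The testing set has no labels, so how is a reward fed back to the policy module to remove bags from the testing set? How does the method work in the testing phase?

I read the code and found that in the testing phase the CNN is indeed applied directly to the test set for relation classification.

Thanks.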

Segmentation fault (core dumped)

mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main rlpre 0.01
wordTotal= 114042
Word dimension= 50
Segmentation fault (core dumped)
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main r 0.01
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main rl 0.01
wordTotal= 114042
Word dimension= 50
Segmentation fault (core dumped)
mldl@ub1604:~/ub16_prj/RelationClassification-RL$ ./main test
wordTotal= 114042
Word dimension= 50
Segmentation fault (core dumped)
mldl@ub1604:~/ub16_prj/RelationClassification-RL$

A doubt about the idea

With the special reward setting in this work, a better policy will select the sentences in the bag that have a higher log P(r|x_i). The best outcome is to find the maximum, which means selecting the single highest-scoring sentence in each bag and feeding it to the classifier for training. Is that correct?
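The reasoning in this question can be sketched as follows. This illustrates the questioner's interpretation only, not the author's method: it assumes the reward is the mean of log P(r|x_i) over the selected sentences, which the issue does not state, and both function names are hypothetical. Under that assumption, the mean over any selection can never exceed the largest single value, so selecting only the top sentence maximizes the reward.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the sentence whose log P(r | x_i) is largest within one bag.
// Precondition: logProbs is non-empty. Ties resolve to the earliest sentence.
std::size_t bestSentenceIndex(const std::vector<float>& logProbs) {
    auto best = std::max_element(logProbs.begin(), logProbs.end());
    return static_cast<std::size_t>(std::distance(logProbs.begin(), best));
}

// Assumed reward: mean log-probability over the selected sentence indices.
// Precondition: selected is non-empty and every index is in range
// (an out-of-range index makes at() throw std::out_of_range).
float meanLogProb(const std::vector<float>& logProbs,
                  const std::vector<std::size_t>& selected) {
    float sum = 0.0f;
    for (std::size_t i : selected) {
        sum += logProbs.at(i);
    }
    return sum / static_cast<float>(selected.size());
}
```

For a bag with log-probabilities {-2.0, -0.5, -1.0}, selecting only index 1 gives a mean of -0.5, while selecting all three gives a lower mean.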

Three missing relations

Hi,
There are 56 relations in the sentences, but only 53 in train.txt. However, the three missing relations are not rare:
/business/company/industry: 6 sentences
/people/ethnicity/includes_groups: 7 sentences
/people/ethnicity/people: 169 sentences

By contrast, there are 4 relations with only 1 sentence each:
/location/fr_region/capital
/business/shopping_center/owner
/business/shopping_center_owner/shopping_centers_owned
/location/mx_state/capital

So why were frequent relations deleted while relations with only 1 sentence were kept?
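Per-relation counts like the ones above could be reproduced with a sketch along these lines. It is not from the repository; the function name `countSentencesPerRelation` is hypothetical, and it assumes the relation label is the fifth whitespace-separated field, as in the train.txt lines quoted in the duplicate-data issue.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts sentences per relation label.
// Assumed line layout: e1_id e2_id e1_name e2_name relation sentence ... ###END###
std::map<std::string, std::size_t> countSentencesPerRelation(std::istream& in) {
    std::map<std::string, std::size_t> counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string e1Id, e2Id, e1Name, e2Name, relation;
        // Lines with fewer than five fields are skipped.
        if (fields >> e1Id >> e2Id >> e1Name >> e2Name >> relation) {
            ++counts[relation];
        }
    }
    return counts;
}
```

Running it over both the raw sentence file and train.txt and comparing the two key sets would show which relations are present in one but not the other.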

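A question about the dataset

Hello,
I downloaded the dataset RE.zip that you provided and found that the number of sentences in the training set, 570088, does not match the number reported in the paper (522611). I also found that this training set (570088) contains entity pairs that overlap with those in the test set. Is it appropriate to use it as the training set?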
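A minimal sketch of how the entity-pair overlap described here could be checked. It is not from the repository: the names `readEntityPairs` and `overlappingPairs` are hypothetical, and both files are assumed to start each line with the two entity ids, as in the train.txt lines quoted in the duplicate-data issue.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <utility>

using EntityPair = std::pair<std::string, std::string>;

// Hypothetical helper: collects the distinct (entity1 id, entity2 id) pairs,
// assuming they are the first two whitespace-separated fields of each line.
std::set<EntityPair> readEntityPairs(std::istream& in) {
    std::set<EntityPair> pairs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string e1Id, e2Id;
        if (fields >> e1Id >> e2Id) {
            pairs.emplace(e1Id, e2Id);
        }
    }
    return pairs;
}

// Entity pairs that occur in both sets. std::set iterates in sorted order,
// which is what std::set_intersection requires.
std::set<EntityPair> overlappingPairs(const std::set<EntityPair>& train,
                                      const std::set<EntityPair>& test) {
    std::set<EntityPair> shared;
    std::set_intersection(train.begin(), train.end(), test.begin(), test.end(),
                          std::inserter(shared, shared.begin()));
    return shared;
}
```

The size of the returned set is the number of entity pairs shared between the two files. The order of the two entities matters in this sketch, so (a, b) and (b, a) count as different pairs.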
