Hi, can you please explain how I can form my own dataset for training MLMICNN. I'm con

Dataset format and input format for new predictions about cnn-re-tf HOT 4 CLOSED

may- commented on June 7, 2024

Dataset format and input format for new predictions

from cnn-re-tf.

Comments (4)

may- commented on June 7, 2024

Hi,

Could you please refer to issue #1 (how to check predictions) and issue #2 (how to create source.att) first?

MLMI stands for multi-label multi-instance training, that is, the labels are not mutually exclusive.
My task is to predict the relationship between two entities. For example, if you have the entity pair:

Berlin <relation> Germany

then the correct prediction will be: country, located_in and capital_of. (In this case, three correct labels according to WikiData)

Let's say, I found the following sentence online:

The city of Berlin is located in eastern Germany on the River Spree.

Then the training data looks like:

source.txt (Berlin has entity ID: <Q64>, Germany: <Q183>)

The city of <Q64> is located in eastern <Q183> on the River Spree .

source.left
```
The city of <Q64>
```
source.middle
```
is located in eastern
```
source.right
```
<Q183> on the River Spree . 
```
target.txt (assume that the first column represents whether the sentence has the label country, second column: located_in, third column: capital_of, etc.)
```
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```
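
Judging from the example above, source.left seems to run up to and including the first entity, source.middle is the text between the two entities, and source.right starts at the second entity. Here is a minimal sketch of that split, assuming whitespace tokenization and that each entity placeholder is a single token. The function name `split_context` is made up for illustration and is not taken from the repository.

```python
def split_context(sentence, first_entity, second_entity):
    """Split a tokenized sentence into (left, middle, right) contexts.

    Assumption inferred from the example: left ends with the first entity,
    right begins with the second entity, middle is everything in between.
    """
    tokens = sentence.split()
    i = tokens.index(first_entity)   # position of the first entity placeholder
    j = tokens.index(second_entity)  # position of the second entity placeholder
    if i >= j:
        raise ValueError("first_entity must appear before second_entity")
    left = " ".join(tokens[: i + 1])
    middle = " ".join(tokens[i + 1 : j])
    right = " ".join(tokens[j:])
    return left, middle, right


line = "The city of <Q64> is located in eastern <Q183> on the River Spree ."
left, middle, right = split_context(line, "<Q64>", "<Q183>")
print(left)
print(middle)
print(right)
```

With the Berlin sentence this prints the same three strings as the source.left, source.middle and source.right samples above.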

Here is the list of 23 labels (and the class-wise results):

If you still have any questions, could you describe them a little more concretely? I don't have any clue what your task is, what your data looks like, etc.


ArijeetC commented on June 7, 2024

Hi, thanks for the explanation. My task is also relationship extraction, I need to find the relations that exist in a given set of documents. I plan to first create a dataset from the documents with the entities and relations annotated, and then train a model on it to predict relations for new documents. Will this project be suitable for my task?


may- commented on June 7, 2024

It sounds pretty similar to my task.
Ah, so, your question is how to do distant supervision, right?

I'm not sure if it helps, but here are the four steps I did in the distant_supervision.py script.

1. Download raw text data (if you already have the documents, you can replace this with your data)
For example:
```
The city of Berlin is located in eastern Germany on the River Spree.
```

2. Find entity

Execute StanfordNER (demo):

The city of <Berlin_LOCATION> is located in eastern <Germany_LOCATION> on the <River Spree_LOCATION>.

Extract all combinations. In the case above:

<Berlin_LOCATION> <Germany_LOCATION>
<Berlin_LOCATION> <River Spree_LOCATION>
<Germany_LOCATION> <River Spree_LOCATION>

3. Find entity ID (using SPARQL on WikiData API (sample))

<Q64 Berlin> <Q183 Germany>
<Q64 Berlin> <Q1684 River Spree>
<Q183 Germany> <Q1684 River Spree>

4. Find relation (using SPARQL on WikiData API (sample))

<Q64 Berlin> <Q183 Germany> --> <P17 country>, <P1376 capital_of>, <P131 located_in>
<Q64 Berlin> <Q1684 River Spree> --> <P206 located_next_to>
<Q183 Germany> <Q1684 River Spree> --> None
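
The pair extraction and the relation lookup can be sketched roughly as follows. This is only an illustration, not the code from distant_supervision.py: the function names `entity_pairs` and `relation_query` are hypothetical, and the query only checks for direct statements that point from the first entity to the second. The query string is meant for the WikiData SPARQL endpoint, where the `wd:` prefix is predefined. Actually sending it needs an HTTP client and is left out here.

```python
from itertools import combinations


def entity_pairs(entities):
    """Return every unordered pair of entities, in sentence order."""
    return list(combinations(entities, 2))


def relation_query(subject_id, object_id):
    """Build a SPARQL query listing the properties that link two WikiData items.

    Assumption: only direct statements subject -> object are of interest.
    """
    return (
        "SELECT ?property WHERE { "
        f"wd:{subject_id} ?property wd:{object_id} . "
        "}"
    )


pairs = entity_pairs(["Berlin", "Germany", "River Spree"])
for first, second in pairs:
    print(first, "|", second)

# Query for the properties that connect Berlin (Q64) to Germany (Q183).
print(relation_query("Q64", "Q183"))
```

For three entities this yields the same three pairs as in step 2. To also catch relations stored in the opposite direction, the same query can be built a second time with the two IDs swapped.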

At the end of these four steps, we get

source.txt

The city of <Q64> is located in eastern <Q183> on the River Spree . 
The city of <Q64> is located in eastern Germany on the <Q1684> .

and

target.txt

1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
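
To make the last step concrete, here is a small sketch of how one source.txt line and one target.txt line could be assembled per entity pair. It assumes the sentence is already tokenized, that each mention appears once, and that the label order starts with country, located_in, capital_of, located_next_to. The fourth position is only inferred from the example above, and the remaining 19 labels are not listed here. `mask_entities` and `target_row` are made-up names, not functions from the repository.

```python
# Only the first four labels are known from the example; the full list has 23.
LABELS = ["country", "located_in", "capital_of", "located_next_to"]
NUM_LABELS = 23


def mask_entities(sentence, first, second):
    """Replace the two mentions with their <ID> placeholders.

    `first` and `second` are (mention, entity_id) tuples.
    """
    for mention, entity_id in (first, second):
        sentence = sentence.replace(mention, f"<{entity_id}>", 1)
    return sentence


def target_row(relations, labels=LABELS, num_labels=NUM_LABELS):
    """Encode a set of relation names as a space-separated multi-hot row."""
    row = [0] * num_labels
    for relation in relations:
        row[labels.index(relation)] = 1
    return " ".join(str(value) for value in row)


sentence = "The city of Berlin is located in eastern Germany on the River Spree ."
print(mask_entities(sentence, ("Berlin", "Q64"), ("Germany", "Q183")))
print(target_row(["country", "capital_of", "located_in"]))
print(mask_entities(sentence, ("Berlin", "Q64"), ("River Spree", "Q1684")))
print(target_row(["located_next_to"]))
```

The Germany / River Spree pair has no relation, and it does not show up in the two-line source.txt above, so pairs like that are presumably skipped.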

If you want to learn the theory, I recommend reading:

Mintz et al. Distant supervision for relation extraction without labeled data. ACL 2009.
Dong et al. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. ACM SIGKDD 2014.


ArijeetC commented on June 7, 2024

Thanks for the detailed explanation, I think I might be able to solve my task with this.

