Comments (4)
Hi,
Could you please first refer to issue #1 (how to check predictions) and issue #2 (how to create source.att)?
MLMI stands for multi-label multi-instance training, that is, the labels are not mutually exclusive.
My task is to predict the relationship between two entities. For example, if you have the entity pair:
Berlin <relation> Germany
then the correct prediction will be: country, located_in and capital_of. (In this case, there are three correct labels according to WikiData.)
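To illustrate what "not mutually exclusive" means at prediction time, here is a minimal sketch. It assumes the model emits one independent score per label and that each score is passed through a sigmoid and compared with a 0.5 threshold; both are assumptions for illustration, not necessarily what cnn-re-tf does. The label subset and the scores are made up.

```python
import math

# Hypothetical subset of labels; "other" is only here to show a rejected label.
LABELS = ["country", "located_in", "capital_of", "other"]


def sigmoid(x):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, labels=LABELS, threshold=0.5):
    """Return every label whose sigmoid score reaches the threshold.

    Each label is decided independently, so zero, one, or several
    labels can be returned for the same entity pair.
    """
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Made-up raw scores for the pair (Berlin, Germany).
print(predict_labels([2.1, 1.3, 0.4, -3.0]))
# prints ['country', 'located_in', 'capital_of']
```

With a softmax over the same scores only one label could win, which is why a multi-label setup scores each label separately.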
Let's say, I found the following sentence online:
The city of Berlin is located in eastern Germany on the River Spree.
Then the training data looks like:
- source.txt (Berlin has entity ID <Q64>, Germany has <Q183>)
  The city of <Q64> is located in eastern <Q183> on the River Spree .
- source.left
  The city of <Q64>
- source.middle
  is located in eastern
- source.right
  <Q183> on the River Spree .
- target.txt (assume that the first column represents whether the sentence has the label country, the second column located_in, the third column capital_of, etc.)
  1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here is the list of 23 labels (and the class-wise results):
If you still have questions, please describe them a bit more concretely. I don't have any clue what your task is, what your data looks like, etc.
from cnn-re-tf.
Hi, thanks for the explanation. My task is also relationship extraction: I need to find the relations that exist in a given set of documents. I plan to first create a dataset from the documents with the entities and relations annotated, and then train a model on it to predict relations for new documents. Will this project be suitable for my task?
from cnn-re-tf.
It sounds pretty similar to my task.
Ah, so, your question is how to do distant supervision, right?
I'm not sure if it helps, but here are the four steps I followed in the distant_supervision.py script.
- Download raw text data (if you already have the documents, you can replace this with your data)
  For example: The city of Berlin is located in eastern Germany on the River Spree.
- Find entity
  - execute StanfordNER (demo)
    The city of <Berlin_LOCATION> is located in eastern <Germany_LOCATION> on the <River Spree_LOCATION>.
  - extract all combinations. In the case above:
    <Berlin_LOCATION> <Germany_LOCATION>
    <Berlin_LOCATION> <River Spree_LOCATION>
    <Germany_LOCATION> <River Spree_LOCATION>
- Find entity ID (using SPARQL on WikiData API (sample))
  <Q64 Berlin> <Q183 Germany>
  <Q64 Berlin> <Q1684 River Spree>
  <Q183 Germany> <Q1684 River Spree>
- Find relation (using SPARQL on WikiData API (sample))
  <Q64 Berlin> <Q183 Germany> --> <P17 country>, <P1376 capital_of>, <P131 located_in>
  <Q64 Berlin> <Q1684 River Spree> --> <P206 located_next_to>
  <Q183 Germany> <Q1684 River Spree> --> None
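The "extract all combinations" and "find relation" steps can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the query text is an assumed shape for an endpoint where the `wd:` prefix is predefined (such as the Wikidata Query Service), not the query that distant_supervision.py actually sends. The sketch only builds the query string and does not contact any service.

```python
from itertools import combinations


def entity_pairs(entities):
    """Return every unordered pair of entities, keeping sentence order."""
    return list(combinations(entities, 2))


def relation_query(subject_id, object_id):
    """Build a SPARQL query asking which properties link subject to object.

    Assumed query shape; the doubled braces are literal braces in an f-string.
    """
    return f"SELECT ?prop WHERE {{ wd:{subject_id} ?prop wd:{object_id} . }}"


# Entities found by the NER step in the example sentence.
entities = ["Berlin", "Germany", "River Spree"]
for first, second in entity_pairs(entities):
    print(first, "|", second)
# Berlin | Germany
# Berlin | River Spree
# Germany | River Spree

print(relation_query("Q64", "Q183"))
# SELECT ?prop WHERE { wd:Q64 ?prop wd:Q183 . }
```

A sentence with n recognized entities yields n(n-1)/2 pairs, so long sentences produce many candidate pairs, most of which will have no relation in the knowledge base.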
At the end of these four steps, we get
source.txt
The city of <Q64> is located in eastern <Q183> on the River Spree .
The city of <Q64> is located in eastern Germany on the <Q1684> .
and
target.txt
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
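Here is a minimal sketch of how one source.txt line and one target.txt line could be produced for an entity pair. The function names are hypothetical, the column order of the labels is an assumption read off the example rows above (only four of the 23 labels are named), and the plain string replacement is a simplification that would misfire if a mention were a substring of another word.

```python
NUM_LABELS = 23  # the thread mentions 23 labels in total

# Assumed column positions, inferred from the example rows above.
LABEL_INDEX = {
    "country": 0,
    "located_in": 1,
    "capital_of": 2,
    "located_next_to": 3,
}


def make_source_line(sentence, mention_to_id):
    """Replace each entity mention of the pair with its <ID> tag."""
    for mention, entity_id in mention_to_id.items():
        sentence = sentence.replace(mention, f"<{entity_id}>")
    return sentence


def make_target_line(relations, label_index=LABEL_INDEX, num_labels=NUM_LABELS):
    """Encode a set of relations as a space-separated multi-hot row."""
    row = [0] * num_labels
    for relation in relations:
        row[label_index[relation]] = 1
    return " ".join(str(value) for value in row)


sentence = "The city of Berlin is located in eastern Germany on the River Spree ."

print(make_source_line(sentence, {"Berlin": "Q64", "Germany": "Q183"}))
print(make_target_line(["country", "capital_of", "located_in"]))

print(make_source_line(sentence, {"Berlin": "Q64", "River Spree": "Q1684"}))
print(make_target_line(["located_next_to"]))
```

Pairs with no relation in the knowledge base, such as Germany and River Spree here, do not appear in the example output above, so the sketch does not emit a row for them.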
If you want to learn the theory, I recommend reading:
- Mintz et al. Distant supervision for relation extraction without labeled data. ACL 2009.
- Dong et al. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. ACM SIGKDD 2014.
from cnn-re-tf.
Thanks for the detailed explanation. I think I might be able to solve my task with this.
from cnn-re-tf.