Comments (5)
Hi, thanks for your interest!
Yes, you will need to convert the character-level indexes to word-level ones. You can refer to SciERC as an example for the actual format.
Let me know if you have further questions!
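A minimal sketch of that conversion, assuming plain whitespace tokenization and inclusive end indexes (the relation spans quoted later in this thread, such as `[4, 4, ...]`, suggest inclusive ends). The helper names `tokenize_with_offsets` and `char_span_to_word_span` are hypothetical and not part of PURE; a real pipeline would likely use a tokenizer that separates punctuation.

```python
import re

def tokenize_with_offsets(text):
    """Split on whitespace and return (token, start_char, end_char) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def char_span_to_word_span(offsets, char_start, char_end):
    """Map a character span [char_start, char_end) to inclusive word indexes."""
    covered = [i for i, (_, s, e) in enumerate(offsets)
               if s < char_end and e > char_start]
    if not covered:
        raise ValueError("span does not overlap any token")
    return covered[0], covered[-1]

text = "And here are the facts. 210,000 dead people in our country."
offsets = tokenize_with_offsets(text)

# Character span of an annotated mention, as an annotation tool might export it.
start = text.index("210,000 dead people")
end = start + len("210,000 dead people")

print(char_span_to_word_span(offsets, start, end))
```

Any token that overlaps the character span is counted, so an annotation that starts or ends mid-word is widened to whole words.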
from pure.
Hi, our models are trained on sentence-level relations, so the models are not able to extract relations across sentences.
You can test the pre-trained models by concatenating all sentences of one document into a single sentence in the .json file, so that every relation still falls within one sentence. However, the performance of the models may not be good.
You can also train a model with your document-level annotations if you have a training set.
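A rough sketch of the concatenation, assuming a SciERC-style document with `sentences`, `ner`, and `relations` keys and document-level token indexes (so the indexes stay valid after flattening). The function name `merge_sentences` and the toy document are hypothetical.

```python
def merge_sentences(doc):
    """Return a copy of `doc` whose sentences are flattened into a single one.

    Assumes span indexes are counted from the start of the document,
    so they do not need to be shifted after merging.
    """
    merged = dict(doc)
    merged["sentences"] = [[tok for sent in doc["sentences"] for tok in sent]]
    for key in ("ner", "relations"):
        if key in doc:
            merged[key] = [[item for sent in doc[key] for item in sent]]
    return merged

# Toy document for illustration only.
doc = {
    "doc_key": "example_doc",
    "sentences": [["And", "here", "are", "the", "facts", "."],
                  ["210,000", "dead", "people", "."]],
    "ner": [[[4, 4, "Claim"]], [[6, 8, "Premise"]]],
    "relations": [[], []],
}

merged = merge_sentences(doc)
print(merged["sentences"])
print(merged["ner"])
```

If your indexes are sentence-relative instead, each span would need to be offset by the length of the preceding sentences before merging.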
from pure.
Hey! Many thanks for your answer!
I understood how to build the .json file with your suggestion.
Now I have a question about the model: my dataset contains relations that span different sentences. Does this model support them?
The situation I'm referring to is the example I wrote in the first comment. Here is the relevant part of the text document:
[...] And here are the facts. 210,000 dead people in our country in just the last several months. [...]
As you can see, the relation is between two different sentences.
I'm asking because I noticed that the relations (like the example below) are nested per sentence, just like the entities.
"relations": [[[4, 4, 6, 17, "USED-FOR"], [20, 21, 4, 4, "USED-FOR"], [24, 26, 20, 21, "FEATURE-OF"], [28, 29, 24, 26, "FEATURE-OF"]], [[42, 42, 44, 45, "CONJUNCTION"], [44, 45, 48, 49, "CONJUNCTION"]], [[58, 62, 55, 55, "USED-FOR"], [67, 67, 64, 64, "HYPONYM-OF"], [67, 67, 69, 69, "USED-FOR"], [67, 67, 72, 72, "CONJUNCTION"], [72, 72, 64, 64, "HYPONYM-OF"], [72, 72, 74, 75, "USED-FOR"], [79, 79, 81, 88, "USED-FOR"]], [[95, 105, 91, 91, "USED-FOR"]], []]
In this case, I don't know how to build the .json file when a relation spans different sentences.
Thank you in advance for your explanation.
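To gauge how much of a dataset is affected, one possible check is to list the relations whose two spans start in different sentences. This is only a sketch: it assumes document-level token indexes and the `[start1, end1, start2, end2, label]` layout shown above, and the helper names and toy document are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def sentence_of(token_index, sentence_starts):
    """Index of the sentence that contains a document-level token index."""
    return bisect_right(sentence_starts, token_index) - 1

def cross_sentence_relations(doc):
    """Collect relations whose two spans begin in different sentences."""
    lengths = [len(sent) for sent in doc["sentences"]]
    # First token index of each sentence, e.g. lengths [6, 4] -> starts [0, 6].
    starts = [0] + list(accumulate(lengths))[:-1]
    found = []
    for sent_rels in doc["relations"]:
        for s1, e1, s2, e2, label in sent_rels:
            if sentence_of(s1, starts) != sentence_of(s2, starts):
                found.append([s1, e1, s2, e2, label])
    return found

# Toy document for illustration only.
doc = {
    "sentences": [["And", "here", "are", "the", "facts", "."],
                  ["210,000", "dead", "people", "."]],
    "relations": [[], [[6, 8, 4, 4, "Support"]]],
}

print(cross_sentence_relations(doc))
```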
from pure.
Hi! Thank you as always for your reply!
Yes, because my dataset is completely different (based on entities like “Claim” and “Premise” and relations like “Support” and “Attack”), my idea is to train the model on my dataset and evaluate it on the test set.
I suppose I have to change something here in order to "support" document-level training and testing:
```
--eval_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--context_window {0 | 100} \
--max_seq_length {128 | 228} \
```
from pure.
Hi! To support document-level training, the easiest thing you can do is to concatenate the sentences of one document and use the result as a single example (instead of using each sentence as one example).
You will also need to set --max_seq_length accordingly, so that it covers the maximum number of tokens in one document.
Note that this can only work when your documents are not too long (i.e., <= 512 tokens).
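A quick way to estimate whether a document fits is to count its subword tokens. The sketch below takes the tokenizer as a parameter; `toy_tokenize` is a hypothetical stand-in so the example runs without extra libraries, and the allowance of 2 special tokens is an assumption for BERT-style inputs. The real count depends on the model's own tokenizer and on any extra tokens the model inserts, so treat the result as a lower bound.

```python
def subword_length(words, tokenize, num_special_tokens=2):
    """Estimate the sequence length of a document after subword tokenization.

    `tokenize` is any callable mapping one word to a list of subword pieces.
    `num_special_tokens` is an assumed allowance for tokens such as [CLS]/[SEP].
    """
    return sum(len(tokenize(word)) for word in words) + num_special_tokens

def toy_tokenize(word, piece_len=4):
    """Stand-in tokenizer: cut each word into chunks of `piece_len` characters."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]

doc_words = ["And", "here", "are", "the", "facts", ".",
             "210,000", "dead", "people", "."]

needed = subword_length(doc_words, toy_tokenize)
fits = needed <= 512  # limit mentioned in the comment above
print(needed, fits)
```

With the Hugging Face `transformers` library, the `tokenize` method of the tokenizer that matches your model could be passed in place of `toy_tokenize`.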
from pure.