Hello! I'm trying to use your repo but first I need to convert my annotations in <

Input Data Format about pure HOT 5 CLOSED

princeton-nlp commented on May 30, 2024

Input Data Format

from pure.

Comments (5)

a3616001 commented on May 30, 2024 1

Hi, thanks for your interest!
Yes, you will need to convert the character-level indexes to word-level ones. You can refer to SciERC as an example for the actual format.
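For readers doing this conversion, here is a minimal sketch of one way to map character offsets to word indices. It assumes plain whitespace tokenization and inclusive end indices; the helper names `tokenize_with_offsets` and `char_span_to_word_span` are hypothetical and not part of the repo. Use whatever tokenizer your dataset was actually annotated with.

```python
import re

def tokenize_with_offsets(text):
    """Split on whitespace and return (token, start_char, end_char) triples.

    Assumption: whitespace tokenization; swap in your own tokenizer if needed.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def char_span_to_word_span(offsets, char_start, char_end):
    """Map a character span [char_start, char_end) to inclusive word indices.

    A token counts as part of the span if it overlaps it by at least one character.
    """
    covered = [i for i, (_, start, end) in enumerate(offsets)
               if start < char_end and end > char_start]
    if not covered:
        raise ValueError("character span does not overlap any token")
    return covered[0], covered[-1]

text = "And here are the facts."
offsets = tokenize_with_offsets(text)
char_start = text.index("the facts")
span = char_span_to_word_span(offsets, char_start, char_start + len("the facts"))
print(span)  # (3, 4)
```

Note that with whitespace tokenization the final period stays attached to "facts.", so the word span covers that whole token.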

Let me know if you have further questions!

a3616001 commented on May 30, 2024 1

Hi, our models are trained on sentence-level relations, so the models are not able to extract relations across sentences.
You can test the pre-trained models by concatenating all sentences of one document into a single sentence in the .json file (so that relations still fall within the same sentence). However, the models may not perform well in that setting.
You can also train a model with your document-level annotations if you have a training set.
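The concatenation described above could be sketched as follows. This assumes a SciERC-style document in which `sentences`, `ner`, and `relations` are lists nested per sentence and all indices are already document-level word offsets, so merging only flattens the nesting. The function name `merge_sentences` and the sample annotations are hypothetical, for illustration only.

```python
def merge_sentences(doc):
    """Collapse a multi-sentence document into a single-sentence example.

    Assumption: entity and relation indices are document-level word offsets,
    so they stay valid after flattening and need no shifting.
    """
    merged = dict(doc)  # keep any other fields unchanged
    merged["sentences"] = [[tok for sent in doc["sentences"] for tok in sent]]
    merged["ner"] = [[span for sent in doc.get("ner", []) for span in sent]]
    merged["relations"] = [[rel for sent in doc.get("relations", []) for rel in sent]]
    return merged

# Hypothetical document with a relation that crosses the sentence boundary.
doc = {
    "sentences": [["And", "here", "are", "the", "facts", "."],
                  ["210,000", "dead", "people", "."]],
    "ner": [[[3, 4, "Claim"]], [[6, 8, "Premise"]]],
    "relations": [[], [[6, 8, 3, 4, "Support"]]],
}

merged = merge_sentences(doc)
print(len(merged["sentences"][0]))  # 10
```

After merging, the cross-sentence relation sits inside the single remaining sentence, which is what a sentence-level model needs.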

pierpaologoffredo commented on May 30, 2024

Hey! Many thanks for your answer!
I understood how to build the .json file with your suggestion.

Now I have a question about the model: my dataset contains relations that span different sentences. Does this model support them?
The case I'm referring to is the example I wrote in the first comment. Here is the relevant part of the text document:

[...] And here are the facts. 210,000 dead people in our country in just the last several months. [...]

As you can see, the relation is between two different sentences.

I'm running into this problem because I noticed that the relations (like the example below) are nested per sentence, just like the entities.

"relations": [[[4, 4, 6, 17, "USED-FOR"], [20, 21, 4, 4, "USED-FOR"], [24, 26, 20, 21, "FEATURE-OF"], [28, 29, 24, 26, "FEATURE-OF"]], [[42, 42, 44, 45, "CONJUNCTION"], [44, 45, 48, 49, "CONJUNCTION"]], [[58, 62, 55, 55, "USED-FOR"], [67, 67, 64, 64, "HYPONYM-OF"], [67, 67, 69, 69, "USED-FOR"], [67, 67, 72, 72, "CONJUNCTION"], [72, 72, 64, 64, "HYPONYM-OF"], [72, 72, 74, 75, "USED-FOR"], [79, 79, 81, 88, "USED-FOR"]], [[95, 105, 91, 91, "USED-FOR"]], []]

In this case, I don't know how to build the .json file when I have relations that span different sentences.

Thank you in advance for your explanation.

pierpaologoffredo commented on May 30, 2024

Hi! Thank you as always for your reply!

Yes. Because my dataset is completely different (based on entities like “Claim” and “Premise” and relations like “Support” and “Attack”), my idea is to train the model on my dataset and evaluate it on the test set.

I suppose I have to change something here in order to "support" document-level training and testing:

```
 --eval_batch_size 32 \
 --learning_rate 2e-5 \
 --num_train_epochs 10 \
 --context_window {0 | 100} \
 --max_seq_length {128 | 228} \
```

a3616001 commented on May 30, 2024

Hi! In order to support document-level training, the easiest thing you can do is to concatenate sentences of one document together and use it as a single example (instead of using each sentence as one example).
You will also need to adjust --max_seq_length accordingly, so that it covers the maximum number of tokens in one document.

Note that this can only work when your documents are not too long (i.e., <= 512 tokens).
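Before choosing a value, a rough pre-check along these lines may help spot documents that are too long. This is only a sketch: it counts words, not model tokens, so it underestimates the real length (subword splitting and any special tokens add more). The `sentences` field name follows the SciERC-style layout and the helper names are hypothetical.

```python
def word_count(doc):
    """Total number of words across all sentences of one document."""
    return sum(len(sent) for sent in doc["sentences"])

def documents_over_limit(docs, limit=512):
    """Return the indices of documents whose word count exceeds `limit`.

    Word counts are a lower bound on the tokenized length, so a document
    that passes this check can still be too long after subword tokenization.
    """
    return [i for i, doc in enumerate(docs) if word_count(doc) > limit]

# Hypothetical data: one long document and one short one.
docs = [
    {"sentences": [["a"] * 300, ["b"] * 250]},
    {"sentences": [["c"] * 40]},
]

word_counts = [word_count(d) for d in docs]
over_limit = documents_over_limit(docs)
print(word_counts, over_limit)  # [550, 40] [0]
```

Documents flagged here would need to be split or truncated before they can be used as single examples.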
