
tanl's Issues

CoNLL2012 Datasets in .json format

datasets.py expects .json files for the CoNLL2012 dataset. However, after searching online, I cannot find any preprocessing tool that produces .json files for CoNLL2012.

Would the authors be able to provide a way to preprocess the CoNLL2012 dataset so that it can be used for training?

Thanks,
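In the meantime, the named-entity column of the official *_conll files can be parsed with something like the sketch below. Caveats: it assumes column 10 holds the NE bracket notation (the standard CoNLL-2012 layout), and the JSON keys in to_record are invented for illustration — the schema TANL's datasets.py actually expects still has to be matched by hand.

```python
import json
import re

def read_ne_spans(conll_lines):
    """Parse the named-entity column (index 10) of CoNLL-2012 *_conll rows
    for one sentence; return the tokens and (type, start, end) spans."""
    tokens, spans, stack = [], [], []
    for i, line in enumerate(conll_lines):
        cols = line.split()
        tokens.append(cols[3])          # column 3 is the word form
        ne = cols[10]                   # column 10 is the NE bracket notation
        for m in re.finditer(r"\(([A-Z_]+)", ne):
            stack.append((m.group(1), i))   # an entity opens at token i
        for _ in range(ne.count(")")):
            etype, start = stack.pop()      # the innermost open entity closes here
            spans.append((etype, start, i + 1))
    return tokens, spans

def to_record(conll_lines):
    """Serialize one sentence; these field names are guesses, not TANL's schema."""
    tokens, spans = read_ne_spans(conll_lines)
    return json.dumps({
        "tokens": tokens,
        "entities": [{"type": t, "start": s, "end": e} for t, s, e in spans],
    })
```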

Reproducing results on other datasets

You mentioned: "For other datasets, we provide sample processing code which does not necessarily match the format of publicly available versions (we do not plan to adapt the code to load datasets in other formats)". Given that, I'd like to know how I can reproduce the paper's results on those other datasets.

Inquiry Regarding ACE2005-Event Data for TANL

Hi,
Could you kindly share the ACE2005 dataset for event extraction (the event trigger and event argument datasets), or provide guidance on how I might obtain access to it?
Thanks!

Bug in augment_sentence function

Hi, there seems to be a small bug in the augment_sentence function in utils.py. When the root of the entity tree is itself an entity with tags, those tags are not added to the output. For example, when I run the code below:

from utils import augment_sentence

# A standard example (no entity spans the whole sentence); this works fine:
tokens = ['Tolkien', 'was', 'born', 'here']
augmentations = [
    ([('person',), ('born in', 'here')], 0, 1),
    ([('location',)], 3, 4),
]

# The failing example, from the test set of CoNLL03 NER: the single entity
# spans the entire sentence, so it becomes the root of the entity tree.
tokens = ['Premier', 'league']
augmentations = [([('miscellaneous',)], 0, 2)]

begin_entity_token = "["
sep_token = "|"
relation_sep_token = "="
end_entity_token = "]"

augmented_output = augment_sentence(tokens, augmentations, begin_entity_token, sep_token, relation_sep_token, end_entity_token)
print(augmented_output)

It prints out Premier league instead of [ Premier league | miscellaneous ]. This happens because on line 124 of utils.py, the value of the root of the entity tree is reset to an empty list. My quick fix is to initialize the start index of the root as -1, i.e. changing line 103 of utils.py to

root = (None, -1, len(tokens))   # this node represents the entire sentence

It would be great if someone could let me know if I am correct on this. Thanks!
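For what it's worth, the expected behavior can be pinned down with a standalone sketch of the flat (non-nested) augmentation logic; this is an independent reimplementation for testing, not the repo's utils.py:

```python
def augment_flat(tokens, augmentations, begin="[", sep="|", rel_sep="=", end="]"):
    """Insert entity annotations into a token list (flat, non-nested spans only).

    Each augmentation is (tags, start, end): tags is a list of tuples,
    a 1-tuple for an entity type and a 2-tuple for (relation, head entity).
    """
    out, i = [], 0
    for tags, start, stop in sorted(augmentations, key=lambda a: a[1]):
        out.extend(tokens[i:start])              # copy tokens before the span
        piece = [begin] + tokens[start:stop]     # open the entity bracket
        for tag in tags:
            piece.append(sep)
            # join relation tuples as "relation = head", keep plain types as-is
            piece.append(f" {rel_sep} ".join(tag) if len(tag) > 1 else tag[0])
        piece.append(end)
        out.extend(piece)
        i = stop
    out.extend(tokens[i:])                       # copy any trailing tokens
    return " ".join(out)

print(augment_flat(["Premier", "league"], [([("miscellaneous",)], 0, 2)]))
# [ Premier league | miscellaneous ]
```

With this sketch, the full-sentence entity yields the bracketed output the issue expects, which supports the reading that the root node and a full-sentence entity are being conflated in utils.py.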

Episode numbers in few-shot experiment

Hi,
Thank you for sharing the code! I'm trying to reproduce the results on FewRel 1.0, and I'm wondering how many episodes and how many queries are used in the 1-shot and 5-shot cases, respectively.

Thanks.
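For reference, N-way K-shot episode construction typically looks like the sketch below; the episode and query counts are placeholders (the paper's actual settings are exactly what this issue asks about), and sample_episode is a hypothetical helper, not FewRel's or TANL's API:

```python
import random

def sample_episode(data, n_way=5, k_shot=1, n_query=5, rng=random):
    """Sample one N-way K-shot episode from {relation: [examples]}.

    Returns (support, query) lists of (example, relation) pairs.
    """
    relations = rng.sample(sorted(data), n_way)      # pick N relations
    support, query = [], []
    for rel in relations:
        picked = rng.sample(data[rel], k_shot + n_query)
        support += [(ex, rel) for ex in picked[:k_shot]]   # K support examples
        query += [(ex, rel) for ex in picked[k_shot:]]     # the rest are queries
    return support, query
```

Evaluation then averages accuracy over some number of such episodes (e.g. thousands); that count, and the number of queries per episode, are the unknowns here.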

The format of the MultiWOZ dataset

Hi Giovanni,

Nice work, and thanks for sharing. I am reproducing the results of the DST task. However, I found that the data format for MultiWOZ 2.1 produced by the script from https://github.com/jasonwu0731/trade-dst does not match your code. May I ask whether you apply an additional preprocessing step? If so, would you mind sharing the script?

Sincerely,
Yan

Ace2005EventExtraction Dataset

Hi,

I've followed the instructions per section A.5 of the paper using this github repo: https://github.com/nlpcl-lab/ace2005-preprocessing/tree/96c8fd4b5a8c87dd6a265d5c14f4d8b8eb9b7fbe

which gives me train/dev/test.json files for ace2005.

However, looking at tanl/datasets.py ( https://github.com/amazon-research/tanl/blob/2bd8052f0ff6df3b8fd04d7da1469d73f8639099/datasets.py#L1165 ), I cannot find a way to run ACE2005. I currently receive the following error when attempting to train on ace2005:

FileNotFoundError: [Errno 2] No such file or directory: 'data/ace2005event/ace2005event_types.json'

Does anyone have advice on how to obtain the files needed beyond the train/dev/test.json files (e.g. ace2005event_types.json) to train ACE2005 event extraction?

Thanks,
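As a starting point, the missing types file can probably be derived from the preprocessed data itself. The sketch below scans the nlpcl-lab output (its golden-event-mentions / event_type / arguments / role keys) and collects the event types with their observed argument roles; the exact schema datasets.py expects in ace2005event_types.json is undocumented here, so this produces input material, not a drop-in file.

```python
import json
from collections import defaultdict

def collect_event_types(path):
    """Scan a train/dev/test.json file produced by the nlpcl-lab
    ace2005-preprocessing repo and map each event type to the
    argument roles seen for it."""
    with open(path) as f:
        data = json.load(f)
    types = defaultdict(set)
    for item in data:
        for event in item.get("golden-event-mentions", []):
            roles = types[event["event_type"]]
            for arg in event.get("arguments", []):
                roles.add(arg["role"])
    # sort roles for a deterministic, diffable output
    return {etype: sorted(roles) for etype, roles in types.items()}
```

Dumping this dict with json.dump gives a candidate types file; the key names and nesting would still need to be reconciled against what datasets.py reads.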

About data files used for the FewRel dataset

Hi! I'm wondering how to prepare the data files for the FewRel dataset.
Do we use the full train_wiki.json from https://github.com/thunlp/FewRel/tree/master/data as the training split for meta-training, and the full val_wiki.json for evaluation (support & query)? I'm confused because I notice that the fewrel_meta config also specifies do_eval=True; which dev split would the code use in that case?
Would appreciate any guidance on this!

About performance on tacred

Hi,

Thanks for sharing the code. I tried to reproduce the result on TACRED, but the F1 score I get on the test set is only 67.67.

The config I used is listed below.

[tacred]
datasets = tacred
multitask = False
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
train_split = train
per_device_train_batch_size = 16
do_train = True
do_eval = True
do_predict = True

I run the code with

CUDA_VISIBLE_DEVICES=0,1 nohup python3 -m torch.distributed.launch --nproc_per_node=2 run.py tacred > result.log 2>&1 &

May I ask what might be going wrong? Thank you.

Regards,
Yiming
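One thing worth double-checking when comparing TACRED F1 numbers is the scoring convention: the standard TACRED micro-F1 treats no_relation as a negative class and excludes it from the counts. A minimal sketch of that metric, assuming the repo scores the same way:

```python
def tacred_micro_f1(gold, pred, negative="no_relation"):
    """Micro precision/recall/F1 over relation labels, where correct
    `negative` predictions count neither as hits nor as positives."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p and p != negative)
    pred_pos = sum(1 for p in pred if p != negative)   # predicted relations
    gold_pos = sum(1 for g in gold if g != negative)   # gold relations
    precision = correct / pred_pos if pred_pos else 0.0
    recall = correct / gold_pos if gold_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

If the reported number came from a plain accuracy or an all-classes micro-F1, the comparison with the paper would be apples to oranges, independent of the training config.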

ATIS and SNIPS Dataset Source

Hi,

Would you mind leaving some instructions for where you found/preprocessed the ATIS and SNIPS datasets?

I found some .tsv files here for train/dev/test, but the format is not exactly what tanl/datasets.py expects.

Thanks,
