Comments (9)

tom68-ll commented on September 1, 2024

Thank you very much for your response. I will make another attempt.

from cyclenlg.

Edillower commented on September 1, 2024

@ktxlh
I'm not sure which came first or who the original author is, but:
For (2) we used the camel_case_split function: https://github.com/search?q=camel_case_split+webnlg&type=code
For (3) we used the remove_accents function: https://github.com/lizhe2016/RFBFN/blob/9dfcabeb60e6f9583b5691ac48125322c8e354f2/data_preprocess/preprocess.py#L377
Feel free to let me know if you need any further assistance from my side.
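
For readers who cannot reach the linked scripts, steps (2) and (3) can be approximated with the standard library alone. This is a minimal sketch, not the `camel_case_split` or `remove_accents` code from the repositories above; the names `split_camel_and_snake` and `strip_accents` are hypothetical, and the exact splitting rules the original scripts use may differ.

```python
import re
import unicodedata


def split_camel_and_snake(name: str) -> str:
    """Turn a camelCase or snake_case identifier into space-separated words."""
    # snake_case: underscores become spaces.
    text = name.replace("_", " ")
    # camelCase: insert a space at each lowercase/digit -> uppercase boundary.
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Acronym followed by a capitalized word, e.g. "HTTPServer" -> "HTTP Server".
    text = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", text)
    # Collapse any repeated whitespace.
    return " ".join(text.split())


def strip_accents(text: str) -> str:
    """Remove combining accent marks after Unicode decomposition.

    Characters that have no decomposition (for example "ø") are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumption: predicates are lowercased after splitting, as the sample
# inputs in this thread ("is part of", "runway length") suggest.
print(split_camel_and_snake("isPartOf").lower())
print(split_camel_and_snake("Aarhus_Airport"))
print(strip_accents("São Paulo"))
```

Whether to lowercase the result is a separate choice; the sketch leaves case untouched so that subjects such as `Aarhus_Airport` keep their capitalization.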

Edillower commented on September 1, 2024

Hi, you can check section 4.3 for the training parameters. In our paper, we use T5-base as the backbone model, but you can also use any other seq2seq model that is available through Hugging Face.

Edillower commented on September 1, 2024

Feel free to reopen the issue if you run into any further problems.

tom68-ll commented on September 1, 2024

Sorry to bother you again. May I ask whether we need to preprocess the dataset to fit this task, as with WebNLG?

Edillower commented on September 1, 2024

See section 4.1 of the paper for the preprocessing we did. These are conventional steps that previous work on WebNLG and T5 has used, but you can certainly preprocess the data however you prefer, though the results may vary slightly.

ktxlh commented on September 1, 2024

What "previous works" are you referring to? It is fine to just point us to a specific paper...

Edillower commented on September 1, 2024

Hi @ktxlh,
For (1), prefixing a task description string is what is described in the original T5 paper.
For (2), converting camel-cased or snake-cased subjects, predicates, and objects to regular strings, and (3), normalizing accented characters, there's a conversion script that has been reused by multiple previous works on WebNLG. Let me try to find the original GitHub repo and get back to you later.

Thanks,
Zhuoer Wang

Edillower commented on September 1, 2024

@ktxlh Following up on the issue you opened at the Amazon repo: sorry, I wasn't able to release the data/preprocessing script via Amazon, but here are some sample inputs for your reference.

Sample data file (line separated):
Generate in English: [S] Aarhus Airport [P] location [O] Tirstrup [S] Tirstrup [P] is part of [O] Central Denmark Region [S] Tirstrup [P] is part of [O] Denmark
Generate in English: [S] Aarhus Airport [P] operating organisation [O] Aarhus Lufthavn A/S [S] Aarhus Airport [P] runway name [O] 10L/28R [S] Aarhus Airport [P] runway length [O] 2702.0
Generate in English: [S] Aarhus Airport [P] runway length [O] 2776.0 [S] Aarhus Airport [P] operating organisation [O] Aarhus Lufthavn A/S [S] Aarhus Airport [P] runway name [O] 10L/28R
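
The data-side lines above follow a simple pattern: the task prefix, then each triple written as `[S] subject [P] predicate [O] object`. A minimal sketch of producing that layout from already-cleaned triples is shown here; it is not the unreleased script, and `linearize_triples` and `DATA_PREFIX` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Prefix copied from the sample data file in this comment.
DATA_PREFIX = "Generate in English:"


def linearize_triples(triples: Iterable[Triple], prefix: str = DATA_PREFIX) -> str:
    """Join (subject, predicate, object) triples into one prefixed input line."""
    parts = [f"[S] {s} [P] {p} [O] {o}" for s, p, o in triples]
    return f"{prefix} " + " ".join(parts)


sample = [
    ("Aarhus Airport", "location", "Tirstrup"),
    ("Tirstrup", "is part of", "Central Denmark Region"),
    ("Tirstrup", "is part of", "Denmark"),
]
print(linearize_triples(sample))
```

With these three triples the function reproduces the first sample line exactly, which is a quick way to check that your own preprocessing matches the expected input format.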

Sample text file (line separated):
Extract Triplets: 1634 The Ram Rebellion was written in the US, where Native Americans are an ethnic group, Washington DC is the capital and Barack Obama is the leader and President.
Extract Triplets: The demonym for people living in the United States is Americans, African Americans are an ethnic group there and is where 1634 The Ram Rebellion was written. The leader of the United States is known as the President and is currently Barack Obama.
Extract Triplets: The United States are inhabited by Americans and the ethnic group of African Americans. 1634 The Ram Rebellion was written in the country where President Barack Obama is the leader.
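
The text-side file can be built the same way. Because the file is line separated, any newline inside a sentence would split one sample into two, so this sketch collapses whitespace first. The names `format_text_sample`, `to_line_separated`, and `TEXT_PREFIX` are hypothetical, and the whitespace handling is an assumption rather than something confirmed in this thread.

```python
from typing import Iterable

# Prefix copied from the sample text file in this comment.
TEXT_PREFIX = "Extract Triplets:"


def format_text_sample(text: str, prefix: str = TEXT_PREFIX) -> str:
    """Prefix one reference text and force it onto a single line."""
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    single_line = " ".join(text.split())
    return f"{prefix} {single_line}"


def to_line_separated(texts: Iterable[str]) -> str:
    """Render samples as file content with exactly one sample per line."""
    return "".join(format_text_sample(t) + "\n" for t in texts)


content = to_line_separated([
    "The leader of the United States\nis known as the President.",
    "Washington DC is the capital.",
])
print(content, end="")
```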
