cairo's People

Contributors

dan-zeman, florianpfisterer, plumaj, robasile

cairo's Issues

brat2conllu.pl Does Not Output Dependency Relations

We have used brat's universal dependencies configuration (slightly adjusted with some renamed/added dependency relations) in brat v1.3 to annotate dependency relations and POS-tags in a small dataset.

After acquiring the brat-standoff-formatted .ann files, I ran the brat2conllu.pl script on them, but the resulting CoNLL-U formatted data lacks dependency structure and dependency relation labels.

Example

Plain Text (example.txt)

in a bowl .

Brat Standoff Annotation (example.ann)

T1.1	ADP 0 2	in
R1.1-1	case Arg1:T1.3 Arg2:T1.1
#1.1	AnnotatorNotes T1.1	LEMMA=in POSTAG=IN
T1.2	DET 3 4	a
A1.2-1	Definite T1.2 Ind
A1.2-2	PronType T1.2 Art
R1.2-1	det Arg1:T1.3 Arg2:T1.2
#1.2	AnnotatorNotes T1.2	LEMMA=a POSTAG=DT
T1.3	NOUN 5 9	bowl
A1.3-1	Number T1.3 Sing
#1.3	AnnotatorNotes T1.3	LEMMA=bowl POSTAG=NN
T1.4	PUNCT 10 11	.
R1.4-1	punct Arg1:T1.3 Arg2:T1.4
#1.4	AnnotatorNotes T1.4	LEMMA=. POSTAG=.

brat2conllu.pl Output (example.conllu)

Running perl brat2conllu.pl example.txt example.ann > example.conllu yields:

# sent_id = s1
# text = in a bowl .
1	in	_	ADP	_	_	0	_	_	Offset=0-2
2	a	_	DET	_	Definite=Ind|PronType=Art	0	_	_	Offset=3-4
3	bowl	_	NOUN	_	Number=Sing	0	_	_	Offset=5-9
4	.	_	PUNCT	_	_	0	_	_	Offset=10-11

  • Expected: the CoNLL-U output contains the dependency structure (HEAD column) and the dependency relation labels (DEPREL column) from the annotation.
  • Actual: every word has 0 as its head (i.e. all words are roots), and the dependency labels are left out entirely (the DEPREL column is `_`).

Question: what are we doing wrong? Does the script expect another version of the brat standoff format?
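
For reference, here is a minimal Python sketch of the mapping the expected output implies. It is not the logic of brat2conllu.pl; it only illustrates how the `R` lines in example.ann could be turned into HEAD and DEPREL values. It rests on several assumptions: `Arg1` is the head and `Arg2` the dependent (consistent with `case Arg1:T1.3 Arg2:T1.1` above), tokens are ordered by start offset, every span is contiguous, and a token with no incoming relation is treated as the root. The function names `parse_ann` and `conllu_rows` are hypothetical.

```python
# Sketch only: derive HEAD/DEPREL from brat standoff relations.
# Assumes Arg1 = head, Arg2 = dependent, and contiguous token spans.

EXAMPLE_ANN = (
    "T1.1\tADP 0 2\tin\n"
    "R1.1-1\tcase Arg1:T1.3 Arg2:T1.1\n"
    "#1.1\tAnnotatorNotes T1.1\tLEMMA=in POSTAG=IN\n"
    "T1.2\tDET 3 4\ta\n"
    "A1.2-1\tDefinite T1.2 Ind\n"
    "R1.2-1\tdet Arg1:T1.3 Arg2:T1.2\n"
    "T1.3\tNOUN 5 9\tbowl\n"
    "T1.4\tPUNCT 10 11\t.\n"
    "R1.4-1\tpunct Arg1:T1.3 Arg2:T1.4\n"
)


def parse_ann(ann_text):
    """Collect text-bound annotations (T lines) and relations (R lines)."""
    tokens = {}      # annotation id -> (start offset, UPOS, word form)
    relations = []   # (label, head id, dependent id)
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            upos, start, _end = fields[1].split()
            tokens[ann_id] = (int(start), upos, fields[2])
        elif ann_id.startswith("R"):
            label, arg1, arg2 = fields[1].split()
            head_id = arg1.split(":", 1)[1]
            dep_id = arg2.split(":", 1)[1]
            relations.append((label, head_id, dep_id))
        # A lines (attributes) and # lines (notes) are ignored in this sketch.
    return tokens, relations


def conllu_rows(ann_text):
    """Return (ID, FORM, UPOS, HEAD, DEPREL) tuples in sentence order."""
    tokens, relations = parse_ann(ann_text)
    ordered = sorted(tokens, key=lambda tid: tokens[tid][0])
    index = {tid: i for i, tid in enumerate(ordered, start=1)}
    # Assumption: a token that is never a dependent is the root.
    attachment = {tid: (0, "root") for tid in ordered}
    for label, head_id, dep_id in relations:
        attachment[dep_id] = (index[head_id], label)
    return [
        (index[tid], tokens[tid][2], tokens[tid][1], *attachment[tid])
        for tid in ordered
    ]


for row in conllu_rows(EXAMPLE_ANN):
    print("\t".join(str(value) for value in row))
```

Under these assumptions, "in", "a" and "." all attach to token 3 ("bowl") with the labels `case`, `det` and `punct`, and "bowl" becomes the root. That is the structure we expected to see in example.conllu.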

Do not have .brat file for brat2conllu.pl

Hi.

The documentation for brat2conllu.pl uses ru.txt and ru.brat as example input. From brat I can only export ".ann" files, and I do not have any .brat files. Are those the same? I renamed my .ann files to .brat and it seems to work.
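
In case it helps others, the renaming step can be sketched in Python as below. This is only an illustration: whether brat2conllu.pl actually requires the .brat extension is not confirmed here, and the helper name `copy_ann_as_brat` is hypothetical. The sketch copies rather than renames, so the original .ann files stay in place for brat.

```python
import shutil
from pathlib import Path


def copy_ann_as_brat(directory):
    """Copy every *.ann file in `directory` to a sibling *.brat file.

    Returns the list of newly written .brat paths.
    """
    created = []
    for ann_path in sorted(Path(directory).glob("*.ann")):
        brat_path = ann_path.with_suffix(".brat")
        shutil.copyfile(ann_path, brat_path)  # keep the original .ann
        created.append(brat_path)
    return created
```

For example, `copy_ann_as_brat("data")` would write `data/example.brat` next to `data/example.ann`, where `data` is a placeholder directory name.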

Thanks!

License

Hi,

What license are the sentences released under? I wanted to add some of them to Tatoeba, but can only do so if they are released with a compatible license such as CC BY or CC0.

CC0 would be ideal, as it's compatible with the terms of the Mozilla Common Voice project.
