universaldependencies / cairo
Cairo CICLing Corpus – a multi-lingual parallel UD-style treebank of short sentences
We have used brat's universal dependencies configuration (slightly adjusted with some renamed/added dependency relations) in brat v1.3 to annotate dependency relations and POS-tags in a small dataset.
After acquiring the brat-standoff-formatted .ann files, I ran the brat2conllu.pl script on them, but the resulting CoNLL-U formatted data lacks dependency structure and dependency relation labels.
example.txt
in a bowl .
example.ann
T1.1 ADP 0 2 in
R1.1-1 case Arg1:T1.3 Arg2:T1.1
#1.1 AnnotatorNotes T1.1 LEMMA=in POSTAG=IN
T1.2 DET 3 4 a
A1.2-1 Definite T1.2 Ind
A1.2-2 PronType T1.2 Art
R1.2-1 det Arg1:T1.3 Arg2:T1.2
#1.2 AnnotatorNotes T1.2 LEMMA=a POSTAG=DT
T1.3 NOUN 5 9 bowl
A1.3-1 Number T1.3 Sing
#1.3 AnnotatorNotes T1.3 LEMMA=bowl POSTAG=NN
T1.4 PUNCT 10 11 .
R1.4-1 punct Arg1:T1.3 Arg2:T1.4
#1.4 AnnotatorNotes T1.4 LEMMA=. POSTAG=.
brat2conllu.pl output (example.conllu)
Running perl brat2conllu.pl example.txt example.ann > example.conllu yields:
# sent_id = s1
# text = in a bowl .
1 in _ ADP _ _ 0 _ _ Offset=0-2
2 a _ DET _ Definite=Ind|PronType=Art 0 _ _ Offset=3-4
3 bowl _ NOUN _ Number=Sing 0 _ _ Offset=5-9
4 . _ PUNCT _ _ 0 _ _ Offset=10-11
Question: what are we doing wrong? Does the script expect another version of the brat standoff format?
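For comparison, here is a minimal Python sketch of the HEAD and DEPREL values one might expect from the R lines in example.ann. It is not the logic of brat2conllu.pl. It assumes that Arg1 is the head and Arg2 the dependent, that token IDs follow text order, and that a token with no incoming relation is the root. The function names parse_ann and heads_and_deprels are hypothetical.

```python
def parse_ann(ann_text):
    """Collect tokens (T lines) and relations (R lines) from brat standoff text."""
    tokens = {}     # brat id -> (start offset, form, upos)
    relations = {}  # dependent brat id -> (head brat id, relation label)
    for line in ann_text.splitlines():
        parts = line.split()  # works for tab- or space-separated fields
        if not parts:
            continue
        if parts[0].startswith("T"):
            # e.g. "T1.1 ADP 0 2 in"
            tid, upos, start = parts[0], parts[1], parts[2]
            form = " ".join(parts[4:])
            tokens[tid] = (int(start), form, upos)
        elif parts[0].startswith("R"):
            # e.g. "R1.1-1 case Arg1:T1.3 Arg2:T1.1"
            # Assumption: Arg1 is the head, Arg2 is the dependent.
            label = parts[1]
            head = parts[2].split(":", 1)[1]
            dep = parts[3].split(":", 1)[1]
            relations[dep] = (head, label)
        # A lines (features) and # lines (notes) are ignored in this sketch.
    return tokens, relations


def heads_and_deprels(ann_text):
    """Return (id, form, upos, head, deprel) tuples in text order."""
    tokens, relations = parse_ann(ann_text)
    ordered = sorted(tokens, key=lambda tid: tokens[tid][0])
    index = {tid: i for i, tid in enumerate(ordered, start=1)}
    rows = []
    for tid in ordered:
        _, form, upos = tokens[tid]
        if tid in relations:
            head, label = relations[tid]
            rows.append((index[tid], form, upos, index[head], label))
        else:
            # Assumption: a token without an incoming relation is the root.
            rows.append((index[tid], form, upos, 0, "root"))
    return rows
```

Under these assumptions, "in", "a" and "." would all get head 3 (bowl) with the labels case, det and punct, which is the information missing from the output above.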
Hi.
The documentation for brat2conllu.pl uses ru.txt and ru.brat as example input. From brat I can only export .ann files; I do not have any .brat files. Are those the same? I renamed my .ann files to .brat and it seems to work.
Thanks!
Hi,
What license are the sentences released under? I wanted to add some of them to Tatoeba, but can only do so if they are released with a compatible license such as CC BY or CC0.
CC0 would be ideal, as it's compatible with the terms of the Mozilla Common Voice project.