maslinych / daba

Pattern-based morphemic analysis toolkit

License: Other

Python 100.00%

daba's People

mompolice eldams vieenrose israaar s7d11

daba's Issues

Last entry

It seems that the parser does not see the last entry of each dictionary. So, in Bamadaba, it does not see the entry zùwɛn (therefore, all zùwɛn in the parsed texts remain non-analyzed). In yorow.txt, it does not see Zonba, etc.
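
One common cause of this symptom, offered only as a guess because the issue does not say where the entry is lost, is a record reader that emits an entry when it meets the start of the next one and never flushes the final entry at end of file. A minimal sketch, assuming a dictionary file in which every entry starts with a `\lx` line; the function name `read_entries` is hypothetical and is not daba's API:

```python
def read_entries(lines, marker="\\lx"):
    """Group lines into entries; each entry starts with a marker line."""
    entries = []
    current = []
    for line in lines:
        line = line.rstrip("\n")
        # A new marker closes the entry collected so far.
        if line.startswith(marker) and current:
            entries.append(current)
            current = []
        if line.strip():
            current.append(line)
    # Without this final flush the last entry of the file is silently dropped.
    if current:
        entries.append(current)
    return entries


# Placeholder data: the first entry is invented, the second is the one from the report.
sample = ["\\lx a\n", "\\ps n\n", "\n", "\\lx zùwɛn\n", "\\ps n\n"]
entries = read_entries(sample)
```

If the real reader lacks the equivalent of the final `if current:` block, the last entry of every dictionary would be lost in exactly the way described.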

bug in a parser: a conglomerate

The Parser does not recognize don-ka-bo, although this word is in Bamadaba:

dòn-kà-bɔ́:n:agitation [dòn:v:entrer kà:pm:INF bɔ́:v:sortir]

Parsing of tonal texts

Concerning the parsing of Dumestre's tonal texts:
For monosyllabic verbs (kɛ́, bɔ́), the suffixes -ra/-la/-na and -len/-nen are not glossed correctly (because of the tonal diacritics?).
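
If the tonal diacritics are indeed the cause, one possible remedy is to compare forms with the tone marks removed. The sketch below is an illustration under that assumption, not daba's actual matching code; the names `strip_tones` and `has_suffix` and the set of tone marks are hypothetical:

```python
import unicodedata

# Combining marks assumed to encode tone: grave, acute, circumflex, caron.
TONE_MARKS = {"\u0300", "\u0301", "\u0302", "\u030c"}


def strip_tones(word):
    """Remove tonal diacritics but keep base letters such as ɛ and ɔ."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def has_suffix(word, suffix):
    """Check for a suffix with tones ignored on both the word and the suffix."""
    return strip_tones(word).endswith(strip_tones(suffix))
```

With this normalization a toned stem followed by -ra and its untoned spelling compare as equal, so a suffix rule written without tones would still apply to a tonal text.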

do not treat parens and quotes as sentence boundaries

Do not treat parentheses and quotation marks as sentence boundaries.

That would indeed be good: why should they be sentence boundaries at all?
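
A rough sketch of the requested behavior, assuming sentences are delimited by ., ! or ? only. The regular expression and the function name `split_sentences` are illustrative and are not taken from daba's tokenizer:

```python
import re

# A sentence ends at ., ! or ?, optionally followed by closing quotes or brackets.
# Parentheses and quotation marks on their own never end a sentence.
SENTENCE = re.compile(r'.+?(?:[.!?]+["»”)\]]*(?=\s|$)|$)', re.DOTALL)


def split_sentences(text):
    """Return the sentences of `text`, trimmed of surrounding whitespace."""
    return [m.group().strip() for m in SENTENCE.finditer(text) if m.group().strip()]
```

For example, `split_sentences('He came (late) to the "big" house. She left.')` yields two sentences, and the parenthesized and quoted words stay inside the first one.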

splitting of two authors

If a text has two authors (or more), they should be entered into the authors' database as separate entries (so far, they are stored in one entry).

siginɔnɔkɛnɛ parsing

Kirill, the parser also failed on this form:

siginɔnɔkɛnɛ

It should be segmented as follows:

sìgi:n:buffle nɔ́nɔ:n:lait kɛ́nɛ:adj:sain

But the parser does not offer the correct segmentation; it insists on splitting nɔnɔ into two parts.
Strictly speaking, the correct spelling is siginɔnɔ kɛnɛ (kɛnɛ is an adjective and should be written separately), i.e. the source text is misspelled. Could that be the reason?

Search, CQL, corbama-net-tonal

In fact, a CQL search in corbama-net-tonal finds only the occurrences where tones were marked in the original texts, i.e., a slim minority of all occurrences.
I think this type of search should be run not on the "original" line but on the disambiguated line. Otherwise, the search makes no sense.

gdisamb: possibility to select fonts

Add the ability to select the font size and family for all the crucial
fonts in the disambiguation interface.

gdisamb: show name of the opened file

One more desirable improvement: while working on a file in the disambiguation program, it would be good to have its name displayed somewhere (in the bottom or top frame, for example). Otherwise one sometimes forgets which file is open, and the only way to check is through Save As.

gdisamb: clear diagnostics when trying to open an unparsed text

People often open .html texts that have meta markup but have not been run through the parser,
and gdisamb crashes with an unintelligible diagnostic. Make the diagnostic clear.

parser: support for language switching

Make it possible to switch between languages in the parser. Better still, provide variants with preloaded dictionaries and grammar for each language (so that they do not have to be reloaded).

sɛmɛkala parsing

We have the word sɛmɛkala, which should be segmented as sɛ̀mɛ:n:hache-houe + kàla:n:tige. The parser offers various other (finer-grained) segmentations, but not this one. Is this a bug, or is it expected behavior, with such cases always meant to be finished by hand?
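
To make the expected result concrete, here is a small sketch that enumerates all ways of splitting a form into dictionary stems, ignoring tones, and lists the coarsest segmentations first. It only illustrates the desired output; it is not how daba's pattern-based parser works, and the names `segmentations` and `lexicon` are hypothetical:

```python
import unicodedata


def strip_tones(word):
    """Drop all combining marks so toned and untoned spellings compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def segmentations(form, lexicon):
    """Return every split of `form` into lexicon stems, fewest parts first."""
    # Map the toneless stem to its full "form:ps:gloss" entry
    # (homographs would overwrite each other in this simplified version).
    keys = {strip_tones(entry.split(":")[0]): entry for entry in lexicon}
    results = []

    def walk(rest, parts):
        if not rest:
            results.append(parts)
            return
        # Try the longest candidate stem first.
        for end in range(len(rest), 0, -1):
            entry = keys.get(rest[:end])
            if entry is not None:
                walk(rest[end:], parts + [entry])

    walk(strip_tones(form), [])
    return sorted(results, key=len)


lexicon = ["sɛ̀mɛ:n:hache-houe", "kàla:n:tige"]
```

With only these two entries in the lexicon, `segmentations("sɛmɛkala", lexicon)` returns the single two-stem analysis the issue asks for.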

parser progress indicator

Would it be hard to add some kind of indicator to the parser interface
that shows the progress of parsing? For instance, a dot that lights up
to signal that parsing is running and one should wait. Better still, it could
show the percentage of work done (so that it is clear how much longer to wait).
At the moment it is a complete black box: it is unclear whether the work is still
running or has already finished... (This is, of course, not the most pressing
matter; the main thing is that the parser works at all.)
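
A minimal sketch of how a percentage could be reported, assuming the parser processes a list of sentences one by one. The functions `parse_with_progress` and `print_percent` are hypothetical, and a GUI would update a gauge widget in the callback instead of writing to the terminal:

```python
import sys


def parse_with_progress(sentences, parse_one, report=None):
    """Parse each sentence and call report(done, total) after every one."""
    total = len(sentences)
    results = []
    for done, sentence in enumerate(sentences, start=1):
        results.append(parse_one(sentence))
        if report is not None:
            report(done, total)
    return results


def print_percent(done, total, stream=sys.stderr):
    """Rewrite a single status line with the share of work completed."""
    stream.write("\rparsing: {:3d}%".format(100 * done // total))
    stream.flush()
```

Keeping the reporting in a callback means the parsing loop does not need to know whether it runs in a GUI or on the command line.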

faamatɔ parsing

Jean Jacques complains that the parser glosses the word faamatɔ poorly: here we have -tɔ:mrph:ST, but for some reason the parser does not offer this variant.

Off-line version of the Bamana Reference corpus

gdisamb: crash on JoinTokens

gdisamb crashes on the Join tokens operation (Linux).
We need to make "be ka" and similar sequences a single token with a space inside it,
and to ensure that such tokens also work later in the corpus.

possibility to add comments to glosses added locally in gdisamb

Add a comment field to glosses in localdict.

font size in gdisamb gloss editor

Make the font larger or add a font selection widget.
Tone marks are too small and not clearly visible.

Metaeditor: a minor dysfunction

In the Metaeditor, when in the option "Author", key combinations do not work. I.e., "file - Open" (etc.) can be accessed only by clicking.

Types of texts

It would be good if, when the option "Text type" is activated, one could select SEVERAL individual texts (i.e., compose an individual subcorpus). At present, this is impossible.

Makefile extension

Hi Kirill,
Is it possible to extend the makefile to compile the project on a Linux workstation?
Cheers

Multiple authors

In Metaeditor, if a file has multiple authors, the authors are not stored in the Authors' Database separately. Instead, the set of all the authors of one text is stored as one entry. So, if text A has one author (e.g., Amadu Ture), and text B has two authors (e.g., Amadu Ture & Mamadu Sisoko), Amadu Ture of text B is not identified with Amadu Ture of text A; instead, a new entry, "Amadu Ture & Mamadu Sisoko", is created.
The mechanism needs to be modified so that multi-author entries are split into individual author entries.
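
The intended behavior can be sketched as follows, assuming that co-authors are separated by "&" as in the example and that the database is keyed by author name. Both assumptions, and the names `split_authors` and `register_authors`, are illustrative and do not describe the Metaeditor's real storage format:

```python
import re


def split_authors(field):
    """Split a combined author field such as 'A & B' into individual names."""
    parts = re.split(r"\s*&\s*", field)
    return [p.strip() for p in parts if p.strip()]


def register_authors(database, field):
    """Store each author once, keyed by name; return the keys used by this text."""
    keys = []
    for name in split_authors(field):
        # setdefault leaves an existing author record untouched.
        database.setdefault(name, {"name": name})
        keys.append(name)
    return keys


authors_db = {}
text_a = register_authors(authors_db, "Amadu Ture")
text_b = register_authors(authors_db, "Amadu Ture & Mamadu Sisoko")
```

After both calls the database holds two entries, and text B refers to the same "Amadu Ture" record as text A.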

very long sentences make disambiguation impossible

At present we use the <br/> tag to mark line boundaries in verse, tables and lists. The parser, however, does not treat these tags as sentence boundaries (apparently with good reason), so a fairly large number of lines separated by this tag sometimes form one huge block that fills the whole screen in the disambiguator and can be neither scrolled nor moved off the screen. In other words, such chunks cannot be processed at all.
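
One conceivable workaround is to break only overlong blocks at the <br/> tags for display purposes, leaving ordinary sentences alone. The sketch below assumes the sentence text still contains the literal tags; the function name `split_on_br` and the length threshold are hypothetical:

```python
import re

# Matches <br/>, <br /> and <br>.
BR = re.compile(r"<br\s*/?>")


def split_on_br(sentence, max_len=200):
    """Break an overlong sentence into display chunks at <br/> tags."""
    if len(sentence) <= max_len:
        return [sentence]
    return [part.strip() for part in BR.split(sentence) if part.strip()]
```

Sentences shorter than the threshold are returned unchanged, so verse lines are only separated when the block would otherwise be too large to display.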

gdisamb: interface to add and delete tokens

Need interface to add and delete tokens, including sentence boundaries.

metaeditor: support for language switching

Make it possible to switch between different meta markup systems
(at program startup). This is needed for meta markup of Bamana and Maninka, which use different meta.xml and authors.xml files.

tèmènen, tinyènen parsing

For some reason the forms tèmènen, tinyènen (in the new orthography: tɛmɛnen, tiɲɛnen) are not analyzed correctly by the parser (-nen:mrph:PTCP.RES is not identified).

parsing of suffix combination -len-ba

When the suffixes -len:mrph:PTCP.RES and -ba:mrph:AUGM are combined, the parser does not recognize the form (probably because -ba is not supposed to combine with participles; in reality, however, they can combine). Example:
ɲágalilenba:ptcp: [ɲágali:v:être.content len:mrph:PTCP.RES ba:mrph:AUGM]

parser: the add dictionary button is inaccessible

If many dictionaries are loaded, the list becomes long and no longer fits in the panel, so the button for adding a dictionary does not fit either and is not visible.

Authors' database for the metaeditor

Now, if a text has two authors (or more), their data is automatically entered into the database as follows:

This entry needs to be split and each author represented separately.

parser: turn apostrophe converter plugin on by default

People sometimes forget to switch it on. Since it does no harm and only good, it should be switched on by default.

gdisamb GUI enhancements (jjmeric)

always keep order of ps variants

Make Gloss.ps field internally an ordered set or somehow keep the order in all processing.
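
A small sketch of one way to keep the order, assuming the field is at present an unordered `set` (the issue suggests so but does not show the code). The helper names `ordered_unique` and `merge_ps` are hypothetical:

```python
def ordered_unique(tags):
    """Deduplicate part-of-speech tags while keeping first-seen order."""
    # dict preserves insertion order, so its keys behave like an ordered set.
    return tuple(dict.fromkeys(tags))


def merge_ps(first, second):
    """Union of two ps sequences, with the tags of `first` kept in front."""
    return ordered_unique(tuple(first) + tuple(second))
```

Storing the result as a tuple keeps it hashable, which matters if glosses are used as dictionary keys or set members elsewhere in the code.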

bug: kalifabaa

In the word kalifabaa, gparser has not recognized -baa/baga AG.OCC

possibility to move sentence boundaries in gdisamb

Ability to change sentence boundaries during disambiguation.

gdisamb: wrong placement of diacritics

When a French gloss is typed by hand with the French keyboard layout switched on, the accents jump one letter to the right. For example, typing complètement produces this:
completement̀
(the accent ends up in the rightmost position).
The same happens for Jean Jacques.
Can you reproduce this bug?
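
The root cause is not identified in the report. As a diagnostic aid only, the sketch below shows how to check where a combining accent actually sits in a string and how Unicode NFC normalization folds a base letter and its accent into one character, which removes any free-floating mark that could drift. The function names are hypothetical:

```python
import unicodedata


def describe(text):
    """List (character, is_combining) pairs to show where accents really sit."""
    return [(ch, unicodedata.combining(ch) != 0) for ch in text]


def compose(text):
    """Fold base letter + combining accent into one precomposed character."""
    return unicodedata.normalize("NFC", text)


typed = "comple\u0300tement"    # e followed by a combining grave accent
broken = "completement\u0300"   # the accent has drifted to the end, as in the report
```

Normalizing `typed` yields the precomposed è, whereas `broken` stays unchanged because the accent is already attached to the final t; `describe` makes that difference visible.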

GlossInputDialog: dropdown list of previously entered glosses

Make gloss entry field a dropdown combo: so that one may choose either one of the previously entered glosses or edit gloss manually as usual.

Editing of a word analysis in gdisamb

When clicking on a word (for editing of the word analysis), it is not the current analysis (to be modified) that appears, but a semi-empty variant.
E.g.: for the word yaalala, I've chosen an analysis yáala:v:se.promener la:mrph:AG.PRM. After that I click on the word, and in the editing interface, instead of yaalala:n: [yáala:v:se.promener la:mrph:AG.PRM], I find the following: yaalala::
By the way, a mistake in the Grammar file: in the proposed analysis, a verbal stem + la:mrph:AG.PRM should produce a noun, i.e. yaalala:n: [yáala:v:se.promener la:mrph:AG.PRM], while the Parser produces a "verb" instead: yaalala:v: [yáala:v:se.promener la:mrph:AG.PRM]