
giella-shared's Introduction

giella-shared now ARCHIVED

Cf. this issue. The repository will be deleted after a while; all content and history should be preserved in the new repositories.

giella-shared

Shared linguistic resources, such as names, digits, FST filtering and dependency parsing.

This repo is required by all lang repos, and will be cloned automatically when running ./autogen.sh in a lang repo for the first time.

giella-shared's People

Contributors

albbas, duomdaamaendra, flammie, ilm024, leneantonsen, reynoldsnlp, rueter, samama74, siripaivio, snomos, trondtr, trondtynnol

giella-shared's Issues

\ symbol either misses its dependency node or gets a ghost analysis

Topic: Bug in the analysis of our corpus texts for the forthcoming SIKOR update.

Problem: The \ symbol either misses its dependency node or gets a ghost analysis. The dependency node is the final #n->m tag of each analysis line, and the ghost analysis is the analysis of the non-existing (= empty) word form "<>".

To reproduce: Run the following pipeline with one of two FSTs. The analysis pipeline is the same in both cases (standing in lang-xxx):

echo "ja \ ja" | hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g src/cg3/disambiguator.cg3 |vislcg3 -g src/cg3/functions.cg3 | vislcg3 -g src/cg3/dependency.cg3

Option (a): Run the command above with \ as lemma, where the file src/fst/generated-files/symbols.lexc has the following entry for backslash:

\   Noun_symbols_never_inflected       ;

(This is the case today.) The analysis is:

"<ja>"
	"ja" CC <W:0.0> @X #1->0
: 
"<\>"
	"\" N Symbol <W:0.0>
: 
"<ja>"
	"ja" CC <W:0.0> @X #3->0
:\n

What is missing is the dependency node on the \ reading. Compare the rightmost tags #1->0 and #3->0 (sic) on the two other words: the numbering skips 2, but the \ reading itself carries no #n->m tag.
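To spot this kind of gap across a larger corpus run, the output could be scanned for readings that lack a trailing #n->m tag. The following is only a minimal sketch: the function name readings_without_dep is hypothetical, and it assumes the stream format shown above (cohort headers starting with "<, readings indented on the lines below). The sample is copied from the option (a) output.

```python
from __future__ import annotations

import re

# A dependency tag such as "#3->0" at the very end of a reading line.
DEP_TAG = re.compile(r"#\d+->\d+\s*$")


def readings_without_dep(cg_output: str) -> list[tuple[str, str]]:
    """Return (wordform, reading) pairs whose reading has no #n->m tag.

    Hypothetical helper; assumes cohort headers start with '"<' and
    readings are indented lines that begin with a quoted lemma.
    """
    missing = []
    wordform = None
    for line in cg_output.splitlines():
        if line.startswith('"<'):
            # Cohort header, e.g. "<ja>"
            wordform = line.strip()
        elif line[:1] in ("\t", " ") and line.strip().startswith('"'):
            # Reading line: check for the trailing dependency tag.
            if not DEP_TAG.search(line):
                missing.append((wordform, line.strip()))
    return missing


# Sample copied from the option (a) output; backslashes are escaped for Python.
sample_a = "\n".join([
    '"<ja>"',
    '\t"ja" CC <W:0.0> @X #1->0',
    ': ',
    '"<\\>"',
    '\t"\\" N Symbol <W:0.0>',
    ': ',
    '"<ja>"',
    '\t"ja" CC <W:0.0> @X #3->0',
    ':\\n',
])

for wordform, reading in readings_without_dep(sample_a):
    print(wordform, "->", reading)
```

In practice the text would come from the pipeline above (for instance piped in on standard input) rather than from a hard-coded sample.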

Option (b): In the symbols file, set backslash (or some other word) as lemma and \ as stem. The entry is then:

backslash:\  Noun_symbols_never_inflected       ;

Now recompile, run the same command, and the analysis is:

"<ja>"
	"ja" CC <W:0.0> @CVP #1->0
: 
"<\>"
	"backslash" N Symbol <W:0.0> @X #2->0
"<>"
	"'" PUNCT <W:0.0> #3->4
: 
"<ja>"
	"ja" CC <W:0.0> @CVP #4->0
:\n
"<>"
	"'" PUNCT <W:0.0> #5->5

Note that the dependency node is now in place. Good. But the downside is the ghost analysis of "<>", which we do not want.
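Ghost cohorts like these could likewise be located mechanically, by looking for cohort headers with an empty word form. This is a sketch under the same assumptions about the stream format; the function name ghost_cohorts is hypothetical, and the sample is copied from the option (b) output.

```python
from __future__ import annotations


def ghost_cohorts(cg_output: str) -> list[tuple[int, list[str]]]:
    """Return (line number, readings) for each cohort with the empty
    wordform "<>". Line numbers are 1-based. Hypothetical helper."""
    ghosts = []
    current = None
    for line_no, line in enumerate(cg_output.splitlines(), start=1):
        if line.startswith('"<'):
            # A new cohort starts; remember it only if its wordform is empty.
            current = None
            if line.strip() == '"<>"':
                current = (line_no, [])
                ghosts.append(current)
        elif current is not None and line.startswith(("\t", " ")) and line.strip():
            # Indented reading that belongs to the current ghost cohort.
            current[1].append(line.strip())
    return ghosts


# Sample copied from the option (b) output; backslashes are escaped for Python.
sample_b = "\n".join([
    '"<ja>"',
    '\t"ja" CC <W:0.0> @CVP #1->0',
    ': ',
    '"<\\>"',
    '\t"backslash" N Symbol <W:0.0> @X #2->0',
    '"<>"',
    '\t"\'" PUNCT <W:0.0> #3->4',
    ': ',
    '"<ja>"',
    '\t"ja" CC <W:0.0> @CVP #4->0',
    ':\\n',
    '"<>"',
    '\t"\'" PUNCT <W:0.0> #5->5',
])

for line_no, readings in ghost_cohorts(sample_b):
    print(line_no, readings)
```

The sketch only reports the ghost cohorts. Simply deleting them would leave the #n->m numbering of the remaining readings pointing at cohorts that no longer exist, so it is not a fix in itself.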

Neither option is optimal, but the former is worse: with (a), the dependency node is missing on the \ reading, and the analysis of our corpus stops. With (b), we get the dependency node, as can be seen, but also an empty "<>" cohort (with its own dependency node) as an unwanted passenger.

So (b) does not give exactly what we want, but (a) gives us no usable analysis at all. The best solution would be to have the dependency analysis and no ghost analysis.

Discuss, document and enforce version numbering routines

When building the mobile speller for divvun.org, the compilation fails

./configure --without-forrest --enable-spellers --disable-hfst-desktop-spellers --prefix=$HOME/.local --enable-hfst-mobile-speller

/usr/bin/hfst-lexc: The file lexicon.tmp.lexc did not compile cleanly.
