lingua-stem-unine-pm5's People

Contributors

kstarsinic, patch

Forkers

wollmers

lingua-stem-unine-pm5's Issues

add Indo-Aryan languages

UniNE provides stemming algorithms for Bengali, Hindi, and Marathi. None of these languages currently has a stemmer on CPAN, so let's make them a higher priority.

fix uninitialized value $language bug

There are many failing CPAN Testers reports for v0.05:
http://www.cpantesters.org/distro/L/Lingua-Stem-UniNE.html?version=0.05

Use of uninitialized value $language:: in concatenation (.) or string at /home/cpan/pit/rel/conf/perl-5.16.2/.cpanplus/5.16.2/build/Lingua-Stem-UniNE-0.05/blib/lib/Lingua/Stem/UniNE.pm line 51.
# Looks like you planned 30 tests but ran 9.
# Looks like your test exited with 2 just after 9.
t/01-stemmer.t .. 
Dubious, test returned 2 (wstat 512, 0x200)
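
The variable name in the warning (`$language::`) suggests one possible cause, though line 51 of the module is not shown here, so this is an assumption: when a scalar is interpolated directly before `::` in a double-quoted string, Perl treats the `::` as part of a package-qualified variable name. A minimal sketch of the pitfall and the usual brace fix (the package string and variable names are illustrative only):

```perl
use strict;
use warnings;

my $language = 'bg';

# Collect warnings so the effect can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Pitfall: Perl reads "$language::" as the package variable $language::,
# not as the lexical $language followed by a literal "::".
my $broken = "Lingua::Stem::UniNE::$language::" . 'stem';

# Fix: braces delimit the variable name explicitly.
my $fixed = "Lingua::Stem::UniNE::${language}::stem";

print "broken: $broken\n";
print "fixed:  $fixed\n";
print "warnings caught: ", scalar(@warnings), "\n";
```

If this is indeed the cause, writing `${language}` wherever the variable is followed by `::` should silence the warning.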

investigate Perl v5.19 bug with non-Latin scripts

In all tested development releases of Perl v5.19 (v5.19.5, v5.19.6, v5.19.7, v5.19.8), every test that uses a non-Latin script (Arabic, Cyrillic) fails. Non-ASCII characters within the Latin script are fine. This may be a Perl bug in these development versions, in which case it needs to be reported.

http://matrix.cpantesters.org/?dist=Lingua-Stem-UniNE+0.06

Example:

#   Failed test 'remove article: -ия'
#   at t/bg.t line 16.
#          got: 'истори'
#     expected: 'истор'

add option to disable language-based diacritic stripping

Some of the stemming algorithms will strip specific diacritical marks from the entire word. This type of word normalization in addition to stemming isn't always desired. Let's add an object attribute to optionally disable it.

For example, the to-be-implemented German stemmer replaces ä with a, ö with o, and ü with u.
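
A minimal sketch of how such a switch could behave, using the German replacements mentioned above. The function name `normalize_de` and the option name `strip_diacritics` are hypothetical and not part of the module's API; in the module this would presumably be an object attribute rather than a function argument.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: strips German umlauts unless the caller opts out.
sub normalize_de {
    my ($word, %opt) = @_;

    # Stripping is on by default, matching the current behaviour described above.
    my $strip = exists $opt{strip_diacritics} ? $opt{strip_diacritics} : 1;

    $word =~ tr/äöü/aou/ if $strip;

    return $word;
}

binmode STDOUT, ':encoding(UTF-8)';
print normalize_de('müde'), "\n";                        # default: stripped
print normalize_de('müde', strip_diacritics => 0), "\n"; # left untouched
```

Defaulting to the existing behaviour keeps current callers unaffected while letting others opt out.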

add Arabic language

The only Arabic stemmer currently on CPAN is Lingua::AR::Word, which is of questionable quality. Here's a comment from Lingua::AR::Word::Stem:

#let's strip down every prefix and suffix I'm aware of.

The UniNE Arabic stemmer would be a welcome addition to CPAN, along with the Persian stemmer we already include. I'd make this the next priority after the Indo-Aryan languages (issue #3).

investigate undefined subroutine utf8::se bug

A single CPAN Testers report shows the following error.

# Lingua::Stem::UniNE v0.06, Moo v1.004002, Perl v5.14.3 (C:\strawberry5143\perl\bin\perl.exe)
t\00-load.t ..... ok
Undefined subroutine utf8::se called at C:\strawberry5143\cpan\build\Lingua-Stem-UniNE-0.06-KyTxMK\blib\lib/Lingua/Stem/UniNE.pm line 3.
Compilation failed in require at t\01-stemmer.t line 6.
BEGIN failed--compilation aborted at t\01-stemmer.t line 6.
t\01-stemmer.t .. 
Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 30/30 subtests 

add aggressive option

We've started implementing different aggressive and light stemmers, so we need an attribute in the Lingua::Stem::UniNE object to select aggressive over the default light stemmers.
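
One way the selection could work is a simple dispatch on a boolean option, sketched below. Everything here is hypothetical: the `stem` function, the `aggressive` option name, and the two stand-in suffix rules, which are toy English rules and not the UniNE algorithms.

```perl
use strict;
use warnings;

# Stand-in stemmers for illustration only; the real light and aggressive
# stemmers would come from the per-language implementations.
my %stemmer = (
    light      => sub { my $w = shift; $w =~ s/s\z//;         return $w },
    aggressive => sub { my $w = shift; $w =~ s/(?:ing|s)\z//; return $w },
);

# Hypothetical front end: light stemming unless aggressive is requested.
sub stem {
    my ($word, %opt) = @_;
    my $mode = $opt{aggressive} ? 'aggressive' : 'light';
    return $stemmer{$mode}->($word);
}

print stem('walking'), "\n";                  # light stemmer leaves it alone
print stem('walking', aggressive => 1), "\n"; # aggressive stemmer strips -ing
```

Keeping light stemming as the default means existing users see no change unless they ask for the aggressive variant.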
