ntrex's People

Contributors

cbaziotis, cfedermann, chanberg, kocmitom

ntrex's Issues

Templatic fragments

Some files contain templatic fragments such as <seg id="12">. This may be a preprocessing problem, but it may also be that the translators misunderstood the request. It needs human verification:

Na Amerika ena itekivu ni yabaki <seg id="12">A tauyavutaka na Tabacakacaka ni Veitaqomaki e dua na Koro ni Veivakaukauwataki Ni Vuku Cokovata, ka kena inaki me ra vakaitavi kina na itokani mai na tabana ni cakacaka kei na vuli, ka kacivaka na White House na tauyavutaki ni Komiti digitaki ena Artificial Intelligence.
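
A quick way to locate such fragments before handing them to a human reviewer is a regular-expression scan. The following is only a minimal sketch: the function name find_seg_fragments and the sample lines are made up for illustration, and the pattern assumes the leftovers always look like seg tags, as in the example above.

```python
import re

# Matches leftover XML-style segment tags such as <seg id="12"> or </seg>.
# It also catches malformed variants like <seg id=12"> (assumption: every
# fragment of interest starts with "<seg" or "</seg" and ends with ">").
SEG_TAG = re.compile(r'</?seg\b[^>]*>')


def find_seg_fragments(lines):
    """Return (line_number, tag) pairs for every leftover seg tag."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in SEG_TAG.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical sample: the first line is shortened from the example above.
sample = [
    'Na Amerika ena itekivu ni yabaki <seg id="12">A tauyavutaka na Tabacakacaka',
    'A clean line without markup.',
]
print(find_seg_fragments(sample))
```

The scan only reports where the tags are. Whether the surrounding text is a correct translation still needs a human check.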

Bad language code?

As per email conversation:

One of your published datasets is named newstest2019-ref.nso (which I assume is Northern Sotho, commonly referred to as Sepedi - because it's from the northern part of South Africa), however the content inside is Sesotho (the Southern Sotho - the Sotho from the Southern part), with code st, or sometimes sot.

Zulu reference line count

The Zulu reference line count (1998) seems to be off by one compared with the other references and the English source. There might be a misalignment issue with Zulu.

Wolof seems to have incorrect no. of lines

Wolof seems to have 2019 lines, compared to the other files, which have 1997. I tried filtering out empty lines, and it does not seem to make a difference, unlike other files, which actually had many empty lines!

Seems to be an outlier, can someone please look into this?
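
For reference, the empty-line check described above can be sketched as follows. The function name blank_line_report and the sample data are hypothetical. The sketch treats whitespace-only lines as empty, which is an assumption and not necessarily how the files were checked.

```python
def blank_line_report(lines):
    """Summarize how many lines are empty and where they occur (1-based)."""
    # A line counts as blank if it is empty or contains only whitespace.
    blank = [n for n, line in enumerate(lines, start=1) if not line.strip()]
    return {
        "total": len(lines),
        "non_empty": len(lines) - len(blank),
        "blank_at": blank,
    }


# Hypothetical sample, not taken from the Wolof file.
sample = ["first segment", "", "second segment", "   ", "third segment"]
print(blank_line_report(sample))
```

If the non-empty count still differs from the source count, the extra lines are real text, for example segments split across several lines.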

Different number of lines in some files

Some files do not have 1997 lines:

1998 newstest2019-ref.fij.txt
1998 newstest2019-ref.tgk-Cyrl.txt
1998 newstest2019-ref.zul.txt
2019 newstest2019-ref.wol.txt
2031 newstest2019-ref.urd.txt
2042 newstest2019-ref.vie.txt

Am I right that this makes it impossible to match them line by line with newstest2019-src.eng.txt?
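
A check like the one above can be automated by comparing every reference file against the source. This is a minimal sketch: the helper names count_lines and find_mismatches are made up, and the demo uses tiny stand-in files in a temporary directory, not the real dataset.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    # newline="\n" makes Python split on line feeds only, so a stray carriage
    # return inside a segment is not counted as an extra line. Unlike wc -l,
    # this also counts a final line that lacks a trailing newline.
    with open(path, encoding="utf-8", newline="\n") as handle:
        return sum(1 for _ in handle)


def find_mismatches(source_path, reference_paths):
    """Return the source line count and a {file name: count} map of outliers."""
    expected = count_lines(source_path)
    mismatches = {}
    for path in reference_paths:
        actual = count_lines(path)
        if actual != expected:
            mismatches[Path(path).name] = actual
    return expected, mismatches


# Demo with hypothetical stand-in files (not the real NTREX data).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.eng.txt"
    good = Path(tmp) / "ref.aaa.txt"
    bad = Path(tmp) / "ref.bbb.txt"
    src.write_text("one\ntwo\n", encoding="utf-8")
    good.write_text("un\ndeux\n", encoding="utf-8")
    bad.write_text("uno\ndos\ntres\n", encoding="utf-8")
    print(find_mismatches(src, [good, bad]))
```

A matching line count is necessary for line-by-line alignment but does not prove it. A file can have the right number of lines and still be shifted internally.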

Small things

Thank you, this is really great.

I have two comments:

  1. The punctuation is really erratic. I think it would be great if everything were normalized and then post-processed (both according to language): quote before or after a full stop or comma, space or no space before/after a quote, ... small things that make a difference.

  2. Even though these are human translations, I am looking at French (my native language) and there are some inconsistencies.
    Just one example: line 49. It appears to have been translated together with line 48 and then cut; it is not a translation of the English line 49.
    When comparing FR and CA_FR I am also seeing weird things.

Again, great stuff.
