Comments (6)

jorgtied commented on May 18, 2024

This is the old BPE-based model. Try https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.zip
(sorry that the released models are sorted by having the most recent one furthest down ....)

from opus-mt-train.

koren-v commented on May 18, 2024

@jorgtied Hugging Face has a link with the same name (I mean opus-2020-02-26.zip), so it is probably the same model.
Maybe I am missing the preprocessing and postprocessing stages because I didn't use the preprocess.sh and postprocess.sh scripts? If so, can you please explain how to use them?

jorgtied commented on May 18, 2024

But your command-line call suggests that you are using an older model opus-2019-12-05-ru-en. Also the output looks like it is the BPE model that you use. And, certainly, you need to use the preprocess and postprocess scripts!

koren-v commented on May 18, 2024

@jorgtied Could you please show an example of using these scripts? I couldn't work it out from the description.
(I've tried a newer model, but got unexpected results because I didn't use preprocessing.)

jorgtied commented on May 18, 2024

For an English-German model something like

echo "Hello world" | ./preprocess.sh deu source.spm | marian-decoder -c decoder.yml --cpu-threads 4 | ./postprocess.sh

but make sure that mosesdecoder and spm_encode are installed and found by the preprocess script. Otherwise you can probably also skip the moses-scripts and just encode with spm_encode:

echo "Hello world" | spm_encode --model source.spm | ~/projappl/marian-dev/build/marian-decoder -c decoder.yml --cpu-threads 4 | sed 's/ //g;s/▁/ /g'
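
To see what the final sed step does without running the decoder, here is a minimal sketch. The function name spm_detok and the sample input are hypothetical, made up for illustration. It also adds a third expression to strip the leading space, which the command above does not do.

```shell
# Undo SentencePiece segmentation:
#   1. drop the plain spaces that separate the pieces,
#   2. turn each "▁" word-boundary marker back into a space,
#   3. strip the leading space left by the first marker (an addition,
#      not part of the command above).
spm_detok() {
  sed 's/ //g;s/▁/ /g;s/^ //'
}

# Hypothetical decoder output, hard-coded here for illustration only.
printf '%s\n' '▁Hello ▁wor ld' | spm_detok
```

In the real pipeline this function would simply take the place of the trailing sed call after marian-decoder.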

koren-v commented on May 18, 2024

@jorgtied Thanks! For now, I've tried tokenizing the sentence first with the Hugging Face pretrained tokenizer, which I suppose uses the same source.spm and target.spm, and finally got a good translation.
Sorry for the basic questions, but what exactly do I need to install? I mean the sources of both mosesdecoder and spm_encode. I've just installed spm_encode from the repo using vcpkg, but I can't find any matching paths to put into ./preprocess.sh.
