Hi, I've loaded the models from the following directory: <a href="https://github.com/H

Bad translation using marian-decoder about opus-mt-train HOT 6 OPEN

helsinki-nlp commented on May 18, 2024

from opus-mt-train.

Comments (6)

jorgtied commented on May 18, 2024

This is the old BPE-based model. Try https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.zip
(Sorry that the released models are listed with the most recent one furthest down.)

koren-v commented on May 18, 2024

@jorgtied Hugging Face has a link with the same name (I mean opus-2020-02-26.zip), so it is probably the same model.
Maybe I am missing the preprocessing and postprocessing stages because I didn't use the postprocess.sh and preprocess.sh scripts? If so, can you please explain how to use them?

jorgtied commented on May 18, 2024

But your command-line call suggests that you are using an older model, opus-2019-12-05-ru-en. The output also looks like it comes from the BPE model. And, certainly, you need to use the preprocess and postprocess scripts!

koren-v commented on May 18, 2024

@jorgtied Could you please show an example of using these scripts? I didn't understand it from the description.
(I've tried a newer model, but got unexpected results because I didn't use preprocessing.)

jorgtied commented on May 18, 2024

For an English-German model something like

echo "Hello world" | ./preprocess.sh deu source.spm | marian-decoder -c decoder.yml --cpu-threads 4 | ./postprocess.sh

but make sure that mosesdecoder and spm_encode are installed and can be found by the preprocess script. Otherwise you can probably skip the Moses scripts and just encode with spm_encode:

echo "Hello world" | spm_encode --model source.spm | ~/projappl/marian-dev/build/marian-decoder -c decoder.yml --cpu-threads 4 | sed 's/ //g;s/▁/ /g'
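To see what the sed step at the end of that pipeline does without running a model, here is a minimal sketch. The function name `spm_detok` and the sample pieces are made up for illustration, and the extra `s/^ //` (which strips the leading space left by the first ▁) is an addition that is not in the command above.

```shell
# Hypothetical helper: undo SentencePiece segmentation on each line of stdin.
spm_detok() {
  # 1) remove the spaces between pieces,
  # 2) turn the word-boundary marker back into a space,
  # 3) strip the leading space produced by the first marker (assumed extra step).
  sed 's/ //g;s/▁/ /g;s/^ //'
}

# Illustrative input: pieces as a decoder might print them.
printf '%s\n' '▁Hello ▁wor ld' | spm_detok
```

Placed at the end of the pipeline, `spm_detok` would stand in for the inline sed call.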

koren-v commented on May 18, 2024

@jorgtied Thanks! For now, I've tried tokenizing the sentence first with the Hugging Face pretrained tokenizer, which I suppose uses the same source.spm and target.spm, and finally got a good translation.
Sorry for my stupid questions, but what exactly do I need to install? I mean the sources of both mosesdecoder and spm_encode. I've just installed spm_encode from the repo using vcpkg, but I can't find any matching path to paste into ./preprocess.sh
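One quick way to narrow this down is to check whether the binaries from the pipelines above are reachable through PATH at all. This is only a sketch: the helper name `have_tool` is made up, and it says nothing about how preprocess.sh itself locates the tools or about the mosesdecoder scripts, which are a directory of scripts and not a single command.

```shell
# Hypothetical helper: report whether a command is reachable through PATH.
have_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "missing: $1 (not on PATH)"
    return 1
  fi
}

# The two binaries used in the pipelines quoted in this thread.
for tool in spm_encode marian-decoder; do
  have_tool "$tool" || true
done
```

If a tool is reported missing, adding the directory that contains it to PATH, or calling it by its full path as in the second command above, should be enough for the plain spm_encode pipeline.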
