Comments (6)
This is the old BPE-based model. Try https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.zip
(sorry that the released models are listed with the most recent one furthest down)
from opus-mt-train.
@jorgtied Hugging Face has a link with the same name (I mean opus-2020-02-26.zip), so it is probably the same model.
Maybe I am missing the preprocessing and postprocessing stages because I didn't use the postprocess.sh and preprocess.sh scripts? If so, can you please explain how to use them?
But your command-line call suggests that you are using an older model, opus-2019-12-05-ru-en. Also, the output looks like it comes from the BPE model. And you certainly need to use the preprocess and postprocess scripts!
@jorgtied Could you please show an example of using these scripts? I didn't get it from the description.
(I've tried a newer model, but got unexpected results because I didn't use preprocessing.)
For an English-German model, something like:
echo "Hello world" | ./preprocess.sh deu source.spm | marian-decoder -c decoder.yml --cpu-threads 4 | ./postprocess.sh
But make sure that mosesdecoder and spm_encode are installed and can be found by the preprocess script. Otherwise, you can probably skip the Moses scripts and just encode with spm_encode:
echo "Hello world" | spm_encode --model source.spm | ~/projappl/marian-dev/build/marian-decoder -c decoder.yml --cpu-threads 4 | sed 's/ //g;s/▁/ /g'
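The trailing sed call in the command above is what turns SentencePiece output back into plain text. A minimal sketch of that step as a reusable shell function follows; the name `spm_detok` is hypothetical, and the extra rule that strips the leading space is an addition not present in the original one-liner.

```shell
#!/bin/sh
# Hypothetical helper: undo SentencePiece segmentation on each line of stdin.
spm_detok() {
    # 1. delete the ASCII spaces that separate the subword pieces
    # 2. turn each word-boundary marker (U+2581) into a regular space
    # 3. drop the leading space left behind by a line-initial marker
    sed -e 's/ //g' -e 's/▁/ /g' -e 's/^ //'
}
```

Under these assumptions it could replace the final sed stage of the pipeline, for example `... | marian-decoder -c decoder.yml --cpu-threads 4 | spm_detok`.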
@jorgtied Thanks! For now, I've tried first tokenizing the sentence with the Hugging Face pretrained tokenizer, which I suppose uses the same source.spm and target.spm, and finally got a good translation.
Sorry for my stupid questions, but what exactly do I need to install? I mean the sources of both mosesdecoder and spm_encode. I've just installed spm_encode from the repo using vcpkg, but I can't find any matching paths to paste into ./preprocess.sh
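A quick way to see whether the tools are visible to the shell at all is to look them up on PATH before running the pipeline. This is only a sketch: the function name `require_tools` is hypothetical, and it assumes the tools are resolved through PATH, which the thread does not confirm for preprocess.sh (the script may use its own configured locations instead).

```shell
#!/bin/sh
# Hypothetical helper: report which of the named tools cannot be found on PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (commented out so the snippet has no side effects):
# require_tools spm_encode marian-decoder || exit 1
```

If a tool is reported as missing, adding its install directory to PATH, or editing the corresponding location inside the script, would be the next thing to try.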