examples/nlp/nmt_tutorial.py: how to generate YouTokenToMe model for custom language? [CLOSED]

nvidia commented on May 17, 2024

examples/nlp/nmt_tutorial.py: how to generate YouTokenToMe model for custom language?

Comments (3)

ican24 commented on May 17, 2024

Is YouTokenToMe the right tool for this?
https://github.com/VKCOM/YouTokenToMe

If so, should I use the command below to generate a tokenizer model for each of the languages I am using?
yttm bpe --data data/train.en --model bpe8k_yttm.model --vocab_size 8200

okuchaiev commented on May 17, 2024

Yes, please refer to the YouTokenToMe tokenizer's docs on how to do it.
The command you posted looks good to me. Do you have any issues with it?
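
For a translation pair, the command from the question would presumably be repeated for each training file. The following is a minimal sketch of that, assuming one model per language; the second language code `xx`, its file path, and the per-language model naming are hypothetical and not taken from this thread. It only uses the `--data`, `--model`, and `--vocab_size` flags shown in the question, and prints the commands unless `--run` is passed.

```python
import shlex
import subprocess
import sys

# Hypothetical training files, one per language.
# Only data/train.en appears in the thread; "xx" is a placeholder.
TRAIN_FILES = {"en": "data/train.en", "xx": "data/train.xx"}


def build_yttm_command(data_path, lang, vocab_size=8200):
    """Return the argv list for one `yttm bpe` training run."""
    # Hypothetical naming scheme: one model file per language.
    model_path = f"bpe8k_yttm.{lang}.model"
    return [
        "yttm", "bpe",
        "--data", data_path,
        "--model", model_path,
        "--vocab_size", str(vocab_size),
    ]


if __name__ == "__main__":
    for lang, path in TRAIN_FILES.items():
        cmd = build_yttm_command(path, lang)
        # Always show the command; execute it only when --run is given.
        print(shlex.join(cmd))
        if "--run" in sys.argv:
            subprocess.run(cmd, check=True)
```

Whether to train one model per language or a single shared model over both files depends on the NMT setup; the sketch assumes the former only for illustration.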

ican24 commented on May 17, 2024

Dear Oleksii,
No, the issue is solved!
So I am closing this issue.
I will raise my further questions about using a new language with NeMo's NLP as separate issues.
