transphone's People

Contributors

xinjli

transphone's Issues

FileNotFoundError: [Errno 2] No such file or directory: '042801_base.yml'

Hi, I'm trying to use your package as-is in Colab, but even the "Hello World" example doesn't work:

In [1]: !pip install -q transphone

In [2]: from transphone import read_tokenizer

In [3]: eng = read_tokenizer('eng')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
[<ipython-input-5-718514773e9e>](https://localhost:8080/#) in <cell line: 1>()
----> 1 eng = read_tokenizer('eng')

5 frames
[/usr/local/lib/python3.10/dist-packages/transphone/tokenizer.py](https://localhost:8080/#) in read_tokenizer(lang_id, g2p_model, device, use_lexicon)
     30 
     31     if lang_id in lang2tokenizer:
---> 32         return lang2tokenizer[lang_id](lang_id=lang_id, g2p_model=g2p_model, device=device, use_lexicon=use_lexicon)
     33     else:
     34         return read_g2p_tokenizer(lang_id=lang_id, g2p_model=g2p_model, device=device)

[/usr/local/lib/python3.10/dist-packages/transphone/lang/eng/tokenizer.py](https://localhost:8080/#) in read_eng_tokenizer(lang_id, g2p_model, device, use_lexicon)
      5 
      6 def read_eng_tokenizer(lang_id='eng', g2p_model='latest', device=None, use_lexicon=True):
----> 7     return ENGTokenizer(lang_id, g2p_model, device)
      8 
      9 

[/usr/local/lib/python3.10/dist-packages/transphone/lang/eng/tokenizer.py](https://localhost:8080/#) in __init__(self, lang_id, g2p_model, device)
     12     def __init__(self, lang_id='eng', g2p_model='latest', device=None):
     13 
---> 14         super().__init__(lang_id, g2p_model, device)
     15 
     16         # import jieba and pypinyin for segmentation

[/usr/local/lib/python3.10/dist-packages/transphone/lang/base_tokenizer.py](https://localhost:8080/#) in __init__(self, lang_id, g2p_model, device)
     12             self.g2p = None
     13         else:
---> 14             self.g2p = read_g2p(g2p_model, device)
     15 
     16         self.cache = {}

[/usr/local/lib/python3.10/dist-packages/transphone/g2p.py](https://localhost:8080/#) in read_g2p(model_name, device, checkpoint)
     47             cache_path = None
     48 
---> 49     config = read_model_config(model_name)
     50 
     51     model = G2P(checkpoint, cache_path, config)

[/usr/local/lib/python3.10/dist-packages/transphone/model/utils.py](https://localhost:8080/#) in read_model_config(exp)
     14     yaml_file = TransphoneConfig.data_path / 'exp' / f'{exp}.yml'
     15     # Open the YAML file and load its contents into a Python dictionary
---> 16     with open(yaml_file, "r") as f:
     17         model_config = yaml.safe_load(f)
     18 

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/transphone/data/exp/042801_base.yml'
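
One way to narrow this down is to check which files the installed package actually ships. The sketch below is not part of transphone: `packaged_files` is a hypothetical helper built only on the standard library, and the `data/exp` layout is taken solely from the path in the traceback above.

```python
import importlib.util
from pathlib import Path


def packaged_files(package, relative_dir="."):
    """List file names inside a directory of an installed package.

    Returns None if the package cannot be located, and an empty list
    if the package exists but the requested directory does not.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package root.
    target = Path(spec.origin).parent / relative_dir
    if not target.is_dir():
        return []
    return sorted(p.name for p in target.iterdir())


# Assumed usage in the failing Colab session (directory name taken from the traceback):
# print(packaged_files("transphone", "data/exp"))
```

An empty list or a list without `042801_base.yml` would suggest that the config file was not included in the installed distribution, although the traceback alone does not confirm the cause.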

UnicodeDecodeError and NotADirectoryError

Hi Xinjian! Thank you for building this awesome package! I've been using transphone on Linux so far and it's working well. Today, however, I tried running it on a Windows machine and ran into an issue.

When I run model = read_g2p(), I get the following error:

File "venv\lib\site-packages\transphone\g2p.py", line 38, in __init__
    self.grapheme_vocab = Vocab.read(model_path / 'grapheme.vocab')
File "\venv\lib\site-packages\transphone\data\vocab.py", line 22, in read
    for i, line in enumerate(open(Path(file_path))):
File "C:\Users\Flux\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5437: character maps to <undefined>

It can be fixed by adding encoding="utf8" here:

for i, line in enumerate(open(Path(file_path))):
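
A minimal sketch of that fix, assuming the vocab file is plain UTF-8 text with one entry per line. `read_vocab_lines` is a hypothetical stand-in written for illustration, not the actual body of `Vocab.read`:

```python
from pathlib import Path


def read_vocab_lines(file_path):
    """Return the lines of a UTF-8 text file without trailing newlines."""
    # Passing encoding explicitly avoids falling back to the locale default,
    # which is often cp1252 on Windows and cannot decode every UTF-8 byte.
    with open(Path(file_path), encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

With the encoding stated explicitly, the same file should decode identically on Linux and Windows.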

However after that I found another issue: I got a NotADirectoryError when the tarfile is being unpacked in the following line:

files.extractall(str(model_dir))

This one I couldn't figure out yet. It says that ...\\model\\latest\\prn is an invalid name for a directory. I suspect the double backslashes are the problem (which would explain why it's a Windows-exclusive problem). I just don't see why there are double backslashes. Do you have any hints on that?
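
A possible explanation, offered as an assumption rather than a confirmed diagnosis: Windows reserves a set of legacy device names (including CON, PRN, AUX, NUL, and the COM and LPT port names), so a directory called `prn` cannot be created there regardless of the slashes used. The doubled backslashes are most likely just how Python escapes a single backslash when it displays a string. The sketch below illustrates both points; `reserved_components` is a hypothetical helper, and the name list is not exhaustive.

```python
from pathlib import PurePosixPath

# Device names reserved by Windows (not exhaustive); matching is case-insensitive
# and also applies when an extension is appended, e.g. "prn.txt".
WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def reserved_components(member_name):
    """Return the path components that collide with reserved Windows names."""
    parts = PurePosixPath(member_name.replace("\\", "/")).parts
    return [p for p in parts if p.split(".")[0].upper() in WINDOWS_RESERVED]


# A single backslash is shown doubled in the repr of a string:
single = "model\\latest"          # this string contains exactly one backslash
print(len(single), repr(single))  # the repr shows it as two characters
```

Running `reserved_components` over the names returned by `tarfile.TarFile.getnames()` before calling `extractall` would show whether the archive contains entries that Windows cannot create.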

Error while read_tokenizer

When I run
from transphone import read_tokenizer
eng = read_tokenizer('eng')
I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 970: character maps to <undefined>
Is there any solution?
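
The 'charmap' codec in the message suggests that a file was opened with a legacy locale encoding rather than UTF-8. As a diagnostic sketch using only the standard library (`default_text_encoding_info` is a hypothetical helper name), the following reports which encoding `open()` falls back to when none is given:

```python
import locale
import sys


def default_text_encoding_info():
    """Report the encoding used by open() when no encoding argument is passed."""
    return {
        # Locale-derived default for text files, e.g. a cp125x code page on Windows.
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when Python runs in UTF-8 mode (PYTHONUTF8=1 or `python -X utf8`).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


print(default_text_encoding_info())
```

If the preferred encoding is not UTF-8, starting Python in UTF-8 mode (setting the environment variable `PYTHONUTF8=1`, or running `python -X utf8`) may work around the error without editing the package. This has not been verified against transphone itself.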

Stress a vowel manually

Hello
Thanks for your exciting work!
Can you please tell me whether it is possible to stress a vowel manually with your phonemizer?
For example: alibab'a / alib'aba, з'амок / зам'ок
And does your phonemizer support stresses at all?

Estimate for when the code will be available?

Hi! I saw your excellent work at ACL, read the paper, and talked to David at the poster. I'm very excited to try it out in combination with the crosslingual text-to-speech I'm working on. Do you have an estimate of when the code (and hopefully models) will be available?

Finetuning transphone g2p

Thank you for sharing your work!
I was wondering if it's possible to finetune the transphone G2P model with proprietary lexicons. If yes, could you please share some instructions on how to achieve this?

Error Running Examples?

I tried installing transphone both using pip and from source, and I keep getting the following error:
>>> from transphone.g2p import read_g2p
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/osboxes/Desktop/transphone-main/transphone/g2p.py", line 2, in <module>
    from transphone.model.transformer import TransformerG2P
  File "/home/osboxes/Desktop/transphone-main/transphone/model/transformer.py", line 6, in <module>
    from transphone.data.utils import pad_sos_eos
ModuleNotFoundError: No module named 'transphone.data'

Thank you very much!

Outputs are inconsistent

Hi,
I'm using the latest commit on GitHub and running model.inference on the same input, but I get a different result on each call.

> python3 -c "from transphone.g2p import read_g2p  ; model = read_g2p() ; print(model.inference('transphone', 'eng')); print(model.inference('transphone', 'eng'));"

['t', 'k', 'ð', 'a', 'ɛ', 'a']
['s', 'k', 's', 'j', 'ə', 'a', 'l', 'a', 'i', 'a', 'a', 'a', 'ɪ', 'a', 'j']

torch version = 1.8.0+cu111
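
To make the report easier to reproduce, a small helper can call any function several times on the same input and collect the distinct results. `distinct_outputs` is a hypothetical helper written for this illustration and is not part of transphone:

```python
def distinct_outputs(fn, *args, repeats=3, **kwargs):
    """Call fn repeatedly with the same arguments and return the distinct results."""
    seen = []
    for _ in range(repeats):
        out = fn(*args, **kwargs)
        # Lists are unhashable, so compare by equality instead of using a set.
        if out not in seen:
            seen.append(out)
    return seen


# Assumed usage with the model from the command above:
# results = distinct_outputs(model.inference, 'transphone', 'eng')
# print(len(results), results)
```

More than one entry in the returned list confirms the nondeterminism. In PyTorch models generally, common causes include dropout left active because the model is not in evaluation mode, or weights that were never loaded from a checkpoint; the report does not establish whether either applies here.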
