ksw0306 / FloWaveNet
A PyTorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
License: MIT License
Hi,
Thank you for the nice work. I found that both your FloWaveNet and NVIDIA's WaveGlow produce periodic noise.
This may be caused by the squeeze operations: the frequencies of the periodic noise in WaveGlow are multiples of sample_rate // squeeze_factor. (For example, 16 kHz audio with a squeeze_factor of 8 may have periodic noise at 2 kHz, 4 kHz, 6 kHz, and so on.)
Do you have any idea how to solve this problem?
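The arithmetic in this observation can be sketched as follows. The function name is hypothetical, and the relation itself is the reporter's observation about WaveGlow, not a confirmed property of either model. The sketch only lists the multiples of sample_rate // squeeze_factor that lie strictly below the Nyquist frequency.

```python
def squeeze_noise_frequencies(sample_rate, squeeze_factor):
    """List multiples of sample_rate // squeeze_factor below Nyquist (in Hz)."""
    base = sample_rate // squeeze_factor   # e.g. 16000 // 8 = 2000
    nyquist = sample_rate // 2             # highest representable frequency
    return list(range(base, nyquist, base))


# The example from the issue: 16 kHz audio, squeeze factor 8.
print(squeeze_noise_frequencies(16000, 8))
```

For the example in the issue this lists 2000, 4000, and 6000 Hz, matching the frequencies the reporter mentions.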
Thanks for the nice work.
I have a problem when synthesizing: the synthesized audio always has reverberation compared with the raw audio. Does anyone else have the same problem (batch size = 4, 1000k steps)? I guess the "change order" module may cause it.
After 200k steps, the loss stays almost unchanged in the range (-3.7, -3.4), and the synthesized results from 200k to 1000k steps sound similar too. Is this reasonable? If not, what range of loss is reasonable?
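A sketch for reading loss values such as these, using one training-log line quoted in another issue on this page. In that line, the first value equals the negative of the sum of the other two, up to rounding. This suggests (as an assumption, not confirmed by the author) that the reported loss is a negative log-likelihood, so more negative values mean higher likelihood. parse_log_values is a hypothetical helper, not part of the repository.

```python
# One log line copied from a training log quoted elsewhere on this page.
LOG_LINE = ("Global Step : 47160, [4, 100] [Log pdf, Log p(z), Log Det] : "
            "[-4.071 -1.3882 5.4592]")


def parse_log_values(line):
    """Return the floats inside the final [...] group of a log line."""
    last_group = line.rsplit('[', 1)[1].rstrip().rstrip(']')
    return [float(v) for v in last_group.split()]


log_pdf, log_pz, log_det = parse_log_values(LOG_LINE)

# If the first column is -(log p(z) + log det), this residual is close to 0.
residual = log_pdf + (log_pz + log_det)
print(log_pdf, log_pz, log_det, residual)
```

This does not say which loss range is "reasonable"; it only shows how the three logged columns relate to each other in the quoted logs.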
Hi
I am unable to figure out how to synthesize audio with these parameters using the mel spectrogram generated by Tacotron-2:
parser.add_argument('--data_path', type=str, default='./DATASETS/ljspeech/', help='Dataset Path')
parser.add_argument('--sample_path', type=str, default='./samples', help='Sample Path')
parser.add_argument('--model_name', type=str, default='flowavenet', help='Model Name')
parser.add_argument('--num_samples', type=int, default=10, help='# of audio samples')
parser.add_argument('--load_step', type=int, default=0, help='Load Step')
parser.add_argument('--temp', type=float, default=0.8, help='Temperature')
parser.add_argument('--load', '-l', type=str, default='./params', help='Checkpoint path to resume / test.')
parser.add_argument('--n_layer', type=int, default=2, help='Number of layers')
parser.add_argument('--n_flow', type=int, default=6, help='Number of layers')
parser.add_argument('--n_block', type=int, default=8, help='Number of layers')
parser.add_argument('--cin_channels', type=int, default=80, help='Cin Channels')
parser.add_argument('--block_per_split', type=int, default=4, help='Block per split')
parser.add_argument('--num_workers', type=int, default=0, help='Number of workers')
parser.add_argument('--log', type=str, default='./log', help='Log folder.')
Where is the parameter for the mel spectrogram?
Does anyone have an example command line?
Many Thanks
Joshua
I haven't seen this parameter in the previous version.
Can anyone explain the effect of the block_per_split parameter?
The author released a new version with a multi-scale architecture on 1 Feb.
I am curious about the new feature because I already trained a model with the previous version,
and I am unsure whether I need to retrain it.
(Training takes a long time...)
Hi, I get an out-of-memory message almost immediately after training starts. I use a GeForce GTX 1080 Ti with 11 GB of memory and CUDA 9.0. My question is:
Thanks for your help.
Edit: When I reduce the batch size to 6, 10.4 GB of my GPU memory is used, which is fine. So I guess that is the solution.
Stack trace is:
(flowavenet) C:\Users\admin\FloWaveNet>python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --causal no
Traceback (most recent call last):
File "train.py", line 229, in <module>
training_epoch_loss = train(epoch, model, optimizer)
File "train.py", line 108, in train
log_p, logdet = model(x, c)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\model.py", line 195, in forward
out, c, logdet_new = block(out, c)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\model.py", line 150, in forward
out, c, det = flow(out, c)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\model.py", line 114, in forward
out, det = self.coupling(out, c)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\model.py", line 74, in forward
log_s, t = self.net(in_a, c_a).chunk(2, 1)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\modules.py", line 117, in forward
h, s = f(h, c)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\modules.py", line 71, in forward
h_gate = self.gate_conv(tensor)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\FloWaveNet\modules.py", line 21, in forward
out = self.conv(tensor)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\admin\Anaconda3\envs\flowavenet\lib\site-packages\torch\nn\modules\conv.py", line 176, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
Currently, we cannot run the multi-GPU training on PyTorch v1.0.0 due to a strange null gradient issue.
Hi, could you please share pre-trained models that give good results, if any are available?
I've trained this project on 10000 spoken sentences of about 2 seconds each, namely 2.78 hours of training data, for about 18 hours (43 epochs, 98860 steps so far). The loss has decreased to -4.7929, but the generated wavs contain no clear voice.
Hello,
I am trying to run the training part on multiple GPUs (4 Tesla V100), using the command
python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --block_per_split 4 --num_gpu 4
It runs everything without an error and outputs
num_gpu > 1 detected. converting the model to DataParallel...
It stayed frozen at this output for more than an hour. I checked the GPUs and all of them were in use, but I didn't see any change. I have several questions: is there a problem with the code, or do I have to wait longer for training to start? Will decreasing batch_size speed up the conversion to DataParallel?
Note: I run training on the LJ-Speech-Dataset.
Also, can you give us the download links of the pretrained models? It would be very helpful.
With GPU and without, on an already trained model.
I want to use an open-source text-to-speech system on AWS Lambda, and I am looking into solutions that can render audio faster than real time.
Thanks for the nice work!
I cannot really understand the ZeroConv1d in modules.py.
Since the weight and bias are initialized to exactly zero, will the output of ZeroConv1d always be zero? Or do I misunderstand?
Could you please clarify this a little? Thank you in advance.
Best.
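A minimal sketch of why zero initialization does not force the output to stay zero. It uses a single-channel 1x1 convolution written in plain Python (one scalar weight and one scalar bias) as a stand-in. It is not the repository's ZeroConv1d, and the input, target, and learning rate are made up for illustration. The output is zero before the first update, but the gradients with respect to the weight and bias are generally non-zero, so one gradient step already moves the parameters away from zero.

```python
def conv1x1(x, weight, bias):
    """Single-channel 1x1 convolution: the same affine map at every time step."""
    return [weight * v + bias for v in x]


def sgd_step(x, target, weight, bias, lr=0.1):
    """One gradient-descent step on the mean squared error."""
    y = conv1x1(x, weight, bias)
    n = len(x)
    # d(MSE)/d(weight) depends on the input x, not on the current weight value.
    grad_w = sum(2 * (yi - ti) * xi for yi, ti, xi in zip(y, target, x)) / n
    grad_b = sum(2 * (yi - ti) for yi, ti in zip(y, target)) / n
    return weight - lr * grad_w, bias - lr * grad_b


x = [1.0, -2.0, 0.5]          # illustrative input
target = [0.5, -1.0, 0.25]    # illustrative target
w, b = 0.0, 0.0               # zero initialization

out_before = conv1x1(x, w, b)      # all zeros at initialization
w, b = sgd_step(x, target, w, b)   # parameters become non-zero
out_after = conv1x1(x, w, b)       # no longer all zeros
print(out_before, (w, b), out_after)
```

In Glow-style flows, this kind of zero initialization is used so that each coupling layer starts out as an identity transform. Whether that is the motivation in this repository is an assumption here.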
Hi, I ran the following command but got an error during training. I don't know why the training stopped.
Global Step : 47160, [4, 100] [Log pdf, Log p(z), Log Det] : [-4.071 -1.3882 5.4592]
Global Step : 47160, [4, 200] [Log pdf, Log p(z), Log Det] : [-3.8072 -1.3883 5.1956]
Global Step : 47160, [4, 300] [Log pdf, Log p(z), Log Det] : [-3.833 -1.3862 5.2192]
Global Step : 47160, [4, 400] [Log pdf, Log p(z), Log Det] : [-3.8035 -1.386 5.1895]
Global Step : 47160, [4, 500] [Log pdf, Log p(z), Log Det] : [-3.9065 -1.3802 5.2868]
Global Step : 47160, [4, 600] [Log pdf, Log p(z), Log Det] : [-3.7542 -1.3834 5.1376]
Global Step : 47160, [4, 700] [Log pdf, Log p(z), Log Det] : [-3.7565 -1.3889 5.1454]
Global Step : 47160, [4, 800] [Log pdf, Log p(z), Log Det] : [-3.9123 -1.3848 5.2971]
Global Step : 47160, [4, 900] [Log pdf, Log p(z), Log Det] : [-3.8986 -1.3856 5.2842]
Global Step : 47160, [4, 1000] [Log pdf, Log p(z), Log Det] : [-4.0164 -1.3854 5.4018]
Global Step : 47160, [4, 1100] [Log pdf, Log p(z), Log Det] : [-3.9558 -1.3868 5.3426]
Global Step : 47160, [4, 1200] [Log pdf, Log p(z), Log Det] : [-4.0773 -1.3795 5.4568]
Global Step : 47160, [4, 1300] [Log pdf, Log p(z), Log Det] : [-3.8799 -1.383 5.2629]
Evaluation Loss : -3.8945
Traceback (most recent call last):
File "train.py", line 241, in <module>
save_checkpoint(model, optimizer, global_step, epoch)
File "train.py", line 185, in save_checkpoint
"global_epoch": global_epoch}, checkpoint_path)
File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 209, in save
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 134, in _with_file_like
return body(f)
File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 209, in <lambda>
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/hu/anaconda3/envs/hhh/lib/python3.6/site-packages/torch/serialization.py", line 288, in _save
serialized_storages[key]._write_file(f, _should_read_directly(f))
RuntimeError: Unknown error -1
I fixed this bug by replacing this expression with
out = out[:, :, :out.size(2)-self.padding]
and the code now works well in causal mode.
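The expression that was replaced is not shown in this excerpt. Assuming it was a negative-end slice such as out[:, :, :-self.padding], the sketch below shows on plain Python lists where the two forms differ. Both helper names are hypothetical. The two slices agree for any positive padding, but with padding equal to 0 the negative form returns an empty sequence, while the explicit-length form returns everything.

```python
def trim_negative(seq, padding):
    """Trim with a negative end index (assumed original form)."""
    return seq[:-padding]


def trim_explicit(seq, padding):
    """Trim with an explicit end index, as in the fix quoted above."""
    return seq[:len(seq) - padding]


samples = [1, 2, 3, 4]

# Positive padding: both forms drop the last `padding` elements.
print(trim_negative(samples, 2), trim_explicit(samples, 2))

# Zero padding: seq[:-0] is seq[:0], which is empty.
print(trim_negative(samples, 0), trim_explicit(samples, 0))
```

The same slicing rule applies to the last dimension of a tensor, which is why the explicit-length form is the safer choice when the padding can be zero.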
Hi
Very nice work, and thanks for sharing the code.
I have some questions about Equations (3) and (5) in your paper at https://arxiv.org/pdf/1811.02155.pdf.
The forward transformation (Equation (3)) and reverse transformation (Equation (5)) differ from the ones in the Glow paper (Row 3, Table 1, https://arxiv.org/pdf/1807.03039.pdf).
Their reverse function, x = (y_a - t)/s, is your forward function (implementation at https://github.com/ksw0306/FloWaveNet/blob/master/model.py#L76) and Equation (3).
Their forward function, y = s*x_a + t, matches your reverse function in the implementation at https://github.com/ksw0306/FloWaveNet/blob/master/model.py#L90, but not Equation (5) in your paper.
Did I miss anything? Thanks
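The two conventions discussed here can be sketched as a pair of mutually inverse functions. This is an illustration only: log_s and t are fixed numbers standing in for network outputs, and the function names are hypothetical. It makes no claim about what Equation (5) in the paper should read. It only shows that "subtract the shift, then divide by the scale" and "multiply by the scale, then add the shift" invert each other, so either can serve as the forward direction as long as the log-determinant uses the matching sign.

```python
import math


def coupling_forward(x_b, log_s, t):
    """Data -> latent: z = (x - t) / s, with log|det| = -sum(log s)."""
    z_b = [(x - ti) * math.exp(-ls) for x, ls, ti in zip(x_b, log_s, t)]
    logdet = -sum(log_s)
    return z_b, logdet


def coupling_reverse(z_b, log_s, t):
    """Latent -> data: x = z * s + t, the exact inverse of coupling_forward."""
    return [z * math.exp(ls) + ti for z, ls, ti in zip(z_b, log_s, t)]


x_b = [0.3, -1.2, 0.8]       # illustrative half of the input
log_s = [0.1, -0.4, 0.25]    # stand-in for the predicted log-scale
t = [0.05, 0.2, -0.3]        # stand-in for the predicted shift

z_b, logdet = coupling_forward(x_b, log_s, t)
x_rec = coupling_reverse(z_b, log_s, t)
print(z_b, logdet, x_rec)
```

Swapping which function is called "forward" flips the sign of the log-determinant (it becomes +sum(log s)), but the model remains a valid invertible flow.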
Global Step : 1300, [1, 1300] [Log pdf, Log p(z), Log Det] : [-3.2562 -1.4938 4.75 ]
Global Step : 1400, [1, 1400] [Log pdf, Log p(z), Log Det] : [-3.154 -1.4319 4.5859]
Global Step : 1500, [1, 1500] [Log pdf, Log p(z), Log Det] : [-3.3218 -1.4467 4.7684]
Global Step : 1600, [1, 1600] [Log pdf, Log p(z), Log Det] : [-3.3319 -1.4162 4.7481]
Global Step : 1700, [1, 1700] [Log pdf, Log p(z), Log Det] : [-3.532 -1.4159 4.9479]
Global Step : 1800, [1, 1800] [Log pdf, Log p(z), Log Det] : [-3.0115 -1.549 4.5605]
Global Step : 1900, [1, 1900] [Log pdf, Log p(z), Log Det] : [-3.572 -1.4092 4.9812]
Global Step : 2000, [1, 2000] [Log pdf, Log p(z), Log Det] : [-3.595 -1.4123 5.0073]
Global Step : 2100, [1, 2100] [Log pdf, Log p(z), Log Det] : [-3.3661 -1.4437 4.8097]
Global Step : 2200, [1, 2200] [Log pdf, Log p(z), Log Det] : [-3.43 -1.416 4.846]
1 Epoch Training Loss : -2.8819
Global Step : 2250, [1, 100] [Log pdf, Log p(z), Log Det] : [-3.1498 -1.4728 4.6226]
Global Step : 2250, [1, 200] [Log pdf, Log p(z), Log Det] : [-3.1846 -1.4745 4.6591]
Evaluation Loss : -3.1582
Epoch 1 Model Saved! Loss : -3.1582
Traceback (most recent call last):
File "H:/workspace/FloWaveNet/train.py", line 244, in <module>
synthesize(model)
File "H:/workspace/FloWaveNet/train.py", line 170, in synthesize
y_gen = model.reverse(z, c).squeeze()
File "H:\workspace\FloWaveNet\model.py", line 204, in reverse
c = self.upsample(c)
File "H:\workspace\FloWaveNet\model.py", line 230, in upsample
c = f(c)
File "C:\Python35\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "C:\Python35\lib\site-packages\torch\nn\modules\conv.py", line 691, in forward
output_padding, self.groups, self.dilation)
RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR
Hi, I'm on a Windows 10 system with Anaconda and Python 3.6. Preprocessing ran successfully, but calling train.py gives the following error. This issue is definitely related to Windows (see https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing), but I don't know how to solve it.
(FloWaveNet) C:\Users\admin\FloWaveNet>python train.py --model_name flowavenet --batch_size 8 --n_block 8 --n_flow 6 --n_layer 2 --causal no
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\admin\FloWaveNet\train.py", line 228, in <module>
training_epoch_loss = train(epoch, model, optimizer)
File "C:\Users\admin\FloWaveNet\train.py", line 91, in train
for batch_idx, (x, c) in enumerate(train_loader):
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
return _DataLoaderIter(self)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
w.start()
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 322, in _Popen
Traceback (most recent call last):
File "train.py", line 228, in <module>
return Popen(process_obj)
training_epoch_loss = train(epoch, model, optimizer)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
File "train.py", line 91, in train
prep_data = spawn.get_preparation_data(process_obj._name)
for batch_idx, (x, c) in enumerate(train_loader):
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
return _DataLoaderIter(self)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
_check_not_importing_main()
w.start()
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\process.py", line 105, in start
is not going to be frozen to produce an executable.''')
self._popen = self._Popen(self)
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable. File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\admin\Anaconda3\envs\FloWaveNet\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
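The idiom named in the error message above can be sketched as follows. On Windows, child processes are started by re-importing the main module, so any code that starts workers has to live behind an `if __name__ == '__main__':` guard. The function names below are hypothetical, and this is a generic multiprocessing sketch rather than a patch for train.py. As an assumption, the same structure applies to a script whose DataLoader starts worker processes: wrap the top-level training code in a main() function and call it from the guard.

```python
import multiprocessing as mp


def square(x):
    """Work done per item; defined at module level so workers can import it."""
    return x * x


def main(num_workers=0):
    """Process four items, optionally in worker processes."""
    if num_workers == 0:
        # No child processes: everything runs in the main process.
        return [square(x) for x in range(4)]
    # Workers are only created here, never at import time.
    with mp.Pool(num_workers) as pool:
        return pool.map(square, range(4))


if __name__ == '__main__':
    mp.freeze_support()   # only needed when frozen into an executable
    # 0 keeps everything in the main process; with the guard in place,
    # values greater than 0 can start worker processes safely.
    print(main(num_workers=0))
```

Running with zero workers avoids spawning child processes at all, which can serve as a quick workaround while the script is being restructured.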
The training is as follows:
Use the LJSpeech dataset, Synthesized audio
I only want to input the mel features generated by Tacotron 2. How should I modify the script synthesize.py?
Hi there,
I want to change the text spoken in the generated wav file. Any idea how? I was looking at the source but cannot figure out how it works.
Thankies.