Comments (5)
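After repeated testing, I found that the problem occurs when train.jsonl contains more than 7 entries. With fewer than 7 entries training does run, but it keeps logging "Update best acc: 0.0000, outputs/model.pt.best" and the accuracy stays at 0.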
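To narrow down a size-dependent failure like this one, it can help to train on progressively shorter copies of the data list. The following is a minimal sketch, not part of FunASR: `head_jsonl` is a hypothetical helper that copies the first `n` records of a JSONL file and assumes only that each non-empty line is one JSON object.

```python
import json
from pathlib import Path
from typing import List


def head_jsonl(src: str, dst: str, n: int) -> int:
    """Copy the first n non-empty records of a JSONL file to dst.

    Returns the number of records written. Hypothetical helper for
    bisecting a data list; it makes no assumption about the record fields.
    """
    kept: List[str] = []
    with open(src, encoding="utf-8") as f:
        for line in f:
            if len(kept) >= n:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            json.loads(line)  # fail early on a malformed record
            kept.append(line)
    Path(dst).write_text("".join(rec + "\n" for rec in kept), encoding="utf-8")
    return len(kept)
```

For example, `head_jsonl("train.jsonl", "train_7.jsonl", 7)` writes a 7-record list that can be passed to `++train_data_set_list` in place of the full file.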
from funasr.
This has been fixed. Please pull the latest code and try again.
It still fails with the same error:
[2024-04-17 15:23:00,349][root][INFO] - Train epoch: 0, rank: 0
/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/autograd/__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1, 512], strides() = [1, 1]
bucket_view.sizes() = [1, 512], strides() = [512, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/autograd/__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1, 512], strides() = [1, 1]
bucket_view.sizes() = [1, 512], strides() = [512, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[2024-04-17 15:23:02,723][root][INFO] - train, rank: 1, epoch: 0/50, step: 1/1, total step: 1, (loss_avg_rank: 0.013), (loss_avg_epoch: 0.008), (ppl_avg_epoch: 1.008e+00), (acc_avg_epoch: 0.000), (lr: 1.333e-08), [('loss_seaco', 0.008), ('loss', 0.008)], {'data_load': '0.994', 'forward_time': '0.692', 'backward_time': '0.505', 'optim_time': '0.174', 'total_time': '2.364'}, GPU, memory: usage: 3.784 GB, peak: 6.489 GB, cache: 6.828 GB, cache_peak: 6.828 GB
[2024-04-17 15:23:02,750][root][INFO] - train, rank: 0, epoch: 0/50, step: 1/1, total step: 1, (loss_avg_rank: 0.003), (loss_avg_epoch: 0.008), (ppl_avg_epoch: 1.008e+00), (acc_avg_epoch: 0.000), (lr: 1.333e-08), [('loss_seaco', 0.007), ('loss', 0.007)], {'data_load': '0.987', 'forward_time': '0.541', 'backward_time': '0.663', 'optim_time': '0.200', 'total_time': '2.391'}, GPU, memory: usage: 3.684 GB, peak: 6.373 GB, cache: 6.521 GB, cache_peak: 6.521 GB
[2024-04-17 15:23:02,849][root][INFO] - Validate epoch: 0, rank: 1
[2024-04-17 15:23:02,849][root][INFO] - Validate epoch: 0, rank: 0
Error executing job with overrides: ['++model=iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch', '++train_data_set_list=../../../data/list/train.jsonl', '++valid_data_set_list=../../../data/list/val.jsonl', '++dataset_conf.batch_size=2000', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=false', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
File "../../../funasr/bin/train.py", line 225, in <module>
main_hydra()
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "../../../funasr/bin/train.py", line 48, in main_hydra
main(**kwargs)
File "../../../funasr/bin/train.py", line 196, in main
trainer.validate_epoch(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/funasr/train_utils/trainer.py", line 432, in validate_epoch
retval = model(**batch)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/funasr/models/seaco_paraformer/model.py", line 122, in forward
assert text_lengths.dim() == 1, text_lengths.shape
AssertionError: torch.Size([])
Error executing job with overrides: ['++model=iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch', '++train_data_set_list=../../../data/list/train.jsonl', '++valid_data_set_list=../../../data/list/val.jsonl', '++dataset_conf.batch_size=2000', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=false', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
File "../../../funasr/bin/train.py", line 225, in <module>
main_hydra()
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
raise ex
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "../../../funasr/bin/train.py", line 48, in main_hydra
main(**kwargs)
File "../../../funasr/bin/train.py", line 196, in main
trainer.validate_epoch(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/funasr/train_utils/trainer.py", line 432, in validate_epoch
retval = model(**batch)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/funasr/models/seaco_paraformer/model.py", line 122, in forward
assert text_lengths.dim() == 1, text_lengths.shape
AssertionError: torch.Size([])
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 197344) of binary: /home/chushaobo/anaconda3/envs/funasrv2/bin/python
Traceback (most recent call last):
File "/home/chushaobo/anaconda3/envs/funasrv2/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/chushaobo/anaconda3/envs/funasrv2/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
../../../funasr/bin/train.py FAILED
Failures:
[1]:
time : 2024-04-17_15:23:05
host : chushaobo
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 197345)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2024-04-17_15:23:05
host : chushaobo
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 197344)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Update the source with "git pull" followed by "pip install -e ./". Your code has not been updated.
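One detail in the traceback above is consistent with this: train.py is run from the checkout (../../../funasr/bin/train.py), while trainer.py and model.py are loaded from site-packages/funasr, so a "git pull" alone would not change the code that actually runs. That is a reading of the log, not something confirmed in this thread. A minimal sketch for checking where a package is imported from, using only the standard library (`module_origin` and `loaded_from_site_packages` are hypothetical helper names):

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it cannot be located."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def loaded_from_site_packages(path: str) -> bool:
    """True if the path lies inside a site-packages directory (a regular, non-editable install)."""
    return "site-packages" in Path(path).parts


if __name__ == "__main__":
    origin = module_origin("funasr")
    if origin is None:
        print("funasr could not be located in this environment")
    elif loaded_from_site_packages(origin):
        print(f"funasr is imported from a site-packages copy: {origin}")
        print("a git pull in the checkout will not affect it until it is reinstalled")
    else:
        print(f"funasr is imported from: {origin}")
```

After an editable install with "pip install -e ./", the reported path would normally point into the source checkout instead of site-packages.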
Thank you very much. In my tests training now works. I will train for a few days and see how it performs.