paddlepaddle / paddleclas
A treasure chest for visual classification and recognition powered by PaddlePaddle
License: Apache License 2.0
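It seems the config does not allow setting the delimiter for the data list. File names in my dataset contain spaces, so being able to use | would be very convenient.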
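A minimal sketch of what a configurable delimiter could look like when parsing a data list. The function name `parse_list_line` and its `delimiter` parameter are hypothetical and not part of PaddleClas; the sketch assumes each line holds an image path followed by an integer label.

```python
def parse_list_line(line, delimiter="|"):
    """Split one data-list line into (image_path, label).

    Hypothetical helper: it splits on the LAST delimiter so that the
    path itself may contain spaces.
    """
    path, label = line.rstrip("\n").rsplit(delimiter, 1)
    return path.strip(), int(label)


samples = [parse_list_line(l) for l in ["my photos/cat 01.jpg|3", "dog.jpg | 7"]]
print(samples)
```

Splitting from the right also keeps a space-delimited list usable when only the path contains spaces, for example `parse_list_line("a b.jpg 5", delimiter=" ")`.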
Although we provide working run scripts, errors often occur when users run their own scripts without exporting PYTHONPATH.
Hello, on AI Studio I converted my trained model to an inference model, and then got an error at inference time:
!export PYTHONPATH=./:$PYTHONPATH && python tools/infer/predict.py
-m=./inference/ResNet50_vd/model
-p=./inference/ResNet50_vd/params
-i=./dataset/flowers102/jpg/image_02275.jpg
--use_gpu=1
--use_tensorrt=True
The error message is as follows:
Traceback (most recent call last):
File "tools/infer/predict.py", line 156, in <module>
main()
File "tools/infer/predict.py", line 110, in main
predictor = create_predictor(args)
File "tools/infer/predict.py", line 66, in create_predictor
predictor = create_paddle_predictor(config)
paddle.fluid.core_avx.EnforceNotMet:
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::framework::ir::PassRegistry::Get(std::string const&) const
3 paddle::inference::analysis::IRPassManager::CreatePasses(paddle::inference::analysis::Argument*, std::vector<std::string, std::allocator<std::string> > const&)
4 paddle::inference::analysis::IRPassManager::IRPassManager(paddle::inference::analysis::Argument*)
5 paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
6 paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
7 paddle::AnalysisPredictor::OptimizeInferenceProgram()
8 paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
9 paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
10 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
11 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&)
Error: Pass tensorrt_subgraph_pass has not been registered at (/paddle/paddle/fluid/framework/ir/pass.h:201)
How can I fix this?
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch
--selected_gpus="0"
tools/train.py
-c ./configs/quick_start/ResNet50_vd.yaml
After training the model with the command above, I converted it with export_model:
python tools/export_model.py --model=ResNet50_vd --pretrained_model=output/ResNet50_vd/19/ --output_path=inference/ResNet50_vd --class_dim=102
This fails with:
2020-05-09 14:36:17,701-WARNING: output/ResNet50_vd/19/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-09 14:36:17,701-WARNING: output/ResNet50_vd/19/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-09 14:36:17,703-WARNING: variable file [ output/ResNet50_vd/19/ppcls.pdopt output/ResNet50_vd/19/ppcls.pdparams output/ResNet50_vd/19/ppcls.pdmodel ] not used
2020-05-09 14:36:17,703-WARNING: variable file [ output/ResNet50_vd/19/ppcls.pdopt output/ResNet50_vd/19/ppcls.pdparams output/ResNet50_vd/19/ppcls.pdmodel ] not used
/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:804: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
warnings.warn(error_info)
/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/export_model.py", line 78, in <module>
main()
File "tools/export_model.py", line 74, in main
params_filename='params')
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 1245, in save_inference_model
save_persistables(executor, save_dirname, main_program, params_filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 640, in save_persistables
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 295, in save_vars
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 350, in save_vars
executor.run(save_program)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
six.reraise(*sys.exc_info())
File "/home/lishi/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
use_program_cache=use_program_cache)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 831, in _run_impl
use_program_cache=use_program_cache)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 905, in _run_program
fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::framework::Tensor::type() const
3 paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CPUPlace, false, 0ul, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, float>, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, double>, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, int>, paddle::operators::SaveCombineOpKernel<paddle::platform::CPUDeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
9 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 343, in save_vars
'save_to_memory': save_to_memory
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 295, in save_vars
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 640, in save_persistables
filename=filename)
File "/home/lishi/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 1245, in save_inference_model
save_persistables(executor, save_dirname, main_program, params_filename)
File "tools/export_model.py", line 74, in main
params_filename='params')
File "tools/export_model.py", line 78, in <module>
main()
Error: Tensor not initialized yet when Tensor::type() is called.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.h:140)
[operator < save_combine > error]
my infer.sh:
export PYTHONPATH=$PWD:$PYTHONPATH
python -m paddle.distributed.launch
--selected_gpus="0"
tools/infer/infer.py -i "dataset/FGVC2020_SSFGRC/test/26.jpg"
-m "SENet154_vd"
-p "output/expr20_SENet154_vd_train_bestv1_25971.txt_val2000_val2750_78.84"
ERROR:
Traceback (most recent call last):
File "tools/infer/infer.py", line 121, in <module>
main()
File "tools/infer/infer.py", line 113, in main
return_numpy=False)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 790, in run
six.reraise(*sys.exc_info())
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 785, in run
use_program_cache=use_program_cache)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 838, in _run_impl
use_program_cache=use_program_cache)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 909, in _run_program
self._feed_data(program, feed, feed_var_name, scope)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 591, in _feed_data
check_feed_shape_type(var, cur_feed)
File "/home/daibing/software/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 230, in check_feed_shape_type
(var.name, len(var.shape), var.shape, feed_shape))
ValueError: The fed Variable u'image' should have dimensions = 4, shape = (-1L, 3L, 224L, 224L), but received fed shape [3L, 224L, 224L] on each device
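The message says the network expects a 4-D input of shape (-1, 3, 224, 224) but received a single 3-D image. A minimal NumPy sketch of adding the missing batch axis before feeding follows; the random array is only a stand-in for a preprocessed image, and whether this is the root cause in the reporter's setup is not confirmed.

```python
import numpy as np

# Stand-in for one preprocessed image in CHW layout (assumption).
image = np.random.rand(3, 224, 224).astype("float32")

# Add a leading batch axis: (3, 224, 224) -> (1, 3, 224, 224).
batch = np.expand_dims(image, axis=0)
print(batch.shape)
```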
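I would like to ask whether the TensorRT engine file can be exported on its own. I want to use the TRT model more flexibly, for example with DeepStream.
Error: File "tools/train.py", line 124, in <module>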
main(args)
InvalidArgumentError: If Attr(soft_label) == true, Input(X) and Input(Label) shall have the same dimensions. But received: the dimensions of Input(X) is [2],the shape of Input(X) is [-1, 2], the dimensions of Input(Label) is [3], the shape ofInput(Label) is [-1, 1, 2]
[Hint: Expected rank == label_dims.size(), but received rank:2 != label_dims.size():3.] at (D:\1.8.1\paddle\paddle\fluid\operators\cross_entropy_op.cc:63)
[operator < cross_entropy > error]
INFO 2020-05-23 18:17:34,812 utils.py:272] terminate all the procs
ERROR 2020-05-23 18:17:34,812 utils.py:416] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2020-05-23 18:17:34,813 utils.py:272] terminate all the procs
The images are 512*512 PNG, 8-bit depth, with classes 1, 2 and 3. What do rank and label_dims.size() in this error mean?
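Reading the message itself: `rank` is the number of dimensions of Input(X) (2 for shape [-1, 2]) and `label_dims.size()` is the number of dimensions of Input(Label) (3 for shape [-1, 1, 2]). A small NumPy sketch of that mismatch, with a batch size of 4 chosen arbitrarily; it illustrates the shapes only and does not diagnose the reporter's data pipeline.

```python
import numpy as np

batch_size = 4  # arbitrary, for illustration

x = np.zeros((batch_size, 2), dtype="float32")         # like Input(X): [-1, 2]
label = np.zeros((batch_size, 1, 2), dtype="float32")  # like Input(Label): [-1, 1, 2]

print(x.ndim, label.ndim)  # the two numbers the check compares

# Dropping the extra axis makes the number of dimensions match.
label_2d = label.reshape(batch_size, 2)
print(label_2d.ndim == x.ndim)
```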
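Not found in either the documentation or model_zoo.
Hello, I fine-tuned from the ResNet50_vd_ssld pretrained model (SSLD fine-tuning), generated an inference model, and now want to deploy it with PaddleHub. Is there a quick way to piggyback on an existing module, for example by swapping in my inference model under that module, so that deployment can be started directly?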
`is_test` is not set correctly in EfficientNet, so drop_connect stays active at test time. This is easy to reproduce by running inference repeatedly on the same image: the predicted probabilities differ between runs.
The likely cause is that `is_test` defaults to False in EfficientNet and is not set to True in either `infer.py` or `predict.py`.
Moreover, the duplicated definition of `is_test` in both `__init__` and `net` is confusing. In fact, `_drop_connect` uses `self.is_test`, and the `is_test` passed through the methods is not used.
It would be better to fix this.
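To illustrate why a wrong `is_test` makes predictions vary between runs, here is a NumPy sketch of drop connect applied per sample. It is a generic illustration, not the repository's `_drop_connect`; the function signature and the drop probability of 0.2 are assumptions.

```python
import numpy as np


def drop_connect(x, prob, is_test, rng):
    """Randomly zero whole samples during training; identity at test time."""
    if is_test:
        return x
    keep_prob = 1.0 - prob
    # One Bernoulli(keep_prob) draw per sample, broadcast over C, H, W.
    mask = np.floor(keep_prob + rng.uniform(size=(x.shape[0], 1, 1, 1)))
    return x / keep_prob * mask


rng = np.random.default_rng(0)
x = np.ones((8, 2, 4, 4), dtype="float32")

test_a = drop_connect(x, 0.2, True, rng)
test_b = drop_connect(x, 0.2, True, rng)
train_out = drop_connect(x, 0.2, False, rng)

print(np.array_equal(test_a, test_b))  # test mode is deterministic
print(np.unique(train_out))            # train mode values are 0 or 1 / keep_prob
```

With `is_test=False` the random mask is redrawn on every call, which is consistent with probabilities that change from run to run.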
!python tools/download.py -a ResNet50_vd -p ./pretrained -d True
!python tools/download.py -a ResNet50_vd_ssld -p ./pretrained -d True
!python tools/download.py -a MobileNetV3_large_x1_0 -p ./pretrained -d True
Traceback (most recent call last):
File "tools/download.py", line 17, in <module>
from ppcls import model_zoo
ModuleNotFoundError: No module named 'ppcls'
Traceback (most recent call last):
File "tools/download.py", line 17, in <module>
from ppcls import model_zoo
ModuleNotFoundError: No module named 'ppcls'
Traceback (most recent call last):
File "tools/download.py", line 17, in <module>
from ppcls import model_zoo
ModuleNotFoundError: No module named 'ppcls'
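The error indicates that the repository root is not on the module search path. In a notebook, each `!` command runs in its own subshell, so an `export` does not carry over to later commands, while changes to `os.environ` in the kernel are inherited by child processes. A sketch under the assumption that the notebook's working directory is the PaddleClas repository root (the helper name `prepend_pythonpath` is hypothetical):

```python
import os


def prepend_pythonpath(repo_root, env=os.environ):
    """Prepend repo_root to PYTHONPATH in the given environment mapping."""
    repo_root = os.path.abspath(repo_root)
    current = env.get("PYTHONPATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if repo_root not in parts:
        parts.insert(0, repo_root)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env["PYTHONPATH"]


# Child processes started afterwards inherit the updated variable.
print(prepend_pythonpath("."))
```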
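Does PaddleClas currently provide C++ deployment code, and are there performance results for C++ deployment?
Hello, I measured ResNet50_vd on a V100 at close to 24 ms. How did you measure your figure of under 5 ms?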
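Latency figures depend heavily on how they are measured (batch size, warm-up, whether preprocessing is included, TensorRT or FP16 settings). The question does not state these, so the following is only a generic timing harness sketch; `run_once` is a hypothetical stand-in for one forward pass.

```python
import time


def average_latency_ms(run_once, warmup=10, repeats=100):
    """Average wall-clock time of run_once in milliseconds, after warm-up."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats * 1000.0


# Dummy workload standing in for a model forward pass.
print(average_latency_ms(lambda: sum(range(1000))))
```

On a GPU, `run_once` would also need to wait until the result is actually available, otherwise only the launch time is measured.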
In `operators.py`, it seems that `to_np`, `order` and `channel_first` are not necessary, since we already have a ToCHWImage function.
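For reference, the layout conversion that a dedicated operator such as ToCHWImage performs can be sketched in NumPy as below. This is an illustrative stand-in with a hypothetical function name, not the code in `operators.py`.

```python
import numpy as np


def to_chw(img):
    """Transpose an image from HWC to CHW layout."""
    return np.asarray(img).transpose((2, 0, 1))


hwc = np.zeros((224, 224, 3), dtype="uint8")
print(to_chw(hwc).shape)
```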
2020-05-13 23:57:14 INFO: ARCHITECTURE :
2020-05-13 23:57:14 INFO: name : ResNet50_vd
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: LEARNING_RATE :
2020-05-13 23:57:14 INFO: function : Cosine
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: lr : 0.00375
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: OPTIMIZER :
2020-05-13 23:57:14 INFO: function : Momentum
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: momentum : 0.9
2020-05-13 23:57:14 INFO: regularizer :
2020-05-13 23:57:14 INFO: factor : 1e-06
2020-05-13 23:57:14 INFO: function : L2
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: TRAIN :
2020-05-13 23:57:14 INFO: batch_size : 32
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513train.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: RandCropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: RandFlipImage :
2020-05-13 23:57:14 INFO: flip_code : 1
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1./255.
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: VALID :
2020-05-13 23:57:14 INFO: batch_size : 20
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513test.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: ResizeImage :
2020-05-13 23:57:14 INFO: resize_short : 256
2020-05-13 23:57:14 INFO: CropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1.0/255.0
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: classes_num : 3
2020-05-13 23:57:14 INFO: epochs : 20
2020-05-13 23:57:14 INFO: image_shape : [3, 224, 224]
2020-05-13 23:57:14 INFO: mode : train
2020-05-13 23:57:14 INFO: model_save_dir : E:/projects/PaddleClas-master/output/
2020-05-13 23:57:14 INFO: pretrained_model : E:/projects/PaddleClas-master/ResNet50_vd_pretrained
2020-05-13 23:57:14 INFO: save_interval : 1
2020-05-13 23:57:14 INFO: topk : 5
2020-05-13 23:57:14 INFO: total_images : 795
2020-05-13 23:57:14 INFO: valid_interval : 1
2020-05-13 23:57:14 INFO: validate : True
API is deprecated since 2.0.0 Please use FleetAPI instead.
WIKI: https://github.com/PaddlePaddle/Fleet/blob/develop/markdown_doc/transpiler
Traceback (most recent call last):
File "tools/train.py", line 124, in <module>
main(args)
File "tools/train.py", line 69, in main
config, train_prog, startup_prog, is_train=True)
File "E:\projects\PaddleClas-master\tools\program.py", line 341, in build
optimizer.minimize(fetchs['loss'][0])
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 424, in minimize
fleet.main_program = self._try_to_compile(startup_program, main_program)
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 358, in _try_to_compile
self._transpile(startup_program, main_program)
File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 285, in _transpile
current_endpoint=current_endpoint)
File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 625, in transpile
wait_port=self.config.wait_port)
File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 397, in _transpile_nccl2
self.config.hierarchical_allreduce_inter_nranks
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1797, in __init__
proto = OpProtoHolder.instance().get_op_proto(type)
File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1679, in get_op_proto
raise ValueError("Operator \"%s\" has not been registered." % type)
ValueError: Operator "gen_nccl_id" has not been registered.
2020-05-13 15:57:16,981-ERROR: ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
ERROR 2020-05-13 15:57:16,981 launch.py:284] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
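What is the problem here?
As the title says: hello developers, does the dynamic-graph version of this library currently run correctly? In which respects is it not yet aligned with the static-graph version?
Now that CI is set up, the unit tests can be reorganized, for example: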
|—— ppcls
|
|—— test
|————|———— test_reader.py
|————|———— test_imaug.py
|————|———— test_download.py
|————|———— test_compress.py
|————|———— test_model.py
|————|———— test_speed.py
|————|———— test_finetune.py
|————|———— test_eval.py
|————|———— test_train.py
|————|———— test_infer.py
|————|———— test_performance.py (IMPORTANT)
|————|———— test_export.py
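When creating the 200-layer Res2Net model, Python 2 raises:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 4: invalid start byte
Because the number of layers exceeds the 26 letters of the alphabet, the naming in the code breaks:
conv_name = "res" + str(block+2) + chr(97+i)
The preceding branch in the code should also cover Res2Net200: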
if layers in [101, 152, 200] and block == 2:
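The arithmetic behind the error can be checked directly: 97 + 31 = 128 = 0x80, and in "res4" + chr(128) the offending byte sits at position 4, matching the message. The sketch below reproduces the two branches in a hypothetical helper `conv_name`; the numeric suffix scheme ("a", "b1", "b2", ...) in the first branch is an assumption about what that branch does, since its body is not quoted above.

```python
def conv_name(layers, block, i):
    """Hypothetical naming helper mirroring the two branches quoted above."""
    if layers in [101, 152, 200] and block == 2:
        # Assumed numeric suffixes: res4a, res4b1, res4b2, ...
        return "res" + str(block + 2) + ("a" if i == 0 else "b" + str(i))
    # Letter suffixes: only safe while i < 26.
    return "res" + str(block + 2) + chr(97 + i)


print(ord(chr(97 + 31)))       # code point reached at i = 31 with letter suffixes
print(conv_name(200, 2, 31))   # name produced by the numeric-suffix branch
```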
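The FAQ says: "After launching, logs are written in real time to mylog/workerlog.*, where you can follow them."
But why can't I find the mylog folder after running? Also, how can I visualize the training process?
Following your tutorial, I get AssertionError: can't find PADDLE_TRAINER_ENDPOINTS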
python tools/eval.py
-c ./configs/eval.yaml
-o ARCHITECTURE.name="ResNet50_vd"
-o pretrained_model=output/ResNet50_vd/19/ppcls
I am training WRN-28-10 on CIFAR10 using PaddleClas. Once the batch size exceeds 128, training gets slower as the batch size increases. A detailed comparison is shown below.
Batch Size | Time (Per Epoch) |
---|---|
32 | 82.2s |
64 | 72.8s |
128 | 68.5s |
256 | 74.1s |
512 | 86.4s |
1024 | 110.5s |
The time of the 2nd epoch is reported, so warm-up time is not counted. Experiments showed that the results were consistent.
This behavior is strange and unexpected. Could you help me to find the reason?
Code to reproduce is here.
Thank you very much!
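To make the table easier to compare, the per-epoch times can be converted to throughput. The sketch assumes one epoch covers the 50,000 CIFAR-10 training images; the report itself does not state the epoch size.

```python
# Per-epoch times from the table above (batch size -> seconds).
epoch_seconds = {32: 82.2, 64: 72.8, 128: 68.5, 256: 74.1, 512: 86.4, 1024: 110.5}

# Assumption: one epoch covers the 50,000 CIFAR-10 training images.
NUM_TRAIN_IMAGES = 50_000

throughput = {bs: NUM_TRAIN_IMAGES / t for bs, t in epoch_seconds.items()}
for bs, ips in throughput.items():
    print(f"batch {bs:>4}: {ips:7.1f} images/s")

best = max(throughput, key=throughput.get)
print("fastest batch size:", best)
```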
Running the command below on Windows 10 in a CPU-only environment produces the error above:
python -m paddle.distributed.launch tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml
Could PaddleClas provide CPU support?
My laptop runs Windows 10 x64 with an NVIDIA GeForce GTX 1650 graphics card.
I wrote the training command following the example, as follows: python -m paddle.distributed.launch
--selected_gpus="0"
tools/train.py
-c ./configs/quick_start/ResNet50_vd.yaml
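The result is the message "gen_nccl_id" has not been registered. I asked in the QQ group and was told that Windows does not support multi-GPU training. Given my setup, how should I write the training command?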
The concept of `place` is confusing when someone tries to set the available GPU places through CUDA_VISIBLE_DEVICES.
With the Fleet interface, only FLAGS_selected_gpus takes effect,
so we have to obtain the GPU count with:
gpu_num = paddle.fluid.core.get_cuda_device_count() if (
'PADDLE_TRAINERS_NUM') and (
'PADDLE_TRAINER_ID'
) not in env else int(env.get('PADDLE_TRAINERS_NUM', 0))
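One thing worth checking in the expression above is operator precedence: `('A') and ('B') not in env` does not test both keys. A non-empty string is truthy, so the condition reduces to `'B' not in env`. A small sketch with a plain dict standing in for the environment (the dict contents are made up for illustration):

```python
# Only PADDLE_TRAINERS_NUM is missing here; PADDLE_TRAINER_ID is present.
env = {"PADDLE_TRAINER_ID": "0"}

# As written in the snippet: only the second key is actually tested.
as_written = ("PADDLE_TRAINERS_NUM") and ("PADDLE_TRAINER_ID") not in env

# Explicit version: true when either key is missing.
either_missing = "PADDLE_TRAINERS_NUM" not in env or "PADDLE_TRAINER_ID" not in env

print(as_written, either_missing)
```

Which explicit form matches the intended logic (either key missing, or both) is for the maintainers to decide; the sketch only shows that the two readings differ.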
Hello, I ran into the following problem when running inference with the infer script:
First infer: class id: 1, probability: 0.9075
Second infer: class id: 1, probability: 0.9048
Third infer: class id: 1, probability: 0.9069
This is my run script:
export PYTHONPATH=$PWD:$PYTHONPATH
export CUDA_VISIBLE_DEVICES=0
#--model=EfficientNetB0 --pretrained_model=output/EfficientNetB0_val/best_model_in_epoch_124/ppcls --output_paht=./convert
python tools/infer/infer.py
--image_file=./tools/img.jpg
--model=EfficientNetB0
--pretrained_model=output/EfficientNetB0_val/best_model_in_epoch_124/ppcls \
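My only change was to drop the resize_short mode in the resize step and resize the image directly to 288.
If anyone has run into this, please help me out. Thanks!
Training command, using Baidu's ResNet50_vd_10w pretrained model: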
set CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch --selected_gpus="0" tools/train.py -c ./configs/quick_start/ResNet50_vd_10w_finetune.yaml
Error:
Traceback (most recent call last):
File "tools/train.py", line 150, in <module>
main(args)
File "tools/train.py", line 75, in main
config, train_prog, startup_prog, is_train=True)
File "F:\pythonproject\PaddleClas\PaddleClas\tools\program.py", line 363, in build
optimizer.minimize(fetchs['loss'][0])
File "F:\Anaconda3\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 652, in minimize
fleet.main_program = self._try_to_compile(startup_program, main_program)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 562, in _try_to_compile
self._transpile(startup_program, main_program)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 489, in _transpile
current_endpoint=current_endpoint)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 625, in transpile
wait_port=self.config.wait_port)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 397, in _transpile_nccl2
self.config.hierarchical_allreduce_inter_nranks
File "F:\Anaconda3\lib\site-packages\paddle\fluid\framework.py", line 2610, in append_op
attrs=kwargs.get("attrs", None))
File "F:\Anaconda3\lib\site-packages\paddle\fluid\framework.py", line 1870, in __init__
proto = OpProtoHolder.instance().get_op_proto(type)
File "F:\Anaconda3\lib\site-packages\paddle\fluid\framework.py", line 1751, in get_op_proto
raise ValueError("Operator \"%s\" has not been registered." % type)
ValueError: Operator "gen_nccl_id" has not been registered.
INFO 2020-06-22 11:29:30,706 utils.py:272] terminate all the procs
ERROR 2020-06-22 11:29:30,706 utils.py:416] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2020-06-22 11:29:30,706 utils.py:272] terminate all the procs
The ResNet50_vd_10w_finetune.yaml config is as follows:
mode: 'train'
ARCHITECTURE:
    name: 'ResNet50_vd'
pretrained_model: "F:/pythonproject/PaddleClas/PaddleClas/ResNet50_vd_10w_pretrained/ResNet50_vd_10w_pretrained"
model_save_dir: "./output/"
classes_num: 5
total_images: 11745
save_interval: 1
validate: True
valid_interval: 1
epochs: 20
topk: 2
image_shape: [3, 224, 224]
LEARNING_RATE:
    function: 'Cosine'
    params:
        lr: 0.00375
OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
    regularizer:
        function: 'L2'
        factor: 0.000001
TRAIN:
    batch_size: 32
    num_workers: 4
    file_list: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/train_list.txt"
    data_dir: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
VALID:
    batch_size: 20
    num_workers: 4
    file_list: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/val_list.txt"
    data_dir: "F:/pythonproject\PaddleClas/PaddleClas/dataset/driver/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
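Is there a version that actually runs end to end? Please give me a link.
Hello, if the training data is imbalanced (skewed class distribution), does PaddleClas currently offer a way to handle it? Thanks.
Paddle environment: 1.7.2, CUDA 9.0, cuDNN 7.5
Running the command /home/vis/duyuting/app/anaconda3/bin/python -m paddle.distributed.launch --selected_gpus="0" tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml fails with: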
Error: Failed to find dynamic library: libnccl.so ( /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/vis/duyuting/app/nccl_2.5.6-1+cuda10.0_x86_64/lib/libnccl.so) )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled. at (/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:177)
[operator < gen_nccl_id > error] This looks like an NCCL problem.
After downloading the CUDA 9 build of NCCL from the official website, the error becomes:
Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.
Some models' speeds are different
Mixed precision training is available in PaddleCV/image_classification but not in this repo. According to Release Notes of PaddlePaddle 1.7, AMP interfaces have been added.
Based on these, I think it would be convenient to implement it.
Mixed precision training is critical to fast training on V100. Please consider adding it. Thank you!
UnavailableError: Load operator fail to open file pretrained/ResNet50_vd_10w_pretrained/fc_0.w_0, please check whether the model file is complete or damaged.
[Hint: Expected static_cast<bool>(fin) == true, but received static_cast<bool>(fin):0 != true:1.] at (/paddle/paddle/fluid/operators/load_op.h:41)
[operator < load > error]
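A question for the experts:
1. The following command seems to run inference on a single image only. How can I run inference on a whole folder, similar to specifying an infer_dir in PaddleDetection?
python tools/infer/predict.py
-m <path to model file>
-p <path to params file>
-i <path to image>
--use_gpu=1
--use_tensorrt=True
2. On Windows, how do I set environment variables? The command I use on AI Studio is not recognized by the Windows terminal: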
export PYTHONPATH=$PWD:$PYTHONPATH
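For the first question, one possible approach is to collect the image paths from a directory and run the single-image step on each. The sketch below covers only the path collection; the helper name `list_images` and the extension list are assumptions, and nothing here is an existing option of `predict.py`.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(path):
    """Return a sorted list of image files for a file or a directory path."""
    if os.path.isfile(path):
        return [path]
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.lower().endswith(IMAGE_EXTS)
    )


# Each returned path could then be passed to the single-image prediction step.
for image_path in list_images("."):
    print(image_path)
```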
Dear developers, hello!
Has PaddlePaddle tried using HRNet for the ImageNet classification task?
Looking forward to your reply!
aistudio@jupyter-305239-473669:~/work/PaddleClas$ python tools/infer/predict.py -m output_ca/ResNet50_vd/last/model -p output_ca/ResNet50_vd/last/params -i ./test0.jpg --use_gpu=1
Traceback (most recent call last):
File "tools/infer/predict.py", line 160, in <module>
main()
File "tools/infer/predict.py", line 121, in main
inputs = preprocess(args.image_file, operators)
File "tools/infer/predict.py", line 88, in preprocess
data = open(fname).read()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
What is the problem?
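The traceback shows `open(fname).read()` decoding the image as text under Python 3. JPEG files begin with byte 0xFF, which is not a valid UTF-8 start byte, so text mode fails while binary mode returns the raw bytes. A sketch of the difference, using a temporary file as a stand-in for `./test0.jpg`:

```python
import os
import tempfile

# Write the first bytes of a JPEG file (0xFF 0xD8 0xFF) to a temporary file.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"\xff\xd8\xff")
    fname = f.name

try:
    with open(fname, encoding="utf-8") as f:  # text mode, as in the traceback
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False

with open(fname, "rb") as f:  # binary mode returns raw bytes
    data = f.read()

os.remove(fname)
print(text_mode_ok, len(data))
```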
Error: Pass tensorrt_subgraph_pass has not been registered at (/paddle/paddle/fluid/framework/ir/pass.h:170)
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
/home/pd_source/cla/ppcls/data/reader.py(191)reader()
-> for line in full_lines:
(Pdb)
/home/pd_source/cla/ppcls/data/reader.py(191)reader()
-> for line in full_lines:
Process Process-2:
(Pdb)
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
/home/pd_source/cla/ppcls/data/reader.py(191)reader()
-> for line in full_lines:
(Pdb)
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
2020-05-27 14:43:10 WARNING: Your reader has raised an exception!
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1156, in thread_main
six.reraise(*sys.exc_info())
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1136, in thread_main
for tensors in self._tensor_reader():
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1206, in tensor_reader_impl
for slots in paddle_reader():
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/data_feeder.py", line 506, in reader_creator
for item in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 267, in wrapper
for idx, sample in enumerate(reader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 572, in queue_reader
raise ValueError("multiprocess reader raises an exception")
ValueError: multiprocess reader raises an exception
/home/pd_source/cla/ppcls/data/reader.py(191)reader()
-> for line in full_lines:
(Pdb)
Process Process-4:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
/home/pd_source/cla/ppcls/data/reader.py(191)reader()
-> for line in full_lines:
(Pdb)
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/home/pd_source/cla/ppcls/data/reader.py", line 191, in reader
for line in full_lines:
File "/usr/lib/python3.5/bdb.py", line 48, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
Traceback (most recent call last):
File "./jaits_utils/task_tools.py", line 494, in inner
func(jif,*args, **kwargs)
File "cla/jaits_train.py", line 215, in main
epoch_id, 'train')
File "/home/pd_source/cla/program.py", line 413, in run
for idx, batch in enumerate(dataloader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1102, in next
return self._reader.read_next()
paddle.fluid.core_avx.EnforceNotMet:
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
2020-05-27 14:43:10 INFO: SO:exception-Traceback (most recent call last):
File "./jaits_utils/task_tools.py", line 494, in inner
func(jif,*args, **kwargs)
File "cla/jaits_train.py", line 215, in main
epoch_id, 'train')
File "/home/pd_source/cla/program.py", line 413, in run
for idx, batch in enumerate(dataloader()):
File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/reader.py", line 1102, in next
return self._reader.read_next()
paddle.fluid.core_avx.EnforceNotMet:
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor >)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocatorpaddle::framework::LoDTensor >)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
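The `bdb.BdbQuit` frames and the `(Pdb)` prompts in the log above suggest that a debugger breakpoint was active inside `reader()` while it ran in multiprocessing workers, where the debugger session ends and the blocking queue is then killed. One way to inspect a reader is to iterate it directly in the main process first. The sketch below is a minimal illustration under that assumption; `make_file_list_reader` and `peek` are hypothetical helpers, not part of PaddleClas or PaddlePaddle.

```python
import itertools


def make_file_list_reader(lines, delimiter=" "):
    """Build a reader over 'path label' lines (hypothetical stand-in for the real reader)."""
    def reader():
        for line in lines:
            # Split on the last delimiter only, so paths may contain the delimiter.
            path, label = line.strip().rsplit(delimiter, 1)
            yield path, int(label)
    return reader


def peek(reader, count=2):
    """Pull a few samples in the current process so any exception surfaces directly."""
    return list(itertools.islice(reader(), count))


samples = peek(make_file_list_reader(["a.jpg 0", "dir/b c.jpg 1"]))
print(samples)
```

Once the reader yields samples without errors in a single process, the breakpoint can be removed before the reader is wrapped for multiprocess loading.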
Following the tutorial, I exported the model with this command:
python tools/export_model.py \
--model=MobileNetV3_large_x1_0 \
--pretrained_model=./output/MobileNetV3_large_x1_0/best_model_in_epoch_7/ \
--output_path=./convert/
The export fails with the following error:
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 343, in save_vars
'save_to_memory': save_to_memory
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 295, in save_vars
filename=filename)
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 641, in save_persistables
filename=filename)
File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 1246, in save_inference_model
save_persistables(executor, save_dirname, main_program, params_filename)
File "tools/export_model.py", line 74, in main
params_filename='params')
File "tools/export_model.py", line 78, in <module>
main()
Error: Tensor not initialized yet when Tensor::type() is called.
[Hint: holder_ should not be null.] at (/paddle/paddle/fluid/framework/tensor.h:140)
[operator < save_combine > error]
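An uninitialized tensor at save time can mean that the parameters were never loaded, for example because the `--pretrained_model` path does not point at the saved weights. That is only an assumption about this report, not a confirmed diagnosis. A minimal pre-flight check, using a hypothetical helper `check_pretrained_dir` that is not part of PaddleClas, might look like this:

```python
import os


def check_pretrained_dir(path):
    """Return a list of problems found with a pretrained-model directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append("not a directory: " + path)
    elif not os.listdir(path):
        problems.append("directory is empty: " + path)
    return problems


# Path taken from the command above; an empty list means no problem was detected.
print(check_pretrained_dir("./output/MobileNetV3_large_x1_0/best_model_in_epoch_7/"))
```

The check only confirms that the directory exists and is not empty; it does not verify that the files match the model being exported.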
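I want to add an attention mechanism to HRNet, so I chose SE+HRNet. In another issue I was told that SE+HRNet needs pretrained weights that include the SE modules, and that directly loading a pretrained model without SE gives relatively low accuracy.
My questions:
1. Are pretrained weights available for SE+HRNet?
2. If not, how should I train it to get a good result? Are there any practical suggestions?
3. Are there other attention mechanisms that are easier to train than SE+HRNet when no pretrained model is available?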
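For context, a common way to reuse weights trained without SE is to load only the parameters whose names and shapes match and to leave the new SE layers at their random initialization. The sketch below shows that idea in plain Python with nested lists standing in for tensors. The parameter names (`conv1_weights`, `se_fc1_weights`) and the helpers `shape_of` and `merge_pretrained` are hypothetical and are not the PaddleClas API; whether this recovers the accuracy of a fully pretrained SE model is not something the sketch can show.

```python
def shape_of(value):
    """Return the shape of a nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def merge_pretrained(model_state, pretrained_state):
    """Copy pretrained values into the model state where name and shape match."""
    merged = dict(model_state)
    loaded, skipped = [], []
    for name, value in model_state.items():
        candidate = pretrained_state.get(name)
        if candidate is not None and shape_of(candidate) == shape_of(value):
            merged[name] = candidate
            loaded.append(name)
        else:
            # No matching pretrained tensor: keep the model's own initialization.
            skipped.append(name)
    return merged, loaded, skipped


model_state = {"conv1_weights": [[0.0, 0.0]], "se_fc1_weights": [[0.0]]}
pretrained_state = {"conv1_weights": [[1.0, 2.0]]}
merged, loaded, skipped = merge_pretrained(model_state, pretrained_state)
print("loaded:", loaded)
print("left at initialization:", skipped)
```

Printing the skipped names makes it easy to confirm that only the SE layers were left uninitialized by the pretrained weights.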