huawei-noah / vega
AutoML tools chain
Home Page: http://www.noahlab.com.hk/opensource/vega/
License: Other
As above, have you supplied a test file that works with video, camera, or image input? Thanks.
With reference to issue #14: can you please tell me where I can find the model SPNetXB_COCO_ImageNetPretrained.pth?
I want to run auto_lane according to the user guide, but I get errors. The configuration files have changed, but the recommended command has not been updated.
How do I configure the right files for lane detection?
How do I register class names before using them?
How can I study the CurveLane-NAS algorithm step by step?
For example, how should I configure the yaml file (which directory, which file), and how should I set the loss weights for the background and the lane lines? In my SCNN experiments, a background weight of 0.4 with lane-line weights of 1.0 may not suit some models: the mIoU is only about 0.105, so an mIoU threshold of 0.5 cannot be used to decide whether the corresponding lane line exists.
Please give the detailed parameters you used when training and testing on the CurveLanes dataset.
.........
I have read the user guide carefully, but I don't know how to handle the questions above.
Please solve these problems or update the corresponding guide.
How do I test the SP-NAS mAP on COCO with vega?
In super_network.py, you have
self.initializer()
Is that a mistake? If not, why do you initialize all the hyperparameters every time the model is called?
Probably because I am new to vega, I cannot find the wheel vega-0.9.1-py3-none-any.whl. The installation page says to download the vega-0.9.1-py3-none-any.whl file from the release directory.
Where is the release directory? Could someone kindly point it out?
I know we can build from source, but I would like to see the wheel.
In vega/docs/cn/developer/quick_start.md, the example reads:
@NetworkFactory.register(NetTypes.CUSTOM)
def __init__(self, desc):
    super(SimpleCnn, self).__init__()
It seems to be missing the class definition line between the decorator and __init__:
class SimpleCnn(nn.Module):
When I run quant_ea.yaml, it shows that:
Traceback (most recent call last):
File "", line 1, in
File "/root/.local/lib/python3.6/site-packages/zeus/trainer_base.py", line 153, in train_process
self._train_loop()
File "/root/.local/lib/python3.6/site-packages/zeus/trainer_base.py", line 279, in _train_loop
self.callbacks.before_train()
File "/root/.local/lib/python3.6/site-packages/zeus/trainer/callbacks/callback_list.py", line 139, in before_train
callback.before_train(logs)
File "/root/.local/lib/python3.6/site-packages/vega/algorithms/compression/quant_ea/quant_trainer_callback.py", line 62, in before_train
self.latency_count = calc_forward_latency(model, count_input, sess_config)
File "/root/.local/lib/python3.6/site-packages/zeus/metrics/forward_latency.py", line 31, in calc_forward_latency
step_cfg = UserConfig().data.get("nas")
AttributeError: 'NoneType' object has no attribute 'get'
Hi, thank you for the great work. I want to run pba with my own data and model. Is there any doc to quickly tell me how to use my own backbone or trainer?
Could you explain how to run inference.py to run inference with the auto lane model?
In model_zoo.md, only the model and desc files are provided, but no inference code.
I tried to use vega/model_zoo/inference.py, but I don't know how to set data_type and data_path.
A document on auto lane inference would help a lot.
When will K8S deployment be supported with documentation on how to use it?
Thanks!
2020-08-28 23:18:40.631 ERROR Failed to run pipeline.
Traceback (most recent call last):
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/pipeline.py", line 52, in run
PipeStep().do()
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/nas_pipe_step.py", line 43, in do
self._dispatch_trainer(res)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/nas_pipe_step.py", line 73, in _dispatch_trainer
self.master.run(trainer)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/scheduler/local_master.py", line 42, in run
worker.train_process()
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 152, in train_process
self.build(model=self.model, hps=self.hps, load_ckpt_flag=self.load_ckpt_flag)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 189, in build
mode='train', loader=train_loader)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 360, in _init_dataloader
dataset = dataset_cls(mode=mode)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/datasets/pytorch/cifar10.py", line 41, in __init__
transform=Compose(self.transforms.transform), download=self.args.download)
File "/root/.local/lib/python3.7/site-packages/torchvision/datasets/cifar.py", line 67, in __init__
raise RuntimeError('Dataset not found or corrupted.' +
RuntimeError: Dataset not found or corrupted. You can use download=True to download it
2020-08-28 23:18:40.631 ERROR None
I am trying to run a search with the CARS algorithm on the cifar10 dataset. The dataset path is set correctly, but the error above occurs. Why?
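A minimal sketch for diagnosing the "Dataset not found or corrupted" error above. torchvision's CIFAR10 class looks for an extracted cifar-10-batches-py folder directly under the root directory it is given, and it also verifies file checksums, so a path pointing one level too deep or a partially copied dataset can trigger the same message. The helper name missing_cifar10_files is hypothetical and is not part of vega or torchvision; it only checks that the expected files exist, not that their checksums match.

```python
from pathlib import Path
from typing import List

# File names torchvision expects inside <root>/cifar-10-batches-py
EXPECTED_FILES = [
    "data_batch_1",
    "data_batch_2",
    "data_batch_3",
    "data_batch_4",
    "data_batch_5",
    "test_batch",
    "batches.meta",
]


def missing_cifar10_files(root: str) -> List[str]:
    """Return the expected CIFAR-10 files that are absent under root.

    Hypothetical helper: an empty list means the layout looks right,
    although the files could still fail torchvision's checksum test.
    """
    base = Path(root) / "cifar-10-batches-py"
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

Calling missing_cifar10_files with the same path as the dataset setting in the yml file shows whether the directory level is the problem before the pipeline is rerun.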
Thanks for open-sourcing and continuously maintaining VEGA. Why is it named VEGA? Is there a story behind the name? :)
Only .whl mode seems to be supported.
Could you please provide a BibTeX entry for vega, or should we cite the URL?
May I know when version 1.0 will be released?
When I run run_example.py, the output is:
ERROR Illegal alpha.
Then I looked at the source:
idx = torch.argmax(alpha[start:end, :], dim=1)
cnt = 0
if torch.nonzero(idx).size(0) > 2:
    logger.error("Illegal alpha.")
The shape of torch.nonzero(idx) is torch.Size([5, 1]), torch.Size([4, 1]) or torch.Size([3, 1]).
If you want to limit the number of connections, you could use:
if sum(alphaalpha[start:end, :]) > 2:
Thanks.
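To make the quoted check easier to reason about, here is a pure-Python sketch of what it computes: take the arg-max column of every row, then count the rows whose arg-max is not column 0. That column 0 stands for a "no connection" choice is an assumption made for illustration, and the function names are hypothetical; the sketch does not use torch and is not vega's code.

```python
from typing import List, Sequence


def argmax_per_row(rows: Sequence[Sequence[float]]) -> List[int]:
    """Index of the largest value in each row (first index on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def count_nonzero_choices(rows: Sequence[Sequence[float]]) -> int:
    """Number of rows whose arg-max column is not column 0.

    This mirrors torch.nonzero(torch.argmax(alpha, dim=1)).size(0).
    """
    return sum(1 for idx in argmax_per_row(rows) if idx != 0)


# Three rows (edges), two columns (choices); column 0 is assumed to mean "none"
alpha_rows = [
    [0.1, 0.9],  # arg-max 1, counted
    [0.8, 0.2],  # arg-max 0, not counted
    [0.3, 0.7],  # arg-max 1, counted
]
active = count_nonzero_choices(alpha_rows)  # 2 for this input
```

With shapes such as torch.Size([5, 1]) reported above, the count exceeds 2 and the "Illegal alpha." branch is taken.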
when I set parallel_search: True in prune.yml, I get this error
Traceback (most recent call last):
File "", line 1, in
File "/wn/vega/zeus/trainer_base.py", line 153, in train_process
self._train_loop()
File "/wn/vega/zeus/trainer_base.py", line 279, in _train_loop
self.callbacks.before_train()
File "/wn/vega/zeus/trainer/callbacks/callback_list.py", line 139, in before_train
callback.before_train(logs)
File "/wn/vega/vega/algorithms/compression/prune_ea/prune_trainer_callback.py", line 61, in before_train
self.latency_count = calc_forward_latency(self.trainer.model, count_input, sess_config)
File "/wn/vega/zeus/metrics/forward_latency.py", line 30, in calc_forward_latency
step_cfg = UserConfig().data.get("nas")
AttributeError: 'NoneType' object has no attribute 'get'
Everything works when I set parallel_search: False and run the prune algorithm demo. What is wrong with parallel_search?
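The traceback suggests that UserConfig().data is None when calc_forward_latency runs, so calling .get on it fails. Below is a minimal sketch of a defensive lookup that tolerates an uninitialised config. The helper get_step_config is hypothetical and is not part of vega or zeus; why the config is empty in this run is not established here.

```python
from typing import Any, Mapping, Optional


def get_step_config(
    user_data: Optional[Mapping[str, Any]],
    step_name: str,
    default: Any = None,
) -> Any:
    """Return user_data[step_name], or default when the config is missing.

    Hypothetical helper: it avoids the AttributeError raised by
    None.get(...) when the user config has not been loaded.
    """
    if user_data is None:
        return default
    return user_data.get(step_name, default)
```

A guard like this only hides the symptom; if the config is expected to be populated, the underlying loading problem still needs to be found.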
In model_statistics.py, the FLOPs are computed with the third-party package "thop". Its GitHub repo explains that the output of profile is actually MACs, not FLOPs.
This line of code is even more puzzling: self.gflops, self.kparams = flops_count * 1600 * 1e-9, params_count * 1e-3
Multiplying by 1e-9 converts the value to GMACs, but why multiply by 1600?
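For reference, a small sketch of the unit conversions involved. It assumes the common convention that one multiply-accumulate (MAC) counts as two floating-point operations; the function names are hypothetical, and nothing here explains the factor 1600 in the quoted line.

```python
def macs_to_gmacs(macs: float) -> float:
    """Scale a raw MAC count to giga-MACs."""
    return macs * 1e-9


def macs_to_gflops(macs: float, flops_per_mac: float = 2.0) -> float:
    """Convert a raw MAC count to GFLOPs.

    flops_per_mac=2.0 is an assumed convention (one multiply plus one add).
    """
    return macs * flops_per_mac * 1e-9
```

Under that convention, a model with 4.1e9 MACs would be reported as about 4.1 GMACs or about 8.2 GFLOPs.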
Hello, I get this error when I try to train the auto-lane model with the CULane dataset. There is no dataset.py in the CULane dataset, so where can I get the dataset.py file? Many thanks!
Traceback (most recent call last):
File "", line 1, in
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 152, in train_process
self.build(model=self.model, hps=self.hps, load_ckpt_flag=self.load_ckpt_flag)
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 189, in build
mode='train', loader=train_loader)
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 360, in _init_dataloader
dataset = dataset_cls(mode=mode)
File "/home/haha/.local/lib/python3.7/site-packages/vega/datasets/pytorch/auto_lane_datasets.py", line 97, in __init__
train=load_module(self.args.dataset_file).create_train_subset(),
File "/home/haha/.local/lib/python3.7/site-packages/vega/datasets/pytorch/common/auto_lane_utils.py", line 214, in load_module
spec.loader.exec_module(mod)
File "", line 724, in exec_module
File "", line 859, in get_code
File "", line 916, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/cache/dataset/CULane/dataset.py'
2020-09-08 15:47:20.949 INFO {'code': 'r34_48_1-1111-1-22112111111111111111+012-122', 'method': 'random'}
I seem to have installed vega correctly, and I can import it as well.
I have placed the pre-trained models and the dataset in their respective folders (cache/models and cache/datasets).
Inside the example folder, I am trying to run the following command:
python run_example.py nas/sp_nas/spnas.yml
The code breaks with the following error:
vega/algorithms/nas/sp_nas/spnet/spnet.py line 636, in __init__
assert max(out_indices) < num_stages
TypeError: '<' not supported between instances of 'str' and 'int'
The error message clearly says that there is a string on one side and an int on the other. When I print both out_indices and num_stages, I see that out_indices is {'__tuple__': True, 'items': [0, 1, 2, 3]}, and max(out_indices) simply returns items.
Is it a bug, or is there an issue with my Python or another library version?
I am using Python 3.7.7
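The behaviour described above can be reproduced without vega: max() over a dict iterates its keys, so it compares the strings '__tuple__' and 'items' and returns 'items', which then cannot be compared with an int. A minimal sketch of decoding such a value back into a tuple follows. The helper decode_tuple is hypothetical and only illustrates one possible workaround; it is not vega's own deserialisation code.

```python
from typing import Any


def decode_tuple(value: Any) -> Any:
    """Turn {'__tuple__': True, 'items': [...]} back into a tuple.

    Hypothetical helper: any other value is returned unchanged.
    """
    if isinstance(value, dict) and value.get("__tuple__") is True:
        return tuple(value["items"])
    return value


raw = {"__tuple__": True, "items": [0, 1, 2, 3]}

# max() over a dict compares its keys, which are strings here
largest_key = max(raw)

# After decoding, the original assertion becomes a comparison between ints
out_indices = decode_tuple(raw)
num_stages = 4
assert max(out_indices) < num_stages
```

This points at the config value reaching the network without being converted back to a tuple, not at the Python version.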
What's more, is there any test code or inference guide for the auto_lane module?
I am trying to reproduce the numbers for EuroCityPersons using your pre-trained model from the model zoo. I have downloaded the ECP model from the zoo, but the corresponding config file is missing. Can you please provide it?
I cannot find where this log is saved, and I don't know how to analyze it.
Can you help me?
Failed to save desc, file=/home/mengzhibin/vega/tasks/1105.182606.151/workers/nas/4/desc_4.json, desc={'detector': {'name': 'AutoLaneDetector', 'modules': ['backbone', 'neck', 'head'], 'num_class': 2, 'method': 'random', 'code': 'x50(2x24d)_48_112111-211112-1-1+122-022', 'backbone': {'name': 'ResNeXtVariantDet', 'arch': '112111-211112-1-1', 'base_depth': 50, 'base_channel': 48, 'groups': 2, 'base_width': 24, 'num_stages': 4, 'strides': (1, 2, 2, 2), 'dilations': (1, 1, 1, 1), 'out_indices': (0, 1, 2, 3), 'frozen_stages': -1, 'zero_init_residual': False, 'norm_cfg': {'type': 'BN', 'requires_grad': True}, 'conv_cfg': {'type': 'Conv'}, 'out_channels': [384, 1536, 1536, 1536], 'style': 'pytorch'}, 'neck': {'arch_code': '122-022', 'name': 'FeatureFusionModule', 'in_channels': [384, 1536, 1536, 1536]}, 'head': {'base_channel': 1792, 'num_classes': 2, 'up_points': 73, 'down_points': 72, 'name': 'AutoLaneHead'}, 'limits': {'GFlops': 1}}, 'modules': ['detector']}, msg=local variable 'value' referenced before assignment Failed to save performance, file=/home/mengzhibin/vega/tasks/1105.182606.151/workers/nas/4/performance_4.json, desc={'LaneMetric': 0.0}, msg=local variable 'value' referenced before assignment
Hi, thanks for your awesome work. I doubt whether the implementation of the EA's mutation is correct. The paper says that block credits should be acquired during the model evaluation procedure and then used to guide the mutation, which accelerates the search and finds better architectures. However, vega's implementation appears to be a general mutation. Could you help me? Thanks!
I test the loss terms in vega/search_space/networks/pytorch/detectors/auto_lane_detector.py:
`image = input
loc_targets = kwargs['gt_loc']
cls_targets = kwargs['gt_cls']
feat = self.extract_feat(image)
predict = self.head(feat)
loc_preds = predict['predict_loc']
cls_preds = predict['predict_cls']
cls_targets = cls_targets[..., 1].view(-1)
pmask = cls_targets > 0
nmask = ~ pmask
fpmask = pmask.float()
fnmask = nmask.float()
cls_preds = cls_preds.view(-1, cls_preds.shape[-1])
loc_preds = loc_preds.view(-1, loc_preds.shape[-1])
loc_targets = loc_targets.view(-1, loc_targets.shape[-1])
total_postive_num = torch.sum(fpmask)
total_negative_num = torch.sum(fnmask) # Number of negative entries to select
negative_num = torch.clamp(total_postive_num * self.NEGATIVE_RATIO, max=total_negative_num, min=1).int()
positive_num = torch.clamp(total_postive_num, min=1).int()
# cls loss begin
bg_fg_predict = F.log_softmax(cls_preds, dim=-1)
fg_predict = bg_fg_predict[..., 1]
bg_predict = bg_fg_predict[..., 0]
max_hard_pred = find_k_th_small_in_a_tensor(bg_predict[nmask].detach(), negative_num)
fnmask_ohem = (bg_predict <= max_hard_pred).float() * nmask.float()
total_cross_pos = -torch.sum(self.ALPHA * fg_predict * fpmask)
total_cross_neg = -torch.sum(self.ALPHA * bg_predict * fnmask_ohem)
# class loss end
# regression loss begin
length_weighted_mask = torch.ones_like(loc_targets)
length_weighted_mask[..., self.LANE_POINTS_NUM_DOWN] = 10
valid_lines_mask = pmask.unsqueeze(-1).expand_as(loc_targets)
valid_points_mask = (loc_targets != 0)
unified_mask = length_weighted_mask.float() * valid_lines_mask.float() * valid_points_mask.float()
smooth_huber = huber_fun(loc_preds - loc_targets) * unified_mask
loc_smooth_l1_loss = torch.sum(smooth_huber, -1)
point_num_per_gt_anchor = torch.sum(valid_points_mask.float(), -1).clamp(min=1)
total_loc = torch.sum(loc_smooth_l1_loss / point_num_per_gt_anchor)
# regression loss end
total_cross_pos = total_cross_pos / positive_num
total_cross_neg = total_cross_neg / positive_num
total_loc = total_loc / positive_num`
on the CurveLanes dataset, using the provided optimizer parameters, such as lr=0.02, weight_decay=1e-4, momentum=0.9, etc.
I also built a model structure as described in the readme file "https://github.com/huaweinoah/vega/blob/master/docs/en/algorithms/auto_lane.md".
However, the loss values are pretty large:
loss_pos = 36.0+
loss_neg = 10.+
loss_loc = 100.0+
and they did not converge after 12 epochs, the number mentioned in this paper: https://arxiv.org/abs/2007.12147
Are the loss terms in auto_lane_detector.py wrong, or did I miss some important steps?
When I try to solve this problem for the esr_ea algorithm in the way described in #84, I get this error: "/root/.local/lib/python3.6/site-packages/vega/core/pipeline/horovod/run_cluster_horovod_train.sh: No such file or directory". Could you tell me how to deal with it?
I am sorry, but I am puzzled about how to test SP-NAS after training finishes. I have trained SP-Net in the example folder using the following command:
python run_example.py ./nas/sp_nas/spnas.yml
so the code uses the config file that you provide, /nas/sp_nas/faster_rcnn_r50_fpn_1x.py. It trains fine; during training it evaluates on the validation set, and the mAP is reasonable.
I did not change anything in the code or the .yml file: I basically cloned the repository, set up the dataset paths and the pre-trained model, and that's it.
However, after training finishes, I try to run the saved model in the folder:
examples/tasks/0719.042952.773/output/2/1112-1112-11111-21-1-11.pth
using the command
python test.py vega/examples/nas/sp_nas/faster_rcnn_r50_fpn_1x.py --checkpoint examples/tasks/0719.042952.773/output/2/1112-1112-11111-21-1-11.pth --out res.pkl
but it gives basically 0 mAP, which I know is wrong. Do I need to change something in the config file faster_rcnn_r50_fpn_1x.py, or how else can I run the test? Could you please elaborate?
Error caused by the forward function of the Module superclass: during training, the forward_train function of class AutoLaneHead is not called.
I want to run the training pipeline. Can you provide the file /data/2019_mdc_lane/c00523047/mass_storage/culane/CULane/dataset.py?
Hi, thanks for the great work!
I am curious about the output I get from the cars algorithm in the examples.
I got 86.488 as the best top-1 validation accuracy after running the command given in the readme.
Is the accuracy supposed to be higher, or do I need to modify cars.yml for better performance?
Similar to the bug reported in #19 #23. The following line in test.py is also redundant and it causes the code to crash in second stage of training.
vega/vega/algorithms/nas/sp_nas/tools/test.py
Line 193 in fc8f30c
Only torch.cuda.manual_seed() is called.
torch.manual_seed() is missing.
This causes inconsistent accuracy when the same model is trained repeatedly.
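A minimal sketch of seeding every random number generator a training run typically touches, including the CPU generator that torch.manual_seed() controls. The function name seed_everything is hypothetical and is not vega's API; numpy and torch are treated as optional so the sketch also runs where they are not installed. Seeding alone may not make GPU training fully deterministic.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python, and numpy and torch when they are installed."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
    except ImportError:
        return

    # Seeds the CPU generator, which also drives weight initialisation
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        # Seeds the generators of all visible GPUs
        torch.cuda.manual_seed_all(seed)
```

Calling seed_everything once at the start of each worker, before the model is built, keeps weight initialisation and data shuffling aligned across runs.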
I ran the example code like this:
python run_pipeline.py classification/classify.yml
with one new config item, trainer:distributed=True.
An error about horovod occurs: ValueError: Horovod has not been initialized; use hvd.init().
Is horovod not supported in this example?
When two lane lines are so close that they fall in the same grid cell, which lane line will the grid cell respond to?
Another question is about the adaptive score masking:
what do uxf and uyf mean?
I can't find the chatroom.
Is it possible for you to share the yml file of your conda environment? I am running into version issues when installing with pip.
Every time pipeline.py is run, the worker_path used to save the model and the tensorboard log is randomly generated, such as 0126.091415.578 and 0126.090933.510. This is very inconvenient. Can a fixed worker_path be set in the yml file?
In addition, within a single training task, I found that some parameters are set in the yml file and others in a config .py file. This is inconsistent. As a big company, I think you should standardize the code framework. Thanks.
I found that we cannot download the pre-trained models that vega provides on this page: https://github.com/huawei-noah/vega/blob/master/docs/cn/user/examples.md
Can you fix the URL links of the pre-trained models or provide other ones?
Training SP-NAS with the default configuration, spnas.yml (the one you provide), breaks in the second phase, where it starts training nas2. It complains that it cannot find the file total_list_s.csv. I think the problem is in the variable remote_output_path; with the default settings, the code expects the file to be in the folder nas2, which has not been created yet. Instead, the file is at the following location:
tasks/0726.021123.846/output/total_list_s.csv
whereas the current code (with default parameters) searches for the file at the following location:
tasks/0726.021123.846/output/nas2/total_list_s.csv
Below is the trail of the error message:
2020-07-26 04:56:45.680 INFO Start pipeline step: [nas2]
vega-0.9.1-py3.7.egg/vega/core/pipeline/pipeline.py", line 58, in run
PipeStep().do()
vega/algorithms/nas/sp_nas/spnas_pipe_step.py", line 27, in __init__
super().__init__()
vega/core/pipeline/nas_pipe_step.py", line 28, in __init__
self.generator = Generator()
vega/core/pipeline/generator.py", line 25, in __init__
self.search_alg = SearchAlgorithm(self.search_space)
vega-0.9.1-py3.7.egg/vega/algorithms/nas/sp_nas/sp_nas.py", line 50, in __init__
), "Not found serial results!"
AssertionError: Not found serial results!
I am trying to use the ECP model (spnet_checkpoint_ecp.pth) provided in the model zoo to reproduce the results of the paper.
When I try to execute test.py in spnet/tools/, it complains about a mismatch between the keys of the models.
Without the config files (such as faster_rcnn_r50_fpn_1x.py), I cannot do anything.
I have also tried to use the file you mention in issue #14. The code breaks with the error that it is missing attribute in keep_all_stages.
Suggestion: it would be great if you could provide a small example of running your pre-trained model.
Thanks
Hello! I'm wondering: when the result of (self.points_per_line / self.feature_height) is not an integer, will this line have problems? (By default it is 72/18=4.)
center_y = y_list[int((self.points_per_line / self.feature_height) * (self.feature_height - 1 - h))]
Also, is auto_lane limited to detecting a pre-defined number of lanes? Thanks.
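The index expression above can be evaluated on its own to see what happens with a non-integer ratio. The sketch below assumes that h ranges over 0 to feature_height - 1 and that y_list has points_per_line entries; both are assumptions, and center_indices is a hypothetical name. Under those assumptions, int() truncates the product, so the index stays inside y_list, but the spacing between sampled points is no longer uniform.

```python
from typing import List


def center_indices(points_per_line: int, feature_height: int) -> List[int]:
    """Indices into y_list produced by the quoted expression, for every row h."""
    ratio = points_per_line / feature_height
    return [int(ratio * (feature_height - 1 - h)) for h in range(feature_height)]


def gaps(indices: List[int]) -> List[int]:
    """Differences between consecutive indices."""
    return [a - b for a, b in zip(indices, indices[1:])]


default = center_indices(72, 18)  # ratio is exactly 4
uneven = center_indices(70, 18)   # ratio is about 3.89

# With the default setting every step is 4 points;
# with 70 points the steps alternate between 3 and 4.
default_gaps = set(gaps(default))
uneven_gaps = set(gaps(uneven))
```

So a non-integer ratio does not raise an IndexError in this sketch, but the sampled centre points are spread unevenly along the line.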
Hello:
I did not find the code for GPU selection.
Thank you !
I followed the quickstart example and got the following error. Can anyone help?
File "quickstart.py", line 60, in
vega.run("./my.yml")
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/run.py", line 34, in run
_init_env(cfg_path)
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/run.py", line 62, in _init_env
set_backend(General.backend, General.device_category)
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/backend_register.py", line 64, in set_backend
register_pytorch()
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/backend_register.py", line 20, in register_pytorch
import vega.core.trainer.timm_trainer_callback
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/trainer/timm_trainer_callback.py", line 60, in
mean=IMAGENET_DEFAULT_MEAN,
NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined
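One possible cause, which is an assumption that the traceback alone does not confirm, is that the optional timm package is not installed in the environment; IMAGENET_DEFAULT_MEAN is a constant defined in timm.data.constants. A minimal sketch for checking whether a package is importable before running the quickstart (the helper name is_installed is hypothetical):

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Prints False when timm is missing from the active environment
    print("timm installed:", is_installed("timm"))
```

If the check prints False, installing timm into the same conda environment would be the first thing to try.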
Hi, I have run pba and obtained some augmentation policies, but I do not know how to fully train the model with the policy that was found. Could you help me?
Hi,
I am trying to train the full pipeline [nas1, nas2, fullytrain] of SP-NAS. I did not change anything, except for one line in spnas.yml, where I changed:
pipeline: [nas1]
to pipeline: [nas1, nas2, fullytrain]
It trains fine for nas1 and nas2. However, the code then breaks, complaining that it cannot find records. The error trail is below.
Can you suggest a quick fix?
2020-09-24 10:08:23.81 INFO performance save to vega/examples/tasks/0924.025954.103/workers/nas2/11/performance
2020-09-24 10:08:24.275 INFO Latest checkpoint save to vega/examples/tasks/0924.025954.103/output/11
2020-09-24 10:08:24.276 INFO update generator, step name: nas2, worker id: 11
2020-09-24 10:08:24.277 INFO SpNas.update(), performance file=vega/examples/tasks/0924.025954.103/workers/nas2/11/performance/performance.pkl
2020-09-24 10:08:24.321 INFO Start pipeline step: [fullytrain]
2020-09-24 10:08:24.322 INFO init FullyTrainPipeStep...
2020-09-24 10:08:24.322 INFO FullyTrainPipeStep started...
2020-09-24 10:08:24.324 ERROR Failed to load records from model folder, folder=vega/examples/tasks/0924.025954.103/output/nas2
2020-09-24 10:08:24.324 WARNING Failed to dump records, report is emplty.
The folder output/nas2 is never created by the code.