
vega's Issues

ERROR Illegal alpha.

When I run it, I get:
ERROR Illegal alpha.
Then I looked at the source:

    idx = torch.argmax(alpha[start:end, :], dim=1)
    cnt = 0
    if torch.nonzero(idx).size(0) > 2:
        logger.error("Illegal alpha.")
The shape of torch.nonzero(idx) is torch.Size([5, 1]), torch.Size([4, 1]) or torch.Size([3, 1]), so size(0) is greater than 2 in every case.
If you want to limit the number of connections, you could use:
if sum(alphaalpha[start:end, :]) > 2:
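
To make the check quoted above concrete, here is a minimal torch-free sketch of what `torch.nonzero(torch.argmax(alpha, dim=1)).size(0)` measures: the number of rows whose largest entry is not in column 0. The function name and the sample values are hypothetical and only serve as illustration; they are not taken from vega.

```python
def count_rows_with_nonzero_argmax(alpha_rows):
    """Count rows whose largest entry is NOT in column 0.

    Plain-Python analogue of torch.nonzero(torch.argmax(alpha, dim=1)).size(0).
    """
    count = 0
    for row in alpha_rows:
        # Index of the largest value in this row (first one wins on ties here).
        best_col = max(range(len(row)), key=row.__getitem__)
        if best_col != 0:
            count += 1
    return count


# Hypothetical 4x3 slice of alpha: rows 0, 2 and 3 peak outside column 0.
alpha_rows = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.8, 0.1],
]
n_active = count_rows_with_nonzero_argmax(alpha_rows)
is_illegal = n_active > 2  # same threshold as the quoted check
```

With these sample values three rows peak outside column 0, so the quoted check would log the error.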


Where can I download vega-0.9.1-py3-none-any.whl?

I am probably missing this because I am new to vega, but I cannot find the wheel vega-0.9.1-py3-none-any.whl. The installation page says to download the vega-0.9.1-py3-none-any.whl file from the release directory.
Could someone kindly point out where the release directory is?
I know we can build from source, but I would still like to find the wheel.

Can users set worker_path in the config?

Every time you run, worker_path is randomly generated to save the model and the TensorBoard log, for example 0126.091415.578 and 0126.090933.510. This is very inconvenient. Can a fixed worker_path be set in the yml file?
In addition, within a single training task, I found that some parameters are set in the yml file and others in a config .py file. This is inconsistent. As a big company, I think you should standardize the code framework. Thanks.

How to reproduce the results of the paper? Please provide config files for the models in the model zoo

I am trying to use the ECP model (spnet_checkpoint_ecp.pth) provided in the model zoo to reproduce the results of the paper.
When I try to execute in spnet/tools/ it complains about a mismatch between the model keys.

Without the config files (such as I cannot do anything?
I have also tried the file you mention in issue #14. The code breaks with an error saying an attribute is missing in keep_all_stages.

Suggestion: it would be great if you could provide a small example of running your pre-trained model.

[Bug] "Not found serial results"

Training SP-NAS with the default configurations, spnas.yml (the one you guys provide) breaks in the second phase where it starts training nas2. It complains it cannot find the file total_list_s.csv. I think the problem is in the variable remote_output_path; when using default settings, the code expects the file to be in the folder nas2, which is not yet created. Instead, the file is at the following location:


Instead, the current code (with default params) searches for the file at the following location:


Below is the error trace:

2020-07-26 04:56:45.680 INFO Start pipeline step: [nas2]
vega-0.9.1-py3.7.egg/vega/core/pipeline/", line 58, in run
 vega/algorithms/nas/sp_nas/", line 27, in __init__
vega/core/pipeline/", line 28, in __init__
    self.generator = Generator()
vega/core/pipeline/", line 25, in __init__
    self.search_alg = SearchAlgorithm(self.search_space)

vega-0.9.1-py3.7.egg/vega/algorithms/nas/sp_nas/", line 50, in __init__
    ), "Not found serial results!"
AssertionError: Not found serial results!

Question on ""

Hello! When the result of (self.points_per_line / self.feature_height) is not an integer, will this line cause problems? (By default it is 72/18=4.)

center_y = y_list[int(self.points_per_line / self.feature_height) * (self.feature_height - 1 - h)]

Should it be center_y = y_list[int((self.points_per_line / self.feature_height) * (self.feature_height - 1 - h))] ?
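
The difference between the two index expressions can be checked with plain arithmetic. The sketch below compares them for the default values mentioned above (72 and 18) and for a hypothetical non-divisible setting (72 and 16); the function names are made up for illustration.

```python
def index_truncate_ratio_first(points_per_line, feature_height, h):
    """Index as in the current code: truncate the ratio, then multiply."""
    return int(points_per_line / feature_height) * (feature_height - 1 - h)


def index_truncate_product(points_per_line, feature_height, h):
    """Index as in the proposed code: multiply first, then truncate."""
    return int((points_per_line / feature_height) * (feature_height - 1 - h))


# Default case: 72 / 18 = 4 exactly, so both expressions agree.
default_current = index_truncate_ratio_first(72, 18, 0)
default_proposed = index_truncate_product(72, 18, 0)

# Hypothetical case: 72 / 16 = 4.5, so truncating early loses 0.5 per row.
uneven_current = index_truncate_ratio_first(72, 16, 0)
uneven_proposed = index_truncate_product(72, 16, 0)
```

When the ratio is an integer the two forms give the same index; when it is not, the current form drifts toward smaller indices as `feature_height - 1 - h` grows.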

The performance of CARS from example

Hi, thanks for the great work!
I am curious about the output I get from the CARS algorithm in the examples.
I got 86.488 as the best top-1 validation accuracy after running the command given in the readme.
Should the accuracy be higher, or do I need to modify cars.yml for better performance?

PRUNE_EA parallel_search error

When I set parallel_search: True in prune.yml, I get this error:

Traceback (most recent call last):
File "", line 1, in
File "/wn/vega/zeus/", line 153, in train_process
File "/wn/vega/zeus/", line 279, in _train_loop
File "/wn/vega/zeus/trainer/callbacks/", line 139, in before_train
File "/wn/vega/vega/algorithms/compression/prune_ea/", line 61, in before_train
self.latency_count = calc_forward_latency(self.trainer.model, count_input, sess_config)
File "/wn/vega/zeus/metrics/", line 30, in calc_forward_latency
step_cfg = UserConfig().data.get("nas")
AttributeError: 'NoneType' object has no attribute 'get'

Everything works when I set parallel_search: False and run the prune algorithm demo. What is wrong with parallel_search?
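
The last frame of the traceback shows `.get` being called on `UserConfig().data`, so `data` is `None` at that point. A minimal sketch of a defensive lookup follows; `FakeUserConfig` and `get_step_cfg` are hypothetical stand-ins written for illustration, not zeus APIs, and this only avoids the crash rather than explaining why the config is empty in the parallel worker.

```python
class FakeUserConfig:
    """Hypothetical stand-in for the real UserConfig class (not zeus code)."""

    def __init__(self, data=None):
        # data=None mimics the state implied by the traceback.
        self.data = data


def get_step_cfg(user_config, key="nas"):
    """Return the config for `key`, or None when no config data is loaded."""
    data = getattr(user_config, "data", None)
    if data is None:
        return None
    return data.get(key)


missing = get_step_cfg(FakeUserConfig())
present = get_step_cfg(FakeUserConfig({"nas": {"pipe_step": "example"}}))
```

A caller would still need to decide what to do when the result is `None`, for example skip the latency calculation or raise a clearer error.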

[URL fail] Missing URLs for pre-trained model sources

I found that the pre-trained models vega provides on this page cannot be downloaded.
Can you fix the URL links of the pre-trained models or provide others?

yml file of the conda environment?

Is it possible for you to share the yml file of your conda environment? I am running into some version issues with pip.

SP-NAS Config file for EuroCity Persons model

I am trying to reproduce the numbers for EuroCityPersons using your pre-trained model in the zoo. I have downloaded the ECP model from the zoo. However, the corresponding config file is missing. Can you please provide it?

some code is missing in

in vega/docs/cn/developer/


    def __init__(self, desc):
        super(SimpleCnn, self).__init__()

it may be missing this code:
class SimpleCnn(nn.Module):

How to test SP-NAS after training?

I am sorry, but I am puzzled about how to test SP-NAS after training finishes. I have trained SP-Net in the example folder using the following command:
python ./nas/sp_nas/spnas.yml

So the code is using the config file that you provide, called /nas/sp_nas/ . It trains fine, evaluates on the validation set during training, and the mAP is reasonable.
I changed nothing in the code or the .yml file; I simply cloned the repository and set up the dataset paths and the pre-trained model.

However, after training finishes, I try to run the saved model in the folder:

Using the command
python vega/examples/nas/sp_nas/ --checkpoint examples/tasks/0719.042952.773/output/2/1112-1112-11111-21-1-11.pth --out res.pkl
but it gives essentially 0 mAP, which I know is wrong. Do I need to change something in the config file? How can I run the test? Could you please elaborate?

Full training with PBA

Hi, I have run PBA and obtained some augmentation policies, but I do not know how to fully train the model with the found policy. Could you help me?

Question about generating the ground truth

When two lane lines are so close that they fall in the same grid cell, which lane line will the grid cell respond to?

Another question is about the adaptive score masking:
What do uxf and uyf mean?


Thanks for open-sourcing VEGA and for its continuous maintenance. Why is it named VEGA? Is there a story behind the name? :)

step_cfg = UserConfig().data.get("nas")

When I run quant_ea.yaml, it shows that:
Traceback (most recent call last):
File "", line 1, in
File "/root/.local/lib/python3.6/site-packages/zeus/", line 153, in train_process
File "/root/.local/lib/python3.6/site-packages/zeus/", line 279, in _train_loop
File "/root/.local/lib/python3.6/site-packages/zeus/trainer/callbacks/", line 139, in before_train
File "/root/.local/lib/python3.6/site-packages/vega/algorithms/compression/quant_ea/", line 62, in before_train
self.latency_count = calc_forward_latency(model, count_input, sess_config)
File "/root/.local/lib/python3.6/site-packages/zeus/metrics/", line 31, in calc_forward_latency
step_cfg = UserConfig().data.get("nas")
AttributeError: 'NoneType' object has no attribute 'get'

where is the

Hello, I hit this error when I tried to train the auto-lane model on the CULane dataset. There is no in the CULane dataset, so where can I get file? Many thanks!
Traceback (most recent call last):
File "", line 1, in
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/", line 152, in train_process, hps=self.hps, load_ckpt_flag=self.load_ckpt_flag)
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/", line 189, in build
mode='train', loader=train_loader)
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/", line 360, in _init_dataloader
dataset = dataset_cls(mode=mode)
File "/home/haha/.local/lib/python3.7/site-packages/vega/datasets/pytorch/", line 97, in init
File "/home/haha/.local/lib/python3.7/site-packages/vega/datasets/pytorch/common/", line 214, in load_module
File "", line 724, in exec_module
File "", line 859, in get_code
File "", line 916, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/cache/dataset/CULane/'
2020-09-08 15:47:20.949 INFO {'code': 'r34_48_1-1111-1-22112111111111111111+012-122', 'method': 'random'}

[Bug] Problem in SP-NAS (fullytrain) ERROR Failed to load records from model folder.


I am trying to train the full pipeline [nas1, nas2, fullytrain] of SP-NAS. I did not change anything, except I changed one line in spnas.yml, that is I changed:

pipeline: [nas1] to pipeline: [nas1, nas2, fullytrain]

It trains fine for nas1 and nas2. However, the code breaks by complaining that it cannot find records. This is the error trail.
Can you suggest a quick fix ?

2020-09-24 10:08:23.81 INFO performance save to vega/examples/tasks/0924.025954.103/workers/nas2/11/performance
2020-09-24 10:08:24.275 INFO Latest checkpoint save to vega/examples/tasks/0924.025954.103/output/11
2020-09-24 10:08:24.276 INFO update generator, step name: nas2, worker id: 11
2020-09-24 10:08:24.277 INFO SpNas.update(), performance file=vega/examples/tasks/0924.025954.103/workers/nas2/11/performance/performance.pkl
2020-09-24 10:08:24.321 INFO Start pipeline step: [fullytrain]
2020-09-24 10:08:24.322 INFO init FullyTrainPipeStep...
2020-09-24 10:08:24.322 INFO FullyTrainPipeStep started...
2020-09-24 10:08:24.324 ERROR Failed to load records from model folder, folder=vega/examples/tasks/0924.025954.103/output/nas2
2020-09-24 10:08:24.324 WARNING Failed to dump records, report is emplty.

The folder output/nas2 is never created by the code.

Train Pipeline

I want to run the training pipeline. Can you provide the file at /data/2019_mdc_lane/c00523047/mass_storage/culane/CULane/

RuntimeError: Dataset not found or corrupted. You can use download=True to download it

2020-08-28 23:18:40.631 ERROR Failed to run pipeline.
Traceback (most recent call last):
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/", line 52, in run
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/", line 43, in do
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/", line 73, in _dispatch_trainer
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/scheduler/", line 42, in run
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/", line 152, in train_process, hps=self.hps, load_ckpt_flag=self.load_ckpt_flag)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/", line 189, in build
mode='train', loader=train_loader)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/", line 360, in _init_dataloader
dataset = dataset_cls(mode=mode)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/datasets/pytorch/", line 41, in init
File "/root/.local/lib/python3.7/site-packages/torchvision/datasets/", line 67, in init
raise RuntimeError('Dataset not found or corrupted.' +
RuntimeError: Dataset not found or corrupted. You can use download=True to download it
2020-08-28 23:18:40.631 ERROR None

I am trying to use the CARS algorithm to search on the cifar10 dataset. The dataset path is set correctly, but the error above occurs. Why?
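
The message in the traceback matches the one raised by torchvision's CIFAR10 class when its integrity check fails. Assuming the vega dataset wraps `torchvision.datasets.CIFAR10` (an assumption based only on the traceback), the configured root must contain a `cifar-10-batches-py` folder with the extracted batch files. The helper below is a hypothetical sketch that lists which of those files are missing; it does not verify checksums, which torchvision also does.

```python
import os
import tempfile

# File layout used by torchvision's CIFAR10 (python version of the dataset).
CIFAR10_FOLDER = "cifar-10-batches-py"
CIFAR10_FILES = [
    "data_batch_1",
    "data_batch_2",
    "data_batch_3",
    "data_batch_4",
    "data_batch_5",
    "test_batch",
    "batches.meta",
]


def missing_cifar10_files(root):
    """Return the expected CIFAR-10 files that do not exist under `root`."""
    base = os.path.join(root, CIFAR10_FOLDER)
    return [name for name in CIFAR10_FILES
            if not os.path.isfile(os.path.join(base, name))]


# Demo on an empty temporary directory: every expected file is reported.
with tempfile.TemporaryDirectory() as empty_root:
    missing_in_empty_root = missing_cifar10_files(empty_root)
```

If the list is non-empty for the configured path, the path probably points one level too high or too low, or the archive was not extracted.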

Failed to save desc

I cannot find where this log is saved and don't know how to analyze it.
Can you help me?

Failed to save desc, file=/home/mengzhibin/vega/tasks/1105.182606.151/workers/nas/4/desc_4.json, desc={'detector': {'name': 'AutoLaneDetector', 'modules': ['backbone', 'neck', 'head'], 'num_class': 2, 'method': 'random', 'code': 'x50(2x24d)_48_112111-211112-1-1+122-022', 'backbone': {'name': 'ResNeXtVariantDet', 'arch': '112111-211112-1-1', 'base_depth': 50, 'base_channel': 48, 'groups': 2, 'base_width': 24, 'num_stages': 4, 'strides': (1, 2, 2, 2), 'dilations': (1, 1, 1, 1), 'out_indices': (0, 1, 2, 3), 'frozen_stages': -1, 'zero_init_residual': False, 'norm_cfg': {'type': 'BN', 'requires_grad': True}, 'conv_cfg': {'type': 'Conv'}, 'out_channels': [384, 1536, 1536, 1536], 'style': 'pytorch'}, 'neck': {'arch_code': '122-022', 'name': 'FeatureFusionModule', 'in_channels': [384, 1536, 1536, 1536]}, 'head': {'base_channel': 1792, 'num_classes': 2, 'up_points': 73, 'down_points': 72, 'name': 'AutoLaneHead'}, 'limits': {'GFlops': 1}}, 'modules': ['detector']}, msg=local variable 'value' referenced before assignment
Failed to save performance, file=/home/mengzhibin/vega/tasks/1105.182606.151/workers/nas/4/performance_4.json, desc={'LaneMetric': 0.0}, msg=local variable 'value' referenced before assignment

Is there no code for guided mutation in esr-ea, or did I miss it?

Hi, thanks for your awesome work. I doubt whether the implementation of the EA's mutation is correct. The paper says that block credits should be acquired during model evaluation and used to guide the mutation, which accelerates the search and finds better architectures. I find that vega's implementation is a general mutation. Could you help me? Thanks!

AutoLaneHead forward error

Error caused by the forward function of the superclass Module: during training, the forward_train function of class AutoLaneHead is not called.

How to run auto_lane

I want to run auto_lane following the user guide, but I get errors. The configuration files have changed, but the recommended command has not been updated.
How do I configure the right files for lane detection?
How do I register the class names before using them?
How can I study the Curvelane-NAS algorithm step by step?
For example:
How do I configure the yaml file, and in which directory and file? How do I set the weight between the background and the lane lines? (0.4 vs 1.0 may not fit some models: in my experiments with SCNN, the mIOU is only about 0.105 when the background weight is set to 0.4 and the lane-line weight to 1.0, so it is not possible to use an mIOU of 0.5 to decide whether the corresponding lane line exists.)
Please give the detailed parameters you use when training and testing on the curve lane dataset.

I have read the user guide carefully, but I don't know how to handle the questions above.
Please solve these problems or update the corresponding guide.

No such file or directory

When I tried to solve this problem for the esr_ea algorithm in this way: #84, the error showed: "/root/.local/lib/python3.6/site-packages/vega/core/pipeline/horovod/ No such file or directory". Could you tell me how to deal with it?

Issues regarding the computation of FLOPS with thop

In, the FLOPS is computed with the third-party package "thop". Their GitHub repo explains that the output of profile is actually MACs rather than FLOPs.

This line of code is even more puzzling: self.gflops, self.kparams = flops_count * 1600 * 1e-9, params_count * 1e-3

Multiplying by 1e-9 converts the count to GMACs, but why multiply by 1600?
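
For reference, the unit conversion in question can be written out explicitly. The sketch below assumes the common convention that one multiply-accumulate counts as two floating-point operations; that factor is a convention, not something stated in the vega code, and the function names are hypothetical. It does not explain the factor of 1600.

```python
def macs_to_gmacs(macs):
    """Scale a raw MAC count to giga-MACs."""
    return macs * 1e-9


def macs_to_gflops(macs, flops_per_mac=2):
    """Convert a raw MAC count to GFLOPs.

    flops_per_mac=2 is an assumed convention (one multiply plus one add).
    """
    return macs * flops_per_mac * 1e-9


# Hypothetical profile output: 4.1 billion MACs.
example_macs = 4_100_000_000
example_gmacs = macs_to_gmacs(example_macs)
example_gflops = macs_to_gflops(example_macs)
```

Reporting both numbers, or naming the metric GMACs, would avoid the ambiguity raised in this issue.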

the losses of auto lane are not converging on CurveLanes Dataset

I tested the loss terms in vega/search_space/networks/pytorch/detectors/

     `image = input
    loc_targets = kwargs['gt_loc']
    cls_targets = kwargs['gt_cls']

    feat = self.extract_feat(image)
    predict = self.head(feat)

    loc_preds = predict['predict_loc']
    cls_preds = predict['predict_cls']
    cls_targets = cls_targets[..., 1].view(-1)
    pmask = cls_targets > 0
    nmask = ~ pmask
    fpmask = pmask.float()
    fnmask = nmask.float()
    cls_preds = cls_preds.view(-1, cls_preds.shape[-1])
    loc_preds = loc_preds.view(-1, loc_preds.shape[-1])
    loc_targets = loc_targets.view(-1, loc_targets.shape[-1])
    total_postive_num = torch.sum(fpmask)
    total_negative_num = torch.sum(fnmask)  # Number of negative entries to select
    negative_num = torch.clamp(total_postive_num * self.NEGATIVE_RATIO, max=total_negative_num, min=1).int()
    positive_num = torch.clamp(total_postive_num, min=1).int()
    # cls loss begin
    bg_fg_predict = F.log_softmax(cls_preds, dim=-1)
    fg_predict = bg_fg_predict[..., 1]
    bg_predict = bg_fg_predict[..., 0]
    max_hard_pred = find_k_th_small_in_a_tensor(bg_predict[nmask].detach(), negative_num)
    fnmask_ohem = (bg_predict <= max_hard_pred).float() * nmask.float()
    total_cross_pos = -torch.sum(self.ALPHA * fg_predict * fpmask)
    total_cross_neg = -torch.sum(self.ALPHA * bg_predict * fnmask_ohem)
    # class loss end
    # regression loss begin
    length_weighted_mask = torch.ones_like(loc_targets)
    length_weighted_mask[..., self.LANE_POINTS_NUM_DOWN] = 10
    valid_lines_mask = pmask.unsqueeze(-1).expand_as(loc_targets)
    valid_points_mask = (loc_targets != 0)
    unified_mask = length_weighted_mask.float() * valid_lines_mask.float() * valid_points_mask.float()
    smooth_huber = huber_fun(loc_preds - loc_targets) * unified_mask
    loc_smooth_l1_loss = torch.sum(smooth_huber, -1)
    point_num_per_gt_anchor = torch.sum(valid_points_mask.float(), -1).clamp(min=1)
    total_loc = torch.sum(loc_smooth_l1_loss / point_num_per_gt_anchor)
    # regression loss end
    total_cross_pos = total_cross_pos / positive_num
    total_cross_neg = total_cross_neg / positive_num
    total_loc = total_loc / positive_num`

on the CurveLanes dataset, using the provided optimizer parameters, such as lr=0.02, weight_decay=1e-4, momentum=0.9, etc.
I built a model structure as described in the readme file "".

However, the loss values are quite large:
loss_pos = 36.0+
loss_neg = 10.+
loss_loc = 100.0+

and they did not converge after 12 epochs, which is mentioned in this paper:

Are the loss terms wrong, or did I miss some important steps?

NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined

I followed the quickstart example and got the following error. Can anyone help?

File "", line 60, in"./my.yml")
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/", line 34, in run
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/", line 62, in _init_env
set_backend(General.backend, General.device_category)
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/", line 64, in set_backend
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/", line 20, in register_pytorch
import vega.core.trainer.timm_trainer_callback
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/trainer/", line 60, in
NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined

Potential bug in ?

I have seemingly installed vega correctly and I can import it as well.
I have correctly placed the pre-trained models and dataset in the respective folders (cache/models and cache/datasets folder).

I am trying to run inside example folder the following command:
python nas/sp_nas/spnas.yml

The code breaks with the following error
vega/algorithms/nas/sp_nas/spnet/ line 636, in __init__
TypeError: '<' not supported between instances of 'str' and 'int'
assert max(out_indices) < num_stages
The error message makes clear that there is a string on one side and an int on the other. When I print both out_indices and num_stages,
I see that out_indices is {'__tuple__': True, 'items': [0, 1, 2, 3]}, and max(out_indices) simply returns 'items'.
Is this a bug, or is there an issue with my Python or another library version?
I am using Python 3.7.7
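
The behaviour described above is reproducible in plain Python: `max` over a dict iterates its keys, so it returns the largest key string rather than the largest index. The sketch below shows this and a possible workaround that decodes the tuple marker before use; `decode_tuple` is a hypothetical helper, not a vega function, and `num_stages = 4` is an assumed value.

```python
def decode_tuple(value):
    """Turn {'__tuple__': True, 'items': [...]} back into a tuple.

    Any other value is returned unchanged.
    """
    if isinstance(value, dict) and value.get("__tuple__") is True:
        return tuple(value["items"])
    return value


raw_out_indices = {"__tuple__": True, "items": [0, 1, 2, 3]}

# max() over a dict compares its keys, which are strings here.
max_of_raw = max(raw_out_indices)

# After decoding, the comparison from the assert works as intended.
out_indices = decode_tuple(raw_out_indices)
num_stages = 4  # assumed value for illustration
indices_are_valid = max(out_indices) < num_stages
```

This suggests the config value was serialized with a tuple marker and not converted back before reaching the assert.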

Inference for auto lane

Could you help tell how to run the to inference the auto lane model?
In the, only the model and desc file are provided, but no inference code.
I tried to use vega/model_zoo/ and don't know how to set data_type and data_path.

If you could provide documentation for auto lane inference, it would help a lot.
