huawei-noah / vega

AutoML tools chain

Home Page: http://www.noahlab.com.hk/opensource/vega/

License: Other

Python 99.15% Shell 0.10% C++ 0.70% CMake 0.04%

vega's Issues

how to visualize auto-lane model from model_zoo

As in the title: have you provided a test script that takes video, camera, or image input? Thanks.

Where can I find SPNetXB_COCO_ImageNetPretrained.pth ?

With reference to issue #14: can you please tell me where I can find the model SPNetXB_COCO_ImageNetPretrained.pth?

how to run the auto_lane

I tried to run auto_lane by following the user guide, but I get errors. The configuration files have changed, while the command recommended in the guide has not.
How do I configure the files for lane detection correctly?
How do I register class names before using them?
How can I study the CurveLane-NAS algorithm step by step?
For example: which YAML file in which directory should I configure, and how should I set the loss weights for the background and the lane lines? Weights of 0.4 (background) vs 1.0 (lane lines) may not suit some models. In my SCNN experiments with those weights, the mIoU is only about 0.105, so an mIoU threshold of 0.5 cannot be used to decide whether a lane line exists.
Please provide the detailed parameters you used for training and testing on the CurveLanes dataset.

I have read the user guide carefully, but I don't know how to handle the questions above.
Please solve these problems or update the corresponding guide.

how to test the SP-NAS mAP on COCO with vega

wrong implementation for DARTS?

In super_network.py, you have

self.initializer()

Is that a mistake? If not, why do you initialize all the hyperparameters every time the model is called?

Where to download vega-0.9.1-py3-none-any.whl?

Probably because I am new to vega, I cannot find the wheel vega-0.9.1-py3-none-any.whl. The installation page says to download the vega-0.9.1-py3-none-any.whl file from the release directory.
Could someone kindly point out where the release directory is?
I know we can build from source, but I would like to see the wheel.

some code is missing in quick_start.md

in vega/docs/cn/developer/quick_start.md

@NetworkFactory.register(NetTypes.CUSTOM)

    def __init__(self, desc):
        super(SimpleCnn, self).__init__()

The following line seems to be missing:
class SimpleCnn(nn.Module):

step_cfg = UserConfig().data.get("nas")

When I run quant_ea.yaml, it shows that:
Traceback (most recent call last):
File "", line 1, in
File "/root/.local/lib/python3.6/site-packages/zeus/trainer_base.py", line 153, in train_process
self._train_loop()
File "/root/.local/lib/python3.6/site-packages/zeus/trainer_base.py", line 279, in _train_loop
self.callbacks.before_train()
File "/root/.local/lib/python3.6/site-packages/zeus/trainer/callbacks/callback_list.py", line 139, in before_train
callback.before_train(logs)
File "/root/.local/lib/python3.6/site-packages/vega/algorithms/compression/quant_ea/quant_trainer_callback.py", line 62, in before_train
self.latency_count = calc_forward_latency(model, count_input, sess_config)
File "/root/.local/lib/python3.6/site-packages/zeus/metrics/forward_latency.py", line 31, in calc_forward_latency
step_cfg = UserConfig().data.get("nas")
AttributeError: 'NoneType' object has no attribute 'get'
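
The traceback shows that `UserConfig().data` is `None` at the point where `.get("nas")` is called. The following is a minimal sketch of a defensive lookup, not the actual zeus code: the `UserConfig` class here is a hypothetical stand-in, and `get_step_config` is an illustrative helper name. It only avoids the crash; it does not explain why the configuration was not loaded in that process.

```python
class UserConfig:
    """Hypothetical stand-in for the real config class; `data` may be None."""

    data = None


def get_step_config(step_name, default=None):
    """Return the config of one pipeline step, tolerating an unloaded config."""
    fallback = {} if default is None else default
    data = UserConfig().data
    if data is None:
        # The config was never loaded in this process, so fall back.
        return fallback
    return data.get(step_name, fallback)


print(get_step_config("nas"))                      # config not loaded yet
UserConfig.data = {"nas": {"codec": "example"}}    # illustrative content only
print(get_step_config("nas"))
```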

Is there any doc for using my own backbone or trainer?

Hi, thank you for the great work. I want to run pba with my own data and model. Is there any doc to quickly tell me how to use my own backbone or trainer?

Inference for auto lane

Could you explain how to run inference.py to run inference with the auto lane model?
model_zoo.md provides only the model and desc files, but no inference code.
I tried to use vega/model_zoo/inference.py, but I don't know how to set data_type and data_path.

A document on auto lane inference would help a lot.

When will K8S deployment be supported

When will K8S deployment be supported, with documentation on how to use it?
Thanks!

RuntimeError: Dataset not found or corrupted. You can use download=True to download it

2020-08-28 23:18:40.631 ERROR Failed to run pipeline.
Traceback (most recent call last):
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/pipeline.py", line 52, in run
PipeStep().do()
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/nas_pipe_step.py", line 43, in do
self._dispatch_trainer(res)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/pipeline/nas_pipe_step.py", line 73, in _dispatch_trainer
self.master.run(trainer)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/scheduler/local_master.py", line 42, in run
worker.train_process()
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 152, in train_process
self.build(model=self.model, hps=self.hps, load_ckpt_flag=self.load_ckpt_flag)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 189, in build
mode='train', loader=train_loader)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 360, in _init_dataloader
dataset = dataset_cls(mode=mode)
File "/usr/local/python3.7/lib/python3.7/site-packages/vega/datasets/pytorch/cifar10.py", line 41, in __init__
transform=Compose(self.transforms.transform), download=self.args.download)
File "/root/.local/lib/python3.7/site-packages/torchvision/datasets/cifar.py", line 67, in __init__
raise RuntimeError('Dataset not found or corrupted.' +
RuntimeError: Dataset not found or corrupted. You can use download=True to download it
2020-08-28 23:18:40.631 ERROR None

I am trying to run a search with the CARS algorithm on the CIFAR-10 dataset. The dataset path is set correctly, but the error above occurs. Why?
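
One way to narrow this down is to check the directory layout that torchvision expects. The sketch below assumes the standard torchvision layout, in which the extracted batch files live in a `cifar-10-batches-py` folder directly under the dataset root; the path passed at the end is only an example. torchvision also verifies checksums, so a file that exists but is truncated still triggers the same error, which this check does not detect.

```python
import os

# Folder and file names torchvision's CIFAR10 class looks for under <root>.
CIFAR10_FOLDER = "cifar-10-batches-py"
CIFAR10_FILES = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch", "batches.meta"]


def missing_cifar10_files(root):
    """List the expected CIFAR-10 files that are absent under `root`."""
    base = os.path.join(root, CIFAR10_FOLDER)
    return [name for name in CIFAR10_FILES
            if not os.path.isfile(os.path.join(base, name))]


# Example path only; use the data path from your own configuration.
print(missing_cifar10_files("/cache/datasets/cifar10"))
```

An empty list means the layout is as expected, and the next thing to check is file integrity.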

Why VEGA?

Thanks for open-sourcing and continuously maintaining VEGA. Why the name VEGA? Is there a story behind it? :)

Can we install vega from source?

Only .whl mode seems to be supported.

how to cite this repository?

Could you please provide a BibTeX entry for vega, or should we cite the URL?

When will the vega 1.0 version be released?

May I know when version 1.0 will be released?

ERROR Illegal alpha.

When I run run_example.py, the output is:
ERROR Illegal alpha.
Then I looked at the source:

    idx = torch.argmax(alpha[start:end, :], dim=1)
    cnt = 0
    if torch.nonzero(idx).size(0) > 2:
        logger.error("Illegal alpha.")

The shape of torch.nonzero(idx) is torch.Size([5, 1]), torch.Size([4, 1]), or torch.Size([3, 1]).
If you want to limit the number of connections, you could use:
if sum(alphaalpha[start:end, :]) > 2:

thanks
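
To make the quoted check easier to follow, here is a plain-Python sketch of the same counting logic, written without torch so it runs anywhere. It assumes that operation index 0 stands for "no connection", which is why rows whose argmax is nonzero are counted; the function name `count_active_edges` is illustrative only.

```python
def count_active_edges(alpha_rows):
    """Count rows whose largest weight is not at operation index 0."""
    count = 0
    for row in alpha_rows:
        best_op = max(range(len(row)), key=row.__getitem__)  # argmax over operations
        if best_op != 0:
            count += 1
    return count


# Four candidate edges into one node, two operations each.
alpha = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
active = count_active_edges(alpha)
print(active, "active edges; the quoted check reports an error when this exceeds 2")
```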

PRUNE_EA parallel_search error

when I set parallel_search: True in prune.yml, I get this error

Traceback (most recent call last):
File "", line 1, in
File "/wn/vega/zeus/trainer_base.py", line 153, in train_process
self._train_loop()
File "/wn/vega/zeus/trainer_base.py", line 279, in _train_loop
self.callbacks.before_train()
File "/wn/vega/zeus/trainer/callbacks/callback_list.py", line 139, in before_train
callback.before_train(logs)
File "/wn/vega/vega/algorithms/compression/prune_ea/prune_trainer_callback.py", line 61, in before_train
self.latency_count = calc_forward_latency(self.trainer.model, count_input, sess_config)
File "/wn/vega/zeus/metrics/forward_latency.py", line 30, in calc_forward_latency
step_cfg = UserConfig().data.get("nas")
AttributeError: 'NoneType' object has no attribute 'get'

Everything works when I set parallel_search: False and run the prune algorithm demo. What is wrong with parallel_search?

Issues regarding the computation of FLOPS with thop

In model_statistics.py, the FLOPS is computed with the 3rd party package "thop". In their GitHub repo, it has been explained that the output of profile is actually MACs instead of FLOPS.

This line of code is even more puzzling: self.gflops, self.kparams = flops_count * 1600 * 1e-9, params_count * 1e-3

Multiplying by 1e-9 converts the count to GMACs, but why multiply by 1600?
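
For reference, the MAC/FLOP distinction can be checked by hand on a single layer. The sketch below counts multiply-accumulate operations for one 2D convolution and converts them with the common convention that one MAC equals two floating-point operations; the helper name `conv2d_macs` and the layer sizes are illustrative. It does not account for the factor of 1600 in the quoted line.

```python
def conv2d_macs(c_in, c_out, kernel_h, kernel_w, out_h, out_w, groups=1):
    """Multiply-accumulate count of one 2D convolution (bias ignored)."""
    return (c_in // groups) * kernel_h * kernel_w * c_out * out_h * out_w


# A 3x3 convolution from 3 to 64 channels producing a 32x32 feature map.
macs = conv2d_macs(3, 64, 3, 3, 32, 32)
flops = 2 * macs  # convention: one MAC = one multiply + one add

print(f"{macs * 1e-9:.6f} GMACs, {flops * 1e-9:.6f} GFLOPs")
```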

where is the dataset.py?

Hello, I get this error when I try to train the auto-lane model with the CULane dataset. There is no dataset.py in the CULane dataset, so where can I get this file? Many thanks!
Traceback (most recent call last):
File "", line 1, in
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 152, in train_process
self.build(model=self.model, hps=self.hps, load_ckpt_flag=self.load_ckpt_flag)
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 189, in build
mode='train', loader=train_loader)
File "/home/haha/.local/lib/python3.7/site-packages/vega/core/trainer/trainer.py", line 360, in _init_dataloader
dataset = dataset_cls(mode=mode)
File "/home/haha/.local/lib/python3.7/site-packages/vega/datasets/pytorch/auto_lane_datasets.py", line 97, in __init__
train=load_module(self.args.dataset_file).create_train_subset(),
File "/home/haha/.local/lib/python3.7/site-packages/vega/datasets/pytorch/common/auto_lane_utils.py", line 214, in load_module
spec.loader.exec_module(mod)
File "", line 724, in exec_module
File "", line 859, in get_code
File "", line 916, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/cache/dataset/CULane/dataset.py'
2020-09-08 15:47:20.949 INFO {'code': 'r34_48_1-1111-1-22112111111111111111+012-122', 'method': 'random'}
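
The traceback shows that the dataset file is loaded as a Python module and that `create_train_subset()` is then called on it. The sketch below is not vega's `load_module`; it is a hypothetical loader that does the same thing with `importlib` and fails with a clearer message when the file is missing or lacks the expected function. What `dataset.py` must contain beyond `create_train_subset` is not stated in this thread.

```python
import importlib.util
import os


def load_dataset_module(path):
    """Load a user-supplied dataset.py and check it defines create_train_subset."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"dataset file not found: {path}")
    spec = importlib.util.spec_from_file_location("user_dataset", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path} as a Python module")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, "create_train_subset"):
        raise AttributeError(f"{path} does not define create_train_subset()")
    return module


try:
    load_dataset_module("/cache/dataset/CULane/dataset.py")
except (FileNotFoundError, AttributeError) as err:
    print(err)
```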

Potential bug in spnet.py ?

I have seemingly installed vega correctly and I can import it as well.
I have correctly placed the pre-trained models and dataset in the respective folders (cache/models and cache/datasets folder).

I am trying to run inside example folder the following command:
python run_example.py nas/sp_nas/spnas.yml

The code breaks with the following error:
vega/algorithms/nas/sp_nas/spnet/spnet.py line 636, in __init__
assert max(out_indices) < num_stages
TypeError: '<' not supported between instances of 'str' and 'int'
The error message makes clear that one side of the comparison is a string and the other an int. When I print out_indices and num_stages, I see that out_indices is {'__tuple__': True, 'items': [0, 1, 2, 3]}, so max(out_indices) simply returns 'items'.
Is this a bug, or is there an issue with my Python or library versions?
I am using Python 3.7.7
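
The printed value suggests that a tuple was serialized into a `{'__tuple__': True, 'items': [...]}` dictionary and never converted back. A minimal sketch of such a conversion follows; `decode_tuples` is a hypothetical helper, and where it would need to be called inside vega is not something this thread establishes.

```python
def decode_tuples(obj):
    """Recursively turn {'__tuple__': True, 'items': [...]} back into tuples."""
    if isinstance(obj, dict):
        if obj.get("__tuple__") is True and "items" in obj:
            return tuple(decode_tuples(item) for item in obj["items"])
        return {key: decode_tuples(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [decode_tuples(item) for item in obj]
    return obj


raw = {"__tuple__": True, "items": [0, 1, 2, 3]}
print(max(raw))              # max over a dict compares its keys, so this is a str

out_indices = decode_tuples(raw)
num_stages = 4
assert max(out_indices) < num_stages   # the original assertion now compares ints
```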

Can not find the auto_lane pretrained pth.

What's more, is there any test code or inference code guide of auto_lane module?

SP-NAS Config file for EuroCity Persons model

I am trying to reproduce the numbers for EuroCity Persons using your pre-trained model from the model zoo. I have downloaded the ECP model from the zoo, but the corresponding config file is missing. Can you please provide it?

Failed to save desc

I cannot find where this log is saved and don't know how to analyze it.
Can you help me?

Failed to save desc, file=/home/mengzhibin/vega/tasks/1105.182606.151/workers/nas/4/desc_4.json, desc={'detector': {'name': 'AutoLaneDetector', 'modules': ['backbone', 'neck', 'head'], 'num_class': 2, 'method': 'random', 'code': 'x50(2x24d)_48_112111-211112-1-1+122-022', 'backbone': {'name': 'ResNeXtVariantDet', 'arch': '112111-211112-1-1', 'base_depth': 50, 'base_channel': 48, 'groups': 2, 'base_width': 24, 'num_stages': 4, 'strides': (1, 2, 2, 2), 'dilations': (1, 1, 1, 1), 'out_indices': (0, 1, 2, 3), 'frozen_stages': -1, 'zero_init_residual': False, 'norm_cfg': {'type': 'BN', 'requires_grad': True}, 'conv_cfg': {'type': 'Conv'}, 'out_channels': [384, 1536, 1536, 1536], 'style': 'pytorch'}, 'neck': {'arch_code': '122-022', 'name': 'FeatureFusionModule', 'in_channels': [384, 1536, 1536, 1536]}, 'head': {'base_channel': 1792, 'num_classes': 2, 'up_points': 73, 'down_points': 72, 'name': 'AutoLaneHead'}, 'limits': {'GFlops': 1}}, 'modules': ['detector']}, msg=local variable 'value' referenced before assignment
Failed to save performance, file=/home/mengzhibin/vega/tasks/1105.182606.151/workers/nas/4/performance_4.json, desc={'LaneMetric': 0.0}, msg=local variable 'value' referenced before assignment

Is there no code for guided mutation in esr-ea, or did I miss it?

Hi, thanks for your awesome work. I doubt whether the implementation of the EA's mutation is correct. The paper says that block credits should be acquired during model evaluation and used to guide the mutation, which accelerates the search and finds better architectures. As far as I can see, vega implements only general mutation. Could you help me? Thanks!

auto lane tar file cannot download

The AutoLane tar file cannot be downloaded from:
https://github.com/huawei-noah/vega/blob/master/docs/cn/model_zoo/model_zoo.md
Is the download link wrong?
Also, only the model pretrained on CULane has been uploaded. When will the CurveLanes-based model be uploaded?

can't find the pretrained model?

the losses of auto lane are not converging on CurveLanes Dataset

I test the loss terms in vega/search_space/networks/pytorch/detectors/auto_lane_detector.py:

    image = input
    loc_targets = kwargs['gt_loc']
    cls_targets = kwargs['gt_cls']

    feat = self.extract_feat(image)
    predict = self.head(feat)

    loc_preds = predict['predict_loc']
    cls_preds = predict['predict_cls']
    cls_targets = cls_targets[..., 1].view(-1)
    pmask = cls_targets > 0
    nmask = ~ pmask
    fpmask = pmask.float()
    fnmask = nmask.float()
    cls_preds = cls_preds.view(-1, cls_preds.shape[-1])
    loc_preds = loc_preds.view(-1, loc_preds.shape[-1])
    loc_targets = loc_targets.view(-1, loc_targets.shape[-1])
    total_postive_num = torch.sum(fpmask)
    total_negative_num = torch.sum(fnmask)  # Number of negative entries to select
    negative_num = torch.clamp(total_postive_num * self.NEGATIVE_RATIO, max=total_negative_num, min=1).int()
    positive_num = torch.clamp(total_postive_num, min=1).int()
    # cls loss begin
    bg_fg_predict = F.log_softmax(cls_preds, dim=-1)
    fg_predict = bg_fg_predict[..., 1]
    bg_predict = bg_fg_predict[..., 0]
    max_hard_pred = find_k_th_small_in_a_tensor(bg_predict[nmask].detach(), negative_num)
    fnmask_ohem = (bg_predict <= max_hard_pred).float() * nmask.float()
    total_cross_pos = -torch.sum(self.ALPHA * fg_predict * fpmask)
    total_cross_neg = -torch.sum(self.ALPHA * bg_predict * fnmask_ohem)
    # class loss end
    # regression loss begin
    length_weighted_mask = torch.ones_like(loc_targets)
    length_weighted_mask[..., self.LANE_POINTS_NUM_DOWN] = 10
    valid_lines_mask = pmask.unsqueeze(-1).expand_as(loc_targets)
    valid_points_mask = (loc_targets != 0)
    unified_mask = length_weighted_mask.float() * valid_lines_mask.float() * valid_points_mask.float()
    smooth_huber = huber_fun(loc_preds - loc_targets) * unified_mask
    loc_smooth_l1_loss = torch.sum(smooth_huber, -1)
    point_num_per_gt_anchor = torch.sum(valid_points_mask.float(), -1).clamp(min=1)
    total_loc = torch.sum(loc_smooth_l1_loss / point_num_per_gt_anchor)
    # regression loss end
    total_cross_pos = total_cross_pos / positive_num
    total_cross_neg = total_cross_neg / positive_num
    total_loc = total_loc / positive_num

I ran this on the CurveLanes dataset with the provided optimizer parameters (lr=0.02, weight_decay=1e-4, momentum=0.9, etc.),
and built a model structure as described in the readme file "https://github.com/huaweinoah/vega/blob/master/docs/en/algorithms/auto_lane.md".

However, the loss values are quite large:
loss_pos = 36.0+
loss_neg = 10.+
loss_loc = 100.0+

and they did not converge after 12 epochs, the number of epochs mentioned in this paper: https://arxiv.org/abs/2007.12147

Are the loss terms in auto_lane_detector.py wrong, or am I missing some important steps?

run_cluster_horovod_train.sh: No such file or directory

When I try to solve this problem for the esr_ea algorithm in the way described in #84, I get the error: "/root/.local/lib/python3.6/site-packages/vega/core/pipeline/horovod/run_cluster_horovod_train.sh: No such file or directory". Could you tell me how to deal with it?

How to test SP-Nas after training ?

I am sorry, but I am puzzled about how to test SP-NAS after training finishes. I trained SP-Net in the example folder using the following command:
python run_example.py ./nas/sp_nas/spnas.yml

The code uses the config file you provide, /nas/sp_nas/faster_rcnn_r50_fpn_1x.py. Training runs fine, evaluation on the validation set during training works, and the mAP is reasonable.
I did not change anything in the code or the .yml file. I only cloned the repository and set up the dataset paths and the pre-trained model.

However, after training finishes I am trying to run the saved model in the folder:
examples/tasks/0719.042952.773/output/2/1112-1112-11111-21-1-11.pth

Using the command
python test.py vega/examples/nas/sp_nas/faster_rcnn_r50_fpn_1x.py --checkpoint examples/tasks/0719.042952.773/output/2/1112-1112-11111-21-1-11.pth --out res.pkl
but it gives an mAP of essentially 0, which I know is wrong. Do I need to change something in the config file faster_rcnn_r50_fpn_1x.py? How should I run the test? Could you please elaborate?

AutoLaneHead forward error

The error is caused by the forward function of the superclass Module: during training, the forward_train function of the class AutoLaneHead is not called.

Train Pipeline

I want to run the training pipeline. Could you provide the file /data/2019_mdc_lane/c00523047/mass_storage/culane/CULane/dataset.py?

The performance of CARS from example

Hi, thanks for the great work!
I am curious about the output I get from cars algorithm in the examples.
I got 86.488 as the best top-1 validation accuracy after running the command given in the readme.
Is the accuracy supposed to be higher, or do I need to modify cars.yml for better performance?

When will the code for "Circumventing Outliers of AutoAugment with Knowledge Distillation" be released?

[Bug] Fix the bug in Sp-NAS test.py

Similar to the bug reported in #19 #23. The following line in test.py is also redundant and it causes the code to crash in second stage of training.

vega/vega/algorithms/nas/sp_nas/tools/test.py

Line 193 in fc8f30c

cfg = mmcv.Config.fromfile(args.config)

no torch.manual_seed in trainer

Only torch.cuda.manual_seed() is called;
torch.manual_seed() is missing.
As a result, training the same model gives inconsistent accuracy across runs.
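
A minimal sketch of seeding all the relevant generators in one place is shown below. The helper name `seed_everything` is illustrative and is not part of vega; the torch import is guarded only so that the snippet also runs where PyTorch is not installed. Seeding alone may not make GPU training fully deterministic, since some CUDA kernels are non-deterministic.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG and, when PyTorch is available, its CPU and CUDA RNGs."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)                  # the call reported as missing above
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)     # every visible GPU, not only the current one


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)
```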

examples/classifytask: Horovod has not been initialized; use hvd.init()

I ran the example code like this:
python run_pipeline.py classification/classify.yml
with one new config item, trainer:distributed=True.

Some errors about Horovod occur: ValueError: Horovod has not been initialized; use hvd.init().

Is Horovod not supported in this example?

Question about generating the ground truth

When two lane lines are so close that they fall into the same grid cell, which lane line does the cell respond to?

Another question is about the adaptive score masking:
what do uxf and uyf mean?

QQ chatroom

I can't find the chatroom.

yml file of the conda environment ?

Would it be possible for you to share the yml file of your conda environment? I am running into version-related issues when installing with pip.

can user set worker_path in config?

Every time pipeline.py is run, the worker_path used to save the model and the TensorBoard log is randomly generated, for example 0126.091415.578 or 0126.090933.510. This is very inconvenient. Could a fixed worker_path be set in the yml file?
In addition, within a single training task, I found that some parameters are set in the yml file and others in a config .py file. This is inconsistent, and I think the code framework should be standardized. Thanks.

[URL fail] Missing URLs for the pre-trained models

I found that we cannot download the pretrained models which vega provided in this page https://github.com/huawei-noah/vega/blob/master/docs/cn/user/examples.md.
Can you fix the url links of pre-trained models or provide other ones?

[Bug] "Not found serial results"

Training SP-NAS with the default configurations, spnas.yml (the one you guys provide) breaks in the second phase where it starts training nas2. It complains it cannot find the file total_list_s.csv. I think the problem is in the variable remote_output_path; when using default settings, the code expects the file to be in the folder nas2, which is not yet created. Instead, the file is at the following location:

tasks/0726.021123.846/output/total_list_s.csv

whereas the current code (with default parameters) searches for the file at the following location:

tasks/0726.021123.846/output/nas2/total_list_s.csv

and below is the trail of the error message

2020-07-26 04:56:45.680 INFO Start pipeline step: [nas2]
vega-0.9.1-py3.7.egg/vega/core/pipeline/pipeline.py", line 58, in run
    PipeStep().do()
 vega/algorithms/nas/sp_nas/spnas_pipe_step.py", line 27, in __init__
    super().__init__()
vega/core/pipeline/nas_pipe_step.py", line 28, in __init__
    self.generator = Generator()
vega/core/pipeline/generator.py", line 25, in __init__
    self.search_alg = SearchAlgorithm(self.search_space)

vega-0.9.1-py3.7.egg/vega/algorithms/nas/sp_nas/sp_nas.py", line 50, in __init__
    ), "Not found serial results!"
AssertionError: Not found serial results!

How to reproduce the results of the paper? Please provide config files for the models in the model zoo

I am trying to use the ecp model(spnet_checkpoint_ecp.pth) provided in the model zoo to reproduce results of the paper.
When I try to execute test.py in spnet/tools/ it complains about mismatch between keys of the models.

Without the config files (such as faster_rcnn_r50_fpn_1x.py) I cannot do anything?
I have also tried to use the file you guys mention in the issue #14. The code breaks with the error that it is missing attribute in keep_all_stages

Suggestion: I think it would be great if you could provide a small example of running your pre-trained model.
Thanks

Question on "auto_lane_pointlane_codec.py"

Hello! I'm wondering whether this line causes problems when the result of (self.points_per_line / self.feature_height) is not an integer (by default it is 72/18=4).

vega/zeus/datasets/common/utils/auto_lane_pointlane_codec.py

Line 289 in 698b6c0

 center_y = y_list[int(self.points_per_line / self.feature_height) * (self.feature_height - 1 - h)] 

should it be center_y = y_list[int((self.points_per_line / self.feature_height) * (self.feature_height - 1 - h))] ?
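
The difference between the two expressions can be checked with plain arithmetic. The sketch below evaluates both index formulas as standalone functions (the function names are illustrative); it shows where they diverge but does not decide which one the codec intends.

```python
def index_truncate_ratio_first(points_per_line, feature_height, h):
    """Current expression: truncate the ratio, then multiply."""
    return int(points_per_line / feature_height) * (feature_height - 1 - h)


def index_truncate_product(points_per_line, feature_height, h):
    """Proposed expression: multiply, then truncate."""
    return int((points_per_line / feature_height) * (feature_height - 1 - h))


# Default setting, ratio 72/18 = 4: both expressions agree.
print(index_truncate_ratio_first(72, 18, 0), index_truncate_product(72, 18, 0))

# Non-integer ratio 72/16 = 4.5: the results differ.
print(index_truncate_ratio_first(72, 16, 0), index_truncate_product(72, 16, 0))
```

With a ratio of 4.5, the first form never reaches the upper part of a 72-point list, because its largest index is 4 * 15 = 60.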

Can auto Lane detect any number of lanes?

Is auto lane limited to detecting a predefined number of lanes? Thanks.

I have two GPUs, how to choose one of them to work?

Hello,

I did not find the code for GPU selection.

Thank you!
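
Independent of any option vega itself may offer (none is mentioned in this thread), a common approach is to restrict which GPUs the process can see through the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is a general CUDA technique; the helper name `select_gpu` is illustrative. It must run before the deep-learning framework initializes CUDA, so in practice before the framework is imported.

```python
import os


def select_gpu(index):
    """Expose only one physical GPU to this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


select_gpu(1)   # use the second physical GPU
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Inside the process, the selected card then appears as device 0. The same effect can be had from the shell by prefixing the launch command with `CUDA_VISIBLE_DEVICES=1`.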

NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined

I followed the quickstart example and got the following error. Can anyone help?

File "quickstart.py", line 60, in
vega.run("./my.yml")
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/run.py", line 34, in run
_init_env(cfg_path)
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/run.py", line 62, in _init_env
set_backend(General.backend, General.device_category)
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/backend_register.py", line 64, in set_backend
register_pytorch()
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/backend_register.py", line 20, in register_pytorch
import vega.core.trainer.timm_trainer_callback
File "/home/lchen/.conda/envs/clpython/lib/python3.7/site-packages/vega/core/trainer/timm_trainer_callback.py", line 60, in
mean=IMAGENET_DEFAULT_MEAN,
NameError: name 'IMAGENET_DEFAULT_MEAN' is not defined

Fully train of pba

Hi, I have run PBA and obtained some augmentation policies, but I do not know how to fully train the model with the policy that was found. Could you help me?

[Bug] Problem in SP-NAS (fullytrain) ERROR Failed to load records from model folder.

Hi,

I am trying to train the full pipeline [nas1, nas2, fullytrain] of SP-NAS. I did not change anything, except I changed one line in spnas.yml, that is I changed:

pipeline: [nas1] to pipeline: [nas1, nas2, fullytrain]

It trains fine for nas1 and nas2. However, the code breaks by complaining that it cannot find records. This is the error trail.
Can you suggest a quick fix ?

2020-09-24 10:08:23.81 INFO performance save to vega/examples/tasks/0924.025954.103/workers/nas2/11/performance
2020-09-24 10:08:24.275 INFO Latest checkpoint save to vega/examples/tasks/0924.025954.103/output/11
2020-09-24 10:08:24.276 INFO update generator, step name: nas2, worker id: 11
2020-09-24 10:08:24.277 INFO SpNas.update(), performance file=vega/examples/tasks/0924.025954.103/workers/nas2/11/performance/performance.pkl
2020-09-24 10:08:24.321 INFO Start pipeline step: [fullytrain]
2020-09-24 10:08:24.322 INFO init FullyTrainPipeStep...
2020-09-24 10:08:24.322 INFO FullyTrainPipeStep started...
2020-09-24 10:08:24.324 ERROR Failed to load records from model folder, folder=vega/examples/tasks/0924.025954.103/output/nas2
2020-09-24 10:08:24.324 WARNING Failed to dump records, report is emplty.

The folder output/nas2 is never created by the code.