kakaobrain / fast-autoaugment
Official Implementation of 'Fast AutoAugment' in PyTorch.
License: MIT License
@ildoonet @sublee Hi, it is very kind of you to release the search code, and it is much appreciated. While running your search code and retraining with the found policies, I still ran into some problems, and hopefully you can help me figure them out.
Looking forward to your reply.
Thanks again.
Hi, the code seems good for research purposes. There is not much documentation, but that's ok.
I think I've managed to understand what it does and how it works.
Now I need to use this package on my own custom datasets (none of the default torchvision.datasets) for production purposes.
Any idea how to run search.py with a custom dataset?
Thanks
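A minimal sketch of one way this could be wired up, assuming you add a branch for your dataset inside get_dataloaders() in FastAutoAugment/data.py; the directory layout, function name, and crop sizes below are hypothetical, not part of the repo:

```python
import os
from torchvision import datasets, transforms

# Hypothetical helper mirroring what a new branch in get_dataloaders() would build:
# an ImageFolder-style dataset with train/ and test/ subdirectories.
def build_custom_dataset(dataroot, transform_train, transform_test):
    total_trainset = datasets.ImageFolder(os.path.join(dataroot, 'train'), transform=transform_train)
    testset = datasets.ImageFolder(os.path.join(dataroot, 'test'), transform=transform_test)
    return total_trainset, testset

transform_train = transforms.Compose([transforms.RandomResizedCrop(32), transforms.ToTensor()])
transform_test = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
# trainset, testset = build_custom_dataset('/path/to/mydata', transform_train, transform_test)
```

Presumably a matching conf yaml (dataset name, num_class, model) would also be needed, but that part is guesswork.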
Having prepared environment.yml with:

```yaml
name: fast-aa
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6.9
  - pytorch=1.2.0
  - torchvision=0.4.0
  - cudatoolkit=10
  - pip
  - pip:
    - git+https://github.com/wbaek/theconf@de32022f8c0651a043dc812d17194cdfd62066e8
    - git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git@08f7d5e
    - git+https://github.com/ildoonet/pystopwatch2.git
    - git+https://github.com/hyperopt/hyperopt.git
    - pretrainedmodels
    - gorilla
    - tabulate
    - pandas
    - tqdm
    - tensorboardx
    - sklearn
    - ray
    - psutil
    - setproctitle
    - requests
```

and running search.py with:

```sh
python FastAutoAugment/search.py -c confs/wresnet40x2_cifar.yaml
```
I keep receiving warnings or errors about wrong or missing packages, e.g.:

HyperOptSearch DeprecationWarning: This class has been moved.

Could you share the validated package versions?
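For what it's worth, that particular DeprecationWarning usually just points at the class's new import location. A hedged sketch of an import that tolerates both layouts (the exact paths depend on your ray version):

```python
# The old location emits the DeprecationWarning; newer ray moved the class.
try:
    from ray.tune.suggest.hyperopt import HyperOptSearch  # newer ray releases
except ImportError:
    from ray.tune.suggest import HyperOptSearch           # older ray releases
```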
Hi,
Thank you for the work. I just want to share a small issue that confuses me.
fast-autoaugment/FastAutoAugment/data.py
Lines 313 to 314 in 2424224
It would be great if you could kindly take a look. Thank you.
I'd like to train with this config: https://github.com/kakaobrain/fast-autoaugment/blob/master/confs/pyramid272_cifar10_b64.yaml
on multiple GPUs (training on one takes about 600 hours). Are there any adjustments to that config that you would recommend?
Dear authors! I have a question about back-propagation through operations of the PILLOW library. How can the probability parameter p be passed through PIL operations and learned?
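For context, here is a tiny self-contained illustration of why p is hard to learn by back-propagation: the probability gates a hard Bernoulli draw in front of a PIL call, and the PIL call itself operates on images outside the autograd graph. (This is my reading of the situation, not an authors' statement; presumably this is why Fast AutoAugment searches over p rather than differentiating through it.)

```python
import random
from PIL import Image, ImageOps

def maybe_autocontrast(img, p=0.5):
    # p gates a hard random draw (non-differentiable), and the PIL op itself
    # provides no gradient path, so p cannot be learned by back-propagation here.
    if random.random() < p:
        img = ImageOps.autocontrast(img)
    return img

img = maybe_autocontrast(Image.new('RGB', (32, 32)))
```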
I did not find the SamplePairing operation listed in the policies found for CIFAR. From FastAutoAugment/augmentations.py, I also noticed that the code related to the SamplePairing operation is commented out. Does this mean that this operation was not included in the search space in the experiments? I would be grateful for any suggestions.
Hi, I want to run your code for ImageNet, but it seems the ResNet-50 implementation is missing.
In fast-autoaugment/FastAutoAugment/networks/__init__.py:

```python
from pretrainedmodels import models
...
model = models.resnet50(num_classes=num_class, pretrained=None)
```

but pretrainedmodels is not uploaded yet. Is this ResNet-50 from torchvision or your original implementation?
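Until that's clarified, a plausible stand-in (my assumption, not confirmed by the authors) is torchvision's ResNet-50, since Cadene's pretrainedmodels resnet50 mirrors its interface:

```python
import torchvision

# Assumption: plain torchvision ResNet-50 as a stand-in for pretrainedmodels' version.
model = torchvision.models.resnet50(num_classes=1000)
```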
Hello, I see that the searching code sets num-search to 200 and resources_per_trial to 1. Does this mean that Fast AutoAugment needs 200 GPUs for searching?
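For reference, in Ray Tune the number of trials and the per-trial resources are independent knobs; a sketch of what the two numbers mean (API names here follow more recent ray releases than the 0.6.x this repo targets, so treat it as illustrative):

```python
from ray import tune

def my_trainable(config):
    # hypothetical trainable: report a dummy score
    tune.report(score=0.0)

# num_samples=200 schedules 200 trials; resources_per_trial={'gpu': 1} means each
# running trial holds one GPU, so trials queue up over however many GPUs exist.
analysis = tune.run(
    my_trainable,
    num_samples=200,
    resources_per_trial={'cpu': 2, 'gpu': 1},
)
```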
I trained ImageNet using 32 GPUs via Horovod (8×V100 × 4) but got 77.1% accuracy, much less than the 78.6% reported in your paper, by running:

```sh
python train.py -c confs/resnet50_imagenet_b4096.yaml --aug fa_reduced_imagenet --horovod
```

Moreover, according to your yaml config the lr schedule should be multistep (adjust_learning_rate_resnet, as seen in train.py), but I saw cosine lr decay being used when I ran your code.
Waiting for your reply, thanks.
Hi, I've tried to run the code for searching policies, but I have trouble initializing the Ray server.
It seems that there is something wrong with the call ray.init(redis_address=args.redis) in search.py line 164:

```
Traceback (most recent call last):
  File "search.py", line 166, in <module>
    ray.init(redis_address=args.redis)
  File "/home/xcq/anaconda3/envs/pytorch-video/lib/python3.6/site-packages/ray/worker.py", line 1425, in init
    redis_address = services.address_to_ip(redis_address)
  File "/home/xcq/anaconda3/envs/pytorch-video/lib/python3.6/site-packages/ray/services.py", line 145, in address_to_ip
    ip_address = socket.gethostbyname(address_parts[0])
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
```

Any suggestion about what I could be doing wrong or how to work around it?
ray version == 0.6.5
Hello, I have some questions.
In the paper, it is written that the train dataset (D_train) is split into k folds, each divided into D_M and D_A according to a ratio, and the policy search is run on each fold. However, looking at the data.py code, it doesn't seem that the data is actually split into k folds of that size.
Also, for the final training on D_train with all the found policies merged, the implementation does not apply every policy to D_train; instead it randomly selects one of the sub-policies from the merged set, applies that transform, and trains on the result.
I am wondering if I am misunderstanding these two things.
Thank you.
According to the following line, your code doesn't update batch-norm parameters. What is the real reason for that?

```python
params_without_bn = [params for name, params in model.named_parameters() if not ('_bn' in name or '.bn' in name)]
```
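For what it's worth, a common reading of such a filter (an assumption on my part, not confirmed by the authors) is that BN parameters are excluded from weight decay rather than from updates entirely, via a second parameter group; a sketch of that pattern:

```python
import torchvision
import torch.optim as optim

model = torchvision.models.resnet18()
bn_params = [p for n, p in model.named_parameters() if '_bn' in n or '.bn' in n]
other_params = [p for n, p in model.named_parameters() if not ('_bn' in n or '.bn' in n)]

# BN params are still updated by SGD; they just receive no weight decay.
# (The name-based filter is the repo's heuristic and may miss top-level BN layers.)
optimizer = optim.SGD([
    {'params': other_params, 'weight_decay': 5e-4},
    {'params': bn_params, 'weight_decay': 0.0},
], lr=0.1, momentum=0.9)
```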
Hello,
running the code I encounter this message from the torch implementation:

```
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 INTERNAL ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:348, please report a bug to PyTorch. intrusive_ptr:
```

I used the suggested versions. Do you have some advice?
Thank you.
@ildoonet, I am referring to the paper.
Can you explain this? And which one should I follow?
Many thanks.
Hello!
I use Python 3.6.5.
I installed with pip install git+https://github.com/wbaek/theconf.git
and I get:

```
Collecting git+https://github.com/wbaek/theconf.git
  Cloning https://github.com/wbaek/theconf.git to /tmp/pip-req-build-pun46x94
  Complete output from command python setup.py egg_info:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-req-build-pun46x94/setup.py", line 19, in <module>
      long_description = fp.read()
    File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 449: ordinal not in range(128)
```
Hi @ildoonet, thanks for the work first. While running

```sh
python search.py -c confs/wresnet40x2_cifar10_b512.yaml --dataroot ... --redis ...
```

on a Ray cluster, I find that the head node can't gather the models trained on the worker nodes for the subsequent policy-search stage; the main process on the head node throws an exception:

No such file or directory: '/FastAutoAugment/models/cifar10_wresnet40_2_ratio0.4_fold0.model'

So can you tell me how you trained end-to-end with that command on a Ray cluster with multiple nodes working in parallel?
Hi, I've tried to run the code for searching policies, but there is no way I can make it run on a single machine with several GPUs, and the problem seems to be with Ray. I do initialize the Ray server correctly, but apparently the trouble is with the train_and_eval function.
Any suggestion about what I could be doing wrong or how to work around it?

```
2020-08-15 10:18:09,387 ERROR worker.py:1717 -- Possible unhandled error from worker: ray_worker (pid=13478, host=macaron)
  File "FastAutoAugment/search.py", line 66, in train_model
    result = train_and_eval(None, dataroot, cv_ratio_test, cv_fold, save_path=save_path, only_eval=skip_exist)
  File "/home/gim282/data_augmentation/good/FastAutoAugment/train.py", line 123, in train_and_eval
    add_filehandler(logger, args.save + '.log')
NameError: name 'args' is not defined
```
Intuitively, the optimizer could simply choose no augmentation at all to achieve higher validation accuracy. Why does the method still work? Looking forward to your answer.
I installed the package 'theconf' from here: https://github.com/ildoonet/theconf. However, there is no class named 'Config'. Could you help me figure this out? Many thanks!
Hi, I want to execute search.py on CIFAR-10. I notice that fa_reduced_cifar10 is the result you got, but in wresnet40x2_cifar10_b512.yaml there already exists a key aug: fa_reduced_cifar10. Should I delete it or leave its value empty?
Could you provide an example of how to use your technique to find the best policy for another dataset, or maybe for CIFAR?
Thanks for your work!
1. cp search.py ../search.py
2. In the directory ..../fast-autoaugment, execute the following command:

```sh
python search.py -c confs/wresnet40x2_cifar10_b512.yaml
```

and got this error:

```
 46%|████████████████████████████████████████████████████████████████████████▎ | 91/200 [30:06<00:30, 3.54it/s, cv1=200, cv2=200, cv3=90, cv4=200, cv5=200]
(pid=31751) 0200]: 80%|████████ | 16/20 [00:04<00:01, 3.63it/s, loss=0.299, top1=0.905, top5=0.997]
(pid=31751) 0200]: 90%|█████████ | 18/20 [00:04<00:00, 4.60it/s, loss=0.298, top1=0.905, top5=0.997]
 46%|████████████████████████████████████████████████████████████████████████▎ | 91/200 [30:07<00:30, 3.54it/s, cv1=200, cv2=200, cv3=90, cv4=200, cv5=200]
[*test 0000/0200]: 100%|██████████| 20/20 [00:06<00:00, 3.31it/s, loss=0.298, top1=0.904, top5=0.997]
(pid=31751) 2019-12-26 15:58:24,892 ERROR worker.py:433 -- SystemExit was raised from the worker
(pid=31751) Traceback (most recent call last):
(pid=31751)   File "python/ray/_raylet.pyx", line 711, in ray._raylet.task_execution_handler
(pid=31751)   File "python/ray/_raylet.pyx", line 694, in ray._raylet.execute_task
(pid=31751) SystemExit: 0
170500096it [30:00, 94677.34it/s]
 46%|████████████████████████████████████████████████████████████████████████▎
```
I tried the following script with the found policy on CIFAR-10, as described in the README:

```sh
python FastAutoAugment/train.py -c confs/wresnet40x2_cifar10_b512.yaml --aug fa_reduced_cifar10 --dataset cifar10
```

With this script, I achieved 92.89% accuracy (i.e., a 7.11% error rate) on the test set. However, a 3.7% error rate is reported in the README. This gap seems too large, so it looks like a bug.
How can it be fixed?
(My PyTorch version is 1.1.0 and the machine has 4 Titan Xp GPUs.)
The provided code first subsamples 6,000 images and only then filters out the irrelevant classes. Therefore, the resulting dataset is much smaller:
https://github.com/kakaobrain/fast-autoaugment/blob/master/FastAutoAugment/data.py#L132-L163
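A sketch of the order swap this report suggests, with hypothetical stand-in data: filtering to the retained classes first and only then drawing the 6,000-image subsample keeps the reduced set at its intended size.

```python
import random

labels = [random.randrange(1000) for _ in range(50000)]   # stand-in for dataset labels
kept_classes = set(range(120))                            # stand-in for the sampled classes

# Filter first, then subsample, so the final size really is min(6000, available).
candidate_idx = [i for i, y in enumerate(labels) if y in kept_classes]
subset_idx = random.sample(candidate_idx, min(6000, len(candidate_idx)))
print(len(subset_idx))
```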
In search.py, around L259, the code is final_policy_set.extend(final_policy); I think it should be final_policy_set.append(final_policy).
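The behavioral difference being pointed out, in isolation (the policy contents are hypothetical):

```python
final_policy = [[('Invert', 0.2, 3)], [('Rotate', 0.7, 2)]]  # hypothetical sub-policies

a = []
a.extend(final_policy)   # merges one level: a == [[('Invert', ...)], [('Rotate', ...)]]
b = []
b.append(final_policy)   # nests: b == [[[('Invert', ...)], [('Rotate', ...)]]]
print(len(a), len(b))    # 2 1
```

Which one is correct depends on whether downstream code expects a flat list of sub-policies or a list of policy sets.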
When I use

```sh
python search.py -c confs/wresnet40x2_cifar10_b512.yaml --dataroot ... --redis ...
```

I get results like:

```
[Errno 2] No such file or directory: '/home/kaijie.tang/code/fast-autoaugment/FastAutoAugment/models/cifar10_wresnet40_2_ratio0.4_fold0.model'
[Errno 2] No such file or directory: '/home/kaijie.tang/code/fast-autoaugment/FastAutoAugment/models/cifar10_wresnet40_2_ratio0.4_fold1.model'
[Errno 2] No such file or directory: '/home/kaijie.tang/code/fast-autoaugment/FastAutoAugment/models/cifar10_wresnet40_2_ratio0.4_fold2.model'
[Errno 2] No such file or directory: '/home/kaijie.tang/code/fast-autoaugment/FastAutoAugment/models/cifar10_wresnet40_2_ratio0.4_fold3.model'
[Errno 2] No such file or directory: '/home/kaijie.tang/code/fast-autoaugment/FastAutoAugment/models/cifar10_wresnet40_2_ratio0.4_fold4.model'
```

The failing code is around search.py line 186:

```python
for cv_idx in range(cv_num):
    try:
        # load the checkpoint produced by the train-without-augmentation stage
        latest_ckpt = torch.load(paths[cv_idx])
        if 'epoch' not in latest_ckpt:
            epochs_per_cv['cv%d' % (cv_idx + 1)] = C.get()['epoch']
            continue
        epochs_per_cv['cv%d' % (cv_idx + 1)] = latest_ckpt['epoch']
    except Exception as e:
        # checkpoint missing or unreadable: skip this fold for now
        continue
```

Why does the code load the checkpoints before training? Where can I find or generate those model checkpoints?
Hi Ildoo, thank you for sharing this great code.
I tried ResNet-18 on ImageNet, but the result is not good. Have you ever experimented with ResNet-18, or do you have any suggestions?
Hyperparameters: SGD with linear lr=0.4, batch=1024, weight decay=1e-4, epochs=120.

| method   | top1  |
|----------|-------|
| baseline | 70.68 |
| fast aa  | 70.22 |
Hi~ I have some questions about the paper and code.

Questions about the code:
1. What is eval_tta() (in search.py) for?
2. Why are loops like for _ in range(1): used in some places, e.g. in search.py and in class Augmentation in data.py?

Questions about the algorithm:
The paper says: "our goal is to improve the generalization ability by searching the augmentation policies that match the density of $D_{train}$ with the density of augmented $D_{valid}$". However, the FAA algorithm just seems to fit the already-trained model. It may only pick augmented data on which the model easily scores well, and such 'easy' augmented data may not match the training data. Is there any theoretical guarantee that this algorithm works?

Inconsistency (maybe) between code and paper:
The paper says "$\mathcal{T}$ indicates a set of augmented images of dataset $D$ transformed by every sub-policy $\tau \in \mathcal{T}$". However, in class Augmentation in data.py, policy = random.choice(self.policies) is used, so only one of the five policies is applied when searching test-time augmentation policies. policy in the code is the same as sub-policy in the paper, right? But isn't this method actually the one used in AutoAugment, not FAA?

Thanks very much if you can offer some help! (A sketch of the selection logic in question follows below.)
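For reference, the per-image selection being described looks roughly like this (a simplified sketch of class Augmentation in data.py, assuming apply_augment is the op dispatcher in FastAutoAugment/augmentations.py): each call picks one sub-policy at random, so across many epochs every sub-policy gets applied some of the time.

```python
import random
from FastAutoAugment.augmentations import apply_augment  # repo's op dispatcher (assumed path)

class Augmentation:
    def __init__(self, policies):
        self.policies = policies                     # list of sub-policies

    def __call__(self, img):
        policy = random.choice(self.policies)        # one sub-policy per image per call
        for name, pr, level in policy:
            if random.random() > pr:                 # each op fires with its probability
                continue
            img = apply_augment(img, name, level)
        return img
```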
I ran the testing code using your provided models.

CIFAR-10:

| Model | Baseline | Cutout | AutoAugment | Fast AutoAugment (transfer/direct) | |
|---|---|---|---|---|---|
| Wide-ResNet-28-10 | 3.9 | 3.1 | 2.6 | 2.7 / 2.7 | Download |

CIFAR-100:

| Model | Baseline | Cutout | AutoAugment | Fast AutoAugment (transfer/direct) | |
|---|---|---|---|---|---|
| Wide-ResNet-28-10 | 18.8 | 18.4 | 17.1 | 17.3 / 17.3 | Download |

But I can't get the paper's results. Is something wrong?
Looking forward to your reply, thank you~
The results for CIFAR-10 are below:

```
[2020-11-12 06:10:48,729] [Fast AutoAugment] [WARNING] tag not provided, no tensorboard log.
[2020-11-12 06:10:48,730] [Fast AutoAugment] [INFO] ./FAA_Paper_models/cifar10_wresnet28x10_top1.pth file found. loading...
[2020-11-12 06:10:48,934] [Fast AutoAugment] [INFO] checkpoint epoch@10
[2020-11-12 06:10:48,941] [Fast AutoAugment] [INFO] optimizer.load_state_dict+
[2020-11-12 06:10:48,950] [Fast AutoAugment] [INFO] evaluation only+
[2020-11-12 06:11:57,150] [Fast AutoAugment] [INFO] done.
[2020-11-12 06:11:57,151] [Fast AutoAugment] [INFO] model: {'type': 'wresnet28_10'}
[2020-11-12 06:11:57,151] [Fast AutoAugment] [INFO] augmentation: fa_reduced_cifar10
[2020-11-12 06:11:57,151] [Fast AutoAugment] [INFO]
{
    "loss_train": NaN,
    "loss_valid": 0.0,
    "loss_test": NaN,
    "top1_train": 0.09475160256410256,
    "top1_valid": 0.0,
    "top1_test": 0.0909,
    "top5_train": 0.4834735576923077,
    "top5_valid": 0.0,
    "top5_test": 0.4729,
    "epoch": 0
}
[2020-11-12 06:11:57,152] [Fast AutoAugment] [INFO] elapsed time: 0.021 Hours
[2020-11-12 06:11:57,152] [Fast AutoAugment] [INFO] top1 error in testset: 0.9091
[2020-11-12 06:11:57,152] [Fast AutoAugment] [INFO] ./FAA_Paper_models/cifar10_wresnet28x10_top1.pth
```
After the iterative search in the parameter space completes, it gets stuck with no error message (399 is the last iteration):

```
iter 397 ma=0.509 OrderedDict([('RUNNING', 1), ('TERMINATED', 198), ('PENDING', 1), ('PAUSED', 0), ('ERROR', 0)])
2021-05-07 16:49:31,787 WARNING logger.py:126 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2021-05-07 16:49:31,787 WARNING logger.py:220 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
iter 398 ma=0.509 OrderedDict([('RUNNING', 2), ('TERMINATED', 198), ('PENDING', 0), ('PAUSED', 0), ('ERROR', 0)])
2021-05-07 16:49:48,651 INFO ray_trial_executor.py:178 -- Destroying actor for trial search_par_resnet50_fold1_ratio0.4_200_cv_fold=1,cv_ratio_test=0.4,dataroot=_home_ccf_project_SB_PAR_data_rapv2_,level_0_0=0.77372,level_0_1=0.45162,level_1_0=0.00049368,level_1_1=0.39083,level_2_0=0.46218,level_2_1=0.69141,level_3_0=0.0028208,level_3_1=0.27047,level_4_0=0.65674,level_4_1=0.84919,num_op=2,num_policy=5,policy_0_0=3,policy_0_1=7,policy_1_0=0,policy_1_1=10,policy_2_0=13,policy_2_1=7,policy_3_0=5,policy_3_1=12,policy_4_0=11,policy_4_1=1,prob_0_0=0.40213,prob_0_1=0.39349,prob_1_0=0.47788,prob_1_1=0.63856,prob_2_0=0.6497,prob_2_1=0.50779,prob_3_0=0.58183,prob_3_1=0.30122,prob_4_0=0.62576,prob_4_1=0.92233,save_path=_home_ccf_project_fastautoaugment_models_par_resnet50_ratio0.4_fold1.model. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
iter 399 ma=0.509 OrderedDict([('RUNNING', 1), ('TERMINATED', 199), ('PENDING', 0), ('PAUSED', 0), ('ERROR', 0)])
```

I found that if the following errors are reported before the iterations complete, it gets stuck; if there are no errors, it continues to the next stage:

```
iter 364 ma=0.509 OrderedDict([('RUNNING', 2), ('TERMINATED', 181), ('PENDING', 17), ('PAUSED', 0), ('ERROR', 0)])
(pid=45772) WARNING: Logging before InitGoogleLogging() is written to STDERR
(pid=45772) E0507 16:44:34.685359 45846 raylet_client.cc:345] IOError: [RayletClient] Connection closed unexpectedly. [RayletClient] Failed to push profile events.
```

Environment:
ray==0.6.5
python=3.6.9
TensorFlow is not installed.
CentOS 7
Hi, thank you for your work.
Here are my problems:

(1)

```
from watch import PyStopwatch
ImportError: cannot import name 'PyStopwatch' from 'watch' (D:\Anaconda3\lib\site-packages\watch\__init__.py)
```

(2)

```
[2021-07-24 14:43:40,018] [Fast AutoAugment] [INFO] initialize ray...
2021-07-24 14:43:43,768 INFO services.py:1272 -- View the Ray dashboard at http://127.0.0.1:8265
[2021-07-24 14:43:55,301] [Fast AutoAugment] [INFO] search augmentation policies, dataset=cifar10 model=wresnet40_2
[2021-07-24 14:43:55,301] [Fast AutoAugment] [INFO] ----- Train without Augmentations cv=5 ratio(test)=0.4 -----
['C:\Users\djr83\Desktop\fast-autoaugment-master\FastAutoAugment\models/cifar10_wresnet40_2_ratio0.4_fold0.model', 'C:\Users\djr83\Desktop\fast-autoaugment-master\FastAutoAugment\models/cifar10_wresnet40_2_ratio0.4_fold1.model', 'C:\Users\djr83\Desktop\fast-autoaugment-master\FastAutoAugment\models/cifar10_wresnet40_2_ratio0.4_fold2.model', 'C:\Users\djr83\Desktop\fast-autoaugment-master\FastAutoAugment\models/cifar10_wresnet40_2_ratio0.4_fold3.model', 'C:\Users\djr83\Desktop\fast-autoaugment-master\FastAutoAugment\models/cifar10_wresnet40_2_ratio0.4_fold4.model']
0%| | 0/2 [00:00<?, ?it/s]2021-07-24 14:43:55,344 WARNING worker.py:1123 -- The actor or task with ID a67dc375e60ddd1affffffffffffffffffffffff01000000 cannot be scheduled right now. It requires {GPU: 4.000000}, {CPU: 1.000000} for placement, however the cluster currently cannot provide the requested resources. The required resources may be added as autoscaling takes place or placement groups are scheduled. Otherwise, consider reducing the resource requirements of the task.
0%| | 0/2 [18:40<?, ?it/s]
```

The version of Ray I use is 1.4, running under Windows 10.
Could you describe the running environment in detail?
Thank you.
Could you please tell me the proportions of D_M and D_A in each fold? Are they evenly split?
When I run search.py, I get an error from register_trainable:

```
ValueError: Unknown argument found in the Trainable function. The function args must include a 'config' positional parameter. Any other args must be 'checkpoint_dir'. Found: ['augs', 'rpt']
```

Any ideas how to fix this?
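A hedged sketch of one possible workaround, assuming the extra arguments can be bound before registration so that the visible signature is just (config, checkpoint_dir=None), which is what newer Ray insists on; the argument values here are illustrative, not from the repo:

```python
from functools import partial
from ray import tune

def eval_tta(config, checkpoint_dir=None, augs=None, rpt=None):
    # ... original body, using augs/rpt as before ...
    pass

# Bind the offending extra arguments up front; tune then sees only
# (config, checkpoint_dir=None) when it inspects the trainable.
tune.register_trainable('eval_tta', partial(eval_tta, augs='fa_reduced_cifar10', rpt=0))
```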
Hi, @ildoonet
Thanks for your great work; it has inspired me a lot. Recently I have been trying to reproduce the search results. When I use ray.tune's HyperOptSearch as the search method, I cannot get higher accuracy after augmenting the validation data compared with "without augmentation". However, as mentioned in your paper, the results should be better.
Is this phenomenon normal? If not, how did you use the ray package to implement the search process?
Looking forward to your reply.
Thanks.
In Imagenet.py, the ImageNet files are fetched through a pre-defined URL, but when I try to run Imagenet.py, the pre-defined URL is not found.
Any suggestion for a working ImageNet URL?
Thank you!
fast-autoaugment/FastAutoAugment/archive.py
Line 106 in e79d0c7
Hello,
I have been trying FAA in my application.
I noticed that in search.py, line 116, you're basically taking the minimum loss over all the losses for each image.
Since your loss function defined in line 95 has no reduction, losses ends up being a vector of shape (num_policies*batch_size,). Therefore you just get the minimum loss of a single image as the minimum-loss metric. Hence, if you are truly minimizing loss (as the paper says) using this code, then you're mostly getting random noise as your reward_attr, since there will almost always be at least one very good prediction giving a low loss.
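To make the reduction issue concrete, a small self-contained illustration (shapes chosen arbitrarily): with reduction='none', a global .min() rewards the single luckiest image, whereas a per-policy mean would be the statistic one presumably wants.

```python
import torch
import torch.nn.functional as F

num_policies, batch_size, num_classes = 5, 8, 10
logits = torch.randn(num_policies * batch_size, num_classes)
labels = torch.randint(num_classes, (num_policies * batch_size,))

losses = F.cross_entropy(logits, labels, reduction='none')  # shape: (num_policies*batch_size,)
print(losses.min())                                          # single luckiest image: a noisy signal
print(losses.view(num_policies, batch_size).mean(dim=1))     # per-policy means: arguably intended
```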
This may help explain why your CIFAR-10 and SVHN policies are basically random. (I am using the policies from archive.py here.)
Each augmentation appears roughly the same number of times, with an almost uniform distribution of strength and probability. What's plotted is the normalized probability of each augmentation ((number of times it appears / total augmentations) × mean probability of the augmentation) on the y-axis, versus the average strength of each augmentation on the x-axis. The same graph holds for SVHN.
On ImageNet, the distribution does seem a little less random; perhaps this is because the loss is a little more meaningful with 1000 classes, so the minimum loss is a slightly less noisy reward signal.
By contrast, the augmentations from AutoAug:
This makes more sense to me: there should be some terrible operations that don't get used much, and some that are valuable and get used more. The fact that they are roughly equal for CIFAR-10 is surprising.
So the question: did you use top_1_valid to get the policies in archive.py, or minus_loss? If it's the latter, was the published code the code actually used?
Thanks!
Sean
```
[2020-08-03 21:23:54,603] [Fast AutoAugment] [INFO] processed in 76.2692 secs
[2020-08-03 21:23:54,603] [Fast AutoAugment] [INFO] ----- Search Test-Time Augmentation Policies -----
search_cifar10_wresnet40_2_fold0_ratio0.1
Traceback (most recent call last):
  File "search.py", line 230, in <module>
    algo = HyperOptSearch(space, max_concurrent=4*20, reward_attr=reward_attr)
TypeError: __init__() got an unexpected keyword argument 'reward_attr'
[*test 0000/0010]: 100%|██████████| 79/79 [00:01<00:00, 50.82it/s, loss=0.459, top1=0.848, top5=0.994, loss_ema=0.423]
```
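That TypeError matches a known Ray Tune API change: newer versions dropped reward_attr in favor of metric plus mode. A hedged sketch of the adjusted call, reusing the variables from the traceback (argument availability varies by ray version; max_concurrent was itself later moved into a ConcurrencyLimiter):

```python
from ray.tune.suggest.hyperopt import HyperOptSearch

# old: HyperOptSearch(space, max_concurrent=4*20, reward_attr=reward_attr)
algo = HyperOptSearch(space, max_concurrent=4*20, metric=reward_attr, mode='max')
```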
When I use python search.py -c confs/wresnet40x2_cifar.yaml --aug default,
there are some errors. Also, I want to know where I can see the AutoAugment policy that I searched.
@ildoonet From the config files, it seems there is no cutout in the search stage on reduced ImageNet, nor in the eval stage on ImageNet. Could you confirm or share detailed information about cutout on ImageNet?
I am applying this algorithm to a voice anti-spoofing problem.
Please answer the following questions, as I could not work them out from your paper (Fast AutoAugment):
I hope I described it clearly.
With love, Makarov Rostislav.
Hi guys! Nice paper.
I was wondering if you considered using kornia.augmentation for your project? I believe it can help with differentiating through the whole set of augmentation operators.
It would also help us a lot to test the robustness of our API in real use cases.
Thanks in advance,
Edgar
Thanks for publishing the code.
To evaluate performance on CIFAR-100, we need the hyper-parameters used for training on CIFAR-100. Could you provide the training confs for CIFAR-100?
This code seems broken. There were many minor bugs that I fixed, but now I see this error when running search.py, even though I still have disk space left:

```
OSError: [Errno 28] No space left on device
2020-03-13 03:06:59,252 ERROR trial_runner.py:345 -- Trial Runner checkpointing failed.
78), ('PAUSED', 0), ('ERROR', 3)])
Traceback (most recent call last):
  File "/home/zhasan/anaconda3/envs/cs234/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 343, in step
    self.checkpoint()
  File "/home/zhasan/anaconda3/envs/cs234/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 272, in checkpoint
    json.dump(runner_state, f, indent=2, cls=_TuneFunctionEncoder)
  File "/home/zhasan/anaconda3/envs/cs234/lib/python3.6/json/__init__.py", line 180, in dump
    fp.write(chunk)
OSError: [Errno 28] No space left on device
```
Hi,
I am currently trying to use PyramidNet + ShakeDrop. However, I am getting the following error:

```
RuntimeError: Output 0 of ShakeDropFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can remove this warning by cloning the output of the custom Function.
```

If I try to fix the error by changing some lines, memory usage increases a lot. So I was wondering whether you also encountered this error.
Thank you!
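The error message's suggested fix, in a minimal self-contained form (this is a toy Function, not the repo's ShakeDrop): returning a clone instead of the input itself avoids the view+inplace conflict, at the cost of an extra copy, which is consistent with the memory increase observed above.

```python
import torch

class Passthrough(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Returning `x` as-is would make the output a view of the input, and a
        # later in-place op on it triggers the RuntimeError quoted above.
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(4, requires_grad=True)
y = Passthrough.apply(x)
y += 1                     # in-place modification is now safe
y.sum().backward()
print(x.grad)              # tensor([1., 1., 1., 1.])
```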
First of all, thank you very much for generously sharing your code publicly.
My problem happens when I try to run search.py: it returns the error shown in the image below. I don't know how to get the models folder inside the FastAutoAugment folder.
I hope you can answer soon. Thank you very much!
P.S.: I ran your code in Google Colaboratory.