

Filter Grafting for Deep Neural Networks

Introduction

This is the PyTorch implementation of our CVPR 2020 paper "Filter Grafting for Deep Neural Networks".

Invalid filters limit the potential of DNNs because they contribute little to the network. Filter pruning removes these invalid filters for efficiency; Filter Grafting instead re-activates them to boost accuracy, by grafting external information (weights from other networks trained in parallel) into the invalid filters.
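Concretely, the networks in a grafting group are trained in parallel, and at the end of each epoch a network blends its own weights with those of another network, weighted by how informative (high-entropy) each layer is. A minimal sketch of this layer-level update, mirroring the formula from grafting.py quoted in the issues below (function names here are illustrative):

import numpy as np

# Layer-level grafting: blend this network's layer weights W_i with another
# network's W_j, with a coefficient driven by their entropy gap.
def grafting_coefficient(H_i, H_j, A=0.4, c=500.0):
    # arctan maps the entropy gap into (-pi/2, pi/2); scaling by A/pi and
    # shifting by 0.5 keeps the coefficient inside (0.5 - A/2, 0.5 + A/2)
    return A / np.pi * np.arctan(c * (H_i - H_j)) + 0.5

def graft_layer(W_i, W_j, H_i, H_j):
    w = grafting_coefficient(H_i, H_j)
    # the higher-entropy (more informative) network keeps more of itself
    return w * W_i + (1 - w) * W_j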

Prerequisites

Python 3.6+

PyTorch 1.0+

CIFAR dataset

grafting.py [-h] [--lr LR] [--epochs EPOCHS] [--device DEVICE]
                   [--data DATA] [--s S] [--model MODEL] [--cifar CIFAR]
                   [--print_frequence PRINT_FREQUENCE] [--a A] [--c C]
                   [--num NUM] [--i I] [--cos] [--difflr]
PyTorch Grafting Training
optional arguments:
  -h, --help            show this help message and exit
  --lr LR               learning rate
  --epochs EPOCHS       total epochs for training
  --device DEVICE       cuda or cpu
  --data DATA           dataset path
  --s S                 checkpoint save path
  --model MODEL         Network used
  --cifar CIFAR         cifar10 or cifar100 dataset
  --print_frequence PRINT_FREQUENCE
                        test accuracy print frequency
  --a A                 hyperparameter a for calculating the weighted
                        average coefficient
  --c C                 hyperparameter c for calculating the weighted
                        average coefficient
  --num NUM             Number of Networks used for grafting
  --i I                 this program is the i-th network of all networks
  --cos                 Use cosine annealing learning rate
  --difflr              Use different initial learning rate

Execution examples

Simply run

cd grafting_cifar
./grafting.sh

or

Two-model grafting

CUDA_VISIBLE_DEVICES=0 nohup python grafting.py --s checkpoint/grafting_cifar10_resnet32 --cifar 10  --model resnet32 --num 2 --i 1 >checkpoint/grafting_cifar10_resnet32/1.log &
CUDA_VISIBLE_DEVICES=1 nohup python grafting.py --s checkpoint/grafting_cifar10_resnet32 --cifar 10  --model resnet32 --num 2 --i 2  >checkpoint/grafting_cifar10_resnet32/2.log &

Three-model grafting

CUDA_VISIBLE_DEVICES=0 nohup python grafting.py --s checkpoint/grafting_cifar10_resnet32 --cifar 10  --model resnet32 --num 3 --i 1 >checkpoint/grafting_cifar10_resnet32/1.log &
CUDA_VISIBLE_DEVICES=1 nohup python grafting.py --s checkpoint/grafting_cifar10_resnet32 --cifar 10  --model resnet32 --num 3 --i 2 >checkpoint/grafting_cifar10_resnet32/2.log &
CUDA_VISIBLE_DEVICES=2 nohup python grafting.py --s checkpoint/grafting_cifar10_resnet32 --cifar 10  --model resnet32 --num 3 --i 3 >checkpoint/grafting_cifar10_resnet32/3.log &

Results

| Model | Method | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- |
| ResNet32 | baseline | 92.83 | 69.82 |
| ResNet32 | grafting (slr) | 93.33 | 71.16 |
| ResNet32 | grafting (dlr) | 93.94 | 71.28 |
| ResNet56 | baseline | 93.50 | 71.55 |
| ResNet56 | grafting (slr) | 94.28 | 73.09 |
| ResNet56 | grafting (dlr) | 94.73 | 72.83 |
| ResNet110 | baseline | 93.81 | 73.21 |
| ResNet110 | grafting (slr) | 94.60 | 74.70 |
| ResNet110 | grafting (dlr) | 94.96 | 75.27 |
| MobileNetV2 | baseline | 92.42 | 71.44 |
| MobileNetV2 | grafting (slr) | 93.53 | 73.26 |
| MobileNetV2 | grafting (dlr) | 94.20 | 74.15 |

Grafting (slr) uses the same learning rate schedule as the baseline: initial learning rate 0.1, decayed by 0.1 every 60 epochs.

Grafting (dlr) instead gives the models different initial learning rates to increase their diversity, and uses cosine annealing so that each batch of data carries a different importance, further increasing the diversity.
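A minimal sketch of the two schedules in PyTorch (the momentum/weight-decay values and the per-model initial learning rate used for dlr are assumptions; see the --cos and --difflr flags above):

import torch.optim as optim

def make_optimizer_and_scheduler(params, mode, i=1, num=2, epochs=200):
    if mode == 'slr':
        # baseline schedule: lr 0.1, decayed by 0.1 every 60 epochs
        opt = optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4)
        sched = optim.lr_scheduler.StepLR(opt, step_size=60, gamma=0.1)
    else:
        # dlr: give model i its own initial lr (illustrative scaling) and
        # anneal it with cosine so batches differ in importance across models
        opt = optim.SGD(params, lr=0.1 * i / num, momentum=0.9, weight_decay=5e-4)
        sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched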

| MobileNetV2 | CIFAR-10 | CIFAR-100 |
| --- | --- | --- |
| baseline | 92.42 | 71.44 |
| 6 models ensemble | 94.09 | 76.75 |
| 2 models grafting | 94.20 | 74.15 |
| 3 models grafting | 94.55 | 76.21 |
| 4 models grafting | 95.23 | 77.08 |
| 6 models grafting | 95.33 | 78.32 |
| 8 models grafting | 95.20 | 77.76 |

Comparison of the number of invalid filters

| Model | Threshold | Baseline (invalid/total) | Grafting (invalid/total) |
| --- | --- | --- | --- |
| ResNet32 | 0.1 | 36/1136 | 14/1136 |
| ResNet32 | 0.01 | 35/1136 | 8/1136 |
| MobileNetV2 | 0.1 | 10929/17088 | 9903/17088 |
| MobileNetV2 | 0.01 | 9834/17088 | 8492/17088 |
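A sketch of how such counts can be reproduced, assuming a filter counts as invalid when the L1 norm of its weights falls below the threshold:

import torch
import torch.nn as nn

def count_invalid_filters(model: nn.Module, threshold: float = 0.1):
    invalid = total = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # one L1 norm per output filter: sum |w| over (in_ch, kH, kW)
            norms = m.weight.detach().abs().sum(dim=(1, 2, 3))
            invalid += int((norms < threshold).sum())
            total += norms.numel()
    return invalid, total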

The ReLU function generates a large number of convolution kernels whose gradient is 0. Activation functions such as leaky ReLU do not produce kernels whose gradients are always 0; however, invalid filters that contribute nothing to the model are still generated. Grafting also works for models that use the leaky_relu activation function.

| Model | Method | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- |
| resnet32_leaky_relu | baseline | 93.28 | 70.04 |
| resnet32_leaky_relu | grafting | 93.6 | 70.93 |
| resnet32_leaky_relu | baseline | 94.03 | 72.24 |
| resnet32_leaky_relu | grafting | 94.32 | 73.14 |
| resnet32_leaky_relu | baseline | 93.24 | 73.34 |
| resnet32_leaky_relu | grafting | 93.97 | 73.82 |

Filter-level grafting

| Model | Method | Level | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- | --- |
| VGG16 | baseline | --- | 93.68 | 73.41 |
| VGG16 | grafting | layer | 94.02 | 74.28 |
| VGG16 | grafting | filter | 94.26 | 74.63 |
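At the filter level, each filter receives its own coefficient instead of one coefficient per layer. A sketch based on the BatchNorm-scale ratio quoted in the issues below (function names are illustrative):

import torch
import torch.nn as nn

def graft_filters(conv1: nn.Conv2d, conv2: nn.Conv2d,
                  bn1: nn.BatchNorm2d, bn2: nn.BatchNorm2d):
    # per-filter coefficient from the two models' BN scaling factors
    g1, g2 = bn1.weight.data.abs(), bn2.weight.data.abs()
    w = (g1 / (g1 + g2)).view(-1, 1, 1, 1)  # broadcast over each output filter
    conv1.weight.data = w * conv1.weight.data + (1 - w) * conv2.weight.data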

Discussion of the two hyper-parameters A and c

| MobileNetV2 | A | c | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- | --- |
| baseline | --- | --- | 92.42 | 71.44 |
| grafting | 0.4 | 1 | 93.19 | 73.3 |
| grafting | 0.4 | 5 | 92.76 | 72.69 |
| grafting | 0.4 | 10 | 93.31 | 73.26 |
| grafting | 0.4 | 50 | 93.24 | 73.05 |
| grafting | 0.4 | 500 | 92.79 | 72.38 |
| grafting | 0 | 100 | 93.4 | 72.55 |
| grafting | 0.2 | 100 | 93.61 | 72.9 |
| grafting | 0.4 | 100 | 93.46 | 73.13 |
| grafting | 0.6 | 100 | 92.6 | 72.68 |
| grafting | 0.8 | 100 | 93.03 | 71.8 |
| grafting | 1 | 100 | 92.53 | 72.27 |
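A quick numeric check of how A and c shape the coefficient w = A/π · arctan(c · ΔH) + 0.5:

import numpy as np

A, c = 0.4, 500
for dH in (-0.01, -0.001, 0.0, 0.001, 0.01):
    w = round(A / np.pi * np.arctan(c * dH) + 0.5, 2)
    print('dH=%+.3f -> w=%.2f' % (dH, w))
# A large c saturates the arctan quickly, pushing w toward the extremes
# 0.5 - A/2 = 0.3 and 0.5 + A/2 = 0.7 (cf. the first issue below), while
# A bounds how far the coefficient can move away from 0.5.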

ImageNet dataset

usage

grafting.py [-h] [--data DIR] [-a ARCH] [-j N] [--epochs N]
                   [--start-epoch N] [-b N] [--lr LR] [--momentum M] [--wd W]
                   [-p N] [--resume RESUME] [-e] [--pretrained] [--gpu GPU]
                   [--s S] [--num NUM] [--i I] [--a A] [--c C]
PyTorch ImageNet Training
optional arguments:
  -h, --help            show this help message and exit
  --data DIR            path to dataset
  -a ARCH, --arch ARCH  model architecture: alexnet | densenet121 |
                        densenet161 | densenet169 | densenet201 | inception_v3
                        | resnet101 | resnet152 | resnet18 | resnet34 |
                        resnet50 | squeezenet1_0 | squeezenet1_1 | vgg11 |
                        vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn | vgg19
                        | vgg19_bn (default: resnet18)
  -j N, --workers N     number of data loading workers (default: 4)
  --epochs N            number of total epochs to run
  --start-epoch N       manual epoch number (useful on restarts)
  -b N, --batch-size N  mini-batch size (default: 256), this is the total
                        batch size of all GPUs on the current node when using
                        Data Parallel or Distributed Data Parallel
  --lr LR, --learning-rate LR
                        initial learning rate
  --momentum M          momentum
  --wd W, --weight-decay W
                        weight decay (default: 1e-4)
  -p N, --print-freq N  print frequency (default: 10)
  --resume RESUME       path to latest checkpoint (default: none)
  -e, --evaluate        evaluate model on validation set
  --pretrained          use pre-trained model
  --gpu GPU             GPU id to use.
  --s S                 checkpoint save dir
  --num NUM             number of Networks in grafting
  --i I                 the i-th program
  --a A                 hyperparameter a for calculating the weighted
                        average coefficient
  --c C                 hyperparameter c for calculating the weighted
                        average coefficient

Execution examples

Simply run

cd grafting_imagenet
./grafting.sh

or

Two-model grafting

CUDA_VISIBLE_DEVICES=0 nohup python grafting.py --arch resnet18 --s grafting_imagenet_resnet18 --num 2 --i 1 >checkpoint/grafting_imagenet_resnet18/1.out &
CUDA_VISIBLE_DEVICES=1 nohup python grafting.py --arch resnet18 --s grafting_imagenet_resnet18 --num 2 --i 2 >checkpoint/grafting_imagenet_resnet18/2.out &

Three-model grafting

CUDA_VISIBLE_DEVICES=0 nohup python grafting.py --arch resnet18 --s grafting_imagenet_resnet18 --num 3 --i 1 >checkpoint/grafting_imagenet_resnet18/1.out &
CUDA_VISIBLE_DEVICES=1 nohup python grafting.py --arch resnet18 --s grafting_imagenet_resnet18 --num 3 --i 2 >checkpoint/grafting_imagenet_resnet18/2.out &
CUDA_VISIBLE_DEVICES=2 nohup python grafting.py --arch resnet18 --s grafting_imagenet_resnet18 --num 3 --i 3 >checkpoint/grafting_imagenet_resnet18/3.out &

Results

| Model | Method | Top-1 | Top-5 |
| --- | --- | --- | --- |
| ResNet18 | baseline | 69.15 | 88.87 |
| ResNet18 | grafting | 71.19 | 90.01 |
| ResNet34 | baseline | 72.60 | 90.91 |
| ResNet34 | grafting | 74.58 | 92.05 |
| ResNet50 | baseline | 75.92 | 92.81 |
| ResNet50 | grafting | 76.76 | 93.34 |

Citation

If you find this code useful, please cite the following paper:

@InProceedings{Meng_2020_CVPR,
author = {Meng, Fanxu and Cheng, Hao and Li, Ke and Xu, Zhixin and Ji, Rongrong and Sun, Xing and Lu, Guangming},
title = {Filter Grafting for Deep Neural Networks},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

References

For CIFAR, our code is based on https://github.com/kuangliu/pytorch-cifar.git

For ImageNet, our code is based on https://github.com/pytorch/examples/tree/master/imagenet


filter-grafting's Issues

Are A = 0.4 and c = 500 appropriate values in the Adaptive Weighting formula?


w=round(0.4*(np.arctan(500*((float(entropy(u).cpu())-float(entropy(checkpoint[key]).cpu())))))/np.pi+1/2,2)

Hi author, as shown in the code, the default values of A and c in the Adaptive Weighting formula are 0.4 and 500. If so, the computed w is essentially always either 0.3 or 0.7, rather than the reasonably adaptive (smooth) range I expected. Could there be a problem here?

Besides, A should control how much weight the model keeps for its own filters when the entropy is small; with A = 0.4, the model's own weight is at least 0.3. Does the value of A affect the results much? Have you tried other values?
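For reference, a hedged sketch of what entropy(u) for a weight tensor can look like (the exact binning used in grafting.py may differ): it estimates the Shannon entropy of a histogram of the layer's weight values.

import torch

def entropy(weights: torch.Tensor, bins: int = 10) -> float:
    # histogram the weight values (range taken from the data itself)
    hist = torch.histc(weights.detach().flatten().float(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                         # drop empty bins so log() is finite
    return float(-(p * p.log()).sum())   # Shannon entropy of the histogram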

Is filter better than layer?

in the code,
bn.append(m1.weight.data.abs() / (m1.weight.data.abs() + m2.weight.data.abs()))

I don't understand why the weight is better than entropy?

Have you tried to train the baseline and grafting models both with cosine learning rate?

Hello:
I noticed that the accuracy would drop after the lr-decay epoch, so I trained your baseline model and grafting model with the same cosine learning rate. The accuracy for the baseline (MobileNetV2) is 72.83, and that for the grafting model (2 models) is 72.80; the accuracy decreased! I noticed that you never tried the cosine lr. I wonder whether any hyperparameters are wrong (I changed nothing in your grafting.py)?

Is there something wrong?

Hi, thanks for your work! I have a question about grafting. When I run grafting.sh the accuracy is 93.910, versus 93.430 when training without grafting, which seems normal. But when I calculate the number of invalid filters, the invalid-filter ratio is 0.0366041362285614 without grafting and 0.0404568612575531 with grafting.

batchnorm parameters

I would like to know: do you use BN in grafting training?
If so, how do you graft the BN parameters, the same way as the weights?

Ambiguous value for fuse-weight?

    for i,(key,u) in enumerate(net.state_dict().items()):
        if 'conv' in key:
            w=round(0.4*(np.arctan(500*((float(entropy(u).cpu())-float(entropy(checkpoint[key]).cpu())))))/np.pi+1/2,2)
        model[key]=u*w+checkpoint[key]*(1-w)

For layers that are not "conv", such as "BN", what should the value of "w" be? It seems they are determined by the previous layer, typically the preceding "conv"?

A question about the computation of the parameter α in the code

The definition of α in your paper differs from the way it is computed in the code.
In the code, α is computed as w = round(args.a / np.pi * np.arctan(args.c * (entropy(u) - entropy(checkpoint[key]))) + 0.5, 2),
while the paper does not divide by π.
What is the purpose of this extra step in the code?

Grafting brings no improvement when training a MobileNetV2 model

Hi, thank you very much for your work, but when testing your method I could never reach your reported results. I evaluated MobileNetV2 on CIFAR-10 under different learning rates, under the cosine schedule, and with the grafting method, and grafting seems to have no effect (baseline 92.1, grafting 92.28). Do you know why? All settings were left at their defaults.

| Schedule | Learning rate | Accuracy | Folder |
| --- | --- | --- | --- |
| lr | 0.1 | 92.10 | 2 |
| lr | 0.1(2) | 94.06 | 5 |
| lr | 0.1(10) | 90.10 | 4 |
| lr | 0.1(100) | 40.90 | 3 |
| cos | 0.1 | 92.75 (92.6) | 6 |
| grafting (lr) | 0.1 | 92.28 | 2 |

A few questions about Grafting.py (looking forward to your reply)

Line 176, torch.save(state, '%s/ckpt%d_%d.t7' % (args.s, args.i % args.num, epoch)), saves the model weights after each training epoch.
Line 106, checkpoint = torch.load('%s/ckpt%d_%d.t7' % (args.s, args.i - 1, epoch))['net'], loads the saved weights, but no checkpoint with that name exists, so the program loops forever around time.sleep(10). I hope you can explain, thanks!

Hello, a question about the grafting function in grafting.py

The indentation in the code is as follows:

def grafting(net, epoch):
    # wait until network (i-1) has saved its checkpoint for this epoch
    while True:
        try:
            checkpoint = torch.load('%s/ckpt%d_%d.t7' % (args.s, args.i - 1, epoch))['net']
            break
        except:
            time.sleep(10)
    model = collections.OrderedDict()
    for i, (key, u) in enumerate(net.state_dict().items()):
        if 'conv' in key:
            w = round(args.a / np.pi * np.arctan(args.c * (entropy(u) - entropy(checkpoint[key]))) + 0.5, 2)
        # note: this line runs for every key, so non-conv entries (e.g. BN)
        # reuse the w computed for the most recent conv layer
        model[key] = u * w + checkpoint[key] * (1 - w)
    net.load_state_dict(model)

Here w is the grafting coefficient α.
So do all layers participate in grafting?
And for the layers that are not conv layers, is their grafting coefficient computed from the conv layer above them?
Thanks!

Hello, some questions about details of the paper.

The paper uses KL divergence as an evaluation criterion, but KL divergence is mutual information. Is mutual information appropriate as an evaluation criterion for a single variable? Or is my understanding off?

Hello, I have a few questions about the grafting.py code

Regarding this piece of code, I have a few questions:
Question 1: does the graft operation process the convolution layers' weights and biases as well as the following BN layers?
Question 2: for VGG16, the keys have the form features.0.weight and features.1.bias; how should this code be modified so that it works for VGG16?

Cannot reproduce the results

I trained ResNet32 on CIFAR-100 with the 2-model grafting setting. I set seed 1 for the first model and seed 2 for the second, but the accuracy even decreased.

Network:2 epoch:0 accuracy:14.700 best:14.700
Network:2 epoch:1 accuracy:16.540 best:16.540
Network:2 epoch:2 accuracy:24.700 best:24.700
Network:2 epoch:3 accuracy:23.520 best:24.700
Network:2 epoch:4 accuracy:32.250 best:32.250
Network:2 epoch:5 accuracy:34.070 best:34.070
Network:2 epoch:6 accuracy:37.280 best:37.280
Network:2 epoch:7 accuracy:37.320 best:37.320
Network:2 epoch:8 accuracy:36.540 best:37.320
Network:2 epoch:9 accuracy:40.050 best:40.050
Network:2 epoch:10 accuracy:44.270 best:44.270
Network:2 epoch:11 accuracy:39.200 best:44.270
Network:2 epoch:12 accuracy:38.870 best:44.270
Network:2 epoch:13 accuracy:44.300 best:44.300
Network:2 epoch:14 accuracy:43.690 best:44.300
Network:2 epoch:15 accuracy:38.960 best:44.300
Network:2 epoch:16 accuracy:47.530 best:47.530
Network:2 epoch:17 accuracy:40.490 best:47.530
Network:2 epoch:18 accuracy:46.240 best:47.530
Network:2 epoch:19 accuracy:40.100 best:47.530
Network:2 epoch:20 accuracy:41.650 best:47.530
Network:2 epoch:21 accuracy:44.300 best:47.530
Network:2 epoch:22 accuracy:44.440 best:47.530
Network:2 epoch:23 accuracy:44.870 best:47.530
Network:2 epoch:24 accuracy:43.860 best:47.530
Network:2 epoch:25 accuracy:46.760 best:47.530
Network:2 epoch:26 accuracy:32.770 best:47.530
Network:2 epoch:27 accuracy:42.220 best:47.530
Network:2 epoch:28 accuracy:48.920 best:48.920
Network:2 epoch:29 accuracy:44.960 best:48.920
Network:2 epoch:30 accuracy:45.670 best:48.920
Network:2 epoch:31 accuracy:45.630 best:48.920
Network:2 epoch:32 accuracy:45.840 best:48.920
Network:2 epoch:33 accuracy:46.910 best:48.920
Network:2 epoch:34 accuracy:51.240 best:51.240
Network:2 epoch:35 accuracy:48.490 best:51.240
Network:2 epoch:36 accuracy:49.460 best:51.240
Network:2 epoch:37 accuracy:45.080 best:51.240
Network:2 epoch:38 accuracy:49.390 best:51.240
Network:2 epoch:39 accuracy:45.370 best:51.240
Network:2 epoch:40 accuracy:40.510 best:51.240
Network:2 epoch:41 accuracy:39.560 best:51.240
Network:2 epoch:42 accuracy:46.540 best:51.240
Network:2 epoch:43 accuracy:48.780 best:51.240
Network:2 epoch:44 accuracy:49.220 best:51.240
Network:2 epoch:45 accuracy:46.590 best:51.240
Network:2 epoch:46 accuracy:40.120 best:51.240
Network:2 epoch:47 accuracy:44.470 best:51.240
Network:2 epoch:48 accuracy:42.030 best:51.240
Network:2 epoch:49 accuracy:47.310 best:51.240
Network:2 epoch:50 accuracy:46.580 best:51.240
Network:2 epoch:51 accuracy:45.010 best:51.240
Network:2 epoch:52 accuracy:46.270 best:51.240
Network:2 epoch:53 accuracy:47.070 best:51.240
Network:2 epoch:54 accuracy:46.270 best:51.240
Network:2 epoch:55 accuracy:49.480 best:51.240
Network:2 epoch:56 accuracy:45.360 best:51.240
Network:2 epoch:57 accuracy:46.950 best:51.240
Network:2 epoch:58 accuracy:47.840 best:51.240
Network:2 epoch:59 accuracy:52.580 best:52.580
Network:2 epoch:60 accuracy:43.420 best:52.580
Network:2 epoch:61 accuracy:68.060 best:68.060
Network:2 epoch:62 accuracy:68.230 best:68.230
Network:2 epoch:63 accuracy:68.550 best:68.550
Network:2 epoch:64 accuracy:68.750 best:68.750
Network:2 epoch:65 accuracy:68.490 best:68.750
Network:2 epoch:66 accuracy:68.290 best:68.750
Network:2 epoch:67 accuracy:68.320 best:68.750
Network:2 epoch:68 accuracy:67.870 best:68.750
Network:2 epoch:69 accuracy:68.240 best:68.750
Network:2 epoch:70 accuracy:67.790 best:68.750
Network:2 epoch:71 accuracy:67.170 best:68.750
Network:2 epoch:72 accuracy:68.130 best:68.750
Network:2 epoch:73 accuracy:68.610 best:68.750
Network:2 epoch:74 accuracy:66.910 best:68.750
Network:2 epoch:75 accuracy:66.640 best:68.750
Network:2 epoch:76 accuracy:66.710 best:68.750
Network:2 epoch:77 accuracy:66.220 best:68.750
Network:2 epoch:78 accuracy:65.440 best:68.750
Network:2 epoch:79 accuracy:66.520 best:68.750
Network:2 epoch:80 accuracy:66.810 best:68.750
Network:2 epoch:81 accuracy:66.030 best:68.750
Network:2 epoch:82 accuracy:65.430 best:68.750
Network:2 epoch:83 accuracy:66.470 best:68.750
Network:2 epoch:84 accuracy:66.250 best:68.750
Network:2 epoch:85 accuracy:65.690 best:68.750
Network:2 epoch:86 accuracy:65.500 best:68.750
Network:2 epoch:87 accuracy:66.020 best:68.750
Network:2 epoch:88 accuracy:65.160 best:68.750
Network:2 epoch:89 accuracy:63.700 best:68.750
Network:2 epoch:90 accuracy:65.590 best:68.750
Network:2 epoch:91 accuracy:65.310 best:68.750
Network:2 epoch:92 accuracy:63.440 best:68.750
Network:2 epoch:93 accuracy:64.340 best:68.750
Network:2 epoch:94 accuracy:64.090 best:68.750
Network:2 epoch:95 accuracy:64.020 best:68.750
Network:2 epoch:96 accuracy:63.130 best:68.750
Network:2 epoch:97 accuracy:62.210 best:68.750
Network:2 epoch:98 accuracy:63.610 best:68.750
Network:2 epoch:99 accuracy:63.960 best:68.750
Network:2 epoch:100 accuracy:64.730 best:68.750
Network:2 epoch:101 accuracy:65.030 best:68.750
Network:2 epoch:102 accuracy:64.990 best:68.750
Network:2 epoch:103 accuracy:64.130 best:68.750
Network:2 epoch:104 accuracy:63.280 best:68.750
Network:2 epoch:105 accuracy:63.800 best:68.750
Network:2 epoch:106 accuracy:64.050 best:68.750
Network:2 epoch:107 accuracy:63.460 best:68.750
Network:2 epoch:108 accuracy:64.790 best:68.750
Network:2 epoch:109 accuracy:64.470 best:68.750
Network:2 epoch:110 accuracy:65.210 best:68.750
Network:2 epoch:111 accuracy:64.350 best:68.750
Network:2 epoch:112 accuracy:62.980 best:68.750
Network:2 epoch:113 accuracy:63.390 best:68.750
Network:2 epoch:114 accuracy:63.860 best:68.750
Network:2 epoch:115 accuracy:64.430 best:68.750
Network:2 epoch:116 accuracy:62.950 best:68.750
Network:2 epoch:117 accuracy:63.950 best:68.750
Network:2 epoch:118 accuracy:64.000 best:68.750
Network:2 epoch:119 accuracy:63.570 best:68.750
Network:2 epoch:120 accuracy:62.570 best:68.750
Network:2 epoch:121 accuracy:70.290 best:70.290
Network:2 epoch:122 accuracy:69.990 best:70.290
Network:2 epoch:123 accuracy:70.000 best:70.290
Network:2 epoch:124 accuracy:70.210 best:70.290
Network:2 epoch:125 accuracy:69.750 best:70.290
Network:2 epoch:126 accuracy:69.850 best:70.290
Network:2 epoch:127 accuracy:70.200 best:70.290
Network:2 epoch:128 accuracy:69.730 best:70.290
Network:2 epoch:129 accuracy:69.830 best:70.290
Network:2 epoch:130 accuracy:69.780 best:70.290
Network:2 epoch:131 accuracy:69.520 best:70.290
Network:2 epoch:132 accuracy:69.560 best:70.290
Network:2 epoch:133 accuracy:69.630 best:70.290
Network:2 epoch:134 accuracy:69.770 best:70.290
Network:2 epoch:135 accuracy:69.750 best:70.290
Network:2 epoch:136 accuracy:69.390 best:70.290
Network:2 epoch:137 accuracy:69.630 best:70.290
Network:2 epoch:138 accuracy:69.250 best:70.290
Network:2 epoch:139 accuracy:69.460 best:70.290
Network:2 epoch:140 accuracy:69.420 best:70.290
Network:2 epoch:141 accuracy:69.230 best:70.290
Network:2 epoch:142 accuracy:69.490 best:70.290
Network:2 epoch:143 accuracy:69.430 best:70.290
Network:2 epoch:144 accuracy:69.220 best:70.290
Network:2 epoch:145 accuracy:69.660 best:70.290
Network:2 epoch:146 accuracy:69.330 best:70.290
Network:2 epoch:147 accuracy:69.070 best:70.290
Network:2 epoch:148 accuracy:69.260 best:70.290
Network:2 epoch:149 accuracy:69.350 best:70.290
Network:2 epoch:150 accuracy:69.130 best:70.290
Network:2 epoch:151 accuracy:69.270 best:70.290
Network:2 epoch:152 accuracy:68.890 best:70.290
Network:2 epoch:153 accuracy:69.220 best:70.290
Network:2 epoch:154 accuracy:68.980 best:70.290
Network:2 epoch:155 accuracy:68.850 best:70.290
Network:2 epoch:156 accuracy:68.970 best:70.290
Network:2 epoch:157 accuracy:69.260 best:70.290
Network:2 epoch:158 accuracy:69.140 best:70.290
Network:2 epoch:159 accuracy:69.100 best:70.290
Network:2 epoch:160 accuracy:68.860 best:70.290
Network:2 epoch:161 accuracy:68.990 best:70.290
Network:2 epoch:162 accuracy:69.120 best:70.290
Network:2 epoch:163 accuracy:68.780 best:70.290
Network:2 epoch:164 accuracy:69.190 best:70.290
Network:2 epoch:165 accuracy:68.560 best:70.290
Network:2 epoch:166 accuracy:68.860 best:70.290
Network:2 epoch:167 accuracy:68.860 best:70.290
Network:2 epoch:168 accuracy:68.620 best:70.290
Network:2 epoch:169 accuracy:69.010 best:70.290
Network:2 epoch:170 accuracy:68.760 best:70.290
Network:2 epoch:171 accuracy:68.680 best:70.290
Network:2 epoch:172 accuracy:68.950 best:70.290
Network:2 epoch:173 accuracy:68.830 best:70.290
Network:2 epoch:174 accuracy:68.740 best:70.290
Network:2 epoch:175 accuracy:68.780 best:70.290
Network:2 epoch:176 accuracy:68.620 best:70.290
Network:2 epoch:177 accuracy:68.410 best:70.290
Network:2 epoch:178 accuracy:68.540 best:70.290
Network:2 epoch:179 accuracy:68.660 best:70.290
Network:2 epoch:180 accuracy:68.610 best:70.290
Network:2 epoch:181 accuracy:68.750 best:70.290
Network:2 epoch:182 accuracy:68.690 best:70.290
Network:2 epoch:183 accuracy:68.730 best:70.290
Network:2 epoch:184 accuracy:68.710 best:70.290
Network:2 epoch:185 accuracy:68.750 best:70.290
Network:2 epoch:186 accuracy:68.590 best:70.290
Network:2 epoch:187 accuracy:68.710 best:70.290
Network:2 epoch:188 accuracy:68.740 best:70.290
Network:2 epoch:189 accuracy:68.690 best:70.290
Network:2 epoch:190 accuracy:68.820 best:70.290
Network:2 epoch:191 accuracy:68.820 best:70.290
Network:2 epoch:192 accuracy:68.480 best:70.290
Network:2 epoch:193 accuracy:68.810 best:70.290
Network:2 epoch:194 accuracy:68.760 best:70.290
Network:2 epoch:195 accuracy:68.790 best:70.290
Network:2 epoch:196 accuracy:68.770 best:70.290
Network:2 epoch:197 accuracy:68.590 best:70.290
Network:2 epoch:198 accuracy:68.730 best:70.290
Network:2 epoch:199 accuracy:68.840 best:70.290

The final accuracy is 68.84. The corresponding baseline in your paper is 69.82.
