Comments (7)
Hi, @sixzerotech
This is expected behavior, as the gate is very difficult to tune. I suggest limiting the routing space to larger sub-networks (e.g., choices 4-8) if you want larger ones to be selected. Alternatively, you could try disabling the complexity loss and lowering the weight of the SGS loss.
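The suggested routing-space restriction can be sketched as masking out the logits of the excluded (smaller) choices before selection. This is a minimal illustration in plain Python; the function and variable names are hypothetical, not the repo's actual API.

```python
# Sketch: restrict gate routing to larger sub-network choices by masking
# out the logits of excluded (smaller) choices before the argmax.
# All names here are illustrative, not the ds-net codebase's actual API.

def restrict_routing(logits, min_choice):
    """Return gate logits with choices below `min_choice` disabled."""
    return [x if i >= min_choice else float("-inf")
            for i, x in enumerate(logits)]

def route(logits):
    """Pick the sub-network index with the highest (restricted) logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Example: 8 choices, but only choices 4-7 (larger sub-networks) allowed.
gate_logits = [2.0, 1.5, 0.3, 0.9, 0.4, 0.1, 0.2, 0.05]
choice = route(restrict_routing(gate_logits, min_choice=4))  # → 4
```

Even though choice 0 has the highest raw logit, the restricted argmax can only land on choices 4-7, so the gate cannot collapse onto the smallest sub-networks.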
from ds-net.
Yes, after countless experiments of my own, I agree that the gate is really difficult to tune. Thank you for your quick reply; I look forward to your future work on this thorny problem.
> To optimize the nondifferentiable slimming head of dynamic gate

I don't understand why the slimming head is non-differentiable, because I think the output of the slimming head is not in the computation graph of the subsequent network layers.
Hi @sixzerotech,
The output of the slimming head is used as the sub-network routing signal for the subsequent layers.
First, assume the SGS training loss is not introduced. To optimize the gate end-to-end with autograd, its output must be included in the computation graph. This is achieved by masking the outputs of the subsequent layers with the output of the gate. However, this hard (0 or 1) mask is not differentiable, so we follow previous works and use tricks such as SemHash and Gumbel-Softmax to tackle this.
Second, since we already introduce the SGS loss, the end-to-end target loss with Gumbel-Softmax is not strictly necessary. However, because the SGS loss only encourages the network to choose the first or the last gate (the gate target is [1, 0, 0, 0] or [0, 0, 0, 1]), it is better to combine it with the end-to-end target loss with Gumbel-Softmax.
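The masking-plus-Gumbel-Softmax idea above can be sketched numerically. This is a pure-Python forward pass only (no autograd); in an actual PyTorch implementation the soft sample carries gradients and the hard one-hot is applied with a straight-through estimator. The scalar "per-choice outputs" stand in for the real channel-wise masking, and all names are illustrative.

```python
import math
import random

def gumbel_softmax_sample(logits, tau=1.0):
    """Draw a soft sample from the gate distribution.
    g_i = -log(-log(u_i)) is Gumbel noise; lower tau -> closer to one-hot."""
    gumbels = [-math.log(-math.log(random.random())) for _ in logits]
    scores = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def hard_mask(soft):
    """Straight-through forward: one-hot at the argmax of the soft sample."""
    k = max(range(len(soft)), key=lambda i: soft[i])
    return [1.0 if i == k else 0.0 for i in range(len(soft))]

def masked_output(per_choice_outputs, mask):
    """Mask subsequent-layer outputs with the gate output, which is what
    puts the gate inside the computation graph (here: a weighted sum of
    scalar stand-ins for the sub-network outputs)."""
    return sum(o * m for o, m in zip(per_choice_outputs, mask))

random.seed(0)
soft = gumbel_softmax_sample([2.0, 0.5, 0.1, 0.1], tau=0.5)
mask = hard_mask(soft)                    # hard 0/1 routing decision
y = masked_output([3.0, 5.0, 7.0, 9.0], mask)
```

The hard mask alone has zero gradient everywhere; the trick is that during backprop the gradient of `soft` is used in place of the gradient of `mask`, so the gate still receives a learning signal from the end-to-end target loss.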
Thank you very much for your reply!
Hi Changlin, I have another question about `num_choice`. If I set `num_choice` to 14 and train the gate, the gate tends to choose the smallest sub-network even with Gumbel-Softmax.
Below is my log
```
02/20 10:45:57 AM Distributing BatchNorm running means and vars
02/20 10:46:49 AM blocks.3.first_block.gate: tensor([271., 11., 24., 8., 13., 12., 5., 9., 8., 15., 11., 9., 12., 16.], device='cuda:0')
02/20 10:46:49 AM Test: [ 48/48 (100%)] Loss: 0.7052 (1.2775) Acc@1: 83.254715 (70.4500) Acc@5: 94.221695 (89.2820) GateAcc: 53.7736(53.3780) Flops: 201547424 (185565811) Time: 0.918s, 923.72/s (1.067s, 794.82/s) DataTime: 0.113 (0.126)
02/20 10:47:40 AM blocks.3.first_block.gate: tensor([257., 12., 12., 7., 12., 10., 9., 18., 11., 13., 9., 18., 21., 15.], device='cuda:0')
02/20 10:47:40 AM Test(EMA): [ 48/48 (100%)] Loss: 0.7114 (1.2773) Acc@1: 83.254715 (70.4200) Acc@5: 94.575470 (89.2200) GateAcc: 51.2972(53.1000) Flops: 220756448 (185856099) Time: 0.904s, 938.03/s (1.041s, 814.28/s) DataTime: 0.117 (0.121)
02/20 10:47:40 AM Current checkpoints:
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-9.pth.tar', 70.53800006835938)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-4.pth.tar', 70.49399994140624)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-10.pth.tar', 70.4880000439453)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-18.pth.tar', 70.4880000439453)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-32.pth.tar', 70.48800001953126)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-13.pth.tar', 70.48600001953125)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-25.pth.tar', 70.46999999267578)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-8.pth.tar', 70.46800004638672)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-22.pth.tar', 70.46600004394531)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-39.pth.tar', 70.41999996582031)
```
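The `blocks.3.first_block.gate` tensor in that log counts how many validation samples were routed to each of the 14 choices (assuming choice 0 is the smallest sub-network). A quick way to quantify the collapse:

```python
# Gate selection counts copied from the first gate line of the log above.
counts = [271, 11, 24, 8, 13, 12, 5, 9, 8, 15, 11, 9, 12, 16]

total = sum(counts)                       # 424 routed samples
frac_smallest = counts[0] / total         # → about 64%
print(f"{frac_smallest:.1%} of samples routed to the smallest sub-network")
```

Roughly 64% of samples collapsing onto choice 0, with the remaining 13 choices near-uniform, is the failure mode the maintainer describes: with 14 choices the complexity loss dominates and the gate rarely learns to prefer larger sub-networks.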
Thank you for your understanding. I'm closing this for now.