It is not easy to deploy the gate operator with some other backends, such as TensorRT.


Can we further improve AutoSlim without the gate? (ds-net issue, open)

changlin31 commented on June 14, 2024

Can we further improve AutoSlim without the gate?

from ds-net.

Comments (3)

changlin31 commented on June 14, 2024

Hi, @twmht

Our supernet does not contain a dynamic gate, so you can use the AutoSlim algorithm to find the most suitable sub-network in the supernet. Please note that our routing space (or search space) is small (only 14 sub-networks), because we need to save BN statistics for every sub-network. If you want to perform NAS (e.g. AutoSlim), you could enlarge the search space.
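Why BN statistics have to be stored per sub-network can be illustrated with a toy weight-sharing layer. This is only a sketch under stated assumptions: `sliced_linear`, `calibrate_bn_stats`, the weights and the width ratios are all hypothetical and are not taken from the ds-net code. It shows that the same output channel sees different activation statistics once the number of active input channels changes.

```python
from statistics import fmean, pvariance


def sliced_linear(x, weights, in_active, out_active):
    """Toy shared-weight layer: use only the first in_active inputs
    and produce only the first out_active outputs."""
    return [
        sum(weights[o][i] * x[i] for i in range(in_active))
        for o in range(out_active)
    ]


def calibrate_bn_stats(batch, weights, width_ratios):
    """Map each width ratio to (per-channel means, per-channel variances)
    measured on the given calibration batch."""
    n_out, n_in = len(weights), len(weights[0])
    stats = {}
    for ratio in width_ratios:
        in_active = max(1, round(n_in * ratio))
        out_active = max(1, round(n_out * ratio))
        outputs = [sliced_linear(x, weights, in_active, out_active) for x in batch]
        channels = list(zip(*outputs))  # transpose to per-channel columns
        stats[ratio] = (
            [fmean(c) for c in channels],
            [pvariance(c) for c in channels],
        )
    return stats


# Hypothetical 4x4 shared weight matrix and a two-sample calibration batch.
weights = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
batch = [[1.0, 1.0, 1.0, 1.0], [3.0, 1.0, 2.0, 0.0]]
bn_stats = calibrate_bn_stats(batch, weights, width_ratios=[0.5, 1.0])

# Channel 0 has a different mean at half width than at full width.
print(bn_stats[0.5][0][0], bn_stats[1.0][0][0])
```

Because each width yields its own means and variances, enlarging the search space means either storing more BN statistics or recalibrating them for every candidate sub-network.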

During our supernet training, we use in-place bootstrapping, which outperforms the in-place distillation used in the original AutoSlim paper (by around 1~2%). So we expect that searching in our supernet can lead to a better result than AutoSlim.
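The difference between the two training signals can be sketched roughly as follows. This rests on an assumption that is not confirmed in this thread: that in-place distillation takes soft labels from the current largest sub-network, while in-place bootstrapping takes them from an exponential-moving-average (EMA) copy of the supernet. The helper names and the momentum value are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_total for v in logits]


def soft_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy of the student against soft teacher labels."""
    return -sum(t * lp for t, lp in zip(teacher_probs, log_softmax(student_logits)))


def ema_update(target_params, online_params, momentum=0.99):
    """Move the target (teacher) parameters slowly toward the online ones."""
    return [
        momentum * t + (1.0 - momentum) * o
        for t, o in zip(target_params, online_params)
    ]


# Assumed difference: where the soft labels come from.
#   in-place distillation:  teacher = current largest sub-network
#   in-place bootstrapping: teacher = EMA copy, refreshed with ema_update
teacher_probs = [0.5, 0.5]
loss = soft_cross_entropy([0.0, 0.0], teacher_probs)
new_target = ema_update([1.0], [0.0], momentum=0.9)
```

In this reading the sub-network loss is the same in both schemes, and only the source of `teacher_probs` changes.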


twmht commented on June 14, 2024

Or can we just remove the dynamic gate after training, then run the greedy AutoSlim search and find a better result?
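The greedy search mentioned here could look roughly like the sketch below. It assumes the usual greedy slimming idea: repeatedly shrink whichever layer hurts a validation score the least, until a FLOPs budget is met. `toy_flops`, `toy_score` and the importance weights are hypothetical stand-ins; a real run would evaluate the supernet on held-out data, with BN statistics recalibrated for each candidate.

```python
def toy_flops(widths):
    """Hypothetical cost model: sum of products of adjacent layer widths."""
    return sum(a * b for a, b in zip(widths, widths[1:]))


def greedy_slim(widths, evaluate, flops_fn, flops_budget, step=1, min_width=1):
    """Shrink one layer per iteration, always keeping the candidate
    with the highest score, until the FLOPs budget is satisfied."""
    widths = list(widths)
    while flops_fn(widths) > flops_budget:
        best_score, best_candidate = None, None
        for layer in range(len(widths)):
            if widths[layer] - step < min_width:
                continue  # this layer cannot shrink further
            candidate = list(widths)
            candidate[layer] -= step
            score = evaluate(candidate)
            if best_score is None or score > best_score:
                best_score, best_candidate = score, candidate
        if best_candidate is None:
            break  # nothing left to shrink
        widths = best_candidate
    return widths


# Hypothetical accuracy proxy: each layer contributes importance * width.
importance = [3.0, 1.0, 2.0]


def toy_score(widths):
    return sum(w * k for w, k in zip(widths, importance))


result = greedy_slim([4, 4, 4], toy_score, toy_flops, flops_budget=20)
print(result)
```

With these toy numbers the search shrinks the least important middle layer twice and stops once the cost drops from 32 to 16.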


twmht commented on June 14, 2024

@changlin31

I am not sure whether performance can be boosted if I train with the dynamic gate but run inference without it.

For example, without distillation, have you ever compared the following two setups?

1. Train with the dynamic gate, remove the gate, search the supernet for a subnet, and measure its accuracy (accuracy1).
2. Train without the dynamic gate, like a normal slimmable network, take a subnet with FLOPs similar to the one in setup 1, and measure its accuracy (accuracy2).

I am curious whether accuracy1 is better than accuracy2. If it is, then I can conclude that gate training helps boost the performance of a slimmable network.

By the way, regarding distillation for object detection: I am trying to train with feature map distillation, but the results are not good. Maybe feature map distillation does not make sense for a slimmable network, since the weights are shared between all subnets and the supernet. So I am wondering how you do distillation for object detection?

