mit-han-lab / gan-compression
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
License: Other
Could you tell me which document explains the channel pruning process?
Normalization does not work when tensor.dim() == 4.
if image_tensor.dim() == 4:
    # transform each image in the batch
    images_np = []
    for b in range(image_tensor.size(0)):
        one_image = image_tensor[b]
        one_image_np = tensor2im(one_image)
        images_np.append(one_image_np.reshape(1, *one_image_np.shape))
    images_np = np.concatenate(images_np, axis=0)
https://github.com/mit-han-lab/gan-compression/blob/master/utils/util.py#L69
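A minimal sketch of one possible fix, assuming the problem is that the per-image call in the batch branch does not forward the normalize argument. The helper to_uint8_images is hypothetical and works on NumPy arrays in CHW layout instead of torch tensors; it is not the repository's tensor2im.

```python
import numpy as np

def to_uint8_images(images, normalize=True):
    """Convert float images (CHW or NCHW) to uint8 arrays in HWC or NHWC layout.

    Hypothetical helper for illustration; not the repository's tensor2im.
    normalize=True assumes inputs in [-1, 1]; normalize=False assumes [0, 1].
    """
    images = np.asarray(images, dtype=np.float32)
    if images.ndim == 4:
        # Forward `normalize` explicitly so the batch path behaves like the single-image path.
        return np.stack([to_uint8_images(img, normalize=normalize) for img in images], axis=0)
    hwc = np.transpose(images, (1, 2, 0))
    if normalize:
        hwc = (hwc + 1.0) / 2.0
    return np.clip(hwc * 255.0, 0, 255).astype(np.uint8)

batch = np.zeros((2, 3, 4, 4), dtype=np.float32)
out_norm = to_uint8_images(batch, normalize=True)   # 0 maps to mid-gray
out_raw = to_uint8_images(batch, normalize=False)   # 0 maps to black
print(out_norm.shape, out_norm[0, 0, 0, 0], out_raw[0, 0, 0, 0])
```

The only point of the sketch is that the recursive call passes the flag through; without that, every image in a batch is converted with the default setting.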
Great work!
Can it be used with StyleGAN2?
If so, how? Could you give me some tips?
Thanks a lot!
I'm trying to download your pretrained models, and it seems that the file server is down. Could you check it?
Hi, author!
How can I apply the Pix2Pix model to my own datasets, such as deRain datasets?
Can I just prepare the dataset in the same format as your datasets?
I tried to test the Colab notebook with the default options, but cannot find the required file "opt_compressed.pkl". Should I run a specific script to generate it?
Could you share the real_stats for the COCO data? It is not downloadable, and running get_real_stat.py on the COCO data returns an error:
python get_real_stat.py --dataroot database/coco_stuff --output_path real_stat/coco_A.npz --direction BtoA --dataset_mode coco
RuntimeError: Sizes of tensors must match except in dimension 0. Got 424 and 640 in dimension 2 (The offending index is 1)
Thanks.
if **getattr(opt, 'sort_channels', False)** and opt.restore_student_G_path is not None: # line 74 for base_resnet_distiller.py
Regarding "**getattr(opt, 'sort_channels', False)**": I checked the definition of getattr, which has the form getattr(object, name[, default]). As far as I can tell, setting the default to "False" or "True" does not affect the output here, and the call just returns the value of **opt.sort_channels**.
I would like to know whether I need to sort channels before OFA training, that is, set sort_channels = True, because I noticed that "netG_student_tmp" is used in supernet training to sort channels before transferring the pretrained weights to student_netG.
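For reference, a small self-contained check of how the third argument of Python's built-in getattr behaves. The opt objects below are stand-ins built with SimpleNamespace, not the repository's option parser.

```python
from types import SimpleNamespace

opt_with = SimpleNamespace(sort_channels=True)   # attribute is defined
opt_without = SimpleNamespace()                  # attribute is missing

# Attribute present: the default is ignored and the stored value is returned.
present = getattr(opt_with, 'sort_channels', False)
# Attribute absent: the default is returned instead of raising AttributeError.
absent = getattr(opt_without, 'sort_channels', False)
print(present, absent)
```

So the default only matters when the option was never defined on opt, for example when a script's option class does not declare sort_channels.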
I spotted that you have a normalization layer in your separable convolution implementation.
self.conv = nn.Sequential(
    nn.Conv2d(in_channels=in_channels, out_channels=in_channels * scale_factor, kernel_size=kernel_size,
              stride=stride, padding=padding, groups=in_channels, bias=use_bias),
    norm_layer(in_channels),
    nn.Conv2d(in_channels=in_channels * scale_factor, out_channels=out_channels,
              kernel_size=1, stride=1, bias=use_bias),
)
I have not seen such an implementation before. Also, why is the normalization layer's channel count not adjusted by scale_factor?
I have only one GPU, and searching is very slow. I want to run other code in the meantime.
How can I continue the search after finishing the other project?
65%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 37494/57600 [40:38:20<137:38:38, 24.65s/it]MACs: 4.364G Params: 1.987M
{'config_str': '48_32_32_48_40_32_24_16', 'macs': 4363780096, 'fid': 68.67807846983607}
65%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 37495/57600 [40:38:48<143:48:26, 25.75s/it]MACs: 5.282G Params: 1.987M
{'config_str': '48_32_32_48_40_32_16_64', 'macs': 5282070528, 'fid': 64.41984751671731}
65%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 37496/57600 [40:39:19<152:29:44, 27.31s/it]MACs: 4.825G Params: 1.987M
Thank you!
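A minimal sketch of one generic way to make a long search resumable, assuming each configuration can be evaluated independently. The names run_search, fake_evaluate and the JSON cache file are hypothetical and are not part of the repository's search script.

```python
import json
import os
import tempfile

def run_search(configs, evaluate, cache_path, max_new=None):
    """Evaluate configs, skipping any already stored in cache_path (hypothetical helper)."""
    results = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            results = json.load(f)
    done = 0
    for config_str in configs:
        if config_str in results:
            continue  # finished in an earlier session
        if max_new is not None and done >= max_new:
            break  # stands in for an interruption
        results[config_str] = evaluate(config_str)
        done += 1
        # Write to a temp file, then rename, so an interruption cannot corrupt the cache.
        tmp_path = cache_path + '.tmp'
        with open(tmp_path, 'w') as f:
            json.dump(results, f)
        os.replace(tmp_path, cache_path)
    return results

calls = []

def fake_evaluate(config_str):
    calls.append(config_str)
    return {'width_sum': sum(int(c) for c in config_str.split('_'))}  # placeholder metric

configs = ['48_32_32_48_40_32_24_16', '48_32_32_48_40_32_16_64', '16_16_16_16_16_16_16_16']
cache = os.path.join(tempfile.mkdtemp(), 'search_cache.json')
run_search(configs, fake_evaluate, cache, max_new=2)  # first session, stopped early
final = run_search(configs, fake_evaluate, cache)     # later session resumes from the cache
print(len(calls), len(final))
```

Each configuration is evaluated once across both sessions, so stopping the process only loses the evaluation that was in flight.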
Hi, after going through the code I have a few questions regarding training and distillation.
resnet_supernet.py?
resnet_distiller.py and transfer the weight to the student supernet?
In the load_networks function in resnet_distiller.py, is it necessary to transfer the weights of the teacher network to the student network, or is it just for faster training and convergence?
I notice your scripts can distill resblocks of 16, 24, 32.
But how do I distill a specific structure like "16_16_32_16_32_32_16_16"?
Is the experiment in Fig. 6 of the paper meant to compare "pruning + distill" with "GAN Compression" at the same MACs and resblock structure?
@lmxyy
This project's license is not fully clear.
Is commercial use of this repository not allowed?
How can I use multiple GPUs when training?
I tried --gpu_ids 1,2,3,4. Training the mobile model runs correctly with it, but distillation does not.
I notice your code supports students with the structures "16_16_16...._16", "24_24...24", "32_32..._32",
but how do I distill a student like "16_16_32_16_32_32_16_16"?
If I add config_str to the distill options, will it work?
@lmxyy
I am trying everything from scratch. When running get_real_stat.py I get an error.
Why does it require a val folder? There are already valA and valB.
(gan) home@home-lnx:~/programs/level 2/gan-compression$ python get_real_stat.py --dataroot database/horse2zebra/ --output_path real_stat/horse2zebra_B.npz --direction AtoB
Traceback (most recent call last):
File "get_real_stat.py", line 84, in <module>
main(opt)
File "get_real_stat.py", line 15, in main
dataloader = create_dataloader(opt)
File "/home/home/programs/level 2/gan-compression/data/__init__.py", line 45, in create_dataloader
dataloader = CustomDatasetDataLoader(opt, verbose)
File "/home/home/programs/level 2/gan-compression/data/__init__.py", line 97, in __init__
self.dataset = dataset_class(opt)
File "/home/home/programs/level 2/gan-compression/data/aligned_dataset.py", line 24, in __init__
self.AB_paths = sorted(make_dataset(self.dir_AB)) # get image paths
File "/home/home/programs/level 2/gan-compression/data/image_folder.py", line 45, in make_dataset
assert os.path.isdir(dir) or os.path.islink(dir), '%s is not a valid directory' % dir
AssertionError: database/horse2zebra/val is not a valid directory
(gan) home@home-lnx:~/programs/level 2/gan-compression/database/horse2zebra$ tree -d
.
├── testA
├── testB
├── trainA
├── trainB
├── valA -> testA
└── valB -> testB
Thank you for sharing your work! Did you define the "MobileNet Teacher Model" yourselves, or is it taken from the original paper?
Hi! Thank you all for the tremendous and awesome work.
I want to ask you what would be your recommendations regarding incorporating pix2pixHD?
I've done almost all the necessary steps, but I'm interested in your advice on supernet. pix2pixHD has the same amount of blocks as pix2pix, however, the dimensionality of connecting layers (mapping layers) differs. How would you propose to modify resnet-9blocks for that? or maybe you have better faith in another architecture?
This is a great project. However, I ran into an issue similar to this one. Moreover, the issue seems somewhat random: training with 2 GPUs sometimes runs correctly and sometimes does not.
The key point is that the intermediate features are obtained from forward hooks and stored in dictionaries, which cannot guarantee that they are on the same device as netA. The bug occurs here, where netA is always on cuda:0 because it is not wrapped with data parallel correctly (https://github.com/mit-han-lab/gan-compression/blob/master/distillers/base_resnet_distiller.py#L117-L123 , where it should be netA = networks.init_net(netA, gpu_ids=self.gpu_ids), and there should be no to(device) in the above two cases; this line and this line should be modified accordingly to something like getattr(netA, 'module', netA)).
However, since Tact and Sact are randomly on different devices and the dictionary loses this information, a bug will remain. One possible solution is to include the device of the output in the keys here, but since netA is data parallel, which scatters inputs and replicates weights during its forward pass, netA(Sact) still does not work. If netA is moved to the device of Sact before calling it, the optimizerG step will cause a problem, because the gradient will be on a different device from netA's weights.
I am not sure whether there is a solution to this. I notice you changed the code structure in the SPADE net, but I wonder whether there is a simpler solution.
Hi,
I want to implement GAN Compression in TensorFlow so I can use it in a mobile application.
I'm pretty new to this field and I'm not aware of the challenges something like this might involve.
Should I implement the whole thing in TensorFlow myself? Would it be as fast as it is in PyTorch?
Or should I convert the PyTorch model to TensorFlow with ONNX?
I would really appreciate any help
hi!
I noticed that your distillation process only supports ResNet, so I would like to know whether a smaller teacher model would work.
Can it be used with any GAN-based model,
such as https://github.com/HsinYingLee/DRIT and https://github.com/NVlabs/FUNIT ?
I am trying to compress pix2pixHD, but I find that each iteration takes longer and longer.
For example, it takes 0.4s per image in the first epoch, but 8s per image in the 5th epoch.
Have you seen the same problem?
I compared the distillation results with the supernet results (without fine-tuning):
# supernet w/o fine-tuning
config_str               MACs   FID
32_32_32_32_32_32_32_32  4.955  55.73
32_16_32_24_32_32_24_24  3.639  60.27
16_16_32_16_32_32_16_16  2.546  65.33
16_16_16_32_32_32_16_24  1.977  134.45
16_16_16_16_16_16_16_16  1.421  223.46
# distill
config_str               MACs   FID
32_32_32_32_32_32_32_32  4.955  65.78
32_16_32_24_32_32_24_24  3.639  73.50
16_16_32_16_32_32_16_16  2.546  80.36
16_16_16_32_32_32_16_24  1.977  83.06
16_16_16_16_16_16_16_16  1.421  103.17
It seems that when MACs > 1.977 the supernet is better than distillation, but when MACs <= 1.977 distillation is better.
It also seems that when MACs > 1.977, OFA training reduces the capacity gap between student and teacher and thus gives better performance.
Do you know why?
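The comparison can be checked mechanically. The short sketch below copies the FID values from the two tables in this question and reports, for each MACs value, which method has the lower (better) FID; the variable names are only for illustration.

```python
# (config_str, MACs, FID) rows copied from the two tables in the question.
supernet = [
    ('32_32_32_32_32_32_32_32', 4.955, 55.73),
    ('32_16_32_24_32_32_24_24', 3.639, 60.27),
    ('16_16_32_16_32_32_16_16', 2.546, 65.33),
    ('16_16_16_32_32_32_16_24', 1.977, 134.45),
    ('16_16_16_16_16_16_16_16', 1.421, 223.46),
]
distill = [
    ('32_32_32_32_32_32_32_32', 4.955, 65.78),
    ('32_16_32_24_32_32_24_24', 3.639, 73.50),
    ('16_16_32_16_32_32_16_16', 2.546, 80.36),
    ('16_16_16_32_32_32_16_24', 1.977, 83.06),
    ('16_16_16_16_16_16_16_16', 1.421, 103.17),
]

distill_fid = {cfg: fid for cfg, _, fid in distill}
winners = {}
for cfg, macs, fid in supernet:
    # Lower FID is better.
    winners[macs] = 'supernet' if fid < distill_fid[cfg] else 'distill'
print(winners)
```

With these numbers the crossover lies between 2.546 and 1.977 MACs: the supernet wins for the three largest configurations and distillation wins for the two smallest.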
I am puzzled by self.netG_pretrained in resnet_distiller.py.
It is here: https://github.com/mit-han-lab/gan-compression/blob/master/distillers/resnet_distiller.py#L94
Why is it deleted after being loaded?
def load_networks(self, verbose=True):
    if self.opt.restore_pretrained_G_path is not None:
        util.load_network(self.netG_pretrained, self.opt.restore_pretrained_G_path, verbose)
        load_pretrained_weight(self.opt.pretrained_netG, self.opt.student_netG,
                               self.netG_pretrained, self.netG_student,
                               self.opt.pretrained_ngf, self.opt.student_ngf)
        del self.netG_pretrained
    super(ResnetDistiller, self).load_networks()
Could you give me some instructions on how to train a compressed model myself, rather than using the pre-trained compressed model? Taking pix2pix as an example would help. I don't understand the different training modes.
Hello, the following is the "once-for-all" training strategy you mention in your paper:
"At each training step, we randomly sample a sub-network with a certain channel number configuration, compute the output and gradients, and update the extracted weights using our learning objective (Equation 4)"
Where can I find this strategy in your code?
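A rough sketch of the sampling step described in the quoted sentence, not the repository's implementation. The candidate widths [16, 24, 32], the eight layers, and the helper names sample_config and encode_config are assumptions made for illustration; the training step itself is only indicated by a comment.

```python
import random

def sample_config(num_layers, choices, rng):
    """Pick one channel width per layer, uniformly at random (hypothetical sampler)."""
    return [rng.choice(choices) for _ in range(num_layers)]

def encode_config(channels):
    """Join widths into a string such as '16_16_32_16_32_32_16_16'."""
    return '_'.join(str(c) for c in channels)

rng = random.Random(0)
choices = [16, 24, 32]  # assumed candidate widths
seen = set()
for step in range(1000):
    channels = sample_config(8, choices, rng)
    config_str = encode_config(channels)
    seen.add(config_str)
    # A real training step would run the sub-network selected by `channels`,
    # compute the loss, and update only the weights that sub-network uses.
print(len(seen), 'distinct sub-networks sampled')
```

Because a different configuration is drawn at every step, the shared weights are trained to work across many channel settings instead of one fixed student.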
I successfully ran search.py and evaluated sub-models, but search_multi.py fails with the error
'RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 6 does not equal 0', which means a tensor and the model are not on the same GPU.
How can I run search_multi.py successfully?
As mentioned in your paper (Appendix 6.1):
"we first train a MobileNet [25] style network from scratch, and then use the network as a teacher model to distill a smaller student network."
I guess "use the network as a teacher model to distill a smaller student network" corresponds to "Pre-distillation (Optional)..." in your docs/training_tutorial.md.
You also compared "Pruning + distill" with the "GAN Compression" method in Section 4.3, Figure 6.
My questions are:
Hope for your reply.
Is it necessary to first train regular pix2pix and then compress it with the code in this repository? Is it possible to train compressed model straight away?
Hello, I want to use this method to compress other GAN models, such as StarGAN and StyleGAN. If that is feasible, what should I do?
Hi, I followed the instructions in Pix2pix Model Compression and ran "bash scripts/pix2pix/edges2shoes-r/train_mobile.sh", but the model I obtained only reaches an FID of 28.79, which differs from the results in your paper. Is there anything I forgot to do?
First of all, this is a groundbreaking paper that makes model inference on mobile devices possible. If the image sharpness improves a little more, it could reach a breakthrough threshold.
From left to right: real_A, fake_B, real_B.
The cartoon picture is simple, and the result is OK. Others may need to be improved.
I'm focusing on pix2pixHD and partialconv. Their computation cost and model size are relatively large. I wonder whether I can draw on the strengths of this work to solve that problem.
When preparing the dataset for CycleGAN, I get a warning saying that only BtoA is allowed:
(gan) home@home-lnx:~/programs/level 2/gan-compression$ python get_real_stat.py --dataroot database/horse2zebra/valB --output_path real_stat/horse2zebra_B.npz --direction AtoB --dataset_mode single
get_real_stat.py:61: UserWarning: Dataset mode [single] only supports direction BtoA. We will change the direction to BtoA.!
warnings.warn('Dataset mode [single] only supports direction BtoA. '
I trained a CycleGAN model without using the statistical information of the ground-truth images (using the original CycleGAN code). Is it possible to compute the statistical information afterwards for the model trained with the original CycleGAN code, or do I have to retrain the model with the new version?
Thanks.
I want to know the purpose of pre-distillation in the GAN-compression pipeline. How does it improve the pruning pipeline? It is not mentioned anywhere in the paper.
I am trying to compute the statistics for my own dataset with get_real_stat.py.
But I get an error: more than 13G of memory is needed at tensors = util.tensor2im(tensors).astype(float)
So I tried to reduce the memory usage.
First, a simple change is tensors = util.tensor2im(tensors).astype(np.float32). It helps, but not enough.
I then changed the function get_activations_from_ims in fid_score.py as follows:
images = images.transpose((0, 3, 1, 2))
images = images.astype(np.float32)/255
But memory_profiler shows that it is still not good enough:
Line # Mem usage Increment Line Contents
================================================
10 9041.7 MiB 9041.7 MiB @profile
11 def get_fid(fakes, model, npz, device, batch_size=1, use_tqdm=True, bgr=False):
12 9041.7 MiB 0.0 MiB m1, s1 = npz['mu'], npz['sigma']
13 15747.4 MiB 6705.7 MiB fakes = torch.cat(fakes, dim=0)
14 10226.1 MiB 0.0 MiB fakes = util.tensor2im(fakes, normalize=False) #.astype(np.float32) # default float
15 10226.1 MiB 0.0 MiB m2, s2 = _compute_statistics_of_ims(fakes, model, batch_size, 2048,
16 10338.3 MiB 112.3 MiB device, use_tqdm=use_tqdm, bgr=bgr)
17 10346.7 MiB 8.4 MiB return float(calculate_frechet_distance(m1, s1, m2, s2))
Do you have some good ideas to reduce memory?
Thank you!
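One generic way to avoid holding every image in memory at once is to accumulate the mean and covariance of the features batch by batch. The sketch below is an assumption-laden illustration, not the repository's fid_score.py: RunningStats is a hypothetical class, and random vectors stand in for the Inception activations that would be extracted per batch.

```python
import numpy as np

class RunningStats:
    """Accumulate the mean and covariance of feature vectors batch by batch."""

    def __init__(self, dim):
        self.n = 0
        self.total = np.zeros(dim, dtype=np.float64)
        self.outer = np.zeros((dim, dim), dtype=np.float64)

    def update(self, feats):
        feats = np.asarray(feats, dtype=np.float64)
        self.n += feats.shape[0]
        self.total += feats.sum(axis=0)
        self.outer += feats.T @ feats

    def finalize(self):
        mu = self.total / self.n
        # Unbiased covariance, matching np.cov(..., rowvar=False).
        sigma = (self.outer - self.n * np.outer(mu, mu)) / (self.n - 1)
        return mu, sigma

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))  # stand-in for per-image activations
stats = RunningStats(dim=8)
for start in range(0, 200, 50):
    # Only one batch of features is needed at a time.
    stats.update(features[start:start + 50])
mu, sigma = stats.finalize()
print(np.allclose(mu, features.mean(axis=0)),
      np.allclose(sigma, np.cov(features, rowvar=False)))
```

Only a vector and a matrix of the feature dimension are kept between batches, so memory no longer grows with the number of images. The sum-of-outer-products formula can lose some precision for very large or badly scaled features, which is why the accumulators use float64.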
Take CycleGAN compression as an example.
the teacher generator: a MobileNet-based CycleGAN trained from scratch
the student generator: a MobileNet-based distillation CycleGAN model
the final generator: a fine-tuned sub-network of the student generator
Is my understanding correct?
Does this mean the original CycleGAN model (with normal convolutions) is not needed in the compression algorithm?
In the paper you mention that you inherit the discriminator from the pretrained teacher. I see that in GauGAN compression you do not do that. Am I correct?
Following the tutorial, I trained the mobile-style CycleGAN (without changing any parameters except the dataset).
But when I run train_supernet.sh, it ends with the following error.
Traceback (most recent call last):
File "train_supernet.py", line 4, in <module>
trainer = Trainer('supernet')
File "/data1/edvardzeng/myspace/gan-compression/trainer.py", line 52, in __init__
model.setup(opt) # regular setup: load and print networks; create schedulers
File "/data1/edvardzeng/myspace/gan-compression/distillers/base_resnet_distiller.py", line 153, in setup
self.load_networks(verbose)
File "/data1/edvardzeng/myspace/gan-compression/supernets/resnet_supernet.py", line 182, in load_networks
self.opt.student_ngf, self.opt.student_ngf)
File "/data1/edvardzeng/myspace/gan-compression/utils/weight_transfer.py", line 167, in load_pretrained_weight
index = transfer(m1, m2, index)
File "/data1/edvardzeng/myspace/gan-compression/utils/weight_transfer.py", line 139, in transfer
return transfer_MobileResnetBlock(m1, m2, input_index, output_index)
File "/data1/edvardzeng/myspace/gan-compression/utils/weight_transfer.py", line 84, in transfer_MobileResnetBlock
idxs = transfer(m1.conv_block[1], m2.conv_block[1], input_index=input_index)
File "/data1/edvardzeng/myspace/gan-compression/utils/weight_transfer.py", line 145, in transfer
raise NotImplementedError('Unknown module [%s]!' % type(m1))
NotImplementedError: Unknown module [<class 'torch.nn.modules.instancenorm.InstanceNorm2d'>]!
Thanks for sharing your excellent work. I have two questions about once-for-all.
1. Different hardware platforms have different optimizations for ops, and we often choose efficient ops according to the hardware platform. Can OFA handle the situation where different hardware platforms prefer different ops?
2. On mobile platforms, different camera sensors produce different data, so the training data differs per hardware platform. When we use OFA for a generative network such as SRGAN, which platform's training data should be used?
Hi there, @junyanz @lmxyy
When trying to train a "once-for-all" network, I get an error:
(gan) home@home-lnx:~/programs/level 2/gan-compression$ bash scripts/cycle_gan/horse2zebra_lite/train_supernet.sh
Traceback (most recent call last):
File "train_supernet.py", line 4, in <module>
trainer = Trainer('supernet')
File "/home/home/programs/level 2/gan-compression/trainer.py", line 38, in __init__
opt = Options().parse()
File "/home/home/programs/level 2/gan-compression/options/base_options.py", line 134, in parse
opt = self.gather_options()
File "/home/home/programs/level 2/gan-compression/options/supernet_options.py", line 76, in gather_options
supernet_option_setter = supernets.get_option_setter(supernet_name)
File "/home/home/programs/level 2/gan-compression/supernets/__init__.py", line 22, in get_option_setter
supernet_class = find_supernet_using_name(supernet_name)
File "/home/home/programs/level 2/gan-compression/supernets/__init__.py", line 6, in find_supernet_using_name
modellib = importlib.import_module(supernet_filename)
File "/home/home/anaconda3/envs/gan/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/home/programs/level 2/gan-compression/supernets/resnet_supernet.py", line 13, in <module>
from distillers.base_resnet_distiller import BaseResnetDistiller
File "/home/home/programs/level 2/gan-compression/distillers/base_resnet_distiller.py", line 11, in <module>
from metric.cityscapes_mIoU import DRNSeg
ModuleNotFoundError: No module named 'metric.cityscapes_mIoU'
I noticed you removed spectral normalization in GauGAN generators. Is there a particular motivation behind this?