akshitac8 / tfvaegan

[ECCV 2020] Official PyTorch implementation of "Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification". SOTA results for ZSL and GZSL.

License: MIT License

zero-shot-learning image-classification f-vaegan clswgan action-recognition eccv2020 gzsl gan zsl pytorch-gan feature-synthesis eccv-2020 pytorch-implementation pytorch

tfvaegan's Introduction


Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification (ECCV 2020)

(* denotes equal contribution)

Paper: https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670477.pdf

Video Presentation: Short summary, Overview

Finetuned features: https://drive.google.com/drive/folders/13-eyljOmGwVRUzfMZIf_19HmCj1yShf1?usp=sharing

Webpage: https://akshitac8.github.io/tfvaegan/

Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen class features by leveraging class-specific semantic embeddings. During training, they generate semantically consistent features, but discard this constraint during feature synthesis and classification. We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification. We first introduce a feedback loop, from a semantic embedding decoder, that iteratively refines the generated features during both the training and feature synthesis stages. The synthesized features together with their corresponding latent embeddings from the decoder are then transformed into discriminative features and utilized during classification to reduce ambiguities among categories. Experiments on (generalized) zero-shot object and action classification reveal the benefit of semantic consistency and iterative feedback, outperforming existing methods on six zero-shot learning benchmarks.

Overall Architecture:



Overall Framework for TF-VAEGAN

A feedback module, which utilizes the auxiliary decoder during both the training and feature synthesis stages to improve the semantic quality of the synthesized features.

A discriminative feature transformation that utilizes the auxiliary decoder during the classification stage to enhance zero-shot classification.

Prerequisites

  • Python 3.6
  • PyTorch 0.3.1
  • torchvision 0.2.0
  • h5py 2.10
  • scikit-learn 0.22.1
  • scipy 1.4.1
  • numpy 1.18.1
  • numpy-base 1.18.1
  • pillow 5.1.0

Installation

The model is built in PyTorch 0.3.1 and tested in an Ubuntu 16.04 environment (Python 3.6, CUDA 9.0, cuDNN 7.5).

To install, follow these instructions:

conda create -n tfvaegan python=3.6
conda activate tfvaegan
pip install https://download.pytorch.org/whl/cu90/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
pip install torchvision==0.2.0 scikit-learn==0.22.1 scipy==1.4.1 h5py==2.10 numpy==1.18.1

Data preparation

Standard ZSL and GZSL datasets

Download CUB, AWA, FLO and SUN features from the drive link shared below.

link: https://drive.google.com/drive/folders/16Xk1eFSWjQTtuQivTogMmvL3P6F_084u?usp=sharing

Download UCF101 and HMDB51 features from the drive link shared below.

link: https://drive.google.com/drive/folders/1pNlnL3LFSkXkJNkTHNYrQ3-Ie4vvewBy?usp=sharing

Extract them into the datasets folder.

Custom datasets

  1. Download the custom dataset images into the datasets folder.
  2. Use a pretrained ResNet-101 as the feature extractor. For example, you can have a look here.
  3. Extract features with the pretrained ResNet-101 and save them in a dictionary with the keys 'features', 'image_files' and 'labels'.
  4. Save the dictionary in .mat format using, for example (a fuller sketch of steps 2-4 follows below):
    import scipy.io as io
    io.savemat('temp.mat', feat)  # feat is the dictionary from step 3
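
For reference, here is a minimal sketch of steps 2-4 using torchvision's pretrained ResNet-101. It assumes a recent PyTorch/torchvision (not the PyTorch 0.3.1 environment above); the samples list and output path are placeholders, and the exact array layout expected by util.py should be verified.

    import numpy as np
    import scipy.io as io
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Pretrained ResNet-101 with the classification head replaced by an identity,
    # so the forward pass returns the 2048-d pooled features.
    resnet = models.resnet101(pretrained=True)
    resnet.fc = torch.nn.Identity()
    resnet.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Hypothetical list of (image_path, class_index) pairs for your custom dataset.
    samples = [('datasets/custom/class_0/img_0001.jpg', 0)]

    features, image_files, labels = [], [], []
    with torch.no_grad():
        for path, label in samples:
            x = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
            features.append(resnet(x).squeeze(0).numpy())
            image_files.append(path)
            labels.append(label)

    feat = {'features': np.stack(features),
            'image_files': np.array(image_files, dtype=object),
            'labels': np.array(labels)}
    io.savemat('datasets/custom/res101.mat', feat)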
    

Training

Zero-Shot Image Classification

  1. To train and evaluate ZSL and GZSL models on CUB, AWA, FLO and SUN, please run:
CUB : python scripts/run_cub_tfvaegan.py
AWA : python scripts/run_awa_tfvaegan.py
FLO : python scripts/run_flo_tfvaegan.py
SUN : python scripts/run_sun_tfvaegan.py

Zero-Shot Action Classification

  1. To train and evaluate ZSL and GZSL models on UCF101, HMDB51, please run:
HMDB51 : python scripts/run_hmdb51_tfvaegan.py
UCF101 : python scripts/run_ucf101_tfvaegan.py

Results

Citation:

If you find this useful, please cite our work as follows:

@inproceedings{narayan2020latent,
	title={Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification},
	author={Narayan, Sanath and Gupta, Akshita and Khan, Fahad Shahbaz and Snoek, Cees GM and Shao, Ling},
	booktitle={ECCV},
	year={2020}
}

tfvaegan's People

Contributors

akshitac8, naraysa

tfvaegan's Issues

RuntimeError: value cannot be converted to type float without overflow: inf

Hi,

First of all, thank you for your code and paper (your results are impressive). I am now running your code on a different dataset (similar to SUN in terms of the number of classes) and I get: RuntimeError: value cannot be converted to type float without overflow: inf at line 235, vae_loss_seen = loss_fn(recon_x, input_resv, means, log_var). I believe means or log_var ends up containing 'inf' or '-inf'. Do you have any idea how to solve it? I am using the SUN dataset's parameters, by the way.

P.S. I was able to run your code on SUN and reproduce the results, so it is not related to the PyTorch version or anything similar.

Many thanks,
Sarkhan
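
(Not an official fix: a common workaround sketch for this kind of overflow, assuming the standard VAE loss signature used above, is to clamp the encoder outputs before computing the loss so that exp(log_var) stays finite; lowering the learning rate for the new dataset often helps as well.)

    # Defensive tweak before the existing call (workaround sketch, not from the repo):
    log_var = torch.clamp(log_var, min=-10.0, max=10.0)   # keeps exp(log_var) finite
    means = torch.clamp(means, min=-1e4, max=1e4)
    vae_loss_seen = loss_fn(recon_x, input_resv, means, log_var)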

Question regarding fine-tuned features

Hi @akshitac8 ,

I wanted to clarify the procedure for obtaining the fine-tuned features. The features provided in the datasets are ResNet-101 features trained on ImageNet. How do you obtain the fine-tuned features mentioned in the paper? The performance with fine-tuned features is superb!

Recreating ResNet visual features

Hi @akshitac8 @naraysa ,

I was trying to recreate the provided ResNet visual features from the CUB dataset images.
I used the pretrained ResNet from the latest version of PyTorch.
It would be really helpful if you could help me recreate the ResNet visual features from the images (for the CUB dataset).

Unable to run zero-shot-images code.

I am getting this error on PyTorch 1.6 and torchvision 0.7 (error screenshot was attached).

I saw that you have tested this on PyTorch v0.3, but setting up the environment using your yml file is taking forever. (A second screenshot was attached.)

performance about CUB dataset

Thanks for your work; I am very interested in it. I downloaded the code and tried to train on CUB with the features you provided, but I don't think the training was successful. I did not change any parameters. Could you give me some idea about this situation? (A screenshot of the training output was attached.)

About the loss in the training process

Hello, I observed the loss during training and found that the total loss does not approach 0 in the end; instead it stabilizes at a fixed value with small oscillations. Is this normal? If so, why doesn't the loss continue to decrease? Hope you can answer!
(I used the relevant parameters of the FLO dataset you provided for the test.)

Are there any details about the transductive setting?

I am very interested in this paper, and I really want to know how to realize the transductive setting. But there is no relevant code here; could you please send me one? Thank you very much!!!

epoch 0

I ran the code and it only ran the first epoch (0). (A screenshot was attached.)

Questions about training

Hi Akshitac,

Sorry to open a new issue since I get more questions about training.

Question 1: For each epoch during training, the model is trained on seen data and then evaluated (at that epoch) on the test set, which gives the accuracy for that epoch. After 30 epochs, you pick the best accuracy as the final result. Am I right?

Question 2: What about the argparse parameter --syn_num? I can see that the default is set to 600 (meaning 600 visual representations are generated for each action class?). Is there any experiment indicating that this parameter significantly influences model performance? Any suggestions for that?

Thanks in advance.

Kind regards.
Kaiqiang

Question about argparse

Hi Akshita,
Thanks for updating your codes.

In the script for running ucf101, there is --dataset ucf101_i3d/split_{split}.

I would like to ask: how does the code fetch the data splits from the 30 files?

I did not find code like split = [1:30] with a for loop that runs 30 times and, at the end, calculates the averaged accuracy over the 30 splits. Any suggestions for that? Thanks in advance.

Kind regards.
Kai
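
(For illustration only: a hypothetical driver that runs the training script once per split and averages the best accuracies parsed from its output. The script name, arguments and log format below are assumptions based on this thread and the training logs shown later on this page.)

    import re
    import subprocess

    accuracies = []
    for split in range(1, 31):
        # Assumes train_actions.py accepts --dataset ucf101_i3d/split_<n> as in the run script.
        out = subprocess.run(
            ['python', 'train_actions.py', '--dataset', 'ucf101_i3d/split_%d' % split],
            stdout=subprocess.PIPE, universal_newlines=True).stdout
        match = re.search(r'best ZSL unseen accuracy is tensor\(([\d.]+)\)', out)
        if match:
            accuracies.append(float(match.group(1)))

    print('mean accuracy over %d splits: %.4f'
          % (len(accuracies), sum(accuracies) / max(len(accuracies), 1)))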

Transductive Setting

Hello,
Is it possible to share the code for the transductive setting?
Thank you!

There is a gap in performance compared to the paper

In order to make the code run on PyTorch 1.8, two changes have been made.
One is to change .data[0] to .item(), and the other is to remove all the volatile arguments from Variable(..., volatile=True).
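
(For reference, a minimal sketch of those two changes; the variable names are placeholders.)

    import torch

    def ported_step(loss, input_res):
        # PyTorch 0.3 style, as in the original code:
        #   loss_value = loss.data[0]
        #   inputv = Variable(input_res, volatile=True)
        # PyTorch 1.x equivalents:
        loss_value = loss.item()      # 0-dim tensors: .data[0] -> .item()
        with torch.no_grad():         # volatile=True -> torch.no_grad() context
            inputv = input_res
        return loss_value, inputv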

When I use the link https://drive.google.com/drive/folders/16Xk1eFSWjQTtuQivTogMmvL3P6F_084u?usp=sharing :

Dataset CUB
the best ZSL unseen accuracy is tensor(0.6168)
Dataset CUB
the best GZSL seen accuracy is tensor(0.6192)
the best GZSL unseen accuracy is tensor(0.4801)
the best GZSL H is tensor(0.5408)

When I use cub_feat.mat (fine-tuned) from https://drive.google.com/drive/folders/1SOUNd8mgNmY0kFn4iKSvPE8lsMuxaJhp :

Dataset CUB
the best ZSL unseen accuracy is tensor(0.7142)
Dataset CUB
the best GZSL seen accuracy is tensor(0.7154)
the best GZSL unseen accuracy is tensor(0.6128)
the best GZSL H is tensor(0.6602)

When I read "Counterfactual Zero-Shot and Open-Set Visual Recognition" (CVPR 2021), I found that their reproduction using your code is also about 2% lower than the paper (a screenshot of their reported numbers was attached).

Is there anything we need to pay attention to in our reproduction?

There is a problem with AWA2!!

I ran your code and the results are very different from your report!
[0/120] Loss_D: 28.9489 Loss_G: 3.5188, Wasserstein_dist:30.8353, vae_loss_seen:417.3835 ZSL: unseen accuracy=0.3096
[1/120] Loss_D: -38.1489 Loss_G: 5.0456, Wasserstein_dist:42.0203, vae_loss_seen:420.9953 ZSL: unseen accuracy=0.4673
[2/120] Loss_D: -27.9687 Loss_G: 5.7563, Wasserstein_dist:47.2592, vae_loss_seen:440.3323 ZSL: unseen accuracy=0.3712
[3/120] Loss_D: -40.0992 Loss_G: 6.2145, Wasserstein_dist:46.7544, vae_loss_seen:420.0519 ZSL: unseen accuracy=0.4334
[4/120] Loss_D: -43.3750 Loss_G: 6.8418, Wasserstein_dist:49.8265, vae_loss_seen:458.5927 ZSL: unseen accuracy=0.4220
[5/120] Loss_D: -34.5511 Loss_G: 6.9677, Wasserstein_dist:49.1632, vae_loss_seen:439.0522 ZSL: unseen accuracy=0.5199
[6/120] Loss_D: -41.3066 Loss_G: 7.0170, Wasserstein_dist:47.2840, vae_loss_seen:433.2614 ZSL: unseen accuracy=0.4818
[7/120] Loss_D: -39.5687 Loss_G: 6.7393, Wasserstein_dist:48.4081, vae_loss_seen:454.8207 ZSL: unseen accuracy=0.4993
[8/120] Loss_D: -31.6542 Loss_G: 6.4731, Wasserstein_dist:39.9834, vae_loss_seen:433.3832 ZSL: unseen accuracy=0.5064
[9/120] Loss_D: -30.2242 Loss_G: 6.4314, Wasserstein_dist:38.1065, vae_loss_seen:432.4793 ZSL: unseen accuracy=0.5597
[10/120] Loss_D: -21.7901 Loss_G: 6.0656, Wasserstein_dist:30.8512, vae_loss_seen:403.6347 ZSL: unseen accuracy=0.5471
[11/120] Loss_D: -24.2578 Loss_G: 5.9910, Wasserstein_dist:31.9925, vae_loss_seen:427.8307 ZSL: unseen accuracy=0.5417
[12/120] Loss_D: -24.3917 Loss_G: 5.8288, Wasserstein_dist:30.9009, vae_loss_seen:424.8787 ZSL: unseen accuracy=0.5821
[13/120] Loss_D: -24.8628 Loss_G: 5.4020, Wasserstein_dist:33.0159, vae_loss_seen:451.1423 ZSL: unseen accuracy=0.5861
[14/120] Loss_D: -19.3216 Loss_G: 5.3820, Wasserstein_dist:26.8774, vae_loss_seen:422.8309 ZSL: unseen accuracy=0.6012
[15/120] Loss_D: -19.0281 Loss_G: 5.1767, Wasserstein_dist:26.6548, vae_loss_seen:419.1060 ZSL: unseen accuracy=0.6001
[16/120] Loss_D: -17.1176 Loss_G: 4.9828, Wasserstein_dist:24.3713, vae_loss_seen:416.7115 ZSL: unseen accuracy=0.6066
[17/120] Loss_D: -17.5543 Loss_G: 4.7131, Wasserstein_dist:25.1840, vae_loss_seen:423.3133 ZSL: unseen accuracy=0.6277
[18/120] Loss_D: -18.0680 Loss_G: 4.7025, Wasserstein_dist:24.0267, vae_loss_seen:419.6989 ZSL: unseen accuracy=0.6158
[19/120] Loss_D: -16.6648 Loss_G: 4.5674, Wasserstein_dist:23.0303, vae_loss_seen:413.6142 ZSL: unseen accuracy=0.6048
[20/120] Loss_D: -16.7052 Loss_G: 4.3856, Wasserstein_dist:22.5796, vae_loss_seen:406.2739 ZSL: unseen accuracy=0.6157
[21/120] Loss_D: -13.1247 Loss_G: 4.1400, Wasserstein_dist:19.5000, vae_loss_seen:383.6900 ZSL: unseen accuracy=0.6188
[22/120] Loss_D: -15.2950 Loss_G: 4.0473, Wasserstein_dist:19.8167, vae_loss_seen:395.0753 ZSL: unseen accuracy=0.6254
[23/120] Loss_D: -16.2702 Loss_G: 3.7179, Wasserstein_dist:22.2775, vae_loss_seen:410.0670 ZSL: unseen accuracy=0.6138
[24/120] Loss_D: -14.8846 Loss_G: 3.5764, Wasserstein_dist:20.1954, vae_loss_seen:390.0315 ZSL: unseen accuracy=0.6234
[25/120] Loss_D: -14.7134 Loss_G: 3.4894, Wasserstein_dist:19.9493, vae_loss_seen:384.4587 ZSL: unseen accuracy=0.6230
[26/120] Loss_D: -14.1834 Loss_G: 3.2758, Wasserstein_dist:20.6925, vae_loss_seen:398.3151 ZSL: unseen accuracy=0.6103
[27/120] Loss_D: -16.3200 Loss_G: 3.0178, Wasserstein_dist:21.8677, vae_loss_seen:413.6834 ZSL: unseen accuracy=0.6163
[28/120] Loss_D: -13.9609 Loss_G: 3.0464, Wasserstein_dist:19.2708, vae_loss_seen:387.7805 ZSL: unseen accuracy=0.6203
[29/120] Loss_D: -14.7927 Loss_G: 2.8775, Wasserstein_dist:19.2506, vae_loss_seen:380.6116 ZSL: unseen accuracy=0.6219
[30/120] Loss_D: -12.7497 Loss_G: 2.6132, Wasserstein_dist:18.4324, vae_loss_seen:370.7046 ZSL: unseen accuracy=0.6285
[31/120] Loss_D: -15.6029 Loss_G: 2.4640, Wasserstein_dist:20.4424, vae_loss_seen:394.2675 ZSL: unseen accuracy=0.6352
[32/120] Loss_D: -12.5964 Loss_G: 2.4132, Wasserstein_dist:17.3379, vae_loss_seen:356.7068 ZSL: unseen accuracy=0.6342
[33/120] Loss_D: -14.9023 Loss_G: 2.2222, Wasserstein_dist:19.8327, vae_loss_seen:387.2333 ZSL: unseen accuracy=0.6328
[34/120] Loss_D: -17.4444 Loss_G: 2.0733, Wasserstein_dist:22.5138, vae_loss_seen:407.3970 ZSL: unseen accuracy=0.6386
[35/120] Loss_D: -15.5337 Loss_G: 1.9952, Wasserstein_dist:20.9760, vae_loss_seen:390.0721 ZSL: unseen accuracy=0.6453
[36/120] Loss_D: -14.9247 Loss_G: 1.9750, Wasserstein_dist:18.6419, vae_loss_seen:371.8835 ZSL: unseen accuracy=0.6469
[37/120] Loss_D: -13.8661 Loss_G: 1.8783, Wasserstein_dist:18.0047, vae_loss_seen:366.3225 ZSL: unseen accuracy=0.6535
[38/120] Loss_D: -13.6998 Loss_G: 1.7310, Wasserstein_dist:18.0273, vae_loss_seen:371.9961 ZSL: unseen accuracy=0.6441
[39/120] Loss_D: -16.0056 Loss_G: 1.6070, Wasserstein_dist:19.6938, vae_loss_seen:372.3855 ZSL: unseen accuracy=0.6521
[40/120] Loss_D: -12.5254 Loss_G: 1.5056, Wasserstein_dist:16.4971, vae_loss_seen:355.2552 ZSL: unseen accuracy=0.6517
[41/120] Loss_D: -12.9945 Loss_G: 1.4330, Wasserstein_dist:18.2767, vae_loss_seen:369.5546 ZSL: unseen accuracy=0.6695
[42/120] Loss_D: -13.8390 Loss_G: 1.2997, Wasserstein_dist:18.0963, vae_loss_seen:369.7717 ZSL: unseen accuracy=0.6567
[43/120] Loss_D: -14.4553 Loss_G: 1.1602, Wasserstein_dist:19.4352, vae_loss_seen:379.5174 ZSL: unseen accuracy=0.6605
[44/120] Loss_D: -13.1109 Loss_G: 1.1575, Wasserstein_dist:17.5158, vae_loss_seen:359.2687 ZSL: unseen accuracy=0.6614
[45/120] Loss_D: -15.1307 Loss_G: 1.0393, Wasserstein_dist:19.9992, vae_loss_seen:378.5796 ZSL: unseen accuracy=0.6575
[46/120] Loss_D: -13.3974 Loss_G: 1.0107, Wasserstein_dist:17.2745, vae_loss_seen:366.0474 ZSL: unseen accuracy=0.6610
[47/120] Loss_D: -13.9099 Loss_G: 0.8800, Wasserstein_dist:18.4491, vae_loss_seen:361.1379 ZSL: unseen accuracy=0.6622
[48/120] Loss_D: -13.8484 Loss_G: 0.8339, Wasserstein_dist:18.7640, vae_loss_seen:367.4486 ZSL: unseen accuracy=0.6613
[49/120] Loss_D: -13.7281 Loss_G: 0.7328, Wasserstein_dist:17.5516, vae_loss_seen:350.7204 ZSL: unseen accuracy=0.6659
[50/120] Loss_D: -14.8838 Loss_G: 0.6333, Wasserstein_dist:18.6946, vae_loss_seen:372.8707 ZSL: unseen accuracy=0.6675
[51/120] Loss_D: -12.2430 Loss_G: 0.5957, Wasserstein_dist:16.5738, vae_loss_seen:359.0114 ZSL: unseen accuracy=0.6669
[52/120] Loss_D: -13.2465 Loss_G: 0.4294, Wasserstein_dist:17.6025, vae_loss_seen:365.4046 ZSL: unseen accuracy=0.6649
[53/120] Loss_D: -12.6869 Loss_G: 0.3857, Wasserstein_dist:16.9520, vae_loss_seen:347.0602 ZSL: unseen accuracy=0.6698
[54/120] Loss_D: -11.0332 Loss_G: 0.4596, Wasserstein_dist:16.0554, vae_loss_seen:336.9040 ZSL: unseen accuracy=0.6634
[55/120] Loss_D: -13.5534 Loss_G: 0.3988, Wasserstein_dist:17.2773, vae_loss_seen:351.3984 ZSL: unseen accuracy=0.6623
[56/120] Loss_D: -16.2916 Loss_G: 0.4029, Wasserstein_dist:20.4739, vae_loss_seen:388.2424 ZSL: unseen accuracy=0.6694
[57/120] Loss_D: -11.0350 Loss_G: 0.1916, Wasserstein_dist:16.9984, vae_loss_seen:356.0585 ZSL: unseen accuracy=0.6722
[58/120] Loss_D: -12.2267 Loss_G: 0.2432, Wasserstein_dist:16.5049, vae_loss_seen:347.4862 ZSL: unseen accuracy=0.6670
[59/120] Loss_D: -11.7591 Loss_G: 0.1562, Wasserstein_dist:16.0517, vae_loss_seen:341.9133 ZSL: unseen accuracy=0.6698
[60/120] Loss_D: -11.7878 Loss_G: 0.0945, Wasserstein_dist:16.0950, vae_loss_seen:347.2085 ZSL: unseen accuracy=0.6726
[61/120] Loss_D: -12.0919 Loss_G: -0.0465, Wasserstein_dist:16.5042, vae_loss_seen:343.2227 ZSL: unseen accuracy=0.6747
[62/120] Loss_D: -12.5338 Loss_G: 0.0532, Wasserstein_dist:16.9401, vae_loss_seen:357.8211 ZSL: unseen accuracy=0.6688
[63/120] Loss_D: -13.2174 Loss_G: -0.0086, Wasserstein_dist:18.1508, vae_loss_seen:361.2040 ZSL: unseen accuracy=0.6725
[64/120] Loss_D: -10.3915 Loss_G: 0.0809, Wasserstein_dist:14.7022, vae_loss_seen:331.4780 ZSL: unseen accuracy=0.6774
[65/120] Loss_D: -13.3581 Loss_G: -0.0096, Wasserstein_dist:17.6856, vae_loss_seen:352.0486 ZSL: unseen accuracy=0.6758
[66/120] Loss_D: -13.9676 Loss_G: -0.0516, Wasserstein_dist:17.7229, vae_loss_seen:364.4155 ZSL: unseen accuracy=0.6851
[67/120] Loss_D: -13.0893 Loss_G: -0.1197, Wasserstein_dist:16.4608, vae_loss_seen:348.7996 ZSL: unseen accuracy=0.6728
[68/120] Loss_D: -11.5536 Loss_G: -0.1619, Wasserstein_dist:16.5234, vae_loss_seen:362.5963 ZSL: unseen accuracy=0.6729
[69/120] Loss_D: -10.9419 Loss_G: -0.2287, Wasserstein_dist:15.2413, vae_loss_seen:348.5123 ZSL: unseen accuracy=0.6761
[70/120] Loss_D: -13.8686 Loss_G: -0.1638, Wasserstein_dist:17.5385, vae_loss_seen:371.5560 ZSL: unseen accuracy=0.6762
[71/120] Loss_D: -11.9027 Loss_G: -0.2468, Wasserstein_dist:16.2225, vae_loss_seen:353.1340 ZSL: unseen accuracy=0.6754
[72/120] Loss_D: -12.2049 Loss_G: -0.0970, Wasserstein_dist:16.0503, vae_loss_seen:347.4665 ZSL: unseen accuracy=0.6700
[73/120] Loss_D: -10.3142 Loss_G: -0.2448, Wasserstein_dist:14.6234, vae_loss_seen:339.0842 ZSL: unseen accuracy=0.6762
[74/120] Loss_D: -13.8482 Loss_G: -0.2471, Wasserstein_dist:17.5982, vae_loss_seen:363.6411 ZSL: unseen accuracy=0.6748
[75/120] Loss_D: -13.6796 Loss_G: -0.2883, Wasserstein_dist:17.7744, vae_loss_seen:361.7907 ZSL: unseen accuracy=0.6670
[76/120] Loss_D: -10.3488 Loss_G: -0.1847, Wasserstein_dist:14.3110, vae_loss_seen:344.1082 ZSL: unseen accuracy=0.6832
[77/120] Loss_D: -11.7748 Loss_G: -0.3006, Wasserstein_dist:15.3507, vae_loss_seen:336.7664 ZSL: unseen accuracy=0.6877
[78/120] Loss_D: -11.2038 Loss_G: -0.4176, Wasserstein_dist:15.4924, vae_loss_seen:347.2991 ZSL: unseen accuracy=0.6949
[79/120] Loss_D: -9.6939 Loss_G: -0.3472, Wasserstein_dist:14.5233, vae_loss_seen:337.1395 ZSL: unseen accuracy=0.6789
[80/120] Loss_D: -11.5880 Loss_G: -0.2479, Wasserstein_dist:15.9844, vae_loss_seen:346.7844 ZSL: unseen accuracy=0.6748
[81/120] Loss_D: -13.7347 Loss_G: -0.3050, Wasserstein_dist:17.4421, vae_loss_seen:374.5667 ZSL: unseen accuracy=0.6846
[82/120] Loss_D: -11.2488 Loss_G: -0.2365, Wasserstein_dist:15.2620, vae_loss_seen:343.5577 ZSL: unseen accuracy=0.6724
[83/120] Loss_D: -12.0944 Loss_G: -0.3197, Wasserstein_dist:16.5121, vae_loss_seen:367.4224 ZSL: unseen accuracy=0.6800
[84/120] Loss_D: -11.3483 Loss_G: -0.2923, Wasserstein_dist:15.3394, vae_loss_seen:353.5658 ZSL: unseen accuracy=0.6878
[85/120] Loss_D: -11.9544 Loss_G: -0.3463, Wasserstein_dist:15.9010, vae_loss_seen:358.6190 ZSL: unseen accuracy=0.6802
[86/120] Loss_D: -12.2786 Loss_G: -0.2794, Wasserstein_dist:16.3081, vae_loss_seen:366.2616 ZSL: unseen accuracy=0.6766
[87/120] Loss_D: -11.3531 Loss_G: -0.2205, Wasserstein_dist:15.4377, vae_loss_seen:349.5160 ZSL: unseen accuracy=0.6814
[88/120] Loss_D: -10.5067 Loss_G: -0.2649, Wasserstein_dist:14.9118, vae_loss_seen:341.2050 ZSL: unseen accuracy=0.6952
[89/120] Loss_D: -12.3879 Loss_G: -0.3045, Wasserstein_dist:16.0138, vae_loss_seen:364.4077 ZSL: unseen accuracy=0.6873
[90/120] Loss_D: -10.6881 Loss_G: -0.2168, Wasserstein_dist:13.9609, vae_loss_seen:345.1350 ZSL: unseen accuracy=0.6704
[91/120] Loss_D: -11.4577 Loss_G: -0.1902, Wasserstein_dist:15.6705, vae_loss_seen:360.9030 ZSL: unseen accuracy=0.6856
[92/120] Loss_D: -11.8921 Loss_G: -0.2537, Wasserstein_dist:15.5879, vae_loss_seen:357.1034 ZSL: unseen accuracy=0.6819
[93/120] Loss_D: -11.6485 Loss_G: -0.2178, Wasserstein_dist:15.3665, vae_loss_seen:354.0023 ZSL: unseen accuracy=0.6809
[94/120] Loss_D: -11.4810 Loss_G: -0.2585, Wasserstein_dist:15.8418, vae_loss_seen:360.4694 ZSL: unseen accuracy=0.6755
[95/120] Loss_D: -9.8617 Loss_G: -0.2793, Wasserstein_dist:14.3561, vae_loss_seen:346.4721 ZSL: unseen accuracy=0.6857
[96/120] Loss_D: -12.4210 Loss_G: -0.2584, Wasserstein_dist:16.1214, vae_loss_seen:367.1083 ZSL: unseen accuracy=0.6731
[97/120] Loss_D: -11.6503 Loss_G: -0.1654, Wasserstein_dist:15.6967, vae_loss_seen:362.2595 ZSL: unseen accuracy=0.6830
[98/120] Loss_D: -12.0017 Loss_G: -0.1422, Wasserstein_dist:15.0335, vae_loss_seen:350.0283 ZSL: unseen accuracy=0.6730
[99/120] Loss_D: -11.2206 Loss_G: -0.2038, Wasserstein_dist:14.5147, vae_loss_seen:353.3007 ZSL: unseen accuracy=0.6735
[100/120] Loss_D: -11.2889 Loss_G: -0.2190, Wasserstein_dist:14.7292, vae_loss_seen:355.7301 ZSL: unseen accuracy=0.6803
[101/120] Loss_D: -12.2800 Loss_G: -0.2927, Wasserstein_dist:15.6958, vae_loss_seen:368.2365 ZSL: unseen accuracy=0.6958
[102/120] Loss_D: -12.0904 Loss_G: -0.2620, Wasserstein_dist:16.0737, vae_loss_seen:363.4147 ZSL: unseen accuracy=0.6927
[103/120] Loss_D: -9.5963 Loss_G: -0.0818, Wasserstein_dist:13.1985, vae_loss_seen:335.8094 ZSL: unseen accuracy=0.6851
[104/120] Loss_D: -10.6255 Loss_G: -0.2326, Wasserstein_dist:14.2097, vae_loss_seen:351.2049 ZSL: unseen accuracy=0.6789
[105/120] Loss_D: -12.3060 Loss_G: -0.2093, Wasserstein_dist:15.1102, vae_loss_seen:357.3372 ZSL: unseen accuracy=0.6776
[106/120] Loss_D: -10.6729 Loss_G: -0.2010, Wasserstein_dist:14.3125, vae_loss_seen:343.7611 ZSL: unseen accuracy=0.6843
[107/120] Loss_D: -11.3574 Loss_G: -0.2230, Wasserstein_dist:15.0471, vae_loss_seen:350.5015 ZSL: unseen accuracy=0.6822
[108/120] Loss_D: -11.0685 Loss_G: -0.1947, Wasserstein_dist:14.6980, vae_loss_seen:364.0989 ZSL: unseen accuracy=0.6734
[109/120] Loss_D: -9.7010 Loss_G: -0.2283, Wasserstein_dist:13.7200, vae_loss_seen:351.6648 ZSL: unseen accuracy=0.6829
[110/120] Loss_D: -10.8478 Loss_G: -0.2382, Wasserstein_dist:14.5784, vae_loss_seen:342.6785 ZSL: unseen accuracy=0.6973
[111/120] Loss_D: -12.1049 Loss_G: -0.1735, Wasserstein_dist:15.5099, vae_loss_seen:378.7214 ZSL: unseen accuracy=0.6796
[112/120] Loss_D: -10.3194 Loss_G: -0.1183, Wasserstein_dist:13.8210, vae_loss_seen:343.2414 ZSL: unseen accuracy=0.6951
[113/120] Loss_D: -10.0676 Loss_G: -0.1055, Wasserstein_dist:13.3668, vae_loss_seen:352.7232 ZSL: unseen accuracy=0.6800
[114/120] Loss_D: -11.1200 Loss_G: -0.2201, Wasserstein_dist:14.6772, vae_loss_seen:356.9974 ZSL: unseen accuracy=0.6913
[115/120] Loss_D: -9.4396 Loss_G: -0.1900, Wasserstein_dist:12.8698, vae_loss_seen:352.2440 ZSL: unseen accuracy=0.6762
[116/120] Loss_D: -9.9157 Loss_G: -0.0919, Wasserstein_dist:13.9452, vae_loss_seen:368.9888 ZSL: unseen accuracy=0.6837
[117/120] Loss_D: -10.0496 Loss_G: -0.0448, Wasserstein_dist:13.3463, vae_loss_seen:340.1138 ZSL: unseen accuracy=0.6783
[118/120] Loss_D: -9.8089 Loss_G: -0.0601, Wasserstein_dist:13.3715, vae_loss_seen:356.6316 ZSL: unseen accuracy=0.6865
[119/120] Loss_D: -9.8530 Loss_G: -0.0254, Wasserstein_dist:13.4506, vae_loss_seen:336.4763 ZSL: unseen accuracy=0.6742
Dataset AWA2
the best ZSL unseen accuracy is tensor(0.6973)

classify

Hello,
I have run your code, but I want to load the trained model for classification after training. How can I achieve that?

Question about training on action recognition

Hi, thanks for your excellent work!

However, when I try to train on the UCF101 or HMDB51 actions following your README, I run into an error. The main one is as below:

When it comes to sample() in train_actions.py, I found that the shape of batch_att is (101, 300), and after sampling the shape is (64, 300). But the shape of input_att should be (64, 115), so the code cannot do the copy().
P.S.: I used the att_splits.mat from your Google Drive.

Looking forward to your reply!

UCF 101 Attributes

Hi,

I wish to create attribute vectors for UCF101. Are the attribute vectors simply binary? Is there some normalization performed on the binary attribute vectors? If you have the creation script for the .mat files (the ones loaded by the dataloader), can you please share it?

Thanks in advance !
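
(Not an answer from the authors: as a point of reference, a hypothetical sketch of building an attribute .mat file, assuming binary per-class attribute vectors that are L2-normalized per class. The dimensions, key names and layout are assumptions to verify against the provided att_splits.mat.)

    import numpy as np
    import scipy.io as io

    num_classes, num_attributes = 101, 115                       # assumed sizes
    binary_att = np.random.randint(0, 2, (num_classes, num_attributes)).astype(np.float64)

    # L2-normalize each class vector so every class embedding has unit norm.
    att = binary_att / np.maximum(np.linalg.norm(binary_att, axis=1, keepdims=True), 1e-12)

    # Assumed layout: attributes x classes, with the raw vectors kept alongside.
    io.savemat('ucf101_manual_att.mat', {'att': att.T, 'original_att': binary_att.T})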

Model selection criteria

Hi @akshitac8 , congratulations on your work! Could you elaborate a bit on the model selection criteria? What losses do you pay attention to, and what behavior do you look for?

FLO dataset

Hi,
First of all, thank you for your work, which will greatly help us continue to explore zero-shot learning. However, at present, I can't find the FLO dataset (res101 features and splits). Would you share it with me? Thank you very much again!

video dataset

Hi! This is amazing work!
I want to reproduce the video action classification, but how can I get the UCF101 data, such as the feature file and the attribute file?
I see "ucf101_manual_att.npy" in this code; can you share it?

regularization parameter?

I am very interested in this paper and plan to cite it in my own paper. But I have a question: why do some terms of the total loss function in Eq. (4) not have a regularization parameter?

Cannot run zero-shot-actions

Hi Akshita,

I cannot run the zero-shot-actions task. Some configurations are not clear in the code. For example, opt.zsl_dec and opt.use_mult_rep are not defined in config.py. Also, I am confused about these two files: classifier.py and classifier_entropy.py; some variables are not defined there either. Can you check and update this repo?

Many thanks.

'Variable' object has no attribute 'getLayersOutDet'

In the generate_syn_feature function:

    if netF is not None:
        # dec_out = netDec(fake) # only to call the forward function of decoder
        dec_hidden_feat = netDec(fake).getLayersOutDet() # no detach layers
        feedback_out = netF(dec_hidden_feat)
        fake = generator(syn_noisev, a1=opt.a2, c=syn_attv, feedback_layers=feedback_out)

there is an attribute error on netDec(fake).getLayersOutDet():

Traceback (most recent call last):
  File "train_images.py", line 270, in <module>
    syn_feature, syn_label = generate_syn_feature(netG, data.unseenclasses, data.attribute, opt.syn_num, netF=netF, netDec=netDec)
  File "train_images.py", line 101, in generate_syn_feature
    dec_hidden_feat = netDec(fake).getLayersOutDet()#netDec(fake).getLayersOutDet() #no detach layers
  File "/home/cemil/miniconda3/envs/tfvaegan/lib/python3.6/site-packages/torch/autograd/variable.py", line 67, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Variable' object has no attribute 'getLayersOutDet'

Are you getting the same error?

Thank you very much.
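
(A guess, not an authoritative fix: the commented-out line above suggests the decoder's forward pass is only needed for its side effect, so getLayersOutDet is presumably a method of the decoder module rather than of its output. Under that assumption, the block would become:)

    if netF is not None:
        _ = netDec(fake)                             # forward pass only, to populate the decoder's hidden activations
        dec_hidden_feat = netDec.getLayersOutDet()   # assumed: method of the module, not of its output
        feedback_out = netF(dec_hidden_feat)
        fake = generator(syn_noisev, a1=opt.a2, c=syn_attv, feedback_layers=feedback_out)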

UCF101 and HMDB51 data splits

Hi Akshita,

Thanks for sharing your codes.

Would you mind sharing more .mat files? For example, ucf101 and hmdb51 dataset splits that include 'traing_loc' and 'test_seen_loc' and so on.

Thanks in advance.

role of parameter

Hi! What is the role of encoded_noise? In your script files, encoded_noise is False.

Question on training custom dataset

Hi, if I want to train on my custom dataset with tf-vaegan, what should I prepare?
In util.py, lines 34 and 37 read the image_embedding and class_embedding files (.mat format), respectively.
How can I generate these .mat files so that I can train on my own dataset? Thanks
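
(Illustrative sketch only: this assumes util.py expects GBU-style files, i.e. a res101.mat holding image embeddings and an att_splits.mat holding class embeddings plus split indices. The key names, shapes and 1-based indexing below are assumptions to verify against util.py.)

    import numpy as np
    import scipy.io as io

    num_images, feat_dim = 1000, 2048
    num_classes, att_dim = 20, 85

    # Image embeddings: one ResNet-101 feature per image, plus integer class labels.
    io.savemat('res101.mat', {
        'features': np.zeros((feat_dim, num_images)),            # assumed layout: feat_dim x num_images
        'labels': np.random.randint(1, num_classes + 1, num_images),
        'image_files': np.array(['img_%04d.jpg' % i for i in range(num_images)], dtype=object),
    })

    # Class embeddings and split indices (assumed 1-based positions into the feature matrix).
    io.savemat('att_splits.mat', {
        'att': np.zeros((att_dim, num_classes)),                 # assumed layout: att_dim x num_classes
        'trainval_loc': np.arange(1, 801),
        'test_seen_loc': np.arange(801, 901),
        'test_unseen_loc': np.arange(901, 1001),
    })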
