An implementation of http://openaccess.thecvf.com/content_CVPRW_2019/papers/Sight%20and%20Sound/Konstantinos_Vougioukas_End-to-End_Speech-Driven_Realistic_Facial_Animation_with_Temporal_GANs_CVPRW_2019_paper.pdf
Thanks for sharing! While reading the code, I had a question: in identity_encoder, why is the number of channels in the convolutional layers always 50? If the feature map shrinks at each layer while the number of output channels stays the same, won't image feature information be lost?
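To make the concern concrete, here is a rough sketch that counts how many values each layer's output holds. It is not the repository's code: the 64x64 input, the four layers, the stride of 2, and the helper name `activation_counts` are all assumptions chosen for illustration. Only the constant 50 channels comes from identity_encoder.

```python
def activation_counts(height, width, channels_per_layer, stride=2):
    """Return the number of values in each layer's output feature map.

    Hypothetical helper: assumes every layer is a strided convolution
    whose padding makes it divide each spatial side by `stride`.
    """
    counts = []
    for channels in channels_per_layer:
        # Each layer shrinks the spatial size (assumed halving per side).
        height, width = height // stride, width // stride
        counts.append(channels * height * width)
    return counts


# Assumed 64x64 input and four layers; the real encoder may differ.
constant = activation_counts(64, 64, [50, 50, 50, 50])      # channels fixed at 50
doubling = activation_counts(64, 64, [50, 100, 200, 400])   # common doubling scheme

print("constant 50 channels:", constant)
print("doubling channels:   ", doubling)
```

Under these assumptions, the constant-channel encoder keeps a quarter of the previous layer's values at each step, while the doubling scheme keeps half. This only counts representational capacity; it does not by itself show whether useful information is lost.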