Implementation of Bayesian NNs in PyTorch (https://arxiv.org/pdf/1703.02910.pdf) (with some help from https://github.com/Riashat/Deep-Bayesian-Active-Learning/)
Hi, in the paper there is a proof that as T goes to infinity, the estimate of the conditional mutual information converges to the true value of the mutual information between the output y and the model parameters w. I wonder why this convergence is necessary. If I can derive a quantity that is only proportional to the conditional mutual information, can I still use it to measure uncertainty in the BALD sense? Why or why not?
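For context, here is a minimal sketch of the finite-T BALD estimate the question refers to: the mutual information between y and w is approximated as the entropy of the mean predictive distribution minus the mean entropy across T stochastic (MC-dropout) forward passes. The function name `bald_score` and the toy tensor shapes are my own; only the formula comes from the paper.

```python
import torch

def bald_score(probs: torch.Tensor) -> torch.Tensor:
    """BALD acquisition from T stochastic forward passes.

    probs: tensor of shape (T, N, C) holding softmax outputs of
    T MC-dropout passes over N inputs with C classes.
    Returns a tensor of shape (N,) with the estimated mutual
    information I[y; w | x] for each input.
    """
    eps = 1e-12  # guard against log(0)
    mean_p = probs.mean(dim=0)  # predictive distribution, (N, C)
    # H[ E_w p(y|x,w) ]: entropy of the averaged prediction
    entropy_of_mean = -(mean_p * (mean_p + eps).log()).sum(dim=-1)
    # E_w H[ p(y|x,w) ]: average entropy of each pass
    mean_entropy = -(probs * (probs + eps).log()).sum(dim=-1).mean(dim=0)
    return entropy_of_mean - mean_entropy

# Toy usage with random logits (T=20 passes, N=4 inputs, C=3 classes)
T, N, C = 20, 4, 3
probs = torch.randn(T, N, C).softmax(dim=-1)
scores = bald_score(probs)  # shape (4,), nonnegative by concavity of entropy
```

Note that the scores are nonnegative by Jensen's inequality even for finite T, so a quantity merely proportional to this would preserve the *ranking* of points to acquire; whether the proportionality constant matters depends on whether you need a calibrated information value or only an argmax.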
Thanks!