Dear @1Konny,
Thanks for your implementation!
I noticed that line 168 in `gradcam.py`:
```python
alpha_denom = gradients.pow(2).mul(2) + \
              activations.mul(gradients.pow(3)).view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
```
should be:
```python
global_sum = activations.view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
alpha_denom = gradients.pow(2).mul(2) + global_sum.mul(gradients.pow(3))
```
This follows from Eq. 19 in the paper [@adityac94]: you first need to compute the sum over all positions `{a, b}` for each activation map `k`. That gives one weight per activation map `k`, which you then use as a multiplier for every gradient at `{i, j}` within the same map `k`.
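For reference, Eq. 19 (quoting it from memory, so please double-check against the paper) is:

$$
\alpha_{ij}^{kc} = \frac{\dfrac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}}{2\,\dfrac{\partial^2 Y^c}{(\partial A_{ij}^k)^2} + \sum_{a}\sum_{b} A_{ab}^k \,\dfrac{\partial^3 Y^c}{(\partial A_{ij}^k)^3}}
$$

Note that the sum $\sum_{a}\sum_{b} A_{ab}^k$ in the denominator does not involve the derivative at `{i, j}` at all, so it should be computed once per map `k` before multiplying by `gradients.pow(3)`.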
In your implementation, you first multiply each activation `A_{a,b}` by the gradient at the same position `{i, j}` of map `k`, and only then sum over `{i, j}`. This mixes the indices `{a, b}` and `{i, j}`, even though they are independent.
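To see the difference concretely, here is a minimal self-contained sketch (the tensor shapes `b, k, u, v` are made up for the example) comparing the two denominators:

```python
import torch

torch.manual_seed(0)
b, k, u, v = 1, 2, 3, 3  # hypothetical tiny shapes, just for illustration
activations = torch.rand(b, k, u, v)
gradients = torch.rand(b, k, u, v)

# Current version: multiplies A_{a,b} by the gradient at the same position
# before summing, so the sum couples {a, b} with {i, j}.
mixed = activations.mul(gradients.pow(3)).view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
alpha_denom_old = gradients.pow(2).mul(2) + mixed

# Fixed version: sum A^k_{a,b} over all positions first, then use that
# per-map scalar to weight gradients.pow(3) at every {i, j}.
global_sum = activations.view(b, k, u*v).sum(-1, keepdim=True).view(b, k, 1, 1)
alpha_denom_new = gradients.pow(2).mul(2) + global_sum.mul(gradients.pow(3))

print(torch.allclose(alpha_denom_old, alpha_denom_new))  # False: the two denominators differ
```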
I hope it's clear :)
PS: I have fixed this in `gradcam.py` and added a flag in `example.ipynb` that automatically detects whether CUDA can be used and falls back to CPU otherwise. I will open a pull request.
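The flag boils down to something like this (a sketch; the actual variable names in the notebook may differ):

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to CPU.
use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')
```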
Thanks for your time!