
cam's People

Contributors

ajschumacher, zhoubolei


cam's Issues

Pretrained VGG16CAM Model

Thanks for your effort. Could you please share the pretrained VGG16CAM model?

I have tried to convert the .caffemodel to a Torch model according to the prototxt, but the accuracy is much lower than it should be. Would you share the pretrained VGG16CAM model with me? Or is there anything I should pay special attention to?

Thanks.

CAM for CNN regression networks?

I mainly tackle regression problems with CNNs, and I want a reliable method for computing heatmaps of the network's results. However, I find that almost all interpretation methods, including CAM, target classification networks rather than regression networks. Is there any interpretation method suitable for CNNs that perform regression tasks?
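One hedged possibility (an assumption on my part, not something this repository provides): if the regression head is global average pooling followed by a single linear unit, the CAM formulation carries over directly, because the scalar output is a weighted sum of the pooled feature maps. A minimal NumPy sketch:

import numpy as np

def regression_cam(feature_conv, weight_out):
    # feature_conv: (nc, h, w) activations of the last conv layer
    # weight_out:   (nc,) weights of the single regression output unit
    # The GAP of the returned map equals the regression output (minus bias).
    nc, h, w = feature_conv.shape
    cam = weight_out.dot(feature_conv.reshape(nc, h * w))
    return cam.reshape(h, w)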

Other models like VGGNet for CAM?

The current repository includes the train_val file and solver for the GoogLeNet CAM model. Have you tried other models, such as VGGNet or ResNet, for CAM?

how to train the model?

I have my own dataset, and I want to train the model on it. How should I do that?

Regarding fine-tuning.

In the paper you say that you fine-tune the model. Do you fine-tune the whole network, or only the part where you introduced the 3×3, 1024-unit convolutions followed by global average pooling and softmax? We have very limited compute, so this detail could save us from wasting time and let us concentrate on other tasks.

Thank you.
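For reference, the head described in the paper (3×3 convolutions with 1024 units, then global average pooling, then a softmax classifier) can be written compactly. Below is a minimal PyTorch sketch under that reading of the paper; it is an illustration, not code from this repository:

import torch
import torch.nn as nn

class CAMHead(nn.Module):
    # hypothetical module: 3x3 conv (1024 units) -> GAP -> linear classifier
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1024, kernel_size=3, padding=1)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.gap(x).flatten(1)   # (N, 1024)
        return self.fc(x)            # logits; apply softmax outside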

Fine tuning

How can I fine-tune your model?
I don't have enough data to retrain your model from scratch. I want to fine-tune it on my own data, which has only two classes.
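A common recipe for this situation (a sketch assuming a torchvision ResNet-18 backbone, not the repository's Caffe models) is to replace the final classifier with a two-class layer and fine-tune only that head; since ResNet already ends in global average pooling, CAM still applies afterwards:

import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)  # new two-class head

for p in model.parameters():       # freeze the whole backbone ...
    p.requires_grad = False
for p in model.fc.parameters():    # ... then unfreeze only the new head
    p.requires_grad = True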

How to train CAM?

If I want to train a CAM model on medical images (3 classes), which loss do I need?
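If the task is single-label 3-class classification (an assumption about your setup, not a repository recommendation), the standard choice in PyTorch would be cross-entropy:

import torch.nn as nn

# expects raw logits of shape (N, 3) and integer class labels of shape (N,)
criterion = nn.CrossEntropyLoss()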

There seems to be an error in mergeTenCrop.m

As far as I can see, the heatmap for each testing image should be 224×224. So in mergeTenCrop.m, the input, i.e. CAMmap_crops, is expected to be 224×224×10, not 256×256×10. And the size of cropImgSet should also be changed to 224×224×3×10.

one bounding box for one CAM in the demo code

In the demo code for generating bounding boxes from a CAM, the code seems to generate multiple boxes per map. How can I modify it so that each map generates only one box for localization?
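One way to get a single box (a sketch using OpenCV, not the repository's dt_box code) is to threshold the CAM at 20% of its maximum, as in the paper, and keep only the largest connected component:

import cv2
import numpy as np

def single_bbox(cam, thresh_ratio=0.2):
    # threshold at 20% of the max value, as described in the paper
    mask = (cam >= thresh_ratio * cam.max()).astype(np.uint8)
    # keep only the largest connected component
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y, w, h = stats[largest, :4]
    return x, y, x + w, y + h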

Doubt about using pytorch_CAM.py

Hi,
Actually I am new to this field, and I am checking out the possibility of using CAM for weakly supervised object localization.

As far as I understand, if I have trained a classifier (say, to classify 3 objects) with a network architecture such as ResNet, DenseNet, or SqueezeNet that uses global average pooling at the end, I can apply the pytorch_CAM.py script in this repository to generate heatmaps for those 3 classes.

It would be great if anyone could correct me if I am wrong.

Thanks
rahul

Are the values across heatmap planes comparable?

For example, I have 2 classes, and for a single image I get a heatmap of shape [H, W, N_CLASSES]. I train my model with sigmoid + binary cross-entropy. At prediction time, when I use a larger image as the network input, I want the classes to be mutually exclusive at each pixel, so I compare heatmap values with np.argmax to get the 'best' class. My question is: are the values in the heatmaps really comparable?

I tried to dump the min and max values of the heatmap for a single image:

i 0
np.min(heatmap[:,:,i]), np.max(heatmap[:,:,i]) -> -38.4533 19.9384
i 1
np.min(heatmap[:,:,i]), np.max(heatmap[:,:,i]) -> -20.2977 34.8101

As you can see, the value ranges differ and the heatmaps are not normalized.

Is there a way to normalize all planes of the heatmap to the [0, 1] range and make them comparable?
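For display, the repository's pytorch_CAM.py min-max normalizes each map independently; note that this puts every plane in [0, 1] but does not make planes comparable across classes. For a per-pixel 'best class' you can instead apply a softmax across the class axis. A minimal sketch of both, assuming heatmap has shape [H, W, N_CLASSES]:

import numpy as np

def minmax_per_plane(heatmap):
    # scale each class plane to [0, 1] independently (visualization only)
    mn = heatmap.min(axis=(0, 1), keepdims=True)
    mx = heatmap.max(axis=(0, 1), keepdims=True)
    return (heatmap - mn) / (mx - mn + 1e-8)

def pixelwise_softmax(heatmap):
    # make values comparable at each pixel via a softmax over classes
    e = np.exp(heatmap - heatmap.max(axis=2, keepdims=True))
    return e / e.sum(axis=2, keepdims=True)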

CAM to activate multiple tags

Hello, I want to use the class activation map (CAM) with multiple tags. With the trained model, the CAMs for different targets on a remote sensing image look almost the same, and everything is activated. Can you help me solve this puzzle? Below are a remote sensing image (from the dataset used to train the model) and the CAMs for two different targets.

[images: remote sensing input, CAM_1, CAM_2]

misclaim of CAM

I am the developer of CAM. Recently I found a blog article (https://thehive.ai/blog/inside-a-neural-networks-mind) that introduces CAM and Grad-CAM. The article's overview of CAM and Grad-CAM is good, but it contains a biased, misleading claim about CAM compared to Grad-CAM. This wrong claim has been around for a while, so I would like to clarify it below:

First of all, nowadays all the mainstream network architectures, such as ResNet, DenseNet, and SqueezeNet, use global average pooling at the end, so the class activation map (the heatmap) can be generated directly with CAM, without modifying the network architecture at all. The claim that Grad-CAM is superior to CAM because Grad-CAM does not require modifying the architecture is therefore false.

Meanwhile, if you are using ResNet, DenseNet, SqueezeNet, or any modern network, you can generate the heatmap with CAM directly (see the example code at https://github.com/metalbubble/CAM/blob/master/pytorch_CAM.py), without the extra step of computing gradients as in Grad-CAM. You thereby skip the backward pass, saving almost half of the computation. This is crucial in applications such as video processing, where CAM can use the forward pass alone to produce both the prediction and the heatmap for each frame. In the code associated with that blog (https://github.com/hiveml/tensorflow-grad-cam), they already use ResNet but still compute gradients to generate the CAM, which simply wastes computation.
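For reference, the forward-only computation looks roughly like this (a condensed sketch in the spirit of pytorch_CAM.py; the hook point layer4 and the random input are stand-ins, assuming a torchvision ResNet-18):

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(pretrained=True).eval()
features = []
model.layer4.register_forward_hook(lambda m, i, o: features.append(o.detach()))
weight_softmax = model.fc.weight.data.numpy()       # (num_classes, 512)

img = torch.randn(1, 3, 224, 224)                   # stand-in for a preprocessed image
logit = model(img)                                  # a single forward pass
idx = F.softmax(logit, dim=1).argmax().item()       # predicted class

fmap = features[0][0].numpy()                       # (512, 7, 7)
nc, h, w = fmap.shape
cam = weight_softmax[idx].dot(fmap.reshape(nc, h * w)).reshape(h, w)
cam = (cam - cam.min()) / (cam.max() - cam.min())   # scale to [0, 1] for display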

Undefined function 'loadHeatMap'

Hi, when I run ILSVRC_evaluate_bbox.m to evaluate the bounding boxes, I get the following error:
"Undefined function 'loadHeatMap' for input arguments of type 'char'."

The function loadHeatMap used in ILSVRC_evaluate_bbox.m (line 73) is undefined. Could you please update this script? Thank you!

Maybe a mistake in mergeTenCrop.m

Thanks for providing the source code; it helps me a lot.

I found a mistake in mergeTenCrop.m.

In lines 25 and 26:

alignImgSet(i:i+cropSize-1, j:j+cropSize-1,:,curr) = curCrop1;
alignImgSet(i:i+cropSize-1, j:j+cropSize-1,:, curr+5) = curCrop2;

they should perhaps be modified to:

alignImgSet(j:j+cropSize-1, i:i+cropSize-1,:,curr) = curCrop1;
alignImgSet(j:j+cropSize-1, i:i+cropSize-1,:, curr+5) = curCrop2;

Could you check whether this is right?

1. Dimension of weight_softmax[idx] vs. feature channels; 2. Upsampling (direct resize)?

I came across this interesting work rather late.

Here are my small doubts.

cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h*w)))

  1. The dimension of weight_softmax[idx] should be 512. However, for layer4's nc it should be 256. Is there a mistake here? In other words, I suspect that CAM can only be applied to the last layer, so that the dimension of 512 matches.
  2. Is there a better process for upsampling the final class activation map? The direct resize feels a bit rough (see the sketch after this list).
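On point 2: pytorch_CAM.py upsamples with a plain cv2.resize; a smoother yet still simple alternative (a sketch, with a random array standing in for a real 7×7 CAM) is bicubic interpolation:

import numpy as np
import torch
import torch.nn.functional as F

cam = np.random.rand(7, 7).astype(np.float32)   # stand-in for a real CAM
cam_t = torch.from_numpy(cam)[None, None]       # shape (1, 1, 7, 7)
cam_up = F.interpolate(cam_t, size=(224, 224), mode='bicubic',
                       align_corners=False)[0, 0].numpy()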

What is the name of the upsampling technique used in this code?

Hello, I read the code and the paper "Learning Deep Features for Discriminative Localization". Now I'm trying to apply this method to a case with fewer class labels (only 2). Everything was fine until the upsampling part. However, in the upsampling part (maybe mergeTenCrop.m does this), I can't work out which technique you used, so I'm now searching Google and looking for papers about it; it would help if you could share some information about the technique.

In sum, can you tell me the name of the technique you used in mergeTenCrop?

Thank you for reading.

The label URL seems to be broken

Hi, the label URL seems to be broken: visiting the address gives an error and the JSON file cannot be downloaded. How can I solve this?

CAM on market1501...

Are there any ways to use CAM on re-id datasets, e.g. market1501? The tiny images cannot show the great performance of CAM.

Prepare_image

Does VGGNet use the same data preparation (prepare_image.m) as GoogLeNet?

How can I use dt_box and run generate_bbox.m?

I am not familiar with C++. I want to run your code generate_bbox.m, but when it reaches the line
system(['bboxgenerator/./dt_box ' curHeatMapFile ' ' curParaThreshold ' ' curBBoxFile]);
some errors occur; e.g., generate_bbox cannot be used.

I have some questions:

  • Where can I obtain generate_bbox? Is it a .dll file or some other kind of file?
  • How can I compile dt_box? In fact, I created a new project, added the files dt.c, dt.h, and dt_box.cpp, and obtained a dt_box.exe, but it cannot be used.

Can you give me some help? Thanks so much!

A question about the classification results

I have a question about the classification results in the paper:

For classification, why do you compare your GAP network with NIN? NIN also has a GAP layer, so what can be concluded when the GAP network performs better than NIN?

I am really a little confused about this, so could you kindly give me some explanation? Thanks!

AttributeError: 'tuple' object has no attribute 'softmax'

Traceback (most recent call last):
File "CAM3.py", line 69, in
h_x = F.softmax(logit, dim=1).data.squeeze()
File "/home/omnisky/anaconda2/envs/swin/lib/python3.7/site-packages/torch/nn/functional.py", line 1512, in softmax
ret = input.softmax(dim)
AttributeError: 'tuple' object has no attribute 'softmax'

I encountered the above problem when I used pytorch_CAM.py with my own model. Do you know how to solve it? Thank you very much, and I look forward to your reply @Bolei Zhou
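A likely cause (an assumption, since the model here is custom): the model's forward returns a tuple, for example a main output plus an auxiliary one, so F.softmax receives a tuple instead of a tensor. Unpacking the main output before the softmax usually fixes it; a sketch in the context of pytorch_CAM.py, where model and img_variable are already defined:

import torch.nn.functional as F

logit = model(img_variable)
if isinstance(logit, tuple):   # e.g. models with auxiliary heads
    logit = logit[0]           # keep only the main logits
h_x = F.softmax(logit, dim=1).data.squeeze()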

CAM for binary class

For example, the task is to classify dog vs. not-dog, and we label dog as class 1.
When a dog image comes in, the class-1 map focuses on everything except the dog;
on the contrary, the class-0 map focuses right on the dog.

Visualize intermediate feature map?

What if I want to visualize an intermediate feature map whose channel count differs from the last layer's? For example, layer3 of resnet18 in PyTorch.
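One caveat: the CAM weights come from the final fc layer, so their dimension matches only the last conv block; for an intermediate layer you need either a gradient-based method such as Grad-CAM or a class-agnostic view, e.g. the channel-wise mean. A sketch of the latter with a forward hook (assuming torchvision's resnet18 and a random stand-in input):

import torch
from torchvision import models

model = models.resnet18(pretrained=True).eval()
feats = {}
model.layer3.register_forward_hook(
    lambda m, i, o: feats.update(layer3=o.detach()))

img = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed image
model(img)
heat = feats['layer3'][0].mean(dim=0)    # (14, 14) class-agnostic map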

Applying threshold on negative weight values or not?

Hello,

I saw that in the Places365 CAM code a threshold is applied to the softmax-layer weights, while in this repository and in the original paper there is no mention of this trick.

Is there an explanation for this?

Thanks!

minor bug in pytorch code?

I am using the PyTorch script available in your repository. Currently, this line is
cam = weight_softmax[class_idx].dot(feature_conv.reshape((nc, h*w)))
But I think it should be,
cam = weight_softmax[idx].dot(feature_conv.reshape((nc, h*w)))

class_idx should be replaced by idx, because class_idx is a list and idx is an integer.

Purpose of crop2img

https://github.com/metalbubble/CAM/blob/18419ae817d9fcda72bb5fcbe132113ce9d58cc8/ILSVRC_generate_heatmap.m#L121

The function crop2img seems to combine the "gradients" into one image. Since the resulting variables are called "alignImgMean" and "alignImgSet", I guess this function is somehow related to the mergeTenCrop.m script, so I tried to reproduce crop2img with mergeTenCrop. What I haven't figured out so far is whether crop2img converts the gradients or not.

https://github.com/metalbubble/CAM/blob/18419ae817d9fcda72bb5fcbe132113ce9d58cc8/mergeTenCrop.m#L6
The function mergeTenCrops, for example, needs a 256×256×1×10 matrix (cropImgSet), but ILSVRC_generate_heatmap.m provides 256×256×3×10 (CAMmap_crops) as input.

Am I supposed to pre-convert the gradients?
If so, is the third dimension the color channel (3 = RGB, 1 = black/white)?
Is crop2img really based on mergeTenCrop, or did I miss something?

Could you please explain the purpose of crop2img, or how to reimplement it?

Required modifications to existing pretrained networks to allow CAM

I have tried your demo using the pretrained CAM models. I now want to try it with my own pretrained Caffe CNNs.

I would be very grateful if you could demonstrate the modifications required to convert a pretrained non-CAM model to the CAM format. Is it as simple as modifying the prototxt files directly? Thanks.

Different bbox generation method than the one mentioned in the paper?

Hi.
I was looking into the C code for bounding-box generation, and I noticed that you accumulate bounding boxes over several thresholds (default values 30, 90, and 150) in bboxGenerator/dt_box.cpp. However, the paper mentions only one threshold (20% of the largest value in the map).

It would be great if you could elaborate on this difference. Am I missing something?

some error about dt_box

Hello! I really appreciate your work.
When I run the demo, there is an error like this:
error while loading shared libraries: libopencv_core.so.2.4: cannot open shared object file: No such file or directory
How can I solve it? Thank you!

GoogLeNet-CAM model on ImageNet

I was not able to find the file models/deploy_googlenetCAM.prototxt to use with the pretrained weights. Is it possible to add it?

Issue: extracting features online

@metalbubble @ajschumacher
Thank you so much for your code; it has been very helpful. I ran into some trouble with this line:
"online = 0; % whether extract features online or load pre-extracted features"
I want to use my own dataset and extract features online, but when I change it to online = 1, the results seem terrible.
What do you think the reason is, and is there any other code I need to change?
Thank you so much again; I look forward to your reply.
