
vand-april-gan's People

Contributors

bychelsea, hanyue1648, zhangzjn


vand-april-gan's Issues

Few Shot is just One Shot here?

From debugging the code, it seems that the patch tokens of only one reference image are compared to the query image, since we iterate over patch_tokens, which has the length of few_shot_features. Shouldn't it be possible to use multiple reference images via the k_shot variable?

VAND-APRIL-GAN/test.py

Lines 189 to 201 in 46fcbe5

```python
for idx, p in enumerate(patch_tokens):
    if 'ViT' in args.model:
        p = p[0, 1:, :]
    else:
        p = p[0].view(p.shape[1], -1).permute(1, 0).contiguous()
    cos = pairwise.cosine_similarity(mem_features[cls_name[0]][idx].cpu(), p.cpu())
    height = int(np.sqrt(cos.shape[1]))
    anomaly_map_few_shot = np.min((1 - cos), 0).reshape(1, 1, height, height)
    anomaly_map_few_shot = F.interpolate(torch.tensor(anomaly_map_few_shot),
                                         size=img_size, mode='bilinear', align_corners=True)
    anomaly_maps_few_shot.append(anomaly_map_few_shot[0].cpu().numpy())
anomaly_map_few_shot = np.sum(anomaly_maps_few_shot, axis=0)
anomaly_map = anomaly_map + anomaly_map_few_shot
```
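
For what it's worth, here is a minimal sketch of the multi-reference variant being suggested; ref_patch_tokens_per_image, num_layers, patch_tokens and img_size are illustrative names and assumptions, not the repository's actual variables:

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import pairwise

# Hypothetical multi-reference memory bank: ref_patch_tokens_per_image is assumed
# to be a list of length k_shot, one entry per reference image, where each entry
# is a list of per-layer ViT patch-token tensors of shape (1, N+1, C).
mem_bank = []
for layer_idx in range(num_layers):
    layer_tokens = torch.cat(
        [ref[layer_idx][0, 1:, :] for ref in ref_patch_tokens_per_image], dim=0
    )  # (k_shot * N, C): patch tokens of all reference images for this layer
    mem_bank.append(layer_tokens)

# Score the query image against the closest patch from *any* reference image.
anomaly_maps_few_shot = []
for idx, p in enumerate(patch_tokens):          # patch_tokens: query image tokens per layer
    p = p[0, 1:, :]                             # (N, C)
    cos = pairwise.cosine_similarity(mem_bank[idx].cpu(), p.cpu())  # (k_shot * N, N)
    height = int(np.sqrt(cos.shape[1]))
    amap = np.min(1 - cos, axis=0).reshape(1, 1, height, height)    # min over all references
    amap = F.interpolate(torch.tensor(amap), size=img_size,
                         mode='bilinear', align_corners=True)
    anomaly_maps_few_shot.append(amap[0].cpu().numpy())
anomaly_map_few_shot = np.sum(anomaly_maps_few_shot, axis=0)
```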

About text_probs.

Hello, thank you for your contribution to anomaly detection in the zero-shot setting. However, I have a question about something I found in the code and hope you can explain it. In the zero-shot path, when processing normal images, line 169 of test.py still uses text_probs[0][1] to represent the semantic information. According to my understanding, text_probs[0][0] should represent the semantics of normal images, while text_probs[0][1] should represent the semantics of abnormal images. So, when processing normal images, should the code be changed to text_probs[0][0]? Thank you very much!
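
For context, a minimal sketch of the usual CLIP-style two-class scoring this question refers to; the ordering of the normal/abnormal prompt embeddings inside text_features is an assumption for illustration:

```python
import torch

# Assumed setup: text_features is (2, C), stacking the "normal" prompt embedding
# first and the "abnormal" prompt embedding second; image_features is (1, C).
# Both are L2-normalized.
text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)  # (1, 2)

p_normal = text_probs[0][0]    # probability mass on the "normal" prompts
p_abnormal = text_probs[0][1]  # probability mass on the "abnormal" prompts
anomaly_score = p_abnormal     # under this ordering, index 1 is the anomaly score
```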

Initialization with other weights

Hello, may I ask whether the linear layers attached to the network's intermediate layers are meant to learn the image-text alignment from scratch? And have you tried training the framework initialized from another well-pretrained feature extractor (one not trained with CLIP)?
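
To make the question concrete, here is a rough sketch of the kind of trainable projection layer being discussed; names and shapes are illustrative assumptions, not the repository's implementation:

```python
import torch.nn as nn

class PatchProjection(nn.Module):
    """One linear layer per selected intermediate layer, mapping patch tokens
    into the CLIP text-embedding space so they can be compared with text features."""
    def __init__(self, dim_in, dim_out, num_layers):
        super().__init__()
        self.fc = nn.ModuleList([nn.Linear(dim_in, dim_out) for _ in range(num_layers)])

    def forward(self, tokens_per_layer):
        # tokens_per_layer: list of (B, N, dim_in) tensors, one per selected layer
        return [fc(t) for fc, t in zip(self.fc, tokens_per_layer)]
```

If the backbone is frozen (as is typical in this line of work), only these projections learn the alignment, which seems to be what the question about training "from scratch" refers to.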

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

This error appears when running `!python /content/VAND-APRIL-GAN/train.py --train_data_path "/content/VAND-APRIL-GAN/data" --config_path "/content/VAND-APRIL-GAN/open_clip/model_configs/ViT-B-16.json"`:

```
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200: UserWarning: Error detected in LinalgVectorNormBackward0. No forward pass information available. Enable detect anomaly during forward pass for more information. (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:92.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/content/VAND-APRIL-GAN/train.py", line 176, in <module>
    train(args)
  File "/content/VAND-APRIL-GAN/train.py", line 140, in train
    loss.backward()
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [1, 196, 512]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
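
Not the repository's official fix, but this class of error usually comes from an in-place update of a tensor that autograd still needs for the backward pass; the LinalgVectorNormBackward0 warning and the [1, 196, 512] shape suggest an in-place normalization of patch tokens. A generic illustration of the pattern and its out-of-place replacement:

```python
import torch

# Problematic pattern: in-place division mutates a tensor that the backward
# pass of .norm() still references; running it raises the RuntimeError above.
# patch_tokens /= patch_tokens.norm(dim=-1, keepdim=True)

# Out-of-place version: creates a new tensor and keeps the autograd graph valid.
patch_tokens = patch_tokens / patch_tokens.norm(dim=-1, keepdim=True)

# torch.autograd.set_detect_anomaly(True) can be enabled before the forward
# pass to pinpoint which operation actually caused the failure.
```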

Detection results

For the detection results in test.py, besides the generated heatmap, is there another way to tell whether a given test image is anomalous?
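
Not taken from the repository, just one common way to turn the pixel-level anomaly map into a per-image decision; the threshold value below is a placeholder that would have to be calibrated:

```python
import numpy as np

# anomaly_map: (H, W) pixel-level anomaly scores for one test image
image_score = float(np.max(anomaly_map))   # score the image by its most anomalous pixel
threshold = 0.5                            # placeholder; calibrate on validation data
is_anomalous = image_score > threshold
print(f"image score = {image_score:.3f}, anomalous = {is_anomalous}")
```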

Query about zero-shot anomaly segmentation

As far as I understand, unlike WinCLIP, for anomaly segmentation the original ground truth of the test images is used for model adaptation, while the final performance is reported on those same test images. Can you please confirm?

Using a ResNet backbone

Hi!

I was trying to train with the RN50x16 ResNet backbone using this command:

```bash
!python train.py --dataset visa --train_data_path /content/visa-dataset/ \
  --save_path ./exps/mvtec/RN50x16_384 --config_path ./open_clip/model_configs/RN50x16.json --model RN50x16 \
  --features_list 1 2 3 4 --pretrained openai --image_size 384 --batch_size 8 --aug_rate -1 --print_freq 1 \
  --epoch 3 --save_freq 1
```

But it does not work correctly:

```
Traceback (most recent call last):
  File "/code/VAND-APRIL-GAN/train.py", line 170, in <module>
    train(args)
  File "/code/VAND-APRIL-GAN/train.py", line 108, in train
    image_features, patch_tokens = model.encode_image(image, features_list)
  File "/code/VAND-APRIL-GAN/open_clip/model.py", line 213, in encode_image
    features = self.visual(image, out_layers)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: ModifiedResNet.forward() takes 2 positional arguments but 3 were given
```

Could you share the changes that should be made in modified_resnet.py to allow this? Thank you.
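
Not the authors' actual patch, but a rough sketch of the change being asked about, assuming open_clip's ModifiedResNet layout (a stem() helper followed by layer1-layer4 and the attention pool); out_layers matches the extra argument passed in encode_image above:

```python
# Hypothetical ModifiedResNet.forward accepting the extra out_layers argument,
# mirroring how the ViT path returns patch tokens for the requested layers.
def forward(self, x, out_layers=None):
    out_layers = out_layers or []
    patch_tokens = []
    x = self.stem(x)
    for i, stage in enumerate([self.layer1, self.layer2, self.layer3, self.layer4], start=1):
        x = stage(x)
        if i in out_layers:
            patch_tokens.append(x)   # (B, C_i, H_i, W_i) feature map of residual stage i
    x = self.attnpool(x)
    return x, patch_tokens
```

The caller would still need to flatten each stage's spatial map into a token sequence and handle the different channel widths before the projection layers, so this alone may not be enough.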

Abbreviations

Thank you very much for this nice repository. I was wondering what is denoted by the abbreviations "px" and "sp" in the code.

Use image_features instead of patch_tokens

Hi, thanks for contributing this nice work. I have a question for discussion.

Question: how can we use image_features (train.py, line 112 in your code) instead of patch_tokens with the ResNet50 backbone? Do you have any suggestions on how to achieve this?

In the original code (with the ResNet50 backbone), the multi-scale patch_tokens are multiplied with the text features, with shapes:
(B, 9612, 768) and (B, 768, 2) => (B, 9612, 2)
(B, 2304, 768) and (B, 768, 2) => (B, 2304, 2)
(B, 576, 768) and (B, 768, 2) => (B, 576, 2)
and then reshaped and interpolated to the target anomaly-map size, and so on.

But image_features has shape (B, 768) and text_features has shape (B, 768, 2). How should the rest of the pipeline be modified so that the linear layers can still be trained and anomaly maps generated at inference?

If you have any questions, feel free to ask, thanks!
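
Not a recommendation from the authors; as a sketch, the global embedding can give an image-level score, but a spatial anomaly map genuinely needs per-location features (the shapes below are the ones quoted in the question):

```python
import torch

# image_features: (B, 768) global CLIP image embeddings
# text_features:  (B, 768, 2) stacked normal/abnormal prompt embeddings
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
logits = torch.bmm(image_features.unsqueeze(1), text_features)  # (B, 1, 2)
text_probs = logits.squeeze(1).softmax(dim=-1)                  # (B, 2)
anomaly_score = text_probs[:, 1]   # per-image score; no pixel map is produced
```

To recover a spatial map from global features only, some windowing or cropping scheme (as WinCLIP does) would be needed, which is a rather different design from the patch-token path.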

Visualizing the test results

I ran test.py and the output looks like this:
[image]
Is there a binarized result map, or a way to outline the anomalous regions?
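
Not part of the repository, just a minimal OpenCV sketch of one way to binarize the anomaly map and outline the regions; the threshold is an illustrative placeholder:

```python
import cv2
import numpy as np

# anomaly_map: (H, W) float scores in [0, 1]; img: original BGR image of the same size
threshold = 0.5  # placeholder; tune on validation data
binary_mask = (anomaly_map > threshold).astype(np.uint8) * 255

contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
outlined = img.copy()
cv2.drawContours(outlined, contours, -1, (0, 0, 255), 2)  # red outlines around anomalies

cv2.imwrite("binary_mask.png", binary_mask)
cv2.imwrite("outlined.png", outlined)
```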

Guidance on Threshold Setting for Accurate Defect Detection in Heatmap Visualizations (red mark)

Thank you so much for your code for VAND-APRIL-GAN. Thanks to the published code, I was able to study this field better and understand the implementation more deeply.

However, I have a question regarding visualization. The visualization uses a heatmap, and defects are usually marked in red. In my case, due to incorrect threshold settings, not only the defects but also other parts are marked in red. Do you have any ideas on how to address this issue? And what values should I generally set for the threshold?
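
Not guidance from the authors; one common heuristic is to calibrate the threshold from the anomaly scores of known-normal validation images, e.g. a high percentile of their pixel scores (the percentile below is an arbitrary example):

```python
import numpy as np

# normal_maps: list of (H, W) anomaly maps computed on defect-free validation images
scores = np.concatenate([m.ravel() for m in normal_maps])
threshold = np.percentile(scores, 99.5)  # e.g. allow ~0.5% false-positive pixels on normal data

# At test time, only pixels above this calibrated threshold are highlighted in red.
```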

Few-shot anomaly classification scores

In the few-shot setting, can an anomaly classification score be produced for each individual image? The source code seems to aggregate statistics per class. Could you also explain how the maximum of the anomaly map is used in this part?

/data/visa/meta.json

I hope this message finds you well. I've been working with your code on the VisA and MVTec datasets, and I've encountered an issue related to a missing meta.json file at the dataset path /data/visa/meta.json.

It seems that the code relies on this meta.json file to load important dataset information, and as a result, I'm encountering a FileNotFoundError when trying to run the code. The code snippet that specifically references the missing file is as follows:

meta_info = json.load(open(f'{self.root}/meta.json', 'r'))

I have checked the provided dataset path, and indeed, there is no meta.json file located at /data/visa/meta.json.

Could you please provide more guidance on how to resolve this issue? Do I need to create or obtain the meta.json file for the dataset, and if so, how should it be structured?

Your assistance in resolving this issue would be greatly appreciated. Thank you for your time and support.

Best regards,

MVTec epochs

Why did you set the number of epochs to 3 when training on the MVTec dataset, but to 15 when training on the VisA dataset? I noticed that the loss on MVTec was still decreasing after the third epoch.

Gradient error during training

Hello, when I modified train.py to train the network, the following error occurred while computing gradients from the loss: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. Do you know how to solve this? My CUDA version is 12.2, so the package versions in requirements.txt are not suitable for me; I first used torch 2.1.0 and then switched to 2.2.1+cu118, and both versions produce this error. Looking forward to your reply.

Normal and abnormal images get the same classification probability scores

Hello, I tested with one normal image and one abnormal image; the code is as follows:

```python
image = preprocess(Image.open("/content/000.png")).unsqueeze(0).to(device)
obj_list = ["screw"]
with torch.cuda.amp.autocast(), torch.no_grad():
    text_prompts = encode_text_with_prompt_ensemble(model, obj_list, tokenizer, device)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)

    text_features = []
    text_features.append(text_prompts["screw"])
    text_features = torch.stack(text_features, dim=0)
    # sample
    text_probs = (100.0 * image_features @ text_features[0]).softmax(dim=-1)
```

The resulting text_probs is always like tensor([[0.7982, 0.2018]], device='cuda:0'): the first value is always larger than the second. Does this indicate that the classification is incorrect?

Why is the AUPRO lower than WinCLIP's?

Thank you for your work. I found that both your AUROC and F1-max scores for zero-shot segmentation on the MVTec-AD dataset are higher than WinCLIP's, but the AUPRO is lower (64.6 for WinCLIP vs. 44 for your work). Can you provide some explanation for this? Thank you.

Nice work!

Hi! I am very interested in your excellent work. I do believe that this can bring new insights into zero-shot anomaly detection and pioneer the way into unifying anomaly detection. Let's make anomaly detection great together.

Issues Recreating Model

I have been trying to run the model for quite some time and finally hit an error that I think could be resolved through an issue.

When running visa.py:
[image]

and when running test.py:
[image]

I also attempted to run a version of test.py on my local machine and it couldn't find visa.json, which is why I was running visa.py. Please let me know if you have a solution, or if there is something I might be doing incorrectly when loading the model to get test results.

Thanks!

About the Dataset in train.py

In train.py:

```python
if args.dataset == 'mvtec':
    train_data = MVTecDataset(root=args.train_data_path, transform=preprocess, target_transform=transform,
                              aug_rate=args.aug_rate)
```

mode='train' is never passed, and the default value of mode in MVTecDataset is 'test'. In the following code:

```python
if mode == 'train':
    self.cls_names = [obj_name]                      # use only the given object name
    save_dir = os.path.join(save_dir, 'k_shot.txt')  # build the path for saving the k-shot list
else:
    self.cls_names = list(meta_info.keys())          # otherwise, use all class names
```

the mode == 'train' branch is never executed, neither in test.py nor in train.py. Should the default value of mode be set to 'train', or is my understanding wrong and mode is indeed supposed to be 'test' in train.py?

Can it work without the mask images?

I want to use this excellent work on a new dataset, but the dataset doesn't have mask labels. My objective is image-level anomaly detection (normal vs. abnormal classification). Is it possible to achieve this with this code?
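
Not a feature of the repository as-is, just a sketch of how image-level evaluation could work once per-image scores and binary labels are available (no pixel masks required):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# image_scores: per-image anomaly scores (e.g. the max of each anomaly map or the
# abnormal-text probability); image_labels: 0 = normal, 1 = abnormal.
image_scores = np.asarray(image_scores, dtype=float)
image_labels = np.asarray(image_labels, dtype=int)
print("image-level AUROC:", roc_auc_score(image_labels, image_scores))
```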

global image representations

Hello dear authors! In the code `image_features, patch_tokens = model.encode_image(image, features_list)`, is image_features the global image representation, just like `image_features = model.encode_image(image)` in the original OpenCLIP code, as we normally use it? It looks like you changed the transformer to additionally output the features of the layers given in features_list. I am not sure whether my understanding is correct.

Calculation False Positive Rate

Hi, thank you for your valuable work.

In the function that calculates pro_auc (AUPRO), I noticed that the false positive rate (FPR) is computed using the formula:
fpr = fp_pixels / inverse_masks.sum()
while the FPR is commonly defined as:
FPR = FP / (FP + TN)
I would appreciate some clarification on why the FPR is computed from the inverse of the ground-truth mask in this specific context.

Thank you!
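
For what it's worth, the two expressions can coincide: for a binary ground-truth mask, the inverted mask selects exactly the negative (normal) pixels, so its sum equals FP + TN. A small sketch of that identity, not taken from the repository:

```python
import numpy as np

gt_mask = np.random.rand(4, 64, 64) > 0.8        # toy binary ground-truth masks (1 = anomalous)
pred_mask = np.random.rand(4, 64, 64) > 0.5      # toy binary predictions
inverse_masks = 1 - gt_mask.astype(int)          # 1 on normal (negative) pixels

fp_pixels = np.logical_and(pred_mask, inverse_masks).sum()   # predicted anomalous on normal pixels
tn_pixels = np.logical_and(~pred_mask, inverse_masks).sum()  # predicted normal on normal pixels

fpr_a = fp_pixels / inverse_masks.sum()
fpr_b = fp_pixels / (fp_pixels + tn_pixels)
assert np.isclose(fpr_a, fpr_b)   # FP + TN equals the number of negative pixels
```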
