
vand-april-gan's People

Contributors

bychelsea, hanyue1648, zhangzjn


vand-april-gan's Issues

Few Shot is just One Shot here?

From debugging the code, it seems that the patch tokens of only one reference image are compared to the query image, since we iterate over patch_tokens, which has the length of few_shot_features. Shouldn't it be possible to use multiple reference images via the k_shot variable?

VAND-APRIL-GAN/test.py

Lines 189 to 201 in 46fcbe5

```python
for idx, p in enumerate(patch_tokens):
    if 'ViT' in args.model:
        p = p[0, 1:, :]
    else:
        p = p[0].view(p.shape[1], -1).permute(1, 0).contiguous()
    cos = pairwise.cosine_similarity(mem_features[cls_name[0]][idx].cpu(), p.cpu())
    height = int(np.sqrt(cos.shape[1]))
    anomaly_map_few_shot = np.min((1 - cos), 0).reshape(1, 1, height, height)
    anomaly_map_few_shot = F.interpolate(torch.tensor(anomaly_map_few_shot),
                                         size=img_size, mode='bilinear', align_corners=True)
    anomaly_maps_few_shot.append(anomaly_map_few_shot[0].cpu().numpy())
anomaly_map_few_shot = np.sum(anomaly_maps_few_shot, axis=0)
anomaly_map = anomaly_map + anomaly_map_few_shot
```
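
For what it's worth, here is a minimal sketch of the multi-reference variant being suggested; ref_patch_tokens_per_image, num_layers, patch_tokens and img_size are illustrative names and assumptions, not the repository's actual variables:

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import pairwise

# Hypothetical multi-reference memory bank: ref_patch_tokens_per_image is assumed
# to be a list of length k_shot, one entry per reference image, where each entry
# is a list of per-layer ViT patch-token tensors of shape (1, N+1, C).
mem_bank = []
for layer_idx in range(num_layers):
    layer_tokens = torch.cat(
        [ref[layer_idx][0, 1:, :] for ref in ref_patch_tokens_per_image], dim=0
    )  # (k_shot * N, C): patch tokens of all reference images for this layer
    mem_bank.append(layer_tokens)

# Score the query image against the closest patch from *any* reference image.
anomaly_maps_few_shot = []
for idx, p in enumerate(patch_tokens):          # patch_tokens: query image tokens per layer
    p = p[0, 1:, :]                             # (N, C)
    cos = pairwise.cosine_similarity(mem_bank[idx].cpu(), p.cpu())  # (k_shot * N, N)
    height = int(np.sqrt(cos.shape[1]))
    amap = np.min(1 - cos, axis=0).reshape(1, 1, height, height)    # min over all references
    amap = F.interpolate(torch.tensor(amap), size=img_size,
                         mode='bilinear', align_corners=True)
    anomaly_maps_few_shot.append(amap[0].cpu().numpy())
anomaly_map_few_shot = np.sum(anomaly_maps_few_shot, axis=0)
```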

About text_probs.

Hello, thank you for your contribution to anomaly detection in the zero-shot setting. However, I have a question about something I found in the code and hope you can explain it. In the zero-shot path, when processing normal images, line 169 of test.py still uses text_probs[0][1] to represent the semantic information. According to my understanding, text_probs[0][0] should represent the semantics of normal images, while text_probs[0][1] should represent the semantics of abnormal images. So, when processing normal images, should the code be changed to text_probs[0][0]? Thank you very much!
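
For context, a minimal sketch of the usual CLIP-style two-class scoring this question refers to; the ordering of the normal/abnormal prompt embeddings inside text_features is an assumption for illustration:

```python
import torch

# Assumed setup: text_features is (2, C), stacking the "normal" prompt embedding
# first and the "abnormal" prompt embedding second; image_features is (1, C).
# Both are L2-normalized.
text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)  # (1, 2)

p_normal = text_probs[0][0]    # probability mass on the "normal" prompts
p_abnormal = text_probs[0][1]  # probability mass on the "abnormal" prompts
anomaly_score = p_abnormal     # under this ordering, index 1 is the anomaly score
```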

Initialization with other weights

Hello, may I ask whether the linear layers attached to the network's intermediate layers are meant to learn the image-text alignment from scratch? And have you tried training the framework initialized from another well-pretrained feature extractor (one not trained with CLIP)?
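
To make the question concrete, here is a rough sketch of the kind of trainable projection layer being discussed; names and shapes are illustrative assumptions, not the repository's implementation:

```python
import torch.nn as nn

class PatchProjection(nn.Module):
    """One linear layer per selected intermediate layer, mapping patch tokens
    into the CLIP text-embedding space so they can be compared with text features."""
    def __init__(self, dim_in, dim_out, num_layers):
        super().__init__()
        self.fc = nn.ModuleList([nn.Linear(dim_in, dim_out) for _ in range(num_layers)])

    def forward(self, tokens_per_layer):
        # tokens_per_layer: list of (B, N, dim_in) tensors, one per selected layer
        return [fc(t) for fc, t in zip(self.fc, tokens_per_layer)]
```

If the backbone is frozen (as is typical in this line of work), only these projections learn the alignment, which seems to be what the question about training "from scratch" refers to.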

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

This error appears when running `!python /content/VAND-APRIL-GAN/train.py --train_data_path "/content/VAND-APRIL-GAN/data" --config_path "/content/VAND-APRIL-GAN/open_clip/model_configs/ViT-B-16.json"`:

```
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:200: UserWarning: Error detected in LinalgVectorNormBackward0. No forward pass information available. Enable detect anomaly during forward pass for more information. (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:92.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/content/VAND-APRIL-GAN/train.py", line 176, in <module>
    train(args)
  File "/content/VAND-APRIL-GAN/train.py", line 140, in train
    loss.backward()
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [1, 196, 512]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
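
Not the repository's official fix, but this class of error usually comes from an in-place update of a tensor that autograd still needs for the backward pass; the LinalgVectorNormBackward0 warning and the [1, 196, 512] shape suggest an in-place normalization of patch tokens. A generic illustration of the pattern and its out-of-place replacement:

```python
import torch

# Problematic pattern: in-place division mutates a tensor that the backward
# pass of .norm() still references; running it raises the RuntimeError above.
# patch_tokens /= patch_tokens.norm(dim=-1, keepdim=True)

# Out-of-place version: creates a new tensor and keeps the autograd graph valid.
patch_tokens = patch_tokens / patch_tokens.norm(dim=-1, keepdim=True)

# torch.autograd.set_detect_anomaly(True) can be enabled before the forward
# pass to pinpoint which operation actually caused the failure.
```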

Detection results

For the detection results in test.py, besides the generated heatmap, is there another way to tell whether a given test image is anomalous?
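
Not taken from the repository, just one common way to turn the pixel-level anomaly map into a per-image decision; the threshold value below is a placeholder that would have to be calibrated:

```python
import numpy as np

# anomaly_map: (H, W) pixel-level anomaly scores for one test image
image_score = float(np.max(anomaly_map))   # score the image by its most anomalous pixel
threshold = 0.5                            # placeholder; calibrate on validation data
is_anomalous = image_score > threshold
print(f"image score = {image_score:.3f}, anomalous = {is_anomalous}")
```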

Query about zero-shot anomaly segmentation

As far as I understand, unlike WinCLIP, for anomaly segmentation the original ground truth of the test images is used for model adaptation, while the final performance is reported on those same test images. Can you please confirm?

Using a ResNet backbone

Hi!

I was trying to train with the RN50x16 ResNet backbone using this command:

```bash
!python train.py --dataset visa --train_data_path /content/visa-dataset/ \
  --save_path ./exps/mvtec/RN50x16_384 --config_path ./open_clip/model_configs/RN50x16.json --model RN50x16 \
  --features_list 1 2 3 4 --pretrained openai --image_size 384 --batch_size 8 --aug_rate -1 --print_freq 1 \
  --epoch 3 --save_freq 1
```

But it does not work correctly:

```
Traceback (most recent call last):
  File "/code/VAND-APRIL-GAN/train.py", line 170, in <module>
    train(args)
  File "/code/VAND-APRIL-GAN/train.py", line 108, in train
    image_features, patch_tokens = model.encode_image(image, features_list)
  File "/code/VAND-APRIL-GAN/open_clip/model.py", line 213, in encode_image
    features = self.visual(image, out_layers)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: ModifiedResNet.forward() takes 2 positional arguments but 3 were given
```

Could you share the changes that should be made in modified_resnet.py to allow this? Thank you.
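
Not the authors' actual patch, but a rough sketch of the change being asked about, assuming open_clip's ModifiedResNet layout (a stem() helper followed by layer1-layer4 and the attention pool); out_layers matches the extra argument passed in encode_image above:

```python
# Hypothetical ModifiedResNet.forward accepting the extra out_layers argument,
# mirroring how the ViT path returns patch tokens for the requested layers.
def forward(self, x, out_layers=None):
    out_layers = out_layers or []
    patch_tokens = []
    x = self.stem(x)
    for i, stage in enumerate([self.layer1, self.layer2, self.layer3, self.layer4], start=1):
        x = stage(x)
        if i in out_layers:
            patch_tokens.append(x)   # (B, C_i, H_i, W_i) feature map of residual stage i
    x = self.attnpool(x)
    return x, patch_tokens
```

The caller would still need to flatten each stage's spatial map into a token sequence and handle the different channel widths before the projection layers, so this alone may not be enough.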

Abbreviations

Thank you very much for this nice repository. I was wondering what is denoted by the abbreviations "px" and "sp" in the code.

Use image_features instead of patch_tokens

Hi, thanks for contributing this nice work. I have a question for discussion.

Question: how can we use image_features (train.py, line 112 in your code) instead of patch_tokens with the ResNet50 backbone? Do you have any suggestions on how to achieve this?

In the original code (with the ResNet50 backbone), the multi-scale patch_tokens are multiplied with the text features, with shapes:
(B, 9612, 768) and (B, 768, 2) => (B, 9612, 2)
(B, 2304, 768) and (B, 768, 2) => (B, 2304, 2)
(B, 576, 768) and (B, 768, 2) => (B, 576, 2)
and then reshaped and interpolated to the target anomaly-map size, and so on.

But image_features has shape (B, 768) and text_features has shape (B, 768, 2). How should the rest of the pipeline be modified so that the linear layers can still be trained and anomaly maps generated at inference?

If you have any questions, feel free to ask, thanks!
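
Not a recommendation from the authors; as a sketch, the global embedding can give an image-level score, but a spatial anomaly map genuinely needs per-location features (the shapes below are the ones quoted in the question):

```python
import torch

# image_features: (B, 768) global CLIP image embeddings
# text_features:  (B, 768, 2) stacked normal/abnormal prompt embeddings
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
logits = torch.bmm(image_features.unsqueeze(1), text_features)  # (B, 1, 2)
text_probs = logits.squeeze(1).softmax(dim=-1)                  # (B, 2)
anomaly_score = text_probs[:, 1]   # per-image score; no pixel map is produced
```

To recover a spatial map from global features only, some windowing or cropping scheme (as WinCLIP does) would be needed, which is a rather different design from the patch-token path.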

Visualizing the test results

I ran test.py and the output looks like this:
[image]
Is there a binarized result map, or a way to outline the anomalous regions?
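
Not part of the repository, just a minimal OpenCV sketch of one way to binarize the anomaly map and outline the regions; the threshold is an illustrative placeholder:

```python
import cv2
import numpy as np

# anomaly_map: (H, W) float scores in [0, 1]; img: original BGR image of the same size
threshold = 0.5  # placeholder; tune on validation data
binary_mask = (anomaly_map > threshold).astype(np.uint8) * 255

contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
outlined = img.copy()
cv2.drawContours(outlined, contours, -1, (0, 0, 255), 2)  # red outlines around anomalies

cv2.imwrite("binary_mask.png", binary_mask)
cv2.imwrite("outlined.png", outlined)
```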

Guidance on Threshold Setting for Accurate Defect Detection in Heatmap Visualizations (red mark)

Thank you so much for your code for VAND-APRIL-GAN. Thanks to the published code, I was able to study this field better and understand the implementation more deeply.

However, I have a question regarding visualization. The visualization uses a heatmap, and defects are usually marked in red. In my case, due to incorrect threshold settings, not only the defects but also other parts are marked in red. Do you have any ideas on how to address this issue? And what values should I generally set for the threshold?
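
Not guidance from the authors; one common heuristic is to calibrate the threshold from the anomaly scores of known-normal validation images, e.g. a high percentile of their pixel scores (the percentile below is an arbitrary example):

```python
import numpy as np

# normal_maps: list of (H, W) anomaly maps computed on defect-free validation images
scores = np.concatenate([m.ravel() for m in normal_maps])
threshold = np.percentile(scores, 99.5)  # e.g. allow ~0.5% false-positive pixels on normal data

# At test time, only pixels above this calibrated threshold are highlighted in red.
```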

Few-shot anomaly classification scores

In the few-shot setting, can an anomaly classification score be produced for each individual image? The source code seems to aggregate statistics per class. Could you also explain how the maximum of the anomaly map is used in this part?

/data/visa/meta.json

I hope this message finds you well. I've been working with your code on the VisA and MVTec datasets, and I've encountered an issue related to a missing meta.json file at the dataset path /data/visa/meta.json.

It seems that the code relies on this meta.json file to load important dataset information, and as a result, I'm encountering a FileNotFoundError when trying to run the code. The code snippet that specifically references the missing file is as follows:

meta_info = json.load(open(f'{self.root}/meta.json', 'r'))

I have checked the provided dataset path, and indeed, there is no meta.json file located at /data/visa/meta.json.

Could you please provide more guidance on how to resolve this issue? Do I need to create or obtain the meta.json file for the dataset, and if so, how should it be structured?

Your assistance in resolving this issue would be greatly appreciated. Thank you for your time and support.

Best regards,

MVTec epochs

Why did you set the number of epochs to 3 when training on the MVTec dataset, but to 15 when training on the VisA dataset? I noticed that the loss on MVTec was still decreasing after the third epoch.

Gradient error during training

Hello, when I modified train.py to train the network, the following error occurred while computing gradients from the loss: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. Do you know how to solve this? My CUDA version is 12.2, so the package versions in requirements.txt are not suitable for me; I first used torch 2.1.0 and then switched to 2.2.1+cu118, and both versions produce this error. Looking forward to your reply.

Normal and abnormal images get the same classification probability scores

Hello, I tested with one normal image and one abnormal image; the code is as follows:

```python
image = preprocess(Image.open("/content/000.png")).unsqueeze(0).to(device)
obj_list = ["screw"]
with torch.cuda.amp.autocast(), torch.no_grad():
    text_prompts = encode_text_with_prompt_ensemble(model, obj_list, tokenizer, device)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)

    text_features = []
    text_features.append(text_prompts["screw"])
    text_features = torch.stack(text_features, dim=0)
    # sample
    text_probs = (100.0 * image_features @ text_features[0]).softmax(dim=-1)
```

The resulting text_probs is always like tensor([[0.7982, 0.2018]], device='cuda:0'): the first value is always larger than the second. Does this indicate that the classification is incorrect?

Why is the AUPRO lower than WinCLIP's?

Thank you for your work. I found that both your AUROC and F1-max scores for zero-shot segmentation on the MVTec-AD dataset are higher than WinCLIP's, but the AUPRO is lower (64.6 for WinCLIP vs. 44 for your work). Can you provide some explanation for this? Thank you.

Nice work!

Hi! I am very interested in your excellent work. I do believe that this can bring new insights into zero-shot anomaly detection and pioneer the way into unifying anomaly detection. Let's make anomaly detection great together.

Issues Recreating Model

I have been trying to run the model for quite some time and finally hit an error that I think could be resolved through an issue.

When running visa.py:
[image]

and when running test.py:
[image]

I also attempted to run a version of test.py on my local machine and it couldn't find visa.json, which is why I was running visa.py. Please let me know if you have a solution, or if there is something I might be doing incorrectly when loading the model to get test results.

Thanks!

About the Dataset in train.py

In train.py:

```python
if args.dataset == 'mvtec':
    train_data = MVTecDataset(root=args.train_data_path, transform=preprocess, target_transform=transform,
                              aug_rate=args.aug_rate)
```

mode='train' is never passed, and the default value of mode in MVTecDataset is 'test'. In the following code:

```python
if mode == 'train':
    self.cls_names = [obj_name]                      # use only the given object name
    save_dir = os.path.join(save_dir, 'k_shot.txt')  # build the path for saving the k-shot list
else:
    self.cls_names = list(meta_info.keys())          # otherwise, use all class names
```

the mode == 'train' branch is never executed, neither in test.py nor in train.py. Should the default value of mode be set to 'train', or is my understanding wrong and mode is indeed supposed to be 'test' in train.py?

Can it work without the mask images?

I want to use this excellent work on a new dataset, but the dataset doesn't have mask labels. My objective is image-level anomaly detection (normal vs. abnormal classification). Is it possible to achieve this with this code?
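
Not a feature of the repository as-is, just a sketch of how image-level evaluation could work once per-image scores and binary labels are available (no pixel masks required):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# image_scores: per-image anomaly scores (e.g. the max of each anomaly map or the
# abnormal-text probability); image_labels: 0 = normal, 1 = abnormal.
image_scores = np.asarray(image_scores, dtype=float)
image_labels = np.asarray(image_labels, dtype=int)
print("image-level AUROC:", roc_auc_score(image_labels, image_scores))
```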

global image representations

Hello dear authors! In the code `image_features, patch_tokens = model.encode_image(image, features_list)`, is image_features the global image representation, just like `image_features = model.encode_image(image)` in the original OpenCLIP code, as we normally use it? It looks like you changed the transformer to additionally output the features of the layers given in features_list. I am not sure whether my understanding is correct.

Calculation False Positive Rate

Hi, thank you for your valuable work.

In the function that calculates pro_auc (AUPRO), I noticed that the false positive rate (FPR) is computed using the formula:
fpr = fp_pixels / inverse_masks.sum()
while the FPR is commonly defined as:
FPR = FP / (FP + TN)
I would appreciate some clarification on why the FPR is computed from the inverse of the ground-truth mask in this specific context.

Thank you!
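
For what it's worth, the two expressions can coincide: for a binary ground-truth mask, the inverted mask selects exactly the negative (normal) pixels, so its sum equals FP + TN. A small sketch of that identity, not taken from the repository:

```python
import numpy as np

gt_mask = np.random.rand(4, 64, 64) > 0.8        # toy binary ground-truth masks (1 = anomalous)
pred_mask = np.random.rand(4, 64, 64) > 0.5      # toy binary predictions
inverse_masks = 1 - gt_mask.astype(int)          # 1 on normal (negative) pixels

fp_pixels = np.logical_and(pred_mask, inverse_masks).sum()   # predicted anomalous on normal pixels
tn_pixels = np.logical_and(~pred_mask, inverse_masks).sum()  # predicted normal on normal pixels

fpr_a = fp_pixels / inverse_masks.sum()
fpr_b = fp_pixels / (fp_pixels + tn_pixels)
assert np.isclose(fpr_a, fpr_b)   # FP + TN equals the number of negative pixels
```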
