
uniad's Introduction

UniAD

Official PyTorch Implementation of A Unified Model for Multi-class Anomaly Detection, Accepted by NeurIPS 2022 Spotlight.


1. Quick Start

1.1 MVTec-AD

  • Create the MVTec-AD dataset directory. Download the MVTec-AD dataset from here. Unzip the file and move the extracted data to ./data/MVTec-AD/. The MVTec-AD dataset directory should be as follows.
|-- data
    |-- MVTec-AD
        |-- mvtec_anomaly_detection
        |-- json_vis_decoder
        |-- train.json
        |-- test.json
  • cd the experiment directory by running cd ./experiments/MVTec-AD/.

  • Train or eval by running:

    (1) For slurm group: sh train.sh #NUM_GPUS #PARTITION or sh eval.sh #NUM_GPUS #PARTITION.

    (2) For torch.distributed.launch: sh train_torch.sh #NUM_GPUS #GPU_IDS or sh eval_torch.sh #NUM_GPUS #GPU_IDS, e.g., train with GPUs 1,3,4,6 (4 GPUs in total): sh train_torch.sh 4 1,3,4,6.

    Note: During eval, please set config.saver.load_path to load the checkpoints.

  • Results and checkpoints.

Platform                 | GPU                                      | Detection AUROC | Localization AUROC | Checkpoints | Note
slurm group              | 8 GPUs (NVIDIA Tesla V100 16GB)          | 96.7            | 96.8               | here        | A unified model for all categories
torch.distributed.launch | 1 GPU (NVIDIA GeForce GTX 1080 Ti 11 GB) | 97.6            | 97.0               | here        | A unified model for all categories

1.2 CIFAR-10

  • Create the CIFAR-10 dataset directory. Download the CIFAR-10 dataset from here. Unzip the file and move the extracted data to ./data/CIFAR-10/. The CIFAR-10 dataset directory should be as follows.
|-- data
    |-- CIFAR-10
        |-- cifar-10-batches-py
  • cd the experiment directory by running cd ./experiments/CIFAR-10/01234/. Here classes 0, 1, 2, 3, 4 are taken as normal samples; the other splits are handled similarly.

  • Train or eval by running:

    (1) For slurm group: sh train.sh #NUM_GPUS #PARTITION or sh eval.sh #NUM_GPUS #PARTITION.

    (2) For torch.distributed.launch: sh train_torch.sh #NUM_GPUS #GPU_IDS or sh eval_torch.sh #NUM_GPUS #GPU_IDS.

    Note: During eval, please set config.saver.load_path to load the checkpoints.

  • Results and checkpoints. Training on 8 GPUs (NVIDIA Tesla V100 16GB) gives the following performance.

Normal Samples | {01234} | {56789} | {02468} | {13579} | Mean
AUROC          | 84.4    | 79.6    | 93.0    | 89.1    | 86.5

2. Visualize Reconstructed Features

We highly recommend visualizing the reconstructed features, since this directly demonstrates that our UniAD reconstructs anomalies to their corresponding normal samples.

2.1 Train Decoders for Visualization

  • cd the experiment directory by running cd ./experiments/train_vis_decoder/.

  • Train by running:

    (1) For slurm group: sh train.sh #NUM_GPUS #PARTITION.

    (2) For torch.distributed.launch: sh train_torch.sh #NUM_GPUS #GPU_IDS #CLASS_NAME.

    Note: for torch.distributed.launch, you should train one vis_decoder for one specific class at a time.

2.2 Visualize Reconstructed Features

  • cd the experiment directory by running cd ./experiments/vis_recon/.

  • Visualize by running (only support 1 GPU):

    (1) For slurm group: sh vis_recon.sh #PARTITION.

    (2) For torch.distributed.launch: sh vis_recon_torch.sh #CLASS_NAME.

    Note: for torch.distributed.launch, you should visualize one specific class at a time.

3. Questions

3.1 Explanation of Evaluation Results

The first line of the evaluation results is shown as follows.

clsname pixel mean max std

The pixel column gives the anomaly localization result.

The mean, max, and std columns correspond to different post-processing methods for anomaly detection. That is, the anomaly localization result is an anomaly map of shape H x W, and this map needs to be converted into a scalar that serves as the anomaly score for the whole image. For this conversion, there are 3 options:

  • use the mean value of the anomaly map.
  • use the max value of the (average-pooled) anomaly map.
  • use the std value of the anomaly map.

In our paper, we use max for MVTec-AD and mean for CIFAR-10, as sketched below.
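
For concreteness, the conversion looks roughly like the following minimal sketch (the function name and the average-pooling kernel size are illustrative assumptions, not the exact code of this repository):

import torch
import torch.nn.functional as F

def image_score(anomaly_map: torch.Tensor, method: str = "max") -> torch.Tensor:
    """Collapse an (H, W) anomaly map into one image-level anomaly score."""
    if method == "mean":
        return anomaly_map.mean()
    if method == "max":
        # average-pool first to suppress isolated noisy pixels, then take the max
        pooled = F.avg_pool2d(anomaly_map[None, None], kernel_size=4, stride=1)
        return pooled.max()
    if method == "std":
        return anomaly_map.std()
    raise ValueError(f"unknown method: {method}")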

3.2 Visualize Learned Query Embedding

If you have finished the training of the main model and decoders (used for visualization) for MVTec-AD, you could also choose to visualize the learned query embedding in the main model.

  • cd the experiment directory by running cd ./experiments/vis_query/.

  • Visualize by running (only support 1 GPU):

    (1) For slurm group: sh vis_query.sh #PARTITION.

    (2) For torch.distributed.launch: sh vis_query_torch.sh #CLASS_NAME.

    Note: for torch.distributed.launch, you should visualize one specific class at a time.

Some results are very interesting. The learned query embedding partly contains some features of normal samples. However, we have not fully figured this out, and this part was not included in our paper.


Acknowledgement

We use some code from repositories including detr and efficientnet.

uniad's People

Contributors

zhiyuanyou


uniad's Issues

Who can help me to export this model to onnx?

When I try to use torch.onnx.export(model, input, "test.onnx"), I get nothing but lots of errors. Could someone show me the right code?
Also, is it possible to make the input just an n x 3 x 224 x 224 tensor, without the mask, when inferring?
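
For reference, a generic torch.onnx.export sketch with a plain n x 3 x 224 x 224 input; whether UniAD's forward accepts such a tensor directly is not confirmed here, so the wrapper below and the placeholder model are assumptions.

import torch

# Hedged, generic export sketch -- not verified against UniAD's actual forward
# signature, which may expect a dict or an extra mask input.
class ExportWrapper(torch.nn.Module):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, image: torch.Tensor):
        # Adapt this call if the wrapped model expects {"image": ...} or a mask.
        return self.model(image)

model = torch.nn.Identity()  # placeholder: replace with the trained model loaded from a checkpoint
dummy = torch.randn(1, 3, 224, 224)  # n x 3 x 224 x 224 input, as asked above
torch.onnx.export(
    ExportWrapper(model).eval(),
    dummy,
    "test.onnx",
    input_names=["image"],
    opset_version=13,
    dynamic_axes={"image": {0: "batch"}},
)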

the file train.json

Hello, I would like to ask how the contents of the file train.json are generated. I see that the entries in the file are not ordered by image index; will this affect the final result?

Offset in defect localization

While applying UniAD to a real project at my company, I found that there is always a certain offset between the defect location and the heat map, as shown in the screenshots below.
[screenshots omitted]

I tried removing all of the NMA and adjusting the backbone before retraining, but the positional offset is still there. I have thought about this for a long time and do not know which direction to work in. Could you advise where the problem might be?

tokenized the feature map

Thank you for your great work, I have a question.

In your paper, you said that you tokenize the feature map into H × W feature tokens with C_org channels, but I can't find the tokenization code in the code you provided. Could you please explain where I can find this part of the code and how the tokenization is performed?
Also, may I ask why you use the ModelHelper class instead of a UniAD model class?

detection results with AUROC metric

Hello, from your code and test results, I guess you used max as the metric for image-level anomaly detection. I'm a little confused: is the max value used as the score from which the AUROC is computed?
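
In other words, max is the per-image score, and AUROC is then computed over those scores; a minimal sketch of that relation (variable names are illustrative assumptions):

import numpy as np
from sklearn.metrics import roc_auc_score

def detection_auroc(anomaly_maps, labels):
    """Image-level AUROC: the max of each anomaly map is the image score,
    and AUROC is computed over those scores against the image labels."""
    scores = np.array([m.max() for m in anomaly_maps])  # "max" post-processing
    return roc_auc_score(np.asarray(labels), scores)    # labels: 1 = anomalous, 0 = normal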

torch.distributed.launch is deprecated

home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,

It ran fine before, but this problem suddenly appeared today. Please kindly take a look, thank you.

RuntimeError: CUDA error: no kernel image is available for execution on the device

[2023-05-19 15:47:23,443][       utils.py][line: 740][    INFO]  not exist, load from https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b4-6ed6700e.pth
[2023-05-19 15:47:24,572][       utils.py][line: 761][    INFO] Loaded ImageNet pretrained efficientnet-b4
[2023-05-19 15:47:36,139][   train_val.py][line:  90][    INFO] layers: ['backbone', 'neck', 'reconstruction']
[2023-05-19 15:47:36,140][   train_val.py][line:  91][    INFO] active layers: ['reconstruction', 'neck']
=> loading checkpoint './checkpoints/ckpt.pth.tar'
[2023-05-19 15:47:41,654][custom_dataset.py][line:  36][    INFO] building CustomDataset from: ../../data/MVTec-AD/train.json
[2023-05-19 15:47:41,667][custom_dataset.py][line:  36][    INFO] building CustomDataset from: ../../data/MVTec-AD/test.json
Traceback (most recent call last):
  File "../../tools/train_val.py", line 329, in <module>
    main()
  File "../../tools/train_val.py", line 125, in main
    validate(val_loader, model)
  File "../../tools/train_val.py", line 269, in validate
    outputs = model(input)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 445, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/NFS/workspaces/hjha/dev/UniAD/models/model_helper.py", line 49, in forward
    output = submodule(input)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/NFS/workspaces/hjha/dev/UniAD/models/backbones/efficientnet/model.py", line 392, in forward
    x, feat_dict, _ = self.extract_features(image)
  File "/NFS/workspaces/hjha/dev/UniAD/models/backbones/efficientnet/model.py", line 358, in extract_features
    x = self._swish(self._bn0(self._conv_stem(inputs)))
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/NFS/workspaces/hjha/dev/UniAD/models/backbones/efficientnet/utils.py", line 101, in forward
    x = x * torch.sigmoid(x)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/usr/local/bin/python', '-u', '../../tools/train_val.py', '--local_rank=0', '-e']' returned non-zero exit status 1.

Intuition or Precise basis of the LQD design

Hi, thanks for sharing your awesome work.
The code is really readable and I learned a lot from your modularization.

I have few questions about your intuition in designing LQD.

1. Multiple query embeddings

The paper says that each decoder layer has its own learnable query embedding to 'intensify the use of query embedding'.
I visualized each layer's learned embedding, but I cannot understand how using a distinct query embedding in each layer helps intensify the use of the query embedding. The figure below is what I visualized:
[screenshot of the visualized per-layer query embeddings omitted]
What does each query embedding actually learn, and what intuition led you to this design?

2. Connection of LQD

In a vanilla transformer, the query embedding first passes through a self-attention layer and then a cross-attention layer.
In LQD, each query embedding first goes through cross-attention with the encoder output, and then another cross-attention with the previous decoder output.
What makes this design so effective for anomaly detection?
I first thought the design helps the query embedding imitate the encoder output by repeatedly reminding it of the encoder output and the previous output, but I think I was wrong after visualizing the encoded tokens.
[screenshot of the visualized encoder tokens omitted]
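
To keep the wording above concrete, here is a hedged sketch of the layer flow exactly as described in this question; module names and details are illustrative assumptions, not the repo's actual implementation.

import torch.nn as nn

class LQDLayerSketch(nn.Module):
    """Hedged sketch of one decoder layer as described above: the layer's own
    learnable query first cross-attends to the encoder output, then to the
    previous decoder layer's output. Not the repository's actual code."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn_encoder = nn.MultiheadAttention(dim, num_heads)
        self.attn_previous = nn.MultiheadAttention(dim, num_heads)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, query_embed, encoder_out, prev_decoder_out):
        # (1) the layer-specific learnable query attends to the encoder output
        x, _ = self.attn_encoder(query_embed, encoder_out, encoder_out)
        # (2) the result attends to the previous decoder layer's output
        x, _ = self.attn_previous(x, prev_decoder_out, prev_decoder_out)
        # (3) feed-forward with a residual connection
        return x + self.ffn(x)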

I would really appreciate any help in understanding your work!

Thanks.

cv2.error: OpenCV(3.4.10)

Thanks for your excellent work. I still have a question as follows:
cv2.error: OpenCV(3.4.10) /tmp/pip-req-build-806eaxqo-opencv/modules/imgproc/src/color.cpp:182........

Reproduce ResNet Result

Hello~ Thanks for the nice work and code. I was trying to reproduce the (image/pixel) AUROC for ResNet50 and ResNet101, which are (92.4/96) and (92.2/95.9) in the paper, respectively. However, with the provided config file in the MVTec folder and the suggested layer changes (https://github.com/zhiyuanyou/UniAD/blob/main/experiments/MVTec-AD/config.yaml#L89), the reproduced results for ResNet50 and ResNet101 are (86.6/94.1) and (87.19/94.66). Could you please provide the config files and checkpoints for ResNet50 and ResNet101? Thank you in advance.

Training category issue

Hello, when using the model to train on my own dataset, I would like to ask the following questions:
1. Can all multi-category data be trained in a single run, or does data from different categories need to be placed in different folders?
2. If the feature maps processed by NME and LQD are decoded into three features from high to low and compared with the original ResNet feature extractor, will that lead to inaccurate results?

AdamW bug

[2022-10-10 10:01:45,308][cifar_dataset.py][line: 35][ INFO] building CustomDataset from: ../../../data/CIFAR-10/
/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:370: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
"please use get_last_lr().", UserWarning)
/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/nn/functional.py:3503: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
"The default behavior for interpolate/upsample with float scale_factor changed "
Traceback (most recent call last):
  File "../../../tools/train_val.py", line 329, in <module>
    main()
  File "../../../tools/train_val.py", line 143, in main
    frozen_layers,
  File "../../../tools/train_val.py", line 221, in train_one_epoch
    optimizer.step()
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/optim/optimizer.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/optim/adamw.py", line 117, in step
    beta1,
UnboundLocalError: local variable 'beta1' referenced before assignment
Killing subprocess 3774
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)

How to test with my own dataset

Thank you very much for sharing; your innovative work is refreshing. I tested it on MVTec before and the results were very good. However, I ran into some difficulties when trying to test on a dataset I created myself. Could you share how to modify the code so it can run detection on my own data? Many thanks!

A small question about config.yaml

Hello, author.
In config.yaml, what does port: 11111 mean?
Also, if input_size is changed, e.g. to 512x512, do pixel_mean and pixel_std need to be modified?
input_size: [224,224] # [h,w]
pixel_mean: [0.485, 0.456, 0.406]
pixel_std: [0.229, 0.224, 0.225]
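
For context, these pixel_mean and pixel_std values are the standard ImageNet normalization statistics, which do not depend on the image resolution; a hedged torchvision-style sketch (assuming the repo normalizes with ImageNet statistics, which the defaults above suggest) showing that a larger input_size only changes the resize step:

from torchvision import transforms

# Hedged sketch: resize to a larger input while keeping the ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((512, 512)),  # example of a larger input_size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])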

How to get 128x128 cifar10 images?

If I simply resize the images with bicubic interpolation, the resized images contain a lot of grid (checkerboard) noise. I also found a 128x128 version of CIFAR-10 produced via super-resolution at https://www.kaggle.com/datasets/joaopauloschuler/cifar10-128x128-resized-via-cai-super-resolution, but the SR images also contain many other kinds of artifacts. I guess all these artifacts will affect the AD performance.
Do you have a similar problem? How did you get the 128x128 images for CIFAR-10, whose original size is 32x32?
Thx~
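
For reference, the plain bicubic upsampling described above would look roughly like this torchvision sketch (the file path is a placeholder, and whether the authors used this exact approach is not stated here):

from PIL import Image
from torchvision import transforms

# Hedged sketch of plain bicubic upsampling from 32x32 to 128x128.
to_128 = transforms.Resize((128, 128), interpolation=transforms.InterpolationMode.BICUBIC)
img_128 = to_128(Image.open("cifar_sample.png"))  # placeholder image path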

compared to PatchCore

Do you compare your method with PatchCore? It is the SOTA of anomaly detection reported by the MVTec AD benchmark.

Multi-gpu bug: Address already in use

About torch.distributed.launch: sh train_torch.sh #NUM_GPUS.
It works well with 1 GPU.
However, it shows RuntimeError: Address already in use with multiple GPUs.
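
One hedged way to avoid the collision is to pick an unused rendezvous port per run; whether this repo takes the port from the port field in config.yaml or from an environment variable is an assumption.

import socket

# Hedged helper: ask the OS for a currently unused TCP port, then use that value
# as the distributed rendezvous port for the run.
def pick_free_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))              # port 0 lets the OS choose a free port
        return s.getsockname()[1]

if __name__ == "__main__":
    print(pick_free_port())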

about the vis_decoder

Hello, author, thank you very much for your work. I encountered some problems with vis_decoder while reproducing your work. In the paper, the vis_decoder model is used to view the reconstructed images, which is essentially the reverse process of ResNet. My question is: how can the anomalies be eliminated through this reverse process? Could I ask your advice on this? I would appreciate it.

About the color jitter

Hi! Thanks for releasing the code of this great work. I have a question about the color jitter. Why is it only added in training but not in validation? I thought it was added to avoid reconstructing inputs through shortcuts. However, if it is only added in the training phase, it is more likely to be an augmentation.

About training

Thanks for your excellent work.
I still have a few questions:

  1. Does the learning rate need to be modified when switching to a different number of GPUs?
  2. If I want to use UniAD to train a single class, in order to verify Tables 1 and 2 in the paper, do I only need to restrict the input data to a single class? I did this, but could not achieve the reported result for some classes; for example, on screw I only got 84.6 detection AUROC.

Some questions about using the model

Hello. The config.yaml in the vis_query folder contains:
saver:
  load_path: ../train_vis_decoder/{class_name}/checkpoints/ckpt.pth.tar
  save_dir: result_vis_query
  log_dir: log

vis_query:
  model_path: ../MVTec-AD/checkpoints/ckpt.pth.tar
  hidden_dim: 256
  num_decoder_layers: 4
  with_text: True

What is the role of load_path under saver?
After training, if I want to predict on a single image, which file's interface can I use?

anomaly localization results are close, but anomaly detection results are different.

Using the configuration you have given

./experiments/MVTec-AD/config.yaml

the configuration does not contain

  metrics:
    auc:
      - name: mean

I added it and tried to reproduce it

mean AUROC seems a bit low, especially for screw

Are there any tips for training? or anything I should be aware of?

clsname mean pixel max std
capsule 0.83566 0.98597 0.913841 0.871959
bottle 0.988095 0.97981 1 1
toothbrush 0.888889 0.983198 0.936111 0.972222
screw 0.539045 0.987008 0.913302 0.947325
transistor 0.919167 0.982308 0.99875 0.99625
wood 0.961404 0.930861 0.985965 0.980702
tile 0.994949 0.919262 0.993506 0.997835
hazelnut 0.995357 0.980919 1 0.997857
leather 1 0.987544 1 1
pill 0.832242 0.960712 0.945717 0.875614
grid 0.951546 0.972235 0.988304 0.993317
metal_nut 0.882209 0.932279 0.995112 0.969697
zipper 0.985557 0.974935 0.979254 0.982668
cable 0.923913 0.974855 0.957271 0.958958
carpet 0.886437 0.983814 0.997994 0.998796
mean 0.905631 0.969047 0.973675 0.969547

Error: Exception has occurred: KeyError 'RANK'

When I run tools/train_val.py, the error occurs at rank = int(os.environ["RANK"]) in dist_helper.py.
The error is reported as follows.
Exception has occurred: KeyError
'RANK'

During handling of the above exception, another exception occurred:

File "D:\Python\UniAD\utils\dist_helper.py", line 31, in setup_distributed
rank = int(os.environ["RANK"])
File "D:\Python\UniAD\tools\train_val.py", line 61, in main
rank, world_size = utils.dist_helper.setup_distributed(port=config.port)
File "D:\Python\UniAD\tools\train_val.py", line 343, in
main()

How to solve this problem?
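
A likely cause, as a guess, is running the script directly instead of through torch.distributed.launch or torchrun, so the launcher-set environment variables are missing; a hedged single-process fallback sketch, placed before the distributed setup runs:

import os

# Hedged single-process fallback: provide the variables a launcher
# (torch.distributed.launch / torchrun) would normally export.
# Values are illustrative defaults, not the repo's settings.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("LOCAL_RANK", "0")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")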

test_torch.sh

Hello, could you upload the test file? Thank you.

TypeError: efficientnet_b4() got an unexpected keyword argument 'outlayers'

Hello,
I'm trying to execute and understand your code, and I get the following error:
TypeError: efficientnet_b4() got an unexpected keyword argument 'outlayers'

In config.yaml there is:
- name: backbone
  type: models.backbones.efficientnet_b4
  frozen: True
  kwargs:
    pretrained: True
    # select outlayers from: resnet [1,2,3,4], efficientnet [1,2,3,4,5]
    # empirically, for industrial: resnet [1,2,3] or [2,3], efficientnet [1,2,3,4] or [2,3,4]
    outlayers: [1,2,3,4]

However, when looking in the efficientnet directory, no file contains "outlayers". Where is this argument defined?

Could you help me solve this error?
