implus / mae_segmentation


Reproduction of semantic segmentation using a masked autoencoder (MAE)

Python 99.42% Shell 0.58%

mae self-supervised-learning semantic-segmentation masked-autoencoder vit vision-transformer

mae_segmentation's Introduction

ADE20k Semantic segmentation with MAE

Getting started

Install the mmsegmentation library and some required packages.

pip install mmcv-full==1.3.0 mmsegmentation==0.11.0
pip install scipy timm==0.3.2

Install apex for mixed-precision training:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Follow the dataset preparation guide in mmsegmentation to prepare the ADE20k dataset.

Fine-tuning for Reproducing Results of MAE ViT-Base

Command:

tools/dist_train.sh configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py 8 --seed 0  --options model.pretrained=https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth

Expected results from the log (paper result: 48.1 mIoU):

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 48.15 | 58.99 | 83.05 |
+--------+-------+-------+-------+
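The three columns in this table are standard segmentation metrics. As a rough illustration only (this is not the mmsegmentation implementation), they can be derived from a per-class confusion matrix. The function name segmentation_metrics, the toy 2-class matrix, and the choice to skip classes that never appear are assumptions made for this sketch.

```python
from typing import List, Tuple


def segmentation_metrics(conf: List[List[int]]) -> Tuple[float, float, float]:
    """Return (mIoU, mAcc, aAcc) from a square confusion matrix.

    conf[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))

    ious, accs = [], []
    for i in range(n):
        tp = conf[i][i]
        gt = sum(conf[i])                         # pixels labelled as class i
        pred = sum(conf[r][i] for r in range(n))  # pixels predicted as class i
        union = gt + pred - tp
        if union > 0:   # assumption: skip classes absent from both label and prediction
            ious.append(tp / union)
        if gt > 0:      # assumption: skip classes with no ground-truth pixels
            accs.append(tp / gt)

    miou = sum(ious) / len(ious)   # mean of per-class intersection over union
    macc = sum(accs) / len(accs)   # mean of per-class pixel accuracy
    aacc = correct / total         # overall pixel accuracy
    return miou, macc, aacc


if __name__ == "__main__":
    # Hypothetical 2-class example, not taken from the ADE20k results.
    miou, macc, aacc = segmentation_metrics([[8, 2], [1, 9]])
    print(f"mIoU {miou:.4f}  mAcc {macc:.4f}  aAcc {aacc:.4f}")
```

The log reports these values as percentages, so 48.15 mIoU corresponds to 0.4815 on the 0 to 1 scale used here.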

Evaluation

Command format:

tools/dist_test.sh  <CONFIG_PATH> <CHECKPOINT_PATH> <NUM_GPUS> --eval mIoU

Acknowledgment

This code is built using the mmsegmentation library, Timm library, the Swin repository, XCiT, SETR, BEiT and the MAE repository.


mae_segmentation's Issues

Paper results use 100 epochs ~= 126k iterations. These results use 160k iterations?

The paper reports results for 100 epochs of training with a batch size of 16. For the 20,210 ADE20k training images, this is 20,210 × 100 / 16 ≈ 126k iterations. I noticed your results use 160k iterations. Any idea whether training for 100 epochs also reproduces the results?
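The arithmetic in this question can be checked with a few lines of Python. The image count and batch size are the ones quoted in the issue; that the 160k-iteration schedule also uses a total batch size of 16 is an assumption, and the helper names are invented for this sketch.

```python
NUM_TRAIN_IMAGES = 20_210  # ADE20k training images, as quoted in the issue
BATCH_SIZE = 16            # total batch size, as quoted in the issue (assumed for both schedules)


def epochs_to_iters(epochs: float) -> float:
    """Number of optimizer iterations needed to see the dataset `epochs` times."""
    return epochs * NUM_TRAIN_IMAGES / BATCH_SIZE


def iters_to_epochs(iters: float) -> float:
    """Number of passes over the dataset covered by `iters` iterations."""
    return iters * BATCH_SIZE / NUM_TRAIN_IMAGES


if __name__ == "__main__":
    print(epochs_to_iters(100))                 # 126312.5
    print(round(iters_to_epochs(160_000), 1))   # 126.7
```

Under that assumption, 160k iterations correspond to roughly 127 epochs, about a quarter more training than the 100 epochs mentioned for the paper.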

About the Pre-trained Model

Hi @implus, thanks for the nice work of reproducing the segmentation results of MAE!

I checked the log you provided and noticed that the unexpected keys are norm.weight and norm.bias:
https://github.com/implus/mae_segmentation/blob/main/log/20220131_012835.log#L229

Does this mean that the pre-trained model is first fine-tuned on ImageNet-1K and then loaded as the backbone for segmentation?
Is this a common practice for self-supervised methods?
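For context, "unexpected keys" in a PyTorch checkpoint-loading report are parameter names that exist in the checkpoint but not in the model being loaded, and "missing keys" are the reverse. The sketch below reproduces that comparison with plain sets. Apart from norm.weight and norm.bias, every key name here is made up for illustration, and the sketch does not settle which checkpoint produced the log.

```python
from typing import Iterable, List, Tuple


def compare_keys(
    checkpoint_keys: Iterable[str], model_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) in the sense of a non-strict state-dict load.

    missing:    keys the model expects but the checkpoint does not provide.
    unexpected: keys the checkpoint provides but the model does not have.
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)


if __name__ == "__main__":
    # Hypothetical key lists; only norm.weight / norm.bias come from the issue.
    checkpoint = [
        "patch_embed.proj.weight",
        "blocks.0.attn.qkv.weight",
        "norm.weight",
        "norm.bias",
    ]
    backbone = [
        "patch_embed.proj.weight",
        "blocks.0.attn.qkv.weight",
        "fpn1.0.weight",  # hypothetical segmentation-only layer
    ]
    missing, unexpected = compare_keys(checkpoint, backbone)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

In this toy case the two norm entries are reported as unexpected simply because the target model has no layer with those names, which is one possible reading of the log line.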

Finetune the model with single gpu

Hi, thank you for the fantastic work.

Can you please provide the code which runs on a single GPU?

Thank you

model weight

Dear,

Thanks for your great work!

With your offered code and hyper-parameters, I get the results as follows:

2022-10-01 04:03:09,229 - mmseg - INFO - Iter(val) [16000]      mIoU: 0.3869, mAcc: 0.5037, aAcc: 0.8005
2022-10-01 05:56:34,575 - mmseg - INFO - Iter(val) [32000]      mIoU: 0.4353, mAcc: 0.5557, aAcc: 0.8148
2022-10-01 07:49:43,813 - mmseg - INFO - Iter(val) [48000]      mIoU: 0.4535, mAcc: 0.5794, aAcc: 0.8188
2022-10-01 09:42:33,149 - mmseg - INFO - Iter(val) [64000]      mIoU: 0.4523, mAcc: 0.5758, aAcc: 0.8216
2022-10-01 11:35:34,234 - mmseg - INFO - Iter(val) [80000]      mIoU: 0.4655, mAcc: 0.5783, aAcc: 0.8256
2022-10-01 13:28:33,442 - mmseg - INFO - Iter(val) [96000]      mIoU: 0.4648, mAcc: 0.5726, aAcc: 0.8279
2022-10-01 15:21:28,416 - mmseg - INFO - Iter(val) [112000]     mIoU: 0.4678, mAcc: 0.5798, aAcc: 0.8252
2022-10-01 17:14:35,033 - mmseg - INFO - Iter(val) [128000]     mIoU: 0.4683, mAcc: 0.5806, aAcc: 0.8270
2022-10-01 19:07:43,025 - mmseg - INFO - Iter(val) [144000]     mIoU: 0.4729, mAcc: 0.5804, aAcc: 0.8279
2022-10-01 21:00:48,207 - mmseg - INFO - Iter(val) [160000]     mIoU: 0.4758, mAcc: 0.5841, aAcc: 0.8293

This seems slightly lower than your result.

Could you provide the model weights that reach 48.1% mIoU?
Sincerely.
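To quantify the gap, the validation lines can be parsed directly. The following is a small sketch that assumes the line format shown in this issue; the regular expression and helper name are written for this example and are not part of mmsegmentation. Only three of the ten log lines are embedded to keep it short.

```python
import re
from typing import List, Tuple

# Three validation lines copied from the issue above.
LOG = """\
2022-10-01 04:03:09,229 - mmseg - INFO - Iter(val) [16000]      mIoU: 0.3869, mAcc: 0.5037, aAcc: 0.8005
2022-10-01 19:07:43,025 - mmseg - INFO - Iter(val) [144000]     mIoU: 0.4729, mAcc: 0.5804, aAcc: 0.8279
2022-10-01 21:00:48,207 - mmseg - INFO - Iter(val) [160000]     mIoU: 0.4758, mAcc: 0.5841, aAcc: 0.8293
"""

VAL_LINE = re.compile(
    r"Iter\(val\) \[(\d+)\]\s+mIoU: ([\d.]+), mAcc: ([\d.]+), aAcc: ([\d.]+)"
)


def parse_val_lines(text: str) -> List[Tuple[int, float, float, float]]:
    """Extract (iteration, mIoU, mAcc, aAcc) tuples from validation log lines."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in VAL_LINE.finditer(text)
    ]


if __name__ == "__main__":
    rows = parse_val_lines(LOG)
    best_iter, best_miou, _, _ = max(rows, key=lambda r: r[1])
    reference = 48.15  # mIoU from the expected-results table in the README
    gap = round(reference - 100 * best_miou, 2)
    print(f"best mIoU {100 * best_miou:.2f} at iteration {best_iter}, gap {gap:.2f}")
```

On these lines the best checkpoint is the final one, at 47.58 mIoU, which is 0.57 points below the 48.15 reported in the README.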

Visualization or demo?

Thanks for your effort in sharing this excellent work. Can you please provide a demo of applying the pre-trained models to custom images?
Does the network apply masking during training and testing like MAE?

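Error when running the training script; not sure whether the torch version is the cause

Hello, which torch version does the project code run on?

I am using torch==1.10.0 and get the error below. I changed the continuous operation to clone and set relu to inplace=False, but the same error is still reported (I set use_fp16=False during training).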

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 768, 32, 32]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).