
Polyp-PVT

by Bo Dong, Wenhai Wang, Jinpeng Li, Deng-Ping Fan.

This repo is the official implementation of "Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers".

1. Introduction

Polyp-PVT was initially described in an arXiv paper.

Most polyp segmentation methods use CNNs as their backbone, which leads to two key issues when exchanging information between the encoder and decoder: 1) accounting for the differing contributions of features at different levels; and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three novel modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). The CFM collects the semantic and location information of polyps from high-level features, while the CIM captures polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area, together with high-level semantic position information, to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capability.
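
To make the data flow concrete, here is a minimal, hypothetical sketch (not the official implementation): it assumes a PVTv2-style encoder that emits four feature levels f1..f4, and it replaces the real CFM, CIM, and SAM with stand-in convolutions purely to show where each module sits in the pipeline.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolypPVTFlowSketch(nn.Module):
    def __init__(self, chans=(64, 128, 320, 512), mid=32):
        super().__init__()
        # stand-ins only: the real CFM/CIM/SAM are far more elaborate
        self.cim = nn.Conv2d(chans[0], mid, 3, padding=1)        # low-level polyp cues
        self.cfm = nn.Conv2d(sum(chans[1:]), mid, 3, padding=1)  # high-level semantics + location
        self.sam = nn.Conv2d(2 * mid, mid, 3, padding=1)         # cross-level fusion
        self.head = nn.Conv2d(mid, 1, 1)                         # segmentation logits

    def forward(self, f1, f2, f3, f4):
        size = f1.shape[2:]
        high = torch.cat([F.interpolate(f, size=size, mode='bilinear', align_corners=False)
                          for f in (f2, f3, f4)], dim=1)
        high = self.cfm(high)                             # CFM: collect high-level information
        low = self.cim(f1)                                # CIM: polyp cues hidden in low-level features
        fused = self.sam(torch.cat([low, high], dim=1))   # SAM: spread semantics over the polyp area
        return self.head(fused)

feats = [torch.randn(1, c, 88 // 2 ** i, 88 // 2 ** i) for i, c in enumerate((64, 128, 320, 512))]
print(PolypPVTFlowSketch()(*feats).shape)   # torch.Size([1, 1, 88, 88])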

Polyp-PVT achieves strong performance on image-level polyp segmentation (0.808 mean Dice and 0.727 mean IoU on ColonDB) and video polyp segmentation (0.880 mean Dice and 0.802 mean IoU on CVC-300-TV), surpassing previous models by a large margin.

2. Framework Overview

3. Results

3.1 Image-level Polyp Segmentation

3.2 Image-level Polyp Segmentation Comparison Results:

We also provide the results of several baseline methods, which you can download from Google Drive/Baidu Drive [code:qw9i]; the archive includes our results and those of the compared models.

3.3 Video Polyp Segmentation

3.4 Video Polyp Segmentation Comparison Results:

We also provide the results of several baseline methods, which you can download from Google Drive/Baidu Drive [code:rtvt]; the archive includes our results and those of the compared models.

4. Usage:

4.1 Recommended environment:

Python 3.8
PyTorch 1.7.1
torchvision 0.8.2

4.2 Data preparation:

Download the training and testing datasets from Google Drive/Baidu Drive [code:sydz] and move them into ./dataset/.
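
As a quick sanity check after unpacking, the snippet below walks the expected folder layout. The TrainDataset/TestDataset names and the images/ and masks/ sub-folders are assumptions inferred from the test loader shown later on this page; adapt the names to what you actually downloaded.

import os

def check_dataset(root='./dataset'):
    for split in ('TrainDataset', 'TestDataset'):
        split_path = os.path.join(root, split)
        if not os.path.isdir(split_path):
            print('missing:', split_path)
            continue
        for dirpath, _, filenames in os.walk(split_path):
            # each leaf should contain paired images/ and masks/ folders
            if os.path.basename(dirpath) in ('images', 'masks'):
                print('{:60s} {:5d} files'.format(dirpath, len(filenames)))

if __name__ == '__main__':
    check_dataset()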

4.3 Pretrained model:

Download the pretrained model from Google Drive/Baidu Drive [code:w4vk] and put it in the './pretrained_pth' folder for initialization.
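
A hedged example of how the downloaded backbone weights might be loaded before training. The checkpoint file name 'pvt_v2_b2.pth' and the 'model.backbone' attribute are assumptions, so adapt them to the file you downloaded and to the model definition in your checkout.

import torch

def load_backbone_weights(model, ckpt_path='./pretrained_pth/pvt_v2_b2.pth'):
    # strict=False tolerates checkpoint keys (e.g. an ImageNet classification head)
    # that the segmentation model does not use
    state = torch.load(ckpt_path, map_location='cpu')
    missing, unexpected = model.backbone.load_state_dict(state, strict=False)
    print('missing keys: %d | unexpected keys: %d' % (len(missing), len(unexpected)))
    return model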

4.4 Training:

Clone the repository and start training:

git clone https://github.com/DengPingFan/Polyp-PVT.git
cd Polyp-PVT 
bash train.sh

4.5 Testing:

cd Polyp-PVT 
bash test.sh

4.6 Evaluating your trained model:

MATLAB: please refer to the MICCAI 2020 evaluation toolbox (link).

Python: please refer to the ACM MM 2021 evaluation toolbox (link).

Please note that we used the MATLAB version for the evaluation reported in our paper.

4.7 Well-trained model:

You can download the trained model from Google Drive/Baidu Drive [code:9rpy] and put it in the './model_pth' directory.

4.8 Pre-computed maps:

Google Drive/Baidu Drive [code:x3jc]

5. Citation:

@article{dong2023PolypPVT,
  title={Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers},
  author={Dong, Bo and Wang, Wenhai and Fan, Deng-Ping and Li, Jinpeng and Fu, Huazhu and Shao, Ling},
  journal={CAAI AIR},
  year={2023}
}

6. Acknowledgement

We are very grateful to the excellent works PraNet, EAGRNet, and MSEG, which provided the basis for our framework.

7. FAQ:

If you have suggestions for improving usability, or any other advice, please feel free to contact me directly ([email protected]).

8. License

The source code is free for research and education use only. Any commercial use requires formal permission first.


Polyp-PVT Issues

Selection of the best weights

Hello, I would like to ask how the final best weights were selected. Were they chosen according to the combined test set, or selected separately for each individual test dataset?

Question about the channel selection in SAM

Hello, I noticed this fantastic work and have the following question:
In SAM, this work applies "a Softmax function on the channel dimension of T2 and chooses the second channel as the attention map". I wonder why the second channel is chosen.
I have already read the related paper "Edge-aware graph representation learning and reasoning for face parsing", but I still have no idea. I would appreciate it if you could give me an answer!
Best regards
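
For readers following this thread, a toy illustration of the step in question: if the two channels of T2 are read as (background, foreground) logits, then after a Softmax over the channel dimension the second channel is the foreground probability, which is why it can serve as the attention map. This reading is an interpretation, not a statement from the authors.

import torch

t2 = torch.randn(1, 2, 4, 4)          # B x 2 x H x W logits (toy values)
prob = torch.softmax(t2, dim=1)       # normalize across the two channels
attn = prob[:, 1:2, :, :]             # second channel = foreground probability
print(attn.shape)                     # torch.Size([1, 1, 4, 4])
print(torch.allclose(prob.sum(dim=1), torch.ones(1, 4, 4)))  # True: channels sum to 1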

evaluation

Hello, I see that Train.py only computes the mDice metric. Could you please share the code for the other evaluation metrics used during model training?

import os
import numpy as np
import torch.nn.functional as F
from utils.dataloader import test_dataset  # dataloader import path as in the repo's Train.py (assumption)


def test(model, path, dataset):
    data_path = os.path.join(path, dataset)
    image_root = '{}/images/'.format(data_path)
    gt_root = '{}/masks/'.format(data_path)
    model.eval()
    num1 = len(os.listdir(gt_root))
    test_loader = test_dataset(image_root, gt_root, 352)  # 352 = test image size
    DSC = 0.0
    for i in range(num1):
        image, gt, name = test_loader.load_data()
        gt = np.asarray(gt, np.float32)
        gt /= (gt.max() + 1e-8)
        image = image.cuda()

        res, res1 = model(image)
        # eval Dice: upsample the summed predictions to the ground-truth resolution
        res = F.upsample(res + res1, size=gt.shape, mode='bilinear', align_corners=False)
        res = res.sigmoid().data.cpu().numpy().squeeze()
        res = (res - res.min()) / (res.max() - res.min() + 1e-8)
        input = res
        target = np.array(gt)
        smooth = 1
        input_flat = np.reshape(input, (-1))
        target_flat = np.reshape(target, (-1))
        intersection = (input_flat * target_flat)
        dice = (2 * intersection.sum() + smooth) / (input.sum() + target.sum() + smooth)
        DSC = DSC + float('{:.4f}'.format(dice))

    return DSC / num1
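
For reference, one hedged way to log an additional metric such as mIoU inside the loop above; this helper is illustrative and is not taken from the authors' code (the numbers reported in the paper come from the MATLAB toolbox mentioned in Section 4.6).

import numpy as np

def iou_score(pred, gt, thr=0.5, smooth=1e-8):
    # threshold the normalized prediction and compare with the binarized ground truth
    pred_bin = (pred > thr).astype(np.float32).reshape(-1)
    gt_bin = (gt > thr).astype(np.float32).reshape(-1)
    inter = (pred_bin * gt_bin).sum()
    union = pred_bin.sum() + gt_bin.sum() - inter
    return (inter + smooth) / (union + smooth)

# inside the for-loop of test():  IOU = IOU + iou_score(res, gt)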

PVT V2 implementation

Hi @DengPingFan

Did you check the implementation of PVT v2? I need a classification head in the forward pass so that I can apply some classification loss functions, but you have commented out that line. Could you please tell me how to solve this?

About the pretrained model

Hello, I saw in the paper that you compared against the ResUNet++ network. Could you share the pretrained model for that network? I have tried for a long time without success, and I would like to run a comparison experiment.

Visualization results

In the visualization results, how are the red, green, and yellow regions drawn?

Could you give FPS or FLOPs figures for Polyp-PVT? I tested the backbone and it is quite slow.

import time
import torch
# the import path for pvt_v2_b0 is an assumption; use the module that defines it in this repo
from lib.pvtv2 import pvt_v2_b0

if __name__ == "__main__":
    a = torch.randn(1, 3, 512, 512).cuda()
    backbone = pvt_v2_b0().cuda()
    start = time.time()
    out = backbone(a)
    end = time.time() - start
    print('each image use %5f seconds, and image size is 512' % end)
    print([i.shape for i in out])

each image use 0.374312 seconds, and image size is 512
[torch.Size([1, 32, 128, 128]), torch.Size([1, 64, 64, 64]), torch.Size([1, 160, 32, 32]), torch.Size([1, 256, 16, 16])]
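
Part of the measured latency above is likely warm-up: CUDA kernels run asynchronously, so timing a single cold forward pass mostly measures setup cost rather than throughput. A hedged sketch of a more stable measurement follows (the model is passed in, e.g. the pvt_v2_b0 instance from the snippet above); the thop mention at the end assumes that third-party package is installed.

import time
import torch

def benchmark(model, size=512, runs=50, warmup=10):
    x = torch.randn(1, 3, size, size).cuda()
    model = model.cuda().eval()
    with torch.no_grad():
        for _ in range(warmup):            # warm-up: exclude one-off setup costs
            model(x)
        torch.cuda.synchronize()           # wait for queued kernels before timing
        start = time.time()
        for _ in range(runs):
            model(x)
        torch.cuda.synchronize()
    avg = (time.time() - start) / runs
    print('avg %.5f s/image (%.1f FPS) at %dx%d' % (avg, 1.0 / avg, size, size))

# FLOPs/params can be estimated with a third-party tool such as thop (assumption):
#   from thop import profile
#   macs, params = profile(model, inputs=(torch.randn(1, 3, 352, 352).cuda(),))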

Request for the baseline code used in the paper

Hi, thank you for your excellent work.
The baseline you compare against in the paper comes from "PVTv2: Improved Baselines with Pyramid Vision Transformer", which I have tried to reproduce many times without success, and I could not find this part of the code in the project. Could you provide the PVTv2 baseline code that you used?
Thanks!

A few questions about Train.py

The following questions refer to the latest version of Train.py.

  1. Line 110: (epoch + 1) % 1 == 0
    Any integer modulo 1 is 0. If the intention is to compare the mDice of the model after every training epoch, why use this modulo operation at all? I have looked at this line repeatedly and it seems unnecessary; perhaps I have misunderstood your intention. Please advise!

  2. Line 215: for epoch in range(1, opt.epoch):
    The paper says the model is trained for 100 epochs, but this loop runs from 1 to 99. Does that mean the model is actually trained for only 99 epochs?
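
Both observations can be verified with a couple of lines (illustration only, not the authors' fix):

# Question 1: any integer modulo 1 is 0, so the condition is always true.
print(all((epoch + 1) % 1 == 0 for epoch in range(1, 100)))    # True

# Question 2: range(1, 100) yields 99 values (1..99); using opt.epoch + 1 as the
# upper bound would give the full 100 epochs.
print(len(list(range(1, 100))), len(list(range(1, 100 + 1))))  # 99 100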
