roni-lab / ppsnet

PPSNet: Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos (ECCV, 2024)

Home Page: https://ppsnet.github.io/

clinical-data depth depth-estimation depth-refinement endoscopic-vision endoscopy endoscopy-video inverse-rendering knowledge-distillation monocular-depth-estimation near-field photometric sim2real transfer-learning

ppsnet's Introduction

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

🔥 Please remember to ⭐ this repo if you find it useful, and cite our work if you end up using it! 🔥

🔥 If you have any questions or concerns, please create an issue 📝! 🔥

Pre-print | Project Website

📖 Abstract

Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/.

🔧 Setup

STEP 1: bash setup.sh

STEP 2: conda activate ppsnet

STEP 3: pip3 install -r requirements.txt

STEP 4: Install PyTorch using the command below:

pip3 install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118

The exact versions may vary depending on your computing environment and which GPUs you have access to. See this article for maintaining multiple system-level versions of CUDA.
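After installing, a quick sanity check (an illustrative helper, not part of this repo) confirms that the installed PyTorch build can see your GPU:

```python
def torch_cuda_status():
    """Return a short description of the installed torch/CUDA state.

    Illustrative helper only -- not part of the PPSNet codebase.
    """
    try:
        import torch
        return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"
    except ImportError:
        return "torch not installed"

print(torch_cuda_status())
```

If CUDA is reported as unavailable, double-check that the `--index-url` wheel matches your installed CUDA runtime.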

STEP 5: Download the C3VD dataset. Our preprocessing steps for the dataset involve performing calibration and undistorting the images (a script for which will be released in the near future). We've provided a validation portion of the dataset in a Google Drive for reference and ease-of-use with this repo's evaluation code. You can download that portion of the dataset here (~29GB). Note the original licensing terms of the C3VD data.

STEP 6: Download the appropriate pre-trained models and place them in a newly created folder called checkpoints/.

💻 Usage

You can evaluate our backbone model using the test_backbone.py script:

python3 test_backbone.py --data_dir /your/path/to/data/dir --log_dir ./your_path_to_log_dir --ckpt ./your_path_to_checkpoint

Similarly, our teacher model and our student model can be evaluated using the test_ppsnet.py script:

python3 test_ppsnet.py --data_dir /your/path/to/data/dir --log_dir ./your_path_to_log_dir --ckpt ./your_path_to_checkpoint

In addition to reporting metrics such as abs_rel and RMSE, both scripts will generate various folders in the specified log_dir containing input images, ground-truth and estimated depths, and percent depth error maps. Please keep an eye on this repo for future updates, including a full release of the training code, the baselines included in the paper, mesh generation and visualization code, and more.
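For reference, the two reported metrics can be sketched in plain Python (a minimal illustration over flat value lists; the evaluation scripts compute these over full depth maps with validity masking):

```python
import math

def depth_metrics(pred, gt, eps=1e-6):
    """abs_rel and RMSE over paired depth values, skipping invalid (<= eps) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > eps]
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))
    return abs_rel, rmse

# Toy example: three predicted depths against ground truth.
abs_rel, rmse = depth_metrics([1.1, 2.0, 2.9], [1.0, 2.0, 3.0])
```

abs_rel penalizes errors relative to the true depth (so near-field mistakes weigh more), while RMSE penalizes large absolute errors.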

📜 Acknowledgments

Thanks to the authors of Depth Anything and NFPS for their wonderful repos with open-source code!

📜 Citation

If you find our paper or this toolbox useful for your research, please cite our work.

@article{paruchuri2024leveraging,
  title={Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos},
  author={Paruchuri, Akshay and Ehrenstein, Samuel and Wang, Shuxian and Fried, Inbar and Pizer, Stephen M and Niethammer, Marc and Sengupta, Roni},
  journal={arXiv preprint arXiv:2403.17915},
  year={2024}
}

ppsnet's People

Contributors: yahskapar

ppsnet's Issues

How to calculate the similarity map between PPS and the input image?

Thanks for your great work on monocular depth estimation for endoscopy. Could you explain how to calculate the similarity map between the PPS and the input RGB image? Fig. 3 of your ECCV paper shows the similarity map between two images; could you give some clues about how to compute it?
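One plausible per-pixel similarity (purely illustrative; the paper's exact formulation may differ) compares a shading value against image intensity at each pixel:

```python
def similarity_map(shading, intensity, eps=1e-6):
    """Per-pixel similarity in [0, 1]: 1 minus the normalized absolute difference.

    A hypothetical sketch, not the authors' implementation; `shading` and
    `intensity` are 2D lists of values in [0, 1].
    """
    return [
        [1.0 - abs(s - i) / (max(s, i) + eps) for s, i in zip(row_s, row_i)]
        for row_s, row_i in zip(shading, intensity)
    ]

sim = similarity_map([[0.5, 0.8]], [[0.5, 0.4]])
```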

Train module

Thanks for your work on monocular depth estimation for endoscopes. I would like to test the performance of this network on endoscope videos with surgical tools and shadows. Could you please release the code for the training module? Thank you!

About SSL for C3VD dataset

Thanks for your great work!

I am also working on self-supervised depth estimation based on the C3VD dataset. One problem is that the distance variation in some of the frames is too small (e.g., nearly no change for about 10 successive frames), so how do you select the frames for the reprojection loss? For example, monodepth2 and AF-SfMLearner choose frame offsets [0, -1, 1]. But when I try [0, -1, 1] for C3VD, it does not converge to the right depths.

Do you select the same frames for all 22 videos or do you have some special selection?

Looking forward to your reply!

Best,
Beilei
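One common heuristic for such near-static sequences (a hypothetical sketch, not the authors' answer) is to skip frames until the camera has moved by a minimum baseline before forming a reprojection pair:

```python
import math

def select_pairs(positions, min_baseline=0.002):
    """Greedily pair frames whose camera translation exceeds min_baseline.

    Hypothetical frame-selection heuristic: positions are per-frame camera
    centers (x, y, z), e.g. from dataset poses. Returns (reference, source)
    index pairs with enough motion to give a usable reprojection signal.
    """
    pairs, last = [], 0
    for i in range(1, len(positions)):
        if math.dist(positions[last], positions[i]) >= min_baseline:
            pairs.append((last, i))
            last = i
    return pairs

# Near-static frame 1 is skipped; frames 0->2 and 2->3 form usable pairs.
pairs = select_pairs([(0, 0, 0), (0, 0, 0.0001), (0, 0, 0.003), (0, 0, 0.006)])
```

The threshold would need tuning per dataset, since an overly small baseline yields the same degenerate supervision as fixed [0, -1, 1] offsets on slow-moving video.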
