Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides, BCNB Dataset

Python 98.84% Shell 1.16%
breast-cancer breast-cancer-classification breast-cancer-prediction axillary-lymph-node-metastasis deep-learning primary-tumor-classification biopsy metastasis wsi-images

balnmp's Introduction

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Grand-Challenge | Arxiv | Dataset Page | Tweet

This repo is the official implementation of our paper "Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides".

Our paper has been accepted by Frontiers in Oncology, and it is also available on arXiv and medRxiv.

News

  • We launched a Grand Challenge, BCNB, to promote relevant research.
  • We have released our data. Please visit the homepage for download information.
  • The code for the paper has been released; please see code for more details.

Abstract

  • Objectives: To develop and validate a deep learning (DL)-based primary tumor biopsy signature for predicting axillary lymph node (ALN) metastasis preoperatively in early breast cancer (EBC) patients with clinically negative ALN.

  • Methods: A total of 1,058 EBC patients with pathologically confirmed ALN status were enrolled from May 2010 to August 2020. A DL core-needle biopsy (DL-CNB) model was built on the attention-based multiple instance-learning (AMIL) framework to predict ALN status utilizing the DL features, which were extracted from the cancer areas of digitized whole-slide images (WSIs) of breast CNB specimens annotated by two pathologists. Accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and areas under the ROC curve (AUCs) were analyzed to evaluate our model.

  • Results: The best-performing DL-CNB model with VGG16_BN as the feature extractor achieved an AUC of 0.816 (95% confidence interval (CI): 0.758, 0.865) in predicting positive ALN metastasis in the independent test cohort. Furthermore, our model incorporating the clinical data, which was called DL-CNB+C, yielded the best accuracy of 0.831 (95% CI: 0.775, 0.878), especially for patients younger than 50 years (AUC: 0.918, 95% CI: 0.825, 0.971). The interpretation of DL-CNB model showed that the top signatures most predictive of ALN metastasis were characterized by the nucleus features including density (p = 0.015), circumference (p = 0.009), circularity (p = 0.010), and orientation (p = 0.012).

  • Conclusion: Our study provides a novel DL-based biomarker on primary tumor CNB slides to predict the metastatic status of ALN preoperatively for patients with EBC.

Setup

Clone this repo

git clone https://github.com/bupt-ai-cz/BALNMP.git

Environment

Create the environment and install the dependencies.

conda create -n BALNMP python=3.6 -y
conda activate BALNMP
pip install -r code/requirements.txt

Dataset

For your convenience, we provide preprocessed clinical data in code/dataset. Please download the processed WSI patches from here and unzip them with the following commands:

cd code/dataset
# download paper_patches.zip
unzip paper_patches.zip

Training

The code supports the following experiments, whose results are presented in our paper and supplementary material.

experiment_index:

  1. N0 vs N+(>0)
  2. N+(1-2) vs N+(>2)
  3. N0 vs N+(1-2) vs N+(>2)
  4. N0 vs N+(1-2)
  5. N0 vs N+(>2)
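
The five experiments differ only in how patients are grouped by the number of positive lymph nodes. The sketch below illustrates that grouping; it is not taken from the repository. The function name `label_for_experiment`, the integer label values, and the convention of returning `None` for patients that an experiment excludes are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical mapping: coarse group -> class label, per experiment index.
# Coarse groups: 0 = N0, 1 = N+(1-2), 2 = N+(>2).
EXPERIMENT_LABELS = {
    1: {0: 0, 1: 1, 2: 1},  # N0 vs N+(>0)
    2: {1: 0, 2: 1},        # N+(1-2) vs N+(>2)
    3: {0: 0, 1: 1, 2: 2},  # N0 vs N+(1-2) vs N+(>2)
    4: {0: 0, 1: 1},        # N0 vs N+(1-2)
    5: {0: 0, 2: 1},        # N0 vs N+(>2)
}


def label_for_experiment(experiment_index: int, positive_nodes: int) -> Optional[int]:
    """Return the class label for a patient, or None if the experiment excludes them."""
    if experiment_index not in EXPERIMENT_LABELS:
        raise ValueError(f"unknown experiment_index: {experiment_index}")
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    # Reduce the node count to one of the three coarse groups.
    if positive_nodes == 0:
        group = 0
    elif positive_nodes <= 2:
        group = 1
    else:
        group = 2
    return EXPERIMENT_LABELS[experiment_index].get(group)
```

For example, a patient with three positive nodes would be a positive case in experiments 1, 2, and 5, class 2 in experiment 3, and excluded from experiment 4.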

To run an experiment, use the following commands:

cd code
bash run.sh ${experiment_index}

Furthermore, if you want to try other settings, please see train.py for more details.

Paper results

The results in our paper are computed using the cut-off value from the ROC curve. For reference, we have also recomputed the classification results with the argmax prediction rule, which for binary classification corresponds to a threshold of 0.5; the detailed recomputed results are here.
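
To make the difference between the two decision rules concrete, here is a minimal sketch assuming the model outputs a two-column array of class probabilities (negative, positive). The function names and the example cut-off of 0.35 are hypothetical; how the ROC cut-off is actually chosen is not specified here.

```python
import numpy as np


def predict_argmax(probs):
    """Pick the class with the highest probability (threshold 0.5 for two classes)."""
    probs = np.asarray(probs, dtype=float)
    return probs.argmax(axis=1)


def predict_with_cutoff(probs, cutoff):
    """Predict positive when the positive-class probability exceeds the cut-off."""
    probs = np.asarray(probs, dtype=float)
    return (probs[:, 1] > cutoff).astype(int)


# Made-up probabilities for three slides: columns are (negative, positive).
example_probs = np.array([[0.70, 0.30], [0.45, 0.55], [0.62, 0.38]])

argmax_labels = predict_argmax(example_probs)
half_labels = predict_with_cutoff(example_probs, 0.5)
# A lower, hypothetical ROC-derived cut-off flips the third slide to positive.
cutoff_labels = predict_with_cutoff(example_probs, 0.35)
```

With two classes, the argmax rule and a 0.5 threshold give the same labels, while a cut-off taken from the ROC curve can move borderline cases to the other class, which is why the two sets of reported results differ.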

The performance in prediction of ALN status (N0 vs. N+):

[Figure: results for N0 vs. N+]

The performance in prediction of ALN status (N0 vs. N+(1-2)):

[Figure: results for N0 vs. N+(1-2)]

The performance in prediction of ALN status (N0 vs. N+(>2)):

[Figure: results for N0 vs. N+(>2)]

Implementation details

Data preparation

In all our experiments, the number of patches per bag (N) is fixed at 10, whereas the number of bags per WSI (M) is not fixed and depends on the resolution of the WSI. According to our statistics, M ranges from 1 to 300 across WSIs, in both training and testing. The process of dataset preparation is shown in the following figure, and the details are as follows:

  • First, we cut out the annotated tumor regions of each WSI; a WSI may contain multiple annotated tumor regions.

  • Then, each extracted tumor region is cropped into non-overlapping square patches of 256 × 256 pixels, and patches with a blank ratio greater than 0.3 are filtered out.

  • Finally, for each WSI, each bag is composed of 10 (N) randomly sampled patches, and any leftover patches that cannot fill a bag are discarded.
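
The cropping, filtering, and bagging steps above can be sketched roughly as follows. This is an illustration, not the repository's implementation: the function names are hypothetical, tumor regions are assumed to be H × W × 3 uint8 arrays, a "blank" pixel is assumed to be one whose channels are all at least 220, and partial tiles at the region edges are assumed to be dropped.

```python
import numpy as np

PATCH_SIZE = 256        # patch side length in pixels
BAG_SIZE = 10           # N, the number of patches per bag
MAX_BLANK_RATIO = 0.3   # patches above this blank ratio are filtered out


def blank_ratio(patch, white_threshold=220):
    """Fraction of near-white pixels (assumed definition of 'blank')."""
    return float(np.mean(np.all(patch >= white_threshold, axis=-1)))


def crop_patches(region):
    """Tile a tumor region into non-overlapping patches and drop blank ones."""
    height, width = region.shape[:2]
    patches = []
    for top in range(0, height - PATCH_SIZE + 1, PATCH_SIZE):
        for left in range(0, width - PATCH_SIZE + 1, PATCH_SIZE):
            patch = region[top:top + PATCH_SIZE, left:left + PATCH_SIZE]
            if blank_ratio(patch) <= MAX_BLANK_RATIO:
                patches.append(patch)
    return patches


def make_bags(patches, rng):
    """Shuffle patches and group them into bags of BAG_SIZE, discarding leftovers."""
    order = rng.permutation(len(patches))
    n_bags = len(patches) // BAG_SIZE
    return [
        [patches[i] for i in order[b * BAG_SIZE:(b + 1) * BAG_SIZE]]
        for b in range(n_bags)
    ]
```

Under these assumptions, a WSI with 23 retained patches yields M = 2 bags and 3 discarded patches, which shows why M varies with the size of the annotated regions.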

The 5 clinical characteristics used in our experiments are age (numerical), tumor size (numerical), ER (categorical), PR (categorical), and HER2 (categorical), all of which are available in our BCNB Dataset.

[Figure a: dataset preparation process]

Model testing

As mentioned above, a WSI is split into multiple bags, and each bag is fed into the MIL model to obtain predicted probabilities. To obtain the overall prediction for a WSI during testing, we average the predicted probabilities of all its bags, a step we call "Result Merging".
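
A minimal sketch of this averaging step, assuming the per-bag outputs for one WSI are collected into an M × C array of class probabilities (the function name `merge_bag_predictions` is hypothetical):

```python
import numpy as np


def merge_bag_predictions(bag_probs):
    """Average per-bag class probabilities into a single WSI-level prediction."""
    bag_probs = np.asarray(bag_probs, dtype=float)
    if bag_probs.ndim != 2 or bag_probs.shape[0] == 0:
        raise ValueError("expected a non-empty (num_bags, num_classes) array")
    return bag_probs.mean(axis=0)


# Made-up probabilities for two bags of one WSI: columns are (negative, positive).
wsi_probs = merge_bag_predictions([[0.8, 0.2], [0.6, 0.4]])
```

Because each bag contributes equally, a WSI with many bags is not weighted more heavily per bag than a WSI with few.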

[Figure c: result merging]

Demo software

We also provide software for easily checking how well our model predicts ALN metastasis.

Please download the software from here, and check the README.txt for usage. Please note that this software is for demonstration only and must not be used for other purposes.

[Figure: demo software]

Citation

If this work helps your research, please cite this paper in your publications.

@article{xu2021predicting,
  title={Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides},
  author={Xu, Feng and Zhu, Chuang and Tang, Wenqi and Wang, Ying and Zhang, Yu and Li, Jie and Jiang, Hongchuan and Shi, Zhongyue and Liu, Jun and Jin, Mulan},
  journal={Frontiers in oncology},
  volume={11},
  pages={759007},
  year={2021},
  publisher={Frontiers Media SA}
}

Contact

If you encounter any problems, please feel free to open an issue; you can also contact us with the following:

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publicly available.


balnmp's Issues

Global-Chem Partnership with BALNMP

Hello,

My name is Suliman Sharif and I am the author of a Python package called Global-Chem, a dictionary from common chemical names to their molecular definitions.

We have been keeping track of your dataset as part of our medical database resources, and our monitor noticed that it went down:

https://github.com/Sulstice/Uptime-Medical-Informatics

I want to know if you would like us to host your data at no cost to you to ensure the database is maintained for however long as part of our open source chemical database network. This is to build our knowledge graph in how chemical data relates to cancer imaging.

The data would still be maintained by you. Would that be something you would be interested in?

Thank you,
-Suliman

Questions about the .jpg

Dear all,
Thank you very much for releasing the data. I would like to ask whether you have the original data in .svs or pyramidal .tiff format. The data you currently provide appear to be compressed JPEG images, and I can't find their physical resolution. What is the pixel size in µm?

Thanks
Sen

Fast converting to arrays

Greetings, we are using your patches to classify whether images have ALN status 0 or 1. This work is part of an internship with BITS Pilani students in India. Could you advise us on the fastest way to convert the images to arrays for use in the model? Could you also give us an idea of how long it should take to convert the patch images to arrays?

Our procedure uses 41,000 patch images for training, with the rest distributed between validation and test, using an 80-20 split.
We would appreciate help with converting the images to arrays, as it is currently taking us many hours.

Regards,
Vamsi

Questions about the .jpg WSIs resolution

Hello,
Thank you for generously sharing your data.
However, it seems you only provide .jpg WSIs without stating their resolution.

Could you tell me the resolution used for the data conversion?

Question about the bounding box of the annotations

Hi,

When I run cut_tumor_regions.py, I noticed that some of the bboxes returned by get_annotation_points_and_bboxes(json_path) are not valid.
The coordinates in the bboxes can have values smaller than 0, or larger than the maximum shape of the wsi_img.

For example: WSI id 732
