zaaachos / thesis-diagnostic-captioning

B.Sc. Thesis Deep Learning & NLP research on Medical Image Captioning

Home Page: http://nlp.cs.aueb.gr/theses/g_zachariadis_bsc_thesis.pdf

License: MIT License

Python 74.58% Jupyter Notebook 25.13% Shell 0.30%
biomedical-image-processing biomedical-research deep-learning keras-tensorflow neural-networks pytorch sota-model cnn bert-models machine-learning

BSc Thesis research in Diagnostic Captioning

Thesis paper

Exploring Uni-modal, Cross-modal, and Multi-modal Diagnostic Captioning

Abstract

Recent years have witnessed an increase in studies on image captioning, but little of that knowledge has been utilised in the biomedical field. This thesis addresses medical image captioning, referred to as Diagnostic Captioning (DC), the task of assisting medical experts in diagnosis and report drafting. We present deep learning uni-modal, cross-modal and multi-modal methods that aim to generate a representative "diagnostic text" for a given medical image. The multi-modal approaches utilise the radiology concepts (tags) used by clinicians to describe a patient's image (e.g., X-Ray, CT scan) as additional input data. These methods have not been adequately applied to biomedical research. We also experimented with a novel technique that utilises the captions generated by all the systems implemented as part of this thesis. Lastly, this thesis covers the participation of AUEB's NLP Group, with the author as the main driver, in the 2022 ImageCLEFmedical Caption Prediction task. Out of 10 teams, our team came second on the primary evaluation metric, using an encoder-decoder approach, and first on the secondary metric, using an ensemble technique applied to our generated captions. More about our paper can be found here.

Environment setup

If you have a GPU installed on your system, it is highly recommended to use conda as your virtual environment to run the code. You can download conda from here.

After the installation is complete, open a terminal inside this project and run the following commands to set up a conda environment that is compatible with TensorFlow.

  1. conda create --name tf_gpu
  2. conda activate tf_gpu
  3. conda install tensorflow-gpu
  4. pip install -r requirements.txt

If you decide to use ClinicalBERT as the main text embedding extraction model, you have to execute dc.py in a PyTorch-based environment. To do so, follow these steps:

  1. conda create --name torch_gpu
  2. conda activate torch_gpu
  3. conda install pytorch -c pytorch
  4. pip install -r requirements.txt

Now your environment will be compatible with PyTorch. Then comment out the imports from models/__init__.py and models/kNN.py.
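
Before launching dc.py, it can help to confirm which deep learning framework the active conda environment actually provides. The snippet below is a minimal sketch using only the standard library; the function name `installed_frameworks` is hypothetical and not part of this repository. It only checks whether the packages can be found, not whether a GPU is visible to them.

```python
import importlib.util


def installed_frameworks(candidates=("tensorflow", "torch")):
    """Map each top-level package name to True if it is importable here."""
    # find_spec locates the package without importing it, so this stays fast.
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, present in installed_frameworks().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

In the tf_gpu environment you would expect `tensorflow` to be reported as installed, and in torch_gpu you would expect `torch`.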

Dataset Instructions

As mentioned in the Abstract, I participated in the ImageCLEFmedical 2022 Caption Prediction task. The code also handles the ImageCLEF dataset, but neither that dataset nor its evaluation measures are provided, because we, as a group, signed an End User Agreement. Thus, only the IU X-Ray dataset is available. Go to Datasets, download the dataset (i.e., IU X-Ray) and store it in the data directory.

The resulting layout should look like this:

.
├── data
│   ├── iu_xray
│   │   ├── two_captions.json
│   │   ├── two_images.json
│   │   ├── two_tags.json
│   │   └── densenet121.pkl
│   │
│   ├── fasttext_voc.pkl
│   └── fasttext.npy
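
To verify that the data directory matches the layout above before training, a small check like the following may be useful. It is a sketch, not part of the repository: the function name `missing_data_files` is hypothetical, and the file list is simply copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this README.
EXPECTED_FILES = (
    "iu_xray/two_captions.json",
    "iu_xray/two_images.json",
    "iu_xray/two_tags.json",
    "iu_xray/densenet121.pkl",
    "fasttext_voc.pkl",
    "fasttext.npy",
)


def missing_data_files(data_dir="data"):
    """Return the expected files that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Data directory looks complete.")
```

Run it from the project root so that the default `data` path resolves correctly.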

Execution Instructions

Disclaimer

Throughout my research for this thesis, I experimented with models that had state-of-the-art (SOTA) performance on several biomedical datasets (such as IU X-Ray and MIMIC-III). These models are provided in the SOTA_models directory as submodule repos. More details about each model are provided in my thesis paper. I do not provide the additional data loaders that I created for these models. Thus, if you want to experiment further with these models, please do so according to the guidelines provided in each of these repositories.

Main applications

Follow the aforementioned steps to set up conda, then run the following command to train my implemented methods (i.e., CNN-RNN, kNN). Default arguments are set.

python3 dc.py

To pass arguments, run the following command to see the available options.

python3 dc.py -h

Particular training procedures

It is suggested to use a Unix-like OS (such as Linux), or WSL on Windows, to execute the following specific procedures.

  • Cross-modal CNN-RNN: bash cross_modal_cnn_rnn.sh
  • Multi-modal CNN-RNN: bash multi_modal_cnn_rnn.sh
  • Cross-modal k-NN: bash cross_modal_kNN.sh
  • Multi-modal k-NN: bash multi_modal_kNN.sh
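
For readers unfamiliar with retrieval-based captioning, the following is a minimal sketch of the general nearest-neighbour idea behind k-NN methods: embed the query image, find the most similar training image, and reuse its caption. It is not the code in this repository. The function names, the use of cosine similarity, and the toy embeddings and captions are all assumptions made for illustration; see the thesis paper for the actual method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_caption(query, train_embeddings, train_captions):
    """Return the caption of the training image most similar to the query (1-NN)."""
    best = max(
        range(len(train_embeddings)),
        key=lambda i: cosine_similarity(query, train_embeddings[i]),
    )
    return train_captions[best]


# Toy data, made up for illustration; real embeddings would come from an image encoder.
train_embeddings = [[1.0, 0.0], [0.0, 1.0]]
train_captions = ["caption of training image A", "caption of training image B"]

print(nearest_caption([0.9, 0.1], train_embeddings, train_captions))
```

With the toy data above, the query is closest to the first training vector, so the first caption is returned.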

Citations

If you use or extend my work, please cite my paper.

@unpublished{Zachariadis2022,
  author = "G. Zachariadis",
  title = "Exploring Uni-modal, Cross-modal, and Multi-modal Diagnostic Captioning",
  year = "2022",
  note = "B.Sc. thesis, Department of Informatics, Athens University of Economics and Business"
}

You can read our publication "AUEB NLP Group at ImageCLEFmedical Caption 2022", in the Proceedings of CLEF 2022, at this link. If you use or extend our work, please cite our paper:

@article{charalampakos2022aueb,
  title={AUEB NLP Group at ImageCLEFmedical Caption 2022},
  author={Charalampakos, Foivos and Zachariadis, Giorgos and Pavlopoulos, John and Karatzas, Vasilis and Trakas, Christoforos and Androutsopoulos, Ion},
  year={2022}
}

License

MIT License
