

Enhancing-X-ray-Image-Text-Matching

Project B 044169 – Spring 2022

Technion – Israel Institute of Technology

Mayan Leavitt and Edan Kinderman





📝 Summary

Our project aimed to improve the matching of two X-ray scans with their corresponding radiology report, using the SGRAF image-text matching model as a baseline. To achieve this, we tested various loss functions, architectures, and training methods.

Through our experimentation, we successfully incorporated the second X-ray scan into our models and achieved significantly better results. Our research provides insights into enhancing the accuracy of image-text matching, which can have important implications for medical diagnosis and treatment.


🫁 The SGRAF Model

The model extracts features from the given image and text, and learns vector-based similarity representations between different regions of the image and words of the text. A SAF (Similarity Attention Filtration) module then processes these alignment vectors with attention mechanisms, emphasizing significant alignments and suppressing less meaningful ones. The module outputs a matching score indicating the compatibility between the image and the text. For more details, see the original article [1].
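To make the idea concrete, here is a minimal sketch (not the authors' implementation) of attention filtration: region-word similarity vectors are softmax-weighted and aggregated into a single scalar matching score. All names and the two projection vectors are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def saf_score(sim_vectors, w_att, w_score):
    """Hypothetical sketch of Similarity Attention Filtration (SAF).

    sim_vectors: (n, d) similarity representations for n region-word alignments
    w_att:       (d,)   attention projection (stands in for a learned layer)
    w_score:     (d,)   scoring projection (stands in for a learned layer)
    """
    # Attention logits per alignment; significant alignments get larger weight
    logits = sim_vectors @ w_att
    att = np.exp(logits - logits.max())
    att /= att.sum()                      # softmax over the n alignments
    # Filtration: attention-weighted sum of the similarity vectors
    filtered = att @ sim_vectors          # (d,)
    # Scalar matching score for the image-text pair
    return float(filtered @ w_score)

rng = np.random.default_rng(0)
score = saf_score(rng.normal(size=(5, 8)), rng.normal(size=8), rng.normal(size=8))
```

In the real model the projections are learned end-to-end and the similarity vectors come from the feature extractors; this sketch only shows the shape of the computation.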


🩺 Data

We used the MIMIC-CXR dataset, which contains studies consisting of a frontal image, a lateral image, and a radiology report. Existing image-text matching models often ignore the lateral image, even though it contains critical information.


💭 Proposed Improvements

  1. Evaluate different loss functions: bi-directional ranking loss (BRL), NT-Xent [2], and their weighted sum.
  2. Train two regular SGRAF models simultaneously, one per viewpoint, and use learned weights to average their similarity scores.
  3. Concatenate the features of the two image types to obtain a single input.
  4. Use positional encoding [4] to differentiate between the two viewpoints.

📊 Comparison

Here is an evaluation of each model's ability to match the images with the correct text. A higher R@K value (the fraction of queries whose correct match appears among the top K retrieved results) indicates better retrieval performance, i.e. a better alignment between the images and the corresponding text.
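For readers unfamiliar with the metric, R@K can be computed roughly as follows (a generic sketch, not our evaluation code):

```python
import numpy as np

def recall_at_k(sim, k):
    """sim: (n, n) score matrix where sim[i, i] is the true image-text pair.
    Returns the percentage of images whose matching text ranks in the top k."""
    order = np.argsort(-sim, axis=1)          # text indices, best to worst
    hits = [i in order[i, :k] for i in range(sim.shape[0])]
    return 100.0 * np.mean(hits)

sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.2, 0.8],
                [0.1, 0.7, 0.6]])
r1 = recall_at_k(sim, 1)   # only image 0's true text is ranked first
r2 = recall_at_k(sim, 2)   # image 2's true text also makes the top 2
```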

Here is a comparison of the basic models, which were trained on only one image type (frontal or lateral).

| Image type | Loss    | R@1 | R@5  | R@10 |
|------------|---------|-----|------|------|
| Frontal    | BRL     | 0.5 | 4.2  | 8.5  |
| Lateral    | BRL     | 0.5 | 1.5  | 3.1  |
| Frontal    | NT-Xent | 6.6 | 18.6 | 27.2 |
| Lateral    | NT-Xent | 5.0 | 13.9 | 21.1 |
| Frontal    | Sum     | 3.3 | 10.4 | 15.4 |
| Lateral    | Sum     | 0.3 | 2.0  | 3.4  |

Here is a comparison of the "double" model family, in which each model has two encoders, one per image type (frontal and lateral). These models are trained on both image types.

| Model type         | Learned weights | Shared text encoder | R@1 | R@5  | R@10 |
|--------------------|-----------------|---------------------|-----|------|------|
| Uniform Average    | ✗               | ✗                   | 8.1 | 21.3 | 29.3 |
| Weighted Average   | ✗               | ✗                   | 8.2 | 21.2 | 29.5 |
| Double Model       | ✓               | ✗                   | 6.7 | 21.1 | 30.4 |
| Light Double Model | ✓               | ✓                   | 8.5 | 22.5 | 31.5 |
| Pretrained Model   | ✓               | ✗                   | 8.1 | 21.0 | 29.6 |
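Averaging the two viewpoints' similarity scores can be sketched like this (a hypothetical illustration of the idea, with a single sigmoid-parameterized weight so the mixture stays in (0, 1); the actual learned parameterization may differ):

```python
import numpy as np

def combine_scores(s_frontal, s_lateral, alpha_logit):
    """Hypothetical weighted average of the two viewpoints' similarity scores.
    alpha_logit stands in for a learned scalar parameter."""
    alpha = 1.0 / (1.0 + np.exp(-alpha_logit))    # sigmoid keeps weight in (0, 1)
    return alpha * s_frontal + (1.0 - alpha) * s_lateral

s_f = np.array([0.8, 0.2])
s_l = np.array([0.4, 0.6])
uniform = combine_scores(s_f, s_l, 0.0)     # alpha = 0.5 -> plain average
biased = combine_scores(s_f, s_l, 10.0)     # alpha near 1 -> mostly frontal
```

With `alpha_logit = 0`, this reduces to the Uniform Average row of the table.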

Here is a comparison of the "concatenation" model family, which receives as input a text and a concatenation of the frontal and lateral image features. Some of these models were trained with positional encoding [4] added to the image features.

| Model type                   | Positional encoding | R@1 | R@5  | R@10 |
|------------------------------|---------------------|-----|------|------|
| Basic Concatenation          | ✗                   | 7.4 | 20.2 | 29.9 |
| Tagged Features              | ✗                   | 6.6 | 20.1 | 27.0 |
| Constant Positional Encoding | ✓                   | 7.4 | 18.8 | 27.2 |
| Full Positional Encoding     | ✓                   | 7.5 | 20.6 | 28.0 |
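Tagging the concatenated features by viewpoint can be sketched with the sinusoidal encoding of [4]: one encoding vector per viewpoint, added to all region features of that view. This is an illustrative sketch under that assumption, not the project's exact scheme, and the function names are hypothetical.

```python
import numpy as np

def sinusoidal_pe(n_pos, d):
    """Sinusoidal positional encoding from [4], shape (n_pos, d)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    # Even dimensions use sine, odd dimensions use cosine
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def tag_viewpoints(frontal, lateral):
    """Concatenate region features from both views, adding a per-view encoding
    so the model can tell which regions came from which viewpoint."""
    d = frontal.shape[1]
    pe = sinusoidal_pe(2, d)              # one encoding vector per viewpoint
    return np.concatenate([frontal + pe[0], lateral + pe[1]], axis=0)

f = np.zeros((3, 8))                      # 3 frontal region features
l = np.zeros((3, 8))                      # 3 lateral region features
out = tag_viewpoints(f, l)                # (6, 8) tagged and concatenated
```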

We can see that using the lateral images improves results compared to using frontal images alone. In addition, training two models at once achieves the best performance, while concatenating the image features is a cheaper way to combine the viewpoints.


👨‍💻 Files and Usage

| File Name          | Description                          |
|--------------------|--------------------------------------|
| average_eval.py    | Evaluates two trained models jointly |
| data_xray.py       | Handles data loading and batching    |
| evaluation_xray.py | Evaluates a trained model            |
| model_xray.py      | The model implementations            |
| opts_xray.py       | Runs experiments using scripts       |
| train_xray.py      | Trains a model                       |

🙌 How to Run the Code

You can train a regular SGRAF model on the MIMIC-CXR dataset, using only frontal images, with this script:

python opts_xray.py --model_name '../checkpoint/<model_name>' --view 'frontal' --model_num <number> --model_type 'regular_model' --batch_size 64 --num_epochs 40

🙌 References and credits

  • Project supervisor: Gefen Dawidowicz. Some of the algorithms were implemented based on her code.
  • [1] H. Diao et al., "Similarity Reasoning and Filtration for Image-Text Matching," AAAI Conference on Artificial Intelligence, 2021.
  • [2] T. Chen et al., "A Simple Framework for Contrastive Learning of Visual Representations," PMLR, pp. 1597-1607, 2020.
  • [3] Z. Ji et al., "Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment," MLMI, pp. 110-119, 2021.
  • [4] A. Vaswani et al., "Attention Is All You Need," Advances in Neural Information Processing Systems, 2017.
