cxr-clip's People

Contributors

kihyunu

cxr-clip's Issues

How to handle reports missing both the impressions and findings sections

I appreciate your work; it helps us a lot.

I would like to ask how you handled reports where both the findings and impressions are missing. From what I've read in the official MIMIC-CXR dataset paper, approximately 10K reports contain neither an impression nor a findings section; instead, they include a last_paragraph or comparison section. Did you replace the missing impressions and findings with these sections, or simply leave those reports out? I skimmed through your code, and it seemed that an empty list is not accepted in the text column of the dataset.
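One possible fallback can be sketched as follows, assuming each report has already been split into named sections (findings, impression, last_paragraph, comparison). The helper build_text_list and its fallback order are hypothetical and are not the repository's actual logic; a study with no usable section returns None so it can be dropped rather than stored as an empty list.

```python
from typing import Dict, List, Optional


def build_text_list(sections: Dict[str, str]) -> Optional[List[str]]:
    """Return the text list for one study, or None if nothing usable remains.

    Hypothetical helper: the fallback order is an assumption, not the
    cxr-clip implementation.
    """
    findings = (sections.get("findings") or "").strip()
    impression = (sections.get("impression") or "").strip()
    texts = [t for t in (findings, impression) if t]
    if texts:
        return texts
    # Neither findings nor impression: try the other parsed sections in order.
    for key in ("last_paragraph", "comparison"):
        fallback = (sections.get(key) or "").strip()
        if fallback:
            return [fallback]
    # Nothing usable: tell the caller to drop this study.
    return None


reports = {
    "s1": {"findings": "Lungs are clear.", "impression": "No acute process."},
    "s2": {"findings": "", "impression": "No pneumothorax."},
    "s3": {"last_paragraph": "No change from prior study."},
    "s4": {},
}
texts_by_study = {sid: build_text_list(sec) for sid, sec in reports.items()}
kept = {sid: t for sid, t in texts_by_study.items() if t is not None}
print(kept)
```

Dropping rather than padding keeps empty lists out of the text column, which is the case the question says the dataset code rejects.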

Thank you in advance for your support.

[BOS] and [EOS] tokens missing

Hello,

thanks for sharing the code!

I ran into two problems when using the code:

(1) When I try to initialize the tokenizer, the line tokenizer.bos_token_id = tokenizer.cls_token_id raises an error:

TypeError: 'int' object is not iterable

My understanding is that, since no [BOS] token is defined (because you are using a BERT tokenizer), its id cannot be assigned directly.

(2) Following on from the first problem, I found that the text encoder extracts the global text feature from the [EOS] token, but this token does not exist in the BERT tokenizer either.

Could you please let me know how to correctly initialize your tokenizer with both [BOS] and [EOS] tokens and their corresponding ids?
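With a BERT tokenizer, [CLS] and [SEP] are the tokens that usually stand in for [BOS] and [EOS]. One workaround that may avoid the TypeError, assuming a Hugging Face transformers tokenizer, is to assign the token strings rather than the ids (tokenizer.bos_token = tokenizer.cls_token and tokenizer.eos_token = tokenizer.sep_token), so that bos_token_id and eos_token_id resolve to the [CLS] and [SEP] ids; whether this matches the authors' intent is not confirmed. The plain-Python sketch below covers the second question: locating the [SEP] position in each padded sequence to pick the "EOS" feature. The ids 101/102/0 for [CLS]/[SEP]/[PAD] match the bert-base-uncased vocabulary but are assumptions here, and eos_positions and pool_eos are hypothetical names. CLIP's trick of taking the argmax over token ids does not carry over, because [SEP] is not the largest id in the BERT vocabulary.

```python
from typing import List

# Assumed special-token ids (bert-base-uncased vocabulary); check your tokenizer.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def eos_positions(input_ids: List[List[int]], eos_id: int = SEP_ID) -> List[int]:
    """Index of the first [SEP] ("EOS") token in each sequence of a batch."""
    positions = []
    for row in input_ids:
        if eos_id not in row:
            raise ValueError("sequence has no [SEP] token")
        positions.append(row.index(eos_id))
    return positions


def pool_eos(hidden: List[List[List[float]]], input_ids: List[List[int]]) -> List[List[float]]:
    """Pick the hidden state at the EOS position as the global text feature."""
    return [h[p] for h, p in zip(hidden, eos_positions(input_ids))]


# Two padded sequences: [CLS] tok tok [SEP] [PAD] and [CLS] tok [SEP] [PAD] [PAD].
batch_ids = [[101, 2307, 3154, 102, 0], [101, 3154, 102, 0, 0]]
# Toy hidden states: each token's "feature" is [position, sequence index].
hidden = [[[float(t), float(b)] for t in range(5)] for b in range(2)]
print(eos_positions(batch_ids))     # [3, 2]
print(pool_eos(hidden, batch_ids))  # [[3.0, 0.0], [2.0, 1.0]]
```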

Thank you!

Some confusion about pre-training datasets

Thanks a lot for releasing the code!

I have some doubts about how the pre-training datasets are used.

In /cxr-clip/cxrclip/data/datasets/imagetext.py

[Screenshot of the relevant code]

This code doesn't seem quite right. After all the if statements, the image selection does not appear to meet the requirement described in the paper to "sample images from two distinct views as possible", and some parts may even be incorrect: every time line 80 is executed, the image_path_list variable is reset instead of accumulating image paths.

This also raises another question: how should the elements of the view column list be counted in the MIMIC-CXR training CSV? For example, if a study has two AP images, one PA image, and two lateral images, should the list be [AP, AP, PA, Lateral, Lateral] or just [AP, PA, Lateral]?
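For reference, a minimal sketch of what sampling two images from distinct views where possible could look like, with the grouping built once and never reset. The function sample_two_views and its fallback rules are hypothetical and not taken from imagetext.py; it assumes one view label per image (the [AP, AP, PA, Lateral, Lateral] form from the example above).

```python
import random
from typing import List, Optional, Tuple


def sample_two_views(
    image_paths: List[str], views: List[str], rng: Optional[random.Random] = None
) -> Tuple[str, str]:
    """Pick two images, preferring two different view labels.

    Falls back to two different images of the same view, and finally to the
    same image twice when the study has only one image.
    """
    if len(image_paths) != len(views):
        raise ValueError("expected one view label per image")
    if not image_paths:
        raise ValueError("study has no images")
    rng = rng or random.Random()
    # Group image paths by view label; built once per call, never reset.
    by_view = {}
    for path, view in zip(image_paths, views):
        by_view.setdefault(view, []).append(path)
    if len(by_view) >= 2:
        # Two distinct views: choose the views first, then one image from each.
        v1, v2 = rng.sample(sorted(by_view), 2)
        return rng.choice(by_view[v1]), rng.choice(by_view[v2])
    if len(image_paths) >= 2:
        # Single view but several images: two different images of that view.
        first, second = rng.sample(image_paths, 2)
        return first, second
    # Only one image: return it twice (assumption: the two copies are
    # augmented independently downstream).
    return image_paths[0], image_paths[0]


paths = ["ap1.jpg", "ap2.jpg", "pa1.jpg", "lat1.jpg", "lat2.jpg"]
views = ["AP", "AP", "PA", "LATERAL", "LATERAL"]
a, b = sample_two_views(paths, views, random.Random(0))
print(a, b)
```

Choosing the views before the images gives each view equal weight, so a study with many AP images does not crowd out its single PA or lateral image.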

Looking forward to your answer!

Some questions about data processing

Hi,
This is meaningful work :)
I have some questions.
The "text" for each subject looks like: [[Findings1, Impression1], [Findings2, Impression2], [Findings3, Impression3], ..., [FindingsN, ImpressionN]].
Is this correct?
I saw in a previous issue that if the findings section is missing, the impression should be stored on its own as a list, [impression]. But when I try this, back_translation.py fails with "ValueError: not enough values to unpack (expected 2, got 1)", which suggests that the single-element list [impression] is not accepted.
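The error can be reproduced in isolation: a strict two-value unpack such as findings, impression = text fails whenever text holds a single section. It is an assumption that back_translation.py unpacks each entry this way; the traceback suggests so, but the code is not shown here. A minimal sketch of a tolerant alternative, using a hypothetical helper split_sections:

```python
from typing import List, Optional, Tuple


def split_sections(text: List[str]) -> Tuple[Optional[str], str]:
    """Return (findings, impression), allowing the findings to be missing."""
    if len(text) == 2:
        findings, impression = text
        return findings, impression
    if len(text) == 1:
        # Only one section survived; treat it as the impression.
        return None, text[0]
    raise ValueError(f"expected 1 or 2 sections, got {len(text)}")


# Reproduce the reported failure with a strict two-value unpack.
try:
    findings, impression = ["No acute process."]
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)

print(split_sections(["Lungs are clear.", "No acute process."]))
print(split_sections(["No acute process."]))  # (None, 'No acute process.')
```

The caller can then back-translate only the sections that are not None.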
Thank you.

Cannot reproduce the results in the paper

I tried pre-training the model on the MIMIC dataset for 30 epochs with batch size 32, learning rate 5e-6, weight decay 1e-4, and the AdamW optimizer. Other settings matched the default configs in the repo. After completing 30 epochs, this is the loss I got for each epoch:
[Screenshot: per-epoch training loss]
When I ran the evaluate_clip script, I got this result:
[Screenshot: evaluation results]
which is worse than your reported results.

This is my data after processing:

subject_id,study_id,image,view,PA,LATERAL,AP,LL,nan,LAO,RAO,AP AXIAL,SWIMMERS,PA LLD,AP LLD,XTABLE LATERAL,AP RLD,PA RLD,LPO,text,split,text_augment
10000032,53189527,"['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/2a2277a9-b0ded155-c0de8eb9-c124d10e-82c5caab.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/e084de3b-be89b11e-20fe3f9f-9c8d8dfe-4cfd202c.jpg']","['LATERAL', 'PA']",['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/2a2277a9-b0ded155-c0de8eb9-c124d10e-82c5caab.jpg'],['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53189527/e084de3b-be89b11e-20fe3f9f-9c8d8dfe-4cfd202c.jpg'],,,,,,,,,,,,,,"['The cardiac, mediastinal and hilar contours are normal. Pulmonary vasculature\n is normal.  Lungs are clear. No pleural effusion or pneumothorax is present.\n Multiple clips are again seen projecting over the left breast.  Remote\n left-sided rib fractures are also re- demonstrated.', 'No acute cardiopulmonary abnormality.']",train,"['Heart contours, medistina and hilar are normal. Lung vascularity and usual. Lung. Lung is clear. No pleural effusion or pneumometers. Multiple clips are still being seen projecting to the left part of the breast. They are also remote fractures of the left ribs are re-proved.', 'No acute cardiac pulmonary disease and dysrhythmia.']"
10000032,53911762,"['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/68b5c4b1-227d0485-9cc38c3f-7b84ab51-4b472714.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/fffabebf-74fd3a1f-673b6b41-96ec0ac9-2ab69818.jpg']",['AP'],,,"['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/68b5c4b1-227d0485-9cc38c3f-7b84ab51-4b472714.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10000032/s53911762/fffabebf-74fd3a1f-673b6b41-96ec0ac9-2ab69818.jpg']",,,,,,,,,,,,,"['Single frontal view of the chest provided.\n \n There is no focal consolidation, effusion, or pneumothorax. The\n cardiomediastinal silhouette is normal.  Again seen are multiple clips\n projecting over the left breast and remote left-sided rib fractures.  No free\n air below the right hemidiaphragm is seen.', 'No acute intrathoracic process.']",train,"['One front view of your chest does not make a focal effect, effusion, or pneumothorax. cardiomediastinal outlines are normally common to one another. Again, it can be seen that you are projecting several clips onto your left chest and off the fractures of your left side coast. No free air under your right hemidiafragma should be observed.', ""Yeah, well, it can't really be discreet.""]"
10001122,53447138,"['/mnt/ssd1/CXR/data/imgs/p10/p10001122/s53447138/8039752c-2ea661b7-16f1eafe-055b7e7b-dbd4cdd1.jpg', '/mnt/ssd1/CXR/data/imgs/p10/p10001122/s53447138/832b57d8-3ae08663-e152699e-51c5db98-b7cb4226.jpg']",[],,,,,,,,,,,,,,,,['The lung volumes are normal.  No evidence of TB or other parenchymal changes. \n Mild elevation of the left hemidiaphragm.  No pleural effusions.  No\n pneumonia.  The lateral radiograph shows evidence of anterior ligament\n calcification at the anterior aspect of the thoracic spine.  Status post\n cholecystectomy.'],train,"['There is no sign that there are other parenchymal changes and TB, no pleural effusion. Lateral X-ray shows evidence of calciification of ligament at the rear of the front of the rib. Post cholecystitomy. No sign are observed of TB and any other parenchymal variations.']"
10000032,56699142,['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s56699142/ea030e7a-2e3b1346-bc518786-7a8fd698-f673b44c.jpg'],['AP'],,,['/mnt/ssd1/CXR/data/imgs/p10/p10000032/s56699142/ea030e7a-2e3b1346-bc518786-7a8fd698-f673b44c.jpg'],,,,,,,,,,,,,"['The lungs are clear of focal consolidation, pleural effusion or pneumothorax. \n The heart size is normal.  The mediastinal contours are normal. Multiple\n surgical clips project over the left breast, and old left rib fractures are\n noted.', 'No acute cardiopulmonary process.']",train,"['They are free from focal bloated moments, pleural precipitations or pneumothorax, normal heart size. The thoracic contours are normal. There are numerous surgical fractures in the left breast and the old fractures of left ribs have been noted.', 'No acute cardio-pilmonerous processes.']"

Could you tell me whether my processed data is wrong somewhere? I wonder whether keeping views other than AP, PA, and LATERAL causes the lower scores. Also, could you share your training logs?
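To check which images would survive a view filter, the list-valued columns can be parsed with ast.literal_eval. A minimal sketch, assuming the layout of the processed rows above (one column per view holding that view's image paths, empty cells for absent views); the keep set, the helper names, and the shortened toy paths are hypothetical. Note that the third study above has an empty view list and no per-view paths, so a view-based filter like this one keeps none of its images.

```python
import ast
import csv
import io

KEEP_VIEWS = ("AP", "PA", "LATERAL")  # assumption: which views to keep

# Toy CSV in the same layout as the processed data above (paths shortened).
SAMPLE = """subject_id,study_id,image,view,PA,LATERAL,AP,LL
1,11,"['a.jpg', 'b.jpg']","['LATERAL', 'PA']",['b.jpg'],['a.jpg'],,
1,12,"['c.jpg', 'd.jpg']",['AP'],,,"['c.jpg', 'd.jpg']",
2,21,"['e.jpg', 'f.jpg']",[],,,,
3,31,"['g.jpg', 'h.jpg']","['AP', 'LL']",,,['g.jpg'],['h.jpg']
"""


def parse_list(cell: str) -> list:
    """Empty cells become []; otherwise evaluate the Python list literal."""
    return ast.literal_eval(cell) if cell.strip() else []


def frontal_lateral_images(row: dict) -> list:
    """Collect image paths from the kept view columns only."""
    paths = []
    for view in KEEP_VIEWS:
        paths.extend(parse_list(row.get(view) or ""))
    return paths


for row in csv.DictReader(io.StringIO(SAMPLE)):
    kept = frontal_lateral_images(row)
    dropped = set(parse_list(row["image"])) - set(kept)
    print(row["study_id"], kept, sorted(dropped))
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval for cells read from a CSV.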

I look forward to your support. Thank you in advance.

MIMIC-CXR Pretrain Dataset Pre-Processing

Thank you for releasing the code of this fantastic work! :)

I have a few questions about the preprocessing of the MIMIC dataset. The README lists 5 preprocessing steps, and I am wondering whether there is any code that performs data cleaning. Some samples are missing the 'findings' or 'impression' section, and back translation then produces odd augmented text, as shown in my screenshot.

As far as I know, the only preprocessing code is back_translation.py. Did I miss any? And if there are other preprocessing steps, could you release that code?
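A minimal cleaning pass of the kind asked about, sketched under assumptions: report sections contain hard line breaks and repeated spaces (like the "\n " sequences in the processed rows shown earlier on this page), and a section counts as missing if it is empty or shorter than a chosen minimum length. clean_section, clean_text_list, and MIN_CHARS are hypothetical; the repository may or may not do anything similar. Studies whose cleaned list comes back empty would be dropped before running back translation.

```python
import re
from typing import List

MIN_CHARS = 10  # assumption: shorter sections are treated as missing


def clean_section(text: str) -> str:
    """Collapse line breaks and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", text or "").strip()


def clean_text_list(sections: List[str]) -> List[str]:
    """Clean each section and drop the ones that are empty or too short."""
    cleaned = (clean_section(s) for s in sections)
    return [s for s in cleaned if len(s) >= MIN_CHARS]


raw = [
    "The lung volumes are normal.  No pleural effusions.\n No\n pneumonia.",
    "  ",
]
print(clean_text_list(raw))
# ['The lung volumes are normal. No pleural effusions. No pneumonia.']
```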

Cheers!

[Screenshots: "mimic", "Image_20240126223337"]

How to resume my training?

During training I keep running into the failure shown in the log below, where the process is terminated by signal 9. I don't know why this keeps happening, but for now I would like to resume training from the checkpoint saved after 4 epochs. How should I do this?

12172/13725 [4:44:35<35:03,  1.35s/it, lr=['0.00004109'], loss=0.975784, CUDA-Mem=0%, CUDAUtil=0%]
WARNING:torch.distributed.elastic.rendezvous.dynamic_rendezvous:The node 'a4000x4313_1543408_0' has failed to send a keep-alive heartbeat to the rendezvous '78d6761b-372b-4b1a-afe6-abb8735493a4' due to an error of type RendezvousTimeoutError.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1543460 closing signal SIGTERM                                            
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 1543461) of binary: /home/gem/anaconda3/envs/cxr-clip/bin/python                                                                                                                                  
Traceback (most recent call last):                                                                                                              
  File "/home/gem/anaconda3/envs/cxr-clip/bin/torchrun", line 8, in <module>                                                                    
    sys.exit(main())                                                                                                                            
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)                                                                                                                   
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main                              
    run(args)                                                                                                                                   
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run                               
    elastic_launch(                                                                                                                             
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__                 
    return launch_agent(self._config, self._entrypoint, list(args))                                                                             
  File "/home/gem/anaconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent             
    raise ChildFailedError(                                                                                                                     
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:                                                                              
========================================================                                                                                        
train.py FAILED                                                                                                                                 
--------------------------------------------------------                                                                                        
Failures:                                                                                                                                       
  <NO_OTHER_FAILURES>                                                                                                                           
--------------------------------------------------------                                                                                        
Root Cause (first observed failure):                                                                                                            
[0]:                                                                                                                                            
  time      : 2024-04-09_16:19:50                                                                                                               
  host      : a4000x4313                                                                                                                        
  rank      : 1 (local_rank: 1)                                                                                                                 
  exitcode  : -9 (pid: 1543461)                                                                                                                 
  error_file: <N/A>                                                                                                                             
  traceback : Signal 9 (SIGKILL) received by PID 1543461                                                                                        
========================================================
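Exit code -9 in the log means the worker received SIGKILL; on Linux this is often the kernel's out-of-memory killer responding to exhausted host RAM, which dmesg output can confirm or rule out (an assumption, not a diagnosis). For resuming, the usual pattern is to save the model, optimizer, scheduler, and epoch together, then load them and continue from the next epoch. The sketch below shows that pattern with plain JSON so it runs anywhere; with PyTorch the same dictionary would normally be written with torch.save and restored with torch.load plus model.load_state_dict and optimizer.load_state_dict. The file name, dictionary keys, and helper names are hypothetical, and whether cxr-clip's checkpoints include optimizer state is not confirmed here.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a kill mid-write keeps the old file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": -1, "model": {"w": 0.0}, "optimizer": {"step": 0}}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_epochs: int, stop_after=None) -> dict:
    state = load_checkpoint(path)
    # Resume from the epoch after the last completed one.
    for epoch in range(state["epoch"] + 1, total_epochs):
        state["model"]["w"] += 1.0         # stand-in for one epoch of training
        state["optimizer"]["step"] += 100  # stand-in for optimizer progress
        state["epoch"] = epoch
        save_checkpoint(path, state)       # checkpoint after every epoch
        if stop_after is not None and epoch == stop_after:
            break                          # simulate the process being killed
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "last.json")
train(ckpt, total_epochs=10, stop_after=3)  # "killed" after 4 epochs (0..3)
final = train(ckpt, total_epochs=10)        # resumes at epoch 4
print(final["epoch"], final["model"]["w"], final["optimizer"]["step"])  # 9 10.0 1000
```

Restoring the optimizer state alongside the weights matters for AdamW, since its moment estimates would otherwise restart from zero.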
