chaoyi-wu / RadFM
The official code for "Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data".
Could you please release a requirements.txt? I have run into quite a few problems preparing the environment.
Where can we download the RadBench dataset?
Hello, I downloaded the Radiopaedia data you updated recently and found that rad3d_train.json seems to be incomplete after line 764425. Could you double-check and share the full version with us? Additionally, it seems that none of the image files whose "image_path" values in the JSON files end in ".npy" can be found. Can you fix this as well?
Thank you for your great contribution to the community.
As the title says: what is the extraction code for the Baidu Cloud link? Thanks!
Thanks for your great work! Compared to some works I have reproduced, this repo's results look convincing. However, I ran into problems when training with transformers==4.28.1: the correct result is produced under transformers==4.28.0.dev0-py3.9.egg, while 4.28.1 produces weird output.
for 4.28.0.dev0-py3.9.egg
for 4.28.1
However, if I use 4.28.0.dev0-py3.9.egg, it throws errors like:
ImportError: cannot import name 'strtobool' from 'transformers.utils'
Do you have any idea about this weird output or dependence? Thanks.
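For anyone hitting the same mismatch, one possible workaround (an assumption based on this thread, not an official recipe) is to remove the stray .egg install and pin the released 4.28.1 wheel, then verify the version:

```shell
# Remove any leftover development egg, then pin the released wheel.
# The version choice follows this thread; adjust if the README pins another release.
pip uninstall -y transformers
pip install "transformers==4.28.1"
python -c "import transformers; print(transformers.__version__)"
```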
Hello!
I rewrote the question in Quick_demo/test.py to something like "Please write a report for this scan", but it outputs nothing. Is there a requirement on the input text format?
Dear Author
Thanks for the exciting work! It's amazing that the model has been trained on and thus might cover over 5000 diseases. I am wondering, since your goal is building a generalist foundation model for radiology, does the model work beyond the 5000 diseases that it is trained on?
For example, I work with a heart disease that is diagnosed via ultrasound (a modality covered by your model). Can I ask the model a diagnosis question about this disease?
Thanks!
Hello! The paper mentions training on NVIDIA A100 (80 GB) GPUs with a batch size of 1 per GPU. If I want to fine-tune the proposed pre-trained 3D ViT on my own 3D data, how much GPU memory do I need? Should the required memory scale with the input image size?
The preprocessing described in the paper caps the input 3D image at (64, 256, 256). If the input is exactly (64, 256, 256), how much GPU memory does it take?
Dear Author
Thank you for the exciting work.
I have run your demo and it works. How do I modify your code to do few-shot and CoT prompting?
Would you be able to provide some examples?
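Lacking an official recipe, a minimal sketch of few-shot prompt assembly might look like the following. The <image> placeholder token and the question/answer texts are illustrative assumptions, not RadFM's actual input format:

```python
# Hypothetical few-shot prompt assembly; placeholder token and demonstration
# texts are assumptions for illustration, not the repo's real API.
def build_fewshot_prompt(examples, question):
    """Concatenate (question, answer) demonstrations ahead of the real question."""
    shots = "".join(f"{q} {a}\n" for q, a in examples)
    return shots + question

examples = [("<image>Please write a report for this scan.",
             "No acute cardiopulmonary abnormality.")]
prompt = build_fewshot_prompt(examples, "<image>Please write a report for this scan.")
print(prompt)
```

A CoT variant would follow the same pattern, with each demonstration answer spelling out intermediate reasoning before the conclusion.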
The work you've done is truly commendable and holds immense potential.
However, I've come across an issue that I'd like to bring to your attention. When using specific queries, particularly those related to medical imaging such as "Please write a report for this X-ray image" or "Please write a radiology report consists of findings that explains this medical scan", the model either provides no response or consistently outputs "MS Plaques".
What prompt should I use to get meaningful output?
Hello, I downloaded the Radiopaedia data you provided and found that rad3d_train.json does not seem to be valid. Loading it in Python shows "Unterminated string starting at: line 1704502 column 13 (char 80168752)". I checked the JSON file, and it seems to be incomplete after line 1704502. Could you double-check and share the full version with us? Thank you very much!
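As a quick sanity check before re-downloading, Python's json module reports the exact position at which a truncated file breaks. A small sketch (the sample strings are illustrative, not the real file contents):

```python
import json

def json_break_position(text):
    """Return None if text parses as JSON, else (lineno, colno) where decoding fails."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno)

# A deliberately truncated document, mimicking an incomplete download:
print(json_break_position('{"image_path": "case_001'))    # breaks inside the string
print(json_break_position('{"image_path": "case_001"}'))  # None: parses fine
```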
Hi, are there any plans to release the Rad3D dataset?
Background: my machine has multiple GPUs, but each one has only 16 GB of memory. Running test.py for model inference currently fails with an out-of-memory error.
Question 1: What is the minimum GPU memory required for RadFM inference, e.g. N GB?
Question 2: If no single GPU has N GB, is there another way to run inference, e.g. DeepSpeed? Could you give a concrete walkthrough?
Thank you for making available your model. Awesome work!
I was trying to load the model and I got the error message below. I downloaded all the files, concatenated them with "cat RadFM.z* > model.zip", unzipped with "unzip model.zip", and got the file pytorch_model.bin. Am I doing this procedure right? Thank you.
RuntimeError: Error(s) in loading state_dict for MultiLLaMAForCausalLM: Unexpected key(s) in state_dict: "lang_model.model.layers.0.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.1.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.2.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.3.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.4.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.5.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.6.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.7.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.8.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.9.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.10.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.11.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.12.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.13.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.14.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.15.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.16.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.17.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.18.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.19.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.20.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.21.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.22.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.23.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.24.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.25.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.26.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.27.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.28.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.29.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.30.self_attn.rotary_emb.inv_freq", 
"lang_model.model.layers.31.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.32.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.33.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.34.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.35.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.36.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.37.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.38.self_attn.rotary_emb.inv_freq", "lang_model.model.layers.39.self_attn.rotary_emb.inv_freq", "embedding_layer.bert_model.embeddings.position_ids".
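The unexpected keys above are non-learnable buffers (rotary inv_freq tables and BERT position_ids) that different transformers versions register differently, so one common workaround is to prune them from the checkpoint before loading, or to pass strict=False to load_state_dict. A sketch of the pruning on a plain dict (the fake checkpoint is a stand-in for torch.load("pytorch_model.bin")):

```python
def prune_unexpected(state_dict, patterns=("rotary_emb.inv_freq", "position_ids")):
    """Drop buffer entries that the current model class does not expect."""
    return {k: v for k, v in state_dict.items()
            if not any(p in k for p in patterns)}

# Illustrative stand-in for a real checkpoint dict:
fake_ckpt = {
    "lang_model.model.layers.0.self_attn.rotary_emb.inv_freq": [0.1],
    "embedding_layer.bert_model.embeddings.position_ids": [0, 1],
    "lang_model.model.layers.0.self_attn.q_proj.weight": [1.0],
}
print(sorted(prune_unexpected(fake_ckpt)))  # only the real weight key survives
```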
I am in the process of training a model and I have a question regarding the lang_encoder_path. I've placed the pytorch_model.bin file in it. However, I encountered an issue where I received the following message: "Some weights of the model checkpoint at ./Language_files were not used when initializing LlamaForCausalLM."
Could you provide guidance on this matter?
Hi, thanks for sharing such a meaningful work!
I encountered an issue while testing the downloaded disease diagnosis datasets (e.g., VinDr-SpineXR, VinDr-PCXR, and VinDr-Mammo). The image format in these datasets is ".dicom", whereas the .csv file you uploaded to Hugging Face uses the ".png" format. I have attached a screenshot to illustrate this issue:
Given this situation, I kindly request your assistance in providing a straightforward guide or instructions on converting the ".dicom" files to ".png" format. This guidance would be immensely beneficial not only to me but potentially to other users who might encounter the same challenge.
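Without an official script, a minimal conversion sketch might look like this. It assumes pydicom and Pillow are installed, uses simple min-max windowing, and the path arguments are placeholders; real DICOMs may additionally need RescaleSlope/RescaleIntercept or MONOCHROME1 handling:

```python
import numpy as np

def to_uint8(pixels):
    """Min-max normalize a pixel array into the 0-255 range."""
    arr = np.asarray(pixels, dtype=np.float32)
    lo, hi = float(arr.min()), float(arr.max())
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(arr.shape, dtype=np.uint8)
    return ((arr - lo) / (hi - lo) * 255.0).astype(np.uint8)

def dicom_to_png(src_path, dst_path):
    # Assumed dependencies: pip install pydicom pillow
    import pydicom
    from PIL import Image
    ds = pydicom.dcmread(src_path)
    Image.fromarray(to_uint8(ds.pixel_array)).save(dst_path)
```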
How do I unzip this model?
In the requirements.txt file, the pinned version of scispacy is not installable. I removed the "== ..." pin and changed it to just scispacy, which works.
Dear Chaoyi,
I trust this message finds you well. I recently had the opportunity to delve into your remarkable work, and I must express my admiration for the innovative approach and substantial contributions outlined in your article.
In particular, the integration of language and vision in your proposed large model captured my attention. The versatility showcased in handling combinations of visual images and language questions is indeed impressive. However, my inquiry pertains to the potential applicability of your model as a standalone feature extractor for radiology images.
Given the success of models like CLIP in serving as effective image encoders, I am curious to know if your model, too, could be employed in a similar capacity. Can it be utilized as a pre-trained feature extractor for radiology images without the need for accompanying language inputs? I am interested in understanding the extent to which your model's capabilities extend to image processing tasks in the domain of radiology.
Thank you for your time and consideration. I look forward to gaining insights into this aspect of your work and exploring potential applications in the realm of medical imaging.
Hi Wu,
Thank you for your great contribution to the community. I have a small question about the data shape. I found this operation: it seems the raw 3D CT shape is N,C,H,W,D, and in image_dict the "image" shape is C,H,W,D after indexing. However, if contain_nan is hit, the image shape is C,H,W,D, which cannot be indexed at image.shape[0]. I am not sure whether I am reading this right. I just want to figure out what the shape of "image" in image_dict is, and whether the shapes are uniform regardless of whether the images are 2D or 3D.
Looking forward to your reply.
Best regards,
BAI Fan
Thank you for your work, but I have a small question about the paper and the code.
In the paper, the perceiver is described as:
"For the visual encoder, we adopt a 12-layer 3D ViT with 768 feature dimensions and the perceiver is chosen as 6-layer transformer decoder with the learnable latent array in 32 × 5120 dimension, so that all images will be embeded as a 32 × 5120 feature embedding after passing visual encoding and perceiver aggregation."
However, in the code the latent dim is 32x768, and it is extended to 32x5120 by an fc layer afterwards. I don't quite understand the effect of the fc layer.
I think not using 5120 in the perceiver may be for GPU memory reasons, and extending to 5120 in the fc layer may decode the information in the latent to facilitate processing by subsequent models.
So I want to ask what the effect of the fc layer is.
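For what it's worth, the fc layer is just a linear projection from the perceiver width (768) to the LLM hidden size (5120); running the perceiver's attention at 768 and projecting afterwards is much cheaper than attending at 5120 directly. A toy numpy sketch of the shape flow (random weights, purely illustrative of the dimensions, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.standard_normal((32, 768))         # perceiver output: 32 latents x 768 dims
W_fc = rng.standard_normal((768, 5120)) * 0.02  # the fc layer's weight matrix
visual_tokens = latent @ W_fc                   # projected to the LLM embedding width
print(visual_tokens.shape)  # (32, 5120)
```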
Thanks for your awesome collection of datasets. As the dataset contains multiple image sources, I wonder if there is any tutorial for downloading and preprocessing these datasets (e.g., MedPix/images/
etc.).
Hi,
thank you all for this awesome work, and also for sharing it!
I tried to run the Quick_demo by following the README instructions:
Downloaded the pytorch_model.bin from [1] and placed it in folder Quick_demo/
Following issue #5, downgraded to transformers 4.28.1
Ran python test.py with a) 4x RTX3090, then b) 8x RTX3090, but got the following OOM error both times:
Setup tokenizer
Finish loading tokenizer
Setup demo case
Finish loading demo case
Setup Model
Finish loading model
Traceback (most recent call last):
File "/storage/homefs/lz20w714/git/radfm/Quick_demo/test.py", line 121, in
main()
File "/storage/homefs/lz20w714/git/radfm/Quick_demo/test.py", line 105, in main
model = model.to('cuda')
File "/storage/homefs/lz20w714/anaconda3/envs/llmood/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/storage/homefs/lz20w714/anaconda3/envs/llmood/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/storage/homefs/lz20w714/anaconda3/envs/llmood/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/storage/homefs/lz20w714/anaconda3/envs/llmood/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "/storage/homefs/lz20w714/anaconda3/envs/llmood/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/storage/homefs/lz20w714/anaconda3/envs/llmood/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 23.70 GiB total capacity; 23.39 GiB already allocated; 10.69 MiB free; 23.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Should a quantized version of the model be used, or something else?
Thanks for your help!
Best,
Lukas
[1] https://huggingface.co/chaoyi-wu/RadFM/blob/main/pytorch_model.zip
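For context, a back-of-envelope estimate of the weight memory alone (no activations or KV cache) already explains the OOM on 24 GB cards. The ~13B parameter count for the language backbone is an assumption, and the vision modules add more on top:

```python
def weight_gib(n_params, bytes_per_param):
    """GiB needed just to hold the weights at a given precision."""
    return n_params * bytes_per_param / 1024**3

n_params = 13e9  # assumed rough parameter count of the language backbone
print(f"fp32: {weight_gib(n_params, 4):.1f} GiB")  # ~48.4 GiB
print(f"fp16: {weight_gib(n_params, 2):.1f} GiB")  # ~24.2 GiB
```

So even in fp16 the weights alone exceed a single RTX3090's 23.7 GiB, which suggests sharding the model across GPUs or quantizing it rather than a plain model.to('cuda').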