
mmcl-tabular-imaging's People

Contributors

paulhager

mmcl-tabular-imaging's Issues

LAAF questions

Hi Hager,

Thanks for your great work.
I would like to ask about training LaaF:

  1. Are the corruption augmentation and the one-hot encoder also applied to the label?
  2. Does the loss function for LaaF use the file "supcon_loss_custom.py"? I am getting confused by the other two files, "supcon_loss_clip_binary.py" and "supcon_loss_clip.py" (see the sketch after this list for the general supervised-contrastive idea).
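For reference, here is a minimal sketch of a supervised contrastive (SupCon) loss, in which all samples sharing a label count as positives. This illustrates the general idea behind the supcon_loss_* files; it is an assumption, not the repository's exact implementation:

# Minimal SupCon sketch (assumption: not the repo's exact code). Pairs that
# share a label are positives; everything else in the batch is a negative.
import torch
import torch.nn.functional as F

def supcon_loss(emb, labels, temperature=0.1):
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.T / temperature
    n = emb.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=emb.device)
    sim = sim.masked_fill(diag, float("-inf"))   # exclude self-similarity
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~diag
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # Mean log-probability over each anchor's positives, averaged over anchors
    # (anchors without any positive contribute zero).
    pos_log_prob = log_prob.masked_fill(~pos, 0.0)
    return -(pos_log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()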

Thank you.

Questions about the pytorch_lightning

Hi,

I encountered the following problem while reproducing the results:

Traceback (most recent call last):
  File "/home/gong/projects/MMCL-Tabular-Imaging/run.py", line 14, in <module>
    from trainers.pretrain import pretrain
  File "/home/gong/projects/MMCL-Tabular-Imaging/trainers/pretrain.py", line 10, in <module>
    from utils.ssl_online_custom import SSLOnlineEvaluator
  File "/home/gong/projects/MMCL-Tabular-Imaging/utils/ssl_online_custom.py", line 12, in <module>
    from pl_bolts.models.self_supervised.evaluator import SSLEvaluator
  File "/home/gong/anaconda3/envs/selfsuper/lib/python3.9/site-packages/pl_bolts/__init__.py", line 48, in <module>
    from pl_bolts import callbacks, datamodules, datasets, losses, metrics, models, optimizers, transforms, utils
  File "/home/gong/anaconda3/envs/selfsuper/lib/python3.9/site-packages/pl_bolts/callbacks/__init__.py", line 7, in <module>
    from pl_bolts.callbacks.ssl_online import SSLOnlineEvaluator  # noqa: F401
  File "/home/gong/anaconda3/envs/selfsuper/lib/python3.9/site-packages/pl_bolts/callbacks/ssl_online.py", line 5, in <module>
    from pytorch_lightning.metrics.functional import accuracy
ModuleNotFoundError: No module named 'pytorch_lightning.metrics'

I installed the libraries according to environment.yaml and also installed "torchmetrics", but the code still reports this error. How can I solve this problem? Is it because some of my parameters are set incorrectly, or because my environment does not match? Can you give me some suggestions?
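The import fails because pytorch_lightning.metrics was removed in later PyTorch Lightning releases (its contents moved into torchmetrics), while the installed pl_bolts version still imports it. One stopgap, sketched below as an assumption rather than an official fix, is to alias the removed module to torchmetrics before anything imports pl_bolts; pinning pytorch-lightning to a version that still ships the module would be the cleaner solution.

# Hedged compatibility shim: register pytorch_lightning.metrics in
# sys.modules so pl_bolts' "from pytorch_lightning.metrics.functional
# import accuracy" resolves against torchmetrics instead.
import sys
import types
import torchmetrics.functional

shim = types.ModuleType("pytorch_lightning.metrics")
shim.functional = torchmetrics.functional
sys.modules["pytorch_lightning.metrics"] = shim
sys.modules["pytorch_lightning.metrics.functional"] = torchmetrics.functional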

Thank you!

from Utils import check_or_save

The implementation of the check_or_save function is not provided in the given code. The code assumes check_or_save exists in an external module called "Utils", but since that module is not in the repository, it is not possible to tell how the function is implemented without additional information.
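Judging by the name and how it is called, one plausible reconstruction (an assumption, not the author's code) is a guard that only writes a file if it does not already exist:

# Hypothetical reconstruction of check_or_save: save `obj` to `path` with
# torch.save unless the file is already there. The real implementation may
# differ.
import os
import torch

def check_or_save(obj, path, overwrite=False):
    if os.path.exists(path) and not overwrite:
        print(f"Skipping {path}: file already exists.")
        return
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    torch.save(obj, path)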

very important issue

Hello,

Would you please help me?

When I run in shell:

python run.py

I get the error:

In 'config': Could not find 'dataset/cardiac_new'

Available options in 'dataset':
	cardiac
	cardinal_ht_severity
	dvm_all_server
	dvm_all_server_0.01
	dvm_all_server_0.1
Config search path:
	provider=hydra, path=pkg://hydra.conf
	provider=main, path=file:///home/a/Downloads/2/MMCL-Tabular-Imaging-main/configs
	provider=schema, path=structured://

When I change "cardiac_new" to "cardiac" in config.yaml, I get another error:

Global seed set to 2022
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/a/Downloads/2/MMCL-Tabular-Imaging-main/run.py", line 85, in control
    run(args)
  File "/home/a/Downloads/2/MMCL-Tabular-Imaging-main/run.py", line 27, in run
    args = prepend_paths(args)
  File "/home/a/Downloads/2/MMCL-Tabular-Imaging-main/utils/utils.py", line 148, in prepend_paths
    db = hparams.data_base
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: data_base
    full_key: data_base
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
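For context: data_base is declared as a mandatory (???) value in the config, so Hydra aborts until it is supplied. Assuming data_base is a top-level config key (which the traceback suggests), one way to provide it is a command-line override with your own data path, for example:

python run.py data_base=/path/to/your/data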

Could you please help me solve this issue?

[Errno 2] No such file or directory: 'data/DVM/features/train_paths_all_views.pt'

Hello, I have a question about running your code.

I've run the command:

python run.py pretrain=True datatype=multimodal checkpoint=checkpoints

And have encountered the following issue:

[screenshot of the error]

The error message says we do not have the "train_paths_all_views.pt" file.
I've noticed that we only have two files,

train_paths_all_views_server.pt train_paths_all_views_tower.pt

The cell that might be relevant is:

[screenshot of the notebook cell]

The whole data-preprocessing code, including this cell, does not save train_paths_all_views.pt.

May I ask how I can handle this problem? (A possible workaround is sketched below.)
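Since the notebook saves machine-specific variants, a possible workaround (assuming the _server file is the appropriate one for your setup) is to copy it to the expected name:

# Hedged workaround: copy the machine-specific file to the name the
# dataloader expects. Whether the _server variant matches your setup is an
# assumption.
import shutil

shutil.copy("data/DVM/features/train_paths_all_views_server.pt",
            "data/DVM/features/train_paths_all_views.pt")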

I appreciate your help.

Carrtesy

there is no val_images.pt?

In this code you write:

data/create_dvm_dataset.ipynb

[screenshot of the notebook cell]

Are you sure about this file? Where should the file "val_images.pt" be created? Is there a typo here?

Thanks for your help again

Training Details

Hi Paul,

I have seen that your paper is quite similar to our work "Learning visual models using a knowledge graph as a trainer", published at ISWC21, where we learn image-based classifiers with tabular data in the form of a knowledge graph (KG).
We also investigated the importance of context (in the KG) in https://arxiv.org/abs/2210.11233, also published at ISWC22.

I am quite interested in pushing this research further, also to other domains.

Therefore, can you please provide some information about the training details?
(Number of GPUs, training time)

I am trying to reproduce the results with the given config files and an NVIDIA V100; however, I cannot achieve the same results.
The images are loaded into a single tensor and live_loading=False.

with DVM Dataset:
pretrain=False, datatype=tabular --> best.val.acc = 70.86% (31m 33s)
pretrain=False, datatype=imaging --> best.val.acc = 88.78% (2d 5h 33m)
pretrain=False, datatype=multimodal --> best.val.acc = 87.43% (20h 11m 6)

and
pretrain=True, datatype=tabular --> best.val.acc = 70.83% (9h 53m 19)

Still running:
Epoch = 91
pretrain=True, datatype=multimodal --> best.val.acc = 16.85% (6d 22h 48m)

For tabular data I use (since we don't have access to 'Ad_table (extra).csv'):
data_train_tabular: dvm_features_train_noOH_all_views.csv

DVM Cars dataset explanation

Hey,
I am not sure how the dataset preparation is done.
As far as I understood, the labels are the models of the car brands, e.g. Abarth_124 Spider.
"Car models with less than 100 samples were removed, resulting in 286 target classes."

  • However, when I do this step and remove all the classes with fewer than 100 images, I get 580 classes. What am I missing? (A sketch of the filtering step follows this list.)

  • Is the order of the images relevant or is it just important that image_list and label_list are in the same order?
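For comparison, here is a minimal sketch of the filtering step. The column name "Genmodel" and the file name are assumptions based on the DVM-CAR metadata tables, not the paper's exact pipeline; the paper may also count samples only after additional view or quality filtering, which would change the resulting class count.

# Hypothetical filtering sketch: keep only car models with at least 100
# samples. Column and file names are assumptions.
import pandas as pd

df = pd.read_csv("Ad_table.csv")
counts = df["Genmodel"].value_counts()
kept = counts[counts >= 100].index
df = df[df["Genmodel"].isin(kept)]
print(f"{len(kept)} classes remain")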

Thank you!

code for files in data_base

Hi,
could you provide the code you used for preparing "labels_model_all_train_all_views.pt" in the data_base folder (from the config)?

I am having trouble reproducing your work.
I think it might have something to do with preparing the training dataset.

Also, is there any reason you set "nclasses = hparams.batch_size" in line 33 of models/MultimodalSimCLR.py?
Are we meant to have no more than one sample of the same class in a batch? (See the sketch below.)
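One plausible explanation (an assumption, not a confirmed answer): in a CLIP-style contrastive loss, each image's positive is the tabular embedding at the same batch index, so the logits form a B x B matrix and the "classes" that metrics like top-k accuracy run over are the B batch positions. That would make nclasses = batch_size natural; duplicated classes within a batch simply become false negatives rather than being forbidden. A minimal sketch:

# Minimal CLIP-style loss sketch (assumption: not the repo's exact code).
# The target for row i is index i, so "nclasses" equals the batch size.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, tab_emb, temperature=0.1):
    img = F.normalize(img_emb, dim=1)
    tab = F.normalize(tab_emb, dim=1)
    logits = img @ tab.T / temperature                  # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2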

Thanks

Testing Issue

Hi,

Thank you for your research work. I am having trouble running python test.py.
The test.py file contains self.run_process(controller.control, args).
So, do you have a separately written "controller" file that you imported but missed adding to the repository, or is it a standard Python library?
The environment.yaml file you provided does not list any dependency on "controller". I tried pip install controller and ran python test.py again, but then I get AttributeError: module 'controller' has no attribute 'control'.
So, is there some other Python dependency that provides controller.control?

I am done with the training part, but I have not been able to proceed with testing for a long time now. Can you please help me resolve this issue or suggest a way to bypass this controller part, if possible?

Thank You for your help.

Image Normalization

Thank you for this great contribution. I have a question regarding the normalization of images. I found that in the notebook file data/create_dvm_dataset.ipynb, you transform the value range from 0-255 to 0-1 by dividing by 255. However, image normalization normally also includes image = (image - mean) / std, so that the image distribution has a mean of 0 and a std of 1.

Could I kindly ask if you have some special reasons for not doing that?
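For reference, here is the standard two-step normalization the question alludes to. The mean/std values below are ImageNet's and are purely illustrative, not the repository's settings:

# Illustrative standard normalization: scale to [0, 1], then standardize
# per channel. The statistics shown are ImageNet's, used here only as an
# example.
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),                            # uint8 [0, 255] -> float [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])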

__init__.py missing in "datasets"

I think an __init__.py is missing in the datasets folder, so it cannot be recognized as an importable package. At least, creating an empty datasets/__init__.py fixed an import problem for me. Just a minor issue.

cardiac_new

Where is the dataset cardiac_new? I can't find it.

Questions on data preprocessing and batch size

Thanks for uploading the Data folder.
I have a few more questions; I would be grateful if you could help me reproduce the paper.

  1. What was the purpose of the "check_or_save" function? (Could you release the code for this function?)
  2. With nclasses = batch_size, should there be only a single sample from a given class in a batch?
  3. If so, how can the batch size be larger than the number of classes? (e.g., DVM has 286 classes but is trained with a batch size of 512)
  4. What is the purpose of "indices" in the CLIP loss?

Thanks for your help !

Potential Parameter Issues in Evaluation

Hi Hager,

Thank you for sharing your work. I have been going through your code and found it extremely insightful.

However, I came across a few lines that I thought might need some adjustments. Could you please take a look and confirm if these changes are indeed necessary?

  1. Evaluator.py, lines 17-22: The code uses hparams.datatype to select the evaluation model based on the modality. Should it instead be hparams.eval_datatype? These two parameters seem to have different meanings.

  2. evaluate.py, lines 56-59: hparams.weights is used, but I noticed that the config.yaml file only contains a parameter named weighted_sampler. Is it possible that hparams.weights should be changed to weighted_sampler?

Thank you.

Environment Setup Issue

Hello Hager,

Thank you for this awesome research work.
I am having trouble configuring the environment properly from the yaml file, as some of the libraries are deprecated. Is it possible for you to share your environment as a setup file so that I can mirror it?
I appreciate your help.
Thank you

CMR data preprocessing

I am particularly interested in the preprocessing method for CMR images you mentioned in your paper, specifically the part where you state: "The images used are two-channel 2D images whose channels are the middle baso-apical slice of the short axis cardiac MRI image at end-systolic and end-diastolic phases." In my experience with short-axis CMR images, the number of slices varies between individuals due to differences in heart size, often ranging from 8 to 12 slices. I am curious how you determined the middle baso-apical slice in your study.

I have contemplated a few approaches, such as consistently selecting the fourth or fifth slice, or choosing the slice at the median index. However, I am keen to understand the methodology you employed in your research. Or would you be comfortable disclosing your data preprocessing code? (A sketch of one interpretation follows.)
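For concreteness, here is one plausible reading, taking the slice at the middle index of the baso-apical stack and stacking the two phases as channels. This is an assumption about the paper's method, not the authors' code:

# Hedged sketch: pick the middle slice of the short-axis stack and stack the
# ES and ED frames as two channels. The selection rule is an assumption.
import numpy as np

def middle_slice_two_phase(volume, es_idx, ed_idx):
    """volume: array shaped (n_slices, n_timepoints, H, W)."""
    mid = volume.shape[0] // 2                  # middle baso-apical slice
    return np.stack([volume[mid, es_idx],
                     volume[mid, ed_idx]], axis=0)   # (2, H, W)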
Thank you very much.

DVM Dataset Processing Out of Memory Issue

Hi,

Firstly, I'd like to express my admiration for the impressive work.

I encountered an out-of-memory issue when running create_dvm_dataset.ipynb, particularly at the image normalization and saving step. To align my setup with yours and ensure a smooth run, could you please share the GPU memory specifications and any batch size or optimization strategies used in your environment? (A possible workaround is sketched below.)
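One way to sidestep the blow-up, sketched under the assumption that the saved tensor is a stacked uint8 image tensor: keep it as uint8 on disk and defer the division by 255 to the Dataset, so the full float32 copy never has to exist at once.

# Hedged workaround: store images as uint8 and normalize per item at load
# time instead of materializing the whole float tensor up front.
import torch
from torch.utils.data import Dataset

class UInt8ImageDataset(Dataset):
    def __init__(self, path):
        self.images = torch.load(path)           # (N, C, H, W) uint8
    def __len__(self):
        return self.images.size(0)
    def __getitem__(self, idx):
        return self.images[idx].float() / 255.0  # one image at a time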

Thanks in advance for your time.

DVM Preprocessing Error: No such file or directory: 'features/dvm_features_train_noOH_all_views_physical_jittered_50.csv'

Hello,

First of all, thanks for your great work.

I'd like to ask some questions about your DVM dataset preprocessing Jupyter notebook.

It works well up to the "Check transform" cells, but it fails when I try to execute the cell right below.

[screenshot of the error]

What might be relevant is the cell after that, which defines the function add_jitter. But it also fails, because "Ad_table_physical_filled.csv" does not exist.

[screenshot of the error]

Also possibly relevant is another cell that loads "Ad table (extra).csv", but I don't know what this file is and cannot find it in the DVM folder.

[screenshot of the notebook cell]

May I ask how I can handle this problem?

Thanks in advance,

Carrtesy

How to do image preprocessing

Hi,

I'm trying to use your network structure for pulmonary embolism detection based on CT images and electronic medical records, but I have some problems in the reproduction process:

  1. Since I haven't obtained the UKBB dataset yet, I don't know its structure. Can you explain how you handle medical images and electronic medical records, and what their directory layout should look like in the code?
  2. If the inputs are CT images and electronic medical records, how can I create the respective dataset .py files?

Thanks!
