stanfordmimi / ddm2
[ICLR2023] Official repository of DDM2: Self-Supervised Diffusion MRI Denoising with Generative Diffusion Models
Hi, thank you for your great work. I encountered this error while running the program. Can you guide me on exactly where it occurred?
Traceback (most recent call last):
File "D:\DDM2\train_noise_model.py", line 25, in <module>
opt = Logger.parse(args, stage=1)
File "D:\DDM2\core\logger.py", line 28, in parse
with open(opt_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'config/sr_sr3_16_128.json'
Thank you very much!
Hello, your work is impressive. I am trying to reproduce your results and I have two questions.
First, it seems that your code only supports single-GPU training, is that right? Could data parallelism accelerate training?
Second, can this denoising method be applied to other computer vision fields? Have you conducted any experiments?
Thanks!
There are no clear instructions in the repo on how to calculate and verify the metrics published in the paper, nor are they computed during the training and validation steps; only the input and denoised images are saved as results in the experiments folder.
The PSNR and SSIM metrics defined in https://github.com/StanfordMIMI/DDM2/blob/4f5a551a7f16e18883e3bf7451df7b46e691236d/core/metrics.py are never referenced.
Can you please add instructions for calculating the metrics after the third stage of training is completed?
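Until official instructions land, PSNR at least is easy to compute from the saved input/denoised volumes. A minimal sketch in plain NumPy (not the repo's core/metrics.py; for SSIM, skimage.metrics.structural_similarity is a safer choice than hand-rolling the windowed statistics):

```python
import numpy as np

def psnr(gt, pred, data_range=None):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    if data_range is None:
        # default: dynamic range of the reference volume
        data_range = gt.max() - gt.min()
    mse = np.mean((gt - pred) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10((data_range ** 2) / mse)
```

Apply it slice-by-slice or volume-by-volume to the images saved in the experiments folder against whatever reference you use.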
Hi, thank you for your contribution and sharing.
I would like to ask: why does sqrt_alphas_cumprod_prev need a 1 appended? And how should we handle the case where the time step matched in the second stage is 0?
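For context on the first question: in DDPM-style schedules the cumulative product \bar{alpha}_t is already below 1 at t = 1, so a 1.0 is prepended to represent \bar{alpha}_0 = 1 (a completely clean image), which keeps indexing with t-1 valid at the very first step. A NumPy sketch with hypothetical schedule values:

```python
import numpy as np

# hypothetical DDPM-style linear schedule (values are illustrative)
betas = np.linspace(5e-5, 1e-2, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

# prepend 1.0: \bar{alpha}_0 = 1 encodes "no noise at step 0",
# so indexing with t-1 stays valid even at the first step t = 1
sqrt_alphas_cumprod_prev = np.sqrt(np.append(1.0, alphas_cumprod))

t = 1
lo, hi = sqrt_alphas_cumprod_prev[t], sqrt_alphas_cumprod_prev[t - 1]
level = np.random.uniform(lo, hi)  # continuous noise level between steps t-1 and t
```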
Hi,
Thank you for this repository. It is great work. I have a few questions and would like to clarify some doubts.
I was successfully able to run all three stages, but I have a question: Do we need to pass the Stage 1 model to Stage 3 during training? I noticed that when I train the Stage 3 diffusion stage by loading the network of Stage 1 results through the config file, the results are better.
I pass the Stage 1 trained model under this:
"noise_model": { "resume_state":
However, if the Stage 1 noise model is not passed to Stage 3 training, the results drop and the network fails to learn even after all the iterations.
Can you explain why that might be the case and why the weights and state dictionary of Stage 1 are needed for Stage 3?
Thank you.
Hi, I looked closely at the posted code and noticed that the sampling process differs from the original DDPM, and the training labels differ as well: DDPM predicts the noise, while here the target is the original noisy image. Is the iterative formula for the sampling process described in any article?
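For reference, one standard way an x0-predicting model is sampled is via the DDPM posterior q(x_{t-1} | x_t, x_0) (Ho et al. 2020, Eq. 7), plugging the model's x0 estimate in for the true x_0; whether DDM2's iteration matches this exactly is a question for the authors. A NumPy sketch with hypothetical schedule values:

```python
import numpy as np

# hypothetical schedule (illustrative values)
betas = np.linspace(5e-5, 1e-2, 1000)
alphas = 1.0 - betas
acp = np.cumprod(alphas)             # \bar{alpha}_t
acp_prev = np.append(1.0, acp[:-1])  # \bar{alpha}_{t-1}

def posterior_mean(x0_pred, x_t, t):
    """Mean of q(x_{t-1} | x_t, x_0), DDPM Eq. 7, with the model's
    x0 estimate substituted for the true x_0."""
    c0 = betas[t] * np.sqrt(acp_prev[t]) / (1.0 - acp[t])
    ct = (1.0 - acp_prev[t]) * np.sqrt(alphas[t]) / (1.0 - acp[t])
    return c0 * x0_pred + ct * x_t
```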
Hi @tiangexiang
I have a few questions about evaluation and metrics.
I run denoise.py with the save option, but in the code I see:
if args.save:
    dataset_opt['val_volume_idx'] = 32  # save only the 32nd volume
    dataset_opt['val_slice_idx'] = 'all'  # save all slices
After this step I am going to run the evaluation metrics, so my question is:
Is that value (val_volume_idx = 32) correct?
My second question is related to quantitative_metrics.ipynb:
Can I use quantitative_metrics.ipynb to evaluate the results obtained for PPMI or Sherbrooke datasets?
Thanks!
I was experimenting with a simulated 4D ellipse phantom, with the 4th dimension being time. I generated a 3D ellipse containing smaller ellipses, changed the signal intensity over time, and concatenated the frames into a 4D MRI. I added noise to it and ran the three stages. When the noise was high, the model performed fine. When I added less noise, the first stage was good, but the third stage and denoising gave an empty image. I have added the results here. Can you guide me in debugging why this could be happening?
First stage:
But the third stage:
Whereas when I added high noise, the third stage looked like:
Thank you in advance for your time.
Hello,
Thanks for making this tool available. When I processed the Hardi_150 example following your instructions, the denoised image has a spatial resolution of 1x1x1 mm^3, whereas the input image voxel size is 2x2x2 mm^3. The volume dimensions are the same in both input and output: 81x106x76.
Any ideas what could be causing this?
Thanks!
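One hedged guess about the resolution mismatch: the voxel size is stored in the NIfTI affine, not in the data array, so if the denoised volume is written out with a default (identity) affine it will read as 1x1x1 mm even though the array dimensions match. A quick NumPy check of what an affine encodes:

```python
import numpy as np

def voxel_size(affine):
    """Voxel dimensions in mm: norms of the first three columns of the
    NIfTI affine's 3x3 rotation/zoom block."""
    return np.linalg.norm(affine[:3, :3], axis=0)

# a 2 mm isotropic affine, matching the reported input resolution
print(voxel_size(np.diag([2.0, 2.0, 2.0, 1.0])))  # [2. 2. 2.]
```

If this is the cause, saving the output with the input volume's affine should restore the correct spacing.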
Hi, thanks for publishing the code!
I am trying to apply your code to other MRI modalities, whose noise levels may be less severe than in DWI images.
However, I found that the matched states in Stage 2 are quite small (around 0~50). Do you have any insights on how to use your method on less noisy images, or have you done these kinds of experiments?
Thanks so much!
Hi,
Great work and great paper. I'm wondering if you are open to other researchers cross-validating and then working along this direction. Maybe you could provide a sample dataset with its associated trained model, so that we can use it as a reference to run unit tests and avoid any caveats. Thanks!
cheers
Joe
Great job! Thanks! Will you upload the trained model in the future? It would let us run inference directly, without training. I don't know if it's okay to ask; thank you again!
File "/home/dell/ZXT/Code/DiffusionModels/DDM2/model/model.py", line 71, in __init__
self.load_network()
File "/home/dell/ZXT/Code/DiffusionModels/DDM2/model/model.py", line 224, in load_network
self.optG.load_state_dict(opt['optimizer'])
File "/home/dell/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/optim/optimizer.py", line 171, in load_state_dict
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
I encountered the above problem when running ./run_stage3.sh. How can I solve it?
Hello, I understand that the first stage involves generating data with a denoising model, which results in data lacking medical details. However, if I have a set of corresponding clearer images for the training data, can I replace the Stage 1 generated data with these clearer images? If so, would this benefit the subsequent second and third stages?
Thanks!
Can you give an example, for a dataset like stanford_hardi, of what the typical dataroot in the config looks like?
I have downloaded stanford_hardi into a folder, but specifying the folder path does not work. Can you update the README with proper instructions for this?
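For what it's worth, other issues in this thread pass dataroot as the path to the .nii.gz file itself rather than to its folder. A hypothetical fragment (key names taken from the PPMI config quoted later on this page; your paths and mask will differ):

```
"datasets": {
    "train": {
        "name": "hardi",
        "dataroot": "/absolute/path/to/HARDI150.nii.gz",  // the .nii.gz file itself, not its folder
        "valid_mask": [10, 160],
        "phase": "train",
        "padding": 3,
        "val_volume_idx": 40,
        "batch_size": 32,
        "in_channel": 1,
        "num_workers": 0,
        "use_shuffle": true
    }
}
```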
Can I ask what the correct shape for the matched state is? I originally thought it should be an integer referring to t, but that's not correct.
Has this been applied to another type of MRI dataset?
The MRI I am working on has dimensions 90,90,3,5 (magnetic field x evolution time).
I tried to combine them as I did for Patch2Self, which seemed to work.
I get this error:
export CUDA_VISIBLE_DEVICES=0
24-03-10 19:49:07.449 - INFO: [Phase 1] Training noise model!
Loaded data of size: (90, 90, 1, 15)
Traceback (most recent call last):
File "/Users/dolorious/Desktop/MLMethods/DDM2/train_noise_model.py", line 42, in <module>
train_set = Data.create_dataset(dataset_opt, phase)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dolorious/Desktop/MLMethods/DDM2/data/__init__.py", line 30, in create_dataset
dataset = MRIDataset(dataroot=dataset_opt['dataroot'],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dolorious/Desktop/MLMethods/DDM2/data/mri_dataset.py", line 35, in __init__
self.raw_data = np.pad(raw_data.astype(np.float32), ((0,0), (0,0), (in_channel//2, in_channel//2), (self.padding, self.padding)), mode='wrap').astype(np.float32)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dolorious/.pyenv/versions/3.12.0/lib/python3.12/site-packages/numpy/lib/arraypad.py", line 819, in pad
raise ValueError(
ValueError: can't extend empty axis 3 using modes other than 'constant' or 'empty'
(ddm2) dolorious@bigDPotter DDM2 %
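The error itself is reproducible in isolation: np.pad with mode='wrap' repeats existing values, so it needs at least one element along the padded axis. Since the loader reported a size of (90, 90, 1, 15), one hedged guess is that a masking/slicing step (e.g. a valid_mask that excludes every volume) left axis 3 empty before the pad:

```python
import numpy as np

# wrap padding repeats existing values, so the padded axis must be
# non-empty; an empty axis raises exactly the reported ValueError
x = np.zeros((90, 90, 1, 0))  # axis 3 has size 0
try:
    np.pad(x, ((0, 0), (0, 0), (1, 1), (3, 3)), mode='wrap')
except ValueError as e:
    print(e)
```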
Hi,
I have 2 questions regarding the datasets:
1- Can DDM2 be run with multi-shell datasets?
I see in the paper that DDM2 was run with:
- Hardi with b-value = 2000
- Sherbrooke with b-value = 1000
- Parkinson with b-value = 2000
Can DDM2 be trained on a dataset with b-values = {0, 1000, 2000} ?
I am interested in training on a multi-shell dataset.
2- When I run the denoising process, the resulting dataset does not have the b0s.
Why are b0s not returned in the denoised dataset? Are b0s used in all other phases?
Thanks!
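Regarding the mechanics behind question 2 (the multi-shell question itself is for the authors): valid_mask in the configs is a [start, end) volume range, and with valid_mask = [10, 160] the first 10 volumes — the b0s in Stanford HARDI — are never fed to the model, which would explain their absence from the output. A NumPy sketch with hypothetical b-values:

```python
import numpy as np

# hypothetical b-values: 10 b0 volumes followed by two 75-volume shells
bvals = np.array([0] * 10 + [1000] * 75 + [2000] * 75)

# DDM2-style valid_mask = [10, 160] keeps volumes 10..159, i.e. it skips
# the leading b0s entirely, so they never appear in the denoised output
valid = np.zeros(len(bvals), dtype=bool)
valid[10:160] = True

# to restrict training to a single shell instead, select volumes by b-value
shell_1000 = np.flatnonzero(bvals == 1000)
```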
I'm having difficulty understanding the code you provided. Could you please clarify the following points for me?
Hi Tiange,
I'm having an issue while running Stage 3. The run has been stuck at the following step for 2 days now. I updated to PyTorch 2.0. Did you encounter this issue?
23-04-04 10:35:55.230 - INFO: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
23-04-04 10:35:55.231 - INFO: NumExpr defaulting to 8 threads.
23-04-04 10:36:04.536 - INFO: MRI dataset [hardi] is created.
23-04-04 10:36:05.309 - INFO: MRI dataset [hardi] is created.
23-04-04 10:36:05.309 - INFO: Initial Dataset Finished
23-04-04 10:36:05.309 - INFO: ('2.0.0', '11.8')
23-04-04 10:36:12.078 - INFO: Initialization method [orthogonal]
Hi @tiangexiang
Thanks for the great work.
Could you provide information about computational requirements of the model training from Stage-I through Stage-III i.e. number of GPUs used and approximate total training time?
Hello, I was wondering if this can be used for 2D images
Hi, thank you for this amazing paper. I wanted to ask a few questions in detail.
Should the data used for training be a single 4D volume?
Can a model trained on only one 4D volume be used on other subjects' 4D data?
Hi, thank you for your research. I am very interested in it, but I got the following error when I tried to train (Stage 1).
(ddm2) root@2080Ti:~/DDM2# python3 train_noise_model.py -p train -c config/hardi_150.json
1.8.0 10.2
export CUDA_VISIBLE_DEVICES=0
23-07-01 00:03:35.315 - INFO: [Phase 1] Training noise model!
Loaded data of size: (81, 106, 76, 160)
23-07-01 00:03:38.916 - INFO: MRI dataset [hardi] is created.
Loaded data of size: (81, 106, 76, 160)
23-07-01 00:03:41.523 - INFO: MRI dataset [hardi] is created.
23-07-01 00:03:41.524 - INFO: Initial Dataset Finished
dropout 0.0 encoder dropout 0.0
23-07-01 00:03:45.363 - INFO: Noise Model is created.
23-07-01 00:03:45.363 - INFO: Initial Model Finished
Traceback (most recent call last):
File "train_noise_model.py", line 72, in <module>
trainer.optimize_parameters()
File "/root/DDM2/model/model_stage1.py", line 62, in optimize_parameters
outputs = self.netG(self.data)
File "/root/miniconda3/envs/ddm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/DDM2/model/mri_modules/noise_model.py", line 44, in forward
return self.p_losses(x, *args, **kwargs)
File "/root/DDM2/model/mri_modules/noise_model.py", line 36, in p_losses
x_recon = self.denoise_fn(x_in['condition'])
File "/root/miniconda3/envs/ddm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/DDM2/model/mri_modules/unet.py", line 286, in forward
x = layer(x)
File "/root/miniconda3/envs/ddm2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/miniconda3/envs/ddm2/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/root/miniconda3/envs/ddm2/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
I created the env on a 2080 Ti using the .yml file provided in the repo, so the experimental setup is the same as yours. I therefore suspect something is wrong with my data processing or config file.
I have used the following code to save the Hardi150 data:
from dipy.data import get_fnames
from dipy.io.image import load_nifti, save_nifti

hardi_fname, hardi_bval_fname, hardi_bvec_fname = get_fnames('stanford_hardi')
data, affine = load_nifti(hardi_fname)
save_nifti('hardi.nii.gz', data, affine)
and correspondingly updated dataroot in lines 17 and 30 of config/hardi_150.json, keeping everything else as it is.
Is the way of saving the data and using the config file correct?
I'm a rookie, so my questions are probably very naive.
I would greatly appreciate it if you could respond to me.
Hello,
Firstly thanks for this great paper and detailed git.
I am training the network on DCE-MRI after adding noise explicitly to the 4D data. My dataset is 320x320. I was able to train Stage 1 and Stage 2 successfully.
In Stage 3 I am facing an error at model.py, line 223:
self.optG.load_state_dict(opt['optimizer'])
The error: loaded state dict contains a parameter group that doesn't match the size of optimizer's group.
I delved deeper and found "initial_lr" missing from self.optG.state_dict()['param_groups'], while the loaded dict opt['optimizer']['param_groups'] had it.
I thought the issue was that a newly initialized optimizer is receiving a trained optimizer's state, so I added a line after line 65 in model.py:
65| self.optG = torch.optim.Adam(
optim_params, lr=opt['train']["optimizer"]["lr"])
Line added:
66 | self.scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(self.optG, opt['train']['n_iter'], eta_min=opt['train']["optimizer"]["lr"])
After this addition, both self.optG and opt['optimizer'] have the same size and parameter groups, yet the error persists.
Am I missing something, or was my approach wrong?
The changes I did for my purpose :
I had to change image_size to 320 in the .json files and uncomment the resize line in the transform in mri_dataset.py, because I did not want to downsize my data, and I reduced the batch size to 2 for my training.
I thank you in advance for your time.
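A hedged observation on the initial_lr theory: in PyTorch, merely constructing an LR scheduler writes 'initial_lr' into the optimizer's param groups, but Optimizer.load_state_dict raises the "size of optimizer's group" ValueError only when the number of parameter tensors per group differs between checkpoint and model — so the mismatch more likely comes from the network itself (e.g. a different architecture after the image_size change) than from the missing key. A minimal demonstration of the key stamping:

```python
import torch

net = torch.nn.Linear(4, 4)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
print('initial_lr' in opt.param_groups[0])  # False: a fresh optimizer lacks the key

# constructing a scheduler stamps 'initial_lr' into every param group, which
# is why a checkpoint saved after scheduler creation carries the key
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
print('initial_lr' in opt.param_groups[0])  # True
```

Comparing len(g['params']) for each group of the new optimizer against the checkpointed one should pinpoint the actual mismatch.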
Hi, I wanted to denoise my own dataset but it failed. It looks like this project doesn't include any pretrained model weights; is that true?
Hi, thank you for this amazing paper. I wanted to ask a few questions in detail.
I have seen in multiple places (e.g. mri_dataset.py) that you define valid_mask = [10, 160]. Considering your data size of (81, 106, 76, 160), are there any particular reasons you chose val_volume_idx = 40 and valid_mask = [10, 160] in hardi150.json? The reason I ask is that I am working with (118, 118, 25, 56) 4D diffusion data, and I run into issues when setting up mri_dataset.py as follows:
valid_mask = np.zeros(56,)
valid_mask[10:] += 1
valid_mask = valid_mask.astype(bool)
dataset = MRIDataset("/home/anar/DDM2/data/HARDI150.nii.gz", valid_mask=[10, 56],
                     phase='train', val_volume_idx=40, padding=3)
Traceback (most recent call last):
File "/home/anar/DDM2/train_noise_model.py", line 92, in <module>
for _, val_data in enumerate(val_loader):
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/anar/miniconda3/envs/ddm2/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/anar/DDM2/data/mri_dataset.py", line 118, in __getitem__
raw_input[:,:,slice_idx:slice_idx+2*(self.in_channel//2)+1,[volume_idx+self.padding]]), axis=-1)
IndexError: index 41 is out of bounds for axis 3 with size 8
Do you have any idea where this could originate from?
Hi!
I'm trying to run Stage 1 with the PPMI dataset but I get the error:
Traceback (most recent call last):
  File "/content/DDM2_test/train_noise_model.py", line 98, in <module>
    trainer.optimize_parameters()
  File "/content/DDM2_test/model/model_stage1.py", line 69, in optimize_parameters
    outputs = self.netG(self.data)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/DDM2_test/model/mri_modules/noise_model.py", line 44, in forward
    return self.p_losses(x, *args, **kwargs)
  File "/content/DDM2_test/model/mri_modules/noise_model.py", line 36, in p_losses
    x_recon = self.denoise_fn(x_in['condition'])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/DDM2_test/model/mri_modules/unet.py", line 286, in forward
    x = layer(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (double) and bias type (float) should be the same
It's the same dataset that you use in your article.
I have added a new config file as follows:
{
"name": "ppmi64",
"phase": "train", // always set to train in the config
"gpu_ids": [
0
],
"path": { //set the path
"log": "logs",
"tb_logger": "tb_logger",
"results": "results",
"checkpoint": "checkpoint",
"resume_state": null // UPDATE THIS FOR RESUMING TRAINING
},
"datasets": {
"train": {
"name": "ppmi",
"dataroot": "/PPMI/noisy.nii.gz",
"valid_mask": [10,64],
"phase": "train",
"padding": 3,
"val_volume_idx": 40, // the volume to visualize for validation
"val_slice_idx": 40, // the slice to visualize for validation
"batch_size": 32,
"in_channel": 1,
"num_workers": 0,
"use_shuffle": true
},
"val": {
"name": "ppmi",
"dataroot": "/PPMI/noisy.nii.gz",
"valid_mask": [10,64],
"phase": "val",
"padding": 3,
"val_volume_idx": 40, // the volume to visualize for validation
"val_slice_idx": 40, // the slice to visualize for validation
"batch_size": 1,
"in_channel": 1,
"num_workers": 0
}
},
"model": {
"which_model_G": "mri",
"finetune_norm": false,
"drop_rate": 0.0,
"unet": {
"in_channel": 1,
"out_channel": 1,
"inner_channel": 32,
"norm_groups": 32,
"channel_multiplier": [
1,
2,
4,
8,
8
],
"attn_res": [
16
],
"res_blocks": 2,
"dropout": 0.0,
"version": "v1"
},
"beta_schedule": { // use manual beta_schedule for acceleration
"train": {
"schedule": "rev_warmup70",
"n_timestep": 1000,
"linear_start": 5e-5,
"linear_end": 1e-2
},
"val": {
"schedule": "rev_warmup70",
"n_timestep": 1000,
"linear_start": 5e-5,
"linear_end": 1e-2
}
},
"diffusion": {
"image_size": 128,
"channels": 3, //sample channel
"conditional": true // not used for DDM2
}
},
"train": {
"n_iter": 100000, //150000,
"val_freq": 1e3,
"save_checkpoint_freq": 1e4,
"print_freq": 1e2,
"optimizer": {
"type": "adam",
"lr": 1e-4
},
"ema_scheduler": { // not used now
"step_start_ema": 5000,
"update_ema_every": 1,
"ema_decay": 0.9999
}
},
// for Phase1
"noise_model": {
"resume_state": null,
"drop_rate": 0.0,
"unet": {
"in_channel": 2,
"out_channel": 1,
"inner_channel": 32,
"norm_groups": 32,
"channel_multiplier": [
1,
2,
4,
8,
8
],
"attn_res": [
16
],
"res_blocks": 2,
"dropout": 0.0,
"version": "v1"
},
"beta_schedule": { // use manual beta_schedule for acceleration
"linear_start": 5e-5,
"linear_end": 1e-2
},
"n_iter": 10000,
"val_freq": 2e3,
"save_checkpoint_freq": 1e4,
"print_freq": 1e3,
"optimizer": {
"type": "adam",
"lr": 1e-4
}
},
"stage2_file": "" // **UPDATE THIS TO THE PATH OF PHASE2 MATCHED FILE**
}
Do we need to change anything else in the config file?
Can you share the configuration files you have used for the other datasets?
Hi,
Thank you for sharing your code.
Can you please share the trained models (both 2D and 3D)?
Hello, thank you for this amazing research and open-source code.
Sorry to take up your time; I have a question not related to this code base but related to your DDM2 work.
When I was reproducing the FA map in the article, I realized that using only the noisy data didn't produce the matching figure in the paper.
my code is:
import numpy as np
import dipy.reconst.dti as dti
from dipy.core.gradients import gradient_table
from dipy.data import get_fnames
from dipy.io.gradients import read_bvals_bvecs
from dipy.io.image import load_nifti
from dipy.segment.mask import median_otsu
import matplotlib.pyplot as plt

hardi_fname, hardi_bval_fname, hardi_bvec_fname = get_fnames('stanford_hardi')
data, affine = load_nifti(hardi_fname)
bvals, bvecs = read_bvals_bvecs(hardi_bval_fname, hardi_bvec_fname)
gtab = gradient_table(bvals, bvecs)

maskdata, mask = median_otsu(data, vol_idx=[0, 1], median_radius=4, numpass=2,
                             autocrop=False, dilate=1)

slice_idx = 40
data = data[:, :, slice_idx:slice_idx + 1]
mask = mask[:, :, slice_idx:slice_idx + 1]

tenmodel = dti.TensorModel(gtab)
tenfit = tenmodel.fit(data, mask=mask)

maps = ['fa', 'md', 'ad', 'rd']
for name in maps:
    attr = np.squeeze(getattr(tenfit, name))  # drop the singleton slice axis for imshow
    attr = (attr - np.min(attr)) / (np.max(attr) - np.min(attr))
    plt.imshow(attr, cmap='jet')
    plt.axis('off')
    plt.savefig('origin{}.png'.format(name))
    plt.close()
The map in the paper:
The map my code gives:
Thank you for taking the time. Good luck with your research.
Hi, I'm having trouble understanding Stage 3. Can you give a brief description of how Stage 3 works, and what does continuous_sqrt_alpha_cumprod do once it's passed into the network?
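For background (an SR3-style pattern this codebase appears to follow, though the authors should confirm): instead of an integer timestep embedding, training draws a continuous noise level between two adjacent entries of sqrt_alphas_cumprod_prev, and that scalar is what conditions the network. A NumPy sketch with hypothetical schedule values:

```python
import numpy as np

betas = np.linspace(5e-5, 1e-2, 1000)
sqrt_acp_prev = np.sqrt(np.append(1.0, np.cumprod(1.0 - betas)))

rng = np.random.default_rng(0)
t = int(rng.integers(1, 1000))
# the scalar the network is conditioned on: a noise level drawn between
# the discrete steps t-1 and t (continuous, unlike DDPM's integer t)
level = rng.uniform(sqrt_acp_prev[t], sqrt_acp_prev[t - 1])
# forward process at that level: x_t = level * x0 + sqrt(1 - level**2) * eps
```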
I'm running into the same problem as a closed issue, as follows:
Traceback (most recent call last):
File "train_diff_model.py", line 54, in <module>
diffusion = Model.create_model(opt)
File "/home/star/jiangjiawei/DDM2-main/model/__init__.py", line 7, in create_model
m = M(opt)
File "/home/star/jiangjiawei/DDM2-main/model/model.py", line 71, in __init__
self.load_network()
File "/home/star/jiangjiawei/DDM2-main/model/model.py", line 223, in load_network
self.optG.load_state_dict(opt['optimizer'])
File "/home/star/anaconda3/envs/py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 201, in load_state_dict
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
As suggested, I checked the config file paths for Stage 1 and Stage 2, and I think the path settings are correct. Are there any other problems that may cause this issue? The training logs and the .txt file are also attached.
Hi, thanks a lot for your attention to these issues. I wanted to ask about the following problem when training Stage 1:
24-01-18 01:06:41.203 - INFO: [Phase 1] Training noise model!
24-01-18 01:07:04.744 - INFO: MRI dataset [hardi] is created.
24-01-18 01:07:23.001 - INFO: MRI dataset [hardi] is created.
24-01-18 01:07:23.001 - INFO: Initial Dataset Finished
/home/anar/mambaforge-pypy3/envs/ddm2_image/lib/python3.8/site-packages/torch/cuda/__init__.py:104: UserWarning:
NVIDIA RTX 6000 Ada Generation with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX 6000 Ada Generation GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
24-01-18 01:07:23.542 - INFO: Noise Model is created.
24-01-18 01:07:23.542 - INFO: Initial Model Finished
1.8.0 10.2
export CUDA_VISIBLE_DEVICES=2
Loaded data of size: (118, 118, 25, 56)
Loaded data of size: (118, 118, 25, 56)
dropout 0.0 encoder dropout 0.0
raw_input shape before slicing: (118, 118, 1, 3)
raw_input shape after slicing: (118, 118, 3)
[the two lines above repeat ~30 more times]
Traceback (most recent call last):
File "train_noise_model.py", line 72, in <module>
trainer.optimize_parameters()
File "/home/anar/DDM2/model/model_stage1.py", line 62, in optimize_parameters
outputs = self.netG(self.data)
File "/home/anar/mambaforge-pypy3/envs/ddm2_image/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anar/DDM2/model/mri_modules/noise_model.py", line 44, in forward
return self.p_losses(x, *args, **kwargs)
File "/home/anar/DDM2/model/mri_modules/noise_model.py", line 36, in p_losses
x_recon = self.denoise_fn(x_in['condition'])
File "/home/anar/mambaforge-pypy3/envs/ddm2_image/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anar/DDM2/model/mri_modules/unet.py", line 286, in forward
x = layer(x)
File "/home/anar/mambaforge-pypy3/envs/ddm2_image/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/anar/mambaforge-pypy3/envs/ddm2_image/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/anar/mambaforge-pypy3/envs/ddm2_image/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: no kernel image is available for execution on the device
Previously, when I was trying to denoise HARDI150 volumes, I didn't specify any PyTorch version and used Python >= 3.10. But after noticing your initial environment.yaml, I pinned the exact versions of torch, torchvision, and python, and frankly that's when I started to get the above issue. Do you think it is better not to pin any PyTorch version, or should the versions exactly match?
The reason I ask is that, with the previous issue where the validation loader was not working, I suspected version mismatches with the environment file; after hitting the above problem, I am still very unsure about this as well.
I ran into some trouble while reading your code. I plan to adapt the dataset loading to my own data, but I do not understand what 'condition' means.
Could you please explain it to me?
ret = dict(X=raw_input[[-1], :, :], condition=raw_input[:-1, :, :])
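From the slicing alone (a hedged reading, not the authors' definition): the last channel of the stacked raw_input is the target X that the network learns to predict, and 'condition' is the remaining neighboring slices/volumes used as the network's input. A NumPy sketch with hypothetical shapes:

```python
import numpy as np

# hypothetical input: a stack of 7 channels (neighboring slices plus the
# target volume), each a 128x128 slice, channels first
raw_input = np.random.rand(7, 128, 128)

# X: the last channel, the noisy target the model learns to reconstruct
# condition: all the other channels, fed to the network as input
ret = dict(X=raw_input[[-1], :, :], condition=raw_input[:-1, :, :])
print(ret['X'].shape, ret['condition'].shape)  # (1, 128, 128) (6, 128, 128)
```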