I am experimenting with the Endometrial_POLE dataset with an added "Patients_per_Image" file, created following the instructions in the README on GitHub, with the aim of using multiple images per patient. I have attached the files in case the issue is caused by their content.
Patient_to_Image.xlsx
Image_Labels.xlsx
The first issue concerns the classes used to train NaroNet. In this setting, each of the 12 patients selected in the cohort has 4 classes assigned to their knowledge graph, one per classification task.
However, the following two lines in NaroNet.py select only the second label. Is this correct? Why?
self.Train_indices = [self.IndexAndClass[i][1] for i in self.Train_indices]
self.Test_indices = [self.IndexAndClass[i][1] for i in self.Test_indices]
I also observed that the training and test index sets are always the same, as shown in the attached image. This causes a problem: the class assigned to the patients selected for the training set in the second classification task, saved in the y_train variable, is always 1, which raises the following error:
File "/home/carol/NaroNet-main/NaroNet-main/src/NaroNet/NaroNet.py", line 204, in initialize_fold
self.Train_indices, _ = ros.fit_resample(x_trainn, y_trainn)
ValueError: The target 'y' needs to have more than 1 class. I got 1 class instead
x_trainn = np.expand_dims(np.array(self.Train_indices),1)
y_trainn = [self.labels[i][0] for i in self.Train_indices]
(y_trainn and x_trainn shown for context. I added these two variables for clarity; the functionality is exactly the same.)
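For reference, the failure can be reproduced (and guarded against) without the oversampler itself: a fold whose training labels contain a single class will always trip this check. A minimal stdlib sketch (the fold values are illustrative):

```python
from collections import Counter

def can_oversample(y):
    """An oversampler needs at least two distinct classes to balance."""
    return len(Counter(y)) > 1

# Fold where every training patient has label 1, as in the failing run:
# fit_resample would raise ValueError here.
print(can_oversample([1, 1, 1, 1, 1, 1]))  # False

# A fold that mixes both classes passes the check.
print(can_oversample([0, 1, 1, 0, 1, 1]))  # True
```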
To get past this issue I set the training and test indices by hand, so that each set contains at least one instance of class 0:
self.Train_indices = [0, 1, 2, 3, 4, 6]
self.Test_indices = [5, 7, 8, 9, 10, 11]
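Rather than hard-coding the indices, the same guarantee can come from a stratified split. A minimal stdlib sketch (the per-patient labels below are hypothetical stand-ins for the second classification task):

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.5, seed=0):
    """Split indices so every class appears in both train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = max(1, int(len(idxs) * train_frac))
        k = min(k, len(idxs) - 1)  # keep at least one per class for test
        train.extend(idxs[:k])
        test.extend(idxs[k:])
    return sorted(train), sorted(test)

# Hypothetical labels for the 12 patients
labels = [0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
train_idx, test_idx = stratified_split(labels)
```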
This leads me to the second issue, shown below. For this experiment I am using a server with two GPUs. I tried each of them on two separate runs and received the same error. Both GPUs are Nvidia RTX A6000 cards with 48 GB of VRAM, more than the 11 GB hardware mentioned in the paper. From checking the code, no data-parallelism method is implemented.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 618.00 MiB (GPU 1; 47.54 GiB total capacity; 45.74 GiB already allocated; 189.12 MiB free; 45.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This first appeared on the following line:
File "/home/carol/NaroNet-main/NaroNet-main/src/NaroNet/NaroNet_model/GNN.py", line 440, in MLPintoFeatures
x = F.relu(conv0(x))
I added these lines at the beginning of GNN.py:
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'
import torch
torch.cuda.empty_cache()
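One caveat worth noting: PYTORCH_CUDA_ALLOC_CONF is read when PyTorch's CUDA caching allocator initializes, so setting it via os.environ inside GNN.py may come too late if torch has already touched the GPU elsewhere. Setting it in the shell before launching Python avoids this; a sketch (the entry-point name is a placeholder):

```shell
# Ensure the allocator sees the setting before any CUDA allocation happens
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
python your_training_script.py  # placeholder for the actual NaroNet entry point
```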
Now the error appears at:
File "/home/carol/NaroNet-main/NaroNet-main/src/NaroNet/NaroNet_model/GNN.py", line 443, in MLPintoFeatures
x = F.relu(conv0(x))