hgcn's Introduction

HGCN

Hybrid Graph Convolutional Network with Online Masked Autoencoder for Robust Multimodal Cancer Survival Prediction

Abstract

Cancer survival prediction requires exploiting complementary multimodal information (e.g., pathological, clinical and genomic features), and it is even more challenging in clinical practice due to the incompleteness of patients' multimodal data. Existing methods lack sufficient intra- and inter-modal interactions and suffer from severe performance degradation caused by missing modalities. This paper proposes a novel hybrid graph convolutional network, called HGCN, equipped with an online masked autoencoder paradigm for robust multimodal cancer survival prediction. In particular, we pioneer modeling a patient's multimodal data as flexible and interpretable multimodal graphs with modality-specific preprocessing. Our elaborately designed HGCN integrates the advantages of the graph convolutional network (GCN) and the hypergraph convolutional network (HCN), utilizing node message passing and a hyperedge mixing mechanism to facilitate intra-modal and inter-modal interactions of multimodal graphs. With the proposed HGCN, the potential of multimodal data can be better unleashed, leading to more reliable predictions of a patient's survival risk. More importantly, to handle missing modalities in clinical scenarios, we incorporate an online masked autoencoder paradigm into HGCN, which captures the intrinsic dependence between modalities and seamlessly generates missing hyperedges for model inference. Extensive experiments and analysis on six cancer cohorts from the TCGA project (i.e., KIRC, LIHC, ESCA, LUSC, LUAD and UCEC) show that our method significantly outperforms state-of-the-art methods under both complete and missing modal settings.
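As background for the hyperedge-based message passing mentioned above, a generic degree-normalized hypergraph convolution can be sketched as follows. This is the textbook HCN formulation, not the paper's exact HGCN layer; the incidence matrix, features, and weights below are toy values for illustration only:

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One generic hypergraph convolution step:
    X' = Dv^-1 H De^-1 H^T X Theta
    where H is the node-by-hyperedge incidence matrix and Dv, De are
    node/hyperedge degree matrices (hyperedge weights omitted here)."""
    De_inv = np.diag(1.0 / H.sum(axis=0))  # hyperedge degrees
    Dv_inv = np.diag(1.0 / H.sum(axis=1))  # node degrees
    return Dv_inv @ H @ De_inv @ H.T @ X @ Theta

# 4 nodes, 2 hyperedges: {0, 1, 2} and {2, 3}
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
X = np.eye(4)      # toy node features (identity, to inspect propagation)
Theta = np.eye(4)  # identity weights for inspection
out = hypergraph_conv(X, H, Theta)
```

With identity features and weights, `out` is exactly the propagation matrix; its rows sum to 1, showing that each node's new feature is a weighted average over the hyperedges it belongs to.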

Data processing

Genomic profiles can be downloaded from cBioPortal. The categorization of gene embeddings can be obtained from MSigDB.

Pathological slides and clinical records can be downloaded from the GDC. The clinical records included in the different trials are shown in the Supplementary.

cut_and_pretrain.py provides the code for patch cutting and pre-training.
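cut_and_pretrain.py is the reference implementation for this step. Purely as an illustration of the general idea, cutting a whole-slide image into patches with a simple tissue-filter heuristic might look like the sketch below; the patch size, stride, and brightness threshold are assumptions for the example, not the script's actual values:

```python
import numpy as np

def cut_patches(slide, size=256, stride=256, min_tissue=0.1):
    """Tile a slide array (H x W x 3) into square patches, keeping only
    patches whose fraction of non-background pixels exceeds min_tissue.
    Background is approximated as near-white pixels (a common heuristic)."""
    patches = []
    h, w = slide.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patch = slide[y:y + size, x:x + size]
            tissue_frac = (patch.mean(axis=-1) < 220).mean()
            if tissue_frac > min_tissue:
                patches.append(patch)
    return patches

# toy slide: white background with one dark "tissue" quadrant
slide = np.full((512, 512, 3), 255, dtype=np.uint8)
slide[:256, :256] = 30
patches = cut_patches(slide)  # only the tissue quadrant survives
```

In practice the kept patches would then be fed to a pre-trained encoder to produce per-patch feature vectors.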

The detailed steps of data processing can be seen in gendata.ipynb; this notebook shows how to encapsulate the data into the format we use. The graph neural networks are built with PyTorch Geometric.
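The notebook defines the repository's actual graph format. As a generic sketch of one common way to assemble a patch graph in the PyTorch Geometric `edge_index` convention (a 2 x E array of source/target node indices), the following builds a k-NN graph over patch embeddings; the feature dimensions and k are illustrative assumptions:

```python
import numpy as np

def knn_edge_index(feats, k=2):
    """Build a directed k-NN graph over patch embeddings.
    Returns a 2 x E integer array in the PyG edge_index convention
    (row 0 = source nodes, row 1 = target nodes)."""
    # pairwise Euclidean distances
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors per node
    src = np.repeat(np.arange(len(feats)), k)
    dst = nbrs.reshape(-1)
    return np.stack([src, dst])

rng = np.random.default_rng(0)
edge_index = knn_edge_index(rng.normal(size=(6, 8)), k=2)
```

The resulting array could then be wrapped, together with the node features, into a `torch_geometric.data.Data` object.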

Data

We provide the data and labels (patients, sur_and_time, all_data, seed_fit_split) required for the experiments in the paper, as well as a set of trained model parameters.

Train

After setting the parameters and save path in train.py, you can start training directly with the command python train.py. The training progress is printed to the console, and the prediction results are saved to the specified path.
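The saved predictions are per-patient survival risks. The metric typically used to evaluate such predictions is Harrell's concordance index (C-index); a minimal self-contained sketch, independent of the repository's own evaluation code, is:

```python
def concordance_index(risk, time, event):
    """Harrell's C-index: the fraction of comparable patient pairs whose
    predicted risks are ordered consistently with observed survival.
    A pair (i, j) is comparable if patient i's time is shorter and
    patient i's event was observed (event[i] == 1, not censored)."""
    num, den = 0.0, 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0       # correctly ordered pair
                elif risk[i] == risk[j]:
                    num += 0.5       # tied risks count half
    return num / den

# toy check: risks perfectly anti-ordered with survival time
c = concordance_index([3, 2, 1], [1, 2, 3], [1, 1, 1])  # → 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect risk ranking.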

Hyperparameter setting and Experimental environment are shown in Supplementary.

The folder data_split contains the data split we used in our experiments. To use it, set the parameter if_fit_split to True.

Citation

  • If you find our work useful in your research, please consider citing:
@article{hou2023hybrid,
  title={Hybrid Graph Convolutional Network with Online Masked Autoencoder for Robust Multimodal Cancer Survival Prediction},
  author={Hou, Wentai and Lin, Chengxuan and Yu, Lequan and Qin, Jing and Yu, Rongshan and Wang, Liansheng},
  journal={IEEE Transactions on Medical Imaging},
  year={2023},
  publisher={IEEE}
}

hgcn's People

Contributors

lin-lcx

hgcn's Issues

How can I visualize the results?

I have finished training, but how do I implement the visualization to see the results?

About the dataset split

Hello! gendata.ipynb does not show how split.pkl is obtained. How did you perform the split in this step? Thank you very much!

Normalization of pathology images

Hello, I benefited a lot from reading your and Prof. Liansheng Wang's papers. May I ask which method you used to normalize the pathology images?

use_slide.pkl file missing

When I run cut_and_pretrain.py, I cannot find the use_slide.pkl file. How can I address this?

BUG

I am using a GTX 1650 card with PyTorch 2.3.1. I downloaded the code and ran it as instructed, but I get the error shown in the attached screenshot. Has anyone seen or encountered this before?

About Figure 4 in the paper

Hello!
Excellent work, and thank you for releasing the code for everyone to learn from!
I would like to ask how the WSI heatmaps in Figure 4 ("Interpretation of the proposed framework") and the "Integrated gradient analysis of genomic profile" were generated. Will this part of the code be released, or which paper/code did you follow? Looking forward to your reply!

About the genomic dataset

Hello! Following the hints in the notebook, I found https://www.gsea-msigdb.org/gsea/msigdb/gene_families.jsp?ex=1. After clicking Tumor Suppression, I was redirected to https://www.gsea-msigdb.org/gsea/msigdb/human/annotate.jsp, where I could not find a direct download link for the dataset. I also tried https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11233 and https://www.gsea-msigdb.org/gsea/downloads.jsp, but could not find the five types of TCGA gene data either. How can I obtain the raw genomic dataset? Thank you very much!

Data stored in pkl files

Thank you for your extraordinary work! However, I have a question about the pkl files provided in the data and labels. Do the pkl files already include the gene data? For example, is esca_data.pkl the final pkl file (i.e., all_data.pkl) produced by gendata.ipynb? And could you provide the t_rna_fea.pkl and ttt_cli_feas.pkl used in gendata.ipynb?

No module named 'torch_geometric.data.storage'

Hello! Did you use PyG 2 when processing the datasets? When I run the following command with your released LIHC dataset:
python train.py --if_fit_split True --cancer_type lihc
I get the following error:
Traceback (most recent call last):
  File "train.py", line 655, in <module>
    main(args)
  File "train.py", line 380, in main
    all_data=joblib.load('/home/jupyter-ljh/data/mydata/HGCN-main/LIHC/lihc_data.pkl')
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/pickle.py", line 1385, in load_stack_global
    self.append(self.find_class(module, name))
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/pickle.py", line 1426, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'torch_geometric.data.storage'
The explanation I found online is https://github.com/pyg-team/pytorch_geometric/issues/4732. If I reinstall PyG 2, will train.py run correctly?

About the training process

Hello! Thank you for your contribution! Without a tool like TensorBoard to monitor training, how do you tell whether the training results are overfitting? Isn't it somewhat unintuitive to follow training only through the metrics printed to the command line? Thank you very much!
