hgcn's Introduction

HGCN

Hybrid Graph Convolutional Network with Online Masked Autoencoder for Robust Multimodal Cancer Survival Prediction

Abstract

Cancer survival prediction requires exploiting complementary multimodal information (e.g., pathological, clinical and genomic features), and it is even more challenging in clinical practice due to the incompleteness of patients' multimodal data. Existing methods lack sufficient intra- and inter-modal interactions and suffer from severe performance degradation caused by missing modalities. This paper proposes a novel hybrid graph convolutional network, called HGCN, equipped with an online masked autoencoder paradigm for robust multimodal cancer survival prediction. In particular, we pioneer modeling a patient's multimodal data as flexible and interpretable multimodal graphs with modality-specific preprocessing. Our elaborately designed HGCN integrates the advantages of the graph convolutional network (GCN) and the hypergraph convolutional network (HCN), utilizing node message passing and a hyperedge mixing mechanism to facilitate intra-modal and inter-modal interactions of multimodal graphs. With the proposed HGCN, the potential of multimodal data can be better unleashed, leading to more reliable predictions of a patient's survival risk. More importantly, to handle missing modalities in clinical scenarios, we incorporate an online masked autoencoder paradigm into HGCN, which captures the intrinsic dependence between modalities and seamlessly generates missing hyperedges for model inference. Extensive experiments and analysis on six cancer cohorts from the TCGA project (i.e., KIRC, LIHC, ESCA, LUSC, LUAD and UCEC) show that our method significantly outperforms state-of-the-art methods under both complete and missing modal settings.
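As background for the hyperedge-based message passing mentioned above, a generic degree-normalized hypergraph convolution can be sketched as follows. This is the textbook HCN formulation, not the paper's exact HGCN layer; the incidence matrix, features, and weights below are toy values for illustration only:

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One generic hypergraph convolution step:
    X' = Dv^-1 H De^-1 H^T X Theta
    where H is the node-by-hyperedge incidence matrix and Dv, De are
    node/hyperedge degree matrices (hyperedge weights omitted here)."""
    De_inv = np.diag(1.0 / H.sum(axis=0))  # hyperedge degrees
    Dv_inv = np.diag(1.0 / H.sum(axis=1))  # node degrees
    return Dv_inv @ H @ De_inv @ H.T @ X @ Theta

# 4 nodes, 2 hyperedges: {0, 1, 2} and {2, 3}
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
X = np.eye(4)      # toy node features (identity, to inspect propagation)
Theta = np.eye(4)  # identity weights for inspection
out = hypergraph_conv(X, H, Theta)
```

With identity features and weights, `out` is exactly the propagation matrix; its rows sum to 1, showing that each node's new feature is a weighted average over the hyperedges it belongs to.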

Data processing

Genomic profiles can be downloaded from cBioPortal. The categorization of gene embeddings can be obtained from MSigDB.

Pathological slides and clinical records can be downloaded from the GDC. The clinical records included in the different trials are shown in the Supplementary.

cut_and_pretrain.py provides the code for patch cutting and pre-training.
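cut_and_pretrain.py is the reference implementation for this step. Purely as an illustration of the general idea, cutting a whole-slide image into patches with a simple tissue-filter heuristic might look like the sketch below; the patch size, stride, and brightness threshold are assumptions for the example, not the script's actual values:

```python
import numpy as np

def cut_patches(slide, size=256, stride=256, min_tissue=0.1):
    """Tile a slide array (H x W x 3) into square patches, keeping only
    patches whose fraction of non-background pixels exceeds min_tissue.
    Background is approximated as near-white pixels (a common heuristic)."""
    patches = []
    h, w = slide.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patch = slide[y:y + size, x:x + size]
            tissue_frac = (patch.mean(axis=-1) < 220).mean()
            if tissue_frac > min_tissue:
                patches.append(patch)
    return patches

# toy slide: white background with one dark "tissue" quadrant
slide = np.full((512, 512, 3), 255, dtype=np.uint8)
slide[:256, :256] = 30
patches = cut_patches(slide)  # only the tissue quadrant survives
```

In practice the kept patches would then be fed to a pre-trained encoder to produce per-patch feature vectors.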

The detailed steps of data processing can be seen in gendata.ipynb; this notebook shows how to encapsulate the data into the format we use. The graph neural networks are built with PyTorch Geometric.
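The notebook defines the repository's actual graph format. As a generic sketch of one common way to assemble a patch graph in the PyTorch Geometric `edge_index` convention (a 2 x E array of source/target node indices), the following builds a k-NN graph over patch embeddings; the feature dimensions and k are illustrative assumptions:

```python
import numpy as np

def knn_edge_index(feats, k=2):
    """Build a directed k-NN graph over patch embeddings.
    Returns a 2 x E integer array in the PyG edge_index convention
    (row 0 = source nodes, row 1 = target nodes)."""
    # pairwise Euclidean distances
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors per node
    src = np.repeat(np.arange(len(feats)), k)
    dst = nbrs.reshape(-1)
    return np.stack([src, dst])

rng = np.random.default_rng(0)
edge_index = knn_edge_index(rng.normal(size=(6, 8)), k=2)
```

The resulting array could then be wrapped, together with the node features, into a `torch_geometric.data.Data` object.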

Data

We provide the data and labels (patients, sur_and_time, all_data, seed_fit_split) required for the experiments in the paper, as well as a set of trained model parameters.

Train

After setting the parameters and save path in train.py, you can start training directly with the command python train.py. The training progress is printed to the console, and the prediction results are saved to the specified path.
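The saved predictions are per-patient survival risks. The metric typically used to evaluate such predictions is Harrell's concordance index (C-index); a minimal self-contained sketch, independent of the repository's own evaluation code, is:

```python
def concordance_index(risk, time, event):
    """Harrell's C-index: the fraction of comparable patient pairs whose
    predicted risks are ordered consistently with observed survival.
    A pair (i, j) is comparable if patient i's time is shorter and
    patient i's event was observed (event[i] == 1, not censored)."""
    num, den = 0.0, 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0       # correctly ordered pair
                elif risk[i] == risk[j]:
                    num += 0.5       # tied risks count half
    return num / den

# toy check: risks perfectly anti-ordered with survival time
c = concordance_index([3, 2, 1], [1, 2, 3], [1, 1, 1])  # → 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect risk ranking.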

Hyperparameter setting and Experimental environment are shown in Supplementary.

The folder data_split contains the data split we used in our experiments. To use it, set the parameter if_fit_split to True.

Citation

  • If you find our work useful in your research, please consider citing:
@article{hou2023hybrid,
  title={Hybrid Graph Convolutional Network with Online Masked Autoencoder for Robust Multimodal Cancer Survival Prediction},
  author={Hou, Wentai and Lin, Chengxuan and Yu, Lequan and Qin, Jing and Yu, Rongshan and Wang, Liansheng},
  journal={IEEE Transactions on Medical Imaging},
  year={2023},
  publisher={IEEE}
}

hgcn's People

Contributors

lin-lcx

hgcn's Issues

How can I visualize the results?

I have finished training, but how do I implement the visualization to see the results?

About the dataset split

Hello! gendata.ipynb does not show how split.pkl is obtained. How did you perform the split in this step? Thank you very much!

Normalization of pathology images

Hello, I benefited a lot from reading your and Prof. Liansheng Wang's papers. May I ask which method you used to normalize the pathology images?

use_slide.pkl file missing

When I run cut_and_pretrain.py, I cannot find the use_slide.pkl file. How can I address this?

BUG

I am using a GTX 1650 card with PyTorch 2.3.1. I downloaded the code and ran it as instructed, but I get the error shown in the attached screenshot. Has anyone seen or encountered this before?

About Figure 4 in the paper

Hello!
Excellent work, and thank you for releasing the code for everyone to learn from!
I would like to ask how the WSI heatmaps in Figure 4 ("Interpretation of the proposed framework") and the "Integrated gradient analysis of genomic profile" were generated. Will this part of the code be released, or which paper/code did you follow? Looking forward to your reply!

About the genomic dataset

Hello! Following the hints in the notebook, I found https://www.gsea-msigdb.org/gsea/msigdb/gene_families.jsp?ex=1. After clicking Tumor Suppression, I was redirected to https://www.gsea-msigdb.org/gsea/msigdb/human/annotate.jsp, where I could not find a direct download link for the dataset. I also tried https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11233 and https://www.gsea-msigdb.org/gsea/downloads.jsp, but could not find the five types of TCGA gene data either. How can I obtain the raw genomic dataset? Thank you very much!

Data stored in pkl files

Thank you for your extraordinary work! However, I have a question about the pkl files provided in the data and labels. Do the pkl files already include the gene data? For example, is esca_data.pkl the final pkl file (i.e., all_data.pkl) produced by gendata.ipynb? And could you provide the t_rna_fea.pkl and ttt_cli_feas.pkl used in gendata.ipynb?

No module named 'torch_geometric.data.storage'

Hello! Did you use PyG 2 when processing the datasets? When I run the following command with your released LIHC dataset:
python train.py --if_fit_split True --cancer_type lihc
I get the following error:
Traceback (most recent call last):
  File "train.py", line 655, in <module>
    main(args)
  File "train.py", line 380, in main
    all_data=joblib.load('/home/jupyter-ljh/data/mydata/HGCN-main/LIHC/lihc_data.pkl')
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/pickle.py", line 1385, in load_stack_global
    self.append(self.find_class(module, name))
  File "/home/jupyter-ljh/.conda/envs/hgcn/lib/python3.7/pickle.py", line 1426, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'torch_geometric.data.storage'
The explanation I found online is https://github.com/pyg-team/pytorch_geometric/issues/4732. If I reinstall PyG 2, will train.py run correctly?

About the training process

Hello! Thank you for your contribution! Without a tool like TensorBoard to monitor training, how do you tell whether the training results are overfitting? Isn't it somewhat unintuitive to follow training only through the metrics printed to the command line? Thank you very much!
