mims-harvard / shepherd

SHEPHERD: Deep learning for diagnosing patients with rare genetic diseases

Home Page: https://zitniklab.hms.harvard.edu/projects/SHEPHERD

License: MIT License

Shell 0.16% Python 10.89% Jupyter Notebook 5.59% HTML 83.35%

deep-learning embeddings graph-neural-networks graph-representation-learning knowledge-graph medical-diagnosis patients rare-disease

shepherd's Issues

Bug in train process!

CombinedGPAligner: training without a checkpoint (best_ckpt arg) causes an exception

When running train.py in train mode (not do_inference), get_model is called with load_from_checkpoint=False, so CombinedGPAligner is constructed without node_ckpt and the argument defaults to None.
CombinedGPAligner then calls NodeEmbeder.load_from_checkpoint with checkpoint_path=node_ckpt, which is None, and this raises an exception.

This differs from CombinedPatientNCA, where the node embedder reads its checkpoint from hparams['saved_checkpoint_path'], so I think it's a bug.

def get_model(args, hparams, node_hparams, all_data, edge_attr_dict, n_nodes, load_from_checkpoint=False):
    print("setting up model", hparams['model_type'])
    # get patient model 
    if hparams['model_type'] == 'aligner':
        if load_from_checkpoint: 
            comb_patient_model = CombinedGPAligner.load_from_checkpoint(checkpoint_path=str(Path(project_config.PROJECT_DIR /  args.best_ckpt)), 
                                    edge_attr_dict=edge_attr_dict, all_data=all_data, n_nodes=n_nodes, node_ckpt = hparams["saved_checkpoint_path"], node_hparams=node_hparams)
        else:
            comb_patient_model = CombinedGPAligner(edge_attr_dict=edge_attr_dict, all_data=all_data, n_nodes=n_nodes, hparams=hparams, node_hparams=node_hparams)

class CombinedGPAligner(pl.LightningModule):

    def __init__(self, edge_attr_dict, all_data, n_nodes=None, node_ckpt = None, hparams=None, node_hparams=None,  spl_pca=[], spl_gate=[]):
        super().__init__()
        print('Initializing Model')

        self.save_hyperparameters('hparams', ignore=["spl_pca", "spl_gate"]) # spl_pca and spl_gate never get used

        print("Node checkpoint:", node_ckpt)

        print('Saved combined model hyperparameters: ', self.hparams)

        self.all_data = all_data

        self.all_train_nodes = {}
        self.train_patient_nodes = {}
        self.train_sparse_nodes = {}
        self.train_target_batch = {}
        self.train_corr_gene_nid = {}

        #print(f"Loading Node Embedder from {self.hparams.hparams['saved_checkpoint_path']}")
        print(f"Loading Node Embedder from {node_ckpt}")

        # NOTE: loads in saved hyperparameters
        self.node_model = NodeEmbeder.load_from_checkpoint(checkpoint_path=node_ckpt, #self.hparams.hparams['saved_checkpoint_path'], 
                                                           all_data=all_data, edge_attr_dict=edge_attr_dict, 
                                                           num_nodes=n_nodes)
                                                           #num_nodes=n_nodes, combined_training=self.hparams.hparams['combined_training'])
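
One possible workaround, sketched under assumptions: fall back to hparams['saved_checkpoint_path'] when node_ckpt is None, as the issue says CombinedPatientNCA does. The helper name resolve_node_ckpt is hypothetical and not part of the SHEPHERD code base; whether the maintainers intend this fallback is not confirmed by the source.

```python
from typing import Optional


def resolve_node_ckpt(node_ckpt: Optional[str], hparams: dict) -> str:
    """Pick the node-embedder checkpoint path (hypothetical helper).

    Prefers an explicit node_ckpt; otherwise falls back to
    hparams['saved_checkpoint_path'], the key mentioned in the issue.
    """
    if node_ckpt is not None:
        return str(node_ckpt)
    fallback = hparams.get("saved_checkpoint_path")
    if fallback is None:
        # Fail early with a readable message instead of passing None
        # on to a load_from_checkpoint call.
        raise ValueError(
            "No node checkpoint: pass node_ckpt or set "
            "hparams['saved_checkpoint_path']."
        )
    return str(fallback)
```

With a helper like this, the constructor could resolve the path once before loading the node embedder, so the train and inference paths would behave the same way.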

Environment setup problem on Mac

Hi!
Great work! I just wanted to report that the environment setup using conda (as in the tutorial) fails on a local Mac.
I'm still trying to understand why this happens, but I'm raising an issue in case it's a general problem. The output is:

`Solving environment: failed

ResolvePackageNotFound:

libgcc-ng=9.3.0
ld_impl_linux-64=2.33.1
_openmp_mutex=4.5
libstdcxx-ng=9.1.0
libgfortran-ng=7.5.0
libxcb=1.14
faiss-gpu=1.7.0
libgomp=9.3.0
cudatoolkit=10.2.89
`
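
The unresolved packages appear to be Linux or CUDA builds (for example ld_impl_linux-64 and cudatoolkit), which would explain the failure on macOS. A rough sketch of one way to experiment: strip those entries from the dependency list of the environment file before creating the environment. The function name and the idea of filtering are assumptions, not something the tutorial describes, and it is not confirmed that the project runs without faiss-gpu or cudatoolkit.

```python
import re

# Package names taken verbatim from the ResolvePackageNotFound output above.
UNRESOLVED = {
    "libgcc-ng", "ld_impl_linux-64", "_openmp_mutex", "libstdcxx-ng",
    "libgfortran-ng", "libxcb", "faiss-gpu", "libgomp", "cudatoolkit",
}


def drop_unresolved(dependencies, unresolved=UNRESOLVED):
    """Return the dependency specs whose package name is not in `unresolved`.

    `dependencies` mirrors the `dependencies:` list of a conda environment
    file: strings such as "name=version=build", plus optional nested
    entries (for example a {"pip": [...]} mapping), which are kept as-is.
    """
    kept = []
    for dep in dependencies:
        if not isinstance(dep, str):
            kept.append(dep)
            continue
        # The package name is everything before the first version operator.
        name = re.split(r"[=<>! ]", dep.strip(), maxsplit=1)[0]
        if name not in unresolved:
            kept.append(dep)
    return kept
```

The filtered list can then be written back to a copy of the environment file; GPU-specific functionality would need a separate replacement.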

omim_to_mondo_dict.pkl is missing (preprocess_mygene2.py)

Also, which script creates it?
(I'm trying to understand your cohort-to-graph mapping.)

add_spl_patients: swapped saved file names?

Hello,

I am trying to evaluate the pre-trained SHEPHERD model for causal gene discovery on the myGene2 dataset. After computing the SPL matrix for this dataset with the add_spl_to_patients.py script, running train.py with the --do_inference flag fails with an error. I found that the error goes away if I swap the names of the saved files, spl_index_fname and spl_matrix_fname, as follows:

with open(str(project_config.PROJECT_DIR / 'patients' / spl_index_fname), 'wb') as handle:
    pickle.dump(spl_indexing, handle, protocol=pickle.HIGHEST_PROTOCOL)
np.save(str(project_config.PROJECT_DIR / 'patients' / spl_matrix_fname), patients_spl_matrix)

Is there a specific reason for this behavior? I'm wondering if there is something I might be doing incorrectly.
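
To check whether the two files really hold the opposite of what their names suggest, one can inspect the first bytes of each file. This is a minimal diagnostic sketch, not part of SHEPHERD: the helper name sniff_format is hypothetical. It relies on two format facts: .npy files begin with the magic bytes \x93NUMPY, and pickles written with protocol 2 or higher begin with the byte \x80.

```python
NPY_MAGIC = b"\x93NUMPY"  # magic prefix of the .npy file format


def sniff_format(path):
    """Guess whether a file is a NumPy .npy array or a pickle (protocol >= 2)."""
    with open(path, "rb") as handle:
        head = handle.read(len(NPY_MAGIC))
    if head == NPY_MAGIC:
        return "npy"
    if head[:1] == b"\x80":  # PROTO opcode that starts protocol >= 2 pickles
        return "pickle"
    return "unknown"
```

Running it on the files saved under spl_index_fname and spl_matrix_fname shows which one is the pickle and which one is the array. Note also that np.save appends ".npy" to a file name that does not already end with it, so the array may land at a different path than the one passed in.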

NeighborSampler duplicates batch size - implementation question

Hello :)

In the sample method of NeighborSampler, the batch is initialized with the following code:

# sample nodes to form positive edges. we will try to predict these edges
row, col, e_id = self.adj_t_sample.coo()
# NOTE: only does self loops when no edges in the current partition of the dataset
target_batch = random_walk(row, col, source_batch, walk_length=1, coalesced=False)[:, 1]
batch = torch.cat([source_batch, target_batch], dim=0)


batch_size: int = len(batch)
adjs = []
n_id = batch
for size in self.sizes:
    adj_t, n_id = self.adj_t.sample_adj(n_id, size, replace=False)
.....

For an isolated node u, random_walk with a walk length of 1 returns u itself, while for any other node it returns a random neighbor.
Either way, the new batch, which now holds twice as many nodes, is fed to the adjacency sampler. As a result, we either sample twice from the same nodes or sample neighbors of nodes that did not appear in the original batch.
I don't fully understand how this achieves self-loops for isolated nodes. Wouldn't it make more sense to add the missing edges explicitly to obtain self-loops?

Thank you :)
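
A small pure-Python stand-in may help make the question concrete. It does not call torch_cluster; it only mimics the behavior described above (an isolated node walks to itself, any other node walks to a random neighbor). The function name one_step_walk and the toy graph are illustrative assumptions.

```python
import random


def one_step_walk(adjacency, source_batch, rng):
    """Mimic a length-1 random walk: pick a random neighbor, or stay put
    when the node has no neighbors (the isolated-node case)."""
    targets = []
    for node in source_batch:
        neighbors = adjacency.get(node, [])
        targets.append(rng.choice(neighbors) if neighbors else node)
    return targets


# Toy graph: node 3 is isolated.
adjacency = {0: [1, 2], 1: [0], 2: [0], 3: []}
source_batch = [0, 3]

target_batch = one_step_walk(adjacency, source_batch, random.Random(0))
batch = source_batch + target_batch  # twice as long as source_batch
positive_edges = list(zip(source_batch, target_batch))
```

Here positive_edges pairs each source with its walk target, so the isolated node 3 yields the pair (3, 3). Read this way, the doubled batch carries both endpoints of each positive edge, and the self-loop shows up as a positive pair rather than as an edge added to the graph. Whether that is the authors' intent is a guess based on the code comment about positive edges.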
