monarch-semantic-similarity-profiles's Introduction

monarch-semantic-similarity-profiles

monarch-semantic-similarity-profiles's People

Contributors

souzadevinicius, matentzn

monarch-semantic-similarity-profiles's Issues

Create Monarch KG enriched variant of Phenio for experimentation

Pseudocode

kg.ingestible.kgx->kg.ingestible.ttl (say g2p)
O = ROBOT_MERGE(kg.ingestible.ttl, PHENIO)
runoak semsimian O --relations g2p,p,i

kg.ingestible.kgx is an rdf dump created by Koza on a data modality we are interested in, such as gene to disease.

This ticket can be closed when we have one example implemented:

PHENIO + HPOA or
PHENIO + g2d

All TSV files produced should come with an SSSOM-style header

Headers are valid #-commented yaml files. We use them like:

# ontology: upheno1
# branches:
#   - HP:123
#   - MP:123
# similarity_measure: jaccard
# similarity_threshold: 0.7
# tool: semsimian
# tool_version: 0.0.1
subject_id	object_id	diff
....

This will drastically reduce file sizes. See SSSOM for an example implementation.
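A minimal sketch of how such a header could be written and split off again, using only the Python standard library. The function names `write_with_header` and `split_header` are hypothetical, the serializer only handles scalars and flat lists (enough for the header shown above), and the `diff` value in the demo row is made-up toy data. The returned header text is plain YAML and could be handed to any YAML parser.

```python
import csv
import io


def write_with_header(handle, metadata, columns, rows):
    """Write `metadata` as '#'-prefixed YAML lines, then a TSV table."""
    for key, value in metadata.items():
        if isinstance(value, list):
            handle.write(f"# {key}:\n")
            for item in value:
                handle.write(f"#   - {item}\n")
        else:
            handle.write(f"# {key}: {value}\n")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


def split_header(handle):
    """Return (yaml_text, table_rows) from a file written as above."""
    yaml_lines, table_lines = [], []
    for line in handle:
        if line.startswith("#") and not table_lines:
            # Drop the leading '#' and at most one following space.
            rest = line[1:]
            yaml_lines.append(rest[1:] if rest.startswith(" ") else rest)
        else:
            table_lines.append(line)
    rows = list(csv.DictReader(table_lines, delimiter="\t"))
    return "".join(yaml_lines), rows


# Round trip through an in-memory buffer (toy values only).
buffer = io.StringIO()
write_with_header(
    buffer,
    {
        "ontology": "upheno1",
        "branches": ["HP:123", "MP:123"],
        "similarity_measure": "jaccard",
        "similarity_threshold": 0.7,
    },
    ["subject_id", "object_id", "diff"],
    [["HP:123", "MP:123", "0.1"]],
)
buffer.seek(0)
yaml_text, rows = split_header(buffer)
```

Because the metadata is stated once at the top, it does not need to be repeated as extra columns on every row.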

Why is HP:phenotypic abnormality so different from MP:phenotypic abnormality?

In monarch-initiative/semsimian#82 (comment)

@caufieldjh showed us that HP:phenotypic abnormality has very different parents than MP:phenotypic abnormality.

Can we determine why? In particular, why does the HP term have Uberon ancestors?

@caufieldjh I will assign you for now, but feel free to talk to Chris and assign someone else - it is easier for me to work if I can assign while creating the ticket, so I am sure it's not dropping off the radar.

EPIC: replace exomiser p2p pipeline with semsimian

  • Create a make pipeline to generate hp-hp mappings based on phenio (with 0.4 Jaccard threshold)
  • Create a make pipeline to generate hp-mp mappings based on phenio (with 0.4 Jaccard threshold)
  • Create a make pipeline to generate hp-zp mappings based on phenio (with 0.4 Jaccard threshold)
  • Create the following experimental goals (4 configs, 12 distinct semantic similarity profiles; for phenio, use the "normal phenio" for now, not the "Monarch version". Note that this may change in the future. The reason is that Monarch PHENIO is changed in ways that I don't fully understand):
    • phenio ---> IC scores ---> semsimian + IC ---> hp-hp, hp-mp, hp-zp profile (#18)
    • phenio + equivalence ---> IC scores ---> semsimian + IC ---> hp-hp, hp-mp, hp-zp profile (#18)
    • phenio ---> semsimian ---> hp-hp, hp-mp, hp-zp profile
    • phenio ---> NEAT embeddings --> semsimian (cosine similarity) ---> hp-hp, hp-mp, hp-zp profile (#6)
  • Run exomiser with the default config and all of these three configs
  • Generate your new rank-changed plot
  • Present at PhEval meeting
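To make the "0.4 Jaccard threshold" in the first three items concrete, here is a minimal sketch of ancestor-based Jaccard similarity with a cut-off. It is not semsimian's implementation; the function names, the toy graph, and the term IDs (`HP:A`, `MP:A`, `MP:B`, `UPHENO:X`, `UPHENO:Y`, `UPHENO:ROOT`) are all hypothetical.

```python
def ancestors(term, parents):
    """Return `term` plus everything reachable by following `parents` edges."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, parents):
    """Size of the shared ancestor set divided by the size of the union."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def pairs_above_threshold(subjects, objects, parents, threshold=0.4):
    """Yield (subject, object, score) for pairs scoring at least `threshold`."""
    for s in subjects:
        for o in objects:
            score = jaccard(s, o, parents)
            if score >= threshold:
                yield s, o, score


# Toy subclass graph: child -> list of direct parents (hypothetical IDs).
toy_parents = {
    "HP:A": ["UPHENO:X"],
    "MP:A": ["UPHENO:X"],
    "MP:B": ["UPHENO:Y"],
    "UPHENO:X": ["UPHENO:ROOT"],
    "UPHENO:Y": ["UPHENO:ROOT"],
}

kept = list(pairs_above_threshold(["HP:A"], ["MP:A", "MP:B"], toy_parents))
# HP:A vs MP:A shares 2 of 4 ancestors (0.5) and is kept;
# HP:A vs MP:B shares 1 of 5 ancestors (0.2) and is dropped.
```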

Make sure all directories exist

runoak -i semsimian:sqlite:data/ontology/upheno2-lattice.db similarity -p i --set1-file data/tmp/upheno2-lattice_hp_terms.txt --set2-file data/tmp/upheno2-lattice_mp_terms.txt -O csv -o profiles/upheno2-lattice-hp-mp.semsimian.tsv
/usr/local/lib/python3.10/dist-packages/rdflib_jsonld/__init__.py:9: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.0.  Please remove rdflib-jsonld from your project's dependencies.
  warnings.warn(
FileNotFoundError: [Errno 2] No such file or directory: 'profiles/upheno2-lattice-hp-mp.semsimian.tsv'
make: *** [Makefile:177: profiles/upheno2-lattice-hp-mp.semsimian.tsv] Error 1

Whenever a command is run that creates a file in a directory that does not have a dependency (direct or indirect) in that same directory, add mkdir -p dirname
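For pipeline steps that are implemented in Python rather than directly in the Makefile, the same guard could look roughly like this. The helper name `ensure_parent_dir` and the example file name are hypothetical; the demo writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(output_path):
    """Create the directory that will contain `output_path`, like `mkdir -p`."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # "profiles" does not exist yet inside the temporary directory.
    target = ensure_parent_dir(Path(tmp) / "profiles" / "example.tsv")
    target.write_text("subject_id\tobject_id\n")
    created = target.exists()
```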

Add a method for generating ontology embeddings to makefile

From @caufieldjh:

To get graph embeddings (note this is just with grape - NEAT may be used to automate the process, but this is what runs):
Install grape: pip install grape

from grape.datasets.kghub import KGPhenio
from grape.embedders import FirstOrderLINEEnsmallen
graph = KGPhenio()
embedding = FirstOrderLINEEnsmallen().fit_transform(graph)

By default, embedding will be a pandas DataFrame; if you run `.fit_transform(graph, return_dataframe=False)`, it will be a NumPy array instead.
So the final step is to save accordingly, e.g. with embedding.to_csv('embedding.tsv', sep="\t")
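Once saved, the file can be read back to compare two nodes by cosine similarity, the measure the embedding-based profile relies on. The sketch below uses only the standard library and assumes the layout that `DataFrame.to_csv(..., sep="\t")` writes by default: a header row, then one row per node with the node name in the first column. The helper names and the toy vectors are hypothetical, and this is not how NEAT itself computes the score.

```python
import csv
import math
import os
import tempfile


def load_embeddings(path):
    """Read a TSV whose first column is the node name and the rest are floats."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header row written by DataFrame.to_csv
        return {row[0]: [float(x) for x in row[1:]] for row in reader}


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embedding.tsv")
    with open(path, "w", newline="") as handle:
        # Toy two-dimensional embeddings with made-up node names.
        handle.write("\t0\t1\nHP:toy\t1.0\t2.0\nMP:toy\t2.0\t4.0\n")
    vectors = load_embeddings(path)

score = cosine_similarity(vectors["HP:toy"], vectors["MP:toy"])
```

The two toy vectors point in the same direction, so their score is close to 1; orthogonal vectors would score 0.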

Phenotypic similarity based on g2d, g2p _across_ SSPOs

Right now we only pull in gene associations from HPOA. This means, we only have HP->Gene associations. However, to facilitate gene-level semantic similarity between HP and MP, we need to have MP->Gene associations and Genetic orthologue relations as well.

@kevinschaper can you help us here wrt.

How would you compare two phenotypic profiles, one MP, one HP, only along their p2g associations?

Add odk.sh and `make install`

  • Add a step make install that installs all the latest dependencies (.PHONY goal)
  • Add an odk.sh wrapper to be able to run the whole pipeline in docker

Document in README.md how to run the whole pipeline from install to generating everything

Provide make goal for upheno2-merged.owl

This is the replacement ticket for obophenotype/upheno-dev#38

We need to iterate over this goal, as it is, as of yet, not clear how to fix this. In the old uPheno,

MP:123 = HP:123.

In uPheno2, MP:123 sub UPHENO:111, HP:123 sub UPHENO:111, so the equivalence axiom is replaced by a common parent. This drastically changes the way (graph-based) semantic similarity algorithms behave.
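The effect can be illustrated with a toy ancestor-based Jaccard score on the placeholder IDs from this ticket. This is a sketch, not semsimian's algorithm: the equivalence is modelled as subclass edges in both directions, and only the terms named above are included.

```python
def reflexive_ancestors(term, parents):
    """Return `term` plus every term reachable via `parents` edges."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, parents):
    anc_a = reflexive_ancestors(a, parents)
    anc_b = reflexive_ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Old uPheno: MP:123 = HP:123, modelled as mutual subclass edges.
old_upheno = {"MP:123": ["HP:123"], "HP:123": ["MP:123"]}

# uPheno2: both terms are siblings under a common parent.
upheno2 = {"MP:123": ["UPHENO:111"], "HP:123": ["UPHENO:111"]}

score_old = jaccard("MP:123", "HP:123", old_upheno)  # identical ancestor sets
score_new = jaccard("MP:123", "HP:123", upheno2)     # 1 shared ancestor out of 3
```

In this toy case the score drops from 1.0 to 1/3, which would fall below the 0.4 Jaccard threshold used elsewhere in these tickets.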

This is the key ticket: INCATools/ontology-access-kit#634

@souzadevinicius This has a high priority, but right now, I don't know how to advise you on fixing it.

Can you make sure this does not fall under the radar, and mention it to me every time we meet? (add to your board as high priority)

Add jinja pipeline (config + Makefile.j2)

profiles.yml contains:

ontologies:
  - id: upheno2-lattice
  - id: upheno1-equivalent
  - id: upheno1
semantic_similarity_profiles:
  - name: all
    method: semsimian
    ontology: upheno2-lattice
    branches:
      subject: UPHENO:0001001
      object: UPHENO:0001001
  - subset: hp-mp
    method: semsimian
    ontology: upheno2-lattice
    branches:
      subject: UPHENO:0001001
      object: UPHENO:0001001
    prefixes:
      subject: HP
      object: MP
  - subset: hp-mp
    method: semsimian
    ontology: upheno2-lattice
    branches:
      subject: UPHENO:0001001
      object: UPHENO:0001001
    prefixes:
      subject: HP
      object: MP
  - subset: hp-mp
    method: cosine
    ontology: upheno2-lattice
    branches:
      subject: UPHENO:0001001
      object: UPHENO:0001001
    prefixes:
      subject: HP
      object: MP

Makefile.j2 includes all make goals to:

  1. generate the ontologies listed in the ontologies section of the config.
  2. generate the semantic similarity profiles according to the config.

Semsimian is OAK semsimian and cosine is NEAT cosine similarity - work with Justin and Harry to set this up.
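As a rough illustration of what the template would need to emit, the sketch below derives one make rule per configured profile. Plain Python string formatting stands in for the Jinja template so the example runs without third-party packages. The target pattern `profiles/<ontology>-<subset>.<method>.tsv` and the `data/ontology/<ontology>.db` prerequisite are inferred from the file names in the "Make sure all directories exist" ticket; the `all_profiles` goal, the function names, and the placeholder recipe line are hypothetical.

```python
RULE = (
    "{target}: data/ontology/{ontology}.db\n"
    "\tmkdir -p $(dir $@)\n"
    "\t# hypothetical placeholder: the {method} command goes here\n"
)


def profile_target(profile):
    """Build the output path for one profile entry of the config."""
    label = profile.get("subset") or profile.get("name")
    return f"profiles/{profile['ontology']}-{label}.{profile['method']}.tsv"


def render_makefile(config):
    """Return Makefile text with one rule per distinct profile target."""
    rules = {}
    for profile in config["semantic_similarity_profiles"]:
        target = profile_target(profile)
        # setdefault keeps the first rule if the config repeats an entry.
        rules.setdefault(
            target,
            RULE.format(
                target=target,
                ontology=profile["ontology"],
                method=profile["method"],
            ),
        )
    all_goal = "all_profiles: " + " ".join(rules) + "\n"
    return all_goal + "\n" + "\n".join(rules.values())


# The config as it might look after loading profiles.yml (branches omitted).
config = {
    "semantic_similarity_profiles": [
        {"name": "all", "method": "semsimian", "ontology": "upheno2-lattice"},
        {"subset": "hp-mp", "method": "semsimian", "ontology": "upheno2-lattice"},
        {"subset": "hp-mp", "method": "semsimian", "ontology": "upheno2-lattice"},
        {"subset": "hp-mp", "method": "cosine", "ontology": "upheno2-lattice"},
    ]
}

makefile_text = render_makefile(config)
```

Deduplicating on the target path means a repeated config entry produces a single rule rather than a make warning about overridden recipes.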
