monarch-initiative / monarch-semantic-similarity-profiles
License: MIT License
Pseudocode
kg.ingestible.kgx->kg.ingestible.ttl (say g2p)
O = ROBOT_MERGE(kg.ingestible.ttl, PHENIO)
runoak semsimian O --relations g2p,p,i
kg.ingestible.kgx is an rdf dump created by Koza on a data modality we are interested in, such as gene to disease.
This ticket can be closed when we have one example implemented:
PHENIO + HPOA or
PHENIO + g2d
And make sure it's uploaded in the right location.
Adding custom.Makefile goals to include these associations in phenio ontology:
Headers are valid #-commented YAML files. We use them like:
# ontology: upheno1
# branches:
# - HP:123
# - MP:123
# similarity_measure: jaccard
# similarity_threshold: 0.7
# tool: semsimian
# tool_version: 0.0.1
subject_id object_id diff
....
This will drastically reduce file sizes. See SSSOM for an example implementation.
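A minimal sketch of how a consumer might read such a file, assuming the header lines all start with `#` and the table is tab-separated. The function name `split_commented_header` and the example row are made up for illustration; the recovered header text would still need to be passed to a YAML parser.

```python
import csv
import io


def split_commented_header(text):
    """Split a profile file into its '#'-commented header and its table rows."""
    header_lines, body_lines = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            # Drop the leading '#' and one optional space to recover the YAML line.
            stripped = line[1:]
            header_lines.append(stripped[1:] if stripped.startswith(" ") else stripped)
        elif line.strip():
            body_lines.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(body_lines)), delimiter="\t"))
    return "\n".join(header_lines), rows


# Illustrative input only; the data row is invented for this example.
example = (
    "# ontology: upheno1\n"
    "# similarity_measure: jaccard\n"
    "# similarity_threshold: 0.7\n"
    "subject_id\tobject_id\tdiff\n"
    "HP:123\tMP:123\t0.1\n"
)
yaml_text, rows = split_commented_header(example)
print(yaml_text)
print(rows)
```

Keeping the metadata in comments means tools that skip `#` lines can still read the table directly.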
Add a make release goal to the Makefile that uploads all semantic similarity profiles as gzipped archives to a new versioned semsim profile release.
Inspiration: https://github.com/INCATools/ontology-development-kit/blob/master/template/src/ontology/Makefile.jinja2#L1084
In monarch-initiative/semsimian#82 (comment)
@caufieldjh showed us that HP:phenotypic abnormality has very different parents than MP:phenotypic abnormality.
Can we determine why? In particular, why does the HP term have Uberon ancestors?
@caufieldjh I will assign you for now, but feel free to talk to Chris and assign someone else - it is easier for me to work if I can assign while creating the ticket, so I am sure it's not dropping off the radar.
phenio: use the "normal phenio" for now, not the "Monarch version" (note: this may change in the future; the reason is that Monarch PHENIO is changed in ways that I don't fully understand):
runoak -i semsimian:sqlite:data/ontology/upheno2-lattice.db similarity -p i --set1-file data/tmp/upheno2-lattice_hp_terms.txt --set2-file data/tmp/upheno2-lattice_mp_terms.txt -O csv -o profiles/upheno2-lattice-hp-mp.semsimian.tsv
/usr/local/lib/python3.10/dist-packages/rdflib_jsonld/__init__.py:9: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.0. Please remove rdflib-jsonld from your project's dependencies.
warnings.warn(
FileNotFoundError: [Errno 2] No such file or directory: 'profiles/upheno2-lattice-hp-mp.semsimian.tsv'
make: *** [Makefile:177: profiles/upheno2-lattice-hp-mp.semsimian.tsv] Error 1
Whenever a command is run that creates a file in a directory that does not have a dependency (direct or indirect) in that same directory, add mkdir -p dirname.
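The same guard can be applied on the Python side of a pipeline. The following is a small sketch, not part of the repository; `ensure_parent_dir` is a hypothetical helper name, and the temporary directory is only used so the example leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(output_path):
    """Create the parent directory of output_path if missing (like mkdir -p)."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration in a throwaway directory: 'profiles/' does not exist beforehand.
with tempfile.TemporaryDirectory() as tmp:
    out = ensure_parent_dir(Path(tmp) / "profiles" / "example.tsv")
    out.write_text("subject_id\tobject_id\n")
    created = out.exists()
print(created)
```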
Add a goal to the repo to gzip all the semantic similarity profiles, see https://github.com/INCATools/ontology-development-kit/blob/master/template/src/ontology/Makefile.jinja2#L905 for inspiration.
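As a rough illustration of what such a goal would do, here is a sketch using only the Python standard library. The function name `gzip_profiles` and the default `*.tsv` pattern are assumptions; the actual goal may well be a plain shell `gzip` call in the Makefile.

```python
import gzip
import shutil
from pathlib import Path


def gzip_profiles(profile_dir, pattern="*.tsv"):
    """Write a .gz copy next to every file in profile_dir matching pattern."""
    archives = []
    for source in sorted(Path(profile_dir).glob(pattern)):
        target = source.with_name(source.name + ".gz")
        # Stream the file through gzip so large profiles are not loaded into memory.
        with open(source, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        archives.append(target)
    return archives
```

Calling `gzip_profiles("profiles")` would leave the original files in place and return the list of archive paths.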
From @caufieldjh:
To get graph embeddings (note this is just with grape - NEAT may be used to automate the process, but this is what runs):
Install grape: pip install grape
from grape.datasets.kghub import KGPhenio
from grape.embedders import FirstOrderLINEEnsmallen
graph = KGPhenio()
embedding = FirstOrderLINEEnsmallen().fit_transform(graph)
By default, embedding will be a pandas DataFrame; if you run `.fit_transform(graph, return_dataframe=False)` instead, it will be a numpy array.
So the final step is to save accordingly, e.g. with embedding.to_csv('embedding.tsv', sep="\t")
Right now we only pull in gene associations from HPOA. This means we only have HP->Gene associations. However, to facilitate gene-level semantic similarity between HP and MP, we also need MP->Gene associations and gene orthology relations.
@kevinschaper can you help us here wrt.
How would you compare two phenotypic profiles, one MP, one HP, only along their p2g associations?
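One possible way to frame the question, offered only as a sketch: collect the genes associated with each profile, map mouse genes to human orthologues, and compare the resulting gene sets with Jaccard similarity. All identifiers and mappings below are invented placeholders, and this is not a method the ticket settles on.

```python
def gene_set(phenotypes, phenotype_to_genes, ortholog_map=None):
    """Collect genes for a phenotype profile, optionally mapping via orthologues."""
    genes = set()
    for phenotype in phenotypes:
        for gene in phenotype_to_genes.get(phenotype, ()):
            # Genes without an orthologue entry are kept under their own ID.
            genes.add(ortholog_map.get(gene, gene) if ortholog_map else gene)
    return genes


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data, for illustration only.
hp_to_gene = {"HP:1": {"HGNC:A", "HGNC:B"}}
mp_to_gene = {"MP:1": {"MGI:a", "MGI:c"}}
mouse_to_human = {"MGI:a": "HGNC:A", "MGI:c": "HGNC:C"}

hp_genes = gene_set(["HP:1"], hp_to_gene)
mp_genes = gene_set(["MP:1"], mp_to_gene, mouse_to_human)
score = jaccard(hp_genes, mp_genes)
print(score)
```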
A make install goal that installs all the latest dependencies (.PHONY goal).
Document in README.md how to run the whole pipeline from install to generating everything.
IC scores should be computed before we run runoak similarity and passed in there using the ic-map parameter.
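For orientation, a sketch of the usual information content definition, IC(t) = log2(N / n_t), where n_t is the number of annotated entities at or below term t and N is the total. The counts below are invented, and the exact file layout that the ic-map parameter expects is not shown here; it should be checked against the OAK documentation.

```python
import math


def information_content(term_counts, total):
    """Return IC(t) = log2(total / count) for every term with a positive count."""
    return {
        term: math.log2(total / count)
        for term, count in term_counts.items()
        if count > 0
    }


# Hypothetical counts: the root covers all 8 entities, HP:123 covers 2 of them.
counts = {"UPHENO:0001001": 8, "HP:123": 2}
ic = information_content(counts, total=8)
print(ic)
```

Computing the scores once up front keeps them identical across every similarity run that uses the same ontology and annotation set.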
See monarch-initiative/semsimian#124 (comment) for some context
This is the replacement ticket for obophenotype/upheno-dev#38
We need to iterate over this goal, as it is, as of yet, not clear how to fix this. In the old uPheno,
MP:123 = HP:123.
In uPheno2, MP:123 sub UPHENO:111, HP:123 sub UPHENO:111, so the equivalence axiom is replaced by a common parent. This drastically changes the way (graph-based) semantic similarity algorithms behave.
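A toy calculation makes the effect concrete. This sketch assumes Jaccard similarity over reflexive ancestor sets and models the old equivalence axiom as a pair of mutual subclass edges; `ROOT:1` is a hypothetical root term and the graphs are far smaller than the real ontologies.

```python
def ancestors(term, parents):
    """Return the reflexive ancestor set of term in a child -> parents mapping."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


# Old uPheno: MP:123 = HP:123, modelled here as subclass edges in both directions.
old_upheno = {"HP:123": ["MP:123", "ROOT:1"], "MP:123": ["HP:123", "ROOT:1"]}
# uPheno2: both terms sit under a common parent instead.
upheno2 = {"HP:123": ["UPHENO:111"], "MP:123": ["UPHENO:111"], "UPHENO:111": ["ROOT:1"]}

old_score = jaccard(ancestors("HP:123", old_upheno), ancestors("MP:123", old_upheno))
new_score = jaccard(ancestors("HP:123", upheno2), ancestors("MP:123", upheno2))
print(old_score, new_score)
```

In this toy graph the equivalent pair shares its whole ancestor set, while under the common-parent model the two terms no longer count each other as ancestors, so the score drops.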
This is the key ticket: INCATools/ontology-access-kit#634
@souzadevinicius This has a high priority, but right now, I don't know how to advise you on fixing it.
Can you make sure this does not fall off the radar, and mention it to me every time we meet? (add to your board as high priority)
profiles.yml contains:
    ontologies:
      - id: upheno2-lattice
      - id: upheno1-equivalent
      - id: upheno1
    semantic_similarity_profiles:
      - name: all
        method: semsimian
        ontology: upheno2-lattice
        branches:
          subject: UPHENO:0001001
          object: UPHENO:0001001
      - subset: hp-mp
        method: semsimian
        ontology: upheno2-lattice
        branches:
          subject: UPHENO:0001001
          object: UPHENO:0001001
        prefixes:
          subject: HP
          object: MP
      - subset: hp-mp
        method: semsimian
        ontology: upheno2-lattice
        branches:
          subject: UPHENO:0001001
          object: UPHENO:0001001
        prefixes:
          subject: HP
          object: MP
      - subset: hp-mp
        method: cosine
        ontology: upheno2-lattice
        branches:
          subject: UPHENO:0001001
          object: UPHENO:0001001
        prefixes:
          subject: HP
          object: MP
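To illustrate how a generated Makefile could derive one target per configured profile, here is a sketch that turns a profile entry into an output path. The pattern `profiles/{ontology}-{subset}.{method}.tsv` is inferred from the runoak command quoted earlier on this page and is an assumption, as is the helper name `profile_path`; the entries are written as Python dicts to avoid depending on a YAML library.

```python
def profile_path(profile, directory="profiles"):
    """Build an output path for one profile entry (assumed naming pattern)."""
    # Entries use either 'subset' or 'name' as their label in the config above.
    label = profile.get("subset") or profile.get("name")
    return f"{directory}/{profile['ontology']}-{label}.{profile['method']}.tsv"


profiles = [
    {"name": "all", "method": "semsimian", "ontology": "upheno2-lattice"},
    {"subset": "hp-mp", "method": "semsimian", "ontology": "upheno2-lattice"},
    {"subset": "hp-mp", "method": "cosine", "ontology": "upheno2-lattice"},
]
paths = [profile_path(p) for p in profiles]
print(paths)
```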
Makefile.j2 includes all make goals to:
ontologies section of the config.
Semsimian is oak semsimian and cosine is NEAT cosine similarity - work with Justin and Harry to set this up.
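For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their norms. The sketch below is a plain-Python illustration of that formula, not NEAT's implementation; the function name is chosen here for clarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
print(orthogonal, parallel)
```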