Comments (9)
@souzadevinicius can you try to use phenio.db as located in the URL @kevinschaper provided?
from monarch-semantic-similarity-profiles.
To clarify: I didn't intend to suggest not using Jaccard. I just pointed out that Jaccard between a pair of phenotype terms is computed using the ontology only; the associations make no difference.
IC can be calculated with different corpora: just the ontology, or the ontology plus associations.
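To make the distinction concrete, here is a minimal sketch on a made-up five-term ontology. It is not semsimian's implementation; the term names, gene names, and helper functions are all hypothetical. It shows that Jaccard over ancestor sets never looks at associations, while IC changes when the corpus changes from "ontology only" to "ontology plus associations".

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(t1, t2):
    """Jaccard similarity of the two ancestor sets (ontology only)."""
    a, b = ancestors(t1), ancestors(t2)
    return len(a & b) / len(a | b)


def information_content(term, corpus):
    """IC = log2(N / n), where n counts corpus items annotated to the
    term or to one of its descendants. Assumes n > 0."""
    hits = sum(
        1 for terms in corpus.values()
        if any(term in ancestors(t) for t in terms)
    )
    return math.log2(len(corpus) / hits)


# Corpus 1: the ontology alone (every term counts as one item).
ontology_corpus = {t: {t} for t in PARENTS}

# Corpus 2: ontology plus hypothetical gene-to-phenotype associations.
associations = {"gene1": {"A1"}, "gene2": {"B"}, "gene3": {"B"}, "gene4": {"B"}}
combined_corpus = {**ontology_corpus, **associations}

jaccard_a1_a2 = jaccard("A1", "A2")
ic_ontology_only = information_content("A", ontology_corpus)
ic_with_associations = information_content("A", combined_corpus)
```

In this toy setup, `jaccard_a1_a2` depends only on `PARENTS`, whereas the IC of `A` differs between the two corpora because the associations change how often `A` and its descendants are used.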
@souzadevinicius I updated the text for this issue!
I'm working on subsetted tsv output this week, so soon I'll have an "all gene to phenotype" file or a "just mouse gene to phenotype" file.
I don't know how it would mix in to your pipeline, but the phenio.db that's distributed with monarch-kg builds has all of the phenotype associations populated in the term_association table.
It also might be practical (if ugly) to grep from the tsv:
zcat monarch-kg-denormalized-edges.tsv.gz | grep -e '^id' -e 'has_phenotype' | grep 'MGI:' | grep 'MP:'
or a bit more structured by using the sqlite artifact:
sqlite3 monarch-kg.db -cmd ".mode tabs" -cmd ".headers on" "select * from denormalized_edges where subject_taxon = 'NCBITaxon:10090' and predicate = 'biolink:has_phenotype'" > mgi_g2p.tsv
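The same filter can be sketched from Python with the standard-library `sqlite3` module. To keep the example self-contained it runs against an in-memory toy table rather than the real `monarch-kg.db`. The four-column layout and the identifiers are placeholders (the real `denormalized_edges` table has many more columns), and the function names are made up for illustration.

```python
import csv
import io
import sqlite3


def mouse_gene_to_phenotype(conn):
    """Select mouse has_phenotype edges, mirroring the sqlite3 one-liner."""
    query = (
        "SELECT * FROM denormalized_edges "
        "WHERE subject_taxon = ? AND predicate = ?"
    )
    cur = conn.execute(query, ("NCBITaxon:10090", "biolink:has_phenotype"))
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def to_tsv(header, rows):
    """Render the result as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Toy stand-in for monarch-kg.db; columns and rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE denormalized_edges "
    "(subject TEXT, predicate TEXT, object TEXT, subject_taxon TEXT)"
)
conn.executemany(
    "INSERT INTO denormalized_edges VALUES (?, ?, ?, ?)",
    [
        ("MGI:0000001", "biolink:has_phenotype", "MP:0000001", "NCBITaxon:10090"),
        ("HGNC:0000001", "biolink:has_phenotype", "HP:0000001", "NCBITaxon:9606"),
        ("MGI:0000002", "biolink:interacts_with", "MGI:0000001", "NCBITaxon:10090"),
    ],
)

header, rows = mouse_gene_to_phenotype(conn)
tsv_text = to_tsv(header, rows)
```

Against the real artifact, `sqlite3.connect("monarch-kg.db")` would replace the in-memory connection, and `tsv_text` could be written to a file such as `mgi_g2p.tsv`.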
@kevinschaper how did the phenio build end up with all of those associations? is this phenio + all of monarch KG? Any documentation (or issue) about this, and a location where to get it?
I also learned something else today from @cmungall. @souzadevinicius we should typically not use Jaccard for comparing phenotypic profiles with gene associations. It seems that we should be using the "Resnik score" only, which somehow (the details are hazy) takes these scores into account by leveraging Information Content. Can you, for our next meeting:
- Understand how semsimian computes Resnik (formula, how it works, with an example)
- Check whether the Resnik scores look "sane" when using semsimian
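As a starting point for the first task, here is a minimal sketch of the textbook Resnik definition: the similarity of two terms is the IC of their most informative common ancestor, with IC derived from how many annotated genes fall under each term. This is not taken from semsimian's source, so whether semsimian computes it exactly this way is the thing to verify. The toy ontology, gene names, and function names are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical gene-to-phenotype associations used as the IC corpus.
ANNOTATIONS = {
    "gene1": {"A1"},
    "gene2": {"A2"},
    "gene3": {"B"},
    "gene4": {"B"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = log2(N / n): N genes in total, n genes annotated to the
    term or any of its descendants. Assumes n > 0."""
    hits = sum(
        1 for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    return math.log2(len(ANNOTATIONS) / hits)


def resnik(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


siblings_score = resnik("A1", "A2")   # shared ancestor A covers 2 of 4 genes
unrelated_score = resnik("A1", "B")   # only the root is shared
```

In this toy corpus the sibling terms score higher than the pair that shares only the root, which is the kind of sanity check the second task asks for on real data.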
Sorry, not great documentation. Here is the issue and where it happens in the code: https://github.com/monarch-initiative/monarch-ingest/blob/d65456eb3667a47d960e21f766b1ec65f1b4f774/scripts/load_sqlite.sh#L36
Semsimian uses phenio.db directly, and the semantic sql schema has a table for term associations, so it made sense to supply the associations that way...except that it's obviously a bit weird and circular to populate monarch-kg with phenio and then populate phenio with associations from monarch-kg. So the solution we have right now is that a kg release includes a phenio.db with these extra associations.
Oh, kg artifacts available at: http://data.monarchinitiative.org/monarch-kg/latest/ (purl still to come, of course...)
This issue has been resolved - we now understand that the gene associations were meant to affect only the IC score, and not the Jaccard score. Thanks everyone for your help!