Comments (9)

matentzn commented on September 25, 2024

@souzadevinicius can you try to use phenio.db as located in the URL @kevinschaper provided?

from monarch-semantic-similarity-profiles.

cmungall commented on September 25, 2024

To clarify: I didn't intend to suggest not using Jaccard. I just pointed out that Jaccard between a pair of phenotype terms is computed using the ontology only; the associations make no difference.

IC can be calculated with different corpora: just the ontology, or the ontology plus associations.
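
To make the distinction concrete, here is a minimal sketch on a toy ontology. The term and gene IDs are made up, and the functions are illustrative only (they are not semsimian's API). The Jaccard score depends only on ancestor sets, while IC changes with the corpus it is computed from.

```python
import math

# Hypothetical toy ontology: each term maps to its reflexive ancestor set.
ANCESTORS = {
    "T:root": {"T:root"},
    "T:a": {"T:a", "T:root"},
    "T:b": {"T:b", "T:a", "T:root"},
    "T:c": {"T:c", "T:a", "T:root"},
}


def jaccard(t1, t2):
    """Jaccard similarity of two terms: shared ancestors over all ancestors."""
    a, b = ANCESTORS[t1], ANCESTORS[t2]
    return len(a & b) / len(a | b)


def information_content(term, corpus):
    """IC = -log2(p), where p is the fraction of corpus items annotated
    to `term` directly or via one of its descendants."""
    hits = sum(
        1
        for terms in corpus.values()
        if any(term in ANCESTORS[t] for t in terms)
    )
    if hits == 0:
        return math.inf  # term never observed in this corpus
    return -math.log2(hits / len(corpus))


# Corpus 1: ontology only (every term counts as one item annotated to itself).
ontology_corpus = {t: {t} for t in ANCESTORS}

# Corpus 2: hypothetical gene-to-phenotype associations.
gene_corpus = {"G:1": {"T:b"}, "G:2": {"T:b"}, "G:3": {"T:b"}, "G:4": {"T:c"}}

print(jaccard("T:b", "T:c"))                        # same for either corpus
print(information_content("T:b", ontology_corpus))  # rare in the ontology
print(information_content("T:b", gene_corpus))      # common among the genes
```

In this toy example the Jaccard score for the pair stays the same whichever corpus is used, whereas the IC of `T:b` drops once the associations show that most genes are annotated to it.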

matentzn commented on September 25, 2024

@souzadevinicius I updated the text for this issue!

matentzn commented on September 25, 2024

cc @kevinschaper

kevinschaper commented on September 25, 2024

I'm working on subsetted tsv output this week, so soon I'll have an "all gene to phenotype" file or a "just mouse gene to phenotype" file.

I don't know how it would fit into your pipeline, but the phenio.db that's distributed with monarch-kg builds has all of the phenotype associations populated in the term_association table.

It also might be practical (if ugly) to grep from the tsv:

zcat monarch-kg-denormalized-edges.tsv.gz | grep -e '^id' -e 'has_phenotype' | grep 'MGI:' | grep 'MP:'

or a bit more structured by using the sqlite artifact:

sqlite3 monarch-kg.db -cmd ".mode tabs" -cmd ".headers on"  "select * from denormalized_edges where subject_taxon = 'NCBITaxon:10090' and predicate = 'biolink:has_phenotype'" > mgi_g2p.tsv
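
For anyone who would rather do this from Python, a rough equivalent of the sqlite3 command above could look like the sketch below. The table and column names are the ones from the command; the function name `export_edges` and its parameters are just illustrative.

```python
import csv
import sqlite3

QUERY = """
SELECT * FROM denormalized_edges
WHERE subject_taxon = ? AND predicate = ?
"""


def export_edges(conn, out, taxon="NCBITaxon:10090",
                 predicate="biolink:has_phenotype"):
    """Write matching edges to `out` as TSV with a header row.

    Returns the number of data rows written.
    """
    cur = conn.execute(QUERY, (taxon, predicate))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count
```

Usage would be something like `export_edges(sqlite3.connect("monarch-kg.db"), open("mgi_g2p.tsv", "w", newline=""))`, assuming monarch-kg.db is in the working directory.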

matentzn commented on September 25, 2024

@kevinschaper how did the phenio build end up with all of those associations? Is this phenio plus all of the Monarch KG? Is there any documentation (or an issue) about this, and a location to get it from?

I also learned something else today from @cmungall. @souzadevinicius we should typically not use Jaccard for comparing phenotypic profiles with gene associations. It seems that we should be using the "Resnik score" only, which somehow (the details are hazy) takes these associations into account by leveraging Information Content. Can you, for our next meeting:

  1. Understand how semsimian computes Resnik (formula, how it works, with an example)
  2. Check whether the Resnik scores look "sane" when using semsimian
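
As a starting point for item 1, here is a small sketch of the textbook Resnik definition: the similarity of two terms is the IC of their most informative common ancestor. The ontology and term frequencies below are hypothetical, and this is not confirmed to match what semsimian does internally.

```python
import math

# Hypothetical toy ontology: each term maps to its reflexive ancestor set.
ANCESTORS = {
    "T:root": {"T:root"},
    "T:a": {"T:a", "T:root"},
    "T:b": {"T:b", "T:a", "T:root"},
    "T:c": {"T:c", "T:a", "T:root"},
}

# Hypothetical fraction of the corpus annotated to each term (or a descendant).
FREQ = {"T:root": 1.0, "T:a": 0.5, "T:b": 0.25, "T:c": 0.125}


def ic(term):
    """Information content: rarer terms carry more information."""
    return -math.log2(FREQ[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ANCESTORS[t1] & ANCESTORS[t2]
    return max(ic(t) for t in common)


print(resnik("T:b", "T:c"))  # driven by the shared parent T:a
print(resnik("T:b", "T:b"))  # a term compared with itself gives its own IC
```

Because the frequencies come from the corpus, swapping in a different set of associations changes the Resnik score even though the ontology stays the same.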

kevinschaper commented on September 25, 2024

Sorry, not great documentation. Here is the issue and where it happens in the code: https://github.com/monarch-initiative/monarch-ingest/blob/d65456eb3667a47d960e21f766b1ec65f1b4f774/scripts/load_sqlite.sh#L36

Semsimian uses phenio.db directly, and the semantic sql schema has a table for term associations, so it made sense to supply the associations that way...except that it's obviously a bit weird and circular to populate monarch-kg with phenio and then populate phenio with associations from monarch-kg. So the solution we have right now is that a kg release includes a phenio.db with these extra associations.

kevinschaper commented on September 25, 2024

Oh, kg artifacts available at: http://data.monarchinitiative.org/monarch-kg/latest/ (purl still to come, of course...)

matentzn commented on September 25, 2024

This issue has been resolved - we now understand that the gene associations were meant to affect only the IC score, and not the Jaccard score. Thanks everyone for your help!
