
Comments (3)

adw96 commented on August 15, 2024

Thanks so much @jorondo1 -- I have some suspicions about what's happening. Two questions:

  1. How are you producing your count matrix? Are you filtering singletons?
  2. Could you please provide some example frequency count tables?

I think I can diagnose given the above information.

from breakaway.

jorondo1 commented on August 15, 2024

The count matrix is Sourmash output. Here is a subset of the phyloseq object I am using as input. I have not done any filtering (other than subsetting for this example); the counts are as-is. It looks like this:


             GQ1   GQ2  GQ32  GQ33   GQ3  GQ34  GQ35   GQ4  GQ36  GQ37   GQ5   GQ6  GQ11  GQ12  GQ40
 1 [Eubac…   130     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 2 [Eubac…  1115 10007     0     0  7426     0   109   251     0     0 12743  9898     0    80     0
 3 [Eubac…   169   111     0     0     0     0     0     0     0     0     0     0     0     0     0
 4 Actino…   287   214     0     0   198     0     0     0     0     0     0   211   137   857     0
 5 Actino…   323   182     0     0     0     0     0     0     0     0     0     0     0   304     0
 6 Actino…    75     0     0     0     0     0     0     0     0     0     0     0     0   359     0
 7 Actino…  2333 28694     0     0  8682     0     0  2403     0     0  7657  9792   141    64     0
 8 Actino…   562   171     0     0   196     0     0     0     0     0   213     0    70   215     0
 9 Actino…  3796  1403     0     0  3390     0     0   559     0     0   157   124  1215  1414     0
10 Actino…   860  2216     0     0  3446     0     0    73     0     0  1970   967  1106  1233     0

Thanks a lot for your time!


adw96 commented on August 15, 2024

Interesting -- it looks like very few of your biological units are observed infrequently. What breakaway is (reasonably) inferring from this is that there are few unobserved biological units, which is why breakaway's estimates are the same as your plug-in estimates. Basically, if your data structure doesn't suggest that you have anything that's rare, it will predict that you have nothing missing! This is definitely the intended behavior of breakaway, so I'm going to close this issue. I hope that helps answer your question!
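
For illustration, a frequency count table (how many units were observed exactly j times) can be tabulated as in the sketch below. This is a minimal Python example using only the GQ1 and GQ2 columns of the subset posted above; it is not breakaway's code, and the helper name `frequency_count_table` is made up for this example.

```python
from collections import Counter

# Counts for samples GQ1 and GQ2, copied from the 10-taxon subset posted above.
# Only a subset, so this illustrates the idea rather than the full data set.
subset = {
    "GQ1": [130, 1115, 169, 287, 323, 75, 2333, 562, 3796, 860],
    "GQ2": [0, 10007, 111, 214, 182, 0, 28694, 171, 1403, 2216],
}


def frequency_count_table(counts):
    """Return sorted (j, f_j) pairs, where f_j taxa were observed exactly j times."""
    observed = [c for c in counts if c > 0]  # taxa with zero counts are unobserved
    return sorted(Counter(observed).items())


tables = {name: frequency_count_table(counts) for name, counts in subset.items()}

for name, table in tables.items():
    singletons = dict(table).get(1, 0)   # f_1: taxa seen exactly once
    richness = sum(f for _, f in table)  # observed (plug-in) richness
    rarest = table[0][0]                 # smallest nonzero count
    print(f"{name}: observed richness={richness}, singletons={singletons}, "
          f"smallest nonzero count={rarest}")
```

In this subset neither sample has a taxon observed fewer than 75 times, so there are no singletons or other low-frequency counts to suggest unobserved units.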

That said, given the extremely strong correlation between depth and observed richness, if you wanted to fit a model for richness adjusting for depth, you might consider fitting a model like lm(sample_richness ~ your_covariates + depth). I don't recommend this often but given that you don't have the ability to estimate the # of rare units (min hash sketches?), this might be the best you can do in this setting.
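
As a rough sketch of what such a model fits, the Python example below computes depth and observed richness per sample from the 10-taxon subset posted above and fits richness on depth by ordinary least squares. It covers only the depth term (other covariates are omitted), uses the standard library rather than R's lm, and the helper name `fit_line` is made up for this example. Because the subset has only 10 taxa, several samples have zero depth here, so the fitted numbers say nothing about the full data set.

```python
# Columns = samples, rows = the 10 taxa in the subset posted above.
samples = ["GQ1", "GQ2", "GQ32", "GQ33", "GQ3", "GQ34", "GQ35", "GQ4",
           "GQ36", "GQ37", "GQ5", "GQ6", "GQ11", "GQ12", "GQ40"]
counts = [
    [130, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1115, 10007, 0, 0, 7426, 0, 109, 251, 0, 0, 12743, 9898, 0, 80, 0],
    [169, 111, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [287, 214, 0, 0, 198, 0, 0, 0, 0, 0, 0, 211, 137, 857, 0],
    [323, 182, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 304, 0],
    [75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 359, 0],
    [2333, 28694, 0, 0, 8682, 0, 0, 2403, 0, 0, 7657, 9792, 141, 64, 0],
    [562, 171, 0, 0, 196, 0, 0, 0, 0, 0, 213, 0, 70, 215, 0],
    [3796, 1403, 0, 0, 3390, 0, 0, 559, 0, 0, 157, 124, 1215, 1414, 0],
    [860, 2216, 0, 0, 3446, 0, 0, 73, 0, 0, 1970, 967, 1106, 1233, 0],
]

# Per-sample depth (total count) and observed richness (number of nonzero taxa).
columns = list(zip(*counts))
depth = [sum(col) for col in columns]
richness = [sum(1 for c in col if c > 0) for col in columns]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(depth, richness)

# Residuals: richness not explained by depth in this depth-only fit.
residuals = [r - (intercept + slope * d) for d, r in zip(depth, richness)]

for name, d, r, res in zip(samples, depth, richness, residuals):
    print(f"{name}: depth={d}, richness={r}, residual={res:.2f}")
```

With additional covariates the same idea applies, but the fit needs a multiple-regression solver, which is what the lm call handles.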

Next time I chat with Taylor and Titus I might ask them whether sourmash can detect rare/low-abundance min hash sketches, since I don't know.

