
Comments (7)

scubalaina avatar scubalaina commented on July 17, 2024

Hi there,

I also have a similar question and just wanted to boost this!
I'm working with RNAP (the B and B' subunit genes, analyzed separately), which are single-copy markers, so I don't have to worry about copy number skewing things. But I'm wondering how the diversity calculations are implemented and interpreted with metagenomic data in which the single-gene community accounts for only a very small portion of the reads/members of the whole community. In other words, measured against the full metagenome, their relative abundances will not sum to 1?

Thanks,
Alaina :)

from divnet.

mooreryan avatar mooreryan commented on July 17, 2024

@scubalaina I have used DivNet in a similar way. When you run it on the subcommunity (i.e., just the RNA pol seqs), you are passing the data to DivNet as counts, right? If so, it will go through its process treating that subcommunity as the samples/community, which is what you want.
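To make the point about counts concrete, here is a minimal Python sketch (not DivNet's estimator, which is an R package with its own model). It only shows that once the marker-gene counts are treated as the whole community, the proportions sum to 1 within that subcommunity. The counts and the function names `relative_abundances` and `shannon` are made up for illustration, and the Shannon value is a simple plug-in estimate.

```python
import math

def relative_abundances(counts):
    """Convert raw counts to proportions within the subcommunity."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]

def shannon(proportions):
    """Plug-in Shannon index (natural log); zero proportions contribute nothing."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical rpoB read counts for four taxa in one sample.
# These reads may be a tiny fraction of the whole metagenome,
# but within the marker-gene subcommunity they are the full total.
rpob_counts = [120, 60, 15, 5]
props = relative_abundances(rpob_counts)

print(props)
print(shannon(props))
```

The proportions are relative to the marker-gene reads only, so any diversity estimate describes the RNAP-carrying subcommunity rather than the full metagenome.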


scubalaina avatar scubalaina commented on July 17, 2024


mooreryan avatar mooreryan commented on July 17, 2024

@scubalaina something to keep in mind about normalizations: you will be changing the read counts, which could have an effect on variance estimations. Check out this tiny example. It's a silly contrived example where every gene has the same length, but the counts are still normalized by gene length (i.e., the counts are reduced equally for all samples/genes in this particular example, which increases the variance). Of course this is just a silly example, but the point is that normalizing could impact variance estimations. In practice, though, I'm not sure how much of an issue it will be. Someone from the Willis lab will have to comment on that.
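A rough way to see why shrinking counts can inflate variance, sketched in Python. This is not the attached R script and not DivNet's variance estimator; it assumes plain binomial sampling, where the variance of an estimated proportion is p(1 - p)/N. The counts, the 4 kb gene length, and the function name `binomial_variance` are hypothetical.

```python
def binomial_variance(count, total):
    """Sampling variance of an estimated proportion under a simple binomial model."""
    p = count / total
    return p * (1 - p) / total

# Hypothetical raw read counts for two genes in one sample.
raw_counts = [300, 100]

# Both genes are assumed to be 4 kb long, so per-kilobase
# normalization divides every count by the same constant.
gene_length_kb = 4.0
rpk_counts = [c / gene_length_kb for c in raw_counts]

var_raw = binomial_variance(raw_counts[0], sum(raw_counts))
var_rpk = binomial_variance(rpk_counts[0], sum(rpk_counts))

# The proportions are identical, but the apparent sample size is 4x smaller.
print(var_raw, var_rpk, var_rpk / var_raw)
```

Under this simplified model the proportions are unchanged, but the variance grows by the same factor the counts were divided by. Whether DivNet's own variance estimates respond the same way is not established here.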

One other thing if you're doing some normalization: think of a gene in a sample that has a low count like 2, but it is a 4 kb gene, so its "per kilobase" count would be 0.5. Depending on your choice of pseudocount (for example, 0.5 was chosen in the DivNet manuscript for the analysis), that normalized count could be about the same as the pseudocount. Another thing to keep in mind.
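The arithmetic above, as a small Python sketch. It assumes the pseudocount is the value that stands in for a zero count; how DivNet applies it internally is not shown in this thread, so treat that as an assumption. The function name `reads_per_kilobase` is made up for illustration.

```python
def reads_per_kilobase(count, gene_length_bp):
    """Normalize a read count by gene length expressed in kilobases."""
    return count / (gene_length_bp / 1000.0)

PSEUDOCOUNT = 0.5  # value mentioned in the thread as used in the DivNet manuscript

# A gene that was observed twice, on a 4 kb gene.
observed_rpk = reads_per_kilobase(2, 4000)

# A gene that was never observed, with the pseudocount standing in for zero
# (assumed usage, for illustration only).
unobserved_with_pseudocount = 0 + PSEUDOCOUNT

print(observed_rpk, unobserved_with_pseudocount)
```

After normalization, a gene that was actually seen twice carries the same value as a gene that was never seen and only received the pseudocount.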

divnet_rpk_variance.R.txt

[Image: alpha_div]

(Not relevant to this discussion, but I work in a viral ecology lab, so I know some of your papers! Just a cool coincidence 😄)


scubalaina avatar scubalaina commented on July 17, 2024


mooreryan avatar mooreryan commented on July 17, 2024

I wonder how one could avoid compromising variance calculations without overestimating the abundance of longer genes if gene length isn't accounted for?

^ Yeah, that's a good question...as far as I know that is still an open research question. Someone from the Willis lab will have to weigh in here.

I attached the example below

^ I think you may have forgotten the attachment...I'm not seeing it.

(Yep in Wommack's lab...small world haha!)


scubalaina avatar scubalaina commented on July 17, 2024

Hi Ryan,

Sorry I was corresponding via email so the attachment probably didn't work through github. Here it is!
[Image: Screenshot 2023-06-26 at 5 52 35 PM]

