Comments (12)

matsen commented on September 24, 2024

Re first point, it seems to me that we can do exactly the same as this:

except rather than doing matrix multiplication (e.g. in A_0 \cdot T) we take the max (rather than sum) of the component-wise products. We also record this index.
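A minimal sketch of that max-product step, assuming plain nested lists rather than the project's actual matrix types. The function name `max_product` and its arguments are hypothetical and not taken from linearham; the only change from ordinary matrix multiplication is that the sum over the inner index becomes a max, with the winning index recorded alongside it.

```python
def max_product(a, b):
    """Max-product analogue of the matrix product a @ b.

    Returns (probs, argmax) where
        probs[i][j]  = max_k a[i][k] * b[k][j]
        argmax[i][j] = the k attaining that max (first one on ties).
    """
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    probs = [[0.0] * n_cols for _ in range(n_rows)]
    argmax = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Take the max (rather than the sum) of the component-wise products.
            best_k = max(range(n_inner), key=lambda k: a[i][k] * b[k][j])
            probs[i][j] = a[i][best_k] * b[best_k][j]
            argmax[i][j] = best_k  # record the index for the traceback
    return probs, argmax


# Toy values, made up for illustration only.
probs, argmax = max_product([[0.5, 0.2]], [[0.1, 0.4], [0.6, 0.3]])
# argmax is [[1, 0]]: k=1 wins for column 0, k=0 wins for column 1.
```

The recorded indices are what a traceback would later follow to recover the path itself, not just its probability.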

We also need someplace to store this information. We could have another Smooshish type that holds it, one in which viterbi_ is not the same as marginal_, as it is now.

I don't understand your second point-- the Viterbi path for the whole thing is the best path among those for the various chains, right?

from linearham.

dunleavy005 commented on September 24, 2024

Agree to the first point! Maybe even store it in NTInsertion class, but never initialize it if we want the marginal prob?

To your second comment, yes. But if we have tons of chains because there are tons of SW matches, then doing a final max over all of them won't be cheap. It's a minor thing if we have few matches per read (as our example files have). The sketch I have for how to do a proper Viterbi is not clean, so in practice, if you feel the number of SW matches is usually small, then it shouldn't hurt to do that final max.

dunleavy005 commented on September 24, 2024

(Personally I'd rather not change so much of what we've already done for a proper viterbi if the gains aren't so high)

matsen commented on September 24, 2024

Agree to the first point! Maybe even store it in NTInsertion class, but never initialize it if we want the marginal prob?

Yep!

For your second point, I'm not sure what you are proposing as an alternate means of doing viterbi.

dunleavy005 commented on September 24, 2024

It would involve keeping track of not only max transition points but also max germline genes. Right now we only do it over transition points.

matsen commented on September 24, 2024

It seems to me that we can't get any more efficient than using the Chain structure, right? And at the end of the Chain inferences we will have the Viterbi probabilities. From there it's just taking a max over a vector.

(Pretty sure I'm missing your point here...)

dunleavy005 commented on September 24, 2024

It's just that the max over all VDJ chains could be expensive because it's a max over (#V) * (#D) * (#J) elements, which could be costly depending on how many V's, D's, and J's there are, yeah?

matsen commented on September 24, 2024

Yes, but are you proposing a means of getting around that?

dunleavy005 commented on September 24, 2024

yes, but A) you seem ok with max'ing over all VDJ chains? and B) it might drastically change the code structure so do we care enough to do it?

matsen commented on September 24, 2024

You really have me curious, but I can guess what you're proposing by your comments above. It seems like for this round of coding we can focus on getting the existing branch merged and closing the issues as they stand, which AFAIK just requires more tests.

Then we can consider optimizing both the marginal and the viterbi calculations with more specialized code?

matsen commented on September 24, 2024

(BTW, partis will only serve, I think, 3 V's, 5 D's, and 4 J's, which is 60 combinations max)
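For a sense of scale, here is a rough sketch of the final max over all VDJ chains at those sizes. Only the gene counts come from this thread; the gene labels, the `toy_viterbi_prob` function, and its values are made up for illustration and stand in for whatever each Chain would actually report.

```python
from itertools import product

# Hypothetical gene labels; only the counts (3 V's, 5 D's, 4 J's) come from the thread.
v_genes = [f"V{i}" for i in range(3)]
d_genes = [f"D{i}" for i in range(5)]
j_genes = [f"J{i}" for i in range(4)]


def toy_viterbi_prob(v, d, j):
    """Stand-in for a per-chain Viterbi probability (made-up values)."""
    vi, di, ji = int(v[1:]), int(d[1:]), int(j[1:])
    return ((7 * vi + 3 * di + 5 * ji) % 11 + 1) / 1000.0


# One Viterbi probability per (V, D, J) chain: 3 * 5 * 4 = 60 entries.
chain_probs = {
    chain: toy_viterbi_prob(*chain)
    for chain in product(v_genes, d_genes, j_genes)
}

# The final step is a single linear scan over those entries.
best_chain = max(chain_probs, key=chain_probs.get)
best_prob = chain_probs[best_chain]
```

The scan is linear in (#V) * (#D) * (#J), so at 60 entries it should be negligible next to the per-chain work.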

dunleavy005 commented on September 24, 2024

we are going bayesian! viterbi unnecessary!
