
Comments (2)

semenko commented on August 30, 2024

Cool idea -- what were you thinking for a different normalization option?

from pipeline.

jdimatteo commented on August 30, 2024

@semenko I was thinking a method where all the counts for a given .bam file add up to 1

For bamliquidator with bins, just divide each count by the total of the counts, e.g. if two bins and just one chromosome:

  1. chr1, bin 1 count is 3, normalized count is 3/(3+4)
  2. chr1, bin 2 count is 4, normalized count is 4/(3+4)
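The bin case above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not bamliquidator code; the function name `unity_normalize_bins` and the choice to return zeros for an all-zero input are assumptions.

```python
def unity_normalize_bins(counts):
    """Scale bin counts so that they sum to 1 (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        # Assumption: with no reads at all, report zeros rather than divide by zero.
        return [0.0 for _ in counts]
    return [count / total for count in counts]


# The two-bin example: counts 3 and 4 become 3/7 and 4/7.
bin_fractions = unity_normalize_bins([3, 4])
print(bin_fractions)
```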

For regions, first divide by bin width to normalize on bin width. Then divide each of the normalized-on-bin-width counts by the total of the normalized-on-bin-width counts. For example, if two regions:

  1. chr1, start 100, stop 200, count is 1, normalized by bin width count is 1/(200-100)=0.01, then final normalized count is 0.01/(0.01+0.003)=0.769230...
  2. chr2, start 2000, stop 3000, count is 3, normalized by bin width count is 3/(3000-2000)=0.003, then final normalized count is 0.003/(0.01+0.003)=0.230769...

I should probably add twice the extension length to the region size for this calculation.
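A rough Python sketch of the region calculation, including the idea of widening each region by twice the extension length. This is not bamliquidator code: the function name, the `(start, stop, count)` tuple layout, and the `extension` parameter (defaulting to 0, which reproduces the two-region example) are all assumptions made for illustration.

```python
def unity_normalize_regions(regions, extension=0):
    """Normalize region counts by width, then scale so they sum to 1.

    regions: list of (start, stop, count) tuples (hypothetical layout).
    extension: assumed read extension length; each region is treated as
               2 * extension wider than stop - start.
    """
    densities = []
    for start, stop, count in regions:
        width = (stop - start) + 2 * extension
        if width <= 0:
            raise ValueError("region width must be positive")
        densities.append(count / width)

    total = sum(densities)
    if total == 0:
        # Assumption: with no reads at all, report zeros rather than divide by zero.
        return [0.0 for _ in densities]
    return [density / total for density in densities]


# The two-region example: densities 0.01 and 0.003.
region_fractions = unity_normalize_regions([(100, 200, 1), (2000, 3000, 3)])
print(region_fractions)
```

With a nonzero `extension`, the narrower region loses relatively more weight, since the same 2 * extension is a larger share of its width.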

Maybe I'll add the command line option -u/--unity_normalization to use this alternative normalization method where things add up to 1, and the default will remain bases per million reads per base?
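For illustration, the proposed flag could be wired up with Python's `argparse` roughly like this. The parser below is a hypothetical stand-in, not bamliquidator's actual command-line code; only the `-u/--unity_normalization` name comes from the proposal above.

```python
import argparse


def build_parser():
    """Build a hypothetical parser showing only the proposed option."""
    parser = argparse.ArgumentParser(description="hypothetical example parser")
    parser.add_argument(
        "-u", "--unity_normalization",
        action="store_true",
        help="normalize so all counts for a .bam file add up to 1 "
             "instead of using the default normalization",
    )
    return parser


# Passing -u switches the flag on; omitting it keeps the default method.
args = build_parser().parse_args(["-u"])
method = "unity" if args.unity_normalization else "default"
print(method)
```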

@bradnerComputation: does this sound like a less arbitrary normalization option? Does adding twice the extension length to the region size make sense?

I don't want to add many normalization options, but I would like to add one more normalization option that isn't skewed by bams with different read lengths.

