delfi_scripts

This fork of delfi_scripts is for benchmarking against the DELFI method (short/long fragment ratios in 5 Mb bins). While we have attempted to stay as true to the original approach as possible, we have made the following changes:

  1. We have made it runnable with the hg38 assembly.

  2. We have removed the summary statistics, as we only need the feature creation. We run the cross-validation in Python so that all benchmarks share the same setup.

  3. We have ignored the z-scores (a measure of chromosome-arm-specific copy number changes). In the paper, these do not add significantly to the model AUC.

  4. The features are saved as a CSV file so that we can read them into Python more easily. The feature creation is performed directly in 04-5mb_bins.r instead of 06-gbm_full.r.

  5. We have added or replaced command-line arguments, which now use optparse, so that input and output files can be handled in the parent workflow.

  6. GC correction failed when the GC content (the bias argument) contained NAs, so we set those values to the median value.

  7. Extracting the GC content per fragment required excessive amounts of RAM (exceeding 512 GB of available memory for some files), so we replaced the original approach with the one used in DELFI Lucas (a later paper).

  8. When summing across 50 bins of 100 kb at a time, we ignore NAs instead of setting the full 5 Mb bin to NA. This should reduce the number of 5 Mb bins that are NA and would otherwise be removed entirely. Such removal is problematic in a cross-dataset context: if the test set has new NAs, imputation or a similar workaround would be required.
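
The NA handling described in changes 6 and 8 can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code in this fork, which is implemented in R. The function names `fill_na_with_median` and `sum_ignoring_na` are hypothetical, NAs are represented as `nan`, and two behaviors are assumptions rather than documented facts: a window in which every bin is NA stays NA, and a trailing window with fewer than 50 bins is still summed.

```python
import math
from statistics import median


def fill_na_with_median(values):
    """Replace NaN entries with the median of the non-NaN entries.

    Mirrors the idea in change 6: missing GC values are set to the
    median so that a downstream correction step does not fail.
    """
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        # Nothing to impute from; return the input unchanged.
        return list(values)
    fill = median(observed)
    return [fill if math.isnan(v) else v for v in values]


def sum_ignoring_na(counts, bins_per_window=50):
    """Sum consecutive windows of `bins_per_window` values, skipping NaNs.

    Mirrors the idea in change 8: a single NA in a small bin no longer
    turns the whole aggregated bin into NA.
    Assumption: a window in which every value is NaN stays NaN.
    """
    totals = []
    for start in range(0, len(counts), bins_per_window):
        window = counts[start:start + bins_per_window]
        observed = [v for v in window if not math.isnan(v)]
        totals.append(sum(observed) if observed else math.nan)
    return totals
```

For example, `sum_ignoring_na([1.0, math.nan, 2.0], bins_per_window=3)` returns `[3.0]`, whereas a plain sum over the same window would be NA.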

Contributors

scristia, danielbruhm, rscharpf
