SCCNV

SCCNV: a software tool for identifying copy number variation from single-cell whole-genome sequencing

Version 1.0.2

Updated date: 2020.07.20

Author and License

Author: Xiao Dong

Email: [email protected], [email protected]

Licensed under the GNU Affero General Public License version 3 or later

DEPENDENCIES

python 2 or python 3

python modules os, argparse, math, numpy

bedtools, samtools

OPTIONS

-i, (or --ibam): a text file listing the bam files of single-cell whole-genome sequencing data, one file per row; required input file.

-o, (or --odir): a directory for the output results; required output directory

-g, (or --genome): genome version; default: hg19

-s, (or --genomesize): genome size; default: 3140000000

-b, (or --binsize): binsize (bp); default: 500000; min: 1000

-w, (or --windowsize): windowsize (number of bins); default: 10

-p, (or --ploidy): ploidy of autosomes; default: 2

-m, (or --minmap): minimum mappability; default: 0.3

-n, (or --maxN): maximum N in the bin; default: 0.1

-k, (or --skipbed): skip calculating bed with bedtools; default: False

-r, (or --report): report intermediate results; default: False
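
To see how the -s, -b and -w defaults relate to each other, here is a minimal arithmetic sketch. It is not SCCNV's internal code; the function names are hypothetical. The real number of bins will be somewhat higher than this estimate, because bins are created per chromosome and each chromosome ends with a partial bin.

```python
def approx_bin_count(genome_size=3140000000, bin_size=500000):
    """Approximate number of bins: genome size divided by bin size, rounded up."""
    if bin_size < 1000:
        # The option list above states a minimum bin size of 1000 bp.
        raise ValueError("bin size must be at least 1000 bp")
    # Integer ceiling division; behaves the same in python 2 and python 3.
    return (genome_size + bin_size - 1) // bin_size


def window_span_bp(bin_size=500000, window_size=10):
    """Genomic length covered by one window of consecutive bins."""
    return bin_size * window_size


print(approx_bin_count())  # 6280
print(window_span_bp())    # 5000000
```

With the defaults, the genome is split into roughly 6280 bins of 500 kb, and a window of 10 bins covers 5 Mb.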

USAGE

I. Prepare bam files of every single cell for analysis

Below is an example for cell_A. I recommend filtering out reads with mapq<30.

samtools view -b -q 30 ./bam/cell_A.bam > ./bam_mapq30/cell_A.mapq30.bam

samtools index ./bam_mapq30/cell_A.mapq30.bam
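
When there are many cells, the two samtools commands above can be generated per cell. The sketch below only builds and prints the commands. The function name, the default output directory and the output naming (taken from the cell_A example) are assumptions made for illustration. It uses the samtools view -o option to name the output file instead of shell redirection.

```python
import os


def mapq_filter_commands(in_bam, out_dir="./bam_mapq30", min_mapq=30):
    """Return the samtools filter and index commands for one cell."""
    cell = os.path.basename(in_bam)
    if cell.endswith(".bam"):
        cell = cell[:-len(".bam")]
    # Output name follows the cell_A example: <cell>.mapq<threshold>.bam
    out_bam = os.path.join(out_dir, "{}.mapq{}.bam".format(cell, min_mapq))
    filter_cmd = ["samtools", "view", "-b", "-q", str(min_mapq),
                  "-o", out_bam, in_bam]
    index_cmd = ["samtools", "index", out_bam]
    return filter_cmd, index_cmd


for bam in ["./bam/cell_A.bam", "./bam/cell_B.bam", "./bam/cell_C.bam"]:
    for cmd in mapq_filter_commands(bam):
        print(" ".join(cmd))
```

Each printed command can be pasted into a shell, or each command list can be passed to subprocess.check_call once the output directory exists.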

II. Prepare a list of the single-cell bam files.

Provide a file (e.g. “bamlist.txt”) with the following content. See NOTE below for important information when analyzing multiple single cells of the same clone, e.g., tumor cells.

./bam_mapq30/cell_A.mapq30.bam

./bam_mapq30/cell_B.mapq30.bam

./bam_mapq30/cell_C.mapq30.bam
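
A list like the one above can also be written by a short script. This is a minimal sketch, assuming all filtered bam files sit in one directory; the function name and the default file pattern are hypothetical.

```python
import glob
import os


def write_bamlist(bam_dir, list_path, pattern="*.bam"):
    """Write one bam path per row to list_path and return the paths written."""
    # The pattern matches only names ending in .bam, so .bam.bai index
    # files in the same directory are not picked up.
    bams = sorted(glob.glob(os.path.join(bam_dir, pattern)))
    if not bams:
        raise ValueError("no bam files found in " + bam_dir)
    with open(list_path, "w") as handle:
        for bam in bams:
            handle.write(bam + "\n")
    return bams

# Example call (requires the directory to exist):
# write_bamlist("./bam_mapq30", "bamlist.txt")
```

Review the resulting file before running SCCNV, because the choice of cells in the list matters (see NOTE).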

III. Perform CNV calling

python sccnv.py -i bamlist.txt -o cellsAtoC -k False -r True

IV. Tips

One can prepare the bed files in advance using bedtools and samtools (running every cell in parallel):

mkdir cellsAtoC

bedtools makewindows -g ./resource/hg19.chrlength.txt -w 500000 > ./cellsAtoC/hg19.bin500000.bed

samtools bedcov ./cellsAtoC/hg19.bin500000.bed ./bam_mapq30/cell_A.mapq30.bam > ./cellsAtoC/cell_A.depth.bin500000.bed

Then skip the corresponding step in SCCNV:

python sccnv.py -i bamlist.txt -o cellsAtoC -k True -r True
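
The per-cell samtools bedcov step from this tip can be scripted so that each cell is dispatched as its own job. The sketch below is an illustration only. The function names are hypothetical, and the cell name is taken as the text before the first dot of the bam file name, which is an assumption inferred from the single cell_A example; check which file names sccnv.py expects before relying on it.

```python
import os
import subprocess


def bedcov_command(bam_path, out_dir, bin_size=500000, genome="hg19"):
    """Return the samtools bedcov command and the depth file it should fill."""
    # Assumption: cell name = bam file name up to the first dot.
    cell = os.path.basename(bam_path).split(".")[0]
    bins_bed = os.path.join(out_dir, "{}.bin{}.bed".format(genome, bin_size))
    depth_bed = os.path.join(
        out_dir, "{}.depth.bin{}.bed".format(cell, bin_size))
    return ["samtools", "bedcov", bins_bed, bam_path], depth_bed


def run_bedcov(bam_path, out_dir, bin_size=500000, genome="hg19"):
    """Run samtools bedcov for one cell, writing its stdout to the depth file."""
    cmd, depth_bed = bedcov_command(bam_path, out_dir, bin_size, genome)
    with open(depth_bed, "w") as out:
        subprocess.check_call(cmd, stdout=out)
    return depth_bed


cmd, depth = bedcov_command("./bam_mapq30/cell_A.mapq30.bam", "./cellsAtoC")
print(" ".join(cmd) + " > " + depth)
```

run_bedcov needs samtools on the PATH and the bin file from bedtools makewindows already in the output directory; it can be called once per row of bamlist.txt, for example from separate cluster jobs.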

V. Example data

An example dataset is included in "sccnv_example_v1.0.2.zip". It includes (1) intermediate and final result files generated using SCCNV in txt format; and (2) basic R scripts for visualizing CNVs across the genome.

NOTE

(1) SCCNV aims to discover differences in CNV between each single cell in the bamlist.txt and the other cells in the bamlist.txt. When analyzing CNVs of multiple tumor cells, it is therefore not appropriate to include all tumor cells in the bamlist.txt, because CNVs shared by cells of the same clone would not stand out as differences. Instead, please use one tumor cell with two or more normal diploid cells in the bamlist.txt.
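
Following this note, one bam list can be written per tumor cell, each containing that tumor cell plus the same set of normal diploid cells. This is a minimal sketch; the function name and the bamlist.<cell>.txt naming are hypothetical, and the cell name is assumed to be the text before the first dot of the bam file name.

```python
import os


def write_per_tumor_bamlists(tumor_bams, normal_bams, out_dir):
    """Write one bam list per tumor cell: the tumor cell, then all normal cells."""
    if len(normal_bams) < 2:
        # The note above asks for two or more normal diploid cells.
        raise ValueError("provide two or more normal diploid cells")
    list_paths = []
    for tumor in tumor_bams:
        # Assumption: cell name = bam file name up to the first dot.
        cell = os.path.basename(tumor).split(".")[0]
        list_path = os.path.join(out_dir, "bamlist.{}.txt".format(cell))
        with open(list_path, "w") as handle:
            for bam in [tumor] + list(normal_bams):
                handle.write(bam + "\n")
        list_paths.append(list_path)
    return list_paths
```

Each resulting list is then passed to a separate sccnv.py run through -i, with its own output directory given through -o.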

RELEASE NOTES

v1.0.2 2020.07.20, allowed calling CN>4 and included additional example data.

v1.0.1 2020.04.10, revised readme and included example data.

v1.0.0 2018.07.26, release version.

v0.0.3 2018.07.25, rewrote in Python.

v0.0.2 2017.08, fixed bugs.

v0.0.1 2016.09, drafted script in R.
