BISulfite-seq CUI Toolkit (BISCUIT) is a utility suite for analyzing sodium bisulfite conversion-based DNA methylation/modification data. It was written to perform alignment, DNA methylation and mutation calling, and allele-specific methylation analysis from bisulfite sequencing data.
The latest release is here. To install BISCUIT:
$ unzip release.zip
$ cd biscuit-release
$ make
The created biscuit binary is the main entry point.
All releases are available here. When cloning the repository for versions after v0.2.0, make sure to use git clone --recursive to get the submodules.
To build the BISCUIT index for a reference genome:
$ biscuit index GRCh38.fa
The BISCUIT index is composed of the 2-bit packed reference (.bis.pac, .bis.amb, .bis.ann), plus the FM-index and suffix array of the parent strand (.par.bwt and .par.sa) and of the daughter strand (.dau.bwt and .dau.sa).
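Since alignment needs all seven index files, a small guard can avoid rebuilding the index when it already exists. The bash sketch below is a hypothetical helper (check_biscuit_index is not part of BISCUIT) and assumes each file is named by appending its suffix to the reference path, e.g. GRCh38.fa.bis.pac.

```shell
# Hypothetical helper: succeed only if every BISCUIT index file for the
# given reference exists and is non-empty.
# Assumption: files are named <reference><suffix>, e.g. GRCh38.fa.bis.pac.
check_biscuit_index() {
  local ref="$1" missing=0 suffix
  for suffix in .bis.pac .bis.amb .bis.ann .par.bwt .par.sa .dau.bwt .dau.sa; do
    if [ ! -s "${ref}${suffix}" ]; then
      echo "missing: ${ref}${suffix}" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status 0 only when no file was missing or empty.
  [ "$missing" -eq 0 ]
}

# Example: build the index only when something is missing.
#   check_biscuit_index GRCh38.fa || biscuit index GRCh38.fa
```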
The following snippet shows how BISCUIT can be used in conjunction with samtools to produce a sorted, indexed alignment BAM file.
$ biscuit align -t 10 GRCh38.fa fastq1.fq.gz fastq2.fq.gz | samtools sort -T . -O bam -o output.bam
$ samtools index output.bam
$ samtools flagstat output.bam >output.bam.flagstat
The tview subroutine colors the alignments in bisulfite mode.
[Screenshot: biscuit tview alignment display in bisulfite mode]
$ biscuit tview -g chr19:7525080 input.bam ref.fa
Unlike samtools tview, this subroutine requires a reference FASTA file so that bisulfite conversions can be identified.
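The reason the reference matters: on a C>T-converted read, a T opposite a reference C is a bisulfite conversion rather than a mutation, and only the reference base can tell the two apart. The bash sketch below (bash 4+) illustrates that distinction for an ungapped alignment; the function name and the output codes are hypothetical and do not reproduce tview's actual coloring.

```shell
# Hypothetical helper: compare an ungapped read with its reference segment,
# assuming the read comes from the C>T-converted strand.
# Output codes (illustrative only):
#   .  match        M  reference C kept as C (unconverted)
#   U  reference C read as T (converted)   X  any other mismatch
bs_classify() {
  local ref="${1^^}" query="${2^^}" out="" i r q
  for ((i = 0; i < ${#ref} && i < ${#query}; i++)); do
    r="${ref:i:1}"; q="${query:i:1}"
    if [ "$r" = C ] && [ "$q" = C ]; then out+=M
    elif [ "$r" = C ] && [ "$q" = T ]; then out+=U
    elif [ "$r" = "$q" ]; then out+=.
    else out+=X
    fi
  done
  printf '%s\n' "$out"
}

# Example: bs_classify ACGTCA ATGTCG   # prints .U..MX
```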
Duplicate marking is optional. BISCUIT's markdup is bisulfite-strand aware.
$ biscuit markdup input.bam output.bam
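One way to picture strand awareness: reads from the two bisulfite strands fall into separate duplicate groups even when they share a position and orientation. The awk sketch below illustrates that idea on SAM text. It is a simplification (single-end, keyed on RNAME, POS and orientation instead of unclipped 5' ends, no mate handling), it is not BISCUIT's markdup algorithm, and it assumes the bisulfite strand is recorded in a YD:A:f or YD:A:r tag.

```shell
# Hypothetical helper: reads SAM on stdin, writes SAM on stdout with flag
# 0x400 set on later reads sharing chrom, position, orientation and
# bisulfite strand. Assumption: strand is stored as YD:A:f or YD:A:r.
mark_bs_duplicates() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                 # pass header lines through unchanged
    int($2 / 4) % 2 == 1 { print; next } # leave unmapped reads (flag 0x4) alone
    {
      bs = "?"                           # strand unknown unless a YD tag is found
      for (i = 12; i <= NF; i++)
        if ($i ~ /^YD:A:/) bs = substr($i, 6, 1)
      rev = int($2 / 16) % 2             # flag 0x10: read is reverse-complemented
      key = $3 SUBSEP $4 SUBSEP rev SUBSEP bs
      if ((key in seen) && int($2 / 1024) % 2 == 0)
        $2 += 1024                       # set flag 0x400 on later copies
      seen[key] = 1
      print
    }'
}

# Example: samtools view -h input.bam | mark_bs_duplicates
```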
Like samtools, BISCUIT extracts genetic information, and it extracts DNA methylation as well. The following shows how to produce a tabix-indexed VCF file.
$ biscuit pileup -r GRCh38.fa -i input.bam -o output.vcf -q 20
$ bgzip output.vcf
$ tabix -p vcf output.vcf.gz
The following extracts CpG beta values from the VCF file.
$ biscuit vcf2bed -k 10 -t cg input.vcf.gz
-t can also take:
- snp: SNP information
- c: all cytosines
- hcg: HCG for NOMe-seq
- gch: GCH for NOMe-seq
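A beta value is the fraction of reads that retain a cytosine at a site: beta = M / (M + U), where M and U are the methylated and unmethylated read counts. The awk sketch below computes it from a hypothetical four-column table (chrom, position, M, U) and, reading -k 10 as a minimum-coverage threshold (an assumption here), drops sites with fewer than 10 informative reads. Its input and output are not the vcf2bed formats.

```shell
# Hypothetical helper: stdin has chrom, pos, M (methylated reads), U (unmethylated reads).
# Prints chrom, pos, beta = M / (M + U) and coverage for sites with M + U >= min_cov.
beta_values() {
  local min_cov="${1:-10}"   # assumed analogue of vcf2bed -k
  awk -v k="$min_cov" 'BEGIN { FS = OFS = "\t" }
    {
      cov = $3 + $4
      if (cov > 0 && cov >= k + 0)
        printf "%s\t%s\t%.3f\t%d\n", $1, $2, $3 / cov, cov
    }'
}

# Example: printf 'chr1\t10\t8\t2\n' | beta_values 10
#   prints chr1, 10, 0.800, 10 (tab-separated)
```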
The following illustrates how to produce the epiread format, which carries epi-haplotype information.
$ biscuit epiread -r GRCh38.fa -i input.bam -B snp.bed
To test all SNP-CpG pairs:
$ biscuit epiread -r GRCh38.fa -P -i input.bam -B snp.bed
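An epi-haplotype records, in order, the methylation states of the CpGs covered by a single read. The bash sketch below (bash 4+) extracts such a string for one ungapped read from the C>T-converted strand; the function name and the C/T/N encoding are illustrative assumptions, not a specification of the epiread format.

```shell
# Hypothetical helper: for each CpG in the reference segment, report the read
# base at the C position. Assumes an ungapped C>T-strand alignment starting at
# the first reference base. C = retained (methylated), T = converted
# (unmethylated), N = anything else.
epihaplotype() {
  local ref="${1^^}" query="${2^^}" states="" i
  for ((i = 0; i + 1 < ${#ref} && i < ${#query}; i++)); do
    [ "${ref:i:2}" = CG ] || continue
    case "${query:i:1}" in
      C) states+=C ;;
      T) states+=T ;;
      *) states+=N ;;
    esac
  done
  printf '%s\n' "$states"
}

# Example: epihaplotype ACGTTCGACG ATGTTCGATG   # prints TCT
```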
Details can be found here.
To call allele-specific methylation, sort the epiread output and run asm:
$ sort -k1,1 -k2,2n -k3,3n in.epiread >out.epiread
$ biscuit asm out.epiread >out.asm
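Allele-specific methylation asks whether reads carrying different alleles at a SNP differ in methylation at a nearby CpG. The awk sketch below builds the per-allele counts such a comparison starts from, using hypothetical two-column input (allele, CpG state, with C = methylated and T = unmethylated). It does not reproduce the statistics of biscuit asm; a test such as Fisher's exact test would typically be applied to counts like these.

```shell
# Hypothetical helper: stdin has one (allele, CpG state) pair per read,
# where state C = methylated and T = unmethylated; other states are ignored.
# Prints allele, methylated count, unmethylated count and methylated fraction.
allele_methylation() {
  awk 'BEGIN { FS = OFS = "\t" }
    $2 == "C" { m[$1]++; seen[$1] = 1 }
    $2 == "T" { u[$1]++; seen[$1] = 1 }
    END {
      for (a in seen)
        printf "%s\t%d\t%d\t%.3f\n", a, m[a], u[a], m[a] / (m[a] + u[a])
    }' | sort
}
```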
Sometimes, the bisulfite conversion label in a given alignment is inaccurate, conflicting or ambiguous. The bsstrand command summarizes these labels based on the number of C>T and G>A substitutions, and can optionally correct inaccurate labels.
$ biscuit bsstrand GRCh38.fa input.bam
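The counting idea can be sketched as follows: tally C>T and G>A substitutions between a read and its reference segment and let the larger count suggest the strand label. This bash function (bash 4+), its CT/GA label names and its tie rule are hypothetical; the decision rule used by bsstrand may differ.

```shell
# Hypothetical helper: count C>T and G>A substitutions for an ungapped read
# and guess its bisulfite strand. Prints C>T count, G>A count and a label:
# CT if C>T dominates, GA if G>A dominates, ambiguous on a tie.
bs_strand_guess() {
  local ref="${1^^}" query="${2^^}" ct=0 ga=0 i label
  for ((i = 0; i < ${#ref} && i < ${#query}; i++)); do
    case "${ref:i:1}${query:i:1}" in
      CT) ct=$((ct + 1)) ;;
      GA) ga=$((ga + 1)) ;;
    esac
  done
  if ((ct > ga)); then label=CT
  elif ((ga > ct)); then label=GA
  else label=ambiguous
  fi
  printf '%d\t%d\t%s\n' "$ct" "$ga" "$label"
}

# Example: bs_strand_guess ACGCCG ATGTTG   # prints 3, 0, CT (tab-separated)
```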
BISCUIT includes code from the following projects:
- lib/aln was adapted from Heng Li's BWA-MEM code.
- lib/htslib was imported as a git subtree from the htslib library.
- lib/klib was imported as a git subtree from Heng Li's klib.