This repository includes projects using Python, R, Bioconductor, and Galaxy to understand, analyze, and interpret data from next-generation sequencing experiments.
The following projects are included:
Sets up a workflow to analyze polymorphic DNA sites in sequencing samples from a father-mother-child trio.
This project includes a set of tools for DNA sequence analysis: (a) count the records in a file (`count_records`); (b) compute the length of each DNA sequence (`check_length`); (c) identify open reading frames in each DNA sequence (`orf_identifier`); (d) identify repeated motifs in a sequence (`repeats_identifier`).
3. [Linux Command Line Tools for Data Science in Biology](https://github.com/lanttern/DATA_SCIENCE_IN_BIOLOGY/tree/master/Linux%20Command%20Line%20Tools%20for%20Data%20Science%20in%20Biology)
This project applies bowtie2, samtools, bedtools, and bcftools on the Linux command line to: 1) analyze RNA-seq data to determine the sets of genes expressed in various tissues; 2) catalog genetic variation using samtools and bedtools together with other Unix commands; 3) develop a pipeline for variant calling in a given genome; 4) determine which genes are differentially expressed under different experimental conditions.
4. [Sequencing Alignment and Genome Assembly](https://github.com/lanttern/DATA_SCIENCE_IN_BIOLOGY/tree/master/Sequencing%20Alignment%20and%20Genome%20Assembly)
Python is used to implement key algorithms and data structures to analyze real genomes and DNA sequencing datasets.
5. [R and Bioconductor](https://github.com/lanttern/DATA_SCIENCE_IN_BIOLOGY/tree/master/Bioconductor%20for%20Genomic%20Analysis%20)
Uses tools from the Bioconductor project and the R programming language to analyze genomic data.
6. [Statistics with R for Genomic Data](https://github.com/lanttern/DATA_SCIENCE_IN_BIOLOGY/tree/master/Statistics%20for%20Genomic%20Data%20Science)
An introduction to the statistics used in the analysis of genomic data.
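
As a rough illustration of the four DNA sequence analysis tools named above (`count_records`, `check_length`, `orf_identifier`, `repeats_identifier`), here is a minimal Python sketch. Only the tool names come from this README. The signatures, the `read_fasta` helper, the FASTA input format, and the choice to report the longest ORF in a single forward reading frame are all assumptions made for illustration, and the code in the project folder may work differently.

```python
from typing import Dict, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text: str) -> Dict[str, str]:
    """Hypothetical helper: parse FASTA text into {record_id: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record ID.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def count_records(records: Dict[str, str]) -> int:
    """Number of records in the parsed file."""
    return len(records)


def check_length(records: Dict[str, str]) -> Dict[str, int]:
    """Length of each DNA sequence, keyed by record ID."""
    return {name: len(seq) for name, seq in records.items()}


def orf_identifier(seq: str, frame: int = 0) -> Optional[Tuple[int, int]]:
    """Longest ORF in one forward frame (0, 1, or 2).

    Returns (0-based start, length including the stop codon), or None.
    Assumption: an ORF runs from ATG to the first in-frame stop codon.
    """
    best = None
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            length = i + 3 - start
            if best is None or length > best[1]:
                best = (start, length)
            start = None
    return best


def repeats_identifier(records: Dict[str, str], n: int) -> Dict[str, int]:
    """Substrings of length n that occur more than once across all records."""
    counts: Dict[str, int] = {}
    for seq in records.values():
        for i in range(len(seq) - n + 1):
            kmer = seq[i:i + n]
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer: c for kmer, c in counts.items() if c > 1}
```

A typical call sequence would be `records = read_fasta(open("reads.fa").read())` followed by any of the four functions; the file name `reads.fa` is a placeholder, not a file in this repository.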