Genome Assembly

This repository is a usable, publicly available tutorial. All steps have been provided for the UConn CBC Xanadu cluster here, with appropriate headers for the Slurm scheduler that need only simple modification to run. Commands should never be executed on the submit nodes of any HPC machine. If you are working on the Xanadu cluster, modify the script for each stage and then submit it with sbatch scriptname. Basic editing of all scripts can be performed on the server with tools such as nano, vim, or emacs. If you are new to Linux, please use this handy guide for the operating system commands. In this guide, you will be working with common bioinformatics file formats, such as FASTA, FASTQ, SAM/BAM, and GFF3/GTF. You can learn even more about each file format here. If you do not have a Xanadu account and are an affiliate of UConn/UCHC, please apply for one here.

Contents

  1. Data Download
  2. Quality Control
  3. Genome Size Estimation
  4. Caddisfly Assembly
  5. Hornwort Long Read Assembly
  6. Hornwort Short Read Assembly
  7. Hornwort Hybrid Assembly

1. Data Download

Follow the scripts in the 01_data_download directory to obtain the publicly available data. Here we download PacBio HiFi, ONT, and Illumina reads.

2. Quality Control

Follow the scripts in the 02_quality_control directory to characterize the reads with NanoPlot and FastQC. In this step we also trim the Illumina data and filter all reads for contamination.
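As a rough illustration of the per-base quality information that tools such as FastQC and NanoPlot summarize, the sketch below decodes a FASTQ quality string into Phred scores and error probabilities. It assumes the Phred+33 encoding, and the function names are hypothetical; they are not part of either tool or of the scripts in this repository.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33) unless another
    offset is given.
    """
    return [ord(ch) - offset for ch in quality]


def mean_error_probability(quality: str, offset: int = 33) -> float:
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    scores = phred_scores(quality, offset)
    if not scores:
        raise ValueError("quality string is empty")
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0 under Phred+33.
    print(phred_scores("I5!"))
    print(mean_error_probability("IIII"))
```

Averaging the error probabilities rather than the raw Phred scores is a deliberate choice here: Phred scores are logarithmic, so a plain mean of the scores would understate the effect of a few very poor bases.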

3. Genome Size Estimation

Follow the scripts in the 03_genome_size directory to count k-mers and build a k-mer histogram with Jellyfish, which is then passed to GenomeScope. This gives an estimate of the genome size and heterozygosity of the sample.
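To make the idea of a k-mer histogram concrete, here is a minimal pure-Python sketch of the kind of table this step produces: how many distinct k-mers occur once, twice, and so on. It is only an illustration with hypothetical function names, not how Jellyfish is implemented. The final function is a deliberately naive size estimate (total k-mers divided by the peak depth) and is an assumption for illustration; GenomeScope fits a statistical model to the histogram instead.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, skipping any k-mer with a non-ACGT base."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Dict[int, int]:
    """Map each depth to the number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


def naive_genome_size(hist: Dict[int, int], min_depth: int = 2) -> int:
    """Naive estimate: total k-mers / peak depth, ignoring depths below min_depth.

    Low-depth k-mers are mostly sequencing errors, so they are excluded.
    This ignores heterozygosity and repeats.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers // peak_depth


if __name__ == "__main__":
    counts = count_kmers(["ACGTA"], k=2)
    print(kmer_histogram(counts))
```

Counting canonical k-mers means a k-mer and its reverse complement are treated as the same sequence, which is appropriate because reads come from both strands.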

4. Caddisfly Assembly

Follow the scripts in the 04_caddisfly directory to assemble a genome with PacBio HiFi data. We use both the Flye and hifiasm assemblers in this directory. We then evaluate the assemblies with BUSCO, QUAST, and minimap2.
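One of the contiguity statistics QUAST reports is N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below shows how that number can be computed from contig lengths. The function names and the minimal FASTA reader are hypothetical helpers for illustration, not QUAST's own code.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    example = [">contig1", "ACGTACGT", "AC", ">contig2", "GGCC"]
    lengths = contig_lengths(example)
    print(lengths, n50(lengths))
```

To apply this to a real assembly, the lines of the FASTA file can be passed directly, for example by iterating over an open file handle.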

5. Hornwort Long Read Assembly

The 05_hornwort_ONT directory contains scripts to assemble a genome from ONT reads with the Flye assembler. We then demonstrate polishing with medaka (long reads) and POLCA (short reads), and evaluate the assemblies with BUSCO, QUAST, minimap2, and BWA.

6. Hornwort Short Read Assembly

In the 06_hornwort_illumina directory we demonstrate a short-read assembly with MaSuRCA. We then evaluate the assembly with BUSCO, QUAST, and BWA.

7. Hornwort Hybrid Assembly

In the 07_hornwort_hybrid directory we use both Illumina and ONT reads to build a hybrid assembly with MaSuRCA. We also demonstrate polishing with POLCA. We then evaluate the assembly with BUSCO, QUAST, minimap2, and BWA.

Contributors

golden75, mianahom
