GNRS (Graph nonreference sequences)

Pangenome of human nonreference sequences from population-scale long-read sequencing


 ___________         ____         __         ____________         __________
|  _________|       |    \       |  |       |  ________  |       |   ____   | 
|  |                |  |\ \      |  |       |  |      |  |       |  |    |__|
|  |                |  | \ \     |  |       |  |      |  |       |  |        
|  |   _____        |  |  \ \    |  |       |  |______|  |       |  |_______  
|  |  |__   |       |  |   \ \   |  |       |   ___   ___|       |_______   |
|  |     |  |       |  |    \ \  |  |       |  |   \  \           __     |  |          
|  |     |  |       |  |     \ \ |  |       |  |    \  \         |  |    |  |           
|  |_____|  |       |  |      \ \|  |       |  |     \  \        |  |____|  |            
|___________|       |__|       \____|       |__|      \__\       |__________|   


Description

To ensure the results are reproducible, the pipeline is implemented in Snakemake, with its software environment managed by Anaconda. The pipeline can also be applied to other cohorts with long-read sequencing data.

The workflow of GNRS for population-scale long-read sequencing is shown below:

Schematic representation of GraphNRS

  • a, Long-read sequencing data from different platforms are de novo assembled and polished.
  • b, The NRSs are anchored to GRCh38. Placed NRSs are clustered to select the representative NRSs, and unplaced NRSs are clustered after filtering out contaminants and centromeric repeats. Then, we merge the placed and the unplaced NRSs to obtain the nonredundant NRSs of the whole population.
  • c, vg is used to construct the graph pangenome, and NRS genotyping is performed for each NRS of the individual.
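
As a rough illustration of step b, the toy sketch below groups placed NRSs by their anchor position on the reference and keeps one representative per group. It is not the pipeline's implementation: the `PlacedNRS` record, the `max_dist` threshold of 50 bp, and the "longest sequence wins" rule are all assumptions made for the example, and the real workflow relies on the tools listed under Requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedNRS:
    """Hypothetical record for an NRS anchored to the reference."""
    sample: str
    chrom: str
    pos: int   # anchor position on GRCh38
    seq: str   # inserted nonreference sequence


def cluster_placed(nrss, max_dist=50):
    """Chain NRSs on the same chromosome whose anchors are within
    max_dist bp of the previous NRS (max_dist is an assumed threshold)."""
    clusters = []
    for nrs in sorted(nrss, key=lambda n: (n.chrom, n.pos)):
        last = clusters[-1][-1] if clusters else None
        if (last is not None and last.chrom == nrs.chrom
                and nrs.pos - last.pos <= max_dist):
            clusters[-1].append(nrs)
        else:
            clusters.append([nrs])
    return clusters


def representatives(clusters):
    """Pick the longest sequence in each cluster (an assumed rule)."""
    return [max(cluster, key=lambda n: len(n.seq)) for cluster in clusters]


calls = [
    PlacedNRS("s1", "chr1", 1000, "ACGTACGT"),
    PlacedNRS("s2", "chr1", 1020, "ACGTACGTAC"),
    PlacedNRS("s3", "chr1", 9000, "TTTT"),
]
reps = representatives(cluster_placed(calls))
```

Here the first two calls fall into one cluster and the third forms its own, so `reps` holds one entry per nonredundant site.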

Requirements

1. wtdbg2 v2.5
2. MarginPolish v1.3.0
3. Hifiasm v0.16.1-r375
4. NextPolish v1.4.0 
5. QUAST v5.0.2
6. AGE v0.4
7. Kalign v3.3
8. Jasmine v1.1.0
9. vg toolkit v1.33.1
10. GraphAligner v1.0.13
11. snakemake v7.2.1
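
Before running the pipeline, it can help to confirm that each tool is reachable on `PATH`. The sketch below is one way to do that with the Python standard library. The executable names in `EXPECTED_TOOLS` are assumptions based on how these tools are commonly installed and may differ on your system; the check does not verify versions.

```python
import shutil

# Tool name -> executable name. The executable names are assumptions;
# adjust them to match your installation.
EXPECTED_TOOLS = {
    "wtdbg2": "wtdbg2",
    "MarginPolish": "marginPolish",
    "Hifiasm": "hifiasm",
    "NextPolish": "nextPolish",
    "QUAST": "quast.py",
    "AGE": "age_align",
    "Kalign": "kalign",
    "Jasmine": "jasmine",
    "vg toolkit": "vg",
    "GraphAligner": "GraphAligner",
    "snakemake": "snakemake",
}


def find_missing(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the sorted names of tools whose executable is not on PATH."""
    return sorted(name for name, exe in tools.items() if which(exe) is None)


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected executables were found.")
```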

Configure the environment

Install the software and configure the environment:

Follow the comments in the pipeline file, and set the sample paths in the configuration file.


Quick start for the pipeline

usage: snakemake -p -s GNRS.pipeline.py --configfile GNRS.pipeline.yaml --cores
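
If you prefer to launch the run from a script, the usage line above can be assembled programmatically. This is a minimal sketch: the `build_command` helper and its `dry_run` option are illustrative names, not part of GNRS. It passes an explicit core count and can append Snakemake's `-n` flag to preview the jobs without executing them.

```python
import shlex


def build_command(cores, snakefile="GNRS.pipeline.py",
                  configfile="GNRS.pipeline.yaml", dry_run=False):
    """Build the snakemake invocation from the usage line as an argument list."""
    cmd = [
        "snakemake", "-p",
        "-s", snakefile,
        "--configfile", configfile,
        "--cores", str(cores),
    ]
    if dry_run:
        cmd.append("-n")  # Snakemake dry run: list jobs without running them
    return cmd


# Print the shell-ready command for an 8-core dry run.
print(shlex.join(build_command(8, dry_run=True)))
```

To execute the command rather than print it, pass the list to `subprocess.run(..., check=True)` from the directory containing the pipeline and configuration files.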

Our pipeline works for any species with a reference genome. We tested the pipeline on a yeast dataset.

The results of the tests are in the output folder.


Datasets generated from GNRS

We provide the NRS callsets of the 539 individuals produced by GNRS from three different long-read sequencing platforms (PacBio CLR, PacBio HiFi, and ONT). The sequences and genotypes of the NRSs are publicly available at the National Genomics Data Center (NGDC), China National Center for Bioinformation (CNCB), under project accession number PRJCA007976. The sequences and genotypes of the placed NRSs are available under accession number GVM000324, and the sequences of the unplaced NRSs are available under accession number GWHBHSK00000000.


Citation

Wu Z, Li T, Jiang Z, Zheng J, Gu Y, Liu Y, Liu Y, Xie Z. Human pangenome analysis of sequences missing from the reference genome reveals their widespread evolutionary, phenotypic, and functional roles. Nucleic Acids Res. 2024 Feb 14:gkae086. doi: 10.1093/nar/gkae086. Epub ahead of print. PMID: 38364871.


Contact

For advice, bug reports, or help, please open a GitHub issue or contact [email protected].
