mockcommunity's Introduction

Nanopore GridION and PromethION Mock Microbial Community Data Community Release

R10 Data Release (2019-02-28)

We are pleased to release prototype R10 pore data for the even Zymo mock community, generated on the Oxford Nanopore GridION. The data were generated in the same manner as previous releases, except that the library was prepared from PCR-amplified material.

The R10 pore is expected to give better discrimination in homopolymer regions and preliminary analysis suggests it improves consensus-level accuracy (see Jared Simpson's talk at NCM18 for early results: https://vimeo.com/306201411). This dataset is made available without restriction for community analysis. We anticipate releasing detailed analysis in the near future.

Summary stats:

  • Reads: 4.23M
  • Bases: 16.59Gb
  • Read length N50: 4,620bp
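For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how a read length N50 is commonly computed: sort the reads from longest to shortest and report the length at which the running total first reaches half of all bases. The function name `n50` and the toy lengths are illustrative assumptions, not part of this release or its analysis scripts.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example (hypothetical lengths, not taken from the dataset):
# total = 54 bases, half = 27; 10 + 9 + 8 = 27, so the N50 is 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # prints 8
```

In practice the lengths would be read from the FASTQ file; the toy list only keeps the example self-contained.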

Signal files were basecalled with Guppy 2.3.1+1b9405b using a pre-release basecalling model, R10_flipflop_model.json.

Download links (courtesy of CLIMB):

We thank Rosemary Dokos, Chris Wright, Jon Pugh and Jayne Wallace from Oxford Nanopore Technologies for their help and assistance with preparation of this dataset.

Release 2 (2018-10-17)

We recently released data for the Zymo mock community (1 run on PromethION, 3 runs on MinION). However, we found that our previous bead-based DNA extraction, which focused on the bacterial cell pellet from the Zymo mock community, resulted in under-representation of the Gram-negative bacteria in the sample. Additionally, during this work, the composition of the Zymo mock community changed.

Therefore we have repeated the sequencing of the most recent Zymo log and even community samples on the GridION and PromethION. In this case we prepared libraries simultaneously, incorporating both the pellet extraction and the supernatant to try to recover all species equally.

We loaded the same libraries on GridION and PromethION, permitting a direct comparison of these two instruments to be made.

Zymo Community Standards 2 (Even) Batch ZRC190633

  • Useful for evaluating nanopore data analysis, including basecalling, alignment, assembly and taxonomic assignment methods
  • 10 species (5 Gram-positive, 3 Gram-negative, 2 yeast): each bacterium is present at 12% and each yeast at 2% (by genomic DNA)
  • Zymo Specification Sheet
  • Data available from:
    • GridION (Zymo-GridION-EVEN-BB-SN)
    • PromethION (Zymo-PromethION-EVEN-BB-SN)
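As a quick sanity check on the stated composition, the sketch below reads the 12% and 2% figures as per-organism shares of genomic DNA and confirms that they sum to 100%. That per-organism reading is an assumption, and the helper name `total_percent` is purely illustrative.

```python
def total_percent(n_bacteria, pct_per_bacterium, n_yeast, pct_per_yeast):
    """Sum the per-organism genomic DNA shares across the community."""
    return n_bacteria * pct_per_bacterium + n_yeast * pct_per_yeast


# Counts and percentages as listed for the even community:
# 5 Gram-positive + 3 Gram-negative bacteria at 12% each, 2 yeasts at 2% each.
n_bacteria = 5 + 3
n_yeast = 2
print(total_percent(n_bacteria, 12, n_yeast, 2))  # prints 100
```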

Zymo Community Standards 2 (Log/Staggered) Batch ZRC190842

  • Useful for evaluating the limit of detection at high coverage and assessing metagenomic assembly across large differences in abundance
  • 10 species (5 Gram-positive, 3 Gram-negative, 2 yeast) ranging from 10^2 to 10^8 in genomic DNA abundance (total input 5 x 10^8 cells)
  • Zymo Specification Sheet
  • Data available from:
    • GridION (Zymo-GridION-LOG-BB-SN)
    • PromethION (Zymo-PromethION-LOG-BB-SN)

Data Availability

| Name | Reads (M) | Yield (G) | FASTQ | Run Folder | Restarts | FAST5 |
|---|---|---|---|---|---|---|
| Zymo-PromethION-LOG-BB-SN | 35.1 | 148 | fastq.gz | 64h run | restarts | download.sh, restarts.tar |
| Zymo-PromethION-EVEN-BB-SN | 36.5 | 146 | fastq.gz | 64h run | restarts | download.sh, restarts.tar |
| Zymo-GridION-LOG-BB-SN | 3.7 | 16 | fastq.gz | 48h run | n/a | signal.tar |
| Zymo-GridION-EVEN-BB-SN | 3.5 | 14 | fastq.gz | 48h run | n/a | signal.tar |
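The read counts and yields in the table allow a rough mean read length per run to be estimated. The sketch below assumes that "Reads (M)" means millions of reads and "Yield (G)" means gigabases; the function name `mean_read_length` is an illustrative choice. Because the table values are rounded, the results are only approximate.

```python
# (reads in millions, yield in gigabases), copied from the table above.
RUNS = {
    "Zymo-PromethION-LOG-BB-SN": (35.1, 148),
    "Zymo-PromethION-EVEN-BB-SN": (36.5, 146),
    "Zymo-GridION-LOG-BB-SN": (3.7, 16),
    "Zymo-GridION-EVEN-BB-SN": (3.5, 14),
}


def mean_read_length(reads_millions, yield_gigabases):
    """Approximate mean read length in bases: total bases / total reads."""
    return (yield_gigabases * 1e9) / (reads_millions * 1e6)


for name, (reads_m, yield_g) in RUNS.items():
    print(f"{name}\t{mean_read_length(reads_m, yield_g):.0f} bp")
```

Under these assumptions all four runs come out at roughly 4.0 to 4.3 kb mean read length, consistent with the same libraries having been loaded on both instruments.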

Further information

Please refer to Josh Quick's talk at Genome Science 2018.

Additional resources

Refer to our project website for assemblies.

License

Data are available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, i.e. you are free to use the data with attribution.

Acknowledgements

We are grateful to Hannah McDonnell (Cambridge Biosciences) for free samples of the Zymo mock community, and Shuiquan Tang for helpful advice for this work. We are grateful to Divya Mirrington (Oxford Nanopore Technologies) for assistance with PromethION library preparation. We thank Radoslaw Poplawski (CLIMB, University of Birmingham) for sequencer networking and file system help with the data release. We are grateful to CLIMB for data hosting.

Previous Data Releases

mockcommunity's People

Contributors

nickloman, samstudio8


mockcommunity's Issues

R10 Guppy model

Hi,
It is mentioned in https://lomanlab.github.io/mockcommunity/r10.html that there is somewhere a Guppy model for modification-aware R10 basecalling (template_r10_450bps_hac_meth). Is it publicly available?
There is no cfg to basecall methylation (Dam really) for R10 flowcells in current Guppy, and we have some files of this type...
Best regards,
Sergey Lavrov

Fast5 file download

Hi, I want to know if I can download the fast5 file by aspera because wget is too slow to download the 271GB file.

WebLink for Bacillus subtilis in sources.txt leads to an entry for Enterococcus faecium

Hi, I'm interested in using the R10 dataset for some benchmarking, and one thing I wanted to do was to download all the reference genomes provided in sources.txt.

However, if you follow the web link for Bacillus subtilis (line 6), this leads to an entry for Enterococcus faecalis strain sorialis.

Is this an incorrect entry in sources.txt, or something to be updated in the metadata, or other?

species of R10 data

Dear,
I heard a talk today in which the speaker introduced R10, which is much more accurate than R9.4. I am interested in the R10 data and found this GitHub site. I wonder which species are in the data at https://s3.climb.ac.uk/nanopore/Zymo-GridION-EVEN-BB-SN-PCR-R10HC-flipflop.fq.gz and https://s3.climb.ac.uk/nanopore/Zymo-GridION-EVEN-BB-SN-PCR-R10HC_multi.tar. I would also like to know the coverage of this dataset.

Thanks very much.

Best
Xiaofei

guppy version used for R10.3 pore data

Hi,

Could you tell me what was the guppy version used for the basecalling of the R10.3 pore data? I think that info is missing on the website.

Thanks

Missing CellType column in metadata/even.txt

This will cause analysis/scripts/summariseStats.R to fail:

Error in `select()`:
! Can't subset columns that don't exist.
Column `CellType` doesn't exist.
