This project forked from broadinstitute/genomics-in-the-cloud

Source code and related materials for the O'Reilly book

Home Page: https://broadinstitute.github.io/genomics-in-the-cloud/

License: BSD 3-Clause "New" or "Revised" License

R 0.75% WDL 13.05% Shell 0.14% HTML 0.62% Python 0.88% Jupyter Notebook 84.57%

genomics-in-the-cloud

Source code and related materials for Genomics in the Cloud, an O'Reilly book by Geraldine A. Van der Auwera and Brian D. O'Connor.

This site is a work in progress; we will continue to add content here now that the book has been released.

Find the electronic version of the book today at https://oreil.ly/genomics-cloud or on Amazon (Kindle version), or pre-order the paperback version on Amazon.

Book overview

Data in the genomics field is booming. In just a few years, organizations such as the National Institutes of Health (NIH) will host 50+ petabytes—or 50 million gigabytes—of genomic data, and they’re turning to cloud infrastructure to make that data available to the research community. How do you adapt analysis tools and protocols to access and analyze that data in the cloud?

With this practical book, researchers will learn how to work with genomics algorithms using open source tools including the Genome Analysis Toolkit (GATK), Docker, WDL, and Terra. Geraldine Van der Auwera, longtime custodian of the GATK user community, and Brian O’Connor of the UC Santa Cruz Genomics Institute guide you through the process. You’ll learn by working with real data and genomics algorithms from the field.

This book takes you through:

  • Essential genomics and computing technology background
  • Basic cloud computing operations
  • Getting started with GATK
  • Three major GATK Best Practices pipelines for variant discovery
  • Automating analysis with scripted workflows using WDL and Cromwell
  • Scaling up workflow execution in the cloud, including parallelization and cost optimization
  • Interactive analysis in the cloud using Jupyter notebooks
  • Secure collaboration and computational reproducibility using Terra

Resources

List of commands

See the commands folder for text files that let you easily copy and paste the commands from the hands-on exercises.

Figures

For those of you reading the print version of the book, which does not include color figures, we've made the figures available in the figures directory of the GCS bucket.
You may use all figures except 3-3 and 6-15 in your own non-commercial work, preferably with a notice of attribution referring to the book. For commercial use, please contact [email protected]. Figures 3-3 and 6-15 do not belong to us, so you must request permission from their respective owners, which are noted in the book.

Blog

We're developing a blog for the book at https://broadinstitute.github.io/genomics-in-the-cloud/ where we will publish blog posts, additional tutorials, errata for the book, and regular updates on new features that you may be interested in. Feel free to suggest blog topics by reaching out to us on Twitter or LinkedIn (see contact info below).

Reporting errors

If you encounter errors or broken links in the book, please file an issue on O'Reilly's Errata page. Anything reported there that we can verify will get fixed and updated in both the electronic versions and subsequent printing runs of the book, so others won't run into the same problems.

To avoid confusion and redundancy with the O'Reilly Errata page, we don't use GitHub Issues for this project.

Getting help

If you run into problems while working through the hands-on exercises, or if you have follow-up questions about the topics we discuss in the book, please post your questions in either the GATK forum or the Terra forum. The frontline support team will most likely be able to address your questions, and for anything else they will loop us into the conversation if you mention that your question is related to our book. If you're not sure which forum to use, just flip a coin; it's the same team that maintains both communities.

Remember also that you can often save yourself some time by searching the GATK documentation or Terra documentation before posting a question -- that way you don't have to wait for someone to get back to you.

Getting in touch with us

If you'd like to get in touch, you can reach us on Twitter (@VdAGeraldine and @boconnor) and on LinkedIn (Geraldine and Brian). We look forward to hearing what you think of the book! If you like it, please consider posting a review on Amazon.
