
Coding for Cancer

This project is currently under construction!

The information presented below includes some options for how these materials might be developed.

Description

These materials are supported by the Science Education and Training program at Fred Hutchinson Cancer Research Center.

General objectives:

  • developing computational thinking
  • understanding the process of biomedical research
  • coding with R

Target audience:

  • assumes no prior coding experience
  • aimed at upper-level high school students (and potentially their teachers)

Delivery mechanism:

  • two-week instructor-led summer course
  • potential incorporation with Articulate
  • self-directed, self-paced learning

Suggestions for curriculum development

For a two-week summer program, the first week could focus on training and skills development, and the second week could provide time to explore inquiry-based questions with varying levels of scaffolding.

To create basic introductory R materials, the existing introductory lessons, which use a clinical cancer dataset, could be streamlined by:

  • framing the example code as research questions
  • minimizing redundancy in examples
  • removing the second half of Class 2 (factors and creating data frames by hand)
  • adding very basic statistical testing (t-tests?) to complement data visualizations
  • explicitly incorporating aspects of computational thinking and the research lifecycle (and perhaps reproducibility and open science principles?)
  • including additional challenge exercises
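As an illustration of the very basic statistical testing suggested above, here is a minimal base R sketch that frames a t-test as a research question: do ages at diagnosis differ between patients recorded as alive and those recorded as dead? The column names `age_at_diagnosis` and `vital_status` are assumptions about the clinical dataset, the helper `compare_groups()` is hypothetical, and the toy data frame is invented purely for illustration (it is not real patient data).

```r
# Hypothetical helper: compare a numeric column between exactly two groups.
# Column names passed in are assumptions; check names(your_data) first.
compare_groups <- function(data, value_col, group_col) {
  # Drop rows where either the measurement or the group label is missing
  keep <- !is.na(data[[value_col]]) & !is.na(data[[group_col]])
  complete <- data[keep, , drop = FALSE]
  # Build a formula such as age_at_diagnosis ~ vital_status
  f <- reformulate(group_col, response = value_col)
  # Welch two-sample t-test (R's default, var.equal = FALSE)
  t.test(f, data = complete)
}

# Invented toy data for illustration only (not from the course dataset)
toy <- data.frame(
  age_at_diagnosis = c(50, 55, 61, 48, 70, 66, 72, 59),
  vital_status = c("Alive", "Alive", "Alive", "Alive",
                   "Dead", "Dead", "Dead", "Dead")
)

result <- compare_groups(toy, "age_at_diagnosis", "vital_status")
print(result)          # full test summary
print(result$p.value)  # p-value only
```

Welch's test is used here because equal group variances are rarely a safe assumption for clinical data. Note that `t.test()` raises an error if the grouping column does not have exactly two levels, which could itself become a teaching point about checking data before testing.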

The inquiry-based coding explorations in the second week could build on:

  • these prompts created for coding practice using COVID-19 data (see R prompts here)
  • these materials that include RNA-seq analysis, taught to interns in summer 2019

Class material

The original fredhutch.io materials include:

  • Class 1: R syntax, assigning objects, using functions
  • Class 2: Data types and structures; slicing and subsetting data
  • Class 3: Data manipulation with dplyr
  • Class 4: Data visualization in ggplot2

The data used for this course are from the National Cancer Institute's Genomic Data Commons. Please see Introduction to R from fredhutch.io for more information about how these data were compiled.

Please see the instructor's guide for information on teaching these materials and the contributing guide for assistance in developing or modifying these materials.

Required software: the software requirements for this course include:

This course is adapted from the following sources:

coding_for_cancer's People

Contributors

k8hertweck


coding_for_cancer's Issues

Examples and scaffolding

  • include mini-journal format example?
  • example "complete" project (depends on how much we frame around open science/reproducibility)

Motivation

  • shared culture: camaraderie and care about a community of practice
  • ability to answer important questions in biology (and beyond)

Career objectives

  • identify transferability of skills to both other areas of biology and other data-focused careers
  • relate to prior experience and future work

Data sharing/publishing

We've downloaded the TCGA gene expression data from GDC and converted it to a format appropriate for students to use, but we still need to decide how to make the data accessible to students. Options include:

  • Have students use a script to download using TCGAbiolinks: this is probably too complicated (though we could simplify by writing some custom functions for them?), and also requires them to install additional packages
  • Store in this GitHub repository: files are almost always too large
  • Create project on figshare, like the Data Carpentry Ecology dataset
  • Publish in another public repository?

I'm inclined to use figshare. Regardless, we need to determine what the data use policy is for TCGA data. They are publicly accessible, but are we allowed to publish them in a modified format elsewhere? What documentation/licensing needs to be included?

TCGA data access policies here
