
getting-started

How to get started with the single cell batches comparison

Motivation

The students of the Single Cell Analysis course at CSHL this year produced an excellent RNA-seq dataset of ~13,000 single cells, generated by 6 different pairs of students (the "batches"). A cursory analysis of the data, clustering with PhenoGraph, showed that the cells cluster by the group that performed the experiment rather than completely randomly.

(Figure: cshl-singlecell-fibroblasts)

We aim to compare multiple batch effect correction algorithms to see if we can remove the per-group effect.

Data and document access

  1. Fill out this form with your name, email, and GitHub username. After you fill that out, you should always have seamless access to the websites and services below. If that is not the case, email Olga (olga.botvinnik-youknowwhatgoeshere-czbiohub.org)
    1. Calendar events (link) - We meet Mondays at 9am PDT / 12pm EDT
    2. Google Drive documents (link)
    3. Amazon Web Services storage, aka "AWS S3". Everything will be stored in the publicly viewable s3://singlecell-batches bucket. If you would like write access, please contact Olga ([email protected])
    4. GitHub singlecell-batches organization (link)
    5. Open Science Framework "Single-cell RNA-Seq Batch Effect Algorithm Comparison" project (link)
    6. Authorea collaboratively written paper (link)
  2. Join the Slack Channel! singlecell-batches.slack.com invite link (expires 2017-08-26)

Meetings

We have a weekly meeting on Mondays 9am PDT / 12pm EDT (Link to the Google Hangouts Call). This is where we discuss batch effect correction algorithms, how to use them, data access, and analysis results. Hope to see you there!

Data Access

Single-cell Human BJ Fibroblasts produced by CSHL Single Cell Analysis students

The data from the CSHL course is hosted in the publicly viewable s3 bucket s3://singlecell-batches/fibroblasts/singlecell-cshl/. You can view the contents of this bucket using the Amazon Web Services Command Line Interface (awscli):

$ aws s3 ls --human-readable s3://singlecell-batches/fibroblasts/singlecell-cshl/

That command should output:

                           PRE HiSeq_CB1CDANXX/
                           PRE allGroups/
2017-08-15 16:27:11  117.5 MiB Group1.matrix.txt
2017-08-15 16:27:11  117.5 MiB Group2.matrix.txt
2017-08-15 16:27:11  581.2 MiB Group3.matrix.txt
2017-08-15 16:27:11  465.2 MiB Group4.matrix.txt
2017-08-15 16:27:11  175.5 MiB Group5.matrix.txt
2017-08-15 16:27:14   94.3 MiB Group8.matrix.txt

To copy these files, you can use the aws s3 cp command with the --exclude and --include flags, which accept wildcards. The command below recursively copies all files ending in .txt to your current directory (about 1.5 GB). Here is an example of running the command and its output:

$ aws s3 cp --recursive --exclude "*" --include "*.txt" s3://singlecell-batches/fibroblasts/singlecell-cshl/ .
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/Group1.matrix.txt to ./Group1.matrix.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/Group2.matrix.txt to ./Group2.matrix.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/allGroups/allGroups.markers.corrSP.pg.txt to allGroups/allGroups.markers.corrSP.pg.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/allGroups/allGroups.markers.corrSP.tsne.txt to allGroups/allGroups.markers.corrSP.tsne.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/allGroups/allGroups.markers.txt to allGroups/allGroups.markers.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/allGroups/allGroups.matrix.hist.txt to allGroups/allGroups.matrix.hist.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/allGroups/allGroups.samples.txt to allGroups/allGroups.samples.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/Group5.matrix.txt to ./Group5.matrix.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/Group8.matrix.txt to ./Group8.matrix.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/Group4.matrix.txt to ./Group4.matrix.txt
download: s3://singlecell-batches/fibroblasts/singlecell-cshl/Group3.matrix.txt to ./Group3.matrix.txt
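Once the download finishes, a quick sanity check on the matrices can catch truncated transfers before you start analyzing. The loop below is a minimal sketch: it assumes the Group*.matrix.txt files are plain tab-delimited text with a single header line, which you should verify against your own copy.

for matrix in Group*.matrix.txt; do
    # Count data rows, skipping the assumed header line
    rows=$(tail -n +2 "$matrix" | wc -l)
    # Count columns in the first line, assuming tab-delimited fields
    cols=$(head -n 1 "$matrix" | awk -F'\t' '{print NF}')
    echo "$matrix: $rows rows x $cols columns"
done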

Publicly available bulk fibroblast sequencing

Thanks to @yunfangjuan, we also have a cloud RNA-seq pipeline run on all publicly available bulk human fibroblast datasets. They are located here:

$ aws s3 ls --human-readable s3://singlecell-batches/fibroblasts/public-bulk/

Currently, the ~6 terabytes of files are still transferring, and the above aws s3 ls command outputs this:

                           PRE 200024565/
                           PRE 200026284/
                           PRE 200030567/

Eventually, it will output all the files:

                           PRE 200024565/
                           PRE 200026284/
                           PRE 200030567/
                           PRE 200038265/
                           PRE 200055169/
                           PRE 200056293/
                           PRE 200057049/
                           PRE 200061656/
                           PRE 200062776/
                           PRE 200063577/
                           PRE 200063734/
                           PRE 200063738/
                           PRE 200068327/

If you want to look at all the files and their subdirectories, use:

$ aws s3 ls --human-readable --recursive s3://singlecell-batches/fibroblasts/public-bulk/

To copy ALL the files (not recommended, as this is 5.2 terabytes), do:

$ aws s3 cp --region us-east-2 s3://singlecell-batches/fibroblasts/public-bulk/ . --recursive

To copy only the text files (recommended), use the --exclude "*" flag to exclude all files, and then selectively re-include the *.txt files from htseq, the *.csv files, the *.out files from STAR, and the *family.soft.gz files from SRA:

$ aws s3 cp --region us-east-2 s3://singlecell-batches/fibroblasts/public-bulk/ . --recursive --exclude "*" --include "*.txt" --include "*.csv" --include "*.out" --include "*family.soft.gz"

Git tutorials/Introductions

Downloading folders (aka "cloning repositories")

  1. Create a folder called singlecell-batches. I keep all my GitHub repositories in a folder called ~/code, so mine is located at ~/code/singlecell-batches. For me, the commands are:
    cd ~/code
    mkdir singlecell-batches
    cd singlecell-batches

  2. To make a copy of a folder, use the command git clone, which copies the "folder and its entire history of changed files" (the "repository"). Let's say you want to download the folder for the ComBat batch effect correction algorithm. Then you would do:
    git clone https://github.com/singlecell-batches/combat

    This creates a folder called combat on your computer (see the sketch below if you want to clone several repositories at once).
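If you want to grab several of the organization's repositories at once, a simple loop over repository names works. This is only a sketch: the repository names below are examples, so check the singlecell-batches GitHub organization for the actual list.

cd ~/code/singlecell-batches
# Repository names are examples only; see github.com/singlecell-batches for the full list
for repo in getting-started combat; do
    git clone "https://github.com/singlecell-batches/$repo"
done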

Adding an algorithm

To add an algorithm, use the algorithm template here: https://github.com/singlecell-batches/cookiecutter-reproducible-science, which contains instructions for how to use it and how to upload the resulting folders to GitHub.
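As a rough sketch of what that workflow could look like (the template's README is the authoritative guide; the commands below only assume the standard cookiecutter command-line usage):

# Install the cookiecutter tool if you don't already have it
pip install cookiecutter

# Generate a new algorithm folder from the template;
# cookiecutter prompts interactively for the project details
cookiecutter https://github.com/singlecell-batches/cookiecutter-reproducible-science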

Getting and Giving Help

We are committed to offering a pleasant setup experience for our learners and organizers. If you find bugs in our instructions, or would like to suggest improvements, please file an issue or email us.
