This is the repository which contains the code that was used to generate the results and figures of the “Single-cell RNA-sequencing reveals widespread personalized, context-specific gene expression regulation in immune cells” paper (https://doi.org/10.1038/s41467-022-30893-5)

1M-cells

Data availability

Expression data is available in three flavours at https://eqtlgen.org/sc/datasets/1m-scbloodnl-dataset.html:

  • QC-ed and normalised (A)
  • QC-ed without normalisation (B)
  • pre-QC (C)

A. Normalized and QC-ed data

To use the normalized and QC-ed data, the following files are required (for the v2 samples):

  • 10x_v2_barcodes.tsv.gz
  • 10x_v2_SCT_features.tsv.gz
  • 10x_v2_SCT_matrix.mtx.gz

Place these three files in one folder and rename them to barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz. They can then be loaded into Seurat with the following command:

library(Seurat)
m1_processed_v2 <- Read10X('/dir/to/three/files/', gene.column = 1, cell.column = 1)

or in Scanpy using the original filenames:

import pandas as pd
import scanpy as sc

# load count data (stored as features x barcodes)
m1_processed_v2 = sc.read_mtx('/dir/to/three/files/10x_v2_SCT_matrix.mtx.gz')
# read barcodes (tab-separated, no header row)
m1_bc_v2 = pd.read_csv('/dir/to/three/files/10x_v2_barcodes.tsv.gz', sep='\t', header=None)
# read features (tab-separated, no header row)
m1_features_v2 = pd.read_csv('/dir/to/three/files/10x_v2_SCT_features.tsv.gz', sep='\t', header=None)
# transpose to scanpy format (cells x genes)
m1_processed_v2 = m1_processed_v2.T
# add barcodes and genes to obs and var
m1_processed_v2.obs['cell_id'] = m1_bc_v2[0].tolist()
m1_processed_v2.var['gene_name'] = m1_features_v2[0].tolist()
# set indices for obs and var
m1_processed_v2.obs.index = m1_processed_v2.obs['cell_id']
m1_processed_v2.var.index = m1_processed_v2.var['gene_name']

The procedure is the same for the v3 samples.
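The transpose in the Scanpy snippet above relies on the matrix file being stored as features x barcodes. A minimal sketch of a pre-load check follows, using only the Python standard library. The helper names (count_lines, mtx_shape, check_triplet) are hypothetical and not part of this repository, and the sketch assumes that the barcode and feature files have no header row.

```python
import gzip


def count_lines(path):
    """Count non-empty lines in a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.strip())


def mtx_shape(path):
    """Return (rows, columns, stored entries) from a Matrix Market size line."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip the banner, comment lines (start with '%') and blank lines.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols, entries = (int(x) for x in line.split())
            return rows, cols, entries
    raise ValueError(f"no size line found in {path}")


def check_triplet(matrix, features, barcodes):
    """True if the matrix has one row per feature and one column per barcode."""
    rows, cols, _ = mtx_shape(matrix)
    return rows == count_lines(features) and cols == count_lines(barcodes)
```

Calling check_triplet with the paths to 10x_v2_SCT_matrix.mtx.gz, 10x_v2_SCT_features.tsv.gz and 10x_v2_barcodes.tsv.gz should return True if the three files belong together. A False result suggests mismatched files or an unexpected orientation.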

B. Raw QC-ed data

To use the non-normalized counts, the following files are required (for the v2 samples):

  • 10x_v2_barcodes.tsv.gz
  • 10x_v2_RNA_features.tsv.gz
  • 10x_v2_RNA_matrix.mtx.gz

See the previous section for how to load these into Seurat or Scanpy. The procedure is the same for the v2 and v3 samples.
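For either flavour (A or B), the renaming step that the Seurat route expects can be sketched as a small copy helper. This is an illustration under stated assumptions, not code from this repository: the names GENERIC_NAMES, source_names and stage_for_read10x are hypothetical, and the v3 filenames are assumed to follow the same pattern as the v2 filenames listed above. The files are copied rather than renamed, so the original names stay available for the Scanpy route.

```python
import shutil
from pathlib import Path

# Generic filenames the files are renamed to for the Seurat route.
GENERIC_NAMES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")


def source_names(chemistry="v2", assay="SCT"):
    """Return the download filenames in the same order as GENERIC_NAMES.

    The v2 names are the ones listed in this README; the v3 names are an
    assumption based on the same pattern.
    """
    prefix = f"10x_{chemistry}"
    return (
        f"{prefix}_barcodes.tsv.gz",
        f"{prefix}_{assay}_features.tsv.gz",
        f"{prefix}_{assay}_matrix.mtx.gz",
    )


def stage_for_read10x(src_dir, dst_dir, chemistry="v2", assay="SCT"):
    """Copy one flavour into dst_dir under the generic filenames."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src, dst in zip(source_names(chemistry, assay), GENERIC_NAMES):
        shutil.copyfile(src_dir / src, dst_dir / dst)
    return dst_dir
```

With assay="SCT" the helper stages the normalized data (A), and with assay="RNA" the non-normalized counts (B). The resulting folder is what would be passed to Read10X.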

C. Pre-QC data

To use the pre-QC non-normalized counts, the following files are required (these are not split by 10x chemistry):

  • unfiltered_barcodes.tsv.gz
  • unfiltered_features_raw.tsv.gz
  • unfiltered_matrix_raw.mtx.gz

See section A for how to load these into Seurat or Scanpy.

Test data

A Seurat object for testing is supplied here: https://molgenis26.gcc.rug.nl/downloads/1m-scbloodnl/small-test-dataset/ It contains the v3 samples in the UT condition, as well as the SNP affecting RPS26 co-expression.

Processing overview

The code to generate the results is separated by the different steps taken to get from the raw data to the results. Languages and packages are listed below:

  • R >= 3.6.1
  • Seurat >= 3.1
  • Python 3.7.4
  • numpy 1.19.5
  • pandas 1.2.1
  • scipy 1.6.0
  • statsmodels 0.12.2
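To compare a local Python environment against the package versions listed above, a check along the following lines may help. It is a sketch, not part of this repository: LISTED and compare_versions are hypothetical names, and importlib.metadata requires Python 3.8 or newer, which is later than the Python 3.7.4 listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Python package versions as listed in this README.
LISTED = {
    "numpy": "1.19.5",
    "pandas": "1.2.1",
    "scipy": "1.6.0",
    "statsmodels": "0.12.2",
}


def compare_versions(listed=LISTED):
    """Map each package name to (listed version, installed version or None)."""
    report = {}
    for name, wanted in listed.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            installed = None
        report[name] = (wanted, installed)
    return report
```

Printing the result of compare_versions() shows, per package, the listed version next to the installed one, with None for packages that are missing.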

External tools used were:

If you want to rerun any of the analysis steps in R, consider using the Singularity image that was used for most of the analyses: https://github.com/royoelen/single-cell-container-server

Steps and their respective directories are the following:

License

The code in this repository is available under the 2-Clause BSD License: https://opensource.org/licenses/BSD-2-Clause

Hardware

Analyses were performed on a 2019 MacBook Pro (16 GB), on the Gearshift cluster (http://docs.gcc.rug.nl/gearshift/cluster/) or, specifically for dataset normalization via SCTransform, on the Peregrine cluster (https://wiki.hpc.rug.nl/peregrine/start).

Contributors

harmbrugge, royoelen
