
2020_wilk_covid's Introduction

2020_Wilk_COVID

Reproducibility repository accompanying Wilk, Rustagi, Zhao, et al. "A single-cell atlas of the peripheral immune response in patients with severe COVID-19" Nature Medicine (2020).

You will find R and Python scripts to download data and reproduce the analyses in the "code" folder.

Processed count matrices with de-identified metadata and embeddings are available for download from the Covid-19 Cell Atlas (https://www.covid19cellatlas.org/#wilk20) hosted by the Wellcome Sanger Institute. Processed data is also available for viewing and exploration on the publicly accessible cellxgene platform by the Chan Zuckerberg Initiative at https://cellxgene.cziscience.com/d/Single_cell_atlas_of_peripheral_immune_response_to_SARS_CoV_2_infection-25.cxg/. Raw sequencing data are available at the NCBI Gene Expression Omnibus (accession number GSE150728). Requests for additional materials can be made via email to the corresponding authors: Catherine A. Blish ([email protected]) or Angela J. Rogers ([email protected]).

BAM files are available here: https://drive.google.com/drive/folders/1qf62ip8WorEV-KLf_WSi8AMyyMoNdfln?usp=sharing

This repository is a work-in-progress and will be updated frequently!

2020_wilk_covid's Issues

Confused about cell annotation in "covid_analysis_markdown.Rmd"

Thank you for providing the code for everyone to learn from. I have a small question. You assigned cell types as shown below, but cluster 23 and cluster 21 have the same annotation ("IgA PB"), and cluster 21 belongs to covid_B.idents while cluster 23 does not. Why? Also, I don't understand why clusters 24 and 27 are included in covid_B.idents.
Looking forward to your reply.

covid_myeloid.idents <- c("3", "6", "7", "8", "10", "20", "24", "25", "26", "27", "28")
covid_B.idents <- c("5", "9", "16", "18", "21", "24", "27", "29")
covid_fine.idents <- c("NK", "CD8m T", "CD4m T", "CD14 Monocyte", "CD4n T", "B", "CD14 Monocyte", "CD14 Monocyte", "CD14 Monocyte", "IgM PB", "CD16 Monocyte", "NK", "RBC", "CD8eff T", "RBC", "CD8m T", "IgG PB", "Platelet", "IgG PB", "CD4 T", "DC", "IgA PB", "gd T", "IgA PB", "SC & Eosinophil", "Neutrophil", "pDC", "Developing Neutrophil", "CD16 Monocyte", "IgA PB")
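
The three vectors above can be cross-referenced with a short sketch. It assumes that covid_fine.idents is ordered by Seurat cluster number starting at 0 (Seurat's default numbering), which the issue does not state explicitly. Python is used only for illustration, and the `describe` helper is hypothetical.

```python
# Assumption: labels are listed in cluster order, and the first label
# belongs to cluster 0 (Seurat numbers clusters from zero).
covid_myeloid_idents = ["3", "6", "7", "8", "10", "20", "24", "25", "26", "27", "28"]
covid_b_idents = ["5", "9", "16", "18", "21", "24", "27", "29"]
covid_fine_idents = [
    "NK", "CD8m T", "CD4m T", "CD14 Monocyte", "CD4n T", "B",
    "CD14 Monocyte", "CD14 Monocyte", "CD14 Monocyte", "IgM PB",
    "CD16 Monocyte", "NK", "RBC", "CD8eff T", "RBC", "CD8m T",
    "IgG PB", "Platelet", "IgG PB", "CD4 T", "DC", "IgA PB", "gd T",
    "IgA PB", "SC & Eosinophil", "Neutrophil", "pDC",
    "Developing Neutrophil", "CD16 Monocyte", "IgA PB",
]

# Map each cluster number (as a string) to its fine annotation.
label_by_cluster = {str(i): label for i, label in enumerate(covid_fine_idents)}


def describe(cluster):
    """Return the label and subset membership for one cluster."""
    return {
        "label": label_by_cluster[cluster],
        "in_B": cluster in covid_b_idents,
        "in_myeloid": cluster in covid_myeloid_idents,
    }


# Print the clusters mentioned in the question.
for cluster in ["21", "23", "24", "27", "29"]:
    print(cluster, describe(cluster))
```

Under this ordering assumption, clusters 21, 23 and 29 all carry the label "IgA PB" but only 21 and 29 appear in covid_B.idents, while clusters 24 and 27 appear in both the myeloid and the B-cell subsets.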

adata object related to the plasma cell to neutrophil transition for dynamo analysis

Hi Aaron @ajwilk, I am really interested in the possible plasma cell to neutrophil transition proposed in your paper. It seems controversial (I followed the exchange of papers between you, Jose, and Joseph), so I wonder whether dynamo (https://github.com/aristoteleo/dynamo-release), the computational framework I developed, can provide some further validation of your conclusion. I am personally a believer in this kind of cell-fate transition in general.

Would you mind sharing the adata object (saved in h5ad format) that can be used to perform this analysis? An adata object that includes the intron/exon raw UMI counts for all cells, your cell type annotation, and the original UMAP embedding for the full dataset would be ideal. The UMAP embedding for the subset of plasma cells and neutrophils used for the transition analysis would be helpful too.

Cell attribute "log_umi" contains NA, NaN, or infinite value

Hello, I have been trying to run your code but have encountered many problems, such as the one in issue #10, so I tried using the standard Seurat workflow for the first part (plus a little quality control). However, I run into a problem when I call SCTransform:

Error in make_cell_attr(umi, cell_attr, latent_var, batch_var, latent_var_nonreg, :
cell attribute "log_umi" contains NA, NaN, or infinite value

Here is my code:

library(Seurat)
library(tidyverse)
library(ggplot2)
library(gridExtra)
library(harmony)
library(future.apply)
library(cowplot)
library(patchwork)
library("DESeq2")
library(sctransform)
library(EpicTools)
library(grr)
library(Matrix)
library(factoextra)
library(ComplexHeatmap)
library(circlize)
library(ggpubr)
library(data.table)
library(RColorBrewer)
#library(rowr)
library(SingleR)
library(scater)
#library(nichenetr)
library(future)
library(future.apply)
# File names
file_list <- c(
  "GSM4557327_555_1_cell.counts.matrices.rds", "GSM4557328_555_2_cell.counts.matrices.rds", "GSM4557329_556_cell.counts.matrices.rds", "GSM4557330_557_cell.counts.matrices.rds", "GSM4557331_558_cell.counts.matrices.rds", "GSM4557332_559_cell.counts.matrices.rds", 
  "GSM4557333_561_cell.counts.matrices.rds",
  "GSM4557334_HIP002_cell.counts.matrices.rds",
  "GSM4557335_HIP015_cell.counts.matrices.rds",
  "GSM4557336_HIP023_cell.counts.matrices.rds",
  "GSM4557337_HIP043_cell.counts.matrices.rds",
  "GSM4557338_HIP044_cell.counts.matrices.rds",
  "GSM4557339_HIP045_cell.counts.matrices.rds"
)

# Destination directory
dest_dir <- "/Users/rodrigohermoza/Desktop/UTEC/2024-0/PoliSia/COVID/GSE150728_RAW"

gc()
# List to store the data (parallel read)
data_list <- future_lapply(file.path(dest_dir, file_list), readRDS)
gc()
# You can access the datasets using data_list[[1]], data_list[[2]], etc.

We get the file names


file_list_cleaned <- gsub("_cell.counts.matrices.rds", "", file_list)
for (x in c(1:13)) {
  name<-file_list_cleaned[x]
  assign(name, CreateSeuratObject( counts = data_list[[x]], min.cells = 10))
}
merged <- merge(
  `GSM4557327_555_1`, 
  y = list(
    `GSM4557328_555_2`, 
    `GSM4557329_556`, 
    `GSM4557330_557`, 
    `GSM4557331_558`, 
    `GSM4557332_559`,
    `GSM4557333_561`,
    `GSM4557334_HIP002`,
    `GSM4557335_HIP015`,
    `GSM4557336_HIP023`,
    `GSM4557337_HIP043`,
    `GSM4557338_HIP044`,
    `GSM4557339_HIP045`
  ),
  add.cell.ids = c(
    "C1A", 
    "C1B", 
    "C2", 
    "C3", 
    "C4", 
    "C5",
    "C7",
    "H1",
    "H2",
    "H3",
    "H4",
    "H5",
    "H6"
  ),
  project = "COVID"
)
rm(`GSM4557328_555_2`, 
    `GSM4557329_556`, 
    `GSM4557330_557`, 
    `GSM4557331_558`, 
    `GSM4557332_559`,
    `GSM4557333_561`,
    `GSM4557334_HIP002`,
    `GSM4557335_HIP015`,
    `GSM4557336_HIP023`,
    `GSM4557337_HIP043`,
    `GSM4557338_HIP044`,
    `GSM4557339_HIP045`,
   `GSM4557327_555_1`)

merged$Sample <- rownames(merged@meta.data)
merged@meta.data <- separate(merged@meta.data, col = "Sample", into = c("Patient", "Barcode"), sep = "_")
merged <- PercentageFeatureSet(merged, pattern = "^MT-", col.name = "percent.mt")
merged <- subset(merged, subset = nFeature_RNA > 100 & percent.mt <25 & nCount_RNA > 50)
list1 <- SplitObject(object = merged, split.by = "Patient")
# Loop over the split objects (list1), not the base function `list`
for (i in seq_along(list1)) {
  list1[[i]] <- SCTransform(list1[[i]], verbose = FALSE)
}
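
One situation known to produce this message is a cell whose total UMI count is zero, because the log10 of zero is not finite. Whether that is the cause here is not established by the issue. The sketch below only illustrates the check on toy counts, using hypothetical helper names and cell names rather than Seurat objects.

```python
import math


def log10_or_neg_inf(total):
    """Mimic R's log10(): zero maps to -inf instead of raising an error."""
    return math.log10(total) if total > 0 else float("-inf")


def find_nonfinite_log_umi(counts_by_cell):
    """Return names of cells whose log10 total UMI count is not finite.

    counts_by_cell maps a cell name to its list of per-gene UMI counts.
    """
    bad = []
    for cell, counts in counts_by_cell.items():
        if not math.isfinite(log10_or_neg_inf(sum(counts))):
            bad.append(cell)
    return bad


# Toy data (hypothetical cell names): the second cell has no counts at all.
toy = {
    "C1A_AAAC": [3, 0, 7],
    "C1A_AAAG": [0, 0, 0],
    "H1_AAAT": [1, 1, 0],
}
print(find_nonfinite_log_umi(toy))
```

If such cells exist in a real object, removing them before normalization is one option to try; the thresholds to use are a judgment call and are not specified in the issue.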

Error: Cannot find 'cell.type.fine' in this Seurat object

Dear Aaron,
Could I have your help?
When I try to run the code in "generating_Seurat_object.Rmd", I run into some trouble.
1. In your code, you removed cluster 3 and cluster 12 because they are low-quality clusters with only poor-quality genes. When the code runs on my computer, the two bad clusters are cluster 2 and cluster 13.
I can't work out what causes the difference.
So I have to remove cluster 2 and cluster 13, which makes the cell clustering, DE genes, and cell annotation results differ from those you provide in Supplementary Table 2.

  2. At line 254:
    DotPlot(Seurat:::subset.Seurat(covid_combined.nc, idents = c(covid_myeloid.idents)), features = unique(c("CD14", "LYZ", "FCGR3B", "FCGR3A", "CLC", "ELANE", "LTF", "MPO", "CTSG", "IDO1", "FCER1A", "FLT3", "IL3RA", "NRP1", "MME", "CD22", "FCER2", "CD44", "CD27", "SDC1", "CD4", "CD8A", "ITGAL", "SELL", "GZMB", "CD3E", "CD3G", "CD3D")), group.by = "cell.type.fine") + ggpubr::rotate_x_text()
    Error: Cannot find 'cell.type.fine' in this Seurat object.
    I don't know where 'cell.type.fine' is generated, because I can't find it in the lines above.

3. I can't use the function Seurat:::subset.Seurat. Has it been replaced by the function "subset()" of the Seurat package?

Warning when running EpicPreHS in file "generating_Seurat_object.Rmd"

Hi,
When I run the script, a warning appears at this line:
##pre-processing
cm.pp <- mapply(EpicPreHS, cm.files, orig.ident = names(cm.files), SIMPLIFY = F)

warning:
In (function (cm_name, min.counts = 500, max.counts = 15000, ... :
The resulting cell names will be strange if the input object name does not end with '.cm'

Would you mind clarifying if this warning will affect the results?

Can you please provide bam files?

Hello,
I really appreciate you creating this repository and sharing the code for the data analysis in your wonderful paper. I was wondering whether it would be possible for you to share the aligned reads as BAM files. I am trying to re-analyse the data from a different perspective, and having the BAM files to start with would be really helpful.
So, is it possible for you to share the BAM files generated by the https://github.com/ajwilk/2020_Wilk_COVID/blob/master/code/sample_alignment.sh script?
I would really appreciate your help.
Thank you.

Problem in filtering markers.

covid_combined.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
gives the following error:

Error: Problem with filter() input ..1.
i Input ..1 is top_n_rank(10, avg_logFC).
x object 'avg_logFC' not found
i The error occurred in group 1: cluster = 0.

Should the wt argument of top_n be "wt = avg_logFC" or "wt = avg_log2FC"?
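
Seurat 4 and later name this column avg_log2FC, whereas earlier releases used avg_logFC, so which name works depends on the installed version. The sketch below shows the idea of selecting whichever column is present before ranking. It uses plain Python dictionaries, made-up marker values, and hypothetical helper names as an illustration; it is not the repository's R code.

```python
from collections import defaultdict


def fold_change_column(rows):
    """Return the fold-change column name present in the marker rows."""
    for name in ("avg_log2FC", "avg_logFC"):
        if rows and name in rows[0]:
            return name
    raise KeyError("no avg_log2FC or avg_logFC column found")


def top_markers(rows, n=10):
    """Group rows by cluster and keep the n rows with the largest fold change."""
    column = fold_change_column(rows)
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)
    return {
        cluster: sorted(group, key=lambda r: r[column], reverse=True)[:n]
        for cluster, group in by_cluster.items()
    }


# Toy marker table with made-up genes and values.
markers = [
    {"cluster": "0", "gene": "A", "avg_log2FC": 1.2},
    {"cluster": "0", "gene": "B", "avg_log2FC": 2.5},
    {"cluster": "1", "gene": "C", "avg_log2FC": 0.7},
]
top = top_markers(markers, n=1)
print(top["0"][0]["gene"])
```

Unlike dplyr's top_n, this sketch does not keep extra rows when values are tied.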

Error in crapmarkers$avg_logFC fxn

When I run the command

crap2.markers[order(-crap2.markers$avg_logFC),]

I got this error:
Error in -crap.markers$avg_logFC : invalid argument to unary operator

Does anyone know how to solve this problem?

About scVelo

Hello,
I'm trying to reproduce the results from your paper.
The code:

import scvelo as scv

# adata is assumed to be an AnnData object that is already loaded
scv.pp.filter_and_normalize(adata, min_shared_counts=10, n_top_genes=3000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=20)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata,mode='dynamical')

print(adata.var.velocity_genes.sum())

scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding_stream(adata, basis='umap',color='seurat_clusters')
scv.pl.velocity_embedding_stream(adata, basis='umap',color='cell.type.fine')
scv.pl.velocity_embedding(adata, basis='umap',color='cell.type.fine',arrow_length=5)
scv.pl.velocity_embedding_grid(adata, basis='umap',color='seurat_clusters',arrow_length=5)


#####latent time
#scv.tl.terminal_states(adata)
scv.tl.latent_time(adata)
scv.pl.scatter(adata, color='latent_time',color_map='gnuplot', size=80)
scv.pl.scatter(adata, color=[ 'root_cells', 'end_points'])

The resulting arrow plots look different from those in the paper.
Could you tell me the parameters you used, or share the ipynb file? Thank you for your help.

mergeCM - Unsupported type for sort

Hi @ajwilk
I created count matrices from the raw reads and used 13 matrix files for the analysis. I am getting an error in the mergeCM function.
Code:-
covid_combined.emat <- mergeCM(cm.pp, type = "emat")
Error:-
Error in grr::matches(by.x, by.y, all.x, all.y, nomatch = NA)
Unsupported type for sort.

How can I resolve this? Thanks in advance.

Error In Adding Metadata

rownames(metadata_combined) <- rownames([email protected])
This line gives the following error:

Warning: Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

This happens because the metadata file you provide ("https://raw.githubusercontent.com/ajwilk/2020_Wilk_COVID/master/code/COVID-19_metadata_repo.csv") has different "orig.ident" names than the Seurat object that is created.

The problem is solved simply by changing the "orig.ident" names in the metadata file (e.g. from "covid_556" to "556").
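
This renaming can be scripted rather than done by hand. The following is a minimal sketch that assumes the metadata CSV has an "orig.ident" column and that the only mismatch is a leading "covid_" prefix, as in the example given. The `strip_prefix` helper and the `Status` column in the toy data are hypothetical.

```python
import csv
import io


def strip_prefix(rows, column="orig.ident", prefix="covid_"):
    """Return copies of the rows with the prefix removed from one column."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input rows stay unchanged
        if row[column].startswith(prefix):
            row[column] = row[column][len(prefix):]
        fixed.append(row)
    return fixed


# Toy stand-in for the metadata file; real code would open the CSV on disk.
text = "orig.ident,Status\ncovid_556,COVID\ncovid_557,COVID\n"
rows = list(csv.DictReader(io.StringIO(text)))
fixed = strip_prefix(rows)
print([row["orig.ident"] for row in fixed])
```

Identifiers without the prefix are left untouched, so it is worth comparing the resulting names against the "orig.ident" values in the Seurat object before assigning row names.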

Error running code in file "generating_Seurat_object.Rmd"

Dear Aaron,

I am running into an error while trying to execute your code in the document "generating_Seurat_object.Rmd".

In the Rmd code chunk called 'merge', when I execute that chunk (previous chunks are fine), I run into the error below when executing:

covid_combined.emat <- mergeCM(cm.pp, type = "emat")
Could this be an error associated with one of the functions in EpicTools? I have looked at the source code of the mergeCM() function from EpicTools but could not find where this error comes from.

Error in grr::matches(by.x, by.y, all.x, all.y, nomatch = NA) :
Unsupported type for sort.

Could you help me figure out how to proceed from here?
Much appreciated,

Kind regards,
Marc
P.S. I first updated my clone of the repo with a pull from the GitHub source; this did not solve the issue.
