2020_Wilk_COVID

Reproducibility repository accompanying Wilk, Rustagi, Zhao, et al. "A single-cell atlas of the peripheral immune response in patients with severe COVID-19" Nature Medicine (2020).

You will find R and Python scripts to download the data and reproduce the analyses in the "code" folder.

Processed count matrices with de-identified metadata and embeddings are available for download from the Covid-19 Cell Atlas (https://www.covid19cellatlas.org/#wilk20) hosted by the Wellcome Sanger Institute. Processed data is also available for viewing and exploration on the publicly accessible cellxgene platform by the Chan Zuckerberg Initiative at https://cellxgene.cziscience.com/d/Single_cell_atlas_of_peripheral_immune_response_to_SARS_CoV_2_infection-25.cxg/. Raw sequencing data are available at the NCBI Gene Expression Omnibus (accession number GSE150728). Requests for additional materials can be made via email to the corresponding authors: Catherine A. Blish ([email protected]) or Angela J. Rogers ([email protected]).

BAM files are available here: https://drive.google.com/drive/folders/1qf62ip8WorEV-KLf_WSi8AMyyMoNdfln?usp=sharing

This repository is a work-in-progress and will be updated frequently!

2020_wilk_covid's Issues

Where does "EpicPreHS" come from? It is not explained in the script

pre-processing

cm.pp <- mapply(EpicPreHS, cm.files, orig.ident = names(cm.files), SIMPLIFY = F)

covid_560 is missing from the count matrix RDS files on GEO

The entry states that 8 patient samples and 6 control samples were run, but only 7 covid samples have GSM entries, and the raw data reflects this. Discovered while matching covariate CSV to data...

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150728

Please fix?

Confused about cell annotation in "covid_analysis_markdown.Rmd"

Thank you for providing the code for everyone to learn from. I have a small question. You assign cell types as shown below, but clusters 21 and 23 have the same annotation ("IgA PB"), and cluster 21 belongs to covid_B.idents while cluster 23 does not. Why? In addition, I don't understand why clusters 24 and 27 are included in covid_B.idents, since they also appear in covid_myeloid.idents.
Looking forward to your reply.

covid_myeloid.idents <- c("3", "6", "7", "8", "10", "20", "24", "25", "26", "27", "28")
covid_B.idents <- c("5", "9", "16", "18", "21", "24", "27", "29")
covid_fine.idents <- c("NK", "CD8m T", "CD4m T", "CD14 Monocyte", "CD4n T", "B", "CD14 Monocyte", "CD14 Monocyte", "CD14 Monocyte", "IgM PB", "CD16 Monocyte", "NK", "RBC", "CD8eff T", "RBC", "CD8m T", "IgG PB", "Platelet", "IgG PB", "CD4 T", "DC", "IgA PB", "gd T", "IgA PB", "SC & Eosinophil", "Neutrophil", "pDC", "Developing Neutrophil", "CD16 Monocyte", "IgA PB")
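One way to check these assignments is to line the labels up with their cluster numbers. The sketch below uses base R only. It assumes that clusters are numbered from 0 and that covid_fine.idents is ordered by cluster number (the usual Seurat convention, but an assumption here). The name labels_by_cluster is illustrative and does not come from the repository.

```r
covid_myeloid.idents <- c("3", "6", "7", "8", "10", "20", "24", "25", "26", "27", "28")
covid_B.idents <- c("5", "9", "16", "18", "21", "24", "27", "29")
covid_fine.idents <- c("NK", "CD8m T", "CD4m T", "CD14 Monocyte", "CD4n T", "B",
                       "CD14 Monocyte", "CD14 Monocyte", "CD14 Monocyte", "IgM PB",
                       "CD16 Monocyte", "NK", "RBC", "CD8eff T", "RBC", "CD8m T",
                       "IgG PB", "Platelet", "IgG PB", "CD4 T", "DC", "IgA PB",
                       "gd T", "IgA PB", "SC & Eosinophil", "Neutrophil", "pDC",
                       "Developing Neutrophil", "CD16 Monocyte", "IgA PB")

# Assumption: the i-th label belongs to cluster i - 1 (0-based cluster ids).
labels_by_cluster <- setNames(covid_fine.idents,
                              as.character(seq_along(covid_fine.idents) - 1))

# Labels of the clusters listed in covid_B.idents.
print(labels_by_cluster[covid_B.idents])

# Clusters labelled "IgA PB", and whether each one is in covid_B.idents.
iga_clusters <- names(labels_by_cluster)[labels_by_cluster == "IgA PB"]
iga_in_B <- iga_clusters %in% covid_B.idents
print(setNames(iga_in_B, iga_clusters))

# Clusters that appear in both the myeloid and the B subsets.
shared_clusters <- intersect(covid_myeloid.idents, covid_B.idents)
print(labels_by_cluster[shared_clusters])
```

Under that assumption, clusters 21, 23, and 29 all carry the "IgA PB" label but only 21 and 29 are listed in covid_B.idents, and clusters 24 and 27 are listed in both subsets, which is the discrepancy the question describes.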

adata object related to the plasma cell to neutrophil transition for dynamo analysis

Hi Aaron @ajwilk, I am really interested in the possible plasma cell to neutrophil transition proposed in your paper. Although it seems controversial (I followed the exchange of papers between you and Jose and Joseph), I wonder whether dynamo (https://github.com/aristoteleo/dynamo-release), the computational framework I developed, could provide some further validation for your conclusion. I am personally a believer in this kind of cell-fate transition in general.

Would you mind sharing the adata object (saved in h5ad format) that can be used to perform this analysis? An adata object that includes the intron/exon raw UMI counts for all cells, your cell type annotation, and the original UMAP embedding for the full dataset would be ideal. The UMAP embedding for the subset of plasma cells and neutrophils used for the transition analysis would be helpful too.

Cell attribute "log_umi" contains NA, NaN, or infinite value

Hello, I have been trying to run your code but have encountered many problems, like the one in issue #10, so I tried to use the standard Seurat workflow for the first part (with a little quality control). However, I run into a problem when I try to use SCTransform:

Error in make_cell_attr(umi, cell_attr, latent_var, batch_var, latent_var_nonreg, :
cell attribute "log_umi" contains NA, NaN, or infinite value

Here is my code:

library(Seurat)
library(tidyverse)
library(ggplot2)
library(gridExtra)
library(harmony)
library(future.apply)
library(cowplot)
library(patchwork)
library("DESeq2")
library(sctransform)
library(EpicTools)
library(grr)
library(Matrix)
library(factoextra)
library(ComplexHeatmap)
library(circlize)
library(ggpubr)
library(data.table)
library(RColorBrewer)
#library(rowr)
library(SingleR)
library(scater)
#library(nichenetr)
library(future)
library(future.apply)

# File names
file_list <- c(
  "GSM4557327_555_1_cell.counts.matrices.rds",
  "GSM4557328_555_2_cell.counts.matrices.rds",
  "GSM4557329_556_cell.counts.matrices.rds",
  "GSM4557330_557_cell.counts.matrices.rds",
  "GSM4557331_558_cell.counts.matrices.rds",
  "GSM4557332_559_cell.counts.matrices.rds",
  "GSM4557333_561_cell.counts.matrices.rds",
  "GSM4557334_HIP002_cell.counts.matrices.rds",
  "GSM4557335_HIP015_cell.counts.matrices.rds",
  "GSM4557336_HIP023_cell.counts.matrices.rds",
  "GSM4557337_HIP043_cell.counts.matrices.rds",
  "GSM4557338_HIP044_cell.counts.matrices.rds",
  "GSM4557339_HIP045_cell.counts.matrices.rds"
)

# Destination directory
dest_dir <- "/Users/rodrigohermoza/Desktop/UTEC/2024-0/PoliSia/COVID/GSE150728_RAW"

gc()
# List to store the data (parallel read)
data_list <- future_lapply(file.path(dest_dir, file_list), readRDS)
gc()
# The datasets can be accessed with data_list[[1]], data_list[[2]], etc.

We get the names of the files


file_list_cleaned <- gsub("_cell.counts.matrices.rds", "", file_list)

for (x in c(1:13)) {
  name<-file_list_cleaned[x]
  assign(name, CreateSeuratObject( counts = data_list[[x]], min.cells = 10))
}

merged <- merge(
  `GSM4557327_555_1`, 
  y = list(
    `GSM4557328_555_2`, 
    `GSM4557329_556`, 
    `GSM4557330_557`, 
    `GSM4557331_558`, 
    `GSM4557332_559`,
    `GSM4557333_561`,
    `GSM4557334_HIP002`,
    `GSM4557335_HIP015`,
    `GSM4557336_HIP023`,
    `GSM4557337_HIP043`,
    `GSM4557338_HIP044`,
    `GSM4557339_HIP045`
  ),
  add.cell.ids = c(
    "C1A", 
    "C1B", 
    "C2", 
    "C3", 
    "C4", 
    "C5",
    "C7",
    "H1",
    "H2",
    "H3",
    "H4",
    "H5",
    "H6"
  ),
  project = "COVID"
)

rm(`GSM4557328_555_2`, 
    `GSM4557329_556`, 
    `GSM4557330_557`, 
    `GSM4557331_558`, 
    `GSM4557332_559`,
    `GSM4557333_561`,
    `GSM4557334_HIP002`,
    `GSM4557335_HIP015`,
    `GSM4557336_HIP023`,
    `GSM4557337_HIP043`,
    `GSM4557338_HIP044`,
    `GSM4557339_HIP045`,
   `GSM4557327_555_1`)

merged$Sample <- rownames(merged@meta.data)

merged@meta.data <- separate(merged@meta.data, col = "Sample", into = c("Patient", "Barcode"), sep = "_")

merged <- PercentageFeatureSet(merged, pattern = "^MT-", col.name = "percent.mt")

merged <- subset(merged, subset = nFeature_RNA > 100 & percent.mt <25 & nCount_RNA > 50)

list1 <- SplitObject(object = merged, split.by = "Patient")
# Loop over list1 (the split object), not over the base function `list`
for (i in seq_along(list1)) {
  list1[[i]] <- SCTransform(list1[[i]], verbose = F)
}

Error: Cannot find 'cell.type.fine' in this Seurat object

Dear Aaron,
Could I ask for your help?
When I try to run the code in "generating_Seurat_object.Rmd", I run into some problems.
1. In your code, you removed clusters 3 and 12 because they are low-quality clusters marked only by poor-quality genes. When I run the code on my computer, the two bad clusters are clusters 2 and 13.

I can't figure out what causes the difference.
So I have to remove clusters 2 and 13, which makes the cell clustering, DE genes, and cell annotation results differ from those you provide in Supplementary Table 2.

2. At line 254:
DotPlot(Seurat:::subset.Seurat(covid_combined.nc, idents = c(covid_myeloid.idents)), features = unique(c("CD14", "LYZ", "FCGR3B", "FCGR3A", "CLC", "ELANE", "LTF", "MPO", "CTSG", "IDO1", "FCER1A", "FLT3", "IL3RA", "NRP1", "MME", "CD22", "FCER2", "CD44", "CD27", "SDC1", "CD4", "CD8A", "ITGAL", "SELL", "GZMB", "CD3E", "CD3G", "CD3D")), group.by = "cell.type.fine") + ggpubr::rotate_x_text()
Error: Cannot find 'cell.type.fine' in this Seurat object.
I don't know where 'cell.type.fine' is generated, because I can't find it in the lines above.

3. I can't use the function Seurat:::subset.Seurat. Has it been replaced by the subset() function of the Seurat package?

Codes for alignments and counts

Hi ajwilk,

Would you mind adding a file containing the code that you used for alignment and counting?

Thanks

Warning when running EpicPreHS in file "generating_Seurat_object.Rmd"

Hi,
when I run the script, a warning pops up when I run this line:
##pre-processing
cm.pp <- mapply(EpicPreHS, cm.files, orig.ident = names(cm.files), SIMPLIFY = F)

warning:
In (function (cm_name, min.counts = 500, max.counts = 15000, ... :
The resulting cell names will be strange if the input object name does not end with '.cm'

Would you mind clarifying if this warning will affect the results?

How to create an AnnData object from .mtx and .csv files

I'm very interested in the RNA velocity analysis in your article. Can you provide the code for creating the AnnData object from the .mtx and .csv files? Thank you very much!

Can you please provide bam files?

Hello,
I really appreciate you creating this repository and sharing the code for the data analysis you performed in your wonderful paper. I was wondering whether it would be possible for you to share the aligned reads as BAM files. I am trying to re-analyse the data from a different perspective, and having the BAM files to start with would be really helpful.
So, is it possible for you to share the BAM files generated by the https://github.com/ajwilk/2020_Wilk_COVID/blob/master/code/sample_alignment.sh script?
I would really appreciate your help.
Thank you.

Problem in filtering markers.

covid_combined.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
gives the following error:

Error: Problem with filter() input ..1.
i Input ..1 is top_n_rank(10, avg_logFC).
x object 'avg_logFC' not found
i The error occurred in group 1: cluster = 0.

Should the wt argument of top_n be "wt = avg_logFC" or "wt = avg_log2FC"?
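A likely explanation is the column rename in Seurat 4.0, where the avg_logFC column of the FindMarkers()/FindAllMarkers() output became avg_log2FC, so the answer depends on the installed Seurat version. The base R sketch below picks whichever column is present instead of hard-coding one. The toy marker table and the helper names fold_change_column and top_markers are hypothetical and are not part of the repository. Unlike top_n, this version sorts the rows and does not keep ties.

```r
# Toy stand-in for a FindAllMarkers() result (hypothetical genes and values).
covid_combined.markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1", "1"),
  gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
  avg_log2FC = c(0.4, 1.8, 1.1, 2.5, 0.3, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name of the fold-change column that exists.
fold_change_column <- function(markers) {
  found <- intersect(c("avg_log2FC", "avg_logFC"), colnames(markers))
  if (length(found) == 0) stop("no fold-change column found")
  found[1]
}

# Hypothetical helper: top n rows per cluster, sorted by decreasing fold change.
top_markers <- function(markers, n = 10) {
  fc <- fold_change_column(markers)
  per_cluster <- split(markers, markers$cluster)
  top <- lapply(per_cluster, function(d) {
    head(d[order(-d[[fc]]), , drop = FALSE], n)
  })
  do.call(rbind, top)
}

top2 <- top_markers(covid_combined.markers, n = 2)
print(top2[, c("cluster", "gene")])
```

With dplyr, the equivalent change is simply to pass the column name that colnames(covid_combined.markers) actually reports.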

Error in crapmarkers$avg_logFC fxn

When I run the command

crap2.markers[order(-crap2.markers$avg_logFC),]

I got this error:
Error in -crap.markers$avg_logFC : invalid argument to unary operator

Does anyone have an idea to solve the problem?
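One plausible cause, assuming the marker table was produced with Seurat 4.0 or later, is that the column is now named avg_log2FC. In that case `$avg_logFC` returns NULL, and negating NULL raises this error. The sketch below reproduces the behaviour on a toy data frame in base R; the gene names and values are hypothetical.

```r
# Toy marker table with the newer column name (hypothetical values).
crap2.markers <- data.frame(
  gene = c("A", "B", "C"),
  avg_log2FC = c(0.5, 2.0, 1.2),
  stringsAsFactors = FALSE
)

# `$` returns NULL for a column that does not exist.
missing_col <- crap2.markers$avg_logFC
print(is.null(missing_col))

# Negating NULL is an error; capture its message instead of stopping.
msg <- tryCatch(-missing_col, error = function(e) conditionMessage(e))
print(msg)

# Sorting on the column that is actually present works.
sorted <- crap2.markers[order(-crap2.markers$avg_log2FC), ]
print(sorted$gene)
```

Checking colnames() on the marker table shows which of the two column names applies.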

About scVelo

Hello,
I'm trying to reproduce the results in your paper.
The code:

scv.pp.filter_and_normalize(adata, min_shared_counts=10, n_top_genes=3000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=20)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata,mode='dynamical')

print(adata.var.velocity_genes.sum())

scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding_stream(adata, basis='umap',color='seurat_clusters')
scv.pl.velocity_embedding_stream(adata, basis='umap',color='cell.type.fine')
scv.pl.velocity_embedding(adata, basis='umap',color='cell.type.fine',arrow_length=5)
scv.pl.velocity_embedding_grid(adata, basis='umap',color='seurat_clusters',arrow_length=5)


#####latent time
#scv.tl.terminal_states(adata)
scv.tl.latent_time(adata)
scv.pl.scatter(adata, color='latent_time',color_map='gnuplot', size=80)
scv.pl.scatter(adata, color=[ 'root_cells', 'end_points'])

The arrows in the resulting velocity plots look different from those in the paper.
Can you tell me the parameters you used, or share the ipynb file? Thank you for your help.

mergeCM - Unsupported type for sort

Hi @ajwilk
I created count matrices from the raw reads and used 13 matrix files for the analysis. I am getting an error in the mergeCM function.
Code:-
covid_combined.emat <- mergeCM(cm.pp, type = "emat")
Error:-
Error in grr::matches(by.x, by.y, all.x, all.y, nomatch = NA)
Unsupported type for sort.

How can I resolve this? Thanks in advance.

Error In Adding Metadata

rownames(metadata_combined) <- rownames([email protected])
This line gives the following error:

Warning: Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

This happens because the metadata file you provide ("https://raw.githubusercontent.com/ajwilk/2020_Wilk_COVID/master/code/COVID-19_metadata_repo.csv") has different "orig.ident" names than the created Seurat object.

The problem is solved simply by changing the "orig.ident" names in the metadata file (e.g. from "covid_556" to "556").
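The renaming can be done in R instead of editing the CSV by hand. The sketch below is a minimal base R illustration. It assumes that a leading "covid_" prefix is the only difference between the two sets of names, which is inferred from the single example above and may not hold for every sample. The toy metadata rows and the seurat_idents vector are hypothetical.

```r
# Hypothetical excerpt of the metadata table; only orig.ident matters here.
metadata_combined <- data.frame(
  orig.ident = c("covid_555_1", "covid_556", "HIP002"),
  status = c("COVID", "COVID", "Healthy"),
  stringsAsFactors = FALSE
)

# Hypothetical sample names as they appear in the Seurat object.
seurat_idents <- c("555_1", "556", "HIP002")

# Drop a leading "covid_" so that "covid_556" becomes "556".
metadata_combined$orig.ident <- sub("^covid_", "", metadata_combined$orig.ident)

# Check that every sample in the object now has a metadata row.
unmatched <- setdiff(seurat_idents, metadata_combined$orig.ident)
print(unmatched)
```

An empty result from setdiff() indicates that every sample name in the object has a matching metadata entry; any names it does report still need to be reconciled.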

Error running code in file "generating_Seurat_object.Rmd"

Dear Aaron,

I am running into an error while trying to execute your code in document "generating_Seurat_object.Rmd"

When I execute the Rmd code chunk called 'merge' (the previous chunks run fine), I run into the error below when executing:

covid_combined.emat <- mergeCM(cm.pp, type = "emat")
Could this be an error associated with one of the functions in EpicTools? I have looked at the source code of the mergeCM() function from EpicTools but could not find where this error comes from.

Error in grr::matches(by.x, by.y, all.x, all.y, nomatch = NA) :
Unsupported type for sort.

Could you help me figure out how to proceed from here?
Much appreciated,

Kind regards,
Marc
p.s. I have updated my clone of the repo first by a pull from the github source, this did not solve the issue.
