
scvi-tutorials's Introduction

scvi-tools


scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling and analysis of single-cell omics data, built on top of PyTorch and AnnData.

Tutorials

This repository contains the source notebooks for the tutorials on the scvi-tools site and is included in the main repository as a submodule. Please refer to the main repository for additional resources.
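For orientation, here is a minimal sketch of the kind of workflow the quick-start tutorial covers (assuming an AnnData object adata with raw counts in adata.layers["counts"] and a batch column in adata.obs):

import scvi

# Register the AnnData with the model class, pointing at the raw counts and the batch column.
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")

# Train the variational autoencoder and store the batch-corrected latent representation.
model = scvi.model.SCVI(adata)
model.train()
adata.obsm["X_scVI"] = model.get_latent_representation()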

scvi-tutorials's People

Contributors

adamgayoso, canergen, ethanweinberger, galenxing, github-actions[bot], justjhong, lauradmartens, martinkim0, mkarikom, munfred, pierreboyeau, pre-commit-ci[bot], romain-lopez, semenko, talashuach, vitkl, watiss


scvi-tutorials's Issues

gene length

Hello,
I am trying to follow the tutorial "Integration and label transfer with Tabula Muris", in which you provide a text file of gene lengths.
The total number of genes in that file is about 20k.
However, my sequencing data has about 30k genes. I tried to estimate the lengths of those genes by looking them up in the GTF reference file I used, but the gene lengths do not match those in your file.
Could you provide more details on how you calculated the gene lengths?
Thanks very much!

module_user_guide.ipynb: NegativeBinomial function call raises "unexpected keyword argument 'total_counts'" error

I am trying to create my own PyTorch module by looking at the documentation as in Crafting the Module in vanilla PyTorch. In the next step, I want to implement the model as in Constructing a high-level model.

In the first linked page, the loss function in cell 6 contains the line
log_lik = NegativeBinomial(total_counts=theta, logits=nb_logits).log_prob(x).sum(dim=-1)

whereas the PyTorch documentation for the function is given as
torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)

So, in the next stage, when I replace the built-in VAE model with MyModule, I get the "unexpected keyword argument 'total_counts'" error.

I am attaching the file for reproducing the issue.

scvi_tut.py.txt

I am using PyTorch 1.10.0+cpu.
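For reference, swapping in the keyword names from the torch signature would look something like this (a sketch assuming theta, nb_logits, and x as defined in the tutorial cell):

from torch.distributions import NegativeBinomial

# Same computation, but with the keyword names from the torch signature
# (total_count / logits) instead of total_counts / total.
log_lik = NegativeBinomial(total_count=theta, logits=nb_logits).log_prob(x).sum(dim=-1)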

Quick Start Tutorial Code Error

In the quickstart tutorial, scvi.settings.seed = 0 should be:

scvi._settings.seed = 0
print("Last run with scvi-tools version:", scvi.version)

I apologize if my issue/bug report is not in the correct format; I am completely new to this field and just trying my best to help out.


DestVI gene imputation comparison between slides

Hi,

Thank you for your great software. We used DestVI to calculate cell proportion values and impute gene expression values in two different tissues and on multiple slides. We are very satisfied with the cell proportion estimates (based on comparison with the tissue histology) and with the imputed gene expression values (based on biological insights).

The questions we have are:

  1. Are the imputed gene expression values for each spot and cell type comparable between slides of the same tissue (same scRNA-seq reference used for training)?
  2. Are they comparable between different tissues (different scRNA-seq reference used for deconvolution)?

Thank you very much for your assistance.
Best,

Ivana

Make a pull request template

  1. Each tutorial may have only one level-one (#) heading (the first one).
  2. All tutorials should work in Google Colab.
  3. All cell outputs should be saved in the file.
  4. scvi.settings.seed = 0 should be used.
  5. Raw counts and normalized data should coexist in the AnnData (see the API overview tutorial), so as to show users how to maintain both forms of the data.
  6. Normalization should be counts per median library size followed by log1p (see the sketch after this list); if it isn't, a reason should be given.
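For items 5 and 6, a rough sketch of what this looks like in a tutorial (assuming scanpy and an AnnData object adata with raw counts in adata.X):

import scanpy as sc

# Keep the raw counts in a layer so count-based models can still be trained on them.
adata.layers["counts"] = adata.X.copy()

# Normalize to the median library size (target_sum=None uses the median), then log1p.
sc.pp.normalize_total(adata, target_sum=None)
sc.pp.log1p(adata)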

Update the vignette

Hello!

I am using your scANVI tool, but I have to report that the vignette is outdated. For example, it contains the command

scvi$data$setup_anndata(adata_both, labels_key="celltype", batch_key="batch")

However, that only worked with scvi-tools v0.8. With version 0.15, the command is

scvi$model$SCVI$setup_anndata(adata_both, labels_key="celltype", batch_key="batch")

I think it would be important to keep the vignette up to date, because many people rely on it, including those who are new to bioinformatics.

Thanks!

Emanuela

Add scvi-criticism tutorial

The plotting code can live here for now; let's use Hugging Face to pull the model and make sure it runs in Colab.
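As a rough sketch of the Hugging Face step (the repo name below is a placeholder, and this assumes the scvi.hub API used in the hub tutorials):

from scvi.hub import HubModel

# Placeholder repo name; replace with the model the tutorial will actually use.
hub_model = HubModel.pull_from_huggingface_hub("scvi-tools/placeholder-model")
model = hub_model.model    # the trained model
adata = hub_model.adata    # the AnnData bundled with it, if any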

Release 1.1.0

atac

dev

  • data_tutorial #237
  • model_user_guide #238
  • module_user_guide #239

hub

  • minification #242
  • scvi_hub_intro_and_download #240
  • scvi_hub_upload_and_large_files #241

multimodal

  • cite_scrna_integration_w_totalVI #243
  • MultVI_tutorial #244
  • totalVI_reference_mapping #248
  • totalVI #245

quick_start

scrna

spatial

tuning

  • autotune_new_model #262
  • autotune_scvi #265

updating tutorial on dev API for modules

update from Mariano

Here I will write something like:

  • _get_inference_input(): selects the registered tensors from the AnnData that are used in inference. The dictionary created in this function should match the input of inference().
  • _get_generative_input(): selects the registered tensors from the AnnData, as well as the latent variables (from inference()) used in the model. The dictionary created in this function should match the input of generative().
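Roughly, these would look like the following inside a BaseModuleClass subclass (a sketch only; the exact registry keys and latent variable names depend on the module):

from scvi import REGISTRY_KEYS

# Methods of a BaseModuleClass subclass.
def _get_inference_input(self, tensors):
    # Select the registered tensors from the AnnData that inference() expects.
    return {"x": tensors[REGISTRY_KEYS.X_KEY]}

def _get_generative_input(self, tensors, inference_outputs):
    # Select registered tensors plus the latent variables produced by inference();
    # the keys must match the signature of generative().
    return {"z": inference_outputs["z"], "library": inference_outputs["library"]}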

Support running R tutorials

Requires building an R Docker container with the required dependencies and figuring out whether nbconvert works with R notebooks (or otherwise finding an alternative way to run R notebooks top to bottom).

Lung dataset has no raw counts

Hello,
I was trying to reproduce the Atlas-level integration tutorial. The dataset downloaded via figshare is supposed to store raw counts in adata.layers['counts'], but the values there don't really look like counts.

import scanpy as sc

adata = sc.read(
    "data/lung_atlas.h5ad",
    backup_url="https://figshare.com/ndownloader/files/24539942",
)

adata.layers['counts'].data
array([1.       , 1.       , 1.       , ..., 1.1387753, 2.2072291,
       1.0936011], dtype=float32)

Are these some sort of soup-corrected counts, and does it matter?
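A quick way to check this (just an illustration; counts is the sparse layer from the snippet above):

import numpy as np

counts = adata.layers["counts"]
# Raw counts should be (near-)integer valued; a low fraction here suggests the
# stored values were corrected or normalized in some way.
frac_integer = float(np.mean(counts.data == np.round(counts.data)))
print(f"fraction of integer-valued entries: {frac_integer:.3f}")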

Backup URL in Integration and label transfer with Tabula Muris

The current backup URL in this tutorial points to:
https://s3.amazonaws.com/czbiohub-tabula-muris/TM_droplet_mat.h5ad

I have no access to this link.

/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

setting seed=0 could not reproduce same UMAP embeddings

scvi.settings.seed = 0
subcluster_res = 0.8

sca.models.SCVI.setup_anndata(
    t_adata,
    layer="counts",
    batch_key=batch_key,
    continuous_covariate_keys=["percent_mt"],
    # categorical_covariate_keys=["USUBJID"],
)

vae = sca.models.SCVI(
    t_adata,
    n_layers=2,
    n_latent=30,
    dispersion="gene-batch",
    gene_likelihood="nb",
)

vae.train(
    use_gpu=True,
    # train_size=0.8,
    early_stopping=True,
    # lr_patience=20,
    max_epochs=100,
    early_stopping_monitor="elbo_validation",
    plan_kwargs={"n_epochs_kl_warmup": 25},
)
![Screen Shot 2023-10-11 at 3 16 19 PM](https://github.com/scverse/scvi-tutorials/assets/32940798/d8541730-93dc-4a04-8b4f-add3740656cb)

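One possible explanation (an assumption on my part, not confirmed): scvi.settings.seed fixes the scvi-tools training, but the downstream neighbors/UMAP computation has its own random state, which also has to be pinned for the embedding to be reproducible, e.g. with scanpy:

import scanpy as sc

# Assumes the scVI latent space was stored under obsm["X_scVI"]; adjust the key as needed.
sc.pp.neighbors(t_adata, use_rep="X_scVI", random_state=0)
sc.tl.umap(t_adata, random_state=0)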

DestVI tensor nan error

Hi, thank you for your great work. I am getting the same error as #65. I am trying to run DestVI on other datasets, and I've transformed all the data into h5ad format with all the values necessary to train the model.

I'm following DestVI_tutorial.ipynb to run the training in order to do cell type deconvolution.

My single cell data:
AnnData object with n_obs × n_vars = 32534 × 979
obs: 'Area', 'AspectRatio', 'Width', 'Height', 'Mean.CD298', 'Max.CD298', 'Mean.PanCK', 'Max.PanCK', 'Mean.CD45', 'Max.CD45', 'Mean.CD20', 'Max.CD20', 'Mean.DAPI', 'Max.DAPI', 'dualfiles', 'Run_name', 'Slide_name', 'ISH.concentration', 'Dash', 'tissue', 'slide_ID_numeric', 'Run_Tissue_name', 'Panel', 'Diversity', 'totalcounts', 'log10totalcounts', 'background', 'remove_flagged_cells', 'IFcolor', 'nb_clus', 'leiden_clus', 'negmean', 'class', 'cell_type', 'cell_ID'
My spatial data:
AnnData object with n_obs × n_vars = 340 × 979
obs: 'fov', 'spot_id', 'x', 'y', 'cell_counts', 'Ascending.vasa.recta.endothelium', 'B-cell', 'Connecting.tubule', 'Descending.vasa.recta.endothelium', 'Distinct.proximal.tubule.1', 'Distinct.proximal.tubule.2', 'Epithelial.progenitor.cell', 'Fibroblast', 'Glomerular.endothelium', 'Indistinct.intercalated.cell', 'MNP.a.classical.monocyte.derived', 'MNP.b.non.classical.monocyte.derived', 'MNP.c.dendritic.cell', 'Myofibroblast', 'NK', 'Pelvic.epithelium', 'Peritubular.capillary.endothelium.1', 'Peritubular.capillary.endothelium.2', 'Podocyte', 'Principal.cell', 'Proliferating.Proximal.Tubule', 'Proximal.tubule', 'T CD4 memory', 'T CD4 naive', 'T CD8 memory', 'T CD8 naive', 'Thick.ascending.limb.of.Loop.of.Henle', 'Transitional.urothelium', 'Treg', 'Type.A.intercalated.cell', 'Type.B.intercalated.cell', 'mDC', 'macrophage', 'mast', 'monocyte', 'neutrophil', 'pDC', 'plasmablast'
obsm: 'location'

The single-cell data has the same gene list as the spatial data.

However, when training the models, CondSCVI works fine, but I get the following ValueError during DestVI training:
Exception has occurred: ValueError Expected value argument (Parameter of shape (979,)) to be within the support (Real()) of the distribution Normal(loc: torch.Size([979]), scale: torch.Size([979])), but found invalid values: Parameter containing: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,...

I have tried using raw counts as input, using log-normalized data as input, and changing the learning rate, but I still get the ValueError.
Have you seen a similar error before? I hope you can help me, thank you so much!
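A couple of sanity checks that sometimes explain NaN losses (purely illustrative; st_adata stands in for the spatial AnnData here):

import numpy as np

# NaN losses often trace back to non-finite entries or all-zero spots in the input matrix.
X = st_adata.X.toarray() if hasattr(st_adata.X, "toarray") else np.asarray(st_adata.X)
print("non-finite entries:", int((~np.isfinite(X)).sum()))
print("spots with zero total counts:", int((X.sum(axis=1) == 0).sum()))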

Release 1.1.0-rc.1

atac

dev

  • data_tutorial #180
  • model_user_guide #167
  • module_user_guide #168

hub

  • minification #169
  • scvi_hub_intro_and_download #174
  • scvi_hub_upload_and_large_files #170

multimodal

  • cite_scrna_integration_w_totalVI #171
  • MultVI_tutorial #172
  • totalVI_reference_mapping #175
  • totalVI #176

quick_start

scrna

spatial

tuning

  • autotune_new_model #194
  • autotune_scvi #201

Tensor Nan Error

Hello, thank you for the great work. I'm trying to run DestVI on other datasets, and I've transformed all the data into h5ad format with all the values necessary to train the model.

I'm following DestVI_tutorial.ipynb to run the training in order to get the cell type composition matrix.

My input single-cell data:
AnnData object with n_obs × n_vars = 1691 × 19972
obs: 'cell_types'
My input ST data:
AnnData object with n_obs × n_vars = 3067 × 135
obsm: 'location'

However, when training the sc model, the following error occurred:

ValueError: Expected parameter loc (Tensor of shape (128, 5)) of distribution Normal(loc: torch.Size([128, 5]), scale: torch.Size([128, 5])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan, nan]], device='cuda:0', grad_fn=<AddmmBackward0>)

Have you seen a similar error before? I hope you can help me, thank you so much!

multimodal integrated dataset from multiVI on tutorial of Joint analysis of paired and unpaired multiomic data with MultiVI

Hi,

I have a question about MULTIVI.get_latent_representation in the "Joint analysis of paired and unpaired multiomic data with MultiVI" tutorial.

Following the tutorial, I ran my demo data through MultiVI and obtained the latent space matrix via the get_latent_representation function.
My latent representation contains three times the total number of cells in my original dataset (1000 cells originally, 3000 cells in the latent space).
I think this may be because, for MultiVI, we concatenate three AnnData objects:
adata_mvi = scvi.data.organize_multiome_anndatas(adata_paired, adata_rna, adata_atac)
If I want to get the multimodal integrated dataset from MultiVI, should I only use the first 1000 rows (in my example) as the integrated result of the RNA and ATAC modalities?
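For concreteness, here is roughly how one could subset to the paired cells, assuming organize_multiome_anndatas records each cell's source in adata_mvi.obs (the tutorial uses a "modality" column; check the exact column name in your object):

# "model" is the trained MultiVI model; "adata_mvi" is the concatenated AnnData above.
latent = model.get_latent_representation()
# Keep only the rows that came from the paired AnnData.
paired_mask = (adata_mvi.obs["modality"] == "paired").to_numpy()
latent_paired = latent[paired_mask]
print(latent_paired.shape)  # expected: (1000, n_latent) in my example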
