
scvi-tutorials's Introduction

scvi-tools


scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling and analysis of single-cell omics data, built on top of PyTorch and AnnData.

Tutorials

This repository contains the source notebooks for the tutorials on the scvi-tools site and is included in the main repository as a submodule. Please refer to the main repository for additional resources.
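For orientation, here is a minimal sketch of the kind of workflow the quick-start tutorial covers (assuming an AnnData object adata with raw counts in adata.layers["counts"] and a batch column in adata.obs):

import scvi

# Register the AnnData with the model class, pointing at the raw counts and the batch column.
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")

# Train the variational autoencoder and store the batch-corrected latent representation.
model = scvi.model.SCVI(adata)
model.train()
adata.obsm["X_scVI"] = model.get_latent_representation()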

scvi-tutorials's People

Contributors

adamgayoso, canergen, ethanweinberger, galenxing, github-actions[bot], justjhong, lauradmartens, martinkim0, mkarikom, munfred, pierreboyeau, pre-commit-ci[bot], romain-lopez, semenko, talashuach, vitkl, watiss


scvi-tutorials's Issues

gene length

Hello,
I am trying to follow the tutorial "Integration and label transfer with Tabula Muris", in which you provide a text file of gene lengths.
The total number of genes in that file is about 20k.
However, my sequencing data has about 30k genes. I tried to estimate the lengths of those genes by looking them up in the GTF reference file I used, but the gene lengths do not match those in your file.
Could you provide more details on how you calculated the gene lengths?
Thanks very much!

module_user_guide.ipynb: NegativeBinomial function call raises "unexpected keyword argument 'total_counts'" error

I am trying to create my own PyTorch module by looking at the documentation as in Crafting the Module in vanilla PyTorch. In the next step, I want to implement the model as in Constructing a high-level model.

In the first linked page, the loss function in cell 6 contains the line
log_lik = NegativeBinomial(total_counts=theta, logits=nb_logits).log_prob(x).sum(dim=-1)

whereas the PyTorch documentation for the function is given as
torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)

So, in the next stage, when I replace the built-in VAE model with MyModule, I get the "unexpected keyword argument 'total_counts'" error.

I am attaching the file for reproducing the issue.

scvi_tut.py.txt

I am using PyTorch 1.10.0+cpu.
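For reference, swapping in the keyword names from the torch signature would look something like this (a sketch assuming theta, nb_logits, and x as defined in the tutorial cell):

from torch.distributions import NegativeBinomial

# Same computation, but with the keyword names from the torch signature
# (total_count / logits) instead of total_counts / total.
log_lik = NegativeBinomial(total_count=theta, logits=nb_logits).log_prob(x).sum(dim=-1)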

Quick Start Tutorial Code Error

In the quickstart tutorial, scvi.settings.seed = 0 should be:

scvi._settings.seed = 0
print("Last run with scvi-tools version:", scvi.version)

I apologize if my issue/bug report is not in the correct format; I am completely new to this field and just trying my best to help out.


DestVI gene imputation comparison between slides

Hi,

Thank you for your great software. We used DestVI to calculate cell proportion values and impute gene expression values in two different tissues and on multiple slides. We are very satisfied with the cell proportion estimates (based on comparison with the tissue histology) and with the imputed gene expression values (based on biological insights).

The questions we have are:

  1. Are the imputed gene expression values for each spot and cell type comparable between slides of the same tissue (same scRNA-seq reference used for training)?
  2. Are they comparable between different tissues (different scRNA-seq reference used for deconvolution)?

Thank you very much for your assistance.
Best,

Ivana

Make a pull request template

  1. Each tutorial may have only one level-one (#) heading (the first one).
  2. All tutorials should work in Google Colab.
  3. All cell outputs should be saved in the file.
  4. scvi.settings.seed = 0 should be used.
  5. Raw counts and normalized data should coexist in the AnnData (see the API overview tutorial), so as to show users how to maintain both forms of the data.
  6. Normalization should be counts per median library size followed by log1p (see the sketch after this list); if it isn't, a reason should be given.
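For items 5 and 6, a rough sketch of what this looks like in a tutorial (assuming scanpy and an AnnData object adata with raw counts in adata.X):

import scanpy as sc

# Keep the raw counts in a layer so count-based models can still be trained on them.
adata.layers["counts"] = adata.X.copy()

# Normalize to the median library size (target_sum=None uses the median), then log1p.
sc.pp.normalize_total(adata, target_sum=None)
sc.pp.log1p(adata)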

Update the vignette

Hello!

I am using your scANVI tool, but I have to report that the vignette is outdated. For example, it contains the command

scvi$data$setup_anndata(adata_both, labels_key="celltype", batch_key="batch")

However, that only worked with scvi-tools v0.8. With version 0.15, the command is

scvi$model$SCVI$setup_anndata(adata_both, labels_key="celltype", batch_key="batch")

I think it would be important to keep the vignette up to date, because many people rely on it, including those who are new to bioinformatics.

Thanks!

Emanuela

Add scvi-criticism tutorial

The plotting code can live here for now; let's use Hugging Face to pull the model and make sure it runs in Colab.
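As a rough sketch of the Hugging Face step (the repo name below is a placeholder, and this assumes the scvi.hub API used in the hub tutorials):

from scvi.hub import HubModel

# Placeholder repo name; replace with the model the tutorial will actually use.
hub_model = HubModel.pull_from_huggingface_hub("scvi-tools/placeholder-model")
model = hub_model.model    # the trained model
adata = hub_model.adata    # the AnnData bundled with it, if any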

Release 1.1.0

atac

dev

  • data_tutorial #237
  • model_user_guide #238
  • module_user_guide #239

hub

  • minification #242
  • scvi_hub_intro_and_download #240
  • scvi_hub_upload_and_large_files #241

multimodal

  • cite_scrna_integration_w_totalVI #243
  • MultVI_tutorial #244
  • totalVI_reference_mapping #248
  • totalVI #245

quick_start

scrna

spatial

tuning

  • autotune_new_model #262
  • autotune_scvi #265

updating tutorial on dev API for modules

update from Mariano

Here I will write something like:

  • _get_inference_input(): selects the registered tensors from the AnnData that are used in inference. The dictionary created in this function should match the input of inference().
  • _get_generative_input(): selects the registered tensors from the AnnData, as well as the latent variables (from inference()) used in the model. The dictionary created in this function should match the input of generative().
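Roughly, these would look like the following inside a BaseModuleClass subclass (a sketch only; the exact registry keys and latent variable names depend on the module):

from scvi import REGISTRY_KEYS

# Methods of a BaseModuleClass subclass.
def _get_inference_input(self, tensors):
    # Select the registered tensors from the AnnData that inference() expects.
    return {"x": tensors[REGISTRY_KEYS.X_KEY]}

def _get_generative_input(self, tensors, inference_outputs):
    # Select registered tensors plus the latent variables produced by inference();
    # the keys must match the signature of generative().
    return {"z": inference_outputs["z"], "library": inference_outputs["library"]}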

Support running R tutorials

Requires building an R Docker container with the required dependencies and figuring out whether nbconvert works with R notebooks (or otherwise finding an alternative way to run R notebooks top to bottom).

Lung dataset has no raw counts

Hello,
I was trying to reproduce the Atlas-level integration tutorial. The dataset downloaded via figshare is supposed to store raw counts in adata.layers['counts'], but the values there don't really look like counts.

import scanpy as sc

adata = sc.read(
    "data/lung_atlas.h5ad",
    backup_url="https://figshare.com/ndownloader/files/24539942",
)

adata.layers['counts'].data
array([1.       , 1.       , 1.       , ..., 1.1387753, 2.2072291,
       1.0936011], dtype=float32)

Are these some sort of soup-corrected counts, and does it matter?
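A quick way to check this (just an illustration; counts is the sparse layer from the snippet above):

import numpy as np

counts = adata.layers["counts"]
# Raw counts should be (near-)integer valued; a low fraction here suggests the
# stored values were corrected or normalized in some way.
frac_integer = float(np.mean(counts.data == np.round(counts.data)))
print(f"fraction of integer-valued entries: {frac_integer:.3f}")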

Backup URL in Integration and label transfer with Tabula Muris

The current backup URL in this tutorial points to:
https://s3.amazonaws.com/czbiohub-tabula-muris/TM_droplet_mat.h5ad

I have no access to this link.

/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

setting seed=0 could not reproduce same UMAP embeddings

scvi.settings.seed = 0
subcluster_res = 0.8

sca.models.SCVI.setup_anndata(
    t_adata,
    layer="counts",
    batch_key=batch_key,
    continuous_covariate_keys=["percent_mt"],
    # categorical_covariate_keys=["USUBJID"],
)

vae = sca.models.SCVI(
    t_adata,
    n_layers=2,
    n_latent=30,
    dispersion="gene-batch",
    gene_likelihood="nb",
)

vae.train(
    use_gpu=True,
    # train_size=0.8,
    early_stopping=True,
    # lr_patience=20,
    max_epochs=100,
    early_stopping_monitor="elbo_validation",
    plan_kwargs={"n_epochs_kl_warmup": 25},
)
![Screen Shot 2023-10-11 at 3 16 19 PM](https://github.com/scverse/scvi-tutorials/assets/32940798/d8541730-93dc-4a04-8b4f-add3740656cb)

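One possible explanation (an assumption on my part, not confirmed): scvi.settings.seed fixes the scvi-tools training, but the downstream neighbors/UMAP computation has its own random state, which also has to be pinned for the embedding to be reproducible, e.g. with scanpy:

import scanpy as sc

# Assumes the scVI latent space was stored under obsm["X_scVI"]; adjust the key as needed.
sc.pp.neighbors(t_adata, use_rep="X_scVI", random_state=0)
sc.tl.umap(t_adata, random_state=0)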

DestVI tensor nan error

Hi, thank you for your great work. I am getting the same error as #65. I am trying to run DestVI on other datasets, and I've transformed all the data into h5ad format with all the values necessary to train the model.

I'm following DestVI_tutorial.ipynb to run the training in order to do cell type deconvolution.

My single cell data:
AnnData object with n_obs × n_vars = 32534 × 979
obs: 'Area', 'AspectRatio', 'Width', 'Height', 'Mean.CD298', 'Max.CD298', 'Mean.PanCK', 'Max.PanCK', 'Mean.CD45', 'Max.CD45', 'Mean.CD20', 'Max.CD20', 'Mean.DAPI', 'Max.DAPI', 'dualfiles', 'Run_name', 'Slide_name', 'ISH.concentration', 'Dash', 'tissue', 'slide_ID_numeric', 'Run_Tissue_name', 'Panel', 'Diversity', 'totalcounts', 'log10totalcounts', 'background', 'remove_flagged_cells', 'IFcolor', 'nb_clus', 'leiden_clus', 'negmean', 'class', 'cell_type', 'cell_ID'
My spatial data:
AnnData object with n_obs × n_vars = 340 × 979
obs: 'fov', 'spot_id', 'x', 'y', 'cell_counts', 'Ascending.vasa.recta.endothelium', 'B-cell', 'Connecting.tubule', 'Descending.vasa.recta.endothelium', 'Distinct.proximal.tubule.1', 'Distinct.proximal.tubule.2', 'Epithelial.progenitor.cell', 'Fibroblast', 'Glomerular.endothelium', 'Indistinct.intercalated.cell', 'MNP.a.classical.monocyte.derived', 'MNP.b.non.classical.monocyte.derived', 'MNP.c.dendritic.cell', 'Myofibroblast', 'NK', 'Pelvic.epithelium', 'Peritubular.capillary.endothelium.1', 'Peritubular.capillary.endothelium.2', 'Podocyte', 'Principal.cell', 'Proliferating.Proximal.Tubule', 'Proximal.tubule', 'T CD4 memory', 'T CD4 naive', 'T CD8 memory', 'T CD8 naive', 'Thick.ascending.limb.of.Loop.of.Henle', 'Transitional.urothelium', 'Treg', 'Type.A.intercalated.cell', 'Type.B.intercalated.cell', 'mDC', 'macrophage', 'mast', 'monocyte', 'neutrophil', 'pDC', 'plasmablast'
obsm: 'location'

The single-cell data has the same gene list as the spatial data.

However, when training the models, CondSCVI works fine, but I get the following ValueError during DestVI training:
Exception has occurred: ValueError Expected value argument (Parameter of shape (979,)) to be within the support (Real()) of the distribution Normal(loc: torch.Size([979]), scale: torch.Size([979])), but found invalid values: Parameter containing: tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,...

I have tried using raw counts as input, using log-normalized data as input, and changing the learning rate, but I still get the ValueError.
Have you seen a similar error before? I hope you can help me, thank you so much!
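A couple of sanity checks that sometimes explain NaN losses (purely illustrative; st_adata stands in for the spatial AnnData here):

import numpy as np

# NaN losses often trace back to non-finite entries or all-zero spots in the input matrix.
X = st_adata.X.toarray() if hasattr(st_adata.X, "toarray") else np.asarray(st_adata.X)
print("non-finite entries:", int((~np.isfinite(X)).sum()))
print("spots with zero total counts:", int((X.sum(axis=1) == 0).sum()))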

Release 1.1.0-rc.1

atac

dev

  • data_tutorial #180
  • model_user_guide #167
  • module_user_guide #168

hub

  • minification #169
  • scvi_hub_intro_and_download #174
  • scvi_hub_upload_and_large_files #170

multimodal

  • cite_scrna_integration_w_totalVI #171
  • MultVI_tutorial #172
  • totalVI_reference_mapping #175
  • totalVI #176

quick_start

scrna

spatial

tuning

  • autotune_new_model #194
  • autotune_scvi #201

Tensor Nan Error

Hello, thank you for the great work. I'm trying to run DestVI on other datasets, and I've transformed all the data into h5ad format with all the values necessary to train the model.

I'm following DestVI_tutorial.ipynb to run the training in order to get the cell type composition matrix.

My input single-cell data:
AnnData object with n_obs × n_vars = 1691 × 19972
obs: 'cell_types'
My input ST data:
AnnData object with n_obs × n_vars = 3067 × 135
obsm: 'location'

However, when training the sc model, the following error occurred:

ValueError: Expected parameter loc (Tensor of shape (128, 5)) of distribution Normal(loc: torch.Size([128, 5]), scale: torch.Size([128, 5])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan, nan]], device='cuda:0', grad_fn=<AddmmBackward0>)

Have you seen a similar error before? I hope you can help me, thank you so much!

multimodal integrated dataset from multiVI on tutorial of Joint analysis of paired and unpaired multiomic data with MultiVI

Hi,

I have a question about MULTIVI.get_latent_representation in the "Joint analysis of paired and unpaired multiomic data with MultiVI" tutorial.

Following the tutorial, I ran my demo data through MultiVI and obtained the latent space matrix via the get_latent_representation function.
My latent representation contains three times the total number of cells in my original dataset (1000 cells originally, 3000 cells in the latent space).
I think this may be because, for MultiVI, we concatenate three AnnData objects:
adata_mvi = scvi.data.organize_multiome_anndatas(adata_paired, adata_rna, adata_atac)
If I want to get the multimodal integrated dataset from MultiVI, should I only use the first 1000 rows (in my example) as the integrated result of the RNA and ATAC modalities?
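For concreteness, here is roughly how one could subset to the paired cells, assuming organize_multiome_anndatas records each cell's source in adata_mvi.obs (the tutorial uses a "modality" column; check the exact column name in your object):

# "model" is the trained MultiVI model; "adata_mvi" is the concatenated AnnData above.
latent = model.get_latent_representation()
# Keep only the rows that came from the paired AnnData.
paired_mask = (adata_mvi.obs["modality"] == "paired").to_numpy()
latent_paired = latent[paired_mask]
print(latent_paired.shape)  # expected: (1000, n_latent) in my example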
