panpipes's Issues

LSI, Neighbors, panpipes_clustering

Hi,

a small detail I noticed in panpipes_clustering:

When calculating neighbors (i.e. use_existing=False), the pipeline recalculates the PCA if it is not present, but not the LSI (please see: https://github.com/DendrouLab/panpipes/blob/main/panpipes/python_scripts/rerun_find_neighbors_for_clustering.py#L53).
It is possible to run neighbors on LSI, but only if the LSI is already present in the object. Otherwise, an error is thrown: ValueError: Did not find X_lsi in .obsm.keys(). You need to compute it first. in: https://github.com/DendrouLab/panpipes/blob/main/panpipes/funcs/scmethods.py#L267
A small enhancement could be to also recalculate the LSI when dim_red: X_lsi is set and it is not present in the object, the same way it is already done for PCA.
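
A minimal sketch of what that could look like, mirroring the existing PCA branch (the function and variable names here are assumed for illustration, not taken from the panpipes source):

    import scanpy as sc
    from anndata import AnnData
    from muon import atac as ac

    def ensure_dimred(adata: AnnData, dimred: str, n_comps: int = 50) -> None:
        """Recompute the requested reduction if it is missing (sketch, not panpipes code)."""
        if dimred == "X_pca" and "X_pca" not in adata.obsm:
            sc.pp.pca(adata, n_comps=n_comps)
        elif dimred == "X_lsi" and "X_lsi" not in adata.obsm:
            ac.tl.lsi(adata, n_comps=n_comps)  # muon's LSI writes .obsm["X_lsi"]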

ADT PCA run parameters choice

Hi,
Currently, if both clr and dsb normalisation are run on the ADT modality, PCA is always run on the dsb layer. Perhaps this can be parametrised, so the user can decide whether they want PCA on clr or dsb. If this is inconvenient, then it should be made clearer in the pipeline.yml, so users are aware that when running dsb, PCA is always based on dsb, as this also affects downstream tasks.

Secondly, based on the recent single-cell best practices book, removing the isotypes when doing dimensionality reduction might be a sensible choice. Since not everyone might want to do this, perhaps this choice can also be parametrised and made an option in the pipeline.yml for the panpipes_preprocess workflow (see the sketch below).
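
For illustration, a hedged sketch of what such a parametrisation could look like (the layer names "clr"/"dsb" follow the issue; the pca_layer option and the "isotype" .var column are hypothetical):

    import scanpy as sc

    def run_prot_pca(prot, pca_layer="dsb", drop_isotypes=False):
        """Sketch: run ADT PCA on a user-chosen layer, optionally excluding isotypes."""
        if drop_isotypes and "isotype" in prot.var.columns:  # hypothetical boolean .var flag
            prot = prot[:, ~prot.var["isotype"]].copy()
        prot.X = prot.layers[pca_layer].copy()               # "clr" or "dsb"
        sc.pp.pca(prot, n_comps=min(50, prot.n_vars - 1))
        return prot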

best,
Devika

pipeline_ingest.concat_filtered_mudatas requires pytz

The pipeline_ingest.concat_filtered_mudatas step of the pipeline throws an error because of a missing pytz module.

ERROR main control -

Original exception:

Exception #1
'builtins.OSError(Job 29542491 has non-zero exitStatus 1: hasExited=True, wasAborted=False, hasSignal=False, terminatedSignal='')' raised in ...
Task = def pipeline_ingest.concat_filtered_mudatas(...):
Traceback (most recent call last):
  File "[path]/envs/panpipes/lib/python3.9/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
    return_value = job_wrapper(params, user_defined_work_func,
  File "[path]/envs/panpipes/lib/python3.9/site-packages/ruffus/task.py", line 545, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "[path]/envs/panpipes/lib/python3.9/site-packages/panpipes/panpipes/pipeline_ingest.py", line 178, in concat_filtered_mudatas
    P.run(cmd, **job_kwargs)
  File "[path]/envs/panpipes/lib/python3.9/site-packages/cgatcore/pipeline/execution.py", line 1244, in run
    benchmark_data = r.run(statement_list)
  File "[path]/envs/panpipes/lib/python3.9/site-packages/cgatcore/pipeline/execution.py", line 820, in run
    stdout, stderr, resource_usage = self.queue_manager.collect_single_job_from_cluster(
  File "[path]/envs/panpipes/lib/python3.9/site-packages/cgatcore/pipeline/cluster.py", line 145, in collect_single_job_from_cluster
    raise OSError(error_msg)
OSError: Job 29542491 has non-zero exitStatus 1: hasExited=True, wasAborted=False, hasSignal=False, terminatedSignal=''
Traceback (most recent call last):
  File "[path]/envs/panpipes/lib/python3.9/site-packages/panpipes/python_scripts/concat_adata.py", line 1, in <module>
    import scanpy as sc
  File "[path]/envs/panpipes2/lib/python3.9/site-packages/scanpy/__init__.py", line 6, in <module>
    from ._utils import check_versions
  File "[path]/envs/panpipes2/lib/python3.9/site-packages/scanpy/_utils/__init__.py", line 21, in <module>
    from anndata import AnnData, __version__ as anndata_version
  File "[path]/envs/panpipes2/lib/python3.9/site-packages/anndata/__init__.py", line 7, in <module>
    from ._core.anndata import AnnData
  File "[path]/envs/panpipes2/lib/python3.9/site-packages/anndata/_core/anndata.py", line 21, in <module>
    import pandas as pd
  File "[path]/envs/panpipes2/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
pytz: No module named 'pytz'

I removed lines of the log containing individual .h5mu files to protect patient information and replaced my cluster paths with [path]. After conda install -c anaconda pytz the error persists.

Violin plots are not being plotted

Hi,
I have noticed that violin plots are no longer being plotted for the data. The files are generated and there is a border, but I don't actually see any violin plots drawn. This has only started happening recently.
Best,
Devika

mofa failing within panpipes

Hi
I have installed panpipes using a python venv for the Oxford BMRC cluster.

The modules I have loaded are:
Python/3.10.4-GCCcore-11.3.0 R-bundle-Bioconductor/3.15-foss-2022a-R-4.2.1
I am using muon version 0.1.5, mudata 0.2.3 and mofapy2 0.7.0.

The training model converges, but the pipeline fails with the error in the attached screenshot.


Thanks,
Devika

LSI, scATAC-Seq

Hi,

I noticed that the preprocessing/QC part of the pipeline doesn't provide a plot that could guide the decision on whether to exclude the first LSI component.
The Signac package provides a plot of the correlation between sequencing depth and the components: https://stuartlab.org/signac/reference/depthcor. Including such a plot in the pipeline may be a nice extension.
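
For illustration, a rough Python analogue of Signac's DepthCor (a sketch assuming total counts live in atac.obs["total_counts"] and the reduction in atac.obsm["X_lsi"]):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_depth_correlation(atac, reduction="X_lsi", depth_col="total_counts"):
        """Correlate per-cell sequencing depth with each component (cf. Signac's DepthCor)."""
        depth = atac.obs[depth_col].to_numpy()
        emb = atac.obsm[reduction]
        corrs = [np.corrcoef(depth, emb[:, i])[0, 1] for i in range(emb.shape[1])]
        plt.scatter(np.arange(1, len(corrs) + 1), corrs)
        plt.axhline(0, linestyle="--", color="grey")
        plt.xlabel(f"{reduction} component")
        plt.ylabel("Pearson r with sequencing depth")
        plt.show()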

[preprocess] how to run pre-filtered objects

These lines are unnecessary:

PARAMS['filt_file'] = PARAMS['sample_prefix'] + "_filt.h5mu"

PARAMS['scaled_file'] = PARAMS['sample_prefix'] + "_scaled.h5mu"

And the section of the pipeline_preprocess yml concerning files that have already been filtered is wrong.

The correct thing to do is name your file {sample_prefix}.h5mu

missing package openpyxl - checked on new PyPI installation

The message came up while running find cluster markers. @crichgriffin please check the installation requirements!

  File "/Users/fabiola.curion/Documents/devel/github/panpipes/panpipes/python_scripts/run_find_markers_multi.py", line 213, in <module>
    main(adata, 
  File "/Users/fabiola.curion/Documents/devel/github/panpipes/panpipes/python_scripts/run_find_markers_multi.py", line 183, in main
    with pd.ExcelWriter(excel_file_top) as writer:
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/pipeline_bbknn/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 56, in __init__
    from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'

reduce wnn runtime by fetching the precomputed no_batch_*

In the current version of the integration workflow, if wnn is run on non-batch-corrected modalities, it runs neighbours on each modality on the fly in a "no_batch" way (i.e. on a precomputed dimred such as PCA or LSI, if specified), with the same params as specified for each of the no_batch unimodal analyses.
The behaviour is different when wnn is calculated on previously batch-corrected unimodal data, because in that case the pipeline expects each batch-corrected object to exist, and this is correctly reflected in the decorator flow.

We need to modify wnn to fetch the precomputed no_batch results instead of computing them on the fly, to reduce the runtime (currently no_batch is run twice per modality if wnn is called on no_batch). A sketch of the idea follows.
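
A sketch of the proposed behaviour (the file paths and modality list are hypothetical; the point is copying the precomputed graphs rather than recomputing them):

    import mudata as md
    import muon as mu

    # mdata is assumed to be the MuData object wnn runs on.
    for mod in ["rna", "prot"]:
        pre = md.read_h5mu(f"tmp/no_batch_{mod}.h5mu")[mod]  # hypothetical path
        mdata[mod].obsp["distances"] = pre.obsp["distances"]
        mdata[mod].obsp["connectivities"] = pre.obsp["connectivities"]
        mdata[mod].uns["neighbors"] = pre.uns["neighbors"]
    mu.pp.neighbors(mdata)  # weighted nearest neighbours across modalities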

Current installation instructions do not work

I am aware that pip install . does not work as intended; fixing it will require a restructure of the repo.

The current alternative method of installation is as follows:

pip install -r requirements_minimal.txt
Rscript r_install_libraries.R
python setup_orig.py develop

Hopefully this will get fixed up in the next couple of days!

ATAC preprocess binarizing data even when 'binarize' set to False

Hello! I am having some issues with preprocessing ATAC data (paired multiome). I am trying to perform preprocessing so that I can run harmony for batch correction.

The pipeline.yml settings are as follows: 
atac:
  binarize: False
  normalize: log1p

Arguments appear to be read in correctly when running the pipeline:

pid: 45740, system: Linux 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64
2023-09-01 17:42:35,606 INFO main control - atac                                    : {'binarize': False, 'normalize': 'log1p', 'TFIDF_flavour': None, 'feature_selection_flavour': 'scanpy', 'min_mean': None, 'max_mean': None, 'min_disp': None, 'min_cutoff': None, 'dimred': 'PCA', 'dim_remove': None} \
                                           atac_TFIDF_flavour                      : None \
                                           atac_binarize                           : False \
                                           atac_dim_remove                         : None \
                                           atac_dimred                             : PCA \
                                           atac_feature_selection_flavour          : scanpy \
                                           atac_normalize                          : log1p \

But I still have the preprocess log outputs as:

2023-09-01 18:06:29,095: INFO - running with args:
2023-09-01 18:14:14,192: INFO - binarizing peak count matrix
2023-09-01 18:14:15,416: WARNING - Careful, you have decided to binarize data but also to normalize per cell and log1p. Not sure this is meaningful
2023-09-01 18:14:45,824: WARNING - You have 8984 Highly Variable Features
2023-09-01 18:38:51,939: INFO - Done

Is there any other variable causing the ATAC processing to default to binarizing? (One guess at a possible cause is sketched below.)
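
One thing worth checking (a hypothesis on my side, not confirmed from the source): whether the flag reaches run_preprocess_atac.py as the string "False", which Python treats as truthy. A minimal illustration with a defensive parser:

    # If argparse hands booleans over as plain strings, "False" is truthy:
    binarize = "False"
    if binarize:
        print("binarizing peak count matrix")  # runs despite binarize: False

    # Defensive coercion (sketch):
    def str2bool(value) -> bool:
        return str(value).strip().lower() in ("true", "1", "yes")

    assert str2bool("False") is False
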
Thank you!

issue with plotting covariates and faceting plots uniformly across methods in panpipes_integration.py

Hiya!
I noticed this while looking at plots for the different covariates across the multiple batch correction methods when running the integration workflow from panpipes. The same colours are not used to depict the same legend categories across all the methods (in my case I noticed it for VDJ receptor subtypes), and when facet plots are created, the order of the headings is different for each method. The latter isn't a problem as such, but it does make it difficult to compare across methods in a facet plot. I'm not sure whether the first plotting issue happens for all covariates or only certain types, but I thought I would flag it. One possible remedy is sketched below the screenshots.
[two screenshots of the facet plots attached]
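
One possible remedy (a sketch, assuming the covariate is a categorical .obs column): fix the category order and pin the colours via scanpy's uns["{key}_colors"] convention before each method is plotted, so every facet uses the same colour per category.

    import pandas as pd
    import scanpy as sc

    def pin_palette(adata, key):
        """Sketch: enforce a stable category order and colour assignment for `key`."""
        cats = sorted(adata.obs[key].astype(str).unique())
        adata.obs[key] = pd.Categorical(adata.obs[key].astype(str), categories=cats)
        # assumes <= 20 categories; pick a larger palette otherwise
        adata.uns[f"{key}_colors"] = sc.pl.palettes.default_20[: len(cats)]

    pin_palette(adata, "receptor_subtype")  # hypothetical covariate name
    sc.pl.umap(adata, color="receptor_subtype")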

best,
Devika

Documentation suggestions

Clustering: make it clearer that if you want to subcluster, you need to re-run preprocess & integration before clustering.

Repertoire: there is no panpipes documentation on what gets incorporated; there doesn't seem to be a column indicating whether a sequence is productive or not.

1st Experience: Preprocessing, Integration, Clustering

Hi,

again just a few little things I ran into while running the steps Preprocessing, Integration, and Clustering.

  • Preprocessing:

    • Specifying the parameter "output_logged_mudata" in the pipeline.yml did not work for me; it threw an error. But since the pipeline.yml marks this parameter as "TODO" and slated for removal, I guessed that this error isn't so important. When leaving the parameter empty, the preprocessing worked completely fine
  • Integration:

    • Running "panpipes integration make plot_pcas --local" resulted in an error, stating:
      "Target task 'plot_pcas' is not a pipelined task in Ruffus. Is it spelt correctly ?"
  • Clustering:

    • So that others don't run into the same error: it may be helpful to mention that one needs to specify more than one clustering resolution, otherwise clustree throws an error:
      "Error: Less than two column names matched the prefix: leiden_re
      Execution halted"

Best,
Sarah

No X_pca in obsm if filtering hvgs

When filtering to keep only the top hvgs, the output h5mu does not contain the variables associated with scaling (.var 'std' or 'mean') or PCA (.obsm 'X_pca'), even though other outputs (output_pca.txt.gz, filtered_genes.tsv) indicate these steps are being run:

AnnData object with n_obs × n_vars = 370316 × 61860
    obs: 'sample_id', 'doublet_scores', 'predicted_doublets', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_rp', 'log1p_total_counts_rp', 'pct_counts_rp', 'total_counts_ig', 'log1p_total_counts_ig', 'pct_counts_ig', 'MarkersNeutro_score', 'S_score', 'G2M_score', 'batch'
    var: 'gene_ids', 'feature_types', 'genome', 'interval', 'hb', 'mt', 'rp', 'ig', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'hvg', 'log1p'
    layers: 'raw_counts'

When not filtering hvgs, this is not an issue.

Muon's LSI, HVFs

Hi,

I noticed that Muon's implementation of the LSI for ATAC data doesn't take highly variable features into account.
As far as I can see, the function takes the adata.X slot without providing the possibility to first select specific features for the LSI to be run on.
Please see their source code: https://github.com/scverse/muon/blob/master/muon/_atac/tools.py#L28
Also in their documentation, there is no such parameter: https://muon.readthedocs.io/en/latest/api/generated/muon.atac.tl.lsi.html

This means that when panpipes runs LSI in run_preprocess_atac.py, it is always run on all features, even if atac.var["highly_variable"] is defined.

Please correct me if I'm wrong.
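
If this is correct, a possible workaround (a sketch, not panpipes code) would be to run LSI on the HVF subset and copy the embedding back:

    import muon as mu

    # Run muon's LSI on the highly variable subset only, then copy the
    # cell embedding back to the full object.
    atac_hvf = atac[:, atac.var["highly_variable"]].copy()
    mu.atac.tl.lsi(atac_hvf, n_comps=50)
    atac.obsm["X_lsi"] = atac_hvf.obsm["X_lsi"]
    # .varm / .uns entries would need similar handling if downstream steps
    # rely on them (loadings only exist for the HVF subset).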

Possible scATAC extensions

Hi,

the following may be possible extensions to panpipes in regards to scATAC-data:

Preprocessing:

Visualization:

Analysis:
Not sure how deep panpipes wants to go on the analysis part, but:

RNA+ATAC:

Experience using the pipeline for the 1st time

Hi,

there are some aspects I've noticed while using QC+Preprocessing for the first time with an RNA+ATAC multiome dataset (filtered_feature_bc_matrix.h5 file):

  • Sample submission file: it was unclear to me what is meant by the cellranger "outs" folder in regards to the keys "cellranger" and "cellranger_multi". What files are expected to be in the outs folder? (The barcodes.tsv, genes.tsv and matrix.mtx, for example?)

    • I was unsure whether the folder containing the .h5 file (or cellranger outputs) needs to be named "outs"
  • Regarding the QC_mm gene lists: I didn't know before running the pipeline that one has to provide a list and that it's not optional, since the documentation of the gene list formats states "...,the user can provide custom gene lists..."

  • Regarding the QC pipeline.yml file:

    • I wasn't sure how to specify the "score_genes" parameter or what "MarkersNeutro" is (MarkersNeutro is a group of genes in the provided gene list, right?)
    • ATAC QC: I did not know how to specify the "partner_rna" parameter for the multiome (RNA+ATAC) dataset, e.g. whether to set it to "True"/"False"; it was not clear to me that this parameter needs to be left empty in my case, and an error was thrown when I tried to set "partner_rna" to the .h5 file of the RNA+ATAC data;
  • Regarding the output of the QC:

    • The scatter plot of "n_genes_by_counts x doublet_scores" was too small; I couldn't see the distribution clearly (see attached)
    • Filtering in the "Preprocessing" step of the pipeline: when wanting to filter genes by the number of cells they are expressed in (i.e. n_cells_by_counts) and by the genes' total_counts, I wasn't able to decide on a cutoff because the QC produced no plots of these two metrics
    • Violin plots of "n_genes_by_counts" and of the number of molecules in each cell (total_counts) would be nice for the user to decide on cutoffs. I know a lot of people who use those violin plots (including me); Seurat's tutorial also uses them: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
    • I ran the QC multiple times for the same dataset. Somehow I only got suggested thresholds for the RNA in tsv files the first time I ran the QC; the other times, I didn't get this output

[attached figure: scatter_sample_id_rna-genes_rna-doublet_scores_rna-numi]

AttributeError: 'NoneType' object has no attribute 'get_legend_handles_labels'

I am running the ingest workflow on multiome data. However, panpipes ingest aborts with an error when it reaches the 'run_scanpyQC_atac.py' script. The error I get is:

    sc.pl.violin(atac, qc_vars_plot,
  File "/miniconda3/envs/pipeline_env/lib/python3.9/site-packages/scanpy/plotting/_anndata.py", line 795, in violin
    g = sns.catplot(
  File ".../seaborn/categorical.py", line 2932, in catplot
    p.plot_violins(
  File ".../seaborn/categorical.py", line 1153, in plot_violins
    self._configure_legend(ax, legend_artist, common_kws)
  File ".../seaborn/categorical.py", line 420, in _configure_legend
    handles, _ = ax.get_legend_handles_labels()
AttributeError: 'NoneType' object has no attribute 'get_legend_handles_labels'

dsb normalisation: ValueError: could not convert string to float: 'Sample_587'

I am using panpipes to analyse CITE-seq data. In my submission file, the 'sample_id' column is 'Sample_587'. When I run the ingest workflow, I specify 'dsb' normalization. However, the pipeline aborts at the 'assess_background.py' script with the following error:

    sns.heatmap(plt_df.iloc[1:split_int,:], ax=ax[0])
  File "/envs/pipeline_env/lib/python3.9/site-packages/seaborn/matrix.py", line 446, in heatmap
    plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
  File ".../envs/pipeline_env/lib/python3.9/site-packages/seaborn/matrix.py", line 163, in __init__
    self._determine_cmap_params(plot_data, vmin, vmax,
  File "~/envs/pipeline_env/lib/python3.9/site-packages/seaborn/matrix.py", line 197, in _determine_cmap_params
    calc_data = plot_data.astype(float).filled(np.nan)
ValueError: could not convert string to float: 'Sample_587'

Error Visualization Continuous Variable

Hi,

I ran the visualization part of the pipeline with a muData containing both scRNA- and scATAC- data. (Before the visualization step I ran the QC, Preprocessing, Clustering successfully).
When specifying both categorical & continuous variables for the RNA (not the ATAC; I left the ATAC part empty), e.g.

rna:
  - rna:total_counts

it worked completely fine. But when only plotting categorical variables and leaving the continuous variables empty, errors were thrown and the pipeline stopped. The parameter "continuous_violin" was set to False in both cases.
The errors included:
Error in mutate():
ℹ In argument: mod = ifelse(X1 %in% c("rna", "prot", "atac", "rep"), X1, "multimodal").
Caused by error in X1 %in% c("rna", "prot", "atac", "rep"):
! object 'X1' not found
when running:
Rscript /home/sarah/anaconda3/envs/pipeline_env/lib/python3.8/site-packages/panpipes/R_scripts/plot_metrics.R --mtd_object sample1_cell_metadata.tsv --params_yaml pipeline.yml > logs/plot_metrics.log

Do I need to specify the parameters in the pipeline.yml in a specific way so that it works? What do I have to consider when only wanting to plot categorical variables?

Thanks.

refactoring scib

We have removed all scib metrics computation from the integration pipeline.

scib metrics were originally implemented to evaluate unimodal integration. Their use for evaluating multimodal integration and reference mapping has since been adopted by the community and can provide useful insights.
However, there is currently a lack of benchmarking metrics developed specifically for these tasks, which can result in misleading interpretation of integration results.
We and others in the single-cell field are currently working on generating ad-hoc benchmarking metrics for these tasks; they will be released in the near future.

Therefore, our aim for the next panpipes release is to:

  • write a separate workflow to allow extra flexibility when calling the scib computation
  • substitute scib with the faster scib-metrics package wherever possible
  • expand the set of metrics to include newly developed ad-hoc ones

We have left the calculation of scib metrics in the refmap workflow for now, as a legacy example of how these are currently computed, but we will be refactoring them in due time.

If you have ideas on implementing integration and/or refmap benchmarking metrics and want to contribute, feel free to reach out!

ingest config won't work unless igraph updated

Did a conda install... had igraph 0.10.2. When I tried to run ingest config, I got the following error: AttributeError: module 'igraph' has no attribute 'VertexClustering'

Solution: uninstall igraph, then pip re-install igraph (0.10.8). That worked.

dsb does not run when intersection is False

dsb does not run when half the samples are rna + adt and half the samples are rna only (and no intersection between rna and adt is taken).

The fix is to take the intersection of the background, mu.pp.intersect_obs(mdata_bg), prior to mu.prot.pp.dsb.
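
A sketch of that fix (assuming mdata is the filtered object and mdata_bg the background MuData, as in the issue):

    import muon as mu

    # Align rna/prot observations in the background object so cells from
    # rna-only samples don't break dsb, then run the normalisation.
    mu.pp.intersect_obs(mdata_bg)
    mu.prot.pp.dsb(mdata, mdata_bg)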

Preparing submission file for multiome data

I am running multiome data and preparing my submission file for the ingest workflow. Specifying the cellranger 'outs' folder as x_path and 'cellranger' as x_filetype results in an error. However, specifying the complete path, i.e. 'outs/filtered_feature_bc_matrix.h5', and filetype '10X_h5' solves the issue.

AttributeError: 'YTick' object has no attribute 'label'

While running 'panpipes ingest make full --local' locally on my computer I receive this error: AttributeError: 'YTick' object has no attribute 'label'.

I guess it has something to do with matplotlib.

Full error code:

Traceback (most recent call last):
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
    return_value = job_wrapper(params, user_defined_work_func,
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/ruffus/task.py", line 608, in job_wrapper_output_files
    job_wrapper_io_files(params, user_defined_work_func, register_cleanup, touch_files_only,
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/ruffus/task.py", line 540, in job_wrapper_io_files
    ret_val = user_defined_work_func(*(params[1:]))
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/panpipes/panpipes/pipeline_ingest.py", line 469, in run_dsb_clr
    P.run(cmd, **job_kwargs)
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/cgatcore/pipeline/execution.py", line 1244, in run
    benchmark_data = r.run(statement_list)
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/cgatcore/pipeline/execution.py", line 1029, in run
    raise OSError(
OSError: ---------------------------------------
Child was terminated by signal -1:
The stderr was:
/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/scvi/_settings.py:63: UserWarning: Since v1.0.0, scvi-tools no longer uses a random seed by default. Run `scvi.settings.seed = 0` to reproduce results from previous versions.
  self.seed = seed
/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/scvi/_settings.py:70: UserWarning: Setting `dl_pin_memory_gpu_training` is deprecated in v1.0 and will be removed in v1.1. Please pass in `pin_memory` to the data loaders instead.
  self.dl_pin_memory_gpu_training = (
/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/muon/_prot/preproc.py:219: UserWarning: adata.X is sparse but not in CSC format. Converting to CSC.
  warn("adata.X is sparse but not in CSC format. Converting to CSC.")
Traceback (most recent call last):
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/panpipes/python_scripts/run_preprocess_prot.py", line 144, in <module>
    pnp.plotting.ridgeplot(mdata["prot"], features=plot_features, layer="clr",  splitplot=6)
  File "/Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/panpipes/funcs/plotting.py", line 299, in ridgeplot
    tick.label.set_fontsize(10)
AttributeError: 'YTick' object has no attribute 'label'

python /Users/justina/opt/anaconda3/envs/multiome_panpipes/lib/python3.9/site-packages/panpipes/python_scripts/run_preprocess_prot.py         --filtered_mudata test_unfilt.h5mu         --figpath ./figures/prot          --channel_col sample_id --normalisation_methods clr --quantile_clipping True --clr_margin 0 > logs/run_dsb_clr.log

Matplotlib version: 3.8.0
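
Matplotlib 3.8 removed the Tick.label alias (deprecated since 3.1; label1 is the supported attribute). A hedged sketch of a fix for the loop in panpipes/funcs/plotting.py (ax being the Axes whose labels are resized):

    # Instead of tick.label.set_fontsize(10), which breaks on matplotlib >= 3.8:
    for tick in ax.yaxis.get_major_ticks():
        tick.label1.set_fontsize(10)

    # or, more simply:
    ax.tick_params(axis="y", labelsize=10)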

signac hvf selection expects gene_ids

https://github.com/DendrouLab/panpipes/blob/47f752e7a570dece48c420ae10d6298938282cbf/panpipes/funcs/scmethods.py#L42C9-L51C30

Running this selection on an atac object that doesn't have a "gene_ids" column doesn't work.
@SarahOuologuem can it be substituted with "features" instead? (The function should test whether "gene_ids" is present in .var, otherwise use "features", or else issue a warning and automatically set hvf selection to "scanpy"; see the sketch after the table below.)

                                              features  n_cells_by_counts  mean_counts  pct_dropout_by_counts  total_counts
chr1-9962-10510                        chr1-9962-10510                 12     0.005464              99.453552          12.0
chr1-180614-181999                  chr1-180614-181999                 65     0.031876              97.040073          70.0
chr1-191356-191736                  chr1-191356-191736                  3     0.001366              99.863388           3.0
chr1-267811-268201                  chr1-267811-268201                 13     0.005920              99.408015          13.0
chr1-586031-586368                  chr1-586031-586368                  3     0.001366              99.863388           3.0
...                                                ...                ...          ...                    ...           ...
KI270727.1-52104-52803          KI270727.1-52104-52803                 59     0.028689              97.313297          63.0
KI270728.1-232459-232988      KI270728.1-232459-232988                  6     0.002732              99.726776           6.0
KI270728.1-1791305-1792428  KI270728.1-1791305-1792428                  9     0.005009              99.590164          11.0
KI270734.1-117216-117331      KI270734.1-117216-117331                  5     0.002277              99.772313           5.0
KI270734.1-133749-134116      KI270734.1-133749-134116                  8     0.004098              99.635701           9.0
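
A sketch of the suggested fallback (column names taken from the issue; the flavour variable is assumed from context):

    import logging

    # Prefer "gene_ids", fall back to "features", else revert to scanpy hvf selection.
    if "gene_ids" in atac.var.columns:
        id_col = "gene_ids"
    elif "features" in atac.var.columns:
        id_col = "features"
    else:
        logging.warning("neither 'gene_ids' nor 'features' in .var; "
                        "falling back to scanpy hvf selection")
        flavour = "scanpy"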

thank you!

TypeError: _init_from_dict_() got an unexpected keyword argument 'matrix'

I am using panpipes to analyze CITE-seq data, and managed to run the ingest workflow and get the resulting h5mu file. However, when I try to load the resulting 'x_unfilt.h5mu' file in a Jupyter notebook using muon.read_h5mu(), I get an error that says: TypeError: _init_from_dict_() got an unexpected keyword argument 'matrix'.

Protein PCA fails when number of samples < number of features

In the case of a small number of samples, and when the number of features is 50, panpipes preprocess incorrectly establishes the number of principal components that should be calculated:

n_comps=min(50,all_mdata['prot'].var.shape[0]-1),

Changing this line to n_comps=min(50, all_mdata['prot'].var.shape[0]-1, all_mdata['prot'].obs.shape[0]-1) fixes the issue.

I also suggest considering changing the solver to auto below a certain threshold of cells, as it's more robust (but slower).
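
Putting both suggestions together, a hedged sketch (the cell-count threshold below is purely illustrative, not a tested value):

    import scanpy as sc

    prot = all_mdata["prot"]
    # n_comps must stay below both the number of cells and the number of features.
    n_comps = min(50, prot.n_obs - 1, prot.n_vars - 1)
    # "auto" falls back to a robust (if slower) solver choice for small data.
    solver = "auto" if prot.n_obs < 1000 else "arpack"  # illustrative threshold
    sc.pp.pca(prot, n_comps=n_comps, svd_solver=solver)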
