
scflow's Introduction

scflow


This repository contains a collection of pipelines that aid the analysis of single-cell sequencing experiments. Currently one pipeline is implemented, which supports the analysis of drop-seq and 10X sequencing data. Pipelines currently in development: 1) pseudoalignment scpipeline 2) velocyto pipeline 3) kallisto bustools pipeline.

Installation

Conda installation - in progress

The preferred method for installation is through conda/mamba. Preferably, the installation should be in a separate environment::

mamba env create -f conda/environments/scflow.yml
conda activate scflow
python setup.py develop

# Install a specific version of kb-tools from Adam's cloned repo
git clone git@github.com:Acribbs/kb_python.git
cd kb_python
python setup.py develop

scflow --help
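
After installation, it can help to confirm that the command-line tools used in this README are visible in the active environment. The following is a minimal sketch and not part of scflow: the helper name `missing_tools` is hypothetical, and the tool list (`scflow`, `kb`) is simply taken from the commands shown above.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names taken from this README; extend the list as needed.
    missing = missing_tools(["scflow", "kb"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it inside the activated `scflow` environment; any name it reports is not reachable from that environment.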

Usage

Run the scflow --help command to view the help documentation on how to run the pipelines in this repository.

To run the main droplet-based single-cell pipeline, first generate a configuration file::

scflow singlecell config

Then run the pipeline::

scflow singlecell make full -v5

Then to run the report::

scflow singlecell make build_report
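
The three usage steps can also be chained from a small script. This is only a sketch under assumptions: the `run_steps` helper and its `dry_run` flag are hypothetical and not part of scflow, and the commands are copied verbatim from the steps above. The default only prints the commands, since the generated configuration file will likely need editing before the pipeline is run.

```python
import subprocess

# Commands copied from the usage steps above, in the order they are run.
STEPS = [
    ["scflow", "singlecell", "config"],
    ["scflow", "singlecell", "make", "full", "-v5"],
    ["scflow", "singlecell", "make", "build_report"],
]


def run_steps(steps, dry_run=True):
    """Print each command in order and, unless dry_run is set, execute it.

    Execution stops at the first failing command.
    Returns the command strings that were printed or run.
    """
    done = []
    for cmd in steps:
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(cmd, check=True)
        done.append(line)
    return done


if __name__ == "__main__":
    run_steps(STEPS, dry_run=True)
```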

Documentation

Further help, which introduces single-cell analysis and provides a tutorial on running example code, can be found on Read the Docs.

Pipelines overview

scflow main quantnuclei

scflow main quantcells

seurat qc-1

seurat filter-2

seurat cluster-3

seurat doublet-4

seurat integration-5

Project Info

scflow's People

Contributors

2003100127, acribbs, annajbott, carlacohen, ecrwatson, jencyw

scflow's Issues

Dependencies missing

The following dependencies need to be installed in the scflow conda env following updates to the qc-1 pipeline:

Install DropletUtils
conda install -c bioconda bioconductor-dropletutils

Install celda
conda install -c bioconda bioconductor-celda

Bug in QC.Rmd

Line 158
ensembl <- try(useMart("ensembl", dataset = "hsapiens_gene_ensembl", mirror=ini$mirror))
if(class(ensembl) == "try-error"){
httr::set_config(httr::config(ssl_verifypeer = FALSE))
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
}

We need to remove the argument mirror=ini$mirror because this is not an argument for useMart and causes it to crash.

Filter.QC requires SeuratData installation

FilterQC.Rmd is run during the filter-2 pipeline.
This requires the SeuratData package.
Installing with mamba did not work, so I installed it from GitHub using R in the terminal:

R
# Install SeuratDisk from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("mojaveazure/seurat-disk")

QC.Rmd requires additional packages

I ran the qc-1 pipeline, which generates QC.Rmd, and attempted to render this file on the command line using:

Rscript -e "rmarkdown::render('QC.Rmd')"

Before this would work, I had to install the following packages (using mamba):
mamba install -c bioconda bioconductor-tximport
mamba install -c bioconda bioconductor-busparse
mamba install -c bioconda r-seurat
mamba install -c bioconda/label/broken r-ggthemes

quantnuclei: Error message even though pipeline is running

I've got the scflow main quantnuclei pipeline to run. However the pipeline.log always ends with the following error message, regardless of what part of the pipeline was running. The outputs are all there as expected. I think it's possibly to do with the conda env?

Error from pipeline.log
`
Original exception:

Exception #1
'<class 'TypeError'>handle_sigint() got multiple values for argument 'pool'' raised in ...
Traceback (most recent call last):
File "/home/c/ccohen/conda/obds_conda/envs/scflow/lib/python3.8/site-packages/ruffus/task.py", line 5250, in pipeline_run
job_result = ii.next(**itr_kwargs)
File "src/gevent/_imap.py", line 105, in gevent._gevent_c_imap.IMapUnordered.__next__
File "src/gevent/_imap.py", line 113, in gevent._gevent_c_imap.IMapUnordered._inext
File "src/gevent/queue.py", line 350, in gevent._gevent_cqueue.Queue.get
File "src/gevent/queue.py", line 327, in gevent._gevent_cqueue.Queue._Queue__get_or_peek
File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
TypeError: handle_sigint() got multiple values for argument 'pool'

2023-03-10 08:58:50,243 ERROR main control - end of all error messages
`

Error from nohup.out
`Original exception:

Exception #1
  '<class 'TypeError'>handle_sigint() got multiple values for argument 'pool'' raised in ...
Traceback (most recent call last):
  File "/home/c/ccohen/conda/obds_conda/envs/scflow/lib/python3.8/site-packages/ruffus/task.py", line 5250, in pipeline_run
    job_result = ii.next(**itr_kwargs)
  File "src/gevent/_imap.py", line 105, in gevent._gevent_c_imap.IMapUnordered.__next__
  File "src/gevent/_imap.py", line 113, in gevent._gevent_c_imap.IMapUnordered._inext
  File "src/gevent/queue.py", line 350, in gevent._gevent_cqueue.Queue.get
  File "src/gevent/queue.py", line 327, in gevent._gevent_cqueue.Queue._Queue__get_or_peek
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
TypeError: handle_sigint() got multiple values for argument 'pool'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/c/ccohen/conda/obds_conda/envs/scflow/bin/scflow", line 33, in <module>
sys.exit(load_entry_point('scflow', 'console_scripts', 'scflow')())
File "/ceph/project/tendonhca/ccohen/git/scflow/scpipelines/entry.py", line 104, in main
module.main(sys.argv)
File "/ceph/project/tendonhca/ccohen/git/scflow/scpipelines/pipeline_quantnuclei.py", line 269, in main
P.main(argv)
File "/home/c/ccohen/conda/obds_conda/envs/scflow/lib/python3.8/site-packages/cgatcore/pipeline/control.py", line 1528, in main
run_workflow(args)
File "/home/c/ccohen/conda/obds_conda/envs/scflow/lib/python3.8/site-packages/cgatcore/pipeline/control.py", line 1458, in run_workflow
raise ValueError("pipeline failed with errors") from ex
ValueError: pipeline failed with errors
`
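
For readers unfamiliar with this class of error, the sketch below shows one common way Python produces "got multiple values for argument". It is not the ruffus or gevent code: the `handle_sigint` defined here is a hypothetical stand-in that only shares the name from the traceback, and the real cause in the pipeline may differ. The error appears when a parameter that is already bound by keyword also receives a positional value.

```python
import functools


def handle_sigint(signum, frame, pool):
    """Hypothetical stand-in for a signal handler that needs a worker pool."""
    return pool


# Bind `pool` as a keyword argument ahead of time.
handler = functools.partial(handle_sigint, pool="worker-pool")

# Normal case: the caller supplies only (signum, frame).
assert handler(2, None) == "worker-pool"


def call_with_extra_positional():
    """Call the handler with one positional argument too many."""
    try:
        # The third positional value also lands on `pool`, which is already bound.
        handler(2, None, "unexpected")
    except TypeError as err:
        return str(err)
    return None


message = call_with_extra_positional()
print(message)
```

If a caller passes a different number of positional arguments than the handler was written for, for example after a dependency changes version, the same message can appear even though the pipeline tasks themselves completed.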

to do next

To do for pipeline:
[ ] Make pipeline downstream tasks run for both kallisto and salmon
[ ] Modify monocle to monocle3
[ ] Monocle task maximum DLL - fix issue or workaround
[ ] Modify seurat to seurat3
[ ] velocyto
[ ] benchmark

QC.Rmd filtering error

QC.Rmd line 229
The comments say to remove cells with <100 UMIs, but the command says which(metadata$nUMI > 5). Surely this should be which(metadata$nUMI > 100)?

Also when we created the Seurat object, we used min.features = 200 so this step is not going to change anything.
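
The second point can be checked with toy numbers. The sketch below is plain Python, not the R code from QC.Rmd, and rests on two assumptions: that nUMI is the total count per cell, and that min.features keeps cells with at least that many detected genes. Under those assumptions a cell with 200 or more detected genes has at least 200 UMIs, so neither threshold removes anything afterwards. All data and helper names are illustrative.

```python
def make_cell(n_genes, count_per_gene):
    """Build a toy cell as a mapping of gene name -> UMI count."""
    return {f"gene{i}": count_per_gene for i in range(n_genes)}


def n_features(cell):
    """Number of genes with at least one UMI."""
    return sum(1 for count in cell.values() if count > 0)


def n_umi(cell):
    """Total UMI count for the cell."""
    return sum(cell.values())


def filter_min_features(cells, min_features):
    """Keep cells with at least `min_features` detected genes."""
    return {name: c for name, c in cells.items() if n_features(c) >= min_features}


def filter_min_umi(cells, threshold):
    """Keep cells with more than `threshold` UMIs."""
    return {name: c for name, c in cells.items() if n_umi(c) > threshold}


cells = {
    "A": make_cell(250, 1),  # 250 features, 250 UMIs
    "B": make_cell(50, 10),  # 50 features, 500 UMIs
    "C": make_cell(3, 1),    # 3 features, 3 UMIs
}

after_creation = filter_min_features(cells, 200)
print(sorted(after_creation))                       # ['A']
print(sorted(filter_min_umi(after_creation, 5)))    # ['A']
print(sorted(filter_min_umi(after_creation, 100)))  # ['A']
```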

quantnuclei: updating dependencies

I was trying to run the scflow main quantnuclei pipeline. However, the pipeline always stops with the following error message.

Original exception:

Exception #1
  'builtins.OSError(---------------------------------------
Child was terminated by signal -127: 
The stderr was: 

kb ref -i geneset.dir/index.idx -g geneset.dir/t2g.txt -f1 geneset.dir/cdna.fa     -f2 geneset.dir/intron.fa -c1 geneset.dir/cdna_t2c.txt -c2 geneset.dir/intron_t2c.txt     --workflow nucleus  Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz Homo_sapiens.GRCh38.105.gtf.gz 2> ref.log
-----------------------------------------)' raised in ...
   Task = def pipeline_quantnuclei.build_kallisto_index(...):
   Job  = [None -> geneset.dir/index.idx]

Traceback (most recent call last):
  File "/home/cenk/miniconda3/envs/scflow/lib/python3.8/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
    return_value = job_wrapper(params, user_defined_work_func,
  File "/home/cenk/miniconda3/envs/scflow/lib/python3.8/site-packages/ruffus/task.py", line 608, in job_wrapper_output_files
    job_wrapper_io_files(params, user_defined_work_func, register_cleanup, touch_files_only,
  File "/home/cenk/miniconda3/envs/scflow/lib/python3.8/site-packages/ruffus/task.py", line 540, in job_wrapper_io_files
    ret_val = user_defined_work_func(*(params[1:]))
  File "/mnt/c373681d-f151-47ce-9133-962128da5732/cenk/Bone_Seq/scflow/scpipelines/pipeline_quantnuclei.py", line 149, in build_kallisto_index
    P.run(statement, job_memory = "100G")
  File "/home/cenk/miniconda3/envs/scflow/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 1244, in run
    benchmark_data = r.run(statement_list)
  File "/home/cenk/miniconda3/envs/scflow/lib/python3.8/site-packages/cgatcore/pipeline/execution.py", line 1029, in run
    raise OSError(
OSError: ---------------------------------------
Child was terminated by signal -127: 
The stderr was: 

kb ref -i geneset.dir/index.idx -g geneset.dir/t2g.txt -f1 geneset.dir/cdna.fa     -f2 geneset.dir/intron.fa -c1 geneset.dir/cdna_t2c.txt -c2 geneset.dir/intron_t2c.txt     --workflow nucleus  Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz Homo_sapiens.GRCh38.105.gtf.gz 2> ref.log
-----------------------------------------

When narrowed down, the error seems to be caused by termination signal -127, which usually indicates that a command was not found or that an executable is not on the system's PATH. Further investigation showed that the 'kb-python' package was not installed and accessible, as seen in the following ref.log file.

Results/ctmpu9okdu7v.sh: line 23: kb: command not found

When kb-python was installed manually, as explained on the main page, the pipeline ran without any errors.

Therefore, if we could add the following dependency to the pipeline, the whole pipeline would be easier to use:
- kb-python
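
The link between the number 127 and a missing executable can be reproduced outside the pipeline. This is a small sketch assuming a POSIX shell (Linux or macOS): the helper `shell_exit_status` and the deliberately non-existent command name are hypothetical. POSIX shells return exit status 127 when a command cannot be found, which is consistent with the `kb: command not found` line in ref.log.

```python
import subprocess


def shell_exit_status(command):
    """Run `command` through the shell; return (exit status, stderr text)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


# A command name that is assumed not to exist on the system.
status, stderr = shell_exit_status("scflow-example-missing-command-xyz --version")
print(status)  # 127 on a POSIX shell
print(stderr)  # the shell's "not found" message; exact wording varies by shell
```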

Installation issues

I used conda to install scflow

Not all dependencies are installed

  • installed kb_python via Acribbs/kb_python.git
  • used mamba to install python 3.8

mamba install python=3.8

  • required another dependency

mamba install nbconvert

pipeline_doublets-4.py

In order to run this script I had to install scDblFinder into the conda environment.

mamba install bioconductor-scdblfinder

Windows issue with fonts

The custom ggplot theme "theme_Publication" specifies base_family = "Arial".
This gives an error in Windows saying "font family not found in Windows font database".
However it does not interrupt plotting of the graphs.

This error can be overcome by using the following packages:

install.packages("extrafont")
library(extrafont)
# A specific version of one particular dependency needs to be installed.
library(remotes)
remotes::install_version("Rttf2pt1", version = "1.3.8")
extrafont::font_import()
loadfonts(device = "win")

See
https://stackoverflow.com/questions/34522732/changing-fonts-in-ggplot2
and
https://stackoverflow.com/questions/61204259/how-can-i-resolve-the-no-font-name-issue-when-importing-fonts-into-r-using-ext
