nf-core / crisprseq

A pipeline for the analysis of CRISPR edited data. It allows the evaluation of the quality of gene editing experiments using targeted next generation sequencing (NGS) data (`targeted`) as well as the discovery of important genes from knock-out or activation CRISPR-Cas9 screens using CRISPR pooled DNA (`screening`).
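
For orientation, a typical invocation of each mode might look like the following sketch; the samplesheet, library and profile values are placeholders, and only parameters that appear in the issues below (--analysis, --input, --library, --outdir, -profile) are used:

# targeted amplicon analysis
nextflow run nf-core/crisprseq --analysis targeted --input samplesheet.csv --outdir results -profile docker

# pooled CRISPR screening analysis
nextflow run nf-core/crisprseq --analysis screening --input samplesheet.csv --library library.tsv --outdir results -profile docker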

Home Page: https://nf-co.re/crisprseq

License: MIT License

HTML 0.76% Python 18.23% Nextflow 49.59% R 31.42%
crispr crispr-analysis crispr-cas nextflow nf-core ngs pipeline workflow

crisprseq's People

Contributors

laurencekuhl, martasanvicente, mashehu, mirpedrol, msanvicente, mschaffer-incyte, nf-core-bot


crisprseq's Issues

Is there a way to specify a common reference genome and run the analysis without protospacer

Description of feature

Hello,

This is a great pipeline. I have a feature request which I believe is simple: I have multiple samples that share the same reference genome, but providing the reference genome within the samplesheet.csv creates a separate reference.fasta for each sample, with the sample name as the contig_id. When minimap2 then maps the FASTQs to these references, all my BAM files end up with different contigs, which makes them a bit difficult to compare.

Also, I see that a large portion of the pipeline can be run without a "protospacer"; would it be possible to make it optional?

Best,

Add --control-sgrna parameter for mageck count

Description of feature

I have some non-target sgRNA in my library.

image

mageck count provides the --control-sgrna parameter for generating the null distribution.

What does the --control-sgrna CONTROL_SGRNA option do? How to use this option?
A: This option tells MAGeCK to use provided negative control sgRNAs to generate the null distribution when calculating the p values. If this option is not specified, MAGeCK generates the null distribution of RRA scores by assuming all of the genes in the library are non-essential. This approach is sometimes over-conservative, and you can improve this if you know some genes are not essential. By providing the corresponding sgRNA IDs in the --control-sgrna option, MAGeCK will have a better estimation of p values.
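
For illustration only (this is how MAGeCK documents the option, not something the pipeline currently wires in): the value passed to --control-sgrna is a plain-text file listing control sgRNA IDs, one per line, e.g. a hypothetical control_sgrnas.txt:

NonTargeting_0001
NonTargeting_0002
NonTargeting_0003

which could then accompany the usual count invocation, along the lines of: mageck count -l library.csv -n count_table --sample-label sample1,sample2 --fastq sample1.fastq.gz sample2.fastq.gz --control-sgrna control_sgrnas.txt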

Could you kindly add this parameter? Thanks!

Improve documentation describing day0 label

Description of feature

I've read the crisprseq and MAGeCK documentation a few times, and the description and function of the day0_label are still a bit unclear to me, both in terms of how it should be specified (especially if there are multiple control samples/replicates) and what specifically it does.

Let's say I have 3 control samples, and 3 treated samples, should the day0_label be set to control as is specified in my contrasts? Or should it be a comma-separated list of e.g. Control_Rep1,Control_Rep2,Control_Rep3? Or is it only appropriate for a true time-course experiment?
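
For illustration, the two forms being contrasted here would look like this on the command line (this merely restates the question, it is not an answer):

--day0_label control
--day0_label Control_Rep1,Control_Rep2,Control_Rep3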

Further, according to the MAGeCK documentation the day0 label is a required parameter, but it does not seem to be required for crisprseq to run; I get MAGeCK counts and MLE results regardless. Is it figuring out a label automatically based on the contrasts, or is this a bug?

Thanks!

Pipeline breaking when I tried to run with -r 2.1.1 instead of -r dev

Description of the bug

Hi,

I am not sure if this is a compatibility issue, but I couldn't run the pipeline with -r 2.1.1, which was suggested by one of the nf-core crisprseq developers. However, when I did get it to run using -r dev, it failed at a different step. I haven't been able to trace back the error and would appreciate any help with debugging.

Command used and terminal output

nextflow run nf-core/crisprseq -r 2.1.1 --input samplesheet_w_control.csv --mle_design_matrix design_matrix.txt --library final_input_sgrna.tsv --analysis screening --outdir /home/ap764/mageck/output -profile conda

Relevant files

Screenshot 2024-07-23 at 11 57 49 PM

The error when I ran the same command with -r dev :

`executor > local (8)
[90/f186a3] NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:FASTQC (POST-KCNM-CDR-O) | 5 of 5 ✔
[ad/da22ac] NFC…R-O,PRE-CDR-O,POST-EGF-CDR-O,POST-KCNM-CDR-O,POST-PTPRZ-CDR-O) | 1 of 1 ✔
[7b/c72af1] NFC…RISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX (design_matrix) | 1 of 1 ✔
[8d/013eae] NFC…_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE (design_matrix) | 1 of 1, failed: 1 ✘
[- ] NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MULTIQC -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/crisprseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE (design_matrix)'

Caused by:
Process NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE (design_matrix) terminated with an error exit status (127)

Command executed [/home/ap764/.nextflow/assets/nf-core/crisprseq/templates/template_fluteMLE.R]:

#!/usr/bin/env Rscript

#### author: Laurence Kuhlburger
#### Released under the MIT license. See git repository (https://github.com/nf-core/crisprseq) for full license text.
####
#### graphs mageck MLE

library(MAGeCKFlute)
library(clusterProfiler)
library(ggplot2)
library(pathview)
options(ggrepel.max.overlaps = Inf)
mle <- read.table("design_matrix.gene_summary.txt", header = TRUE, sep = " ", stringsAsFactors = FALSE)

if ("design_matrix" == "day0") {
    beta_strings <- grep("\\.beta", colnames(mle), value = TRUE)
    before_beta <- sub("\\.beta.*", "", beta_strings)
    unique_strings <- unique(before_beta)
    for (i in unique_strings) {
        tryCatch(
            {
                FluteMLE(mle, treatname = i, proj = i, pathview.top = 5)
            },
            error = function(e) {
                print(paste("Could not run FluteMLE with project", i))
            }
        )
    }
} else {
    beta_strings <- grep("\\.beta", colnames(mle), value = TRUE)
    before_beta <- sub("\\.beta.*", "", beta_strings)
    unique_strings <- unique(before_beta)
    for (i in unique_strings) {
        tryCatch(
            {
                FluteMLE(mle, treatname = i, proj = i, pathview.top = 5)
            },
            error = function(e) {
                print(paste("Could not run FluteMLE with project", i))
            }
        )
    }
}

version_file_path <- "versions.yml"
version_flute <- paste(unlist(packageVersion("MAGeCKFlute")), collapse = ".")
version_ggplot <- paste(unlist(packageVersion("ggplot2")), collapse = ".")
version_clusterprofiler <- paste(unlist(packageVersion("clusterProfiler")), collapse = ".")
version_pathview <- paste(unlist(packageVersion("pathview")), collapse = ".")

f <- file(version_file_path, "w")
writeLines('"NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE":', f)
writeLines(" MAGeCKFlute: ", f, sep = "")
writeLines(version_flute, f)
writeLines(" ggplot2: ", f, sep = "")
writeLines(version_ggplot, f)
writeLines(" clusterProfiler: ", f, sep = "")
writeLines(version_clusterprofiler, f)
writeLines(" pathview: ", f, sep = "")
writeLines(version_pathview, f)
close(f)

Command exit status:
127

Command output:
(empty)

Command error:
/usr/bin/env: 'Rscript': No such file or directory

Work dir:
/home/ap764/mageck/work/8d/013eae9d1032316baef01978a208a4

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details`

System information

No response

Replace System.exit(1)

Description of the bug

Replace System.exit(1) with Nextflow.error()

Command used and terminal output

No response

Relevant files

No response

System information

No response

Suggested addition: a PCA illustrating sample/condition/replicates

Description of feature

Maybe this isn't the typical case for most users, but my experiment has multiple replicates of treatment and control. If I wanted to check how concordant my replicates were, and how separated my treatment and control conditions were, I'd likely attempt a PCA based on the raw or normalized counts produced by MAGeCK, coloring the points both by sample and condition (similar to the DESeq2 output in nf-core/rnaseq). This could also help identify whether any samples are outliers compared to the others, which could be cross-checked against the countsummary table; e.g. perhaps one sample is an outlier in PC space and also has a lower-than-usual mapping rate or a very high Gini index.

I'd also plot the cross-sample pairwise correlations of normalized counts as an (n x n) heatmap, which would likely also be a useful output.
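
A minimal sketch of the kind of plot described above, assuming a MAGeCK count table (e.g. count_table.count_normalized.txt, as named elsewhere in this tracker) with sgRNA and Gene columns followed by one column per sample; everything else (log transform, library choices) is illustrative rather than a proposal for the actual implementation:

# Sketch only: PCA of samples from a MAGeCK normalized count table
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

counts = pd.read_csv("count_table.count_normalized.txt", sep="\t")
sample_cols = counts.columns[2:]                     # everything after sgRNA and Gene
mat = np.log2(counts[sample_cols].to_numpy().T + 1)  # samples x sgRNAs, log2-transformed

pca = PCA(n_components=2)
coords = pca.fit_transform(mat)

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for name, (x, y) in zip(sample_cols, coords):
    ax.annotate(name, (x, y))
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%})")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%})")
fig.savefig("sample_pca.png", dpi=150)

# Cross-sample correlation matrix for the heatmap suggested above
corr = pd.DataFrame(np.corrcoef(mat), index=sample_cols, columns=sample_cols)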

Sample not being detected as edited despite evidence of editing in the bam file

Description of the bug

I’ve run this pipeline now with a few hundred samples - generally it seems to be working very well. However, I notice that one sample which clearly shows editing in the bam file, is not being detected as edited. Looking at the indels QC file, almost all the reads are not passing filter. However, I can’t see from the bam file why it shouldn’t be passing filter.
Screenshot 2024-07-09 at 13 25 58

Command used and terminal output

No response

Relevant files

No response

System information

No response

MAGeCK is pulling conditions instead of unique sample names

Description of the bug

I have an experimental setup with biological triplicates of my conditions: treated_5hr, treated_7hr, and control. My sample names as specified in the samplesheet look like

treated_5hr_1
treated_5hr_2
treated_5hr_3
treated_7hr_1
treated_7hr_2
treated_7hr_3
control_1
control_2
control_3

However my MAGeCK-count log shows my sample labels as

--sample-label treated_5hr,treated_5hr,treated_5hr,treated_7hr,treated_7hr,treated_7hr,control,control,control

As a result, I can't tell my replicates apart in my counts tables (count_table.count.txt, count_table.count_normalized.txt).

sgRNA   Gene    treated_5hr     treated_5hr     treated_5hr     treated_7hr     treated_7hr     treated_7hr     control control control
sgRNA1976       CD28    303     315     134     207     374     438     399     281     329
sgRNA56069      POTEB2  350     652     587     555     501     784     558     785     509
sgRNA37077      ZC2HC1C 224     53      43      121     0       57      308     73      76
sgRNA7735       KRT5    571     458     393     396     533     811     339     278     352
sgRNA9783       OPRM1   164     39      67      167     386     120     177     145     107
[...]

Is there an additional metadata column I can pass in my samplesheet, or is there another parameter somewhere I've missed? Or is this somehow pulling names from the condition column instead?
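
For comparison, labels built from the sample column instead of the condition column would keep the replicates distinguishable (values taken directly from the samplesheet above):

--sample-label treated_5hr_1,treated_5hr_2,treated_5hr_3,treated_7hr_1,treated_7hr_2,treated_7hr_3,control_1,control_2,control_3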

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add mock samples

Description of feature

Add the functionality to analyse mock samples and remove noise from the final results.

MAGECK mle cnv correction doesn't work

Description of the bug

When the user adds a CNV correction via the config file, the pipeline takes ages and then fails.
count_table.count.txt
Brunello_RepA_Dropout_A375_Brunello_RepB_Dropout_A375_vs_Brunello_pDNA.txt

Command used and terminal output

nextflow run crisprseq -r dev -profile cfc --count_table count_table.count.txt --mle_design_matrix Brunello_RepA_Dropout_A375_Brunello_RepB_Dropout_A375_vs_Brunello_pDNA.txt  --outdir test --analysis screening -resume -dump-channels -c user.config

Relevant files

No response

System information

No response

Alignment subworkflows

Description of feature

Instead of using modules for the alignment step, use the nf-core subworkflows.

  • Use fastq_align_bwa for BWA alignment
  • Use fastq_align_bowtie2 for bowtie2 alignemnt
  • Create and use a new subworkflow for minimap2 alignment
  • Once we have Samtools stats we can also add these to the MultiQC report

** Consider creating a subworkflow with indexing + the respective alignment subworkflows

Improve metro map

Description of feature

As mentioned in a review comment we can add the "mandatory" steps in the legend.
We can also add a magnifying glass to clarify that the UMI clustering section is a zoom in of the steps.
Additionally, make sure that the text boxes do not overlap black lines.

Check if the matrix creation works properly

Description of the bug

A user seems to be having an issue with the matrix creation using the enclosed contrast file. Run the matrix creation to see if it fails.

Command used and terminal output

No response

Relevant files

reference;treatment
POST-RFP-CDR-O,PRE-CDR-O;POST-EGF-CDR-O,POST-KCNM-CDR-O,POST-PTPRZ-CDR-O

System information

No response

error in NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE file is not in PNG format

Description of the bug

The process NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE starts the FluteMLE command using the data from the gene_summary.txt file. It downloads several pathway files until pathway 'hsa05213', where it stops due to the error shown below.

The error did not occur when running only the 2.2.0/templates/template_fluteMLE.R script on my computer using the gene_summary.txt file. I also tried using the latest mageckflute image in 2.2.0/modules/local/mageck/flutemle.nf, with the same result.

Command used and terminal output

nextflow run 2.2.0 -params-file params.yaml -profile normal,singularity -resume -with-report -with-trace -with-timeline > log.txt

params.yaml
input: 'sample_sheet.csv'
outdir: 'results'
analysis: 'screening'
library: 'sgRNA_library.txt'
mle_design_matrix: 'design_matrices/T0_vs_T1_or_T2.txt' 
crisprcleanr: 'library_targets.csv'
min_reads: 3

error message:
  'select()' returned 1:1 mapping between keys and columns
  Info: Working in directory /dnext/project/78/a60432643daa4e4bad087f59d46236
  Info: Writing image file hsa05213.pathview.multi.png
  Error in png::readPNG(figure, native = FALSE) : file is not in PNG format
  Calls: FluteMLE ... arrangePathview -> lapply -> FUN -> <Anonymous> -> <Anonymous>
  In addition: There were 24 warnings (use warnings() to see them)
  Execution halted

Relevant files

No response

System information

Nextflow version: 23.04.0
Hardware: HPC
Executor: slurm
Container engine: Singularity
Version of nf-core/crisprseq: 2.2.0

No such variable CUTADAPT (v.2.2.0)

Description of the bug

Problem description:

I encountered an error while running the pipeline with the cutadapt option, specifically when passing adapter sequences using the five_prime_adapter and three_prime_adapter parameters. The pipeline execution fails with the following error message:

ERROR ~ No such variable: CUTADAPT

 -- Check script '/pod-data/pipelines/new_crisprseq/with_error/crisprseq/./workflows/crisprseq_screening.nf' at line: 89 or see '.nextflow.log' file for more details

Workaround:

I have resolved this issue locally by making the following changes:

  • Replaced CUTADAPT.out with CUTADAPT_FIVE_PRIME in line 89.
  • Replaced CUTADAPT.out with CUTADAPT_THREE_PRIME in line 98.
    These modifications allowed the pipeline to run without encountering the No such variable error. I attach the whole fragment of the modified code below:

Modified code

In the workflows/crisprseq_screening.nf:

        //set adapter seq to null to make it compatible with crispr targeted
        ch_cutadapt = ch_input.combine(Channel.value([[]]))
        if(params.five_prime_adapter) {
            CUTADAPT_FIVE_PRIME(
                ch_cutadapt
            )
            CUTADAPT_FIVE_PRIME.out.reads.combine(Channel.value([[]])).set { ch_cutadapt }
            ch_cutadapt.map{ meta, fastq, proto  ->
                meta.id = "${meta.id}_trim"
                [meta, fastq, proto]
            }.set { ch_cutadapt }

            ch_multiqc_files = ch_multiqc_files.mix(CUTADAPT_FIVE_PRIME.out.log.collect{it[1]})
            ch_versions = ch_versions.mix(CUTADAPT_FIVE_PRIME.out.versions)
        }

        if(params.three_prime_adapter) {
            CUTADAPT_THREE_PRIME(
                ch_cutadapt
            )
            ch_cutadapt = CUTADAPT_THREE_PRIME.out.reads.combine(Channel.value([[]]))
            ch_multiqc_files = ch_multiqc_files.mix(CUTADAPT_THREE_PRIME.out.log.collect{it[1]})
            ch_versions = ch_versions.mix(CUTADAPT_THREE_PRIME.out.versions)
        }

Command used and terminal output

nextflow run main.nf -profile k8s,docker -params-file parameters.json

ERROR ~ No such variable: CUTADAPT

 -- Check script '/pod-data/pipelines/new_crisprseq/with_error/crisprseq/./workflows/crisprseq_screening.nf' at line: 89 or see '.nextflow.log' file for more details

Relevant files

No response

System information

Nextflow info:

  • Version: 23.10.0 build 5889
  • Created: 15-10-2023 15:07 UTC
  • System: Linux 6.1.0-12-amd64
  • Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 17.0.8+7-LTS
  • Encoding: UTF-8 (UTF-8)

template based edits are always zero and way to increase sensitivity

Description of feature

I have a few inquiries and a request that I would greatly appreciate your assistance with.

Following my analysis of several samples using the targeted pipeline (v.2.0.0), I noticed that upon reviewing the *_edits.csv files for all samples, the counts for template-based edits consistently read as 0. Could this possibly be attributed to the fact that I only provided reference sequences and omitted templates?

Could you please clarify the distinction between reference and template sequences? Are they expected to differ, or can they be identical?

We have observed a higher number of edits than initially anticipated. Is there a way to adjust the sensitivity of the pipeline to address this? Also, should the primer sequences be retained within the reference/template sequences, or removed?

I attempted to execute the pipeline using Singularity, but encountered an issue at the "SAMPLESHEET_CHECK" stage, resulting in a "command not found" error. Unfortunately, I do not have the precise log for this. However, when using Docker, no such issues arise.

Thanks in advance for your help.

Pass file to cutadapt

Description of feature

Thanks for developing this useful pipeline.

I was wondering if it's possible to run the screening pipeline with cutadapt specifying a FASTA file of multiple adapter sequences, e.g. like here. I have tried passing --cutadapt file:adapters.fasta to the Nextflow pipeline; however, I receive an error stating that the 'adapters.fasta' file cannot be found, even though it is present locally.
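
For reference, cutadapt's file: syntax expects a FASTA of adapter sequences, along the lines of (names and sequences here are placeholders, not a recommendation):

>adapter_1
AGATCGGAAGAGC
>adapter_2
CTGTCTCTTATACACATCT

so the question is essentially whether the pipeline stages such a file into the process work directory; the 'file cannot be found' error suggests it currently is not being staged.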

I am running nextflow version 23.10.1, crisprseq v2.1.1, with singularity as a profile.

Thanks for your help.

example file for `--rra_contrasts`

MAGeCK RRA performs robust rank aggregation to identify genes that are consistently ranked highly across multiple replicate screens. To run MAGeCK RRA, --rra_contrasts should be used with a CSV file stating the two conditions to be compared.

Hi, can you show an example file for the parameter --rra_contrasts if I start with FASTQ files? Sorry, I could not figure out the input format for a long time.
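
Not an official example, but based on the contrast file quoted in the "Check if the matrix creation works properly" issue above (columns separated by a semicolon, multiple samples within a group separated by commas, names matching the samplesheet), it might look like:

reference;treatment
control;treatment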

UMI extraction

Description of feature

Create a module for extracting UMIs from sequencing reads

Please pin the version of the `nf-validation` plugin!

Description of feature

Hello!

⚠️ The nf-validation plugin will soon have a new release (v2.0) which contains breaking changes.
This will break the usage of all pipelines that don't pin the version of this plugin, as Nextflow tries to pull the latest version.

  • ❓ How to pin the version:
    Add this to your nextflow.config file (notice the @1.1.3):
plugins {
    id 'nf-validation@1.1.3'
}
  • 🚨 Release a patch release as soon as possible!
    Whether you have already pinned the version on your dev branch or are adding it now, please release these changes ASAP to avoid breaking the latest version of your pipeline when the new version of nf-validation comes out.

  • 🤓 How to use the new version of nf-validation?
    If you want to be prepared for the new version of nf-validation, note that it will use the latest JSON Schema draft. You will have to update the nextflow_schema.json and input_schema.json files (and any other JSON schemas that you use!).
    You can find a migration guide here.

Add spikes functionality

Description of feature

Add the functionality to analyse samples with spike-ins in order to remove noise from the results.

Process exceeded running time limit (8h)

Description of the bug

I encountered a running time limit error when running the CRISPR screening pipeline. Sometimes mageck mle runs for an extremely long time.

  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE (treatment_vs_control)` failed
Feb-08 15:11:52.579 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE (treatment_vs_control)'

Caused by:
  Process exceeded running time limit (8h)

Command executed:

  mageck \
      mle \
       \
      -k count_table.count.txt \
      -d treatment_vs_control.txt \
      -n treatment_vs_control

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE":
      mageck: $(mageck -v)
  END_VERSIONS
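
One common workaround is to raise the walltime for the affected process through a custom config passed with -c (as done with user.config elsewhere in this tracker); a sketch, with 48.h as an arbitrary example value:

process {
    withName: '.*:MAGECK_MLE' {
        time = 48.h
    }
}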

Command used and terminal output

No response

Relevant files

No response

System information

No response

Copy original file rather than softlink

Description of feature

Softlinks become invalid after nextflow clean. This is sometimes inconvenient, especially when I want to back up results. Could this pipeline copy results from the cache rather than just softlinking them?
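
nf-core pipelines generally expose a --publish_dir_mode option for exactly this (valid values include symlink and copy); whether crisprseq honours it should be checked in its parameter docs, but a hedged example would be:

nextflow run nf-core/crisprseq --analysis screening --input samplesheet.csv --outdir results --publish_dir_mode copy -profile docker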

ZeroDivisionError at alignment_summary.py stage due to low read number in input: mapped_reads_percentage = mapped_reads_count * 100 / total_reads_count

Description of the bug

Command executed [/Users/alantracey/.nextflow/assets/nf-core/crisprseq/templates/alignment_summary.py]:

#!/usr/bin/env python

############################
#### Summary of alignment
#### author: Júlia Mir @mirpedrol
#### Released under the MIT license. See git repository (https://github.com/nf-core/crisprseq) for full license text.
############################

import sys

import pysam

mapped_reads_count = int(pysam.view("-c", "-b", "-F", "4", "S01E1_BSM.bam"))
total_reads_count = int(pysam.view("-c", "-b", "S01E1_BSM.bam"))
mapped_reads_percentage = mapped_reads_count * 100 / total_reads_count

with open("S01E1_BSM_clustering_summary.csv", "r") as summary:
    summary_lines = summary.readlines()

add_line = True
outname = "S01E1_BSM_clustering_summary.csv".replace("_clustering_summary.csv", "_alignment_summary.csv")
with open(outname, "w") as output_file:
    for line in summary_lines:
        if "aligned-reads" not in line:
            output_file.write(line)
        else:
            output_file.write(f"aligned-reads, {mapped_reads_count} ({round(mapped_reads_percentage, 1)}%)\n")
            add_line = False
    if add_line:
        output_file.write(f"aligned-reads, {mapped_reads_count} ({round(mapped_reads_percentage, 1)}%)\n")

with open("versions.yml", "w") as f:
    f.write('"NFCORE_CRISPRSEQ:CRISPRSEQ_TARGETED:ALIGNMENT_SUMMARY":\n')
    f.write(f'    pysam: "{pysam.__version__}"\n')

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
File "/tmp/nxf.XXXXN0mHBp/.command.sh", line 15, in
mapped_reads_percentage = mapped_reads_count * 100 / total_reads_count
ZeroDivisionError: division by zero
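
One obvious guard, shown only as a sketch of the kind of fix needed (not necessarily how the maintainers resolved it):

# avoid the division by zero when no reads survive to the BAM
mapped_reads_percentage = (
    mapped_reads_count * 100 / total_reads_count if total_reads_count > 0 else 0.0
)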

Command used and terminal output

No response

Relevant files

No response

System information

No response

N characters introduced into *indels.csv

Description of the bug

Hi, I've just run crisprseq using the targeted pipeline with a read1.fastq.gz only. I heavily quality-filtered the input reads, removing any reads containing N characters. In the output indels.csv, there are many cases of N characters being reported in the "pre_ins_nt", "ins_nt" and "post_ins_nt" columns. When I check these reads in the input fastq file, the reported N characters are [ACGT] characters with Q value > 30. For the handful of reads I've looked at with these reported N characters, the majority-called insertion (normal ACGT sequence) can be found in the input sequence, further suggesting these N calls could be erroneous. My data is confidential, so I unfortunately cannot share it. However, I notice that in the test dataset output there are N's reported in some of the insertion outcomes which don't occur in the input reads, e.g. M00724:1:000000000-DC7GJ:1:1102:19229:3583 in hCas9-TRAC-a_R*.fastq.gz has AGA-N-CAT.

Command used and terminal output

No response

Relevant files

No response

System information

No response

format rra contrasts documentation

Description of the bug

The markdown table explaining the RRA contrast matrix does not seem to be well formatted on the nf-core website:
https://raw.githubusercontent.com/nf-core/test-datasets/crisprseq/testdata/rra_contrasts.txt
image

In addition, the example link points to a samplesheet. If I'm not wrong, the correct link should be https://raw.githubusercontent.com/nf-core/test-datasets/crisprseq/testdata/rra_contrasts.txt

Command used and terminal output

No response

Relevant files

No response

System information

No response

Output link on website is broken, directs to wrong url

Description of the bug

The crisprseq website output tab lets you select the outputs for targeted or screening; however, both of these point to a dead link because of a URL issue:

https://nf-co.re/crisprseq/2.2.0/docs/output/output/targeted/
https://nf-co.re/crisprseq/2.2.0/docs/output/output/screening/

It seems the second output/ is the issue, as these links work:
https://nf-co.re/crisprseq/2.2.0/docs/output/targeted/
https://nf-co.re/crisprseq/2.2.0/docs/output/screening/

Could be related to recent website changes, e.g. here

Command used and terminal output

No response

Relevant files

No response

System information

No response

The number of labels (['KO14', 'KO-control']) must be equal to the number of fastq files provided.

Description of the bug

command:

nextflow run /path-to/nf-core-crisprseq-2.0.0/workflow \
    -preview \
    --analysis screening \
    --library library.csv \
    --genome GRCh37 \
    -profile singularity \
    --igenomes_base /path-to/references \
    -offline \
    --outdir . \
    --input samplesheet.csv \
    -resume \
    --max_cpus 16 \
    --max_memory 64.GB \
    --max_time 256.h

samplesheet.csv

sample,fastq_1,fastq_2,condition
KO14,KO14_S1_L003_R1_001.fastq.gz,KO14_S1_L003_R2_001.fastq.gz,KO14
KO-control,KO-control_S6_L005_R1_001.fastq.gz,KO-control_S6_L005_R2_001.fastq.gz,KO-control

Error:

Command error:
  WARNING: Skipping mount /var/lib/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  INFO  @ Mon, 31 Jul 2023 18:09:46: Parameters: /usr/local/bin/mageck count -l library.csv -n count_table --sample-label KO14,KO-control --fastq KO14_S1_L003_R1_001.fastq.gz KO14_S1_L003_R2_001.fastq.gz KO-control_S6_L005_R1_001.fastq.gz KO-control_S6_L005_R2_001.fastq.gz
  INFO  @ Mon, 31 Jul 2023 18:09:46: Welcome to MAGeCK v0.5.9. Command: count
  ERROR @ Mon, 31 Jul 2023 18:09:46: The number of labels (['KO14', 'KO-control']) must be equal to the number of fastq files provided.

In mageck count --fastq, replicates are separated by commas and samples with different conditions are separated by spaces. However, the crisprseq pipeline wrongly passes fastq_1 and fastq_2 together to --fastq, when fastq_2 should be passed to --fastq-2.

image
  --fastq-2 FASTQ_2 [FASTQ_2 ...]
                        Paired sample fastq files (or fastq.gz files), the
                        order of which should be consistent with that in fastq
                        option.
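
For comparison, an invocation consistent with the quoted help text would pass the R2 files via --fastq-2 (file names taken from the samplesheet above, shown only to illustrate the expected argument layout):

mageck count -l library.csv -n count_table \
    --sample-label KO14,KO-control \
    --fastq KO14_S1_L003_R1_001.fastq.gz KO-control_S6_L005_R1_001.fastq.gz \
    --fastq-2 KO14_S1_L003_R2_001.fastq.gz KO-control_S6_L005_R2_001.fastq.gz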

Command used and terminal output

No response

Relevant files

No response

System information

No response

ERROR ~ No such variable: ch_library

Description of the bug

command:

nextflow run /path-to/nf-core-crisprseq-2.0.0/workflow \
    -preview \
    --analysis screening \
    --genome GRCh37 \
    -profile singularity \
    --igenomes_base /path-to/references \
    -offline \
    --outdir . \
    --input samplesheet.csv \
    -resume \
    --max_cpus 16 \
    --max_memory 64.GB \
    --max_time 256.h

samplesheet.csv

sample,fastq_1,fastq_2,condition
KO14,KO14_S1_L003_R1_001.fastq.gz,KO14_S1_L003_R2_001.fastq.gz,treatment
KO-control,KO_control_S6_L005_R1_001.fastq.gz,KO-control_S6_L005_R2_001.fastq.gz,control

output:

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/crisprseq for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7598497

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/crisprseq/blob/master/CITATIONS.md
------------------------------------------------------
[-        ] process > NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:INPUT_CHECK_... -
ERROR ~ No such variable: ch_library

 -- Check script '/path-to/nf-core-crisprseq-2.0.0/workflow/./workflows/crisprseq_screening.nf' at line: 108 or see '.nextflow.log' file for more details

crisprseq_screening.nf at line: 108

105         //
106         // MODULE: Run mageck count
107         //
108         MAGECK_COUNT (
109             joined,
110             ch_library
111         )
112
113         ch_versions = ch_versions.mix(MAGECK_COUNT.out.versions.first())
114
115
116         MAGECK_COUNT.out.count.map {
117         it -> it[1]
118         }.set { ch_counts }
119
120     } else {
121         Channel.fromPath(params.count_table)
122         .set { ch_counts }
123     }

.nextflow.log

Jul-29 22:10:07.018 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_single` matches labels `process_single` for process with name NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:INPUT_CHECK_SCREENING:SAMPLESHEET_CHECK_SCREENING
Jul-29 22:10:07.027 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Jul-29 22:10:07.027 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jul-29 22:10:07.034 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Jul-29 22:10:07.040 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=56; memory=251.6 GB; capacity=56; pollInterval=100ms; dumpInterval=5m
Jul-29 22:10:07.212 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_medium` matches labels `process_medium` for process with name NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:FASTQC
Jul-29 22:10:07.213 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withName:FASTQC` matches process NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:FASTQC
Jul-29 22:10:07.216 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Jul-29 22:10:07.217 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Jul-29 22:10:07.247 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: ch_library for class: nextflow.script.WorkflowBinding
Jul-29 22:10:07.266 [main] DEBUG nextflow.Session - The following nodes are still active:
  [operator] splitCsv
  [operator] map
  [operator] mix
  [operator] mix
  [operator] map
  [operator] reduce
  [operator] map

Jul-29 22:10:07.274 [main] ERROR nextflow.cli.Launcher - @unknown
groovy.lang.MissingPropertyException: No such property: ch_library for class: nextflow.script.WorkflowBinding
        at groovy.lang.Binding.getVariable(Binding.java:61)
        at nextflow.script.WorkflowBinding.getVariable(WorkflowBinding.groovy:140)
        at groovy.lang.Binding.getProperty(Binding.java:116)
        at nextflow.script.WorkflowBinding.getProperty(WorkflowBinding.groovy:129)
        at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:190)
        at groovy.lang.Closure.getPropertyTryThese(Closure.java:320)
        at groovy.lang.Closure.getPropertyDelegateFirst(Closure.java:310)
        at groovy.lang.Closure.getProperty(Closure.java:296)
        at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:49)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:341)
        at Script_21dbb60c$_runScript_closure2$_closure4.doCall(Script_21dbb60c:108)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at groovy.lang.Closure.call(Closure.java:412)
        at groovy.lang.Closure.call(Closure.java:406)
        at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
        at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
        at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
        at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:40)
        at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:102)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:408)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:350)
        at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
        at Script_54323686$_runScript_closure1$_closure3.doCall(Script_54323686:46)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at groovy.lang.Closure.call(Closure.java:412)
        at groovy.lang.Closure.call(Closure.java:406)
        at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
        at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
        at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
        at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:40)
        at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:102)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:408)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:350)
        at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
        at Script_54323686$_runScript_closure2$_closure4.doCall(Script_54323686:61)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at groovy.lang.Closure.call(Closure.java:412)
        at groovy.lang.Closure.call(Closure.java:406)
        at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
        at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
        at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
        at nextflow.script.IterableDef$invoke_a.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at nextflow.script.BaseScript.run0(BaseScript.groovy:183)
        at nextflow.script.BaseScript.run(BaseScript.groovy:192)
        at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
        at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:224)
        at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:130)
        at nextflow.cli.CmdRun.run(CmdRun.groovy:368)
        at nextflow.cli.Launcher.run(Launcher.groovy:494)
        at nextflow.cli.Launcher.main(Launcher.groovy:653)

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add UMIs

Description of feature

Add the subworkflow to handle samples with UMIs

Targeted pipeline does not work with BWA

Description of the bug

Hello,

I was trying to run v2.0.0 using Docker with --aligner bwa; however, I am getting an error.

nextflow run nf-core/crisprseq -profile docker --overrepresented --input samplesheet.csv --analysis targeted --outdir $PWD --aligner bwa -r 2.0.0 -resume

Command used and terminal output

nextflow run nf-core/crisprseq -profile docker  --overrepresented --input samplesheet.csv --analysis targeted --outdir $PWD -aligner bwa  -r 2.0.0 -resume


ERROR ~ Error executing process > 'NFCORE_CRISPRSEQ:CRISPRSEQ_TARGETED:BWA_INDEX (7)'

Caused by:
  Not a valid path value type: org.codehaus.groovy.runtime.NullObject (null)


Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response

--reference and --protospacer not working

Hi there, appreciate these changes as this is my exact use case, but I've found that I'm getting errors when I specify both options in v2.1.1.

I have tried multiple combinations of specifying protospacer and reference in samplesheet and params (running via nextflow tower on aws batch) - the only way that leads to successful pipeline completion is with both specified only in the samplesheet.

When I use only --protospacer I encounter an error on CRISPRSEQ_PLOTTER where the protospacer is passed as [].

When I use --reference as a path to a .fasta I encounter an error on ORIENT_REFERENCE:

Caused by:
  No such variable: id -- Check script '.nextflow/assets/nf-core/crisprseq/./workflows/../modules/local/orient_reference.nf' at line: 2

Appreciate any help you may have!

Originally posted by @benemead in #43 (comment)

count_table flag can't be taken into account

Description of the bug

If we provide the count table directly, we need to be able to run without specifying an input samplesheet; however, this currently leads to an error. This is probably coming from a validation step in the nf-validation plugin or something similar.

Command used and terminal output

nextflow run nf-core/crisprseq  -r 2.1.0 --analysis screening --count_table all_raw_counts.tsv --mle_design_matrix design_matrix/*.txt -profile cfc -dump-channels --input ../samplesheet_test.csv --outdir results


ERROR ~ ERROR: Validation of 'input' file failed!

 -- Check '.nextflow.log' file for details
The following errors have been detected:

* -- Entry 1 - fastq_1: the file or directory '/Users/laurencekuhlburger/Documents/crisprseq/nf-core-test/test-datasets/testdata/SRR8983579.fastq.gz' does not exist.
* -- Entry 2 - fastq_1: the file or directory '/Users/laurencekuhlburger/Documents/crisprseq/nf-core-test/test-datasets/testdata/SRR8983580.fastq.gz' does not exist.

Relevant files

No response

System information

No response
