lucapinello / crispresso

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

License: Other

Python 99.72% Dockerfile 0.28%

crispr-analysis crispr-cas9 crispr cas9 ngs amplicon python docker

crispresso's Issues

Amplicon Sequence

I got this error even though the amplicon sequence looks fine in Excel.

ERROR: The amplicon sequence ??ރh?????m? contains wrong characters: ? ? ? ? ? ! ? " $ ) ? ? . 0 4 H 9 D ? I ? K ? ? S U O ? ? ] ? ? ? ? ? ? ? ? ? ? ?

Thanks,

Steve

Different results from online and command line CRISPResso with same data and parameters

Hi Luca,

I normally run CRISPResso in single-read mode from the command line. When I checked against the online version, the two gave considerably different results. Specifically, I get about 50% less NHEJ from the command-line version than from the online version across samples, with the same data and parameters (see the example and attached file below). So currently I don't know which to trust.
H6_R2.fastq.tar.gz

I'm wondering whether this is because of the specific installation on my machine, whether the backends of the online and command-line versions differ, or whether their default parameters differ. Any thoughts?

On another matter, does a CRISPResso forum exist? Sometimes it would be better to address a question more broadly. In any case, thanks for the great software.

PARAMETERS:
Online:
-Single-end reads
-Seq homology for HDR: 98%
-Window size: 25
-Min average read qual: 20
-Min single bp qual: 20
-Exclude bp from left: 5
-Exclude bp from right: 40
-Amplicon seq: TCGTGCTGCTTCATGTGGTCGGGGTAGCGGCTGAAGCACTGCACGCCGTAGGTCAGGGTGGTCACGAGGGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCAGCTTGCCGTAGGT
-HDR seq: TCGTGCTGCTTCATGTGGTCGGGGTAGCGGCTGAAGCACTGCACGCCGTGGCTCAGGGTGGTCACGAGGGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCAGCTTGCCGTAGGT
-Guide seq: ACCTACGGCGTGCAGTGCTT
-All else default

Command line:
CRISPResso -r1 H6_R2.fastq -g ACCTACGGCGTGCAGTGCTT -a TCGTGCTGCTTCATGTGGTCGGGGTAGCGGCTGAAGCACTGCACGCCGTAGGTCAGGGTGGTCACGAGGGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCAGCTTGCCGTAGGT -e TCGTGCTGCTTCATGTGGTCGGGGTAGCGGCTGAAGCACTGCACGCCGTGGCTCAGGGTGGTCACGAGGGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCAGCTTGCCGTAGGT -o test --exclude_bp_from_left 5 --exclude_bp_from_right 40 --save_also_png -w 25 -q 20 -s 20

Trailing whitespace in the amplicon sequence is not tolerated

Feature request: Trim trailing whitespace in amplicon field (Column 2) of amplicon file.

Background:
I sometimes get the following error when running CRISPRessoPooled in amplicon mode:

ERROR: The amplicon sequence 4-11873390-11873412 contains wrong characters:

It is caused by trailing whitespace in the amplicon field of the amplicon file, which seems to happen frequently when Excel is involved in generating these files.
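A minimal sketch of the requested trimming, applied as a pre-processing step before the file reaches CRISPRessoPooled. It assumes a tab-delimited amplicon file; the function name `clean_amplicon_rows` and the example rows are hypothetical and not part of CRISPResso.

```python
import csv
import io


def clean_amplicon_rows(handle):
    """Yield rows of a tab-delimited amplicon file with whitespace stripped from every field."""
    for row in csv.reader(handle, delimiter="\t"):
        if not any(field.strip() for field in row):
            continue  # skip completely blank lines
        yield [field.strip() for field in row]


# Hypothetical two-row file; the second column carries trailing whitespace.
example = io.StringIO("site1\tACGTACGT  \nsite2\tTTGACCA \n")
for name, amplicon in clean_amplicon_rows(example):
    print(name, repr(amplicon))
```

Writing the cleaned rows back out with `csv.writer(..., delimiter="\t")` would give a file that no longer carries the stray whitespace.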

Python error running locally

Hello,

I tried your software online and it worked perfectly for my sample, so I installed it on my system using pip (as recommended). But when I run the same sample with exactly the same options that you use on the website (command line extracted from the report that your web tool generates), it gives me an error:

[...]
INFO  @ Tue, 25 Apr 2017 22:09:46:
         Quantifying indels/substitutions...

/modules/ogi-mbc/software/CRISPResso/0.7.0/lib/python2.7/site-packages/CRISPResso/CRISPRessoCORE.py:1336: RuntimeWarning: invalid value encountered in divide
  avg_vector_ins_all/=(effect_vector_insertion+effect_vector_insertion_hdr+effect_vector_insertion_mixed)
/modules/ogi-mbc/software/CRISPResso/0.7.0/lib/python2.7/site-packages/CRISPResso/CRISPRessoCORE.py:1337: RuntimeWarning: invalid value encountered in divide
  avg_vector_del_all/=(effect_vector_deletion+effect_vector_deletion_hdr+effect_vector_deletion_mixed)
INFO  @ Tue, 25 Apr 2017 22:18:12:
         Done!

INFO  @ Tue, 25 Apr 2017 22:18:12:
         Calculating indel distribution based on the length of the reads...

INFO  @ Tue, 25 Apr 2017 22:18:21:
         Done!

INFO  @ Tue, 25 Apr 2017 22:18:21:
         Calculating alleles frequencies...

CRITICAL @ Tue, 25 Apr 2017 22:18:21:
         Unexpected error, please check your input.

ERROR: invalid literal for int() with base 10: '0rc1'

Commandline:

CRISPResso -a CGAGAGCCGCAGCCATGAACGGCACAGAGGGCCCCAATTTTTATGTGCCCTTCTCCAACGTCACAGGCGTGGTGCGGAGCCCCTTCGAGCAGCCGCAGTACTACCTGGCGGAACCATGGCAGTTCTCCATGCTGGCAGCGTACATGTTCCTGCTCATCGTGCTGGG -r1 ../split_fastq/2.P23H2_Het_276035w_DB12.fastq.R1.fastq -r2 ../split_fastq/2.P23H2_Het_276035w_DB12.fastq.R2.fastq -q 0 -s 0 --exclude_bp_from_left 15 --exclude_bp_from_right 15 --hdr_perfect_alignment_threshold 98 -w 1 --name TMP --output_folder ./ --save_also_png

I understand that this must have something to do with my installation, but I have no clue what is going wrong.

Thanks.
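The message above shows `int()` being called on the string '0rc1', which looks like the last component of a release-candidate version string. Which package's version is being parsed is not stated in the report, so the following is only a sketch of tolerant version parsing under that assumption; `numeric_version` and the example version strings are hypothetical.

```python
import re


def numeric_version(version):
    """Return the leading integer of each dot-separated component, ignoring suffixes such as 'rc1'."""
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break  # stop at the first component that has no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


# A plain int("0rc1") raises ValueError; the helper keeps only the leading digits.
print(numeric_version("2.0.0rc1"))
print(numeric_version("1.5.1"))
```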

Error with Flash

I get the following error

[Command used]:
CRISPResso /Library/Frameworks/Python.framework/Versions/2.7/bin/CRISPResso -r1 43_S15_L001_R1_001.fastq.gz -r2 43_S15_L001_R2_001.fastq.gz -q 20 -a gagtgctggctctggcctggtgccacccgcctatgcccctccccctgccgtccccggccatcctgccccccagagtgctgaggtgtggggcgggccttctggggcacagcctgggcacagaggtggctgtgcgaagaggggcttgacctcggggttcagaaggggactttacgcgggaaggtactttccctccctccagctcccctcccccgcgtccttccacctctcccggtctctcccactcctcccctggccctccacagcccctcttcttcctcccctggccctctccttcctcccagtccctccccatcccctcccccctacttttcctcctccttccctcccctcctccctgtgcttcttccctgtctctctttcccgccccgctgtacctctccctctgcccctccgctccccgttcactctccctcctcccctgcccctcgacactgtccctcccc -g CGAAGAGGGGCTTGACCTCGGGG -o 43_S15_q20_out

[Execution log]:
Filtering reads with average bp quality < 20 ...
Estimating average read length...
Merging paired sequences with Flash...
[FLASH] ERROR: Maximum overlap (-49) cannot be less than the minimum overlap (4).
Please make sure you have provided the read length and fragment length
correctly.  Or, alternatively, specify the minimum and maximum overlap
manually with the --min-overlap and --max-overlap options.
[FLASH] FLASH did not complete successfully; exiting with failure status (1)
Merging error, please check your input.

ERROR: Flash failed to run, please check the log file.

I cannot seem to specify --max-overlap, though. I am working with 150bp PE reads from MiSeq.
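A quick sanity check for this kind of failure is to estimate how far the two mates can overlap at all. The sketch below uses only paired-end geometry and assumes both reads are full length and start at opposite ends of the fragment; it is not FLASH's internal formula, and `expected_overlap` and the fragment lengths are illustrative.

```python
def expected_overlap(read_length, fragment_length):
    """Overlap in bp between two full-length mates read from opposite ends of a fragment."""
    return 2 * read_length - fragment_length


for fragment_length in (200, 300, 400):
    overlap = expected_overlap(150, fragment_length)
    status = "mates can overlap" if overlap > 0 else "mates cannot overlap"
    print(fragment_length, overlap, status)
```

A negative value means the amplicon is longer than the two reads combined, so no merge is possible regardless of the overlap settings.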

Error "'numpy.int64'

Hello @lucapinello ,

I am trying to run CRISPResso locally to genotype clonal cell lines. It stops at two different points depending on the input:

Error 1: "Quantifying indels/substitutions... " reporting:

ERROR: Zero sequences aligned, please check your amplicon sequence

CRISPResso_RUNNING_LOG_Zero_error.txt

or Error 2: "Calculating alleles frequencies... " reporting:

("'numpy.int64' object is not iterable", u'occurred at index 0')

CRISPResso_RUNNING_LOG_numpy.int64_error.txt

Could this be related to a low-quality input fastq file?

I don't think the problem is in the installation, because I have run CRISPResso locally with successful results using higher-quality fastq files.

I would like to send you an example of both kinds of files (those that worked and the current ones) so that you can check what difference prevents CRISPResso from running properly. The problem is that I can't attach them because they are too big.

Running machine: Mac (OS X El Capitan)

Thank you so much for your attention,

Andrés Marco

Window around cleavage position seems to be asymmetrical

With various data, I get different quantification results when I flip the reference amplicon sequence between sense and anti-sense strand (i.e. when I reverse-complement it) if I provide a guide RNA that I keep unchanged when I run CRISPRessoPooled. The results should be independent of that.

I think the problem is caused by the fact that the window around the cleavage position is asymmetrical with respect to the cleavage position (which by default is located between the third and fourth base of the guide sequence), i.e. more bases to the left of the cleavage position than to the right of it are taken into account when quantifying indels.

Looking at the code, I would propose
st=max(0,cut_p-half_window+1)
en=min(len(args.amplicon_seq),cut_p+half_window+1)
in lines 1228 and 1229 of CRISPRessoCORE.py. This reduced the asymmetry between the results in my example, but did not fully remove it, so it can't be the full solution.
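The proposed bounds can be checked in isolation. This sketch assumes that `cut_p` is the 0-based index of the base immediately left of the cut and that the window is the half-open interval `[st, en)`; both conventions are assumptions, and `symmetric_window` is a hypothetical helper rather than CRISPResso code.

```python
def symmetric_window(cut_p, half_window, amplicon_length):
    """Return (st, en) so that [st, en) holds half_window bases on each side of the cut."""
    st = max(0, cut_p - half_window + 1)
    en = min(amplicon_length, cut_p + half_window + 1)
    return st, en


cut_p, half_window = 50, 5
st, en = symmetric_window(cut_p, half_window, amplicon_length=120)
bases_left = cut_p - st + 1    # positions st..cut_p
bases_right = en - (cut_p + 1)  # positions cut_p+1..en-1
print(st, en, bases_left, bases_right)
```

Near either end of the amplicon the clamping with `max` and `min` still truncates one side, so the window is only symmetric when the cut is at least `half_window` bases from both ends.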

Runs for a long time

Hi,
I use CRISPResso with single reads, with the following command.

CRISPResso -r1 sample_R1_001.fastq.gz -a ATATGACCAGGTCGTACACGATGTGGATCTGCAGAAGCTGCCTGTAAGATTTGCAATGGACAGAGCTGGCCTCGTTGGTGCAGATGGTCCAACACATTGTGGGGCTTTTGATGTCACTTTCATG -g TTGGTGCAGATGGTCCAACACAT --name sample -o ${out} --trim_sequences -p ${cpu} -w 20

But it stays at the step "Calculating alleles frequencies" for a long time.

INFO  @ Thu, 31 May 2018 14:15:20:
	 Calculating alleles frequencies...

So, does CRISPResso support single reads? And what is the best way to analyze single-read data?

Longer Allele Sequences

Hello Luca,

Is there any way to have longer allele sequences in the file Alleles_frequency_table_around_cut_site_for_*.txt?

It would be useful for the analysis of HDR events that are located far away from the gRNA cutting site.

Thank you for your attention,

Andrés

Front End Code

Hello,
I love your tool, it is very well done. Would it be possible to provide the front-end code that is running on http://crispresso.rocks/ ?
Thanks!
Karly

Specify the FLASH --max-overlap parameter?

I get the following warning from FLASH about a high proportion of paired end reads overlapping by more than 100bp. This pooled dataset has many short amplicons and 150bp PE reads, so this is probably to be expected. Is it possible to specify the --max-overlap (-M) parameter to fix this?

[FLASH]  
[FLASH] Read combination statistics:
[FLASH]     Total pairs:      2554170
[FLASH]     Combined pairs:   415920
[FLASH]     Uncombined pairs: 2138250
[FLASH]     Percent combined: 16.28%
[FLASH]  
[FLASH] Writing histogram files.
[FLASH] WARNING: An unexpectedly high proportion of combined pairs (10.04%)
overlapped by more than 100 bp, the --max-overlap (-M) parameter.  Consider
increasing this parameter.  (As-is, FLASH is penalizing overlaps longer than
100 bp when considering them for possible combining!)

CRISPRessoPooledWGSCompare - hasnans

While running CRISPRessoPooledWGSCompare, I noticed a bug in the CRISPRessoPooledWGSCompareCORE.py file.

Currently the log file gives the following error:

[Command used]:
CRISPRessoPooledWGSCompare /usr/local/bin/CRISPRessoPooledWGSCompare --name A4_S186_vs_A5_S184 --sample_1_name A4_S186 --sample_2_name A5_S184 --output_folder /home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184 --save_also_png /home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186 /home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

[Execution log]:


ERROR: 'numpy.bool_' object is not callable

The solution is described here on Stack Overflow: pandas hasnan. I used the attribute rather than calling it as a method, and it worked, giving the following new output:

[Command used]:
CRISPRessoPooledWGSCompare /usr/local/bin/CRISPRessoPooledWGSCompare --name A4_S186_vs_A5_S184 --sample_1_name A4_S186 --sample_2_name A5_S184 --output_folder /home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184 --save_also_png /home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186 /home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

[Execution log]:
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_Fmn1" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_Fmn1" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_Fmn1" -n2 "A5_S184_Fmn1"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_Dntt" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_Dntt" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_Dntt" -n2 "A5_S184_Dntt"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_Ankrd10" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_Ankrd10" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_Ankrd10" -n2 "A5_S184_Ankrd10"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_Mt1" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_Mt1" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_Mt1" -n2 "A5_S184_Mt1"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_Psmd13" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_Psmd13" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_Psmd13" -n2 "A5_S184_Psmd13"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_Asap1" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_Asap1" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_Asap1" -n2 "A5_S184_Asap1"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_chr10_1" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_chr10_1" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_chr10_1" -n2 "A5_S184_chr10_1"
Skipping sample chr14 since it was not processed in one or both conditions
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_chr13" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_chr13" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_chr13" -n2 "A5_S184_chr13"
Running CRISPRessoCompare:CRISPRessoCompare "/home/aiezza/amplicon_exp/cspresso/A4_S186/CRISPRessoPooled_on_A4_S186/CRISPResso_on_chr10_2" "/home/aiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/CRISPResso_on_chr10_2" -o "/home/aiezza/amplicon_exp/cspresso/A4_S186_vs_A5_S184/CRISPRessoPooledWGSCompare_on_A4_S186_vs_A5_S184" -n1 "A4_S186_chr10_2" -n2 "A5_S184_chr10_2"
All Done!

best,
Alex
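The error above is what happens when a boolean-valued property is called as if it were a method. The sketch below reproduces the pattern with a small stand-in class so that it runs without pandas; `Column` is hypothetical and only mimics an attribute named `hasnans`.

```python
import math


class Column:
    """Hypothetical stand-in for a data column; not part of pandas."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def hasnans(self):
        # Evaluated on attribute access, so the caller receives a plain bool.
        return any(isinstance(v, float) and math.isnan(v) for v in self.values)


col = Column([1.0, float("nan")])
print(col.hasnans)  # attribute access works

try:
    col.hasnans()  # calling the boolean result fails
except TypeError as exc:
    print("TypeError:", exc)
```

With a NumPy-backed column the returned object is a `numpy.bool_` rather than a Python `bool`, which matches the wording of the reported message.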

ERROR: 'transform' must be an instance of 'matplotlib.transform.Transform'

Hi Luca!
In the latest version of matplotlib (1.5.1) they changed how Transform is called. I can no longer get your code to work. Here is the error message:

CRITICAL @ Sun, 14 Aug 2016 20:37:38:
Unexpected error, please check your input.

ERROR: 'transform' must be an instance of 'matplotlib.transform.Transform'

mutation frequency plot

Hello @lucapinello

Thank you for your help so far; I was able to get CRISPResso working. I have a question about the mutation frequency plot, as seen below.

What could be the reason we see peaks at the beginning and end of the sequence? I see this in almost all my sequences. I did use '--trim_sequences' for trimming adapters.

Thank you once again

If guide seq is wrong, do not crash instead run without guide seq

Thanks for the software and the recent addition of allowing outies for flash ! TOP 👍

Would you maybe also consider the following change in the Core code?

                     if not cut_points:
                         #CHANGE ADDED here (add warning and default values instead of a crash)
                         #raise SgRNASequenceException('The guide sequence/s provided is(are) not present in the amplicon sequence! \n\nPlease check your input!')
                         warn('The guide sequence/s provided is(are) not present in the amplicon sequence! \n\nPlease check your input!, running now without guide_sequence!!')
                         cut_points=[]
                         sgRNA_intervals=[]
                         offset_plots=[]
                     else:
                         info('Cut Points from guide seq:%s' % cut_points)

This helps for throughput analysis. I know that the downstream calculation values are off, but the resulting allele_frequency_tsv should not be affected at all. What do you think?

free variable 'df_genes' referenced before assignment in enclosing scope", u'occurred at index Site1'

Hi,
I get the following error when running CRISPResso in Mixed mode (Amplicons + Genome).

ERROR: ("free variable 'df_genes' referenced before assignment in enclosing scope", u'occurred at index Site1')

My genome is not available in UCSC, so I created the gene annotations file by converting a GFF3 annotations file to a genePred file and passed it via the --gene_annotations parameter. What's wrong with it? And how can I solve this problem?
Thanks.

Alignment error, please check your input

Dear Luca,

I am encountering the below error when running the command:

CRISPResso
-r1 /work/rpapa/sbelleghem/mutant_miSeq/fastq_trimmed/TL12_R1_paired.fastq.gz
-r2 /work/rpapa/sbelleghem/mutant_miSeq/fastq_trimmed/TL12_R2_paired.fastq.gz
--amplicon_seq ATTGGATCTTAAAAGCTTGGGCTAAGCTCATGTCGACGGTCAGTAATTAGCATTCCGCATATAGTTTACAAAGCATTGCCGTTGTAAATTATTGGAAACTATAATCTTGTGCAAAAACTTGTTTTTTTATAAATATTATAAAATATATTCGTACAGGATTGAAATATAAAAAAAACATATCAGCTGCGAATAAAATTAATAGAGAATAAAAAAATATACTTATATCACAGCGACATATTTATTTTATTCTCTATTTTATTCACATTATATTTTTACTCCATGCCAAATTGATAATAGAATATGAACCTGTAACAACAGTCCTTAAAAATCCAAAACGATTATTAAGTGGTTTAATATTTTTACATAACAACATCAAATAATTTAAATTATATCTATTTCTAGGTAATACAGACAGGTGCTCAACAGGCGGTTGAAGAGTGTCAATACCAATTCCGAAACAGCCGCTGGAACTGCAGCACTGTCGAAAACAGCACTGATATATTTGGAGGAGTACTTAAATTTAGTAAGTAAAAGTTAAATTTTTGATTTAAATTTGTAAATCCTTTTTAATTGACAACCTAAATACTTATTTTTATTTGGATATATTATATAAAAATGTTGGATGAGTTTGGATTCCACTTACTACTTGGCTTCTTGAGCACTAACTTTAAAAATATATAAATTCTATTTGGAAAACGAAAGAAATAAGATTTCAAATGATCTATAACTAACAATTTTTATTATGATAAACCACAAACAACTATACAAAACGATTTACACGTAAAATTAACATATTCTCAACATATTACACAAATAATACTACCGTTAACTCAAAATTGGCATATACATATAAATAAATCTTGAATCATAAAATTCATTTCCGCTCGGATTTCAAGTCAAAGTAAGTTGTAAATTCTCAAATAATTATCGGTTGCATACATCGGCAACTCTTCAAAGGACGTGTTAAGTG
--max_paired_end_reads_overlap 150
--name TL12
--output_folder /work/rpapa/sbelleghem/mutant_miSeq/CRISPResso_out

###############
INFO @ Wed, 05 Jun 2019 13:10:08:
Finished reads; N_TOT_READS: 29195 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 6874 N_CACHED_NOTALN: 22321

INFO @ Wed, 05 Jun 2019 13:10:08:
Done!

INFO @ Wed, 05 Jun 2019 13:10:08:
Quantifying indels/substitutions...

INFO @ Wed, 05 Jun 2019 13:10:08:
Done!

CRITICAL @ Wed, 05 Jun 2019 13:10:08:
Alignment error, please check your input.

ERROR: Error: No alignments were found
#############

Would you have any advice on what to check to know what is going wrong? The amplicon should be fine as I can easily find alignable sequences in my fastq files.

Thank you for any help!

Steven

Error with FLASH for CRISPResso

I tried several times to run my already split paired-end reads with CRISPResso; unfortunately, this is the result I get every time.

[Command used]:
CRISPResso /Users/mtoetzl/anaconda/bin/CRISPResso -r1 3_HeLa_SG1_293817w_CA3_R1.fastq -r2 3_HeLa_SG1_293817w_CA3_R2.fastq -a CAAGGCTGAAATTGAGAATGAAGACTATAGTTATACAAAAGATGGAATAGGACTAGATTTGGAAAATTCTTTTAGTAACATTCTGTTATTTGTTCCTGAGTACTTAGACTTCATGCAGAATGGTAACTACTTTCTGATTTTTGTGAAGTCATGGAGCTTGAACACCTCTGGTCTGCGGATTACCACCTTGAGCTCCAATTTGTACAAAAGAGATATAACATCTGCAAAAGTCATGAATGCCACTGCTGCACTGGAGTTCCTCAAAGACATGAA -g GGTGGTAATCCGCAGACCAGAGG

[Execution log]:
Estimating average read length...
Merging paired sequences with Flash...
[FLASH] ERROR: Maximum overlap (-97) cannot be less than the minimum overlap (4).
Please make sure you have provided the read length and fragment length
correctly. Or, alternatively, specify the minimum and maximum overlap
manually with the --min-overlap and --max-overlap options.
[FLASH] FLASH did not complete successfully; exiting with failure status (1)
Merging error, please check your input.

ERROR: Flash failed to run, please check the log file.

no of reads from pie chart and alleles do not add up

Hi, I am using CRISPR guided to a target region. We then sequenced the genomic DNA (PCR product) that targets this region using MiSeq 2x250 + 10bp Index1 + 8bp Index2 (as defined in my previous question).

Alleles_frequency.txt
alleles.pdf
Quantification_of_editing_frequency.txt
pie.pdf

After reading the paper and supplementary material, we decided to use CRISPResso.py for our analysis:

~CRISPResso.py --trim_sequences -r1 sample _R1_001.fastq.gz -r2 sample_R2_001.fastq.gz -a CCTCGCAGACATTAAAGCCCgtgctttgcaggcccgaggggcgagaggttaccactgcaatcgagagacggccaccactgccatcggaggggggggtggcccgggtggaggtggcactcgggccatcgatgagggaggtggcagagacagcagcaGTGGTGATGGTAGTGAGGCC -g grna seq -o sample_out

My question: after a successful run, the number of reads shown in the pie chart and in the alleles frequency table do not add up (the plot shows only sites above 0.2%, but even in the text file the number of reads is confusing). I was wondering whether something is wrong with our command or whether we are misinterpreting the output. I have attached all the associated figures and text files.

Thank you.

ERROR: If using all scalar values, you must pass an index

Hi there,

I am trying your pipeline with single-end reads, since my overlap is over 65bp and I get an error from FLASH when trying to merge (is there any way to adjust the FLASH parameters?). Reads were quality filtered and merged with usearch.

I have attached my merged reads and my reference sequence. They gave reasonable results with GATK/MuTect2...

I get this error, thanks for your help!:

-Analysis of CRISPR/Cas9 outcomes from deep sequencing data-

                      )
                     (
                    __)__
                 C\|     |
                   \     /
                    \___/
             

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 1.0.8

INFO  @ Tue, 24 Oct 2017 10:05:27:
	 Creating Folder CRISPResso_on_JC04-4-CT13_S4_filter 

INFO  @ Tue, 24 Oct 2017 10:05:27:
	 Done! 

INFO  @ Tue, 24 Oct 2017 10:05:27:
	 Preparing files for the alignment... 

INFO  @ Tue, 24 Oct 2017 10:05:27:
	 Done! 

INFO  @ Tue, 24 Oct 2017 10:05:27:
	 Aligning sequences... 

INFO  @ Tue, 24 Oct 2017 10:09:34:
	 Align sequences to reverse complement of the amplicon... 

INFO  @ Tue, 24 Oct 2017 10:09:34:
	 Done! 

INFO  @ Tue, 24 Oct 2017 10:13:35:
	 Quantifying indels/substitutions... 

CRITICAL @ Tue, 24 Oct 2017 10:13:35:
	 Unexpected error, please check your input.

ERROR: If using all scalar values, you must pass an index

[JC04-4-CT13_S4_filter.fastq.gz](https://github.com/lucapinello/CRISPResso/files/1411814/JC04-4-CT13_S4_filter.fastq.gz)
[JC04-4-CT13_reference.txt](https://github.com/lucapinello/CRISPResso/files/1411820/JC04-4-CT13_reference.txt)

Running with non-overlapping reads

Is it possible to run CRISPResso with reads that originate from a large amplicon (a few kilobases) that has been fragmented and then sequenced? These reads would not map to only the ends of the target amplicon...

Running CRISPResso - query

Hello Developer,

Thank you for developing this tool.

I am not sure which module I should use for my purpose. We have CRISPR guided to a target region. We then sequenced the genomic DNA (PCR product) that targets this region using MiSeq 2x250 + 10bp Index1 + 8bp Index2.
I used two different approaches:

using fastq and
using the bam file (alignment using bwa)

I have attached plots from both approaches, and I am having difficulty understanding them. Could you please help with this?

4b.usingbam.pdf
4b.usingfastq.pdf

ERROR: The amplicons should be all distinct!

Hi,
I get the following stderr when running CRISPRessoPooled.

ERROR: The amplicons should be all distinct!

CRISPResso command failed (return value 127) on region #0:

Using CRISPResso 2.0.23
CRISPRessoWGS terminated with following error message:
Total region analyzed 18227
Similar message for all 18227 regions.
Running CRISPResso on region #1/18227: /home/pankum/miniconda3/lib/python2.7/site-packages/CRISPResso.py -r1 /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted/ANALYZED_REGIONS/REGION_R_1.fastq.gz -a catctctctagggcaacgtcggctgcagctgagatggctgctccccggtg -o /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted --name R_1 --needleman_wunsch_gap_extend -2 --max_rows_alleles_around_cut_to_plot 50 --aln_seed_count 5 --needleman_wunsch_aln_matrix_loc EDNAFULL --quantification_window_size 1 --quantification_window_center -3 --trimmomatic_command trimmomatic --conversion_nuc_from C --min_bp_quality_or_N 0 --default_min_aln_score 60 --needleman_wunsch_gap_incentive 1 --plot_window_size 40 --aln_seed_min 2 --needleman_wunsch_gap_open -20 --aln_seed_len 10 --conversion_nuc_to T --min_single_bp_quality 0 --exclude_bp_from_left 15 --min_average_read_quality 0 --min_frequency_alleles_around_cut_to_plot 0.2 --exclude_bp_from_right 15
CRISPResso command failed (return value 127) on region #0: "/home/pankum/miniconda3/lib/python2.7/site-packages/CRISPResso.py -r1 /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted/ANALYZED_REGIONS/REGION_R_1.fastq.gz -a catctctctagggcaacgtcggctgcagctgagatggctgctccccggtg -o /san/ongoing/CRISPER_WGS_Data/CRISpresso/B2M-KO_101_predicted/CRISPRessoWGS_on_B2M-KO_101_predicted --name R_1 --needleman_wunsch_gap_extend -2 --max_rows_alleles_around_cut_to_plot 50 --aln_seed_count 5 --needleman_wunsch_aln_matrix_loc EDNAFULL --quantification_window_size 1 --quantification_window_center -3 --trimmomatic_command trimmomatic --conversion_nuc_from C --min_bp_quality_or_N 0 --default_min_aln_score 60 --needleman_wunsch_gap_incentive 1 --plot_window_size 40 --aln_seed_min 2 --needleman_wunsch_gap_open -20 --aln_seed_len 10 --conversion_nuc_to T --min_single_bp_quality 0 --exclude_bp_from_left 15 --min_average_read_quality 0 --min_frequency_alleles_around_cut_to_plot 0.2 --exclude_bp_from_right 15"
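For the "amplicons should be all distinct" error reported at the top of this issue, it can help to list which entries share a sequence before running the pooled analysis. This is a minimal sketch over (name, sequence) pairs; the helper `find_duplicate_amplicons` and the example rows are hypothetical, and comparing sequences case-insensitively after stripping whitespace is an assumption about what should count as identical.

```python
from collections import defaultdict


def find_duplicate_amplicons(rows):
    """Map each amplicon sequence used more than once to the list of names that use it."""
    names_by_sequence = defaultdict(list)
    for name, sequence in rows:
        # Normalise so that case or stray whitespace does not hide a duplicate.
        names_by_sequence[sequence.strip().upper()].append(name)
    return {seq: names for seq, names in names_by_sequence.items() if len(names) > 1}


# Hypothetical amplicon entries.
rows = [("siteA", "ACGTTGCA"), ("siteB", "GGGTTTAA"), ("siteC", "acgttgca ")]
for sequence, names in find_duplicate_amplicons(rows).items():
    print(sequence, "is shared by", ", ".join(names))
```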

Error while generating plots with Cpf1 data

I have been using CRISPResso with Cas9 and Cpf1. So far, all the Cas9 experiments are fine, but when I run the Cpf1 data I get an error when generating the plots. I should say that plots 1a, 1b, 2, 3, 4a, 4b, and 4e are generated. It seems to fail while generating plot 9. BTW I am specifying "--guide_seq" and also "--cleavage_offset 1" when running Cpf1.

Here's the last section of the log file:
....
INFO @ Thu, 28 Jun 2018 14:53:51:
Calculating alleles frequencies...

INFO @ Thu, 28 Jun 2018 14:55:38:
Done!

INFO @ Thu, 28 Jun 2018 14:55:38:
Making Plots...

CRITICAL @ Thu, 28 Jun 2018 14:55:49:
Unexpected error, please check your input.

ERROR: 'N'

Make installing of external dependencies optional

Currently it is not possible to install CRISPResso without the external dependencies if these are not yet in the current PATH. This makes it difficult to install CRISPResso in a conda environment, where the external dependencies are installed as part of the environment (which is not yet active and therefore not detectable by the CRISPResso setup.py script).

I would propose to separate the installation of the external tools from the setup.py script, as I think that setup.py should not be responsible for installing the external dependencies to start with. If you would like to facilitate the installation of external tools, I would either provide instructions to do so or provide a separate script for installing the dependencies (or ideally both).
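One way to decouple the two concerns is to check for the external tools when the program starts instead of at install time. The sketch below uses `shutil.which`, which exists in Python 3 only; the executable names in `REQUIRED_TOOLS` are illustrative assumptions and `missing_tools` is a hypothetical helper, not part of CRISPResso.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from tool_names that cannot be found on the current PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# Illustrative executable names; adjust to the tools actually required.
REQUIRED_TOOLS = ["flash", "needle"]

missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All external tools found.")
```

Because the check runs at start-up, it sees whatever environment is active at that moment, which is the case the conda workflow described above needs.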

[FLASH] ERROR: Maximum overlap (-173) cannot be less than the minimum overlap (4).

Hi:

Thanks for the wonderful tool! But when I used it to evaluate the sgRNA indel with the following script:

CRISPResso -r1 1.fq.gz -r2 2.fq.gz -a amplicon_sequence

there was an error "[FLASH] ERROR: Maximum overlap (-173) cannot be less than the minimum overlap (4)." It seems that FLASH fails to merge when the paired reads do not cover the amplicon. Is that right? What should I do about this error? Thank you very much!

Running failed with the example sequences

Environment: CentOS6.5, Python 2.7.11, CRISPResso installed from source (master.zip downloaded from github), needle, flash and trimmomatic are in the CRISPResso dependencies dir.

Data was downloaded from http://bcb.dfci.harvard.edu/~lpinello/CRISPResso as indicated and run as:

CRISPResso -r1 reads1.fastq.gz -r2 reads2.fastq.gz -a AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT -g TGAACCAGACCACGGCCCGT

output as follows:

-Analysis of CRISPR/Cas9 outcomes from deep sequencing data-

                      )
                     (
                    __)__
                 C\|     \
                   \     /
                    \___/


[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 0.9.8

INFO  @ Tue, 02 Aug 2016 17:32:39:
     Cut Points from guide seq:[76]

WARNING @ Tue, 02 Aug 2016 17:32:39:
     Folder CRISPResso_on_reads1_reads2 already exists.

INFO  @ Tue, 02 Aug 2016 17:32:39:
     Estimating average read length...

INFO  @ Tue, 02 Aug 2016 17:32:40:
     Merging paired sequences with Flash...

INFO  @ Tue, 02 Aug 2016 17:32:41:
     Done!

INFO  @ Tue, 02 Aug 2016 17:32:42:
     Preparing files for the alignment...

INFO  @ Tue, 02 Aug 2016 17:32:42:
     Done!

INFO  @ Tue, 02 Aug 2016 17:32:42:
     Aligning sequences...

sed: couldn't write 73 items to stdout: Broken pipe
awk: (FILENAME=- FNR=809) fatal: print to "standard output" failed (Broken pipe)

gzip: stdout: Broken pipe
cat: write error: Broken pipe
INFO  @ Tue, 02 Aug 2016 17:32:42:
     Quantifying indels/substitutions...

CRITICAL @ Tue, 02 Aug 2016 17:32:42:
     Alignment error, please check your input.

ERROR: Zero sequences aligned, please check your amplicon sequence

RuntimeWarning: invalid value encountered in divide

When running control samples using CRISPResso v1.0.2, I get the following warnings:
/usr/lib/python2.7/site-packages/CRISPResso-1.0.2-py2.7.egg/CRISPResso/CRISPRessoCORE.py:1315: RuntimeWarning: invalid value encountered in divide
avg_vector_ins_all/=(effect_vector_insertion+effect_vector_insertion_hdr+effect_vector_insertion_mixed)
/usr/lib/python2.7/site-packages/CRISPResso-1.0.2-py2.7.egg/CRISPResso/CRISPRessoCORE.py:1316: RuntimeWarning: invalid value encountered in divide
avg_vector_del_all/=(effect_vector_deletion+effect_vector_deletion_hdr+effect_vector_deletion_mixed)

I did not get these warnings when I ran the same sample using v0.9.8. Here is my command:
FQ1=161219_A4_S4_L001_R1_001.fastq.gz
FQ2=161219_A4_S4_L001_R2_001.fastq.gz

AmpliconSequence=TAAGTGAATTACTTTTTTTGTCAATCATTTAACCATCTTTAACCTAAAAGAGTTTTATGTGAAATGGCTTATAATTGCTTAGAGAATATTTGTAGAGAGGCACATTTGCCAGTATTAGATTTAAAAGTGATGTTTTCTTTATCTAAATGA
sgRNA=TGTGAAATGGCTTATAATTGC
SAMPLE_NAME=38343_S_3
OUTDIR=$(pwd)

    CRISPResso \
    -r1 $FQ1 \
    -r2 $FQ2 \
    -a $AmpliconSequence \
    -g $sgRNA \
    -n $SAMPLE_NAME \
    -o $OUTDIR \
    --keep_intermediate \
    --save_also_png \
    --window_around_sgrna 10 \
    --hide_mutations_outside_window_NHEJ

I've attached two example files that generate this error. Would you please take a look?

Thanks for your help!
Matt

161219_A4_S4_L001_R2_001.fastq.gz
161219_A4_S4_L001_R1_001.fastq.gz

Left-aligning option

Hi Luca,

First of all thanks for developing this pipeline.

I have a question for you. I'm pasting below two of the alleles from the "Alleles_frequency_table_around_cut_site" file.

Aligned_Sequence Reference_Sequence Unedited %Reads #Reads
GACTGTAAGTGAATTACTTTTTTTGTCAATCA----ACCATCTTTAACCTAAAAGAGTTT GACTGTAAGTGAATTACTTTTTTTGTCAATCATTTAACCATCTTTAACCTAAAAGAGTTT True 0.24111800019683108 49
GACTGTAAGTGAATTACTTTTTTTGTCAATC----AACCATCTTTAACCTAAAAGAGTTT GACTGTAAGTGAATTACTTTTTTTGTCAATCATTTAACCATCTTTAACCTAAAAGAGTTT True 0.16730636748351543 34

As you can see, they are exactly the same except for the A, which can align to either side of the deletion. This reminds me of the LeftAlignAndTrimVariants tool from GATK, which in cases like this simplifies the output by always aligning those bases to the left. Would it be possible to improve the alignment in CRISPResso so that these are grouped as the same event?
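
Editorial note: the idea of left-aligning can be sketched in a few lines. This is a minimal illustration under the assumption that the alignment is given as two equal-length gapped strings; it is not how CRISPResso or GATK implement it, and the function name `left_align_deletions` is made up for this example. A deletion is shifted one column to the left for as long as the read base just before the gap equals the last deleted reference base, which leaves the read sequence unchanged.

```python
def left_align_deletions(aligned_read, aligned_ref):
    """Shift every run of '-' in the read as far left as possible without changing the read."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    read = list(aligned_read)
    n = len(read)
    i = 0
    while i < n:
        if read[i] != "-":
            i += 1
            continue
        j = i
        while j < n and read[j] == "-":
            j += 1
        start, end = i, j  # the gap covers columns [start, end)
        while (start > 0
               and read[start - 1] != "-"
               and aligned_ref[start - 1] != "-"
               and read[start - 1] == aligned_ref[end - 1]):
            # Move the base before the gap to the last gap column.
            read[end - 1] = read[start - 1]
            read[start - 1] = "-"
            start -= 1
            end -= 1
        i = j
    return "".join(read)


ref = "GACTGTAAGTGAATTACTTTTTTTGTCAATCATTTAACCATCTTTAACCTAAAAGAGTTT"
allele_1 = "GACTGTAAGTGAATTACTTTTTTTGTCAATCA----ACCATCTTTAACCTAAAAGAGTTT"
allele_2 = "GACTGTAAGTGAATTACTTTTTTTGTCAATC----AACCATCTTTAACCTAAAAGAGTTT"
print(left_align_deletions(allele_1, ref) == left_align_deletions(allele_2, ref))
```

With the two alleles from the table, both rows normalize to the same gapped string, so they could be counted as one event.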

Thanks

Use --min_identity_score for read vs amplicon

Hi,
I am using CRISPResso for simple amplicon-seq analysis. I initially got the error that no reads aligned, which I tracked back to the fact that the amplicon sequence is 500 bp and the reads are only 150 bp. Would it be possible to change min_identity_score so that it applies to the percentage of the read that aligned rather than of the amplicon? Pointers to where I could change this in the code would be appreciated as well. From a quick check, this number is parsed out of the needle output, which defines it as the number of identical bases divided by the total length of the alignment, which ends up being approximately the length of the amplicon in the case described above. Maybe we could parse out the identical base count and divide by the length of the read instead?
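
Editorial note: the difference between the two definitions can be made concrete with a small sketch. It assumes the alignment is available as two equal-length gapped strings; both function names are hypothetical and this is not the code CRISPResso uses to parse needle output.

```python
def count_matches(aligned_read, aligned_ref):
    """Number of columns where read and reference carry the same base."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for r, a in zip(aligned_read, aligned_ref) if r == a and r != "-")


def identity_over_alignment(aligned_read, aligned_ref):
    """Identity as a percentage of all alignment columns (roughly the amplicon length)."""
    return 100.0 * count_matches(aligned_read, aligned_ref) / len(aligned_read)


def identity_over_read(aligned_read, aligned_ref):
    """Identity as a percentage of the read's own length (gap columns in the read ignored)."""
    read_len = sum(1 for r in aligned_read if r != "-")
    if read_len == 0:
        return 0.0
    return 100.0 * count_matches(aligned_read, aligned_ref) / read_len


# A short read that matches perfectly but covers only half of the reference.
read = "--ACGT--"
ref = "TTACGTGG"
print(identity_over_alignment(read, ref), identity_over_read(read, ref))
```

A 150 bp read that matches a 500 bp amplicon perfectly would score about 30% under the first definition and 100% under the second, which is the effect described in the post.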
Thanks for the help!
-Rahul

CRISPRessoPooled - query

Hello once again @lucapinello

Sorry to bug you again. I am using CRISPRessoPooled to analyze some pooled amplicon data.

My experimental setup:

Paired-end sequencing, 4 samples, 6 amplicons and 6 guides. I prepared the description file as described in the docs and have a few questions.

Is the pooled analysis limited to only 5 amplicons?
"A description file containing the amplicon sequences used to enrich regions in the genome and some additional information. In particular, this file, is a tab delimited text file with up to 5 columns (first 2 columns required):"
I also see this error: "Skipping amplicon [site4] since no reads are aligning to it". All the other amplicons (5 of them) produce results as expected, except site4. When I process this same amplicon and guide with the same fastq files using only CRISPResso.py, it seems to work fine, so I was wondering what I am missing. Note that I am using default parameters for both analyses.
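
Editorial note: the quoted documentation limits the number of columns per line, not the number of amplicons (lines). A minimal sketch of reading such a file is given here; the column names in `COLUMNS` are an assumption about what the five columns mean (only "up to 5 columns, first 2 required" comes from the quote), the sequences are placeholders, and `read_amplicon_description` is a hypothetical helper, not a CRISPResso function.

```python
import csv
import io

# Assumed column order; only the first two columns are required.
COLUMNS = ["name", "amplicon_seq", "sgrna", "expected_hdr", "coding_seq"]


def read_amplicon_description(handle):
    """Parse a tab-delimited description file into one dict per amplicon."""
    amplicons = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        fields = [f.strip() for f in fields]
        if not any(fields):
            continue  # skip blank lines
        if not 2 <= len(fields) <= len(COLUMNS):
            raise ValueError("line %d: expected 2 to 5 tab-separated columns, got %d"
                             % (line_no, len(fields)))
        amplicons.append(dict(zip(COLUMNS, fields)))
    return amplicons


# Six amplicons, one per line, with placeholder sequences.
example = io.StringIO(
    "site1\tACGTACGTAC\tACGTAC\n"
    "site2\tTTGGCCAATT\n"
    "site3\tGGGTTTAAAC\tGTTTAA\n"
    "site4\tCCCAAATTTG\tAAATTT\n"
    "site5\tATATCGCGAT\n"
    "site6\tGCGCATATGC\tCATATG\n"
)
print(len(read_amplicon_description(example)))
```

A file saved as .csv or .xlsx would fail this check unless it is really tab-delimited plain text, which is worth verifying before a pooled run.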

Thanks a ton once again.

Frank

Issue with ONLY AMPLICONS running mode

Hello,

I am trying to run CRISPRessoPooled in ONLY AMPLICONS mode. I have a problem with the generation of the reads file, which I think is related to the following information reported in the _RUNNING_LOG file:

No samples; assembling all-inclusive block
Sorting block of length 1074 for bucket 1
(Using difference cover)
Error: reads file does not look like a FASTQ file
Error: Encountered exception: 'Unidentified exception'

I am just starting out in bioinformatic analysis, so I am not able to understand what is happening. Furthermore, I couldn't find an example of this running mode to try to solve the problem by myself. I have attached the complete _RUNNING_LOG file and the input files.

AMPLICONS.txt
AV4T6XXXX1.fastq.gz
AV4T6XXXX2.fastq.gz
CRISPRessoPooled_RUNNING_LOG.txt

The program itself must be working, because in CRISPResso mode I am able to reproduce the results obtained from the online version. But the design of our sequencing experiments requires the Pooled version to make the analysis easier and more automatic.

Thank you so much for your attention,

Andrés Marco Giménez
PhD Student
Institute for Bioengineering of Catalonia (IBEC)

ERROR: Flash failed to run, please check the log file.

Hi,

I am trying to install CRISPResso on my computer, but am having trouble using FLASH. Here's the output I get when I try to run the example files:

CRISPResso -r1 reads1.fastq -r2 reads2.fastq -a GCTTACACTTGCTTCTGACACAACTGTGTTCACGAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGAATGCCGTCACCACCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGA -e GCTTACACTTGCTTCTGACACAACTGTGTTCACGAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGTGGAAAAAAACGCCGTCACGACGTTATGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGA

-Analysis of CRISPR/Cas9 outcomes from deep sequencing data-

                      )
                     (
                    __)__
                 C\|     |
                   \     /
                    \___/
             

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 1.0.13

WARNING @ Mon, 12 Nov 2018 14:47:51:
	 Folder CRISPResso_on_reads1_reads2 already exists. 

INFO  @ Mon, 12 Nov 2018 14:47:51:
	 Estimating average read length... 

INFO  @ Mon, 12 Nov 2018 14:47:53:
	 Merging paired sequences with Flash... 

CRITICAL @ Mon, 12 Nov 2018 14:47:53:
	 Merging error, please check your input.

ERROR: Flash failed to run, please check the log file.

The running log says:
[Execution log]:
Estimating average read length...
Merging paired sequences with Flash...
/bin/sh: /Users/madeleinesitton/CRISPResso_dependencies/bin/flash: cannot execute binary file
Merging error, please check your input.

ERROR: Flash failed to run, please check the log file.
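
Editorial note: "cannot execute binary file" usually means the executable was built for a different platform than the one running it. The path in the log starts with /Users/, which suggests macOS, so one plausible cause (an assumption, not confirmed in the thread) is a Linux build of flash. The sketch below identifies the binary format from its first four bytes; `describe_binary` is a hypothetical helper and the in-memory example stands in for the real file.

```python
import io

# Magic numbers at the start of common executable formats.
MAGIC = {
    b"\x7fELF": "ELF (Linux)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
    b"\xca\xfe\xba\xbe": "Mach-O universal (macOS)",
}


def describe_binary(handle):
    """Return a label for the executable format found in a binary file handle."""
    return MAGIC.get(handle.read(4), "unknown")


# In-memory stand-in for a Linux executable; for a real file use open(path, "rb").
print(describe_binary(io.BytesIO(b"\x7fELF\x02\x01\x01\x00")))
```

If the flash binary reports ELF on a Mac, rebuilding FLASH from source on that machine, or installing a macOS build, would be the next thing to try.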

Thank you for the help!!

Last-stage error with FLASH for CRISPRessoPooled.py

Dear Luca Pinello,
I am using the CRISPRessoPooled.py script to analyze paired-end read data from Illumina.
The read length is 150 bp.
The amplicon size is 180 bp.
Below is the command that I have used several times.
It ends with an error at the last stage of the analysis.
I am new to the Linux environment. Kindly help me to resolve this issue.
Thanks
ERROR: Flash failed to run, please check the log file.

$ python CRISPRessoPooled.py -r1 /home/bilal/Sir_Qayyum_data/6297_1_1.fastq.gz -r2 /home/bilal/Sir_Qayyum_data/6297_1_2.fastq.gz -f /home/bilal/Sir_Qayyum_data/CRISPResso-master/cat.csv

-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-

              )                                            )
             (           _______________________          (
            __)__       | __  __  __     __ __  |        __)__
         C\|     \      ||__)/  \/  \|  |_ |  \ |     C\|     \
           \     /      ||   \__/\__/|__|__|__/ |       \     /
            \___/       |_______________________|        \___/
        

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 1.0.13

INFO  @ Tue, 05 Feb 2019 22:43:54:
	 Checking dependencies... 

INFO  @ Tue, 05 Feb 2019 22:43:54:
	 
 All the required dependencies are present! 

INFO  @ Tue, 05 Feb 2019 22:43:54:
	 Only the Amplicon description file was provided. The analysis will be perfomed using only the provided amplicons sequences. 

INFO  @ Tue, 05 Feb 2019 22:43:54:
	 Creating Folder CRISPRessoPooled_on_6297_1_1_6297_1_2 

WARNING @ Tue, 05 Feb 2019 22:43:54:
	 Folder CRISPRessoPooled_on_6297_1_1_6297_1_2 already exists. 

INFO  @ Tue, 05 Feb 2019 22:43:54:
	 Merging paired sequences with Flash... 

CRITICAL @ Tue, 05 Feb 2019 22:43:54:
	 

ERROR: Flash failed to run, please check the log file.

crispressoweb dockerfile

Hi @lucapinello,

Would you be willing to share the dockerfile for the crispressoweb docker image you have listed on docker hub?

modify the python scripts for availability to other CRISPR systems

Hi,
This program works best for analyzing genome-editing sequencing data from conventional Cas9 (20 bp + NGG). I would also like to use it to analyze Cpf1 (TTTN + 23 bp) sequencing data, so I need to modify the Python source to replace NGG with TTTN, among other changes. The CRISPRessoCORE.py script is complicated, and I cannot find what to change.
Could you give some hints?
Thanks.
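
Editorial note: the two layouts mentioned in the post differ in which side of the protospacer the PAM sits on. The sketch below is a standalone illustration of locating both kinds of site with a regular expression; it is not taken from CRISPRessoCORE.py, searches the forward strand only, and `find_protospacers` and the example sequences are hypothetical.

```python
import re

# Only plain bases and N are supported in the PAM for this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_protospacers(seq, pam, spacer_len, pam_side):
    """Return (start, spacer, pam) tuples for every site on the forward strand.

    pam_side is "3prime" for Cas9-like layouts (spacer then PAM) and
    "5prime" for Cpf1-like layouts (PAM then spacer).
    """
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    spacer_re = "[ACGT]{%d}" % spacer_len
    if pam_side == "3prime":
        pattern = "(?=(%s)(%s))" % (spacer_re, pam_re)
    elif pam_side == "5prime":
        pattern = "(?=(%s)(%s))" % (pam_re, spacer_re)
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    hits = []
    # The lookahead allows overlapping sites to be reported.
    for m in re.finditer(pattern, seq.upper()):
        if pam_side == "3prime":
            hits.append((m.start(), m.group(1), m.group(2)))
        else:
            hits.append((m.start(), m.group(2), m.group(1)))
    return hits


cas9_seq = "ACGTACGTACGTACGTACGT" + "TGG"
cpf1_seq = "GG" + "TTTA" + "ACGTACGTACGTACGTACGTACG" + "CC"
print(find_protospacers(cas9_seq, "NGG", 20, "3prime"))
print(find_protospacers(cpf1_seq, "TTTN", 23, "5prime"))
```

Changing the PAM alone would not be enough for a real analysis, since the expected cut position relative to the PAM also differs between nucleases; that part is outside this sketch.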

FLASH: Low Percent Combined

Greetings!

I have Illumina NextSeq500 150bp, paired end reads, generated from whole shotgun sequencing of environmental samples. These sequences have been quality filtered (Sickle, Phred > 20, default length to keep a read = 20bp). I am now trying to use FLASH to merge the paired end reads. For all of my metagenomes, I get really low numbers (as compared to the FLASH website and paper).

I ran the command as follows for all metagenomes:
./flash -M 150 -o Flash.out Forward.fastq Reverse.fastq | tee Flash.log

The range of Percent combined: Min = 13.04% ; Max = 52.72% ; Ave = 32.11%

I am curious as to why these numbers are so low or if this is considered to be "acceptable."
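
Editorial note: FLASH can only merge a pair when the two reads overlap, which requires the sequenced fragment to be shorter than the sum of the two read lengths. The arithmetic is sketched below; the fragment lengths are illustrative values, not measurements from these samples, and `min_overlap=10` is assumed to match the FLASH default. Fragments shorter than a single read are ignored here.

```python
def expected_overlap(read_len, fragment_len):
    """Overlap in bp between two reads of read_len taken from both ends of a fragment."""
    return 2 * read_len - fragment_len


def is_mergeable(read_len, fragment_len, min_overlap=10):
    """True when the pair overlaps by at least min_overlap bases."""
    return expected_overlap(read_len, fragment_len) >= min_overlap


# 150 bp paired-end reads against a range of illustrative fragment lengths.
for fragment_len in (200, 290, 350, 500):
    print(fragment_len, expected_overlap(150, fragment_len), is_mergeable(150, fragment_len))
```

For a shotgun library, a large share of fragments longer than about 290 bp would by itself explain a low combined percentage, independent of read quality.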

Many Thanks!!

Appears to be counting reads with deletions as unedited

Hi Luca,
Big fan of CRISPResso and very much looking forward to CRISPResso 2.
I was running a batch of CRISPResso analyses and came across a few wells that seem to have an unusually high number of unmodified reads. When I looked at the alleles around the predicted cut site, they appeared to have clear deletions but were counted as unmodified in every statistic.

Aligned_Sequence Reference_Sequence Unedited %Reads #Reads
AGTGGAGGATGCCTTCT--ACGTTGGTGCGTGAGATCCGG AGTGGAGGATGCCTTCTACACGTTGGTGCGTGAGATCCGG True 61.50844322453524 103082
GATGCCTTC-ACATGTCTCACGTTGGTGCGTGAGATCCGG GATGCCTTCTA-------CACGTTGGTGCGTGAGATCCGG True 35.92815800465359 60212
AGTGGAGGATGCCTTCT----------------------- AGTGGAGGATGCCTTCTACACGTTGGTGCGTGAGATCCGG False 0.6533802732859946 1095

I'm not sure why CRISPResso would be counting these reads as unmodified. I think the same thing happens in the "Left-aligning option" issue, where the reads have deletions but are counted as unedited.

Any help would be deeply appreciated.

Thanks,

Alexander Raeside
Oxford Genetics

Tests

Is this thing even working? Who knows) I think it is time to add some tests.

Flash failed to run flash

Hello,

I was trying to use CRISPRessoPooled on my sequence file but received a FLASH error. The following is the log:

`[Command used]:
CRISPRessoPooled /Library/Frameworks/Python.framework/Versions/2.7/bin/CRISPRessoPooled -r1 FGC1478_s_1_1_AGGCAGAA-ACTGCATA.fastq.gz -r2 FGC1478_s_1_2_AGGCAGAA-ACTGCATA.fastq.gz -f CRISPResso1111.xlsx

[Execution log]:
Merging paired sequences with Flash...
[FLASH] Starting FLASH v1.2.11
[FLASH] Fast Length Adjustment of SHort reads
[FLASH]
[FLASH] Input files:
[FLASH] FGC1478_s_1_1_AGGCAGAA-ACTGCATA.fastq.gz
[FLASH] FGC1478_s_1_2_AGGCAGAA-ACTGCATA.fastq.gz
[FLASH]
[FLASH] Output files:
[FLASH] CRISPRessoPooled_on_FGC1478_s_1_1_AGGCAGAA-ACTGCATA_FGC1478_s_1_2_AGGCAGAA-ACTGCATA/out.extendedFrags.fastq.gz
[FLASH] CRISPRessoPooled_on_FGC1478_s_1_1_AGGCAGAA-ACTGCATA_FGC1478_s_1_2_AGGCAGAA-ACTGCATA/out.notCombined_1.fastq.gz
[FLASH] CRISPRessoPooled_on_FGC1478_s_1_1_AGGCAGAA-ACTGCATA_FGC1478_s_1_2_AGGCAGAA-ACTGCATA/out.notCombined_2.fastq.gz
[FLASH] CRISPRessoPooled_on_FGC1478_s_1_1_AGGCAGAA-ACTGCATA_FGC1478_s_1_2_AGGCAGAA-ACTGCATA/out.hist
[FLASH] CRISPRessoPooled_on_FGC1478_s_1_1_AGGCAGAA-ACTGCATA_FGC1478_s_1_2_AGGCAGAA-ACTGCATA/out.histogram
[FLASH]
[FLASH] Parameters:
[FLASH] Min overlap: 4
[FLASH] Max overlap: 100
[FLASH] Max mismatch density: 0.250000
[FLASH] Allow "outie" pairs: false
[FLASH] Cap mismatch quals: false
[FLASH] Combiner threads: 8
[FLASH] Input format: FASTQ, phred_offset=33
[FLASH] Output format: FASTQ, phred_offset=33, gzip
[FLASH]
[FLASH] Starting reader and writer threads
[FLASH] Starting 8 combiner threads
[FLASH] Processed 25000 read pairs
[FLASH] Processed 50000 read pairs
[FLASH] Processed 75000 read pairs
[FLASH] Processed 100000 read pairs
[FLASH] Processed 125000 read pairs
[FLASH] Processed 150000 read pairs
[FLASH] Processed 175000 read pairs
[FLASH] Processed 200000 read pairs
[FLASH] Processed 225000 read pairs
[FLASH] Processed 250000 read pairs
[FLASH] Processed 275000 read pairs
[FLASH] Processed 300000 read pairs
[FLASH] Processed 325000 read pairs
[FLASH] Processed 350000 read pairs
[FLASH] Processed 375000 read pairs
[FLASH] ERROR: Qual string length (55) not the same as sequence length (250) (file "FGC1478_s_1_1_AGGCAGAA-ACTGCATA.fastq.gz", near line 1502597)
[FLASH] FLASH did not complete successfully; exiting with failure status (1)

ERROR: Flash failed to run, please check the log file.
`

It seems like there's something wrong with my fastq file, because I can run your test example without any problem, although that did not require CRISPRessoPooled.
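
Editorial note: the FLASH message points at a record whose quality string is shorter than its sequence, which often indicates a truncated or corrupted file. A minimal checker for that condition is sketched below; `find_bad_fastq_records` is a hypothetical helper, and it assumes the standard four-line FASTQ layout without wrapped sequences.

```python
import io


def find_bad_fastq_records(handle, max_report=10):
    """Return (first_line_number, message) for malformed four-line FASTQ records."""
    problems = []
    line_no = 0
    while len(problems) < max_report:
        header = handle.readline()
        if not header:
            break  # end of file
        line_no += 1
        if not header.strip():
            continue  # tolerate blank lines between records
        start = line_no
        seq, plus, qual = (handle.readline().rstrip("\r\n") for _ in range(3))
        line_no += 3
        if not header.startswith("@"):
            problems.append((start, "header does not start with '@'"))
        elif not plus.startswith("+"):
            problems.append((start, "separator line does not start with '+'"))
        elif len(seq) != len(qual):
            problems.append((start, "sequence length %d != quality length %d"
                             % (len(seq), len(qual))))
    return problems


# Second record has a truncated quality string.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIII\n")
print(find_bad_fastq_records(example))
```

For a compressed file, the handle can be opened with `gzip.open(path, "rt")` from the standard library. If the bad record is near the end of the file, an incomplete download or transfer is a likely explanation.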

CRISPResso available on bioconda

I hope this is OK; I made CRISPResso available for installation via bioconda:
https://bioconda.github.io/recipes/crispresso/README.html

(It's now possible to conda install crispresso)

Recipe source on GitHub here:
https://github.com/bioconda/bioconda-recipes/tree/master/recipes/crispresso

I haven't put it through its paces, so it is possible some changes may need to be made to the recipe (especially with respect to dependencies), but in general it is available and should be easy for users to install.

In the future, it would be nice to have tagged releases on GitHub so downstream users can keep track of versions.

Cheers,
Chris

No reads aligned?

I'm getting this error with my data, trying to align just one of the paired end read files. The amplicon input is a single line of text (I don't think it's terminated by a newline character). Alignment of the same reads to this sequence in CLC has no problems. Also test run of CRISPResso completed successfully, no problem.
[Command used]:
CRISPResso /usr/local/bin/CRISPResso -r1 2_S2_L001_R1_001.fastq.gz -a GATCGGAGAATAAGCATGAGTAGTTATTGAGATCTGGGTCTGACTGCAGGTAGCGTGGTCTTCTAGACGTTTAAGTGGGAGATTTGGAGGGGATGAGGAATGAAGGAACTTCAGGATAG AAAAGGGCTGAAGTCAAGTTCAGCTCCTAAAATGGATGTGGGAGCAAACTTTGAAGATAAACTGAATGACCCAGAGGATGAAACAGCGCAGATCAAAGAGGGGCCTGGAGCTCTGAGAAGAGAAGGAGACTCATCCGTGTTGAGTTTCCACAAGTACTGTCTTGAGTTTTGCAATAAAAGTGGGATAGC AGAGTTGAGTGAGCCGTAGGCTGAGTTCTCTCTTTTGTCTCCTAAGTTTTTATGACTACAAAAATCAGTAGTATGTCCTGAAATAATCATTAAGCTGTTTGAAAGTATGACTGCTTGCCATGTAGATACCATGGCTTGCTGAATAATCAGAAGAGGTGTGACTCTTATTCTAAAATTTGTCACAAAATG TCAAAATGAGAGACTCTGTAGGAACG

[Execution log]:
Preparing files for the alignment...
Done!
Aligning sequences...
Needleman-Wunsch global alignment of two sequences
Align sequences to reverse complement of the amplicon...
Done!
Needleman-Wunsch global alignment of two sequences
Quantifying indels/substitutions...
Alignment error, please check your input.

ERROR: Zero sequences aligned, please check your amplicon sequence
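
Editorial note: the amplicon in the command above appears to contain spaces. If they are present in the real command and are not a copy-and-paste artifact, the shell would pass only the first chunk to -a, which could by itself explain zero aligned reads; this is a guess from the pasted text, not a confirmed diagnosis. The sketch below normalizes a pasted sequence before use. `clean_amplicon` is a hypothetical helper, and accepting only A, C, G and T is an assumption about what the tool allows.

```python
def clean_amplicon(raw):
    """Remove all whitespace, upper-case the sequence, and reject non-ACGT characters."""
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        raise ValueError("unexpected characters in amplicon: %s" % " ".join(bad))
    return seq


# A sequence pasted with a space and a line break in the middle.
pasted = "GATCGGAGAATAAG CATGAGTAG\nTTATTGAG"
print(clean_amplicon(pasted))
```

Passing the cleaned sequence inside quotes on the command line avoids any further splitting by the shell.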

Trimming adapter sequences that are not Nextera

Hello,
I have a bunch of fastq.gz files to analyze, with paired-end reads.
However, the adapter sequences are not Nextera and I don't know exactly which ones they are.

Just from looking at the fastq.gz files, can I tell which adapter sequences were used, and if so, how can I then trim them with --trimmomatic_options_string?

I guess I have to create a .fa file similar to the "NexteraPE-PE.fa" file; however, I'm not sure how to do this correctly.
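
Editorial note: one rough way to guess the adapter family is to count how many reads contain the start of a few widely published adapter sequences. The sketch below does that for three common prefixes; it is a heuristic and not a replacement for the library prep documentation, and `count_adapter_hits` and the example reads are hypothetical.

```python
import io

# Short 3' adapter prefixes commonly used for adapter auto-detection.
KNOWN_ADAPTERS = {
    "Illumina TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "Illumina small RNA": "TGGAATTCTCGG",
}


def count_adapter_hits(handle, max_reads=100000):
    """Return (reads_checked, {adapter_name: reads_containing_it}) for a FASTQ handle."""
    counts = {name: 0 for name in KNOWN_ADAPTERS}
    n_reads = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:
            continue  # only the second line of each record holds the sequence
        if n_reads >= max_reads:
            break
        n_reads += 1
        seq = line.strip().upper()
        for name, adapter in KNOWN_ADAPTERS.items():
            if adapter in seq:
                counts[name] += 1
    return n_reads, counts


# Two made-up reads; the first runs into a TruSeq adapter.
example = io.StringIO(
    "@r1\nACGTACGTAGATCGGAAGAGCACAC\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
    "@r2\nACGTACGTACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
)
print(count_adapter_hits(example))
```

For gzipped input, open the file with `gzip.open(path, "rt")`. Once the adapter family is known, the matching sequences can go into a FASTA file for Trimmomatic's ILLUMINACLIP step.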

I attach the two files (paired ends) from one sequencing run.
Would you please help me out and guide me so that I am able to do this by myself in the future?
Thank you so much, I would really appreciate it! Your tool is very useful and a great contribution to the scientific community.
Best,
Alex
Won_Tae_1_S48_L001_R1_001.fastq.gz
Won_Tae_1_S48_L001_R2_001.fastq.gz

Unexpected error: invalid literal for int() with base 10: '0rc1'

JC04-4-CT13_short.txt

JC04-4-CT13_S4_filter.fastq.gz

Hi there,

I ran the web version and these files generated meaningful output, but when running the following with the command line version, I get the following error:

earnest@biolinux8[CRISPResso-master] python CRISPResso.py -r1 /home/earnest/Jeff.S/CRISPR_test/merged/filter/JC04-4-CT13_S4_filter.fastq -a TGCATGTCATCTCTTTCAGGTGTGGCATTTCAAGGGGGCTTGTGTCTTGAAAACAGCAACTGTGAGGACACTTGATAGTCATTTCCTTCAGTTCTGCTTTTGTCTCCCTAGGTGACTGTGGCCTTCCCCCAGATGTACCTAATGCCCAGCCAGCTTTGGAAGGCCGTACAAGTTTTCCCGAGGATACTGTAATAACGTACAAATGTGAAGAAAGCTTTGTGAAAATTCCTGGCGAGAAGGACTCAGT

-Analysis of CRISPR/Cas9 outcomes from deep sequencing data-

                      )
                     (
                    __)__
                 C\|     |
                   \     /
                    \___/
             

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 1.0.8

WARNING @ Tue, 14 Nov 2017 16:39:12:
	 Folder CRISPResso_on_JC04-4-CT13_S4_filter already exists. 

INFO  @ Tue, 14 Nov 2017 16:39:12:
	 Preparing files for the alignment... 

INFO  @ Tue, 14 Nov 2017 16:39:12:
	 Done! 

INFO  @ Tue, 14 Nov 2017 16:39:12:
	 Aligning sequences... 

INFO  @ Tue, 14 Nov 2017 16:40:01:
	 Align sequences to reverse complement of the amplicon... 

INFO  @ Tue, 14 Nov 2017 16:40:01:
	 Done! 

INFO  @ Tue, 14 Nov 2017 16:40:23:
	 Quantifying indels/substitutions... 

INFO  @ Tue, 14 Nov 2017 16:43:49:
	 Done! 

INFO  @ Tue, 14 Nov 2017 16:43:49:
	 Calculating indel distribution based on the length of the reads... 

INFO  @ Tue, 14 Nov 2017 16:43:51:
	 Done! 

INFO  @ Tue, 14 Nov 2017 16:43:51:
	 Calculating alleles frequencies... 

CRITICAL @ Tue, 14 Nov 2017 16:43:51:
	 Unexpected error, please check your input.

ERROR: invalid literal for int() with base 10: '0rc1'
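
Editorial note: this message is what Python raises for `int("0rc1")`. One plausible source, stated here as an assumption since the traceback is not shown, is code that splits a dependency's version string on "." and converts each piece with `int()`, which fails for a release candidate. The sketch below parses such strings more tolerantly; `parse_version` and the version string "0.19.0rc1" are illustrative only.

```python
import re


def leading_int(component):
    """Return the integer formed by the leading digits of a version component."""
    match = re.match(r"\d+", component)
    if match is None:
        raise ValueError("no leading digits in %r" % component)
    return int(match.group())


def parse_version(version):
    """Turn a dotted version string into a tuple of ints, ignoring suffixes like 'rc1'."""
    return tuple(leading_int(part) for part in version.split("."))


print(parse_version("0.19.0rc1"))
```

On the user side, installing a final release of the affected dependency instead of a release candidate would avoid the failing conversion without touching the code.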

Deprecation of convert_objects causing fatal error

I'm running CRISPRessoPooled in mixed-mode with the following command:

CRISPRessoPooled \
    --fastq_r1 A5_S184_L001_R1_001.fastq.gz \
    --fastq_r2 A5_S184_L001_R2_001.fastq.gz \
    --amplicons_file amplicons_description.txt \
    --bowtie2_index /data/ref_genome/mouse/musculus \
    --gene_annotations /data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz \
    --n_processes 4 \
    --name A5_S184 \
    --output_folder cspresso/A5_S184 \
    --save_also_png

This leads to the following output:

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Checking dependencies...

INFO  @ Tue, 12 Jul 2016 19:03:09:

 All the required dependencies are present!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Amplicon description file and bowtie2 reference genome index files provided. The analysis will be perfomed using the reads that are aligned ony to the amplicons provided and not to other genomic regions.

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Creating Folder /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Merging paired sequences with Flash...

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Loading gene coordinates from annotation file: /cvri/data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz...

INFO  @ Tue, 12 Jul 2016 19:03:11:
         The uncompressed reference fasta file for /cvri/data/ref_genome/mouse/musculus is already present! Skipping generation.

INFO  @ Tue, 12 Jul 2016 19:03:11:
         Aligning reads to the provided genome index...

INFO  @ Tue, 12 Jul 2016 18:48:20:
         Demultiplexing reads by location...

gzip: /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/MAPPED_REGIONS//*.fastq: No such file or directory
INFO  @ Tue, 12 Jul 2016 18:48:20:
         Reporting problematic regions...

/usr/local/lib/python2.7/dist-packages/CRISPResso/CRISPRessoPooledCORE.py:770: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  df_regions=df_regions.convert_objects(convert_numeric=True)
CRITICAL @ Tue, 12 Jul 2016 18:48:20:


ERROR: Cannot set a frame with no defined index and a value that cannot be converted to a Series


~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-
              )                                            )
             (           _______________________          (
            __)__       | __  __  __     __ __  |        __)__
         C\|     \      ||__)/  \/  \|  |_ |  \ |     C\|     \
           \     /      ||   \__/\__/|__|__|__/ |       \     /
            \___/       |_______________________|        \___/


[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 0.9.4

Mapping amplicons to the reference genome...

At this point the program stops executing. I found that if you alter CRISPRessoPooledCORE.py at lines 771 and 801 to df_regions=df_regions.apply(pd.to_numeric, errors='ignore'), this problem goes away, yielding these new results:

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Checking dependencies...

INFO  @ Tue, 12 Jul 2016 19:03:09:

 All the required dependencies are present!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Amplicon description file and bowtie2 reference genome index files provided. The analysis will be perfomed using the reads that are aligned ony to the amplicons provided and not to other genomic regions.

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Creating Folder /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Merging paired sequences with Flash...

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Loading gene coordinates from annotation file: /cvri/data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz...

INFO  @ Tue, 12 Jul 2016 19:03:11:
         The uncompressed reference fasta file for /cvri/data/ref_genome/mouse/musculus is already present! Skipping generation.

INFO  @ Tue, 12 Jul 2016 19:03:11:
         Aligning reads to the provided genome index...

gzip: /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/MAPPED_REGIONS//*.fastq: No such file or directory
INFO  @ Tue, 12 Jul 2016 19:05:56:
         Reporting problematic regions...

CRITICAL @ Tue, 12 Jul 2016 19:05:56:


ERROR: Cannot set a frame with no defined index and a value that cannot be converted to a Series


~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-

              )                                            )
             (           _______________________          (
            __)__       | __  __  __     __ __  |        __)__
         C\|     \      ||__)/  \/  \|  |_ |  \ |     C\|     \
           \     /      ||   \__/\__/|__|__|__/ |       \     /
            \___/       |_______________________|        \___/


[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 0.9.4

Mapping amplicons to the reference genome...

There is still an error, but this time it continues to run, even though all that was fixed was a deprecation. I'm not really sure whether that is a good thing or not...
