ecogenomics / app
This project is forked from minillinim/app.
Ace Pyrotag Pipeline - wrapper for QIIME and R - interfaces with PyroDB
Hey Adam,
I ran app_combine (APP 2.3.3) on some data on Brown and this worked fine. However, the next step, app_make_results, does not even start:
Variable "$VERSION" is not imported at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 1179.
Global symbol "$global_TB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 84.
Global symbol "$global_TB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 85.
Global symbol "$global_SB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 87.
Global symbol "$global_SB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 88.
Global symbol "$QIIME_TAX_tax_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 151.
Global symbol "$QIIME_TAX_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 152.
Global symbol "$QIIME_TAX_aligned_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 153.
Global symbol "$QIIME_imputed_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 154.
Global symbol "$SILVA_TAX_tax_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 158.
Global symbol "$SILVA_TAX_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 159.
Global symbol "$SILVA_TAX_aligned_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 160.
Global symbol "$SILVA_imputed_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 161.
Global symbol "$MERGED_TAX_tax_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 165.
Global symbol "$MERGED_TAX_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 166.
Global symbol "$QIIME_imputed_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 183.
...
Hopefully it's not too hard to fix.
Florent
There are apparently platform-independent chomp functions on CPAN, according to @fangly. Maybe we should use these instead of just assuming everything is Linux.
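For illustration, the same idea in Python is only a few lines; this is just a sketch of what a platform-independent chomp needs to handle, not the CPAN code:

```python
def crossplatform_chomp(line):
    r"""Strip one trailing line ending regardless of platform convention.

    Handles Unix (\n), Windows (\r\n) and old Mac (\r) endings, unlike a
    plain Perl chomp, which only strips the current platform's input
    record separator.
    """
    if line.endswith("\r\n"):
        return line[:-2]
    if line.endswith("\n") or line.endswith("\r"):
        return line[:-1]
    return line
```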
Hi Adam,
You recently included an option to stop APP after an OTU table has been generated. Is this before or after normalisation, by the way? Anyway, I was thinking that it would be nice to be able to resume an APP job.
Here is a use case: stop APP after an OTU table has been generated, muck around with the OTU table manually (e.g. remove a known contaminant OTU, or remove all eukaryotic or unknown OTUs), and then resume APP. When using universal PCR primers, one also amplifies eukaryotic rRNA genes, but when classifying against only the Greengenes database, all of these reads, which can amount to a lot, will appear as unknown. So, you see that it could be very useful to stop and then resume the APP pipeline.
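The manual editing step in this use case could be as simple as the following Python sketch (it assumes a classic tab-separated QIIME OTU table whose last column holds the consensus lineage; the keywords are only examples):

```python
def filter_otu_table(lines, bad_keywords=("Unknown", "Eukaryota")):
    """Drop OTU rows whose taxonomy string contains any unwanted keyword.

    Assumes a classic tab-separated QIIME OTU table whose last column is
    the consensus lineage; header lines start with '#'.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # keep header lines untouched
            continue
        taxonomy = line.rstrip("\n").split("\t")[-1]
        if not any(keyword in taxonomy for keyword in bad_keywords):
            kept.append(line)
    return kept
```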
Food for thought...
Florent
Need to move config parsing from app_make_results.pl to APPConfig.pm so that do_QA can get the number of threads from the config. Will do this at a later date.
app_run_analysis.pl is wasteful because it runs blastall on 10 threads. In my experience, parallelising BLAST is not effective, especially when the input contains many short reads.
So, to avoid wasting CPUs, the thread count could be decreased from 10 (-a 10) to 1 (-a 1) in this piece of code:
checkAndRunCommand("assign_taxonomy.py", [{-i => "$processing_dir/rep_set.fa",
                                           -t => $extra_param_hash->{DB}->{TAXONOMIES},
                                           -b => $extra_param_hash->{DB}->{OTUS},
                                           -m => "blast",
                                           -a => 10,
                                           -e => 0.001,
                                           -o => $processing_dir}], DIE_ON_FAILURE);
Hi,
I just noticed something with APP 2.3.3. I ran app_combine on two jobs. In each of the two config files, I had put:
DB=MERGED
However, in the resulting file, ./ac/app_ac.config, I have
DB=0
In essence, app_combine ignored or forgot the options I put in :(
Florent
Terminal output is shown below; I'm using version 2.3.3-connor on Brown.
app_make_results.pl -threads 20 -c app_265.config -b ~/db/ppk/ppk.fasta -t ~/db/ppk/ppk.nds.1
----------------------------------------------------------------
/srv/whitlam/bio/apps/sw/app/2.3.3-connor/app_make_results.pl
Version 2.3.3
Copyright (C) 2011 Michael Imelfort and Paul Dennis
This program comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under certain conditions: See the source for more details.
----------------------------------------------------------------
app_analysis_20121008/results/table_based
Checking if all the config checks out... /srv/whitlam/home/projects/EBPR_PHAGE/pyrotags/265/QA/qiime_mapping.txt ...Processing 23 samples
Finding normalisation size automatically
Normalised sample size calculated at: 1500 reads
Using taxonomy file: /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.nds.1
Using blast file: /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.fasta
Using aligned blast file: /srv/whitlam/bio/db/gg/qiime_default/gg_otus_4feb2011/rep_set/gg_99_otus_4feb2011_aligned.fasta
Using imputed file: /srv/whitlam/bio/db/gg/qiime_default/core_set_aligned.fasta.imputed
All good!
----------------------------------------------------------------
Start TABLE BASED NORMALISATION data set processing...
----------------------------------------------------------------
Copying reads for analysis...
Beginning OTU and Taxonomy module...
Picking OTUs for non normalised data set...
pick_otus.py -o app_analysis_20121008/processing/table_based/uclust_picked_otus -s 0.97 -i app_analysis_20121008/processing/table_based/non_normalised.fa
Getting a representative set...
pick_rep_set.py -i app_analysis_20121008/processing/table_based/uclust_picked_otus/non_normalised_otus.txt -f app_analysis_20121008/processing/table_based/non_normalised.fa
Assigning taxonomy for non normalised data set...
Assign taxonomy method: blast
assign_taxonomy.py -a 20 -o app_analysis_20121008/processing/table_based -m blast -t /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.nds.1 -i app_analysis_20121008/processing/table_based/non_normalised.fa_rep_set.fasta -e 0.001 -b /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.fasta
Making NON NORMALISED otu table...
make_otu_table.py -o app_analysis_20121008/results/table_based/non_normalised_otu_table.txt -t app_analysis_20121008/processing/table_based/non_normalised.fa_rep_set_tax_assignments.txt -i app_analysis_20121008/processing/table_based/uclust_picked_otus/non_normalised_otus.txt
Making NON NORMALISED otu table (Extended format)...
reformat_otu_table.py -o app_analysis_20121008/results/table_based/non_normalised_otu_table_expanded.tsv -t app_analysis_20121008/processing/table_based/non_normalised.fa_rep_set_tax_assignments.txt -i app_analysis_20121008/results/table_based/non_normalised_otu_table.txt
Rarefaction...
multiple_rarefactions.py -n 50 -o app_analysis_20121008/processing/table_based/rarefied_otu_tables/ -s 50 -m 50 -i app_analysis_20121008/results/table_based/non_normalised_otu_table.txt -x 19190
Normalizing non normalised table at 1500 sequences... [1500, 1000]
multiple_rarefactions_even_depth.py -n 1000 -o app_analysis_20121008/processing/table_based/rare_tables/ -i app_analysis_20121008/results/table_based/non_normalised_otu_table.txt -d 1500 --lineages_included --k
Calculating centroid subsampled table...
Calculating centroid OTU table from tables in app_analysis_20121008/processing/table_based/rare_tables/...
--start loading data...
--data loaded, calculating centroid...
--calculating distances of tables to centroid...
--table: -1 is the centroid table
Problem running this R command:
mantel.otu <- mantel(ave,big_frame[,,min_index]);
Got the error:
'x' and 'y' must have the same length
Warning message:
In as.dist.default(ydis) : non-square matrix
Hi Adam,
It has happened a few times that I want to put data through APP for which I have no quality scores.
As far as I know, quality scores are required by APP. So, maybe it would be good to include an option in app_combine and app_qa to make fake quality scores if the user requests it.
You could ask Dana for her script that does that.
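I don't have Dana's script, but the idea is simple enough; here is a minimal Python sketch (the constant score of 25 and the lack of line wrapping are arbitrary simplifications, not what her script does):

```python
def write_fake_qual(fasta_lines, score=25):
    """Generate fake .qual records (one constant score per base) from FASTA.

    A minimal sketch: real 454 quality files wrap long score lines,
    which is ignored here for brevity.
    """
    out, header, seqlen = [], None, 0

    def flush():
        # Emit the accumulated record, if any.
        if header is not None:
            out.append(header)
            out.append(" ".join([str(score)] * seqlen))

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, seqlen = line, 0
        else:
            seqlen += len(line)
    flush()
    return out
```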
Florent
Sometimes it is hard to match logs with the output data after analysis was done a while ago, particularly when there are multiple app_make_results folders lying around in the same base folder.
If there was a log inside the output folder, then I'd know exactly what I did and when.
When using a custom database, e.g. the 'MERGED' database:
1/ APP does not seem to check that the hardcoded database files are available before continuing its work
2/ there is no way to specify which version of the merged database to use, though apparently one can specify a couple of different Greengenes database versions
Ideally, APP should have a config file that lets people provide a database, instead of hardcoding file paths.
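For example, something like the following hypothetical config block (the section and key names are made up to mirror the TAXONOMIES/OTUS parameters that app_make_results.pl already accepts; all paths are illustrative):

```ini
# hypothetical databases.conf - every name and path here is illustrative
[MERGED]
TAXONOMIES=/path/to/merged/taxonomy.txt
OTUS=/path/to/merged/rep_set.fasta
ALIGNED=/path/to/merged/rep_set_aligned.fasta
```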
Cheers,
Florent
Given a taxonomic file that contains spaces, e.g.
k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__[Ruminococcus]; s__
APP properly generates a raw OTU table
However, after rarefaction, the spaces in the taxonomic strings are removed
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__[Ruminococcus];s__
This removal of spaces is apparently due to the QIIME script that takes care of the rarefaction and is very problematic for programs that rely on these taxonomic strings to remain unchanged.
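Until the QIIME script is fixed, one workaround is to patch the rarefied tables back, assuming the only change is the removal of the space after each ';' (Python sketch):

```python
import re

def restore_lineage_spaces(lineage):
    """Re-insert the single space after each ';' that the rarefaction
    step strips from Greengenes-style lineage strings.

    Assumes the original strings had exactly one space after each ';'.
    """
    return re.sub(r";(?=\S)", "; ", lineage)
```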
I tried to run the script in version 2.3.3 and got errors complaining about not being able to find the Krona scripts - it didn't matter whether the Krona module was loaded or not. An older version of APP (~2.1) ran fine.
Ok, I got this error during app_make_results.pl version 2.3.3:
Jacknifed beta diversity....
beta_diversity.py -o app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac -m weighted_unifrac -t app_analysis_20121002/processing/table_based/de_novo_phylogeny/non_normalised_tree.tre -i app_analysis_20121002/results/table_based/non_normalised_otu_table.txt
upgma_cluster.py -o app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac/weighted_unifrac_non_normalised_otu_table_upgma.tre -i app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac/weighted_unifrac_non_normalised_otu_table.txt
Traceback (most recent call last):
File "/srv/whitlam/bio/apps/12.04/sw/FrankenQIIME/1.2.0/bin/upgma_cluster.py", line 53, in <module>
main()
File "/srv/whitlam/bio/apps/12.04/sw/FrankenQIIME/1.2.0/bin/upgma_cluster.py", line 47, in main
single_file_upgma(opts.input_path, opts.output_path)
File "/srv/whitlam/bio/apps/12.04/sw/FrankenQIIME/1.2.0/lib/qiime/hierarchical_cluster.py", line 56, in single_file_upgma
Ensure it has more than one sample present""" % (str(input_file),))
RuntimeError: input file app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac/weighted_unifrac_non_normalised_otu_table.txt did not make a UPGMA tree.
Ensure it has more than one sample present
**ERROR: /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl : No such file or directory
at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 622.
I did not really investigate this issue further. Considering that I have a single sample, it should be pretty obvious what is happening here: one cannot calculate beta diversity from a single sample. The APP pipeline should skip any beta-diversity computations when there are fewer than two samples.
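A sketch of the kind of guard APP could use (Python, assuming a classic tab-separated OTU table with a '#OTU ID' header row and an optional Consensus Lineage column):

```python
def count_samples(otu_table_lines):
    """Count sample columns in a classic tab-separated QIIME OTU table.

    Assumes the '#OTU ID' header row lists one column per sample between
    the OTU ID and an optional trailing Consensus Lineage column.
    """
    for line in otu_table_lines:
        if line.startswith("#OTU ID"):
            cols = line.rstrip("\n").split("\t")
            # Subtract the OTU ID column and the lineage column if present.
            return len(cols) - 1 - ("Consensus Lineage" in cols)
    return 0

def should_run_beta_diversity(otu_table_lines):
    """Beta diversity is only meaningful with at least two samples."""
    return count_samples(otu_table_lines) >= 2
```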
Florent
Hi Adam,
APP (app_do_QA.pl and probably other APP scripts) takes a config file as an argument and expects to find corresponding FASTA and QUAL files.
1/ in the --help message, under the -config explanation, perhaps mention that these files are expected
2/ APP will find a FASTA file named *.fna, but not *.fasta or *.fa. This is annoying because I never remember which extension it expects. It would be great if it could find the different FASTA extensions.
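Finding the file could be as simple as trying each common extension in turn (Python sketch; the extension list and precedence order are my suggestion, not what APP currently does):

```python
import os

def find_fasta(prefix):
    """Look for a FASTA file under several common extensions rather than
    only '.fna'. Returns the first existing candidate, or None."""
    for ext in (".fna", ".fasta", ".fa"):
        candidate = prefix + ext
        if os.path.exists(candidate):
            return candidate
    return None
```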
Cheers,
Florent
Setting the -b, -t, or -i options in version 2.3.3 has no effect on the behaviour. I've fixed this issue in version 2.3.3-connor, which is currently on Brown.
The first column of the APP config file is meant to contain unique IDs. However, when APP is given a config file with duplicate IDs, it proceeds happily. I do not know whether APP processes these duplicate IDs in a sensible way.
The expected behaviour would be for APP to check for duplicate IDs and return an error when duplicates are detected.
If there is a use case for duplicate IDs, e.g. to group multiple samples into a single one, then I suggest that this be implemented as an extra option provided on the command line.
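The check itself is cheap; here is a Python sketch, assuming a tab-separated config whose first column is the sample ID (the actual APP config format may differ):

```python
def find_duplicate_ids(config_lines):
    """Return IDs that appear more than once in the first column of a
    tab-separated config file; blank lines and '#' comments are skipped."""
    seen, dupes = set(), []
    for line in config_lines:
        if not line.strip() or line.startswith("#"):
            continue
        sample_id = line.split("\t")[0].strip()
        if sample_id in seen and sample_id not in dupes:
            dupes.append(sample_id)
        seen.add(sample_id)
    return dupes
```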
Cheers,
Florent
Hi,
I am getting no reads in seqs.fna after the split_libraries.py step. Before going into the next step, UCHIME, it would be easy to throw a meaningful error message and die if seqs.fna is 0 bytes (i.e. contains no reads), to prevent going any further in the pipeline.
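A Python sketch of the kind of guard I mean (the error message wording is just a suggestion):

```python
import os
import sys

def die_if_empty(path):
    """Abort with a meaningful message if split_libraries.py produced no reads."""
    if os.path.getsize(path) == 0:
        sys.exit("ERROR: %s is empty: split_libraries.py produced no reads; "
                 "check your barcodes/primers before running UCHIME." % path)
```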
Best,
Florent