ecogenomics / app
This project is forked from minillinim/app.
Ace Pyrotag Pipeline - wrapper for QIIME and R - interfaces with PyroDB
Hey Adam,
I ran app_combine (APP 2.3.3) on some data on Brown and this worked fine. However, the next step, app_make_results, does not even start:
Variable "$VERSION" is not imported at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 1179.
Global symbol "$global_TB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 84.
Global symbol "$global_TB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 85.
Global symbol "$global_SB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 87.
Global symbol "$global_SB_processing_dir" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 88.
Global symbol "$QIIME_TAX_tax_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 151.
Global symbol "$QIIME_TAX_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 152.
Global symbol "$QIIME_TAX_aligned_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 153.
Global symbol "$QIIME_imputed_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 154.
Global symbol "$SILVA_TAX_tax_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 158.
Global symbol "$SILVA_TAX_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 159.
Global symbol "$SILVA_TAX_aligned_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 160.
Global symbol "$SILVA_imputed_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 161.
Global symbol "$MERGED_TAX_tax_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 165.
Global symbol "$MERGED_TAX_blast_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 166.
Global symbol "$QIIME_imputed_file" requires explicit package name at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 183.
...
Hopefully it's not too hard to fix.
Florent
There are apparently platform-independent chomp functions on CPAN, according to @fangly. Maybe we should use these instead of just assuming everything is Linux.
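For illustration, the same idea in Python is only a few lines; this is just a sketch of what a platform-independent chomp needs to handle, not the CPAN code:

```python
def crossplatform_chomp(line):
    r"""Strip one trailing line ending regardless of platform convention.

    Handles Unix (\n), Windows (\r\n) and old Mac (\r) endings, unlike a
    plain Perl chomp, which only strips the current platform's input
    record separator.
    """
    if line.endswith("\r\n"):
        return line[:-2]
    if line.endswith("\n") or line.endswith("\r"):
        return line[:-1]
    return line
```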
Hi Adam,
You recently included an option to stop APP after an OTU table has been generated. Is this before or after normalisation, by the way? Anyway, I was thinking that it would be nice to be able to resume an APP job.
Here is a use case: stop APP after an OTU table has been generated, muck around with the OTU table manually (e.g. remove a known contaminant OTU, or remove all eukaryotic or unknown OTUs), and then resume APP. When using universal PCR primers, one also amplifies eukaryotic rRNA genes, but when classifying against only the Greengenes database, all of these reads, which can amount to a lot, will appear as unknown. So, you see that it could be very useful to stop and then resume the APP pipeline.
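The manual editing step in this use case could be as simple as the following Python sketch (it assumes a classic tab-separated QIIME OTU table whose last column holds the consensus lineage; the keywords are only examples):

```python
def filter_otu_table(lines, bad_keywords=("Unknown", "Eukaryota")):
    """Drop OTU rows whose taxonomy string contains any unwanted keyword.

    Assumes a classic tab-separated QIIME OTU table whose last column is
    the consensus lineage; header lines start with '#'.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # keep header lines untouched
            continue
        taxonomy = line.rstrip("\n").split("\t")[-1]
        if not any(keyword in taxonomy for keyword in bad_keywords):
            kept.append(line)
    return kept
```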
Food for thought...
Florent
Need to move config parsing from app_make_results.pl to APPConfig.pm so that do_QA can get the number of threads from the config. Will do this at a later date.
app_run_analysis.pl is wasteful because it runs blastall on 10 threads. In my experience, parallelising BLAST is not effective, especially when the input contains many short reads.
So, to avoid wasting CPUs, the thread count could be decreased from 10 (-a 10) to 1 (-a 1) in this piece of code:
checkAndRunCommand("assign_taxonomy.py", [{-i => "$processing_dir/rep_set.fa",
                                           -t => $extra_param_hash->{DB}->{TAXONOMIES},
                                           -b => $extra_param_hash->{DB}->{OTUS},
                                           -m => "blast",
                                           -a => 10,
                                           -e => 0.001,
                                           -o => $processing_dir}], DIE_ON_FAILURE);
Hi,
I just noticed something with APP 2.3.3. I ran app_combine on two jobs. In each of the two config files, I had put:
DB=MERGED
However, in the resulting file, ./ac/app_ac.config, I have
DB=0
In essence, app_combine ignored or forgot the options I put in :(
Florent
Terminal output is shown below; I'm using version 2.3.3-connor on Brown.
app_make_results.pl -threads 20 -c app_265.config -b ~/db/ppk/ppk.fasta -t ~/db/ppk/ppk.nds.1
----------------------------------------------------------------
/srv/whitlam/bio/apps/sw/app/2.3.3-connor/app_make_results.pl
Version 2.3.3
Copyright (C) 2011 Michael Imelfort and Paul Dennis
This program comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under certain conditions: See the source for more details.
----------------------------------------------------------------
app_analysis_20121008/results/table_based
Checking if all the config checks out... /srv/whitlam/home/projects/EBPR_PHAGE/pyrotags/265/QA/qiime_mapping.txt ...Processing 23 samples
Finding normalisation size automatically
Normalised sample size calculated at: 1500 reads
Using taxonomy file: /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.nds.1
Using blast file: /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.fasta
Using aligned blast file: /srv/whitlam/bio/db/gg/qiime_default/gg_otus_4feb2011/rep_set/gg_99_otus_4feb2011_aligned.fasta
Using imputed file: /srv/whitlam/bio/db/gg/qiime_default/core_set_aligned.fasta.imputed
All good!
----------------------------------------------------------------
Start TABLE BASED NORMALISATION data set processing...
----------------------------------------------------------------
Copying reads for analysis...
Beginning OTU and Taxonomy module...
Picking OTUs for non normalised data set...
pick_otus.py -o app_analysis_20121008/processing/table_based/uclust_picked_otus -s 0.97 -i app_analysis_20121008/processing/table_based/non_normalised.fa
Getting a representative set...
pick_rep_set.py -i app_analysis_20121008/processing/table_based/uclust_picked_otus/non_normalised_otus.txt -f app_analysis_20121008/processing/table_based/non_normalised.fa
Assigning taxonomy for non normalised data set...
Assign taxonomy method: blast
assign_taxonomy.py -a 20 -o app_analysis_20121008/processing/table_based -m blast -t /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.nds.1 -i app_analysis_20121008/processing/table_based/non_normalised.fa_rep_set.fasta -e 0.001 -b /srv/whitlam/home/users/uqcskenn/db/ppk/ppk.fasta
Making NON NORMALISED otu table...
make_otu_table.py -o app_analysis_20121008/results/table_based/non_normalised_otu_table.txt -t app_analysis_20121008/processing/table_based/non_normalised.fa_rep_set_tax_assignments.txt -i app_analysis_20121008/processing/table_based/uclust_picked_otus/non_normalised_otus.txt
Making NON NORMALISED otu table (Extended format)...
reformat_otu_table.py -o app_analysis_20121008/results/table_based/non_normalised_otu_table_expanded.tsv -t app_analysis_20121008/processing/table_based/non_normalised.fa_rep_set_tax_assignments.txt -i app_analysis_20121008/results/table_based/non_normalised_otu_table.txt
Rarefaction...
multiple_rarefactions.py -n 50 -o app_analysis_20121008/processing/table_based/rarefied_otu_tables/ -s 50 -m 50 -i app_analysis_20121008/results/table_based/non_normalised_otu_table.txt -x 19190
Normalizing non normalised table at 1500 sequences... [1500, 1000]
multiple_rarefactions_even_depth.py -n 1000 -o app_analysis_20121008/processing/table_based/rare_tables/ -i app_analysis_20121008/results/table_based/non_normalised_otu_table.txt -d 1500 --lineages_included --k
Calculating centroid subsampled table...
Calculating centroid OTU table from tables in app_analysis_20121008/processing/table_based/rare_tables/...
--start loading data...
--data loaded, calculating centroid...
--calculating distances of tables to centroid...
--table: -1 is the centroid table
Problem running this R command:
mantel.otu <- mantel(ave,big_frame[,,min_index]);
Got the error:
'x' and 'y' must have the same length
Warning message:
In as.dist.default(ydis) : non-square matrix
Hi Adam,
It has happened a few times that I want to put data through APP for which I have no quality scores.
As far as I know, quality scores are required by APP. So, maybe it would be good to include an option in app_combine and app_qa to make fake quality scores if the user requests it.
You could ask Dana for her script that does that.
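I don't have Dana's script, but the idea is simple enough; here is a minimal Python sketch (the constant score of 25 and the lack of line wrapping are arbitrary simplifications, not what her script does):

```python
def write_fake_qual(fasta_lines, score=25):
    """Generate fake .qual records (one constant score per base) from FASTA.

    A minimal sketch: real 454 quality files wrap long score lines,
    which is ignored here for brevity.
    """
    out, header, seqlen = [], None, 0

    def flush():
        # Emit the accumulated record, if any.
        if header is not None:
            out.append(header)
            out.append(" ".join([str(score)] * seqlen))

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, seqlen = line, 0
        else:
            seqlen += len(line)
    flush()
    return out
```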
Florent
Sometimes it is hard to match logs with the output data after analysis was done a while ago, particularly when there are multiple app_make_results folders lying around in the same base folder.
If there was a log inside the output folder, then I'd know exactly what I did and when.
When using a custom database, e.g. the 'MERGED' database:
1/ APP does not seem to check that the hardcoded database files are available before continuing its work
2/ there is no way to specify which version of the merged database to use, though apparently one can specify a couple of different Greengenes database versions
Ideally, APP should have a config file that lets people provide a database, instead of hardcoding file paths.
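For example, something like the following hypothetical config block (the section and key names are made up to mirror the TAXONOMIES/OTUS parameters that app_make_results.pl already accepts; all paths are illustrative):

```ini
# hypothetical databases.conf - every name and path here is illustrative
[MERGED]
TAXONOMIES=/path/to/merged/taxonomy.txt
OTUS=/path/to/merged/rep_set.fasta
ALIGNED=/path/to/merged/rep_set_aligned.fasta
```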
Cheers,
Florent
Given a taxonomic file that contains spaces, e.g.
k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__[Ruminococcus]; s__
APP properly generates a raw OTU table
However, after rarefaction, the spaces in the taxonomic strings are removed
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__[Ruminococcus];s__
This removal of spaces is apparently due to the QIIME script that takes care of the rarefaction and is very problematic for programs that rely on these taxonomic strings to remain unchanged.
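Until the QIIME script is fixed, one workaround is to patch the rarefied tables back, assuming the only change is the removal of the space after each ';' (Python sketch):

```python
import re

def restore_lineage_spaces(lineage):
    """Re-insert the single space after each ';' that the rarefaction
    step strips from Greengenes-style lineage strings.

    Assumes the original strings had exactly one space after each ';'.
    """
    return re.sub(r";(?=\S)", "; ", lineage)
```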
I tried to run the script in version 2.3.3 and got errors complaining about not being able to find the Krona scripts - it didn't matter whether the Krona module was loaded or not. An older version of APP (~2.1) ran fine.
Ok, I got this error during app_make_results.pl version 2.3.3:
Jacknifed beta diversity....
beta_diversity.py -o app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac -m weighted_unifrac -t app_analysis_20121002/processing/table_based/de_novo_phylogeny/non_normalised_tree.tre -i app_analysis_20121002/results/table_based/non_normalised_otu_table.txt
upgma_cluster.py -o app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac/weighted_unifrac_non_normalised_otu_table_upgma.tre -i app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac/weighted_unifrac_non_normalised_otu_table.txt
Traceback (most recent call last):
File "/srv/whitlam/bio/apps/12.04/sw/FrankenQIIME/1.2.0/bin/upgma_cluster.py", line 53, in <module>
main()
File "/srv/whitlam/bio/apps/12.04/sw/FrankenQIIME/1.2.0/bin/upgma_cluster.py", line 47, in main
single_file_upgma(opts.input_path, opts.output_path)
File "/srv/whitlam/bio/apps/12.04/sw/FrankenQIIME/1.2.0/lib/qiime/hierarchical_cluster.py", line 56, in single_file_upgma
Ensure it has more than one sample present""" % (str(input_file),))
RuntimeError: input file app_analysis_20121002/results/table_based/de_novo_phylogeny/beta_diversity/non_normalised/weighted_unifrac/weighted_unifrac_non_normalised_otu_table.txt did not make a UPGMA tree.
Ensure it has more than one sample present
**ERROR: /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl : No such file or directory
at /srv/whitlam/bio/apps/sw/app/2.3.3/app_make_results.pl line 622.
I did not really investigate this issue further. Considering that I have a single sample, it should be pretty obvious what is happening here: one cannot calculate beta diversity from a single sample. The APP pipeline should skip any beta-diversity computations when there are fewer than two samples.
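A sketch of the kind of guard APP could use (Python, assuming a classic tab-separated OTU table with a '#OTU ID' header row and an optional Consensus Lineage column):

```python
def count_samples(otu_table_lines):
    """Count sample columns in a classic tab-separated QIIME OTU table.

    Assumes the '#OTU ID' header row lists one column per sample between
    the OTU ID and an optional trailing Consensus Lineage column.
    """
    for line in otu_table_lines:
        if line.startswith("#OTU ID"):
            cols = line.rstrip("\n").split("\t")
            # Subtract the OTU ID column and the lineage column if present.
            return len(cols) - 1 - ("Consensus Lineage" in cols)
    return 0

def should_run_beta_diversity(otu_table_lines):
    """Beta diversity is only meaningful with at least two samples."""
    return count_samples(otu_table_lines) >= 2
```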
Florent
Hi Adam,
APP (app_do_QA.pl and probably other APP scripts) takes a config file as an argument and expects to find corresponding FASTA and QUAL files.
1/ in the --help message, under the -config explanation, perhaps mention that these files are expected
2/ APP will find a FASTA file named *.fna, but not *.fasta or *.fa. This is annoying because I never remember which extension it expects. It would be great if it could find the different FASTA extensions.
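Finding the file could be as simple as trying each common extension in turn (Python sketch; the extension list and precedence order are my suggestion, not what APP currently does):

```python
import os

def find_fasta(prefix):
    """Look for a FASTA file under several common extensions rather than
    only '.fna'. Returns the first existing candidate, or None."""
    for ext in (".fna", ".fasta", ".fa"):
        candidate = prefix + ext
        if os.path.exists(candidate):
            return candidate
    return None
```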
Cheers,
Florent
Setting the -b, -t, or -i options in version 2.3.3 has no effect on the behaviour. I've fixed this issue in version 2.3.3-connor, which is currently on Brown.
The first column of the APP config file is meant to contain unique IDs. However, when APP is given a config file with duplicate IDs, it proceeds happily. I do not know whether APP processes these duplicate IDs in a sensible way.
The expected behaviour would be for APP to check for duplicate IDs and return an error when duplicates are detected.
If there is a use case for duplicate IDs, e.g. to group multiple samples into a single one, then I suggest that this be implemented as an extra option provided on the command line.
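The check itself is cheap; here is a Python sketch, assuming a tab-separated config whose first column is the sample ID (the actual APP config format may differ):

```python
def find_duplicate_ids(config_lines):
    """Return IDs that appear more than once in the first column of a
    tab-separated config file; blank lines and '#' comments are skipped."""
    seen, dupes = set(), []
    for line in config_lines:
        if not line.strip() or line.startswith("#"):
            continue
        sample_id = line.split("\t")[0].strip()
        if sample_id in seen and sample_id not in dupes:
            dupes.append(sample_id)
        seen.add(sample_id)
    return dupes
```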
Cheers,
Florent
Hi,
I am getting no reads in seqs.fna after the split_libraries.py step. Before going into the next step, UCHIME, it would be easy to throw a meaningful error message and die if seqs.fna is 0 bytes (i.e. contains no reads), to prevent going any further in the pipeline.
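A Python sketch of the kind of guard I mean (the error message wording is just a suggestion):

```python
import os
import sys

def die_if_empty(path):
    """Abort with a meaningful message if split_libraries.py produced no reads."""
    if os.path.getsize(path) == 0:
        sys.exit("ERROR: %s is empty: split_libraries.py produced no reads; "
                 "check your barcodes/primers before running UCHIME." % path)
```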
Best,
Florent