deniribicic / q2ont
Bash pipeline for analysis of ONT full-length 16S sequences in QIIME2
Hi, I am trying to set up the pipeline, but I get the following error:
--
$ conda env create -n qiime2-2019.7 --file qiime2-2019.7-py36-linux-conda.yml
NotWritableError: The current user does not have write permissions to a required path.
path: /anaconda3/envs/.conda_envs_dir_test
uid: 1024
gid: 1023
If you feel that permissions on this path are set incorrectly, you can manually
change them by executing
$ sudo chown 1024:1023 /anaconda3/envs/.conda_envs_dir_test
In general, it's not advisable to use 'sudo conda'.
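The error comes from conda trying to write into `/anaconda3/envs`, which this user cannot modify. Instead of `sudo conda`, a common workaround is to create the environment by path in a directory you own with `conda env create -p`. The helper below is a hypothetical sketch: the function name `conda_create_cmd` and the fallback directory `$HOME/conda-envs` are assumptions, and it only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a `conda env create` command that avoids
# writing into a non-writable envs directory.
#   $1  conda envs directory to check (e.g. /anaconda3/envs)
#   $2  environment name (e.g. qiime2-2019.7)
#   $3  environment YAML file
#   $4  writable fallback directory (assumption: defaults to $HOME/conda-envs)
conda_create_cmd() {
  local envs_dir="$1" name="$2" yml="$3" fallback="${4:-$HOME/conda-envs}"
  if [ -w "$envs_dir" ]; then
    # Default location is writable: create the environment by name.
    printf 'conda env create -n %s --file %s\n' "$name" "$yml"
  else
    # Not writable: create it by path (-p/--prefix) in a directory we own.
    printf 'conda env create -p %s/%s --file %s\n' "$fallback" "$name" "$yml"
  fi
}
```

For example, `conda_create_cmd /anaconda3/envs qiime2-2019.7 qiime2-2019.7-py36-linux-conda.yml` prints a `-p` command under `$HOME/conda-envs` when `/anaconda3/envs` is not writable; an environment created this way is activated by its path, e.g. `conda activate "$HOME/conda-envs/qiime2-2019.7"`.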
Hi Deni,
Thanks for this pipeline. I find it great and straightforward, but I am writing because I have a small issue.
I think I have all the artefacts ready and the script downloaded in my working directory, and then I ran:
/home/qiime2/q2ONT.sh [-i 0_basecalled-fastq] [-j SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva_132_99_16S_sequence.qza] [-c silva-132-99-nb-classifier.qza] [-t 2]
Immediately after running the script, I got the following message:
`concatenating .fastq files...
cat: '/*.fastq': No such file or directory
all fastq files merged!
demultiplexing and trimming reads
could not find cpp_functions.so - please reinstall
demultiplexing failed!!!
/home/qiime2/q2ONT.sh: line 96: cd: 2_demux-reads: No such file or directory`
I am probably doing something wrong with this command or with my sequences.
Anyway, I would appreciate any help.
Best,
Jesús
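In usage messages, square brackets usually mark optional arguments and are not typed. If q2ONT.sh parses its options with Bash `getopts` (an assumption; only the flags -i, -j, -c and -t come from the command above), the first word `[-i` does not start with `-`, so option parsing stops immediately and the input directory stays empty. A glob such as `"$input_dir"/*.fastq` then becomes `/*.fastq`, which matches the `cat` error. The sketch below reproduces this with a hypothetical `parse_args` function and adds a hypothetical `check_input_dir` guard. The separate `could not find cpp_functions.so` message appears to come from Porechop's compiled C++ component, which is an installation problem this sketch does not address.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of a getopts loop for the q2ONT flags seen above.
parse_args() {
  local OPTIND=1 opt   # reset getopts state on every call
  input_dir="" ref_seqs="" classifier="" threads=""
  while getopts "i:j:c:t:" opt; do
    case "$opt" in
      i) input_dir="$OPTARG" ;;
      j) ref_seqs="$OPTARG" ;;
      c) classifier="$OPTARG" ;;
      t) threads="$OPTARG" ;;
      *) return 1 ;;
    esac
  done
  # getopts stops at the first word not starting with "-", e.g. "[-i".
}

# Guard that fails early instead of globbing the filesystem root.
check_input_dir() {
  if [ -z "$input_dir" ] || [ ! -d "$input_dir" ]; then
    echo "input directory '$input_dir' not found" >&2
    return 1
  fi
}
```

With the brackets removed, the call from the question would read `/home/qiime2/q2ONT.sh -i 0_basecalled-fastq -j SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva_132_99_16S_sequence.qza -c silva-132-99-nb-classifier.qza -t 2`.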
I've tried pulling both, and they will not download onto my Mac.
Hi,
I am trying to use your pipeline. First I tried to run q2ONT with a SILVA 132 classifier from my earlier Illumina work, but I got an error:
Plugin error from feature-classifier:
The scikit-learn version (0.19.1) used to generate this artifact does not match the current version of scikit-learn installed (0.21.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.
So I tried to train a new classifier in the new conda environment, but it seems to run without producing anything.
Do you have a copy of the one that you use?
Regards
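QIIME 2 refuses to load a classifier trained with a different scikit-learn version, so the classifier has to be retrained inside the qiime2-2019.7 environment. A minimal sketch, assuming the reference sequences are already imported as a FeatureData[Sequence] .qza and the taxonomy is a headerless two-column text file: the hypothetical `train_classifier` function imports the taxonomy and then calls `qiime feature-classifier fit-classifier-naive-bayes`. Training on the full SILVA 99% reference can take many hours and a lot of memory, which may explain a run that appears to do nothing.

```shell
#!/usr/bin/env bash
# Sketch: retrain a naive Bayes classifier in the current QIIME 2 environment.
#   $1  reference sequences, FeatureData[Sequence] .qza
#   $2  reference taxonomy as a headerless two-column TSV (ID<TAB>taxonomy)
#   $3  output classifier .qza
train_classifier() {
  local ref_seqs="$1" taxonomy_tsv="$2" out="$3"
  local taxonomy_qza="${out%.qza}-taxonomy.qza"

  # Import the plain-text taxonomy as a FeatureData[Taxonomy] artifact.
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$taxonomy_tsv" \
    --output-path "$taxonomy_qza" || return 1

  # Fit the classifier with the scikit-learn version installed here.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$ref_seqs" \
    --i-reference-taxonomy "$taxonomy_qza" \
    --o-classifier "$out"
}
```

A possible call, assuming the usual layout of the SILVA 132 QIIME release for the taxonomy file: `train_classifier SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva_132_99_16S_sequence.qza SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/taxonomy_7_levels.txt silva-132-99-nb-classifier.qza`.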
First of all, thank you for the q2ONT script! Since I am relatively new to bioinformatics, this is a great help for me.
I have run your q2ONT pipeline on our Nanopore 16S sequencing data. Importing the ONT data, training the classifier with the SILVA database, and the rest of the pipeline all worked well; the following data was generated from the basecalled data:
1_total_run.fastq.gz
2_demux-reads (folder)
3_QC (folder)
4_single-end-demux.qza
4_single_end-demux.qzv
5_derep-seqs.qza
5_derep-seqs.qzv
5_derep-table.qza
5_derep-table.qzv
6.1_uchime-ref-out (folder)
6.2_new-ref-seqs-op_ref-85.qza
6.2_rep-seqs-op_ref-85.qza
6.2_table-op_ref-85.qza
7_aligned-filtered_derep-seqs.qza
8_masked-aligned-filtered_derep-seqs.qza
9_unrooted-tree.qza
10_rooted-tree.qza
11_taxonomy-sklearn.qza
and exported:
dna-sequences.fasta
feature-table.biom
table-with-taxonomy.biom
taxonomy.tsv
tree.nwk
Now I'm searching for a way to visualize the results in barplots. I have already installed phyloseq, but I don't quite understand how to generate a barplot with the files I received. Could you please show me a way to do this with phyloseq?
I also tried to visualize the results with the QIIME visualizer and iTOL, but I had some problems with both; maybe you can help me with those too.
I want to create a barplot of the data with the qiime taxa barplot visualizer (https://docs.qiime2.org/2019.10/plugins/available/taxa/barplot/), using "6.2_table-op_ref-85.qza" as the input table artifact and "11_taxonomy-sklearn.qza" as the input taxonomy artifact. Are these the correct files? I also need a metadata file. Could you tell me how to generate one in QIIME from the above data?
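`qiime taxa barplot` needs a sample metadata file whose first column holds the sample IDs used in the feature table. A minimal sketch: the hypothetical `write_metadata` function writes a tab-separated file with a `sample-id` header and a placeholder `group` column (an assumption, to be replaced with a real grouping), and the hypothetical `make_barplot` function runs the barplot with the two artifacts named above. Whether 11_taxonomy-sklearn.qza covers the features in 6.2_table-op_ref-85.qza depends on which sequences the pipeline classified, which this sketch does not check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal QIIME 2 metadata TSV.
#   $1     output file
#   $2...  sample IDs, exactly as they appear in the feature table
write_metadata() {
  local out="$1"; shift
  printf 'sample-id\tgroup\n' > "$out"
  local id
  for id in "$@"; do
    # "unassigned" is a placeholder; replace it with a real grouping.
    printf '%s\tunassigned\n' "$id" >> "$out"
  done
}

# Run the taxa barplot visualizer with the q2ONT outputs named in the question.
make_barplot() {
  local metadata="$1"
  qiime taxa barplot \
    --i-table 6.2_table-op_ref-85.qza \
    --i-taxonomy 11_taxonomy-sklearn.qza \
    --m-metadata-file "$metadata" \
    --o-visualization taxa-barplot.qzv
}
```

The sample IDs can be read from the exported feature-table.biom, for example with `biom summarize-table -i feature-table.biom`, and the resulting taxa-barplot.qzv opens in QIIME 2 View like the other .qzv files.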
I also uploaded the file "10_rooted-tree.qza" to iTOL (https://itol.embl.de/) to visualize the results, but unfortunately the tool only shows the tree with the OTU IDs and not the taxa. Do you know how I can get the taxa shown there?
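iTOL shows the tip names stored in the tree, which here are feature IDs. One way to show taxa, assuming iTOL's LABELS dataset format (a `LABELS` line, a `SEPARATOR TAB` line, then `DATA` followed by `ID<TAB>label` rows), is to convert the exported taxonomy.tsv into a label file and load it onto the tree uploaded from tree.nwk. The function name `taxonomy_to_itol_labels` and the choice to label each tip with its most specific non-empty rank are assumptions of this sketch; labels are only applied to tips whose IDs match, so the taxonomy must describe the same features as the tree.

```shell
#!/usr/bin/env bash
# Hypothetical converter: QIIME 2 taxonomy.tsv -> iTOL LABELS dataset.
# Input columns: Feature ID <TAB> Taxon <TAB> Confidence (first line is a header).
taxonomy_to_itol_labels() {
  awk -F'\t' '
    BEGIN { print "LABELS"; print "SEPARATOR TAB"; print "DATA" }
    NR == 1 { next }            # skip the header row
    {
      n = split($2, ranks, ";")
      label = ""
      # Use the most specific non-empty rank as the tip label.
      for (i = n; i >= 1; i--) {
        r = ranks[i]
        gsub(/^ +| +$/, "", r)
        if (r != "") { label = r; break }
      }
      print $1 "\t" label
    }
  ' "$1"
}
```

Redirecting the output to a file (e.g. `taxonomy_to_itol_labels taxonomy.tsv > itol-labels.txt`) produces a dataset that can be dragged onto the tree in iTOL.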
If you need more information please let me know.
Thank you a lot!
Hi, I ran your pipeline on a small subset of my ONT data, and it seems to work well.
Next, I tried to run it on my full run's data, and when it comes to assigning taxonomy,
qiime vsearch cluster-features-open-reference \
  --i-table 6.1_uchime-ref-out/table-nonchimeric-wo-borderline.qza \
  --i-sequences 6.1_uchime-ref-out/rep-seqs-nonchimeric-wo-borderline.qza \
  --i-reference-sequences $reference_seqs \
  --p-perc-identity 0.85 \
  --o-clustered-table 6.2_table-op_ref-85.qza \
  --o-clustered-sequences 6.2_rep-seqs-op_ref-85.qza \
  --o-new-reference-sequences 6.2_new-ref-seqs-op_ref-85.qza \
  --p-threads $threads
I get a memory error (32 cores and 126 GB RAM).
Then I subsampled the data, keeping only 40% of the reads per sample, and re-ran the pipeline, but I still have the same issue.
Do you have any ideas for optimizing this step? What is the typical volume of your data?
PS: For reference, the volume of the 12 samples in my run after subsampling and trimming is 12 × 300 MB = 3.6 GB.
PS2: Full traceback from the error log:
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --usearch_global /tmp/tmp9bmzjk7j --id 0.85 --db /tmp/qiime2-archive-ew75tn5v/9f4b9a95-e8c4-45be-a980-6faaa9b857c7/data/dna-sequences.fasta --uc /tmp/tmpqsl93h8o --strand plus --qmask none --notmatched /tmp/tmp72o2x5lw --threads 24
vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores
https://github.com/torognes/vsearch
Reading file /tmp/qiime2-archive-ew75tn5v/9f4b9a95-e8c4-45be-a980-6faaa9b857c7/data/dna-sequences.fasta 100%
521145303 nt in 369953 seqs, min 900, max 2961, avg 1409
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 844922 of 2092876 (40.37%)
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --sortbysize /tmp/tmp72o2x5lw --xsize --output /tmp/q2-DNAFASTAFormat-anta19n4
vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores
https://github.com/torognes/vsearch
Reading file /tmp/tmp72o2x5lw 100%
1747135600 nt in 1247954 seqs, min 1400, max 1400, avg 1400
Getting sizes 100%
Sorting 100%
Median abundance: 1
Writing output 100%
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --cluster_size /tmp/tmpa7j30vmp --id 0.85 --centroids /tmp/q2-DNAFASTAFormat-xiom9wn4 --uc /tmp/tmpwz5to5f2 --qmask none --xsize --threads 24
vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores
https://github.com/torognes/vsearch
Reading file /tmp/tmpa7j30vmp 100%
1747135600 nt in 1247954 seqs, min 1400, max 1400, avg 1400
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 512147 Size min 1, max 55355, avg 2.4
Singletons: 482081, 38.6% of seqs, 94.1% of clusters
Traceback (most recent call last):
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in call
results = action(**arguments)
File "</home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-126>", line 2, in cluster_features_open_reference
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 502, in callable_executor
prov = provenance.fork(name, output)
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 438, in fork
forked.add_ancestor(alias)
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 167, in add_ancestor
shutil.copytree(str(grandcestor), str(destination))
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/shutil.py", line 359, in copytree
raise Error(errors)
shutil.Error: [('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/metadata.tsv', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/metadata.tsv', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/action.yaml', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/action.yaml', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/metadata.yaml', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/metadata.yaml', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/citations.bib', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/citations.bib', '[Errno 28] No space left on device')]
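The traceback ends in `[Errno 28] No space left on device` while copying files under /tmp, which points to a full temporary directory rather than exhausted RAM. QIIME 2 creates its temporary archives through Python's tempfile module, which honors the TMPDIR environment variable, so pointing TMPDIR at a disk with more free space is a common workaround. The two hypothetical helpers below report the free space for a directory and run a single command with TMPDIR redirected; the function names are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: free space (in KiB) on the filesystem holding a directory.
free_kib() {
  # -P gives one line per filesystem; column 4 is "Available" in 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical wrapper: run a single command with TMPDIR on another disk.
#   $1     scratch directory with plenty of free space
#   $2...  command to run
with_tmpdir() {
  local dir="$1"; shift
  mkdir -p "$dir" || return 1
  echo "TMPDIR=$dir has $(free_kib "$dir") KiB free" >&2
  TMPDIR="$dir" "$@"
}
```

Alternatively, running `export TMPDIR=/path/on/large/disk` (a placeholder path) in the shell before starting q2ONT.sh applies the same redirection to every QIIME 2 command in the pipeline.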
Dear Deni,
thank you for developing this tool. We have used it to analyse a dataset but were quite surprised to see so little difference between healthy and diseased samples. I am new to 16S metabarcoding with nanopore and have many questions.
I understand that ONT has higher error rates than Illumina, but in my humble opinion, 85% OTU clustering is far too low. Could this explain the small differences between samples? Looking through the literature, I find that in many cases researchers classify reads taxonomically without an OTU clustering step (I infer this because no clustering step is mentioned, so I may be wrong). I was wondering whether clustering is really needed when analysing the full-length 16S rRNA gene, given that this technology produces far fewer reads than Illumina.
All the best,
M
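One way to test whether the 85% threshold blurs differences between samples is to rerun the open-reference clustering step at several identity levels and compare the resulting tables. The sketch below reuses the `qiime vsearch cluster-features-open-reference` command from the pipeline's step 6.2 quoted earlier on this page, with the identity as a parameter; the function name `cluster_at_identities` and the thresholds are illustrative assumptions, and higher thresholds on ONT reads will likely produce many more singleton features. Skipping clustering altogether would instead mean passing the dereplicated sequences directly to `qiime feature-classifier classify-sklearn`.

```shell
#!/usr/bin/env bash
# Sketch: rerun q2ONT's open-reference clustering at several identity thresholds.
# Input and output names follow the pipeline's step 6.2 files.
#   $1     reference sequences .qza
#   $2     number of threads
#   $3...  identity thresholds in percent (e.g. 85 97 99)
cluster_at_identities() {
  local reference_seqs="$1" threads="$2"; shift 2
  local pct frac
  for pct in "$@"; do
    # Convert a percentage such as 97 into the 0.97 fraction --p-perc-identity expects.
    frac=$(awk -v p="$pct" 'BEGIN { printf "%.2f", p / 100 }')
    qiime vsearch cluster-features-open-reference \
      --i-table 6.1_uchime-ref-out/table-nonchimeric-wo-borderline.qza \
      --i-sequences 6.1_uchime-ref-out/rep-seqs-nonchimeric-wo-borderline.qza \
      --i-reference-sequences "$reference_seqs" \
      --p-perc-identity "$frac" \
      --o-clustered-table "6.2_table-op_ref-${pct}.qza" \
      --o-clustered-sequences "6.2_rep-seqs-op_ref-${pct}.qza" \
      --o-new-reference-sequences "6.2_new-ref-seqs-op_ref-${pct}.qza" \
      --p-threads "$threads" || return 1
  done
}
```

Each threshold writes its own set of 6.2 outputs, so the tables can be summarized side by side with `qiime feature-table summarize`.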
Dear,
I am writing to get input from experienced users. I have nanopore data and am using the q2ONT command line to process my 16S rRNA gene sequencing data. After demultiplexing, adapter removal, and trimming the reads to 1400 nt, I imported my sequencing data into QIIME 2.
I used these commands to dereplicate the sequences and to obtain the feature table, the sequences, and the feature table summary.
# Dereplication of sequences
qiime vsearch dereplicate-sequences \
  --i-sequences 4_single-end-demux.qza \
  --o-dereplicated-table 5_derep-table.qza \
  --o-dereplicated-sequences 5_derep-seqs.qza
# Visualization files
qiime feature-table tabulate-seqs \
  --i-data 5_derep-seqs.qza \
  --o-visualization 5_derep-seqs.qzv
qiime feature-table summarize \
  --i-table 5_derep-table.qza \
  --o-visualization 5_derep-table.qzv
After these steps, I got two files, derep-seqs.qzv and derep-table.qzv.
When I checked derep-table.qzv with QIIME 2 View, I realized that something might have gone wrong, because the frequency per feature shows 1. A photo is attached.
Could you please provide some insight into what might have caused this, or is such an outcome normal when processing nanopore sequencing data?
Thank you
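Exact dereplication only merges reads that are identical over their whole length. Given the per-read error rate of ONT data and reads trimmed to 1400 nt, nearly every read is likely to be unique, so a frequency of 1 per feature may well be expected rather than a sign of a broken command; the vsearch log earlier on this page shows a similar pattern even at 85% identity (94.1% of the de novo clusters were singletons). The hypothetical `derep_stats` function below counts exact duplicates among the reads of a standard 4-line FASTQ file, which shows how much dereplication can collapse before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many reads in 4-line FASTQ input are exact duplicates?
# Prints the number of reads, distinct sequences, and singleton sequences.
# Reads the files given as arguments, or stdin when none are given.
derep_stats() {
  awk '
    FNR % 4 == 2 { count[$0]++; reads++ }   # sequence line of each record
    END {
      for (s in count) { distinct++; if (count[s] == 1) singletons++ }
      printf "reads=%d distinct=%d singletons=%d\n", reads, distinct, singletons
    }
  ' "$@"
}
```

For compressed input, `gzip -dc 1_total_run.fastq.gz | derep_stats` works because awk falls back to stdin; if `distinct` equals `reads`, every dereplicated feature will have a frequency of 1.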