deniribicic / q2ont

23 stars, 23 watchers, 11 forks, 41 KB

Bash pipeline for analysis of ONT full-length 16S sequences in QIIME2

Shell 100.00%

q2ont's People

Stargazers

Watchers

Forkers

talalhossain samanthaeva miguensblanco hj1994412 joemunju amsyarimorni muguading marynjerey kasmiyassin smwasya brysonkimemia

q2ont's Issues

Pipeline configuration issue?

Hi, I am trying to configure the pipeline, but it shows the following error:

--
$ conda env create -n qiime2-2019.7 --file qiime2-2019.7-py36-linux-conda.yml

NotWritableError: The current user does not have write permissions to a required path.
path: /anaconda3/envs/.conda_envs_dir_test
uid: 1024
gid: 1023

If you feel that permissions on this path are set incorrectly, you can manually
change them by executing

$ sudo chown 1024:1023 /anaconda3/envs/.conda_envs_dir_test

In general, it's not advisable to use 'sudo conda'.
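The error means conda cannot write to the shared `/anaconda3/envs` directory. Instead of changing ownership with `sudo`, one common workaround is to create the environment under a directory you own, using `conda env create --prefix`. The sketch below is not part of q2ONT: the helper name `writable_env_prefix` and the fallback location `$HOME/conda-envs` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of q2ONT): print a conda environment prefix
# inside a directory the current user can write to.
#   $1 = preferred envs directory (e.g. /anaconda3/envs)
#   $2 = environment name
#   $3 = fallback directory (assumption: defaults to $HOME/conda-envs)
writable_env_prefix() {
  local envs_dir="$1" name="$2" fallback="${3:-$HOME/conda-envs}"
  if [ -d "$envs_dir" ] && [ -w "$envs_dir" ]; then
    printf '%s/%s\n' "$envs_dir" "$name"
  else
    # Create the fallback directory so conda can place the environment there.
    mkdir -p "$fallback"
    printf '%s/%s\n' "$fallback" "$name"
  fi
}

# Usage sketch (requires conda; not executed here):
#   prefix="$(writable_env_prefix /anaconda3/envs qiime2-2019.7)"
#   conda env create --prefix "$prefix" --file qiime2-2019.7-py36-linux-conda.yml
#   conda activate "$prefix"
```

An environment created with `--prefix` is activated by its path rather than by name, so no write access to the shared installation is needed.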

Unable to run q2ONT.sh in QIIME 2 Core

Hi Deni,

Thanks for this pipeline. I find it great and straightforward, but I am writing because I have a small issue.
I think I have all the artifacts ready and the script downloaded to my working directory, and then I run:
/home/qiime2/q2ONT.sh [-i 0_basecalled-fastq] [-j SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva_132_99_16S_sequence.qza] [-c silva-132-99-nb-classifier.qza] [-t 2]

Immediately after running the script, I got the following message:

```
concatenating .fastq files...
cat: '/*.fastq': No such file or directory

all fastq files merged!

demultiplexing and trimming reads
could not find cpp_functions.so - please reinstall
demultiplexing failed!!!

/home/qiime2/q2ONT.sh: line 96: cd: 2_demux-reads: No such file or directory
```

I am probably doing something wrong with this command, or with my sequences, etc.

Anyway, I would appreciate any help.

Best,

Jesús
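In usage messages, square brackets mark optional arguments and are not meant to be typed. If q2ONT.sh parses its options with bash's `getopts` (an assumption; check the script itself), a first argument such as `[-i` does not start with `-`, so option parsing stops immediately, the input directory stays empty, and the glob becomes `/*.fastq`, which matches the first line of the error above. The hypothetical `parse_input_dir` function below is a stand-in that reproduces this behaviour; it is not the script's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for option parsing in q2ONT.sh (the real script may differ).
# Prints the value given to -i, or an empty line if -i was never parsed.
parse_input_dir() {
  local OPTIND=1 opt input_dir=""
  while getopts "i:j:c:t:" opt; do
    case "$opt" in
      i) input_dir="$OPTARG" ;;
      j|c|t) ;;          # other options are accepted but ignored in this sketch
      *) return 1 ;;     # unknown option
    esac
  done
  printf '%s\n' "$input_dir"
}

# Without brackets, -i is parsed as expected:
parse_input_dir -i 0_basecalled-fastq -c silva-132-99-nb-classifier.qza -t 2
# With brackets copied from a usage line, getopts stops at "[-i", input_dir
# stays empty, and "$input_dir/*.fastq" would expand to "/*.fastq":
parse_input_dir '[-i' '0_basecalled-fastq]' '[-t' '2]'
```

In practice, this means running the command from the issue with the brackets removed.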

Issue pulling 16S Silva and .sh

I've tried pulling both, but they will not download onto my Mac.

Do you have a command for cloning your .sh script? It is not currently in the protocol. Do I need to do this in the qiime2 environment?
Can I update qiime2 to the current version?
Can I use the qiime2 full length reads for SILVA?

Problem with the classifier

Hi,

I am trying to use your pipeline. First I tried to launch q2ONT with a SILVA 132 classifier from my previous Illumina work, but I got an error:

Plugin error from feature-classifier:
The scikit-learn version (0.19.1) used to generate this artifact does not match the current version of scikit-learn installed (0.21.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.

So I tried to train a new classifier in the new conda environment, but it seems to run without producing anything.
Do you have a copy of the one that you use?

Regards

Need help creating a barplot of the q2ONT results.

First of all thank you for the q2ONT script! Since I am relatively new to bioinformatics this is a great help for me.
I have run your q2ONT pipeline on our Nanopore 16S sequencing data. Importing the ONT data, training the classifier with the SILVA database, and the rest of the pipeline worked well, and the following files were generated from the basecalled data:
1_total_run.fastq.gz
2_demux-reads (folder)
3_QC (folder)
4_single-end-demux.qza
4_single_end-demux.qzv
5_derep-seqs.qza
5_derep-seqs.qzv
5_derep-table.qza
5_derep-table.qzv
6.1_uchime-ref-out (folder)
6.2_new-ref-seqs-op_ref-85.qza
6.2_rep-seqs-op_ref-85.qza
6.2_table-op_ref-85.qza
7_aligned-filtered_derep-seqs.qza
8_masked-aligned-filtered_derep-seqs.qza
9_unrooted-tree.qza
10_rooted-tree.qza
11_taxonomy-sklearn.qza

and exported:
dna-sequences.fasta
feature-table.biom
table-with-taxonomy.biom
taxonomy.tsv
tree.nwk

Now I'm searching for a way to visualize the results in barplots. I have already installed phyloseq, but I don't quite understand how to generate a barplot with the files I received. Could you please show me a way to do this with phyloseq?
I also tried to visualize the results with the QIIME visualizer and iTOL, but I have some problems with these options; maybe you can help me with those too.
I want to create a barplot of the data with the qiime taxa barplot visualizer (https://docs.qiime2.org/2019.10/plugins/available/taxa/barplot/). As the input table artifact I would use the file "6.2_table-op_ref-85.qza", and as the input taxonomy artifact the file "11_taxonomy-sklearn.qza". Are these the correct files? I also need a metadata file. Could you tell me how to generate it in QIIME from the data above?
I also uploaded the file "10_rooted-tree.qza" to iTOL (https://itol.embl.de/) to visualize the results, but unfortunately the tool only shows the tree with the OTU IDs and not with the taxa. Do you know how I can display the taxa there?
If you need more information please let me know.
Thank you a lot!
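For `qiime taxa barplot`, a metadata file needs one row per sample, with the sample IDs in the first column under a recognised identifier header such as `sample-id`, and the IDs must match those in the feature table (they are listed in the per-sample details of a `qiime feature-table summarize` visualization). The hypothetical `write_sample_metadata` helper below writes such a tab-separated file; the `group` column and its `unknown` placeholder are assumptions meant to be replaced with real sample information. The usage comment also assumes `11_taxonomy-sklearn.qza` was classified from the same feature IDs as the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal QIIME 2 sample metadata file.
#   $1 = output path; remaining arguments = sample IDs.
# The "group" column is a placeholder (assumption) to be edited by hand.
write_sample_metadata() {
  local out="$1"
  shift
  printf 'sample-id\tgroup\n' > "$out"
  local id
  for id in "$@"; do
    printf '%s\tunknown\n' "$id" >> "$out"
  done
}

# Usage sketch (sample IDs are illustrative; use the IDs in your feature table):
#   write_sample_metadata sample-metadata.tsv barcode01 barcode02
#   qiime taxa barplot \
#     --i-table 6.2_table-op_ref-85.qza \
#     --i-taxonomy 11_taxonomy-sklearn.qza \
#     --m-metadata-file sample-metadata.tsv \
#     --o-visualization taxa-barplot.qzv
```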

Memory issues and data volume

Hi, I ran your pipeline on a small subset of my ONT data and it seems to work well.
Next, I tried to run it on the data from my full run, and when it comes to assigning taxonomy,

qiime vsearch cluster-features-open-reference \
  --i-table 6.1_uchime-ref-out/table-nonchimeric-wo-borderline.qza \
  --i-sequences 6.1_uchime-ref-out/rep-seqs-nonchimeric-wo-borderline.qza \
  --i-reference-sequences $reference_seqs \
  --p-perc-identity 0.85 \
  --o-clustered-table 6.2_table-op_ref-85.qza \
  --o-clustered-sequences 6.2_rep-seqs-op_ref-85.qza \
  --o-new-reference-sequences 6.2_new-ref-seqs-op_ref-85.qza \
  --p-threads $threads

I get a memory error (32 cores and 126 GB RAM).

I then subsampled the data, keeping only 40% of the reads in each sample, and re-ran the pipeline, but I still have the same issue.

Do you have an idea how to optimize this step? What is the typical volume of your data?

PS: For information, the total volume of the 12 samples in my run after subsampling and trimming is 12 × 300 MB = 3.6 GB.

PS2: full traceback of the error log:

Running external command line application. This may print messages to stdout and/or stderr.

The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/tmp9bmzjk7j --id 0.85 --db /tmp/qiime2-archive-ew75tn5v/9f4b9a95-e8c4-45be-a980-6faaa9b857c7/data/dna-sequences.fasta --uc /tmp/tmpqsl93h8o --strand plus --qmask none --notmatched /tmp/tmp72o2x5lw --threads 24

vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file /tmp/qiime2-archive-ew75tn5v/9f4b9a95-e8c4-45be-a980-6faaa9b857c7/data/dna-sequences.fasta 100%
521145303 nt in 369953 seqs, min 900, max 2961, avg 1409
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 844922 of 2092876 (40.37%)
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --sortbysize /tmp/tmp72o2x5lw --xsize --output /tmp/q2-DNAFASTAFormat-anta19n4

vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file /tmp/tmp72o2x5lw 100%
1747135600 nt in 1247954 seqs, min 1400, max 1400, avg 1400
Getting sizes 100%
Sorting 100%
Median abundance: 1
Writing output 100%
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --cluster_size /tmp/tmpa7j30vmp --id 0.85 --centroids /tmp/q2-DNAFASTAFormat-xiom9wn4 --uc /tmp/tmpwz5to5f2 --qmask none --xsize --threads 24

vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file /tmp/tmpa7j30vmp 100%
1747135600 nt in 1247954 seqs, min 1400, max 1400, avg 1400
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 512147 Size min 1, max 55355, avg 2.4
Singletons: 482081, 38.6% of seqs, 94.1% of clusters
Traceback (most recent call last):
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in call
results = action(**arguments)
File "</home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-126>", line 2, in cluster_features_open_reference
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 502, in callable_executor
prov = provenance.fork(name, output)
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 438, in fork
forked.add_ancestor(alias)
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 167, in add_ancestor
shutil.copytree(str(grandcestor), str(destination))
File "/home/bioinfo/anaconda3/envs/qiime2-2019.7/lib/python3.6/shutil.py", line 359, in copytree
raise Error(errors)
shutil.Error: [('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/metadata.tsv', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/metadata.tsv', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/action.yaml', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/action/action.yaml', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/metadata.yaml', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/metadata.yaml', '[Errno 28] No space left on device'), ('/tmp/qiime2-archive-7k562zzb/94346e54-6bf0-4227-95a8-533257fdabcc/provenance/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/citations.bib', '/tmp/qiime2-provenance-mi86hqxk/artifacts/1635fd96-bf73-4ae8-bcf4-a392a2f1d920/citations.bib', '[Errno 28] No space left on device')]
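Although reported as a memory error, the traceback ends in `[Errno 28] No space left on device` while QIIME 2 copies provenance files under `/tmp`, after vsearch has already reported finishing its clustering. That points to the temporary directory filling up rather than RAM running out. QIIME 2 creates its temporary files through Python's `tempfile` module, which honours the `TMPDIR` environment variable, so one workaround is to point `TMPDIR` at a disk with more free space before running the pipeline. The helper name `use_tmpdir` and the example path `/data/tmp` below are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a temporary directory on a chosen disk,
# export it as TMPDIR, and print how many kilobytes are free there.
use_tmpdir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  export TMPDIR="$dir"
  # POSIX df output: the 4th field of the second line is available space in KB.
  df -Pk "$dir" | awk 'NR == 2 { print $4 }'
}

# Usage sketch (assumed path on a larger disk), then rerun the pipeline
# from the same shell:
#   use_tmpdir /data/tmp
#   bash q2ONT.sh ...
```

Call the function directly rather than inside `$( )`, otherwise the `export` only affects a subshell.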

Is 85% OTU clustering needed for ONT?

Dear Deni,
Thank you for developing this tool. We used it to analyse a dataset but were quite surprised to see so little difference between healthy and diseased samples. I am new to 16S metabarcoding with nanopore and have many doubts.
I understand that ONT has higher error rates than Illumina, but in my humble opinion, 85% OTU clustering is far too low. Could the small differences between samples be caused by this? Trying to figure this out from the literature, I find that in many cases researchers perform direct taxonomic classification without an OTU clustering step (I infer this because no OTU clustering step is mentioned, so I may well be wrong). I was wondering whether clustering is really needed when analysing the full-length 16S rRNA gene, since the number of reads produced with this technology is much lower than with Illumina.

All the best,
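A rough calculation puts the 85% threshold in perspective. For 1,400 nt reads (the trimmed length seen in the vsearch log and in the next issue on this page), each percentage point of identity corresponds to 14 positions. The sketch below ignores gaps and vsearch's exact identity definition, so it is only an approximation; `max_differences` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Approximate number of differing positions allowed between two aligned
# sequences of equal length at a given percent identity (gaps ignored).
#   $1 = sequence length (nt), $2 = identity threshold in whole percent
max_differences() {
  local length="$1" identity="$2"
  echo $(( length * (100 - identity) / 100 ))
}

for pct in 85 90 97 99; do
  printf '%s%% identity over 1400 nt: up to %s differences\n' \
    "$pct" "$(max_differences 1400 "$pct")"
done
```

At 85%, reads differing at up to about 210 of 1,400 positions can fall into the same OTU, compared with about 42 at the commonly used 97% threshold.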

frequency per feature after processing nanopore seq data in qiime2

Dear,
I am writing to get input from more experienced users. I have nanopore data and am using the q2ONT command line to process my 16S rRNA gene sequencing data. After demultiplexing, adapter removal, and trimming the reads to 1400 bp, I imported my sequencing data into QIIME 2.
I used these commands to dereplicate the sequences and to obtain the feature table, the representative sequences, and the feature table summary.

Dereplication of sequences
qiime vsearch dereplicate-sequences \
  --i-sequences 4_single-end-demux.qza \
  --o-dereplicated-table 5_derep-table.qza \
  --o-dereplicated-sequences 5_derep-seqs.qza

Visualization files
qiime feature-table tabulate-seqs \
  --i-data 5_derep-seqs.qza \
  --o-visualization 5_derep-seqs.qzv

qiime feature-table summarize \
  --i-table 5_derep-table.qza \
  --o-visualization 5_derep-table.qzv

After these steps, I got two files, derep-seqs.qzv and derep-table.qzv.

Upon checking derep-table.qzv with qiime2 view, I realized that something might have gone wrong, as the frequency per feature is showing 1. A photo is attached.

Could you please provide some insight into what could have gone wrong to produce such results, or is it normal to get such results when processing nanopore sequencing data?
Thank you
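`qiime vsearch dereplicate-sequences` collapses only reads that are exactly identical, so a feature's frequency is the number of reads sharing its exact sequence. Given the per-base error rate of nanopore reads, two 1,400 nt reads from the same organism are rarely identical, so most features having a frequency of 1 at this stage is likely expected rather than a sign of failure; in q2ONT it is the later clustering step (`cluster-features-open-reference`) that groups similar reads. The awk-based `derep_summary` helper below is hypothetical, not part of QIIME 2, and only illustrates what exact dereplication counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many unique sequences a FASTA file contains
# and how many of them occur exactly once (what exact dereplication measures).
derep_summary() {
  awk '
    /^>/ { if (seq != "") count[seq]++; seq = ""; next }  # header: store previous record
    { seq = seq toupper($0) }                             # join wrapped sequence lines
    END {
      if (seq != "") count[seq]++
      for (s in count) { unique++; if (count[s] == 1) singletons++ }
      printf "unique=%d singletons=%d\n", unique, singletons
    }' "$1"
}
```

Because matching is exact, a single base difference anywhere in a read is enough to make it a separate feature.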
