iquasere / mosca

Meta-Omics Software for Community Analysis

License: GNU General Public License v3.0

Python 91.11% Shell 1.89% Dockerfile 0.46% R 6.53%

metagenomics metatranscriptomics metaproteomics community-analysis

mosca's Introduction

Meta-Omics Software for Community Analysis

Welcome to the Meta-Omics Software for Community Analysis (MOSCA) Pipeline!

This pipeline is a powerful tool for analyzing and interpreting meta-omics data, including metagenomics, metatranscriptomics, and metaproteomics. It integrates a variety of state-of-the-art algorithms and visualizations, allowing you to easily and efficiently analyze your data.

To get started, check out MOSca's GUI TO perform meta-omics analyses (MOSGUITO):

🌐 https://iquasere.github.io/MOSGUITO

For more information on how to use MOSCA and its features, check out our wiki:

📖 https://github.com/iquasere/MOSCA/wiki

If you use MOSCA in your research, please cite our publication:

📄 Sequeira, João Carlos, et al. "MOSCA: an automated pipeline for integrated metagenomics and metatranscriptomics data analysis." International Conference on Practical Applications of Computational Biology & Bioinformatics. Springer, Cham, 2018.

We hope you find MOSCA helpful in your research endeavors! 🔍

mosca's Issues

join_information issue

Dear MOSCA developer,

First of all, thank you for your program; it is really very useful. Unfortunately, when executing the main script I ran into several problems, which I was able to solve manually by reading the contents of the Snakefile. However, I cannot solve this problem with the join_information.py script. I installed MOSCA with anaconda3 using mamba; my installed Python version is 3.7.10.

P.S. Sorry for my English; it is not my native language.

Select jobs to execute...

[Fri May 14 23:50:40 2021]
rule join_information:
input: /home/home-server/Documentos/libreria_chile/output/Annotation/uniprotinfo.tsv, /home/home-server/Documentos/libreria_chile/output/Annotation/Chile/aligned.blast, /home/home-server/Documentos/libreria_chile/output/Annotation/Chile/reCOGnizer_results.xlsx, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado4.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado5.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano8.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano5.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano3.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano6.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado2.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano2.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado7.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado10.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano1.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado6.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano4.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano10.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano7.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado8.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado3.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado1.readcounts, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Sano9.readcounts, 
/home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/Infectado9.readcounts
output: /home/home-server/Documentos/libreria_chile/output/MOSCA_Protein_Report.xlsx, /home/home-server/Documentos/libreria_chile/output/MOSCA_Entry_Report.xlsx, /home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/expression_matrix.tsv
jobid: 1
threads: 0

Job counts:
count jobs
1 join_information
1
python /share/MOSCA/scripts/join_information.py -e /home/home-server/Documentos/libreria_chile/experiments.tsv -t 0 -o /home/home-server/Documentos/libreria_chile/output -if tsv -nm TMM
MissingOutputException in line 262 of /share/MOSCA/scripts/Snakefile:
Job Missing files after 5 seconds:
/home/home-server/Documentos/libreria_chile/output/MOSCA_Protein_Report.xlsx
/home/home-server/Documentos/libreria_chile/output/MOSCA_Entry_Report.xlsx
/home/home-server/Documentos/libreria_chile/output/Metatranscriptomics/expression_matrix.tsv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
File "/home/home-server/anaconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 584, in handle_job_success
File "/home/home-server/anaconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/home-server/Documentos/libreria_chile/.snakemake/log/2021-05-14T235039.772067.snakemake.log

Parameter from config.json not recognized.

Hi,
I am trying to run the full MOSCA pipeline for a metatranscriptome differential expression analysis.
I created the config file with MOSGUITO. However, when I change the value of "minimum_read_length" from 100 (the default) to 50 (my reads are 76 bp long), Trimmomatic runs without errors but outputs empty FASTQ files that cannot be passed on to the assembly phase.

I checked the config.json file created in /output_dir/, and both the "minimum_mg_read_length" and "minimum_mt_read_length" variables have a value of "100".

The only way I found to solve the problem was to locally edit the "default_config.json" file in /root/miniconda3/envs/MOSCA/share/MOSCA. There I changed both minimum_mg_read_length and minimum_mt_read_length to 50, and now Trimmomatic outputs non-empty files that enter the assembly phase.

Although this solved the problem, I guess it shouldn't work that way.
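
The manual workaround described above can be scripted. The following is a minimal sketch, not part of MOSCA: it assumes the config is a flat JSON object and uses only the two key names quoted in the report; the function name `set_minimum_read_length` is hypothetical.

```python
import json
from pathlib import Path

# Key names as quoted in the report above; everything else here is illustrative.
READ_LENGTH_KEYS = ("minimum_mg_read_length", "minimum_mt_read_length")


def set_minimum_read_length(config_path, value):
    """Set both read-length keys in a JSON config file and return the updated dict."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    for key in READ_LENGTH_KEYS:
        config[key] = value
    # Write the file back in place so the next pipeline run picks up the new values.
    path.write_text(json.dumps(config, indent=2))
    return config
```

Calling `set_minimum_read_length("config.json", 50)` would mirror the edit the reporter made by hand; it does not address why the value chosen in MOSGUITO was not propagated in the first place.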

Please complete the following information:

OS: Ubuntu 22.04.3 LTS, WSL-2
Version of MOSCA: 2.3.0

Regards,
Cesar Ayala

New problem encounter

Hi iquasere,
Sorry to trouble you again. I have run into another problem, this time after the pipeline had been running for almost two days.
....
5500000 alignment record pairs processed.
5600000 alignment record pairs processed.
5700000 alignment record pairs processed.
5800000 alignment record pairs processed.
5834173 alignment pairs processed.
[Tue Feb 2 15:08:24 2021]
Finished job 9.
1 of 5 steps (20%) done

[Tue Feb 2 15:08:24 2021]
rule join_information:
input: output/Annotation/uniprotinfo.tsv, output/Annotation/Sample/aligned.blast, output/Annotation/Sample/reCOGnizer_results.xlsx, output/Metatranscriptomics/mtname.readcounts, output/Annotation/mgname.readcounts
output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv
jobid: 5
threads: 12

Job counts:
count jobs
1 join_information
1
python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py -e output/experiments.tsv -t 12 -o output -if tsv -nm TMM
2021-02-02 07:08:28: Joining data for sample: Sample
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py", line 147, in <module>
    Joiner().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py", line 58, in run
    sheet_names = pd.ExcelFile(recognizer_filename).sheet_names
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/xlrd/__init__.py", line 170, in open_workbook
    raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported
[Tue Feb 2 15:08:28 2021]
Error in rule join_information:
jobid: 0
output: output/MOSCA_Protein_Report.xlsx, output/MOSCA_Entry_Report.xlsx, output/Metatranscriptomics/expression_matrix.tsv

RuleException:
CalledProcessError in line 232 of /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/join_information.py -e output/experiments.tsv -t 12 -o output -if tsv -nm TMM' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.2.2/workflow/Snakefile", line 232, in __rule_join_information
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/mosca-1.2.2/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/zyshen/work/MOSCA/MOSCA-1.2.2/.snakemake/log/2021-02-02T143849.506822.snakemake.log

You can see that the size of mtname_bowtie2_report.txt is zero:
-rw-rw-r-- 1 zyshen zyshen 630 Feb 2 08:27 mtname.log
-rw-rw-r-- 1 zyshen zyshen 13M Feb 2 08:36 mtname.readcounts
-rw-rw-r-- 1 zyshen zyshen 4.5G Feb 2 08:27 mtname.sam
-rw-rw-r-- 1 zyshen zyshen 0 Feb 2 08:26 mtname_bowtie2_report.txt

Any suggestions? Thanks!
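
The XLRDError in the log above is raised because xlrd 2.0 and later no longer read .xlsx files. With pandas, the usual remedy is to select a different engine, for example `pd.ExcelFile(path, engine="openpyxl")`, assuming openpyxl is installed. As a dependency-free illustration of what the failing line is trying to obtain, the sketch below lists the sheet names of an .xlsx file by reading the workbook XML directly. The function name `xlsx_sheet_names` is hypothetical and this is not how MOSCA reads the file.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the main workbook part of an Office Open XML spreadsheet.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def xlsx_sheet_names(path):
    """Return the sheet names of an .xlsx file, in workbook order."""
    # An .xlsx file is a zip archive; sheet names are declared in xl/workbook.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.attrib["name"] for sheet in root.findall("m:sheets/m:sheet", NS)]
```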

Trouble installing

I tried to install MOSCA through conda biotools and I get the following error:

No such file or directory: 'output/Preprocess/SortMeRNA/read1.txt'

Hi iquasere,
Sorry to trouble you again. I am now testing more data with MOSCA.
During the run I encountered the error below, but the pipeline did not stop and kept running.
Will this missing file affect the rest of the process, or can we ignore it?

............................
join output/Preprocess/SortMeRNA/read1.txt output/Preprocess/SortMeRNA/read2.txt | awk '{print $1" "$2"\n"$3"\n+\n"$4 > "output/Preprocess/SortMeRNA/mt3namerep_forward.fastq";print $1" "$5"\n"$6"\n+\n"$7 > "output/Preprocess/SortMeRNA/mt3namerep_reverse.fastq"}'
Traceback (most recent call last):
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/preprocess.py", line 379, in <module>
    Preprocesser().run()
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/preprocess.py", line 349, in run
    original_files=True if args.input == original_input else False)
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/preprocess.py", line 226, in rrna_removal
    '{}/{}_reverse.fastq'.format(out_dir, name), out_dir)
  File "/media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/preprocess.py", line 191, in remove_orphans
    os.remove(file)
FileNotFoundError: [Errno 2] No such file or directory: 'output/Preprocess/SortMeRNA/read1.txt'
[Thu Mar 4 16:38:34 2021]
Error in rule preprocess:
jobid: 0
output: output/Preprocess/Trimmomatic/quality_trimmed_mt3name_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_mt3name_reverse_paired.fq

RuleException:
CalledProcessError in line 111 of /media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/Snakefile:
Command 'set -euo pipefail; python /media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/preprocess.py -i /media/zyshen/MOSCA/20201023_L_QMK/mt3_R1.fastq,/media/zyshen/MOSCA/20201023_L_QMK/mt3_R2.fastq -t 10 -o output/Preprocess -adaptdir /media/zyshen/work/MOSCA/MOSCA-1.3.4/adapters -rrnadbs /media/zyshen/work/MOSCA/MOSCA-1.3.4/rRNA_databases -d mrna -rd /media/zyshen/work/MOSCA/MOSCA-1.3.4 -n mt3name --minlen 100 --avgqual 20' returned non-zero exit status 1.
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2340, in run_wrapper
File "/media/zyshen/work/MOSCA/MOSCA-1.3.5/workflow/Snakefile", line 111, in __rule_preprocess
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 568, in _callback
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
File "/media/zyshen/miniconda3/envs/snakemake/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
Exiting because a job execution failed. Look above for error message
Removed: output/Preprocess/SortMeRNA/mt3namerep_interleaved.fastq
fastqc --outdir output/Preprocess/FastQC --threads 10 --extract output/Preprocess/SortMeRNA/mt3namerep_forward.fastq output/Preprocess/SortMeRNA/mt3namerep_reverse.fastq
Started analysis of mt3namerep_forward.fastq
Started analysis of mt3namerep_reverse.fastq
Approx 5% complete for mt3namerep_forward.fastq
Approx 5% complete for mt3namerep_reverse.fastq
Approx 10% complete for mt3namerep_forward.fastq
Approx 10% complete for mt3namerep_reverse.fastq
Approx 15% complete for mt3namerep_forward.fastq
Approx 15% complete for mt3namerep_reverse.fastq
Approx 20% complete for mt3namerep_forward.fastq
.......

zhiyong
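
The traceback in this report ends in `os.remove(file)` raising FileNotFoundError for a temporary file. A minimal sketch of a removal helper that tolerates an already-missing file follows. It assumes that a missing temporary file is harmless at cleanup time, which the report does not establish for MOSCA's remove_orphans step; the name `remove_if_exists` is hypothetical.

```python
import contextlib
import os


def remove_if_exists(path):
    """Delete path; return True if a file was removed, False if it was already absent."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
        return True
    # Reached only when os.remove raised FileNotFoundError and it was suppressed.
    return False
```

Suppressing the error only hides the symptom: if read1.txt was never written, the upstream step that should have produced it still needs checking.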

Mismatch between Wiki and actual config requirements

The parameter "experiment", according to the wiki, is "Name of TSV file with information on samples/files/conditions.", but the code just tries to create a dataframe out of that string, suggesting it ought to be a dict as in the example config.
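
One way to reconcile the two readings is to accept both forms. The sketch below is an assumption-laden illustration, not MOSCA's code: `load_experiments` is a hypothetical helper that returns a list of row dicts whether it is given a path to a TSV file or already-parsed records.

```python
import csv


def load_experiments(experiments):
    """Return experiment rows as a list of dicts.

    Accepts either a path to a tab-separated file with a header row,
    or an iterable of dict-like records (as in an inline config).
    """
    if isinstance(experiments, str):
        with open(experiments, newline="") as handle:
            return list(csv.DictReader(handle, delimiter="\t"))
    return [dict(row) for row in experiments]
```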

Possible to run MT without MG?

Hi,

Thanks for the recent overhaul to the software. I was wondering if it is possible to run MOSCA only on MT data, without MG data? I have some MT data simulated with Polyester and art_illumina (which only simulate RNA-seq data) with no associated MG reads, and I was wondering if it's possible to do DE with it.

Thanks.

Error in rule preprocess

Dear MOSCA Devs,

First, thanks for making MOSCA, this is a great tool!

Unfortunately, I'm having an issue with the pipeline that seems to be getting stuck during preprocessing. I'm not sure if it is related to this issue, but I thought I should create another issue just in case.

To test the pipeline I'm only running one set of R1/R2 read files for one sample, and it seems to be getting stuck at the SortMeRNA stage. I've included the traceback error, but I'm not able to figure out what the error refers to.

  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/mosca_tools.py", line 24, in run_command
    check=True)
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sortmerna', '--ref', 'resources_directory/rRNA_databases/silva-arc-23s-id98.fasta,resources_directory/rRNA_databases/silva-arc-23s-id98.idx:resources_directory/rRNA_databases/rfam-5s-database-id98.fasta,resources_directory/rRNA_databases/rfam-5s-database-id98.idx:resources_directory/rRNA_databases/silva-euk-18s-id95.fasta,resources_directory/rRNA_databases/silva-euk-18s-id95.idx:resources_directory/rRNA_databases/rfam-5.8s-database-id98.fasta,resources_directory/rRNA_databases/rfam-5.8s-database-id98.idx:resources_directory/rRNA_databases/silva-bac-23s-id98.fasta,resources_directory/rRNA_databases/silva-bac-23s-id98.idx:resources_directory/rRNA_databases/silva-bac-16s-id90.fasta,resources_directory/rRNA_databases/silva-bac-16s-id90.idx:resources_directory/rRNA_databases/silva-euk-28s-id98.fasta,resources_directory/rRNA_databases/silva-euk-28s-id98.idx:resources_directory/rRNA_databases/silva-arc-16s-id95.fasta,resources_directory/rRNA_databases/silva-arc-16s-id95.idx', '--reads', 'output/Preprocess/SortMeRNA/MIT1_interleaved.fastq', '--aligned', 'output/Preprocess/SortMeRNA/MIT1_accepted', '--fastx', '--other', 'output/Preprocess/SortMeRNA/MIT1_rejected', '-a', '24', '--paired_out']' returned non-zero exit status 1.
[Sun May  2 16:58:42 2021]
Error in rule preprocess:
    jobid: 0
    output: output/Preprocess/Trimmomatic/quality_trimmed_MIT1_forward_paired.fq, output/Preprocess/Trimmomatic/quality_trimmed_MIT1_reverse_paired.fq

RuleException:
CalledProcessError in line 111 of /home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/Snakefile:
Command 'set -euo pipefail;  python /home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/preprocess.py -i Raw/RNA_TR_zymo11_S6_R1_001.fastq,Raw/RNA_TR_zymo11_S6_R2_001.fastq -t 24 -o output/Preprocess -adaptdir resources_directory/adapters -rrnadbs resources_directory/rRNA_databases -d mrna -rd resources_directory -n MIT1 --minlen 100 --avgqual 20' returned non-zero exit status 1.
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2352, in run_wrapper
  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/Snakefile", line 111, in __rule_preprocess
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 569, in _callback
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2364, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/ctrivedi/Working/Deep_Purple/GR19_Metatranscriptomes/Mosca/.snakemake/log/2021-05-02T165747.387856.snakemake.log

I'm using the default config file, and I've included my experiments file (as txt as github doesn't seem to support tsv upload). Any information would be greatly appreciated. Thanks in advance for any help.

Sincerely,
Chris

experiments.txt

Error in rule upimapi -Annotation error-

Hi,

During a differential expression analysis, UPIMAPI breaks down with the following error:
"ERR: [Entry name] is not a valid column name for ID mapping. For more information, check https://github.com/iquasere/UPIMAPI/tree/master#sometimes-the-return-fields-are-not-properly-updated"

As additional files I am uploading the config file used, the log produced by Snakemake, and the complete error displayed on the screen.
config.json
MOSCA_log.txt
Screen_error.txt

I would love to get a complete execution of the MOSCA software, as it shows so much value for analyzing my samples.

Please complete the following information:

OS: Ubuntu 22.04.3 LTS, WSL-2
Version of MOSCA: 2.3.0

Regards.

Cesar Ayala

Several suggested improvements to install.bash

Things I think you should consider changing:

Remove Xming install line at beginning of file. This seems to be specifically a Windows program and should not be hard coded into the automated install file.
As of writing this, installing conda and then trying to conda install anything runs into SSL errors due to a change in pyOpenSSL. You need to upgrade it to at least 16.2.0 with python3 -m easy_install --upgrade pyOpenSSL==16.2.0
Conda installs should have an additional -y flag to suppress confirmation
Lines 23,31 you shouldn't hard code your own home directory path.
Line 25,31 please prepend any comments with a #
Line 35 I don't think you need to install aptitude just to install build-essential. Why not add build-essential to line 36 and do both at once?
Lines 73,74 are python code and do not belong in a bash file

File not found error

Hi,

I tried running MOSCA using the command:

python MOSCA/scripts/mosca.py --files /hpc/home/mzl5860/Ming_workdir/G549/MOSCA/files_rename/G549-01-R1.fastq.gz,/hpc/home/mzl5860/Ming_workdir/G549/MOSCA/files_rename/G549-01-R2.fastq.gz --output output

The program ran up to the FastQC step, but then returned the error:

Traceback (most recent call last):
  File "MOSCA/scripts/mosca.py", line 170, in <module>
    reporter.info_from_preprocessing(args.output, mg_name, mg[0])
  File "/ri/shared/modules7/miniconda3/envs/mosca-1.0.2/MOSCA/scripts/report.py", line 112, in info_from_preprocessing
    self.info_from_fastqc(output_dir, name, '[Initial quality assessment]', prefix2terms)
  File "/ri/shared/modules7/miniconda3/envs/mosca-1.0.2/MOSCA/scripts/report.py", line 76, in info_from_fastqc
    prefix2terms[prefix][0], name, prefix2terms[prefix][i])) for i in [1, 2]]
  File "/ri/shared/modules7/miniconda3/envs/mosca-1.0.2/MOSCA/scripts/report.py", line 76, in <listcomp>
    prefix2terms[prefix][0], name, prefix2terms[prefix][i])) for i in [1, 2]]
  File "/ri/shared/modules7/miniconda3/envs/mosca-1.0.2/MOSCA/scripts/mosca_tools.py", line 584, in parse_fastqc_report
    file = open(filename).read().split('\n')
FileNotFoundError: [Errno 2] No such file or directory: 'output/Preprocess/FastQC/G549-01-R1.fastq.gz_R1_fastqc/fastqc_data.txt'

The program seems to be trying to read from the wrong file. Is there some format that the file names need to be in?

Can this be fixed?

Thanks.
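
The path in the error, `G549-01-R1.fastq.gz_R1_fastqc`, keeps the full input file name, whereas FastQC is understood to drop common suffixes such as `.gz` and `.fastq` before appending `_fastqc`. The sketch below derives the expected directory name under that assumption; the suffix list covers common cases only and `fastqc_output_dir` is a hypothetical helper, not a MOSCA function.

```python
import os

# Suffixes assumed to be removed, in this order, when FastQC names its output.
STRIPPED_SUFFIXES = (".gz", ".bz2", ".txt", ".fastq", ".fq")


def fastqc_output_dir(reads_path):
    """Return the directory name FastQC is assumed to use for a reads file."""
    name = os.path.basename(reads_path)
    for suffix in STRIPPED_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name + "_fastqc"
```

For the file in this report, the helper yields `G549-01-R1_fastqc`, which differs from the directory the pipeline tried to open.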

typo in assembly.py

In assembly.py, line 111, I think it should be "alignment.log", not "alignment.loj"?

No adapter removal despite Adapter Content Fail in FastQC report

Hi,

Nice to see you were able to integrate the MT-only mode in 1.3.
However, preprocessing fails to recognise that adapters are present (as reported by FastQC) and skips adapter removal. I think I identified the problem in preprocess.py: the has_adapter() function checks for overrepresented sequences when it should check "Adapter Content" in the FastQC report. Changing the function as below solved it for me, and I believe it is the correct way to identify from the FastQC report whether adapters are present:

def has_adapters(self, fastqc_report):
    # Adapters count as present unless FastQC's "Adapter Content" module passes.
    data = parse_fastqc(fastqc_report)
    return data['Adapter Content'][0] != 'pass'
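
For readers without MOSCA's `parse_fastqc` at hand, the same check can be sketched directly against the text of a `fastqc_data.txt` file, in which each module starts with a line of the form `>>Module name<TAB>status`. This is an illustration under that format assumption; `module_statuses` and `adapters_flagged` are hypothetical names.

```python
def module_statuses(fastqc_data_text):
    """Map each FastQC module name to its status (pass, warn or fail)."""
    statuses = {}
    for line in fastqc_data_text.splitlines():
        # Module headers start with ">>"; ">>END_MODULE" only closes a module.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status.strip()
    return statuses


def adapters_flagged(fastqc_data_text):
    """Return True unless the "Adapter Content" module is present and passes."""
    return module_statuses(fastqc_data_text).get("Adapter Content") != "pass"
```

A missing "Adapter Content" module is treated as flagged here, which is a conservative choice rather than documented FastQC behaviour.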

Best,
Niclas

self.bowtie2 not defined in assembly.py

In assembly.py, line 113, self.bowtie2 throws an error because bowtie2 was never defined. Seems like an easy error to catch that should have been obvious if this pipeline was tested at all.

Issue with metaquast after assembly

Hi,

I've been having issues running MOSCA on RNA-seq data, specifically after assembly, where MetaQUAST keeps failing. From the error, it seems there is no contig.fasta or scaffold.fasta file. The assembly seems to produce a transcripts.fasta file, which I assume is the expected output of rnaSPAdes?

Here is the outcome.

Hopefully you will be able to help?

Kind regards,
Imadh

Snakefile KeyError for 'split_gene_calling'

Trying to use MOSCA 2.0 I get this error:

KeyError in file /local_usr/lib/miniconda3/envs/mosca/share/MOSCA/Snakefile, line 197:
'split_gene_calling'

It seems to be looking for a split_gene_calling section in the config file, but I see no documentation on what to put there, and MOSGUITO doesn't add it either.
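
A KeyError of this kind is commonly avoided by letting missing keys fall back to defaults before the workflow reads them. The sketch below shows the pattern only; the default value `False` for `split_gene_calling` is a placeholder assumption, since the report does not say what the option expects, and `with_defaults` is a hypothetical helper.

```python
# Placeholder default: the real default for this option is not documented in the report.
DEFAULTS = {"split_gene_calling": False}


def with_defaults(config, defaults=DEFAULTS):
    """Return a copy of config in which missing keys fall back to defaults."""
    # Later entries win, so values supplied by the user override the defaults.
    return {**defaults, **config}
```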

typo in mosca_tools.py?

In the file mosca_tools.py, line 124 returns an error; if blast is None, then blast.replace( ) cannot run since None has no replace method. Should it be reference.replace( ) instead? The next line seems to imply the check should be for reference object not blast.

Empty Output Directories

Hi,

I have used my metagenomic data to check this pipeline. The pipeline worked well but the output directories generated were empty with no information. Please look into the matter.

nbhb-001-017:MOSCA-master csid$ python main.py --files /Users/csid/Documents/SRP187387_1.fastq /Users/csidhu/Documents/SRP187387_2 -o /Users/csid/Documents/
{ 'annotation_database': 'Databases/Annotation/uniprot.dmnd',
'assembler': 'metaspades',
'conditions': None,
'files': [ '/Users/csid/Documents/SRP187387_1.fastq',
'/Users/csid/Documents/SRP187387_2'],
'metaproteomic': False,
'no_annotation': False,
'no_assembly': False,
'no_binning': False,
'no_preprocessing': False,
'output': '/Users/csid/Documents',
'output_level': 'min',
'type_of_data': 'paired'}

{'files': ['/Users/csid/Documents/SRP187387_1.fastq', '/Users/csid/Documents/SRP187387_2'], 'type_of_data': 'paired', 'assembler': 'metaspades', 'annotation_database': 'Databases/Annotation/uniprot.dmnd', 'output': '/Users/csidhu/Documents', 'no_preprocessing': False, 'no_assembly': False, 'no_annotation': False, 'no_binning': False, 'output_level': 'min', 'metaproteomic': False, 'conditions': None}
Files
Experiment1
MG files: /Users/csid/Documents/SRP187387_1.fastq
Experiment2
MG files: /Users/csid/Documents/SRP187387_2
Type of data: paired
Assembler: metaspades
Annotation database: Databases/Annotation/uniprot.dmnd
Output: /Users/csid/Documents
No preprocessing: False
No assembly: False
No annotation: False
No binning: False
Output level: min
Metaproteomic: False
Conditions: None
Creating directories at /Users/csid/Documents
Created /Users/csid/Documents/Preprocess/FastQC
Created /Users/csid/Documents/Preprocess/Trimmomatic
Created /Users/csid/Documents/Preprocess/SortMeRNA
Created /Users/csid/Documents/Assembly
Created /Users/csid/Documents/Annotation
Created /Users/csid/Documents/Metatranscriptomics_analysis
Splitting reads in /Users/csid/Documents/SRP187387_1.fastq to /Users/csid/Documents/Preprocess/SRP187387_1_R1.fastq and /Users/csid/Documents/Preprocess/SRP187387_1_R2.fastq.
Handling SRP187387_1
Preprocessing of metagenomic reads has begun
Assembly has begun
Annotation has begun
Binning has begun
Splitting reads in /Users/csid/Documents/SRP187387_2 to /Users/csid/Documents/Preprocess/SRP187387_2_R1.fastq and /Users/csid/Documents/Preprocess/SRP187387_2_R2.fastq.
Handling SRP187387_2
Preprocessing of metagenomic reads has begun
Assembly has begun
Annotation has begun
Binning has begun
nbhb-001-017:MOSCA-master csid$

Versioning / releases

Hi there,

I am looking into metatranscriptomics workflows and found yours. One major drawback I see is that you have no releases/versioning and therefore unnecessarily make reproducibility harder.

I was also wondering whether it is possible to use your pipeline with an existing metagenome assembly and annotation. The README implies that both MG and MT reads are needed; did I understand that correctly?

Best wishes

self.input error in annotation step

When running mosca.py, an error is returned on line 147 because the Annotater( ) object never receives an "input" or an "mg_name" argument; line 531 of annotation.py requires both and therefore crashes.

Error in rule preprocess - FastQC after Sortmerna

Dear MOSCA Devs,

My apologies for creating a new issue, however, I added a comment to the old issue and am now realizing that there may be no notifications on closed issues. I'm reposting my issue here with a link to the comment. If this is improper procedure please let me know and I can delete this issue.

I'm testing the latest release of MOSCA! Unfortunately, though, I'm already hindered by another preprocess error.

Somehow I'm getting stuck on a FastQC step after rRNA removal using Sortmerna. I believe it's because the after_rrna_removal_* files are empty - or at least they are 0 kb on inspection. I haven't been able to determine why this is though.
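
A quick way to confirm the suspicion above before FastQC runs is to test the files for zero size. This is a minimal sketch for uncompressed FASTQ files; `find_empty_files` is a hypothetical helper and is not part of MOSCA.

```python
import os


def find_empty_files(paths):
    """Return the paths that are missing or have a size of zero bytes."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]
```

Passing the two after_rrna_removal_* paths would return both of them if they are indeed empty, which points at the rRNA removal step rather than at FastQC.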

Here is the full error, maybe you can track it better than I can? Thanks!

FastQC --threads 16 --extract /home/ctrivedi/Working/Deep_Purple/mosca_GR19/output/Preprocess/SortMeRNA/after_rrna_removal_GRIS1_TR_forward.fastq /home/ctrivedi/Working/Deep_Purple/mosca_GR19/output/Preprocess/SortMeRNA/after_rrna_removal_GRIS1_TR_reverse.fastq
Started analysis of after_rrna_removal_GRIS1_TR_forward.fastq
Analysis complete for after_rrna_removal_GRIS1_TR_forward.fastq
Failed to process file after_rrna_removal_GRIS1_TR_forward.fastq
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
        at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:101)
        at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:190)
        at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
        at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
        at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:178)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
        at java.base/java.lang.Thread.run(Thread.java:834)
Started analysis of after_rrna_removal_GRIS1_TR_reverse.fastq
Analysis complete for after_rrna_removal_GRIS1_TR_reverse.fastq
Failed to process file after_rrna_removal_GRIS1_TR_reverse.fastq
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
        at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:101)
        at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:190)
        at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
        at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
        at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:178)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
        at java.base/java.lang.Thread.run(Thread.java:834)
Traceback (most recent call last):
  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/preprocess.py", line 357, in <module>
    Preprocesser().run()
  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/preprocess.py", line 343, in run
    minlen=args.minlen, type_of_data=args.data)
  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/preprocess.py", line 248, in quality_trimming
    data = parse_fastqc_report(report)
  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/mosca_tools.py", line 249, in parse_fastqc_report
    file = open(filename).read().split('\n')
FileNotFoundError: [Errno 2] No such file or directory: '/home/ctrivedi/Working/Deep_Purple/mosca_GR19/output/Preprocess/FastQC/after_rrna_removal_GRIS1_TR_forward_fastqc/fastqc_data.txt'
[Thu Jan 13 16:39:23 2022]
Error in rule preprocess:
    jobid: 0
    output: /home/ctrivedi/Working/Deep_Purple/mosca_GR19/output/Preprocess/Trimmomatic/quality_trimmed_GRIS1_TR_forward_paired.fq, /home/ctrivedi/Working/Deep_Purple/mosca_GR19/output/Preprocess/Trimmomatic/quality_trimmed_GRIS1_TR_reverse_paired.fq

RuleException:
CalledProcessError in line 103 of /home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/Snakefile:
Command 'set -euo pipefail;  python /home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/preprocess.py -i /home/ctrivedi/Working/Deep_Purple/GR19_Metatranscriptomes/Raw/totalRNA/RNA_TR_zymo7_S4_R1_001.fastq.gz,/home/ctrivedi/Working/Deep_Purple/GR19_Metatranscriptomes/Raw/totalRNA/RNA_TR_zymo7_S4_R2_001.fastq.gz -t 16 -o /home/ctrivedi/Working/Deep_Purple/mosca_GR19/output/Preprocess -adaptdir resources_directory/adapters -rrnadbs resources_directory/rRNA_databases -d mrna -rd resources_directory -n GRIS1_TR --minlen 100 --avgqual 20' returned non-zero exit status 1.
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2347, in run_wrapper
  File "/home/ctrivedi/miniconda3/envs/mosca/share/MOSCA/scripts/Snakefile", line 103, in __rule_preprocess
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/ctrivedi/miniconda3/envs/mosca/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2359, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/ctrivedi/Working/Deep_Purple/mosca_GR19/.snakemake/log/2022-01-13T163802.188280.snakemake.log

Originally posted by @ChrisTrivedi in #15 (comment)
