Describe the bug My raw pair-end FASTQ data has the name like thi

BUG: [config.yaml error] about seq2science HOT 15 OPEN

bioinfolabmu commented on September 27, 2024

BUG: [config.yaml error]

from seq2science.

Comments (15)

bioinfolabmu commented on September 27, 2024

Alright, I think I know what the problem is:

fqsuffix: fq
fqext1: '1'
fqext2: '2'

This modification has solved my configuration file problem.
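For readers hitting the same error, here is a minimal sketch of how these three settings could map onto paired-end file names. It assumes files are expected to be named `<sample>_<fqext>.<fqsuffix>.gz`; that pattern and the helper `expected_fastqs` are illustrative assumptions, not taken from the seq2science code, so check the documentation for the exact rule.

```python
def expected_fastqs(sample, fqsuffix="fq", fqext1="1", fqext2="2"):
    """Return the paired-end file names implied by the three settings.

    Assumed pattern (not confirmed in this thread): <sample>_<fqext>.<fqsuffix>.gz
    """
    return tuple(f"{sample}_{ext}.{fqsuffix}.gz" for ext in (fqext1, fqext2))


# "sampleA" is a hypothetical sample name used only for illustration.
print(expected_fastqs("sampleA"))
```

Comparing the printed names with the files actually on disk is a quick way to see whether the suffix or the read extensions are the mismatch.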

bioinfolabmu commented on September 27, 2024

A related question about the samples.tsv file.

In the second line of this file, I have the following tab-delimited column names.

sample assembly dev_stage treatment biological_replicates

My question is: are these column names fixed keywords in your tool? For example, what happens if I use "developmental_stage" instead of "dev_stage", or "genome" instead of "assembly"?

I did not find this requirement in your documentation. I am sorry if I missed something there.

Maarten-vd-Sande commented on September 27, 2024

Great that you fixed it. You can have any number of columns and name them whatever you want. However, certain column names have a specific meaning, such as sample, assembly, biological_replicates, and descriptive_name; seq2science uses these columns internally. Moreover, you can use your own column names in the differential peak/gene calling step: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html#contrast-in-the-samples-tsv

Specifically:

"developmental_stage" -> "dev_stage" does not change how seq2science runs, as those columns are ignored (except when they are used to define contrasts).
"assembly" -> "genome" breaks seq2science, because the assembly column is required: it specifies which assembly is used.

bioinfolabmu commented on September 27, 2024

Thank you for the quick response.

(1) So, for the DEG analysis, the column names in samples.tsv should be consistent with your requirements. In fact, "dev_stage" should be "stages". That way we can be sure to get proper results from seq2science. Am I correct?

(2) Column names such as "sample", "assembly", "stages", "treatments", "biological_replicates", "technical_replicates" and "condition" apply easily to many researchers' data analysis needs. That should be sufficient. I noticed that "descriptive_name" must be unique across the rows of samples.tsv. Am I right? I am curious how seq2science uses "descriptive_name" internally.

Maarten-vd-Sande commented on September 27, 2024

(1) I don't understand the question. For the DEG analysis you can use any column(s) in the samples file. You can use dev_stage or stages. Just make sure that your contrast specification in the config.yaml reflects the correct column name.

If you use the column dev_stage, the contrast is dev_stage_one_two; if you use stages, it is stages_one_two. It can be any column you want. You can even combine multiple columns for batch effect correction: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html#batch-effect-correction

(2) descriptive_name is one of the special columns that seq2science uses internally, just like sample, assembly, and biological_replicates. It is used for the count table and for the final multiqc report.
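To make the naming rule concrete, here is a rough sketch that splits a contrast string such as dev_stage_one_two back into a column name and two group values. It reads the examples above as `<column>_<group>_<group>` and assumes group values contain no underscores; `split_contrast` is a hypothetical helper, not seq2science's own parser.

```python
def split_contrast(contrast, columns):
    """Split '<column>_<group_a>_<group_b>' using the known column names.

    Assumes group values contain no underscores (an illustration-only rule).
    """
    # Try longer column names first so 'dev_stage' wins over a shorter prefix.
    for column in sorted(columns, key=len, reverse=True):
        prefix = column + "_"
        if contrast.startswith(prefix):
            group_a, sep, group_b = contrast[len(prefix):].partition("_")
            if sep and group_a and group_b:
                return column, group_a, group_b
    raise ValueError(f"no column in {list(columns)} matches contrast {contrast!r}")


columns = ["sample", "assembly", "dev_stage", "treatment", "biological_replicates"]
print(split_contrast("dev_stage_one_two", columns))
```

With these columns, stages_one_two raises a ValueError, which mirrors the advice above: the contrast has to use the column name that is actually present in samples.tsv.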

bioinfolabmu commented on September 27, 2024

Thanks. My question was: should we use 'stage' instead of 'dev_stage' or 'developmental_stages'? You said that it does not matter, because they are not keywords used in seq2science. My guess is that 'condition' is also not a keyword used by seq2science, so we can use variations of it, such as 'conditions' or 'my_conditions'. Right?

bioinfolabmu commented on September 27, 2024

When I run my own data for alignment with STAR, I encounter a bug, which I am debugging now. I noticed that Salmon, as the quantifier, is not affected at all and generates its own output. This suggests that Salmon does its own alignment to finish the quantification. My next question: after I fix the STAR bug, how can I feed the STAR alignment results to Salmon for quantification?

Maarten-vd-Sande commented on September 27, 2024

Yes you are right! I guess that's not entirely clear from the docs.

sample, assembly, descriptive_name, biological_replicates, and technical_replicates are column names used by seq2science internally. Any other column name is basically ignored, unless you use it for DESeq2.
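A small sketch of what that means for a samples.tsv header: it sorts the column names into the five listed above and the free-form rest, and flags a missing assembly column. The function name `classify_columns` is made up for this illustration, and only assembly is treated as required because that is the only requirement stated in this thread.

```python
# Column names this thread says seq2science uses internally.
RESERVED = {
    "sample",
    "assembly",
    "descriptive_name",
    "biological_replicates",
    "technical_replicates",
}
# Only 'assembly' is explicitly called required in this thread.
REQUIRED = {"assembly"}


def classify_columns(header_line):
    """Return (reserved, free, missing) column names for a tab-delimited header."""
    columns = header_line.rstrip("\n").split("\t")
    reserved = [c for c in columns if c in RESERVED]
    free = [c for c in columns if c not in RESERVED]
    missing = sorted(REQUIRED - set(columns))
    return reserved, free, missing


header = "sample\tassembly\tdev_stage\ttreatment\tbiological_replicates"
print(classify_columns(header))
```

Renaming assembly to genome in that header moves it into the free-form group and reports assembly as missing, which matches the failure described earlier in the thread.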

bioinfolabmu commented on September 27, 2024

Here is what is in my config.yaml:

aligner:
  star:
    align: --quantMode GeneCounts --outSAMtype BAM

Seq2science gives me a fatal error in the log file saying "Duplicate parameter". I am trying to solve this problem.

Maarten-vd-Sande commented on September 27, 2024

Yeah that's perhaps unclear on our side (again). Almost all rules have sensible defaults, so you don't have to tune them. So you could just say: aligner: star.

We force STAR to output a BAM by default, as we need a BAM as its output, so we always pass --outSAMtype BAM Unsorted. Specifying --outSAMtype again in the config gives the duplicate parameter error.

see: https://vanheeringen-lab.github.io/seq2science/content/all_rules.html#star-align
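The clash can be spotted before running anything with a check like the one below. It is only a sketch: `duplicate_flags` is a hypothetical helper, and the forced parameter string is copied from the explanation above (written in STAR's two-word form) rather than read from seq2science itself.

```python
import shlex


def duplicate_flags(user_params, forced_params):
    """Return the '--flags' that occur in both parameter strings, sorted."""
    def flags(params):
        # Keep only option names; values such as 'GeneCounts' are skipped.
        return {tok for tok in shlex.split(params) if tok.startswith("--")}

    return sorted(flags(user_params) & flags(forced_params))


user = "--quantMode GeneCounts --outSAMtype BAM"   # from the config.yaml above
forced = "--outSAMtype BAM Unsorted"               # assumed always-on setting
print(duplicate_flags(user, forced))
```

Any flag this reports should be dropped from the align string in config.yaml, since the pipeline already sets it.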

Maarten-vd-Sande commented on September 27, 2024

Also, I'm not sure if the downstream steps work when you change --quantMode.

bioinfolabmu commented on September 27, 2024

I remember reading somewhere that you do not support 2-pass STAR alignment yet. What is your recommendation if we want to do the two passes and then come back to seq2science?

bioinfolabmu commented on September 27, 2024

I also encountered a problem running Trim Galore, but no problem with fastp. My guess is that it is a similar configuration problem with default versus non-default parameters. I will debug that later.

Maarten-vd-Sande commented on September 27, 2024

I'm not familiar with 2-pass alignment in STAR, so I can't comment on that... What does it do? What changes? The sample fastqs, the genome assembly, or the index?

siebrenf commented on September 27, 2024

Hey bioinfolabmu,

I'm trying to read your questions but I get a bit confused. Please keep to one question per git Issue (I really don't mind if you open multiple 👼 )

I'll open some new issues for each question here that we haven't answered yet, and then try to answer them there!
