Alright, I think I know what the problem is:
fqsuffix: fq
fqext1: '1'
fqext2: '2'
This modification has solved my configuration file problem.
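A minimal sketch of why these three settings matter. It assumes that seq2science looks for paired-end files named `{sample}_{fqext}.{fqsuffix}.gz`, with defaults `fastq`, `R1` and `R2`. That is our reading of the docs, not something checked against seq2science's source. `expected_fastq_paths` is a hypothetical helper, not part of seq2science.

```python
from typing import Tuple


def expected_fastq_paths(sample: str, fqsuffix: str = "fastq",
                         fqext1: str = "R1", fqext2: str = "R2") -> Tuple[str, str]:
    """Return the (read 1, read 2) file names for a paired-end sample.

    Hypothetical helper: assumes the naming pattern {sample}_{fqext}.{fqsuffix}.gz.
    """
    read1 = f"{sample}_{fqext1}.{fqsuffix}.gz"
    read2 = f"{sample}_{fqext2}.{fqsuffix}.gz"
    return read1, read2


# Assumed defaults: files named like sample1_R1.fastq.gz
print(expected_fastq_paths("sample1"))
# With the settings above (fqsuffix: fq, fqext1: '1', fqext2: '2')
print(expected_fastq_paths("sample1", fqsuffix="fq", fqext1="1", fqext2="2"))
```

If your files are named like `sample1_1.fq.gz`, the defaults would not match them, which would explain why changing the config fixed the problem.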
from seq2science.
A related question about the samples.tsv file.
On the second line of this file, I have the following tab-delimited column names:
sample assembly dev_stage treatment biological_replicates
My question is: are these column names fixed keywords in your tool? For example, what happens if I use "developmental_stage" instead of "dev_stage", or "genome" instead of "assembly"?
I did not find the relevant requirements on your documentation web page. Sorry if I missed something in your documentation.
Great that you fixed it. You can have any number of columns and name them whatever you want. There are, however, certain column names that have a specific meaning, such as sample, assembly, biological_replicates, and descriptive_name. These columns are used internally by seq2science for specific purposes. Moreover, you can use your own column names in the differential peak/gene calling step: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html#contrast-in-the-samples-tsv
Specifically:
- "developmental_stage" -> "dev_stage" does not change anything in how seq2science runs, as those columns are ignored (except when they are used to define contrasts)
- "assembly" -> "genome" breaks seq2science, because the assembly column is required: it specifies which genome assembly is used.
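The difference between the two renames can be sketched with the Python standard library. The sketch reads a tab-separated samples file and fails only when a required column is missing. Extra or renamed free-form columns pass. Treating `sample` and `assembly` as the required set is an assumption based on this thread. `check_required_columns` is hypothetical and is not seq2science's own validation code.

```python
import csv
import io
from typing import List

# Assumption based on this thread: these columns must be present.
REQUIRED_COLUMNS = {"sample", "assembly"}


def check_required_columns(tsv_text: str) -> List[str]:
    """Return the header of a tab-separated samples file; raise if a required column is missing."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        raise ValueError(f"samples file is missing required column(s): {sorted(missing)}")
    return header


ok = "sample\tassembly\tdevelopmental_stage\nS1\tGRCz11\t24hpf\n"
bad = "sample\tgenome\tdev_stage\nS1\tGRCz11\t24hpf\n"

print(check_required_columns(ok))  # renaming a free-form column is fine
try:
    check_required_columns(bad)
except ValueError as err:
    print(err)  # "assembly" was renamed to "genome"
```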
Thank you for the quick response.
(1) So, for the DEG analysis, the names in samples.tsv should be consistent with your requirements. Actually, "dev_stage" should be "stages". That way, we can make sure to get proper results from seq2science. Am I correct?
(2) Column names such as "sample", "assembly", "stages", "treatments", "biological_replicates", "technical_replicates" and "condition" cover many researchers' data analysis needs. That should be sufficient. I noticed that "descriptive_name" must be unique across the rows of samples.tsv. Am I right? I am curious how seq2science uses "descriptive_name" internally.
(1) I don't understand the question. For the DEG analysis you can use any column(s) in the samples file. You can use dev_stage or stages. Just make sure that your contrast specification in the config.yaml reflects the correct column name.
If you use the column dev_stage, the contrast is dev_stage_one_two; if you use stages, it is stages_one_two. It can be any column you want. You can even combine multiple columns for batch effect correction: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html#batch-effect-correction
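To make the naming rule concrete: a contrast such as dev_stage_one_two is a column name followed by two levels of that column. Column names can themselves contain underscores, so a parser has to match against the known column names rather than split blindly. The sketch below is a hypothetical illustration. It assumes the plain `column_level1_level2` pattern with no underscores inside the levels and ignores the batch-correction syntax. It is not seq2science's actual contrast parser.

```python
from typing import Iterable, Tuple


def parse_contrast(contrast: str, columns: Iterable[str]) -> Tuple[str, str, str]:
    """Split a contrast like 'dev_stage_one_two' into (column, level1, level2).

    Hypothetical helper. Tries the longest column names first, because
    column names may contain underscores themselves.
    """
    for column in sorted(columns, key=len, reverse=True):
        prefix = column + "_"
        if contrast.startswith(prefix):
            levels = contrast[len(prefix):].split("_")
            # Assumption: exactly two non-empty levels, without underscores.
            if len(levels) == 2 and all(levels):
                return column, levels[0], levels[1]
    raise ValueError(f"no samples.tsv column matches contrast {contrast!r}")


columns = ["sample", "assembly", "dev_stage", "treatment"]
print(parse_contrast("dev_stage_one_two", columns))
print(parse_contrast("stages_one_two", ["sample", "assembly", "stages"]))
```

Either column name works, as long as the contrast starts with the column name exactly as it appears in the samples file.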
(2) descriptive_name is one of the special columns that seq2science uses internally, just like sample, assembly, and biological_replicates. It is used for the count table and for the final MultiQC report.
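As a rough illustration of why duplicate descriptive names would cause trouble: if descriptive_name is used to relabel samples in outputs such as the count table, two rows with the same name would end up with the same label. The sketch flags that case. It also assumes that an empty descriptive_name falls back to the sample ID. Both that fallback and `descriptive_labels` are our assumptions, not confirmed seq2science behaviour, and whether seq2science enforces exactly this check is not established in this thread.

```python
from typing import Dict, List


def descriptive_labels(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each sample ID to the label shown in outputs (hypothetical helper).

    Assumption: an empty descriptive_name falls back to the sample ID.
    Raises ValueError if two samples would end up with the same label.
    """
    labels: Dict[str, str] = {}
    seen: Dict[str, str] = {}  # label -> sample ID that first used it
    for row in rows:
        label = row.get("descriptive_name") or row["sample"]
        if label in seen:
            raise ValueError(f"{row['sample']} and {seen[label]} share the label {label!r}")
        seen[label] = row["sample"]
        labels[row["sample"]] = label
    return labels


rows = [
    {"sample": "S1", "descriptive_name": "stage1_rep1"},
    {"sample": "S2", "descriptive_name": "stage1_rep2"},
    {"sample": "S3", "descriptive_name": ""},
]
print(descriptive_labels(rows))
```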
Thanks. My question was: "should we use 'stage' instead of 'dev_stage' or 'developmental_stages'?" You said that it does not matter, because they are not keywords used by seq2science. My guess is that 'condition' is also not a keyword used by seq2science, so we can use variations of it, such as 'conditions' or 'my_conditions'. Right?
When I ran my own data for alignment with STAR, I encountered a bug. I am debugging now to see what happens. I noticed that Salmon, as the quantifier, is not affected at all: it generates its own output, which means Salmon does its own mapping to finish the quantification. My next question: after I fix the STAR bug, how can I feed the STAR alignment results to Salmon for quantification?
Yes, you are right! I guess that's not entirely clear from the docs. sample, assembly, descriptive_name, biological_replicates, and technical_replicates are column names used by seq2science internally. Any other column name is basically ignored, unless you use it for DESeq2.
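A minimal sketch of that rule: given the header of a samples file and the contrasts from config.yaml, each column is sorted into "internal", "deseq2" or "ignored". The internal set is copied from the list above. Treating a column as used by DESeq2 when a contrast starts with `<column>_` is an assumption, and `classify_columns` is a hypothetical helper, not seq2science code.

```python
from typing import Dict, Iterable, List

# Column names listed above as used internally by seq2science.
INTERNAL_COLUMNS = {"sample", "assembly", "descriptive_name",
                    "biological_replicates", "technical_replicates"}


def classify_columns(header: Iterable[str], contrasts: Iterable[str]) -> Dict[str, List[str]]:
    """Group samples.tsv columns by how seq2science would treat them (sketch)."""
    contrast_list = list(contrasts)
    groups: Dict[str, List[str]] = {"internal": [], "deseq2": [], "ignored": []}
    for column in header:
        if column in INTERNAL_COLUMNS:
            groups["internal"].append(column)
        elif any(c.startswith(column + "_") for c in contrast_list):
            # Assumption: a contrast named <column>_<level1>_<level2> uses this column.
            groups["deseq2"].append(column)
        else:
            groups["ignored"].append(column)
    return groups


header = ["sample", "assembly", "dev_stage", "treatment", "biological_replicates"]
print(classify_columns(header, contrasts=["dev_stage_one_two"]))
```

Renaming "dev_stage" or "condition" only moves a column between the last two groups, depending on whether a contrast refers to it.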
Here is what is in my config.yaml:
aligner:
  star:
    align: --quantMode GeneCounts --outSAMtype BAM
seq2science gives me a fatal error in the log file, saying "Duplicate parameter". I am trying to solve this problem.
Yeah, that's perhaps unclear on our side (again). Almost all rules have sensible defaults, so you don't have to tune them. You could just write: aligner: star
We force STAR to output a BAM, as we need a BAM as its output, so we always pass --outSAMtype BAM Unsorted. Adding --outSAMtype yourself gives the duplicate parameter error; see: https://vanheeringen-lab.github.io/seq2science/content/all_rules.html#star-align
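To illustrate where the "Duplicate parameter" error comes from: if seq2science always adds --outSAMtype BAM Unsorted, then a user-supplied --outSAMtype appears twice on the STAR command line. As far as we can tell, STAR's own argument parsing then rejects it. The sketch combines the forced string with the user's align string and lists any flag that occurs more than once. `duplicate_flags` and the exact forced string are assumptions taken from this thread, not seq2science internals.

```python
import shlex
from collections import Counter
from typing import List

# Assumption from this thread: seq2science always adds this to the STAR align call.
FORCED_STAR_PARAMS = "--outSAMtype BAM Unsorted"


def duplicate_flags(user_params: str, forced_params: str = FORCED_STAR_PARAMS) -> List[str]:
    """Return flags (tokens starting with '--') that appear more than once."""
    tokens = shlex.split(forced_params) + shlex.split(user_params)
    counts = Counter(token for token in tokens if token.startswith("--"))
    return sorted(flag for flag, n in counts.items() if n > 1)


print(duplicate_flags("--quantMode GeneCounts --outSAMtype BAM"))  # the config above
print(duplicate_flags("--quantMode GeneCounts"))                   # without --outSAMtype
```

Dropping --outSAMtype from the align string, or leaving the align parameters at their defaults, avoids the clash.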
Also, I'm not sure whether the downstream steps work when you change --quantMode.
I remember reading somewhere that you do not support 2-pass STAR alignment yet. What do you recommend if we want to do the two passes and then come back to seq2science?
I also encountered a problem running trimgalore, but not with fastp. My guess is that it is a similar configuration problem with default parameters. I will debug that later.
I'm not familiar with STAR's 2-pass alignment, so I can't comment on that... What does it do? What changes: the sample FASTQs, the genome assembly, or the index?
Hey bioinfolabmu,
I'm trying to read your questions but I get a bit confused. Please keep to one question per GitHub issue (I really don't mind if you open multiple 👼).
I'll open a new issue for each question here that we haven't answered yet, and then try to answer them there!