Comments (11)
from bsseq.
Thanks, Kasper, for the prompt reply. I tried to represent consecutive rows with a single row in each replicate file using a simple Python script. Even though my 100 files have less than 29 million rows, the merge GRange object differed from what I anticipated (still higher than 29 million). Is there a better approach to go from Bismark to bsseq to resolve this issue?
"""
import sys
def merge_consecutive_lines(lines):
merged_lines = []
def process_and_append(start_line, end_line):
max_col4 = max(float(line.split("\t")[3]) for line in lines[start_line:end_line + 1])
sum_col5 = sum(int(line.split("\t")[4]) for line in lines[start_line:end_line + 1])
sum_col6 = sum(int(line.split("\t")[5]) for line in lines[start_line:end_line + 1])
merged_lines.append("\t".join([lines[start_line].split("\t")[0], lines[start_line].split("\t")[1], lines[end_line].split("\t")[2], str(max_col4), str(sum_col5), str(sum_col6)]))
start_line = 0
for i in range(1, len(lines)):
current_line = lines[i].split("\t")
prev_line = lines[i - 1].split("\t")
if int(current_line[1]) != int(prev_line[2]) + 1:
process_and_append(start_line, i - 1)
start_line = i
process_and_append(start_line, len(lines) - 1)
return merged_lines
if name == "main":
#if len(sys.argv) != 2:
#print("Usage: python merge_lines.py input.txt")
#sys.exit(1)
#input_file = sys.argv[1]
#with open(input_file, 'r') as file:
#lines = file.read().splitlines()
lines = sys.stdin.read().splitlines()
merged_lines = merge_consecutive_lines(lines)
for line in merged_lines:
# Print columns 1, 2, 4, 5, and 6
columns = line.split("\t")
print("\t".join([columns[0], columns[1], columns[3], columns[4], columns[5]]))
"""
Best,
Dereje
from bsseq.
I am sorry! Are you saying the way data is loaded into bsseq is correct? I used the DSS package that was built based on bsseq and got erroneous results. and went back to using bsseq and saw how the merging is handled in bsseq.
from bsseq.
from bsseq.
No, I didn't! The manual of the DSS package suggested extracting four columns of the input file from the *.cov.gz" Bistmark output file. I then processed that using their guideline. In the end, I found the number of DMC reporter was off than what exist in the human genome. I then went back again to fix the input and redid the procedure after merging the reads from both strand using the Python script, again got the same issue after the complete DMC and DMR analysis.
I then decided to use bsseq without DSS to see how the bsseq handles the import and Bismark mapping. That is where I am right now.
Dereje
from bsseq.
from bsseq.
I am talking to @FelixKrueger, a developer of Bismark. He suggested running the "coverage2cytosine --merge_CpG" option. I don't know if this is the intended input for bsseq downstream analysis. I am running it right now to see how the coverage looks like. I will look into that.
Dereje
from bsseq.
It would be nice if you wrote a complete vignette including Bismark for both single-end and paired-end reads.
from bsseq.
from bsseq.
Yes, I looked into the read.bismark(). files: It says, The path to the files created by running Bismark's methylation extractor, one sample per file. Files ending in .gz or .bz2 will be automatically decompressed to [tempfile](http://127.0.0.1:14081/help/library/bsseq/help/tempfile)(). We strongly recommend you use the 'genome wide cytosine report' output files. See section 'File formats' for further details.
That is what I read in bsseq. I provided the *.cov.gz
file from Bismark methylation extractor. This represents each "C" in the genome not the aggregate CpG. from both strand.
from bsseq.
In the MOABS pipeline, I always get a single row for C and G in the single CpG. In the differentially methylated "C" (DMC) file, I always get one row in the DMC file. In your case, I am getting a row for "C" and a row for "G" after processing. Here is a sample output from MOABS mcall:
#chrom start end totalC_0 nominalRatio_0 ratioCI_0 totalC_1 nominalRatio_1 ratioCI_1 nominalDif_1-0 credibleDif_1-0 difCI_1-0 p_sim_1_v_0 p_fet_1_v_0 class
1 10468 10470 24 0.583 0.398,0.769 20 0.75 0.562,0.938 0.167 0 -0.0722,0.365 0.0309 0.342 hypo
1 10470 10472 25 0.68 0.504,0.856 23 0.739 0.562,0.916 0.0591 0 -0.154,0.258 0.0576 0.756 hypo
1 10483 10485 27 0.815 0.66,0.969 32 0.906 0.781,1 0.0914 0 -0.0583,0.244 0.0556 0.45 hypo
from bsseq.
Related Issues (20)
- BSmooth.tstat error, Error in compute.correction HOT 5
- read.bismark() too slow - input format issue? HOT 8
- BSmooth - Error in env[[as.character(i)]] <- value : wrong args for environment subassignment HOT 2
- Error: C stack usage 10847362 is too close to the limit HOT 3
- Is it possible to append metadata to the GRanges within BSseq objects? HOT 3
- Normalization of the WGBS smooth data HOT 5
- bsseq incompatible with tictoc library HOT 1
- Error BSmooth HOT 3
- read.bismark: option to change tmpdir in fread? HOT 4
- BSmooth with HDF5 realization backend error: Stop worker failed with the error: wrong args for environment subassignment HOT 5
- combineList: BACKEND Argument not working as intended HOT 1
- BSmooth.tstat issue HOT 11
- Updating beachmat to tatami HOT 1
- combine(sample1.smooth, sample2.smooth) gives in unsmooth results? HOT 4
- bsmooth() fails - Error in `h()`: ! error in evaluating the argument 'seed' in selecting a method for function 'DelayedArray': HDF5. File accessibility. Unable to open file. HOT 3
- Reading H5 files directly HOT 3
- error message when I use the bsseq read.bismark command to read a genome wide cytosine report generated by bismark. HOT 26
- exclude "Y" chromosome from bsseq object HOT 3
- Upcoming change to DelayedArray::realize() HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bsseq.