nanoporetech / isonclust2
A tool for de novo clustering of long transcriptomic reads
License: Other
Hi, I am trying to run isONclust2 as a first step before isONcorrect, but I get this error for every batch. One example:
Loaded input batch from batches/isONbatch_9.cer:
Batch number: 9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 50001212
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Created pseudo-batch for single clustering:
Batch number: -9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 0
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3
This came from running:
for f in batches/isONbatch_*.cer; do
    filename=$(basename "$f")
    output="clustered/${filename%.*}.cer"
    isONclust2 cluster -v -l "$f" -o "$output"
done
Could you please advise what I need to fix?
Many thanks!!
Best,
CW
Hi!
I have run the following command with Direct RNA sequencing reads:
isONclust2 sort -F 2 -v -o DRS_reads.fq
I got the following output, which ends in an error:
isONclust2 version: v2.3-e9da596
Batches output directory: sorted
Minimum batch size: 50000 kilobases
Kmer size: 11
Window size: 15
Consensus period: 500
Minimum cluster size for consensus: 50
Maximum cluster size for consensus: -150
Minimum average quality: 7
Minimum shared minimizers: 5
Minimum fraction of top minimizer hit: 0.8
Mapping threshold: 0.65
Alignment threshold: 0.2
Minimum probability no hit: 0.1
Minimum cluster size in left batches: 2
Debug output: off
Warning: reusing existing output directory: sorted
Warning: reusing existing output directory: sorted/batches
Parsed 380000 sequences.
Finished sorting sequences.
Sorted sequences written to: sorted/sorted_reads.fastq
Scores written to: sorted/scores.tsv
Preparing batches:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted (core dumped)
Hi,
When we ran the command isONclust2 cluster -v -l isONclust2_batches/batches/isONbatch_0.cer -o b0.cer, the following error appeared and the output file b0.cer was not created.
Batch number: 0
Batch range: [0,24669]
Depth: -1
Nr sequences: 24670
Nr bases: 50001410
Nr clusters: 24670
Nr nontrivial clusters: 0
Minimizers in database: 0
Created pseudo-batch for single clustering:
Batch number: 0
Batch range: [0,24669]
Depth: -1
Nr sequences: 24670
Nr bases: 0
Nr clusters: 24670
Nr nontrivial clusters: 0
Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3
Then we added -x furious to the command:
isONclust2 cluster -x furious -l isONclust2_batches/batches/isONbatch_0.cer -o b0.cer
It worked.
Could you tell us whether this is a bug or expected behaviour, and what causes it?
Hi team,
I am using this as part of https://github.com/epi2me-labs/wf-transcriptomes/
I am able to run make batches, which generates batches 0-48, but the following clustering step fails.
The error message is: slurmstepd: error: Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.
When I examine the log files, all job_level_0 output was generated, but most level_1 output was not.
I tried running one of the failed commands from level_1.sh:
isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_0.cer -r clusters/isONcluster_1.cer -o clusters/isONcluster_49.cer ; sync
It ended with a segmentation fault (core dumped):
Loaded input batch from clusters/isONcluster_0.cer:
Batch number: 0
Batch range: [0,16883]
Depth: 0
Nr sequences: 16884
Nr bases: 50287830
Nr clusters: 1
Nr nontrivial clusters: 1
Minimizers in database: 22619
Loaded input batch from clusters/isONcluster_1.cer:
Batch number: 1
Batch range: [16884,39157]
Depth: 0
Nr sequences: 22274
Nr bases: 50286904
Nr clusters: 2
Nr nontrivial clusters: 2
Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Segmentation fault (core dumped)
Some level_1 commands did run successfully, for example:
isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_8.cer -r clusters/isONcluster_9.cer -o clusters/isONcluster_53.cer ; sync
Loaded input batch from clusters/isONcluster_8.cer:
Batch number: 8
Batch range: [219434,255621]
Depth: 0
Nr sequences: 36188
Nr bases: 50286190
Nr clusters: 38
Nr nontrivial clusters: 38
Minimizers in database: 23082
Loaded input batch from clusters/isONcluster_9.cer:
Batch number: 9
Batch range: [255622,293538]
Depth: 0
Nr sequences: 37917
Nr bases: 50287047
Nr clusters: 33
Nr nontrivial clusters: 32
Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Filtered out 0 input clusters smaller than 2.
Finished clustering!
Alignment invocation count: 0 (0%)
Consensus invocation count: 33 (100%)
Number of clusters larger than 1: 38
Output batch statistics:
Batch number: 8
Batch range: [219434,293538]
Depth: 1
Nr sequences: 74105
Nr bases: 100573237
Nr clusters: 38
Nr nontrivial clusters: 38
Minimizers in database: 24370
Output batch written to: clusters/isONcluster_53.cer
I noticed that the number of minimizers is 0 for the right-hand batch, but I am not sure whether this is related. This error then caused all the subsequent failures. The file sizes seem small, and I requested 16 GB per core under Slurm.
I would appreciate it if you could take a look at this issue.
Thanks a lot.
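To narrow down which level_1 jobs were killed, a small sketch like the one below can list the expected output files that are missing or empty. It assumes outputs are named isONcluster_<N>.cer as in the commands above; the function name and the index range are illustrative.

```shell
#!/bin/sh
# List expected cluster files that are missing or empty in a directory.
# list_missing_clusters is a hypothetical helper; the naming scheme
# isONcluster_<N>.cer is taken from the commands in this report.
list_missing_clusters() {
    local dir="$1" i="$2" last="$3"
    while [ "$i" -le "$last" ]; do
        # -s is true only for files that exist and are non-empty
        [ -s "$dir/isONcluster_$i.cer" ] || printf '%s\n' "$dir/isONcluster_$i.cer"
        i=$((i + 1))
    done
}

# Usage (49 and 53 are output indices that appear in the commands above):
# list_missing_clusters clusters 49 53
```

A non-empty file can still be truncated if the job was killed mid-write, so this only gives a first pass at which commands to rerun.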
Hi @bsipos
It looks like a fab option for de novo clustering and transcriptome assembly.
Could you clarify whether it can also be used with dRNA-Seq datasets?
If so, which parameters would you recommend for a first try?
Thanks
Ad
When I run isONclust2 sort -B 200 -v 1.fq, I encounter this problem:
isONclust2 version: v2.3-e9da596
Batches output directory: isONclust2_batches
Minimum batch size: 200 kilobases
Kmer size: 11
Window size: 15
Consensus period: 500
Minimum cluster size for consensus: 50
Maximum cluster size for consensus: -150
Minimum average quality: 7
Minimum shared minimizers: 5
Minimum fraction of top minimizer hit: 0.8
Mapping threshold: 0.65
Alignment threshold: 0.2
Minimum probability no hit: 0.1
Minimum cluster size in left batches: 3
Debug output: off
Parsed 5847806 sequences.
Finished sorting sequences.
Sorted sequences written to: isONclust2_batches/sorted_reads.fastq
Scores written to: isONclust2_batches/scores.tsv
Preparing batches:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[1] 54662 abort (core dumped) isONclust2 sort -B 200 -v W555-N01.clipped.sam_clean.fastq
Hi there,
The isONclust2 example code is listed below.
# sort reads and write out batches:
isONclust2 sort -B 50000 -v ens500.fq
# initial clustering of individual batches:
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_0.cer -o b0.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_1.cer -o b1.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_2.cer -o b2.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_3.cer -o b3.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_4.cer -o b4.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_5.cer -o b5.cer
# merge cluster batches:
isONclust2 cluster -v -l b0.cer -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b_0_1.cer -r b2.cer -o b_0_1_2.cer
isONclust2 cluster -v -l b_0_1_2.cer -r b3.cer -o b_0_1_2_3.cer
isONclust2 cluster -v -l b_0_1_2_3.cer -r b4.cer -o b_0_1_2_3_4.cer
isONclust2 cluster -v -l b_0_1_2_3_4.cer -r b5.cer -o b_0_1_2_3_4_5.cer
# dump final results:
isONclust2 dump -v -i sorted/sorted_reads_idx.cer -o results b_0_1_2_3_4_5.cer
I tested a different order for the merge step.
# Binary merge:
isONclust2 cluster -v -l b0.cer -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b2.cer -r b3.cer -o b_2_3.cer
isONclust2 cluster -v -l b4.cer -r b5.cer -o b_4_5.cer
isONclust2 cluster -v -l b_0_1.cer -r b_2_3.cer -o b_0_1_2_3.cer
isONclust2 cluster -v -l b_0_1_2_3.cer -r b_4_5.cer -o b_0_1_2_3_4_5.cer
This way, we could reduce the memory cost of the merge step for our 300+ .cer files.
But the result does not seem to be the same as with the example approach.
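For a few hundred batch files, writing the pairwise merge commands by hand is error-prone, so here is a minimal sketch that prints them level by level. The output names merge_L<level>_<index>.cer and the function name are assumptions for illustration, and file names are assumed to contain no whitespace. It only generates the command list; it says nothing about whether a tree-shaped merge order yields the same clusters as the sequential chain, which is the question raised in this report.

```shell
#!/bin/sh
# Print (do not run) pairwise "isONclust2 cluster" merge commands.
# Each level merges neighbouring files; an unpaired last file is carried
# up to the next level unchanged. print_pairwise_merges and the
# merge_L<level>_<index>.cer names are hypothetical.
print_pairwise_merges() {
    local level=1 idx next left right out
    while [ "$#" -gt 1 ]; do
        idx=0
        next=""
        while [ "$#" -gt 0 ]; do
            left="$1"; shift
            if [ "$#" -gt 0 ]; then
                right="$1"; shift
                out="merge_L${level}_${idx}.cer"
                printf 'isONclust2 cluster -v -l %s -r %s -o %s\n' \
                    "$left" "$right" "$out"
                next="$next $out"
                idx=$((idx + 1))
            else
                next="$next $left"    # odd one out moves up unmerged
            fi
        done
        set -- $next                  # word splitting is intended here
        level=$((level + 1))
    done
}

# Usage:
# print_pairwise_merges b0.cer b1.cer b2.cer b3.cer b4.cer b5.cer
```

With six inputs this prints five commands, the same number as the sequential chain, but each intermediate merge combines batches of similar size.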
The information below comes from the clusters_info.tsv file.
Type | SeqNumber | ClusterNumber |
---|---|---|
rawONTfastq | 8604779 | 0 |
ExampleWay | 8604779 | 48346 |
BinaryTestWay | 8380306 | 13571 |
The comparison of the two clusters_info.tsv files is listed below.
ExampleClusterID | ExampleSize | BinaryClusterId | BinarySize |
---|---|---|---|
0 | 1574207 | 0 | 2049062 |
1 | 305805 | 1 | 292054 |
2 | 190015 | 2 | 281066 |
3 | 169258 | 3 | 274773 |
4 | 143181 | 4 | 235370 |
5 | 120562 | 5 | 199694 |
6 | 112505 | 6 | 197687 |
7 | 96978 | 7 | 171023 |
8 | 90645 | 8 | 138774 |
9 | 86133 | 9 | 109343 |
10 | 80531 | 10 | 103364 |
... | ... | ... | ... |
13565 | 8 | 13565 | 1 |
13566 | 8 | 13566 | 1 |
13567 | 8 | 13567 | 1 |
13568 | 8 | 13568 | 1 |
13569 | 8 | 13569 | 1 |
13570 | 8 | 13570 | 1 |
13571 | 8 | 13571 | 1 |
13572 | 8 | | |
13573 | 8 | | |
13574 | 8 | | |
13575 | 8 | | |
13576 | 8 | | |
13577 | 8 | | |
... | ... | | |
48341 | 1 | | |
48342 | 1 | | |
48343 | 1 | | |
48344 | 1 | | |
48345 | 1 | | |
48346 | 1 | | |
Could you help us work out why this happened, or tell us definitively whether this binary merge approach is valid?
Thanks a lot.
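To compare the two runs beyond the top-level totals, a sketch like the following summarises a clusters_info.tsv file. It assumes the file is tab-separated with the cluster ID in column 1 and the cluster size in column 2, as the comparison table above suggests; that layout and the function name are assumptions.

```shell
#!/bin/sh
# Summarise a clusters_info.tsv file: number of clusters, total number of
# sequences, and number of singleton clusters.
# Assumed layout: tab-separated, column 1 = cluster ID, column 2 = size.
# Rows whose second column is not a plain integer (e.g. a header) are skipped.
summarise_clusters() {
    awk -F '\t' '
        $2 ~ /^[0-9]+$/ { n++; total += $2; if ($2 == 1) singles++ }
        END { printf "clusters=%d sequences=%d singletons=%d\n", n, total, singles }
    ' "$1"
}

# Usage (paths are assumptions):
# summarise_clusters example_results/clusters_info.tsv
# summarise_clusters binary_results/clusters_info.tsv
```

Running it on both result directories shows whether the gap in sequence counts between the two runs comes with a change in the number of singleton clusters.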