nanoporetech / isonclust2

A tool for de novo clustering of long transcriptomic reads

License: Other

CMake 0.21% Makefile 0.02% C++ 99.77% C 0.01%

cdna rna rna-seq transcriptomics

isonclust2's Issues

Error in Clustering mode: Invalid clustering mode: 3

Hi, I am trying to run isONclust2 as a first step before isONcorrect, but I get this error for every batch. One example:
Loaded input batch from batches/isONbatch_9.cer:
Batch number: 9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 50001212
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Created pseudo-batch for single clustering:
Batch number: -9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 0
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3

from running:
for f in batches/isONbatch_*.cer; do
    filename=$(basename "$f")
    output="clustered/${filename%.*}.cer"
    isONclust2 cluster -v -l "$f" -o "$output"
done

Could you please advise what I need to fix?

Many thanks!!

Best,
CW

Aborted (core dumped)

Hi!
I have run the following command with Direct RNA sequencing reads:
isONclust2 sort -F 2 -v -o DRS_reads.fq

I got the following output, which ends with an error:
isONclust2 version: v2.3-e9da596
Batches output directory: sorted
Minimum batch size: 50000 kilobases
Kmer size: 11
Window size: 15
Consensus period: 500
Minimum cluster size for consensus: 50
Maximum cluster size for consensus: -150
Minimum average quality: 7
Minimum shared minimizers: 5
Minimum fraction of top minimizer hit: 0.8
Mapping threshold: 0.65
Alignment threshold: 0.2
Minimum probability no hit: 0.1
Minimum cluster size in left batches: 2
Debug output: off
Warning: reusing existing output directory: sorted
Warning: reusing existing output directory: sorted/batches
Parsed 380000 sequences.
Finished sorting sequences.
Sorted sequences written to: sorted/sorted_reads.fastq
Scores written to: sorted/scores.tsv
Preparing batches:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted (core dumped)

Cluster Errors

Hi,
When we ran the command isONclust2 cluster -v -l isONclust2_batches/batches/isONbatch_0.cer -o b0.cer, the following error appeared and the output file b0.cer was not created.

	Batch number: 0
	Batch range: [0,24669]
	Depth: -1
	Nr sequences: 24670
	Nr bases: 50001410
	Nr clusters: 24670
	Nr nontrivial clusters: 0
	Minimizers in database: 0
Created pseudo-batch for single clustering:
	Batch number: 0
	Batch range: [0,24669]
	Depth: -1
	Nr sequences: 24670
	Nr bases: 0
	Nr clusters: 24670
	Nr nontrivial clusters: 0
	Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3

We then added -x furious to the command (isONclust2 cluster -x furious -l isONclust2_batches/batches/isONbatch_0.cer -o b0.cer), and it worked.

Could you tell us whether this is a bug, and what causes it?
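
For anyone hitting the same "Invalid clustering mode: 3" message, here is a minimal sketch that applies the workaround reported above (passing the mode explicitly with -x) to a whole directory of batches. It only prints the commands so they can be reviewed first. The function names, the directory layout and the default mode "furious" are assumptions for illustration; the thread only confirms that -x furious worked for one batch, not why the default fails or which mode is the right choice.

```shell
# Print the initial-clustering command for one batch, with an explicit mode.
# The default "furious" is only the value reported as working in this thread.
cluster_cmd() (
    batch=$1
    outdir=$2
    mode=${3:-furious}
    printf 'isONclust2 cluster -x %s -v -l %s -o %s/%s\n' \
        "$mode" "$batch" "$outdir" "$(basename "$batch")"
)

# Print one command per isONbatch_*.cer file found in a directory.
print_cluster_cmds() (
    indir=$1
    outdir=$2
    mode=${3:-furious}
    for f in "$indir"/isONbatch_*.cer; do
        [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
        cluster_cmd "$f" "$outdir" "$mode"
    done
)
```

Usage: print_cluster_cmds batches clustered furious lists the commands; run them once they look right.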

some level_1 output is not generated

Hi team,

I am using isONclust2 as part of https://github.com/epi2me-labs/wf-transcriptomes/
I was able to run make batches, which generated batches 0-48, but the following clustering step failed.
The error message is: slurmstepd: error: Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.

When I examined the log files, all job_level_0 output had been generated, but most of the level_1 output had not.
I tried running one of the failed commands from level_1.sh:

isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_0.cer -r clusters/isONcluster_1.cer -o clusters/isONcluster_49.cer ; sync

It ended with a segmentation fault (core dumped):

Loaded input batch from clusters/isONcluster_0.cer:
        Batch number: 0
        Batch range: [0,16883]
        Depth: 0
        Nr sequences: 16884
        Nr bases: 50287830
        Nr clusters: 1
        Nr nontrivial clusters: 1
        Minimizers in database: 22619
Loaded input batch from clusters/isONcluster_1.cer:
        Batch number: 1
        Batch range: [16884,39157]
        Depth: 0
        Nr sequences: 22274
        Nr bases: 50286904
        Nr clusters: 2
        Nr nontrivial clusters: 2
        Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Segmentation fault (core dumped)

Some level_1 jobs did run successfully, for example:

isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_8.cer -r clusters/isONcluster_9.cer -o clusters/isONcluster_53.cer ; sync

Loaded input batch from clusters/isONcluster_8.cer:
	Batch number: 8
	Batch range: [219434,255621]
	Depth: 0
	Nr sequences: 36188
	Nr bases: 50286190
	Nr clusters: 38
	Nr nontrivial clusters: 38
	Minimizers in database: 23082
Loaded input batch from clusters/isONcluster_9.cer:
	Batch number: 9
	Batch range: [255622,293538]
	Depth: 0
	Nr sequences: 37917
	Nr bases: 50287047
	Nr clusters: 33
	Nr nontrivial clusters: 32
	Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Filtered out 0 input clusters smaller than 2.
Finished clustering!
Alignment invocation count: 0 (0%)
Consensus invocation count: 33 (100%)
Number of clusters larger than 1: 38
Output batch statistics:
	Batch number: 8
	Batch range: [219434,293538]
	Depth: 1
	Nr sequences: 74105
	Nr bases: 100573237
	Nr clusters: 38
	Nr nontrivial clusters: 38
	Minimizers in database: 24370
Output batch written to: clusters/isONcluster_53.cer

I noticed that the minimizer count is 0 for the right-hand batch, but I am not sure whether this is related. This error then caused all the subsequent failures. The file sizes seem small, and I requested 16 GB per core from the Slurm scheduler.
I would appreciate some help getting this to run, if you could kindly have a look at the issue.

Thanks a lot.
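
To check the "Minimizers in database: 0" observation across many jobs, a small filter over the verbose output may help. This is a sketch that relies only on the log layout shown in this issue ("Loaded input batch from <path>:" followed by a "Minimizers in database:" line); the function name is made up. Note that the successful run above also reports 0 for its right-hand batch, so a zero count on its own does not appear to explain the segmentation fault.

```shell
# Read isONclust2 verbose (-v) output on stdin and print the path of every
# loaded input batch that reports "Minimizers in database: 0".
zero_minimizer_batches() {
    awk '
        /^Loaded input batch from / {
            file = $5              # fifth word is the path plus a trailing colon
            sub(/:$/, "", file)
        }
        /Minimizers in database:/ {
            if (file != "" && $NF == 0) print file
            file = ""              # skip pseudo-batch and output statistics
        }
    '
}
```

Usage, assuming the job output was saved to a file (the name level_1.log is hypothetical): zero_minimizer_batches < level_1.log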

Using with dRNA-Seq Data

Hi @bsipos
It looks like a fab option for de novo clustering and transcriptome assembly.
Could you clarify whether it can also be used with dRNA-Seq datasets?
If so, which parameters would you recommend for a first try?
Thanks
Ad

isONclust2 sort 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'

When I run the command isONclust2 sort -B 200 -v 1.fq, I encounter this problem:

isONclust2 version: v2.3-e9da596
Batches output directory: isONclust2_batches
Minimum batch size: 200 kilobases
Kmer size: 11
Window size: 15
Consensus period: 500
Minimum cluster size for consensus: 50
Maximum cluster size for consensus: -150
Minimum average quality: 7
Minimum shared minimizers: 5
Minimum fraction of top minimizer hit: 0.8
Mapping threshold: 0.65
Alignment threshold: 0.2
Minimum probability no hit: 0.1
Minimum cluster size in left batches: 3
Debug output: off
Parsed 5847806 sequences.
Finished sorting sequences.
Sorted sequences written to: isONclust2_batches/sorted_reads.fastq
Scores written to: isONclust2_batches/scores.tsv
Preparing batches:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[1] 54662 abort (core dumped) isONclust2 sort -B 200 -v W555-N01.clipped.sam_clean.fastq

What is the impact of merging cer files in a different order?

Hi there,
The isONclust2 example code is listed below.

# sort reads and write out batches:
isONclust2 sort -B 50000 -v ens500.fq

# initial clustering of individual batches:
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_0.cer -o b0.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_1.cer -o b1.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_2.cer -o b2.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_3.cer -o b3.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_4.cer -o b4.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_5.cer -o b5.cer

# merge cluster batches:
isONclust2 cluster -v -l b0.cer           -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b_0_1.cer        -r b2.cer -o b_0_1_2.cer
isONclust2 cluster -v -l b_0_1_2.cer      -r b3.cer -o b_0_1_2_3.cer
isONclust2 cluster -v -l b_0_1_2_3.cer    -r b4.cer -o b_0_1_2_3_4.cer
isONclust2 cluster -v -l b_0_1_2_3_4.cer  -r b5.cer -o b_0_1_2_3_4_5.cer

# dump final results:
isONclust2 dump -v -i sorted/sorted_reads_idx.cer -o results b_0_1_2_3_4_5.cer

I ran a test of the merge step, merging the batches pairwise instead:

# binary (pairwise) merge:
isONclust2 cluster -v -l b0.cer   -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b2.cer   -r b3.cer -o b_2_3.cer
isONclust2 cluster -v -l b4.cer   -r b5.cer -o b_4_5.cer

isONclust2 cluster -v -l b_0_1.cer   -r b_2_3.cer -o b_0_1_2_3.cer

isONclust2 cluster -v -l b_0_1_2_3.cer  -r b_4_5.cer -o b_0_1_2_3_4_5.cer

This way, we could reduce the memory cost of the merge step for about 300+ cer files.

However, the result does not seem to be the same as with the example approach.

The information below comes from the clusters_info.tsv files.

Type	SeqNumber	ClusterNumber
rawONTfastq	8604779	0
ExampleWay	8604779	48346
BinaryTestWay	8380306	13571

The comparison of the two clusters_info.tsv files is listed below.

ExampleClusterID	ExampleSize	BinaryClusterId	BinarySize
0	1574207	0	2049062
1	305805	1	292054
2	190015	2	281066
3	169258	3	274773
4	143181	4	235370
5	120562	5	199694
6	112505	6	197687
7	96978	7	171023
8	90645	8	138774
9	86133	9	109343
10	80531	10	103364
	.
	.
	.
13565	8	13565	1
13566	8	13566	1
13567	8	13567	1
13568	8	13568	1
13569	8	13569	1
13570	8	13570	1
13571	8	13571	1
13572	8
13573	8
13574	8
13575	8
13576	8
13577	8
	.
	.
	.
48341	1
48342	1
48343	1
48344	1
48345	1
48346	1
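
To make this kind of comparison repeatable, a per-file summary can be computed with awk. The sketch below assumes a tab-separated file with the cluster id in column 1 and the cluster size in column 2; that layout is a guess based on the table pasted here, not a documented clusters_info.tsv format, and the function name is made up. From the summary table above, the binary run reports 8604779 - 8380306 = 224473 fewer sequences, so it may be worth checking the "Filtered out N input clusters smaller than ..." lines that the verbose merge logs print for each step.

```shell
# Summarise a two-column (cluster id, size) TSV read from stdin.
# Rows whose second column is not a plain integer (e.g. a header) are skipped.
summarise_clusters() {
    awk -F '\t' '
        $2 ~ /^[0-9]+$/ {
            clusters++
            reads += $2
            if ($2 == 1) singletons++
        }
        END {
            printf "clusters=%d reads=%d singletons=%d\n", clusters, reads, singletons
        }
    '
}
```

Usage: summarise_clusters < results/clusters_info.tsv, run once per merge strategy (the results/ path is an assumption based on the dump command above).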

Could you help us find out how this happened, or tell us definitively whether this kind of merge is expected to work?
Thanks a lot.
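
For reference, the pairwise scheme described in this issue can be generated for any number of files instead of being written out by hand. The sketch below only prints the merge commands, one round at a time; an odd file left over in a round is carried into the next one. The function name and the intermediate file names (merge_r<round>_<index>.cer) are hypothetical, and the sketch says nothing about whether tree-shaped merging gives the same clusters as the sequential example, which is the open question here.

```shell
# Print (do not run) the commands for a pairwise, tree-shaped merge of the
# .cer files given as arguments. Runs in a subshell so no variables leak.
merge_plan() (
    round=0
    while [ "$#" -gt 1 ]; do
        n=$#
        idx=0
        while [ "$n" -gt 1 ]; do
            out="merge_r${round}_${idx}.cer"
            echo "isONclust2 cluster -v -l $1 -r $2 -o $out"
            shift 2
            set -- "$@" "$out"    # queue the merged batch for the next round
            n=$((n - 2))
            idx=$((idx + 1))
        done
        if [ "$n" -eq 1 ]; then
            carry=$1              # odd file out: move it behind this round's outputs
            shift
            set -- "$@" "$carry"
        fi
        round=$((round + 1))
    done
)
```

Usage: merge_plan b0.cer b1.cer b2.cer b3.cer b4.cer b5.cer prints five commands, in the same pairing order as the binary test above.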
