isonclust2's People

Contributors

asaont, botond-sipos, bsipos, iiseymour

isonclust2's Issues

Error in Clustering mode: Invalid clustering mode: 3

Hi, I am trying to run isONclust2 as the first step before isONcorrect, but I get this error for every batch. One example:
Loaded input batch from batches/isONbatch_9.cer:
Batch number: 9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 50001212
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Created pseudo-batch for single clustering:
Batch number: -9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 0
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3

from running:
mkdir -p clustered
for f in batches/isONbatch_*.cer; do
  filename=$(basename "$f")
  output="clustered/${filename%.*}.cer"
  isONclust2 cluster -v -l "$f" -o "$output"
done

Could you please advise what I need to fix?

Many thanks!!

Best,
CW

Aborted (core dumped)

Hi!
I have run the following command with Direct RNA sequencing reads:
isONclust2 sort -F 2 -v -o DRS_reads.fq

I got the following output, ending in an error:
isONclust2 version: v2.3-e9da596
Batches output directory: sorted
Minimum batch size: 50000 kilobases
Kmer size: 11
Window size: 15
Consensus period: 500
Minimum cluster size for consensus: 50
Maximum cluster size for consensus: -150
Minimum average quality: 7
Minimum shared minimizers: 5
Minimum fraction of top minimizer hit: 0.8
Mapping threshold: 0.65
Alignment threshold: 0.2
Minimum probability no hit: 0.1
Minimum cluster size in left batches: 2
Debug output: off
Warning: reusing existing output directory: sorted
Warning: reusing existing output directory: sorted/batches
Parsed 380000 sequences.
Finished sorting sequences.
Sorted sequences written to: sorted/sorted_reads.fastq
Scores written to: sorted/scores.tsv
Preparing batches:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted (core dumped)

Cluster Errors

Hi,
When we ran isONclust2 cluster -v -l isONclust2_batches/batches/isONbatch_0.cer -o b0.cer, the following error appeared, and the output file b0.cer was not created:

	Batch number: 0
	Batch range: [0,24669]
	Depth: -1
	Nr sequences: 24670
	Nr bases: 50001410
	Nr clusters: 24670
	Nr nontrivial clusters: 0
	Minimizers in database: 0
Created pseudo-batch for single clustering:
	Batch number: 0
	Batch range: [0,24669]
	Depth: -1
	Nr sequences: 24670
	Nr bases: 0
	Nr clusters: 24670
	Nr nontrivial clusters: 0
	Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3

Then we added -x furious to the command, isONclust2 cluster -x furious -l isONclust2_batches/batches/isONbatch_0.cer -o b0.cer, and it worked.

Could you tell us whether this is a bug, and what causes it?
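For reference, the per-batch loop from the first report can be written with the clustering mode passed explicitly, as in the `-x furious` workaround above. This is a minimal sketch, not a maintainer-confirmed fix: it assumes that setting `-x` is what avoids the `Invalid clustering mode: 3` error, and the helper name `cluster_batches` and the `clustered/` output directory are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cluster each batch file on its own, passing the
# clustering mode explicitly with -x (assumed to avoid "Invalid clustering mode").
set -euo pipefail

cluster_batches() {
  local mode=$1 outdir=$2
  shift 2
  mkdir -p "$outdir"
  local f filename
  for f in "$@"; do
    filename=$(basename "$f")
    # batches/isONbatch_9.cer -> <outdir>/isONbatch_9.cer
    isONclust2 cluster -x "$mode" -v -l "$f" -o "$outdir/${filename%.*}.cer"
  done
}
```

Usage: `cluster_batches furious clustered batches/isONbatch_*.cer`. Taking the mode as an argument makes it easy to try `sahlin` (the mode used by wf-transcriptomes in the report below) instead.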

some level_1 output is not generated

Hi team,

I am using this as part of https://github.com/epi2me-labs/wf-transcriptomes/
I was able to run the make batches step, which generated batches 0-48, but the following clustering step failed.
The error message is slurmstepd: error: Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.

When I examined the log files, all job_level_0 output had been generated, but most level_1 output had not.
I tried to rerun one of the failed commands from level_1.sh:

isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_0.cer -r clusters/isONcluster_1.cer -o clusters/isONcluster_49.cer ; sync

It failed with a segmentation fault (core dumped):

Loaded input batch from clusters/isONcluster_0.cer:
        Batch number: 0
        Batch range: [0,16883]
        Depth: 0
        Nr sequences: 16884
        Nr bases: 50287830
        Nr clusters: 1
        Nr nontrivial clusters: 1
        Minimizers in database: 22619
Loaded input batch from clusters/isONcluster_1.cer:
        Batch number: 1
        Batch range: [16884,39157]
        Depth: 0
        Nr sequences: 22274
        Nr bases: 50286904
        Nr clusters: 2
        Nr nontrivial clusters: 2
        Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Segmentation fault (core dumped)

Some level_1 commands did run successfully, for example:

isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_8.cer -r clusters/isONcluster_9.cer -o clusters/isONcluster_53.cer ; sync
Loaded input batch from clusters/isONcluster_8.cer:
	Batch number: 8
	Batch range: [219434,255621]
	Depth: 0
	Nr sequences: 36188
	Nr bases: 50286190
	Nr clusters: 38
	Nr nontrivial clusters: 38
	Minimizers in database: 23082
Loaded input batch from clusters/isONcluster_9.cer:
	Batch number: 9
	Batch range: [255622,293538]
	Depth: 0
	Nr sequences: 37917
	Nr bases: 50287047
	Nr clusters: 33
	Nr nontrivial clusters: 32
	Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Filtered out 0 input clusters smaller than 2.
Finished clustering!
Alignment invocation count: 0 (0%)
Consensus invocation count: 33 (100%)
Number of clusters larger than 1: 38
Output batch statistics:
	Batch number: 8
	Batch range: [219434,293538]
	Depth: 1
	Nr sequences: 74105
	Nr bases: 100573237
	Nr clusters: 38
	Nr nontrivial clusters: 38
	Minimizers in database: 24370
Output batch written to: clusters/isONcluster_53.cer

I noticed that "Minimizers in database" is 0 for the right batch, but I am not sure whether this is related; it is also 0 for the right batch in the successful run above. This error then caused all subsequent failures. The file sizes seem small, and I requested 16 GB per core on a Slurm-managed cluster.
Could you kindly have a look at the issue and help me get this running?

Thanks a lot.
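Before rerunning, it can help to list which merges in level_1.sh never produced their output. A minimal sketch, assuming each command in the script has the shape of the ones shown above (a single `-o <file>` per command); `missing_outputs` is a hypothetical helper, not part of isONclust2 or wf-transcriptomes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every "-o <file>" target in a level script
# that is missing or empty (size 0), i.e. a merge that did not finish.
set -euo pipefail

missing_outputs() {
  local script=$1 flag out
  # grep exits with status 1 when nothing matches; "|| true" keeps set -e from aborting.
  { grep -o -e '-o [^ ;]*' "$script" || true; } | while read -r flag out; do
    [ -s "$out" ] || printf '%s\n' "$out"
  done
}
```

`missing_outputs level_1.sh` prints the `.cer` files that still need to be produced, so only those commands have to be rerun. Running them one at a time under GNU `/usr/bin/time -v` reports the peak memory ("Maximum resident set size") of each merge, which can be compared with the memory requested from Slurm.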

Using with dRNA-Seq Data

Hi @bsipos
It looks like a fab option for de novo clustering and transcriptome assembly.
Could you clarify whether it is also possible to use dRNA-seq datasets?
If so, which parameters would you recommend for a first try?
Thanks
Ad

isONclust2 sort 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'

When I ran isONclust2 sort -B 200 -v 1.fq, I encountered this problem:

isONclust2 version: v2.3-e9da596
Batches output directory: isONclust2_batches
Minimum batch size: 200 kilobases
Kmer size: 11
Window size: 15
Consensus period: 500
Minimum cluster size for consensus: 50
Maximum cluster size for consensus: -150
Minimum average quality: 7
Minimum shared minimizers: 5
Minimum fraction of top minimizer hit: 0.8
Mapping threshold: 0.65
Alignment threshold: 0.2
Minimum probability no hit: 0.1
Minimum cluster size in left batches: 3
Debug output: off
Parsed 5847806 sequences.
Finished sorting sequences.
Sorted sequences written to: isONclust2_batches/sorted_reads.fastq
Scores written to: isONclust2_batches/scores.tsv
Preparing batches:
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[1] 54662 abort (core dumped) isONclust2 sort -B 200 -v W555-N01.clipped.sam_clean.fastq

What is the impact of a different order of merging cer files?

Hi there,
The isONclust2 example code is listed below.

# sort reads and write out batches:
isONclust2 sort -B 50000 -v ens500.fq

# initial clustering of individual batches:
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_0.cer -o b0.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_1.cer -o b1.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_2.cer -o b2.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_3.cer -o b3.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_4.cer -o b4.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_5.cer -o b5.cer

# merge cluster batches:
isONclust2 cluster -v -l b0.cer           -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b_0_1.cer        -r b2.cer -o b_0_1_2.cer
isONclust2 cluster -v -l b_0_1_2.cer      -r b3.cer -o b_0_1_2_3.cer
isONclust2 cluster -v -l b_0_1_2_3.cer    -r b4.cer -o b_0_1_2_3_4.cer
isONclust2 cluster -v -l b_0_1_2_3_4.cer  -r b5.cer -o b_0_1_2_3_4_5.cer

# dump final results:
isONclust2 dump -v -i sorted/sorted_reads_idx.cer -o results b_0_1_2_3_4_5.cer

I tested a different order for the merge step:

# binary merge:
isONclust2 cluster -v -l b0.cer   -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b2.cer   -r b3.cer -o b_2_3.cer
isONclust2 cluster -v -l b4.cer   -r b5.cer -o b_4_5.cer

isONclust2 cluster -v -l b_0_1.cer   -r b_2_3.cer -o b_0_1_2_3.cer

isONclust2 cluster -v -l b_0_1_2_3.cer  -r b_4_5.cer -o b_0_1_2_3_4_5.cer

Merging this way, we could reduce the memory cost of the merge step for our 300+ cer files.
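The binary merge above generalizes to any number of batches. A sketch that prints (but does not run) a pairwise merge plan for files named `<prefix>0.cer` through `<prefix>(n-1).cer`, giving each merged output the next unused index. The naming scheme, the helper name `print_merge_plan`, and carrying an odd file out up to the next level unmerged are assumptions for illustration. For 49 batches it yields the same level-1 pairing as the two level_1.sh commands in the previous report (0 and 1 into 49, 8 and 9 into 53). Whether this order gives the same clusters as the sequential merge is exactly the open question of this issue.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a pairwise (binary-tree) merge plan for n
# clustered batches named <prefix>0.cer ... <prefix>(n-1).cer.
# Each merge writes to the next unused index; nothing is executed.
set -euo pipefail

print_merge_plan() {
  local n=$1 prefix=$2
  local next=$n level=1 i last
  local -a cur=() nxt=()
  for ((i = 0; i < n; i++)); do cur+=("$i"); done
  while (( ${#cur[@]} > 1 )); do
    echo "# level $level"
    nxt=()
    # Merge neighbours pairwise: left = cur[i], right = cur[i+1].
    for ((i = 0; i + 1 < ${#cur[@]}; i += 2)); do
      echo "isONclust2 cluster -v -l ${prefix}${cur[i]}.cer -r ${prefix}${cur[i+1]}.cer -o ${prefix}${next}.cer"
      nxt+=("$next")
      next=$((next + 1))
    done
    # Odd file out: carry it to the next level unmerged (an assumption).
    if (( ${#cur[@]} % 2 == 1 )); then
      last=$(( ${#cur[@]} - 1 ))
      nxt+=("${cur[last]}")
    fi
    cur=("${nxt[@]}")
    level=$((level + 1))
  done
}
```

`print_merge_plan 6 b` prints the same tree as the binary test above (b0.cer to b5.cer merged pairwise, final output b10.cer). Once the plan looks right, it can be run with `print_merge_plan 6 b | bash`; the `# level` lines are shell comments. Any binary tree needs n - 1 merges, the same count as the sequential chain, but each level's merges are independent of each other.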

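However, the result is not the same as with the example order: the binary merge reports 224,473 fewer sequences (8,380,306 vs 8,604,779) and 13,571 instead of 48,346 clusters.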

The numbers below come from the clusters_info.tsv files:

Type SeqNumber ClusterNumber
rawONTfastq 8604779 0
ExampleWay 8604779 48346
BinaryTestWay 8380306 13571

Below is a comparison of the cluster sizes in the two clusters_info.tsv files:

ExampleClusterID ExampleSize BinaryClusterId BinarySize
0 1574207 0 2049062
1 305805 1 292054
2 190015 2 281066
3 169258 3 274773
4 143181 4 235370
5 120562 5 199694
6 112505 6 197687
7 96978 7 171023
8 90645 8 138774
9 86133 9 109343
10 80531 10 103364
.
.
.
13565 8 13565 1
13566 8 13566 1
13567 8 13567 1
13568 8 13568 1
13569 8 13569 1
13570 8 13570 1
13571 8 13571 1
13572 8
13573 8
13574 8
13575 8
13576 8
13577 8
.
.
.
48341 1
48342 1
48343 1
48344 1
48345 1
48346 1

Could you help us find out why this happened, or tell us definitively whether this merge order is valid?
Thanks a lot.
