shingocat / lrscaf
TGS scaffolding
Hi!
I've been running LRScaf, and since yesterday it apparently hasn't done anything. The last line of the log is:
2020-01-04 17:27:40 [ INFO ] Finding repeat, erase time: 1948 ms
In "top", the process shows state S (interruptible sleep, i.e. waiting for an event to complete; see https://idea.popcount.org/2012-12-11-linux-process-states/).
Any clue regarding this? How long should I expect this step to last?
The minimap alignment file is 4.1 Gb, the contigs file is 4.3 Gb, and LRScaf's current memory usage seems to be ~16 Gb.
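I have roughly 80x PacBio coverage and previously assembled it with MeCAT, which produced a large number of contigs. Which parameters can I adjust to improve scaffold contiguity? Do you have any suggestions?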
Hi,
The program worked and I got scaffolds at the end.
How can I find out which contigs from my assembly have been placed in which scaffolds?
In other words: for one particular scaffold, I would like to know the names of the contigs it contains.
thanks
When I try using LRScaf v 1.1.10, I get a message 'The output file/mmjggl/nicodemus/.../nodePaths.info could not be created!'. The nodePaths.info file is actually present in the output folder, but it is empty. The scaffolds.fasta seems to get generated correctly.
When I run exactly the same command using v 1.1.9, nodePaths.info gets properly generated. It seems like the changes introduced into the file writer in the new version broke it. I'm fine with using v 1.1.9, but wanted to let you know.
Also related to this, how can I translate the digraph IDs used in nodePaths.info (e.g. G1), into scaffold IDs (e.g. Scaffolds_0) used in scaffolds.fasta? I need this to figure out what contigs from the initial assembly ended up in LRScaf scaffolds.
Thanks!
Hi, I have a PacBio assembly generated with Hifiasm. Error correction and polishing were performed with Racon and Pilon. This gave me a good N50. However, when I aligned the cleaned assembly to the reference genome (plant), the scaffolds were still not extended.
After that, I aligned the Nanopore reads (rawfile.fq) against my cleaned assembly using minimap2. I have a .paf file as well as a .sam file from the alignment. Now I am not sure how to proceed with LRScaf. Can you please advise? I would appreciate that.
Thank you
Hello!
Recently I ran LRScaf using the ScafConf.xml template, with default values documented in the README, and it reported:
2021-08-26 22:42:23 [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed!
2021-08-26 22:42:23 [ INFO ] agis.ps.file.XMLParser The para element contain illeage item tips_length. it will be omitted!
......
I also notice that in the README -> Parameters of LRScaf, the parameter is not spelled <tips_length>:
tip_length The maximum tip length. Default: <1500> bp.
So I suggest that the template ScafConf.xml and the example XML file in README.md be made consistent with the spelling in the source code :)
Hello,
Thanks for your software. I used LRScaf to scaffold my contig assembly. Here are the BUSCO results before and after LRScaf:
before: C:95.4%[S:94.7%,D:0.7%],F:1.3%,M:3.3%,n:1367
after: C:95.8%[S:91.4%,D:4.4%],F:1.2%,M:3.0%,n:1367
How can we explain the increase in duplicated BUSCOs after scaffolding?
Thanks for your help.
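To put the two BUSCO summaries above on a common footing, the percentages can be converted to approximate gene counts. This is a minimal sketch that only parses the two strings quoted in the post; the helper names `parse_busco` and `approx_count` are hypothetical, and the counts are rounded estimates derived from the percentages, not values reported by BUSCO.

```python
import re

def parse_busco(summary: str) -> dict:
    """Parse a one-line BUSCO summary into percentages plus the total n."""
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    m = re.search(pattern, summary)
    if m is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    out = {key: float(value) for key, value in m.groupdict().items() if key != "n"}
    out["n"] = int(m.group("n"))
    return out

def approx_count(percent: float, n: int) -> int:
    """Convert a percentage of n BUSCO groups to a rounded count."""
    return round(percent * n / 100)

# The two summary strings quoted in the post above.
before = parse_busco("C:95.4%[S:94.7%,D:0.7%],F:1.3%,M:3.3%,n:1367")
after = parse_busco("C:95.8%[S:91.4%,D:4.4%],F:1.2%,M:3.0%,n:1367")

for label, b in (("before", before), ("after", after)):
    print(label, "approx. duplicated:", approx_count(b["D"], b["n"]),
          "approx. single-copy:", approx_count(b["S"], b["n"]))
```

The total completeness barely moves (95.4% to 95.8%), so the change is mostly a shift from the single-copy to the duplicated category.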
No warning message, XML configuration taken from GitHub page.... Default should be to not erase things.
Hi,
What do you recommend using: raw or corrected reads?
Which tool should I use to compile it, studio or Eclipse?
After scaffolding a contigs FASTA (containing no Ns), the output scaffolds.fasta file contained Ns, even though the input long reads contained no Ns.
If two contigs are being linked by a long read, then in the resulting scaffold, is the sequence between the two contigs copied over from the long read, or is it a stretch of Ns?
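One way to see what sits between joined contigs is to list the runs of N in scaffolds.fasta. This is a minimal sketch, not part of LRScaf: the helper names `read_fasta` and `n_runs` are hypothetical, and the toy record below is invented for illustration (only the name Scaffolds_0 is taken from this thread).

```python
import re
from typing import Iterable, Iterator, List, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def n_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for every run of N or n in seq."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]

# Invented toy record; replace with open("scaffolds.fasta") for real data.
example = [">Scaffolds_0", "ACGTNNNNNACGT", "ACNNAC"]
records = list(read_fasta(example))
for name, seq in records:
    print(name, n_runs(seq))
```

Comparing the reported positions with the contig lengths shows whether the Ns fall at contig junctions or inside an input contig.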
Hi,
The software works pretty well, but for one of the genomes I'm working on it is generating very long stretches of homopolymers, mainly Ts and As. This happens in many parts of the genome. To me this is not biological but comes from the PacBio reads. How can I avoid it?
Thanks
F
Hi,
I followed your tutorial mapping the long reads to the assembly using minimap2. Then I ran:
java -jar ../lrscaf/target/LRScaf-1.1.7.jar -c ../gelada_10x_split.fa -a ../aln.mm -t mm -o ./lrscaf_out
And I got this error:
2019-11-22 11:17:30 [ ERROR ] agis.ps.file.MMReader 5
2019-11-22 12:18:08 [ ERROR ] agis.ps.Main The aligned file could not be null!
The log looks like this:
2019-11-22 12:27:37 [ WARN ] The identity for minimap mapper would be setted to 0.1!
2019-11-22 12:27:37 [ INFO ] Launching...
2019-11-22 12:27:37 [ INFO ] Build output folder successfully!
2019-11-22 12:27:37 [ INFO ] Building output folder, erase time: 1 ms
2019-11-22 12:33:53 [ INFO ] Reading contigs, erase times: 375239 ms
2019-11-22 12:46:27 [ INFO ] Valid Aligned Records: 7380589
2019-11-22 12:46:27 [ INFO ] Reading Aligned Records, erase time: 753695 ms
2019-11-22 12:46:28 [ INFO ] Finding Repeats:
2019-11-22 12:46:28 [ INFO ] MIN: 1.0
2019-11-22 12:46:28 [ INFO ] First Quartile: 4.0
2019-11-22 12:46:28 [ INFO ] Median cov = 9.0
2019-11-22 12:46:28 [ INFO ] Third Quartile: 14.0
2019-11-22 12:46:28 [ INFO ] MAX: 19899.0
2019-11-22 12:46:28 [ INFO ] Interquartile Range: 10.0
2019-11-22 12:46:28 [ INFO ] 1.5's IQR , Outlier Threshold: 29.0
2019-11-22 12:46:28 [ INFO ] Repeat count: 14875
2019-11-22 12:46:28 [ INFO ] Finding repeat, erase time: 805 ms
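The repeat statistics in this log are consistent with a standard upper outlier fence, Q3 + 1.5 × IQR: 14 + 1.5 × 10 = 29. This is a minimal sketch of that arithmetic. It is inferred from the logged numbers only, not from the LRScaf source, and the function name `outlier_threshold` is hypothetical.

```python
def outlier_threshold(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper fence of a box plot: third quartile plus k times the interquartile range."""
    iqr = q3 - q1
    return q3 + k * iqr

# Quartiles taken from the log above (First Quartile: 4.0, Third Quartile: 14.0).
print(outlier_threshold(4.0, 14.0))
```

Another log in this thread (First Quartile 4.0, Third Quartile 17.0, Outlier Threshold 36.5) fits the same formula, which suggests that contigs with coverage above the fence are the ones counted as repeats.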
Hi,
I have a quickmerge FASTA file built from Hifiasm and IPA assemblies of a plant genome. I then polished the FASTA file using Racon.
I tried the command below and am getting the error shown after it:
java -jar /rhome/khushwas/lrscaf/target/LRScaf-1.1.12.jar -c merged_prefix.fasta -a merged.prefix.fasta.sam -t merged.prefix.fasta.sam -micl 500 -mioll 170 -mmcm 8 -o /path/to/folder/
merged_prefix.fasta - FASTA file generated by quickmerge from two long-read assemblies
merged.prefix.fasta.sam - Minimap2 - sam file
2021-06-17 22:02:18 [ INFO ] Launching...
2021-06-17 22:02:18 [ INFO ] The output folder existed!
2021-06-17 22:02:18 [ INFO ] Building output folder, elapsed time: 0 ms
2021-06-17 22:03:03 [ INFO ] Reading contigs, elapsed times: 45949 ms
2021-06-17 22:03:04 [ INFO ] The aligned parameter should be not set! only , , or
2021-06-17 22:03:04 [ ERROR ] PathBuilder : The Edges could not be empty!
2021-06-17 22:03:04 [ INFO ] Ending...
2021-06-17 22:03:04 [ INFO ] Scaffolding elapsed time: 46 s.
I do not know whether my LRScaf command is correct. Please advise.
Thank you
Hi!
I'm making a first pass at using LRScaf and having trouble making sense of the failed output. LRScaf exit status was 0 but I'm not seeing a hybrid assembly - and like others here - but maybe not identical - I am also getting the edges could not be empty error - and an error in the MMReader input string.
I have a very poor Discovar assembly for the pygmy octopus genome (over 6 million scaffolds) and a first pass of MinION reads (150,000 reads). The genome is around human size - and I estimate MinION coverage is about 0.68x - Illumina coverage is around 80x going into the Discovar assembly but the assembly did not turn out well. I used minimap2 with default settings.
logs/error.log from LRScaf:
2019-01-12 15:50:26 [ main:466447 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:163)] - [ ERROR ] agis.ps.file.MMReader For input string: "collected"
2019-01-12 15:50:26 [ main:466475 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - [ ERROR ] PathBuilder : The Edges could not be empty!
logs/logs.log from LRScaf:
2019-01-12 15:42:40 [ main:0 ] - [agis.ps.Main.main(Main.java:48)] - [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed
2019-01-12 15:42:40 [ main:58 ] - [agis.ps.file.XMLParser.parseXML(XMLParser.java:182)] - [ INFO ] agis.ps.file.XMLParser The para element contain illeage item tips_length. it will be omitted!
2019-01-12 15:42:40 [ main:59 ] - [agis.ps.Main.main(Main.java:56)] - [ INFO ] Launching...
2019-01-12 15:42:40 [ main:70 ] - [agis.ps.file.OutputFolderBuilder.building(OutputFolderBuilder.java:42)] - [ INFO ] Build output folder successfully!
2019-01-12 15:42:40 [ main:71 ] - [agis.ps.file.OutputFolderBuilder.building(OutputFolderBuilder.java:52)] - [ INFO ] Building output folder, erase time: 8 ms
2019-01-12 15:50:24 [ main:464505 ] - [agis.ps.file.ContigReader.read(ContigReader.java:149)] - [ INFO ] Reading contigs, erase times: 464429 ms
2019-01-12 15:50:26 [ main:466447 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:163)] - [ ERROR ] agis.ps.file.MMReader For input string: "collected"
2019-01-12 15:50:26 [ main:466450 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:176)] - [ INFO ] Valid Aligned Records: 0
2019-01-12 15:50:26 [ main:466450 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:177)] - [ INFO ] Reading Aligned Records, erase time: 19 ms
2019-01-12 15:50:26 [ main:466451 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:123)] - [ INFO ] Repeat count: 0
2019-01-12 15:50:26 [ main:466451 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:125)] - [ INFO ] Finding repeat, erase time: 0 ms
2019-01-12 15:50:26 [ main:466453 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:93)] - [ INFO ] Valid Links Acount: 0
2019-01-12 15:50:26 [ main:466453 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:94)] - [ INFO ] Building Link, erase time : 0 ms
2019-01-12 15:50:26 [ main:466475 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - [ ERROR ] PathBuilder : The Edges could not be empty!
2019-01-12 15:50:26 [ main:466476 ] - [agis.ps.Main.main(Main.java:59)] - [ INFO ] Ending...
2019-01-12 15:50:26 [ main:466476 ] - [agis.ps.Main.main(Main.java:61)] - [ INFO ] Scaffolding erase time: 466 s.
files in the output folder are largely empty:
[eedsinger@barhal minimap2-ee20190111-1]$ ls -1la 5-lrscaff-output/
total 15
drwxr-xr-x 2 eedsinger eedsinger 5 Jan 12 15:50 .
drwxrwxr-x 4 eedsinger eedsinger 14 Jan 12 15:50 ..
-rw-r--r-- 1 eedsinger eedsinger 189 Jan 12 15:50 draft_summary.info
-rw-r--r-- 1 eedsinger eedsinger 0 Jan 12 15:50 links.info
-rw-r--r-- 1 eedsinger eedsinger 0 Jan 12 15:50 triadlinks.info
configure.xml given to LRScaf:
See attached.
Any suggestions would be greatly appreciated!
Thank-you very much,
Eric
configure.txt
Dear developers,
Quick question. I wonder if the program accepts gzipped alignment files.
Cheers!
/Andreas
Hi @shingocat,
I have been able to successfully run lrscaf but am now encountering issues when iterating through different values for specific parameters. I am using a bash script to iterate through various values for specific parameters (ie: mioll, iqrt, mmcm) to assess which combination is optimal.
contents of .sh script:
#!/bin/bash
counter=0
declare -a mioll=("320" "960")
declare -a tl=("1000" "10000")
declare -a iqrt=("1.5" "3")
declare -a mmcm=("5" "10")
for i in "${mioll[@]}"
do
  for i in "${tl[@]}"
  do
    for i in "${iqrt[@]}"
    do
      for i in "${mmcm[@]}"
      do
        let counter++
        java -jar /pickett/software/lrscaf/LRScaf-1.1.11.jar \
          -p 20 \
          -c /path/to/Scaffolds.fasta \
          -a /path/to/mapped.paf \
          -t mm \
          -mioll $mioll \
          -tl $tl \
          -iqrt $iqrt \
          -mmcm $mmcm \
          -o ./2_15_2021/$counter/output_species
      done
    done
  done
done
The error I am getting is:
2021-02-15 14:08:51 [ ERROR ] Error: java.lang.NullPointerException: Cannot invoke "String.trim()" because "in" is null
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.base/java.lang.Double.parseDouble(Double.java:549)
at java.base/java.lang.Double.valueOf(Double.java:512)
at agis.ps.Main.parsering(Main.java:291)
at agis.ps.Main.main(Main.java:48)
2021-02-15 14:08:51 [ ERROR ] agis.ps.Main Cannot invoke "String.trim()" because "in" is null
When I remove the variable iqrt and let it run default, the error goes away and lrscaf successfully runs. I tested out my script with just one value for each parameter to see if iqrt was having an issue with decimal (1.5) vs whole (3) numbers but regardless, the same error occurs.
Please let me know of any additional information you may need to help with this. Thank you!
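Separately from the reported exception, the posted script reuses the loop variable i in all four loops, and in bash a bare $mioll (or $tl, $iqrt, $mmcm) expands to the first array element, so all 16 runs receive the same parameter values. Whether this is related to the NullPointerException is not established here. Below is a minimal sketch of the intended parameter grid using only the flags, values, and paths from the script; `build_commands` is a hypothetical helper.

```python
import itertools
import shlex

# Parameter values copied from the script above.
grid = {
    "-mioll": ["320", "960"],
    "-tl": ["1000", "10000"],
    "-iqrt": ["1.5", "3"],
    "-mmcm": ["5", "10"],
}

def build_commands(grid, jar, contigs, paf, out_root):
    """Return one argument list per combination of the grid values."""
    flags = list(grid)
    commands = []
    for counter, values in enumerate(itertools.product(*grid.values()), start=1):
        cmd = ["java", "-jar", jar, "-p", "20", "-c", contigs, "-a", paf, "-t", "mm"]
        for flag, value in zip(flags, values):
            cmd += [flag, value]
        cmd += ["-o", f"{out_root}/{counter}/output_species"]
        commands.append(cmd)
    return commands

commands = build_commands(
    grid,
    "/pickett/software/lrscaf/LRScaf-1.1.11.jar",
    "/path/to/Scaffolds.fasta",
    "/path/to/mapped.paf",
    "./2_15_2021",
)
for cmd in commands:
    # Print only; use subprocess.run(cmd, check=True) to actually launch a run.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to confirm that each run gets a distinct combination before any compute time is spent.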
Hi there-- thanks for the software!
I'm running into an error when I map ONT reads using minimap2 using the appropriate options. i.e. if i prepare my alignment as
minimap2 -t 24 -ax map-ont -o out.mm ref.fa reads.fastq.gz
lrscaf throws an error like
2021-12-02 20:06:17 [ ERROR ] agis.ps.file.MMReader On lines: 1 For input string: "ptg000088l"
2021-12-02 20:06:18 [ ERROR ] PathBuilder : The Edges could not be empty!
however, if I prepare my alignment without the -ax map-ont flag, i.e.,
minimap2 -t 24 -o out.mm ref.fa reads.fastq.gz
then lrscaf runs without problem. I believe others have seen this in closed issues, e.g. #17
thanks for your help
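In minimap2, the -a flag switches the output from PAF to SAM, which matches the behaviour described above; error strings such as "SN:contig_1" reported elsewhere in this thread are fields of a SAM header line. This is a minimal sketch for checking which format a file is in before passing it to LRScaf with -t mm. The function name `guess_alignment_format` is hypothetical, the heuristics are assumptions based on the public SAM and PAF column layouts, and the LN value in the header example is invented.

```python
def guess_alignment_format(first_line: str) -> str:
    """Guess 'sam', 'paf', or 'unknown' from a single line of an alignment file."""
    line = first_line.rstrip("\n")
    if line.startswith("@"):
        return "sam"  # SAM header line such as @SQ or @PG
    fields = line.split("\t")
    # PAF: 12 mandatory columns; column 5 is the strand, the other
    # coordinate, length, and quality columns are integers.
    paf_int_columns = (1, 2, 3, 6, 7, 8, 9, 10, 11)
    if (len(fields) >= 12 and fields[4] in ("+", "-")
            and all(fields[i].isdigit() for i in paf_int_columns)):
        return "paf"
    # SAM record: 11 mandatory columns; FLAG, POS, and MAPQ are integers.
    if len(fields) >= 11 and all(fields[i].isdigit() for i in (1, 3, 4)):
        return "sam"
    return "unknown"

# PAF record quoted elsewhere in this thread, rebuilt with tab separators.
paf_line = "\t".join([
    "1f3d91e5-29eb-4a70-a5dc-4a8027c831ff", "491", "44", "464", "-",
    "NODE_468_length_20536_cov_6283", "20536", "1158", "1603", "51", "445", "9",
    "tp:A:P", "cm:i:6",
])
sam_header = "@SQ\tSN:contig_1\tLN:1000"  # LN value invented for illustration

print(guess_alignment_format(paf_line))
print(guess_alignment_format(sam_header))
```

If a file comes back as "unknown", it is worth checking whether its columns are separated by spaces instead of tabs, or whether log text has been written into it.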
Hi~
I have tried to run LRScaf using nanopore long reads and an assembly file, and with default parameters in ScafConf.xml.
java -jar ~/lrscaf-master/target/LRScaf-1.1.12.jar -x ScafConf.xml
The messages I got are similar to this post, #13:
$ cat logs/error.log
2021-08-26 22:40:34 [ ERROR ] Error:
org.apache.commons.cli.MissingArgumentException: The aligned file could not be null!
at agis.ps.Main.parsering(Main.java:161)
at agis.ps.Main.main(Main.java:48)
2021-08-26 22:40:34 [ ERROR ] agis.ps.Main The aligned file could not be null!
$ cat logs/logs.log
2021-08-26 22:42:23 [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed!
2021-08-26 22:42:24 [ INFO ] Launching...
2021-08-26 22:42:24 [ INFO ] Build output folder successfully!
2021-08-26 22:42:24 [ INFO ] Building output folder, elapsed time: 2 ms
2021-08-26 22:43:14 [ INFO ] Reading contigs, elapsed times: 49913 ms
2021-08-26 22:47:51 [ INFO ] Valid Aligned Records: 756966
2021-08-26 22:47:51 [ INFO ] Reading Aligned Records, elapsed time: 277126 ms
2021-08-26 22:47:52 [ INFO ] Finding Repeats:
2021-08-26 22:47:52 [ INFO ] MIN: 1.0
2021-08-26 22:47:52 [ INFO ] First Quartile: 4.0
2021-08-26 22:47:52 [ INFO ] Median cov = 12.0
2021-08-26 22:47:52 [ INFO ] Third Quartile: 17.0
2021-08-26 22:47:52 [ INFO ] MAX: 41645.0
2021-08-26 22:47:52 [ INFO ] Interquartile Range: 13.0
2021-08-26 22:47:52 [ INFO ] 1.5's IQR , Outlier Threshold: 36.5
2021-08-26 22:47:52 [ INFO ] Repeat count: 31624
2021-08-26 22:47:52 [ INFO ] Finding repeat, elapsed time: 528 ms
…
Because error.log says "Error: The aligned file could not be null!" while logs.log says "Valid Aligned Records: 756966", I think these two logs can easily confuse first-time users about whether the program is running properly.
I wish the log files could report the real situation (whether the software is doing well), thank you.
Hi there,
I am using Minimap2 (paf) file as the input (aligned file) to run lrscaf.
I got these errors
2019-03-12 14:40:44 [ ERROR ] agis.ps.file.MMReader For input string: "SN:scaffold1_cov25"
2019-03-12 14:40:44 [ ERROR ] PathBuilder : The Edges could not be empty!
It will be appreciated if you can let me know how to fix this problem.
Thank you,
Mei
As documented: "The parameter to filter invalid Minimap alignments. Default: <8>. Only for Minimap alignment." By what measure does LRScaf filter the alignment results? Mapping quality score?
Here is the definition of PAF files:
https://github.com/lh3/miniasm/blob/master/PAF.md
The 8th column is "Target start on original strand", and I wonder how LRScaf could filter alignments by this column. Should it instead be the last mandatory column, i.e. the 12th column (mapping quality) in a 12-column PAF file?
And what is the threshold for this value? >0? >10? or >30? Could you please give some more details?
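For reference, the 12 mandatory PAF columns from the specification linked above can be read into named fields as follows. This is a minimal sketch: the `PafRecord` class and `parse_paf_line` function are hypothetical helpers, and which field (or optional tag such as cm:i) the LRScaf parameter actually filters on is not stated in this thread. The example record is the PAF line quoted elsewhere in this thread, rebuilt with tab separators.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PafRecord:
    query_name: str        # column 1
    query_length: int      # column 2
    query_start: int       # column 3
    query_end: int         # column 4
    strand: str            # column 5
    target_name: str       # column 6
    target_length: int     # column 7
    target_start: int      # column 8
    target_end: int        # column 9
    residue_matches: int   # column 10
    block_length: int      # column 11
    mapping_quality: int   # column 12
    tags: Dict[str, str] = field(default_factory=dict)  # optional TAG:TYPE:VALUE fields

def parse_paf_line(line: str) -> PafRecord:
    """Parse one tab-separated PAF line into a PafRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(cols)}")
    tags = {}
    for item in cols[12:]:
        key, _type, value = item.split(":", 2)
        tags[key] = value
    return PafRecord(
        cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4],
        cols[5], int(cols[6]), int(cols[7]), int(cols[8]),
        int(cols[9]), int(cols[10]), int(cols[11]), tags,
    )

example = "\t".join([
    "1f3d91e5-29eb-4a70-a5dc-4a8027c831ff", "491", "44", "464", "-",
    "NODE_468_length_20536_cov_6283", "20536", "1158", "1603", "51", "445", "9",
    "tp:A:P", "cm:i:6", "s1:i:46", "s2:i:0", "dv:f:0.1666", "rl:i:0",
])
rec = parse_paf_line(example)
print(rec.target_start, rec.mapping_quality, rec.tags.get("cm"))
```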
Hi
I am trying to use LRScaf with a draft assembly of an invertebrate species (1.3Gb) and ~30x coverage of NP reads. I used minimap to create a .paf file, but LRScaf seems to have slowed down considerably after this message was printed in the log file: Original Edges size: 155118. The only thing that has changed during the last 2 days of running is that the triadlinks.info file has become a bit bigger. Could you tell me whether this is normal and I should be patient, or whether something went wrong? I also copy my command here in case I made a mistake.
java -Xmx900G -jar LRScaf-1.1.6.jar -c Ef_wtdbg.ctg.fa -a test.paf -t mm -o scaff_test -p 16 --identity 0.1
Many Thanks
Szabolcs
I used paf from minimap2 and got this error.
2019-03-13 18:08:37 [ ERROR ] agis.ps.graph.DirectedGraph null
Although there were outputs generated, what does this error mean and how to fix it?
Thank you,
Mei
Hello,
I am trying LRScaf with a set of PacBio reads (corrected and uncorrected). I mapped these reads to the genome with minimap2 (v2.17). But when running LRScaf, I got the following error:
2020-11-30 15:39:05 [ ERROR ] agis.ps.file.M4Reader On lines: 1 For input string: "xfSc0000457"
2020-11-30 15:39:05 [ ERROR ] PathBuilder : The Edges could not be empty!
It looks like the tool is not able to read the alignment file. Do you know why?
Thanks
When I run it like this:
java -jar target//LRScaf-1.1.2.jar --contig draft.fa -a test.m4 -t m4 --identity 0.8 --miniSupLinks 1 --output
I get this error:
2018-07-30 13:52:24 [ main:0 ] - [agis.ps.Main.main(Main.java:67)] - [ ERROR ] agis.ps.Main The option 't' was specified but an option from this group has already been selected: 'a'
Hi,
I have tried to run LRScaf using a genome assembly and a minimap2 alignment of PacBio reads (±10X coverage) and I got the following error:
2020-06-13 20:55:24 [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed!
2020-06-13 20:55:24 [ INFO ] agis.ps.file.XMLParser The para element contain illeage item tips_length. it will be omitted!
2020-06-13 20:55:24 [ INFO ] Launching...
2020-06-13 20:55:24 [ INFO ] Build output folder successfully!
2020-06-13 20:55:24 [ INFO ] Building output folder, erase time: 1 ms
2020-06-13 20:55:57 [ INFO ] Reading contigs, erase times: 32725 ms
2020-06-13 20:55:57 [ ERROR ] agis.ps.file.MMReader On lines: 1 For input string: "collected"
2020-06-13 20:55:57 [ INFO ] Valid Aligned Records: 0
2020-06-13 20:55:57 [ INFO ] Reading Aligned Records, erase time: 9 ms
2020-06-13 20:55:57 [ INFO ] Repeat count: 0
2020-06-13 20:55:57 [ INFO ] Finding repeat, erase time: 0 ms
2020-06-13 20:55:57 [ INFO ] Valid Links Acount: 0
2020-06-13 20:55:57 [ INFO ] Building Link, erase time : 13 ms
2020-06-13 20:55:57 [ ERROR ] PathBuilder ? The Edges could not be empty!
2020-06-13 20:55:57 [ INFO ] Ending...
2020-06-13 20:55:57 [ INFO ] Scaffolding erase time: 32 s.
I am wondering what those errors mean.
I generated the minimap2 mapping file with the following command:
minimap2 -t 8 Tniger_Hi-C-scaffolding_HiRise_filtered_v2.0.fasta Tniger_PacBio-reads.fasta > Tniger_PacBio-reads.minimap2.mm
and I executed LRScaf with the following command:
java -Xms80G -Xmx80G -jar /usr/local/src/lrscaf/target/LRScaf-1.1.9.jar -c Tniger_Hi-C-scaffolding_HiRise_filtered_v2.0.fasta -a Tniger_PacBio-reads.minimap2.mm -t mm -i 0.1 -mmcm 8 -p 8 -misl 1 -o Tniger-LRScaf
I used the .XML config file and got the same errors as above.
I am now running blasr to get the alignment file, thinking the minimap2 alignment may be the problem, but I am not sure about it.
I would really appreciate any help or advice.
Hi,
I have assembled a genome from pacbio CLR reads, and the assembly is very contiguous (Span=407Mb, N50=12Mb). I want to try LRScaf to see whether there are any contigs that could be joined further. I realise that improvement will be limited, but it is possible that smaller overlaps (<1kb) will have been ignored in assembly and so the assembly could be improved by LRScaf.
Because some of the contigs in the assembly are very long, I thought it best to set -mioll 300 -miolr 1e-100, as even a 1kb overlap would be a tiny fraction of a 20Mb contig and so would be removed if -miolr 0.8.
There are some other parameters that I do not understand, though, and so I am not sure if or how to change them. In particular, what is the overhang length of a contig, and what is the end length of a read? Should I change these parameters (-maohl -maohr -mael -maer), and are there any others I should consider changing?
Best wishes,
Alex
Hi! I want to know how lrscaf deals with gaps: does it fill them with Ns or use the sequence from the long reads?
Hi @shingocat
I have a few questions about the output files and the program in general:
In the file contig_coverage.info, some contigs are missing but are present in the final scaffolding. What does that mean?
Some of my contigs are also not present in the final scaffold assembly (I grepped for the contig in nodePaths.info and it doesn't exist). Even if a contig can't be scaffolded, I would expect a single scaffold composed of that one contig. Why are these contigs not present at the end? (I see that there is a mapping result for them in the minimap2 file.)
What does the file repeat.contigs contain?
I am not sure I understand the min_supported_links parameter: if it is set to 1, does it mean that 1 PacBio read overlapping 2 contigs is needed for scaffolding? What is the optimal min_supported_links value for 25X of PacBio reads?
Do you gap-fill the final assembly, or is only scaffolding performed?
Hi, we found some contigs that were used more than once in the scaffolding. For instance, ctg015357 is in both scaffolds G824 and G7782.
cat nodePaths.info
digraph G824{
ctg038297->ctg044989->ctg015357->ctg028040->ctg034934->ctg075623->ctg055892->ctg000373;
digraph G7782{
ctg014392->ctg015357;
How is this possible, and how do you interpret it?
Thanks.
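To check how widespread this is, nodePaths.info can be scanned for contigs that occur in more than one path. This is a minimal sketch that assumes the file looks like the excerpt above (a "digraph NAME{" line followed by contigs joined with "->"); the helper names `parse_node_paths` and `contigs_in_multiple_paths` are hypothetical, and the real file may contain details the excerpt does not show. The same path table also answers the recurring question of which input contigs went into which path, although how the G identifiers map to the names in scaffolds.fasta is not established in this thread.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

def parse_node_paths(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each digraph name to its ordered list of contig names."""
    paths: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        header = re.match(r"digraph\s+(\S+?)\s*\{", line)
        if header:
            current = header.group(1)
            paths[current] = []
        elif current is not None and line not in ("", "}"):
            # Drop the trailing ';' or '}' and split the chain on '->'.
            parts = line.rstrip(";}").split("->")
            paths[current].extend(p.strip() for p in parts if p.strip())
    return paths

def contigs_in_multiple_paths(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return contigs that occur in more than one path, with the path names."""
    seen = defaultdict(list)
    for path_name, contigs in paths.items():
        for contig in contigs:
            if path_name not in seen[contig]:
                seen[contig].append(path_name)
    return {c: names for c, names in seen.items() if len(names) > 1}

# Excerpt quoted in the post above; use open("nodePaths.info") for the full file.
excerpt = [
    "digraph G824{",
    "ctg038297->ctg044989->ctg015357->ctg028040->ctg034934->ctg075623->ctg055892->ctg000373;",
    "digraph G7782{",
    "ctg014392->ctg015357;",
]
paths = parse_node_paths(excerpt)
shared = contigs_in_multiple_paths(paths)
print(shared)
```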
I encountered issues like a few others reported before. I tried to use the various changes suggested by everybody in previously resolved issues, but kept getting the same error.
This is what I did:
minimap2 -t 6 FW4911_contigsrenamed.fa ../FW4911nanopore_assemblies/FW4911nanoporeassembly1.fq -o ./FW4911spades_NPaln.mm; lrscaf -x FW4911ScafConf0.xml; (lrscaf was an alias: lrscaf="java -Xms100g -Xmx100g -jar /Path/to/LRScaf-1.1.11.jar")
These are the two errors I got:
[ ERROR ] agis.ps.file.M4Reader On lines: 1 For input string: "NODE_468_length_20536_cov_6283"
PathBuilder : The Edges could not be empty!
I took a look at the first line of the .mm file, it is as below:
1f3d91e5-29eb-4a70-a5dc-4a8027c831ff 491 44 464 - NODE_468_length_20536_cov_6283 20536 1158 1603 51 445 9 tp:A:P cm:i:6 s1:i:46 s2:i:0 dv:f:0.1666 rl:i:0
Thank you in advance for your help!
hello!
i have the problem, that after some minutes LRScaf stops due to the error: "PathBuilder : The Edges could not be empty"
this is the command i use:
$ java -jar /home/programs/LRScaf/LRScaf-1.1.9.jar -c ../KDH -a ../longreads_against_kdh.paf -t mm -i 0.1 -o .
this is the output i get:
2020-04-30 18:18:55 [ INFO ] Launching...
2020-04-30 18:18:55 [ INFO ] The output folder existed!
2020-04-30 18:18:55 [ INFO ] Building output folder, erase time: 0 ms
2020-04-30 18:19:15 [ INFO ] Reading contigs, erase times: 19442 ms
2020-04-30 18:19:15 [ INFO ] The output file /binfl/LRScaff_output/draft_summary.info existed. It will overwrite.
2020-04-30 18:20:11 [ INFO ] Valid Aligned Records: 0
2020-04-30 18:20:11 [ INFO ] Reading Aligned Records, erase time: 55991 ms
2020-04-30 18:20:11 [ INFO ] Repeat count: 0
2020-04-30 18:20:11 [ INFO ] Finding repeat, erase time: 0 ms
2020-04-30 18:20:11 [ INFO ] Valid Links Acount: 0
2020-04-30 18:20:11 [ INFO ] Building Link, erase time : 7 ms
2020-04-30 18:20:11 [ INFO ] The output file /binfl/LRScaff_output/links.info existed. It will overwrite.
2020-04-30 18:20:11 [ INFO ] The output file /binfl/LRScaff_output/triadlinks.info existed. It will overwrite.
2020-04-30 18:20:11 [ ERROR ] PathBuilder : The Edges could not be empty!
2020-04-30 18:20:11 [ INFO ] Ending...
2020-04-30 18:20:11 [ INFO ] Scaffolding erase time: 75 s.
the alignment was done with minimap2
minimap2 -ax map-ont -t 16 ../KDH /project/ALL_reads.fasta > longreads_against_kdh.sam
paftools sam2paf longreads_against_kdh.sam > longreads_against_kdh.paf
even though I looked into the other tickets, I'm not sure what the problem is in this case.
I'm not very experienced, so it would be really great if you could help me.
I gave the SAM (pbalign) and m4 (blasr) alignment files to lrscaf, both through the command line and through the XML file, and both give me this stderr:
2018-08-01 08:40:06 [ main:246434 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:176)] - [ INFO ] Valid Aligned Records: 0
2018-08-01 08:40:06 [ main:246435 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:177)] - [ INFO ] Reading Aligned Records, erase time: 13657 ms
2018-08-01 08:40:06 [ main:246436 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:123)] - [ INFO ] Repeat count: 0
2018-08-01 08:40:06 [ main:246437 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:125)] - [ INFO ] Finding repeat, erase time: 1 ms
2018-08-01 08:40:06 [ main:246438 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:93)] - [ INFO ] Valid Links Acount: 0
2018-08-01 08:40:06 [ main:246438 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:94)] - [ INFO ] Building Link, erase time : 0 ms
2018-08-01 08:40:06 [ main:246472 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - [ ERROR ] PathBuilder : The Edges could not be empty!
2018-08-01 08:40:06 [ main:246473 ] - [agis.ps.Main.main(Main.java:59)] - [ INFO ] Ending...
2018-08-01 08:40:06 [ main:246473 ] - [agis.ps.Main.main(Main.java:61)] - [ INFO ] Scaffolding erase time: 246 s.
and empty links.info and triadlinks.info files.
Any help is much appreciated.
Thanks.
I am trying to run LRScaf with the following commands:
minimap2 -ax map-ont -t 8 assembly.fasta ONT.fasta > align.mm
java -Xms200g -Xmx200g -jar LRScaf-1.1.10.jar -c assembly.fasta -a align.mm -t mm -o out
I am getting the following error:
2021-05-12 15:54:44 [ ERROR ] Error:
org.apache.commons.cli.MissingArgumentException: The type of aligned file could not be null!
at agis.ps.Main.parsering(Main.java:180)
at agis.ps.Main.main(Main.java:53)
2021-05-12 15:54:44 [ ERROR ] agis.ps.Main The type of aligned file could not be null!
2021-05-12 15:57:45 [ ERROR ] agis.ps.file.MMReader On lines: 1 For input string: "SN:contig_1"
2021-05-12 15:57:45 [ ERROR ] PathBuilder : The Edges could not be empty!
2021-05-12 16:00:23 [ ERROR ] agis.ps.file.M4Reader On lines: 1 3
2021-05-12 16:00:23 [ ERROR ] PathBuilder : The Edges could not be empty!
2021-05-12 16:02:45 [ ERROR ] PathBuilder : The Edges could not be empty!
Could you please help me with this error?
Thanks in advance!
Julia
Hello,
I am the developer of MaSuRCA assembler. I am looking for a good long-read scaffolder and your paper had nice results. However, when I tried using your scaffolder on a human genome assembly produced by MaSuRCA with ~9Mbp N50 contig size (about 1200 contigs), I found that the scaffolder duplicated many contigs in the scaffolds, resulting in much bigger (3.24Gbp vs 2.85Gbp) final assembly size. This is not the correct behavior. Scaffolder should output about the same amount of sequence, give or take losses in merging contigs. Contigs should never be duplicated exactly unless there is a very good reason for it, and if that is done, then duplicates must be resolved by remapping the reads and re-doing consensus. I found that duplicated contigs were always on the ends of paths in nodePaths.info. My assembly, config xml, the paf output of minimap and lrscaf output are posted here:
ftp://ftp.ccb.jhu.edu/pub/alekseyz/lrscaf_debug/
Best,
Aleksey Zimin
Hi @shingocat ,
Two questions about lrscaf,
Can I use 'scaffolds' as input instead of 'contigs'? My draft assembly was generated with platanus-allee and has already been scaffolded by platanus-allee itself, but I want to try lrscaf to improve it. I know I could split the scaffolds into contigs, but the contigs would be very fragmented if I did so (possibly several million contigs), so I don't want to split the existing scaffolds.
If 'scaffolds' are acceptable to lrscaf, should I do gap closing first (13% N in my draft assembly) or run lrscaf first?
Best,
Kun