
lrscaf's People

Contributors

shingocat

lrscaf's Issues

LRScaf-1.1.8 stuck at "Finding repeats"

Hi!

I've been running LRScaf, and since yesterday it apparently hasn't done anything. The last line of the log is:
2020-01-04 17:27:40 [ INFO ] Finding repeat, erase time: 1948 ms
When using "top" the process appears with state S (interruptible sleep (waiting for an event to complete), https://idea.popcount.org/2012-12-11-linux-process-states/)

Any clue regarding this? How long should I expect this step to last?

The minimap alignment file is a 4.1 Gb file, the contigs file is a 4.3 Gb file, and the current memory usage of LRScaf seems to be ~16 Gb.

Composition of scaffolds ?

Hi,

The program worked and I got scaffolds at the end.

How can I get the information about which contigs of my assembly have been assembled into which scaffolds?
In other words: for one scaffold in particular, I would like to know the names of the contigs it contains.

thanks

nodePaths.info could not be created LRScaf-1.1.10

When I try using LRScaf v 1.1.10, I get a message 'The output file/mmjggl/nicodemus/.../nodePaths.info could not be created!'. The nodePaths.info file is actually present in the output folder, but it is empty. The scaffolds.fasta seems to get generated correctly.

When I run exactly the same command using v 1.1.9, nodePaths.info gets properly generated. It seems like the changes introduced into the file writer in the new version broke it. I'm fine with using v 1.1.9, but wanted to let you know.

Also related to this, how can I translate the digraph IDs used in nodePaths.info (e.g. G1), into scaffold IDs (e.g. Scaffolds_0) used in scaffolds.fasta? I need this to figure out what contigs from the initial assembly ended up in LRScaf scaffolds.

Thanks!

Nanopore alignment with PACBIO

Hi, I have a cleaned assembly of PacBio reads generated with Hifiasm. Error correction and polishing were performed using RACON and PILON. This gave me a good N50. However, when I aligned the cleaned assembly to the reference genome (plant), the scaffolds were still not extended.
After that, I aligned my cleaned assembly against the Nanopore reads (rawfile.fq) using minimap2. I have a .paf file as well as a .sam file from the alignment. Now I am not sure how to proceed with LRScaf. Can you please suggest how? I would appreciate it.

Thank you

Is <tips_length> a wrong parameter name

Hello!

Recently I ran LRScaf using the ScafConf.xml template, with default values documented in the README, and it reported:

2021-08-26 22:42:23 [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed!
2021-08-26 22:42:23 [ INFO ] agis.ps.file.XMLParser The para element contain illeage item tips_length. it will be omitted!
......

I also notice that in the README -> Parameters of LRScaf, the parameter is not spelled <tips_length>:

tip_length The maximum tip length. Default: <1500> bp.

So I suggest that the template ScafConf.xml and the example XML file in README.md be made consistent with the spelling in the source code :)

More duplication ?

Hello,

Thanks for your software. I used LRScaf to scaffold my contig assembly. Here is the busco before and after LRScaf:

before: C:95.4%[S:94.7%,D:0.7%],F:1.3%,M:3.3%,n:1367
after: C:95.8%[S:91.4%,D:4.4%],F:1.2%,M:3.0%,n:1367

How can we explain that there are more duplicated BUSCOs afterwards?

Thanks for your help.
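
To put the two BUSCO lines above in absolute terms, here is a minimal Python sketch that parses the one-line summaries and converts the percentages into approximate gene counts. The function names `parse_busco` and `approx_count` are made up for this illustration, and the counts are only rounded estimates derived from the percentages and n, not values reported by BUSCO.

```python
import re

def parse_busco(summary: str) -> dict:
    """Parse a BUSCO one-line summary into percentages and the total n."""
    pattern = (r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],"
               r"F:([\d.]+)%,M:([\d.]+)%,n:(\d+)")
    m = re.search(pattern, summary)
    if m is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    c, s, d, f, missing = (float(x) for x in m.groups()[:5])
    return {"C": c, "S": s, "D": d, "F": f, "M": missing, "n": int(m.group(6))}

def approx_count(stats: dict, key: str) -> int:
    """Approximate number of BUSCOs in a category (rounded from the percentage)."""
    return round(stats[key] / 100 * stats["n"])

# The two summaries quoted in the post.
before = parse_busco("C:95.4%[S:94.7%,D:0.7%],F:1.3%,M:3.3%,n:1367")
after = parse_busco("C:95.8%[S:91.4%,D:4.4%],F:1.2%,M:3.0%,n:1367")

dup_before = approx_count(before, "D")
dup_after = approx_count(after, "D")
print(f"duplicated BUSCOs: about {dup_before} before, about {dup_after} after")
```

With these inputs the duplicated category goes from roughly 10 to roughly 60 genes out of 1367.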

Ns after Scaffolding contigs containing no Ns

After scaffolding a contigs fasta (containing no Ns), the output scaffolds.fasta file contained Ns, even though the input long reads contained no Ns.

If two contigs are being linked by a long read, then in the resulting scaffold, is the sequence between the two contigs copied over from the long read, or is it a stretch of Ns?
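
To check where the Ns sit in an output such as scaffolds.fasta, a small Python sketch like the following can list every run of Ns per sequence. It is only an inspection aid: the helper names `read_fasta` and `n_runs` and the demo records are invented for illustration, and it says nothing about how LRScaf decides what to place between two contigs.

```python
import re
from typing import Iterable, Iterator, List, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def n_runs(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for every run of N/n of at least min_len."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]

# Hypothetical demo records; with a real file, pass open("scaffolds.fasta").
demo = [">Scaffolds_0 example", "ACGTNNNN", "NNACGT", ">Scaffolds_1", "ACGT"]
gaps = {name: n_runs(seq) for name, seq in read_fasta(demo)}
print(gaps)
```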

Long stretches of homopolymers

Hi,

The software works pretty well, but for one of the genomes I'm working on, it is generating very long stretches of homopolymers, mainly Ts and As. It does that in many parts of the genome. To me this is not biological but due to the PacBio reads. How can I avoid it?

Thanks
F
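
To quantify the stretches described above, one option is to scan the scaffold sequences for long single-base runs. The sketch below is a generic regular-expression scan, not part of LRScaf; the name `homopolymer_runs` and the default threshold of 20 bases are arbitrary choices for illustration.

```python
import re
from typing import List, Tuple

def homopolymer_runs(seq: str, min_len: int = 20) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, length) for single-base runs >= min_len."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]

# Hypothetical sequence: a 25-base T run is reported, a 6-base A run is not.
example = "ACG" + "T" * 25 + "GCA" + "A" * 5
runs = homopolymer_runs(example)
print(runs)
```

Comparing the reported positions with the contig boundaries in the scaffolds would show whether the runs fall inside the original contigs or in the joined regions.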

error following tutorial

Hi,

I followed your tutorial mapping the long reads to the assembly using minimap2. Then I ran:

java -jar ../lrscaf/target/LRScaf-1.1.7.jar -c ../gelada_10x_split.fa -a ../aln.mm -t mm -o ./lrscaf_out

And I got this error:
2019-11-22 11:17:30 [ ERROR ] agis.ps.file.MMReader 5
2019-11-22 12:18:08 [ ERROR ] agis.ps.Main The aligned file could not be null!

The log looks like this:
2019-11-22 12:27:37 [ WARN ] The identity for minimap mapper would be setted to 0.1!
2019-11-22 12:27:37 [ INFO ] Launching...
2019-11-22 12:27:37 [ INFO ] Build output folder successfully!
2019-11-22 12:27:37 [ INFO ] Building output folder, erase time: 1 ms
2019-11-22 12:33:53 [ INFO ] Reading contigs, erase times: 375239 ms
2019-11-22 12:46:27 [ INFO ] Valid Aligned Records: 7380589
2019-11-22 12:46:27 [ INFO ] Reading Aligned Records, erase time: 753695 ms
2019-11-22 12:46:28 [ INFO ] Finding Repeats:
2019-11-22 12:46:28 [ INFO ] MIN: 1.0
2019-11-22 12:46:28 [ INFO ] First Quartile: 4.0
2019-11-22 12:46:28 [ INFO ] Median cov = 9.0
2019-11-22 12:46:28 [ INFO ] Third Quartile: 14.0
2019-11-22 12:46:28 [ INFO ] MAX: 19899.0
2019-11-22 12:46:28 [ INFO ] Interquartile Range: 10.0
2019-11-22 12:46:28 [ INFO ] 1.5's IQR , Outlier Threshold: 29.0
2019-11-22 12:46:28 [ INFO ] Repeat count: 14875
2019-11-22 12:46:28 [ INFO ] Finding repeat, erase time: 805 ms

Launching error and empty edges

Hi,

I have a quickmerge fasta file from Hifiasm and IPA assemblies of a plant genome. I then polished the fasta file using RACON.

I tried the options below and am getting the error described below.

java -jar /rhome/khushwas/lrscaf/target/LRScaf-1.1.12.jar -c merged_prefix.fasta -a merged.prefix.fasta.sam -t merged.prefix.fasta.sam -micl 500 -mioll 170 -mmcm 8 -o /path/to/folder/

merged_prefix.fasta - fasta file generated by quickmerge from two long-read assemblies
merged.prefix.fasta.sam - Minimap2 - sam file

2021-06-17 22:02:18 [ INFO ] Launching...
2021-06-17 22:02:18 [ INFO ] The output folder existed!
2021-06-17 22:02:18 [ INFO ] Building output folder, elapsed time: 0 ms
2021-06-17 22:03:03 [ INFO ] Reading contigs, elapsed times: 45949 ms
2021-06-17 22:03:04 [ INFO ] The aligned parameter should be not set! only , , or
2021-06-17 22:03:04 [ ERROR ] PathBuilder : The Edges could not be empty!
2021-06-17 22:03:04 [ INFO ] Ending...
2021-06-17 22:03:04 [ INFO ] Scaffolding elapsed time: 46 s.

I do not know whether my LRScaf command is correct. Please advise.

Thank you

Troubleshooting errors

Hi!

I'm making a first pass at using LRScaf and having trouble making sense of the failed output. LRScaf exit status was 0 but I'm not seeing a hybrid assembly - and like others here - but maybe not identical - I am also getting the edges could not be empty error - and an error in the MMReader input string.

I have a very poor Discovar assembly for the pygmy octopus genome (over 6 million scaffolds) and a first pass of MinION reads (150,000 reads). The genome is around human size - and I estimate MinION coverage is about 0.68x - Illumina coverage is around 80x going into the Discovar assembly but the assembly did not turn out well. I used minimap2 with default settings.

logs/error.log from LRScaf:

2019-01-12 15:50:26 [ main:466447 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:163)] - [ ERROR ] agis.ps.file.MMReader For input string: "collected"
2019-01-12 15:50:26 [ main:466475 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - ### [ ERROR ] PathBuilder : The Edges could not be empty!

logs/logs.log from LRScaf:

2019-01-12 15:42:40 [ main:0 ] - [agis.ps.Main.main(Main.java:48)] - [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed
2019-01-12 15:42:40 [ main:58 ] - [agis.ps.file.XMLParser.parseXML(XMLParser.java:182)] - [ INFO ] agis.ps.file.XMLParser The para element contain illeage item tips_length. it will be omitted!
2019-01-12 15:42:40 [ main:59 ] - [agis.ps.Main.main(Main.java:56)] - [ INFO ] Launching...
2019-01-12 15:42:40 [ main:70 ] - [agis.ps.file.OutputFolderBuilder.building(OutputFolderBuilder.java:42)] - [ INFO ] Build output folder successfully!
2019-01-12 15:42:40 [ main:71 ] - [agis.ps.file.OutputFolderBuilder.building(OutputFolderBuilder.java:52)] - [ INFO ] Building output folder, erase time: 8 ms
2019-01-12 15:50:24 [ main:464505 ] - [agis.ps.file.ContigReader.read(ContigReader.java:149)] - [ INFO ] Reading contigs, erase times: 464429 ms
2019-01-12 15:50:26 [ main:466447 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:163)] - [ ERROR ] agis.ps.file.MMReader For input string: "collected"
2019-01-12 15:50:26 [ main:466450 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:176)] - [ INFO ] Valid Aligned Records: 0
2019-01-12 15:50:26 [ main:466450 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:177)] - [ INFO ] Reading Aligned Records, erase time: 19 ms
2019-01-12 15:50:26 [ main:466451 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:123)] - [ INFO ] Repeat count: 0
2019-01-12 15:50:26 [ main:466451 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:125)] - [ INFO ] Finding repeat, erase time: 0 ms
2019-01-12 15:50:26 [ main:466453 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:93)] - [ INFO ] Valid Links Acount: 0
2019-01-12 15:50:26 [ main:466453 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:94)] - [ INFO ] Building Link, erase time : 0 ms
2019-01-12 15:50:26 [ main:466475 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - [ ERROR ] PathBuilder : The Edges could not be empty!
2019-01-12 15:50:26 [ main:466476 ] - [agis.ps.Main.main(Main.java:59)] - [ INFO ] Ending...
2019-01-12 15:50:26 [ main:466476 ] - [agis.ps.Main.main(Main.java:61)] - [ INFO ] Scaffolding erase time: 466 s.

files in the output folder are largely empty:

[eedsinger@barhal minimap2-ee20190111-1]$ ls -1la 5-lrscaff-output/
total 15
drwxr-xr-x 2 eedsinger eedsinger 5 Jan 12 15:50 .
drwxrwxr-x 4 eedsinger eedsinger 14 Jan 12 15:50 ..
-rw-r--r-- 1 eedsinger eedsinger 189 Jan 12 15:50 draft_summary.info
-rw-r--r-- 1 eedsinger eedsinger 0 Jan 12 15:50 links.info
-rw-r--r-- 1 eedsinger eedsinger 0 Jan 12 15:50 triadlinks.info

configure.xml given to LRScaf:
See attached.

Any suggestions would be greatly appreciated!

Thank-you very much,
Eric
configure.txt

Reading gzipped PAFs

Dear developers,

Quick question. I wonder if the program accepts gzipped alignment files.

Cheers!

/Andreas
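
The thread does not say whether LRScaf itself reads gzipped input. If it turns out not to, a compressed PAF can be decompressed beforehand; the Python sketch below shows one way to open a file transparently whether or not it is gzipped, by checking the two gzip magic bytes. The helper name `open_maybe_gzip` is invented for this example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_maybe_gzip(path: str):
    """Open a text file for reading, decompressing it if it is gzipped."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")

def count_records(path: str) -> int:
    """Count non-empty lines, e.g. alignment records in a PAF file."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if line.strip())
```

Detecting the format from the file content rather than from the `.gz` suffix keeps the helper working when files have been renamed.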

Inability to parse iqrt parameter

Hi @shingocat,

I have been able to successfully run lrscaf but am now encountering issues when iterating through different values for specific parameters. I am using a bash script to iterate through various values for specific parameters (ie: mioll, iqrt, mmcm) to assess which combination is optimal.

contents of .sh script:
#!/bin/bash
counter=0
declare -a mioll=("320" "960")
declare -a tl=("1000" "10000")
declare -a iqrt=("1.5" "3")
declare -a mmcm=("5" "10")
for i in "${mioll[@]}"
do
  for i in "${tl[@]}"
  do
    for i in "${iqrt[@]}"
    do
      for i in "${mmcm[@]}"
      do
        let counter++
        java -jar /pickett/software/lrscaf/LRScaf-1.1.11.jar \
          -p 20 \
          -c /path/to/Scaffolds.fasta \
          -a /path/to/mapped.paf \
          -t mm \
          -mioll $mioll \
          -tl $tl \
          -iqrt $iqrt \
          -mmcm $mmcm \
          -o ./2_15_2021/$counter/output_species
      done
    done
  done
done

The error I am getting is:
2021-02-15 14:08:51 [ ERROR ] Error:
java.lang.NullPointerException: Cannot invoke "String.trim()" because "in" is null
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.base/java.lang.Double.parseDouble(Double.java:549)
at java.base/java.lang.Double.valueOf(Double.java:512)
at agis.ps.Main.parsering(Main.java:291)
at agis.ps.Main.main(Main.java:48)
2021-02-15 14:08:51 [ ERROR ] agis.ps.Main Cannot invoke "String.trim()" because "in" is null

When I remove the variable iqrt and let it run default, the error goes away and lrscaf successfully runs. I tested out my script with just one value for each parameter to see if iqrt was having an issue with decimal (1.5) vs whole (3) numbers but regardless, the same error occurs.

Please let me know of any additional information you may need to help with this. Thank you!
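
One thing worth noting about the script in this post, independent of the reported exception: all four loops reuse the variable `i`, and `$mioll`, `$tl`, `$iqrt` and `$mmcm` refer to the arrays without an index, which in bash expands to the first element only. Every iteration would therefore pass the same values. Whether this has anything to do with the NullPointerException is not established in the thread. The Python sketch below shows one way to enumerate the full grid; the flag names are copied from the post and have not been checked against LRScaf's documentation, and `build_commands` is a made-up helper that only builds argument lists without running anything.

```python
from itertools import product
from typing import Dict, List

def build_commands(jar: str, contigs: str, paf: str,
                   grid: Dict[str, List[str]], out_root: str) -> List[List[str]]:
    """Build one argument list per combination of parameter values."""
    names = list(grid)
    commands = []
    combos = product(*(grid[name] for name in names))
    for counter, values in enumerate(combos, start=1):
        cmd = ["java", "-jar", jar, "-p", "20",
               "-c", contigs, "-a", paf, "-t", "mm"]
        for name, value in zip(names, values):
            cmd += [f"-{name}", str(value)]  # flag spelling taken from the post
        cmd += ["-o", f"{out_root}/{counter}/output_species"]
        commands.append(cmd)
    return commands

# Value lists copied from the script in the post.
grid = {
    "mioll": ["320", "960"],
    "tl": ["1000", "10000"],
    "iqrt": ["1.5", "3"],
    "mmcm": ["5", "10"],
}
commands = build_commands("/pickett/software/lrscaf/LRScaf-1.1.11.jar",
                          "/path/to/Scaffolds.fasta", "/path/to/mapped.paf",
                          grid, "./2_15_2021")
for cmd in commands[:2]:
    print(" ".join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed.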

lrscaf throws an error with ont mapped reads from minimap2

Hi there-- thanks for the software!

I'm running into an error when I map ONT reads using minimap2 using the appropriate options. i.e. if i prepare my alignment as

minimap2 -t 24 -ax map-ont -o out.mm ref.fa reads.fastq.gz

lrscaf throws an error like

2021-12-02 20:06:17  [ ERROR ]  agis.ps.file.MMReader	On lines: 1	For input string: "ptg000088l"
2021-12-02 20:06:18  [ ERROR ]  PathBuilder : The Edges could not be empty!

however if I prepare my alignment without the -ax map-ont flag i.e.,

minimap2 -t 24  -o out.mm ref.fa reads.fastq.gz

then in that case lrscaf runs without problem. i believe others have seen this in closed issues e.g., #17

thanks for your help
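
For context on the two commands above: minimap2 writes SAM when `-a` is given (as in `-ax map-ont`) and PAF otherwise, so the first command produces a SAM file even though it is named out.mm. A quick way to check what a file actually contains before handing it to a scaffolder is to look at its first line. The Python sketch below is a rough heuristic under that assumption; `sniff_alignment_format` is a made-up name, and the check says nothing about how LRScaf's own readers work.

```python
def sniff_alignment_format(first_line: str) -> str:
    """Guess 'sam', 'paf' or 'unknown' from the first line of a file."""
    line = first_line.rstrip("\n")
    if line.startswith("@"):
        return "sam"  # SAM header lines such as @HD, @SQ, @PG
    fields = line.split("\t")
    # PAF: 12 mandatory columns, strand in column 5, the other
    # non-name columns are integers.
    if len(fields) >= 12 and fields[4] in ("+", "-"):
        numeric = [fields[i] for i in (1, 2, 3, 6, 7, 8, 9, 10, 11)]
        if all(value.isdigit() for value in numeric):
            return "paf"
    # Headerless SAM record: 11 mandatory columns with integer FLAG, POS, MAPQ.
    if len(fields) >= 11 and all(fields[i].isdigit() for i in (1, 3, 4)):
        return "sam"
    return "unknown"

def sniff_file(path: str) -> str:
    """Apply the heuristic to the first non-empty line of a file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return sniff_alignment_format(line)
    return "unknown"
```

If the result is `sam` for a file passed with `-t mm`, regenerating the alignment without `-a` (keeping `-x map-ont`) would be the first thing to try.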

might be inconsistent log messages?

Hi~

I have tried to run LRScaf using nanopore long reads and an assembly file, and with default parameters in ScafConf.xml.
java -jar ~/lrscaf-master/target/LRScaf-1.1.12.jar -x ScafConf.xml
The messages I got are similar to this post, #13:

$ cat logs/error.log
2021-08-26 22:40:34 [ ERROR ] Error:
org.apache.commons.cli.MissingArgumentException: The aligned file could not be null!
at agis.ps.Main.parsering(Main.java:161)
at agis.ps.Main.main(Main.java:48)
2021-08-26 22:40:34 [ ERROR ] agis.ps.Main The aligned file could not be null!

$ cat logs/logs.log
2021-08-26 22:42:23 [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed!
2021-08-26 22:42:24 [ INFO ] Launching...
2021-08-26 22:42:24 [ INFO ] Build output folder successfully!
2021-08-26 22:42:24 [ INFO ] Building output folder, elapsed time: 2 ms
2021-08-26 22:43:14 [ INFO ] Reading contigs, elapsed times: 49913 ms
2021-08-26 22:47:51 [ INFO ] Valid Aligned Records: 756966
2021-08-26 22:47:51 [ INFO ] Reading Aligned Records, elapsed time: 277126 ms
2021-08-26 22:47:52 [ INFO ] Finding Repeats:
2021-08-26 22:47:52 [ INFO ] MIN: 1.0
2021-08-26 22:47:52 [ INFO ] First Quartile: 4.0
2021-08-26 22:47:52 [ INFO ] Median cov = 12.0
2021-08-26 22:47:52 [ INFO ] Third Quartile: 17.0
2021-08-26 22:47:52 [ INFO ] MAX: 41645.0
2021-08-26 22:47:52 [ INFO ] Interquartile Range: 13.0
2021-08-26 22:47:52 [ INFO ] 1.5's IQR , Outlier Threshold: 36.5
2021-08-26 22:47:52 [ INFO ] Repeat count: 31624
2021-08-26 22:47:52 [ INFO ] Finding repeat, elapsed time: 528 ms

The error.log says "Error: The aligned file could not be null!", but logs.log says "Valid Aligned Records: 756966", so I think these two logs can easily confuse first-time users about how the program is actually running.
I wish the log files reported the real situation (whether the software is running properly). Thank you.
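
As a side note on reading the "Finding Repeats" block in this log: the printed numbers are consistent with the usual upper outlier fence, third quartile plus 1.5 times the interquartile range (17.0 + 1.5 × 13.0 = 36.5). The same relation holds for the other full log in this listing (14.0 + 1.5 × 10.0 = 29.0). This is inferred from the logged values only, not from LRScaf's source. A minimal Python sketch of the arithmetic, with made-up helper names and contig coverages:

```python
from typing import Dict, List

def outlier_threshold(q1: float, q3: float, k: float = 1.5) -> float:
    """Upper fence: third quartile plus k times the interquartile range."""
    return q3 + k * (q3 - q1)

def flag_high_coverage(coverage: Dict[str, float], q1: float, q3: float,
                       k: float = 1.5) -> List[str]:
    """Names whose coverage exceeds the upper fence (assumed repeat candidates)."""
    limit = outlier_threshold(q1, q3, k)
    return sorted(name for name, cov in coverage.items() if cov > limit)

# Quartiles as printed in the two logs.
threshold_this_log = outlier_threshold(4.0, 17.0)
threshold_other_log = outlier_threshold(4.0, 14.0)

# Hypothetical coverages, only to show the filter.
flagged = flag_high_coverage({"ctgA": 12.0, "ctgB": 41645.0, "ctgC": 36.5},
                             q1=4.0, q3=17.0)
print(threshold_this_log, threshold_other_log, flagged)
```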

Error PathBuilder

Hi there,

I am using Minimap2 (paf) file as the input (aligned file) to run lrscaf.
I got these errors
2019-03-12 14:40:44 [ ERROR ] agis.ps.file.MMReader For input string: "SN:scaffold1_cov25"
2019-03-12 14:40:44 [ ERROR ] PathBuilder : The Edges could not be empty!

It will be appreciated if you can let me know how to fix this problem.

Thank you,
Mei

What does mmcm mean?

mmcm is introduced as: "The parameter to filter invalid Minimap alignments. Default: <8>. Only for Minimap alignment." By what criterion does LRScaf filter the alignment results? Mapping quality score?

Here is the definition of PAF files:

https://github.com/lh3/miniasm/blob/master/PAF.md

The 8th column is "Target start on original strand", and I wonder how LRScaf could filter alignments by this column. Or should it be the last mandatory column, i.e. the 12th column of a 12-column PAF file, "Mapping quality"?

If so, what is the threshold for this value? >0? >10? Or >30? Could you please give some more details?
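
For readers following the column discussion, the sketch below parses a PAF line into its 12 mandatory columns plus the optional tags, using the record quoted in another issue in this listing (converted to tab-separated form). The field names follow the PAF specification linked above; `PafRecord` and `parse_paf_line` are names invented for this example. It does not settle which field `-mmcm` filters on: the quoted record simply carries both a mapping quality (column 12) and a `cm` tag, which minimap2 documents as the number of minimizers on the chain.

```python
from typing import Dict, NamedTuple, Union

TagValue = Union[int, float, str]

class PafRecord(NamedTuple):
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int   # column 8
    target_end: int
    residue_matches: int
    block_length: int
    mapping_quality: int  # column 12
    tags: Dict[str, TagValue]

def parse_paf_line(line: str) -> PafRecord:
    """Split one tab-separated PAF line into named, typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    tags: Dict[str, TagValue] = {}
    for item in fields[12:]:
        key, kind, value = item.split(":", 2)  # e.g. cm:i:6
        tags[key] = int(value) if kind == "i" else float(value) if kind == "f" else value
    return PafRecord(fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
                     fields[4], fields[5], int(fields[6]), int(fields[7]),
                     int(fields[8]), int(fields[9]), int(fields[10]),
                     int(fields[11]), tags)

line = "\t".join([
    "1f3d91e5-29eb-4a70-a5dc-4a8027c831ff", "491", "44", "464", "-",
    "NODE_468_length_20536_cov_6283", "20536", "1158", "1603", "51", "445", "9",
    "tp:A:P", "cm:i:6", "s1:i:46", "s2:i:0", "dv:f:0.1666", "rl:i:0",
])
record = parse_paf_line(line)
print(record.target_start, record.mapping_quality, record.tags["cm"])
```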

LRScaf running slow

Hi

I am trying to use LRScaf with a draft assembly of an invertebrate species (1.3Gb) and ~30x coverage of NP reads. I used minimap to create the .paf file, but it seems LRScaf slowed down considerably after this message was printed in the log file: Original "Edges size: 155118.". The only thing that has changed during the last 2 days of running is that the triadlinks.info file became a bit bigger. Could you tell me whether this is normal and I should be patient, or whether something went wrong? I also copy my script here in case I made a mistake.

java -Xmx900G -jar LRScaf-1.1.6.jar -c Ef_wtdbg.ctg.fa -a test.paf -t mm -o scaff_test -p 16 --identity 0.1

Many Thanks

Szabolcs

[ ERROR ] agis.ps.graph.DirectedGraph

I used paf from minimap2 and got this error.
2019-03-13 18:08:37 [ ERROR ] agis.ps.graph.DirectedGraph null

Although there were outputs generated, what does this error mean and how to fix it?

Thank you,
Mei

error while running LRscaff

Hello,
I am trying LRScaf with a set of PacBio reads (corrected and uncorrected). I mapped these reads to the genome with minimap2 (v2.17). But when running LRScaf, I got the following error:

2020-11-30 15:39:05 [ ERROR ] agis.ps.file.M4Reader On lines: 1 For input string: "xfSc0000457"
2020-11-30 15:39:05 [ ERROR ] PathBuilder : The Edges could not be empty!

It looks like the tool is not able to read the aln file. Do you know why?

Thanks

args error

when I use like this:
java -jar target//LRScaf-1.1.2.jar --contig draft.fa -a test.m4 -t m4 --identity 0.8 --miniSupLinks 1 --output ,
get this error:
2018-07-30 13:52:24 [ main:0 ] - [agis.ps.Main.main(Main.java:67)] - [ ERROR ] agis.ps.Main The option 't' was specified but an option from this group has already been selected: 'a'

agis.ps.file.MMReader and PathBuilder ? errors

Hi,

I have tried to run LRScaf using a genome assembly and a minimap2 alignment of PacBio reads (±10X coverage) and I got the following error:

2020-06-13 20:55:24 [ INFO ] Parsing the xml configure, all the other parameters set by command line will dismissed!
2020-06-13 20:55:24 [ INFO ] agis.ps.file.XMLParser The para element contain illeage item tips_length. it will be omitted!
2020-06-13 20:55:24 [ INFO ] Launching...
2020-06-13 20:55:24 [ INFO ] Build output folder successfully!
2020-06-13 20:55:24 [ INFO ] Building output folder, erase time: 1 ms
2020-06-13 20:55:57 [ INFO ] Reading contigs, erase times: 32725 ms
2020-06-13 20:55:57 [ ERROR ] agis.ps.file.MMReader On lines: 1 For input string: "collected"
2020-06-13 20:55:57 [ INFO ] Valid Aligned Records: 0
2020-06-13 20:55:57 [ INFO ] Reading Aligned Records, erase time: 9 ms
2020-06-13 20:55:57 [ INFO ] Repeat count: 0
2020-06-13 20:55:57 [ INFO ] Finding repeat, erase time: 0 ms
2020-06-13 20:55:57 [ INFO ] Valid Links Acount: 0
2020-06-13 20:55:57 [ INFO ] Building Link, erase time : 13 ms
2020-06-13 20:55:57 [ ERROR ] PathBuilder ? The Edges could not be empty!
2020-06-13 20:55:57 [ INFO ] Ending...
2020-06-13 20:55:57 [ INFO ] Scaffolding erase time: 32 s.

I am wondering what those errors mean.

I generated the minimap2 mapping file with the following command:

minimap2 -t 8 Tniger_Hi-C-scaffolding_HiRise_filtered_v2.0.fasta Tniger_PacBio-reads.fasta > Tniger_PacBio-reads.minimap2.mm

and I executed LRScaf with the following command:

java -Xms80G -Xmx80G -jar /usr/local/src/lrscaf/target/LRScaf-1.1.9.jar -c Tniger_Hi-C-scaffolding_HiRise_filtered_v2.0.fasta -a Tniger_PacBio-reads.minimap2.mm -t mm -i 0.1 -mmcm 8 -p 8 -misl 1 -o Tniger-LRScaf

I used the .XML config file and got the same errors as above.

I am now running blasr to get the alignment file, thinking the minimap2 alignment was wrong, but I am not sure about it.

Any help/advice, I would really appreciate it.

Parameters for contiguous assemblies

Hi,

I have assembled a genome from pacbio CLR reads, and the assembly is very contiguous (Span=407Mb, N50=12Mb). I want to try LRScaf to see whether there are any contigs that could be joined further. I realise that improvement will be limited, but it is possible that smaller overlaps (<1kb) will have been ignored in assembly and so the assembly could be improved by LRScaf.

Because some of the contigs in the assembly are very long, I thought it best to set -mioll 300 -miolr 1e-100, as even a 1kb overlap would be a tiny fraction of a 20Mb contig and so would be removed if -miolr 0.8.

There are some other parameters that I do not understand, though, and so I am not sure if/how to change them. In particular, what is the overhang length of a contig, and what is the end length of a read? Should I change these parameters (-maohl -maohr -mael -maer), and are there any others I should consider changing?

Best wishes,

Alex

about gaps

Hi! I want to know how lrscaf deals with gaps: does it fill them with Ns, or does it use the sequence from the long reads?

Output and parameters

Hi @shingocat

I have a few questions about the output files and the program in general:

  1. In the file contig_coverage.info, some contigs are missing even though they are present in the final scaffolds. What does that mean?

  2. Some of my contigs are also absent from the final scaffold assembly (I grepped for the contig in nodePaths.info and it isn't there). Even if a contig can't be scaffolded, I would expect a single scaffold composed of that one contig. Why are these contigs not present at the end? (I can see a mapping result for them in the minimap2 file.)

  3. What does the file repeat.contigs contain?

  4. I am not sure I understand the min_supported_links parameter: if it is set to 1, does it mean that 1 PacBio read spanning 2 contigs is enough for scaffolding? What is the optimal min_supported_links value for 25X of PacBio reads?

  5. Do you gap-fill the final assembly, or is only scaffolding performed?

Contig used more than once ?

Hi, we found some contigs that were used more than once in the scaffolding. For instance ctg015357 which is in scaffolds G824 and G7782.

cat nodePaths.info

digraph G824{
ctg038297->ctg044989->ctg015357->ctg028040->ctg034934->ctg075623->ctg055892->ctg000373;

digraph G7782{
ctg014392->ctg015357;


How is it possible and what is your interpretation about that ?

Thanks.
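
To list every contig that appears in more than one path, the excerpt above can be parsed directly. The Python sketch below assumes nodePaths.info always looks like the excerpt (a `digraph <ID>{` line followed by contig names joined by `->`); that is an assumption based on this post only, and the helper names are invented. It does not map digraph IDs such as G824 to scaffold names in scaffolds.fasta, which the thread leaves unanswered.

```python
import re
from collections import defaultdict
from typing import Dict, List

def parse_node_paths(text: str) -> Dict[str, List[str]]:
    """Map each digraph ID to the ordered list of contigs on its path."""
    paths: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"digraph\s+(\S+?)\s*\{", line)
        if header:
            current = header.group(1)
            paths[current] = []
            continue
        if current is None or not line or line == "}":
            continue
        names = [n.strip() for n in line.rstrip(";}").split("->") if n.strip()]
        paths[current].extend(names)
    return paths

def shared_contigs(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Contigs that occur in more than one path, with the IDs of those paths."""
    seen = defaultdict(list)
    for path_id, contigs in paths.items():
        for contig in contigs:
            if path_id not in seen[contig]:
                seen[contig].append(path_id)
    return {c: ids for c, ids in seen.items() if len(ids) > 1}

excerpt = """digraph G824{
ctg038297->ctg044989->ctg015357->ctg028040->ctg034934->ctg075623->ctg055892->ctg000373;

digraph G7782{
ctg014392->ctg015357;
"""
paths = parse_node_paths(excerpt)
shared = shared_contigs(paths)
print(shared)
```

The same `paths` dictionary also answers the reverse question raised in other issues here, namely which contigs ended up on a given path.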

agis.ps.file.M4Reader error and PathBuilder error

I encountered issues like a few others reported before. I tried to use the various changes suggested by everybody in previously resolved issues, but kept getting the same error.

This is what I did:
minimap2 -t 6 FW4911_contigsrenamed.fa ../FW4911nanopore_assemblies/FW4911nanoporeassembly1.fq -o ./FW4911spades_NPaln.mm
lrscaf -x FW4911ScafConf0.xml
(lrscaf was an alias: lrscaf="java -Xms100g -Xmx100g -jar /Path/to/LRScaf-1.1.11.jar")

These are the two errors I got:
[ ERROR ] agis.ps.file.M4Reader On lines: 1 For input string: "NODE_468_length_20536_cov_6283"
PathBuilder : The Edges could not be empty!

I took a look at the first line of the .mm file, it is as below:
1f3d91e5-29eb-4a70-a5dc-4a8027c831ff 491 44 464 - NODE_468_length_20536_cov_6283 20536 1158 1603 51 445 9 tp:A:P cm:i:6 s1:i:46 s2:i:0 dv:f:0.1666 rl:i:0

Thank you in advance for your help!

[ ERROR ] PathBuilder : The Edges could not be empty

hello!
i have the problem, that after some minutes LRScaf stops due to the error: "PathBuilder : The Edges could not be empty"

this is the command i use:
$ java -jar /home/programs/LRScaf/LRScaf-1.1.9.jar -c ../KDH -a ../longreads_against_kdh.paf -t mm -i 0.1 -o .

this is the output i get:
2020-04-30 18:18:55 [ INFO ] Launching...
2020-04-30 18:18:55 [ INFO ] The output folder existed!
2020-04-30 18:18:55 [ INFO ] Building output folder, erase time: 0 ms
2020-04-30 18:19:15 [ INFO ] Reading contigs, erase times: 19442 ms
2020-04-30 18:19:15 [ INFO ] The output file /binfl/LRScaff_output/draft_summary.info existed. It will overwrite.
2020-04-30 18:20:11 [ INFO ] Valid Aligned Records: 0
2020-04-30 18:20:11 [ INFO ] Reading Aligned Records, erase time: 55991 ms
2020-04-30 18:20:11 [ INFO ] Repeat count: 0
2020-04-30 18:20:11 [ INFO ] Finding repeat, erase time: 0 ms
2020-04-30 18:20:11 [ INFO ] Valid Links Acount: 0
2020-04-30 18:20:11 [ INFO ] Building Link, erase time : 7 ms
2020-04-30 18:20:11 [ INFO ] The output file /binfl/LRScaff_output/links.info existed. It will overwrite.
2020-04-30 18:20:11 [ INFO ] The output file /binfl/LRScaff_output/triadlinks.info existed. It will overwrite.
2020-04-30 18:20:11 [ ERROR ] PathBuilder : The Edges could not be empty!
2020-04-30 18:20:11 [ INFO ] Ending...
2020-04-30 18:20:11 [ INFO ] Scaffolding erase time: 75 s.

the alignment was done with minimap2
minimap2 -ax map-ont -t 16 ../KDH /project/ALL_reads.fasta > longreads_against_kdh.sam
paftools sam2paf longreads_against_kdh.sam > longreads_against_kdh.paf

even though i looked into the other tickets, i'm not sure what the problem is in this case.
i'm not very experienced so it would be really great if you could help me.

error of The Edges could not be empty

I gave lrscaf the sam (pbalign) and m4 (blasr) alignment files, through both the command line and the XML; both give me this stderr output:

2018-08-01 08:40:06  [ main:246434 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:176)] - [ INFO ]  Valid Aligned Records: 0
2018-08-01 08:40:06  [ main:246435 ] - [agis.ps.file.AlignmentFileReader.read(AlignmentFileReader.java:177)] - [ INFO ]  Reading Aligned Records, erase time: 13657 ms
2018-08-01 08:40:06  [ main:246436 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:123)] - [ INFO ]  Repeat count: 0
2018-08-01 08:40:06  [ main:246437 ] - [agis.ps.util.RepeatFinder.findRepeats(RepeatFinder.java:125)] - [ INFO ]  Finding repeat, erase time: 1 ms
2018-08-01 08:40:06  [ main:246438 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:93)] - [ INFO ]  Valid Links Acount: 0
2018-08-01 08:40:06  [ main:246438 ] - [agis.ps.util.LinkBuilder.mRecords2Links(LinkBuilder.java:94)] - [ INFO ]  Building Link, erase time : 0 ms
2018-08-01 08:40:06  [ main:246472 ] - [agis.ps.Scaffolder.scaffolding(Scaffolder.java:82)] - [ ERROR ]  PathBuilder : The Edges could not be empty!
2018-08-01 08:40:06  [ main:246473 ] - [agis.ps.Main.main(Main.java:59)] - [ INFO ]  Ending...
2018-08-01 08:40:06  [ main:246473 ] - [agis.ps.Main.main(Main.java:61)] - [ INFO ]  Scaffolding erase time: 246 s.

The links.info and triadlinks.info files are also empty.

Any help is much appreciated.
Thanks.

Null alignment file

I am trying to run LRScaf with the following commands:

minimap2 -ax map-ont -t 8 assembly.fasta ONT.fasta > align.mm
java -Xms200g -Xmx200g -jar LRScaf-1.1.10.jar -c assembly.fasta -a align.mm -t mm -o out

I am getting the following error:

2021-05-12 15:54:44 [ ERROR ] Error:
org.apache.commons.cli.MissingArgumentException: The type of aligned file could not be null!
at agis.ps.Main.parsering(Main.java:180)
at agis.ps.Main.main(Main.java:53)
2021-05-12 15:54:44 [ ERROR ] agis.ps.Main The type of aligned file could not be null!
2021-05-12 15:57:45 [ ERROR ] agis.ps.file.MMReader On lines: 1 For input string: "SN:contig_1"
2021-05-12 15:57:45 [ ERROR ] PathBuilder : The Edges could not be empty!
2021-05-12 16:00:23 [ ERROR ] agis.ps.file.M4Reader On lines: 1 3
2021-05-12 16:00:23 [ ERROR ] PathBuilder : The Edges could not be empty!
2021-05-12 16:02:45 [ ERROR ] PathBuilder : The Edges could not be empty!

Could you please help me with this error?

Thanks in advance!
Julia

duplicated contigs in scaffolds in human assembly. assembly size goes up to 3.24Gbp from 2.85Gbp

Hello,

I am the developer of MaSuRCA assembler. I am looking for a good long-read scaffolder and your paper had nice results. However, when I tried using your scaffolder on a human genome assembly produced by MaSuRCA with ~9Mbp N50 contig size (about 1200 contigs), I found that the scaffolder duplicated many contigs in the scaffolds, resulting in much bigger (3.24Gbp vs 2.85Gbp) final assembly size. This is not the correct behavior. Scaffolder should output about the same amount of sequence, give or take losses in merging contigs. Contigs should never be duplicated exactly unless there is a very good reason for it, and if that is done, then duplicates must be resolved by remapping the reads and re-doing consensus. I found that duplicated contigs were always on the ends of paths in nodePaths.info. My assembly, config xml, the paf output of minimap and lrscaf output are posted here:

ftp://ftp.ccb.jhu.edu/pub/alekseyz/lrscaf_debug/

Best,
Aleksey Zimin

Can lrscaf accept 'scaffolds' as input instead of 'contigs'?

Hi @shingocat ,

Two questions about lrscaf,

  1. Can I use 'scaffolds' as input instead of 'contigs'? My draft assembly was generated by platanus-allee and has already been scaffolded by platanus-allee itself, but I want to try lrscaf to improve it. I know I could split the scaffolds into contigs, but the contigs would be extremely fragmented if I did (possibly several million contigs), so I don't want to split the existing scaffolds.

  2. If 'scaffolds' are acceptable to lrscaf, should I close gaps first (13% N in my draft assembly) or run lrscaf first?

Best,
Kun
