aligngraph's People

Contributors

baoe

aligngraph's Issues

Parallelizing read grouping post alignment

Hi,

I'm running AlignGraph for one of my projects and it has been running for quite some time now. On closer inspection, I realized that the time-consuming step is where AlignGraph groups reads that map to reference contigs into separate files (tmp/_reads_genome* files). This step takes roughly four minutes per contig in my case. I have approximately 3000 contigs, which means AlignGraph will be at this stage for at least 200 hours. So I have a suggestion: could this step be parallelized? If AlignGraph could handle multiple instances of this sorting independently, I could use more threads and get past this step faster. I have at least ten reference-based assemblies to make and I would like this step not to be the rate-limiting one.

Thank you very much,
Ram
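The 200-hour figure in this report follows from simple arithmetic, and the same arithmetic gives a rough upper bound on what parallelizing could gain. The sketch below assumes ideal linear scaling across workers, which a real implementation would not fully reach; `estimated_hours` and its parameters are hypothetical names, not part of AlignGraph.

```python
def estimated_hours(num_contigs, minutes_per_contig, workers=1):
    """Wall-clock hours for a per-contig step, assuming perfect linear scaling."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return num_contigs * minutes_per_contig / 60 / workers


# Figures from the report: about 3000 contigs at roughly 4 minutes each.
print(estimated_hours(3000, 4))             # serial: 200.0
print(estimated_hours(3000, 4, workers=8))  # ideal 8-way split: 25.0
```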

A suggestion

Hi bao,
Thanks for your amazing software. AlignGraph is working well on my computer now, although locating the errors along the way was difficult. I strongly recommend improving the error reporting. For example, the messages at lines 4366 and 4374 of AlignGraph.cpp could be more specific.

best,
Zhu

distanceLow and insertLow parameter calculation

Hello,

I'm not sure what max{insert length - 1000, single read length} means for calculating the distanceLow parameter. Can you elaborate?

Also, I have several libraries with different insert sizes. How do I calculate it then?

And since I have several libraries, how do I calculate insertLow and insertHigh? The instruction "--insertLow, let insert length be min{I1, I2}; for --insertHigh, let insert length be max{I1, I2}" isn't much help, since I have no idea what it means.

Thanks for your help.

BLAT/PBLAT issue "Maximum single piece size (5000) exceeded"

Hello Bao,
I have assembled a de-novo genome and would like to align it to the reference genome of a close species using AlignGraph. So far so good. I run AlignGraph with the following command:

/home/bin/AlignGraph/AlignGraph/AlignGraph --read1 ../Start_fasta/Start_RawReads_FD.fasta --read2 ../Start_fasta/Start_RawReads_RD.fasta --contig ../../1_Short_Read_Assembly/MaSuRCA_1/CA/10-gapclose/genome.ctg.fasta --genome ../../../reference/assembly/ref_281_v5.0.softmasked_GCM.fa --distanceLow 100 --distanceHigh 1350 --extendedContig AlignGraph_1_extendedContigs.fa --remainingContig AlignGraph_1_remainingContigs.fa

This is a small summary of the input reads/genomes and their length distribution (AlignGraph_Issue.xlsx).

So far so good, until blat/pblat (I tested both) throws out the following error in the blat_doc.txt:

Maximum single piece size (5000) exceeded by query 1.1 of size (49814). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used.

I took the liberty of adding some lines to AlignGraph.cpp, so I know that this happened around line 3654 (AlignGraph.cpp) in the

"void * task1(void * arg)"

when

"command = "/home/bin/icebert-pblat-ed0ac17/pblat tmp/_genome." + itoa(chromosomeID) + ".fa tmp/_contigs.fa -noHead tmp/_contigs_genome." + itoa(chromosomeID) + ".psl -fastMap -threads=8 > blat_doc.txt 2> blat_doc.txt";"

is called.

Now, I understand that BLAT/PBLAT is struggling with aligning the "de-novo" contigs against the "reference" genome. Because some "de novo" contigs are >5000bp and blat/pblat requires them to be shorter than 5000bp (-fastMap flag to suppress gaps) this causes the error. Did I get it right?

Is the only possibility to split my own "de-novo" contigs to acceptable sizes, or does a workaround exist? I would like to retain the longer contigs, if possible. Else I would just proceed and split every contig longer than 5000bp into separate fasta entries.

Best regards,
Ale R.
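The fallback described in this report, splitting every contig longer than 5000 bp into separate FASTA entries, could look like the sketch below. It is not an AlignGraph feature: the function name and the `_partN` suffix are hypothetical, and the pieces lose the contiguity of the original contig, so they would be treated as independent sequences.

```python
def split_fasta_records(lines, max_len=5000):
    """Yield (name, sequence) pieces no longer than max_len bases.

    `lines` is any iterable of FASTA lines. Pieces of a split contig get a
    hypothetical "_partN" suffix so they can be traced back to the original.
    """
    def emit(name, chunks):
        seq = "".join(chunks)
        if len(seq) <= max_len:
            yield name, seq
            return
        for n, start in enumerate(range(0, len(seq), max_len), start=1):
            yield f"{name}_part{n}", seq[start:start + max_len]

    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield from emit(name, chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield from emit(name, chunks)


# Tiny demonstration with max_len=4 so the split is visible.
example = [">ctg1", "ACGTACGTAC", ">ctg2", "ACG"]
for name, seq in split_fasta_records(example, max_len=4):
    print(f">{name}\n{seq}")
```

Passing an open file handle as `lines` works the same way, since the function only iterates over lines.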

ps: illegal option -- f

Hi,

I get these messages while running AlignGraph:

(0) Alignment finished

CHROMOSOME 0:
(1) chromosome loaded
(2) contig alignment loaded
(3) read alignment loaded
(4) contigs extended
(5) contigs scaffolded
ps: illegal option -- f
usage: ps [-AaCcEefhjlMmrSTvwXx] [-O fmt | -o fmt] [-G gid[,gid...]]
[-u]
[-p pid[,pid...]] [-t tty[,tty...]] [-U user[,user...]]
ps [-L]

FINISHED for 3192 seconds (2064 seconds for alignment) :-)

What does this ps error mean and how can I get around it?

Best wishes,

Karsten

Segfault

I'm getting a segfault on a 1 TB RAM machine comparing two small bacterial genomes.

Here's the stdout:
cmd-> cat alignGraph.out
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
By Ergude Bao, CS Department, UC-Riverside. All Rights Reserved

(0) Alignment finished

CHROMOSOME 0:
(1) chromosome loaded
(2) contig alignment loaded

Here's the error:
[2]+ Segmentation fault /sgi/asmopt/src/AlignGraph/AlignGraph/AlignGraph --read1 all_R1.fasta --read2 all_R2.fasta --contig contigs.fasta --genome chromosome.fasta --distanceLow 1000 --distanceHigh 1000 --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa > alignGraph.out 2> alignGraph.err

Also, the number of processors for bowtie2 should not be hardcoded to 8. We have 64; the program should pick the maximum at runtime.
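Choosing the thread count at runtime rather than hardcoding 8 can be sketched as follows. This is an illustration in Python, not AlignGraph's C++ code; `bowtie2_thread_args` is a hypothetical helper. The only tool detail it relies on is that bowtie2 takes its thread count through `-p`.

```python
import os


def bowtie2_thread_args(requested=None):
    """Return the bowtie2 thread option as an argument list.

    With no explicit request, use the number of CPUs the OS reports
    (at least 1). An explicit request is clamped to the range 1..available.
    """
    available = os.cpu_count() or 1
    if requested is None:
        threads = available
    else:
        threads = max(1, min(requested, available))
    return ["-p", str(threads)]


print(bowtie2_thread_args())   # e.g. ['-p', '64'] on a 64-core machine
print(bowtie2_thread_args(8))  # at most 8 threads
```

On a shared cluster, `os.cpu_count()` reports the cores of the whole node rather than those granted to the job, so an explicit request is the safer choice there.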

alignGraph failed with error: INCONSISTENT PE FILES!

Hi, Bao,

I used the following command to run alignGraph:
AlignGraph --read1 Oka_S2_R1_002trim.fa
--read2 Oka_S2_R2_002trim.fa
--contig soap_contig.fa --genome oka_HiC_23chromOnly.fa
--distanceLow 150 --distanceHigh 1150
--extendedContig extendedContig.fa
--remainingContig remainingContig.fa --kMer 17
-insertVariation 100

I am also getting the error, INCONSISTENT PE FILES! But the input paired-end files contain the same number of sequences (checked with bioawk -cfastx 'END{print NR}' input_file). The input PE files were trimmed by bbMap quality trim module (using command bbduk.sh -Xmx1g ...)

What could be wrong here? Can AlignGraph work on untrimmed PE data, or does AlignGraph prefer data trimmed by Trimmomatic?

Thank you for your advice!
Lan
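What exactly triggers INCONSISTENT PE FILES! is not stated in this report; another report on this page traced the same message to FASTQ input. A minimal sketch of two basic checks, input format and record count, is given below. The function names are hypothetical and the checks are assumptions about what might matter, not a description of AlignGraph's own test.

```python
def summarize_reads(lines):
    """Return (format, record_count) for an iterable of read-file lines.

    The format guess uses only the first non-empty line: '>' suggests FASTA,
    '@' suggests FASTQ. Records are counted only for FASTA, by '>' headers.
    """
    fmt, count = "unknown", 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if fmt == "unknown":
            fmt = {">": "fasta", "@": "fastq"}.get(stripped[0], "other")
        if fmt == "fasta" and stripped.startswith(">"):
            count += 1
    return fmt, count


def check_pair(lines1, lines2):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    (fmt1, n1), (fmt2, n2) = summarize_reads(lines1), summarize_reads(lines2)
    for label, fmt in (("read1", fmt1), ("read2", fmt2)):
        if fmt != "fasta":
            problems.append(f"{label} looks like {fmt}, not FASTA")
    if fmt1 == fmt2 == "fasta" and n1 != n2:
        problems.append(f"record counts differ: {n1} vs {n2}")
    return problems
```

Both arguments can be open file handles, for example `check_pair(open(r1), open(r2))`.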

PE read files that are not one-to-one mapped??

I am getting the error, INCONSISTENT PE FILES! I am inputting paired-end read FASTA files, but there are different numbers of reads in each FASTA file. Is there a way to run AlignGraph with files that are not one-to-one mapped?
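One way to make two read files one-to-one again is to keep only reads whose mate is present in both files. The sketch below assumes mates share a read name, optionally followed by `/1` or `/2`; that naming convention and all function names are assumptions for illustration. It holds the second file in memory, so it suits modest data sizes.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pair_key(header):
    """Read name without its description or a trailing /1 or /2 (assumed naming)."""
    fields = header.split()
    name = fields[0] if fields else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def keep_paired(lines1, lines2):
    """Return two equally long lists of (header, seq), in read1 order, mates aligned."""
    reads2 = {pair_key(h): (h, s) for h, s in parse_fasta(lines2)}
    kept1, kept2 = [], []
    for header, seq in parse_fasta(lines1):
        mate = reads2.get(pair_key(header))
        if mate is not None:
            kept1.append((header, seq))
            kept2.append(mate)
    return kept1, kept2
```

Writing each kept list back out as `>header` and sequence lines gives two files with the same number of reads in the same order.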

Aligngraph comes back with 'CANNOT OPEN FILE!'

I'm dealing with paired end E.coli re-sequencing data. Assembly was done using SPAdes.

When I tried to further align my contigs with AlignGraph, it always came back with 'CANNOT OPEN FILE!' without any other information.

My command:

~/files/AlignGraph/AlignGraph/AlignGraph --read1 97_1.fasta --read2 97_2.fasta --contig scaffolds_97.fasta --genome LF82.fasta  --distanceLow -700 --distanceHigh 1300 --extendedContig 97_extendedContigs.fa --remainingContig 97_remainingContigs.fa

What could be causing this problem and how can I fix it?

Thank you for any suggestion!!
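Because the message does not say which file failed to open, a quick check of every input path before launching can narrow it down. This is a generic sketch, not something AlignGraph provides; `unreadable_inputs` is a hypothetical helper.

```python
import os


def unreadable_inputs(paths):
    """Return the paths that are missing, not regular files, or not readable."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.R_OK))]


# File names taken from the command in the report above.
inputs = ["97_1.fasta", "97_2.fasta", "scaffolds_97.fasta", "LF82.fasta"]
print(unreadable_inputs(inputs))  # any names printed here cannot be opened
```

The check covers input files only; whether the failure instead comes from an output or temporary file is not something this sketch can tell.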

AlignGraph running indefinitely

Hi,
AlignGraph has been running on my computer for more than two days, but both the extended and remaining contig files are still empty. I can see many files in the tmp folder. What could the problem be? There is no issue with Bowtie2 and BLAT; both are working like a charm.

I appreciate any kind of help!

Thanks
Ramesh

Crash when creating _short_initial_contigs_extended_contigs.* files

Hello,

I have the following issue with AlignGraph: the program crashes during the creation of _short_initial_contigs_extended_contigs.* files by BLAT (corresponding to that line of code: s = "blat tmp/_extended_contigs." + itoa(i) + ".fa tmp/_short_initial_contigs." + itoa(i) + ".fa -noHead tmp/_short_initial_contigs_extended_contigs." + itoa(i) + ".psl > blat_doc.txt 2> blat_doc.txt";).

BLAT gives this error:
Program error: trying to allocate 0 bytes in needLargeMem (limit: 536870912)
blat: memalloc.c:90: needLargeMem: Assertion `0' failed.

After looking at temporary files, it seems that, for some of the reference genome contigs, the _extended_contigs.* and _short_initial_contigs.* files are empty.

Is this an issue with my dataset, or with the program? Can I do something to resolve this issue?

Thank you in advance for your answer.
Best regards,
Yann
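Since the report links the BLAT assertion to empty input files, one possible guard is to test both files for sequence content before calling BLAT and skip the call otherwise. The sketch below shows that check in Python; the function names are hypothetical, and whether skipping is the right behaviour inside AlignGraph is for the author to decide.

```python
def has_sequence(lines):
    """True if the FASTA lines contain at least one header followed by bases."""
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            seen_header = True
        elif line and seen_header:
            return True
    return False


def should_run_blat(database_lines, query_lines):
    """Only align when both the database and the query hold sequence."""
    return has_sequence(database_lines) and has_sequence(query_lines)
```

In a wrapper script this would be called with the two open files, for example the `_extended_contigs` and `_short_initial_contigs` files for one chromosome.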

Extended Output

Hi,

Thanks for developing this. I had a question about the output files and, after reading the README, I am still unsure about the expected output.

I am using PE150 MiSeq reads and performed a SPAdes assembly (the output is *.scaffolds.fasta). These genomes are highly fragmented but have >99% ANI to a reference genome that I previously sequenced fully with PacBio (complete genome; 3.77 Mb). However, the AlignGraph output files confuse me.

One example:

Sequence_ID	Total_Contigs	Genome_length	Largest_Contig	n50	GC_Percent
Desert-2-3.extended	16	3335782	1108581	343422	70
Desert-2-3.remain	19	707214	201094	182646	71
Desert-2-3.scaffolds	73	3740981	508547	119510	71

It appears that I would need to concatenate the *extended.fasta and *remaining.fasta files to get the desired genome. Is that right? Any clarification would be great.

Here is the command I am using:

AlignGraph --read1 $OUTDIR/${mate}_R1_001.fasta --read2 $OUTDIR/${mate}_R2_001.fasta \
--fastMap --contig $genome.scaffolds.fasta --genome $REFGENOME \
--distanceLow 550 --distanceHigh 1550 \
--extendedContig $genome.extended.fa --remainingContig $genome.remain.fa 

Thank you!

Continuing aborted run

Hey Bao,

I'm running AlignGraph for reference-guided improvement of a ~400 Mbp plant genome assembly on a compute cluster, as a single job with 8 cores for 6 days, the maximum walltime allowed for any job on the cluster.
Unfortunately, when the walltime limit is reached AlignGraph is still in the Bowtie alignment phase and the job is killed. Is there any chance of continuing the pipeline from a given point, or even performing the alignment as a separate job with more resources?
If I understood correctly, Bowtie is executed with 8 cores by default from within AlignGraph. Is there a way to freely define Bowtie's (and BLAT's) resources?

Thanks in advance,
Nico

Command line specification

Hello,

I'm trying to run AlignGraph using this command:
AlignGraph --read1 AM_1P.fa --read2 AM_2P.fa --contig Am_assembly_47.scafSeq.fa --genome AMgenomic.fa --distanceLow 11 --distanceHigh 283 --extendedContig AM_extended.fa --remainingContig AM_remaining.fa --kMer 47

The program does not run, instead printing only this output:

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
By Ergude Bao, CS Department, UC-Riverside. All Rights Reserved

AlignGraph --read1 reads_1.fa --read2 reads_2.fa --contig contigs.fa --genome genome.fa --distanceLow distanceLow --distanceHigh distancehigh --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa [--kMer k --insertVariation insertVariation --covereage coverage --part p --ratioCheck --iterativeMap --misassemblyRemoval --resume]
Inputs:
--read1 is the the first pair of PE DNA reads in fasta format
--read2 is the the second pair of PE DNA reads in fasta format
--contig is the initial contigs in fasta format
--genome is the reference genome in fasta format
--distanceLow is the lower bound of alignment distance between the first and second pairs of PE DNA reads (recommended: max{insert length - 1000, single read length}) . . .

I am on a Linux cluster environment with all the files in the same directory. Any help troubleshooting this issue would be appreciated!

std::bad_alloc

Ran AlignGraph with the following command:
AlignGraph --read1 2011C-3493_1.fa --read2 2011C-3493_2.fa --contig 2011C-3493_contigs.fa --genome NC_018658.fa --distanceLow 100 --distanceHigh 1495 --extendedContig 2011C-3493_contigs_extended.fa --remainingContig 2011C-3493_contigs_remaining.fa

and got the following error:

(0) Alignment finished

CHROMOSOME 0:
(1) chromosome loaded
(2) contig alignment loaded
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

BOWTIE2 CALL FAILED

Hello,
When I run AlignGraph, the bowtie2 call fails. I have tried several methods but cannot solve it. Here are my scripts:
PATH: /opt/smrtlink/smrtcmds/bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:$/home/Henry/bin/AlignGraph-master/AlignGraph:/home/Henry/bin/MUMmer3.23/scripts:/home/Henry/bin/MUMmer3.23/src/tigr:/home/Henry/bin/MUMmer3.23:/home/Henry/bin/AlignGraph-master/AlignGraph/bowtie2-2.2.3:$/home/Henry/bin/AlignGraph-master/bowtie2-2.2.3

AlignGraph --read1 S_trim30_pe1.fasta --read2 S_trim30_pe2.fasta --contig contigs.fasta --genome Ca_Brocadia_sp_40.fasta --distanceLow 550 --distanceHigh 1550 --fastMap --extendedContig S_extendedContigs.fa --remainingContig S_remainingContigs.fa
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
By Ergude Bao, CS Department, UC-Riverside. All Rights Reserved

BOWTIE2 CALL FAILED!

I have no idea why the bowtie2 call failed. What could be the reason? Thanks very much.
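Two things can be checked from the PATH shown in this report: whether each entry is an absolute directory, and whether the tools can be located at all. Two entries in that PATH begin with `$/home/...`, which a shell treats as a relative path starting with a literal `$`. Whether that is the cause here is not certain, since another entry also points at a bowtie2 directory. The helper names below are hypothetical; `shutil.which` is the standard-library lookup.

```python
import shutil


def suspicious_path_entries(path_value):
    """Return POSIX PATH entries that are empty or do not start with '/'."""
    return [entry for entry in path_value.split(":")
            if not entry.startswith("/")]


def missing_tools(tools=("bowtie2", "blat")):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Shortened version of the PATH from the report.
reported = ("/usr/local/bin:/bin:/usr/bin:"
            "$/home/Henry/bin/AlignGraph-master/AlignGraph:"
            "/home/Henry/bin/AlignGraph-master/AlignGraph/bowtie2-2.2.3")
print(suspicious_path_entries(reported))
print(missing_tools())
```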

Exits unexpectedly

Hi, @baoe, I'm trying to assemble a bacterial genome of about 6.5 Mb. Contigs were assembled by SPAdes. bowtie2 (v2.2.6) and blat (v. 35) were installed and added to $PATH.

AlignGraph exited with the message below:

(0) Alignment finished

CHROMOSOME 0: 
(1) Chromosome loaded
(2) Contig alignment loaded

No error code appeared.

Also, adding the --resume option does not seem to work, and no useful help message is given.

Besides, a --temp-dir option would be useful when submitting multiple jobs to a cluster from the same working directory.
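Until such an option exists, a possible workaround is to start each job in its own working directory. This assumes AlignGraph writes its temporary files relative to the current directory, which the `tmp/...` paths quoted in other reports on this page suggest but do not prove. The helper names are hypothetical; the option names come from the usage text quoted elsewhere on this page.

```python
import os
import subprocess
import tempfile

# Options whose values are file paths, per the usage text.
PATH_OPTIONS = {"--read1", "--read2", "--contig", "--genome",
                "--extendedContig", "--remainingContig"}


def absolutize(args, base_dir):
    """Return a copy of args with the values of file options made absolute."""
    result, make_abs = [], False
    for arg in args:
        if make_abs:
            arg = os.path.abspath(os.path.join(base_dir, arg))
        make_abs = arg in PATH_OPTIONS
        result.append(arg)
    return result


def run_isolated(args, executable="AlignGraph"):
    """Run one job inside a fresh directory so temporary files cannot collide."""
    job_dir = tempfile.mkdtemp(prefix="aligngraph_")
    command = [executable, *absolutize(args, os.getcwd())]
    subprocess.run(command, cwd=job_dir, check=True)
    return job_dir
```

Input and output paths are made absolute first because relative paths would otherwise be resolved inside the new job directory.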

distanceLow and insertLow parameter calculation from paired-end reads

Hello,

Kindly guide me on how to calculate the distanceLow and distanceHigh values.
I am using paired-end reads with an insert size of 375. Would it be correct to use 375 for distanceLow and 1375 for distanceHigh?

--distanceLow is the lower bound of alignment distance between the first and second pairs of PE DNA reads (recommended: max{insert length - 1000, single read length}).
--distanceHigh is the upper bound of alignment distance between the first and second pairs of PE DNA reads (recommended: insert length + 1000).
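The two recommendations quoted above can be applied mechanically. In the sketch below the function name is hypothetical, and the 150 bp read length is an assumed value for illustration because the post does not state one.

```python
def recommended_distances(insert_length, read_length):
    """Apply the recommendations quoted from the usage text.

    distanceLow  = max(insert length - 1000, single read length)
    distanceHigh = insert length + 1000
    """
    distance_low = max(insert_length - 1000, read_length)
    distance_high = insert_length + 1000
    return distance_low, distance_high


# Insert size 375 from the question; read length 150 is an assumption.
print(recommended_distances(375, 150))  # (150, 1375)
```

Under these recommendations, 1375 matches distanceHigh for an insert size of 375. Because 375 - 1000 is negative, distanceLow comes out as the single read length rather than 375.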

Single Read Libraries

Hello,

I am very interested in using your software for a project. However, we are using Ion Torrent PGM for sequencing, and thus we only have a single-read library, a reference genome and de novo contigs.

Are paired reads necessary for your implementation?

Reference genome question regarding mate pairs

Hi, I noticed that in your FAQ section, you recommend not to use mate pair libraries. Will there be a problem if I use a mate pair created assembly as a reference to assemble paired end reads? I was unsure if the mate pair library comment was only meant for mapping mate pairs to a reference or it should not be used at all.

Need testdata for AlignGraph running

Dear Baoe,
hi,
I would like to use AlignGraph for my work at the university and need your test data.
I thank you in advance for your help.

best wishes,

Armel

SegFault -- out of bounds array index

I posted this on a previously closed ticket, but I'm not sure if you get updates on such responses, so I'm opening a new ticket here. I am having a SegFault issue similar to one reported on a closed ticket. I have tried to narrow down where this happens and provide that information here:

Here is a snippet of your code mixed with my prints (starting around line 1275):

cont:
    cout << "cont: A" << endl;
    k2.traversed = 0;
    k2.s = nextS;
    k2.chromosomeID0 = nextID0;
    k2.chromosomeOffset0 = nextOffset0;
    k2.coverage = 0;
    k2.A = k2.C = k2.G = k2.T = k2.N = 0;

    cout << "cont: B" << endl;
    cout << "nextid: " << nextID << ", nextOffset: " << nextOffset << endl;
    cout << genome[nextID][nextOffset].contiMer.size() << endl;
    cout << "nextid0: " << nextID0 << ", nextOffset0: " << nextOffset0 << endl;
    cout << genome[nextID0][nextOffset0].contiMer.size() << endl;

I have additional cout statements at the beginning of each if() statement and just before each if() statement; however, the last few lines of stdout are:

cont: A
cont: B
nextid: 0, nextOffset: 28150056
0
nextid0: 4294967295, nextOffset0: 4294967295

So, it appears that for some reason the segfault is caused by trying to look up indices that don't exist (genome[nextID0][nextOffset0]). Any thoughts?
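The value 4294967295 in that output is 2^32 - 1, which is what -1 becomes when stored in an unsigned 32-bit integer. That would fit a "no position" sentinel being used as an index, though this is a guess about the code's intent. A bounds check before the lookup is sketched below in Python, with a list of lists standing in for the C++ `genome` structure; `safe_lookup` is a hypothetical name.

```python
UINT32_MAX = 2**32 - 1  # 4294967295, what -1 becomes as an unsigned 32-bit value


def safe_lookup(genome, chrom_id, offset):
    """Return genome[chrom_id][offset], or None if either index is out of range."""
    if not 0 <= chrom_id < len(genome):
        return None
    if not 0 <= offset < len(genome[chrom_id]):
        return None
    return genome[chrom_id][offset]


print((-1) % 2**32 == UINT32_MAX)  # True
print(safe_lookup([["a", "b"]], UINT32_MAX, UINT32_MAX))  # None
```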

Time-consuming step in AlignGraph

Hi,

I have been running AlignGraph for two days. I thought the most time-consuming step would be blat. But in the "temp" folder, all the "_contigs_genome.*.psl" files are ready, while only 109/4312 of the "_reads_genome.*.bowtie" files have been finished. AlignGraph uses only one thread for this step and it takes about 15 mins to generate one "_reads_genome.*.bowtie" file. Is this expected, and can AlignGraph use multiple threads in this step?

Best,
Danshu

Crashes with Segmentation Fault

I have run AlignGraph successfully before on a subset of my reads, but now that I have loaded all of my data, AlignGraph has crashed twice with the same error.

"(0) Alignment finished

CHROMOSOME 0:
(1) chromosome loaded
(2) contig alignment loaded
Segmentation fault (core dumped)"

Any suggestions?

Thanks!

BLAT call fails

Hi,

Every time I try to run AlignGraph, I get an error saying BLAT call fails. This is my script:

# AlignGraph to scaffold with a reference

# set machtype
MACHTYPE=x86_64
export MACHTYPE

# adding the path to bowtie
export PATH=/cm/shared/apps/bowtie/2.2.3:$PATH

# adding the path to blat
export PATH=/ernie/home/bin/aligngraph/blatSrc/bin/x86_64:$PATH

# adding the path to pblat
export PATH=/ernie/home/bin/aligngraph/pblat-1.6.0:$PATH

# adding the path to nucmer
export PATH=/ernie/home/bin/aligngraph/MUMmer3.23:$PATH

# path to AlignGraph
export PATH=/ernie/home/bin/aligngraph/AlignGraph-master/AlignGraph:$PATH

NAME=R11
KMER=k41
GENOME='/ernie/groups/quoats/contigs/combined_genomic.fasta'
READ1="${NAME}_all_R1.fasta"
READ2="${NAME}_all_R2.fasta"

AlignGraph --read1 ${READ1} --read2 ${READ2} --contig ${NAME}-${KMER}-scaf.fasta --genome ${GENOME} --distanceLow 101 --distanceHigh 1068 --extendedContig ${NAME}-${KMER}-extendedContigs.fa --remainingContig ${NAME}-${KMER}-remainingContigs.fa

When installing BLAT I also ran 'MACHTYPE=x86_64' and 'export MACHTYPE'. They are in my script because I have to export the variable again every time I log out. I've also added paths to all dependencies, thinking that maybe I copied my executables to the /home/bin directory wrongly. But I'm still getting this error. Is there anything else I can try?

Thanks!

BLAT call failed with Aligngraph

Hello Bao,
My $PATH contains all three tools (bowtie2, blat and pblat), but I still get the BLAT CALL FAILED! error. How can I troubleshoot this? I'd prefer AlignGraph to use pblat, but it is trying to look for blat, and it is not able to find that either. Please help.

Thanks,
Meeta

BLAT ERROR

I'm working on assembly data, and I always get the same error:
BLAT CALL FAILED!
I searched for BLAT v34 or below but was not able to find any solution. Could you help me by providing a download link for the correct version of BLAT?

Having trouble resuming

Can you please provide an example on how to resume a run that stopped during the extending step?

I have tried:
AlignGraph [all previous params] --resume (just prints out usage with no error message and quits)
AlignGraph --resume (says cannot open file)

I also have some runs that stopped during the alignments stage. Is there a way to resume from here?

Thanks!

How to choose pblat instead of blat or nucmer

Hello,

I see I can choose to use nucmer instead of blat with the --fastMap option, but I can't find anywhere in the files or manual how to choose pblat over blat. I would like to use pblat both because it's parallelized and because it can manage larger contigs. Is there any way of specifying that pblat should be used?

Thanks,
Diane

AlignGraph terminated with error message

Hi Bao,
AlignGraph runs and stops after 30 seconds with the following error message:
"terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc"

here is my command: " ./AlignGraph --read1 HB_1_R1.fastq --read2 HB_1_R2.fastq --contig HB_1_contigs.fasta --genome acinetobacter_sp._ruh2624__v2__2_genes.fasta --distanceLow 250 --distanceHigh 1300 --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa "

What can be the problem?
Thanks for the speedy response
Armel

BLAT CALL FAILED! (New problem for existing issue)

Hello and thank you for this highly acclaimed software. I am attempting to use it, but keep receiving this error.

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
By Ergude Bao, CS Department, UC-Riverside. All Rights Reserved

BLAT CALL FAILED!

After reading up, I've made sure to install a previous BLAT build (v34) and export its location to $PATH. The transitional contigs do not seem to have large spaces, as was the issue with another user's data:

CGCCACTGGAATCAATGAAGTCGCCCCGGAGCGCCAATTAATTTTTATTGTTGTTTTTAT
TTTATTTTTATCGTTGTTTTGACAAAATAAAACTTACTTTTAACACAAATATCTTACAAA
GTGTTTTGTACAACTGTCCTGTTTTCATAATAAAATAAATAAATTCATTAAATTATTCCA
TATTTAAAAGTAGGAAACAAAAATATTCAACAATGATTGTGAATTTTTTTTAATCAGTGT
TCACATAGGGTGGTGAATTAGACACTTCTGATAGAGATTTTTCCATTGAGTCACCAGTCA
GCAATTGTTTTTCCAGGGATGGCAACATTATTAACCGAAAATAAAATAAAAACATAACCT
ACTTTTCTTGTAGATGTGCCAAAAATGGATAAGTATGTTTTGGTTGAATATTCAAGTAAT
TTAAATGCCGTGATATGCGAAACGTTCAAGTCTATGCAACGAATTAAGGTCAGTGAGAGG
AGTCAATATTTTTTATATAAAATAAAAAAAGCTTTTGATATAATTTCAGCTAATGAGTTT

But could there be an issue with my paired-end reads' headers?

TTCAGGTGTCCTCTAAGCGGACAATGACCGGTTAGTAGCCCTGTAATTATTCTGAGCCGATCCCTACTATATTGAAGTAGTTCTTCGGTTCTTTTCTCGGACGGTTCCCTAATGAAGGTTCTACT
+
B<BBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/BFFFFFFFFFFFFFFFFFFFFFFBFBFFFFFFF<<FFFFFFFFFFFFFBFFFFFFFBFF
@HWI-D00256:413:C7N5GANXX:1:1101:2404:2383 1:N:0:ATTACTCGCCTATCCT
GGAAAACAAAGTAAACCCAAATTCCAATTGAATGAAATTGTACGGATAAGTAAATACAAAAATATTTTTGAAAAGTCTTACACTGCGAATTTCACCACGGAGCTCTTCAAGATTGTTAAAATTAAT
+
///BBFFFFBFFBBFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFBFF<FFFFFFFFFFFFFFFFBBFFFBFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFBBFFFFFFFF/B/BFFFFFFFFB
@HWI-D00256:413:C7N5GANXX:1:1101:2445:2420 1:N:0:ATTACTCGCCTATCCT
TAATATGCCAATGCGTTAATCCCATCAGAGGTTGGAATCGAAAGTAGTTGGAAATGTATTATGTTGAATATGACGTTCGATAACTACAAGAAATGAATTACTTAGGTATATATGGAAAATTCAGAA

I hope you can help me resolve this issue, and thanks again for your consideration.

Get stuck in bowtie-align

Hello,
I ran this
AlignGraph --read1 8111_ALS1279-S13_1_P.fastq --read2 8111_ALS1279-S13_2_P.fastq --contig contigs_scaffolds.fasta --genome GCF_001941865_1.fasta --distanceLow 50 --distanceHigh 1602 --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa --fastMap

And it got stuck in bowtie-align.
When I killed the process I got this message and an empty extendedContigs.fa file.

I'm using a scaffold from Ragout and 250 bp PE reads from Illumina.


CHROMOSOME 0: 
(1) Chromosome loaded
(2) Contig alignment loaded

(3) Read alignment loaded
(4) Contigs extended
(5) Contigs scaffolded

FINISHED SUCCESSFULLY for 22288 seconds (22226 seconds for alignment) :-)

AlignGraph getting stuck on Bowtie2-align

So I've been running AlignGraph with 10 million paired reads, 160 Mb of scaffolds on a 350 Mb genome (with PBLAT), but I've stumbled upon a problem.
For over a week now AlignGraph has not written ANYTHING. Not in the result or log files, nor in the tmp folder. When I look at the server that is running the program I only see some sleeping processes, among which "Bowtie2-align" which goes from 0 to 900.0% cpu every few seconds, but then back to 0 again.

Do you have any idea what may be wrong and how I could fix it? Because I'm quite sure this instance of AlignGraph won't finish.. Ever.

Inconsistent PE files error

I'm able to do a SOAPdenovo2 assembly with the paired-end reads I got from a sequencing centre, but AlignGraph fails with the error "Inconsistent PE files". Is there a way to correct my files so the program does not fail?

Thanks
Rob

empty extendedContigs.fa

Dear Baoe,

Hi
I ran AlignGraph on PE reads after assembling them de novo with Velvet. My genome is a fungus that has not been sequenced before, so I used the S. cerevisiae (yeast) genome as the reference. At first I could not run it because it gave the PE inconsistency error. It took a while to realize that AlignGraph needs reads in FASTA format, not FASTQ. So I converted the reads to FASTA and it ran successfully, but the problem now is that my extendedContigs file is empty, while the output.stdout file looks like this:

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
By Ergude Bao, CS Department, UC-Riverside. All Rights Reserved

(0) Alignment finished

CHROMOSOME 0:
(1) chromosome loaded
(2) contig alignment loaded
(3) read alignment loaded
(4) contigs extended
(5) contigs scaffolded

CHROMOSOME 1:
(1) chromosome loaded
(2) contig alignment loaded
(3) read alignment loaded
(4) contigs extended
(5) contigs scaffolded

CHROMOSOME 2:
(1) chromosome loaded
(2) contig alignment loaded
(3) read alignment loaded
(4) contigs extended
(5) contigs scaffolded
...
CHROMOSOME 16:
(1) chromosome loaded
(2) contig alignment loaded
(3) read alignment loaded
(4) contigs extended
(5) contigs scaffolded

FINISHED for 4230 seconds (524 seconds for alignment) :-)

So do you have any idea what the potential problem is?

Besides, I suggest adjusting AlignGraph to run on FASTQ, as many people will face the same PE inconsistency error!

Best wishes,

Amir
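The FASTQ-to-FASTA conversion mentioned in this report can be done with a few lines. The sketch below assumes the common four-line FASTQ layout (no wrapped sequence lines); `fastq_to_fasta` is a hypothetical helper, not part of AlignGraph.

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from four-line FASTQ records (no wrapped sequences)."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line and not record:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, plus, _quality = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield ">" + header[1:]
            yield sequence
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


demo = ["@r1 1:N:0", "ACGT", "+", "IIII", "@r2", "TT", "+", "II"]
print("\n".join(fastq_to_fasta(demo)))
```

For compressed input, the standard library's `gzip.open(path, "rt")` yields text lines that can be passed straight to the function.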

implementing parallelized blat?

Like others have mentioned here, I've been struggling with some exceptionally long runtimes, which seem to be occurring due to a bottleneck at the blat alignment stage. I was wondering how difficult would it be to incorporate a parallelized version of blat (e.g., pblat) into the AlignGraph source? Evidently pblat requires only a single additional command ("-threads" option).

Thanks in advance,
Kevin

Support piping input from stdin

Hi,
Does AlignGraph allow piping input reads from stdin or is it planned to implement that in the near future?
Your readme says AlignGraph only accepts PE-reads in fasta-format, but most commonly such reads will be in fastq format and stored as gzip or bzip2 compressed files. Piping from stdin would allow great flexibility to use these inputs without having to keep different file-formats on the server.

AlignGraph stop without error message

Hi,
I tried to extend my SPAdes result with AlignGraph, but it stops without an error message:

CHROMOSOME 0:
(1) Chromosome loaded
(2) Contig alignment loaded
Killed

Here is my command:
~/AlignGraph/AlignGraph --read1 S_aureus_trimmed_1.fasta --read2 S_aureus_trimmed_2.fasta --contig scaffolds.fasta --genome Reference.fasta --distanceLow 230 --distanceHigh 1500 --extendedContig out/extendedContig.fasta --remainingContig out/remainContig.fasta

Thank you
Felix

Reference Genome for Eukaryotes

Dear Bao,

I want to extend the scaffolds of a eukaryote assembly. I have the reads and the scaffolds of my species of interest, and would like to know how the reference genome should be organized. Should I join all the reference chromosomes to input them to AlignGraph?

Thank you,
Gabi
