solgenomics / bio-genomeupdate
Tools for updating a genome assembly
Print fasta of query seqs that did not align
Find a way to include the paths of lib modules in the scripts that call them without hardcoding them.
Columns for Trim curation file:
Create new classes
Check AlignCoordGroup.pm
sample nucmer output
18361542 18377740 1 16206 16199 16206 99.94 70787664 203766 0.02 7.95 1 1 SL2.50ch03 Contig90
53614018 53614614 17944 17348 597 597 99.66 70787664 203766 0.00 0.29 1 -1 SL2.50ch03 Contig90
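A minimal sketch for reading lines like the two above. It assumes the columns follow the MUMmer show-coords tabular layout produced with the -l and -c options (S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, reference frame, query frame, reference ID, query ID); the sample values are consistent with that layout, but the exact command used is not stated here. The names `CoordsRow` and `parse_coords_line` are hypothetical.

```python
from typing import NamedTuple


class CoordsRow(NamedTuple):
    """One alignment row; field names are illustrative, not from the repo."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int
    ref_aln_len: int
    query_aln_len: int
    identity: float
    ref_len: int
    query_len: int
    ref_cov: float
    query_cov: float
    ref_frame: int
    query_frame: int
    ref_id: str
    query_id: str


def parse_coords_line(line: str) -> CoordsRow:
    """Split one whitespace-delimited row into typed fields (assumed layout)."""
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 columns, got {len(f)}")
    return CoordsRow(
        int(f[0]), int(f[1]), int(f[2]), int(f[3]),
        int(f[4]), int(f[5]), float(f[6]),
        int(f[7]), int(f[8]), float(f[9]), float(f[10]),
        int(f[11]), int(f[12]), f[13], f[14],
    )
```

A query frame of -1 together with a query start greater than the query end, as in the second sample line, marks a reverse-strand alignment.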
500bp.mixedoutoforder.agp.group_coords.stdout
Contig90 SL2.50ch03 18361542 18564072 202530 1 203766 203766 154944 1 1 0 Contains 0 0 48822 0 596 SL2.50ch03:597:53614018:53614614
The location is relative to the original TPF line but multiple insertions are done with respect to the original line number, not the new location. Need to maintain a separate offset counter for 'before' and 'after' insertions for each original TPF line.
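One way the separate 'before' and 'after' counters could work, sketched in Python rather than the project's Perl. The function name, the 0-based indexing, and the tuple layout of `insertions` are assumptions made for illustration; the actual TPF.pm code may track this differently.

```python
from collections import defaultdict


def insert_relative_to_original(lines, insertions):
    """Apply insertions addressed by ORIGINAL 0-based line index.

    insertions: iterable of (orig_index, where, new_line), where is
    'before' or 'after'. Repeated insertions at one original line keep
    the order in which they were requested.
    """
    result = list(lines)
    before = defaultdict(int)  # insertions placed before each original line
    after = defaultdict(int)   # insertions placed after each original line
    for orig_index, where, new_line in insertions:
        # Shift caused by everything inserted around earlier original lines.
        shift = sum(before[i] + after[i] for i in range(orig_index))
        # Current position of the original line itself.
        current = orig_index + shift + before[orig_index]
        if where == "before":
            result.insert(current, new_line)
            before[orig_index] += 1
        elif where == "after":
            result.insert(current + 1 + after[orig_index], new_line)
            after[orig_index] += 1
        else:
            raise ValueError(f"unknown position: {where!r}")
    return result
```

Because every insertion is resolved against the counters, callers never need to recompute line numbers after an earlier insertion.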
See Jeremy's controller code
Maybe move .PL /scripts into scripts dir. Only if history is preserved.
Print filtered delta output file without the BACs that are aligned out of order.
Number of components
Number of gaps
Average, standard deviation
Percentage covered by components and gaps
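A rough sketch of how these statistics could be computed from AGP lines. It relies on the standard AGP column layout (object, object_beg, object_end, part_number, component_type, ...) with component types N and U denoting gaps. Applying the average and standard deviation to component and gap lengths is an assumption, as are the function name and the use of the population standard deviation.

```python
import statistics


def agp_summary(agp_lines):
    """Summarize component and gap spans from tab-delimited AGP lines."""
    comp, gap = [], []
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        f = line.rstrip("\n").split("\t")
        span = int(f[2]) - int(f[1]) + 1  # object_end - object_beg + 1
        (gap if f[4] in ("N", "U") else comp).append(span)
    total = sum(comp) + sum(gap)

    def stats(values):
        if not values:
            return 0.0, 0.0
        return statistics.mean(values), statistics.pstdev(values)

    comp_mean, comp_sd = stats(comp)
    gap_mean, gap_sd = stats(gap)
    return {
        "n_components": len(comp),
        "n_gaps": len(gap),
        "component_mean": comp_mean,
        "component_sd": comp_sd,
        "gap_mean": gap_mean,
        "gap_sd": gap_sd,
        "pct_components": 100.0 * sum(comp) / total if total else 0.0,
        "pct_gaps": 100.0 * sum(gap) / total if total else 0.0,
    }
```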
Add methods to AlignmentCoordsGroup.pm for removing BAC alignments that align in non-collinear order to the reference chromosome.
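One possible way to decide which alignments to keep, shown as a Python sketch: sort by reference start and retain the longest chain whose query-side positions also increase (a longest-increasing-subsequence filter). This is a swapped-in generic technique, not the method in AlignmentCoordsGroup.pm; the tuple layout and function name are hypothetical, and only forward-strand alignments are considered.

```python
def collinear_subset(alignments):
    """Keep the longest chain that is increasing on both axes.

    alignments: list of (ref_start, query_pos) tuples. Entries outside
    the returned chain are the out-of-order candidates for removal.
    """
    ordered = sorted(alignments)
    n = len(ordered)
    if n == 0:
        return []
    best = [1] * n    # length of the best chain ending at i
    prev = [-1] * n   # predecessor of i in that chain
    for i in range(n):
        for j in range(i):
            if ordered[j][1] < ordered[i][1] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

The quadratic loop is adequate for the handful of alignments per BAC; weighting chains by alignment length instead of count would be a natural refinement.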
get_tpf_with_bacs_inserted
The problem is that mummer does not report alignments to N's so BAC regions that extend beyond the WGS contig are not considered. Need to compute "overhangs" from BAC length and seq_in_clusters.
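A sketch of the overhang arithmetic under one reading of the note above: the overhang is the unaligned part of the BAC on each side of its aligned block, expressed in reference orientation. It uses the aligned query coordinates directly rather than seq_in_clusters, whose exact definition is not given here; the function name is hypothetical.

```python
def bac_overhangs(bac_len, query_start, query_end):
    """Return (left_overhang, right_overhang) in reference orientation.

    query_start > query_end is taken to mean a reverse-strand alignment,
    as in nucmer coords output.
    """
    lo, hi = sorted((query_start, query_end))
    if not 1 <= lo <= hi <= bac_len:
        raise ValueError("query coordinates fall outside the BAC")
    head = lo - 1         # unaligned bases before the aligned block
    tail = bac_len - hi   # unaligned bases after the aligned block
    if query_start <= query_end:
        return head, tail  # forward strand: BAC head is on the reference left
    return tail, head      # reverse strand: BAC tail is on the reference left
```

With the sample nucmer rows for Contig90 (length 203766), the forward alignment at 1..16206 leaves 187560 bases hanging off the right side, none of which mummer would report if they fall on N's.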
Needed for BAC alignments to all chrs.
Fix get_gap_overlap()
Then remove err msg from align_BACends_group_coords.pl and test
query BAC aligns to both + and - strand of ref chr
Align BACs to all chrs and validate if they align to only the chr they belong to. If not, add them to the no_chr set.
File: query_bacends.fasta
-r Fasta file of reference (required)
-q Fasta file of query (assembled and singleton BACs, required)
-c Contig or component AGP file for reference (includes scaffold gaps)
-s Chromosome AGP file for reference (with only scaffolds and gaps)
Modify copy_updated_coordinates_to_vcf to sort -V its output file so that features end up in the correct order and are ready for compression and display in JBrowse.
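For illustration, the ordering wanted here can be sketched in Python with a natural-order key. This only approximates what `sort -V` does and is not a reimplementation of it; the function names are hypothetical, and the sketch assumes tab-delimited VCF lines with CHROM and POS in the first two columns and header lines starting with '#'.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs so 'ch2' sorts before 'ch10'."""
    return [
        (0, int(tok)) if tok.isdigit() else (1, tok)
        for tok in re.split(r"(\d+)", text)
        if tok
    ]


def sort_vcf_lines(lines):
    """Keep header lines first, then sort records by chromosome and position."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def record_key(ln):
        chrom, pos = ln.split("\t")[:2]
        return natural_key(chrom), int(pos)

    return header + sorted(records, key=record_key)
```

Sorting positions numerically matters because a plain lexical sort places position 1000 before position 25, which breaks indexing after compression.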
Attribute (accession_prefix_last_base) does not pass the type constraint because: The string, -1, was not a positive coordinate at /usr/local/lib/x86_64-linux-gnu/perl/5.20.2/Moose/Object.pm line 24
Moose::Object::new('Bio::GenomeUpdate::SP::SPLine', 'chromosome', 10, 'accession_prefix', 'AEKE02007654', 'accession_suffix', 'AC239654', 'accession_prefix_orientation', '-', 'accession_suffix_orientation', '+', 'accession_prefix_last_base', -1, 'accession_suffix_first_base', 1, 'comment', 'BAC AC239654 is contained within WGS contig AEKE02007654 from previous version. Designates switch point from WGS contig to BAC.') called at /home/surya/work/Eclipse/Bio-GenomeUpdate/lib/Bio/GenomeUpdate/TPF.pm line 1628
Columns for switch point curation file:
Check using GRC tpf_solo pipeline.
AEKE02023669 ? SL2.50sc05925 MINUS
AC244870 ? SL2.50sc05925 PLUS
AC244937 ? SL2.50sc05925 MINUS contig469
AC244803 ? SL2.50sc05925 PLUS contig469
AC244944 ? SL2.50sc05925 PLUS contig469
AC254768 ? SL2.50sc05925 MINUS contig469
AEKE02023661 ? SL2.50sc05925 MINUS
Contig469_right_1000 aligns to -ive AEKE02023661.1
Contig469_right_1000 aligns to -ive end of AC244937
Contig469_left_1000 aligns to -ive end of AC254768
Contig469_left_1000 aligns to middle of AC244870
Correct order
AEKE02023669 ? SL2.50sc05925 MINUS
AC244870 ? SL2.50sc05925 PLUS
AC254768 ? SL2.50sc05925 MINUS contig469
AC244944 ? SL2.50sc05925 PLUS contig469
AC244803 ? SL2.50sc05925 PLUS contig469
AC244937 ? SL2.50sc05925 MINUS contig469
AEKE02023661 ? SL2.50sc05925 MINUS
68kb 99% identical alignment between AC244870 and AC254768
9.2kb 99% identical alignment between AC244937 and AEKE02023661
Is it required?
TPF spec v1.8 added biological feature in a non-sequence line (centromere or heterochromatin).
Should be able to read in a set of AGP/TPF files and produce a tabular report.
Will be substituted in for the "ContigX" placeholder from the group_coords stdout.
uniq in mixed, outoforder
common in both_errors
Add the number and length of gaps covered for components/contigs and scaffolds to the group_coords report.
Use 500 1 convention of nucmer
-t <TPF_file> Original TPF file (mandatory)
-s scaffold AGP file (mandatory)
-c chromosome AGP file (mandatory)
Should include switch over cases. For testing GFF -> TPF pipeline for bld 3.0