Comments (15)
@cimendes : I added miniscripts/annotate-tbl2gff.pl, a 'miniscript' that can be used to convert v-annotate.pl .tbl output files to GFF3 format. The script is currently in the develop branch and will be included in the next released version. For now, the version in the develop branch should work fine as a standalone conversion script.
This GFF format is not meant to be used for GenBank submissions. Use the .vadr.pass.tbl and .vadr.fail.tbl files for that.
Do
perl annotate-tbl2gff.pl -h
to see information on usage and options.
Please let me know if there are any problems with the script or feature requests.
from vadr.
@taltman : I plan to include this in the next version (next version after 1.1.1).
from vadr.
Hi @taltman : Sorry GFF output didn't make it into v1.1.1. Could you please share your script with me? I'm curious how you handled the Parent field in the attributes column. Thanks!
from vadr.
Hi @nawrockie , sorry for the delayed response.
I hacked together a quick script to do this conversion. I only "validated" it as much as the GFF annotations displayed sensibly in JBrowse. I didn't run it through any GFF validators to see whether it generates compliant GFF files. HTH!
https://bitbucket.org/tomeraltman/darth/src/master/src/tbl2gff.awk
In gearing up for submitting our novel CoV genomes to EBI, I will be verifying that the generated GFF is compliant, so hopefully I'll have an improved version soon.
from vadr.
Great, thanks @taltman ! If you do end up improving it, please let me know.
from vadr.
Hi @nawrockie , can VADR output GFF3 or GBK files now?
from vadr.
@zhaoxvwahaha : Unfortunately, not yet. Development of other features has taken priority. Are you able to use the script kindly shared by Tomer Altman above?
https://bitbucket.org/tomeraltman/darth/src/master/src/tbl2gff.awk
from vadr.
Hi @nawrockie, it seems that tbl2gff cannot correctly convert fuzzy positions when determining the ORF strand.
For example,
<3437 4116 mat_peptide
product NS2a
these lines will be converted to
MW164737 vadr mat_peptide 4116 <3437 . - . ID=ftr-8;Name=NS2a;
in the resulting GFF file.
I tried modifying the script as follows:
$1 && $2 {
    # Work on copies of the coordinates with any fuzzy markers ('<', '>') removed
    if ( match($1, "[><]") != 0 ) {
        begx = $1
        # The gsub() function returns the number of substitutions made
        gsub("[><]", "", begx)
    } else {
        begx = $1
    }
    if ( match($2, "[><]") != 0 ) {
        endx = $2
        gsub("[><]", "", endx)
    } else {
        endx = $2
    }
    # Decide the strand from the stripped, numeric coordinates
    if ( int(begx) < int(endx) ) {
        start = $1
        end = $2
        strand = "+"
    } else {
        start = $2
        end = $1
        strand = "-"
    }
    ftr_key = $3
    ++ftr_id
    #print start, end, strand, ftr_key, ftr_id
}
Is this modification correct?
By the way, is there any Python library that can parse GFF3 files and handle fuzzy positions? BCBio (https://github.com/chapmanb/bcbb) failed to read the records with fuzzy positions.
from vadr.
@Zjianglin : I'm not sure about your modification of tbl2gff, you might try asking Tomer Altman who wrote that code (https://bitbucket.org/tomeraltman).
I don't know of any python library that can handle the fuzzy positions.
My suggestion would be to either modify tbl2gff so that it does not output the '>' and '<' characters, or write a simple script that strips them out as an extra step after you've created the GFF file. You could also try writing a script that strips them out of the .tbl file that vadr creates before running tbl2gff, or try parsing the output .ftr table that vadr creates (https://github.com/ncbi/vadr/blob/master/documentation/formats.md#ftr), but note that the .ftr table does not have coordinate positions 'trimmed' due to Ns, like the .tbl file does.
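As a rough illustration of the "extra step" option, here is a minimal Python sketch that removes '<' and '>' from the start and end columns of an existing GFF file. It assumes the file is tab-delimited with the usual nine columns; the function names and file names are made up for this example and are not part of VADR or tbl2gff. Note that stripping the markers discards the information that a feature is partial.

```python
import re

# Matches the fuzzy-position markers used in .tbl coordinates
FUZZY = re.compile(r"[<>]")


def strip_fuzzy(line: str) -> str:
    """Return one GFF line with '<' and '>' removed from columns 4 and 5.

    Comment lines, blank lines, and lines with fewer than nine
    tab-separated columns are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return line
    cols[3] = FUZZY.sub("", cols[3])  # start coordinate
    cols[4] = FUZZY.sub("", cols[4])  # end coordinate
    return "\t".join(cols) + newline


def strip_fuzzy_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, cleaning the coordinates of every feature line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_fuzzy(line))
```

For example, `strip_fuzzy_file("in.gff", "out.gff")` (hypothetical file names) would write a copy of the file whose coordinate columns contain only digits, which parsers that expect integer coordinates may be able to read.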
from vadr.
Hi @nawrockie , thanks for your reply and suggestions. I will try to manually check the genomes that have fuzzy positions and strip them out. Thank you again.
from vadr.
Hi @taltman , could you please check my modification of [tbl2gff](https://bitbucket.org/tomeraltman/darth/src/master/src/tbl2gff.awk)? The original script does not seem to handle fuzzy positions.
from vadr.
+1 for this request.
I've had a couple of requests to visualize the outputs of VADR in IGV, specifically as a GFF3 file.
from vadr.
Is this feature still on the roadmap for VADR development?
from vadr.
@cimendes sorry for the long delay on this requested feature. I'm working on it now and will post another update by the end of next week.
from vadr.
Thank you for the update! That is wonderful news!
from vadr.