camilogarciabotero / genefinder.jl



A Gene Finder framework for Julia.

Home Page: https://camilogarciabotero.github.io/GeneFinder.jl/dev

License: MIT License

Julia 100.00%

algorithms bioinformatics biology gene orf-search gene-finding

genefinder.jl's Introduction

Hello there, I'm Camilo

This is my README profile. I hope you look through my repos and find something useful.

genefinder.jl's People

Contributors

Stargazers

Watchers

Forkers

vdejager

genefinder.jl's Issues

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Using score to filter what `getorfs` delivers

After #26 and #32, findorfs can be used more flexibly, with multiple ORF finder methods and with or without a scoring scheme. We can now leverage that to extend getorfs with a scoring filter that returns only the sequences whose score is above a threshold. For instance, applying argmax to the orf.score field returns the highest-scoring ORF:

orfs[argmax([orf.score for orf in orfs])]

We can also use a combination of sorting and filtering:

sortedorfs = sort(orfs, by = orf -> -orf.score)
sortedorfs[1:min(10, end)]

The function will gain a min_score kwarg:

function getorfs(
    sequence::NucleicSeqOrView{DNAAlphabet{N}},
    ::DNAAlphabet{N},
    method::M;
    min_score=0,   # named keyword arguments must come before the kwargs... splat
    kwargs...
) where {N,M<:GeneFinderMethod}
    ...
end

Still to define...
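Since the exact signature is still open, here is a minimal Base-only sketch of how a min_score filter could behave once scores are attached. It assumes a hypothetical ScoredORF struct (not GeneFinder's ORF type) carrying only location, strand and score, and a hypothetical filterorfs helper; the top keyword reproduces the sort-then-slice idea above.

```julia
# Hypothetical stand-in for GeneFinder's ORF type: only the fields the
# filtering logic needs (location, strand and score) are modeled here.
struct ScoredORF
    location::UnitRange{Int}
    strand::Char
    score::Float64
end

# Hypothetical helper: keep ORFs whose score is at least `min_score`,
# sorted from highest to lowest score, optionally truncated to `top` items.
function filterorfs(orfs::AbstractVector{ScoredORF};
                    min_score::Real = 0,
                    top::Union{Nothing,Int} = nothing)
    kept = filter(orf -> orf.score >= min_score, orfs)
    sort!(kept; by = orf -> orf.score, rev = true)
    return top === nothing ? kept : kept[1:min(top, end)]
end

# Usage with made-up scores (illustrative data only).
orfs = [ScoredORF(1:90, '+', 0.8),
        ScoredORF(100:300, '-', 2.5),
        ScoredORF(400:520, '+', -1.2)]

best = filterorfs(orfs; min_score = 0)          # scores 2.5 and 0.8
single = filterorfs(orfs; min_score = -Inf, top = 1)  # same ORF argmax would pick
```

Whether a default of 0 is sensible depends on the scoring scheme each method produces, so the threshold is probably best left to the caller.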

Check out orfipy for similar code in python/c

You might want to have a look at orfipy for similar code.
https://github.com/urmi-21/orfipy

The `iscoding` should be more generic.

Since we eventually want to apply this predicate to any sequence, and different algorithms/implementations vary in how they decide that a sequence probably encodes information, the predicate should not be tied to one of them. Some, like the current naivefinder, consider models and then use that input information. To be more generic, the general iscoding (currently renamed to isnaivecoding) should look something like:

function iscoding(sequence::LongSeqOrView{DNAAlphabet{N}}, method::Function; kwargs...) where {N}
    ...
end
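A minimal Base-only sketch of that shape, where the coding criterion is simply whatever callable is passed as method. Plain strings stand in for LongSeqOrView, and gccontent_above is a hypothetical toy criterion (GC fraction above a threshold) used only to show how method-specific keyword arguments flow through; it is not a GeneFinder method.

```julia
# Hypothetical generic predicate: the coding criterion is delegated to
# `method`, and any keyword arguments are forwarded to it unchanged.
iscoding(sequence::AbstractString, method::Function; kwargs...) =
    method(sequence; kwargs...)::Bool

# Toy criterion for illustration only: GC fraction strictly above `threshold`.
function gccontent_above(sequence::AbstractString; threshold::Real = 0.5)
    isempty(sequence) && return false
    gc = count(c -> c in ('G', 'C', 'g', 'c'), sequence)
    return gc / length(sequence) > threshold
end

iscoding("ATGGCCGCGTAA", gccontent_above; threshold = 0.5)  # 7/12 GC
iscoding("ATATATTAA", gccontent_above)                      # no GC
```

The point of the design is that a model-based criterion such as the naive one would just be another function passed as method, carrying its model in its own keyword arguments.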

Reconsider start and stop in the IO methods

The ORF struct is normally defined with a location field of type UnitRange{Int64}, which has been used with the default step of 1. So even when the strand field of an ORF is '-', the start is always taken from the "positive" strand range, i.e., start is always the lower coordinate.

This is not an issue for the get_orfs_* methods since they use the following treatment:

Base.getindex(sequence::NucleicSeqOrView{A}, orf::ORF) where {A} = orf.strand == '+' ? (@view sequence[orf.location]) : reverse_complement(@view sequence[orf.location])

An inverted range (start greater than stop) is, for instance, how negative-strand ORFs are displayed in PHANOTATE outputs (cf. its source code).
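The two conventions can be compared with a small Base-only sketch. It assumes plain uppercase DNA strings instead of BioSequences types, so revcomp is a hypothetical stand-in for reverse_complement; orfsequence mirrors the getindex rule above (ascending location, reverse-complement for '-'), and displaycoords shows the PHANOTATE-style inverted pair. Both helper names are illustrative only.

```julia
# Complement table for uppercase DNA (assumption: only A, C, G, T, N occur).
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G', 'N' => 'N')

# Stand-in for BioSequences' reverse_complement on a plain String.
revcomp(seq::AbstractString) = String([COMPLEMENT[c] for c in reverse(seq)])

# Same rule as the getindex method above: `location` is always an ascending
# range on the + strand; '-' ORFs are reverse-complemented after slicing.
orfsequence(seq::AbstractString, location::UnitRange{Int}, strand::Char) =
    strand == '+' ? seq[location] : revcomp(seq[location])

# PHANOTATE-style display: a '-' ORF is reported as an inverted (start > stop) pair.
displaycoords(location::UnitRange{Int}, strand::Char) =
    strand == '+' ? (first(location), last(location)) : (last(location), first(location))

# Made-up example: positions 3:11 hold a reverse-strand ATG...TAA ORF.
seq = "GGTTACCCCATGG"
orfsequence(seq, 3:11, '-')   # "ATGGGGTAA"
displaycoords(3:11, '-')      # (11, 3)
```

Keeping the ascending range internally and inverting only at display or write time would keep slicing uniform while still allowing an inverted-range output format where one is wanted.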

The things to reconsider are:

Are the other ORF applications using this convention as well?
Would revamping this bring some benefits to the performance?
The write methods should at least advertise this. However, judging by a previous test with IGV, only positive (ascending) ranges are found at start and stop.
