pyblast's People

Contributors

jsgounot

pyblast's Issues

Getting the best hit for each query in a multifasta file

I'm blasting genes from a multifasta file. I'd like to get the best hit (defined by evalue) for each query in the multifasta file.
This is how I'm blasting:

bcl = BCLine6("blastn", query=ref_trans, subject=db,
              word_size=11, evalue=0.01, outfmt="evalue sstrand")
res = bcl.run(ncore=cores, quiet=True)

This returns a dataframe with hits.

But I'm at a loss as to how to get the best hit for each query without splitting each query into its own fasta file and blasting the files one at a time.
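
One possible approach, sketched here with pandas only, is to keep a query identifier column in the result and pick the lowest evalue per query after a single run. The column names `qseqid`, `sseqid` and `evalue` are assumptions (they are standard BLAST tabular field names, but the issue does not show which columns the returned dataframe actually has), and `best_hit_per_query` is a hypothetical helper, not part of pyblast.

```python
import pandas as pd

def best_hit_per_query(hits, query_col="qseqid", evalue_col="evalue"):
    """Return the row with the lowest e-value for each query ID.

    Hypothetical helper: the column names are assumptions about the hit table.
    """
    # Make sure index labels are unique so idxmin points at exactly one row.
    hits = hits.reset_index(drop=True)
    # For each query, find the index label of its smallest e-value
    # (on ties, the first occurrence is kept).
    best_idx = hits.groupby(query_col)[evalue_col].idxmin()
    return hits.loc[best_idx].reset_index(drop=True)

# Toy table standing in for the dataframe returned by the BLAST run.
hits = pd.DataFrame({
    "qseqid": ["geneA", "geneA", "geneB", "geneB"],
    "sseqid": ["s1", "s2", "s3", "s4"],
    "evalue": [1e-5, 1e-20, 0.001, 1e-8],
})
best = best_hit_per_query(hits)
print(best)
```

This only works if the query ID is part of the output, so the `outfmt` string would presumably need to include it alongside `evalue` and `sstrand`.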

Retrieving the entire query sequence from a blast, not just the 'local' aligned HSP

Hi there

I am using PyBlast to find similar sequences (obviously) in a different genome. None of these are model organisms.
The queries are fasta files, each with the same gene but from different strains. So I have about 12 genes and 10 strains.
I need to get the length of the entire query, not just the length of the part of the query that aligned to the database. Do you know if that's accessible?

If not, I will have to blast each sequence in each fasta file separately, which will be slower than blasting the whole file at once.
This is what it looks like now:

for q in qdir.glob('*.fasta'):
    bcl = BCLine6("blastn", query=q, subject=db,
                  word_size=11, evalue=0.01, outfmt="evalue sstrand")
    res = bcl.run(ncore=8, quiet=True)
    print(f'query length = {len(q.seq)}')

But of course q is not the actual sequence record, but a file name.
And qlen is the length of the aligned query, not the length of the whole query sequence.
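
For what it's worth, the BLAST+ documentation describes the `qlen` format specifier as the query sequence length, so it may be worth checking whether that column already holds the full length. If it does not, a fallback is to read the full lengths straight from the fasta file and look them up by query ID. The sketch below uses only the standard library; `fasta_lengths` is a hypothetical helper, and it assumes the query ID is the first word of each header line.

```python
from pathlib import Path
import tempfile

def fasta_lengths(path):
    """Map each record ID (first word of the header) to its full sequence length."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: the ID is everything up to the first whitespace.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                # Sequence line: add its residues to the current record.
                lengths[current] += len(line)
    return lengths

# Small demonstration on a temporary two-record fasta file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fasta"
    demo.write_text(">strain1 gene X\nATGC\nATG\n>strain2\nATGCATGC\n")
    demo_lengths = fasta_lengths(demo)
print(demo_lengths)
```

If the result dataframe has a query ID column (assumed here to be called `qseqid`), the lengths could then be attached with something like `res["qseqid"].map(fasta_lengths(q))`, which keeps one BLAST run per file.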

BCLine output is one long string, new lines are ignored

Printing the output of BCLine produces one long string, without new lines. The \n are printed literally, and not as new lines.
I actually want to see the alignments blast is producing, but that's impossible when the output looks like this:

2900 | organism=Plasmodium_relictum_SGS1-like \n| gene_product=conserved Plasmodium protein, \nunknown function | transcript_product=conserved Plasmodium \nprotein, unknown function | location=PRELSG_05_v1:89752-91639(-) \n| length=1542 | sequence_SO=chromosome | SO=protein_coding_gene \n| is_pseudo=false\nLength=1542\n\n Score = 28.8 bits (15),  Expect = 4.3\n Identities = 15/15 (100%), Gaps = 0/15 (0%)\n Strand=Plus/Plus\n\nQuery  161   ATTAAGAGAGCAATT  175\n             |||||||||||||||\nSbjct  1455  ATTAAGAGAGCAATT  1469\n\n\n> PRELSG_0501100.1 | gene=PRELSG_0501100 | organism=Plasmodium_relictum_SGS1-like \n| gene_product=thrombospondin-related protein \n1, putative | transcript_product=thrombospondin-related \nprotein 1, putative | location=PRELSG_05_v1:39592-42177(+) \n| length=2586 | sequence_SO=chromosome | SO=protein_coding_gene \n| is_pseudo=false\nLength=2586\n\n Score = 28.8 bits (15),  Expect = 4.3\n Identities = 17/18 (94%), Gaps = 0/18 (0%)\n Strand=Plus/Plus\n\nQuery  26   AACTACCAATTAataaat  43\n            ||| ||||||||||||||\nSbjct  461  AACAACCAATTAATAAAT  478\n\n\n> PRELSG_0302700.1 | gene=PRELSG_0302700 | organism=Plasmodium_relictum_SGS1-like \n| gene_product=pre-mRNA-processing-splicing \nfactor 8, putative | transcript_product=pre-mRNA-processing-splicing \nfactor 8, putative | location=PRELSG_03_v1:146754-156416(+) \n| length=9510 | sequence_SO=chromosome | SO=protein_coding_gene \n| is_pseudo=false\nLength=9510\n\n Score = 28.8 bits (15),  Expect = 4.3\n Identities = 15/15 (100%), Gaps = 0/15 (0%)\n 
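
A common cause of literal `\n` in printed output is printing a bytes object, or the string form of one, instead of decoded text. That is an assumption here, since the issue does not show how the output was captured. The sketch below covers both cases with plain Python; `render_newlines` is a hypothetical helper, not a pyblast function.

```python
def render_newlines(raw):
    """Return text with real line breaks, whether raw is bytes or an escaped str."""
    if isinstance(raw, (bytes, bytearray)):
        # Bytes print with visible escapes; decoding yields real newlines.
        return raw.decode("utf-8", errors="replace")
    # A str holding a literal backslash followed by 'n', e.g. from str(some_bytes).
    return raw.replace("\\r\\n", "\n").replace("\\n", "\n")

# Short sample with escaped newlines, mimicking the output shown above.
sample = "Score = 28.8 bits (15)\\n Identities = 15/15 (100%)\\nStrand=Plus/Plus"
fixed = render_newlines(sample)
print(fixed)
```

Replacing escapes in a string is a blunt tool and would also rewrite any backslash-n sequence that is genuinely part of the data, so decoding the original bytes is the safer route when they are available.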
