jsgounot / pyblast
Local and global alignment using multiprocessing
I'm BLASTing genes from a multi-FASTA file. I'd like to get the best hit (lowest e-value) for each query in the multi-FASTA file.
This is how I'm blasting:
```python
from pyblast import BCLine6

bcl = BCLine6("blastn", query=ref_trans,
              subject=db, word_size=11, evalue=0.01, outfmt="evalue sstrand")
res = bcl.run(ncore=cores, quiet=True)
```
This returns a dataframe with hits.
But I'm at a loss as to how to get the best hit for each query without splitting each query into its own FASTA file and BLASTing the files one at a time.
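One way to do this without splitting the FASTA file is to post-process the DataFrame: sort the hits so that the lowest e-value for each query comes first, then keep the first row per query. This is a minimal sketch that assumes the hit table uses the standard BLAST tabular column names `qseqid`, `evalue` and, optionally, `bitscore` (used only as a tie-breaker). How pyblast actually labels its columns is an assumption, so check `res.columns` and adjust. `best_hit_per_query` is a hypothetical helper, not part of pyblast.

```python
import pandas as pd


def best_hit_per_query(hits: pd.DataFrame,
                       query_col: str = "qseqid",
                       evalue_col: str = "evalue",
                       bitscore_col: str = "bitscore") -> pd.DataFrame:
    """Return one row per query: the hit with the lowest e-value.

    Ties on e-value are broken by the highest bitscore when that column exists.
    Column names are assumptions; pass the names your hit table actually uses.
    """
    keys = [query_col, evalue_col]
    ascending = [True, True]
    if bitscore_col in hits.columns:
        keys.append(bitscore_col)
        ascending.append(False)
    ordered = hits.sort_values(keys, ascending=ascending)
    # After sorting, the first row for each query is its best hit.
    return ordered.drop_duplicates(subset=query_col, keep="first").reset_index(drop=True)


# Toy hit table standing in for the DataFrame returned by bcl.run().
hits = pd.DataFrame({
    "qseqid": ["geneA", "geneA", "geneB", "geneB"],
    "sseqid": ["chr1", "chr2", "chr3", "chr1"],
    "evalue": [1e-5, 1e-20, 0.003, 0.003],
    "bitscore": [40.1, 90.5, 30.2, 33.0],
})
best = best_hit_per_query(hits)
print(best[["qseqid", "sseqid", "evalue"]])
```

Applied to the result above, this would be `best = best_hit_per_query(res)`. A shorter alternative is `res.loc[res.groupby("qseqid")["evalue"].idxmin()]`, which keeps the first minimum on ties instead of consulting the bitscore.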
Hi there
I am using PyBlast to find similar sequences (obviously) in a different genome. None of these are model organisms.
The queries are FASTA files, each containing one gene's sequences from different strains. So I have about 12 genes and 10 strains.
I need to get the length of the entire query, not just the length of the part of the query that aligned to the database. Do you know if that's accessible?
If not, I will have to BLAST each sequence in each FASTA file separately, which will be slower than BLASTing each whole file at once.
This is what it looks like now:
```python
for q in qdir.glob('*.fasta'):
    bcl = BCLine6("blastn", query=q,
                  subject=db, word_size=11, evalue=0.01, outfmt="evalue sstrand")
    res = bcl.run(ncore=8, quiet=True)
    print(f'query length = {len(q.seq)}')  # fails: q is a Path (file name), not a sequence record
```
But of course q is not the actual sequence record, but a file name.
And `qlen` is the length of the aligned query, not the length of the whole query sequence.
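Since the queries are already on disk, one option is to read the full sequence lengths directly from the query FASTA files and join them onto the hits by query ID, which avoids BLASTing sequences one at a time. This is a standard-library sketch; `fasta_lengths` and `lengths_from_dir` are hypothetical helpers. It assumes the hit table's query ID equals the first whitespace-separated word of each FASTA header (usually the case for BLAST tabular output) and that IDs are unique across files. Separately, the BLAST+ manual documents the tabular field `qlen` as the query sequence length (the alignment length is the separate `length` field), so it may be worth checking whether adding `qlen` to `outfmt` already gives the full length.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each record ID (first word after '>') to its full sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0
        elif current is not None:
            # Sequences may be wrapped over several lines, so sum every line.
            lengths[current] += len(line)
    return lengths


def lengths_from_dir(qdir: Path) -> Dict[str, int]:
    """Collect lengths from every *.fasta file in qdir (IDs assumed unique)."""
    lengths: Dict[str, int] = {}
    for path in sorted(qdir.glob("*.fasta")):
        with path.open() as handle:
            lengths.update(fasta_lengths(handle))
    return lengths


example = """>geneA_strain1 conserved protein
ATGAAA
CCC
>geneA_strain2
ATG
"""
print(fasta_lengths(example.splitlines()))  # {'geneA_strain1': 9, 'geneA_strain2': 3}
```

With a hit table in hand, something like `res["query_length"] = res["qseqid"].map(lengths_from_dir(qdir))` would add the full length to every row (the `qseqid` column name is again an assumption).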
Printing the output of BCLine produces one long string without line breaks: the `\n` characters are printed literally instead of as new lines.
I actually want to see the alignments BLAST is producing, but that's impossible when the output looks like this:
2900 | organism=Plasmodium_relictum_SGS1-like \n| gene_product=conserved Plasmodium protein, \nunknown function | transcript_product=conserved Plasmodium \nprotein, unknown function | location=PRELSG_05_v1:89752-91639(-) \n| length=1542 | sequence_SO=chromosome | SO=protein_coding_gene \n| is_pseudo=false\nLength=1542\n\n Score = 28.8 bits (15), Expect = 4.3\n Identities = 15/15 (100%), Gaps = 0/15 (0%)\n Strand=Plus/Plus\n\nQuery 161 ATTAAGAGAGCAATT 175\n |||||||||||||||\nSbjct 1455 ATTAAGAGAGCAATT 1469\n\n\n> PRELSG_0501100.1 | gene=PRELSG_0501100 | organism=Plasmodium_relictum_SGS1-like \n| gene_product=thrombospondin-related protein \n1, putative | transcript_product=thrombospondin-related \nprotein 1, putative | location=PRELSG_05_v1:39592-42177(+) \n| length=2586 | sequence_SO=chromosome | SO=protein_coding_gene \n| is_pseudo=false\nLength=2586\n\n Score = 28.8 bits (15), Expect = 4.3\n Identities = 17/18 (94%), Gaps = 0/18 (0%)\n Strand=Plus/Plus\n\nQuery 26 AACTACCAATTAataaat 43\n ||| ||||||||||||||\nSbjct 461 AACAACCAATTAATAAAT 478\n\n\n> PRELSG_0302700.1 | gene=PRELSG_0302700 | organism=Plasmodium_relictum_SGS1-like \n| gene_product=pre-mRNA-processing-splicing \nfactor 8, putative | transcript_product=pre-mRNA-processing-splicing \nfactor 8, putative | location=PRELSG_03_v1:146754-156416(+) \n| length=9510 | sequence_SO=chromosome | SO=protein_coding_gene \n| is_pseudo=false\nLength=9510\n\n Score = 28.8 bits (15), Expect = 4.3\n Identities = 15/15 (100%), Gaps = 0/15 (0%)\n
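Seeing literal `\n` usually means the representation of the text is being printed rather than the text itself: for example, printing a `bytes` object, printing a list or tuple that contains the string, or printing a string that really contains backslash-n pairs. Which of these BCLine returns isn't stated here, so this sketch (a hypothetical `readable` helper) handles all three cases; pass its result to a plain `print()`.

```python
def readable(output) -> str:
    """Return BLAST text output as a str with real line breaks."""
    if isinstance(output, bytes):
        # Printing bytes shows escaped newlines; decode to str first.
        output = output.decode("utf-8", errors="replace")
    elif isinstance(output, (list, tuple)):
        # Printing a container shows each element's repr; join the elements instead.
        output = "\n".join(readable(item) for item in output)
    elif not isinstance(output, str):
        output = str(output)
    if "\\n" in output and "\n" not in output:
        # The text holds literal backslash-n pairs: turn them into real newlines.
        output = output.replace("\\n", "\n")
    return output


print(readable("Score = 28.8 bits (15), Expect = 4.3\\n Strand=Plus/Plus"))
```

The `"\n" not in output` guard leaves text alone if it already contains real line breaks, so normal output passes through unchanged.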
I have set `quiet = True` for my BLAST run, but `ncore : 8 - available : 8` is still printed to the screen.
Am I missing something? This shouldn't happen, right? I presume the message comes from a subprocess and isn't being suppressed.
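If the `ncore : 8 - available : 8` line is written by a Python `print` call inside the library (an assumption; it could also come from elsewhere), it can be silenced from the calling code with `contextlib.redirect_stdout`, which swaps `sys.stdout` for a buffer while the call runs. This does not catch text a child process writes straight to the terminal's file descriptor; that would need redirection at the `os.dup2` level instead. `run_silently` and `noisy_run` are hypothetical names used for illustration.

```python
import contextlib
import io
from typing import Any, Callable, Tuple


def run_silently(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, str]:
    """Call func, capturing anything printed to sys.stdout instead of showing it."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


# Hypothetical stand-in for a library call that prints a status line.
def noisy_run(ncore: int) -> int:
    print(f"ncore : {ncore} - available : {ncore}")
    return ncore * 2


value, captured = run_silently(noisy_run, 8)
```

With pyblast this would look like `res, log = run_silently(bcl.run, ncore=8, quiet=True)`; the suppressed text stays available in `log` if it is ever needed.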