
metablastr's People

Contributors

gogleva, hajkd

metablastr's Issues

blast_nt for nucleotide against NCBI database

Hello,
Thank you for developing this set of tools.
I am trying to run my nucleotide sequences against the NCBI database.
From your vignette, it should be blast_n()?
But the function is not available in the version of metablastr I just installed with:

devtools::install_github("drostlab/metablastr", build_vignettes = TRUE, dependencies = TRUE)

Do you have an alternative to this function for nucleotides?
https://drostlab.github.io/metablastr/reference/blast_protein_to_nr_database.html

Does the database need to be stored locally? Any advice?

Thank you!

Feature request: add taxon ID for each BLAST hit

The default BLAST tabular output (outfmt 7) does not include a taxon ID for each hit, yet taxon IDs are very important for downstream phylogenetic analysis. An indirect approach is to run blastdbcmd with the %T option once the results are obtained, but this is very time-consuming because you have to retrieve the taxon IDs first and then map them back to the original BLAST results. Could metablastr provide a function that maps taxon IDs onto the BLAST output?
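One possible workaround outside metablastr: BLAST+ accepts a custom tabular format that includes the `staxids` field, e.g. `-outfmt "6 qseqid sseqid evalue bitscore staxids"`, which is only filled in when the database carries taxonomy information. If the taxon IDs were instead retrieved separately with blastdbcmd, mapping them back is a single left join. The base-R sketch below assumes column names `query_id`, `subject_id`, `accession` and `taxid` and uses toy values; none of these come from metablastr's output format.

```r
# Attach taxon IDs to BLAST hits via a left join on the subject accession.
# Column names (query_id, subject_id, accession, taxid) are assumptions.
add_taxids <- function(hits, taxmap) {
  out <- merge(hits, taxmap,
               by.x = "subject_id", by.y = "accession",
               all.x = TRUE, sort = FALSE)  # keep hits without a taxid (NA)
  # merge() may reorder rows; restore a stable query/subject order
  out[order(out$query_id, out$subject_id), , drop = FALSE]
}

# Toy example data (illustrative only)
hits <- data.frame(
  query_id   = c("q1", "q1", "q2"),
  subject_id = c("XP_001", "XP_002", "XP_003"),
  evalue     = c(1e-50, 1e-20, 1e-10),
  stringsAsFactors = FALSE
)
taxmap <- data.frame(
  accession = c("XP_001", "XP_002"),
  taxid     = c(9606L, 10090L),
  stringsAsFactors = FALSE
)
annotated <- add_taxids(hits, taxmap)
print(annotated)
```

A left join keeps every hit, so subjects missing from the lookup show up as `NA` rather than silently disappearing.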

additional functionality when extracting random sequences

Hi,

I just stumbled across your package today while looking for a way to extract the same set of 10,000 1000kb loci randomly from five different genomes. Thank you so much for writing this package, I think it's going to be a huge help in this process. If you're still open to hearing requests for additional functionality/flexibility, I have two questions for you. For your function 'extract_random_seqs_from_genome', is it possible to set it so that replacement = FALSE once each random locus is selected and extracted (or is this already true)? Would it also be possible to set a minimum distance between randomly selected loci, e.g. if I wanted to specify that all loci are at least 50bp apart?

Thank you,
Amy
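A minimal sketch of sampling loci without replacement and with a minimum spacing, for a single sequence of known length. It does not describe how extract_random_seqs_from_genome() works internally; `sample_spaced_loci()` is a hypothetical helper. It draws n distinct values and shifts them so that consecutive loci are at least `min_gap` bp apart, which samples uniformly over all valid placements without a rejection loop.

```r
# Draw n non-overlapping loci of fixed width on a sequence of length total_len,
# with at least min_gap bp between the end of one locus and the start of the next.
# Each valid placement is equally likely; no locus can be drawn twice.
sample_spaced_loci <- function(total_len, n, width, min_gap = 0) {
  # free positions left over once loci and mandatory gaps are laid end to end
  slack <- total_len - n * width - (n - 1) * min_gap
  if (slack < 0) stop("sequence too short for n loci with this width and gap")
  # n distinct values from 1..(slack + n), mapped to nondecreasing offsets 0..slack
  z <- sort(sample.int(slack + n, n))
  offsets <- z - seq_len(n)
  starts <- offsets + (seq_len(n) - 1) * (width + min_gap) + 1
  data.frame(start = starts, end = starts + width - 1)
}

set.seed(42)
loci <- sample_spaced_loci(total_len = 100000, n = 50, width = 1000, min_gap = 50)
head(loci)
```

For a multi-chromosome genome, one option is to call this per chromosome, allocating n in proportion to chromosome length.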

Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") : cannot open file ''

Hi, I finally downloaded BLAST+ and added it to the path in R. Now I am having issues loading my FASTA file. Thanks!

I tried to load it from my hard drive and from my desktop and got the same errors:

sealionfeces <- readDNAStringSet("H:/ONRdolphinsealionpooled/sealionfecespooled/canu/medaka/consensus.fasta",
+ package = "rBLAST"))
Error: unexpected ')' in: " sealionfeces <- readDNAStringSet("H:/ONRdolphinsealionpooled/sealionfecespooled/canu/medaka/consensus.fasta", package = "rBLAST"))"

sealionfeces <- readDNAStringSet(system.file("H:/ONRdolphinsealionpooled/sealionfecespooled/canu/medaka/consensus.fasta",
+ package = "rBLAST"))
Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") : cannot open file ''

sealionfeces <- readDNAStringSet(system.file("C:\Users\katie\OneDrive\Desktop\R\sealionfecespooled.consensus.fasta",
Error: '\U' used without hex digits in character string starting ""C:\U"

sealionfeces <- readDNAStringSet(system.file("C:/Users/katie/OneDrive/Desktop/R/sealionfecespooled.consensus.fasta",
+ package = "rBLAST"))
Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") : cannot open file ''
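Several separate problems show up in these attempts. The first one has one closing parenthesis too many. `system.file()` resolves paths inside an installed package's directory and returns an empty string when nothing matches, which produces exactly the `cannot open file ''` error, so the FASTA path should be passed straight to readDNAStringSet(). Finally, a backslash starts an escape sequence in an R string (hence `'\U' used without hex digits`), so Windows paths need forward slashes or doubled backslashes. The sketch below demonstrates these points in base R; `read_fasta_checked()` is a hypothetical wrapper that assumes Biostrings is installed.

```r
# system.file() only looks inside an installed package's folder; for an
# arbitrary path it returns "" (rBLAST behaves like stats here)
p <- system.file("C:/Users/katie/OneDrive/Desktop/R/sealionfecespooled.consensus.fasta",
                 package = "stats")
print(p)

# Backslashes must be doubled inside R strings; forward slashes also work on Windows
fasta_path <- "C:/Users/katie/OneDrive/Desktop/R/sealionfecespooled.consensus.fasta"
win_path   <- "C:\\Users\\katie\\OneDrive\\Desktop\\R\\sealionfecespooled.consensus.fasta"
stopifnot(identical(gsub("\\", "/", win_path, fixed = TRUE), fasta_path))

# Fail early with a readable message if the file is missing,
# then read it directly (no system.file() wrapper)
read_fasta_checked <- function(path) {
  if (!file.exists(path)) stop("FASTA file not found: ", path)
  Biostrings::readDNAStringSet(path)
}
```

Usage would then be `sealionfeces <- read_fasta_checked(fasta_path)`.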

Feature Request: exclude sequences with Ns

In extract_random_seqs_from_genome(), it would be helpful to have an option that lets users decide whether to exclude sequences with too many Ns (e.g. N > 0 or N > 10%). For me, it would be fine for this filtering step to happen after X sequences are drawn (e.g. if 100 sequences are drawn and 10 are excluded because they have too many Ns, the result is 90 sequences). It would be great to have a short printout at the end stating how many sequences were drawn and how many were filtered out because of Ns.
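Until such an option exists, the filter can be applied after drawing. A minimal base-R sketch, assuming the drawn sequences are available as a character vector (a Biostrings DNAStringSet can be converted with as.character()); `filter_by_n()` is a hypothetical helper, not part of metablastr. `max_n_frac = 0` drops any sequence containing an N, and `max_n_frac = 0.1` drops those with more than 10% Ns.

```r
# Keep sequences whose fraction of N/n bases is at most max_n_frac and print
# how many were drawn, removed and kept. Empty sequences are also dropped.
filter_by_n <- function(seqs, max_n_frac = 0) {
  n_count <- nchar(seqs) - nchar(gsub("[Nn]", "", seqs))
  n_frac  <- n_count / nchar(seqs)
  keep    <- !is.na(n_frac) & n_frac <= max_n_frac
  message(sprintf("%d sequences drawn, %d removed because of Ns, %d kept",
                  length(seqs), sum(!keep), sum(keep)))
  seqs[keep]
}

s <- c("ACGTACGTAC", "ACGTNCGTAC", "NNNNNCGTAC")  # 0%, 10%, 50% N
filter_by_n(s, max_n_frac = 0.1)
```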

How can I load sequences from a CSV file and do the massive Blast search locally?

Hi,

I have a CSV dataset of DNA sequences processed by the DADA2 pipeline and am wondering how I can load these sequences and BLAST them automatically using the metablastr package.

In the CSV file, each row represents a unique sequence and each column corresponds to a sample (see attached image):
[Screenshot of the CSV file, 2022-08-11]

These COI gene sequences are clean and ready to BLAST directly on the NCBI website. Most of the sequences are from mammalian and avian blood. Since there are over 2000 sequences, it would be great if I could use this package to load and BLAST them automatically instead of manually.

Any R script that achieves this with the metablastr package would be highly appreciated. Thank you.

Best,

Gabriel
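BLAST tools, including the metablastr wrappers, take sequences from a FASTA file, so a first step is converting the CSV. A minimal base-R sketch, assuming the sequences sit in the first column of the CSV (if the table was written with write.csv() and the sequences were row names, they also end up in that first column); `csv_to_fasta()`, `write_fasta()` and the `seq_1`, `seq_2`, ... IDs are hypothetical.

```r
# Write sequences to FASTA, one header line (">id") followed by one sequence line.
write_fasta <- function(seqs, path, ids = paste0("seq_", seq_along(seqs))) {
  # rbind() gives a 2 x n matrix; reading it column-wise interleaves headers and sequences
  lines <- as.vector(rbind(paste0(">", ids), seqs))
  writeLines(lines, path)
  invisible(path)
}

# Assumes the sequences are in the first CSV column (hypothetical layout)
csv_to_fasta <- function(csv_path, fasta_path) {
  tab <- read.csv(csv_path, stringsAsFactors = FALSE, check.names = FALSE)
  write_fasta(toupper(tab[[1]]), fasta_path)
}
```

The resulting file could then be supplied as the `query` of a metablastr nucleotide search such as blast_nucleotide_to_nucleotide(); check the package reference for its exact arguments. Keeping short IDs in the FASTA headers makes it easy to join the hits back to the per-sample counts in the CSV.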

blast_best_reciprocal_hit nucleotide-protein comparison task error

I ran the following code:

blast_test_reciprocal <- blast_best_reciprocal_hit(
    query   = 'A.fasta', ##protein sequence
    subject = 'B.fasta', ##nucleotide sequence
    search_type = "protein_to_nucleotide",
    task = "tblastn",
    evalue = 0.000001,
    output.path = tempdir(),
    db.import  = FALSE)

which gives the following result:


Starting 'tblastn -task tblastn' with  query: A.fasta and subject: B.fasta using 1 core(s) ...

BLAST search finished! The BLAST output file was imported into the running R session. The BLAST output file has been stored at: C:/Users/A_B_tblastn_eval_1e-06.blast_tbl
Error: Please choose a nucleotide-protein comparison task that is supported by BLAST: task = 'blastx' or task = 'blastx-fast'.

How can I specify the second BLAST task ('blastx', for the reverse search) when running tblastn?

function blast_protein_to_protein with argument is.subject.db = TRUE doesn't work with nr database

I have downloaded the nr BLAST database from NCBI; it contains the files shown in the attached screenshot. When I run the command below, it throws the error shown. What input should I give as a BLAST-able database?

blast_test <- blast_protein_to_protein(
        query   =  "aa_query.fasta",
        subject = "path/to/nr/db/nr",
        is.subject.db = TRUE,
        output.path = tempdir(),
        db.import = FALSE,
        cores = 4)

Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") : 
  cannot open file '/Users/chiragparsania/Documents/Database/nr_protein_db/nr'

[Screenshot of the downloaded nr database files]
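A BLAST database is addressed by its prefix (here `.../nr`) rather than by a single file: the prefix itself does not exist on disk, only files such as `nr.pal` or `nr.00.phr` do. The traceback shows a FASTA reader trying to open the prefix as a file, which suggests the subject was read as sequences despite `is.subject.db = TRUE`; that reading is an inference, not confirmed in the thread. The base-R check below (`blast_db_files()` is a hypothetical helper) lists the files that belong to a prefix, which at least confirms the path points at a database.

```r
# List the files that make up a BLAST database referenced by its prefix.
# Note: basename(prefix) is used as a regex, so avoid special characters in it.
blast_db_files <- function(prefix) {
  list.files(dirname(prefix),
             pattern = paste0("^", basename(prefix), "\\."),
             full.names = TRUE)
}

# e.g. blast_db_files("path/to/nr/db/nr") should list nr.pal, nr.00.phr, ...
```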

Error: Internal error in `dict_hash_with()`: Dictionary is full.

Thank you to the metablastr developers for sharing this tool with the community. I'd like to ask for your help with an error I encountered when running blast_best_reciprocal_hit(). Both BLASTP searches seem to have completed, but the reciprocal best hit step appears to have failed. One database I'm using has around 25M records, and I'm wondering whether this could be the reason. For reference, here are snippets of the error:

BLAST search finished! The BLAST output file was imported into the running R session. The BLAST output file has been stored at: /expt/datb/data/HiC/Rp_RNA-Seq/embryo/timeseries-vs-RpedSuzhou/rna-seq_rpedszv-reannot/annot_gene_sym/metablastr_bbh/metazoa_refseq_biopython-validated_Riptortus_pedestris_SZV_blastp-fast_eval_1e-05.blast_tbl
Error: Internal error in dict_hash_with(): Dictionary is full.

rlang::last_error()
<error/rlang_error>
Internal error in dict_hash_with(): Dictionary is full.
Backtrace:

  1. metablastr::blast_best_reciprocal_hit(...)
  2. metablastr::blast_best_hit(...)
  3. dplyr:::group_by.data.frame(blast_res, query_id)
  4. dplyr::grouped_df(groups$data, groups$group_names, .drop)
  5. dplyr:::compute_groups(data, vars, drop = drop)
  6. dplyr:::vec_split_id_order(group_vars)
  7. vctrs::vec_group_loc(x)
    Run rlang::last_trace() to see the full context.

Is it also possible to skip the BLAST step and proceed directly to the reciprocal best hit step when re-running this procedure?

Thank you very much!
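Reusing the stored output is possible in principle by reading the table back and selecting best hits without dplyr grouping. The sketch assumes a plain 12-column tabular file (outfmt 6); the `.blast_tbl` files written by metablastr may use a different column layout, and the names in `blast6_cols` are this sketch's own choice. Using order() plus duplicated() may avoid the vctrs grouping step shown in the backtrace, but it has not been tested on a 25M-record result.

```r
# Column names for a standard 12-column outfmt 6 table (names chosen for this sketch)
blast6_cols <- c("query_id", "subject_id", "perc_identity", "alig_length",
                 "mismatches", "gap_openings", "q_start", "q_end",
                 "s_start", "s_end", "evalue", "bit_score")

# Read a previously written tabular BLAST/DIAMOND file; '#' lines (outfmt 7) are skipped
read_blast6 <- function(path) {
  read.delim(path, header = FALSE, col.names = blast6_cols,
             quote = "", comment.char = "#", stringsAsFactors = FALSE)
}

# One best hit per query: lowest e-value first, ties broken by highest bit score
best_hit_per_query <- function(hits) {
  hits <- hits[order(hits$query_id, hits$evalue, -hits$bit_score), ]
  hits[!duplicated(hits$query_id), ]
}
```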

Reciprocal hits from previously run BLAST searches

Hi!

I have run the following searches to obtain BLAST tabular output (format 6):

  1. A vs B with Diamond
  2. B vs A with Diamond

I read AvB.hits and BvA.hits using the read_blast() function. Is it possible to identify the reciprocal hits from these, or should I rerun the searches with metablastr?
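Reciprocal best hits can be computed from two existing tables without rerunning anything: take the best hit per query in each direction and keep the pairs that agree. The sketch assumes both data frames have `query_id`, `subject_id`, `evalue` and `bit_score` columns (rename them first if your import produces different names); `reciprocal_best_hits()` is hypothetical and separate from metablastr's own blast_best_reciprocal_hit().

```r
# Reciprocal best hits from two precomputed searches (A vs B and B vs A).
reciprocal_best_hits <- function(ab, ba) {
  # best hit per query: lowest e-value, ties broken by highest bit score
  best <- function(h) {
    h <- h[order(h$query_id, h$evalue, -h$bit_score), ]
    h[!duplicated(h$query_id), c("query_id", "subject_id")]
  }
  fwd <- best(ab)  # for each A query, its best B subject
  bwd <- best(ba)  # for each B query, its best A subject
  # flip the reverse pairs so B sits in subject_id and A in query_id
  names(bwd) <- c("subject_id", "query_id")
  # keep only A-B pairs that are each other's best hit
  merge(fwd, bwd, by = c("query_id", "subject_id"))
}
```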
