hifiadapterfilt's Introduction

HiFiAdapterFilt

Convert .bam to .fastq and remove reads with remnant PacBio adapter sequences

Dependencies:

  • BamTools
  • BLAST+

Optional:

  • NCBI FCS Adaptor
  • pigz

Add script and database to your path using:

export PATH=$PATH:[PATH TO HiFiAdapterFilt]
export PATH=$PATH:[PATH TO HiFiAdapterFilt]/DB

Usage

bash hifiadapterfilt.sh [ -p file Prefix ] [ -l minimum Length of adapter match to remove. Default=44 ] [ -m minimum percent Match of adapter to remove. Default=97 ] [ -t Number of threads for blastn. Default=8 ] [ -o outdirectory prefix Default=. ]

All flags are optional.

If no -p argument is provided, the script will run on all sequence files (.bam, .fastq, .fastq.gz, .fq, .fq.gz) in the working directory.
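To preview which files a run without -p would pick up, the extension list above can be sketched as a small shell function. This is only an illustration of that list, not the script's own discovery logic; the function name `list_sequence_files` and the demo file names are hypothetical.

```shell
# Hypothetical helper: print the names of files in a directory whose
# extensions match the list in the README (.bam, .fastq, .fastq.gz, .fq, .fq.gz).
list_sequence_files() {
    dir="${1:-.}"
    for f in "$dir"/*.bam "$dir"/*.fastq "$dir"/*.fastq.gz "$dir"/*.fq "$dir"/*.fq.gz; do
        # An unmatched glob stays literal, so confirm the file really exists.
        [ -f "$f" ] && basename "$f"
    done
    return 0
}

# Demo in a throwaway directory with made-up file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.bam" "$demo_dir/b.fastq.gz" "$demo_dir/c.fq" "$demo_dir/notes.txt"
list_sequence_files "$demo_dir"   # lists a.bam, b.fastq.gz and c.fq, but not notes.txt
rm -rf "$demo_dir"
```

Running the function in the working directory before launching the pipeline is a cheap way to catch stray sequence files that would otherwise be processed.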

If using NCBI FCS Adaptor to detect adapter-contaminated reads, use the hifiadapterfiltFCS.sh script:

bash hifiadapterfiltFCS.sh -f <FCS adaptor output file> -r <HiFi reads file> [-t Number of threads for pigz. Default=8] [-o outdirectory prefix Default=.]

Outputs

  • {prefix}.contaminant.blastout (Output of BLAST search)
  • {prefix}.blocklist (Headers of PB adapter contaminated reads to be removed)
  • {prefix}.filt.fastq.gz (Fastq reads free of PB adapter sequence ready for assembly)
  • {prefix}.stats (File with simple math on number of reads removed, etc)

Citation

If this script is useful to you, please cite the following in your publication:

@article{HiFiAdapterFilt,
   author = {Sim, Sheina B. and Corpuz, Renee L. and Simmonds, Tyler J. and Geib, Scott M.},
   title = {HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly},
   journal = {BMC Genomics},
   volume = {23},
   number = {1},
   pages = {157},
   ISSN = {1471-2164},
   DOI = {10.1186/s12864-022-08375-1},
   url = {https://doi.org/10.1186/s12864-022-08375-1},
   year = {2022},
   type = {Journal Article}
}

Sheina B. Sim
USDA-ARS
US Pacific Basin Agricultural Research Service
Hilo, Hawaii, 96720 USA
[email protected]

This script is in the public domain in the United States per 17 U.S.C. § 105

hifiadapterfilt's Issues

Blast database issue

Hi Sheina,

I was eager to try your adapter trimming tool but I experienced some issues I thought you might want to know about.

I noticed in the 2.0.0 release that line 6 is:
DBpath=$(echo $PATH | sed 's/:/\n/g' | grep "HiFiAdapterFilt/DB" | head -n 1)

However, I think it should be:
DBpath=$(echo $PATH | sed 's/:/\n/g' | grep "HiFiAdapterFilt-2.0.0/DB" | head -n 1)

That still isn't solving a problem I'm having, though. I'm consistently getting the following error unless I hard-code the path to the blast database:

BLAST Database error: No alias or index file found for nucleotide database [/pacbio_vectors_db] in search path [/grps2/kmk/2022-04-20_Ap212/PB744_AP212_5C_ULTP/r64069_20220330_183728/C1/outputs/adapter_filter_test::]

Also, do you know if the adapters used in the ultra-low PacBio library prep are the same as the ones in the pacbio_vectors_db file?

Thanks!
Kevin
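The lookup quoted in this issue can be reproduced in isolation to see why a versioned directory name yields an empty DBpath. The sketch below wraps the same pipeline in a hypothetical function, `resolve_db_path`, that takes a PATH-style string as an argument. It uses `tr` instead of `sed` so it behaves the same on non-GNU systems. The `/opt/...` paths are invented for the demo. That the script appends `/pacbio_vectors_db` to DBpath is an assumption inferred from the `[/pacbio_vectors_db]` in the error message. The looser pattern in `resolve_db_path_loose` is one possible workaround, not the author's fix.

```shell
# Same lookup as the quoted line, but reading a PATH-style string from $1.
resolve_db_path() {
    printf '%s\n' "$1" | tr ':' '\n' | grep "HiFiAdapterFilt/DB" | head -n 1
}

# Possible workaround (an assumption, not the author's fix): also accept
# suffixed directory names such as HiFiAdapterFilt-2.0.0 or HiFiAdapterFilt-master.
resolve_db_path_loose() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -E 'HiFiAdapterFilt[^/]*/DB/?$' | head -n 1
}

# Hypothetical install locations, one per directory naming scheme.
for p in "/usr/bin:/opt/HiFiAdapterFilt/DB" \
         "/usr/bin:/opt/HiFiAdapterFilt-2.0.0/DB" \
         "/usr/bin:/opt/HiFiAdapterFilt-master/DB"; do
    strict=$(resolve_db_path "$p")
    loose=$(resolve_db_path_loose "$p")
    echo "strict: ${strict}/pacbio_vectors_db   loose: ${loose}/pacbio_vectors_db"
done
```

For the two suffixed directory names the strict lookup returns nothing, so the database argument collapses to `/pacbio_vectors_db`, which matches the name shown in the error. Renaming the install directory to `HiFiAdapterFilt` should have the same effect as loosening the pattern.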

pacbio_vectors_db error

I encountered an error:

BLAST Database error: No alias or index file found for nucleotide database [/pacbio_vectors_db] in search path [/home/liangc/Downloads/PacBio:/scratch/ncbi_blast+/db:]

To solve this problem, I have tried two approaches:
(1): Inside my /Downloads/PacBio folder, I created a folder named "pacbio_vectors_db" and put all relevant files within this folder.
(2): Inside my /Downloads/PacBio folder, I placed all the individual files that I copied from HiFiAdapterFilt-master/DB/.

Neither approach works.

In my .bashrc file, I have put these commands:

export BLASTDB=/scratch/ncbi_blast+/db/
export PATH=$PATH:/home/liangc/Software/HiFiAdapterFilt-master
export PATH=$PATH:/home/liangc/Software/HiFiAdapterFilt-master/DB
export PATH=$PATH:/home/liangc/Downloads/PacBio/pacbio_vectors_db
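When debugging this kind of error, it can help to check which PATH entry, if any, actually holds the database files. BLAST looks for a nucleotide index file (`.nin`) or an alias file (`.nal`) with the database name. The sketch below walks a PATH-style string and prints the first directory containing either one. The function name `find_db_dir` is hypothetical, and the demo uses an empty placeholder file rather than a real BLAST database.

```shell
# Hypothetical helper: print the first directory in a colon-separated list ($2)
# that contains <name>.nin or <name>.nal for the database name in $1.
find_db_dir() {
    name="$1"
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -f "$dir/$name.nin" ] || [ -f "$dir/$name.nal" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Demo with a placeholder index file in a throwaway directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/DB"
: > "$demo_dir/DB/pacbio_vectors_db.nin"
find_db_dir pacbio_vectors_db "/nonexistent-dir:$demo_dir/DB" \
    || echo "pacbio_vectors_db not found in any listed directory"
rm -rf "$demo_dir"
```

In a real session the second argument would be `"$PATH"`. If the function finds the database but the script still fails, the name-based lookup inside the script is the more likely cause than missing files.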

add --cite or --version

Would it be possible to add a --cite or --version flag to the script? It would make integration into our pipelines that much easier.

expected time

What are the typical run times for the pipeline? I have a 40GB fastq and I expect it could be anywhere from 5-15 hours, based on transfer rates.

fasta file type

Dear @sheinasim,
Thanks for your adapter-filtering solution.
Is there any way to accept FASTA files for HiFi reads?

Looking forward to your reply.

Best
Johnson

Blast version

Hi!

I am trying to run this tool on my PacBio reads, but I get a general error from BLAST (as if some options were not compatible), which makes me think there may be a problem with my BLAST version.
What is the recommended BLAST version to use?

Thanks!!

.temp_file_list: No such file or directory

Hi,
I am trying to run HiFiAdapterFilt on a raw PacBio hifi .bam file using the following command:

bash pbadapterfilt.sh -p /path/to/bam/220209_hifi_reads.bam -o HiFiAdaptFilt_220209

and I keep getting the following error:
path/to/bam/220209_hifi_reads.bam.temp_file_list: No such file or directory

It seems to be making the output directory, but cuts out immediately. Do you know what could be going on here?

performance question

Thanks for this nice tool. Just a quick question. Does HiFiAdapterFilt perform better if I run it on one large input file? Or would it be better to split my input file in several smaller fastq.gz files (for example with fastqsplitter), which I then store in one directory on which I let HiFiAdapterFilt run?
