
fast5seek's Introduction

⚠️ This project is effectively redundant now. The functionality can be reproduced with fast5_subset. See #7 for a further explanation. ⚠️

fast5seek


This program takes one or more directories of fast5 files along with any number of fastq, SAM, or BAM files. It outputs the full path of every fast5 file in the provided directories whose read is present in the fastq, SAM, or BAM files.

Installation

Using python3, run

pip3 install fast5seek

Usage

It's pretty straightforward to use:

fast5seek -i /path/to/fast5s -r in.fastq in.bam in.sam -o out.txt

This will write all matching fast5 paths to a text file called out.txt, with each path on a new line.

It reads in.fastq, in.bam, and in.sam and extracts the read id from each record. It then goes through all the fast5 files under /path/to/fast5s and checks whether each file's read id is in the set of read ids collected from those reference files. If it is, the path to the file is written to its own line in out.txt.
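
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the set-membership idea, not fast5seek's actual code: the function names are hypothetical, only fastq input is handled, and a plain dictionary of path to read id stands in for opening each fast5 file.

```python
from typing import Dict, Iterable, List, Set


def read_ids_from_fastq(lines: Iterable[str]) -> Set[str]:
    """Collect read ids from the header line of each 4-line fastq record."""
    read_ids = set()
    for line_number, line in enumerate(lines):
        # Only every fourth line is a header; quality lines may also start with '@'.
        if line_number % 4 == 0 and line.startswith("@"):
            fields = line[1:].split()
            if fields:
                # The read id is the first whitespace-delimited token after '@'.
                read_ids.add(fields[0])
    return read_ids


def matching_fast5_paths(fast5_read_ids: Dict[str, str], wanted: Set[str]) -> List[str]:
    """Return the paths whose read id is in `wanted`.

    `fast5_read_ids` maps a fast5 path to its read id. It is an assumption
    that stands in for reading the id out of each file.
    """
    return [path for path, read_id in fast5_read_ids.items() if read_id in wanted]


fastq = ["@read-a runid=1", "ACGT", "+", "!!!!", "@read-b", "TTGA", "+", "!!!!"]
index = {"/data/a.fast5": "read-a", "/data/c.fast5": "read-c"}
print("\n".join(matching_fast5_paths(index, read_ids_from_fastq(fastq))))
```

Here only /data/a.fast5 is printed, because read-c never appears in the fastq records. Holding the read ids in a set keeps each membership check cheap however many reference records there are.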

If no output (-o) is given, it will write the output to stdout.

There is also an option, -m/--mapped, to only search for read ids among mapped records in a BAM or SAM file.
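
As a rough illustration of what restricting the search to mapped records means, the sketch below keeps only the read ids of SAM records whose FLAG field does not have the unmapped bit (0x4) set. It parses plain SAM text only; a BAM file is binary and would need a library such as pysam. The function name is hypothetical and this is not necessarily how fast5seek implements the option.

```python
from typing import Iterable, Set

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def mapped_read_ids(sam_lines: Iterable[str]) -> Set[str]:
    """Return the read ids (QNAME) of SAM records that are mapped."""
    read_ids = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        if int(fields[1]) & UNMAPPED_FLAG:
            continue  # the unmapped bit is set, so skip this record
        read_ids.add(fields[0])
    return read_ids


sam = [
    "@HD\tVN:1.6",
    "read-a\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t!!!!",
    "read-b\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!",
]
print(mapped_read_ids(sam))
```

In this example read-b is dropped because its FLAG is 4, so only read-a remains.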

Gzipped fastq files can also be provided.
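
One common way to accept both plain and gzipped fastq files is to pick the opener from the file name. The helper below is a minimal sketch of that approach using Python's standard gzip module. The function name and the reliance on a .gz suffix are assumptions made for illustration, and fast5seek may detect compression differently.

```python
import gzip
from typing import IO


def open_fastq(path: str) -> IO[str]:
    """Open a fastq file as text, decompressing when the name ends in .gz."""
    if path.endswith(".gz"):
        # "rt" makes gzip.open yield decoded text lines rather than bytes.
        return gzip.open(path, "rt")
    return open(path, "r")
```

Either way the caller receives a text handle, so the code that extracts read ids does not need to know whether the input was compressed.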

Full Usage

usage: fast5seek [-h] -i FAST5_DIR [FAST5_DIR ...] -r REFERENCE
                 [REFERENCE ...] [-o OUTPUT] [-m] [--log_level {0,1,2,3,4,5}]
                 [--no_progress_bar]

Outputs paths of all the fast5 files from a given directory that are contained within a fastq or BAM/SAM file.

Please see the github page for more detailed instructions.
https://github.com/mbhall88/fast5seek/

Contributors:
Michael Hall (github@mbhall88)
Darrin Schultz (github@conchoecia)

optional arguments:
  -h, --help            show this help message and exit
  -i FAST5_DIR [FAST5_DIR ...], --fast5_dir FAST5_DIR [FAST5_DIR ...]
                        Directory of fast5 files you want to query. Program
                        will walk recursively through subdirectories.
  -r REFERENCE [REFERENCE ...], --reference REFERENCE [REFERENCE ...]
                        Fastq or BAM/SAM file(s).
  -o OUTPUT, --output OUTPUT
                        Filename to write fast5 paths to. If nothing is
                        entered, it will write the paths to STDOUT.
  -m, --mapped          Only extract read ids for mapped reads in BAM/SAM
                        files.
  --log_level {0,1,2,3,4,5}
                        Level of logging. 0 is none, 5 is for debugging.
                        Default is 4 which will report info, warnings, errors,
                        and critical information.
  --no_progress_bar     Do not display progress bar.

Multiple Inputs

Multiple directories and files can be passed as arguments, so there is no need to merge BAM, SAM, or fastq files.

fast5seek -i /myfast5/dir/1/ /other/fast5/dir/2/ -r reads.sorted.bam reads2.bam

For example, if all of your fast5 directories share the prefix myfast5_ and the reference files end in .sorted.bam, you can use wildcards to find them all, provided they are in the same directory.

fast5seek -i myfast5_* -r *.sorted.bam

Piping Commands

If you wanted to pipe these paths into another program, you could do something like

mkdir subset_dir/
fast5seek -i /path/to/fast5s/ -r in.fastq | xargs cp -t subset_dir/

The above example would copy the fast5 files that are found in your fastq to subset_dir/.

Recommended Usage

However, because opening fast5 files is computationally intensive, we recommend that you first save the output of fast5seek to a file for safekeeping, then proceed with analysis like so:

mkdir subset_dir/
fast5seek -i /path/to/fast5s/ -r in.fastq -o mapped_reads.txt
cat mapped_reads.txt | xargs cp -t subset_dir/
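
The -t option of cp is a GNU extension and is not available in the BSD cp shipped with macOS, so the copy step may be easier to do in Python on some systems. The sketch below reads a file of paths such as mapped_reads.txt and copies each listed file into a destination directory. The function name is hypothetical and the code is not part of fast5seek.

```python
import shutil
from pathlib import Path
from typing import List


def copy_listed_files(path_list: str, destination: str) -> List[str]:
    """Copy every file listed (one path per line) in `path_list` into `destination`."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    copied = []
    with open(path_list) as handle:
        for line in handle:
            source = line.strip()
            if not source:
                continue  # ignore blank lines
            # shutil.copy returns the path of the newly created file.
            copied.append(str(shutil.copy(source, dest)))
    return copied
```

Calling copy_listed_files("mapped_reads.txt", "subset_dir") would then play the role of the xargs line above.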

Contact

If there are any issues with the program, please open an issue on the GitHub page.

Contributors

Michael Hall @mbhall88
Darrin Schultz @conchoecia


fast5seek's Issues

channel_id does not exist

I try to run fast5seek like so:

fast5seek -i path/to/signal -r ref.fq -o result.txt

However, it fails (at a different percentage of completion each time):

[07/03/2018 12:43:56 PM]:INFO: Scanning 423989 fast5 files for presence in references
Percent: [#####-----------------------------------] 11.29% Traceback (most recent call last):
  File ".../miniconda3/envs/tombo/bin/fast5seek", line 11, in <module>
    sys.exit(cli())
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/fast5seek/cli.py", line 105, in cli
    fast5seek.main(args)
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/fast5seek/fast5seek.py", line 316, in main
    write_progress_bar_to)
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/fast5seek/fast5seek.py", line 232, in collect_present_fast5_filepaths
    fast5_read_id, fast5_run_id = get_read_and_run_id(filepath)
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/fast5seek/fast5seek.py", line 49, in get_read_and_run_id
    fast5 = Fast5File(filepath)
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/ont_fast5_api/fast5_file.py", line 70, in __init__
    self.status = Fast5Info(self.filename)
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/ont_fast5_api/fast5_info.py", line 80, in __init__
    self.channel = handle['UniqueGlobalKey/channel_id'].attrs.get('channel_number')
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/pip-gkjbrkhs-build/h5py/_objects.c:2840)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/pip-gkjbrkhs-build/h5py/_objects.c:2798)
  File ".../miniconda3/envs/tombo/lib/python3.6/site-packages/h5py/_hl/group.py", line 169, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/pip-gkjbrkhs-build/h5py/_objects.c:2840)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/pip-gkjbrkhs-build/h5py/_objects.c:2798)
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/pip-gkjbrkhs-build/h5py/h5o.c:3734)
KeyError: "Unable to open object (Object 'channel_id' doesn't exist)"

What could be the problem here? I did use Tombo on these files, and it has a flag in the resquiggle command to ignore locked hdf5 files -- might this be the problem?

Could fast5seek be adjusted to either unlock such files (if this is the problem) or simply catch the error, report it ("could not process x") and continue?

Thanks,
Adrian
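
A minimal sketch of the skip-and-report behaviour requested above, for illustration only. The function and parameter names are hypothetical, and read_id_of stands in for whatever call opens a fast5 file and returns its read id. It is not fast5seek's actual code.

```python
import logging
from typing import Callable, Iterable, List, Set


def collect_matching_paths(
    paths: Iterable[str],
    wanted: Set[str],
    read_id_of: Callable[[str], str],
) -> List[str]:
    """Return the paths whose read id is in `wanted`, skipping unreadable files."""
    matches = []
    for path in paths:
        try:
            read_id = read_id_of(path)
        except Exception as error:  # e.g. the KeyError shown in the traceback above
            logging.warning("Could not process %s: %s", path, error)
            continue  # move on to the next file instead of aborting the scan
        if read_id in wanted:
            matches.append(path)
    return matches
```

With this shape a single damaged or locked file produces one warning in the log and the scan carries on with the remaining files.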

UnboundLocalError: local variable 'global_keys' referenced before assignment

Hi,
Thank you very much for developing fast5seek. It is the only available tool I have found for extracting fast5 files based on fastq files. Your help with this issue would be much appreciated.

The cmd I used was fast5seek -i /nanopore/minknow/data/20220720_1052_MN30213_FAQ39030_a7c337e6/fast5_pass/ -r readname.fastq -o test.txt --log_level 4.

The error report is this:

[03/28/2023 08:53:27 PM]:INFO: Starting fast5seek.
[03/28/2023 08:53:27 PM]:INFO: Looking for reads to extract in the following ref files:
[03/28/2023 08:53:27 PM]:INFO: readname.fastq
[03/28/2023 08:53:27 PM]:INFO: Found 359 unique read ids
[03/28/2023 08:53:27 PM]:INFO: Found 8 fast5 files.
[03/28/2023 08:53:27 PM]:INFO: Scanning 8 fast5 files for presence in references
Traceback (most recent call last):
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/ont_fast5_api/fast5_file.py", line 277, in _initialise_file
    self.status = Fast5Info(self.filename)
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/ont_fast5_api/fast5_info.py", line 77, in __init__
    if 'tracking_id' not in global_keys and not self._legacy_version():
UnboundLocalError: local variable 'global_keys' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhany0s/software/anaconda3/envs/Methylation/bin/fast5seek", line 10, in <module>
    sys.exit(cli())
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/fast5seek/cli.py", line 105, in cli
    fast5seek.main(args)
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/fast5seek/fast5seek.py", line 315, in main
    final_filepath_list = collect_present_fast5_filepaths(fast5_filepaths,
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/fast5seek/fast5seek.py", line 232, in collect_present_fast5_filepaths
    fast5_read_id, fast5_run_id = get_read_and_run_id(filepath)
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/fast5seek/fast5seek.py", line 49, in get_read_and_run_id
    fast5 = Fast5File(filepath)
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/ont_fast5_api/fast5_file.py", line 42, in __init__
    self._initialise_file()
  File "/home/zhany0s/software/anaconda3/envs/Methylation/lib/python3.10/site-packages/ont_fast5_api/fast5_file.py", line 281, in _initialise_file
    raise Fast5FileTypeError("Failed to initialise single-read Fast5File: '{}'".format(self.filename))
ont_fast5_api.fast5_file.Fast5FileTypeError: Failed to initialise single-read Fast5File: '/nanopore/minknow/data/20220720_1052_MN30213_FAQ39030_a7c337e6/fast5_pass/FAQ39030_pass_ebc85fce_6.fast5'

Would you let me know how I can solve this problem?

Many thanks in advance,
