Taxconverter

A lightweight tool for one purpose only: to unify the outputs of different taxonomic classifiers. Currently supports Centrifuge v1.0.4, Kraken2 v2.1.3, Metabuli v1.0.1 and MetaMaps v.633d2e. The output files of these tools are converted to MMseqs2 format.

Having everything converted to MMseqs2 format means the output file will only have two informative columns: sequence identifiers (the 1st column) and taxonomic labels on all levels from domain to species concatenated with ";" (9th column). The rest is filled with zeros or by trivial parsing.
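As a minimal sketch of how the converted file can be consumed, the snippet below reads the 1st and 9th columns and splits the lineage on ";". It assumes the file is tab-separated with no header row, which you should check against your own output. The function name `read_lineages` and the sample row are hypothetical and only serve as an illustration.

```python
import csv
import io


def read_lineages(handle):
    """Map sequence identifier -> list of taxonomic labels.

    Assumes tab-separated rows with at least 9 columns and no header row.
    """
    lineages = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 9:
            continue  # skip blank or malformed rows
        seq_id, lineage = row[0], row[8]  # 1st and 9th columns
        lineages[seq_id] = [label for label in lineage.split(";") if label]
    return lineages


# Hypothetical row: the identifier and labels are made up for illustration.
sample = "contig_1\t0\t0\t0\t0\t0\t0\t0\tBacteria;Pseudomonadota;Gammaproteobacteria\n"
result = read_lineages(io.StringIO(sample))
print(result)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.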

Suggestions and contributions are most welcome.

Installation

  1. Clone this repo and install the package from source (a pip release is a work in progress).
git clone git@github.com:RasmussenLab/taxconverter.git
cd taxconverter
pip install -e .
  2. Unzip the two files from data/lineage.zip (38.3 MB): ncbi_lineage.csv (246.2 MB) and metabuli_lineage.csv (58.1 MB), and place them in the data/ folder.
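If a command-line unzip tool is unavailable or fails on your system, the archive can also be extracted with Python's standard `zipfile` module. This is a minimal sketch, not part of taxconverter: the helper name `extract_lineage` is hypothetical, and it assumes the two CSV files sit at the top level of the archive (if they are stored in a subfolder, move them into data/ afterwards).

```python
import zipfile
from pathlib import Path


def extract_lineage(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call, run from the repository root:
# extract_lineage("data/lineage.zip", "data")
```

Note that `gunzip` and `gzip -d` target gzip streams and are not general tools for multi-file .zip archives, so a zip-aware extractor is the safer choice here.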

Usage

To convert Centrifuge, Kraken2 and MetaMaps outputs, provide one file with the taxonomy annotation results:

taxconverter centrifuge -i centrifuge_annotations.tsv -o result.tsv
taxconverter kraken2 -i kraken2_annotations.tsv -o result.tsv
taxconverter metamaps -i metamaps_annotations.tsv -o result.tsv

To convert a Metabuli output, provide two files, one with the _classifications.tsv suffix and one with the _report.tsv suffix:

taxconverter metabuli -c metabuli_classifications.tsv -r metabuli_report.tsv -o result.tsv

For more help, run taxconverter -h or the help of a specific subcommand: taxconverter metabuli -h, taxconverter centrifuge -h, taxconverter kraken2 -h, taxconverter metamaps -h.
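When many samples need converting, the commands above can be driven from Python. The sketch below only assembles the argument lists shown in this section (`-i`, `-o`, `-c`, `-r`) and hands them to `subprocess.run`. The helper names `build_command` and `convert` are hypothetical and not part of taxconverter.

```python
import subprocess

# Subcommands that take a single annotation file via -i.
SINGLE_INPUT_TOOLS = {"centrifuge", "kraken2", "metamaps"}


def build_command(tool, output, input_file=None, classifications=None, report=None):
    """Return the taxconverter argument list for one conversion."""
    if tool in SINGLE_INPUT_TOOLS:
        if input_file is None:
            raise ValueError(f"{tool} needs input_file")
        return ["taxconverter", tool, "-i", input_file, "-o", output]
    if tool == "metabuli":
        if classifications is None or report is None:
            raise ValueError("metabuli needs classifications and report")
        return ["taxconverter", "metabuli", "-c", classifications,
                "-r", report, "-o", output]
    raise ValueError(f"unsupported tool: {tool}")


def convert(tool, output, **kwargs):
    """Run one conversion; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(tool, output, **kwargs), check=True)
```

For example, `convert("kraken2", "result.tsv", input_file="kraken2_annotations.tsv")` runs the same command as the Kraken2 line above.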

References and links

This package complements the Taxometer tool, which refines taxonomic annotations from any classifier using contig k-mers and co-abundances (preprint).

taxconverter's Issues

some incorrect paths

Thanks for this great tool - I am hoping to run taxvamb on my data so it is great to have this tool to convert my centrifuge annotations.

On my system, I got the following path errors after running taxconverter centrifuge -i <my input> -o <my output>:

2024-03-11 16:08:46.665 | INFO     | taxconverter.__main__:ncbi_lineage:45 - Loading NCBI lineage
Traceback (most recent call last):
  File "/my/university/system/.conda/envs/anvio/bin/taxconverter", line 33, in <module>
    sys.exit(load_entry_point('taxconverter', 'console_scripts', 'taxconverter')())
  File "/my/university/system/software/taxconverter/taxconverter/__main__.py", line 171, in main
    df_result = centrifuge_data(args.input)
  File "/my/university/system/software/taxconverter/taxconverter/__main__.py", line 38, in wrapper
    df = func(*args, **kwargs)
  File "/my/university/system/software/taxconverter/taxconverter/__main__.py", line 80, in centrifuge_data
    df_ncbi = ncbi_lineage()
  File "/my/university/system/software/taxconverter/taxconverter/__main__.py", line 46, in ncbi_lineage
    df_ncbi = pd.read_csv(NCBI_LINEAGE, quoting=csv.QUOTE_NONE)

It was solved by changing NCBI_LINEAGE and METABULI_LINEAGE to:

NCBI_LINEAGE = os.path.join(parentdir, 'data/ncbi_lineage.csv')
METABULI_LINEAGE = os.path.join(parentdir, 'data/metabuli_lineage.csv')
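For readers hitting the same error, here is a minimal sketch of anchoring such paths to the module's location rather than the current working directory. How `parentdir` is actually defined in taxconverter is not shown in this issue, so the layout below (module one level below the repository root, data/ at the root) is an assumption, and the helper name `lineage_paths` is hypothetical.

```python
import os


def lineage_paths(module_file):
    """Return (ncbi, metabuli) lineage paths for a module one level below the repo root."""
    # Assumed layout: <repo>/taxconverter/__main__.py and <repo>/data/*.csv
    currentdir = os.path.dirname(os.path.abspath(module_file))
    parentdir = os.path.dirname(currentdir)
    return (
        os.path.join(parentdir, "data", "ncbi_lineage.csv"),
        os.path.join(parentdir, "data", "metabuli_lineage.csv"),
    )


# Inside the module itself this would be called as lineage_paths(__file__).
```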

I also had an issue with unzipping the lineage.zip file on my university system. None of gunzip, gzip -d, or unzip worked; my workaround was to unzip it on my Mac (which worked) and then copy the data over. This might just be a me problem, though. I can submit this as a separate issue if you think it would be helpful, though it is very minor and might just need a clearer installation note in the readme, if anything.

Thanks again,
Will
