
16s-nf's People

Contributors

dfornika, jpalmer37, sherrie9


16s-nf's Issues

Include test case for new user setup

It would be nice to include one or more 16S rRNA sequences in the repository to be used as testing examples when setting up the pipeline. This will streamline pipeline setup and Conda installation for users following the 16S SOP.

As a starting point, I will aim to add a single rRNA sequence with corresponding annotations to a new test folder.

Add support for filtering results based on a set of regexes

Allow the blast results to be filtered based on one or more regular expressions (regexes). This can be used to filter out 'uncultured' results, or other similar categories of results that are not wanted in the final blast output.

In order to flexibly support multiple regexes, they can be supplied in a multi-line file, with one line per regex. The file should be supplied using the --filter_regexes flag, and it should be optional.
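A minimal sketch of how the regex filter described above might work, written in Python for illustration only. The helper names (`load_filter_regexes`, `filter_blast_records`) and the `subject_names` field are assumptions, not names taken from the pipeline; only the one-regex-per-line file format comes from this issue.

```python
import re
from pathlib import Path


def load_filter_regexes(path):
    """Read one regex per line from `path`, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [re.compile(line.strip()) for line in lines if line.strip()]


def filter_blast_records(records, regexes, field="subject_names"):
    """Return the records whose `field` matches none of the regexes.

    `field` is a hypothetical column name; substitute whichever blast
    output column holds the description to be screened.
    """
    return [
        rec
        for rec in records
        if not any(rx.search(rec.get(field, "")) for rx in regexes)
    ]
```

Because the flag is optional, an empty regex list leaves the records unchanged. Matching here is case-sensitive; whether the pipeline should ignore case is left open.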

Incorporate database metadata (name & version) into blast outputs

We've adopted a strategy for tracking database versioning and other metadata that involves storing a metadata.json file alongside the blast database files.

At minimum, the metadata.json file must include these fields:

{
  "version": "1.1",
  "date": "2020-06-24"
}

...though additional fields may be present (they will be ignored by this pipeline).

The version info will be included in each record of the blast results, under the fields:

database_version
database_date

In addition, the ID field of the databases.csv will be included under the field:

database_name

Database metadata should be collected by default, but it can be skipped using the --no_db_metadata flag.
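As a rough illustration of the proposal above, the following Python sketch reads metadata.json and attaches the three fields to each record. The function names are hypothetical, and it assumes the database path is a directory that contains metadata.json; neither detail is confirmed by this issue.

```python
import json
from pathlib import Path


def load_db_metadata(db_dir, db_id):
    """Build the database_* fields for one database.

    Assumes `db_dir` is a directory holding metadata.json next to the
    blast database files. Extra keys in the file are ignored.
    """
    metadata = json.loads((Path(db_dir) / "metadata.json").read_text())
    missing = [key for key in ("version", "date") if key not in metadata]
    if missing:
        raise ValueError(f"metadata.json is missing required fields: {missing}")
    return {
        "database_name": db_id,
        "database_version": metadata["version"],
        "database_date": metadata["date"],
    }


def annotate_records(records, db_metadata):
    """Return copies of the records with the metadata fields added."""
    return [{**rec, **db_metadata} for rec in records]
```

When --no_db_metadata is set, one option is to pass an empty dict to `annotate_records`, which returns the records without the extra fields.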

Support multiple databases

We may want to blast against multiple databases, and aggregate the outputs into a single report.

To support an arbitrary number of databases, we could pass the databases as a samplesheet (dbsheet?) like this:

ID,DBNAME,PATH
ncbi_16S,16S_ribosomal_RNA,/path/to/ncbi/2023-09-19_1.1_16S_ribosomal_RNA
silva,SSURef_NR99_tax_silva_trunc,/path/to/silva/2020-08-24_138.1_SSURef_NR99_tax_silva_trunc
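One possible way to read such a sheet and combine per-database results is sketched below in Python. The function names and the shape of `results_by_db` are assumptions made for illustration; only the ID, DBNAME and PATH columns come from the example above.

```python
import csv
import io

REQUIRED_COLUMNS = {"ID", "DBNAME", "PATH"}


def parse_dbsheet(handle):
    """Parse a dbsheet (CSV with ID, DBNAME, PATH) into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"dbsheet is missing columns: {sorted(missing)}")
    return [
        {"id": row["ID"], "dbname": row["DBNAME"], "path": row["PATH"]}
        for row in reader
    ]


def aggregate_results(results_by_db):
    """Merge per-database record lists into one list.

    `results_by_db` is a hypothetical mapping of database ID to a list
    of result dicts; each record is tagged with its database ID.
    """
    combined = []
    for db_id, records in results_by_db.items():
        combined.extend({**rec, "database_name": db_id} for rec in records)
    return combined
```

For example, `parse_dbsheet(io.StringIO(text))` accepts the sheet as a string, and `parse_dbsheet(open(path, newline=""))` reads it from a file.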
