qiime2 / q2-fragment-insertion

License: BSD 3-Clause "New" or "Revised" License

Python 97.90% Makefile 0.15% TeX 1.95%

q2-fragment-insertion's Introduction

q2-fragment-insertion

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-fragment-insertion's Issues

Fix travis runs

Bug Description

Travis isn't set up correctly.

Plugin description would be nice

Improvement Description
This plugin doesn't have a top-level description registered for the user docs

Proposed Behavior
A plain-language description would make it easier for users to understand what q2-fragment-insertion can do, and could increase usage.

Improvement Description
I thought about the FeatureData[Taxonomy] artifact and Daniel's warnings about the quality of the assigned taxonomic labels, which depend on the quality of the placements of taxonomic labels in the reference phylogeny. Furthermore, fragment insertion is not unambiguous but results in a distribution of positions, and I remember Siavash suggesting his program TIPP for taxonomy assignment. Thus, I think we had better organize creation of a FeatureData[Taxonomy] as a separate function instead of integrating it into the main function ("sepp").

Proposed Behavior
Currently, I am thinking about two alternatives to generate a FeatureData[Taxonomy]:

classify-paths: the current method which collects all taxonomic labels along the path from tip to root. Single input would be the Phylogeny[Rooted] artifact.
classify-otus: For every inserted fragment, we traverse the tree from tip to root. At every step, we check whether the current sub-tree contains any OTU nodes. If so, we stop; otherwise we continue the same procedure with the parent node. Once we have found one (or maybe several) OTUs, we look up their assigned taxonomy lineage in the Greengenes/Silva taxonomy table for the corresponding reference tree. In case of several OTUs, we report the longest common prefix. This would require two inputs: the Phylogeny[Rooted] artifact and the taxonomy table from Greengenes with two columns, OTU-ID and lineage-string. This is the more conservative method and should only produce results on par with current Greengenes-based taxonomy assignment algorithms.
classify-tipp: A future development could use Siavash's TIPP to generate taxonomic lineages.
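
The classify-otus idea can be sketched roughly as follows. This is only an illustration, not the plugin's implementation: the Node class, the function names, the "Unassigned" fallback and the semicolon-separated lineage format are assumptions made for the example, and fragment IDs are assumed never to appear in the taxonomy table.

```python
class Node:
    """Minimal rooted-tree node (hypothetical stand-in for a real tree class)."""

    def __init__(self, name=None, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def tips(self):
        """Yield all tips below (or equal to) this node."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.tips()


def common_lineage_prefix(lineages):
    """Longest common prefix of lineage strings, compared rank by rank."""
    split = [[rank.strip() for rank in lin.split(';')] for lin in lineages]
    prefix = []
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return '; '.join(prefix)


def classify_fragment(tip, taxonomy):
    """Climb from an inserted tip until a sub-tree contains reference OTUs.

    taxonomy maps OTU-ID -> lineage string. Starting at the parent is
    equivalent to starting at the tip as long as the fragment itself is
    not listed in the taxonomy (an assumption of this sketch).
    """
    node = tip.parent
    while node is not None:
        otus = [t.name for t in node.tips() if t.name in taxonomy]
        if otus:
            return common_lineage_prefix([taxonomy[o] for o in otus])
        node = node.parent
    return 'Unassigned'
```

A fragment whose sister is a single OTU inherits that OTU's full lineage, while a fragment that has to climb further up only gets the ranks shared by all OTUs in the first sub-tree that contains any.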

Questions
@wasade what are your thoughts?

Classify - otus experimental had blank taxon

Bug Description
I ran classify-otus-experimental and got an error saying that one of the entries was a float and couldn't be parsed. After digging into the taxonomy file, it looks like one of the entries was blank, so it was read as NaN, which broke the parsing. Once I deleted the line, everything ran fine.

Questions
Any chance something could be coded to avoid this issue in the future?
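
One possible way to guard against this, sketched here with the standard library only: read the two-column taxonomy table and replace blank lineages with a placeholder instead of letting them become NaN. The function name, the "Unassigned" placeholder and the headerless tab-separated layout are assumptions for this example, not what the plugin currently does.

```python
import csv
import io


def load_taxonomy(handle, placeholder='Unassigned'):
    """Read OTU-ID<TAB>lineage rows, substituting a placeholder for blanks.

    Returns the taxonomy dict and the list of OTU-IDs whose lineage was
    blank, so that the caller can warn about them.
    """
    taxonomy, blanks = {}, []
    for row in csv.reader(handle, delimiter='\t'):
        if not row or not row[0].strip():
            continue  # skip completely empty lines
        otu_id = row[0].strip()
        lineage = row[1].strip() if len(row) > 1 else ''
        if not lineage:
            blanks.append(otu_id)
            lineage = placeholder
        taxonomy[otu_id] = lineage
    return taxonomy, blanks


example = io.StringIO("otu1\tk__Bacteria; p__Firmicutes\notu2\t\n")
taxonomy, blanks = load_taxonomy(example)
```

Reporting the blank IDs back to the user, rather than silently dropping them, keeps the table consistent with the reference tree.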

use Siavash's version 4.3.4b

Siavash merged my patch, thus we can drop that and switch to his latest tagged version

seq names with whitespaces

Improvement Description
check what happens to seq names with whitespaces
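
A small pre-flight check could at least flag such names before SEPP is run. The sketch below only scans FASTA header lines; the function name is hypothetical and how SEPP actually treats whitespace in names is not verified here.

```python
def headers_with_whitespace(fasta_lines):
    """Return FASTA headers (without '>') that contain any whitespace."""
    flagged = []
    for line in fasta_lines:
        if line.startswith('>'):
            header = line[1:].rstrip('\r\n')
            if any(ch.isspace() for ch in header):
                flagged.append(header)
    return flagged


lines = ['>seq1\n', 'ACGT\n', '>seq 2\n', 'ACGT\n']
suspicious = headers_with_whitespace(lines)
```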

update library.qiime2.org install instructions

Those are outdated, now that this is in the core distribution.

Wire up to busywork

Versioneer
conda recipe
something else?

open ToDos

Improvement Description
PR #66 introduced major changes to the plugin and we have some open ToDos. Let us keep track of them here with this list:

Publish new QZAs on docs.qiime2.org for GreenGenes and SILVA, for the new database format defined above, using sepp-refs as source data.
Move readme information from this repo to library.qiime2.org
think about a mechanism to provide default values within the new reference set qzas for e.g. alignment_subset_size that can override plugin defaults, but also can be overwritten by user flags
find a way to check consistency between reference alignment/tree and raxml info file when creating reference sets
rough in method to merge database components
rough in method to destructure database components
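
For the default-value item above, one conceivable precedence scheme is: user flags override reference-set defaults, which in turn override plugin defaults. The sketch below uses collections.ChainMap; the function name and the numeric values are placeholders for illustration, and treating None as "flag not given" is an assumption.

```python
from collections import ChainMap


def resolve_parameters(plugin_defaults, reference_defaults, user_flags):
    """Resolve parameters with precedence user > reference > plugin."""
    # Flags the user did not set are assumed to arrive as None.
    given = {k: v for k, v in user_flags.items() if v is not None}
    return dict(ChainMap(given, reference_defaults, plugin_defaults))


resolved = resolve_parameters(
    plugin_defaults={'alignment_subset_size': 1000, 'threads': 1},
    reference_defaults={'alignment_subset_size': 500},
    user_flags={'alignment_subset_size': None, 'threads': 4},
)
```

Here the reference set wins for alignment_subset_size because the user left it unset, while the explicit threads flag overrides the plugin default.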

References
PR #66

Trouble with Silva 128 in classify-otus-experimental ?

I ran fragment insertion as seen in the tutorial. I used the Silva 128 provided tree and alignment. My insertion tree was created, and I filtered my feature table. However, once I get to the classify-otus-experimental step and use the Silva 128 consensus 7 level taxonomy, I get the following error:

Not all OTUs in the provided insertion tree have mappings in the provided reference taxonomy.
I am attaching my insertion tree.
Any help would be appreciated!

insertion-tree.qza.gz

Installation does not work (channel-independent issue)

Hello,

I've run into an error when installing this package into a QIIME2 (2018.8) conda environment (Miniconda3-latest-Linux-x86_64) installed into my home directory on a computing cluster (i.e., barnacle).

This is the code I ran to install the package:

$ conda install -c anaconda -c defaults -c conda-forge -c bioconda -c https://conda.anaconda.org/biocore q2-fragment-insertion

This is the error that prints to screen when running the install code:

BEGINNING OF ERROR PRINTED TO SCREEN

Solving environment: failed

>>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

Traceback (most recent call last):
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/exceptions.py", line 819, in __call__
    return func(*args, **kwargs)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/cli/main.py", line 78, in _main
    exit_code = do_call(args, p)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/cli/conda_argparse.py", line 77, in do_call
    exit_code = getattr(module, func_name)(args, parser)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/cli/main_install.py", line 11, in execute
    install(args, parser, 'install')
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/cli/install.py", line 235, in install
    force_reinstall=context.force,
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/solve.py", line 518, in solve_for_transaction
    force_remove, force_reinstall)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/solve.py", line 451, in solve_for_diff
    final_precs = self.solve_final_state(deps_modifier, prune, ignore_pinned, force_remove)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/solve.py", line 180, in solve_final_state
    index, r = self._prepare(prepared_specs)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/solve.py", line 592, in _prepare
    self.subdirs, prepared_specs)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/index.py", line 215, in get_reduced_index
    new_records = query_all(spec)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/index.py", line 184, in query_all
    return tuple(concat(future.result() for future in as_completed(futures)))
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/subdir_data.py", line 95, in query
    self.load()
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/subdir_data.py", line 149, in load
    _internal_state = self._load()
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/subdir_data.py", line 246, in _load
    _internal_state = self._process_raw_repodata_str(raw_repodata_str)
  File "/home/jpshaffer/software/miniconda3/lib/python3.7/site-packages/conda/core/subdir_data.py", line 369, in _process_raw_repodata_str
    info['fn'] = fn
TypeError: 'NoneType' object does not support item assignment

$ /home/jpshaffer/software/miniconda3/bin/conda install -c anaconda -c defaults -c conda-forge -c bioconda -c https://conda.anaconda.org/biocore q2-fragment-insertion

environment variables:
CIO_TEST=
CONDA_DEFAULT_ENV=qiime2-2018.8
CONDA_EXE=/home/jpshaffer/software/miniconda3/bin/conda
CONDA_PREFIX=/home/jpshaffer/software/miniconda3/envs/qiime2-2018.8
CONDA_PROMPT_MODIFIER=(qiime2-2018.8)
CONDA_PYTHON_EXE=/home/jpshaffer/software/miniconda3/bin/python
CONDA_ROOT=/home/jpshaffer/software/miniconda3
CONDA_SHLVL=1
MANPATH=/opt/slurm-18.08.0/share/man:/opt/torque-4.2.8/man:
MODULEPATH=/opt/modules/Modules/versions:/opt/modules/Modules/$MODULE_VERSION/mod
ulefiles:/opt/modules/Modules/modulefiles
PATH=/home/jpshaffer/software/miniconda3/envs/qiime2-2018.8/bin:/home/jpsha
ffer/software/miniconda3/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin
:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/gold/2.2.0.5/sbin:/opt/
gold/2.2.0.5/bin:/opt/torque-4.2.8/bin:/opt/torque-4.2.8/sbin:/opt/mau
i-3.3.1/bin:/opt/slurm-18.08.0/bin:/opt/slurm-18.08.0/sbin
PYTHONNOUSERSITE=/home/jpshaffer/software/miniconda3/envs/qiime2-2018.8/lib/python*/sit
e-packages/
REQUESTS_CA_BUNDLE=
SSL_CERT_FILE=

 active environment : qiime2-2018.8
active env location : /home/jpshaffer/software/miniconda3/envs/qiime2-2018.8
        shell level : 1
   user config file : /home/jpshaffer/.condarc

An unexpected error has occurred. Conda has prepared the above report.
If submitted, this report will be used by core maintainers to improve
future releases of conda.

END OF ERROR PRINTED TO SCREEN

I was able to reproduce the error after uninstalling and reinstalling both Miniconda and the QIIME2 environment.

Please let me know if you need additional information to troubleshoot this error.

Thanks in advance and best wishes,

Justin

rename function

https://github.com/biocore/q2-fragment-insertion/blob/ec8333a62932f312414b670b90a64e1484b75ffd/q2_fragment_insertion/plugin_setup.py#L57

We are now able to pass in other reference trees/alignments. Thus, I think we should rename the QIIME 2 function into something independent of "16S-greengenes", if we intend to compile other references like Silva. Maybe just call it "sepp" ?

@wasade @mortonjt what are your thoughts?

qiime phylogeny align-to-tree-mafft-fasttree

Excuse me, how can I solve this problem?

Plugin error from phylogeny:

Command '['mafft', '--preservecase', '--inputorder', '--thread', '33', '/tmp/qiime2-archive-tspmm41w/4dd87431-bf1c-465f-8f38-2d4c3a9605cf/data/dna-sequences.fasta']' returned non-zero exit status 1.

Debug info has been saved to /tmp/qiime2-q2cli-err-6e_xupmi.log

classify-otus-experimental tutorial data files are missing

Bug Description
The tutorial (recently moved to the QIIME 2 library) cites a taxonomy_gg99.qza file link that is broken.

Steps to reproduce the behavior
See "assign taxonomy" tutorial here: https://library.qiime2.org/plugins/q2-fragment-insertion/16/

Expected behavior
File should be replaced here, or the link fixed to point elsewhere in the tutorial (probably a better solution).

References
forum xref

renamed files

https://github.com/wasade/q2-fragment-insertion/blob/64d4b52847fef856ebcf01c6459395af0dcb5c7f/q2_fragment_insertion/_insertion.py#L50

Is there a reason why you chose not to use the tree and placement files (.relabelled) that have the restored internal node labels? As far as I understand the code, Siavash assigns every node a unique ID and prefixes the original label with this ID. In a postprocessing step (a generated python program) those IDs get trimmed from the labels to restore their original values.
Thus, users don't see those IDs in e.g. the taxonomy labels of the reference.

Add the ability to get FeatureData[Taxonomy] based on the tree

SILVA reference

Improvement Description
It should be possible to download the QIIME compatible version of Silva and construct reference phylogeny and alignment for SEPP to enable 18S analyses.

Questions

@josenavas @wasade do you know if release 128 is the latest?
How and where would we host SEPP compatible references? Within this Plugin (which is already 130 MB large), on the github repo?

Update readme

with respect to later q2 version and the newly optional reference inputs

filter biom table

add a qiime2 function feature-table -> phylogeny -> feature-table that removes those features not found in phylogeny.
And maybe reports about lost read ratio?!
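
The intended behaviour could look roughly like this. The sketch works on a plain dict of per-feature read counts rather than a real feature table, and the function name is hypothetical.

```python
def filter_by_tips(feature_counts, tip_names):
    """Split features by presence in the phylogeny and report lost reads.

    feature_counts maps feature ID -> total read count; tip_names is the
    set of tip names in the (insertion) tree.
    """
    kept = {f: c for f, c in feature_counts.items() if f in tip_names}
    removed = {f: c for f, c in feature_counts.items() if f not in tip_names}
    total = sum(feature_counts.values())
    lost_ratio = sum(removed.values()) / total if total else 0.0
    return kept, removed, lost_ratio


kept, removed, lost_ratio = filter_by_tips({'a': 90, 'b': 10}, {'a', 'x'})
```

Returning the removed features as well as the kept ones makes it easy to report which reads were lost and why.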

merge placements

Improvement Description
There are increasing numbers of use cases where one wants to merge placements from different runs against the same reference phylogeny.

Questions

Would it make sense to provide another "function" within the plugin which accepts a list of placement files and produces one phylogeny out of it, or would that be too much of an expert process that we would not want to expose to the "normal" QIIME2 user, so as not to confuse him/her?
@wasade what is your opinion on that?
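
As a starting point for discussion, merging could be as simple as concatenating the "placements" lists of several jplace documents, provided they share the same reference tree and field layout. The sketch below assumes already-parsed jplace dicts and assumes that tree strings from different runs against the same reference are identical, which would need to be checked for SEPP output. The function name is hypothetical.

```python
def merge_jplace(documents):
    """Merge parsed jplace documents that share one reference tree."""
    if not documents:
        raise ValueError('need at least one jplace document')
    first = documents[0]
    merged = {
        'tree': first['tree'],
        'fields': first['fields'],
        'version': first.get('version', 3),
        'metadata': {'merged_from': len(documents)},
        'placements': [],
    }
    for doc in documents:
        if doc['tree'] != first['tree'] or doc['fields'] != first['fields']:
            raise ValueError('placements refer to different references')
        merged['placements'].extend(doc['placements'])
    return merged
```

The merged document would then be turned into a single phylogeny by whatever step already converts placements into a tree.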

Installation does not work

Hello.

I am trying the following:

  conda config --add channels anaconda
  conda config --add channels conda-forge
  conda config --add channels defaults
  conda config --add channels r
  conda config --add channels bioconda
  

  conda install -c qiime2/label/r2018.6 qiime2
  conda install -c anaconda -c defaults -c conda-forge -c bioconda -c https://conda.anaconda.org/biocore q2-fragment-insertion
  qiime dev refresh-cache

But, when trying to "Solve the environment", I am getting the PackagesNotFoundError:

conda install -c anaconda -c defaults -c conda-forge -c bioconda -c https://conda.anaconda.org/biocore q2-fragment-insertion
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - q2-fragment-insertion
  - q2cli[version='>=2017.12.*']
  - q2-fragment-insertion
  - q2-feature-table[version='>=2017.12.*']
  - q2-fragment-insertion
  - q2-types[version='>=2017.12.*']
  - q2-fragment-insertion
  - q2templates[version='>=2017.12.*']

Current channels:

  - https://conda.anaconda.org/anaconda/linux-64
  - https://conda.anaconda.org/anaconda/noarch
  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch
  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://conda.anaconda.org/bioconda/linux-64
  - https://conda.anaconda.org/bioconda/noarch
  - https://conda.anaconda.org/biocore/linux-64
  - https://conda.anaconda.org/biocore/noarch
  - https://conda.anaconda.org/r/linux-64
  - https://conda.anaconda.org/r/noarch

I am trying to create a Singularity container with qiime2 plus your extension.

Thank you.

Anders.

reference as input or parameter

Hi @wasade ,
testing is currently not very convenient, because of the long waiting times. Therefore, I think passing reference tree/alignment would be quite beneficial. I wonder how to design that.

Since both are Semantic Types (FeatureData[AlignedSequence] Phylogeny[Rooted]) they can only be "inputs" not "parameters" right? If so, do you know if it is possible to have optional inputs?

If not, the user always needs to pass reference alignment and reference phylogeny as q2 artifacts. Do we really want to put that burden on users, or would we be fine with having two "parameters" (which can be optional) that point to filenames?

P.S. could you invite me to the slack channel for q2?

Migration to bioconda

Improvement Description
I finally was able to clean up Siavash's source code and created a bioconda recipe for SEPP, producing the packages at https://anaconda.org/bioconda/sepp

Note that this package does NOT contain the default Greengenes 13.8 99% reference (which consists of three files a) alignment b) tree c) info file.) In the future, we also want to support alternative references like SILVA.

Proposed Behavior
I wonder how we best do this? I see the following options:

create conda reference packages for GG and / or SILVA
pro: no changes to current behaviour of qiime2
con: where to host? would bioconda accept that?
provide as qiime2 Data resources
pro: smaller downloads when qiime2 gets installed, easy to host
con: users need to do extra work at a) install and b) execution time, since file paths for all three files need to be provided

Questions
Any thoughts @thermokarst @antgonza ?

p-threads

double check if --p-threads is correctly passed to executable

licence

pick a licence!

check fragment names

according to Siavash, SEPP might fail if fragments to be inserted have same names as tips of reference tree. Add a testing function to abort early if user provides conflicting names.

How about internal node names?
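
An early-abort check along these lines might be sufficient. The function names are hypothetical, and whether internal node names actually cause problems in SEPP is the open question above, so they are only an optional argument here.

```python
def find_name_conflicts(fragment_ids, tip_names, internal_names=()):
    """Return fragment IDs that collide with names in the reference tree."""
    reserved = set(tip_names) | set(internal_names)
    return sorted(set(fragment_ids) & reserved)


def assert_no_conflicts(fragment_ids, tip_names, internal_names=()):
    """Raise before running SEPP if any fragment name is already taken."""
    conflicts = find_name_conflicts(fragment_ids, tip_names, internal_names)
    if conflicts:
        raise ValueError(
            'Fragment names collide with reference tree names: '
            + ', '.join(conflicts))
```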

Update to depend on memory bug fixed release

Can be found here:

https://github.com/smirarab/sepp-refs/releases

DEP: q2-fragment-insertion can't be upgraded to Python 3.10, due to pinned SEPP dependency on Python <=3.9

Here's the relevant pin. This will need to be addressed for q2-fragment-insertion to stay in the amplicon distribution when QIIME 2 transitions its Python version to 3.10 (planned for the 2024.10 release, which is currently scheduled for 2 October 2024).

Plugin error from fragment-insertion: Command '['run-sepp.sh' returned non-zero exit status 1

Bug Description
Hello, I've been trying to use q2-fragment-insertion in order to use PICRUSt2, following the instructions from the original source. Unfortunately, I got an error from this plugin. I saw the same error in a forum and followed the instructions there, using the command below.
I first tried it with the files of my interest, then with the files provided in the tutorial:
qiime fragment-insertion sepp --i-representative-sequences mammal_seqs.qza --p-threads 12 --i-reference-alignment reference.fna.qza --i-reference-phylogeny reference.tre.qza --output-dir pruebapicrust2tutorial --p-debug --verbose 2> err.txt > out.txt
There was no follow-up on the error.

References
For more detailed information, here are the files:
err.txt
out.txt

Comments
I'm using an hp with the following hardware:
AMD® A12-9720p radeon r7, 12 compute cores 4c+8g × 4

I thought it might be a problem with the installation, so I removed qiime2 and reinstalled it. I updated Anaconda and conda to the latest version.
Thank you

"no action filter-features" qiime fragment-insertion filter-features

Hi there,

I stumbled on a weird behaviour with qiime fragment-insertion: when I run the following, I get an error that it cannot find the 'filter-features' action.

qiime fragment-insertion filter-features \
  --i-table $path2table \
  --i-tree insertion-tree.qza \
  --o-filtered-table filtered_table.qza \
  --o-removed-table removed_table.qza

Returns the following error:

Error: QIIME 2 plugin 'fragment-insertion' has no action 'filter-features'.

Further, if I look at the qiime fragment-insertion --help there are only two options classify-otus-experimental and sepp

I would be very grateful for any help you could provide. I'm an amateur bioinformatician and I have now exhausted my troubleshooting skills.

I am running QIIME 2 version 2018.4.0 with fragment-insertion 2018.2.0.dev0. See attached for my complete qiime info output.

Thank you very much for your help (and the easy-to-use software!!)

Courtney

qiime_version.txt

verbosity

If SEPP fails it should be more verbose, i.e. override Siavash's trap function, which deletes the log files and thus hinders debugging.

ENH: "Preserve" original node names

Improvement Description
"Preserve" original feature IDs by renaming with the rename-json.py output by SEPP.

Because SEPP renames nodes, the trees it produces don't play nice with downstream tools like Empress that can color trees using feature metadata.

Current Behavior
This tree cannot be easily colored by taxonomy, because the node IDs do not map to the original feature IDs.

Proposed Behavior
Use the rename-json.py script output by SEPP to "preserve" original feature IDs, probably by exposing a new parameter so as to not impact runtimes.

ITS / 18S

Improvement Description
Jake asked if it would be possible to compute insertion trees not only for 16S but also for 18S and ITS.

Comments
I think that would work in principle; however, we would need to create reference trees for the corresponding databases (Silva and Unite). Any comments?

Qiita integration

Hi @antgonza @josenavas ,
I hope that we will soon have completed the q2 plugin for SEPP. I wonder how we would integrate that into Qiita? Can you wrap general qiime2 plugins or would we have to create our own Qiita plugin?
Would you consider SEPP a tool for data processing or (meta)-analysis?

Migrate to QIIME 2 Org?

Following up on a months-old discussion regarding including this plugin in the QIIME 2 Core Distribution. Here are some options for us to proceed:

Migrate this repo to the @qiime2 org, add contributors here as maintainers in the new org
Keep the repo under biocore, give a handful of @qiime2 devs maintainer perms
Something else entirely?

@sjanssen2, I think the easiest path is for us to go with 1 - since this will be the least friction for busywork to be wired up.

@antgonza, @sjanssen2, @ebolyen, @gregcaporaso, @nbokulich (and probably more, apologies if my list is incomplete) have discussed getting this into the "core" distribution of QIIME 2, and we would really like that to happen in time for the upcoming release of QIIME 2 (2018.11, scheduled for this Thursday). I don't expect there to be too much to get this rolled into the distro, but, it would be a lot simpler if we moved this over to @qiime2.

Thoughts, @sjanssen2?

fragment-insertion sepp to display # of inserted features upon completion.

It would be very useful if upon completion sepp would print out the # of successfully inserted features.

So far, working on human and mouse samples, I've never had a case where any features failed to be inserted into the tree, but I still do the filtering step and each time my table is unchanged. The filtering step can also take a little time depending on the number of features you have. It would be super convenient if sepp could just print how many features it inserted, so the user could compare that number to the number of features in their feature table and see whether a filtering step is needed or not.
Alternatively, the full insertion and filtering can be turned into a pipeline to do all in one go.

Thanks!