qiime2 / q2-gneiss

QIIME2 plugin for Gneiss

License: BSD 3-Clause "New" or "Revised" License

Python 99.46% Makefile 0.19% TeX 0.35%

q2-gneiss's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

q2-gneiss's People

Contributors

ebolyen mortonjt jairideout jakereps gregcaporaso jwdebelius nbokulich lisa55asil qiyunzhu chriskeefe turanoo oddant1 andrewsanchez lizgehret

q2-gneiss's Issues

Balance taxonomy colors are inconsistent

References
From here

Error in gneiss tutorial

Hello,
I am following the gneiss tutorial and I am stuck at the beginning. When I run
qiime gneiss correlation-clustering --i-table table.qza --o-clustering hierarchy.qza

I got this error:
Plugin error from gneiss:

Argument to parameter 'table' is not a subtype of FeatureTable[Composition].

Even when I use my own files, I get the same error.

Are there any updates to the plugin that have not been posted on the web?

I wonder if anyone could help me.
Thanks

numerical values encoded as categories can raise error

The original problem was spotted here
https://forum.qiime2.org/t/plugin-error-from-gneiss-cannot-perform-reduce-with-flexible-type/8717

If the metadata column of interest encodes categorical values that can also be represented as numerical values, then matplotlib will convert the values to numerical values and throw a flexible type error. The fix in the meantime is to convert the categories into strings that cannot be converted into numerical values (e.g. A, B, C, ...).

An example of the error message can be found below.

Traceback (most recent call last):
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "</Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-277>", line 2, in balance_taxonomy
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 427, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_gneiss/plot/_plot.py", line 138, in balance_taxonomy
    palette=sample_palette)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/gneiss/plot/_decompose.py", line 74, in balance_boxplot
    a = sns.boxplot(ax=ax, x=balance_name, data=data, **kwargs)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/seaborn/categorical.py", line 2237, in boxplot
    plotter.plot(ax, kwargs)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/seaborn/categorical.py", line 549, in plot
    self.draw_boxplot(ax, boxplot_kws)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/seaborn/categorical.py", line 486, in draw_boxplot
    **kws)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/matplotlib/__init__.py", line 1867, in inner
    return func(ax, *args, **kwargs)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 3571, in boxplot
    labels=labels, autorange=autorange)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/matplotlib/cbook/__init__.py", line 1843, in boxplot_stats
    stats['mean'] = np.mean(x)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2920, in mean
    out=out, **kwargs)
  File "/Users/jmorton/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/numpy/core/_methods.py", line 75, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
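
The interim workaround described above (relabeling numeric-looking categories as strings) could be sketched as follows. This is only an illustration using the standard library; the helper names and the "group_" prefix are hypothetical and not part of q2-gneiss.

```python
def looks_numeric(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def relabel_numeric_categories(values, prefix="group_"):
    """Prefix numeric-looking category labels so they stay strings.

    `prefix` is a hypothetical choice; any non-numeric label works.
    """
    return [prefix + v if looks_numeric(v) else v for v in values]


# Example: a metadata column whose categories are coded as 1, 2, 3.
column = ["1", "2", "1", "3"]
relabeled = relabel_numeric_categories(column)
print(relabeled)
```

The relabeled column can then be written back to the sample metadata file before running the visualization.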

Missing plots when there are multiple categories

There was some confusion with the flags. If there were multiple categories, the boxplot figure would fail to save. First noticed in this post

Split ilr-transform into two commands

I think it may be a good idea to have two ilr transform commands, namely

ilr-hierarchical
ilr-phylogenetic

That way we can explicitly handle phylogenetic trees. No need for the assign-ids command, especially if we can fold this inside of the ilr-transform. Any thoughts?

Explicit support for phylogenies

Improvement Description
There are quite a few problems preventing phylogenies from being used in the qiime2 interface, chiefly the lack of tree tip filtering methods in q2cli.

This can be resolved either by creating a tip filtering command or, even better, by automatically filtering tips (via gneiss.util.match_tips) from the tree prior to performing any gneiss command.
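
To make the idea of tip filtering concrete, here is a minimal sketch on a toy tree. It is not the implementation of gneiss.util.match_tips; the nested-tuple tree representation and the prune_tips name are assumptions made for illustration only (scikit-bio's TreeNode.shear offers comparable pruning on real trees).

```python
def prune_tips(tree, keep):
    """Return a copy of `tree` containing only tips named in `keep`.

    A tree is either a tip name (str) or a tuple of child subtrees.
    Internal nodes left with a single child are collapsed, and nodes
    left with no children are dropped (returned as None).
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [prune_tips(child, keep) for child in tree]
    children = [c for c in children if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


# Toy phylogeny with five tips; the table only contains three of them.
tree = (("a", "b"), ("c", ("d", "e")))
table_features = {"a", "b", "d"}
pruned = prune_tips(tree, table_features)
print(pruned)
```

Running a step like this before any gneiss command would keep the tree and the table in agreement.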

Questions
Any takers on this? cc @tanaes @wasade

Match metadata to tables for every qiime2 command

Originally here

biocore/gneiss#206

This would be useful to have, so that the user isn't forced to worry about weird sample matching.
May want to raise an error in case samples get filtered out.

error if feature table samples aren't found in metadata

An error should be raised if any sample IDs in the feature table aren't present in the sample metadata. This is to match the behavior in QIIME 1 and 2 where metadata IDs can be a superset of the table IDs, but every table ID must have corresponding metadata.
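
A minimal sketch of the check described above, assuming sample IDs are available as plain lists of strings. The function name is hypothetical and this is not how q2-gneiss currently implements it.

```python
def check_table_ids_in_metadata(table_ids, metadata_ids):
    """Raise ValueError if any table sample ID has no metadata entry.

    Metadata IDs may be a superset of the table IDs.
    """
    missing = sorted(set(table_ids) - set(metadata_ids))
    if missing:
        raise ValueError(
            "Sample IDs in the feature table are missing from the "
            "sample metadata: " + ", ".join(missing)
        )


# Passes: the metadata contains an extra sample, which is allowed.
check_table_ids_in_metadata(["s1", "s2"], ["s1", "s2", "s3"])

# Fails: "s9" is in the table but not in the metadata.
try:
    check_table_ids_in_metadata(["s1", "s9"], ["s1", "s2"])
except ValueError as err:
    print(err)
```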

forum x-ref

Match tips when tree is larger than table in heatmap command

When trying to run the heatmap function as follows

qiime gneiss dendrogram-heatmap \
    --i-table voles_137_sortmerna_filtered_even18000_no177_filt100_composition.biom.qza \
    --i-tree phylogeny.qza \
    --m-metadata-file chernobyl_map_v2_no177.txt \
    --m-metadata-category treatment \
    --o-visualization rad_heatmap_no177 \
    --p-ndim 10 --p-method clr --p-color-map seismic

It can throw an error (see below).

Traceback (most recent call last):
  File "/Users/mortonjt/miniconda3/envs/q2-gneiss/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-261>", line 2, in dendrogram_heatmap
  File "/Users/mortonjt/miniconda3/envs/q2-gneiss/lib/python3.5/site-packages/qiime2/sdk/action.py", line 203, in callable_wrapper
    output_types, provenance)
  File "/Users/mortonjt/miniconda3/envs/q2-gneiss/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/Users/mortonjt/Dropbox/UCSD/research/software/q2/q2-gneiss/q2_gneiss/plot/_plot.py", line 198, in dendrogram_heatmap
    highlight_width=0.01, figsize=(12, 8))
  File "/Users/mortonjt/miniconda3/envs/q2-gneiss/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_heatmap.py", line 126, in heatmap
    _plot_highlights_dendrogram(ax_highlights, table, t, highlights)
  File "/Users/mortonjt/miniconda3/envs/q2-gneiss/lib/python3.5/site-packages/gneiss-0.4.1-py3.5.egg/gneiss/plot/_heatmap.py", line 179, in _plot_highlights_dendrogram
    node = t.find(n)
  File "/Users/mortonjt/miniconda3/envs/q2-gneiss/lib/python3.5/site-packages/skbio/tree/_tree.py", line 1562, in find
    raise MissingNodeError("Node %s is not in self" % name)
skbio.tree._exception.MissingNodeError: Node 1L-9fe19287-17b2-4dd7-ad52-60ff31dc67ad is not in self

Turns out that this happens when the size of the tree is larger than the table, and some of the internal nodes get filtered out prior to the actual rendering. Thanks @cuttlefishh for catching this!

Proportion plots can be misleading with few features

Improvement Description
The proportion plots can be a little misleading at the moment.

Right now, if there is only 1 feature in the numerator or the denominator, the proportion plots will plot multiple features, even though there is only 1.

The proportion plot should only plot 1 feature if there is only 1 feature.

Current Behavior

Here, there is only 1 feature in the denominator, but for some reason there are 5 features plotted.

Future ideas for `balance_taxonomy`

Improvement Description
Showing the full taxonomy string from the root to the current level is usually how we display the taxonomic information. That would make the Balance Taxonomy barplot match q2-taxa a bit better.

Additionally, the proportion plot should probably have the full taxonomy string, since it is representing a given balance tip. Alternatively, it might make sense for those to be collapsed to the taxa level as well.

It would be nice if the scatter plot was also colored by the partition group when working with numeric metadata.

Add pseudocount parameter to visualizations

Add pseudocount parameter to

Balance Taxonomy
https://github.com/qiime2/q2-gneiss/blob/master/q2_gneiss/plot/_plot.py#L31

Dendrogram heatmap
https://github.com/qiime2/q2-gneiss/blob/master/q2_gneiss/plot/_plot.py#L344
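
For context on why a pseudocount matters here, the sketch below shows a centered log-ratio (clr) transform with a pseudocount argument. It is a plain-Python illustration, not the code in _plot.py; the function signature and the default of 1.0 are assumptions.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's counts.

    The pseudocount is added to every count so that zeros do not
    produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# A sample with a zero count; without a pseudocount, log(0) would fail.
values = clr([0, 9, 99], pseudocount=1)
print(values)
```

Exposing the pseudocount as a parameter would let users choose a value suited to their sequencing depth instead of relying on a fixed one.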

Built in support for tax2tree

It would be nice to have internal nodes of the tree with meaningful names - possibly annotated with tax2tree

An example can be found here.
biocore/gneiss#24

BUG: balance_boxplots long y-axis labels get trimmed

Bug Description
balance_boxplot labels are trimmed in the PNG + PDF if labels are longer than plot dimensions.

Steps to reproduce the behavior
See forum xref

Expected behavior
Use tight layout to accommodate longer labels

Screenshots
From the forum:

Pushing / refactoring regression plotting functions

Improvement Description
I think it would be more suitable to push the regression diagnostic plots into this repository.

Especially since these plots were specifically designed for the q2 interface.

This will require a bit of work, so this is probably more suitable for the next release.

References
regression diagnostic plots

Cleaner metadata categories in dendrogram heatmap

References
https://forum.qiime2.org/t/gneiss-gradient-clustering-metadata-categorical-no-longer-an-option/3982/4

add "download as pdf" button to dendrogram heatmap qzv

References
This issue was originally here

ILR ordination plots

One of the things that makes the ILR transform really hard to use is the difficulty of interpreting the balances.

I propose that we represent the ILR transform as an ordination object, something like the picture below.

For each clade, -1 indicates that the species belongs to the denominator, +1 indicates that the species belongs to the numerator, and 0 indicates that it doesn't belong to that particular clade.
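
The clade-membership encoding just described can be sketched on a toy bifurcating tree. The nested-tuple tree and the convention that the left child is the numerator are assumptions for illustration; they are not taken from gneiss.

```python
def tips(tree):
    """List tip names of a bifurcating tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return tips(left) + tips(right)


def clade_membership(tree):
    """Return (tip_names, rows), one row of +1/-1/0 per internal node.

    Assumption: the left child is the numerator (+1) and the right
    child is the denominator (-1); tips outside the clade get 0.
    """
    names = tips(tree)
    rows = []

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        numerator, denominator = set(tips(left)), set(tips(right))
        rows.append([1 if n in numerator else -1 if n in denominator else 0
                     for n in names])
        visit(left)
        visit(right)

    visit(tree)
    return names, rows


tree = (("a", "b"), "c")
names, rows = clade_membership(tree)
for row in rows:
    print(dict(zip(names, row)))
```

A tree with n tips yields n - 1 rows, which is where the roughly square size of the object comes from.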

The sample loadings represent the log-ratios to be plotted in emperor, and the feature loadings represent the tips to be colored in empress.

The only minor concern is the memory requirements. If there are 100k microbes, then this object will be 100k x 100k.
So there will need to be some pruning for the really massive objects.
To figure out how to select the balances of interest, I propose two possible options

Select the top k balances through variance
Pass in a list of balances one wishes to analyze.
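
The first option (top k balances by variance) might look like the following sketch. The dictionary layout, balance names, and values are made up for illustration.

```python
from statistics import variance


def top_k_balances(balances, k):
    """Return the names of the k balances with the largest variance.

    `balances` maps a balance name to its per-sample values.
    """
    ranked = sorted(balances, key=lambda name: variance(balances[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical balances across four samples.
balances = {
    "y0": [0.1, 0.2, 0.1, 0.2],
    "y1": [-2.0, 1.5, 0.3, 2.2],
    "y2": [1.0, 1.1, 0.9, 1.3],
}
selected = top_k_balances(balances, k=2)
print(selected)
```

The second option would simply replace the ranking step with a user-supplied list of names.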

The only question I have now is, are the clade-membership values best represented as feature-metadata, or feature-loadings?

CC @ElDeveloper @fedarko

Typo in correlation-clustering table parameter text

See these lines (missing a space between tip and corresponds).

Using `class` in the formula string gives SyntaxError

Questions
@mortonjt, should we file this issue for gneiss as well?

References
Full details in this forum post.
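
A likely cause, stated here as an assumption rather than something confirmed in the issue, is that the formula string is parsed as Python expressions (as patsy does), and `class` is a reserved Python keyword. One possible workaround is to rename such metadata columns before building the formula; the helper below is a hypothetical sketch. With patsy, quoting the name as Q("class") in the formula may be an alternative.

```python
import keyword


def safe_column_name(name, suffix="_"):
    """Append a suffix if `name` is a reserved Python keyword."""
    return name + suffix if keyword.iskeyword(name) else name


# Hypothetical metadata column names.
columns = ["class", "ph", "lambda", "subject"]
renamed = {c: safe_column_name(c) for c in columns}
print(renamed)
```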

Remove deprecated actions

Improvement Description
#63 marked several actions for deprecation. We should remove those now that they have been deprecated for a whole release cycle (technically two now, unless we cut a 2019.10 patch).

statsmodels 0.9.0 appears to break existing tests

Should investigate updating tests, pinning statsmodels, or some other unconsidered option(s).

Adding support for FeatureTensor types

Addition Description
I have a use case where I would like to compute the ILR transform on an entire tensor of (samples) x (features) x (monte_carlo_estimates). Doing so would open the doors for enabling Bayesian phylogenetic inference for differential abundance tasks.

Current Behavior
Doesn't exist

Proposed Behavior
Introduce a new function qiime gneiss ilr-phylogenetic-tensor taking the FeatureTensor type proposed in q2-differential as input.

Comments
This is going to depend on biocore/gneiss#288

pvalues disagree between interactive plot

Bug Description
p-values disagree between interactive plot and

References
https://forum.qiime2.org/t/gneiss-downloaded-p-value-csv-disagrees-with-interactive-summary/7972/10

balance-taxonomy CSV header inconsistencies

Bug Description
When the artifact generated by qiime gneiss balance-taxonomy is exported with qiime tools export, the resulting files include a “numerator.csv” and “denominator.csv” file.

The “usual” header line in these files contains: Feature ID,0,1,2,3,4,5,6.

However, if the file contains only one feature, then the header becomes: ,0,1,2,3,4,5,6. The Feature ID is missing. This isn’t a big deal unless one tries to aggregate multiple “numerator.csv” and “denominator.csv” files together, which then produces some problems.
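
Until the header is made consistent, anyone aggregating these files could patch the blank header cell on read. The sketch below uses only the standard library; the function name and the sample data are hypothetical.

```python
import csv
import io


def read_with_feature_id(text, id_header="Feature ID"):
    """Parse CSV text, restoring the first header cell if it is blank."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0] and rows[0][0] == "":
        rows[0][0] = id_header
    return rows


# Shortened stand-ins for the exported files.
usual = "Feature ID,0,1,2\nf1,0.1,0.2,0.3\nf2,0.4,0.5,0.6\n"
single = ",0,1,2\nf3,0.7,0.8,0.9\n"

# Keep one header row and append the data rows from the second file.
combined = read_with_feature_id(usual)
combined += read_with_feature_id(single)[1:]
for row in combined:
    print(row)
```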

References
Ported wholesale from the forum

Add Citations

Should use the new citation API in qiime2/qiime2#387