qiime2 / q2-feature-table

QIIME 2 plugin supporting operations on feature tables.

License: BSD 3-Clause "New" or "Revised" License

Python 93.82% HTML 4.95% Makefile 0.07% TeX 0.35% CSS 0.53% JavaScript 0.27%

hacktoberfest

q2-feature-table's Introduction

q2-feature-table

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-feature-table's Issues

filter method

The filter method should filter based on sample metadata/IDs, feature metadata/IDs, or counts. In QIIME 1, this is spread across a few different scripts. Can (and should) this be reduced to one in QIIME 2?

summarize is very slow

probably because of the scipy.optimize call...

feature_table.merge should merge replicate samples if it doesn't already

It looks like feature_table.merge fails if there are duplicate samples in the tables being merged, e.g., with the error:

Some samples are present in both tables: sample1...

I agree that this is the ideal default behavior, in case someone accidentally creates duplicate names in the mapping files (or, more likely, when merging tables from unrelated runs or studies that have arbitrary rather than fully randomized sample IDs).

HOWEVER, there should be an option to override this behavior and merge duplicate sample IDs. For example, I am working with some data right now where the same samples were sequenced across multiple runs. It would be nice to be able to easily, intentionally merge these samples.
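A minimal sketch of what such an override could look like, assuming tables are represented as plain `{sample_id: {feature_id: count}}` dictionaries rather than the plugin's real table type. The function name and the `sum_overlapping` parameter are hypothetical, not part of `feature_table.merge`.

```python
from collections import Counter


def merge_two_tables(table1, table2, sum_overlapping=False):
    """Merge two tables shaped like {sample_id: {feature_id: count}}.

    By default, overlapping sample IDs raise an error (the behavior described
    above). With sum_overlapping=True (a hypothetical option), counts for
    samples present in both tables are added together.
    """
    overlap = set(table1) & set(table2)
    if overlap and not sum_overlapping:
        raise ValueError('Some samples are present in both tables: %s'
                         % ', '.join(sorted(overlap)))

    merged = {sample: Counter(counts) for sample, counts in table1.items()}
    for sample, counts in table2.items():
        # Counter.update adds counts instead of replacing them.
        merged.setdefault(sample, Counter()).update(counts)
    return {sample: dict(counts) for sample, counts in merged.items()}
```

Keeping the error as the default preserves the safety check, while the opt-in flag covers the case of the same samples being sequenced across multiple runs.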

"Interactive Sample Detail" table viz plot missing y-axis label

Interactive Sample Detail, question/bug?

Ported from qiime2/qiime2#276 (@antgonza)

During the ASM workshop, a user pointed out that the "Sequences Retained: 0 (0.00%)" field is confusing, and I agree. The confusing points are:

- It starts at 0 with 0.0% retained, where it should be 100%, right?
- Once you start moving to the right, it seems to be going up, so it appears to show the sequences removed (not retained), and then once you get to around 50% it starts going up, see screenshot.

error in Sequences retained calculation

Sequences retained should be the number of samples retained times the even sampling depth, but the value shown does not match that.
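The expected calculation can be sketched as follows. This is only an illustration of the formula stated above, assuming per-sample total frequencies are available as a `{sample_id: total}` dictionary; the function name is hypothetical and this is not the visualization's actual code.

```python
def retained_at_depth(sample_frequencies, sampling_depth):
    """Return (samples retained, sequences retained, fraction retained).

    sample_frequencies maps sample ID -> total frequency in that sample.
    A sample is retained when its total is at least the sampling depth, and
    each retained sample contributes exactly sampling_depth sequences.
    """
    n_retained = sum(1 for total in sample_frequencies.values()
                     if total >= sampling_depth)
    sequences_retained = n_retained * sampling_depth
    grand_total = sum(sample_frequencies.values())
    fraction = sequences_retained / grand_total if grand_total else 0.0
    return n_retained, sequences_retained, fraction
```

For example, with totals of 100, 50 and 10 and a depth of 50, two samples are retained and 2 x 50 = 100 of the 160 sequences remain.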

Filter features from a feature-table based on a FASTA file

It would be useful if qiime feature-table filter-features could also filter features based on a FASTA file. Alternatively filtering based on feature ID would probably accomplish this as well.

Merging capabilities should not be limited to two objects at a time

With qiime feature-table merge, users should be able to merge an arbitrary number of tables at a time. Currently the interface only allows for two tables to be merged at a time, which leads to a proliferation of intermediate tables. The same thing applies for qiime feature-table merge-seq-data.
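As a rough sketch of the requested behavior, a merge over any number of tables could look like the following. It assumes tables are plain `{sample_id: {feature_id: count}}` dictionaries and keeps the current rule that overlapping sample IDs are an error; the function name is hypothetical.

```python
def merge_many_tables(tables):
    """Merge any number of tables shaped like {sample_id: {feature_id: count}}.

    Raises ValueError if a sample ID appears in more than one input table.
    """
    merged = {}
    for table in tables:
        overlap = set(merged) & set(table)
        if overlap:
            raise ValueError('Some samples are present in more than one '
                             'table: %s' % ', '.join(sorted(overlap)))
        for sample, counts in table.items():
            # Copy so the inputs are never modified.
            merged[sample] = dict(counts)
    return merged
```

Accepting a list in a single call avoids writing out an intermediate table for every pairwise merge.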

filter-samples/filter-features: support "negate" filtering option

QIIME 1's filtering scripts have an option to negate the filtering operation (i.e. exclude instead of include). It'd be useful to add that functionality to filter-samples/filter-features.

This came up on the forum here.
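One way a negate option could behave is sketched below. It assumes a table stored as `{sample_id: {feature_id: count}}` and a set of sample IDs that matched the filter; the `exclude_ids` parameter name is an assumption for illustration, not an existing flag.

```python
def filter_samples_by_id(table, matching_ids, exclude_ids=False):
    """Keep samples whose ID is in matching_ids, or drop them if exclude_ids.

    table is shaped like {sample_id: {feature_id: count}}.
    """
    matching_ids = set(matching_ids)
    return {sample: dict(counts)
            for sample, counts in table.items()
            # Keep matches normally; keep non-matches when negated.
            if (sample in matching_ids) != exclude_ids}
```

The same inversion would apply to filtering features, since only the keep/drop decision changes.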

Filtering samples from a rarefied table forgets uniform sampling

After rarefying a feature table, I filtered some samples and then realized that the uniform sampling property had been lost. I understand this may happen in some cases, but it should not happen when you only filter samples. This is the command I used:

qiime feature-table filter-samples \
--i-table table.even1250.qza \
--p-where 'env_material != "sterile water"' \
--o-filtered-table table.even1250.filtered.qza \
--m-sample-metadata-file mapping-file.txt

The types went from:

FeatureTable[Frequency] % Properties(['uniform-sampling']) -> FeatureTable[Frequency]

remove h1: Count per feature detail

This is unnecessary with the current layout.

filter_features doesn't work with FeatureTable[Composition]

Questions
How difficult would it be to generalize this to any FeatureTable, regardless of the underlying semantic type contents?

Comments
Looks like this method is restricted to only work for specific table types.

summarize fails with a traceback if passed a table with zero or one sample

This can happen after overzealous filtering.

Traceback (most recent call last):
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/bin/qiime", line 11, in <module>
    load_entry_point('q2cli', 'console_scripts', 'qiime')()
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/caporaso/Google Drive/code/qiime2/q2cli/q2cli/commands.py", line 210, in __call__
    results = action(**arguments)
  File "<decorator-gen-138>", line 2, in summarize
  File "/Users/caporaso/Google Drive/code/qiime2/qiime2/qiime/core/callable.py", line 221, in callable_wrapper
    output_types)
  File "/Users/caporaso/Google Drive/code/qiime2/qiime2/qiime/core/callable.py", line 418, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/Users/caporaso/Google Drive/code/qiime2/q2-feature-table/q2_feature_table/_summarize.py", line 67, in summarize
    sample_counts_ax = sns.distplot(sample_counts, kde=False, rug=True)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/seaborn/distributions.py", line 209, in distplot
    bins = min(_freedman_diaconis_bins(a), 50)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/seaborn/distributions.py", line 30, in _freedman_diaconis_bins
    h = 2 * iqr(a) / (len(a) ** (1 / 3))
TypeError: len() of unsized object
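One possible guard is sketched below, under the assumption that the failure comes from handing fewer than two per-sample values to the histogram call. The function name and the error wording are hypothetical and are not taken from `_summarize.py`.

```python
def validate_sample_count(sample_frequencies, minimum=2):
    """Raise a readable error when there are too few samples to summarize.

    sample_frequencies maps sample ID -> total frequency. The default
    minimum of 2 is an assumption based on the report above.
    """
    n_samples = len(sample_frequencies)
    if n_samples < minimum:
        raise ValueError(
            'Cannot summarize a table with %d sample(s); at least %d are '
            'required. Was the table filtered too aggressively?'
            % (n_samples, minimum))
    return n_samples
```

Calling a check like this before plotting would replace the traceback with a message that points at the likely cause.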

Error message resulting from choosing too high a sampling depth is not clear.

The error message shown when the sampling depth is too high is vague and doesn't really point out what the problem is. I found the cause by looking at the sampling depth qzv and csv. It was giving this 2-d 3-d error when I chose a sampling depth that I thought was reasonable but, it turns out, wasn't.
Thanks,
Arron

replace all occurrences of "count" with "frequency"

BUG: summarize extended scroll

Bug Description
When viewing the "Overview" tab in a summarize visualization, if you click into the detailed HTML views of either "Frequency per sample detail" or "Frequency per feature detail," pressing the back button to get back to the main "Overview" tab doesn't resize the window contents vertically. If the per feature table is long, this causes the "Overview" tab to have an equal height made up of whitespace, which looks pretty awesome, but isn't really useful!

Comments
This isn't related to the iframe hackery in q2view, I was able to reproduce this in both q2view and qiime tools view.

Summarize Visualization - Plot Sampling Depth vs Retained Seqs/Samples

Improvement Description
It would be useful to be able to view the curve of sampling depth against either retained sequences or retained samples.

Comments
This would enable one to better evaluate the cost/benefit trade-off of sampling depth versus retention.

rename repository to feature-table

change package name to q2feature_table

Shared vs. Unique OTUs

All,

As an ecologist, quantifying and IDing which microbes are uniquely present in different environments may aid in a functional association with the holobiont. In familiarizing myself with QIIME (not QIIME 2, yet), I've noticed that there is no script for doing so, where, for example, using a mapping file and biom table, the user may quantify the unique as well as shared OTUs between 'treatments' (whether that be geographic location, environmental stress, etc.).

As holobiont or hologenome ecology is becoming increasingly important in discerning the evolution of holobionts and hologenomes, I write this query to see if there is interest in developing such a script (that, ideally, is compatible with both QIIME and QIIME 2.0).

Regards,
Tyler

integrate taxonomic composition in samples in table summaries

Improvement Description

some users at the iceland workshop noted that it would be useful to know the taxonomic composition of the samples when they're choosing their even sampling depth based on the summarize output - this should be possible, but would require optional artifacts and would have some overlap with the existing taxa plots (so this is lower priority)

References
Taken from #24

add option for seeding rarefaction?

Improvement Description
A forum user suggested that we add support for seeding rarefaction, which is an interesting idea for supporting reproducibility, though I'm not certain what the specific use cases would be.

Questions
Are there times where we would want to perfectly replicate rarefaction results? If so, we'd need the seed to be logged into the artifact's provenance.

References
suggested
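To illustrate what seeding would buy, here is a small sketch of rarefying a single sample without replacement using Python's standard `random` module. It is not the plugin's rarefaction code; the function name and the `seed` parameter are assumptions for illustration.

```python
import random
from collections import Counter


def rarefy_sample(feature_counts, depth, seed=None):
    """Subsample one sample's counts to `depth` without replacement.

    feature_counts maps feature ID -> integer count. Passing the same seed
    gives the same subsample, which is what would need to be recorded in
    provenance for exact replication.
    """
    # Expand counts into one entry per observation, in a fixed order so that
    # the result depends only on the seed and the input counts.
    pool = [feature
            for feature, count in sorted(feature_counts.items())
            for _ in range(count)]
    if depth > len(pool):
        raise ValueError('Sampling depth %d exceeds the sample total %d.'
                         % (depth, len(pool)))
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))
```

With `seed=None` the draw differs between runs, so the seed would have to be chosen and logged explicitly for a result to be reproducible.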

filter-features by feature names

Filter a feature table to match a list of feature names. Some scenarios for filtering, based on:

1. a file containing a list of names on separate lines, e.g., your 100 favorite taxa
2. a string containing a comma-separated list of taxa to keep/remove, e.g., 'chloroplast' (similar to qiime1's filter_taxa_from_otu_table.html). This function is really essential when working with host- and especially plant-associated samples (which may contain high levels of chloroplast and mitochondria).
3. names somehow piped in directly from ANCOM or other statistical tests (this should probably be generalizable so other plugins can follow suit, e.g., ANCOM or other methods need to optionally output a list of feature IDs that can be read by filter-features following option 1 above)

The third example is what really got me thinking here — it would be great to take a list of feature IDs from something like ANCOM that are significantly different among groups (for example) and use this to filter feature tables before doing something like PCoA biplots, correlation tests, or additional significance testing.
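The first scenario (and, by extension, the third) could be sketched like this, assuming a table stored as `{sample_id: {feature_id: count}}` and a text source with one feature ID per line. Both function names are hypothetical.

```python
def read_feature_ids(lines):
    """Collect feature IDs from an iterable of lines, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def keep_features(table, feature_ids):
    """Return a copy of the table containing only the listed feature IDs.

    table is shaped like {sample_id: {feature_id: count}}.
    """
    feature_ids = set(feature_ids)
    return {sample: {feature: count
                     for feature, count in counts.items()
                     if feature in feature_ids}
            for sample, counts in table.items()}
```

Any open file handle works as `lines`, so a list written out by another tool could be passed straight through, for example `keep_features(table, read_feature_ids(open('ids.txt')))` where `ids.txt` is a placeholder file name.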

command renaming and shuffling

~~feature-table merge-taxa-data → taxa merge~~ (once we have type maps, merge-taxa-data and merge-seq-data will be combined to a single method, probably merge-feature-data - i'm going to leave this as is for now, so we're not bouncing this functionality back and forth)
feature-table view-taxa-data → taxa tabulate
feature-table view-seq-data → feature-table tabulate-seqs

support filtering FeatureData[Sequence]

Feature request came up on the forum. Add a method to support filtering sequences from FeatureData[Sequence] artifacts. This is something we'll need for feature parity with QIIME 1.

use more general term for "sequence" in `summarize` viz

Some of the text in feature-table summarize references "sequences", such as "Sequence Count" and "Sequences retained". Find a more general term to represent this concept.

remove max_frequency_even_sampling_depth functionality from summarize

This is the latest failed attempt at guessing an even sampling depth. It is never what we want to use in practice (in my experience) so we should drop it.

"Interactive Sample Detail" tab of table summary should include the fraction of reads filtered

Typing in Sampling Depth for interactive tab doesn't update

After copying and pasting 4000 into the input field, the plot updates, but the "Sample Loss" text does not.

add FeatureData visualizer

This should allow for easy lookup of feature ids to feature data, and I think the same method should work for FeatureData[Sequence] or FeatureData[Taxonomy].

filter-features demands metadata map, even though it is optional

I see in the help menu that a metadata map is optional:
--m-feature-metadata-file PATH Metadata mapping file [optional]

However, if I attempt to run without a map specified I get the following error (full error below):
OSError: Metadata file None doesn't exist or isn't accessible (e.g., due to incompatible file permissions).

Full command and error:

$ qiime feature-table filter-features --i-table $projectdir/table.qza \
--o-filtered-table $projectdir/picrust/table.filter.qza --p-min-frequency 10 \
--p-min-samples 3

Traceback (most recent call last):
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/qiime-2.0.5-py3.5.egg/qiime/metadata.py", line 25, in load
    df = pd.read_csv(path, sep='\t', dtype=object)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4019)
  File "pandas/parser.pyx", line 683, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8144)
OSError: Expected file path name or file-like object, got <class 'NoneType'> type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/q2cli-0.0.5-py3.5.egg/q2cli/commands.py", line 181, in __call__
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/q2cli-0.0.5-py3.5.egg/q2cli/commands.py", line 231, in handle_in_params
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/q2cli-0.0.5-py3.5.egg/q2cli/handlers.py", line 297, in get_value
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/qiime-2.0.5-py3.5.egg/qiime/metadata.py", line 30, in load
    "due to incompatible file permissions)." % path)
OSError: Metadata file None doesn't exist or isn't accessible (e.g., due to incompatible file permissions).

`summarize`: table in "Feature Detail" tab shows frequencies to one decimal place

This doesn't really make sense, should just be an integer.

feature-table view-seq-data should allow for downloading of fasta file

A student at the workshop mentioned that he likes the BLAST links a lot, but would also like to have the fasta file in case he wants to blast some other sequence against his data. This would be easy to add.

port `collapse_samples.py` functionality from QIIME 1

This can be used for collapsing replicate groups, or for combining samples that are sequenced in different sequencing runs/lanes.
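A minimal sketch of the idea, assuming a table stored as `{sample_id: {feature_id: count}}` and a mapping from each sample ID to its replicate group. Summing is only one possible way to combine replicates, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def collapse_samples(table, sample_to_group):
    """Sum the counts of all samples that share a group label.

    table is shaped like {sample_id: {feature_id: count}} and
    sample_to_group maps every sample ID in the table to a group ID.
    """
    collapsed = defaultdict(Counter)
    for sample, counts in table.items():
        # Counter.update adds to any counts already stored for the group.
        collapsed[sample_to_group[sample]].update(counts)
    return {group: dict(counts) for group, counts in collapsed.items()}
```

The group label could come from any sample metadata column, such as a subject ID shared by the same sample sequenced on different runs or lanes.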

squash commits

Can't merge sequence data with lowercase characters

Some of the sequences I was working with included lowercase characters (as, cs, gs and ts) that cannot be merged as the sequence parser fails and gives an unworkable error.

(qiime2-2017.4) yovazquezbaeza:data-from-qiita$ qiime feature-table merge-seq-data --i-data1 3168-sequences.qza --i-data2 2388-sequences.qza --o-merged-data sequences.qza --verbose
Traceback (most recent call last):
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-153>", line 2, in merge_seq_data
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/sdk/action.py", line 168, in callable_wrapper
    recorder)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 234, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in transformation
    new_view = transformer(view)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 183, in wrapped
    return transformer(view.file.view(self._wrapped_view_type))
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/q2_types/feature_data/_transformer.py", line 150, in _15
    for sequence in _read_dna_fasta(str(ff)):
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/format/fasta.py", line 677, in _fasta_to_generator
    **kwargs)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 338, in __init__
    self._validate()
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 362, in _validate
    list(self.alphabet)))
ValueError: Invalid character in sequence: b'g'.
Valid characters: ['-', 'B', 'A', 'H', 'R', 'V', 'W', 'N', 'Y', 'M', 'K', 'G', 'C', '.', 'T', 'D', 'S']
Note: Use `lowercase` if your sequence contains lowercase characters not in the sequence's alphabet.

Plugin error from feature-table:

  Invalid character in sequence: b'g'.  Valid characters: ['-', 'B',
  'A', 'H', 'R', 'V', 'W', 'N', 'Y', 'M', 'K', 'G', 'C', '.', 'T', 'D',
  'S'] Note: Use `lowercase` if your sequence contains lowercase
  characters not in the sequence's alphabet.

See above for debug info.

Notice that in this case the suggestion to "Use `lowercase` ...." only applies to the skbio interface; that option is not available from the QIIME 2 interface.

filter-features additional functionality

I apologize if the following is already planned, or if I have misunderstood what this method is actually doing. It is also possible that some of what I have written below is already possible with where clauses, in which case I recommend documenting each of these cases when docs are released, as I believe these could be fairly standard uses.

Input: this method only accepts FeatureTable[Frequency]. It would be great to use on FeatureTable[RelativeFrequency], e.g., for filtering by relative abundance.

Filtering Criteria: this method currently only filters on min/max frequency SUMs for a given OTU, which is useful for many situations (e.g., remove all singletons) but not all. A few other cases that would be useful:

Relative frequencies: filter by relative frequencies. This is useful for many cases, and both sum abundance and the criteria suggested below should be applicable to either freq or relative freq
min/max average freq/relative freq: e.g., remove features that comprise <1% average abundance across samples
min/max minimum freq/relative freq: e.g., remove features that are not detected at a minimum of 5% in at least one sample, or remove features detected at more than 5% relative abundance in at least one sample (the latter is probably less useful, but who knows).
min/max maximum freq/relative freq: e.g., remove features that are not detected at 5% in at least one sample, or those that are detected above a certain maximum threshold. The former is a very useful method, e.g., for filtering before making taxonomy barplots, so that only taxa that are abundant in at least one sample are shown.

What to do with filtered features?: In relative frequency tables, it may be useful to keep relative frequencies intact after filtering, e.g., if a user wants to view abundant taxa on a barplot without altering actual relative frequencies (e.g., filtered groups will appear as empty space or as "Other"), or focus on specific clades without altering relative frequencies. Both of these examples relate to taxonomy tables, so may be more relevant for q2-taxa barplots instead...
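One of the criteria above, a minimum average relative frequency, could be sketched as follows. The sketch assumes a frequency table stored as `{sample_id: {feature_id: count}}`; the function names and the `min_mean` parameter are hypothetical and not part of filter-features.

```python
def relative_frequencies(table):
    """Convert {sample_id: {feature_id: count}} to per-sample proportions."""
    relative = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        relative[sample] = {feature: (count / total if total else 0.0)
                            for feature, count in counts.items()}
    return relative


def features_with_min_mean(table, min_mean):
    """Return feature IDs whose mean relative frequency is >= min_mean.

    A feature missing from a sample counts as 0 in that sample.
    """
    relative = relative_frequencies(table)
    n_samples = len(relative)
    features = {f for counts in relative.values() for f in counts}
    return {f for f in features
            if sum(counts.get(f, 0.0) for counts in relative.values())
            / n_samples >= min_mean}
```

The other criteria would follow the same shape, swapping the mean for `min` or `max` over the per-sample values.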

BUG: Can't download FASTA file from tabulate-seqs

The convenience link for downloading the source FASTA appears to be broken (at least when used on q2view).

interactive exploration of rarefaction depth in table summaries

interactive exploration of rarefaction depth, as we had in q2d2. this would be really helpful to integrate with metadata, so you can determine how many samples of each metadata type a specific depth would filter from your data set

Taken from #24

add merge method

will initially merge tables that don't have overlapping sample ids (will error if they do).

Should biom.Table operations be done inplace when possible?

While working on filter with @gregcaporaso, we noticed a few methods are using inplace=False, which could be undesirable if the table is large enough to possibly fill the user's memory. Should methods that allow inplace be set to True?

Ping: @qiime2/core-developers

viz: see how many samples a feature is observed in

This came up on the forum here. One way to accomplish this is to compute qualitative, as opposed to quantitative, stats in feature-table summarize. This is similar to biom summarize-table --qualitative --observations. Instead of a flag we can just compute qualitative and quantitative stats in the viz.
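The qualitative statistic described here amounts to counting, for each feature, the samples in which it has a nonzero frequency. A small sketch, assuming a table stored as `{sample_id: {feature_id: count}}` and a hypothetical function name:

```python
from collections import Counter


def samples_observed_in(table):
    """Return {feature_id: number of samples with a nonzero count}."""
    observed = Counter()
    for counts in table.values():
        # Each sample contributes at most 1 per feature, regardless of count.
        observed.update(feature for feature, count in counts.items()
                        if count > 0)
    return dict(observed)
```

Reporting this next to the summed frequency per feature would give both the qualitative and quantitative views without needing a flag.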

where clause URL appears to yield a 404

These lines specifically. Tested using 2.0.6, which shows the following URL:

https://docs.qiime2.org/2.0.6/tutorials/table-filtering.html

tabulate-seqs html table sorting

This came up in the PHX workshop.

feature-table heatmaps would be useful for exploring data

update to use types defined in q2-types

summarize tabs break on iframe navigation

Navigation through the history while in a summarize tab will break the overall tab controller. If you're on a tab, go to a second, and navigate back the iframe will change location but the tab for your previous location will still be highlighted.

BUG: `summarize` nav tabs don't match current state

Visiting the "Interactive Sample Detail" or the "Feature Detail" tabs via the "html" links

the user is redirected to the appropriate location, however when you scroll up, the tabs don't correctly indicate the "active" tab:

When navigating to this view by clicking on the tab, things look good!

summarize improvements

add thousands separator (e.g., '{:,}'.format(1234567890), maybe think about internationalization here...)
reverse sort frequency per sample and frequency per feature tables, so highest frequencies are on top
add frequency per feature tables
the width of the bars in the sample/feature count plots are confusing to users (they can't really tell that this is a histogram) when there are few samples. we should make the bin size smaller. for example:

interactive exploration of rarefaction depth, as we had in q2d2. this would be really helpful to integrate with metadata, so you can determine how many samples of each metadata type a specific depth would filter from your data set ported to #63
some users at the iceland workshop noted that it would be useful to know the taxonomic composition of the samples when they're choosing their even sampling depth based on the summarize output - this should be possible, but would require optional artifacts and would have some overlap with the existing taxa plots (so this is lower priority) ported to #64

BUG: `summarize` scroll state inconsistent

When changing between views when using the "html" links, the scroll state is not reset. This is potentially confusing --- "why am I looking at the middle of a table all of a sudden?!". This might actually be an issue with q2templates, but thought it relevant to log here (possibly related to #115)
