
q2-feature-table's Introduction

q2-feature-table

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-feature-table's People

Contributors

andrewsanchez, antgonza, benkaehler, cherman2, chriskeefe, clockwork-rat, david-rod, dwthomas, ebolyen, eldeveloper, ghubrakesh, gregcaporaso, gwarmstrong, hagenjp, jairideout, jakereps, jwdebelius, lizgehret, maxvonhippel, nbokulich, oddant1, patthehat033, q2d2, stevendbrown, thermokarst, turanoo, wasade


q2-feature-table's Issues

filter method

Should filter based on sample metadata/IDs, feature metadata/IDs, or counts. In QIIME 1 this is spread across a few different scripts; can (and should) it be reduced to one in QIIME 2?

feature_table.merge should merge replicate samples if it doesn't already

It looks like feature_table.merge fails if there are duplicate samples in the tables being merged, e.g., with the error:

Some samples are present in both tables: sample1...

I agree that this is ideal default behavior, in case someone creates duplicate names in the mapping files accidentally (or more likely if merging tables from unrelated runs or studies that have arbitrary rather than fully randomized sample IDs)

HOWEVER, there should be an option to override this behavior and merge duplicate sample IDs. For example, I am working with some data right now where the same samples were sequenced across multiple runs. It would be nice to be able to easily, intentionally merge these samples.
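
A rough sketch of what such an override could look like, using plain dictionaries of the form {sample_id: {feature_id: count}} as a stand-in for a real feature table. The function name `merge_tables` and the `sum_overlapping` flag are hypothetical, not existing q2-feature-table parameters:

```python
def merge_tables(tables, sum_overlapping=False):
    """Merge {sample_id: {feature_id: count}} tables.

    By default a repeated sample ID is an error. With sum_overlapping=True
    (a hypothetical option) the counts of repeated samples are added.
    """
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged and not sum_overlapping:
                raise ValueError(
                    "Some samples are present in more than one table: "
                    + sample_id)
            target = merged.setdefault(sample_id, {})
            for feature_id, count in counts.items():
                target[feature_id] = target.get(feature_id, 0) + count
    return merged


# The same sample sequenced on two runs.
run1 = {"sample1": {"f1": 3, "f2": 1}}
run2 = {"sample1": {"f1": 2, "f3": 5}, "sample2": {"f1": 1}}
merged = merge_tables([run1, run2], sum_overlapping=True)
```

Keeping the error as the default preserves the current safety check, and the summing only happens when it is asked for explicitly.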

Interactive Sample Detail, question/bug?

Ported from qiime2/qiime2#276 (@antgonza)

During the ASM workshop a user pointed out that the "Sequences Retained: 0 (0.00%)" field is confusing, and I agree. The confusing points are:

  • It starts at 0 with 0.0% retained, where it should be 100%, right?
  • Once you start moving to the right, the value goes up, so it seems to show the sequences removed (not retained), and then once you get to around 50% it starts going up; see screenshot.

[screenshot: feat-table]
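
One possible explanation, sketched below under an assumption that is not confirmed in this thread: the field may report depth × (number of samples with at least that many sequences). Under that reading the value is 0 at depth 0, rises as the depth increases, and drops each time a sample falls below the depth. The function name and the example frequencies are made up for illustration:

```python
def sequences_retained(sample_frequencies, depth):
    """Return (sequences retained, percent of total) at a sampling depth.

    Assumes every sample with at least `depth` sequences is kept and
    contributes exactly `depth` sequences; all other samples are dropped.
    """
    kept = [f for f in sample_frequencies.values() if f >= depth]
    retained = depth * len(kept)
    total = sum(sample_frequencies.values())
    percent = 100.0 * retained / total if total else 0.0
    return retained, percent


frequencies = {"a": 100, "b": 500, "c": 600}
for depth in (0, 100, 500, 501, 601):
    print(depth, sequences_retained(frequencies, depth))
```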

Merging capabilities should not be limited to two objects at a time

With qiime feature-table merge, users should be able to merge an arbitrary number of tables at a time. Currently the interface only allows for two tables to be merged at a time, which leads to a proliferation of intermediate tables. The same thing applies for qiime feature-table merge-seq-data.
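
As an illustration only, a pairwise merge can be folded over any number of inputs, so no intermediate tables need to be written. This sketch uses plain dictionaries ({sample_id: {feature_id: count}}) rather than real artifacts, and `merge_pair` / `merge_many` are hypothetical names:

```python
from functools import reduce


def merge_pair(left, right):
    """Merge two tables with disjoint sample IDs (error on overlap)."""
    overlap = sorted(set(left) & set(right))
    if overlap:
        raise ValueError(
            "Some samples are present in both tables: " + ", ".join(overlap))
    return {**left, **right}


def merge_many(tables):
    """Merge an arbitrary number of tables by folding merge_pair over them."""
    return reduce(merge_pair, tables, {})


tables = [
    {"s1": {"f1": 1}},
    {"s2": {"f1": 2}},
    {"s3": {"f2": 3}},
]
combined = merge_many(tables)
```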

Filtering samples from a rarefied table forgets uniform sampling

After rarefying a feature table, I filtered some samples and then realized that the uniform-sampling property had been lost. I understand this may happen in some cases, but it should not happen when you only filter samples. This is the command I used:

qiime feature-table filter-samples \
--i-table table.even1250.qza \
--p-where 'env_material != "sterile water"' \
--o-filtered-table table.even1250.filtered.qza \
--m-sample-metadata-file mapping-file.txt

The types went from:

FeatureTable[Frequency] % Properties(['uniform-sampling']) -> FeatureTable[Frequency]

summarize fails with a traceback if passed a table with zero or one sample

This can happen after overzealous filtering.

Traceback (most recent call last):
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/bin/qiime", line 11, in <module>
    load_entry_point('q2cli', 'console_scripts', 'qiime')()
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/caporaso/Google Drive/code/qiime2/q2cli/q2cli/commands.py", line 210, in __call__
    results = action(**arguments)
  File "<decorator-gen-138>", line 2, in summarize
  File "/Users/caporaso/Google Drive/code/qiime2/qiime2/qiime/core/callable.py", line 221, in callable_wrapper
    output_types)
  File "/Users/caporaso/Google Drive/code/qiime2/qiime2/qiime/core/callable.py", line 418, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/Users/caporaso/Google Drive/code/qiime2/q2-feature-table/q2_feature_table/_summarize.py", line 67, in summarize
    sample_counts_ax = sns.distplot(sample_counts, kde=False, rug=True)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/seaborn/distributions.py", line 209, in distplot
    bins = min(_freedman_diaconis_bins(a), 50)
  File "/Users/caporaso/miniconda3/envs/qiime2-dev/lib/python3.5/site-packages/seaborn/distributions.py", line 30, in _freedman_diaconis_bins
    h = 2 * iqr(a) / (len(a) ** (1 / 3))
TypeError: len() of unsized object
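
A minimal sketch of a guard that could run before the histogram is drawn, so the user gets a readable message instead of a traceback. `check_plottable` is a hypothetical helper, and the minimum of two samples is an assumption based on the failure described above:

```python
def check_plottable(sample_frequencies, minimum=2):
    """Return None if a histogram can be drawn, else a readable explanation.

    `sample_frequencies` maps sample ID to total frequency.
    """
    n = len(sample_frequencies)
    if n >= minimum:
        return None
    return ("The table contains {} sample(s); at least {} are needed to "
            "plot a frequency histogram. Was the table filtered too "
            "aggressively?".format(n, minimum))


message = check_plottable({"sample1": 1250})
if message is not None:
    print(message)
```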

Error message resulting from choosing too high a sampling depth is not clear.

[image]
The error message you get when the sampling depth value is too high is vague and doesn't really point out what the problem is. I found the cause by looking at the sampling depth qzv and csv. It was giving this 2-d 3-d error when I chose a sampling depth that I thought was reasonable, but that turned out not to be.
Thanks,
Arron

BUG: summarize extended scroll

Bug Description
When viewing the "Overview" tab in a summarize visualization, if you click into the detailed HTML views of either "Frequency per sample detail" or "Frequency per feature detail," pressing the back button to get back to the main "Overview" tab doesn't resize the window contents vertically. If the per feature table is long, this causes the "Overview" tab to have an equal height made up of whitespace, which looks pretty awesome, but isn't really useful!

Comments
This isn't related to the iframe hackery in q2view, I was able to reproduce this in both q2view and qiime tools view.

Shared vs. Unique OTUs

All,

As an ecologist, quantifying and IDing which microbes are uniquely present in different environments may aid in a functional association with the holobiont. In familiarizing myself with QIIME (not QIIME 2, yet), I've noticed that there is no script for doing so, where, for example, using a mapping file and biom table, the user may quantify the unique as well as shared OTUs between 'treatments' (whether that be geographic location, environmental stress, etc.).

As holobiont or hologenome ecology is becoming increasingly important in discerning the evolution of holobionts and hologenomes, I write this query to see if there is interest in developing such a script (that, ideally, is compatible with both QIIME and QIIME 2.0).

Regards,
Tyler

integrate taxonomic composition in samples in table summaries

Improvement Description

Some users at the Iceland workshop noted that it would be useful to know the taxonomic composition of the samples when they're choosing their even sampling depth based on the summarize output. This should be possible, but would require optional artifacts and would have some overlap with the existing taxa plots (so this is lower priority).

References
Taken from #24

add option for seeding rarefaction?

Improvement Description
A forum user suggested that we add support for seeding rarefaction, which is an interesting idea for supporting reproducibility, though I'm not certain what the specific use cases would be.

Questions
Are there times where we would want to perfectly replicate rarefaction results? If so, we'd need the seed to be logged into the artifact's provenance.
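
To make the idea concrete, here is a small sketch of seeded subsampling without replacement using only the standard library. It is not how q2-feature-table rarefies; `rarefy` and its `seed` parameter are hypothetical, and a sample is represented as a {feature_id: count} dictionary:

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's counts to `depth` without replacement.

    Passing the same seed reproduces the same result; seed=None draws a
    fresh random subsample each time.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(
            "Sampling depth {} exceeds the sample's total frequency "
            "{}.".format(depth, total))
    rng = random.Random(seed)
    # Expand to one entry per observed sequence; sorting makes the pool
    # order independent of dictionary insertion order.
    pool = [fid for fid, n in sorted(counts.items()) for _ in range(n)]
    result = {}
    for fid in rng.sample(pool, depth):
        result[fid] = result.get(fid, 0) + 1
    return result


counts = {"f1": 5, "f2": 3, "f3": 2}
first = rarefy(counts, 4, seed=7)
second = rarefy(counts, 4, seed=7)
```

If something like this were adopted, the seed would have to be recorded in provenance for the result to be reproducible by someone else.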

References
suggested

filter-features by feature names

Filter a feature table to match a list of feature names. Some scenarios for filtering based on:

  1. file containing list of names on separate lines, e.g., your 100 favorite taxa
  2. string containing comma-separated list of taxa to keep/remove, e.g., 'chloroplast' (similar to qiime1's filter_taxa_from_otu_table.html). This function is really essential when working with host- and especially plant-associated samples (which may contain high levels of chloroplast and mitochondria).
  3. somehow pipe in names directly from ANCOM or other statistical tests (this should probably be generalizable so other plugins can follow suit — e.g., ANCOM or other methods need to optionally output a list of feature IDs that can be read to filter-features following option 1 above)

The third example is what really got me thinking here — it would be great to take a list of feature IDs from something like ANCOM that are significantly different among groups (for example) and use this to filter feature tables before doing something like PCoA biplots, correlation tests, or additional significance testing.
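
A sketch of scenario 1, with a plain {sample_id: {feature_id: count}} dictionary standing in for the table. Both helper names and the `exclude` flag are hypothetical; the ID list could just as well come from another plugin's output:

```python
def read_feature_ids(lines):
    """Read one feature ID per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def filter_features(table, feature_ids, exclude=False):
    """Keep only the listed features, or drop them when exclude=True."""
    ids = set(feature_ids)
    return {
        sample_id: {fid: count for fid, count in counts.items()
                    if (fid in ids) != exclude}
        for sample_id, counts in table.items()
    }


table = {"s1": {"f1": 4, "f2": 1, "f3": 9}, "s2": {"f2": 7, "f3": 2}}
wanted = read_feature_ids(["f1\n", "\n", "f3\n"])
kept = filter_features(table, wanted)
dropped = filter_features(table, wanted, exclude=True)
```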

command renaming and shuffling

  • feature-table merge-taxa-data → taxa merge (once we have type maps, merge-taxa-data and merge-seq-data will be combined into a single method, probably merge-feature-data - I'm going to leave this as is for now, so we're not bouncing this functionality back and forth)
  • feature-table view-taxa-data → taxa tabulate
  • feature-table view-seq-data → feature-table tabulate-seqs

add FeatureData visualizer

This should allow for easy lookup of feature ids to feature data, and I think the same method should work for FeatureData[Sequence] or FeatureData[Taxonomy].

filter-features demands metadata map, even though it is optional

I see in the help menu that a metadata map is optional:
--m-feature-metadata-file PATH Metadata mapping file [optional]

However, if I attempt to run without a map specified I get the following error (full error below):
OSError: Metadata file None doesn't exist or isn't accessible (e.g., due to incompatible file permissions).

Full command and error:

$ qiime feature-table filter-features --i-table $projectdir/table.qza \
--o-filtered-table $projectdir/picrust/table.filter.qza --p-min-frequency 10 \
--p-min-samples 3

Traceback (most recent call last):
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/qiime-2.0.5-py3.5.egg/qiime/metadata.py", line 25, in load
    df = pd.read_csv(path, sep='\t', dtype=object)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/pandas/io/parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4019)
  File "pandas/parser.pyx", line 683, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8144)
OSError: Expected file path name or file-like object, got <class 'NoneType'> type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/q2cli-0.0.5-py3.5.egg/q2cli/commands.py", line 181, in __call__
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/q2cli-0.0.5-py3.5.egg/q2cli/commands.py", line 231, in handle_in_params
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/q2cli-0.0.5-py3.5.egg/q2cli/handlers.py", line 297, in get_value
  File "/Users/nbokulich/miniconda3/envs/qiime2-05/lib/python3.5/site-packages/qiime-2.0.5-py3.5.egg/qiime/metadata.py", line 30, in load
    "due to incompatible file permissions)." % path)
OSError: Metadata file None doesn't exist or isn't accessible (e.g., due to incompatible file permissions).
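
The traceback suggests the loader is being called with None when the option is omitted. A minimal sketch of the kind of check that would avoid this; `load_optional_metadata` is a hypothetical helper and `loader` stands in for whatever function actually reads the metadata file:

```python
def load_optional_metadata(path, loader):
    """Call loader(path) only when a path was actually provided."""
    if path is None:
        # The option was omitted, so there is nothing to load.
        return None
    return loader(path)


# str.upper is used here purely as a stand-in loader.
nothing = load_optional_metadata(None, str.upper)
something = load_optional_metadata("map.txt", str.upper)
```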

Can't merge sequence data with lowercase characters

Some of the sequences I was working with included lowercase characters (a, c, g, and t). These cannot be merged: the sequence parser fails and gives an error the user cannot act on.

(qiime2-2017.4) yovazquezbaeza:data-from-qiita$ qiime feature-table merge-seq-data --i-data1 3168-sequences.qza --i-data2 2388-sequences.qza --o-merged-data sequences.qza --verbose
Traceback (most recent call last):
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-153>", line 2, in merge_seq_data
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/sdk/action.py", line 168, in callable_wrapper
    recorder)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 234, in _view
    result = transformation(self._archiver.data_dir)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in transformation
    new_view = transformer(view)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/transform.py", line 183, in wrapped
    return transformer(view.file.view(self._wrapped_view_type))
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/q2_types/feature_data/_transformer.py", line 150, in _15
    for sequence in _read_dna_fasta(str(ff)):
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/io/format/fasta.py", line 677, in _fasta_to_generator
    **kwargs)
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 338, in __init__
    self._validate()
  File "/home/yovazquezbaeza/miniconda/envs/qiime2-2017.4/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 362, in _validate
    list(self.alphabet)))
ValueError: Invalid character in sequence: b'g'.
Valid characters: ['-', 'B', 'A', 'H', 'R', 'V', 'W', 'N', 'Y', 'M', 'K', 'G', 'C', '.', 'T', 'D', 'S']
Note: Use `lowercase` if your sequence contains lowercase characters not in the sequence's alphabet.

Plugin error from feature-table:

  Invalid character in sequence: b'g'.  Valid characters: ['-', 'B',
  'A', 'H', 'R', 'V', 'W', 'N', 'Y', 'M', 'K', 'G', 'C', '.', 'T', 'D',
  'S'] Note: Use `lowercase` if your sequence contains lowercase
  characters not in the sequence's alphabet.

See above for debug info.

Notice that in this case the suggestion to "Use `lowercase` ..." can only be followed through the skbio interface, not through the QIIME 2 interface.
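
Until the interface handles this, one workaround is to uppercase the sequence lines of the FASTA file before importing it. A small standard-library sketch; `uppercase_fasta` is a hypothetical helper, not part of QIIME 2 or skbio:

```python
def uppercase_fasta(lines):
    """Yield FASTA lines with sequence lines uppercased.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else line.upper()


records = [">seq1 some description\n", "acgtACGT\n", "ggta\n"]
fixed = list(uppercase_fasta(records))
```

Note that this discards any meaning the lowercase characters carried (for example soft masking), so it is only appropriate when case is not informative.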

filter-features additional functionality

I apologize if the following is already planned, or if I have misunderstood what this method is actually doing. It is also possible that some of what I have written below is possible with where clauses — in which case I recommend documenting each of these cases when docs are released, as I believe these could be fairly standard uses.

Input: this method only accepts FeatureTable[Frequency]. It would be great to use on FeatureTable[RelativeFrequency], e.g., for filtering by relative abundance.

Filtering Criteria: this method currently only filters on min/max frequency SUMs for a given OTU, which is useful for many situations (e.g., remove all singletons) but not all. A few other cases that would be useful:

  1. Relative frequencies: filter by relative frequencies. This is useful for many cases, and both sum abundance and the criteria suggested below should be applicable to either freq or relative freq
  2. min/max average freq/relative freq: e.g., remove features that comprise <1% average abundance across samples
  3. min/max minimum freq/relative freq: e.g., remove features that are not detected at a minimum of 5% in at least one sample, or remove features detected at more than 5% relative abundance in at least one sample (the latter is probably less useful, but who knows).
  4. min/max maximum freq/relative freq: e.g., remove features that are not detected at 5% in at least one sample, or those that are detected above a certain maximum threshold. The former is a very useful method, e.g., for filtering before making taxonomy barplots, so that only taxa that are abundant in at least one sample are shown.
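
To illustrate criteria 2 and 4, here is a sketch that selects features by mean or maximum relative frequency across samples. The table is a plain {sample_id: {feature_id: count}} dictionary, and `features_passing`, `min_mean`, and `min_max` are hypothetical names rather than existing filter-features parameters:

```python
def relative_frequencies(table):
    """Convert per-sample counts to per-sample relative frequencies."""
    rel = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        rel[sample_id] = {fid: (c / total if total else 0.0)
                          for fid, c in counts.items()}
    return rel


def features_passing(table, min_mean=None, min_max=None):
    """Return IDs of features meeting the given relative-frequency thresholds.

    min_mean: minimum average relative frequency across all samples.
    min_max: minimum relative frequency reached in at least one sample.
    """
    rel = relative_frequencies(table)
    features = {fid for counts in table.values() for fid in counts}
    keep = set()
    for fid in features:
        values = [rel[s].get(fid, 0.0) for s in rel]
        if min_mean is not None and sum(values) / len(values) < min_mean:
            continue
        if min_max is not None and max(values) < min_max:
            continue
        keep.add(fid)
    return keep


table = {"s1": {"f1": 90, "f2": 10}, "s2": {"f1": 99, "f2": 1}}
abundant_on_average = features_passing(table, min_mean=0.06)
abundant_somewhere = features_passing(table, min_max=0.05)
```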

What to do with filtered features?: In relative frequency tables, it may be useful to keep relative frequencies intact after filtering, e.g., if a user wants to view abundant taxa on a barplot without altering actual relative frequencies (e.g., filtered groups will appear as empty space or as "Other"), or focus on specific clades without altering relative frequencies. Both of these examples relate to taxonomy tables, so may be more relevant for q2-taxa barplots instead...

add merge method

will initially merge tables that don't have overlapping sample ids (will error if they do).

viz: see how many samples a feature is observed in

This came up on the forum here. One way to accomplish this is to compute qualitative, as opposed to quantitative, stats in feature-table summarize. This is similar to biom summarize-table --qualitative --observations. Instead of a flag we can just compute qualitative and quantitative stats in the viz.
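
The qualitative statistic is just a presence/absence count per feature. A sketch on a plain {sample_id: {feature_id: count}} dictionary, with a hypothetical function name:

```python
def samples_per_feature(table):
    """Count, for each feature, the samples in which it has a nonzero count."""
    observed = {}
    for counts in table.values():
        for feature_id, count in counts.items():
            if count > 0:
                observed[feature_id] = observed.get(feature_id, 0) + 1
    return observed


table = {
    "s1": {"f1": 4, "f2": 0},
    "s2": {"f1": 1, "f2": 7},
    "s3": {"f1": 2},
}
observed_in = samples_per_feature(table)
```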

summarize tabs break on iframe navigation

Navigating through the history while in a summarize tab will break the overall tab controller. If you're on one tab, go to a second, and then navigate back, the iframe will change location but the tab for your previous location will still be highlighted.

BUG: `summarize` nav tabs don't match current state

Visiting the "Interactive Sample Detail" or the "Feature Detail" tabs via the "html" links
[screenshot, 2017-07-10, 8:50:30 pm]
the user is redirected to the appropriate location; however, when you scroll up, the tabs don't correctly indicate the "active" tab:
[screenshot, 2017-07-10, 8:52:36 pm]
When navigating to this view by clicking on the tab, things look good!
[screenshot, 2017-07-10, 8:53:22 pm]

summarize improvements

  • add thousands separator (e.g., '{:,}'.format(1234567890), maybe think about internationalization here...)
  • reverse sort frequency per sample and frequency per feature tables, so highest frequencies are on top
  • add frequency per feature tables
  • the width of the bars in the sample/feature count plots is confusing to users (they can't really tell that this is a histogram) when there are few samples. We should make the bin size smaller. For example:

[screenshot, 2016-10-12]

  • interactive exploration of rarefaction depth, as we had in q2d2. This would be really helpful to integrate with metadata, so you can determine how many samples of each metadata type a specific depth would filter from your data set (ported to #63)
  • some users at the Iceland workshop noted that it would be useful to know the taxonomic composition of the samples when they're choosing their even sampling depth based on the summarize output. This should be possible, but would require optional artifacts and would have some overlap with the existing taxa plots, so this is lower priority (ported to #64)

BUG: `summarize` scroll state inconsistent

When changing between views when using the "html" links, the scroll state is not reset. This is potentially confusing --- "why am I looking at the middle of a table all of a sudden?!". This might actually be an issue with q2templates, but thought it relevant to log here (possibly related to #115)
