Git Product home page Git Product logo

q2-diversity-lib's Introduction

q2-diversity-lib

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-diversity-lib's People

Contributors

chriskeefe avatar colinvwood avatar david-rod avatar ebolyen avatar hagenjp avatar lizgehret avatar oddant1 avatar q2d2 avatar thermokarst avatar turanoo avatar wasade avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

q2-diversity-lib's Issues

Alpha diversity operates on dense feature tables

Bug Description
Alpha diversity calculations are occurring on dense feature tables. These transformations require >100x resources than what is necessary as it is not unusual for microbiome data to be < 1% dense. I believe it is appropriate to remark as a regression bug as Q1 operated on sparse data.

Expected behavior
Alpha diversity should be computed using memory proportional to the number of nonzero entries, rather than to the number of samples and features.

Comments
A non-perfect, but appropriate and quick to do work around, would be to compute alpha per-sample by iterating over a BIOM table with .iter(), requesting dense=True, and computing alpha diversity on each np.array. If needed, this could be parallelized through joblib. This is a much lower burden change than adapting skbio's alpha diversity call to properly support sparse vectors (and which may be complex for some metrics).

I can issue a PR soon on this.

beta-phylogenetic-alt: psutil(logical=False) reports too few CPUs on EC2

Current Behavior
The current CPU count check uses logical=False which has the effect of returning the number of "physical" cores observed on the system. For EC2, this is seen as only a single core. I believe the original intent of this was to discourage use of hyperthreading -- performance testing had suggested hyperthreading to be detrimental relative to avoiding it however using multiple threads is still faster than using a single thread.

Proposed Behavior
The suggested fix here is to set logical=True. In addition, we may want to update the error to a warning, and to warn if the number of requested threads is > 16. The motivation for 16 is that's where performance leveled out when testing on 32-core boxes.

References
As reported on the QIIME 2 forum here, and the code presenting an issue is here.

BUG: After 2022.8 faith_pd_vector.qza comes with a #SampleID index/column name

After 2022.8 I noticed that the table output of core-metrics' faith_pd_vector.qza has a #SampleID as its first column name, while the other vectors continue to have a blank column/index name. Not an issue within the Q2 environment but it does break some external pipelines, for example if you are importing into R expecting them all to be similar in structure.

Remove psutil patches from `test_util.py` "through-the-framework" tests

Improvement Description
Remove psutil patches from test_cpu_request_through_framework() and test_more_threads_than_max_stripes()

Current Behavior
These tests are currently running on patched psutil results, due to a bug in psutil's handling of GHA's test runner hardware. This is fine for now, but a little gross.

Proposed Behavior
Remove patches, and let psutil run properly. Note: test runners have two cores, so tests should not call more than two cores.

References
details here

Reduce decorator/decorated function coupling in _util._disallow_empty_tables()

Improvement Description
Reduce coupling by grabbing View type annotations from wrapped function signatures and using appropriate transformers instead of hard-coding acceptable view_types in the decorator.

Current Behavior
At present, _disallow_empty_tables is tightly coupled to the table view types specified by the functions it wraps. If a new view type is used for tables in a function, the decorator will need to be refactored to explicitly allow that view type.

Proposed Behavior
Introspect view type annotations in wrapped function signatures. Use get_transformer() to get a transformer to biom.Table. If none exists, fail on invalid view type/missing transformer. Else, transform to biom.Table and check is_empty() as currently.

Questions

  1. Should error messages here be developer-facing (e.g. no transformer exists from View type to biom.Table) or user facing as currently implemented?

Comments
Unless we see an increase in the number of possible table View types, this 'improvement' might just be over-engineering.

ENH: refactor test data to run through the QIIME 2 Framework

Improvement Description
Many of the tests in this package can be refactored to run through the framework. Unit tests run in isolation are still important here in some cases and will be retained, but most of these tests are less useful or reliable if they fail to take framework machinery into account.

Current Behavior
Most tests test code in isolation from the framework.

Proposed Behavior
Most tests check behavior when the plugin is integrated with the framework.

References
Raised here

Possible patching bug/maintainability concerns in `test_util.py`

Improvement Description
Tests in test_util.py which use mock.patch are wordy, and do not line up nicely with samples from the python docs. Further, they exhibit unusual behavior I haven't yet been able to explain, which raises concerns that they may not be functioning as intended

Current Behavior

    @mock.patch("q2_diversity_lib._util.psutil.Process")
    def test_function_with_an_n_jobs_param(self, mock_process):
        mock_process = psutil.Process()
        mock_process.cpu_affinity = mock.MagicMock(return_value=[0, 1, 2])
        self.assertEqual(self.function_w_n_jobs_param(3), 3)

If we remove mock_process = psutil.Process() from the above snippet, the tests fails, and calls to psutil.Process.cpu_affinity() in the code under test return a mock object instead of returning the assigned return_value=[0, 1, 2]

Proposed Behavior
More succinct and idiomatic testing will improve confidence in test behavior.

Comments

  1. If possible, test_system_has_no_cpu_affinity() should continue to operate without using patch's create=true option. (This arg tells patch to create nonexistent attributes). The code is uglier, but the semantics are cleaner if we let tests run on macOS trigger create AttributeErrors "naturally" than if we patch in an artificial cpu_affinity method which yields a patched AttributeError.

References

  1. Discussion/docs link in this comment
  2. Attempts to remedy in this PR
  3. Further notes

Allow Bray-Curtis to accept Relative Frequency Data

Improvement Description
Allow FeatureTable[RelativeFrequency] to be passed to bray_curtis (1, 2)

Current Behavior
Only FeatureTable[Frequency] is allowed.

Comments
IIRC, I ran into some unexpected test failures when initially trying to implement this. Tests showing that Bray-Curtis produces expected results with relative frequency data will be required in any PR that closes this issue.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.