This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.
qiime2 / q2-diversity-lib Goto Github PK
View Code? Open in Web Editor NEWLicense: BSD 3-Clause "New" or "Revised" License
License: BSD 3-Clause "New" or "Revised" License
This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.
Improvement Description
beta-phylogenetic-meta-passthrough
currently only operates on FeatureTable[Frequency]
, but as I understand it, the underlying method should handle FeatureTable[Frequency | RelativeFrequency | PresenceAbsence]
just fine.
References
Please see qiime2/q2-diversity#272 for more information.
Bug Description
Alpha diversity calculations are occurring on dense feature tables. These transformations require >100x resources than what is necessary as it is not unusual for microbiome data to be < 1% dense. I believe it is appropriate to remark as a regression bug as Q1 operated on sparse data.
Expected behavior
Alpha diversity should be computed using memory proportional to the number of nonzero entries, rather than to the number of samples and features.
Comments
A non-perfect, but appropriate and quick to do work around, would be to compute alpha per-sample by iterating over a BIOM table with .iter()
, requesting dense=True
, and computing alpha diversity on each np.array
. If needed, this could be parallelized through joblib. This is a much lower burden change than adapting skbio
's alpha diversity call to properly support sparse vectors (and which may be complex for some metrics).
I can issue a PR soon on this.
Current Behavior
The current CPU count check uses logical=False
which has the effect of returning the number of "physical" cores observed on the system. For EC2, this is seen as only a single core. I believe the original intent of this was to discourage use of hyperthreading -- performance testing had suggested hyperthreading to be detrimental relative to avoiding it however using multiple threads is still faster than using a single thread.
Proposed Behavior
The suggested fix here is to set logical=True
. In addition, we may want to update the error to a warning, and to warn if the number of requested threads is > 16. The motivation for 16 is that's where performance leveled out when testing on 32-core boxes.
References
As reported on the QIIME 2 forum here, and the code presenting an issue is here.
After 2022.8 I noticed that the table output of core-metrics' faith_pd_vector.qza
has a #SampleID
as its first column name, while the other vectors continue to have a blank column/index name. Not an issue within the Q2 environment but it does break some external pipelines, for example if you are importing into R expecting them all to be similar in structure.
Improvement Description
Remove psutil patches from test_cpu_request_through_framework()
and test_more_threads_than_max_stripes()
Current Behavior
These tests are currently running on patched psutil results, due to a bug in psutil
's handling of GHA's test runner hardware. This is fine for now, but a little gross.
Proposed Behavior
Remove patches, and let psutil
run properly. Note: test runners have two cores, so tests should not call more than two cores.
References
details here
Improvement Description
Test data for weighted unifrac is ported from q2-diversity
, and both actual and expected appear to be produced by the same software. Test data should be generated "by hand" or with a different software pipeline to make tests more meaningful.
Improvement Description
Reduce coupling by grabbing View type annotations from wrapped function signatures and using appropriate transformers instead of hard-coding acceptable view_types in the decorator.
Current Behavior
At present, _disallow_empty_tables
is tightly coupled to the table
view types specified by the functions it wraps. If a new view type is used for tables in a function, the decorator will need to be refactored to explicitly allow that view type.
Proposed Behavior
Introspect view type annotations in wrapped function signatures. Use get_transformer()
to get a transformer to biom.Table
. If none exists, fail on invalid view type/missing transformer. Else, transform to biom.Table and check is_empty() as currently.
Questions
no transformer exists from View type to biom.Table
) or user facing as currently implemented?Comments
Unless we see an increase in the number of possible table
View types, this 'improvement' might just be over-engineering.
Improvement Description
Many of the tests in this package can be refactored to run through the framework. Unit tests run in isolation are still important here in some cases and will be retained, but most of these tests are less useful or reliable if they fail to take framework machinery into account.
Current Behavior
Most tests test code in isolation from the framework.
Proposed Behavior
Most tests check behavior when the plugin is integrated with the framework.
References
Raised here
Improvement Description
Tests in test_util.py which use mock.patch
are wordy, and do not line up nicely with samples from the python docs. Further, they exhibit unusual behavior I haven't yet been able to explain, which raises concerns that they may not be functioning as intended
Current Behavior
@mock.patch("q2_diversity_lib._util.psutil.Process")
def test_function_with_an_n_jobs_param(self, mock_process):
mock_process = psutil.Process()
mock_process.cpu_affinity = mock.MagicMock(return_value=[0, 1, 2])
self.assertEqual(self.function_w_n_jobs_param(3), 3)
If we remove mock_process = psutil.Process()
from the above snippet, the tests fails, and calls to psutil.Process.cpu_affinity() in the code under test return a mock object instead of returning the assigned return_value=[0, 1, 2]
Proposed Behavior
More succinct and idiomatic testing will improve confidence in test behavior.
Comments
test_system_has_no_cpu_affinity()
should continue to operate without using patch's create=true
option. (This arg tells patch to create nonexistent attributes). The code is uglier, but the semantics are cleaner if we let tests run on macOS trigger create AttributeErrors "naturally" than if we patch in an artificial cpu_affinity
method which yields a patched AttributeError.References
Improvement Description
Allow FeatureTable[RelativeFrequency] to be passed to bray_curtis
(1, 2)
Current Behavior
Only FeatureTable[Frequency] is allowed.
Comments
IIRC, I ran into some unexpected test failures when initially trying to implement this. Tests showing that Bray-Curtis produces expected results with relative frequency data will be required in any PR that closes this issue.
Improvement Description
diversity-lib
should provide functions that dispatch calls from diversity alpha
and diversity beta
to the best extant implementation of the selected measure.
Improvement Description
To confirm that feature tables are converted to presence/absence data before the metric is applied (sklearn handles this for a lot of methods, so we need tests to ensure that we know if that ever changes).
References
See qiime2/q2-diversity#189 for more context.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.