
gwin's People

Contributors

a-r-williamson, ahnitz, cdcapano, cmbiwer, cyberface, dfinstad, duncanmmacleod, farr, hannah-griggs, hoyc1, josh-willis, micamu, n-wbrown, soumide1102, spxiwh, titodalcanton, vivienr


gwin's Issues

analytical likelihoods

Create a pathological analytical likelihood function to test samplers against:

  • compute the evidence analytically, to check against
  • have ~15 dimensions
  • use different length scales in different parameters
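As a concrete sketch of what such a likelihood could look like (hypothetical, not gwin API): a 15-dimensional Gaussian with a different length scale in each dimension, chosen so the evidence is known analytically.

```python
import numpy as np

NDIM = 15
# widths spanning three orders of magnitude, to stress the samplers
SCALES = np.logspace(-2, 1, NDIM)

def log_likelihood(x):
    """Normalised Gaussian log-likelihood with per-dimension widths SCALES."""
    norm = 0.5 * NDIM * np.log(2 * np.pi) + np.sum(np.log(SCALES))
    return -0.5 * np.sum((np.asarray(x) / SCALES) ** 2) - norm

def log_evidence(prior_bound=100.0):
    """Analytic log-evidence for a uniform prior on [-b, b]^NDIM.

    Since the likelihood integrates to one over x, the evidence is just
    the prior density (assuming the box contains all the likelihood mass).
    """
    return -NDIM * np.log(2 * prior_bound)
```

Any sampler's evidence estimate can then be compared directly against `log_evidence()`.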

We need more unit tests!

@vivienr, @cdcapano, @cmbiwer, the unit tests currently only hit 22% of the code in the gwin package (this deliberately does not include the scripts under /bin/). I would like to get that number up to around 90% ideally (>75% more realistically) before anyone embarks on a grand new development project as part of gwin.

I know how to write unit tests, but have no idea how to actually test the code that exists in gwin at the moment, so am reaching out for help in writing new tests. The base plan is as follows:

  • each class (e.g. GwinClass) has its own test class (TestGwinClass, in the style of unittest.TestCase)
  • each method of each class has its own unit-test method as part of the test class
  • each function has its own test function
  • each line of code in each function should be tested somewhere; we can simplify annoying loops and the like by using fixtures and parametrisation
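As a minimal sketch of that plan (GwinClass here is a hypothetical stand-in, not a real gwin class):

```python
import pytest

class GwinClass(object):
    """Hypothetical stand-in for a class defined in the package."""
    def __init__(self, value):
        self.value = value

    def doubled(self):
        return 2 * self.value

@pytest.fixture
def obj():
    """Fixtures avoid repeating set-up code in every test method."""
    return GwinClass(value=3)

class TestGwinClass(object):
    """One test class per class; one test method per method."""
    def test_doubled(self, obj):
        assert obj.doubled() == 6

# parametrisation replaces hand-written loops over test inputs
@pytest.mark.parametrize('value, expected', [(0, 0), (3, 6), (-2, -4)])
def test_doubled_values(value, expected):
    assert GwinClass(value).doubled() == expected
```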

Basically, every object we define in the package should have a corresponding test object. To that end I will set up a PR that defines a test class for every class definition I can find, and hope that y'all can help me populate it with useful stuff.

Testing the scripts in /bin/ is another matter altogether, and not something I want to get to right now.

Create a new generator module

The current module in pycbc.waveforms.generate is getting a bit complicated. In particular, the FDomainDetFrameGenerator should be split up into different classes that would be used for specific types of PE runs, such as ROQ runs vs. typical Gaussian-noise runs. Then the model could select which generator to use.

Add condition strain node to make workflow

There should be a condition-strain node in the workflow. The condition-strain node will do all of the data loading, then write out an (HDF?) file of the Fourier-domain data. All subsequent nodes will then simply load the FD data, instead of repeating the data conditioning. This will ensure that the ROQ weights, BayesWave (when it's added) and gwin are all using exactly the same data.
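The write-once/load-many pattern could be sketched roughly as follows (function names are hypothetical; a numpy .npz file stands in for the HDF file):

```python
import numpy as np

def condition_strain(strain, sample_rate, outfile):
    """Hypothetical condition-strain node: FFT the time-domain strain once
    and write the frequency-domain data out (HDF in the real workflow;
    a numpy .npz file stands in for it here)."""
    fd_data = np.fft.rfft(strain)
    freqs = np.fft.rfftfreq(len(strain), d=1.0 / sample_rate)
    np.savez(outfile, freqs=freqs, fd_data=fd_data)

def load_fd_data(infile):
    """All downstream nodes (ROQ weights, BayesWave, gwin) load the same
    FD data instead of repeating the conditioning."""
    with np.load(infile) as f:
        return f['freqs'], f['fd_data']
```

Since every node reads the same file, any change to the conditioning only has to happen in one place.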

Save seed to output file

Save the seed to the output file. If the seed is provided, let numpy use its default seed behavior.
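A minimal sketch of recording the seed (function names are hypothetical; a dict stands in for the output file's HDF attrs):

```python
import numpy as np

def init_rng(seed=None):
    """Seed numpy's RNG, drawing a seed from numpy's own entropy if none
    is given, and return whatever was used so it can be saved."""
    if seed is None:
        seed = np.random.randint(0, 2**31 - 1)
    np.random.seed(seed)
    return seed

def save_seed(attrs, seed):
    """Record the seed in the output file's attributes."""
    attrs['seed'] = seed
```

Saving whichever seed was actually used makes every run reproducible after the fact.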

benchmarking suite for GWin

A benchmarking suite that would run:

  • an end-to-end test
  • timing of individual functions

It should also collect statistics such as the number of likelihood calls.
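The two ingredients above could be sketched like this (hypothetical helpers, not existing gwin code):

```python
import time

def count_calls(fn):
    """Wrap a likelihood function so the suite can report the number of
    likelihood calls made during a run."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

def time_function(fn, *args, repeat=100):
    """Average wall-clock time of an individual function over repeats."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat
```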

Test failing when building KDE with kombine (sometimes)

Some of the time one of the unit tests fails, related to constructing a KDE with kombine:

$ python setup.py test
...
test/test_results.py::test_construct_kde[True] FAILED                                      [ 77%]
...

The test function is:

gwin/test/test_results.py

Lines 95 to 110 in 0d50dca

@pytest.mark.parametrize('kombine', [
    None,  # test ImportError
    False,
    pytest.param(True, marks=skip_no_kombine),
])
def test_construct_kde(kombine):
    samples = numpy.random.rand(5, 5)
    if kombine is None:
        with mock.patch.dict('sys.modules', {'kombine': None}):
            with pytest.raises(ImportError):
                scatter_histograms.construct_kde(samples, use_kombine=True)
        return
    kde = scatter_histograms.construct_kde(samples, use_kombine=kombine)
    assert isinstance(kde, kde_types)

The target code (being tested) is:

def construct_kde(samples_array, use_kombine=False):
    """Constructs a KDE from the given samples.
    """
    if use_kombine:
        try:
            import kombine
        except ImportError:
            raise ImportError("kombine is not installed.")
    # construct the kde
    if use_kombine:
        kde = kombine.clustered_kde.KDE(samples_array)
    else:
        kde = scipy.stats.gaussian_kde(samples_array.T)
    return kde

The failure occurs when construct_kde is called explicitly with use_kombine=True. The pytest output for that failure is as follows:

____________________________________ test_construct_kde[True] ____________________________________

kombine = True

    @pytest.mark.parametrize('kombine', [
        None,  # test ImportError
        False,
        pytest.param(True, marks=skip_no_kombine),
    ])
    def test_construct_kde(kombine):
        samples = numpy.random.rand(5, 5)

        if kombine is None:
            with mock.patch.dict('sys.modules', {'kombine': None}):
                with pytest.raises(ImportError):
                    scatter_histograms.construct_kde(samples, use_kombine=True)
            return

>       kde = scatter_histograms.construct_kde(samples, use_kombine=kombine)

test/test_results.py:109:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
gwin/results/scatter_histograms.py:156: in construct_kde
    kde = kombine.clustered_kde.KDE(samples_array)
../../opt/gwin-2.7/lib/python2.7/site-packages/kombine/clustered_kde.py:245: in __init__
    self._set_bandwidth()
../../opt/gwin-2.7/lib/python2.7/site-packages/kombine/clustered_kde.py:272: in _set_bandwidth
    self._cho_factor = la.cho_factor(self._kernel_cov)
../../opt/gwin-2.7/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py:142: in cho_factor
    check_finite=check_finite)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = array([[ 0.06329217, -0.00541257,  0.02746434, -0.0176992 , -0.00789024],
    ...47],
       [-0.00789024, -0.05523657, -0.02178179,  0.00170347,  0.11439074]])
lower = False, overwrite_a = False, clean = False, check_finite = True

    def _cholesky(a, lower=False, overwrite_a=False, clean=True,
                  check_finite=True):
        """Common code for cholesky() and cho_factor()."""

        a1 = asarray_chkfinite(a) if check_finite else asarray(a)
        a1 = atleast_2d(a1)

        # Dimension check
        if a1.ndim != 2:
            raise ValueError('Input array needs to be 2 dimensional but received '
                             'a {}d-array.'.format(a1.ndim))
        # Squareness check
        if a1.shape[0] != a1.shape[1]:
            raise ValueError('Input array is expected to be square but has '
                             'the shape: {}.'.format(a1.shape))

        # Quick return for square empty array
        if a1.size == 0:
            return a1.copy(), lower

        overwrite_a = overwrite_a or _datacopied(a1, a)
        potrf, = get_lapack_funcs(('potrf',), (a1,))
        c, info = potrf(a1, lower=lower, overwrite_a=overwrite_a, clean=clean)
        if info > 0:
            raise LinAlgError("%d-th leading minor of the array is not positive "
>                             "definite" % info)
E           LinAlgError: 5-th leading minor of the array is not positive definite

../../opt/gwin-2.7/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py:40: LinAlgError

This is not reproducible 100% of the time, which suggests some issue with random numbers, but it could also be that a simple 5x5 array of samples created by numpy isn't good enough to actually perform this test.

This test was only designed to make sure that gwin.results.scatter_histograms.construct_kde is doing something sensible, and not to test kombine.clustered_kde.KDE, so as long as we can create a KDE, I'm happy.

All that said, I'm not sure of the best way to fix the issue, but it probably requires some combination of @cdcapano and @bfarr working out what a good enough array of samples is to successfully construct a KDE in order to test this code properly.
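One possible fix (an assumption, not a confirmed solution): seed the RNG and draw many more samples than dimensions, so the sample covariance is well conditioned and the Cholesky factorisation inside kombine cannot hit a non-positive-definite matrix.

```python
import numpy

numpy.random.seed(1234)              # makes the test deterministic
samples = numpy.random.rand(100, 5)  # 100 samples in 5 dimensions

# with only 5 samples in 5 dimensions the sample covariance is
# (near-)singular; with 100 samples it is positive definite:
cov = numpy.cov(samples, rowvar=False)
numpy.linalg.cholesky(cov)  # would raise LinAlgError if cov were not PD
```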

Typo in InferenceFile.static_args

While writing unit tests, I discovered a typo in InferenceFile.static_args:

gwin/gwin/io/hdf.py

Lines 167 to 172 in 1c331ff

@property
def static_args(self):
    """Returns a dictionary of the static_args. The keys are the argument
    names, values are the value they were set to.
    """
    return {args: self.attrs[arg] for arg in self.attrs["static_args"]}

L172 should read {arg: self.attrs[arg] ....

A fix will be included as part of #14.
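For reference, the corrected property would look like this (shown on a minimal stand-in class, since the real InferenceFile is HDF-backed):

```python
class InferenceFile(object):
    """Minimal stand-in for the real HDF-backed InferenceFile."""
    def __init__(self, attrs):
        self.attrs = attrs

    @property
    def static_args(self):
        """Returns a dictionary of the static_args. The keys are the
        argument names, values are the value they were set to.
        """
        # corrected: the comprehension key is 'arg', not 'args'
        return {arg: self.attrs[arg] for arg in self.attrs["static_args"]}
```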

example .ini files in the documentations

It'd be useful to have those .ini files directly in the repository so they can be checked out and copied when running the examples, instead of copy-pasting each section from the documentation.

gwin executable --version shows pycbc version

The gwin executable is reporting the PyCBC version number; we should correct that:

$ gwin --version
Version: 1.9.2 Release: True

If this is true in gwin, it's probably true in a lot of other executables.
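A minimal sketch of the fix (hypothetical function and version string): have each executable pass its own package's version to argparse's version action, rather than inheriting whatever the parser-building helper in the dependency reports.

```python
import argparse

__version__ = '0.1.0'  # hypothetical gwin version string, not pycbc's

def create_parser():
    """Each executable should report its own package's version, not the
    version of a dependency; argparse's 'version' action does this."""
    parser = argparse.ArgumentParser(prog='gwin')
    parser.add_argument('--version', action='version',
                        version='%(prog)s {}'.format(__version__))
    return parser
```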

PR #6 shouldn't have been merged so quickly

I was in the midst of writing a review on #6 when it was merged; since the PR was merged, I can't submit my review any more. I don't want to revert it, so I'm sticking my requests here to be addressed in another PR.

Some requests in CONTRIBUTING.md:

  • Line 37: Remove the cd lalsuite. Also, could we suggest calling the upstream remote gwastro instead? This is a matter of personal taste, but I like to always use the names of the organizations/users from which I've forked a repo.
  • Line 54: Add a git fetch upstream (or gwastro, if you agree with my previous comment) here, before the checkout, to ensure people know that they need to do a fetch first to branch off the most recent changes.

For the CODEOWNERS... is it the case that a code owner must approve any PR touching that file, or only that they are automatically added as a reviewer, with the possibility that the author could request (and get approval from) another reviewer in addition? I'm fine with it if it's the latter. But if it's the former, I'd need some convincing... that seems a bit draconian. We'd also need some way of determining who should be added as new code owners, since a file could change substantially over time.

For an initial set of code owners, I'd use the people listed in the copyright notices at the top of all the files. We can have multiple people listed for a file, correct?

PEP8 (and maybe later pylint) checks in CodeClimate

Hi @duncanmmacleod. I've recently (for the last month) had codeclimate's PEP8 checker enabled in pycbc, and it has been a quite useful (and not intrusively verbose) checker of PEP8 style (see any recent pycbc pull request and the associated commits as people actually went and fixed the PEP8 issues!). I'm also led to believe that pylint will soon be added to the CodeClimate engine as well.

Are you interested in me (or you) adding this also in GWIN? It requires a bit more setup than what I think is there for codeclimate at the moment, but it has worked very well for us so far! ... And I think if I do it, I can just use the same setup I have for pycbc also for gwin (as the permissions setup is over the whole gwastro namespace). (I still miss landscape.io, which was perfect when it worked, but this seems to be doing okay as a replacement).

likelihood marginalisation

Having time and distance marginalisation in addition to phase would make the analysis much more efficient.

Phase is already marginalisable with MarginalizedPhaseGaussianLikelihood(); see https://gwin.readthedocs.io/en/latest/api/gwin.likelihood.gaussian_noise.html?highlight=MarginalizedPhaseGaussianLikelihood (technical note: https://dcc.ligo.org/LIGO-T1300326).

Time is marginalisable in LALInference: https://git.ligo.org/lscsoft/lalsuite/blob/master/lalinference/src/LALInferenceLikelihood.c#L61, and distance is rather straightforward to marginalise over.
We can still recover a pretty good estimate of the marginalised parameter's distribution: the marginalised analysis returns a maximum-likelihood point, and the likelihood values of the sampled points reconstruct the marginalised parameter's likelihood.
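For phase, the marginalisation has a closed form (a sketch of the standard result described in LIGO-T1300326; the function name and signature here are hypothetical, not gwin's API):

```python
import numpy
from scipy.special import i0e

def margphase_loglr(cplx_dh, hh):
    """Log likelihood-ratio with the coalescence phase marginalised over a
    uniform prior:

        log L = log I0(|<d, h>|) - <h, h> / 2

    where <d, h> is the complex inner product and I0 the modified Bessel
    function of order zero. Using the scaled i0e keeps this stable for
    large |<d, h>|, since log I0(x) = log(i0e(x)) + x.
    """
    dh = abs(cplx_dh)
    return numpy.log(i0e(dh)) + dh - 0.5 * hh
```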
