
gwin's People

Contributors

a-r-williamson, ahnitz, cdcapano, cmbiwer, cyberface, dfinstad, duncanmmacleod, farr, hannah-griggs, hoyc1, josh-willis, micamu, n-wbrown, soumide1102, spxiwh, titodalcanton, vivienr


gwin's Issues

analytical likelihoods

Create a pathological analytical likelihood function to test samplers against:

  • compute the evidence analytically, to check against
  • have ~15 dimensions
  • use different length scales in different parameters
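As a concrete sketch of what such a likelihood could look like (hypothetical, not gwin API): a 15-dimensional Gaussian with a different length scale in each dimension, chosen so the evidence is known analytically.

```python
import numpy as np

NDIM = 15
# widths spanning three orders of magnitude, to stress the samplers
SCALES = np.logspace(-2, 1, NDIM)

def log_likelihood(x):
    """Normalised Gaussian log-likelihood with per-dimension widths SCALES."""
    norm = 0.5 * NDIM * np.log(2 * np.pi) + np.sum(np.log(SCALES))
    return -0.5 * np.sum((np.asarray(x) / SCALES) ** 2) - norm

def log_evidence(prior_bound=100.0):
    """Analytic log-evidence for a uniform prior on [-b, b]^NDIM.

    Since the likelihood integrates to one over x, the evidence is just
    the prior density (assuming the box contains all the likelihood mass).
    """
    return -NDIM * np.log(2 * prior_bound)
```

Any sampler's evidence estimate can then be compared directly against `log_evidence()`.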

We need more unit tests!

@vivienr, @cdcapano, @cmbiwer, the unit tests currently only hit 22% of the code in the gwin package (this deliberately does not include the scripts under /bin/). I would like to get that number up to around 90% ideally (>75% more realistically) before anyone embarks on a grand new development project as part of gwin.

I know how to write unit tests, but have no idea how to actually test the code that exists in gwin at the moment, so am reaching out for help in writing new tests. The base plan is as follows:

  • each class (e.g. GwinClass) has its own test class (TestGwinClass, in the style of unittest.TestCase)
  • each method of each class has its own unit-test method as part of the test class
  • each function has its own test function
  • each line of code in each function should be tested somewhere; we can simplify annoying loops and the like by using fixtures and parametrisation
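As a minimal sketch of that plan (GwinClass here is a hypothetical stand-in, not a real gwin class):

```python
import pytest

class GwinClass(object):
    """Hypothetical stand-in for a class defined in the package."""
    def __init__(self, value):
        self.value = value

    def doubled(self):
        return 2 * self.value

@pytest.fixture
def obj():
    """Fixtures avoid repeating set-up code in every test method."""
    return GwinClass(value=3)

class TestGwinClass(object):
    """One test class per class; one test method per method."""
    def test_doubled(self, obj):
        assert obj.doubled() == 6

# parametrisation replaces hand-written loops over test inputs
@pytest.mark.parametrize('value, expected', [(0, 0), (3, 6), (-2, -4)])
def test_doubled_values(value, expected):
    assert GwinClass(value).doubled() == expected
```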

Basically, every object we define in the package should have a corresponding test object. To that end I will set up a PR that defines a test class for every class definition I can find, and hope that y'all can help me populate it with useful stuff.

Testing the scripts in /bin/ is another matter altogether, and not something I want to get to right now.

Create a new generator module

The current module in pycbc.waveforms.generate is getting a bit complicated. In particular, the FDomainDetFrameGenerator should be split up into different classes that would be used for specific types of PE runs, such as ROQ runs vs. typical Gaussian-noise runs. Then the model could select which generator to use.

Add condition strain node to make workflow

There should be a condition-strain node in the workflow. The condition-strain node will do all of the data loading, then write out an (HDF?) file of the Fourier-domain data. All subsequent nodes will then simply load the FD data, instead of repeating the data conditioning. This will ensure that the ROQ weights, BayesWave (when it's added) and gwin are all using exactly the same data.
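The write-once/load-many pattern could be sketched roughly as follows (function names are hypothetical; a numpy .npz file stands in for the HDF file):

```python
import numpy as np

def condition_strain(strain, sample_rate, outfile):
    """Hypothetical condition-strain node: FFT the time-domain strain once
    and write the frequency-domain data out (HDF in the real workflow;
    a numpy .npz file stands in for it here)."""
    fd_data = np.fft.rfft(strain)
    freqs = np.fft.rfftfreq(len(strain), d=1.0 / sample_rate)
    np.savez(outfile, freqs=freqs, fd_data=fd_data)

def load_fd_data(infile):
    """All downstream nodes (ROQ weights, BayesWave, gwin) load the same
    FD data instead of repeating the conditioning."""
    with np.load(infile) as f:
        return f['freqs'], f['fd_data']
```

Since every node reads the same file, any change to the conditioning only has to happen in one place.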

Save seed to output file

Save the seed to the output file. If the seed is provided, let numpy use its default seed behavior.
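A minimal sketch of recording the seed (function names are hypothetical; a dict stands in for the output file's HDF attrs):

```python
import numpy as np

def init_rng(seed=None):
    """Seed numpy's RNG, drawing a seed from numpy's own entropy if none
    is given, and return whatever was used so it can be saved."""
    if seed is None:
        seed = np.random.randint(0, 2**31 - 1)
    np.random.seed(seed)
    return seed

def save_seed(attrs, seed):
    """Record the seed in the output file's attributes."""
    attrs['seed'] = seed
```

Saving whichever seed was actually used makes every run reproducible after the fact.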

benchmarking suite for GWin

A benchmarking suite that would run:

  • an end-to-end test
  • timing of individual functions

It should also collect statistics such as the number of likelihood calls.
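The two ingredients above could be sketched like this (hypothetical helpers, not existing gwin code):

```python
import time

def count_calls(fn):
    """Wrap a likelihood function so the suite can report the number of
    likelihood calls made during a run."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

def time_function(fn, *args, repeat=100):
    """Average wall-clock time of an individual function over repeats."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat
```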

Test failing when building KDE with kombine (sometimes)

Some of the time one of the unit tests fails, related to constructing a KDE with kombine:

$ python setup.py test
...
test/test_results.py::test_construct_kde[True] FAILED                                      [ 77%]
...

The test function is:

gwin/test/test_results.py

Lines 95 to 110 in 0d50dca

@pytest.mark.parametrize('kombine', [
    None,  # test ImportError
    False,
    pytest.param(True, marks=skip_no_kombine),
])
def test_construct_kde(kombine):
    samples = numpy.random.rand(5, 5)
    if kombine is None:
        with mock.patch.dict('sys.modules', {'kombine': None}):
            with pytest.raises(ImportError):
                scatter_histograms.construct_kde(samples, use_kombine=True)
        return
    kde = scatter_histograms.construct_kde(samples, use_kombine=kombine)
    assert isinstance(kde, kde_types)

The target code (being tested) is:

def construct_kde(samples_array, use_kombine=False):
    """Constructs a KDE from the given samples.
    """
    if use_kombine:
        try:
            import kombine
        except ImportError:
            raise ImportError("kombine is not installed.")
    # construct the kde
    if use_kombine:
        kde = kombine.clustered_kde.KDE(samples_array)
    else:
        kde = scipy.stats.gaussian_kde(samples_array.T)
    return kde

The failure occurs when construct_kde is called explicitly with use_kombine=True. The pytest output for that failure is as follows:

____________________________________ test_construct_kde[True] ____________________________________

kombine = True

    @pytest.mark.parametrize('kombine', [
        None,  # test ImportError
        False,
        pytest.param(True, marks=skip_no_kombine),
    ])
    def test_construct_kde(kombine):
        samples = numpy.random.rand(5, 5)

        if kombine is None:
            with mock.patch.dict('sys.modules', {'kombine': None}):
                with pytest.raises(ImportError):
                    scatter_histograms.construct_kde(samples, use_kombine=True)
            return

>       kde = scatter_histograms.construct_kde(samples, use_kombine=kombine)

test/test_results.py:109:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
gwin/results/scatter_histograms.py:156: in construct_kde
    kde = kombine.clustered_kde.KDE(samples_array)
../../opt/gwin-2.7/lib/python2.7/site-packages/kombine/clustered_kde.py:245: in __init__
    self._set_bandwidth()
../../opt/gwin-2.7/lib/python2.7/site-packages/kombine/clustered_kde.py:272: in _set_bandwidth
    self._cho_factor = la.cho_factor(self._kernel_cov)
../../opt/gwin-2.7/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py:142: in cho_factor
    check_finite=check_finite)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = array([[ 0.06329217, -0.00541257,  0.02746434, -0.0176992 , -0.00789024],
    ...47],
       [-0.00789024, -0.05523657, -0.02178179,  0.00170347,  0.11439074]])
lower = False, overwrite_a = False, clean = False, check_finite = True

    def _cholesky(a, lower=False, overwrite_a=False, clean=True,
                  check_finite=True):
        """Common code for cholesky() and cho_factor()."""

        a1 = asarray_chkfinite(a) if check_finite else asarray(a)
        a1 = atleast_2d(a1)

        # Dimension check
        if a1.ndim != 2:
            raise ValueError('Input array needs to be 2 dimensional but received '
                             'a {}d-array.'.format(a1.ndim))
        # Squareness check
        if a1.shape[0] != a1.shape[1]:
            raise ValueError('Input array is expected to be square but has '
                             'the shape: {}.'.format(a1.shape))

        # Quick return for square empty array
        if a1.size == 0:
            return a1.copy(), lower

        overwrite_a = overwrite_a or _datacopied(a1, a)
        potrf, = get_lapack_funcs(('potrf',), (a1,))
        c, info = potrf(a1, lower=lower, overwrite_a=overwrite_a, clean=clean)
        if info > 0:
            raise LinAlgError("%d-th leading minor of the array is not positive "
>                             "definite" % info)
E           LinAlgError: 5-th leading minor of the array is not positive definite

../../opt/gwin-2.7/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py:40: LinAlgError

This is not reproducible 100% of the time, which suggests some issue with random numbers, but it could also be that a simple 5x5 array of samples created by numpy isn't good enough to actually perform this test.

This test was only designed to make sure that gwin.results.scatter_histograms.construct_kde is doing something sensible, and not to test kombine.clustered_kde.KDE, so as long as we can create a KDE, I'm happy.

All that said, I'm not sure of the best way to fix the issue, but it probably requires some combination of @cdcapano and @bfarr working out what a good enough array of samples is to successfully construct a KDE in order to test this code properly.
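One possible fix (an assumption, not a confirmed solution): seed the RNG and draw many more samples than dimensions, so the sample covariance is well conditioned and the Cholesky factorisation inside kombine cannot hit a non-positive-definite matrix.

```python
import numpy

numpy.random.seed(1234)              # makes the test deterministic
samples = numpy.random.rand(100, 5)  # 100 samples in 5 dimensions

# with only 5 samples in 5 dimensions the sample covariance is
# (near-)singular; with 100 samples it is positive definite:
cov = numpy.cov(samples, rowvar=False)
numpy.linalg.cholesky(cov)  # would raise LinAlgError if cov were not PD
```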

Typo in InferenceFile.static_args

While writing unit tests, I discovered a typo in InferenceFile.static_args:

gwin/gwin/io/hdf.py

Lines 167 to 172 in 1c331ff

@property
def static_args(self):
    """Returns a dictionary of the static_args. The keys are the argument
    names, values are the value they were set to.
    """
    return {args: self.attrs[arg] for arg in self.attrs["static_args"]}

L172 should read {arg: self.attrs[arg] ....

A fix will be included as part of #14.
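For reference, the corrected property would look like this (shown on a minimal stand-in class, since the real InferenceFile is HDF-backed):

```python
class InferenceFile(object):
    """Minimal stand-in for the real HDF-backed InferenceFile."""
    def __init__(self, attrs):
        self.attrs = attrs

    @property
    def static_args(self):
        """Returns a dictionary of the static_args. The keys are the
        argument names, values are the value they were set to.
        """
        # corrected: the comprehension key is 'arg', not 'args'
        return {arg: self.attrs[arg] for arg in self.attrs["static_args"]}
```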

example .ini files in the documentations

It'd be useful to have those .ini files directly in the repository so they can be checked out and copied when running the examples, instead of copy-pasting each section from the documentation.

gwin executable --version shows pycbc version

The gwin executable is reporting the PyCBC version number; we should correct that:

$ gwin --version
Version: 1.9.2 Release: True

If this is true in gwin, it's probably true in a lot of other executables.
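A minimal sketch of the fix (hypothetical function and version string): have each executable pass its own package's version to argparse's version action, rather than inheriting whatever the parser-building helper in the dependency reports.

```python
import argparse

__version__ = '0.1.0'  # hypothetical gwin version string, not pycbc's

def create_parser():
    """Each executable should report its own package's version, not the
    version of a dependency; argparse's 'version' action does this."""
    parser = argparse.ArgumentParser(prog='gwin')
    parser.add_argument('--version', action='version',
                        version='%(prog)s {}'.format(__version__))
    return parser
```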

PR #6 shouldn't have been merged so quickly

I was in the midst of writing a review on #6 when it was merged; since the PR was merged, I can't submit my review any more. I don't want to revert it, so I'm sticking my requests here to be addressed in another PR.

Some requests in CONTRIBUTING.md:

  • Line 37: Remove the cd lalsuite. Also, could we suggest calling the upstream remote gwastro instead? This is a matter of personal taste, but I like to always use the names of the organizations/users from which I've forked a repo.
  • Line 54: Add a git fetch upstream (or gwastro, if you agree with my previous comment) here, before the checkout, to ensure people know that they need to do a fetch first to branch off the most recent changes.

For the CODEOWNERS... is it the case that a code owner must approve any PR touching that file, or only that they are automatically added as a reviewer, with the possibility that the author could request (and get approval from) another reviewer in addition? I'm fine with it if it's the latter. But if it's the former, I'd need some convincing... that seems a bit draconian. We'd also need some way of determining who should be added as new code owners, since a file could change substantially over time.

For an initial set of code owners, I'd use the people listed in the copyright notices at the top of all the files. We can have multiple people listed for a file, correct?

PEP8 (and maybe later pylint) checks in CodeClimate

Hi @duncanmmacleod. I've recently (for the last month) had codeclimate's PEP8 checker enabled in pycbc, and it has been a quite useful (and not intrusively verbose) checker of PEP8 style (see any recent pycbc pull request and the associated commits as people actually went and fixed the PEP8 issues!). I'm also led to believe that pylint will soon be added to the CodeClimate engine as well.

Are you interested in me (or you) adding this also in GWIN? It requires a bit more setup than what I think is there for codeclimate at the moment, but it has worked very well for us so far! ... And I think if I do it, I can just use the same setup I have for pycbc also for gwin (as the permissions setup is over the whole gwastro namespace). (I still miss landscape.io, which was perfect when it worked, but this seems to be doing okay as a replacement).

likelihood marginalisation

Having time and distance marginalisation in addition to phase would make the analysis much more efficient.

Phase is already marginalisable with MarginalizedPhaseGaussianLikelihood(); see https://gwin.readthedocs.io/en/latest/api/gwin.likelihood.gaussian_noise.html?highlight=MarginalizedPhaseGaussianLikelihood (technical note: https://dcc.ligo.org/LIGO-T1300326).

Time is marginalisable in LALInference: https://git.ligo.org/lscsoft/lalsuite/blob/master/lalinference/src/LALInferenceLikelihood.c#L61, and distance is rather straightforward to marginalise over.
We can still recover a pretty good estimate of the marginalised parameter's distribution: the marginalised analysis returns a maximum-likelihood point, and the likelihood values of the sampled points reconstruct the marginalised parameter's likelihood.
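For phase, the marginalisation has a closed form (a sketch of the standard result described in LIGO-T1300326; the function name and signature here are hypothetical, not gwin's API):

```python
import numpy
from scipy.special import i0e

def margphase_loglr(cplx_dh, hh):
    """Log likelihood-ratio with the coalescence phase marginalised over a
    uniform prior:

        log L = log I0(|<d, h>|) - <h, h> / 2

    where <d, h> is the complex inner product and I0 the modified Bessel
    function of order zero. Using the scaled i0e keeps this stable for
    large |<d, h>|, since log I0(x) = log(i0e(x)) + x.
    """
    dh = abs(cplx_dh)
    return numpy.log(i0e(dh)) + dh - 0.5 * hh
```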
