jcgoran / fitk

FITK - generation, manipulation, and plotting of Fisher-like matrices and tensors

Home Page: https://jcgoran.github.io/fitk/

License: MIT License

Python 98.57% Shell 0.91% CSS 0.07% Jinja 0.45%

numerical-methods statistics

fitk's Issues

Add numpy type hints

As of version 1.21, numpy has support for type hints (see numpy.typing docs, notably numpy.typing.NDArray), so it would perhaps be useful to add type hints in the code and bump the minimum version of numpy to that.
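
A minimal sketch of what such annotations could look like, assuming numpy >= 1.21. The function symmetrize is a hypothetical helper used only for illustration and is not part of fitk.

```python
import numpy as np
from numpy.typing import NDArray


def symmetrize(matrix: NDArray[np.float64]) -> NDArray[np.float64]:
    """Return the symmetric part of a square matrix (hypothetical helper)."""
    return 0.5 * (matrix + matrix.T)


result = symmetrize(np.array([[1.0, 2.0], [0.0, 3.0]]))
```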

Fix shading of contours in `FisherFigure`s

When calling the plot method, the contours are plotted in the order of increasing uncertainty. Unfortunately, this means that if one plots multiple contour levels with opacities alpha1 and alpha2, the first contour will not actually have opacity alpha1, but rather a combination of alpha1 and alpha2.

The solution

1. plot the contours in reverse order (i.e. in order of decreasing uncertainty)
2. after plotting each contour, plot the previous (i.e. smaller) contour with a color matching the background (assuming it's not transparent) at 100% opacity

The above works because the second step effectively makes a contour with a "hole" in it, so the previous (i.e. smaller) contour retains the correct opacity passed to it.
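
To illustrate why overlapping contours end up with the wrong opacity, here is a small sketch of the standard "over" compositing rule for layers of the same color. The function name stacked_opacity is an assumption made for this example; it is not part of fitk or matplotlib.

```python
def stacked_opacity(*alphas: float) -> float:
    """Effective opacity of several same-colored translucent layers drawn on top of each other."""
    transparency = 1.0
    for alpha in alphas:
        # each layer lets through a fraction (1 - alpha) of what is behind it
        transparency *= 1.0 - alpha
    return 1.0 - transparency


# the inner region is covered by both contours, so it is more opaque than requested
inner = stacked_opacity(0.6, 0.3)
```

With opacities 0.6 and 0.3, the overlapping region ends up at 1 - 0.4 * 0.7 = 0.72 rather than 0.6, which is the effect described in this issue.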

(nice to have) develop interface for software that supports automatic differentiation

Add separate contouring for 1D/2D contour levels

Add kwargs like contour_levels_1d and contour_levels_2d to the class which generates 2D contour plots (currently FisherFigure2D), so they can be specified separately.
If specified, they should override the current contour_levels; if not, contour_levels should be used. For instance, when both contour_levels_1d and contour_levels are set, the 1D contours should be plotted using the values from contour_levels_1d, and the 2D contours using the values from contour_levels.
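
The fallback rule could be sketched roughly as follows. The helper resolve_contour_levels is hypothetical and only shows the intended precedence; how FisherFigure2D actually stores its levels is not assumed here.

```python
from typing import Optional, Sequence, Tuple


def resolve_contour_levels(
    contour_levels: Sequence[float],
    contour_levels_1d: Optional[Sequence[float]] = None,
    contour_levels_2d: Optional[Sequence[float]] = None,
) -> Tuple[Sequence[float], Sequence[float]]:
    """Return the (1D, 2D) levels, falling back to contour_levels when unset."""
    levels_1d = contour_levels if contour_levels_1d is None else contour_levels_1d
    levels_2d = contour_levels if contour_levels_2d is None else contour_levels_2d
    return levels_1d, levels_2d


levels_1d, levels_2d = resolve_contour_levels([1, 2], contour_levels_1d=[1])
```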

Use `pathlib.Path` instead of `str` as argument to methods that perform IO

Users should be able to provide either a Path or a str to methods that perform any kind of saving to or loading from disk.
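
One common way to accept both types is to normalize the argument with pathlib.Path at the top of the method. The sketch below uses a hypothetical save_json function rather than any actual fitk method.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def save_json(path: Union[str, Path], data: dict) -> Path:
    """Write data as JSON to path, which may be a str or a Path."""
    path = Path(path)  # a no-op for Path objects, a conversion for str
    path.write_text(json.dumps(data))
    return path


with tempfile.TemporaryDirectory() as directory:
    from_str = save_json(f"{directory}/a.json", {"names": ["a"]})
    from_path = save_json(Path(directory) / "b.json", {"names": ["b"]})
    loaded = json.loads(from_path.read_text())
```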

Rename `FisherDerivative` to `FisherInterface`

As the FisherDerivative class should technically be used to provide an interface, it would probably make more sense to rename it to FisherInterface.

Add fill to 1D curves in triangle plots

While there is the _add_shading_1d function, it is currently not used in the triangle plots, but only in FisherFigure1D.

Add faster inversion of covariance

From this SO answer, it appears that one can use low-level SciPy modules to gain a speedup in inverting positive-definite symmetric matrices, which would be useful for the covariance.
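
A sketch of the idea, assuming the LAPACK wrappers dpotrf (Cholesky factorization) and dpotri (inverse from the Cholesky factor) exposed by scipy.linalg.lapack. The function name inverse_positive_definite is made up for this example, and any actual speedup would need to be benchmarked.

```python
import numpy as np
from scipy.linalg import lapack


def inverse_positive_definite(matrix):
    """Invert a symmetric positive-definite matrix via its Cholesky factor."""
    matrix = np.asarray(matrix, dtype=np.float64)
    factor, info = lapack.dpotrf(matrix)  # upper-triangular Cholesky factor
    if info != 0:
        raise np.linalg.LinAlgError("matrix is not positive definite")
    inverse, info = lapack.dpotri(factor)  # only the upper triangle is meaningful
    if info != 0:
        raise np.linalg.LinAlgError("inversion failed")
    # mirror the upper triangle to obtain the full symmetric inverse
    return np.triu(inverse) + np.triu(inverse, k=1).T


covariance = inverse_positive_definite([[4.0, 1.0], [1.0, 3.0]])
```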

Switch to using Sphinx for documentation

pdoc does not seem to allow the linking of images, and I've run into some strange parsing issues, so it'd be better in the long run to use Sphinx instead.

Add GUI

Use some cross-platform graphical toolkit to make a GUI version of FITK (optional, should be installable via pip install fitk[gui] or something).

Order of parameters matters when using a `FisherBarFigure`

The order of the parameters should not matter when plotting the matrices using FisherBarFigure, and the parameters in the other matrices should all be automatically sorted according to the order of parameters in the first one.
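
The reordering itself can be sketched with plain numpy. The helper reorder below is hypothetical and operates on a raw array plus a list of names, not on fitk objects.

```python
import numpy as np


def reorder(values, names, target_names):
    """Permute rows and columns of values so that names follows target_names."""
    index = [names.index(name) for name in target_names]
    return values[np.ix_(index, index)]


# a matrix given in the order ["b", "a"], rearranged to ["a", "b"]
reordered = reorder(np.array([[3.1, 0.0], [0.0, 0.9]]), ["b", "a"], ["a", "b"])
```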

Example

from fitk import FisherBarFigure, FisherMatrix

fm1 = FisherMatrix([[1, 0], [0, 3]], names=["a", "b"])
fm2 = FisherMatrix([[3.1, 0], [0, 0.9]], names=["b", "a"])
ff = FisherBarFigure()
ff.plot_relative_constraints([fm1, fm2], kind="bar")

Actual output

Expected output

Only upload coverage from one workflow

The coverage report should only be uploaded if the underlying OS is ubuntu-latest, and the Python version is 3.9 (arbitrary, but still supported until 2025).

Add caching of derivatives

It would be nice if the derivatives could be cached somehow to cut down on run time.
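
One possible approach, sketched under the assumption that the arguments identifying a derivative are hashable, is to memoize the computation with functools.lru_cache. The function derivative and its body are placeholders, not fitk's implementation.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def derivative(parameter: str, step: float) -> float:
    """Placeholder for an expensive derivative computation."""
    global call_count
    call_count += 1
    return step**2  # stands in for the real result


first = derivative("Omegam", 1e-3)
second = derivative("Omegam", 1e-3)  # served from the cache
```

Arguments such as numpy arrays or dictionaries are not hashable, so a real implementation would need to convert them (for instance to tuples) before caching.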

Add ability to mark fiducial on the 1D/2D contour plots

Would be nice to have a kwarg like mark_fiducials to mark the fiducials on the 1D/2D contour plots.
It should have a default of None, and if one specifies True, it should use some default plotting style, while if one specifies a dictionary with kwargs compatible with vlines or hlines, it should use that style instead.

A possible ambiguity: if one specifies mark_fiducials={}, do we plot it or not? I think we should, since the user intentionally specified it. In other words, the check should be if mark_fiducials is None, so that nothing is plotted only when mark_fiducials=None. A check of the form if not mark_fiducials would also skip the empty dictionary, since that is falsy as well.
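
The distinction between the two checks can be sketched as follows. The helper fiducial_style and the contents of DEFAULT_STYLE are assumptions for illustration; treating False the same as None is also an assumption, since the issue does not specify it.

```python
# hypothetical default style, using keyword names accepted by vlines/hlines
DEFAULT_STYLE = {"linestyles": "--", "colors": "black"}


def fiducial_style(mark_fiducials=None):
    """Return the style to plot fiducials with, or None if nothing should be plotted."""
    if mark_fiducials is None or mark_fiducials is False:
        return None
    if mark_fiducials is True:
        return dict(DEFAULT_STYLE)
    # any mapping, including an empty one, means "plot with these kwargs"
    return dict(mark_fiducials)


empty = fiducial_style({})
```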

Add bar plots of constraints

Would be nice to have a function that creates plots similar to the one below (taken from arXiv:2110.05435, figure 8):

Use a proper JSON class

Instead of calling jsonify to convert numpy arrays to JSON strings, make a proper subclass as described here.
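
A minimal sketch of such a subclass, using the standard json.JSONEncoder extension point. The class name NumpyEncoder is chosen for this example and is not necessarily the one fitk uses.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """JSON encoder that converts numpy arrays and scalars to built-in types."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()
        if isinstance(o, np.generic):
            return o.item()
        # defer to the base class, which raises TypeError for unknown types
        return super().default(o)


encoded = json.dumps({"values": np.eye(2)}, cls=NumpyEncoder)
```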

Change z-order of marked fiducials

Unless specified otherwise, the z-order of the marked fiducials should be at the very top of the graph, to make it easily noticeable.

Reverse order of elements in `FisherBarFigure`

Problem

When calling plot_absolute_constraints or plot_relative_constraints with kind='barh', the elements of the Fisher matrices are plotted from bottom to top. Furthermore, when calling legend on it, the labels in the legend are displayed top-to-bottom, while the plots themselves are displayed bottom-to-top.

Example

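import matplotlib.pyplot as plt
import numpy as np
from fitk import FisherBarFigure, FisherMatrix

# get_default_rcparams is a fitk helper; its import is omitted in the original snippet
ff = FisherBarFigure()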
fm1 = FisherMatrix(np.diag([1, 2, 3]), names=['a', 'b', 'c'])
fm2 = FisherMatrix(np.diag([4, 5, 6]), names=['a', 'b', 'c'])
ff.plot_relative_constraints(
    [
        fm1,
        fm2,
    ],
    kind='barh',
    labels=['Matrix 1', 'Matrix 2'],
    scale='log',
    percent=True,
)
ff.axes.set_xlim(1, ff.axes.get_xlim()[-1])
with plt.rc_context(get_default_rcparams()):
    ff.figure.legend(bbox_to_anchor=(0.5, 0.87), loc='lower center')

Fix `KeyError` issue when plotting

For some reason, when I run test_plot_2d_figure_error, the test sometimes fails with the following strange error:

>       fp.plot(
            euclid_opt.marginalize_over("Omegam", "Omegab", invert=True),
        )

tests/test_graphics.py:783:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
fitk/graphics.py:2125: in plot
    ax[i, i].remove()
.venv/lib/python3.11/site-packages/matplotlib/artist.py:212: in remove
    self._remove_method(self)
.venv/lib/python3.11/site-packages/matplotlib/figure.py:922: in delaxes
    self._axstack.remove(ax)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <matplotlib.figure._AxesStack object at 0x7fbdf87b4390>, a = <AxesSubplot: >

    def remove(self, a):
        """Remove the axes from the stack."""
>       self._axes.pop(a)
E       KeyError: <AxesSubplot: >

.venv/lib/python3.11/site-packages/matplotlib/figure.py:75: KeyError

Note that I am not sure what the exact source of the error is, as it appears randomly when I run the tests.
I suppose one easy fix is to do something like:

try:
    ax[i, i].remove()
except KeyError:
    pass

though I'm not entirely sure why this is the case as I explicitly perform checks for this beforehand, so this module should never throw any exceptions.

Derivatives: task list

The below are all leftover tasks from #5 that were not mandatory for that PR to get merged.

Comments/metadata lost when reading files

Currently, there's no way to extract comments/metadata from the saved JSON file; unless the user explicitly loads the file via the json package, the metadata cannot be accessed by the FisherMatrix class.
On the other hand, if we put a metadata parameter in the constructor, it's bound to get lost in any arithmetic operations, not to mention that it breaks the single responsibility principle.

Consider changing return type of `constraints` when specifying `name`

Currently, calling constraints(name=name) returns a numpy array with 1 element, which may not be very intuitive, and it may be better to just return a regular float instead.

Add higher order derivatives

The DALI method (as described here) provides a straightforward generalization of Fisher matrices, and only requires minor tweaks to make it work:

instead of having values in the constructor, have *values, i.e. let the user pass n x n, n x n x n, etc. tensors (or just supply the keywords matrix, flexion, quarxion to make it unambiguous)
all math operators (with the exception of matrix multiplication) just act on all elements
the Jacobian transformation requires a bit of care, but is probably straightforward

Now, what isn't super straightforward is the computation of the (un)marginalized constraints; since we are dealing with a non-Gaussian distribution, in particular (assuming zero mean):

$$ \mathcal{L}(\boldsymbol{\theta}) = \frac{1}{N} \exp\left(-\frac{1}{2} \boldsymbol{\theta}^T \mathsf{F} \boldsymbol{\theta} - \text{H.O.}\right) $$

we have two problems:

we need to figure out the normalizing constant $N$; this can probably be accomplished by using an n-dimensional integration (for instance, scipy.integrate.nquad seems to be adequate for the job of up to, say, 7 parameters, otherwise some MC-based thing is preferred; this paper describes a nice, well-maintained alternative)
we need to get the 2D contours somehow, since we can't just invert a Fisher matrix and call it a day. One idea is to integrate over n-2 dimensions, then define iso-probability contours (once we have the normalization, of course), and then find points in the x-y plane which correspond to those contours. Scipy seems to have some root-finding routines for multidimensional problems, but it's not immediately clear how to code everything up.
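
As a sanity check of the first point, the sketch below computes the normalizing constant with scipy.integrate.nquad for a purely Gaussian 2-parameter case (no higher-order terms), where the analytic answer is known. The Fisher matrix values are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy.integrate import nquad

# arbitrary positive-definite 2 x 2 Fisher matrix
fisher = np.array([[2.0, 0.5], [0.5, 1.0]])


def unnormalized_likelihood(x, y):
    """exp(-theta^T F theta / 2) without the 1/N prefactor."""
    theta = np.array([x, y])
    return np.exp(-0.5 * theta @ fisher @ theta)


# integrate over the whole parameter space
normalization, abs_error = nquad(
    unnormalized_likelihood,
    [[-np.inf, np.inf], [-np.inf, np.inf]],
)

# for a Gaussian, N = (2 pi)^(n/2) / sqrt(det F), with n = 2 here
analytic = 2 * np.pi / np.sqrt(np.linalg.det(fisher))
```

For the non-Gaussian case the higher-order terms would be added inside the exponential, and the analytic check is no longer available.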

Refactor the plotting interface

The plotting interface is still somewhat incomplete, and is missing a bunch of features (notably, the legend). Furthermore, the plot_triangle function does a bit too much for my taste; ideally, we would have an interface similar to matplotlib:

from fitk import FisherMatrix, FisherPlotter

# compute the Fisher matrices
fm1 = FisherMatrix(...)
fm2 = FisherMatrix(...)

# plot them
plotter = FisherPlotter(...) # the constructor should ideally take some arguments, but for what?
plotter.plot_triangle(fm1, ls='--', c='r') # the keyword arguments should ideally be identical to those that matplotlib uses
plotter.plot_triangle(fm2) # the color should use a cycler, so this would plot the first one in the cycler

There are open questions however (incomplete list):

what should we do when one of the Fisher matrices does not have the same parameters as the initially plotted one?
should plot_triangle return a figure, or just modify the existing one (or both)?
how is saving handled?

UPDATE: the above seems to be adequate, although there are some missing features:

implement custom shading contours (there is already plot_shading_1d, but it's somewhat half-finished)
implement the ability to mark fiducials somehow (on 1D plots, probably best to use vlines, on 2D plots, maybe markers would be better)

Add methods for retrieving the fiducial and LaTeX name from the `FisherMatrix`

Add two methods, fiducial(name: str) -> float and latex_name(name: str) -> str, which retrieve the fiducial value and the LaTeX name, respectively, of the parameter called name.
While __getitem__ could in principle be used, it already retrieves the actual values of the Fisher matrix, so it's best to have separate, explicit methods for retrieving the fiducial and the LaTeX name.
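
A rough sketch of the two accessors on a stand-in class. NamedParameters is hypothetical and only mimics the bookkeeping part of a Fisher object; raising KeyError for unknown names is an assumption.

```python
class NamedParameters:
    """Stand-in container holding names, fiducials and LaTeX names."""

    def __init__(self, names, fiducials, latex_names):
        self._fiducials = dict(zip(names, fiducials))
        self._latex_names = dict(zip(names, latex_names))

    def fiducial(self, name: str) -> float:
        """Return the fiducial value of the parameter called name."""
        return self._fiducials[name]

    def latex_name(self, name: str) -> str:
        """Return the LaTeX name of the parameter called name."""
        return self._latex_names[name]


parameters = NamedParameters(["h", "n_s"], [0.67, 0.96], ["$h$", "$n_s$"])
```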

Make `add` method accept another Fisher object with different names

Refactor `FisherBarFigure` methods

The methods plot_relative_constraints and plot_absolute_constraints have a lot of code shared among them, and, consequently, it would be nice to refactor them so only one method needs to be updated instead of two.

Make `show_1d_curves` and `show_joint_dist` args of `FisherFigure2D` default to `True`

usually one wants to see the 1D curves as well as the contours
if show_joint_dist is True, the contours will match the outputs of other packages such as getdist, allowing for an easier comparison

Figure out scope of the `D` class

Currently, the D class, which is basically a fancy container for parameters required to compute derivatives using finite differences, does not hold information about the weights required to compute them; this is instead delegated to the find_diff_weights function.
The question then arises: should D be responsible for the weights as well (and maybe even the denominator of each parameter)?
A possible implementation is to just add a weights property to D which holds the weights, and maybe even a denominator or prefactor (inverse prefactor?) which holds the abs_step ** order part.

Rationale

The output of find_diff_weights may contain zeros, which just means that those parts of the stencil do not contribute at all to the final result. Currently these zeros are "manually" removed when calling derivative, but this seems somewhat error-prone.
Note that if D automatically removes the zeros, it also has to remove the corresponding stencil points.
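
The pruning step could look roughly like this. The helper prune_stencil is hypothetical and works on plain arrays; the example weights are the usual second-order central-difference weights for a first derivative.

```python
import numpy as np


def prune_stencil(stencil, weights):
    """Drop zero weights together with their stencil points."""
    stencil = np.asarray(stencil)
    weights = np.asarray(weights, dtype=float)
    keep = weights != 0
    return stencil[keep], weights[keep]


# central difference for f'(x): (f(x + h) - f(x - h)) / (2 h)
points, coefficients = prune_stencil([-1, 0, 1], [-0.5, 0.0, 0.5])
```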

Add ability to autoformat tick labels to prevent overlap

Due to how matplotlib does its positioning of tick labels, it's possible to have tick labels that overlap.
It would be nice to find a way to generate tick labels that do not overlap by either:

using a different format for the labels (assuming they are purely textual)
finding a different, but still "nice" position of ticks (using the ticker API), and labeling those instead

Note that the algorithm should take into account the movement of all tick labels (otherwise the plot would not look nice), and provide a scoring function in the end (so it's basically an optimizer).

Provided such an algorithm exists and is general enough, it could even be placed in an external Python module, though I would settle for just an application to this module's plotting requirements.

Add project logo

Handle singular matrices

In some cases, such as parameter degeneracy, the Fisher matrix is singular and cannot be inverted; this has an impact on certain methods:

FisherMatrix.inverse
FisherMatrix.marginalize_over
FisherMatrix.correlation_matrix
FisherMatrix.constraints
FisherBaseFigure.plot
plot_curve_1d
plot_curve_2d

The code should catch these errors (numpy.LinAlgError), and handle this somehow. Some suggestions (a combination of these would also work):

show a UserWarning
when calling any plotting methods, plot parallel lines; this is a bit challenging as autoscale_view would make anything else drawn on the axis look tiny, but this can be fixed somehow (see this and this SO questions).
something else?
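
A sketch of the first suggestion, catching numpy.linalg.LinAlgError and emitting a UserWarning. Falling back to the pseudo-inverse is an assumption made for this example, not something the issue prescribes, and safe_inverse is a hypothetical helper.

```python
import warnings

import numpy as np


def safe_inverse(matrix):
    """Invert matrix, warning and using the pseudo-inverse if it is singular."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError:
        warnings.warn(
            "Fisher matrix is singular; returning the pseudo-inverse",
            UserWarning,
        )
        return np.linalg.pinv(matrix)


inverse = safe_inverse(np.diag([1.0, 4.0]))
```

Note that a matrix which is only numerically close to singular may not raise at all, so a check on the condition number might be needed in addition.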

Add separate version w/ all interfaces

It would be useful to have a Docker version or something that has all of the interfaces already included.

Add debugging abilities to `FisherDerivative`

For any existing interface, it may be useful to have debugging capabilities, such as printing the current config, the current args, the current result, etc.
These would of course be off by default, and would be passed as kwargs to the signal and covariance methods.

Add interface to CLASS

Add separate kwargs for `signal` and `covariance` methods when calling `fisher_matrix`

As the signal and covariance methods can in principle take different kwargs (and can be different codes), I should change the **kwargs to kwargs_signal and kwargs_covariance when calling fisher_matrix.

Rename modules accordingly

The proposed scheme, which is probably cleaner for importing:

fisher_matrix → tensors
fisher_derivative → derivatives
fisher_operations → operations
fisher_utils → utils (or utilities)
fisher_plotter → graphics (plotters sounds somewhat silly)

Add automatic upload to PyPI

There should be a GH action to automatically push a release to PyPI (or test PyPI) when there is a push to master, and it has a tag of the form x.y.z.

Add self-hosted runner for running tests of interfaces

There should be a self-hosted runner (so it doesn't waste GH action minutes) which runs the tests for the interfaces. The tests should only be run manually.

Add plots to examples

A picture is worth a thousand words, and a large part of the package involves the plotting of Fisher objects, so it would be nice to have example plots alongside the documentation.
Due to mitmproxy/pdoc#282, it may be a bit challenging to include images as relative URLs are not supported, so some care is needed to make sure it works properly.

Only build docs on commits with tags

The documentation should only be built in the CI when all of the following are satisfied:

master is updated
the commit on master is tagged (optionally, the format of the tag should also be specified)

Currently, it gets built every time master is updated, regardless of any tags present in the commits.

Fix parsing of `options` argument of `FisherFigure` ctor

If one explicitly sets it to an empty dictionary or similar, it should revert to using matplotlib defaults, whereas currently it just leaves the defaults returned by get_default_rcparams.

Add option to not rescale axes when calling `plot` in `FisherFigure1/2D`

There should be an option like autolim, which defaults to True, and determines whether or not the axes should be rescaled when calling plot on a given Fisher matrix

Add `validate` method to `FisherDerivative`

It's possible that a user makes a typo on some parameter when calling, say, fisher_matrix, and therefore the whole computation may fail, wasting valuable time.

Proposal

Add a method called validate (or is_valid), which by default returns True, and which a user can override if they want to perform validation of the parameters before doing any computation.
As fitk only has complete control over the derivative and fisher_matrix methods, it should be implemented at that level, rather than in signal or covariance.
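
The proposal could be sketched as follows. Both classes are hypothetical stand-ins rather than the real FisherDerivative, and raising ValueError on failed validation is an assumption.

```python
class Interface:
    """Stand-in base class with an overridable validation hook."""

    def validate(self, *names: str) -> bool:
        # by default, everything is considered valid
        return True

    def fisher_matrix(self, *names: str) -> str:
        if not self.validate(*names):
            raise ValueError(f"invalid parameters: {names}")
        return "computed"  # placeholder for the expensive computation


class StrictInterface(Interface):
    """Example override that only accepts known parameter names."""

    known = {"Omegam", "h"}

    def validate(self, *names: str) -> bool:
        return all(name in self.known for name in names)


result = StrictInterface().fisher_matrix("h")
```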

`sort` method raises cryptic error if `key=[LIST OF NAMES]` does not contain a name

When calling sort(key=[LIST OF NAMES]), the user may misspell one of the names or leave one out. In that case the code silently falls back to treating key as a callable, which is probably not what the user intended. If key is an iterable of strings, the code should instead raise a more helpful error message, such as:

SomeError: The object [LIST OF NAMES] appears to be an iterable, but does not contain all of the names in the Fisher object ([LIST OF NAMES IN OBJECT]).
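
A possible shape for this check, written as a standalone hypothetical helper check_sort_key. The use of ValueError and the pass-through for callables and plain strings are assumptions.

```python
def check_sort_key(key, names):
    """Raise a helpful error if key is a list of names that does not match names."""
    if callable(key) or isinstance(key, str):
        return  # assumed to be handled elsewhere
    key = list(key)
    if all(isinstance(item, str) for item in key) and set(key) != set(names):
        raise ValueError(
            f"The object {key} appears to be an iterable, but does not contain "
            f"all of the names in the Fisher object ({list(names)})."
        )


check_sort_key(["b", "a"], ["a", "b"])  # same names in another order: no error
```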

Make `reparametrize` method less cumbersome to use

The problem

Suppose we want to make a reparametrization of the Fisher matrix.
Currently, this is achievable, but is somewhat cumbersome; let's say we have the (cosmological) set of parameters {omega_matter, omega_baryon, h, S8, n_s, w0, wa}, and we want to make a change to {Omega_matter, Omega_baryon, h, sigma8, n_s, w0, wa}.
The old parameters are related to the new ones via:

omega_matter = Omega_matter * h**2
omega_baryon = Omega_baryon * h**2
S8 = sigma8 * sqrt(Omega_matter / 0.3)

The following code, using Sympy, can be used to transform one into the other:

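import numpy as np
from scipy.linalg import block_diag
from sympy import Matrix, sqrt, symbols

# math_mode is assumed to be fitk's helper that wraps names in LaTeX math mode;
# its import is omitted in the original snippet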

# old Fisher matrix
fm = ...
# we need to sort the names to be usable!
fm = fm.sort(
    key=[
        "Omega_matter",
        "Omega_baryon",
        "h",
        "sigma8",
        "n_s",
        "w0",
        "wa",
    ]
)

# old ones
omega_m = symbols("omega_m")
omega_b = symbols("omega_b")
h = symbols("h")
s8 = symbols("s8")

# new ones
Omega_m = symbols("Omega_m")
Omega_b = symbols("Omega_b")
sigma8 = symbols("sigma8")

# the normalization factor
norm = symbols("norm")

# the symbolic Jacobian
result = Matrix(
    [Omega_m * h**2, Omega_b * h**2, h, sigma8 * sqrt(Omega_m / norm)]
).jacobian([Omega_m, Omega_b, h, sigma8])

# the numerical Jacobian
jacobian = np.array(
    result.subs(
        [
            # actual values may differ
            (Omega_m, 0.3),
            (Omega_b, 0.05),
            (h, 0.67),
            (sigma8, 0.83),
            (norm, 0.33),
        ]
    ).evalf()
).astype(np.float64)

# the new Fisher matrix
fm_reparametrized = fm.reparametrize(
    block_diag(jacobian, np.eye(len(fm) - len(jacobian))),
    names=[
        "omega_m",
        "omega_baryon",
        "h",
        "s8",
        "n_s",
        "w0",
        "wa",
    ],
    latex_names=math_mode(
        [
            "\\omega_m",
            "\\omega_b",
            "h",
            "S_8",
            "n_s",
            "w_0",
            "w_a",
        ]
    ),
    fiducials=fm.fiducials
    # actual values may differ
    * np.array(
        [
            # one factor per parameter, in the order of the new names
            0.67**2,
            0.67**2,
            1,
            np.sqrt(0.3 / float(norm.evalf(subs={norm: 0.3}))),
            1,
            1,
            1,
        ]
    )
)

Note that there is a lot of useless/redundant code here, notably the setting of names which do not change, as well as the fiducial values, a lot of which are equal to the previous ones. We also need to make sure the Jacobian has the right dimensions, and the parameters must be sorted in a particular order.

In particular:

we need to evaluate the Jacobian at the fiducial
we need to sort the matrix beforehand
we need to set the new LaTeX names and fiducials, half of which are the same as the old ones

Add `repr` and/or `str` for `FisherDerivative`

This would be useful for displaying the following info:

which software a particular interface implements
who is the maintainer of the interface
at which URLs one can find info about the software
the version number

Some open questions:

it's possible to have an interface which uses multiple software packages (one for the signal, one for the covariance); is the above then well defined?
should the version number refer to the version of the software itself, or the interface?
