jcgoran / fitk
FITK - generation, manipulation, and plotting of Fisher-like matrices and tensors
Home Page: https://jcgoran.github.io/fitk/
License: MIT License
As of version 1.21, numpy has support for type hints (see the numpy.typing docs, notably numpy.typing.NDArray), so it would perhaps be useful to add type hints in the code and bump the minimum version of numpy to that.
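A minimal sketch of what such annotations could look like; the symmetrize helper is purely illustrative and not part of fitk:

```python
import numpy as np
from numpy.typing import NDArray  # available since numpy 1.21


def symmetrize(matrix: NDArray[np.float64]) -> NDArray[np.float64]:
    """Return the symmetric part of a square matrix (illustrative helper)."""
    return 0.5 * (matrix + matrix.T)


result = symmetrize(np.array([[1.0, 2.0], [0.0, 3.0]]))
```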
When calling the plot method, the contours are plotted in the order of increasing uncertainty. Unfortunately, this means that if one plots multiple contour levels with opacities alpha1 and alpha2, the first contour will not actually have opacity alpha1, but rather a combination of alpha1 and alpha2.
The above works because the second step effectively makes a contour with a "hole" in it, so the previous (i.e. smaller) contour retains the correct opacity passed to it.
Add kwargs like contour_levels_1d and contour_levels_2d to the class which generates 2D contour plots (currently FisherFigure2D), so they can be specified separately.
If specified, they should override the current contour_levels, but if not, those should be used (for instance, setting both contour_levels_1d and contour_levels, it should plot the 1D contours using values from contour_levels_1d, and the 2D contours using values from contour_levels).
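The fallback rule could be sketched roughly as follows; resolve_contour_levels is a hypothetical helper, and the levels are plain numbers here only for illustration (the actual format of contour_levels may differ):

```python
def resolve_contour_levels(contour_levels, contour_levels_1d=None, contour_levels_2d=None):
    """Pick the 1D and 2D levels, falling back to the shared contour_levels."""
    levels_1d = contour_levels if contour_levels_1d is None else contour_levels_1d
    levels_2d = contour_levels if contour_levels_2d is None else contour_levels_2d
    return tuple(levels_1d), tuple(levels_2d)


# only the 1D levels are overridden; the 2D ones fall back to contour_levels
levels = resolve_contour_levels([1, 2], contour_levels_1d=[1])
```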
Users should be able to provide either a Path or a str to methods that perform any kind of saving to or loading from disk.
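One way to accept both is to normalize the argument at the top of each such method; normalize_path below is a hypothetical helper name, not something fitk currently provides:

```python
import os
from pathlib import Path
from typing import Union


def normalize_path(path: Union[str, "os.PathLike[str]"]) -> Path:
    """Convert a str or path-like object into a pathlib.Path."""
    return Path(os.fspath(path))


from_str = normalize_path("output/matrix.json")
from_path = normalize_path(Path("output") / "matrix.json")
```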
As the FisherDerivative class should technically be used to provide an interface, it would probably make more sense to rename it to FisherInterface.
While there is the _add_shading_1d function, it is currently not used in the triangle plots, but only in FisherFigure1D.
From this SO answer, it appears that one can use low-level SciPy modules to gain a speedup in inverting positive-definite symmetric matrices, which would be useful for the covariance.
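As a rough sketch of that idea (I am assuming the LAPACK routines dpotrf and dpotri exposed by scipy.linalg.lapack are the ones meant, and inverse_spd is a hypothetical name), the inverse can be obtained from the Cholesky factor:

```python
import numpy as np
from scipy.linalg.lapack import dpotrf, dpotri


def inverse_spd(matrix):
    """Invert a symmetric positive-definite matrix via its Cholesky factor."""
    a = np.asarray(matrix, dtype=np.float64)
    factor, info = dpotrf(a, lower=False)  # upper-triangular Cholesky factor
    if info != 0:
        raise np.linalg.LinAlgError("matrix is not positive definite")
    upper, info = dpotri(factor, lower=False)  # only the upper triangle is filled
    if info != 0:
        raise np.linalg.LinAlgError("inversion failed")
    # mirror the upper triangle to get the full symmetric inverse
    return np.triu(upper) + np.triu(upper, k=1).T


fisher = np.array([[2.0, 0.5], [0.5, 1.0]])
covariance = inverse_spd(fisher)
```

Whether this is actually faster than numpy.linalg.inv for the matrix sizes typical here would need to be benchmarked.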
pdoc does not seem to allow the linking of images, and I've run into some strange parsing issues, so it'd be better in the long run to use Sphinx instead.
Use some cross-platform graphical toolkit to make a GUI version of FITK (optional, should be installable via pip install fitk[gui] or something).
The order of the parameters should not matter when plotting the matrices using FisherBarFigure, and the parameters in the other matrices should all be automatically sorted according to the order of parameters in the first one.
from fitk import FisherBarFigure, FisherMatrix
fm1 = FisherMatrix([[1, 0], [0, 3]], names=["a", "b"])
fm2 = FisherMatrix([[3.1, 0], [0, 0.9]], names=["b", "a"])
ff = FisherBarFigure()
ff.plot_relative_constraints([fm1, fm2], kind="bar")
The coverage report should only be uploaded if the underlying OS is ubuntu-latest, and the Python version is 3.9 (arbitrary, but still supported until 2025).
It would be nice if the derivatives could be cached somehow to cut down on run time.
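For in-memory caching, something along the lines of functools.lru_cache might already be enough, provided the arguments are hashable. The functions below are toy stand-ins, not the fitk API:

```python
from functools import lru_cache

call_count = 0


def expensive_signal(x: float) -> float:
    """Toy stand-in for an expensive signal computation."""
    global call_count
    call_count += 1
    return x**3


@lru_cache(maxsize=None)
def cached_derivative(x: float, step: float = 1e-3) -> float:
    """Central finite difference, memoized on its (hashable) arguments."""
    return (expensive_signal(x + step) - expensive_signal(x - step)) / (2 * step)


first = cached_derivative(1.0)
second = cached_derivative(1.0)  # served from the cache, no new signal calls
```

Caching across runs would need something persistent (e.g. writing to disk), which this sketch does not cover.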
Would be nice to have a kwarg like mark_fiducials to mark the fiducials on the 1D/2D contour plots.
It should have a default of None, and if one specifies True, it should use some default plotting style, while if one specifies a dictionary with kwargs compatible with vlines or hlines, it should use that style instead.
A possible ambiguity: if one specifies mark_fiducials={}, do we plot it or not? I think we should, since the user intentionally specified it. In other words, the check should be if mark_fiducials is None, so that nothing is plotted only when mark_fiducials=None (as opposed to if not mark_fiducials, which would also skip the empty dictionary, since that is falsy as well).
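The check could look something like the sketch below. The helper name and the default style are made up for illustration, and treating False the same as None is my own assumption:

```python
# hypothetical default style; the keys are valid kwargs for matplotlib's vlines/hlines
DEFAULT_FIDUCIAL_STYLE = {"colors": "black", "linestyles": "--"}


def resolve_fiducial_style(mark_fiducials=None):
    """Return the style dict to plot with, or None if nothing should be plotted."""
    # assumption: False behaves like None (the original only discusses None)
    if mark_fiducials is None or mark_fiducials is False:
        return None
    if mark_fiducials is True:
        return dict(DEFAULT_FIDUCIAL_STYLE)
    # any dictionary, including an empty one, means "plot with these kwargs"
    return dict(mark_fiducials)
```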
Would be nice to have a function that creates plots similar to the one below (taken from arXiv:2110.05435, figure 8):
Instead of calling jsonify to convert numpy arrays to JSON strings, make a proper subclass as described here.
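I am assuming the subclass meant here is one of json.JSONEncoder; a minimal sketch (NumpyEncoder is a hypothetical name) would be:

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """JSON encoder that knows how to serialize numpy arrays and scalars."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()
        if isinstance(o, np.generic):
            return o.item()
        # defer to the base class, which raises TypeError for unknown types
        return super().default(o)


encoded = json.dumps({"values": np.eye(2), "n": np.int64(2)}, cls=NumpyEncoder)
```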
Unless specified otherwise, the z-order of the marked fiducials should be at the very top of the graph, to make it easily noticeable.
When calling plot_absolute_constraints or plot_relative_constraints with kind='barh', the elements of the Fisher matrices are plotted from bottom to top. Furthermore, when calling legend on it, the labels in the legend are displayed top-to-bottom, while the plots themselves are displayed bottom-to-top.
import matplotlib.pyplot as plt
import numpy as np
from fitk import FisherBarFigure, FisherMatrix

# get_default_rcparams is a fitk helper; import it from wherever your fitk version exposes it

ff = FisherBarFigure()
fm1 = FisherMatrix(np.diag([1, 2, 3]), names=['a', 'b', 'c'])
fm2 = FisherMatrix(np.diag([4, 5, 6]), names=['a', 'b', 'c'])
ff.plot_relative_constraints(
    [
        fm1,
        fm2,
    ],
    kind='barh',
    labels=['Matrix 1', 'Matrix 2'],
    scale='log',
    percent=True,
)
ff.axes.set_xlim(1, ff.axes.get_xlim()[-1])
with plt.rc_context(get_default_rcparams()):
    ff.figure.legend(bbox_to_anchor=(0.5, 0.87), loc='lower center')
For some reason, when I run test_plot_2d_figure_error, the test sometimes fails with the following strange error:
> fp.plot(
euclid_opt.marginalize_over("Omegam", "Omegab", invert=True),
)
tests/test_graphics.py:783:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
fitk/graphics.py:2125: in plot
ax[i, i].remove()
.venv/lib/python3.11/site-packages/matplotlib/artist.py:212: in remove
self._remove_method(self)
.venv/lib/python3.11/site-packages/matplotlib/figure.py:922: in delaxes
self._axstack.remove(ax)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <matplotlib.figure._AxesStack object at 0x7fbdf87b4390>, a = <AxesSubplot: >
def remove(self, a):
"""Remove the axes from the stack."""
> self._axes.pop(a)
E KeyError: <AxesSubplot: >
.venv/lib/python3.11/site-packages/matplotlib/figure.py:75: KeyError
Note that I am not sure what the exact source of the error is, as it appears randomly when I run the tests.
I suppose one easy fix is to do something like:
try:
    ax[i, i].remove()
except KeyError:
    pass
though I'm not entirely sure why this is the case as I explicitly perform checks for this beforehand, so this module should never throw any exceptions.
The below are all leftover tasks from #5 that were not mandatory for that PR to get merged.
signal should return the tuple (coords, values)? Or a separate (data) class with properties coords and values). Note that the coordinates are not needed anywhere in the code, they would just be there as a convenience for the user, in case they want to construct the derivatives themselves, or if they want to easily plot the derivatives without re-computing the coordinates.
omega_m? Omega_m? Omega_matter?")
Currently, there's no way to extract comments/metadata from the saved JSON file; unless the user explicitly loads the file via the json package, the metadata cannot be accessed by the FisherMatrix class.
On the other hand, if we put a metadata parameter in the constructor, it's bound to get lost in any arithmetic operations, not to mention that it breaks the single responsibility principle.
Currently, calling constraints(name=name) returns a numpy array with 1 element, which may not be very intuitive, and it may be better to just return a regular float instead.
The DALI method (as described here) provides a straightforward generalization to Fisher matrices, and only requires minor tweaks to make it work:
values in the constructor, have *values, i.e. let the user pass n x n, n x n x n, etc. tensors (or just supply the keywords matrix, flexion, quarxion to make it unambiguous)
Now, what isn't super straightforward is the computation of the (un)marginalized constraints; since we are dealing with a non-Gaussian distribution, in particular (assuming zero mean):
we have two problems:
n-dimensional integration (for instance, scipy.integrate.nquad seems to be adequate for the job of up to, say, 7 parameters, otherwise some MC-based thing is preferred; this paper describes a nice, well-maintained alternative)
n-2 dimensions, then define iso-probability contours (once we have the normalization, of course), and then find points in the x-y plane which correspond to those contours. Scipy seems to have some root-finding routines for multidimensional problems, but it's not immediately clear how to code everything up.
The plotting interface is still somewhat incomplete, and is missing a bunch of features (notably, the legend). Furthermore, the plot_triangle function does a bit too much for my taste; ideally, we would have an interface similar to matplotlib:
from fitk import FisherMatrix, FisherPlotter
# compute the Fisher matrices
fm1 = FisherMatrix(...)
fm2 = FisherMatrix(...)
# plot them
plotter = FisherPlotter(...) # the constructor should ideally take some arguments, but for what?
plotter.plot_triangle(fm1, ls='--', c='r') # the keyword arguments should ideally be identical to those that matplotlib uses
plotter.plot_triangle(fm1) # the color should use a cycler, so this would plot the first one in the cycler
There are open questions however (incomplete list):
plot_triangle return a figure, or just modify the existing one (or both)?
UPDATE: the above seems to be adequate, although there are some missing features:
plot_shading_1d, but it's somewhat half-finished)
vlines, on 2D plots, maybe markers would be better)
Add two methods, fiducial(name: str) -> float, and latex_name(name: str) -> str, which retrieve the fiducial value and LaTeX name, respectively, corresponding to the name name.
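A bare-bones sketch of the two lookup methods, using a hypothetical stand-in class (ParameterInfo) rather than the real Fisher object:

```python
class ParameterInfo:
    """Hypothetical stand-in holding names, fiducials and LaTeX names."""

    def __init__(self, names, fiducials, latex_names):
        self.names = list(names)
        self.fiducials = list(fiducials)
        self.latex_names = list(latex_names)

    def _index(self, name: str) -> int:
        try:
            return self.names.index(name)
        except ValueError:
            raise KeyError(f"Parameter '{name}' not found") from None

    def fiducial(self, name: str) -> float:
        """Return the fiducial value of the parameter called name."""
        return float(self.fiducials[self._index(name)])

    def latex_name(self, name: str) -> str:
        """Return the LaTeX name of the parameter called name."""
        return self.latex_names[self._index(name)]


info = ParameterInfo(["a", "b"], [1.0, 0.5], [r"$\alpha$", r"$\beta$"])
```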
While the __getitem__ could in principle be used, it is already used to retrieve the actual values of the Fisher matrix, so it's best to have separate, explicit methods for retrieving the fiducial and the LaTeX name.
The methods plot_relative_constraints and plot_absolute_constraints have a lot of code shared among them, and, consequently, it would be nice to refactor them so only one method needs to be updated instead of two.
show_joint_dist is True, the contours will match the outputs of other packages such as getdist, allowing for an easier comparison
Currently, the D class, which is basically a fancy container for parameters required to compute derivatives using finite differences, does not hold information about the weights required to compute them; this is instead delegated to the find_diff_weights function.
The question then arises: should D be responsible for the weights as well (and maybe even the denominator of each parameter)?
A possible implementation is to just add a weights property to D which holds the weights, and maybe even a denominator or prefactor (inverse prefactor?) which holds the abs_step ** order part.
The output of find_diff_weights may contain zeros, which just means that those parts of the stencil do not contribute at all to the final result, and currently these zeros are "manually" removed when calling derivative, but this seems somewhat error-prone.
Note that if D automatically removes the zeroes, it also has to remove the corresponding stencil parameter.
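The zero-removal step, done consistently on both lists, could be sketched like this (drop_zero_weights is a hypothetical helper, and the weights below are the usual central-difference ones for a first derivative):

```python
def drop_zero_weights(stencil, weights, tolerance=0.0):
    """Remove stencil points whose weight is zero, keeping both lists aligned."""
    kept = [(s, w) for s, w in zip(stencil, weights) if abs(w) > tolerance]
    return [s for s, _ in kept], [w for _, w in kept]


# the middle point of a central-difference stencil does not contribute
stencil, weights = drop_zero_weights([-1, 0, 1], [-0.5, 0.0, 0.5])
```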
Due to how matplotlib does its positioning of tick labels, it's possible to have tick labels that overlap.
It would be nice to find a way to generate tick labels that do not overlap by either:
Note that the algorithm should take into account the movement of all tick labels (otherwise the plot would not look nice), and provide a scoring function in the end (so it's basically an optimizer).
Provided such an algorithm exists and is general enough, it could even be placed in an external Python module, though I would settle for just an application to this module's plotting requirements.
In some cases, such as parameter degeneracy, the Fisher matrix is singular and cannot be inverted; this has an impact on certain methods:
FisherMatrix.inverse
FisherMatrix.marginalize_over
FisherMatrix.correlation_matrix
FisherMatrix.constraints
FisherBaseFigure.plot
plot_curve_1d
plot_curve_2d
The code should catch these errors (numpy.linalg.LinAlgError), and handle this somehow. Some suggestions (a combination of these would also work):
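As a starting point, catching the error and re-raising it with a more descriptive message could look like the following; safe_inverse is a hypothetical helper, and the choice of ValueError is just a placeholder:

```python
import numpy as np


def safe_inverse(matrix):
    """Invert a matrix, turning numpy's singularity error into a clearer one."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError as error:
        raise ValueError(
            "The Fisher matrix is singular (possibly due to degenerate "
            "parameters) and cannot be inverted"
        ) from error


inverse = safe_inverse(np.array([[2.0, 0.0], [0.0, 4.0]]))
```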
It would be useful to have a Docker version or something that has all of the interfaces already included.
For any existing interface, it may be useful to have debugging capabilities, such as printing the current config, the current args, the current result, etc.
These would of course be off by default, and would be passed as kwargs to the signal and covariance methods.
As the signal and covariance methods can in principle take different kwargs (and can be different codes), I should change the **kwargs to kwargs_signal and kwargs_covariance when calling fisher_matrix.
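The split could be sketched as below. This is a toy stand-in, not the real fisher_matrix signature, and it simply returns both results instead of computing anything:

```python
def fisher_matrix(signal, covariance, kwargs_signal=None, kwargs_covariance=None):
    """Toy stand-in: forward separate kwargs to the signal and covariance callables."""
    kwargs_signal = {} if kwargs_signal is None else dict(kwargs_signal)
    kwargs_covariance = {} if kwargs_covariance is None else dict(kwargs_covariance)
    # the real method would combine these into a Fisher matrix
    return signal(**kwargs_signal), covariance(**kwargs_covariance)


outputs = fisher_matrix(
    lambda debug=False: ("signal", debug),
    lambda verbose=0: ("covariance", verbose),
    kwargs_signal={"debug": True},
)
```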
The proposed scheme, which is probably cleaner for importing:
fisher_matrix → tensors
fisher_derivative → derivatives
fisher_operations → operations
fisher_utils → utils (or utilities)
fisher_plotter → graphics (plotters sounds somewhat silly)
There should be a GH action to automatically push a release to PyPI (or test PyPI) when there is a push to master, and it has a tag of the form x.y.z.
There should be a self-hosted runner (so it doesn't waste GH action minutes) which runs the tests for the interfaces. The tests should only be run manually.
A picture is worth a thousand words, and a large part of the package involves the plotting of Fisher objects, so it would be nice to have example plots alongside the documentation.
Due to mitmproxy/pdoc#282, it may be a bit challenging to include images as relative URLs are not supported, so some care is needed to make sure it works properly.
The documentation should only be built in the CI when all of the following are satisfied:
master is updated
master is tagged (optionally, the format of the tag should also be specified)
Currently, it gets built every time master is updated, regardless of any tags present in the commits.
If one explicitly sets it to an empty dictionary or similar, it should revert to using matplotlib defaults, whereas currently it just leaves the defaults returned by get_default_rcparams.
There should be an option like autolim, which defaults to True, and determines whether or not the axes should be rescaled when calling plot on a given Fisher matrix.
It's possible that a user makes a typo on some parameter when calling, say, fisher_matrix, and therefore the whole computation may fail, wasting valuable time.
Add a method called validate (or is_valid), which by default returns True, and which a user can override if they want to perform validation of the parameters before doing any computation.
As fitk only has complete control over the derivative and fisher_matrix methods, it should be implemented at that level, rather than in signal or covariance.
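Roughly, the hook could work as in the sketch below. Interface and MyInterface are hypothetical stand-ins, and derivative just forwards to signal instead of doing any real computation:

```python
class Interface:
    """Hypothetical stand-in for the interface base class."""

    def validate(self, *args) -> bool:
        # default: accept everything; users override this to add checks
        return True

    def signal(self, *args):
        raise NotImplementedError

    def derivative(self, *args):
        # validation happens here, before any expensive computation
        if not self.validate(*args):
            raise ValueError(f"Invalid parameters: {args}")
        return self.signal(*args)  # placeholder for the real computation


class MyInterface(Interface):
    known = {"omega_m", "h"}

    def validate(self, *args) -> bool:
        return all(name in self.known for name in args)

    def signal(self, *args):
        return list(args)
```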
When calling sort(key=[LIST OF NAMES]), it's possible the user misspells one of the names, or leaves it out; in this case, the code automatically falls back to treating key as a callable, which is probably not what the user intended, so in case key is an iterable of strings, it should raise a more helpful error message, such as:
SomeError: The object [LIST OF NAMES] appears to be an iterable, but does not contain all of the names in the Fisher object ([LIST OF NAMES IN OBJECT]).
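A sketch of such a check, with ValueError standing in for SomeError and check_sort_key being a hypothetical helper name:

```python
from collections.abc import Iterable


def check_sort_key(key, names):
    """Raise a helpful error if key is a list of names that does not match."""
    if callable(key) or isinstance(key, str) or not isinstance(key, Iterable):
        return  # leave callables and other key types to the existing logic
    key = list(key)
    if all(isinstance(item, str) for item in key) and set(key) != set(names):
        raise ValueError(
            f"The object {key} appears to be an iterable, but does not contain "
            f"all of the names in the Fisher object ({list(names)})."
        )
```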
Suppose we want to make a reparametrization of the Fisher matrix.
Currently, this is achievable, but is somewhat cumbersome; let's say we have the (cosmological) set of parameters {omega_matter, omega_baryon, h, S8, n_s, w0, wa}, and we want to make a change to {Omega_matter, Omega_baryon, h, sigma8, n_s, w0, wa}.
The old parameters are related to the new ones via:
omega_matter = Omega_matter * h**2
omega_baryon = Omega_baryon * h**2
S8 = sigma8 * sqrt(Omega_matter / 0.3)
The following code, using Sympy, can be used to transform one into the other:
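import numpy as np
from scipy.linalg import block_diag
from sympy import Matrix, sqrt, symbols
# math_mode (used below) is a fitk helper and needs to be imported as well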
# old Fisher matrix
fm = ...
# we need to sort the names to be usable!
fm = fm.sort(
key=[
"Omega_matter",
"Omega_baryon",
"h",
"sigma8",
"n_s",
"w0",
"wa",
]
)
# old ones
omega_m = symbols("omega_m")
omega_b = symbols("omega_b")
h = symbols("h")
s8 = symbols("s8")
# new ones
Omega_m = symbols("Omega_m")
Omega_b = symbols("Omega_b")
sigma8 = symbols("sigma8")
# the normalization factor
norm = symbols("norm")
# the symbolic Jacobian
result = Matrix(
[Omega_m * h**2, Omega_b * h**2, h, sigma8 * sqrt(Omega_m / norm)]
).jacobian([Omega_m, Omega_b, h, sigma8])
# the numerical Jacobian
jacobian = np.array(
result.subs(
[
# actual values may differ
(Omega_m, 0.3),
(Omega_b, 0.05),
(h, 0.67),
(sigma8, 0.83),
(norm, 0.33),
]
).evalf()
).astype(np.float64)
# the new Fisher matrix
fm_reparametrized = fm.reparametrize(
block_diag(jacobian, np.eye(len(fm) - len(jacobian))),
names=[
"omega_m",
"omega_baryon",
"h",
"s8",
"n_s",
"w0",
"wa",
],
latex_names=math_mode(
[
"\\omega_m",
"\\omega_b",
"h",
"S_8",
"n_s",
"w_0",
"w_a",
]
),
fiducials=fm.fiducials
# actual values may differ
* np.array(
[
0.67**2,
0.67**2,
1,
np.sqrt(0.3 / float(norm.evalf(subs={norm: 0.3}))),
1,
1,
1,
]
)
)
Note that there is a lot of useless/redundant code here, notably the setting of names which do not change, as well as the fiducial values, a lot of which are equal to the previous ones. We also need to make sure the Jacobian has the right dimensions, and the parameters must be sorted in a particular order.
In particular:
This would be useful for displaying the following info:
Some open questions: