pressio / rom-tools-and-workflows

License: Other

Python 99.16% Jinja 0.09% Shell 0.75%

rom-tools-and-workflows's Introduction

rom-tools-and-workflows

The ROM tools and workflows Python library comprises a set of algorithms for constructing and exploiting ROMs that rely on abstract base classes that encapsulate all the information needed to run a given algorithm. The philosophy is that, for any given application, the user simply needs to "fill out" a class that meets the required API of the abstract base class. Once this class is complete, the user gains access to all of our existing algorithms.

Documentation

https://pressio.github.io/rom-tools-and-workflows/romtools.html

Installation

cd my-path/rom-tools-and-workflows
pip install .

Verify installation by running the tests

Note: you need pytest installed

cd my-path/rom-tools-and-workflows
pytest

Note: some tests generate auxiliary/temporary files, which are handled via pytest's tmp_path fixture as suggested in https://docs.pytest.org/en/7.1.x/how-to/tmp_path.html.

Building the documentation

cd <repo-dir>
pdoc ./romtools -t ./custom-template --math --docformat google

This opens a browser with the module documentation. More information about pdoc can be found at https://pdoc.dev.

rom-tools-and-workflows's Issues

avoid unnecessary imports

need to go through all files and remove all the unnecessary imports

Update shifter API and implementations

Update shifter API as:

class <SomeShifter>(Shifter):

    def apply_shift(self, my_array: np.ndarray):
        # apply the shift to the input array in place
        pass

    def remove_shift(self, my_array: np.ndarray):
        # undo the shift on the input array in place
        pass

Additionally, all operations should be made to be in place in the shifter class. Tests should check for this.
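A minimal sketch of the two-method API and of a test for in-place behavior. The `Shifter` base class below is a stand-in, `VectorShifter` is simplified to 2-D arrays, and the constructor argument is an assumption for illustration only:

```python
import abc

import numpy as np


class Shifter(abc.ABC):
    """Stand-in base class mirroring the proposed two-method API."""

    @abc.abstractmethod
    def apply_shift(self, my_array: np.ndarray) -> None:
        ...

    @abc.abstractmethod
    def remove_shift(self, my_array: np.ndarray) -> None:
        ...


class VectorShifter(Shifter):
    """Hypothetical shifter that subtracts one offset per row of a 2-D array."""

    def __init__(self, shift_vector: np.ndarray):
        self._shift = np.asarray(shift_vector, dtype=float)

    def apply_shift(self, my_array: np.ndarray) -> None:
        # "-=" writes into the caller's buffer instead of allocating a new array
        my_array -= self._shift[:, None]

    def remove_shift(self, my_array: np.ndarray) -> None:
        my_array += self._shift[:, None]


snapshots = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
original = snapshots.copy()
same_buffer = snapshots  # second reference, used to check that no copy is made

shifter = VectorShifter(np.array([1.0, 10.0]))
shifter.apply_shift(snapshots)
shifted = snapshots.copy()
shifter.remove_shift(snapshots)
```

A test for the in-place requirement can check that the array object passed in is the one that was modified, and that `remove_shift` restores the original values.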

Add print statement for all MPI tests that are hardwired to a specific number of processors

As mentioned in #21

Update shifter API and documentation

Our current API is:

class AbstractShifter(abc.ABC):
    '''
    Abstract implmentation
    '''

    @abc.abstractmethod
    def __call__(self, my_array: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        '''
        Overload to apply shift
        '''
        pass

We propose to change this to:

class AbstractShifter(abc.ABC):
    '''
    Abstract implmentation
    '''

    @abc.abstractmethod
    def shiftTensor(self, my_array: np.ndarray):
        '''
        In place operation that shifts the tensor
        '''
        pass

    def getShift(self) -> np.ndarray:
        '''
        Returns the shift value
        '''
        pass

Additionally, clean up documentation:

Misspelling on page "Abstract implmentation".
Does the VectorShifter require the vector to be of the same size as the first dimension of the array? If so, there should be a check.
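If the answer to the second question is yes, the check could look roughly like the sketch below. The function name `check_shift_vector` and the error message are hypothetical, not part of the library:

```python
import numpy as np


def check_shift_vector(shift_vector, my_array):
    """Raise if the shift vector does not match the first dimension of the array."""
    shift_vector = np.asarray(shift_vector)
    if shift_vector.ndim != 1 or shift_vector.shape[0] != my_array.shape[0]:
        raise ValueError(
            f"shift vector has shape {shift_vector.shape}, "
            f"expected ({my_array.shape[0]},) to match the first dimension of the array"
        )


# A matching vector passes silently
check_shift_vector(np.zeros(3), np.zeros((3, 5, 2)))

# A mismatched vector is rejected
try:
    check_shift_vector(np.zeros(4), np.zeros((3, 5, 2)))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```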

Rename shift_vector affine_offset

Shift_vector is a bad name now that we are using a tensor format, we should change this to affine_offset or something like this.

Fix deploy_docs yaml

Fix snapshot demo

The snapshot demo needs to be updated for the new case where the snapshot class no longer exists. (The demo may now be irrelevant.)

add section discussing usage with mpi

just some stuff here to put later in the doc properly

install mpi
install mpi4py: suggest to use https://mpi4py.readthedocs.io/en/stable/install.html#using-pip
install romtools bla bla
running tests:

$MPIRUN -n 3 python -m pytest --with-mpi

etc

Test failure in install-and-test.yaml from check-mpi-size.py

@fnrizzi @cwschilly

This part of the testing workflow is causing our tests to fail (seems this has been the case for a bit)

https://github.com/Pressio/rom-tools-and-workflows/actions/runs/7221756264/job/19677403724

I commented this out for now so that the tests would pass, but could one of you take a look whenever you have a chance? This is a bit beyond me :)

deim: unused function and wrong call

deim.py has vectorDeimGetIndices which is unused/untested and has a wrong call inside.
Leaving the function commented out for now
This needs to be fixed

Linear algebra libraries: mpi4py light library and pressio-tools full library.

To add concrete implementations that can be run parallel (row distributed memory), we need a variety of basic algebra and linear algebra functions.
We want to be able to create a new library that has two usages:

a "light mode":

installable via pip install ... and has pure python dependencies (if any)
supports some of the basic functions so that a user can utilize most of the rom-tools library without headaches

a "heavy more performant mode":

uses bindings to Trilinos building on top of what we have done already in https://github.com/Pressio/pressio-tools

NOTE: All functions should target row distributed parallelization

For the python-only library mode 1 we will support

min max of a distributed vector
A^T B, where A and B are both row distributed
Hacks to do the SVD and orthogonalization via method of snapshots
Stretch/unlikely: svd, qr, basic linear solve.

For the more heavy library mode 2 we will support

min max of a distributed vector
A^T B, where A and B are both row distributed
Hacks to do the SVD and orthogonalization via method of snapshots
svd
qr
linear solve
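To make the row-distributed operations concrete, here is a serial sketch that emulates ranks with row blocks of a NumPy array. It is an illustration under assumptions, not the planned library: the function names are hypothetical, and the plain `sum`/`min`/`max` over blocks stands in for what would be an MPI all-reduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_ranks = 12, 3, 4
A = rng.standard_normal((n_rows, n_cols))
B = rng.standard_normal((n_rows, n_cols))

# Emulate a row distribution: each "rank" owns a contiguous block of rows
A_blocks = np.array_split(A, n_ranks, axis=0)
B_blocks = np.array_split(B, n_ranks, axis=0)


def distributed_min_max(x_blocks):
    # Local min/max per rank, then a global reduction over ranks
    return min(x.min() for x in x_blocks), max(x.max() for x in x_blocks)


def distributed_AtB(a_blocks, b_blocks):
    # Each rank forms a small local product; the sum plays the role of an all-reduce
    return sum(a.T @ b for a, b in zip(a_blocks, b_blocks))


def method_of_snapshots(a_blocks):
    # The Gram matrix is small (n_cols x n_cols) and can be replicated on every rank
    gram = distributed_AtB(a_blocks, a_blocks)
    eigvals, V = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    sigma = np.sqrt(np.clip(eigvals[order], 0.0, None))
    V = V[:, order]
    # U stays row distributed: each rank only transforms its own rows
    U_blocks = [a @ V / sigma for a in a_blocks]
    return U_blocks, sigma, V


AtB = distributed_AtB(A_blocks, B_blocks)
U_blocks, sigma, V = method_of_snapshots(A_blocks)
U = np.vstack(U_blocks)
```

Because the method of snapshots works with the Gram matrix, it squares the condition number of the snapshot matrix, which is why it is listed above as a hack rather than as a replacement for a true distributed SVD or QR.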

h5py should be an optional dependency

the outputter.py includes h5py, but h5py should be made an optional dep and conditionally enable the related code if present
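One common way to do this is to detect the package at import time and defer the actual import to the function that needs it. The sketch below assumes a hypothetical `write_hdf5` helper; the real functions in outputter.py may look different:

```python
import importlib.util

# True only if h5py can be imported in this environment
HAS_H5PY = importlib.util.find_spec("h5py") is not None


def write_hdf5(filename, arrays):
    """Hypothetical outputter: write a dict of name -> array to an HDF5 file."""
    if not HAS_H5PY:
        raise ImportError(
            "h5py is required for HDF5 output; install it with `pip install h5py`"
        )
    import h5py  # deferred so that importing this module never fails

    with h5py.File(filename, "w") as f:
        for name, data in arrays.items():
            f.create_dataset(name, data=data)
```

Tests that exercise the HDF5 path can then be skipped when `HAS_H5PY` is false.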

remove constructor in all base classes

remove constructor in all base classes because it is not needed at all

cleanup and resolve how the various imports should be done internally

we need to figure out what is the proper hierarchy of imports needed for all these files:

romtools/__init__.py
romtools/hyper_reduction
etc

what is the rule to follow if any?

summary of current tests: w or w/o MPI, ranks, etc

make a summary of all tests currently present in the code so that we have an idea of which ones are serial, which ones use MPI, how many ranks, etc.
something like:

test_name1: serial, skipped with mpi
test_name2: mpi, min # ranks 3 else no-op
test_name3: mpi, min # ranks 3
etc

add a check to ensure no test is added that violates # of ranks

once we decide on #21, we need to add a check somewhere that ensures no new test is added that violates the agreed # of ranks

Remove "Abstract" from "AbstractXYZ" class names

The Abstract prefix is not needed and we should remove it to be consistent with more standard naming conventions.

Fix formatting issues so code passes new pylint check

Related to #15 (PR #19)

add pipeline using mpi4py

use this: https://github.com/mpi4py/setup-mpi

Create list for required MPI changes

We are working towards having rom-tools-and-workflows work fully in parallel. For planning purposes, we need to:

Go through each abstract class, and check if there is any reason an abstract class would not be compatible with MPI (it seems like this shouldn't be an issue)
Go through each concrete implementation of a base class, and check if there is a reason a concrete class would not be compatible with MPI (e.g., uses min, max, svd without an svd member, etc.).
Summarize in list

Splitter implementation

The current splitter operates by creating a new matrix that is an expanded version of the existing matrix. We then perform SVD on the expanded matrix. This can be made more efficient by having splitter return information about the splitting, and then we efficiently leverage this to do SVD. This requires some thought, and this issue is to begin this process.

Support for heterogeneous parameter spaces

Current parameter space classes only support homogeneous spaces. It would be good to support heterogeneous cases such as a mix of normal and uniform distributions, nonvariable parameters (constants), and non-numeric parameters. The ParameterSpace base class may need to be reworked to accommodate this.
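As a rough illustration of what a heterogeneous space would need to handle, the sketch below mixes the four cases mentioned above using only the standard library. The parameter names, the dictionary-of-callables layout, and `sample_heterogeneous` are all hypothetical and unrelated to the current ParameterSpace API:

```python
import random


def sample_heterogeneous(parameters, number_of_samples, seed=0):
    """Draw samples from a dict of name -> callable(rng) returning one value."""
    rng = random.Random(seed)
    return [
        {name: draw(rng) for name, draw in parameters.items()}
        for _ in range(number_of_samples)
    ]


parameters = {
    "viscosity": lambda rng: rng.uniform(1e-3, 1e-1),            # uniform
    "inlet_velocity": lambda rng: rng.gauss(1.0, 0.1),           # normal
    "gamma": lambda rng: 1.4,                                    # constant
    "flux_scheme": lambda rng: rng.choice(["roe", "rusanov"]),   # non-numeric
}

samples = sample_heterogeneous(parameters, number_of_samples=3)
```

Returning one dictionary per sample, rather than a single numeric array, is one way to accommodate non-numeric values.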

Rename TrialSpace as VectorSpace and create a derived TrialSpace concept.

Right now, we represent a linear subspace with a "TrialSpace". As we move to support more hyper-reduction techniques, we need to construct linear subspaces and it would be good to re-use what's in TrialSpace. We will handle this by re-naming TrialSpace as VectorSpace. Downstream we can then create the notion of a TrialSpace which has state-specific functions, i.e., mapFromReducedState, mapToReducedState, etc.

Combine TrialSpaceFromPOD and TrialSpaceFromScaledPOD

add an optional scaler argument to TrialSpaceFromPOD with a default value of NoOpScaler() and delete TrialSpaceFromScaledPOD
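A sketch of the combined class, assuming a simplified scaler interface with hypothetical `pre_scale`/`post_scale` methods and a 2-D snapshot matrix. A `None` sentinel is used here in place of a literal `NoOpScaler()` default, which has the same effect:

```python
import numpy as np


class NoOpScaler:
    """Scaler that leaves the data untouched."""

    def pre_scale(self, data):
        return data

    def post_scale(self, data):
        return data


class DiagonalScaler:
    """Hypothetical scaler that weights each row of the snapshot matrix."""

    def __init__(self, weights):
        self._weights = np.asarray(weights, dtype=float)

    def pre_scale(self, data):
        return data / self._weights[:, None]

    def post_scale(self, data):
        return data * self._weights[:, None]


class TrialSpaceFromPOD:
    """POD trial space with an optional scaler (simplified sketch)."""

    def __init__(self, snapshot_matrix, scaler=None):
        scaler = NoOpScaler() if scaler is None else scaler
        scaled = scaler.pre_scale(snapshot_matrix)
        left_vectors, _, _ = np.linalg.svd(scaled, full_matrices=False)
        self._basis = scaler.post_scale(left_vectors)

    def get_basis(self):
        return self._basis


snapshot_matrix = np.random.default_rng(1).standard_normal((6, 4))
plain = TrialSpaceFromPOD(snapshot_matrix).get_basis()
explicit = TrialSpaceFromPOD(snapshot_matrix, NoOpScaler()).get_basis()
scaled = TrialSpaceFromPOD(snapshot_matrix, DiagonalScaler(np.full(6, 2.0))).get_basis()
```

With this layout, existing callers of TrialSpaceFromPOD are unaffected, and callers of TrialSpaceFromScaledPOD only need to pass their scaler as an argument.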

setup github page hosting the doc

now that the repo is public, need to setup the github page to host the doc

should be pretty easy: https://pdoc.dev/docs/pdoc.html#deploying-to-github-pages

Remove snapshot data from rom-tools

We are planning on removing the snapshot data class and just have, e.g., trial space work directly with the snapshot tensor. Any new comments/thoughts on this should be placed here.

fix `ERROR: not found` CI failures

The error, as seen here:

 ============================ no tests ran in 0.25s =============================
ERROR: not found: /home/runner/work/rom-tools-and-workflows/rom-tools-and-workflows/tests/romtools/__pycache__
(no match in any of [<Dir romtools>])

First appeared on this commit.

decide where the X goes and where it does not

need to decide if the X defines a product or just indicates a new axis

Resolve failures in MPI tests

Follow up to #5

Failures can be seen here

add license file and license header to all files

we need a license file and a license header in every file

we can adapt the ones used in pressio: see
https://github.com/Pressio/pressio/blob/develop/LICENSE
https://github.com/Pressio/pressio/blob/develop/helper_scripts/add-sandia-license.sh

IMPORTANT: make sure Eric Parish is listed as POC for all this.

Reduce timeout time for CI tests

We have a test currently that sporadically hangs. The timeout for this should be much lower than the current 6 hours.

public vs private `tensor_to_matrix(tensor_input)`

do we need this to be public?

def tensor_to_matrix(tensor_input):
def matrix_to_tensor

Small improvements to truncater

There's an extra description under EnergyTruncater that does not make sense. Also recommend an example or recommended value for the threshold.
There should be a warning in BasisSizeTruncater if you try to truncate larger than the actual size of the basis (or accidentally put a negative number or zero).
General review of documentation
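The sketch below illustrates both suggestions. It assumes an energy criterion based on the cumulative sum of squared singular values, with a threshold in (0, 1]; the function names and the default of 0.999 are illustrative choices, not the library's actual definitions:

```python
import warnings

import numpy as np


def energy_truncation_size(singular_values, threshold=0.999):
    """Smallest number of modes whose squared singular values reach `threshold`."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    cumulative = np.cumsum(np.asarray(singular_values, dtype=float) ** 2)
    energy = cumulative / cumulative[-1]  # last entry is exactly 1.0
    return int(np.searchsorted(energy, threshold, side="left")) + 1


def truncate_to_size(basis, basis_dimension):
    """Keep the first `basis_dimension` columns, guarding against bad input."""
    if basis_dimension < 1:
        raise ValueError("basis_dimension must be a positive integer")
    if basis_dimension > basis.shape[1]:
        warnings.warn(
            f"requested {basis_dimension} modes but the basis only has "
            f"{basis.shape[1]}; returning the full basis"
        )
        basis_dimension = basis.shape[1]
    return basis[:, :basis_dimension]


singular_values = np.array([10.0, 1.0, 0.1])
k_90 = energy_truncation_size(singular_values, threshold=0.9)
k_999 = energy_truncation_size(singular_values, threshold=0.999)
k_all = energy_truncation_size(singular_values, threshold=1.0)
```

For these singular values the cumulative energies are roughly 0.9900, 0.9999, and 1.0, so the three thresholds select 1, 2, and 3 modes respectively.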

add formatting stage so that we keep things consistent

standardize imports formatting/order

Update documentation for tensor reformat

Self explanatory. I'll initially take the lead on this.

Add additional Parameter and ParameterSpace implementations

Need to add the following additional concrete implementations of the romtools.workflows.parameter_spaces.Parameter class

GaussianParameter
TriangularParameter

which support sampling from Gaussian and Triangular distributions respectively.

Also need to add the class

GaussianParameterSpace

which will subclass romtools.workflows.parameter_spaces.HomogeneousParameterSpace.
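A minimal sketch of the two parameter classes, assuming a hypothetical `generate_samples(number_of_samples, rng)` method; the actual Parameter interface may use different names and arguments. Sampling relies on NumPy's `Generator.normal` and `Generator.triangular`:

```python
import numpy as np


class GaussianParameter:
    """Parameter sampled from a normal distribution (hypothetical interface)."""

    def __init__(self, name, mean, std):
        self.name = name
        self._mean = mean
        self._std = std

    def generate_samples(self, number_of_samples, rng):
        return rng.normal(self._mean, self._std, size=number_of_samples)


class TriangularParameter:
    """Parameter sampled from a triangular distribution (hypothetical interface)."""

    def __init__(self, name, lower, mode, upper):
        self.name = name
        self._lower = lower
        self._mode = mode
        self._upper = upper

    def generate_samples(self, number_of_samples, rng):
        return rng.triangular(self._lower, self._mode, self._upper, size=number_of_samples)


rng = np.random.default_rng(42)
gaussian_samples = GaussianParameter("mach", mean=0.8, std=0.05).generate_samples(5, rng)
triangular_samples = TriangularParameter("aoa", 0.0, 1.0, 2.0).generate_samples(5, rng)
```

Passing the random generator in explicitly keeps the sampling reproducible in tests.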

improve scope and motivation section

after #10, we need to improve a bit scope and motivation.
I think we should say why we are doing this

Small ECSW improvements

Update documentation for inputs/outputs to conform to the rest of the code
Potentially use more expressive variable names (e.g., xi->weights, inds-> sample_mesh_indices, tau->tolerance). Thoughts?
Is there any reason that xi is of size N instead of N_s?
Could we make inds a numpy array instead of a list? I personally like this more, but this doesn't need to be the case.
In the ecsw_fixed_test_basis, the default solver is the abstract solver. Should we change this to be the concrete implementation?

Add comm argument to globalAbsSumImpl

As discussed in PR #41

General formatting guidelines

Here is a general template for a source file (most items are checked by the pylint check)

#
# ************************************************************************
#          LICENSE (can use add-sandia-license.sh in helper_scripts)
# ************************************************************************
#

'''
Put docstring documentation in single quotes. Try to limit line lengths to 130 characters.
'''

import abc       # standard libraries first
import numpy as np
from romtools.trial_space import AbstractTrialSpace    # rt imports last


class AbstractClass(abc.ABC): # PascalCase for classes (and two newlines between imports and code)
    '''
    Describe what the class does
    '''
    def __init__(self, arg_1): # all methods should take self as first argument
        '''
        Constructor documentation

        Args:
            arg_1 (<type>): description of argument
        '''
        # code here

    def someMethod(self, arg_1, arg_2): # camelCase for methods/functions, snake_case for variables/arguments
        '''
        Brief description

        Args:
             arg_1 ():
             arg_2 ():

        Returns:
        '''
        # code here

As noted in #24, ChatGPT can be helpful for generating consistent documentation for methods in a class.

revise classes documentation so that they all conform to a certain standard

out of curiosity i tried to ask chatgpt to write the documentation for the AbstractSnapshotData

and I got this response:

class AbstractSnapshotData(abc.ABC):
    """
    An abstract base class for representing snapshot data.

    This class defines the common interface for classes that store and provide access to snapshot data
    as part of a simulation or data processing system. Implementations of this class are expected to
    define the initialization method and various methods for accessing and manipulating the data.

    Attributes:
        var_names (list): A list of variable names associated with the snapshot data.

    Methods:
        __init__(self, **kwargs):
            Initializes an instance of the AbstractSnapshotData class. Subclasses should implement
            this method to set up the necessary data structures or connections to data sources.

        getSnapshotsAsListOfArrays(self) -> Iterable[np.ndarray]:
            Retrieve the snapshots as an iterable of NumPy arrays. Each array represents a single snapshot.

        getMeshGids(self):
            Retrieve the mesh global identifiers associated with the snapshots.

        getSnapshotsAsArray(self) -> np.ndarray:
            Retrieve the snapshots as a single NumPy array by converting the list of snapshots into an array.

        getVariableNames(self) -> list:
            Retrieve the list of variable names associated with the snapshot data.

        getNumVars(self) -> int:
            Get the number of variables (dimensions) in the snapshot data.

    Note:
    - Subclasses should implement the abstract methods `__init__()`, `getSnapshotsAsListOfArrays()`,
      and `getMeshGids()`.
    - The attribute `var_names` should be set in the subclass constructor to define the variable names.
    - The `getSnapshotsAsArray()` and `getNumVars()` methods provide convenient access to the data.
    """

    @abc.abstractmethod
    def __init__(self, **kwargs):
        pass

    @abc.abstractmethod
    def getSnapshotsAsListOfArrays(self) -> Iterable[np.ndarray]:
        pass

    @abc.abstractmethod
    def getMeshGids(self):
        pass

    def getSnapshotsAsArray(self) -> np.ndarray:
        snapshot_array = listOfSnapshotsToArray(self.getSnapshotsAsListOfArrays())
        return snapshot_array

    def getVariableNames(self) -> list:
        return self.var_names

    def getNumVars(self) -> int:
        return len(self.getVariableNames())

then i asked to do

and i got this:

class AbstractSnapshotData(abc.ABC):
    """
    An abstract base class for representing snapshot data.

    This class defines the common interface for classes that store and provide access to snapshot data
    as part of a simulation or data processing system. Implementations of this class are expected to
    define the initialization method and various methods for accessing and manipulating the data.

    Attributes:
        var_names (list): A list of variable names associated with the snapshot data.

    Methods:
    """

    @abc.abstractmethod
    def __init__(self, **kwargs):
        """
        Initializes an instance of the AbstractSnapshotData class. Subclasses should implement
        this method to set up the necessary data structures or connections to data sources.

        Args:
            **kwargs: Additional keyword arguments that subclasses may accept for configuration.

        Note:
        Subclasses must call this base class constructor and set the `var_names` attribute to
        define the variable names associated with the snapshot data.
        """
        pass

    @abc.abstractmethod
    def getSnapshotsAsListOfArrays(self) -> Iterable[np.ndarray]:
        """
        Retrieve the snapshots as an iterable of NumPy arrays. Each array represents a single snapshot.

        Returns:
            Iterable[np.ndarray]: An iterable of NumPy arrays representing the snapshots.

        Note:
        Subclasses must implement this method to provide access to the actual snapshot data.
        """
        pass

    @abc.abstractmethod
    def getMeshGids(self):
        """
        Retrieve the mesh global identifiers associated with the snapshots.

        Returns:
            None or specific data type: The mesh global identifiers, or None if not applicable.

        Note:
        Subclasses must implement this method to provide access to mesh global identifiers if relevant.
        """
        pass

    def getSnapshotsAsArray(self) -> np.ndarray:
        """
        Retrieve the snapshots as a single NumPy array by converting the list of snapshots into an array.

        Returns:
            np.ndarray: A NumPy array containing all the snapshots.

        Note:
        This method provides a convenient way to access the snapshot data as a single array.
        Subclasses can use the `getSnapshotsAsListOfArrays()` method to implement this.
        """
        snapshot_array = listOfSnapshotsToArray(self.getSnapshotsAsListOfArrays())
        return snapshot_array

    def getVariableNames(self) -> list:
        """
        Retrieve the list of variable names associated with the snapshot data.

        Returns:
            list: A list of variable names.

        Note:
        Subclasses should ensure that this list is properly defined and set in the constructor.
        """
        return self.var_names

    def getNumVars(self) -> int:
        """
        Get the number of variables (dimensions) in the snapshot data.

        Returns:
            int: The number of variables.

        Note:
        Subclasses should make sure that this method returns the correct number of variables
        associated with the snapshot data.
        """
        return len(self.getVariableNames())

all tests should be marked with mpi guard

some tests are missing the mpi guard for running or skipping

naming convention for classes and functions

check if functions and classes have consistent naming (camelcase, or whatnot)
make a summary of all the inconsistencies

add page to documentation that shows the license directly

we should do something like https://pressio.github.io/pressio/license.html
the page should be clearly visible in the left sidebar

testing with mpi: decide on a min/max # of ranks

we need to decide if tests when run with mpi need to have a min/max # of ranks.
For example we could:

not set any rule on this: any test does whatever but then this creates an issue for the CI
decide only on the min # of ranks to use so that all MPI test should conform to this but then how do we set the # of ranks to use in the CI?
decide that every MPI test MUST be written to work with a min and a max # of ranks

fix order of submodules when rendered

by default, pdoc uses alphabetical order for ordering the submodules when rendered.
in #10 we tried to fix the order since it makes more sense to use a "logical" one (topdown) kind of, so that it is easier to follow.

The hack used in that PR, however, did not work because it broke the links to the submodule pages.

I also opened an issue mitmproxy/pdoc#630 directly for pdoc.
while this is worked out, it would be good to find a temporary solution.

fix test that sporadically hangs

related to #88, which test hangs? @jtencer

Change greedy algorithm save file to include actual values of parameters

The greedy algorithm should save to file the actual values of the parameters it has run the FOM for. Right now, it just saves the sample index.

Reformat snapshot data to be in tensor form

As we discussed, we are going to refactor snapshot data to be in the form of a tensor:

snapshots \in \RR{ N_vars x N_gridpoints x N_samples }

Is everyone ok with this shape?

Internally, when we leverage the snapshots to do, e.g., POD, we will reshape them into a matrix of size N_vars N_gridpoints x N_samples. We have the option of adopting the order "F" convention or the order "C" convention to do this. Order "F" will order the states as u_1, v_1, w_1, u_2, v_2, ..., while "C" will block the variable sets.

I would vote for order "F", even though that's not the preferred way for Python. The alternative is to set up the snapshot shape as snapshots \in \RR{ N_gridpoints x N_vars x N_samples }, in which case we could reshape with order "C" to have u_1, v_1, w_1, u_2, v_2, ....

Any preferences?
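For reference, the three row orderings discussed above can be checked on a small labeled tensor. This is only an illustration with 3 variables, 2 grid points, and 2 samples; the labels stand in for the actual state values:

```python
import numpy as np

n_vars, n_points, n_samples = 3, 2, 2
labels = np.array([["u_1", "u_2"], ["v_1", "v_2"], ["w_1", "w_2"]])

# Tensor of shape (n_vars, n_points, n_samples); every sample carries the same labels
snapshots = np.stack([labels, labels], axis=-1)

# Option 1: (n_vars, n_points, n_samples) reshaped with order "F"
rows_f = snapshots.reshape(n_vars * n_points, n_samples, order="F")[:, 0]

# Option 2: same tensor reshaped with order "C" (blocks the variables)
rows_c = snapshots.reshape(n_vars * n_points, n_samples, order="C")[:, 0]

# Option 3: (n_points, n_vars, n_samples) reshaped with order "C"
alternative = np.transpose(snapshots, (1, 0, 2))
rows_alt_c = alternative.reshape(n_points * n_vars, n_samples, order="C")[:, 0]

print(rows_f.tolist())      # ['u_1', 'v_1', 'w_1', 'u_2', 'v_2', 'w_2']
print(rows_c.tolist())      # ['u_1', 'u_2', 'v_1', 'v_2', 'w_1', 'w_2']
print(rows_alt_c.tolist())  # ['u_1', 'v_1', 'w_1', 'u_2', 'v_2', 'w_2']
```

Options 1 and 3 give the same interleaved row ordering; they differ only in how the tensor itself is laid out.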