Git Product home page Git Product logo

chain-simulator's People

Contributors

maxnollet avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

chain-simulator's Issues

Deprecate old NumPy/SciPy/CuPy vector processors in `_simulation` module

There are a few functions that can be deprecated and removed from the _simulation.py module. These functions still use the slow (see #23) matrix-matrix multiplications and vector_processor effectively has the same functionality.

The following needs to be done:

Optimise matrix multiplications

In cuSPARSE 12, matrix-matrix multiplications consume vast amounts of VRAM. Recent versions of cuSPARSE reworked sparse matrix-matrix multiplication and the current algorithm is very VRAM-hungry. There are several algorithms available (refer to cusparseSpGEMM()), out of which the non-default CUSPARSE_SPGEMM_ALG2 looks most promising.

These are very low-level functions written in C/C++. CuPy makes interacting with the low-level cuSPARSE library possible but is not up-to-date enough to expose CUSPARSE_SPGEMM_ALG2. To access this functionality, Cython can be used, a cuSPARSE wrapper in C/C++ can be written or wait for CuPy to update their wrapper.

In the meantime, vector-matrix multiplications can be used to achieve the same output with a vastly smaller VRAM footprint.

Remove deprecated AbstractDigitalTwinFacade

The old AbstractDigitalTwinFacade is deprecated and no longer used. This first implementation for constructing transition matrices is slow, it has a complexity of $O(n^2)$ and is entirely sequential. The implementation calculates the probability of every single cell in transition matrices. This inefficient as most of these probabilities are 0.

https://github.com/Bovi-analytics/DigitalCowSimulationPlatform/blob/308057b0d9b2df0fe23deadadbbd3baedd2f412a/src/chain_simulator/abstract.py#L16

Paralellize state change probability calculation

Problem

Currently, users of this library need to calculate state change probabilities themselves. Whether this is done by hand or in code, fact is that users need to provide state change probabilities themselves. This library will eventually make heavy use of parallel processing and even clusters. This library should provide a way to calculate state change probablities in parallel so that this process doesn't become a bottleneck of a single CPU core or single node in a cluster.

Proposed solution

There are currently two libraries contending for operating on a cluster: PySpark and Dask. One of these (or maybe both) will be implemented in the library. Investigate which library will suit this project best and provide hooks via these libraries to calulate state change probabilities.

Add GitHub workflow to test documentation

Currently the API documentation and userguide need to be tested manually for errors. When this is forgotten, errors can easily slip in the main/stable branch and cause confusion among end-users. Therefore it would be handy to test the documentation every time the main/stable branch receives an update. This way documentation errors can be spotted immediately.

Add callback mechanism for intermediate state processing

The number of possible paths to traverse through a transition matrix increases drastically as a transition matrix grows in size. This process, in turn, takes longer and longer. Instead of finding possible paths a posteriori, this can be done with each intermediate/final state vector.

To realise this intermediate processing, a callback mechanism needs to be developed. Users should be able to provide functions as calbacks which process intermediate/final state vectors. These callback cuntions need to return ONE variable as these variables are accumulated and summed to produce a final variable.

Concept:

from decimal import Decimal
from typing import Callable, Concatenate, ParamSpec

import numpy as np

P = ParamSpec("P")
T = TypeVar("T", np.ndarray)
U = TypeVar("U", int, float, Decimal)


# Decorator, used to reister callback functions.
def register_callback(Callable[Concatenate[T, P], U]) -> Callable[P, U]:
	...


# Function to call after every n-th matrix multiplication.
@register_callback
def callback(state_vector) -> int | float | Decimal:
	...

Use `warnings.warn` instead of `logging.warn`

What's wrong?

In src/chain_simulator/utilities.py, warnings are communicated to the user using logging.Logger.warning. Based on Logging HOWTO - When to use logging, those warnings should be communicated using warnings. This is because these are cases where the client appication should be modified to rid the warnings.

Resolution

Function validate_matrix_sum should be updated to use warnings:
https://github.com/Bovi-analytics/DigitalCowSimulationPlatform/blob/308057b0d9b2df0fe23deadadbbd3baedd2f412a/src/chain_simulator/utilities.py#L47

And function validate_matrix_positive should be updated too to use warnings:
https://github.com/Bovi-analytics/DigitalCowSimulationPlatform/blob/308057b0d9b2df0fe23deadadbbd3baedd2f412a/src/chain_simulator/utilities.py#L87

It may be handy to define a custom warning, something like InvalidTransitionMatrixWarning?

Todo

Fatal Python error: PyFrame_BlockPop: block stack underflow

Issue: When trying to assemble a transition matrix from a generator, Python crashes.

Environment:

  • Operating system: Windows (11?)
  • Python version: 3.10.?

How to reproduce:

  • Unknown

Traceback:

Fatal Python error: PyFrame_BlockPop: block stack underflow
Python runtime state: initialized
Thread 0x000036bc (most recent call first):
  File "C:\Program Files\Python310\lib\threading.py", line 324 in wait
  File "C:\Program Files\Python310\lib\threading.py", line 607 in wait
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 150 in _on_run
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 218 in run
  File "C:\Program Files\Python310\lib\threading.py", line 1016 in _bootstrap_inner
  File "C:\Program Files\Python310\lib\threading.py", line 973 in _bootstrap
Thread 0x00003614 (most recent call first):
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 292 in _on_run
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 218 in run
  File "C:\Program Files\Python310\lib\threading.py", line 1016 in _bootstrap_inner
  File "C:\Program Files\Python310\lib\threading.py", line 973 in _bootstrap
Thread 0x0000338c (most recent call first):
  File "C:\Program Files\Python310\lib\threading.py", line 324 in wait
  File "C:\Program Files\Python310\lib\queue.py", line 180 in get
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 367 in _on_run
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 218 in run
  File "C:\Program Files\Python310\lib\threading.py", line 1016 in _bootstrap_inner
  File "C:\Program Files\Python310\lib\threading.py", line 973 in _bootstrap
Current thread 0x000013d0 (most recent call first):
  File "C:\Users\gabev\PycharmProjects\DigitalCow\DigitalCow.py", line 718 in state_probability_generator
  File "C:\Users\gabev\PycharmProjects\DigitalCow\venv\lib\site-packages\chain_simulator\implementations.py", line 121 in array_assembler
  File "C:/Users/gabev/PycharmProjects/DigitalCow/main.py", line 123 in chain_simulator_test
  File "C:/Users/gabev/PycharmProjects/DigitalCow/main.py", line 137 in <module>
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18 in execfile
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 1483 in _exec
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 1476 in run
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 2164 in main
  File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 2173 in <module>
Extension modules: _pydevd_bundle.pydevd_cython_win32_310_64, _pydevd_bundle.pydevd_cython, _pydevd_frame_eval.pydevd_frame_evaluator_common, _pydevd_frame_eval.pydevd_frame_evaluator_win32_310_64, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering (total: 41)
Process finished with exit code -1073740791 (0xC0000409)

Document installation of optional dependencies

The library offers optional dependencies. These can be installed using pip install chain-simulator[gpu]. This will install the optional dependency CuPy, which needs some special treatment.

As described in CuPy installation, there are some requirements that the user needs to install themselves. This is not clear and should be clearly documented in how to install chain-simulator.

First optimised matrix multiplier

This will be the first optimised version of a transition matrix multiplier. Below is a general flowchart of how the function will work.

flowchart_cupy_v1 drawio

The function should be compaible with CuPy sparse matrices, but fall back to a CPU implementation using NumPy/SciPy for matrix multiplications.

Parameters needed:

  • Transition matrix
  • Days or steps in time to simulate
  • Interval for returning intermediary matrices, optional

Function `chain_simulator` progresses one step too far

What's wrong?

The current implementation of src/chain_simulator/_simulation.py:chain_simulator returns a transition matrix which is always progressed one step too far forward in time. The function receives a transition matrix which is already the 'next' step in time. So, the configuration chain_simulator(transition_matrix, 1) should return the unmodified transition matrix.

This becomes apparent when multiplying an initial state vector with the initial transition matrix. The initial transition matrix is the next step in time.

A more elaborate example

Suppose we want to calculate a final state vector for 1 day, 2 days and 3 days in the future, we would do the following:

>>> import numpy as np
>>> initial_state_vector = np.array([1, 0, 0])
>>> initial_transition_matrix = np.array([[0, 1, 0], [0, 1/2, 1/2], [0, 0, 1]])
>>> initial_state_vector @ initial_transition_matrix
np.array([0, 1, 0])
>>> initial_state_vector @ initial_transition_matrix @ initial_transition_matrix
np.array(0, 1/2, 1/2)
>>> initial_state_vector @ initial_transition_matrix @ initial_transition_matrix @ initial_transition_matrix
np.array([0, 1/4, 3/4])

The current implementation gives the following output:

>>> import numpy as np
>>> transition_matrix = np.array([[0, 1, 0], [0, 1/2, 1/2], [0, 0, 1]])  # Actually the 1st step forward in time.
>>> simulator = chain_simulator(transition_matrix, 3, 1)
>>> next(simulator)[0]
np.array([[0, 1/2, 1/2], [0, 1/4, 3/4], [0, 0, 1]])  # Actually the 2nd step forward in time.
>>> next(simulator)[0]
np.array([[0, 1/4, 3/4], [0, 1/8, 7/8], [0, 0, 1]])  # Actually the 3rd step forward in time.
>>> next(simulator)[0]
np.array([[0, 1/8, 7/8], [0, 1/16, 15/16], [0, 0, 1]])  # Actually the 4th step forward in time.

The expected output would look like this:

>>> import numpy as np
>>> transition_matrix = np.array([[0.0, 1.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
>>> simulator = chain_simulator(transition_matrix, 3, 1)
>>> next(simulator)[0]
np.array([[0, 1, 0], [0, 1/2, 1/2], [0, 0, 1]])  # 1st step forward in time, identical to input matrix.
>>> next(simulator)[0]
np.array([[0, 1/2, 1/2], [0, 1/4, 3/4], [0, 0, 1]])  # 2nd step forward in time.
>>> next(simulator)[0]
np.array([[0, 1/4, 3/4], [0, 1/8, 7/8], [0, 0, 1]])  # 3rd step forward in time.

Accumulate intermediate/final state vectors into a sparse matrix

Function state_vector_processor in src/chain_simulator/_simulation.py returns sparse arrays as numpy.ndarray-types. There are currently no sparse formats in the NumPy or SciPy libraries that provide a sparse format for 1d arrays (only 2d arrays/matrices). As the number of states in transition matrices and the number of intermediate state vectors grow, more and more memory will be used. This is wastful as arrays with sparse data are treaded and stored as dense.

To combat this, both intermediate and final state vectors could be stacked incrementally into a matrix-form. This allows the use of sparse formats like CSC and CSR to vastly reduce memory usage. Both formats support efficient arithmetic operations, CSC supports efficient column slicing and CSR supports efficient row slicing.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.