bovi-analytics / chain-simulator Goto Github PK
View Code? Open in Web Editor NEWGeneric, highly scalable platform for simulating digital twins using Markov chains.
Home Page: https://bovi-analytics.github.io/chain-simulator/
Generic, highly scalable platform for simulating digital twins using Markov chains.
Home Page: https://bovi-analytics.github.io/chain-simulator/
The module src/chain_simulator/utilities.py
provides a validator named validate_matrix_positive
. All this function does is check for negative probabilities in a transition matrix. A more suitable name would be validate_matrix_negative
.
Function chain_simulator.utilities.validate_matrix_positive
doesn't work with a Coordinate (COO) array/matrix. Update the function to be compatible with this type and possiby other sparse matrix formats.
There are a few functions that can be deprecated and removed from the _simulation.py
module. These functions still use the slow (see #23) matrix-matrix multiplications and vector_processor
effectively has the same functionality.
The following needs to be done:
vector_processor_numpy
vector_processor_scipy
vector_processor_cupy
vector_processor_numpy
vector_processor_scipy
vector_processor_cupy
Places to update links:
In cuSPARSE 12, matrix-matrix multiplications consume vast amounts of VRAM. Recent versions of cuSPARSE reworked sparse matrix-matrix multiplication and the current algorithm is very VRAM-hungry. There are several algorithms available (refer to cusparseSpGEMM()
), out of which the non-default CUSPARSE_SPGEMM_ALG2
looks most promising.
These are very low-level functions written in C/C++. CuPy makes interacting with the low-level cuSPARSE library possible but is not up-to-date enough to expose CUSPARSE_SPGEMM_ALG2
. To access this functionality, Cython can be used, a cuSPARSE wrapper in C/C++ can be written or wait for CuPy to update their wrapper.
In the meantime, vector-matrix multiplications can be used to achieve the same output with a vastly smaller VRAM footprint.
The old AbstractDigitalTwinFacade is deprecated and no longer used. This first implementation for constructing transition matrices is slow, it has a complexity of 0
.
Todo:
Sometimes it may be useful to access the step size of a simulation output. This step size can be used to extrapolate data in a callback function.
Currently, users of this library need to calculate state change probabilities themselves. Whether this is done by hand or in code, fact is that users need to provide state change probabilities themselves. This library will eventually make heavy use of parallel processing and even clusters. This library should provide a way to calculate state change probablities in parallel so that this process doesn't become a bottleneck of a single CPU core or single node in a cluster.
There are currently two libraries contending for operating on a cluster: PySpark and Dask. One of these (or maybe both) will be implemented in the library. Investigate which library will suit this project best and provide hooks via these libraries to calulate state change probabilities.
Function chain_simulator.utilities.validate_matrix_sum
is only tested with numpy.ndarray
and scipy.sparse.csr_array
(refer to tests).
It should be tested against more sparse matrix formats from the SciPy library, namely:
Linked to #7, this object was once used to assemble Compressed Sparse Row (CSR) arrays from the scipy.sparse
library. This implementation is replaced with chain_simulator.implementations.array_assembler
.
Currently the API documentation and userguide need to be tested manually for errors. When this is forgotten, errors can easily slip in the main/stable branch and cause confusion among end-users. Therefore it would be handy to test the documentation every time the main/stable branch receives an update. This way documentation errors can be spotted immediately.
The number of possible paths to traverse through a transition matrix increases drastically as a transition matrix grows in size. This process, in turn, takes longer and longer. Instead of finding possible paths a posteriori, this can be done with each intermediate/final state vector.
To realise this intermediate processing, a callback mechanism needs to be developed. Users should be able to provide functions as calbacks which process intermediate/final state vectors. These callback cuntions need to return ONE variable as these variables are accumulated and summed to produce a final variable.
Concept:
from decimal import Decimal
from typing import Callable, Concatenate, ParamSpec
import numpy as np
P = ParamSpec("P")
T = TypeVar("T", np.ndarray)
U = TypeVar("U", int, float, Decimal)
# Decorator, used to reister callback functions.
def register_callback(Callable[Concatenate[T, P], U]) -> Callable[P, U]:
...
# Function to call after every n-th matrix multiplication.
@register_callback
def callback(state_vector) -> int | float | Decimal:
...
In src/chain_simulator/utilities.py
, warnings are communicated to the user using logging.Logger.warning
. Based on Logging HOWTO - When to use logging, those warnings should be communicated using warnings
. This is because these are cases where the client appication should be modified to rid the warnings.
Function validate_matrix_sum
should be updated to use warnings
:
https://github.com/Bovi-analytics/DigitalCowSimulationPlatform/blob/308057b0d9b2df0fe23deadadbbd3baedd2f412a/src/chain_simulator/utilities.py#L47
And function validate_matrix_positive
should be updated too to use warnings
:
https://github.com/Bovi-analytics/DigitalCowSimulationPlatform/blob/308057b0d9b2df0fe23deadadbbd3baedd2f412a/src/chain_simulator/utilities.py#L87
It may be handy to define a custom warning, something like InvalidTransitionMatrixWarning
?
warnings
warnings
Issue: When trying to assemble a transition matrix from a generator, Python crashes.
Environment:
How to reproduce:
Traceback:
Fatal Python error: PyFrame_BlockPop: block stack underflow
Python runtime state: initialized
Thread 0x000036bc (most recent call first):
File "C:\Program Files\Python310\lib\threading.py", line 324 in wait
File "C:\Program Files\Python310\lib\threading.py", line 607 in wait
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 150 in _on_run
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 218 in run
File "C:\Program Files\Python310\lib\threading.py", line 1016 in _bootstrap_inner
File "C:\Program Files\Python310\lib\threading.py", line 973 in _bootstrap
Thread 0x00003614 (most recent call first):
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 292 in _on_run
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 218 in run
File "C:\Program Files\Python310\lib\threading.py", line 1016 in _bootstrap_inner
File "C:\Program Files\Python310\lib\threading.py", line 973 in _bootstrap
Thread 0x0000338c (most recent call first):
File "C:\Program Files\Python310\lib\threading.py", line 324 in wait
File "C:\Program Files\Python310\lib\queue.py", line 180 in get
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 367 in _on_run
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 218 in run
File "C:\Program Files\Python310\lib\threading.py", line 1016 in _bootstrap_inner
File "C:\Program Files\Python310\lib\threading.py", line 973 in _bootstrap
Current thread 0x000013d0 (most recent call first):
File "C:\Users\gabev\PycharmProjects\DigitalCow\DigitalCow.py", line 718 in state_probability_generator
File "C:\Users\gabev\PycharmProjects\DigitalCow\venv\lib\site-packages\chain_simulator\implementations.py", line 121 in array_assembler
File "C:/Users/gabev/PycharmProjects/DigitalCow/main.py", line 123 in chain_simulator_test
File "C:/Users/gabev/PycharmProjects/DigitalCow/main.py", line 137 in <module>
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18 in execfile
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 1483 in _exec
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 1476 in run
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 2164 in main
File "C:\Users\gabev\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\213.6777.50\plugins\python\helpers\pydev\pydevd.py", line 2173 in <module>
Extension modules: _pydevd_bundle.pydevd_cython_win32_310_64, _pydevd_bundle.pydevd_cython, _pydevd_frame_eval.pydevd_frame_evaluator_common, _pydevd_frame_eval.pydevd_frame_evaluator_win32_310_64, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering (total: 41)
Process finished with exit code -1073740791 (0xC0000409)
The library offers optional dependencies. These can be installed using pip install chain-simulator[gpu]
. This will install the optional dependency CuPy, which needs some special treatment.
As described in CuPy installation, there are some requirements that the user needs to install themselves. This is not clear and should be clearly documented in how to install chain-simulator.
This will be the first optimised version of a transition matrix multiplier. Below is a general flowchart of how the function will work.
The function should be compaible with CuPy sparse matrices, but fall back to a CPU implementation using NumPy/SciPy for matrix multiplications.
Parameters needed:
The current implementation of src/chain_simulator/_simulation.py:chain_simulator returns a transition matrix which is always progressed one step too far forward in time. The function receives a transition matrix which is already the 'next' step in time. So, the configuration chain_simulator(transition_matrix, 1)
should return the unmodified transition matrix.
This becomes apparent when multiplying an initial state vector with the initial transition matrix. The initial transition matrix is the next step in time.
Suppose we want to calculate a final state vector for 1 day, 2 days and 3 days in the future, we would do the following:
>>> import numpy as np
>>> initial_state_vector = np.array([1, 0, 0])
>>> initial_transition_matrix = np.array([[0, 1, 0], [0, 1/2, 1/2], [0, 0, 1]])
>>> initial_state_vector @ initial_transition_matrix
np.array([0, 1, 0])
>>> initial_state_vector @ initial_transition_matrix @ initial_transition_matrix
np.array(0, 1/2, 1/2)
>>> initial_state_vector @ initial_transition_matrix @ initial_transition_matrix @ initial_transition_matrix
np.array([0, 1/4, 3/4])
The current implementation gives the following output:
>>> import numpy as np
>>> transition_matrix = np.array([[0, 1, 0], [0, 1/2, 1/2], [0, 0, 1]]) # Actually the 1st step forward in time.
>>> simulator = chain_simulator(transition_matrix, 3, 1)
>>> next(simulator)[0]
np.array([[0, 1/2, 1/2], [0, 1/4, 3/4], [0, 0, 1]]) # Actually the 2nd step forward in time.
>>> next(simulator)[0]
np.array([[0, 1/4, 3/4], [0, 1/8, 7/8], [0, 0, 1]]) # Actually the 3rd step forward in time.
>>> next(simulator)[0]
np.array([[0, 1/8, 7/8], [0, 1/16, 15/16], [0, 0, 1]]) # Actually the 4th step forward in time.
The expected output would look like this:
>>> import numpy as np
>>> transition_matrix = np.array([[0.0, 1.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
>>> simulator = chain_simulator(transition_matrix, 3, 1)
>>> next(simulator)[0]
np.array([[0, 1, 0], [0, 1/2, 1/2], [0, 0, 1]]) # 1st step forward in time, identical to input matrix.
>>> next(simulator)[0]
np.array([[0, 1/2, 1/2], [0, 1/4, 3/4], [0, 0, 1]]) # 2nd step forward in time.
>>> next(simulator)[0]
np.array([[0, 1/4, 3/4], [0, 1/8, 7/8], [0, 0, 1]]) # 3rd step forward in time.
Function state_vector_processor
in src/chain_simulator/_simulation.py
returns sparse arrays as numpy.ndarray
-types. There are currently no sparse formats in the NumPy
or SciPy
libraries that provide a sparse format for 1d arrays (only 2d arrays/matrices). As the number of states in transition matrices and the number of intermediate state vectors grow, more and more memory will be used. This is wastful as arrays with sparse data are treaded and stored as dense.
To combat this, both intermediate and final state vectors could be stacked incrementally into a matrix-form. This allows the use of sparse formats like CSC
and CSR
to vastly reduce memory usage. Both formats support efficient arithmetic operations, CSC
supports efficient column slicing and CSR
supports efficient row slicing.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.