
metran's Introduction

Pastas: Analysis of Groundwater Time Series

Important

As of Pastas 1.5, noisemodels are not added to the Pastas models by default anymore. Read more about this change here.


Pastas: what is it?

Pastas is an open source Python package for processing, simulating and analyzing groundwater time series. The object oriented structure allows for the quick implementation of new model components. Time series models can be created, calibrated, and analysed with just a few lines of Python code with the built-in optimization, visualisation, and statistical analysis tools.

Documentation & Examples

Get in Touch

  • Questions on Pastas can be asked and answered on GitHub Discussions.
  • Bugs, feature requests and other improvements can be posted as GitHub Issues.
  • Pull requests will only be accepted on the development branch (dev) of this repository. Please take a look at the developers section on the documentation website for more information on how to contribute to Pastas.

Quick installation guide

To install Pastas, a working version of Python 3.9, 3.10, 3.11, or 3.12 has to be installed on your computer. We recommend using the Anaconda Distribution as it includes most of the Python package dependencies and the Jupyter Notebook software to run the notebooks. However, you are free to install any Python distribution you want.

Stable version

To get the latest stable version, use:

pip install pastas

Update

To update pastas, use:

pip install pastas --upgrade

Developers

To get the latest development version, use:

pip install git+https://github.com/pastas/pastas.git@dev#egg=pastas

Related packages

  • Pastastore is a Python package for managing multiple timeseries and pastas models
  • Metran is a Python package to perform multivariate timeseries analysis using a technique called dynamic factor modelling.
  • Hydropandas can be used to obtain Dutch time series (KNMI, Dinoloket, etc.)
  • PyEt can be used to compute potential evaporation from meteorological variables.

Dependencies

Pastas depends on a number of Python packages; the required ones are installed automatically when you install Pastas with pip. The dependencies needed for a minimal functional installation of Pastas are:

  • numpy>=1.7
  • matplotlib>=3.1
  • pandas>=1.1
  • scipy>=1.8
  • numba>=0.51
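pip checks these minimum versions for you. To illustrate why version strings must be compared numerically rather than as text (for example numpy 1.26 against the 1.7 minimum), here is a small sketch. `meets_minimum` and `parse_release` are hypothetical helpers that only handle purely numeric versions such as "1.26.4"; pre-releases like "2.0.0rc1" are not supported, which is why real tools use the full PEP 440 rules.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a purely numeric version such as '1.26.4' into (1, 26, 4)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Hypothetical helper: True if installed >= minimum, compared numerically."""
    a, b = parse_release(installed), parse_release(minimum)
    # Pad the shorter tuple with zeros so that '1.7' compares like '1.7.0'
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Plain string comparison gets this wrong: '1.26.4' sorts before '1.7'
print("1.26.4" >= "1.7")               # False
print(meets_minimum("1.26.4", "1.7"))  # True
```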

To install the most important optional dependencies (the solver LmFit and the function visualisation tool Latexify) together with Pastas, use:

pip install pastas[full]

or for the development version use:

pip install git+https://github.com/pastas/pastas.git@dev#egg=pastas[full]

How to Cite Pastas?

If you use Pastas in one of your studies, please cite the Pastas article in Groundwater:

To cite a specific version of Pastas, you can use the DOI provided for each official release (>0.9.7) through Zenodo. Click on the link to get a specific version and DOI, depending on the Pastas version.

  • Collenteur, R., Bakker, M., Caljé, R. & Schaars, F. (XXXX). Pastas: open-source software for time series analysis in hydrology (Version X.X.X). Zenodo. http://doi.org/10.5281/zenodo.1465866

metran's People

Contributors

bdestombe, dbrakenhoff, martinvonk, wlberendrecht


metran's Issues

Update testing routine

  • Add tests for newer versions of Python
  • Add black / isort formatting
  • Use tox for testing routine, similar to Pastas

Save/load metran models

It would be nice to build a pas-like file (see Pastas) to save models and load them again.

We could follow the pastas style and give each relevant object in a model a to_dict() method. As in pastas, this would include a series keyword argument to store the dictionary with or without the time series.

We can reuse the pastas encoder to write certain data types to JSON and store the result as a file. I guess we could use a .metran extension or something similar.
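A minimal sketch of the to_dict() plus JSON encoder idea, assuming a pandas-based model. `ToyModel`, `SeriesEncoder`, `from_dict`, and the dictionary layout are hypothetical illustrations, not the pastas encoder or an existing metran API. Writing the resulting string to a file with a .metran extension would complete the save/load round trip.

```python
import json

import pandas as pd


class SeriesEncoder(json.JSONEncoder):
    """Hypothetical encoder: store a pandas Series as index/values/name."""

    def default(self, o):
        if isinstance(o, pd.Series):
            return {
                "index": [ts.isoformat() for ts in o.index],
                "values": o.tolist(),
                "name": o.name,
            }
        return super().default(o)


class ToyModel:
    """Hypothetical stand-in for a model object with to_dict()/from_dict()."""

    def __init__(self, name, oseries, params):
        self.name = name
        self.oseries = oseries
        self.params = params

    def to_dict(self, series=True):
        data = {"name": self.name, "params": self.params}
        if series:  # mirrors the pastas-style `series` keyword
            data["oseries"] = self.oseries
        return data

    @classmethod
    def from_dict(cls, data):
        s = data.get("oseries")
        if s is not None:
            index = pd.to_datetime(s["index"])
            s = pd.Series(s["values"], index=index, name=s["name"])
        return cls(data["name"], s, data["params"])


oseries = pd.Series(
    [1.0, 1.2, 0.9], index=pd.date_range("2020-01-01", periods=3), name="well1"
)
model = ToyModel("example", oseries, {"phi": 0.8})
text = json.dumps(model.to_dict(), cls=SeriesEncoder)  # string to write to disk
restored = ToyModel.from_dict(json.loads(text))
```

Keeping the encoder separate from the model classes means every object only has to return plain dictionaries; the file format decision lives in one place.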

No solution when using engine="numba" for Kalman Filter

For some datasets, the optimization does not succeed due to a ZeroDivisionError in SciPy.

Can be reproduced with the following dataset: test.csv

import pandas
import metran

# test.csv is the dataset mentioned above
df = pandas.read_csv("test.csv", index_col=0, parse_dates=True)
mt = metran.Metran(df)
mt.solve()

The issue can be resolved by using engine="numpy" in the SPKalmanFilter. I discussed this issue with @dbrakenhoff and we suspect that engine="numpy" is more robust, since it automatically fills in inf or nan for logarithms and fractions, while engine="numba" does not.

This could be resolved by allowing the user to specify the SPKalmanFilter engine, which is currently only possible by changing the source code.
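The difference described above can be illustrated without metran. NumPy float arithmetic follows IEEE 754 and returns inf or nan (with a RuntimeWarning unless suppressed), while plain Python floats raise ZeroDivisionError. Numba-compiled functions follow the Python behaviour by default (`error_model="python"`); passing `error_model="numpy"` to the decorator switches to the NumPy behaviour, which may be worth testing. That this is the exact failure inside SPKalmanFilter is an assumption; the helper names below are hypothetical.

```python
import numpy as np


def divide_numpy(a: float, b: float) -> float:
    # IEEE 754 semantics: x / 0.0 -> inf, 0.0 / 0.0 -> nan; warnings suppressed here
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.float64(a) / np.float64(b))


def log_numpy(x: float) -> float:
    # np.log(0.0) -> -inf instead of raising
    with np.errstate(divide="ignore"):
        return float(np.log(np.float64(x)))


def divide_python(a: float, b: float) -> float:
    # Plain Python floats raise ZeroDivisionError for b == 0.0
    return float(a) / float(b)


print(divide_numpy(1.0, 0.0))  # inf
print(divide_numpy(0.0, 0.0))  # nan
print(log_numpy(0.0))          # -inf

try:
    divide_python(1.0, 0.0)
    python_raised = False
except ZeroDivisionError:
    python_raised = True
print("Python float division raised ZeroDivisionError:", python_raised)  # True
```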

Bug in kalmanfilter - decompose

cdf_means = [[]] * ncdf

An error occurs here for ncdf>1, as the lists produced by the loop are ncdf times too long. Consider the following example:

a = [[]] * 2
a[0].append([0])
print(a)

I think the code in metran expects [[[0]], []] to be printed, but instead [[[0]], [[0]]] is printed.

In that case, decompose_simulation retrieves cdf_means that are too long. It is also quite possible that I misunderstand the code.
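The aliasing can be checked in isolation: `[[]] * n` repeats a reference to one list object, whereas a list comprehension creates n independent lists. The sketch assumes the usual fix, `[[] for _ in range(ncdf)]`, would apply to `cdf_means`; it has not been checked against the decompose code itself.

```python
ncdf = 2

# [[]] * ncdf repeats a reference to the SAME inner list ncdf times
shared = [[]] * ncdf
shared[0].append([0])
print(shared)                  # [[[0]], [[0]]]
print(shared[0] is shared[1])  # True

# A list comprehension creates ncdf independent inner lists
independent = [[] for _ in range(ncdf)]
independent[0].append([0])
print(independent)             # [[[0]], []]
```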

Allow user to specify number of common dynamic components

It would be a nice feature to be able to override the automatic method to determine the number of common dynamic components.

For example:

import metran as mt

ml = mt.Metran(oseries, nfactors=2)
ml.solve()

Currently, the FactorAnalysis class also has a maxfactors argument that can presumably be used to limit the number of factors. However, this argument is not exposed through the Metran model class, so perhaps we should expose it there as well?

Additionally it would be nice to test the current implementation for estimating the number of factors on a dataset that results in 2 (or more) common components.

So in short:

  • Allow manual setting for number of factors
  • Expose the maxfactors keyword argument of FactorAnalysis in the Metran class (if this makes sense)
  • Test metran with dataset that results in 2+ common dynamic components
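One way the requested override could be wired up, sketched with hypothetical names (`choose_nfactors` and its arguments are not part of metran): a user-supplied nfactors takes precedence, otherwise the automatic estimate is used, optionally capped by maxfactors. Whether an explicit nfactors should also be checked against maxfactors is a design choice left open here.

```python
from typing import Callable, Optional


def choose_nfactors(
    estimate: Callable[[], int],
    nfactors: Optional[int] = None,
    maxfactors: Optional[int] = None,
) -> int:
    """Hypothetical helper for selecting the number of common dynamic components."""
    if nfactors is not None:
        if nfactors < 0:
            raise ValueError("nfactors must be non-negative")
        return nfactors  # manual override skips the automatic estimate
    n = estimate()  # e.g. an estimate from the MAP test
    if maxfactors is not None:
        n = min(n, maxfactors)
    return n


def auto_estimate() -> int:
    return 3  # stand-in for the automatic method


print(choose_nfactors(auto_estimate))                # 3
print(choose_nfactors(auto_estimate, nfactors=2))    # 2
print(choose_nfactors(auto_estimate, maxfactors=1))  # 1
```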

Version Specifier Deprecation

DEPRECATION: metran 0.2.0 has a non-standard dependency specifier numpy>=1.16.5matplotlib>=3.0. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of metran or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at pypa/pip#12063
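The offending string looks like two requirements glued together. One plausible way this happens (an assumption, since the metran packaging files are not shown here) is a missing comma in a Python list of requirements, because adjacent string literals are implicitly concatenated:

```python
# Missing comma: adjacent string literals are concatenated into one requirement
broken = [
    "numpy>=1.16.5"
    "matplotlib>=3.0",
]

# With the comma, each requirement is a separate, standard-conforming specifier
fixed = [
    "numpy>=1.16.5",
    "matplotlib>=3.0",
]

print(broken)  # ['numpy>=1.16.5matplotlib>=3.0']
print(fixed)   # ['numpy>=1.16.5', 'matplotlib>=3.0']
```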

Velicer's MAP Test results in 0 factors for Dynamic Factor Model notebook

David and I get different results depending on our machines: they get 0 factors with Velicer's MAP test, while I get 1 (as originally intended). Velicer's MAP test code:

import numpy as np


def _maptest(cov, eigvec, eigval):
    """Internal method to run Velicer's MAP test.

    Determines the number of factors to be used. This method includes
    two variations of the MAP test: the original and the revised MAP test.

    Parameters
    ----------
    cov : numpy.ndarray
        Covariance matrix.
    eigvec : numpy.ndarray
        Matrix with columns eigenvectors associated with eigenvalues.
    eigval : numpy.ndarray
        Vector with eigenvalues in descending order.

    Returns
    -------
    nfacts : integer
        Number of factors according to the MAP test.
    nfacts4 : integer
        Number of factors according to the revised MAP test.

    References
    ----------
    The original MAP test:
    Velicer, W. F. (1976). Determining the number of components
    from the matrix of partial correlations. Psychometrika, 41, 321-327.

    The revised (2000) MAP test, i.e., with the partial correlations
    raised to the 4th power (rather than squared):
    Velicer, W. F., Eaton, C. A., and Fava, J. L. (2000). Construct
    explication through factor or component analysis: A review and
    evaluation of alternative procedures for determining the number
    of factors or components. Pp. 41-71 in R. D. Goffin and
    E. Helmes, eds., Problems and solutions in human assessment.
    Boston: Kluwer.
    """
    nvars = len(eigval)
    fm = np.array([np.arange(nvars, dtype=float), np.arange(nvars, dtype=float)]).T
    np.put(
        fm,
        [0, 1],
        ((np.sum(np.sum(np.square(cov))) - nvars) / (nvars * (nvars - 1))),
    )
    fm4 = np.copy(fm)
    np.put(
        fm4,
        [0, 1],
        (
            (np.sum(np.sum(np.square(np.square(cov)))) - nvars)
            / (nvars * (nvars - 1))
        ),
    )
    for m in range(nvars - 1):
        biga = np.atleast_2d(eigvec[:, : m + 1])
        partcov = cov - np.dot(biga, biga.T)
        # exit function with nfacts=1 if diag partcov contains negatives
        if np.amin(np.diag(partcov)) < 0:
            return 1, 1
        d = np.diag((1 / np.sqrt(np.diag(partcov))))
        pr = np.dot(d, np.dot(partcov, d))
        np.put(
            fm,
            [m + 1, 1],
            ((np.sum(np.sum(np.square(pr))) - nvars) / (nvars * (nvars - 1))),
        )
        np.put(
            fm4,
            [m + 1, 1],
            (
                (np.sum(np.sum(np.square(np.square(pr)))) - nvars)
                / (nvars * (nvars - 1))
            ),
        )
    minfm = fm[0, 1]
    nfacts = 0
    minfm4 = fm4[0, 1]
    nfacts4 = 0
    for s in range(nvars):
        fm[s, 0] = s
        fm4[s, 0] = s
        if fm[s, 1] < minfm:
            minfm = fm[s, 1]
            nfacts = s
        if fm4[s, 1] < minfm4:
            minfm4 = fm4[s, 1]
            nfacts4 = s
    return nfacts, nfacts4
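For reference, the quantity the code stores in fm (and fm4) is the average squared (or fourth-power) off-diagonal partial correlation after removing the first m components, which Velicer's MAP criterion minimises. Here p = nvars, and r*_ij,m is the partial correlation between variables i and j after partialling out the first m components (m = 0 uses cov itself). Subtracting nvars in the code removes the diagonal terms, assuming the matrices have ones on their diagonal:

```latex
f_m = \frac{1}{p(p-1)} \sum_{i \neq j} \left(r^{*}_{ij,m}\right)^{2},
\qquad
f^{(4)}_m = \frac{1}{p(p-1)} \sum_{i \neq j} \left(r^{*}_{ij,m}\right)^{4},
\qquad
\hat{m} = \arg\min_{m \in \{0, \dots, p-1\}} f_m
```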

On my device:

eigvec = array([[ 0.96750358, -0.25285732],  [ 0.96750358,  0.25285732]])
eigvec[0,0] = 0.9675035797467857
eigvec[0,1] = -0.25285731782401605
eigvec[1,0] = 0.9675035797467855
eigvec[1,1] = 0.2528573178240161

Later on this results in:

minfm = 1.000000000000007
minfm4 = 1.0000000000000142

which makes the following condition True for s = 1:

if fm[s, 1] < minfm:

since fm[s, 1] = 1.0.
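One detail in the quoted code that may explain the machine dependence: np.put indexes the flattened array, so `np.put(fm, [m + 1, 1], value)` writes `fm.flat[m + 1]` and `fm.flat[1]` (that is, `fm[0, 1]`), not `fm[m + 1, 1]`. With nvars = 2, `fm[1, 1]` therefore keeps the 1.0 placeholder from np.arange, and the comparison above becomes 1.0 against 1.0 plus a rounding error. The sketch shows the indexing difference; that direct 2-D assignment is what the code intends is an assumption.

```python
import numpy as np

nvars = 2
m = 0
value = 0.5  # stand-in for the partial-correlation average

# Same construction as in _maptest: column 1 starts as np.arange placeholders
fm = np.array([np.arange(nvars, dtype=float), np.arange(nvars, dtype=float)]).T
np.put(fm, [m + 1, 1], value)  # flat indices 1 and 1 -> only fm[0, 1] changes
print(fm.tolist())     # [[0.0, 0.5], [1.0, 1.0]]

fm_2d = np.array([np.arange(nvars, dtype=float), np.arange(nvars, dtype=float)]).T
fm_2d[m + 1, 1] = value  # row m + 1, column 1
print(fm_2d.tolist())  # [[0.0, 0.0], [1.0, 0.5]]
```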

Todo list when Metran goes public

  • build documentation on readthedocs
  • code-style checks codacy
  • upload coverage codacy
  • add dev branch for development
  • add automatic PyPI release workflow
  • make first release

Metran seems to work!

Hi @wlberendrecht and @dbrakenhoff! I just ran a quick test, and I managed to install metran and run the notebook. I had to adjust a few methods at the end of the notebook, but otherwise everything works! 👍🏻

I still need to take a closer look at it, but it's nice that it already seems to work technically.

Regards,
Raoul

problem importing metran

Hi

I am getting this error when trying to import the package:

AttributeError: module 'pastas' has no attribute 'stats'

lag structure and autocorrelation of residuals

Hi,
Good work.

I am wondering whether metran only assumes an AR(1) process.

statsmodels has more flexible DFM configurations (see an example here), but it lacks the ability to determine the optimal lags and number of structures.
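For readers less familiar with the term: an AR(1) process is one where each value depends linearly on the previous value plus white noise, x_t = φ x_(t-1) + ε_t. The sketch simulates such a series with NumPy and recovers φ from the lag-1 autocorrelation. It only illustrates the term and says nothing about how metran itself models the dynamics; `simulate_ar1` and `lag1_autocorr` are hypothetical helper names.

```python
import numpy as np


def simulate_ar1(phi: float, n: int, sigma: float = 1.0, seed: int = 0) -> np.ndarray:
    """Simulate x_t = phi * x_{t-1} + eps_t with Gaussian white noise eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def lag1_autocorr(x: np.ndarray) -> float:
    """Sample lag-1 autocorrelation; for an AR(1) process it estimates phi."""
    xc = x - x.mean()
    return float(np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc))


x = simulate_ar1(phi=0.8, n=5000)
phi_hat = lag1_autocorr(x)
print(round(phi_hat, 2))  # close to 0.8
```

Higher-order models (AR(p)) would add further lagged terms, which is the kind of lag selection the question asks about.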
