
System Dynamics Modeling in Python

License: MIT License


pysd's Introduction

PySD

This project is a library for running System Dynamics (SD) models in Python, with the purpose of improving integration of Big Data and Machine Learning into the SD workflow.

The current version requires Python 3.9 or later.

Resources

See the project documentation for information about:

For standard methods for data analysis with SD models, see the PySD Cookbook, containing (for example):

Why create a new SD simulation engine?

There are a number of great SD programs out there (Vensim, iThink, AnyLogic, Insight Maker, and others). In order not to waste our effort, or fall victim to the Not-Invented-Here fallacy, we should have a very good reason for starting a new project.

That reason is this: There is a whole world of computational tools being developed in the larger data science community. System dynamicists should directly use the tools that other people are building, instead of replicating their functionality in SD specific software. The best way to do this is to bring specific SD functionality to the domain where those other tools are being developed.

This approach allows SD modelers to take advantage of the most recent developments in data science, and focus our efforts on improving the part of the stack that is unique to System Dynamics modeling.

Cloning this repository

If you'd like to work with this repository directly, you'll need to use a recursive git checkout in order to properly load the test suite (sorry..)

The command should be something like:

git clone --recursive https://github.com/SDXorg/pysd.git

Extensions

You can use PySD in R via the PySD2R package, also available on CRAN.

Contributing

PySD is currently a community-maintained project; any contribution is welcome.

Many people have contributed to developing this project - by submitting code, bug reports, and advice. Main historic changes in PySD are described in the About PySD section. The Developer Documentation could help new developers.

The code for this package is available at: https://github.com/SDXorg/pysd

Join our Slack channel in sd-tools-and-methodology-community.


pysd's Issues

Model documentation fails to print

After loading a model ("model = pysd.read_vensim('MVD.mdl')"), a command like "print model.components.doc()" does not work. IPython outputs:

AttributeError: component_class instance has no attribute 'doc'

Active initial

Returns the active equation during simulation, except when needed for determining initial conditions, when the initial equation is returned. Normally this function is used to break a loop of simultaneous initial value equations.

NOTE In the Equation Editor the ACTIVE INITIAL function is automatically entered when you select the variable type Auxiliary and the subtype with Initial.
Restrictions: Must appear first on the right of the = sign and not be followed by anything else. It defines a variable as an Auxiliary variable with a separate initialization condition.

Units:
ACTIVE INITIAL(input units,input units) --> same units

Example

Capacity = Integ(capacity adjust,target capacity) ~~|
target capacity = Capacity * adjust from utilization ~~|

This would simulate properly, except that the initial conditions cannot be computed correctly: the initial conditions of Capacity require a value for target capacity, which in turn requires a value for Capacity. Since Vensim does not support the implied simultaneous computation, you need to use the following equation for target capacity:

target capacity = ACTIVE INITIAL(Capacity * adjust from utilization,
 100) ~~|

This will cause Capacity to be initialized at 100; then the first value of target capacity will be Capacity * adjust from utilization. In general, this will not be 100; the initial expression is used only to compute the initial conditions of State variables.

Lookup Tables from XMILE don't parse

The machinery to parse lookup tables from xmile isn't yet implemented.

The XMILE spec looks like this:

<aux name="lookup_result">
    <gf>
        <xpts>1,200,400,500</xpts>
        <ypts>0.2,0.1,0.01,0</ypts>
    </gf>                
    <eqn>lookup_value</eqn>
</aux>

So right now, we are essentially just passing the lookup value through as equivalent to the result.

Implement lookup tables

Current vensim and xmile importers omit lookup tables.

Implement the import in such a way as to allow the lookup tables to be modified or replaced with other objects/functions.
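One possible shape for this, as a rough sketch rather than the importer's actual design: represent each lookup table as an ordinary callable built from its points, so that it can later be replaced by any other function. The helper name `make_lookup` is hypothetical, linear interpolation via `numpy.interp` is an assumption about the desired behaviour, and the points are taken from the XMILE example in the previous issue.

```python
import numpy as np


def make_lookup(xpts, ypts):
    """Hypothetical helper: build a lookup function from table points."""
    xs = np.asarray(xpts, dtype=float)
    ys = np.asarray(ypts, dtype=float)

    def lookup(x):
        # Linear interpolation; np.interp holds the end values constant
        # outside the range of the table
        return float(np.interp(x, xs, ys))

    return lookup


# Points from the <gf> element shown in the XMILE issue
table = make_lookup([1, 200, 400, 500], [0.2, 0.1, 0.01, 0])
print(table(300))


# Because the table is just a callable, a user can swap in another function
def replacement(x):
    return 0.5
```

Keeping the table behind a plain function call means the rest of the model does not need to know whether it is talking to the original table or to a user-supplied substitute.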

Use model original names as column titles for pandas output

Capture and store the original names of the variables, and use them as the column names of the pandas output.

Consider converting component class to a module

This would make integration with numba easier, and the model definitions cleaner.

return_timestamps parameter in run function can't handle arrays, demands lists

We want to be able to do:

    model.run(return_timestamps=data.index.values)

but we get an error:

    121         ##### Setting timestamp options
--> 122         if return_timestamps:
    123             tseries = return_timestamps
    124         else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

We should change this if statement to check explicitly that the value is not None, rather than relying on its truth value.

In the meantime, the workaround is:

    model.run(return_timestamps=list(data.index.values))
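A minimal sketch of the kind of check that avoids the error, assuming a hypothetical helper `resolve_timestamps` and made-up default values for `tstart`, `tstop`, and `dt` (none of these are taken from the PySD source):

```python
import numpy as np


def resolve_timestamps(return_timestamps=None, tstart=0, tstop=10, dt=1):
    """Hypothetical helper: pick the timestamps a run should report."""
    # `is not None` is safe for lists, numpy arrays, and pandas indexes,
    # whereas `if return_timestamps:` raises ValueError for arrays with
    # more than one element
    if return_timestamps is not None:
        return np.asarray(return_timestamps, dtype=float)
    return np.arange(tstart, tstop + dt, dt, dtype=float)


print(resolve_timestamps(np.array([0, 5, 10])))
print(resolve_timestamps())
```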

Error with integer division

When importing constants from vensim, python treats whole numbers as integers. This makes division yield unexpected results.

    >>> a = 3
    >>> b = 4
    >>> a/b
    0

Ideally, all numbers we deal with should be floats. One of the ways we may choose to deal with this is through the inclusion of cython which adds static types and helps with code performance.
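For reference, in Python 2 the `/` operator floors the result when both operands are integers unless `from __future__ import division` is in effect, while in Python 3 `/` always performs true division. A small sketch of the "make everything a float" option, written for Python 3 and using a hypothetical translator helper `to_float`:

```python
def to_float(token):
    """Hypothetical translator helper: read a Vensim constant as a float."""
    return float(token)


a = to_float("3")
b = to_float("4")
ratio = a / b          # true division on floats
floored = 3 // 4       # floor division reproduces the truncated result

print(ratio, floored)
```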

Create speed profiling tests

Before we optimize the code for speed, it would be helpful to have a benchmark set of models, run cases, etc. to test against. This could take a form similar to the test_pysd.py test suite.

Give support functions access to the simulation time

We have the module 'functions' which supplies things like step or if_then_else. We import these functions into the model with import functions and access them as functions.step(a, b), etc. Some of these functions, such as step, need access to the current time of the simulation. In v0.3, we handled this by making the functions attributes of a class, the class having a pointer to the components class, etc.

In the interests of simplicity, I'd like to move away from the class architecture, and have these be regular functions. This means that in order to have access to the model time, they need to be instantiated within the components namespace.

The simplest way to do this may be to use (instead of import) the execfile command, which bypasses the module administration. This would create all of the functions within the scope of components.

This isn't quite as clean as having the module import, and there is something floating around the parts of my mind that are generally active when walking through bear country at night, but it is simple.

If the function names are vensim keywords, then any vensim model we write won't have a name conflict. But when we expand to have XMILE imports, we can't trust this. It would be nicer still to have the namespace protection, but just to share the time value with the imported namespace.

Another issue is that we want to make sure that these functions don't get caught up in the cache, as they may be called by different parts of the model, and need to provide different return values based upon that parameter. Namespace protection would help with that.
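One way to keep the namespace protection while still sharing the time value is to build the functions as closures over a shared clock object. This is only a sketch of that idea, not what PySD does: `Clock` and `make_functions` are hypothetical names, and `step` is assumed to return its value once the time reaches the step time and 0 before that.

```python
class Clock:
    """Hypothetical holder for the current simulation time."""

    def __init__(self, t=0.0):
        self.t = t


def make_functions(clock):
    """Build support functions that read the time from a shared clock."""

    def step(value, tstep):
        # Assumed behaviour: 0 before tstep, `value` from tstep onwards
        return value if clock.t >= tstep else 0

    def if_then_else(condition, val_if_true, val_if_false):
        return val_if_true if condition else val_if_false

    return {"step": step, "if_then_else": if_then_else}


clock = Clock()
functions = make_functions(clock)

before = functions["step"](5, 10)   # time is 0, so the step is not active
clock.t = 12.0                      # the integrator would advance this
after = functions["step"](5, 10)    # same call, different time
print(before, after)
```

Since the functions read the time on every call, they would have to be left out of any cache that is keyed only on the function name.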

Vensim Lookup Table Parser fails on in-line lookups

Externally defined lookup tables parse fine in vensim:

Lookup Table(
    [(0,0)-(100,1)],(0,0),(5,.01),(20,0.2),(30,0.5),(70,0.9),(90.353,.99),(100,1))
    ~   
    ~       |

Y Value=
    Lookup Table(X Value)
    ~   
    ~       |

But those defined in the body of an equation don't have a correct parsing grammar:

Y Value = WITH LOOKUP (
    X Value,
        ([(0,0)-(100,1)],(0,0),(5,.01),(20,0.2),(30,0.5),(70,0.9),(90.353,.99),(100,1) ))

Implement Test Coverage

Generally good practice to test your software..

Improve control over integration

Scipy's Odeint gives us a bunch of integration parameters to work with, as documented here.

tcrit is a vector of critical points (e.g. singularities) where integration care should be taken. We could allow the user to set this manually, or automatically determine what points to 'take care' with based upon step, pulse functions, etc.

'hmax' is the maximum absolute step size allowed, and could be a useful way to get a quick handle on the critical points. We could set it by the model step size.
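As a rough illustration of the hmax idea, the sketch below integrates a stock that accumulates a short pulse and caps the integrator's step at half the pulse width so that it cannot step over the pulse. The pulse timing and the model itself are made up for the example, and tcrit is deliberately left out.

```python
import numpy as np
from scipy.integrate import odeint

PULSE_START, PULSE_WIDTH = 5.0, 0.1


def pulse(t):
    """Return 1 during the pulse and 0 otherwise."""
    return 1.0 if PULSE_START <= t < PULSE_START + PULSE_WIDTH else 0.0


def dstock_dt(stock, t):
    # The stock simply accumulates the pulse
    return [pulse(t)]


timestamps = np.linspace(0, 10, 11)

# hmax caps the absolute step size, so at least one internal step
# lands inside the pulse window
result = odeint(dstock_dt, [0.0], timestamps, hmax=PULSE_WIDTH / 2)
print(result[-1, 0])
```

In a real model the cap could be derived from the model step size, as suggested above, instead of from a hard-coded pulse width.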

Subscripted Delays with different residence times

In theory, we should be able to build up a structure that allows different elements of a subscripted delay to have different residence times.

Vensim import fails on complex identifiers

Currently we only read identifiers that meet the following pattern:

~"[a-zA-Z]" ~"[a-zA-Z0-9_\$\s]"*

Specifically, an identifier must start with a letter, followed by any number of letters, digits, underscores, dollar signs, or whitespace characters.

Eventually, we should be able to parse everything that vensim can create, or at least the full set of identifiers recognized by XMILE (section 2.2.1).

For now, a workaround is just to make the vensim identifiers simpler.
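To check whether a given name falls inside the currently supported set, the grammar rule above can be mirrored with Python's `re` module. This is a convenience sketch only; `is_supported_identifier` is a hypothetical name and not part of the translator.

```python
import re

# Same character classes as the grammar rule above, written for `re`
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_$\s]*")


def is_supported_identifier(name):
    """True if the whole name matches the currently supported pattern."""
    return IDENTIFIER.fullmatch(name) is not None


for name in ["Teacup Temperature", "cost_$", "2nd stock", "R&D spending"]:
    print(name, is_supported_identifier(name))
```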

Handle Logical Operators in vensim parsing

We need to be able to handle :AND:, :OR:, and :NOT:.

I think that's all of them?
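A simple textual substitution might be enough as a first pass. The sketch below assumes the three operators map directly onto Python's `and`, `or`, and `not`, and that the rest of the expression is already valid Python; operator precedence differences between Vensim and Python are not handled. The function name is hypothetical.

```python
import re

LOGICAL_OPERATORS = {":AND:": " and ", ":OR:": " or ", ":NOT:": " not "}
_OPERATOR_PATTERN = re.compile(r":(?:AND|OR|NOT):")


def translate_logical(expression):
    """Replace Vensim logical operators with Python keywords."""
    replaced = _OPERATOR_PATTERN.sub(
        lambda match: LOGICAL_OPERATORS[match.group(0)], expression
    )
    # Collapse the extra spaces introduced by the substitution
    return " ".join(replaced.split())


print(translate_logical("a > 1 :AND: :NOT: b < 2"))
```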

References to time fall through

Vensim refers to time as an implicit variable called time. PySD currently implements this with the name components.t. We should either change this to be compatible with Vensim, or translate any references to time as references to the appropriate PySD component.

Errors importing multiple models

If we run:

import pysd
model = pysd.read_vensim('Lookups_Test.mdl')
model = pysd.read_vensim('teacup.mdl')

We get an error:


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-492310e6bce1> in <module>()
      1 import pysd
      2 model = pysd.read_vensim('Lookups_Test.mdl')
----> 3 model = pysd.read_vensim('teacup.mdl')

/Library/Python/2.7/site-packages/pysd/pysd.pyc in read_vensim(mdl_file)
     60         """
     61     component_class = _translators.import_vensim(mdl_file)
---> 62     return pysd(component_class)
     63 
     64 def help():

/Library/Python/2.7/site-packages/pysd/pysd.pyc in __init__(self, component_class)
     74 class pysd:
     75     def __init__(self, component_class):
---> 76         self.components = component_class() #this is where we create an instance of the model subclass
     77         self.record = []
     78 

/Library/Python/2.7/site-packages/pysd/translators/component_class_template.pyc in __init__(self)
     32 
     33     def __init__(self):
---> 34         self.reset_state()
     35         self.__doc__ = self.doc()
     36 

/Library/Python/2.7/site-packages/pysd/translators/component_class_template.pyc in reset_state(self)
     63 
     64         for key in self.state.keys():
---> 65             self.state[key] = eval('self.'+key+'_init()') #set the initial state
     66         pass
     67 

/Library/Python/2.7/site-packages/pysd/translators/component_class_template.pyc in <module>()

AttributeError: component_class instance has no attribute 'integrate_lookup_init'

It seems that python is trying to modify the original model object during the import of the next model, instead of replacing it.

Trouble parsing newlines in equations

It seems that this snippet:

stock= INTEG (
    inflow-outflow,
        10
        )

Fails to parse, whereas this:

stock= INTEG (
    inflow-outflow,
        10
        )

succeeds.

Integrate with Conda Package Manager

Anaconda seems to be an easy way for folks to set up the bulk of the python stack - it would be nice if we had a really easy way to integrate with the environment.

Set the output type on 'measure'

We should consider setting the output type on the 'measure' function, to match the 'run' function.

Get model value by string

Would be helpful to have a function that lets you get the current value of a model component by its name:

>>> model.get_component('parameter_1')
{'parameter_1':42}

>>> model.get_component(['parameter_1', 'parameter_2'])
{'parameter_1':42, 'parameter_2':64}

Priority Allocation Function

Would be nice to include vensim's priority allocation function.

Handle Vensim Subscripting

A number of the most interesting models utilize subscripts, and so if PySD is to be useful in their analysis, it should support this feature.

What we describe as 'subscripts' encompasses a number of different features:

Subscript Ranges
Subscript Subranges
Multiple/Multidimensional Subscripts
Subscript Summary Functions
Subscript Mapping
Multiple equations
Exceptions
Numeric Ranges

We should start with only a subset of these features. (An outstanding question is if we want to ever support all of them. A good goal would be to match the functionality of the XMILE schema.)

Subscripts have different behaviors in a number of different contexts:

Constants
Vector Functions
Arrayed Stocks

Import Macros from Vensim, etc

pass timeseries as parameter values

Would be good to be able to pass a pandas timeseries as a parameter value, and have the values from the timeseries used in the integration:

ts = pd.Series(...)
model.run(params={'input':ts})

Return_columns fails with non-list type

We should be able to use either of the following syntax:

model.run(return_columns=['stockname'])
model.run(return_columns='stockname')

Currently only the first works.
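Accepting both forms could be as simple as normalizing the argument before it is used. A sketch, with `normalize_columns` as a hypothetical helper name:

```python
def normalize_columns(return_columns):
    """Hypothetical helper: accept one name or an iterable of names."""
    if return_columns is None:
        return None
    if isinstance(return_columns, str):
        # A bare string would otherwise be iterated character by character
        return [return_columns]
    return list(return_columns)


print(normalize_columns("stockname"))
print(normalize_columns(["stockname"]))
```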

'Measure' function cannot take non-array values

We may want to measure a value at a particular point, i.e.:

model.measure('output', model.tstop)

but the measure function expects an array, and throws an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-68f4c9b47589> in <module>()
----> 1 model.measure('output', model.tstop)

/Library/Python/2.7/site-packages/pysd/pysd.pyc in measure(self, elements, timestamps)
    159 
    160         #this part does a weird interpolation to estimate the stock values at the given times
--> 161         ts = _pd.DataFrame(index=list(set(timestamps)-set(self.stock_values.index)), columns=self.stock_values.columns)
    162         #the set arithmetic means we should only add values that arent in the dataset
    163         # need to work out why we have duplicate timestamps AND EXTERMINATE THEM

TypeError: 'float' object is not iterable

Overall, that interpolation is slow and problematic and we should work out a new way to do it.
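Independently of how the interpolation is redone, the scalar case could be handled by coercing the argument to a one-dimensional array first, so that `set(timestamps)` and similar operations always receive something iterable. A sketch with a hypothetical helper name:

```python
import numpy as np


def as_timestamp_array(timestamps):
    """Hypothetical helper: turn a scalar or a sequence into a 1-D array."""
    return np.atleast_1d(np.asarray(timestamps, dtype=float))


print(as_timestamp_array(10.0))
print(as_timestamp_array([1, 2, 3]))
```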

Integrator seems to fail with discontinuous functions such as 'pulse'

Setting the max step size such that the integrator is guaranteed to hit the pulse helps somewhat, but there are still integration issues. We may need to abandon adaptive step size.

Case sensitivity in matching dictionary objects impedes translation from Vensim

Currently the translator expects vensim keywords to be all caps: SIN, LN, etc. However, vensim allows you to use mixed case: Sin, sin, etc.
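One low-effort fix is to store the keyword table in lower case and lower-case each keyword before looking it up. The table below is an illustrative subset made up for this example, not the translator's actual dictionary.

```python
# Illustrative subset only; the target expressions are assumptions
VENSIM_TO_PYTHON = {"sin": "np.sin", "ln": "np.log", "abs": "abs"}


def translate_keyword(keyword):
    """Look up a Vensim keyword regardless of its case."""
    return VENSIM_TO_PYTHON[keyword.lower()]


for spelling in ["SIN", "Sin", "sin"]:
    print(spelling, "->", translate_keyword(spelling))
```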

Implement CI tests

Right now the test suite is just run locally, which is fine for one or two people doing dev. It would be nice to expand the pool of contributors, and to enable that we should consider using a hosted CI testing framework.

TravisCI seems to offer free plans for open source projects, which is what we are, so that might be the way to go.

Deal with multi-line equations in vensim translator

When Vensim has a particularly long equation, it breaks up the equation onto multiple lines with the syntax: (you may need to scroll right)

Product=
    Particularly effusive auxiliary descriptor * Semi Infinite Component Moniker * Terrifically elaborate element handle\
         * Very Long Variable Name
    ~   
    ~       |

for the moment it appears that variable names remain intact.

Our importer doesn't know how to handle the backslash-newline convention.
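A possible preprocessing step, sketched under the assumption that a backslash at the end of a line always marks a continuation and never appears there for any other reason: remove the backslash, the newline, and the indentation of the following line before the equation reaches the parser. The function name is hypothetical.

```python
import re


def join_continuations(text):
    """Merge lines that Vensim split with a trailing backslash."""
    # A backslash, optional trailing blanks, a newline, then the
    # indentation of the continued line
    return re.sub(r"\\[ \t]*\r?\n[ \t]*", "", text)


equation = "Product=\n    A * B\\\n         * C\n"
print(join_continuations(equation))
```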

Allow DelayN Function to take variable order input

Currently, delay orders must be supplied as constants, because the delay structure gets built at translate time. It would be nice to be able to use a variable to set the delay order. This would require building delay structure at initialization time... That might be nice, because it would simplify the model file...

But, that's a bit of a different paradigm!

Apply the cache decorator to function replacements.

Assume we have a model with a constant value, and we replace it with a function, say rand(). The original function would have had the caching decorator applied, but the replacement would not. It would be good to be able to import the decorator from the library and apply it.
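To make the idea concrete, here is a sketch of a per-timestep cache decorator that a user could apply to a replacement function. It is not the library's decorator: `cache_per_step` is a hypothetical name, and the simulation time is modelled as a plain dictionary entry for the sake of a self-contained example.

```python
import functools
import random


def cache_per_step(clock):
    """Hypothetical decorator: recompute only when the time changes."""

    def decorator(func):
        state = {"t": None, "value": None}

        @functools.wraps(func)
        def wrapper():
            if state["t"] != clock["t"]:
                state["value"] = func()
                state["t"] = clock["t"]
            return state["value"]

        return wrapper

    return decorator


clock = {"t": 0}
calls = []


@cache_per_step(clock)
def replacement():
    calls.append(clock["t"])     # record each real evaluation
    return random.random()


first = replacement()
second = replacement()           # same timestep: served from the cache
clock["t"] = 1
third = replacement()            # new timestep: evaluated again
```

Without the decorator, two components that both read the replaced value in the same timestep would see different random numbers.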

Add error checking

On initial_condition setting:

Currently, it's possible to 'set' an initial condition that doesn't exist. For instance, if the state is:

(0, {'stock1':12, 'stock2':15})

we should not be able to do:

model.set_initial_condition((0,{'stork1':15, 'stock_2':19}))

because stork1 and stock_2 don't correspond to existing states.

On return_timestamps:

We should check that the list is monotonically increasing.
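Both checks are short to write. The sketch below uses hypothetical helper names and makes one assumption beyond the text: repeated timestamps are rejected as well, i.e. the list must be strictly increasing.

```python
def check_initial_condition(new_state, existing_state):
    """Raise if new_state names a stock that the model does not have."""
    unknown = sorted(set(new_state) - set(existing_state))
    if unknown:
        raise KeyError("Unknown state names: " + ", ".join(unknown))


def check_timestamps(timestamps):
    """Raise unless the timestamps are strictly increasing (assumption)."""
    values = list(timestamps)
    for earlier, later in zip(values, values[1:]):
        if later <= earlier:
            raise ValueError("return_timestamps must be increasing")


state = {"stock1": 12, "stock2": 15}
check_initial_condition({"stock1": 15}, state)   # passes silently
check_timestamps([0, 1, 2.5])                    # passes silently

try:
    check_initial_condition({"stork1": 15, "stock_2": 19}, state)
except KeyError as error:
    message = str(error)
    print(message)
```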

Efficient model evaluation and memoization

The current implementation recalculates a lot of values that were already calculated in the integration step in order to extract those values (usually non-stock values, i.e. aux or flow) later on. We should work out how to avoid repeat calculation, while minimizing the overhead of memoization.

Handle datetime objects as timeseries indices

It would be good to be able to properly integrate over a timeseries.

Modify model without running

It would be neat to be able to modify the model without having to run it. For instance, we might set up a bunch of models, each with different parameters, and run them all at the same time later in the program.

Unclear what the set function should return. It could return a pointer to the model itself, so that you could do a chained import/modify operation in one line, like:

model = pysd.import_vensim('model.mdl').set({'delay':5})

The danger is that this could lead to having multiple pointers to the same model, and get confusing.

Need an efficient 'step' function

If we're running in realtime, or we want a model to interact with another model (i.e., in a patch type model where we import the model one patch at a time), we need efficient ways to bring the integration out of a run-at-a-time function call.

One way to do this would be to call a 'step' method that would run the model forward one step and then return control to the user.

We could also consider ways of tying multiple models together within the same integrator. We could consider putting 'run' at the package level, instead of at the model level, so that you could pass multiple model objects together into the integrator.
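To illustrate what a 'step' method could look like from the user's side, here is a toy one-stock model advanced with explicit Euler steps. The class, its attributes, and the choice of Euler integration are all assumptions made for the sketch and say nothing about how PySD integrates.

```python
class SteppableModel:
    """Hypothetical one-stock model that advances one step per call."""

    def __init__(self, stock=0.0, inflow=1.0, dt=0.5):
        self.t = 0.0
        self.stock = stock
        self.inflow = inflow
        self.dt = dt

    def step(self):
        # Explicit Euler update, then hand control back to the caller
        self.stock += self.inflow * self.dt
        self.t += self.dt
        return self.t, self.stock


model = SteppableModel(stock=10.0, inflow=2.0, dt=0.5)
first = model.step()
model.inflow = 0.0        # the caller can intervene between steps
second = model.step()
print(first, second)
```

Because control returns to the caller after every step, another model (or a realtime data feed) can change parameters between calls.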

Implement Delay and Smooth functions

A delay function must essentially create a number of hidden 'stocks' in the background of the model (one for each order of the delay). This requires some intelligent construction in the translation functions, and a clever ability to handle these delays.

Additionally, some types of analysis won't be quite as easy when delay stocks are hidden (mostly things that modify the model step-at-a-time), so we'll need to think about that some as well.

Add capability for intelligently handling units

Import units, and deal with them in ways that allow datetime components to work with datetime-capable integration.

Set output on run function

Currently, the run function will return a dataframe with stock values at each timestamp as calculated by tstart, tstop, and dt. It would be good to be able to specify what columns we want to look at - stocks, flows, etc; and what timestamps we want to sample them over.

Port translation to ANTLR4

There seems to be significant energy around building translation tools and grammars using ANTLR4 - switching to this platform might be a nice way to tap into some of that energy and improve support for the translation tool.

Implement Python Code Accelerations

It would be good to use some of python's capabilities for improving runtime speeds, such as cython or theano.

Initial conditions that are set as part of a 'run' command don't get propagated properly

For example, if I want to run the model with a different initial stock value, I should be able to do this:

model.run(params={'delay_buffer_init':50, 'input':10})

but the delay_buffer_init value gets lost.

We can work around this by setting the values of the object directly, but as the initial_values element is an array (for passing into the integrator) not a dict, this is messy:

model.initial_values = [50]

Can't handle models with no stocks

A model without any stocks is just a bunch of equations to solve, and doesn't actually need to run - calling the variables will return the correct answer for all of time.

That being said, if we do run a model without any stocks, we should still be able to have the run function handle the situation gracefully, either by throwing an error, or returning something else.

Takes forever to import

We're using parts of some pretty big libraries, and we import most of them. We should be more selective about what we import, partly to speed up the import process for the pysd user.

Support for PySD in Python 3.x

Sometime in the near future we should figure out how to support both python 2 and python 3.

There seem to be two major pathways:

Maintain the codebase in python 2 and use tools like 2to3 or modernize or futurize in order to generate python 3 code from the python 2 base.
Maintain the codebase in python 3, and use tools like python-future to allow imports into python 2 environments.

For now I'm leaning towards the first option, as that's where the codebase is currently. If it turns out that a lot of the libraries we want to interface with for model analysis require 3, then we can change strategy.

Dev pathway for this probably looks like creating a fork of the current, non-subscripted master branch to work in. It'll be simpler than trying with the subscripted version, at least until that is cleaned up. Then we can offer a python 3 version of the current codebase on pypi, and repeat the process when we're ready with the subscripted branch.

I expect we'll need to generate some sort of script to run the conversion tools and place new files in their own (python3) directory. Then we'll need to include the conversion and python 3 execution in the unit tests, so that when we make changes in the python2 codebase, we know they don't break the python3 implementation.

Sequence run

If you're fitting a model to data, sometimes it is helpful to initialize the model at a datapoint, run the model forward one timestep, compare with the next datapoint, reinitialize based upon the data, and run forward again. This type of operation should be easily parallelizable. One way to do this would be to pass into the run function an array of initial conditions, and have pysd cleverly allocate how they should be calculated.

Import fails on user-entered vensim equation line breaks

In the vensim equations box, you can type an equation to span multiple lines, for instance:

Model component 1 *
Model component 2

You can also write a very long line that vensim will break into shorter lines with a line termination character. For example,

Model component with an extremely long name * Another model component with another extremely long name

Becomes:

Model component with an extremely long name * \
Another model component with another extremely long name

We're good at parsing the second case, but unfortunately vensim doesn't add the line termination characters in the first case.

This could be tricky for the parser because we currently use lines for identifying where equations start and stop.