elfi-dev / elfi Goto Github PK
View Code? Open in Web Editor NEWELFI - Engine for Likelihood-Free Inference
Home Page: http://elfi.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
ELFI - Engine for Likelihood-Free Inference
Home Page: http://elfi.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
Running a deterministic Lotka-Volterra example parameterised using ABC rejection. This follows the example outlined in Toni et al, (2009) where two parameters are to be found in predator-prey system. It is required to choose parameters that minimize the sum of squared errors between the observed and simulated time series. Priors for the parameters are taken from the paper, observed data are estimated from the paper above.
Rather than outputting a multi-dimensional array from the LV model, I've concatenated results from the model into a 1D array.
Not quite sure which of these functions I need to change. Perhaps the LV()
function needs to allow processing of vector inputs? Any help would be much appreciated!
The Lotka-Volterra system of ODEs is created in Python, priors are defined, a Simulator object is defined (and tested), distance measure defined (and tested), ABC rejection sampler defined, but sampling from the rejection sampler throws an error.
The following code reproduces the error.
import numpy as np
import elfi
from scipy.integrate import odeint
seed = 2017
np.random.seed(seed)
# Define the observed data and flatten the array
y_obs_lv = np.array([[1.9, 0.5],[0.63, 2.6],[0.2, 1.55],[0.3, 0.03],
[1.65, 1.15],[1.15, 1.68],[0.25, 1.1],[2.94, 0.94]])
y_obs_lv = y_obs_lv.flatten()
# Define the Lotka-Volterra derivative function
def LVderivative(xy, t, a, b):
x, y = xy
dx = a*x - x*y
dy = b*x*y - y
return [dx, dy]
# Define the function as input to the elfi.Simulator object
def LV(a, b, batch_size = 1, random_state = None):
# Initial conditions state
y0 = [1.0, 0.5]
# Define times over which to solve the system of ODEs
times = np.linspace(0, 15, 151)
# Actually solve the ODE
result = odeint(LVderivative, y0, times, args = (*a, *b))
# Define the subset of times of interest
t_obs = np.array([1.2, 2.4, 3.9, 5.7, 7.5, 9.6, 11.9, 14.5])*10
t_obs = t_obs.astype(int)
# Subset the solution and flatten the returned output
return result[t_obs,:].flatten()
# Define the priors for alpha and beta
a = elfi.Prior('uniform', -10, 20, name = 'a')
b = elfi.Prior('uniform', -10, 20, name = 'b')
# Define the elfi.Simulator object
Y_lv = elfi.Simulator(LV, a, b, observed = y_obs_lv)
# Check the Simulator object can generate data
Y_lv.generate()
> array([ 5.73756329e-02, 3.45129771e-03, 3.49858603e-03,
8.29044191e-04, 1.06221009e-04, 1.82366487e-04,
1.60337990e-06, 3.01316351e-05, 2.38143636e-08,
4.98299215e-06, -1.46843950e-09, 6.08976108e-07,
-1.32007952e-09, 5.80888435e-08, 5.60478623e-11,
3.97446214e-09])
# Define sum of squared errors as the distance function
def SSE(x, y):
return np.sum((x-y)**2)
d_lv = elfi.Distance(SSE, Y_lv)
# Check the Distance object can generate SSE correctly
d_lv.generate()
> 516.5330268484762
# Run ABC-rejection
rej = elfi.Rejection(d_lv, batch_size = 1, seed = seed)
result = rej.sample(10000, threshold = 4.3)
Current output gives the following error
> TypeError: object of type 'numpy.float64' has no len()
0.6.0
3.6.1
OSX 10.12.5
Hi all, I'm just listening to a talk by Jukka and learning about ELFI for the first time, sounds like an amazing project!
I just thought I would mention a potential connection with another project called Dask, which is providing a parallel computing framework for Python that can be run on a wide variety of parallel & distributed computing platforms, including local multicore, public cloud and traditional HPC environments. I saw ELFI supports multiprocessing and ipyparallel, so Dask could potentially be used instead of these and make it easier to then deploy and run on a broader range of computing backends.
No need to respond, feel free to close the issue straight away, just thought I'd make the connection.
Is it possible to define the mean function for the Gaussian process.
A quadratic one for example?
If so, can you provide the syntax for that?
Thank you.
The tox virtual environments need to be setup so that tests can be run against different Python interpreters. Currently the problem seems to be that GPy can't be installed directly from pip without first installing numpy.
For example,
in ELFI tutorial...
We should specify observed data like below:
Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)
for calculating a discrepancy like:
d = elfi.Distance('euclidean', S1, S2)
What I wonder is,
without explicitly passing 'observed' argument to the Simulator object,
can we make it possible to do ABC-rejection by modifying discrepancy function?
e.g. instead of passing observed data to Simulator,
calculate all discrepancies between simulated samples and observed samples in a custom Distance function.
Elfi nodes should be able to use previously stored data. Being able to safely produce additional data to the stored data requires that we know at least the master seed and sub stream index, so that we don't repeat already computed values that are in the store.
Add to ElfiStore
an optinal mask for stored values. When using numpy arrays, use the numpy.ma
All the vectorized wrappers for different nodes could be combined to one vectorized function. Currently they share a lot of code.
How to do it:
For positional arguments take the i:th element from each. If it is a tuple (as in distance node, take the i:th element from each array in the tuple). n_sim
would be always one. If prng
(stochastic mixin) is given, pass it as a keyword argument to the operation function.
Please provide a short couple sentence summary.
I followed the tutorial, then
I got an elfi.Sample which fails to get summary.
While these is also error about parameters
Describe the issue as clearly as possible.
TypeError: unsupported format string passed to numpy.ndarray.format
The prior distribution of parameter beta is [0.05,0.2], while in most results, result.sample_means['beta'] is 'inf'
Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.
If you have a reproducible example, please include it.
import time
import numpy as np
import scipy.stats
import matplotlib
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)
import elfi
MSE1 = []
MSE2 = []
seed=3333
tao = 0.198
def M(alpha, beta, batch_size=1, random_state=None):
y = [1]
alpha = np.float(alpha)
beta = np.float(beta)
theta = [alpha, alpha*beta, tao]
theta = theta/np.sum(theta)
i = 1
while i < 20:
if y == []:
break
else:
b = np.random.multinomial(1, theta, size=1)
i = i + 1
if b[0][2] == 1:
y = y + [1]
elif b[0][0] == 1:
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f+g for f,g in zip(y,a)]
else:
i = i - 2
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f-g for f,g in zip(y,a)]
while 0 in y:
y.remove(0)
yalpha = sorted(y)
return yalpha
for seed in range(1000):
elfi.new_model()
np.random.seed(seed+100500)
alpha = elfi.Prior(scipy.stats.uniform, 0.05, 2, size=1)
beta = elfi.Prior(scipy.stats.uniform, 0.05, 0.2, size=1)
alpha0 = np.random.uniform(0.05,2,size=1).tolist()[0]
beta0 = np.random.uniform(0.05,0.2,size=1).tolist()[0]
y0 = M(alpha0, beta0)
Y = elfi.Simulator(M, alpha, beta, observed=y0)
def S1(x):
t1 = len(x)/20
return t1
def S(x):
y = 0
for u in x:
y = 100*y + u
return y
Su = elfi.Summary(S, Y)
d = elfi.Distance('euclidean', Su)
rej = elfi.Rejection(d, batch_size=1, seed=seed)
result = rej.sample(n_samples=10, threshold=0)
MSE1 = MSE1 + [np.square(result.sample_means['alpha'] - alpha0)]
MSE2 = MSE2 + [np.square(result.sample_means['beta'] - beta0)]
end = time.time()
end-start
np.mean(MSE1)
np.mean(MSE2)
The current output. Knowing what is the current behavior is useful.
output of np.mean(MSE2) is inf
and even without loop, result.summary gets error:
TypeError: unsupported format string passed to numpy.ndarray.format
Describe what you expect the output to be. Knowing the correct behavior is also very useful.
Implement a nice syntax for combining acquisition functions. For example:
a1 = AcquisitionA(...)
a2 = AcquisitionB(...)
acquisition = a1 + a2
Implement initial semi-parametric Bayesian Synthetic Likelihood as described by Ziwen An in https://link.springer.com/article/10.1007/s11222-019-09904-x. This will be implemented as an alternative method to estimate the synthetic likelihood in the BSL class.
Initialize the GP in BOLFI with existing simulations.
Current GP/BOLFI starts the acquisition procedure without using any existing information, which may waste resources.
When the DAG structure is changed, other nodes not part of the change need sometimes to be notified about it, so that they can e.g. reset their data. Currently this is achieved through Node.reset
. However the implementation is non optimal at the moment and should be refactored.
Hi,
This issues arises from a non-standard implementation of the rejection sampler. The summary statistics I'm working with are a set of distribution quantiles; instead of simulating random samples and computing the quantiles of these samples, I've found it's much faster to directly calculate the theoretical quantiles from a given parameter set in the "simulation" step, then define dummy summary nodes which simply pick out the elements of the simulation output. However, when I apply the regression adjustment via elfi.adjust_posterior()
, the posterior distribution of each parameter becomes a point estimate (i.e. the outputs
arrays contain the same value repeated n_samples
times).
In the example below, I attempt to model the quantiles of a chi-squared distribution with 5 degrees of freedom using the normal distribution:
import numpy as np
import scipy.stats
import elfi
seed = 20200227
# Observed data
df = 5
qtiles = np.arange(0.1, 1, 0.1)
quantiles = scipy.stats.chi2.ppf(qtiles, df)
# Simulation function
def theoretical_quantiles(q_vec, loc, scale, batch_size=1, random_state=None):
loc, scale = np.atleast_1d(loc, scale)
return scipy.stats.norm.ppf(q_vec, loc[:, None], scale[:, None])
# Summary function
def match_quantile(q, idx):
return np.atleast_2d(q)[:, idx]
# Model object
mod = elfi.new_model()
# Priors
elfi.Prior(scipy.stats.norm, quantiles.mean(), quantiles.std(), model=mod, name='loc')
elfi.Prior(scipy.stats.uniform, 0, 10, model=mod, name='scale')
# Simulation node
s = elfi.Simulator(theoretical_quantiles, qtiles, mod['loc'], mod['scale'],
observed=quantiles, name='norm')
# Dummy summary statistic nodes
summary_stats = list()
for idx in range(len(qtiles)):
summary_stats.append(elfi.Summary(match_quantile, mod['norm'], idx, name=f"S{idx}"))
# Distance metric
d = elfi.Distance('euclidean', *summary_stats, name='d')
# Rejection sampling
rejection_sampler = elfi.Rejection(mod, 'd',
output_names=[ss.name for ss in summary_stats],
batch_size=1000, seed=seed)
res = rejection_sampler.sample(500, quantile=0.05)
adj = elfi.adjust_posterior(model=mod, sample=res, summary_names=[ss.name for ss in summary_stats])
res.plot_pairs()
adj.plot_pairs()
adj.outputs
In adj.outputs
you should see an array of length n_samples
for each distribution parameter (i.e. loc
and scale
), each containing only one unique value.
I'm not sure whether this is actually an issue with the elfi.adjust_posterior()
method or not. Interestingly, when my model is setup using a simulation function which actually generates a random sample and summary nodes which actually compute empirical quantiles of the sample, I do not see this behavior.
What sets ELFI apart from other commonly used ABC software? We should decide what software we want to compare against and what features should be compared. Here are some ideas:
Parallelized sampling does not work when .py file is executed with IPython or Spyder.
Running parallelized sampling takes forever or produces error if file is executed with Spyder or IPython, but everything works when same lines are copy pasted in IPython.
How to reproduce: Open Spyder, copy following code and save it:
import elfi
import numpy as np
from elfi.examples import ma2
seed = 20170530
np.random.seed(seed)
model = ma2.get_model()
elfi.set_client('multiprocessing')
rej = elfi.Rejection(model, 'd', batch_size=10000, seed=20170530)
print ("Starting sampling...")
result = rej.sample(5000, n_sim=int(1e6))
print(result.summary)
# %%
Please note that at the end of code there should be cell separator (# %%
), it does not affect the bug but makes following step with Spyder easier.
Then select first cell and run it (ctrl+enter). Everything should work and after a couple of second result should be printed out.
Then restart the kernel (ctrl+.), just to make sure following steps is executed in 'clean' environment.
Press F5 to execute code. Now it will take very long (or maybe forever) to complete. No error is shown however.
Another way to reproduce this error is just run the code from console with IPython, then the error in "Current Output section" is displayed. When code is just copy pasted inside IPython it works just as well as executing cell in Spyder.
The code keeps printing following error message in loop, when executed with IPython:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\ossig\Google Drive\Koulujutut\ABC\Elfibugi2.py", line 8, in <module>
elfi.set_client('multiprocessing')
File "C:\ProgramData\Anaconda3\lib\site-packages\elfi\client.py", line 37, in set_client
client = m.Client()
File "C:\ProgramData\Anaconda3\lib\site-packages\elfi\clients\multiprocessing.py", line 30, in __init__
self.pool = multiprocessing.Pool(processes=num_processes)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 174, in __init__
self._repopulate_pool()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Starting sampling...
bound method Sample.summary of Method: Rejection
Number of samples: 5000
Number of simulations: 1000000
Threshold: 0.0952
Sample means: t1: 0.87, t2: 0.231
0.7.1
3.6.5
Windows 10, version 1709, 64 bit
CPU: Intel i7-5600U
I am trying to use an ABC method in order to estimate the parameters of a system of ordinary differential equations.
I'd like to use ELFI and I'm using code gathered from the tutorials. The problem is the following: one of the variables of the model is computed from a conservation equation. That is, an ODE is related to a quantity which is computed as the difference between the total amount of that quantity (which has to be estimated) and the free amount of that quantity (which is passed to the function as an initial condition). The problem is that in doing so, I am taking the difference between a prior distribution and a float:
TypeError: In executing node '_simulator_feee': unsupported operand type(s) for -: 'Prior' and 'float'.
import numpy as np
from scipy.integrate import odeint
from matplotlib import pyplot as plt
from pandas import read_csv
mcn_observed = np.asarray(read_csv("../data/mcn_sol.csv", sep = ",", header = None))
mcn_observed = mcn_observed.flatten()
# This is the system of ODEs. Note the conservation equations
def MCN(states, t, CCP, params):
# Parameters to be fitted
k_SMPF,k_ASS,k_DISS,k_DRUM1,k_IRUM1,k_I2RUM1,k_1SLP1,V_2SLP1,J_1SLP1,J_2SLP1, \
k_1IE,V_2IE,J_1IE,J_2IE,k_DMPFRUM1,V_SRUM1,k_ARUM1,k_DRUM1P,V_WEE1,k_WEE1, \
J_1WEE1,J_2WEE1,WEE1T,k_CDC25,V_CDC25,J_1CDC25,J_2CDC25,mu,k_1WEE1,k_2WEE1, \
k_1CDC25,k_2CDC25,k_D1CYC,k_D2CYC,CDC25T,IET,SLP1T,alpha,k_DX = params
k_ASS = 100 # 1
k_I2RUM1 = 50 # 5
k_ARUM1 = 35 # 16
k_DRUM1P = 250 # 17
# Variables of the system, coming from the initial conditions
MPF, MPFP, SLP1A, IEA, MPFRUM1, RUM1, RUM1P, WEE1, CDC25P, MASS = states
# Conservation equations
SLP1 = SLP1T - SLP1A
IE = IET - IEA
WEE1P = WEE1T - WEE1
CDC25 = CDC25T - CDC25P
dMPFdt = k_SMPF*MASS - k_WEE1*MPF + k_CDC25*MPFP - \
(k_D1CYC+k_D2CYC*SLP1A)*MPF - k_ASS*RUM1*MPF + \
+ (k_DISS+k_DRUM1+k_IRUM1)*MPFRUM1 + k_DX*CCP*MPFRUM1
dMPFPdt = k_WEE1*MPF - k_CDC25*MPFP - (k_D1CYC+k_D2CYC*SLP1A)*MPFP
dSLP1Adt = k_1SLP1*IEA*SLP1/(J_1SLP1+SLP1) - \
V_2SLP1*SLP1A/(J_2SLP1+SLP1A)
dIEAdt = k_1IE*(MPF+alpha*MPFP)*IE/(J_1IE+IE) - V_2IE*IEA/(J_2IE+IEA)
dMPFRUM1dt = k_ASS*RUM1*MPF - \
(k_DISS+k_DRUM1+k_IRUM1+k_DMPFRUM1)*MPFRUM1 - \
k_DX*CCP*MPFRUM1
dRUM1dt = V_SRUM1 - k_ASS*RUM1*MPF + (k_DISS+k_DMPFRUM1)*MPFRUM1 - \
k_I2RUM1*MPFP*RUM1 + k_ARUM1*RUM1P - k_DRUM1*RUM1 - \
k_DX*CCP*RUM1 + k_DX*CCP*MPFRUM1 - k_DX*CCP*RUM1
dRUM1Pdt = k_IRUM1*MPFRUM1 + k_I2RUM1*MPFP*RUM1 - \
k_ARUM1*RUM1P - k_DRUM1P*(MPF+alpha*MPFP)*RUM1P - \
k_DRUM1*RUM1P + k_DX*CCP*RUM1 + k_DX*CCP*MPFRUM1 - \
k_DX*CCP*RUM1P
dWEE1dt = V_WEE1*WEE1P/(J_1WEE1+WEE1P) - \
k_WEE1*(MPF+alpha*MPFP)*WEE1/(J_2WEE1+WEE1)
dCDC25Pdt = k_CDC25*(MPF+alpha*MPFP)*CDC25/(J_1CDC25+CDC25) - \
V_CDC25*CDC25P/(J_2CDC25+CDC25P)
dMASSdt = mu*MASS
system = [dMPFdt, dMPFPdt, dSLP1Adt, dIEAdt, dMPFRUM1dt, \
dRUM1dt, dRUM1Pdt, dWEE1dt, dCDC25Pdt, dMASSdt]
return system
# Priors
k_SMPF = elfi.Prior('uniform', 0.0, 1.0, name = 'k_SMPF')
k_ASS = elfi.Prior('uniform', 0.0, 1.0, name = 'k_ASS')
k_DISS = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DISS')
k_DRUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DRUM1')
k_IRUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_IRUM1')
k_I2RUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_I2RUM1')
k_1SLP1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1SLP1')
V_2SLP1 = elfi.Prior('uniform', 0.0, 1.0, name = 'V_2SLP1')
J_1SLP1 = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1SLP1')
J_2SLP1 = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2SLP1')
k_1IE = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1IE')
V_2IE = elfi.Prior('uniform', 0.0, 1.0, name = 'V_2IE')
J_1IE = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1IE')
J_2IE = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2IE')
k_DMPFRUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DMPFRUM1')
V_SRUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'V_SRUM1')
k_ARUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_ARUM1')
k_DRUM1P = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DRUM1P')
V_WEE1 = elfi.Prior('uniform', 0.0, 1.0, name = 'V_WEE1')
k_WEE1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_WEE1')
J_1WEE1 = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1WEE1')
J_2WEE1 = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2WEE1')
WEE1T = elfi.Prior('uniform', 0.0, 1.0, name = 'WEE1T')
k_CDC25 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_CDC25')
V_CDC25 = elfi.Prior('uniform', 0.0, 1.0, name = 'V_CDC25')
J_1CDC25 = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1CDC25')
J_2CDC25 = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2CDC25')
mu = elfi.Prior('uniform', 0.0, 1.0, name = 'mu')
k_1WEE1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1WEE1')
k_2WEE1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_2WEE1')
k_1CDC25 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1CDC25')
k_2CDC25 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_2CDC25')
k_D1CYC = elfi.Prior('uniform', 0.0, 1.0, name = 'k_D1CYC')
k_D2CYC = elfi.Prior('uniform', 0.0, 1.0, name = 'k_D2CYC')
CDC25T = elfi.Prior('uniform', 0.0, 1.0, name = 'CDC25T')
IET = elfi.Prior('uniform', 0.0, 1.0, name = 'IET')
SLP1T = elfi.Prior('uniform', 0.0, 1.0, name = 'SLP1T')
alpha = elfi.Prior('uniform', 0.0, 1.0, name = 'alpha')
k_DX = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DX')
# Function for the simulator
def sim(params, batch_size = 1, random_state = None):
# Initial conditions
ivp = np.random.randint(low = 0, high = 5, size = 10)
# Time
stop_time = 500
t = np.linspace(0, stop_time, stop_time*10)
# Odeint
res = odeint(MCN, ivp, t, args = (0, params))
res = np.asarray(result).flatten()
return res
params = list((k_SMPF,k_ASS,k_DISS,k_DRUM1,k_IRUM1,k_I2RUM1,k_1SLP1,V_2SLP1,J_1SLP1,J_2SLP1, \
k_1IE,V_2IE,J_1IE,J_2IE,k_DMPFRUM1,V_SRUM1,k_ARUM1,k_DRUM1P,V_WEE1,k_WEE1, \
J_1WEE1,J_2WEE1,WEE1T,k_CDC25,V_CDC25,J_1CDC25,J_2CDC25,mu,k_1WEE1,k_2WEE1, \
k_1CDC25,k_2CDC25,k_D1CYC,k_D2CYC,CDC25T,IET,SLP1T,alpha,k_DX))
elfi_sim = elfi.Simulator(sim, params, observed = mcn_observed)
elfi_sim.generate()
When I run elfi_sim.generate()
, I get the error. How can I deal with this situation?
Moreover, when I first tried to use ELFI for this very same problem, I did not get this error but it was complaining about the fact that res
from the odeint
step was not 1D.
Wall time in bolfi.sample seems to depend on the n_evidence parameter in bolfi.fit while I think this should not be the case.
The advantage in using a surrogate of the discrepancies fitted using Gaussian processes, is that, once the GP hyperparameters are determined, MCMC sampling (via bolfi.sample) should be completely independent of the model simulator, thus resulting in a scalable algorithm.
If I have not misunderstood the theory in the relevant paper, once bolfi.fit(n_evidence=xxx) has completed, the execution of bolfi.sample(numsamples) to produce numsaples posterior samples should require the same wall-clock time, no matter the value I set for n_evidence. This is because bolfi.samples only invokes the fitted GP and never the true model simulator.
This is clearly not the case, hence I wonder what I am missing.
I build up on the example in https://elfi.readthedocs.io/en/latest/usage/BOLFI.html
I run all the code before the call to elfi. BOLFI. Then I run the following:
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10, bounds={'t1':(-2, 2), 't2':(-1, 1)}, acq_noise_var=[0.1, 0.1], seed=seed)
Then I fit the GP model on 100 evidence points:
%time post = bolfi.fit(n_evidence=100)
INFO:elfi.methods.posteriors:Using optimized minimum value (-1.6966) of the GP discrepancy mean function as a threshold
Wall time: 24.7 s
and produce 1000 posterior samples
%time result_BOLFI = bolfi.sample(1000, info_freq=1000)
4 chains of 1000 iterations acquired. Effective sample size and Rhat for each parameter:
t1 2063.7620137995186 1.0014860681196993
t2 2415.9426764728387 0.9994192510776999
Wall time: 57.4 s
Now I enlarge the number of evidence points to 300:
%time post = bolfi.fit(n_evidence=300)
INFO:elfi.methods.posteriors:Using optimized minimum value (-1.6316) of the GP discrepancy mean function as a threshold
Wall time: 1min 4s
and sample 1000 draws:
%time result_BOLFI = bolfi.sample(1000, info_freq=1000)
4 chains of 1000 iterations acquired. Effective sample size and Rhat for each parameter:
t1 1908.9201493103064 1.0000733568191036
t2 2329.652810214964 1.0009504568577294
Wall time: 1min 35s
So posterior sampling time has increased from 57" to 1'35" in the two attempts. Shouldn't the sampling times be roughly the same?
0.7.3
3.5.6
Windows 10
If batch_size of a sampling method (e.g. elfi.Rejection) > n_samples , the algorithm does a simulation batch_size times, and takes only n_samples from it. It maybe a good idea to check: if batch_size > n_sample then batch_size = n_samples;
n_samples = 1
m = elfi.Prior('uniform', -2, 4) # or any other elfi model
rej = elfi.Rejection(m, 'dist', seed=0, batch_size=200, max_parallel_batches=32)
result = rej.sample(n_samples)
Doing 200 simulations and then taking only one sample;
Printing a warning that the batch_size can't be > than n_samples, then doing one simulation that will be later given as a requested sample;
After acquiring evidence from the simulator, an error is thrown on the call to scipy's differential_evolution
.
I think this is possibly an issue due to a change in scipy. If I downgrade to scipy 1.3.1 everything works, but with version >=1.5 (the default installed from conda) I get this error. You could pin the version of scipy in the conda recipe for elfi, but I would suggest updating if possible as doing this is likely to lead to unresolvable environments for many people.
Here is my environment:
# Name Version Build Channel
appnope 0.1.0 py36h9f0ad1d_1001 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
ca-certificates 2020.6.20 hecda079_0 conda-forge
certifi 2020.6.20 py36h9f0ad1d_0 conda-forge
cycler 0.10.0 py_2 conda-forge
decorator 4.4.2 py_0 conda-forge
elfi 0.7.4 py_2 conda-forge
freetype 2.10.2 h8da9a1a_0 conda-forge
gpy 1.9.9 py36h255dfe6_3 conda-forge
graphviz 2.42.3 h98dfb87_0 conda-forge
ipykernel 5.3.4 py36h95af2a2_0 conda-forge
ipyparallel 6.3.0 py36h9f0ad1d_0 conda-forge
ipython 7.16.1 py36h95af2a2_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jedi 0.17.2 py36h9f0ad1d_0 conda-forge
joblib 0.16.0 py_0 conda-forge
jpeg 9d h0b31af3_0 conda-forge
jupyter_client 6.1.6 py_0 conda-forge
jupyter_core 4.6.3 py36h9f0ad1d_1 conda-forge
kiwisolver 1.2.0 py36h863e41a_0 conda-forge
lcms2 2.11 h174193d_0 conda-forge
libblas 3.8.0 17_openblas conda-forge
libcblas 3.8.0 17_openblas conda-forge
libcxx 10.0.1 h5f48129_0 conda-forge
libffi 3.2.1 h4a8c4bd_1007 conda-forge
libgfortran 4.0.0 2 conda-forge
liblapack 3.8.0 17_openblas conda-forge
libllvm9 9.0.1 h7475705_1 conda-forge
libopenblas 0.3.10 openmp_h63d9170_3 conda-forge
libpng 1.6.37 hbbe82c9_1 conda-forge
libsodium 1.0.17 h01d97ff_0 conda-forge
libtiff 4.1.0 h2ae36a8_6 conda-forge
libwebp-base 1.1.0 h0b31af3_3 conda-forge
llvm-openmp 10.0.1 h28b9765_0 conda-forge
llvmlite 0.33.0 py36hdce38a9_1 conda-forge
lz4-c 1.9.2 h4a8c4bd_1 conda-forge
matplotlib 3.3.0 1 conda-forge
matplotlib-base 3.3.0 py36h534ab7b_1 conda-forge
ncurses 6.2 hb1e8313_1 conda-forge
networkx 1.11 py36_0 conda-forge
numba 0.50.1 py36h18adda2_1 conda-forge
numpy 1.19.0 py36h4a66613_0 conda-forge
olefile 0.46 py_0 conda-forge
openssl 1.1.1g h0b31af3_0 conda-forge
paramz 0.9.5 py_0 conda-forge
parso 0.7.0 pyh9f0ad1d_0 conda-forge
pexpect 4.8.0 py36h9f0ad1d_1 conda-forge
pickleshare 0.7.5 py36h9f0ad1d_1001 conda-forge
pillow 7.2.0 py36h2ae5dfa_1 conda-forge
pip 20.1.1 py_1 conda-forge
prompt-toolkit 3.0.5 py_1 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
pygments 2.6.1 py_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
python 3.6.10 h4334963_1011_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.6 1_cp36m conda-forge
pyzmq 19.0.1 py36h820b253_0 conda-forge
readline 8.0 h0678c8f_2 conda-forge
scikit-learn 0.23.1 py36hef903b7_0 conda-forge
scipy 1.5.1 py36h1dac7e4_0 conda-forge
setuptools 49.2.0 py36h9f0ad1d_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.32.3 h93121df_1 conda-forge
threadpoolctl 2.1.0 pyh5ca1d4c_0 conda-forge
tk 8.6.10 hbbe82c9_0 conda-forge
toolz 0.10.0 py_0 conda-forge
tornado 6.0.4 py36h37b9a7d_1 conda-forge
traitlets 4.3.3 py36h9f0ad1d_1 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_0 conda-forge
wheel 0.34.2 py_1 conda-forge
xz 5.2.5 h0b31af3_1 conda-forge
zeromq 4.3.2 h6de7cb9_2 conda-forge
zlib 1.2.11 h0b31af3_1006 conda-forge
zstd 1.4.5 h0384e3a_1 conda-forge
Run conda create -n elfi elfi==0.7.4 scipy>=1.5 numpy && conda activate elfi
Run any model with BOLFI, e.g.:
bolfi = elfi.BOLFI(d, batch_size=1, initial_evidence=20, update_interval=10,
bounds={'alphaprior1':(0, 5), 'alphaprior2':(0, 5)},
acq_noise_var=[0.01, 0.01], seed=1)
post = bolfi.fit(n_evidence=400)
After the evidence acquisition is complete:
Progress: |โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 100.0% Complete
Traceback (most recent call last):
File "ELFI.simulator.py", line 61, in <module>
post = bolfi.fit(n_evidence=40)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/parameter_inference.py", line 1191, in fit
self.infer(n_evidence, bar=bar)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/parameter_inference.py", line 276, in infer
return self.extract_result()
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/parameter_inference.py", line 953, in extract_result
self.target_model.predict_mean, self.target_model.bounds, seed=self.seed)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/bo/utils.py", line 32, in stochastic_optimization
func=fun, bounds=bounds, maxiter=maxiter, polish=polish, init='latinhypercube', seed=seed)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentialevolution.py", line 308, in differential_evolution
ret = solver.solve()
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentialevolution.py", line 814, in solve
constraints=self.constraints)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_minimize.py", line 618, in minimize
callback=callback, **options)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py", line 308, in _minimize_lbfgsb
finite_diff_rel_step=finite_diff_rel_step)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 262, in _prepare_scalar_function
finite_diff_rel_step, bounds, epsilon=epsilon)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py", line 95, in __init__
self._update_grad()
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py", line 171, in _update_grad
self._update_grad_impl()
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py", line 92, in update_grad
**finite_diff_options)
File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_numdiff.py", line 388, in approx_derivative
raise ValueError("`f0` passed has more than 1 dimension.")
ValueError: `f0` passed has more than 1 dimension.
Gives me a correct bolfi object, with a lower version of scipy
0.7.4
3.6
OS X 10.15
Format the functional tests to be compatible with py.test, as currently they are not executed. We should also move all the tests directly under the test directory and use the tagging feature of pytest to limit the tests that are run (i.e. mark slow tests and run them separately).
SMC should not propose forbidden particles.
A prior pdf may forbid certain values for particles. These should not be allowed by the SMC proposals either.
Although plotting tends to be simple as it is, it would be nice to integrate typical plotting scenarios.
Please provide more detail on how the SMC ABC sampling works; in particular how the "schedule" affects sampling and performance.
The tutorial is quite brief on the SMC ABC sampling:
For sampling, one has to define the number of output samples, the number of populations and a schedule i.e. a list of quantiles to use for each population. In essence, a population is just refined rejection sampling.
N = 1000
schedule = [0.7, 0.2, 0.05]
%time result_smc = smc.sample(N, schedule)
Does that mean each population is a new prior for a rejection sampling round with the given quantile?
I would expect the performance of such an algorithm to be comparable to rejection sampling with quantile = 0.70.20.05 but the time it takes to sample for SMC ABC is much longer.
Thank you for any explanation or link!
Also, a progress bar similar to PyMC sampling would be very much appreciated.
The paths to the cpp code in the documentation notebook are incorrect.
The paths are listed as resources/cpp
and ./resources/cpp/bdm
whereas they should be ./resources
and ./resources/bdm
respectively.
The documentation for using non-Python operations (here) (as it currently stands) does not run the example as expected.
Currently will return make: *** ./resources/cpp: No such file or directory. Stop.
No output is expected but the cpp code needs to be compiled/linked and the executable moved, these currently don't occur.
0.6.0
3.6.1
Mac OS 10.12.5
When I tried to run the example scripts in the examples folder:
python bdm.py
I get this error:
File "/home/akash/anaconda2/envs/py34/lib/python3.4/site-packages/elfi/methods/bo/utils.py", line 76
start_points[:, i] = random_state.uniform(*bounds[i], n_start_points)
^
SyntaxError: only named arguments may follow *expression
I got this error both in 2.7 and 3.4. After this I tried to create a new environment and installed 3.5 virtually on anaconda, and tried to run it again, and this time I got this:
File "/home/akash/SheffieldML/GPy/core/parameterization/init.py", line 9, in
from paramz import ties_and_remappings, ObsAr
ImportError: cannot import name 'ties_and_remappings'
This is a GPy Error, because one of the dependency pacjkages: paramz is still uncompatible for python 3.5. Can you suggest something here which I should do with the installations to make it finally working.
The example scripts did not run and throw syntax errors.
The example scripts did not run.
Implement initial working standard Bayesian Synthetic Likelihood method as described by Leah Price in https://doi.org/10.1080/10618600.2017.1302882.
Please provide a short couple sentence summary.
I am trying to run a model using BOLFI with parallelization.
I am running this in PyCharm and I am not using a Jupyter notebook.
Describe the issue as clearly as possible.
I am using kernel Matern32 thus I am passing a GPyRegression as the target model.
I already tried including the import like this:
ipyclient = elfi.get_client().ipp_client
with ipyclient[:].sync_imports():
import GPy
However, i am getting the same error.
Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.
If you have a reproducible example, please include it.
matern = GPy.kern.Matern32(10, 0.01, 0.2)
matern32 = elfi.GPyRegression(parameter_names=ml.parameter_names,kernel=matern,
noise_var=0.05, bounds={'c0': (-1, 1), 'c1': (-1, 1), 'c2': (-1, 1), 'c3': (-1, 1), 'c4': (-1, 1),
'c5': (-1, 1), 'c6':(-1,1),'c7':(-1,1),'c8':(-1,1),'c9':(-1,1)})
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10, target_model=matern32,seed=seed,async=True)
bolfi.fit(n_evidence=200)
result_BOLFI = bolfi.sample(1000, info_freq=1000)
print(result_BOLFI)
The current output. Knowing what is the current behavior is useful.
I am getting this error after running the script:
ipyparallel.error.RemoteError: AttributeError('Matern32' object has no attribute '_name')
Some of the requirements are very tightly pinned, e.g. dask[distributed]==2.30.0
, and also out of date with regards to the latest developments. dask
is now up to 2021.9
. Thus, our projects using elfi
are stuck with the old version of dask
.
The requirements of this library are falling behind the latest developments of its dependencies. For example,
2021.9
but elfi
is tightly pinned to 2.30.0
1.20
; elfi
is at 0.12.1
If these requirements could be updated or relaxed, then projects dependent on elfi
could access new features of the other libraries.
N/A
Environment with old version of dependency libraries.
Updated or relaxed dependency definitions.
0.8.0
3.7.10
Please provide a short couple sentence summary.
About multiprocessing parallel, is there anything wrong?
Describe the issue as clearly as possible.
Doing it once, it saves time, but in a for loop, it takes more time.
Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.
If you have a reproducible example, please include it.
import time
import numpy as np
import scipy.stats
import matplotlib
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)
import elfi
MSE = []
MSE1 = []
t1=0
t=0
seed=55889
for seed in range(100):
np.random.seed(seed+4543)
elfi.new_model()
alpha = elfi.Prior(scipy.stats.uniform, 0.05, 1)
tao = 0.198
def M(alpha, batch_size=1, random_state=None):
y = [1]
for i in range(19):
b = np.random.binomial(1, alpha/(alpha+tao), size=1)
if b == 0:
y = y + [1]
else:
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f+g for f,g in zip(y,a)]
yalpha = sorted(y)
return yalpha
theta = np.random.uniform(0.05,1,size=1)
y0 = M(theta)
Y = elfi.Simulator(M,alpha,observed=y0)
Y1 = elfi.Simulator(M, alpha, observed=y0)
def S(x):
sum=np.sum([y*y for y in x])
return sum
Su = elfi.Summary(S, Y)
def S1(x):
t1 = len(x)/20
return t1
def S2(x):
q = x/np.sum(x)
p = q.tolist()
t2 = 1 - np.sum([a*a for a in p])
return t2
Su1 = elfi.Summary(S1, Y1)
d = elfi.Distance('euclidean', Su,Su1)
d1 = elfi.Distance('euclidean', Su1)
rej = elfi.Rejection(d, batch_size=1, seed=seed)
rej1 = elfi.Rejection(d1, batch_size=1, seed=seed)
start=time.time()
**elfi.set_client('multiprocessing')**
result = rej.sample(n_samples=100, threshold=0.01)
end=time.time()
t=end-start+t
start1=time.time()
**elfi.set_client('multiprocessing')**
result1 = rej1.sample(n_samples=100, threshold=0.01)
end1=time.time()
t1=end1-start1+t1
a1 = result1.samples_array
b1 = np.sqrt(np.mean(np.square(a1-theta)))
a = result.samples_array
b = np.sqrt(np.mean(np.square(a-theta)))
MSE = MSE + [b]
MSE1 = MSE1 + [b1]
The current output. Knowing what is the current behavior is useful.
it has taken more than 3 hours, but without multiprocessing, around 3 hours
Describe what you expect the output to be. Knowing the correct behavior is also very useful.
should be half the time, cauz when I run once, it takes about 2 minutes with parallel and 1 minute with multiprocessing.
elfi.simulator(name, ..., vectorized=True)
If vectorized=False
, then the simulator node __init__
would just vectorize the simulator before passing it on. We would not need if
clauses in simulator_operation
.
I'm trying to use ELFI with a computationally intensive simulator and want to store some output from the simulator function and other nodes. I have tried using using OutputPool to store both simulator and summary node outputs, but the stores of the OutputPool are empty after simulation: e.g.
pool = elfi.OutputPool(["S1"])
results in {'S1': None}
after simulation.
Any ideas? Ideally I would like to only save the simulation output of accepted simulations.
the quickstart page lacks an "import elfi". Also, for clarity, priors should be defined at a latter stage
Describe the issue as clearly as possible.
I am totally new to Python, and when I say "new" I mean that I have installed it for the first time today (via Anaconda). So what follows might be silly and/or totally obvious to others, as of course some python knowledge is required to play with ELFI. But anyway, I managed to go through the quickstart https://elfi.readthedocs.io/en/latest/quickstart.html#
However, this was not pain-free, hence here are my suggestions:
I'm trying to replicate a STAN model I've been using with elfi. I've run into some trouble with Elfi and propably Python.
The lognorm and bernoulli distances (using scipy stats functions) are implemented in Python, prior is defined (using one lognorm 2,10), a Simulator object is defined (the function is tested and it ouputs similar data as in elfi examples), ABC rejection sampler defined, but sampling from the rejection sampler throws an error.
I have defined the simulator like this:
def legacy_updated(data, batch_size=1, random_state=None):
data= np.atleast_1d(data)
global result_array
rows = len(data)
cols = len(data[0])
result_array = np.array([])
for row in range(rows):
if(1 > row):
for col in range(cols):
x = data[row][col]*bernoulli_pmf(bernoulli_theta(data[None][row]))
y = lognormal_pdf(frozen_lognormalmean(data[row][None],lognormal_sigma(data[row][None])))
result_array = np.append(result_array,[x+y, x])
return result_array.reshape(1,-1)
else:
for col in range(cols):
y = data[row][col]*bernoulli_pmf((bernoulli_theta(data[None][row])))
result_array = np.append(result_array,[y,])
return result_array.reshape(1,-1)
def lognormal_pdf(y):
s=0.945
x = np.linspace(ss.lognorm.ppf(0.01, s),ss.lognorm.ppf(0.99, s))
return ss.lognorm.pdf(x, s)
def bernoulli_pmf(y):
p = 0.3
x = bernoulli_theta(y)
return ss.bernoulli.pmf(x,p)
def log_mean(y):
return np.mean(np.log(y),axis=0)
def log_std(y):
return np.std(np.log(y),axis=0)
def frozen_lognormalmean(y,s):
rv=ss.lognorm(s)
return rv.pdf(np.mean(y))
def lognormal_sigma(y):
s=0.945
x = np.linspace(ss.lognorm.ppf(0.01, s),ss.lognorm.ppf(0.99, s))
return x
def bernoulli_theta(y):
p = 0.3
x = np.arange(ss.bernoulli.ppf(0.01, p), ss.bernoulli.ppf(9.99,p))
return ss.bernoulli.rvs(1,p)
Simulator function outputs the data in the same format as in the elfi example, but when I try to generate data with simulator node I get the following error from within the loop.
`TypeError Traceback (most recent call last)
C:\Anaconda\envs\DataScienceEnv\lib\site-packages\elfi\executor.py in execute(cls, G)
69 try:
---> 70 G.node[node] = cls._run(op, node, G)
71 except Exception as exc:
C:\Anaconda\envs\DataScienceEnv\lib\site-packages\elfi\executor.py in _run(fn, node, G)
153
--> 154 output_dict = {'output': fn(*args, **kwargs)}
155 return output_dict
in legacy_updated(data, batch_size, random_state)
9 rows = len(data)
---> 10 cols = len(data[0])
11 result_array = np.array([])
TypeError: object of type 'numpy.float64' has no len()`
I didn't expect the error since my function is outputting a numpy array, my python skills propably come in play also this really doesn't seem like a big error
0.3.1
3.6.5
windows 10
Running same code multiple times produces different outcomes, and adds weird priors to model.
Running same code multiple times, e.g. with IPython, produces different outcomes. It seems to me that defining new priors and simulators does not handle memory correctly and they append, not overwrite, old priors and simulator.
How to reproduce: Open Spyder, copy following code and save it:
import elfi
import numpy as np
from elfi.examples import ma2
seed = 20170530
np.random.seed(seed)
# Define everything similarly as done in introduction. However here all code is taken from examples module.
t1_true = 0.6
t2_true = 0.2
y_obs = ma2.MA2(t1_true, t2_true)
t1 = elfi.Prior(ma2.CustomPrior1, 2)
t2 = elfi.Prior(ma2.CustomPrior2, t1, 1)
Y = elfi.Simulator(ma2.MA2, t1, t2, observed=y_obs)
S1 = elfi.Summary(ma2.autocov, Y)
S2 = elfi.Summary(ma2.autocov, Y, 2)
d = elfi.Distance('euclidean', S1, S2) #
rej = elfi.Rejection(d, batch_size=5000, seed=seed)
result = rej.sample(1000, threshold=0.5)
print(result.summary)
Run (F5) code. Everything is fine and output should be print in couple of seconds. Then run the code again. Output is different and weird prior nodes are appeared to the model. If code is run again even more priors keep appearing.
Same behavior exists if code is run multiple times from console with IPython.
I think that problem arise from fact that running lines
t2 = elfi.Prior(ma2.CustomPrior2, t1, 1)
Y = elfi.Simulator(ma2.MA2, t1, t2, observed=y_obs)```
again, do not overwrite old prior nodes and/or simulator, but instead they append nodes to old simulator.
#### Current Output for both runs:
runfile('XXX.py', wdir='XXX')
<bound method Sample.summary of Method: Rejection
Number of samples: 1000
Number of simulations: 10000
Threshold: 0.387
Sample means: t1: 0.544, t2: 0.227
>
runfile('XXX.py', wdir='XXX')
<bound method Sample.summary of Method: Rejection
Number of samples: 1000
Number of simulations: 10000
Threshold: 0.386
Sample means: _prior_0466: 0.252, _prior_7dfe: 0.537, t1: -0.0194, t2: 0.356
>
#### Expected Output:
Output should be the same for both runs.
#### ELFI Version:
0.7.1
#### Python Version:
3.6.5
#### Operating System:
Windows 10, version 1709, 64 bit
CPU: Intel i7-5600U
FYI, you might want to consider DAFT for graphical models
http://daft-pgm.org
When a sequence of thresholds having length > 1 is passed to smc, the inference fails with ValueError: In executing node '_theta_logpdf': Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (10,) and x.shape = (100000,)..
Using a simulator of a multinomial model, having a dirichlet prior, attempting to run smc for a sequence of thresholds produces an error.
import numpy as np
import scipy.stats as ss
import elfi
num_categories = 10
num_experiments = 100
theta = elfi.Prior('dirichlet', np.ones(num_categories), name='theta')
def simulator(num_experiments, theta, batch_size=1, random_state=None):
if len(theta.shape)==1:
theta = theta.reshape(1,-1)
return_value = []
for i in range(theta.shape[0]):
return_value.append(
ss.multinomial.rvs(
100,
theta[i,:],
random_state=random_state
)
)
return np.array(return_value)
def mean(y):
return np.mean(y, axis=1)
def ret_identity(y):
return y
def distance_in_variation(p1,observed=[]):
arr_p1 = np.asarray(p1)
y = np.tile(observed[0],(np.asarray(p1).shape[0],1))
return 0.5*np.sum(np.abs(arr_p1-y),axis=1)
theta0 = np.random.dirichlet(np.ones(num_categories))
np.random.seed(20191126)
y0 = simulator(num_experiments,theta0)
from functools import partial
# Add the simulator node and observed data to the model
sim_fn = partial(simulator,num_experiments)
sim = elfi.Simulator(sim_fn, theta, observed=y0)
d = elfi.Discrepancy(
distance_in_variation,
elfi.Summary(ret_identity, sim, name='ret_identity'),
name='d')
smc = elfi.SMC(d, batch_size=10000)
N = 1000
schedule = [100,99]
%time result_smc = smc.sample(N, schedule)
When the threshold schedule is [100], everything runs. When it is changed to [100,99] it fails with the error:
Progress: |โโโโโโโโโโโโโโโโโโโโโโโโโ-------------------------| 50.0% Complete
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in execute(cls, G)
69 try:
---> 70 G.node[node] = cls._run(op, node, G)
71 except Exception as exc:
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in _run(fn, node, G)
153
--> 154 output_dict = {'output': fn(*args, **kwargs)}
155 return output_dict
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in logpdf(self, x, alpha)
1448 alpha = _dirichlet_check_parameters(alpha)
-> 1449 x = _dirichlet_check_input(alpha, x)
1450
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in _dirichlet_check_input(alpha, x)
1242 "parameter vector 'a', but alpha.shape = %s "
-> 1243 "and x.shape = %s." % (alpha.shape, x.shape))
1244
ValueError: Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (10,) and x.shape = (100000,).
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<timed exec> in <module>
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in sample(self, n_samples, *args, **kwargs)
404 bar = kwargs.pop('bar', True)
405
--> 406 return self.infer(n_samples, *args, bar=bar, **kwargs)
407
408 def _extract_result_kwargs(self):
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in infer(self, vis, bar, *args, **kwargs)
262
263 while not self.finished:
--> 264 self.iterate()
265 if vis:
266 self.plot_state(interactive=True, **vis_opt)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in iterate(self)
299 # Submit new batches if allowed
300 while self._allow_submit(self.batches.next_index):
--> 301 next_batch = self.prepare_new_batch(self.batches.next_index)
302 logger.debug("Submitting batch %d" % self.batches.next_index)
303 self.batches.submit(next_batch)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in prepare_new_batch(self, batch_index)
716 params = GMDistribution.rvs(*self._gm_params, size=self.batch_size,
717 prior_logpdf=self._prior.logpdf,
--> 718 random_state=self._round_random_state)
719
720 batch = arr2d_to_batch(params, self.parameter_names)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/utils.py in rvs(cls, means, cov, weights, size, prior_logpdf, random_state)
229 # check validity of x
230 if prior_logpdf is not None:
--> 231 x = x[np.isfinite(prior_logpdf(x))]
232
233 n_accepted1 = len(x)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/utils.py in logpdf(self, x)
356 def logpdf(self, x):
357 """Return the log density of the joint prior at x."""
--> 358 return self._evaluate_pdf(x, log=True)
359
360 def _evaluate_pdf(self, x, log=False):
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/utils.py in _evaluate_pdf(self, x, log)
380 loaded_net.node[k] = {'output': v}
381
--> 382 val = self.client.compute(loaded_net)[node]
383 if ndim == 0 or (ndim == 1 and self.dim > 1):
384 val = val[0]
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/client.py in compute(self, loaded_net)
271 def compute(self, loaded_net):
272 """Request evaluation of `loaded_net` and wait for result."""
--> 273 return self.apply_sync(Executor.execute, loaded_net)
274
275 @property
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/clients/native.py in apply_sync(self, kallable, *args, **kwargs)
51
52 """
---> 53 return kallable(*args, **kwargs)
54
55 def get_result(self, task_id):
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in execute(cls, G)
71 except Exception as exc:
72 raise exc.__class__("In executing node '{}': {}."
---> 73 .format(node, exc)).with_traceback(exc.__traceback__)
74
75 elif 'output' not in attr:
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in execute(cls, G)
68 op = attr['operation']
69 try:
---> 70 G.node[node] = cls._run(op, node, G)
71 except Exception as exc:
72 raise exc.__class__("In executing node '{}': {}."
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in _run(fn, node, G)
152 args = [a[1] for a in sorted(args, key=itemgetter(0))]
153
--> 154 output_dict = {'output': fn(*args, **kwargs)}
155 return output_dict
156
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in logpdf(self, x, alpha)
1447 """
1448 alpha = _dirichlet_check_parameters(alpha)
-> 1449 x = _dirichlet_check_input(alpha, x)
1450
1451 out = self._logpdf(x, alpha)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in _dirichlet_check_input(alpha, x)
1241 "of entries as, or one entry fewer than, "
1242 "parameter vector 'a', but alpha.shape = %s "
-> 1243 "and x.shape = %s." % (alpha.shape, x.shape))
1244
1245 if x.shape[0] != alpha.shape[0]:
ValueError: In executing node '_theta_logpdf': Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (10,) and x.shape = (100000,)..
I expect the output to have samples similar to running with threshold [99].
0.7.4
3.7.4
MAC OSX
The first line in the quickstart tutorial throws an error on my system. Not entirely sure what I'm doing wrong. Thanks in advance for the help!
The following reproduces my error.
import elfi
elfi.Prior('uniform', -2, 4)
The current error is pasted here:
>>> mu = elfi.Prior('uniform', -2, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 676, in __init__
super(Prior, self).__init__(distribution, *params, size=size, **kwargs)
File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 595, in __init__
super(RandomVariable, self).__init__(*params, state=state, **kwargs)
File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 531, in __init__
super(StochasticMixin, self).__init__(*parents, state=state, **kwargs)
File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 345, in __init__
name = self._give_name(name, model)
File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 472, in _give_name
while re.match('\s*super\(', info.code_context[0]):
TypeError: 'NoneType' object is not subscriptable
>>> import elfi; elfi.__version__
'0.6.0'
>>> import numpy; numpy.__version__
'1.13.1'
>>> import scipy; scipy.__version__
'0.19.1'
$ python3 --version
Python 3.6.1
Mac OS 10.12.5
$ sw_vers -productVersion
10.12.5
Please provide a short couple sentence summary.
I am doing smc ABC using elfi smc as instructed on the official website tutoirial
Describe the issue as clearly as possible.
if I run once, there is no problem, but if I want to infer the parameter 100 times using a for loop, it caused problem, I can't run it, when I run the first of the loops, it's ok, and takes about 20s, the second, it becomes 1 minute, and third, it always said there is not enough examples....and elfi is trying.
Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.
If you have a reproducible example, please include it.
below is the code, while Ms is just the function which you could ignore. S1 is the summary statistics.
def Ms(alpha, beta, batch_size=1, random_state=None):
y = [1]
alpha = np.float(alpha)
beta = np.float(beta)
theta = [alpha, alpha*beta, tao]
theta = theta/np.sum(theta)
i = 1
while i < 20:
if y == []:
break
else:
b = np.random.multinomial(1, theta, size=1)
i = i + 1
if b[0][2] == 1:
y = y + [1]
elif b[0][0] == 1:
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f+g for f,g in zip(y,a)]
else:
i = i - 2
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f-g for f,g in zip(y,a)]
while 0 in y:
y.remove(0)
yalpha = sorted(y)
return yalpha
def S1(x):
t1 = len(x)/20
return t1
start=time.time()
for i in range(100):
np.random.seed(seed+10000)
alpha0s = np.random.uniform(0.05,2,size=1)
beta0s = np.random.uniform(0.05,0.2,size=1)
y0s = Ms(alpha0s, beta0s)
alpha = elfi.Prior(scipy.stats.uniform, 0.005, 2)
beta = elfi.Prior(scipy.stats.uniform, 0.005, 0.2)
Ys = elfi.Simulator(Ms, alpha, beta, observed=y0s)
Sus1 = elfi.Summary(S1, Ys)
ds1 = elfi.Distance('euclidean', Sus1)
smc = elfi.SMC(ds1, batch_size=1, seed=seed)
N = 1000
schedule = [0.7, 0.4, 0.05]
%time result_smc = smc.sample(N, schedule)
MSE1s = MSE1s + [np.square(result_smc.sample_means['alpha'] - alpha0s)]
MSE2s = MSE2s + [np.square(result_smc.sample_means['beta'] - beta0s)]
end = time.time()
print(end-start)
print(np.mean(MSE1s))
print(np.mean(MSE2s))
The current output. Knowing what is the current behavior is useful.
if I run once, there is no problem, but if I want to infer the parameter 100 times using a for loop, it caused problem, I can't run it, when I run the first of the loops, it's ok, and takes about 20s, the second, it becomes 1 minute, and third, it always said there is not enough examples....and elfi is trying.
Describe what you expect the output to be. Knowing the correct behavior is also very useful.
I want to get all the result like the first of the loop.
If n_sim and n_samples (e.g. elfi.Rejection.set_objective()) are type 'float', the code gives "TypeError".
When n_sim and n_samples (e.g. elfi.Rejection.set_objective()) are provided in the format '10e7', self.objective['n_samples'] becomes 'float', which gives "TypeError: 'float' object cannot be interpreted as an integer" in parameter_inference.py::_init_samples_lazy line "samples[node] = np.ones(shape, dtype=dtype) * np.inf". In my opinion, the format '10e7' is more convenient than 10000000 when dealing with large numbers. This can be solved by converting the variables to 'int' automatically.
The problem can be reproduced with any simulator:
n_sim = 10e3
n_samples = 10e2
rej = elfi.Rejection(m, 'dist')
rej.set_objective(n_samples=n_samples, n_sim=n_sim)
elfi is incompatible with networkx 2, but networkx 1.11 is incompatible with matplotlib 3.
This means that it is impossible to plot with networkx and matplotlib 3 if using Elfi in the same environment.
To reproduce, install elfi, matplotlib 3.
import networkx as nx
G = nx.random_regular_graph(2, 100)
nx.draw(G)
this leads to:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-29-34f682212305> in <module>
1 G = nx.random_regular_graph(2, 100)
----> 2 nx.draw(G)
~/miniconda3/envs/ENV/lib/python3.7/site-packages/networkx/drawing/nx_pylab.py in draw(G, pos, ax, hold, **kwds)
124 if 'with_labels' not in kwds:
125 kwds['with_labels'] = 'labels' in kwds
--> 126 b = plt.ishold()
127 # allow callers to override the hold state by passing hold=True|False
128 h = kwds.pop('hold', None)
**AttributeError: module 'matplotlib.pyplot' has no attribute 'ishold'**
ELFI Version: 0.7.5
matplotlib version: 3.2.1
networkx version: 1.11
Python Version: 3.7.6
OS: MacOS
Investigate possibilities for automatic convergence diagnostics.
Problem Statement)
Hello, I have a simple question in understanding of setting a threshold value when I only pass 'n_sim' and 'n_samples' arguments to the rejection sampler.
When inspecting the source code of Rejection class in elfi.methods.parameter_inference code,
there exists a explanation of determining a threshold value when 'n_sim' is passed:
(https://elfi.readthedocs.io/en/latest/_modules/elfi/methods/parameter_inference.html#Rejection.set_objective)
n_sim : int Total number of simulations. The threshold will be the n_samples **smallest** discrepancy among n_sim simulations.
Why the threshold is set to smallest discrepancy among n_samples (i.e. accepted samples)?
I think it is reasonable to set a threshold value to be the largest discrepancy value among collected n_samples in n_sim simulations.
For example, when I set n_samples = 10, n_sim=10000,
finished rejection sampling,
sorted discrepancy values of 1000 simulated samples in an increasing order.
As a result, I got the sequence of discrepancy values:
1, 1.2, 1.4, 4, 4.3, 5, 7, 8, 9, 10, 11, 12, 13, ... , 10000.
Then, shouldn't the threshold value be 10? not 1?
Did I make a mistake in understanding the implementation?
(Reasoning: in Biau et al. 2013, (https://arxiv.org/pdf/1207.6461.pdf)
we can infer the fact that a threshold value is at least smaller than or equal to an infimum of a discrepancy of rejected samples)
Question)
1-1
If my understanding is correct, is there any reference on such an implementation?
1-2
In what part of a code should I change when I want the returned threshold value to be the largest discrepancy value among n_samples?
2
If wrong, what is the returned threshold value in my example?
Is it 10?
Thank you!
Investigate possibility to use Spark and/or TensorFlow.
Currently testing a python module wrapping https://github.com/diyabc/abcranger : posterior methodologies (model choice and parameter estimation) with Random Forests on reference table.
(See the references)
I would like to know the best way to integrate the posterior methodologies into the elfi pipeline. It seems any inference method in elfi should have an "iterate" method with every new sample, but both methodologies haven't got any (they need the whole reference table at once)
See the demos at :
https://github.com/diyabc/abcranger/blob/master/testpy/Model%20Choice%20Demo.ipynb
and
https://github.com/diyabc/abcranger/blob/master/testpy/Parameter%20Estimation%20Demo.ipynb
Note that the basic rejection sampler is more than enough with those methodologies (and the threshold parameter almost doesn't matter).
Regards,
I'm getting the following ValueError when running BOLFI fit()
function:
ValueError: failed to initialize intent(inout) array -- input not fortran contiguous
https://github.com/elfi-dev/notebooks/blob/master/BOLFI.ipynb
%time post = bolfi.fit(n_evidence=200)
Here's the full trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-e609f23bdce1> in <module>()
1 get_ipython().magic('time')
----> 2 post = bolfi.fit(n_evidence=200)
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in fit(self, n_evidence, threshold)
1153 'You must specify the number of evidence (n_evidence) for the fitting')
1154
-> 1155 self.infer(n_evidence)
1156 return self.extract_posterior(threshold)
1157
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in infer(self, vis, *args, **kwargs)
247
248 while not self.finished:
--> 249 self.iterate()
250 if vis:
251 self.plot_state(interactive=True, **vis_opt)
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in iterate(self)
280 # Submit new batches if allowed
281 while self._allow_submit(self.batches.next_index):
--> 282 next_batch = self.prepare_new_batch(self.batches.next_index)
283 logger.debug("Submitting batch %d" % self.batches.next_index)
284 self.batches.submit(next_batch)
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in prepare_new_batch(self, batch_index)
989 acquisition = self.state['acquisition']
990 if len(acquisition) == 0:
--> 991 acquisition = self.acquisition_method.acquire(self.acq_batch_size, t=t)
992
993 batch = arr2d_to_batch(acquisition[:self.batch_size], self.parameter_names)
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/bo/acquisition.py in acquire(self, n, t)
123 self.n_inits,
124 self.max_opt_iters,
--> 125 random_state=self.random_state)
126
127 # Create n copies of the minimum
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/bo/utils.py in minimize(fun, bounds, grad, prior, n_start_points, maxiter, random_state)
89 if grad is not None:
90 result = fmin_l_bfgs_b(
---> 91 fun, start_points[i, :], fprime=grad, bounds=bounds, maxiter=maxiter)
92 else:
93 result = fmin_l_bfgs_b(
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/scipy/optimize/lbfgsb.py in fmin_l_bfgs_b(func, x0, fprime, args, approx_grad, bounds, m, factr, pgtol, epsilon, iprint, maxfun, maxiter, disp, callback, maxls)
191
192 res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
--> 193 **opts)
194 d = {'grad': res['jac'],
195 'task': res['message'],
/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
319 _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
320 pgtol, wa, iwa, task, iprint, csave, lsave,
--> 321 isave, dsave, maxls)
322 task_str = task.tostring()
323 if task_str.startswith(b'FG'):
ValueError: failed to initialize intent(inout) array -- input not fortran contiguous
According to the rendered notebook, the output should be
INFO:elfi.methods.parameter_inference:BOLFI: Fitting the surrogate model...
INFO:elfi.methods.posteriors:Using optimized minimum value (-1.6146) of the GP discrepancy mean function as a threshold
0.6.2
3.5.3
10.12.6
It would be useful to see (non-working) code examples of the API we want to use.
Running BOLFI on the example MA2 with one scaled parameter is not giving meaningful results.
If I run the BOLFI fitting algorithm on the toy example MA2 where one parameter is scaled by the factor 10**4 for sampling and rescaled in the MA2 function the fitting breaks down. The acquisition function is not meaningful and also after running NUTS the posterior is not what it should look like.
If I am right, the fitting of the posterior should be independent of the order of magnitude of the parameters and all parameters in the GP should scale with the order. Is this correct?
I ran in this issue while trying to fit a more complex model.
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)
%matplotlib inline
import elfi
seed = (1)
def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):
# Make inputs 2d arrays for numpy broadcasting with w
t1 = np.asanyarray(t1).reshape((-1, 1))/10000 # rescale parameter t1
t2 = np.asanyarray(t2).reshape((-1, 1))
random_state = random_state or np.random
w = random_state.randn(batch_size, n_obs+2) # i.i.d. sequence ~ N(0,1)
x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]
return x
# true parameters
t1_true = 0.6 * 10000 # scale true parameter t1
t2_true = 0.2
y_obs = MA2(t1_true, t2_true)
# Plot the observed sequence
#plt.figure(figsize=(11, 6));
#plt.plot(y_obs.ravel());
# To illustrate the stochasticity, let's plot a couple of more observations with the same true parameters:
plt.plot(MA2(t1_true, t2_true).ravel());
plt.plot(MA2(t1_true, t2_true).ravel());
# a node is defined by giving a distribution from scipy.stats together with any arguments (here 0 and 2)
t1 = elfi.Prior(scipy.stats.uniform, 0, 2 *10000) # scale prior distribution of parameter t1
t2 = elfi.Prior('uniform', 0, 2)
Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)
def autocov(x, lag=1):
C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)
return C
S1 = elfi.Summary(autocov, Y)
S2 = elfi.Summary(autocov, Y, 2) # the optional keyword lag is given the value 2
# Finish the model with the final node that calculates the squared distance (S1_sim-S1_obs)**2 + (S2_sim-S2_obs)**2
d = elfi.Distance('euclidean', S1, S2)
#elfi.draw(d) # just give it a node in the model, or the model itself (d.model)
log_d = elfi.Operation(np.log, d.model['d'])
# scale also bounds and acquisition noise to have the appropriate std
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10,
bounds={'t1':(-20000, 20000), 't2':(-1, 1)}, acq_noise_var=[0.1*10000**2, 0.1], seed=seed)
#%%time
post = bolfi.fit(n_evidence=200)
#%%time
result_BOLFI = bolfi.sample(1000, info_freq=100)
The acquisition function is screwed:
and also the marginals are uninformative:
I expected a similar shape of the functions as for the unscaled model.
As per the title, calling the summary
method of elfi.methods.results.Sample
object throws NoneType error in Python interactive mode
This is following the MA(2) model in the ELFI documentation.
import numpy as np
import scipy.stats
seed = 20170530
np.random.seed(seed)
def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):
# Make inputs 2d arrays for numpy broadcasting with w
t1 = np.asanyarray(t1).reshape((-1, 1))
t2 = np.asanyarray(t2).reshape((-1, 1))
random_state = random_state or np.random
w = random_state.randn(batch_size, n_obs+2) # i.i.d. sequence ~ N(0,1)
x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]
return x
t1_true = 0.6; t2_true = 0.2
y_obs = MA2(t1_true, t2_true)
import elfi
t1 = elfi.Prior(scipy.stats.uniform, 0, 2)
t2 = elfi.Prior('uniform', 0, 2)
Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)
def autocov(x, lag=1):
C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)
return C
S1 = elfi.Summary(autocov, Y); S2 = elfi.Summary(autocov, Y, 2)
d = elfi.Distance('euclidean', S1, S2)
rej = elfi.Rejection(d, batch_size=10000, seed=seed)
result = rej.sample(1000, quantile=0.01)
result.summary()
In [2]: result.summary()
Method: Rejection
Number of posterior samples: 1000
Number of simulations: 100000
Threshold: 0.123
Posterior means: t1: 0.552, t2: 0.272
TypeError Traceback (most recent call last)
<ipython-input-2-c942b82acb49> in <module>()
> 1 result.summary()
TypeError: 'NoneType' object is not callable
0.6.0
3.6.1
MacOSX 10.12.5
I would like to delete a node after creating it. I have to restart my interactive Python session for this to work.
Using the example from the quickstart tutorial, I create a simulator but then delete it and try to redefine it.
import elfi
mu = elfi.Prior('uniform', -2, 4)
sigma = elfi.Prior('uniform', 1, 4)
import scipy.stats as ss
import numpy as np
def simulator(mu, sigma, batch_size=1, random_state=None):
mu, sigma = np.atleast_1d(mu, sigma)
return ss.norm.rvs(mu[:, None], sigma[:, None], size=(batch_size, 30), random_state=random_state)
def mean(y):
return np.mean(y, axis=1)
def var(y):
return np.var(y, axis=1)
# Set the generating parameters that we will try to infer
mean0 = 1
std0 = 3
# Generate some data (using a fixed seed here)
np.random.seed(20170525)
y0 = simulator(mean0, std0)
sim = elfi.Simulator(simulator, mu, sigma, observed=y0, name = "sim")
# Trying to delete the sim object ...
del sim
sim = elfi.Simulator(simulator, mu, sigma, observed=y0, name = "sim")
>> ValueError: Node sim already exists
sim
>> NameError: name 'sim' is not defined
0.6.0
3.6.1
Mac OSX 10.12.5
Parellization does not work with external operations
Code as follows:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import scipy.io as sio
import scipy.stats as ss
from elfi.methods.utils import (GMDistribution, ModelPrior, arr2d_to_batch,
batch_to_arr2d, ceil_to_batch_size, weighted_var)
import elfi
import logging
logging.basicConfig(level=logging.INFO)
elfi.set_client('multiprocessing')
command = './RecEstABC2 {0} {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {seed}'
seed = 865
alpha1 = np.log(9)
alpha2 = np.log(1.0/0.3-1.0)
alpha3 = np.log(1.0/0.7 - 1.0)
alpha4 = 0.6931471805599453
alpha5 = -0.8472978603872036
alpha6 = +0.8472978603872036
alpha7 = 0.8472978603872036
alpha9 = 1.6094
alpha8 = alpha9
alpha10 =0.3
alpha11 = 1 -(0.3*0.3 + 0.7)
alpha12 = 0.3
alpha14 = 2.302585092994046
alpha13 = alpha14
RecEst = elfi.tools.external_operation(command)
RecEst_Vec = elfi.tools.vectorize(RecEst)
y_obs = RecEst(alpha1,alpha2,alpha3,alpha4,alpha5,alpha6,alpha7,alpha8,alpha9,alpha10,alpha11,alpha12,alpha13,alpha14,seed = seed)
# %%
m = elfi.ElfiModel(name='RecEst')
elfi.Prior('norm',0,1,model=m,name='alpha1')
elfi.Prior('norm',0,1,model=m,name='alpha2')
elfi.Prior('norm',0,1,model=m,name='alpha3')
elfi.Prior('norm',0,1,model=m,name='alpha9')
m,name='alpha13')
elfi.Prior('norm',0,1,model=m,name='alpha14')
)
RecEst_node = elfi.Simulator(RecEst_Vec, m['alpha1'], m['alpha2'], m['alpha3'],
alpha4, alpha5, alpha6, alpha7,
alpha8,m['alpha9'],alpha10,alpha11,alpha12,alpha13,m['alpha14'], observed=y_obs, name='sim')
elfi.draw(m)
m['sim'].generate(3)
def chi_squared(simulated,observed):
"""Return Chi squared goodness of fit.
Adjusts for differences in magnitude between dimensions.
Parameters
----------
simulated : np.arrays
observed : tuple of np.arrays
"""
simulated = np.column_stack(simulated)
observed= np.column_stack(observed)
d = np.sum((simulated - observed)**2. / (observed*0.05)**2, axis=1)
return d
def T1(simulatedRes):
simulatedRes = np.column_stack(simulatedRes)
return simulatedRes
elfi.Summary(T1,m['sim'],name = 'Mean')
t = elfi.Discrepancy(chi_squared, m['Mean'], name='d')
elfi.draw(m)
log_d = elfi.Operation(np.log, m['d'])
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=300, update_interval=10,
bounds={'alpha1':(-3,3),'alpha2':(-3,3),'alpha3':(-3,3),'alpha9':(-2.5,2.5),
'alpha14' :(-3,3)},
acq_noise_var=[0.1, 0.1,0.1, 0.1,0.1],seed=870)
bolfi.acquisition_method
post = bolfi.fit(n_evidence=400)
bolfi.plot_gp()
bolfi.target_model
bolfi.plot_discrepancy();
bolfi.plot_state();
result_BOLFI = bolfi.sample(1000, info_freq=100,target_prob = 0.95, n_chains = 3)
opt = bolfi.extract_result()
opt.x_min
result_BOLFI
result_BOLFI.plot_traces();
result_BOLFI.plot_marginals();
#### Current Output:
Ignores parallelization or gives error and take foverver
#### ELFI Version:
0.7.3
#### Python Version:
3.6
#### Operating System:
Ubuntu 18.04
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.