elfi-dev / elfi Goto Github PK

View Code? Open in Web Editor NEW

261.0 261.0 60.0 2.21 MB

ELFI - Engine for Likelihood-Free Inference

Home Page: http://elfi.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 98.48% Makefile 0.56% C++ 0.89% Dockerfile 0.06%

bayesian bayesian-inference inference likelihood-free python simulator statistics

elfi's People

Contributors

Stargazers

Watchers

elfi's Issues

Running a model using BOLFI with parallelization

Summary:

Please provide a short couple sentence summary.
I am trying to run a model using BOLFI with parallelization.
I am running this in PyCharm and I am not using a Jupyter notebook.

Description:

Describe the issue as clearly as possible.
I am using kernel Matern32 thus I am passing a GPyRegression as the target model.
I already tried including the import like this:

   ipyclient = elfi.get_client().ipp_client
   with ipyclient[:].sync_imports():
        import GPy

However, i am getting the same error.

Reproducible Steps:

Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.

If you have a reproducible example, please include it.

  matern = GPy.kern.Matern32(10, 0.01, 0.2)
  matern32 = elfi.GPyRegression(parameter_names=ml.parameter_names,kernel=matern, 
                     noise_var=0.05, bounds={'c0': (-1, 1), 'c1': (-1, 1), 'c2': (-1, 1), 'c3': (-1, 1), 'c4': (-1, 1),
                                               'c5': (-1, 1), 'c6':(-1,1),'c7':(-1,1),'c8':(-1,1),'c9':(-1,1)})
  bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10, target_model=matern32,seed=seed,async=True)
  bolfi.fit(n_evidence=200)
  result_BOLFI = bolfi.sample(1000, info_freq=1000)
  print(result_BOLFI)

Current Output:

The current output. Knowing what is the current behavior is useful.
I am getting this error after running the script:
ipyparallel.error.RemoteError: AttributeError('Matern32' object has no attribute '_name')

ELFI Version: 0.7

Python Version: 3.5

Operating System: Ubuntu 16.4

ABC for parameter estimation in ODE-based model

Summary:

I am trying to use an ABC method in order to estimate the parameters of a system of ordinary differential equations.

Description:

I'd like to use ELFI and I'm using code gathered from the tutorials. The problem is the following: one of the variables of the model is computed from a conservation equation. That is, an ODE is related to a quantity which is computed as the difference between the total amount of that quantity (which has to be estimated) and the free amount of that quantity (which is passed to the function as an initial condition). The problem is that in doing so, I am taking the difference between a prior distribution and a float:

TypeError: In executing node '_simulator_feee': unsupported operand type(s) for -: 'Prior' and 'float'.

Reproducible Steps:

import numpy as np
from scipy.integrate import odeint
from matplotlib import pyplot as plt
from pandas import read_csv
mcn_observed = np.asarray(read_csv("../data/mcn_sol.csv", sep = ",", header = None))
mcn_observed = mcn_observed.flatten()

# This is the system of ODEs. Note the conservation equations
def MCN(states, t, CCP, params):
    
    # Parameters to be fitted
    k_SMPF,k_ASS,k_DISS,k_DRUM1,k_IRUM1,k_I2RUM1,k_1SLP1,V_2SLP1,J_1SLP1,J_2SLP1, \
    k_1IE,V_2IE,J_1IE,J_2IE,k_DMPFRUM1,V_SRUM1,k_ARUM1,k_DRUM1P,V_WEE1,k_WEE1, \
    J_1WEE1,J_2WEE1,WEE1T,k_CDC25,V_CDC25,J_1CDC25,J_2CDC25,mu,k_1WEE1,k_2WEE1, \
    k_1CDC25,k_2CDC25,k_D1CYC,k_D2CYC,CDC25T,IET,SLP1T,alpha,k_DX = params
    
    k_ASS    = 100 # 1
    k_I2RUM1 = 50  # 5
    k_ARUM1  = 35  # 16
    k_DRUM1P = 250 # 17
    
    # Variables of the system, coming from the initial conditions
    MPF, MPFP, SLP1A, IEA, MPFRUM1, RUM1, RUM1P, WEE1, CDC25P, MASS = states
    
    # Conservation equations
    SLP1  = SLP1T - SLP1A
    IE    = IET - IEA
    WEE1P = WEE1T - WEE1
    CDC25 = CDC25T - CDC25P
    
    dMPFdt = k_SMPF*MASS - k_WEE1*MPF + k_CDC25*MPFP - \
            (k_D1CYC+k_D2CYC*SLP1A)*MPF - k_ASS*RUM1*MPF + \
            + (k_DISS+k_DRUM1+k_IRUM1)*MPFRUM1 + k_DX*CCP*MPFRUM1
    dMPFPdt = k_WEE1*MPF - k_CDC25*MPFP - (k_D1CYC+k_D2CYC*SLP1A)*MPFP
    dSLP1Adt = k_1SLP1*IEA*SLP1/(J_1SLP1+SLP1) - \
            V_2SLP1*SLP1A/(J_2SLP1+SLP1A)
    dIEAdt = k_1IE*(MPF+alpha*MPFP)*IE/(J_1IE+IE) - V_2IE*IEA/(J_2IE+IEA)
    dMPFRUM1dt = k_ASS*RUM1*MPF - \
            (k_DISS+k_DRUM1+k_IRUM1+k_DMPFRUM1)*MPFRUM1 - \
            k_DX*CCP*MPFRUM1
    dRUM1dt = V_SRUM1 - k_ASS*RUM1*MPF + (k_DISS+k_DMPFRUM1)*MPFRUM1 - \
            k_I2RUM1*MPFP*RUM1 + k_ARUM1*RUM1P - k_DRUM1*RUM1 - \
            k_DX*CCP*RUM1 + k_DX*CCP*MPFRUM1 - k_DX*CCP*RUM1
    dRUM1Pdt = k_IRUM1*MPFRUM1 + k_I2RUM1*MPFP*RUM1 - \
            k_ARUM1*RUM1P - k_DRUM1P*(MPF+alpha*MPFP)*RUM1P - \
            k_DRUM1*RUM1P + k_DX*CCP*RUM1 + k_DX*CCP*MPFRUM1 - \
            k_DX*CCP*RUM1P
    dWEE1dt = V_WEE1*WEE1P/(J_1WEE1+WEE1P) - \
            k_WEE1*(MPF+alpha*MPFP)*WEE1/(J_2WEE1+WEE1)
    dCDC25Pdt = k_CDC25*(MPF+alpha*MPFP)*CDC25/(J_1CDC25+CDC25) - \
            V_CDC25*CDC25P/(J_2CDC25+CDC25P)
    dMASSdt = mu*MASS
    
    system = [dMPFdt, dMPFPdt, dSLP1Adt, dIEAdt, dMPFRUM1dt, \
              dRUM1dt, dRUM1Pdt, dWEE1dt, dCDC25Pdt, dMASSdt]
    
    return system

# Priors
k_SMPF     = elfi.Prior('uniform', 0.0, 1.0, name = 'k_SMPF')
k_ASS      = elfi.Prior('uniform', 0.0, 1.0, name = 'k_ASS')
k_DISS     = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DISS')
k_DRUM1    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DRUM1')
k_IRUM1    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_IRUM1')
k_I2RUM1   = elfi.Prior('uniform', 0.0, 1.0, name = 'k_I2RUM1')
k_1SLP1    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1SLP1')
V_2SLP1    = elfi.Prior('uniform', 0.0, 1.0, name = 'V_2SLP1')
J_1SLP1    = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1SLP1')
J_2SLP1    = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2SLP1')
k_1IE      = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1IE')
V_2IE      = elfi.Prior('uniform', 0.0, 1.0, name = 'V_2IE')
J_1IE      = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1IE')
J_2IE      = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2IE')
k_DMPFRUM1 = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DMPFRUM1')
V_SRUM1    = elfi.Prior('uniform', 0.0, 1.0, name = 'V_SRUM1')
k_ARUM1    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_ARUM1')
k_DRUM1P   = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DRUM1P')
V_WEE1     = elfi.Prior('uniform', 0.0, 1.0, name = 'V_WEE1')
k_WEE1     = elfi.Prior('uniform', 0.0, 1.0, name = 'k_WEE1')
J_1WEE1    = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1WEE1')
J_2WEE1    = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2WEE1')
WEE1T      = elfi.Prior('uniform', 0.0, 1.0, name = 'WEE1T')
k_CDC25    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_CDC25')
V_CDC25    = elfi.Prior('uniform', 0.0, 1.0, name = 'V_CDC25')
J_1CDC25   = elfi.Prior('uniform', 0.0, 1.0, name = 'J_1CDC25')
J_2CDC25   = elfi.Prior('uniform', 0.0, 1.0, name = 'J_2CDC25')
mu         = elfi.Prior('uniform', 0.0, 1.0, name = 'mu')
k_1WEE1    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1WEE1')
k_2WEE1    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_2WEE1')
k_1CDC25   = elfi.Prior('uniform', 0.0, 1.0, name = 'k_1CDC25')
k_2CDC25   = elfi.Prior('uniform', 0.0, 1.0, name = 'k_2CDC25')
k_D1CYC    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_D1CYC')
k_D2CYC    = elfi.Prior('uniform', 0.0, 1.0, name = 'k_D2CYC')
CDC25T     = elfi.Prior('uniform', 0.0, 1.0, name = 'CDC25T')
IET        = elfi.Prior('uniform', 0.0, 1.0, name = 'IET')
SLP1T      = elfi.Prior('uniform', 0.0, 1.0, name = 'SLP1T')
alpha      = elfi.Prior('uniform', 0.0, 1.0, name = 'alpha')
k_DX       = elfi.Prior('uniform', 0.0, 1.0, name = 'k_DX')

# Function for the simulator
def sim(params, batch_size = 1, random_state = None):
    
    # Initial conditions
    ivp = np.random.randint(low = 0, high = 5, size = 10)
    
    # Time
    stop_time = 500
    t = np.linspace(0, stop_time, stop_time*10)
    
    # Odeint
    res = odeint(MCN, ivp, t, args = (0, params))
    res = np.asarray(result).flatten()
    
    return res

params = list((k_SMPF,k_ASS,k_DISS,k_DRUM1,k_IRUM1,k_I2RUM1,k_1SLP1,V_2SLP1,J_1SLP1,J_2SLP1, \
k_1IE,V_2IE,J_1IE,J_2IE,k_DMPFRUM1,V_SRUM1,k_ARUM1,k_DRUM1P,V_WEE1,k_WEE1, \
J_1WEE1,J_2WEE1,WEE1T,k_CDC25,V_CDC25,J_1CDC25,J_2CDC25,mu,k_1WEE1,k_2WEE1, \
k_1CDC25,k_2CDC25,k_D1CYC,k_D2CYC,CDC25T,IET,SLP1T,alpha,k_DX))

elfi_sim = elfi.Simulator(sim, params, observed = mcn_observed)
elfi_sim.generate()

When I run elfi_sim.generate(), I get the error. How can I deal with this situation?
Moreover, when I first tried to use ELFI for this very same problem, I did not get this error but it was complaining about the fact that res from the odeint step was not 1D.

ELFI Version: latest version

Python Version: 3.5

Operating System: Mac OS

Use Dask for parallel computing?

Hi all, I'm just listening to a talk by Jukka and learning about ELFI for the first time, sounds like an amazing project!

I just thought I would mention a potential connection with another project called Dask, which is providing a parallel computing framework for Python that can be run on a wide variety of parallel & distributed computing platforms, including local multicore, public cloud and traditional HPC environments. I saw ELFI supports multiprocessing and ipyparallel, so Dask could potentially be used instead of these and make it easier to then deploy and run on a broader range of computing backends.

No need to respond, feel free to close the issue straight away, just thought I'd make the connection.

Is BOLFI robust to scaling of the parameters?

Summary:

Running BOLFI on the example MA2 with one scaled parameter is not giving meaningful results.

Description:

If I run the BOLFI fitting algorithm on the toy example MA2 where one parameter is scaled by the factor 10**4 for sampling and rescaled in the MA2 function the fitting breaks down. The acquisition function is not meaningful and also after running NUTS the posterior is not what it should look like.
If I am right, the fitting of the posterior should be independent of the order of magnitude of the parameters and all parameters in the GP should scale with the order. Is this correct?
I ran in this issue while trying to fit a more complex model.

Reproducible Steps:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)

%matplotlib inline
import elfi
seed = (1)

def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):
    # Make inputs 2d arrays for numpy broadcasting with w
    t1 = np.asanyarray(t1).reshape((-1, 1))/10000 # rescale parameter t1
    t2 = np.asanyarray(t2).reshape((-1, 1))
    random_state = random_state or np.random

    w = random_state.randn(batch_size, n_obs+2)  # i.i.d. sequence ~ N(0,1)
    x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]
    return x

# true parameters
t1_true = 0.6 * 10000 # scale true parameter t1
t2_true = 0.2

y_obs = MA2(t1_true, t2_true)

# Plot the observed sequence
#plt.figure(figsize=(11, 6));
#plt.plot(y_obs.ravel());

# To illustrate the stochasticity, let's plot a couple of more observations with the same true parameters:
plt.plot(MA2(t1_true, t2_true).ravel());
plt.plot(MA2(t1_true, t2_true).ravel());

# a node is defined by giving a distribution from scipy.stats together with any arguments (here 0 and 2)
t1 = elfi.Prior(scipy.stats.uniform, 0, 2 *10000) # scale prior distribution of parameter t1
t2 = elfi.Prior('uniform', 0, 2)

Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)

def autocov(x, lag=1):
    C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)
    return C

S1 = elfi.Summary(autocov, Y)
S2 = elfi.Summary(autocov, Y, 2)  # the optional keyword lag is given the value 2

# Finish the model with the final node that calculates the squared distance (S1_sim-S1_obs)**2 + (S2_sim-S2_obs)**2
d = elfi.Distance('euclidean', S1, S2)

#elfi.draw(d)  # just give it a node in the model, or the model itself (d.model)

log_d = elfi.Operation(np.log, d.model['d'])
# scale also bounds and acquisition noise to have the appropriate std
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10,
                   bounds={'t1':(-20000, 20000), 't2':(-1, 1)}, acq_noise_var=[0.1*10000**2, 0.1], seed=seed)

#%%time
post = bolfi.fit(n_evidence=200)

#%%time 
result_BOLFI = bolfi.sample(1000, info_freq=100)

Current Output:

The acquisition function is screwed:

and also the marginals are uninformative:

Expected Output:

I expected a similar shape of the functions as for the unscaled model.

ELFI Version: 0.7

Python Version: 3.5

Operating System:

threshold value returned when specifying 'n_sim' in Rejection sampler

Problem Statement)
Hello, I have a simple question in understanding of setting a threshold value when I only pass 'n_sim' and 'n_samples' arguments to the rejection sampler.

When inspecting the source code of Rejection class in elfi.methods.parameter_inference code,
there exists a explanation of determining a threshold value when 'n_sim' is passed:
(https://elfi.readthedocs.io/en/latest/_modules/elfi/methods/parameter_inference.html#Rejection.set_objective)

n_sim : int Total number of simulations. The threshold will be the n_samples **smallest** discrepancy among n_sim simulations.

Why the threshold is set to smallest discrepancy among n_samples (i.e. accepted samples)?

I think it is reasonable to set a threshold value to be the largest discrepancy value among collected n_samples in n_sim simulations.

For example, when I set n_samples = 10, n_sim=10000,
finished rejection sampling,
sorted discrepancy values of 1000 simulated samples in an increasing order.

As a result, I got the sequence of discrepancy values:
1, 1.2, 1.4, 4, 4.3, 5, 7, 8, 9, 10, 11, 12, 13, ... , 10000.

Then, shouldn't the threshold value be 10? not 1?
Did I make a mistake in understanding the implementation?
(Reasoning: in Biau et al. 2013, (https://arxiv.org/pdf/1207.6461.pdf)
we can infer the fact that a threshold value is at least smaller than or equal to an infimum of a discrepancy of rejected samples)

Question)

1-1
If my understanding is correct, is there any reference on such an implementation?
1-2
In what part of a code should I change when I want the returned threshold value to be the largest discrepancy value among n_samples?
2
If wrong, what is the returned threshold value in my example?
Is it 10?
Thank you!

Tightly pinned requirements are becoming out of date

Summary:

Some of the requirements are very tightly pinned, e.g. dask[distributed]==2.30.0, and also out of date with regards to the latest developments. dask is now up to 2021.9. Thus, our projects using elfi are stuck with the old version of dask.

Description:

The requirements of this library are falling behind the latest developments of its dependencies. For example,

dask is now up to 2021.9 but elfi is tightly pinned to 2.30.0
numpy is up to 1.20; elfi is at 0.12.1

If these requirements could be updated or relaxed, then projects dependent on elfi could access new features of the other libraries.

Reproducible Steps:

N/A

Current Output:

Environment with old version of dependency libraries.

Expected Output:

Updated or relaxed dependency definitions.

ELFI Version: `0.8.0`

Python Version: `3.7.10`

Operating System: macOS BigSur 11.6

Paths to bdm cpp code in documentation incorrect

Summary:

The paths to the cpp code in the documentation notebook are incorrect.

Description:

The paths are listed as resources/cpp and ./resources/cpp/bdm whereas they should be ./resources and ./resources/bdm respectively.

Reproducible Steps:

The documentation for using non-Python operations (here) (as it currently stands) does not run the example as expected.

Current Output:

Currently will return make: *** ./resources/cpp: No such file or directory. Stop.

Expected Output:

No output is expected but the cpp code needs to be compiled/linked and the executable moved, these currently don't occur.

ELFI Version:

0.6.0

Python Version:

3.6.1

Operating System:

Mac OS 10.12.5

elfi.Prior and NoneType error

Summary:

The first line in the quickstart tutorial throws an error on my system. Not entirely sure what I'm doing wrong. Thanks in advance for the help!

Reproducible Steps:

The following reproduces my error.

import elfi
elfi.Prior('uniform', -2, 4)

Current Output:

The current error is pasted here:

>>> mu = elfi.Prior('uniform', -2, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 676, in __init__
    super(Prior, self).__init__(distribution, *params, size=size, **kwargs)
  File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 595, in __init__
    super(RandomVariable, self).__init__(*params, state=state, **kwargs)
  File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 531, in __init__
    super(StochasticMixin, self).__init__(*parents, state=state, **kwargs)
  File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 345, in __init__
    name = self._give_name(name, model)
  File "~/Library/Python/3.6/lib/python/site-packages/elfi/model/elfi_model.py", line 472, in _give_name
    while re.match('\s*super\(', info.code_context[0]):
TypeError: 'NoneType' object is not subscriptable

ELFI Version:

>>> import elfi; elfi.__version__
'0.6.0'
>>> import numpy; numpy.__version__
'1.13.1'
>>> import scipy; scipy.__version__
'0.19.1'

Python Version:

$ python3 --version
Python 3.6.1

Operating System:

Mac OS 10.12.5

$ sw_vers -productVersion 
10.12.5

Different mean functions for the Gaussian process

Is it possible to define the mean function for the Gaussian process.

A quadratic one for example?

If so, can you provide the syntax for that?

Thank you.

sampling times in bolfi.sample

Summary:

Wall time in bolfi.sample seems to depend on the n_evidence parameter in bolfi.fit while I think this should not be the case.

Description:

The advantage in using a surrogate of the discrepancies fitted using Gaussian processes, is that, once the GP hyperparameters are determined, MCMC sampling (via bolfi.sample) should be completely independent of the model simulator, thus resulting in a scalable algorithm.
If I have not misunderstood the theory in the relevant paper, once bolfi.fit(n_evidence=xxx) has completed, the execution of bolfi.sample(numsamples) to produce numsaples posterior samples should require the same wall-clock time, no matter the value I set for n_evidence. This is because bolfi.samples only invokes the fitted GP and never the true model simulator.

This is clearly not the case, hence I wonder what I am missing.

Reproducible Steps:

I build up on the example in https://elfi.readthedocs.io/en/latest/usage/BOLFI.html

I run all the code before the call to elfi. BOLFI. Then I run the following:

bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10, bounds={'t1':(-2, 2), 't2':(-1, 1)}, acq_noise_var=[0.1, 0.1], seed=seed)

Then I fit the GP model on 100 evidence points:
%time post = bolfi.fit(n_evidence=100)
INFO:elfi.methods.posteriors:Using optimized minimum value (-1.6966) of the GP discrepancy mean function as a threshold
Wall time: 24.7 s

and produce 1000 posterior samples
%time result_BOLFI = bolfi.sample(1000, info_freq=1000)
4 chains of 1000 iterations acquired. Effective sample size and Rhat for each parameter:
t1 2063.7620137995186 1.0014860681196993
t2 2415.9426764728387 0.9994192510776999
Wall time: 57.4 s

Now I enlarge the number of evidence points to 300:
%time post = bolfi.fit(n_evidence=300)
INFO:elfi.methods.posteriors:Using optimized minimum value (-1.6316) of the GP discrepancy mean function as a threshold
Wall time: 1min 4s

and sample 1000 draws:

%time result_BOLFI = bolfi.sample(1000, info_freq=1000)
4 chains of 1000 iterations acquired. Effective sample size and Rhat for each parameter:
t1 1908.9201493103064 1.0000733568191036
t2 2329.652810214964 1.0009504568577294
Wall time: 1min 35s

Expected Output:

So posterior sampling time has increased from 57" to 1'35" in the two attempts. Shouldn't the sampling times be roughly the same?

ELFI Version:

0.7.3

Python Version:

3.5.6

Operating System:

Windows 10

Please explain how "schedule" works with SMC ABC

Summary:

Please provide more detail on how the SMC ABC sampling works; in particular how the "schedule" affects sampling and performance.

Description:

The tutorial is quite brief on the SMC ABC sampling:

For sampling, one has to define the number of output samples, the number of populations and a schedule i.e. a list of quantiles to use for each population. In essence, a population is just refined rejection sampling.

N = 1000
schedule = [0.7, 0.2, 0.05]
%time result_smc = smc.sample(N, schedule)

Does that mean each population is a new prior for a rejection sampling round with the given quantile?

I would expect the performance of such an algorithm to be comparable to rejection sampling with quantile = 0.70.20.05 but the time it takes to sample for SMC ABC is much longer.

Thank you for any explanation or link!

Also, a progress bar similar to PyMC sampling would be very much appreciated.

ELFI Version: 0.7

Python Version: 3.6

Operating System: Ubuntu 16.04

Support for Spark, TensorFlow

Summary:

Investigate possibility to use Spark and/or TensorFlow.

SMC when prior pdf=0

Summary:

SMC should not propose forbidden particles.

Description:

A prior pdf may forbid certain values for particles. These should not be allowed by the SMC proposals either.

ABC rejection comparing two time series in deterministic Lotka-Volterra model

Summary:

Running a deterministic Lotka-Volterra example parameterised using ABC rejection. This follows the example outlined in Toni et al, (2009) where two parameters are to be found in predator-prey system. It is required to choose parameters that minimize the sum of squared errors between the observed and simulated time series. Priors for the parameters are taken from the paper, observed data are estimated from the paper above.

Rather than outputting a multi-dimensional array from the LV model, I've concatenated results from the model into a 1D array.

Not quite sure which of these functions I need to change. Perhaps the LV() function needs to allow processing of vector inputs? Any help would be much appreciated!

Description:

The Lotka-Volterra system of ODEs is created in Python, priors are defined, a Simulator object is defined (and tested), distance measure defined (and tested), ABC rejection sampler defined, but sampling from the rejection sampler throws an error.

Reproducible Steps:

The following code reproduces the error.

import numpy as np
import elfi
from scipy.integrate import odeint

seed = 2017
np.random.seed(seed)

# Define the observed data and flatten the array
y_obs_lv = np.array([[1.9, 0.5],[0.63, 2.6],[0.2, 1.55],[0.3, 0.03],
    [1.65, 1.15],[1.15, 1.68],[0.25, 1.1],[2.94, 0.94]])
y_obs_lv = y_obs_lv.flatten()

# Define the Lotka-Volterra derivative function
def LVderivative(xy, t, a, b):
    x, y = xy
    
    dx = a*x - x*y
    dy = b*x*y - y
    return [dx, dy]

# Define the function as input to the elfi.Simulator object
def LV(a, b, batch_size = 1, random_state = None):
    
    # Initial conditions state
    y0 = [1.0, 0.5]
    
    # Define times over which to solve the system of ODEs
    times = np.linspace(0, 15, 151)
    
    # Actually solve the ODE
    result = odeint(LVderivative, y0, times, args = (*a, *b))
    
    # Define the subset of times of interest
    t_obs = np.array([1.2, 2.4, 3.9, 5.7, 7.5, 9.6, 11.9, 14.5])*10
    t_obs = t_obs.astype(int)
    
    # Subset the solution and flatten the returned output
    return result[t_obs,:].flatten()

# Define the priors for alpha and beta
a = elfi.Prior('uniform', -10, 20, name = 'a')
b = elfi.Prior('uniform', -10, 20, name = 'b')

# Define the elfi.Simulator object
Y_lv = elfi.Simulator(LV, a, b, observed = y_obs_lv)

# Check the Simulator object can generate data
Y_lv.generate()
> array([  5.73756329e-02,   3.45129771e-03,   3.49858603e-03,
         8.29044191e-04,   1.06221009e-04,   1.82366487e-04,
         1.60337990e-06,   3.01316351e-05,   2.38143636e-08,
         4.98299215e-06,  -1.46843950e-09,   6.08976108e-07,
        -1.32007952e-09,   5.80888435e-08,   5.60478623e-11,
         3.97446214e-09])

# Define sum of squared errors as the distance function
def SSE(x, y):
 return np.sum((x-y)**2)
d_lv = elfi.Distance(SSE, Y_lv)

# Check the Distance object can generate SSE correctly
d_lv.generate()
> 516.5330268484762

# Run ABC-rejection
rej = elfi.Rejection(d_lv, batch_size = 1, seed = seed)
result = rej.sample(10000, threshold = 4.3)

Current Output:

Current output gives the following error

> TypeError: object of type 'numpy.float64' has no len()

ELFI Version:

0.6.0

Python Version:

3.6.1

Operating System:

OSX 10.12.5

AB Testing with elfi, bernoulli+lognorm

Summary:

I'm trying to replicate a STAN model I've been using with elfi. I've run into some trouble with Elfi and propably Python.

Description:

The lognorm and bernoulli distances (using scipy stats functions) are implemented in Python, prior is defined (using one lognorm 2,10), a Simulator object is defined (the function is tested and it ouputs similar data as in elfi examples), ABC rejection sampler defined, but sampling from the rejection sampler throws an error.

Reproducible Steps:

I have defined the simulator like this:

def legacy_updated(data, batch_size=1, random_state=None):
    data= np.atleast_1d(data)
    global result_array 
    rows = len(data)
    cols = len(data[0])
    result_array = np.array([])
    for row in range(rows):
        if(1 > row):
            for col in range(cols):
                x = data[row][col]*bernoulli_pmf(bernoulli_theta(data[None][row]))
                y = lognormal_pdf(frozen_lognormalmean(data[row][None],lognormal_sigma(data[row][None])))
                result_array = np.append(result_array,[x+y, x]) 
            return result_array.reshape(1,-1)
        else:
            for col in range(cols):
                y = data[row][col]*bernoulli_pmf((bernoulli_theta(data[None][row])))
                result_array = np.append(result_array,[y,])
            return result_array.reshape(1,-1)

def lognormal_pdf(y):
    s=0.945
    x = np.linspace(ss.lognorm.ppf(0.01, s),ss.lognorm.ppf(0.99, s))
    return ss.lognorm.pdf(x, s)

def bernoulli_pmf(y):
    p = 0.3 
    x = bernoulli_theta(y)
    return ss.bernoulli.pmf(x,p)

def log_mean(y):
    return np.mean(np.log(y),axis=0)

def log_std(y):
    return np.std(np.log(y),axis=0)

def frozen_lognormalmean(y,s):
    rv=ss.lognorm(s)
    return rv.pdf(np.mean(y))

def lognormal_sigma(y):
    s=0.945
    x = np.linspace(ss.lognorm.ppf(0.01, s),ss.lognorm.ppf(0.99, s))
    return x
    
def bernoulli_theta(y):
    p = 0.3 
    x = np.arange(ss.bernoulli.ppf(0.01, p), ss.bernoulli.ppf(9.99,p))
    return ss.bernoulli.rvs(1,p)

Current Output:

Simulator function outputs the data in the same format as in the elfi example, but when I try to generate data with simulator node I get the following error from within the loop.
`TypeError Traceback (most recent call last)
C:\Anaconda\envs\DataScienceEnv\lib\site-packages\elfi\executor.py in execute(cls, G)
69 try:
---> 70 G.node[node] = cls._run(op, node, G)
71 except Exception as exc:

C:\Anaconda\envs\DataScienceEnv\lib\site-packages\elfi\executor.py in _run(fn, node, G)
153
--> 154 output_dict = {'output': fn(*args, **kwargs)}
155 return output_dict

in legacy_updated(data, batch_size, random_state)
9 rows = len(data)
---> 10 cols = len(data[0])
11 result_array = np.array([])

TypeError: object of type 'numpy.float64' has no len()`

Expected Output:

I didn't expect the error since my function is outputting a numpy array, my python skills propably come in play also this really doesn't seem like a big error

ELFI Version:

0.3.1

Python Version:

3.6.5

Operating System:

windows 10

Fix testing against multiple Python versions

The tox virtual environments need to be setup so that tests can be run against different Python interpreters. Currently the problem seems to be that GPy can't be installed directly from pip without first installing numpy.

Controlling n_samples / batch_size ratio in elfi.Rejection

Summary:

If batch_size of a sampling method (e.g. elfi.Rejection) > n_samples , the algorithm does a simulation batch_size times, and takes only n_samples from it. It maybe a good idea to check: if batch_size > n_sample then batch_size = n_samples;

Reproducible Steps:

n_samples = 1
m = elfi.Prior('uniform', -2, 4) # or any other elfi model
rej = elfi.Rejection(m, 'dist', seed=0, batch_size=200, max_parallel_batches=32)
result = rej.sample(n_samples)

Current Output:

Doing 200 simulations and then taking only one sample;

Expected Output:

Printing a warning that the batch_size can't be > than n_samples, then doing one simulation that will be later given as a requested sample;

Cannot delete elfi node

Description:

I would like to delete a node after creating it. I have to restart my interactive Python session for this to work.

Reproducible Steps (and current output):

Using the example from the quickstart tutorial, I create a simulator but then delete it and try to redefine it.

import elfi
mu = elfi.Prior('uniform', -2, 4)
sigma = elfi.Prior('uniform', 1, 4)

import scipy.stats as ss
import numpy as np

def simulator(mu, sigma, batch_size=1, random_state=None):
    mu, sigma = np.atleast_1d(mu, sigma)
    return ss.norm.rvs(mu[:, None], sigma[:, None], size=(batch_size, 30), random_state=random_state)

def mean(y):
    return np.mean(y, axis=1)

def var(y):
    return np.var(y, axis=1)

# Set the generating parameters that we will try to infer
mean0 = 1
std0 = 3

# Generate some data (using a fixed seed here)
np.random.seed(20170525)
y0 = simulator(mean0, std0)

sim = elfi.Simulator(simulator, mu, sigma, observed=y0, name = "sim")

# Trying to delete the sim object ... 

del sim
sim = elfi.Simulator(simulator, mu, sigma, observed=y0, name = "sim")
>> ValueError: Node sim already exists

sim
>> NameError: name 'sim' is not defined

ELFI Version:

0.6.0

Python Version:

3.6.1

Operating System:

Mac OSX 10.12.5

DAFT for graphical models

FYI, you might want to consider DAFT for graphical models
http://daft-pgm.org

Convergence diagnostics

Summary:

Investigate possibilities for automatic convergence diagnostics.

Implement initial semiBSL method

Implement initial semi-parametric Bayesian Synthetic Likelihood as described by Ziwen An in https://link.springer.com/article/10.1007/s11222-019-09904-x. This will be implemented as an alternative method to estimate the synthetic likelihood in the BSL class.

Posterior methodologies with Random Forests

Summary:

Currently testing a python module wrapping https://github.com/diyabc/abcranger : posterior methodologies (model choice and parameter estimation) with Random Forests on reference table.
(See the references)

Description:

I would like to know the best way to integrate the posterior methodologies into the elfi pipeline. It seems any inference method in elfi should have an "iterate" method with every new sample, but both methodologies haven't got any (they need the whole reference table at once)

See the demos at :
https://github.com/diyabc/abcranger/blob/master/testpy/Model%20Choice%20Demo.ipynb
and
https://github.com/diyabc/abcranger/blob/master/testpy/Parameter%20Estimation%20Demo.ipynb

Note that the basic rejection sampler is more than enough with those methodologies (and the threshold parameter almost doesn't matter).

Regards,

Parellelization for external operations

Summary:

Parellization does not work with external operations

Description:

Code as follows:

Reproducible Steps:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import scipy.io as sio
import scipy.stats as ss
from elfi.methods.utils import (GMDistribution, ModelPrior, arr2d_to_batch,
                                batch_to_arr2d, ceil_to_batch_size, weighted_var)

import elfi

import logging
logging.basicConfig(level=logging.INFO)

elfi.set_client('multiprocessing')

command = './RecEstABC2 {0} {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {seed}'

seed = 865

alpha1 = np.log(9)
alpha2 = np.log(1.0/0.3-1.0)
alpha3 = np.log(1.0/0.7 - 1.0)
alpha4 = 0.6931471805599453
alpha5 = -0.8472978603872036
alpha6 = +0.8472978603872036
alpha7 = 0.8472978603872036
alpha9 = 1.6094
alpha8 = alpha9
alpha10 =0.3
alpha11 = 1 -(0.3*0.3 + 0.7)
alpha12 = 0.3
alpha14 = 2.302585092994046
alpha13 = alpha14





RecEst = elfi.tools.external_operation(command)

RecEst_Vec = elfi.tools.vectorize(RecEst)


y_obs = RecEst(alpha1,alpha2,alpha3,alpha4,alpha5,alpha6,alpha7,alpha8,alpha9,alpha10,alpha11,alpha12,alpha13,alpha14,seed = seed)


# %%

m = elfi.ElfiModel(name='RecEst')
elfi.Prior('norm',0,1,model=m,name='alpha1')
elfi.Prior('norm',0,1,model=m,name='alpha2')
elfi.Prior('norm',0,1,model=m,name='alpha3')
elfi.Prior('norm',0,1,model=m,name='alpha9')
m,name='alpha13')
elfi.Prior('norm',0,1,model=m,name='alpha14')

)


RecEst_node = elfi.Simulator(RecEst_Vec, m['alpha1'], m['alpha2'], m['alpha3'], 
                               alpha4, alpha5,  alpha6, alpha7,  
                             alpha8,m['alpha9'],alpha10,alpha11,alpha12,alpha13,m['alpha14'], observed=y_obs, name='sim')



elfi.draw(m)


m['sim'].generate(3)



def chi_squared(simulated,observed):
    """Return Chi squared goodness of fit.

    Adjusts for differences in magnitude between dimensions.

    Parameters
    ----------
    simulated : np.arrays
    observed : tuple of np.arrays

    """
    simulated = np.column_stack(simulated)
    observed= np.column_stack(observed)
    d = np.sum((simulated - observed)**2. / (observed*0.05)**2, axis=1)
    return d

def T1(simulatedRes):
               
      simulatedRes = np.column_stack(simulatedRes)
        return simulatedRes




elfi.Summary(T1,m['sim'],name = 'Mean')

t = elfi.Discrepancy(chi_squared, m['Mean'], name='d')

elfi.draw(m)

log_d = elfi.Operation(np.log, m['d'])


bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=300, update_interval=10,
                   bounds={'alpha1':(-3,3),'alpha2':(-3,3),'alpha3':(-3,3),'alpha9':(-2.5,2.5),
                           'alpha14' :(-3,3)},
                           acq_noise_var=[0.1, 0.1,0.1, 0.1,0.1],seed=870)



bolfi.acquisition_method


post = bolfi.fit(n_evidence=400)



bolfi.plot_gp()

bolfi.target_model


bolfi.plot_discrepancy();

bolfi.plot_state();

result_BOLFI = bolfi.sample(1000, info_freq=100,target_prob = 0.95, n_chains = 3)

opt = bolfi.extract_result()
opt.x_min

result_BOLFI

result_BOLFI.plot_traces();

result_BOLFI.plot_marginals();




#### Current Output:
Ignores parallelization or gives error and take foverver





#### ELFI Version:

0.7.3

#### Python Version:

3.6
#### Operating System:
Ubuntu 18.04

Initialization of BOLFI

Summary:

Initialize the GP in BOLFI with existing simulations.

Description:

Current GP/BOLFI starts the acquisition procedure without using any existing information, which may waste resources.

networkx 2 compatibility

elfi is incompatible with networkx 2, but networkx 1.11 is incompatible with matplotlib 3.

This means that it is impossible to plot with networkx and matplotlib 3 if using Elfi in the same environment.

To reproduce, install elfi, matplotlib 3.

import networkx as nx
G = nx.random_regular_graph(2, 100)
nx.draw(G)

this leads to:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-34f682212305> in <module>
      1 G = nx.random_regular_graph(2, 100)
----> 2 nx.draw(G)

~/miniconda3/envs/ENV/lib/python3.7/site-packages/networkx/drawing/nx_pylab.py in draw(G, pos, ax, hold, **kwds)
    124     if 'with_labels' not in kwds:
    125         kwds['with_labels'] = 'labels' in kwds
--> 126     b = plt.ishold()
    127     # allow callers to override the hold state by passing hold=True|False
    128     h = kwds.pop('hold', None)

**AttributeError: module 'matplotlib.pyplot' has no attribute 'ishold'**

ELFI Version: 0.7.5
matplotlib version: 3.2.1
networkx version: 1.11
Python Version: 3.7.6
OS: MacOS

summary method of elfi.methods.results.Sample object throws NoneType error in Python interactive mode

Summary:

As per the title, calling the summary method of elfi.methods.results.Sample object throws NoneType error in Python interactive mode

Description:

This is following the MA(2) model in the ELFI documentation.

Reproducible Steps:

import numpy as np
import scipy.stats

seed = 20170530
np.random.seed(seed)

def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):
    # Make inputs 2d arrays for numpy broadcasting with w
    t1 = np.asanyarray(t1).reshape((-1, 1))
    t2 = np.asanyarray(t2).reshape((-1, 1))
    random_state = random_state or np.random

    w = random_state.randn(batch_size, n_obs+2)  # i.i.d. sequence ~ N(0,1)
    x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]
    return x

t1_true = 0.6; t2_true = 0.2
y_obs = MA2(t1_true, t2_true)


import elfi

t1 = elfi.Prior(scipy.stats.uniform, 0, 2)
t2 = elfi.Prior('uniform', 0, 2)

Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)

def autocov(x, lag=1):
    C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)
    return C

S1 = elfi.Summary(autocov, Y); S2 = elfi.Summary(autocov, Y, 2)

d = elfi.Distance('euclidean', S1, S2)
rej = elfi.Rejection(d, batch_size=10000, seed=seed)

result = rej.sample(1000, quantile=0.01)
result.summary()

Current Output:

In [2]: result.summary()
Method: Rejection
Number of posterior samples: 1000
Number of simulations: 100000
Threshold: 0.123
Posterior means: t1: 0.552, t2: 0.272

TypeError                                 Traceback (most recent call last)
<ipython-input-2-c942b82acb49> in <module>()
> 1 result.summary()

TypeError: 'NoneType' object is not callable

ELFI Version:

0.6.0

Python Version:

3.6.1

Operating System:

MacOSX 10.12.5

stochastic_optimization call to differential_evolution broken in newer scipy

Summary:

After acquiring evidence from the simulator, an error is thrown on the call to scipy's differential_evolution.

Description:

I think this is possibly an issue due to a change in scipy. If I downgrade to scipy 1.3.1 everything works, but with version >=1.5 (the default installed from conda) I get this error. You could pin the version of scipy in the conda recipe for elfi, but I would suggest updating if possible as doing this is likely to lead to unresolvable environments for many people.

Here is my environment:

# Name                    Version                   Build  Channel
appnope                   0.1.0           py36h9f0ad1d_1001    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2020.6.20        py36h9f0ad1d_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
decorator                 4.4.2                      py_0    conda-forge
elfi                      0.7.4                      py_2    conda-forge
freetype                  2.10.2               h8da9a1a_0    conda-forge
gpy                       1.9.9            py36h255dfe6_3    conda-forge
graphviz                  2.42.3               h98dfb87_0    conda-forge
ipykernel                 5.3.4            py36h95af2a2_0    conda-forge
ipyparallel               6.3.0            py36h9f0ad1d_0    conda-forge
ipython                   7.16.1           py36h95af2a2_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jedi                      0.17.2           py36h9f0ad1d_0    conda-forge
joblib                    0.16.0                     py_0    conda-forge
jpeg                      9d                   h0b31af3_0    conda-forge
jupyter_client            6.1.6                      py_0    conda-forge
jupyter_core              4.6.3            py36h9f0ad1d_1    conda-forge
kiwisolver                1.2.0            py36h863e41a_0    conda-forge
lcms2                     2.11                 h174193d_0    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libcxx                    10.0.1               h5f48129_0    conda-forge
libffi                    3.2.1             h4a8c4bd_1007    conda-forge
libgfortran               4.0.0                         2    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libllvm9                  9.0.1                h7475705_1    conda-forge
libopenblas               0.3.10          openmp_h63d9170_3    conda-forge
libpng                    1.6.37               hbbe82c9_1    conda-forge
libsodium                 1.0.17               h01d97ff_0    conda-forge
libtiff                   4.1.0                h2ae36a8_6    conda-forge
libwebp-base              1.1.0                h0b31af3_3    conda-forge
llvm-openmp               10.0.1               h28b9765_0    conda-forge
llvmlite                  0.33.0           py36hdce38a9_1    conda-forge
lz4-c                     1.9.2                h4a8c4bd_1    conda-forge
matplotlib                3.3.0                         1    conda-forge
matplotlib-base           3.3.0            py36h534ab7b_1    conda-forge
ncurses                   6.2                  hb1e8313_1    conda-forge
networkx                  1.11                     py36_0    conda-forge
numba                     0.50.1           py36h18adda2_1    conda-forge
numpy                     1.19.0           py36h4a66613_0    conda-forge
olefile                   0.46                       py_0    conda-forge
openssl                   1.1.1g               h0b31af3_0    conda-forge
paramz                    0.9.5                      py_0    conda-forge
parso                     0.7.0              pyh9f0ad1d_0    conda-forge
pexpect                   4.8.0            py36h9f0ad1d_1    conda-forge
pickleshare               0.7.5           py36h9f0ad1d_1001    conda-forge
pillow                    7.2.0            py36h2ae5dfa_1    conda-forge
pip                       20.1.1                     py_1    conda-forge
prompt-toolkit            3.0.5                      py_1    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pygments                  2.6.1                      py_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
python                    3.6.10          h4334963_1011_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.6                     1_cp36m    conda-forge
pyzmq                     19.0.1           py36h820b253_0    conda-forge
readline                  8.0                  h0678c8f_2    conda-forge
scikit-learn              0.23.1           py36hef903b7_0    conda-forge
scipy                     1.5.1            py36h1dac7e4_0    conda-forge
setuptools                49.2.0           py36h9f0ad1d_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.32.3               h93121df_1    conda-forge
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
tk                        8.6.10               hbbe82c9_0    conda-forge
toolz                     0.10.0                     py_0    conda-forge
tornado                   6.0.4            py36h37b9a7d_1    conda-forge
traitlets                 4.3.3            py36h9f0ad1d_1    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h0b31af3_1    conda-forge
zeromq                    4.3.2                h6de7cb9_2    conda-forge
zlib                      1.2.11            h0b31af3_1006    conda-forge
zstd                      1.4.5                h0384e3a_1    conda-forge

Reproducible Steps:

Run conda create -n elfi elfi==0.7.4 scipy>=1.5 numpy && conda activate elfi
Run any model with BOLFI, e.g.:

bolfi = elfi.BOLFI(d, batch_size=1, initial_evidence=20, update_interval=10,
                    bounds={'alphaprior1':(0, 5), 'alphaprior2':(0, 5)},
                    acq_noise_var=[0.01, 0.01], seed=1)

post = bolfi.fit(n_evidence=400)

Current Output:

After the evidence acquisition is complete:

Progress: |██████████████████████████████████████████████████| 100.0% Complete
Traceback (most recent call last):
  File "ELFI.simulator.py", line 61, in <module>
    post = bolfi.fit(n_evidence=40)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/parameter_inference.py", line 1191, in fit
    self.infer(n_evidence, bar=bar)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/parameter_inference.py", line 276, in infer
    return self.extract_result()
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/parameter_inference.py", line 953, in extract_result
    self.target_model.predict_mean, self.target_model.bounds, seed=self.seed)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/elfi/methods/bo/utils.py", line 32, in stochastic_optimization
    func=fun, bounds=bounds, maxiter=maxiter, polish=polish, init='latinhypercube', seed=seed)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentialevolution.py", line 308, in differential_evolution
    ret = solver.solve()
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentialevolution.py", line 814, in solve
    constraints=self.constraints)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_minimize.py", line 618, in minimize
    callback=callback, **options)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py", line 308, in _minimize_lbfgsb
    finite_diff_rel_step=finite_diff_rel_step)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 262, in _prepare_scalar_function
    finite_diff_rel_step, bounds, epsilon=epsilon)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py", line 95, in __init__
    self._update_grad()
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py", line 171, in _update_grad
    self._update_grad_impl()
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py", line 92, in update_grad
    **finite_diff_options)
  File "/Users/jlees/miniconda3/envs/elfi/lib/python3.6/site-packages/scipy/optimize/_numdiff.py", line 388, in approx_derivative
    raise ValueError("`f0` passed has more than 1 dimension.")
ValueError: `f0` passed has more than 1 dimension.

Expected Output:

Gives me a correct bolfi object, with a lower version of scipy

ELFI Version:

0.7.4

Python Version:

3.6

Operating System:

OS X 10.15

Implement initial standard BSL

Implement initial working standard Bayesian Synthetic Likelihood method as described by Leah Price in https://doi.org/10.1080/10618600.2017.1302882.

Can rejection sampling be done with no explicit passing of observed data to a simulator?

For example,
in ELFI tutorial...
We should specify observed data like below:

Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)

for calculating a discrepancy like:

d = elfi.Distance('euclidean', S1, S2)

What I wonder is,
without explicitly passing 'observed' argument to the Simulator object,
can we make it possible to do ABC-rejection by modifying discrepancy function?
e.g. instead of passing observed data to Simulator,
calculate all discrepancies between simulated samples and observed samples in a custom Distance function.

Node.reset()

When the DAG structure is changed, other nodes not part of the change need sometimes to be notified about it, so that they can e.g. reset their data. Currently this is achieved through Node.reset. However the implementation is non optimal at the moment and should be refactored.

Make it possible to use saved values with ElfiStore

Elfi nodes should be able to use previously stored data. Being able to safely produce additional data to the stored data requires that we know at least the master seed and sub stream index, so that we don't repeat already computed values that are in the store.

Some tentative ideas

Add to ElfiStore an optinal mask for stored values. When using numpy arrays, use the numpy.ma

SMC Sampling fails with multivariate distribution

Summary:

When a sequence of thresholds having length > 1 is passed to smc, the inference fails with ValueError: In executing node '_theta_logpdf': Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (10,) and x.shape = (100000,)..

Description:

Using a simulator of a multinomial model, having a dirichlet prior, attempting to run smc for a sequence of thresholds produces an error.

Reproducible Steps:

import numpy as np
import scipy.stats as ss
import elfi
num_categories = 10
num_experiments = 100
theta = elfi.Prior('dirichlet', np.ones(num_categories), name='theta')
def simulator(num_experiments, theta, batch_size=1, random_state=None):
    if len(theta.shape)==1:
        theta = theta.reshape(1,-1)
    return_value = []

    for i in range(theta.shape[0]):
        return_value.append(
            ss.multinomial.rvs(
                100, 
                theta[i,:], 
                random_state=random_state
            )
        )
    return np.array(return_value)
                       
def mean(y):
    return np.mean(y, axis=1)

def ret_identity(y):
    return y

def distance_in_variation(p1,observed=[]):
    arr_p1 = np.asarray(p1)
    y = np.tile(observed[0],(np.asarray(p1).shape[0],1))
    
    return 0.5*np.sum(np.abs(arr_p1-y),axis=1)

theta0 = np.random.dirichlet(np.ones(num_categories))
np.random.seed(20191126) 
y0 = simulator(num_experiments,theta0)

from functools import partial
# Add the simulator node and observed data to the model
sim_fn = partial(simulator,num_experiments)
sim = elfi.Simulator(sim_fn, theta, observed=y0)
d = elfi.Discrepancy(
    distance_in_variation, 
    elfi.Summary(ret_identity, sim, name='ret_identity'), 
    name='d')

smc = elfi.SMC(d, batch_size=10000)

N = 1000
schedule = [100,99]
%time result_smc = smc.sample(N, schedule)

Current Output:

When the threshold schedule is [100], everything runs. When it is changed to [100,99] it fails with the error:

Progress: |█████████████████████████-------------------------| 50.0% Complete
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in execute(cls, G)
     69                 try:
---> 70                     G.node[node] = cls._run(op, node, G)
     71                 except Exception as exc:

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in _run(fn, node, G)
    153 
--> 154         output_dict = {'output': fn(*args, **kwargs)}
    155         return output_dict

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in logpdf(self, x, alpha)
   1448         alpha = _dirichlet_check_parameters(alpha)
-> 1449         x = _dirichlet_check_input(alpha, x)
   1450 

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in _dirichlet_check_input(alpha, x)
   1242                          "parameter vector 'a', but alpha.shape = %s "
-> 1243                          "and x.shape = %s." % (alpha.shape, x.shape))
   1244 

ValueError: Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (10,) and x.shape = (100000,).

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<timed exec> in <module>

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in sample(self, n_samples, *args, **kwargs)
    404         bar = kwargs.pop('bar', True)
    405 
--> 406         return self.infer(n_samples, *args, bar=bar, **kwargs)
    407 
    408     def _extract_result_kwargs(self):

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in infer(self, vis, bar, *args, **kwargs)
    262 
    263         while not self.finished:
--> 264             self.iterate()
    265             if vis:
    266                 self.plot_state(interactive=True, **vis_opt)

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in iterate(self)
    299         # Submit new batches if allowed
    300         while self._allow_submit(self.batches.next_index):
--> 301             next_batch = self.prepare_new_batch(self.batches.next_index)
    302             logger.debug("Submitting batch %d" % self.batches.next_index)
    303             self.batches.submit(next_batch)

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/parameter_inference.py in prepare_new_batch(self, batch_index)
    716         params = GMDistribution.rvs(*self._gm_params, size=self.batch_size,
    717                                     prior_logpdf=self._prior.logpdf,
--> 718                                     random_state=self._round_random_state)
    719 
    720         batch = arr2d_to_batch(params, self.parameter_names)

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/utils.py in rvs(cls, means, cov, weights, size, prior_logpdf, random_state)
    229             # check validity of x
    230             if prior_logpdf is not None:
--> 231                 x = x[np.isfinite(prior_logpdf(x))]
    232 
    233             n_accepted1 = len(x)

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/utils.py in logpdf(self, x)
    356     def logpdf(self, x):
    357         """Return the log density of the joint prior at x."""
--> 358         return self._evaluate_pdf(x, log=True)
    359 
    360     def _evaluate_pdf(self, x, log=False):

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/methods/utils.py in _evaluate_pdf(self, x, log)
    380             loaded_net.node[k] = {'output': v}
    381 
--> 382         val = self.client.compute(loaded_net)[node]
    383         if ndim == 0 or (ndim == 1 and self.dim > 1):
    384             val = val[0]

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/client.py in compute(self, loaded_net)
    271     def compute(self, loaded_net):
    272         """Request evaluation of `loaded_net` and wait for result."""
--> 273         return self.apply_sync(Executor.execute, loaded_net)
    274 
    275     @property

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/clients/native.py in apply_sync(self, kallable, *args, **kwargs)
     51 
     52         """
---> 53         return kallable(*args, **kwargs)
     54 
     55     def get_result(self, task_id):

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in execute(cls, G)
     71                 except Exception as exc:
     72                     raise exc.__class__("In executing node '{}': {}."
---> 73                                         .format(node, exc)).with_traceback(exc.__traceback__)
     74 
     75             elif 'output' not in attr:

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in execute(cls, G)
     68                 op = attr['operation']
     69                 try:
---> 70                     G.node[node] = cls._run(op, node, G)
     71                 except Exception as exc:
     72                     raise exc.__class__("In executing node '{}': {}."

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/elfi/executor.py in _run(fn, node, G)
    152         args = [a[1] for a in sorted(args, key=itemgetter(0))]
    153 
--> 154         output_dict = {'output': fn(*args, **kwargs)}
    155         return output_dict
    156 

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in logpdf(self, x, alpha)
   1447         """
   1448         alpha = _dirichlet_check_parameters(alpha)
-> 1449         x = _dirichlet_check_input(alpha, x)
   1450 
   1451         out = self._logpdf(x, alpha)

~/.pyenv/versions/3.7.4/envs/work/lib/python3.7/site-packages/scipy/stats/_multivariate.py in _dirichlet_check_input(alpha, x)
   1241                          "of entries as, or one entry fewer than, "
   1242                          "parameter vector 'a', but alpha.shape = %s "
-> 1243                          "and x.shape = %s." % (alpha.shape, x.shape))
   1244 
   1245     if x.shape[0] != alpha.shape[0]:

ValueError: In executing node '_theta_logpdf': Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (10,) and x.shape = (100000,)..

Expected Output:

I expect the output to have samples similar to running with threshold [99].

ELFI Version:

0.7.4

Python Version:

3.7.4

Operating System:

MAC OSX

ValueError when running BOLFI: `input not fortran contiguous`

Description:

I'm getting the following ValueError when running BOLFI fit() function:

ValueError: failed to initialize intent(inout) array -- input not fortran contiguous

Reproducible Steps:

Download BOLFI example notebook:

https://github.com/elfi-dev/notebooks/blob/master/BOLFI.ipynb

Execute cell with this line:

%time post = bolfi.fit(n_evidence=200)

Current Output:

Here's the full trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-e609f23bdce1> in <module>()
      1 get_ipython().magic('time')
----> 2 post = bolfi.fit(n_evidence=200)

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in fit(self, n_evidence, threshold)
   1153                 'You must specify the number of evidence (n_evidence) for the fitting')
   1154 
-> 1155         self.infer(n_evidence)
   1156         return self.extract_posterior(threshold)
   1157 

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in infer(self, vis, *args, **kwargs)
    247 
    248         while not self.finished:
--> 249             self.iterate()
    250             if vis:
    251                 self.plot_state(interactive=True, **vis_opt)

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in iterate(self)
    280         # Submit new batches if allowed
    281         while self._allow_submit(self.batches.next_index):
--> 282             next_batch = self.prepare_new_batch(self.batches.next_index)
    283             logger.debug("Submitting batch %d" % self.batches.next_index)
    284             self.batches.submit(next_batch)

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/parameter_inference.py in prepare_new_batch(self, batch_index)
    989         acquisition = self.state['acquisition']
    990         if len(acquisition) == 0:
--> 991             acquisition = self.acquisition_method.acquire(self.acq_batch_size, t=t)
    992 
    993         batch = arr2d_to_batch(acquisition[:self.batch_size], self.parameter_names)

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/bo/acquisition.py in acquire(self, n, t)
    123             self.n_inits,
    124             self.max_opt_iters,
--> 125             random_state=self.random_state)
    126 
    127         # Create n copies of the minimum

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/elfi/methods/bo/utils.py in minimize(fun, bounds, grad, prior, n_start_points, maxiter, random_state)
     89         if grad is not None:
     90             result = fmin_l_bfgs_b(
---> 91                 fun, start_points[i, :], fprime=grad, bounds=bounds, maxiter=maxiter)
     92         else:
     93             result = fmin_l_bfgs_b(

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/scipy/optimize/lbfgsb.py in fmin_l_bfgs_b(func, x0, fprime, args, approx_grad, bounds, m, factr, pgtol, epsilon, iprint, maxfun, maxiter, disp, callback, maxls)
    191 
    192     res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
--> 193                            **opts)
    194     d = {'grad': res['jac'],
    195          'task': res['message'],

/Users/wdeback/anaconda/envs/py35/lib/python3.5/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    319         _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
    320                        pgtol, wa, iwa, task, iprint, csave, lsave,
--> 321                        isave, dsave, maxls)
    322         task_str = task.tostring()
    323         if task_str.startswith(b'FG'):

ValueError: failed to initialize intent(inout) array -- input not fortran contiguous

Expected Output:

According to the rendered notebook, the output should be

INFO:elfi.methods.parameter_inference:BOLFI: Fitting the surrogate model...
INFO:elfi.methods.posteriors:Using optimized minimum value (-1.6146) of the GP discrepancy mean function as a threshold

Versions and platform:

ELFI 0.6.2
Python 3.5.3
Mac OSX 10.12.6

cannot run example scripts using BO code in python2.7, 3.4 and 3.5

Summary:

When I tried to run the example scripts in the examples folder:

python bdm.py

I get this error:

File "/home/akash/anaconda2/envs/py34/lib/python3.4/site-packages/elfi/methods/bo/utils.py", line 76
start_points[:, i] = random_state.uniform(*bounds[i], n_start_points)
^
SyntaxError: only named arguments may follow *expression

I got this error both in 2.7 and 3.4. After this I tried to create a new environment and installed 3.5 virtually on anaconda, and tried to run it again, and this time I got this:
File "/home/akash/SheffieldML/GPy/core/parameterization/init.py", line 9, in
from paramz import ties_and_remappings, ObsAr
ImportError: cannot import name 'ties_and_remappings'

This is a GPy Error, because one of the dependency pacjkages: paramz is still uncompatible for python 3.5. Can you suggest something here which I should do with the installations to make it finally working.

Current Output:

The example scripts did not run and throw syntax errors.

Expected Output:

The example scripts did not run.

ELFI Version:

Python Version:2.7, 3.4, 3.5

Operating System: Linux

Refactor tests

Format the functional tests to be compatible with py.test, as currently they are not executed. We should also move all the tests directly under the test directory and use the tagging feature of pytest to limit the tests that are run (i.e. mark slow tests and run them separately).

Parallelized sampling does not work when .py file is executed with IPython or Spyder.

Summary:

Parallelized sampling does not work when .py file is executed with IPython or Spyder.

Description:

Running parallelized sampling takes forever or produces error if file is executed with Spyder or IPython, but everything works when same lines are copy pasted in IPython.

Reproducible Steps:

How to reproduce: Open Spyder, copy following code and save it:

import elfi
import numpy as np
from elfi.examples import ma2
seed = 20170530
np.random.seed(seed)

model = ma2.get_model()
elfi.set_client('multiprocessing')

rej = elfi.Rejection(model, 'd', batch_size=10000, seed=20170530)

print ("Starting sampling...")

result = rej.sample(5000, n_sim=int(1e6))
print(result.summary)

# %%

Please note that at the end of code there should be cell separator (# %%), it does not affect the bug but makes following step with Spyder easier.

Reproducing with Spyder (no error message displayed)

Then select first cell and run it (ctrl+enter). Everything should work and after a couple of second result should be printed out.

Then restart the kernel (ctrl+.), just to make sure following steps is executed in 'clean' environment.

Press F5 to execute code. Now it will take very long (or maybe forever) to complete. No error is shown however.

Reproducing with iPython (error message is displayed)

Another way to reproduce this error is just run the code from console with IPython, then the error in "Current Output section" is displayed. When code is just copy pasted inside IPython it works just as well as executing cell in Spyder.

Current Output:

The code keeps printing following error message in loop, when executed with IPython:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\ossig\Google Drive\Koulujutut\ABC\Elfibugi2.py", line 8, in <module>
    elfi.set_client('multiprocessing')
  File "C:\ProgramData\Anaconda3\lib\site-packages\elfi\client.py", line 37, in set_client
    client = m.Client()
  File "C:\ProgramData\Anaconda3\lib\site-packages\elfi\clients\multiprocessing.py", line 30, in __init__
    self.pool = multiprocessing.Pool(processes=num_processes)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Expected Output:

Starting sampling...
bound method Sample.summary of Method: Rejection
Number of samples: 5000
Number of simulations: 1000000
Threshold: 0.0952
Sample means: t1: 0.87, t2: 0.231

ELFI Version:

0.7.1

Python Version:

3.6.5

Operating System:

Windows 10, version 1709, 64 bit
CPU: Intel i7-5600U

Combine vectorization wrappers

All the vectorized wrappers for different nodes could be combined to one vectorized function. Currently they share a lot of code.

How to do it:
For positional arguments take the i:th element from each. If it is a tuple (as in distance node, take the i:th element from each array in the tuple). n_sim would be always one. If prng (stochastic mixin) is given, pass it as a keyword argument to the operation function.

when run smc 100 times it caused problem

Summary:

Please provide a short couple sentence summary.

I am doing smc ABC using elfi smc as instructed on the official website tutoirial

Description:

Describe the issue as clearly as possible.

if I run once, there is no problem, but if I want to infer the parameter 100 times using a for loop, it caused problem, I can't run it, when I run the first of the loops, it's ok, and takes about 20s, the second, it becomes 1 minute, and third, it always said there is not enough examples....and elfi is trying.

Reproducible Steps:

Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.

If you have a reproducible example, please include it.

below is the code, while Ms is just the function which you could ignore. S1 is the summary statistics.

def Ms(alpha, beta, batch_size=1, random_state=None):
y = [1]
alpha = np.float(alpha)
beta = np.float(beta)
theta = [alpha, alpha*beta, tao]
theta = theta/np.sum(theta)

theta = map(float, theta)

i = 1 
while i < 20:
    if y == []:
        break
    else:
        b = np.random.multinomial(1, theta, size=1)
        i = i + 1
        if b[0][2] == 1:
            y = y + [1]
        elif b[0][0] == 1:
            q = y/np.sum(y)
            p = q.tolist()
            a = np.random.multinomial(1, p, size=1).tolist()[0]
            y = [f+g for f,g in zip(y,a)]
        else:
            i = i - 2
            q = y/np.sum(y)
            p = q.tolist()
            a = np.random.multinomial(1, p, size=1).tolist()[0]
            y = [f-g for f,g in zip(y,a)]
            while 0 in y:
                y.remove(0)
    yalpha = sorted(y)
return yalpha

def S1(x):
t1 = len(x)/20
return t1

start=time.time()
for i in range(100):
np.random.seed(seed+10000)
alpha0s = np.random.uniform(0.05,2,size=1)
beta0s = np.random.uniform(0.05,0.2,size=1)
y0s = Ms(alpha0s, beta0s)

alpha = elfi.Prior(scipy.stats.uniform, 0.005, 2)
beta = elfi.Prior(scipy.stats.uniform, 0.005, 0.2)

Ys = elfi.Simulator(Ms, alpha, beta, observed=y0s)
Sus1 = elfi.Summary(S1, Ys)    

ds1 = elfi.Distance('euclidean', Sus1)
smc = elfi.SMC(ds1, batch_size=1, seed=seed)
    
N = 1000
schedule = [0.7, 0.4, 0.05]
%time result_smc = smc.sample(N, schedule)
MSE1s = MSE1s + [np.square(result_smc.sample_means['alpha'] - alpha0s)]
MSE2s = MSE2s + [np.square(result_smc.sample_means['beta'] - beta0s)]

end = time.time()
print(end-start)
print(np.mean(MSE1s))
print(np.mean(MSE2s))

Current Output:

The current output. Knowing what is the current behavior is useful.

Expected Output:

Describe what you expect the output to be. Knowing the correct behavior is also very useful.

I want to get all the result like the first of the loop.

ELFI Version:newest

Python Version:3.6

Operating System:windows 10

quickstart page: two oddities

Summary:

the quickstart page lacks an "import elfi". Also, for clarity, priors should be defined at a latter stage

Description:

Describe the issue as clearly as possible.

I am totally new to Python, and when I say "new" I mean that I have installed it for the first time today (via Anaconda). So what follows might be silly and/or totally obvious to others, as of course some python knowledge is required to play with ELFI. But anyway, I managed to go through the quickstart https://elfi.readthedocs.io/en/latest/quickstart.html#

However, this was not pain-free, hence here are my suggestions:

please add an "import elfi" right at the top. This is completely missing and for a Python novice it will be painful to try to find the reason for those "NameError: name 'elfi' is not defined".
again in th quickstart page: it is a strange workflow to define the priors before everything else. I find this to be totally non-intuitive. My suggestion is to place the priors definition after the data (y0) have been generated.

Integrate visualization

Summary:

Although plotting tends to be simple as it is, it would be nice to integrate typical plotting scenarios.

Is it possible to store arbitrary node output?

I'm trying to use ELFI with a computationally intensive simulator and want to store some output from the simulator function and other nodes. I have tried using using OutputPool to store both simulator and summary node outputs, but the stores of the OutputPool are empty after simulation: e.g.
pool = elfi.OutputPool(["S1"]) results in {'S1': None} after simulation.

Any ideas? Ideally I would like to only save the simulation output of accepted simulations.

An elfi.Sample fails to get summary

Summary:

Please provide a short couple sentence summary.
I followed the tutorial, then
I got an elfi.Sample which fails to get summary.
While these is also error about parameters

Description:

Describe the issue as clearly as possible.

TypeError: unsupported format string passed to numpy.ndarray.format
The prior distribution of parameter beta is [0.05,0.2], while in most results, result.sample_means['beta'] is 'inf'

Reproducible Steps:

Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.

If you have a reproducible example, please include it.

import time
import numpy as np
import scipy.stats
import matplotlib
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)
import elfi

MSE1 = []
MSE2 = []
seed=3333
tao = 0.198

def M(alpha, beta, batch_size=1, random_state=None):
y = [1]
alpha = np.float(alpha)
beta = np.float(beta)
theta = [alpha, alpha*beta, tao]
theta = theta/np.sum(theta)
i = 1
while i < 20:
if y == []:
break
else:
b = np.random.multinomial(1, theta, size=1)
i = i + 1
if b[0][2] == 1:
y = y + [1]
elif b[0][0] == 1:
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f+g for f,g in zip(y,a)]
else:
i = i - 2
q = y/np.sum(y)
p = q.tolist()
a = np.random.multinomial(1, p, size=1).tolist()[0]
y = [f-g for f,g in zip(y,a)]
while 0 in y:
y.remove(0)
yalpha = sorted(y)
return yalpha

for seed in range(1000):
elfi.new_model()
np.random.seed(seed+100500)

alpha = elfi.Prior(scipy.stats.uniform, 0.05, 2, size=1)
beta = elfi.Prior(scipy.stats.uniform, 0.05, 0.2, size=1)


alpha0 = np.random.uniform(0.05,2,size=1).tolist()[0]
beta0 = np.random.uniform(0.05,0.2,size=1).tolist()[0]
y0 = M(alpha0, beta0)
   
Y = elfi.Simulator(M, alpha, beta, observed=y0)
 
def S1(x):
    t1 = len(x)/20
    return t1

def S(x):
    y = 0
    for u in x:
        y = 100*y + u
    return y

Su = elfi.Summary(S, Y)    
 
d = elfi.Distance('euclidean', Su)

rej = elfi.Rejection(d, batch_size=1, seed=seed) 
 
result = rej.sample(n_samples=10, threshold=0)


MSE1 = MSE1 + [np.square(result.sample_means['alpha'] - alpha0)]
MSE2 = MSE2 + [np.square(result.sample_means['beta'] - beta0)]

end = time.time()
end-start
np.mean(MSE1)
np.mean(MSE2)

Current Output:

The current output. Knowing what is the current behavior is useful.

output of np.mean(MSE2) is inf
and even without loop, result.summary gets error:
TypeError: unsupported format string passed to numpy.ndarray.format

Expected Output:

Describe what you expect the output to be. Knowing the correct behavior is also very useful.

ELFI Version:newest

Python Version: spider 3.6

Operating System: win10

Running same code multiple times produces different outcomes, and adds weird priors to model.

Summary:

Running same code multiple times produces different outcomes, and adds weird priors to model.

Description:

Running same code multiple times, e.g. with IPython, produces different outcomes. It seems to me that defining new priors and simulators does not handle memory correctly and they append, not overwrite, old priors and simulator.

Reproducible Steps:

How to reproduce: Open Spyder, copy following code and save it:

import elfi
import numpy as np
from elfi.examples import ma2

seed = 20170530
np.random.seed(seed)

# Define everything similarly as done in introduction. However here all code is taken from examples module.
t1_true = 0.6
t2_true = 0.2

y_obs = ma2.MA2(t1_true, t2_true)

t1 = elfi.Prior(ma2.CustomPrior1, 2)
t2 = elfi.Prior(ma2.CustomPrior2, t1, 1)

Y = elfi.Simulator(ma2.MA2, t1, t2, observed=y_obs)

S1 = elfi.Summary(ma2.autocov, Y)
S2 = elfi.Summary(ma2.autocov, Y, 2)
d = elfi.Distance('euclidean', S1, S2) #

rej = elfi.Rejection(d, batch_size=5000, seed=seed)
result = rej.sample(1000, threshold=0.5) 
print(result.summary)

Run (F5) code. Everything is fine and output should be print in couple of seconds. Then run the code again. Output is different and weird prior nodes are appeared to the model. If code is run again even more priors keep appearing.

Same behavior exists if code is run multiple times from console with IPython.

I think that problem arise from fact that running lines

t2 = elfi.Prior(ma2.CustomPrior2, t1, 1)

Y = elfi.Simulator(ma2.MA2, t1, t2, observed=y_obs)```

again, do not overwrite old prior nodes and/or simulator, but instead they append nodes to old simulator.



#### Current Output for both runs:
runfile('XXX.py', wdir='XXX')
<bound method Sample.summary of Method: Rejection
Number of samples: 1000
Number of simulations: 10000
Threshold: 0.387
Sample means: t1: 0.544, t2: 0.227
>

runfile('XXX.py', wdir='XXX')
<bound method Sample.summary of Method: Rejection
Number of samples: 1000
Number of simulations: 10000
Threshold: 0.386
Sample means: _prior_0466: 0.252, _prior_7dfe: 0.537, t1: -0.0194, t2: 0.356
>

#### Expected Output:

Output should be the same for both runs.


#### ELFI Version:
0.7.1

#### Python Version:
3.6.5

#### Operating System:

Windows 10, version 1709, 64 bit
CPU: Intel i7-5600U

Convert n_sim and n_samples in 'float' to 'int' automatically

Summary:

If n_sim and n_samples (e.g. elfi.Rejection.set_objective()) are type 'float', the code gives "TypeError".

Description:

When n_sim and n_samples (e.g. elfi.Rejection.set_objective()) are provided in the format '10e7', self.objective['n_samples'] becomes 'float', which gives "TypeError: 'float' object cannot be interpreted as an integer" in parameter_inference.py::_init_samples_lazy line "samples[node] = np.ones(shape, dtype=dtype) * np.inf". In my opinion, the format '10e7' is more convenient than 10000000 when dealing with large numbers. This can be solved by converting the variables to 'int' automatically.

Reproducible Steps:

The problem can be reproduced with any simulator:
n_sim = 10e3
n_samples = 10e2
rej = elfi.Rejection(m, 'dist')
rej.set_objective(n_samples=n_samples, n_sim=n_sim)

About multiprocessing parallel, is there anything wrong?

Summary:

Please provide a short couple sentence summary.

About multiprocessing parallel, is there anything wrong?

Description:

Describe the issue as clearly as possible.

Doing it once, it saves time, but in a for loop, it takes more time.

Reproducible Steps:

Please report steps to reproduce the issue. If it's not possible to reproduce, please include a description of how you discovered the issue.

If you have a reproducible example, please include it.

import time

import numpy as np
import scipy.stats
import matplotlib
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)

import elfi

MSE = []
MSE1 = []
t1=0
t=0

seed=55889

for seed in range(100):
np.random.seed(seed+4543)
elfi.new_model()

alpha = elfi.Prior(scipy.stats.uniform, 0.05, 1)
tao = 0.198

def M(alpha, batch_size=1, random_state=None):
    y = [1]    
    for i in range(19):
        b = np.random.binomial(1, alpha/(alpha+tao), size=1)
        if b == 0:
            y = y + [1]
        else:
            q = y/np.sum(y)
            p = q.tolist()
            a = np.random.multinomial(1, p, size=1).tolist()[0]
            y = [f+g for f,g in zip(y,a)]
    yalpha = sorted(y)
    return yalpha

theta = np.random.uniform(0.05,1,size=1)
y0 = M(theta)
   
Y = elfi.Simulator(M,alpha,observed=y0)
Y1 = elfi.Simulator(M, alpha, observed=y0)

def S(x):
    sum=np.sum([y*y for y in x])
    return sum

Su = elfi.Summary(S, Y)

def S1(x):
    t1 = len(x)/20
    return t1

def S2(x):
    q = x/np.sum(x)
    p = q.tolist()
    t2 = 1 - np.sum([a*a for a in p])
    return t2

Su1 = elfi.Summary(S1, Y1)

d = elfi.Distance('euclidean', Su,Su1)
d1 = elfi.Distance('euclidean', Su1)

rej = elfi.Rejection(d, batch_size=1, seed=seed)                             
rej1 = elfi.Rejection(d1, batch_size=1, seed=seed)
start=time.time()
**elfi.set_client('multiprocessing')**
result = rej.sample(n_samples=100, threshold=0.01)
end=time.time()
t=end-start+t
start1=time.time()
**elfi.set_client('multiprocessing')**
result1 = rej1.sample(n_samples=100, threshold=0.01)
end1=time.time()
t1=end1-start1+t1

a1 = result1.samples_array
b1 = np.sqrt(np.mean(np.square(a1-theta)))
a = result.samples_array
b = np.sqrt(np.mean(np.square(a-theta)))

MSE = MSE + [b]
MSE1 = MSE1 + [b1]

Current Output:

The current output. Knowing what is the current behavior is useful.

it has taken more than 3 hours, but without multiprocessing, around 3 hours

Expected Output:

Describe what you expect the output to be. Knowing the correct behavior is also very useful.

should be half the time, cauz when I run once, it takes about 2 minutes with parallel and 1 minute with multiprocessing.

ELFI Version: latest

Python Version: 3.6

Operating System: win10

Comparison to other ABC software

What sets ELFI apart from other commonly used ABC software? We should decide what software we want to compare against and what features should be compared. Here are some ideas:

How easy is it to formulate a inference problem? Is the problem limited to some specific domain (e.g. genetics)?
Developer friendliness: How easy is it to develop new ABC methods?
Can the ABC problem be inferred with different methods without changing the code?

Simplification of sequential simulators

elfi.simulator(name, ..., vectorized=True)

If vectorized=False, then the simulator node __init__ would just vectorize the simulator before passing it on. We would not need if clauses in simulator_operation.

Regression adjustment returns point estimate in specific case

Hi,

This issues arises from a non-standard implementation of the rejection sampler. The summary statistics I'm working with are a set of distribution quantiles; instead of simulating random samples and computing the quantiles of these samples, I've found it's much faster to directly calculate the theoretical quantiles from a given parameter set in the "simulation" step, then define dummy summary nodes which simply pick out the elements of the simulation output. However, when I apply the regression adjustment via elfi.adjust_posterior(), the posterior distribution of each parameter becomes a point estimate (i.e. the outputs arrays contain the same value repeated n_samples times).

In the example below, I attempt to model the quantiles of a chi-squared distribution with 5 degrees of freedom using the normal distribution:

import numpy as np
import scipy.stats
import elfi

seed = 20200227

# Observed data
df = 5
qtiles = np.arange(0.1, 1, 0.1)
quantiles = scipy.stats.chi2.ppf(qtiles, df)

# Simulation function
def theoretical_quantiles(q_vec, loc, scale, batch_size=1, random_state=None):
        loc, scale = np.atleast_1d(loc, scale)
        return scipy.stats.norm.ppf(q_vec, loc[:, None], scale[:, None])

# Summary function
def match_quantile(q, idx):
    return np.atleast_2d(q)[:, idx]

# Model object
mod = elfi.new_model()

# Priors
elfi.Prior(scipy.stats.norm, quantiles.mean(), quantiles.std(), model=mod, name='loc')
elfi.Prior(scipy.stats.uniform, 0, 10, model=mod, name='scale')

# Simulation node
s = elfi.Simulator(theoretical_quantiles, qtiles, mod['loc'], mod['scale'], 
               observed=quantiles, name='norm')

# Dummy summary statistic nodes 
summary_stats = list()
for idx in range(len(qtiles)):
    summary_stats.append(elfi.Summary(match_quantile, mod['norm'], idx, name=f"S{idx}"))

# Distance metric
d = elfi.Distance('euclidean', *summary_stats, name='d')

# Rejection sampling
rejection_sampler = elfi.Rejection(mod, 'd', 
                         output_names=[ss.name for ss in summary_stats], 
                         batch_size=1000, seed=seed)

res = rejection_sampler.sample(500, quantile=0.05)
adj = elfi.adjust_posterior(model=mod, sample=res, summary_names=[ss.name for ss in summary_stats])

res.plot_pairs()
adj.plot_pairs()
adj.outputs

In adj.outputs you should see an array of length n_samples for each distribution parameter (i.e. loc and scale), each containing only one unique value.

I'm not sure whether this is actually an issue with the elfi.adjust_posterior() method or not. Interestingly, when my model is setup using a simulation function which actually generates a random sample and summary nodes which actually compute empirical quantiles of the sample, I do not see this behavior.

Syntactic sugar for acquisition functions

Implement a nice syntax for combining acquisition functions. For example:

a1 = AcquisitionA(...)
a2 = AcquisitionB(...)
acquisition = a1 + a2

Suggestions for public APIs

It would be useful to see (non-working) code examples of the API we want to use.

elfi-dev / elfi Goto Github PK

elfi's People

Contributors

Stargazers

Watchers

Forkers

elfi's Issues

Summary:

Description:

Reproducible Steps:

Current Output:

ELFI Version: 0.7

Python Version: 3.5

Operating System: Ubuntu 16.4

Summary:

Description:

Reproducible Steps:

ELFI Version: latest version

Python Version: 3.5

Operating System: Mac OS

Summary:

Description:

Reproducible Steps:

Current Output:

Expected Output:

ELFI Version: 0.7

Python Version: 3.5

Operating System:

Summary:

Description:

Reproducible Steps:

Current Output:

Expected Output:

ELFI Version: 0.8.0

Python Version: 3.7.10

Operating System: macOS BigSur 11.6

Summary:

Description:

Reproducible Steps:

Current Output:

Expected Output:

ELFI Version:

Python Version:

Operating System:

Summary:

Reproducible Steps:

Current Output:

ELFI Version:

Python Version:

Operating System:

Summary:

Description:

Reproducible Steps:

Expected Output:

ELFI Version:

Python Version:

Operating System:

Summary:

Description:

ELFI Version: 0.7

Python Version: 3.6

Operating System: Ubuntu 16.04

Summary:

Summary:

Description:

Summary:

Description:

Reproducible Steps:

Current Output:

ELFI Version:

Python Version:

Operating System:

Summary:

Description:

Reproducible Steps:

Current Output:

Expected Output:

ELFI Version:

Python Version:

Operating System:

ELFI Version: `0.8.0`

Python Version: `3.7.10`