
ace's Introduction

The ace Package

ace is an implementation of the Alternating Conditional Expectation (ACE) algorithm [Breiman85], which can be used to find otherwise difficult-to-find relationships between predictors and responses and as a multivariate regression tool.

The code for this project, as well as the issue tracker, etc. is hosted on GitHub. The documentation is hosted at http://partofthething.com/ace.

What is it?

ACE can be used for a variety of purposes. With it, you can:

  • build easy-to-evaluate surrogate models of data. For example, if you are optimizing input parameters to a complex and long-running simulation, you can feed the results of a parameter sweep into ACE to get a model that will instantly give you predictions of results of any combination of input within the parameter range.
  • expose interesting and meaningful relations between predictors and responses from complicated data sets. For instance, if you have survey results from 1000 people and you want to see how one answer is related to a bunch of others, ACE will help you.

The fascinating thing about ACE is that it is a non-parametric multivariate regression tool. This means that it doesn't make any assumptions about the functional form of the data. You may be used to fitting polynomials or lines to data. Well, ACE doesn't do that. It uses an iteration with a variable-span scatterplot smoother (implementing local least squares estimates) to figure out the structure of your data. As you'll see, that turns out to be a powerful difference.
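
As a rough illustration of that iteration, the sketch below alternates between smoothing against each predictor and smoothing against the response. It is not this package's code: a fixed-span running mean stands in for the variable-span smoother, and the names running_mean_smooth and ace_sketch are made up for this example.

```python
import numpy as np

def running_mean_smooth(x, z, span=0.2):
    """Estimate E[z | x] with a fixed-span running mean over x-sorted neighbors.

    This is a simplified stand-in for a variable-span smoother.
    """
    n = len(x)
    order = np.argsort(x)
    half = max(1, int(span * n) // 2)
    z_sorted = z[order]
    smooth_sorted = np.empty(n)
    for k in range(n):
        lo, hi = max(0, k - half), min(n, k + half + 1)
        smooth_sorted[k] = z_sorted[lo:hi].mean()
    smooth = np.empty(n)
    smooth[order] = smooth_sorted  # restore the original ordering
    return smooth

def ace_sketch(xs, y, n_iter=20):
    """Alternate conditional expectations for a fixed number of sweeps."""
    y = np.asarray(y, dtype=float)
    xs = [np.asarray(x, dtype=float) for x in xs]
    theta = (y - y.mean()) / y.std()          # start from the standardized response
    phis = [np.zeros_like(y) for _ in xs]
    for _ in range(n_iter):
        for i, x in enumerate(xs):
            # Smooth what the other predictors have not yet explained.
            others = sum(phis[j] for j in range(len(xs)) if j != i)
            phis[i] = running_mean_smooth(x, theta - others)
            phis[i] -= phis[i].mean()
        # Update the response transform and rescale it to unit variance.
        theta = running_mean_smooth(y, sum(phis))
        theta = (theta - theta.mean()) / theta.std()
    return theta, phis

# Synthetic example: y depends on x through a non-linear function.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y = x**2 + rng.normal(0, 0.05, 300)
theta, phis = ace_sketch([x], y)
```

A real implementation would stop when the unexplained variance stops decreasing rather than after a fixed number of sweeps.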

Installing it

ace is available in the Python Package Index, and can be installed simply with the following.

On Linux:

sudo pip install ace

On Windows, use:

pip install ace

Directly from source:

git clone git@github.com:partofthething/ace.git
cd ace
python setup.py install

Note

If you don't have git, you can just download the source directly from here.

You can verify that the installation completed successfully by running the automated test suite in the install directory:

python -m unittest discover -bv

Using it

To use, get some sample data:

from ace.samples import wang04
x, y = wang04.build_sample_ace_problem_wang04(N=200)

and run:

from ace import model
myace = model.Model()
myace.build_model_from_xy(x, y)
myace.eval([0.1, 0.2, 0.5, 0.3, 0.5])

For some plotting (matplotlib required), try:

from ace import ace
ace.plot_transforms(myace.ace, fname = 'mytransforms.pdf')
myace.ace.write_transforms_to_file(fname = 'mytransforms.txt')

Note that you could alternatively have loaded your data from a whitespace delimited text file:

myace.build_model_from_txt(fname = 'myinput.txt')

Warning

The more data points ACE is given as input, the better the results will be. Be careful with fewer than about 50 data points.

Demo

A combination of various functions with noise is shown below:

Plot of the input data, which is all over the place

Given just those points and zero knowledge of the underlying functions, ACE comes back with this:

Plot of the output transforms, which clearly show the underlying structure

A longer version of this demo is available in the Sample ACE Problems section.

Other details

This implementation of ACE isn't as fast as the original FORTRAN version, but it can still crunch through a problem with 5 independent variables having 1000 observations each in about 15 seconds. Not bad.

ace also contains a pure-Python implementation of Friedman's SuperSmoother [Friedman82], the variable-span smoother mentioned above. This can be useful on its own for smoothing scatterplot data.

History

The ACE algorithm was published in 1985 by Breiman and Friedman [Breiman85], and the original FORTRAN source code is available from Friedman's webpage.

Motivation

Before this package, the ACE algorithm was available in Python only by using the rpy2 module to load the acepack package of the R statistical language. This package is a pure-Python rewrite of the ACE algorithm based on the original publication, using modern software practices. It is slower than the original FORTRAN code, but it is easier to understand. It should be suitable for medium-weight data and as a learning tool.

For the record, it is also quite easy to run the original FORTRAN code in Python using f2py.

About the Author

This package was originated by Nick Touran, a nuclear engineer specializing in reactor physics. He was exposed to ACE by his thesis advisor, Professor John Lee, and used it in his Ph.D. dissertation to evaluate objective functions in a multidisciplinary design optimization study of nuclear reactor cores [Touran12].

License

This package is released under the MIT License, reproduced here.

References

Breiman85

L. BREIMAN and J. H. FRIEDMAN, "Estimating optimal transformations for multiple regression and correlation," Journal of the American Statistical Association, 80, 580 (1985). [Link1]

Friedman82

J. H. FRIEDMAN and W. STUETZLE, "Smoothing of scatterplots," ORION-003, Stanford University, (1982). [Link2]

Touran12

N. TOURAN, "A Modal Expansion Equilibrium Cycle Perturbation Method for Optimizing High Burnup Fast Reactors," Ph.D. dissertation, Univ. of Michigan, (2012). [The Thesis]

Wang04

D. WANG and M. MURPHY, "Estimating optimal transformations for multiple regression using the ACE algorithm," Journal of Data Science, 2, 329 (2004). [Link3]

ace's People

Contributors

partofthething, simplyknownasg

ace's Issues

wang04 random seed too big

from ace import ace
import numpy as np
np.__version__

'1.10.2'

from ace.samples import wang04
x, y = wang04.build_sample_ace_problem_wang04(N=200)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-448739fc9080> in <module>()
----> 1 from ace.samples import wang04
      2 x, y = wang04.build_sample_ace_problem_wang04(N=200)

/Users/paulperry/anaconda/lib/python2.7/site-packages/ace/samples/wang04.py in <module>()
      7 from ace import ace
      8 
----> 9 numpy.random.seed(9287349087)
     10 
     11 def build_sample_ace_problem_wang04(N=100):

mtrand.pyx in mtrand.RandomState.seed (numpy/random/mtrand/mtrand.c:7781)()

ValueError: Seed must be between 0 and 4294967295

'Model' object has no attribute 'x'

This is a problem with the README.rst documentation.

import matplotlib.pyplot as plt
%matplotlib inline
from ace import ace
ace.plot_transforms(myace, fname = 'mytransforms.pdf')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-3ea400bca898> in <module>()
      2 get_ipython().magic(u'matplotlib inline')
      3 from ace import ace
----> 4 ace.plot_transforms(myace, fname = 'mytransforms.pdf')

/Users/paulperry/anaconda/lib/python2.7/site-packages/ace/ace.pyc in plot_transforms(ace_model, fname)
    241     plt.rcParams.update({'font.size': 8})
    242     plt.figure()
--> 243     numCols = len(ace_model.x) / 2 + 1
    244     for i in range(len(ace_model.x)):
    245         plt.subplot(numCols, 2, i + 1)

AttributeError: 'Model' object has no attribute 'x'

Changing it to the following works:
ace.plot_transforms(myace.ace, fname = 'mytransforms.pdf')

random seed is too large

The random seed used in many of the files is too large on 64-bit Windows.

    numpy.random.seed(9287349087)
  File "mtrand.pyx", line 646, in mtrand.RandomState.seed (numpy\random\mtrand\mtrand.c:7697)
ValueError: Seed must be between 0 and 4294967295
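
One possible workaround, sketched here under the assumption that any reproducible seed is acceptable, is to fold the value into the range named in the error message before seeding. The helper to_valid_seed is hypothetical and is not part of ace or NumPy, and this is not necessarily how the package resolved the issue.

```python
import numpy as np

MAX_SEED = 2**32 - 1  # upper bound quoted in the ValueError (4294967295)

def to_valid_seed(seed):
    """Fold a non-negative integer into the range [0, 2**32 - 1]."""
    return seed % (MAX_SEED + 1)

seed = to_valid_seed(9287349087)  # 9287349087 - 2 * 2**32 = 697414495
np.random.seed(seed)              # accepted, since the value is now in range
```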

Deriving the transformation equations/forms

I am not sure if this is an issue, or a request for improvement/documentation.

In your ACE implementation, is there a way to expose directly the equations/forms for the transformations? If not, what would you recommend, running OLS in pairs (e.g. phi0-x0, phi1-x1, and so on)?
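
For what it's worth, the pairwise-fit idea could be sketched as below. The transforms are tabulated point by point, so an ordinary least-squares polynomial is fitted to each (x, phi) pair and the fit quality is checked. The arrays here are synthetic stand-ins; with ace they would presumably come from the solver's x and x_transforms attributes, which appear in other code on this page. The helper name and the choice of polynomial degree are assumptions.

```python
import numpy as np

def fit_polynomial_to_transform(x, phi, degree=3):
    """Least-squares polynomial fit of a tabulated transform phi(x).

    Returns the coefficients (highest power first) and the R^2 of the fit.
    """
    coeffs = np.polyfit(x, phi, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((phi - fitted) ** 2)
    ss_tot = np.sum((phi - np.mean(phi)) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot

# Synthetic stand-in for one predictor and its transform.
x0 = np.linspace(-1, 1, 50)
phi0 = 2.0 * x0**2 - 0.5
coeffs, r2 = fit_polynomial_to_transform(x0, phi0, degree=2)
```

A low R^2 would suggest that the chosen functional form does not describe the transform well and that another form should be tried.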

divide by zero encountered in double scalar

Hi there.
I am able to run the Wang example without problems.
However, when I try to adapt it for use with my own data, I get the error message attached below:
[image: divide]

Do you know what might be causing this? I am including my data so you can try to replicate the issue. I realize it is not a lot of data points, but I suspect that is unrelated to the issue.

[attachment: data_final.txt]

Use ACE in a predictive sense?

Following up on a closed issue, I have more questions.
I went back and reviewed the literature papers; if I understand those examples correctly, I think that one could use ACE in a predictive sense in a couple of ways:

  1. use the magnitude of the transforms as a measure of the strength of the relationship between the original independent predictors and the target, even though as you say, there are no functional forms for the transforms

  2. predict the target given new measurements of the predictors, using the inverse relationship between theta and Y; e.g., using the example from Wang and Murphy:
    [image]
    or, pictorially, using my example from here:
    [image]

Originally posted by @mycarta in #11 (comment)
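
The second idea can be sketched with plain linear interpolation: evaluate each tabulated phi_i at the new predictor values, add them up, and then read y off the tabulated theta(y) curve in reverse. This is only a sketch on synthetic data. It assumes theta is monotonically increasing in y so that the inverse exists, and predict_y is a made-up name, not an ace function.

```python
import numpy as np

def predict_y(x_new, x_train, phi_train, y_train, theta_train):
    """Predict y for one new observation from tabulated ACE-style transforms."""
    total = 0.0
    for xi_new, xi, phi in zip(x_new, x_train, phi_train):
        order = np.argsort(xi)
        # Evaluate phi_i at the new value by interpolating the tabulated transform.
        total += np.interp(xi_new, xi[order], phi[order])
    # Invert theta(y): interpolate y as a function of theta (assumes monotone theta).
    order = np.argsort(theta_train)
    return np.interp(total, theta_train[order], y_train[order])

# Synthetic transforms: theta(y) = log(y), phi1(x1) = x1, phi2(x2) = x2**2.
rng = np.random.default_rng(0)
x1 = np.linspace(0.0, 1.0, 101)
x2 = rng.permutation(np.linspace(-1.0, 1.0, 101))
phi1, phi2 = x1, x2**2
theta = phi1 + phi2
y = np.exp(theta)

y_hat = predict_y([0.5, 0.5], [x1, x2], [phi1, phi2], y, theta)
```

Here y_hat should land close to exp(0.75), the exact value for this synthetic case. New points outside the training range are clamped to the end values by np.interp, so extrapolation is not meaningful.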

ACE giving wiggly results for Wang's test problem

The excellent test problem in [Wang, "Estimating Optimal Transformations for Multiple Regression Using the ACE Algorithm"] is acting up a little. On one hand, the basic shapes of the various components are being recovered. On the other hand, they are very noisy and not nearly as good as in the paper. This suggests something is still slightly wrong in ACE.

[image: ace_results]

ACE algorithm is too slow

The initial implementation of ACE that's active does not make use of the update capability of the fixed-span smoother. This makes ACE really slow, as confirmed by this profiling result (made with gprof2dot):

[image: ace_profiled]

Next step is to just modify the fixed-span smoother to update intelligently. Should be easy.
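
To illustrate what "updating intelligently" could mean, here is a minimal sketch using a running mean rather than the package's actual smoother: as the window slides along the sorted data, points entering and leaving are added to and subtracted from a running total instead of recomputing each window from scratch. The function name is hypothetical.

```python
def running_mean_with_updates(values, window):
    """Fixed-window running mean over sorted data, computed in O(n).

    The window around index k is values[k - half : k + half + 1], clipped
    at the ends. The running total is updated incrementally as it slides.
    """
    n = len(values)
    half = window // 2
    means = []
    lo, hi = 0, 0      # the current window is values[lo:hi]
    total = 0.0
    for k in range(n):
        new_lo, new_hi = max(0, k - half), min(n, k + half + 1)
        while hi < new_hi:      # points entering on the right
            total += values[hi]
            hi += 1
        while lo < new_lo:      # points leaving on the left
            total -= values[lo]
            lo += 1
        means.append(total / (hi - lo))
    return means

print(running_mean_with_updates([1, 2, 3, 4, 5], 3))
# prints [1.5, 2.0, 3.0, 4.0, 4.5]
```

The same bookkeeping extends to a local linear fit by also keeping running sums of x, x*y and x*x.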

ACE sometimes gives negative Maximal Correlation values

Maximal Correlation (MC) values should always be between 0 and 1. However, when I calculate the MC values of x1 and x2 with y for values of
x1 = [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]
x2 = [ 2., 5., 9., 7., 4., 8., 1., 6., 3., 10.]
y = [ 3., 9., 11., 8., 4., 15., 14., 20., 30., 32.]
I get a negative MC between x2 and y.

Running the same problem using the R library acepack yields an MC value within the proper range.

Python calculation:

from ace import ace
from scipy import stats
import numpy as np

def ACE(x, y):
    '''
    Output MCs: Maximal Correlations (MCs) for each variable x
    Input x: list of 1D numpy arrays, one for each input variable
    Input y: 1D numpy array of responses
    '''
    ace_solver = ace.ACESolver()
    ace_solver.specify_data_set(x, y)
    ace_solver.solve()
    MCs = []  # maximal correlations, one per input variable
    for i in range(len(x)):
        # Pearson correlation between the i-th x transform and the y transform
        (MC, Pval) = stats.pearsonr(ace_solver.x_transforms[i], ace_solver.y_transform)
        MCs.append(MC)
    return MCs

x = [np.array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]), np.array([ 2.,  5.,  9.,  7.,  4.,  8.,  1.,  6.,  3., 10.])]
y = np.array([ 3.,  9., 11.,  8.,  4., 15., 14., 20., 30., 32.])
MCs = ACE(x, y)
print('MCs = ', MCs)

yields
MCs = [0.9523, -0.0577]

Meaning the Maximal Correlation value between x2 and y is -0.058.

R acepack calculation:

library(acepack)
x1 = 1:10
x2 = c(2.,  5.,  9.,  7.,  4.,  8.,  1.,  6.,  3., 10.)
x <- cbind(x1, x2)
y = c( 3.,  9., 11.,  8.,  4., 15., 14., 20., 30., 32.)
ace_model = ace(x, y)
MC = cor(ace_model$tx, ace_model$ty)

yields MC values of

x1 0.9427068
x2 0.3442552

Giving a positive Maximal Correlation value between x2 and y of 0.344

ACE not giving proper numbers

The ACE algorithm is running and converging, but it doesn't converge to the right answer. What is the issue?

There's a good chance that I'm not applying the conditional expectations correctly. Right now I'm just using the supersmoother S(y|x) as a direct replacement for E(y|x). The FORTRAN code is too unreadable to see how it's supposed to be done.

Plotting in ace.py:227 accepts only an int, but len(myace.ace.x)/2 is a float

I ran the sample Wang example and hit this issue because the plotting code computes
num_cols = len(ace_model.x) / 2 + 1
which evaluates to 3.5 for the five x arrays in the sample. I then removed one of the arrays in x so that there are four, but plotting still raised an error because the result is still a float:

ValueError: Number of rows must be a positive integer, not 3.0

PS.: "Successfully installed ace-0.3.2"
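
The behavior reported here follows from the division operator. In Python 3, `/` always returns a float, while floor division with `//` keeps the result an int. A minimal sketch of an integer-safe row count follows. The helper name subplot_rows is made up for illustration, and this is not necessarily the fix applied in the package.

```python
def subplot_rows(num_plots, num_cols=2):
    """Number of subplot rows as an int, using floor division."""
    return num_plots // num_cols + 1

for n in (4, 5):
    # true division gives a float even when the value is whole
    print(n, n / 2 + 1, subplot_rows(n))
# prints:
# 4 3.0 3
# 5 3.5 3
```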
