alexanderfabisch / gmr
Gaussian Mixture Regression
Home Page: https://alexanderfabisch.github.io/gmr/
License: BSD 3-Clause "New" or "Revised" License
In the R package flexmix, I can specify a model for the priors, rather than just a vector of numbers. The model can depend on some arbitrary set of associated variables, called concomitant variables.
https://cran.r-project.org/web/packages/flexmix/vignettes/mixture-regressions.pdf
The idea is that there are some explanatory variables which have some information guiding the prior distribution.
A general model class of finite mixtures of regression models is considered in the following. The
mixture is assumed to consist of K components where each component follows a parametric
distribution. Each component has a weight assigned which indicates the a-priori probability
for an observation to come from this component and the mixture distribution is given by the
weighted sum over the K components. If the weights depend on further variables, these are
referred to as concomitant variables.
flexmix is designed to be extensible, but extending it requires some level of expertise. Out of the box, it has only a single concomitant model form. It would be good to have a similar capability in Python.
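To make the idea concrete, here is a minimal numpy sketch (not part of gmr or flexmix) of a multinomial-logit gating function, the concomitant model form flexmix ships by default: the mixture weights become a softmax of a linear function of the concomitant variables.

```python
import numpy as np

def concomitant_weights(Z, coef):
    """Mixture weights from concomitant variables via multinomial logit.

    Z    : (n_samples, n_concomitant) concomitant variables.
    coef : (n_concomitant, n_components) gating coefficients.
    Returns a (n_samples, n_components) row-stochastic weight matrix.
    """
    logits = Z @ coef
    logits -= logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)      # each row sums to 1

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 2))      # 5 observations, 2 concomitant variables
coef = rng.normal(size=(2, 3))   # gating for 3 mixture components
weights = concomitant_weights(Z, coef)
```

Each row of `weights` is a valid prior vector that now varies per observation, which is exactly the extra flexibility flexmix's concomitant model provides.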
sorry ?
Hey there,
I have survey data where each person makes multiple observations across different brands.
Is it possible to use this library to extract each person's coefficients/class membership, and the adjusted r2 scores for each class?
Usually, I'd use a software called Latent Gold and was hoping this might be the python way of performing the same analysis.
It's possible my thinking is backward, but I've long considered regression from an "imputation of missing values" point of view (e.g., Schneider, 2001; see Sec. 2). Given that, the docstrings for module functions mvn.condition():
Lines 524 to 528 in 53185ff
...and mvn.regression_coefficients():
Lines 489 to 493 in 53185ff
...seem confusing to me because you refer to the "input feature" as the feature I would associate with the missing data (dependent variables), and the "output feature" as the feature I would associate with the available data (independent variables).
What's more, it appears you might, at least implicitly, share my view since, when you call mvn.condition() from the MVN.predict() and MVN.condition() methods, you "invert" the indices.
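For reference, the standard conditioning formulas that both views share, in a small self-contained numpy sketch (independent of gmr's actual implementation): given the observed dimensions, the remaining ("missing") dimensions have an affine conditional mean and a reduced covariance.

```python
import numpy as np

def condition_gaussian(mean, cov, obs_idx, x_obs):
    """Conditional distribution of the remaining dims of N(mean, cov),
    given that the dims in obs_idx take the values x_obs."""
    n = len(mean)
    mis_idx = np.setdiff1d(np.arange(n), obs_idx)
    mu_o, mu_m = mean[obs_idx], mean[mis_idx]
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_mo = cov[np.ix_(mis_idx, obs_idx)]
    S_mm = cov[np.ix_(mis_idx, mis_idx)]
    K = S_mo @ np.linalg.inv(S_oo)        # the regression coefficients
    cond_mean = mu_m + K @ (x_obs - mu_o)
    cond_cov = S_mm - K @ S_mo.T
    return cond_mean, cond_cov

mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
m, C = condition_gaussian(mean, cov, np.array([0]), np.array([2.0]))
```

In the imputation reading, `obs_idx` holds the available (independent) dims and the returned distribution describes the missing (dependent) ones, which is the naming the docstrings seem to invert.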
I recently submitted a paper to the journal of open source software. Submitted papers are typically tightly coupled with Zenodo releases. Here is the latest release of GMR: https://zenodo.org/record/4889867 and here is the paper review issue: openjournals/joss-reviews#3054
In addition to myself, Zenodo also lists you, @jfsantos and @mralbu, as authors. That is why I need to check with you both whether you want to become authors of the JOSS paper as well. Both author lists should be as similar as possible.
@jfsantos The number of lines of code that you contributed and that survived into the latest version of gmr is 8, so I suggest it would not make sense to include you as an author, if that is OK with you. Do you want to remain an author of the Zenodo release, or should I remove your name there? I would be fine with either option.
@mralbu You contributed the sklearn interface and I adopted your idea for doing faster batch mean predictions. So I would ask you whether you want to become co-author of the paper. Have a look at the latest article proof here: openjournals/joss-reviews#3054 (comment) . The current state of the review is that both reviewers accepted the paper and we are currently only discussing the authorship details.
It seems that I've run into a bug: for a GMM with a single feature, is_in_confidence_region always returns False, and so sample_confidence_region never terminates. I first noticed this with a conditioned GMM, but one constructed by hand shows the same behavior. Example:
import gmr
gmm = gmr.GMM(n_components=1, priors=[1], means=[[0]], covariances=[[[1]]])
gmm.sample(1)                          # works fine
gmm.sample_confidence_region(1, 1.0)   # never returns
gmm.is_in_confidence_region([0], 1.0)  # False
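As a workaround sketch until this is fixed, the check can be done directly: a point lies inside the alpha-confidence region iff its squared Mahalanobis distance is at most the chi-square quantile with n_features degrees of freedom. This is my reading of what the check should compute, not gmr's actual code:

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_region(x, mean, cov, alpha):
    """True if x lies inside the alpha-confidence ellipsoid of N(mean, cov)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    d = x - np.atleast_1d(mean)
    # Squared Mahalanobis distance of x from the mean:
    maha_sq = d @ np.linalg.solve(np.atleast_2d(cov), d)
    return maha_sq <= chi2.ppf(alpha, df=len(x))

inside = in_confidence_region([0.0], mean=[0.0], cov=[[1.0]], alpha=0.95)
```

With alpha=1.0 the chi-square quantile is infinite, so every point should be reported as inside, which is what the example above expects for the single-feature case.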
I'm trying to predict a next-step position (using latitude and longitude as attributes and target). I've tried the following:
pred = gmm.predict(len(X)+i, np.array([X[(num-1)+i]]))
where the first value is 10 and the second is array([[41.4051453, 2.1776344]]) with shape (1, 2).
However, I get this error:
AttributeError: 'list' object has no attribute 'shape'
What am I doing wrong?
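For what it's worth, that AttributeError usually means a plain Python list was passed where a numpy array is expected. A tiny sketch of the difference (and note that, as I understand gmr's API, the first argument of GMM.predict should be an array of input feature indices, not a count):

```python
import numpy as np

# A plain Python list has no .shape attribute, which triggers the error:
indices_list = [0, 1]
has_shape = hasattr(indices_list, "shape")   # False

# Converting to an ndarray fixes the AttributeError. gmm.predict(indices, X)
# then receives the indices of the input features (here 0 and 1, i.e.
# latitude and longitude) and a 2D query array:
indices = np.asarray(indices_list)
X_query = np.array([[41.4051453, 2.1776344]])
```

So `gmm.predict(np.array([0, 1]), X_query)` would be the shape of call to aim for, rather than passing an integer as the first argument.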
Once the model is trained, GMM.means has shape (n_components, n_features), but GMM.covariances seems to have shape (n_training_points, n_features, n_features).
Could it be that, even though len(covariances) == n_training_points, only the first n_components entries are meaningful? After reading the code, it seems the algorithms use only the first entries and ignore the rest.
Hi!
Thanks a lot for maintaining this great library! It is really convenient and elegant.
I wonder if it is possible to save a fitted model to a file?
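gmr does not ship a dedicated save function as far as I know, but since a fitted GMM is fully described by its priors, means, and covariances, pickling works. A sketch using an in-memory buffer (a dict stands in for the model's parameters here; a real gmr.GMM instance can be pickled the same way, assuming its attributes are plain numpy arrays):

```python
import io
import pickle
import numpy as np

# Stand-in for a fitted model's parameters:
params = {
    "priors": np.array([0.4, 0.6]),
    "means": np.zeros((2, 3)),
    "covariances": np.stack([np.eye(3)] * 2),
}

# Serialize and deserialize (replace the buffer with open(path, "wb"/"rb")
# to persist to disk):
buf = io.BytesIO()
pickle.dump(params, buf)
buf.seek(0)
restored = pickle.load(buf)
```

Saving the three arrays with `np.savez` and reconstructing the GMM from them would be an alternative that avoids pickle's versioning caveats.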
Thank you for your excellent work. I want to know if there is a more flexible GMR, similar to sklearn, that can restrict the covariance type to 'shared', 'spherical', or 'diag'. Sometimes we don't need a 'full' covariance matrix. I look forward to your reply.
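gmr does not currently expose a covariance_type option, but to illustrate what the restricted families mean, here is a sketch (the helper name is hypothetical) of projecting a full covariance onto the diag and spherical families as sklearn defines them:

```python
import numpy as np

def constrain_covariance(cov, kind):
    """Project a full covariance matrix onto a restricted family,
    mimicking sklearn's covariance_type options."""
    cov = np.asarray(cov, dtype=float)
    if kind == "diag":
        return np.diag(np.diag(cov))       # keep only per-feature variances
    if kind == "spherical":
        return np.mean(np.diag(cov)) * np.eye(cov.shape[0])  # one shared variance
    return cov                             # "full": unchanged

constrained = constrain_covariance(np.array([[1.0, 0.3],
                                             [0.3, 3.0]]), "spherical")
```

In an EM implementation, such a projection would be applied to each component's covariance after every M-step; 'shared' (sklearn's 'tied') would instead average the covariances across components.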
Hi,
The main part of this code is present, or rewritten, in the GMM implementation in scikit-learn. Is there any reason to continue developing it instead of modifying the sklearn version?
Hi, thank you for making this package. However, I fail to see how I can do a GMM regression with a dependent variable y.
I would like to perform a specific kind of sampling, and I'm not sure what is the best way to go about it. Say I have two variables (1d arrays) X and Y, and I have a GMM trained on the [X Y] dataset. Now I'd like to generate values for Y based on an array of values of X, but instead of just getting the mean I'd like to obtain multiple (let's say N) values sampled according to the mixture distribution. One way to accomplish this is as follows:
Y_sampled = np.empty((len(X), N))
for i in range(len(X)):
    Y_sampled[i, :] = gmm.condition([0], X[i]).sample(N)
However, this requires a loop over all values of X (which predict avoids). Is there a better/more performant way to get this same result?
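One possible answer, sketched in plain numpy rather than against gmr's API (so the parameter layout below is an assumption): the per-component conditional covariances do not depend on x, and the conditional means are affine in x, so both can be computed for the whole X array at once, leaving only a cheap per-component sampling step.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sample_conditional(priors, means, covs, x_idx, y_idx, X, n_samples, rng):
    """Draw n_samples of Y per row of X from p(Y | X) under a GMM.

    priors: (K,), means: (K, D), covs: (K, D, D), X: (n, len(x_idx)).
    Returns an array of shape (n, n_samples, len(y_idx)).
    """
    n, K = len(X), len(priors)
    gains, c_means, c_covs = [], [], []
    log_w = np.empty((n, K))
    for k in range(K):
        S_xx = covs[k][np.ix_(x_idx, x_idx)]
        S_yx = covs[k][np.ix_(y_idx, x_idx)]
        G = S_yx @ np.linalg.inv(S_xx)
        gains.append(G)
        # Conditional covariance is x-independent; conditional means for ALL
        # rows of X are computed in one matrix product (affine in x):
        c_covs.append(covs[k][np.ix_(y_idx, y_idx)] - G @ S_yx.T)
        c_means.append(means[k][y_idx] + (X - means[k][x_idx]) @ G.T)
        log_w[:, k] = np.log(priors[k]) + multivariate_normal.logpdf(
            X, mean=means[k][x_idx], cov=S_xx)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)      # posterior component weights per x
    # Vectorized component choice per (row, sample) via inverse-CDF sampling:
    comp = (rng.random((n, n_samples, 1)) > np.cumsum(w, axis=1)[:, None, :]).sum(-1)
    out = np.empty((n, n_samples, len(y_idx)))
    for k in range(K):
        rows, cols = np.nonzero(comp == k)
        L = np.linalg.cholesky(c_covs[k] + 1e-12 * np.eye(len(y_idx)))
        z = rng.standard_normal((len(rows), len(y_idx)))
        out[rows, cols] = c_means[k][rows] + z @ L.T
    return out

rng = np.random.default_rng(0)
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
covs = np.stack([np.eye(2)] * 2)
Y_sampled = sample_conditional(priors, means, covs, [0], [1],
                               X=np.array([[0.0], [3.0]]), n_samples=4, rng=rng)
```

The only remaining Python-level loop is over the K components, not over the rows of X, so this should scale much better than conditioning per sample.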
TODO
Hi.
I use my own dataset with gmr. My training set train has 188318 rows and 14 columns, my test set has 122000 rows and 14 columns, and my label y_train has shape (188318,).
Then, following the regression example you provide:
gmm.from_samples(train)
Y = gmm.predict(np.array([0]), y_train[:, np.newaxis])
I'm not sure why it returns NaN values. Usually we fit the model using train and y_train, then predict using the test data, right?
Hi,
I ran into an issue when using my own data for training; the error message is as follows. I also checked my data and confirmed that there are no NaNs in it:
Traceback (most recent call last):
File "C:/Users/FJL/Downloads/gmr-master/gmr-master/examples/Test2.py", line 43, in <module>
initial_means = kmeansplusplus_initialization(X_train, n_components, random_state)
File "C:\Users\FJL\Downloads\gmr-master\gmr-master\gmr\gmm.py", line 46, in kmeansplusplus_initialization
i = _select_next_center(X, centers, random_state, selected_centers, all_indices)
File "C:\Users\FJL\Downloads\gmr-master\gmr-master\gmr\gmm.py", line 58, in _select_next_center
return random_state.choice(all_indices, size=1, p=selection_probability)[0]
File "mtrand.pyx", line 935, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
Hi,
When I fit a GMM to data, I sometimes get the following error:
Traceback (most recent call last):
File "test_nan_problem.py", line 9, in <module>
model.from_samples(frame.values)
File "build/bdist.linux-x86_64/egg/gmr/gmm.py", line 94, in from_samples
File "build/bdist.linux-x86_64/egg/gmr/gmm.py", line 160, in to_responsibilities
File "build/bdist.linux-x86_64/egg/gmr/mvn.py", line 105, in to_probability_density
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
check_finite=check_finite)
File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 20, in _cholesky
a1 = asarray_chkfinite(a)
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1022, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
The occurrence of the error depends on the data and on the GMM parameters; e.g. with the same data, the error may not occur when I use a different random state.
I am working on master branch, with python2.7 and numpy version 1.11.0. The code to reproduce the error is
import pandas as pd
from gmr import GMM
import random, time
frame = pd.read_csv("data.txt", sep=" ")
random_state = 1578569639
model = GMM(n_components=7, random_state=random_state)
model.from_samples(frame.values)
I attached the data file: data.txt
Best,
Dennis
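A common cause of this error is a component covariance collapsing to near-singularity during EM. The usual mitigation (which gmr may or may not expose as an option) is to add a small ridge to the covariance diagonal before factorizing; a sketch:

```python
import numpy as np

def regularized_cholesky(cov, eps=1e-6):
    """Cholesky factor of cov with a small ridge added to the diagonal,
    guarding against near-singular covariances produced by EM."""
    cov = np.asarray(cov, dtype=float)
    return np.linalg.cholesky(cov + eps * np.eye(cov.shape[0]))

# A singular covariance (rank 1) that plain cholesky would reject:
singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
L = regularized_cholesky(singular)
```

This explains why the failure depends on the random state: some initializations happen to put a component on a degenerate subset of the data, driving its covariance singular.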
This package certainly has potential, but there are quite a few issues that need to be addressed before I think it is ready for publication in JOSS. Starting from the top.
setup.py should use the install_requires keyword argument of setup rather than requires. I think you need 2 dependencies for this package: numpy and scipy. You might also have a dependency on matplotlib, since a number of functions require matplotlib. Scikit-learn is also required for some examples.
I'm not sure how, but my first run of the tests failed. Reinstalling using pip install -e . seemed to fix this. It might be an error from my virtual environment, but you might want to look into this.
============================= test session starts ==============================
platform linux -- Python 3.8.7+, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/sam/tmp/gmr
collected 49 items
gmr/tests/test_gmm.py ............................FFF [ 63%]
gmr/tests/test_mvn.py .................. [100%]
=================================== FAILURES ===================================
________________________ test_extract_mvn_negative_idx _________________________
def test_extract_mvn_negative_idx():
gmm = GMM(n_components=2, priors=0.5 * np.ones(2), means=np.zeros((2, 2)),
covariances=[np.eye(2)] * 2)
> assert_raises(ValueError, gmm.extract_mvn, -1)
E AttributeError: 'GMM' object has no attribute 'extract_mvn'
gmr/tests/test_gmm.py:427: AttributeError
________________________ test_extract_mvn_idx_too_high _________________________
def test_extract_mvn_idx_too_high():
gmm = GMM(n_components=2, priors=0.5 * np.ones(2), means=np.zeros((2, 2)),
covariances=[np.eye(2)] * 2)
> assert_raises(ValueError, gmm.extract_mvn, 2)
E AttributeError: 'GMM' object has no attribute 'extract_mvn'
gmr/tests/test_gmm.py:433: AttributeError
______________________________ test_extract_mvns _______________________________
def test_extract_mvns():
gmm = GMM(n_components=2, priors=0.5 * np.ones(2),
means=np.array([[1, 2], [3, 4]]), covariances=[np.eye(2)] * 2)
> mvn0 = gmm.extract_mvn(0)
E AttributeError: 'GMM' object has no attribute 'extract_mvn'
gmr/tests/test_gmm.py:439: AttributeError
=============================== warnings summary ===============================
../venv/lib/python3.8/site-packages/nose/importer.py:12
/home/sam/tmp/venv/lib/python3.8/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
from imp import find_module, load_module, acquire_lock, release_lock
gmr/tests/test_gmm.py: 12 warnings
/home/sam/tmp/venv/lib/python3.8/site-packages/gmr/gmm.py:175: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.float) / self.n_components
gmr/tests/test_gmm.py: 1204 warnings
gmr/tests/test_mvn.py: 5 warnings
/home/sam/tmp/venv/lib/python3.8/site-packages/gmr/mvn.py:8: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
inv = np.ones(n_features, dtype=np.bool)
gmr/tests/test_mvn.py::test_unscented_transform_linear_transformation
gmr/tests/test_mvn.py::test_unscented_transform_linear_combination
gmr/tests/test_mvn.py::test_unscented_transform_projection_to_more_dimensions
gmr/tests/test_mvn.py::test_unscented_transform_quadratic
/home/sam/tmp/venv/lib/python3.8/site-packages/gmr/mvn.py:316: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
D = np.maximum(D, np.finfo(np.float).eps)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED gmr/tests/test_gmm.py::test_extract_mvn_negative_idx - AttributeError:...
FAILED gmr/tests/test_gmm.py::test_extract_mvn_idx_too_high - AttributeError:...
FAILED gmr/tests/test_gmm.py::test_extract_mvns - AttributeError: 'GMM' objec...
================= 3 failed, 46 passed, 1226 warnings in 5.11s ==================
Hi @AlexanderFabisch ,
By going through your algorithm I have noticed that the model fails to produce predictions when the magnitude of one of the variables exceeds 1000.
In my case this is not too much of an issue, as my variables can be scaled. But it may be worth taking a look at.
Thanks again for your work, GMR should definitely be part of Sklearn...
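Until the root cause is found, standardizing the data before fitting (and un-scaling predictions afterwards) is a reasonable workaround. A self-contained sketch of that preprocessing (the class is hypothetical; sklearn's StandardScaler does the same job):

```python
import numpy as np

class Standardizer:
    """Z-scores columns; keeps the statistics so predictions can be unscaled."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0.0] = 1.0   # guard against constant columns
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

    def inverse_transform(self, Z):
        return Z * self.std_ + self.mean_

rng = np.random.default_rng(0)
X = rng.normal(loc=5000.0, scale=300.0, size=(100, 2))  # large-magnitude data
scaler = Standardizer().fit(X)
Z = scaler.transform(X)                  # fit the GMM on Z instead of X
X_back = scaler.inverse_transform(Z)     # map predictions back to data scale
```

Large raw magnitudes inflate the determinants and condition numbers of the covariance matrices, which plausibly triggers the numerical failure reported here.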
Hi, Alexander!
First of all, thanks for this package, it is really awesome! In particular, I appreciate the posterior sampling features.
Do you think a scikit-learn RegressorMixin could be a good additional feature?
I think it could be useful for integration with Pipelines and other scikit-learn tooling.
I made an attempt at it on this branch: GMMRegression
Please let me know if I can be of help implementing it.
Relevant examples at the end of this pull request: #28 from @mralbu
Did some additional comparisons with an experiment using sklearn.mixture.GaussianMixture machinery for fitting the regressor.
from sklego.mixture import GMMRegressor

np.set_printoptions(precision=4)

np.random.seed(2)
scores = []
for _ in range(10):
    gmr = GMMRegressor(n_components=2)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478]

np.random.seed(2)
scores = []
for _ in range(10):
    gmr = GMMRegressor(n_components=2, init_params='random', max_iter=100)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8157 0.8061 0.8221 0.8152 0.8221 0.8192 0.8479 0.8282 0.8251 0.7792]

Maybe using internal sklearn.mixture machinery might help ease numerical issues, though it would introduce sklearn as a hard dependency and might be out of scope. On the other hand, it would enable the introduction of other regressors such as BayesianGMMRegressor in an easy way, and would have familiar parameters (the same as in sklearn.mixture.GaussianMixture). Do you think exploring the use of sklearn.mixture inner workings would be interesting for gmr?
init_params="kmeans++" drastically improves the stability of our results; it is not the default initialization though.
Hello!
Could you provide references for Mixtures of Experts Regression? Like a book/paper to refer algorithm from which it was implemented.
Thanks!
Here is an example of a faster implementation:
Should be float, but is int:
https://github.com/AlexanderFabisch/gmr/blob/master/gmr/gmm.py#L612
This is probably a very basic mistake. But I can't seem to run the example (after installing with pip).
I get the error:
File "/Users/Harald/anaconda/lib/python3.4/site-packages/gmr/__init__.py", line 1, in <module>
from mvn import MVN, plot_error_ellipse
ImportError: No module named 'mvn'
I tried cloning the complete repository and run the examples from the source but that didn't help either.
Hi Alex,
I want to solve a multivariate regression problem using a mixture density network. I need a reference to an open-source, downloadable dataset.
Thanks in advance