
fabsig / gpboost

Combining tree-boosting with Gaussian process and mixed effects models

License: Other

Languages: CMake 1.21%, C 1.19%, R 2.40%, C++ 89.43%, HTML 0.05%, Python 1.55%, Cuda 0.44%, Shell 0.13%, M4 0.01%, Fortran 2.58%, JavaScript 0.02%, CSS 0.01%, Makefile 0.01%, Starlark 0.31%, NASL 0.23%, Batchfile 0.01%, Less 0.39%, SWIG 0.04%, XSLT 0.01%
Topics: artificial-intelligence, boosting, cpp, data-science, gaussian-processes, machine-learning, mixed-effects, python, r

gpboost's People

Contributors

barracuda156, fabsig, fonnesbeck, lorenzwalthert, pkuendig, simonprovost, timgyger


gpboost's Issues

Updating random effect estimates

I'm using the library from Python & it's great, thank you!

I was wondering whether it's possible to update my random effect estimates with new data. Say I have a setup with a single grouping variable (users) and a bunch of covariates that are handled by the LightGBM fixed-effect model.

If I have new data for a bunch of users (some of which are previously unseen), is there a way to update the random effect estimates without changing the (already trained) fixed effects model?
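
A possible workaround, assuming a Gaussian likelihood: freeze the trained booster, take its raw (fixed-effect) predictions on the new data, and fit a fresh GPModel on the residuals. A minimal sketch with hypothetical names (bst, X_new, y_new, group_new); note this re-estimates the variance parameters as well, not only the group effects:

import gpboost as gpb

# `bst` is the already-trained Booster; X_new, y_new, group_new hold the new
# observations and their (possibly unseen) group labels.
pred = bst.predict(data=X_new, group_data_pred=group_new, raw_score=True)
F_new = pred['fixed_effect']  # fixed-effect part from the frozen tree ensemble

# Re-estimate the random effects on the residuals without touching the trees.
gp_model_new = gpb.GPModel(group_data=group_new)
gp_model_new.fit(y=y_new - F_new)
gp_model_new.summary()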

Support for quantile regression as stated in the parameters section of the docs

~/miniconda3/lib/python3.8/site-packages/gpboost/engine.py in train(params, train_set, num_boost_round, gp_model, use_gp_model_for_validation, train_gp_model_cov_pars, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
276 # construct booster
277 try:
--> 278 booster = Booster(params=params, train_set=train_set, gp_model=gp_model)
279 if is_valid_contain_train:
280 booster.set_train_data_name(train_data_name)

~/miniconda3/lib/python3.8/site-packages/gpboost/basic.py in init(self, params, train_set, model_file, model_str, silent, gp_model)
2383 self.has_gp_model = True
2384 self.gp_model = gp_model
-> 2385 _safe_call(_LIB.LGBM_GPBoosterCreate(
2386 train_set.construct().handle,
2387 c_str(params_str),

~/miniconda3/lib/python3.8/site-packages/gpboost/basic.py in _safe_call(ret)
113 """
114 if ret != 0:
--> 115 raise GPBoostError(_LIB.LGBM_GetLastError().decode('utf-8'))
116
117

GPBoostError: The GPBoost algorithm can currently not be used for objective = quantile. If this is desired, contact the developer or open a GitHub issue.

It would be fantastic if quantile regression support could be added.

'GPModel' object has no attribute 'cluster_ids_pred'

I've fit a GPModel with cluster IDs specified, as follows:

gp_model = gpb.GPModel(
    gp_coords=data[['platex', 'platez']], 
    cluster_ids=data['levelid'],
    cov_function="exponential")

gp_model.fit(y=data['delta'])

The model fits fine, but when I go to predict, passing values to cluster_ids_pred, I get an unusual error:

pred = gp_model.predict(gp_coords_pred=XZgrid,
                        cluster_ids_pred=np.ones(XZgrid.shape[0], int)*8,
                        predict_var=True, predict_response=False)
AttributeError                            Traceback (most recent call last)
~/pitcher_heat_maps/pitcher_residual_gpboost.py in 
----> 245 pred = gp_model.predict(gp_coords_pred=XZgrid,
      246                         cluster_ids_pred=np.ones(XZgrid.shape[0], int)*8,
      247                         predict_var=True, predict_response=False)
      248 
      249 fig, ax = plt.subplots(figsize=(10,6))

~/anaconda3/envs/heat_maps/lib/python3.8/site-packages/gpboost/basic.py in predict(self, y, group_data_pred, group_rand_coef_data_pred, gp_coords_pred, gp_rand_coef_data_pred, vecchia_pred_type, num_neighbors_pred, cluster_ids_pred, predict_cov_mat, predict_var, cov_pars, X_pred, use_saved_data, predict_response, fixed_effects, fixed_effects_pred)
   4764                                                          check_data_type=False, check_must_be_int=False,
   4765                                                          convert_to_type=None)
-> 4766                 if self.cluster_ids_pred.shape[0] != num_data_pred:
   4767                     raise ValueError("Incorrect number of data points in cluster_ids_pred")
   4768                 if self.cluster_ids_map_to_int is None and not cluster_ids_pred.dtype == np.dtype(int):

AttributeError: 'GPModel' object has no attribute 'cluster_ids_pred'

Clearly I have passed cluster_ids_pred, so this appears to be an internal error.

How to specify "ind_effect_group_rand_coef" for multiple random slopes

I want to fit random slopes using the same covariates as those for the fixed effects (i.e., Z = X), but I don't know how to specify "ind_effect_group_rand_coef". I keep getting the error below:

ind_re_train = [i for i in range(1, full_X_train.shape[1] + 1)]
gp_model = gpb.GPModel(group_data=group_train, likelihood="bernoulli_probit",
                       group_rand_coef_data=full_X_train,
                       ind_effect_group_rand_coef=ind_re_train)

IndexError: list index out of range
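
If I read the API correctly, ind_effect_group_rand_coef holds 1-based indices that map each column of group_rand_coef_data to one of the grouping variables in group_data, so with a single grouping variable every entry should be 1. A hedged sketch using the variable names above:

import gpboost as gpb

# Assumption: each random-slope column refers to the (1-based) column of
# group_data it belongs to; with one grouping variable, that is always 1.
k = full_X_train.shape[1]  # number of random-slope covariates
gp_model = gpb.GPModel(group_data=group_train,
                       likelihood="bernoulli_probit",
                       group_rand_coef_data=full_X_train,
                       ind_effect_group_rand_coef=[1] * k)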

Issue with adding save_raw_data in save_model

When we run
bst.save_model('temp.json', save_raw_data=True)

we get an error:
ValueError Traceback (most recent call last)
/var/folders/1q/0655r2rn31g30hsnl5vg3mv00000gn/T/ipykernel_29604/3290800556.py in
----> 1 bst.save_model('temp.json', save_raw_data=True)

~/Documents/ApolloAgri/py3/lib/python3.7/site-packages/gpboost/basic.py in save_model(self, filename, num_iteration, start_iteration, importance_type, save_raw_data, **kwargs)
3115 save_data['label'] = self.train_set.label
3116 with open(filename, 'w+') as f:
-> 3117 json.dump(save_data, f, default=json_default_with_numpy, indent="")
3118 else: # has no gp_model
3119 importance_type_int = FEATURE_IMPORTANCE_TYPE_MAPPER[importance_type]

~/anaconda3/lib/python3.7/json/init.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
177 # could accelerate with writelines in some versions of Python, at
178 # a debuggability cost
--> 179 for chunk in iterable:
180 fp.write(chunk)
181

~/anaconda3/lib/python3.7/json/encoder.py in _iterencode(o, _current_indent_level)
429 yield from _iterencode_list(o, _current_indent_level)
430 elif isinstance(o, dict):
--> 431 yield from _iterencode_dict(o, _current_indent_level)
432 else:
433 if markers is not None:

~/anaconda3/lib/python3.7/json/encoder.py in _iterencode_dict(dct, _current_indent_level)
403 else:
404 chunks = _iterencode(value, _current_indent_level)
--> 405 yield from chunks
406 if newline_indent is not None:
407 _current_indent_level -= 1

~/anaconda3/lib/python3.7/json/encoder.py in _iterencode_dict(dct, _current_indent_level)
403 else:
404 chunks = _iterencode(value, _current_indent_level)
--> 405 yield from chunks
406 if newline_indent is not None:
407 _current_indent_level -= 1

~/anaconda3/lib/python3.7/json/encoder.py in _iterencode(o, _current_indent_level)
437 markers[markerid] = o
438 o = _default(o)
--> 439 yield from _iterencode(o, _current_indent_level)
440 if markers is not None:
441 del markers[markerid]

~/anaconda3/lib/python3.7/json/encoder.py in _iterencode(o, _current_indent_level)
434 markerid = id(o)
435 if markerid in markers:
--> 436 raise ValueError("Circular reference detected")
437 markers[markerid] = o
438 o = _default(o)

ValueError: Circular reference detected

The error does not occur when save_raw_data is not set.

Simulate data process on demo

Hi, I would like to clarify the data-simulation process.
I think it would be better to show clearly in the code that the training and test data are generated by a similar process,
for example like below in Python:

# (f1d, sigma2, n_train, n_test, b_train and b_test are defined earlier in the demo)
# train data
x_train = np.random.rand(n_train, 2)
F_x = f1d(x_train[:, 0])  # mean function
xi = np.sqrt(sigma2) * np.random.normal(size=n_train)  # simulate error term
y = F_x + b_train + xi  # observed data

# test data (generated like the training data)
x_test = np.random.rand(n_test * n_test, 2)
F_x_test = f1d(x_test[:, 0])
xi_test = np.sqrt(sigma2) * np.random.normal(size=n_test * n_test)
y_test = F_x_test + b_test + xi_test

Cannot load Python package on macOS

I tried to install the package using pip on my MacBook (10.13.6). But when I imported the library, this error message was shown:

OSError: dlopen(/User/opt/anaconda/lib/python3.8/site-packages/gpboost/lib_gpboost.so, 6): Library not loaded: usr/local/opt/libomp.dylib
Referenced from: /User/opt/anaconda3/lib/python3.8/site-packages/gpboost/lib_gpboost.so
Reason: image not found

I then installed libomp using brew. After that, the error message became:

dlopen(/Users/opt/anaconda3/lib/python3.8/site-packages/gpboost/lib_gpboost.so, 6): Symbol not found: ____chkstk_darwin
  Referenced from: /Users/opt/anaconda3/lib/python3.8/site-packages/gpboost/lib_gpboost.so (which was built for Mac OS X 11.0)
  Expected in: /usr/lib/libSystem.B.dylib
 in /Users/opt/anaconda3/lib/python3.8/site-packages/gpboost/lib_gpboost.so

I tried the same process on another Mac of mine (10.15.7) and the same problem appeared. I also tried to install it by cloning the package and building with gcc; the same issue as in #23 appeared.

Some questions about friedman3 instance code

Looking into the friedman3 dataset, I found that it has four independent feature columns, and that the relationship between the output y and input X is as follows (this relationship is built in when the dataset is created):
y(X) = arctan((X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) / X[:, 0]) + noise * N(0, 1)
In the friedman3 instance code:
X, F = datasets.make_friedman3(n_samples=n)
So can it be understood that F here is the y column of the friedman3 dataset, and that y(X) acts as the mean function F(x)? If so, why does F(x) (i.e. y(X)) have this form?
In addition, the friedman3 example contains the following code:
y = F + Zb + xi  # observed data
So why is y here called observed data? Isn't it calculated from F, Zb, and xi?
Please explain in detail, thank you!

Sample Weight Support

Hello - thanks for the wonderful package. From the writeup and the description, it seems very promising.

I wanted to check whether GPBoost supports, or will support, sample weights. I have tried both the native API and the scikit-learn API, and got the following error message:

GPBoostError: Weighted data is currently not supported for the GPBoost algorithm. If this is desired, contact the developer or open a GitHub issue.

It's a bit confusing since the API seemingly supports sample weights, but it looks like they may just not be implemented yet. If so, are there any plans to implement them? This is key functionality for some domains, where observations may carry radically different weights, and fitting an unweighted set will tend to give misleading results.

Thanks!

need raw_data and raw_label to save gp_model?

Hi, thanks for upgrading all the stuff regarding model saving.

However, your current saving solution needs the raw data/label, but most of my models use very large datasets, so saving them inside the model is impossible.
As far as I understand, the only use of the raw data/label during prediction is to calculate fixed_effect_train and residual (Gaussian mode). Can we just save these two arrays instead of the raw data/label when saving the gp_model?

fixed_effect_train = predictor.predict(self.train_set.data, start_iteration=start_iteration,
                                       num_iteration=num_iteration, raw_score=True, pred_leaf=False,
                                       pred_contrib=False, data_has_header=data_has_header,
                                       is_reshape=False)
if self.gp_model.get_likelihood_name() == "gaussian":  # Gaussian data
    residual = self.train_set.label - fixed_effect_train
    # Note: we need to provide the response variable y as this was not saved
    # in the gp_model ("in C++") for Gaussian data but was overwritten during training
    random_effect_pred = self.gp_model.predict(y=residual,

Raise an error when input parameters are misspecified

Hi. First of all thank you for putting time and effort in developing such an interesting tool.

The train method of a GPModel instance does not recognize incorrect parameter names. For example, when you define the dictionary of parameter values like this:

params = {'num_boost_round': 20000, 'xxxxx': 0.5}

The train method just ignores "xxxxx" and proceeds with training. I think it would be useful to raise a warning or an error to facilitate debugging. For example, the other day I wanted to specify a learning rate of 0.5. Unfortunately, a typo slipped under the radar:

params = {"learning_rate:": 0.5}

Note the extra colon ":" in the parameter's name. Because this typo renders the parameter name invalid, the algorithm just ignores it and assumes the default learning rate (which I believe is 0.1). It took me a while to find out why the algorithm was running but not performing as expected (I knew a priori that with a learning rate of 0.5 the results were good). Thank you for your attention.
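
Until such validation exists in the library, a user-side guard is easy to write. A minimal sketch (the whitelist is hypothetical and maintained by hand; this is not a GPBoost API):

# Hypothetical whitelist of the parameter names actually used in a project.
KNOWN_PARAMS = {'learning_rate', 'num_boost_round', 'min_data_in_leaf',
                'max_depth', 'objective'}

params = {"learning_rate:": 0.5}  # the typo from above
unknown = set(params) - KNOWN_PARAMS
if unknown:
    raise ValueError(f"Unknown parameter names: {unknown}")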

Issue #31 revised

I am still running into errors when importing the gpboost library, despite having the latest macOS. The error I receive (similar to #31) is:

OSError: dlopen(/opt/anaconda3/lib/python3.8/site-packages/gpboost/lib_gpboost.so, 6): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
Referenced from: /opt/anaconda3/lib/python3.8/site-packages/gpboost/lib_gpboost.so
Reason: image not found

This occurs regardless of whether installation is via pip or conda.

Question : Out of Sample Forecast

I am interested in fitting a model of housing rental prices with property-owner and geospatial mixed effects. A property owner can own multiple properties.

Does GPBoost allow for out-of-sample prediction in case new data on property owners is found? Or would this somehow make the model error out?

Second question: is it possible to use GPBoost in a panel-data framework and do a one-period-ahead forecast? Identification would be off of cross-sectional features. Thanks!
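
For reference, a sketch of what such an out-of-sample call might look like (hypothetical names; as far as I understand, grouped random effects fall back to the prior mean of zero for owner levels not seen in training rather than raising an error):

# bst is a trained Booster with owner and spatial random effects.
pred = bst.predict(data=X_future,
                   group_data_pred=owner_ids_future,  # may contain new owners
                   gp_coords_pred=coords_future,
                   predict_var=True)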

Is it possible to use gpboost on big data

I continued working with gpboost in the hope of applying it to my own data. However, I bumped into what I think is a big limitation and would like to know if something can be done about it.

It seems gpboost is very sensitive to sample size: calculation time appears to grow cubically with it (O(n^3)).

I use a modified version of your examples to illustrate the situation.

First, I prepare the session:

library(gpboost)
library(dplyr)
library(tictoc)

Then, I wrapped your data creation code into a function:

make_data <- function(ntrain, nx, likelihood = "poisson") {
  coords_train <- matrix(runif(2)/2, ncol=2)
  while (dim(coords_train)[1] < ntrain) {
    coord_i <- runif(2)
    if (!(coord_i[1] >= 0.7 & coord_i[2] >= 0.7)) {
      coords_train <- rbind(coords_train, coord_i)
    }
  }

  x2 <- x1 <- rep((1:nx)/nx, nx)
  for (i in 1:nx) x2[((i-1)*nx+1):(i*nx)] <- i/nx
  coords_test <- cbind(x1, x2)

  coords <- rbind(coords_train, coords_test)
  ntest <- nx * nx
  n <- ntrain + ntest

  # Simulate fixed effects
  X_train <- matrix(runif(2*ntrain), ncol=2)
  x <- seq(from=0, to=1, length.out=nx^2)
  X_test <- cbind(x, rep(0, nx^2))
  X <- rbind(X_train, X_test)
  f1d <- function(x) 1/(1+exp(-(x-0.5)*10)) - 0.5
  f <- f1d(X[,1])

  # Simulate spatial Gaussian process
  sigma2_1 <- 0.25 # marginal variance of GP
  rho <- 0.1 # range parameter
  D <- as.matrix(dist(coords))
  Sigma <- sigma2_1 * exp(-D/rho) + diag(1E-20, n)
  C <- t(chol(Sigma))
  b_1 <- rnorm(n=n)
  eps <- as.vector(C %*% b_1)
  eps <- eps - mean(eps)

  # Simulate response variable
  if (likelihood == "bernoulli_probit") {
    probs <- pnorm(f+eps)
    y <- as.numeric(runif(n) < probs)
  } else if (likelihood == "bernoulli_logit") {
    probs <- 1/(1+exp(-(f+eps)))
    y <- as.numeric(runif(n) < probs)
  } else if (likelihood == "poisson") {
    mu <- exp(f+eps)
    y <- qpois(runif(n), lambda = mu)
  } else if (likelihood == "gamma") {
    mu <- exp(f+eps)
    y <- qgamma(runif(n), scale = mu, shape = 1)
  }

  # Split into training and test data
  y_train <- y[1:ntrain]
  dtrain <- gpb.Dataset(data = X_train, label = y_train)
  y_test <- y[1:ntest+ntrain]
  eps_test <- eps[1:ntest+ntrain]

  dtest <- gpb.Dataset.create.valid(dtrain, data = X_test, label = y_test)

  list(dtrain=dtrain, dtest=dtest, coords_train = coords_train, coords_test = coords_test)
}

I set the different parameters and select the sample sizes I want to test:

likelihood <- "poisson"
params <- list(learning_rate = 0.1, min_data_in_leaf = 20,
               objective = likelihood, monotone_constraints = c(1,0))
nrounds <- 35

tested_n <- c(200, 500, 1000, 1500, 2000, 2500, 3000, 4000, 6000, 10000)

Finally, I run gpboost on those sample sizes in a loop, saving the time each step takes:

tic.clearlog()
for (i in tested_n) {
  set.seed(i)
  the_data <- make_data(i, 30, likelihood)

  Sys.time()

  tic("total")
  tic("GBModel")
  gp_model <- GPModel(gp_coords = the_data$coords_train, cov_function = "exponential",
                      likelihood = likelihood)
  toc(log = TRUE, quiet = TRUE)

  tic("Boosting")
  bst <- gpb.train(data = the_data$dtrain, gp_model = gp_model,
                   nrounds = nrounds, params = params, verbose = 1)
  toc(log = TRUE, quiet = TRUE)
  toc(log = TRUE, quiet = TRUE)
}

I can then plot the time required:

dd <- tic.log(format = FALSE) %>%
  lapply(as.data.frame) %>%
  do.call(rbind, .) %>%
  mutate(n = unlist(lapply(tested_n, rep, 3))) %>%
  mutate(duration = toc - tic) %>%
  mutate(msg = factor(msg, levels = c("GBModel", "Boosting", "total"))) %>%
  filter(msg == "Boosting")

bigO1 <- nls(duration ~ a * n ^ b, data = dd, start = list(a = 0.0001, b = 2))
plot(dd$n, dd$duration, xlab = "sample size", ylab = "Duration (sec)")
lines(1:10000, predict(bigO1, newdata = data.frame(n = 1:10000)))
text(3000, 40000, "duration ~ 1.2e-8 * n^3.1")

which produced this plot:

[Plot: boosting duration vs. sample size with fitted curve duration ~ 1.2e-8 * n^3.1]

We can see that past 6,000 data points it gets really hard and slow to use gpboost, which I personally believe is a low threshold. I was initially expecting performance similar to lightgbm, and we know that lightgbm can handle millions of data points without any problem. But it seems that the limiting factor is the random-effects estimation. The same problem seems present in other mixed-effects packages out there. Tree boosting is a powerful tool that allows us to obtain good-quality predictions on datasets with lots of observations and lots of variables. gpboost seems to fail to harness this power because it adds a random component to the model, but at the same time this random component is what makes GPBoost so interesting.

So could gpboost be adapted to handle larger amounts of data?

To give some context, I work in the insurance industry and we usually work with hundreds of thousands if not millions of data points. This amount of data is generally needed because we are predicting rare events. In the particular dataset I'm working on right now to test yours and other methods, we have 114,000 distinct training data points. On this many data points, memory usage was the problem rather than timing (it clogged my 432 GB RAM machine!). I subsampled it to 44,000 distinct training points; memory wasn't a problem anymore but, even after letting the model run for over 4 days, the calculation was still not finished.

I really think this approach has potential and is really useful. But to be truly democratized, it will have to be able to handle more data.

I'm no programmer and no mathematician, so it's hard for me to contribute or propose solutions. I'll take a chance here anyway with some suggestions. Feel free to disregard them.

  1. It seems that the code does some kind of distance-matrix calculation, which is computationally intensive. Could we add some kind of distance limit, so that everything above this limit is disregarded in terms of correlation in the Gaussian process? Using specialized spatial tools could maybe help in that regard (a related sketch follows below).
  2. Could we run lightgbm on all the data, but then "tile" the Gaussian process into sizes that are more manageable?

Sorry for the long post, but I still wanted to share this so I could have your view on it.
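
For what it's worth, the predict signature quoted in other issues here (vecchia_pred_type, num_neighbors_pred) suggests GPBoost ships a Vecchia (nearest-neighbour) approximation that targets exactly this O(n^3) cost. A hedged Python sketch; the argument names are assumed from the 2021-era API and may have changed:

import gpboost as gpb

# Enable the Vecchia approximation so the GP part scales far better than
# cubically in n; num_neighbors trades accuracy for speed.
gp_model = gpb.GPModel(gp_coords=coords_train,
                       cov_function="exponential",
                       likelihood="poisson",
                       vecchia_approx=True,
                       num_neighbors=30)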

Issues while using GPBoostClassifier

While running a classification using GPBoostClassifier with simulated data the following error is observed:

[screenshot of the first error]

On checking sklearn.py, I noticed that the group_data_pred argument was not being passed when predict in the GPBoostClassifier class calls predict_proba. Hence the size mismatch.

On passing the group_data_pred argument, the following error was found:

[screenshot of the second error]

Further, I noticed that the docstring for the predict function in basic.py states it returns "either a NumPy array (if there is no GPModel) or a dict with three entries each having NumPy arrays as values (if there is a GPModel)". So in my case it was returning a dict, but the following operation was attempted on it, which resulted in the failure:

return np.vstack((1. - result, result)).transpose()

This can be resolved by checking whether a gp_model is present via the Booster attribute "has_gp_model".

About nested group structure

Hi, first of all, thanks for making this great library!
Wonder if there is any way to incorporate nested group structure or crossed structure.
So far, in your example code, it seems that group means one-level group, not multilevel.
Thanks for reading this issue!
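
A sketch of how this is commonly encoded, assuming group_data accepts multiple columns for crossed effects; the nested encoding via an interaction label is a standard trick rather than a documented GPBoost feature (data names hypothetical):

import numpy as np
import gpboost as gpb

# Crossed random effects: one column per grouping variable.
group_crossed = np.column_stack((school_id, class_id))

# Nested random effects: make the inner label unique within the outer one,
# so class 1 in school A is distinct from class 1 in school B.
class_in_school = np.array([f"{s}_{c}" for s, c in zip(school_id, class_id)])
group_nested = np.column_stack((school_id, class_in_school))

gp_model = gpb.GPModel(group_data=group_nested)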

Installation fails on Windows 10

Windows 10
R 4.0.2
CMake 3.20.0

Installation from the command line fails at step 3:

C:\Windows\System32\GPBoost>Rscript build_r.R
Error in .handle_result(result) : Copying files failed!
Execution halted

All PATH variables from the instructions are set as required.

Where to find the random effects estimates?

I was only able to find estimates of the prior variance of the random effects, using gp_model.summary(). But for inference purposes, I would also like to see the estimated random effects themselves, i.e., the estimates of b in the y = F(X) + Zb + e formula. How can I do that? Thank you.
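
Two hedged possibilities, depending on the version: newer releases appear to expose the estimated (posterior mean) random effects directly, and otherwise predicting at the training group labels should yield the same quantity (names assumed; group is the training grouping variable):

import numpy as np

# If available in your version: estimated random effects for the training data.
b_hat = gp_model.predict_training_data_random_effects()

# Alternative: predict at the unique training group labels.
pred = gp_model.predict(group_data_pred=np.unique(group), predict_var=True)
b_mean, b_var = pred['mu'], pred['var']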

difference between the fit and the gpboost functions

This is a question more than an issue. If you think I should ask this question somewhere else (like stackoverflow or other) let me know and feel free to close this issue.

I'm currently trying to get your package to work with my data, which are claims from the insurance industry following a Poisson distribution.

When I look at your examples for generalized models (here and here), I see you use the fit (or fitGPModel) function, while in another example you use the gpboost function (like here).

I don't quite grasp the difference between the two functions and why one should be used over the other. Is it because the fit function does not do boosting? And if that's the case, what is the point of using it?

Thanks for clarifying that.

Guidance on scalability

Merely instantiating a GPModel becomes very slow when group_data has more than about 1k-20k rows, even if the number of distinct groups is constant and small. What is the computational complexity of the GPModel? Do any hyperparameters help reduce the complexity?

Very slow initialization of GPModel for non-consecutive group_data

As in #3, initializing with consecutive group_data is fast, but it seems non-consecutive data is not handled properly.
I have tested this on a very powerful machine, and the performance for consecutive versus non-consecutive group data is 0.4 s versus 1800+ s.

Please help with this, because the lgb model requires the data in its query order, which might be very different from the group_data order requested by the Gaussian model.

Support needed to pickle the GPBoost model

Hi, I have used the GPBoost library for my project, which requires mixed-effects features. After building the model on training data, I am trying to save it using pickle and joblib, and I am encountering the error below.

ValueError: ctypes objects containing pointers cannot be pickled

It seems there are pointers used in the library. Could you please suggest a way to save and reuse the trained GPBoost model?
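
As a workaround, the built-in JSON (de)serialization shown elsewhere in this issue list avoids the ctypes pointers entirely. A minimal sketch (bst is the trained Booster):

import gpboost as gpb

bst.save_model('model.json')  # optionally with save_raw_data=True
loaded_model = gpb.Booster(model_file='model.json')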

Help file for the set_prediction_data() function

I've been playing around with the tutorials and bumped into this error:
[screenshot of the error]
(sorry for the screenshot instead of a normal copy-paste, I have IT limitations)

It took me a while to understand why, and how to use this function. I had to do a lot of trial and error, as well as reading the function code itself and reading this issue.

The problem is that there is no help file for the set_prediction_data function. ?set_prediction_data does not work, which is probably normal because this function seems to be part of the model object itself. This is problematic because there is no easy way to know what arguments the function requires.

In my case, I managed to get it working:
[screenshot of the working call]
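
Since the working call itself is lost with the screenshot, here is a sketch of the usual pattern, written against the Python API (the R call should be analogous; argument names assumed): the validation-set random-effects data is registered on the model before training whenever use_gp_model_for_validation is enabled.

# dtrain/dvalid are Datasets, group_valid the validation grouping variable.
gp_model.set_prediction_data(group_data_pred=group_valid)
bst = gpb.train(params=params, train_set=dtrain, gp_model=gp_model,
                valid_sets=dvalid, use_gp_model_for_validation=True)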

But could a help file be created for this function? If that's not possible, maybe the error message could be changed to something that refers to the argument use_gp_model_for_validation=TRUE, with details about this function added in the argument description? I think that could help.

Finally, to make sure I understand the use of the argument use_gp_model_for_validation: is setting it to FALSE equivalent to not taking the random structure of the model into account in the prediction, a little bit like setting re.form=~0 when predicting with the lme4 package?

Predict error: Incorrect number of data points in fixed_effects_pred

Trying to do a simple example with gpboost... but the predict function does not seem to work properly!

import gpboost as gpb

# -------------------- Training --------------------
# create dataset for gpb.train
data_train = gpb.Dataset(X_train_enc_scaled, y_train)

# Train model
gp_model = gpb.GPModel(group_data=g_encoded)
num_boost_round = 1000
params = {'learning_rate': 0.01, 'min_data_in_leaf': 20, 'objective': "binary", 'verbose': 1}
bst = gpb.train(params=params,
                train_set=data_train,
                gp_model=gp_model,
                num_boost_round=num_boost_round,
                # early_stopping_rounds=5,
                use_gp_model_for_validation=True)

# Predict
gp_model.summary()
pred_resp = bst.predict(data=X_test_enc_scaled,
                        group_data_pred=g_test_encoded,
                        predict_var=True, raw_score=False)

ERROR:

File "C:\dev\Anaconda3\lib\site-packages\gpboost\basic.py", line 4838, in predict
raise ValueError("Incorrect number of data points in fixed_effects_pred")

If I set raw_score=True in the predict function, I can see that the size of fixed_effects_pred equals the test data, but the other effects have the size of the training dataset!?

[screenshot of the prediction output sizes]

Accommodating Missing Data

Hello, I'm working with gpboost in Python and I'm trying to fit a GPModel (using a binary likelihood) to a dataset with some missing features. If I drop the rows with missing values, the model converges.

If I don't do that, and instead fill in missing values with np.nan or float('nan') or None, I run into the following error:

"NaN or Inf occurred in approximate negative marginal log-likelihood for intial parameters. Please provide better initial values."

Do you know why this might be? I'm running version 0.6.7.
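
A minimal sketch of the complete-case filtering that reportedly works, assuming NaNs in any model input propagate into the likelihood evaluation (hypothetical names X, y, group):

import numpy as np
import gpboost as gpb

# Keep only rows without missing features before fitting the GPModel.
mask = ~np.isnan(X).any(axis=1)
gp_model = gpb.GPModel(group_data=group[mask], likelihood="bernoulli_probit")
gp_model.fit(y=y[mask], X=X[mask])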

How to apply gpboost on grouped data

I am new to grouped data and am having some issues.

I would like to know whether I can apply gpboost to this dataset:

dummy = np.random.randint(5, 30, size=220)
dummy = pd.DataFrame(dummy)
dummy.columns = ['score']
dummy.tail()

from statsmodels.datasets import grunfeld
data = grunfeld.load_pandas().data
data.year = data.year.astype(np.int64)
final = pd.concat([data, dummy], axis=1)
final.head()

or do I need to put it in a 3-D format?

final_data = final.set_index(['firm', 'year'])
final_data.head()

It would be better to know how this works for panel data with an example that is more intuitive to understand. I would really appreciate it if you could add an example like this for better understanding.
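
A hedged sketch of how the long-format frame above could be used directly, with no 3-D reshaping, using 'firm' as the grouping variable (the feature choice is illustrative):

import gpboost as gpb

feature_cols = ['year', 'value', 'capital']  # illustrative feature choice
gp_model = gpb.GPModel(group_data=final['firm'])
dtrain = gpb.Dataset(final[feature_cols], label=final['score'])
bst = gpb.train(params={'objective': 'regression', 'learning_rate': 0.05},
                train_set=dtrain, gp_model=gp_model, num_boost_round=100)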

fatal error in R

I have been trying out GPBoost using the demo here, but I get a fatal error in both RStudio and RGui when running

# Training
# Define random effects model
gp_model <- GPModel(group_data = group, likelihood = likelihood)
bst <- gpboost(data = X, label = y, verbose = 0,
               gp_model = gp_model,
               monotone_constraints = c(1,0),
               nrounds = nrounds, 
               params = params)
summary(gp_model) # Trained random effects model (true variance = 0.5)

and have also tried the following from the help, which also causes R to crash.

data(agaricus.train, package = "gpboost")
train <- agaricus.train
dtrain <- gpb.Dataset(train$data, label = train$label)
data_file <- tempfile(fileext = ".data")
gpb.Dataset.save(dtrain, data_file)
dtrain <- gpb.Dataset(data_file)
gpb.Dataset.construct(dtrain)

Can you provide any assistance?

Output discretization when using cluster IDs

Fitting a GP model with cluster_ids included on a set of spatial coordinates results in an odd discretization pattern along the x-coordinate:

[screenshot: predicted surface showing discretization along the x-coordinate]

whereas if the cluster IDs are removed, this effect does not occur.

[screenshot: predicted surface without cluster IDs, no discretization]

The model is as follows:

gp_model = gpb.GPModel(
    gp_coords=data[['platex', 'platez']], 
    cluster_ids=data['levelid'],
    cov_function="exponential")

where platex and platez are both continuous coordinates. Any intuition for this?

How to implement GPBoost on CPG data with binary classification

I have sales data where I need to predict whether a product will be sold or not (binary classification).
I have used my shop ID and product ID variables in the dataframe as the grouping variables.

I am trying to apply the "classification_non_Gaussian_data" example to my problem statement, and I now run into the issues described below.

  1. Check failed: (static_cast<size_t>(train_data_->num_total_features())) == (config->monotone_constraints.size()) at /home/whsigris/Dropbox/HSLU/Projects/MixedBoost/GPBoost/python-package/compile/src/LightGBM/boosting/gbdt.cpp, line 55.
  2. After fixing the first issue by assigning a data variable to the monotonic constraint and passing it as a parameter (I don't know whether this is the right approach), the model now takes too much time to run (maybe due to the huge amount of data).
  3. How do I get yes/no predictions once I have trained the model? I haven't found a proper example of this in the code solutions you shared on GitHub (see the sketch after this list).
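
Regarding point 3, a sketch of turning predicted probabilities into yes/no labels (names hypothetical; the 'response_mean' key is assumed from the predict output for non-Gaussian likelihoods):

pred = bst.predict(data=X_test, group_data_pred=group_test, raw_score=False)
prob_sold = pred['response_mean']       # assumed key: predicted probability
label = (prob_sold >= 0.5).astype(int)  # 1 = "yes", 0 = "no"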

Support for additional objective functions with random effects and boosting

Hi, very excited about this package! Do you see extensions to additional loss functions for random effects + boosting as a potential near-term upgrade? For my use case, I need a custom objective function, which is currently supported in LightGBM via fobj. I am of course getting the following error: "GPBoostError: Gaussian process boosting can currently only be used for 'objective = "regression"'". Curious whether adding support for custom objectives or the other objective functions offered by LightGBM is in the cards.

Thanks for the great work here.

Compilation of PyPi Package (unix)

Hello: I have installed the gpboost PyPI package to the user path (--user). I believe GPBoost recommends PyPI installation with -U (all users), which is not possible for me.

When I import gpboost I get the following error:

Error message: /usr/lib64/libm.so.6: version 'GLIBC_2.27' not found. It looks like the gpboost package requires an updated GLIBC.

I checked the system; it looks like we are running version 2.17. I know that for command-line compilation the requirement is > 2.14.

I wonder if the python package is looking in the wrong place? Or if the requirement has changed.

Upgrading GLIBC will be a tough endeavor for me as I am working in a shared environment (Red Hat Linux distribution).

Any tips on how to work around this issue?

Get optimal parameters of trained gpboost.basic.Booster

First of all thank you for all the nice improvements added in the last package update.

I performed a grid-search optimization to determine optimal parameters for my analysis. After finding them, I trained a model with those parameters and saved it to a JSON file. When I load the model, however, I cannot retrieve the parameters used to train it. I am not sure if I am using the wrong attributes, and I know this is just a minor issue: I could save another file with the list of parameters independently, but I just thought it would be handier to access them through the loaded model itself.

Please find below a snippet of code to replicate my problem. I am using Spyder 5.0.0 with Python 3.8 on Windows 10. Thank you very much.


import os
import numpy as np
import gpboost as gpb
from sklearn.model_selection import KFold

np.random.seed(42)

#--------------------------- Simulated data ---------------------------------------------------
#same simulated dataset used in the tutorials of this package
def f1d(x):
    """Non-linear function for simulation"""
    return (1.7 * (1 / (1 + np.exp(-(x - 0.5) * 20)) + 0.75 * x))

n = 5000
m = 500  
group = np.arange(n)  # grouping variable
for i in range(m):
    group[int(i * n / m):int((i + 1) * n / m)] = i
b1 = np.random.normal(size=m)  
eps = b1[group]
X = np.random.rand(n, 2)
f = f1d(X[:, 0])
xi = np.sqrt(0.01) * np.random.normal(size=n) 
y = f + eps + xi  # observed data
#----------------------------------------------------------------------------------------------

#learning parameters to be tested
learn_params = {'learning_rate': 0.05,
                'max_depth': 6,
                'min_data_in_leaf': [5, 10, 15],
                'max_bin': [50, 100]} 

#core parameters
core_params = {'objective': 'regression_l2', 'num_leaves': 50} 

#input data
kfold = KFold(n_splits=5, random_state=42, shuffle=True)
gpb_data = gpb.Dataset(X, y)
gpb_model = gpb.GPModel(group_data=group).set_optim_params(params={"optimizer_cov": "gradient_descent"})

#perform grid search
opt_params = gpb.grid_search_tune_parameters(param_grid=learn_params,
                                             params=core_params,
                                             num_try_random=None,
                                             folds=kfold,
                                             gp_model=gpb_model,
                                             use_gp_model_for_validation=True,
                                             train_set=gpb_data,
                                             num_boost_round=1000, 
                                             metrics='root_mean_squared_error')

#opt_params results
# {'best_params': {'learning_rate': 0.05,
#   'max_depth': 6,
#   'min_data_in_leaf': 15,
#   'max_bin': 50},
#   'best_iter': 59,
#   'best_score': 1.0046612116531306}

#concatenate optimal params to train a model
gpb_params = dict()
ls_dict = [opt_params["best_params"], core_params]
for dict_ in ls_dict:
    gpb_params.update(dict_)
gpb_params = dict(gpb_params)

#train model
gpb_trained = gpb.train(params=gpb_params, train_set=gpb_data, gp_model=gpb_model, 
                        num_boost_round=opt_params['best_iter'])

#using the params attribute we can get the parameters of gpb_trained
#gpb_trained.params

#save trained model to a file
path = os.path.join(os.getcwd(), "model.json")
gpb_trained.save_model(path)

#load model
loaded_model = gpb.Booster(model_file = path) 

#when trying to use the attribute params from loaded_model, an empty dictionary is printed
loaded_model.params #returns {}

Temporal and hierarchical groups effects modeling

Hello, I'm very excited about your work on GPBoost, and I'm trying to understand how to apply it to my problem domain. Basically, I have many sensors measuring entities. For each entity, I have a set of weights that represent its membership in various groups, and each measurement has an associated time, so I have both temporal and panel data for each observation. Is it possible to represent this structure?

Section 2.3.3 seems to indicate how to do this, but I'm struggling to understand how to take \Psi from section 2.3.2 and combine it with \Psi from section 2.3.1.

Thank you for your consideration.

Output interpretation

Hi!

Thanks for the package. I have problems understanding the output of the model, and I'm not able to find documentation that clarifies it.

  1. From what I know, gp_model.summary() has two possible outputs: ['Error_term', 'Group_1']. The first one I suppose is the RMSE, because in binary classification this measure goes away. I just don't understand what 'Group_1' is; it is a covariance parameter, but how is it interpreted?
    Is this value used for "all other variance parameters are replaced by the ratio of their original value and the error variance" (stated in your paper)? So is it like an overall variance?

  2. From the formula y = F(X) + Zb + xi, I conclude that if I don't take into account the random effects b (that is, pred['random_effect_mean']), the model is equivalent to normal gradient boosting. Then you reduce bias by taking the random effects into account. Is this correct?

  3. It's the first time I see a model that can predict random effects. Usually they are only accounted for, that is, you allow the covariance matrix to have groups. In any case, I'm not sure how a value of -0.03 should be interpreted, for example.

  4. pred['random_effect_cov'] ... I would expect a matrix where each element is the variance/covariance of each group. But it's not that.

  5. I understand that there are two possible random effects:
    A. group_data is for group labeling.
    B. gp_coords is for accounting for time (all groups share the same AR(1) model); cluster_ids if you want to have more than one AR(1) model.
    But predictions will still be: pred['fixed_effect'] + pred['random_effect_mean']

regards,
Ferran

Exception: An error has occurred while building gpboost library file

I received the following error on macOS 12.0.1: missing OptimLib. In trying to install it, it seems to go back to the Eigen problem described before!

XXXX ~ % pip3 install gpboost -U               
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Collecting gpboost
  Using cached gpboost-0.7.1.tar.gz (1.8 MB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel in /opt/homebrew/lib/python3.9/site-packages (from gpboost) (0.37.1)
Requirement already satisfied: numpy in /opt/homebrew/lib/python3.9/site-packages (from gpboost) (1.22.1)
Requirement already satisfied: scipy in /opt/homebrew/lib/python3.9/site-packages (from gpboost) (1.7.3)
Requirement already satisfied: scikit-learn!=0.22.0 in /opt/homebrew/lib/python3.9/site-packages (from gpboost) (1.0.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/homebrew/lib/python3.9/site-packages (from scikit-learn!=0.22.0->gpboost) (3.0.0)
Requirement already satisfied: joblib>=0.11 in /opt/homebrew/lib/python3.9/site-packages (from scikit-learn!=0.22.0->gpboost) (1.1.0)
Building wheels for collected packages: gpboost
  Building wheel for gpboost (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/homebrew/opt/[email protected]/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py'"'"'; __file__='"'"'/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-wheel-4k2hw1ax
       cwd: /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/
  Complete output (65 lines):
  running bdist_wheel
  /opt/homebrew/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
    warnings.warn(
  running build
  running build_py
  running egg_info
  writing gpboost.egg-info/PKG-INFO
  writing dependency_links to gpboost.egg-info/dependency_links.txt
  writing requirements to gpboost.egg-info/requires.txt
  writing top-level names to gpboost.egg-info/top_level.txt
  reading manifest file 'gpboost.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  no previously-included directories found matching 'build'
  warning: no files found matching '*.rst'
  warning: no files found matching '*.so' under directory 'gpboost'
  warning: no files found matching '*.so' under directory 'compile'
  warning: no files found matching '*.dll' under directory 'compile/Release'
  warning: no files found matching '*' under directory 'compile/external_libs/eigen/Eigen/src/Sparse'
  warning: no files found matching 'GPBoost.sln' under directory 'compile/windows'
  warning: no files found matching 'GPBoost.vcxproj' under directory 'compile/windows'
  warning: no files found matching '*.dll' under directory 'compile/windows/x64/DLL'
  warning: no previously-included files matching '*.py[co]' found anywhere in distribution
  warning: no previously-included files found matching 'compile/external_libs/compute/.git'
  adding license file 'LICENSE'
  installing to build/bdist.macosx-12-arm64/wheel
  running install
  INFO:GPBoost:Starting to compile the library.
  INFO:GPBoost:Starting to compile with CMake.
  Traceback (most recent call last):
    File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 106, in silent_call
      subprocess.check_call(cmd, stderr=log, stdout=log)
    File "/opt/homebrew/Cellar/[email protected]/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 373, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['make', '_gpboost', '-j4']' returned non-zero exit status 2.
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 335, in <module>
      setup(name='gpboost',
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/__init__.py", line 155, in setup
      return distutils.core.setup(**attrs)
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 148, in setup
      return run_commands(dist)
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
      dist.run_commands()
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
      self.run_command(cmd)
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
      cmd_obj.run()
    File "/opt/homebrew/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 335, in run
      self.run_command('install')
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
      cmd_obj.run()
    File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 249, in run
      compile_cpp(use_mingw=self.mingw, use_gpu=self.gpu, use_cuda=self.cuda, use_mpi=self.mpi,
    File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 200, in compile_cpp
      silent_call(["make", "_gpboost", "-j4"], raise_error=True,
    File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 110, in silent_call
      raise Exception("\n".join((error_msg, LOG_NOTICE)))
  Exception: An error has occurred while building gpboost library file
  The full version of error log was saved into /Users/XXX/GPBoost_compilation.log
  ----------------------------------------
  ERROR: Failed building wheel for gpboost
  Running setup.py clean for gpboost
Failed to build gpboost
Installing collected packages: gpboost
    Running setup.py install for gpboost ... error
    ERROR: Command errored out with exit status 1:
     command: /opt/homebrew/opt/[email protected]/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py'"'"'; __file__='"'"'/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-record-qsq3oe03/install-record.txt --single-version-externally-managed --compile --install-headers /opt/homebrew/include/python3.9/gpboost
         cwd: /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/
    Complete output (36 lines):
    running install
    /opt/homebrew/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    INFO:GPBoost:Starting to compile the library.
    INFO:GPBoost:Starting to compile with CMake.
    Traceback (most recent call last):
      File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 106, in silent_call
        subprocess.check_call(cmd, stderr=log, stdout=log)
      File "/opt/homebrew/Cellar/[email protected]/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 373, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['make', '_gpboost', '-j4']' returned non-zero exit status 2.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 335, in <module>
        setup(name='gpboost',
      File "/opt/homebrew/lib/python3.9/site-packages/setuptools/__init__.py", line 155, in setup
        return distutils.core.setup(**attrs)
      File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 148, in setup
        return run_commands(dist)
      File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
        dist.run_commands()
      File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
        self.run_command(cmd)
      File "/opt/homebrew/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
        cmd_obj.run()
      File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 249, in run
        compile_cpp(use_mingw=self.mingw, use_gpu=self.gpu, use_cuda=self.cuda, use_mpi=self.mpi,
      File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 200, in compile_cpp
        silent_call(["make", "_gpboost", "-j4"], raise_error=True,
      File "/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py", line 110, in silent_call
        raise Exception("\n".join((error_msg, LOG_NOTICE)))
    Exception: An error has occurred while building gpboost library file
    The full version of error log was saved into /Users/XXX/GPBoost_compilation.log
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/homebrew/opt/[email protected]/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py'"'"'; __file__='"'"'/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-njw87q84/gpboost_a18887583e13434eb2dc0b8f86e9bf10/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-record-qsq3oe03/install-record.txt --single-version-externally-managed --compile --install-headers /opt/homebrew/include/python3.9/gpboost Check the logs for full command output.

Here is the GPBoost compilation log:

-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -Xclang -fopenmp (found version "5.0") 
-- Found OpenMP_CXX: -Xclang -fopenmp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
-- Performing Test MM_PREFETCH
-- Performing Test MM_PREFETCH - Failed
-- Performing Test MM_MALLOC
-- Performing Test MM_MALLOC - Success
-- Using _mm_malloc
-- Configuring done
-- Generating done
-- Build files have been written to: /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/build_cpp
[  7%] Building C object CMakeFiles/_gpboost.dir/external_libs/CSparse/Source/cs_dfs.c.o
[  7%] Building C object CMakeFiles/_gpboost.dir/external_libs/CSparse/Source/cs_reach.c.o
[  7%] Building C object CMakeFiles/_gpboost.dir/external_libs/CSparse/Source/cs_spsolve.c.o
[  9%] Building CXX object CMakeFiles/_gpboost.dir/src/GPBoost/DF_utils.cpp.o
[ 16%] Building CXX object CMakeFiles/_gpboost.dir/src/GPBoost/GP_utils.cpp.o
[ 16%] Building CXX object CMakeFiles/_gpboost.dir/src/GPBoost/re_model.cpp.o
[ 16%] Building CXX object CMakeFiles/_gpboost.dir/src/GPBoost/Vecchia_utils.cpp.o
[ 19%] Building CXX object CMakeFiles/_gpboost.dir/src/GPBoost/sparse_matrix_utils.cpp.o
[ 21%] Building CXX object CMakeFiles/_gpboost.dir/src/LightGBM/boosting/boosting.cpp.o
[ 23%] Building CXX object CMakeFiles/_gpboost.dir/src/LightGBM/boosting/gbdt.cpp.o
[ 26%] Building CXX object CMakeFiles/_gpboost.dir/src/LightGBM/boosting/gbdt_model_text.cpp.o
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/GPBoost/re_model.cpp:9:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model.h:13:
/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model_template.h:23:10: fatal error: 'optim.hpp' file not found
#include "optim.hpp" // OptimLib
         ^~~~~~~~~~~
1 error generated.
make[3]: *** [CMakeFiles/_gpboost.dir/src/GPBoost/re_model.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/LightGBM/boosting/boosting.cpp:7:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/LightGBM/boosting/dart.hpp:16:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/LightGBM/boosting/gbdt.h:9:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/LightGBM/objective_function.h:9:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model.h:13:
/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model_template.h:23:10: fatal error: 'optim.hpp' file not found
#include "optim.hpp" // OptimLib
         ^~~~~~~~~~~
1 error generated.
make[3]: *** [CMakeFiles/_gpboost.dir/src/LightGBM/boosting/boosting.cpp.o] Error 1
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/LightGBM/boosting/gbdt.cpp:6:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/LightGBM/boosting/gbdt.h:9:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/LightGBM/objective_function.h:9:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model.h:13:
/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model_template.h:23:10: fatal error: 'optim.hpp' file not found
#include "optim.hpp" // OptimLib
         ^~~~~~~~~~~
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/src/LightGBM/boosting/gbdt_model_text.cpp:6:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/LightGBM/metric.h:12:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/LightGBM/objective_function.h:9:
In file included from /private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model.h:13:
/private/var/folders/b4/8s8w06nx4hl653jrlstcpbzr0000gp/T/pip-install-ef7yzw7b/gpboost_041ecd2108b647ad91a51be93538d5a6/compile/include/GPBoost/re_model_template.h:23:10: fatal error: 'optim.hpp' file not found
#include "optim.hpp" // OptimLib
         ^~~~~~~~~~~
1 error generated.
make[3]: *** [CMakeFiles/_gpboost.dir/src/LightGBM/boosting/gbdt.cpp.o] Error 1
1 error generated.
make[3]: *** [CMakeFiles/_gpboost.dir/src/LightGBM/boosting/gbdt_model_text.cpp.o] Error 1
make[2]: *** [CMakeFiles/_gpboost.dir/all] Error 2
make[1]: *** [CMakeFiles/_gpboost.dir/rule] Error 2
make: *** [_gpboost] Error 2

compile error

Compilation fails because ../GPBoost/external_libs/eigen/Eigen/Core and ../GPBoost/external_libs/eigen/Eigen/src/Core are missing.

I copied them from the official LightGBM code; please update GPBoost accordingly.
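
For anyone hitting this, a minimal diagnostic sketch to check that the vendored third-party headers are present before compiling; the paths below are assumptions inferred from the error messages above and should be adjusted to your source tree:

import os

# Hypothetical locations of the vendored dependencies reported missing above
required = [
    "external_libs/eigen/Eigen/Core",    # Eigen headers
    "external_libs/OptimLib/optim.hpp",  # OptimLib header included by re_model_template.h
]
missing = [p for p in required if not os.path.exists(p)]
print("Missing vendored files:", missing if missing else "none")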

For the example of wages data in Section 5.1, can you provide the corresponding code?

Recently, I have become interested in the GPBoost algorithm and want to apply it further, but I don't know much about the mathematics behind it, especially how the mean function F(x) is set.
In Gaussian Process Boosting, you provided an example on the Friedman3 data set. How is its mean function F(x) obtained, and how should the mean function F(x) be set for other data sets (which may have more than four feature columns)?
Thank you!
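
For reference, a minimal sketch of the Friedman #3 mean function, assuming the simulation follows the standard definition used by scikit-learn's make_friedman3, i.e. F(x) = arctan((x2*x3 - 1/(x2*x4)) / x1) with four features:

import numpy as np
from sklearn.datasets import make_friedman3

# Friedman #3 mean function; columns 0..3 of X correspond to x1..x4
X, F_x = make_friedman3(n_samples=5, noise=0.0, random_state=0)
F_manual = np.arctan((X[:, 1] * X[:, 2] - 1.0 / (X[:, 1] * X[:, 3])) / X[:, 0])
print(np.allclose(F_x, F_manual))  # True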

SHAP example does not work

    shap_values = shap.TreeExplainer(model).shap_values(x)
  File "/opt/conda/lib/python3.7/site-packages/shap/explainers/_tree.py", line 382, in shap_values
    check_additivity)
  File "/opt/conda/lib/python3.7/site-packages/shap/explainers/_tree.py", line 235, in _validate_inputs
    tree_limit = self.model.values.shape[0]
AttributeError: 'TreeEnsemble' object has no attribute 'values'

This occurs when attempting the example described here.
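
Until TreeExplainer handles GPBoost models, a possible workaround sketch, assuming GPBoost's Booster retains LightGBM's pred_contrib predict option (model and x are the trained booster and feature matrix from the example):

import numpy as np

# pred_contrib returns per-feature contributions (SHAP-style values);
# the last column is the expected value (bias term)
contribs = np.asarray(model.predict(x, pred_contrib=True))
shap_values, expected_value = contribs[:, :-1], contribs[0, -1]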

Error in parameter tuning using cross-validation

Hi, thank you for creating this amazing method and package. I am really interested in trying it out.

I was following your blog post (https://towardsdatascience.com/tree-boosted-mixed-effects-models-4df610b624cb) closely, and the first error I encountered concerns the parameter-tuning cross-validation:

# Parameter tuning using cross-validation (only number of boosting iterations)
gp_model = gpb.GPModel(group_data=group_train)
cvbst = gpb.cv(params=params, train_set=data_train,
               gp_model=gp_model, use_gp_model_for_validation=False,
               num_boost_round=100, early_stopping_rounds=5,
               nfold=4, verbose_eval=True, show_stdv=False, seed=1)
best_iter = np.argmin(cvbst['l2-mean'])
print("Best number of iterations: " + str(best_iter))
# Best number of iterations: 32
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_7866/3556734036.py in <module>
      1 # Parameter tuning using cross-validation (only number of boosting iterations)
      2 gp_model = gpb.GPModel(group_data=group_train)
----> 3 cvbst = gpb.cv(params=params, train_set=data_train,
      4                gp_model=gp_model, use_gp_model_for_validation=False,
      5                num_boost_round=100, early_stopping_rounds=5,

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/engine.py in cv(params, train_set, num_boost_round, gp_model, use_gp_model_for_validation, fit_GP_cov_pars_OOS, train_gp_model_cov_pars, folds, nfold, stratified, shuffle, metrics, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, fpreproc, verbose_eval, show_stdv, seed, callbacks, eval_train_metric, return_cvbooster)
    717 
    718     results = collections.defaultdict(list)
--> 719     cvfolds = _make_n_folds(train_set, folds=folds, nfold=nfold,
    720                             params=params, seed=seed, gp_model=gp_model,
    721                             use_gp_model_for_validation=use_gp_model_for_validation,

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/engine.py in _make_n_folds(full_data, folds, nfold, params, seed, gp_model, use_gp_model_for_validation, fpreproc, stratified, shuffle, eval_train_metric)
    412     ret = CVBooster()
    413     for train_idx, test_idx in folds:
--> 414         train_set = full_data.subset(sorted(train_idx))
    415         if full_data.free_raw_data:
    416             valid_set = full_data.subset(sorted(test_idx))

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/basic.py in subset(self, used_indices, params, reference)
   1638                 data_subset = self.data.iloc[used_indices_sorted]
   1639             else:
-> 1640                 data_subset = self.data[used_indices_sorted]
   1641             label_subset = self.label[used_indices_sorted]
   1642             weight_subset = None

~/.conda/envs/gpboost/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3462             if is_iterator(key):
   3463                 key = list(key)
-> 3464             indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3465 
   3466         # take() does not accept boolean indexers

~/.conda/envs/gpboost/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~/.conda/envs/gpboost/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1372                 if use_interval_msg:
   1373                     key = list(key)
-> 1374                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Int64Index([   0,    2,    3,    4,    6,    7,    8,    9,   10,   14,\n            ...\n            4986, 4987, 4988, 4991, 4993, 4994, 4995, 4996, 4998, 4999],\n           dtype='int64', length=3750)] are in the [columns]"

I also tried the code in your python-guide script (https://github.com/fabsig/GPBoost/blob/master/examples/python-guide/parameter_tuning.py) with minimal changes, but got the same error:

# Other parameters not contained in the grid of tuning parameters
params = { 'objective': 'regression_l2', 'verbose': 0, 'num_leaves': 2**10, 'max_bin': 255 }

# Small grid and deterministic search
param_grid_small = {'learning_rate': [1, 0.1,0.01], 'min_data_in_leaf': [20,100],
                    'max_depth': [5,10]}

opt_params = gpb.grid_search_tune_parameters(param_grid=param_grid_small,
                                             params=params,
                                             num_try_random=None,
                                             nfold=4,
                                             gp_model=gp_model,
                                             use_gp_model_for_validation=True,
                                             train_set=data_train,
                                             verbose_eval=1,
                                             num_boost_round=1000, 
                                             early_stopping_rounds=10,
                                             seed=1000,
                                             metrics='l2-mean')
Starting deterministic grid search with 12 parameter combinations...
Trying parameter combination 1 of 12: {'learning_rate': 1.0, 'min_data_in_leaf': 20, 'max_depth': 5} ...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_7866/649370704.py in <module>
----> 1 opt_params = gpb.grid_search_tune_parameters(param_grid=param_grid_small,
      2                                              params=params,
      3                                              num_try_random=None,
      4                                              nfold=4,
      5                                              gp_model=gp_model,

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/engine.py in grid_search_tune_parameters(param_grid, train_set, params, num_try_random, num_boost_round, gp_model, use_gp_model_for_validation, train_gp_model_cov_pars, folds, nfold, stratified, shuffle, metrics, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, fpreproc, verbose_eval, seed, callbacks)
   1019             print("Trying parameter combination " + str(counter_num_comb) +
   1020                   " of " + str(len(try_param_combs)) + ": " + str(param_comb) + " ...")
-> 1021         cvbst = cv(params=params, train_set=train_set, num_boost_round=num_boost_round,
   1022                    gp_model=gp_model, use_gp_model_for_validation=use_gp_model_for_validation,
   1023                    train_gp_model_cov_pars=train_gp_model_cov_pars,

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/engine.py in cv(params, train_set, num_boost_round, gp_model, use_gp_model_for_validation, fit_GP_cov_pars_OOS, train_gp_model_cov_pars, folds, nfold, stratified, shuffle, metrics, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, fpreproc, verbose_eval, show_stdv, seed, callbacks, eval_train_metric, return_cvbooster)
    717 
    718     results = collections.defaultdict(list)
--> 719     cvfolds = _make_n_folds(train_set, folds=folds, nfold=nfold,
    720                             params=params, seed=seed, gp_model=gp_model,
    721                             use_gp_model_for_validation=use_gp_model_for_validation,

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/engine.py in _make_n_folds(full_data, folds, nfold, params, seed, gp_model, use_gp_model_for_validation, fpreproc, stratified, shuffle, eval_train_metric)
    412     ret = CVBooster()
    413     for train_idx, test_idx in folds:
--> 414         train_set = full_data.subset(sorted(train_idx))
    415         if full_data.free_raw_data:
    416             valid_set = full_data.subset(sorted(test_idx))

~/.conda/envs/gpboost/lib/python3.8/site-packages/gpboost/basic.py in subset(self, used_indices, params, reference)
   1638                 data_subset = self.data.iloc[used_indices_sorted]
   1639             else:
-> 1640                 data_subset = self.data[used_indices_sorted]
   1641             label_subset = self.label[used_indices_sorted]
   1642             weight_subset = None

~/.conda/envs/gpboost/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3462             if is_iterator(key):
   3463                 key = list(key)
-> 3464             indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
   3465 
   3466         # take() does not accept boolean indexers

~/.conda/envs/gpboost/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~/.conda/envs/gpboost/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1372                 if use_interval_msg:
   1373                     key = list(key)
-> 1374                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Int64Index([   0,    1,    2,    4,    8,    9,   11,   12,   13,   14,\n            ...\n            4986, 4987, 4989, 4990, 4991, 4993, 4994, 4996, 4998, 4999],\n           dtype='int64', length=3750)] are in the [columns]"

In case it is helpful, here is my session info:

Session information:
-----
gpboost             0.7.0
numpy               1.22.0
pandas              1.3.5
session_info        1.0.0
sklearn             1.0.2
-----
Modules imported as dependencies:
backcall                    0.2.0
beta_ufunc                  NA
binom_ufunc                 NA
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.5.1
decorator                   5.1.0
entrypoints                 0.3
ipykernel                   6.4.1
ipython_genutils            0.2.0
jedi                        0.18.0
joblib                      1.1.0
nbinom_ufunc                NA
parso                       0.8.2
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
prompt_toolkit              3.0.20
ptyprocess                  0.7.0
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.10.0
pytz                        2021.3
scipy                       1.7.3
setuptools                  60.2.0
six                         1.16.0
storemagic                  NA
threadpoolctl               3.0.0
tornado                     6.1
traitlets                   5.1.1
wcwidth                     0.2.5
zmq                         22.3.0
-----
IPython             7.29.0
jupyter_client      7.1.0
jupyter_core        4.9.1
-----
Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
Linux-5.10.68-62.173.amzn2.x86_64-x86_64-with-glibc2.17
-----
Session information updated at 2022-01-03 17:02

Thank you so much for looking into this.

parameter optimization "does not accept boolean indexers" error

I am trying to tune parameters with the following approach, but I get a "take() does not accept boolean indexers" error:

df = pd.read_csv('model_gpboost/multicenter_model.csv') 
y_train = df[['var1']]
X_train =df.drop(['var1','var2','var3','var4'], axis = 1)
group = df[['var2']]
gp_model = gpb.GPModel(group_data=group, likelihood = "bernoulli_probit")
gp_model.set_optim_params(params={"optimizer_cov": "gradient_descent"})
data_train = gpb.Dataset(X_train, y_train)
params = { 'objective': 'binary', 'verbose': 0, 'num_leaves': 2**10 }
# Small grid and deterministic grid search
param_grid_small = {'learning_rate': [0.1,0.01], 'min_data_in_leaf': [20,100],
                    'max_depth': [5,10], 'max_bin': [255,1000]}

opt_params = gpb.grid_search_tune_parameters(param_grid=param_grid_small,
                                             params=params,
                                             num_try_random=None,
                                             nfold=4,
                                             gp_model=gp_model,
                                             use_gp_model_for_validation=True,
                                             train_set=data_train,
                                             verbose_eval=1,
                                             num_boost_round=1000, 
                                             early_stopping_rounds=10,
                                             seed=1,
                                             metrics='binary_logloss') 

The detailed error:

#-> 2908             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2909 
   2910         # take() does not accept boolean indexers

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Int64Index([   0,    1,    2,    3,    4,    5,    6,    8,    9,   10,\n            ...\n            8115, 8116, 8117, 8118, 8119, 8120, 8121, 8122, 8123, 8124],\n           dtype='int64', length=6093)] are in the [columns]"

Is there a problem with using a pandas DataFrame, or am I missing something?
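
A possible workaround sketch: the traceback suggests the cross-validation subsetting ends up indexing the raw pandas DataFrame with row positions as if they were column labels, so passing plain NumPy arrays instead of pandas objects should avoid that code path (variable names follow the snippet above):

import gpboost as gpb

# Convert the pandas objects to NumPy arrays before building the model/Dataset
gp_model = gpb.GPModel(group_data=group.to_numpy().ravel(),
                       likelihood="bernoulli_probit")
data_train = gpb.Dataset(X_train.to_numpy(), label=y_train.to_numpy().ravel())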

Python package build failure

I am trying to build the Python package from source on Ubuntu, and have installed gcc, g++ and cmake via apt. However, the build fails as follows:

INFO:GPBoost:Starting to compile with CMake.
Traceback (most recent call last):
  File "setup.py", line 106, in silent_call
    subprocess.check_call(cmd, stderr=log, stdout=log)
  File "/home/ubuntu/anaconda3/envs/heat_maps/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '../compile/']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 335, in <module>
    setup(name='gpboost',
  File "/home/ubuntu/anaconda3/envs/heat_maps/lib/python3.8/site-packages/setuptools/__init__.py", line 163, in setup
    return distutils.core.setup(**attrs)
  File "/home/ubuntu/anaconda3/envs/heat_maps/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/ubuntu/anaconda3/envs/heat_maps/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/ubuntu/anaconda3/envs/heat_maps/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 249, in run
    compile_cpp(use_mingw=self.mingw, use_gpu=self.gpu, use_cuda=self.cuda, use_mpi=self.mpi,
  File "setup.py", line 199, in compile_cpp
    silent_call(cmake_cmd, raise_error=True, error_msg='Please install CMake and all required dependencies first')
  File "setup.py", line 110, in silent_call
    raise Exception("\n".join((error_msg, LOG_NOTICE)))
Exception: Please install CMake and all required dependencies first
The full version of error log was saved into /home/ubuntu/GPBoost_compilation.log

I am using gcc/g++ 9.3 and cmake 3.16.3-1ubuntu1.
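
The setup.py exception only wraps the real CMake failure; a minimal sketch for surfacing the underlying error, using the log path reported in the message above:

# Print the saved CMake output; the actual missing dependency is usually
# named near the end of this log.
with open("/home/ubuntu/GPBoost_compilation.log") as f:
    print(f.read())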
