stanfordmlgroup / ngboost
Natural Gradient Boosting for Probabilistic Prediction
License: Apache License 2.0
Disclaimer: forgive me for all my stupid mistakes and/or misinterpretation of several statistical things. My intentions were to provide a complete, working example of the Bernoulli class before uploading it, but because several people told me that they would like to help I decided to put this (preliminary) version already on GitHub. Once it is in a more sophisticated state I guess we can open up a pull request.
Classification problems require distributions that match the outcome type (e.g. Bernoulli for binary classification). My aim was therefore to create a Bernoulli class that makes binary classification with NGBoost feasible. Over the last few days I studied a lot of probability and statistics and did my best to read the NGBoost paper in detail. Please forgive me in advance if I have misunderstood the whole concept and my implementation is complete nonsense (if so, please tell me). Below I provide a first version of the Bernoulli class, about which I would like to point out several things:
The Bernoulli class was tested on the breast cancer classification dataset (scikit-learn) and it somehow seems to converge. Also the predicted probabilities seem to match with the labelled outcomes of the test dataset. There are some runtime issues, however.
I am not sure how to implement the Bernoulli.fit method; I have no better idea than to set the initial parameter to the average positive probability in the dataset. Additionally, I am not 100% sure about the nll, D_nll and fisher_info functions.
Running the example gives a lot of RuntimeWarnings, exclusively about mathematical operations (e.g. invalid values, divisions by zero). These are raised in the NGBoost.line_search function, but I have yet to look into what exactly is causing them.
Bernoulli class:
import numpy as np
from scipy.stats import bernoulli as dist


class Bernoulli(object):
    """
    Bernoulli class containing the Bernoulli distribution.
    ...
    Attributes
    ----------
    n_params : int
        number of parameters of the distribution.

    Methods
    -------
    nll(Y)
        returns the negative log-likelihood of data `Y`.
    D_nll(Y)
        returns the first derivative of the negative log-likelihood given data `Y`.
    fisher_info()
        returns the Fisher information.
    """
    n_params = 1

    def __init__(self, params):
        # Probability of success (the only parameter)
        self.p = params[0]
        # Underlying scipy.stats distribution
        self.dist = dist(self.p)

    def __getattr__(self, name):
        # Delegate unknown attributes to the scipy.stats distribution
        if name in dir(self.dist):
            return getattr(self.dist, name)
        return None

    def nll(self, Y):
        # formula: -(log(p) * Y + log(1 - p) * (1 - Y))
        Y = Y.squeeze()
        return np.array(-(np.log(self.p) * Y + np.log(1. - self.p) * (1 - Y)))

    def D_nll(self, Y):
        # derivative of the NLL with respect to p: -(Y / p) + (1 - Y) / (1 - p)
        Y = Y.squeeze()
        D = -(Y / self.p) + ((1 - Y) / (1 - self.p))
        return D.reshape(-1, 1)

    def crps(self, Y):
        raise NotImplementedError('crps not implemented yet')

    def crps_metric(self, Y):
        raise NotImplementedError('crps_metric not implemented yet')

    def fisher_info(self):
        # formula: 1 / (p * (1 - p))
        FI = np.ones((self.p.shape[0], 1, 1))
        FI[:, 0, 0] = 1 / (self.p * (1 - self.p))
        return FI

    def fisher_info_cens(self, Y):
        # not sure, is this a specific function for censored data?
        # those "_cens" functions are not called anywhere in the API; I guess
        # they can be removed and are mainly in other classes because of deprecated code?
        raise NotImplementedError('fisher_info_cens not implemented yet')

    @staticmethod
    def fit(Y):
        # how to fit to initial generic data?
        # for now `p` is set to the fraction of positive labels, not sure if this is correct..
        return np.array([Y.squeeze().mean()])
To perform a small test (WARNING: loads of RuntimeWarnings):
from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.scores import MLE
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
ngb = NGBoost(Base=default_tree_learner, Dist=Bernoulli, Score=MLE(), natural_gradient=True,
              verbose=True)
ngb.fit(X_train, Y_train)
Y_dists = ngb.pred_dist(X_test)
test_NLL = Y_dists.nll(Y_test.flatten()).mean()
print('Test NLL', test_NLL)
print(Y_dists.p)
print(Y_test)
The full example is also available as a Google Colab notebook at: https://colab.research.google.com/drive/1_O2w1MXjuMKq7bc8Pj4Atv5a_bGKWSp7.
I am open to any suggestions, tips, help or guidance on how to develop this further. Once more, my apologies in advance if I am completely missing the point somewhere.
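One way to sanity-check nll, D_nll and fisher_info formulas like the ones above is to compare the analytic gradient against finite differences. The sketch below is not NGBoost's implementation. It assumes a single real-valued parameter `logit` with p = sigmoid(logit), chosen here so that p always stays inside (0, 1), and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bernoulli_nll(logit, y):
    # Per-observation negative log-likelihood
    p = sigmoid(logit)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def bernoulli_d_nll(logit, y):
    # d NLL / d logit simplifies to p - y
    return sigmoid(logit) - y


def bernoulli_fisher(logit):
    # Fisher information with respect to the logit: E[(p - Y)^2] = p (1 - p)
    p = sigmoid(logit)
    return p * (1.0 - p)


logit = np.array([-1.5, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0])

# Central finite differences as an independent check of the analytic gradient
h = 1e-6
numeric_grad = (bernoulli_nll(logit + h, y) - bernoulli_nll(logit - h, y)) / (2 * h)
analytic_grad = bernoulli_d_nll(logit, y)

# Natural gradient for a one-parameter distribution: gradient / Fisher information
natural_grad = analytic_grad / bernoulli_fisher(logit)
```

If the two gradients disagree, a sign or algebra error in the analytic formula is the usual suspect.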
The NGBoost scripts are not working because files were renamed and moved.
This link shows the result of executing the scripts:
Google Colaboratory - execute NGBoost scripts
Encountering this error repeatedly - "type object 'LogNormal' has no attribute 'scores' ". How can I fix it?
Moving discussion here from #60 (comment)
@cooberp @kmedved here is some scaffold code that I'm hoping you folks can fill in/modify to get your distributions up and running:
from ngboost.distns import Distn
import numpy as np


class Generalized3ParamBurrXIIDiscrete(Distn):
    n_params = 3

    def __init__(self, params):
        self.params_ = params  # all real numbers
        self.mu, self.sigma, self.nu = np.exp(params)  # transform to positives

    def sample(self, m):
        """
        Code to draw m random samples from the distribution. Each sample will have
        n = len(self) = len(self.mu) elements, so the output should be m x n
        """
        return None

    def fit(Y):
        """
        Code to fit a *single* Discrete Burr XII distribution to data Y.
        Needs to return parameters mu, sigma, and nu in a numpy array.
        This is just for initialization of the NGBoost algorithm, so it doesn't need to be perfect.
        Ballpark is good enough.
        """
        mu, sigma, nu = None, None, None
        return np.array([mu, sigma, nu])

    # log score methods
    def nll(self, Y):
        """
        Negative log-likelihood (per observation).
        Returns a vector of length n = len(self)
        """
        return -np.log(((1 + (Y/self.mu)**self.sigma)**(-self.nu)) - ((1 + ((Y+1)/self.mu)**self.sigma)**(-self.nu)))

    def D_nll(self, Y):
        """
        Returns the derivative of self.nll() with respect to each of the real-valued
        parameters [log(mu), log(sigma), log(nu)].
        These can be easily calculated using, e.g., wolframalpha, and efficiently implemented here.
        """
        # placeholders: replace the zeros with the actual derivatives
        d_log_mu = np.zeros_like(self.mu)
        d_log_sigma = np.zeros_like(self.sigma)
        d_log_nu = np.zeros_like(self.nu)
        return np.array([d_log_mu, d_log_sigma, d_log_nu])
This is for Discrete Burr XII, but the equivalent code should also do to implement your skew-t distribution.
D_nll() should be easy. Just copy-paste the nll into wolframalpha, edit the variable names, and ask for derivatives. Copy-paste back to D_nll, re-edit the variable names, and call it a day.
The biggest challenge, I think, will be implementing the sample() method, which is necessary if you don't want to derive/implement the Fisher Information. I was working on this myself but didn't have luck with anything simple. As you know, the distribution isn't already implemented in scipy.stats or another python package. scipy.stats does have a Burr XII, which I hoped to sample from and then use np.floor() on to get the discrete version, but then I noticed that what they call Burr XII has a different number of parameters than the pmf you gave me, which I think corresponds to some 3-parameter "generalized" version of the (discretized) Burr XII... All of which is fine, if that's what you want, but the upshot is that there isn't a pre-implemented version to sample from or an easy way to make one.
On the other hand, I don't think this is at all an insurmountable challenge. Making a sampling algorithm for a "custom" distribution is fairly straightforward using, e.g., inverse transform sampling. All you need to do is calculate the inverse CDF (use wolframalpha or whatever) and implement that. And if that doesn't work, there are other methods. All in all, still probably easier than deriving the Fisher.
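As a rough illustration of inverse transform sampling for this case, the sketch below reads a survival function S(y) = (1 + (y/mu)^sigma)^(-nu) off the nll in the scaffold (where the pmf is S(y) - S(y+1)), inverts it in closed form, and floors the continuous draw. That reading of the pmf is an assumption, and `sample_discrete_burr` is a hypothetical helper, not part of NGBoost.

```python
import numpy as np


def sample_discrete_burr(mu, sigma, nu, m, rng=None):
    """Draw m samples per observation; returns an (m, n) integer array."""
    rng = np.random.default_rng() if rng is None else rng
    mu, sigma, nu = (np.atleast_1d(np.asarray(a, dtype=float)) for a in (mu, sigma, nu))
    # Uniform draws on (0, 1], so that u ** (-1 / nu) stays finite
    u = 1.0 - rng.random((m, mu.shape[0]))
    # Solve S(x) = u for x, where S(x) = (1 + (x / mu) ** sigma) ** (-nu)
    x = mu * (u ** (-1.0 / nu) - 1.0) ** (1.0 / sigma)
    # Flooring the continuous draw gives P(Y = y) = S(y) - S(y + 1)
    return np.floor(x).astype(int)


rng = np.random.default_rng(0)
samples = sample_discrete_burr([2.0, 5.0], [1.5, 2.0], [2.0, 3.0], m=20000, rng=rng)

# Compare the empirical P(Y = 0) for the first observation with 1 - S(1)
expected_p0 = 1.0 - (1.0 + (1.0 / 2.0) ** 1.5) ** (-2.0)
empirical_p0 = (samples[:, 0] == 0).mean()
```

Very small values of nu can make the draws overflow, so a real implementation would probably need some clipping.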
The fit() method is also not necessarily trivial, but feel free to use whatever heuristics you want since it's just for initialization. Or go wild and implement/call some optimization method of your choosing.
I haven't looked into skew-T that closely, but the same general ideas should apply. And if the proper sample() method is already implemented somewhere, your job will be easier.
Since this is all a little bit of a challenge and not distributions others will likely use, I'm hoping you two (and/or your collaborators) can give me a hand here and give this a shot. But please do let me know where/if you get stuck and I will jump in to rescue as necessary!
Hi,
I couldn't find any version attribute within the package.
I think it's helpful for bug reports to be able to do
import ngboost
print(ngboost.__version__)
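Until a `__version__` attribute exists, the installed version can usually be read from the package metadata instead. A minimal sketch, assuming Python 3.8+ (where `importlib.metadata` is in the standard library); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print(installed_version("ngboost"))
```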
It would be nice to have an example of a classification problem.
If X is a pandas dataframe and Y is a series, and minibatch_frac == 1, then ngb.fit(X, Y) works with no issues.
But if minibatch_frac < 1, then the function sample in ngboost.py fails to work on dataframes:
def sample(self, X, Y, params):
    if self.minibatch_frac == 1.0:
        return np.arange(len(Y)), X, Y, params
    sample_size = int(self.minibatch_frac * len(Y))
    idxs = np_rnd.choice(np.arange(len(Y)), sample_size, replace=False)
    return idxs, X[idxs,:], Y[idxs], params[idxs, :]
because X[idxs, :] is not valid DataFrame syntax.
A workaround that works on my machine:
def sample(self, X, Y, params):
    if self.minibatch_frac == 1.0:
        return np.arange(len(Y)), X, Y, params
    sample_size = int(self.minibatch_frac * len(Y))
    idxs = np_rnd.choice(np.arange(len(Y)), sample_size, replace=False)
    try:
        X_batch = X[idxs,:]
    except TypeError:
        # pandas DataFrames need positional indexing via .iloc
        X_batch = X.iloc[idxs, :]
    return idxs, X_batch, Y[idxs], params[idxs, :]
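An alternative to the try/except is to pick the indexing style by duck typing. This is only a sketch: `take_rows` is a hypothetical helper, and it assumes that anything exposing `.iloc` is a pandas object. Using `.iloc` for Y as well avoids label-based lookups when Y is a Series whose index is not 0..n-1.

```python
import numpy as np


def take_rows(data, idxs):
    """Select rows by position from a NumPy array or a pandas DataFrame/Series."""
    if hasattr(data, "iloc"):  # pandas objects expose positional indexing via .iloc
        return data.iloc[idxs]
    return np.asarray(data)[idxs]


rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)  # row i is [2i, 2i + 1]
Y = np.arange(10)
idxs = rng.choice(len(Y), size=5, replace=False)
X_batch, Y_batch = take_rows(X, idxs), take_rows(Y, idxs)
```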
I'm running version 0.1.3, installed from github via pip today, on Mac OS 10.14.13, python version 3.7.4
Running your code from https://github.com/stanfordmlgroup/ngboost/blob/master/examples/classification.py
I received the following error when I try ngb.predict(X_test):
`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
----> 1 ngb.predict(X_test)
/usr/local/lib/python3.6/dist-packages/ngboost/ngboost.py in predict(self, X)
133 def predict(self, X):
134 dist = self.pred_dist(X)
--> 135 return list(dist.loc.flatten())
136
137 def score(self, X, Y):
AttributeError: 'NoneType' object has no attribute 'flatten'
----------------------------------------------------------------`
ngb.pred_dist(X_test) works properly, but the above error is also obtained when using the sklearn cross-validation function cross_validate().
Regards,
re: #60 (comment)
requires: dist.sample() and dist.pdf() methods in terms of parameters in R
Thanks for the great work!
Is there any way to estimate aleatoric and epistemic uncertainties separately with this method?
pip install ngboost
Collecting ngboost
Using cached https://files.pythonhosted.org/packages/58/15/8942e2b8a38f92b92e1dedad882b0746372e4acde236e74b98bfa66d717a/ngboost-0.1.3-py3-none-any.whl
Collecting scikit-learn>=0.21.3
Using cached https://files.pythonhosted.org/packages/76/79/60050330fe57fb59f2c53d0d11673df28c20ea9315da3652477429fc4949/scikit_learn-0.21.3-cp36-cp36m-win_amd64.whl
Collecting numpy>=1.17.2
Using cached https://files.pythonhosted.org/packages/55/7a/f32b39164262765b069b0fe3ec5d4b47580c9c60f7bd3588b58ba8e93a4c/numpy-1.17.3-cp36-cp36m-win_amd64.whl
ERROR: Could not find a version that satisfies the requirement jaxlib>=0.1.29 (from ngboost) (from versions: none)
ERROR: No matching distribution found for jaxlib>=0.1.29 (from ngboost)
Collecting git+https://github.com/stanfordmlgroup/ngboost.git
Cloning https://github.com/stanfordmlgroup/ngboost.git to c:\users\stig.cz\appdata\local\temp\pip-req-build-zjnf7vpn
Running command git clone -q https://github.com/stanfordmlgroup/ngboost.git 'C:\Users\Stig.CZ\AppData\Local\Temp\pip-req-build-zjnf7vpn'
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
ERROR: Command errored out with exit status 128: git clone -q https://github.com/stanfordmlgroup/ngboost.git 'C:\Users\Stig.CZ\AppData\Local\Temp\pip-req-build-zjnf7vpn' Check the logs for full command output.
model = NGBClassifier(Base=default_tree_learner, Dist=Bernoulli,
Score=MLE, natural_gradient=True, verbose=False,n_estimators = 500)
Here is the error:
` 226 # fitting
--> 227 model.fit(train_arx,train_ary)
228 if return_proba :
229 predict_value = model.predict_proba(test_arx)
~/anaconda3/lib/python3.6/site-packages/ngboost/ngboost.py in fit(self, X, Y, X_val, Y_val, sample_weight, val_sample_weight, train_loss_monitor, val_loss_monitor, early_stopping_rounds)
119 loss_list += [train_loss_monitor(D, Y_batch)]
120 loss = loss_list[-1]
--> 121 grads = S.grad(D, Y_batch, natural=self.natural_gradient)
122
123 proj_grad = self.fit_base(X_batch, grads, sample_weight)
~/anaconda3/lib/python3.6/site-packages/ngboost/scores.py in grad(forecast, Y, natural)
13 grad = forecast.D_nll(Y)
14 if natural:
---> 15 grad = np.linalg.solve(fisher, grad)
16 return grad
17
<array_function internals> in solve(*args, **kwargs)
~/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in solve(a, b)
401 signature = 'DD->D' if isComplexType(t) else 'dd->d'
402 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 403 r = gufunc(a, b, signature=signature, extobj=extobj)
404
405 return wrap(r.astype(result_t, copy=False))
~/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
95
96 def _raise_linalgerror_singular(err, flag):
---> 97 raise LinAlgError("Singular matrix")
98
99 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix`
On the same data, fitting sometimes works fine and sometimes raises this error. What should I do when this error occurs?
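The traceback ends in `np.linalg.solve(fisher, grad)`, which raises as soon as one of the per-observation Fisher matrices is singular. One common workaround, sketched below, is to add a small multiple of the identity before solving. This is an assumption about a possible mitigation, not the library's fix; `natural_gradient` and `jitter` are hypothetical names.

```python
import numpy as np


def natural_gradient(fisher, grad, jitter=1e-8):
    """Solve fisher @ x = grad per observation.

    fisher has shape (n, k, k), grad has shape (n, k); returns shape (n, k).
    """
    k = fisher.shape[-1]
    # A small diagonal term keeps every matrix invertible
    regularized = fisher + jitter * np.eye(k)
    # Add a trailing axis so each row of grad is treated as a column vector
    return np.linalg.solve(regularized, grad[..., None])[..., 0]


fisher = np.stack([np.eye(2), np.zeros((2, 2))])  # second matrix is singular
grad = np.array([[1.0, 2.0], [0.5, -0.5]])
step = natural_gradient(fisher, grad)
```

The jitter only prevents the exception. For a truly singular matrix the resulting step is huge, so it is probably still worth checking why the Fisher information degenerates, for example predicted probabilities collapsing to 0 or 1.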
It seems the empirical results in "ngboost/examples/empirical/regression.py" were obtained by retraining the model (changing the tree structures) rather than refitting it (keeping the pretrained tree structures). However, the paper says exactly "refit", which is more reasonable.
MAE is popular for regression problems, but the current model only uses RMSE.
Disclaimer: my background is mainly in (clinical) chemistry and less in maths/statistics (although I try to read and learn about it on a day-to-day basis) so please excuse me in advance if I am missing some obvious points.
I am trying to implement a ngboost.distns.Bernoulli class for binary classification problems using NGBoost algorithms. However, I have some questions about the implementation of ngboost.distns.Normal, specifically the nll and D_nll functions. The nll (negative log-likelihood) function is written as follows:
def nll(self, Y):
    return -self.dist.logpdf(Y).mean()
My first question is: is this actually the negative log-likelihood function? As far as I know, the log-likelihood of a Normal distribution is the sum of the log-densities, i.e. logpdf(<data>).sum(). Is there a specific reason to divide by N in this nll function? Secondly, the D_nll implementation (derivative of the nll function) is as follows:
def D_nll(self, Y_):
    Y = Y_.squeeze()
    D = np.zeros((self.var.shape[0], 2))
    D[:, 0] = (self.loc - Y) / self.var
    D[:, 1] = 1 - ((self.loc - Y) ** 2) / self.var
    return D
Why do we use this kind of implementation for the derivative? Wouldn't it be better to go for a more generic way and, for instance, use scipy.optimize? Does this have to do with the fact that we use natural gradients? Third, why do we add 1e-8 to the scale and variance of our Normal distribution when we initialize it?
Thank you very much in advance.
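A hand-written derivative like the D_nll above can be checked numerically. The sketch below assumes the two parameters are mu and log(sigma); that parametrization is an assumption, inferred from the `1 - (loc - Y)**2 / var` term. It compares the closed form against central finite differences of the per-observation NLL, and the function names are hypothetical.

```python
import numpy as np


def normal_nll(mu, log_sigma, y):
    # Per-observation negative log-likelihood of a Normal(mu, sigma^2)
    sigma = np.exp(log_sigma)
    return 0.5 * np.log(2 * np.pi) + log_sigma + (y - mu) ** 2 / (2 * sigma ** 2)


def normal_d_nll(mu, log_sigma, y):
    # Closed-form gradient with respect to (mu, log_sigma)
    var = np.exp(2 * log_sigma)
    d_mu = (mu - y) / var
    d_log_sigma = 1 - (mu - y) ** 2 / var
    return np.stack([d_mu, d_log_sigma], axis=-1)


mu = np.array([0.0, 1.0, -2.0])
log_sigma = np.array([0.0, 0.5, -0.3])
y = np.array([0.5, 3.0, -1.0])

# Central finite differences in each parameter
h = 1e-6
num_mu = (normal_nll(mu + h, log_sigma, y) - normal_nll(mu - h, log_sigma, y)) / (2 * h)
num_ls = (normal_nll(mu, log_sigma + h, y) - normal_nll(mu, log_sigma - h, y)) / (2 * h)
numeric = np.stack([num_mu, num_ls], axis=-1)
analytic = normal_d_nll(mu, log_sigma, y)
```

The closed form costs a handful of array operations per boosting iteration, which is one plausible reason to prefer it over a generic numerical routine.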
Hi, I was wondering which is the purpose of the arguments train_loss_monitor and val_loss_monitor. Sklearn models usually include a monitor argument that allows for early stopping. Is that the idea?
Thank you for the excellent work with NGBoost, really excited to having been testing it out!
In commit c4b46b9 the fit method was altered to return self instead of the train and val losses. Is there any way to access the losses with the current behavior?
I believe the losses should be accessible, because we may not be interested in early stopping but rather in training for a larger number of iterations and simply choosing the best val loss.
Also, returning the losses is essential to compare different models.
I might be wrong, but the gradient seems incorrect with respect to sigma.
I think the version below should be used. My tests give bad results on boston (nll) with the current implementation; with the following change the results are good again:
def D_nll(self, Y_):
    Y = Y_.squeeze()
    D = np.zeros((self.var.shape[0], 2))
    D[:, 0] = (Y - self.loc) / self.var
    D[:, 1] = (Y - self.loc)**2 / self.scale**3 - 1/self.scale
    return -D
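For reference, the per-observation Normal NLL and its derivatives can be written out as below. Which form is appropriate depends on whether the model's second parameter is sigma itself or log(sigma), and this note does not settle which one the library uses. The two derivatives differ only by the chain-rule factor sigma.

```latex
\begin{aligned}
\ell(\mu, \sigma; y) &= \log\sigma + \frac{(y - \mu)^2}{2\sigma^2} + \tfrac{1}{2}\log(2\pi) \\
\frac{\partial \ell}{\partial \mu} &= \frac{\mu - y}{\sigma^2} \\
\frac{\partial \ell}{\partial \sigma} &= \frac{1}{\sigma} - \frac{(y - \mu)^2}{\sigma^3} \\
\frac{\partial \ell}{\partial \log\sigma} &= \sigma \, \frac{\partial \ell}{\partial \sigma} = 1 - \frac{(y - \mu)^2}{\sigma^2}
\end{aligned}
```

The third line matches the `-D[:, 1]` term proposed above. The fourth line is the form that applies when the parameter being boosted is log(sigma).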
I read the paper and tried this package, and I have to say it is marvelous! I will not hesitate to use it in my real-life work, but I wonder if there is a plan to add support for model explanation tools such as feature importance plots, SHAP plots or a tree visualizer. Being able to present the model explanation to business stakeholders would be crucial.
Thanks for this fantastic masterpiece!
Thanks for sharing your work. It may be worth adding support for scikit-learn's GridSearchCV to improve usability and reach.
In the classification example (https://github.com/stanfordmlgroup/ngboost/blob/master/examples/empirical/clf_sklearn.py) , I tried to provide custom scoring metric:
grid_search = GridSearchCV(ngb, param_grid=param_grid, scoring = 'roc_auc', cv=5)
However, the following error is raised:
'NGBClassifier' object has no attribute 'predict_proba'
default_tree_learner is DecisionTreeRegressor with friedman_mse criterion, which is kinda weird to use for classification.
I may be a bit confused here but is it really fine to use Regressor and not Classifier as a base class? It may be by design, but looks really weird.
This package looks so promising!
I am just testing it out on my dataset with dimensions (N, M) = (57795, 144). At first I tried with N=100, N=1000, and N=10_000 and it worked well. Now I am trying to run it on all N=57_795 and I am encountering some overflow errors, see below. Is this something to be worried about?
[iter 0] loss=2.6377 val_loss=0.0000 scale=0.1250 norm=0.3378
~/miniconda3/envs/py37/lib/python3.7/site-packages/ngboost/distns/normal.py:13: RuntimeWarning:
overflow encountered in exp
~/miniconda3/envs/py37/lib/python3.7/site-packages/ngboost/distns/normal.py:14: RuntimeWarning:
overflow encountered in square
Cheers,
Christian
Hi, thank you for this repository. I'm using it in my work. I hope to know how to save and load the trained model. joblib.dump doesn't work.
Collecting jaxlib
Could not find a version that satisfies the requirement jaxlib (from versions: )
No matching distribution found for jaxlib
The issue raised on the jax GitHub page causes installation problems under Windows for this package.
Would you consider releasing the PyTorch version too? I could make it work from the backup branch and the results look similar (though I'm not sure whether the PyTorch implementation contains all the tricks from the jax version).
I want to continue testing (e.g. #35) with Github Actions.
This algorithm works really well for regression problems, where we can choose among the available distributions. However, for classification it is limited to Bernoulli, which outputs a single probability value for a given outcome. Is there any way to get a confidence interval estimate for each possible outcome?
I believe the indentation is broken as of the last commit.
>>> from ngboost import NGBRegressor
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/omarwagih/miniconda2/envs/pred_dpsi/lib/python3.8/site-packages/ngboost/__init__.py", line 2, in <module>
from .api import NGBClassifier, NGBRegressor, NGBSurvival
File "/Users/omarwagih/miniconda2/envs/pred_dpsi/lib/python3.8/site-packages/ngboost/api.py", line 64
super().__init__(Dist, Score, Base, natural_gradient, n_estimators, learning_rate,
^
IndentationError: expected an indented block
Can you please fix?
Hi,
First of all, many thanks for ngboost.
The issue I'm reporting is related to the feature importances property. When I try to get this property, unfortunately the returned value is None. I believe this is due to the following if clause:
if not 'sklearn.tree.tree.DecisionTreeRegressor' in str(type(self.base_models[0][0])): return None
In sklearn version 0.22, str(type(self.base_models[0][0])) returns <class 'sklearn.tree._classes.DecisionTreeRegressor'>
For me, it works if I replace the clause by
if not isinstance(self.base_models[0][0], sklearn.tree.DecisionTreeRegressor):
Can you please check if you also have this problem?
Many thanks,
Carlos
Column sampling by base learner would make NGBoost trivially scalable to high-dimensional datasets, so we should implement it.
Hi again,
Setting
NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True)
vs
NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=False)
seems to give the same results. The code below shows at least this is the case for the predictions.
I checked the source code and the line https://github.com/stanfordmlgroup/ngboost/blob/master/ngboost/ngboost.py#L21 sets the attribute natural_gradient but then it doesn't seem to use it anywhere else in that file.
from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.scores import MLE
from ngboost.distns import Normal
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
np.random.seed(seed=2334)
X, Y = load_boston(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
ngb_natural = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True,
verbose=False)
np.random.seed(seed=2334)
ngb_natural.fit(X_train, Y_train)
Y_preds_nat = ngb_natural.predict(X_test)
ngb_artificial = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=False,
verbose=False)
np.random.seed(seed=2334)
ngb_artificial.fit(X_train, Y_train)
Y_preds_art = ngb_artificial.predict(X_test)
#This one comes out True
assert np.allclose(Y_preds_nat, Y_preds_art)
Hello! I am very excited to see this package 🎉. Was wondering what the release plan to PyPI and conda-forge is? If you make a PyPI package, I am happy to help build the conda-forge recipe for you.
Are there any plans for implementing this in R?
Dear ngboost dev-team,
there is currently some discussion going on around catboost implementation of probabilistic forecasting:
Implement relevant algorithms from NGBoost
learn prediction intervals (variance, noise)
Since ngboost is mentioned here as well, wanted to let you know.
X, Y = load_breast_cancer(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
ngb = NGBClassifier(Dist=Bernoulli, verbose=True, minibatch_frac=0.5, n_estimators=50)
ngb.fit(X_train, Y_train)
usually delivers a long error stack terminating in
TypeError: ufunc 'expit' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
but
ngb = NGBClassifier(Dist=Bernoulli, verbose=True, minibatch_frac=1, n_estimators=50)
ngb.fit(X_train, Y_train)
does not, nor does
ngb = NGBClassifier(Dist=Bernoulli, verbose=True, minibatch_frac=0.5, n_estimators=20)
ngb.fit(X_train, Y_train)
I would like to know what the limitations are in adding base learners. In ngboost/ngboost/learners.py I see only two learners implemented, each taken from sklearn. Can we add any base learner from sklearn simply by adding it to this file and then specifying it at NGBoost instantiation time? If not, what is the limitation? Why aren't more base learners implemented?
ngboost.sample() will break if you try to use it with survival data in the expected {Time, Event} format.
Hey, great work. FYI, the first link to the ngboost page in your readme is broken. I guess it should be https://stanfordmlgroup.github.io/projects/ngboost/ instead of https://stanfordmlgroup.github.io/project/ngboost/.
Thanks !
If fit was called with a validation set, subsequent calls to predict should return the predictions from the model at the best iteration according to the score on the validation set.
This is an amazing project and I have high hopes for using ngboost in my work. I don't currently see any sample_weight functionality. Are there any plans to add this? (I apologize, as I lack the technical expertise to do it myself).
Hi! I couldn't find anything on the site nor the paper on how to utilize this with GPUs. Is there even any GPU support as of yet?
Hi,
I expected the training and predictions to be the same after setting np.random.seed.
Is there another seed I should set to obtain reproducible results?
Below I have an example you can run. I'm using the current version from Github.
from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.scores import MLE
from ngboost.distns import Normal
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
np.random.seed(seed=2334)
X, Y = load_boston(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
ngb_natural = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True,
verbose=False)
ngb_natural.fit(X_train, Y_train)
Y_preds_nat = ngb_natural.predict(X_test)
ngb_natural2 = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True,
verbose=False)
ngb_natural2.fit(X_train, Y_train)
Y_preds_nat2 = ngb_natural2.predict(X_test)
#This one comes out False
assert np.allclose(Y_preds_nat, Y_preds_nat2)
#This one too
assert np.allclose(Y_preds_nat, Y_preds_nat2, rtol=1e-3)
#This one is true
assert np.allclose(Y_preds_nat, Y_preds_nat2, rtol=1e-2)
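One possible explanation is that the seed is set only once in the script above, so the two fit calls consume different parts of the same random stream. Whether this fully accounts for the NGBoost behaviour is an assumption. The sketch below uses plain NumPy draws as a stand-in for the random choices made during fitting, and `fake_fit` is a hypothetical placeholder.

```python
import numpy as np


def fake_fit(n_draws=5):
    """Stand-in for a fit call that consumes global NumPy randomness."""
    return np.random.rand(n_draws)


np.random.seed(2334)
first = fake_fit()
second = fake_fit()  # continues the same stream, so the draws differ

np.random.seed(2334)
third = fake_fit()  # reseeded, so the stream restarts

same_without_reseed = np.allclose(first, second)
same_with_reseed = np.allclose(first, third)
```

Under that assumption, calling np.random.seed again right before each fit would be the first thing to try.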
hello, I've tried NGBClassifier with some configurations, but no luck. thank you.
Would it make sense to implement early_stopping_rounds like in LGBM or XGB? If so, I'm happy to contribute to the issue.
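The usual early-stopping rule can be sketched independently of any library: keep the iteration with the lowest validation loss and stop once it has not improved for a given number of rounds. `best_iteration` below is a hypothetical helper operating on a plain list of losses, not NGBoost, LightGBM or XGBoost API.

```python
def best_iteration(val_losses, early_stopping_rounds):
    """Return (best_index, stop_index) for a sequence of validation losses."""
    best_index, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` iterations: stop here
            return best_index, i
    # Never triggered: training ran to the last iteration
    return best_index, len(val_losses) - 1


best, stopped = best_iteration([3.0, 2.0, 1.0, 1.5, 1.6, 1.7], early_stopping_rounds=2)
```

The same best index is what a predict call would use if predictions were taken from the best iteration on the validation set.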