
ngboost's People

Contributors

alejandroschuler, angeldroth, avati, btatkinson, cdagnino, comprhys, connortann, cs2716, daikikatsuragawa, dependabot[bot], dguzelkokar, dingdaisy, eco3, jack-mcivor, joseortiz3, matsuken92, mlko53, mzjp2, peterdudfield, rhjohnstone, ryan-wolbeck, saianil58, samshipengs, themrzmaster, tokoroten, tonyduan, wakame1367, yutayamazaki, zhiruiwang, zyxue

ngboost's Issues

Discrete Burr XII and Skew-T distributions

Moving discussion here from #60 (comment)

@cooberp @kmedved here is some scaffold code that I'm hoping you folks can fill in/modify to get your distributions up and running:

from ngboost.distns import Distn
import numpy as np

class Generalized3ParamBurrXIIDiscrete(Distn):

    n_params = 3 

    def __init__(self, params):
        self.params_ = params # all real numbers 
        self.mu, self.sigma, self.nu = [np.exp(p) for p in params] # transform to positives

    def sample(self, m):
        """
        Code to draw m random samples from the distribution. Each sample will have 
        n = len(self) = len(self.mu) elements, so the output should be m x n        
        """
        return None

    def fit(Y):
        """
        Code to fit a *single* Discrete Burr XII distribution to data Y. 
        Needs to return parameters mu, sigma, and nu in a numpy array.

        This is just for initialization of the NGBoost algorithm, so it doesn't need to be perfect. 
        Ballpark is good enough.
        """
        mu, sigma, nu = None, None, None
        return np.array([mu, sigma, nu])

    # log score methods
    def nll(self, Y): 
        """
        log-likelihood (per observation). 
        Returns a vector of length n = len(self)
        """
        return -np.log(((1 + (Y/self.mu)**self.sigma)**(-self.nu)) - ((1 + ((Y+1)/self.mu)**self.sigma)**(-self.nu)))

    def D_nll(self, Y):
        """
        Returns the derivative of self.nll() with respect to each of the real-valued 
        parameters [log(mu), log(sigma), log(nu)]. 

        These can be easily calculated using, e.g., wolframalpha, and efficiently implemented here.
        """
        d_log_mu = np.zeros_like(self.mu)
        d_log_sigma = np.zeros_like(self.sigma)
        d_log_nu = np.zeros_like(self.nu)
        return np.array([d_log_mu, d_log_sigma, d_log_nu])

This is for the Discrete Burr XII, but equivalent code should also serve to implement your skew-t distribution.

D_nll() should be easy. Just copy-paste the nll into wolframalpha, edit the variable names, and ask for derivatives. Copy-paste back to D_nll, re-edit the variable names, and call it a day.

The biggest challenge, I think, will be implementing the sample() method, which is necessary if you don't want to derive/implement the Fisher Information. I was working on this myself but didn't have luck with anything simple. As you know, the distribution isn't already implemented in scipy.stats or another python package. scipy.stats does have a Burr XII, which I hoped to sample from and then use np.floor() on to get the discrete version, but then I noticed that what they call Burr XII has a different number of parameters than the pmf you gave me, which I think corresponds to some 3-parameter "generalized" version of the (discretized) Burr XII... All of which is fine, if that's what you want, but the upshot is that there isn't a pre-implemented version to sample from or an easy way to make one.

On the other hand, I don't think this is at all an insurmountable challenge. Making a sampling algorithm for a "custom" distribution is fairly straightforward using, e.g., inverse transform sampling. All you need to do is calculate the inverse CDF (use wolframalpha or whatever) and implement that. And if that doesn't work, there are other methods. All in all, still probably easier than deriving the Fisher.
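
To make that concrete, here is a minimal sketch of an inverse-transform sample() for the pmf assumed above (invert the continuous CDF and floor); double-check the algebra against your actual parameterization:

    def sample(self, m):
        # inverse transform sampling: invert the continuous CDF
        # F(y) = 1 - (1 + (y/mu)**sigma)**(-nu), i.e.
        # F_inv(u) = mu * ((1 - u)**(-1/nu) - 1)**(1/sigma),
        # then floor so that P(Y = y) = S(y) - S(y + 1), matching nll() above
        n = len(self.mu)
        u = np.random.uniform(size=(m, n))
        continuous = self.mu * ((1.0 - u) ** (-1.0 / self.nu) - 1.0) ** (1.0 / self.sigma)
        return np.floor(continuous)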

The fit() method is also not necessarily trivial, but feel free to use whatever heuristics you want since it's just for initialization. Or go wild and implement/call some optimization method of your choosing.
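
For example, one such heuristic (only a sketch; it assumes fit() should return the real-valued log-scale parameters that __init__ consumes; exponentiate the result if the raw mu, sigma, nu are wanted instead):

    from scipy.optimize import minimize

    def fit(Y):
        # crude initialization: minimize the marginal NLL of the pmf above
        # over (log mu, log sigma, log nu) from an arbitrary starting point
        def neg_loglik(log_params):
            mu, sigma, nu = np.exp(log_params)

            def surv(y):
                return (1 + (y / mu) ** sigma) ** (-nu)

            pmf = np.clip(surv(Y) - surv(Y + 1), 1e-12, None)
            return -np.sum(np.log(pmf))

        return minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead").x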

I haven't looked into skew-T that closely, but the same general ideas should apply. And if the proper sample() method is already implemented somewhere, your job will be easier.

Since this is all a bit of a challenge, and these aren't distributions others are likely to use, I'm hoping you two (and/or your collaborators) can give me a hand here and give this a shot. But please do let me know where/if you get stuck and I will jump in to rescue as necessary!

NGBClassifier predict error

Running your code from https://github.com/stanfordmlgroup/ngboost/blob/master/examples/classification.py
I received the following error when I try

ngb.predict(X_test)

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
----> 1 ngb.predict(X_test)

/usr/local/lib/python3.6/dist-packages/ngboost/ngboost.py in predict(self, X)
133 def predict(self, X):
134 dist = self.pred_dist(X)
--> 135 return list(dist.loc.flatten())
136
137 def score(self, X, Y):

AttributeError: 'NoneType' object has no attribute 'flatten'
----------------------------------------------------------------

ngb.pred_dist(X_test) works properly, but the same error also occurs when using sklearn's cross-validation function cross_validate().

Regards,

indentation is broken in api.py

I believe the indentation is broken as of the last commit.

>>> from ngboost import NGBRegressor
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/omarwagih/miniconda2/envs/pred_dpsi/lib/python3.8/site-packages/ngboost/__init__.py", line 2, in <module>
    from .api import NGBClassifier, NGBRegressor, NGBSurvival
  File "/Users/omarwagih/miniconda2/envs/pred_dpsi/lib/python3.8/site-packages/ngboost/api.py", line 64
    super().__init__(Dist, Score, Base, natural_gradient, n_estimators, learning_rate,
    ^
IndentationError: expected an indented block

Can you please fix?

minibatch_frac < 1 does not work if X is a dataframe

If X is a pandas dataframe and Y is a series, and minibatch_frac == 1, then ngb.fit(X, Y) works with no issues.

But if minibatch_frac < 1, then the function sample in ngboost.py fails to work on dataframes:

    def sample(self, X, Y, params):
        if self.minibatch_frac == 1.0:
            return np.arange(len(Y)), X, Y, params
        sample_size = int(self.minibatch_frac * len(Y))
        idxs = np_rnd.choice(np.arange(len(Y)), sample_size, replace=False)
        return idxs, X[idxs,:], Y[idxs], params[idxs, :]

Because X[idxs, :] is not valid DataFrame syntax.

A workaround that works on my machine:

    def sample(self, X, Y, params):
        if self.minibatch_frac == 1.0:
            return np.arange(len(Y)), X, Y, params
        sample_size = int(self.minibatch_frac * len(Y))
        idxs = np_rnd.choice(np.arange(len(Y)), sample_size, replace=False)
        try:
            X_batch = X[idxs, :]
        except TypeError:
            X_batch = X.iloc[idxs, :]

        return idxs, X_batch, Y[idxs], params[idxs, :]
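
An equivalent, slightly tidier variant (a sketch, also covering Y being a pandas Series):

    def sample(self, X, Y, params):
        if self.minibatch_frac == 1.0:
            return np.arange(len(Y)), X, Y, params
        sample_size = int(self.minibatch_frac * len(Y))
        idxs = np_rnd.choice(np.arange(len(Y)), sample_size, replace=False)
        # positional indexing for pandas objects, plain indexing otherwise
        X_batch = X.iloc[idxs, :] if hasattr(X, "iloc") else X[idxs, :]
        Y_batch = Y.iloc[idxs] if hasattr(Y, "iloc") else Y[idxs]
        return idxs, X_batch, Y_batch, params[idxs, :]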

I'm running version 0.1.3, installed from github via pip today, on Mac OS 10.14.13, python version 3.7.4

Save and load model

Hi, thank you for this repository. I'm using it in my work. I would like to know how to save and load a trained model; joblib.dump doesn't work.

NGBClassifier uses DecisionTreeRegressor by default

default_tree_learner is a DecisionTreeRegressor with the friedman_mse criterion, which is kind of weird to use for classification.
I may be a bit confused here, but is it really fine to use a Regressor and not a Classifier as the base learner? It may be by design, but it looks really weird.

Return train and val loss

Thank you for the excellent work with NGBoost, really excited to have been testing it out!

In commit c4b46b9 the fit method was altered to return self instead of the train and val losses. Is there any way to access the losses with the current behavior?

I believe the losses should be accessible, because we may not be interested in doing early stopping but rather in training for a larger number of iterations and simply choosing the best val loss.

Also, returning the losses is essential to compare different models.

readme: link not working

Hey, great work. FYI, the first link to the ngboost page in your readme is broken. I guess it should be https://stanfordmlgroup.github.io/projects/ngboost/, instead of https://stanfordmlgroup.github.io/project/ngboost/.

Thanks !

why do I sometimes get a "LinAlgError: Singular matrix" error?

model = NGBClassifier(Base=default_tree_learner, Dist=Bernoulli, Score=MLE,
                      natural_gradient=True, verbose=False, n_estimators=500)

Here is the error:


 226 # fitting
--> 227 model.fit(train_arx,train_ary)
228 if return_proba :
229 predict_value = model.predict_proba(test_arx)

~/anaconda3/lib/python3.6/site-packages/ngboost/ngboost.py in fit(self, X, Y, X_val, Y_val, sample_weight, val_sample_weight, train_loss_monitor, val_loss_monitor, early_stopping_rounds)
119 loss_list += [train_loss_monitor(D, Y_batch)]
120 loss = loss_list[-1]
--> 121 grads = S.grad(D, Y_batch, natural=self.natural_gradient)
122
123 proj_grad = self.fit_base(X_batch, grads, sample_weight)

~/anaconda3/lib/python3.6/site-packages/ngboost/scores.py in grad(forecast, Y, natural)
13 grad = forecast.D_nll(Y)
14 if natural:
---> 15 grad = np.linalg.solve(fisher, grad)
16 return grad
17

<__array_function__ internals> in solve(*args, **kwargs)

~/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in solve(a, b)
401 signature = 'DD->D' if isComplexType(t) else 'dd->d'
402 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 403 r = gufunc(a, b, signature=signature, extobj=extobj)
404
405 return wrap(r.astype(result_t, copy=False))

~/anaconda3/lib/python3.6/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
95
96 def _raise_linalgerror_singular(err, flag):
---> 97 raise LinAlgError("Singular matrix")
98
99 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix


On the same data it sometimes works fine and sometimes raises this error. What should I do when it happens?

ngboost.distns.Normal implementation of nll and D_nll functions

Disclaimer: my background is mainly in (clinical) chemistry and less in maths/statistics (although I try to read and learn about it on a day-to-day basis) so please excuse me in advance if I am missing some obvious points.

I am trying to implement a ngboost.distns.Bernoulli class for binary classification problems using NGBoost algorithms. However, I have some questions around the implementation of the ngboost.distns.Normal, specifically the nll and D_nll functions. The nll (negative log-likelihood) function is written as follows:

def nll(self, Y):
  return -self.dist.logpdf(Y).mean()

My first question would be: is this actually the negative log-likelihood function? As far as I know, we take the sum of the log-PDF, i.e. logpdf(<data>).sum(), to obtain the log-likelihood of a Normal distribution. Is there some specific reason we average over N in this particular nll function? Secondly, the D_nll implementation (the derivative of the nll function) is as follows:

def D_nll(self, Y_):
  Y = Y_.squeeze()
  D = np.zeros((self.var.shape[0], 2))
  D[:, 0] = (self.loc - Y) / self.var
  D[:, 1] = 1 - ((self.loc - Y) ** 2) / self.var
  return D

So why do we use this kind of implementation for the derivative? Wouldn't it be better to go for a more generic approach and, for instance, use scipy.optimize? Does this have to do with the fact that we use natural gradients? Third, why do we add 1e-8 to the scale and variance of our Normal distribution when we initialize it?
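
While trying to understand this, I put together a standalone finite-difference check (a sketch, assuming the two internal parameters are loc and log(scale)) that reproduces exactly those expressions:

    import numpy as np
    from scipy.stats import norm

    # per-observation NLL of a Normal parameterized by (loc, log_scale),
    # and its analytic gradient, matching the D_nll expressions above
    def nll(y, loc, log_scale):
        return -norm.logpdf(y, loc=loc, scale=np.exp(log_scale))

    def d_nll(y, loc, log_scale):
        var = np.exp(2 * log_scale)
        return np.array([(loc - y) / var, 1 - (loc - y) ** 2 / var])

    y, loc, log_scale = 1.3, 0.5, -0.2
    eps = 1e-6
    numeric = np.array([
        (nll(y, loc + eps, log_scale) - nll(y, loc - eps, log_scale)) / (2 * eps),
        (nll(y, loc, log_scale + eps) - nll(y, loc, log_scale - eps)) / (2 * eps),
    ])
    print(np.allclose(numeric, d_nll(y, loc, log_scale)))  # True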

Thank you very much in advance.

Releases to PyPI and conda-forge

Hello! I am very excited to see this package 🎉. Was wondering what the release plan to PyPI and conda-forge is? If you make a PyPI package, I am happy to help build the conda-forge recipe for you.

Distributions for classification

This algorithm works really well for regression problems, where we can choose among the available distributions. However, classification is limited to Bernoulli, which outputs a single probability value for a given outcome. Is there any way we can get a confidence interval estimate for each possible outcome?

Sorry, I couldn't install ngboost under Windows

pip install ngboost
Collecting ngboost
Using cached https://files.pythonhosted.org/packages/58/15/8942e2b8a38f92b92e1dedad882b0746372e4acde236e74b98bfa66d717a/ngboost-0.1.3-py3-none-any.whl
Collecting scikit-learn>=0.21.3
Using cached https://files.pythonhosted.org/packages/76/79/60050330fe57fb59f2c53d0d11673df28c20ea9315da3652477429fc4949/scikit_learn-0.21.3-cp36-cp36m-win_amd64.whl
Collecting numpy>=1.17.2
Using cached https://files.pythonhosted.org/packages/55/7a/f32b39164262765b069b0fe3ec5d4b47580c9c60f7bd3588b58ba8e93a4c/numpy-1.17.3-cp36-cp36m-win_amd64.whl
ERROR: Could not find a version that satisfies the requirement jaxlib>=0.1.29 (from ngboost) (from versions: none)
ERROR: No matching distribution found for jaxlib>=0.1.29 (from ngboost)

return predictions at best iteration

If fit was called with a validation set, subsequent calls to predict should return the predictions from the model at best iteration according to the score on the validation set.

Feature importance

Hi,
First of all, many thanks for ngboost.

The issue I'm reporting is related to the feature importances property. When I try to get this property, unfortunately the returned value is None. I believe this is due to the following if clause:
if not 'sklearn.tree.tree.DecisionTreeRegressor' in str(type(self.base_models[0][0])): return None
In sklearn version 0.22, str(type(self.base_models[0][0])) returns <class 'sklearn.tree._classes.DecisionTreeRegressor'>
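
A quick standalone way to see why the string check no longer matches on sklearn 0.22:

    import sklearn.tree

    tree = sklearn.tree.DecisionTreeRegressor()
    print(str(type(tree)))
    # -> <class 'sklearn.tree._classes.DecisionTreeRegressor'> on sklearn 0.22
    print('sklearn.tree.tree.DecisionTreeRegressor' in str(type(tree)))  # False
    print(isinstance(tree, sklearn.tree.DecisionTreeRegressor))          # True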

For me, it works if I replace the clause by
if not isinstance(self.base_models[0][0], sklearn.tree.DecisionTreeRegressor):

Can you please check if you also have this problem?

Many thanks,
Carlos

Jaxlib can't be installed under Windows

Collecting jaxlib
Could not find a version that satisfies the requirement jaxlib (from versions: )
No matching distribution found for jaxlib

google/jax#507

The issue raised on the jax GitHub page causes installation problems under Windows for this package.

Would you consider releasing the PyTorch version too? I could make that work from the backup branch and the results look similar (though I'm not sure whether the PyTorch implementation contains all the tricks from the jax version).

Any plan for model explanation functionality?

I read the paper and tried this package; I have to say it is marvelous! I will not hesitate to use it in my real-life work, but I wonder if there is a plan to add support for model explanation tools such as feature importance plots, SHAP plots, or a tree visualizer. It would be crucial for presenting the model explanation to business stakeholders.
Thanks for this fantastic masterpiece!

empirical examples used retrain, not refit as in the paper

It seems the empirical results retrain the model (changing tree structures) rather than refitting it (keeping the pretrained tree structures) in "ngboost/examples/empirical/regression.py". However, the paper explicitly says "refit", which is more reasonable.

input validation

  • make it clear that the only acceptable inputs are numeric numpy arrays
  • labels should be integers from 0 to K-1 in the case of classification (see the sketch below)
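
One possible shape for both checks, sketched with scikit-learn's own validator (validate_inputs is just an illustration, not a committed design):

    import numpy as np
    from sklearn.utils import check_X_y

    def validate_inputs(X, Y, classification=False):
        # coerce X and Y to numeric numpy arrays, raising a clear error otherwise
        X, Y = check_X_y(X, Y, dtype="numeric", y_numeric=True)
        if classification:
            classes = np.unique(Y)
            if not np.array_equal(classes, np.arange(len(classes))):
                raise ValueError("Y must contain integer class labels 0, 1, ..., K-1")
        return X, Y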

First try at implementing `ngboost.distn.Bernoulli` class

Disclaimer: forgive me for all my stupid mistakes and/or misinterpretation of several statistical things. My intentions were to provide a complete, working example of the Bernoulli class before uploading it, but because several people told me that they would like to help I decided to put this (preliminary) version already on GitHub. Once it is in a more sophisticated state I guess we can open up a pull request.

So for classification problems we need distributions that match these problems accordingly (e.g. Bernoulli for binary classification). Thus, my aim was to create a Bernoulli class which would make binary classification with NGBoost feasible. Over the last few days I studied a lot of probability and statistics and did my best to read the NGBoost paper in detail. Please forgive me in advance if I completely misunderstood the whole concept and my implementation is complete nonsense (if so, please tell me). Below I provide a first version of the Bernoulli class, in which I would like to point out several things:

  1. The Bernoulli class was tested on the breast cancer classification dataset (scikit-learn) and it somehow seems to converge. Also the predicted probabilities seem to match with the labelled outcomes of the test dataset. There are some runtime issues, however.

  2. I am not sure how to implement the Bernoulli.fit method, as I don't have a better idea than to just set the initial parameter to the average positive rate in the dataset. Additionally, I am not 100% sure about the nll, D_nll and fisher_info functions.

  3. Running the example gives a lot of RuntimeWarnings, exclusively about mathematical operations (e.g. invalid values, division by zero). This is caused by the NGBoost.line_search function, but I have yet to look into what exactly is causing it.

Bernoulli class:

import numpy as np
from scipy.stats import bernoulli as dist  # assumed import, mirroring ngboost.distns.Normal


class Bernoulli(object):
    """
    Bernoulli class containing the Bernoulli distribution.

    ...

    Attributes
    ----------
    n_params : int
        contains the number of parameters in our distribution.

    Methods
    -------
    nll(Y)
        returns the negative log-likelihood dependent on data `Y`.
    D_nll(Y)
        returns the first derivative of the negative log-likelihood dependent on data `Y`.
    fisher_info()
        returns the Fisher information.
    """

    n_params = 1

    def __init__(self, params):
        # probability of success (the only parameter)
        self.p = params[0]

        # initialize the underlying distribution
        self.dist = dist(self.p)

    def __getattr__(self, name):
        if name in dir(self.dist):
            return getattr(self.dist, name)
        return None

    def nll(self, Y):
        # formula: -(log(p) * Y + log(1 - p) * (1 - Y))
        Y = Y.squeeze()

        return np.array(-(np.log(self.p) * Y + np.log(1. - self.p) * (1 - Y)))

    def D_nll(self, Y):
        # formula: (Y / p) - ((1 - Y) / (1 - p))
        Y = Y.squeeze()
        D = (Y / self.p) - ((1 - Y) / (1 - self.p))
        return D.reshape(-1, 1)

    def crps(self, Y):
        raise NotImplementedError('crps not implemented yet')

    def crps_metric(self, Y):
        raise NotImplementedError('crps_metric not implemented yet')

    def fisher_info(self):
        # formula: 1 / (p * (p - 1))
        FI = np.ones((self.p.shape[0], 1, 1))
        FI[:, 0, 0] = 1 / (self.p * (self.p - 1))
        return FI

    def fisher_info_cens(self, Y):
        # not sure, is this a specific function for censored data?
        # these "_cens" functions don't seem to be called in the API anywhere, I guess
        # they can be removed and mainly exist in other classes because of deprecated code?
        raise NotImplementedError('fisher_info_cens not implemented yet')

    def fit(Y):
        # how to fit to initial generic data?
        # for now I set `p` to the fraction of the positive class, not sure if this is correct..
        return np.array([sum(Y.squeeze()) / len(Y.squeeze())])

To perform a small test (WARNING: loads of RuntimeWarnings):

from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.scores import MLE

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, Y = load_breast_cancer(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

ngb = NGBoost(Base=default_tree_learner, Dist=Bernoulli, Score=MLE(), natural_gradient=True,
              verbose=True)
ngb.fit(X_train, Y_train)
Y_dists = ngb.pred_dist(X_test)

test_NLL = Y_dists.nll(Y_test.flatten()).mean()
print('Test NLL', test_NLL)
print(Y_dists.p)
print(Y_test)

Full example also available as a Google Colab notebook at: https://colab.research.google.com/drive/1_O2w1MXjuMKq7bc8Pj4Atv5a_bGKWSp7.

I am open to any suggestions, tips, help, or guidance on how to develop this further. And once more, my apologies in advance if I am completely missing the point somewhere.

Overflow warnings

This package looks so promising!

I am just testing it out on my dataset with dimensions (N, M) = (57795, 144). At first I tried with N=100, N=1000, and N=10_000 and it worked well. Now I am trying to run it on all N=57_795 and I am encountering some overflow warnings, see below. Is this something to be worried about?

[iter 0] loss=2.6377 val_loss=0.0000 scale=0.1250 norm=0.3378
~/miniconda3/envs/py37/lib/python3.7/site-packages/ngboost/distns/normal.py:13: RuntimeWarning:

overflow encountered in exp

~/miniconda3/envs/py37/lib/python3.7/site-packages/ngboost/distns/normal.py:14: RuntimeWarning:

overflow encountered in square

Cheers,
Christian

Fixing error

Encountering this error repeatedly - "type object 'LogNormal' has no attribute 'scores' ". How can I fix it?

natural_gradient option in fit doesn't seem to change the result

Hi again,

Setting

NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True)

vs

NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=False)

seems to give the same results. The code below shows at least this is the case for the predictions.

I checked the source code and the line https://github.com/stanfordmlgroup/ngboost/blob/master/ngboost/ngboost.py#L21 sets the attribute natural_gradient but then it doesn't seem to use it anywhere else in that file.

from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.scores import MLE
from ngboost.distns import Normal

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

np.random.seed(seed=2334)

X, Y = load_boston(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

ngb_natural = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True,
                      verbose=False)

np.random.seed(seed=2334)
ngb_natural.fit(X_train, Y_train)
Y_preds_nat = ngb_natural.predict(X_test)

ngb_artificial = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=False,
                      verbose=False)


np.random.seed(seed=2334)
ngb_artificial.fit(X_train, Y_train)
Y_preds_art = ngb_artificial.predict(X_test)

#This one comes out True
assert np.allclose(Y_preds_nat, Y_preds_art)

GPU usage

Hi! I couldn't find anything on the site nor the paper on how to utilize this with GPUs. Is there even any GPU support as of yet?

early_stopping_rounds

Would it make sense to implement early_stopping_rounds like in LGBM or XGB? If so, I'm happy to contribute to the issue.

Add __version__ attribute to the package

Hi,

I couldn't find any version attribute within the package.
I think it's helpful for bug reports to be able to do

import ngboost
print(ngboost.__version__)
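
Something as simple as this would do (a sketch; the version string is only an example and would need to be kept in sync with the packaging metadata):

    # ngboost/__init__.py
    from .api import NGBClassifier, NGBRegressor, NGBSurvival

    __version__ = "0.1.3"  # example value; keep in sync with setup.py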

Non reproducibility of results even after np.random.seed is set

Hi,

I expected the training and predictions to be the same after setting np.random.seed
Is there another seed I should set to obtain reproducible results?

Below I have an example you can run. I'm using the current version from Github.

from ngboost.ngboost import NGBoost
from ngboost.learners import default_tree_learner
from ngboost.scores import MLE
from ngboost.distns import Normal

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

np.random.seed(seed=2334)

X, Y = load_boston(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

ngb_natural = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True,
                      verbose=False)

ngb_natural.fit(X_train, Y_train)
Y_preds_nat = ngb_natural.predict(X_test)

ngb_natural2 = NGBoost(Base=default_tree_learner, Dist=Normal, Score=MLE(), natural_gradient=True,
                      verbose=False)

ngb_natural2.fit(X_train, Y_train)
Y_preds_nat2 = ngb_natural2.predict(X_test)

#This one comes out False
assert np.allclose(Y_preds_nat, Y_preds_nat2)

#This one too
assert np.allclose(Y_preds_nat, Y_preds_nat2, rtol=1e-3)

#This one is true
assert np.allclose(Y_preds_nat, Y_preds_nat2, rtol=1e-2)

Add support for GridSearchCV

Thanks for sharing your work. Maybe it is necessary to add support for scikit-learn's GridSearchCV to improve usability and adoption.

How to use the train and val monitors?

Hi, I was wondering what the purpose of the train_loss_monitor and val_loss_monitor arguments is. Sklearn models usually include a monitor argument that allows for early stopping. Is that the idea?
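
Based on the fit() signature shown in a traceback elsewhere on this page, my guess is that a monitor is called as monitor(D, Y_batch) and its return value is recorded as that iteration's loss, so one sketch of using it to collect per-iteration training losses would be (the NLL choice is an assumption):

    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split
    from ngboost import NGBRegressor

    X, Y = load_boston(True)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

    train_losses = []

    def my_train_monitor(D, Y_batch):
        # assumption: the monitor's return value is used as that iteration's loss;
        # here the NLL (log score) of the current predicted distribution
        loss = D.nll(Y_batch)
        train_losses.append(loss)
        return loss

    ngb = NGBRegressor()
    ngb.fit(X_train, Y_train, train_loss_monitor=my_train_monitor)
    print(train_losses[:5])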

gradient of normal distribution negative log-likelihood

I might be wrong but the gradient seems incorrect wrt sigma

I think the following should be used instead; my tests give bad results on Boston (NLL) with the current implementation, and with this change it's good again:

    def D_nll(self, Y_):
        Y = Y_.squeeze()
        D = np.zeros((self.var.shape[0], 2))
        D[:, 0] = (Y - self.loc) / self.var
        D[:, 1] = (Y - self.loc)**2 / self.scale**3 - 1/self.scale
        return -D

I can't install it

Collecting git+https://github.com/stanfordmlgroup/ngboost.git
Cloning https://github.com/stanfordmlgroup/ngboost.git to c:\users\stig.cz\appdata\local\temp\pip-req-build-zjnf7vpn
Running command git clone -q https://github.com/stanfordmlgroup/ngboost.git 'C:\Users\Stig.CZ\AppData\Local\Temp\pip-req-build-zjnf7vpn'
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
ERROR: Command errored out with exit status 128: git clone -q https://github.com/stanfordmlgroup/ngboost.git 'C:\Users\Stig.CZ\AppData\Local\Temp\pip-req-build-zjnf7vpn' Check the logs for full command output.

What are the limitations on adding a base learner?

I would like to know what the limitations are on adding base learners. I see only two learners implemented in ngboost/ngboost/learners.py, each taken from sklearn. Is it the case that we can add any base learner from sklearn simply by adding it to this file and then specifying it at NGBoost instantiation time? If not, what is the limitation? Why aren't more base learners implemented?
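
My guess (to be confirmed) is that Base just needs to be a scikit-learn-style regressor supporting fit/predict with sample_weight, so plugging in a different learner might look like this sketch:

    from sklearn.linear_model import Ridge
    from ngboost import NGBRegressor

    # sketch: swap in a different sklearn regressor as the base learner that is
    # fit to the (natural) gradients at each boosting iteration
    ngb = NGBRegressor(Base=Ridge(alpha=1.0), n_estimators=200)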

classification broken for large n_estimators and small minibatch_frac

X, Y = load_breast_cancer(True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

ngb = NGBClassifier(Dist=Bernoulli, verbose=True, minibatch_frac=0.5, n_estimators=50)
ngb.fit(X_train, Y_train)

usually delivers a long error stack terminating in

TypeError: ufunc 'expit' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

but

ngb = NGBClassifier(Dist=Bernoulli, verbose=True, minibatch_frac=1, n_estimators=50)
ngb.fit(X_train, Y_train)

does not, nor does

ngb = NGBClassifier(Dist=Bernoulli, verbose=True, minibatch_frac=0.5, n_estimators=20)
ngb.fit(X_train, Y_train)
