
wildboottest's Introduction

wildboottest


wildboottest implements multiple fast wild cluster bootstrap algorithms as developed in Roodman et al. (2019) and MacKinnon, Nielsen & Webb (2022).

It has similar, but more limited, functionality than Stata's boottest, R's fwildclusterboot, or Julia's WildBootTests.jl.

At the moment, wildboottest only computes wild cluster bootstrapped p-values, not confidence intervals.

Other features that are currently not supported:

  • The subcluster bootstrap (MacKinnon and Webb 2018).
  • Confidence intervals formed by inverting the test and iteratively searching for bounds.
  • Multiway clustering.

Direct support for statsmodels and linearmodels is work in progress.

If you'd like to collaborate, either send us an email or comment in the issues section!

Installation

You can install wildboottest from PyPI by running

pip install wildboottest

Example

import pandas as pd
import statsmodels.formula.api as sm
from wildboottest.wildboottest import wildboottest

df = pd.read_csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/sandwich/PetersenCL.csv")
model = sm.ols(formula='y ~ x', data=df)

wildboottest(model, param = "x", cluster = df.firm, B = 9999, bootstrap_type = '11')
# | param   |   statistic |   p-value |
# |:--------|------------:|----------:|
# | x       |      20.453 |     0.000 |

wildboottest(model, param = "x", cluster = df.firm, B = 9999, bootstrap_type = '31')
# | param   |   statistic |   p-value |
# |:--------|------------:|----------:|
# | x       |      30.993 |     0.000 |

# bootstrap inference for all coefficients
wildboottest(model, cluster = df.firm, B = 9999, bootstrap_type = '31')
# | param     |   statistic |   p-value |
# |:----------|------------:|----------:|
# | Intercept |       0.443 |     0.655 |
# | x         |      20.453 |     0.000 |

# non-clustered wild bootstrap inference
wildboottest(model, B = 9999, bootstrap_type = '11')
# | param     |   statistic |   p-value |
# |:----------|------------:|----------:|
# | Intercept |       1.047 |     0.295 |
# | x         |      36.448 |     0.000 |

wildboottest's People

Contributors

s3alfisc, amichuda, dependabot[bot]


wildboottest's Issues

Small sample corrections

  • For the x1 bootstraps, I will implement a ssc correction of $(N-1) / (N-k)$.
  • For the x3 bootstraps, $(G-1) / G$

Both are more or less the standard in the literature. For compatibility with choices made via statsmodels / linearmodels, add the option to overwrite these default values.

Note that for p-values, the choice of small sample correction has no impact: the ssc's cancel out when computing p-values by counting cases of ssc x t_stat < ssc x t_boot. They only affect the internally computed and reported non-bootstrapped test statistic.
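A quick illustration of the cancellation (all numbers hypothetical, the bootstrapped statistics replaced by random draws): scaling both sides of the counting comparison by the same positive correction leaves the p-value unchanged.

import numpy as np

rng = np.random.default_rng(0)
t_stat = 2.1
t_boot = rng.standard_normal(9999)   # stand-in for bootstrapped t-statistics

N, k = 500, 4
ssc = (N - 1) / (N - k)              # correction for the x1 bootstraps

p_plain = np.mean(np.abs(t_boot) >= np.abs(t_stat))
p_scaled = np.mean(np.abs(ssc * t_boot) >= np.abs(ssc * t_stat))
assert p_plain == p_scaled           # the positive ssc cancels on both sides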

wildboottest without clusters

Should we allow for no cluster input, in which case the procedure turns into a regular (heteroskedasticity-robust) wild bootstrap? Or should there be an error?
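If we go the first route, here is a minimal sketch of the fallback: a non-clustered wild bootstrap with Rademacher weights, without imposing the null (WCU-style). The helper name is hypothetical, and a classical variance is used for brevity where a robust one would be the natural choice in practice.

import numpy as np

def wild_bootstrap_pvalue(X, y, coef_idx, B=999, seed=42):
    """Non-clustered wild bootstrap p-value for one coefficient (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    def t_of(y_star, center):
        # classical variance for brevity; use a robust variance in practice
        b = XtX_inv @ X.T @ y_star
        u = y_star - X @ b
        se = np.sqrt(u @ u / (n - k) * XtX_inv[coef_idx, coef_idx])
        return (b[coef_idx] - center) / se

    t_stat = t_of(y, 0.0)
    t_boot = np.array([
        # Rademacher weights per observation; bootstrap stats centered at beta_hat
        t_of(X @ beta + resid * rng.choice([-1.0, 1.0], size=n), beta[coef_idx])
        for _ in range(B)
    ])
    return np.mean(np.abs(t_boot) >= np.abs(t_stat))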

[Bug] Fixed seed leads to different results

Currently, running wildboottest() twice under the same seed leads to different inferences. This is of course terrible for reproducibility and we should fix it.

Example:

import pandas as pd
import statsmodels.formula.api as sm
from wildboottest.wildboottest import wildboottest
import numpy as np

df = pd.read_csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/sandwich/PetersenCL.csv")
df['treat'] = np.random.choice([0, 1], size=df.shape[0], replace=True)
model = sm.ols(formula='y ~ treat', data=df)

wildboottest(model, param = "treat", cluster = df.year, B= 999, seed = 12)
# | param   |   statistic |   p-value |
# |:--------|------------:|----------:|
# | treat   |       1.129 |     0.255 |

wildboottest(model, param = "treat", cluster = df.year, B= 999, seed = 12)
# | param   |   statistic |   p-value |
# |:--------|------------:|----------:|
# | treat   |       1.129 |     0.289 |
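A likely culprit is shared random-number state that is advanced between calls. A minimal sketch of the intended behaviour, assuming the bootstrap weights are drawn from a NumPy Generator (function name hypothetical, not the current wildboottest internals):

import numpy as np

def draw_bootstrap_weights(n, B, seed=None):
    rng = np.random.default_rng(seed)   # fresh Generator per call: same seed -> same weights
    return rng.choice([-1.0, 1.0], size=(n, B))

w1 = draw_bootstrap_weights(10, 999, seed=12)
w2 = draw_bootstrap_weights(10, 999, seed=12)
assert (w1 == w2).all()                 # identical draws, hence identical inference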

Error in bootstrap algo for '11' types?

In a first test of Python vs R, it looks like the '31' bootstrap t-statistics match exactly under full enumeration (as they should), while the '11' do not:

See here

I suppose that the R version is correct, as it's tested against WildBootTests.jl and produces matching statistics.

>>> df (Python)
          WCR11         WCR31     WCU11     WCU31
0 -5.569437e-07  7.119511e-16 -0.547569  0.079819
1 -3.286834e+00 -3.286832e+00 -4.243206 -3.085644
2  1.715372e-02  1.715392e-02 -0.166832  0.053646
3 -2.881170e+00 -2.881171e+00 -1.780821 -3.045997
4  2.881170e+00  2.881171e+00  1.780821  3.045997
5 -1.715372e-02 -1.715392e-02  0.166832 -0.053646
6  3.286834e+00  3.286832e+00  4.243206  3.085644
7  5.569437e-07 -7.119511e-16  0.547569 -0.079819
>>> r_df (R)
   Unnamed: 0     WCR11         WCR31     WCU11     WCU31
0           1 -0.620824 -1.017073e-16 -0.547213  0.079819
1           2 -3.852912 -3.286832e+00 -4.198954 -3.085644
2           3 -0.196266  1.715392e-02 -0.171808  0.053646
3           4 -1.630894 -2.881171e+00 -1.782077 -3.045997
4           5  1.630894  2.881171e+00  1.782077  3.045997
5           6  0.196266 -1.715392e-02  0.171808 -0.053646
6           7  3.852912  3.286832e+00  4.198954  3.085644
7           8  0.620824  1.017073e-16  0.547213 -0.079819

Should we just assume pandas dataframes as input?

A lot of the data inputs in the boot_algo3 function in R assume a dataframe. Should we do the same and assume pandas DataFrames in the Python version? We can then turn them into numpy arrays inside the function for performance.
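If so, a minimal sketch of the boundary conversion (helper name hypothetical): accept pandas objects at the API surface, compute on NumPy internally.

import numpy as np
import pandas as pd

def as_matrix(data):
    """Accept pandas objects at the API boundary; convert once for performance."""
    if isinstance(data, (pd.DataFrame, pd.Series)):
        return data.to_numpy()
    return np.asarray(data)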

Fix Documentation

  • The example for the Wildboottest class does not run because R has the wrong dimension.
  • Some minor points in the docs became more visible after creating the mkdocs site.

`self.k` should be defined by `X` not `R`

@s3alfisc

Since self.k is used for creating matrices and such for X, should we define self.k = self.X.shape[1] rather than self.k = self.R.shape[0], and then raise an error if self.k != self.R.shape[0]?

I think this makes more sense, since the user can make a mistake in defining R which might propagate through the code and show up in places not related to conducting the statistical test.

Do you concur?
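A minimal sketch of the proposed check (helper name hypothetical):

import numpy as np

def infer_k(X: np.ndarray, R: np.ndarray) -> int:
    """Derive k from X and validate that R is conformable."""
    k = X.shape[1]
    if R.shape[0] != k:
        raise ValueError(f"R has length {R.shape[0]}, but X has {k} columns.")
    return k

# inside the class: self.k = infer_k(self.X, self.R)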

Repo Cleanup

  • Delete files that are not required, e.g.
  • ... any other files that are not required?
  • Check that package metadata is accurate etc.
  • ...

Allow scalar tests of multiple coefficients

E.g. allow tests of $R'\beta = r$, with $R$ a vector of length $K$ and $r$ a scalar (see the sketch after this list).

  • start with the heteroskedastic bootstrap (as it is less work)
  • continue with the wild cluster bootstrap
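For the heteroskedastic case, the non-bootstrapped statistic could look like this (hypothetical helper; an HC0 variance is assumed purely for illustration):

import numpy as np

def t_stat_scalar_restriction(X, y, R, r):
    """t-statistic for the scalar restriction R'beta = r."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    meat = X.T @ (X * u[:, None] ** 2)   # sum_i u_i^2 * x_i x_i'
    vcov = XtX_inv @ meat @ XtX_inv      # HC0 sandwich estimator
    return (R @ beta - r) / np.sqrt(R @ vcov @ R)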

API discussions

Some thoughts regarding a statsmodels API

Our point of departure is the following:

model.fit(
    cov_type='wildclusterbootstrap',
    cov_kwds={
        'cluster': cluster,
        'B': 9999,
        'weights_type': 'rademacher',
        'impose_null': True,
        'bootstrap_type': '11',
        'seed': None,
        'param': 'X1',
    }
).summary()

The internal function which is called, wildboottest(), returns a pvalue for a bootstrapped t-test of the null hypothesis $X_1 = 0$ vs $X_1 \neq 0$. Further, users can impose the null hypothesis $X_1 = 0$ on the bootstrap data generating process via the impose_null argument.

Consequently, we can report a full "regression table" as below by looping over all parameters [x1, ..., x10], imposing the null hypothesis for each hypothesis on the bootstrap dgp. For this use case, it would be nice if we could allow a 'param' value of e.g. 'ALL', which would loop 10 times over the internal wildboottest(), imposing the null whenever impose_null = True (as sketched below).
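A hypothetical sketch of what param='ALL' could do internally. It assumes wildboottest() returns a pandas DataFrame and accepts an impose_null keyword, which may differ from the actual signature:

import pandas as pd
from wildboottest.wildboottest import wildboottest

def wildboottest_all(model, cluster, B=9999, impose_null=True, **kwargs):
    rows = [
        wildboottest(model, param=p, cluster=cluster, B=B,
                     impose_null=impose_null, **kwargs)
        for p in model.exog_names  # loop over all parameters, re-imposing the null each time
    ]
    return pd.concat(rows, ignore_index=True)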

Second, if a user chooses to test only one parameter, e.g. X1, it would be great to simply output the column for X1. I suppose that most of the time, researchers are not interested in inference on the full set of regression coefficients, so computing all of them would be wasteful. This is how fwildclusterboot, boottest and WildBootTests.jl all operate, but not what statsmodels does. Maybe we should ask the statsmodels maintainers what they think about this?

Last, once the bootstrapped vcov matrix is available, we can compute an F-statistic, but we can no longer impose the null hypothesis on the bootstrap dgp, as we would use only one bootstrapped vcov matrix for all k coefficient tests. Consequently, specifying impose_null = True needs to raise an error. With a bootstrapped vcov, inference (both p-values and confidence intervals) can be based either on bootstrapped t-statistics (percentile-t, strongly preferable and what currently happens) or on asymptotic approximations (i.e. the t(G-1) distribution). In short, with a full vcov matrix we should be able to support all features of a regular statsmodels regression table, including standard errors, at the cost of no longer being able to impose the null hypothesis on the wild cluster bootstrap dgp.

All in all, providing a bootstrapped vcov and se's leads to deviations from fwildclusterboot, boottest and WildBootTests.jl, which does not necessarily make me happy. 😄

Still, I lean towards having both, but we need to make a few decisions, as sketched above.

I hope this is all understandable - let me know what you think @amichuda ! =)

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.737
Model:                            OLS   Adj. R-squared (uncentered):              0.735
Method:                 Least Squares   F-statistic:                              278.0
Date:                Mon, 17 Oct 2022   Prob (F-statistic):                   2.58e-279
Time:                        18:35:01   Log-Likelihood:                         -1769.6
No. Observations:                1000   AIC:                                      3559.
Df Residuals:                     990   BIC:                                      3608.
Df Model:                          10                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0457      NaN    -1.016      0.310      -0.134       0.043
x2            -1.1441      NaN    -24.947      0.000      -1.234      -1.054
x3             0.3851      NaN      8.243      0.000       0.293       0.477
x4            -0.6970      NaN    -15.284      0.000      -0.786      -0.607
x5            -0.5754      NaN    -12.380      0.000      -0.667      -0.484
x6             0.0367      NaN      0.790      0.430      -0.054       0.128
x7             0.2766      NaN      5.957      0.000       0.185       0.368
x8            -1.1516      NaN    -25.925      0.000      -1.239      -1.064
x9             0.5564      NaN     12.022      0.000       0.466       0.647
x10           -1.2981      NaN    -28.412      0.000      -1.388      -1.208
==============================================================================
Omnibus:                        0.445   Durbin-Watson:                   0.955
Prob(Omnibus):                  0.800   Jarque-Bera (JB):                0.327
Skew:                          -0.007   Prob(JB):                        0.849
Kurtosis:                       3.087   Cond. No.                         1.18
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] Standard errors not available for wildclusterbootstrap

Implement Confidence Intervals (by test inversion)

  • easy for the WCU: simply take quantiles of the bootstrapped t-statistics; more work is needed for the WCR
  • allow hypothesis tests with different values of r as in $R \beta = r$
  • implement bisection algorithm

Overall, follow the exposition in MacKinnon (2022, Econometrics) and adjust for the new bootstrap types.
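A minimal sketch of the bisection step for one confidence bound: find r where the bootstrap p-value for H0: beta = r crosses alpha. Here pval is a hypothetical callable that re-runs the bootstrap test at a given r; inside must lie in the confidence set (p > alpha) and outside must not.

def invert_test(pval, inside, outside, alpha=0.05, tol=1e-6, max_iter=100):
    """Bisection for one bound of the test-inversion confidence interval."""
    for _ in range(max_iter):
        mid = 0.5 * (inside + outside)
        if pval(mid) > alpha:
            inside = mid      # mid still inside the confidence set: move outward
        else:
            outside = mid     # mid rejected: tighten from the outside
        if abs(outside - inside) < tol:
            break
    return 0.5 * (inside + outside)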

Implement Unit Tests

For features in version 0.1:

Test the bootstrap algos against R & Julia. Strategy: create data and run the algorithm in R, Julia and Python; save all scripts; hard-code the test values (or save them in a file) for the actual tests.
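A pytest-style sketch of one such test; py_t_boot and r_t_boot stand for the bootstrapped t-statistics produced by the Python and R scripts and loaded from the saved files (names hypothetical):

import numpy as np

def test_full_enumeration_matches_r(py_t_boot, r_t_boot):
    # under full enumeration the statistics should agree exactly, up to the
    # ordering of the sign combinations and floating point noise
    np.testing.assert_allclose(np.sort(py_t_boot), np.sort(r_t_boot), atol=1e-8)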

  • Potentially set up CI via github actions (?)
  • Potentially set up codecov (?)

Implement the following tests, each for different weights, bootstrap types, whether the null is imposed on the dgp, etc.

External Tests:

  • Using full enumeration, are bootstrapped t-statistics exactly identical between R, Julia, Python when using the same small sample corrections?
  • When full_enumeration = False, are bootstrapped p-values almost identical? Are non-bootstrapped t-stats exactly identical?

Internal Tests:
