py-econometrics / wildboottest
A Python module for wild cluster bootstrapping.
Home Page: https://py-econometrics.github.io/wildboottest/
License: MIT License
E.g. allow tests for R'beta = r, with R a vector of length K and r a scalar.
See the discussion here.
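For illustration, a t-statistic for a single linear restriction R'beta = r could be computed as below. This is a hedged sketch: `linear_restriction_tstat` and all inputs are hypothetical names for illustration, not part of the package.

```python
import numpy as np

def linear_restriction_tstat(beta_hat, vcov, R, r):
    """t-statistic for the null R'beta = r, where R is a length-K
    vector and r a scalar (hypothetical helper, not the package API)."""
    R = np.asarray(R, dtype=float)
    num = R @ beta_hat - r          # estimated restriction minus its null value
    denom = np.sqrt(R @ vcov @ R)   # standard error of R'beta under vcov
    return num / denom

# Example: test beta_2 - beta_3 = 0 in a 3-coefficient model
beta_hat = np.array([0.5, 1.2, 1.0])
vcov = np.diag([0.04, 0.09, 0.01])
t = linear_restriction_tstat(beta_hat, vcov, R=[0.0, 1.0, -1.0], r=0.0)
```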
`return np.nan` — right now we don't run an F-statistic for the full model, but the statsmodels table has one. Setting it to `np.nan` for now.
The arguments in the function reference `N_G_bootcluster`, but the function uses two arguments, `bootcluster` and `N_G`. Was the underscore a typo?
Rename the `Wildboottest` method to `Wildboottest_CR` and add `Wildboottest_HR`.
Make the `bootcluster` argument optional, setting it to `'cluster'` by default.
What do you think about publishing, so that statsmodels can add it to its dependencies?
Discuss design choices. Our point of departure is the following:
```python
model.fit(
    cov_type='wildclusterbootstrap',
    cov_kwds={
        'cluster': cluster,
        'B': 9999,
        'weights_type': 'rademacher',
        'impose_null': True,
        'bootstrap_type': '11',
        'seed': None,
        'param': 'X1',
    },
).summary()
```
The internal function that is called, `wildboottest()`, returns a p-value for a bootstrapped t-test, with the null hypothesis imposed on the bootstrap dgp (or not) according to the `impose_null` argument.
In consequence, we can report a full "regression table" as below by looping over all parameters `[x1, ..., x10]`, imposing the null hypothesis on the bootstrap dgp separately for each hypothesis. For this use case, it would be nice if we could allow a `param` value of e.g. `'ALL'`, which would loop 10 times over the internal `wildboottest()`, imposing the null if `impose_null = True`, or not otherwise.
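The `'ALL'` dispatch could look roughly like this — a minimal sketch in which `run_single_test` is a hypothetical stand-in for the internal single-parameter `wildboottest()` call, and `exog_names` follows statsmodels' attribute naming:

```python
import pandas as pd

def run_single_test(param):
    """Stand-in for one internal wildboottest() call (hypothetical)."""
    return {"param": param, "statistic": 0.0, "p-value": 1.0}

def run_tests(param, exog_names):
    # 'ALL' loops the single-parameter bootstrap over every regressor,
    # imposing the null separately for each hypothesis; otherwise we
    # test only the requested parameter.
    params = exog_names if param == "ALL" else [param]
    return pd.DataFrame([run_single_test(p) for p in params])

table = run_tests("ALL", exog_names=[f"x{i}" for i in range(1, 11)])
```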
Second, if a user chooses to test only one parameter, e.g. `X1`, it would be great if we could simply output the column for `X1`. I suppose that most of the time, researchers are not interested in inference on the full set of regression coefficients, so computing all of them would be wasteful. This is how fwildclusterboot, boottest and WildBootTests.jl all operate, but not what statsmodels does. Maybe we should ask the statsmodels maintainers what they think about this?
Last, once the bootstrapped vcov matrix is available, we can compute an F-statistic, but we can no longer impose the null hypothesis on the bootstrap dgp, as we would use only one bootstrapped vcov matrix for all k coefficient tests. In consequence, specifying `impose_null = True` needs to lead to an error. With a bootstrapped vcov, inference (both p-values and confidence intervals) can then be based on bootstrapped t-statistics (percentile-t, strongly preferable and what currently happens) or asymptotic approximations (i.e. the t(G-1) distribution). In short, with a full vcov matrix, we should be able to support all features of a regular statsmodels regression table, including standard errors, at the cost of no longer being able to impose the null hypothesis on the wild cluster bootstrap dgp.
All in all, providing a bootstrapped vcov and standard errors leads to deviations from fwildclusterboot, boottest and WildBootTests.jl, which does not necessarily make me happy. 😄
Still, I lean towards having both, but we need to make a few decisions as sketched above.
I hope this is all understandable - let me know what you think @amichuda ! =)
```
                             OLS Regression Results
=======================================================================================
Dep. Variable:              y          R-squared (uncentered):            0.737
Model:                      OLS        Adj. R-squared (uncentered):       0.735
Method:                     Least Squares   F-statistic:                  278.0
Date:                       Mon, 17 Oct 2022  Prob (F-statistic):         2.58e-279
Time:                       18:35:01   Log-Likelihood:                    -1769.6
No. Observations:           1000       AIC:                               3559.
Df Residuals:               990        BIC:                               3608.
Df Model:                   10
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           -0.0457        NaN     -1.016      0.310      -0.134       0.043
x2           -1.1441        NaN    -24.947      0.000      -1.234      -1.054
x3            0.3851        NaN      8.243      0.000       0.293       0.477
x4           -0.6970        NaN    -15.284      0.000      -0.786      -0.607
x5           -0.5754        NaN    -12.380      0.000      -0.667      -0.484
x6            0.0367        NaN      0.790      0.430      -0.054       0.128
x7            0.2766        NaN      5.957      0.000       0.185       0.368
x8           -1.1516        NaN    -25.925      0.000      -1.239      -1.064
x9            0.5564        NaN     12.022      0.000       0.466       0.647
x10          -1.2981        NaN    -28.412      0.000      -1.388      -1.208
==============================================================================
Omnibus:             0.445    Durbin-Watson:        0.955
Prob(Omnibus):       0.800    Jarque-Bera (JB):     0.327
Skew:               -0.007    Prob(JB):             0.849
Kurtosis:            3.087    Cond. No.             1.18
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] Standard errors not available for wildclusterbootstrap
```
Currently, there are dimension errors in matrix multiplications, and likely other errors.
This code here uses a `bootcluster` variable that comes from the original R code, via a pre-processed object. According to the software website, it "allows the user to specify subclusters via the bootcluster argument". Can you explain that?
For examples, see WildBootTests.jl and fwildclusterboot's `boot_algo_julia.R` function.
Since `self.k` is used for creating matrices and such for `X`, should we redefine `self.k = self.X.shape[1]`, not `self.k = self.R.shape[0]`, and then raise an error if `self.k != self.R.shape[0]`?
I think this makes more sense, since the user can make a mistake in defining R, which might propagate through the code and show up in places not related to conducting the statistical test.
Do you concur?
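A minimal sketch of the proposed check; the class and attribute names are illustrative stand-ins, not the package's actual code:

```python
import numpy as np

class WildboottestSketch:
    """Hypothetical sketch of the proposed validation at construction time."""
    def __init__(self, X, R):
        self.X = np.asarray(X)
        self.R = np.asarray(R)
        # k is derived from the design matrix, not from R ...
        self.k = self.X.shape[1]
        # ... so a mis-specified R fails fast instead of propagating.
        if self.R.shape[0] != self.k:
            raise ValueError(
                f"R has length {self.R.shape[0]}, but X has {self.k} columns."
            )

X = np.ones((20, 3))
ok = WildboottestSketch(X, R=np.array([1.0, 0.0, 0.0]))  # valid
try:
    WildboottestSketch(X, R=np.array([1.0, 0.0]))        # wrong length
    raised = False
except ValueError:
    raised = True
```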
Is `fe` supposed to be a long variable indicating membership in a particular entity/group (like hhids), or are they already supposed to be dummy variables?
... else it is rather slow.
For features in version 0.1:
Test the bootstrap algos against R & Julia. Strategy: create data and run the algo in R, Julia and Python; save all scripts; hard-code test values (or save them in a file) for the actual tests.
Implement the following tests - each for different weights, bootstrap types, null imposed on the dgp, etc.
External tests: with `full_enumeration = False`, are bootstrapped p-values almost identical? Are non-bootstrapped t-stats exactly identical?
Internal tests: as soon as fwildclusterboot allows these variants (currently in the dev branch).
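The cross-implementation tests could be structured roughly like this, pytest style. The reference values below are placeholders standing in for hard-coded R/Julia output, not real results:

```python
import numpy as np

# Placeholder for values produced once in R/Julia and hard-coded here.
EXPECTED_FROM_R = {"statistic": 1.129, "p-value": 0.255}

def assert_matches_reference(result, expected, p_tol=0.01):
    # Non-bootstrapped t-stats should match exactly (up to float noise);
    # bootstrapped p-values only approximately, since they are random
    # unless full enumeration is used.
    assert np.isclose(result["statistic"], expected["statistic"], atol=1e-8)
    assert abs(result["p-value"] - expected["p-value"]) < p_tol

# In a real test, `result` would come from wildboottest() on the same data.
result = {"statistic": 1.129, "p-value": 0.255}
assert_matches_reference(result, EXPECTED_FROM_R)
```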
#TODO: Is this just a permutation function?
Add CI support.
`a = np.array([-1, 1]) * (np.sqrt(5) + np.array([-1, 1])) / 2`  # TODO: Should this divide the whole expression by 2, or just the second part?
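To answer the TODO: in Python, `*` and `/` have equal precedence and associate left-to-right, so the whole product is divided by 2, which yields the two points of the Mammen two-point weight distribution, (1 - √5)/2 and (1 + √5)/2:

```python
import numpy as np

# '*' and '/' have equal precedence and associate left-to-right,
# so this divides the whole product by 2 ...
a = np.array([-1, 1]) * (np.sqrt(5) + np.array([-1, 1])) / 2
# ... exactly as if written with explicit parentheses:
b = (np.array([-1, 1]) * (np.sqrt(5) + np.array([-1, 1]))) / 2
# Both give the Mammen points (1 - sqrt(5))/2 ~ -0.618 and (1 + sqrt(5))/2 ~ 1.618.
```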
Currently, running `wildboottest()` twice under the same seed leads to different inferences. This is of course terrible for reproducibility, and we should fix it.
Example:
```python
import pandas as pd
import statsmodels.formula.api as sm
from wildboottest.wildboottest import wildboottest
import numpy as np

df = pd.read_csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/sandwich/PetersenCL.csv")
df['treat'] = np.random.choice([0, 1], df.shape[0], True)
model = sm.ols(formula='y ~ treat', data=df)

wildboottest(model, param="treat", cluster=df.year, B=999, seed=12)
# | param   |   statistic |   p-value |
# |:--------|------------:|----------:|
# | treat   |       1.129 |     0.255 |

wildboottest(model, param="treat", cluster=df.year, B=999, seed=12)
# | param   |   statistic |   p-value |
# |:--------|------------:|----------:|
# | treat   |       1.129 |     0.289 |
```
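One plausible fix (a sketch, not the package's current implementation) is to construct a single `numpy.random.Generator` from the seed at the start of every call and draw all bootstrap weights from it, so that repeated calls with the same seed reproduce the same draws:

```python
import numpy as np

def draw_rademacher_weights(n_clusters, B, seed=None):
    """Draw all bootstrap weights from one Generator seeded per call,
    so repeated calls with the same seed give identical draws
    (hypothetical helper, not the package's current code)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(n_clusters, B))

w1 = draw_rademacher_weights(10, 99, seed=12)
w2 = draw_rademacher_weights(10, 99, seed=12)
```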
Overall, follow exposition in MacKinnon (2022, Econometrics) - adjust for new bootstrap types.
Should we allow for no cluster input, and in which case, does it just turn into a regular wild bootstrap? Or should there be an error?
Both are more or less the standard in the literature. For compatibility with choices made via statsmodels / linearmodels, add the option to overwrite these default values.
Note that for p-values, the choice of small sample corrections does not have an impact (the ssc's cancel out when computing p-values by counting cases of ssc x t_stat < ssc x t_boot). They only affect the internally computed and reported non-bootstrapped test statistic.
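A quick numeric check of the cancellation claim — scaling both the original and the bootstrapped t-statistics by any positive constant leaves the counted p-value unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
t_stat = 1.7                           # non-bootstrapped test statistic
t_boot = rng.standard_normal(9999)     # stand-in for bootstrapped t-statistics
ssc = 1.2345                           # any common small-sample correction factor

p_raw = np.mean(np.abs(t_boot) > np.abs(t_stat))
p_ssc = np.mean(np.abs(ssc * t_boot) > np.abs(ssc * t_stat))
# ssc multiplies both sides of the comparison, so it cancels.
```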
In `__init__`, you need to specify `B`, but in order to run the bootstrap, you need to run `get_weights`, which overrides the attribute. Should `B` not be specified then?
A lot of the data inputs you have in the `boot_algo3` function in R assume a dataframe. Should we just do the same and assume pandas DataFrames in the Python version? We can then turn them into numpy arrays inside the function for performance.
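A sketch of that boundary: accept pandas objects in the public API and convert once to NumPy arrays before the numerical core (the function name is illustrative):

```python
import numpy as np
import pandas as pd

def to_arrays(X, y, cluster):
    """Accept pandas objects at the API boundary, convert once to
    NumPy arrays for the numerical core (design sketch, not package code)."""
    X_arr = np.asarray(X, dtype=float)          # design matrix as 2-D float array
    y_arr = np.asarray(y, dtype=float).ravel()  # outcome as a flat vector
    cluster_arr = np.asarray(cluster)           # cluster ids, dtype preserved
    return X_arr, y_arr, cluster_arr

df = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "y": [0.1, 0.2, 0.3], "g": [1, 1, 2]})
X, y, g = to_arrays(df[["x1"]], df["y"], df["g"])
```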
... as in classical AB test setups without covariates. The same problem exists for fwildclusterboot.
In a first test of Python vs R, it looks like the '31' bootstrap t-statistics match exactly under full enumeration (as they should), while the '11' do not:
See here
I suppose that the R version is correct, as it's tested against WildBootTests.jl and produces matching statistics.
```
>>> df (Python)
           WCR11         WCR31      WCU11     WCU31
0  -5.569437e-07  7.119511e-16  -0.547569  0.079819
1  -3.286834e+00 -3.286832e+00  -4.243206 -3.085644
2   1.715372e-02  1.715392e-02  -0.166832  0.053646
3  -2.881170e+00 -2.881171e+00  -1.780821 -3.045997
4   2.881170e+00  2.881171e+00   1.780821  3.045997
5  -1.715372e-02 -1.715392e-02   0.166832 -0.053646
6   3.286834e+00  3.286832e+00   4.243206  3.085644
7   5.569437e-07 -7.119511e-16   0.547569 -0.079819

>>> r_df (R)
   Unnamed: 0     WCR11          WCR31      WCU11     WCU31
0           1 -0.620824  -1.017073e-16  -0.547213  0.079819
1           2 -3.852912  -3.286832e+00  -4.198954 -3.085644
2           3 -0.196266   1.715392e-02  -0.171808  0.053646
3           4 -1.630894  -2.881171e+00  -1.782077 -3.045997
4           5  1.630894   2.881171e+00   1.782077  3.045997
5           6  0.196266  -1.715392e-02   0.171808 -0.053646
6           7  3.852912   3.286832e+00   4.198954  3.085644
7           8  0.620824   1.017073e-16   0.547213 -0.079819
```
Currently, we report t-stats with more than 10 digits after the decimal point:
`# 0 X1 [-1.0530803154504016] 0.308831`
#TODO: Is this just a permutation function?

```r
permutations(
  n = 2,
  r = N_G_bootcluster,
  v = c(1, -1),
  repeats.allowed = TRUE
)
```
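It effectively is: assuming this is `gtools::permutations`, the call enumerates all 2^G sign vectors over c(1, -1), i.e. the full enumeration of Rademacher weight draws. A Python equivalent (up to row ordering) via `itertools.product`:

```python
import itertools
import numpy as np

N_G_bootcluster = 3  # example number of bootstrap clusters

# All 2^G sign assignments over {1, -1} — the full enumeration of
# Rademacher weights, matching the quoted R call up to row ordering.
v = np.array(list(itertools.product([1, -1], repeat=N_G_bootcluster)))
```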
... as the package is now "live" on pypi!