
metalearners's Introduction

metalearners


MetaLearners for Conditional Average Treatment Effect (CATE) estimation

The library focuses on providing

  • Methodologically sound cross-fitting
  • Convenient access to and reuse of base models
  • Consistent APIs across MetaLearners
  • Support for more than two treatment variants
  • Integrations with pandas, shap, lime, optuna and soon onnx

Example

from metalearners import RLearner
from lightgbm import LGBMClassifier, LGBMRegressor

df = ...  # a data frame with covariates, observed outcomes and treatment assignments

rlearner = RLearner(
    nuisance_model_factory=LGBMRegressor,
    propensity_model_factory=LGBMClassifier,
    treatment_model_factory=LGBMRegressor,
    is_classification=False,
    n_variants=2,
)

features = ["age", "weight", "height"]
rlearner.fit(df[features], df["outcomes"], df["treatment"])  # covariates, outcomes, treatment assignments
cate_estimates = rlearner.predict(df[features], is_oos=False)

Please refer to our docs for many more in-depth and reproducible examples.

Installation

metalearners can be installed either via PyPI with

$ pip install metalearners

or via conda-forge with

$ conda install metalearners -c conda-forge

Development

Development instructions can be found here.

metalearners's People

Contributors

francescmartiescofetqc, kklein, quant-ranger[bot], dependabot[bot], apoorvalal, github-actions[bot]


metalearners's Issues

Provide nuisance estimates to pseudo-outcome methods

Status quo

As of now we have the following interface for the pseudo-outcome methods in the DR-Learner and R-Learner:

  • DR-Learner

    def _pseudo_outcome(
        self,
        X: Matrix,
        y: Vector,
        w: Vector,
        treatment_variant: int,
        is_oos: bool,
        oos_method: OosMethod = OVERALL,
        epsilon: float = _EPSILON,
    ) -> np.ndarray:

  • R-Learner

    def _pseudo_outcome_and_weights(
        self,
        X: Matrix,
        y: Vector,
        w: Vector,
        treatment_variant: int,
        is_oos: bool,
        oos_method: OosMethod = OVERALL,
        mask: Vector | None = None,
        epsilon: float = _EPSILON,
    ) -> tuple[np.ndarray, np.ndarray]:

Since both kinds of pseudo-outcome require nuisance model estimates, and since these are not provided as input arguments, they are estimated inside the respective pseudo-outcome method.

Importantly, the pseudo outcome methods are treatment-variant specific. Yet, the nuisance estimates estimated as part of the pseudo outcome methods are not treatment variant specific:

  • In the case of the R-Learner, the overall outcome model $\hat{\mu}$ is applied to all data and the overall propensity model $\hat{e}$ is applied to all data. Only after the estimation is the data filtered with respect to the treatment variant at hand:

    y_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        model_kind=OUTCOME_MODEL,
        model_ord=0,
        oos_method=oos_method,
    )[mask]
    w_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        model_kind=PROPENSITY_MODEL,
        model_ord=0,
        oos_method=oos_method,
    )[mask]

  • In the case of the DR-Learner, the propensity $\hat{e}$ and all conditional average outcomes $\hat{\mu}_k$ are estimated for all data points; filtering of variant-specific information only happens thereafter:

    conditional_average_outcome_estimates = (
        self.predict_conditional_average_outcomes(
            X=X,
            is_oos=is_oos,
            oos_method=oos_method,
        )
    )
    propensity_estimates = self.predict_nuisance(
        X=X,
        is_oos=is_oos,
        oos_method=oos_method,
        model_kind=PROPENSITY_MODEL,
        model_ord=0,
    )
    y0_estimate = conditional_average_outcome_estimates[:, 0]
    y1_estimate = conditional_average_outcome_estimates[:, treatment_variant]

Assessment

In the case of $k>2$ treatment variants, the approach above causes needless computational effort: the same nuisance estimates are recomputed for every single treatment variant that is not considered the 'control'.

Computational burden aside, it is not clear that having the pseudo-outcome methods perform the estimation themselves makes for the better interface. Wouldn't it feel more natural (and wouldn't concerns be better separated) if the pseudo-outcome methods merely defined the pseudo-outcome given the nuisance estimates, rather than estimating those quantities themselves?
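As an illustration, consider a hypothetical refactored signature in which the caller estimates the nuisances once for all variants and passes them in, so that the pseudo-outcome method becomes a pure function of its inputs. The parameter names below are illustrative and do not reflect the actual private API:

def _pseudo_outcome_and_weights(
    self,
    y: Vector,
    w: Vector,
    treatment_variant: int,
    outcome_estimates: np.ndarray,     # mu_hat(X), computed once by the caller
    propensity_estimates: np.ndarray,  # e_hat(X), computed once by the caller
    mask: Vector | None = None,
    epsilon: float = _EPSILON,
) -> tuple[np.ndarray, np.ndarray]:
    # Only combines the provided estimates into pseudo-outcomes and weights;
    # no nuisance predictions happen in here.
    ...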

Leakage in X-Learner in-sample prediction

Issue at hand

@ArseniyZvyagintsevQC brought the following to our attention:

Let us assume a binary treatment variant scenario in which we want to work with in-sample predictions, i.e. is_oos=False.

The current implementation would go about fitting five models, three of which considered nuisance models and two of which considered treatment models:

| model | target | cross-fitting dataset | stage | name |
|---|---|---|---|---|
| $\hat{\mu}_0$ | $Y_i$ | $\{(X_i, Y_i) \mid W_i=0\}$ | nuisance | "treatment_variant" |
| $\hat{\mu}_1$ | $Y_i$ | $\{(X_i, Y_i) \mid W_i=1\}$ | nuisance | "treatment_variant" |
| $\hat{e}$ | $W_i$ | $\{(X_i, Y_i)\}$ | nuisance/propensity | "propensity_model" |
| $\hat{\tau}_0$ | $\hat{\mu}_1(X_i) - Y_i$ | $\{(X_i, Y_i) \mid W_i=0\}$ | treatment | "control_effect_model" |
| $\hat{\tau}_1$ | $Y_i - \hat{\mu}_0(X_i)$ | $\{(X_i, Y_i) \mid W_i=1\}$ | treatment | "treatment_effect_model" |

More background on this here.

Note that each of these models is cross-fitted. More precisely, each is cross-fitted wrt the data it has seen at training time.

Let's suppose now that we are at inference time and encounter an in-sample data point $i$. Without loss of generality, let's assume that $W_i=1$.
In order to come up with a CATE estimate, the predict method will run

  • $\hat{\tau}_0(X_i)$ with is_oos=True since this datapoint has not been seen during training time of the model $\hat{\tau}_0$
  • $\hat{\tau}_1(X_i)$ with is_oos=False since this datapoint has indeed been seen during the training time of the model $\hat{\tau}_1$

The latter call makes sure we avoid leakage in $\hat{\tau}_1$. The former call, however, does not completely avoid leakage:
even though $i$ hasn't been seen in the training of $\hat{\tau}_0$, it has been seen in the training of $\hat{\mu}_1$, which is, in turn, used by $\hat{\tau}_0$. Therefore, the observed outcome $Y_i$ can leak into the estimate $\hat{\tau}_0(X_i)$.

Next steps

We can devise an extreme, naïve approach to counteract this issue by training every type of model once per data point, which amounts to leave-one-out cross-fitting (see the sketch after this list). Clearly, this ensures the absence of data leakage. The challenge with this issue revolves around coming up with a design that

  • allows for arbitrary numbers (>1, <=n) of cross-fitting folds, i.e. not fixing it to be equal to the number of training data points
  • integrates well into the structure of the library
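For illustration only, the extreme variant mentioned above corresponds to leave-one-out cross-fitting, which scikit-learn can express directly; this sketch is not how the library implements cross-fitting:

import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

# Every in-sample prediction for row i stems from a model that never saw row i,
# at the cost of fitting as many models as there are data points.
in_sample_predictions = cross_val_predict(LGBMRegressor(), X, y, cv=LeaveOneOut())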

Model-specific initialization fails if a superset of expected base model keys is provided

Example

from metalearners import TLearner
from lightgbm import LGBMRegressor

tlearner = TLearner(
    nuisance_model_factory=LGBMRegressor,
    is_classification=False,
    n_variants=2,
    nuisance_model_params={"verbose": -1},
    feature_set={"variant_outcome_model": None, "useless_model": []}
)

tlearner.feature_set

yields

{'variant_outcome_model': {'variant_outcome_model': None, 'useless_model': []}}

We observe that the model dictionary has been constructed improperly. This was spotted by @MatthiasLuxQC .

Underlying problem

metalearners.metalearner._initialize_model_dict only returns the output if the set of provided keys is exactly equal to the set of expected keys:

https://github.com/Quantco/metalearners/blob/main/metalearners/metalearner.py#L95-L98

Instead, we should probably test that the provided keys are a superset of the expected keys.
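A minimal sketch of the suggested relaxation, assuming the helper receives the user-provided argument and the expected model keys (the real implementation may differ, and whether surplus keys should be dropped silently or raise is a design decision):

def _initialize_model_dict(argument, expected_names):
    if isinstance(argument, dict) and set(expected_names) <= set(argument.keys()):
        # Accept supersets of the expected keys and keep only the expected ones,
        # instead of requiring exact equality.
        return {name: argument[name] for name in expected_names}
    # Otherwise broadcast the single argument to all expected model keys.
    return {name: argument for name in expected_names}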

Allow for passing of 'fixed' propensity scores

Several MetaLearners, such as the R-Learner or DR-Learner, have propensity base models.

As of now, they are trained -- just as all other base models -- based on the data passed through the MetaLearner's fit call.

In particular in the case of non-observational data, it might be interesting to pass 'fixed' propensity scores instead of trying to infer the propensities from the experiment data.

Next steps:

  • Clearly define in which scenarios it might be desirable to have 'fixed' propensity score estimates
  • Assess different implementation options and their design implications (e.g. does creating a wrapped 'model' on the end-user side that merely predicts the fixed scores do the trick? See the sketch below. Is it a reasonable suggestion to provide no features to the propensity model? If not, should the scores be provided in __init__, fit or predict?)
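A sketch of the wrapped-model option, assuming assignment probabilities that are known by design; the class and its usage are hypothetical and not part of the metalearners API:

import numpy as np

class FixedPropensityModel:
    """Return the same, user-supplied assignment probabilities for every unit."""

    def __init__(self, variant_probabilities):
        self.variant_probabilities = np.asarray(variant_probabilities)

    def fit(self, X, y):
        # Nothing to learn: the probabilities are fixed a priori.
        return self

    def predict_proba(self, X):
        # Broadcast the fixed per-variant probabilities to all rows of X.
        return np.tile(self.variant_probabilities, (len(X), 1))

# Hypothetical usage: propensity_model_factory=FixedPropensityModel together with
# propensity scores such as [0.5, 0.5] for a balanced two-variant experiment.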

Sklearn Dependency update to 1.4 instead of 1.3

In various places the sklearn function root_mean_squared_error is imported and used (e.g. here). The function was added in version 1.4, hence the sklearn requirement in pyproject.toml should be updated from 1.3 to 1.4 to reflect this dependency.

I would have liked to make a PR myself, but it was too much effort to figure out how to get the pixi pre-commit setup running without documentation (which, to my understanding, is already work in progress :))

Implement `predict_conditional_average_outcomes` for `RLearner`?

All implemented MetaLearners allow the user to call predict_conditional_average_outcomes. At the beginning we thought this was not possible for the RLearner, but I think the following formulas may work:

(For ease of notation I'll use $Y(k) := \mathbb{E}[Y_i(k)]$, $Y := \mathbb{E}[Y \mid X]$, $\tau(k) := \mathbb{E}[Y(k) - Y(0) \mid X]$ and $e(k) := \mathbb{P}[W = k \mid X]$)

We know this system of $K+1$ linear equations is true:

$$\begin{cases} Y(1) - Y(0) = \tau(1)\\\ Y(2) - Y(0) = \tau(2)\\\ \vdots \\\ Y(K) - Y(0) = \tau(K) \\\ e(0) Y(0) + e(1) Y(1) + \dots + e(K) Y(K) = Y \end{cases}$$

that we need to solve for $Y(0), Y(1), \dots, Y(K)$.

Isolating $Y(1), Y(2), \dots, Y(K)$ from each of the first $K$ equations and plugging them into the last one, we get:

$$e(0) Y(0) + e(1) (\tau(1) + Y(0)) + \dots + e(K) (\tau(K) + Y(0)) = Y$$

From this we can isolate $Y(0)$ as:

$$Y(0) = \frac{Y - \sum\limits_{i=1}^{K}e(i)\tau(i)}{e(0) + \sum\limits_{i=1}^{K} e(i)} = Y - \sum\limits_{i=1}^{K}e(i)\tau(i)$$

Where we used the fact that all the propensity scores should sum up to 1.

Finally we can compute all $Y(k)$ as $Y(k) = Y(0) + \tau(k)$.
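A small numerical sanity check of the formulas above, with illustrative values only:

import numpy as np

tau = np.array([1.5, -0.3])      # tau(1), ..., tau(K) with K = 2
e = np.array([0.5, 0.3, 0.2])    # e(0), ..., e(K), summing to 1
y_overall = 2.0                  # E[Y | X]

y0 = y_overall - np.sum(e[1:] * tau)         # Y(0) = Y - sum_k e(k) tau(k)
y = np.concatenate([[y0], y0 + tau])         # Y(k) = Y(0) + tau(k)

assert np.isclose(np.sum(e * y), y_overall)  # the propensity-weighted outcomes recover Y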

I extracted this idea for the binary case from a code snippet in this reference (screenshot omitted).

Any thoughts on this @kklein ?

MetaLearners to be implemented

MetaLearners are a family of approaches to estimate CATEs. This issue is supposed to track which concrete MetaLearners have already been implemented in this library.

| Name | Reference | Implemented? |
|---|---|---|
| T-Learner | https://arxiv.org/pdf/1706.03461 | Yes |
| S-Learner | https://arxiv.org/pdf/1706.03461 | Yes |
| R-Learner | https://arxiv.org/pdf/1712.04912 | Yes |
| X-Learner | https://arxiv.org/pdf/1706.03461 | Yes |
| DR-Learner | https://arxiv.org/pdf/2004.14497 | Yes |
| RA-Learner | https://arxiv.org/pdf/2101.10943 | |
| EP-Learner | https://arxiv.org/pdf/2402.01972 | |
| U-Learner | https://arxiv.org/pdf/1706.03461 | |
| F-Learner | https://arxiv.org/pdf/1706.03461 | |
| M-Learner (a.k.a. PW-Learner) | https://arxiv.org/pdf/2101.10943 | |

Please let us know if you'd like to use a MetaLearner -- whether already part of this list or not -- which is not yet implemented.

Implement an ensembler of `MetaLearner`s

sklearn provides a BaseEnsemble class which can be used to ensemble various Estimators.

Unfortunately, sklearn's BaseEnsemble does not work out of the box with a MetaLearner from metalearners due to differences in predict and fit signatures.

In order to facilitate the ensembling of CATE estimates from various MetaLearners, it would be useful to implement helpers.

Some open questions:

  • Should the ensemble be given trained MetaLearners or train the MetaLearners itself?
  • Should the ensemble require all MetaLearners to have been trained on exactly the same data?
  • Should the ensemble work with both in-sample and out-of-sample data?
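For illustration, a minimal averaging helper could look as follows, assuming the MetaLearners have already been fitted and share the same number of treatment variants (which answers the first open question in one particular way); the helper name is hypothetical:

import numpy as np

def average_cate_estimates(fitted_metalearners, X, is_oos):
    # Naive ensembling: average the CATE estimates of several fitted MetaLearners.
    estimates = [learner.predict(X, is_oos=is_oos) for learner in fitted_metalearners]
    return np.mean(estimates, axis=0)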

Allow for other prediction methods in `CrossFitEstimator`

Using a CrossFitEstimator, one can only call predict and predict_proba on the inner models for out-of-sample predictions. It may be interesting to allow passing a string so that other methods can be used. This could be useful, for instance, for survival models, where several kinds of predictions are sometimes possible, see here.

It may also be interesting to allow it in the metalearners, but this would be a second step.
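In plain Python, the idea boils down to dispatching on a method name; where exactly such a string would be accepted (e.g. in __init__ or per call) is part of this issue, and the names below are hypothetical:

def predict_with_method(model, X, method_name="predict"):
    # Look up an arbitrary prediction method by name, e.g. "predict_survival_function".
    return getattr(model, method_name)(X)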

Challenging "CATE estimation is not supervised learning"

This is not an issue or a bug, but there was no Discussions section, so I am asking here.

Let's start from the example in your docs:

(image from the docs omitted)

Why can't I train a multi-output (2 in this specific example) neural network as a regressor and mask the loss for the missing targets? Such masking is quite standard practice in all sorts of neural network use cases, e.g. when time series signals have different lengths etc.

So, here, CATE estimation exactly is supervised learning.
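For reference, a minimal sketch of the masked-loss idea in PyTorch (illustrative only; it does not settle whether such a model yields valid CATE estimates):

import torch

n, d, n_variants = 256, 5, 2
X = torch.randn(n, d)
w = torch.randint(0, n_variants, (n,))   # observed treatment assignment
y = torch.randn(n)                       # observed outcome

model = torch.nn.Sequential(
    torch.nn.Linear(d, 32), torch.nn.ReLU(), torch.nn.Linear(32, n_variants)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
mask = torch.nn.functional.one_hot(w, num_classes=n_variants).float()

for _ in range(100):
    optimizer.zero_grad()
    predictions = model(X)                             # one head per potential outcome
    squared_errors = (predictions - y.unsqueeze(1)) ** 2
    loss = (squared_errors * mask).sum() / mask.sum()  # only observed outcomes contribute
    loss.backward()
    optimizer.step()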

Add support for polars

As of now, covariates X, treatment assignments w and observed outcomes y can be provided as numpy data structures (np.ndarray) or as pandas data structures (pd.DataFrame and pd.Series, respectively).

A PR to allow for X to be scipy.sparse.csr_matrix is in the making: PR #86

It might be beneficial to allow for polars datastructures, too.

One question that might arise is how we deal with a potential additional dependency. Do we want to wrap every polars-dependent piece of code in a try-block that tries to import? Do we want to make polars a run dependency of metalearners?
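For context, the optional-import option typically looks like this; it is a common pattern, not a decision for metalearners, and the helper name is hypothetical:

try:
    import polars as pl
except ImportError:  # polars stays an optional dependency
    pl = None

def _is_polars(obj) -> bool:
    # Only reference polars types if the import succeeded.
    return pl is not None and isinstance(obj, (pl.DataFrame, pl.Series))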

If you'd like to use metalearners with polars please let us know. :)

Evaluate method fails if feature_set is not None

  1. Initialize an X-, R-, or DR-Learner with a feature_set specifying which columns are used for which base models.
  2. Fit it.
  3. Evaluate it.

Evaluation fails with an error "Number of features must match the input...".

Remove git_root from run requirements?

Does it make sense to have git_root as a run requirement? If I install this package from PyPI or conda-forge, there's no guarantee that I'm running it inside a git repository. The only two helper functions inside the package that use git_root are

Here you can just add an argument that tells the functions where to download the data to. If you remove git_root there, it's just a development requirement afterwards.
