
pylift's People

Contributors

draco2003, minyus, rsyi, wtfrost


pylift's Issues

Error in docs

up.transformed_y_train # The predicted uplift.

This is not the predicted uplift, but just the transformed outcome.
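For clarity, a minimal sketch of the distinction (assuming the fitted regressor is exposed as up.model; check your pylift version):

# The transformed outcome is the training *label*:
#   y* = y*W/p - y*(1-W)/(1-p)
transformed = up.transformed_y_train

# The predicted uplift is a model *output* (the attribute name
# `up.model` is an assumption, not quoted from the docs):
predicted_uplift = up.model.predict(up.x_train)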

No docs on EDA.

We have some EDA functions, up.NIV and up.NWOE, but no docs on them.

Calculate actual and predicted treatment effect

Thanks for the great package. I fit a model on a set of features, a conversion column (0/1), and a treatment indicator (0/1). I am interested in plotting deciles of predicted vs. actual treatment effect for a hold-out set.

I am struggling to understand how I can calculate the predicted and the actual treatment effect from the model. Any pointers?

From my understanding, the actual average treatment effect (ATE) can be calculated as the difference in mean outcomes between treatment and control. How can I get the predicted average treatment effect? My hunch is that I can take the average of the uplift scores per decile. Is that correct?
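To make this concrete, a rough sketch of the decile comparison described above (the column names, the df_test hold-out frame, and the feature list features are placeholders; up.model is assumed to expose the fitted regressor):

import numpy as np
import pandas as pd

# Score the hold-out set with the fitted uplift model.
scores = up.model.predict(df_test[features])
out = df_test.assign(score=scores,
                     decile=pd.qcut(scores, 10, labels=False, duplicates='drop'))

for d, g in out.groupby('decile'):
    actual_te = (g.loc[g['treatment'] == 1, 'outcome'].mean()
                 - g.loc[g['treatment'] == 0, 'outcome'].mean())  # observed ATE in decile
    predicted_te = g['score'].mean()  # mean uplift score in decile
    print(d, round(actual_te, 4), round(predicted_te, 4))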

Thanks in advance!

Input data with imbalanced outcome class?

I'm new to the transformed outcome method but found the package super interesting!

How does the method work with imbalanced outcome data, though? In many marketing use cases, the proportion of customers who buy tends to be very low, so the transformed outcome y* will be 0 most of the time. How does that affect the model?
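For reference, the transformed outcome in question is y* = y·W/p − y·(1−W)/(1−p), which is zero wherever y = 0. A minimal sketch (the function name is illustrative, not pylift's API):

import numpy as np

def transformed_outcome(y, w, p):
    # y* = y*w/p - y*(1-w)/(1-p): zero wherever y == 0,
    # i.e. for most rows when conversions are rare.
    return y * w / p - y * (1 - w) / (1 - p)

y = np.array([0, 0, 1, 0, 1])  # rare conversions
w = np.array([1, 0, 1, 1, 0])  # treatment indicator
print(transformed_outcome(y, w, p=0.5))  # [ 0.  0.  2.  0. -2.]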

Could we take a resampling approach for the training data? However, if we sample a, say, 1:1 balanced outcome dataset, how should we estimate the treatment policy p?

I would love to hear your thoughts on whether pylift can accommodate such data sets.

WOE: outcome/treatment swapped in explore/base._get_counts

counts_dict = {}
y = df_new[col_treatment]
trt = df_new[col_outcome]

for feat in feats:
    bin_feat = str(feat) + '_bin'
    counts1_t1 = df_new[(y == 1) & (trt == 1)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y1t1'})
    counts1_t0 = df_new[(y == 1) & (trt == 0)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y1t0'})
    counts0_t1 = df_new[(y == 0) & (trt == 1)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y0t1'})
    counts0_t0 = df_new[(y == 0) & (trt == 0)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y0t0'})
    # creating a dataframe with all of these results
    counts_dict[feat] = pd.concat([counts1_t1, counts1_t0, counts0_t1, counts0_t0], axis=1).fillna(
        0) + 1  # replace any empty slots with zeros (and add 1 to everything)

should be (with col_outcome and col_treatment swapped back, so that y holds the outcome and trt the treatment):

counts_dict = {}
y = df_new[col_outcome]
trt = df_new[col_treatment]

for feat in feats:
    bin_feat = str(feat) + '_bin'
    counts1_t1 = df_new[(y == 1) & (trt == 1)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y1t1'})
    counts1_t0 = df_new[(y == 1) & (trt == 0)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y1t0'})
    counts0_t1 = df_new[(y == 0) & (trt == 1)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y0t1'})
    counts0_t0 = df_new[(y == 0) & (trt == 0)][[feat, bin_feat]].groupby(bin_feat).count().rename(
        columns={feat: 'counts_y0t0'})
    # creating a dataframe with all of these results
    counts_dict[feat] = pd.concat([counts1_t1, counts1_t0, counts0_t1, counts0_t0], axis=1).fillna(
        0) + 1  # replace any empty slots with zeros (and add 1 to everything)

Memory Error & NIV Dictionary Query

Hi,

I was hoping you guys could please help me!

My dataset is around 270k rows with around 60 variables (a 60 MB file). When I try to call NIV, I run into a memory error. To tackle this, I have run NIV on a sample of at most around 180k rows with success; I then refer to the sample's NIV dictionary and select all variables with an NIV higher than, for example, 0.03. I then select these variables from my 270k dataset and build my model on them.

However, my cumulative gains plot always ends up with a negative correlation, with the cgains line plotted below the random selection line.

My guess is that this problem is the result of one of two things:

  • Using NIV computed on a sample of the entire dataset has led to a biased NIV calculation, and so it shows me variables that aren't very good predictors of uplift.
  • Or, my dataset is rubbish.

Possible solutions/questions I have:

  • How can I tackle the memory issue stated above? (This would then allow me to select variables based on NIV calculated on the entire 270k dataset.)
  • The NIV values in the dictionary don't seem to match the bar plot of NIV (assuming it's a bar plot with error bars and not a box plot). So my question is: what is the value recorded in the dictionary vs. the values/bars plotted on the NIV graph?

Are there any other solutions you would recommend? Apologies, I am fairly new to machine learning in general, so I'm still learning a lot!

Many thanks for your help in advance; it's much appreciated!

Khaashif

TerminatedWorkerError

I am using the pylift module on an AWS EC2 Linux instance with the code below and am getting two different errors.

up = TransformedOutcome(df_fil, col_treatment='Treatment', col_outcome='Outcome',
                        col_policy='prop_scores', stratify=df_fil['Treatment'],
                        sklearn_model=XGBClassifier)
param_grid = {  # 'estimator': XGBClassifier(),
    'param_grid': {'max_depth': range(1, 8, 1),
                   'learning_rate': [x/100 for x in range(1, 12, 4)],
                   'colsample_bytree': [x/10 for x in range(3, 10, 1)],
                   'min_child_weight': range(1, 6, 1),
                   'scale_pos_weight': [x/10 for x in range(12, 18, 1)]},
    'n_jobs': -1}
up.grid_search(**param_grid, cv=2)

Getting the following error while using the above code

Fitting 2 folds for each of 7 candidates, totalling 14 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done 3 out of 14 | elapsed: 1.0min remaining: 3.7min
[Parallel(n_jobs=-1)]: Done 8 out of 14 | elapsed: 1.0min remaining: 45.1s

TerminatedWorkerError Traceback (most recent call last)
in
----> 1 up.grid_search(**param_grid,cv=2)

~/anaconda3/lib/python3.7/site-packages/pylift/methods/base.py in grid_search(self, **kwargs)
337 self.grid_search_params.update(kwargs)
338 self.grid_search_ = GridSearchCV(**self.grid_search_params)
--> 339 self.grid_search_.fit(self.x_train, self.transformed_y_train)
340 return self.grid_search_
341

~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
685 return results
686
--> 687 self._run_search(evaluate_candidates)
688
689 # For multi-metric evaluation, store the best_index_, best_params_ and

~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1146 def _run_search(self, evaluate_candidates):
1147 """Search all candidates in param_grid"""
-> 1148 evaluate_candidates(ParameterGrid(self.param_grid))
1149
1150

~/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
664 for parameters, (train, test)
665 in product(candidate_params,
--> 666 cv.split(X, y, groups)))
667
668 if len(out) < 1:

~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
932
933 with self._backend.retrieval_context():
--> 934 self.retrieve()
935 # Make sure that we get a last message telling us we are done
936 elapsed_time = time.time() - self._start_time

~/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
831 try:
832 if getattr(self._backend, 'supports_timeout', False):
--> 833 self._output.extend(job.get(timeout=self.timeout))
834 else:
835 self._output.extend(job.get())

~/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
519 AsyncResults.get from multiprocessing."""
520 try:
--> 521 return future.result(timeout=timeout)
522 except LokyTimeoutError:
523 raise TimeoutError()

~/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()

~/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}

When I remove n_jobs=-1 from the param_grid, i.e. with the code below,

param_grid = {  # 'estimator': XGBClassifier(),
    'param_grid': {'max_depth': range(1, 8, 1),
                   'learning_rate': [x/100 for x in range(1, 12, 4)],
                   'colsample_bytree': [x/10 for x in range(3, 10, 1)],
                   'min_child_weight': range(1, 6, 1),
                   'scale_pos_weight': [x/10 for x in range(12, 18, 1)]}}
up.grid_search(**param_grid, cv=2)

I am getting the following error
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc

I am using a Jupyter notebook to run the above code. I know it can't be a memory error, since I have ample memory (64 GB, 8 cores) and am using the Python 3.7 Anaconda distribution.

Definition of qini curve does not match the implemented function

Hi,

I have a question. In your documentation you claim that the qini curve is defined as n_{t,1} − n_{c,1} · N_t / N_c, but the implemented function computes n_{t1,o1}/N_t − n_{t0,o1}/N_c.

It looks like there is a discrepancy between your documentation and your code.
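To make the comparison concrete, a small sketch of the two expressions (function and variable names are mine, not pylift's; n_t1 and n_c1 are the cumulative counts of treated and control responders up to a targeting fraction):

def qini_docs(n_t1, n_c1, N_t, N_c):
    # documented: n_{t,1} - n_{c,1} * N_t / N_c
    return n_t1 - n_c1 * N_t / N_c

def qini_code(n_t1, n_c1, N_t, N_c):
    # implemented: n_{t,1}/N_t - n_{c,1}/N_c
    return n_t1 / N_t - n_c1 / N_c

# The two differ only by an overall factor of N_t:
# qini_code(...) == qini_docs(...) / N_t, so the curves have the
# same shape but different normalization.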

Thanks for your help!

Question on Package - What is added

Thank you for publishing this! I was not sure I entirely followed the documentation. Does the class essentially:

  1. Create the transformed outcome
  2. Implement some standard and variant evaluation metrics (including weighting them when treatment and control were sampled at uneven rates for some instances)
  3. Provide convenience functions for plotting, optimization search, etc.?

Otherwise, what I was curious about: once the outcome is transformed, is the model fit exactly the same as the base regressor (meaning no change to the fit or objective function)?

Also, I was curious: is the same method applicable when the target is not binary but is perhaps something like revenue?

qini and adjusted qini

In terms of the treatment and control group split, under what circumstances is the adjusted qini recommended for model evaluation, compared to the qini?

Balancing for small control group

If I have a control group that is ~2% of the overall data, compared to a ~98% treatment group (i.e., p = 0.98), should the training data be balanced so that it has a 50/50 split between control and treatment?
Otherwise, the negative inverse probability weighting multiplier for the control group (−1/(1−p) = −50) will be much larger in magnitude than the positive multiplier for the treatment group (1/p ≈ 1.02).
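For reference, a quick check of those multipliers (a plain sketch, not pylift-specific):

p = 0.98                  # fraction of the data that is treated
w_treatment = 1 / p       # ≈ 1.02
w_control = -1 / (1 - p)  # = -50.0
print(w_treatment, w_control)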

Instructions for pylift

Hello, I've been looking at uplift models recently. I have the following questions about the pylift model. Please reply.

  1. Feature screening: how should I screen features? How many features are suitable for modeling?

  2. Model evaluation: which metric is used to evaluate the model's effect, and what value of that metric indicates that the model is usable?

  3. Model export and loading: do I export the model with something like up.model.save_model('./aa.pkl')? And how do I load the model back?

Improve Documentation

Some good functionalities of the package are not sufficiently documented on https://pylift.readthedocs.io/, for example

  • Most optional arguments are documented only in docstrings
  • The ability to initialize a TransformedOutcome object with a tuple of custom training and testing dataframes (very neat flexibility) is not documented

NIV empty

Hello,

I've set up my model via TransformedOutcome and then wanted to check all the NIV features. However, when I use NIV (the dict or the plot), all the included features are empty. The rest of the steps work and I get plots further down the line, but the NIV being empty leads me to believe they are incorrect.

Setting up the model

up = TransformedOutcome(df, col_treatment='Response', col_outcome='TotalRevenueFoodItems', random_state=4, stratify=y)

Call NIV

up.NIV_dict

Output of NIV: see attached screenshot (Screen Shot 2019-10-28 at 3:06:23 PM, showing empty NIV output)

Thank you.

Continuous outcome documentation

I have a few questions related to working with a continuous target variable. Please let me know if this is not the right place for my questions, because they don't directly relate to the actual code.

Based on the Athey, S., & Imbens, G. W. (2015) paper, the transformed outcome Y* is the conditional average treatment effect (CATE), a.k.a. the uplift, for a given X_i. Therefore, if the original outcome column Y is continuous (e.g., net sales, earnings, etc.), then after transforming it to Y*, fitting a regression model (e.g., linear regression, boosted decision tree regression, etc.), and predicting for a given X_i, the model output can be interpreted as the uplift for that X_i in the original scale (e.g., uplift in net sales for a given X_i, uplift in earnings for a given X_i, etc.). Is this correct?
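As a self-contained sketch of the interpretation being asked about (synthetic data, separate from pylift; all names are illustrative):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, p = 5000, 0.5
w = rng.binomial(1, p, n)    # treatment indicator
x = rng.normal(size=(n, 3))  # features
# continuous outcome (e.g., net sales), non-negative, with a treatment effect
y = np.maximum(0, 1 + x[:, 0] + 2 * w * (x[:, 1] > 0) + rng.normal(size=n))

y_star = y * w / p - y * (1 - w) / (1 - p)  # transformed outcome
model = GradientBoostingRegressor().fit(x, y_star)

# Since E[Y*|X] equals the CATE, predictions are uplift estimates
# in the original units of y (here, net sales).
print(model.predict(x[:5]))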

Now, if we apply a data transformation to the original continuous outcome column (only for the samples corresponding to a response, because the continuous values for non-responses are equal to zero), fit a regression model, and then predict for a given X_i, does that mean we have to apply the inverse data transformation to the model output/prediction for it to represent the uplift in the original scale?

In general, can positive predictions be interpreted as positive uplift, in contrast to negative predictions indicating negative uplift? Or should the model prediction only be treated as a score/ranking index?

I would recommend improving the documentation regarding continuous outcomes and adding an example.

docstrings Issue: scoring_method and scoring_cutoff

Descriptions of scoring_method and scoring_cutoff exist in the docstrings of derivatives.py but not in base.py.
In addition, the default value for scoring_method is inconsistent between the two files. I will create a pull request once I have a chance.

See PR #32

how to evaluate the model

hello, I have two questions and hope that you can help me.

  1. When we have built an uplift model and every customer has a score, how do we compare it with a response model?
  2. AUC is a common evaluation metric: when AUC reaches 0.8, we consider a model to work well. What Qini value indicates that an uplift model works well?

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

pip_requirements
requirements.txt
  • matplotlib >=2.1.0
  • numpy >=1.13.3
  • scikit-learn >=0.19.1
  • scipy >=1.0.0
  • seaborn >=0.7.1
  • xgboost >=0.6a2

  • Check this box to trigger a request for Renovate to run again on this repository

Should training and validation data have same treatment:control ratio?

Hi, I have two questions.

Question 1: Consider the following scenario.
I have a dataset where the treatment is given to 94% of the total base, so the control is just 6%. Is it the right approach to feed this imbalanced data into the model, specifying p=0.94 in the TransformedOutcome method?
Or, if sampling is needed, how should I deal with the validation data? Should that also be sampled?

Question 2:
I am using the trained model on hold-out data and I have the uplift scores, but a huge chunk of customers get exactly the same uplift score. Any thoughts on why this occurs?

Please advise.

🧹 Add Renovate

Description

👋 This repository is not currently configured for Renovate. This issue proposes the steps necessary to add Renovate to this project!

💡 Not familiar with Renovate, or confused about what advantages it holds over GitHub's Dependabot? Learn more here!

Steps to Add

  1. Review the guide for Adding Renovate to Existing Projects.
  2. Add a comment to this issue as a signal to others that you intend to work on it. The OSPO will then assign the issue to you. If you ultimately decide not to pursue this, please remember to let us know via comment so that others may participate!
  3. If the renovate[bot] account has already auto-filed a Configure Renovate PR against this repository, feel free to reference the proposed changes in your own Pull Request. If you are contributing to this project as a Hacktoberfest participant, you must file your own PR in order to get credit for your contribution!
  4. You may find that the CI build for this project is failing for unrelated reasons. If you are not already a contributor to this project and don't feel comfortable attempting to fix the build, that's okay! There's plenty of other ways you can contribute to Wayfair's open source projects :) Feel free to consult the list of our other participating repositories here!
  5. In order to catch potential JSON syntax errors or other mis-configurations, please add Renovate linting to this project's existing GitHub Workflow CI pipeline, or create a new one (eg. .github/workflows/lint.yml). See here for an example.
  6. If this repository is currently configured to use GitHub's Dependabot, you must also deprecate support for Dependabot in order to avoid conflicts with Renovate. This is typically as simple as removing the .github/dependabot.yml file. See here for an example.

Checklist

  • I have read the Adding Renovate to Existing Projects guide.
  • I have assigned this issue to myself to avoid duplicating efforts with other potential contributors.
  • I have verified this repository does not already have Renovate configured (or proposed in an open PR by another contributor).
  • If the renovate[bot] account has already auto-filed a Configure Renovate PR in this repository, I confirm that I will create a separate PR under my own GitHub account, using the initial PR as inspiration.
  • I confirm that I have added Renovate linting to this project's existing CI pipeline, or have created a new linting workflow if one doesn't already exist.
  • If this repository is currently configured to use GitHub's Dependabot, my PR will also deprecate support for Dependabot in order to avoid conflicts with Renovate.
