Ambrosia is a Python library for A/B tests design, split and result measurement
License: Apache License 2.0
The `BootstrapStats` class is currently not suitable for scenarios where objects in the groups are paired (dependent).
For these tasks we must use consistent sampling: at each step we select a dependent pair of objects from the experimental groups, rather than independent objects individually.
A consistent sampling function needs to be implemented for `BootstrapStats`.
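A minimal sketch of what consistent (paired) sampling could look like, assuming equal-length paired groups; the helper name is hypothetical and not part of the ambrosia API:

```python
import numpy as np

def paired_bootstrap_diffs(group_a, group_b, n_resamples=1000, seed=0):
    """Bootstrap the mean difference for paired (dependent) groups.

    A single set of indices is drawn per resample and applied to both
    groups, so each step selects dependent pairs of objects rather than
    independent objects individually (hypothetical helper).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    assert a.shape == b.shape, "paired groups must have equal length"
    rng = np.random.default_rng(seed)
    n = len(a)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # same indices for both groups
        diffs[i] = b[idx].mean() - a[idx].mean()
    return diffs
```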
In the current implementation of the `Preprocessor`, it is possible to load the parameters of the `cuped` and `multicuped` methods from a path to a JSON file.
It would be good to develop user-friendly `Preprocessor` methods that store and load the entire instance, for example via a JSON file.
This task depends on the ability of `RobustPreprocessor` to deal with parameter storage (#14).
We could also consider giving `AggregatePreprocessor` the ability to save and load parameters.
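One possible shape for such store/load methods, sketched as a mixin; all names here are hypothetical and illustrative, not the ambrosia API:

```python
import json

class ParamsStorableMixin:
    """Hypothetical sketch of instance-level store/load for a preprocessor."""

    def get_params(self) -> dict:
        raise NotImplementedError

    def set_params(self, params: dict) -> None:
        raise NotImplementedError

    def store_params(self, path: str) -> None:
        # Serialize all fitted parameters of the instance to a JSON file
        with open(path, "w") as f:
            json.dump(self.get_params(), f)

    def load_params(self, path: str) -> None:
        # Restore a previously stored instance state from a JSON file
        with open(path) as f:
            self.set_params(json.load(f))
```

Each concrete preprocessor would only need to define which of its attributes count as parameters.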
In order to deal with ratio metrics in the correct way, it is useful to implement classes in the preprocessing module that perform linearization of these metrics.
They should look like standard `ambrosia` classes, supporting the Taylor linearization technique and the approach from the Yandex article.
Part of the larger preprocessing enhancement issue.
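For reference, a minimal numpy sketch of first-order (Taylor) linearization of a ratio metric; the function name is mine, and ambrosia's class-based interface would wrap logic of this kind:

```python
import numpy as np

def linearize_ratio(numerator, denominator):
    """First-order (Taylor) linearization of a ratio metric.

    Turns per-object (numerator, denominator) pairs into a single
    per-object metric l_i = x_i - R * y_i, where R = sum(x) / sum(y)
    is the baseline ratio (e.g. computed on the control group). The mean
    difference of l then approximates the change in the ratio metric.
    """
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    ratio = num.sum() / den.sum()  # baseline ratio
    return num - ratio * den
```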
To speed up A/B tests on big data, it is necessary to implement support for `PySpark` dataframes in the `Cuped` class.
We also need to think about how to structurally decompose the code between functions for `PySpark` data and functions for `pandas` data.
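As a reference point for the Spark port, this is the core CUPED computation in numpy form; a PySpark version would reproduce the same aggregations (covariance, variance, mean) with dataframe operations. The function name is illustrative, not the `Cuped` API:

```python
import numpy as np

def cuped_transform(metric, covariate):
    """CUPED variance reduction: y' = y - theta * (x - mean(x)),
    with theta = cov(x, y) / var(x).

    The transformed metric keeps the same mean as the original but has
    lower variance when the covariate is correlated with the metric.
    """
    y = np.asarray(metric, dtype=float)
    x = np.asarray(covariate, dtype=float)
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())
```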
The `Designer` class uses its own methods for generating subsamples in the empirical approach.
Let's think about switching to the `Splitter` class for subsample generation inside the `Designer`.
Currently, I see the following advantages of this choice:
- `Splitter` is designed specifically to generate subsamples (actually sets of subgroups), unlike the ad hoc code in `tools.py` and other modules.
- Users could pass a custom `Splitter` instance inside the methods, which would help keep the empirical design more flexible and correct for custom splitting.
The cons should be considered as well; for example, there may be problems with generating a large number of group pairs under the current structure of the `Splitter`.
Metric split is not supported for Spark tables.
A simple version with one covariate column (`fit_columns`) can be implemented via sorting.
In order to extend the `ambrosia` functionality for working with Spark data to an acceptable level, it is necessary to implement a set of PySpark statistical criteria classes at `ambrosia.spark_tools.stat_criteria`.
In the `Tester` class, when one uses multiple experimental groups or several metrics, only the Bonferroni correction is supported.
It would be useful to implement more complex and popular classic corrections for the multiple comparison problem (Holm, Benjamini–Hochberg, etc.).
It should be noted that the current structure of `Tester` may not be convenient for adding these corrections, so the main class code will need to change.
These corrections, as well as the problem of correct confidence interval calculation, should be discussed before implementation.
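For illustration, the Holm step-down procedure mentioned above fits in a few lines; this is a sketch of the classic algorithm, not a proposal for the final ambrosia interface:

```python
import numpy as np

def holm_correction(p_values, alpha=0.05):
    """Holm step-down correction: returns a boolean rejection mask
    aligned with the input p-values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value with alpha / (m - k)
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject
```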
The current `RobustPreprocessor` class dynamically calculates quantile values for a given set of columns and removes outliers from them during execution of the `run` method.
In some problems, we need to remove outliers based on pre-selected quantile values. For example, if we have a treatment applied to group B and a control group A, it is necessary to clean up outliers using pre-experimental data in order to perform the experiment correctly.
To do this, the structure of the `RobustPreprocessor` class needs to be reconsidered.
One way to solve this problem is to implement `fit` and `transform` methods. Since the class would then have storable parameters, such as per-column quantile values, `store`-like and `load`-like methods become essential as well.
It would also be good to keep some ability to remove outliers without any fitting.
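A minimal sketch of the fit/transform split described above; the class name and interface are hypothetical, not the proposed ambrosia design:

```python
import pandas as pd

class QuantileOutlierRemover:
    """Quantile bounds are learned on one dataset (e.g. pre-experiment
    data) via fit() and applied to another via transform()."""

    def __init__(self, columns, alpha=0.05):
        self.columns = columns
        self.alpha = alpha
        self.bounds_ = {}

    def fit(self, df: pd.DataFrame):
        # Store quantile values instead of computing them inside run()
        for col in self.columns:
            self.bounds_[col] = (
                df[col].quantile(self.alpha),
                df[col].quantile(1 - self.alpha),
            )
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Keep only rows inside the pre-computed bounds for every column
        mask = pd.Series(True, index=df.index)
        for col, (lo, hi) in self.bounds_.items():
            mask &= df[col].between(lo, hi)
        return df[mask]
```

The stored `bounds_` dict is exactly the kind of state the `store`/`load` methods would serialize.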
The fractional split feature of `Splitter` returns an undesired result when one tries to split a `pandas` dataframe with duplicated indices without passing any argument for `id_column`.
The following examples illustrate the bug.
Let's create a dataframe with duplicated indices:
import numpy as np
import pandas as pd

# Create separate dfs
df_1 = pd.DataFrame(np.random.normal(size=(5_000,)), columns=["metric_val"])
df_1['frame'] = 1
df_2 = pd.DataFrame(np.random.normal(size=(5_000,)), columns=["metric_val"])
df_2['frame'] = 2

# Concat and shuffle
dataframe = pd.concat([df_1, df_2]).sample(frac=1)
Now perform a fractional split on it:
from ambrosia.splitter import Splitter
# Create `Splitter` instance and make split based on dataframe index (no `id_column` provided)
splitter = Splitter()
factor = 0.5
result_1 = splitter.run(dataframe=dataframe,
                        method='hash',
                        part_of_table=factor,
                        salt='bug')
result_1.group.value_counts()
# Output:
# A 15000
# B 10000
# Name: group, dtype: int64
So, some of the objects are duplicated after the split and now appear in the groups several times.
We can see that in total the groups are bigger than the original dataframe.
This behaviour does not occur if we split the dataframe on a column with duplicated ids.
# Create column from dataframe indices and split on it
dataframe = dataframe.reset_index().rename(columns={'index': 'id_column'})
result_2 = splitter.run(dataframe=dataframe,
                        id_column='id_column',
                        method='hash',
                        part_of_table=factor,
                        salt='bug')
result_2.group.value_counts()
# Output:
# A 5000
# B 5000
# Name: group, dtype: int64
But if we look deeper, there is another unusual behaviour:
# Let's count object frequencies by origin dataframe in group A
result_2[result_2.group == 'A'].frame.value_counts()
# Output:
# 1    2500
# 2    2500
# Name: frame, dtype: int64
Objects from the two original dataframes appear in the group in exactly equal numbers, which in general is not desired.
This should be inspected further.
The bug was not checked on the `Spark` implementation of the same methods, but care should be taken with them as well.
Finally, I want to add that duplicated indices in the id column are undesirable in the vast majority of splitting tasks.
It would be nice to add a duplicated-id check to `Splitter` and warn the user via the logger.
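Such a check could be as simple as the following sketch (the helper name and logger usage are mine, not existing `Splitter` code):

```python
import logging

import pandas as pd

logger = logging.getLogger("ambrosia")

def warn_on_duplicated_ids(dataframe: pd.DataFrame, id_column=None):
    """Warn via the logger when the split keys (index or id column)
    are not unique; returns the number of duplicated keys found."""
    ids = dataframe[id_column] if id_column is not None else dataframe.index
    n_dups = int(ids.duplicated().sum())
    if n_dups:
        logger.warning(
            "Found %d duplicated ids; split results may contain repeated objects.",
            n_dups,
        )
    return n_dups
```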
The current set of usage examples is not self-sufficient and detailed enough for some users.
Therefore, it should be expanded and reorganized.
I am thinking about the following schema:
For the tasks of preprocessing `pandas` data and speeding up experiments, we have the `Preprocessor` class and a number of base classes with single functionality in the preprocessing module.
These methods should be implemented for `spark` dataframes, in the same paradigm as we have for the `Designer` and the `Splitter`.
At this moment, the implementation of the following methods is essential:
The current `Splitter` functionality allows one to generate n groups of equal size m from a dataframe using different split methods.
But several types of experiments require splits with unequal group sizes. It would be nice to implement this feature in the `Splitter` tools.
As one option, group sizes could be controlled using a modified `groups_size` parameter (of the `run` method) of length n, with a different size for each group, for example: `[1000, 100, 100]`. If this parameter is a single number, the split will be made into groups of the same size.
In the future, this feature could also be used by the `Designer` for empirical design of unequal-sized groups, if we decide to integrate the `Splitter` into `Designer` methods (#16).
The utility of implementing the `delta_type` parameter in the `Designer` methods needs to be discussed.
This parameter is dedicated to handling relative and absolute effect types.
This issue was created to track the development of PySpark support for the methods of the `Tester` class.
The current functionality of the `Tester` does not support any operations on `spark` data. However, this is very important for big data scenarios, and given that we already have PySpark support for the `Designer` and the `Splitter`, such a `Tester` enhancement seems vital for us.
In my opinion, we should focus on these two points:
- the Spark statistical tools (shared with the `Designer` and others) #19
- the `Tester` methods themselves

When we use the `Designer` class to design a parameter of interest, we operate on the effect input values in the following relative form: `[1.01, 1.02, ...]`.
This is pretty handy notation for a variable, but in our calculations and statistical criteria we do not actually use these relative effects; we use the absolute form of the effect.
For the theoretical approach this may be fine, but for empirical approaches we can make adjustments and start to distinguish between relative and absolute effects.
For empirical methods, we can implement the same functionality in the `Designer` class as in the `Tester`: handling "absolute" and "relative" effects.
One way to do this is to simply instantiate a `Tester` inside the empirical methods (mainly the `stat_criterion_power` method) and pass all necessary arguments to it. The `Tester` class already implements all the statistical tests in the package.
The relative-effect notation mentioned earlier could remain the same, but it would become an additional `effect_type` argument, passed to the `Designer` and further to the `Tester`, with two possible values: `"absolute"` (default) and `"relative"`.
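The conversion behind `effect_type="relative"` is a one-liner; this is a sketch of the arithmetic, with a function name of my choosing:

```python
def absolute_effect(relative_effect, baseline_mean):
    """Convert the relative effect notation used in Designer inputs
    (e.g. 1.05 for +5%) into an absolute effect on the metric mean."""
    return (relative_effect - 1.0) * baseline_mean
```

For example, a relative effect of 1.05 on a metric with mean 200 corresponds to an absolute effect of 10.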
In the requirements we have `pandas` version `>=0.24.0`; however, some code in the `Designer` class (specifically a `pivot_table` call) crashes when the `pandas` version is less than `1.3.0`.
This needs to be fixed, and it can be done in two ways:
- raise the minimum `pandas` version in the requirements and check that everything is okay
- rewrite the logic without `pivot_table()` and leave the dependency version as the older one

Short error snippet:
get_empirical_table_sample_size
report = report.pivot_table(
TypeError: pivot_table() got an unexpected keyword argument 'sort'
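A lighter-weight variant of the second option could be a small compatibility wrapper that drops the `sort` keyword (added to `pivot_table` in pandas 1.3.0) on older versions. This is a hypothetical sketch, not ambrosia code, and the naive version parsing assumes plain release strings:

```python
import pandas as pd

def pivot_table_compat(df, **kwargs):
    """Call DataFrame.pivot_table, dropping the `sort` keyword on
    pandas < 1.3.0 where it does not exist."""
    if tuple(int(x) for x in pd.__version__.split(".")[:2]) < (1, 3):
        kwargs.pop("sort", None)
    return df.pivot_table(**kwargs)
```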
Are there plans to add support for ratio metrics (linearization or the delta method)?
Or could you give examples of how to work with them within Ambrosia?
Currently Ambrosia supports only simulation-based power calculations for experiments with binary outcomes (see `design_binary_size`, ultimately referencing `__helper_calc_empirical_power`).
One could instead rely on approximations to arrive at an analytical expression for power. First, consider the variance-stabilising (arcsine) transformation of the proportions in the control and treatment groups, h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)), and search for either the sample size or the effect. Second, use the normal approximation of the binomial distribution and perform the same search.
Let us analytically solve a problem from your 4_usage_example_binary_design.ipynb: find the required sample size. In R parlance the solution is:
effect <- 1.05
p1 <- 0.05
p2 <- 0.05*effect
sig.level <- 0.05
power <- 0.8
tol <- .Machine$double.eps^0.25
# Variance-stabilising transformation
h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
p.asin <- quote({pnorm(qnorm(sig.level/2, lower = F) - h * sqrt(n/2), lower = F) + pnorm(qnorm(sig.level/2, lower = T) - h * sqrt(n/2), lower = T)})
# Normal approximation of the binomial distribution
p.normal <- quote(pnorm((sqrt(n) * abs(p1 - p2) - (qnorm(sig.level/2, lower.tail = F) * sqrt((p1 + p2) * (1 - (p1 + p2)/2))))/sqrt(p1 * (1 - p1) + p2 * (1 - p2))))
# Solve for n
n.asin <- stats::uniroot(function(n) eval(p.asin) - power, c(2 + 1e-10, 1e+09))$root
n.normal <- stats::uniroot(function(n) eval(p.normal) - power, c(2 + 1e-10, 1e+09))$root
# What is n to achieve the MDE of interest under two approximations?
n.asin # 122106.8
n.normal # 122123.5
This is a self-contained solution that could be easily translated into Python. It is taken from the existing routines:
# Variance stabilising transformation-based
pwr::pwr.2p.test(h = ES.h(0.05, 0.05*effect), power = 0.8, sig.level = 0.05)
# Difference of proportion power calculation for binomial distribution (arcsine transformation)
#
# h = 0.01133831
# n = 122106.8
# sig.level = 0.05
# power = 0.8
# alternative = two.sided
#
#NOTE: same sample sizes
# Normal approximation-based
stats::power.prop.test(n = NULL, p1 = 0.05, p2 = 0.05*effect, power = 0.8, sig.level = 0.05)
# Two-sample comparison of proportions power calculation
#
# n = 122123.5
# p1 = 0.05
# p2 = 0.0525
# sig.level = 0.05
# power = 0.8
# alternative = two.sided
#
#NOTE: n is number in *each* group
I think offering analytical methods in binary designs using the above approximations could be a valuable alternative to your simulation-based power calculations since the former are commonplace in statistics.
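As a sketch of how the arcsine-based calculation above might translate to Python (assuming scipy is available; the function name is mine):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def asin_power(n, h, sig_level=0.05):
    """Two-sided power under the arcsine (variance-stabilising)
    approximation, mirroring the p.asin expression in the R snippet."""
    z = norm.isf(sig_level / 2)
    return norm.sf(z - h * np.sqrt(n / 2)) + norm.cdf(-z - h * np.sqrt(n / 2))

p1, effect = 0.05, 1.05
p2 = p1 * effect
# Variance-stabilising transformation of the two proportions
h = 2 * np.arcsin(np.sqrt(p1)) - 2 * np.arcsin(np.sqrt(p2))
# Solve asin_power(n) == 0.8 for the per-group sample size n
n = brentq(lambda n: asin_power(n, abs(h)) - 0.8, 2 + 1e-10, 1e9)
# n is approximately 122106.8, matching pwr::pwr.2p.test
```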
The current fractional split feature of the `Splitter` class only supports splitting into two groups.
In some tasks, it is necessary to make a multigroup partition of a given table. It would be nice to extend our functionality with such a feature.
I think it will be convenient for users to control the division of fractions between groups using an analogue of the `part_of_table` parameter, but in the form of a list/iterable: `[0.5, 0.1, 0.1, 0.1, 0.1, 0.1]`.
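The core of such a multigroup fractional split could look like this numpy sketch (random assignment by cumulative fractions; the helper name is hypothetical, and the real `Splitter` would likely use hashing instead):

```python
import numpy as np

def fractional_multisplit(n_objects, fractions, seed=0):
    """Assign each of n_objects to one of len(fractions) groups
    according to the given fractions; fractions may sum to less than 1,
    in which case leftover objects get the label -1 (unassigned)."""
    fractions = np.asarray(fractions, dtype=float)
    assert fractions.sum() <= 1.0 + 1e-9, "fractions must sum to at most 1"
    rng = np.random.default_rng(seed)
    u = rng.random(n_objects)
    edges = np.concatenate([[0.0], np.cumsum(fractions)])
    labels = np.searchsorted(edges, u, side="right") - 1
    labels[u >= edges[-1]] = -1  # beyond the last cumulative edge
    return labels
```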