deacondesperado / flask_mab


An implementation of the multi-armed bandit optimization pattern as a Flask extension

License: Other

Python 90.12% CSS 3.00% HTML 6.88%

flask flask-extension python

flask_mab's Introduction

Flask-MAB

Flask-MAB is an implementation of the multi-armed bandit test pattern as Flask middleware.

It can be used to test the effectiveness of virtually any part of your app using user signals.

If you can pass it, we can test it!

Note for users of pre-release version: The API has [changed](#2) significantly with 1.0 to better fit with the [application factory pattern](http://flask.pocoo.org/docs/patterns/appfactories/).

Complete Documentation.

Multi-armed what?!

A multi-armed bandit is essentially an online alternative to classical A/B testing. Whereas A/B testing is generally split into extended phases of execution and analysis, Bandit algorithms continually adjust to user feedback and optimize between experimental states. Bandits typically require very little curation and can in fact be left running indefinitely if need be.

The curious-sounding name is drawn from the "one-armed bandit", a colloquialism for casino slot machines. Bandit algorithms can be thought of along similar lines as an eager slot player: if one were to play many slot machines continuously over many thousands of attempts, one would eventually be able to determine which machines were hotter than others. A multi-armed bandit is merely an algorithm that performs exactly this determination, using your users' interactions as its "arm pulls". Extracting winning patterns becomes a fluid part of interacting with the application.

While bandit algorithms can provide excellent automated optimization, it's important to note that they are not considered a replacement for classic A/B tests. Bandits could be considered a sort of "black box," in the sense that their intuitions become opaque as they optimize. Experiments that call for rigorous tests of statistical significance may be better suited to more traditional frameworks.

John Myles White has an awesome treatise on Bandit implementations in his book Bandit Algorithms for Website Optimization. Most of the code in this library consists of his excellent guidelines reimplemented to suit the nature of the Flask request lifecycle.

Getting Started

To get started defining experiments, there are several steps:

Determine what parts of your app you'd like to optimize
Set up a storage engine (currently only JSON, though MongoDB and ZODB are on the roadmap)
Instantiate Bandits for all your experiments (you can have as many as you like, several experiments can run at once in a single app.)
Assign arms to your bandits that represent your experimental states
Attach the BanditMiddleware to your Flask app.

This guide will take you through each step. The example case we'll be working with is included in the source under the 'example' folder if you'd like to try running the finished product.

Determining what to test

The first task at hand requires a little planning. What are some of the things in your app you've always been curious about changing, but never had empirical data to back up potential modifications? Bandits are best suited to cases where changes can be "slipped in" without the user noticing, but since the state assigned to a user will be persisted to their client, you can also change things like UI.

For our example case, we'll be changing the label text and color of a button in our app to see if either change increases user interaction with the feature. We'll be representing these states as two separate experiments (so a user will get separate assignments for color and text), but you could conceivably make them one experiment by utilizing a tuple or sequence. More on that later!

Setting up your storage backend

HTTP itself is stateless, but bandits need to persist their increments between requests. In order to accomplish this, there is a bandit storage interface that can be implemented to save all the experiments for an application down to memory, database, etc.

At present, the only core implementation of this interface saves the bandits to a JSON file at the path you specify, but this should work for most purposes. For the 1.0 release, implementations using MongoDB and ZODB are planned.

Storage engines are attached using flask configuration directives.

Let's start setting up our bandit file storage:

app.config['MAB_STORAGE_ENGINE'] = 'JSONBanditStorage'
app.config['MAB_STORAGE_OPTS'] = ('./example/bandit_storage.json',)

This storage instance will be passed into our bandit middleware and all values that need to be persisted will be handled under the hood.

The storage opts are just arguments to be passed to the storage instance constructor (in this case, just the path to a flat file to store the information.)
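
To make the relationship between the two settings concrete, here is a minimal sketch of how an engine name and its options could be turned into a storage object. The `JSONFileStore` class, the `STORAGE_ENGINES` registry, and `make_storage` are hypothetical names used for illustration and are not Flask-MAB's own code; only the two config keys come from the example above.

```python
import json
import os
import tempfile


class JSONFileStore:
    """Hypothetical stand-in for a JSON-backed storage engine."""

    def __init__(self, path):
        self.path = path

    def save(self, state):
        # Write the whole experiment state as one JSON document.
        with open(self.path, "w") as handle:
            json.dump(state, handle)

    def load(self):
        # A missing file simply means nothing has been recorded yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as handle:
            return json.load(handle)


# Hypothetical registry mapping engine names to classes.
STORAGE_ENGINES = {"JSONBanditStorage": JSONFileStore}


def make_storage(config):
    """Look up the engine by name and unpack the opts tuple into its constructor."""
    engine_cls = STORAGE_ENGINES[config["MAB_STORAGE_ENGINE"]]
    return engine_cls(*config["MAB_STORAGE_OPTS"])


# A temporary directory keeps the sketch from touching real files.
config = {
    "MAB_STORAGE_ENGINE": "JSONBanditStorage",
    "MAB_STORAGE_OPTS": (os.path.join(tempfile.mkdtemp(), "bandit_storage.json"),),
}
storage = make_storage(config)
storage.save({"color_btn": {"green": {"pulls": 3, "reward": 1.0}}})
restored = storage.load()
```

Because the options are a tuple that is unpacked into the constructor, a different engine could accept a different number of arguments without changing the configuration keys.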

Creating bandits and assigning arms

The next step is to create a bandit for each experiment we want to test.

There are several different bandit implementations included, but for the purposes of this example we'll be using bandits.EpsilonGreedyBandit, an algorithm that assigns the present winner most of the time and picks an arm at random with a fixed probability, epsilon.

Expanding upon our previous example, here are our bandits alongside our storage engine:

# The flask.ext namespace was removed in Flask 1.0; import the package directly.
from flask_mab.storage import JSONBanditStorage
from flask_mab.bandits import EpsilonGreedyBandit

color_bandit = EpsilonGreedyBandit(0.2)
color_bandit.add_arm("green","#00FF00")
color_bandit.add_arm("red","#FF0000")
color_bandit.add_arm("blue","#0000FF")

txt_bandit = EpsilonGreedyBandit(0.5)
txt_bandit.add_arm("casual","Hey dude, wanna buy me?")
txt_bandit.add_arm("neutral","Add to cart")
txt_bandit.add_arm("formal","Good day sir... care to purchase?")

Here we have two bandits, one of which will randomize 20% of the time on the color of the button, the other 50% of the time on the text. The colors and text blurbs are considered our "arms" in bandit parlance. An epsilon-greedy bandit splits states between random selection and deterministically selecting the "winner", so as users click more, thereby sending reward signals, one combination of these two states will start to win out.
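
The split between random selection and picking the winner can be sketched in a few lines of plain Python. This is an illustration of the general epsilon-greedy idea, not Flask-MAB's implementation: the class name `EpsilonGreedySketch`, its attributes, and the simulated click rates are all assumptions made for the example.

```python
import random


class EpsilonGreedySketch:
    """Illustrative epsilon-greedy bandit; not the library's own class."""

    def __init__(self, epsilon, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.values = {}   # arm name -> value handed to the app
        self.pulls = {}    # arm name -> how often the arm was suggested
        self.rewards = {}  # arm name -> total reward received

    def add_arm(self, name, value):
        self.values[name] = value
        self.pulls[name] = 0
        self.rewards[name] = 0.0

    def estimate(self, name):
        # Average reward per pull; an arm never pulled scores 0.
        pulls = self.pulls[name]
        return self.rewards[name] / pulls if pulls else 0.0

    def suggest_arm(self):
        names = list(self.values)
        if self.rng.random() < self.epsilon:
            name = self.rng.choice(names)         # explore: random arm
        else:
            name = max(names, key=self.estimate)  # exploit: current winner
        self.pulls[name] += 1
        return name, self.values[name]

    def reward(self, name, amount=1.0):
        self.rewards[name] += amount


# Simulate users with made-up click rates per color.
rng = random.Random(42)
bandit = EpsilonGreedySketch(0.2, rng=rng)
click_rates = {"green": 0.1, "red": 0.6, "blue": 0.2}
for name, hex_code in [("green", "#00FF00"), ("red", "#FF0000"), ("blue", "#0000FF")]:
    bandit.add_arm(name, hex_code)

for _ in range(5000):
    name, _value = bandit.suggest_arm()
    if rng.random() < click_rates[name]:
        bandit.reward(name, 1.0)

best = max(click_rates, key=bandit.estimate)
```

With a higher epsilon the bandit keeps exploring more often, which slows convergence but makes it less likely to lock onto an early, lucky arm.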

This code could easily be refactored using a function or generator, but for now, we'll include the full boilerplate. If you have a lot of experiments, consider defining a function to be more convenient.
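
One possible shape for such a helper is sketched below. `build_bandit` is a hypothetical name, and `StandInBandit` exists only so the snippet runs without Flask-MAB installed; it mimics nothing beyond the `add_arm(name, value)` call used above.

```python
class StandInBandit:
    """Stand-in for a real bandit class; records arms in insertion order."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.arms = []

    def add_arm(self, name, value):
        self.arms.append((name, value))


def build_bandit(bandit_cls, epsilon, arms):
    """Create a bandit and register every (name, value) pair as an arm."""
    bandit = bandit_cls(epsilon)
    for name, value in arms:
        bandit.add_arm(name, value)
    return bandit


color_bandit = build_bandit(
    StandInBandit,
    0.2,
    [("green", "#00FF00"), ("red", "#FF0000"), ("blue", "#0000FF")],
)
txt_bandit = build_bandit(
    StandInBandit,
    0.5,
    [("casual", "Hey dude, wanna buy me?"), ("neutral", "Add to cart")],
)
```

In a real app you would pass EpsilonGreedyBandit in place of the stand-in class.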

Attaching the middleware

The main BanditMiddleware is where all the magic happens. Attaching it to our app, assigning it some bandits, and sending it pull and reward signals is all that's necessary to get the test going.

Expanding on our example, we'll define a simple flask app with some basic routes for rendering the interface. These routes will also understand how to reward the right arms and update the bandits so the state of the experiment starts adjusting in realtime.

Again, boilerplate here could be easily cut down, but here is a rough example:

from flask import Flask,render_template
from flask_mab import BanditMiddleware  # flask.ext was removed in Flask 1.0

app = Flask('test_app')
mab = BanditMiddleware()
mab.init_app(app)
app.add_bandit('color_btn',color_bandit) #our bandits from previous code block
app.add_bandit('txt_btn',txt_bandit)

@app.route("/")
def home():
    """Render the btn"""
    return render_template("ui.html")

@app.route("/btnclick")
def reward():  # renamed: two views called home() would collide in Flask
    """Button was clicked!"""
    return render_template("btnclick.html")

Now our app understands that it should be tracking two experiments and persisting their values to a file. "Arms" that get selected for every user will be persisted to cookies. However, we still need to make the system understand which endpoints use which experiments. In our example case, the "/" route is going to render the button, so both states will need to be assigned there. The "/btnclick" endpoint, by contrast, is where our reward is determined: the theoretical "payoff" that state won us. In this case, it's a boolean, assigning a 1 if the button gets clicked. So how are these two signals sent to the middleware? There are decorators, much like the route decorator, that register these actions.

Using the decorators

Setting up the MAB feedback cycle is easily negotiated by endpoint:

@app.route("/")
@mab.choose_arm("color_btn")
@mab.choose_arm("txt_btn")
def home(color_btn, txt_btn):
    """Render the btn using values from the bandit"""
    return render_template("ui.html",btn_color=color_btn,btn_text=txt_btn)

@app.route("/btnclick")
@mab.reward_endpt("color_btn",1.0)
@mab.reward_endpt("txt_btn",1.0)
def reward():
    """Button was clicked!"""
    return render_template("btnclick.html")

Using these decorators, our middleware knows that it should suggest some values for both our experiments at the root endpoint. When decorating with choose_arm, we identify the bandit/experiment we need a value assignment for. Just like parameters from your route, these values are passed into the view function in the order you decorated for them, always after your route params.
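
The general mechanism of a decorator handing a value to a view can be sketched without Flask at all. The `choose_arm_sketch` decorator and the `bandits` dictionary below are hypothetical, and the sketch passes each value by keyword under the experiment's name, which is an assumption made for simplicity rather than a description of how Flask-MAB does it.

```python
import functools


def choose_arm_sketch(bandits, name):
    """Hypothetical decorator: inject the suggested value as keyword `name`."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            kwargs[name] = bandits[name]()  # ask the "bandit" for a value
            return view(*args, **kwargs)
        return wrapper
    return decorator


# Each "bandit" here is just a callable returning a fixed value.
bandits = {
    "color_btn": lambda: "#00FF00",
    "txt_btn": lambda: "Add to cart",
}


@choose_arm_sketch(bandits, "color_btn")
@choose_arm_sketch(bandits, "txt_btn")
def home(color_btn, txt_btn):
    return "button {} / {}".format(color_btn, txt_btn)


@choose_arm_sketch(bandits, "color_btn")
def product(product_id, color_btn):
    # Route params arrive first, the bandit value after them.
    return "product {} in {}".format(product_id, color_btn)


rendered = home()
detail = product(7)
```

Using `functools.wraps` keeps the original function name on the wrapper, which matters in Flask because the view name doubles as the endpoint name.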

It should be stressed that things like colors are probably best stored in CSS, but for this example we'll pass the values right into jinja. You could consider setting up a dedicated endpoint for experiments with static styles like this, one that could parse and render your CSS. The rough idea here is to leave what the bandit actually affects up to you.

On the other side of the process, our "/btnclick" endpoint now knows that whatever "arms" were assigned to this user worked out well, because the user clicked the button. The BanditMiddleware.reward_endpt decorator knows to look in our user's cookie for the values that were assigned to her and give them some props. We're using booleans here, but you could pass any amount of reward in the event that some states in your experiment are better than others (you could, for example, weight your experiments differently).

That's it! This user's feedback will be persisted by the middleware and used to adjust the content for future users. Over time, this pattern will start converging to a winner. Your app will get optimized on these two experimental features for free!

flask_mab's Issues

Using Flask-MAB with application factory pattern

For a first release, the documentation for Flask-MAB is excellent. Well done! My only question is regarding how to use Flask-MAB with application factories. Specifically, it's not clear where one should put...

mab = BanditMiddleware(app,bandit_storage)

... and how one would access the above object from within the portion of the codebase that handles routes (i.e., where the bandits would also presumably be located).

I'm happy to submit a pull request to clarify this, but first I would need to understand how this would work. Would you be so kind as to shed some light on this?

Once again, excellent work on this project!

Flask-Debugtoolbar integration

It would be cool to have MAB plug into flask debug toolbar plugin.

ZODB storage

Although it's not widely used, ZODB seems like a pretty good drop-in storage solution for POPOs like the bandits, and since it persists to a file it's a pretty lightweight dependency.

Make storage engines true Flask configuration variable

Implement rotating file storage

In order to make steps towards #8, a persistency solution that takes snapshots of changing bandit state is needed.

This could be built off of python's core rotating log file handler and take the necessary config via standard flask config.

Admin Interface?

It would be cool to have a visualization layer secured via flask-admin (as an optional dependency) that could track the average experimentation over time.

Use universal URI paths for storage engines?

Would it be practical to configure storage engines with URIs in the style of pymongo, urllib2 etc?

Implement monte carlo tests

Ideally the test suite should contain better testing on the bandit logic.

Multiple Bandits occasionally emit false positives

A bug in execution in the function decoration occasionally results in false positives when chaining decorators:

@route("/foo")
@choose_arm("foo")
@choose_arm("bar")
def endpt(foo, bar):
  return render_template("index.html")

Implement Softmax bandit

reward_and_redirect decorator

Right now, preventing repeated rewards is left up to the user implementation.

This should be improved by codifying the practice into a reward_and_redirect decorator.

Question : is it python3 compatible ?

debug_toolbar and Jinja3 incompatibility

After upgrading to Flask 2 with its Jinja 3 requirement and debugtoolbar 0.10.1, I get an error from the debugtoolbar which says that flask_mab was not installed in a way that PackageLoader understands:

  File "/Users/doug/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask_debugtoolbar/toolbar.py", line 32, in create_panels
    for panel_class in self._iter_panels(current_app):
  File "/Users/doug/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask_debugtoolbar/toolbar.py", line 57, in _iter_panels
    panel_class = cls._import_panel(app, panel_path)
  File "/Users/doug/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask_debugtoolbar/toolbar.py", line 71, in _import_panel
    panel_class = import_string(path)
  File "/Users/doug/.pyenv/versions/3.9.6/lib/python3.9/site-packages/werkzeug/utils.py", line 865, in import_string
    __import__(import_name)
  File "/Users/doug/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask_mab/debug_panels.py", line 6, in <module>
    package_loader = PackageLoader('flask_mab', 'templates')
  File "/Users/doug/.pyenv/versions/3.9.6/lib/python3.9/site-packages/jinja2/loaders.py", line 309, in __init__
    raise ValueError(
ValueError: The 'flask_mab' package was not installed in a way that PackageLoader understands.

Looks like the panels are loaded differently in Jinja3.

Min code example:

from flask import Flask, render_template, abort
from flask_debugtoolbar import DebugToolbarExtension
from flask_mab import BanditMiddleware,add_bandit,choose_arm,reward_endpt
from flask_mab.storage import JSONBanditStorage
from flask_mab.bandits import EpsilonGreedyBandit
from jinja2.exceptions import TemplateNotFound

app = Flask(__name__)
app.config['SECRET_KEY'] = '<replace with a secret keyaaaaaa'
app.config['MAB_STORAGE_ENGINE'] = 'JSONBanditStorage'
app.config['MAB_STORAGE_OPTS'] = ('./bandit_storage.json',)
app.debug = True

mab = BanditMiddleware()
mab.init_app(app)

toolbar = DebugToolbarExtension(app)
app.config['DEBUG_TB_PANELS'] = app.config.get("DEBUG_TB_PANELS", []) + ("flask_mab.debug_panels.BanditDebugPanel",)

color_bandit = EpsilonGreedyBandit(0.2)
color_bandit.add_arm("green","#00FF00")
color_bandit.add_arm("red","#FF0000")
color_bandit.add_arm("blue","#0000FF")

txt_bandit = EpsilonGreedyBandit(0.5)
txt_bandit.add_arm("casual","Hey dude, wanna buy me?")
txt_bandit.add_arm("neutral","Add to cart")
txt_bandit.add_arm("formal","Good day sir... care to purchase?")

app.add_bandit('color_button', color_bandit)
app.add_bandit('txt_button', txt_bandit)

@app.route("/")
@choose_arm("color_button")
@choose_arm("txt_button")
def hello_world(color_button, txt_button):
    return render_template(
        'hello.html',
        colour=color_button,
        text=txt_button
    )

@app.route("/cases/<case_id>")
@reward_endpt("color_button", 1.0)
@reward_endpt("txt_button", 1.0)
def case_with_id(case_id):
    try:
        return render_template('cases/' + case_id + '.html')
    except TemplateNotFound:
        abort(404)

@app.errorhandler(404)
def error404(error):
    return render_template('404.html')

Requirements: (Flask with the debugtoolbar and MAB only)
blinker==1.4
click==8.0.1
Flask==2.0.1
Flask-DebugToolbar==0.10.1
Flask-MAB==2.0.1
future==0.17.1
itsdangerous==2.0.1
Jinja2==3.0.1
MarkupSafe==2.0.1
Werkzeug==2.0.1

Cheers

PyPI package seems to be missing

There is an existing PyPI listing, but there doesn't seem to be any package attached to it, and thus pip install Flask-MAB produces:

Could not find any downloads that satisfy the requirement Flask-MAB

Any chance of getting a package uploaded to PyPI?

Dual registration methods for reward and pull, via decorators as well as inline functions
Wide implementation for a bandits value: at present, python objects like functions can be returned by a pull, which is a pretty cool feature 🎳

Implement Annealing Softmax Bandit

Scale reward for unique users

Right now, pulls are saved on a per-cookie basis.

We should also scale clicks/reward by unique users for a more accurate underlying reflection.

This could be accomplished by making storage of the bandit outcomes an FSM, where only the first STORED state (as represented in the debug headers) is utilized to trigger a reward.

A flag could be added to the json cookie to filter potential rewards.
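
A rough sketch of the flag idea follows, under loudly stated assumptions: the cookie layout (`{"experiment": {"arm": ..., "rewarded": bool}}`) and the `claim_reward` function are hypothetical and do not reflect the cookie format Flask-MAB actually writes.

```python
import json


def claim_reward(cookie_json, experiment):
    """Return (should_reward, updated_cookie_json).

    Only the first call for an assigned experiment yields True; the flag
    stored in the cookie blocks every later attempt.
    """
    state = json.loads(cookie_json)
    entry = state.get(experiment)
    if entry is None or entry.get("rewarded"):
        # Nothing assigned, or this user was already counted once.
        return False, cookie_json
    entry["rewarded"] = True
    return True, json.dumps(state)


cookie = json.dumps({"color_btn": {"arm": "green", "rewarded": False}})
first, cookie = claim_reward(cookie, "color_btn")
second, cookie = claim_reward(cookie, "color_btn")
unknown, cookie = claim_reward(cookie, "txt_btn")
```

Since the flag lives in the client's cookie, a user who clears cookies would be counted again, so this filters repeat clicks rather than guaranteeing uniqueness.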
