Wednesday, April 30, 2014

Multi-Armed Bandits and Testing Online Marketing Campaigns

What is the connection between testing marketing campaigns and Las Vegas?

The Multi-Armed Bandit problem is a statistical problem that seeks the best strategy for a gambler playing multiple slot machines.  Each slot machine has an unknown probability of paying out, and the goal is to maximize the winnings over time.  Slot machines, by the way, are sometimes called "one-armed bandits", because -- on average -- you leave them with less money than you arrived with.

What is appealing about the Multi-Armed Bandit problem is that it is solvable.  The basic "learn-and-proceed" algorithm works quite well.  That is, for a certain period of time, play all the machines.  Then continue to play the winning'est machine.  Variations on this approach are optimal under a wide range of assumptions.
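Here is a minimal sketch of the "learn-and-proceed" idea, under some loud assumptions: the slot machines are simulated with fixed payout probabilities, every payout is worth 1, and the function name and parameters are made up for illustration.  It is one simple way to write the idea down, not a definitive implementation.

```python
import random

def explore_then_commit(payout_probs, explore_pulls_per_machine, total_pulls, seed=0):
    """Play every machine a fixed number of times, then stick with the best one.

    payout_probs holds the hidden probabilities of the simulated machines
    (an assumption for this sketch; a real gambler never sees them).
    """
    n = len(payout_probs)
    if explore_pulls_per_machine < 1 or total_pulls < n * explore_pulls_per_machine:
        raise ValueError("not enough pulls to try every machine")

    rng = random.Random(seed)
    wins = [0] * n
    total_winnings = 0

    def pull(machine):
        # Simulated machine: pays 1 with its hidden probability, otherwise 0.
        return 1 if rng.random() < payout_probs[machine] else 0

    # Learning phase: play all the machines equally.
    for machine in range(n):
        for _ in range(explore_pulls_per_machine):
            reward = pull(machine)
            wins[machine] += reward
            total_winnings += reward

    # Proceed phase: keep playing the machine with the most observed wins
    # (every machine was pulled equally often, so most wins = best win rate).
    best = max(range(n), key=lambda m: wins[m])
    for _ in range(total_pulls - n * explore_pulls_per_machine):
        total_winnings += pull(best)

    return best, total_winnings
```

Calling explore_then_commit([0.1, 0.9], 200, 1000) tries each of two machines 200 times and spends the remaining 600 pulls on whichever looked better.  The length of the learning phase is the main design choice: too short and you may commit to the wrong machine, too long and you waste pulls on losers.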

This is very appealing.  Imagine that you have a web site that makes money -- it actually doesn't matter whether that is through selling ads, selling merchandise, or getting subscribers.  You want to optimize the web site, and you can -- literally -- think of millions of different combinations of things.  You can change the layout of the home page, you can change the language, the colors, the images, and so on.  What works best?

The Multi-Armed Bandit solution would seem to provide a path to site optimization.  Start with a few variations.  Treat each variation as a separate "slot machine".  Each time the page is about to be loaded, peer into the historical data on how well each version has done.  Then choose the version to show accordingly.  Keep track of how well the page performs.  One of the optimization algorithms for the Multi-Armed Bandit problem can then not only determine which page to load but also guarantee optimal outcomes over time.  Truly impressive.
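As a rough sketch of that page-load loop, here is an epsilon-greedy chooser, which is one common bandit heuristic and not necessarily one of the algorithms with optimality guarantees.  The class name, the version labels, and the 10% exploration rate are all assumptions made for illustration.

```python
import random

class EpsilonGreedyPages:
    """Pick a page version per visit from its historical conversion rate."""

    def __init__(self, versions, epsilon=0.1, seed=None):
        self.versions = list(versions)
        self.epsilon = epsilon  # share of visits spent on a random version
        self.shown = {v: 0 for v in self.versions}
        self.converted = {v: 0 for v in self.versions}
        self.rng = random.Random(seed)

    def rate(self, version):
        # Observed conversion rate so far; 0.0 until the version has been shown.
        shown = self.shown[version]
        return self.converted[version] / shown if shown else 0.0

    def choose(self):
        # Show every version at least once before trusting the history.
        for version in self.versions:
            if self.shown[version] == 0:
                return version
        # Occasionally explore a random version so estimates keep improving.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.versions)
        # Otherwise exploit the version with the best observed rate.
        return max(self.versions, key=self.rate)

    def record(self, version, did_convert):
        # Keep track of how well the page did on this visit.
        self.shown[version] += 1
        if did_convert:
            self.converted[version] += 1
```

On each page load you would call choose(), serve that version, and later call record() with whether the visit converted.  Over time most traffic flows to the best-performing version, while the small random share keeps checking the others.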

The problem is when this hits the real world.  In the real world, the slot machines are not all equal.  And, the probabilities may be more complicated.  For instance, the message "Visit us while you sip your morning cup of coffee" might work better in the morning than in the afternoon (I'm making this up).  Or the message, "Escape from the winter blues" might not work as well in Florida in January as in Minnesota.

So, this is the first problem.  In the web world, we do know something about even anonymous visitors.  We have a good idea of the local time of day where they are.  We have a good idea of their "gross" geography.  We have a good idea of the device and web browser they are using.  What works in one place at one time on one device may not work in another.

This makes the testing problem much harder.  If we know a handful of categories in advance, we can run a separate multi-armed bandit test for each.  For instance:

  • US-based, web-based, weekday working hours.
  • US-based, mobile, weekday working hours.
And so on, through the 12 combinations suggested by these groups.
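One way to picture "a separate bandit per group" is a table of statistics keyed by segment.  The sketch below is only an illustration: the segment fields, the version names, and the use of Thompson sampling (a standard bandit method that I am swapping in here, drawing from a Beta distribution with Python's random.betavariate) are all assumptions.

```python
import random
from collections import defaultdict

VERSIONS = ["headline_a", "headline_b"]  # hypothetical page versions

# stats[segment][version] = [conversions, non_conversions]
# Each segment gets its own independent counts, i.e. its own bandit.
stats = defaultdict(lambda: {v: [0, 0] for v in VERSIONS})
rng = random.Random(42)

def segment_of(country, device, is_weekday_working_hours):
    # Hypothetical segment definition built from what we know about a visitor.
    region = "US" if country == "US" else "non-US"
    period = "weekday-working-hours" if is_weekday_working_hours else "other-hours"
    return (region, device, period)

def choose_version(segment):
    # Thompson sampling: draw a plausible conversion rate for each version
    # from a Beta distribution fitted to this segment's counts, then show
    # the version with the highest draw.
    arms = stats[segment]
    return max(
        VERSIONS,
        key=lambda v: rng.betavariate(arms[v][0] + 1, arms[v][1] + 1),
    )

def record_outcome(segment, version, converted):
    # Update only the counts belonging to this visitor's segment.
    stats[segment][version][0 if converted else 1] += 1
```

The catch is visible in the data structure itself: every extra segment splits the traffic further, so each bandit learns more slowly than a single site-wide one would.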

When the groups are not predetermined, it is a harder problem to solve.  This is often approached using off-line analysis, if there is adequate test data.

Another issue is that some messages may be targeted at a specific population.  For instance, some IP addresses are registered to .EDU domains, indicating educational institutions.  How do you handle market testing for these groups?

If there is only one group, then you can set up separate Multi-Armed Bandit-type tests for EDU and for non-EDU.  But this might get more complicated.  For instance, some messages might be targeted for web versus mobile.  And now you have overlap -- because EDU users are in both groups.

Yet another problem is that determining the outcome may take time.  If the outcome is as simple as clicking on an ad, then that is known within a few minutes.  If the outcome is purchasing a particular product, well, not everyone puts something in a cart and immediately checks out.  You might want a day lag.  If the outcome is a longer term measurement (did site engagement go up? time to next visit?), then even longer periods are needed.  This can result in a very slow feedback loop for the Multi-Armed Bandit.  After all, a slot machine returns its winnings (or losings) very quickly.
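To make the lag concrete, here is a small sketch of one way to hold back outcomes until a waiting window has passed.  The class, its method names, and the one-day window in the usage note are assumptions for illustration.  The point is simply that an impression should not count as a success or a failure until the visitor has had time to act.

```python
from dataclasses import dataclass, field

@dataclass
class DelayedOutcomeLog:
    window_seconds: float  # how long to wait before judging an impression
    pending: dict = field(default_factory=dict)    # id -> [version, shown_at, converted]
    shown: dict = field(default_factory=dict)      # version -> settled impressions
    converted: dict = field(default_factory=dict)  # version -> settled conversions

    def record_impression(self, impression_id, version, shown_at):
        self.pending[impression_id] = [version, shown_at, False]

    def record_conversion(self, impression_id):
        # Mark the impression as converted; it is still counted only
        # once its waiting window has closed.
        entry = self.pending.get(impression_id)
        if entry is not None:
            entry[2] = True

    def settle_expired(self, now):
        # Move impressions whose window has closed into the statistics.
        # Counting successes and failures at the same moment avoids
        # overstating the rate for recently shown versions.
        expired = [i for i, e in self.pending.items()
                   if now - e[1] >= self.window_seconds]
        for i in expired:
            version, _, did_convert = self.pending.pop(i)
            self.shown[version] = self.shown.get(version, 0) + 1
            if did_convert:
                self.converted[version] = self.converted.get(version, 0) + 1

    def rate(self, version):
        # Conversion rate over settled impressions only; None if none yet.
        shown = self.shown.get(version, 0)
        return self.converted.get(version, 0) / shown if shown else None
```

With window_seconds=86400, a bandit that reads rate() is always working from data that is at least a day old, which is exactly the slow feedback loop described above.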

What is the real solution to such adaptive marketing online?  The real solution has to be comprehensive.  It has to understand "natural" divisions online (such as organization type) and the fact that a given message may have differential effects in different areas.  It has to take into account that some messages are targeted to specific visitors or customers (if they can be known), so pure randomization may not be possible.  It has to understand the time dimension of the business, hopefully incorporating faster measures in the short-term, even if these are imperfect, along with validation of the measures in the longer term.

The Multi-Armed Bandit is a powerful idea.  Unfortunately, the solution to this problem is only one part of an online marketing solution.


  1. Multi-armed bandits are rather complicated (perhaps overly so) but I do know a little about them. I am not convinced that they would work at all for online marketing. They are based around the idea that choosing an option (a marketing option in this case; playing a one-armed bandit in the original analogy) radically reduces the reward of that option in the near future.

    The analogy I've heard used is somebody attempting to spin a number of plates on top of poles. Once the player spins and rebalances one of the plates, doing so again doesn't offer much gain, and effort would be better spent spinning/balancing one of the other plates.

    The application to advertising doesn't really match the rationale behind the model. Did you read this in a paper? It would be interesting to see the reference. A lot of theoretical operational research plays the game of making problems so abstract that you can produce nice theoretical results that have very little relevance to the real world.

  2. That's nice. I like it and want to share it with others too.

  3. Hey, multi-armed bandit testing and web marketing involve a statistical problem set-up. The most-used example takes a set of slot machines and a gambler who suspects one machine pays out more or more frequently than the others. For each token, they have to decide which slot machine to use so as to maximize winnings from their budget. Happy Good Day!!!

  4. Hello! I think that lately there has been a lot of interest in a new, alternative approach to A/B testing. Thanks to a popular blog post with a catchy title, "20 lines of code that will beat A/B testing every time", this new, new thing was all over the Internet (actually, it is not new, but public interest in it is new). This new approach is called the multi-armed bandit algorithm, and since Visual Website Optimizer is an A/B testing tool, our customers were curious what our response to this alternative approach was. We did not want to rush into commenting, so we read the theory and did a few simulations. What we found out was that the reality is not as simple as what that blog post claimed. In short, multi-armed bandit algorithms don't "beat" A/B testing. In fact, a naive understanding of this algorithm can leave you wondering what's happening under the hood. All the best!!!
    Internet Marketing Pro

  5. Your post is food for thought. I read an article on an adtech blog that compared multi-armed bandits and A/B testing within a website optimisation framework. Do you think A/B testing is an easier tool to use when it comes to conversion optimization?
    for reference :

