What is the connection between testing marketing campaigns and Las Vegas?

The Multi-Armed Bandit problem is a statistical problem that seeks to optimize a gambler playing multiple slot machines. Each slot machine has an unknown probability of paying out, and the goal is to maximize the winnings over time. Slot machines, by the way, are sometimes called "one-armed bandits", because -- on average -- you leave them with less money than you arrived with.

What is appealing about the Multi-Armed Bandit problem is that it is solvable. The basic "learn-and-proceed" algorithm works quite well. That is, for a certain period of time, play all the machines. Then continue to play the winning'est machine. Variations on this approach are also optimal, under a wide range of assumptions.

This is very appealing. Imagine that you have a web site that makes money -- it actually doesn't matter whether that is through selling ads, selling merchandize, or getting subscribers. You want to optimize the web site, and you can -- literally -- think of millions of different combinations of things. You can change the layout of the home page, you can change the language, the colors, the images, and so on. What works best?

The Multi-Armed Bandit solution would

The problem is when this hits the real world. In the real world, the slot machines are not all equal. And, the probabilities may be more complicated. For instance, the message "Visit us while you sip your morning cup of coffee" might work better in the morning than in the afternoon (I'm making this up). Or the message, "Escape from the winter blues" might not work as well in Florida in January as in Minnesota.

So, this is the first problem. In the web world, we do know something about even anonymous visitors. We have a good idea of the time of day where they are. We have a good idea of their "gross" geography. We have a good idea of the device and web browser they are using. What works in one place at one time on one device, may not work in another.

This makes the testing problem much harder. If we know a handful of categories in advance, we do separate multi-armed bandit problems for each. For instance:

The Multi-Armed Bandit problem is a statistical problem that seeks to optimize a gambler playing multiple slot machines. Each slot machine has an unknown probability of paying out, and the goal is to maximize the winnings over time. Slot machines, by the way, are sometimes called "one-armed bandits", because -- on average -- you leave them with less money than you arrived with.

What is appealing about the Multi-Armed Bandit problem is that it is solvable. The basic "learn-and-proceed" algorithm works quite well. That is, for a certain period of time, play all the machines. Then continue to play the winning'est machine. Variations on this approach are also optimal, under a wide range of assumptions.

This is very appealing. Imagine that you have a web site that makes money -- it actually doesn't matter whether that is through selling ads, selling merchandize, or getting subscribers. You want to optimize the web site, and you can -- literally -- think of millions of different combinations of things. You can change the layout of the home page, you can change the language, the colors, the images, and so on. What works best?

The Multi-Armed Bandit solution would

*seem*to provide a path to site optimization. Start with a few variations. Treat each variation as a separate "slot machine". Each time the page is about to be loaded, peer into the historical data on how well each version has done. Then choose the right version accordingly. Keep track of how good the page is. And, one of the optimization algorithms for the Multi-Armed Bandit problem can not only determine which page to load but guarantee optimal outcomes over time. Truly impressive.The problem is when this hits the real world. In the real world, the slot machines are not all equal. And, the probabilities may be more complicated. For instance, the message "Visit us while you sip your morning cup of coffee" might work better in the morning than in the afternoon (I'm making this up). Or the message, "Escape from the winter blues" might not work as well in Florida in January as in Minnesota.

So, this is the first problem. In the web world, we do know something about even anonymous visitors. We have a good idea of the time of day where they are. We have a good idea of their "gross" geography. We have a good idea of the device and web browser they are using. What works in one place at one time on one device, may not work in another.

This makes the testing problem much harder. If we know a handful of categories in advance, we do separate multi-armed bandit problems for each. For instance:

- US-based, web-based, weekday working hours.
- US-based, mobile, weekday working hours.

And so on, through the 12 combinations suggested by these groups.

When the groups are not predetermined, it is a harder problem to solve. This is often approached using off-line analysis, if there is adequate test data.

Another issue is that some messages may be targeted at a specific population. For instance, some ip addresses have registered as .EDU indicating educational institutions. How do you handle market testing for these groups?

If there is only one group, then you can set up separate Multi-Armed Bandit-type tests for EDU and for non-EDU. But this might get more complicated. For instance, some messages might be targeted for web versus mobile. And now you have overlap -- because EDU users are in both groups.

Yet another problem is that determining the outcome may take time. If the outcome is as simple as clicking on an ad, then that is known within a few minutes. If the outcome is purchasing a particular, well, not everyone puts something in a cart and immediately checks out. You might want a day lag. If the outcome is a longer term measurement (did site engagement go up? time to next visit?), then even longer periods are needed. This can result in a very slow feed back loop for the Multi-Armed Bandit. After all, a slot machine returns its winnings (or losings) very quickly.

What is the real solution to such adaptive marketing online? The real solution requires a comprehensive solution. This solution has to understand "natural" divisions online (such as organization type) and the fact that a given message may have differential effects in different areas. This solution has to take into account that some message are targeted to specific visitors or customers (if they can be known), so pure randomization may not be possible. The solution has to understand the time dimension of the business, hopefully incorporating faster measures in the short-term, even if these are imperfect, along with validation of the measures in the longer term.

The Mutli-Armed bandit is a powerful idea. Unfortunately, the solution to this problem is only one part of an online marketing solution.