## Thursday, May 1, 2008

### Statistical Test for Measuring ROI on Direct Mail Test

If I want to test the effect of return of investment on a mail/ no mail sample, however, I cannot use a parametric test since the distribution of dollar amounts do not follow a normal distribution. What non-parametric test could I use that would give me something similar to a hypothesis test of two samples?

Recently, we received an email with the question above. Since it was addressed to bloggers@data-miners.com, it seems quite reasonable to answer it here.

First, I need to note that Michael and I are not statisticians. We don't even play one on TV (hmm, that's an interesting idea). However, we have gleaned some knowledge of statistics over the years, much from friends and colleagues who are respected statisticians.

Second, the question I am going to answer is the following: Assume that we do a test, with a test group and a control group. What we want to measure is whether the average dollars per customer is significantly different for the test group as compared to the control group. The challenge is that the dollar amounts themselve do not follow a known distribution, or the distribution is known not to be a normal distribution. For instance, we might only have two products, one that costs \$10 and one that costs \$100.

The reason that I'm restating the problem is because a term such as ROI (return on investment) gets thrown around a lot. In some cases, it could mean the current value of discounted future cash flows. Here, though, I think it simply means the dollar amount that customers spend (or invest, or donate, or whatever depending on the particular business).

The overall approach is that we want to measure the average and standard error for each of the groups. Then, we'll apply a simple "standard error" of the difference to see if the difference is consistently positive or negative. This is a very typical use of a z-score. And, it is a topic that I discuss in more detail in Chapter 3 of my book "Data Analysis Using SQL and Excel". In fact, the example here is slightly modified from the example in the book.

A good place to start is the Central Limit Theorem. This is a fundamental theorem for statistics. Assume that I have a population of things -- such as customers who are going to spend money in response to a marketing campaign. Assume that I take a sample of these customers and measure an average over the sample. Well, as I take more an more samples, the distribution of the averages follows a normal distribution regardless of the original distribution of values. (This is a slight oversimplification of the Central Limit Theorem, but it captures the important ideas.)

In addition, I can measure the relationship between the characteristics of the overall population and the characteristics of the sample:

(1) The average of the sample is as good an approximation as any of the average of the overall population.

(2) The standard error on the average of the sample is the standard deviation of the overall population divided by the square root of the size of the sample. Alternatively, we can phrase this in terms of variance: the variance of the sample average is the variance of the population average divided by the size of the sample.

Well, we are close. We know the average of each sample, because we can measure the average. If we knew the standard deviation of the overall population, then we could get the standard error for each group. Then, we'd know the standard error and we would be done. Well, it turns out that:

(3) The standard deviation of the sample is as good an approximation as any for the standard deviation of the population. This is convenient!

Let's assume that we have the following scenario.

Our test group has 17,839 customers, and the overall average purchase is \$85.48. The control group has 53,537 customers, and the average purchase is \$70.14. Is this statistically different?

We need some additional information, namely the standard deviation for each group. For the test group, the standard deviation is \$197.23. For the control group, it is \$196.67.

The standard error for the two groups is then \$197.23/sqrt(17,839) and \$196.67/sqrt(53,537), which comes to \$1.48 and \$0.85, respectively.

So, now the question is: is the difference of the means (\$85.48 - \$70.14 = \$15.34) significantly different from zero. We need another formula from statistics to calculate the standard error of the difference. This formula says that the standard error is the square root of the sums of the squares of standard errors. So the value is \$1.71 = sqrt(0.85^2 + 1.48^2).

And we have arrived at a place where we can use the z-score. The difference of \$15.34 is about 9 standard deviations from 0 (that is, 9*1.71 is about 15.34). It is highly, highly, highly unlikely that the difference includes 0, so we can say that the test group is significantly better than the control group.

In short, we can apply the concepts of normal distributions, even to calculations on dollar amounts. We do need to be careful and pay attention to what we are doing, but the Central Limit Theorem makes this possible. If you are interested in this subject, I do strongly recommend Data Analysis Using SQL and Excel, particularly Chapter 3.

--gordon

1. A few questions:

Why didn't you you a ranked non-parmametric test like kruskall-wallis to test the differences?

I don't understand why you would frequently sample a population that you can easily access whole.

Also, you haven't factored in to accounts that have account spent compared to the control. A binomial/CHI-Square test would actually tell you how many extra people are actually spending which leads to future growth.

2. I cannot use a parametric test since the distribution of dollar amounts do not follow a normal distribution. feng shui

Your comment will appear when it has been reviewed by the moderators.