In campaign measurements, it's possible to get a larger lift at the overall level than at any of the individual decile levels, or vice versa, because of the differences in sample size across the deciles and across Test & Control.
According to Wikipedia, this is known as Simpson's paradox (or the Yule-Simpson effect) and is explained as an apparent paradox in which the successes in different groups seem to be reversed when the groups are combined.
In such scenarios, how do you calculate the overall lift? Which methods are commonly used in the industry?
Simpson's Paradox is an interesting phenomenon, where results about subgroups of a population do not generalize to the overall population. I think the simplest version that I've heard is an old joke . . . "I heard you moved from Minnesota to Iowa, raising the IQ of both states."
How could this happen? For the joke to work, the average IQ in Minnesota must be higher than the average IQ in Iowa. And, the person who moves must have an IQ between these two values. Voilà, you get the paradox that the averages in both states go up, even though the combined population of the two states is exactly the same.
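To make the joke concrete, here is a small sketch in Python. The IQ scores are invented purely for illustration, and the variable names are my own.

```python
from statistics import mean

# Hypothetical IQ scores, invented for illustration only.
minnesota = [110, 105, 100]  # average 105
iowa = [95, 90, 85]          # average 90

# The mover has an IQ of 100: below Minnesota's average, above Iowa's.
mover = 100

# Move that one person from Minnesota to Iowa.
minnesota_after = list(minnesota)
minnesota_after.remove(mover)
iowa_after = iowa + [mover]

print("Minnesota:", mean(minnesota), "->", mean(minnesota_after))
print("Iowa:     ", mean(iowa), "->", mean(iowa_after))

# The combined population has not changed, so its average is the same.
print("Combined: ", mean(minnesota + iowa), "->",
      mean(minnesota_after + iowa_after))
```

Both state averages rise while the combined average stays put, because the same six people are simply grouped differently.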
I didn't realize that this paradox has a name (or, if I did, then I had forgotten). Wikipedia has a very good article on Simpson's Paradox, which includes real-world examples from baseball, medical studies, and an interesting discussion of a gender discrimination lawsuit at Berkeley. In the gender discrimination lawsuit, women were accepted at a much lower rate than men overall. However, department by department, women were typically accepted at a higher rate than men. The explanation is that women applied to more competitive departments than men did. These departments have lower rates of acceptance, which lowers the overall rate for women.
Simpson's Paradox arises when we are taking weighted averages of evidence from different groups. Different weightings can produce very different, even counter-intuitive results. The results become much less paradoxical when we see the actual counts rather than just the percentages.
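Here is a rough sketch of what "seeing the actual counts" looks like for a campaign with a test group and a control group. The two segments and all of the counts are hypothetical, chosen only so that the reversal shows up.

```python
# Hypothetical counts per segment: (number contacted, number of responders).
counts = {
    "A": {"test": (100, 20), "control": (1000, 180)},
    "B": {"test": (1000, 50), "control": (100, 4)},
}

def rate(contacted, responders):
    """Response rate for one cell of the table."""
    return responders / contacted

def pooled_rate(group):
    """Response rate for a group after adding up the raw counts."""
    contacted = sum(seg[group][0] for seg in counts.values())
    responders = sum(seg[group][1] for seg in counts.values())
    return responders / contacted

# Within each segment, test responds at a higher rate than control.
for name, seg in counts.items():
    print(name, "test:", rate(*seg["test"]), "control:", rate(*seg["control"]))

# Pooled, the comparison flips, because most of the test group sits in the
# low-response segment B and most of the control group in segment A.
print("pooled test:", pooled_rate("test"))
print("pooled control:", pooled_rate("control"))
```

With the counts in view, the flip is no surprise: the pooled test rate is dominated by segment B and the pooled control rate by segment A, so the two pooled numbers are really comparing different mixes of customers.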
The specific question is how this paradox relates to lift and to understanding marketing campaigns. Assume there is a marketing campaign, where one group receives a particular treatment and another group does not. The ratio of the treated group's performance to the untreated group's performance is the lift of the marketing campaign.
To avoid Simpson's paradox, you need to ensure that the groups are as similar as possible, except for what's being tested. If the test is for the marketing message, there is no problem, because both groups can be pulled from the same population. If, instead, the test is for the marketing group itself (say, high-value customers), then Simpson's Paradox is not an issue, since we care about how the group performs rather than how the entire population performs.
As a final comment, I could imagine finding marketing results where Simpson's Paradox has surfaced because the original groups were not well chosen. Here, the paradox arises because the sizes of the test groups are not proportional to their sizes in the overall population. In this case, I would be tempted to weight the results from each group by its expected size in the overall population to calculate the overall response and lift.
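A minimal sketch of that weighting idea follows. The segment counts and the population shares are hypothetical, and this is just one way to do the reweighting, not an industry standard.

```python
# Hypothetical counts per segment: (number contacted, number of responders).
counts = {
    "A": {"test": (100, 20), "control": (1000, 180)},
    "B": {"test": (1000, 50), "control": (100, 4)},
}

# Assumed share of each segment in the overall population (sums to 1).
population_share = {"A": 0.3, "B": 0.7}

def segment_rate(segment, group):
    contacted, responders = counts[segment][group]
    return responders / contacted

def naive_rate(group):
    """Pool the raw counts, ignoring how the groups were sampled."""
    contacted = sum(seg[group][0] for seg in counts.values())
    responders = sum(seg[group][1] for seg in counts.values())
    return responders / contacted

def weighted_rate(group):
    """Weight each segment's rate by its expected population share."""
    return sum(population_share[s] * segment_rate(s, group) for s in counts)

segment_lift = {
    s: segment_rate(s, "test") / segment_rate(s, "control") for s in counts
}
naive_lift = naive_rate("test") / naive_rate("control")
weighted_lift = weighted_rate("test") / weighted_rate("control")

print("segment lifts:", segment_lift)
print("naive overall lift:", naive_lift)
print("weighted overall lift:", weighted_lift)
```

Because test and control are weighted with the same shares, the weighted overall lift always lands between the smallest and largest segment lifts, so it cannot contradict them the way the naive pooled lift does here.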