It is really quite simple, but it requires a slight tilt of the head. Instead of thinking about the area

*under*the survival curve, think of the equivalent area

*to the left of*the survival curve. With the discreet-time survival curves we use in our work, I think of the area

*under*the curve as a bunch of vertical bars, one for each time period. Each rectangle has width one and its height is the percentage of customers who have not yet canceled as of that period. Conveniently, this means you can estimate the area under the survival curve by simply adding up the survival values. Looked at this way, it is not particularly clear why this value should be the mean tenure.

So let's look at it another way, starting with the original data. Here is a table of some customers with their final tenures. (Since this is not a real example, I haven't bothered with any censored observations; that makes it easy to check our work against the average tenure for these customers which is 7.56.)

Stack these up like cuisenaire rods with the longest on the bottom and the shortest on the top, and you get something that looks a lot like the survival curve.

If I made the bars fat enough to touch, each would get 1/25 of the height of the stack. The area of each bar would be 1/25 times the tenure. If everyone had tenure of 20, like Tim, the area would be 25*1/25*20=20. If everyone had tenure of 1, like the first of the two Daniels, then the area would be 25*1/25*1=1. Since most customers have tenures somewhere between Tim's and Daniels, the area actually comes out to 7.56-- the average tenure.

In J:

x

20 14 13 12 11 11 11 11 10 8 8 8 7 7 7 6 5 4 3 3 3 2 2 2 1

+ /x%25 NB. This is J for "the sum of x divided by 25"

7.56

So, the area under (or to the left of) the stack of tenure bars is equal to the average tenure, but the stack of tenure bars is not exactly the survival curve. The survival curve is easily derived from it, however. For each tenure, it is the percentage of bars that stick out past it. At tenure 0, all 25 bars are longer than 0, so survival is 100%. At tenure 1, 24 out of 25 bars stick out past the line, so survival is 96% and so on.

In J:

i. 21 NB. 0 through 20

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

+/"1 (i. 21)

25 24 21 18 17 16 15 12 9 9 8 4 3 2 1 1 1 1 1 1 0

After dividing by 25 to turn the counts into percentages, we can add the survival curve to the chart.

Now, even the vertical view makes sense. The vertical grid lines are spaced one period apart. The number of blue bars between two vertical grid lines says how many customers are going to contribute their 1/25 to the area of the column. This is determined by how many people reached that tenure. At tenure 0, there are 25/25 of a full day. At tenure 1, there are 24/25, and so on. Add these up and you get 7.56.

Thanks, excellent post.

ReplyDeleteI'd like to calculate the difference of two AUCs (a treatment group and a control group) using survival data from an RCT. Do you know if any statistical packages supports such a feature?

Best regards,

P