- Visits to a web page.
- Purchases of additional minutes for a pre-paid phone plan.
- Subscription renewals.
- Repeat purchases.
- Pregnancies.
- Incarcerations.
- Posts to a blog.
These are customers making multiple purchases during an observation window. Each time a customer makes a purchase, a transaction record is created. When we add this data to a table in the counting process style, each customer contributes several rows. There is a row for the time from time 0, which may be the time of the initial purchase, to the second purchase, a row for the time to each subsequent purchase, and a row for the time between the final observed purchase and the end of the observation period.
Depending on the style of analysis used, each event may be seen as starting a new time 0 with the number of previous events as a covariate, or each event may be modeled separately with a customer only becoming part of the at-risk pool for event n after experiencing event n-1.
Either way, it is important to include the final censored time period. This period does not correspond to any transaction, but customers are "at risk" for another purchase during that period.
My approach to creating the table is to first create the table without the censored observations, which is reasonably straightforward. Each of these rows contains a flag indicating it is a complete, uncensored observation. Next I create just the censored observations by creating an observation going from the latest observed purchase to the end of the observation period (in this case, 22May2006). The censored rows can then be appended to the uncensored rows. These could, of course, be turned into subqueries in order to avoid creating the temporary tables.
This fully expanded version of the data is what is referred to as the counting process style of input. In a realistic situation where there might be millions of customers, it makes more sense to group by tenure so that there is one row showing how many customers made a purchase with that tenure and how many customers experienced the tenure and so could have made a purchase. This is the data needed to estimate the intensity function.
In Gordon Linoff's book, Data Analysis Using SQL and Excel, he provides sample code for making a related, but different table using the data available on the book's companion page. I reproduce it here for reference.
The code uses the DATEDIFF function to subtract a household's first order date from all its other order dates to put things on the tenure timeline. It then counts the number of second (or third, or fourth, . . .) purchases that happen at each tenure. This query does not track the population at risk so it is not the actual intensity function, but it never the less gives a nice visual image of the way intensity peaks at yearly intervals as many customers make regular annual purchases, just as the purchasers of calendars in the previous posting did.
Hi Gordon, Michael,
ReplyDeleteLet me include a comment seven years after;)
First of all, I'd like give you two guys thanks for your incredible book: 'Data Analysis Using SQL and Excel'.
For me, the most practical reference I have read in my life.
Then, I would like to share with you and sql script that I have adapted from 'chapter 8: Customer purchases and other repeated events' , section 'Time to next event'.
You split the process between a sql script and and Excel spreadsheet with formulas. I have only joined all the model in the same query:
SELECT
ReplyDeleteS.days
,S.h1
,S.acum_h1
,s.Previous_h1
--,sum(S.Previous_h1) over(order by days ) as ascG
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h1) over(order by days )) end as S1
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h2) over(order by days )) end as S2
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h3) over(order by days )) end as S3
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h4) over(order by days )) end as S4
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h5) over(order by days )) end as S5
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h6) over(order by days )) end as S6
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h1) over(order by days )) end) as _1_S1
ReplyDelete,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h2) over(order by days )) end) as _1_S2
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h3) over(order by days )) end) as _1_S3
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h4) over(order by days )) end) as _1_S4
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h5) over(order by days )) end) as _1_S5
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h6) over(order by days )) end)as _1_S6
FROM
(
SELECT
h.*
--h1
,LEAD(h1, 1,0) OVER (ORDER BY days desc) AS Previous_h1
--,h2
,LEAD(h2, 1,0) OVER (ORDER BY days desc) AS Previous_h2
--,h3
,LEAD(h3, 1,0) OVER (ORDER BY days desc) AS Previous_h3
--,h4
,LEAD(h4, 1,0) OVER (ORDER BY days desc) AS Previous_h4
--,h5
,LEAD(h5, 1,0) OVER (ORDER BY days desc) AS Previous_h5
--,h6
,LEAD(h6, 1,0) OVER (ORDER BY days desc) AS Previous_h6
,sum(h1) over(order by days ) as Acum_h1
,sum(h2) over(order by days ) as Acum_h2
,sum(h3) over(order by days ) as Acum_h3
,sum(h4) over(order by days ) as Acum_h4
,sum(h5) over(order by days ) as Acum_h5
,sum(h6) over(order by days ) as Acum_h6
FROM
(
select
b.*
,case when Total <>0 then cast(cast(b.hasnext as decimal(14,6))/cast(Total as decimal(14,6)) as decimal(14,6)) else 0 end as h1
,case when Total_ord1 <>0 then cast(cast(b.hasnext1 as decimal(14,6))/cast(Total_ord1 as decimal(14,6)) as decimal(14,6)) else 0 end as h2
,case when Total_ord2 <>0 then cast(cast(b.hasnext2 as decimal(14,6))/cast(Total_ord2 as decimal(14,6)) as decimal(14,6)) else 0 end as h3
,case when Total_ord3 <>0 then cast(cast(b.hasnext3 as decimal(14,6))/cast(Total_ord3 as decimal(14,6)) as decimal(14,6)) else 0 end as h4
,case when Total_ord4 <>0 then cast(cast(b.hasnext4 as decimal(14,6))/cast(Total_ord4 as decimal(14,6)) as decimal(14,6)) else 0 end as h5
,case when Total_ord5 <>0 then cast(cast(b.hasnext5 as decimal(14,6))/cast(Total_ord5 as decimal(14,6)) as decimal(14,6)) else 0 end as h6
from
(
ReplyDeleteSELECT
A.*
,SUM(contador) OVER( order by days desc) AS Total
--POP:
,SUM(ord1) OVER( order by days desc) AS Total_ord1
,SUM(ord2) OVER( order by days desc) AS Total_ord2
,SUM(ord3) OVER( order by days desc) AS Total_ord3
,SUM(ord4) OVER( order by days desc) AS Total_ord4
,SUM(ord5) OVER( order by days desc) AS Total_ord5
--h
--,case when SUM(ord1) OVER( order by days desc)<>0 then cast(cast(A.hasnext as decimal(14,4))/cast(SUM(ord1) OVER( order by ord1 desc) as decimal(14,4)) as decimal(14,4)) else 'N/A' end as h1
FROM
(
SELECT DATEDIFF(dd, firstdate, ISNULL(nextdate, DATEADD(DAY, 568, GETDATE()))) as days
, COUNT(*) AS contador,
SUM(CASE WHEN numbef = 1 THEN 1 ELSE 0 END) as ord1,
SUM(CASE WHEN numbef = 2 THEN 1 ELSE 0 END) as ord2,
SUM(CASE WHEN numbef = 3 THEN 1 ELSE 0 END) as ord3,
SUM(CASE WHEN numbef = 4 THEN 1 ELSE 0 END) as ord4,
SUM(CASE WHEN numbef = 5 THEN 1 ELSE 0 END) as ord5,
SUM(hasnext) as hasnext,
SUM(CASE WHEN numbef = 1 THEN hasnext ELSE 0 END) as hasnext1,
SUM(CASE WHEN numbef = 2 THEN hasnext ELSE 0 END) as hasnext2,
SUM(CASE WHEN numbef = 3 THEN hasnext ELSE 0 END) as hasnext3,
SUM(CASE WHEN numbef = 4 THEN hasnext ELSE 0 END) as hasnext4,
SUM(CASE WHEN numbef = 5 THEN hasnext ELSE 0 END) as hasnext5
FROM
(
SELECT o1.householdid, o1.orderdate as firstdate,
SUM(CASE WHEN o2.orderdate < o1.orderdate THEN 1 ELSE 0 END) as numbef,
MIN(CASE WHEN o2.orderdate > o1.orderdate THEN o2.orderdate END) as nextdate,
COUNT(*) as numords,
MAX(CASE WHEN o2.orderdate > o1.orderdate THEN 1 ELSE 0 END) as hasnext
FROM
(
SELECT
c.householdid, o.*
FROM
orders o
JOIN
customer c
ON
o.customerid = c.customerid
) o1
LEFT OUTER JOIN
(
SELECT
c.householdid, o.*
FROM
orders o
JOIN
customer c
ON
o.customerid = c.customerid
) o2
ON
o1.householdid = o2.householdid
GROUP BY
o1.householdid, o1.orderdate
) a
GROUP BY
DATEDIFF(dd, firstdate, ISNULL(nextdate, DATEADD(DAY, 568, GETDATE())))
) A
) b
) h
) S
order by 1
Hi Gordon, Michael,
ReplyDeleteLet me include a comment seven years after;)
First of all, I'd like give you two guys thanks for your incredible book: 'Data Analysis Using SQL and Excel'.
For me, the most practical reference I have read in my life.
Then, I would like to share with you and sql script that I have adapted from 'chapter 8: Customer purchases and other repeated events' , section 'Time to next event'.
You split the process between a sql script and and Excel spreadsheet with formulas. I have only joined all the model in an only query.
SELECT
ReplyDeleteS.days
,S.h1
,S.acum_h1
,s.Previous_h1
--,sum(S.Previous_h1) over(order by days ) as ascG
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h1) over(order by days )) end as S1
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h2) over(order by days )) end as S2
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h3) over(order by days )) end as S3
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h4) over(order by days )) end as S4
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h5) over(order by days )) end as S5
,CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h6) over(order by days )) end as S6
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h1) over(order by days )) end) as _1_S1
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h2) over(order by days )) end) as _1_S2
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h3) over(order by days )) end) as _1_S3
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h4) over(order by days )) end) as _1_S4
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h5) over(order by days )) end) as _1_S5
,1-(CASE WHEN (days=0 ) THEN CAST(1 AS DECIMAL(16,6)) ELSE (1-sum(S.Previous_h6) over(order by days )) end)as _1_S6
FROM
(
SELECT
h.*
--h1
,LEAD(h1, 1,0) OVER (ORDER BY days desc) AS Previous_h1
--,h2
,LEAD(h2, 1,0) OVER (ORDER BY days desc) AS Previous_h2
--,h3
,LEAD(h3, 1,0) OVER (ORDER BY days desc) AS Previous_h3
--,h4
,LEAD(h4, 1,0) OVER (ORDER BY days desc) AS Previous_h4
--,h5
,LEAD(h5, 1,0) OVER (ORDER BY days desc) AS Previous_h5
--,h6
,LEAD(h6, 1,0) OVER (ORDER BY days desc) AS Previous_h6
,sum(h1) over(order by days ) as Acum_h1
,sum(h2) over(order by days ) as Acum_h2
,sum(h3) over(order by days ) as Acum_h3
,sum(h4) over(order by days ) as Acum_h4
,sum(h5) over(order by days ) as Acum_h5
,sum(h6) over(order by days ) as Acum_h6
FROM
(
select
b.*
,case when Total <>0 then cast(cast(b.hasnext as decimal(14,6))/cast(Total as decimal(14,6)) as decimal(14,6)) else 0 end as h1
,case when Total_ord1 <>0 then cast(cast(b.hasnext1 as decimal(14,6))/cast(Total_ord1 as decimal(14,6)) as decimal(14,6)) else 0 end as h2
,case when Total_ord2 <>0 then cast(cast(b.hasnext2 as decimal(14,6))/cast(Total_ord2 as decimal(14,6)) as decimal(14,6)) else 0 end as h3
,case when Total_ord3 <>0 then cast(cast(b.hasnext3 as decimal(14,6))/cast(Total_ord3 as decimal(14,6)) as decimal(14,6)) else 0 end as h4
,case when Total_ord4 <>0 then cast(cast(b.hasnext4 as decimal(14,6))/cast(Total_ord4 as decimal(14,6)) as decimal(14,6)) else 0 end as h5
,case when Total_ord5 <>0 then cast(cast(b.hasnext5 as decimal(14,6))/cast(Total_ord5 as decimal(14,6)) as decimal(14,6)) else 0 end as h6
from
Hi again,
ReplyDeleteI have included the first part of the query.
Kind regards.