Thursday, September 27, 2007

Intensity of rating activity by time since first rating

Originally posted to a previous version of this blog on 29 May 2007.

Many people in the Netflix sample rate movies one day and then never rate again. Overall, rating frequency declines over time. In this post, I look at what happens to the probability of making a rating as a function of tenure, calculated as the number of days elapsed since the subscriber's first rating. The first chart shows the raw count of ratings by tenure of the subscriber making the rating.

The query that produced this table is:

The reason for restricting the results to tenures shorter than 2,190 days is that beyond that point ratings are so infrequent that some tenures are not even represented.
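The original query is not reproduced here. As a rough stand-in, the same count can be sketched in Python, assuming the ratings are available as (customer_id, rating_date) pairs. The function name, the sample data, and the `max_tenure` parameter are illustrative assumptions, not taken from the original query.

```python
from collections import Counter
from datetime import date


def ratings_by_tenure(ratings, max_tenure=2190):
    """Count ratings by tenure, in days since each subscriber's first rating.

    `ratings` is an iterable of (customer_id, rating_date) pairs; this input
    shape is an assumption made for the sketch.
    """
    ratings = list(ratings)

    # First pass: find each subscriber's earliest rating date.
    first_rating = {}
    for customer_id, rating_date in ratings:
        if customer_id not in first_rating or rating_date < first_rating[customer_id]:
            first_rating[customer_id] = rating_date

    # Second pass: tenure is the number of days elapsed since that first rating.
    counts = Counter()
    for customer_id, rating_date in ratings:
        tenure = (rating_date - first_rating[customer_id]).days
        if tenure < max_tenure:  # keep only tenures shorter than the cutoff
            counts[tenure] += 1
    return counts


# Made-up sample: subscriber 1 rates twice on day 0 and once on day 2;
# subscriber 2 rates once on day 0 and once on day 1.
sample = [
    (1, date(2004, 1, 1)),
    (1, date(2004, 1, 1)),
    (1, date(2004, 1, 3)),
    (2, date(2005, 6, 10)),
    (2, date(2005, 6, 11)),
]
tenure_counts = ratings_by_tenure(sample)
```

Every subscriber contributes at least one rating at tenure 0, which is why day 0 dominates the raw counts.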

While this chart does show clearly that day 0 ratings are by far the most common (by definition, everyone in the training data made ratings on day 0), not much else is visible. In the second chart, the base 10 log of the ratings count is plotted. On this scale, we can see that after an initial precipitous decline, the number of ratings diminishes more slowly over the next 2,000 days and then drops off sharply as we run out of examples of subscribers with very high tenures.

Recall that the earliest rating date in the training data is 11 November 1999 and the latest is 31 December 2005. This means that the theoretical highest observable tenure would be 2,242 days for a subscriber who submitted ratings on the first and last days of the observation period.
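The 2,242-day figure can be checked with plain date arithmetic. The variable names in this small sketch are illustrative.

```python
from datetime import date

first_day = date(1999, 11, 11)  # earliest rating date in the training data
last_day = date(2005, 12, 31)   # latest rating date in the training data

# Subtracting two dates gives a timedelta; .days is the elapsed whole days.
max_tenure = (last_day - first_day).days
print(max_tenure)  # 2242
```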

The raw count of ratings by tenure is not particularly informative on its own, but it is the first step towards calculating something quite useful: the expected number of ratings that a subscriber will make at any given tenure. This can be estimated empirically by dividing the actual count of ratings for each tenure by the number of people who ever experienced that tenure. This is reminiscent of estimating the hazard probability in survival analysis, except that one can rate many movies on a single day, and making a rating at tenure t is not conditional on not having made a rating at some earlier tenure. You only die once, but you can make ratings as often as you like. For this purpose, I assume that a subscriber comes into existence when he or she first rates a movie and remains eligible to make ratings forever. In real life, of course, a subscriber could cease to be a Netflix customer, but that does not concern me, as it is just one of several reasons that customers are less likely to make ratings over time. For the 2007 KDD Cup challenge, there is no requirement to predict whether customers cancel their subscriptions, only whether they keep rating movies.

The heart of the calculation combines the table created above with another table containing the number of subscribers who ever experienced each tenure.
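A minimal sketch of that division in Python, under the assumption stated above that a subscriber stays eligible from the first rating through the end of the observation period. A subscriber has therefore "experienced" tenure t whenever the first rating date plus t days falls on or before the last observed day. The function and argument names are hypothetical, and the numbers in the example are made up.

```python
from collections import Counter
from datetime import date


def expected_ratings_by_tenure(tenure_counts, first_rating_dates, last_day):
    """Divide the ratings made at each tenure by the subscribers who reached it.

    `tenure_counts` maps tenure (days) to number of ratings; `first_rating_dates`
    holds one first-rating date per subscriber. Both shapes are assumptions.
    """
    # Longest tenure at which each subscriber could have been observed.
    max_observable = Counter((last_day - d).days for d in first_rating_dates)

    # exposed[t] = number of subscribers whose longest observable tenure is >= t,
    # built as a running total from the highest tenure downwards.
    exposed = {}
    running = 0
    for t in range(max(max_observable), -1, -1):
        running += max_observable.get(t, 0)
        exposed[t] = running

    return {t: n / exposed[t] for t, n in tenure_counts.items() if exposed.get(t)}


# Made-up example: three subscribers who first rated 0, 1 and 2 days
# before the end of the observation period.
last_day = date(2005, 12, 31)
first_dates = [date(2005, 12, 31), date(2005, 12, 30), date(2005, 12, 29)]
expected = expected_ratings_by_tenure({0: 75, 1: 4, 2: 1}, first_dates, last_day)
```

In this toy example all three subscribers experienced tenure 0, two experienced tenure 1, and only one experienced tenure 2, so the denominators shrink as tenure grows. That is the same effect that makes the estimates noisy at high tenures.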

On the first day that people rate movies, they rate a lot of them--over 25 on average. By the next day, their enthusiasm has waned dramatically.

The chart shows the expected number of ratings per day after the first 10 days.

It is not surprising that the variance of the expected number of ratings gets high for higher tenures. The pool of people who have experienced the higher tenures is quite small as seen below.
