- Cookie deletions. A user may manually delete their cookies one or more times during the month.
- Disallowing first-party cookies. A user may allow session cookies (which last only while the browser is running), but not allow the cookies to be committed to disk.
- Multiple browsers. A single user may use multiple browsers on the same machine during the month. This is particularly true when the user upgrades his or her browser.
- Multiple machines. A single user may use multiple machines during the month.
And I have to admit that the data I'm using has one more problem, which is probably not widespread: the cookies are actually hashed into four bytes. This means that two "real" cookies can have the same hash value. This is not just theoretically possible; it actually happens (although not too frequently).
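To get a feel for how often such collisions should occur, here is a rough sketch based on the standard birthday-problem estimate. It assumes the hash spreads cookies uniformly over all 2^32 four-byte values, which I have not verified for this data. The function names and the cookie count in the usage line are made up for illustration.

```python
import math

HASH_SPACE = 2 ** 32  # four bytes = 32 bits of hash (uniformity is an assumption)

def expected_distinct_hashes(n_cookies, space=HASH_SPACE):
    """Expected number of distinct hash values seen when n_cookies
    real cookies are hashed uniformly into `space` buckets."""
    # space * (1 - (1 - 1/space) ** n), written with log1p/expm1
    # so that the tiny probabilities do not lose precision.
    return space * -math.expm1(n_cookies * math.log1p(-1.0 / space))

def expected_lost_to_collisions(n_cookies, space=HASH_SPACE):
    """Expected number of real cookies hidden behind another cookie's hash."""
    return n_cookies - expected_distinct_hashes(n_cookies, space)

if __name__ == "__main__":
    # Illustrative cookie count, not a figure from the data.
    print(expected_lost_to_collisions(1_000_000))
```

For cookie counts that are small relative to 2^32, the loss is roughly n^2 / 2^33, so collisions are rare at a million cookies but grow quickly as the count climbs toward the billions.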
If I make the following assumptions:
- The Yahoo! users have an average of 2.5 cookies per month.
- ComCast used the main Yahoo! cookies, and not the Yahoo! mail cookies.
- All Yahoo! users use the site consistently throughout the month.
- All Yahoo! users have the "keep me logged in for 2 weeks" box checked.
By the way, I find this number much more reasonable. I also think that it misses the larger source of overcounting: users who use more than one machine. Unfortunately, there is no single approach to correcting for this. In the case that I'm working on, we have the advantage that a minority of users are registered, so we can use them as a sample.
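The registered-user idea can be sketched roughly as follows: count the distinct cookies seen for each registered user, take the average, and divide the total cookie count by it. This is only an illustration, not the actual method used on the project. The `(user_id, cookie)` input format and the function names are hypothetical. It also leans on the big assumption that registered users churn through cookies at the same rate as everyone else.

```python
from collections import defaultdict

def cookies_per_user(registered_events):
    """Average number of distinct cookies per registered user.

    registered_events: iterable of (user_id, cookie) pairs observed
    during the month (hypothetical input format).
    """
    cookies_by_user = defaultdict(set)
    for user_id, cookie in registered_events:
        cookies_by_user[user_id].add(cookie)
    if not cookies_by_user:
        raise ValueError("no registered users in the sample")
    total_cookies = sum(len(cookies) for cookies in cookies_by_user.values())
    return total_cookies / len(cookies_by_user)

def estimate_unique_users(total_cookies, registered_events):
    """Scale a raw cookie count down by the sampled cookies-per-user ratio.

    Assumes registered users are representative of all users.
    """
    return total_cookies / cookies_per_user(registered_events)

if __name__ == "__main__":
    # Toy sample: user "a" has 2 cookies, user "b" has 3 (one seen twice).
    sample = [("a", "c1"), ("a", "c2"),
              ("b", "c3"), ("b", "c3"), ("b", "c4"), ("b", "c5")]
    print(cookies_per_user(sample))
    print(estimate_unique_users(1000, sample))
```

Because the ratio is measured from logins rather than from a single browser, it picks up the multiple-machine case as well as deletions and browser changes.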