Comments on Data Miners Blog: Data Mining and Statistics
Michael J. A. Berry (http://www.blogger.com/profile/06077102677195066016)
Updated: 2021-09-20T02:39:30.570-04:00

Anonymous, 2009-10-08T11:38:41.986-04:00
Yes, statisticians make assumptions, but that's why there's model validation and residual analysis. To me, a big difference between "data miners" and statisticians, especially academic statisticians, is the emphasis placed on model analysis, residual analysis, and validation of assumptions. I usually find that data miners think they are doing a lot to validate their models when they really aren't. In my opinion, data miners rely too much on only a couple of small model tests (lift, confusion matrix, etc.).

B. Depaire, 2009-10-01T11:14:33.373-04:00
To me the difference lies in the assumptions of the technique. Statistics makes more assumptions about the data than data mining does. For example, a linear regression assumes that the error term is normally distributed with constant variance. This assumption (among others) is needed in order for the confidence intervals to be efficient and unbiased.
Standard data mining techniques, such as decision trees or association rules, make far fewer assumptions about the data. This makes data mining much more flexible but, in my opinion, also less precise when generalizing.

However, I don't believe a clear-cut distinction exists!

Ken Aston, 2009-09-27T02:23:30.299-04:00
The statistics you describe is qualitative statistics. But there has always been quantitative statistics dealing with large amounts of data as well, apart from data mining.

Anonymous, 2009-09-23T14:07:46.150-04:00
To me there is a clearer distinction: statistics is used to test hypotheses, whereas data mining is used to generate the hypotheses.
Suggestion: collect data on statisticians and data miners (what they do, their data, etc.) and run a logistic regression or decision tree to see which variables separate them.

marcelcaraciolo (https://www.blogger.com/profile/03000508520057818811), 2009-09-23T09:10:05.431-04:00
Excellent post! I have to agree... data miners live in a big-data world and, moreover, can extract small, relevant data from this big world and make the same assumptions when applying it to all the data.

Anonymous, 2009-09-22T08:08:25.127-04:00
Thanks. This was a good post.