Thursday, June 5, 2008

Qualifications for Studying Data Mining

A recent question . . .

I am hoping to begin my masters degree in Data Mining. I have come from a Software Development primary degree. I am a bit worried over the math involved in Data Mining.Could you tell me, do I need to have a strong mathematical aptitude to produce a good Thesis on Data Mining?

First, I think a software development background is a good foundation for data mining. Data mining is as much about data (and hence computers and databases) as it is about analysis (and hence statistics, probability, and math).

Michael and I are not academics so we cannot speak to the thesis requirements for a particular data mining program. Both of us majored in mathematics (many years ago) and then worked as software engineers. We do have some knowledge of both fields, and the combination provided a good foundation for our data mining work.

To be successful in data mining, you do need some familiarity with math, particularly applied math -- things like practical applications of probability, algebra, the ability to solve word problems, and the ability to use spreadsheets. Unlike theoretical statistics, the purpose of data mining is not to generate rigorous proofs of various theorems; the purpose is to find useful patterns in data, to validate hypotheses, to set up marketing tests. We need to know when patterns are unexpected, and when patterns are expected.

This is a good place to add a plug for my book Data Analysis Using SQL and Excel, which has two or three chapters devoted to practical statistics in the context of data analysis.

In short, if you are math-phobic, then you might want to reconsider data mining. If your challenges in math are solving complex integrals, then you don't have much to worry about.