Thursday, June 5, 2008

Qualifications for Studying Data Mining

A recent question . . .

I am hoping to begin my master's degree in Data Mining. I have come from a Software Development primary degree. I am a bit worried about the math involved in Data Mining. Could you tell me, do I need to have a strong mathematical aptitude to produce a good thesis on Data Mining?

First, I think a software development background is a good foundation for data mining. Data mining is as much about data (and hence computers and databases) as it is about analysis (and hence statistics, probability, and math).

Michael and I are not academics, so we cannot speak to the thesis requirements for a particular data mining program. Both of us majored in mathematics (many years ago) and then worked as software engineers. We do have some knowledge of both fields, and the combination provided a good foundation for our data mining work.

To be successful in data mining, you do need some familiarity with math, particularly applied math -- things like practical applications of probability, algebra, the ability to solve word problems, and the ability to use spreadsheets. Unlike theoretical statistics, the purpose of data mining is not to generate rigorous proofs of various theorems; the purpose is to find useful patterns in data, to validate hypotheses, and to set up marketing tests. We need to know when a pattern is unexpected and when it is just what we would expect.
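As a rough illustration of the kind of applied math involved in judging whether a pattern is unexpected, here is a minimal Python sketch of a two-proportion z-test on a marketing test. The function names and the response counts are hypothetical, made up for this example; it shows one common textbook approach, not necessarily the method we use in our own work.

```python
import math

def two_proportion_z(resp_a, n_a, resp_b, n_b):
    """Return the z-score for the difference between two response rates.

    Uses the pooled response rate to estimate the standard error,
    which assumes both groups share one underlying rate (the null hypothesis).
    """
    rate_a = resp_a / n_a
    rate_b = resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_a - rate_b) / std_err

def two_sided_p_value(z):
    """Return the two-sided p-value for a z-score under the normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical test: 10,000 customers per group, 500 vs. 560 responders.
z = two_proportion_z(500, 10_000, 560, 10_000)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A small p-value suggests the difference between the two groups would be unexpected if the offers really performed the same; a large one suggests the difference is the sort of thing chance alone produces.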

This is a good place to add a plug for my book Data Analysis Using SQL and Excel, which has two or three chapters devoted to practical statistics in the context of data analysis.

In short, if you are math-phobic, then you might want to reconsider data mining. If your only challenge in math is solving complex integrals, then you don't have much to worry about.



  1. Agree. People get hung up on math when all you need is a processor and an Excel spreadsheet.

    Grab a shovel and enjoy the dig!

  2. Re the previous comment:
    Well, the people who build processors and spreadsheets had to get hung up on math... :)
    I think if one enjoys abstract algebra and statistics there should not be any problems with data mining.

  3. What are the best universities for an MS in data mining in New York State?

  4. Yes, you need a solid knowledge of mathematics to have good data mining potential, especially graph theory, advanced linear and nonlinear optimization, mathematical modeling, linear algebra, discrete mathematics, etc.

