Yesterday, I had the pleasure of being on a panel for a local TDWI event here in New York focused on advanced analytics (thank you Jon Deutsch). Mark Madsen of Third Nature gave an interesting, if rapid-fire, overview of data mining technologies. Of course, I was excited to see that Mark included Data Analysis Using SQL and Excel as one of the first steps in getting started in data mining -- even before meeting me. Besides myself, the panel included my dear friend Anne Milley from SAS, Ali Pasha from Teradata, and a gentleman from Information Builders whose name I missed.
I found one of the questions from the audience to be quite interesting. The person was from the IT department of a large media corporation. He has two analysis groups, one in Los Angeles that uses SPSS and the other in New York that uses SAS. His goal, of course, is to reduce costs. He prefers to have one vendor. And, undoubtedly, the groups are looking for servers to run their software.
This is a typical IT-type question, particularly in these days of reduced budgets. I am more used to encountering such problems in the charged atmosphere of a client. The more relaxed atmosphere of a TDWI meeting perhaps gives a different perspective.
From the perspective of an IT director, the groups are doing the same thing. Diving in a bit further, the two groups do very different things -- at least from my perspective. Of course, both are using software running on computers to analyze data. The group in Los Angeles is using SPSS to analyze survey data. The group in New York is doing modeling using SAS. I should mention that I don't know anyone in the groups, and only have the cursory information provided at the TDWI conference.
Conflict Alert! Neither group wants to change and both are going to put up a big fight. SPSS has a stronghold in the market for analyzing survey data, with specialized routines and procedures to handle this data. (SAS probably has equivalent functionality, but many people who analyze survey data gravitate to SPSS.) Similarly, the SAS programmers in New York are not going to take kindly to switching to SPSS, even if it offers the same functionality.
Each group has the skills and software that it needs. Each group has legacy code and methods that are likely tied to its tools. The company in question is not a 20-person start-up. It is a multinational corporation. Although the IT department might see standardizing on a tool as beneficial, in actual fact, the two groups are doing different things and the costs of switching are quite high -- and might involve losing skilled people.
This issue brings up the question of what we want to standardize on. The value of advanced analytics comes in two forms. The first is the creative process of identifying new and interesting phenomena. The second is the communication process of spreading the information where it is needed.
Although people may not think of nerds as being creative, really, we are. It is important to realize that imposing standards or limiting resources may limit creativity, and hence the quality of the results. This does not mean that cost control is unnecessary. Instead, it means that there are intangible costs that may not show up in a standard cost-benefit analysis.
On the other hand, communicating results through an organization is an area where standards are quite useful. Sometimes the results might be captured as a simple email going to the right person. Other times, the communication must go to a broader audience. Whether by setting up an internal wiki, updating model scores in a database, or loading a BI tool, having standards is important in this case. Many people are going to be involved, and these people should not have to learn special tools for one-off analyses -- so, if you have standardized on a BI tool, make the resources available to put in new results. And, from the perspective of the analysts, having standard methods of communicating results simplifies the process of transforming smart analyses into business value.