Last week, I travelled to Malaysia to apply data mining techniques to business-to-business (B2B) problems. More typically, the work we do at data miners is in the world of business-to-consumer (B2C) , since a company with millions of customers is more likely to need our services. However, a certain gentleman read my book Data Analysis Using SQL and Excel and wanted to apply the ideas to his business.
In this case, the business is a large computer hardware business, that sells complex customized systems to customers. In fact, the systems are so complex that they have a special organization to configure the systems. The joke is that this allows the sales force to spend more time on the golf course. More seriously, it reduces configuration mistakes and helps ensure that the end customers receive the right system.
The data that we worked with was primarily sales data. The data is not tiny; several tables contain millions of rows, and the whole database probably occupies about ten gigabytes of data. However, the data structure is more complex than in a typical B2C database.
First, the important item of data is a quote. This is a configuration of hardware, software, networking equipment, and services provided to the customer. Some quotes result in orders; many do not. Each customer typically has many quotes, and several employees work on each quote. However, typically only one of those employees (let's call that person the sales rep) is credited with the sale.
A quote contains lists of items. However, these items are at the most detailed level (let's call it the UPC -- universal product code -- level). The hierarchy of products is important, particularly the two highest levels.
The highest level is the product category -- hardware, software, and so on. The next highest level is the catalog-listing level. That is, the product name in the catalog. For instance, I am writing this on an old Dell Precision M60 -- that is the "catalog" name. However, the specific disk size, number of megabytes, peripherals, and software might differ from computer to computer. Does it have a 40 Gig disk or an 80 Gig disk? And so on.
What type of questions might be answered by a history of customer quotes, with associated dimension information?
One question is the "quote-to-close" question. This is a simple time-to-event problem of how long from when a quote is created to when the quote results in an order. Generally, companies in this business look at the close rate for quotes, but not the timing of the close. However, the timing provides important information, particularly when it is changing over time.
A related question is what affects the quote-to-close time? Complex quotes with many different components might lengthen the time. Similarly, large discounts that require more levels of approval might lengthen the time. In this case, we were surprised that certain small quotes had a very long time. These small quotes had large percentage discounts, and were often paired with larger orders. The large percentage discount required more levels of approval -- a sign that the company is spending too much effort in the wrong place.
Such data also suggests market basket analysis. "What products are purchased together?" is not quite the right question. Instead, the question is "What configurations are most common?"
Common configuations lead to additional questions. Which configurations (if any) are associated with longer quote-to-close times? Which configuraitons are associated with longer install times? Some products are readily available, some have a longer lead time, and some are built-to-order. The logistics database is not necessarily available for quotes. And, the historical logistics database (what is the lead time for products at a given point in the past) probably does not even exist.
Another genre of question is related to purchase velocity. After a customer makes a purchase, when does the next purchase take place? In this business, purchases can be one of several types. A new purchase represents new equipment; an upgrade is adds additional components to existing equipment; and a replacement replaces one model for another. The pattern of purchases is interesting, both at the customer level and the sales rep level. Some sales reps may be focusing on upgrade business (which is relatively easy, since the customer already has the product), and not expanding into new customers.
A related idea to purchase velocity is to categorize customers by the time since their most recent purchase. What volume of purchases are made by customers who purchased something in the previous month? In the previous twelve months? By year? Most customers make at least one purchase per year. For the dormant customers who return, what caused them to come back.
B2B data is more complicated than B2C, because purchases are more complex and customers are organizations with multiple sites and contacts. Often, people are paid to understand the customers and to meet their needs. However, sales reps understand individual customers. There is ample opportunity for data mining to learn patterns about customers in general and to learn what works and does not work ijn particular circumstances.