Profit and Predictability

By Ed Cohen | Winter 2012

Learning to Decode and Leverage Likely Futures

The first thing you learn in Business Forecasting and Data Mining is how to read Professor Barry Keating’s syllabus, which looks something like this:

Gur svefg guvat lbh yrnea va Ohfvarff Sberpnfgvat naq Qngn Zvavat vf ubj gb ernq Cebsrffbe Oneel Xrngvat’f flyynohf, juvpu ybbxf fbzrguvat yvxr guvf:

Like Keating’s syllabus, the text above has been encrypted, or “enciphered.” So the first assignment for students is, literally, to decipher the course syllabus. The first test, they’re told, will cover basic information contained in the coded document, such as what percentage of the final grade will be derived from test scores.

It isn’t that Keating, Notre Dame’s Jesse H. Jones Professor of Finance, enjoys messing with students’ minds. The reason he encrypts the syllabus and spends the first class talking about nothing but cryptography is simple: Forecasting is just like decrypting, he says. You’re faced with a mind-boggling array of information, and the only way to understand it is to discover the patterns.

In business, pattern recognition can mean noticing how, year after year, your company’s sales are higher (or lower) in certain seasons or during certain points in the macroeconomic business cycle. With such insight, you could adjust your manufacturing or staffing to maximize profit.
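As a minimal sketch of that kind of pattern-spotting, the Python below computes a simple seasonal index for each quarter: the quarter’s average sales divided by the average across all quarters. The sales figures and the `seasonal_indices` name are hypothetical, and this ratio-to-overall-average method is a simplified stand-in that ignores trend; it is not necessarily the technique taught in the course.

```python
from statistics import mean

def seasonal_indices(quarterly_sales):
    """Return one index per quarter (Q1..Q4): that quarter's average sales
    divided by the average of all quarters. Above 1.0 marks a strong season.

    quarterly_sales is a list of years, each a list of four quarterly values.
    This simple ratio ignores any long-term trend in the data.
    """
    overall = mean(v for year in quarterly_sales for v in year)
    return [mean(year[q] for year in quarterly_sales) / overall for q in range(4)]

# Hypothetical unit sales for three years, invented for illustration.
sales = [
    [80, 100, 90, 130],
    [84, 104, 95, 137],
    [88, 110, 99, 143],
]
for q, idx in enumerate(seasonal_indices(sales), start=1):
    print(f"Q{q}: {idx:.2f}")
# Q1: 0.80
# Q2: 1.00
# Q3: 0.90
# Q4: 1.30
```

An index of about 1.30 for the fourth quarter says those sales run roughly 30 percent above a typical quarter, the kind of signal that could guide manufacturing or staffing decisions.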

Keating describes how Walmart demands that its suppliers share their sales data with Walmart executives. The world’s largest retailer wants to forecast how much of a product it’s likely to sell at specific stores at particular times of the year. Walmart’s dream: to have the last unit of a particular product sell just as the tractor-trailer pulls into the parking lot with replacement supplies.

“They want to make the supply chain as short as possible,” Keating says, “and the only way they can do that is by being better at forecasting than anybody else. And they are.”

The same could be said of Keating and this course, which is offered as an elective to both undergraduates and MBA students. Keating has been teaching forecasting for 20 years and literally wrote the book on the subject, along with co-author J. Holton Wilson of Central Michigan University. Now in its sixth edition, Business Forecasting is the most-used forecasting text in the world, according to its publisher, McGraw-Hill. It’s been translated into Mandarin, among other languages.

The course moves in stages from cryptography to forecasting—looking at quarterly results for clues to likely outcomes no further than 18 months out—to data mining.

As the name implies, data mining involves unearthing precious management metal from mountains of recorded information. In the Information Age, the volume of such material has become almost unfathomable. For instance, today’s sophisticated online merchants, Keating says, record every keystroke of every visitor to their sites.

“It is a gold mine,” Keating says. “It’s just like the ’49ers in California. This data is just waiting to give up its secrets, and now we have some tools to get it.”

One of those tools is the data-mining software included with the course textbook. The current edition comes with a CD of statistical-analysis software and datasets drawn from the actual business records of companies that are household names. Developed by a researcher at MIT for educational purposes, the software is similar to the popular—but expensive—commercial program SAS Enterprise Miner. 

In one assignment, students are given about 25 categories of information on individual customers of a bank, including the types of accounts they have, the size of their families and the ZIP code where they live. The assignment: Use the data-mining software to determine which customers are likely to take out a personal loan.

Keating says students start by “partitioning” or splitting the data in half. The software analyzes the first half to chart the characteristics most common to people who have already taken out loans. This creates a model. Students then apply the model to the second half of the customers. The object is to see if the model provides any “lift.” That is, does it predict who in the second group has taken out loans any better than just guessing?

In the dataset used in the assignment, it does. The model’s prediction turns out to be 7½ times better than guessing. As Keating explains, the bank could use that knowledge to limit a direct-mail effort to just the subset of customers—about 10 percent—most likely to be interested in a loan.

“I’ve just gotten rid of 90 percent of my (marketing) cost,” says Keating.
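The partition-and-lift procedure lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the course’s data-mining software: the customer records are randomly generated, the single `cd_account` field is a hypothetical stand-in for the roughly 25 real categories, and `fit_rate_model` is a deliberately crude model (the loan rate for each value of one feature). Lift is computed as the loan rate among the top-scoring 10 percent of the holdout half divided by the loan rate across the whole holdout half. The final lines check the direct-mail arithmetic, assuming the 7½-times figure refers to that same top 10 percent, which the article does not specify.

```python
import random

def partition(records, seed=42):
    """Shuffle a copy of the records, then split it into a training half
    (used to build the model) and a holdout half (used to test it)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def fit_rate_model(train, feature):
    """A deliberately simple 'model': for each value of one feature, the share
    of training customers with that value who took out a loan."""
    counts = {}
    for row in train:
        took, total = counts.get(row[feature], (0, 0))
        counts[row[feature]] = (took + row["loan"], total + 1)
    return {value: took / total for value, (took, total) in counts.items()}

def lift_at(scores, outcomes, fraction):
    """Loan rate among the top `fraction` of customers ranked by score, divided
    by the loan rate among all of them. 1.0 means no better than guessing."""
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("no loan takers in this set; lift is undefined")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    return top_rate / base_rate

def share_reached(lift, fraction_mailed):
    """Share of all likely borrowers inside the mailed group. Lift is a ratio
    of response rates, so lift * fraction gives that share (capped at 1.0)."""
    return min(1.0, lift * fraction_mailed)

# Hypothetical bank customers: holders of a made-up CD account are assumed
# to borrow far more often than everyone else.
rng = random.Random(0)
customers = []
for _ in range(1000):
    has_cd = rng.random() < 0.10
    took_loan = rng.random() < (0.50 if has_cd else 0.05)
    customers.append({"cd_account": int(has_cd), "loan": int(took_loan)})

train, holdout = partition(customers)
model = fit_rate_model(train, "cd_account")
scores = [model.get(row["cd_account"], 0.0) for row in holdout]
outcomes = [row["loan"] for row in holdout]
top_decile_lift = lift_at(scores, outcomes, 0.10)
print(f"lift in the top 10 percent of the holdout: {top_decile_lift:.1f}")

# Direct-mail arithmetic with the article's figures (lift 7.5, mail 10 percent).
print(f"cost cut by {1 - 0.10:.0%}, borrowers reached {share_reached(7.5, 0.10):.0%}")
# prints: cost cut by 90%, borrowers reached 75%
```

Real tools fit far richer models, such as classification trees or logistic regression, but the test is the same: a lift meaningfully above 1.0 on customers the model has never seen.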

The same bank could also use data mining to retain customers by looking for characteristics common to those who had left the bank. It could then lavish giveaways or special offers on similar customers to squash any idea of switching banks.

The practical, applicable nature of this course likely contributes to its enduring appeal. In the fall, for example, Keating taught two sections to undergrads. The classroom had a capacity of 42. Both classes started at 7:30 a.m., a time anathema to most students. Yet every seat was taken, and there was a waiting list 41 names long.

“The course was one of my favorites,” says one of the MBA enrollees from last spring, Jared Shawlee (MBA ’11). “I use it quite a bit in this job.”

The senior director of ticket sales and strategy for the San Jose Earthquakes Major League Soccer team, Shawlee is developing what’s called a dynamic-pricing strategy for the team’s tickets. Historically, the team followed the conventional model of charging the same price for every regular-season home game. But as fans know, some games are more desirable than others, such as when an opposing star player comes to town, during better weather months, or when a particular promotion is offered (think Bat Day or fireworks nights in baseball).

Starting next year, there will be different prices for Earthquakes games based on demand patterns from previous years. In other words, game prices will be set before the season starts based on historic data and forecasting. Then, during the season, prices for the remaining tickets will rise and fall depending on interest factors, such as whether the team is winning.

The ticket strategist says he won’t be crunching the numbers himself for the forecasting aspect of the project. Rather, he’ll be involved in selecting a consultant to do that.

Keating says that’s typical of alumni of the course. Most don’t go on to become full-time forecasting or data specialists. But almost all business people will face challenges involving forecasting at some point in their careers, he says.

Every student leaves Business Forecasting and Data Mining with an understanding of a favorite mantra of Keating’s: “More data is better than less data.” It means that in business forecasting, as in cryptography, the more information you have, the easier it is to spot patterns. That’s why military commanders are advised to keep even coded messages short: brevity makes them harder to crack.

The second paragraph of this article employs a simple shift cipher of the kind credited to Julius Caesar. It is just a repeat of the first paragraph with the letters changed according to a fixed substitution pattern: each letter has been moved forward 13 places in the alphabet, wrapping around from “z” back to “a.” What was an “a” is now an “n,” and so on.
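A shift cipher is mechanical enough to reproduce in a few lines. In this illustrative Python sketch the `caesar_shift` name is our own; it accepts any shift, and with a shift of 13 it decodes the start of the second paragraph. Because 13 is exactly half of 26, applying the same shift twice returns the original text, so one function both encodes and decodes.

```python
import string

def caesar_shift(text, shift):
    """Move each letter `shift` places forward in the alphabet, wrapping from
    'z' back to 'a'. Case is preserved; spaces and punctuation pass through."""
    shift %= 26  # allows negative shifts (for decoding) and shifts above 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)

cipher = "Gur svefg guvat lbh yrnea va Ohfvarff Sberpnfgvat naq Qngn Zvavat"
print(caesar_shift(cipher, 13))
# prints: The first thing you learn in Business Forecasting and Data Mining
```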

The code for Keating’s syllabus is astronomically harder. It employs a random letter-substitution pattern.

Keating tells students that if they tried a billion combinations a second, it would take 12 billion years to exhaust every possibility. Such codes were thought to be unbreakable, he says, until a researcher discovered that some letters, such as “e,” appear in words much more often than others. The letter-usage rates are the basis for the number of tiles and the different point scores assigned to each letter in the game Scrabble.
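Keating’s 12-billion-year figure survives a quick check. A random substitution can rearrange the 26 letters in 26! (26 factorial) ways, so an exhaustive search must, in the worst case, try every one. The sketch below assumes a billion keys tested per second, as in Keating’s example, and a 365.25-day year.

```python
import math

keys = math.factorial(26)            # number of possible letter substitutions
tries_per_second = 1_000_000_000     # a billion combinations a second
seconds_per_year = 365.25 * 24 * 60 * 60

years = keys / tries_per_second / seconds_per_year
print(f"{keys:.3e} keys, about {years / 1e9:.1f} billion years")
# prints: 4.033e+26 keys, about 12.8 billion years
```

The result, roughly 12.8 billion years, is consistent with the figure Keating gives students.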

Keating explains that even with random-letter substitution, students can decipher the course syllabus if they are given a sufficiently long sample of scrambled text (the syllabus itself, for example) and a software tool that analyzes letter frequency (which Keating provides).

“It usually takes them 12 to 15 minutes,” he says.

Leaving plenty of time to prepare for the first test.
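The article does not describe the letter-frequency tool Keating provides, but the core idea of frequency analysis can be sketched in Python. In this hypothetical version, the function names and the English ranking in `ENGLISH_ORDER` are assumptions: it counts how often each letter appears in the ciphertext and pairs the most common cipher letters with the most common English letters as a first guess at the key.

```python
import string
from collections import Counter

# A commonly cited ranking of English letters, most to least frequent.
# Published tables disagree slightly, especially near the bottom.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def letter_frequencies(ciphertext):
    """Count each letter (case-insensitive), most common first."""
    letters = (ch for ch in ciphertext.lower() if ch in string.ascii_lowercase)
    return Counter(letters).most_common()

def first_guess_key(ciphertext):
    """Pair cipher letters with English letters by frequency rank. This is
    only a starting point that a human solver then corrects word by word."""
    ranked = [letter for letter, _ in letter_frequencies(ciphertext)]
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_key(ciphertext, key):
    """Replace each cipher letter with its guess; other characters pass through."""
    return "".join(key.get(ch, ch) for ch in ciphertext.lower())

sample = "Gur svefg guvat lbh yrnea va Ohfvarff Sberpnfgvat naq Qngn Zvavat"
print(letter_frequencies(sample)[:5])
print(apply_key(sample, first_guess_key(sample)))
```

On the one-sentence sample (the opening of the enciphered second paragraph), the rank-based guess starts off wrong: the most common letter in that sentence is “n” (enciphered as “a”), not “e.” That is Keating’s mantra in miniature. A single sentence is too short for its letter counts to match typical English; with a page-long syllabus the counts settle down, the most frequent letters fall into place, and a solver can fill in the rest by recognizing partial words.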