The Searchers

By Michael Hardy | Spring 2014

New "predictive coding" software makes it easier than ever to search through mountains of data. But is the software ready for use in lawsuits?

In 2010, Monique da Silva Moore, a public relations director at MSLGroup in New York, returned from maternity leave to learn that she had been fired. In response to her firing, as well as what she claimed was systematic bias against women at MSLGroup, da Silva Moore filed a $100 million class action lawsuit against MSLGroup’s French parent company, the Publicis Groupe. The suit alleged that women made up 70 percent of the corporation’s 45,000 employees, but that the senior managers were overwhelmingly male.

In order to prove her claims, da Silva Moore’s lawyers requested approximately three million electronic documents from Publicis, including internal emails, employee evaluations and promotion paperwork. Of course, no single person could read three million files.

So what’s a court to do with such an avalanche of documents?

“When you think about companies with hundreds or thousands of employees, think of the mass of documents that they produce,” says Tonia Murphy, a Mendoza College teaching professor of accountancy who specializes in business law. “It would be impossible, really, to have a person put eyes on every document and make a judgment of whether it’s responsive to the search request. You’ve got to have some methodology that you can explain to the judge and to the other party, to assure them that you’ve undertaken a reasonable search of the records that are out there.”

In a recent paper published in the American Business Law Journal, Murphy discusses the issues surrounding the choice of methodologies used to search through the mass of electronic information. Traditionally, lawyers have used keyword searches to cull the relevant documents. In the case of Da Silva Moore v. Publicis Groupe (or simply Da Silva Moore), the keywords might be “promotion” or “maternity leave.” Boolean logic, meaning the use of connectors like “and,” “but not,” or “in the same sentence as,” can make the search more sophisticated.
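To make the idea of Boolean connectors concrete, here is a minimal Python sketch of the three connectors mentioned above. The sample documents, the function names and the naive sentence splitter are all assumptions invented for illustration; they are not drawn from the case or from any real e-discovery product.

```python
import re

# Hypothetical sample documents, invented for illustration only.
DOCUMENTS = {
    "memo-1": "Her promotion was delayed. She returned from maternity leave in May.",
    "memo-2": "The promotion committee met after her maternity leave ended.",
    "memo-3": "Quarterly budget review for the New York office.",
}


def contains(text, phrase):
    """True if the phrase occurs in the text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def same_sentence(text, a, b):
    """True if both phrases occur together in at least one sentence."""
    return any(contains(s, a) and contains(s, b) for s in sentences(text))


def search(predicate):
    """Return the sorted IDs of documents whose text satisfies the predicate."""
    return sorted(doc_id for doc_id, text in DOCUMENTS.items() if predicate(text))


# "promotion" AND "maternity leave"
both = search(lambda t: contains(t, "promotion") and contains(t, "maternity leave"))

# "promotion" BUT NOT "committee"
but_not = search(lambda t: contains(t, "promotion") and not contains(t, "committee"))

# "promotion" IN THE SAME SENTENCE AS "maternity leave"
same = search(lambda t: same_sentence(t, "promotion", "maternity leave"))

print(both, but_not, same)
```

Each connector narrows the result set in a different way, which is why a well-built Boolean query can be far more selective than a single keyword.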

Recently, however, software companies have developed a new method of searching electronic data known as “predictive coding.” Instead of just searching for particular words, predictive coding software searches for related concepts. For instance, a search for “maternity leave” might turn up references to babies, mothers, pregnancy and childbirth. According to advocates of predictive coding, the method is more accurate than keyword searching and cuts down on costs because it turns up fewer irrelevant documents.
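The difference between matching words and matching concepts can be shown with a toy Python sketch. It illustrates only the description above and is not how any vendor's software works: the hand-written table of related terms, the sample emails and the function names are all assumptions made up for this example, and real products rely on their own proprietary methods.

```python
# Hypothetical concept map, hand-written for illustration; real tools
# derive such relationships by their own proprietary methods.
RELATED_TERMS = {
    "maternity leave": ["baby", "mother", "pregnancy", "childbirth"],
}

# Hypothetical sample emails, invented for illustration only.
DOCUMENTS = {
    "email-1": "She asked about maternity leave policy.",
    "email-2": "Congratulations on the new baby! When is she back?",
    "email-3": "Lunch order for the client meeting.",
}


def keyword_hits(query):
    """Documents that contain the literal query text (case-insensitive)."""
    q = query.lower()
    return sorted(d for d, text in DOCUMENTS.items() if q in text.lower())


def concept_hits(query):
    """Documents that contain the query or any term listed as related to it."""
    terms = [query] + RELATED_TERMS.get(query, [])
    return sorted(
        d
        for d, text in DOCUMENTS.items()
        if any(term.lower() in text.lower() for term in terms)
    )


print(keyword_hits("maternity leave"))
print(concept_hits("maternity leave"))
```

The literal search finds only the email that uses the exact phrase, while the concept search also finds the email about a new baby. The sketch uses plain substring matching, so it would also match a related term buried inside a longer word; it is meant to show the idea, not to be a usable search tool.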

The advent of predictive coding has led some lawyers to begin requesting that it be employed instead of keyword searching. In Da Silva Moore, it was the defendants who requested predictive coding. The judge in the case, Andrew Peck, was a well-known advocate of the new technology who had participated in trade shows sponsored by predictive coding vendors.

The plaintiffs in Da Silva Moore initially agreed to the use of predictive coding. However, they soon changed their minds and requested that Peck recuse himself from the case because of his apparent bias in favor of the new technology. When Peck denied the request, the plaintiffs appealed to the district judge, who upheld Peck’s ruling. The plaintiffs then appealed to the U.S. Supreme Court, which recently declined to hear the case.

In her paper, Murphy argues that the plaintiffs in Da Silva Moore and similar cases may have good reason to be skeptical about the use of predictive coding. For one thing, there are several competing manufacturers of predictive coding software, each with its own proprietary techniques. These manufacturers are understandably reluctant to divulge their methods, which makes predictive coding a bit like a black box. Even if both parties to a lawsuit agree to use the software, they may not know exactly what they’re getting.

Furthermore, there is scant empirical evidence that predictive coding produces better results than sophisticated keyword searches. Given these open questions about the technology, Murphy says that it’s certainly not appropriate for judges to begin mandating the use of predictive coding. Indeed, the parties to a lawsuit “may have legitimate, good-faith concerns that lead them to prefer keyword searching.”

Apart from questions about predictive coding’s utility, Murphy says there is another problem with judges requiring the new software in electronic discovery: it conflicts with the principle of party-driven discovery. In the American legal system, the parties are responsible for choosing a search methodology. Judges may intervene only if one of the parties is being unreasonable or obstructive. A heavy-handed intervention by a judge can create the appearance of judicial bias.

“The system is designed in a particular way for a reason,” Murphy says. “Party-driven discovery is there for a reason. It would be wise to stick with basic principles that have served the system well for so many years.”