

Is the Caller the Killer? 911 Call Analysis Can’t Give You the Right Answer

Many forensic techniques are junk science. We can add 911 call analysis to the pile.


Jonathan Jarry M.Sc. | 24 Mar 2023

“Police thought they could read her mind just by listening.” It’s a chilling statement in reporter Brett Murphy’s ProPublica coverage, and it sums up the pseudoscience of 911 call analysis. One of the founders of this technique, deputy police chief Tracy Harpster of Moraine, Ohio, is said to have plainly stated that he knows “what a guilty father, mother or boyfriend sounds like.” How? By listening to their 911 call.

911 call analysis is one of several forensic techniques born of shoddy science that have lately been put on trial, not by judges but by scientists and investigative journalists. The idea behind this latest bit of junk science is bewitching. Calls to the 911 emergency service are generally not rehearsed. A perpetrator who calls 911 about their own victim while maintaining their innocence has to work harder to lie to a dispatcher on the phone than to lie in a written statement. And the call is recorded, which means an expert could pick it apart. What if there are telltale signs of guilt in how the call is made?

Diving into the origin of 911 call analysis, however, reveals a poor initial study, limited attempts at replication, and law enforcement services that don’t seem to vet the pseudoscientific material delivered to their officers via seminars. The outcome? Innocent people being convicted of crimes they didn’t commit.

One hundred calls for help

While attempts at detecting lies are nothing new, the idea of analyzing 911 calls for their truth value saw the light of day in 2004 with the publication of a book that contained a section on fire emergency calls in London. The author, John Olsson, wanted to know if there was a way to single out hoax calls, and his experience led him to think that such calls lacked a sense of urgency.

But it wasn’t until a few years later that we saw the true birth of 911 call analysis. It was a study of one hundred homicide calls to 911 that tested twenty different variables that might be associated with a caller’s guilt or innocence. The authors were Tracy Harpster, from the Moraine Police Department, and Susan H. Adams and John P. Jarvis, from the FBI. This analysis began as Harpster’s thesis as a criminal justice Master’s student, before morphing into a scientific paper, a copyrighted checklist, then a series of seminars taught all over the United States, and finally a book.

Taken at face value, the findings of the study make some intuitive sense. Guilty callers, who either committed or assisted in the homicide they are reporting to the 911 operator without confessing to it, were more likely to provide extraneous information not relevant to the emergency. They more often answered the dispatcher’s questions in evasive ways. They might not make a desperate plea for help, because they know that nothing can be done to save their own victim.

But the study itself, subtitled “an exploratory analysis,” was flawed in many ways. By the authors’ own admission, almost two-thirds of the 911 calls analyzed came from police departments in Ohio, which makes generalizations suspect. Any call made by someone who was extremely intoxicated was cast aside. All callers were English speakers from the United States. More importantly, researchers who subsequently tried to replicate these findings pointed out that the way in which these one hundred 911 calls were chosen and analyzed lacked transparency. Had they been randomly selected or not? And while half of them had been adjudicated as guilty and the other half as innocent, had the people coding these calls been blinded to this outcome? Did they know that a call was made by a guilty party while they were analyzing it? This sort of qualitative analysis, where a scientist combs through a 911 call and highlights the presence or absence of features, is very susceptible to motivated reasoning. If I’m coding a call, I know the caller is guilty of the homicide, and I want my theory to be true, my analysis can be subconsciously nudged.

The findings from this original study should raise even more questions. How do they apply to people who have a cognitive impairment? A speech impediment? Chronic anxiety? People with varying degrees of education? Autistic callers? People whose first language is not English? Harpster’s preliminary analysis cannot provide answers.

What it did do, however, is produce a book.

Attempts at replication

It’s called Analyzing 911 Homicide Calls: Practical Aspects and Applications. It was released in 2021 and is coauthored by Harpster and Adams.

The book is based on that original study of theirs, complemented by another set of one hundred calls, the analysis of which remains, it seems, unpublished. Each chapter focuses on a different cue that is allegedly more common in guilty or innocent calls. The reliability of each indicator is given a patina of scientific credibility by the inclusion of a pie chart. For example, chapter 3’s pie chart informs us that 79% of immediate pleas for help are made by innocent callers, with the remaining 21% coming from guilty callers. If these percentages were based on an international data set of 10,000 calls, they would be more impressive; coming from flawed and limited work, these pie charts have the potential to mislead more than to inform.

Reading the book, we are told that guilty callers are less likely to volunteer an immediate assessment of the victim’s condition, and a caller requesting police instead of an ambulance may suggest a crime. We read that innocent callers may comment on the “bleeding” they see, but that guilty parties refer to “blood” instead. A criminal sounds like they have accepted the victim’s death; innocent people, however, simply will not believe their loved one is deceased. Though building an entire book on two hundred 911 calls already puts us on a flimsy foundation, the authors add indicators that merely trend toward guilt, meaning they were seen in too few 911 calls to know whether they were even remotely reliable. They include cues like a caller asking the 911 operator, “Should I touch him?” (guilty!) and making a comment about the victim’s eyes (guilty!). And if you start recounting to the dispatcher a conversation that you and the victim had earlier, that’s another potential indictment.

No single indicator, the authors tell us, is enough to determine guilt, however. But the copyrighted “911 COPS Scale” included in the book is meant to get dispatchers, cops and detectives checking each of these telltale signs off as they go through a 911 call. The more of them they find, the more likely the caller is to be the suspect.

To base an entire forensic technique on this material alone is irresponsible, but that does not mean that there is nothing true about this checklist. To figure this out, we need replications. Unfortunately, the researchers who have attempted to reproduce these findings and expand the scope of 911 call analysis have run into the same problems as Harpster and his co-authors. A Master’s thesis by Jon Cromer only looked at fifty 911 calls, with almost half coming from Virginia. A study by Michelle Miller and others looked at a total of one hundred and seventy-five calls, but the calls came from a non-random convenience sample, with most of them emerging from military law enforcement cases. Yet another study, this one by Patrick Markey and others, used two different sets of calls, one to explore hypotheses and the other to confirm them, while a recent paper by Daniel O’Donnell and colleagues analyzed seventy 911 calls not for homicides but for mysterious disappearances of children.

Some of these studies improve on Harpster’s original methodology by pre-registering their studies to improve transparency, by better defining the cues Harpster was reporting on, or by masking the coders to the guilty or innocent verdict the 911 caller would eventually receive. But they all use too few 911 calls to be decisive. For example, in O’Donnell’s study of missing children calls, thirteen innocent callers showed an initial delay in speaking to the operator, versus nineteen callers making a false allegation. This was seen as a significant difference, but the small numbers at play should make us question how generalizable or even real this finding truly is.

Many of these studies rely on 911 calls that were released publicly and were found on the Internet. It’s impossible to know how representative these calls are, and the mish-mash of agreements and disagreements with Harpster’s original data only serves to remind us that we are not dealing with settled science, not even close.

Reading tea leaves

911 call analysis is yet another attempt at selling an easy way to detect lies. History, unfortunately, is littered with unreliable lie detectors, from the polygraph all the way to the nonverbal behaviour we see pundits comment on when they analyze how politicians behave in front of cameras. Did they scratch their nose? Did they reach for a glass of water before answering an invasive question? Did their eyes dart to the left or to the right? None of this matters. As nonverbal communication expert Vincent Denault likes to say, there is no sign comparable to Pinocchio’s nose in real life. Lie detection is hard, and even trained experts aren’t very good at it.

Meanwhile, forensic sciences as a whole have not fared well in recent years. Many of them, especially those based on pattern matching, are nothing more than junk science. Bite mark analysis is a good example. The claims that our teeth are unique and that bite marks left in skin can lead to a reliable match are just assumptions, and they are not supported by good scientific evidence. So-called experts can’t even agree with each other or reliably determine if bite marks were left by a human or an animal. Yet this type of evidence can end up in court and play a role in condemning a suspect when it’s presented as science.

Reporter Brett Murphy has done a fantastic job exposing the incredibly frail underpinning of 911 call analysis for ProPublica. Per his account, Tracy Harpster is paid up to USD 3,500 to teach an eight-hour class on his 911 call analysis theory. Hundreds of 911 dispatchers, prosecutors, coroners, and police officers have taken this course. Murphy tracked down over one hundred cases in twenty-six states where 911 call analysis was pivotal in arrests, prosecutions, and convictions. John Jarvis, the third author on that original watershed paper, told Murphy that he was uncomfortable with 911 call analysis being used in real cases. “There’s no definitive answer as to whether this is useful,” he is quoted as saying.

The way in which it is used is quite sneaky. Harpster himself is not brought in as an expert witness. Rather, law enforcement personnel who have attended his seminar are identified and told how to testify about the 911 call analysis guilt indicators by simply and vaguely mentioning their training and experience. The junk science origin of the method is obscured. Instead, a cop testifies that in their experience an innocent caller would make a direct plea for help, for example. Meanwhile, opening and closing arguments remind the jurors about the ways in which a typical person allegedly should or shouldn’t react during an emergency, thus priming them to listen to the 911 call with suspicion. Harpster himself never has to put up with cross-examination as an expert, nor can a legal precedent prohibit the admission of his technique in court, because it’s always brought forth indirectly.

There may yet be something useful to rescue from 911 call analysis, though given past efforts in lie detection and the fact that not everyone reacts to an emergency in exactly the same way, I’m not holding my breath. If there are guilt indicators to be salvaged here, we will need better studies to identify them: a large enough sample of representative 911 calls; proper blinding of the data coders; an exploratory data set from which to identify promising leads and a second data set to see if these leads hold water.

For now, I will leave you with Brett Murphy’s pithy observation: “Such judgments often amount to reading tea leaves.”

Take-home message:
- The idea behind 911 call analysis is that there are indicators of guilt and innocence that can be heard during the call to determine if the caller actually did the crime they are reporting
- 911 call analysis is based on a flawed study of one hundred 911 homicide calls, and small-scale attempts at replicating its results have resulted in a mess of conflicting findings
