
Our driver failed the scientific SFST. What is the probability she was impaired by alcohol? The obvious answer is to use the accuracy of the SFST. If the SFST gives the correct answer 95% of the time, then there's a 95% chance our lady's SFST result was correct. I've told you this doesn't work. There's fancy math to explain why, but for our purposes it'll work better if I just show you.

I'm going to give you an example using real numbers from a real NHTSA FST validation study, from 1977. The point here is not the accuracy of this early non-standardized FST. The point is the concept of scientific accuracy itself.

Accuracy is more complicated than you think
Do the SFST 100 times and you'll get the correct answer ACCURACY percent of the time. That's what NHTSA teaches DUI officers. But it doesn't work. Accuracy is more complicated than common sense makes you think.

Here's a Field Sobriety Test accuracy table from NHTSA's original "scientific" FST validation project, Psychophysical Tests for DWI Arrest 1977. I've changed the labels to make the thing easier to read; you can get the original at Pdfs. In this project each person tested had two measurements made: blood alcohol and a Field Sobriety Test. The question is, What percent of the time did the FST measurements correctly predict the alcohol measurements? The answer to that question will be the accuracy of the SFST.

The table sorts people by test result. Look under the pink label Officer/FST. People who failed the FST were counted in the Fail column. People who passed were counted in the Pass column. Over on the side, people whose measured alcohol was high went in the ALCOHOL high row. People whose alcohol was low went in the ALCOHOL low row. Tables like this set up True Positive, True Negative, False Positive and False Negative results in a way that makes it easy to answer important questions about this FST.

When people were guilty, how accurate was this FST? Look at the row ALCOHOL high→ . Follow the red arrow across to the "% Correct Decisions" column. See the red circle around 84? In this study when people had a high alcohol level, the FST gave the correct answer 84% of the time. When people were guilty, the accuracy of this FST was 84%.

When this FST said people were guilty, how accurate was that prediction? Look at the column Fail ↓ . Follow the orange arrow down to the "% Correct Decisions" row. See the orange circle around 53? When people failed this FST, the test was correct only 53% of the time. When this test said people were guilty, the accuracy of the test was 53%—a coin toss.

Wait, wait, wait! Those two accuracies are both about people who were guilty. How come they're different, 84% versus 53%? The answer is, the two accuracies answer questions that are subtly different. One is about people who are guilty. The other is about people the test says are guilty. Those are different groups of people, so the accuracies are different. And notice that although the difference in which group to count is subtle, the difference in accuracy (84% vs. a coin toss) is dramatic.

When people were innocent, how accurate was this FST? Blue circle, 73%.
When this FST said people were innocent, how accurate was that prediction? Green circle, 93%.
How often did this FST give the correct answer? Pink circle, 76%.
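
The five questions above are five different ratios of the same four counts in the table. Here is a minimal sketch of that arithmetic. The counts are hypothetical: I picked them only so the rounded percentages land on the five circled figures, and they are not the counts from the 1977 study. The names `Counts` and `five_accuracies` are my own.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """The four cells of a 2x2 table of test result versus measured alcohol."""
    true_pos: int   # alcohol high, failed the FST
    false_neg: int  # alcohol high, passed the FST
    false_pos: int  # alcohol low, failed the FST
    true_neg: int   # alcohol low, passed the FST


def five_accuracies(c: Counts) -> dict:
    """Return the five 'accuracies' as fractions between 0 and 1."""
    total = c.true_pos + c.false_neg + c.false_pos + c.true_neg
    return {
        # When people were guilty, how often did they fail? (red circle)
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # When the test said guilty, how often was it right? (orange circle)
        "predictive value of Fail": c.true_pos / (c.true_pos + c.false_pos),
        # When people were innocent, how often did they pass? (blue circle)
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # When the test said innocent, how often was it right? (green circle)
        "predictive value of Pass": c.true_neg / (c.true_neg + c.false_neg),
        # How often was the test right overall? (pink circle)
        "overall accuracy": (c.true_pos + c.true_neg) / total,
    }


# HYPOTHETICAL counts, chosen for illustration only; not the study's data.
example = Counts(true_pos=42, false_neg=8, false_pos=37, true_neg=100)

for name, value in five_accuracies(example).items():
    print(f"{name}: {value:.0%}")
```

Each ratio uses the same four numbers but a different denominator, which is why one table can honestly produce five different "accuracies."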

Ok, so what is the accuracy of this FST? What is the ACCURACY number that lets us administer this FST and get a result and know the percent chance that result is correct? NHTSA used the pink circle and reported this FST was 76% accurate. But if we use that statistic our probability calculations are always wrong. When we arrest people, 53% (orange circle) of those arrests will be correct, not 76%. When we don't arrest people, 93% (green circle) of the non-arrests will be correct, not 76%. And if we go to the airport and use this same FST to screen pilots before they fly, NHTSA's "76% accurate" test will correctly identify 84% (red circle) of the drunk pilots.

"Studies of screening and diagnostic tests should report sensitivity, specificity, and likelihood ratio."

Journal of the American Medical Association
Instructions For Authors, 2008, pg 8;
also in JAMA, July 2, 2008, Vol 300, No. 1

When the Journal of the American Medical Association (by impact factor, one of the top three medical journals in the world) publishes scientific papers about diagnostic tests, it requires authors to describe diagnostic test accuracy by reporting the statistics sensitivity and specificity.

NHTSA's 76% accuracy statistic does not allow us to do this FST, get a result, and know the probability that result was correct. NHTSA's statistic "accuracy" doesn't work.

Competent scientists know all this. Real science does not describe diagnostic tests with NHTSA's accuracy statistic, because it doesn't work. Competent science describes diagnostic tests with two special accuracies, sensitivity and specificity, that do allow the calculation of the probability that a test result is correct. You can learn more at: Wikipedia Bayes, web Bayes, web PPV, MEDLINE PPV.
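
To show how sensitivity and specificity allow that calculation, here is a small sketch of Bayes' rule. It uses the 84% (red circle) and 73% (blue circle) figures from the table above. The base rates in the loop are assumptions I chose for illustration, not numbers from any study, and the function name is hypothetical.

```python
def predictive_value_of_fail(sensitivity: float,
                             specificity: float,
                             prevalence: float) -> float:
    """Probability that someone who failed the test really has high alcohol.

    prevalence is the fraction of the tested group whose alcohol is high.
    """
    # Fraction of the whole group that is impaired AND fails.
    true_fail = sensitivity * prevalence
    # Fraction of the whole group that is NOT impaired but fails anyway.
    false_fail = (1.0 - specificity) * (1.0 - prevalence)
    return true_fail / (true_fail + false_fail)


SENSITIVITY = 0.84  # from the table: red circle
SPECIFICITY = 0.73  # from the table: blue circle

# ASSUMED base rates, for illustration only.
for prevalence in (0.05, 0.25, 0.50, 0.90):
    ppv = predictive_value_of_fail(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{prevalence:.0%} of drivers impaired: "
          f"a Fail is correct {ppv:.0%} of the time")
```

The test itself never changes in this loop. Only the group being tested changes, and the chance that a Fail is correct moves with it.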

Complicateder
So like all diagnostic tests, the FST doesn't have one accuracy; it has five, each answering a different question about the test. If we want to know the percent chance a DUI defendant's FST result was correct, the obvious fix is to look at those five accuracies and pick the one that answers the question relevant to DUI guilt: What is the accuracy of the FST when the test has returned the result, "Guilty"? Unfortunately this is another one of those times the simple and obvious answer doesn't work. Again there's fancy math; again it'll work best if I use an example to just show you what happens.

EXAMPLE: Validating and using a New SFST
NHTSA hires a contractor to scientifically validate a New SFST for alcohol impairment. To study a test to identify drunks, the contractor has to go where the drunks are. He validates the New SFST at Oktoberfest, where everyone is drunk. Here's what happens.

Highly trained DUI officers make New SFST measurements on Oktoberfesters. They interpret these measurements according to standardized interpretation rules. How accurate are those results? Well, NHTSA claims the real SFST gives the correct answer 95% of the time, so in this example I'll use that number. Officers test 100 Oktoberfesters. The New SFST gives the wrong answer 5% of the time, and the correct answer 95% of the time. 95 Oktoberfesters are arrested.

The question relevant to DUI guilt is: What is the accuracy of the New SFST when the test has returned the result, "Guilty"? The contractor calculates that accuracy: 95 people were judged to be guilty; blood testing proved each of those people had a high blood alcohol. 95/95 = 100%. The New SFST is 100% accurate.

Excellent. Now officers can use the New SFST in the field, confident it has been scientifically validated.

Confident of the new scientific test, the same DUI officers set up a New SFST sobriety checkpoint outside a church. The officers administer the New SFST to parishioners as they drive home from Sunday services. The officers make exactly the same measurements they made at Oktoberfest; they interpret those measurements exactly as they did before. No one leaving church is drunk.

NHTSA says the real SFST gives the correct answer 95% of the time. In this example we'll use that same percentage. Officers administer a New SFST to 100 parishioners. 95 pass, and are released. 5 fail, and are arrested.

The question relevant to DUI guilt is: What is the accuracy of the New SFST when the test has returned the result, "Guilty"? The contractor is gone, but we know how to calculate the relevant accuracy of DUI convictions based on this new test: 5 people were judged guilty; blood testing would have proved none of them had a high blood alcohol. 0/5 = 0%. Convictions based on the New SFST are 0% accurate.

See what happened? Trained DUI officers scientifically validated the New SFST as 100% accurate, but when the same officers did identical New SFST tests in the field, the accuracy of the test was 0%.
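
The arithmetic of the two scenes can be checked in a few lines. This sketch uses only the counts given in the example above (100 people tested at each venue, the test right 95 times out of 100); the function and variable names are my own.

```python
def summarize(fail_drunk: int, pass_drunk: int,
              fail_sober: int, pass_sober: int):
    """Return (overall accuracy, accuracy when the test says "Guilty").

    The second value is None if nobody failed the test.
    """
    total = fail_drunk + pass_drunk + fail_sober + pass_sober
    # Correct answers: drunk people who failed plus sober people who passed.
    overall = (fail_drunk + pass_sober) / total
    failed = fail_drunk + fail_sober
    guilty_accuracy = fail_drunk / failed if failed else None
    return overall, guilty_accuracy


# Oktoberfest: everyone is drunk; 95 fail (correct), 5 pass (wrong).
oktoberfest = summarize(fail_drunk=95, pass_drunk=5, fail_sober=0, pass_sober=0)

# Church: no one is drunk; 95 pass (correct), 5 fail (wrong).
church = summarize(fail_drunk=0, pass_drunk=0, fail_sober=5, pass_sober=95)

for venue, (overall, guilty) in (("Oktoberfest", oktoberfest),
                                 ("Church", church)):
    print(f"{venue}: right {overall:.0%} of the time overall, "
          f"'Guilty' results right {guilty:.0%} of the time")
```

In both places the test gives the correct answer 95 times out of 100. What differs is who was tested, and that alone takes the accuracy of a "Guilty" result from 95/95 to 0/5.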

This is not a trick. This is how real tests work in the real world, which, again, is why real scientists don't describe the performance of diagnostic tests using this statistic. It doesn't work. But this is the SFST statistic US law enforcement uses to convict DUI defendants.

Why Scientific Accuracy matters
Every day in the United States people are punished because they failed a scientific SFST. It is fair and just for us to do this to them only if we know SFST measurements accurately predict alcohol levels. The only way we know SFSTs are accurate scientific tests of alcohol level is that NHTSA says so. NHTSA justifies this claim only by pointing to in-house, non-peer-reviewed, self-published scientific "studies" that discovered the Oktoberfest accuracy of the SFST is 91, 93 or 95%. The Oktoberfest accuracy. Our justification for jailing people, and taking their stuff, is based on the Oktoberfest accuracy statistic that real science knows does not correctly identify the power of the SFST to predict alcohol levels. But that's all we've got. Every day in the United States we punish people because they failed a "scientific test" that we have no way of knowing works.

In his email threatening me (as I read it) for exposing errors in NHTSA's SFST science, NHTSA contractor Dr. Jack Stuster, author of the most recent NHTSA SFST validation study, writes:

"I cannot find one claim in your rant that is supported by the evidence or accepted statistical methods."

So I asked NHTSA contractor Dr. Jack Stuster to name the statistical methods he has in mind, and the basis on which he believes them to be generally accepted. So far he hasn't.

Read this web site while you can. Best I can tell, I am under ongoing threat from NHTSA contract SFST scientist Dr. Jack Stuster for exposing the scientific errors you are reading about here.

This web site is about science: NHTSA's SFST validation science. I do not know, I do not care, I do not have an opinion about Dr. Jack Stuster's knowledge or intentions at any time ever in his life. I'm not even saying he had knowledge or intentions. But if he did, this web site isn't about them. Or him.
