| This web site gives the opinions of Dr. Greg Kane. Everything you read here is expressed only as my personal opinion. |
| © 2010 Nothing here may be reproduced without written permission; Trial Talk articles and raw study data excepted. |
In the San Diego study, at BAC 0.04%, the accuracy of the SFST on innocent drivers was 7%. |
| Elsewhere: |
Read NHTSA contract scientist Dr. Stuster's defense of his use of NHTSA's A = PV theory |
| What number is X? The technical term here is Predictive Value. If officers do 100 SFSTs, and 60 of them are correct, the Predictive Value of the SFST is 60%. That means 100 - 60 = 40% of the convictions are false convictions. So what number is the Predictive Value of each defendant's SFST? In lay English we'd say, "What is the accuracy of each defendant's SFST?" The "accuracy" of a DUI defendant's SFST is one of those problems with an answer that is easy and obvious. And wrong.
Easy? Easy. Obvious? Yeah. True? Not so much. SFSTs convince courts and juries to convict people because NHTSA tells everyone the test is 93% accurate, meaning the answers it gives are correct 93% of the time. I'm going to show you this claim is wrong. It is not wrong because the test is not 93% accurate. It is wrong because NHTSA's statistic "accurate" and the English word "accurate" mean different things. |
So here are a bunch of quotes from NHTSA publications using "accuracy" to measure the "validity" of SFSTs. I'm going easy on you, quote-wise. NHTSA makes this claim hundreds of times. |
NHTSA has assumed that "accuracy" = predictive value all the way back to the first FST project in 1977. |
|
|
And NHTSA continued to rely on this theory all the way through the most recent SFST validation project, in 1998. |
|
Police officers are trained that "accuracy" = predictive value. |
|
And NHTSA's web site assesses SFST predictive values through the years by reporting...."accuracy." |
|
"Accuracy" = predictive value is the basis of all NHTSA SFST validation claims. |
||
|
When the American Medical Association publishes research in its journal
(the largest circulation medical journal in the world), it requires
authors to describe diagnostic test accuracy by reporting the statistics
sensitivity and specificity. Reports
of predictive value (aka "accuracy") may be added, but only
if prevalence information is also given. Later on we'll talk
about why.
|
||||
|
Pointy-headed medical experts at the American College of Physicians review about 5,000 medically related scientific articles each month, and summarize the 25 most relevant articles in ACP Journal Club. Here's a figure from the journal showing what they do:
ACP Journal Club prints a glossary of statistical terms related to diagnostic test interpretation. Here it is. You see "accuracy" on the list? Me neither. The experts who review 60,000 medical journal articles a year don't include "accuracy" in the glossary of scientific terms used to describe diagnostic tests. |
|
||
Two not enough? Google sensitivity specificity predictive value. |
|||
| Mainstream, competent science does not use NHTSA's Accuracy = Predictive Value theory — because it does not work. Oops. |
|||
| NHTSA's A = PV theory gives answers that are wrong
Officers are taught that the One Leg Stand (OLS) is 65% accurate. They are taught that this means if they rely on the OLS test, the answers they get will be correct 65% of the time. |
So let's set up an OLS sobriety checkpoint outside a local church Sunday morning. No one leaving church has a high BAC. Officers do OLS tests on 100 drivers leaving church. The OLS gives the correct answer 65% of the time, so 65 of these 100 innocent parishioners will pass the OLS test. They will be released. The OLS test gives the wrong answer 100 – 65 = 35% of the time. Thirty-five innocent parishioners will fail the OLS test. They will all be arrested.
65 releases, all correct. Release accuracy 100%. 35 arrests, all wrong. Arrest accuracy 0%.
This is not a trick. This is how SFSTs and OLS work in the real world. Police really could set up a sobriety checkpoint outside a church Sunday morning. And when they did, the SFST would make mistakes. NHTSA science proves that. And every time the SFST made a mistake the officer would arrest a person who was innocent. Every single person arrested would be innocent.
According to NHTSA's naive theory, "OLS is 65% accurate." We tried out that theory. We used that theory to predict what would happen. What really happened is not what we expected. NHTSA's theory does not work. NHTSA's theory is wrong.
Look at the boxes on the bottom right of the picture. Diagnostic tests don't have one "accuracy"; they have several. In this example the 65% "overall accuracy" of OLS is the weighted average of the 0% arrest accuracy (35 arrests) and the 100% release accuracy (65 releases). Knowing the overall accuracy does not tell you the arrest accuracy. |
|
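Now let's run the same checkpoint again. Only this time we do the test at Oktoberfest. Everyone has a high BAC. The OLS still gives the correct answer 65% of the time, so 65 drivers fail and are arrested, and 35 pass and are released.
35 releases, all wrong. Release accuracy 0%. Every single person released was guilty. 65 arrests, all correct. Arrest accuracy 100%.
The Predictive Value of a test depends on where you do the test. Amazing, but true. This is how the world really works. Only NHTSA doesn't know. So NHTSA uses statistics that greatly underestimate the SFST's false arrest and false conviction rate. |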
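If you want to check the church and Oktoberfest arithmetic yourself, here is a minimal sketch. The helper name `checkpoint` and its parameters are made up for illustration. It assumes, as the example does, that the OLS gives the correct answer 65% of the time on sober and impaired drivers alike.

```python
def checkpoint(n_drivers, n_impaired, accuracy=0.65):
    """Tally arrests and releases at a checkpoint (hypothetical helper).

    Assumption: the test is correct with the same probability for
    impaired and sober drivers, as in the OLS example.
    """
    n_sober = n_drivers - n_impaired
    true_arrests = round(n_impaired * accuracy)   # impaired, failed the test
    false_releases = n_impaired - true_arrests    # impaired, passed the test
    true_releases = round(n_sober * accuracy)     # sober, passed the test
    false_arrests = n_sober - true_releases       # sober, failed the test

    arrests = true_arrests + false_arrests
    releases = true_releases + false_releases
    return {
        "arrests": arrests,
        "releases": releases,
        "overall_accuracy": (true_arrests + true_releases) / n_drivers,
        "arrest_accuracy": true_arrests / arrests if arrests else None,
        "release_accuracy": true_releases / releases if releases else None,
    }


church = checkpoint(100, n_impaired=0)         # nobody has a high BAC
oktoberfest = checkpoint(100, n_impaired=100)  # everybody has a high BAC
print("church:     ", church)
print("oktoberfest:", oktoberfest)
```

Overall accuracy comes out at 65% in both places. The arrest accuracy and the release accuracy are what swap when the crowd changes.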
What's going on
The SFST is not a sobriety test. Officers do not measure sobriety. They measure: can you stand on one leg, can you walk and turn, do your eyes move smoothly. Officers measure coordination. That's it. That's all they do. SFSTs are coordination tests.
NHTSA has a theory: Drunks are uncoordinated, and everyone who is uncoordinated is drunk. But it's only a theory. And it's not a very good theory. The truth is people are uncoordinated for lots of reasons besides alcohol.
Going from the church to Oktoberfest, the accuracy of the officers' coordination measurements doesn't change. What changes is the accuracy of NHTSA's theory. At church no one is drunk; everyone who is uncoordinated is uncoordinated for some other reason. At Oktoberfest everyone is drunk, so everyone who is uncoordinated is drunk. The accuracy of the SFST depends on the accuracy of the officer's coordination measurement and on the accuracy of the theory that everyone who is uncoordinated is drunk. |
Review
NHTSA answers the false conviction rate question with the easy and obvious answer: "accuracy." But real scientists know NHTSA's answer doesn't work. So here's a question. What is the SFST's false conviction rate? |
| In his threatening
(as I read it) email, NHTSA's official contract scientist,
San Diego validation study author Dr. Stuster, writes: So I asked NHTSA scientist Dr. Stuster to name the statistical methods he has in mind, and the basis on which he believes them to be generally accepted. So far he hasn't. |
83 innocent drivers took the SFST. At 0.08% BAC, on innocent people the SFST is only 29% accurate! That's worse than a coin toss. If juries rely on the standardized field sobriety test to decide the guilt of drivers charged with DUI at the 0.08% level, they will falsely convict 71% of the innocent drivers who go to trial. |
|||||||||||||||||||||
PPV tables
When a driver's pretest probability of impairment is 20%, a failed FST indicates a post-test probability of impairment of 21%. The FST moved the probability 1 percentage point. 50% becomes 52%. 70% becomes 71%. None of these changes is greater than the margin of error in the pretest probabilities themselves. The FST makes no difference. None. Within the margin of error, the FST does not move the probability at all.
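One way to see how little a failed FST tells you is to back out the likelihood ratio implied by those pretest and post-test pairs. Bayes' rule in odds form says post-test odds = pretest odds × likelihood ratio, and a likelihood ratio of 1 means the result carries no information at all. The sketch below uses only the rounded percentages quoted above, so the ratios it prints are approximate. The function names are made up for illustration.

```python
def odds(probability):
    """Convert a probability (0 to 1) to odds."""
    return probability / (1.0 - probability)


def implied_likelihood_ratio(pretest, posttest):
    """Likelihood ratio a test result must have had to move
    the pretest probability to the post-test probability."""
    return odds(posttest) / odds(pretest)


# (pretest, post-test) pairs quoted in the text, as rounded percentages
pairs = [(0.20, 0.21), (0.50, 0.52), (0.70, 0.71)]
ratios = [implied_likelihood_ratio(pre, post) for pre, post in pairs]

for (pre, post), lr in zip(pairs, ratios):
    print(f"{pre:.0%} -> {post:.0%}: likelihood ratio about {lr:.2f}")
```

All three implied ratios land just above 1, which is what "the FST does not move the probability" looks like in numbers.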
|
|||||||||||||||||||||
Here are PPV results using SFST criteria for identifying impairment at 0.08% BAC. Again, no meaningful change in the probability of impairment. The science has been done. The science proves FSTs do not work. |
|||||||||||||||||||||
|
||||||
| Skewed sample, skewed results: how the NHTSA's accuracy statistic can be manipulated
These studies generally report police officers' "arrest accuracy." What the NHTSA doesn't let on is, this accuracy statistic is easily manipulated. Simply by skewing the mix of sober and impaired drivers you choose to “study,” you can set up your validation project beforehand so it is certain to “discover” whatever arrest accuracy you’ve been paid to validate. The extended example below shows you how it can be done.
To “discover” the accuracy you're being paid to discover, skew the sample. The accuracy statistic changes as the mix of drivers researchers choose to study changes. Which is why peer reviewed scientific journals like the Journal of the American Medical Association do not accept papers that report only the accuracy statistic.
Contingency tables
The four squares in a matrix tally study results. (Extra boxes on the right and bottom sum rows and columns.) Every driver in a validation study goes in one of the four squares, depending on their combination of BAC and Walk and Turn (WAT) results. BAC results are separated by row. Drivers with BACs above the legal limit go in the top row. Drivers with BACs less than the legal limit go in the bottom row. Which column a driver goes in depends on whether the WAT said to release or arrest them. Here's what we can tell from the WAT contingency table: 76 innocent people took the WAT test. |
||
The best way to pick groups that are similar to the whole population is to pick people at random. (This turns out to be harder than you'd think.) A representative group of SFST subjects would have the same age, gender, athletic ability, and percentage of sobriety as the population in general. Groups that do not look like the population, groups that have a different number of old people, or women, or drunks than the general population are said to be skewed. In our extended example we're going to skew the proportion of drunk and innocent people in a series of gedanken SFST validation studies. We're going to see how skewing the study group affects the accuracy statistic we "discover." Ready? Here we go.
Notice this example keeps the WAT test's fundamental accuracies, the innocent and impaired driver accuracies, unchanged. One hundred drivers were impaired. 8 impaired drivers passed the WAT test, 92 impaired drivers failed. The accuracy of the WAT on impaired drivers is 92%. Eighteen innocent drivers took the WAT. 9 passed, 9 failed. The accuracy of the WAT on innocent drivers is 50%.
These numbers, the 50% and the 92%, I didn't make them up. I got them from the latest, most up-to-date NHTSA SFST validation study. Two numbers I did pick. The 100 impaired and 18 innocent drivers. I didn't pick randomly. I chose carefully. The truth is, my WAT validation study's accuracy "discovery" is not a discovery at all. I knew ahead of time what I wanted to discover, and I rigged the study. I deliberately manipulated the study group. I loaded up on drunks. I knew if I did, the accuracy statistic I "discovered" would be high.
To rig the study, I didn't have to sneak into the lab after dark and switch results. I didn't have to get honest, upright police officers to cheat. I didn't have to change the SFST in any way. I didn't have to lie. All I had to do was set up my study design so most of the people we would be studying would be impaired. Just load up on drunks. That's it. Almost. Also, I had to ignore basic science and conceal sensitivity, specificity and prevalence. And I had to be sure my work wasn't peer reviewed. |
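Here is the rigged study's contingency table worked through as a small sketch. The four counts come straight from the example above; the function name `summarize` and the dictionary keys are made up for illustration. "Arrest accuracy" is computed here as correct arrests divided by all arrests, which is my reading of the term.

```python
# Counts from the rigged WAT example in the text
impaired_failed = 92  # impaired drivers the WAT said to arrest
impaired_passed = 8   # impaired drivers the WAT said to release
innocent_failed = 9   # innocent drivers the WAT said to arrest
innocent_passed = 9   # innocent drivers the WAT said to release


def summarize(tp, fn, fp, tn):
    """Summary statistics for a 2x2 contingency table (hypothetical helper)."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),         # accuracy on impaired drivers
        "specificity": tn / (tn + fp),         # accuracy on innocent drivers
        "prevalence": (tp + fn) / total,       # share of drivers impaired
        "overall_accuracy": (tp + tn) / total,
        "arrest_accuracy": tp / (tp + fp),     # correct arrests / all arrests
    }


stats = summarize(impaired_failed, impaired_passed,
                  innocent_failed, innocent_passed)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity stay at 92% and 50%. The overall and arrest accuracies come out high only because 100 of the 118 drivers in the study group are impaired.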
| Do you see how every time the mix of drivers in the study group changes, the NHTSA’s arrest accuracy statistic changes? Skewed sample, skewed results. This is why peer reviewed scientific journals like the Journal of the American Medical Association do not accept papers that report only the accuracy statistic. The change-the-mix accuracy swing is huge, from zero percent all the way up to 100%. Simply by skewing the impaired driver: innocent driver mix you choose to “study,” you can set up your field test beforehand to “discover” whatever arrest accuracy you’ve been paid to validate. Like this: |
|
|
|
If you are being paid to discover an accuracy of 99%, skew your study group so that 98% of its drivers are impaired.
If you are being paid to discover an accuracy of 80%, skew your study group so that 68% of its drivers are impaired.
If you are being paid to discover an accuracy of 60%, skew your study group so that 45% of its drivers are impaired.
If you are being paid to discover an accuracy of 40%, skew your study group so that 27% of its drivers are impaired.
If you are being paid to discover an accuracy of 20%, skew your study group so that 12% of its drivers are impaired.
If you are being paid to discover an accuracy of 1%, skew your study group so that 1% of its drivers are impaired. |
| These examples represent highly skilled DUI officers, personally trained by Dr. Burns (effectively the inventor of the WAT test), performing NHTSA-standardized WAT tests flawlessly. I didn't make up the WAT accuracies; I found them on page 21 of the NHTSA's most recent, most up-to-date SFST validation study. In every example the accuracy of the WAT on innocent drivers is the same, 50%. In every example the accuracy of the WAT on impaired drivers is the same, 92%. From example to example all that changes is the mix of impaired and sober drivers. Skewing that mix lets you manipulate the accuracy you "discover" to any number you choose, from 0% to 100%. |
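The skew needed for any target can be worked out directly. The sketch below assumes "arrest accuracy" means the share of arrested drivers who really are impaired (positive predictive value), and it holds the WAT's 92% and 50% accuracies fixed. The function names are made up for illustration.

```python
SENSITIVITY = 0.92  # WAT accuracy on impaired drivers, from the text
SPECIFICITY = 0.50  # WAT accuracy on innocent drivers, from the text


def arrest_accuracy(prevalence, sens=SENSITIVITY, spec=SPECIFICITY):
    """Share of arrests that are correct, for a given share of impaired drivers."""
    true_arrests = sens * prevalence
    false_arrests = (1.0 - spec) * (1.0 - prevalence)
    return true_arrests / (true_arrests + false_arrests)


def prevalence_for_target(target, sens=SENSITIVITY, spec=SPECIFICITY):
    """Share of impaired drivers a study group needs to hit a target arrest accuracy.

    Obtained by solving the arrest_accuracy formula for prevalence.
    """
    false_rate = 1.0 - spec
    return target * false_rate / (sens * (1.0 - target) + target * false_rate)


targets = (0.99, 0.80, 0.60, 0.40, 0.20, 0.01)
for target in targets:
    mix = prevalence_for_target(target)
    print(f"to 'discover' {target:.0%} accuracy, make {mix:.0%} of drivers impaired")
```

Under that assumption, the rounded mixes line up with the six examples above: 98%, 68%, 45%, 27%, 12% and 1%.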
|
| San Diego validation study author
Dr. Stuster responds: |
All NHTSA SFST validation studies "discovering" that the SFST is "extremely accurate" depend on this statistical trick.
The high "accuracy" of the SFST is a statistical trick. Here's how the San Diego study "discovered" a high SFST accuracy:
|