This web site gives the opinions of Dr. Greg Kane. Everything you read here is expressed only as my personal opinion.

© 2010 Nothing here may be reproduced without written permission; Trial Talk articles and raw study data excepted.

In the San Diego study, at BAC 0.04%, the accuracy of the SFST on innocent drivers was 7%.

Accuracy Threats?
On this page:

   1   What NHTSA's Accuracy = Predictive Value Theory is
   2   SEE  the Accuracy = Predictive Value Theory in use
   3   Competent science does not use NHTSA's Theory....
   4   ...Because it makes mistakes.
   5   Poindexter

Elsewhere:

Read NHTSA contract scientist Dr. Stuster's defense of his use of NHTSA's A = PV theory


NHTSA's Accuracy = Predictive Value Theory
Courts have decided to rely on SFSTs, on tests that cause false convictions. Courts have decided to convict innocent people. Science can't fix that. Science cannot figure out whether any particular SFST is wrong or not. What science can do is say, "If the court relies on SFSTs 100 times, X % of convictions will be false convictions."

What number is X? The technical term here is Predictive Value. If officers do 100 SFSTs, and 60 of them are correct, the Predictive Value of the SFST is 60%. That means 100 - 60 = 40% of the convictions are false convictions.

So what number is the Predictive Value of each defendant's SFST? In lay English we'd say, "What is the accuracy of each defendant's SFST?" The "accuracy" of a DUI defendant's SFST is one of those problems with an answer that is easy and obvious. And wrong.

Here's NHTSA's A=PV theory

   1   Several SFST "validation" projects discover that the "accuracy" of the SFST is about 93%.
   2   Your D took an SFST, and failed.
   3   Therefore: the probability that your D had a high BAC is 93%.

Easy? Easy. Obvious? Yeah. True? Not so much.

SFSTs convince courts and juries to convict people because NHTSA tells everyone the test is 93% accurate, that the answers it gives are correct 93% of the time. I'm going to show you this claim is wrong. It is not wrong because the test is not 93% accurate. It is wrong because NHTSA's statistic "accurate" and the English word "accurate" mean different things.


SEE NHTSA's Accuracy = Predictive Value Theory in use
Let's start by seeing that NHTSA really does rely on the statistic "accuracy" to validate the SFST. NHTSA pays a contractor to have DUI officers do the SFST on a bunch of people, count up the answers right and wrong, and calculate what percent of those answers are correct. That's the "accuracy" of the SFST. Sounds reasonable, doesn't it?

So here are a bunch of quotes from NHTSA publications using "accuracy" to measure the "validity" of SFSTs. I'm going easy on you, quote-wise. NHTSA makes this claim hundreds of times.

NHTSA has assumed that "accuracy" = predictive value all the way back to the first FST project in 1977.

"Decision analyses found that officers’ estimates of whether a motorist’s BAC was above or below 0.08 or 0.04 percent were extremely accurate. Estimates at the 0.08 level were accurate in 91 percent of the cases..."

Validation Of The Standardized Field Sobriety Test Battery At BACs Below 0.10 Percent Final Report Stuster and Burns, 1998, pg. iii

And NHTSA continued to rely on this theory all the way through the most recent SFST validation project, in 1998.


Police officers are trained that "accuracy" = predictive value.

And NHTSA's web site assesses SFST predictive values through the years by reporting...."accuracy."

"Accuracy" = predictive value is the basis of all NHTSA SFST validation claims.


But, competent science does not use NHTSA's Accuracy = Predictive Value Theory
Real science does not report diagnostic test accuracies with the NHTSA's accuracy statistic. Because it doesn't work. Competent science uses two special accuracies (called sensitivity and specificity, if you care). Here are a couple examples, and a link to google for hundreds more.


"Studies of screening and diagnostic tests should report sensitivity, specificity, and likelihood ratio. If predictive value or accuracy is reported, prevalence or pretest likelihood should be given as well."

Journal of the American Medical Association, Instructions For Authors, 2008, pg 8;
also in JAMA, July 2, 2008, Vol 300, No. 1

When the American Medical Association publishes research in its journal (the largest circulation medical journal in the world), it requires authors to describe diagnostic test accuracy by reporting the statistics sensitivity and specificity. Reports of predictive value (aka "accuracy") may be added, but only if prevalence information is also given. Later on we'll talk about why.

 

Pointy headed medical experts at the American College of Physicians review about 5,000 medically related scientific articles each month, and summarize the 25 most relevant articles in ACP Journal Club. Here's a figure from the journal showing what they do:

ACP Journal Club prints a glossary of statistical terms related to diagnostic test interpretation. Here it is.

You see "accuracy" on the list? Me neither. The experts who review 60,000 medical journal articles a year don't include "accuracy" in the glossary of scientific terms used to describe diagnostic tests.

"Terms used in Diagnosis

"Sensitivity: the proportion of patients with the target disorder who have a positive test result (Figure) (a/[a + c]).

"Specificity: the proportion of patients without the target disorder who have a negative test result (Figure) (d/[b + d]).

"Pretest probability (prevalence): the proportion of patients who have the target disorder, as determined before the test is carried out (Figure) ([a + c]/[a + b + c + d]).

"Pretest odds: the odds that the patient has the target disorder before the test is carried out (pretest probability/[1 - pretest probability]).

"Likelihood ratio (LR): the ratio of the probability of a test result among patients with the target disorder to the probability of that same test result among patients who are free of the target disorder. The LR for a positive test (positive likelihood ratio) is calculated as sensitivity/(1 - specificity). The LR for a negative test (negative likelihood ratio) is calculated as (1 - sensitivity)/specificity.

"Posttest odds: the odds that the patient has the target disorder after the test is carried out (pretest odds x LR).

"Posttest probability: the proportion of patients with that particular test result who have the target disorder (posttest odds/[1 + posttest odds]). Use of a nomogram avoids the need for these calculations."

ACP Journal Club, May 8, 2008, volume 148, number 3, page JC3-2
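The glossary's formulas are easy to try out. Here is a minimal Python sketch of them. The class name, the function names and the example counts are my own, made up for illustration; they are not from ACP Journal Club or from any study. The cells a, b, c, d are laid out the way the quoted formulas imply.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cells of a 2 x 2 diagnostic table, named as in the quoted glossary."""
    a: int  # has the condition, test positive
    b: int  # does not have the condition, test positive
    c: int  # has the condition, test negative
    d: int  # does not have the condition, test negative

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)

    def positive_lr(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def negative_lr(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()


def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Pretest probability -> pretest odds -> posttest odds -> posttest probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical counts, for illustration only.
example = TwoByTwo(a=80, b=30, c=20, d=70)
print(f"sensitivity {example.sensitivity():.2f}")
print(f"specificity {example.specificity():.2f}")
print(f"prevalence  {example.prevalence():.2f}")
print(f"posttest probability after a positive test "
      f"{posttest_probability(example.prevalence(), example.positive_lr()):.2f}")
```

Notice that the posttest probability needs two inputs: the test's own numbers (through the likelihood ratio) and the pretest probability. Change either one and the answer changes.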

Two not enough? Google sensitivity specificity predictive value.

Mainstream, competent science does not use NHTSA's Accuracy = Predictive Value theory — because it does not work. Oops.


NHTSA's A = PV theory gives answers that are wrong
If you think I'm a nice guy you'll maybe believe me when I tell you there are technical mathematical reasons why NHTSA's naive A = PV theory is wrong.

Another good way to see the problems with NHTSA's theory is to try it out. Use it. See the answers it gives. See how wrong they are. Let's do that.

Officers are taught that the One Leg Stand is 65% accurate. They are taught that that means if they rely on the OLS test, the answers they get will be correct 65% of the time.

So let's set up an OLS sobriety checkpoint outside a local church Sunday morning.
When you do the OLS test, the answer you get is correct 65% of the time. That means OLS gives the wrong answer 100 – 65 = 35% of the time.

No one leaving church has a high BAC.

Officers do OLS tests on 100 drivers leaving church. The OLS gives the correct answer 65% of the time — 65 of these 100 innocent parishioners will pass the OLS test. They will be released.

The OLS test gives the wrong answer 100 – 65 = 35% of the time. Thirty-five innocent parishioners will fail the OLS test. They will all be arrested.
At the end of the day....35 arrests, all wrong. Arrest accuracy 0%. Every single person arrested was innocent.

65 releases, all correct. Release accuracy 100%.

This is not a trick. This is how SFSTs and OLS work in the real world. Police really could set up a sobriety checkpoint outside a church Sunday morning. And when they did, the SFST would make mistakes. NHTSA science proves that. And every time the SFST made a mistake the officer would arrest a person who was innocent. Every single person arrested would be innocent.

According to NHTSA's naive theory, "OLS is 65% accurate." We tried out that theory. We used that theory to predict what would happen. What really happened is not what we expected. NHTSA's theory does not work. NHTSA's theory is wrong.

Look at the boxes on the bottom right of the picture. Diagnostic tests don't have one "accuracy"; they have several. In this example the 65% "overall accuracy" of OLS is the weighted average of the 0% arrest accuracy (35 arrests) and the 100% release accuracy (65 releases). Knowing the overall accuracy does not tell you the arrest accuracy.

 

Oktoberfest
It gets weirder. Now let's do exactly the same test, exactly the same way, by exactly the same police officers with exactly the same "accuracy" – correct answers 65% of the time.

Only this time we do the test at Oktoberfest. Everyone has a high BAC.
At the end of the day...65 arrests, all correct. Arrest accuracy 100%.

35 releases, all wrong. Release accuracy 0%. Every single person released was guilty.

The Predictive Value of a test depends on where you do the test. Amazing, but true. This is how the world really works. Only NHTSA doesn't know. So NHTSA uses statistics that greatly underestimate the SFST's false arrest and false conviction rate.
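You can run the church and Oktoberfest checkpoints yourself. This is a minimal Python sketch, assuming (as the example does) that the test gives the correct answer 65% of the time for impaired and innocent drivers alike. The function name and the dictionary keys are my own, for illustration.

```python
def checkpoint(drivers: int, impaired: int, accuracy: float = 0.65) -> dict:
    """Run a test that is right `accuracy` of the time on every driver."""
    innocent = drivers - impaired
    correct_arrests = round(impaired * accuracy)    # impaired drivers who fail
    wrong_releases = impaired - correct_arrests     # impaired drivers who pass
    correct_releases = round(innocent * accuracy)   # innocent drivers who pass
    wrong_arrests = innocent - correct_releases     # innocent drivers who fail
    arrests = correct_arrests + wrong_arrests
    releases = correct_releases + wrong_releases
    return {
        "arrests": arrests,
        "releases": releases,
        "overall_accuracy": (correct_arrests + correct_releases) / drivers,
        "arrest_accuracy": correct_arrests / arrests if arrests else None,
        "release_accuracy": correct_releases / releases if releases else None,
    }


church = checkpoint(drivers=100, impaired=0)
oktoberfest = checkpoint(drivers=100, impaired=100)

for name, result in (("church", church), ("Oktoberfest", oktoberfest)):
    print(name, result)
```

Same test, same 65% overall accuracy at both places. Only the mix of drivers changed, and the arrest accuracy went from 0% to 100%.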

What's going on
The idea that the accuracy of a test changes from place to place sounds ridiculous. Sounds like a trick. It sort of is, but not the way you'd think. Here's what's going on.

The SFST is not a sobriety test. Officers do not measure sobriety. They measure, can you stand on one leg, can you walk and turn, do your eyes move smoothly. Officers measure coordination. That's it. That's all they do. SFSTs are coordination tests.

NHTSA has a theory: Drunks are uncoordinated, and everyone who is uncoordinated is drunk. But it's only a theory. And it's not a very good theory. The truth is people are uncoordinated for lots of reasons besides alcohol.

Going from the church to Oktoberfest, the accuracy of the officers' coordination measurements doesn't change. What changes is the accuracy of NHTSA's theory. At church no one is drunk; everyone who is uncoordinated is uncoordinated for some other reason. At Oktoberfest, everyone is drunk, so everyone who is uncoordinated is drunk.

The accuracy of the SFST depends on the accuracy of the officer's coordination measurement and on the accuracy of the theory that everyone who is uncoordinated is drunk.

Review
Because courts have decided to rely on a test that makes mistakes, courts convict innocent people. Cost of doing business. Science cannot say whether any particular SFST is wrong, but science can say what percentage of the time SFSTs lead to false convictions.

NHTSA answers the wrong conviction rate question with the easy and obvious answer — "accuracy." But real scientists know NHTSA's answer doesn't work.

So here's a question. What is the SFST's false conviction rate?

In his threatening (as I read it) email, NHTSA's official contract scientist, San Diego validation study author Dr. Stuster, writes:
"I cannot find one claim in your rant that is supported by the evidence or accepted statistical methods."

So I asked NHTSA scientist Dr. Stuster to name the statistical methods he has in mind, and the basis on which he believes them to be generally accepted. So far he hasn't.


Specificity proves SFSTs are inaccurate
The specificity statistic the Journal of the American Medical Association demands in reports of diagnostic test accuracy is effectively the accuracy of the test on innocent people.

Specificity can be calculated from data organized in "contingency tables" like the ones here. NHTSA SFST validation reports give contingency tables (which they call "decision matrices") for some selected statistics: officers' BAC estimates; arrest decisions; and, in the San Diego study, for HGN and WAT and OLS. But never the SFST. Never the SFST. No NHTSA SFST validation study has ever revealed the decision matrix of the SFST itself.

Here it is.


AT BAC 0.04%
29 innocent drivers took the SFST
27 failed—93%.
2 passed— 7 %
On innocent people the interpretation accuracy is 7%!


If juries rely on the SFST to decide the guilt of drivers charged with DWAI at the current 0.05% level, they will wrongly convict 93% of the innocent drivers who go to trial.


Here's the decision matrix for the SFST at 0.08% BAC:

83 innocent drivers took the SFST
59 failed—71%.
24 passed— 29 %
On innocent people the interpretation accuracy is 29%!

At 0.08% BAC, on innocent people the SFST is only 29% accurate! That's worse than a coin toss.

If juries rely on the standardized field sobriety test to decide the guilt of drivers charged with DUI at the 0.08% level, they will falsely convict 71% of the innocent drivers who go to trial.
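The arithmetic behind those percentages is short. Here is a small Python sketch using only the innocent-driver counts quoted above; the function and variable names are my own.

```python
def specificity(innocent_passed: int, innocent_total: int) -> float:
    """Share of innocent drivers the test correctly passes."""
    return innocent_passed / innocent_total


# Innocent-driver counts quoted in the text.
spec_004 = specificity(innocent_passed=2, innocent_total=29)    # BAC 0.04%
spec_008 = specificity(innocent_passed=24, innocent_total=83)   # BAC 0.08%

for label, spec in (("0.04%", spec_004), ("0.08%", spec_008)):
    print(f"BAC {label}: specificity {spec:.0%}, innocent drivers failed {1 - spec:.0%}")
```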


The Predictive Value of the SFST
The Journal of the American Medical Association's other relevant requirement is that if a study reports a diagnostic test's interpretation accuracy statistic (aka predictive value) then it must also give the study group's prevalence of the condition tested for. The prevalence is the mix—the proportion (think "percentage")—of people in the study group who have the condition tested for, who were alcohol impaired.

The reason for this requirement is clear from the "skewed sample, skewed results" example below. The interpretation accuracy statistic is highly dependent on what mix (percentage) of people in the study group are impaired.

 


Be sure you get that. You will not understand SFST validation studies until you understand this. The interpretation accuracy statistic a study discovers depends on two factors:
1. the mix (prevalence, percentage) of impairment in the group of people studied,
2. the test itself.

Reporting both the mix (prevalence) and the accuracy statistic shows how much of the accuracy statistic is due to the inclusion criteria of the study (mix), and how much is due to the SFST itself. Here's how...

Here's the contingency table for the SFST, based on the standardized interpretation criteria imagined to target a BAC of 0.08%. The SFST's accuracy statistic is 78%.

72% of the people in the study group were impaired. If you were reporting this study in the Journal of the American Medical Association, you'd be required to say: "Sensitivity 100%, specificity 29%."

If you wanted to mention the accuracy statistic, you'd be required to say: "Accuracy was 78% in this group with a prevalence of impairment of 72%."


Prevalence of impairment = 213 / 296 = 72%

How did the study group get to be 72% impaired? Before the first officer set out on his first patrol, the study design assured that the San Diego study group would be skewed toward drunks. The study design excluded drivers who drove well. The study design excluded people driving during the day (officers patrolled late at night). The study design excluded drivers who looked and smelled and acted sober. In fact the study design deliberately excluded everyone highly experienced DUI officers thought was sober.

Using those inclusion criteria, and big city late at night patrol tactics, veteran DUI officers were able to come up with a study group that was 72% guilty, at 0.08% BAC, before they began doing SFSTs. And after they did SFSTs? After doing SFSTs, they ended up with a group of drivers that was 78% guilty. The SFST itself can be responsible for no more than 6 percentage points of that 78% accuracy! JAMA's medical readership is trained to understand this. Thus the requirement that predictive value be paired with prevalence.

At BAC 0.04% the prevalence and the accuracy statistic are 90% and 91%. Within the margin of error, the SFST contributes nothing to the final accuracy!

 


Accuracy = 265 / 292 = 91%
Prevalence = 267 / 296 = 90%
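Here is a short Python sketch that puts the prevalence next to the accuracy statistic for both BAC levels. The 0.04% counts are the ones shown above. For 0.08% I am assuming the failed-test total is the 213 impaired drivers (sensitivity 100%) plus the 59 innocent drivers who failed. The dictionary layout and names are my own.

```python
# Counts taken from, or implied by, the figures in the text.
studies = {
    "0.08%": {"impaired": 213, "total": 296,
              "impaired_failed": 213, "all_failed": 213 + 59},
    "0.04%": {"impaired": 267, "total": 296,
              "impaired_failed": 265, "all_failed": 292},
}


def summarize(counts: dict) -> tuple[float, float]:
    """Return (prevalence, accuracy statistic) for one BAC level."""
    prevalence = counts["impaired"] / counts["total"]
    accuracy = counts["impaired_failed"] / counts["all_failed"]
    return prevalence, accuracy


for bac, counts in studies.items():
    prevalence, accuracy = summarize(counts)
    print(f"BAC {bac}: prevalence {prevalence:.0%}, accuracy {accuracy:.0%}, "
          f"difference {accuracy - prevalence:.1%}")
```

The difference column is the most the SFST itself could have added on top of the mix the officers started with.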

In English
This is science's precise way of saying the obvious. On innocent people the SFST gives the wrong answer 93% of the time. A test can't be wrong that often and give meaningful results. When an FST says "impaired," you can't tell whether the test is positive because the person is really impaired, or whether this is just one of those 93% of innocent people for whom the FST gives the wrong answer.

PPV tables
There's a formula for calculating the accuracy statistic (aka Positive Predictive Value) at various group impairment percentages (aka Pretest probabilities). Comparing the PPV with the Pretest probability lets you see how much probability of guilt the SFST adds in each case. Here's a table of results for the SFST at BAC 0.04%.


When a driver's pretest probability of impairment is 20%, a failed FST indicates a post-test probability of impairment of 21%. The FST moved the probability 1%. 50% becomes 52%. 70% becomes 71%. None of these changes are greater than the margin of error in the pretest probabilities themselves. The FST makes no difference. None. Within the margin of error, the FST does not move the probability at all.
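If you want to check those numbers, here is a minimal Python sketch of the calculation. It assumes the 0.04% sensitivity and specificity implied by the San Diego counts on this page (265 of 267 impaired drivers failed, 2 of 29 innocent drivers passed). The function and constant names are my own.

```python
def posttest_probability(pretest: float, sensitivity: float, specificity: float) -> float:
    """Probability of impairment after a failed test (positive predictive value)."""
    true_positives = pretest * sensitivity
    false_positives = (1 - pretest) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 265 / 267  # impaired drivers who failed the SFST at 0.04%
SPECIFICITY = 2 / 29     # innocent drivers who passed the SFST at 0.04%

for pretest in (0.20, 0.50, 0.70):
    post = posttest_probability(pretest, SENSITIVITY, SPECIFICITY)
    print(f"pretest {pretest:.0%} -> post-test {post:.0%}")
```

Plug in any pretest probability you like. With these two inputs a failed test moves the probability by only a point or two.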

Cognitive dissonance.
Wait a minute. Validation studies prove SFST accuracy is 90%. Now Greg says the real accuracy is 1 to 2%. How can that be? How do NHTSA validation studies prove the SFST is "extremely accurate"?

   1   They "validate" not the SFST but the officers' 90% accuracy. Officers systematically ignore the SFST.
   2   They inflate the accuracy statistic with study designs that load up on drunks.
   3   They keep SFST results secret.

Here are PPV results using SFST criteria for identifying impairment at 0.08% BAC

Again, no meaningful change in the probability of impairment. The science has been done. The science proves FSTs do not work.

Criticize NHTSA science, get sued

Read this web site while you can. Best I can tell we live in a country where, if you criticize government science, the agency will look the other way while its contractor tries to intimidate you into shutting up.
Best I can tell, I am under ongoing threat from the NHTSA's contract scientist Dr. Jack Stuster for exposing the scientific errors you are reading about here.

I'm about to say unflattering things about NHTSA science. This web site is about science—the science of the NHTSA's SFST validation theory. I do not know, I do not care, I do not have an opinion about Dr. Jack Stuster's knowledge or intentions at any time ever in his life. I'm not even saying he had knowledge or intentions. But if he did, this web site isn't about them. Or him. This web site is not about Jack Stuster, PhD, CPE. This web site is about the flawed science in the NHTSA's SFST validation theory and supporting studies—and in every courtroom where SFSTs are admitted as evidence. 

If you're an expert critical of the SFST and you've been threatened or intimidated by NHTSA or its associates, let me know. I'm putting together a list.

Skewed sample, skewed results: how the NHTSA's accuracy statistic can be manipulated
The NHTSA's fundamental claim about SFSTs is that if you know a person's Field Sobriety Test score you know their Blood Alcohol Concentration.

"The only appropriate criterion measure to assess the accuracy of SFSTs is BAC."

Validation Of The Standardized Field Sobriety Test Battery At BACs Below 0.10 Percent Final Report Stuster and Burns, 1998, pg. 10

The National Highway Traffic Safety Administration has paid for several validation studies claiming to discover that officers "using" FSTs are "extremely accurate" (about 90% accurate) at identifying drivers with high BACs.

These studies generally report police officers' "arrest accuracy." What the NHTSA doesn't let on is, this accuracy statistic is easily manipulated. Simply by skewing the mix of sober and impaired drivers you choose to “study,” you can set up your validation project beforehand so it is certain to “discover” whatever arrest accuracy you’ve been paid to validate. The extended example below shows you how it can be done.

To “discover” the accuracy you're being paid to discover, skew the sample.
I'm going to show you how skewed study groups lead directly to skewed accuracy statistics. We're going to calculate the NHTSA's accuracy statistic not just for the group of drivers in the San Diego study, but also for other groups of drivers. We'll keep the fundamental scientific accuracies of the SFST—what scientists call "sensitivity" and "specificity"—exactly the same in every case. The only thing we'll vary is a thing researchers have control over, the mix of guilty and innocent drivers chosen to be in the study. We'll skew that mix up and down. When we do, the accuracy statistic we "discover" will also move up and down. A lot.

The accuracy statistic changes as the mix of drivers researchers choose to study changes. Which is why peer reviewed scientific journals like the Journal of the American Medical Association do not accept papers that report only the accuracy statistic.

In this example I'm not going to make up SFST results, and I'm not going to ask you to believe my calculations on the whole data set. We'll use published data. We'll calculate the innocent driver accuracy (specificity) and guilty driver accuracy (sensitivity) of the Walk And Turn component of the SFST. You'll find the needed data on page 21 (.pdf page 31) of the San Diego study report.

Contingency tables
We'll do our calculations using contingency tables. When one test (Walk And Turn) acts as a stand-in for a second, gold standard test (Blood Alcohol Concentration), scientists summarize the stand-in's performance with 2 x 2 boxes, called contingency tables, like the ones here. (NHTSA validation studies call them decision matrices.) At the right is the contingency table for the WAT test.

The four squares in a matrix tally study results. (Extra boxes on the right and bottom sum rows and columns.) Every driver in a validation study goes in one of the four squares, depending on their combination of BAC and WAT results. BAC results are separated by row. Drivers with BACs above the legal limit go in the top row. Drivers with BACs less than the legal limit go in the bottom row. Which column a driver goes in depends on whether the WAT said to release or arrest them.

Here's what we can tell from the WAT contingency table:
195 guilty drivers took the WAT test.
179 failed — 92%. The accuracy of the WAT on guilty people is 92%.

76 innocent people took the WAT test.
36 passed — 47%. The accuracy of the WAT on innocent people is 47%. Let's pretend the WAT is more accurate than it really is, and round that up to 50%. A coin toss.
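Here is the WAT table written out as a small Python sketch, with rows by BAC and columns by WAT result as described above. The four cells are worked out from the counts just quoted (195 guilty with 179 failing, 76 innocent with 36 passing). The dictionary layout and names are my own.

```python
# (BAC row, WAT column) -> number of drivers
wat_table = {
    ("above_limit", "arrest"): 179,
    ("above_limit", "release"): 195 - 179,
    ("below_limit", "arrest"): 76 - 36,
    ("below_limit", "release"): 36,
}


def row_total(row: str) -> int:
    """Sum one BAC row across both WAT columns."""
    return sum(count for (bac, _), count in wat_table.items() if bac == row)


# Accuracy on guilty drivers (sensitivity) and on innocent drivers (specificity).
sensitivity = wat_table[("above_limit", "arrest")] / row_total("above_limit")
specificity = wat_table[("below_limit", "release")] / row_total("below_limit")

print(f"guilty drivers: {row_total('above_limit')}, WAT accuracy {sensitivity:.0%}")
print(f"innocent drivers: {row_total('below_limit')}, WAT accuracy {specificity:.0%}")
```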


Skewing samples, checking results.
In general scientists can't study SFSTs, drug treatments, jury strategies, or anything else, on all 300 million Americans. So they pick representative samples, groups of people whose average age, height, weight, income, gender, etc. are similar, as a group, to those same features in the general population. They study the group and reason that the group's statistics will be similar to the population's statistics—to the degree the study group is similar to the population.

The best way to pick groups that are similar to the whole population is to pick people at random. (This turns out to be harder than you'd think.) A representative group of SFST subjects would have the same age, gender, athletic ability, and percentage of sobriety as the population in general.

Groups that do not look like the population, groups that have a different number of old people, or women, or drunks than the general population are said to be skewed. In our extended example we're going to skew the proportion of drunk and innocent people in a series of gedanken SFST validation studies. We're going to see how skewing the study group affects the accuracy statistic we "discover." Ready? Here we go.

GROUP 1 has impaired drivers making up 85% of the group, slightly higher than the San Diego SFST study's 72%. As with the San Diego study, we discover an accuracy statistic of 91%.

Notice this example keeps the WAT test's fundamental accuracies, the innocent and impaired driver accuracies, unchanged. One hundred drivers were impaired. 8 impaired drivers passed the WAT test, 92 impaired drivers failed. The accuracy of the WAT on impaired drivers is 92%.

Eighteen innocent drivers took the WAT. 9 passed, 9 failed. The accuracy of the WAT on innocent drivers is 50%.

These numbers, the 50% and the 92%, I didn't make them up. I got them from the latest, most up to date NHTSA SFST validation study. These are the real world accuracies of the real world WAT test. The other numbers, the 9, 9, 8 and 92, are calculated from those accuracies. If the WAT test is done on 100 impaired people, on average 8 will pass, 92 will fail. If the WAT is done on 18 innocent people, on average 9 will pass, 9 will fail.
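The Group 1 numbers can be reproduced in a few lines. This Python sketch takes the 92% and 50% WAT accuracies and the 100 impaired and 18 innocent drivers from the example; the function and variable names are my own.

```python
SENSITIVITY = 0.92  # WAT accuracy on impaired drivers
SPECIFICITY = 0.50  # WAT accuracy on innocent drivers, rounded up as in the text


def expected_counts(impaired: int, innocent: int) -> dict:
    """Average pass/fail counts for a group with the given mix."""
    impaired_fail = round(impaired * SENSITIVITY)
    innocent_pass = round(innocent * SPECIFICITY)
    return {
        "impaired_fail": impaired_fail,
        "impaired_pass": impaired - impaired_fail,
        "innocent_pass": innocent_pass,
        "innocent_fail": innocent - innocent_pass,
    }


group1 = expected_counts(impaired=100, innocent=18)
arrests = group1["impaired_fail"] + group1["innocent_fail"]
arrest_accuracy = group1["impaired_fail"] / arrests
prevalence = 100 / (100 + 18)

print(group1)
print(f"prevalence {prevalence:.0%}, arrest accuracy {arrest_accuracy:.0%}")
```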

Two numbers I did pick. The 100 impaired and 18 innocent drivers. I didn't pick randomly. I chose carefully. The truth is, my WAT validation study's accuracy "discovery" is not a discovery at all. I knew ahead of time what I wanted to discover, and I rigged the study. I deliberately manipulated the study group. I loaded up on drunks. I knew if I did, the accuracy statistic I "discovered" would be high.

To rig the study, I didn't have to sneak into the lab after dark and switch results. I didn't have to get honest, upright police officers to cheat. I didn't have to change the SFST in any way. I didn't have to lie. All I had to do was set up my study design so most of the people we would be studying would be impaired. Just load up on drunks. That's it.

Almost. Also, I had to ignore basic science and conceal sensitivity, specificity and prevalence. And I had to be sure my work wasn't peer reviewed.

 

GROUP 2 has NO impaired drivers. Look what happens to the NHTSA’s arrest accuracy statistic as the mix of innocent and impaired drivers changes. In this and all other examples on this page, the only thing that changes is the mix of impaired and sober drivers in the study group. All the results here reflect exactly the same highly skilled officers doing exactly the same NHTSA-standardized WAT test. The fundamental accuracies of the test stay the same. In each example, the impaired driver accuracy is 92%, and the innocent driver accuracy is 50%.

When the same highly skilled officers do exactly the same WAT test on a group of drivers who are all innocent, the NHTSA’s arrest accuracy statistic is 0%.

This happens even though the innocent driver accuracy is still 50% and the impaired driver accuracy is still 92%.

Now you see why field sobriety tests are instruments of evil.

Suppose police set up a DUI checkpoint outside a church Sunday morning, and did WAT tests on parishioners as they left the service. None of the parishioners would have any alcohol on board, but 50% of the parishioners who took the WAT test would fail. Every single failed test would be mistaken. But police are told, and trained, and do testify that SFSTs are 91% accurate. If the police believed their own test, they'd arrest dozens of parishioners, all of them innocent. This accuracy business is not a paper exercise. It damages real lives.

 

GROUP 3 has ONLY impaired drivers.

 

When the same highly skilled officers do exactly the same WAT test on a group of drivers who are all impaired, the NHTSA’s arrest accuracy statistic is 100%.

This happens even though the innocent driver accuracy is still 50% and the impaired driver accuracy is still 92%.

 

Do you see how every time the mix of drivers in the study group changes, the NHTSA’s arrest accuracy statistic changes? Skewed sample, skewed results. This is why peer reviewed scientific journals like the Journal of the American Medical Association do not accept papers that report only the accuracy statistic.

The change-the-mix accuracy swing is huge, from zero percent all the way up to 100%. Simply by skewing the impaired driver: innocent driver mix you choose to “study,” you can set up your field test beforehand to “discover” whatever arrest accuracy you’ve been paid to validate. Like this:

If you are being paid to discover an accuracy of 99%, skew your study group so that 98% of its drivers are impaired.

If you are being paid to discover an accuracy of 80%, skew your study group so that 68% of its drivers are impaired.

If you are being paid to discover an accuracy of 60%, skew your study group so that 45% of its drivers are impaired.

If you are being paid to discover an accuracy of 40%, skew your study group so that 27% of its drivers are impaired.

If you are being paid to discover an accuracy of 20%, skew your study group so that 12% of its drivers are impaired.

If you are being paid to discover an accuracy of 1%, skew your study group so that 1% of its drivers are impaired.

These examples represent highly skilled DUI officers, personally trained by Dr. Burns (effectively the inventor of the WAT test), performing NHTSA-standardized WAT tests flawlessly. I didn't make up the WAT accuracies, I found them on page 21 of the NHTSA's most recent, most up to date SFST validation study. In every example the accuracy of the WAT on innocent drivers is the same, 50%. In every example the accuracy of the WAT on impaired drivers is the same, 92%.

From example to example all that changes is the mix of impaired and sober drivers. Skewing that mix lets you manipulate the accuracy you "discover" to any number you choose, from 0% to 100%.
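Here is a Python sketch of the "pick your accuracy" calculation. It holds the WAT accuracies fixed at 92% and 50%, as in every example here, and solves for the share of impaired drivers that produces a target arrest accuracy. The function names are my own, and the formula is just the arrest accuracy formula turned around.

```python
SENSITIVITY = 0.92  # WAT accuracy on impaired drivers
SPECIFICITY = 0.50  # WAT accuracy on innocent drivers


def arrest_accuracy(prevalence: float) -> float:
    """Share of failed tests that belong to impaired drivers."""
    correct = prevalence * SENSITIVITY
    wrong = (1 - prevalence) * (1 - SPECIFICITY)
    return correct / (correct + wrong)


def required_prevalence(target: float) -> float:
    """Share of impaired drivers needed to 'discover' the target arrest accuracy."""
    false_positive_rate = 1 - SPECIFICITY
    return (target * false_positive_rate) / (
        SENSITIVITY * (1 - target) + target * false_positive_rate
    )


for target in (0.99, 0.80, 0.60, 0.40, 0.20, 0.01):
    mix = required_prevalence(target)
    print(f"target accuracy {target:.0%}: study group {mix:.0%} impaired")

# The two extremes: nobody impaired, everybody impaired.
print(f"{arrest_accuracy(0.0):.0%} and {arrest_accuracy(1.0):.0%}")
```

The test never changes in this sketch. The only input is the mix.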

San Diego validation study author Dr. Stuster responds:
Neither NHTSA nor I selected the drivers who were stopped. Drivers were stopped during the study period who officers observed exhibiting a driving error or violation, which is the only legal procedure that could be followed to assess the SFSTs under field conditions. Only one case was excluded from analysis and that was because the driver refused all chemical tests. There was no a priori selection of subjects and NO manipulation of data, except by you in your examples.

Read Greg's reply.


All NHTSA SFST validation studies "discovering" that the SFST is "extremely accurate" depend on this statistical trick.

I am not saying the NHTSA or its contractors are deliberately deceptive. I'm not saying all that money they have been paid in any way influenced their findings. I do not know, I do not care, I do not have an opinion about what they knew or didn't know, or did or did not intend. Nothing here is a statement about the knowledge or intentions of the NHTSA or its contractors. This web site is not about them. This web site is about science: the science of SFST validation theory.

I do have an opinion about the science in the NHTSA's validation studies. The science is flawed. FieldSobrietyTest.info is about the science.

Not only is it possible to pick a skewed-sample study group that inflates the "accuracy" the study "discovers," compared with the accuracy in the population in general, that's actually how it's done in real life. That's exactly how NHTSA validation studies "discover" a high accuracy for the FST.

The high "accuracy" of the SFST is a statistical trick.

Here's how the San Diego study "discovered" a high SFST accuracy:

How the NHTSA's San Diego SFST validation study "discovered" the SFST is 91% accurate
BAC 0.04%

Let's start with this diagram of just the SFST results.

Each driver in the study is a circle. Drivers who failed the SFST have a dot in their circle. 296 drivers, 292 failed the SFST = 99%.

Four circles without dots—4 drivers passed the SFST = 1%.

If juries believe the SFST, they will convict each of the 292 drivers who failed the test.

The NHTSA claims officers using the SFST were "extremely accurate" at exactly this BAC level, 0.04%. Let's use the never before published NHTSA San Diego study data to see how the SFST really performed.

 

This study group diagram represents every driver in the San Diego study at BAC 0.04%. Red circles are drivers with high BACs. Blue circles are drivers with low BACs. Drivers with green spots failed the SFST. The same four drivers don't have green spots; they passed the SFST.

267 guilty drivers, 265 failed. Failure rate 99%.
29 innocent drivers, 27 failures. Failure rate 93%.

Remember, the reasoning behind a diagnostic test is, it measures a property that people with the condition tend to have but people without the condition don't have. SFSTs measure incoordination. The theory is, people with high BAC tend to have incoordination more than people without high BAC. But look at those percentages. People with high BAC tend to have SFST-incoordination at a 99% rate. Fine so far. But people with low BACs have SFST-incoordination at a 93% rate. 99%, 93% — very little difference. As a practical matter, the SFST can't tell the difference between people with high and low BACs.

So where'd the high NHTSA accuracy statistic come from? From the mix: 267 high BAC, 29 low BAC. 90% of the drivers in the study had high BACs.

Impaired drivers who failed (Red circles, dots) are SFST successes. Innocent drivers who failed (Blue circles, dots) are SFST mistakes. The fix is, don't worry about dots, exclude blue circles. Study as few innocent drivers as possible. Load up on drunks. In the San Diego study at 0.04% BAC impaired drivers outnumber innocent drivers 90% to 10%—9 to 1. And the NHTSA accuracy statistic the study "discovered" is 91%—9 to 1.
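One way to put a number on "can't tell the difference" is the positive likelihood ratio from the glossary: the failure rate in the high BAC group divided by the failure rate in the low BAC group. A ratio near 1 means a failed test barely changes the odds. This Python sketch assumes 267 high BAC drivers, 265 of whom failed (the counts behind the 91% accuracy and 90% prevalence figures), and 29 low BAC drivers, 27 of whom failed. The variable names are my own.

```python
high_bac, high_bac_failed = 267, 265
low_bac, low_bac_failed = 29, 27

fail_rate_high = high_bac_failed / high_bac   # failure rate among high BAC drivers
fail_rate_low = low_bac_failed / low_bac      # failure rate among low BAC drivers
positive_lr = fail_rate_high / fail_rate_low  # likelihood ratio for a failed test
share_high = high_bac / (high_bac + low_bac)  # the mix of the study group

print(f"failure rate, high BAC: {fail_rate_high:.0%}")
print(f"failure rate, low BAC:  {fail_rate_low:.0%}")
print(f"positive likelihood ratio: {positive_lr:.2f}")
print(f"share of study group with high BAC: {share_high:.0%}")
```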

The study didn't discover the accuracy of the SFST, it discovered its driver ratio!
