%False-positive psychology
%\url{http://andrewgelman.com/2012/02/false-positive-psychology/}

% FalsePositives/contents.tex
%
\chapter{\mychaptername}
\label{\here}

\tocnotetoo{
In Chapter~\ref{BreakTheBank} we looked at probabilities of independent
events -- things that had nothing to do with one another. Here we think about
probabilities in situations where we expect to see connections:
screening tests for diseases, DNA evidence for guilt in a criminal
trial.}%
\teachertag
\begin{teacher}
This chapter focusses on two way contingency tables in
order to discuss several important common logical pitfalls dealing
with everyday probabilities. We think that approach makes more sense,
and is easier to remember and apply, than an explicit treatment of
dependent events and Bayes' theorem. That's too technical for our
goals in this quantitative reasoning text, and so better left for a
full course in probability and statistics. In fact, many of the
examples in this chapter employ qualitative rather than quantitative
reasoning.  

You can even skip the first two sections and the vocabulary of
dependent events and start with the section on screening for rare
diseases. 
\end{teacher}

\begin{goals}
\begin{goal}{contingencytable}
Interpret and build two way contingency tables.
\end{goal}

\begin{goal}{dependentevents}
Understand how to compute probabilities for dependent events.
\end{goal}

\begin{goal}{falsepositives}
Understand the implications of false positives.
\end{goal}
\end{goals}

\begin{chapterpix}

\begin{center}
\includegraphics[height=50mm]{\here/mrBoffo.jpg}
\end{center}

\theGlobe, August 29, 2008 \\copyright \copyright Neatly Chiseled
Features

\includegraphics[height=50mm]{\here/111polygraph2.jpg}
\includegraphics[height=50mm]{\here/thomasbayes.png}

If we use the polygraph, write an exercise about it. The Bayes image
is probably too arcane, since we don't ever mention Bayes' theorem
(although we could).

\url{http://phillips.blogs.com/goc/2008/01/lying-lie-detec.html}
\url{http://www.allspammedup.com/2011/02/what-are-bayesian-filters-anyway/}

Many google search images for ``false positive'' are about pregnancy
tests. I didn't include any here.

\end{chapterpix}

\qrsection[conditional]{UMass Boston enrollment}

Table~\ref{UMassBostonEnrollment}
summarizes student enrollment 
at UMass Boston in 2006 by category two ways: graduate/undergraduate
and male/female.
We can use the data to answer some probability questions about
a random student.%
\footnote{This is real data, but not generally interesting. It's here
to illustrate some important ideas -- we hope to replace it with
a better example for that purpose.}

\begin{table}[ht]
\centering
\includegraphics[height=30mm, width=120mm]{\here/UMBstudentTable.jpg}
\caption{UMass Boston Enrollment, 2007}
\tablesource{Handbuilt table. We're sure data is public.}
\label{UMassBostonEnrollment}
\end{table}



\begin{itemize}

\item What is the probability that a student chosen at random is an
undergraduate?

The totals in the last row of the table have the numbers we need:
\begin{align*}
\frac{\text{number of undergraduates}}{\text{number of students}}
& = \frac{10,008}{13,433} \\
& \approx 0.7450 \\
& \approx 75\%.
\end{align*}

\item What is the probability that a student is female?

For that computation we use the totals in the last \emph{column}:

\begin{align*}
\frac{\text{number of females}}{\text{number of students}}
& = \frac{8,068}{13,433} \\
& = 0.600610437 \\
& \approx 60\%.
\end{align*}

\item What is the probability that a student is a female
undergraduate?

Use the count in the first column of the first row:

\begin{align*}
\frac{\text{number of female undergraduates}}{\text{number of students}}
& = \frac{5,680}{13,433} \\
& =  0.4228392764 \\
& \approx 42\%.
\end{align*}

\end{itemize}

In each of these probability calculations we used the total number of
students (13,433) in the denominator. 

Continuing \ldots

\begin{itemize}

\item What is the probability that a female student is an undergraduate?

  Since this is a question about the female students, we need a
different denominator:

\begin{align*}
\frac{\text{number of female undergraduates}}{\text{number of
    female students}} 
& = \frac{5,680}{8,068} \\
& = 0.70401586514 \\
& \approx 70\% .
\end{align*}

\item What is the probability that an undergraduate is female?

That's a different question. This time we know the student is an
undergraduate. That calls for a different denominator:

\begin{align*}
\frac{\text{number of female undergraduates}}{\text{number of
    undergraduates}}
& = \frac{5,680}{10,008} \\
& =  0.56754596322 \\
& \approx 57\% .
\end{align*}
\end{itemize}

The last two questions sound similar, but have different answers,
because each begins with a different assumption. In the first we know
the student is female and wonder whether she's an undergraduate.
In the second, we know that the student is an undergraduate and wonder
whether it's a she. 

We're not finished thinking about these probabilities. We found that
there's a 60\% probability that a student is female. But \emph{if
we know the student is an undergraduate} then that probability drops to
57\%, because the proportion of women is different for undergraduates
than for the student body as a whole. This is not what happened when
we thought about a coin and a die
in \sref*{coindie}. The probability that the die shows a four is the
same whether the coin comes up heads or tails. Those events are
\emph{independent}. The facts \emph{is female} and 
\emph{is an undergraduate} are \emph{dependent}\index{dependent
events}. When you know one of them you know something about the
probability of the other.

We learned in \sref*{coindie} that when events are independent
you multiply to compute the probability that both happen:
\begin{align*}
\text{probability(coin head and die four)}
& = 
\text{probability(coin head)} \times \text{probability(die four)} \\
& = \frac{1}{2} \times \frac{1}{6} \\
& = \frac{1}{12}.
\end{align*}
For dependent events that won't work. We found that
\begin{equation*}
\text{probability(female and undergraduate)} = 42\%
\end{equation*}
but
\begin{align*}
\text{probability(female)} \times \text{probability(undergraduate)} 
& = 60\% \times 75\% \\
& = 45\%.
\end{align*}

In the rest of this chapter we will look at the probabilities for
dependent events, working with displays like
Table~\ref{UMassBostonEnrollment}
in examples where the consequences matter much more
than they do here.%
\footnote{
What we do with tables can also be done with formulas -- the most
important one is called \emindex{Bayes' rule}. We won't use the
formulas since we think the tables are easier to understand and the
methods using them easier to remember.}

\qrsection[falsepos]{False positives and false negatives}

Figure~\ref{vennDiagram} appeared in the
article \headline{False positives, false negatives, and the validity
  of the diagnosis of major depression in primary care} in the
September 1998 Archives of Family Medicine.%
\webref{%
http://archfami.ama-assn.org/cgi/reprint/7/5/451
}
It summarizes the results of a study of 372 patients who were screened
by family physicians for clinical depression.

\begin{figure}[ht]
\centering
\includegraphics[height=60mm]{\here/vennDiagram.png}
\caption{Diagnosing depression}
\figsource{\url{http://archfami.ama-assn.org/cgi/reprint/7/5/451}}
\figcomment{Permission needed.}
\label{vennDiagram}
\end{figure}

The numbers in the four categories in the figure are easier to
understand when they are displayed in Table~\ref{vennTable}, where
we have included percentages along with the counts.%
\footnote{
Since we rounded the percentages they actually add up to just 99\%
instead of the 100\% shown.
}
\begin{table}[ht]
\centering
\begin{tabular}{|c|c|r|r||r|}
\hline 
\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{depressed} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{diagnosed}
	 & yes &  31 (8\%) & 34 (9\%) & 65 (17\%)\\
	 & no  &  50 (13\%)  & 257 (69\%) & 307 (83\%) \\
\hline
\hline
 & total & 81 (21\%) & 291 (79\%) & 372 (100\%) \\
\hline
\end{tabular}
\caption{Diagnosing depression}
\tablesource{Data from
\url{http://archfami.ama-assn.org/cgi/reprint/7/5/451}}
\tablecomment{Needs permission.}
\label{vennTable}
\end{table}

%\multirow{2}{*}{\begin{sideways}diagnosed\end{sideways}}
%	 & \begin{sideways}yes\end{sideways}&  31 & 34 & 65 \\
%	 & \begin{sideways}no\end{sideways} &  50 & 257 & 307 \\
%

Two by two tables like this are called {\em
contingency tables}\index{contingency
tables}. Figure~\ref{contingencyTable} shows the standard names for
the four top left cells: true positive,
false positive, false negative and true negative. In this example
they have values 31, 34, 50 and 257, corresponding to probabilities
$31/372 \approx 8\%$, $34/372 \approx 9\%$, $50/372 \approx 13\%$ and
$257/372 \approx 69\%$.% 

\begin{figure}[ht]
\centering
\includegraphics[height=40mm, width=120mm]{\here/FalseNegChart.png}
\caption{A two way contingency table}
\figsource{Hand built.}
\figcomment{Redraw to make it look nicer.}
\label{contingencyTable}
\end{figure}

The most interesting conclusions you can draw from a contingency table
use the row and column totals along with the individual entries.

\begin{itemize*}

\item The first row total tells us that about $8\% + 9\% = 17\%$ were
diagnosed as depressed.

\item The first column total tells us that about
$8\% + 13\% = 21\%$ were actually depressed.
\end{itemize*}

The fact that the two probabilities are close suggests that this is a
good test: it identifies just about the right fraction of the
population. But when you study the rows and columns separately, a
different story emerges. 

\begin{itemize}

\item
The second column says that the \emindex{false positive rate}
is $34/291 \approx 0.117 \approx 12\%$. That means about 12\% of the people
who are not depressed are nevertheless diagnosed as depressed.

\item
The first column says that
\emph{if a person is depressed} the probability that he or she will be
diagnosed correctly is only $31/81 \approx 38\%$. There's a $62\%$
chance the condition will be missed. That 62\% is the \emindex{false
negative} rate. 
\end{itemize}

Whether this is a ``good'' test is a difficult decision.  Although the
chance of misdiagnosis of depression when it doesn't exist is fairly
low -- about 12\% -- the 62\% false negative rate says that the test will
identify fewer than half the depressed people.
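
The first row of Table~\ref{vennTable} supports one more computation,
offered here as a check on the counts rather than as a result reported
by the study: the probability that a patient \emph{diagnosed} as
depressed really is depressed. The denominator is the 65 patients in
that row, not the 291 used for the false positive rate.
\begin{align*}
\frac{\text{number correctly diagnosed as depressed}}{\text{number
    diagnosed as depressed}}
& = \frac{31}{65} \\
& \approx 48\% .
\end{align*}
So about 52\% of the positive diagnoses in this study ($34/65$) were
wrong, even though the false positive rate is only about 12\%.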

\qrsection[rare]{Screening for a rare disease}

A test with a small false positive rate looks like a good candidate for
screening large populations for a nasty disease. However, if the
disease is rare, the test may not be as good as it looks. In this section
we'll study two examples, one made up and one real.

Suppose a drug company has developed a test for rare disease
X. Clinical trials show that the test is 90\% accurate at detection,
so the false negative rate is 10\%. The false positive rate is only 1\%.

These are the important questions:

\begin{enumerate}
\item What is the probability that a person who suffers from X tests
positive?

\item What is the probability that a person who tests positive suffers
from X?
\end{enumerate}

If the test were perfect each question would have the same answer:
100\%. But the two facts \emph{suffers from X} and \emph{tests
  positive} are not exactly the same.  Knowing either one makes the
other more likely, but not certain. We want to find out how much more
likely in each case.

Question 1 is easy: the drug company's clinical
trials found that the answer is 90\%.

Whether that test is as good as it sounds depends in part on the
answer to the second question. That answer depends on two things: the
false positive rate and the number of people who actually have X.
Suppose it's rare -- affecting just one person in every 1,000 (one
tenth of one percent of the population). Then even though the
false positive rate is only 1\%, most of the positive results will come
from healthy people. To find the actual value for ``most of'' we will
build the contingency table.

Since percentages (particularly small percentages) are often
confusing, we'll build our table for an imaginary population of
100,000 people that just matches the statistical profile for this
test.%
\footnote{
We introduced this strategy for dealing with percentages in
Chapter~\ref{Percentages}.}
In that population of 100,000, one out of every 1,000
will have the disease. That's 100 people. Of those 100, 90\% (so 90
people) will test positive. The other 10 will be the false negatives.
Of the 99,900 healthy people, one percent (999) will test
positive. The other 98,901 will be the true
negatives. Figure~\ref{fig:xtable} shows the contingency table.

\begin{minipage}[c]{\textwidth}
\centering
    \includegraphics[width=3.0in]{\here/xtablecropped}
    \captionof{figure}{Screening for disease X}
    \label{fig:xtable}
\end{minipage}

Now we can answer the second question. The probability that someone
who tests positive is actually ill with X is only $90/1089 \approx 8.26\%$.
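
As a check, the counts behind that answer can be laid out step by
step. Everything here comes from the imaginary population of 100,000
described above, so the figures are only as good as the assumed
incidence of one in 1,000.
\begin{align*}
\text{number of positive tests} & = 90 + 999 = 1,089 \\
\text{probability(ill, given a positive test)}
& = \frac{90}{1,089} \approx 8.26\% \\
\text{probability(healthy, given a positive test)}
& = \frac{999}{1,089} \approx 91.74\% .
\end{align*}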

Is this acceptable? Maybe, maybe not. If the test is inexpensive and
there's a second test (perhaps more expensive) that can weed out the
false positives, and the disease can be treated successfully if
detected, perhaps the screening is a good idea.  If all the people who
test positive must undergo expensive painful unreliable treatment,
which would be unnecessary for more than 90\% of them, then the
screening is probably a bad investment of scarce health care
resources.


%The insurance company would
%like to think so. They advertise to ask you to ask your doctor
%for the test, lobby doctors to screen their patients, and lobby the
%insurance companies to cover the cost.

Now for a real example. Instead of disease X we'll look at prenatal
screening for the birth defect \myindex{Trisomy 18}. The quote that
follows is from a July 2008 web posting T Davies wrote in response
to an on line query. His prose is dense and complicated. You have to
read it carefully over and over again to extract the meaning.
Our first job is to understand what he's trying to say. Then we'll look at the
numbers and check his conclusions.
\teachertag
\begin{teacher}
This example started out as an exercise. We discovered (and should not
have been surprised) that it's much too difficult for students to read
independently -- but well worth the time in class.
\end{teacher}

\begin{qwrap}
\begin{quotation}
\firstline{Prenatal screening for trisomy 18: Should not be contemplated.}

K Spencer and colleagues' claims for prenatal detection of trisomy 18
by measurement of maternal serum (alpha) fetoprotein and free $\beta$
human chorionic gonadotrophin concentrations are impressive. Detection of
50\% of cases for a false positive rate of only 1\% seems to compare
favourably with the detection rate for Down's syndrome when similar
techniques are used, which is 70\% for a false positive rate of
5\%. Unfortunately, the authors fail to emphasise the importance of the
relative incidence of the two conditions at birth before concluding
that screening for trisomy 18 should be introduced. 

The natural incidence of Down's syndrome at birth is approximately
12.6/10,000 births. Among 10,000 pregnant women a 70\% sensitivity
would result in 8.8 cases being detected at the cost of 500
amniocenteses (5\% of 10,000). This means that one case of Down's
syndrome is detected for every 57 amniocenteses performed. The
incidence of trisomy 18 at birth is 1.3/10,000 births. A sensitivity
of 50\% would detect 0.65 cases per 10,000 women tested at a cost of
100 amniocenteses (1\% of 10,000). For each case of trisomy 18
detected, therefore, 154 women would have to have had
amniocentesis. Thus a screening programme would cause the abortion of
at least as many normal fetuses as it would detect cases of trisomy
18. 

In many places it is still undecided whether screening for Down's
syndrome is worth the disbenefits for the prospective parents. To my
mind, the decision is clear for screening for trisomy 18: screening
should not be contemplated until the predictive value of the test is
considerably improved.%
\webref{%
http://askville.amazon.com/understand-False-Positive-test-Trisomy-18/AnswerViewer.do?requestId=12714458
}
\end{quotation}
\sourceinfo{http://askville.amazon.com/understand-False-Positive-test-Trisomy-18/AnswerViewer.do?requestId=12714458}
\end{qwrap}

The first sentence says that the screening procedure for Trisomy 18
tests for chemicals in the mother's blood. If the test is positive
then the diagnosis is confirmed by an \emindex{amniocentesis}.
You don't have to know what amniocentesis is, but you
do have to know that it has some risks: there is a small chance that
it will lead to a miscarriage. Davies' goal is to convince you to
believe the last sentence of his next to last paragraph:

\begin{quotation}
Thus a screening programme would cause the abortion of at least as
many normal fetuses as it would detect cases of trisomy 18. 
\end{quotation}

The ``abortion'' he refers to is his synonym for ``miscarriage,'' not
the politically charged ``abortion'' so much in the news.

Let's do the numbers. To build the contingency table we need three of
them:
\begin{itemize}
\item The false negative rate. Since the test detects 50\% of cases
  the other 50\% are the false negatives.
\item The false positive rate. It's only 1\%.
\item The incidence rate. The second paragraph tells us it's 1.3 per
  10,000 births. 
\end{itemize}

Figure~\ref{fig:trisomytable} shows 
a screenshot of the spreadsheet
\link{ContingencyTable.xlsx} with entries for this
problem. Cell~\cell{B12} is named \excel{INCIDENCE}; it contains the
formula \excel{=1.3/10000}, formatted as a percent. Cell~\cell{B17}
for the number of true positive results contains the formula

\displayexcel{=POPULATION*INCIDENCE*(1-FALSENEG)}

which in this example is 
%
\begin{equation*}
10,000 \times \frac{1.3}{10,000} \times (1-0.5) = 0.65,
\end{equation*}
%
confirming Davies' ``0.65 cases per 10,000 women tested.'' 


\begin{figure}
\centering
\includegraphics[width=4.0in]{\here/trisomytable}
\captionof{figure}{Screening for Trisomy 18}
\label{fig:trisomytable}
\end{figure}

In order to check the diagnosis in those cases, every woman with a
positive test would need an amniocentesis. The spreadsheet shows
100.637 positive tests, which matches Davies' estimate of 100. So it
would take 100 amniocenteses to find 0.65 cases of Trisomy 18. That
works out to $100/0.65 \approx 154$ amniocenteses to find each case. He
concludes that ``a screening programme would cause the abortion of at
least as many normal fetuses as it would detect cases of trisomy 18.''

That would be true if the risk of miscarriage from amniocentesis were
about one in 150. It's probably smaller. Several web sources provide
statistics like these:

\begin{qwrap}
\begin{quotation}
Miscarriage is the primary risk related to amniocentesis. The risk of
miscarriage ranges from 1 in 400 to 1 in 200. In facilities where
amniocentesis is performed regularly, the rates are closer to 1 in
400. 
\webref{http://americanpregnancy.org/prenataltesting/amniocentesis.html}
\end{quotation}
\sourceinfo{http://americanpregnancy.org/prenataltesting/amniocentesis.html}
\end{qwrap}

If we use one in 300 instead of one in 150 as the probability of
miscarriage from an amniocentesis then it costs about one unnecessary
miscarriage to detect two cases of Trisomy 18. That's still a pretty
high risk. 
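
The arithmetic behind that estimate can be sketched as follows. Two
assumptions are ours: that the spreadsheet counts false positives as
1\% of the women whose fetuses are unaffected (which reproduces its
total of 100.637 positive tests), and that the miscarriage risk is one
in 300, a value chosen from the middle of the quoted range.
\begin{align*}
\text{false positives} & = 1\% \times (10,000 - 1.3) = 99.987 \\
\text{positive tests} & = 0.65 + 99.987 = 100.637 \\
\text{amniocenteses per case detected}
& \approx \frac{100}{0.65} \approx 154 \\
\text{miscarriages per case detected}
& \approx 154 \times \frac{1}{300} \approx 0.51 .
\end{align*}
About half a miscarriage for each case detected is the same as one
miscarriage for every two cases.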

Davies compares his risk estimate to the much lower estimate for
similar screening for Down's syndrome, noting that ``In many places it
is still undecided whether screening for Down's syndrome is worth the
disbenefits for the prospective parents.''

Exercise~\exref{downs} asks you to check Davies' arithmetic for
Down's syndrome.

\qrsection[prosecutor]{The prosecutor's fallacy}
\index{prosecutor's fallacy}

The Cornell University Legal Information Institute posted a discussion
of \emph{McDaniel v. Brown} when that case was on the docket of the
Supreme Court. They wrote \index{McDaniel v. Brown}\index{Supreme Court}

\begin{qwrap}
\begin{quotation}
\firstline{Following a state conviction for sexual assault, Troy Brown}
filed a petition for writ of habeas corpus in the United States
District Court for the District of Nevada. The District Court allowed
Brown to present new evidence: a report from Dr. Lawrence
Mueller. This report detailed a statistical error (``prosecutor's
fallacy'') made by the prosecution during the presentation of DNA
evidence. Based on Dr. Mueller's report, the District Court dismissed
the DNA evidence from consideration, found insufficient evidence to
convict Brown, and ordered a retrial.  

\ldots

At trial, Renee Romero, a forensic scientist at the Washoe County Crime
Lab, testified that the DNA found in the victim's underwear matched
Brown's DNA; only one in three million people would match the DNA
tested. The prosecutor asked Romero to express this statistic
as ``the likelihood that the DNA found \ldots is the same as the DNA
found in [Brown's] blood.'' Romero concluded that the likelihood was 99.999967
percent. Based on this statistic, the prosecutor then
asked Romero if it would be fair to conclude that there was a 0.000033
percent chance that the DNA did not belong to Brown. Romero
agreed with the prosecutor, stating that this was ``not
inaccurate.''%
\webref{%
http://topics.law.cornell.edu/supct/cert/08-559}
\end{quotation}
\sourceinfo{http://topics.law.cornell.edu/supct/cert/08-559}
\end{qwrap}

Romero's arithmetic is right: one in three million  is 0.000033
percent. But her thinking is wrong. 

The prosecutor's fallacy is the claim that the one in three
million probability of a random match is the same as the probability
that the defendant is the source of the DNA sample. We can use a
contingency table to show why those probabilities are different.

First we need an
estimate of the population in which a possible DNA match might be
found. To make the arithmetic easier, we'll take that to be 9 million
people (Los Angeles is near enough to Nevada). Then the ``one in three
million'' statistic says we should expect three DNA matches from that
population. Table~\ref{mcdaniel} summarizes the data.

\begin{table}[ht]
\centering
\begin{tabular}{|c|r|r||r|}
\hline 
%\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{} \\
%\hline
\multicolumn{1}{|c|}{}  & guilty & innocent & total\\
\hline
DNA match &  1 &   2  & 3 \\
DNA nonmatch & 0  & 8,999,997  &  8,999,997 \\
\hline
\hline
 total & 1 & 8,999,999 & 9,000,000 \\
\hline
\end{tabular}
\caption{DNA matching}
\tablesource{Handbuilt data.}
\label{mcdaniel}
\end{table}

The first row of that table tells us that if the only evidence in the
case is the DNA match the odds are $2:1$ that the suspect is innocent!
That's a far cry from the ``99.999967\% guilty'' that the prosecutor
asked the jury to believe.
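
Written as a pair of computations, the two probabilities that the
prosecutor confused look like this. The second line depends on our
assumed population of 9 million and on the DNA match being the only
evidence, so treat it as an illustration rather than a fact about the
case.
\begin{align*}
\text{probability(a random person matches)}
& = \frac{1}{3,000,000} \approx 0.000033\% \\
\text{probability(a person who matches is guilty)}
& = \frac{1}{3} \approx 33\% .
\end{align*}
The first number is what the crime lab measured. The second is what
the jury needs to know.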

The defense didn't make this argument using a hypothetical 9,000,000
population of potential suspects. Instead they questioned 
the ``one in three million'' chance 
of a match. The defendant had near relatives in the
area, which increased the chance of a coincidental match to about one in 6,500,
according to a defense specialist. By the prosecutor's reasoning, that would
reduce the claimed likelihood that the DNA was Brown's to
$6499 / 6500 = 0.999846154 \approx 99.98\%$. 
We're not surprised that the change from 99.999967 percent to 99.98\%
did not convince the jury to acquit. 99.98\% still sounds very much
like a sure thing.

But it's not, because of the prosecutor's fallacy. That was the basis
for the appeal. Suppose we
reduce the population from which the match might come to just
100,000 -- the nearby area where there may be close relatives. Then
the 1 in 6,500 chance of a match means there will be about 15 matches
in that population.
The numbers in the revised contingency table, Table~\ref{mcdaniel2},
show that the odds are now $15:1$ that the 
DNA match fingers an innocent person rather than the true criminal.

\begin{table}[ht]
\centering
\begin{tabular}{|c|r|r||r|}
\hline 
%\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{} \\
%\hline
\multicolumn{1}{|c|}{}  & guilty & innocent & total\\
\hline
DNA match &  1 &   15  & 16 \\
DNA nonmatch & 0  & 99,984  &  99,984 \\
\hline
\hline
 total & 1 & 99,999 & 100,000 \\
\hline
\end{tabular}
\caption{DNA matching}
\tablesource{Handbuilt data.}
\label{mcdaniel2}
\end{table}
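
Reading across the first row of Table~\ref{mcdaniel2} gives the
corresponding probability. It rests on our assumptions of a population
of 100,000 and no evidence other than the DNA.
\begin{align*}
\frac{\text{number of guilty matches}}{\text{number of matches}}
& = \frac{1}{16} \\
& \approx 6\% .
\end{align*}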

Nevertheless, the story did not end well for Brown. 

\begin{qwrap}
\begin{quotation}
\firstline{The Supreme Court [overturning the appeals court order for}
a retrial] said 
in a per curiam opinion that overstated estimates of a DNA match at
trial did not warrant reversal of a conviction when there is still
``convincing evidence of guilt.''% 
\webref{%
http://www.criminallawlibraryblog.com/2010/01/us_supreme_court_update_mcdani.html
}
\end{quotation}
\sourceinfo{http://www.criminallawlibraryblog.com/2010/01/us_supreme_court_update_mcdani.html}
\end{qwrap}

%\footnote{
%\url{http://www.supremecourt.gov/opinions/09pdf/08-559.pdf}
%}

%http://www.scotusblog.com/2009/08/argument-preview-mcdaniel-v-brown/
%\url{http://en.wikipedia.org/wiki/Prosecutor's\_fallacy} 
%http://www.conceptstew.co.uk/PAGES/prosecutors_fallacy.html
%http://www.wacocriminallawblog.com/2010/01/articles/evidence-and-procedure/the-prosecutors-fallacy/
%Mr. Brown was charged with sexual assault. The victim could not
%identify him, and the evidence was all circumstantial; the type where
%it could support innocence just as easily as guilt. The most
%compelling evidence was DNA recovered from sperm on the victim's
%panties. And it was the DNA evidence that was the focus of the writ
%proceeding. 
%
%Mr. Brown lived with his brother, and there was another brother that
%also knew the victim. They all lived in the same trailer park, so it
%was obvious that there would be an issue as to whether the DNA could
%be attributed to one of the brothers. The argument was over
%probabilities; according the State's expert, the probability that
%another person from the general population would have the same DNA
%profile was 1 in 3,000,000. The defense expert expert said it was more
%like 1 in 6,500. 
%
%The prosecutor's fallacy is the assumption that the random match
%probability is the same as the probability that the defendant is the
%source of the DNA sample. In other words, you cant take that the above
%statistic and say the probability that someone other than the
%defendant committed the offense was 1 in 3,000,000; or that there is a
%99.9% chance that the defendant is guilty. 

%http://dna-view.com/profile.htm

\qrsection[retrospective]{Should they have known?}

After unusual disasters like terrorist attacks, earthquakes, severe
storms or airplane crashes you often hear finger-pointing discussions about
the incompetence of the agencies charged with predicting (perhaps even
preventing) what happened. Those discussions may start with a search
that discovers warning signs that were ignored.

Sometimes there were real lapses, and policies and
practices must be designed to prevent a recurrence.
But often blame is unjustified. Table~\ref{table:disaster} 
explains why, even without numbers. You might call this
\emindex{qualitative reasoning}. 

\begin{table}[ht]
\centering
\begin{tabular}{|c|r|r||r|}
\hline 
\multicolumn{1}{|c|}{}  & disaster & nothing happens & total \\
\hline
warning &  rare &   often  & often \\
no warning & rare  & usually  &  almost always \\
\hline
\hline
 total & rare & usually &   \\
\hline
\end{tabular}
\caption{Should it have been predicted?}
\tablesource{Handbuilt data.}
\label{table:disaster}
\end{table}

With numbers in the first column you can compute the
probability that a disaster occurs with no warning at all.
With numbers in the first row you can compute the
probability that a particular warning actually corresponds to a
disaster about to happen. That probability is small, because
there are many warnings but few
disasters. Most warnings are false positives. 

That means there are often good reasons for ignoring a warning. For
example, if a state agency believes an earthquake warning it may order
the evacuation of an entire city. The expense and disruption from repeated
evacuations that are not followed by an earthquake may be worse than
the consequences in the rare instance when the earthquake
happens. Just because after the fact you look back and find clues in the
seismic record that suggested an earthquake was imminent doesn't mean
the agency should have acted.

\begin{teacher}

If you want to go further into the analysis of dependence (perhaps
leading to Bayes' theorem) consider two way tables as the entry
point. Independence corresponds to tables whose rows (and hence
columns) are proportional. Those are the only ones that can be modeled
using areas of parts of a square, as in the last chapter.

Causation corresponds to tables with a 0 in one quadrant.
\end{teacher}

\exstart

\begin{exx}{\untested\routine\sref{falsepos}
\gref{contingencytable}\gref{falsepositives}}
Depression

Compute the false positive and false negative rates from the 
probabilities for each of the four corners of the contingency
Table~\ref{vennTable}. Check that the answers match those in the text
computed from the actual counts.

Then use your computed rates in the spreadsheet 
\link{ContingencyTable.xlsx} to check the counts in the corners.
\end{exx}

\begin{exx}{\hassolution\routine\sref{falsepos}
\gref{contingencytable}\gref{falsepositives}}
\headline{Researchers link chronic fatigue syndrome to class of virus}

\begin{qwrap}
\begin{quotation}
\firstline{WASHINGTON -- A well-respected team of scientists released}
long-awaited new evidence yesterday that a virus could be playing a
role in chronic fatigue syndrome. 

The researchers, from the National Institutes of Health, the Food and
Drug Administration, and Harvard Medical School, analyzed blood
samples that had been collected 15 years ago from 37 patients with
chronic fatigue syndrome. Most of the subjects -- 32, or 86.5
percent -- tested positive for a virus known as a murine leukemia
virus-related virus, the researchers found. In contrast, tests on 44
healthy blood donors detected evidence of the virus in only three of
the subjects, or 6.8 percent.%
\webref{%
http://www.boston.com/news/nation/articles/2010/08/24/researchers_link_chronic_fatigue_syndrome_to_class_of_virus/
}
\index{chronic fatigue syndrome}
\end{quotation}
\sourceinfo[515]{http://www.boston.com/news/nation/articles/2010/08/24/researchers_link_chronic_fatigue_syndrome_to_class_of_virus/}
\end{qwrap}

\begin{abcd}
\item Construct the contingency table for this diagnostic tool. You
  may do this by hand, or with the spreadsheet
\link{ContingencyTable.xlsx}.

\item Explain why this test is potentially important for research on
chronic fatigue syndrome but might not be a good screening test.
\end{abcd}

\begin{sol}

\begin{abcd}
\item Construct the contingency table for this diagnostic tool.

\begin{center}
\begin{tabular}{|c|c|r|r||r|}
\hline 

\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{has chronic fatigue
syndrome} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{tested positive}
	 & yes &  32  &   3  & 35 \\
	 & no  &  5  &  41 & 46 \\
\hline
\hline
 & total & 37 & 44 & 81 \\
\hline
\end{tabular}

\end{center}

\item Explain why this test is potentially important for research on
chronic fatigue syndrome but might not be a good screening test.
\end{abcd}

The test suggests pretty clearly that a virus may be involved in
chronic fatigue syndrome. That is a lead worth pursuing with further
research. However, the false positive rate is almost 7\%. Since the
disease isn't very common, screening will produce lots of false
positive results, with concomitant anxiety and expense.
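
To see how lopsided screening could be, here is a rough sketch with an
invented prevalence. Suppose, purely for illustration (this figure is
our assumption, not a number from the article), that 1\% of 10,000
people screened have chronic fatigue syndrome, and that the rates found
in the study carry over to the general population. Then about
\[
0.865 \times 100 \approx 87
\]
of the 100 people with the syndrome would test positive, while about
\[
0.068 \times 9{,}900 \approx 673
\]
of the 9,900 people without it would also test positive. Under these
assumptions only
\[
\frac{87}{87 + 673} = \frac{87}{760} \approx 11\%
\]
of the positive results would belong to people who have the syndrome.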

The test might be good for people who already show symptoms suggesting
they have the disease. 

\end{sol}
\end{exx}

\begin{exx}{\untested\sref{falsepos}\gref{contingencytable}
\gref{falsepositives}}
Pregnancy tests

A home \myindex{pregnancy test} kit web site says

\begin{qwrap}
\begin{quotation}
\firstline{Home tests are usually 97\% accurate when all instructions}
are followed correctly and the results are read on time. 

A false positive pregnancy test is when the test says that you are
pregnant but actually you are not. This is a one off case and a
positive pregnancy test is a pretty good indication that you are
pregnant. False positive pregnancy tests are rare - though there are
instances and conditions where they can occur.%
\webref{%
http://www.babyhopes.com/articles/falsepositive.html
}
\end{quotation}
\sourceinfo{http://www.babyhopes.com/articles/falsepositive.html}
\end{qwrap}

Assume that ``97\% accurate'' means a false positive rate and a false
negative rate of 3\%.

\begin{abcd*}

\item Explain why the probability that a woman testing positive is
  pregnant is \emph{less than} 97\%.

\item Explain why that probability is probably not a lot less than 97\%.

\end{abcd*}

\begin{hint}
For (b), think about when a woman is likely to use a pregnancy test.
\end{hint}

\end{exx}

\begin{exx}[downs]{\untested\sref{rare}
\gref{contingencytable}\gref{falsepositives}}
Prenatal screening.\index{prenatal screening}

Check the calculations for Down's syndrome testing using the data in
the quotation in \sref{rare}.
\end{exx}

\begin{exx}{\untested\complex\sref{falsepos}\gref{contingencytable}
\gref{falsepositives}}
Spam\index{spam}

Spam is junk email. Most mail systems have a spam filter that tries to
decide whether each piece of email you get is spam. When the spam
filter finds something it thinks is spam, it may throw it away, or put
it in a junk mail folder so that you can decide whether to throw it
away without reading it. 

Before my university department set up a spam filter I ran my own.%
\footnote{The ``I'' here is Ethan Bolker, one of the authors, not the
generic authorial ``we'' we use in most of the book.
}
I found that I got about 250 emails each day. My spam filter trapped
about 175 of them. Of those about five were legitimate, and should
have been delivered directly to me. My inbox, which should
contain just the emails that aren't spam, was usually about half
spam. So (in words) my spam filter is pretty good (but not perfect) at
recognizing legitimate email but not very good at calling spam
spam. 

\begin{abcd}

\item Build a two way contingency table with
row categories ``marked spam'' and ``not marked spam'', column categories  
``spam'' and ``legitimate''. 

\item Compute and interpret the false positive and false negative rates.

\item
Explain why both the false positives and the false negatives make
dealing with my email harder.

\item
I can adjust the settings in my spam filter to reduce the false
positive rate. Explain why that would increase the false negative
rate.

\item Is the number of spam emails I received consistent with
this quotation from the August 6, 2007 issue of \theNewYorker?

\begin{qwrap}
\begin{quotation}
\firstline{More than a hundred billion unwanted messages clog}
computer networks every day.%
\webref{%
http://www.newyorker.com/reporting/2007/08/06/070806fa_fact_specter
}
\end{quotation}
\sourceinfo{http://www.newyorker.com/reporting/2007/08/06/070806fa_fact_specter}
\end{qwrap}

\item
What is the original meaning of the word ``spam''? Does the company that
sells (the real) spam object to the new meaning? 

\item How do you deal with spam? (If your email provider does all the
filtering for you, you may not even know it's throwing things away
before you see them, so you may need to do some research on your email
provider's web site to find the answers to these questions.)

\begin{itemize}

\item
Who provides your email service? (your university, your company,
Google, Yahoo, ... ?) 

\item
Do you have any say in how your email provider filters spam for you?
If so, what do you tell it?

\item
Estimate the data you need to build the two way table for your spam
statistics and compute the false negative and false positive rates.

\end{itemize}

\end{abcd}

Here are some web sites to look at if you want to find out more about spam.

\begin{itemize}

\item
\url{http://www.imediaconnection.com/content/3649.asp}. There are some
useful tips here about how to keep other people's spam filters from
thinking mail from you is spam. 

\item
Tools your system administrator might use:
\url{http://www.spamcop.net/}, \url{http://www.spamhaus.org/}

\end{itemize}

\end{exx}

\begin{exx}{\hassolution\sref{falsepos}\gref{contingencytable}
\gref{falsepositives}} 
Plagiarism
\index{plagiarism}

In 2006 UMass Boston experimented with the \myindex{plagiarism}
detection software described at
\url{http://www.turnitin.com} that 
claims it can identify plagiarism in essays students write. 
UMass did not purchase the software after the experiment.
Perhaps the possibility of false positives contributed to that
decision.

Suppose that the software can actually detect every cheater and that
it's 99\% accurate in declaring honest students honest. (We made up
these numbers since the company does not advertise them.) Sounds like
a pretty good test. 

\begin{abcd}

\item
Estimate how many papers are submitted by students at your school each
semester. 

\item
Suppose that most students are honest. Estimate how
many students will be falsely accused of cheating.

\item
What are the advantages and disadvantages of using the software?
(There are several arguments on both sides of the question. Think of
as many as you can.) 

\item Read and write about this article from \theTimes: 
\url{http://www.nytimes.com/2010/07/06/education/06cheat.html}

% Bates/Bowdoin tutorial http://abacus.bates.edu/cbb/quiz/index.html 

% plagiarism resouce site
% http://abacus.bates.edu/cbb/index71ca.html?q=node

\end{abcd}

\begin{sol}

\begin{abcd}

\item
Estimate how many papers are submitted by students at your school each
semester. 

In the spring of 2011 there were about 13,000 students at UMass
Boston. If each one wrote six papers a semester that would come to
about 80,000 papers -- a nice round number in the right ballpark.

\item
Suppose that most students are honest. Estimate how
many students will be falsely accused of cheating.

Since most of the 80,000 papers are honest, the false positive rate
applies -- one percent of them, or 800 papers, will be falsely tagged
as plagiarised. That might
not be quite 800 students, since some students might be unjustly
accused twice, but the order of magnitude is right.

\item
What are the advantages and disadvantages of using the software?
(There are several arguments on both sides of the question. Think of
as many as you can.) 

An advantage is that some plagiarists will be caught who might
otherwise get away with it. Another is that students might be less
likely to cheat knowing that this software was being used.

I can think of several disadvantages. One is the anxiety caused by the
false accusations. Another is the cost. 
\end{abcd}

\end{sol}
\end{exx}

\begin{exx}{\untested\sref{falsepos}\gref{falsepositives}}
Mad cow disease\index{mad cow disease}

\myindex{Bovine Spongiform Encephalopathy} (\myindex{BSE})
is a disease fatal to people who eat infected beef products.
BSE is rare in cattle; the test used to detect it has a
false positive rate of one in 100,000.

\begin{abcd}
\item Express this false positive rate as a percentage. 
Explain what it means.

\item The United States tested about 788,000 cattle between 2004 and
2006. About how many cattle would test positive for BSE?

\suspend{abcd}
\begin{hint}
Since you do not know the actual number of infected cattle, you can't
know exactly how many would test positive. But using the fact that the
disease is rare, you can estimate the number of positive test results
using the known false positive rate.
\end{hint}
\resume{abcd}

\item Discuss whether you would worry more about a false
positive result or a false negative result.

\end{abcd}

\end{exx}

\begin{exx}{\untested\sref{falsepos}\gref{falsepositives}}
Airport screening
\index{airport screening}

In response to the article \headline{Screening programme evaluation applied
to airport security}
in the December 10, 2007 issue of the British Medical
Journal,
Ganesan Karthikeyan wrote

\begin{qwrap}
\begin{quotation}
\firstline{It is probably true that airport security in its present}
form is not an efficient screening measure. However, one important
difference exists between screening for disease in individual patients
and screening for, say, explosives in airports. While one missed
cancer on screening can cause the loss of at the most, one life, the
number of potential lives lost per missed screening at airports can be
substantially larger. This has to be factored into any attempts at
evaluation of the process.%
\webref{%
http://www.bmj.com/rapid-response/2011/11/01/cost-negative-test
}
\end{quotation}
\sourceinfo{
http://www.bmj.com/rapid-response/2011/11/01/cost-negative-test
}
\end{qwrap}

It's clear that a false negative is a disaster. Discuss the
consequences of a high false positive rate.

\end{exx}

%\begin{exx}{\untested\sref{falsepos}\gref{contingencytable}}
%\myindex{Mr. Boffo}
%
%Construct a reasonable two way contingency table that
%incorporates the data in the Mr. Boffo cartoon that starts this
%chapter. Label the rows and columns. Explain how you made up the
%numbers.
%\end{exx}

\begin{exx}{\hassolution\worthy\sref{falsepos}\gref{contingencytable}
\gref{falsepositives}}
Breast cancer screening.

In his \headline{Chances Are} blog in \theTimes{} on
April 25, 2010 Steven Strogatz wrote about a diagnostic puzzle
presented to several doctors:

\begin{qwrap}
\begin{quotation}
\firstline{The probability that [a woman in this cohort] has breast}
cancer 
is 0.8 percent.  If a woman has breast cancer, the probability is 90
percent that she will have a positive mammogram.  If a woman does not
have breast cancer, the probability is 7 percent that she will still
have a positive mammogram.  Imagine a woman who has a positive
mammogram.  What is the probability that she actually has breast
cancer?

\ldots

[When 24 doctors were asked this question], their
estimates whipsawed from 1 percent to 90 percent.   Eight of them
thought the chances were 10 percent or less, 8 more said 90 percent,
and the remaining 8 guessed somewhere between 50 and 80 percent.
Imagine how upsetting it would be as a patient to hear such divergent
opinions. 
\webref{
http://opinionator.blogs.nytimes.com/2010/04/25/chances-are/
}
\end{quotation}
\sourceinfo[1800]{http://opinionator.blogs.nytimes.com/2010/04/25/chances-are/}
\end{qwrap}

\begin{abcd}
\item
What is the correct answer? 
\suspend{abcd}

\begin{hint}
Build the contingency table, based on a population of 1,000 women
tested. You may do this by hand or with  the spreadsheet
\link{ContingencyTable.xlsx}. 

\end{hint}

\resume{abcd}
\item What percentage of the 24 doctors got the correct answer?

\end{abcd}

\begin{sol}
\begin{abcd}
\item
What is the correct answer?

Here is the contingency table, based on 1,000 women screened.
\begin{center}
\begin{tabular}{|c|c|r|r||r|}
\hline 
\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{has breast cancer} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{screens +}
	 & yes &   7 &   70  & 77 \\
	 & no  &   1 &  922  & 923  \\
\hline
\hline
 & total & 8 & 992 & 1,000 \\
\hline
\end{tabular}
\end{center}

So the probability that a woman with a positive mammogram actually has
cancer is just 7/77 = 1/11, or about 9\%.

\item What percentage of the 24 doctors got the correct answer?

Eight doctors thought the correct answer was 10\% or less, which it
is. One doctor thought it was just 1\%, so I won't count that as a
correct answer. That means 7/24 or about 30\% got the answer right. 

\end{abcd}
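
The table rounds each count to a whole number of women. As a check,
here is a sketch of the same calculation without rounding, using only
the rates in the quotation and the population of 1,000 from the hint.
The number of women with cancer is
\[
0.008 \times 1{,}000 = 8 ,
\]
the expected number of true positives is
\[
0.90 \times 8 = 7.2 ,
\]
and the expected number of false positives is
\[
0.07 \times 992 = 69.44 .
\]
So the probability that a woman with a positive mammogram has cancer is
\[
\frac{7.2}{7.2 + 69.44} = \frac{7.2}{76.64} \approx 0.094 ,
\]
which agrees with the answer of about 9\% from the rounded table.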

\end{sol}

\end{exx}

\begin{exx}{\untested\sref{falsepos}\gref{contingencytable}
\gref{falsepositives}}
\headline{Identity fraud dragnet hardly seems worth the expense or trouble}

On July 24, 2011 Jane Allen wrote in a letter to the editor of
\theGlobe{} that

\begin{qwrap}
\begin{quotation}
\firstline{[T]he state Registry of Motor Vehicles sends out 1,500}
suspension letters a day. Last year, as a result of the software,
State Police said there were 100 arrests for fraudulent identity and
1,860 licenses were revoked. 

That means that about 390,000 people were questioned for the sake of
finding fewer than 2,000 transgressors. This hardly seems worth the
\$1.5 million grant for the software, let alone the investment of
personnel.
\webref{
http://www.boston.com/bostonglobe/editorial_opinion/letters/articles/2011/07/24/identity_fraud_dragnet_hardly_seems_worth_the_expense_or_trouble/
}
\end{quotation}
\sourceinfo[132]{http://www.boston.com/bostonglobe/editorial_opinion/letters/articles/2011/07/24/identity_fraud_dragnet_hardly_seems_worth_the_expense_or_trouble/}
\end{qwrap}

\begin{abcd}
\item
Check Allen's arithmetic in the second paragraph.

\item Construct the contingency table for this screening. Identify the
true and false positives and negatives. Explain the
costs and benefits.
\end{abcd}

\end{exx}


\begin{exx}{\untested\complex\sref{prosecutor}\gref{contingencytable}}
Candy leads to crime

An article headlined
\headline{Happy Halloween! Kids who eat candy every day grow up to be violent
criminals}\ in the October 2, 2009 \emph{Daily Finance}, begins

\begin{qwrap}
\begin{quotation}
\firstline{Quick, hide the candy jar! Feeding your child candy every}
day could help turn Junior into a violent criminal, according to a
large study in Britain, which found that 69 percent of the
participants who had committed violence by 34 had eaten sweets or
chocolate nearly every day during childhood. 
\end{quotation}
\sourceinfo[584]{http://www.dailyfinance.com/2009/10/02/happy-halloween-kids-who-eat-candy-every-day-grow-up-to-be-viol/}
\end{qwrap}

You can find the full text at
\url{http://www.dailyfinance.com/2009/10/02/happy-halloween-kids-who-eat-candy-every-day-grow-up-to-be-viol/}

Here is the Associated Press version:
\url{http://www.wtop.com/?nid=105&sid=1775511}

And here is an abstract of the original study, from the British
Journal of Psychiatry:
\url{http://bjp.rcpsych.org/cgi/content/abstract/195/4/366}.

\begin{abcd}

\item
Read the rest of the article. Build the contingency table with columns
for whether or not someone ate candy as a child, rows for whether or
not they committed violence as an adult.

\item 
Explain why this is an example of the prosecutor's fallacy.

\item
Some of the on line comments on that article recognize the fallacy --
for example

\begin{quotation}
\noindent
10-03-2009 @ 10:21PM \\
Bski said... \\
I bet you, 99\% of criminals ate bread daily by the time they were 10
years old!!!! 
\end{quotation}

Write your own blog entry, using your understanding of two way
contingency tables to enlighten any readers. If you like what you've
written you can post your comment on the article's blog.
\end{abcd}


\end{exx}


\begin{exx}{\untested\sref{falsepos}\gref{falsepositives}}
In Andrew Gelman's \index{Gelman, Andrew} blog on
\headline{Statistical Modeling, Causal Inference, and Social Science}
commenter Mike Spagat writes that

\begin{qwrap}
\begin{quotation}
\firstline{Even within exceptionally violent environments most}
households will still not have a violent death. So a very small false
positive rate in a household survey will cause substantial upward bias
in violence estimates. 
\webref{
http://andrewgelman.com/2011/08/the_reliability/
}
\end{quotation}
\sourceinfo{http://andrewgelman.com/2011/08/the_reliability/}
\end{qwrap}

Write a paragraph or two explaining this to someone who is interested
and smart enough to understand this but has not studied the material in
this chapter. Consider making up some numbers to illustrate your argument.
\end{exx}


\begin{exx}{\needsquestions\sref{marginoferror}}
\headline{Surgery offers no advantage for early prostate cancer, study
finds}

That article reported on a clinical trial involving 731 men diagnosed
with prostate cancer. About half had surgery; the rest were monitored.

\begin{qwrap}
\begin{quotation}
\firstline{After 12 years, nearly 6 percent of men who had immediate}
  surgery died of the cancer, compared with slightly more than 8
  percent of those patients who were observed, which was not a great
  enough difference to reach statistical significance.  
\webref{http://bostonglobe.com/lifestyle/health-wellness/2012/07/18/surgery-offers-survival-advantage-for-older-men-with-early-stage-prostate-cancer-study-finds/T5XM7APIuoZuav6PbJzYuI/story.html}
\end{quotation}
\sourceinfo[1038]{http://bostonglobe.com/lifestyle/health-wellness/2012/07/18/surgery-offers-survival-advantage-for-older-men-with-early-stage-prostate-cancer-study-finds/T5XM7APIuoZuav6PbJzYuI/story.html}
\end{qwrap}

\begin{abcd*}
\item About how many men were in each category?
\item About how many deaths were there in each category?
\item Construct the contingency table for this study.
\end{abcd*}
\end{exx}

\begin{exx}{\hassolution\artificial\worthy}
Teenage drug use

Here's a made up story.

The dean at a fancy private high school 
is very worried. She suspects that about 20\% of the 1000 students
on campus are using drugs. She has asked all the parents to administer
a home drug test to their kids (since it's a private school she can
actually require them to do it). She has read on the web that

\begin{qwrap}
\begin{quotation}
\firstline{With home drug testing methods believed to produce reliable}
 and accurate results, many of us overlook the cases of false positives and
draw conclusions on the suspect before reconfirming the result. But,
researchers from the Boston University have found out that drug tests
may produce false positives in 5-10\% of cases and false negatives in
10-15\% of cases.
\webref{http://lapoliticaesotracosa.blogspot.com/2012/05/how-to-avoid-false-positives-while.html}
\footnote{We found several blogs that seem to report on this same
  study. None gives a link or a precise reference. We haven't been
  able to locate the original.}
\end{quotation}
\sourceinfo{http://lapoliticaesotracosa.blogspot.com/2012/05/how-to-avoid-false-positives-while.html}
\end{qwrap}

Answer the following questions, assuming the worst cases 
(10\% false positive rate, 15\% false negative rate).

\begin{abcd}

\item Build the contingency table for this drug screening scenario. To
  do that you will have to figure out

\begin{itemize}
\item How many students are drug users.
\item How many of the drug users test positive. How many test
  negative. 
\item How many students are drug free.
\item How many of the drug free students test positive. How many test
  negative.
\end{itemize}

You may do the arithmetic by hand or with the spreadsheet
at \link{ContingencyTable.xlsx}. 

\item What is the true positive rate?

\item Student John Smith tested positive. What is the probability that
  he is really on drugs?

\item Student Jane Doe tested negative. What is the probability that
  she is really drug free?

\item Answer the previous two questions if you assume the best
  cases for reported false values in the Boston University study.
\end{abcd}

\begin{sol}
\begin{abcd}

\item Build the contingency table for this drug screening scenario. To
  do that you will have to figure out

\begin{itemize}
\item How many students are drug users: 20\% of 1000, so 200.

\item How many of the drug users test positive. How many test
  negative. 

170 of the 200 users test positive. The other 30 test negative (these
are the false negatives).

\item How many students are drug free. The other 800.

\item How many of the drug free students test positive. How many test
  negative.

720 of the 800 clean students test negative. 80 are false positives.

\end{itemize}

\item What is the true positive rate? 100\% - 15\% = 85\%, since 170
  of the 200 users test positive.

\item Student John Smith tested positive. What is the probability that
  he is really on drugs?

That's $170/250 = 0.68$. 

\item Student Jane Doe tested negative. What is the probability that
  she is really drug free?

$720/750 = 96\% $.

\item Answer the previous two questions if you assume the best
  cases for reported false values in the Boston University study.

82\% and 97\% -- Excel did the work for me.
\end{abcd}
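
The best case figures in the last part can also be checked by hand.
Here is a sketch, taking the best case to mean a 5\% false positive
rate and a 10\% false negative rate (the low ends of the quoted
ranges) and keeping the dean's guess of 200 users among 1000 students.
The number of users who test positive is
\[
0.90 \times 200 = 180 ,
\]
leaving 20 false negatives. The number of drug free students who test
positive is
\[
0.05 \times 800 = 40 ,
\]
leaving 760 true negatives. So the probability that a student who
tests positive is really on drugs is
\[
\frac{180}{180 + 40} = \frac{180}{220} \approx 82\% ,
\]
and the probability that a student who tests negative is really drug
free is
\[
\frac{760}{760 + 20} = \frac{760}{780} \approx 97\% .
\]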

\end{sol}

\end{exx}

\begin{exx}{\untested\needsquestions}
At home drug tests

Walgreens advertises its at home drug tests as 
``99.9\% accuracy, easy to use.''

Search for
\gc{
at home drug test false positive
}
and explore some of what you find. One example:

\begin{qwrap}
\begin{quotation}
\firstline{Drug Tests Often Trigger False Positives \\}
Poppy Seeds, Cold Medications Can Trigger False Alarms\\

Drug tests generally produce false-positive results in 5\% to 10\% of
cases and false negatives in 10\% to 15\% of cases, new research
shows. 
\webref{http://www.webmd.com/news/20100528/drug-tests-often-trigger-false-positives}
\end{quotation}
\sourceinfo{http://www.webmd.com/news/20100528/drug-tests-often-trigger-false-positives}
\end{qwrap}

\end{exx}

\begin{exx}{\untested\sref{retrospective}}
The Boy Who Cried Wolf

Use Table~\ref{table:disaster} to analyze the children's story with
that title.

\end{exx}

\begin{exx}{\untested}
Playing the lottery.

Table~\ref{table:lotteryTable} illustrates the ultimate example of the error
you can make reading a column instead of a row. 

\begin{table}[ht]
\centering
\begin{tabular}{|c|c|r|r||r|}
\hline 

\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{bought a ticket} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{won the lottery}
	 & yes &  1  &   0  & 1 \\
	 & no  &  many  &  very many & very many \\
\hline
\hline
 & total & many & very many & very many \\
\hline
\end{tabular}
\caption{Playing the lottery}
\tablesource{Handbuilt data.}
\label{table:lotteryTable}
\end{table}

\begin{abcd}
\item Suppose you won the lottery. What is the probability that you
  bought a ticket?

\item Suppose you bought a ticket. What is the probability that you
  won the lottery?
\end{abcd}

\end{exx}
\begin{ReviewExercises}

\end{ReviewExercises}

\setexercisecounter{}

\begin{ExtraExercises}

\begin{exx}{\untested\needsquestions\sref{falsepos}\gref{contingencytable}}
\headline{False Positive Oral Fluid Rapid HIV Testing in NYC STD Clinics}
\index{HIV testing}\index{STD testing}

Read the City of New York Department of Health and Mental Hygiene
Health Advisory \#20 at
\url{http://www.nyc.gov/html/doh/downloads/pdf/cd/08md20.pdf}. 

Build the two way contingency tables based on the data there, and
discuss the consequences of the data.

\end{exx}


\begin{exx}{\untested\complex\needsquestions}
Missile defense.
\index{missile defense}
\index{Postol, Theodore}

Theodore A. Postol wrote in \theGlobe{} on April 15, 2008 that

\begin{qwrap}
\begin{quotation}
\firstline{THE HOUSE Subcommittee on National Security and Foreign}
Affairs will 
hold a long-overdue oversight hearing tomorrow on the prospects for
national missile defense. The most basic question that needs to be
addressed is the inability of the national missile defense to tell the
difference between simple warheads and decoys.

\ldots

The issue of the effectiveness of decoys against the missile defense
is easy to understand. The national missile defense is designed to
destroy warheads by hitting them with infrared homing Kill Vehicles
while the warheads are in the near vacuum of space. Since there is no
air-drag in space, a warhead weighing thousands of pounds and a
balloon weighing almost nothing will travel together. Warheads could
be placed inside balloons, and many balloons could be deployed along
with the warheads. \ldots Since there would be
no way for the Kill Vehicle to know which balloons contain warheads,
the chances of actually hitting a warhead would be minuscule.%
\webref{
http://www.boston.com/bostonglobe/editorial\_opinion/oped/articles/2008/04/15/troubling\_questions\_about\_missile\_defense/}
\end{quotation}
\sourceinfo[667]{http://www.boston.com/bostonglobe/editorial_opinion/oped/articles/2008/04/15/troubling_questions_about_missile_defense/}
\end{qwrap}

\marginpar{
Not yet sure how to phrase a question about this. Is there
a natural two way contingency table?
}

\end{exx}



\begin{exx}{\needsquestions}
\headline{Asian carp eDNA testing can lead to false positives, study
finds}

\begin{qwrap}
\begin{quotation}
\firstline{The technical team developed a strategy to test for false}
positives.    Water samples were collected in April from a metro lake
that served as a negative control (very little chance Asian carp could
be present).   Twenty samples were sent to the Corps of Engineers
laboratory in Vicksburg and 20 were sent to the private contractor
that did the 2011 analysis.  All of the samples from the Corps of
Engineers lab tested negative, while one sample from the private
contractor tested positive for silver carp.  This sample was tested
again and the positive was verified. 

There is a high likelihood this is a false positive which creates
uncertainty about previous results.    The percentage of positives in
the 2012 samples was much lower than previous samples suggesting there
may have been a mix of real and false positive samples in 2011.   This
does not minimize eDNA testing as an important tool for detecting
Asian carp, but it does emphasize the need to determine the source of
false positives and to review and modify sampling and analytical
procedures.  In addition, we have collected live Asian carp from the
St. Croix and Mississippi Rivers which are definitive evidence these
fish are present and pose a threat to Minnesota. 
\webref{%
http://blogs.twincities.com/outdoors/2012/07/27/asian-carp-edna-testing-can-lead-to-false-positives-study-finds/
}
\end{quotation}
\sourceinfo{http://blogs.twincities.com/outdoors/2012/07/27/asian-carp-edna-testing-can-lead-to-false-positives-study-finds/}
\end{qwrap}

\emph{The posting has no numbers, so it's hard to write questions.}

\end{exx}

\begin{exx}{\needsquestions}
\headline{Shaky Foundations for the New Mammogram Economy}

From Bloomberg news, on August 1, 2012:

\begin{qwrap}
\begin{quotation}
\firstline{Part of the problem is that, on a mammogram, a
noncancerous}
abnormality can look very much like cancer. \ldots
This causes 10 percent to 15 percent of screened
women in the U.S. to be recalled for more evaluation. Most (95
percent) screening-detected abnormalities are ultimately found to be
noncancerous. An American woman who is regularly screened during her
40s has a 61 percent chance of getting a false positive result. 

\ldots

Now, for every \$100 spent on screening, an additional \$30 to \$33 is
spent to evaluate false positive findings. In the Medicare population,
the workup of false positive mammogram results is estimated to total
\$250 million a year.%
\webref{%
http://www.bloomberg.com/news/2012-08-01/shaky-foundations-for-the-new-mammogram-economy.html
}
\end{quotation}
\sourceinfo{
http://www.bloomberg.com/news/2012-08-01/shaky-foundations-for-the-new-mammogram-economy.html
}
\end{qwrap}

Also see the report
\headline{High Rate of False-Positives with Annual Mammogram} 
from UCSF:

\begin{qwrap}
\begin{quotation}
\firstline{For the false-positive study, the researchers found that}
after a decade of annual screening, a majority of women will receive
at least one false-positive result, and 7 to 9 percent will receive a 
false-positive biopsy recommendation.
\webref{http://www.ucsf.edu/news/2011/10/10778/high-rate-false-positives-annual-mammogram
}
\end{quotation}
\sourceinfo{
http://www.ucsf.edu/news/2011/10/10778/high-rate-false-positives-annual-mammogram
}
\end{qwrap}

\end{exx}

\begin{exx}{\untested}
Testing for prostate cancer

Figure~\ref{fig:psa} from the Department of Family Medicine at
Virginia Commonwealth University illustrates the possible
outcomes of a \myindex{PSA} screening test for prostate cancer.

\begin{figure}[ht]
\centering
\includegraphics[height=60mm]{\here/psa.jpg}
\caption{Prostate cancer screening test results}
\figsource{http://www.familymedicine.vcu.edu/research/misc/psa/index.html}
\label{fig:psa}
\end{figure}

This quotation spells out some of the arguments for and against the test:

\begin{qwrap}
\begin{quotation}
\firstline{There are possible advantages to having a PSA test.}

\begin{enumerate}
\item   A normal PSA test may reassure you.
\item   A PSA test may find prostate cancer early before it has
spread.
\item  Treatment of prostate cancer in early stages may help some men
to avoid problems from cancer. 
\item  Treatment of prostate cancer in early stages may help some men live longer.
\end{enumerate}

There are possible disadvantages to having a PSA test.

\begin{enumerate}
\item  A normal PSA test may miss some prostate cancers.
\item  A false positive PSA test may cause unnecessary anxiety.
\item  A false positive PSA test may cause an unneeded prostate
biopsy. 
\item  You may find out that you have prostate cancer, but it may be a
cancer that would never cause you any problems. 
\item  Treatment of prostate cancer may cause you harm. Difficulties
with getting erections or problems with controlling your bladder or
bowels are some potential harms. 
\end{enumerate}
\sourceinfo{http://www.familymedicine.vcu.edu/research/misc/psa/index.html}
\end{quotation}
\end{qwrap}

\begin{abcd}
\item What does ``PSA'' stand for?
\item Construct the two way contingency table based on this data.
\item If your screening test is positive, what is the probability that
you have prostate cancer?
\item If you have prostate cancer, what is the probability that this
screening test will detect it?
\end{abcd}

The figure shows that the overall incidence of prostate cancer is
10\%. That is probably an invented statistic, to make the arguments
easier to understand. The reality is complex. Here's a start on it,
from the American Cancer Society:

\begin{qwrap}
\begin{quotation}
\firstline{What are the key statistics about prostate cancer?}

Other than skin cancer, prostate cancer is the most common cancer in
American men. The latest American Cancer Society estimates for
prostate cancer in the United States are for 2012: 

    About 241,740 new cases of prostate cancer will be diagnosed

    About 28,170 men will die of prostate cancer

About 1 man in 6 will be diagnosed with prostate cancer during his lifetime.

Prostate cancer occurs mainly in older men. Nearly two thirds are
diagnosed in men aged 65 or older, and it is rare before age 40. The
average age at the time of diagnosis is about 67. 

Prostate cancer is the second leading cause of cancer death in
American men, behind only lung cancer. About 1 man in 36 will die of
prostate cancer. 

Prostate cancer can be a serious disease, but most men diagnosed with
prostate cancer do not die from it. In fact, more than 2.5 million men
in the United States who have been diagnosed with prostate cancer at
some point are still alive today. 
\end{quotation}
\webref{http://www.cancer.org/Cancer/ProstateCancer/DetailedGuide/prostate-cancer-key-statistics}
\sourceinfo{
http://www.cancer.org/Cancer/ProstateCancer/DetailedGuide/prostate-cancer-key-statistics
}
\end{qwrap}

\end{exx}

\end{ExtraExercises}


\begin{ScopeExercises}

\end{ScopeExercises}