%http://www.familymedicine.vcu.edu/research/misc/psa/index.html
% From Maura:
%
%I think that reading this type of table is subtle enough that it may help to
%have an introductory section on how to read such a table.  I know it may
%sound trivial, but in fact it's very easy to mess up and draw the wrong
%conclusions.  Then we could introduce the false positive/false negative
%thing and have a broader discussion.   I'd have to think of a good example.
%In the exercises we have one about the UMB population.  A more general (and
%more interesting) example would be better.
%
% FalsePositives/contents.tex
%
\chapter{\mychaptername}
\label{\here}



\tocnotetoo{
In Chapter~\ref{BreakTheBank} we looked at probabilities of independent
events -- things that had nothing to do with one another. Here we think about
probabilities in situations where we expect to see connections:
screening tests for diseases, DNA evidence for guilt in a criminal
trial.}%

\begin{goals}
\begin{goal}{contingencytable}
Interpret and build two way contingency tables.
\end{goal}

\begin{goal}{falsepositives}
Understand the implications of false positives.
\end{goal}
\end{goals}


\begin{chapterpix}

\begin{center}
\includegraphics[height=50mm]{\here/mrBoffo.jpg}
\end{center}

\theGlobe, August 29, 2008 \\copyright \copyright Neatly Chiseled
Features

\includegraphics[height=50mm]{\here/111polygraph2.jpg}
\includegraphics[height=50mm]{\here/thomasbayes.png}

If we use the polygraph, write an exercise about it. The Bayes image
is probably too arcane, since we don't ever mention Bayes' theorem
(although we could).

\url{http://phillips.blogs.com/goc/2008/01/lying-lie-detec.html}
\url{http://www.allspammedup.com/2011/02/what-are-bayesian-filters-anyway/}

Many google search images for ``false positive'' are about pregnancy
tests. I didn't include any here.

\end{chapterpix}

\qrsection[conditional]{UMass Boston enrollment}

Table~\ref{UMassBostonEnrollment}
summarizes student enrollment 
at UMass Boston in 2006 by category: graduate/undergraduate and
male/female.
\footnote{This is real data, but not generally interesting. It's here
to illustrate some important ideas -- we hope to replace it with
a better example for that purpose.}

\begin{table}[ht]
\centering
\includegraphics[height=30mm, width=120mm]{\here/UMBstudentTable.jpg}
\caption{UMass Boston Enrollment, 2007}
\tablesource{Handbuilt table. We're sure data is public.}
\label{UMassBostonEnrollment}
\end{table}

We can use the data in that table to answer some questions about the
UMass Boston student population.

\begin{itemize}

\item What is the probability that a student chosen at random is a
graduate? 

The last row of the table has the numbers we need:
\begin{align*}
\frac{\text{number of graduates}}{\text{number of students}}
& = \frac{3,425}{13,433} \\
& = 0.254969106 \\
& \approx 25\%.
\end{align*}

\item What is the probability that a student is female?

For that computation we use the last {\em column}:

\begin{align*}
\frac{\text{number of females}}{\text{number of students}}
& = \frac{8,068}{13,433} \\
& = 0.600610437 \\
& \approx 60\%.
\end{align*}

\item What is the probability that a student is a female
graduate?

Use the count in the second column of the first row:

\begin{align*}
\frac{\text{number of female graduates}}{\text{number of students}}
& = \frac{2,388}{13,433} \\
& =  0.177771161 \\
& \approx 18\%.
\end{align*}

\end{itemize}

In each of these probability calculations we used the total number of
students (13,433) in the denominator. 

Continuing \ldots

\begin{itemize}

\item What is the probability that a female student is a graduate?

Since this is a question about the female students, we need a
different denominator:

\begin{align*}
\frac{\text{number of female graduates}}{\text{number of female students}}
& = \frac{2,388}{8,068} \\
& = 0.295984135 \\
& \approx 30\%.
\end{align*}

\item What is the probability that a graduate student is female?

That's a different question, which calls for a different denominator:

\begin{align*}
\frac{\text{number of female graduates}}{\text{number of graduates}}
& = \frac{2,388}{3,425} \\
& =  0.697226277 \\
& \approx 70\%.
\end{align*}
\end{itemize}

The last two questions sound similar, but have very different answers,
because each begins with a different assumption. In the first we know
the student is a female and wonder whether she's a graduate. 
Since there are more than twice as many undergraduate as graduate
females, the 30\% probability makes sense. In the second, we know that the
student is a graduate student and wonder whether it's a she. Since
there are more than twice as many female as male graduate students it's no
surprise that the probability is about 70\%.

We're not finished thinking about these probabilities. We found that
there's a 60\% probability that a student is female. But {\em if
we know the student is a graduate} then that probability increases to
70\%. This is not what happened when we thought about a coin and a die
in \sref*{coindie}. The probability that the die shows a four is the
same independent of whether the coin comes up heads or tails. Those
events are {\em independent}. The facts {\em student is female} and
{\em student is a graduate} are {\em dependent}\index{dependent
events}. When you know one of them you know something about the
probability of the other.

We learned in \sref*{coindie} that when events are independent
you multiply to compute the probability that both happen:
\begin{align*}
\text{probability(coin head, die four)}
& = 
\text{probability(coin head)} \times \text{probability(die four)} \\
& = \frac{1}{2} \times \frac{1}{6} \\
& = \frac{1}{12}.
\end{align*}
For dependent events that won't work. We found that
\begin{equation*}
\text{probability(female and graduate)} = 18\%
\end{equation*}
but
\begin{align*}
\text{probability(female)} \times \text{probability(graduate)} 
& = 60\% \times 25\% \\
& = 15\%.
\end{align*}
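
A check with counts makes the same point. If {\em student is female}
and {\em student is a graduate} were independent, the multiplication
rule would predict this many female graduates among the 13,433
students:
\begin{align*}
\frac{8,068}{13,433} \times \frac{3,425}{13,433} \times 13,433
& = \frac{8,068 \times 3,425}{13,433} \\
& \approx 2,057.
\end{align*}
The table shows 2,388 female graduates, about 330 more than
independence would predict.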

In the rest of this chapter we will look at the probabilities for
dependent events, working with displays like
Table~\ref{UMassBostonEnrollment}
in examples where the consequences matter much more
than they do here.%
\footnote{
What we do with tables can also be done with formulas -- the most
important one is called \emindex{Bayes' rule}. We won't use the
formulas since we think the tables are easier to understand and the
methods using them easier to remember.}
\teachertag
\begin{teacher}
This chapter focusses on two way contingency tables in
order to discuss several important common logical pitfalls dealing
with everyday probabilities. We think that approach makes more sense,
and is easier to remember and apply, 
than an explicit treatment of dependent events and Bayes'
theorem. That's too technical for 
our goals in this quantitative reasoning text, and so 
better left for a full
course in probability and statistics.
In fact, many of the examples in this chapter employ qualitative
rather than quantitative reasoning. 
\end{teacher}

\qrsection[falsepos]{False positives and false negatives}

Figure~\ref{vennDiagram} appeared in the
article \headline{False positives, false negatives, and the validity
of the diagnosis of major depression in primary care} in the
September 1998 Archives of Family Medicine.%
\webref{%
http://archfami.ama-assn.org/cgi/reprint/7/5/451
}
It summarizes the results of a study of 372 patients who were screened
by family physicians for clinical depression.

\begin{figure}[ht]
\centering
\includegraphics[height=60mm]{\here/vennDiagram.png}
\caption{Diagnosing depression}
\figsource{\url{http://archfami.ama-assn.org/cgi/reprint/7/5/451}}
\figcomment{Permission needed.}
\label{vennDiagram}
\end{figure}

The numbers in the four categories in the figure are easier to
understand when they are displayed in Table~\ref{vennTable}, where
we have included percentages along with the counts.%
\footnote{
Since we rounded the percentages they actually add up to just 99\%
instead of the 100\% shown.
}
\begin{table}[ht]
\centering
\begin{tabular}{|c|c|r|r||r|}
\hline 
\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{depressed} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{diagnosed}
	 & yes &  31 (8\%) & 34 (9\%) & 65 (17\%)\\
	 & no  &  50 (13\%)  & 257 (69\%) & 307 (83\%) \\
\hline
\hline
 & total & 81 (21\%) & 291 (79\%) & 372 (100\%) \\
\hline
\end{tabular}
\caption{Diagnosing depression}
\tablesource{Data from
\url{http://archfami.ama-assn.org/cgi/reprint/7/5/451}}
\tablecomment{Needs permission.}
\label{vennTable}
\end{table}

%\multirow{2}{*}{\begin{sideways}diagnosed\end{sideways}}
%	 & \begin{sideways}yes\end{sideways}&  31 & 34 & 65 \\
%	 & \begin{sideways}no\end{sideways} &  50 & 257 & 307 \\
%

Two by two tables like this are called {\em
contingency tables}\index{contingency
tables}. Figure~\ref{contingencyTable} shows the standard names for
the four corners: true positives,
false positives, false negatives and true negatives. In this example
they have values 31, 34, 50 and 257, corresponding to probabilities
$31/372 \approx 8\%$, $34/372 \approx 9\%$, $50/372 \approx 13\%$ and
$257/372 \approx 69\%$.% 

\begin{figure}[ht]
\centering
\includegraphics[height=40mm, width=120mm]{\here/FalseNegChart.png}
\caption{A two way contingency table}
\figsource{Hand built.}
\figcomment{Redraw to make it look nicer.}
\label{contingencyTable}
\end{figure}

The most interesting conclusions you can draw from a contingency table
use the row and column totals along with the individual entries.

\begin{itemize*}

\item The first row total tells us that about $8\% + 9\% = 17\%$ were
diagnosed as depressed.

\item The first column total tells us that about
$8\% + 13\% = 21\%$ were actually depressed.
\end{itemize*}

The fact that the two probabilities are close says that
the diagnosis identifies just about the right fraction of the
population. But when you study the rows and columns separately, a
different story emerges. 

\begin{itemize}

\item
The second column says that the {\em false positive rate}\index{false
positive rate} is significant: it's $34/291 \approx 0.117$. That means that
about 12\% of the people who are not depressed are nevertheless
diagnosed as depressed.

\item
The first column says that
{\em if a person is depressed} the probability that he or she will be
diagnosed correctly is only $31/81 \approx
38\%$. There's a $62\%$ chance the condition will be missed. That 62\%
is the {\em false negative rate}.\index{false negative rate}
\end{itemize}

Whether this is a ``good'' test is a difficult decision.  Although the
chance of misdiagnosis of depression when it doesn't exist is fairly
low -- about 12\% -- the 62\% false negative rate says that the test will
identify fewer than half the depressed people.
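
The first row of Table~\ref{vennTable} answers a different question
that we have not yet asked: if a person is diagnosed as depressed,
what is the probability that the diagnosis is correct? Using only the
counts in that row, the computation looks like this:
\begin{align*}
\frac{\text{number diagnosed and depressed}}{\text{number diagnosed}}
& = \frac{31}{65} \\
& \approx 0.477 \\
& \approx 48\%.
\end{align*}
So in this study slightly more than half of the positive diagnoses
were wrong.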

\qrsection[rare]{Testing for a rare condition}

Suppose a drug company has developed a test for rare disease X that is
99\% accurate at detection with a false positive rate of only 0.1\% (a
mere tenth of a percent). 
This example concerns an unnamed disease X; we made up the numbers.
\footnote{See the Exercises for some real disease testing data.}
Before we work with the numbers let's set out the questions we need
answers to:

\begin{enumerate}
\item What is the probability that a person who suffers from X tests
positive?

\item What is the probability that a person who tests positive suffers
from X?
\end{enumerate}

These are just the kind of parallel questions we considered in the two
previous sections. Question 1 is easy: the drug company's clinical
trials found that the answer is a fantastic 99\%.

Whether that test is as good as it sounds depends in part on the
answer to the second question. That answer depends on
how many people actually suffer from disease X. Suppose it's rare --
affecting just one person in every 10,000 (one one hundredth of one
percent of the population). 

Since several of the percentages are pretty small (less than one
percent) and small percentages are hard to understand, we will build our
contingency table for an imaginary population of 1,000,000 people that
just matches the statistical profile for this test.%
\footnote{
This technique is called working with {\em natural frequencies}
\index{natural frequency}. Use it when you're worried that 
the percentages might be confusing.
}
In that population of one million, one out of every 10,000 
will have the disease. That's 100 people. Because the test is 99\%
accurate when the 
disease is present, it will catch 99 of those. One case will be a
false negative. Those numbers fill the first column of
Table~\ref{diseaseX}. To fill the second column, note
that there will be $0.001 \times 999,900 = 999.9 \approx 1000$
positive tests in the healthy part of the population (the false
positives) and thus 998,900 true negatives. 

\begin{table}[ht]
\centering
\begin{tabular}{|c|c|r|r||r|}
\hline 
\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{suffers from X} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{tests + for X}
	 & yes &  99 &   1,000  & 1,099 \\
	 & no  &   1 &  998,900 & 998,901 \\
\hline
\hline
 & total & 100 & 999,900 & 1,000,000 \\
\hline
\end{tabular}
\caption{Screening for a rare disease}
\tablesource{Handbuilt data.}
\label{diseaseX}
\end{table}

The false negative rate is 1\%, as claimed. But the small false
positive rate matters because the population is large. 0.1\% of the
nearly one million healthy people will test positive. That's about
1,000 people. That answers Question 2:
the probability someone testing positive has the disease is only 
$99/1,099 \approx 9\%$.
If you test positive
for X the odds are still $1,000 : 99$, or about $10 : 1$, that you don't
have the disease. 
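
As a check, here is the same computation done with proportions
instead of an imaginary population of one million. It uses only the
three numbers we assumed for disease X (99\% detection, a 0.1\% false
positive rate and a prevalence of one in 10,000):
\begin{align*}
\frac{\text{true positives}}{\text{all positives}}
& = \frac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.001 \times 0.9999} \\
& = \frac{0.000099}{0.0010989} \\
& \approx 0.09 = 9\%.
\end{align*}
The answer agrees with the table, but the strings of zeroes show why
counting people is easier to follow.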

Is this acceptable? Maybe, maybe not. 

If the test is inexpensive and there's a second test (perhaps more
expensive) that can weed out the false positives, and
the disease can be treated successfully if detected, perhaps
the screening is a good idea.

If all the people who test positive must undergo expensive painful
unreliable treatment, which would be unnecessary for about 91\% of them,
then the screening is probably a bad investment of scarce health care
resources.

%The insurance company would
%like to think so. They advertise to ask you to ask your doctor
%for the test, lobby doctors to screen their patients, and lobby the
%insurance companies to cover the cost.

\qrsection[prosecutor]{The prosecutor's fallacy}
\index{prosecutor's fallacy}

The Cornell University Legal Information Institute posted a discussion
of {\em McDaniel v. Brown} when that case was on the docket of the
Supreme Court. They wrote \index{McDaniel v. Brown}\index{Supreme Court}

\begin{qwrap}
\begin{quotation}
\firstline{Following a state conviction for sexual assault, Troy Brown}
filed a petition for writ of habeas corpus in the United States
District Court for the District of Nevada. The District Court allowed
Brown to present new evidence: a report from Dr. Lawrence
Mueller. This report detailed a statistical error (``prosecutor's
fallacy'') made by the prosecution during the presentation of DNA
evidence. Based on Dr. Mueller's report, the District Court dismissed
the DNA evidence from consideration, found insufficient evidence to
convict Brown, and ordered a retrial.  

\ldots

At trial, Renee Romero, a forensic scientist at the Washoe County Crime
Lab, testified that the DNA found in the victim's underwear matched
Brown's DNA; only one in three million people would match the DNA
tested. The prosecutor asked Romero to express this statistic
as ``the likelihood that the DNA found \ldots is the same as the DNA
found in [Brown's] blood.'' Romero concluded that the likelihood was 99.999967
percent. Based on this statistic, the prosecutor then
asked Romero if it would be fair to conclude that there was a 0.000033
percent chance that the DNA did not belong to Brown. Romero
agreed with the prosecutor, stating that this was ``not
inaccurate.''%
\webref{%
http://topics.law.cornell.edu/supct/cert/08-559}
\end{quotation}
\sourceinfo{http://topics.law.cornell.edu/supct/cert/08-559}
\end{qwrap}

Romero's arithmetic is right: one in three million  is 0.000033
percent. But her thinking is wrong. 

The prosecutor's fallacy is the claim that the one in three
million probability of a random match is the same as the probability
that the defendant is the source of the DNA sample. We can use a
contingency table to show why those probabilities are different.

First we need an
estimate of the population in which a possible DNA match might be
found. To make the arithmetic easier, we'll take that to be 9 million
people (Los Angeles is near enough to Nevada). Then the ``one in three
million'' statistic says we should expect three DNA matches from that
population. Table~\ref{mcdaniel} summarizes the data.

\begin{table}[ht]
\centering
\begin{tabular}{|c|r|r||r|}
\hline 
%\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{} \\
%\hline
\multicolumn{1}{|c|}{}  & guilty & innocent & total\\
\hline
DNA match &  1 &   2  & 3 \\
DNA nonmatch & 0  & 8,999,997  &  8,999,997 \\
\hline
\hline
 total & 1 & 8,999,999 & 9,000,000 \\
\hline
\end{tabular}
\caption{DNA matching}
\tablesource{Handbuilt data.}
\label{mcdaniel}
\end{table}

The first row of that table tells us that if the only evidence in the
case is the DNA match the odds are $2:1$ that the suspect is innocent!
That's a far cry from the ``99.999967\% guilty'' that the prosecutor
asked the jury to believe.
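
The arithmetic behind Table~\ref{mcdaniel} is short. It depends on
our assumed population of 9 million and on the assumption that
exactly one of the people who match is guilty:
\begin{align*}
\text{expected matches}
& = 9,000,000 \times \frac{1}{3,000,000} = 3 \\
\text{probability(guilty, given a match)}
& = \frac{1}{3} \approx 33\%.
\end{align*}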

The defense didn't make this argument using a hypothetical 9,000,000
population of potential suspects. Instead they questioned 
the ``one in three million'' chance 
of a match. The defendant had near relatives in the
area, which increased the chance of a random match to about one in 6,500,
according to a defense specialist. By the prosecutor's reasoning, that
would reduce the likelihood that the DNA was Brown's to
$6499 / 6500 = 0.999846154 \approx 99.98\%$. 
We're not surprised that the change from 99.999967 percent to 99.98\%
did not convince the jury to acquit. 99.98\% still sounds very much
like a sure thing.

But it's not, because of the prosecutor's fallacy. That was the basis
for the appeal. Suppose we
reduce the population from which the match might come to just
100,000 -- the nearby area where there may be close relatives. Then
the 1 in 6,500 chance of a match means there will be about 15 matches
in that population.
The numbers in the revised contingency Table~\ref{mcdaniel2}
show that the odds are now $15:1$ that the 
DNA match fingers an innocent person rather than the true criminal.

\begin{table}[ht]
\centering
\begin{tabular}{|c|r|r||r|}
\hline 
%\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{} \\
%\hline
\multicolumn{1}{|c|}{}  & guilty & innocent & total\\
\hline
DNA match &  1 &   15  & 16 \\
DNA nonmatch & 0  & 99,984  &  99,984 \\
\hline
\hline
 total & 1 & 99,999 & 100,000 \\
\hline
\end{tabular}
\caption{DNA matching}
\tablesource{Handbuilt data.}
\label{mcdaniel2}
\end{table}
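
The entries in Table~\ref{mcdaniel2} come from the same kind of
arithmetic, again under assumptions we chose for illustration (a
population of 100,000 and one guilty person):
\begin{align*}
\text{expected matches among the innocent}
& = \frac{99,999}{6,500} \approx 15.4 \approx 15 \\
\text{probability(guilty, given a match)}
& = \frac{1}{16} \approx 6\%.
\end{align*}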

Nevertheless, the story did not end well for Brown. 

\begin{qwrap}
\begin{quotation}
\firstline{The Supreme Court [overturning the appeals court order for}
a retrial] said 
in a per curiam opinion that overstated estimates of a DNA match at
trial did not warrant reversal of a conviction when there is still
``convincing evidence of guilt.''% 
\webref{%
http://www.criminallawlibraryblog.com/2010/01/us_supreme_court_update_mcdani.html
}
\end{quotation}
\sourceinfo{http://www.criminallawlibraryblog.com/2010/01/us_supreme_court_update_mcdani.html}
\end{qwrap}

%\footnote{
%\url{http://www.supremecourt.gov/opinions/09pdf/08-559.pdf}
%}

%http://www.scotusblog.com/2009/08/argument-preview-mcdaniel-v-brown/
%\url{http://en.wikipedia.org/wiki/Prosecutor's\_fallacy} 
%http://www.conceptstew.co.uk/PAGES/prosecutors_fallacy.html
%http://www.wacocriminallawblog.com/2010/01/articles/evidence-and-procedure/the-prosecutors-fallacy/
%Mr. Brown was charged with sexual assault. The victim could not
%identify him, and the evidence was all circumstantial; the type where
%it could support innocence just as easily as guilt. The most
%compelling evidence was DNA recovered from sperm on the victim's
%panties. And it was the DNA evidence that was the focus of the writ
%proceeding. 
%
%Mr. Brown lived with his brother, and there was another brother that
%also knew the victim. They all lived in the same trailer park, so it
%was obvious that there would be an issue as to whether the DNA could
%be attributed to one of the brothers. The argument was over
%probabilities; according the State's expert, the probability that
%another person from the general population would have the same DNA
%profile was 1 in 3,000,000. The defense expert expert said it was more
%like 1 in 6,500. 
%
%The prosecutor's fallacy is the assumption that the random match
%probability is the same as the probability that the defendant is the
%source of the DNA sample. In other words, you cant take that the above
%statistic and say the probability that someone other than the
%defendant committed the offense was 1 in 3,000,000; or that there is a
%99.9% chance that the defendant is guilty. 

%http://dna-view.com/profile.htm

\qrsection[retrospective]{Should they have known?}

After unusual disasters like terrorist attacks, earthquakes, severe
storms or airplane crashes you often hear finger-pointing discussions about
the incompetence of the agencies charged with predicting (perhaps even
preventing) what happened. Those discussions may start with a search
that discovers warning signs that were ignored.

Sometimes there were real lapses, and policies and
practices must be designed to prevent a recurrence.
But often blame is unjustified. Table~\ref{disaster} 
explains why, even without numbers. You might call this {\em
qualitative reasoning}.
\index{qualitative reasoning}

\begin{table}[ht]
\centering
\begin{tabular}{|c|r|r||r|}
\hline 
\multicolumn{1}{|c|}{}  & disaster & nothing happens & total \\
\hline
warning &  rare &   often  & often \\
no warning & rare  & usually  &  almost always \\
\hline
\hline
 total & rare & usually &   \\
\hline
\end{tabular}
\caption{Should it have been predicted?}
\tablesource{Handbuilt data.}
\label{disaster}
\end{table}

With numbers in the first column you can compute the
probability that a disaster occurs with no warning at all.
With numbers in the first row you can compute the
probability that a particular warning actually corresponds to a
disaster about to happen. That probability is small, because
there are many warnings but few
disasters. Most warnings are false positives. 

That means there are often good reasons for ignoring a warning. For
example, if a state agency believes an earthquake warning it may order
the evacuation of an entire city. The expense and disruption from repeated
evacuations that are not followed by an earthquake may be worse than
the consequences in the rare instance when the earthquake
happens. Just because after the fact you look back and find clues in the
seismic record that suggested an earthquake was imminent doesn't mean
the agency should have acted.

Table~\ref{lotteryTable} illustrates the ultimate example of the error
you can make reading a column instead of a row. 

\begin{table}[ht]
\centering
\begin{tabular}{|c|c|r|r||r|}
\hline 

\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{bought a ticket} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{won the lottery}
	 & yes &  1  &   0  & 1 \\
	 & no  &  many  &  very many & very many \\
\hline
\hline
 & total & many & very many & very many \\
\hline
\end{tabular}
\caption{Playing the lottery}
\tablesource{Handbuilt data.}
\label{lotteryTable}
\end{table}

The first row says that the probability that a lottery winner
bought a ticket is 100\%. The first column says that the probability that
a particular ticket won the lottery is very small.

\begin{teacher}

If you want to go further into the analysis of dependence (perhaps
leading to Bayes' theorem) consider two way tables as the entry
point. Independence corresponds to tables whose rows (and hence
columns) are proportional. Those are the only ones that can be modeled
using areas of parts of a square, as in the last chapter.

Causation corresponds to tables with a 0 in one quadrant.
\end{teacher}

\exstart

\begin{exx}{\untested\routine}
Depression

Compute the false positive and false
negative rates from the 
probabilities for each of the four corners of the contingency
Table~\ref{vennTable}. Check that the answers match those in the text
computed from the actual counts.

\end{exx}

\begin{exx}{\hassolution\routine}
\headline{Researchers link chronic fatigue syndrome to class of virus}

\begin{qwrap}
\begin{quotation}
\firstline{WASHINGTON -- A well-respected team of scientists released}
long-awaited new evidence yesterday that a virus could be playing a
role in chronic fatigue syndrome. 

The researchers, from the National Institutes of Health, the Food and
Drug Administration, and Harvard Medical School, analyzed blood
samples that had been collected 15 years ago from 37 patients with
chronic fatigue syndrome. Most of the subjects -- 32, or 86.5
percent -- tested positive for a virus known as a murine leukemia
virus-related virus, the researchers found. In contrast, tests on 44
healthy blood donors detected evidence of the virus in only three of
the subjects, or 6.8 percent.%
\webref{%
http://www.boston.com/news/nation/articles/2010/08/24/researchers_link_chronic_fatigue_syndrome_to_class_of_virus/
}
\index{chronic fatigue syndrome}
\end{quotation}
\sourceinfo[515]{http://www.boston.com/news/nation/articles/2010/08/24/researchers_link_chronic_fatigue_syndrome_to_class_of_virus/}
\end{qwrap}

\begin{abcd}
\item Construct the contingency table for this diagnostic tool.

\item Explain why this test is potentially important for research on
chronic fatigue syndrome but might not be a good screening test.
\end{abcd}

\begin{sol}

\begin{abcd}
\item Construct the contingency table for this diagnostic tool.

\begin{center}
\begin{tabular}{|c|c|r|r||r|}
\hline 

\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{has chronic fatigue
syndrome} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{tested positive}
	 & yes &  32  &   3  & 35 \\
	 & no  &  5  &  41 & 46 \\
\hline
\hline
 & total & 37 & 44 & 81 \\
\hline
\end{tabular}

\end{center}

\item Explain why this test is potentially important for research on
chronic fatigue syndrome but might not be a good screening test.
\end{abcd}

The test suggests pretty clearly that a virus may be involved in
chronic fatigue syndrome. That is a lead worth pursuing with further
research. However, the false positive rate is almost 7\%. Since the
disease isn't very common, screening will produce lots of false
positive results, with concomitant anxiety and expense.

The test might be good for people who already show symptoms suggesting
they have the disease.

\end{sol}
\end{exx}

\begin{exx}{\untested}
A home \myindex{pregnancy test} kit web site says

\begin{qwrap}
\begin{quotation}
\firstline{Home tests are usually 97\% accurate when all instructions}
are followed correctly and the results are read on time. 

A false positive pregnancy test is when the test says that you are
pregnant but actually you are not. This is a one off case and a
positive pregnancy test is a pretty good indication that you are
pregnant. False positive pregnancy tests are rare - though there are
instances and conditions where they can occur.%
\webref{%
http://www.babyhopes.com/articles/falsepositive.html
}
\end{quotation}
\sourceinfo{http://www.babyhopes.com/articles/falsepositive.html}
\end{qwrap}

\begin{abcd*}

\item 
What do you think the phrase ``97\% accurate'' means in this
quotation? 
\item
What quantitative information is missing that would allow
you to interpret the results of this test more accurately? 
\item
Can you
find that information on this website, or elsewhere on the internet?
\end{abcd*}

\end{exx}

\begin{exx}{\untested\complex}
Prenatal screening.\index{prenatal screening}

T Davies wrote in July of 2008 in answer to an online query about
prenatal screening for a particular problem:

\begin{qwrap}
\begin{quotation}
\firstline{Prenatal screening for trisomy 18: Should not be contemplated.}

K Spencer and colleagues' claims for prenatal detection of trisomy 18
by measurement of maternal serum (alpha) fetoprotein and free $\beta$
human chorionic gonadotrophin concentrations are impressive. Detection of
50\% of cases for a false positive rate of only 1\% seems to compare
favourably with the detection rate for Down's syndrome when similar
techniques are used, which is 70\% for a false positive rate of
5\%. Unfortunately, the authors fail to emphasise the importance of the
relative incidence of the two conditions at birth before concluding
that screening for trisomy 18 should be introduced. 

The natural incidence of Down's syndrome at birth is approximately
12.6/10,000 births. Among 10,000 pregnant women a 70\% sensitivity
would result in 8.8 cases being detected at the cost of 500
amniocenteses (5\% of 10,000). This means that one case of Down's
syndrome is detected for every 57 amniocenteses performed. The
incidence of trisomy 18 at birth is 1.3/10,000 births. A sensitivity
of 50\% would detect 0.65 cases per 10,000 women tested at a cost of
100 amniocenteses (1\% of 10,000). For each case of trisomy 18
detected, therefore, 154 women would have to have had
amniocentesis. Thus a screening programme would cause the abortion of
at least as many normal fetuses as it would detect cases of trisomy
18. 

In many places it is still undecided whether screening for Down's
syndrome is worth the disbenefits for the prospective parents. To my
mind, the decision is clear for screening for trisomy 18: screening
should not be contemplated until the predictive value of the test is
considerably improved.%
\webref{%
http://askville.amazon.com/understand-False-Positive-test-Trisomy-18/AnswerViewer.do?requestId=12714458
}
\end{quotation}
\sourceinfo{http://askville.amazon.com/understand-False-Positive-test-Trisomy-18/AnswerViewer.do?requestId=12714458}
\end{qwrap}

Check his arithmetic.

\end{exx}

\begin{exx}{\untested\complex}
Missile defense.
\index{missile defense}
\index{Postol, Theodore}

Theodore A. Postol wrote in \theGlobe{} on April 15, 2008 that

\begin{qwrap}
\begin{quotation}
\firstline{THE HOUSE Subcommittee on National Security and Foreign}
Affairs will 
hold a long-overdue oversight hearing tomorrow on the prospects for
national missile defense. The most basic question that needs to be
addressed is the inability of the national missile defense to tell the
difference between simple warheads and decoys.

\ldots

The issue of the effectiveness of decoys against the missile defense
is easy to understand. The national missile defense is designed to
destroy warheads by hitting them with infrared homing Kill Vehicles
while the warheads are in the near vacuum of space. Since there is no
air-drag in space, a warhead weighing thousands of pounds and a
balloon weighing almost nothing will travel together. Warheads could
be placed inside balloons, and many balloons could be deployed along
with the warheads. \ldots Since there would be
no way for the Kill Vehicle to know which balloons contain warheads,
the chances of actually hitting a warhead would be minuscule.%
\webref{
http://www.boston.com/bostonglobe/editorial\_opinion/oped/articles/2008/04/15/troubling\_questions\_about\_missile\_defense/}
\end{quotation}
\sourceinfo[667]{http://www.boston.com/bostonglobe/editorial_opinion/oped/articles/2008/04/15/troubling_questions_about_missile_defense/}
\end{qwrap}

\marginpar{
Not yet sure how to phrase a question about this. Is there
a natural two way contingency table?
}

\end{exx}

\begin{exx}{\untested\complex}
Spam

Spam is junk email. Most mail systems have a spam filter that tries to
decide whether each piece of email you get is spam. When the spam
filter finds something it thinks is spam, it may throw it away, or put
it in a junk mail folder so that you can decide whether to throw it
away without reading it. 

Before my university department set up a spam filter I ran my own.%
\footnote{The ``I'' here is Ethan Bolker, one of the authors, not the
generic authorial ``we'' we use in most of the book.
}
I found that I got about 250 emails each day. My spam filter trapped
about 175 of them. Of those about five were legitimate, and should
have been delivered directly to me. My inbox, which should
contain just the emails that aren't spam, was usually about half
spam. So (in words) my spam filter is pretty good (but not perfect) at
recognizing legitimate email but not very good at calling spam
spam. 

\begin{abcd}

\item Build a two way contingency table with
row categories ``spam'' and ``not spam'', column categories  
``called spam'' and ``called legitimate''. 

\item Compute and interpret the false positive and false negative rates.

\item
Explain why both the false positives and the false negatives make
dealing with my email harder.

\item
I can adjust the settings in my spam filter to reduce the false
positive rate. Explain why that would increase the false negative
rate.

\item Is the number of spam emails I received consistent with
this quotation from the August 6, 2007 issue of \theNewYorker{}?

\begin{qwrap}
\begin{quotation}
\firstline{More than a hundred billion unwanted messages clog}
computer networks every day.%
\webref{%
http://www.newyorker.com/reporting/2007/08/06/070806fa_fact_specter
}
\end{quotation}
\sourceinfo{http://www.newyorker.com/reporting/2007/08/06/070806fa_fact_specter}
\end{qwrap}

\item
What is the original meaning of the word ``spam''? Does the company that
sells (the real) spam object to the new meaning? 

\item How do you deal with spam? (If your email provider does all the
filtering for you, you may not even know it's throwing things away
before you see them, so you may need to do some research on your email
provider's web site to find the answers to these questions.)

\begin{itemize}

\item
Who provides your email service? (your university, your company,
Google, Yahoo, \ldots?)

\item
Do you have any say in how your email provider filters spam for you?
If so, what do you tell it?

\item
Estimate the data you need to build the two way table for your spam
statistics and compute the false negative and false positive rates.

\end{itemize}

\end{abcd}

Here are some web sites to look at if you want to find out more about spam.

\begin{itemize}

\item
\url{http://www.imediaconnection.com/content/3649.asp}. There are some
useful tips here about how to keep other people's spam filters from
thinking mail from you is spam. 

\item
Tools your system administrator might use:
\url{http://www.spamcop.net/}, \url{http://www.spamhaus.org/}

\end{itemize}

\end{exx}

\begin{exx}{\hassolution} Plagiarism
\index{plagiarism}

In 2006 UMass Boston experimented with the \myindex{plagiarism}
detection software described at
\url{http://www.turnitin.com} that 
claims it can identify plagiarism in essays students write. 
UMass did not purchase the software after the experiment.
Perhaps the possibility of false positives contributed to that
decision.

Suppose that the software can actually detect every cheater and that
it's 99\% accurate in declaring honest students honest. (We made up
these numbers since the company does not advertise them.) Sounds like
a pretty good test. 

\begin{abcd}

\item
Estimate how many papers are submitted by students at your school each
semester. 

\item
Suppose that most students are honest. Estimate how
many students will be falsely accused of cheating.

\item
What are the advantages and disadvantages of using the software?
(There are several arguments on both sides of the question. Think of
as many as you can.) 

\item Read and write about this article from \theTimes: 
\url{http://www.nytimes.com/2010/07/06/education/06cheat.html}

% Bates/Bowdoin tutorial http://abacus.bates.edu/cbb/quiz/index.html 

% plagiarism resource site
% http://abacus.bates.edu/cbb/index71ca.html?q=node

\end{abcd}

\begin{sol}

\begin{abcd}

\item
Estimate how many papers are submitted by students at your school each
semester. 

In the spring of 2011 there were about 13,000 students at UMass
Boston. If each one wrote six papers a semester that would come to
about 80,000 papers -- a nice round number in the right ballpark.

\item
Suppose that most students are honest. Estimate how
many students will be falsely accused of cheating.

Since most of the 80,000 papers are honest, the false positive rate
applies -- one percent of them, or 800 papers, will be falsely tagged
as plagiarized. That might
not be quite 800 students, since some students might be unjustly
accused twice, but the order of magnitude is right.

\item
What are the advantages and disadvantages of using the software?
(There are several arguments on both sides of the question. Think of
as many as you can.) 

An advantage is that some plagiarists will be caught who might
otherwise get away with it. Another is that students might be less
likely to cheat knowing that this software was being used.

I can think of several disadvantages. One is the anxiety caused by the
false accusations. Another is the cost. 
\end{abcd}
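
As a quick check of the estimates in parts (a) and (b), here is the
arithmetic written out, using the assumed figures above (13,000
students, six papers each, and the made-up 1\% false positive rate):
\[
13{,}000 \times 6 = 78{,}000 \approx 80{,}000 \mbox{ papers},
\qquad
0.01 \times 80{,}000 = 800 \mbox{ papers falsely flagged}.
\]
Working with the unrounded 78,000 papers gives 780 instead of 800, so
the order of magnitude does not depend on the rounding.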

\end{sol}
\end{exx}

\begin{exx}{\untested}
Mad cow disease\index{mad cow disease}

\myindex{Bovine Spongiform Encephalopathy} (\myindex{BSE})
is a disease of cattle that can be fatal to people who eat infected beef products.
BSE is rare in cattle; the test used to detect it has a
false positive rate of one in 100,000.

\begin{abcd}
\item Express this false positive rate as a percentage. 
Explain what it means.

\item The United States tested about 788,000 cattle between 2004 and
2006. About how many cattle would test positive for BSE?

\suspend{abcd}
\begin{hint}
Since you do not know the actual number of infected cattle, you can't
know exactly how many would test positive. But using the fact that the
disease is rare, you can estimate the number of positive test results
using the known false positive rate.
\end{hint}
\resume{abcd}

\item Discuss whether you would worry more about a false
positive result or a false negative result.

\end{abcd}

\end{exx}

\begin{exx}{\untested}
Airport screening
\index{airport screening}

In response to the article \headline{Screening programme evaluation applied
to airport security}
in the December 10, 2007 issue of the British Medical
Journal,
Ganesan Karthikeyan wrote

\begin{qwrap}
\begin{quotation}
\firstline{It is probably true that airport security in its present}
form is not an efficient screening measure. However, one important
difference exists between screening for disease in individual patients
and screening for, say, explosives in airports. While one missed
cancer on screening can cause the loss of at the most, one life, the
number of potential lives lost per missed screening at airports can be
substantially larger. This has to be factored into any attempts at
evaluation of the process.%
\webref{%
http://www.bmj.com/rapid-response/2011/11/01/cost-negative-test
}
\end{quotation}
\sourceinfo{
http://www.bmj.com/rapid-response/2011/11/01/cost-negative-test
}
\end{qwrap}

It's clear that a false negative is a disaster. Discuss the
consequences of a high false positive rate.

\end{exx}

\begin{exx}{\untested} \myindex{Mr. Boffo}

Construct a reasonable two way contingency table that
incorporates the data in the Mr. Boffo cartoon that starts this
chapter. Label the rows and columns. Explain how you made up the
numbers.

\end{exx}

\begin{exx}{\untested\needsquestions} \headline{False Positive Oral
Fluid Rapid HIV 
Testing in NYC STD Clinics}\index{HIV testing}\index{STD testing}

Read the City of New York Department of Health and Mental Hygiene
Health Advisory \#20 at
\url{http://www.nyc.gov/html/doh/downloads/pdf/cd/08md20.pdf}. 

Build the two way contingency tables based on the data there, and
discuss the consequences of the data.

\end{exx}

\begin{exx}{\untested\complex}
Candy leads to crime

An article headlined
\headline{Happy Halloween! Kids who eat candy every day grow up to be violent
criminals}\ in the October 2, 2009 {\em Daily Finance}, begins

\begin{qwrap}
\begin{quotation}
\firstline{Quick, hide the candy jar! Feeding your child candy every}
day could help turn Junior into a violent criminal, according to a
large study in Britain, which found that 69 percent of the
participants who had committed violence by 34 had eaten sweets or
chocolate nearly every day during childhood. 
\end{quotation}
\sourceinfo[584]{http://www.dailyfinance.com/2009/10/02/happy-halloween-kids-who-eat-candy-every-day-grow-up-to-be-viol/}
\end{qwrap}

You can find the full text at
\url{http://www.dailyfinance.com/2009/10/02/happy-halloween-kids-who-eat-candy-every-day-grow-up-to-be-viol/}

Here is the Associated Press version:
\url{http://www.wtop.com/?nid=105&sid=1775511}

And here is an abstract of the original study, from the British
Journal of Psychiatry:
\url{http://bjp.rcpsych.org/cgi/content/abstract/195/4/366}.

\begin{abcd}

\item
Read the rest of the article. Build the contingency table with columns
for whether or not someone ate candy as a child, rows for whether or
not they committed violence as an adult.

\item 
Explain why this is an example of the prosecutor's fallacy.

\item
Some of the online comments on that article recognize the fallacy --
for example

\begin{quotation}
\noindent
10-03-2009 @ 10:21PM \\
Bski said... \\
I bet you, 99\% of criminals ate bread daily by the time they were 10
years old!!!! 
\end{quotation}

Write your own blog entry, using your understanding of two way
contingency tables to enlighten any readers. If you like what you've
written you can post your comment on the article's blog.
\end{abcd}


\end{exx}


\begin{exx}{\hassolution} Breast cancer screening.

In his \headline{Chances Are} blog in \theTimes{} on
April 25, 2010 Steven Strogatz wrote about a diagnostic puzzle
presented to several doctors:

\begin{qwrap}
\begin{quotation}
\firstline{The probability that [a woman in this cohort] has breast}
cancer 
is 0.8 percent.  If a woman has breast cancer, the probability is 90
percent that she will have a positive mammogram.  If a woman does not
have breast cancer, the probability is 7 percent that she will still
have a positive mammogram.  Imagine a woman who has a positive
mammogram.  What is the probability that she actually has breast
cancer?

\ldots

[When 24 doctors were asked this question], their
estimates whipsawed from 1 percent to 90 percent.   Eight of them
thought the chances were 10 percent or less, 8 more said 90 percent,
and the remaining 8 guessed somewhere between 50 and 80 percent.
Imagine how upsetting it would be as a patient to hear such divergent
opinions. 
\webref{
http://opinionator.blogs.nytimes.com/2010/04/25/chances-are/
}
\end{quotation}
\sourceinfo[1800]{http://opinionator.blogs.nytimes.com/2010/04/25/chances-are/}
\end{qwrap}

\begin{abcd}
\item
What is the correct answer?

\suspend{abcd}

\begin{hint}
Build the contingency table, based on a population of 1,000 women
tested. 
\end{hint}

\resume{abcd}
\item What percentage of the 24 doctors got the correct answer?

\end{abcd}

\begin{sol}
\begin{abcd}
\item
What is the correct answer?

Here is the contingency table, based on 1,000 women screened.
\begin{center}
\begin{tabular}{|c|c|r|r||r|}
\hline 
\multicolumn{2}{|c|}{} &  \multicolumn{3}{|c|}{has breast cancer} \\
\hline
\multicolumn{2}{|c|}{}  & yes & no & total\\
\hline
\multirow{2}{*}{screens +}
	 & yes &   7 &   70  & 77 \\
	 & no  &   1 &  922  & 923  \\
\hline
\hline
 & total & 8 & 992 & 1,000 \\
\hline
\end{tabular}
\end{center}

So the probability that a woman with a positive mammogram actually has
cancer is just 7/77 = 1/11, or about 9\%.

\item What percentage of the 24 doctors got the correct answer?

Eight doctors thought the correct answer was 10\% or less, which it
is. One doctor thought it was just 1\%, so I won't count that as a
correct answer. That means 7/24 or about 30\% got the answer right. 

\end{abcd}
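
The table in part (a) rounds to whole women. As a check, the same
computation can be sketched directly from the three probabilities in
the quotation (0.8\%, 90\% and 7\%), without rounding:
\[
\frac{0.008 \times 0.90}{0.008 \times 0.90 + 0.992 \times 0.07}
= \frac{0.0072}{0.0072 + 0.06944}
= \frac{0.0072}{0.07664}
\approx 0.094 ,
\]
which is still about 9\%. The numerator is the fraction of all women
who have cancer and screen positive. The denominator is the fraction
of all women who screen positive, with or without cancer.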

\end{sol}

\end{exx}

\begin{exx}{\untested}
\headline{Identity fraud dragnet hardly seems worth the expense or trouble}

On July 24, 2011 Jane Allen wrote in a letter to the editor of
\theGlobe{} that

\begin{qwrap}
\begin{quotation}
\firstline{[T]he state Registry of Motor Vehicles sends out 1,500}
suspension letters a day. Last year, as a result of the software,
State Police said there were 100 arrests for fraudulent identity and
1,860 licenses were revoked. 

That means that about 390,000 people were questioned for the sake of
finding fewer than 2,000 transgressors. This hardly seems worth the
\$1.5 million grant for the software, let alone the investment of
personnel.
\webref{
http://www.boston.com/bostonglobe/editorial_opinion/letters/articles/2011/07/24/identity_fraud_dragnet_hardly_seems_worth_the_expense_or_trouble/
}
\end{quotation}
\sourceinfo[132]{http://www.boston.com/bostonglobe/editorial_opinion/letters/articles/2011/07/24/identity_fraud_dragnet_hardly_seems_worth_the_expense_or_trouble/}
\end{qwrap}

\begin{abcd}
\item
Check Allen's arithmetic in the second paragraph.

\item Construct the contingency table for this screening. Identify the
true and false positives and negatives. Explain the
costs and benefits.
\end{abcd}

\end{exx}

\begin{exx}{\untested}
In Andrew Gelman's \index{Gelman, Andrew} blog on
\headline{Statistical Modeling, Causal Inference, and Social Science}
commenter Mike Spagat writes that

\begin{qwrap}
\begin{quotation}
\firstline{Even within exceptionally violent environments most}
households will still not have a violent death. So a very small false
positive rate in a household survey will cause substantial upward bias
in violence estimates. 
\webref{
http://andrewgelman.com/2011/08/the_reliability/
}
\end{quotation}
\sourceinfo{http://andrewgelman.com/2011/08/the_reliability/}
\end{qwrap}

Write a paragraph or two explaining this to someone who is interested
and smart enough to understand this but has not studied the material in
this chapter. Consider making up some numbers to illustrate your argument.
\end{exx}

\begin{ExtraExercises}

\begin{exx}{\needsquestions}
\headline{Asian carp eDNA testing can lead to false positives, study
finds}

\begin{qwrap}
\begin{quotation}
\firstline{The technical team developed a strategy to test for false}
positives.    Water samples were collected in April from a metro lake
that served as a negative control (very little chance Asian carp could
be present).   Twenty samples were sent to the Corps of Engineers
laboratory in Vicksburg and 20 were sent to the private contractor
that did the 2011 analysis.  All of the samples from the Corps of
Engineers lab tested negative, while one sample from the private
contractor tested positive for silver carp.  This sample was tested
again and the positive was verified. 

There is a high likelihood this is a false positive which creates
uncertainty about previous results.    The percentage of positives in
the 2012 samples was much lower than previous samples suggesting there
may have been a mix of real and false positive samples in 2011.   This
does not minimize eDNA testing as an important tool for detecting
Asian carp, but it does emphasize the need to determine the source of
false positives and to review and modify sampling and analytical
procedures.  In addition, we have collected live Asian carp from the
St. Croix and Mississippi Rivers which are definitive evidence these
fish are present and pose a threat to Minnesota. 
\webref{%
http://blogs.twincities.com/outdoors/2012/07/27/asian-carp-edna-testing-can-lead-to-false-positives-study-finds/
}
\end{quotation}
\sourceinfo{http://blogs.twincities.com/outdoors/2012/07/27/asian-carp-edna-testing-can-lead-to-false-positives-study-finds/}
\end{qwrap}

{\em The posting has no numbers, so it's hard to write questions.}

\end{exx}

\begin{exx}{\needsquestions}
\headline{Shaky Foundations for the New Mammogram Economy}

From Bloomberg news, on August 1, 2012:

\begin{qwrap}
\begin{quotation}
\firstline{Part of the problem is that, on a mammogram, a
noncancerous}
abnormality can look very much like cancer. \ldots
This causes 10 percent to 15 percent of screened
women in the U.S. to be recalled for more evaluation. Most (95
percent) screening-detected abnormalities are ultimately found to be
noncancerous. An American woman who is regularly screened during her
40s has a 61 percent chance of getting a false positive result. 

\ldots

Now, for every \$100 spent on screening, an additional \$30 to \$33 is
spent to evaluate false positive findings. In the Medicare population,
the workup of false positive mammogram results is estimated to total
\$250 million a year.%
\webref{%
http://www.bloomberg.com/news/2012-08-01/shaky-foundations-for-the-new-mammogram-economy.html
}
\end{quotation}
\sourceinfo{
http://www.bloomberg.com/news/2012-08-01/shaky-foundations-for-the-new-mammogram-economy.html
}
\end{qwrap}

Also see the report
\headline{High Rate of False-Positives with Annual Mammogram} 
from UCSF:

\begin{qwrap}
\begin{quotation}
\firstline{For the false-positive study, the researchers found that}
after a decade of annual screening, a majority of women will receive
at least one false-positive result, and 7 to 9 percent will receive a 
false-positive biopsy recommendation.
\webref{http://www.ucsf.edu/news/2011/10/10778/high-rate-false-positives-annual-mammogram
}
\end{quotation}
\sourceinfo{
http://www.ucsf.edu/news/2011/10/10778/high-rate-false-positives-annual-mammogram
}
\end{qwrap}

\end{exx}

\begin{exx}{\untested}
Testing for prostate cancer

Figure~\ref{fig:psa} from the Department of Family Medicine at
Virginia Commonwealth University illustrates the possible
outcomes of a \myindex{PSA} screening test for prostate cancer.

\begin{figure}[ht]
\centering
\includegraphics[height=60mm]{\here/psa.jpg}
\caption{Prostate cancer screening test results}
\figsource{http://www.familymedicine.vcu.edu/research/misc/psa/index.html}
\label{fig:psa}
\end{figure}

This quotation spells out some of the arguments for and against the test:

\begin{qwrap}
\begin{quotation}
\firstline{There are possible advantages to having a PSA test.}

\begin{enumerate}
\item   A normal PSA test may reassure you.
\item   A PSA test may find prostate cancer early before it has
spread.
\item  Treatment of prostate cancer in early stages may help some men
to avoid problems from cancer. 
\item  Treatment of prostate cancer in early stages may help some men live longer.
\end{enumerate}

There are possible disadvantages to having a PSA test.

\begin{enumerate}
\item  A normal PSA test may miss some prostate cancers.
\item  A false positive PSA test may cause unnecessary anxiety.
\item  A false positive PSA test may cause an unneeded prostate
biopsy. 
\item  You may find out that you have prostate cancer, but it may be a
cancer that would never cause you any problems. 
\item  Treatment of prostate cancer may cause you harm. Difficulties
with getting erections or problems with controlling your bladder or
bowels are some potential harms. 
\end{enumerate}
\end{quotation}
\sourceinfo{http://www.familymedicine.vcu.edu/research/misc/psa/index.html}
\end{qwrap}

\begin{abcd}
\item What does ``PSA'' stand for?
\item Construct the two way contingency table based on this data.
\item If your screening test is positive, what is the probability that
you have prostate cancer?
\item If you have prostate cancer, what is the probability that this
screening test will detect it?
\end{abcd}

The figure shows that the overall incidence of prostate cancer is
10\%. That is probably an invented statistic, to make the arguments
easier to understand. The reality is complex. Here's a start on it,
from the American Cancer Society:

\begin{qwrap}
\begin{quotation}
What are the key statistics about prostate cancer?

Other than skin cancer, prostate cancer is the most common cancer in
American men. The latest American Cancer Society estimates for
prostate cancer in the United States are for 2012: 

\begin{itemize}
\item About 241,740 new cases of prostate cancer will be diagnosed
\item About 28,170 men will die of prostate cancer
\end{itemize}

About 1 man in 6 will be diagnosed with prostate cancer during his lifetime.

Prostate cancer occurs mainly in older men. Nearly two thirds are
diagnosed in men aged 65 or older, and it is rare before age 40. The
average age at the time of diagnosis is about 67. 

Prostate cancer is the second leading cause of cancer death in
American men, behind only lung cancer. About 1 man in 36 will die of
prostate cancer. 

Prostate cancer can be a serious disease, but most men diagnosed with
prostate cancer do not die from it. In fact, more than 2.5 million men
in the United States who have been diagnosed with prostate cancer at
some point are still alive today. 
\webref{http://www.cancer.org/Cancer/ProstateCancer/DetailedGuide/prostate-cancer-key-statistics}
\end{quotation}
\sourceinfo{
http://www.cancer.org/Cancer/ProstateCancer/DetailedGuide/prostate-cancer-key-statistics
}
\end{qwrap}

\end{exx}

\end{ExtraExercises}

\begin{ReviewExercises}

One reviewer says

\begin{quotation}
While the problems make nice links to real-world problems, a few more
``routine'' problems with the contingency tables provided might be
helpful for initial practice with computation and interpretation. 
\end{quotation}


\end{ReviewExercises}