# Research Paper Peer Review Sample

Amrhein *et al.* criticize a paper by Bradley Efron that discusses Bayesian statistics (Efron, 2013a), focusing on a particular example that was also discussed in Efron (2013b). The example concerns a woman who is carrying twins, both male (as determined by sonogram; we ignore the possibility that the gender has been observed incorrectly). The parents-to-be ask Efron to tell them the probability that the twins are identical.

This is my first open review, so I'm not sure of the protocol. But given that there appear to be errors in both Efron (2013b) and the paper under review, I am sorry to say that my review might actually be longer than the article by Efron (2013a), the primary focus of the critique, and the critique itself. I apologize in advance for this. To start, I will outline the problem being discussed for the sake of readers.

This problem has various parameters of interest. The primary parameter is the genetic composition of the twins in the mother’s womb. Are they identical (which I describe as the state *x* = 1) or fraternal twins (*x* = 0)? Let *y* be the data, with *y* = 1 to indicate the twins are the same gender. Finally, we wish to obtain Pr(*x* = 1 | *y* = 1), the probability the twins are identical given they are the same gender (see footnote 1). Bayes’ rule gives us an expression for this:

Pr(*x* = 1 | *y* = 1) = Pr(*x*=1) Pr(*y* = 1 | *x* = 1) / {Pr(*x*=1) Pr(*y* = 1 | *x* = 1) + Pr(*x*=0) Pr(*y* = 1 | *x* = 0)}

Now we know that Pr(*y* = 1 | *x* = 1) = 1; twins must be the same gender if they are identical. Further, Pr(*y* = 1 | *x* = 0) = 1/2; if twins are not identical, the probability of them being the same gender is 1/2.

Finally, Pr(*x* = 1) is the prior probability that the twins are identical. The bone of contention in the Efron papers and the critique by Amrhein *et al.* revolves around how this prior is treated. One can think of Pr(*x* = 1) as the population-level proportion of twins that are identical for a mother like the one being considered.

However, if we ignore other forms of twins that are extremely rare (equivalent to ignoring coins finishing on their edges when flipping them), one incontrovertible fact is that Pr(*x* = 0) = 1 − Pr(*x* = 1); the probability that the twins are fraternal is the complement of the probability that they are identical.

The above values and expressions for Pr(*y* = 1 | *x* = 1), Pr(*y* = 1 | *x* = 0), and Pr(*x* = 0) lead to a simpler expression for the probability that we seek, namely the probability that the twins are identical given they have the same gender:

Pr(*x* = 1 | *y* = 1) = 2 Pr(*x*=1) / [1 + Pr(*x*=1)] (1)

We see that the answer depends on the prior probability that the twins are identical, Pr(*x* = 1). The paper by Amrhein *et al.* points out that this is a mathematical fact. For example, if identical twins were impossible (Pr(*x* = 1) = 0), then Pr(*x* = 1 | *y* = 1) = 0. Similarly, if all twins were identical (Pr(*x* = 1) = 1), then Pr(*x* = 1 | *y* = 1) = 1. The “true” prior lies somewhere in between. Apparently, the doctor knows that one third of twins are identical (see footnote 2). Therefore, if we assume Pr(*x* = 1) = 1/3, then Pr(*x* = 1 | *y* = 1) = 1/2.

Now, what would happen if we didn't have the doctor's knowledge? Laplace's “Principle of Insufficient Reason” would suggest that we give equal prior probability to all possibilities, so Pr(*x* = 1) = 1/2 and Pr(*x* = 1| *y* = 1) = 2/3, an answer different from 1/2 that was obtained when using the doctor's prior of 1/3.
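As a quick check of the arithmetic, equation (1) can be evaluated directly. The following is a minimal Python sketch; the function name `posterior_identical` is an illustrative assumption and does not come from either paper.

```python
from fractions import Fraction


def posterior_identical(prior_identical):
    """Equation (1): Pr(x = 1 | y = 1) = 2 * prior / (1 + prior)."""
    prior = Fraction(prior_identical)
    if not 0 <= prior <= 1:
        raise ValueError("prior must lie in [0, 1]")
    return 2 * prior / (1 + prior)


# Doctor's prior (1/3) and Laplace's equal-probability prior (1/2).
for label, prior in [("doctor", Fraction(1, 3)), ("Laplace", Fraction(1, 2))]:
    print(label, prior, "->", posterior_identical(prior))
```

Exact fractions are used so that the results can be compared with the values quoted in the text (1/2 and 2/3) without rounding.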

Efron (2013a) highlights this sensitivity to the prior, portraying someone who specifies an uninformative prior as a “violator”, with Laplace as the “prime violator”. In contrast, Amrhein *et al.* correctly point out that the difference in the posterior probabilities is merely a consequence of mathematical logic. No one is violating logic – they are merely expressing ignorance by assigning equal probabilities to all states of nature. Whether this is philosophically valid is debatable (Colyvan 2008), but this example does not lend much weight to that question, and it is well beyond the scope of this review. But setting Pr(*x* = 1) = 1/2 is not a violation; it is merely an assumption with consequences (and one that in hindsight might be incorrect; see footnote 2).

Alternatively, if we don't know Pr(*x* = 1), we could describe that probability by its own probability distribution. Now the problem has two aspects that are uncertain. We don’t know the true state *x*, and we don’t know the prior (except in the case where we use the doctor’s knowledge that Pr(*x* = 1) = 1/3). Uncertainty in the state of *x* refers to uncertainty about this particular set of twins. In contrast, uncertainty in Pr(*x* = 1) reflects uncertainty in the population-level frequency of identical twins. A key point is that the state of one particular set of twins is a different parameter from the frequency of occurrence of identical twins in the population.

Without knowledge about Pr(*x* = 1), we might use Pr(*x* = 1) ~ dunif(0, 1), which is consistent with Laplace. Efron (2013b) notes another option for an uninformative prior: Pr(*x* = 1) ~ dbeta(0.5, 0.5), which is the Jeffreys prior for a probability.

Here I disagree with Amrhein *et al.*; I think they are confusing the two uncertain parameters. Amrhein *et al.* state:

*“We argue that this example is not only flawed, but useless in illustrating Bayesian data analysis because it does not rely on any data. Although there is one data point (a couple is due to be parents of twin boys, and the twins are fraternal), Efron does not use it to update prior knowledge. Instead, Efron combines different pieces of expert knowledge from the doctor and genetics using Bayes’ theorem.”*

This claim might be correct when describing uncertainty in the population-level frequency of identical twins. The data about the twin boys are not useful by themselves for this purpose – they are a biased sample (the data have come to light because their gender is the same; they are not a random sample of twins). Further, a sample of size one, especially if biased, is not a firm basis for inference about a population parameter. While the data are biased, the claim by Amrhein *et al.* that there are no data is incorrect.

However, the data point (the twins have the same gender) is entirely relevant to the question about the state of this particular set of twins. And it does update the prior. This updating of the prior is given by equation (1) above. The doctor’s prior probability that the twins are identical (1/3) becomes the posterior probability (1/2) when using information that the twins are the same gender. The prior is clearly updated with Pr(*x* = 1 | *y* = 1) ≠ Pr(*x* = 1) in all but trivial cases; Amrhein *et al.*’s statement that I quoted above is incorrect in this regard.

This possible confusion between uncertainty about these twins and uncertainty about the population level frequency of identical twins is further suggested by Amrhein *et al.*’s statements:

“Second, for the uninformative prior, Efron mentions erroneously that he used a uniform distribution between zero and one, which is clearly different from the value of 0.5 that was used. Third, we find it at least debatable whether a prior can be called an uninformative prior if it has a fixed value of 0.5 given without any measurement of uncertainty.”

Note, if the prior for Pr(*x* = 1) is specified as 0.5, or dunif(0,1), or dbeta(0.5, 0.5), the posterior probability that these twins are identical is 2/3 in all cases. Efron (2013b) says the different priors lead to different results, but this claim is incorrect, and the correct answer (2/3) is given in Efron (2013a); see footnote 3. Nevertheless, a prior that specifies Pr(*x* = 1) = 0.5 does indicate uncertainty about whether this particular set of twins is identical (but certainty in the population-level frequency of identical twins). And Efron’s (2013a) result is consistent with Pr(*x* = 1) having a uniform prior. Therefore, both claims in the quote above are incorrect.

It is probably easiest to show the (lack of) influence of the prior using MCMC sampling. Here is WinBUGS code for the case using Pr(*x* = 1) = 0.5.

The posterior mean of *x* is 2/3; this is the posterior probability that *x* = 1.

Instead of using pr_ident_twins <- 0.5, we could set this probability as being uncertain and define pr_ident_twins ~ dunif(0,1), or pr_ident_twins ~ dbeta(0.5,0.5). In either case, the posterior mean value of *x* remains 2/3 (contrary to Efron 2013b, but in accord with the correction in Efron 2013a).

Note, however, that the value of the population-level parameter pr_ident_twins is different in all three cases. In the first it remains unchanged at 1/2, where it was set. In the cases where the prior distribution for pr_ident_twins is uniform or beta, the posterior distributions remain broad, but they differ depending on the prior (as they should – different priors lead to different posteriors; see footnote 4). However, given the biased sample of size 1, the posterior distribution for this particular parameter is likely to be misleading as an estimate of the population-level frequency of identical twins.
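The behaviour described above can be reproduced approximately without WinBUGS. The following is a rough Python sketch that uses simple rejection sampling rather than MCMC, so it is not the model code referred to in this review. Only the name `pr_ident_twins` comes from the text; the function `simulate`, the seed, and the number of draws are assumptions made for illustration.

```python
import random


def simulate(draw_prior, n_draws=200_000, seed=1):
    """Rejection sampling: keep simulated twin pairs whose genders match (y = 1)."""
    rng = random.Random(seed)
    kept_x = []
    kept_p = []
    for _ in range(n_draws):
        pr_ident_twins = draw_prior(rng)                 # population-level frequency
        x = 1 if rng.random() < pr_ident_twins else 0    # state of this pair of twins
        pr_same = 1.0 if x == 1 else 0.5                 # Pr(y = 1 | x)
        if rng.random() < pr_same:                       # condition on the observed y = 1
            kept_x.append(x)
            kept_p.append(pr_ident_twins)
    n = len(kept_x)
    # Posterior means of x and of pr_ident_twins among the retained draws.
    return sum(kept_x) / n, sum(kept_p) / n


priors = {
    "fixed 0.5": lambda rng: 0.5,
    "dunif(0, 1)": lambda rng: rng.random(),
    "dbeta(0.5, 0.5)": lambda rng: rng.betavariate(0.5, 0.5),
}

results = {name: simulate(draw) for name, draw in priors.items()}
for name, (post_x, post_p) in results.items():
    print(f"{name:16s} Pr(x=1|y=1) ~ {post_x:.3f}   E[pr_ident_twins|y=1] ~ {post_p:.3f}")
```

Under these assumptions the estimated posterior mean of *x* should sit close to 2/3 for all three priors, while the estimated posterior mean of pr_ident_twins should differ between them (it stays at 0.5 for the fixed prior and moves above 0.5 for the uniform and beta priors).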

So why doesn’t the choice of prior influence the posterior probability that these twins are identical? Well, for these three priors, the prior probability that any single set of twins is identical is 1/2 (this is essentially the mean of the prior distributions in these three cases).

If, instead, we set the prior as dbeta(1,2), which has a mean of 1/3, then the posterior probability that these twins are identical is 1/2. This is the same result as if we had set Pr(*x* = 1) = 1/3. In both these cases (choosing dbeta(1,2) or 1/3), the prior probability that a single set of twins is identical is 1/3, so the posterior is the same (1/2) given the data (the twins have the same gender).
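The reason only the prior mean matters can be written out in a few lines. In the sketch below, θ stands for the population-level frequency of identical twins (pr_ident_twins) and π(θ) for its prior density; both symbols are notation introduced here for illustration, not taken from the papers under discussion.

```latex
\[
\begin{aligned}
\Pr(x = 1 \mid y = 1)
  &= \frac{\int_0^1 \theta \, \pi(\theta) \, d\theta}
          {\int_0^1 \left[ \theta + \tfrac{1}{2}(1 - \theta) \right] \pi(\theta) \, d\theta} \\
  &= \frac{\mathrm{E}[\theta]}{\tfrac{1}{2}\left(1 + \mathrm{E}[\theta]\right)}
   = \frac{2\,\mathrm{E}[\theta]}{1 + \mathrm{E}[\theta]}
\end{aligned}
\]
```

This has the same form as equation (1) with Pr(*x* = 1) replaced by the prior mean E[θ]. A dbeta(a, b) prior has mean a/(a + b), so dunif(0, 1) and dbeta(0.5, 0.5) both give E[θ] = 1/2 and a posterior of 2/3, whereas dbeta(1, 2) gives E[θ] = 1/3 and a posterior of 1/2.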

Further, Amrhein *et al.* also seem to misunderstand the data. They note:

“Although there is one data point (a couple is due to be parents of twin boys, and the twins are fraternal)...”

This is incorrect. The parents simply know that the twins are both male. Whether they are fraternal is unknown (fraternal twins being the complement of identical twins) – that is the question the parents are asking. This error of interpretation makes the calculations in Box 1 and subsequent comments irrelevant.

Box 1 also implies Amrhein *et al.* are using the data to estimate the population frequency of identical twins rather than the state of this particular set of twins. This is different from the aim of Efron (2013a) and the stated question.

Efron suggests that Bayesian calculations should be checked with frequentist methods when priors are uncertain. However, this is a good example where this cannot be done easily, and Amrhein *et al.* are correct to point this out. In this case, we are interested in the probability that the hypothesis is true given the data (an inverse probability), not the probabilities that the observed data would be generated given particular hypotheses (frequentist probabilities). If one wants the inverse probability (the probability the twins are identical given they are the same gender), then Bayesian methods (and therefore a prior) are required. A logical answer simply requires that the prior is constructed logically. Whether that answer is “correct” will be, in most cases, only known in hindsight.

However, one possible way to analyse this example using frequentist methods would be to assess the likelihood of obtaining the data for each of the two hypotheses (the twins are identical or fraternal). The likelihood of the twins having the same gender under the hypothesis that they are identical is 1. The likelihood of the twins having the same gender under the hypothesis that they are fraternal is 0.5. Therefore, the weight of evidence in favour of identical twins is twice that of fraternal twins. Scaling these weights so they sum to one (Burnham and Anderson 2002) gives a weight of 2/3 for identical twins and 1/3 for fraternal twins. These scaled weights have the same numerical values as the posterior probabilities based on either a Laplace or Jeffreys prior. Thus, one might argue that the weight of evidence for each hypothesis when using frequentist methods is equivalent to the posterior probabilities derived from an uninformative prior. So, as a final aside in reference to Efron (2013a), if we are being “violators” when using a uniform prior, are we also being “violators” when using frequentist methods to weigh evidence? Regardless of the answer to this rhetorical question, “checking” the results with frequentist methods doesn’t give any more insight than using uninformative priors (in this case). However, this analysis shows that the question can be analysed using frequentist methods; the single data point is not a problem for this. The claim in Amrhein *et al.* that a frequentist analysis "is impossible because there is only one data point, and frequentist methods generally cannot handle such situations" is not supported by this example.
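The scaling of the two likelihoods into weights is simple enough to verify directly. The following short Python sketch does only that normalisation; the dictionary names are illustrative assumptions.

```python
from fractions import Fraction

# Likelihood of observing same-gender twins under each hypothesis.
likelihoods = {"identical": Fraction(1), "fraternal": Fraction(1, 2)}

# Scale the likelihoods so that the weights sum to one.
total = sum(likelihoods.values())
weights = {hypothesis: value / total for hypothesis, value in likelihoods.items()}

for hypothesis, weight in weights.items():
    print(hypothesis, weight)
```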

In summary, the comment by Amrhein *et al.* raises some interesting points that seem worth discussing, but it makes important errors in analysis and interpretation, and misrepresents the results of Efron (2013a). This means the current version should not be approved.

References

Burnham, K.P. & Anderson, D.R. (2002) Model Selection and Multi-model Inference: a Practical Information-theoretic Approach. Springer-Verlag, New York.

Colyvan, M. (2008) Is Probability the Only Coherent Approach to Uncertainty? *Risk Anal.* 28: 645-652.

Efron, B. (2013a) Bayes’ Theorem in the 21st Century. *Science* 340(6137): 1177-1178.

Efron, B. (2013b) A 250-year argument: Belief, behavior, and the bootstrap. *Bull. Amer. Math. Soc.* 50: 129-146.

Footnotes

1. The twins are both male. However, if the twins were both female, the statistical results would be the same, so I will simply use the data that the twins are the same gender.
2. In reality, the frequency of twins that are identical is likely to vary depending on many factors but we will accept 1/3 for now.
3. Efron (2013b) reports the posterior probability for these twins being identical as “a whopping 61.4% with a flat Laplace prior” but as 2/3 in Efron (2013a). The latter (I assume 2/3 is “even more whopping”!) is the correct answer, which I confirmed via email with Professor Efron. Therefore, Efron (2013b) incorrectly claims the posterior probability is sensitive to the choice between a Jeffreys or Laplace uninformative prior.
4. When the data are very informative relative to the different priors, the posteriors will be similar, although not identical.

## Sample Forms - Peer Review

Well-developed feedback forms for peer review can give students a deeper understanding of how their writing affects different readers, reinforce familiarity with revising strategies, and help students become familiar with the expectations of scientific writing.

Several formats exist for peer-review feedback forms. Two common styles of feedback forms include criteria grids and open-ended forms. Both forms are presented in general terms below. You can also see examples of open-ended forms for a science research paper, a science lab report, a science article, and a problem-solving exercise.

## Criteria grid

A criteria grid is useful to assist students in recognizing and constructing assertion-plus-evidence arguments. Fuller responses may be obtained by leaving more space in the "Reader's Comments" column and soliciting specifics from the reviewers. The grid can be made available online through a website or set up in MS Word or Excel as a table. Examples of criteria grids can be found at the University of Hawaii at Manoa's Writing Center peer-review page.

## Open-ended form

A list of open-ended questions can encourage students to provide more detailed feedback. Inform the students that the amount of space you leave for a response reflects the amount of information you are expecting.

Author __________ Reviewer __________

The goals of peer review are 1) to help improve your classmate's paper by pointing out strengths and weaknesses that may not be apparent to the author, and 2) to help improve editing skills.

**INSTRUCTIONS**

Read the paper(s) assigned to you twice, once to get an overview of the paper, and a second time to provide constructive criticism for the author to use when revising his/her paper. Answer the questions below.

**ORGANIZATION (10%)**

1) Were the basic sections (Introduction, Conclusion, Literature Cited, etc.) adequate? If not, what is missing?

2) Did the writer use subheadings well to clarify the sections of the text? Explain.

3) Was the material ordered in a way that was logical, clear, easy to follow? Explain.

**CITATIONS (20%)**

4) Did the writer cite sources adequately and appropriately? Note any incorrect formatting.

5) Were all the citations in the text listed in the Literature Cited section? Note any discrepancies.

**GRAMMAR AND STYLE (20%)**

6) Were there any grammatical or spelling problems?

7) Was the writer's writing style clear? Were the paragraphs and sentences cohesive?

**CONTENT (50%)**

8) Did the writer adequately summarize and discuss the topic? Explain.

9) Did the writer comprehensively cover appropriate materials available from the standard sources? If no, what's missing?

10) Did the writer make some contribution of thought to the paper, or merely summarize data or publications? Explain.
