This article was originally published on LinkedIn, May 10, 2025: https://www.linkedin.com/pulse/dont-go-backward-tom-regan-j3rqe
Introduction
This article discusses a question related to heredity that comes up from time to time in assessment. The question is whether parents can be identified based on the characteristics of their offspring. As motivation, we will take a brief detour into behavioral economics.
In a series of experiments by Tversky and Kahneman,
Subjects were asked to categorize persons, e.g., as a “salesman” or a “member of parliament,” on the basis of given descriptions. Confronted with a description of an individual – randomly drawn from a given population – as “interested in politics, likes to participate in debates, and is eager to appear in the media,” most subjects would say that the person is a member of parliament, even though the higher proportion of salespersons in the population makes it more likely that the person is a salesman.[1]
In other words, the fact that politicians are interested in politics does not mean that people who are interested in politics are politicians. Reasoning from the description back to the occupation requires knowledge of the population from which the people are drawn. This kind of reasoning is called Bayesian inference[2], and it has broad applicability. For example, it is used to determine the probability that an individual displaying a symptom actually has the associated disease. It also applies to a seemingly straightforward question of heredity.
Heredity Review
A simple model of heredity starts with a plant that can have either red flowers or white flowers. We will assume that this plant’s flower color is controlled by a single gene which has two variants, or alleles. The allele associated with red flowers is denoted by R and the allele associated with white flowers is denoted by r. Each offspring plant inherits one allele for this gene from each parent. If an offspring inherits an R allele from both parents, its genotype is RR and its phenotype is red (it has red flowers). If the offspring inherits an r allele from both parents, its genotype is rr and its phenotype is white (it has white flowers). If the offspring inherits an R allele from one parent and an r allele from the other, yielding a genotype of Rr, then the flower color is determined by how those different alleles interact. We will assume that R is dominant to r, meaning that the R allele is expressed and the flower is red.
In general, parents produce offspring with different combinations of alleles. If the parents’ genotypes are Rr and Rr, then remembering that an offspring gets one allele from each parent, the genotype of an offspring could be RR, Rr, rR (which is the same as Rr), or rr. Because each parent is equally likely to pass on either of its two alleles, any given offspring has a 25% chance of having each of those four combinations. And following the discussion in the previous paragraph, given those four possible combinations, any given offspring has a 75% chance of red flowers (associated with genotypes RR, Rr and rR) and a 25% chance of white flowers (genotype rr). By the same reasoning, parents with genotypes Rr and rr will produce offspring with a 50% chance of red flowers (Rr) and a 50% chance of white flowers (rr).
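The enumeration in this paragraph can be sketched in a few lines of Python. This is an illustration only, not part of the original analysis: the function name offspring_phenotypes is hypothetical, and the sketch assumes that each parent passes on either of its alleles with equal probability and that R is dominant to r.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_phenotypes(parent_a, parent_b):
    """Return {phenotype: probability} for one cross, assuming R is dominant to r."""
    counts = Counter()
    # Each offspring gets one allele from each parent; every pairing is
    # assumed to be equally likely.
    for allele_a, allele_b in product(parent_a, parent_b):
        phenotype = "red" if "R" in (allele_a, allele_b) else "white"
        counts[phenotype] += 1
    total = sum(counts.values())
    return {name: Fraction(n, total) for name, n in counts.items()}


print(offspring_phenotypes("Rr", "Rr"))
print(offspring_phenotypes("Rr", "rr"))
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the 75%/25% and 50%/50% figures in the text.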
The Meaning of a Percent Chance
Say that the parent genotypes are both Rr so that each offspring has a 75% chance of having red flowers and a 25% chance of having white flowers. It’s not necessarily true that if there are four offspring, three will have red flowers and one will have white flowers. That may happen (it is the most likely outcome, with a probability of 42%[3]), but it also may not happen. All four offspring might have red flowers, or all four offspring might have white flowers, or there might be two reds and two whites or one red and three whites. All of these outcomes are possible. All you can say with certainty is that, in accord with the Law of Large Numbers, as the number of offspring increases, the chance of a significant departure from the expected ratios decreases.[4]
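To see how the probabilities of the five possible outcomes compare, here is a minimal sketch that treats each offspring as an independent draw with a 75% chance of red flowers. The function name red_count_distribution is hypothetical, and the independence of offspring is an assumption of this sketch.

```python
from math import comb


def red_count_distribution(n_offspring, p_red):
    """Binomial probabilities of k red-flowered offspring, for k = 0..n_offspring."""
    return [
        comb(n_offspring, k) * p_red**k * (1 - p_red) ** (n_offspring - k)
        for k in range(n_offspring + 1)
    ]


dist = red_count_distribution(4, 0.75)
for k, prob in enumerate(dist):
    print(f"{k} red, {4 - k} white: {prob:.3f}")
```

Under these assumptions the three red, one white outcome is the largest entry in the list, at about 0.42, while every other outcome still has a nonzero probability.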
The Point
A test question on this content might be “Given parents with genotypes Rr and Rr, what is each offspring’s chance of having red flowers?” The answer is 75%. So far so good. However, after asking a few questions like this, we may be tempted to try something different. We will flip the question. Instead of giving information about the parents and asking about the offspring, we will give information about the offspring and ask about the parents. The question will be, “There are four offspring, three with red flowers and one with white flowers. What are the most likely genotypes of the parents?”
We have come to the point, and the point is, Don’t do this! This question cannot be answered because you have no information about the population from which the parents are drawn.
The answer that the question is fishing for is that each parent’s genotype is Rr. The reasoning behind this answer is: Each offspring of Rr Rr parents has a 75% chance of having red flowers and a 25% chance of having white flowers, so if three offspring have red flowers and one offspring has white flowers, the parent genotypes must be Rr Rr. As in the salesperson and politician scenario, this reasoning is fallacious, and we can demonstrate this by considering overall populations. Before doing that, let’s establish that there is a different set of parents that could also produce three red and one white offspring.
Each offspring of Rr rr parents has a 50% chance of having red flowers and a 50% chance of having white flowers. If these parents have four offspring, the most likely single outcome is two red and two white, with a probability of 37.5%. However, it is possible for these parents to have three red and one white offspring. The probability of this is 25%[5]. Our “competing” set of parents is Rr rr.
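The two probabilities in play, one for each candidate set of parents, can be computed side by side. The sketch below is illustrative only: prob_three_red_one_white is a hypothetical helper, and it assumes that the four offspring are independent of one another.

```python
from fractions import Fraction
from math import comb


def prob_three_red_one_white(p_red):
    """Probability that exactly 3 of 4 offspring are red, given per-offspring chance p_red."""
    # comb(4, 3) counts the four positions the single white offspring could occupy.
    return comb(4, 3) * p_red**3 * (1 - p_red)


likelihoods = {
    "Rr x Rr": prob_three_red_one_white(Fraction(3, 4)),
    "Rr x rr": prob_three_red_one_white(Fraction(1, 2)),
}
for parents, prob in likelihoods.items():
    print(parents, prob, float(prob))
```

These values say how likely the offspring are given the parents. They do not, by themselves, say how likely the parents are given the offspring.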
Considering Populations
In the question, the population is not specified, so we can consider any population that we want. Let’s say there are 200 sets of parents. 100 of these sets have genotypes Rr and Rr and 100 of these sets have genotypes Rr and rr. Every set of parents has four offspring.
The first 100 sets of parents would produce about 42 sets of offspring consisting of three plants having red flowers and one plant having white flowers[6]. The second 100 sets of parents would produce about 25 such sets of offspring. That’s about 67 sets of 3 red:1 white offspring. More than half of those sets (about 42 of 67, or roughly 63%) come from Rr Rr parents, so a set picked at random most likely came from Rr Rr parents. Given that population, the parents of a 3 red:1 white set of offspring are most likely Rr Rr.[7]
Next let’s consider a population that has 200 sets of Rr rr parents rather than 100, alongside the same 100 sets of Rr Rr parents. The Rr rr parents will produce about 50 sets of 3 red:1 white offspring, making a total of about 92 such sets. Now, more than half of the 3 red:1 white offspring sets (about 50 of 92) came from Rr rr parents, so a set picked at random most likely came from Rr rr parents. Given this new population, the parents of a 3 red:1 white set of offspring are most likely Rr rr, not Rr Rr.
Finally, let’s consider a very simple population: a single set of parents, Rr rr. Then, given three red and one white offspring, the most likely—actually, certain—parent genotypes are Rr rr.
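The three populations above can be run through the same calculation. The following is a minimal sketch rather than a definitive implementation: the names LIKELIHOOD and posterior are hypothetical, the likelihood values are the 3 red:1 white probabilities from footnotes [3] and [5] written as exact fractions, and every set of parents is assumed to have exactly four offspring.

```python
from fractions import Fraction

# Chance that a set of four offspring is 3 red : 1 white, for each parent pairing.
LIKELIHOOD = {"Rr x Rr": Fraction(27, 64), "Rr x rr": Fraction(1, 4)}


def posterior(parent_set_counts):
    """Return P(parent pairing | 3 red : 1 white) for a population of parent sets."""
    # Expected number of 3 red : 1 white offspring sets contributed by each pairing.
    expected = {
        pairing: count * LIKELIHOOD[pairing]
        for pairing, count in parent_set_counts.items()
    }
    total = sum(expected.values())
    return {pairing: value / total for pairing, value in expected.items()}


populations = {
    "100 Rr x Rr, 100 Rr x rr": {"Rr x Rr": 100, "Rr x rr": 100},
    "100 Rr x Rr, 200 Rr x rr": {"Rr x Rr": 100, "Rr x rr": 200},
    "a single Rr x rr pair": {"Rr x rr": 1},
}
for label, counts in populations.items():
    result = posterior(counts)
    print(label, {pairing: round(float(p), 3) for pairing, p in result.items()})
```

The likelihoods are identical in all three runs. Only the makeup of the population changes, and that alone is enough to change which parent pairing is most likely.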
Takeaway
The question-in-question is “Given four offspring, three of which express the dominant phenotype and one of which expresses the recessive phenotype, what are the most likely genotypes of the parents?” This question does not have a definite answer because it depends on the population of parents, which is unspecified.
My takeaway from this analysis is to be extremely careful when reasoning from offspring to parents. Some deductions of this nature are perfectly fine. For example, if an offspring has an r allele, then it’s certain that at least one parent has an r allele. But then there’s the question we’ve been exploring, which flings you into the realm of Bayesian inference.
[1] https://www.nobelprize.org/uploads/2018/06/advanced-economicsciences2002-1.pdf, page 14.
[2] Of Wikipedia’s Bayes-related pages, the Bayes’ Theorem page seems the most accessible: https://en.wikipedia.org/wiki/Bayes%27_theorem. The Bayesian inference page has a straightforward example involving cookies: https://en.wikipedia.org/wiki/Bayesian_inference#Examples.
[3] For Rr Rr parents, a particular 3 red:1 white outcome has probability 0.75*0.75*0.75*0.25=0.1055. There are four ways to accomplish this—the white flower could be first, second, third, or fourth—so the overall probability is 4*0.1055=0.422.
[4] Law of Large Numbers, https://mathworld.wolfram.com/LawofLargeNumbers.html
[5] For Rr rr parents, a particular 3 red:1 white outcome has probability 0.50*0.50*0.50*0.50=0.0625. There are four ways of accomplishing this, so the overall probability is 4*0.0625=0.25.
[6] If you are troubled by the claim that a 42% chance per set of offspring would yield about 42 out of the 100 sets of offspring, then increase the number of parent sets and offspring sets to one thousand. Or one million. According to the Law of Large Numbers, as the number of instances increases, the 42% chance per instance comes closer and closer to yielding 42% of the total number of instances.
[7] This calculation exemplifies Bayes’ Theorem. For an accessible introduction, including a discussion of Kahneman and Tversky’s work, see 3Blue1Brown, “Bayes’ Theorem: The geometry of changing beliefs”, https://www.youtube.com/watch?v=HZGCoVF3YvM