Details
The data are due to Sir Francis Galton. The data set records the following variables for the children of 205 families: the family to which the child belongs, numbered from 1 to 205; the height of the father in inches; the height of the mother in inches; the gender of the child, male (M) or female (F); the height of the child in inches; and the number of children in the child's family.
Galton reduced the analysis of the data to two variables by multiplying the heights of the female children by 1.08 and by defining what he referred to as a midparent. The height of the midparent was defined to be $h_{mid} = (h_{father} + 1.08\,h_{mother})/2$. That is, the midparent height is the average of the father's height and the mother's height, with the mother's height first adjusted by the factor 1.08. He then considered the distribution of the paired data $(h_i, c_i)$, where $h_i$ is the height of the midparent and $c_i$ is the (adjusted) height of the adult child. He found that these data follow a binormal (bivariate normal) distribution with parameters $\{\mu_c, \mu_m, \sigma_c, \sigma_m, \rho\}$, where $\mu$ denotes a mean, $\sigma$ denotes a standard deviation, $\rho$ is the correlation coefficient, and the subscripts $c$ and $m$ refer to the child and the midparent. Up to an additive constant, the logarithm of the joint probability density function of midparent and child height is $\log p(h,c) = -\frac{1}{2(1-\rho^2)}\left[\frac{(h-\mu_m)^2}{\sigma_m^2} + \frac{(c-\mu_c)^2}{\sigma_c^2} - \frac{2\rho\,(h-\mu_m)(c-\mu_c)}{\sigma_m\sigma_c}\right]$. Thus the contours of constant density are ellipses, with the tilt of the ellipses controlled by the correlation $\rho$.
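The reduction to two variables can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the five records are made-up values rather than rows of Galton's data, and the parameter estimates use the ordinary sample mean, sample standard deviation and Pearson correlation, which may differ from the method Galton used.

```python
from math import sqrt

FEMALE_FACTOR = 1.08  # adjustment applied to female heights, as described above


def midparent_height(father, mother):
    """Average of the father's height and the adjusted mother's height."""
    return (father + FEMALE_FACTOR * mother) / 2


def adjusted_child_height(height, gender):
    """Scale a female child's height by 1.08; leave a male child's height as is."""
    return height * FEMALE_FACTOR if gender == "F" else height


def binormal_estimates(pairs):
    """Sample estimates of (mu_m, mu_c, sigma_m, sigma_c, rho) from (h, c) pairs."""
    n = len(pairs)
    hs = [h for h, _ in pairs]
    cs = [c for _, c in pairs]
    mu_m = sum(hs) / n
    mu_c = sum(cs) / n
    s_hh = sum((h - mu_m) ** 2 for h in hs)
    s_cc = sum((c - mu_c) ** 2 for c in cs)
    s_hc = sum((h - mu_m) * (c - mu_c) for h, c in pairs)
    return {
        "mu_m": mu_m,
        "mu_c": mu_c,
        "sigma_m": sqrt(s_hh / (n - 1)),
        "sigma_c": sqrt(s_cc / (n - 1)),
        "rho": s_hc / sqrt(s_hh * s_cc),
    }


# Hypothetical records: (father, mother, gender, child height), all in inches.
# These values are invented for illustration and are NOT taken from the data set.
records = [
    (70.0, 65.0, "M", 70.5),
    (68.5, 63.0, "F", 63.0),
    (72.0, 66.5, "M", 72.0),
    (66.0, 62.0, "F", 61.5),
    (69.0, 64.0, "M", 68.0),
]

pairs = [
    (midparent_height(f, m), adjusted_child_height(c, g))
    for f, m, g, c in records
]
params = binormal_estimates(pairs)
print(params)
```

With the real data, the same two transformations would be applied to every child's row before estimating the five parameters.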
Galton went on to investigate whether children of tall parents tend to be tall, and to what degree. He found that children of tall parents tend to be taller than average but not as tall as their parents. Similarly, children of short parents tend to be shorter than average but taller than their parents. This is referred to as regression to the mean, and it is necessary in order to maintain a stable distribution of population height.
The mean of the conditional distribution $p(c \mid h = h_0)$ is $\mu_c + (h_0 - \mu_m)\,\rho\,\sigma_c/\sigma_m$. If the midparent height in question, $h_0$, is equal to the mean of the midparent heights $\mu_m$, then the height of the child can be expected to equal the mean $\mu_c$ of the children's heights. The remaining two cases assume that the slope $\rho\,\sigma_c/\sigma_m$ lies between 0 and 1 and that $\mu_c$ and $\mu_m$ are approximately equal. If $h_0$ is smaller than $\mu_m$, then the height of the child can be expected to be greater than the height of the midparent but less than average. Correspondingly, if $h_0$ is greater than $\mu_m$, then the height of the child can be expected to be less than the height of the midparent but above average.
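The conditional mean formula above can be evaluated directly. The sketch below is a minimal illustration, not a fit to the data: the function name is hypothetical, and the parameter values in `ASSUMED` are invented round numbers chosen only so that the slope lies between 0 and 1 and the two means coincide.

```python
def conditional_mean_child(h0, mu_c, mu_m, sigma_c, sigma_m, rho):
    """Mean of p(c | h = h0) for a binormal distribution of (h, c)."""
    slope = rho * sigma_c / sigma_m  # regression slope of child on midparent
    return mu_c + (h0 - mu_m) * slope


# Assumed, illustrative parameter values in inches (NOT estimates from the data).
ASSUMED = dict(mu_c=69.0, mu_m=69.0, sigma_c=2.5, sigma_m=1.8, rho=0.5)

for h0 in (66.0, 69.0, 72.0):
    expected = conditional_mean_child(h0, **ASSUMED)
    print(f"midparent {h0:.1f} in -> expected child {expected:.2f} in")
```

Under these assumed values, a midparent below the mean yields an expected child height between the midparent height and the mean, and a midparent above the mean yields an expected child height between the mean and the midparent height, which is the regression to the mean described above.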