Paper Highlight: “Log-Normal Distributions Across the Sciences” (Limpert et al.)

tldr: Many natural quantities (e.g., height, survival times after a cancer diagnosis, concentrations of substances in the environment, and protein sizes) are strictly positive and arise from many independent effects that multiply rather than add. These quantities are not normally distributed but log-normally distributed: log(X) is normal, not X itself. Treating a log-normal quantity as normal ignores its right skew and can give incorrect results in hypothesis testing!
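To make the tldr concrete, here is a minimal Python sketch (standard library only) that draws a log-normal sample and compares X with log(X). The parameters (log(X) has mean 0 and standard deviation 1) are assumptions chosen for illustration, and the `skewness` helper is hypothetical, written just for this example.

```python
import math
import random
import statistics


def skewness(data):
    """Sample skewness: mean of the cubed z-scores (near 0 for symmetric data)."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mu) / sd) ** 3 for x in data) / len(data)


rng = random.Random(0)
# Assumed for illustration: log(X) ~ Normal(mean 0, sd 1), so X is log-normal.
xs = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]
logs = [math.log(x) for x in xs]

for name, data in (("X", xs), ("log(X)", logs)):
    print(f"{name:>6}: mean={statistics.fmean(data):.2f} "
          f"median={statistics.median(data):.2f} skew={skewness(data):.2f}")
```

For these parameters, theory puts the mean of X at e^(1/2) ≈ 1.65 against a median of 1, so X is strongly right-skewed, while log(X) should come out symmetric with mean and median both near 0.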

Central Limit Theorem: Figure 2 gives a graphical demonstration of why sums of iid variables lead to normal distributions and products lead to log-normal ones. (The link is that log(X1 · X2 · ... · Xn) = log X1 + log X2 + ... + log Xn, so applying the CLT to the logs makes a product approximately log-normal.) Sums are like marbles dropped onto a pyramid of equilateral triangles, while products are like marbles dropped onto a pyramid of scalene triangles. At each obstacle, a marble at horizontal position x moves to x + c or x − c on the additive board, and to c′x or x/c′ on the multiplicative board, where c > 0 and c′ > 1 are constants. Because c′x − x = (c′ − 1)x is larger than x − x/c′ = (c′ − 1)x/c′, every step to the right is longer than the matching step to the left, and both steps grow with x. That asymmetry compounds as the marble falls through row after row, which is why the distribution becomes right-skewed. Check out the photo! It’s really intuitive.
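The two marble boards can be simulated in a few lines. This is a minimal Python sketch, not a model of the actual apparatus in the figure: the step sizes (c = 0.1, c′ = 1.1), the 50 rows, the starting position of 1.0, and the fair left/right bounce are all assumptions, and the function names are hypothetical.

```python
import random
import statistics

# Hypothetical parameters for illustration (not taken from the paper's figure).
C_ADD = 0.1     # additive board: step of +c or -c
C_MUL = 1.1     # multiplicative board: multiply or divide by c'
ROWS = 50       # number of obstacle rows each marble passes through
MARBLES = 20_000


def additive(x, right):
    """One bounce on the additive board: x + c or x - c."""
    return x + C_ADD if right else x - C_ADD


def multiplicative(x, right):
    """One bounce on the multiplicative board: c'x or x/c'."""
    return x * C_MUL if right else x / C_MUL


def drop_marble(step, rng, start=1.0):
    """Drop one marble from `start`, bouncing right or left with equal probability."""
    x = start
    for _ in range(ROWS):
        x = step(x, rng.random() < 0.5)
    return x


def skewness(data):
    """Sample skewness: mean of the cubed z-scores (near 0 for symmetric data)."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mu) / sd) ** 3 for x in data) / len(data)


rng = random.Random(42)
sums = [drop_marble(additive, rng) for _ in range(MARBLES)]
products = [drop_marble(multiplicative, rng) for _ in range(MARBLES)]

for name, xs in (("additive", sums), ("multiplicative", products)):
    print(f"{name:>14}: mean={statistics.fmean(xs):.3f} "
          f"median={statistics.median(xs):.3f} skew={skewness(xs):.2f}")
```

The additive board should come out close to symmetric (skew near 0), while the multiplicative board should show a clear right skew with its mean above its median. Taking the log of each multiplicative position turns it back into an additive walk with step ln c′, which is the CLT argument in miniature.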

People: I’ve often wondered why the tail of the aptitude distribution is so long. Whether it’s playing basketball, solving mathematical problems, or climbing rocks, it’s a truism that no matter how good you are, there is almost certainly someone who is much better. There are plenty of Nobel Prize winners, but only a couple of Einsteins. This seems to be because abilities, practice sessions, and opportunities multiply, rather than add, over time. (Interestingly, IQ is normally distributed by definition, not log-normally. That is because IQ is not a natural quantity but a ranking: raw test scores are mapped onto a normal distribution by fiat. So IQ is not a good example of a naturally normally distributed quantity.)

Questions: For growth or ability, it’s easy to see why the end result is multiplicative, not additive. But for other cases, it’s not so obvious. Why should stock prices be log-normal, for example? Or protein sizes? For stocks, you could maybe argue that prices change by percentages, but amino acids are discrete, so what’s happening there? If you have thoughts, please leave them in the comments!