Imagine being presented with a sealed box and asked to disclose its contents without any further clues. This scenario might appear impossible at first glance. However, the very nature of the container offers subtle hints. For instance, whatever resides within must be smaller than the box itself. Similarly, the material of the container plays a role; a solid metal box can hold liquids and endure temperatures that would quickly compromise a cardboard one.

Is there a systematic, mathematically sound way to make such educated guesses? Some events, like a coin flip or the roll of a die, are inherently unpredictable and so are treated as random. For most other situations, though, a few powerful tools can narrow down the possibilities considerably and move you beyond mere speculation.

A guess narrowed down by logical constraints is essentially an estimate, and estimation has a long and distinguished history. One of the most remarkable early examples comes from the ancient Greek philosopher Eratosthenes, who lived in Alexandria, Egypt, during the 3rd century BC. Using a few elementary principles, he calculated the Earth’s circumference with surprising accuracy. Although his own account has not survived, later writings allow his method to be reconstructed.

Estimating the Earth’s Circumference

Eratosthenes observed that at noon on the summer solstice, the sun stood directly overhead in the ancient city of Syene, so that its light reached the bottom of a deep well without casting a shadow. At the same moment in Alexandria, a vertical rod cast a shadow at an angle of just over 7 degrees, roughly one-fiftieth of a full circle. If the sun’s rays arrive essentially parallel, the arc between the two cities spans that same angle as seen from the Earth’s center, and therefore the same fraction of the full circumference. Knowing the distance between the cities to be 5,000 stadia (the stadium being an ancient unit of length), he concluded that the Earth’s circumference must be 50 times this distance, or 250,000 stadia.

While Eratosthenes made certain geometrical approximations, these can be set aside for the moment. A more significant source of uncertainty is the exact length of a stadium. It is widely believed that Eratosthenes used a length of approximately 160 meters. This value yields a circumference of 40,000 kilometers (160 meters per stadium × 250,000 stadia), remarkably close to the modern measurement of 40,075 kilometers. The stadium was not a standardized unit, however: plausible lengths range from about 150 to 210 meters, so how accurate his result appears depends on which value we are willing to grant him.
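The arithmetic above is short enough to check directly. The sketch below simply replays the figures quoted in the text and tries the three stadium lengths mentioned; the function and constant names are illustrative, not part of any historical source.

```python
# Figures quoted in the text; the stadium length is the uncertain input.
ARCS_PER_CIRCLE = 50       # the shadow angle is about 1/50 of a full circle
DISTANCE_STADIA = 5_000    # Syene to Alexandria
MODERN_KM = 40_075         # modern circumference quoted in the text


def circumference_km(stadium_m: float) -> float:
    """Scale the Syene-Alexandria arc up to a full circle, then convert to km."""
    stadia = ARCS_PER_CIRCLE * DISTANCE_STADIA   # 250,000 stadia
    return stadia * stadium_m / 1_000


for stadium_m in (150, 160, 210):
    km = circumference_km(stadium_m)
    error = (km - MODERN_KM) / MODERN_KM
    print(f"{stadium_m} m per stadium: {km:,.0f} km ({error:+.1%} vs modern)")
```

With a 160-meter stadium the result is 40,000 km; the 150-meter and 210-meter values give 37,500 km and 52,500 km, which shows how much of the apparent accuracy rests on the choice of unit.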

The core takeaway here is that a few straightforward, yet logical, calculations can lead to a remarkably potent estimate—measuring a planet without the actual need to traverse its entire surface.

Enrico Fermi and the Art of “Back-of-the-Envelope” Calculations

This approach to estimation found a modern champion in the 20th-century physicist Enrico Fermi. Celebrated for his contributions to the first nuclear reactor and his pivotal role in the US Manhattan Project, Fermi was present at the Trinity test, the first detonation of an atomic weapon. To gauge the explosion’s yield, which was not precisely known at the time, he dropped small pieces of paper and observed how far the blast wave displaced them. As with Eratosthenes, the details of Fermi’s technique were not formally documented. His estimate of 10 kilotons was a little under half the accepted value of 21 kilotons for the Trinity test. While not exact, it was a solid approximation, correct to within a factor of about two.

Indeed, hitting the correct ballpark was Fermi’s hallmark—he relished these swift, informal estimations, so much so that they are now commonly referred to as Fermi problems. A classic example of such a challenge, which he often posed to his students, was to estimate the number of piano tuners in Chicago.

Start with Chicago’s population, estimated at around 3 million. Assuming an average household size of four people gives approximately 750,000 households. If one in five households owns a piano, there are roughly 150,000 pianos in Chicago. A piano tuner who services four pianos each weekday can tune approximately 1,000 pianos a year. Consequently, if each of these 150,000 pianos is tuned once a year, Chicago needs around 150 piano tuners.
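The chain of assumptions can be written out as a few lines of arithmetic, which makes it easy to swap in different guesses. This is a minimal sketch using the numbers from the text; the figure of 50 working weeks per year is an added assumption chosen to match the "approximately 1,000 pianos annually" quoted above.

```python
# Every input is a rough assumption, as in the text.
population = 3_000_000
people_per_household = 4
households_per_piano = 5          # one household in five owns a piano
tunings_per_day = 4
working_days_per_year = 5 * 50    # weekdays; 50 working weeks is an assumption

households = population // people_per_household
pianos = households // households_per_piano
tunings_per_tuner = tunings_per_day * working_days_per_year
tuners = pianos / tunings_per_tuner   # assumes one tuning per piano per year

print(f"households: {households:,}")
print(f"pianos: {pianos:,}")
print(f"tunings per tuner per year: {tunings_per_tuner:,}")
print(f"estimated piano tuners: {tuners:.0f}")
```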

The value of this estimation process lies not in its absolute accuracy but in its controlled margin of error. A series of assumptions is made, some likely to overestimate and others to underestimate. Barring a significant bias in one direction, the errors partly cancel and the cumulative error tends to stay within a bounded range. For instance, had a calculation suggested a million piano tuners in Chicago, one could confidently say that figure is wrong.
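One way to see why the errors stay bounded is a small simulation. The sketch below is purely illustrative: it assumes four independent inputs, each wrong by a random factor of at most two in either direction with no systematic bias, and looks at how far the final product drifts from a baseline of 150. The number of inputs, the size of the errors, and their distribution are all hypothetical choices.

```python
import math
import random

BASELINE = 150           # tuner count from the worked estimate
N_FACTORS = 4            # hypothetical: four independently guessed inputs
MAX_FACTOR_ERROR = 2.0   # hypothetical: each guess is off by at most 2x either way


def simulate(trials: int = 10_000, seed: int = 0) -> list[float]:
    """Return estimates where every input is wrong by a random factor in [1/2, 2]."""
    rng = random.Random(seed)
    log_max = math.log(MAX_FACTOR_ERROR)
    results = []
    for _ in range(trials):
        # Errors multiply, so they add on a log scale; over- and
        # underestimates are equally likely (no bias in either direction).
        log_error = sum(rng.uniform(-log_max, log_max) for _ in range(N_FACTORS))
        results.append(BASELINE * math.exp(log_error))
    return results


estimates = simulate()
within_4x = sum(BASELINE / 4 <= e <= BASELINE * 4 for e in estimates) / len(estimates)
print(f"worst case possible: {MAX_FACTOR_ERROR ** N_FACTORS:.0f}x off")
print(f"share of runs within 4x of the baseline: {within_4x:.0%}")
```

Under these assumptions the worst case is a factor of 16, but that requires every input to err in the same direction; most runs land much closer to the baseline because the errors partly offset one another.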

Bayesian Reasoning: Refining Estimates with New Data

While Fermi estimation provides a powerful method for initial assessments, new information can often refine these preliminary figures. Return to the box from the opening: if a ball were drawn from it, blue and bearing the number 32, that would certainly shape one’s hypothesis about the rest of the contents. One might infer the presence of other balls, some also blue, and others bearing numbers.

But how can this uncertainty be quantified? Thomas Bayes, an 18th-century statistician and clergyman, gave us a framework for doing so. Bayes’ profound insight was to reframe probability, shifting its focus from merely describing randomness, like the outcome of a coin toss, to providing a structure for measuring and revising uncertainty. The equation that bears his name, Bayes’ theorem, turns observations into updated beliefs. It has four key components: the prior, the evidence, the likelihood, and the posterior.

The prior represents an initial assumption or belief. Let’s consider a scenario where one is serving three flavors of ice cream at a party—chocolate, strawberry, and vanilla—and wishes to determine the most popular flavor to ensure adequate stock. A reasonable initial assumption, or prior, is that flavor preferences are evenly distributed among attendees, with one-third of the population favoring each flavor. As the party begins, however, observations start to accumulate. If the first 10 guests all choose chocolate, this constitutes the evidence.

Here, the process becomes more intricate. To determine the likelihood, one must refer back to the initial assumption. If preferences were truly equal, what is the probability of observing 10 consecutive chocolate selections? The answer is (1/3) raised to the power of 10, or 1 in 59,049 (roughly 1 in 60,000). Such a low probability strongly suggests that the initial assumption is wrong and should be revised to give chocolate a much larger share. Under the revised assumption, the evidence actually observed becomes far more likely. This updated belief is known as the posterior.
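The update can be made concrete with a small calculation. The sketch below first computes the likelihood quoted above, then performs one possible Bayesian update using pseudo-counts (a Dirichlet prior, in textbook terms): the even prior is encoded as one imaginary earlier guest per flavor, and each real guest adds one count. The strength of that prior is an assumption made here for illustration, not something the scenario specifies.

```python
from fractions import Fraction

FLAVORS = ("chocolate", "strawberry", "vanilla")

# Likelihood of the evidence under the prior: ten chocolate picks in a row
# if each flavor really had a one-third share.
likelihood_under_prior = Fraction(1, 3) ** 10
print(f"P(10 chocolate | equal shares) = 1 in {1 / likelihood_under_prior}")

# Prior as pseudo-counts: one imaginary earlier guest per flavor (an assumption).
prior_counts = {flavor: 1 for flavor in FLAVORS}
observed = {"chocolate": 10, "strawberry": 0, "vanilla": 0}


def expected_shares(counts: dict) -> dict:
    """Expected share of each flavor given its (pseudo-)counts."""
    total = sum(counts.values())
    return {flavor: Fraction(n, total) for flavor, n in counts.items()}


# Updating is just adding the observed counts to the prior counts.
posterior_counts = {f: prior_counts[f] + observed[f] for f in FLAVORS}
prior = expected_shares(prior_counts)
posterior = expected_shares(posterior_counts)

for flavor in FLAVORS:
    print(f"{flavor}: prior {prior[flavor]}, posterior {posterior[flavor]}")
```

With this weak prior, the expected chocolate share moves from 1/3 to 11/13 after ten guests. A stronger prior (more imaginary guests per flavor) would move more slowly, which is the sense in which the prior and the evidence are weighed against each other.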

Bayes’ theorem is useful well beyond this example. In the box example, the first ball drawn drastically narrows the range of possible contents. If a second ball is then drawn, this one red and marked “50”, the possibilities are constrained further. One can deduce that at least two colors of ball are present, and if one assumes the balls are numbered consecutively from 1, the total is more likely to be small (under 100) than exceedingly large (over a million). Each additional ball provides more evidence, and each posterior becomes the prior for the next update.
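The claim about the total number of balls can be checked with a two-hypothesis version of the problem. The sketch below assumes the balls are numbered 1 to N, that each number is equally likely on every draw (as if balls were replaced after drawing), and that only two totals are in contention, 100 and 1,000,000, with equal prior weight. All of these are simplifying assumptions for illustration.

```python
from fractions import Fraction

draws = [32, 50]   # numbers seen so far, from the text

# Hypothetical candidates for the total ball count, with equal prior weight.
hypotheses = {100: Fraction(1, 2), 1_000_000: Fraction(1, 2)}


def likelihood(draw: int, total: int) -> Fraction:
    """P(draw | balls numbered 1..total, each equally likely to come out)."""
    return Fraction(1, total) if 1 <= draw <= total else Fraction(0)


beliefs = dict(hypotheses)
for draw in draws:
    # Bayes' theorem: posterior is proportional to prior times likelihood.
    unnormalized = {n: p * likelihood(draw, n) for n, p in beliefs.items()}
    evidence = sum(unnormalized.values())
    # The posterior from this draw serves as the prior for the next one.
    beliefs = {n: p / evidence for n, p in unnormalized.items()}
    print(f"after drawing {draw}: P(100 balls) = {float(beliefs[100]):.10f}")
```

A low number is 10,000 times more likely to appear from a box of 100 than from a box of a million, so each such draw shifts belief sharply toward the smaller total.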

A common application of Bayes’ theorem, often unrecognized as such, is the email spam filter. Early spam filters relied on Bayesian reasoning. They began with a prior assumption about the percentage of emails that are spam. This was combined with data from messages that users had marked as spam (the evidence) and with the probability of specific words and phrases appearing in spam (the likelihood) to produce a refined classification of each incoming email as legitimate or spam (the posterior).
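A toy version of such a filter fits in a few lines. The sketch below follows the naive Bayes approach, which treats each word as an independent clue; the prior and the per-word probabilities are invented for illustration and do not come from any real filter or dataset.

```python
import math

# All numbers below are invented for illustration.
PRIOR_SPAM = 0.4   # assumed share of email that is spam
WORD_GIVEN_SPAM = {"free": 0.30, "winner": 0.20, "meeting": 0.01}
WORD_GIVEN_HAM = {"free": 0.03, "winner": 0.01, "meeting": 0.10}


def spam_probability(words: list[str]) -> float:
    """Posterior P(spam | words), treating known words as independent clues."""
    log_spam = math.log(PRIOR_SPAM)
    log_ham = math.log(1 - PRIOR_SPAM)
    for word in words:
        if word in WORD_GIVEN_SPAM:   # skip words with no statistics
            log_spam += math.log(WORD_GIVEN_SPAM[word])
            log_ham += math.log(WORD_GIVEN_HAM[word])
    # Normalize the two unnormalized posteriors into a probability.
    return 1 / (1 + math.exp(log_ham - log_spam))


print(f"'free winner': {spam_probability(['free', 'winner']):.3f}")
print(f"'meeting':     {spam_probability(['meeting']):.3f}")
print(f"no known word: {spam_probability([]):.3f}")
```

Working with logarithms avoids multiplying many small probabilities together, which matters once a message contains more than a handful of words. A message with no recognized words simply falls back to the prior.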

Implications for Modern AI and Human Cognition

The effectiveness of spam filtering shows that educated guessing is not merely a theoretical exercise but has direct practical relevance. Techniques such as Fermi estimation and Bayesian reasoning are becoming increasingly important in an era dominated by pattern-matching artificial intelligence, exemplified by systems like ChatGPT. Many modern AI systems arguably tend to reinforce existing beliefs rather than update or challenge them: they prioritize matching established patterns, sometimes without fully accounting for novel evidence that deviates from those patterns. To avoid being misled by inaccurate AI-driven estimates, individuals need to cultivate their own capacity for sound inferential reasoning.