Thursday, December 17, 2015

[Excerpt] Benford's Law


Source 1: Tim Harford, The Undercover Economist

※ Excerpt:

( ... ... ) Benford's Law was discovered in 1881 by the astronomer Simon Newcomb, and then again by Frank Benford, a physicist at General Electric, in 1938. The law is a curious one: it predicts the frequency of the first digits of a collection of numbers. For example, measure the lengths of the world's rivers, and see how many of the numbers begin with "one" (184 miles; 1,543 miles) versus "three" (3,022 miles) or "nine" (985 miles). Newcomb and Benford discovered that the first digit is most often a "one": fully 30 per cent of the time, over six times more common than an initial "nine". And the result is true whether one counts the numbers on the front page of the New York Times or leafs through baseball statistics.
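
The excerpt gives the 30 per cent figure but not the formula behind it. The usual statement of the law is that a first digit d appears with probability log10(1 + 1/d). A minimal Python sketch of that formula follows; the function name `benford_probability` is chosen here for illustration and does not come from the source.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected share for each possible first digit.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"first digit {d}: {p:.1%}")

# How much more common an initial "one" is than an initial "nine".
ratio_one_to_nine = expected[1] / expected[9]
print(f"one vs nine: {ratio_one_to_nine:.2f}x")
```

The nine shares sum to 1, and the share for "one" is log10(2), which is where the "30 per cent" and "over six times" figures in the excerpt come from.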

Nobody seems sure why so much data has the Benford distribution. We do know that exponential growth produces it. To move from a GDP of 1 billion Flainian Pobble Beads (a unit of currency in the Hitchhiker's Guide to the Galaxy) to 2 billion Flainian Pobble Beads requires cumulative growth of 100 per cent, which will take a while. But to move from a GDP of 9 billion to 10 billion Flainian Pobble Beads requires only about 11 per cent growth. Benford distributions are, uniquely, scale-invariant: in other words, if one measures GDP in dollars instead of Pobble Beads, the Benford property remains.
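
The growth argument can be checked with a small simulation. The sketch below is an illustration under assumed values, not something taken from the book: a GDP that starts at 1 and grows 3 per cent per period for 5,000 periods, and a made-up exchange rate of 1234.5 for the change of units. Because the series lingers much longer between 1 and 2 than between 9 and 10, an initial "one" should dominate, and rescaling should leave the shares roughly unchanged.

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.17e}"[0])


def leading_digit_shares(values) -> dict:
    """Share of values starting with each digit 1 to 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# All three constants are assumptions made for this illustration.
GROWTH_RATE = 0.03      # growth per period
PERIODS = 5000          # long enough to cross many orders of magnitude
EXCHANGE_RATE = 1234.5  # hypothetical "dollars per Pobble Bead"

gdp_in_beads = [(1 + GROWTH_RATE) ** t for t in range(PERIODS)]
gdp_in_dollars = [EXCHANGE_RATE * g for g in gdp_in_beads]

shares_beads = leading_digit_shares(gdp_in_beads)
shares_dollars = leading_digit_shares(gdp_in_dollars)

for d in range(1, 10):
    print(f"{d}: beads {shares_beads[d]:.3f}  dollars {shares_dollars[d]:.3f}")
```

Both columns should sit near the Benford shares (about 0.30 for "one", under 0.05 for "nine"), which is the scale invariance the excerpt describes.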

Manipulated data often fail to satisfy Benford's Law. A manager who must submit receipts for expenses over £20 may end up filing claims for lots of £18 and £19 expenses, and the data will then contain too many ones, eights and nines. A forensic accountant can easily check this, and while the check is not infallible (fraudster Bernard Madoff filed Benford-compatible monthly returns), it's an indicator of possible trouble. ( ... ... )


Source 2: The Curious Case of Benford's Law

When you roll a die, every number has the same probability of showing up (assuming the die isn't loaded in any way).

... ...

However, the leading digits of numbers in very large accumulated datasets (for example, the amount you pay for each household bill over the course of a year) follow a very different pattern. In such cases it is much more likely that a given number will start with one, with decreasing probability for each higher digit up to nine. This statistical pattern is called Benford's law.

Benford's law arises naturally if the data under consideration span several orders of magnitude. For example, the first digits of the powers of two obey Benford's law:

... ...

This plot shows the frequency of initial digits for every power of two from 2^1 to 2^1000:

... ...
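
The plot itself is not reproduced in this excerpt, but the tally behind it is easy to recompute. The following Python sketch counts the first digit of every power of two from 2^1 to 2^1000 and prints it next to the share predicted by log10(1 + 1/d); it is a reconstruction for illustration, not the code used in the original article.

```python
import math
from collections import Counter

# Python integers are arbitrary precision, so str(2 ** n) is exact.
counts = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))
total = sum(counts.values())

print("digit  observed  expected")
for d in range(1, 10):
    observed = counts[d] / total
    expected = math.log10(1 + 1 / d)
    print(f"{d:>5}  {observed:>8.3f}  {expected:>8.3f}")
```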

Benford's law seems to apply to a broad variety of datasets, not just pure mathematical progressions, and it has a number of serious applications.

  • It is often used to detect anomalies in datasets, including income tax returns; most people don't know about Benford's law, so when they fill out fraudulent tax forms, they tend to choose numbers with higher leading digits. If the distribution of leading digits in a given return doesn't closely follow Benford's predicted distribution, that could be a sign that the return should be pulled for additional review.
... ...
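
One common way to turn "doesn't closely follow Benford's predicted distribution" into a number is a chi-square goodness-of-fit statistic on the nine first-digit counts. The sketch below assumes that approach; the article does not say which test reviewers actually use. The two sample datasets are invented for illustration, and 15.507 is the standard 5 per cent critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

# Benford's expected share for each first digit.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_5PCT_DF8 = 15.507


def first_digit(amount: float) -> int:
    """First significant digit of a non-zero amount."""
    return int(f"{abs(amount):.17e}"[0])


def benford_chi_square(amounts) -> float:
    """Chi-square distance between observed first digits and Benford's law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


# Hypothetical data: steady 7% growth, which should look Benford-like.
growth_like = [1.07 ** k for k in range(500)]
# Hypothetical data: amounts that all start with 7, 8 or 9.
high_digit_heavy = [700 + 3 * k for k in range(100)]

for name, data in [("growth_like", growth_like), ("high_digit_heavy", high_digit_heavy)]:
    stat = benford_chi_square(data)
    flag = "review" if stat > CHI2_CRITICAL_5PCT_DF8 else "ok"
    print(f"{name}: chi-square = {stat:.1f} -> {flag}")
```

A high statistic is only a prompt for closer review, not proof of fraud: small samples and data confined to a narrow range can fail the check for innocent reasons.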



