Benford's law

Revision as of 01:55, 3 May 2013 by Thundergnat (talk | contribs) (Draft Task)

Benford's law, also called the first-digit law, refers to the frequency distribution of digits in many (but not all) real-life sources of data. In this distribution, the digit 1 occurs as the first digit about 30% of the time, while larger digits occur in that position less frequently: 9 is the first digit less than 5% of the time. This distribution of first digits is the same as the widths of gridlines on a logarithmic scale. Benford's law also concerns the expected distribution for digits beyond the first, which approaches a uniform distribution.

Benford's law is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
This page uses content from Wikipedia. The original article was at Benfords_law. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.

A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability

<math>P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)</math>

For this task, write (a) routine(s) to calculate the distribution of first significant digits in a collection of numbers, then display the actual vs. expected distribution in the way most convenient for your language (table / graph / histogram / whatever).
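The expected side of that comparison depends only on the probability formula. As a minimal sketch in Python (chosen here purely for illustration), assuming the usual statement of the law, P(d) = log_base(1 + 1/d): the function name `expected_distribution` and its `base` parameter are hypothetical and not part of the task.

```python
import math

def expected_distribution(base=10):
    """Return the Benford probability of each leading digit 1 .. base-1."""
    # math.log(x, base) computes the logarithm of x in the given base.
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

expected = expected_distribution()
for digit, p in expected.items():
    print(f"{digit}: {p * 100:5.2f}%")
```

The probabilities telescope to log_base(base) = 1, which gives a quick sanity check on any implementation.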

Use the first 1000 Fibonacci numbers as your data set. No need to show how the Fibonacci numbers are obtained. You can generate them or load them [from a file]; whichever is easiest. Display your actual vs. expected distribution.
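One possible way to tally the leading digits of this data set is sketched below in Python. The helper names `fibonacci` and `leading_digit_counts` are hypothetical, and the sequence is assumed to start 1, 1, 2, ... as in the Perl 6 entry.

```python
import math
from collections import Counter
from itertools import islice

def fibonacci():
    """Yield 1, 1, 2, 3, 5, ... indefinitely (assumed starting values)."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

def leading_digit_counts(numbers):
    """Count the first decimal digit of each positive integer."""
    return Counter(int(str(n)[0]) for n in numbers)

counts = leading_digit_counts(islice(fibonacci(), 1000))
total = sum(counts.values())

print("   Actual  Expected  Deviation")
for d in range(1, 10):
    actual = counts[d] * 100 / total
    expected = math.log10(1 + 1 / d) * 100
    print(f"{d}: {actual:5.2f}% | {expected:5.2f}% | {abs(expected - actual):.2f}%")
```

Python integers have arbitrary precision, so taking the first character of `str(n)` stays exact even for the largest of the 1000 values.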

For extra credit: Show the distribution for one other set of numbers from a page on Wikipedia. State which Wikipedia page it can be obtained from and what the set enumerates. Again, no need to display the actual list of numbers or the code to load them.
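Data copied from a table may mix integers, decimals, zeros and blank cells. The sketch below, again in Python, shows one way to pick out the first significant digit by searching the text form of each value for its first nonzero digit, which is the same idea the Perl 6 entry uses. The names `first_significant_digit` and `leading_digit_counts` are hypothetical, and the sample values are made up for illustration; they do not come from any Wikipedia page.

```python
import re
from collections import Counter

FIRST_NONZERO = re.compile(r"[1-9]")

def first_significant_digit(value):
    """Return the first nonzero digit in value's text form, or None."""
    match = FIRST_NONZERO.search(str(value))
    return int(match.group()) if match else None

def leading_digit_counts(values):
    """Tally first significant digits, skipping values that have none."""
    digits = (first_significant_digit(v) for v in values)
    return Counter(d for d in digits if d is not None)

# Made-up sample values for illustration only; not a real data set.
sample = [1649, 0.042, "305.5", 0, "n/a", 7]
counts = leading_digit_counts(sample)
print(sorted(counts.items()))
```

Entries with no nonzero digit (here `0` and `"n/a"`) are skipped, so they do not distort the percentages.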


Perl 6

<lang perl6># Tally the first nonzero digit of every number in @a.
sub benford (@a) { %(grep(/<[1..9]>/, @a)».match(/<[1..9]>/).bag) }

# Print the actual vs. expected percentage for each leading digit.
sub dump (%distribution, $base = 10) {
   printf "%9s %9s  %s\n", <Actual Expected Deviation>;
   for 1 .. 9 -> $digit {
       # Share of all tallied numbers that start with $digit.
       my $actual = %distribution{$digit} * 100 / [+] %distribution.values;
       # Benford's law: log(1 + 1/d) in the given base.
       my $expected = (1 + 1 / $digit).log($base) * 100;
       printf "%d: %5.2f%% | %5.2f%% | %.2f%%\n",
         $digit, $actual, $expected, abs($expected - $actual);
   }
}

say "First 1000 Fibonaccis";

( 1, 1, 2, *+* ... *)[^1000].&benford.&dump;</lang>

Output

First 1000 Fibonaccis
   Actual  Expected  Deviation
1: 30.10% | 30.10% | 0.00%
2: 17.70% | 17.61% | 0.09%
3: 12.50% | 12.49% | 0.01%
4:  9.60% |  9.69% | 0.09%
5:  8.00% |  7.92% | 0.08%
6:  6.70% |  6.69% | 0.01%
7:  5.60% |  5.80% | 0.20%
8:  5.30% |  5.12% | 0.18%
9:  4.50% |  4.58% | 0.08%

Extra credit: Square kilometers of land under cultivation, by country / territory. Data set: the first column of the table on the Wikipedia page "Land use statistics by country".

   Actual  Expected  Deviation
1: 33.33% | 30.10% | 3.23%
2: 18.31% | 17.61% | 0.70%
3: 13.15% | 12.49% | 0.65%
4:  8.45% |  9.69% | 1.24%
5:  9.39% |  7.92% | 1.47%
6:  5.63% |  6.69% | 1.06%
7:  4.69% |  5.80% | 1.10%
8:  5.16% |  5.12% | 0.05%
9:  1.88% |  4.58% | 2.70%