## Benford's law and the 2021 Madrid regional election

I originally wanted to title this post “Was the 2021 Madrid regional election fraudulent?”, but that seemed too much for a post about observation-based laws and R code.

In particular, this post focuses on Benford’s law. This law describes the phenomenon whereby, in many real-life collections of numbers, numbers whose leading digit is 1 appear more frequently than numbers beginning with any other digit. Some examples of such collections include:

1. the number of followers of Twitter users,
2. the number of books in US libraries,
3. the population of Spanish cities,
4. the street numbers of Brazilian addresses, and so on.

In all these collections, 1 is the most common leading digit, followed by 2, then 3, and so on, with 9 being the least common. The website testingbenfordslaw.com provides a visual check of these examples and many more.
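The law also says how often each leading digit should appear. In its usual base-10 form, the expected share of numbers whose leading digit is $d$ is:

$$
P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d \in \{1, 2, \dots, 9\}
$$

This gives roughly 30.1% for 1, 17.6% for 2 and 12.5% for 3, falling to about 4.6% for 9.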

The distribution of the first digits, according to Benford's law. Image from Wikipedia.

This law, which might not seem very relevant at first, has interesting applications in forensic accounting, auditing and fraud detection, as described in the book by M. J. Nigrini, Benford’s law. Among the frauds often analyzed with Benford’s law are tax fraud and electoral fraud. For example, a claim that circulated on social media after the 2020 US presidential election was that some of the vote counts for Joe Biden seemed suspicious because they did not follow Benford’s law. The tweet below is an example.

Tweet by @PetersonAmoriah

Last Tuesday (May 4th) there were regional elections in Madrid, and I wanted to check whether the results by municipality in the region follow Benford’s law. Six main political parties contested the 136 seats in the Madrid Assembly and the results were as follows:

• Partido Popular: 65 seats (1,620,213 votes)
• PSOE: 24 seats (610,190 votes)
• Vox: 13 seats (330,660 votes)
• Podemos: 10 seats (261,010 votes)

The results per municipality and per polling station can be found on the official election site. To make things easier, though, I have created two CSV files from this data, which can be found in my GitHub repository. To analyze the data using Benford’s law, I have also prepared a very simple R script that uses the benford.analysis package. The script, which is also in my repository, plots a histogram of the leading digits of the electoral results for each municipality of Madrid and compares it with the ideal curve according to Benford’s law. The results for each of the six main parties are shown below.
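The comparison itself is simple enough to sketch in base R. The snippet below is only a minimal illustration, not the script from the repository: it does not use benford.analysis, the `votes` vector is made-up data standing in for one party's per-municipality results, and the helper names `benford_expected`, `leading_digit` and `observed_freq` are hypothetical.

```r
# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
benford_expected <- function() {
  d <- 1:9
  setNames(log10(1 + 1 / d), d)
}

# Leading digit of each positive vote count (zeros and NAs are dropped)
leading_digit <- function(x) {
  x <- x[!is.na(x) & x > 0]
  # Converting to integer first avoids scientific notation such as "1e+05"
  as.integer(substr(as.character(as.integer(x)), 1, 1))
}

# Observed share of each leading digit, keeping digits that never appear
observed_freq <- function(x) {
  digits <- factor(leading_digit(x), levels = 1:9)
  prop.table(table(digits))
}

# Made-up vote counts, for illustration only (NOT real election data)
votes <- c(12, 135, 1780, 23, 9, 310, 1045, 87, 2, 164)

obs <- observed_freq(votes)

# Bars: observed shares; red line: Benford's expected shares
mids <- barplot(obs, ylim = c(0, 0.6),
                xlab = "Leading digit", ylab = "Proportion")
lines(mids, benford_expected(), type = "b", col = "red")
```

With real data, `votes` would be replaced by one party's column from the CSV file, and the same bars-versus-curve comparison repeated for each party.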