
Thursday, November 15, 2018

Benford's Law and Data Science

Used it from the very beginning in enterprise data science. Well worth understanding, especially for spotting anomalies in finance or detecting research fraud. Even in finance, we found relatively few people who had heard of it or knew how to use it. Good, mostly non-technical overview.

What is Benford’s Law and why is it important for data science?

By Tirthajyoti Sarkar
Sr. Principal Engineer | Ph.D. in EE (U. of Illinois) | AI/ML certification (Stanford, MIT) | Data science author | Open-source contributor | AI in Simulations

We discuss a little-known gem for data analytics: Benford’s law, which describes the expected distribution of significant digits across a diverse set of naturally occurring datasets. We also discuss how it can be used for anomaly or fraud detection in scientific or technical publications.

Introduction
We all know about the Normal distribution and its ubiquity in all kinds of natural phenomena and observations. But there is another law of numbers that gets far less attention yet pops up everywhere: in nations’ populations, in stock market volumes, and even in the domain of universal physical constants.

It is called “Benford’s Law”. In this article, we will discuss what it is and why it is important for data science.

What is Benford’s law?

Benford’s Law, also known as the Law of First Digits or the Phenomenon of Significant Digits, is the finding that the first digits (or, to be exact, numerals) of numbers found in series of records from widely varied sources are not uniformly distributed. Instead, the digit “1” is the most frequent, followed by “2”, then “3”, and so on in successively decreasing frequency down to “9”. ...
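The excerpt describes the law only qualitatively. The standard quantitative form gives the expected share of leading digit d as P(d) = log10(1 + 1/d), which is about 30.1% for "1" and about 4.6% for "9". The sketch below is a minimal illustration, not the article's own method. It computes those expected shares, extracts leading digits from a list of numbers, and compares the two with a Pearson chi-square statistic. The chi-square check is just one common conformity test, and the 5% significance level is an assumption. All function and constant names are hypothetical.

```python
import math
from collections import Counter

# Expected share of each leading digit d under Benford's law: P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Assumed 5% significance level: chi-square critical value for 8 degrees of freedom.
CHI2_CRITICAL_8DOF_5PCT = 15.507


def leading_digit(x):
    """Return the first significant digit (1-9) of x, or None for 0, NaN or inf.

    repr() gives exact text for ints and the shortest round-trip text for
    floats, so the first nonzero digit character is the leading digit.
    """
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def _digit_counts(values):
    """Count leading digits, skipping values that have none (e.g. zeros)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a significant digit")
    return Counter(digits), len(digits)


def first_digit_frequencies(values):
    """Observed share of each leading digit 1-9, plus the number of usable values."""
    counts, n = _digit_counts(values)
    return {d: counts[d] / n for d in range(1, 10)}, n


def benford_chi_square(values):
    """Pearson chi-square statistic of observed leading digits vs. Benford (8 dof)."""
    counts, n = _digit_counts(values)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


if __name__ == "__main__":
    # Powers of 2 are a classic sequence whose leading digits follow Benford's law.
    powers = [2 ** k for k in range(1, 1001)]
    observed, n = first_digit_frequencies(powers)
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:.3f}  expected {BENFORD[d]:.3f}")
    print("powers of 2 chi-square:", round(benford_chi_square(powers), 2))

    # Leading digits spread uniformly over 1-9 should be flagged as non-Benford.
    uniform = list(range(1, 10)) * 100
    print("uniform chi-square:", round(benford_chi_square(uniform), 1))
```

In an anomaly-detection setting, a statistic above the critical value is treated as a flag for closer inspection, not as proof of fraud. Many legitimate datasets, such as those with fixed ranges or assigned numbers, do not follow Benford's law in the first place.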
