Our software system INVEP contains a large number of modules that check whether the data entered are valid. For accounting, things are not that simple. Any accountant is able to enter fictitious data. To detect this, every piece of paper would have to be compared with the data entered into the computer.
But there is a way to check the reliability of accounting data by running some tests. If enough data is available (many more than 100 accounting entries), an NBL check (Newcomb-Benford Law) can be made. Data from a real accounting system should follow this law. For more detailed information, take a look at the Wikipedia article on Benford’s Law.
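For reference, the law states that the leading digit d of a number occurs with probability

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9
```

This gives roughly 30.1% for the digit 1, 17.6% for 2, 12.5% for 3, 9.7% for 4, 7.9% for 5, 6.7% for 6, 5.8% for 7, 5.1% for 8 and 4.6% for 9.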
I have written a small Mathematica script that shows the results. The script can easily be adapted to your own needs to analyze any kind of numeric data.
Here are the steps to analyze data according to Benford’s Law:
First of all we need a file selection module to get the data into our system. There are several ways to select files. The following is only a suggestion. We are using some Java code. This may be changed:
Next we ask the user to select a file. The path is stored in impname:
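For readers without Mathematica, the file selection step could look roughly like this in Python. This is only a sketch and not the script described here: the function name choose_input_path and the command-line fallback are assumptions made for illustration, and the dialog comes from Python's standard tkinter module instead of Java.

```python
import sys


def choose_input_path(argv):
    """Return the path given on the command line, else ask via a file dialog."""
    if len(argv) > 1:
        return argv[1]
    # Imported lazily so the function can also be used without a display
    # as long as a path is passed on the command line.
    import tkinter
    from tkinter import filedialog

    root = tkinter.Tk()
    root.withdraw()  # hide the empty main window, show only the dialog
    path = filedialog.askopenfilename(title="Select accounting export")
    root.destroy()
    return path


# Typical use (commented out so nothing opens on import):
# impname = choose_input_path(sys.argv)
```

The result plays the same role as impname above: it is just the path of the file to analyze.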
Now we import the file. Because the result is a list within a list, we drop the outer list during the import:
Our imported file contains a header row. We don’t need it, so we can drop it:
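As a rough Python illustration of the import and the header removal: the file format is not stated here, so this sketch assumes a semicolon-separated text export in UTF-8. The function name load_rows and the delimiter are assumptions.

```python
import csv


def load_rows(path, delimiter=";"):
    """Read a delimited text export and return the rows without the header."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    return rows[1:]  # the first row is the header row, so skip it
```

Every remaining row is a list of strings, one string per column.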
The imported data contains two columns that we are interested in. Column 5 contains the debit values („Soll“ in German), column 6 contains the credit values („Haben“ in German). Because we are interested in the raw numbers only, we can join both lists. Additionally we drop all zero entries, because NBL only takes the leading digits 1 to 9 into account. We can drop the zeros by using the DeleteCases function:
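The same step, sketched in Python under a few assumptions: rows are lists of strings, the amounts use a decimal point, and empty cells count as zero. The name nonzero_amounts is made up for this example. Columns 5 and 6 become indices 4 and 5 because Python counts from zero.

```python
def nonzero_amounts(rows, debit_col=4, credit_col=5):
    """Join the debit and credit columns and drop all zero entries."""
    # An empty cell is treated as 0 (assumption about the export format).
    debit = [float(row[debit_col] or 0) for row in rows]
    credit = [float(row[credit_col] or 0) for row in rows]
    # Joining the two lists first and filtering afterwards mirrors the
    # join-then-delete order described above.
    return [value for value in debit + credit if value != 0]
```

For two rows with the entries 100.5 / 0 and 0 / 23 in these columns, the function returns the list [100.5, 23.0].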
Benford’s Law only looks at the first digit of each number (there are extensions that look at the second digit and so on, but for this example we only look at the first digit). We can do this by transforming every number in our list into its own list that contains all the digits of this number, and from every such list we take the first digit. The result is a list that contains only the first digit of every number from our original list:
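A possible Python counterpart, again only as a sketch: instead of building a digit list, it takes the text form of the number and strips leading zeros and the decimal point. The name first_digit is an assumption, and the function expects non-zero input, which holds here because the zeros were removed in the previous step.

```python
def first_digit(value):
    """Return the leading non-zero digit of a non-zero number."""
    text = repr(abs(float(value)))  # e.g. '1234.56', '0.042' or '1e-05'
    # Removing leading zeros and the decimal point leaves the first
    # significant digit at position 0.
    return int(text.lstrip("0.")[0])


amounts = [1234.56, 0.042, -987, 23.0]
digits = [first_digit(value) for value in amounts]
```

The sign is ignored, so a credit note of -987 counts towards the digit 9.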
To show the results as relative frequencies, we need the length of our list:
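In Python, the length is what turns the raw digit counts into shares that can be compared with the expected distribution. A minimal sketch, with observed_shares as an assumed name:

```python
from collections import Counter


def observed_shares(digits):
    """Return the share of each leading digit 1-9 in the list."""
    total = len(digits)  # the list length used for normalizing
    counts = Counter(digits)
    # Digits that never occur get a share of 0.
    return {digit: counts[digit] / total for digit in range(1, 10)}
```

The shares of all nine digits add up to 1 as long as the list contains only the digits 1 to 9.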
Now we are able to plot the result. We show the share of every digit found as a bar. This results in 9 bars for the digits 1-9. The bars correspond to the real accounting information. Additionally we plot a line. The line shows the results that we have to expect. Only if the line crosses the dot on top of a bar does the distribution of this digit conform to the expectation. If not, something may be wrong with our accounting information. For every digit, a relative difference of up to 20% seems to be quite normal for a real accounting system.
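The comparison behind the plot can also be done without a chart. The following Python sketch lists the digits whose observed share differs from the expected share by more than the 20% mentioned above. The names benford_expected and deviating_digits are assumptions, and shares is expected to be a mapping from each digit 1-9 to its observed share.

```python
import math


def benford_expected(digit):
    """Expected share of a leading digit according to Benford's Law."""
    return math.log10(1 + 1 / digit)


def deviating_digits(shares, tolerance=0.20):
    """Return the digits whose relative difference exceeds the tolerance."""
    flagged = []
    for digit in range(1, 10):
        expected = benford_expected(digit)
        relative_difference = abs(shares[digit] - expected) / expected
        if relative_difference > tolerance:
            flagged.append(digit)
    return flagged
```

An empty result means that every digit lies within the tolerance. A flagged digit is only a reason to look at the underlying documents, not a proof that something is wrong.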
The picture below shows some real accounting information. These data are not as expected. Now we have to dive into the paper documents to find out why there are such big differences for the digits 2, 3, 5 and 7. We may possibly find fraud within the accounting. In any case we need an explanation for such big differences.
The following picture is from another account. This one really looks as expected:
Try these small scripts and read the Wikipedia article. You may find data all over the world that corresponds to Benford’s Law. We only looked into accounting.
Have fun