# Intro to Audit Analytics Lecture 6: benfords law

hello everyone for this week we will have two videos the first video is presented by myself covering burn control analysis and some other functions in a Theo an ideal the Signum part is taught by hussein issa who is currently an assistant professor at Rutgers Business School he will explain his research and duplicate payments so in this video I will start by discussing a very useful technique for auditors to do data analytics then third law after that I will show you how to do other function in a CEO and idea corporate law I will first discuss the history of it and what exactly is Benford’s law after that have you talked about what types of data that conform with Benford’s law and how it can be used in fraud investigation I will demonstrate this by using examples finally I will discuss the other uses of benders law and the quotients that we need to consider when using Binford’s laws the discovery of this law dates back to 1880 1 an American astronomer civil Nuka noticed that in love with and tables as early pages which contains numbers that started with 1 were much more one than the other pages this become the first known of the relation of social new club proposed the law that the probability of a single number n being the first digit of a number was equal to log M plus 1 minus log Evan for example if n equals 1 that it should be love to – not one and it’s approximately point three the phenomenon was again noticed in 1938 by a physicist named Frank burnford who tested the data from 20 different domains and was credit for it his dataset comes from various sources including surface areas of 300 and certain 5 rivers the sizes of 3215 9 u.s. populations 104 physical constants 4,000 entries from mathematical head books 308 numbers contained in an issue of Reader’s Digest the street addresses of the first 314 two persons listed in American men of science and 418 death rates the total number of observations used in the paper was twenty thousand two hundred and twenty-nine which is considered as extremely large as our time this discover was later named after Benford so we called comfort law in 1985 hill proved the results of mixer distribution turn first row was first used in counting in late 1980s here we have to mention one important person named mark knee grinning mock Nicolini was a pioneer of applying Benford’s law in auditing and forensic accounting in his groundbreaking 1991 Journal of Accountancy octal he introduced this useful tool into accounting world he showed the widespread applicability of Benford’s law and it’s practical usage to detect fraud arrows and other anomalies mark Nicolini has write many books regarding Benford’s law and its usage in accounting and auditing field one of his

latest book is called burnford law applications for forensic accounting auditing and fraud detection if you are interested in this topic and this technology you can find this book on Amazon so what exactly is Benford’s law Memphis law gives the probability of obtaining teaching won-soon I in each position of a number what is the position of a number well let me illustrate it with the simple example for example for three thousand eight hundred and seventy nine the number on the first digit is 3 on the signal digit it’s 8 on the third digit it’s seven and on the foster jet it’s not intuitively most people tend to assume that the probability of a number appearing on a digit is the same for different numbers for example for the first digit people incorrectly believed that the probability of number one appearing on the first digit is 1 over nine which is the same as the probability of number eight appearing on this period however in reality this is not true in many cases according to Benford’s law the probability of obtaining one in the first digit position is about certain point one percent which is look to – not one as I explained before so this table summarizes – expected the frequencies of different numbers appearing on different digits calculated using Binford’s law as you see here on each digit there is a decline probability from smaller numbers to larger numbers on the first digit the probability of number one is largest almost twice likely as number two the probability of number nine appearing on this digit the first digit is very small it’s about like it’s nestled on 5% on second digit the difference of probabilities between different numbers is smaller than the first digit and this pattern applies to the third and the fourth digit in practice you don’t need to calculate this yourself although it’s very simple but you need to have this idea in mind that’s the probability of different numbers appearing on a digit is not the same in many cases so you may ask why there is such a seemingly strange pattern there is actually a clear logic behind this let’s consider only numbers on the first digit if the data entry began with the digit 1 it has to double in size which is 100 before it begins with didn’t – for example from 100 into 200 or from 1000 to 2000 however if the data entry begins with the digital nine it only has to be increased by 11 percent in order for the first digit to become one this is like no increase from 900 to 1000 you may realize that this law is not applicable in all circumstances this is why I keep saying Binford’s law is useful in many cases but not all for example if the number is generated randomly then this law does not apply in fact when firts law is only useful for certain type of data so this table summarizes the types of data that can form when first all with examples

for each time the first type is the set of numbers that come from mathematical combination of numbers results come from two distributions too obvious examples accounts receivable and accounts payable because the are both products of two variables that come from two distributions auditors can use Benford’s law to test whether accounts receivable or payable is reasonable the other type of data that follows temperance law is transaction level data however it is important to note that there is no need to sample will apply Benford’s law you can simply check on the rotator such as disbursement sales expenses to see if there is any violation of Panthers law the third type of data that follows Benford’s law is large data set the more observations the better meaning the more reliable the result this could be full years transactions finally accounts appeared to conform when the mean of a set of numbers is greater than the median and the skewness is positive this result comes from mathematical manipulations and we don’t need to consider the detail of it but actually most of the common numbers fall into the style so suitable for Memphis law analysis now let’s discuss on what situations the data does not conform Benford’s law first if the data set is compressed of assigned numbers for example zip codes are fine once they occur naturally so it does not conform then third law the same applies to check numbers or invoice numbers also numbers that are influenced by human souls are not applicable for instances many prices are set at psychological trash showed such as the one point ninety nine dollars or ATM with your amounts a third type of data that do not conform a perfect law comes with a large number of firm specific numbers such as an account that has specifically set up to recall 100 reforms ethan account has building minimal or maximal and will also reduce the usefulness of Benford’s law like the set of assets that master meet a threshold to be recorded and lastly with no transaction is recorded Benford’s law is not applicable so to summarize accounting data that can use Benford’s law include accounts payable data account receivable data estimation in general ledger relative size of inventory unit prices among locations new combinations of setting prices customer refunds and duplicate payments now I’m going to give you some examples of the use of penton’s law in fraud detection the underlying reason that Benford’s law could be useful is because he invented or altered number are not likely to follow Benford’s law as random numbers do in a sense human choices are not random although we do not realize it ourselves Benford’s law however can easily detect such manipulation one famous example was in 1993 which is a case between state of Arizona and Wayne James Nielson the accused was found guilty of trying to defraud the state of nearly 2 million Nielsen a manager in

the office of Arizona State Treasury argued that he had diverted forms to a bogus vendor to demonstrate the absence of safeguards in a new computer system 23 checks were manually created these are carefully designed so that in the numbers appear to be random there is no duplicate payments all the values include dollars and the sentiment however because Pam law is quite counterintuitive people do not naturally assume that some digits occur more frequently so subconsciously the manager repeated some digits and digit combinations among the first two digits of the invented amounts 87 88 93 and 96 were all used twice for the last two digits 16 second seven and eighty-three were two placated note that seven through nine were the most frequently used the digits in contrast to Memphis law even though mr. Newsom believes that his choice is random but Benford’s law shows that this is not random but in were invented now let me show you how to conduct Benford’s law or nurses using a sale remember last week we have imported accounts receivable and accounts payable data in today’s lecture I’m going to use account payable data so let’s go to the table first and from this table you can notice on the mount field and this is a variable that we are going to use for analysis so if we choose analysis perform Binford analysis and it’s asking you to select and which field you want to do Benford law analysis and here it will list all the numeric variables however if you remember we just discussed what kind of data that does not conform Stanford’s law check number is one of them because it is pre-assign so here we can only choose amount and you can also choose a number of Laden digits and for now we only consider the first digit we also select to include upper and lower bounds according to Benford’s law any value outside of this range in suspicious you can also set conditions if you want but now we are going to test the whole population for the output again we show it on the screen so now let’s click OK we see the results you can see the leading digits the actual count the expected account these statistics ratio lower bound and the lower bound if we have stored you the statistics you should be familiar with this value this statistic ratio as well as T value and F value if the X account is smaller than the lower bound or larger than the upper bound we see these cases violating Benford’s law in this particular case we see number one has 188 counts which is smaller than lower bound for number two that actual count 115 four is between over bound and upper bound for number three it also falls into reasonable range so does number four for number five it is a little larger than the upper bound number six has ninety grounds which is also above upper bound for number seven it is much larger than the upper bound for number eight 18 inside the range for number nine it is above the upper bound while it is

difficult to see from the table format remember that we have covered how powerful visualization could be ECL automatically generate graph for this data so simply go here graph so here you can see the actual value the expected value which is in the middle the upper bound and the lower bound you can see that at the actual value for number one it is most remote from range order expectation and other of weird deviation is number seven where it is much larger than the upper bound for other numbers they seem to comply with the law or debate owning a little bit then you may want to know the results when using the first two digits so we go back to a call to account payable and the noise perform Banford analysis we change a number of leading digits to Q and also include upper and lower bounds it okay and now you see many numbers since this our first two digits there are 19 noise in this case it is convenient for us to see the trend using graphs as it is shown here you can see that while most all numbers are acceptable but for number I think it’s 79 it has significantly larger frequency also it is true for number 75 and 76 in this case we need to extract this three part of record to do further analysis so what we can do is to go back to the table and find and record sorry and find the records starting with 79 click it and we can see all of the records with a month starting with 79 so if we want to extract this to a new table so you can simply go to data extract data and here you can choose to either to extract over the fields or or only in fact a subset of the fields and set the condition of the extraction and the names s for example we see that we could name it as a count a account payable 79 and okay then we can see this new table and here we only extract the mult field so we deleted the clips and the way it’s trapped again the data is checked data we extract all field ok yes now we extract all the fields of this wreck of the records that satisfy this condition now let’s go to idea to see how to do the same thing also we first come to the account payable table and then we choose a analysis Benford’s law and here again you can choose a field to analyze and we choose a mount and the in idea you can choose to include positive values or negative values because you can see here all the modes are in positive values so which was positive and again we choose to show boundaries

and mean absolute deviation here you can see that there are many tests that you can choose like you can work on first digit first the two digits for sweeteners or signal digit lost two digits and so forth and it will automatically create names for these databases and here if you go to advanced you can choose the minimal records in input database here because we only have 999 data so which was 999 and choose ok ok now you can see several results here if you would want to switch between them simply click this one or this one or this one and you can switch between different results of of different tasks or if you go back to the account payable and you can also choose bamford law to see the result when you click the result on the second digit like what I did here you will see that number two is significantly lower while number nine is significantly larger than the expected value then let’s go back to the results of the first two digits if you want to extract data you can click the bar which is highlighted here you see highly suspicious suspicious and the police one remember we have discussed before is 79 so if you want to extract this record so you can click it and choose to extract suspicious records display suspicious records so you can choose to extract the records and then if you give it a name it will create a new table here and you can also choose to display these records first and the choose to save it and this will also creates a table a new table that satisfies this condition now suppose we extract it and name it 79 okay and all the records are listed in the new table now when you look closely to this record you will see that many adjusted bar seventy-nine thousand and among these records there are only three authorizers F M V PC and w JT so one possible explanation is that there are among limitations that they can authorize for each transaction in order to approve transactions larger than that amount they have to spilt these payments usually they split them just below the limitation that could be the reason why there are so many records that is between seventy nine thousand and eighty solvent but notice that this is only one explanation and it could be misleading for example if in total there are only three of four authorizers then it is absolutely normal that a number of distinct also risers for these transactions in stream so you need also to check number of distinct authorized persons first this is a type of data analysis that auditors need to do you know to generate a hypothesis but use other ways to look at the data to exclude possible explanations and the only thing left is lectures now you have seen how to use Benford load to detect suspicious transactions there are also other uses of Benford’s law for example it can be used to solve here to solving the problem it can be used on the

allocate computer disk space it can detect it irregularities in clinical trials it can be used for demographic models of population statistics or add hazard many other users if you are interested in this topic you can search online and you can find many interesting articles so when you are using Binford’s law you should always be cautious first what you identified these anomalies was not necessarily fraud this causes a problem of false positive in addition or the perm firts law is useful in detecting fraud not all fraudulent transactions can be detected using Binford’s law it’s only one tool among various tools that can be used for this purpose now I’m going to discuss some other basic functions in a CL and idea that we haven’t covered in the last week’s lecture I will show you how to create expressions add virtual field create future instruct data and join tables finally I will show you how to do logs in a sale and idea not suppose for account payable data we may want to know how many days it takes to pay the invoice so we need to create a new field in ACL when you go back to account payable table you can find two days one is the invoice date and other one is a pay date the payment date is the difference between pay date and the inverse state we can analyze payment date to see for example whether there is very late payment or very quick payment so what we can do is to go to add table layout here add new expressions on the first we named new field as pay days and then we can define the value of it so we go to this expression editor so if you see the function button you can see many different functions that you can use the function now we want to choose is under paid paid category and it’s called age so we added here and what we can do is to put the invoice date the early date first and comma and put the pay date later this helps to calculate the difference between dates notice that I simply click this available fields to add vein to the expression rather than typing now we verify and it’s valid so we click OK now we choose to accept entry and it will be show here in the table layout so now if you go back to your table choose one Colin for example so we just name put it to one corner here we can choose to add columns and then we can choose the new field that we created named pay date and select okay and as you can see here is a new date the pay that is included in this table or you may feel it’s a little bit cumbersome so you can do an other thing we still go to table layouts and new expressions here we see we create a new suit named pay days and for the function we can simply use the pay date – in what state verify it’s still valid and click OK what’s at the entry and you can still see it here and you obtain the same

result and if you add it here it will say show the same thing now you want to sort this variable to see extremes values for example if you choose create sorting ascending we will find some negative days which is quite unusual indicating that the pay date is actually before the invoice date so you are paid you are essentially paying something that didn’t have invoice yet now if you want to extract these transactions for further analysis so what you can do is to treat the future so here you can choose added new filter and you see the same expression box so here we choose paydays and it should be smaller than zero so we extract those transactions that satisfy this condition we verify it it’s valid so we click OK and now it will show the records that have negative paydays again you can choose data extract the data and the thigh enter name example negative hey days and we extract all the fields ok ok yes we maintain exchanges and now you can see this new table negative paydays now let’s go to idea to see how to do this so again we go to account payable go back to data and in idea you should choose data append and firstly you assign it a name and choose a type of it so name which uses pay days and here you can choose a field type here we set it as default virtual numeric then you need to click the pyramid button and it will lead you to an equation editor which is similar to the expression editor in a CL here we can also choose functions that can be used here so again we go to date go to this one named age and on the right side there are usage of it as you can see here so for the first stage you put the early date so it’s in both states and comma and then you put the later date which is a pay date there sorry I put a check node which is painted there now we try to validate it’s valid so we accept and exit well one thing I want to mention is a field type you can notice here that you can choose either virtual numeric or numeric so the difference is that if you create a virtual memory then you can delete that field after after you imported it into the table however if you create a numeric and the essentially means that you cannot delete it which is the same as a regional file so for now we just create a virtual numeric so click OK and you see that this variable is created but it seems that we made a mistake now we actually use the earlier date – older date so what we want to do is to first remove this field ok and then create a new field hey thanks expression now we can simply

now again we choose the date and time which is age and paid eight dogs in what stage now it’s an added ok now we see the result now we can double click it to sort the value and you can see there are still negative paydays so if we want to set criteria here actually you can only go to right side there is a criteria field and then it will be due to a creation editor and now what you want to do is to set paydays smaller than zero except entry and now we get this new table again if you want to see this you can go to Home button see whereas in immense negative pay days okay and a new file is generated a new table is generated now we have finished the analyzing payment days now let’s talk about detecting an authorized suppliers remember that we have imported two data set for count payable scenarios one account payable data and one supplier address data now what we want to look at is to join the two tables first to see if all the payees in the account payable data are also why the suppliers in the supplier address data one way to do it is to use nacl is to use data join table so let’s first go to account payable data join table and because currently and opening account payable so the primary table is a conch payable table before we join two tables we need to know what variable we can use for joining so let’s see in account payable we have supplier number payee in wars’ paid days in worst date amount check pay days and also write the person while in supply address we had supplier number supplier name address and zip code so we can see we can only use supplier number for the join before we join these two tables we need to make sure that these two variables have the same type of lens so for checking lens we still go to table layout and you can see in account payable beta for the supplier number it’s character and the less info and for supplier address you can see it’s character but the less is 10 so when we look at this field actually we realize that four digits are enough to represent it so we create a new field that contain that contains supplier number with length of 4 so here we go to table layout and new expression we call it supplier new number new and then we go to expression now the function we want to use is on the string category and it’s called substring we need to subscreen we need to trim the variable so we set all three and here it should be some prior number and because we only need four digits so we start with the first digit and the lens is full and it’s