MATH 1127 Section 2.2

All right, let’s talk about organizing quantitative data. Again, quantitative means we’re dealing with numbers. We’re first going to discuss organizing discrete data in tables, and remember that discrete data is something that’s countable. If the number of observations is relatively small, quantitative data can be treated the same way as qualitative data, so this is where we would actually go through our data and start to tally.

This example is the number of people arriving during 15-minute intervals at lunchtime at Wendy’s. They randomly selected 40 15-minute intervals during lunch and recorded how many people came through the drive-through or came into the restaurant. So literally they’re just going to go through and tally up how many customers came during each of those 15-minute intervals. There was only one 15-minute slot with one customer, and they recorded that here. The number of 15-minute slots that had two customers arrive is six: one, two, three, four, five, and six, as tallied here. So the process is to go through all of the raw data, organize it into this table by tallying up how many times each value occurred, and then write that out as the frequency and then the relative frequency. The relative frequency is the number of occurrences of one particular value, say the number of times one customer came through the window, out of all 40 observations. The relative frequency is very closely related to the probability. Whenever we come up with relative frequencies, when we add them all up they will add up to 1. You’re never going to have a negative relative frequency, so each one has to be between 0 and 1.

All right, so if we wanted to do an example with the number of siblings that everybody has, we would go through and figure out how many have zero siblings. We will assume that five people have zero siblings, maybe seven have one, maybe another seven have two, maybe nobody has three siblings, maybe one person has four, and we don’t have any with five, six, or seven. This is assuming that we have 20 students: five of them have no siblings, seven have one sibling, seven have two siblings, and one has four siblings. For the relative frequency, we take each frequency and divide by the total number of students, or observations, which we said was 20. So 5 over 20 gives us 0.25, 7 over 20 is 0.35, and again 7 out of 20 gives us 0.35. The relative frequency for three siblings is 0, 1 out of 20 gives us 0.05, and again it’s 0 for the rest.

All right, so let’s talk about a histogram. A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or the relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other. Keep this in mind: the width of each rectangle is the same and the rectangles touch each other. So it’s going to look like a bar graph, only it’s a little more specific. If we take that same data of the number of people arriving in the 15-minute intervals at Wendy’s and turn it into a histogram, we have 1 through 11 along the bottom, and because it’s a histogram the bars touch and are all the same width. We did not have 10 people arrive during any 15-minute interval, so we’ve left a space, the same width as the rest of the bars, where the 10 would be. I want to point out that over here we have the frequency, so the actual number of times each value was observed, and on the right we have the relative frequency, and it’s the exact same shape. Instead of having the actual number of times something occurred, we can put the relative frequencies on the same histogram. It’s going to be exactly the same shape; it’s just a different scale.

All right, so we’re going to create a histogram with the class sibling data. Again we had 5, 7, 7, 0, 1, and then all the rest were zeros, and we had 0.25, 0.35, 0.35, 0, 0.05, and again zeros for the rest. I’m going to try to make this as attractive as possible. We start with our categories: zero siblings, one sibling, two siblings, three siblings, and four siblings. We’re not going to continue on with five through seven, since there’s zero for all of those. Let’s go ahead and form our rectangles. For zero siblings we had five, so let’s say that five is right around here. For one sibling we have seven, and for two siblings we have seven again, so I’m going to just copy that bar. For three siblings we had zero, so we leave a space here for the three, and then for four siblings we had one. Once we’ve got that all put together, we know this was five, this was seven, and down here is one, and we can label the side as the frequency. Right now we have a histogram with the frequencies, but if we wanted to change this just a little bit we could take it from frequency to relative frequency. If, instead of these numbers in blue, we had put 0.05, 0.25, and 0.35 here instead of the 1, 5, and 7, then we’ve gone from a frequency histogram to a relative frequency histogram.

All right, so let’s talk about continuous data. This is data that is not countable, so it’s going to make more sense to put things in ranges instead of trying to tally up each individual value. Categories, or intervals of numbers, are created for continuous raw data, and the categories are called classes. So again, the classes are just how we’re going to separate the data. Your classes need to be the same width, meaning they need to have the same difference from start to finish for each one. For instance, here, going from 25 to 34, 35 to 44, 45 to 54, each of these classes contains 10 years. To determine the width of a class, you take the beginning of one class and subtract the beginning of the class just before it. So 35 minus 25 is 10, and we know that the class width is 10. I also want to be specific here: where it says ages 25 through 34, it’s understood that this goes up to 34.9999 and so on, and that 35.0 begins the next class.
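
Going back to the sibling example from earlier, the frequency and relative frequency columns can be checked with a minimal Python sketch. The raw list `siblings` is made up to match the counts we assumed for the class (5, 7, 7, 0, 1), and the name `frequency_table` is only for illustration; this is one way to do the tally, not the only way.

```python
from collections import Counter

# Hypothetical raw data matching the class counts assumed in the lecture:
# five students with 0 siblings, seven with 1, seven with 2, one with 4.
siblings = [0] * 5 + [1] * 7 + [2] * 7 + [4] * 1


def frequency_table(data):
    """Return (value, frequency, relative frequency) for each whole-number value."""
    counts = Counter(data)
    n = len(data)
    # Walk every value from min to max so that a value nobody reported
    # (3 siblings here) still gets a row with frequency 0.
    return [(v, counts[v], counts[v] / n) for v in range(min(data), max(data) + 1)]


table = frequency_table(siblings)
for value, freq, rel in table:
    # The row of '#' marks is a sideways histogram bar for that value.
    print(f"{value:>2} {freq:>3} {rel:>5.2f}  {'#' * freq}")
```

The bars have the same shape whether they are scaled by frequency or by relative frequency, and the relative frequency column adds up to 1.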

So this is 25 up to just before 35, this is 35 up to just before 45, and so on and so forth. This first table is showing us U.S. residents ages 25 to 74 who have earned a bachelor’s degree, and over here this table is showing us U.S. residents on death row as of December 2006. Again we’ve got the ages split out by class, and the frequency is on the right, that is, how many people fall into each of those classes.

All right, for our next example we’ve got the three-year rate of return of mutual funds. This is going through different mutual funds and recording their rate of return. Because of all of these decimals, it makes more sense to have classes than it does to try to count each individual number. If we were treating this like discrete data (and again, discrete means countable), then instead of having, for instance, 10 through 11.99, we’d actually have to have a category of 10.99, a category of 11.32, and so on and so forth. Instead we’re going to have a range for each class, so our classes run from 10 to 11.99, 12 to 13.99, and so on. If we want to figure out the width of our class, we do 12 minus 10 and get 2, so we know that each class has a width of 2. The book has already been nice enough to do the hard work of tallying for us. Literally we just go through each piece of data and put a tally mark in the proper class, and then we count those all up. There were 40 different pieces of data, so we take whatever our count was, divide by 40, and that gives us our relative frequency.

So let’s actually work through a very similar example: dividend yield. A dividend is a payment from a publicly traded company to its shareholders. The dividend yield of a stock is determined by dividing the annual dividend of a stock by its price. The following data represent the dividend yield, in percent, of a random sample of 28 publicly traded stocks of companies with a value of at least five billion dollars. With the first class having a lower limit of 0 and a class width of 0.4, we need to construct a frequency distribution and then construct a relative frequency distribution.

Okay, so let’s set our classes up first. It’s telling us that the first class has a lower limit of 0 and that the class width is 0.4. Again, remember that to find the class width we take the lower limit of one class and subtract the lower limit of the class before it. So I’m just going to put the lower limits of each of the classes down first, counting by 0.4: we’re going to have 0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8, and 3.2. Now I’m going to do the upper limits. We could have done the lower and upper limit of each class one at a time, but I find that having the lower limit of the next class written down makes it a little easier. So here I’m going to have 0 through just before this 0.40, so through 0.39. Then 0.40 through just before this 0.80, so 0.79. Then 0.80 through 1.19, which is just before 1.2; 1.20 through 1.59; 1.60 through 1.99; 2.00 through 2.39; 2.40 through 2.79; 2.80 through 3.19; and this 3.20 is going to be through 3.59.

All right, so now we have all of our classes set up, and we can just go through and start tallying our data. Let me see if I can pull it down just a little bit so we get it all on the same screen. As we tally, I’m going to mark through each value. So the 1.7 falls here, then 0, 1.15, 0.62, 1.06, 2.45, 2.38, 2.83, 2.16, 1.05, 1.22, 1.68, 0.89, and 0 (let me make sure I mark it out so I don’t count it twice). Now we’re to the 2.59, then 0 again, 1.7, 0.64, 0.67, 2.07, 0.94, 2.04, 0, 0, 1.35, 0, 0, and 0.41.

Okay, so now let’s actually write out the frequencies. We’ve got seven for the 0 to 0.39 class, four for the next, five for the next, and then two, three, four, two, and one. It turns out we don’t actually need this bottom class, so I’m just going to mark that one out. We’re not going to use it, because it’s at the end of the classes and no data fall in it.

Now that we’ve got all of this worked out, we can do our relative frequencies. To figure out how many observations we had, we can do a quick multiplication: there are seven columns and four rows, which means that we have 28 observations. So let’s do that work right quick. We have 7 over 28, which is 0.25; 4 out of 28, which is 0.1429; 5 out of 28, which is 0.1786; 2 out of 28, which is 0.0714; 3 out of 28, which is 0.1071; 4 out of 28, which is 0.1429 again; 2 out of 28, which is still 0.0714; and 1 out of 28, which is 0.0357.

All right, so our frequency distribution is going to be the classes and the frequency column, and the relative frequency distribution is going to be the classes and the relative frequency column. All right, let’s continue talking about our histograms.
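
The dividend yield table we just built by hand can be double-checked with a short Python sketch. The 28 values are the ones read out during the tally, with the seventh value taken as 2.38 because that is what the tally of four in the 2.00 to 2.39 class requires. The function name `frequency_distribution` and its parameters are assumptions for illustration, and the sketch assumes the data are given to two decimal places.

```python
# Dividend yields (percent) as read out in the lecture: 7 columns by 4 rows.
yields = [
    1.70, 0.00, 1.15, 0.62, 1.06, 2.45, 2.38,
    2.83, 2.16, 1.05, 1.22, 1.68, 0.89, 0.00,
    2.59, 0.00, 1.70, 0.64, 0.67, 2.07, 0.94,
    2.04, 0.00, 0.00, 1.35, 0.00, 0.00, 0.41,
]


def frequency_distribution(values, first_lower=0.0, width=0.4):
    """Return (class label, frequency, relative frequency) rows."""
    # Work in whole hundredths so a value sitting exactly on a class
    # boundary cannot slip into the wrong class through float rounding.
    lower0 = round(first_lower * 100)
    w = round(width * 100)
    hundredths = [round(v * 100) for v in values]

    # Only create classes up to the one holding the largest value, which
    # is why the unused 3.20 to 3.59 class does not appear.
    n_classes = (max(hundredths) - lower0) // w + 1
    freqs = [0] * n_classes
    for h in hundredths:
        freqs[(h - lower0) // w] += 1

    rows = []
    for k, f in enumerate(freqs):
        lo = (lower0 + k * w) / 100
        hi = (lower0 + (k + 1) * w - 1) / 100  # just before the next lower limit
        rows.append((f"{lo:.2f}-{hi:.2f}", f, round(f / len(values), 4)))
    return rows


rows = frequency_distribution(yields)
for label, freq, rel in rows:
    print(f"{label}  {freq:>2}  {rel:.4f}")
```

The frequencies come out as 7, 4, 5, 2, 3, 4, 2, 1, which matches the hand tally, and they add up to the 28 observations.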

Again, histograms need to have columns of the same width, and those rectangles need to be touching each other. Here is the exact same information that we had from just a minute ago, so I’m going to pull over the table that we just did so we can make a histogram out of our data. As we look, we’ve got eight different classes, so we need to put those eight classes along the bottom. Once we have all the classes there, we also want to label what would be the y-axis. First we’ll do the frequency histogram, so we’re concerned with the numbers one through seven. Let’s just label by twos, so that will be 2, 4, 6, and the very top will be 8. Now we need to actually draw our rectangles. For our first class we have seven, which will be right around there. For our next class we had four, for our third class we have five, then two for the 1.20 to 1.59 class, three for our next class, and then four, two, and one. So here we have our frequency histogram.

If we want to go ahead and work on our relative frequency histogram, we can actually just copy all this and make a few edits. Instead of having the numbers 2, 4, 6, and 8 over on the side, we’re going to erase those and put in the relative frequencies. Instead of the 2, that would be 0.0714, and instead of the 4, that would be 0.1429. We didn’t actually have a class with a frequency of 6, but we’ll go ahead and put what 6 divided by 28 is, which is 0.2143. We can do the same for the 8, which is the same idea: 8 divided by 28 gives 0.2857. So we change the side to show the relative frequency, and we need to add the word “relative” up here, so this is a relative frequency histogram now.

All right, let’s switch gears a little bit and start talking about stem-and-leaf plots. Let’s look at this first before we start talking about constructing one. In a stem-and-leaf plot, the stem can be one or more digits. So these are the stems, and in this two-sided one the stems are in the middle. The leaf is always one digit. Let’s talk about it a little more specifically. The stem of the graph will consist of the digits to the left of the rightmost digit, and the leaf of the graph will be the rightmost digit. Sometimes it’s necessary to modify the method of choosing the stem if a different class width is desired. Our next step is to write the stems in a vertical column in increasing order and draw a vertical line to the right of the stems.
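
The stem and leaf rule just described (the leaf is the rightmost digit, the stem is everything to its left) can be sketched in a couple of lines of Python. The function name `split_stem_leaf` and the sample values are made up for illustration, and the sketch only handles non-negative whole numbers.

```python
def split_stem_leaf(value):
    """Split a non-negative whole number into (stem, leaf)."""
    # Dividing by 10 peels off the rightmost digit as the leaf and
    # leaves the remaining digits, possibly more than one, as the stem.
    return divmod(value, 10)


# Hypothetical values: a one-digit number gets a stem of 0, and a
# three-digit number gets a two-digit stem.
for v in (7, 42, 123):
    stem, leaf = split_stem_leaf(v)
    print(f"{v}: {stem} | {leaf}")

# The distinct stems, sorted, are what get written in the vertical column.
stems = sorted({split_stem_leaf(v)[0] for v in (42, 57, 61, 43)})
print(stems)
```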

So we write the stems down, draw a vertical line, and then put in the leaves. Again, we write each leaf corresponding to its stem to the right of the vertical line, and we need to write these in ascending order, so smallest to largest. Also, with your stem-and-leaf plot you’re going to have a legend. Here the legend says that 1 | 0 represents 10, so this row represents 10, and there’s also 11 and 14. The stem is the tens place and the leaf is the ones place. So essentially here we have 10, 11, 14; then 21, 24, 24, 27, 29; then 33, 35, 35, 35, 37, 37, 38; then for the fours 40, 40, 41, 42, 46, 46, 48, 49; then 55, 58; and 61 and 62. Going through that, you can see how the data is distributed really quickly, because we have this nice visual here. We know there’s more data towards the middle, in the 30s and 40s, so those are the categories with the highest frequency.

All right, and again, your stem does not necessarily have to be one digit, and 12 | 3 does not necessarily have to represent 123. This is why we have our legend: it’s telling us that 12 | 3 represents 12.3. So let’s go through and get all of our original data from this one. The first row is 12.3, and then 12.7, 12.9, and 12.9 again. Now on to our 13 stem: 13.0, 13.4, 13.5, 13.7, 13.8, 13.9, and 13.9 again, and so on and so forth. We would continue this through all of the data, so 14.2, 14.4, 14.4, 14.7, 14.7, 14.8, 14.9, and we would also have 15.1, 15.2, 15.2, 15.5, 15.6, 16.0, and 16.3.

Now let’s talk about looking at back-to-back sets of data. A stem-and-leaf plot is going to be useful here too. Here we’re able to compare the fat in fast-food sandwiches between McDonald’s and Burger King. The stem here still represents the first digit, and each digit in the leaves represents the next. The legend is telling us that 8 | 0 | 7 represents 08 and 07. This means that under McDonald’s we have 08 and 09, also just known as 8 and 9. Then on the one stem we have 10, 12, 14, 16, 16, 17, 18, and 19. With our twos we have 21, 23, 23, 24, then 26, 26, 28, 28, 29. Now let’s go to 3. The 3 has nothing, so we don’t have anything here; if there were a 30 they would have put a 0. So we go next to our 42, and that’s going to be it for McDonald’s.

Now I can look over at Burger King, and this side reads in the direction that we’re used to reading. The 0 | 7 represents 07, or 7. Then 1 | 2 and 1 | 2 give us 12 and 12, the 1 | 3 gives us 13, and then 16, 16, 17. Now on to the twos: 21, 22, 29. For the thirties we have 30, 33, 39, 39. For the fours, 44 and 47, then 54 and 57, then 65 and 68. So here is all the original data that’s represented in the stem-and-leaf plot. We can see that Burger King has a wider distribution of fat grams in their sandwiches, whereas for McDonald’s the majority of their sandwiches are from 10 up to 30, so in the tens and twenties range.

All right, so let’s go through and actually create a stem-and-leaf plot. We’re going to talk about age at inauguration. The following data represent the ages of the presidents of the United States, from George Washington through Barack Obama, on their first days in office. I did want to make a note that President Cleveland is listed twice, because he’s historically counted as two different presidents, numbers 22 and 24, since his terms were not consecutive. So we’re going to take our data, which they have kindly already put in ascending order for us, and do our stem and leaf. The legend is telling us that 4 | 2 represents 42. We’ve got forties, fifties, and sixties, and that’s going to be it, so these are our stems. Then our leaves: for 42 we need a 2 for our leaf, for 43 we need a 3, for 46 we need a 6, and again another 46. Then 47 is listed twice, 48 once, and 49 is listed twice. Now we’re to the fifties. We have 50 two times, so there’s our 0 for 50 and another 0 for 50 again. Then 51, 51, 51, 51, so there are four 51s, then 52, 52, and then 54 five times. We’ve got four 55s, three 56s, four 57s, and one 58. Now we’re to our sixties: a 0 for 60, then 61, 61, 61, 62, 64, 64, 65, 68, and 69.

So this could be our stem-and-leaf plot, and I say “could be” because we have another option. We only have three stems, and maybe we want to split the data a little bit more. Right now our classes run 10 years; maybe we want to separate that and only have them run 5 years. Based on the way stem-and-leaf plots work, we can’t have different stems for, say, 40 to 44 and 45 to 49, but we can separate how we list the stems. So another option is to list each stem twice: for the first copy of the stem we list the leaves 0 through 4, and for the second copy we list 5 through 9. For the 4, we list 0 through 4 in the first part, so we’ve got the 2 and the 3, and then 5 through 9, so 6, 6, 7, 7, 8, 9, 9. We do the same thing and split our fifties into 50 through 54 and then 55 through 59. We’ve got 50, 50, then four 51s, two 52s, and five 54s. For 55 through 59 we’ve got four 55s, three 56s, four 57s, and a 58. Then for our sixties, again splitting into 60 to 64 and 65 to 69, we have a 60, three 61s, a 62, and two 64s, and then a 65, a 68, and a 69.

This splits the data out a little bit more for us, and we still have a fairly nice distribution where we can see that the majority of the presidents are inaugurated in their 50s. If we wanted to be a little more specific about it, we can see that most presidents are inaugurated in their late 40s to early 60s, but again, still mostly in their 50s. Either one of these two plots would be acceptable, and I do want you to realize that they can list the same stem more than once if they’re trying to have smaller classes.

All right, let’s quickly talk about the shape of the data. We’re going to talk about data being skewed left, skewed right, normal, or uniform. Whenever I say normal, I mean bell-shaped. To determine how data is skewed, you look for the little skinny part, or at least I look for the little skinny part, called the tail. When we have bell-shaped data, it’s going to have this very nice, pretty bell shape (pretend this is symmetric), and it’s going to be fairly small on the left and right, with the majority of the data in the middle.

Here, for our first one, the majority of our data occurs first and we have this little skinny part over to the right, so this is skewed right. For our next one we’ve got the skinny part to the left, so this is skewed left. Again, that skinny part is what we refer to as the tail. Here this data is actually fairly normal, so this is bell-shaped, or normally distributed. And here everything is exactly the same, so this is uniform.

Then these two have a little room for interpretation. If we were to kind of draw this one, you can see that there’s a little bit more of a tail over to the right, so we could say that this is skewed right, but other people might argue that this is normally distributed. The same idea is true over here: some might argue that this is normally distributed, and some might argue that it’s skewed left. There won’t be any that are that ambiguous on any of your tests.
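
To tie the stem-and-leaf construction back to shape, here is a minimal Python sketch that rebuilds both versions of the inauguration-age plot. The ages are transcribed from the leaves read out in the lecture. The function name `stem_and_leaf` and its `split` flag are assumptions for illustration, and the sketch only handles whole-number data.

```python
# Ages at inauguration, Washington through Obama, as read out in the lecture
# (Cleveland counted twice), already in ascending order.
ages = [
    42, 43, 46, 46, 47, 47, 48, 49, 49,
    50, 50, 51, 51, 51, 51, 52, 52, 54, 54, 54, 54, 54,
    55, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57, 58,
    60, 61, 61, 61, 62, 64, 64, 65, 68, 69,
]


def stem_and_leaf(data, split=False):
    """Return the plot as a list of text rows, one per stem (or half stem)."""
    # With split=True each stem is listed twice: leaves 0-4, then leaves 5-9.
    halves = (0, 1) if split else (0,)
    stems = range(min(data) // 10, max(data) // 10 + 1)
    # Create every row up front so a stem with no leaves still shows up.
    rows = {(s, h): [] for s in stems for h in halves}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        half = 1 if split and leaf >= 5 else 0
        rows[(stem, half)].append(str(leaf))
    return [f"{s} | {' '.join(leaves)}".rstrip() for (s, _h), leaves in rows.items()]


print("Legend: 4 | 2 represents 42")
for row in stem_and_leaf(ages):
    print(row)
print()
for row in stem_and_leaf(ages, split=True):
    print(row)
```

The row lengths play the role of histogram bars turned on their side, so the long rows in the fifties show the same middle-heavy shape described above.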