BroadE: Statistical methods of data analysis

So you heard about MRM data from Hasmik, you heard about phosphoproteomics data from Philipp, and you heard about iTRAQ and related data from Namrata and Monica. I am trying to put all of that together from the perspective of data analysis, so I'll refer to what all of them have mentioned. And since you've been at it for a day and a half now, it's time for a quiz. The question is: from a data analysis perspective, which of these are fundamentally different? I'll wait for an answer even if it takes a while. The first choice is iTRAQ and SILAC; I presume people remember what they are, SILAC is metabolic labeling and iTRAQ is chemical labeling. Then there is label-free versus label-based, there is global versus targeted, there is phosphoproteomics versus affinity-based proteomics, and then all of the above or none of the above.

Yes, global and targeted; someone got it. This wasn't supposed to be facetious. I was talking to Karl yesterday and he said, you know, your presentation has two distinct parts and people are going to get lost in it, so I am trying to set it up so that won't happen. The main idea is that in a global proteomics experiment, which people have also called a discovery proteomics experiment, you don't know what you are looking for. You are interested in two different states, like cancer versus control, or two different cell types, or two different classes as I would call them from a statistical perspective. You are interested in discriminating two or more groups of cells or tissues or patients or whatever, and you want to see which proteins are different in those groups; to start with, you pretty much don't have an idea of what will be different. You are going on a fishing expedition: you cast your net wide and look for things that are different without an a priori idea of what you are looking for. That is discovery proteomics. You can use iTRAQ or SILAC to label your two classes of interest, combine them, and see which proteins are different, or you can use label-free, run a set of cancers and a set of controls, and then do a statistical analysis to see what is different. Most of the choices listed here fall into global.

Targeted is what Hasmik mentioned yesterday, where you do an MRM-MS analysis. You know the exact proteins or peptides you are interested in, and you set up assays to look at exactly those in your sets of samples. In a targeted mass spec experiment the important thing is to properly set up and qualify your assay and understand its characteristics: how good is it, what are its limits of detection, is the response linear as the concentration of the protein you are trying to measure increases in the sample? You want to characterize the assay first. In a global experiment you just run your sample, look at the list that comes out, and ask which things you think are different. I am oversimplifying a little, because you could take a targeted assay that you have properly configured, measure lots of patient samples, and then ask whether this protein that I thought was different really is different from a statistical perspective. That goes beyond configuration into the actual use of the targeted assay, but generally the major amount of time is spent configuring the assay, which is why I am doing a little bit of hand waving here.

One thing you might ask is this: a lot of tools have been created for analyzing genomics data, and mRNA expression analysis has been reasonably well established for the last 15 years or so, so why can't you just take those tools and apply them to proteomics? To some degree you can, but you have to be aware of the challenges in proteomics data. The data sources are the two classes I mentioned, but I want to emphasize that there is really a big difference in how you analyze them.

The two main differences I see from a proteomics challenge perspective are these. First, in proteomics, when you don't measure a peptide or a protein it does not mean it is absent, whereas in a gene expression or RNA-seq experiment, to a large degree, if you don't measure a gene you can assume it is either not present or present at very low abundance, so you can set its measurement to zero or some very low value. That is not okay in proteomics, because of what Karl and the others have mentioned about how the mass spectrometer samples the input material: it is quite possible that your protein or peptide is actually present at reasonable levels but you are not able to measure it, because of the stochastic nature of the mass spectrometer. So when you have a missing value, you can't replace it with a zero or a low number.

The second issue is throughput. You heard Philipp and others mention that processing a few samples takes a few months, and given that kind of throughput, especially if you want deep coverage and want to look into all the nooks and crannies of your proteome in a discovery experiment, you are going to spend a lot of time on each sample. That basically means you can't run thousands of samples in a proteomics experiment; most experiments I've seen use one to tens of samples. There are some cases, like reasonably high-throughput MRM assays configured to measure around 200 proteins, where you can deal with hundreds of samples, but usually it is very few samples, whereas in genomics even the TCGA study has hundreds if not thousands of samples for each cancer type. Because of that, the statistical machinery you employ has to be a little more carefully crafted; you can't rely on the standard large-sample asymptotic assumptions that let a lot of statisticians say everything is fine if we ignore a few factors.

In a global discovery experiment, what we generally do is run some MS analysis on the groups of interest. It could be case and control for cancer or some other disease; it could be a time series where you have a baseline and then you measure something at 10 minutes, 50 minutes, and one hour; or you could be interested in multiple disease states, a control and then two or three different transgenic mice or two or three different mutations. The point is that you have classes of interest and you run them on your mass spectrometer; how exactly you run them depends on whether you are using iTRAQ, SILAC, or label-free. At the end, after you process your data using software like Spectrum Mill or MaxQuant or the others you have heard about, you end up with a set of ratios. The ratios are relative measurements of the things you are interested in: the ratio between control and cancer for a peptide, or the ratio between 60 minutes and zero minutes in a time series. You have these ratios taken with respect to some reference, and you usually log2-transform them, because a raw ratio ranges between zero and infinity, which is not a symmetric range: things between zero and one are down-regulated and things between one and infinity are up-regulated, so the up-regulated range is significantly larger than the down-regulated range and it is hard to look at and work with the data. When you log-transform, the down-regulated range runs from minus infinity to zero and the up-regulated range from zero to infinity, so we usually look at log-transformed data. The other reason for using log-transformed data is that it makes the resulting data look more normally distributed, so it follows the bell curve a little better. Then we usually normalize in some way; I won't go into the details now because it is tangential.

Once you have your ratios, your goal is to determine which of your peptides or proteins, depending on the domain you are interested in, are up- or down-regulated, in other words which are higher in your cancers compared to your controls or vice versa, based on the two or more groups you are looking at. Finding things that are different is essentially, from a statistical perspective, a hypothesis testing exercise.
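To make the transformation step concrete, here is a minimal sketch in Python; the ratio values and the simple median-centering normalization are illustrative assumptions, not the exact normalization used in the talk.

```python
import numpy as np

# Illustrative peptide-level ratios (e.g., cancer / control); real values would
# come from the upstream processing software.
ratios = np.array([0.4, 0.9, 1.1, 2.5, 3.1, 0.2, 1.0, 8.0])

# log2 transform: down-regulation maps to negative values, up-regulation to
# positive values, so the two ranges become symmetric around zero.
log_ratios = np.log2(ratios)

# One simple normalization (an assumption here): center the distribution at zero,
# on the premise that most peptides do not change between the two states.
log_ratios = log_ratios - np.median(log_ratios)
```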

In a case-versus-control study, the question is: is the log of the ratio statistically different from zero? If the log ratio is zero, then your cases and controls have pretty much the same value of the protein you are looking at, so the question is whether you can set up a statistical test to see if it is different from zero. If you have multiple groups it is similar, except that the machinery you use is different: for multiple groups you use some sort of ANOVA or multiple-group comparison, whereas for two groups you use a standard t-test or one of those things many people are probably familiar with from high school mathematics and statistics. To recap, these are the kinds of workflows you would use; Philipp has gone into excruciating detail on this, so I won't belabor the point except to note that this is the area we'll be focusing on.

I mentioned that proteomics is a small-sample domain; you deal with one to ten samples and so forth. When you present that to an audience, especially people who are used to looking at large numbers of samples, the question they ask is: is it really meaningful to say you are going to find differential things when you are running one or two or three samples? For that, here is a back-of-the-envelope calculation. There are a lot of assumptions that go into it, but it gives you a flavor for what you can expect to find when you are dealing with very small sample numbers. This column is the coefficient of variation, or CV, of the assay; the mathematical definition of CV is the standard deviation divided by the mean, so it is a measure of the variability of your measurement. A CV of 0.5 means that on average your measurement varies by 50% for a given peptide or protein, and 0.25 means 25%. I picked these values because the technical variation of a label-free experiment is usually somewhere around 50 percent or more, whereas the technical variation of an MRM experiment is more like 10 to 25 percent.

Now suppose you are doing a statistical test and you fix your p-value at 0.05. If I have five samples, what is the minimum fold change I can detect with that p-value and that measurement error? The table says I can detect about a three-fold change: if my cases and controls have a peptide that differs on average by about three-fold, with variation bounded by 50%, then I can actually find the difference. How did I arrive at this? It is based on how the t-test works: if you think of the t-test as calculating two confidence intervals, one for the cases and one for the controls, then you don't want those confidence intervals to overlap, and that is the back-of-the-envelope calculation being done here. So about a three-fold change is what I can detect with five samples. But then what is the power? In other words, if there really were a three-fold change, how frequently would I actually find it? The power, unfortunately, is about 35%: if you actually had a three-fold change between your cancers and controls, you had only five samples, and this was your measurement error, there is only a 35% chance you would find it.

Yes, sample pairs. I have it as pairs here because you need a cancer and a control, so I am saying you have cancer-control pairs; in other words this would effectively be a 10-sample study, 5 cancers and 5 controls. They don't have to be exact pairs, but if they are not exact pairs this analysis changes a little bit; it is easier to do the analysis if you treat them as pairs, which is why I am using that terminology. Did I see another question somewhere? No? OK.

So now you might say, three-fold, fine, but maybe I am willing to go after only things that are five-fold different or higher and stop there. If you are willing to go up to five-fold, then you have a 50% chance of finding it if it exists. And as you go down the table, as your measurement accuracy increases, you can see that your power for detecting a five-fold change starts getting close to 90 or 95 percent, which is a very reasonable power; even highly touted clinical studies many times have power in the 70 to 90 percent range, so this is nothing laughable. But as you can see, you need more precise measurements for that.
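Here is a rough sketch of that kind of back-of-the-envelope power calculation in Python, using statsmodels. Treating the standard deviation of the log ratio as roughly equal to the assay CV is my own simplifying assumption, and the assumptions differ from those behind the slide, so these numbers will not reproduce the table exactly; the point is only to show how the calculation goes.

```python
import numpy as np
from statsmodels.stats.power import TTestPower

cv = 0.5           # assay coefficient of variation (label-free-like)
n_pairs = 5        # number of case/control sample pairs
alpha = 0.05       # p-value threshold
fold_change = 3.0  # fold change we hope to detect

# Rough approximation: SD of the natural-log ratio ~ CV
effect_size = np.log(fold_change) / cv

# Power of a one-sample (paired) t-test for this effect size
power = TTestPower().power(effect_size=effect_size, nobs=n_pairs, alpha=alpha)
print(f"approximate power to detect a {fold_change:.0f}-fold change: {power:.2f}")

# Or solve for the smallest detectable effect size at a target power of 80%
d_min = TTestPower().solve_power(nobs=n_pairs, alpha=alpha, power=0.8)
print(f"minimum detectable fold change at 80% power: ~{np.exp(d_min * cv):.1f}")
```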

The bottom line of this slide is that small discovery sample sets can actually be useful, if not for definitively stating what is different, at least for saying what your candidates are that you want to pursue. Once you get there, you can look at combining genomics and proteomics, or you can do network analysis, GO enrichment, gene-set or other equivalent enrichment analyses, and narrow in on the groups or peptides or proteins you want to focus on. The bottom line is that with small numbers you can find useful information. Any questions so far? OK.

So I said you can analyze small numbers of samples; let's start with one. When I say one sample I always mean a pair, because you can't take one cancer sample and try to find what is different in the cancer without having any reference or control. So one sample means one pair: you have a cancer and a control sample and you have run, say, a SILAC or whatever experiment to characterize the ratios of many proteins and peptides. Let's say we are looking at the peptide level. Maybe I can mention the distinction between proteins and peptides here: as Karl mentioned previously, you usually measure peptides in the mass spectrometer, so if you are interested in proteins you have to roll those peptides up to proteins, and how you roll up is a bit of a tricky issue, as Karl discussed yesterday with protein grouping and all that. For phosphoproteomics, for example, it doesn't even make sense to roll up to the protein level, because you are looking at the phosphosite and the peptide is what you are interested in. So when I say ratio here it is usually ratios of peptides, but if you have calculated ratios for proteins you can use those too; the analysis is agnostic to whether you are using proteins or peptides, so I'll interchange them arbitrarily. It really doesn't make much of a difference except that you have to do different kinds of pre-processing to get there.

So you have one sample pair and a set of ratios that you have calculated. The black dotted line is a smoothed histogram of those ratios: you can see the raw histogram in light gray, and the black dotted line is what you get if you apply some smoothing software to it. Now you can do what is called a mixture model analysis, where you say: I have some histogram, approximate it with the smallest number of normal distributions possible. The mixture modeling basically says, what I have is composed of two, three, four, five, however many normal distributions, find out what they are; the point is to estimate the component Gaussian (normal) distributions that make up your smoothed histogram. In this case, applying the algorithm gives you three components, the red, green, and blue. The green is centered around zero, so you can say that is the distribution of components that do not change. There is a small hump captured by the red distribution, which is a small set of down-regulated peptides, and then there is a much larger hump captured by the blue distribution, which is a set of up-regulated peptides. So now you know what your null distribution is, and you can say, I am going to draw a line where 95 percent of this distribution lies to one side; anything above that, my peptides are up-regulated. You identify your null distribution, calculate statistical significance using the properties of a normal distribution, and assign p-values to your ratios based on all the other ratios you have seen in this one sample pair. With just one sample you can essentially calculate z-scores, come up with p-values for your ratios, narrow in on the up-regulated or down-regulated things you are interested in, and then look at the lists, or do some sort of network or enrichment analysis, to see which ones you believe and which ones you don't. This next figure is that methodology applied to real data, where again you have three components, red, green, and blue; I won't go into the details because the mechanism is essentially the same.
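Here is a minimal sketch of that kind of mixture-model analysis for a single sample pair, using a Gaussian mixture from scikit-learn; the simulated log ratios and the choice of three components are assumptions for illustration, not the actual algorithm or data behind the slide.

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

# Simulated log2 ratios for one sample pair: an unchanged bulk plus a small
# down-regulated hump and a larger up-regulated hump.
rng = np.random.default_rng(0)
log_ratios = np.concatenate([
    rng.normal(0.0, 0.3, 5000),   # unchanged peptides
    rng.normal(-1.5, 0.4, 200),   # down-regulated
    rng.normal(2.0, 0.5, 600),    # up-regulated
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(log_ratios.reshape(-1, 1))
means = gmm.means_.ravel()
sds = np.sqrt(gmm.covariances_.ravel())

# Treat the component whose mean is closest to zero as the null (unchanged)
# distribution, then convert each ratio into a z-score and a one-sided p-value
# for up-regulation.
null = np.argmin(np.abs(means))
z = (log_ratios - means[null]) / sds[null]
p_up = stats.norm.sf(z)
```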

Now, if you have more than one sample, you can't really do this kind of analysis. Suppose you have two samples; these could be technical replicates or biological replicates, replicates that represent the same conditions you are looking at. Usually it is better to have biological replicates, because you have more variability in the biological setting, as Namrata pointed out, but if you don't, then at least get technical replicates so you have some measure of what your variation is. If you have two of these, you would have to analyze each sample separately, get two different p-values for each peptide or protein, and then combine the p-values; with three replicates you would have to combine three p-values. Combining p-values is not really recommended. You can do it if you have no other option, but when there is an option that looks at everything together and comes up with a single p-value, that is much better, and that is what the t and F tests do. We'll look at the t and F tests first, and then there is a variant of those tests called moderated tests; the moderated tests were designed specifically to deal with small numbers of samples, so they have some nice properties compared to the standard t and F tests we learn in high school.

What does the t-test do? It takes your data, calculates a number called the test statistic, and if that test statistic is greater than some value based on the t distribution, you say the null hypothesis can be rejected, which means there is a real difference between your cancer and control or whatever classes you are interested in. It is essentially taking the number you calculate, the test statistic, and comparing it to the distribution. If your test statistic were minus 1, it is sort of in the middle of the distribution, so your p-value would be something like 0.4; but if the value you get is extreme, like t equals 3, there is very little of the distribution left beyond that, so you get a much lower p-value and hence higher statistical significance. How well you can assess this depends on how many replicates or samples you have: the more samples, the faster the distribution falls off before you get to a specific value; the fewer samples, the more uncertainty you have and the higher your p-value will be. Usually we use this test to see whether the ratio we are looking at is zero; you could test against other values by just subtracting the number you are looking for, but generally we ask whether the difference is zero. So if you are interested in a peptide and you are looking at its log ratio, the test asks: is the log ratio zero? If you can reject that hypothesis, there is a difference; if not, there isn't.

For comparing multiple groups we use the ANOVA method, where you characterize the overall mean of all the groups: you put all the groups together and calculate the mean, and then you take your specific groups, your control, transgenic A, and transgenic B, or your baseline, 10-minute, 50-minute, and two-hour time points, so three or four groups, and for each group you calculate the group mean. What this test does is ask: how different is the overall mean from each of the group means, and is any of the group means significantly different? If it is, then there is a statistically significant difference in one of the groups. The F test is based on this: the F statistic is essentially the variation between groups divided by the variation within groups. SS stands for sum of squares; you square the deviations so that negative values don't cancel your positive values, because being off in one direction by ten is just as important as being off in the other direction by ten, and you want to take both into account. Again, you calculate the F test statistic and compare it to the F distribution, and it is the same deal: given the degrees of freedom determined by your number of samples, the fraction of the distribution that lies to the right of your statistic decides your p-value.
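As a concrete sketch, both tests are one-liners in scipy; the replicate log ratios below are made-up numbers just to show the calls.

```python
import numpy as np
from scipy import stats

# One peptide, four replicate log2(case/control) ratios (illustrative values)
log_ratios = np.array([1.2, 0.8, 1.5, 1.1])

# One-sample t-test: is the mean log ratio different from zero?
t_stat, p_t = stats.ttest_1samp(log_ratios, popmean=0.0)

# One-way ANOVA (F test) when there are several groups, e.g. time points
baseline = np.array([0.1, -0.2, 0.0])
min_10   = np.array([0.9,  1.1, 0.7])
min_60   = np.array([2.0,  1.8, 2.3])
f_stat, p_f = stats.f_oneway(baseline, min_10, min_60)
```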

The more extreme your value, the lower your p-value and the more significant the difference. One thing to note about the F test is that you can use it to compare multiple groups, but it will only tell you whether at least one of the groups is different from zero. The null hypothesis it tests is that the mean of all the groups is zero, and if the test is significant you know that at least one group is nonzero, but you can't tell which one from the test alone. You can obviously go look at the groups and try to figure out what is happening, but the test will not come back and say group three is different; it will tell you that something is different among groups one to four. So many times what people do after this, if they want to know exactly which group is different, is a set of pairwise t-tests: if you have three groups you can do the three pairwise t-tests and see which one is different. Questions?

Someone asked how many values you need. You need enough to calculate a standard deviation, which means you need two values in each group; you need two numbers to calculate a standard deviation. But if you have a lot of groups, you are calculating a lot of means and standard deviations, so you need more, and that is where degrees of freedom come in: you must have enough degrees of freedom to estimate all the parameters you have. For each group you are calculating a mean and a standard deviation, that is two numbers per group, so if you have only eight data points and ten groups, you are trying to estimate twenty numbers from eight data points, and that is not a well-defined problem. The number of points you need depends on the number of groups, but regardless, each group must have at least two data points. It is the same with the t-test: you need at least two points, which is why I introduced this for two or more samples and not for one sample; you must have at least two values in order to calculate a standard deviation, and all these methods depend on the standard deviation. Any other questions?

Someone also asked about assessing the level of difference, something like low versus medium versus high. You can't do that with this test. For that, I will show you some techniques towards the end: you can come up with temporal profiles or trend profiles, you can often get a feel by visualizing the data, or if you think there is a specific trend you can do linear regression or other methods to evaluate trends.

For these tests, from the perspective of degrees of freedom, technical and biological replicates are treated the same. So what is the disadvantage of using one versus the other? Obviously biological replicates represent a lot more diversity and variation, so if you could run three replicates for the amount of money you have for the project, it would be better for those to be three biological replicates rather than technical replicates. Many times that may not be possible and you end up running technical replicates, but ideally you want biological replicates. What people often do, if they have, say, five biological replicates and are able to run two technical replicates for each, is to take the mean of the two technical replicates and then use those means as a five-sample test (see the sketch after this paragraph). The mean is more robust because you have used more data to calculate it, so any variation you might experience from a technical perspective is minimized and you focus more on the biological variation. From the perspective of the test it is just a number: if the degrees of freedom say you have three samples, it doesn't matter whether those samples are technical or biological from the theoretical perspective, but from a pragmatic perspective you would prefer biological replicates.
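A small sketch of that averaging step, with made-up replicate labels and values (the column names are my own, not from any particular tool):

```python
import pandas as pd
from scipy import stats

# Five biological replicates, two technical replicates each (illustrative numbers)
df = pd.DataFrame({
    "bio_rep":   ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4", "b5", "b5"],
    "log_ratio": [1.1, 1.3, 0.9, 0.7, 1.6, 1.4, 0.2, 0.4, 1.0, 1.2],
})

# Average technical replicates within each biological replicate,
# then test the five biological-replicate means against zero.
per_bio = df.groupby("bio_rep")["log_ratio"].mean()
t_stat, p_val = stats.ttest_1samp(per_bio, popmean=0.0)
```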

Yes, it is true that, to a large degree, many of these methods assume your data is normally distributed. If you have more than about 10 or 15 samples or replicates, then by the central limit theorem you can assume the result is going to be reasonably normally distributed, but for small sets of samples that is a little hard to say. That is one of the reasons we take the log transform: log-transformed data is generally known to be a little more normally distributed. The second reason is that people have been using the t-test in a lot of situations, even situations where things are not really normally distributed, and there are empirical and theoretical results showing that as the data gets farther and farther from normality, the deterioration in how accurately the t-test gives you a p-value is also incremental. If you had something that was almost normal, the p-values you get will be almost correct; if it was off by a factor of ten, your p-values will be proportionally off. There are some statistical tests where, if your distributional assumptions are violated, the test basically falls flat and gives completely incorrect results; that is not the case for the t-test, which is one reason why, even though you learned it in high school, people still apply it here: it is what is called a robust test. The F test is also like that to some degree, because it is based on numbers calculated similarly to the t-test. In theory you really wouldn't want to apply these if you think the data is not normally distributed, but in practice it is not too bad if you do.

What would you do if you really think the data is not normally distributed and you don't want to apply the t-test? Then you have to go to what are called nonparametric tests. Parametric tests assume your data come from a known distribution: you know the form of the distribution but not its parameters; you know it comes from a normal distribution, you just don't know the mean and standard deviation. With a nonparametric test you are not making any assumptions, and you can still calculate p-values; instead of a t-test there are rank-based tests, for example. The problem with those tests is that for small numbers of samples your p-values are going to be very conservative; you will get much higher p-values than you would get here. In a way, the parametric tests derive some power from the assumption they make about the underlying distribution, and that helps you in small-sample situations. But if you are in a situation where you really want to be sure the p-values you calculate are correct, then you would do a nonparametric test, or there are permutation tests that will give you exact p-values for the actual data you have; you would need to resort to those kinds of tests. Any other questions before I go on? OK.

So, I mentioned moderated tests in the beginning. The problem with tests on small samples is this: suppose you are calculating a standard deviation from two numbers, and it just so happens that by random chance one of the numbers was off a little and the standard deviation came out close to zero. Then when you take the ratio, the test statistic is going to be unnaturally large, because your standard deviation was slightly off due to some random error, and that was not offset by having a lot of measurements. When things like that happen, we say the test statistic is unstable: small changes in your data can result in large changes in your test statistic. That is not a good thing, because in one case you would have a statistically significant result, and if you changed one peptide by a small amount the significance would go away; you don't want situations like that. To address it, people in the gene expression analysis domain came up with this moderated estimation procedure. Basically you use a Bayesian method to say: based on all the data I have seen, I think my standard deviation is going to be X; then I go look at the specific peptide and ask how close it is to X, and how robustly I can estimate its standard deviation from the number of samples I have. So let's say the overall standard deviation, looking at all the proteins and peptides in the whole sample set, was one.

Then you go look at a specific protein, and the standard deviation for that protein is 10, off by a factor of ten from the overall value. It could be a genuinely regulated peptide, but if that 10 was based on only two measurements while the overall standard deviation was based on ten measurements, you would say, well, I'd like to believe the 10, but let me weight it down a little because it is based on too few points. In another experiment, if you had the same difference between one and ten, but the 10 was estimated from ten data points, you would want to give it a lot more weight. That is what this does: s0 is the standard deviation of the entire population of proteins or peptides you are looking at, sP is the standard deviation for the specific protein or peptide, and the parameters d0 and dP are obtained by Bayesian estimation from your whole data set; they are essentially weights learned from the data you provide. What this does is moderate your variance: when your variance is based on very few values and tends to fluctuate a lot, this reins in the fluctuation in the standard deviation and makes the test statistics you calculate more robust. That is what the moderated aspect provides, and we use it to account for the small numbers of samples we have.

The other nice property of the moderated statistics is that the p-values degrade gracefully as you have more and more missing values. In this example the p-value is 0.3 when you have only 5 actual values: say you had 25 measurements and 20 of them are missing, your p-value for that difference would be 0.3; but as more and more values become available and are not missing, the p-value drops fast, and by the time you have 10 values you are already reaching significance. This shows you can use this approach to deal with missing values. Remember, in the beginning I mentioned that missing values are one of the key things to keep in mind, because you can't just put zero in for them. Here, if you have missing values, you can deal with them legitimately: you leave the data out only for those peptides or proteins that have values missing, and as values become available, or for other proteins where the values are already there, you can achieve significance and capitalize on the additional data you have for that peptide or protein. Are there any questions up to this point? Then I am going to switch gears slightly and look at a different issue.
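The moderated variance described here can be written as s_moderated^2 = (d0*s0^2 + dP*sP^2) / (d0 + dP), where s0 and d0 are the prior (dataset-wide) standard deviation and degrees of freedom and sP, dP are the peptide-specific ones. Below is a minimal sketch of a moderated one-sample t-test in that spirit; in a real analysis s0 and d0 are estimated from the variances of all peptides by empirical Bayes (as in tools like limma), whereas here they are simply passed in as assumed values.

```python
import numpy as np
from scipy import stats

def moderated_one_sample_t(log_ratios, s0, d0):
    """Moderated one-sample t-test (sketch).

    log_ratios: replicate log ratios for one peptide; NaN marks a missing value.
    s0, d0:     prior standard deviation and prior degrees of freedom; assumed
                here, but normally estimated from the whole data set.
    """
    x = np.asarray(log_ratios, dtype=float)
    x = x[~np.isnan(x)]                      # missing values are simply left out
    n = x.size
    if n < 2:                                # cannot estimate a standard deviation
        return np.nan, np.nan
    dp = n - 1
    sp2 = x.var(ddof=1)                      # ordinary per-peptide variance
    s_mod2 = (d0 * s0**2 + dp * sp2) / (d0 + dp)   # shrink toward the prior variance
    t = x.mean() / np.sqrt(s_mod2 / n)
    p = 2 * stats.t.sf(abs(t), df=d0 + dp)   # the prior contributes extra degrees of freedom
    return t, p

# Example: three replicates, one missing
t, p = moderated_one_sample_t([1.1, 1.4, np.nan], s0=0.5, d0=4.0)
```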
OK, so the next thing I want to mention very briefly is multiple hypothesis testing. You have probably all heard that you have to, quote-unquote, correct your p-values when you do t-tests or whatever in genomics and proteomics and so forth. The idea is that when you use the t-test you calculate a p-value, and if your experiment had, say, 20,000 phosphopeptides that you are looking at, like Philipp or Namrata mentioned, then you are doing 20,000 t-tests, and you need to be aware that if you do that many tests, just by sheer chance something will come out significant, so you have to account for it. A good example I found, maybe on Wikipedia or somewhere: the multiple testing problem is the potential increase in false positives, in other words thinking something is significantly different when it is not, simply because you are repeating a test over and over again. Suppose you are flipping a fair coin; for statisticians a fair coin is a standard term for a coin that, when you flip it, gives on average a probability of 0.5 of heads and 0.5 of tails. You want to assess whether a coin you got from someone is fair, so you test its fairness by flipping it ten times and observing how many heads and tails you get. If you use a 0.05 p-value, this is basically saying that if you observe nine or more heads, your coin is biased. So you flip the coin ten times; if you do this once you might get five heads or six heads and say it is not biased, but if you got nine or ten heads you would say it is biased.

Now suppose you went to a bank and they gave you 100 coins to test, and you repeat this, flipping each of the hundred coins ten times. I won't go through the calculations here, but the bottom line is that because you repeated the test that many times, there is only about a 34 percent chance that all the coins are deemed fair; in other words, even though none of the coins is biased, there is about a 66 percent chance that you will conclude at least one coin is biased, simply because you did the test 100 times. That is the reasoning behind multiple testing correction. And this is just one hundred tests; what about 20,000, or the 25,000 genes in the genome world, or a ubiquitinome of several thousand sites? You are looking at thousands or tens of thousands of tests, so you really have to correct for this, otherwise you will end up with a lot of things that look very rosy and you will be very disappointed.

The most straightforward approach is what people call the Bonferroni correction: you want a p-value of 0.01 and you are doing a hundred tests, so you set your threshold one hundred times lower, 0.01 divided by 100, which is 0.0001. Basically you make your test more and more stringent depending on how many times you are repeating it. The advantage is that anything that does come out significant, you're golden; the main disadvantage is that this is the most conservative multiple testing correction, and in things like proteomics and genomics, where you do lots of tests, many times if you apply it you will end up with all adjusted p-values being one and nothing significant anywhere. That is primarily because the Bonferroni correction is meant for tests that are completely independent; the correction is really right only when all the tests are completely independent, in other words when the 30 genes or 300 genes you are testing are all independent, which is never the case in genomics or proteomics, because there are lots of interrelations between the expression of one protein and another or one gene and another. So that condition never applies, and it results in values that are way too conservative.

There are many ways to compensate and come up with a multiple testing correction that is more balanced, and one that people commonly use is the Benjamini-Hochberg false discovery rate correction. Here you sort all your p-values and then, I won't go into the details, you decide your final adjusted p-value by multiplying the sorted p-values by a factor that depends on how many tests you are doing. Generally this results in a reasonable correction and in most cases people use this; it is relatively simple to implement, just a couple of sorting exercises, and there is code you can write or find that will do it for you. So: I've spoken about moderated t and F tests, and about multiple testing correction for them.
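Here is a small sketch of both pieces: the coin calculation, and Bonferroni versus Benjamini-Hochberg adjustment using statsmodels; the vector of p-values at the end is made up for illustration.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Coin example: flip a fair coin 10 times, call it biased on seeing 9 or 10 heads.
p_single = stats.binom.sf(8, 10, 0.5)     # P(9 or 10 heads) ~= 0.011
p_all_fair = (1 - p_single) ** 100        # chance that none of 100 fair coins is flagged
print(f"P(all 100 fair coins deemed fair) = {p_all_fair:.2f}")   # ~0.34

# Correcting a set of p-values from many tests (illustrative values)
pvals = np.array([0.0001, 0.003, 0.02, 0.04, 0.30, 0.70])
rej_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
rej_bh,   p_bh,   _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```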
We have these implemented in GenePattern. I don't know how many of you are familiar with GenePattern; it is a suite of software tools created by the Cancer Informatics group, and you can very easily add new programs and modules to it. We have taken the moderated F test and t test and added them as a module called Moderated Tests. If you are interested in using it, you can find it on the GenePattern beta server at the Broad Institute for now, and I am more than willing to help you navigate it. The input data looks like this: this is phosphopeptide data, so you have the phosphopeptides and the log ratios for replicate one, replicate two, and replicate three, and you can see that some values are missing. For a peptide where two values are missing you won't get a p-value at the end, but where enough values are available you will, and if you have more replicates you will get correspondingly stronger p-values. You take a CSV file like this, put it into the input file field, and pick the t or F test depending on whether you are comparing two groups or more than two.
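To make the missing-value rule concrete, here is a small pandas sketch; the column and peptide names are hypothetical and only mimic the kind of table the module takes, they are not the module's actual format.

```python
import numpy as np
import pandas as pd

# Hypothetical table in the spirit of the module input: one row per phosphopeptide,
# one log-ratio column per replicate, NaN where a value is missing.
df = pd.DataFrame({
    "peptide": ["pepA", "pepB", "pepC"],
    "rep1":    [1.2,    np.nan, 0.4],
    "rep2":    [0.9,    np.nan, np.nan],
    "rep3":    [1.4,    2.1,    np.nan],
}).set_index("peptide")

# A peptide needs at least two non-missing ratios to estimate a standard deviation,
# so only those rows can receive a p-value; pepB and pepC would be skipped here.
testable = df[df.notna().sum(axis=1) >= 2]
```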

When you hit run, it will run through and create a table giving you the nominal p-value and the corrected p-value; you can also specify what corrected p-value you want to use as the significance cutoff, and based on that it will draw a plot. This one is a two-sample case with replicate one along the x axis and replicate two along the y axis. The plots along the sides are the histograms of the ratios for the individual replicates, what are called marginal histograms, one for replicate two and one for replicate one, and when you combine them you get the scatter plot. The red dots are significantly up- or down-regulated; the black ones are not. You can see that if there is a change in sign, positive in one replicate and negative in the other, it won't achieve significance, because there is too much variation and too much difference; but if they are consistent, both positive and to roughly the same degree, you will get significance. The red dots are based on the threshold you set; if you want to be more stringent you can set it to 0.01 or 0.001 and you will get fewer red dots, or vice versa. If you have three replicates it will plot something like this, with the marginal histograms along the diagonal; this is called a pairs plot, so this panel is replicate one versus replicate two, this one is replicate one versus replicate three, and so on, and these numbers are the correlations for the corresponding panels. If you have multiple groups you will get one such figure for each group, this one for group one, this one for group two, and so on, and the interpretation is substantially the same; here the red dots are decided based on the F test, whereas there they are decided based on the t-test, and we use moderated versions of both.

Now, if you have more than, say, 5 or 10 samples (let's say you have an MRM assay configured, you have 300 breast cancer cases and controls, you run them all on your MRM assay, you have measured 200 proteins in those samples, and you want to do a more detailed analysis), then, because you have larger sample numbers, you can use pattern recognition methods and other standard gene expression analysis and machine learning tools. We have a collection of these in GenePattern, and they are easily available in R, SAS, and many other statistical packages; you can plot heat maps, you can build classifiers, you can do clustering, and so on.

Someone asked in the beginning, if you have trends in your data, how would you see them? One way is clustering. This example shows fuzzy clustering, where each point can belong to more than one cluster with some probability, and the probability is shown by the intensity of the color: red means a higher probability that it belongs to that cluster, and as the color fades, the probability of belonging to the cluster is lower. If you do this with nine clusters, and if this were a time trend, so you are looking at time points, you can see that this one starts off high and then drops, whereas this one starts off low and then goes up. Now you can go back and see which proteins or peptides are doing that, see if they fall into pathways, or you can continue with your other analyses.
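Fuzzy c-means itself is not in scikit-learn, but a Gaussian mixture gives analogous soft cluster memberships; this sketch, on random illustrative profiles, shows the general idea rather than the specific clustering behind the slide.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative matrix: rows = peptides, columns = time points.
rng = np.random.default_rng(1)
profiles = rng.normal(size=(500, 4))

# Standardize each temporal profile so clusters reflect shape rather than magnitude.
profiles = (profiles - profiles.mean(axis=1, keepdims=True)) / profiles.std(axis=1, keepdims=True)

gmm = GaussianMixture(n_components=9, covariance_type="diag", random_state=0).fit(profiles)
membership = gmm.predict_proba(profiles)   # soft membership of each peptide in each cluster
hard_label = membership.argmax(axis=1)     # cluster each peptide most likely belongs to
```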
As I have mentioned, you can apply all of this to peptides or to proteins, and I think I commented on that before, so I won't go into it again. The one thing I want to emphasize is that many times people come back and say: OK, I set my adjusted p-value to 0.05 and there is nothing significant, so what do I do, do I just throw away the experiment I spent $100,000 and six months in the lab on? You don't have to. Especially with small sample numbers, the goal of applying these tests and corrections and calculating p-values and fold changes is to prioritize the results. If at the 0.05 level you don't find anything significant, but at the 0.25 level you find ten things, and you know that five of those are really correct because you know what pathways are activated, that's fine. Simply because you have small numbers of samples, and simply because Steve Carr's group is so phenomenal that they can measure tens or hundreds of thousands of peptides, doesn't mean you have to end up with nothing just because, with all the tests you did and the few replicates you have, things don't reach full statistical significance. You might end up with things that do not cross statistical significance.

But you can still use the p-values for prioritization. You can sort by p-value, or, as people sometimes do, by a combination of p-value and fold change, and then, say you are interested in or have the ability to follow up 25 or 50 things, you take the top of that list and use those results. Simply because something doesn't come out statistically significant doesn't mean it is a complete wash; it just means the signal isn't that strong, and the number of samples you have and the number of proteins you are measuring don't quite support full statistical significance, but you can still use it as a prioritization mechanism. The accompanying statement is that you then really have to take those peptides or proteins and do further analyses, some sort of network, pathway, or enrichment analysis, MRM validation, other genomic support, or actual lab experiments, to figure out which ones you tend to believe.

This is the point where I switch from discovery-based to targeted analysis. In discovery, like I mentioned in the beginning, you are interested in finding what is different, how to get there, how to assign statistical significance, what to believe, and what to do with it. If you are going to do targeted analyses, you are going to do the class of experiments called MRM-MS analyses. An MRM-MS assay measures a single peptide or protein, and you want to be able to say how good your assay is in terms of linearity, limits of detection, how low it can detect in the kinds of samples you are going to provide, and things like that; you want to qualify your assay to begin with and then apply the assay to samples. I think you have already seen this workflow, or Sue is going to talk about it if you haven't; it basically shows how you run an MRM-MS experiment, so I won't go into it, it is just a placeholder.

To analyze data that comes from MRM-MS experiments we have a tool called QuaSAR, and QuaSAR essentially calculates a lot of statistics and visualizations and provides a lot of information for you to look at how your assays are working and what you can say about them. You start with data from the mass spectrometer and process it using some tool, either vendor-provided or third-party; one third-party tool is Skyline, which I think Sue might talk about, and which lets you process MRM-MS data. In that tool you process the raw data files and do peak extraction and what is called peak integration; I think Hasmik mentioned peak area ratios yesterday, and those ratios and related quantities are calculated there. Once you have the ratios for all the samples and concentration points you are using to assess the performance of your assay, you go into QuaSAR. QuaSAR is essentially a collection of algorithms with multiple components: it fits calibration curves, it can find interferences in your MRM transitions (I think Hasmik covered this briefly yesterday), it can quantify your limits of detection and quantification, and it can quantify the precision of your assay based on the coefficient of variation. I'll spend a couple of minutes on each of them.

AuDIT is the tool here that can figure out whether you have an interference. An interference arises when you have a transition you are trying to measure; a transition is a combination of two masses, the Q1 or precursor mass and, after you fragment the precursor into multiple fragments, one of those fragment masses, and you want to measure the peak area under that specific fragment. When you are doing that, it is possible, especially in complex samples like plasma, that something else also has signal at exactly the same masses, so both the Q1 and Q3 masses match and the retention times are also reasonably close, and that other species contributes to the signal you are measuring.

When that happens you want to be able to tell, because otherwise your quantification will be off: that transition was affected by something other than what you are measuring. To do that, if you are measuring three or five or however many transitions, we calculate what are called relative ratios, the ratios of the intensities of one transition to another for a given peptide, and you can then apply a statistical test to see whether there is an interference; if there is, the tool will flag it. It is also possible that in some cases the coefficient of variation is too large. Remember, I told you we expect something around 15 to 20 percent to be the achievable coefficient of variation for MRM analysis; if it goes beyond that, for whatever reason, the peptide is not behaving well or there is some other issue with your chromatography, and you want to flag those too. AuDIT will flag both kinds of issues.
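As a rough sketch of the relative-ratio idea (not the actual AuDIT implementation, whose definitions and tests differ in detail), one might compare each transition's ratio to a reference transition in the analyte against the same ratio in the heavy internal standard, and flag a transition when the ratios disagree or its CV is too high; the peak areas and thresholds below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Invented peak areas for one peptide: rows = replicate measurements, columns = transitions.
# The first analyte transition is consistently inflated, as if something else co-elutes there.
analyte = np.array([[1400., 480., 250.],
                    [1380., 500., 260.],
                    [1420., 490., 255.]])
standard = np.array([[2000.,  960., 500.],
                     [1950.,  975., 510.],
                     [2050.,  955., 495.]])   # heavy internal standard, assumed interference-free

# Relative ratios: each transition's area divided by the area of the last transition.
rel_analyte  = analyte[:, :-1]  / analyte[:, -1:]
rel_standard = standard[:, :-1] / standard[:, -1:]

# If a transition is clean, its relative ratio should agree between analyte and standard.
for j in range(rel_analyte.shape[1]):
    t_stat, p = stats.ttest_ind(rel_analyte[:, j], rel_standard[:, j])
    cv = rel_analyte[:, j].std(ddof=1) / rel_analyte[:, j].mean()
    flagged = (p < 0.05) or (cv > 0.20)
    print(f"transition {j} vs last: p = {p:.3f}, CV = {cv:.2f}, flagged = {flagged}")
```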
Calibration curves: I think Hasmik covered this yesterday. We fit linear calibration curves but visualize them in the log domain so you can actually see all the points; this is the linear calibration curve and this is the log plot, and you can see that on the linear scale many of the points are clustered down near the bottom. We fit the calibration curves using robust weighted regression: as the measured values go up, the measurement variance also goes up, so you want to account for that using weighted regression, and the software will also calculate confidence intervals and provide a table with them.

Limits of detection and quantification are a domain unto themselves, so I won't go into details except to say that the limit of detection is the lowest amount you can confidently detect in your sample, and the limit of quantification is the lowest amount you can quantify. The difference is that by the time you are at the limit of quantification, your coefficient of variation has dropped enough that you are measuring in a very precise way; with detection you are happy if you can say you have confidently seen it, whereas with quantification you should have both seen it and measured it well enough to quantify it. These are very important numbers for any assay, because if you are going to measure some protein in plasma, for example, and your assay has a limit of detection or quantification that is much higher than what would be found in the samples you are going to get, then it is a useless assay. We usually calculate these using a blank sample and a low-concentration sample; again I won't go into details, and I can talk with anyone who is interested offline, but we have several methods, and the most common one we use is based on the mean and variability of the blank sample.

Again, QuaSAR is available in GenePattern: you can plug in your input table, press a button, and you will get all the results and plots. Some examples of what you get are calibration curves. This one has three transitions, all of them almost along the diagonal and very close to each other, and you can see the data points; there is some variation here at the low concentration end, so the limit of detection is somewhere around here, but beyond that everything lines up very nicely along the diagonal. The top panel is from a well-behaved peptide, one you would definitely use to quantify that protein, whereas in the bottom panel you can see there is more variation in the slopes of the three transitions, the values are all over the place, the limit of detection is probably way over here, too high to be of much use, and these marks show that those transitions have interferences; this is a peptide you might want to think twice about before using for quantification. If you have an assay you are trying to set up for, say, 300 peptides, QuaSAR will come up with 300 plots like this, so you can quickly glance through them, pick out the ones that are problematic, focus on those, and keep the rest; it is a tool to accelerate the assay development process.

These are other examples of QuaSAR output: the LOD/LOQ table gives you those values; the regression table gives you the slope, the error of the slope, and the intercept, so essentially the regression line; the AuDIT table tells you which transitions in which samples have interferences; and then there is the CV plot. As concentration increases you can see that the coefficient of variation drops quite nicely, which is the ideal situation you would expect: at very low concentration it is hard to measure, so you have a lot of variation, and as you go higher and higher you can measure better and better and your CV drops.
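A minimal sketch of a weighted calibration fit and a blank-based LOD estimate follows; the concentrations, responses, 1/x^2 weighting, and the "mean of the blank plus about three standard deviations" rule are all illustrative assumptions, and QuaSAR's actual robust weighted regression and LOD options differ.

```python
import numpy as np
import statsmodels.api as sm

# Invented calibration data: spiked concentration vs. measured peak area ratio.
conc     = np.array([0.1, 0.5, 1.0, 5.0, 10.0, 50.0])
response = np.array([0.03, 0.12, 0.22, 1.10, 2.05, 10.50])

# Weighted least squares with 1/concentration^2 weights, so the larger variance at
# high concentrations does not dominate the fit (a common, simple weighting choice).
wls = sm.WLS(response, sm.add_constant(conc), weights=1.0 / conc**2).fit()
intercept, slope = wls.params

# One common LOD estimate: mean of the blank plus ~3 standard deviations of the blank,
# converted to concentration units via the calibration slope.
blank = np.array([0.010, 0.012, 0.008, 0.011])
lod = (blank.mean() + 3 * blank.std(ddof=1) - intercept) / slope
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, estimated LOD = {lod:.3f}")
```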

QuaSAR will also create plots like this if you have replicates for the samples and concentration curves. So, we are nearly done, and this is a closing quiz to keep you on your toes: an MRM-MS experiment measures the level of a protein in 100 breast cancer and 120 control samples; what do you need to analyze it? One thing I forgot to mention is that most of what I have talked about with QuaSAR is how you get measures of how good your assay is while you are developing it, but once you have developed the assay and run it on your various patient samples, you can process that data in QuaSAR again, and you will get the coefficient of variation and the mean for every sample you have measured, so you can use it to see what the measurements are in your samples for the peptides or proteins you are measuring. So this is a more serious quiz, I think: in this case you would set the assay up using QuaSAR and then analyze the data that comes out of those measurements also using QuaSAR. But now if you are interested in, let's say, the 200 proteins you measured, and you want to know whether any of them are actually statistically significantly different between the cases and the controls, then you would use some sort of moderated test or another test to look at that. This is where, in the beginning, I said I was hand-waving a little: even though this is a targeted experiment, when you start applying targeted assays you may apply them to find things that are different between your samples or groups of interest.

And if nothing works, find a statistician. I went to a conference about a year ago and there was a person giving a talk; she was from England, from a traditional biology department, and she said: we were trying very hard to analyze this data and we were having a lot of trouble, and then there was this guy who lost his way and came to our lab to ask how to get to an auditorium; we found out he was a statistician and we locked the doors. And that is how they found their statistician. I don't think it is that bad here; you can probably find more statisticians than you need here, and if not, talk to Steve Carr. Here are some references, and if there are any questions...