Biostatistics R: Cox proportional hazards model, Hosmer and Lemeshow Chapter 4, Section 4.3

Hi guys, hope you're okay this Sunday. Today I'm in good spirits: I found this really good place in Piccadilly in London that sells very tasty kimchi. Right, what we're doing in this session follows on from the previous video on interpreting a coefficient in a Cox proportional hazards model, with examples taken from the book by Hosmer and Lemeshow (not sure I'm pronouncing that right). We're looking at the first edition, Chapter 4. Previously we looked at the interpretation of qualitative explanatory variables, which some people call covariates. This time we turn our attention to interpreting the coefficient when we have a single continuous covariate in the Cox PH model.

All right, here's the data. Remember, it's about the follow-up time of HIV-positive individuals over a period, and we have censoring going on. A one indicates that the individual actually died at the recorded time. A zero means that the time we have is just the time the patient was last seen alive; we don't know the exact time of death, so that person could have died after the trial or could still be alive.

What we notice here is drug: whether there is a history of drug use in the past, yes or no. That is a qualitative variable, an indicator variable, and last time we looked into its interpretation when it went into the model. This time we focus our attention on age. Age is continuous: these numbers actually mean something, and as the number increases it obviously means the person is getting older.

Right, so I want to fit the PH model. Load the package survival (I've done it already, so I'm not going to do it now) and then, just like before, run it using the command coxph, now on age. I'm sticking to the method used in the book, the Breslow, although we know it's better to run the Efron, which is more accurate, and if the exact method works, it's better to run the exact.
The exact method, on complicated models, takes ages, takes forever, or doesn't run at all, which is why people tend to prefer the Efron approximation. The Breslow approximation is not as good, but for some reason it is what they use in the book, so let's just do that so the numbers match what you see on the page.

Our focus is on the coefficient, and it's pretty much a breeze when we have a continuous covariate. Just to recap, the exponential of the coefficient gives you the hazard ratio, in this case for an additional one-year increase in age; age is measured in years. "Hazard ratio" is just a word, though. What does this number 1.08 tell us? Recall the rule I told you about for looking at this number. If the number is 1, the risk of the event (in this case the event is death) for a person who is one year older is the same as the risk for someone who is not one year older. Let me say that more cleanly: if it is 1, the risk of death doesn't change for an additional year in age, and that is independent of the age at which the increase is calculated. In other words, if this were actually 1 and not 1.08, a person who is 20 and a person who is 21 would have the same risk of death, and the same would be true comparing a person who is 60 with one who is 61, or a person who is 90 with one who is 91. That's what I mean by "independent of the age at which the increase is calculated", a phrase I'm lifting from the book, page 129.

If this number, this parameter, is bigger than one, the risk of death increases for an additional year increase in age. If the figure is less than one, an additional year increase in age decreases the risk. So first we have to get the right word: is it an increasing risk, a decreasing risk, or no change in risk? In this case we can see the figure is above one, though only slightly above one, so we can say that for an additional year in age the risk increases, irrespective of the age at which the increase is calculated.

Okay, so by how much does the risk increase? For an additional year in age, it increases by a factor of 1.08. Increasing by a factor of 1.08 is the same as talking about an 8% increase. What you do is subtract one from the figure and multiply by a hundred to convert it into a percentage change. So this is basically the formula: I take that figure, subtract one, and multiply by 100, which gives an 8 percent increase. If that number turned out to be negative, we would be talking about a percentage decrease. And we think to ourselves, well, that is plausible: we would expect that as a person gets older, the risk of death increases.

Now, once we've got the hazard ratio, we would also like to report the confidence interval for it. You can get fuller output by typing summary of the object. Here you go: we've got the original coefficients as before, and something more, the confidence intervals. These 95% confidence intervals are for the hazard ratio, the exponential of the coefficient. I told you before, and I'll repeat it, that they cannot be for the coefficient itself, and they cannot be for the exponential of minus the coefficient, because neither of those figures falls between these two limits; only the exponential of the coefficient does.

Remember, an estimate and a confidence interval say similar things. The estimate is your best guess: if I asked you for one number for the hazard ratio, you would say 1.085. But if I ask you not for a single estimate but for an interval, a lower value and an upper value that you think trap the true hazard ratio with 95% confidence, you would say, speaking very loosely, "I'm 95% sure that the true value is trapped between 1.048 and 1.123." So can you see they are saying similar things, and they should be consistent with one another. Saying the true value lies between these two limits is consistent with saying the estimate is 1.085, because that estimate is inside the interval. The exponential of minus the coefficient is nowhere near that interval, because it is less than the lower limit; likewise the raw coefficient is way below the lower limit. So the interval can't possibly be for those two.

As a bit of revision, what is the interpretation of the exponential of minus the coefficient? We don't really need this figure; it is just looking at things the other way around. It is talking about a one-year decrease in age rather than an increase: a one-year decrease in age changes the risk by a factor of 0.92, which is the same as saying an 8% decrease in the risk of death. That is consistent with putting it the other way around: a one-year increase is an 8 percent increase in the risk of death.

Finally, it says in the book (I'm looking at the first edition, bottom of page 127, no, top of page 128): often a one-year change in age is not of clinical interest; physicians conducting the study may be more interested in a five-year change in age. In other words, one year might be too short a period; you might want to look at the more medium term, like 2, 3, 4, 5 years, whatever. What I want you to be able to do is calculate the hazard ratio and the corresponding confidence interval for any number of years' change. The output given by R, and indeed by any standard package (SAS, SPSS, Stata), will look very similar to what you've got on the screen here. For any of those packages, how do I calculate the hazard ratio for, say, a c-year change in age, and the corresponding confidence interval? In the book, page 12[inaudible], it shows you how to do it for five years. I want to show you how to do it for any number of years; it's basically the same procedure. You can do this by hand, but I'm using an Excel spreadsheet.

Okay, let's set it up. We have the estimate of the coefficient, our beta, which is 0.0814, so let's put that in. The standard error of beta is 0.01744, so let's type that. Take your time doing this, because one mistake and the whole calculation is wrong. Now the critical cut-off point for the 95% confidence interval: the sample size is large, so you use the normal table, and off the top of your head it is 1.96. This is the cut-off point from a standard normal table, the z table, for a 95% confidence interval.

The output here gives us the hazard ratio for a one-year change, and I want to complete this table. I've got a column for the year change, and then I want the hazard ratio, the lower limit of the 95% confidence interval and the upper limit of the 95% confidence interval. Let's start with a one-year change. As in our lectures, this is the kind of output that R gives you anyway, but I'm happy to repeat it. Say it's a one-year change, and remember that for the interpretation it doesn't matter at what age the change is calculated. That is obviously a disadvantage of this model, but you'll see later on that it can easily be rectified.

What you want to do first is calculate the year change times the value of beta, so let me type that straight in. This figure is not one you would report in a table or a presentation; it is just for working. For the hazard ratio we need the exponential, so type the exponential function; if you do it correctly you see the little hint come up asking for a number. The hazard ratio is the exponential of this cell. Enter, and we get 1.0848, which is pretty much what we got last time.

Now the confidence interval, lower limit first. The lower limit of the 95% confidence interval for the hazard ratio is equal to the exponential of the following: the change times beta (this is why I have a separate column for it, so I don't have to type it in every single time), minus the cut-off value 1.96, times the change in age, which is 1, times the standard error. In other words, the lower limit is exp(c × beta - 1.96 × c × SE). There is a proof of why this is the case, but we just want to know how to apply it. That gives 1.04835. Let's go to R: what did I get? There we go, 1.048. So we've got the same thing, which means we're on the right lines.

The upper limit of the 95% confidence interval is very similar: it is an exponential again, but we do a plus instead of a minus. I won't take shortcuts; I'll type it in slowly again: the year change times beta, this time adding, not subtracting, 1.96 multiplied by the year change, 1, multiplied by the standard error. This time let's click on the cell rather than typing the number, because that is actually the short way to do it, and what you do is put a dollar sign in front of the letter E and in front of the 3 as well, to stop the reference changing when I drag it down the column. All right, 1.1225, which rounds to 1.123, the figure we saw before.

So how about two or three years? This is what is so good about it: once you've set this up, I can do it for however many years. Of course this trial lasted five years or something, so it wouldn't make sense to go on much beyond that, but suppose I wanted to go up to ten years, just to show you that it is easily done. If you drag this down (let's do it for this column first) you'll see the numbers update automatically, because Excel recognises the pattern: all you're doing is year change times beta, so here it should be cell D9 times that figure, and so on all the way down. If I grab the whole lot and drag it all the way down, it gives me, for all the years I've got here, the hazard ratio along with the lower limit and the upper limit.

Now, why is it doing that? Look at this cell: E12 minus 1.96 times D12, fine, but then times E7. E7 has appeared because I needed to put a dollar sign in front of the standard error reference. Putting a dollar sign means that when you drag down, that cell reference doesn't change, and obviously I do not want that reference to change. So if I drag down again now, those numbers should change slightly. Check the other column as well: dollar signs on the standard error, right, that's correct. That's why, if you're not sure, just type in the actual number as I did before, and then you won't go wrong.

Well, there you go: I've got an entire table of hazard ratios with the lower and upper limits of the confidence intervals. The one worked out by hand in the book is for five years. So is this figure correct? Does it agree with what they've got in the book? Let me just highlight the row (the colours in Excel are not nice). In the book the hazard ratio is 1.50, and we have 1.50. The lower limit of the confidence interval in the book is 1.264; ours is about 1.266, slightly higher. And the upper limit is 1.78. So the numbers pretty much agree.

You might be asking at this stage how many decimal places you should report. That is a question I always get asked by students in whatever kind of stats class. First of all, when you're doing your working, don't round too early. Try to keep several decimal places of accuracy, and only at the end do any rounding. And obviously in a presentation it would not make sense to report as many decimal places as we have here, six, seven, eight; that many decimal places is absolutely meaningless. Think back: I told you the hazard ratio can be interpreted as a percentage, so 1.5 is like a fifty percent increase. That tells us that a five-year increase in age, starting from whatever age you're looking at, leads to an increase in risk of fifty percent. I've taken this figure, subtracted one and multiplied by 100, and that gives you your fifty percent. I don't say 50.23 percent, because that is meaningless to people; I say fifty percent. You could think the same way about the limits: 27 percent as the lower limit and 78 percent for the upper limit. In other words, 1.26 might suffice, or, if you round properly, 1.27, and for the upper one 1.78. I rounded up to 1.27 because the third decimal place is a six, which is bigger than five, so I round up; if it were lower than five you would round down. If it sits right in between, smack on five, it's up to you whether to round up or down, but be consistent. You do the same for the upper limit, 1.78231. I want to report it to two decimal places, so that I can say it's about a 78 percent increase. I would say 1.78, because the third decimal place here is a 2, which is closer to zero, so I round down. That's what I mean by that.

Okay, I just want to remind you, as the book does, that I ran this model only as an exercise in interpreting the coefficients. In this case the coefficient has the expected sign: we would expect that the older you get, the greater the risk of death. But whether the figure itself is right is another matter; it probably isn't, because the model is too simplistic and doesn't capture other features that affect survival time. So that's a warning. Also, the Cox PH model relies on assumptions, and those assumptions have to be checked before we can actually go ahead and report these figures. But that's for another time. The next part is Section 4.4, the multiple-covariate model, which takes this model and adds on more than one covariate. That will be another time. Okay, thanks for watching, take care.
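For anyone who would rather not use a spreadsheet, here is a minimal sketch of the same table in Python. It is not from the video or the book. The helper name hazard_ratio_ci is made up for illustration, and the inputs are the coefficient 0.0814, the standard error 0.01744 and the cut-off 1.96 read off the output in this session. It works on the year change times beta, takes the exponential for the hazard ratio, and subtracts or adds 1.96 times the year change times the standard error for the limits.

```python
import math

# Values read off the model output in this session (Breslow method).
BETA = 0.0814   # estimated coefficient for age
SE = 0.01744    # standard error of the coefficient
Z = 1.96        # standard normal cut-off for a 95% interval


def hazard_ratio_ci(beta, se, c, z=Z):
    """Hazard ratio and 95% limits for a c-year change in age.

    hazard_ratio_ci is a hypothetical helper, not an R or Excel function.
    """
    log_hr = c * beta          # the working column: year change times beta
    half_width = z * c * se    # cut-off times year change times standard error
    return (math.exp(log_hr),
            math.exp(log_hr - half_width),
            math.exp(log_hr + half_width))


print(f"{'years':>5} {'HR':>7} {'lower':>7} {'upper':>7} {'% change':>9}")
for c in range(1, 11):
    hr, lower, upper = hazard_ratio_ci(BETA, SE, c)
    pct = (hr - 1) * 100       # subtract one, multiply by 100
    # Round only when printing, never in the working.
    print(f"{c:>5} {hr:>7.3f} {lower:>7.3f} {upper:>7.3f} {pct:>9.1f}")
```

With a one-year change this reproduces 1.085 with limits 1.048 and 1.123, and with a five-year change it gives roughly 1.50 with limits 1.27 and 1.78, the figures read off the spreadsheet.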