The Outcome of Psychotherapy for Borderline Personality Disorder: A Meta-Analysis

My name is Ken Levy, and I will be presenting a meta-analysis that I’ve done with my colleagues Bill Ellison, Christina Temes, and Shabad-Ratan Khalsa, where we looked at the psychotherapy treatments for borderline personality disorder. Before beginning, I’d also just like to point out and thank Bill Ellison, one of the authors, because he did the statistical analyses. He’s quite a meta-analyst and quite a data analyst, and you’ll be seeing him in future years here presenting.

Now, we’ve heard reference to a number of different psychotherapy treatment options for people with borderline personality disorder, and here you see just some of the treatments that have actually been tested in randomized controlled trials, some coming from a cognitive behavioral tradition, some coming from a psychodynamic tradition. Many of these treatments have been assessed in two or more randomized controlled trials, which is a criterion for being an empirically supported treatment from the American Psychological Association Division 12. And there’s actually a number of other treatments that have been tested in pre-post studies, like Joel mentioned his, but others too, that actually have very good data and probably warrant future research. The point I’m trying to make here is that there’s a lot more out there than I think people often consider, and there’s actually a rationale to be aware of all the different treatments and for us to be examining their effects. Through the meta-analysis I hope to make that point.

But one issue in general is that studies consistently show only about 50 to 60 percent of patients in treatment are actually improving, and that’s a lot of patients who aren’t actually improving. Maybe, like Joel said, there are people who won’t necessarily get better in any treatment, but there may also be people who would get better in a different treatment, even a treatment from a different orientation. So I think it’s important that we be aware of all that is out
there and what the effects of these treatments might be. In addition, even those patients that do show effects in treatment, the 50 to 60 percent that do better, are often showing clinically significant change. If you go from a GAF of 35 to a GAF of 65, you’ve made clinically significant change, but you haven’t made clinically sufficient change. None of us in this room would want to be functioning at a 65. So a lot of the changes that you see, the reduction in hospitalizations and ER visits, etc., are really important changes, but they’re not necessarily enough. Our hope was that the meta-analysis would actually help us understand what kind of factors were contributing to the outcome that we see in these various treatments.

Now, if you look at the literature, it’s quite complicated. There’s a lot of variability in the various studies out there, and here’s just some of the ways that these studies have varied, in terms of dose, treatment setting, samples, outcome measures, etc. And there are other issues that actually affect the literature, such as researcher allegiance and publication biases. How are we as researchers and clinicians to draw conclusions from this data, to make research decisions in terms of where we want to put our efforts and what the most cogent questions are, and how are clinicians in the community to make decisions about triaging patients to particular treatments? So we thought a meta-analysis would be helpful in that regard, for organizing the data as well as looking at treatment moderators. This was one of the main things that we wanted to do in this meta-analysis: actually look at treatment moderators.

Now, this is a pyramid of evidence that has been articulated in the empirically supported psychotherapy literature, and you’ll notice that at the top of this pyramid is systematic reviews and meta-analyses. Meta-analyses have actually been quite controversial in recent years. Joel mentioned a
Leichsenring meta-analysis, which I think is one of the controversial ones, but there have been other instances too. But meta-analysis can be quite useful for summarizing data, and any individual study that’s carried out could distort practice, because we may put too much emphasis on that study. I think we’ll see, when we look at the results of the meta-analysis, that something like that has probably been going on now for many years.

There have been three recent meta-analyses (actually a fourth one, on dropout, that I won’t talk about), but the three main meta-analyses are the one published as the Cochrane report in 2009 by Binks et al.; a year later, a meta-analysis on DBT that was in Consulting and Clinical, which Bill and I were actually reviewers on; and then the most recent Cochrane report, by Stoffers et al., that was recently published. With these three meta-analyses, you may be saying, why another meta-analysis? But these meta-analyses were limited in particular ways that we wanted to address.

The initial Cochrane report meta-analysis examined only seven treatment studies, six of which were DBT, with about 260 patients. At the time, the conclusion was that the confidence intervals were so large around the effect size that the findings were unreliable, and they essentially said, well, it looks like BPD could be treated with psychotherapy, but all treatments should be considered experimental. A year later, sixteen studies of DBT were meta-analyzed, and in this study they found a moderate effect size for DBT when compared to treatment as usual. But buried in this article, even though Bill and I tried to get this out, was the fact that when DBT was compared to an active control, the effect size was 0 1. So DBT is clearly effective compared to a TAU, but in this study it does not necessarily look like it’s better than any of the other active controls that it was compared to. And then finally, the meta-analysis by Stoffers and colleagues: this was actually more of a series of sub-analyses that are hard to summarize, but essentially their conclusion was that DBT is the most intensely studied of these treatments and therefore has the most evidence, and that MBT, TFP, schema-focused therapy, and STEPPS are also treatments that have evidence and are promising.

We wanted to take a very different approach to our meta-analysis. I’ll say a little bit about that before I go to the critiques. We wanted to actually look at moderators, and in order to do that we felt that we should include as many as possible of the studies that have actually been completed. Most of these studies have inclusion and exclusion criteria that result in sort of subsets of studies being available for the meta-analysis, and we wanted to be overly
inclusive and actually look at design issues, patient variables, and treatment variables as they might moderate the effect size.

Meta-analysis, as I indicated before, is controversial. There’s a number of critiques that are out there. The inclusion-exclusion critique is an important one, because there’s been a number of meta-analyses where people have published on the same issue within a year or two of each other and actually found very different conclusions. So Crits-Christoph and Svartberg looked at dynamic psychotherapy for depression and found different conclusions; Tolin and Wampold looked at CBT and found very different conclusions; and Smit and Leichsenring found different conclusions. Sometimes these arguments have actually gotten quite nasty. In reaction, over the last few years there’s been recommendations about reporting guidelines and also about assessing the quality of the meta-analysis. In our meta-analysis we used all three guidelines, and we adhered to every relevant guideline. We’re hoping to not be critiqued on some of those same grounds that other people have been critiqued on.

Okay, so turning to the method. This was an extensive literature search: we looked at conference abstracts, retrieved articles through reference sections, and made appeals through listservs; many of you may have received that. We identified over 2,100 articles that were then combed through to determine whether they would meet our eligibility or inclusion criteria. We stayed very close to issues of relevance and pragmatics. In terms of relevance, we only looked at studies that were of psychotherapy, because this is the question we were interested in. We also looked at treatments specifically for BPD and not for associated diagnoses like depression. We also wanted BPD patients, not mixed diagnoses or individuals with, let’s say, an associated problem like self-harm, where we don’t necessarily know whether they had borderline
personality disorder. In terms of the pragmatic concerns, we had to be able to read the article, and we had to have understandable statistics that we could use in the meta-analysis; you can see the number of articles excluded as a function of that. This is a flow chart that shows how we ended up with 73 studies for inclusion in the meta-analysis, which is quite a lot of studies. In fact, we had no idea that there were going to be that many psychotherapy studies of borderline personality disorder when we started this. So, 73 studies. We looked at both between-group and within-group effects.

There were 32 studies that provided between-group effects, with over 1,700 subjects, and 70 studies from which we were able to get pre-post effect size estimates, again with over 1,700 individuals. I’m going to skip through this quickly. We used Hedges’ g to look at between-group effect sizes; it actually controls for some of the biases with small samples in Cohen’s d. We used robust covariance estimation, which was a new procedure that was actually quite useful in the moderator analyses, and for within-group effect sizes we used Becker’s modification of Hedges’ g, which doesn’t necessarily have a name associated with it. We looked at sample moderators, study-level moderators, treatment moderators, and design moderators, as well as the effects of raters on outcome and domain of outcome. Our coders were both graduate students and advanced undergraduates, graduate students with regard to moderators and study quality.

For study quality, which is something that we addressed, we used Downs and Black, which is a standard measure developed for rating studies in meta-analyses. I won’t go too much into detail about that; there are a couple of others. In terms of our reliability of coding, you can see it’s quite high. Interestingly, for our ratings of the quality of studies it was quite high too. There was another study that had an overlap of 7 studies with those included in ours, and we looked at the relationship: the reliability between our ratings and their ratings was 0.91. It’s a small sample, but no one fell off the diagonal. They were all along the diagonal, with no outliers.

So in terms of moderators, we did these meta-regression analyses where we looked at design characteristics (measurement artifacts, study quality, raters, and so on) and sample characteristics (age, percent female, GAF scores). We were limited in some of the characteristics we could look at, because not enough studies had measures that cut across these various characteristics. And we looked at treatment
characteristics: orientation (PDT for psychodynamic, CBT for cognitive behavioral), DBT versus non-DBT studies, intensity in terms of the number of hours per week, length in weeks, and total attention, as well as individual versus group, inpatient versus outpatient, etc.

I’m going to turn to results; you guys may be waiting for it, much like that lady. In terms of our sample, the mean age was 31, and it was mainly women in the studies. We had 32 controlled studies, as I mentioned earlier: 10 psychodynamic and 22 cognitive behavioral, with 7 RCTs of psychodynamic treatment, 19 RCTs of CBT, and 14 of dialectical behavior therapy. The pre-post analyses included 17 psychodynamic treatments and 44 cognitive behavioral. We rated the quality of the studies; the mean quality was 15.3, a little bit low given the range, but the RCT qualities were higher. We wanted to be able to look at quality as a moderator.

In terms of what we found with the within-group effect size, we actually found a Hedges’ g of 0.828, and the confidence interval, you can see, is quite nice and tight. But I find that trying to understand what an effect size means is like trying to understand what a heritability index means: the numbers seem to make sense, but they’re actually hard to interpret. So I thought I would try to unpack this a little bit and explain to you what it actually means that the effect size is 0.828. What you do with the effect size is look it up on a normal curve to see where the percentile falls. What you find, if you do that with an effect size of 0.82, is that the average treated patient at post-treatment is doing better than 80% of patients entering treatment. That’s actually considered a large effect size, and it is the good outcome that we found. The rest of it could probably be characterized as the good, the bad, and the ugly.
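
The normal-curve lookup described here can be sketched in a few lines of Python. This is only an illustration, not the analysis code behind the meta-analysis: the function names are hypothetical, and the sketch assumes outcomes are roughly normally distributed so that the effect size can be read as a z-score.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_for_effect_size(effect_size: float) -> float:
    """Percent of the comparison distribution falling below the average
    treated patient, treating the effect size as a z-score (an assumption)."""
    return 100.0 * normal_cdf(effect_size)


# 0.82 is the within-group effect size quoted in the talk; the result
# lands close to the 80 percent figure mentioned by the speaker.
print(f"{percentile_for_effect_size(0.82):.1f}")
```

An effect size of zero maps to the 50th percentile under this reading, which is a quick sanity check on the conversion.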
But in terms of pre-post effect sizes, that’s actually a nice effect size, where you can think that at the end of treatment the average patient will be doing better than 80 percent of patients that would be entering the treatment.

Now, in terms of the between-group effect size, it was much smaller. It was 0.23, and that’s considered a small effect size. You can see the confidence interval is not quite as tight, but what’s really important here is that it doesn’t go below zero, so it’s a reliable finding, and the effect size range is still all within the small effect. In terms of trying to explain what that means, there’s a couple of ways we could do it. Taking the way I described the within-group effect size, it turns out that this effect size corresponds to the 58th percentile, which means that the average patient in the experimental condition, the psychotherapy condition, is doing better than 58 percent of the patients in the control, which is not all that great. To put it into another statistic that might be more easily understandable, you might see here the number needed to treat: it’s 9. That’s also considered a small effect. What that essentially means is that you need to treat nine patients in your experimental treatment, whether it be DBT, TFP, schema-focused therapy, or MBT, before you’re going to see one patient that does better than the patient in the control group. If you look at a lot of the psychotherapy studies, what you find is that they have sample sizes of about twenty or thirty people, and what you’re talking about is essentially differences between groups that are probably a function of two or three patients doing better in the experimental group than in the comparison group. So it doesn’t look so good, but there’s more to the story.

Oh, this didn’t show up nearly as well as I would like. Okay, this is a funnel plot, and what we did in using this funnel plot was to look and see if there’s any publication bias that might be affecting the effect sizes. It turns out that there is. You can see that there are studies that aren’t represented here; these would be low-N studies that also have low
effect sizes. Most likely those studies can’t get published, okay? And this trim-and-fill plot sort of shows us where those studies would actually be.

Let me talk a little bit about the moderators, because that’s where I think a lot of the important findings are. We found that the year of publication, as an artifact moderator, actually affected the effect size, as did using dichotomous variables as opposed to dimensional variables, and the study quality too. So, interestingly, the higher the study quality, the lower the effect size, and this is actually a finding that’s not inconsistent with other disciplines.

We also looked at control group. It turns out that if you used a waitlist control group, you had a higher effect size than if you used a TAU control group. There were a handful of studies that did use waitlist controls; the majority used TAU groups. We also found that component controls had smaller effect sizes than TAUs; there were only a handful of studies that used component controls. And there were no differences between TAU versus an empirically supported treatment or an active treatment that wasn’t yet empirically supported, and that’s probably because there was a lot of variation within these groups.

In terms of outcome measurements, having blind raters resulted in better outcome than self-report ratings, so we got stronger findings when we were using observer raters. You got even stronger findings if you used non-blind observer raters, and completer analyses provided stronger findings than ITT analyses. There’s also a finding that I think is left off this slide, which is that for measures that looked at personality variables there were smaller effect sizes than for symptom measures. This relates to Joel’s talk: there are lots of studies that show that between four and
six months in, you see a lot of symptom reduction, but that personality change, if one is interested in it, doesn’t come nearly as quickly. Age also: the younger the patient, the worse the outcome; sorry, the older the patient, the worse the outcome.

But in terms of treatment, and this is important, there were no differences between psychodynamic treatments and cognitive behavioral treatments, both in between-group and within-group effects, although there was a trend within groups favoring psychodynamic treatment; we’ll talk a little bit about that. In terms of looking at non-DBT versus DBT treatments, no differences again; there was actually a trend for within-group effects favoring non-DBT. No differences in terms of contact, length, or intensity.

Let me address both of those trend findings. When we explored that more, what we found is that there’s a lot of DBT studies that have been coming out in the last few years that look like they’re using convenience samples that are available to people. In fact, four out of the five dissertations that were included were DBT studies, and these studies had lower effects, which pushed it in this direction. We don’t really think that DBT has lower effects. No differences in terms of modalities and settings.

We did a final meta-regression analysis and essentially confirmed the findings with the within-group analyses with regard to observer raters and also completer status, and with the between-group effects confirmed the findings with regard to control group status and study quality.

Okay, I know that that’s a lot. This is showing year of publication related to study quality. Our study quality is getting better over time, but unfortunately, as the quality gets better, our effect sizes are getting lower. And you can see in the scatter here that there’s really no difference between dynamic and cognitive behavioral in terms of study quality.

Okay, two minutes. I think I can do this in two minutes. So, a quick summary. The within-group effect sizes were large; the between-group effect sizes, which are more important, were actually small. No differences between CBT and psychodynamic treatments in both the within- and between-group effect sizes. No differences in quality between CBT and psychodynamic treatments. Study quality correlated with publication year. Study quality correlated negatively with effect size. And there seems to be a publication bias, where small-N studies with small effect sizes are not being
published, and that is therefore inflating the effect size, so that 0.23 could actually be lower if those studies were out there. We had moderators, which is important, but our moderators, other than age, tended to be all design-related issues. So what we’re finding is that what’s influencing the outcome doesn’t have to do with issues around treatment or issues around patient characteristics, minus age, but actually has to do with how you design your study. And now we know a lot more about what kind of studies we might want to show.

But the important point here is that there is no evidence of the superiority of any one treatment over the others. This is important because it means that there are more treatments available to clinicians to use when they are confronted with a patient with borderline personality disorder. And let’s face it, people come out of TFP or schema-focused therapy or MBT or DBT having failed those treatments, and now they have options.

I’m seeing Mary standing there, suggesting, I guess, that I may be out of time. I’ll stop here, and hopefully if people have questions I can get to some of the other points I might have wanted to make, if there’s time.
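
As a back-of-the-envelope companion to the number-needed-to-treat argument made earlier in the talk, the arithmetic can be sketched in Python. This is illustrative only: the function names are hypothetical, and the conversion from effect size to NNT uses the Kraemer and Kupfer formula, which is an assumption, since the talk does not say how its NNT of 9 was computed.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def nnt_from_effect_size(d: float) -> float:
    """Kraemer-Kupfer conversion (assumed here, not stated in the talk):
    NNT = 1 / (2 * Phi(d / sqrt(2)) - 1). Only meaningful for d > 0."""
    success_rate_difference = 2.0 * normal_cdf(d / math.sqrt(2.0)) - 1.0
    return 1.0 / success_rate_difference


def extra_improvers(group_size: int, nnt: float) -> float:
    """Expected number of additional patients doing better in a treated
    group of the given size, for a given number needed to treat."""
    return group_size / nnt


# Between-group effect size quoted in the talk.
print(round(nnt_from_effect_size(0.23), 1))

# NNT of 9 as quoted in the talk, with the typical group sizes mentioned.
for n in (20, 30):
    print(n, round(extra_improvers(n, 9), 1))
```

With an NNT of 9, groups of 20 to 30 patients correspond to roughly two to three additional improvers, which is the point made in the talk. The formula-based NNT for 0.23 comes out somewhat below 9, so the figure on the slide presumably reflects a different conversion or rounding.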