Moving Healthcare Analytics to Hadoop to Build Better Predictive Models: Saving Cost and Lives

So, good afternoon. This is Sunil Kakade, I'm from Dignity Health — I'm a director for IT in Dignity Health — and today I'm here with the Chief Medical Information Officer of SAS, Dr. Graham Hughes. He is going to join me at the end of the session for discussion and any questions we have. What we'd like to do in this presentation is share our journey of moving analytics, and especially healthcare analytics, to Hadoop. We started our program about a year back, and the Hadoop journey really started about five months back, and we wanted to show you our learnings and where we are at.

So, a real quick question: how many of you are from healthcare or a healthcare background? Okay, that's good. And is anybody from the payer side versus the provider side? Okay, thank you very much for responding to that, because healthcare is a different spectrum, a different industry in itself. I moved from retail — I was in retail doing a Hadoop implementation, took it from 10 nodes to about 600-700 nodes and moved a lot of analytics and data processing there — and now I'm in healthcare, and I've learned a lot in the last year about how complex the healthcare industry is. It's a very exciting time for people in IT, especially in healthcare IT, because a lot of opportunities are emerging as technology such as Hadoop evolves and becomes more mature, helping to eliminate some of the constraints we had with traditional technologies. And yep — no matter how much you prepare for this, there will be one technical glitch we can't predict, even with analytics.

One thing here: we want to quickly introduce who we are and what Dignity Health is. Dignity Health is one of the leading healthcare providers in the nation. As you can see, we have $13 billion in assets, 55,000 employees, 10,000-plus active physicians, and 40-plus hospitals, but the key thing for us is that we spend more than a billion dollars on the community and taking care of the poor. That's what we stand for — Dignity Health stands for human kindness. It's the notion that if we act with kindness and help each other, we can actually heal body, spirit, and mind. That's the motto of Dignity Health, so everything we do in Dignity Health is around human kindness; that's our philosophy. And as you can see, we have a presence all over the nation, and what this tells you is: imagine how big our medical data ecosystem is. We have multiple registration systems, we have multiple acute systems, so the data is flowing from multiple directions, and this is exactly a good opportunity to bring that data into one place.

One of the challenges with healthcare analytics in its current state is that there is a tremendous amount of pressure on the business — pressure coming from regulations, pressure coming from cost, and the business model itself is changing: you are moving from fee-for-service to population health. What this calls for is a powerful analytical capability, where you need to understand what is happening, what is going to happen, and what has happened, and that analytical capability is the one which is going to help relieve some of the business performance pressure.

Another aspect is the potential: healthcare is all about saving lives and giving better patient outcomes, so the potential in healthcare is very high. Just to give an example, about a million people get affected by the sepsis condition in America, and if you didn't know, 28 to 50 percent of these people die of sepsis — a number higher than deaths from prostate cancer, breast cancer, and AIDS combined. Now, why are we talking about this? This is one situation where the condition itself is time-sensitive: the sooner you act and analyze, the more potential you have to save that life.
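The talk doesn't show a sepsis model, but because the condition is so time-sensitive, even a very simple screening rule over a vitals record illustrates the kind of analytics being pointed at. The sketch below is purely illustrative and is not Dignity Health's model: it flags a record that meets two or more of the classical SIRS criteria, and the field names (temp_c, heart_rate, resp_rate, wbc_k) are hypothetical.

```python
# Illustrative only: a naive SIRS-style sepsis screen over one vitals record.
# Field names are hypothetical; a real early-warning system is far richer.

def sirs_flags(vitals: dict) -> list:
    """Return the names of the SIRS criteria this vitals record meets."""
    flags = []
    if vitals["temp_c"] > 38.0 or vitals["temp_c"] < 36.0:
        flags.append("temperature")
    if vitals["heart_rate"] > 90:
        flags.append("heart_rate")
    if vitals["resp_rate"] > 20:
        flags.append("resp_rate")
    if vitals["wbc_k"] > 12.0 or vitals["wbc_k"] < 4.0:
        flags.append("wbc")
    return flags

def screen(vitals: dict) -> bool:
    """Classical rule of thumb: two or more SIRS criteria -> escalate for review."""
    return len(sirs_flags(vitals)) >= 2

if __name__ == "__main__":
    sample = {"temp_c": 38.6, "heart_rate": 104, "resp_rate": 22, "wbc_k": 13.5}
    print(screen(sample), sirs_flags(sample))
```

The value of the platform described in this talk is that a rule (or a learned model) like this can run as soon as the data lands, rather than hours later.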

If you look at the burden this condition puts on the system: sepsis accounts for about two percent of hospitalizations, and seventeen percent of in-hospital deaths are attributed to it. So imagine, in the healthcare space, if you can build some analytics around it to predict the sepsis condition, or provide intelligence or exploratory analysis where you can find the good practices and the not-so-good practices — you have the opportunity to make a direct impact and take care of our patients.

If you look at Dignity, as we said, we have 40-plus hospitals and multiple medical systems, so there is a lot of data — 30-plus terabytes of it — but the data sits in silos. What if we can bring it into one place? Then we can start analyzing the complete episodes for a particular person, and that's the real opportunity: once you analyze that, giving the information to the right people can actually make a lot of difference.

Now, this is a slide from Iron Mountain. What it shows is one dimension of the challenges with healthcare data: very sophisticated equipment is generating data in real time, coming from different devices. If you look at this slide, the patient is right there — we are generating so much data to do monitoring — but what if we can do analytics on top of that? We actually have the potential to reduce the risks associated with the patient.

Now, the challenges with healthcare data. Number one is complexity: there are a lot of proprietary formats — if you have heard terms like HL7, and so on — and this complexity creates a lot of obstacles for doing analytics. The variety of data is high: if you go to one EMR versus another EMR versus one registration system, even though functionally it is the same data, it starts becoming different, so analytics becomes extremely complex. And it's fast data — it's constantly evolving. Now, you could put up a very solid big data platform that can absorb as much data as you want and take on all the variety and complexity, but the biggest complexity for healthcare is handling this data in terms of privacy. If you cannot get the protection of privacy right for the data, then it does not matter how big your Hadoop cluster is, or whether you're using HBase or anything else — that doesn't matter at all. These are the challenges one needs to take care of before you even venture into launching your first production job.

That's one side of it. The other side of the data challenge is that a lot of this data is sitting in legacy formats. Our data is not in paper format — that's just a symbolic picture here — but what we are saying is: you may have data, but if you cannot interpret that data for analytics, it just sits there; it's dark data. We have data in mainframe format, so you can imagine files sitting as flat files with COBOL layouts, where one flat file has 17 different record formats and each format needs to be read differently, and that rigidity hurts doing any type of analytics. On one hand we have very sophisticated algorithms — neural networks, Naive Bayes, support vector machines — that can do miracles for prediction, but how do I get the data to those algorithms and that logic? That's the challenge, and that's the challenge we are trying to solve at Dignity Health.
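To make that "dark data" point concrete, here is a minimal sketch of the kind of plumbing those COBOL-era flat files demand: one physical file, several record layouts, each sliced by fixed column positions. The record-type codes and field offsets below are invented for illustration; a real system would have many more layouts and would derive them from the COBOL copybooks.

```python
# Hypothetical example: one flat file, multiple fixed-width record layouts,
# distinguished by a record-type code in the first two characters.
# Type codes and offsets are invented for illustration.

LAYOUTS = {
    "01": [("patient_id", 2, 12), ("last_name", 12, 32), ("dob", 32, 40)],        # registration
    "02": [("patient_id", 2, 12), ("admit_date", 12, 20), ("facility", 20, 26)],  # admission
    "03": [("patient_id", 2, 12), ("lab_code", 12, 18), ("lab_value", 18, 26)],   # lab result
}

def parse_record(line: str) -> dict:
    rec_type = line[0:2]
    fields = LAYOUTS.get(rec_type)
    if fields is None:
        raise ValueError(f"unknown record type {rec_type!r}")
    row = {"record_type": rec_type}
    for name, start, end in fields:
        row[name] = line[start:end].strip()
    return row

def parse_file(path: str):
    """Yield one dict per record, whatever its layout."""
    with open(path, encoding="ascii") as f:
        for line in f:
            yield parse_record(line.rstrip("\n"))

if __name__ == "__main__":
    sample = "01" + "PAT0000042" + "DOE".ljust(20) + "19620415"
    print(parse_record(sample))
```

Once records like these are flattened into a common structure, they can be landed in the data lake and the rigidity the talk describes goes away.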
So what this needs is a completely different mindset — a mindset that goes beyond our traditional thinking of business intelligence, where I ETL the data in, somebody asks me for a report, I give the report, and then I keep the report. Sometimes you may have 10,000 employees and twenty thousand reports; we don't know whether anybody is even looking at those reports, but you keep churning the data and producing insight which is either outdated or dead. What you need is new thinking. We have disruptive technologies like Hadoop, and we have to have a completely new mindset toward the problem, which means: if somebody asks me for an analysis and says, "Hey, I need these four variables," I don't need to go and write an ETL job for those four variables — I can simply bring everything in.

I can keep an open mind and find out what else I can discover, and we can do that now using technologies like Hadoop. So with that mindset, and with the goal to build a world-class, future-state analytical platform, we partnered with SAS and came up with this concept of an enterprise data hub. If you notice, it's Hadoop-centric — you've got Hadoop in the center — however, we brought in unified processes (that's the governance we are putting in place), and on top of that we are taking advantage of the much more mature analytical platform of SAS. So we are taking these two ecosystems and putting them together, and I'll explain what the advantages of this combination are.

If you look at this, what we are trying to do is build the platform once so it can take care of all workloads. It should help me with writing ETL — I don't need to create another platform for that. It will act as a data reservoir, or data lake — it can keep as much data as needed and it is scalable. I can do search — I don't have to go and invest in another search engine. I can do machine learning in combination with the SAS tools. I have the security platform established on top of that. We can do text mining and natural language processing — all those capabilities. What we found to be the advantage of extending our relationship with SAS is that SAS does all types of analytics — they've been doing it for more than 30 years — and we could easily turn on the value for any of the insights, because I have my data, and as soon as I turn on one of the algorithms I start getting value. That's where the integration really helps. But the more critical point is that it is a full spectrum of analytics: it brings hindsight, insight, and foresight; you can do optimization, you can do predictive modeling. In healthcare, although we are starting with a provider and payer focus, at some point we will do marketing, we will do financials, so I can use all these tools to extend the enterprise data hub into an enterprise analytical hub. That's what we are achieving here.

So how did we do this? We started inside out. The critical thing for us is that we wanted to get Hadoop right, and we didn't want to start any use case until we had solidified Hadoop completely. So without even thinking about any use case, the first thing we did was stand up Hadoop and establish audit and logging. If you do not have the right audit and logging in healthcare, your system is not going to fly with your legal and compliance teams. So we tested this entire platform up front: if anybody accesses our data, we will know who it is and what time that data was accessed. In fact, some of the logs become a part of the Hadoop cluster itself — as somebody accesses something, those logs get streamed into Hadoop.

Then we went on to establishing the security, which is very, very critical. We took advantage of the SAS security intelligence platform, which allows us to set role-based security as well as metadata-based security. Why is this important? Obviously we are securing the Hadoop cluster with Kerberos and Sentry for what Hadoop is doing, but at the end of the day this analytics is going to go to the end users, and it needs to be absolutely secured so that we can, with confidence, give our platforms to those end users.

Then we established the data governance. There is a broad conversation about what data governance is — it's not just one product that turns on your data governance; it starts with your own philosophy and the processes you put around it. Our simple philosophy about data governance is this: bring data once, do not bring it again. That starts with creating a single source of truth — if a patient got registered once, you should have one entry for that. And that allows us to use one of the concepts from Hadoop, which is extract, load, and then transform. That's what we are doing — we are not using the traditional approach of ETL; we believe ETL is not relevant here anymore. We can just do ELT and then write as many transformations as I need. It is about establishing tighter governance on the data.
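As a rough illustration of that "extract, load, then transform" philosophy: land the raw feed once, untouched, and express every downstream need as a derivation over that single copy rather than as another extract from the source. The sketch below uses PySpark as a stand-in (the talk mentions Pig and Hive), and the paths, table names, and columns are hypothetical.

```python
# ELT sketch (hypothetical paths/columns): land the raw feed once in the lake,
# then derive as many transformations as needed from that single copy.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("elt-sketch").enableHiveSupport().getOrCreate()

# 1. Load: bring the raw registration extract into Hadoop exactly once.
raw = spark.read.option("header", True).csv("hdfs:///lake/raw/registration/")
raw.write.mode("overwrite").saveAsTable("registration_raw")

# 2. Transform on read: each use case is just another derivation over the same raw table.
reg = spark.table("registration_raw")

readmission_features = (
    reg.groupBy("patient_id")
       .agg(F.count("encounter_id").alias("prior_encounters"),
            F.max("admit_date").alias("last_admit_date"))
)
readmission_features.createOrReplaceTempView("readmission_features")

# A second consumer reuses the same raw copy -- no second extract from the source system.
facility_volume = reg.groupBy("facility_id").count()
facility_volume.show()
```

The point of the governance rule is visible in the last two steps: two different consumers, one landed copy of the data.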

We actually check everybody who is going to create a Hive table: why are they creating it, what is the reason, what was the source — and if it has already been created, you do not need to create it again. And after establishing the data governance, we started our use cases — the journey of use cases. Our experience is that if you get this foundation right, the speed of the use cases picks up and you can run use cases in parallel. In one of the use cases we process the data and push it into SAS Enterprise Miner, the tool which does predictive modeling for us, and at the same time I am doing exploratory analytics using the tool called SAS Visual Analytics.

One other thing we did: we didn't go and buy our own hardware and start assembling the Hadoop cluster — we put it in the cloud. SAS is our partner in this engagement; they have a cloud, and we took advantage of it and put our Hadoop cluster there. What this does for us is let us focus truly on what we want to do, which is analytics. I do not need to monitor the cluster, I do not need to worry about whether some error or warning came up and which logs are being generated — that's taken care of.

And if you notice this architecture, we bring the data into Hadoop — that's the only source we have; all the data comes into Hadoop. We broke all the rules of normalization — we just denormalize; that's the key for us. I don't want to replicate the enterprise data warehouse or relational database structure into Hadoop; when we bring something in, we flatten it. In healthcare, that's critical: the data is not going to be high volume — it's not going to be like Facebook or LinkedIn; we will not have billions of records or transactions. The total population of the United States is about 300 to 400 million people, so if you think about your data, and if you get your data architecture right, you should have roughly 400 million rows, and then you can grow as much as you want horizontally: when a person gets admitted, add a record; after that, if he goes for a test, add that as another column. Think about using a columnar technology like HBase and creating this kind of data architecture.

So our goal is to bring the data into Hadoop and secure it there — we have tight security with Sentry and Kerberos, and only the system talks to it — and then we push the data into the SAS analytical platform to perform the analytical workload. That analytical workload could be data profiling, data quality, building predictive models, or even web reports and standard reports if you need them. BI is not dead — BI is going to be there; BI has its own purpose: it tells you what happened very well, and you need to have that. On the other hand, you want to go for the advanced analytics, which is finding out what is going to happen, or what I do not yet know. That's where SAS Visual Analytics comes into play. It's an in-memory product with an underlying HDFS-based scalable platform, so when I move data from HDFS into it, I can do exploration in memory and try to find out what's happening. To give an example: when somebody gets admitted in the registration system, you could get records with up to 1,900 columns. As human beings we can analyze maybe seven variables at a time, so think about the ability to do multi-dimensional analysis — with tools such as Visual Analytics, you can do that. And on top of that, as I said, we put the SAS intelligence platform, so the user always comes through a single authentication — a seamless experience for them, because it uses LDAP — and then they see the insights. And if we have a data scientist who needs to do deep analytics, we can give them access to the Hadoop platform.

Okay, so we believe this is a pragmatic approach, and this is why: as an architect I always want flexibility in my platform, but I also do not want to go and start creating something which has already been done, because I am going to make the same mistakes and have to mature through them. So I have this mature analytical platform, with mature analytical processes, which I want to take advantage of, but I also want to take advantage of Hadoop and its cost effectiveness and all the power it brings. We are empowering analytics with Hadoop: we bring the data into Hadoop — that's our data lake — then we process it in Hadoop, and then we move data as required into SAS.
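The flattened, "grow horizontally" patient record described above can be sketched roughly as follows, using HBase through the happybase Thrift client. The host, table name, column family, and qualifiers are invented (and the client choice is an assumption, not something named in the talk); the idea is simply one row per person, with each new episode appended as more columns rather than more rows.

```python
# Sketch only: one HBase row per patient, new encounters appended as new columns.
# Assumes an HBase Thrift server and a table with an "enc" column family;
# all names here are hypothetical.
import happybase

conn = happybase.Connection(host="hbase-thrift.example.org")   # hypothetical host
patients = conn.table("patient_longitudinal")                   # hypothetical table

def add_encounter(patient_id: str, encounter_id: str, fields: dict) -> None:
    """Append one encounter to the patient's single wide row."""
    row_key = patient_id.encode()
    cells = {
        f"enc:{encounter_id}:{name}".encode(): str(value).encode()
        for name, value in fields.items()
    }
    patients.put(row_key, cells)

# First an admission, later a lab visit: same row, more columns each time.
add_encounter("P000042", "E1", {"type": "admission", "facility": "MERCY-01", "date": "2015-02-10"})
add_encounter("P000042", "E2", {"type": "lab", "test": "troponin", "date": "2015-02-11"})

whole_history = patients.row(b"P000042")   # the full longitudinal record in one read
print(len(whole_history), "cells")
```

Reading one row key returns the person's whole episode history, which is exactly what the exploration and modeling use cases below rely on.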

Or SAS models can be moved into Hadoop — I can do hybrid; all the possibilities exist. What this does for us is enable the data-to-decision life cycle with the help of SAS and Hadoop combined, and as you can see, Hadoop is at the center of it. So if my data grows, I am not worried — I am going to go and scale it. We started with a few nodes — right now we have six nodes in production, and we are going to add another eight — so as we add data, we keep expanding the nodes. That's the scalability part of it. And at the same time, the power of this is that I can run multiple use cases in parallel — use cases that do exploration-type analytics, predictive analytics, and also modernization of the legacy systems, and so on.

Now, as I said, there are different design patterns — that's what I'm looking at as an architect. I do not want to lock everything into one particular format, so the solution starts fitting the tools I have. I can do the pure Hadoop play: all the workload which makes sense in Hadoop, we will do there. But when we need to, let's say, run a decision tree algorithm, I don't want to go and write a decision tree algorithm myself. One of our learnings in the Hadoop space is that Mahout was taking good shape, but right now, in my view, Mahout is not production-ready for many of the algorithms. So my option would be to go hire a data scientist and ask that data scientist to write code — and we would be rewriting the same algorithm which already exists somewhere else. What SAS allows us to do, with Enterprise Miner, is neural networks, decision trees, Naive Bayes, k-means clustering, association — anything I need. That's the power we are putting together here. And just in case I want to do all of this in memory, I can lift the data into memory and do the exploration there.

So, a couple of use-case architectures. One of our use cases is building a predictive model. Although our theme is big data, it doesn't mean a lot of data every time — sometimes big data means small data. For example, if you are trying to build a predictive model for congestive heart failure patients, how many congestive heart failure patients are you going to have? It's not going to be big data — obviously not everybody in America has a congestive heart failure condition; that's not 400 million rows either. In that case, the smart thing to do — as they said in the keynote — is to use big data to find the small data, and then do smart analytics on that. So in this use case we use big data in the sense that the data is in one place, and from there you extract the data which is very specific to the analytics you want to run, and you apply the smartest algorithm you have. That's the first part of it. In case your data suddenly grows — a lot of people get that same condition — now you have a data challenge; what do you do? You shift gears, use the high-performance analytics capability, and push the analytics into the Hadoop cluster itself. So both platforms exist.

The second architecture is for when you want to do exploration. Exploration means: imagine if I create a longitudinal record in healthcare — every episode and incident since the birth of the person — and we keep adding to it. It would grow, and the technology exists to support that now. Then you want to do exploration of that, and you need an in-memory analytics platform. Visual Analytics allows us to do that; it can even do forecasting-type functions on the fly — these are all math computations on the fly — and in case the data grows, you can push that analytics into parallel-processing mode. As a typical example: if you get your data into Hadoop, if you build the data lake, and if you create a horizontal record, what happens is you can run this type of analysis on the fly. This is one example. One of the challenges in healthcare is the cost of care: if you want to identify where the opportunities are — where my best practices are, where my cost is low versus high — I can do that variation analysis using tools like this. In this case you can see one of the bubbles and start finding the anomalies — why does this particular facility have excessive cost versus some other facility? — and you can drill down to the provider level, and it is going to give you insight.
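The "use big data to find the small data" pattern might look roughly like the sketch below: the full lake is filtered down to the congestive-heart-failure cohort with Spark, and an ordinary in-memory model is then fit on the small extract. The table, columns, diagnosis filter, and the use of scikit-learn (standing in for SAS Enterprise Miner) are all assumptions for illustration.

```python
# Sketch of "use big data to find the small data" (hypothetical tables/columns):
# Spark does the heavy filtering over the whole lake; the model itself is fit
# in memory on the small extracted cohort.
from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

spark = SparkSession.builder.appName("chf-cohort").enableHiveSupport().getOrCreate()

# Big data step: pull only the congestive-heart-failure encounters out of the lake.
cohort = spark.sql("""
    SELECT age, num_prior_admits, length_of_stay, ejection_fraction, readmitted_30d
    FROM encounters_flat
    WHERE primary_dx_code LIKE 'I50%'      -- hypothetical ICD-10 filter for heart failure
""").toPandas()

# Small data step: an ordinary in-memory model on the extracted cohort.
X = cohort.drop(columns=["readmitted_30d"])
y = cohort["readmitted_30d"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

If the cohort ever stops being small, the same filtering step stays in place and only the modeling step moves back into the cluster, which is the hybrid pattern described above.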

It's basically turning analytics onto what we do every day. We are not perfect as human beings; we are going to learn from our mistakes and from what we don't do right — but it is important to find what is not right, and these are the tools that allow us to perform that analytics.

So — oops, okay, all right — what we have accomplished right now: our program is about one year in, and we have enabled several analytics capabilities. I can do machine learning and predictive analytics, I can do descriptive analytics using k-means, I can store as much data as I need, I can do data transformation, and I can bring the data in from multiple sources — from AS/400, mainframe, an Oracle database, a SQL Server database. No matter where the data was, it is now in one place, so it's talking to each other, and I can link it with capabilities like EMPI. We are healthcare — our primary focus is giving better quality — and what this does for us is this: if we have any use case which can save lives, we should not be kept waiting on it because technology constrains us. That was the theme for us. So when we built this, we didn't want to have to go back and rebuild it again; we thought it through deeply — what is the best in the business — and we brought those pieces together. So it is one platform: it can absorb multiple data sources, we can do real-time analytics, we can manage multiple workloads, and we can serve many consumers.

So what are the possibilities? Once you have a platform like this, if you have the resources available, you could run these five use cases. I put these five up because they show the variety in the types of use cases we are running.

The first one is the readmission predictive model — if you are in healthcare, you know how important that is. And you don't just run it once; we can experiment with it. On a daily basis we are trying to find which variables really have an impact and, using that insight, predict whether a person will be coming back after a major surgery.

The second one is building the compass. This is a very important project for our CMIO; he wants to build this compass so that we can find out a lot about the unknowns — what is going on in our data. The data about the patient, when they come in, is used today for deriving a treatment option and treating them; but what if I find that a patient is treated one way by a particular provider, and another provider in some other geographic location is treating the same condition — are they treating it the same way or differently? Is there something we can learn? All of that you can achieve by bringing the data into one place and giving this exploration capability to our data scientists and clinical informaticists, including doctors. Because in healthcare — unlike retail, where I found you can have people with MBAs analyze the data — to make a true impact you ultimately need to go to the doctor, who is the knowledgeable person about this data and this science.

The third use case is legacy report system modernization. This doesn't sound very glamorous or clinical, but as in the keynote address, there are things we should challenge ourselves on as IT professionals. Why is an ETL job running for nine hours in this day and age? If we have the technology, we should challenge that and migrate that ETL job somewhere else. Why does it take 80 or 100 hours to create a custom report? It costs money. Why? Because the data is in the mainframe, and the only way I can get it is to find a guy who knows RPG and COBOL programming; he can extract it and bring it over, and by the time he gives me that data, the importance of the data is gone. Those constraints need to be resolved, and if you get your architecture right, these are the side benefits you start getting: now you can say, hey, I already have this data in Hadoop; I can use Pig or Hive, or I can put any tool on top with which people can write SQL and create a report on the fly.

We are doing another use case on pharmacy analytics, so you can start spreading your arms across the organization and say: if there is any other analytical use case, you can do it. What's motivating here is this.
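For the legacy-report modernization point, the idea is simply that once the data already sits in a Hive table, an ad-hoc report becomes a query rather than an 80-hour mainframe extract. A minimal sketch, assuming a HiveServer2 endpoint and a hypothetical encounters_flat table (the talk says "any tool on top" would do; PyHive is just one such choice):

```python
# Sketch: the mainframe custom report becomes an ad-hoc query against Hive.
# Host, credentials, table, and columns are hypothetical.
import pandas as pd
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.org", port=10000, username="analyst")

report = pd.read_sql(
    """
    SELECT facility_id,
           COUNT(*)            AS discharges,
           AVG(length_of_stay) AS avg_los,
           AVG(total_cost)     AS avg_cost
    FROM encounters_flat
    WHERE discharge_date >= '2015-01-01'
    GROUP BY facility_id
    ORDER BY avg_cost DESC
    """,
    conn,
)
print(report.head(10))   # the "report" is just a DataFrame now; export it however you like
```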

Traditionally, analytics projects were treated like any other IT project: I come up with an estimate, I need data, ETL, and so on. Now, since the data is already sourced and ready to be processed, and with languages like Pig your productivity is very high, you can easily focus on the use case instead of the project management of how I get the data so that my data scientists can start. That is the gap we are trying to reduce. We want to cut out the middleman — that is what our CMIO says: doctors can analyze this data better than the IT guy, so take the IT guy in the middle out, let the data come into the Hadoop platform, enable the analysis, and they can make better decisions.

And the last use case listed here is EMPI. If you go to a doctor's office — every time you go, every place you go — you have to fill out these forms: first name, last name, date of birth, again and again. That's fine, but if it is entered incorrectly, what happens? You become two different persons in the system. If you are two different persons in the system, you are potentially going to get two different orders for another blood test, even though your results may be okay, and that creates a lot of burden and cost for everybody. So what do you do? How do you solve that? You can build a capability like a universal identifier. Hadoop is very good for batch processing and for doing this type of analytics — it can go and find out, "oh, this person already exists." If I can establish that identifier, then imagine that every time a new record comes in, I tag it and create this network of data: I can connect the blood reports with the other reports, I can connect the admission data, I can connect the ambulatory data — all of the data created by one provider versus another provider. That's the possibility we are creating, so that we can analyze this effectively.
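A toy sketch of the matching step behind that EMPI idea: block on date of birth, then score name similarity, so that "Jon Smyth" and "John Smith" with the same birth date collapse onto one universal identifier. The fields, records, and threshold are invented; a production master patient index uses far richer probabilistic matching.

```python
# Toy EMPI-style matcher (illustrative only): block on DOB, fuzzy-match on name.
# A real master patient index would use many more attributes and tuned weights.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"mrn": "A-1001", "first": "John",  "last": "Smith", "dob": "1962-04-15"},
    {"mrn": "B-2002", "first": "Jon",   "last": "Smyth", "dob": "1962-04-15"},
    {"mrn": "C-3003", "first": "Maria", "last": "Lopez", "dob": "1980-09-02"},
]

def name_similarity(a: dict, b: dict) -> float:
    full_a = f"{a['first']} {a['last']}".lower()
    full_b = f"{b['first']} {b['last']}".lower()
    return SequenceMatcher(None, full_a, full_b).ratio()

def probable_matches(recs, threshold=0.8):
    """Pairs that share a DOB and have sufficiently similar names."""
    for a, b in combinations(recs, 2):
        if a["dob"] == b["dob"] and name_similarity(a, b) >= threshold:
            yield a["mrn"], b["mrn"], round(name_similarity(a, b), 2)

print(list(probable_matches(records)))   # the John Smith / Jon Smyth pair is flagged as one person
```

Running a pass like this in batch over the whole lake, and tagging every new record against the resulting identifier, is what lets the lab, admission, and ambulatory data hang together as one network.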
So at Dignity Health, change is happening. We just wanted to share the technology changes we are making so that we will continue to be one of the leading healthcare providers in the nation. If you have any questions or discussion, I am going to invite Dr. Graham Hughes up on the stage so we can take your questions.

This is the cloud infrastructure of SAS — it's not a true Amazon-style public cloud; I would call it more like a private cloud. But since it's a collaboration between SAS and us, we get our own secure machines and we control the configuration of those machines. So the cloud infrastructure is shared, but the architecture — the platform itself — is controlled by us.

Great question. The question, from Muhammad at WellCare, is: they're a big SAS shop, so how easy is the transition from SAS to Hadoop, and what is the change of mindset? If you look at the data when you do analytics, it's pretty much flat. Originally the data was coming from, let's say, a relational database, and you were spending your time on transformation in some ETL tool. What we did is take that and train our people to write the Pig language. It's very easy and straightforward: people who do SAS can write Pig easily — if you know SAS programming, there is a lot of similarity. In addition, we put a stricter schema on with Hive, so any data that comes in is readable because it has Hive over it. Anything you want to add on that?

The only other thing I would say is that the routines you already have in place don't necessarily need to be replaced. As you start to deploy Hadoop within your environment, you can leverage it either to supplement the existing SAS data sets and the routines and algorithms you already use, or you can start to point SAS directly at Hadoop to run your analytics there — whether that's your data quality, data preparation, data management, your visualization, or your other analytics — directly in there.

So again, you've got the choice of both worlds, but you don't have to rip and replace.

Yes sir — great question. We have Cerner, and Cerner is Oracle-based, so we have created our own instance of the Cerner data. It has real-time replication — we do not want to hit the transaction system, the OLTP system — so we have a replica of the OLTP in near-real-time synchronization, and then we are connecting Hadoop to this Oracle instance of Cerner.

They also have Meditech, but they are in the process of sunsetting that as they move to Cerner, so we've decided not to spend the time and energy connecting Meditech. On the ambulatory side they do have a wide variety of different ambulatory EMR and EHR systems that we're in the process of evaluating — some of those are actually Cerner, others are the full gamut of different ambulatory EMRs you might imagine — and we're looking at consolidation strategies to interact with those. And then, as Sunil and I were chatting about before, we've been talking about using some of the emerging capabilities as we start to stream more data, and we will also be consuming some of the FHIR interfaces through HL7 as a way of more dynamically querying into the EMR environments.

Yes, absolutely, that is a great question: do we have a plan to consume HL7 into Hadoop? Yes. The way we architected it, if I want to take the queue from HL7 and start parsing it into Hadoop — yes, that's the plan. Right now we don't have a use case, but it's coming soon, and our strategy is to absorb it through a queue and bring it in; the capability exists.

And that's what we decided: we have the platform, so you can bring in images and do analytics on them. We do not have any use case right now to bring in images, but the technology constraint does not exist anymore.

Yeah, I think he is asking — okay, great question — how do you address data sharing? On one hand you want to give the data to more people; on the other hand you have to restrict it. There are different user groups, so you have to start from there: there are user groups which are only supposed to see exactly what they need to know, and then there is the data scientist group, which needs to explore the data in depth. So we had to define that, and that is what we did with the role-based security: it allows us to control how much you can see and what you can see.

Yes sir — no, we do not do that yet, but it's guaranteed to come; maybe you can answer that. Pretty much everything that we're doing is on that spectrum of decision support, I think. If you think about the traditional concurrent, real-time, rules-based decision support that exists within an electronic health record — generally relatively rudimentary — then as we get closer and closer to real time, the boundary between real-time concurrent decision support and asynchronous decision support blurs. The decision support that occurs through our readmission-reduction risk scoring and our visualizations of care-process variation, which we are pushing to end users, is a form of decision support. And as we provide guidance on a combination of risk scores, and as our next phase provides recommended actions associated with some of these insights, I think what we're going to see over the next few years is that this concept of point-of-care decision support becomes a complete blend of knowledge, from declarative, rules-based systems right through to the data-driven systems we will have in place.
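On the HL7 question above — "absorb it through a queue and parse it into Hadoop" — the parsing side can be sketched in a few lines, since HL7 v2 is a pipe-and-caret delimited text format. The message below is contrived, and a real pipeline would use a proper HL7 library and far more validation before landing the result in the lake; this just shows the shape of the flattening step.

```python
# Sketch: take an HL7 v2 message off a queue and flatten a couple of segments
# into a dict ready to be landed in Hadoop. The message content is contrived.

SAMPLE_HL7 = "\r".join([
    "MSH|^~\\&|REGSYS|MERCY01|HUB|DIGNITY|20150210130500||ADT^A01|MSG00001|P|2.3",
    "PID|1||P000042^^^MRN||DOE^JOHN||19620415|M",
    "PV1|1|I|ICU^02^01|||||||||||||||V123456",
])

def parse_hl7(message: str) -> dict:
    segments = {}
    for raw in message.split("\r"):
        if not raw:
            continue
        fields = raw.split("|")
        segments.setdefault(fields[0], fields)
    pid = segments.get("PID", [])
    pv1 = segments.get("PV1", [])
    return {
        "event": segments["MSH"][8] if len(segments.get("MSH", [])) > 8 else None,
        "patient_id": pid[3].split("^")[0] if len(pid) > 3 else None,
        "name": pid[5].replace("^", " ") if len(pid) > 5 else None,
        "dob": pid[7] if len(pid) > 7 else None,
        "location": pv1[3].replace("^", "-") if len(pv1) > 3 else None,
    }

print(parse_hl7(SAMPLE_HL7))
# {'event': 'ADT^A01', 'patient_id': 'P000042', 'name': 'DOE JOHN', ...}
```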

And it leads to another area that Sunil and I talked about, which is the need for curation — the knowledge management of the decisions and the knowledge framework being generated and executed in the environment — and we have a number of plans for how we will maintain and manage the publishing cycle of that knowledge. So in summary, I'd say we are doing decision support today, just maybe not the way it would have been thought of five years ago.

Sure — great question. Our existing readmission model is not yet using the claims data; we have the registration data as one of the sources, so it is primarily clinical data plus registration and administrative data. Paid claims and unpaid claims are being added into those models as we speak. I think we need to finish — thank you very much. Yeah, thank you.