Behavioral Assessment with Mobile Sensors

Hello everyone. Welcome back to the online version of the NIH mHealth Training Institute. This is the module on behavioral assessment with mobile sensors. My name is Santosh Kumar. I'm an Associate Professor of Computer Science at the University of Memphis.

I'll begin by talking about the guiding principles of developing behavioral assessments from mobile sensors. When we look at any particular mobile sensor and target a particular behavior for assessment, we ask ourselves a question about the method we are going to develop: will it be zero burden? That means it should not place any burden on the participants to collect those measurements. If we can do so, then these measurements can achieve large-scale adoption over the long term, and we can indeed have assessment in the natural environment for long periods. The next question we ask is: can we make this assessment continuous, so that any changes in the behavior we are targeting will be captured? For example, if we are targeting stress, stress can occur at any moment, and therefore the measurements should be collected on a continuous basis, at a high enough frequency, that any occurrence of stress is captured. The third question: since this is going to be a mobile assessment in the natural environment, it must be robust, which means that when people wear it in a free-living natural environment we should get predictable performance out of it; that is, it should tolerate all the noise and confounders. And then, the whole purpose of developing a new measurement is that it helps improve some clinical outcome. So we should always ensure that what we develop is clinically useful, by either validating it against an existing gold standard or demonstrating its predictive utility for a clinical outcome. Finally, if we are developing a new measure, a measure of this nature should not have existed before, so it must be novel, or fill a long-standing gap that we have had for some time, or represent a new capability, something that we had not been able to do before but can do now. Those are the characteristics of the assessments or measurements that we seek using mobile sensors.

Now, a brief introduction to mobile health systems. Mobile health systems consist of various components. Beginning at the lowest layer, there are sensors that we wear on the body, which can measure, for example, our movement (accelerometry), activity, and location. We can also have sensors embedded in the mobile phone itself; these include the accelerometer, the microphone, and GPS. All of these measurements are collected. If they are collected on the body, they are transmitted over a wireless channel to the mobile phone; if the sensors are embedded in the mobile phone, the measurements stay right there. So all the measurements, whether from the body or from the mobile phone's own sensors, are synchronized together at the mobile phone. Then they are processed to clean up the signals and extract features of interest, such as heart rate variability from the ECG sensor or location from the GPS sensor. Once we have those features, we can make inferences about human behaviors by applying appropriate models, and these models, again, can be implemented right on the mobile phone.
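To make the on-phone processing step a bit more concrete, here is a minimal sketch, in Python, of the kind of per-window feature extraction just described: raw ECG samples, detected beats, inter-beat intervals, and a simple heart rate variability feature. The function name, sampling rate, and detection settings are illustrative assumptions, not the actual implementation used in these systems.

```python
# A minimal, hypothetical sketch of the on-phone feature-extraction step:
# raw ECG samples -> R-peak detection -> inter-beat intervals -> simple
# heart-rate and HRV features per window. Names and settings are illustrative.
import numpy as np
from scipy.signal import find_peaks

def ecg_features(ecg, fs=64.0):
    """Return mean heart rate (bpm) and RMSSD (ms) for one window of ECG."""
    ecg = np.asarray(ecg, dtype=float)
    # Detect R-peaks; a real system would filter the signal and adapt the
    # threshold, but a fixed prominence is enough to illustrate the idea.
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), prominence=np.std(ecg))
    rr = np.diff(peaks) / fs                  # inter-beat intervals (seconds)
    if len(rr) < 2:
        return None                            # too little data in this window
    hr = 60.0 / rr.mean()                      # mean heart rate in beats/min
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2)) * 1000.0  # HRV feature in ms
    return {"mean_hr_bpm": hr, "rmssd_ms": rmssd}
```

Features like these, computed window by window, are what the behavioral inference models described next would operate on.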
And when we have these inferences of behavior, for example a person's stress level, smoking status, or activity status, they can be used directly to trigger an intervention or to trigger a self-report. Or, if we collect these measurements across lots of people, they can be used offline for conducting health research or formulating health policies. These measurements can also be shared with health care providers, who can then make use of them to guide and inform their treatments. So in the following, when I talk about mobile health sensors, this is the kind of architecture involved and how the data are collected, processed, and potentially acted upon.

There is a variety of sensors now emerging which can be used for monitoring of behaviors. There is the ECG sensor, which can be used to infer

stress. There is the respiration sensor, which can be used to infer smoking, eating, or conversation behavior. GPS can help infer exposure to the environment and the current context surrounding the person. Smart watches have recently emerged and can be immensely helpful: they can track the motion of the arms, and since the arms follow fairly distinctive motions during smoking, eating, or drinking episodes, those episodes can potentially be detected from smart watch sensors. Microphones, which can be attached to mobile phones or are already embedded in them, can be used to infer social interactions as well as the acoustic context surrounding the person. Another sensor that has recently emerged is smart eyeglasses, which offer two opportunities. One is to know what the person is exposed to: if somebody who is trying to quit smoking is wearing these eyeglasses, we can figure out whether they are looking at a smoking advertisement or other cues, for example seeing somebody else smoke. And if they were not able to resist the cues and drove to a gas station, we can figure out whether they are gazing at a cigarette pack. So lots of different cues can be detected from smart eyeglasses. They can also look inward into the eye to detect fatigue or pain. There are numerous exciting opportunities from the eyeglasses. And then you could also have environmental sensors connected to the mobile phone, which can measure the person's exposure to environmental conditions. So, in summary, a variety of behaviors and environmental exposures can now, or in the near future, be captured to inform behavioral assessment.

When we deploy all of these sensors, what we get from them is a time series of sensor data. The time series of sensor data by itself is not very useful. To make it useful, we need to convert it into informative measurements. That means developing the proper computational models which, when applied to this time series of measurements (here ECG, respiration, and accelerometry, at the top of the slide), could, for example, infer at what points in time the person was stressed, at what points in time they were smoking, at what points they were using cocaine, if at all, when they were walking, and when they were talking. And if all these measurements are time synchronized, then we can look at relationships across these behaviors (a small sketch of this kind of time alignment follows at the end of this passage). So that is what we desire.

So why has it been hard to do this, or what are the technical challenges in accomplishing this vision? The first challenge is to have the proper sensors: ones that can be worn conveniently in the mobile environment, that are robust enough to collect good quality data, that last sufficiently long, and that are small enough to be worn unobtrusively. That is the sensor development challenge. Next, the data we collect from these sensors are going to be noisy and will contain several confounding events, because we live our free life with these sensors. The next challenge is that of big data: we must be able to screen, clean, and make inferences from this data, figuring out when the sensor was worn, when it was not worn, and when the data is usable and when it is not. And after that, when we want to infer a particular behavior from a given sensor, we have to de-multiplex, or extract, the information of interest.
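Here is the time-alignment sketch referred to above: a rough illustration, with made-up stream names and synthetic values, of bringing several per-window feature streams onto a common one-minute timeline so that inferences about different behaviors can be related to one another.

```python
# A rough, hypothetical sketch of time-synchronizing two feature streams
# (ECG-derived and respiration-derived) onto a shared one-minute grid.
import numpy as np
import pandas as pd

# Hypothetical per-window feature streams arriving at different rates.
idx_ecg = pd.date_range("2015-06-01 09:00", periods=120, freq="30s")
idx_resp = pd.date_range("2015-06-01 09:00", periods=60, freq="1min")
ecg = pd.DataFrame({"mean_hr_bpm": 70 + 5 * np.random.randn(len(idx_ecg))},
                   index=idx_ecg)
resp = pd.DataFrame({"breath_rate": 16 + 2 * np.random.randn(len(idx_resp))},
                    index=idx_resp)

# Resample every stream to one value per minute and join on the shared timeline.
aligned = pd.concat(
    {"ecg": ecg.resample("1min").mean(),
     "resp": resp.resample("1min").mean()},
    axis=1,
)
print(aligned.head())
```

Once everything shares a timeline, a stress inference at 2:15 pm can be placed next to the smoking and activity inferences for the same minute and their co-occurrence examined.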
So what happens is this: if you take any example of a sensor, say an ECG, the ECG will be affected by physical activity, will be affected by stress, and may be affected by drug use such as cocaine. What we get is a single ECG response, a single ECG time series, and the responses to all of these events are superimposed together on that ECG response. That is what we call, in engineering, a multiplexed signal. What we need to do is somehow tease out the individual events whose effects we are trying to isolate.

We can then infer a variety of behaviors, such as stress, smoking, conversation, or drug use, from a variety of sensors. So for any sensor we pick, we are faced with this problem of de-multiplexing, or extracting, the measurement of interest. And since health decisions may depend on these measurements, it is extremely important that we assure good quality of inference from them. A chain is only as strong as its weakest link. So throughout this entire system, from sensor development to cleaning and processing to inference, our pipeline has to be robust and rigorous and should have predictable quality, so that we can make health decisions with confidence.

To further illustrate the challenge of inference: it has been known for a hundred years that stress activates the autonomic nervous system and that its effect can be observed in the ECG signal. One would expect, then, that if we look at the arousal in the ECG signal we should be able to infer that somebody is stressed. But it turns out that lifting a hand, public speaking, ordinary conversation, or any caffeine use also elevates the ECG. So when we see an elevation in the ECG, is it from stress, from activity, or from conversation? Unless we tease that apart, we will not be able to make inferences of stress, and the measurement will not be usable. As if that were not enough of a challenge, every person is different, so there is wide diversity in the physiological response to the same stressor across persons. There are differences due to age, gender, and ethnicity. And even if all of those are the same, each of our bodies is made differently and we react to the same stressor differently, given our coping methods and our fitness levels. So any model that we develop should be able to infer the measurement, in this example stress, for any given person without them having to go through a calibration session in which we expose them to probable stressors to calibrate the model. Any model we develop should be able to account for all this diversity and still work for each individual. Finally, this is a new measurement that we are developing. If nobody has had this measurement before, especially in the mobile environment, chances are we do not have any existing gold standard. So when we claim that this particular measurement is useful and valid, how do we validate it in the absence of a gold standard? That is another big challenge when we go to develop any new measurement of behavior from mobile sensors.

There are several benefits of developing these new assessments and measurements from mobile sensors. One is that once we have this comprehensive characterization of the individual, in terms of the behaviors they engage in and the exposures they have had, if we analyze it together with genetic analysis we can try to tease out the interactions in which genes are involved and get to the causation of complex diseases. Another benefit is that we can use these measurements to help realize the vision of personalized medicine, called P4 Medicine (Predictive, Preventive, Personalized, and Participatory Medicine), and Precision Medicine. How? Remember the time series of measurements I showed.
If we are able to collect measurements of various behaviors and adverse health events, such as smoking lapses or drug use lapses, the onset of congestion in a congestive heart failure patient, or an impulsive eating event in those who are combating obesity, then we can actually go back in time, because we also have measurements from the other sensors that capture location, who the person was with, and whether they were stressed. We can go back in time and try to find out whether there are any signatures in these sensor measurements that are predictors or precipitants of these adverse behaviors or adverse health events. And, if so, can we detect these predictors from the real-time measurements of the sensors? If we can do that, then we can intervene well ahead of time, before the person lapses or goes through an adverse health event. And if we can do

that, then we will be able to realize the vision of P4 Medicine and Precision Medicine for a variety of health outcomes.

So, the key questions we ask. I talked about the guiding principles for developing a new sensor-based measurement. When we identify a particular measurement, we ask ourselves a series of questions to decide whether we should invest the research effort in that particular behavioral measurement. The first question is: why is this measure important? Is it generalizable enough, where will it be used, or is it just an interesting scientific exercise? Unless we establish wide utility, the investment will not be warranted. Second: are we going on a wild goose chase? This measurement has not existed, it is novel, but is that because it is just not a solvable problem, because it is too hard to do? To understand that, it is important to ask: such a measurement was desirable, and we have not been able to get to it; is that because the technology did not exist, or because we did not know how a certain behavior is precipitated or what happens during that behavior? We need to get to what has been the core problem or core hurdle in developing this particular measurement. Third, what are the robustness requirements of this particular measurement? Does it need to be measured continuously? How quickly does the target phenomenon vary? Stress can happen at any time. Eating can happen at any time. Drug use can happen at any time. Smoking can happen at any time. But if it is, say, congestion in the lungs of a patient, or gene expression, these change more slowly, and therefore we may not need as continuous a measurement, or a measurement on the go; it may be sufficient to do the measurement in the home. So, given the target measurement, what are the robustness requirements with respect to variability in the subject, variability in the situations of the same subject, and variability of the environment where the sensor needs to operate? And then, if we develop it, how would we declare that we have succeeded?

What I am going to do now is illustrate this entire process with a case study that we recently published. This is about modeling cocaine use from ECG data. Thus far, cocaine use has been measured in the lab setting, but there did not exist any method to detect cocaine use from sensor data so that you can pinpoint the timing of when cocaine use occurred. And, again, if we can do that, then we can go back in time, find predictors of cocaine use, and use them in a prevention program. What I am going to describe is, first, the data collection: how did we collect the data in the lab and in the field? Then I will talk about what kind of data we got from this study. Then I will get to how we developed the model and what the challenges were in developing it. And then I will conclude this case study with prospects: what do we do now that we have this new measurement, and how do we use it?

To collect these measurements we used a sensor system that we developed for collecting physiology. It is called AutoSense. AutoSense was developed with funds provided by NIDA as part of the Genes, Environment and Health Initiative at NIH, and it has been developed in collaboration with, and directly at, the lab of Dr. Emre Ertin at Ohio State University.
He is the electrical engineer who is developing all the sensors that we use. AutoSense, by now, is fairly mature for use in scientific studies. We have had 30 daily smokers who wore AutoSense for one week in Memphis, and we recently had 42 drug users who wore AutoSense for four weeks in the field environment. Using the measurements we have collected from AutoSense across the various studies we have done with it, we have developed models of the physiological stress response, and we have developed the model for detecting cocaine use that I am going to present in this case study. We have also developed some preliminary models for detecting smoking from respiration measurements and conversation from respiration measurements.

We conducted an in-residence lab study to collect the cocaine response on ECG in a lab setting, where we could have clean data with which to develop the model. We had drug users at the Johns Hopkins BPRU lab, which is an

in-residence facility, and we had study weeks, the first, third, and fifth weeks; in those weeks, whenever there was a cocaine session, participants wore AutoSense for 8 hours, not just during the cocaine administration session but also during free-living activities. But there was somebody watching, so if there were certain events such as smoking or TV watching or video games, they were all recorded so that they could potentially be used as cues in developing the model. We then had another lab study at NIDA, which is ongoing, where six more drug users went through an in-lab, in-residence study. In the Johns Hopkins study we had lab administrations of 10mg, 20mg, and 40mg of cocaine, and in the NIDA lab study we had 25mg of cocaine uniformly for each subject. So, in all, we have nine participants on whose data the model was developed.

To validate the model, we collected data in the field from 42 illicit drug users who wore the sensors for up to four weeks. Some of them are still continuing in the study, and four of them dropped out after the first few weeks because of not complying with the protocol. When participants wear the sensors in the field, they report to the lab daily, and while in the field they self-report episodes of stress, smoking, craving, and drug use. For drug use they also mark how long ago they used the drugs.

Next I am going to describe what kind of data we got from the field, because this is one of the first times that physiological measurements have been collected for such a long duration, and that from a difficult population. As I said, we have 42 participants, more than 35 of whom have completed, and some of whom are currently ongoing. We have 922 person-days' worth of data already collected. That amounts to more than 10,000 hours of both ECG and respiration data. Participants also completed EMA self-reports when prompted, and they provided smoking reports and drug use reports. We had 211 reports covering all varieties of drugs, out of which 142 were cocaine.

The first step in the data analysis is the screening and cleaning part: which data is usable and which is not, when was the sensor not worn, when was the sensor loose? Unless we figure that out, we will not be able to get useful results. What I show here are examples of how we detect when the sensor signal quality is good and when it is not; we look at the morphology of the signal to decide when the signal quality is usable. We got good quality data, more than 11 hours per day on average of physiological sensor data, even though the ECG uses electrodes that participants have to put on and take off by themselves. But we also wanted to look at what worked and what did not. There are several factors that could be responsible for data loss and that could be improved. So we looked at factors such as the phone being turned off, either because of the battery or intentionally by the participants. We looked at the sensors being turned off, being off body, or being taken off, perhaps because participants were engaging in some activity such as contact sports or other physical activity. And then, how much data did we lose due to attachment issues, and how much because of wireless losses?
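Going back to the screening step for a moment, here is a simplified, hypothetical sketch of the kind of per-window quality check described above. The thresholds and heuristics are illustrative assumptions, not the actual criteria used in the study.

```python
# A simplified sketch of deciding whether a window of ECG is of acceptable
# quality before any inference is attempted. Thresholds are illustrative only.
import numpy as np
from scipy.signal import find_peaks

def acceptable_ecg_window(ecg, fs=64.0):
    ecg = np.asarray(ecg, dtype=float)
    # 1. A detached electrode often yields a flat or saturated (clipped) signal.
    if np.ptp(ecg) < 1e-3:
        return False
    if np.mean(np.abs(ecg) >= 0.99 * np.max(np.abs(ecg))) > 0.2:
        return False
    # 2. Beats should be detectable at a physiologically plausible rate.
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), prominence=np.std(ecg))
    if len(peaks) < 2:
        return False
    rr = np.diff(peaks) / fs
    # Inter-beat intervals outside roughly 0.3-2.0 s (30-200 bpm) suggest noise.
    return bool(np.all((rr > 0.3) & (rr < 2.0)))
```

Windows that fail checks like these would be excluded before feature extraction, which is what keeps the downstream inferences trustworthy.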
It is important to note that the AutoSense sensors collect the data but do not store it locally; they transmit all the sensor data to the mobile phone over wireless channels at 32 packets per second. So every second they transmit sensor data to the mobile phone 32 times, and this is done continuously. As a person moves through various environments, there could be interference, there could be losses, and there could be physical separation between the sensors and the phone: the sensors could be worn on the body while the phone sits on a table, or people could go to visit the restroom. So there are several reasons why we could also have losses on the wireless channel. What we found, in terms of good data, is that we lose less than an hour per day due to the phone being off. Participants are in the study for more than 14 hours per day; that is, when they put the sensors on in the morning and take them off at night, there are

14 hours in between when they are in the study; that is when we consider them to be part of the study. Out of that, the phone is on for over 13 hours and the sensors are also on for over 13 hours, with the sensors worn properly on the body; so for over 13 hours per day the sensor is on, the phone is on, and the sensor is on the body, and out of that we get over 11 hours per day of acceptable quality data. Things to note here: we do not lose a lot of data due to the sensor being off or the sensor's battery running down, because these sensors last over 10 days even with continuous sampling and wireless transmission. We also do not see participants taking the sensors off their bodies frequently. So why are we losing data? We are losing data because it is not of good quality. We then investigated why it is not of good quality. The primary reason is not wireless packet loss or wireless disconnection; it is actually attachment loss, and within attachment loss it is intermittent loosening: while the person is walking or going about various things, the respiration band might slip or the electrodes may become loose over time. So most of the loss is due to attachment issues at this point.

So what worked? The wireless link is quite reliable and is reasonably mature now. Physical separation between sensor and phone is not a major issue. Sensor lifetime is quite acceptable. Respiration sensor attachment is also robust. The skin contact requirement for ECG probably is an issue. The phone being off is an issue, because we lose about 0.8 hours per day of data; with phones becoming more energy efficient, or with smarter sampling and processing on the phone, we could further improve that. And at this point people are not in the study for over nine hours a day, and one could make it more convenient to have them in the study for longer.

Next I will talk about model development and evaluation. What has been known is how the body responds to cocaine when it is administered in the lab. People have administered cocaine in the lab through various modalities (smoking, sniffing, intravenous (IV), or dilation), and the effects differ in terms of how delayed the action is and what the duration of action is; but consistently, cocaine does activate physiology. Here is a snippet of data from the lab at Johns Hopkins. What you see here are administrations of cocaine at 1mg, which is a placebo, then 20mg and 40mg. The green dots are the intervals between successive beats of the heart, so if the heart rate interval has a low value, the heart rate is rising; it is the inverse of the heart rate, and a dip in the heart rate interval means a rise in the heart rate. As you can see, activity activates the physiology, which increases the heart rate, or decreases the heart rate interval, and so does cocaine. You can also see that the change in physiology induced by activity is even greater than that due to cocaine. So yes, there is a pronounced effect due to cocaine, but there is also one due to activity, and that becomes a big confounder for us. What you see next is data from the field, where there are a lot more activities happening, and you see that the cocaine report the participant provided was about 100mg, about 80 minutes earlier. The two dips you see before that solid line with the cocaine marking are the potential candidates that could be due to activity or could be due to cocaine.
Our goal is to be able to apply the model to figure out which of these is due to cocaine, so we can pinpoint when cocaine was used. There are several challenges in developing a model for automated assessment of cocaine use. First, this is a time series of data, and the model we are talking about will get only this data as input and is supposed to figure out which of the responses is due to cocaine. That means it is going to look for a dip, or valley, in this heart rate interval time series, but as you can see there are lots of little and large

valleys, some due to activity, others due to tiny movements, others due to cocaine. So its first job is to locate the beginning and end of each of these dips in the heart rate interval. Then there is wide variation in the dosage and modality of administration, and the model should be impervious, or at least tolerant, to this variation in dose. We administered up to 40mg in the lab, and what people report in the field is up to 400mg, so we have never seen those kinds of doses in the lab setting. In the lab what we had was mostly IV, and what people report from the field is smoking, sniffing, dilation, all various kinds of modalities. Next, we have no field data for those who were in the lab, and all the field data we have is for those who never had a lab administration of cocaine. That means we cannot train a model in the lab for a particular person and expect it to work in the field; we need a model that can work on anyone without a cocaine administration in the lab. And then, as I showed, we need to distinguish the dips in the heart rate interval that are from activity from those that are due to cocaine. The issue is this: suppose we look at the accelerometry, see that there is not much movement indicated, and also see a significant dip in the heart rate interval; then we could perhaps infer cocaine use. But what we usually notice is that when people take cocaine they are physically active too; they are not static, as they are when we strap them to a chair during the lab administration. So we do not really need to distinguish cocaine from physical activity; we need to disentangle the effect of concurrent physical activity from that due to cocaine. And without lab training data we could not ask people, after administering cocaine to them, to now please walk on a treadmill. There has also been recent work on classifying each ECG cycle as belonging to the cocaine class or not, using clean data from the lab. In the output of that model, the red marks denote where the model thinks it is cocaine and the blue ones where it thinks it is not. As we see, if a machine learning model is trained on clean data, it is not able to generalize, even during the lab session, especially when there are physical activities.

So our approach is to take some clues from how the body responds, from what happens in the body when cocaine enters the bloodstream. The first task is to locate the candidate windows. What we use is something from stock trading. If you look at the Dow you see lots of noisy variation in the stock price, and still stock traders are able to figure out when a stock is on an upward or downward trend; they are able to locate the beginning and end of those valleys. We borrow the approach they use to locate those upward and downward trends to find the beginning and end of the segments in the time series data that may indicate the start of a cocaine use episode and of the recovery. This method is called the Moving Average Convergence Divergence (MACD) approach. Then comes physical activity decomposition: a person took cocaine, but they are doing physical activity as well. How do we figure out whether, in a particular instance, the dip in the heart rate interval is due to physical activity alone, or to both physical activity and cocaine? We use only clean recovery segments.
Our idea is that when the body recovers from an elevation in heart rate, the parasympathetic nervous system is acting by itself, as long as there is no other factor driving the sympathetic nervous system. If we look at a clean recovery segment, when the body is trying to recover to baseline and there is no further activity, then during that time it is only the parasympathetic nervous system that is active, and we can then estimate

the behavior of the parasympathetic nervous system using those segments. Now, what happens if the person had cocaine as well as physical activity? Even during those recovery segments, when there is no longer any physical activity, there is still cocaine left in the bloodstream; it has not fully metabolized, and therefore it is still driving the sympathetic nervous system. That means the recovery is not going to be as quick, because there is a dampening of the recovery: cocaine is stretching the recovery, trying to prevent it by activating the sympathetic nervous system. So by looking at the rate of recovery, one could distinguish cocaine from physical activity. But we will not know the rate of cocaine-dampened recovery when activity is mixed in. So what we do is estimate the behavior of cocaine metabolism in the body and then estimate what its effect will be on the sympathetic nervous system. We basically model the behavior of the entire autonomic nervous system during the recovery period of the heart rate intervals that we are looking at as a time series. The recovery constant for each individual is usually different; it is sometimes also used to assess the fitness of an individual by having them go through a stress test. In our case, we had a variety of physical activity episodes in the field. We are able to look at those, and then look at the urine reports to identify the days on which these participants did not have cocaine. From those days we look at their activity episodes and estimate the time constants associated with their own parasympathetic nervous system recovery. But when it comes to the cocaine model, that is, the cocaine metabolism model, we used a population model, because we do not have any lab data for those in the field.

In summary, the entire pipeline is as follows. We get sensor data. We detect the heart rate intervals and remove outliers. We locate the candidate segments, doing smoothing to find the beginning and end of those episodes of possible cocaine effect, and then we identify the window. We prescreen the windows: if a window is not wide enough or deep enough, the episode is not due to cocaine and may just be a short activity episode; also, if the window is wide but the entire activation can be explained by accelerometry, we prescreen it out as well. Then we extract all the candidate segments. During model development we estimate the cocaine dampening parameter and the activity recovery parameter. Once we have those parameters, when we test a particular candidate window segment from the field, we use them to see which model, cocaine-dampened recovery or cocaine-free recovery, is a better fit to the recovery segment. That is how we classify whether a cocaine event occurred at a certain point in time or not.

This is a depiction of how the moving average convergence divergence method works. The purple, blue, and red lines are the MACD and signal lines, and their crossovers indicate when a dip has begun and, for a valley, when the valley has started to recover. Where I show the arrow is when the valley has turned into recovery mode, and the next crossover marks the end of that particular episode.
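As a rough illustration of this candidate-segment location step, here is a minimal sketch that applies MACD-style crossovers to an inter-beat-interval series instead of a stock price. The window lengths, names, and pairing logic are illustrative assumptions; the parameters actually used in the published model may differ.

```python
# A minimal sketch of MACD-style trend detection on an inter-beat-interval
# (IBI) series: crossovers of the MACD line and its signal line mark the
# beginning and end of candidate dip-and-recovery segments.
import pandas as pd

def macd_candidate_segments(ibi, fast=12, slow=26, signal=9):
    """ibi: pandas Series of inter-beat intervals indexed by time.
    Returns (start, end) pairs where the MACD line crosses below and then
    back above its signal line, i.e. candidate dips followed by recovery."""
    fast_ema = ibi.ewm(span=fast, adjust=False).mean()
    slow_ema = ibi.ewm(span=slow, adjust=False).mean()
    macd = fast_ema - slow_ema
    sig = macd.ewm(span=signal, adjust=False).mean()
    below = (macd < sig).astype(int)
    cross = below.diff().fillna(0)          # +1: dip begins, -1: dip ends
    starts = ibi.index[cross == 1]
    ends = ibi.index[cross == -1]
    # Pair each start with the first end that follows it.
    segments = []
    for s in starts:
        later = ends[ends > s]
        if len(later):
            segments.append((s, later[0]))
    return segments
```

Each (start, end) pair is a candidate dip-and-recovery window that would then go through the prescreening and model-fitting steps described above.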
Now, to get to the model: what is our model of the parasympathetic nervous system, or rather of the autonomic nervous system? What we model is the deviation from the baseline. There is a base heart rate interval, and when there is a deviation from that base, that is when we see the impact of the autonomic nervous system. So there is a baseline B, there is a deviation from the baseline, which is y, and there is a nuisance parameter, the noise. The PNS recovery model, when written as a function of time, takes the form of a simple differential equation in which u(t) is the SNS arousal component; if the arousal component is absent, which will be the case in a cocaine-free recovery from activity, here is how we estimate the time constant for the PNS

system recovery. Here is a depiction of how good the fit is; as we can see, the fit is quite reasonable for recovery from activity. The time constant tau_R = 0.3576 corresponds to about 3.7 or 3.8 minutes to half recovery; that means in about 3.7 minutes a person will have recovered halfway to the baseline from where they are right now. The median recovery constant for the nine participants who were in the lab is 0.32.

Next we wanted to model cocaine metabolism (metabolism, not absorption). This is what drives the sympathetic nervous system. Similar to the parasympathetic nervous system, we model it as a dynamical system, and for the u(t) term we assume a value of zero for cocaine-free recovery. When there is cocaine-dampened recovery, there is sympathetic nervous system activation as well, and in this case we have two additional terms. The first term, as you see, is for the natural recovery, that is, cocaine-free recovery. The second term is the SNS activation, which is dampening the recovery because of the cocaine in the bloodstream. That cocaine is being metabolized at an exponential rate, which means there is a decay of that dampening effect due to cocaine metabolism, and that is the third term. Here we show the fit for the cocaine-dampened recovery. For tau_D, which is the time constant for cocaine metabolism, we get about 0.023, which corresponds to a half-life of about 43 minutes; that is how long it takes for the cocaine to be metabolized by half. Once we have both of these, we look at the errors from fitting the two models, and whichever model provides the better fit determines the class, cocaine or not cocaine. In the picture below, when we fit the cocaine-dampened model and the cocaine-free model to a cocaine recovery segment, the cocaine-dampened recovery curve provides a much better fit, and therefore this particular episode is classified into the cocaine class.

I am just going to provide a snapshot of the results; you can read the paper for more details. In the lab we had nine participants, and on their data we can achieve a true positive rate of 100% while keeping the false positive rate at 10%. On the field data, we had 30 participants out of 42 who had cocaine use episodes in the field with good data; from them we have 28 such cocaine episodes and more than 1,000 non-cocaine episodes, against which we applied our model to see whether it could rule them out. We can achieve a false positive rate of 4.5% on this large set of non-cocaine data if we keep the true positive rate at 82%; I am reading these numbers off the ROC curve, which is explained in the paper. If we want to improve the true positive rate, we pay some penalty in the false positive rate. This is a snapshot of the same field data that I showed earlier. You see that there are two windows that have been marked as cocaine by our model, and it is possible that the participant had a repeated administration: they self-reported cocaine in one of these windows, and the model finds that the next interval following it may also be due to cocaine, because of the slow recovery.
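To make the fit-and-compare step concrete, here is a hedged sketch in Python. The closed-form curves below are my simplified reading of the two recovery models (a purely PNS-driven exponential recovery versus a recovery slowed by a cocaine-driven term that decays as the drug is metabolized); the published model may differ in its exact functional form and fitting procedure.

```python
# A hedged sketch of classifying a recovery segment by comparing the fit
# errors of a cocaine-free and a cocaine-dampened recovery model.
import numpy as np
from scipy.optimize import curve_fit

def cocaine_free_recovery(t, y0, tau_r):
    # Deviation from baseline decays with the activity-recovery constant alone.
    return y0 * np.exp(-tau_r * t)

def cocaine_dampened_recovery(t, y0, tau_r, a, tau_d):
    # Natural recovery plus a cocaine-driven deviation that decays only as
    # fast as cocaine is metabolized, which stretches the overall recovery.
    return y0 * np.exp(-tau_r * t) + a * np.exp(-tau_d * t)

def classify_segment(t, y, tau_r_person, tau_d_population):
    """t: minutes since recovery onset; y: deviation of IBI from baseline."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Cocaine-free model: only the initial deviation is free; tau_r is the
    # person's own recovery constant estimated from cocaine-free days.
    (y0_free,), _ = curve_fit(
        lambda tt, y0: cocaine_free_recovery(tt, y0, tau_r_person),
        t, y, p0=[y[0]])
    err_free = np.mean((y - cocaine_free_recovery(t, y0_free, tau_r_person)) ** 2)

    # Cocaine-dampened model: the amplitude of the cocaine term is free; the
    # metabolism constant comes from a population model.
    (y0_c, a_c), _ = curve_fit(
        lambda tt, y0, a: cocaine_dampened_recovery(tt, y0, tau_r_person, a,
                                                    tau_d_population),
        t, y, p0=[y[0], 0.1])
    err_coc = np.mean((y - cocaine_dampened_recovery(t, y0_c, tau_r_person, a_c,
                                                     tau_d_population)) ** 2)

    return "cocaine" if err_coc < err_free else "not cocaine"
```

Here tau_r_person would come from the participant's own cocaine-free activity recoveries, and tau_d_population from the population metabolism model, as described above.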
There are several improvements one could make to the model. One could personalize the cocaine absorption rate, or the cocaine metabolism rate, using age, gender, weight, and BMI, all of which affect that rate. One could model the SNS activation due to physical activity, to see whether the activation of the heart

rate interval is commensurate with the intensity, though estimating the intensity of physical activity using a single accelerometer on the chest is going to be challenging. What I presented here is an explainable, you could say white-box, model. One could also try a more powerful black-box model, such as an SVM, by extracting several other features in addition to the one feature that we estimate, the recovery time constant. And one could also try to estimate the dosage from the model.

So once we have such a model, what can we do with it? The goal, basically, is to go from detection to prediction to action. What I mean is this: we now have a measure of cocaine use. If we look at the time series of the sensor data, we also have a model for stress from ECG, so you could have a time series in which you know the times at which cocaine use occurred, and you can go back in time and ask what the stress level was, or what the pattern of stress was, leading up to that use episode. You could also detect smoking from respiration and from the smart watch sensors, and from that you could look at whether the person was smoking and whether that could be a cue for the use episode. You could also estimate location and what kind of neighborhoods they were passing through when they had a use episode. You could also infer social interactions: whether they were talking to someone and, if so, with whom. If we had smart eyeglasses, we could capture several other cues. There are several other sensors from which we could capture additional cues or predictors of the cocaine use episode. And if so, then we can use those in a just-in-time intervention for those who are trying to quit, triggering the intervention as soon as those predictive signatures are observed on these other sensors.

The work I presented in this case study has been possible thanks to large contributions from several collaborators. In particular I would like to mention Dr. Kenzie Preston, whose lab collected all the field data and a lot of the lab data; Dr. Annie Umbricht, who collected the first lab data that we used to develop the initial model; and Dr. Emre Ertin, who not only developed the sensors that we use but was also key to developing the dynamical system model for the autonomic nervous system. And then there are several students at Memphis and at collaborating institutions who have contributed to the models, to the software that we run on the phone, and to various aspects of our research.

In conclusion, I would like to address what happens if a health researcher is considering seeking out a computer scientist: what do computer scientists bring to the table, and what should health researchers weigh before approaching one? Computer scientists like us can provide new capabilities with respect to assessment of, and from, sensors, converting the sensor data into appropriate models that can indicate a variety of behaviors. So if you are a health researcher and you are looking for a new capability that does not yet exist, that is probably one of the proper reasons to seek out a computer scientist, to see whether such a behavioral assessment is possible. But remember that if you are seeking a computer scientist to develop a new measurement, this is going to be a new technology, and any new technology comes with a lot of uncertainty.
So, if you want to be the first in that boat, you will have to be ready to handle the uncertainty and the setbacks that are associated with getting any new technology to work. The drug use study that I mentioned was supposed to be done in one year; it took us three years to complete, and it has only just been completed. So it takes quite a bit more effort to get a new technology to work, to perfect it, and to get it to the kind of maturity you would expect of an established product. If this information interests you and you have any questions, please feel free to email me (santosh.kumar@memphis.edu) or call me (901.678.2487). I would be glad to provide feedback, to participate, and to help you. Thank you for your attention, and good luck with the program.