ML Options for Mobile Developers (Cloud Next '19)

[MUSIC PLAYING]

LAURENCE MORONEY: Good afternoon, everybody, and welcome to this session. Today we’re going to be talking about machine learning options for mobile developers. I’m Laurence Moroney. I’m an AI advocate at Google.

IBRAHIM ULUKAYA: And I’m Ibrahim from the Firebase developer relations team.

LAURENCE MORONEY: So we get a lot of feedback from developers, and particularly mobile developers, that there are just so many options when it comes to machine learning. Even at Google, we’ve got all of these options. You’ve probably seen lots of this today. You walk around the expo hall, you see Cloud has a number of offerings. We’ve got Firebase ML Kit. We’ve got TensorFlow. Android has some offerings. And we get a lot of feedback about how confusing that is. And somebody who’s a mobile developer, who just really wants to take advantage of machine learning and build that into their mobile applications, needs a roadmap. They need to figure out, what do they do? Which offering do they use? Which product do they use? So what we’re here today to talk about is hopefully to try and clear that up a little bit, and help you understand what the choices are and when you would make particular choices. So the question, of course, is, OK, where do I start? What can machine lang– what can machine learning– I sometimes say machine language, but that’s a different thing. What can machine learning do for me? And really, why should I care?
So I always like to level set before I go into some of the offerings of machine learning and just talk a little bit about what machine learning is anyway, and how we think about it at Google, and why we’re so excited about it. So to me, first of all, when you think about machine learning, we’re in the middle of this massive hype cycle around it. You probably see graphics like these and blog posts and news articles all the time, and they’re just generally talking about how machine learning will do this, and machine learning will do that, and machine learning will change the world, and machine learning will take away our jobs. There’s generally a lot of hype and a lot of hyperbole going on around machine learning. So what I wanted to do is level set a little bit and say, well, here’s why this is going on.

So first of all, this chart is the typical hype cycle around a new technology. Now, typically, it starts with a technology trigger. And that technology trigger then will lead to something that we call the peak of inflated expectations. And it’s because of that peak of inflated expectations that we’re getting all those kinds of news stories that I mentioned on the previous slide. After we pass through the peak of expectations, we tend to fall into something called the trough of disillusionment, after which enlightenment will come, and then productivity. So as an example of this, remember back in 2007 when Steve Jobs came onstage and he said, one more thing, and he introduced the iPhone?
What happened after that was the peak of inflated expectations. The iPhone, the cell phone, was going to do away with everything. We wouldn’t need desktop computers anymore. We wouldn’t need laptop computers anymore. All we would need is one of these devices in our pocket and it would do everything. And while those devices were revolutionary and we all have smartphones, or most of us have smartphones now, they didn’t replace the desktop. They didn’t replace the laptop. We went through that peak of inflated expectations. And then the trough of disillusionment hit, when people realized, OK, this is what I can do with a smartphone: a smartphone has a small screen, the smartphone is touch first, it has limited battery life. There were various things that I still have to be able to do on a desktop that I can’t do on a smartphone. Then productivity set in and new businesses came out of that, things like Uber. Can you imagine trying to call an Uber on your laptop? It just wouldn’t work. So when that trough of disillusionment was hit and people then began to be enlightened about what it is that you can do with a smartphone, then productivity set in.

So when it comes to machine learning, when it comes to AI, I think we’re right about here right now. We’re still heading towards the peak of inflated expectations. And I like to joke that my job as an AI advocate is to tunnel you through that peak and get you straight to disillusionment. So I’m here today as a professional disillusioner. So please forgive me if I’m a downer with that.

So let me talk about one particular example. Think about something like activity detection. Now, I’m curious. Sometimes at Next, we have a lot of coders; sometimes we have a lot of people who aren’t coders. Out of curiosity, how many of you are actual coders here, just by show of hands?
Oh, wow, almost all of you. OK, cool. So think about something like activity detection if you’re writing code. Say, for example, I have a sensor that’s giving me the speed at which I’m moving. So I can say, you know what? If my speed is less than four miles an hour, I can set my status as walking, in code like this. But now I want to add running to that. So now it’s like, OK, if my speed is less than four, then I’m walking; otherwise I’m running. And now let’s say I want to add biking to that. So if my speed is less than four, I’m walking; if it’s less than 12, I’m running; otherwise, I’m biking. It’s a basic algorithm and it kind of works. But then my boss, who loves golf, wants me to add golf to it. All I can say is, oh, crap. How do I write an algorithm for that? When I hit a golf ball, it goes about three yards, so I walk this far. But when Ibrahim hits a golf ball, he will hit it, what, 200 yards, 250?

IBRAHIM ULUKAYA: Larry!

LAURENCE MORONEY: So it’s like the algorithm for him would be a lot different from me,
and we end up not being able to write reliable activity detection. And I know this is a very naive algorithm based on speed, but hopefully it gets the point across. So how do we solve something like this? Well, that’s the idea behind machine learning and how we like to think of machine learning at Google.

So first of all, if we talk about traditional programming, traditional programming is you express rules in a programming language, maybe Java, maybe C++, maybe JavaScript, whatever. Those rules will act on data, and then out of that, you’ll get answers. Like I had earlier on, if speed is less than four, then I’m walking. That’s traditional programming, at as high a level as I can actually draw it. So machine learning, the idea behind that, is we flip the axes on this chart and we end up saying, well, what if we have a whole bunch of answers, and then we have data associated with those answers, and then we have a machine actually learn the rules itself, as opposed to me as a programmer trying to express those rules? Then some of the limitations and difficulties in a scenario like the activity detection that I mentioned could be fixed by saying, I can get a whole bunch of people walking, and I can say this is what walking looks like. I can get a bunch of people running, and say this is what running looks like, this is what biking looks like, and, yeah, sort of this is what golfing looks like. So the idea behind machine learning is, if I get all of that data, then it’s really as simple as having a computer start sifting through this data and spot the patterns that are common to walking, spot the patterns that are common to running, biking, golfing, all that kind of stuff.

So the framework that we’ve built around this is called TensorFlow. And TensorFlow is what I work on. Whoops. Slightly broken animation there. You didn’t see that. So going back to this diagram that I showed you earlier on, machine learning, we think about it as we’re
feeding in the answers, we’re feeding in the data. We write some code in a framework like TensorFlow that matches the answers to the data and then infers the rules out of that. What that will do is build what we call a model. And a model is effectively a neural network. It’s a really badly named technology, between you and me, because it’s got nothing to do with the brain, and sometimes it makes you think it’s a lot smarter than it is. But if you think of a neural network, it’s basically a bunch of interconnected functions, and each of these interconnected functions calculates a probability based on its inputs. But what that will end up giving us is a model. And then at runtime, we’re going to pass data into the model, and it will give us an inference or a prediction out of that. So now, again, when I was thinking of my activity detection, I’ve trained it on a lot of people saying what it looks like to walk, run, bike, and golf. Now I’m going to pass it a bunch of data, and it will say, I think this looks like walking. Instead of me writing code, this inference is happening based on that data pattern matching.

So building these models is what we call the training phase. As we’re creating a model, as we’re passing it the data, as we’re making a neural network to sift through that data to come up with the patterns that determine one thing over another, we call that training. And then at runtime, we call that inference. So the idea now is like, OK, now I kind of know what machine learning’s all about. What do I do? Where should I run my inference? And Ibrahim is going to tell us all about that.

IBRAHIM ULUKAYA: So now you have the question, where should I do my inference? We have two options here. We can do the inference in the Cloud or locally on the device. Why might I do it in the Cloud?
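The contrast between hand-written rules and learned rules described above can be sketched in a few lines. This is a minimal illustration, not the speaker's slide code and not TensorFlow: the "model" here is just one learned average speed per activity, a deliberately tiny stand-in for a neural network, and the sample speeds are invented for the example.

```python
from statistics import mean


def classify_by_rules(speed_mph):
    """Traditional programming: the programmer writes the thresholds."""
    if speed_mph < 4:
        return "walking"
    if speed_mph < 12:
        return "running"
    return "biking"


def train(examples):
    """Training: learn one average speed per label from (speed, label) pairs.

    A stand-in for fitting a neural network; the data and answers go in,
    and the "rules" (here, per-label averages) come out.
    """
    speeds_by_label = {}
    for speed, label in examples:
        speeds_by_label.setdefault(label, []).append(speed)
    return {label: mean(speeds) for label, speeds in speeds_by_label.items()}


def infer(model, speed_mph):
    """Inference: return the label whose learned average is closest."""
    return min(model, key=lambda label: abs(model[label] - speed_mph))


# Invented, labeled sample data: (speed in mph, activity).
examples = [
    (2.0, "walking"), (3.0, "walking"),
    (7.0, "running"), (9.0, "running"),
    (14.0, "biking"), (18.0, "biking"),
]
model = train(examples)
print(classify_by_rules(3.5), infer(model, 3.5))
```

Adding a new activity to the rule-based version means writing new branches by hand; in the learned version it means adding labeled examples and training again. A single speed value would still not separate golf from walking, which is why real activity models use richer sensor data.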
You might run inference in the Cloud if you want to integrate with an existing Cloud service, or you want to use large, complex models that wouldn’t be able to fit on a mobile device, or you want to use aggregate learning from a wide variety of sources. And using the Cloud, you can actually minimize the impact on mobile battery and support less performant, older devices. What about the case for on device? With on device, you’ll be able to keep the data local and avoid round-trip latency. This is particularly useful in real-time applications where you want to process multiple frames in quick succession. And sometimes, you want your app to work even when it’s disconnected, and you want to reach users without reliable or affordable connectivity. In those cases, on device is a much better inference type. And Laurence will go through the options that Google provides here.

LAURENCE MORONEY: Thanks. So again, thinking about your inference, you’ve built a model. You want to run inference on it. And as Ibrahim said, there are two choices. You can either run it in the Cloud, or you can run it on device. So let’s first look at running your inference in the Cloud. So you’re not deploying a model to the device. Here’s where you see all the Cloud-based inference and stuff that you’ve been seeing at this conference today. For Cloud-based inference from Google, there are three pillars. There’s computer vision-based stuff that we call sight. There’s language-based stuff for translation and NLP. And then there’s conversational-based stuff. So these are models that have been built by Google and services that have been provided by Google.
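The Cloud versus on-device criteria listed above can be written down as a small decision helper. This is only a sketch of the reasoning in the talk: the `AppNeeds` fields, the function name, and the "hybrid" and "either" outcomes are all hypothetical names chosen for illustration, not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass
class AppNeeds:
    """Hypothetical checklist of the criteria mentioned in the talk."""
    must_work_offline: bool = False          # unreliable or costly connectivity
    needs_realtime_frames: bool = False      # many frames in quick succession
    data_must_stay_local: bool = False       # keep user data on the device
    model_too_large_for_device: bool = False
    must_support_older_devices: bool = False


def choose_inference_target(needs):
    """Return 'on-device', 'cloud', 'hybrid', or 'either'."""
    favors_device = (needs.must_work_offline
                     or needs.needs_realtime_frames
                     or needs.data_must_stay_local)
    favors_cloud = (needs.model_too_large_for_device
                    or needs.must_support_older_devices)
    if favors_device and favors_cloud:
        return "hybrid"      # requirements pull both ways; combine the two
    if favors_device:
        return "on-device"
    if favors_cloud:
        return "cloud"
    return "either"          # no hard constraint; pick on cost or accuracy


print(choose_inference_target(AppNeeds(must_work_offline=True)))
```

Real projects weigh more factors than five booleans (cost, accuracy targets, model update cadence), so treat this as a checklist in code form rather than a rule.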

So if you’re meeting these three scenarios, and you want to run inference in the Cloud on these three scenarios, it’s available to you. Now, in addition to that, the models that we’ve built, of course, can be customized. The customization of these models comes through the product called AutoML. So you see, AutoML Vision allows you to retrain Google’s computer vision model to recognize what you need. For example, you’re a manufacturer who makes widgets, and you want some kind of computer vision solution that recognizes widgets. You could use AutoML to retrain it based on your widgets. Or in language, for example, if you want to do translation from language to language with machine translation, it can be pretty good. It’s getting better all the time, but it’s trained on a very general set. Again, you might be a manufacturer building something that has a specific language associated with it. Maybe your widgets have particular terms associated with them that don’t translate properly. You could retrain the machine translation models based on your specific domain. That’s what AutoML is all about.

So that’s Cloud-based inference for these three specific scenarios. But then we always get the question, OK, it’s great if it’s sight, if it’s language, or if it’s conversational. But what if I’m doing something that’s unique and different? For example, somebody asked me recently about network configuration. They have these massive networks, and right now, all of the configuration on these networks is done manually. Wouldn’t it be great if they could see traffic on these networks and then use software-defined networking to automatically configure their networks based on learned behavior? So there’s a great example of a custom ML model. So the question then becomes, well, if I have a custom ML model and I want to host that in the Cloud, what do I do?
So typically, there are three main ways that you build models for machine learning. Number one is a pretrained model. A little bit like some of the services that I just mentioned that we have in the Cloud, somebody has built a model for you. Those services I showed earlier on are models that Google has built for you. But we’re not the only people who build models. So the idea is that there are a ton of pretrained models out there that you just might want to use. And you might want to deploy them to the Cloud. So we have a service called TensorFlow Hub, which is on, where the idea is that that’s a repository of pretrained models that might go beyond the scenarios that we’re hosting and allow you to have a model that you can then host in the Cloud. The second scenario is retraining models. Now earlier, I mentioned the fact that there are those three pillars in my previous slide, and that you could retrain those for your scenario. But what if you have a scenario where there’s an existing model, and you want to retrain that?
So for example, that may be a computer vision model that you want to retrain, but you don’t want to host it with AutoML. Well, that’s also possible. You’ll be able to take a model out of something like TensorFlow Hub. And you’ll be able to retrain that. And you’ll be able to host it yourself. And I’ll talk about hosting it yourself in a moment. And then, of course, there’s building from scratch. You want to start from nothing. You don’t want to use anybody else’s model. You have a scenario that is so particular and so unique to you. And that’s what TensorFlow is all about. So TensorFlow is your API that will allow you to define neural networks and train them so that you can build a model.

OK. So then you’ve built a model, and you want to host it in the cloud yourself. But you don’t want to use Google Cloud. Maybe you want to host it on your own premises. Maybe you want to host it in another cloud provider. So we have something called TensorFlow Extended. Now, the idea behind TensorFlow Extended is it’s an end-to-end, enterprise-type platform for building, managing, and running models. But the orange box that I’ve highlighted here is a technology called TensorFlow Serving. And you could just use TensorFlow Serving from TFX. Deploy TensorFlow Serving to your server or to another cloud provider or wherever you want to put it. And then, you can serve models out of that. So we want to try to cover every base that we can if you want to run Cloud-based inference.

So there are really three pillars. Number one, you can use a service from Google and either use that directly or retrain it. Number two, you can build your own model and host it on Google. Or number three, you can build your own model and host it anywhere you like. So those are the three main scenarios that we’re trying to hit. So AutoML for the first, TensorFlow Serving on Google Cloud for the second, TensorFlow Serving on your own server or in another cloud provider for the third. And if you want to learn more about TensorFlow
Extended, TFX, you can go to It is open-sourced. And we have our website around it, which is So next up is on-device inference. Now, as Ibrahim mentioned, you have a choice of where you’re going to run your inference. Number one, you can run it in the Cloud. Or number two, there were a bunch of scenarios where you may want to run it on device. Maybe you don’t want to pass data up to a cloud to have it inferred. You want to do everything locally. There’s a great video out there, for example, of farmers in Tanzania, in Africa, who rely on the cassava plant for their food. And the problem with the cassava plant is that when it gets diseased, you don’t recognize the disease until it’s too late.

But they actually trained a TensorFlow-based model that runs on device that can actually spot the disease in a plant before the human eye can spot it. And all they have to do is hold their phone over it, point the camera at the leaf of this plant, and it spots the disease with 99% plus accuracy. And as a result, they can root that plant out before it infects the rest of the crop. Now, this is on a farm in Tanzania, where they don’t have broadband connectivity. They don’t have 4G, et cetera, et cetera. So everything has to run on device. So that’s just a great example that there are scenarios where you may want to run on device beyond just privacy and all that kind of thing.

So when it comes to models on device, it’s actually the same three choices. Number one, pretrained models– in the example of the cassava– sorry. A pretrained model is, you can go to TF Hub. You can download a pretrained model. And you can just use that directly. Then, there are retrained models. And in the example of the cassava, what they did was they downloaded something called MobileNet. MobileNet is a well-known model that’s optimized for mobile use and recognizes 1,000 classes. But then, they retrained that based on leaves that they were able to categorize: this is a diseased leaf, this is an un-diseased leaf. And again, using the same thing that I mentioned with the activity detection, you can train it that this is what a diseased leaf looks like, this is what an un-diseased leaf looks like, that type of thing. So they were able to retrain and build a model out of that. Or of course, you can just build from scratch. You can create your own model, and off you go.

But when it comes to on device, there’s one extra step that you need to do, and that’s to convert your model to optimize it to run on the device. What this does is, it shrinks the model. It makes it, not battery safe, but better for working with battery, those kinds of things. So there are four steps in this. First of all, you build your model
by either using a pretrained model, retraining one, or creating a model from scratch. And in TensorFlow, then, you save it as what’s called a SavedModel. So SavedModel is a file format that we’re standardizing on in TensorFlow 2. And the idea behind SavedModel is when you use that file format, it will work on enterprise-grade scenarios in the Cloud. It will work on mobile. It will work in JavaScript, in the browser, that type of thing. To be able to get it to work on mobile, then, there’s the TF Lite Converter, which shrinks that model, freezes parts of it, and allows you to use it on TensorFlow Lite. So you go through these steps for model conversion so that you can then execute your model on a mobile device. And when I say mobile device, we’re actually moving beyond just iOS and Android. The same process can be used so that your model will then work on embedded systems. So we recently released support for some microcontrollers and embedded systems, like IoT-type devices, like Raspberry Pi.

So let me give a quick demo of this, if we can switch to the demo machine. And my screen saver kicked in while I was talking. Thanks. All righty. So I’m actually going to run this on the Android emulator. So we can see it running on the Android emulator now. This is a little app that’s running in TensorFlow Lite. It’s an open source app; you can go and download this now if you like. And what this is doing is, it’s grabbing frames from the webcam. It’s emulating the webcam using the front camera on my laptop. And then, it’s classifying those frames using MobileNet. MobileNet, as I mentioned earlier on, is built to work with 1,000 different classes. So what it’s doing, frame by frame, is it’s taking the frame from the webcam. It’s compressing that to the size of the image that MobileNet wants to use. It’s passing it to TensorFlow Lite. It’s getting back 1,000 classifications. It’s picking the top three and then rendering those on the screen. So for example, if I hold up a water bottle, we
can see– I did say water bottle. Eh, there we go. It’s hard to get it right with the angle. I usually have a banana for this. I forgot to bring a banana with me. And we didn’t have a backup banana. There we go, water bottle, 100%, so that kind of thing. So if I, then, just move this up– sometimes I put my face in it. And I don’t know what it wants to say. But we can see the inference time here. Now, this is running on the Android emulator. I’m sorry if it’s really small. The inference time is about 300 milliseconds on an Android emulator. If you run this on a device, particularly an Android device that has the Neural Networks API installed, like, for example, a Pixel– if you want to see it running on my Pixel later, I can show you– it’ll do the inference in about 30 milliseconds. It’s super, super, super fast. So we’ve been building TensorFlow Lite with that in mind, to shrink the model, to optimize it, to make it super fast for inference. So you can even do near real-time inference like this running on an emulator on my laptop, which I think is really cool. So that’s the demo, if we can switch back to the slides.

So in summary, for the choices that you’re facing, we realize that this is a very confusing thing. So part of what I’ve been trying to drive in Google is a bit of a strategy around, OK, what do mobile developers need for machine learning? And it was driven off of a flowchart like this one. And I’m sorry if it’s really small. But that’s kind of part of the intent,
that we understand that it’s very confusing. But if we think about this, if we look at the left-hand side of this flowchart, those are the options that are available if you want to run inference on mobile. And if you want a copy of this flowchart, just drop me an email. I’m happy to send it to you, because I know it’s pretty small to see on here. And if you look at the right-hand side of this flowchart, those are the options that are available to Cloud developers. So the idea is if you want to run inference on a mobile device, there are a number of scenarios. Maybe you want to have it connected to the Cloud to do your inference. Maybe you want it disconnected from the Cloud. Those are the things on the left side of the flowchart. Ditto on the right side if you want to use Google services or if you want to host your own models, those kinds of things. But we also have an offering that Ibrahim is going to talk about that gives us both. So the idea is– and Ibrahim probably will be best to explain it, with ML Kit.

IBRAHIM ULUKAYA: Thanks, Laurence. As Laurence described, ML options may look like a [INAUDIBLE] at first look because we would like to provide tailored solutions for each use case. As a developer, you want to implement these solutions together. Hence, you can use ML Kit as a single SDK. ML Kit is a mobile SDK that brings Google’s machine learning expertise to Android [INAUDIBLE] in a powerful, but easy-to-use package. It comes with a set of ready-to-use APIs. We’ll call them base APIs. And they work for both on-device and Cloud-based inference. They support common mobile use cases like image labeling, face detection, text recognition, and barcode scanning, to name a few. While they’re all ML models under the hood, they’re as easy to use as any other API.

Let’s look at a typical workflow of someone looking to implement a machine learning model in an app. So first, the ML expert or data scientist gathers, massages, and splits the data; brings or creates a model; trains, tunes, and
evaluates the model. Maybe they will have to do this in a few loops. And then, DevOps will deploy the model. And finally, the software developer will do the prediction. But ML Kit makes this way easier. You simply pass in the data to the ML Kit library, and it will give you the information you need. Whether you are new to or experienced with machine learning, you can implement the functionality in a few lines of code. And it includes both Android and iOS libraries. While the on-device APIs process the data quickly and will work even when there is no network connection, the Cloud-based APIs leverage the power of Google Cloud Platform’s machine learning technology to give powerful, high-accuracy results.

Let’s look at a specific example. As you can see here, we are building an app that requires identifying the themes and content of an image. You would be using the image labeling API. In the chart, you’ll be seeing two icons here. On-device APIs are free of charge and are best suited for scenarios where you need low latency, you might not have a network connection, and you’re OK with a more coarse-grained answer. On the other hand, the Cloud calls come with a free quota, but are paid beyond that. They provide the most accurate results and are powered by the state-of-the-art machine learning models running on the Google Cloud Platform.

But if you do have some ML experience or would like to bring your own custom models, as Laurence already described, you can do that too. And using ML Kit with that, you’ll be able to run your TensorFlow Lite models on the underlying framework, whether it’s Android or iOS. And you’ll get further benefits from ML Kit since it’s integrated into Firebase, such as analytics, remote config, and A/B testing. Maybe not everyone is familiar with Firebase. So Firebase is Google’s mobile development platform, which has tools for building better apps, improving app quality, and further growing your business. So using ML Kit with Firebase will actually
allow you to upload your TF Lite models into the Firebase console and dynamically deploy them to your users. This means that the APK bundle size can be kept really small, since you won’t actually be putting the model inside the APK. And you can do frequent roll-outs without having to republish your app. And you can further run A/B experiments. Firebase features, such as A/B testing, remote config,
and analytics make this process really easy. And I would like to walk through a demo and show all these features to you. Thanks a lot. So let’s start with the base APIs. This is a small sample, actually. I am using only the base APIs here. Let’s do some detection. As I already described image labeling, let’s start with that. When I start using the on-device APIs, they are really fast, and I get some answers here, like beach, rock, sky, cliff. And they’re pretty good. But if I want even finer results and way better accuracy, then I can use the Cloud. Here, I am actually detecting from the Cloud. As you can see now, I get finer-grained answers, such as “body of water,” “sky,” “wave,” “shore,” that weren’t actually in the on-device API call. And we already described some use cases there. In some use cases the on-device APIs are better, and in others the Cloud APIs are better. And in such a use case, let’s say we are doing face detection. We will need to use the on-device API because we want to actually process those frames really fast.

LAURENCE MORONEY: I think you look great in lipstick, by the way.

IBRAHIM ULUKAYA: I am not sure. So it’s face detection, and we get the contours. And we are using the on-device API here so that we can actually process these frames without having to go back to the network and wait for the network response. But as we said, with ML Kit, we are able to use even hybrid solutions because we are using a single SDK. There’s another demo I did earlier. Let’s see how it works. The results coming on the top are actually coming from the on-device API, because I actually want to get these frames as quickly as possible.

LAURENCE MORONEY: I did not use my Mac for making music. So saying it’s a musical instrument is accurate.

IBRAHIM ULUKAYA: We’ll see about that. So I want to trigger it. It starts with the musical instrument, as you see there. It was the on-device API. But, at the same time, I did a call to the Cloud API. And I was able to select the label “laptop,” with a
better answer, because I was actually getting “laptop” from the Cloud API. I can do a few other things. For some reason, I’m getting a lot of Clouds in.

LAURENCE MORONEY: Yeah, there we go.

IBRAHIM ULUKAYA: As you see, it takes a change from, like, the musical instrument.

AUDIENCE: Is everything a musical instrument?

LAURENCE MORONEY: Definitely.

IBRAHIM ULUKAYA: I hope not.

LAURENCE MORONEY: Everything is a potential musical instrument when you think about it. So this is Sitting, arm.

AUDIENCE: Cool. [INAUDIBLE]

IBRAHIM ULUKAYA: And on top of that, we also mentioned the custom model. So you can actually bring your own model to ML Kit as well, on the framework of TF Lite. So I was actually using a local model here and doing the image labeling with my MobileNet model. And, also, I can use Firebase to download another model from the remote. And this time I can do the same image labeling using my remote model. This way, I was able to use another model that wasn’t in the APK or in the bundle before.

But not only that, we can actually further use Firebase to go one level higher here. This is my Firebase console. If you are not familiar with it, I use the console for my mobile development. Here, in the ML Kit section, I have some custom models. In these models, I see the image classification, invalid model, some models that I already published, so they can be downloaded on the fly from my client. Let’s say I want to do some A/B experiment. I want to see if I have a better model, if it’s really better or worse, or how will my users react to this new model?
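The hybrid behaviour shown in the demo, where the fast on-device label is displayed first and a Cloud label replaces it once it arrives, can be sketched as a small selection function. This is an illustration only: it does not call ML Kit, the function name is hypothetical, and the confidence scores below are invented for the example.

```python
def best_label(on_device_labels, cloud_labels=None):
    """Pick the label to display from (label, confidence) pairs.

    Uses the on-device results until Cloud results are available, then
    prefers the Cloud results, which the talk describes as more accurate.
    """
    source = cloud_labels if cloud_labels else on_device_labels
    label, _confidence = max(source, key=lambda pair: pair[1])
    return label


# Invented scores standing in for what the two APIs might return.
on_device = [("musical instrument", 0.62), ("electronics", 0.40)]
cloud = [("laptop", 0.97), ("computer keyboard", 0.91)]

print(best_label(on_device))          # shown immediately
print(best_label(on_device, cloud))   # shown once the Cloud call returns
```

In an app the Cloud call would run asynchronously, and the second call to `best_label` would happen in its completion callback.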
We can actually run an A/B test. A/B testing lets me roll out some of these models or some of the other parameters on the fly to my app. We’ll use remote config, through which my clients will be able to get some configs on the fly again. Let’s say the ML experiment name is mlkit, just for the sake of example. And here, I will be selecting my app. For the sake of example, I’m just going to say 5% of target users, to which I will be rolling out my new model. And I want to have some goals so I can track some metrics to see if my new model is successful. Let’s say daily user engagement, one of the common goals,
the [INAUDIBLE]. So I will be creating some parameters so I can change these models on the fly. Let’s say ml_model_name. And one of my models is image labeling. And I can create a new experiment now with a different variant. So the 5% of my users can actually get the version 2 model. With that, I will be starting an experiment. And I’ll click Review here. And from now on, I can actually start the experiment and see how my users will react. From the experiment, I can see things like whether there is better user engagement with this model. Or maybe my users are actually having a lower churn rate. Or if I’m selling something from my in-app store, maybe they will be getting more stuff, because they really found what they were looking for using the ML model. Can we go back to slides?

As I explained, I just started this experiment, so I won’t have the data ready. But there was another one, actually. I did the experiment before. Here, I can see clearly with different variants that one of the models ended up having lower user engagement, and the other model made my users stay more. Maybe they really found what they were looking for in my app using this new model. So using these experiments, I can really find the best model I can use, and I can iterate and get better models from there.

As Laurence already described, Google provides so many different options, whether you are a new or experienced ML developer, whether you want to do your inference in the Cloud or on the device, and whether you want to bring your custom model or use the Google model. And with ML Kit, we can actually use all of this from a single SDK. It’s easy if you want to use the base APIs, with a few lines of code, or you can bring your own model as well. And you can further use Firebase to get analytics and to actually compare your different ML models and see how they perform.

[MUSIC PLAYING]
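The percentage rollout in the A/B demo can be approximated with deterministic bucketing. This is a rough sketch of the idea only, not how Firebase A/B Testing or Remote Config is implemented: the hashing scheme and the model names `image_labeling_v1` and `image_labeling_v2` are assumptions made for illustration, while the experiment name `mlkit`, the 5% target, and the notion of an `ml_model_name` parameter come from the talk.

```python
import hashlib


def bucket(user_id, experiment):
    """Map a user to a stable bucket in [0, 100) for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def ml_model_name(user_id, experiment="mlkit", rollout_percent=5,
                  control="image_labeling_v1", variant="image_labeling_v2"):
    """Return the model name a user should download.

    The same user always lands in the same bucket, so their assigned
    model does not change between app launches.
    """
    if bucket(user_id, experiment) < rollout_percent:
        return variant
    return control


# Roughly 5% of a made-up user population should get the variant model.
users = [f"user-{i}" for i in range(10_000)]
variant_count = sum(ml_model_name(u) == "image_labeling_v2" for u in users)
print(variant_count)
```

With a hosted service such as Firebase, the assignment happens server-side and the app simply reads the parameter value it is given; the sketch only shows why a small, stable slice of users sees the new model.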