Accessing IoT Data with Couchbase Server, Couchbase Mobile and Kaa – Couchbase Connect 2015

Absolutely, that’s how it happens. OK, right, I think we should start. Hi everyone, my name is Andrew Kokhanovskyi. I’m chief technology officer at CyberVision, and I’m also leading the team that works on the Kaa open source IoT platform. Today I would like to talk to you about a few things related to the IoT revolution that we’re not only witnessing but also actively participating in.

It’s been a very interesting time lately. What we’ve seen is a tremendous advancement in hardware technology, which has led to a proliferation of low-cost and highly powerful devices and allowed us to enter this new era of IoT. And while it’s still very new and fresh and exciting, it’s also becoming a very, very competitive field. There are new IoT products being launched almost daily, and in order to succeed, companies have to have a very clear path from the concept, the idea of their product, all the way to production. There are some technologies that can help them and come to the rescue.

So let’s talk about this perfect, ideal picture. IoT wouldn’t be successful if it wasn’t for the data. Data is essentially the key to the success of IoT: it is the ability to acquire all of this data that was previously hidden from us, get access to it, get visibility into it, and give people the capability to make wiser decisions based on the data they obtain. So conceptually, it doesn’t matter what your device is or what the IoT product is; one of the most important things is extracting the data from the field, from these devices, and driving it all the way to presentation to the end user. So there are endpoints which we want to enable to send data to the cloud, and then we use this cloud, conceptually, to present data through various channels, from mobile applications to web dashboards to monitoring systems and so on. Now, all of this is understandable. The piece that is a little bit scary is the one
right in the middle of this picture: this cloud. It’s a very abstract thing, and it is especially scary when you come from a clear field, when there is nothing that you have and you have to build the entire system from scratch. So what I want to talk about is the technologies that help deliver this cloud faster and make this vision a reality. I want to talk about two platforms here: the Kaa open source IoT platform and Couchbase, that is, the combination of Couchbase Server and Couchbase Lite.

Let’s first of all talk about why Kaa. Why would you pick this platform? This is a fully featured IoT middleware platform which gives you the capability to adapt it to your specific solution and build your product faster. What is important here, in terms of the devices that the platform can support, is that the footprint it requires is under 10 KB. So this is not something that you would easily integrate any technology into, right? And many of these IoT products do not even have an operating system on board.

So it’s just firmware. And with the Kaa platform, even though there is no operating system and you cannot write apps, what you can still do is take one of the SDKs and bake it directly into the firmware. That is important.

Then, data delivery with a platform like Kaa works out of the box. It’s guaranteed, it’s solid, it’s reliable, including on the endpoint side. So it doesn’t matter what sort of data you’re transferring from your endpoints, from your devices, to the cloud: there is storage which goes into the endpoint, into the device, and then there is the bit which reliably delivers that data all the way to your cloud-tier middleware and further to the database. Transport security is also guaranteed; it’s built into the platform, so there is encryption at rest and in flight. Efficient data serialization is also very important, especially for very constrained devices, and I’ll talk about that a little bit later. The platform is horizontally scalable and fault-tolerant. Now, one of the most important pieces: it is one hundred percent open source and licensed under Apache 2.0, which is probably one of the most business-friendly licenses out there. It allows you to rapidly build applications using one of the three SDKs existing right now, C, C++ and Java, and it comes pre-integrated with some of the popular platforms. But in general, integration with new ones is not a big deal; it doesn’t matter which operating system they run, or whether they even run one.

Now, why Couchbase? First of all, it’s the flexible document-oriented data model: Couchbase has the capability to collect basically any sort of data from the field, from the devices. It’s the elastic scalability, it’s the consistently high performance, it’s the capability to do big data analytics in real time, it’s the high availability offered by Couchbase. Also, as you know, it comes in two forms, so there
is a version that is open source and Apache-licensed. And last but not least, and it’s pretty important here, is the capability to synchronize Couchbase Server with Couchbase Lite using the new synchronization server capabilities, and finally the support for various mobile platforms; some of those are listed here.

So, the architecture. This slide looks familiar, but here I have expanded just a little bit on how this works with the two systems in place. Kaa is a middleware that essentially talks directly with the devices, collects these bits and pieces of telemetry data from them, aggregates them and pushes them to Couchbase Server, which then uses the synchronization server to synchronize some bits of the data with the user application sitting at the edge, on the users’ devices.

Let’s explore a little how the data flow actually works in this architecture. At the very bottom there is a device in the field, the IoT product. Let’s say it has certain sensors that we are collecting data from. Sensors can be pretty much anything: physical sensors, or purely programmatic sensors like detection of rates of certain exceptional conditions, race conditions and so on. This raw data, within the application or the firmware in the device, gets fed into the Kaa SDK that is embedded into the application, and that automatically ensures delivery of the data from the endpoint to the server, assuming of course that the device has wireless connectivity, wired connectivity or any other connectivity to the Internet. It ensures delivery to the Kaa services. Also, in the open source there is an existing Couchbase connector, which you don’t have to program anymore, so the data accumulated from the devices by the platform gets automatically fed into Couchbase. Again, this is a connector that can just be deployed, configured, and
you’re

ready to go.

So where this data goes next from Kaa is to Couchbase. At this level the amount of data gets larger, because we are essentially feeding in aggregated data from potentially millions of endpoints. Now, in order to present users with the data they want to see from their devices, or their aggregations or collections of devices, within Couchbase you can define views, and the summarized data, the summarized documents, can then be synchronized with Couchbase Lite installed within the mobile application, or any other application, through the Couchbase Sync Gateway.

Now I want to talk a little about how the data is actually shaped, and there is a secret sauce here: Apache Avro. The key in the combination of these two platforms is the fact that the entire set of data is structured all the way from the application in the field, from the IoT device, down to the database. What this means is that you are not just getting blobs of data. You don’t have to interpret them, you don’t have to crunch them, you don’t even have to parse them. What you get are perfectly well-defined documents that Couchbase can immediately build analytics on. This Avro schema definition is something you start off with by defining it in the Kaa platform, and it becomes embedded into the SDK. That essentially demands that the devices you build, the IoT products you build, submit data in a way which is easy to process and easy to analyze. And again, it’s Apache Avro compatible.

Now I want to go through a quick use case, a quick demonstration. This is one of the demos that we built for a conference last month. It’s an example of an application in the smart energy space, where you have solar power panels, and they may be installed pretty much anywhere: on roofs (there are plenty at SFO, if you saw that), on
top of the airport buildings. What you want to do is see how well they are performing, how much energy they are producing, and you want to see this data in real time, so that you know how much energy your solar panels are producing compared to how much energy is being drawn from the grid.

In this demo we are introducing a concept of zones. There are solar panels installed in several zones, and there is an application which runs on an Intel Edison (well, in this demo case, of course) that essentially measures the power production of every solar panel and sends this data all the way to our cloud, where it gets accumulated, pre-processed and grouped into these zones. Then, through the synchronization of Couchbase Server with Couchbase Lite, it gets synchronized to the application on a tablet. The user interface on the right is an Android application that monitors these six zones. There is the current power production in the gauges at the bottom left, the total power output at the top, and this greenish-yellow thing is essentially a comparison of the power the system is drawing from the grid versus the power it gets from the solar power plant.

I mentioned that the data is very well structured. On the left here I have the data record schema, so let’s go through this real quick. This is the telemetry data that we are getting in the database from the

application in the field. This JSON essentially defines the way in which the data is laid out. Every power record, or every voltage record here, is a record which contains several fields: there is a timestamp and there is an array of samples (I guess you can see my mouse pointer here). Each of the samples is essentially a record itself, and that record contains three fields: the zone identifier, the panel identifier, and the voltage that the panel is currently producing. So a single device out there can report multiple power readings from its sensors, get the data into this document, which is structured with this Avro schema, and submit it through the Kaa platform to Couchbase to be processed and synchronized with the application on the tablet.

Now, on the right is a very short code snippet that shows how this schema translates into the code you have to write in the application to enable this data collection, and it’s very simple. In C++ here, essentially what happens is we go through the list of connected solar panels, get the current output and stick it into this sample. You can see here that there are fields called zone ID, panel ID and voltage, and they correspond to the names on the left. That is not magic; that is something that was automatically compiled by the Kaa platform. Basically, the definition of the schema got translated into data object model definitions, into structures in C++ in this case. Instead of working with something low-level like a JSON or XML representation, or inventing your own data format, all you have to do is deal with these objects that have been predefined for you based on the data schema you defined in the platform. Once the record is populated, it’s added to the log storage in the platform, and the platform SDK itself takes care of the data upload when the connection
becomes available, and so on.

Now, the beauty of this approach is not only that it makes it very simple to program this telemetry data collection (and there are many other features available in similar shape), but also that it becomes very similar across platforms. That was a snippet from a C++ application, but here are a couple that were produced by generating the same code for Java and C. Again, you are not working with a low-level representation of the data; rather, you have objects. On the left there is a Java snippet, and in this case it’s even simpler: we just create an object, a voltage sample, which contains the zone ID, panel ID and voltage readings that are currently read. The names, again, correspond to whatever was written in the schema: the name of the object type is voltage sample, so it translated into the voltage sample object in the SDK code. You produce these objects, you add the samples to the list, and you push this data to the log storage, which gets the records transmitted further to the cloud and put into Couchbase. I won’t go through the C example; it’s basically the same, except that instead of objects we operate on pure C structures.

Now, here is how it looks from the Couchbase perspective. This is an example of one document that was pushed from the endpoint, with the list of panels, their corresponding zones and the current voltage. You see that this is a perfectly understandable JSON document that is easy to read and also easy to process.
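To make the path from schema to objects to JSON document more concrete, here is a minimal sketch in Python. It is not the C++, Java or C code from the slides and it does not use the Kaa SDK API. It only assumes the record layout described in the talk: a timestamp plus an array of samples, each with a zone ID, a panel ID and a voltage. The names VoltageSample, VoltageReport, LogStorage, build_report and read_voltage are hypothetical stand-ins for what a generated SDK would provide.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Callable, List


# Hypothetical stand-ins for classes an SDK generator would derive from
# the Avro schema; field names follow the talk, not real Kaa code.
@dataclass
class VoltageSample:
    zoneId: int
    panelId: int
    voltage: float


@dataclass
class VoltageReport:
    timestamp: int
    samples: List[VoltageSample] = field(default_factory=list)


class LogStorage:
    """Toy store-and-forward buffer: keeps records until an upload succeeds."""

    def __init__(self, send: Callable[[str], bool]):
        self._send = send  # returns True when the server accepted the record
        self._pending: List[str] = []

    def add(self, report: VoltageReport) -> None:
        # Serialize the structured record into a JSON document.
        self._pending.append(json.dumps(asdict(report)))

    def flush(self) -> int:
        """Upload buffered records in order; stop at the first failure."""
        sent = 0
        while self._pending and self._send(self._pending[0]):
            self._pending.pop(0)
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._pending)


def build_report(panels, read_voltage, now=None) -> VoltageReport:
    """Walk the connected panels and put one sample per panel into a report."""
    ts = int(now if now is not None else time.time())
    report = VoltageReport(timestamp=ts)
    for zone_id, panel_id in panels:
        voltage = read_voltage(zone_id, panel_id)
        report.samples.append(VoltageSample(zone_id, panel_id, voltage))
    return report
```

Calling build_report for the attached panels, passing the result to LogStorage.add, and calling flush whenever connectivity returns mirrors the flow described above: records stay buffered on the endpoint while the link is down and arrive as JSON documents once it is up.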

This document has six zone IDs, from 0 to 5. Now imagine that there are millions of these devices, or at least a hundred of them, and they have a mix of these zones connected to them, so they are reporting data from solar panels in their various zones in various combinations. What we want to do is aggregate the data from the solar panels in the same zone, so that we know what the average and the total production per zone is.

What happens in this case is that in Couchbase we define a view, and the view is very simple: there is a map function and a reduce function. The map function is on the left. It essentially maps the data received from all of these documents by using the timestamp and the zone ID, and the reduce function calculates stats. That efficiently aggregates the data from various zones, collected from various sensors, into an aggregated view which contains the timestamp (so this is essentially time series data) and is also mapped to the zone IDs, which go from 0 to 5 over here.

Now, once we have this view, in order to synchronize it to the application on the tablet we have to have a small job in there, because right now there is no way to synchronize a view from Couchbase Server to Couchbase Lite. So there is a small worker application which essentially copies the latest X documents from the view into Couchbase as separate documents. This is how it looks after the MapReduce processing, after processing with the view: there is a list of zones, which contains the count of records, the sum of the voltage output that we calculated previously in the reduce function, and the zone ID, and there is a timestamp which is shared by the entire list of zones.

Now, the next thing that happens is in the Android application.
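The view logic described above can be sketched in plain Python. In Couchbase Server the map and reduce functions of a view are written in JavaScript, so the version below only imitates the data flow and is not the code from the slides. It assumes telemetry documents with a timestamp and a samples array, as described earlier; the function names, the zone_totals document type and the channels field are illustrative assumptions.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit ((timestamp, zoneId), voltage) for every sample."""
    for sample in doc["samples"]:
        yield (doc["timestamp"], sample["zoneId"]), sample["voltage"]


def reduce_stats(values):
    """Reduce step: count and sum, enough to derive total and average."""
    return {"count": len(values), "sum": sum(values)}


def run_view(docs):
    """Group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return {key: reduce_stats(vals) for key, vals in groups.items()}


def latest_as_documents(view_rows, limit):
    """Copy the newest `limit` timestamps of the view into standalone
    documents, one per timestamp, so that a sync layer can replicate
    them like any other document (hypothetical worker job)."""
    by_ts = defaultdict(list)
    for (ts, zone_id), stats in view_rows.items():
        by_ts[ts].append({"zoneId": zone_id, **stats})
    newest = sorted(by_ts, reverse=True)[:limit]
    return [
        {
            "type": "zone_totals",      # assumed document type
            "channels": ["totals"],     # assumed routing label
            "timestamp": ts,
            "zones": sorted(by_ts[ts], key=lambda z: z["zoneId"]),
        }
        for ts in newest
    ]
```

One design point the sketch glosses over: grouping by exact timestamp only aggregates across devices if their timestamps coincide, so a real deployment would presumably round timestamps to a common interval before using them as a key.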
We have Couchbase Lite in the application, so we set up a channel which is called totals. Totals is what we synchronize on: without synchronizing the entire data set, we synchronize just this subset, which is the post-processed data set. So we create a channel and we set up a pull replication from the server to the mobile application. Finally, in the code of the Android application itself, we create a view of the database that gives you tuples of timestamps and documents, and these documents are the ones you saw on the previous slide.

Now, I hope that the Wi-Fi here works, and I want to show you the result of all of this.

[Demo video plays; narration partly inaudible.] As we move on down the street, we go by a solar power station which supplies our city with power. It is comprised of a set of solar panels and a dashboard application that allows us [inaudible].

You can see here how, in real time, it responds: once you close any one of these panels, the data gets propagated right away. In this smart energy demo, the data gets propagated all the way from this Intel Edison chip here to our server (in that case it was in Virginia), where the MapReduce, querying and synchronization to the tablet happens. And again, I’ll switch off the sound; I just want you to see how quickly the dashboard responds to this. Once you close one of the panels, the readings change immediately. So this is a fairly quick response time. Okay.

This is how it looked in real life. And finally, results. After we had the integration between the Kaa platform and Couchbase implemented (and it’s now part of the open source code), the implementation of the synchronization code itself took just two days; the implementation of the overall solution took just two days. The telemetry data delivery was tested to be reliable: we ran stress tests of up to 4 million records, and every single record was successfully pushed to Couchbase. End-to-end latency, from pushing data from the endpoint to the cloud all the way to delivering it to the user application, was under two seconds. This sort of solution gives you horizontal scalability: you can add Couchbase nodes if you need to, and you can add Kaa operation server nodes. If you’re running a very successful deployment with tens of millions of devices, that’s probably something you would want. The integration is very clear and simple, and there is no need to rewrite it again: basically, you have the SDK that you embed into the application, and on the other end you just read the data from Couchbase Lite, the plugin in the mobile application.

Now, the secret sauce: the Apache Avro data schema makes it really easy and simple to query and process this data. I showed you a very, very simple example, but you can imagine how easily it translates to more complex use cases, and how easy it is to analyze data when you know what to expect from it, because you know the schema of this data. And finally, the Couchbase Lite pull replication, which allowed us to set up synchronization from the server to the client application with almost zero effort, was also a very, very cool thing.

So, thank you for your attention, and if you have any questions, please feel free. Yes, go ahead.

[Audience question, inaudible.] Slide two? This is my slide two. Well, in that specific case we used the C++ SDK. The C++ SDK is more complex than the C SDK.
I’ll talk a little about the minimum requirements. The minimum requirements are with the C SDK, which can run on something as small as a Cortex-M0, with memory consumption under 10 kilobytes of RAM. But I’m pretty sure that you can port it to something even smaller, and there is an easy way to switch off features that you are not using, which would slim down the size and the memory consumption even more.

[Audience question, partly inaudible, about how powerful the solution would be if the data were not structured.] Well, the beauty of using structured data is that you can write applications for virtually any hardware and any software, and at the data analytics and management layer you don’t have to worry about what is down there, and you don’t have to remember how this data was structured. And you have a very clear path to improving your product, adding more features and adding more data into this schema, because the platform itself keeps track of the historical data schemas, and the server remains compatible with the older versions of your endpoints, so you don’t have to re-implement that yourself. It essentially improves the handling of the data significantly and reduces the worry of implementing this handling yourself. Go ahead.

Okay, yeah, so this is a more detailed

diagram. So, the Kaa components. One of the components is the server, which is here on the left. This can be as small as a single machine running all of the services, and it can scale horizontally to dozens or hundreds of machines in larger deployments. So this is the Kaa component. Now, there is an interface here which injects telemetry data received from the devices into Couchbase Server, so this is where Couchbase Server starts. And finally, the last bit which is part of the platform is the SDK that is embedded into the endpoints. This is something that is generated based on your schema definitions. If you remember the code that I showed, the schema definition translates into the objects that we were using in the code to collect this telemetry data and push it to the cloud and on to analytics.

[Audience question, inaudible.] Live, hundreds of thousands; in tests, tens of millions, I mean in just performance tests. But the most important question here is the pattern in which you use the platform, because there are some applications which require more transactions and others which require fewer. Also, telemetry is not the only feature here; there are more, like device management, configuration management, firmware updates, software updates and so on. So it’s basically the workload you are expecting from your application leveraging all of these features that matters.

[Audience question: what’s the typical throughput?] Well, the latest tests were conducted on AWS about a week ago, on a pre-release version of the next version. There we were generating a telemetry data load; every record was about one to two kilobytes, so fairly large data. It was an AWS mid-sized installation of three servers, and they were handling 180,000 records per second, so that’s about 180 megabytes of data
from endpoints. And, well, that was a test, so there were two or three thousand different endpoints basically storming this server and pushing random data in there. But again, that is just stress simulation. In real scenarios, what we would have is certain caching: if this is not a real-time application, we would have certain caching in the client, and the client would essentially define triggers to start the big upload.

[Audience question, partly inaudible, about a migration plan for an existing legacy embedded SCADA deployment.] Right, well, that’s a very interesting question. The perfect scenario is basically to create a new version of the firmware for the equipment and push it. Not redo everything: in terms of telemetry data collection, we could replace the part that does that bit. Right. So that’s the perfect scenario, where you would be able to distribute the new firmware to your endpoints, and it would have our SDK embedded into it. In the worst case scenario, what we could do, if there is a way to do aggregation on some sort of, I call it, intermediary box (you have a bunch of sensors, and as a first step they push data to some sort of staging box, and that box is easier to manage), then we would put the SDK in there and make that box representative

of the equipment that stands behind it.

[Audience comment, inaudible.] Yeah, well, I do not suggest depending on SCADA. [Partly inaudible.] Well, okay, there are places on earth where this can be done. We are not trying to replace something that already exists and works. What I’m talking about here is building new applications, because right now we are leveraging completely new hardware. It’s an enablement platform, that’s right. We’ve seen some cases where this platform was introduced not from day one, and still it worked significantly better than anything that was built in-house previously. But yes, this is essentially an enablement technology. Replacement is not what we are trying to achieve here; it’s an enablement open source platform which is specifically designed for bringing new products to life faster and more simply.

Any other questions? Okay, thank you so much.