Developing a Warning System for Risk Managers from Scratch on GCP, using AI & ML (Cloud Next '18)

[MUSIC PLAYING] OTTO VERMEULEN: Have you ever wondered how banks are keeping your money safe and how banks monitor their credit risk? Before our presentation, we will show you a short animation outlining the business problem, the solution, and how technology can help.

[VIDEO PLAYBACK] – Meet Charles, a credit risk manager at a bank. On a daily basis, Charles is responsible for monitoring the risks related to lending the bank's money to other companies. The problem nowadays isn't getting information but getting the right information. The big data world is Charles's struggle. It's impossible to read all the news about these companies. And often, it comes in a language that Charles doesn't speak. And even if most is just noise, sometimes there are really relevant and even urgent things mentioned in local news. Wouldn't it be great to help Charles do his job even better and stay on top of the credit risks the bank is facing? For that, we created Early Warning System, a machine learning-based application that is trained to process all the data that becomes available. With Early Warning System, Charles gets proactive signals about his counterparties that he needs to look at, on the devices he has. These signals come from all the local markets where his clients are active, in all different languages. Early Warning System translates everything into English so that Charles understands. This saves him lots of time and makes his life in the big data world much better. Early Warning System. [END PLAYBACK]

OTTO VERMEULEN: Good afternoon. My name is Otto Vermeulen, and I am a technology partner at PwC. I'm very pleased to be here at Cloud Next '18. What we've seen is that the main business challenge for Charles is getting the right information at the right time, and that his problem could be solved if he gets information from around the globe at the right time. So that's why we built an Early Warning System for credit risk management– extracting insights from public data from around the globe, which enables the credit risk managers to focus on important events and also supports their decision-making. And how and why is that possible now? It is possible because we have nearly unlimited compute power and storage capacity from the cloud, we have a whole range of APIs and algorithms for machine learning, and also because we now work agile, enabling fast experimentation. That enabled us to build a proof of concept in 12 weeks' time. I will now walk you briefly through the agenda. First, I introduce the team and explain how we came together. After that, ING will explain why this use case for credit risk management is so important to them. Then, we will elaborate on our approach to this proof of concept and the design of it. And after that, we will explain the technical solution, including a technical demo. Then, we will wrap up with the key takeaways of this project. As you can see, we are here with three main parties. They are ING, Google, and PwC. And how did we come together?
Well, easy– Google approached us and indicated that ING had the idea to collect real-time news and perform corresponding analysis for the credit risk management departments. And they asked whether we would be willing to work with them, which, of course, we would. So we quickly built a demo. And the demo convinced ING that they were on the right track and needed to explore this further. So they asked us to build this proof of concept. Before I now hand over to the next speaker, I will introduce the team. First of all, we have Anand Autar. Anand is the executive owner of this project and the head of portfolio management at ING. He's also the father of this project. In Anand's team were the executive sponsors, five core team members, and over 30 test users. Then from PwC we have, apart from myself, Pieter Verheijen. Pieter is the engagement leader of this project and leads our banking innovation practice in consulting. Pieter will be followed by Peter, Peter Wiggers. He's a cloud native architect and our technical lead. His hobby is Kubernetes, and you'll hear a lot more about that later. Also in our team, we had data scientists, a UX designer, and a front-end developer.

And then, not on the presentation but equally important, Petra Stojanovic from Google. She's the account manager for ING and has given us tremendous insights and access to the right experts within Google. So now, you've seen the animation. You've had an introduction to the project and the team. And it's time to hand over to Anand Autar. Anand, over to you. ANAND AUTAR: Thanks. [APPLAUSE] Thank you all. As Otto mentioned, you could see me as the father of this project, and I'm quite proud of it and of where we are now. I started with Early Warning Systems a few years ago, and we at ING are quite excited to be able to share with you what you can do with today's technology. We developed this tool within the wholesale credit risk management space, which means that we focus on clients such as large corporates; financial institutions, like banks and insurance companies; and countries. What personally makes this project really interesting for me is the fact that it shows that in the risk management space, we're innovating as well. And in line with ING's strategy, we're trying to stay a step ahead. As you could see in the introduction movie, we have to deal with a lot of information these days. And what I will try to do is explain why we started this, with all this information overload that we have today. To do so, I will first explain to you what credit risk management is, and then I will guide you through an example using the workday of a credit risk manager. So let's start with: what is credit risk management?
To answer that question, I'll actually go back to the basics of banking. If you look at banks, banks are a place where you go to deposit your money, to borrow money, or to transfer money from one place to the other. In essence, we're just a financial services company. So if you as an individual deposit your money with a bank, the bank will actually lend that money to a client who is in need of money. And when we do that, we run a risk, and that risk is called credit risk. It's the risk that when we lend out money to a client, the client will not be able to repay that money. In other words, they won't be able to adhere to their obligations towards the bank. If you look very closely at this picture, you will see that you as a private individual are also running a risk, but then on the bank. And as a result of that, we are a regulated entity. So we need to abide by a lot of rules and regulations, just to make sure that your money is safe with us. And within this whole scheme, having proper early warning systems is quite important. So now, let's look at what happens when we actually lend out money. To assess the risk, we'll do a lot of analysis. We'll look at the financial health of a company. We'll look at the management team of the company, whether they are able to adhere to the strategies that they've set out. We'll look at macroeconomic events. We'll look at the balance sheet, the net profitability of the company, and so forth. And then we input all these variables into what we call a credit risk model. These are models that we've internally developed, which have also been validated by independent parties within the bank and also [INAUDIBLE] by the regulator. So when we input all these variables, we get a risk rating. And a risk rating is nothing more than an output of a model. It gives you the classification of the risk. So at ING, we use a number code– one being very good risk, whilst 18 is very bad risk. External rating agencies actually do the same thing. They use AAA for very low risk, whilst BBB and beyond is very high risk. So when we've engaged with a client and we've determined that we're willing to run the risk because it's within our risk appetite, what we'll do is start monitoring that client, which means that we're starting to see whether there are any signals out there which could cause any potential issues with the client going forward. This is what we call Early Warning Signals. And then on a daily basis, we'll use sources like news, market prices, but also interim figures. I could give you a lot of examples of early warning signals, but these are quite different per type of client
and per type of industry. So what I'll do is I'll just run you through the day of a risk manager. So let's meet Charles. Actually, he's almost the same guy as in the introduction movie, but then with a different hair color. Well, Charles is a credit risk manager who has a portfolio of 100 clients, which are globally active. He has been with the bank for about 10 years. He's a very seasoned banker. So he has actually developed a sixth sense which tells him, when he's looking at a transaction, whether it's a good or bad risk. And in line with all credit risk managers who are out there, we don't like any type of surprises. So let's look at Charles's day. First of all, he gets into the office. He grabs a cup of coffee and then goes through his paces. Basically, he looks at the risks– I mean the news– which is relevant for his portfolio. And then, he also runs through all the transactions, so he looks at transactions that he needs to approve or decline. He looks at annual reviews. And he also looks at rating models. So Charles has quite a busy day. And in that day, he also needs to make sure that he captures all the signals and information out there for his 100 clients. Now, let's imagine the following. One day, Charles is reading a global newspaper. And he finds out that one of his clients is involved in a bribery case in a very specific country, which surprises him, because he didn't know about it. So being a risk manager, he starts digging deeper. And then he finds out that this information was already available a month ago, but only in that very specific country and that very specific language. Digging deeper, he finds out that the stock price has been declining for like a month, by 10%, and he didn't know about that. As you can imagine, Charles being a person who isn't fond of surprises, he is quite upset. Now, let's look at another example. Meet Lauren. Lauren is also a credit risk manager, but then for banks and for countries. And her portfolio is actually much more geared toward market-based prices. OK, thanks– I hope you can hear me now. Well, what it means is that Lauren has to look at whether the risk rating profile of that client needs to be adjusted or not. Well, she finds out that the news was already available a day ago, when the external rating agencies downgraded the country. Like Charles, she's not really happy. So what do Lauren and Charles have in common? Well, basically, they have to deal with a lot of information whilst they have to do their regular day job. Imagine that you have like 2,000 counterparties across the globe, in 75 languages, with more than 20,000 sources that you get information from. That's a lot. So they run the risk of losing out on early warning signals. We identified this problem within the wholesale risk management space, and we wanted to have a tool which gives us information with broad coverage of news and market data, which is relevant for the risk manager– so we can move faster in catching these signals, be more proactive as a risk management department, and move toward what we call a continuous monitoring framework. So the tool that has been built now is actually a first step. It has global and local news sentiment. It has market prices, such as equity, CDS, and commodity prices, and it covers countries, organizations, and sectors. Users can set their own threshold levels. And when a threshold is reached, the tool will signal or give a notification to the user. For us, this is a first step, because we always anticipated, when we started with this project, that we want to have a tool which is much more predictive. Remember the sixth sense that I talked about with Charles?
Well, this sixth sense of Charles is nothing more than a heuristic that he developed across the years. And we believe that with the data which is available today, but also with the existing and future machine learning technology which is out there, we can make a tool that is much more predictive. So why did we build it from scratch? Well, we as ING, we're a data company.

And we think that it has a lot of advantages to build these types of applications yourself. First of all, in today's world, it's relatively easy, and you can build these tools quite quickly for a very specific use case. Second of all, it gives you a lot of flexibility. It also makes you independent from other solution providers, which provide a lot of features that you often don't need. Besides the fact that we as ING want to be masters of our own destiny, we've actually proven that we can build these types of tools ourselves. So that's why we partnered up with Google and PwC to build this tool. We're at the stage now that we have tested the tool across eight countries with 30 users. And we're now determining the next steps in developing the tool further but, more importantly, in embedding it in our core monitoring processes going forward. So I hope now you get a better feeling for why we need to have the tool, but also that credit risk management is a very data-driven and also a very analytical function within the bank. Well, with that, I want to thank you for your attention. Sorry about that. Thank you for your attention. And I'll hand it back to Otto. [APPLAUSE] OTTO VERMEULEN: Thank you, Anand. That was very clear. I think we now know what the bank does and that credit risk management is extremely important to them. Also good to hear that the proof of concept addressed your early warning signal challenges, and that you are looking forward to using machine learning to get more predictive signals. So now, I think it's time that we hear how the project has evolved and what the design principles were– so over to Pieter Verheijen. PIETER VERHEIJEN: Hey, everyone. It's great to be here and talk to you about our proof of concept. Personally, I'm very passionate about solving important problems and making innovation really happen. In my part, I will run you through the approach that we applied to this proof of concept; secondly, the design of the solution and the underlying business needs; and thirdly, I will give you a demo of the solution itself. To start with the first part, the approach– for me, there are three crucial ingredients that have been part of this proof of concept. The first is the team. You need to have the right skills and people to deliver fast. In our case, we brought together five skill sets: an IT architect; a user experience designer; a user interface developer to implement the design; data scientists to build models to detect early warning signals; and credit risk expertise from Anand and his team to tell us what is actually relevant to see. My lesson learned from building such a team is that you need to find the right balance– the balance of experts who can deliver very fast in their area of expertise, but also skill overlaps, so that you build, in the end, a solution that fits together. The second ingredient was the way of working. We really benefited from direct user feedback all the time. We applied an agile way of working. And we started with a demo that was built in only two weeks' time and shown to the end users. Based on that approach, we got very valuable feedback and a dialogue that we could build upon for the rest of the project. We took that feedback into our backlog, started planning the first sprint, and from that moment on, the rhythm was there so that we could work with the end users continuously. The third ingredient was scoping. We were only using public data in the beginning. And that increased the speed tremendously. We didn't need to connect with any existing IT system within the bank at that point in time, and that was very important for us. Now, the approach that we applied had three phases on a high level. The first phase was really the beginning, when we needed to start from scratch– nothing there in the Google Cloud Platform environment, just the project name. Then, we started with the three ingredients that I just explained and with the services available in the Google Cloud Platform to
develop the solution in sprints. And I think the success factor was that we had a successful demo every sprint
and, at the end of six sprints, a working proof of concept ready for user testing. In our case, this really demonstrated for me the power that you get from building proofs of concept with Google Cloud– that you can really go fast. In our case, it was a successful proof of concept, so we decided to move on. The second phase was about testing it with users– as Anand explained, eight locations, around 30 users. Through those user pilots, we gathered a lot of feedback and data that we could use to retrain the models, from a machine learning perspective, and decide what is actually relevant for users to see. And now, we are in the third phase, where we are deciding how to roll this out further and take this into production, into the core processes of the bank. Now, that was the approach for this proof of concept. Next, I quickly want to take you through what we exactly built. This is a very simplistic representation of our solution. On the bottom, you see two types of data sources that we ingest– firstly, market data in terms of equity prices, credit default swaps, bonds, and other financial instruments coming from Thomson Reuters, based on the license that ING has with them; secondly, news coming from public sources via Google, GDELT, and others. That's something we upload into the Google Cloud Platform and ingest, and there we have built a pipeline to process and analyze that information. That middle layer is also where the machine learning is happening. And then, based on that pipeline, we decide what to display to a user, who is able to log into an interface and from there get that information. On the right side, you see some statistics of what we are actually processing. The current scope is around 250 clients, or organizations, in the tool. And you can get an impression of the amount of information that we are processing on a daily basis. Now, that was a very simplistic picture. And actually, through those sprints, we encountered some very specific business needs that we needed to fulfill to build a successful proof of concept. Five are listed here. You need to have maximum coverage, delivered as signals in real time, with enriched information so that you are sure that the attribution is right, filtered and ranked based on relevance, and then, in the end, clustered into single events. I will go through each of those in more detail in the coming slides. The first part is coverage. We really learned that entity profiles drive the coverage that we were looking for. In one of the first demos, we actually got the feedback that we were missing out on important information. And as Anand explained, you don't want to miss out on those signals. The reason for missing out on information was that we were only looking for information with the parent company name. We learned that we should take a different approach: we should build entity profiles with all the aliases and subsidiaries underneath that company, in different languages, as you can see here in an example for ING. In that way, we have been able to increase the coverage by a factor of five to 10, on average, for each client. Now, the information is then coming in, and you want to process that in real time. That's a key requirement for risk managers, so that they get those signals fast and can respond to them. As you see here, for an average risk manager with a portfolio of 50 clients, that's quite a lot of information that we need to handle. On a daily basis, you talk about 15,000 news items and 250 financial instruments which are continuously moving, and that's all something we need to be able to process in real time. So how can we help the risk manager to stay on top of these things?
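The entity-profile idea described above can be sketched in a few lines. This is a simplified illustration, not the production implementation: the profile contents and the substring-matching logic are assumptions for demonstration only.

```python
# Sketch: match news text against entity profiles containing aliases and
# subsidiary names in several languages (illustrative profile, not the real one).
ENTITY_PROFILES = {
    "PwC": [
        "PwC",
        "PricewaterhouseCoopers",  # full legal name
        "普华永道",                  # Chinese alias
    ],
}

def matching_entities(text, profiles):
    """Return the entities whose aliases appear in the text."""
    lowered = text.lower()
    return {
        entity
        for entity, aliases in profiles.items()
        if any(alias.lower() in lowered for alias in aliases)
    }

# A search on the parent name "PwC" alone would miss this Dutch headline,
# which only mentions the full legal name:
headline = "PricewaterhouseCoopers benoemt nieuwe bestuursvoorzitter"
print(matching_entities(headline, ENTITY_PROFILES))  # {'PwC'}
```

A production version would use proper tokenization and fuzzy matching rather than raw substrings, but the coverage gain comes from the same idea: one profile, many names.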
In the technical part, done by Peter, we will talk about the architecture and the way we developed the solution to meet this requirement. The third one is enrichment. As a risk manager, you actually want to have information that is truly about the organization that you are responsible for. So you need to be able to cut out the false positives. As the example here shows you, in an average news item you see a lot of actors mentioned. So if you would do a keyword search
for any of the organizations, this item would show up. But if you are looking for early warning signals, only in the case of oil company E is this a true positive early warning signal. The rest are just false positives, and you want to filter those out. So we need to go beyond the keyword method and apply techniques to understand who, what, when, and where this event is actually happening. We use natural language processing for that, and we will show later on how that works. The fourth part: now we went very broad, a lot of information is coming in, and we also enriched it, so it's really about this counterparty, or client. But there's still too much information to handle. So you need to be able to filter that information and select items which are actually relevant for a credit risk manager to look at. As you can see here, there are also some examples of things you don't want to look at as a risk manager. If you are responsible for monitoring a bank and you're getting the global news, you also see a lot of local news about ATM incidents, for instance– negative sentiment, but not relevant for a credit risk manager on that level. Same with car manufacturers– you get news about car accidents as well. So what we need to do is make that distinction in our pipeline, to detect what types of topics are relevant for a credit risk manager to look at and cut out the other part, which is just noise. And then the fourth– or, sorry– the fifth one is the clustering. So now, we have relevant items to look at. But as you can imagine, if such an event happens, more publishers are writing about it. And it becomes a very annoying experience for a user, because you get the same news message over and over again, but then from a different news publisher. So what you need to do to solve this is build a clustering algorithm that understands, based on the text, that this is talking about the same event. And that's what we did, so that you cluster it into a single event and bring that message only once to the user. Now, these are the five building blocks of our solution. The last thing I would like to do is take you through a demo of what the solution now really looks like. For the purpose of the demo, we don't want to harm anyone here in the room by showing real early warning signals. So we created a simulated dataset with dummy names and dummy news items. The rest of the tool is working just as it is. Now, Charles is logging in. As you can see, there's nothing on his home page yet. So the first thing he does is go to his portfolio, where he wants to add counterparties. There's an organization list, but you can also start searching for names. That's what he's doing right now. And he adds a second organization to his portfolio, a third one, and a fourth organization. So now, he has four organizations in his portfolio. Yes, there they are. And then he clicks top left, going back to the home page. On the left side of the screen, you see the news feed. And the news feed has factors like sentiment score, dates, and entities found. You can also filter on the time period. You can filter the news on various factors, like relevance, or negative sentiment, or popularity, which means how much is written about it. And then here, you see that he goes to a specific article which has five similar items. That's the clustering. These are all about the same event, in different languages, with different titles. So this is, I think, Chinese; a French article– FR is French– translated, and then clustered into that single event. On the right side, you see the market information, so equity prices. The first one is a negative one, so it went down. So he takes a look at this specific counterparty page. We have much more detailed information– equity price, credit default swap, credit ratings. But actually, this is all very reactive. You need to go and look for the information. But what you can do as well is set your triggers
and make it proactive. So let's say the equity price drops 5% in a day. That's what he's doing now. He adds that trigger. He can do the same thing with, for instance, keywords. In case of fraud, for instance– if there's a news item with fraud in it, I want
to be notified immediately. And that's what the trigger is doing. So it's already running now in the background, checking if there are any of those triggers being hit. News sentiment very negative– another trigger. Credit ratings– if there's possibly a downgrade, that's something you want to be notified of immediately. And in this way, it becomes a very proactive way of getting that information, because now, those triggers will search for you and make sure that that information is coming to you. One has already been hit now, around fraud. So there has been an article about fraud on this specific client. Let's take a look. Yes, there it is. And the sentiment score is minus 70, so it's relatively high. And now, the user can give feedback on the specific item: I'd like to see more about that. Or, for instance, the second one he doesn't consider relevant. That's information we can use to make those models that we have running in the background more advanced with machine learning. So that was a very short demo, but I hope it demonstrated to you that we have been able to create a digital assistant for a credit risk manager, to make his job much easier, with increased coverage, increased response time, and, in the end, a proactive tool that makes sure that the information comes to him at the right time. With that, I would like to thank you and hand back to Otto. [APPLAUSE] OTTO VERMEULEN: So thank you, Pieter, for explaining the way we approach projects. You did that very well. And also, thank you for outlining the business concepts, which we needed. I think it's now more than time that we dive into some of the technical details of this project, to add some technical meat to the bone. And I'd like to have on stage Peter Wiggers. PETER WIGGERS: So my name is Peter Wiggers. I'm the cloud architect and one of the software developers of this tool. And I would like to make this presentation a bit more technical. I'm really excited to do that, and I hope you are too. So let's start with a schematic overview. Can we switch back, please, to the slide? Yeah, thank you. What you see here is a schematic overview of the solution we built on Google Cloud. As you can see, we use a lot of managed Google Cloud services, which really enabled us to develop quickly. On the left side, you see the input we used for the pipeline you see on the right side. The input consists of the entity profiler and of public data sources for news. On the right side, we have the pipeline we built. As you can see, it consists mainly of two very important components: one of them is Google Cloud Pub/Sub, and the other is Google Kubernetes Engine. This combination made sure that this pipeline is really robust but still very flexible, and I will show you why later in the demo. This pipeline is separated into a couple of steps, and every step is its own Kubernetes deployment, so we can easily scale it independently from the other Kubernetes deployments. All deployments communicate via Pub/Sub: the input comes from Pub/Sub, and the output goes to Pub/Sub, until processing an article is finished. So what I want to do now is switch to the demo, please. I want to take you on a trip of an article through the pipeline. We'll do this in a Jupyter notebook– I hope you can read it from the back. We will start with building an entity profile. In this case, we will just use PwC. It's the only keyword we use as input. And we will only build a part of the entity profile, which is gathering the aliases and Wikipedia pages of this entity. So we will run this. As you can see, we find quite some aliases and some subsidiaries, in different languages, different character sets, and different namings. We use all these aliases to search for news about this company. What you also see is a couple of Wikipedia identifiers,
and these are really important for us in a later stage. So what we'll do now is use one of the aliases– in this case, number five, which is just PwC. And what we do is generate RSS feed URLs for Google News. As you can see, we use the word as a query for this RSS feed. And what you also see is that we query for specific editions. An edition on Google News is a combination of a language and a country. So in this case, it would be a French article written in Belgium. For this example, we will query one Google News RSS feed, and we will only use one edition, the Dutch edition. So we parse the feed. We get the feed in, and what you see here are the first five articles of this RSS feed. It's in Dutch, so you probably won't be able to understand it, but that will come later. What we get from the RSS feed is the headline, the publish date, and the URL of the original article. So this is the first part where Pub/Sub comes in. Because before this, we have to start the pipeline at some moment, and we start with the queuer. What the queuer does– it was located on the previous slide– is it periodically publishes a task to Pub/Sub where it says, hey, can you query on PwC on the Dutch edition of Google News? So that's done now. And what we then do is, for every article we get from this RSS feed, we publish a task to Pub/Sub saying, hey, can you process this article for me?
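The publish-and-process pattern just described can be sketched as follows. The real pipeline uses Cloud Pub/Sub topics and subscriptions; in this runnable sketch, Python's `queue.Queue` stands in for the topic, and the task shape (`entity`, `edition`, `url`) is an assumption for illustration, not the project's actual message format.

```python
import json
import queue
import threading

# Stand-in for a Pub/Sub topic: the queuer publishes one task per article,
# and identical workers pull from the same "subscription" independently.
topic = queue.Queue()
processed = []

def publish_article_tasks(entity, edition, urls):
    """The queuer: publish one processing task per article found in the feed."""
    for url in urls:
        topic.put(json.dumps({"entity": entity, "edition": edition, "url": url}))

def worker():
    """A worker pod: pull tasks until the queue is drained."""
    while True:
        try:
            task = json.loads(topic.get_nowait())
        except queue.Empty:
            return
        processed.append(task["url"])  # real worker: fetch, translate, analyze
        topic.task_done()

publish_article_tasks(
    "PwC", "nl-NL",
    [f"https://example.com/article/{i}" for i in range(100)],
)

# Scaling the deployment = starting more identical workers on the same topic.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(processed))  # 100
```

The point of the pattern is that publishers and workers never talk to each other directly, which is what makes it possible to scale each Kubernetes deployment on its own, as the demo shows next.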
And we have different workers which listen to this Pub/Sub topic, get the task, and process it. It's as easy as this: I just publish to a specific topic, and I subscribe to a specific subscription within that topic. What I want to show you now is some statistics, so we will open this one. What you see here are some metrics from Pub/Sub. In the first chart, you see how many messages we publish to Pub/Sub. In this case, it's almost, I think, 2,000 per second at one point. And we do that for, I think, a couple of minutes, which leads to a total queue in Pub/Sub of about 200,000 articles to process– there can be duplicates. So here, you see the entire queue. And what you see here is the number of articles the workers process. It's about, I think, 30,000 per– this is five minutes. So in like 30 minutes, we process the entire queue. And then we just wait, and we search again on those entities. And this is a graph showing the age of the oldest task in this Pub/Sub queue, which is about 2,600 seconds– I think that's about 40 minutes. And what makes this setup really nice is that we can scale the workers independently. So let's say you think this is too long: I don't want to wait 40 minutes if something happens, I want to know it quicker. Then, we can scale the workers independently– let's scale them, for example, to double the number of workers. Then, this number, this 30,000, will increase to 60,000, and your queue will be processed in half the time. I would like to show how easy this is. First, I'd like to know: how many of you have hands-on experience with Kubernetes?
OK, that's nice– that's quite some. You probably know how to scale a Kubernetes deployment. For the people who don't, I would like to show you how easy it is. So let me zoom in a little. What we see here are the workers that process the first part of an article– I think it's 12 containers, or 12 pods. So what we want to do is scale it, because we are not happy with this 30-minute processing time. So what we will do is– let me type correctly there. Here, it says we want to run 12 replicas. So let's say we change it to 18.

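The scaling arithmetic behind this– drain time falls linearly as replicas are added, assuming per-pod throughput stays constant– is simple to write down. The numbers are the ones quoted in the demo; the per-pod rate is derived from them, not measured:

```python
def drain_minutes(queue_size: int, pods: int, per_pod_per_min: float) -> float:
    """Minutes to empty the backlog, assuming throughput scales
    linearly with the number of pods."""
    return queue_size / (pods * per_pod_per_min)

# Figures from the demo: ~200,000 queued articles, 12 pods,
# ~30,000 articles per 5 minutes => 6,000/min total, 500/min per pod.
per_pod = 30_000 / 5 / 12

baseline = drain_minutes(200_000, 12, per_pod)   # roughly half an hour
doubled  = drain_minutes(200_000, 24, per_pod)   # doubling pods halves it
```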
So that's 50% more pods running. You immediately see that Kubernetes now spins up more containers, and they are already running. They listen to the same Pub/Sub topic, and Pub/Sub makes sure the tasks are distributed over these containers. So if we now waited a couple of minutes and the pipeline started again, this bar would be at something like 45,000 per five minutes. So it's that easy to scale this. So let's assume this article arrived at the processor. We instantiate a new article, and we have the same information that was in the RSS feed. The information was sent to Pub/Sub, and we get it back from Pub/Sub, so nothing is added here– just the URL, the original headline in the original language, and the publish date. What we do now is give the worker the task to visit the website and collect the body of the article. So what you see here is the body of the article. It's in Dutch, and we need it for further processing. What we do after this is, of course, translate the article. Because we process, I think, 75 languages, and we need everything in English. So we use the Cloud Translation API to translate the headline and the body of the article. So now you know what this article is about– it says Ad van Gils is the new CEO of PwC Netherlands. And we also translate the body here. Now that we have the article in English, we can use the Cloud Natural Language API. This API does three things for us. First of all, we want to analyze the sentiment of the article. The credit risk manager is mainly interested in negative news, so a negative sentiment is more important for a credit risk manager than a positive one. So we just analyze the sentiment. And here it says we have a sentiment score of 0.26– the score is normally between minus 1, which is very negative, and 1, which is very positive– so it was quite positive. And here, we have a magnitude. The magnitude is a number that represents the number
of expressions of emotion used in this text– here, it's 3.3. The second thing we do with the Natural Language API is analyze entities. And this is a really important part for us, because we want to extract the entities and know which entities this article is about. So let's run it. Well, here you see it extracted Ad van Gils as a person, and it also extracted PwC Netherlands as an organization. And here you see why Wikipedia is so important for us. Because before, we just searched for PwC, and when we search for PwC, we don't know if the results are about the PwC we are looking for. But when the Natural Language API processes it and says, hey, this is the Wikipedia page of this company, and we have that Wikipedia page in our database, we can make the match– this is really the company we are interested in. And I saw this example– let me scroll up a bit. Here it says Waterpolo Association PwC, which is a completely different entity. We have a textual match, but if we passed this through the Natural Language API, it would probably not recognize this entity as being PricewaterhouseCoopers. That's really important. What you also see here is the salience. The salience is a very important number– it represents the importance of an entity within this article, and the results are ordered by salience. So it says Ad van Gils is the most important entity, and next to that, it's PwC. And with this, we can distinguish between: is this article really about this entity, or did we just find it somewhere in the article? We also have a sentiment per entity, with the same score and magnitude. So if an article is negative, we want to know: is it really negative about that entity, or maybe about another entity?
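The disambiguation step just described– matching extracted entities to tracked counterparties by Wikipedia URL rather than by name– can be sketched like this. The entity dicts mimic the shape of the Natural Language API's entity output; field names, salience values, and URLs are illustrative:

```python
# Counterparties we track, keyed by Wikipedia URL (collected up front
# alongside the aliases, as described in the talk). URL is illustrative.
TRACKED = {
    "https://en.wikipedia.org/wiki/PricewaterhouseCoopers": "PwC",
}

def match_entities(entities):
    """Return tracked counterparties in an article, most salient first.

    A bare textual match like 'Waterpolo Association PwC' carries no
    matching Wikipedia URL, so it is correctly ignored.
    """
    hits = [e for e in entities if e.get("wikipedia_url") in TRACKED]
    return [(TRACKED[e["wikipedia_url"]], e["salience"])
            for e in sorted(hits, key=lambda e: e["salience"], reverse=True)]

article_entities = [
    {"name": "Ad van Gils", "type": "PERSON", "salience": 0.42,
     "wikipedia_url": None},
    {"name": "PwC Netherlands", "type": "ORGANIZATION", "salience": 0.31,
     "wikipedia_url": "https://en.wikipedia.org/wiki/PricewaterhouseCoopers"},
    {"name": "Waterpolo Association PwC", "type": "ORGANIZATION",
     "salience": 0.05, "wikipedia_url": None},
]

matches = match_entities(article_entities)
```

Only the entity carrying the tracked Wikipedia URL survives; the textual look-alike is dropped, which is exactly the disambiguation the demo relies on.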
We have a couple more entities here– we only show the proper-noun entities, not all entities found. And what you also see here is the type EVENT, and this is really important for clustering later. So it recognizes a couple of events: the article was about Ad van Gils being appointed as the new CEO, and we have the appointment as an event here.

The last thing the Natural Language API does for us is analyze the category of the article. Google has configured a couple of hundred categories, and the API can predict which category the article falls into. So if, for example, PwC sponsored a sports event, the article would probably be classified as Sports– and that's a category we're not interested in. So we only use white-listed categories, like Business & Industrial or Company News. And here it says it's 90% sure that this article belongs to the category Business & Industrial, so we can continue processing this article. But because these couple of hundred categories are still quite general and not specific to the field of credit risk, we defined a couple of topics ourselves and trained a couple of models that can classify the article for these topics. We trained more than these, but here are a couple of models built with scikit-learn. They are really good at, for example, predicting whether an article is about fraud or about bankruptcy– topics that are really important for the credit risk manager. So here, it says it's 11% sure that it's about fraud and 8% sure it's about bankruptcy. Now we generate a summary– a completely newly generated summary– so we can show it in the dashboard. And after this, we can remove the original body and the translated body, because we don't need them anymore; we've extracted all the information we need. And now comes the last part, which is the clustering. What we do with clustering is create one big matrix where the columns are all the entities we found across all articles– there are hundreds of thousands of columns in the matrix, maybe even millions, I don't know– and the rows are individual articles. And as the value, you see the salience of that entity in the specific article. For this example, we will find a related article, and we'll add it to the matrix to
show you how we do this clustering. So we found a related item about this appointment, and we add it to the matrix. What you can see is that a couple of entities are added here– these entities are found in the second article but not in the first. And likewise, some entities are found in the first article but not in the second. We can also see that Ad van Gils is still the most important entity in both the first and the second article. What we can do now is calculate the cosine similarity, which is just a mathematical formula that calculates the distance between two vectors– this vector and this vector. And with this distance, we can say: OK, these two articles are about the same event. So let's calculate it. And here it says we have an 82% match for these two vectors. And we can play with the threshold: if we set the threshold at 80%, these two articles will be clustered together; if we set it at 85%, they won't. So that was it for the demo. This took me, I think, like 15 minutes– for a worker, I think it will take like 15 seconds, and we do it 100,000 times a day. And what really helped us here was the fact that we used Kubernetes Engine and Pub/Sub, so that we can scale. We now have 250 counterparties in the tool; it will be increased to maybe 10 times that, and I'm confident that we can scale to that number of counterparties. Thank you– I hand back to Otto. [APPLAUSE] OTTO VERMEULEN: Thank you, Peter, for this excellent demo. And I think now we all understand why your hobby is Kubernetes. We've seen that we used a lot of the components of GCP and also did quite some coding ourselves. So now, I think it's time to wrap up this session, and I want to give you three key takeaways. The first is the right information at the right time. We have seen that an international bank

needs information from around the globe at the right moments, but also that information is growing exponentially. So the key question was: how do we get the right information at the right time to the credit risk manager? And I think that with this proof of concept, we've shown that you can do that. The second key takeaway is cloud and machine learning. Three to five years ago, it would have been inconceivable to build this. But now, with the advent of the cloud and all kinds of APIs and algorithms for machine learning– like translation and natural language processing– developing a system like this is quite easy. And finally, agile working and fast experimentation: we have demonstrated that with the right tooling and the right team, you can build a proof of concept from scratch in just 12 weeks. And you can do that as well, and start tomorrow. Thank you for your time. [APPLAUSE] [MUSIC PLAYING]