Searching Text With PostgreSQL – Phil Vacca

Well, hello everybody. It's about time to get started, so thanks for coming out to the last session of Postgres Open. I hope everybody's having a great time; I certainly learned a bunch of stuff and heard some cool talks. If you're waiting for your taxi or your plane ride, I'll try to keep it entertaining so we can keep going, or maybe you're just waiting for the auction; that's going to be fun.

Before I really get started on the topic, a little addendum to the original description of this talk: I'd said I would be talking a lot about US address data, but I did not get a chance to finish that demo. We're certainly going to be talking about Postgres and full-text search and what you can do with text, and we'll certainly talk about some of the problems with similar-sounding things. If we have time for Q&A, I'll be happy to talk about address data, but the demo won't include it, so if that's what you're here for, you know, maybe grab a brownie. That said, I think we have plenty to talk about today.

My name is Phil Vacca, and I'm here with EnterpriseDB. I'll say a few words about them, since they were kind enough to put me on a plane and send me here. If you're not familiar with EnterpriseDB, they're one of the larger Postgres consulting shops. We offer a proprietary platform built on PostgreSQL called Postgres Plus, which features a number of improvements, many of which make their way back into the Postgres mainline, and we've got a number of developers who are also core contributors to the actual PostgreSQL project. It's the sort of company you might talk to either if you're new to Postgres and your deployment isn't running the way you think it should, or, even better, the thing we really like, if you happen to be running some other database system and you're asking yourself: why are we spending all this money?
Let's just run Postgres! It's great. Anyway, that said, today's talk is about searching text in Postgres, so let's get going.

So I've got to ask you: have you ever spent a lot of time sifting through text? I mean really going through it, file after file, until it seems like it's never going to end? How are you ever going to find that thing you need? You may not be able to read the slide, but it says "PostgreSQL can do this." I tried for about half an hour to get that into a red font, and it's beyond me at the moment. Nonetheless: Postgres can do this.

So let's talk about the basics of text in PostgreSQL. How is it stored? Well, the ANSI standard defines two character types: the fixed-width character and the varying-width character, which we know lovingly as char and varchar. In Postgres we have both of those, but we also have the text type. You might be saying to yourself, wow, that's a lot of choices, which one should I use? It's really very simple: just use text. Text is straightforward and works very well, and behind the scenes, text and varchar are literally the same thing. A little addendum from the 9.4 manual: in some database products, the fixed-width character type may have a performance advantage, but there is no such advantage in PostgreSQL; character(n) is usually the slowest of the three because of the additional storage costs. That's straight from the manual, and you can take it from those folks.

There are some other odd text-based data types. We've got "char", and notice that it's "char" in quotes; I did not add those, that is the actual type name. "char" is not char(1); it's the representation of a single one-letter enumeration that's used in the system tables. A second type is name, another odd one. You probably won't use it, but you'll probably see it if you ever do a backslash-d on the system catalog tables. At the moment it's reserved at 64 bytes, but someday that could go up, because who knows, we might need bigger system table names.

We also have structured text, and by that I of course mean structured as in XML and structured as in JSON, but I do not mean jsonb. jsonb actually has its own internal representation; it is not a text format, whereas json and xml are, behind the scenes. You can cast them very easily; you certainly can cast jsonb to text and it will do the conversion with no problem, but it is literally a physically different data type on disk.
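To make those type distinctions concrete, here's a minimal sketch; the table and column names are hypothetical, not from the talk:

```sql
-- Hypothetical table showing the three character types.
CREATE TABLE type_demo (
    a char(10),     -- fixed width, blank-padded; usually the slowest of the three
    b varchar(10),  -- length-checked, otherwise identical to text behind the scenes
    c text          -- no length limit; just use this
);

INSERT INTO type_demo VALUES ('abc', 'abc', 'abc');

-- char(n) is blank-padded on disk, which octet_length() can reveal:
SELECT octet_length(a), octet_length(b), octet_length(c) FROM type_demo;

-- jsonb really is a different type on disk, but it casts to text with no trouble:
SELECT '{"city": "Milwaukee"}'::jsonb::text;
```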

And let's not even talk about character encodings today; I think we all know that's a topic that would open up a whole different ballgame.

So how do you query text when you're looking for something? Well, we all know the familiar LIKE statement, and hopefully everybody knows about ILIKE, the case-insensitive LIKE. One you don't see very much, but which is in the manual, is SIMILAR TO. SIMILAR TO is a subset of regular expressions, so instead of a LIKE clause you can write a SIMILAR TO clause; performance isn't what it's known for, and you don't see it used too often. If you're going to see a regex, people will just use a regex, because Postgres supports them. The one on the slide is a fairly trivial example; it's a validator for email addresses, and a pretty good validator too. So you can of course query your text and do pattern matching. Those of you who spend time with regular expressions probably have a very solid grasp of this already, and you'll find it's easy to do in Postgres; there are much more trivial examples than that which are extremely useful as well.

But these methods aren't always that good. I mean, they're good enough, they're what we know, they're what we use, but are they the best we can do with PostgreSQL? It's billed as the world's most advanced open source database, and it's billed that way for a reason. One of the problems with this approach is that when you're querying text, you need to know how a word is spelled before you can get the data out. Whether we're talking about addresses or names or anything else, if you've ever had to validate large databases full of information entered by agents, people on the phone, sometimes consumers, you'll find that sometimes the same customer, entered twice, spells their own name differently. So how do you know which one is the correct record? Well, you don't. You have to know before you look, or you have to write a bunch of LIKE-with-percent-sign sort of clauses, and nobody wants to do that.

Another problem is that words with the same root don't match: democracy and democracies, for example. You could write a regular expression to search for that, but do you really want to go back to writing statements that look like regular expressions? Maybe you do, but they're not always the most readable or the most maintainable.

And it doesn't scale; that's just a fact. The larger the table, the more it has to scan. If you're not querying based on the very first characters of that text column, it has to run through the entire text field. A thousand rows, maybe not so bad; 10,000 rows is ten times worse; 100,000 rows is probably more than a hundred times worse. It doesn't scale particularly well, and a b-tree index won't help. That's your typical index: if you just type CREATE INDEX on a column, you get a b-tree index, and the only thing that's good for is queries anchored at the first characters of the text. That's because it's organized just as you'd think: an index is essentially sorting the values alphabetically for you. So if you're trying to find WHERE name LIKE something, and you know Williams is in there somewhere, but maybe the person's name is McWilliams and you're not entirely sure whether it was Mc or Mac, your index won't help you in this case. Then there are expression indexes. You might say, aha, but I know the queries I'm going to run against this column, so I could write a specific expression index for them.
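The four querying styles just described, sketched against a hypothetical users table; the regex here is a simplified email-shape check, not necessarily the one from the slide:

```sql
-- LIKE: case-sensitive pattern matching
SELECT * FROM users WHERE name LIKE 'Will%';

-- ILIKE: the case-insensitive variant
SELECT * FROM users WHERE name ILIKE 'will%';

-- SIMILAR TO: the SQL-standard regex subset (rarely used, not known for speed)
SELECT * FROM users WHERE name SIMILAR TO '(Mc|Mac)Williams%';

-- POSIX regex with ~* (case-insensitive): a rough email-address validator
SELECT * FROM users WHERE email ~* '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$';
```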
The simplest, most typical expression index you'll see is one on the lower-cased column: you create an index on lower(column), Postgres builds that expression in, and you can query against it. But if you're trying to do pattern matching, or deep text searching, or even not-so-deep text searching, you're not going to know what you're querying for beforehand, and you're not going to create every possible index before you run the query. So that's really a non-starter too.

Thankfully, there are tools that can help you. The first one we're going to talk about is pg_trgm. What's it good for? Well, it's good for helping you with spelling problems, with words that are very close to each other. It's an extension, so you can get it with a simple CREATE EXTENSION command, and it's used for determining the similarity of two words by spelling: not phonetic similarity, mind you, but the actual similarity of the physical word.
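Before moving on, the lower-casing expression index mentioned a moment ago might look like this sketch (table and column names are hypothetical):

```sql
-- Index the expression, not the raw column:
CREATE INDEX idx_customers_lower_name ON customers (lower(last_name));

-- This query can use the index, because the WHERE clause matches the
-- indexed expression exactly:
SELECT * FROM customers WHERE lower(last_name) = 'mcwilliams';

-- But a pattern you didn't anticipate gets no help from it:
SELECT * FROM customers WHERE last_name ILIKE '%williams%';
```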

It divides each word of the string into three-character segments, hence the name trigram; that's what they call them. It always pads the left side of the word with two spaces and the right side with one space, and it ignores non-alphanumeric characters, so a dot or a dash in there does not affect what the trigrams look like.

So what does that look like? You create the extension with CREATE EXTENSION pg_trgm, and here are the basic commands pg_trgm gives you. The first two show_trgm calls show what I was just saying: the left-hand side gets two spaces of padding, the right-hand side gets one, and then it moves along the word. You can see what it's comparing between these two words, the misspelling of the word "spelling" and the correct spelling. It breaks each up into its three-character segments, then does a pretty simple statistical comparison of the two: what percentage of these trigrams match? That's similarity, which is a function; the inverse of similarity, one minus similarity, is distance, in case you'd rather sort things by how far apart they are than by how close.

And lastly, there's a threshold for matching, if you want to use pg_trgm to pluck words out of your database that you think might be related. Obviously "spelilng" for "spelling" is a common typing error, but misspellings are just as common in data entry. I'm from the city of Milwaukee, but it sometimes gets entered as "Milwakee," or with an extra a; these things are very, very close, and if you're going through a database trying to find all the people who live in Milwaukee, you want to catch everything that might possibly have been misspelled. That limit is the threshold.

You can see it says the similarity between those two spellings is 0.38, and the limit is 0.3. So if we query with pg_trgm, it will say that yes, 0.38 is better than 0.3, and this is a word that matches. We can in fact change that limit if we think 0.3 is too liberal and we want to tighten things up. The slide cuts off at the very edge there, but those are just the column names. These are two simple CASE statements, effectively the same statement twice: the percent operator in there is a pg_trgm-specific operator, and it does exactly the comparison we were just talking about. It compares against the limit; if the two strings are similar enough to meet the limit, it passes, and if not, it fails. So again, our result was 0.38, and the out-of-the-box threshold is 0.3, so asking whether those two words are similar returns yes. If we tighten that up and say we now want things to be forty percent similar, it fails: under the stricter limit, nothing is returned.

A few facts about pg_trgm. It is not case sensitive. The order of words does not matter: for space-separated words, it puts all the trigrams into an array and does an array comparison; it's doing left-to-right within each word, but not across words. Accents, however, do matter. So what's the similarity of these two statements? I hope the font is big enough to see: the first is "I am Sam," the second is "Sam I am." Are those two similar? Well, not left to right, but in pg_trgm terms, yes: they are one hundred percent similar, because all the trigrams in the two statements match. On the other hand, does the letter e match é?
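Here's a minimal sketch of the pg_trgm pieces just described. The 0.38 figure is the talk's example, and set_limit() is the function for changing the threshold (newer releases also expose it as the pg_trgm.similarity_threshold setting):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Two spaces of padding on the left, one on the right:
SELECT show_trgm('spelling');   -- trigrams like '  s', ' sp', 'spe', ..., 'ng '

SELECT similarity('spelling', 'spelilng');  -- about 0.38 for this pair
SELECT 'spelling' <-> 'spelilng';           -- distance = 1 - similarity

-- The % operator compares against the threshold (0.3 by default):
SELECT 'spelling' % 'spelilng';             -- passes at the default limit

SELECT set_limit(0.4);                      -- demand 40% similarity instead
SELECT 'spelling' % 'spelilng';             -- now fails
```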
Maybe that's an accent grave; it's been a long time since I took French. No, they don't match. Postgres says no, these are not the same, unless you do some coaxing behind the scenes, and again, we're not going to talk about character encoding today, because that's a gigantic can of worms. It says no, those two things are not similar.

How scalable is this? Some of you may be familiar with the television show The Prisoner: "I will not be pushed, filed, stamped, indexed, briefed, debriefed, or numbered." Now wait a minute: indexed? Yes, actually, let's use indexes. What can we do with an index? Well, if you have installed the pg_trgm extension, you'll also be able to see a massive speed increase for LIKE comparisons, ILIKE comparisons, and regular expression comparisons, by indexing with the newly available index types.

They're new as of the extension; they're not new to Postgres. I believe they've been around since 9.0 or 9.1, possibly even before that, but they're new to us, because we just installed the extension in our theoretical database. So these give us two new index types: the GIN index and the GiST index.

These indexes will not aid in equality search. If you're trying to ask whether string one is exactly the same as string two, and you use an equals operator, the optimizer won't even look at that GIN or GiST index, whichever one you chose; it will just go back to doing a heap scan, unless you've created a traditional b-tree index. This is actually the spot where a b-tree index is useful, because it is a left-to-right encoding of the entire string: string equality can use a b-tree index, but it cannot use a GIN or GiST index.

So let's talk about these fancy new index types. GiST stands for Generalized Search Tree, and it's what we call a lossy style of index. What the index is actually building behind the scenes is a hash of all the different trigram pieces, and because a hashing algorithm, at least this hashing algorithm, allows multiple different inputs to hash to the same value, Postgres can't be one hundred percent certain it has actually found the match you're querying for from the index alone. It goes to the index first, collects all the things it thinks are correct, and then goes to the heap, actually goes to the disk, and double-checks: did the hash work? It has to recheck the results.

The other index you get is the GIN index, the Generalized Inverted Index. It is not a lossy index, meaning it is actually building the structure you think it's building, and it is a faster index than GiST at query time, substantially so, some people say. It really depends on the load, but as you test, you'll probably find queries return faster. However, there's a trade-off: GIN indexes take substantially longer to build. Reports on the internet, on our own forums and mailing lists, say between three and five times longer; in my own experience putting these together, two to three times is certainly very common. And because it's not lossy, because it's actually cataloguing everything, any update statement against your text data takes longer than it would had you used a GiST index. GIN indexes are also physically larger on disk, since again, they're not abbreviating anything; they store the actual values we're looking for.

So what does that tell us? If you have a whole lot of text and you think it's going to change a lot, you're doing lots of updates, constantly getting new records in, and you may have to go back and update rows — probably not first-and-last-name kind of data, but some other kind of text, message board data, or maybe a blog platform would be more appropriate, something people go back to again and again and edit a lot — then the GIN index is probably going to be too costly for your needs, because it takes a longer time to build and a longer time to update. Use a GiST index instead. On the other hand, if your data is largely static, for example address data, which is where we started: address data doesn't particularly change. I mean, they do build new houses, and houses burn down and buildings get demolished, but not particularly quickly; once you have a set of US addresses, it's largely going to stay the same. So take the time, build a GIN index on it, and accept the cost on the update statements, because you're not doing an awful lot of them.
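The two index choices, sketched on a hypothetical surnames table:

```sql
-- GiST: lossy, smaller, quicker to build and update -- good for hot data.
CREATE INDEX idx_people_surname_gist ON people USING gist (surname gist_trgm_ops);

-- GIN: exact, larger, slower to build and update, faster to query --
-- good for largely static data like addresses.
CREATE INDEX idx_people_surname_gin ON people USING gin (surname gin_trgm_ops);

-- Either one accelerates the % operator, and LIKE/ILIKE/regex too:
SELECT * FROM people WHERE surname % 'Willams';
SELECT * FROM people WHERE surname ILIKE '%will%';

-- Plain equality still wants a b-tree:
CREATE UNIQUE INDEX idx_people_surname ON people (surname);
```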
GiST is pretty fast; GIN just happens to be a little bit faster; you have to balance your needs. So that's pg_trgm, which again is used for word similarity: finding spelling errors in your words, what have you.

But maybe you're not interested in spelling errors. Maybe you're interested in reams and reams of text data that you've saved, and you need to find out whether certain keywords are in there. This is the kind of full-text search, or semantic search, that people are very interested in nowadays. They want to find out whether people use certain words in combination with other words; ultimately, by and large, what they want to do is find combinations of words so they can predict what you're likely to buy, so they can advertise to you. That's the glorious life we've all chosen in data, and we can certainly find plenty of advertising out there. So: full-text search.

We all picture in our heads that what's happening is some kind of index on the literal text. If I were to take the first chapter of Moby Dick, throw it into a column, and put on it what I consider to be a full-text index, I would think what I'm getting is a literal representation of every word in there. That is not what you're getting. In fact, we're not even using the text type behind the scenes: to set up full-text search, we are not using a text column at all, but a column of type tsvector. "ts" stands for text search, and "vector" is the mathematical term. tsvector is different from text in a number of very important ways.

Number one, it eliminates case: upper- and lower-case letters are identical. Number two, it removes what it considers stop words, and this is pretty important. It decides that many, many words are not relevant to text search. We're trying to find unique, interesting things about a buyer or a prospective client; there are words like "and," "or," and "not," sure, but also words like "he," "she," "him," "her," and in fact the list is very, very large. It's not a dozen or two; there are hundreds of those stop words, and you can find a list of all of them. They were pretty carefully picked, because they aren't terribly relevant; the interesting words you're looking for will probably be more like nouns and verbs and adjectives.

The last thing, the thing that's really shocking to people, is that tsvector doesn't just take the word: it tries to remove redundancy. There's a giant English dictionary of synonyms, and in fact there are dictionaries in many different languages; you just have to specify one at the very beginning, when you set up your full-text column.
It will replace synonyms, because it's trying to find word similarity, and it doesn't actually record the word itself: it takes the stem of the word. So if we're looking for the word elephant, being great PostgreSQLians as we are, and that's certainly a great word for us to look for, the full-text catalog doesn't contain the word "elephant"; it has "eleph." Consider that the trunk, ha, of the elephant's stem.

This column absolutely can take advantage of the index types we were just talking about, both GIN and GiST, and it absolutely should: if you are building a full-text catalog with tsvector and you need to be able to search it, you should index it. The same rules apply: if you think your data is going to change rapidly and take lots of updates, you should probably go with a GiST index; if your indexed text data is largely static, maybe you've got the Library of Congress or something and you're going to search it but it's probably not going to change too much, use a GIN index if you've got the time.

There are many, many more things we can do with tsvector than we have time to get into; it's an extremely powerful toolset. One of the most important is weighting. Say we've built our full-text catalog across different columns; take something like an academic article. Academic articles have a title, they have an abstract, and they may have the full body of the text. If we're interested in finding a specific word, maybe we want to find whatever research has been done on the human genome, then we'd say the word "genome" found in the title is extremely important; we can give that a top rank. If it's found in the second category, the abstract column, maybe we'd say that's worth half as much as the title.
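The weighting idea can be sketched with setweight on a hypothetical articles table; weight A is the highest, D the lowest:

```sql
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Title hits rank highest, abstract hits in the middle, body hits lowest:
UPDATE articles SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')),    'A') ||
    setweight(to_tsvector('english', coalesce(abstract, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(body, '')),     'D');

-- ts_rank folds those weights into the relevance score:
SELECT title,
       ts_rank(search_vector, to_tsquery('english', 'genome')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'genome')
ORDER BY rank DESC;
```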
And if it's just somewhere in the body of the text, well, we don't really know what the text is about. Somebody could just be making an analogy; they could be talking about zoology and make some reference to the genome. That's not as important or relevant to our search results as a hit in the title or the header or the abstract would be. These are all things we can customize, and the reason Postgres doesn't do it for you is that all the text workloads out there are a little bit different: whatever it is you're doing, whether it's social media data, scientific research, or marketing research, you're going to have different needs than the person next to you.

It also handles many more natural languages than just English. All you have to do is Google for a tsvector dictionary for the language of your choice.

I found one for German, one for French, and one for simplified Chinese. [Audience comment.] Oh, that's terrific; that would make a lot of sense. That's good to hear.

There are also additional operators we can use beyond the basic vector probe. I thought I had one more demonstration slide in there; apologies. Additionally, you can combine search terms. Think about it: if you're searching through text, maybe the word "genome" is only relevant if it's next to the word "human"; maybe it's relevant in some other context; maybe you're studying cheetahs, I don't know.

The real question, though, is how does it all perform? The reason we do this stuff is that we want to throw it at large volumes of text; if we had just a little bit of text, we'd load it into memory and write a program in whatever your favorite programming language is: Perl, Python, Ruby, whatever you feel like, C# if you're a masochist, I don't know. How does it scale? Let's take a look. I was sure I had a full-text search example slide in here; before I switch files, I'll just tell you the name of the command you would use: to_tsquery. It tokenizes the word you're looking for. So if you're looking for the word elephant in a giant ream of text, you would write WHERE, then your text column, then the double at-sign, which is the match operator for vector searching, then to_tsquery('english', 'elephant'), since we're in the English catalog. I certainly apologize; I thought I had an actual visual for that one, but maybe we can do a live demo instead. Who doesn't like that?

All right, let's take a look at this demo. We're going to look at pg_trgm against a bunch of data and see how fast it is relative to traditional search with LIKE, or ILIKE in this case.
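The query shape just described, plus the term-combining operators, against a hypothetical wiki_articles table:

```sql
-- @@ matches a tsvector column against a tsquery:
SELECT count(*)
FROM wiki_articles
WHERE content_vector @@ to_tsquery('english', 'elephant');

-- Terms combine with & (and), | (or), and ! (not):
SELECT count(*)
FROM wiki_articles
WHERE content_vector @@ to_tsquery('english', 'human & genome');
```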
I found a list of the thousand most common surnames in the United States; it's easy, it's a Google search away. And because a thousand isn't a very big number, let's multiply it by 26, and then by 26 again. That gives us 676,000 rows of essentially very human-looking names, but with two letters stuck on the end to guarantee uniqueness, from the natural join of those three columns.

We'll put an index on there. Traditionally we'd use a b-tree index, and since we know we've guaranteed uniqueness, what the heck, let's use the best index we know how to use. The unique index creates in just a few seconds. Let's do a count. So, context: this was run on my MacBook with eight gigs of RAM. If you think back to running production maybe four or five years ago, my Mac looks almost that powerful; in today's terms it's just a perfectly good developer's machine. We got pretty good speed; the count comes back in under a second.

To get our result, we want to find any last name matching with the pg_trgm percent operator, which is case-insensitive, but we don't yet have an index it can use: 658 milliseconds. Now let's do a traditional ILIKE, where the surname is like '%mill%'. I honestly don't know why there are slightly fewer results on that; I stared at it for a long time, so if somebody spots why those nine rows dropped out, tell me. Again, we have no index, and only 676,000 rows; you probably have a bigger database than this, but it's for example purposes. ILIKE is faster than pg_trgm with no indexes: 270 milliseconds.

So let's add the trigram index. We'll use the GiST index, which again was the faster, cheaper one to build and maintain. It created very quickly. Let's do the count again: remember, we were at 600-some milliseconds, and it dropped by a factor of ten, to 64 milliseconds, with one index. And just to prove that you get the same benefit without even using the fancy operator: installing the pg_trgm extension allows you to use this index type on any text column, and then your LIKE and ILIKE comparisons do the tokenization behind the scenes and gain a massive speed increase too. So we've gone from 658 milliseconds down to 17 milliseconds, which is pretty impressive, and you'll find those results hold up as the row counts scale.

How about scaling full-text search? How good is it? Let me show you in this live demo. What I have here is a Postgres database; I wanted a lot of text data we could really chew through, so I went out to the internet and downloaded Wikipedia. Anybody can do it: just search for the Wikipedia database archive, and you'll find they put a new one out every two weeks on an FTP site. The whole thing is about 56 gigs uncompressed, and I loaded about 15 gigs into my database. Wikipedia is actually much smaller than many people think it is: the whole of it could fit in memory on a moderately powerful laptop nowadays; 64 gigs is probably a slightly beefy laptop, but nonetheless.

So I loaded about a quarter of that data. Let's take a look at what the table looks like. It's very simple; it only has three, actually four, columns, and the rows are just the Wikipedia entries themselves. So here we've got the Wikipedia entry for "isochronous"; let's see what that article looks like. This is the downloaded text, actually in wiki markup, and it's got a ton of data in it.
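A sketch of how that 676,000-row demo table might be generated: a thousand surnames cross-joined with the alphabet twice. The table and column names here are guesses, not the actual demo script:

```sql
CREATE TABLE surnames_demo AS
SELECT s.surname || c1.letter || c2.letter AS surname
FROM common_surnames s,                                       -- 1,000 rows
     (SELECT chr(i) AS letter FROM generate_series(97, 122) AS i) c1,
     (SELECT chr(i) AS letter FROM generate_series(97, 122) AS i) c2;
-- 1,000 * 26 * 26 = 676,000 rows

CREATE UNIQUE INDEX ON surnames_demo (surname);               -- plain b-tree
CREATE INDEX ON surnames_demo USING gist (surname gist_trgm_ops);

-- Compare timings with \timing turned on in psql:
SELECT count(*) FROM surnames_demo WHERE surname % 'milli';
SELECT count(*) FROM surnames_demo WHERE surname ILIKE '%mill%';
```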
And this actually is a very, very small article. Again, we've got something like 15 gigs of text across millions of rows; let's see how many are actually in there. So I've got 2.2 million articles loaded. Some of them are extremely long, some are extremely short, and in the middle we have an awful lot of data. Like I say, about 15 of those 50-some gigs, although thinking about it, I may actually have loaded more like eight.

As you can see, I have two interesting columns on that table: I have the content, and then I converted that content into a tsvector. So let's look at what the tsvector physically looks like, using that same article, isochronous. You can see, from that giant chunk of text: remember, tsvector doesn't care about special characters, so all the weird markup characters, the equals signs, the pipes, the left and right curly braces, all of those are gone. All it's interested in is the pure text. And look at this word right here: it's not the word "frequency," it's "frequenc" with nothing after it. That's the form that will match "frequency" or "frequencies." Some of these are the actual words: beat, brainwave, burst.

These are real words; others aren't. So that's physically what it looks like. [Audience question.] Yes, that is my understanding, although there are a couple in here, like "isochron," with a huge string of numbers after them: 11, 18, 31, 65, and so on. Those are the positions where the word appears. You know, I spent a while reading up on that, and that makes perfect sense.

Literally all I had to do was load the content and then run the corollary function. I've shown to_tsquery; there's another function, named exactly as you'd expect, to_tsvector, with an underscore, and that's where you define which dictionary you're using. You would just say UPDATE table SET content_vector = to_tsvector('english', content).

There's also a second dictionary worth knowing: "simple." Simple is good to know because, again, our language dictionaries go through and take stems of words; they're not giving you the whole word, they're giving you stems. You might not want that; you might actually want all the stop words and the actual literal content of your text. If instead of to_tsvector with 'english' or 'german' you say to_tsvector('simple', ...), it will build the actual literal representation of the text, word by word. It'll still simplify: it shows you the number of times each word appears and does that aggregation for you, but when building the vector column, simple doesn't do the synonym replacement or the stemming.

We can try it out, actually. Let's take one column; let me see if I can find a real short one. So this is first with the English stemmer; let's just look at the first few entries. Because we're parsing actual Wikipedia entries, there are things in there that are part of the HTML alongside the actual text. Now let's change that to simple and see if we can spot the difference.
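The difference between the two dictionaries can be sketched in a query you can run anywhere (the exact output may vary with your dictionary configuration):

```sql
-- 'english' stems words and drops stop words:
SELECT to_tsvector('english', 'The elephants were frequently accepted');
-- e.g. 'accept':5 'eleph':2 'frequent':4

-- 'simple' only lowercases and records positions; nothing is stemmed or dropped:
SELECT to_tsvector('simple', 'The elephants were frequently accepted');
-- e.g. 'accepted':5 'elephants':2 'frequently':4 'the':1 'were':3
```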
it up last night about a third event so what can we see that’s different certainly not any of these that are just pieces of aurl’s let’s start with the first actual words the word here is we can see that there are literal numbers here the number six appeared 86 times number four by itself 160 times the letter A 80 times or in

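A sketch of the commands being described here — the table and column names (wiki_entries, content, content_vector) are my illustrations, not the demo's actual names:

```sql
-- Populate a tsvector column using the English stemming configuration.
UPDATE wiki_entries
   SET content_vector = to_tsvector('english', content);

-- 'english' stems words and drops stop words;
-- 'simple' keeps every word literally (positions are still recorded).
SELECT to_tsvector('english', 'Elephants are accepting access');
-- 'accept':3 'access':4 'eleph':1
SELECT to_tsvector('simple', 'Elephants are accepting access');
-- 'accepting':3 'access':4 'are':2 'elephants':1
```

Note that 'english' drops the stop word "are" entirely and reduces each remaining word to its stem, while 'simple' only lowercases and positions the literal words.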
Now we're in a position to compare. The first real words I see after the letter A are "acceptance," "access," "actual." Let's take a look at the English-stemmed version: "accept," "access," "actual," "allow" — not "acceptance." So simple is showing us the non-stemmed word. The terminology around tsvector, I'm not going to lie, is confusing; I've gone back and forth on it while writing this up. And I have used tsvector specifically to solve text search problems where, ordinarily, the stemmed form was not what I was looking for — we wanted the literal one.

How does this perform relative to other methods? Same laptop — the most generic thing I could think of for searching text was grep. How fast is grep on a laptop with the same amount of data? Well, first of all, I couldn't load the same amount of data. One of the things I did in order to load Wikipedia into the database was break it up into four-gigabyte chunks — each of those four-gigabyte files adding up to my 15 — so I ran grep against just one of those files; this one was named x3. Our count against the content vector, looking for the word elephant — and again, here we're getting all the advantage of being able to find words like "elephants" or "elephantine" — took 22 seconds, twenty-two thousand milliseconds, almost 23. Grepping a third of that data — x3 was the third of the files I was loading — just doing a word count to see how many lines it returned containing that word, took a minute and 25 seconds. Same laptop, less data: Postgres beat it by a mile, beat it by four times. And this is just my laptop — I'm not claiming my laptop is as well tuned as it could be; this was just my default installed Postgres. The lesson here is that, compared to other methods of brute-force searching, not only are we getting the advantage of the stemmed word, being able to find, in this case, "elephant."
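The kind of query the grep comparison was up against might look like this — again, wiki_entries and content_vector are illustrative names, and the talk doesn't say whether an index was in place, but a GIN index is the usual way to make these lookups fast:

```sql
-- Count the rows whose vector matches 'elephant'. Per the talk, the
-- stemmed search also finds forms like 'elephants' or 'elephantine'.
SELECT count(*)
  FROM wiki_entries
 WHERE content_vector @@ to_tsquery('english', 'elephant');

-- A GIN index on the tsvector column is what typically makes
-- @@ lookups fast on large tables:
CREATE INDEX wiki_entries_content_vector_idx
    ON wiki_entries USING gin (content_vector);
```

By contrast, grep (or a LIKE '%elephant%' scan) has to read every byte of text on every query, which is where the 4x difference on less data comes from.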
A word like "elephant" doesn't have a lot of weird, fancy words that are like it — unlike, you know, words like democracy or disease. Did you have a question? Oh — because I'm grepping, I still have the file, and I could just grep it for "elephant" again, sure. Well, that was my demo, so we can talk about address data if you like — let me think about what we can do with five minutes left. Are you asking about the production solution I'd worked on, or the sample here? I was using it against a crossover column of city, state, county — and we included zip code and mailing address. Now, all of those put together — I mean, they could have been of any size, because they're text columns — but you're right about the physical amount of data, and it was extremely performant and allowed us to search.

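One plausible way to build the kind of combined address vector being described — the table and column names are my assumptions, and production might well choose a different configuration:

```sql
-- Build one searchable vector across several address fields.
-- 'simple' is used here because place names and zip codes
-- generally shouldn't be stemmed; coalesce guards against NULLs.
UPDATE addresses
   SET search_vector = to_tsvector('simple',
           coalesce(mailing_address, '') || ' ' ||
           coalesce(city, '')            || ' ' ||
           coalesce(county, '')          || ' ' ||
           coalesce(state, '')           || ' ' ||
           coalesce(zip, ''));
```

With a GIN index on search_vector, a single query can then match against any of the component fields at once.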
Sure, there is — though I wouldn't imagine any limit beyond the limit of the text type itself. It absolutely could. Would it perform as well as Elasticsearch? I mean, it would depend how much compute you've got behind your Elasticsearch instance. Yes, you can easily — so, for example, the Wikipedia entries that are in there: if I wanted to right now, I could just see how many Wikipedia entries matched — we really saw that already. And some of those Wikipedia entries were, you know, hundreds of megs in size. All right, well, thank you very much for your attention — I'll be happy to keep talking about the data. If anybody's interested in keeping up with what I have to say, you can find me on Google+, you can follow me on Twitter at Phil Vacca, or you can check out my blog — the name is another Prisoner reference, by the way. The presentation was built in reveal.js, if you're curious what it was I was doing — reveal is actually a very quick and easy way to put together slides, very powerful, and it allows you to get a lot of effects out of it. I found it very helpful. Be seeing you, folks.