Elasticsearch Series — Part 1 — Introduction
In the last few years I have worked a lot with Elasticsearch and processed terabytes of semi-structured data. Strangely, my love for Elasticsearch started with a journey I embarked on not out of choice but out of business necessity. That also means I have not explored the competitors of Elasticsearch. Someday I will definitely do a POC with Solr, but until then I am quite satisfied with Elasticsearch.
I must say that the learning curve is steep no matter what kind of background you come from. Terms like “Analysers”, “Filters”, “Tokenizers” and “Lucene”, along with a different query language, can make the task daunting. On top of that, the Elasticsearch documentation is, let’s just say, not easy to navigate (more often than not I have relied on Google to find me the right Elastic page). Because of this, for about my first 6 months with Elasticsearch I was still using it as a NoSQL database. The use case I was working on was to build search capabilities on TV subtitles and EPG (electronic programme guide) data. The dataset for just the last 5 years was over 800 GB, and one of the first tasks was to ingest the data; that is where my journey into the world of Elasticsearch began. Since then I have led various projects to build search engines based on Elasticsearch, and even after years of experience I am still learning.
Enough background, let us get into it…
Let us start with the question: why Elasticsearch?
Before I start: this is not a history lesson and I am not bringing in hard facts, just talking abstractly about the evolution of commercial databases.
I come from an era when Oracle databases were still in vogue. I remember tuning SQL queries, and though Oracle and similar databases had lots of advantages, they were missing two essential things:
- Storing unstructured data was not efficient
- String-based queries were not efficient
Then came the boom of NoSQL databases like MongoDB, and all of a sudden unstructured data could be stored easily. But one thing most of these databases were still missing was efficient string-based queries. One could ask: why are string-based searches so important? I would say because of user experience and what I like to call the Google effect. Search boxes nowadays are not just meant for typing in search terms. Query suggestions, aggregations, fuzzy search, acronym and synonym search, stemming and probably 100 more algorithms have ushered in a new era of search needs.
And this is exactly the problem that Elasticsearch solves. Not every user experience will be the same as Google's (nor will every problem), but Elasticsearch brings us pretty close to giving these features (once delighters, now necessities) to the end user without a lot of effort or cost.
Use-cases of Elasticsearch
The next logical topic is: what is Elasticsearch most useful for? In commercial terms, what are its use cases? Or, when should we use Elasticsearch?
Quite simply, Elasticsearch is meant to be used as a search engine. But what exactly does that mean? Common sense tells us that search engines can find stuff for us. And the most natural kind of search is text-based search (Google, Amazon, eBay etc). But text-based search is not simple. There are spelling mistakes, synonyms, acronyms, grammar, languages and many more things to think about. Before you know it, the search starts becoming a monster. So this brings me to the first use case of Elasticsearch: complex text-based search.
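To make the spelling-mistake point a little more concrete, here is a minimal sketch of a typo-tolerant search request, written as a plain Python dictionary in the Elasticsearch Query DSL. The field name `subtitle_text` and the helper `fuzzy_match_query` are made up for illustration, and the snippet only builds and prints the request body; actually sending it would need a running cluster and a client, which I am not showing here.

```python
import json


def fuzzy_match_query(field, text):
    """Build a search body for a full-text match that tolerates typos.

    With "fuzziness": "AUTO", Elasticsearch chooses the allowed edit
    distance for each term based on the term's length.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",
                }
            }
        }
    }


# "subtitle_text" is a hypothetical field; "elasticserch" is a deliberate typo.
body = fuzzy_match_query("subtitle_text", "elasticserch")
print(json.dumps(body, indent=2))
```

Synonyms, stemming and language handling are configured differently (through analysers on the index), so treat this as one small piece of the monster, not the whole of it.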
Now let us elaborate on that a bit. Given a search phrase, a search engine should bring us the most relevant results first (the focus is on the relevancy of the results). So we have arrived at the second use case of Elasticsearch: most relevant search results.
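One common way to nudge relevancy is to say that a match in one field matters more than a match in another. The sketch below builds a `multi_match` request with per-field boosts. The field names `title` and `description`, the weights, and the helper `boosted_search` are assumptions for the sake of the example, not something from a real project.

```python
import json


def boosted_search(text, field_boosts):
    """Build a multi_match query that searches several fields at once.

    `field_boosts` maps a field name to an integer weight. The "field^3"
    syntax multiplies the score contributed by that field by 3.
    """
    fields = [f"{field}^{boost}" for field, boost in field_boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Hypothetical fields: a hit in the title should count more than one in the description.
body = boosted_search("evening news", {"title": 3, "description": 1})
print(json.dumps(body, indent=2))
```

Results come back sorted by score by default, so a boost like this changes which documents end up on top without changing which documents match.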
Now what good would a search be in any commercial environment without some aggregations, right? Search for a product on Amazon and you will see many different aggregations, such as brand or price range. Aggregations are second nature to search, and for a good reason. What reason? Filtering. So here is the third use case: aggregation of the search results based on some metadata.
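As a rough sketch of how search and aggregation travel together, the request below asks for matching products and, in the same call, for a count of results per value of a metadata field. The field names `name` and `brand`, the aggregation name `facet`, and both helper functions are hypothetical. I am also assuming `brand` is mapped as a keyword field, since a `terms` aggregation works on exact values.

```python
def faceted_search(text, facet_field, selected=None):
    """Build a search body that returns hits plus per-value counts.

    If `selected` is given, results are also filtered to that facet value,
    which is what happens when a user clicks on one of the facets.
    """
    query = {"bool": {"must": [{"match": {"name": text}}]}}
    if selected is not None:
        query["bool"]["filter"] = [{"term": {facet_field: selected}}]
    return {
        "query": query,
        "aggs": {"facet": {"terms": {"field": facet_field, "size": 10}}},
    }


def facet_counts(response):
    """Turn the aggregation part of a search response into (value, count) pairs."""
    buckets = response["aggregations"]["facet"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# First request: no facet chosen yet. Second: the user picked a (made-up) brand.
initial = faceted_search("laptop", "brand")
narrowed = faceted_search("laptop", "brand", selected="acme")
```

The `facet_counts` helper relies on the standard response layout, where each bucket carries a `key` and a `doc_count`.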
So imagine you have lots of content and your user starts typing. What is the expectation after the first few characters are typed? Yes, you are right: some suggestions based on the content. Well, you could have an ML model built for that, but that is a discussion for another time. So this takes us to the fourth use case of Elasticsearch: query suggestions.
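One way to get type-ahead suggestions is the completion suggester, which needs a field mapped with the `completion` type. The sketch below shows such a mapping and a matching suggest request as Python dictionaries. The field name `title_suggest`, the suggestion name `title_suggestions`, and the helper `suggest_request` are placeholders I picked for the example.

```python
import json

# Index mapping with a hypothetical field dedicated to suggestions.
SUGGEST_MAPPING = {
    "mappings": {
        "properties": {
            "title_suggest": {"type": "completion"},
        }
    }
}


def suggest_request(prefix, size=5):
    """Build a suggest body that asks for completions of what was typed so far."""
    return {
        "suggest": {
            "title_suggestions": {
                "prefix": prefix,
                "completion": {"field": "title_suggest", "size": size},
            }
        }
    }


# What the request could look like after the user has typed three characters.
print(json.dumps(suggest_request("ela"), indent=2))
```

This is only one of several approaches; which one fits depends on the content and on how forgiving the suggestions need to be.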
The world is moving at a fast pace and everyone is looking for real-time results. Businesses want the most up-to-date information to reach customers, and for good reason. So here we have the fifth use case for Elasticsearch: near real-time search. For those familiar with the terms, Elasticsearch is a highly concurrent, distributed data store whose cluster is made up of master and data nodes, with the data spread across shards and replicas. Newly indexed documents become searchable shortly after they are written (by default an index is refreshed about once a second), and replicas let the cluster keep serving searches when a node goes down.
An extension of aggregation is visualization. Visualizations are easy to understand and look good in presentations. Bring in Kibana and we have a sixth use case for Elasticsearch: search and aggregation with visualization. Please note that I would not recommend Elasticsearch just for visualization. There are far better tools available for that.
My hope is that this article has given you a pretty good idea of the why and what of Elasticsearch. In the coming days I will explain the concepts of indexing and searching in Elasticsearch. Please stay tuned.
Alvida, Bye, Khuda-Hafiz, Sayonara for now.