It appears daunting, but adding a powerful search function to your application is neither complex nor time-consuming. Here are some solid recommendations!
While many technologies have impressed in the last 2-3 decades, search is among the very few that became integral to our lives. It’s everywhere (e-commerce sites, blogs, knowledge bases, and more), not because a search box and icon look cool, but because it does something that is desperately needed.
If you’re a business looking for a good search solution or are tired of your existing one, what do you do?
Thankfully, you need not pay ridiculous licensing fees, nor do you need to maintain a team of 20 developers and sysadmins. Today I have some search engine recommendations that can be installed and integrated in no time, especially by small-ish businesses with developer teams of size 1-2.
One of the very best and high-value search engines you’ll come across is MeiliSearch.
So, what makes me list MeiliSearch among my top recommendations?
All the source code powering MeiliSearch is available in the open on GitHub. That means developers can examine any part of the code themselves. In return, businesses can be assured of the quality and intent (no backdoors or scanner in the program, for example). And, of course, knowledgeable developers can help improve the technology even more.
MeiliSearch doesn’t have complex rules (like “a – b” means a but not b). Just type your search naturally, and results will start showing up fluidly. The engine is highly tolerant and accommodative, delivering accurate results even when typos or synonyms are used. It also supports multiple languages.
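To get a feel for what typo tolerance involves, here is a minimal sketch of fuzzy matching based on Levenshtein edit distance. It is a generic illustration and not MeiliSearch’s actual algorithm; the function names, the sample titles, and the `max_typos` threshold are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query: str, titles: list[str], max_typos: int = 1) -> list[str]:
    """Return titles containing a word within max_typos edits of the query."""
    query = query.lower()
    return [
        title for title in titles
        if any(edit_distance(query, word) <= max_typos
               for word in title.lower().split())
    ]


# Hypothetical sample data, for illustration only.
titles = ["Wireless Keyboard", "Mechanical Keyboard", "USB Mouse"]
print(fuzzy_search("keybord", titles))
```

Here the misspelled query "keybord" is one edit away from "keyboard", so both keyboard titles match. Real engines layer prefix matching, ranking, and language-aware rules on top of ideas like this.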
Simple search solutions should be simple to use and set up. As such, MeiliSearch checks all the boxes! When you are happy to proceed, get it started on DigitalOcean with one click.
Solr, part of the Apache Project, has been around for well over a decade. It’s built on the well-known and highly reliable Lucene library, which also powers the popular search solution Elasticsearch. All this mumbo-jumbo means that Solr is among the most powerful, scalable, standards-compliant, feature-rich, and trusted search solutions.
It’s used by behemoths such as Disney, eBay, Netflix, Zappos, and BestBuy. However, that doesn’t mean you can’t run a smaller, simpler installation (say, single-machine, no scaling, no failover — well, sometimes it’s fine) and make use of this powerhouse called Solr.
So, why use Solr?
Here are some excellent reasons.
Accurate and powerful
Solr is among the most accurate, capable, and powerful search systems in the world. Plus, it’s open-source, which explains why big names (as mentioned earlier) have made a beeline for it. Its capability of digesting documents and answering search queries is second to none.
Simple install and maintenance
Installing Solr is as simple as uncompressing and running the program. For simple, single-machine systems, no tricky maintenance is required. Just keep an eye on the RAM usage, as search solutions in general, and Java-based technologies in particular, can be quite RAM-hungry (they keep, or try to keep, everything in RAM to provide fast reads/writes).
Solr comes with an admin panel that allows visual monitoring and configuration. With some trivial amount of training, even non-developers can learn to read the key charts. Not many search solutions on this list come with functionality like this one.
Solr exposes its results through an API that can return multiple formats: JSON, CSV, XML, and binary. It also outputs monitoring data as per the JMX standard, a huge boon for Java developers.
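As a rough illustration of the multi-format output, the sketch below builds a query URL for Solr’s `/select` handler and switches the response format through the `wt` parameter. The host, the default port 8983, and the core name `products` are assumptions for this example, and the `search` helper only works against a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed for illustration: a local Solr instance with a core named "products".
SOLR_BASE = "http://localhost:8983/solr/products"


def build_select_url(query: str, fmt: str = "json", rows: int = 10) -> str:
    """Build a URL for Solr's /select handler; `wt` picks the response format."""
    params = {"q": query, "wt": fmt, "rows": rows}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def search(query: str) -> list:
    """Run the query and return the matching documents (needs a running Solr)."""
    with urlopen(build_select_url(query)) as response:
        return json.load(response)["response"]["docs"]


# Same query, different output format: only the `wt` value changes.
print(build_select_url("name:laptop", fmt="csv"))
```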
There’s a lot more to be said in favor of Solr, but trying to cover everything will take us to the end of time. 😂 Suffice it to say Solr is a top-notch solution, and you can never go wrong with it, no matter what type of data you work with.
Elasticsearch was, and arguably still is, a pioneer in free-form text search. In fact, even today, if you ask a programmer or sysadmin for a recommendation on search engines, Elasticsearch is highly likely to be the only name they will come up with. Sure, these days, a sizeable chunk will recommend something like Algolia as well, but we’ve already covered how that pans out. 🤪
Don’t be misled by the “Start free trial” button in the graphic above. While the core Elasticsearch technology itself is open source and free, the company is trying to monetize its efforts and target enterprises. Hence, what you see here is actually the trial for their cloud service, which makes managing Elasticsearch easier, especially when clusters are involved.
Uff, so many webs to untangle. Let’s recap: Elasticsearch is open source and free, and anyone can set it up easily and use it without any limits.
And now, as expected, let’s dive into the reasons for choosing Elasticsearch:
- Mature, battle-tested search engine. This means you’re far more likely to find solutions if you’re stuck with “weird” bugs.
- First-class focus on clustering, scalability, and asynchronous writes.
- Accessible via a simple REST API (which is what everyone else ended up copying).
- Document-oriented but supports schema if needed.
- Insanely fast and accurate results. Configurable search speed.
- Stellar documentation, in volume as well as usefulness.
- A complete search-and-analyze cloud platform (the ELK stack), if you feel like paying for the convenience.
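To show what the REST API mentioned in the list looks like in practice, here is a minimal sketch that builds a full-text `match` query and posts it to the `_search` endpoint. The local node address, the index name `articles`, and the field name `title` are assumptions for this example; the `search` helper needs a running Elasticsearch node.

```python
import json
from urllib.request import Request, urlopen

# Assumed for illustration: a local node and an index named "articles".
ES_URL = "http://localhost:9200/articles/_search"


def build_match_query(field: str, text: str, size: int = 5) -> dict:
    """Build a full-text `match` query body for the _search endpoint."""
    return {"query": {"match": {field: text}}, "size": size}


def search(field: str, text: str) -> list:
    """POST the query and return the raw hits (needs a running Elasticsearch)."""
    request = Request(
        ES_URL,
        data=json.dumps(build_match_query(field, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)["hits"]["hits"]


# Inspect the JSON body that would be sent to the server.
print(json.dumps(build_match_query("title", "search engine")))
```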
The only nit-pick I’d have against Elasticsearch is the massive RAM consumption. I mean, as consultants, it’s hard enough to convince clients to invest in a server costing $20/month, which sadly is nowhere near what Elasticsearch demands.
If you are curious to learn Elasticsearch, then check out this Udemy course.
Typesense is a lightweight, straightforward, yet powerful search engine. Those looking for usefulness and simplicity should definitely try this one out.
One of the best things about Typesense is that you can try it right on their website. That can save frustration and time in cases where you set everything up and try the API . . . only to find that one or more of the features don’t work the way you’d have liked.
That is not to say there might be bugs in the engine; it’s just that the engine’s take on something might not be your preference, or it might be flat-out in conflict with your business domain. Typos, special symbols, synonyms, and more . . . you can check the results the engine throws out right on the homepage (they’re using a books database for this).
As you can see, this section is right below the top-most one. In the search box, I’ve entered the query “tra”, and below I see matching results from the books database (as well as metadata — total results, current page, etc.).
Typesense has a lot going for it when it comes to a search engine of choice:
- The technology behind it is fully open source and welcoming.
- Easy to configure an HA (High Availability) setup, should you need one.
- Tolerant when it comes to typos and other noise in search queries.
- An advanced filtering system for those who need fine-grained control on the search results.
- Simple REST API, though their docs will make you work extra-hard to find it!
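Since the REST API can be hard to find in the docs, here is a small sketch of what a search request with a filter might look like. The server address, the `books` collection, the `title` and `publication_year` fields, and the placeholder API key are all assumptions for this example; the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed for illustration: a local server, a "books" collection with
# "title" and "publication_year" fields, and a placeholder API key.
TYPESENSE_HOST = "http://localhost:8108"
API_KEY = "xyz"


def build_search_request(query: str, query_by: str, filter_by: str = "") -> Request:
    """Build a GET request for the collection's documents/search endpoint."""
    params = {"q": query, "query_by": query_by}
    if filter_by:
        params["filter_by"] = filter_by  # e.g. a numeric range on a field
    url = f"{TYPESENSE_HOST}/collections/books/documents/search?{urlencode(params)}"
    return Request(url, headers={"X-TYPESENSE-API-KEY": API_KEY})


request = build_search_request("tra", "title", filter_by="publication_year:>2000")
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running server.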
Finally, if you find the idea of setting up new servers tiring, Typesense also has a cloud offering where provisioning takes a single click. Billing is by the hour, and reads and writes are unmetered. Frankly, I’ll say this is the better option for most businesses, provided they’ve worked out the pricing in advance and made sure it’s a net gain.
All in all, Typesense makes a lot of sense (no pun intended!) if you need something small, slick, precise, and a real workhorse.
Sonic prides itself on being an Elasticsearch alternative that runs on “a few MBs of RAM”.
How is this possible?
Well, the Java Virtual Machine (JVM) is known to be RAM-hungry (generally, a JVM-based search server eats up around 1 GB of RAM just by starting up). It’s no surprise, then, that something coded in Rust (which gives developers fine-grained control over memory along with memory safety) can run just as fast and use only a few MBs of RAM.
As of writing, there are a couple of companies listed among its users, though I’m sure there are a few more that didn’t bother adding their names. I don’t remember how or the exact time frame, but I’ve come across Sonic earlier; at that time, while I was happy to see a low-memory alternative, I thought it’d need time to stabilize and iron out hidden bugs. Well, it looks like they’ve more or less arrived; how popular Sonic becomes is something only time will tell.
Okay, so long ruminations aside, why should you consider using Sonic for your organization/project?
Here are some reasons:
- Extremely low memory footprint, as far as search engines go.
- Libraries are available for all major programming languages. Node, PHP, and Rust are what the authors themselves released, while others were created by the community (rejoice, because exotic stuff like Elixir and Nim is also covered!).
- Several languages are supported (there were too many to count, but I think as of writing, 40-50 languages are supported).
- A surprise! You can even use languages that aren’t on the supported list, and the engine will still work (😂😂), though you’ll lose some advanced functions such as stop words.
- Very fast engine. If you check out their GitHub page, you’ll see that the ingestion and search times were in microseconds in several cases! Of course, this was a single-machine test, as network latency will never let the numbers be this low.
If you want to see this engine in action, go to this link (one of their user companies) and play with the search box you see there:
There are certain limitations to Sonic as a search engine. The devs have highlighted and discussed them openly on their GitHub page. My advice would be to closely examine this list and make sure your use cases don’t run into any of them. That said, everything has limitations; with most tools they are simply kept hidden, so we don’t realize until it’s too late. Therefore, I still consider Sonic to be a great choice for a search engine.
We now have a fascinating entry on this list: TNTSearch. The first interesting thing is that this feature-complete, production-ready search engine was written in PHP!
Yes, of all the languages possible, PHP. And I say that not because I hate PHP, but because a PHP script is a short-lived process by design, which makes it an unusual fit for a search engine.
The second interesting thing is their license, at least as of writing. Actually, the license itself is MIT, so there are no problems there, but the authors classify this software as PS4Ware; if you use TNTSearch in production, you should send them a PS4 game! 😂😂 Now, it’s not mandatory, as the “should” indicates, but it’s funny beyond belief. I also hope they upgrade it to a PS5 license, though it’s too early right now.
Anyway, coming from a strong PHP + Laravel background myself, I highly appreciate these guys’ efforts. Their website doesn’t say much but seems to indicate that they’re consultants, so I highly recommend you reach out to them if you have projects!
Now, are there any good reasons to use TNTSearch in your projects?
Yes, there are:
- Coded in PHP, for PHP, by PHP. The PHP ecosystem needs more dedicated, high-quality solutions like this one.
- Important features such as fuzzy search, geo-search, and text classification.
- Easy to change the search index, which is major flexibility missing from many solutions.
- Stemming, BM25 ranking, and custom tokenization ensure high accuracy.
- Easy deployment — just like any other Composer package!
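For readers curious about the BM25 ranking mentioned in the list, here is a compact sketch of the standard Okapi BM25 formula. It is a textbook version written in Python for illustration, not TNTSearch’s own PHP implementation; the whitespace tokenizer, the sample documents, and the commonly used defaults `k1 = 1.2` and `b = 0.75` are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, documents: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    total_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / total_docs

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query.lower().split():
            docs_with_term = sum(1 for other in tokenized if term in other)
            # Rare terms get a higher inverse document frequency.
            idf = math.log(
                (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1
            )
            freq = term_counts[term]
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores


# Hypothetical sample documents, for illustration only.
docs = [
    "fast php search engine",
    "php framework for web apps",
    "cooking recipes for beginners",
]
scores = bm25_scores("php search", docs)
print(scores)
```

The first document matches both query terms and ranks highest, the second matches only "php", and the third scores zero because it contains neither term.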
You can check out the engine performance here and see for yourself how fast and accurate it is. I’d stress the PHP aspect again: if you’re maintaining a PHP project, you want to remain within the PHP walls as much as possible (why? think of re-training costs). And for such cases, TNTSearch brings a value proposition that is hard to refuse!
Vespa is a broad and heavyweight offering. Like a couple of other entries on this list, it is too big to be captured in a few words. But I must try, so I will. 🙂 Vespa is a search engine, sure, but using it as an ordinary search engine will be wasting its potential.
Vespa was built to handle endless amounts of data (Big Data) and provide Machine Learning-driven features and endless customization on top of that.
Vespa positions itself as a competitor to Elasticsearch and traditional databases and provides a decent comparison on what to use when.
As you can see, the closer you wish to get to Machine Learning-driven operations, the more sense Vespa makes. As a pure search engine for a small- to medium-level business, I don’t think it has any advantages over other options.
Now, considering you’re generating vast amounts of data continuously and want to make decision making better through AI/ML (a description that fits many SaaS businesses today), here’s why Vespa makes a lot of sense:
- Open source: No weird licenses and no trapping contracts. And nothing to pay on top of that, though I always stress that companies pay a regular sum to the projects they most use (even $50/month will help them a lot).
- Real-time: Vespa is truly real-time. Not only can it digest, crunch, and search data as it comes in; even its configuration can be modified on the fly.
- Scalable and tolerant: Vespa is trivial to scale. It also responds very well to the sudden disappearance of nodes, providing high reliability.
- Ranking and recommendations: In Vespa, search, ranking, and recommendations can be fused with structured queries to deliver truly accurate results.
- Painless AI/ML: Vespa comes bundled with high-quality, pre-trained ML models. You don’t need to hire 20 data scientists to clean and use your data.
- Custom plugins: There’s a full set of APIs helping developers create custom Java plugins, should they need to alter how the engine works.
Vespa is massive, no doubt, so it’s clearly meant for teams that are a bit beyond the starting tier, whether in team size, tech prowess, infrastructure budgets, daily data volumes, or something else. For this segment, Vespa will hit a home run and is highly recommended.
For some businesses, search data isn’t neatly transformed and stored as JSON documents already; rather, it’s a mess in the true sense of the word — a chaotic collection of all sorts of documents such as Word, PDF, HTML files, etc. If you’re one of them and thought there’s no hope for you, well, say hello to Ambar!
The best thing about Ambar is the wide range of file types it can work with:
- MS Office file formats (.docx, .xlsx, etc.), including PowerPoint, Visio, and Publisher!
- OpenOffice file formats
- PDF documents with auto OCR applied to extract information.
- Email archive formats such as PST (hello, Outlook users!)
- Email messages with attachments
The goodies don’t end here, either. Ambar can work with large files (over 30MB) and ZIP archives, and it uses multithreading for full CPU utilization and faster results. So, if you have years’ worth of documents lying on some disk on a forgotten server, it’s time to bring them back and feed everything to Ambar!
Search 🔎 is powerful, search is magic, search is everywhere!
It might even be black magic, but today there’s no reason why everyone (with some developer help, of course) can’t reap its benefits. From businesses to individuals to governments, the search engines in this list provide an almost zero-effort offering that has exponential benefits and impact.
Go ahead, get a cloud server, and install whichever of the above-listed search engines you’d like to try.