
How Yelp Switched to a Near-Real-Time Search Architecture

GM Busy Engineers 🫡 We are back after a short break (I just completed my job hunt)! Today, we are covering how 70M+ Yelpers get their search needs satisfied.

We have covered GitHub’s search engine before; its code-heavy search use cases are quite different from Yelp’s.

Yelp is hiring engineers!

Search at Yelp Scale

Yelp previously built its ranking platform on Elasticsearch (ES), powering use cases like finding listings and photos relevant to a search. However, as more use cases were added to the roadmap, Elasticsearch started falling behind.

Source: Nidhi Gupta’s Medium

Why?

  • Replica-level indexing: Each new document added to ES is indexed independently by the primary and by every replica instance, so nodes burn CPU on redundant work (see the sketch after this list).

  • Shard distribution problems: ES controls how indices are distributed across shards, so depending on which indices land on a node, some nodes end up handling far more requests than others.

  • Complex autoscaling: ES is hard to scale up in real time, which forced Yelp to run at peak capacity even during off-peak hours.
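
To make the first point concrete, here is a minimal sketch (not from Yelp’s codebase) assuming the Elasticsearch 7.x High Level REST Client and a hypothetical businesses index: what looks like a single write from the application is re-analyzed and re-indexed on the primary shard and on every replica.

```java
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class EsDocumentReplication {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // One logical write from the application's point of view...
            IndexRequest request = new IndexRequest("businesses")
                    .id("biz-42")
                    .source(Map.of("name", "Joe's Pizza", "city", "San Francisco"));

            // ...but with number_of_replicas = 2, three shard copies
            // (1 primary + 2 replicas) each analyze and index this document
            // into their own Lucene index, tripling the CPU spent per write.
            client.index(request, RequestOptions.DEFAULT);
        }
    }
}
```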

How Lucene meets Yelp’s search needs

Lucene is a powerful and mature full-text search engine written in Java. Here are some of its characteristics and features that Yelp loved:

  • Java - Yelp already used Java to work with its Elasticsearch layer; since Lucene is written in Java, a lot of the custom ranking work can be reused.

  • NRT Segment Replication - A Lucene primary node indexes data into segments (immutable sub-indexes) in near real time. Replica nodes simply pull these segments instead of reindexing every document as in ES. This is good because nodes no longer duplicate indexing work, but bad because copying segments can cause network congestion (a sketch of NRT readers and concurrent search follows this list).

  • Concurrent search - Instead of parallelizing a search across multiple ES shards (and therefore instances), Lucene can search multiple segments of a single index in parallel using multiple threads.

  • Easier to scale - As long as the primary copy of the index is stored on external storage, autoscaling is easy: the node count can go up or down, and new nodes simply pull the index from external storage (shown in the image below).
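
To ground the two Lucene features Yelp leaned on most, here is a minimal sketch (assuming Lucene 8/9, a made-up index path and field names, and not NrtSearch’s actual code): a SearcherManager hands out near-real-time readers straight from the IndexWriter, and each IndexSearcher gets an executor so segments of the index are searched concurrently.

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class NrtDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/nrt-demo"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

            // SearcherManager hands out near-real-time readers from the writer,
            // so new documents become searchable without a full commit/reopen cycle.
            SearcherManager manager = new SearcherManager(writer, new SearcherFactory() {
                @Override
                public IndexSearcher newSearcher(IndexReader reader, IndexReader previous) {
                    // Concurrent search: segments are searched in parallel on this pool.
                    return new IndexSearcher(reader, pool);
                }
            });

            Document doc = new Document();
            doc.add(new TextField("name", "joes pizza", Field.Store.YES));
            writer.addDocument(doc);

            manager.maybeRefresh();                 // pick up the new segment (NRT)
            IndexSearcher searcher = manager.acquire();
            try {
                TopDocs hits = searcher.search(new TermQuery(new Term("name", "pizza")), 10);
                System.out.println("hits: " + hits.totalHits);
            } finally {
                manager.release(searcher);
            }
            manager.close();
        }
        pool.shutdown();
    }
}
```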

Implementing Lucene

Source: Yelp Engineering blog

The above is a simple diagram of Yelp’s search architecture: the primary node is responsible for indexing data, committing the index to persistent storage, and then backing it up to remote storage. The replica nodes sync with the primary node and receive near-real-time updates.
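
As a rough illustration of the primary’s commit-and-backup step (the post does not show code for this), the sketch below uses Lucene’s SnapshotDeletionPolicy to pin the latest commit and list its segment files; uploadToRemote() is a hypothetical placeholder for whichever object store the backups land in.

```java
import java.nio.file.Paths;
import java.util.Collection;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.FSDirectory;

public class PrimaryBackup {
    public static void main(String[] args) throws Exception {
        SnapshotDeletionPolicy snapshots =
                new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());

        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
                .setIndexDeletionPolicy(snapshots);

        try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/primary-index"));
             IndexWriter writer = new IndexWriter(dir, cfg)) {

            // ... the primary indexes incoming documents here ...

            writer.commit();                            // durable on local disk

            IndexCommit commit = snapshots.snapshot();  // pin the commit's files
            try {
                Collection<String> files = commit.getFileNames();
                for (String file : files) {
                    uploadToRemote(file);               // hypothetical: copy segment file to remote storage
                }
            } finally {
                snapshots.release(commit);              // let Lucene delete old files again
            }
        }
    }

    // Placeholder for the remote-storage client; not part of Lucene.
    static void uploadToRemote(String fileName) {
        System.out.println("upload " + fileName);
    }
}
```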

Migration strategy

Yelp adopted a tried-and-tested migration strategy: phased rollouts. Initially, they implemented a dark launch in which 1%, and gradually 100%, of all traffic was mirrored to NrtSearch while results continued to be served by Elasticsearch. This approach ensured that NrtSearch could handle production load and that its results were consistent with Elasticsearch’s. Only then was 1% of all traffic routed to NrtSearch for result generation, progressively increasing to 100%.
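
A simplified sketch of that dark-launch pattern (not Yelp’s actual rollout code): requests are bucketed by a stable hash, a configurable percentage is mirrored to NrtSearch, and users keep receiving the Elasticsearch response while the two result sets are compared.

```java
import java.util.List;

public class DarkLaunchRouter {

    interface SearchBackend {
        List<String> search(String query);
    }

    private final SearchBackend elasticsearch;   // current source of truth
    private final SearchBackend nrtSearch;       // candidate backend
    private final int mirrorPercent;             // ramps from 1 to 100 during the rollout

    DarkLaunchRouter(SearchBackend es, SearchBackend nrt, int mirrorPercent) {
        this.elasticsearch = es;
        this.nrtSearch = nrt;
        this.mirrorPercent = mirrorPercent;
    }

    List<String> handle(String requestId, String query) {
        List<String> esResults = elasticsearch.search(query);

        // Stable bucket in [0, 100) so the same request is always routed the same way.
        int bucket = Math.floorMod(requestId.hashCode(), 100);
        if (bucket < mirrorPercent) {
            List<String> nrtResults = nrtSearch.search(query);  // shadow traffic
            recordMismatch(query, esResults, nrtResults);       // compare, don't serve
        }
        return esResults;  // users still see Elasticsearch results during the dark launch
    }

    private void recordMismatch(String query, List<String> expected, List<String> actual) {
        if (!expected.equals(actual)) {
            System.out.println("result mismatch for query: " + query);
        }
    }
}
```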

Optimizations

  • Query Warming - When a new replica instance starts, it takes a long time to build a steady-state cache of the most frequent queries. Yelp samples these queries and runs them on startup to “warm“ the index → better performance earlier in the replica’s lifetime (a sketch follows this list).

  • Virtual Sharding - Segments in Lucene are immutable sub-indexes. Virtual sharding assigns sub-indexes their own threads, which lets queries be distributed evenly.
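
A rough sketch of query warming under these assumptions: the sampled queries would really come from production query logs, and the IndexSearcher would come from the replica’s freshly synced index (see the NRT sketch earlier in the post).

```java
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryWarmer {

    // Replay each sampled query once; results are thrown away, only the side effect
    // of populating caches (and the OS page cache) matters.
    static void warm(IndexSearcher searcher, List<Query> sampledQueries) throws Exception {
        for (Query q : sampledQueries) {
            searcher.search(q, 10);
        }
    }

    public static void main(String[] args) {
        // Two hard-coded queries stand in for a sample of frequent production queries.
        List<Query> sample = List.of(
                new TermQuery(new Term("name", "pizza")),
                new TermQuery(new Term("category", "coffee")));
        // warm(searcher, sample);  // run against the replica's searcher before it takes traffic
    }
}
```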

Results

Source: Yelp Engineering blog

Their P99 metric says it all: a 35% drop in query times with NrtSearch, and a 40% drop in costs after some clever optimization!

Shoutout to Andrew Prudhomme, Erik Yang, Jedrzej Blaszyk, Karthik Alle, Samir Desai, and Tao Yu for their hard work on NrtSearch!