
How Doordash scaled up their feature store to 20M reads per second (Part 1)

GM Busy Engineers 🫡 Doordash released a blog post about scaling up a crucial piece of ML (Machine Learning) infra: the feature store. This is Part 1 of a two-part series, and we are covering what makes a good feature store and how they chose the right technology (spoiler: it gets muddy)!

In Part 2, we break down the specific optimizations they made to Redis to enable this high throughput.

But first…

What is a Feature Store?

Source: Doordash Engineering Blog

A feature store is a fast-access store for massive amounts of feature data. As shown above, it provides data to ML models that make user-relevant predictions.
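
To make that concrete, here's a minimal sketch of what a feature lookup can look like at serving time, using Redis since that's what Doordash runs. The key scheme and feature names below are hypothetical, purely for illustration.

```python
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379)

# Hypothetical layout: one hash per entity, one field per feature.
r.hset("store:1234", mapping={
    "avg_rating": 4.6,
    "delivery_eta_min": 27,
    "num_orders_7d": 812,
})

# At inference time, the model service pulls just the features it needs.
features = r.hmget("store:1234", ["avg_rating", "delivery_eta_min", "num_orders_7d"])
print(features)  # raw bytes back from Redis, e.g. [b'4.6', b'27', b'812']
```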

At companies serving 100K+ (or even 10K+) predictions per second (QPS), infra that ensures fast inference times is what separates a good user experience from a bad one.

While the ML engineers worry about model building and data quality, we system engineers have a higher “scale” of worries ;)

A “Gigascale” Feature Store

With 34M active users, Doordash's feature data spans billions of records, of which models actively use millions for inference. Doordash was using Redis for their initial feature store but was not happy with its performance, so they identified the characteristics of an ideal feature store for their needs:

Persistent Storage of Billions of Records

Wait a sec, why is Redis (a cache) being used for persistent storage? This is why (TL;DR: Doordash uses snapshots to ensure persistence). I’d love to know if this is a topic you’d want to learn more about.
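
For the curious, here's a minimal sketch of turning on Redis's RDB snapshotting from Python. The schedule values are illustrative, and this applies to self-managed Redis; a managed service like ElastiCache exposes snapshotting through its own settings instead.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# RDB snapshotting: dump the dataset to disk if at least 1 key changed in the
# last 900s, or at least 10,000 keys changed in the last 60s (illustrative values).
r.config_set("save", "900 1 60 10000")

# You can also force a snapshot in the background without blocking reads/writes.
r.bgsave()

print(r.config_get("save"))  # confirm the snapshot schedule
print(r.lastsave())          # time of the last successful snapshot
```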

Serves Millions of Feature Lookups Per Second

Let’s take their top ML use case, store ranking, as an example and do some simple maths. It uses 23 features to rank and recommend stores based on relevance to the user, and it gets called 1M+ times per second 🤯 Therefore, the datastore should support, at the very least, 20M+ reads per second!!
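
Spelling that back-of-the-envelope maths out (the inputs are from the post, the multiplication is mine):

```python
features_per_request = 23        # features used by the store-ranking model
requests_per_second = 1_000_000  # the model is called 1M+ times per second

reads_per_second = features_per_request * requests_per_second
print(f"{reads_per_second:,} feature reads/sec")  # 23,000,000 -> comfortably past the 20M+ floor
```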

Batched Random Reads

On disk-based datastores, I would assume batched random reads refer to reading data located at different areas of the disk. But what does it mean in an in-memory context? Storing data in random areas of RAM wouldn’t matter, as RAM is designed for random access. So here it really means firing off many reads without waiting on each reply, i.e. pipelining in Redis (sketched below). The whole point is to reduce the network overhead of requests going back’n’forth.
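
Here's a minimal redis-py sketch of that idea: a batch of reads queued on a pipeline so they all share a single round trip. The store IDs and feature names are hypothetical.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

store_ids = [101, 202, 303]  # candidate stores to rank for this request
feature_names = ["avg_rating", "delivery_eta_min", "num_orders_7d"]

# Without a pipeline, each HMGET pays its own network round trip.
# With one, the commands are buffered client-side and flushed together.
pipe = r.pipeline(transaction=False)  # pure reads, no MULTI/EXEC needed
for store_id in store_ids:
    pipe.hmget(f"store:{store_id}", feature_names)

results = pipe.execute()  # one round trip; one list of values per HMGET
for store_id, values in zip(store_ids, results):
    print(store_id, values)
```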

“Heterogeneous data types require non-standardized optimizations”

Fancy way of saying: support optimizations on a per-type basis to improve storage and performance efficiency (types include strings, floats, lists, and vector embeddings).
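
As a hedged sketch of what per-type handling could look like on top of Redis: plain strings stay strings, lists use native Redis lists, and embeddings get packed into raw bytes rather than stored as a long human-readable string. These are common patterns, not necessarily what Doordash actually does.

```python
import struct
import redis

r = redis.Redis(host="localhost", port=6379)

# Strings: plain Redis strings.
r.set("store:101:cuisine", "thai")

# Floats: stored as numeric strings (Redis keeps values as bytes either way).
r.set("store:101:avg_rating", 4.6)

# Lists: a native Redis list, e.g. the last few order subtotals.
r.rpush("store:101:recent_subtotals", 18.40, 32.10, 25.75)

# Vector embeddings: pack the floats into raw bytes instead of a text blob
# like "[0.12, -0.53, ...]" -- far smaller and cheaper to decode.
embedding = [0.12, -0.53, 0.88, -0.07]
r.set("store:101:embedding", struct.pack(f"{len(embedding)}f", *embedding))

# Reading the embedding back:
raw = r.get("store:101:embedding")
print(struct.unpack(f"{len(raw) // 4}f", raw))
```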

Low Read Latency

The latency budget for ML inference is in the low milliseconds, which means the feature store's read latency HAS to be a small fraction of that. Insane.

Why Did Doordash Choose Redis Again?!

Doordash ran an extensive benchmark against a bunch of alternatives (I won’t bore you with the deets), and here is the latency and cost breakdown.

Latency-Based Comparison

Source: Doordash Engineering Blog

To no one's surprise, the in-memory store Redis won the benchmarks! Note that the numbers above are for a workload of 10,000 operations, each consisting of 1,000 batched lookups (reads). For example, on a read-only workload Redis averages 1.9 microseconds per lookup!
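
A quick sanity check of what that figure implies, assuming the 1.9 µs is amortized across each 1,000-lookup batch (my arithmetic, numbers from the benchmark setup above):

```python
operations = 10_000
lookups_per_operation = 1_000
redis_us_per_lookup = 1.9  # read-only workload

total_lookups = operations * lookups_per_operation                 # 10,000,000 lookups benchmarked
ms_per_batch = lookups_per_operation * redis_us_per_lookup / 1000  # ~1.9 ms per 1,000-lookup batch
print(total_lookups, ms_per_batch)
```

If a single ranking call batches its lookups like this, that ~1.9 ms is the kind of number that has to fit inside the low-millisecond inference budget mentioned earlier.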

Cost-Based Comparison

From the latency chart above, Redis and CockroachDB seem like good options. Disk-based databases are going to be cheaper on a per-storage basis, though, so a further cost-based comparison needs to be done.

Doordash uses AWS ElastiCache, which has built-in replication to ward off persistence concerns. But this also means the cost calculations need to be done the AWS way.

Source: Doordash Engineering Blog

vCPU is a virtual CPU in AWS land, and the image above shows that Redis requires half the vCPUs to perform 125,000 lookups per second. That means double the number of CockroachDB instances would need to run to support the same lookup rate as the Redis instances (higher costs).

Since vCPUs, and not storage memory, are the real limiting variable for low-latency reads in this case, Redis is clearly the better and cheaper solution.
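
A rough sketch of how that turns into a cost argument. The per-instance vCPU counts below are made-up placeholders; only the 125,000 lookups/sec comparison point and the ~2x vCPU gap come from the chart:

```python
target_lookups_per_sec = 23_000_000  # from the store-ranking maths earlier
comparison_point = 125_000           # lookups/sec used in the vCPU comparison

# Assumption from the chart: CockroachDB needs ~2x the vCPUs of Redis at that point.
redis_vcpus_per_125k = 8             # hypothetical value
cockroach_vcpus_per_125k = 2 * redis_vcpus_per_125k

units = target_lookups_per_sec / comparison_point  # ~184 "blocks" of 125K lookups/sec
print("Redis vCPUs needed:      ", round(units * redis_vcpus_per_125k))
print("CockroachDB vCPUs needed:", round(units * cockroach_vcpus_per_125k))
# You pay per vCPU, so the CockroachDB fleet comes out roughly twice as expensive.
```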

Wait, so they went from not liking Redis to choosing it again? Yes! In the next newsletter, I will break down how they pulled off some insane optimizations.

Have questions for me? Reply to this email or DM me on LinkedIn.