How Doordash Cut Elasticache Costs by 50%

GM Busy Engineers ☀️ My last article covered how Doordash chose Redis as their feature store. This article covers how they cut costs and even improved their latency in the process.

FYI Doordash is hiring!

Thanks to Grace Lemire for this idea!

Quick Recap

Source: Doordash Engineering Blog

Let’s get you caught up on some context:

  • Doordash wanted to build a highly-scalable feature store. Their existing Redis store was not working for their scale requirements.

  • Wait, but what is a feature store? A feature store is a fast-access store for massive amounts of feature data. It serves that data to ML models that make user-relevant predictions.

  • If Redis was not scaling well, what did they choose instead? Redis :) It won all the benchmarks!

  • Which leads us to this article, where instead of changing technology, they improve the existing implementation (an awesome lesson for all engineers).

Optimizing CPU Usage

All Doordash did to optimize their CPU usage was switch from a flat list of key-value pairs to Redis’s built-in Hash data type for storing related data. Think of this data structure as a dictionary in Python or an object in JavaScript. Well, why did this work?

Collocation

Collocation is storing related data on the same Redis node. Turns out, the field-value pairs under a hash are collocated, which means querying for multiple fields of one entity is much more efficient (compared to querying for keys scattered across nodes)!
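Here’s a minimal sketch of the two layouts using the redis-py client (the entity ID and feature names are made up for illustration, not Doordash’s actual schema):

import redis

r = redis.Redis(host="localhost", port=6379)

# Flat key-value pairs: every feature is its own top-level key,
# so related features can land on different cluster nodes.
r.set("consumer_123__avg_order_value", "23.4")
r.set("consumer_123__orders_last_30d", "7")

# Hash per entity: all of consumer_123's features are fields of a
# single key, so Redis keeps them together on one node.
r.hset("consumer_123", mapping={
    "avg_order_value": "23.4",
    "orders_last_30d": "7",
})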

Fewer commands per batch lookup

The Doordash article states that they need just “one Redis HMGET command per entity as opposed to multiple GET calls”. This is valid as HMGET entity_id key1 key2 key3 is the batch lookup version of

GET entity_id_key1
GET entity_id_key2
GET entity_id_key3

Something the original article missed is the MGET command, which fetches multiple keys in one command and could have been used in place of multiple GET calls (with the caveat that, in a Redis Cluster, those keys must hash to the same slot)!

Fewer commands sent means less CPU usage and less network overhead!
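Continuing the hypothetical redis-py sketch from above, the read side of the comparison looks roughly like this:

import redis

r = redis.Redis(host="localhost", port=6379)

# Flat layout: one round trip per feature key...
values = [r.get(key) for key in (
    "consumer_123__avg_order_value",
    "consumer_123__orders_last_30d",
)]

# ...or a single MGET (keys must share a hash slot in cluster mode).
values = r.mget([
    "consumer_123__avg_order_value",
    "consumer_123__orders_last_30d",
])

# Hash layout: one HMGET against one key fetches all requested fields.
values = r.hmget("consumer_123", ["avg_order_value", "orders_last_30d"])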

Optimizing Memory Usage

Since Redis is an in-memory store, finding ways to minimize the size of feature data will result in a direct drop in memory usage. Doordash implemented very neat strategies to do so:

Hashing feature names with xxHash

Doordash mentioned using feature names like “daf_cs_p6m_consumer2vec_emb”, which is great for human communication. Strings like these are 27 bytes long, so we have an optimization opportunity :)

What if we could convert this 27-byte feature name into a 32-bit integer? This can be done easily using a fast and resource-efficient hashing algorithm called xxHash32. Its speed comes from being a non-cryptographic algorithm.
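A quick sketch of the idea with the xxhash Python package (hypothetical usage, not Doordash’s actual code): the 4-byte digest is what gets stored in place of the 27-byte name.

import xxhash

feature_name = "daf_cs_p6m_consumer2vec_emb"

# The readable name is 27 bytes as a UTF-8 string...
print(len(feature_name.encode("utf-8")))         # 27

# ...while its xxHash32 digest is a 32-bit unsigned integer (4 bytes).
feature_key = xxhash.xxh32(feature_name).intdigest()
print(feature_key.bit_length() <= 32)            # True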

Reducing bytes through datatype conversion

Compound features like lists of integers or vector embeddings (lists of floats) take up a lot of space, so a quick win was to serialize that data with Protocol Buffers.

However, the DD engineers noticed something else: a significant number of single floating-point features were 0. Instead of storing each of these as a 32-bit float, it is more space-efficient to stringify them; an 8-bit “0” string is lighter than a 32-bit 0.0 float. Note that this is a Doordash-specific optimization and might not apply to your org… which emphasizes the point: understand your data really well.
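A tiny illustration of that size difference in Python, using struct as a stand-in for a raw 32-bit float encoding:

import struct

as_float = struct.pack("<f", 0.0)   # raw 32-bit float -> 4 bytes
as_string = b"0"                    # stringified zero -> 1 byte (8 bits)

print(len(as_float), len(as_string))  # 4 1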

Another idea was to use compression on integer lists. They chose a compression algorithm with a high compression ratio (large size reduction after compression) and low deserialization overhead (fast to bring data back to its original form): Snappy. An awesome insight from the article is that embeddings have high entropy (very few repeated values) and thus are not great candidates for compression.
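Here’s a rough sketch with the python-snappy package (the feature values are made up) showing why a repetitive integer list compresses well while a high-entropy embedding barely shrinks:

import random
import struct

import snappy

# A repetitive list of integer ids compresses very well.
int_list = struct.pack("<1000i", *([7, 7, 7, 42] * 250))

# A random float embedding has high entropy, so Snappy gains little.
embedding = struct.pack("<256f", *[random.random() for _ in range(256)])

print(len(int_list), len(snappy.compress(int_list)))    # big reduction
print(len(embedding), len(snappy.compress(embedding)))  # little or no reduction

# Decompression restores the original bytes exactly.
assert snappy.decompress(snappy.compress(int_list)) == int_list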

Results of Optimizations

The new benchmark results are in!

A quick note: when we talk about reducing CPU usage & memory consumption, we are talking about Elasticache resources in particular. Serialization and compression both add to resource usage at the feature store service level, but they reduce it at the expensive ($$$), AWS-managed Elasticache level.

Source: Doordash Engineering Blog (edited)

The table above shows the effectiveness of the hashes approach. A gigascale feature store should prioritize read latency over write (no. of write operations is 0.1% of no. of read operations). Thus, the small increase in the write latency is negligible compared to the large drop in read-heavy and read-only workload latencies!

Source: Doordash Engineering Blog

The table above clearly shows the benefit of compression! One question you might have: LZ4 seems better (both writes and reads are faster), so why was Snappy chosen?

Notice how deserialization is roughly 3x faster when Snappy is used. Faster deserialization means → the ML service can be executed earlier → the user sees personalized results quicker! Also, taking a step back and looking at db latency + deserialization time together: 2.5 ms + 1.9 ms = 4.4 ms (Snappy) < 2.1 ms + 6.5 ms = 8.6 ms (LZ4), so Snappy is the right choice!

Source: Doordash Engineering Blog (edited)

This shows a whopping 80% drop!

Ok Omkaar, all these benchmark results are great but what about… the real world… PROD-land (production)?

Results from PROD-land

Source: Doordash Engineering Blog (edited)

Summary: 60% drop in memory allocation, 65% drop in CPU utilization, 40% drop in Redis latency and 15% drop in Feature Store latency… 🤯

The customer only sees predictions 1-ish ms faster… but the above results add up to a large (undisclosed) amount of savings and better system reliability!

I hope you learned a lot from this article, cya in the next one!

Check out the engineers behind this: Arbaz Khan, Zohaib Sibte Hassan, Kornel Csernai, Swaroop Chitlur