
Scaling Discord's message store to 1,000,000,000 messages (5 min)

Key Takeaways

  • Analyze read/write patterns before choosing a database solution

  • In Cassandra, write only non-null columns; inserting nulls creates tombstones (unlike relational DBs, where null columns are harmless).

  • Choose the simplest solution… Discord used a simple bucketing function to make their partitioning scalable!

GM Busy Engineers. Discord hit a billion messages sent in 2017 and recently hit one trillion 🤯. This article is Part 1 of a two-part series breaking down how they scaled to 1B messages, and then to 1T in 2023. Let’s query right into this data problem!

Credit: Discord Engineering Blog

If problems like these are interesting, Discord is hiring engineers!

Also, if there are specific topics you want covered, send them through this form!

Problem

Discord was built on top of MongoDB for fast iteration. But when they hit 100M messages, their indexes could no longer fit in RAM.

When the whole index doesn’t fit in RAM, the portion left out must be read from disk, which increases latency → messages load slower for users.

Discord decided to analyze their users’ read/write patterns before choosing a new DB. Here are the three main observations they made:

  • A voice-chat-heavy server sends only a handful of text messages every month. When a user loads them, the messages were sent over weeks, so fetching them causes random seeks (locating scattered messages on disk), which evicts useful data from the disk cache (bad for performance).

  • Some chat-heavy servers with few members have 100K+ messages. But with few members, that data is requested rarely, meaning it’s likely not in the disk cache 😢

  • New features like replies (with a clickable link that takes you to a previous message) and full-text search mean even more random seeks will be introduced.

Defining a good solution

As it goes for every good solution, they defined requirements:

  • Automatic failover - Ability to switch over to a standby on DB failure

  • Proven & Predictable - Data can be retrieved directly and quickly from the primary storage without needing an extra caching layer. Also, a battle-tested system.

  • Open source - Zero dependence on a third-party company!

All these requirements point to a few solutions, and Discord chose Cassandra. It is open-source, has automatic failover, and, interestingly, Apple and Netflix already use it extensively.

Cassandra is awesome!

Cassandra is a KKV (key-key-value) store → the first K (the partition key) finds the right partition, while the second K (the clustering key, acting as a row ID) finds the specific row of data within it (shown below).

In Discord’s case, channel_id is the partition key and message_id becomes the row key (they use Snowflake IDs, which are chronologically sortable and unique).

partition_key = channel_id
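
To make the KKV idea concrete, here is a minimal in-memory sketch of the access pattern, with plain Python dicts standing in for Cassandra partitions (the names insert, fetch_recent and the sample IDs are illustrative, not Discord’s code):

from collections import defaultdict

# partition key -> (row key -> value). In Cassandra, all rows sharing a
# partition key live together on the same nodes, sorted by the row key.
messages = defaultdict(dict)

def insert(channel_id, message_id, content):
    messages[channel_id][message_id] = content  # K -> K -> V

def fetch_recent(channel_id, limit=50):
    # One partition lookup, then a range scan over chronologically
    # sortable row keys: no seeking across the whole dataset.
    partition = messages[channel_id]
    return [partition[mid] for mid in sorted(partition, reverse=True)[:limit]]

insert(channel_id=42, message_id=1001, content="hello")
insert(channel_id=42, message_id=1002, content="world")
print(fetch_recent(42))  # ['world', 'hello']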

BUT…

Cassandra can technically support partitions up to 2GB, so what could possibly go wrong? Discord engineers realized that partitions holding more than ~100MB of data put pressure on the garbage collector. Since a channel can exist for a long time and keep accruing messages, their initial partitioning strategy was not scalable.

Tweaking the 🔑 strategy…

A small change in their implementation helped them overcome the hurdle: the partition key becomes a composite key made of channel_id and a bucket (shown below).

Basically, messages are bucketed into 10-day intervals, and the bucket number is used as part of the partition key. The bucket number comes from a clever function that derives it from the epoch time.

partition_key = channel_id + bucket
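
For intuition, here is a minimal sketch of such a bucket function. It assumes Discord’s documented Snowflake layout (the top bits of an ID encode milliseconds since Discord’s 2015-01-01 epoch); the names are illustrative rather than Discord’s exact code:

DISCORD_EPOCH = 1420070400000            # 2015-01-01 as a Unix ms timestamp
BUCKET_SIZE = 1000 * 60 * 60 * 24 * 10   # 10 days in milliseconds

def snowflake_to_timestamp(snowflake):
    # The top 42 bits of a Snowflake are ms elapsed since DISCORD_EPOCH.
    return snowflake >> 22

def make_bucket(message_id):
    return snowflake_to_timestamp(message_id) // BUCKET_SIZE

# All messages sent in a channel within the same 10-day window share a
# partition: partition_key = (channel_id, make_bucket(message_id))

A nice property of this scheme: because Snowflakes are chronological, a reader can walk buckets backwards in time to page through history, and old buckets simply stop growing.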

Testing Cassandra before prod-ing it

Discord performed standard double reads/writes to both MongoDB and Cassandra for testing, and they learned a few lessons.

One that intrigued me is how they optimized writes to Cassandra. Each message comprises 16 columns, but though all 16 were inserted, only 4 of them had values. The other 12 columns were written with null values on each write and became “tombstones”!

In Cassandra, tombstones are not read by default, which means there is no reason to even write them. So, to avoid writing unread and unused values, the optimization was simply to let each write consist of only non-null values!
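
A sketch of that write-path optimization, assuming a table of the shape described above (build_insert, the column names, and the %s placeholder style are illustrative, not Discord’s code):

def build_insert(table, row):
    # Keep only the columns that actually have values; never writing the
    # null columns means Cassandra never creates the matching tombstones.
    cols = {k: v for k, v in row.items() if v is not None}
    names = ", ".join(cols)
    placeholders = ", ".join(["%s"] * len(cols))
    query = f"INSERT INTO {table} ({names}) VALUES ({placeholders})"
    return query, list(cols.values())

message = {"channel_id": 42, "message_id": 1001, "author_id": 7,
           "content": "hello", "edited_at": None, "pinned": None}  # etc.

query, params = build_insert("messages", message)
# query -> INSERT INTO messages (channel_id, message_id, author_id, content)
#          VALUES (%s, %s, %s, %s)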

Final Results

Imagine having sub-millisecond writes and 5-millisecond reads. Well, Discord had that luxury with over a billion messages’ worth of data (refer to their Datadog dashboard below)!

Credit: Discord Engineering Blog

Thank you for your time, busy engineers 👋 Here is the original article.