
How Dropbox saved $1.7M using a new machine learning classifier system

🫡 GM Busy Engineers. Dropbox introduced an out-of-the-ordinary use case for ML within their systems, and you need to know about it. Imagine spending about $9,000 in extra compute… and getting $1.7M back in savings!


Dropbox’s Preview Feature

Source: Dropbox Blog - File previews

Dropbox’s file preview feature lets users view files, and see thumbnails of them, without downloading anything. Their system, Riviera, generates these previews by performing transformations like rasterizing PDF pages into images.

Without Riviera and file previews, a user would have to identify files by name alone or download them just to view them, which is bad UX.

Riviera processes massive data volumes daily, and to improve the preview experience for large files, it pre-warms (pre-generates and caches) preview assets.
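To make the pre-warming idea concrete, here is a minimal sketch of the cache-hit vs. on-the-fly paths. All names are hypothetical; this is not Dropbox’s actual Riviera code.

```python
# Hypothetical sketch of pre-warming: pre-generate + cache vs. on-the-fly.
preview_cache = {}  # file_id -> preview bytes

def generate_preview(file_id: str) -> bytes:
    # Stand-in for an expensive transformation, e.g. rasterizing a PDF page.
    return f"preview-of-{file_id}".encode()

def prewarm(file_id: str) -> None:
    # Pre-generate the preview ahead of time and cache it.
    preview_cache[file_id] = generate_preview(file_id)

def get_preview(file_id: str) -> bytes:
    # Cache hit: cheap and fast. Cache miss: expensive on-the-fly generation.
    if file_id in preview_cache:
        return preview_cache[file_id]
    return generate_preview(file_id)
```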

The Problem

Think about your Google Drive usage: have you previewed every single file you store there? Probably not. Dropbox noticed the same pattern with their users; they previewed only a small subset of their files, and most files were stored purely as a backup / archive.

Yet most large files were being pre-warmed in anticipation that a user might preview them → substantial, avoidable CPU and storage costs.

Fitting ML to the situation

Dropbox’s first thought was: “What if we predict which files actually need pre-warmed previews?” This came with two main challenges:

  • User Experience (UX) vs. Saving money - Pre-warming all files gives optimal UX but high costs. On the other hand, files that are not pre-warmed but still get previewed require on-the-fly preview generation, which is slow and expensive and leads to bad UX.

  • Accuracy vs. Model Interpretability - Complex models might be more accurate, but understanding the ‘why’ behind a prediction gets harder. Knowing why the model makes a certain prediction means more transparency, less bias, and a better chance of catching errors.

Since this was a novel solution, Dropbox chose a simple, interpretable ML model for the time being while building infrastructure that allows more complex models to be introduced later.

With 500M+ preview-generation requests and 98% of files on Dropbox being previewable, even a 10% reduction in pre-warming translates to huge savings!

Cannes - The Classifier Model

The ML model was named Cannes and aimed to be simple, fast to train, explainable, and cheap to run. Dropbox chose a gradient-boosted classifier and trained it on features like file extension, type of Dropbox account, and 30-day trailing account activity.
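As a rough illustration of what such a model could look like, here is a minimal scikit-learn sketch of a gradient-boosted classifier over those three features. The encoding, toy data, and labels are assumptions for illustration; Dropbox hasn’t published Cannes’ actual code or tooling.

```python
# A Cannes-style sketch: gradient-boosted classifier on file extension,
# account type, and 30-day trailing activity (toy data, hypothetical setup).
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Toy training data: one row per file at upload time.
df = pd.DataFrame({
    "file_extension": ["pdf", "docx", "zip", "pdf", "mp4", "zip"],
    "account_type":   ["pro", "basic", "basic", "team", "pro", "basic"],
    "activity_30d":   [42, 3, 0, 120, 15, 1],   # trailing 30-day account activity
    "previewed_60d":  [1, 0, 0, 1, 1, 0],       # label: previewed within 60 days?
})

features = ["file_extension", "account_type", "activity_30d"]
encode = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["file_extension", "account_type"]),
    remainder="passthrough",   # pass the numeric activity feature through unchanged
)
model = make_pipeline(encode, GradientBoostingClassifier())
model.fit(df[features], df["previewed_60d"])

# Probability that a newly uploaded file will be previewed in the next 60 days.
new_file = pd.DataFrame([{"file_extension": "zip", "account_type": "basic", "activity_30d": 2}])
p_preview = model.predict_proba(new_file)[0, 1]
print(f"P(previewed within 60 days) = {p_preview:.2f}")
```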

The model was 70% accurate and rejected about 40% of pre-warm requests when evaluated, which was satisfactory for V0!

Previews that were not pre-warmed (generated and cached ahead of time) had to be created on the fly, which is expensive. The pre-warming costs avoided, minus the on-the-fly cost of wrong predictions, adds up to $1.7M in savings!
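To show the shape of that math (not the real numbers), here’s a back-of-envelope sketch. Only the 500M+ request volume and the ~40% rejection rate come from the post; the unit costs and false-negative rate are entirely hypothetical.

```python
# Back-of-envelope sketch of the savings calculation with made-up unit costs;
# Dropbox's real per-file figures are not public, only the $1.7M total.
prewarm_requests    = 500_000_000  # preview-generation requests (from the post)
rejected_fraction   = 0.40         # pre-warms the model rejects (from the post)
false_negative_rate = 0.10         # hypothetical: rejected files that do get previewed
cost_prewarm        = 0.01         # hypothetical $ per pre-warm (CPU + storage)
cost_on_the_fly     = 0.02         # hypothetical $ per on-the-fly generation

skipped     = prewarm_requests * rejected_fraction
saved       = skipped * cost_prewarm                        # pre-warm costs avoided
penalty     = skipped * false_negative_rate * cost_on_the_fly  # cost of wrong predictions
net_savings = saved - penalty
print(f"Net savings: ${net_savings:,.0f}")
```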

System Design

Source: Dropbox Blog - Cannes Architecture

Here is the flow of a backend prediction request (shown above); a minimal sketch follows the list:

  1. Riviera (the preview generation service) requests a prediction from the Cannes backend

  2. The backend retrieves account activity and account type for the file from the respective services and forwards them to the predict service

  3. Predict service encodes raw inputs into a feature vector and sends it to the model

  4. The model predicts the probability of the file being previewed in the next 60 days based on input features and returns it to the backend
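Here’s a minimal sketch of that flow with stubbed-out services. The function names, threshold, and encoding are hypothetical; Dropbox’s actual Cannes services and APIs aren’t public.

```python
# Hypothetical sketch of the Cannes request flow with stubbed services.
PREWARM_THRESHOLD = 0.5  # hypothetical cut-off for pre-warming

def fetch_account_activity(account_id: str) -> int:
    return 42          # stub for the account-activity service

def fetch_account_type(account_id: str) -> str:
    return "pro"       # stub for the account-type service

def encode_features(extension: str, account_type: str, activity: int) -> list:
    # Predict service: turn raw inputs into the model's feature vector.
    return [hash(extension) % 100, hash(account_type) % 10, activity]

def predict_preview_probability(features: list) -> float:
    return 0.8         # stub for the trained Cannes model

def should_prewarm(file_id: str, extension: str, account_id: str) -> bool:
    # 1. Riviera asks the Cannes backend for a prediction for this file.
    # 2. Backend gathers account activity and account type.
    activity = fetch_account_activity(account_id)
    acct_type = fetch_account_type(account_id)
    # 3. Predict service encodes raw inputs into a feature vector.
    features = encode_features(extension, acct_type, activity)
    # 4. Model returns P(file previewed in the next 60 days).
    return predict_preview_probability(features) >= PREWARM_THRESHOLD
```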

All the extra service requests you see above put extra load on the system and were calculated to be about $9,000 in extra costs for Cannes (well worth the $1.7M savings!).

Source: Ubiq

The Dropbox team wanted to test the effectiveness of its ML, so it routed 3% of its production traffic through Cannes and the rest through its normal system. Three metrics were really important when evaluating the solution (a minimal sketch of computing them follows the list):

  1. Preview latency distribution - p90 latency for Cannes vs. the normal system; consistently higher latency with Cannes suggests the ML is doing a bad job

  2. Cache hit rate - A lower cache hit rate for the Cannes system translates to a much worse user experience (the user waits longer) and higher system load (more on-the-fly preview generation)

  3. Confusion matrix - False negatives (previews that should have been pre-warmed but were not) need to be minimized for a good user experience, so this metric helps compare different ML models
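Here’s a minimal sketch of how those three metrics could be computed over the experiment’s request logs. The toy arrays are made up; they are not Dropbox’s real numbers.

```python
# Toy evaluation of the three metrics over the Cannes arm of the experiment.
import numpy as np
from sklearn.metrics import confusion_matrix

latencies_ms  = np.array([120, 95, 300, 150, 110, 2000, 130])  # preview latencies
cache_hits    = np.array([1, 1, 0, 1, 1, 0, 1])                # 1 = served from cache
prewarm_truth = np.array([1, 0, 1, 1, 0, 1, 0])   # was the file actually previewed later?
prewarm_pred  = np.array([1, 0, 0, 1, 0, 1, 0])   # did Cannes say "pre-warm"?

p90_latency    = np.percentile(latencies_ms, 90)
cache_hit_rate = cache_hits.mean()
# Rows = actual, columns = predicted; cm[1, 0] counts the false negatives
# (files that were previewed but not pre-warmed).
cm = confusion_matrix(prewarm_truth, prewarm_pred)

print(f"p90 latency: {p90_latency:.0f} ms")
print(f"cache hit rate: {cache_hit_rate:.0%}")
print(f"false negatives: {cm[1, 0]}")
```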

While Dropbox didn’t share specific metric numbers, it is safe to say the ML worked well and is in prod!

Final Notes

When ML is introduced in critical areas like these, an important thing to track and correct for is model drift: the degradation of a model’s predictions over time as the real-world data distribution shifts away from the data it was trained on.

For example, consider a spam email filter trained on historical email data. Initially, it accurately identifies spam. Over time, though, email patterns change as spammers adapt, and the model's accuracy drops because the data it was trained on no longer matches reality. That drop is model drift.
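One very rough way to catch drift is to compare the model’s recent prediction distribution against what it looked like at training time. Here’s a minimal sketch with made-up numbers and a hypothetical threshold; real drift monitoring usually uses richer statistics (e.g., PSI or KL divergence over feature and prediction distributions).

```python
# Naive drift check: has the positive-prediction rate shifted too far
# from the rate observed at training time? (Toy numbers throughout.)
import numpy as np

baseline_predictions = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # at training/validation time
recent_predictions   = np.array([0, 0, 1, 0, 0, 0, 0, 0])   # last week's traffic

baseline_rate = baseline_predictions.mean()
recent_rate   = recent_predictions.mean()

DRIFT_THRESHOLD = 0.15  # hypothetical allowed absolute shift
if abs(recent_rate - baseline_rate) > DRIFT_THRESHOLD:
    print(f"Possible drift: positive rate moved {baseline_rate:.0%} -> {recent_rate:.0%};"
          " consider retraining on fresh data.")
```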

Stay bizzy!