How Realm Works: The Database Architecture Behind 2 Billion Installs
When we started Realm in 2011, the dominant options for mobile data storage were SQLite (a C library from 2000), Core Data (Apple's ORM built on top of SQLite, from 2005), and a handful of wrapper libraries that made SQLite slightly less painful to use. None of them were designed for mobile — they were desktop or server architectures squeezed into a device that had spotty connectivity, constrained memory, and users who expected instant response to every touch.
We set out to build a database designed specifically for mobile from first principles. This is what we learned.
The Core Problem with SQLite on Mobile
SQLite is a remarkable piece of software. It's correct, fast for many workloads, well-tested, and runs on literally everything. But its threading model is fundamentally misaligned with how mobile apps work.
A typical mobile app has a main thread that must never block (blocking it means dropped frames, and dropped frames mean App Store reviews starting with "sluggish and unresponsive"). It has background threads doing network requests, parsing responses, and syncing data. And it has UI components that want to observe data and re-render whenever it changes.
SQLite gives you:
// All access through a single connection serializes your writes
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
// ... reads and writes queue up behind one another ...
sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
To get concurrent reads, you open multiple connections — but then you have to manually coordinate which connection sees which version of the data. To get reactive updates, you poll. Core Data adds change notifications on top of this, but they fire after the fact, require you to understand NSManagedObjectContext threading rules, and have enough edge cases that Apple wrote a 40-page technical note about them.
The result: most iOS apps in 2011 blocked the main thread on database operations, crashed intermittently with EXC_BAD_ACCESS errors that traced back to incorrect Core Data threading, and relied on pull-to-refresh as the primary way to update the UI, because reactive updates were too hard to get right.
The Design Decisions
We made four architectural decisions early that defined everything that came after.
1. Object store, not relational.
Realm stores objects, not rows. This sounds like a minor difference, but it's fundamental. When you write:
class Task: Object {
    @Persisted var name: String = ""
    @Persisted var done: Bool = false
    @Persisted var project: Project?
}
Realm stores Task instances as contiguous memory blobs, with relationships represented as direct pointers (after translation through the memory map). Reading a Task and following its project link is two memory accesses — no JOIN, no query rewrite, no n+1 problem to manage.
The tradeoff: you can't run arbitrary SQL. If you need ad-hoc relational queries, Realm is the wrong tool.
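The idea can be sketched in a few lines of Python (a hypothetical fixed layout, not Realm's actual file format): objects live contiguously in one buffer, and a link is just the file offset of the target object, so following it is a single lookup.

```python
import struct

# Hypothetical layout: a Project is a 16-byte name; a Task is a 16-byte
# name, a done flag, and the file offset of its linked Project.
PROJECT = struct.Struct("16s")
TASK = struct.Struct("16s?Q")

buf = bytearray(4096)                  # stand-in for the memory-mapped file
PROJECT.pack_into(buf, 0, b"Launch")
TASK.pack_into(buf, 64, b"Write post", False, 0)   # link = offset 0

def read_task(buf, offset):
    name, done, project_off = TASK.unpack_from(buf, offset)
    return name.rstrip(b"\x00"), done, project_off

def read_project(buf, offset):
    (name,) = PROJECT.unpack_from(buf, offset)
    return name.rstrip(b"\x00")

# Following the link is one offset lookup: no JOIN, no second query.
name, done, project_off = read_task(buf, 64)
assert read_project(buf, project_off) == b"Launch"
```

In the real system the offsets are translated through the memory map, but the shape of the access pattern is the same: a load, an add, another load.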
2. MVCC without traditional locking.
Multi-Version Concurrency Control is how most modern databases allow concurrent readers. Each transaction sees a consistent snapshot of the database at the point it began. But MVCC implementations typically use a combination of write-ahead logs and version pointers that require coordination between threads.
We took a different approach. Realm uses a memory-mapped file with an immutable tree structure (a B+ tree variant). When a write transaction commits, it doesn't modify existing data — it writes new versions of the modified nodes and atomically updates the root pointer. Old readers are still holding references to the previous root, so they continue to see the old data, unaffected.
The key property: reads never block writes, and writes never block reads. You can have a hundred threads reading while a write is happening. The write commits atomically by updating a single pointer. Readers finish their transactions and then advance to the latest root.
This is possible because the file is memory-mapped and the root pointer update is atomic at the OS level (via msync on a single page). The price: you accumulate stale versions until no reader holds a reference to them. We had to implement careful version tracking to know when it was safe to compact.
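The copy-on-write mechanics can be illustrated with a toy persistent tree (a deliberate simplification of Realm's B+ tree variant): nodes are immutable, a write copies only the path from root to the changed node, and the commit is a single swap of the root reference.

```python
# Copy-on-write MVCC sketch. Nodes are never mutated; shared subtrees
# are reused by reference, and "the database" is just the current root.
class Node:
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value, self.left, self.right = key, value, left, right

def insert(root, key, value):
    """Return a NEW root; only the path to the change is copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)

def get(root, key):
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None

current_root = insert(None, "a", 1)
snapshot = current_root                       # a reader begins a transaction

current_root = insert(current_root, "b", 2)   # writer commits: one pointer swap

assert get(snapshot, "b") is None             # old reader: consistent snapshot
assert get(current_root, "b") == 2            # new readers: latest version
```

The `snapshot` reference is what keeps the old version alive — which is exactly why stale versions accumulate until every reader has advanced.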
3. Zero-copy reads.
When you read a string from a SQLite query, you get a pointer into a buffer SQLite owns — one that's only valid until the statement steps forward — so in practice you copy the data out immediately. Do this in a loop over ten thousand objects and you've just thrashed your allocator.
Realm returns pointers directly into the memory-mapped file. Reading a string means getting a pointer and a length — no copy, no allocation. The data is already in memory (or will be paged in by the OS on demand). For large datasets, this is the difference between an operation that takes milliseconds and one that takes microseconds.
The consequence: Realm objects are only valid while you hold a reference to the current transaction. If the transaction advances, the underlying memory might be reclaimed. This is why Realm objects need to be "frozen" if you want to pass them across thread boundaries — freezing takes a snapshot that won't be affected by subsequent writes.
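Python's `memoryview` makes a decent stand-in for the concept: a "read" hands back a view (pointer plus length) into the underlying buffer rather than a copied string, and the view's validity is tied to the stability of that buffer.

```python
# Zero-copy read sketch: a bytearray stands in for the memory-mapped
# file; memoryview slicing does not copy the underlying bytes.
data = bytearray(b"....hello world....")

def read_string(buf, offset, length):
    return memoryview(buf)[offset:offset + length]   # view, not copy

v = read_string(data, 4, 11)
assert bytes(v) == b"hello world"

# The hazard the text describes: the view is only valid while the
# underlying storage is stable. A later write changes what you "read"
# (and with a real mmap, the pages could be reclaimed entirely).
data[4:9] = b"HELLO"
assert bytes(v) == b"HELLO world"
```

Freezing, in this analogy, is making the copy explicitly — once, at a boundary you choose, instead of on every read.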
4. Live objects as first-class primitives.
This was the design decision we got the most pushback on, and the one I'm most proud of.
In Realm, when you query for a collection of objects, you don't get a static array — you get a live Results object that always reflects the current state of the database. When a write commits, all live Results objects that would be affected are automatically updated.
// This query result is live
let incompleteTasks = realm.objects(Task.self).filter("done == false")
// incompleteTasks.count is 5
try! realm.write {
    incompleteTasks.first?.done = true
}
// incompleteTasks.count is now 4 — automatically, with no polling
Implementing this correctly was one of the hardest engineering problems we solved. When a write commits, we need to determine which live query results changed — without re-running every query. We built a query evaluation engine that could compute diffs: given the set of changes (object IDs and which properties changed), which queries are affected, and what changed within those query results?
This is what made building reactive UIs with Realm feel magical compared to the competition. You bind a table view to a Results and it just stays up to date. No NSFetchedResultsController, no refresh notifications to wire up, no polling.
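A drastically simplified sketch of the diffing idea (hypothetical, nothing like the real engine's scale): each live query records which properties its predicate depends on; on commit, only queries whose watched properties intersect the changed set are re-evaluated, and the result is reported as a diff.

```python
# Fine-grained notification sketch: skip queries that can't have changed,
# re-evaluate the rest, and report inserted/deleted object ids.
class LiveQuery:
    def __init__(self, store, predicate, watched_props):
        self.store, self.predicate, self.watched = store, predicate, watched_props
        self.results = self._evaluate()

    def _evaluate(self):
        return sorted(oid for oid, obj in self.store.items() if self.predicate(obj))

    def on_commit(self, changed_props):
        if not (self.watched & changed_props):
            return None                       # predicate can't have changed
        old, new = self.results, self._evaluate()
        self.results = new
        return {"inserted": sorted(set(new) - set(old)),
                "deleted": sorted(set(old) - set(new))}

store = {1: {"done": False}, 2: {"done": False}}
q = LiveQuery(store, lambda o: not o["done"], {"done"})
assert q.results == [1, 2]

store[1]["done"] = True
assert q.on_commit({"done"}) == {"inserted": [], "deleted": [1]}
assert q.on_commit({"name"}) is None          # unrelated write: query untouched
```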
The Sync Engine: Where It Got Really Hard
The local database was hard. The sync engine was an order of magnitude harder.
The problem: you have the same Realm database on multiple devices. The user edits data on their phone while offline. They edit the same data on their tablet, also offline. Both devices sync to the server. What does the server do?
The two standard approaches are:
Operational Transformation (OT): Transform operations so that applying them in different orders produces the same result. This is how Google Docs works. The problem: OT is notoriously difficult to implement correctly for anything beyond simple text. There are papers from the 1990s proving that certain OT algorithms have subtle inconsistencies.
CRDTs (Conflict-free Replicated Data Types): Design your data types so that merging concurrent modifications is always well-defined and associative. This works beautifully for certain data structures (counters, sets, last-write-wins registers) but becomes awkward for arbitrary object graphs with rich types.
We ended up building a custom protocol that used a form of operational transformation, but constrained to the specific operations that Realm objects support: create, delete, set property, insert-into-list, delete-from-list. By restricting the operation set, we could prove convergence properties that are out of reach for general OT.
The protocol assigned every operation a global position in a causal history — a logical clock that captured "this operation happened after these other operations." The server would receive operations from multiple clients and compute a canonical ordering. Both clients, after receiving each other's operations, would arrive at the same final state by replaying operations in the canonical order.
The hard cases:
- Concurrent deletes: Client A deletes an object. Client B modifies a property of the same object. The merged result: the object is deleted, and the property change is lost. We had to decide this was the right behavior, communicate it clearly to developers, and make sure it actually happened correctly.
- List interleaving: Client A inserts item X at position 3 in a list. Client B inserts item Y at position 3 in the same list. Which one ends up at position 3 in the merged result? We used the causal clock to break ties deterministically.
- Schema migrations: The server needs to apply operations from a client that's running schema version N to a database at schema version N+1. We maintained a migration log that could be replayed, and the sync engine had to understand the relationship between schema versions.
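The concurrent-delete rule in particular can be sketched with tombstones (a toy model, not the real merge logic): once an object is deleted, concurrent property sets on it are discarded rather than resurrecting it.

```python
# "Delete wins" merge sketch: deletes leave a tombstone, and sets on a
# tombstoned object are dropped regardless of where they sort.
def merge(state, ops, tombstones):
    for op in sorted(ops, key=lambda o: (o["ts"], o["client"])):
        if op["kind"] == "delete":
            tombstones.add(op["obj"])
            state.pop(op["obj"], None)
        elif op["kind"] == "set" and op["obj"] not in tombstones:
            state.setdefault(op["obj"], {})[op["prop"]] = op["value"]
    return state

state = {"t1": {"done": False}}
ops = [
    {"kind": "delete", "obj": "t1", "ts": 5, "client": "A"},
    # B edited the object without having seen A's delete:
    {"kind": "set", "obj": "t1", "prop": "done", "value": True,
     "ts": 6, "client": "B"},
]
assert merge(state, ops, set()) == {}   # object stays deleted; B's edit is lost
```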
What We Got Wrong
The C++ core was good engineering. The cross-platform binding layer was a mess.
We built the core in C++, then wrapped it with Objective-C for iOS, Java for Android, C# for Xamarin, and JavaScript for React Native. Each binding layer required its own threading model, its own memory management conventions, and its own approach to surfacing the live-objects semantics in an idiomatic way for the target language.
This meant every API decision required implementing it four (later six) times, and every bug might be in the C++ core, the binding layer, or both. As we added platforms, the ratio of binding code to core code went above 2:1. Most of our engineers were working on bindings, not the database.
In retrospect, the right architecture was to build the bindings as thin as possible — just raw data access with explicit copying — and implement all the higher-level semantics (live objects, notifications, threading) in the language-native layer. We would have had slower individual-query performance, but the architectural complexity would have been manageable. We did eventually move in this direction, but it required a major rewrite that we should have done earlier.
The sync engine was also significantly more complex than it needed to be because we tried to handle too many conflict resolution policies. Different apps need different semantics — a notes app is fine with last-write-wins, a collaborative document needs richer merge behavior. We built a protocol that tried to be general enough to support both, and ended up with something that was too complex for either.
What I'd Do Differently
Embrace the mobile constraint as a design principle, not a limitation. The zero-copy, memory-mapped approach was the right call. I'd double down on it and make it even more aggressive — eliminate heap allocations from the read path entirely.
Build the sync engine on top of a CRDT library, not custom OT. The CRDT literature has advanced significantly since 2013. Martin Kleppmann's work on CRDTs for rich text is a good example of how far the field has come. Custom OT implementations are a liability — they're hard to reason about and hard to prove correct.
Ship the binding layer as open source from day one. Community-maintained language bindings are better than company-maintained bindings for languages outside your core expertise. We eventually open-sourced everything, but earlier would have been better.
Don't try to make the database invisible. One of our design principles was that Realm should be "transparent" — developers shouldn't have to think about it. In practice, understanding Realm's threading model, versioning, and live-object semantics is essential for avoiding bugs. Hiding these concepts behind abstractions made it harder, not easier, for developers to understand what was happening. The right abstraction level exposes the important concepts while making simple things simple.
The Takeaway
Building Realm taught me that most database performance problems are not algorithmic — they're mechanical. Memory allocation patterns, cache behavior, locking contention. Before you choose a data structure, choose your memory layout.
It also taught me that distributed systems problems don't disappear when you hide them in a library. Sync is hard because distributed systems are hard. You can make the API nice, but you can't make the fundamental tradeoffs go away. The best you can do is surface them clearly so developers can make informed decisions.
The database reached 2 billion installations and was acquired by MongoDB. I'm proud of what we built. More than the outcome, I'm proud of the specific technical decisions — the ones that held up over eight years and at that scale. The zero-copy reads. The MVCC implementation. The live-objects semantics. Those were right.
The sync engine architecture — that one I'd do over.
If you're curious about what the MongoDB acquisition was actually like from the founder seat — the process, the negotiations, the emotional aftermath — I wrote about that in What the MongoDB Acquisition Actually Felt Like.