
The Scalability Mistake Most Startups Make Before They Have Anything to Scale

There's a specific kind of startup meeting I've watched go wrong many times.

It happens when the founding team is still pre-launch, or has a few hundred users at most. Someone — often the most technical person in the room — raises a concern about scale. What happens when we have a million users? How will the database handle the traffic spike if a major customer imports their entire dataset? What if the architecture doesn't hold up?

These are real questions. At this stage, they are almost always the wrong ones to ask.


I spent over a decade at Nokia working on software configuration management and build tooling for devices deployed on hundreds of millions of handsets. Before any of the founder narrative — before Realm, before Y Combinator, before any of it — there was the unglamorous, foundational work of understanding how software actually fails at scale. Not theoretical scale. The actual scale of global consumer devices, where a bug in a configuration management system could affect millions of users and patching was not a simple matter.

What that experience taught me is not what most people expect.

The assumption is that scale reveals architectural problems. This is partly true. But the more useful lesson is different: most "scalability problems" that engineering teams worry about in early stages never actually materialize, and the ones that do almost never look like the ones the team planned for.

At Nokia, the problems that killed products at scale were almost always operational, not architectural. Integration failures between systems that worked perfectly in isolation. Configuration management that was correct in the lab and wrong on the device. Logging that consumed memory at a rate nobody had noticed during testing. The boring, invisible infrastructure that nobody designs for because it isn't interesting to design for.


When we built Realm, I tried to apply this lesson directly.

The core architectural decisions — the custom C++ storage engine, the zero-copy memory mapping, MVCC without traditional locking — were made early, when Realm had a handful of users. We made them not because we already had a million developers, but because we understood the problem well enough to know what the architecture needed to be.

The zero-copy approach wasn't premature optimization. It was a fundamental design decision about how data moves through the system — one that would have been expensive to change later. We made it once, correctly, and never had to revisit it. The database eventually reached 2 billion device installations without rewriting the storage engine.

This is the distinction that matters: there's a difference between designing for scale and premature optimization.

Designing for scale means making architectural choices that remain correct as the system grows. Premature optimization means spending engineering capacity on performance problems you don't have yet, based on load patterns you're guessing at. Premature optimization looks like: weeks of performance profiling before you have users, complex caching strategies before you know your access patterns, a microservices architecture before you understand where the service boundaries actually should be.

We spent a meaningful amount of time designing a query caching layer that we thought would be necessary when developer databases grew large. The reasoning was straightforward: repeated queries on large datasets would be slow without caching; developers would expect query performance to be consistent regardless of data size; we needed to solve this before it became a user-facing problem.

We built something fairly sophisticated. It worked correctly. It turned out not to matter. The access patterns developers actually had were different enough from what we had modeled that the cache hit rate was low in practice, and the zero-copy read performance was fast enough that developers weren't reporting the problem we had been optimizing for. We kept the caching layer, but the time we spent on it would have been better spent elsewhere.

The lesson I took from it: the load patterns you guess at before you have users are almost always wrong in specific ways. Some of those wrong guesses are cheap — you build something that doesn't hurt even if it doesn't help. Some are expensive. The expensive kind of wrong guess is the one that commits your architecture to a specific assumption.


The way I've come to think about this: the first scalability question isn't "how do we handle more load?" It's "which decisions will be expensive to change later, and which ones won't?"

Some architectural decisions are nearly impossible to reverse. Data format. Sync protocol design. The threading model your entire API is built around. The relationship between your data model and your storage layer. Get these wrong and you're looking at a rewrite at the worst possible time — when you have users depending on you and investors asking why growth has stalled.

Other decisions that feel foundational are actually incremental. You can add caching layers when you need them. You can move database reads to replicas when the primary becomes a bottleneck. You can introduce a message queue when synchronous calls become the slow path.

The mistake is treating the second category like the first — spending expensive early engineering time on problems you don't have, while underinvesting in the architectural decisions that genuinely are hard to change.


At Nokia, I watched the projects that didn't survive scale. The pattern was consistent: teams that spent early cycles on interesting optimization problems and skipped the boring architectural questions. The systems that scaled were usually the boring ones — where someone had thought carefully about data formats, interface contracts, and operational failure modes before writing much code at all.

The 2 billion device installations that Realm eventually reached weren't the result of building for that scale from day one. They were the result of building the right architecture from day one — an architecture that could reach that scale without being rewritten.

That distinction is the whole game. Start with what you can't change. Leave everything else until you have evidence that it's actually the bottleneck.