The System Design Decision You Will Regret Making Too Early
by Arif Ikhsanudin, Backend Developer
The Decision You Cannot Undo
You split your monolith into microservices at month three. You had twenty users. You did it because the team agreed it was "the right architecture for where we are going." Eighteen months later, you have four engineers spending a day each week managing service mesh configuration, debugging distributed traces for what used to be a single function call, and maintaining six independent CI/CD pipelines for services that deploy together 90% of the time anyway.
The decision was not wrong in principle. It was wrong in timing. Made with almost no operational data about how the system would actually be used, it locked in complexity before you understood the problem well enough to know what the right boundaries were.
The decision you will regret making too early is decomposition: specifically, service boundaries and data model schemas committed to too early, and synchronous vs. asynchronous communication patterns chosen before you understand your actual traffic shape.
Why Timing Matters for Irreversibility
Not all architectural decisions carry the same reversal cost. Some are cheap to change. Some are expensive. The mistake is not distinguishing between them.
Cheap to change: adding a cache layer, switching from REST to gRPC on a single internal service, changing queue implementation (SQS to RabbitMQ), adding read replicas.
Expensive to change: service boundaries once clients exist, data model decisions once data is in production at volume, synchronous coupling between services once SLAs depend on it, choice of database engine once terabytes of data exist.
The heuristic is: if changing this decision requires coordinating changes across multiple codebases, migrating production data, or changing interfaces that external consumers depend on, it is expensive. Make it later, with more information.
# Cheap to add later:
- Cache layer (additive, doesn't change interface)
- Read replica (transparent to application code)
- Rate limiting middleware (additive)
- Async job for non-critical work
# Expensive to change after the fact:
- Table schema with millions of rows (ALTER TABLE is a production event)
- Service split once both services have separate clients
- Changing from sync to async for an operation callers expect sync
- Switching from a relational to document model at volume
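The heuristic above can be written down as a small checklist. The Python sketch below is illustrative only: the `Decision` class and its field names are hypothetical, chosen to mirror the three criteria in the text (multiple codebases, production data migration, external interfaces).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """A hypothetical record of an architectural decision under review."""
    name: str
    spans_multiple_codebases: bool = False    # needs coordinated changes across repos
    migrates_production_data: bool = False    # needs a data migration or backfill
    changes_external_interface: bool = False  # alters something consumers depend on

    def is_expensive_to_reverse(self) -> bool:
        # Any one of the three criteria is enough to make reversal expensive.
        return (
            self.spans_multiple_codebases
            or self.migrates_production_data
            or self.changes_external_interface
        )


add_cache = Decision("add a cache layer")
split_service = Decision(
    "split a service that already has clients",
    spans_multiple_codebases=True,
    changes_external_interface=True,
)

for decision in (add_cache, split_service):
    verdict = "defer" if decision.is_expensive_to_reverse() else "decide now"
    print(f"{decision.name}: {verdict}")
```

The value of a checklist like this is less the code than the habit: asking the three questions explicitly before a decision is locked in.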
The Service Boundary Problem
Service boundaries are the decision most often made prematurely in backend systems today. The correct service boundary is one that reflects a stable domain concept, validated by how the system is actually used in production.
Before you have production usage data, you are guessing at domain boundaries based on how the requirements look on paper. Requirements on paper and domain behavior in production are different things. Users use systems in ways that do not match the feature spec. The boundary that seemed clean at design time creates friction at runtime.
Martin Fowler's guidance on this is worth taking seriously: start with a monolith, identify the seams under real usage, then extract services along those seams. The seams that are under active load, that have different scaling requirements, or that are owned by separate teams — those are the boundaries worth extracting. Everything else is coupling that happens to cross a network boundary.
The cost of the wrong boundary is high: data that needs to be joined across services requires either a distributed join (expensive and complex) or data duplication (creates consistency problems). Transactions that span services require two-phase commit or saga patterns, both of which add significant operational complexity. None of these costs shows up in the architecture diagram.
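To make the saga cost concrete, here is a minimal in-process sketch of the pattern, assuming each step is a pair of plain functions (an action and a compensation). The `run_saga` helper and the step names are hypothetical; a real saga would also need durable state, retries, and idempotent compensations, none of which are shown here.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) -- a hypothetical shape for one saga step
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed: {name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            return log
        completed.append(step)
        log.append(f"done: {name}")
    return log


def charge_card() -> None:
    # Simulates the downstream payment service being unavailable.
    raise RuntimeError("payment service unavailable")


def noop() -> None:
    pass


saga_log = run_saga([
    ("reserve inventory", noop, noop),
    ("charge card", charge_card, noop),
    ("create shipment", noop, noop),
])
print(saga_log)
```

Inside a monolith with a single database, the same three steps could sit in one transaction and roll back together. The compensation logic above is work that exists only because the boundary crosses a network.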
What to Defer and Until When
Service decomposition: defer until you have a service that is a deployment bottleneck, has a meaningfully different scaling characteristic from the rest of the system, or is being actively developed by a team that is blocked on deploying because of another team's work. Not before.
Schema finalization: keep schemas additive in early development. Add columns, do not change or remove them. Delay any decision that requires a backfill migration until you have confirmed the data shape is stable. Use nullable columns with defaults while a feature is being validated.
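A small sketch of what an additive change looks like, using Python's built-in `sqlite3` module with an in-memory database. The `users` table and `referral_code` column are made up for illustration, and SQLite is used only because it runs without setup; locking and rewrite behavior for `ALTER TABLE` differs between production database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Additive change: a new nullable column with a default. Existing rows and
# existing queries keep working, and no backfill is required.
conn.execute("ALTER TABLE users ADD COLUMN referral_code TEXT DEFAULT NULL")

# A query written before the change still runs unmodified.
old_query = conn.execute("SELECT id, email FROM users").fetchall()

# New code can start writing the column while the feature is being validated.
conn.execute(
    "INSERT INTO users (email, referral_code) VALUES (?, ?)",
    ("b@example.com", "SPRING"),
)
rows = conn.execute(
    "SELECT email, referral_code FROM users ORDER BY id"
).fetchall()
print(rows)
```

If the feature is dropped, the column can simply stop being written. Renaming or retyping it, by contrast, would force every reader and writer to change at once.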
Async vs sync: start synchronous. When you observe that a specific operation is adding latency to the critical path, or that failures in a downstream service are cascading into user-visible errors, make that operation asynchronous. Do not do it speculatively.
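The move from sync to async can stay cheap when the caller-facing signature does not change. The sketch below assumes a hypothetical `send_receipt` call that was observed to be slow, and uses an in-process `queue.Queue` with a worker thread as a stand-in for whatever job system you actually run.

```python
import queue
import threading
from typing import Callable, List

sent: List[str] = []


def send_receipt(order_id: str) -> None:
    """Stand-in for a slow downstream call (hypothetical)."""
    sent.append(order_id)


def place_order_sync(order_id: str) -> str:
    # Version 1: the downstream call sits on the critical path.
    send_receipt(order_id)
    return f"order {order_id} placed"


jobs: "queue.Queue[Callable[[], None]]" = queue.Queue()


def worker() -> None:
    while True:
        job = jobs.get()
        try:
            job()
        except Exception:
            pass  # a real worker would log and retry here
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def place_order_async(order_id: str) -> str:
    # Version 2: same signature and return value; the receipt is queued
    # instead of being sent before the caller gets a response.
    jobs.put(lambda: send_receipt(order_id))
    return f"order {order_id} placed"


print(place_order_sync("A1"))
print(place_order_async("A2"))
jobs.join()  # for the demo only: wait until queued work has finished
print(sent)
```

This only works because the caller never depended on the receipt being sent before the function returned. If callers do expect that, the change falls into the expensive category described earlier.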
The Practical Rule
Make the cheap decision that gives you the most information before committing to the expensive one. A monolith gives you accurate data about domain boundaries. A single database gives you accurate data about which tables actually need to scale independently. Sync communication gives you accurate data about which operations are actually latency-sensitive for users.
Defer the expensive decisions until the cheap version has taught you what you need to know to make them correctly.