Designing APIs That Scale Without Rewriting Them Later

by Arif Ikhsanudin, Backend Developer

The decisions that constrain you at 100x

The common narrative around scaling is that it is an infrastructure problem: add more servers, tune your database, add a cache layer. Infrastructure scaling is necessary, but the decisions that are genuinely hard to back out of are API design decisions, meaning whatever you put in the contract with clients.

A response schema that clients depend on cannot be changed without a version bump. An endpoint whose response embeds data joined synchronously from four tables cannot be split across services without changing what clients receive. A global sequential ID scheme that reveals business metrics cannot be replaced without breaking client integrations.

These are design decisions made when the API was small. At 100x load, they are constraints you work around rather than problems you solve cleanly.

Statelessness is not just a REST principle

Stateless APIs — where each request contains all the information needed to fulfill it — are not just architecturally clean, they are horizontally scalable by default. You can add application server instances without coordinating state between them.

The places where statefulness sneaks in:

Server-side sessions: If your API tracks session state in memory or a server-local store, requests from the same session must go to the same server. This works with sticky sessions but breaks under failover and complicates deployment. Use client-side tokens (JWTs) or distributed session stores (Redis) instead.

Request-scoped context that leaks between requests: Global or thread-local state that outlives the request that set it. This often goes unnoticed until requests run concurrently or worker threads and connections are reused, at which point one request can observe data left behind by another. Audit your middleware and request context handling.

Synchronous accumulation of derived state: An endpoint that returns "your account balance" by summing all transactions is a design that requires an increasingly expensive query as the transaction history grows. Separate commands (transactions) from queries (balance) using a materialized view or event-sourced aggregation pattern.
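
The difference between summing history on every read and maintaining a derived balance can be sketched as follows. This is an in-memory illustration only; `Ledger` and its method names are hypothetical, and in a real system the append and the balance update would need to happen in one database transaction or be driven from an event stream.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Append-only transaction log plus a running balance per account."""
    transactions: list = field(default_factory=list)
    balances: dict = field(default_factory=dict)

    def record(self, account_id: str, amount_cents: int) -> None:
        # Command path: append the transaction and update the derived
        # balance together, so reads never have to re-sum history.
        self.transactions.append((account_id, amount_cents))
        self.balances[account_id] = self.balances.get(account_id, 0) + amount_cents

    def balance(self, account_id: str) -> int:
        # Query path: a single lookup, independent of history length.
        return self.balances.get(account_id, 0)

    def balance_by_summing(self, account_id: str) -> int:
        # The design described above: cost grows with every transaction.
        return sum(amt for acct, amt in self.transactions if acct == account_id)
```

Both methods return the same number; only the first stays cheap as the transaction log grows.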

ID design has scaling implications

Sequential integer IDs leaking to clients create several problems:

  • They reveal business metrics (order count, user count) by inspection
  • They are not safe to generate at scale across multiple database shards without coordination
  • They assume a single canonical sequence, which limits horizontal write scaling

ULIDs (Universally Unique Lexicographically Sortable Identifiers) or UUIDv7 values give you globally unique IDs that sort by creation time, to millisecond precision, without a centralized counter. They are safe to generate in any service instance, shard, or region.

// ULID: 01HZQK7P3WVXBN4Y9MRDTJC8E6
// 48-bit millisecond timestamp + 80-bit random component
// Sorts lexicographically by creation time
// Safe to generate on any node without coordination

Expose ULIDs to clients from day one. Migrating from sequential integers to opaque IDs after clients have built integrations that parse IDs as integers is painful.
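
To make the layout concrete, here is a minimal ULID generator sketch in Python using only the standard library. `new_ulid` is a hypothetical helper, not a library API, and it omits the optional monotonic behavior, so two IDs created in the same millisecond are not guaranteed to sort in creation order.

```python
import os
import time

# Crockford base32: digits plus letters, excluding I, L, O, and U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None) -> str:
    """Return a 26-character ULID: 48-bit ms timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    value = (timestamp_ms << 80) | randomness           # 128 bits total
    chars = []
    for _ in range(26):  # 26 chars x 5 bits = 130 bits; the top 2 bits are zero
        chars.append(CROCKFORD_ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

The first 10 characters encode the timestamp and the last 16 the random component. Because the alphabet is in ascending ASCII order, plain string comparison orders IDs by timestamp.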

Synchronous vs. asynchronous operations

The default REST pattern is synchronous: client sends request, server processes it, client gets response. This works for operations that complete in under a few seconds. It breaks for operations that take longer or involve multiple services.

The indicators that an operation should be asynchronous:

  • Processing time is unpredictable (depends on file size, downstream service latency, or queue depth)
  • The operation involves multiple steps that can fail independently
  • The client does not need the result immediately to continue

The pattern:

POST /exports
→ 202 Accepted
{
  "job_id": "job_01HZQK",
  "status": "pending",
  "status_url": "https://api.example.com/exports/job_01HZQK"
}

Design this from the start for operations that are clearly going to be long-running. Retrofitting async into an existing synchronous endpoint requires adding a new endpoint pattern and migrating clients — more work than getting it right initially.
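
The server side of this pattern can be sketched without committing to a framework. In the sketch below, the handler names, the in-memory `_jobs` and `_queue` structures, the `rows` parameter, and the "running", "completed", and "failed" states are all assumptions made for illustration; a real deployment would likely use a durable queue and a separate worker process.

```python
import uuid
from collections import deque

BASE_URL = "https://api.example.com"
_jobs = {}        # job_id -> {"status": ..., "result": ...}
_queue = deque()  # pending (job_id, params) pairs


def create_export(params: dict):
    """Handler for POST /exports: record the job, enqueue it, answer 202 at once."""
    job_id = f"job_{uuid.uuid4().hex[:12]}"
    _jobs[job_id] = {"status": "pending", "result": None}
    _queue.append((job_id, params))
    return 202, {
        "job_id": job_id,
        "status": "pending",
        "status_url": f"{BASE_URL}/exports/{job_id}",
    }


def run_next_job() -> bool:
    """One iteration of a background worker loop; False if nothing was queued."""
    if not _queue:
        return False
    job_id, params = _queue.popleft()
    _jobs[job_id]["status"] = "running"
    try:
        row_count = int(params["rows"])  # stand-in for the slow export work
        _jobs[job_id] = {"status": "completed", "result": {"row_count": row_count}}
    except Exception as exc:
        _jobs[job_id] = {"status": "failed", "result": {"error": repr(exc)}}
    return True


def get_export(job_id: str):
    """Handler for GET /exports/{job_id}: report the job's current state."""
    job = _jobs.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}
    return 200, {"job_id": job_id, **job}
```

The client polls `status_url` until the status leaves "pending" or "running". The request handler never waits on the export itself, so its latency stays flat regardless of export size.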

Resource granularity

APIs that expose fine-grained resources force clients to make many requests to accomplish a single task. A mobile app that needs 7 API calls to render a user profile page is dealing with a chatty API: latency is high (the calls either run serially or require complex parallel orchestration) and the page is fragile (a single failed call breaks it).

Design for client use cases, not data model purity. A /profile endpoint that returns user data plus account status plus recent activity in a single call is not less RESTful — it is pragmatically designed for the clients that exist.

At scale, chatty APIs multiply: 7 calls per page load × N concurrent users × M page loads per user per minute. Aggregating those calls at the API level therefore has a disproportionate effect on load.
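
A sketch of what an aggregated /profile handler might look like, using Python's `asyncio` to fan out on the server side. The three `fetch_*` functions are hypothetical stand-ins for internal lookups, with a short sleep simulating I/O, and the choice to degrade optional sections to `None` on failure is an assumption rather than something the pattern requires.

```python
import asyncio


async def fetch_user(user_id: str) -> dict:
    await asyncio.sleep(0.01)  # simulated I/O
    return {"id": user_id, "name": "Example User"}


async def fetch_account_status(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"plan": "free", "active": True}


async def fetch_recent_activity(user_id: str) -> list:
    await asyncio.sleep(0.01)
    if user_id == "user_no_activity_service":  # simulated downstream outage
        raise ConnectionError("activity service unavailable")
    return [{"event": "login"}]


async def get_profile(user_id: str) -> dict:
    """Handler for GET /profile: one round trip for the client."""
    user, status, activity = await asyncio.gather(
        fetch_user(user_id),
        fetch_account_status(user_id),
        fetch_recent_activity(user_id),
        return_exceptions=True,  # collect failures instead of aborting the batch
    )
    if isinstance(user, Exception):
        raise user  # the page is meaningless without the core user record
    return {
        "user": user,
        "account_status": None if isinstance(status, Exception) else status,
        "recent_activity": None if isinstance(activity, Exception) else activity,
    }
```

The parallel orchestration now lives on the server, close to the data, and a failure in an optional section no longer breaks the whole page. As a worked example of the multiplication above: with 10,000 users each loading 2 pages per minute, 7 calls per page is 140,000 requests per minute, while one aggregated call is 20,000.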

Idempotency is a scalability feature

In distributed systems, requests fail and get retried. If retrying a request causes a duplicate operation (double charge, double inventory deduction), your system is not safe to retry and requires complex deduplication logic on the client side.

Design mutating endpoints to be idempotent with a client-supplied idempotency key:

POST /payments
Idempotency-Key: 01HZQK7P3WVXBN
{
  "amount": 9900,
  "currency": "USD"
}

The server stores the response keyed by idempotency key. If the same key is submitted again (client retry after network failure), the stored response is returned without re-processing. This makes your API safe to retry at any layer — API gateway, client SDK, application code.

Store idempotency keys with a TTL (24 hours is a common choice). Implement the storage before you need it: adding it later requires clients to update their integration to send the header.
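
The store-and-replay behavior described above can be sketched as follows. `IdempotencyStore`, its `execute` method, and the `charge` handler in the usage example are hypothetical names. The dictionary stands in for a shared store such as Redis, and the sketch does not handle two concurrent requests arriving with the same key, which in practice needs an atomic reserve-then-fill step.

```python
import time


class IdempotencyStore:
    """Caches the response for each idempotency key until its TTL expires."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, response)

    def execute(self, key: str, request_body: dict, handler):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # retry: replay the stored response, do not re-process
        response = handler(request_body)  # first time (or key expired): do the work
        self._entries[key] = (now + self._ttl, response)
        return response
```

A usage example, where the second call is a client retry and must not charge twice:

```python
charges = []


def charge(body: dict):
    charges.append(body["amount"])
    return 201, {"payment_id": f"pay_{len(charges)}", "amount": body["amount"]}


store = IdempotencyStore()
request = {"amount": 9900, "currency": "USD"}
first = store.execute("01HZQK7P3WVXBN", request, charge)
retry = store.execute("01HZQK7P3WVXBN", request, charge)
assert first == retry and len(charges) == 1
```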
