Vertical Scaling vs Horizontal Scaling: When to Use Which

by Arif Ikhsanudin, Backend Developer

The Default Answer Is Wrong

Ask most engineers which scaling strategy is correct and they will say horizontal scaling. It is the modern answer. It is what cloud-native architecture evangelists advocate. It is what Kubernetes is built around. The answer is also frequently wrong for the specific situation at hand.

Horizontal scaling means adding more instances. Vertical scaling means adding more resources to existing instances — more CPU, more RAM, faster storage. Both have appropriate use cases. The choice depends on your workload, your database, and your operational constraints — not on what the industry considers architecturally fashionable.

When Vertical Scaling Is the Right Answer

Vertical scaling is the right answer when your workload cannot be parallelized, or when parallelization cost exceeds the cost of a larger instance.

Databases are the clearest case. A primary PostgreSQL or MySQL write instance does not benefit from horizontal scaling in the way application servers do. Adding replicas does not increase write throughput, because every write still goes to the primary. If your primary is saturating CPU or I/O, your options are: move to a larger instance (vertical), reduce write load through application-level batching, or shard the data (which is horizontal, but at the data level rather than the server level). For most systems below extreme write volume, vertically scaling the primary is faster, cheaper, and operationally simpler than sharding.
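
To make the batching option concrete, here is a minimal sketch that groups inserts into fewer transactions. It uses Python's built-in sqlite3 module as a stand-in for a PostgreSQL primary, and the `events` table and `write_batched` helper are hypothetical. The point is only that the number of commits drops as the batch size grows; on a PostgreSQL primary with synchronous_commit enabled, each commit waits for a WAL flush, so fewer commits means less per-transaction overhead.

```python
import sqlite3


def write_batched(conn: sqlite3.Connection, rows: list[tuple[int, str]], batch_size: int) -> int:
    """Insert rows, committing once per batch instead of once per row.

    Returns the number of commits issued. (Hypothetical helper for illustration.)
    """
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Run the INSERT for every row in the batch inside one transaction...
        conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)
        # ...and pay the commit cost once for the whole batch.
        conn.commit()
        commits += 1
    return commits


# sqlite3 in memory stands in for the real primary database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"event-{i}") for i in range(1000)]

commits = write_batched(conn, rows, batch_size=100)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(commits, count)  # 10 1000
```

The same 1,000 rows land in 10 transactions instead of 1,000. The right batch size is a trade-off against latency and lock duration and has to be tuned for the actual workload.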

In-memory computation workloads also favor vertical scaling. If your process needs to hold a large dataset in memory — graph traversal, ML inference, complex session aggregation — adding more instances gives you more parallelism but each instance still needs the memory to hold its working set. A larger instance with more RAM solves the problem directly. Multiple smaller instances require the working set to be partitioned, which may or may not be feasible.
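
Whether a working set partitions cleanly depends on how the data connects. The toy model below assumes a hypothetical ring-shaped graph of 1,000 nodes split across 4 instances. It counts how many edges a traversal would have to follow across instance boundaries under two placement schemes; each crossing edge would become a network hop instead of a memory lookup. Real graphs rarely partition as neatly as a ring, which is the point.

```python
def cross_partition_fraction(edges, partition_of) -> float:
    """Fraction of edges whose two endpoints live on different instances."""
    crossing = sum(1 for u, v in edges if partition_of(u) != partition_of(v))
    return crossing / len(edges)


N, K = 1_000, 4  # hypothetical: 1,000 nodes spread over 4 instances

# A ring graph: node i links to node i+1, wrapping around at the end.
ring = [(i, (i + 1) % N) for i in range(N)]

# Hash-style placement: consecutive nodes land on different instances.
by_modulo = cross_partition_fraction(ring, lambda n: n % K)

# Range placement: each instance owns one contiguous block of 250 nodes.
by_range = cross_partition_fraction(ring, lambda n: n // (N // K))

print(by_modulo, by_range)  # 1.0 0.004
```

With modulo placement every edge crosses an instance boundary; with range placement only the 4 block boundaries do. When no placement keeps related data together, a single larger instance is often the simpler answer.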

# PostgreSQL primary write throughput vs instance size
# Benchmark: pgbench, scale factor 100, 32 concurrent clients, 60s

db.t3.medium  (2 vCPU, 4 GB RAM):   ~1,800 TPS
db.t3.large   (2 vCPU, 8 GB RAM):   ~2,200 TPS  (working set fits in RAM)
db.m5.xlarge  (4 vCPU, 16 GB RAM):  ~4,100 TPS  (more cores + RAM)
db.m5.4xlarge (16 vCPU, 64 GB RAM): ~9,800 TPS

# Environment: RDS PostgreSQL on gp3 storage. Results will vary
# significantly with workload type and query complexity.
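
Reading the benchmark figures as ratios makes the returns easier to see. This sketch does nothing but arithmetic on the numbers above: speedup relative to db.t3.medium and throughput per vCPU. On these figures the largest instance still delivers the most total throughput, but per-vCPU throughput falls from 1,025 TPS on db.m5.xlarge to 612.5 on db.m5.4xlarge, so each added core buys less. Treat it as a reading of one benchmark, not a general law.

```python
# Figures copied from the benchmark above: (vCPUs, RAM in GB, approximate TPS).
results = {
    "db.t3.medium": (2, 4, 1_800),
    "db.t3.large": (2, 8, 2_200),
    "db.m5.xlarge": (4, 16, 4_100),
    "db.m5.4xlarge": (16, 64, 9_800),
}

baseline_tps = results["db.t3.medium"][2]

for name, (vcpus, ram_gb, tps) in results.items():
    speedup = tps / baseline_tps  # throughput relative to the smallest instance
    per_vcpu = tps / vcpus        # how much throughput each core contributes
    print(f"{name:<14} {speedup:4.2f}x  {per_vcpu:6.1f} TPS/vCPU")
```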

When Horizontal Scaling Is the Right Answer

Horizontal scaling is the right answer when your workload is stateless and the bottleneck is parallel throughput, not per-instance performance.

HTTP application servers are the canonical case. A stateless API server holds no session state, reads from a shared database, and can handle any request without knowledge of other requests. Behind a load balancer, such servers scale close to linearly with instance count: doubling instances roughly doubles throughput, up to the point where the database or a downstream service becomes the bottleneck.
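
A back-of-the-envelope model captures both halves of that claim: throughput grows with instance count until a shared dependency caps it. The per-instance and database figures below are hypothetical placeholders, and the model assumes even load balancing and that every request costs the database roughly the same.

```python
def app_tier_throughput(instances: int, rps_per_instance: float, db_limit_rps: float) -> float:
    """Requests/s the system can serve: linear in instances until the database caps it."""
    return min(instances * rps_per_instance, db_limit_rps)


# Hypothetical: each API instance handles 500 req/s; the database tops out at 3,000 req/s.
for n in (1, 2, 4, 6, 8):
    print(n, app_tier_throughput(n, 500, 3_000))
# 1 500
# 2 1000
# 4 2000
# 6 3000
# 8 3000
```

Past six instances, adding more buys nothing; the next step is to scale or offload the database, not the app tier.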

Queue consumers are another clear case. If you have a queue of work to process and processing each item is independent, adding consumer instances increases throughput roughly proportionally, until a shared dependency such as the database or the broker itself saturates. This is embarrassingly parallel work, the textbook use case for horizontal scaling.
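
The competing-consumers pattern is small enough to sketch in one process. Here Python threads and the standard library `queue.Queue` stand in for separate consumer instances reading from a shared broker; in production the queue would be a real broker and each consumer its own process or instance. The sketch shows the structure, not a speedup: in CPython, threads will not accelerate CPU-bound work. Because no item depends on another, the consumer count is a single parameter.

```python
import queue
import threading


def process(item: int) -> int:
    # Stand-in for real work; each item is independent of every other item.
    return item * item


def consumer(work: queue.Queue, results: dict, lock: threading.Lock) -> None:
    while True:
        item = work.get()
        if item is None:  # sentinel: no more work for this consumer
            work.task_done()
            return
        value = process(item)
        with lock:
            results[item] = value
        work.task_done()


def run(items: list[int], consumers: int) -> dict:
    work: queue.Queue = queue.Queue()
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=consumer, args=(work, results, lock))
        for _ in range(consumers)
    ]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)
    for _ in threads:
        work.put(None)  # one sentinel per consumer so each one exits
    for t in threads:
        t.join()
    return results


results = run(list(range(100)), consumers=4)
print(len(results), results[9])  # 100 81
```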

The requirement is statelessness. If your application server holds local state — in-memory session data, local file handles, instance-specific caches — horizontal scaling requires either eliminating that state or managing it across instances. Sticky sessions (routing the same user to the same instance) work but create uneven load distribution and complicate failover. The better answer is to move the state out: sessions to Redis, files to object storage.
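
Moving state out of the instance is what lets any instance serve any request. The sketch below is illustrative only: `SessionStore` and `AppInstance` are hypothetical names, and a plain dict stands in for Redis so the example runs on its own (a real dict is not shared across processes; a networked store is). A session created by one instance can be read by another, which removes the need for sticky sessions.

```python
import uuid
from typing import Optional


class SessionStore:
    """Hypothetical shared session store. In production this would be Redis or similar;
    a dict stands in here so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: dict = {}

    def save(self, session_id: str, session: dict) -> None:
        self._data[session_id] = dict(session)

    def load(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)


class AppInstance:
    """A stateless app server: it keeps no session data of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.store.load(session_id)
        return session["user"] if session else None


shared = SessionStore()
a, b = AppInstance("a", shared), AppInstance("b", shared)
sid = a.login("alice")  # request 1 is routed to instance a
print(b.whoami(sid))    # request 2 is routed to instance b: prints alice
```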

The Hybrid Reality

Most production systems use both. A common pattern:

  • Application tier: horizontally scaled stateless instances
  • Cache tier: vertically scaled Redis primary with read replicas for redundancy
  • Database tier: vertically scaled primary, horizontally scaled read replicas

The read replicas are horizontal, but the primary is vertical. This is not inconsistent. Replicas serve only reads and never accept writes, so read traffic can be spread across as many of them as needed. The primary must serialize every write, so it scales vertically until vertical limits become the bottleneck.
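
In the application, this split usually shows up as query routing. A minimal sketch, with hypothetical host names and a `ReplicaRouter` class invented for illustration: adding replicas spreads reads, while every write still lands on the one primary. Classifying statements by their leading keyword is a deliberate simplification; `SELECT ... FOR UPDATE`, data-modifying CTEs, and replication lag (a read on a replica may not yet see a write just made on the primary) all complicate real routing.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        # Simplification: treat anything starting with SELECT as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # all writes, and reads when no replica exists


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
for stmt in ["SELECT 1", "INSERT INTO t VALUES (1)", "SELECT 2", "SELECT 3"]:
    print(router.route(stmt))
# replica-1
# primary
# replica-2
# replica-1
```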

The Decision Criterion

Before choosing, answer two questions:

  1. Is the unit of work parallelizable without shared mutable state?
  2. Does the cost of coordination between parallel units exceed the cost of a larger instance?

If the work is parallelizable and coordination cost is low: horizontal. If the work requires shared state or coordination cost is high: vertical. If the bottleneck is a database primary: vertical first, then evaluate sharding only when vertical limits are hit.
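
The criterion is compact enough to write down as a function, which can serve as a checklist. It is a direct transcription of the rule above; the function and parameter names are hypothetical, and the boolean inputs stand for judgments you still have to make about your own workload.

```python
def choose_scaling(parallelizable: bool, coordination_cost_high: bool,
                   is_database_primary: bool, vertical_limit_hit: bool = False) -> str:
    """Transcription of the decision rule: two questions plus the database-primary case."""
    if is_database_primary:
        # Vertical first; only evaluate sharding once vertical limits are reached.
        return "evaluate sharding" if vertical_limit_hit else "vertical"
    if parallelizable and not coordination_cost_high:
        return "horizontal"
    # Shared mutable state or expensive coordination: a bigger instance is simpler.
    return "vertical"


print(choose_scaling(parallelizable=True, coordination_cost_high=False,
                     is_database_primary=False))  # horizontal
print(choose_scaling(parallelizable=True, coordination_cost_high=False,
                     is_database_primary=True))   # vertical
```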

The rule that horizontal scaling is always preferable is a piece of cloud-era marketing that got absorbed into engineering orthodoxy. Choose based on the workload, not the orthodoxy.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across the US, EU, and APAC time zones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.


Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia
