Why Your Database Gets Slower as Your Table Gets Bigger
by Arif Ikhsanudin, Backend Developer
Growth doesn't slow everything equally
You're at a planning meeting. Engineering is debating whether to archive old data or add more database capacity. The question underneath is: why does the database slow down as tables grow, and which queries are affected?
The answer isn't "more rows = slower queries" uniformly. Some queries on large tables are just as fast as on small ones. Others degrade linearly. A few collapse exponentially. Understanding the difference tells you what to fix and where investment pays off.
What the storage engine is actually doing
Modern relational databases (PostgreSQL's heap storage, InnoDB in MySQL, SQL Server's B-tree pages) organize data in fixed-size pages, typically 8KB or 16KB. A table with 10 million rows might occupy 80,000 pages.
When you run a query, the database reads pages from disk (or buffer pool if cached). The number of pages read is the primary cost driver — not the number of rows directly.
Sequential scans read all pages. Cost scales linearly with table size. A table that is 10x bigger takes 10x longer to scan.
Index lookups read a logarithmic number of index pages plus the heap pages for matching rows. Cost scales very slowly with table size — a B-tree index over 10 million rows has depth ~log₁₆(10,000,000) ≈ 6, meaning at most 6 index page reads to find any row. This doesn't change much even at 100 million rows.
The implication: queries doing index lookups stay fast as tables grow. Queries doing full scans get proportionally slower.
Why indexes degrade with table growth (sometimes)
A healthy index keeps queries fast indefinitely. But two things cause index effectiveness to degrade:
1. Index bloat from updates and deletes
In PostgreSQL, updated rows create new versions (MVCC), and old versions accumulate until VACUUM reclaims them. A heavily updated table can have an index pointing to many dead row versions, bloating the index and causing extra page reads. Monitor index bloat with:
-- PostgreSQL: check index vs table size ratio
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
An index larger than the table it covers is a signal worth investigating.
2. Index selectivity decreasing with data distribution shifts
If you have an index on status with values active, inactive, and deleted, and over time 95% of rows become deleted, the index on status = 'deleted' is low-selectivity — the database may prefer a sequential scan. The query plan that was good when the table was small might change as data distribution shifts.
Track this with pg_stats (PostgreSQL) or information_schema.COLUMN_STATISTICS (MySQL 8):
-- PostgreSQL: check MCV (most common values) distribution
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
Join performance and table size interaction
Join cost is driven by the cardinality of the inputs. When a small table joins to a large table, most strategies work well. When both tables are large, the join algorithm choice becomes critical:
- Hash join: builds a hash table from the smaller relation in memory. Degrades if the hash table spills to disk (controlled by
work_memin PostgreSQL). Fine for large tables if you have enough memory allocated. - Merge join: requires both inputs to be sorted. Fast when both sides have covering sorted indexes. Problematic when sort spills to disk.
- Nested loop: good when the inner side is an index lookup. Catastrophic when the inner side requires a full scan (O(n²) complexity).
As tables grow, joins that were using nested loops (because the inner side was small enough that a scan was tolerable) shift to hash joins. This is usually automatically handled by the optimizer — unless statistics are stale, in which case the optimizer may stick with a suboptimal plan.
The buffer pool effect
Most databases maintain an in-memory buffer pool (PostgreSQL's shared_buffers, InnoDB's innodb_buffer_pool_size). When a table fits entirely in the buffer pool, queries run from memory. When it doesn't, queries incur disk I/O.
A table that grows past the buffer pool size will experience a step-change in performance — queries that ran fast when the data was cached start waiting on disk reads. This often manifests as sudden slowdowns rather than gradual degradation.
On a server with 32GB RAM, a PostgreSQL shared_buffers of 8GB means any table over ~8GB will have frequent cache misses. The fix is either more memory, a smarter caching strategy, or partitioning to keep hot data in cache.
Table bloat and vacuum lag
In PostgreSQL, AUTOVACUUM reclaims dead row versions but can fall behind on high-write tables. A bloated table has many dead rows interspersed with live rows, forcing full scans to read more pages than necessary to find live data.
Check table bloat:
SELECT
schemaname,
tablename,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC;
Tables with >20% dead rows and high query frequency should have their autovacuum settings tuned (autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold).
The practical response
When a table hits a performance threshold, work through this checklist in order: check query plans for sequential scans that should be index scans, check index bloat and dead tuple ratios, check data distribution for low-selectivity columns, and check whether the hot working set still fits in the buffer pool. Most growth-related slowdowns are explained by one of these four causes.