How to Read a Query Execution Plan Without Getting Lost

by Arif Ikhsanudin, Backend Developer

The output nobody reads until something breaks

EXPLAIN output gets pasted into Slack, misread, and argued about. Half the team doesn't know how to interpret it. The other half confuses estimated rows with actual rows and draws the wrong conclusion. The result is slow queries that stay slow because the root cause is never correctly identified.

Execution plans are not that hard to read once you have the mental model. Here's how to get useful signal from them in under five minutes.

The structure: a tree of operators

A query execution plan is a tree. Each node is an operator that takes input rows, processes them, and passes output rows to its parent. The root node is the last step — what gets returned to the client. The leaf nodes are where data is read from storage.

In PostgreSQL EXPLAIN output, the tree is represented by indentation — deeper nodes execute first:

Hash Join  (cost=1234.00..5678.00 rows=4500 width=64)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (cost=0.00..3200.00 rows=180000 width=32)
        Filter: (status = 'pending')
  ->  Hash  (cost=890.00..890.00 rows=27520 width=32)
        ->  Index Scan using users_pkey on users u  (cost=0.43..890.00 rows=27520 width=32)

Reading this bottom-up: PostgreSQL will do an index scan on users, build a hash table from those rows, scan all rows in orders filtering for status = 'pending', then probe the hash table to join them.

The five numbers that matter

Each node shows: (cost=startup_cost..total_cost rows=estimated_rows width=avg_row_bytes)

startup_cost: estimated cost to return the first row (relevant for LIMIT queries where the optimizer can stop early)

total_cost: estimated cost to return all rows. This is the number the optimizer minimizes. Cost is in abstract units (not milliseconds), calibrated by seq_page_cost, random_page_cost, and cpu_tuple_cost parameters.

rows: estimated number of rows this node will output. This is the most important number for diagnosing plan problems.

width: average row width in bytes. Relevant for understanding data volume.

With EXPLAIN ANALYZE, you get actual values alongside estimates:

Seq Scan on orders  (cost=0.00..45231.00 rows=950 width=64)
                    (actual time=0.043..1823.000 rows=847291 loops=1)

The discrepancy here is critical: estimated 950 rows, actual 847,291 rows. The optimizer was off by 3 orders of magnitude. Every plan decision downstream of this node was based on the wrong cardinality. This is almost certainly why the query is slow — the optimizer chose a plan optimized for 950 rows that's catastrophic for 847,291.

Key operators and what they mean

Seq Scan: reads every row in the table. The presence of a seq scan is not automatically bad — for small tables or low-selectivity queries it's correct. It's bad when you expected selectivity and didn't get it.

Index Scan: traverses the B-tree and fetches heap rows. Expect ~6 page reads to find an entry plus one heap page per result row. Bad when many result rows are scattered across many heap pages.

Index Only Scan: fetches data directly from the index without hitting the heap. Look for this on covering indexes — it's the fast path.

Bitmap Heap Scan (with Bitmap Index Scan): PostgreSQL's hybrid approach. Collects all matching row pointers from the index into a bitmap, sorts them by physical location, then reads heap pages in order. Better than Index Scan when many rows match (reduces random I/O). Worse than Seq Scan for very high row counts.

Hash Join: builds a hash table from the smaller input, probes it with the larger. Good for large unsorted inputs. Requires memory — if it spills to disk (Hash Batches > 1), performance degrades significantly.

Merge Join: requires both inputs to be sorted on the join key. Very efficient when sorted indexes are available. Expensive if a sort is required first.

Nested Loop: for each outer row, looks up matching inner rows. Fast when the inner side uses an index lookup. Catastrophically slow when the inner side does a seq scan — O(n²) complexity.

Sort: explicit sort node. Appears before ORDER BY, GROUP BY, DISTINCT. If you see Sort Method: external merge Disk: 54268kB, the sort spilled to disk — increase work_mem for this session or query.

What to look for in five minutes

  1. Find the node with the highest actual time — that's your bottleneck. In EXPLAIN ANALYZE, timing appears per node.

  2. Find the largest estimate vs actual row discrepancy — a 10x or greater gap means the optimizer made decisions based on wrong information. Run ANALYZE on the table.

  3. Look for nested loops on large tables — a Nested Loop where the outer side produces thousands of rows and the inner side does anything other than a primary key lookup is suspect.

  4. Check for hash batches > 1Hash Batches: 8 means the hash join spilled to disk 8 times. Increase work_mem.

  5. Verify index scans where you expect them — if you see a Seq Scan on a column you indexed, trace back to why (function on column, type mismatch, low selectivity).

The invocation to remember

-- PostgreSQL: full diagnostic output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT ...;

-- BUFFERS shows shared hit/read counts — high 'read' vs 'hit' means cache misses
-- FORMAT TEXT is human-readable; FORMAT JSON is better for tools

Run this on every slow query before touching anything else. The plan tells you exactly what the database is doing. Guessing without it wastes time.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Designing for Failure Is Not Pessimism. It Is Professionalism.

Every component in a distributed system will eventually fail. The only question is whether your system was designed to handle that failure gracefully or to propagate it.

Read more

How to Plan Tasks Without Feeling Overwhelmed

Task lists can feel like a mountain you’ll never climb. With the right approach, you can turn chaos into clarity.

Read more

The Problem With Always Reaching for the Latest Technology

New technology is appealing for legitimate reasons and problematic for systematic ones. The engineers who build the most reliable systems are the ones who evaluate novelty against operational reality, not against excitement.

Read more

Working on Someone Else’s Desk: The Contractor’s Dilemma

“Just sit anywhere—we’ll assign you a desk.” It sounds simple, but for contractors, it often signals something deeper about how the work is really viewed.

Read more