Writing SQL That Still Makes Sense Six Months Later
by Arif Ikhsanudin, Backend Developer
The query you wrote in a hurry
You're six months into a project. A bug report comes in about incorrect revenue totals. You open the query responsible and see 80 lines of SQL with five subqueries, aliases like t1, t2, and x, and a comment that says -- TODO: clean this up. The person who wrote it was you. You have no memory of why any of it exists.
SQL readability is underinvested across the industry. Developers who would never write a 200-line method with no comments routinely write opaque queries that are impossible to debug under pressure. The database doesn't care about formatting. Future you does.
Formatting is not optional
Consistent formatting is the baseline. It costs nothing in runtime and makes queries scannable by someone under pressure.
-- Hard to scan
SELECT u.id,u.name,SUM(o.total) as revenue FROM users u LEFT JOIN orders o ON u.id=o.user_id WHERE u.created_at>'2024-01-01' AND o.status='completed' GROUP BY u.id,u.name HAVING SUM(o.total)>1000 ORDER BY revenue DESC;
-- Readable
SELECT
u.id,
u.name,
SUM(o.total) AS revenue
FROM users u
LEFT JOIN orders o
ON u.id = o.user_id
WHERE
u.created_at > '2024-01-01'
AND o.status = 'completed'
GROUP BY
u.id,
u.name
HAVING SUM(o.total) > 1000
ORDER BY revenue DESC;
Each clause gets its own line. Each selected column gets its own line. JOIN conditions are indented under the JOIN. This sounds obvious. It is rarely practiced.
Name things like you'll read them cold
Aliases are the biggest readability debt in SQL. t1, a, sub — these are meaningless. Use abbreviated but descriptive aliases:
-- Meaningless
SELECT t1.id, t2.name, t3.amount
FROM orders t1
JOIN users t2 ON t1.user_id = t2.id
JOIN invoices t3 ON t1.invoice_id = t3.id;
-- Self-documenting
SELECT ord.id, usr.name, inv.amount
FROM orders ord
JOIN users usr ON ord.user_id = usr.id
JOIN invoices inv ON ord.invoice_id = inv.id;
Column aliases should reflect what the value means in business terms, not just what the expression is:
-- Expression alias (describes the computation)
SUM(line_item.quantity * line_item.unit_price) AS total_line_item_revenue
-- Not just
SUM(li.qty * li.price) AS s
Use CTEs to break complex logic into named steps
Common Table Expressions (CTEs) — the WITH clause — let you assign a name to an intermediate result. This is the single highest-leverage readability tool in SQL.
-- Without CTEs: one giant query, hard to reason about
SELECT
u.name,
cohort_data.first_order_date,
revenue_data.total_revenue
FROM users u
JOIN (
SELECT user_id, MIN(created_at) AS first_order_date
FROM orders
GROUP BY user_id
) cohort_data ON u.id = cohort_data.user_id
JOIN (
SELECT user_id, SUM(total) AS total_revenue
FROM orders
WHERE status = 'completed'
GROUP BY user_id
) revenue_data ON u.id = revenue_data.user_id;
-- With CTEs: each step has a name and a purpose
WITH user_first_orders AS (
SELECT
user_id,
MIN(created_at) AS first_order_date
FROM orders
GROUP BY user_id
),
user_revenue AS (
SELECT
user_id,
SUM(total) AS total_revenue
FROM orders
WHERE status = 'completed'
GROUP BY user_id
)
SELECT
u.name,
ufo.first_order_date,
ur.total_revenue
FROM users u
JOIN user_first_orders ufo ON u.id = ufo.user_id
JOIN user_revenue ur ON u.id = ur.user_id;
The CTE version is slightly longer. It is far easier to debug, modify, and hand off to someone else.
One caveat: some database engines (MySQL pre-8.0, older versions of SQL Server) materialize CTEs as temp tables, which can cause performance regressions if the optimizer can't push predicates into them. In PostgreSQL, CTEs prior to version 12 were optimization fences by default. Know your engine's behavior before using deeply nested CTEs in hot paths.
Comment the why, not the what
SQL comments should explain the business reason for a filter or join, not describe what the clause does syntactically.
-- Bad comment (describes syntax)
-- Join to orders table
JOIN orders o ON u.id = o.user_id
-- Good comment (explains business reason)
-- Include only users who have placed at least one order; users without orders
-- are handled by a separate onboarding flow and excluded from revenue reports
JOIN orders o ON u.id = o.user_id
If there's a WHERE clause that filters out a specific edge case, say why. Future maintainers will thank you when they're tempted to remove it.
Organize query files like code
If you're storing queries in files (in a /sql directory, in migration files, in a reporting repo), treat them like code:
- One query per file with a descriptive filename (
monthly_cohort_revenue_by_region.sql) - A header comment block with: what the query returns, who uses it, any known edge cases
- Avoid copy-paste variants of the same query — refactor shared logic into views or CTEs
The practical takeaway
Find the most complex query your team owns — the one that's been modified five times without ever being refactored. Rewrite it using CTEs, explicit column lists, and descriptive aliases. Don't change the logic. Just restructure for readability. Run both versions through EXPLAIN to confirm the plans match. Then commit the readable version. You'll spend 30 minutes now and save hours later when something breaks at 11pm.