Your Table Structure Is Making Your Queries Harder Than They Need to Be
by Arif Ikhsanudin, Backend Developer
When the query is the symptom, not the problem
You're in a code review. The query under discussion has four CTEs, a self-join, and a comment that says "this is the only way to get this data." Someone asks why it's so complicated. The answer is always some variant of: "because of how the table is structured."
Complex queries are often schema problems in disguise. When data is stored in a shape that doesn't match how it needs to be queried, developers add complexity at the query layer to compensate. Understanding that the root cause is structural — not a lack of SQL skill — is the first step to actually fixing it.
Mismatched granularity
The most common structural mismatch: the table stores data at the wrong level of granularity for your primary query.
-- Table stores one row per event, including event metadata that repeats
CREATE TABLE user_events (
id BIGINT PRIMARY KEY,
user_id BIGINT,
event_type VARCHAR(100),
event_data JSONB,
session_id UUID,
session_started_at TIMESTAMPTZ, -- Repeated for every event in the session
session_device_type VARCHAR(50), -- Repeated for every event in the session
created_at TIMESTAMPTZ
);
Query: "How many sessions per device type this week?"
-- Have to deduplicate sessions because session data is repeated per event
SELECT session_device_type, COUNT(DISTINCT session_id) AS session_count
FROM user_events
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY session_device_type;
The COUNT(DISTINCT session_id) is expensive — it requires sorting or hashing all matching rows. This query's complexity and cost are a direct result of storing session attributes at event granularity.
The structural fix: separate sessions and events.
CREATE TABLE sessions (
id UUID PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
device_type VARCHAR(50),
started_at TIMESTAMPTZ
);
CREATE TABLE user_events (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
session_id UUID REFERENCES sessions(id),
event_type VARCHAR(100),
event_data JSONB,
created_at TIMESTAMPTZ
);
Now the session query is straightforward:
SELECT device_type, COUNT(*) AS session_count
FROM sessions
WHERE started_at >= NOW() - INTERVAL '7 days'
GROUP BY device_type;
EAV: the schema that makes every query hard
Entity-Attribute-Value (EAV) is the pattern where attributes are stored as rows rather than columns:
CREATE TABLE product_attributes (
product_id BIGINT,
attribute_name VARCHAR(100),
attribute_value TEXT,
PRIMARY KEY (product_id, attribute_name)
);
-- Data looks like:
-- (1, 'color', 'red')
-- (1, 'size', 'L')
-- (1, 'weight_kg', '0.5')
Flexibility at the cost of every query being a pivot operation:
-- Find all red, size-L products — requires self-joins per attribute
SELECT p.id
FROM products p
JOIN product_attributes pa_color ON p.id = pa_color.product_id
AND pa_color.attribute_name = 'color' AND pa_color.attribute_value = 'red'
JOIN product_attributes pa_size ON p.id = pa_size.product_id
AND pa_size.attribute_name = 'size' AND pa_size.attribute_value = 'L';
EAV trades query simplicity for schema flexibility. It's appropriate when the set of attributes is genuinely unknown and varies wildly per entity. It's overused when a JSON column would achieve the same flexibility with simpler queries:
-- PostgreSQL JSONB: flexible attributes with indexable structure
CREATE TABLE products (
id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
attributes JSONB
);
-- Index a specific attribute for fast filtering
CREATE INDEX idx_products_color ON products ((attributes->>'color'));
-- Query is straightforward
SELECT id FROM products WHERE attributes->>'color' = 'red' AND attributes->>'size' = 'L';
JSONB in PostgreSQL supports GIN indexes for arbitrary key queries and operator-based filtering. It's not a replacement for properly structured columns, but it's a far better fit than EAV for semi-structured data.
The wide table that needs too many nulls
A table with 80 columns where only 15 are populated for any given row is a sign that multiple entity subtypes are being stored in one table.
CREATE TABLE content (
id BIGINT PRIMARY KEY,
type VARCHAR(50), -- 'article', 'video', 'podcast'
title TEXT,
body TEXT, -- NULL for video/podcast
video_url TEXT, -- NULL for article/podcast
video_duration_seconds INT,-- NULL for article/podcast
audio_url TEXT, -- NULL for article/video
audio_bitrate_kbps INT, -- NULL for article/video
...
);
This is a type hierarchy problem. The correct pattern is a shared base table with type-specific tables:
CREATE TABLE content (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
type VARCHAR(50) NOT NULL,
title TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE articles (
content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
body TEXT NOT NULL
);
CREATE TABLE videos (
content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
video_url TEXT NOT NULL,
duration_seconds INT NOT NULL
);
The tradeoff: you now need a join to get the full record for a specific type. But your columns are all meaningful, your NOT NULL constraints are enforced, and queries against a single type don't load irrelevant columns.
The practical audit
Pick your most complex query — the one that took longest to write or that developers are afraid to modify. Trace backwards: why is it complex? Is it aggregating data that exists at the wrong granularity? Is it pivoting EAV rows into columns? Is it joining subtypes out of a nullable wide table? The query complexity is almost always pointing at a schema structural mismatch. Fix the structure; the query simplifies itself.