Your Table Structure Is Making Your Queries Harder Than They Need to Be

by Arif Ikhsanudin, Backend Developer

When the query is the symptom, not the problem

You're in a code review. The query under discussion has four CTEs, a self-join, and a comment that says "this is the only way to get this data." Someone asks why it's so complicated. The answer is always some variant of: "because of how the table is structured."

Complex queries are often schema problems in disguise. When data is stored in a shape that doesn't match how it needs to be queried, developers add complexity at the query layer to compensate. Understanding that the root cause is structural — not a lack of SQL skill — is the first step to actually fixing it.

Mismatched granularity

The most common structural mismatch: the table stores data at the wrong level of granularity for your primary query.

-- Table stores one row per event, including event metadata that repeats
CREATE TABLE user_events (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  event_type VARCHAR(100),
  event_data JSONB,
  session_id UUID,
  session_started_at TIMESTAMPTZ,   -- Repeated for every event in the session
  session_device_type VARCHAR(50),  -- Repeated for every event in the session
  created_at TIMESTAMPTZ
);

Query: "How many sessions per device type this week?"

-- Have to deduplicate sessions because session data is repeated per event
SELECT session_device_type, COUNT(DISTINCT session_id) AS session_count
FROM user_events
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY session_device_type;

The COUNT(DISTINCT session_id) is expensive — it requires sorting or hashing all matching rows. This query's complexity and cost are a direct result of storing session attributes at event granularity.

The structural fix: separate sessions and events.

CREATE TABLE sessions (
  id UUID PRIMARY KEY,
  user_id BIGINT REFERENCES users(id),
  device_type VARCHAR(50),
  started_at TIMESTAMPTZ
);

CREATE TABLE user_events (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  session_id UUID REFERENCES sessions(id),
  event_type VARCHAR(100),
  event_data JSONB,
  created_at TIMESTAMPTZ
);

Now the session query is straightforward:

SELECT device_type, COUNT(*) AS session_count
FROM sessions
WHERE started_at >= NOW() - INTERVAL '7 days'
GROUP BY device_type;

EAV: the schema that makes every query hard

Entity-Attribute-Value (EAV) is the pattern where attributes are stored as rows rather than columns:

CREATE TABLE product_attributes (
  product_id BIGINT,
  attribute_name VARCHAR(100),
  attribute_value TEXT,
  PRIMARY KEY (product_id, attribute_name)
);

-- Data looks like:
-- (1, 'color', 'red')
-- (1, 'size', 'L')
-- (1, 'weight_kg', '0.5')

Flexibility at the cost of every query being a pivot operation:

-- Find all red, size-L products — requires self-joins per attribute
SELECT p.id
FROM products p
JOIN product_attributes pa_color ON p.id = pa_color.product_id
  AND pa_color.attribute_name = 'color' AND pa_color.attribute_value = 'red'
JOIN product_attributes pa_size ON p.id = pa_size.product_id
  AND pa_size.attribute_name = 'size' AND pa_size.attribute_value = 'L';

EAV trades query simplicity for schema flexibility. It's appropriate when the set of attributes is genuinely unknown and varies wildly per entity. It's overused when a JSON column would achieve the same flexibility with simpler queries:

-- PostgreSQL JSONB: flexible attributes with indexable structure
CREATE TABLE products (
  id BIGINT PRIMARY KEY,
  name TEXT NOT NULL,
  attributes JSONB
);

-- Index a specific attribute for fast filtering
CREATE INDEX idx_products_color ON products ((attributes->>'color'));

-- Query is straightforward
SELECT id FROM products WHERE attributes->>'color' = 'red' AND attributes->>'size' = 'L';

JSONB in PostgreSQL supports GIN indexes for arbitrary key queries and operator-based filtering. It's not a replacement for properly structured columns, but it's a far better fit than EAV for semi-structured data.

The wide table that needs too many nulls

A table with 80 columns where only 15 are populated for any given row is a sign that multiple entity subtypes are being stored in one table.

CREATE TABLE content (
  id BIGINT PRIMARY KEY,
  type VARCHAR(50),          -- 'article', 'video', 'podcast'
  title TEXT,
  body TEXT,                 -- NULL for video/podcast
  video_url TEXT,            -- NULL for article/podcast
  video_duration_seconds INT,-- NULL for article/podcast
  audio_url TEXT,            -- NULL for article/video
  audio_bitrate_kbps INT,    -- NULL for article/video
  ...
);

This is a type hierarchy problem. The correct pattern is a shared base table with type-specific tables:

CREATE TABLE content (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  type VARCHAR(50) NOT NULL,
  title TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE articles (
  content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
  body TEXT NOT NULL
);

CREATE TABLE videos (
  content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
  video_url TEXT NOT NULL,
  duration_seconds INT NOT NULL
);

The tradeoff: you now need a join to get the full record for a specific type. But your columns are all meaningful, your NOT NULL constraints are enforced, and queries against a single type don't load irrelevant columns.

The practical audit

Pick your most complex query — the one that took longest to write or that developers are afraid to modify. Trace backwards: why is it complex? Is it aggregating data that exists at the wrong granularity? Is it pivoting EAV rows into columns? Is it joining subtypes out of a nullable wide table? The query complexity is almost always pointing at a schema structural mismatch. Fix the structure; the query simplifies itself.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Feeling Stuck After 3 Years? How to Know if You’re Improving

You’ve been coding for a few years, but it feels… flat. No big jumps, no clear progress—just work on repeat.

Read more

Docker in CI/CD Is Easier Than Most Tutorials Make It Look

Most CI/CD Docker tutorials are either too simple (just run docker build) or too complex (full GitOps with Argo and Helm). The practical middle ground — building, testing, tagging, and pushing images in a CI pipeline — is straightforward once you see it laid out.

Read more

What Clients Often Get Wrong When Outsourcing Development

Outsourcing development seems simple: hire, delegate, and wait for results. In reality, many clients misunderstand what it takes to build quality software remotely.

Read more

Why Paris Startups Are Quietly Routing Backend Work to Async Remote Contractors

Nobody talks about it at Station F happy hours. But the startups shipping fastest have stopped hiring for every backend project on their roadmap.

Read more