Where You Put Your Cache Matters More Than You Think

by Arif Ikhsanudin, Backend Developer

The Cache Is In the Wrong Place

A team adds Redis in front of their database, sees a 10% improvement in p95 latency, and considers the problem solved. Six months later, user-facing latency is still unacceptably high. The Redis cache has a 95% hit rate. The problem was never the database — it was the three synchronous external API calls in the request path, none of which are cached.

Cache placement is a function of what you are trying to protect and from what. Placing a cache at the wrong layer either leaves the actual bottleneck unaddressed or creates consistency problems without meaningful performance benefit.

The Cache Layers and What They Protect

Client-side / in-process cache. Data cached in the application process's own memory: a dictionary, an LRU cache, or an in-process store such as Caffeine (JVM) or functools.lru_cache (Python). Sub-millisecond access, no network hop. Appropriate for: reference data that changes rarely and can tolerate per-instance staleness (configuration values, feature flags, lookup tables).

Limitation: not shared across instances. Each instance has its own copy. A cache update must be propagated to all instances, usually via TTL expiration or a pub/sub invalidation event. Memory is bounded by the instance's heap.
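
As a minimal sketch of the TTL-expiration approach described above, the class below keeps entries in a per-process dictionary and reloads them once they expire. The name TTLCache, the loader callback, and the injectable clock are assumptions made for illustration; they do not come from any particular library.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical per-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: served from process memory
        value = loader()  # miss or expired: go back to the origin
        self._entries[key] = (now + self._ttl, value)
        return value
```

Each instance running this code holds its own copy, so two instances can disagree for up to one TTL. The sketch also never evicts entries, so a production version would need a size bound to respect the heap limit mentioned above.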

Distributed cache (Redis, Memcached). A shared cache layer between application instances and the database. Millisecond-range latency. Appropriate for: session data, computed results shared across users, expensive query results with acceptable staleness.

Limitation: adds a network hop. Introduces an additional failure surface. Requires managing consistency with the origin.
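
One common way to contain the extra failure surface is to fail open: if the cache cannot be reached, treat the lookup as a miss and read from the origin. The sketch below assumes a hypothetical cache_get callable (for example, a bound get method of a cache client) and a hypothetical load_from_origin callback; whether failing open is acceptable depends on how much load the origin can absorb.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def get_fail_open(cache_get: Callable[[str], Optional[Any]],
                  key: str,
                  load_from_origin: Callable[[], Any]) -> Any:
    """Return the cached value, or fall back to the origin on a miss or cache error."""
    try:
        cached = cache_get(key)
    except Exception:  # e.g. a connection or timeout error raised by the cache client
        logger.warning("cache unavailable for key %s; falling back to origin", key)
        cached = None
    if cached is not None:
        return cached
    return load_from_origin()
```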

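Database query cache. Some databases have shipped a built-in query result cache. MySQL had one until it was deprecated in 5.7 and removed in 8.0; PostgreSQL has never had one and caches data pages in shared buffers instead. This is rarely the right layer: invalidation is coarse-grained (in MySQL, any write to a table invalidated every cached result for that table), and the performance benefit is typically better achieved at the application layer.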

CDN / edge cache. Caches responses at the network edge, close to the client. Zero application server load for cache hits. Appropriate for: public, non-user-specific content — API responses, static data, media. HTTP cache headers (Cache-Control, ETag, Last-Modified) control CDN behavior.

# HTTP cache headers for a public API response:
Cache-Control: public, max-age=300, stale-while-revalidate=60
# max-age=300: caches (CDN and browser) may serve the response for up to 5 minutes
# stale-while-revalidate=60: for up to 60s after expiry, serve the stale copy while revalidating in the background
# Result: near-zero origin load for popular public endpoints
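
A small sketch of how an application might choose this header per response. The function name cache_control_for and its parameters are hypothetical; the directive values mirror the example above, and user-specific responses are kept out of shared caches entirely.

```python
def cache_control_for(user_specific: bool,
                      max_age: int = 300,
                      stale_while_revalidate: int = 60) -> str:
    """Build a Cache-Control value; hypothetical helper for illustration."""
    if user_specific:
        # Never let a CDN or other shared cache store per-user responses.
        return "private, no-store"
    return (f"public, max-age={max_age}, "
            f"stale-while-revalidate={stale_while_revalidate}")
```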

Matching the Layer to the Problem

The question to ask first: where is the latency coming from? Use an APM or request profiler to identify the slowest segments.
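
Where no APM is available, a rough first measurement can be taken by hand. The sketch below is an assumption-laden stand-in for a real profiler: SegmentTimer and the segment names are hypothetical, and it only accumulates wall-clock time per named segment of a request.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class SegmentTimer:
    """Hypothetical helper: accumulates wall-clock time per named request segment."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    @contextmanager
    def segment(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def slowest_first(self) -> List[Tuple[str, float]]:
        # Sort segments by total time, largest first.
        return sorted(self.totals.items(), key=lambda item: item[1], reverse=True)


timer = SegmentTimer()
with timer.segment("db_query"):
    time.sleep(0.01)  # stands in for a database call
with timer.segment("external_api"):
    time.sleep(0.05)  # stands in for a synchronous external API call
```

Whichever segment leads slowest_first() is the first candidate for a cache, or for removal from the request path.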

If latency is in database queries: a distributed cache (Redis) in front of those queries is appropriate. Configure TTL based on acceptable staleness. Ensure cache keys are scoped to the query parameters.
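
A cache-aside sketch of this case, under stated assumptions: the cache object only needs get(name) and setex(name, time, value), which is the shape redis-py's client exposes; InMemoryCache is a hypothetical stand-in so the example runs without a Redis server, and query_cache_key, cached_query, and run_query are illustrative names.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical stand-in with the two methods used below; it ignores the TTL."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)

    def setex(self, name: str, time: int, value: str) -> None:
        self._data[name] = value  # a real cache would expire this after `time` seconds


def query_cache_key(query_name: str, params: Dict[str, Any]) -> str:
    # sort_keys makes the key independent of the order the parameters were given in.
    return f"query:{query_name}:{json.dumps(params, sort_keys=True)}"


def cached_query(cache: Any, query_name: str, params: Dict[str, Any],
                 run_query: Callable[[Dict[str, Any]], List[dict]],
                 ttl_seconds: int = 60) -> List[dict]:
    key = query_cache_key(query_name, params)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)  # hit: the database is not touched
    rows = run_query(params)  # miss: run the query, then populate the cache
    cache.setex(key, ttl_seconds, json.dumps(rows))
    return rows
```

The ttl_seconds value is the staleness budget: a result can be out of date for at most that long unless it is invalidated explicitly.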

If latency is in external API calls: cache the API responses in-process or in Redis with a TTL appropriate to the external data's change frequency. If the external API is called per-user, scope the cache key to the user.

If latency is in repeated computation (rendering, aggregation, encoding): cache the computed result. Ensure the cache key captures all inputs that affect the output.
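
One way to make sure the key captures every input is to derive it from a canonical serialization of those inputs. In this sketch, computation_key and the version parameter are hypothetical; the version is an assumption added so that changing the computation's code can retire old entries.

```python
import hashlib
import json
from typing import Any, Mapping


def computation_key(name: str, inputs: Mapping[str, Any], version: str = "v1") -> str:
    """Derive a cache key from all inputs of a computation (illustrative helper)."""
    # Canonical JSON: sorted keys and fixed separators give a stable byte string.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{name}:{version}:{digest}"
```

If an input such as locale or image width is left out of inputs, two different outputs will collide on one key, so the inputs mapping should be built from everything the computation reads.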

If latency is geographic (users far from the origin see high round-trip times): use edge caching via a CDN, or deploy application instances in multiple regions.

The Dangerous Placement

The most dangerous misplacement is caching user-specific data in a shared cache without proper key scoping. If a cache key does not include the user ID, two users can receive each other's data. This is not a theoretical concern: it has caused significant security incidents at real companies.

# Dangerous: cache key based only on the product ID
cache_key = f"product:{product_id}"
# If product data is user-specific (personalized pricing, entitlements),
# user A can receive user B's data.

# Safe: scope the cache key to the user when data is user-specific
cache_key = f"product:{product_id}:user:{user_id}"
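
A sketch of one way to make that scoping hard to forget: route every key through a helper that forces the caller to state whether the data is user-specific. The name scoped_key and its parameters are hypothetical, chosen only for this illustration.

```python
from typing import Optional


def scoped_key(resource: str, resource_id: str, *,
               user_specific: bool, user_id: Optional[str] = None) -> str:
    """Build a cache key, refusing to build an unscoped key for per-user data."""
    if user_specific:
        if not user_id:
            # Fail loudly instead of silently sharing one entry across users.
            raise ValueError("user_id is required for user-specific data")
        return f"{resource}:{resource_id}:user:{user_id}"
    return f"{resource}:{resource_id}"
```

Because user_specific is a required keyword argument, the decision is visible at every call site and easy to check in code review.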

Get the placement right first, then tune TTLs and invalidation. A cache in the wrong place cannot be fixed by tuning.
