Graceful Degradation: How to Keep Your App Running When Things Break

by Arif Ikhsanudin, Backend Developer

The difference between degraded and down

Your recommendation engine is unavailable. What happens to your product page? Option A: the page returns a 500 error and users see a broken experience. Option B: the page loads without recommendations, showing a static "popular items" fallback or nothing in that section. The difference between these outcomes is not infrastructure — it's whether anyone on your team made an explicit design decision about what happens when the recommendation service is unavailable.

Graceful degradation means your system provides reduced but functional service when a dependency fails, rather than failing completely. It is a design discipline, not a resilience pattern you can bolt on with a library. The library (circuit breaker, fallback handler) is the mechanism. The decision about what reduced service looks like is yours to make before the failure happens.

Mapping your degradation modes

For every external dependency your service has, you need an explicit answer to: "what does my service do when this is unavailable?"

Start by categorizing dependencies by criticality:

Critical dependencies: the service cannot fulfill its core function without them. If Order Service's payment database is unavailable, it cannot accept orders — there is no graceful degradation, only an honest failure. These dependencies justify a hard fail with a clear error.

Non-critical dependencies: the service can provide reduced but useful functionality without them. Recommendations, personalization, enhanced metadata, activity logging, analytics — all of these can be absent without preventing the core user flow.

Asynchronous dependencies: message brokers, notification services, audit log services. These can fail without affecting synchronous responses — the main flow completes, and the async work either retries or is recorded in a dead letter queue.

Document this categorization explicitly. A dependency map that shows criticality levels makes graceful degradation decisions concrete and reviewable.
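One way to keep the map reviewable is to store it as data next to the code. The sketch below is an illustration only: the `Dependency` record, the dependency names, and the behaviors listed are hypothetical examples drawn from this article, and it assumes Java 16 or later for records.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** The three criticality levels described above. */
enum Criticality { CRITICAL, NON_CRITICAL, ASYNC }

/** One row of the dependency map: what it is, how critical it is, and the agreed behavior. */
record Dependency(String name, Criticality criticality, String whenUnavailable) {}

final class DependencyMap {
    // Hypothetical entries for an order service; replace with your own dependencies.
    static final List<Dependency> ORDER_SERVICE = List.of(
        new Dependency("payment-database", Criticality.CRITICAL,
            "fail the request with a clear error"),
        new Dependency("recommendation-service", Criticality.NON_CRITICAL,
            "show popular items or hide the section"),
        new Dependency("catalog-service", Criticality.NON_CRITICAL,
            "serve last known good data from the local cache"),
        new Dependency("email-service", Criticality.ASYNC,
            "persist to pending_emails and send later"));

    /** Groups dependency names by criticality so the map can be printed or reviewed. */
    static Map<Criticality, List<String>> byCriticality(List<Dependency> deps) {
        return deps.stream().collect(Collectors.groupingBy(
            Dependency::criticality,
            () -> new EnumMap<>(Criticality.class),
            Collectors.mapping(Dependency::name, Collectors.toList())));
    }
}
```

Keeping the agreed behavior in the same record as the criticality level means a code review of a new dependency also reviews its degradation decision.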

Fallback strategies by dependency type

Cached responses: for dependencies that provide data that changes slowly, serve the last known good response from a local cache when the dependency is unavailable. Product details, user preferences, feature flags, configuration data — all reasonable candidates for cache-based fallback.

The key parameter is acceptable staleness. If your product catalog cache is refreshed every 5 minutes, the data you serve is at most 5 minutes old while the catalog service is healthy. If the catalog service is down for 30 minutes, the fallback serves data that is 30 or more minutes old, which only works if the fallback cache keeps entries past their normal refresh interval. Whether that age is acceptable is a product decision, not a technical one.

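// Annotation style follows Resilience4j; productCache is assumed to be a
// Caffeine Cache<String, ProductDetails>.
@CircuitBreaker(name = "catalogService", fallbackMethod = "catalogFromCache")
public ProductDetails getProductDetails(String productId) {
    return catalogClient.getDetails(productId);
}

// Fallback: same parameters as the protected method, plus the triggering exception.
private ProductDetails catalogFromCache(String productId, Exception ex) {
    // Caffeine's getIfPresent returns null on a miss, not an Optional.
    ProductDetails cached = productCache.getIfPresent(productId);
    return cached != null
        ? cached
        : ProductDetails.minimal(productId); // last resort: minimal data
}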
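The acceptable-staleness limit can also be enforced in code rather than left implicit. The following is a minimal sketch using only the JDK, not the Caffeine setup above: `LastKnownGoodCache`, `getIfFreshEnough`, and the injected time source are hypothetical names chosen for illustration, and it assumes Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Last-known-good cache that refuses to serve entries older than an agreed limit. */
final class LastKnownGoodCache<K, V> {
    private record Entry<T>(T value, Instant storedAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration maxStaleness;
    private final Supplier<Instant> now; // injected so tests can control time

    LastKnownGoodCache(Duration maxStaleness, Supplier<Instant> now) {
        this.maxStaleness = maxStaleness;
        this.now = now;
    }

    /** Records a successful response together with the time it was stored. */
    void put(K key, V value) {
        entries.put(key, new Entry<>(value, now.get()));
    }

    /** Returns the cached value only if its age is within the staleness limit. */
    Optional<V> getIfFreshEnough(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        Duration age = Duration.between(entry.storedAt(), now.get());
        return age.compareTo(maxStaleness) <= 0
            ? Optional.of(entry.value())
            : Optional.empty();
    }
}
```

With a limit of 30 minutes, a fallback built on this cache serves a 10-minute-old entry but returns empty for a 35-minute-old one, at which point the caller moves on to a default response.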

Default responses: when no cached data exists and the dependency is unavailable, return a sensible default. Empty recommendations list rather than error. "Price unavailable" rather than page crash. "Shipping estimate unavailable" rather than form failure.
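A default response can be as small as a wrapper around the call. This is a sketch under stated assumptions: `Defaults.orDefault` is a hypothetical helper, not part of any library, and a real service would also log the failure and record a metric before returning the default.

```java
import java.util.function.Supplier;

final class Defaults {
    private Defaults() {}

    /**
     * Runs the call and returns its result, or the given default if the call throws.
     * Only unchecked exceptions are caught; the failure is swallowed in this sketch.
     */
    static <T> T orDefault(Supplier<T> call, T fallback) {
        try {
            return call.get();
        } catch (RuntimeException ex) {
            return fallback;
        }
    }
}
```

A call site might look like `Defaults.orDefault(() -> recommendationClient.forUser(userId), List.of())`, where `recommendationClient` is again a hypothetical name: the page receives an empty list instead of an exception.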

Feature disabling: for features entirely powered by an unavailable service, disable the feature cleanly. The recommendations section doesn't render. The personalized banner shows a generic one. The "customers also viewed" widget is hidden. This requires that features be designed with the "absent" state in mind — not as an afterthought.
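Designing for the absent state can be made visible in the types. In the sketch below, the hypothetical `ProductPage` view model carries the recommendations section as an `Optional`, so the rendering code has to handle the case where the feature is off. The markup is illustrative only and skips HTML escaping; Java 16 or later is assumed.

```java
import java.util.List;
import java.util.Optional;

/** Product page view model in which the recommendations section may be absent. */
record ProductPage(String title, Optional<List<String>> recommendations) {

    /** Renders the page; the recommendations section is emitted only when present. */
    String render() {
        StringBuilder html = new StringBuilder("<h1>" + title + "</h1>");
        recommendations.ifPresent(items -> html
            .append("<section id=\"recs\">")
            .append(String.join(", ", items))
            .append("</section>"));
        return html.toString();
    }
}
```

When the recommendation service is unavailable, the caller builds the page with `Optional.empty()` and the section simply does not appear.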

Async queue with response: for operations that don't need to be processed synchronously, accept the request, queue it locally, and return a success response. If Email Service is down, persist the email to a local pending_emails table and process it when the service recovers. The user is told their action was received. The email sends later.
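The accept-then-deliver flow can be sketched as follows. This is an in-memory stand-in for the pending_emails table, written with the JDK only: `EmailOutbox`, `submit`, and `retryPending` are hypothetical names, and a production version would persist entries in the database so they survive a restart.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Outbox sketch: requests are accepted immediately and delivered by a later retry pass. */
final class EmailOutbox {
    record Email(String to, String subject) {}

    private final Deque<Email> pending = new ArrayDeque<>();
    private final Predicate<Email> sender; // returns true when the email service accepted the message

    EmailOutbox(Predicate<Email> sender) {
        this.sender = sender;
    }

    /** Stores the email and reports it as accepted, regardless of email service health. */
    synchronized boolean submit(Email email) {
        pending.addLast(email);
        return true;
    }

    /** Delivers queued emails in order and stops at the first failure; returns how many were sent. */
    synchronized int retryPending() {
        int delivered = 0;
        while (!pending.isEmpty() && trySend(pending.peekFirst())) {
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    private boolean trySend(Email email) {
        try {
            return sender.test(email);
        } catch (RuntimeException ex) {
            return false; // treat an exception as "service still down"
        }
    }
}
```

A scheduled job would call `retryPending()` periodically. Entries stay at the head of the queue until the email service accepts them, so delivery order is preserved across an outage.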

Communicating degradation to users

The worst user experience is silent degradation — the page loads but shows wrong data, or a feature appears to work but silently fails. This is worse than an honest error because users don't know to be skeptical of what they're seeing.

Design your degradation UX explicitly:

  • If recommendations are unavailable, show "Recommendations unavailable right now" or nothing — not stale data from 3 days ago presented as current
  • If pricing is from cache, show it with a "Prices may not reflect the latest updates" notice if staleness is a material concern
  • If an action was queued rather than immediately processed, tell the user: "Your request was received and will be processed shortly"

Honesty in degraded states builds more trust than pretending everything is fine.
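For the UI to be honest, the backend has to tell it where the data came from. A minimal sketch, assuming a hypothetical `PriceView` response type and a `Freshness` enum that are not part of the code shown earlier (Java 16 or later):

```java
import java.util.Optional;

/** Where the value in a response came from. */
enum Freshness { LIVE, CACHED, DEFAULTED }

/** Price as sent to the front end, tagged with its origin. */
record PriceView(String amount, Freshness freshness) {

    /** User-facing notice for degraded states; empty when the price is live. */
    Optional<String> notice() {
        return switch (freshness) {
            case LIVE -> Optional.empty();
            case CACHED -> Optional.of("Prices may not reflect the latest updates");
            case DEFAULTED -> Optional.of("Price unavailable");
        };
    }
}
```

Carrying the origin in the response keeps the decision about what to display in one place, instead of having each page guess whether its data is stale.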

Testing degradation paths

Graceful degradation paths that aren't tested will fail in unexpected ways when they're actually needed. The fallback code is often the least-exercised code in your service.

Inject dependency failures in your integration test suite:

@Test
void productPageDegrades_whenCatalogServiceUnavailable() {
    // Stub the catalog service to return 503 for every product lookup
    wireMockServer.stubFor(get(urlPathMatching("/products/.*"))
        .willReturn(serviceUnavailable()));

    // The local cache is assumed to be empty here, so the fallback
    // falls through to the minimal default rather than cached data
    ProductDetails result = productService.getProductDetails("sku-123");

    assertThat(result).isNotNull();
    assertThat(result.getTitle()).isEqualTo("Product sku-123"); // minimal fallback
    assertThat(result.isFromCache()).isFalse();
    assertThat(result.isDegraded()).isTrue();
}

Run a quarterly degradation drill in staging: take down each non-critical dependency one at a time and verify the system behaves as designed. Your runbooks should describe what degraded state looks like for each dependency so on-call engineers can recognize expected degradation versus unexpected failure.

The work is not in the circuit breaker configuration. It's in deciding, for every dependency, what your system should do when it's gone — and then building, testing, and documenting that behavior before you need it.
