Rollback Is Not Failure. Not Having One Is.

by Arif Ikhsanudin, Backend Developer

The Shame Around Rollback

In many engineering organizations, triggering a rollback is culturally loaded. It means "something went wrong." The implicit expectation is that good deployments don't need rollbacks — the code was tested, the pipeline was green, the engineer should have been more careful. Rolling back feels like public failure.

This framing is actively dangerous. It makes engineers hesitant to roll back when they should, which extends incident duration. It creates pressure to "fix forward" on broken deployments when rollback would be faster and safer. And it incentivizes hiding rollbacks from postmortem documentation, which means the team can't learn from them.

Rollback is not failure. Rollback is a deployment control mechanism — as deliberate and engineered as the forward deployment. A team that rolls back quickly has a shorter mean time to recovery. A team that avoids rollback out of embarrassment has a longer one. Which team would you rather be on?

What a Real Rollback Plan Looks Like

"We can roll back" is not a rollback plan. A rollback plan answers five specific questions:

Who can trigger it? Any on-call engineer, not just the person who deployed. Rollback during an incident should never be blocked by key-person dependency.

What exact command or pipeline step triggers it? Documented, not reconstructed under pressure. Ideally a single command or a button in your deployment UI.

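How long does it take? Measured from previous rollbacks, not estimated. A rolling-deployment rollback in Kubernetes via kubectl rollout undo is itself a rolling update, so it typically takes about as long as a forward rollout: often 5–15 minutes for a typical service, depending on replica count and readiness probes. A blue-green rollback only switches traffic back to the previous environment, which is still running, so it can complete in under 60 seconds. Know your number.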

What are the database implications? This is the hardest question. If the deployment ran a non-reversible migration, rolling back the application code doesn't restore the previous state. The rollback plan must account for this.

How do you verify the rollback succeeded? Specific health checks, specific error rate thresholds, specific user-facing behaviors to validate. Not "it looks better."
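One way to keep those five answers from drifting is to hold them in a small, versioned structure next to the deployment config. The sketch below is an illustration in Python, not a prescribed tool. The deployment name payments-api, the threshold values, and the names RollbackPlan, trigger_rollback, and verify_rollback are all hypothetical. The two kubectl subcommands (rollout undo and rollout status) are real, but the command runner is injected so the logic can be exercised without a cluster.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def shell_runner(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


@dataclass
class RollbackPlan:
    # Hypothetical values; replace with your own service and measurements.
    deployment: str = "payments-api"
    allowed_roles: List[str] = field(default_factory=lambda: ["on-call"])
    expected_seconds: int = 600        # measured from past rollbacks, not guessed
    reversible_migrations_only: bool = True
    max_error_rate: float = 0.01       # fraction of failed requests
    max_p95_latency_ms: float = 500.0

    def commands(self) -> List[List[str]]:
        """The exact commands that trigger the rollback and wait for it."""
        target = f"deployment/{self.deployment}"
        return [
            ["kubectl", "rollout", "undo", target],
            ["kubectl", "rollout", "status", target,
             f"--timeout={self.expected_seconds}s"],
        ]


def trigger_rollback(plan: RollbackPlan, run: Runner = shell_runner) -> bool:
    """Run each command in order; stop at the first failure."""
    for argv in plan.commands():
        if run(argv) != 0:
            return False
    return True


def verify_rollback(plan: RollbackPlan, metrics: Dict[str, float]) -> List[str]:
    """Return the names of failed checks; an empty list means verified."""
    failures = []
    if metrics["error_rate"] > plan.max_error_rate:
        failures.append("error_rate")
    if metrics["p95_latency_ms"] > plan.max_p95_latency_ms:
        failures.append("p95_latency_ms")
    return failures
```

Returning the list of failed checks, rather than a single boolean, gives the on-call engineer something more specific than "it looks better" to act on.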

The Database Migration Problem in Rollbacks

One of the most common reasons rollbacks fail or are avoided is that the deployment ran a database migration the previous version can't handle.

-- This migration makes the previous version incompatible:
ALTER TABLE payments ALTER COLUMN amount TYPE DECIMAL(19,4);
-- If v1.2 expects INTEGER, it will fail with a type error against this schema

The solution is writing migrations to be backward-compatible for at least one release cycle. The expand-contract pattern applies here:

-- Release 1: Add new column alongside old one (v1.1 writes to old, v1.2 writes to both)
ALTER TABLE payments ADD COLUMN amount_decimal DECIMAL(19,4) NULL;

-- Release 2: v1.3 reads from new column, still writes to both; rollback to v1.2 still works
-- Release 3: v1.4 reads and writes only the new column; once it is stable, drop the old column
-- (after the drop, rollback to v1.3 or earlier is no longer supported)
ALTER TABLE payments DROP COLUMN amount;

This means every backward-incompatible schema change takes three releases instead of one. In exchange, releases 1 and 2 remain safe to roll back. At a release cadence of once per week, the migration takes two weeks longer to complete. That's a reasonable cost for reliable rollback capability.
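The expand phase can be tried end to end with an in-memory database. The sketch below uses Python's built-in sqlite3 module purely for illustration. SQLite's type handling differs from PostgreSQL's, the sample rows are made up, and the write_v1_1 and write_v1_2 helpers and the backfill statement are assumptions about how the releases might behave, not something the migration above prescribes. It shows that after the new column is added, code written for the old schema still reads and writes successfully, which is what keeps rollback possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
conn.executemany("INSERT INTO payments (id, amount) VALUES (?, ?)", [(1, 1000), (2, 2550)])

# Release 1 (expand): add the new column next to the old one.
# It must be nullable so that old code, which never sets it, can still insert.
conn.execute("ALTER TABLE payments ADD COLUMN amount_decimal DECIMAL(19,4)")

# Backfill rows written before the new column existed.
conn.execute("UPDATE payments SET amount_decimal = amount WHERE amount_decimal IS NULL")


def write_v1_1(conn, payment_id, amount):
    """Hypothetical old write path: only knows the old column."""
    conn.execute("INSERT INTO payments (id, amount) VALUES (?, ?)", (payment_id, amount))


def write_v1_2(conn, payment_id, amount):
    """Hypothetical new write path: dual-writes to old and new columns."""
    conn.execute(
        "INSERT INTO payments (id, amount, amount_decimal) VALUES (?, ?, ?)",
        (payment_id, amount, amount),
    )


def read_old_schema(conn):
    """What a rolled-back version would run: it only reads the old column."""
    return conn.execute("SELECT id, amount FROM payments ORDER BY id").fetchall()


write_v1_2(conn, 3, 4200)   # new version is live
write_v1_1(conn, 4, 999)    # simulated write after rolling back to the old version

# Rows written by the old version need another backfill before the contract phase.
needs_backfill = conn.execute(
    "SELECT COUNT(*) FROM payments WHERE amount_decimal IS NULL"
).fetchone()[0]
```

The row written after the simulated rollback has no value in the new column. That is a reminder to rerun the backfill before moving on to the release that reads from it.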

Testing Rollback Before You Need It

A rollback plan that's never been tested is a rollback theory. The procedure that sounds straightforward in a calm planning meeting will reveal hidden dependencies, missing permissions, and undocumented state when executed at 2am during an incident.

Schedule rollback drills:

  1. Deploy a non-breaking change to staging
  2. Verify it's working
  3. Execute the rollback procedure
  4. Verify the previous version is restored and healthy
  5. Measure the time from rollback trigger to healthy state

Do this monthly. Rotate who executes it. The goal is that rollback becomes boring — a routine procedure that any on-call engineer can complete in the expected time without consulting documentation.
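Step 5 of the drill is easier to do consistently if the timing is scripted. The following is a minimal sketch, assuming you supply two callables of your own: trigger, which starts the rollback, and is_healthy, which returns True once your health checks pass. The function name time_rollback and the default timeout and polling interval are hypothetical choices, not recommendations.

```python
import time
from typing import Callable, Optional


def time_rollback(
    trigger: Callable[[], None],
    is_healthy: Callable[[], bool],
    timeout_s: float = 900.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Trigger a rollback and return seconds until healthy, or None on timeout."""
    start = clock()
    trigger()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

Recording the returned number after each drill gives you the measured figure the rollback plan asks for. The clock and sleep parameters exist only so the function can be tested without waiting in real time.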

The Deployment Confidence Loop

Counterintuitively, investing in rollback capability makes teams more willing to deploy, not less. When you know that a bad deployment can be reversed in under 5 minutes by any on-call engineer without database complications, the cost of a bad deployment is bounded. Bounded risk enables more aggressive deployment frequency.

Teams without good rollback tend to be conservative about what they deploy and when — deploying large batches infrequently because each deployment is high-stakes. Teams with good rollback deploy small batches frequently, because each deployment is reversible.

Without rollback capability:
  Deploy risk: HIGH → Deploy frequency: LOW → Batch size: LARGE → Deploy risk: HIGHER

With rollback capability:
  Deploy risk: LOW → Deploy frequency: HIGH → Batch size: SMALL → Deploy risk: LOWER

Build the rollback. Deploy more often. The two are not in tension — they're the same investment.
