Testing Is Not About Proving Your Code Works. It Is About Knowing When It Breaks.
by Arif Ikhsanudin, Backend Developer
The Illusion of the Green Build
You push a change. CI goes green. You ship. Three hours later, a customer reports that their checkout flow is silently dropping orders. You look at the test suite — all 847 tests still pass. Nothing caught it.
This happens because the tests were written to prove the happy path works, not to detect the specific failure mode that just hit production. The tests were authored by someone who knew how the code was supposed to behave, and naturally, they tested exactly that behavior. The bug lived in a code path nobody thought to model as a failure scenario.
This is the fundamental confusion about what tests are for. Tests are not a proof of correctness. They are a detection system. And a detection system designed only to confirm known-good behavior is going to miss the failures that actually matter.
Confidence vs. Coverage
A test suite gives you confidence when it can catch regressions you did not predict. That is different from catching bugs you already know about.
Consider two codebases. The first has 90% line coverage, all tests written immediately after implementation, each one asserting that a function returns the value the developer just wrote it to return. The second has 40% coverage, but every test case was derived from an actual incident, a production edge case, or a failure mode discovered during code review.
The second codebase will have fewer surprises in production. The reason is not its coverage percentage; it is that its tests encode institutional memory about how the system breaks, not just how it works.
import pytest

# calculate_discount(price, rate) is the function under test, imported from your own module.

# This test proves the code works as written.
def test_calculate_discount():
    assert calculate_discount(100, 0.1) == 90.0

# This test encodes a failure mode.
def test_calculate_discount_does_not_apply_to_negative_price():
    with pytest.raises(ValueError):
        calculate_discount(-50, 0.1)

# This test catches a real incident.
def test_calculate_discount_with_100_percent_does_not_produce_negative_total():
    result = calculate_discount(100, 1.0)
    assert result == 0.0, "100% discount should zero out, not produce negative value"
The first test is not useless — it establishes a baseline. But on its own, it tells you almost nothing about whether the function is safe to ship.
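The tests above call calculate_discount without showing it. For reference, here is a minimal sketch of an implementation that would satisfy all three. The signature, the meaning of rate (a fraction between 0 and 1), and the validation rules are assumptions made for illustration, not the actual function behind the incident.

```python
def calculate_discount(price, rate):
    """Return the price after applying a fractional discount.

    Assumed contract (hypothetical): rate is a fraction, so 0.1 means 10% off.
    """
    if price < 0:
        # Failure mode encoded by the second test: negative prices are rejected.
        raise ValueError(f"price must be non-negative, got {price}")
    if not 0.0 <= rate <= 1.0:
        # Rejecting rates above 1.0 is what keeps the total from going negative.
        raise ValueError(f"rate must be between 0 and 1, got {rate}")
    return price * (1.0 - rate)
```

Notice that only the first branch-free line is exercised by the baseline test. The two guards, which are the parts most likely to be removed or loosened in a later refactor, are protected only by the failure-mode tests.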
Writing Tests That Catch What You Did Not Anticipate
The shift in mindset is this: when you write a test, ask yourself what scenario you are afraid of, not what behavior you want to confirm.
Some concrete practices that follow from this:
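Test the boundaries, not the middle. If a function accepts an integer, test 0, -1, and the largest value the type allows (MAX_INT, or the domain's own upper limit in languages such as Python where integers are unbounded). If it accepts a list, test an empty list. Middle-of-the-range values are where code almost always works. Edges are where it breaks.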
Write tests from incidents. Every production bug should produce at least one new test before the fix is merged. Not a test for the fix — a test that would have caught the bug. This practice alone compounds into one of the best regression suites you can build.
Treat expected errors as first-class behavior. A function that can fail under valid input conditions needs tests for those failures just as much as it needs tests for success. If your payment processor returns a timeout, that is not an edge case — it is a normal operating condition that your code must handle correctly.
Separate the "does it work" test from the "does it survive misuse" test. Both matter. But they are asking different questions and should be treated as distinct concerns.
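Here is one way the practices above might look in code. Everything in this sketch is hypothetical: charge_order, GatewayTimeout, the two gateway test doubles, and the "paid" / "pending_retry" statuses are invented for illustration. The tests are written as plain functions with bare asserts so they run without any dependency; pytest will still collect them.

```python
class GatewayTimeout(Exception):
    """Hypothetical error raised when the payment processor does not answer in time."""


def charge_order(gateway, order_id, amount_cents):
    """Charge an order and return "paid" or "pending_retry".

    A timeout is treated as a normal outcome: the order is kept for retry
    instead of being dropped.
    """
    if amount_cents <= 0:
        raise ValueError(f"amount_cents must be positive, got {amount_cents}")
    try:
        gateway.charge(order_id, amount_cents)
    except GatewayTimeout:
        return "pending_retry"
    return "paid"


class RecordingGateway:
    """Test double that accepts every charge and remembers it."""

    def __init__(self):
        self.charges = []

    def charge(self, order_id, amount_cents):
        self.charges.append((order_id, amount_cents))


class TimingOutGateway:
    """Test double that always times out."""

    def charge(self, order_id, amount_cents):
        raise GatewayTimeout(order_id)


# "Does it work": the smallest valid amount, right at the boundary.
def test_smallest_valid_amount_is_charged():
    gateway = RecordingGateway()
    assert charge_order(gateway, "order-1", 1) == "paid"
    assert gateway.charges == [("order-1", 1)]


# "Does it survive misuse": values just outside the boundary.
def test_zero_and_negative_amounts_are_rejected():
    for amount in (0, -1):
        gateway = RecordingGateway()
        try:
            charge_order(gateway, "order-1", amount)
        except ValueError:
            assert gateway.charges == []  # nothing was charged before the rejection
            continue
        raise AssertionError(f"amount {amount} should have been rejected")


# Expected error as first-class behavior: a timeout must not lose the order.
def test_timeout_keeps_order_for_retry():
    assert charge_order(TimingOutGateway(), "order-1", 500) == "pending_retry"
```

In a pytest suite you would likely express the rejection test with pytest.raises and pytest.mark.parametrize rather than a loop. The structure is the point here: one test for the working case at the edge, one for misuse, and one for the failure you fully expect to see in normal operation.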
The Detector Framing in Practice
Reframe every test you write as: "what condition am I trying to detect?" If you cannot answer that question specifically — if the answer is just "I'm checking that the function runs" — the test is not doing useful detection work.
This framing also clarifies when a test is worth writing at all. Not every line needs a dedicated test. But every failure mode that would be invisible without a test probably needs one.
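To make the detector question concrete, compare two tests for the same function. persist_orders and the order dictionaries are hypothetical, loosely modeled on the dropped-orders scenario from the opening; treat this as a sketch of the framing, not a real checkout implementation.

```python
def persist_orders(orders, store):
    """Hypothetical checkout step: save valid orders, return ids of rejected ones."""
    rejected = []
    for order in orders:
        if order["total_cents"] > 0:
            store.append(order)
        else:
            rejected.append(order["id"])
    return rejected


def test_persist_orders_runs():
    # Weak detector: passes as long as nothing raises,
    # even if every order were silently dropped.
    persist_orders([{"id": "a", "total_cents": 500}], [])


def test_no_order_is_silently_dropped():
    # Detects: an order that is neither stored nor reported as rejected.
    orders = [
        {"id": "a", "total_cents": 500},
        {"id": "b", "total_cents": 0},
    ]
    store = []
    rejected = persist_orders(orders, store)

    accounted_for = {order["id"] for order in store} | set(rejected)
    missing = {order["id"] for order in orders} - accounted_for
    # The message names the lost orders, so a red build points at the cause.
    assert not missing, f"orders neither stored nor rejected: {sorted(missing)}"
```

The second test has a specific answer to "what condition am I trying to detect?", and its name and failure message both state it. The first has no answer beyond "the function runs".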
A test suite that gives you real confidence is not one that has the highest pass rate when everything is going well. It is one that goes red the fastest when something starts to go wrong — and points you at the right place.
That is the goal. Not green builds on code you just wrote. Red builds on code that is about to cause an incident.