Your Tests Are Coupled to Your Implementation and That Is Why They Keep Breaking

by Arif Ikhsanudin, Backend Developer

The Test That Breaks on Rename

You rename a private method. Fifteen tests fail. You rename it back, update the tests, rename it again. Now it works. You have spent forty minutes on a rename.

This is implementation coupling at its most obvious: the tests know the name of an internal method they have no business knowing about. But coupling to implementation takes subtler forms too, and the cost compounds over time as the number of coupled tests grows and refactoring becomes progressively more expensive.

The Forms of Implementation Coupling

Testing private methods directly. Private methods are implementation details. If a private method seems important enough to test on its own, either it is doing too much and should be extracted into its own class with a public interface, or its behavior can already be verified through the public method that calls it.

Accessing private methods through reflection, by making them package-private "for testing," or by restructuring visibility to accommodate tests is a sign the test is reaching past the interface. The interface is the contract; internals are free to change.
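As a minimal sketch of the difference, assume a hypothetical PriceCalculator with a private helper named _apply_discount (both names are invented for illustration). The first test reaches for the helper by name; the second exercises the same logic through the public total method.

```python
class PriceCalculator:
    """Hypothetical class used only for illustration."""

    def __init__(self, discount_rate):
        self._discount_rate = discount_rate

    def total(self, prices):
        # Public interface: the contract callers depend on
        return self._apply_discount(sum(prices))

    def _apply_discount(self, amount):
        # Private helper: free to be renamed, inlined, or split
        return round(amount * (1 - self._discount_rate), 2)


def test_private_helper_directly():
    # Coupled: breaks if _apply_discount is renamed or inlined
    calc = PriceCalculator(discount_rate=0.1)
    assert calc._apply_discount(100) == 90.0


def test_total_applies_discount():
    # Behavioral: references only the public method and its result
    calc = PriceCalculator(discount_rate=0.1)
    assert calc.total([60, 40]) == 90.0
```

Both tests pass today. Only the second keeps passing after the helper is renamed or folded into total.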

Asserting on method call order. Tests that use Mockito's InOrder or similar to assert that method A was called before method B are testing execution sequence — an internal detail. If the observable behavior (the output, the side effect) is correct, the order of internal calls should not matter. Reordering those calls in a valid refactor should not break a test.
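The same pattern can be sketched in Python, where the rough counterpart of Mockito's InOrder is an assertion on a parent mock's mock_calls list. ProfileService, FakeRepo, and FakeCache below are hypothetical names chosen for illustration, and the example assumes the order of the two internal calls has no observable effect.

```python
from unittest.mock import Mock, call


class ProfileService:
    """Hypothetical service used only for illustration."""

    def __init__(self, repo, cache):
        self._repo = repo
        self._cache = cache

    def update_email(self, user_id, email):
        self._cache.invalidate(user_id)
        self._repo.save(user_id, email)


class FakeRepo:
    """In-memory stand-in for a repository."""

    def __init__(self):
        self.emails = {}

    def save(self, user_id, email):
        self.emails[user_id] = email


class FakeCache:
    """In-memory stand-in for a cache of user ids."""

    def __init__(self, cached):
        self.cached = set(cached)

    def invalidate(self, user_id):
        self.cached.discard(user_id)


def test_invalidates_cache_before_saving():
    # Coupled: pins the current sequence of internal calls
    parent = Mock()
    service = ProfileService(repo=parent.repo, cache=parent.cache)

    service.update_email(7, "new@example.com")

    assert parent.mock_calls == [
        call.cache.invalidate(7),
        call.repo.save(7, "new@example.com"),
    ]


def test_update_email_is_visible_afterwards():
    # Behavioral: checks the resulting state, whatever order produced it
    repo, cache = FakeRepo(), FakeCache(cached=[7])
    service = ProfileService(repo=repo, cache=cache)

    service.update_email(7, "new@example.com")

    assert repo.emails[7] == "new@example.com"
    assert 7 not in cache.cached
```

Swapping the two lines inside update_email breaks the first test and leaves the second untouched.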

Matching exact call counts for internal operations. Asserting that a repository method was called exactly three times inside a service ties the test to the current implementation's strategy. A refactor that batches those three calls into one will break the test, even if the final result is identical.

from unittest.mock import Mock, call

# CampaignService is the service under test; its import is omitted here.

# Coupled to implementation: will break on any internal refactor
def test_sends_notification_for_each_recipient():
    mock_notifier = Mock()
    service = CampaignService(notifier=mock_notifier)

    service.send_campaign(campaign_id=1)

    # Asserts on call count: couples the test to the current loop implementation
    assert mock_notifier.send.call_count == 3
    # Asserts on call order: couples the test to the current loop sequence
    calls = mock_notifier.send.call_args_list
    assert calls[0] == call("user1@example.com", "Hello!")
    assert calls[1] == call("user2@example.com", "Hello!")

# Coupled to behavior: survives refactoring
def test_campaign_reaches_all_recipients():
    sent_to = []
    mock_notifier = Mock()
    # Attach the recorder to send(), the method the service actually calls;
    # a side_effect on the Mock itself would never fire for notifier.send(...)
    mock_notifier.send.side_effect = lambda email, _: sent_to.append(email)
    service = CampaignService(notifier=mock_notifier)

    service.send_campaign(campaign_id=1)

    # Asserts on outcome: what was received, not how it was sent
    assert set(sent_to) == {"user1@example.com", "user2@example.com", "user3@example.com"}

The second test will pass whether the service sends notifications sequentially, in parallel, or in batches, provided each delivery still goes through the notifier's send method. It verifies the behavior (all recipients were reached), not how the service achieves it.
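To make that concrete, here is a self-contained sketch. The article does not show CampaignService itself, so the two service classes, the RECIPIENTS list, and the RecordingNotifier fake are all hypothetical stand-ins. The same behavioral check runs against a sequential implementation and a thread-pool implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical recipient list; a real service would load it by campaign_id
RECIPIENTS = ["user1@example.com", "user2@example.com", "user3@example.com"]


class RecordingNotifier:
    """Hand-written fake: records who was reached and nothing else."""

    def __init__(self):
        self._lock = threading.Lock()
        self.sent_to = []

    def send(self, email, message):
        with self._lock:
            self.sent_to.append(email)


class SequentialCampaignService:
    def __init__(self, notifier):
        self._notifier = notifier

    def send_campaign(self, campaign_id):
        for email in RECIPIENTS:
            self._notifier.send(email, "Hello!")


class ParallelCampaignService:
    def __init__(self, notifier):
        self._notifier = notifier

    def send_campaign(self, campaign_id):
        # Same behavior, different strategy: completion order is not guaranteed
        with ThreadPoolExecutor(max_workers=3) as pool:
            list(pool.map(lambda email: self._notifier.send(email, "Hello!"), RECIPIENTS))


def check_campaign_reaches_all_recipients(service_cls):
    notifier = RecordingNotifier()
    service_cls(notifier=notifier).send_campaign(campaign_id=1)
    # Set comparison ignores order, so both strategies satisfy it
    assert set(notifier.sent_to) == set(RECIPIENTS)


check_campaign_reaches_all_recipients(SequentialCampaignService)
check_campaign_reaches_all_recipients(ParallelCampaignService)
```

A refactor that changes the notifier's interface, for example by introducing a bulk method, would require the fake to implement that method too, but the assertion on who was reached would stay the same.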

The Test That Survives Refactoring

A test is correctly coupled to behavior when it:

  • Calls only public methods
  • Asserts on public outputs and externally observable side effects
  • Does not assert on the number or order of internal method calls
  • Would only break if the behavior the user or calling code depends on actually changed

This is also the test that tells you something meaningful when it fails. If a test breaks because a private method was renamed, the failure is noise — it tells you something changed internally but says nothing about whether the system is correct. If a test breaks because send_campaign no longer reaches all recipients, the failure is signal — the behavior users depend on has changed.

Practical Identification

Audit your test suite for these patterns. Search for:

  • Tests that use reflection to access private fields or methods
  • Production code annotated with @VisibleForTesting, and the tests that rely on the widened visibility
  • Tests with InOrder or inOrder.verify that are checking sequence rather than outcome
  • Tests that assert callCount == N (call_count in Python's unittest.mock, times(N) in Mockito) for internal operations

Each of these is a test that will resist future refactoring without adding proportional detection value. Rewriting them to assert on behavior — even if the rewrite involves fewer assertions — produces a suite that gets out of the way when you are improving the code and stays in the way when you are breaking it.
