Documentation Is Not a Chore. It Is Part of the Work.
by Arif Ikhsanudin, Backend Developer
The Undocumented Service
You're six months into a new role. There's a service called payment-reconciliation-worker that runs nightly, writes to three tables, and calls two external APIs. Nobody on the current team built it; the original author left a year ago. There's no README. The code works — you know this because the business would notice immediately if it stopped — but nobody can confidently say what it does, what it depends on, or what the failure modes are.
This situation is not unusual. It's the default outcome when documentation is treated as optional or as something done after the "real work" is finished.
The cost of undocumented systems accumulates invisibly: time spent reverse-engineering code instead of building, anxiety about touching systems nobody understands, and decisions based on guesswork about how something works. These costs are real and ongoing, and they were avoidable.
What Documentation Is Actually For
Documentation serves different audiences across different time horizons:
Future you (next week): Why did I make this decision? What was I trying to avoid? A comment or a commit message that captures the "why" takes thirty seconds to write and can save an hour of confusion.
New teammates (next quarter): What does this service do, how does it fit into the larger system, and how do I run it locally? A service README that covers these three questions in a few paragraphs is worth days of onboarding time for every new engineer.
Production responders (2am, six months from now): What is this alert telling me, and what do I do about it? A runbook that describes the alert, the likely cause, and the remediation steps — even a rough one — is the difference between a thirty-minute incident and a three-hour one.
Future architects (next year): Why was this designed this way? What alternatives were considered and why were they rejected? An Architecture Decision Record (ADR) captures the context that is otherwise lost entirely when the original decision-makers leave.
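To make the "future you" case concrete, here is a minimal Python sketch of comments that record the "why" next to the code they explain. The scenario is invented for illustration: the constant names, the values, and the rate-limiting upstream API are all assumptions, not details of any real service.

```python
# Hypothetical retry settings for a call to an external API.

# Why 3: (assumed scenario) the upstream API rate-limits bursts, and a
# fourth attempt inside the same minute would be rejected anyway.
MAX_ATTEMPTS = 3

# Why 0.5s: (assumed scenario) short enough that the nightly job still
# finishes on time, long enough for a brief upstream blip to clear.
BASE_DELAY_SECONDS = 0.5


def backoff_delays(max_attempts=MAX_ATTEMPTS, base=BASE_DELAY_SECONDS):
    """Return the wait in seconds before each retry.

    There is no wait before the first attempt, so the list has
    max_attempts - 1 entries.
    """
    # Why exponential instead of fixed: with a fixed delay, every worker
    # that failed together would retry together and repeat the burst.
    return [base * (2 ** i) for i in range(max_attempts - 1)]


print(backoff_delays())  # [0.5, 1.0]
```

None of the three comments repeats what the code already says. Each one records a constraint or a rejected alternative, which is the part a reader cannot recover from the code alone.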
The Types of Documentation That Matter
Not all documentation has equal ROI. In rough order of impact:
Code comments that explain non-obvious decisions: Already mentioned above, but worth restating. A comment next to a magic number, a surprising algorithm choice, or a workaround for a known upstream behavior is permanently valuable.
Service READMEs: Three sections are sufficient: what this service does (one paragraph), how to run it locally (commands, dependencies), and where to find monitoring, the deployment pipeline, and the on-call runbook.
Runbooks: For every production alert, a corresponding runbook document. What does this alert mean? What are the most likely causes? What are the investigation steps? What is the remediation? Runbooks don't need to be perfect — a rough runbook is dramatically better than no runbook.
Architecture Decision Records: A short document per significant decision: context, options considered, decision made, consequences. ADRs live in the repository alongside the code. When someone asks "why is this a queue instead of a synchronous call?" three years from now, the answer is findable.
The lightweight ADR format described by Michael Nygard (title, status, context, decision, consequences), usually written in Markdown, is widely used, and a record typically takes about fifteen minutes to write.
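An ADR skeleton is simple enough to generate with a few lines of code, which removes the blank-page problem. The Python sketch below renders the Nygard-style sections as Markdown. The function names, the zero-padded file naming, and the example decision are assumptions made for illustration; they are not part of any standard tool.

```python
import re


def adr_filename(number, title):
    """Build a file name such as 0007-some-title.md (naming is an assumption)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{number:04d}-{slug}.md"


def render_adr(number, title, context, decision, consequences, status="Proposed"):
    """Render one ADR as Markdown text with Nygard-style sections."""
    sections = [
        ("Status", status),
        ("Context", context),  # options considered can be described here
        ("Decision", decision),
        ("Consequences", consequences),
    ]
    lines = [f"# {number}. {title}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body.strip(), ""]
    return "\n".join(lines)


# Hypothetical example decision, invented for this sketch.
print(adr_filename(7, "Use a queue, not a synchronous call"))
print(render_adr(
    7,
    "Use a queue, not a synchronous call",
    context="The downstream API is slow and occasionally unavailable.",
    decision="Publish reconciliation requests to a queue.",
    consequences="Results arrive asynchronously; callers must poll or subscribe.",
))
```

Writing the rendered text to a file in the repository, for example under a `docs/adr/` directory (again an assumed location), keeps the record next to the code it explains.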
What Documentation Is Not
Documentation is not comprehensive. Nobody reads a 200-page technical manual. The goal is capturing enough context that someone unfamiliar with a system can understand it at the level they need, not exhaustively describing every implementation detail.
Documentation is not a substitute for readable code. If your code is so complex that it requires extensive documentation to understand, the code is the problem. Documentation describes the "why" and the system context; the code should describe the "what" and "how" clearly on its own.
Documentation is not done once. A README that hasn't been touched in two years for a service that has changed significantly in that time is worse than no README — it's actively misleading. The discipline is updating documentation as part of the change, not after it.
Making It Happen on a Team
The most effective team practice: include documentation in the definition of done for any significant change. "This PR is not complete until the README/runbook/ADR is updated or created." Not as a bureaucratic gate, but as a genuine part of what "done" means.
If a PR changes how a service is deployed, the runbook should reflect it. If a PR adds a new alert, a runbook entry should accompany it. If a PR makes a significant architectural decision, an ADR should capture it.
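Part of this rule can be checked mechanically. The Python sketch below takes the list of files changed in a PR and reports documentation that appears to be missing. The directory layout (`deploy/`, `alerts/`, `docs/runbooks/`) and the rule table are hypothetical; a real repository would need its own patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (changed-code pattern, required-docs pattern, message).
# The paths are assumptions about repository layout, not a convention.
RULES = [
    ("deploy/*", "docs/runbooks/*", "deployment change without a runbook update"),
    ("alerts/*", "docs/runbooks/*", "alert change without a runbook entry"),
]


def missing_docs(changed_files, rules=RULES):
    """Return a message for each rule whose code changed but whose docs did not."""
    problems = []
    for code_glob, docs_glob, message in rules:
        touched_code = any(fnmatchcase(f, code_glob) for f in changed_files)
        touched_docs = any(fnmatchcase(f, docs_glob) for f in changed_files)
        if touched_code and not touched_docs:
            problems.append(message)
    return problems


print(missing_docs(["deploy/worker.yaml", "src/worker.py"]))
# ['deployment change without a runbook update']
print(missing_docs(["deploy/worker.yaml", "docs/runbooks/worker.md"]))
# []
```

A check like this cannot tell whether a PR contains a significant architectural decision, so the ADR case still depends on reviewer judgment. Reporting the result as a warning rather than a hard failure keeps it a reminder instead of a bureaucratic gate.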
This costs time — maybe thirty minutes per significant PR. It saves multiples of that across every future interaction with the system.
The Practical Takeaway
Pick the most critical undocumented service your team operates. Block out two hours this week and write the minimal useful documentation for it: a README (what it does, how to run it, where monitoring lives) and one runbook for its most common alert. Treat it as engineering work, not overhead. That document will be read by your future self before your next on-call incident, and you will be glad it exists.