Architecture Documents for Real Codebases

April 22, 2026

A good contributor can read code. That is not the problem. The problem is that new contributors read code sequentially, while maintainers navigate by a mental map. Maintainers know that “pricing” really lives behind the quote builder, that “notifications” are split between delivery and preference storage, and that the scary-looking folder is actually generated glue nobody should touch.

ARCHITECTURE.md is the cheapest way to lend that map to someone else.

Diagram showing ARCHITECTURE.md mapping product questions to code destinations

What belongs in the file

The document should not explain every package. It should answer the questions that cost the most calendar time when they are missing:

  • what problem the system solves
  • which modules own the main concepts
  • where a contributor should start for common changes
  • which dependencies are intentional
  • which dependencies are forbidden
  • which details are generated, external, or intentionally boring
  • which cross-cutting systems affect everything

That last category matters more than people expect. Authentication, logging, tracing, permissions, retries, tenancy, localization, billing, migrations, and background jobs usually do not live in one folder. They are the weather of the codebase. If the weather is not described, every new contributor has to discover it by getting wet.

The trick is to keep the document stable. Mention names that can be found with symbol search. Avoid fragile line links. Avoid describing small helper functions. The goal is not perfect documentation coverage. The goal is a durable orientation layer.
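One way to keep names findable and links durable is a small CI check. This Python sketch assumes the document wraps identifiers in backticks and that a plain text search over source files is good enough. `stale_names`, `line_links`, and `check_repo` are hypothetical helpers, not part of any existing tool.

```python
import re
from pathlib import Path

# Assumption: identifiers in the doc are written in backticks, e.g. `PricingRule`.
IDENT = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)`")
# GitHub-style line anchors such as rules.py#L42 break as soon as the file changes.
LINE_LINK = re.compile(r"#L\d+")


def stale_names(doc_text: str, source_texts: list[str]) -> list[str]:
    """Identifiers mentioned in the doc that no source file contains."""
    corpus = "\n".join(source_texts)
    names = sorted(set(IDENT.findall(doc_text)))
    return [n for n in names if not re.search(rf"\b{re.escape(n)}\b", corpus)]


def line_links(doc_text: str) -> list[str]:
    """Fragile line-number anchors found in the doc."""
    return LINE_LINK.findall(doc_text)


def check_repo(root: Path, suffixes=(".py", ".go", ".ts", ".tsx")):
    """Hypothetical CI entry point: compare ARCHITECTURE.md with the source tree."""
    doc = (root / "ARCHITECTURE.md").read_text(encoding="utf-8")
    sources = [
        p.read_text(encoding="utf-8", errors="ignore")
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    ]
    return stale_names(doc, sources), line_links(doc)
```

A name that disappears from the code shows up as stale, which is usually exactly the moment the codemap needs a one-line update.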

Example: a SaaS checkout system

Imagine a B2B SaaS product with workspaces, subscriptions, invoices, and usage-based billing. A new engineer is asked to add annual discounts.

Without an architecture note, they might search for “price”, open a checkout page, patch a React component, discover the API recalculates totals, patch the API handler, discover invoices disagree, patch invoice generation, then get a review comment saying discounts are represented as pricing rules and must be audited.

The right ARCHITECTURE.md does not need to teach billing theory. It can say:

## Billing codemap
 
- `PricingRule` is the source of truth for discounts, trials, coupons, and enterprise overrides.
- Checkout displays prices but never calculates canonical totals.
- Invoice generation replays pricing rules against metered usage snapshots.
- Payment provider adapters only receive finalized invoices.
- Every pricing rule change emits an audit event.

That is enough to save hours. It tells the contributor where the change starts, what code is downstream, and which invariant will matter in review.
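To make the invariant concrete, here is a Python sketch of the annual-discount change done the way the codemap implies: the discount is added as a pricing rule, and both checkout and invoicing read totals from one place. Only `PricingRule` comes from the note above; `RuleBook`, `canonical_total`, `checkout_display`, and `invoice_total` are hypothetical names, and a percentage-off rule is an assumed shape.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class PricingRule:
    """Assumed shape: a percentage discount scoped to one billing interval."""
    rule_id: str
    interval: str          # "monthly" or "annual"
    percent_off: Decimal   # Decimal("15") means 15% off


@dataclass
class RuleBook:
    rules: list[PricingRule] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)

    def add_rule(self, rule: PricingRule, actor: str) -> None:
        # Invariant from the codemap: every pricing rule change emits an audit event.
        self.rules.append(rule)
        self.audit_log.append(
            {"event": "pricing_rule.added", "rule_id": rule.rule_id, "actor": actor}
        )

    def canonical_total(self, amount: Decimal, interval: str) -> Decimal:
        """The single place totals are computed; matching rules apply in order."""
        total = amount
        for rule in self.rules:
            if rule.interval == interval:
                total -= total * rule.percent_off / 100
        return total.quantize(Decimal("0.01"))


def checkout_display(book: RuleBook, list_price: Decimal, interval: str) -> str:
    # Checkout formats the canonical total; it never does its own discount math.
    return f"${book.canonical_total(list_price, interval)}"


def invoice_total(book: RuleBook, metered_units: int, unit_price: Decimal, interval: str) -> Decimal:
    # Invoice generation replays the same rules against a usage snapshot.
    return book.canonical_total(unit_price * metered_units, interval)
```

With this shape, the annual discount is one new `PricingRule` plus its audit event; the checkout page and the invoice path pick it up without being patched.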

Example: a frontend monorepo

Frontend architecture gets messy in a quieter way. The folder tree looks familiar, but the real design is hidden in conventions.

A large app might have route modules, shared UI primitives, generated API clients, feature collections, visual fixtures, and integration tests. When a developer adds a “resend invite” button, the hard question is not how to render a button. It is whether the action belongs in a route loader, a feature hook, a collection mutation, or a generated client wrapper.

A useful architecture note might say:

## Frontend layering
 
- Routes compose feature components and own navigation state.
- Feature modules own product workflows and mutations.
- UI primitives are reusable but do not import feature modules.
- Generated API clients are called through feature-level adapters.
- Visual fixtures live next to the workflow they stabilize.

This prevents two common failures: shared UI accidentally learning product rules, and route components becoming the place where every workflow goes to grow old.
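Layering rules like these are simple enough to express as data. This Python sketch assumes a hypothetical `app/routes`, `app/features`, `app/ui`, `app/generated` layout and takes import edges as (importer, imported) path pairs from whatever tool extracts them; `LAYERS`, `ALLOWED`, and `violations` are illustrative names.

```python
from typing import Optional

# Hypothetical layout: each layer and the path prefix it lives under.
LAYERS = {
    "routes": "app/routes/",
    "features": "app/features/",
    "ui": "app/ui/",
    "generated": "app/generated/",
}

# Which layers each layer may import, following the layering note:
# routes compose features and UI, features wrap generated clients,
# and UI primitives stay ignorant of product code.
ALLOWED = {
    "routes": {"routes", "features", "ui"},
    "features": {"features", "ui", "generated"},
    "ui": {"ui"},
    "generated": {"generated"},
}


def layer_of(path: str) -> Optional[str]:
    for name, prefix in LAYERS.items():
        if path.startswith(prefix):
            return name
    return None  # files outside the known layers are not checked


def violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return import edges that point in a direction the layering forbids."""
    bad = []
    for importer, imported in edges:
        src, dst = layer_of(importer), layer_of(imported)
        if src and dst and dst not in ALLOWED[src]:
            bad.append((importer, imported))
    return bad
```

Both failures from the paragraph above show up as edges here: a UI primitive importing a feature hook, and a route calling the generated client directly instead of going through a feature adapter.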

Example: an ingestion pipeline

Data systems often need the same treatment because the files are ordered by mechanics, not by meaning. You may see folders named consumer, normalizer, writer, jobs, schema, and deadletter, but the names do not tell you which part owns correctness.

For an ingestion pipeline that imports partner inventory feeds, I would document the flow like this:

## Inventory import flow
 
- `FetchJob` downloads raw partner payloads and stores immutable input blobs.
- `NormalizeBatch` converts partner fields into internal product candidates.
- `ValidationReport` records rejected rows and warnings before any write.
- `CatalogWriter` is the only module that mutates live catalog tables.
- Retry workers may repeat any stage, so handlers must be idempotent.

The invariants are the important part: raw input is immutable, live catalog writes happen in one place, and retries must be safe. You cannot reliably infer those invariants by browsing a directory.
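A minimal Python sketch of those three invariants, with in-memory dicts standing in for blob storage and catalog tables. `BlobStore`, `CatalogWriter.write_batch`, and the `(partner, sku)` key are assumptions for illustration; a real writer would record the applied batch and the row writes in the same database transaction.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlobStore:
    """Raw partner payloads, keyed by content hash and never overwritten."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(key, payload)  # a retried fetch stores nothing new
        return key


@dataclass
class CatalogWriter:
    """The only code allowed to mutate the live catalog (a dict here)."""
    catalog: dict[tuple[str, str], dict] = field(default_factory=dict)
    applied_batches: set[str] = field(default_factory=set)

    def write_batch(self, batch_id: str, rows: list[dict]) -> int:
        # A retry of an already-applied batch is a no-op.
        if batch_id in self.applied_batches:
            return 0
        for row in rows:
            # Upsert by natural key, so replaying a row overwrites rather than duplicates.
            self.catalog[(row["partner"], row["sku"])] = row
        self.applied_batches.add(batch_id)
        return len(rows)
```

Using the blob key as the batch id ties the stages together: a retried import of the same payload maps to the same batch and writes nothing.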

Layered architecture diagram showing boundaries and invariants

Boundaries are more valuable than tours

The most useful architecture notes often describe things that are absent:

  • domain code does not import React
  • API handlers do not open database transactions directly
  • background workers do not call payment providers without an idempotency key
  • generated clients are not edited by hand
  • reporting queries do not read hot transactional tables

These rules are architectural because they shape every future implementation. They are also hard to see in code because there is no file named nothing_imports_the_ui.go.

Write them down.

Better yet, pair them with enforcement where practical. If the architecture document says feature modules cannot import route modules, a lint rule or dependency check can keep that true. The document explains intent. The tool catches drift.
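Dedicated tools already do this, for example import-linter for Python, dependency-cruiser for JavaScript, or ESLint's `no-restricted-imports` rule. To show how little machinery the idea needs, here is a hand-rolled Python version built on the standard `ast` module. The `app.features`, `app.routes`, and `app.domain` package names, the `flask` stand-in for "a UI framework", and the `FORBIDDEN` table are all hypothetical.

```python
import ast

# (code under this package, must not import this package)
FORBIDDEN = [
    ("app.features", "app.routes"),
    ("app.domain", "flask"),
]


def imported_modules(source: str) -> list[str]:
    """Absolute module names imported by a Python source file."""
    mods = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            mods.append(node.module)
    return mods


def _within(module: str, package: str) -> bool:
    return module == package or module.startswith(package + ".")


def check_module(module_name: str, source: str) -> list[str]:
    """Return one message per import that crosses a forbidden boundary."""
    errors = []
    for owner, banned in FORBIDDEN:
        if not _within(module_name, owner):
            continue
        for mod in imported_modules(source):
            if _within(mod, banned):
                errors.append(
                    f"{module_name} imports {mod} (rule: {owner} must not import {banned})"
                )
    return errors
```

Run from a test or CI step, a check like this turns a sentence in ARCHITECTURE.md into a failing build the moment someone crosses the boundary.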

A template I would actually maintain

I would rather have a short architecture file that stays alive than a grand document everyone admires once and ignores forever. For most projects, this is enough:

# Architecture
 
## System story
 
Two or three paragraphs describing what the system does, who uses it,
and which constraints shape the design.
 
## Codemap
 
Name the main modules and the product concepts they own.
Explain where common changes usually start.
 
## Boundaries and invariants
 
List the rules reviewers care about:
dependency direction, ownership, generated code, data flow,
idempotency, security, tenancy, or performance constraints.
 
## Cross-cutting concerns
 
Describe auth, logging, tracing, config, migrations, background jobs,
third-party integrations, and test fixtures at a high level.
 
## How to update this document
 
Keep it short. Update it when module ownership changes,
when a new boundary appears, or when review feedback repeats.
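If the template is adopted, a small check can keep its skeleton from eroding. This Python sketch assumes the section headings are kept verbatim as `##` headings; `REQUIRED_SECTIONS`, `missing_sections`, and the 200-line budget in `too_long` are hypothetical choices.

```python
import re

REQUIRED_SECTIONS = [
    "System story",
    "Codemap",
    "Boundaries and invariants",
    "Cross-cutting concerns",
    "How to update this document",
]


def missing_sections(doc_text: str) -> list[str]:
    """Template sections with no matching second-level heading."""
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^##\s+(.+)$", doc_text, re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s not in headings]


def too_long(doc_text: str, max_lines: int = 200) -> bool:
    # Arbitrary budget standing in for "keep it short"; pick your own.
    return len(doc_text.splitlines()) > max_lines
```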

This is not a replacement for a README, onboarding docs, API references, or comments near tricky code. It is the missing middle: less detailed than implementation docs, more structural than setup instructions.

When to write it

The best time is just after you notice the same review comment appearing twice.

“This belongs in the domain layer.”

“Do not call Stripe from here.”

“That folder is generated.”

“Use the collection instead of fetching in the component.”

“The worker can run twice.”

Those comments are signals that the maintainers have a map the project has not written down yet. Capture the map once, and future reviews can spend more time on the actual change.

An ARCHITECTURE.md does not make a codebase simple. It makes the complexity legible. That is a much more useful promise.