How We Use Claude Code in Production: Workflows, Costs, Anti-Patterns

May 9, 2026 · 11 min read
Six months. 30+ production builds. Claude Code as the primary tool, not an assistant.

The honest report: what works, what gets people in trouble, the prompt patterns we use daily, and what it costs per engineer per month.

The unsexy result first

Senior engineers we know are shipping two to four times more line-items per week. Code quality holds or improves. Not because typing got faster — typing was never the bottleneck. The engineer's time moved from boilerplate to architecture, review, edge cases. That's the right shift.

The failure modes are real, and most teams adopting it walk straight into them. This post is mostly about those.

What Claude Code does well in our hands

Boilerplate at scale. Schemas, API routes, types, test scaffolds, form components, table components, admin CRUD. Anything with a known shape that just needs to be typed out.

Broad-scope refactors. Renaming a model across 80 files. Next.js Pages Router to App Router. REST to tRPC. Swapping date libraries.

Reading-heavy debugging. "Why is this query slow" with the EXPLAIN output. "Why is this hook re-rendering" with a React Profiler trace. The model reads stack traces faster than humans do.

Reading documentation. Faster than a tab graveyard for any library where docs exist.

First-draft architecture notes. Give it the constraints, get a strawman, edit to truth. Cheaper than starting from blank.

What it does badly

Architecture. It'll ship the cargo-cult stack of the month if you don't push back. The human picks the architecture, always.

Security-sensitive code without review. Auth, payment paths, data isolation. It writes plausible code with subtle holes. Every line goes through human review.

Anything with a fuzzy spec. Garbage in, garbage out. The spec needs to be tight before code-gen starts.

Naming. It picks generic names. Take the time to choose them yourself.

The daily workflow

Three-hour blocks, each with a clear deliverable. Inside the block:

Minutes 0–15: scope. Write a short spec in SCRATCH.md at repo root. What's being built. What's NOT in scope. Which files will change. What tests pass at the end. This file is the prompt anchor for everything that follows.
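A minimal sketch of keeping SCRATCH.md in the same four-part shape every block, generated from the answers to those four questions. The `Spec` dataclass, its field names, and the section headings are our own illustrative assumptions, not a Claude Code convention.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """The four scoping answers from minutes 0-15 (field names are hypothetical)."""
    goal: str
    out_of_scope: list[str] = field(default_factory=list)
    files_to_change: list[str] = field(default_factory=list)
    passing_tests: list[str] = field(default_factory=list)


def render_scratch(spec: Spec) -> str:
    """Render the spec as Markdown suitable for SCRATCH.md at the repo root."""
    def bullets(items: list[str]) -> str:
        # An explicit "(none)" makes an empty section visible instead of silently missing.
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    return "\n\n".join([
        f"# Goal\n{spec.goal}",
        f"# Not in scope\n{bullets(spec.out_of_scope)}",
        f"# Files that will change\n{bullets(spec.files_to_change)}",
        f"# Tests that pass at the end\n{bullets(spec.passing_tests)}",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical example task; redirect stdout into SCRATCH.md.
    print(render_scratch(Spec(
        goal="Add CSV export to the orders table",
        out_of_scope=["PDF export", "changes to the orders API"],
        files_to_change=["src/orders/export.ts", "src/orders/export.test.ts"],
        passing_tests=["export.test.ts: exports visible rows only"],
    )))
```

Forcing the "Not in scope" section to exist, even when empty, is the useful part: it makes you decide the fence before code-gen starts.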

Minutes 15–60: code-gen. Hand the spec to Claude Code. It scaffolds. Two or three iterations is normal — we rarely accept the first version unmodified.

Minutes 60–150: human-loop. Edge cases the model won't see. Error states. Loading states. Empty states. This is where products get good or bad — not the happy path, which the model handles.

Minutes 150–180: cleanup. Lint, types, tests, commit, push.

A three-hour block ships what used to be a one-to-two-day item.

Prompt patterns that hold

Scope-fence. Tell it explicitly which files to touch and what's out of scope. Without this it wanders into adjacent files and "improves" them. This single rule kills the most common failure.
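The fence can also be checked mechanically after each run. A minimal sketch, assuming you collect the changed paths yourself and express the fence as shell-style globs; `outside_fence` is a hypothetical helper, not part of Claude Code.

```python
from fnmatch import fnmatchcase


def outside_fence(changed: list[str], allowed: list[str]) -> list[str]:
    """Return the changed paths that match none of the allowed glob patterns.

    fnmatchcase is case-sensitive on every OS, and its "*" also matches "/",
    so "src/orders/*" covers nested files under src/orders as well.
    """
    return [path for path in changed
            if not any(fnmatchcase(path, pattern) for pattern in allowed)]


if __name__ == "__main__":
    fence = ["src/orders/*", "tests/orders/*"]
    changed = ["src/orders/export.ts", "src/billing/invoice.ts"]
    print(outside_fence(changed, fence))  # ['src/billing/invoice.ts']
```

In practice the `changed` list would come from `git diff --name-only` plus `git ls-files --others --exclude-standard` for new files; anything this returns is a file the model wandered into.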

Tests-first. Have it write tests before implementation. Forces a contract. Catches spec ambiguity early.
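What "tests first" looks like in miniature: the test exists before the implementation and pins the contract, including the edge cases the spec has to decide. A sketch with a hypothetical `slugify` function; the name and the rules it encodes are illustrative only.

```python
import re
import unicodedata


# Step 1: the contract, written and reviewed before any implementation exists.
def test_slugify_contract() -> None:
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  many   spaces  ") == "many-spaces"
    assert slugify("Crème brûlée") == "creme-brulee"  # accents folded to ASCII
    assert slugify("!!!") == ""                        # spec decision: no fallback slug


# Step 2: the implementation, generated against the contract above.
def slugify(text: str) -> str:
    # Decompose accented characters, then drop the combining marks.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


if __name__ == "__main__":
    test_slugify_contract()
    print("contract holds")
```

The `"!!!"` case is the kind of ambiguity that surfaces early: someone has to decide what an all-punctuation title becomes before any code is generated.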

Design-then-implement. For anything non-trivial, ask for a 200-word design note first. That note is a cheap artifact to argue with; 400 lines of generated code is an expensive one.

Existing-code-first. Point it at the files it should match in style and pattern. Without this it invents its own conventions, and codebase coherence erodes quickly.
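Scope-fence and existing-code-first both come down to what goes into the prompt. A minimal sketch of assembling one; the `build_prompt` helper, its wording, and the example paths are hypothetical, and Claude Code does not require any particular format.

```python
def build_prompt(task: str, touch: list[str], out_of_scope: list[str],
                 match_style_of: list[str]) -> str:
    """Assemble a prompt that fences scope and anchors style to existing files."""
    if not touch:
        # A fence with no files in it is no fence at all.
        raise ValueError("scope fence is empty")
    lines = [task, "", "Only modify these files:"]
    lines += [f"- {path}" for path in touch]
    lines += ["", "Out of scope (do not change):"]
    lines += [f"- {item}" for item in out_of_scope]
    lines += ["", "Match the naming, structure, and error handling in:"]
    lines += [f"- {path}" for path in match_style_of]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        task="Add CSV export to the orders table per SCRATCH.md.",
        touch=["src/orders/export.ts", "src/orders/export.test.ts"],
        out_of_scope=["src/orders/api.ts", "any new dependencies"],
        match_style_of=["src/invoices/export.ts"],
    ))
```

Listing the style reference files by path, rather than pasting them, also keeps the prompt small.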


Prompt patterns that look good and aren't

"Make it production-ready." Means nothing. The model imagines someone else's bullet list of "production-ready" and ships half of it.

"Fix the bug." Without a reproduction, the model guesses. Plausible guesses, wasted time. Always paste the stack trace, the failing test, the actual broken output.
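A minimal sketch of a bug prompt that refuses to go out without a reproduction. The `BugReport` fields and the prompt wording are our own assumptions; the point is that the stack trace or failing test always travels with the request.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    repro: str      # exact command or steps that trigger the bug
    evidence: str   # stack trace, failing test output, or broken output, pasted verbatim
    expected: str
    actual: str


def bug_prompt(report: BugReport) -> str:
    # Without a reproduction and real evidence, the model can only guess.
    if not report.repro.strip() or not report.evidence.strip():
        raise ValueError("bug prompt needs a reproduction and pasted evidence")
    return (
        "Fix the bug reproduced below.\n\n"
        f"Reproduction:\n{report.repro}\n\n"
        f"Evidence (verbatim):\n{report.evidence}\n\n"
        f"Expected: {report.expected}\n"
        f"Actual: {report.actual}\n"
    )
```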

"Refactor for cleanliness." Vague. Produces code shuffles, not improvements. Be specific — "extract X out of Y so it can be reused in Z" or "rename foo to bar across the codebase."

"Keep iterating until tests pass." Without bounds it runs forever and racks up cost. Always cap both the number of iterations and the token budget per block.
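A minimal sketch of those two caps as a loop guard, assuming a hypothetical `attempt` callable that runs one model iteration plus the test suite and reports whether the tests passed and how many tokens it used. How you drive Claude Code from a script is out of scope here; the point is the two hard stops.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    passed: bool
    iterations: int
    tokens_used: int
    stop_reason: str


def iterate_with_caps(attempt: Callable[[], tuple[bool, int]],
                      max_iterations: int, token_budget: int) -> Outcome:
    """Call attempt() until tests pass, the iteration cap hits, or the budget is spent."""
    tokens = 0
    for i in range(1, max_iterations + 1):
        passed, used = attempt()
        tokens += used
        if passed:
            return Outcome(True, i, tokens, "tests passed")
        # Checked after each attempt, so one expensive attempt can overshoot the budget.
        if tokens >= token_budget:
            return Outcome(False, i, tokens, "token budget exhausted")
    return Outcome(False, max_iterations, tokens, "iteration cap reached")


if __name__ == "__main__":
    scripted = iter([(False, 40_000), (False, 35_000), (True, 30_000)])
    print(iterate_with_caps(lambda: next(scripted), max_iterations=5, token_budget=200_000))
    # Outcome(passed=True, iterations=3, tokens_used=105000, stop_reason='tests passed')
```

Returning the stop reason matters: "budget exhausted" and "cap reached" both mean a human needs to look, usually at the spec.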

What the bill looks like

Per engineer, with normal usage (3–4 hours of active code-gen per day, 5 days a week):

  • Claude Code subscription (Max plan): $200/month
  • API tokens at the margin (batch ops, evals): $30–$80/month
  • IDE subscriptions for the occasional non-Claude task: $20/month
  • Total: $250–$300/month per engineer. Compare that with a fully loaded cost of $20K–$30K/month for a US senior engineer: tooling is under 1.5% of it. Rounding error.
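The arithmetic behind "rounding error", worked with the figures from the list above. The line items are the ones listed; the share calculation is ours.

```python
# Monthly per-engineer tooling cost as (low, high) in USD, from the list above.
SUBSCRIPTION = (200, 200)            # Claude Code Max plan
API_TOKENS = (30, 80)                # batch ops, evals
IDE = (20, 20)                       # occasional non-Claude task
LOADED_SENIOR_US = (20_000, 30_000)  # fully loaded engineer cost per month


def tooling_range() -> tuple[int, int]:
    items = (SUBSCRIPTION, API_TOKENS, IDE)
    return sum(lo for lo, _ in items), sum(hi for _, hi in items)


def tooling_share() -> tuple[float, float]:
    """Best case (cheapest tooling, priciest engineer) and worst case, as fractions."""
    low, high = tooling_range()
    return low / LOADED_SENIOR_US[1], high / LOADED_SENIOR_US[0]


if __name__ == "__main__":
    low, high = tooling_range()
    best, worst = tooling_share()
    print(f"tooling: ${low}-${high}/month")                      # tooling: $250-$300/month
    print(f"share of engineer cost: {best:.2%} to {worst:.2%}")  # 0.83% to 1.50%
```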

Cost gotchas:

  • Long context windows are tempting and expensive. Don't paste the entire codebase. Use file references.
  • Repeated exploration burns tokens. Build small skills/ for the workflows you do weekly. Saves 30–50% on tokens for repeated tasks.
  • Sub-agent setups quadruple cost. Reserve them for embarrassingly parallel work, such as generating tests for 20 files at once, and even then they're usually not worth it.

Team rules that prevent regression

One engineer at a time on the same Claude Code session. Two humans plus one model confuses the model and creates drift.

Commit before every prompt that touches more than one file. Lets you reset cheaply if the output is bad.
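A minimal sketch of that checkpoint-and-reset habit as two plain git command sequences. The helper names are hypothetical; the git commands are standard, but note that `git clean -fd` deletes untracked files, so anything not committed or ignored is gone.

```python
import subprocess


def checkpoint_cmds(message: str) -> list[list[str]]:
    """Commands that commit everything, new files included, before a multi-file prompt."""
    return [
        ["git", "add", "-A"],
        # --allow-empty keeps the checkpoint from failing when the tree is already clean.
        ["git", "commit", "--allow-empty", "-m", message],
    ]


def reset_cmds(ref: str = "HEAD") -> list[list[str]]:
    """Commands that discard everything changed since the checkpoint at `ref`."""
    return [
        ["git", "reset", "--hard", ref],  # restore tracked files and move HEAD to ref
        ["git", "clean", "-fd"],          # delete untracked files and dirs; ignored files survive
    ]


def run(cmds: list[list[str]], repo: str = ".") -> None:
    for cmd in cmds:
        subprocess.run(cmd, cwd=repo, check=True)  # stop at the first failing command
```

Typical use is `run(checkpoint_cmds("checkpoint: before export refactor"))` before the prompt and `run(reset_cmds())` if the output is bad. If the model may have committed on its own, record `git rev-parse HEAD` right after the checkpoint and pass that hash as `ref`.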

Shared CLAUDE.md at repo root with project conventions (naming, file structure, allowed deps). Every prompt starts from this context.
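Claude Code loads a CLAUDE.md it finds in the project automatically. For the batch and eval jobs that call the API directly, the same file can be prepended by hand. A minimal sketch; `with_conventions` and the `---` separator are hypothetical choices.

```python
from pathlib import Path


def with_conventions(prompt: str, repo_root: str = ".") -> str:
    """Prefix a prompt with the repo's shared CLAUDE.md so every request starts from it."""
    conventions = Path(repo_root, "CLAUDE.md")
    if not conventions.is_file():
        # Fail loudly: a prompt without the shared conventions is how drift starts.
        raise FileNotFoundError(f"missing {conventions}")
    return f"{conventions.read_text(encoding='utf-8').rstrip()}\n\n---\n\n{prompt}"
```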

Don't let the model review its own output. It rubber-stamps. Human review only.

The economic reason this matters

A one-engineer shop works because of this shift. The $5K MVP tier (see the price sheet) depends on it. So does the hire-vs-engage math for founders (see the 2026 math).

If you're building AI features and want the agent / RAG / eval side done right, see RAG done right in 2026.
