Claude Code pricing: what it actually costs to run background agents
The first question every team asks before running Claude Code in CI or as a background agent is “what does this cost?” The answer isn't a flat subscription — Claude Code charges by API token consumption, the bill scales with how much context you feed it, and the range is wide. Across enterprise deployments, Anthropic reports an average of about $13 per developer per active day, with 90% of users staying under $30. Here's how the pricing actually works, what levers you have, and how to keep background agent spend predictable.
The billing model: tokens, not seats
Claude Code doesn't have a per-seat license. Every interaction — every prompt you send and every response you receive — consumes API tokens. You pay for what you use:
- Input tokens: the prompt, system context, file contents Claude reads, tool definitions, and conversation history. This is the big one. Background agents that read large codebases consume more input tokens per turn.
- Output tokens: Claude's response, including code it writes, explanations, and tool calls. Output tokens cost more per token than input.
- Thinking tokens: when extended thinking is enabled (the default), Claude's internal reasoning is billed as output tokens. This can be thousands of tokens per request on complex tasks.
There are also subscription options. Claude Pro and Max subscribers get Claude Code usage included in their plan; the /cost command still shows token counts, but that usage is not billed separately. The /stats command tracks usage patterns for subscribers. Most teams running background agents at scale use the API directly, so the rest of this post focuses on API-based billing.
Model pricing: the rates
The per-token cost depends on which model you select. The three you'll actually use in practice:
| Model | Input | Output | Cache hits |
|---|---|---|---|
| Opus 4 | $15/MTok | $75/MTok | $1.50/MTok |
| Sonnet 4 | $3/MTok | $15/MTok | $0.30/MTok |
| Haiku 4.5 | $1/MTok | $5/MTok | $0.10/MTok |
MTok = million tokens. The cache-hit column is the rate when prompt caching kicks in (more on that below). Sonnet handles most coding tasks well and costs one-fifth as much as Opus on both input and output. The docs recommend reserving Opus for complex architectural decisions or multi-step reasoning.
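To make the table concrete, here is a minimal sketch that turns token counts into dollars. The rates are copied from the table above; the model keys (`opus-4`, `sonnet-4`, `haiku-4.5`) are hypothetical labels for this example rather than API model IDs, and the token counts are made up for illustration.

```python
# USD per million tokens (MTok), copied from the pricing table above.
# The dictionary keys are illustrative labels, not official model IDs.
RATES = {
    "opus-4": {"input": 15.00, "output": 75.00, "cache_hit": 1.50},
    "sonnet-4": {"input": 3.00, "output": 15.00, "cache_hit": 0.30},
    "haiku-4.5": {"input": 1.00, "output": 5.00, "cache_hit": 0.10},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of a single request.

    cached_tokens is the portion of the prompt served from the cache;
    it is billed at the cache-hit rate instead of the input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    rates = RATES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * rates["input"]
        + cached_tokens * rates["cache_hit"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


# Hypothetical request: 40k prompt tokens, 2k output tokens, nothing cached.
for model in RATES:
    print(f"{model}: ${request_cost(model, 40_000, 2_000):.3f}")
```

Thinking tokens are billed as output, so in this sketch they would simply be added to `output_tokens`.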
You can switch models mid-session with /model, set a default in /config, or force one per-run with --model sonnet. For background agents, pinning the model in your automation config is the right move — you want predictable cost per run.
What does a session actually cost?
Use /cost in any Claude Code session to see your running total:
$ /cost
Total cost: $0.55
Total duration (API): 6m 19.7s
Total duration (wall): 6h 33m 10.2s
Total code changes: 0 lines added, 0 lines removed
That $0.55 was a six-minute API conversation spread over a six-hour wall-clock window, typical for a developer who asks a question, steps away, and comes back later. A focused 30-minute coding session that reads files, writes code, and runs tests might cost $2–5 on Sonnet. A full-day heavy session can hit $20+.
The enterprise averages Anthropic publishes: about $13 per developer per active day, $150–250 per developer per month, with 90% of users under $30/day. But these numbers assume interactive use. Background agents are different.
Background agent cost dynamics
Background agents — the kind you run in CI, on a cron, or triggered by a Linear issue — have a different cost profile than interactive sessions:
- They run unattended. No human pressing Escape when things go sideways. Without guardrails, a confused agent can churn through turns and tokens doing nothing useful.
- They read more context. A background agent typically reads multiple files, runs tests, reads their output, and iterates. Each file read adds to the input token count, and the context accumulates across turns.
- They run more often. A developer uses Claude Code a few times a day. A background agent might run on every PR, every issue, every cron tick.
This is why cost caps are non-negotiable for background agents.
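A rough sketch of why unattended turns add up: if every turn re-sends the accumulated context, total input volume grows faster than the turn count. The starting context and per-turn growth below are hypothetical numbers, and the model is deliberately simplified (it ignores prompt caching, which lowers the price of re-sent tokens but not their count).

```python
def cumulative_input_tokens(base_context, added_per_turn, turns):
    """Total input tokens sent over a run when each turn re-sends the full history."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # this turn sends everything accumulated so far
        context += added_per_turn   # file reads and tool output join the history
    return total


# Hypothetical agent: 20k tokens of starting context, 5k new tokens per turn.
ten_turns = cumulative_input_tokens(20_000, 5_000, 10)
twenty_turns = cumulative_input_tokens(20_000, 5_000, 20)
print(ten_turns, twenty_turns, round(twenty_turns / ten_turns, 2))
```

Under these assumptions, doubling the turns from 10 to 20 roughly triples the input volume, which is the pattern a turn cap is meant to cut off.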
The cost caps: flags you must set
Claude Code has two hard caps you should always set for unattended runs:
claude -p "fix the failing test" \
  --max-turns 10 \
  --max-budget-usd 3
- --max-turns N: stops after N tool-use turns. A focused fix rarely needs more than 10. Set this as a circuit breaker.
- --max-budget-usd N: hard dollar cap per invocation. The agent stops when spend reaches this amount. Set this to the maximum you'd accept for a single run.
For GitHub Actions workflows, pass these through claude_args:
- uses: anthropics/claude-code-action@v1
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    claude_args: |
      --max-turns 10
      --max-budget-usd 3
Without these flags, a single misbehaving run can cost as much as a week of normal use. With them, your worst case is predictable and bounded.
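For runs launched from your own scripts rather than a workflow file, one way to make sure the caps are never forgotten is to build the command in a single place. This is a minimal sketch, not an official wrapper: the helper name `build_claude_command` and its defaults are assumptions for this example, and it uses only the flags already shown in this post (`-p`, `--max-turns`, `--max-budget-usd`, `--model`).

```python
import shlex


def build_claude_command(prompt, max_turns=10, max_budget_usd=3.0, model=None):
    """Build the argv list for a capped, non-interactive Claude Code run."""
    if max_turns < 1 or max_budget_usd <= 0:
        raise ValueError("both caps must be positive")
    cmd = [
        "claude", "-p", prompt,
        "--max-turns", str(max_turns),
        "--max-budget-usd", f"{max_budget_usd:g}",
    ]
    if model is not None:
        cmd += ["--model", model]  # pin the model for predictable cost per run
    return cmd


cmd = build_claude_command("fix the failing test", model="sonnet")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` as-is; building it as a list rather than a string avoids shell-quoting problems in the prompt.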
Team-level controls
Beyond per-run caps, Claude Code has organization-level cost management:
- Workspace spend limits. When you authenticate Claude Code with your Console account, a “Claude Code” workspace is auto-created. Admins can set total spend limits and view usage reports in the Console.
- Workspace rate limits. You can cap Claude Code's share of your org's API rate limits to protect production workloads from being crowded out by developer usage.
- Per-user TPM recommendations. Anthropic publishes recommended tokens-per-minute allocations by team size — from 200k–300k TPM per user for 1–5 people down to 10k–15k for 500+ users. The per-user allocation decreases because fewer people use it concurrently at scale.
Prompt caching: the biggest cost lever
Claude Code automatically uses prompt caching, and it's the single most impactful cost optimization. Prompt caching stores previously processed context (system prompts, file contents, conversation history) and re-reads it at 10% of the standard input price on subsequent requests.
The math: if your system prompt and context total 50,000 tokens and you send 20 messages in a session, without caching you'd pay for 50,000 input tokens × 20 = 1,000,000 input tokens. With caching, you pay full price once (50,000 tokens) and the cache-hit price for the other 19 (950,000 tokens at 10% = 95,000 token equivalents), for 145,000 token equivalents in total. That's a 10× reduction on the re-read portion and roughly 7× on the session's input overall, before the cache-write premium on the first request.
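The same arithmetic as a small sketch, so you can plug in your own context size and message count. It keeps the simplifying assumption of the example above (a fixed context re-sent on every message); the function name and the optional `write_multiplier` parameter are illustrative.

```python
def session_input_equivalents(context_tokens, messages,
                              read_multiplier=0.10, write_multiplier=1.0):
    """Return (uncached, cached) input volume in full-price token equivalents."""
    uncached = context_tokens * messages
    first_request = context_tokens * write_multiplier            # written to the cache once
    re_reads = context_tokens * (messages - 1) * read_multiplier  # later messages hit the cache
    return uncached, first_request + re_reads


uncached, cached = session_input_equivalents(50_000, 20)
print(f"uncached: {uncached:,}  cached: {cached:,.0f}  ratio: {uncached / cached:.1f}x")

# Same session, charging the first request a 1.25x cache-write rate.
_, cached_with_premium = session_input_equivalents(50_000, 20, write_multiplier=1.25)
print(f"with write premium: {cached_with_premium:,.0f}")
```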
Two cache tiers exist: a 5-minute cache (1.25× write cost) and a 1-hour cache (2× write cost). The 5-minute cache pays off after one re-read. The 1-hour cache pays off after two. Claude Code manages this automatically — you don't need to configure anything.
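The break-even claims can be checked with a few lines. This sketch compares the cost of writing a block of tokens to the cache and re-reading it against sending the same tokens uncached each time, with every price expressed as a multiple of the standard input rate; the function name is an assumption for this example.

```python
def reads_to_break_even(write_multiplier, read_multiplier=0.10):
    """Smallest number of cache re-reads at which caching beats resending uncached."""
    if read_multiplier >= 1:
        raise ValueError("cache reads must be cheaper than uncached input")
    reads = 0
    while True:
        reads += 1
        cached = write_multiplier + reads * read_multiplier  # one write, then cheap reads
        uncached = 1 + reads                                 # full price every time
        if cached < uncached:
            return reads


print(reads_to_break_even(1.25))  # 5-minute cache
print(reads_to_break_even(2.0))   # 1-hour cache
```

With one re-read, the 1-hour tier costs 2.1 units against 2.0 uncached, which is why it needs a second re-read to come out ahead.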
Six strategies to reduce cost
- Use Sonnet by default, Opus when you need it. Switch with /model or --model sonnet. Reserve Opus for complex architectural reasoning. For simple sub-tasks, use model: haiku in subagent configs.
- Clear between tasks. /clear resets the context window. Stale context from a previous task wastes tokens on every subsequent message.
- Reduce extended thinking budget. Extended thinking is on by default and can consume tens of thousands of output tokens per request. For simpler tasks, lower it with /effort or MAX_THINKING_TOKENS=8000.
- Write specific prompts. “Improve this codebase” triggers broad file scanning. “Add input validation to the login function in auth.ts” lets Claude work with minimal reads.
- Delegate verbose operations to subagents. Push work such as running tests, reading logs, or processing large files to a subagent, so the verbose output stays in the subagent's context and only a summary returns.
- Use hooks to preprocess. Instead of Claude reading a 10,000-line log to find errors, a PreToolUse hook can grep for ERROR and return only matching lines. Thousands of tokens reduced to dozens.
Background agent budgeting: a practical formula
Here's how to estimate your monthly background agent spend:
runs_per_day × avg_cost_per_run × 30 = monthly spend
Example: PR review agent
5 PRs/day × $2/review × 30 = $300/month
Example: nightly codebase sweep
1 run/day × $5/run × 30 = $150/month
Example: Linear issue agent (Cyrus-style)
10 issues/week × $4/issue × 4.3 = $172/month
The per-run cost depends heavily on codebase size, task complexity, and model choice. Start with --max-budget-usd 5 and check /cost on a few real runs to calibrate. Then set the cap to 2× your observed average: tight enough to catch runaways, loose enough to let complex tasks finish.
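The formula and the 2× calibration rule fit in a few lines. This is a sketch with hypothetical helper names; the per-run costs passed to `suggested_cap` are made-up observations, standing in for whatever your pilot runs actually report.

```python
def monthly_spend(runs_per_day, avg_cost_per_run, days=30):
    """runs_per_day x avg_cost_per_run x 30, as in the formula above."""
    return runs_per_day * avg_cost_per_run * days


def monthly_spend_weekly(runs_per_week, avg_cost_per_run, weeks_per_month=4.3):
    """Variant for agents whose volume is easier to count per week."""
    return runs_per_week * avg_cost_per_run * weeks_per_month


def suggested_cap(observed_costs, multiplier=2.0):
    """Per-run budget cap: a multiple of the observed average cost."""
    if not observed_costs:
        raise ValueError("need at least one observed run")
    return multiplier * sum(observed_costs) / len(observed_costs)


print(f"PR review agent:    ${monthly_spend(5, 2.00):.0f}/month")
print(f"Nightly sweep:      ${monthly_spend(1, 5.00):.0f}/month")
print(f"Linear issue agent: ${monthly_spend_weekly(10, 4.00):.0f}/month")

# Hypothetical costs from three pilot runs, in USD.
print(f"Suggested cap:      ${suggested_cap([1.20, 1.80, 1.50]):.2f}")
```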
Batch API: 50% off for async work
For non-time-sensitive work (nightly audits, batch code reviews, documentation generation), the Batch API provides a 50% discount on both input and output tokens. It applies to requests you submit to the API as batches rather than to interactive Claude Code sessions. Sonnet input drops from $3/MTok to $1.50/MTok, output from $15/MTok to $7.50/MTok. The tradeoff: results arrive asynchronously, not as a stream.
What Cyrus does about cost
If you're running background agents at any real volume, the cost management burden adds up fast — per-run caps, team budgets, model selection per task type, monitoring for runaway sessions. That overhead is part of what DIY background agents require you to build yourself.
Cyrus handles this with BYOK (bring your own key) — you pay Anthropic directly for tokens, and Cyrus adds the orchestration layer: per-issue budget caps, model routing, isolated git worktree runs, and streamed cost reporting back to the triggering issue. Community self-hosted is free forever. The spend on tokens is yours to manage; the infrastructure to manage it is Cyrus's.
Takeaways
- Claude Code is token-based, not per-seat. Your bill scales with how much context your agents read and how much they write.
- Enterprise average: ~$13/developer/active day on interactive use. Background agents may be higher or lower depending on run frequency and task complexity.
- Always set --max-turns and --max-budget-usd for unattended runs. Non-negotiable.
- Prompt caching (automatic) is the biggest savings lever. Model selection (Sonnet vs Opus) is the second.
- Start with a small pilot, measure with /cost, set caps at 2× your observed average, then scale.
BYOK. Per-issue budgets. Zero infrastructure tax.
Cyrus runs Claude Code (or Codex, Cursor, Gemini) in isolated git worktrees per issue, with built-in per-run budget caps and cost reporting streamed back to the triggering issue. Community self-hosted is free forever, BYOK across all models.