Cost Management¶

The pipeline enforces token budgets at the flow level to prevent runaway LLM costs.

Cost Ceilings¶

Each Prefect flow has a per-run cost ceiling defined in src/config.py:

Flow	Limit (USD)
`produce_article`	$5.00
`batch_produce`	$100.00
`research_news_scan`	$2.00
`produce_digests`	$9.00

If accumulated LLM spend exceeds the ceiling, call_agent() raises CostLimitExceeded and the flow stops.

How Enforcement Works¶

The current_flow_name ContextVar is set at the start of each @flow function
call_agent() reads this ContextVar (or accepts an explicit flow_name override)
After recording cost to the database, check_cost_limit() compares accumulated cost against the ceiling
If the cost recording itself fails (DB error), a fallback check uses the current call's cost alone -- conservative but prevents runaway spending

Model Selection Strategy¶

Costs are managed by choosing the right model for each task:

Tier	Model	Use Case	Relative Cost
Sonnet	`claude-sonnet-4-6`	Creative writing, research synthesis, complex reasoning	Higher
Haiku	`claude-haiku-4-5`	Classification, scoring, structured evaluation	Lower

Default model assignments are defined in src/config.py and can be overridden per agent via the prompt version in the database.

Prompt Caching¶

System prompts are sent with cache_control: {"type": "ephemeral"} to enable Anthropic server-side caching:

Cache TTL: 24 hours
Cache hit: billed at 10% of normal input token rate
Cache creation: billed at 25% premium (first call only per prompt version)

This provides significant savings during batch production runs where the same prompt is used repeatedly. Cache read and creation tokens are tracked separately in AgentResponse.usage and logged in structured logs.

Image Generation Costs¶

DALL-E 3 image generation is tracked separately from token costs:

Size	Cost per Image
1024x1024	$0.04
1024x1792	$0.08
1792x1024	$0.08

The pipeline uses 1792x1024 by default for featured images. Image generation is optional -- set OPENAI_API_KEY to enable. Articles without featured images proceed normally. Image cost is added to the article's total_cost_usd via increment_article_cost().

Cost Tracking¶

Token usage is recorded per agent call via the cost tracker (src/agents/cost_tracker.py):

calculate_cost() computes USD cost from model, input tokens, output tokens, and cache tokens
record_agent_cost() persists the cost to the database, attributed to the article and flow

The dashboard Analytics page shows cost trends over time, broken down by agent and flow.

Monitoring¶

Cost data is available through multiple channels:

Dashboard Analytics page: cost trends, per-agent breakdowns, production throughput
Langfuse generations: per-call cost recorded in cost_details.total
Structured logs: every agent_call_complete log event includes cost_usd, tokens_in, tokens_out, and cache stats