Cost Management
The pipeline enforces per-run cost ceilings at the flow level to prevent runaway LLM spend.
Cost Ceilings
Each Prefect flow has a per-run cost ceiling defined in src/config.py:
| Flow | Limit (USD) |
|---|---|
| produce_article | $5.00 |
| batch_produce | $100.00 |
| research_news_scan | $2.00 |
| produce_digests | $9.00 |
If accumulated LLM spend exceeds the ceiling, call_agent() raises CostLimitExceeded and the flow stops.
How Enforcement Works
- The current_flow_name ContextVar is set at the start of each @flow function
- call_agent() reads this ContextVar (or accepts an explicit flow_name override)
- After recording cost to the database, check_cost_limit() compares accumulated cost against the ceiling
- If the cost recording itself fails (DB error), a fallback check uses the current call's cost alone -- conservative but prevents runaway spending
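The enforcement steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the COST_CEILINGS_USD dict, the enforce_after_call() helper, and the record_cost callback (a stand-in for the database write that returns the accumulated total) are assumptions; only current_flow_name, check_cost_limit(), and CostLimitExceeded are names taken from this page, and their real signatures may differ.

```python
from contextvars import ContextVar

# Ceilings copied from the table on this page; the dict name is an assumption.
COST_CEILINGS_USD = {
    "produce_article": 5.00,
    "batch_produce": 100.00,
    "research_news_scan": 2.00,
    "produce_digests": 9.00,
}

# In the real pipeline this is set at the start of each @flow function.
current_flow_name = ContextVar("current_flow_name", default=None)


class CostLimitExceeded(RuntimeError):
    """Raised when a flow run's accumulated LLM spend exceeds its ceiling."""


def check_cost_limit(flow_name, accumulated_usd):
    """Raise if the accumulated spend for this flow is above its ceiling."""
    ceiling = COST_CEILINGS_USD.get(flow_name)
    if ceiling is not None and accumulated_usd > ceiling:
        raise CostLimitExceeded(
            f"{flow_name}: ${accumulated_usd:.2f} exceeds ceiling ${ceiling:.2f}"
        )


def enforce_after_call(call_cost_usd, record_cost, flow_name=None):
    """Hypothetical helper: record one call's cost, then check the ceiling.

    record_cost(flow_name, cost) stands in for the database write and is
    assumed to return the accumulated spend for the current run.
    """
    # An explicit flow_name overrides the ContextVar.
    name = flow_name or current_flow_name.get()
    if name is None:
        return None  # not running inside a flow, nothing to enforce
    try:
        accumulated = record_cost(name, call_cost_usd)
    except Exception:
        # Fallback: recording failed, so only this call's cost is known.
        accumulated = call_cost_usd
    check_cost_limit(name, accumulated)
    return accumulated
```

Passing the recorder in as a callback keeps the sketch runnable without a database; in the pipeline the equivalent logic lives inside call_agent().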
Model Selection Strategy
Costs are managed by choosing the right model for each task:
| Tier | Model | Use Case | Relative Cost |
|---|---|---|---|
| Sonnet | claude-sonnet-4-6 | Creative writing, research synthesis, complex reasoning | Higher |
| Haiku | claude-haiku-4-5 | Classification, scoring, structured evaluation | Lower |
Default model assignments are defined in src/config.py and can be overridden per agent via the prompt version in the database.
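As a rough sketch of how a default assignment and a per-agent override might combine: the agent names, the DEFAULT_MODELS dict, the resolve_model() helper, and the shape of the prompt-version record (a dict with an optional "model" key) are all hypothetical. Only the two model identifiers come from the table above.

```python
# Hypothetical defaults; the real assignments live in src/config.py.
DEFAULT_MODELS = {
    "writer": "claude-sonnet-4-6",       # creative writing, synthesis
    "classifier": "claude-haiku-4-5",    # classification, scoring
}


def resolve_model(agent_name, prompt_version=None):
    """Return the override from the prompt version if set, else the default.

    prompt_version stands in for the database record and is assumed to be
    a dict that may carry a "model" key.
    """
    override = (prompt_version or {}).get("model")
    return override or DEFAULT_MODELS[agent_name]
```

With this shape, moving an agent to the cheaper tier is a data change in the prompt version rather than a code change.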
Prompt Caching
System prompts are sent with cache_control: {"type": "ephemeral"} to enable Anthropic server-side caching:
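- Cache TTL: 5 minutes by default for ephemeral entries, refreshed each time the cached prefix is read (Anthropic also offers an optional 1-hour TTL at a higher write price)
- Cache hit: billed at 10% of the normal input token rate
- Cache creation: billed at a 25% premium over the normal input token rate, paid on the first call per prompt version and again whenever the cache entry has expired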
This provides significant savings during batch production runs where the same prompt is used repeatedly. Cache read and creation tokens are tracked separately in AgentResponse.usage and logged in structured logs.
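To make the savings concrete, here is a small sketch of the system block shape and the relative cost of resending one system prompt across a batch. The helper names are illustrative. The multipliers are the ones listed above (1.25x for creation, 0.10x for a hit), and the calculation assumes every call after the first lands while the cache entry is still live.

```python
def cached_system_blocks(prompt_text):
    """Build a system prompt block list with caching enabled."""
    return [
        {
            "type": "text",
            "text": prompt_text,
            "cache_control": {"type": "ephemeral"},
        }
    ]


def relative_prompt_cost(n_calls):
    """Cost of sending the same system prompt n_calls times with caching,
    as a fraction of the cost without caching.

    Assumes one cache creation (1.25x) followed by n_calls - 1 hits (0.10x).
    """
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return (1.25 + 0.10 * (n_calls - 1)) / n_calls
```

Under these assumptions a single call costs more than it would uncached (ratio 1.25), two calls already cost less (about 0.68), and a 100-call batch pays roughly 11% of the uncached system prompt cost.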
Image Generation Costs
DALL-E 3 image generation is tracked separately from token costs:
| Size | Cost per Image |
|---|---|
| 1024x1024 | $0.04 |
| 1024x1792 | $0.08 |
| 1792x1024 | $0.08 |
The pipeline uses 1792x1024 by default for featured images. Image generation is optional -- set OPENAI_API_KEY to enable. Articles without featured images proceed normally. Image cost is added to the article's total_cost_usd via increment_article_cost().
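A minimal sketch of the image cost lookup, using the prices from the table above. The IMAGE_COST_USD dict and image_cost() helper are illustrative names, and the article dict stands in for the database row that increment_article_cost() updates, whose real signature is not shown on this page.

```python
# Prices per image from the table above.
IMAGE_COST_USD = {
    "1024x1024": 0.04,
    "1024x1792": 0.08,
    "1792x1024": 0.08,
}
DEFAULT_IMAGE_SIZE = "1792x1024"  # featured-image default


def image_cost(size=DEFAULT_IMAGE_SIZE, count=1):
    """Return the USD cost of generating `count` images at `size`."""
    if size not in IMAGE_COST_USD:
        raise ValueError(f"unsupported image size: {size}")
    return IMAGE_COST_USD[size] * count


# Stand-in for the article record; the real update goes through
# increment_article_cost().
article = {"total_cost_usd": 1.37}
article["total_cost_usd"] += image_cost()
```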
Cost Tracking
Token usage is recorded per agent call via the cost tracker (src/agents/cost_tracker.py):
- calculate_cost() computes USD cost from model, input tokens, output tokens, and cache tokens
- record_agent_cost() persists the cost to the database, attributed to the article and flow
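The cost calculation might look roughly like the sketch below. The per-million-token rates are placeholders for illustration, not authoritative prices, and the signature is an assumption rather than the one in src/agents/cost_tracker.py. It also assumes tokens_in excludes cached tokens, which are passed separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Placeholder rates for illustration only; check current pricing.
PLACEHOLDER_RATES = {
    "claude-sonnet-4-6": Rates(3.00, 15.00),
    "claude-haiku-4-5": Rates(1.00, 5.00),
}

CACHE_WRITE_MULTIPLIER = 1.25  # 25% premium on cache creation
CACHE_READ_MULTIPLIER = 0.10   # cache hits billed at 10%


def calculate_cost(model, tokens_in, tokens_out,
                   cache_read_tokens=0, cache_creation_tokens=0):
    """Return the USD cost of one agent call."""
    rates = PLACEHOLDER_RATES[model]
    # Weight cached tokens before applying the input rate.
    billable_input = (
        tokens_in
        + cache_creation_tokens * CACHE_WRITE_MULTIPLIER
        + cache_read_tokens * CACHE_READ_MULTIPLIER
    )
    input_usd = billable_input * rates.input_per_mtok / 1_000_000
    output_usd = tokens_out * rates.output_per_mtok / 1_000_000
    return input_usd + output_usd
```

Keeping the multipliers as named constants means a pricing change touches one place instead of every call site.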
The dashboard Analytics page shows cost trends over time, broken down by agent and flow.
Monitoring
Cost data is available through multiple channels:
- Dashboard Analytics page: cost trends, per-agent breakdowns, production throughput
- Langfuse generations: per-call cost recorded in cost_details.total
- Structured logs: every agent_call_complete log event includes cost_usd, tokens_in, tokens_out, and cache stats
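For reference, an agent_call_complete event could be emitted along these lines. This sketch uses only the standard library and JSON-encodes the event. The logger name, the agent and flow fields, and the two cache field names are assumptions; the pipeline's actual logging setup and field names may differ.

```python
import json
import logging

logger = logging.getLogger("pipeline.agents")  # hypothetical logger name


def log_agent_call_complete(agent, flow, cost_usd, tokens_in, tokens_out,
                            cache_read_tokens=0, cache_creation_tokens=0):
    """Emit one structured log line for a finished agent call."""
    event = {
        "event": "agent_call_complete",
        "agent": agent,
        "flow": flow,
        "cost_usd": round(cost_usd, 6),
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        # Cache stats; these two field names are assumptions.
        "cache_read_tokens": cache_read_tokens,
        "cache_creation_tokens": cache_creation_tokens,
    }
    logger.info(json.dumps(event))
    return event
```

One JSON object per line keeps the events easy to filter by flow or agent when reconciling logged costs against the dashboard.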