Skip to content

Cost Management

The pipeline enforces token budgets at the flow level to prevent runaway LLM costs.

Cost Ceilings

Each Prefect flow has a per-run cost ceiling defined in src/config.py:

Flow Limit (USD)
produce_article $5.00
batch_produce $100.00
research_news_scan $2.00
produce_digests $9.00

If accumulated LLM spend exceeds the ceiling, call_agent() raises CostLimitExceeded and the flow stops.

How Enforcement Works

  1. The current_flow_name ContextVar is set at the start of each @flow function
  2. call_agent() reads this ContextVar (or accepts an explicit flow_name override)
  3. After recording cost to the database, check_cost_limit() compares accumulated cost against the ceiling
  4. If the cost recording itself fails (DB error), a fallback check uses the current call's cost alone -- conservative but prevents runaway spending

Model Selection Strategy

Costs are managed by choosing the right model for each task:

Tier Model Use Case Relative Cost
Sonnet claude-sonnet-4-6 Creative writing, research synthesis, complex reasoning Higher
Haiku claude-haiku-4-5 Classification, scoring, structured evaluation Lower

Default model assignments are defined in src/config.py and can be overridden per agent via the prompt version in the database.

Prompt Caching

System prompts are sent with cache_control: {"type": "ephemeral"} to enable Anthropic server-side caching:

  • Cache TTL: 24 hours
  • Cache hit: billed at 10% of normal input token rate
  • Cache creation: billed at 25% premium (first call only per prompt version)

This provides significant savings during batch production runs where the same prompt is used repeatedly. Cache read and creation tokens are tracked separately in AgentResponse.usage and logged in structured logs.

Image Generation Costs

DALL-E 3 image generation is tracked separately from token costs:

Size Cost per Image
1024x1024 $0.04
1024x1792 $0.08
1792x1024 $0.08

The pipeline uses 1792x1024 by default for featured images. Image generation is optional -- set OPENAI_API_KEY to enable. Articles without featured images proceed normally. Image cost is added to the article's total_cost_usd via increment_article_cost().

Cost Tracking

Token usage is recorded per agent call via the cost tracker (src/agents/cost_tracker.py):

  • calculate_cost() computes USD cost from model, input tokens, output tokens, and cache tokens
  • record_agent_cost() persists the cost to the database, attributed to the article and flow

The dashboard Analytics page shows cost trends over time, broken down by agent and flow.

Monitoring

Cost data is available through multiple channels:

  • Dashboard Analytics page: cost trends, per-agent breakdowns, production throughput
  • Langfuse generations: per-call cost recorded in cost_details.total
  • Structured logs: every agent_call_complete log event includes cost_usd, tokens_in, tokens_out, and cache stats