Agent Pattern

All AI agents in the pipeline follow a consistent pattern built around call_agent() in src/agents/base.py.

How call_agent() Works

call_agent() is the single entry point for all LLM interactions. It handles:

  1. Prompt loading -- fetches the active system prompt from the database (cached with 300s TTL)
  2. Langfuse observation -- starts a generation observation (if Langfuse is configured)
  3. API call -- sends the request to Anthropic with exponential backoff on 429/5xx errors
  4. Tool use loop -- if the model requests tool calls, executes them and continues the conversation
  5. Structured output validation -- when using structured output mode, validates the JSON response and raises on truncation
  6. Cost tracking -- calculates token costs (including prompt cache savings) and records them to the database
  7. Langfuse update -- records output, usage, cost, and metadata to the generation observation
  8. Structured logging -- logs agent name, duration, token usage, cost, and cache stats
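
A minimal invocation looks like this (a sketch; "summarizer" is a hypothetical agent name, and its prompt is assumed to already be registered in the database):

messages = [{"role": "user", "content": "Summarize the attached study."}]

response = await call_agent(
    agent_name="summarizer",   # hypothetical agent, for illustration
    content_type="article",
    messages=messages,
    max_tokens=2048,
)
print(response.content)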

Parameters

agent_name (str) -- Agent identifier (e.g., "factuality_checker")
content_type (str) -- Content category for prompt lookup (e.g., "article")
messages (list[dict]) -- Conversation history
article_id (UUID | None) -- For cost attribution
tools (dict[str, Callable] | None) -- Map of tool names to async handlers
tool_definitions (list[dict] | None) -- JSON Schema definitions for Claude's tool protocol
max_tokens (int) -- Token budget (default 4096)
max_iterations (int) -- Maximum tool use loop cycles (default 10)
flow_name (str | None) -- Override for cost ceiling enforcement (defaults to the current_flow_name ContextVar)
session (AsyncSession | None) -- Reuse a caller-provided DB session for cost recording
structured_output_schema (dict | None) -- Pydantic JSON schema for Anthropic structured output mode
assistant_prefill (str | None) -- Pre-fills the assistant response (to steer output format)

Return Type

@dataclass
class AgentResponse:
    content: str              # Final text response
    tool_results: list[dict]  # Executed tools with inputs/outputs
    usage: dict[str, int]     # {"input_tokens": N, "output_tokens": N, ...}
    cost_usd: float           # Calculated cost (including cache savings)
    model: str                # Model used (from prompt)
    duration_ms: int          # Wall-clock milliseconds
    stop_reason: str          # API stop reason ("end_turn", "tool_use", "max_tokens")

The usage dict may also include cache_read_input_tokens and cache_creation_input_tokens when prompt caching is active.
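
A caller can read cost and cache stats directly off the response. For example (a sketch; assumes prompt caching was active on the call):

response = await call_agent("writer", "article", messages)
cached = response.usage.get("cache_read_input_tokens", 0)
print(f"{response.model}: ${response.cost_usd:.4f}, {cached} cached input tokens")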

Structured Output

Most agents now use Anthropic's structured output mode instead of extracting JSON from free-text responses. When structured_output_schema is provided:

  1. The schema is transformed by schema_transform.py into Anthropic's restricted JSON Schema format
  2. The API returns guaranteed-valid JSON matching the schema
  3. The response is validated with validate_structured_output() (direct model_validate_json, no extraction needed)
  4. If stop_reason == "max_tokens", the output is treated as truncated and a ValueError is raised

The parse_agent_output() function (which uses extract_json() with repair logic) is retained only for backward compatibility with evaluation tests.
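
A typical structured call looks like this (a sketch; the model below is a simplified stand-in, the real output models live in src/agents/schemas.py):

from pydantic import BaseModel

class FactualitySketch(BaseModel):  # simplified stand-in for FactualityOutput
    passed: bool
    score: float

response = await call_agent(
    "factuality_checker", "article", messages,
    structured_output_schema=FactualitySketch.model_json_schema(),
)
result = FactualitySketch.model_validate_json(response.content)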

Prompt Caching

System prompts are sent with cache_control: {"type": "ephemeral"} to enable Anthropic server-side prompt caching:

  • Cache TTL: 5 minutes, refreshed on each cache hit
  • Cache reads: billed at 10% of the normal input token rate
  • Cache writes: billed at a 25% premium over the normal input rate (cache creation only)

This significantly reduces costs for repeated agent calls with the same prompt version, which is the common case in batch production runs.
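
Concretely, the system block sent with each request has this shape (per Anthropic's prompt caching API; system_prompt is the active prompt text loaded from the database):

system = [
    {
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},  # enables server-side caching
    }
]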

Prompt Loading

Prompts are stored in the agent_prompts table and loaded via get_prompt(agent_name, content_type):

  1. Check the in-memory TTL cache (300s)
  2. On a miss, query the DB for the active prompt matching agent_name and content_type
  3. Fall back to content_type='all' if there is no exact match
  4. Raise PromptNotFoundError if no active prompt exists

The cache can be invalidated via invalidate_prompt_cache(agent_name), which is used when prompts are edited in the dashboard.
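
In pseudocode, the lookup order is roughly this (a sketch only; _cache and _query_active are illustrative names, not the real helpers):

async def get_prompt(agent_name: str, content_type: str):
    cached = _cache.get((agent_name, content_type))  # 300s TTL cache
    if cached is not None:
        return cached
    prompt = await _query_active(agent_name, content_type)
    if prompt is None:
        prompt = await _query_active(agent_name, "all")  # fallback
    if prompt is None:
        raise PromptNotFoundError(agent_name, content_type)
    _cache.set((agent_name, content_type), prompt)
    return prompt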

Tool Dispatch

Agents that need external data (e.g., the factuality checker needs PubMed) use the tool use loop:

# In the agent module
_TOOL_MAP = {"search_pubmed": _wrap_search_pubmed, ...}
_TOOL_DEFINITIONS = [SEARCH_PUBMED_SCHEMA, ...]

# Pass to call_agent
response = await call_agent(
    "factuality_checker", content_type,
    messages, tools=_TOOL_MAP, tool_definitions=_TOOL_DEFINITIONS,
)

Inside call_agent(), the loop runs while the model's stop_reason is "tool_use":

  1. Extract tool call blocks from the response
  2. Execute each tool: result = await tools[tool_name](**block.input)
  3. Sanitize the result via sanitize_external_content()
  4. Append tool results to the conversation
  5. Re-invoke the API with updated messages
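
In simplified form, the loop looks like this (a sketch; the real implementation also enforces max_iterations and accumulates usage and cost):

while response.stop_reason == "tool_use":
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        output = await tools[block.name](**block.input)  # step 2
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": sanitize_external_content(output),  # step 3
        })
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": results})  # step 4
    response = await client.messages.create(               # step 5
        model=model, messages=messages,
        tools=tool_definitions, max_tokens=max_tokens,
    )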

Agent Types

Tool-Using Agents

Agents like the Researcher and Factuality Checker use tools to access external data. They:

  1. Define tool wrapper functions that return JSON strings
  2. Build a _TOOL_MAP and _TOOL_DEFINITIONS
  3. Pass these to call_agent() with a higher max_iterations

Generation-Only Agents

Agents like the Writer and Brief Generator produce output without tool calls. They:

  1. Assemble a user message from structured inputs (brief, dossier)
  2. Call call_agent() without tools or tool_definitions
  3. Parse the JSON response into a Pydantic model
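
Putting those steps together (a sketch; build_writer_message is a hypothetical helper, and the token budget is illustrative):

messages = [{"role": "user", "content": build_writer_message(brief, dossier)}]
response = await call_agent("writer", content_type, messages, max_tokens=8192)
draft = WriterOutput.model_validate_json(response.content)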

I/O Schemas

All agent inputs and outputs are defined as Pydantic v2 models in src/agents/schemas.py:

Brief Generator (ArticleBrief) -- Structure, keywords, outline, AEO spec, content template
Researcher (ResearchDossier) -- Sources, evidence summaries, vault notes consulted, FAQ suggestions
Writer (WriterOutput) -- Draft, meta description, SEO title, slug, cited sources
Factuality Checker (FactualityOutput) -- Passed, score, claims checked/verified/flagged, issues with severity/location
SEO Optimizer (SeoOutput) -- Passed, score, deterministic checks (SeoCheckResult), LLM evaluation, suggested fixes
Style Checker (StyleOutput) -- Passed, score, readability stats, structure compliance, voice evaluation, humanizer check, medical language
Synthesis (SynthesisOutput) -- Revised draft, change log, conflict resolutions, requires_human_review flag
Triage (TriageOutput) -- List of TriageResult per source item (relevance, novelty, accessibility scores)
Digest Writer (DigestOutput) -- Editorial intro, item summaries with citations, SEO metadata
Image Generator (ImageGeneratorOutput) -- Image bytes, prompt used, revised prompt

All quality gate agents extend a common QualityGateOutput base with passed, score, and feedback fields.
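
A sketch of that base, using the field names above (the exact definition lives in src/agents/schemas.py):

from pydantic import BaseModel

class QualityGateOutput(BaseModel):
    passed: bool      # did the draft clear this gate?
    score: float      # numeric quality score
    feedback: str     # free-text feedback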

Tool-Using + External API Agents

The Image Generator uses Claude to compose a DALL-E prompt from article context, then calls the OpenAI API via a tool to generate the image. It combines both patterns: LLM reasoning (Anthropic) and external API tool use (OpenAI DALL-E 3).
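
The image-generation tool itself is a thin wrapper around the OpenAI SDK, roughly like this (a hedged sketch; _generate_image and the client setup are illustrative, not the real module):

import json
from openai import AsyncOpenAI

openai_client = AsyncOpenAI()

async def _generate_image(prompt: str) -> str:
    # Claude composes `prompt`; this tool hands it to DALL-E 3.
    result = await openai_client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        response_format="b64_json",
    )
    return json.dumps({
        "image_b64": result.data[0].b64_json,
        "revised_prompt": result.data[0].revised_prompt,
    })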

Adding a New Agent

See the Add an Agent how-to guide for step-by-step instructions.