Agent Pattern

All AI agents in the pipeline follow a consistent pattern built around call_agent() in src/agents/base.py.

How call_agent() Works

call_agent() is the single entry point for all LLM interactions. It handles:

  1. Prompt loading -- fetches the active system prompt from the database (cached with 300s TTL)
  2. Langfuse observation -- starts a generation observation (if Langfuse is configured)
  3. API call -- sends the request to Anthropic with exponential backoff on 429/5xx errors
  4. Tool use loop -- if the model requests tool calls, executes them and continues the conversation
  5. Structured output validation -- when using structured output mode, validates the JSON response and raises on truncation
  6. Cost tracking -- calculates token costs (including prompt cache savings) and records them to the database
  7. Langfuse update -- records output, usage, cost, and metadata to the generation observation
  8. Structured logging -- logs agent name, duration, token usage, cost, and cache stats
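
A minimal invocation looks like this (a sketch; "summarizer" is a hypothetical agent name, and its prompt is assumed to already be registered in the database):

messages = [{"role": "user", "content": "Summarize the attached study."}]

response = await call_agent(
    agent_name="summarizer",   # hypothetical agent, for illustration
    content_type="article",
    messages=messages,
    max_tokens=2048,
)
print(response.content)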

Parameters

agent_name (str) -- Agent identifier (e.g., "factuality_checker")
content_type (str) -- Content category for prompt lookup (e.g., "article")
messages (list[dict]) -- Conversation history
article_id (UUID | None) -- For cost attribution
tools (dict[str, Callable] | None) -- Map of tool names to async handlers
tool_definitions (list[dict] | None) -- JSON Schema definitions for Claude's tool protocol
max_tokens (int) -- Token budget (default 4096)
max_iterations (int) -- Maximum tool use loop cycles (default 10)
flow_name (str | None) -- Override for cost ceiling enforcement (defaults to the current_flow_name ContextVar)
session (AsyncSession | None) -- Reuse a caller-provided DB session for cost recording
structured_output_schema (dict | None) -- Pydantic JSON schema for Anthropic structured output mode
assistant_prefill (str | None) -- Pre-fills the assistant response (to steer output format)

Return Type

@dataclass
class AgentResponse:
    content: str              # Final text response
    tool_results: list[dict]  # Executed tools with inputs/outputs
    usage: dict[str, int]     # {"input_tokens": N, "output_tokens": N, ...}
    cost_usd: float           # Calculated cost (including cache savings)
    model: str                # Model used (from prompt)
    duration_ms: int          # Wall-clock milliseconds
    stop_reason: str          # API stop reason ("end_turn", "tool_use", "max_tokens")

The usage dict may also include cache_read_input_tokens and cache_creation_input_tokens when prompt caching is active.
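
A caller can read cost and cache stats directly off the response. For example (a sketch; assumes prompt caching was active on the call):

response = await call_agent("writer", "article", messages)
cached = response.usage.get("cache_read_input_tokens", 0)
print(f"{response.model}: ${response.cost_usd:.4f}, {cached} cached input tokens")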

Structured Output

Most agents now use Anthropic's structured output mode instead of extracting JSON from free-text responses. When structured_output_schema is provided:

  1. The schema is transformed by schema_transform.py into Anthropic's restricted JSON Schema format
  2. The API returns guaranteed-valid JSON matching the schema
  3. The response is validated with validate_structured_output() (direct model_validate_json, no extraction needed)
  4. If stop_reason == "max_tokens", the output is treated as truncated and a ValueError is raised

The parse_agent_output() function (which uses extract_json() with repair logic) is retained only for backward compatibility with evaluation tests.
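
A typical structured call looks like this (a sketch; the model below is a simplified stand-in, the real output models live in src/agents/schemas.py):

from pydantic import BaseModel

class FactualitySketch(BaseModel):  # simplified stand-in for FactualityOutput
    passed: bool
    score: float

response = await call_agent(
    "factuality_checker", "article", messages,
    structured_output_schema=FactualitySketch.model_json_schema(),
)
result = FactualitySketch.model_validate_json(response.content)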

Prompt Caching

System prompts are sent with cache_control: {"type": "ephemeral"} to enable Anthropic server-side prompt caching:

  • Cache TTL: 5 minutes, refreshed on each cache hit
  • Cache reads: billed at 10% of the normal input token rate
  • Cache writes: billed at a 25% premium over the normal input rate (cache creation only)

This significantly reduces costs for repeated agent calls with the same prompt version, which is the common case in batch production runs.
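
Concretely, the system block sent with each request has this shape (per Anthropic's prompt caching API; system_prompt is the active prompt text loaded from the database):

system = [
    {
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},  # enables server-side caching
    }
]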

Prompt Loading

Prompts are stored in the agent_prompts table and loaded via get_prompt(agent_name, content_type):

  1. Check the in-memory TTL cache (300s)
  2. On a miss, query the DB for the active prompt matching agent_name and content_type
  3. Fall back to content_type='all' if there is no exact match
  4. Raise PromptNotFoundError if no active prompt exists

The cache can be invalidated via invalidate_prompt_cache(agent_name), which is used when prompts are edited in the dashboard.
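
In pseudocode, the lookup order is roughly this (a sketch only; _cache and _query_active are illustrative names, not the real helpers):

async def get_prompt(agent_name: str, content_type: str):
    cached = _cache.get((agent_name, content_type))  # 300s TTL cache
    if cached is not None:
        return cached
    prompt = await _query_active(agent_name, content_type)
    if prompt is None:
        prompt = await _query_active(agent_name, "all")  # fallback
    if prompt is None:
        raise PromptNotFoundError(agent_name, content_type)
    _cache.set((agent_name, content_type), prompt)
    return prompt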

Tool Dispatch

Agents that need external data (e.g., the factuality checker needs PubMed) use the tool use loop:

# In the agent module
_TOOL_MAP = {"search_pubmed": _wrap_search_pubmed, ...}
_TOOL_DEFINITIONS = [SEARCH_PUBMED_SCHEMA, ...]

# Pass to call_agent
response = await call_agent(
    "factuality_checker", content_type,
    messages, tools=_TOOL_MAP, tool_definitions=_TOOL_DEFINITIONS,
)

Inside call_agent(), the loop runs while the model's stop_reason is "tool_use":

  1. Extract tool call blocks from the response
  2. Execute each tool: result = await tools[tool_name](**block.input)
  3. Sanitize the result via sanitize_external_content()
  4. Append tool results to the conversation
  5. Re-invoke the API with updated messages
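
In simplified form, the loop looks like this (a sketch; the real implementation also enforces max_iterations and accumulates usage and cost):

while response.stop_reason == "tool_use":
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        output = await tools[block.name](**block.input)  # step 2
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": sanitize_external_content(output),  # step 3
        })
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": results})  # step 4
    response = await client.messages.create(               # step 5
        model=model, messages=messages,
        tools=tool_definitions, max_tokens=max_tokens,
    )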

Agent Types

Tool-Using Agents

Agents like the Researcher and Factuality Checker use tools to access external data. They:

  1. Define tool wrapper functions that return JSON strings
  2. Build a _TOOL_MAP and _TOOL_DEFINITIONS
  3. Pass these to call_agent() with a higher max_iterations

Generation-Only Agents

Agents like the Writer and Brief Generator produce output without tool calls. They:

  1. Assemble a user message from structured inputs (brief, dossier)
  2. Call call_agent() without tools or tool_definitions
  3. Parse the JSON response into a Pydantic model
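
Putting those steps together (a sketch; build_writer_message is a hypothetical helper, and the token budget is illustrative):

messages = [{"role": "user", "content": build_writer_message(brief, dossier)}]
response = await call_agent("writer", content_type, messages, max_tokens=8192)
draft = WriterOutput.model_validate_json(response.content)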

I/O Schemas

All agent inputs and outputs are defined as Pydantic v2 models in src/agents/schemas.py:

Brief Generator (ArticleBrief) -- Structure, keywords, outline, AEO spec, content template
Researcher (ResearchDossier) -- Sources, evidence summaries, vault notes consulted, FAQ suggestions
Writer (WriterOutput) -- Draft, meta description, SEO title, slug, cited sources
Factuality Checker (FactualityOutput) -- Passed, score, claims checked/verified/flagged, issues with severity/location
SEO Optimizer (SeoOutput) -- Passed, score, deterministic checks (SeoCheckResult), LLM evaluation, suggested fixes
Style Checker (StyleOutput) -- Passed, score, readability stats, structure compliance, voice evaluation, humanizer check, medical language
Synthesis (SynthesisOutput) -- Revised draft, change log, conflict resolutions, requires_human_review flag
Triage (TriageOutput) -- List of TriageResult per source item (relevance, novelty, accessibility scores)
Digest Writer (DigestOutput) -- Editorial intro, item summaries with citations, SEO metadata
Image Generator (ImageGeneratorOutput) -- Image bytes, prompt used, revised prompt

All quality gate agents extend a common QualityGateOutput base with passed, score, and feedback fields.
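
A sketch of that base, using the field names above (the exact definition lives in src/agents/schemas.py):

from pydantic import BaseModel

class QualityGateOutput(BaseModel):
    passed: bool      # did the draft clear this gate?
    score: float      # numeric quality score
    feedback: str     # free-text feedback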

Tool-Using + External API Agents

The Image Generator uses Claude to compose a DALL-E prompt from article context, then calls the OpenAI API via a tool to generate the image. It combines both patterns: LLM reasoning (Anthropic) and external API tool use (OpenAI DALL-E 3).
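
The image-generation tool itself is a thin wrapper around the OpenAI SDK, roughly like this (a hedged sketch; _generate_image and the client setup are illustrative, not the real module):

import json
from openai import AsyncOpenAI

openai_client = AsyncOpenAI()

async def _generate_image(prompt: str) -> str:
    # Claude composes `prompt`; this tool hands it to DALL-E 3.
    result = await openai_client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        response_format="b64_json",
    )
    return json.dumps({
        "image_b64": result.data[0].b64_json,
        "revised_prompt": result.data[0].revised_prompt,
    })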

Adding a New Agent

See the Add an Agent how-to guide for step-by-step instructions.