# Agent Pattern

All AI agents in the pipeline follow a consistent pattern built around `call_agent()` in `src/agents/base.py`.
## How `call_agent()` Works

`call_agent()` is the single entry point for all LLM interactions. It handles:
- Prompt loading -- fetches the active system prompt from the database (cached with 300s TTL)
- Langfuse observation -- starts a generation observation (if Langfuse is configured)
- API call -- sends the request to Anthropic with exponential backoff on 429/5xx errors
- Tool use loop -- if the model requests tool calls, executes them and continues the conversation
- Structured output validation -- when using structured output mode, detects truncation and raises
- Cost tracking -- calculates token costs (including prompt cache savings) and records to the database
- Langfuse update -- records output, usage, cost, and metadata to the generation observation
- Structured logging -- logs agent name, duration, token usage, cost, and cache stats
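A minimal invocation looks roughly like the following sketch; the keyword names match the parameter table below, but the surrounding variables (`draft_text`, `article`) are placeholders, not names from the codebase:

```python
from src.agents.base import call_agent

# Illustrative call: draft_text and article are placeholders.
response = await call_agent(
    agent_name="style_checker",
    content_type="article",
    messages=[{"role": "user", "content": draft_text}],
    article_id=article.id,  # attributed for cost tracking
)
print(response.content, response.cost_usd)
```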
### Parameters
| Parameter | Type | Description |
|---|---|---|
| `agent_name` | `str` | Agent identifier (e.g., `"factuality_checker"`) |
| `content_type` | `str` | Content category for prompt lookup (e.g., `"article"`) |
| `messages` | `list[dict]` | Conversation history |
| `article_id` | `UUID \| None` | For cost attribution |
| `tools` | `dict[str, Callable] \| None` | Tool name to async handler map |
| `tool_definitions` | `list[dict] \| None` | JSON Schema definitions for Claude's tool protocol |
| `max_tokens` | `int` | Token budget (default 4096) |
| `max_iterations` | `int` | Max tool use loop cycles (default 10) |
| `flow_name` | `str \| None` | Override for cost ceiling enforcement (defaults to the `current_flow_name` ContextVar) |
| `session` | `AsyncSession \| None` | Reuse a caller-provided DB session for cost recording |
| `structured_output_schema` | `dict \| None` | Pydantic JSON schema for Anthropic structured output mode |
| `assistant_prefill` | `str \| None` | Pre-fill the assistant response (for steering output format) |
### Return Type
```python
@dataclass
class AgentResponse:
    content: str              # Final text response
    tool_results: list[dict]  # Executed tools with inputs/outputs
    usage: dict[str, int]     # {"input_tokens": N, "output_tokens": N, ...}
    cost_usd: float           # Calculated cost (including cache savings)
    model: str                # Model used (from prompt)
    duration_ms: int          # Wall-clock milliseconds
    stop_reason: str          # API stop reason ("end_turn", "tool_use", "max_tokens")
```
The `usage` dict may also include `cache_read_input_tokens` and `cache_creation_input_tokens` when prompt caching is active.
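Since those keys are only present when caching applies, callers should read them defensively; a sketch:

```python
# Sketch: reading optional cache stats from an AgentResponse.
usage = response.usage
cache_read = usage.get("cache_read_input_tokens", 0)
cache_write = usage.get("cache_creation_input_tokens", 0)
print(f"input={usage['input_tokens']} output={usage['output_tokens']} "
      f"cache_read={cache_read} cache_write={cache_write}")
```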
## Structured Output

Most agents now use Anthropic's structured output mode instead of extracting JSON from free-text responses. When `structured_output_schema` is provided:
- The schema is transformed by `schema_transform.py` into Anthropic's restricted JSON Schema format
- The API returns guaranteed-valid JSON matching the schema
- The response is validated with `validate_structured_output()` (direct `model_validate_json`, no extraction needed)
- Truncation is detected (if `stop_reason == "max_tokens"`) and raises a `ValueError`
The `parse_agent_output()` function (which uses `extract_json()` with repair logic) is retained only for backward compatibility with evaluation tests.
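Putting that together, a structured-output call looks roughly like this sketch, using `FactualityOutput` from the I/O Schemas table below as an example:

```python
from src.agents.schemas import FactualityOutput

response = await call_agent(
    "factuality_checker",
    "article",
    messages,
    structured_output_schema=FactualityOutput.model_json_schema(),
)
# The API guarantees schema-valid JSON, so direct validation suffices:
result = FactualityOutput.model_validate_json(response.content)
```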
## Prompt Caching

System prompts are sent with `cache_control: {"type": "ephemeral"}` to enable Anthropic server-side prompt caching:
- Cache TTL: 24 hours
- Cache reads: billed at 10% of the normal input token rate
- Cache creation: billed at a 25% premium over the normal input rate (first call only)
This significantly reduces costs for repeated agent calls with the same prompt version, which is the common case in batch production runs.
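In Anthropic Messages API terms, the system prompt is passed as a content block with `cache_control` attached. A sketch (the actual call site is inside `call_agent()`; variable names here are illustrative):

```python
# Sketch of how the cached system block is shaped.
message = await client.messages.create(
    model=model,
    system=[{
        "type": "text",
        "text": system_prompt,  # active prompt text loaded from the DB
        "cache_control": {"type": "ephemeral"},
    }],
    messages=messages,
    max_tokens=max_tokens,
)
```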
## Prompt Loading

Prompts are stored in the `agent_prompts` table and loaded via `get_prompt(agent_name, content_type)`:
- Check the in-memory TTL cache (300s)
- On a miss: query the DB for the active prompt matching `agent_name` and `content_type`
- Fall back to `content_type='all'` if there is no exact match
- Raise `PromptNotFoundError` if no active prompt exists
- The cache can be invalidated via `invalidate_prompt_cache(agent_name)` (used by dashboard edits)
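A condensed sketch of that lookup order (helper names like `_cache` and `_query_active` are assumptions, not the real internals):

```python
async def get_prompt(agent_name: str, content_type: str):
    key = (agent_name, content_type)
    if (hit := _cache.get(key)) is not None:  # 300s TTL cache
        return hit
    # Exact match first, then the content_type='all' fallback row.
    prompt = await _query_active(agent_name, content_type)
    if prompt is None:
        prompt = await _query_active(agent_name, "all")
    if prompt is None:
        raise PromptNotFoundError(agent_name, content_type)
    _cache.set(key, prompt)
    return prompt
```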
## Tool Dispatch
Agents that need external data (e.g., the factuality checker needs PubMed) use the tool use loop:
```python
# In the agent module
_TOOL_MAP = {"search_pubmed": _wrap_search_pubmed, ...}
_TOOL_DEFINITIONS = [SEARCH_PUBMED_SCHEMA, ...]

# Pass to call_agent
response = await call_agent(
    "factuality_checker", content_type,
    messages, tools=_TOOL_MAP, tool_definitions=_TOOL_DEFINITIONS,
)
```
Inside `call_agent()`, the loop runs while the model's `stop_reason` is `"tool_use"`:
- Extract tool call blocks from the response
- Execute each tool: `result = await tools[tool_name](**block.input)`
- Sanitize the result via `sanitize_external_content()`
- Append tool results to the conversation
- Re-invoke the API with updated messages
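A simplified sketch of that loop, with Langfuse spans, retries, and cost tracking omitted:

```python
for _ in range(max_iterations):
    message = await client.messages.create(
        model=model, system=system_blocks, messages=messages,
        tools=tool_definitions, max_tokens=max_tokens,
    )
    if message.stop_reason != "tool_use":
        break
    tool_results = []
    for block in message.content:
        if block.type == "tool_use":
            result = await tools[block.name](**block.input)
            result = sanitize_external_content(result)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })
    # Feed the assistant turn and tool results back into the conversation.
    messages.append({"role": "assistant", "content": message.content})
    messages.append({"role": "user", "content": tool_results})
```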
## Agent Types
### Tool-Using Agents
Agents like the Researcher and Factuality Checker use tools to access external data. They:
- Define tool wrapper functions that return JSON strings
- Build a `_TOOL_MAP` and `_TOOL_DEFINITIONS`
- Pass these to `call_agent()` with a higher `max_iterations`
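A tool wrapper in this style might look like the following sketch (the underlying PubMed helper and its return shape are assumptions):

```python
import json

async def _wrap_search_pubmed(query: str, max_results: int = 5) -> str:
    # Hypothetical PubMed client call; the real helper lives elsewhere.
    articles = await search_pubmed(query, max_results=max_results)
    # Handlers return JSON strings so the results can be passed back
    # to the model verbatim as tool_result content.
    return json.dumps([article.model_dump() for article in articles])
```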
### Generation-Only Agents
Agents like the Writer and Brief Generator produce output without tool calls. They:
- Assemble a user message from structured inputs (brief, dossier)
- Call `call_agent()` without `tools` or `tool_definitions`
- Parse the JSON response into a Pydantic model
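For instance, the Writer call might be shaped like this sketch (`build_writer_message` is a hypothetical helper):

```python
from src.agents.schemas import WriterOutput

# build_writer_message is illustrative, not a name from the codebase.
messages = [{"role": "user", "content": build_writer_message(brief, dossier)}]
response = await call_agent(
    "writer", "article", messages,
    structured_output_schema=WriterOutput.model_json_schema(),
)
draft = WriterOutput.model_validate_json(response.content)
```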
## I/O Schemas

All agent inputs and outputs are defined as Pydantic v2 models in `src/agents/schemas.py`:
| Agent | Output Model | Key Fields |
|---|---|---|
| Brief Generator | `ArticleBrief` | Structure, keywords, outline, AEO spec, content template |
| Researcher | `ResearchDossier` | Sources, evidence summaries, vault notes consulted, FAQ suggestions |
| Writer | `WriterOutput` | Draft, meta description, SEO title, slug, cited sources |
| Factuality Checker | `FactualityOutput` | Passed, score, claims checked/verified/flagged, issues with severity/location |
| SEO Optimizer | `SeoOutput` | Passed, score, deterministic checks (`SeoCheckResult`), LLM evaluation, suggested fixes |
| Style Checker | `StyleOutput` | Passed, score, readability stats, structure compliance, voice evaluation, humanizer check, medical language |
| Synthesis | `SynthesisOutput` | Revised draft, change log, conflict resolutions, `requires_human_review` flag |
| Triage | `TriageOutput` | List of `TriageResult` per source item (relevance, novelty, accessibility scores) |
| Digest Writer | `DigestOutput` | Editorial intro, item summaries with citations, SEO metadata |
| Image Generator | `ImageGeneratorOutput` | Image bytes, prompt used, revised prompt |
All quality gate agents extend a common `QualityGateOutput` base with `passed`, `score`, and `feedback` fields.
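A sketch of that base in Pydantic v2 (only the three field names are documented here; the types are assumptions):

```python
from pydantic import BaseModel

class QualityGateOutput(BaseModel):
    passed: bool    # did the gate pass?
    score: float    # numeric quality score
    feedback: str   # feedback consumed by the synthesis step

class StyleOutput(QualityGateOutput):
    ...  # agent-specific fields (readability stats, voice evaluation, ...)
```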
## Tool-Using + External API Agents
The Image Generator uses Claude to compose a DALL-E prompt from article context, then calls the OpenAI API via a tool to generate the image. It combines both patterns: LLM reasoning (Anthropic) and external API tool use (OpenAI DALL-E 3).
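The OpenAI half of that flow could be wrapped as a tool like this sketch (the wrapper name and parameters are assumptions; `images.generate` is the OpenAI Python SDK call):

```python
import json
from openai import AsyncOpenAI

_openai = AsyncOpenAI()

async def _wrap_generate_image(prompt: str) -> str:
    # Claude composes `prompt`; DALL-E 3 renders it.
    result = await _openai.images.generate(
        model="dall-e-3",
        prompt=prompt,
        response_format="b64_json",
    )
    image = result.data[0]
    return json.dumps({
        "revised_prompt": image.revised_prompt,  # DALL-E's rewritten prompt
        "image_b64": image.b64_json,
    })
```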
## Adding a New Agent
See the Add an Agent how-to guide for step-by-step instructions.