# Architecture

## System Overview
```mermaid
graph TB
    subgraph External
        WP[WordPress]
        PubMed[PubMed API]
        BraveSearch[Brave Search]
        Vault[Obsidian Vault]
        Anthropic[Anthropic Claude API]
        OpenAI[OpenAI API]
    end
    subgraph Pipeline["Pipeline (Prefect)"]
        BF[Brief Flow]
        AF[Article Flow]
        PF[Publish Flow]
        RNF[Research News Flow]
    end
    subgraph Agents
        BG[Brief Generator]
        RES[Researcher]
        WR[Writer]
        FC[Factuality Checker]
        SEO[SEO Optimizer]
        SC[Style Checker]
        SYN[Synthesis]
        IMG[Image Generator]
    end
    subgraph Storage
        DB[(Neon Postgres)]
    end
    subgraph Dashboard["Dashboard (Streamlit)"]
        UI[Operator Dashboard]
    end
    BF --> BG
    AF --> RES --> WR --> FC & SEO & SC --> SYN
    PF --> WP
    RES --> PubMed & BraveSearch & Vault
    BG & RES & WR & FC & SEO & SC & SYN --> Anthropic
    IMG --> OpenAI
    AF & BF & PF & RNF --> DB
    UI --> DB
```
## Component Roles

### Pipeline (Prefect)
Prefect flows handle orchestration — sequencing agent calls, managing retries, and tracking runs. No business logic lives in flows. Each flow is a thin wrapper that calls agents and persists results.
### Agents
Agents are the core business logic layer. Each agent:
- Loads its prompt from the database via `call_agent()`
- Sends a request to the Anthropic API
- Returns a validated Pydantic model
See Agent Pattern for the full convention.
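The convention above can be sketched in a few lines. This is illustrative only: the names `load_prompt`, the in-memory `PROMPTS` dict, and the `Brief` model are stand-ins for the real database tables, Anthropic call, and Pydantic models, and the LLM request is stubbed out.

```python
from dataclasses import dataclass

# Stand-in for the versioned prompts table; the real code reads Postgres.
PROMPTS = {("brief_generator", 3): "You generate structured article briefs."}


@dataclass
class Brief:
    """Stand-in for the validated Pydantic response model."""
    title: str
    angle: str


def load_prompt(agent: str, version: int) -> str:
    return PROMPTS[(agent, version)]


def call_agent(agent: str, version: int, payload: dict) -> Brief:
    prompt = load_prompt(agent, version)   # 1. prompt loaded from the DB
    # 2. LLM call stubbed: the real agent sends `prompt` + payload to Anthropic
    raw = {"title": payload["topic"], "angle": "evidence summary"}
    return Brief(**raw)                    # 3. response validated into a typed model


brief = call_agent("brief_generator", 3, {"topic": "Vitamin D"})
```

The key property is that callers only ever see a typed model, never raw LLM text.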
### Tools
Tools provide agents with external data access:
- PubMed — searches medical literature
- Web Search — queries Brave Search API
- Vault Reader — searches research notes via pgvector hybrid search (semantic + full-text)
- Image Generator — creates featured images via DALL-E 3
- Source Scanner — extracts and validates source references
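To make the Vault Reader's hybrid search concrete, here is a hedged sketch of the kind of SQL it might issue: table and column names (`vault_notes`, `embedding`, `content`), the `0.6`/`0.4` weighting, and the bound parameters are all assumptions, not the project's actual query. It combines pgvector cosine distance (`<=>`) with Postgres full-text ranking (`ts_rank`).

```python
# Builds an illustrative hybrid-search query (semantic + full-text).
# Schema and score weights are hypothetical.
def hybrid_search_sql(k: int = 10) -> str:
    return f"""
    SELECT id, content,
           1 - (embedding <=> :query_vec)                    AS semantic_score,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', :query_text))  AS text_score
    FROM vault_notes
    ORDER BY 0.6 * (1 - (embedding <=> :query_vec))
           + 0.4 * ts_rank(to_tsvector('english', content),
                           plainto_tsquery('english', :query_text)) DESC
    LIMIT {k}
    """


sql = hybrid_search_sql(5)
```

Blending both scores in one `ORDER BY` keeps ranking inside Postgres, so no re-ranking pass is needed in Python.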
### WordPress Publisher
The publishing layer converts articles to WordPress format:
- Markdown to Gutenberg blocks
- ACF custom field mapping
- Polylang language association (EN/DE)
- Taxonomy assignment (categories, tags, pillars)
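The Markdown-to-Gutenberg step boils down to wrapping each content chunk in WordPress block delimiter comments. A minimal sketch, assuming only `##` headings and plain paragraphs (the real converter handles the full block grammar, ACF fields, and taxonomy):

```python
# Hypothetical converter: only handles h2 headings and paragraphs.
def md_to_gutenberg(md: str) -> str:
    blocks = []
    for chunk in md.strip().split("\n\n"):
        if chunk.startswith("## "):
            text = chunk[3:]
            blocks.append(
                f"<!-- wp:heading -->\n<h2>{text}</h2>\n<!-- /wp:heading -->"
            )
        else:
            blocks.append(
                f"<!-- wp:paragraph -->\n<p>{chunk}</p>\n<!-- /wp:paragraph -->"
            )
    return "\n\n".join(blocks)


html = md_to_gutenberg("## Benefits\n\nVitamin D supports bone health.")
```

Gutenberg stores blocks as HTML comments around ordinary markup, which is why the output is valid HTML even before WordPress parses it.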
### Database
Neon Postgres stores all pipeline state:
- Content backlog and article lifecycle
- Agent prompts (versioned)
- Run tracking and cost data
- Research news feed results
- Trusted sources catalog (authoritative sources for brief generation)
- Vault note embeddings (pgvector hybrid search)
### Dashboard
The Streamlit dashboard provides operator visibility into pipeline status, article details, cost trends, and prompt management. It uses its own database connection pool (NullPool) to avoid event loop conflicts.
## Data Flow
A typical article production follows this path:
1. Backlog item created (dashboard or CSV import)
2. Brief flow generates a structured article brief
3. Article flow runs the full lifecycle:
    - Researcher gathers evidence (PubMed, web, vault)
    - Writer produces a draft from brief + dossier
    - Quality gates run in parallel (factuality, SEO, readability)
    - Synthesis reconciles feedback
    - If any gate fails, the writer rewrites (up to 3 times)
4. Operator reviews in the dashboard
5. Publish flow pushes approved articles to WordPress as drafts
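The gate-and-rewrite loop in step 3 can be sketched schematically. The gate logic here is a stub (the real gates are LLM agents run in parallel over the Anthropic API), and the "rewrite" is simulated, but the control flow (check all gates, rewrite on failure, cap at 3 attempts) mirrors the description above.

```python
MAX_REWRITES = 3


def run_gates(draft: str) -> dict[str, bool]:
    # Stubbed quality gates; real ones are factuality/SEO/style agents.
    return {"factuality": True, "seo": "keyword" in draft, "style": True}


def produce_article(draft: str) -> tuple[str, int]:
    rewrites = 0
    while rewrites <= MAX_REWRITES:
        results = run_gates(draft)
        if all(results.values()):
            return draft, rewrites
        # Synthesis would reconcile gate feedback; here we just append a fix.
        draft = draft + " keyword"
        rewrites += 1
    raise RuntimeError("article failed quality gates after max rewrites")


article, rewrites = produce_article("first draft")
```

Bounding the loop keeps a single stubborn article from burning the token budget.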
## Key Design Decisions
- Prompts as data — stored in DB, versioned via Alembic, not hardcoded
- SQLAlchemy Core only — no ORM; direct `select`/`insert`/`update` with `Table` + `MetaData`
- Async by default — `asyncpg` for Postgres, `httpx` for HTTP
- Strict typing — mypy strict mode, Pydantic v2 strict validation
- Cost ceilings — per-article and monthly token budgets enforced at runtime
- pgvector hybrid search — vault knowledge base uses semantic (vector) + full-text search in Postgres
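Runtime cost-ceiling enforcement might look like the following. This is a hedged sketch: the `CostTracker` class, the limits, and the per-token pricing are made-up illustrations, not the project's real configuration or API.

```python
class BudgetExceeded(Exception):
    """Raised when a token-spend ceiling is hit at runtime."""


class CostTracker:
    def __init__(self, article_limit_usd: float, monthly_limit_usd: float):
        self.article_limit = article_limit_usd
        self.monthly_limit = monthly_limit_usd
        self.article_spend = 0.0
        self.monthly_spend = 0.0

    def record(self, tokens: int, usd_per_1k_tokens: float) -> None:
        # Accumulate cost for this LLM call, then enforce both ceilings.
        cost = tokens / 1000 * usd_per_1k_tokens
        self.article_spend += cost
        self.monthly_spend += cost
        if self.article_spend > self.article_limit:
            raise BudgetExceeded("per-article ceiling hit")
        if self.monthly_spend > self.monthly_limit:
            raise BudgetExceeded("monthly ceiling hit")


tracker = CostTracker(article_limit_usd=2.0, monthly_limit_usd=100.0)
tracker.record(tokens=50_000, usd_per_1k_tokens=0.015)  # 0.75 USD, within budget
```

Checking after every recorded call means a runaway rewrite loop fails fast instead of silently overspending.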