# Architecture

## System Overview
```mermaid
graph TB
    subgraph External
        WP[WordPress]
        PubMed[PubMed API]
        BraveSearch[Brave Search]
        Vault[Obsidian Vault]
        Anthropic[Anthropic Claude API]
        OpenAI[OpenAI API]
    end
    subgraph Pipeline["Pipeline (Prefect)"]
        BF[Brief Flow]
        AF[Article Flow]
        PF[Publish Flow]
        RNF[Research News Flow]
    end
    subgraph Agents
        BG[Brief Generator]
        RES[Researcher]
        WR[Writer]
        FC[Factuality Checker]
        SEO[SEO Optimizer]
        SC[Style Checker]
        SYN[Synthesis]
        IMG[Image Generator]
    end
    subgraph Storage
        DB[(Neon Postgres)]
    end
    subgraph Dashboard["Dashboard (Streamlit)"]
        UI[Operator Dashboard]
    end
    BF --> BG
    AF --> RES --> WR --> FC & SEO & SC --> SYN
    PF --> WP
    RES --> PubMed & BraveSearch & Vault
    BG & RES & WR & FC & SEO & SC & SYN --> Anthropic
    IMG --> OpenAI
    AF & BF & PF & RNF --> DB
    UI --> DB
```
## Component Roles

### Pipeline (Prefect)
Prefect flows handle orchestration — sequencing agent calls, managing retries, and tracking runs. No business logic lives in flows. Each flow is a thin wrapper that calls agents and persists results.
### Agents
Agents are the core business logic layer. Each agent:
- Loads its prompt from the database via `call_agent()`
- Sends a request to the Anthropic API
- Returns a validated Pydantic model
See Agent Pattern for the full convention.
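The convention above can be sketched in a few lines. This is illustrative only: the names `load_prompt`, the in-memory `PROMPTS` dict, and the `Brief` model are stand-ins for the real database tables, Anthropic call, and Pydantic models, and the LLM request is stubbed out.

```python
from dataclasses import dataclass

# Stand-in for the versioned prompts table; the real code reads Postgres.
PROMPTS = {("brief_generator", 3): "You generate structured article briefs."}


@dataclass
class Brief:
    """Stand-in for the validated Pydantic response model."""
    title: str
    angle: str


def load_prompt(agent: str, version: int) -> str:
    return PROMPTS[(agent, version)]


def call_agent(agent: str, version: int, payload: dict) -> Brief:
    prompt = load_prompt(agent, version)   # 1. prompt loaded from the DB
    # 2. LLM call stubbed: the real agent sends `prompt` + payload to Anthropic
    raw = {"title": payload["topic"], "angle": "evidence summary"}
    return Brief(**raw)                    # 3. response validated into a typed model


brief = call_agent("brief_generator", 3, {"topic": "Vitamin D"})
```

The key property is that callers only ever see a typed model, never raw LLM text.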
### Tools
Tools provide agents with external data access:
- PubMed — searches medical literature
- Web Search — queries Brave Search API
- Vault Reader — searches research notes via pgvector hybrid search (semantic + full-text)
- Image Generator — creates featured images via DALL-E 3
- Source Scanner — extracts and validates source references
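To make the Vault Reader's hybrid search concrete, here is a hedged sketch of the kind of SQL it might issue: table and column names (`vault_notes`, `embedding`, `content`), the `0.6`/`0.4` weighting, and the bound parameters are all assumptions, not the project's actual query. It combines pgvector cosine distance (`<=>`) with Postgres full-text ranking (`ts_rank`).

```python
# Builds an illustrative hybrid-search query (semantic + full-text).
# Schema and score weights are hypothetical.
def hybrid_search_sql(k: int = 10) -> str:
    return f"""
    SELECT id, content,
           1 - (embedding <=> :query_vec)                    AS semantic_score,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', :query_text))  AS text_score
    FROM vault_notes
    ORDER BY 0.6 * (1 - (embedding <=> :query_vec))
           + 0.4 * ts_rank(to_tsvector('english', content),
                           plainto_tsquery('english', :query_text)) DESC
    LIMIT {k}
    """


sql = hybrid_search_sql(5)
```

Blending both scores in one `ORDER BY` keeps ranking inside Postgres, so no re-ranking pass is needed in Python.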
### WordPress Publisher
The publishing layer converts articles to WordPress format:
- Markdown to Gutenberg blocks
- ACF custom field mapping
- Polylang language association (EN/DE)
- Taxonomy assignment (categories, tags, pillars)
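The Markdown-to-Gutenberg step boils down to wrapping each content chunk in WordPress block delimiter comments. A minimal sketch, assuming only `##` headings and plain paragraphs (the real converter handles the full block grammar, ACF fields, and taxonomy):

```python
# Hypothetical converter: only handles h2 headings and paragraphs.
def md_to_gutenberg(md: str) -> str:
    blocks = []
    for chunk in md.strip().split("\n\n"):
        if chunk.startswith("## "):
            text = chunk[3:]
            blocks.append(
                f"<!-- wp:heading -->\n<h2>{text}</h2>\n<!-- /wp:heading -->"
            )
        else:
            blocks.append(
                f"<!-- wp:paragraph -->\n<p>{chunk}</p>\n<!-- /wp:paragraph -->"
            )
    return "\n\n".join(blocks)


html = md_to_gutenberg("## Benefits\n\nVitamin D supports bone health.")
```

Gutenberg stores blocks as HTML comments around ordinary markup, which is why the output is valid HTML even before WordPress parses it.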
### Database
Neon Postgres stores all pipeline state:
- Content backlog and article lifecycle
- Agent prompts (versioned)
- Run tracking and cost data
- Research news feed results
- Trusted sources catalog (authoritative sources for brief generation)
- Vault note embeddings (pgvector hybrid search)
### Dashboard
The Streamlit dashboard provides operator visibility into pipeline status, article details, cost trends, and prompt management. It uses its own database connection pool (NullPool) to avoid event loop conflicts.
## Data Flow
A typical article production follows this path:
1. Backlog item created (dashboard or CSV import)
2. Brief flow generates a structured article brief
3. Article flow runs the full lifecycle:
    - Researcher gathers evidence (PubMed, web, vault)
    - Writer produces a draft from brief + dossier
    - Quality gates run in parallel (factuality, SEO, readability)
    - Synthesis reconciles feedback
    - If any gate fails, the writer rewrites (up to 3 times)
4. Operator reviews in the dashboard
5. Publish flow pushes approved articles to WordPress as drafts
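The gate-and-rewrite loop in step 3 can be sketched schematically. The gate logic here is a stub (the real gates are LLM agents run in parallel over the Anthropic API), and the "rewrite" is simulated, but the control flow (check all gates, rewrite on failure, cap at 3 attempts) mirrors the description above.

```python
MAX_REWRITES = 3


def run_gates(draft: str) -> dict[str, bool]:
    # Stubbed quality gates; real ones are factuality/SEO/style agents.
    return {"factuality": True, "seo": "keyword" in draft, "style": True}


def produce_article(draft: str) -> tuple[str, int]:
    rewrites = 0
    while rewrites <= MAX_REWRITES:
        results = run_gates(draft)
        if all(results.values()):
            return draft, rewrites
        # Synthesis would reconcile gate feedback; here we just append a fix.
        draft = draft + " keyword"
        rewrites += 1
    raise RuntimeError("article failed quality gates after max rewrites")


article, rewrites = produce_article("first draft")
```

Bounding the loop keeps a single stubborn article from burning the token budget.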
## Key Design Decisions
- Prompts as data — stored in DB, versioned via Alembic, not hardcoded
- SQLAlchemy Core only — no ORM; direct `select`/`insert`/`update` with `Table` + `MetaData`
- Async by default — `asyncpg` for Postgres, `httpx` for HTTP
- Strict typing — mypy strict mode, Pydantic v2 strict validation
- Cost ceilings — per-article and monthly token budgets enforced at runtime
- pgvector hybrid search — vault knowledge base uses semantic (vector) + full-text search in Postgres
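Runtime cost-ceiling enforcement might look like the following. This is a hedged sketch: the `CostTracker` class, the limits, and the per-token pricing are made-up illustrations, not the project's real configuration or API.

```python
class BudgetExceeded(Exception):
    """Raised when a token-spend ceiling is hit at runtime."""


class CostTracker:
    def __init__(self, article_limit_usd: float, monthly_limit_usd: float):
        self.article_limit = article_limit_usd
        self.monthly_limit = monthly_limit_usd
        self.article_spend = 0.0
        self.monthly_spend = 0.0

    def record(self, tokens: int, usd_per_1k_tokens: float) -> None:
        # Accumulate cost for this LLM call, then enforce both ceilings.
        cost = tokens / 1000 * usd_per_1k_tokens
        self.article_spend += cost
        self.monthly_spend += cost
        if self.article_spend > self.article_limit:
            raise BudgetExceeded("per-article ceiling hit")
        if self.monthly_spend > self.monthly_limit:
            raise BudgetExceeded("monthly ceiling hit")


tracker = CostTracker(article_limit_usd=2.0, monthly_limit_usd=100.0)
tracker.record(tokens=50_000, usd_per_1k_tokens=0.015)  # 0.75 USD, within budget
```

Checking after every recorded call means a runaway rewrite loop fails fast instead of silently overspending.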