
Architecture

System Overview

graph TB
    subgraph External
        WP[WordPress]
        PubMed[PubMed API]
        BraveSearch[Brave Search]
        Vault[Obsidian Vault]
        Anthropic[Anthropic Claude API]
        OpenAI[OpenAI API]
    end

    subgraph Pipeline["Pipeline (Prefect)"]
        BF[Brief Flow]
        AF[Article Flow]
        PF[Publish Flow]
        RNF[Research News Flow]
    end

    subgraph Agents
        BG[Brief Generator]
        RES[Researcher]
        WR[Writer]
        FC[Factuality Checker]
        SEO[SEO Optimizer]
        SC[Style Checker]
        SYN[Synthesis]
        IMG[Image Generator]
    end

    subgraph Storage
        DB[(Neon Postgres)]
    end

    subgraph Dashboard["Dashboard (Streamlit)"]
        UI[Operator Dashboard]
    end

    BF --> BG
    AF --> RES --> WR --> FC & SEO & SC --> SYN
    PF --> WP
    RES --> PubMed & BraveSearch & Vault
    BG & RES & WR & FC & SEO & SC & SYN --> Anthropic
    IMG --> OpenAI
    AF & BF & PF & RNF --> DB
    UI --> DB

Component Roles

Pipeline (Prefect)

Prefect flows handle orchestration — sequencing agent calls, managing retries, and tracking runs. No business logic lives in flows. Each flow is a thin wrapper that calls agents and persists results.

Agents

Agents are the core business logic layer. Each agent:

  • Loads its prompt from the database via call_agent()
  • Sends a request to the Anthropic API
  • Returns a validated Pydantic model

See Agent Pattern for the full convention.
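
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual call_agent(): the signature, the in-memory PROMPTS table, the Brief model, and the "highest version wins" policy are all assumptions. A dataclass stands in for the Pydantic model and the Anthropic call is replaced by a stub so the example runs offline.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class Brief:
    """Stand-in for a Pydantic output model (hypothetical fields)."""

    title: str
    keywords: list[str]

    @classmethod
    def validate(cls, raw: str) -> "Brief":
        # Parse the model's JSON reply and reject wrongly typed fields.
        data = json.loads(raw)
        title, keywords = data["title"], data["keywords"]
        if not isinstance(title, str):
            raise TypeError("title must be a string")
        if not (isinstance(keywords, list) and all(isinstance(k, str) for k in keywords)):
            raise TypeError("keywords must be a list of strings")
        return cls(title=title, keywords=keywords)


# Hypothetical stand-in for the prompts table: (agent, version) -> template.
PROMPTS = {
    ("brief_generator", 1): "Write a brief about {topic}.",
    ("brief_generator", 2): "Write a structured JSON brief about {topic}.",
}


def load_prompt(agent: str) -> str:
    """Return the highest-versioned prompt for an agent (assumed policy)."""
    versions = [version for (name, version) in PROMPTS if name == agent]
    return PROMPTS[(agent, max(versions))]


async def call_agent(
    agent: str,
    complete: Callable[[str], Awaitable[str]],
    **variables: str,
) -> Brief:
    prompt = load_prompt(agent).format(**variables)  # 1. load the prompt
    raw = await complete(prompt)                     # 2. send the request
    return Brief.validate(raw)                       # 3. return a validated model


async def fake_llm(prompt: str) -> str:
    # Stub for the LLM API: echoes the prompt back inside a JSON reply.
    return json.dumps({"title": prompt, "keywords": ["sleep"]})


brief = asyncio.run(call_agent("brief_generator", fake_llm, topic="sleep"))
```

Passing the completion function in as an argument keeps the sketch testable without network access; the real agents may wire the client differently.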

Tools

Tools provide agents with external data access:

  • PubMed — searches medical literature
  • Web Search — queries Brave Search API
  • Vault Reader — searches research notes via pgvector hybrid search (semantic + full-text)
  • Image Generator — creates featured images via DALL-E 3
  • Source Scanner — extracts and validates source references
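
The page does not say how the Vault Reader merges its semantic and full-text rankings. One common technique for combining two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name, the note IDs, and the constant k=60 are assumptions, and the real tool may merge results inside Postgres instead.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list adds 1 / (k + rank) to an item's score."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["note-a", "note-b", "note-c"]   # hypothetical vector-search order
full_text = ["note-b", "note-d", "note-a"]  # hypothetical full-text order
fused = reciprocal_rank_fusion([semantic, full_text])
```

Notes that appear in both lists (here note-a and note-b) rise above notes found by only one search.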

WordPress Publisher

The publishing layer converts articles to WordPress format:

  • Markdown to Gutenberg blocks
  • ACF custom field mapping
  • Polylang language association (EN/DE)
  • Taxonomy assignment (categories, tags, pillars)
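
To make the first step concrete, here is a minimal sketch of turning Markdown into serialized Gutenberg block markup. It handles only headings and paragraphs separated by blank lines, and the function name is hypothetical. The project's converter presumably covers far more (lists, images, inline formatting), and ACF, Polylang, and taxonomy handling are not shown.

```python
import html
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_to_blocks(markdown: str) -> str:
    """Convert blank-line-separated Markdown chunks to Gutenberg block markup."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", markdown.strip()):
        # Collapse hard-wrapped lines inside one chunk into a single line.
        text = " ".join(line.strip() for line in chunk.splitlines())
        match = HEADING.match(text)
        if match:
            level = len(match.group(1))
            body = html.escape(match.group(2), quote=False)
            # Level 2 is the heading block's default, so it carries no attribute.
            attrs = "" if level == 2 else f' {{"level":{level}}}'
            blocks.append(
                f"<!-- wp:heading{attrs} -->\n"
                f'<h{level} class="wp-block-heading">{body}</h{level}>\n'
                "<!-- /wp:heading -->"
            )
        else:
            body = html.escape(text, quote=False)
            blocks.append(
                f"<!-- wp:paragraph -->\n<p>{body}</p>\n<!-- /wp:paragraph -->"
            )
    return "\n\n".join(blocks)


post_content = markdown_to_blocks("## Intro\n\nSleep & health")
```

The resulting string is the kind of value that would go into a post's content field when the publisher creates a draft.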

Database

Neon Postgres stores all pipeline state:

  • Content backlog and article lifecycle
  • Agent prompts (versioned)
  • Run tracking and cost data
  • Research news feed results
  • Trusted sources catalog (authoritative sources for brief generation)
  • Vault note embeddings (pgvector hybrid search)

Dashboard

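The Streamlit dashboard gives operators visibility into pipeline status, article details, cost trends, and prompt management. It creates its own database engine configured with NullPool, so each operation opens a fresh connection instead of reusing one bound to a different event loop, which avoids event loop conflicts.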

Data Flow

A typical article production follows this path:

  1. Backlog item created (dashboard or CSV import)
  2. Brief flow generates a structured article brief
  3. Article flow runs the full lifecycle:
    • Researcher gathers evidence (PubMed, web, vault)
    • Writer produces a draft from brief + dossier
    • Quality gates run in parallel (factuality, SEO, readability)
    • Synthesis reconciles feedback
    • If any gate fails, the writer rewrites the draft using the reconciled feedback (up to 3 times)
  4. Operator reviews in the dashboard
  5. Publish flow pushes approved articles to WordPress as drafts
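
The inner loop of step 3 (parallel gates, synthesis, bounded rewrites) can be sketched as below. This is an illustrative sketch under stated assumptions: the gate checks, the stub writer, and all function names are invented for the example, and only the limit of 3 rewrites comes from the text above. Real agents would call the LLM where these stubs apply trivial rules.

```python
import asyncio
from dataclasses import dataclass

MAX_REWRITES = 3  # from the text: the writer rewrites up to 3 times


@dataclass
class GateResult:
    gate: str
    passed: bool
    feedback: str = ""


async def factuality_gate(draft: str) -> GateResult:
    ok = "[citation needed]" not in draft  # toy rule standing in for an agent
    return GateResult("factuality", ok, "" if ok else "add sources")


async def seo_gate(draft: str) -> GateResult:
    ok = len(draft.split()) >= 5  # toy rule standing in for an agent
    return GateResult("seo", ok, "" if ok else "too short")


async def readability_gate(draft: str) -> GateResult:
    return GateResult("readability", True)


async def write(brief: str, feedback: list[str]) -> str:
    # Stub writer: the first draft is flawed, a rewrite with feedback fixes it.
    if not feedback:
        return f"{brief} [citation needed]"
    return f"{brief} is supported by three cited trials"


async def produce_article(brief: str) -> tuple[str, int, bool]:
    """Return (final draft, rewrites used, whether every gate passed)."""
    draft = await write(brief, [])
    rewrites = 0
    while True:
        # Quality gates run concurrently on the same draft.
        results = await asyncio.gather(
            factuality_gate(draft), seo_gate(draft), readability_gate(draft)
        )
        failed = [r for r in results if not r.passed]
        if not failed or rewrites == MAX_REWRITES:
            return draft, rewrites, not failed
        # Synthesis step: reconcile gate feedback into one list for the writer.
        feedback = [f"{r.gate}: {r.feedback}" for r in failed]
        draft = await write(brief, feedback)
        rewrites += 1


draft, rewrites, approved = asyncio.run(produce_article("Magnesium and sleep"))
```

Returning the approval flag alongside the draft lets the caller decide what happens when the rewrite budget runs out, for example flagging the article for operator review.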

Key Design Decisions

  • Prompts as data — stored in DB, versioned via Alembic, not hardcoded
  • SQLAlchemy Core only — no ORM, direct select/insert/update with Table + MetaData
  • Async by default: asyncpg for Postgres, httpx for HTTP
  • Strict typing — mypy strict mode, Pydantic v2 strict validation
  • Cost ceilings — per-article and monthly token budgets enforced at runtime
  • pgvector hybrid search — vault knowledge base uses semantic (vector) + full-text search in Postgres
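
As an illustration of the cost-ceiling decision, the sketch below refuses a charge before it would exceed either budget. The class name, field names, and limits are hypothetical; the page does not describe how the pipeline tracks or persists usage.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised before a call that would push usage past a ceiling."""


@dataclass
class TokenBudget:
    per_article_limit: int
    monthly_limit: int
    article_used: int = 0
    monthly_used: int = 0

    def charge(self, tokens: int) -> None:
        # Check both ceilings first so a rejected call leaves usage unchanged.
        if self.article_used + tokens > self.per_article_limit:
            raise BudgetExceeded("per-article token ceiling reached")
        if self.monthly_used + tokens > self.monthly_limit:
            raise BudgetExceeded("monthly token ceiling reached")
        self.article_used += tokens
        self.monthly_used += tokens

    def start_article(self) -> None:
        # The per-article counter resets; the monthly counter keeps running.
        self.article_used = 0


budget = TokenBudget(per_article_limit=10_000, monthly_limit=25_000)
budget.charge(8_000)
try:
    budget.charge(3_000)  # would reach 11,000 tokens for this article
    blocked = False
except BudgetExceeded:
    blocked = True
```

Checking before the request is sent, rather than after, means an over-budget call is never billed.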