Skip to content

Database

The pipeline uses Neon Postgres with SQLAlchemy Core (no ORM). Tables are defined in src/db/tables.py. Entity models and enums are in src/db/models.py.

Key Tables

The application database has 14 tables:

Table Purpose
content_backlog Content planning items (topics, keywords, priorities)
articles Article lifecycle (status, drafts, quality scores, costs)
article_images Featured image metadata and R2 storage keys (up to 2 candidates per article)
quality_scores Per-iteration quality gate scores and feedback
agent_prompts Versioned LLM prompts per agent (keyed by agent_name + content_type + version)
content_templates Structural templates defining expected article sections by content type and pillar
pipeline_runs Batch run tracking (success/failure counts, Prefect flow run linkage)
source_registry External source configuration for automated scanning (PubMed, RSS, etc.)
source_items Discovered items from source scanning with triage results
trusted_sources Curated catalog of authoritative tinnitus sources for brief generation
dashboard_users Dashboard authentication and roles
audit_log Dashboard action audit trail
vault_notes Obsidian vault note metadata (file hash, frontmatter)
vault_chunks Note chunks with pgvector embeddings and full-text search vectors

Content Backlog Columns

The content_backlog table stores article topics for the production pipeline. These columns are also the expected format for CSV import.

Column Type Required Default Description
id UUID Auto Generated Primary key
content_type text Yes -- satellite, cornerstone, or research_news
pillar text Yes -- Content pillar (p1 through p5)
language text Yes -- Language code (en or de)
primary_keyword text Yes -- Main SEO keyword
working_title text Yes -- Article working title
target_word_count integer Yes -- Target length in words
priority_score numeric Yes -- Priority score (0.0 to 100.0)
source text Yes -- Origin (seo_research, manual, gap_analysis)
secondary_keywords text[] No {} Additional keywords
search_intent text No informational Search intent type
journey_stage text No exploration User journey stage
info_gain_angle text No "" Unique angle or information gain
aeo_priority text No standard AI Engine Optimization priority
authorship_level text No brand_attributed Attribution level
status text No planned planned, brief_generated, in_production, published
cornerstone_id UUID No null Parent cornerstone reference (for satellites)
article_id UUID No null Linked article after brief generation
notes text No "" Free-text notes

Indexes: status, pillar (for fast filtering on the dashboard).

Quality Scores

The quality_scores table stores per-iteration results from quality gate agents.

Column Type Description
id UUID Primary key
article_id UUID FK to articles.id
agent text Quality gate agent (factuality, seo, style)
passed boolean Whether the gate passed
score numeric Score from the agent (0-100)
feedback JSONB Structured feedback from the agent
iteration integer Quality gate iteration (1-3)
tokens JSONB Token usage breakdown
cost_usd numeric Cost of this gate run
created_at timestamptz When the score was recorded

Indexes: article_id (for loading all scores for an article).

Content Templates

The content_templates table stores structural templates that define expected sections for articles by content type and pillar.

Column Type Description
id UUID Primary key
name text Template name
content_type text Content type (satellite, cornerstone, research_news)
pillar text Content pillar (nullable; null means applies to all pillars)
structure JSONB Array of section definitions (type, heading, description, guidance)
required_elements text[] List of required structural elements
active boolean Whether this template is active (default: true)
created_at timestamptz When the template was created

Templates are loaded by the brief generator to provide structural guidance. The style checker validates articles against their template's required elements.

Article Images

The article_images table stores featured image metadata. Image binary data is stored in Cloudflare R2 object storage (not in the database).

Column Type Description
id UUID Primary key
article_id UUID FK to articles.id (CASCADE delete)
ordinal integer Image candidate number (1-based)
r2_key text R2 object key (e.g. images/<uuid>/1.png)
prompt_used text DALL-E prompt sent to generate this image
revised_prompt text DALL-E's revised prompt (if it modified the original)
is_selected boolean Whether this is the selected featured image (default: false)
generated_at timestamptz When the image was generated

Constraints: Unique on (article_id, ordinal).

Indexes: article_id (for loading images for an article).

pgvector Search Tables

The vault knowledge base uses pgvector for hybrid semantic + full-text search.

vault_notes

Column Type Description
id UUID Primary key
file_path text Relative path within the vault
title text Note title from frontmatter
tags text[] Frontmatter tags
file_hash text SHA256 of file content (change detection for sync)
updated_at timestamptz Last sync timestamp

vault_chunks

Column Type Description
id UUID Primary key
note_id UUID FK → vault_notes.id (CASCADE delete)
heading text Section heading (## boundary)
content text Chunk text content
embedding vector(1536) OpenAI text-embedding-3-small vector
search_vector tsvector PostgreSQL full-text search vector

Indexes:

  • HNSW index on vault_chunks.embedding for fast approximate nearest-neighbor search
  • GIN index on vault_chunks.search_vector for full-text search

Hybrid search combines cosine similarity (70% weight) with full-text relevance (30% weight) for optimal recall.