Database¶

The pipeline uses Neon Postgres with SQLAlchemy Core (no ORM). Tables are defined in src/db/tables.py. Entity models and enums are in src/db/models.py.

Key Tables¶

The application database has 14 tables:

Table	Purpose
`content_backlog`	Content planning items (topics, keywords, priorities)
`articles`	Article lifecycle (status, drafts, quality scores, costs)
`article_images`	Featured image metadata and R2 storage keys (up to 2 candidates per article)
`quality_scores`	Per-iteration quality gate scores and feedback
`agent_prompts`	Versioned LLM prompts per agent (keyed by agent_name + content_type + version)
`content_templates`	Structural templates defining expected article sections by content type and pillar
`pipeline_runs`	Batch run tracking (success/failure counts, Prefect flow run linkage)
`source_registry`	External source configuration for automated scanning (PubMed, RSS, etc.)
`source_items`	Discovered items from source scanning with triage results
`trusted_sources`	Curated catalog of authoritative tinnitus sources for brief generation
`dashboard_users`	Dashboard authentication and roles
`audit_log`	Dashboard action audit trail
`vault_notes`	Obsidian vault note metadata (file hash, frontmatter)
`vault_chunks`	Note chunks with pgvector embeddings and full-text search vectors

Content Backlog Columns¶

The content_backlog table stores article topics for the production pipeline. These columns are also the expected format for CSV import.

Column	Type	Required	Default	Description
`id`	UUID	Auto	Generated	Primary key
`content_type`	text	Yes	--	`satellite`, `cornerstone`, or `research_news`
`pillar`	text	Yes	--	Content pillar (`p1` through `p5`)
`language`	text	Yes	--	Language code (`en` or `de`)
`primary_keyword`	text	Yes	--	Main SEO keyword
`working_title`	text	Yes	--	Article working title
`target_word_count`	integer	Yes	--	Target length in words
`priority_score`	numeric	Yes	--	Priority score (0.0 to 100.0)
`source`	text	Yes	--	Origin (`seo_research`, `manual`, `gap_analysis`)
`secondary_keywords`	text[]	No	`{}`	Additional keywords
`search_intent`	text	No	`informational`	Search intent type
`journey_stage`	text	No	`exploration`	User journey stage
`info_gain_angle`	text	No	`""`	Unique angle or information gain
`aeo_priority`	text	No	`standard`	AI Engine Optimization priority
`authorship_level`	text	No	`brand_attributed`	Attribution level
`status`	text	No	`planned`	`planned`, `brief_generated`, `in_production`, `published`
`cornerstone_id`	UUID	No	null	Parent cornerstone reference (for satellites)
`article_id`	UUID	No	null	Linked article after brief generation
`notes`	text	No	`""`	Free-text notes

Indexes: status, pillar (for fast filtering on the dashboard).

Quality Scores¶

The quality_scores table stores per-iteration results from quality gate agents.

Column	Type	Description
`id`	UUID	Primary key
`article_id`	UUID	FK to `articles.id`
`agent`	text	Quality gate agent (`factuality`, `seo`, `style`)
`passed`	boolean	Whether the gate passed
`score`	numeric	Score from the agent (0-100)
`feedback`	JSONB	Structured feedback from the agent
`iteration`	integer	Quality gate iteration (1-3)
`tokens`	JSONB	Token usage breakdown
`cost_usd`	numeric	Cost of this gate run
`created_at`	timestamptz	When the score was recorded

Indexes: article_id (for loading all scores for an article).

Content Templates¶

The content_templates table stores structural templates that define expected sections for articles by content type and pillar.

Column	Type	Description
`id`	UUID	Primary key
`name`	text	Template name
`content_type`	text	Content type (`satellite`, `cornerstone`, `research_news`)
`pillar`	text	Content pillar (nullable; `null` means applies to all pillars)
`structure`	JSONB	Array of section definitions (type, heading, description, guidance)
`required_elements`	text[]	List of required structural elements
`active`	boolean	Whether this template is active (default: true)
`created_at`	timestamptz	When the template was created

Templates are loaded by the brief generator to provide structural guidance. The style checker validates articles against their template's required elements.

Article Images¶

The article_images table stores featured image metadata. Image binary data is stored in Cloudflare R2 object storage (not in the database).

Column	Type	Description
`id`	UUID	Primary key
`article_id`	UUID	FK to `articles.id` (CASCADE delete)
`ordinal`	integer	Image candidate number (1-based)
`r2_key`	text	R2 object key (e.g. `images/<uuid>/1.png`)
`prompt_used`	text	DALL-E prompt sent to generate this image
`revised_prompt`	text	DALL-E's revised prompt (if it modified the original)
`is_selected`	boolean	Whether this is the selected featured image (default: false)
`generated_at`	timestamptz	When the image was generated

Constraints: Unique on (article_id, ordinal).

Indexes: article_id (for loading images for an article).

pgvector Search Tables¶

The vault knowledge base uses pgvector for hybrid semantic + full-text search.

`vault_notes`¶

Column	Type	Description
`id`	UUID	Primary key
`file_path`	text	Relative path within the vault
`title`	text	Note title from frontmatter
`tags`	text[]	Frontmatter tags
`file_hash`	text	SHA256 of file content (change detection for sync)
`updated_at`	timestamptz	Last sync timestamp

`vault_chunks`¶

Column	Type	Description
`id`	UUID	Primary key
`note_id`	UUID	FK → `vault_notes.id` (CASCADE delete)
`heading`	text	Section heading (`##` boundary)
`content`	text	Chunk text content
`embedding`	vector(1536)	OpenAI text-embedding-3-small vector
`search_vector`	tsvector	PostgreSQL full-text search vector

Indexes:

HNSW index on vault_chunks.embedding for fast approximate nearest-neighbor search
GIN index on vault_chunks.search_vector for full-text search

Hybrid search combines cosine similarity (70% weight) with full-text relevance (30% weight) for optimal recall.