Database¶

The pipeline uses Neon Postgres with SQLAlchemy Core (no ORM). Tables are defined in src/db/tables.py.

Key Tables¶

Table	Purpose
`content_backlog`	Content planning items (topics, keywords, priorities)
`articles`	Article lifecycle (status, drafts, quality scores, costs)
`agent_prompts`	Versioned LLM prompts per agent
`pipeline_runs`	Batch run tracking (success/failure counts)
`source_registry`	External source configuration for automated scanning (PubMed, RSS, etc.)
`source_items`	Discovered items from source scanning
`trusted_sources`	Curated catalog of authoritative tinnitus sources for brief generation
`dashboard_users`	Dashboard authentication and roles
`audit_log`	Dashboard action audit trail
`vault_notes`	Obsidian vault note metadata (file hash, frontmatter)
`vault_chunks`	Note chunks with pgvector embeddings and full-text search vectors

The content_backlog table stores article topics for the production pipeline. These columns are also the expected format for CSV import.

Column	Type	Required	Default	Description
`id`	UUID	Auto	Generated	Primary key
`content_type`	text	Yes	--	`satellite`, `cornerstone`, or `research_news`
`pillar`	text	Yes	--	Content pillar (`p1` through `p5`)
`language`	text	Yes	--	Language code (`en` or `de`)
`primary_keyword`	text	Yes	--	Main SEO keyword
`working_title`	text	Yes	--	Article working title
`target_word_count`	integer	Yes	--	Target length in words
`priority_score`	numeric	Yes	--	Priority score (0.0 to 100.0)
`source`	text	Yes	--	Origin (`seo_research`, `manual`, `gap_analysis`)
`secondary_keywords`	text[]	No	`{}`	Additional keywords
`search_intent`	text	No	`informational`	Search intent type
`journey_stage`	text	No	`exploration`	User journey stage
`info_gain_angle`	text	No	`""`	Unique angle or information gain
`aeo_priority`	text	No	`standard`	AI Engine Optimization priority
`authorship_level`	text	No	`brand_attributed`	Attribution level
`status`	text	No	`planned`	`planned`, `brief_generated`, `in_production`, `published`
`cornerstone_id`	UUID	No	null	Parent cornerstone reference (for satellites)
`article_id`	UUID	No	null	Linked article after brief generation
`notes`	text	No	`""`	Free-text notes

Indexes: status, pillar (for fast filtering on the dashboard).

The vault knowledge base uses pgvector for hybrid semantic + full-text search.

Column	Type	Description
`id`	UUID	Primary key
`file_path`	text	Relative path within the vault
`title`	text	Note title from frontmatter
`tags`	text[]	Frontmatter tags
`file_hash`	text	SHA256 of file content (change detection for sync)
`updated_at`	timestamptz	Last sync timestamp

Column	Type	Description
`id`	UUID	Primary key
`note_id`	UUID	FK → `vault_notes.id` (CASCADE delete)
`heading`	text	Section heading (`##` boundary)
`content`	text	Chunk text content
`embedding`	vector(1536)	OpenAI text-embedding-3-small vector
`search_vector`	tsvector	PostgreSQL full-text search vector

Indexes:

HNSW index on vault_chunks.embedding for fast approximate nearest-neighbor search
GIN index on vault_chunks.search_vector for full-text search

Hybrid search combines cosine similarity (70% weight) with full-text relevance (30% weight) for optimal recall.