Skip to content

Database

The pipeline uses Neon Postgres with SQLAlchemy Core (no ORM). Tables are defined in src/db/tables.py.

Key Tables

Table Purpose
content_backlog Content planning items (topics, keywords, priorities)
articles Article lifecycle (status, drafts, quality scores, costs)
agent_prompts Versioned LLM prompts per agent
pipeline_runs Batch run tracking (success/failure counts)
source_registry External source configuration for automated scanning (PubMed, RSS, etc.)
source_items Discovered items from source scanning
trusted_sources Curated catalog of authoritative tinnitus sources for brief generation
dashboard_users Dashboard authentication and roles
audit_log Dashboard action audit trail
vault_notes Obsidian vault note metadata (file hash, frontmatter)
vault_chunks Note chunks with pgvector embeddings and full-text search vectors

Content Backlog Columns

The content_backlog table stores article topics for the production pipeline. These columns are also the expected format for CSV import.

Column Type Required Default Description
id UUID Auto Generated Primary key
content_type text Yes -- satellite, cornerstone, or research_news
pillar text Yes -- Content pillar (p1 through p5)
language text Yes -- Language code (en or de)
primary_keyword text Yes -- Main SEO keyword
working_title text Yes -- Article working title
target_word_count integer Yes -- Target length in words
priority_score numeric Yes -- Priority score (0.0 to 100.0)
source text Yes -- Origin (seo_research, manual, gap_analysis)
secondary_keywords text[] No {} Additional keywords
search_intent text No informational Search intent type
journey_stage text No exploration User journey stage
info_gain_angle text No "" Unique angle or information gain
aeo_priority text No standard AI Engine Optimization priority
authorship_level text No brand_attributed Attribution level
status text No planned planned, brief_generated, in_production, published
cornerstone_id UUID No null Parent cornerstone reference (for satellites)
article_id UUID No null Linked article after brief generation
notes text No "" Free-text notes

Indexes: status, pillar (for fast filtering on the dashboard).

pgvector Search Tables

The vault knowledge base uses pgvector for hybrid semantic + full-text search.

vault_notes

Column Type Description
id UUID Primary key
file_path text Relative path within the vault
title text Note title from frontmatter
tags text[] Frontmatter tags
file_hash text SHA256 of file content (change detection for sync)
updated_at timestamptz Last sync timestamp

vault_chunks

Column Type Description
id UUID Primary key
note_id UUID FK → vault_notes.id (CASCADE delete)
heading text Section heading (## boundary)
content text Chunk text content
embedding vector(1536) OpenAI text-embedding-3-small vector
search_vector tsvector PostgreSQL full-text search vector

Indexes:

  • HNSW index on vault_chunks.embedding for fast approximate nearest-neighbor search
  • GIN index on vault_chunks.search_vector for full-text search

Hybrid search combines cosine similarity (70% weight) with full-text relevance (30% weight) for optimal recall.