Database¶
The pipeline uses Neon Postgres with SQLAlchemy Core (no ORM). Tables are defined in src/db/tables.py.
Key Tables¶
| Table | Purpose |
|---|---|
content_backlog |
Content planning items (topics, keywords, priorities) |
articles |
Article lifecycle (status, drafts, quality scores, costs) |
agent_prompts |
Versioned LLM prompts per agent |
pipeline_runs |
Batch run tracking (success/failure counts) |
source_registry |
External source configuration for automated scanning (PubMed, RSS, etc.) |
source_items |
Discovered items from source scanning |
trusted_sources |
Curated catalog of authoritative tinnitus sources for brief generation |
dashboard_users |
Dashboard authentication and roles |
audit_log |
Dashboard action audit trail |
vault_notes |
Obsidian vault note metadata (file hash, frontmatter) |
vault_chunks |
Note chunks with pgvector embeddings and full-text search vectors |
Content Backlog Columns¶
The content_backlog table stores article topics for the production pipeline. These columns are also the expected format for CSV import.
| Column | Type | Required | Default | Description |
|---|---|---|---|---|
id |
UUID | Auto | Generated | Primary key |
content_type |
text | Yes | -- | satellite, cornerstone, or research_news |
pillar |
text | Yes | -- | Content pillar (p1 through p5) |
language |
text | Yes | -- | Language code (en or de) |
primary_keyword |
text | Yes | -- | Main SEO keyword |
working_title |
text | Yes | -- | Article working title |
target_word_count |
integer | Yes | -- | Target length in words |
priority_score |
numeric | Yes | -- | Priority score (0.0 to 100.0) |
source |
text | Yes | -- | Origin (seo_research, manual, gap_analysis) |
secondary_keywords |
text[] | No | {} |
Additional keywords |
search_intent |
text | No | informational |
Search intent type |
journey_stage |
text | No | exploration |
User journey stage |
info_gain_angle |
text | No | "" |
Unique angle or information gain |
aeo_priority |
text | No | standard |
AI Engine Optimization priority |
authorship_level |
text | No | brand_attributed |
Attribution level |
status |
text | No | planned |
planned, brief_generated, in_production, published |
cornerstone_id |
UUID | No | null | Parent cornerstone reference (for satellites) |
article_id |
UUID | No | null | Linked article after brief generation |
notes |
text | No | "" |
Free-text notes |
Indexes: status, pillar (for fast filtering on the dashboard).
pgvector Search Tables¶
The vault knowledge base uses pgvector for hybrid semantic + full-text search.
vault_notes¶
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
file_path |
text | Relative path within the vault |
title |
text | Note title from frontmatter |
tags |
text[] | Frontmatter tags |
file_hash |
text | SHA256 of file content (change detection for sync) |
updated_at |
timestamptz | Last sync timestamp |
vault_chunks¶
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
note_id |
UUID | FK → vault_notes.id (CASCADE delete) |
heading |
text | Section heading (## boundary) |
content |
text | Chunk text content |
embedding |
vector(1536) | OpenAI text-embedding-3-small vector |
search_vector |
tsvector | PostgreSQL full-text search vector |
Indexes:
- HNSW index on
vault_chunks.embeddingfor fast approximate nearest-neighbor search - GIN index on
vault_chunks.search_vectorfor full-text search
Hybrid search combines cosine similarity (70% weight) with full-text relevance (30% weight) for optimal recall.