Database¶
The pipeline uses Neon Postgres with SQLAlchemy Core (no ORM). Tables are defined in src/db/tables.py. Entity models and enums are in src/db/models.py.
Key Tables¶
The application database has 14 tables:
| Table | Purpose |
|---|---|
content_backlog |
Content planning items (topics, keywords, priorities) |
articles |
Article lifecycle (status, drafts, quality scores, costs) |
article_images |
Featured image metadata and R2 storage keys (up to 2 candidates per article) |
quality_scores |
Per-iteration quality gate scores and feedback |
agent_prompts |
Versioned LLM prompts per agent (keyed by agent_name + content_type + version) |
content_templates |
Structural templates defining expected article sections by content type and pillar |
pipeline_runs |
Batch run tracking (success/failure counts, Prefect flow run linkage) |
source_registry |
External source configuration for automated scanning (PubMed, RSS, etc.) |
source_items |
Discovered items from source scanning with triage results |
trusted_sources |
Curated catalog of authoritative tinnitus sources for brief generation |
dashboard_users |
Dashboard authentication and roles |
audit_log |
Dashboard action audit trail |
vault_notes |
Obsidian vault note metadata (file hash, frontmatter) |
vault_chunks |
Note chunks with pgvector embeddings and full-text search vectors |
Content Backlog Columns¶
The content_backlog table stores article topics for the production pipeline. These columns are also the expected format for CSV import.
| Column | Type | Required | Default | Description |
|---|---|---|---|---|
id |
UUID | Auto | Generated | Primary key |
content_type |
text | Yes | -- | satellite, cornerstone, or research_news |
pillar |
text | Yes | -- | Content pillar (p1 through p5) |
language |
text | Yes | -- | Language code (en or de) |
primary_keyword |
text | Yes | -- | Main SEO keyword |
working_title |
text | Yes | -- | Article working title |
target_word_count |
integer | Yes | -- | Target length in words |
priority_score |
numeric | Yes | -- | Priority score (0.0 to 100.0) |
source |
text | Yes | -- | Origin (seo_research, manual, gap_analysis) |
secondary_keywords |
text[] | No | {} |
Additional keywords |
search_intent |
text | No | informational |
Search intent type |
journey_stage |
text | No | exploration |
User journey stage |
info_gain_angle |
text | No | "" |
Unique angle or information gain |
aeo_priority |
text | No | standard |
AI Engine Optimization priority |
authorship_level |
text | No | brand_attributed |
Attribution level |
status |
text | No | planned |
planned, brief_generated, in_production, published |
cornerstone_id |
UUID | No | null | Parent cornerstone reference (for satellites) |
article_id |
UUID | No | null | Linked article after brief generation |
notes |
text | No | "" |
Free-text notes |
Indexes: status, pillar (for fast filtering on the dashboard).
Quality Scores¶
The quality_scores table stores per-iteration results from quality gate agents.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
article_id |
UUID | FK to articles.id |
agent |
text | Quality gate agent (factuality, seo, style) |
passed |
boolean | Whether the gate passed |
score |
numeric | Score from the agent (0-100) |
feedback |
JSONB | Structured feedback from the agent |
iteration |
integer | Quality gate iteration (1-3) |
tokens |
JSONB | Token usage breakdown |
cost_usd |
numeric | Cost of this gate run |
created_at |
timestamptz | When the score was recorded |
Indexes: article_id (for loading all scores for an article).
Content Templates¶
The content_templates table stores structural templates that define expected sections for articles by content type and pillar.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
name |
text | Template name |
content_type |
text | Content type (satellite, cornerstone, research_news) |
pillar |
text | Content pillar (nullable; null means applies to all pillars) |
structure |
JSONB | Array of section definitions (type, heading, description, guidance) |
required_elements |
text[] | List of required structural elements |
active |
boolean | Whether this template is active (default: true) |
created_at |
timestamptz | When the template was created |
Templates are loaded by the brief generator to provide structural guidance. The style checker validates articles against their template's required elements.
Article Images¶
The article_images table stores featured image metadata. Image binary data is stored in Cloudflare R2 object storage (not in the database).
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
article_id |
UUID | FK to articles.id (CASCADE delete) |
ordinal |
integer | Image candidate number (1-based) |
r2_key |
text | R2 object key (e.g. images/<uuid>/1.png) |
prompt_used |
text | DALL-E prompt sent to generate this image |
revised_prompt |
text | DALL-E's revised prompt (if it modified the original) |
is_selected |
boolean | Whether this is the selected featured image (default: false) |
generated_at |
timestamptz | When the image was generated |
Constraints: Unique on (article_id, ordinal).
Indexes: article_id (for loading images for an article).
pgvector Search Tables¶
The vault knowledge base uses pgvector for hybrid semantic + full-text search.
vault_notes¶
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
file_path |
text | Relative path within the vault |
title |
text | Note title from frontmatter |
tags |
text[] | Frontmatter tags |
file_hash |
text | SHA256 of file content (change detection for sync) |
updated_at |
timestamptz | Last sync timestamp |
vault_chunks¶
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
note_id |
UUID | FK → vault_notes.id (CASCADE delete) |
heading |
text | Section heading (## boundary) |
content |
text | Chunk text content |
embedding |
vector(1536) | OpenAI text-embedding-3-small vector |
search_vector |
tsvector | PostgreSQL full-text search vector |
Indexes:
- HNSW index on
vault_chunks.embeddingfor fast approximate nearest-neighbor search - GIN index on
vault_chunks.search_vectorfor full-text search
Hybrid search combines cosine similarity (70% weight) with full-text relevance (30% weight) for optimal recall.