
Quality Gates

The pipeline runs quality checks in parallel after each draft is produced. If any gate fails, the synthesis agent compiles the feedback and the writer rewrites the article, for up to three iterations.

Gate Overview

| Gate | Agent | Model | What It Checks |
|---|---|---|---|
| Factuality | Factuality Checker | Sonnet | Medical claim accuracy against the research dossier |
| SEO | SEO Optimizer | Haiku | Deterministic metrics + LLM evaluation of keyword usage, headings, meta |
| Style | Style Checker | Haiku | Readability, structure compliance, voice/tone, AI patterns, medical language |

Factuality and style gates are mandatory for all content types (medical content safety requirement, enforced by config validation). The SEO gate is optional and skipped for research_news articles.

Config-Driven Dispatch

Active gates and pass thresholds are defined per content type in src/config.py:

Active Gates

| Content Type | Gates |
|---|---|
| satellite | factuality, seo, style |
| cornerstone | factuality, seo, style |
| research_news | factuality, style |

Pass Thresholds

| Content Type | Threshold |
|---|---|
| satellite | 70 |
| cornerstone | 85 |
| research_news | 70 |

An article passes only when all active gates score at or above the content-type threshold. Cornerstones require higher quality because they are the authoritative hub articles.
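The dispatch logic above can be sketched as follows. The dictionary names (ACTIVE_GATES, PASS_THRESHOLDS) and the helper are illustrative, not the actual identifiers in src/config.py:

```python
# Illustrative sketch of config-driven gate dispatch; the real config lives
# in src/config.py and may use different names and structures.
ACTIVE_GATES = {
    "satellite": ["factuality", "seo", "style"],
    "cornerstone": ["factuality", "seo", "style"],
    "research_news": ["factuality", "style"],  # SEO gate skipped
}
PASS_THRESHOLDS = {"satellite": 70, "cornerstone": 85, "research_news": 70}

def article_passes(content_type: str, scores: dict[str, float]) -> bool:
    """An article passes only if every active gate meets the threshold."""
    threshold = PASS_THRESHOLDS[content_type]
    return all(scores[gate] >= threshold for gate in ACTIVE_GATES[content_type])
```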

Gate Execution

Quality gates run in parallel via asyncio.gather(). The map_gate_results() helper in src/pipeline/quality_gates.py maps positional results back to their typed outputs (factuality, SEO, style).
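A minimal sketch of this pattern, with stand-in coroutines in place of the real gate agents and result mapping (the actual map_gate_results() returns typed outputs, not a plain dict):

```python
import asyncio

# Stand-in gate coroutines; the real agents call LLMs and return typed outputs.
async def check_factuality(draft: str) -> float:
    return 88.0  # placeholder score

async def check_seo(draft: str) -> float:
    return 72.0  # placeholder score

async def check_style(draft: str) -> float:
    return 91.0  # placeholder score

async def run_gates(draft: str) -> dict[str, float]:
    # gather() preserves argument order, so positional results can be
    # mapped back to their gates deterministically.
    results = await asyncio.gather(
        check_factuality(draft), check_seo(draft), check_style(draft)
    )
    return dict(zip(("factuality", "seo", "style"), results))

scores = asyncio.run(run_gates("draft text"))
```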

Each gate produces a typed output extending QualityGateOutput:

```python
from pydantic import BaseModel

class QualityGateOutput(BaseModel):
    passed: bool
    score: float
    feedback: QualityFeedback  # structured feedback model, defined alongside
```

Quality scores are persisted to the quality_scores table with the iteration number, allowing the dashboard to display score progression across rewrite iterations.
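The persistence step might look like the following sketch. The quality_scores schema shown here (column names, SQLite backend) is assumed for illustration:

```python
import sqlite3

# Assumed schema for the quality_scores table; the real columns and database
# backend may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quality_scores (article_id TEXT, gate TEXT, iteration INTEGER, score REAL)"
)
# One row per gate per iteration lets the dashboard plot score progression.
conn.execute(
    "INSERT INTO quality_scores VALUES (?, ?, ?, ?)",
    ("article-123", "factuality", 1, 82.5),
)
row = conn.execute(
    "SELECT score FROM quality_scores WHERE gate = 'factuality' AND iteration = 1"
).fetchone()
```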

SEO Gate: Hybrid Evaluation

The SEO gate combines two evaluation strategies:

1. Deterministic checks (seo_checks.py): pure Python, no LLM cost
   - Keyword density with hyphen normalization and proximity matching (German compound words)
   - Secondary keyword occurrence counts
   - Heading hierarchy validation
   - Word count
   - FAQ section detection (heading text, Rank Math blocks, consecutive question headings)
2. LLM evaluation: Haiku assesses qualitative aspects
   - Keyword naturalness
   - Title tag and meta description quality
   - AEO (Answer Engine Optimization) readiness
   - E-E-A-T signals
   - Content element usage
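The hyphen-normalization idea behind the deterministic keyword-density check can be sketched like this; the real seo_checks.py logic (including proximity matching) is more involved, and the function below is an illustrative simplification:

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Percentage of words covered by keyword matches, treating hyphens as
    spaces so German compounds like "Herz-Kreislauf" match either spelling."""
    norm = lambda s: re.sub(r"-", " ", s.lower()).split()
    words, kw = norm(text), norm(keyword)
    if not words:
        return 0.0
    # Count sliding-window matches of the normalized keyword token sequence.
    hits = sum(
        1 for i in range(len(words) - len(kw) + 1) if words[i : i + len(kw)] == kw
    )
    return hits / len(words) * 100
```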

Style Gate: Multi-Dimensional

The style checker evaluates five dimensions:

- Readability (StyleCheckResult): Flesch-Kincaid grade, average sentence length, within-target check
- Structure compliance (StyleStructureCompliance): validates required elements from the content template
- Voice evaluation (StyleVoiceEvaluation): pillar tone match, individual voice/tone issues
- Humanizer check (StyleHumanizerCheck): detects AI-writing patterns (formulaic transitions, hedging overuse, etc.)
- Medical language (StyleMedicalLanguage): flags inappropriate medical terminology, missing hedging, unsupported claims
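The readability dimension can be approximated with simple text statistics. This is a rough sketch, not the StyleCheckResult implementation; the real check also computes a Flesch-Kincaid grade, which is omitted here:

```python
import re

def avg_sentence_length(text: str) -> float:
    """Average words per sentence, splitting on terminal punctuation."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    return len(words) / max(len(sentences), 1)

def within_target(text: str, max_avg_len: float = 20.0) -> bool:
    # The 20-word target is an assumed example threshold, not the real one.
    return avg_sentence_length(text) <= max_avg_len
```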

Synthesis and Rewrite Loop

When any gate fails:

1. The synthesis agent receives the draft, all gate outputs, the article brief, and optionally the research dossier
2. It produces a SynthesisOutput containing:
   - a revised_draft with changes applied
   - a change_log listing each change (gate, location, original, revised, reason)
   - conflict_resolutions when gates disagree (e.g., SEO wants more keywords but style says it reads unnaturally)
3. The change log is serialized as rewrite instructions and fed back to the writer agent
4. The writer produces a new draft incorporating the feedback
5. Quality gates run again on the new draft

This loop repeats up to 3 times. If all iterations are exhausted without passing, the article is marked as failed.
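The loop can be sketched as below; write_draft, run_gates, and synthesize_feedback are illustrative stand-ins for the real writer, gate, and synthesis agents:

```python
MAX_ITERATIONS = 3  # matches the pipeline's rewrite limit

def produce_article(brief, write_draft, run_gates, synthesize_feedback):
    """Run the draft → gates → synthesis loop until all gates pass or the
    iteration budget is exhausted."""
    feedback = None
    for iteration in range(1, MAX_ITERATIONS + 1):
        draft = write_draft(brief, feedback)
        results = run_gates(draft)
        if all(r["passed"] for r in results.values()):
            return {"status": "passed", "draft": draft, "iterations": iteration}
        # Synthesis compiles gate feedback into rewrite instructions.
        feedback = synthesize_feedback(draft, results)
    return {"status": "failed", "iterations": MAX_ITERATIONS}
```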

Langfuse Score Tracking

When Langfuse is configured, quality gate scores are posted to the article's root Langfuse span after every iteration:

Top-level scores (per iteration):

- factuality_iter1, factuality_iter2, factuality_iter3: numeric scores
- seo_passed_iter1, seo_passed_iter2: boolean (1.0/0.0)
- style_passed_iter1, style_passed_iter2: boolean (1.0/0.0)

Granular sub-scores (per iteration):

- Factuality: claims_checked, claims_verified, claims_flagged
- SEO: keyword_density, secondary_keywords, word_count, heading_hierarchy, faq_present
- Style: readability, structure, voice, humanizer, medical_language

Synthesis feedback (per iteration):

- synthesis_feedback_iter1: a string score containing the change log from the synthesis agent
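Assembling the per-iteration score names could look like this sketch; the actual Langfuse client call that posts them to the root span is omitted, since the exact API usage is not shown in this document:

```python
def iteration_scores(
    iteration: int, factuality: float, seo_passed: bool, style_passed: bool
) -> dict[str, float]:
    """Build the top-level score payload for one iteration; booleans are
    encoded as 1.0/0.0 as described above."""
    return {
        f"factuality_iter{iteration}": factuality,
        f"seo_passed_iter{iteration}": 1.0 if seo_passed else 0.0,
        f"style_passed_iter{iteration}": 1.0 if style_passed else 0.0,
    }
```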

This provides visibility into quality progression across rewrite iterations directly in the Langfuse dashboard, with enough granularity to diagnose which specific aspect of quality is causing rewrites.