Quality Gates¶
The pipeline runs quality checks in parallel after each draft is produced. If any gate fails, the synthesis agent compiles feedback and the writer rewrites the article -- up to 3 iterations.
Gate Overview¶
| Gate | Agent | Model | What It Checks |
|---|---|---|---|
| Factuality | Factuality Checker | Sonnet | Medical claim accuracy against research dossier |
| SEO | SEO Optimizer | Haiku | Deterministic metrics + LLM evaluation of keyword usage, headings, meta |
| Style | Style Checker | Haiku | Readability, structure compliance, voice/tone, AI patterns, medical language |
Factuality and style gates are mandatory for all content types (medical content safety requirement, enforced by config validation). The SEO gate is optional and skipped for research_news articles.
Config-Driven Dispatch¶
Active gates and pass thresholds are defined per content type in `src/config.py`.
Active Gates¶
| Content Type | Gates |
|---|---|
| satellite | factuality, seo, style |
| cornerstone | factuality, seo, style |
| research_news | factuality, style |
Pass Thresholds¶
| Content Type | Threshold |
|---|---|
| satellite | 70 |
| cornerstone | 85 |
| research_news | 70 |
An article passes only when all active gates score at or above the content-type threshold. Cornerstones require higher quality because they are the authoritative hub articles.
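As a rough sketch of this config-driven dispatch (the dictionary layout, field names, and helper names below are assumptions, not the actual contents of `src/config.py`), the pass rule and the mandatory-gate validation could look like:

```python
# Hypothetical sketch of per-content-type gate configuration.
CONTENT_TYPE_CONFIG = {
    "satellite": {"gates": ["factuality", "seo", "style"], "threshold": 70},
    "cornerstone": {"gates": ["factuality", "seo", "style"], "threshold": 85},
    "research_news": {"gates": ["factuality", "style"], "threshold": 70},
}

# Medical content safety requirement: these gates may never be disabled.
MANDATORY_GATES = {"factuality", "style"}

def validate_config(config: dict) -> None:
    """Reject any content type that drops a mandatory gate."""
    for content_type, settings in config.items():
        missing = MANDATORY_GATES - set(settings["gates"])
        if missing:
            raise ValueError(f"{content_type} is missing mandatory gates: {missing}")

def passes(content_type: str, scores: dict[str, float]) -> bool:
    """An article passes only when every active gate meets the threshold."""
    settings = CONTENT_TYPE_CONFIG[content_type]
    return all(scores[gate] >= settings["threshold"] for gate in settings["gates"])
```

Note how the same style score can pass a satellite article (threshold 70) but fail a cornerstone (threshold 85).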
Gate Execution¶
Quality gates run in parallel via `asyncio.gather()`. The `map_gate_results()` helper in `src/pipeline/quality_gates.py` maps positional results back to their typed outputs (factuality, SEO, style).
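An illustrative sketch of this pattern (the gate coroutines and their return shapes here are placeholders, not the real agents):

```python
import asyncio

# Placeholder gate coroutines with hard-coded results; the real gates
# call LLM agents and return typed outputs.
async def run_factuality(draft: str) -> dict:
    return {"gate": "factuality", "score": 88}

async def run_seo(draft: str) -> dict:
    return {"gate": "seo", "score": 74}

async def run_style(draft: str) -> dict:
    return {"gate": "style", "score": 91}

async def run_gates(draft: str) -> dict[str, dict]:
    # asyncio.gather() preserves argument order, so positional results
    # can be mapped back to their gate names deterministically -- the
    # role map_gate_results() plays in the real pipeline.
    results = await asyncio.gather(
        run_factuality(draft), run_seo(draft), run_style(draft)
    )
    return dict(zip(["factuality", "seo", "style"], results))
```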
Each gate produces a typed output extending QualityGateOutput.
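A minimal sketch of what such a typed output hierarchy might look like, shown here with stdlib dataclasses (the actual base-class fields, and whether the project uses dataclasses or Pydantic models, are assumptions; the factuality sub-score names come from the Langfuse section below):

```python
from dataclasses import dataclass

@dataclass
class QualityGateOutput:
    # Common fields every gate reports; exact field names are assumed.
    passed: bool
    score: float

@dataclass
class FactualityGateOutput(QualityGateOutput):
    # Gate-specific sub-scores extend the common base.
    claims_checked: int = 0
    claims_verified: int = 0
    claims_flagged: int = 0
```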
Quality scores are persisted to the `quality_scores` table with the iteration number, allowing the dashboard to display score progression across rewrite iterations.
SEO Gate: Hybrid Evaluation¶
The SEO gate combines two evaluation strategies:
- Deterministic checks (`seo_checks.py`): pure Python, no LLM cost
    - Keyword density with hyphen normalization and proximity matching (German compound words)
    - Secondary keyword occurrence counts
    - Heading hierarchy validation
    - Word count
    - FAQ section detection (heading text, Rank Math blocks, consecutive question headings)
- LLM evaluation: Haiku assesses qualitative aspects
    - Keyword naturalness
    - Title tag and meta description quality
    - AEO (Answer Engine Optimization) readiness
    - E-E-A-T signals
    - Content element usage
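To make the deterministic side concrete, here is a toy version of hyphen-normalized keyword density. This is an illustrative sketch only: the real `seo_checks.py` also does proximity matching for German compound words, which this omits.

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Keyword occurrences per 100 words, with hyphen normalization.

    Normalization treats 'Vitamin-D' and 'Vitamin D' as the same keyword,
    which matters for German compound spellings.
    """
    def norm(s: str) -> str:
        # Collapse hyphens and whitespace runs to single spaces.
        return re.sub(r"[-\s]+", " ", s.lower()).strip()

    norm_text = norm(text)
    words = norm_text.split()
    if not words:
        return 0.0
    occurrences = norm_text.count(norm(keyword))
    return occurrences * 100 / len(words)
```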
Style Gate: Multi-Dimensional¶
The style checker evaluates five dimensions:
- Readability (`StyleCheckResult`): Flesch-Kincaid grade, average sentence length, within-target check
- Structure compliance (`StyleStructureCompliance`): validates required elements from the content template
- Voice evaluation (`StyleVoiceEvaluation`): pillar tone match, individual voice/tone issues
- Humanizer check (`StyleHumanizerCheck`): detects AI-writing patterns (formulaic transitions, hedging overuse, etc.)
- Medical language (`StyleMedicalLanguage`): flags inappropriate medical terminology, missing hedging, unsupported claims
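The readability dimension is the most mechanical of the five. A rough sketch of a Flesch-Kincaid grade computation is shown below; note the crude vowel-group syllable heuristic, and that the style checker's actual implementation (especially for German text) may differ substantially:

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level with a naive syllable heuristic."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)

    def syllables(word: str) -> int:
        # Count vowel groups as syllables; minimum one per word.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * total_syllables / len(words)
            - 15.59)
```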
Synthesis and Rewrite Loop¶
When any gate fails:

- The synthesis agent receives the draft, all gate outputs, the article brief, and optionally the research dossier
- It produces a `SynthesisOutput` containing:
    - A `revised_draft` with changes applied
    - A `change_log` listing each change (gate, location, original, revised, reason)
    - `conflict_resolutions` when gates disagree (e.g., SEO wants more keywords but style says it reads unnaturally)
- The change log is serialized as rewrite instructions and fed back to the writer agent
- The writer produces a new draft incorporating the feedback
- Quality gates run again on the new draft
This loop repeats up to 3 times. If all iterations are exhausted without passing, the article is marked as failed.
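The control flow above can be sketched as follows. Everything here is a stand-in: the helper functions are trivial placeholders for the real agents, and the constant name is an assumption.

```python
MAX_ITERATIONS = 3  # constant name assumed; the limit itself is from the docs

def run_quality_gates(draft: str) -> dict[str, dict]:
    # Placeholder: pretend a draft passes once feedback has been applied.
    passed = "[revised]" in draft
    return {g: {"passed": passed} for g in ("factuality", "seo", "style")}

def synthesize_feedback(draft: str, results: dict) -> str:
    return "apply change log"  # placeholder for the synthesis agent

def rewrite(draft: str, feedback: str) -> str:
    return draft + " [revised]"  # placeholder for the writer agent

def run_article(draft: str) -> tuple[str, bool]:
    """Gate -> synthesis -> rewrite loop, up to MAX_ITERATIONS passes."""
    for _ in range(MAX_ITERATIONS):
        results = run_quality_gates(draft)
        if all(r["passed"] for r in results.values()):
            return draft, True
        feedback = synthesize_feedback(draft, results)
        draft = rewrite(draft, feedback)
    # All iterations exhausted without passing: article marked failed.
    return draft, False
```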
Langfuse Score Tracking¶
When Langfuse is configured, quality gate scores are posted to the article's root Langfuse span after every iteration:
Top-level scores (per iteration):
- `factuality_iter1`, `factuality_iter2`, `factuality_iter3` -- numeric scores
- `seo_passed_iter1`, `seo_passed_iter2` -- boolean (1.0/0.0)
- `style_passed_iter1`, `style_passed_iter2` -- boolean (1.0/0.0)
Granular sub-scores (per iteration):
- Factuality: `claims_checked`, `claims_verified`, `claims_flagged`
- SEO: `keyword_density`, `secondary_keywords`, `word_count`, `heading_hierarchy`, `faq_present`
- Style: `readability`, `structure`, `voice`, `humanizer`, `medical_language`
Synthesis feedback (per iteration):
- `synthesis_feedback_iter1` -- string score containing the change log from the synthesis agent
This provides visibility into quality progression across rewrite iterations directly in the Langfuse dashboard, with enough granularity to diagnose which specific aspect of quality is causing rewrites.
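As a sketch of how per-iteration score payloads could be assembled before posting (the `_iterN` suffixing for sub-scores, the payload shape, and the helper name are all assumptions; the actual Langfuse SDK call is omitted):

```python
def build_iteration_scores(iteration: int, gate_results: dict) -> list[dict]:
    """Assemble score name/value pairs for one rewrite iteration."""
    scores = [
        {"name": f"factuality_iter{iteration}",
         "value": gate_results["factuality"]["score"]},
        {"name": f"seo_passed_iter{iteration}",
         "value": 1.0 if gate_results["seo"]["passed"] else 0.0},
        {"name": f"style_passed_iter{iteration}",
         "value": 1.0 if gate_results["style"]["passed"] else 0.0},
    ]
    # Granular factuality sub-scores for the same iteration.
    for key in ("claims_checked", "claims_verified", "claims_flagged"):
        scores.append({"name": f"{key}_iter{iteration}",
                       "value": gate_results["factuality"][key]})
    return scores
```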