
Quality Gates

The pipeline runs quality checks in parallel after each draft is produced. If any gate fails, the synthesis agent compiles the feedback and the writer rewrites the article, for up to three iterations.

Gate Overview

| Gate | Agent | Model | What It Checks |
|---|---|---|---|
| Factuality | Factuality Checker | Sonnet | Medical claim accuracy against the research dossier |
| SEO | SEO Optimizer | Haiku | Deterministic metrics + LLM evaluation of keyword usage, headings, meta |
| Style | Style Checker | Haiku | Readability, structure compliance, voice/tone, AI patterns, medical language |

Factuality and style gates are mandatory for all content types (medical content safety requirement, enforced by config validation). The SEO gate is optional and skipped for research_news articles.

Config-Driven Dispatch

Active gates and pass thresholds are defined per content type in src/config.py:

Active Gates

| Content Type | Gates |
|---|---|
| satellite | factuality, seo, style |
| cornerstone | factuality, seo, style |
| research_news | factuality, style |

Pass Thresholds

| Content Type | Threshold |
|---|---|
| satellite | 70 |
| cornerstone | 85 |
| research_news | 70 |

An article passes only when all active gates score at or above the content-type threshold. Cornerstones require higher quality because they are the authoritative hub articles.
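The dispatch logic above can be sketched as follows. The dictionary names (ACTIVE_GATES, PASS_THRESHOLDS) and the helper are illustrative, not the actual identifiers in src/config.py:

```python
# Illustrative sketch of config-driven gate dispatch; the real config lives
# in src/config.py and may use different names and structures.
ACTIVE_GATES = {
    "satellite": ["factuality", "seo", "style"],
    "cornerstone": ["factuality", "seo", "style"],
    "research_news": ["factuality", "style"],  # SEO gate skipped
}
PASS_THRESHOLDS = {"satellite": 70, "cornerstone": 85, "research_news": 70}

def article_passes(content_type: str, scores: dict[str, float]) -> bool:
    """An article passes only if every active gate meets the threshold."""
    threshold = PASS_THRESHOLDS[content_type]
    return all(scores[gate] >= threshold for gate in ACTIVE_GATES[content_type])
```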

Gate Execution

Quality gates run in parallel via asyncio.gather(). The map_gate_results() helper in src/pipeline/quality_gates.py maps positional results back to their typed outputs (factuality, SEO, style).
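A minimal sketch of this pattern, with stand-in coroutines in place of the real gate agents and result mapping (the actual map_gate_results() returns typed outputs, not a plain dict):

```python
import asyncio

# Stand-in gate coroutines; the real agents call LLMs and return typed outputs.
async def check_factuality(draft: str) -> float:
    return 88.0  # placeholder score

async def check_seo(draft: str) -> float:
    return 72.0  # placeholder score

async def check_style(draft: str) -> float:
    return 91.0  # placeholder score

async def run_gates(draft: str) -> dict[str, float]:
    # gather() preserves argument order, so positional results can be
    # mapped back to their gates deterministically.
    results = await asyncio.gather(
        check_factuality(draft), check_seo(draft), check_style(draft)
    )
    return dict(zip(("factuality", "seo", "style"), results))

scores = asyncio.run(run_gates("draft text"))
```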

Each gate produces a typed output extending QualityGateOutput:

```python
from pydantic import BaseModel

class QualityGateOutput(BaseModel):
    passed: bool
    score: float
    feedback: QualityFeedback  # structured feedback model, defined alongside
```

Quality scores are persisted to the quality_scores table with the iteration number, allowing the dashboard to display score progression across rewrite iterations.
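The persistence step might look like the following sketch. The quality_scores schema shown here (column names, SQLite backend) is assumed for illustration:

```python
import sqlite3

# Assumed schema for the quality_scores table; the real columns and database
# backend may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quality_scores (article_id TEXT, gate TEXT, iteration INTEGER, score REAL)"
)
# One row per gate per iteration lets the dashboard plot score progression.
conn.execute(
    "INSERT INTO quality_scores VALUES (?, ?, ?, ?)",
    ("article-123", "factuality", 1, 82.5),
)
row = conn.execute(
    "SELECT score FROM quality_scores WHERE gate = 'factuality' AND iteration = 1"
).fetchone()
```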

SEO Gate: Hybrid Evaluation

The SEO gate combines two evaluation strategies:

1. Deterministic checks (seo_checks.py): pure Python, no LLM cost
   - Keyword density with hyphen normalization and proximity matching (German compound words)
   - Secondary keyword occurrence counts
   - Heading hierarchy validation
   - Word count
   - FAQ section detection (heading text, Rank Math blocks, consecutive question headings)
2. LLM evaluation: Haiku assesses qualitative aspects
   - Keyword naturalness
   - Title tag and meta description quality
   - AEO (Answer Engine Optimization) readiness
   - E-E-A-T signals
   - Content element usage
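The hyphen-normalization idea behind the deterministic keyword-density check can be sketched like this; the real seo_checks.py logic (including proximity matching) is more involved, and the function below is an illustrative simplification:

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Percentage of words covered by keyword matches, treating hyphens as
    spaces so German compounds like "Herz-Kreislauf" match either spelling."""
    norm = lambda s: re.sub(r"-", " ", s.lower()).split()
    words, kw = norm(text), norm(keyword)
    if not words:
        return 0.0
    # Count sliding-window matches of the normalized keyword token sequence.
    hits = sum(
        1 for i in range(len(words) - len(kw) + 1) if words[i : i + len(kw)] == kw
    )
    return hits / len(words) * 100
```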

Style Gate: Multi-Dimensional

The style checker evaluates five dimensions:

- Readability (StyleCheckResult): Flesch-Kincaid grade, average sentence length, within-target check
- Structure compliance (StyleStructureCompliance): validates required elements from the content template
- Voice evaluation (StyleVoiceEvaluation): pillar tone match, individual voice/tone issues
- Humanizer check (StyleHumanizerCheck): detects AI-writing patterns (formulaic transitions, hedging overuse, etc.)
- Medical language (StyleMedicalLanguage): flags inappropriate medical terminology, missing hedging, unsupported claims
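The readability dimension can be approximated with simple text statistics. This is a rough sketch, not the StyleCheckResult implementation; the real check also computes a Flesch-Kincaid grade, which is omitted here:

```python
import re

def avg_sentence_length(text: str) -> float:
    """Average words per sentence, splitting on terminal punctuation."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    return len(words) / max(len(sentences), 1)

def within_target(text: str, max_avg_len: float = 20.0) -> bool:
    # The 20-word target is an assumed example threshold, not the real one.
    return avg_sentence_length(text) <= max_avg_len
```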

Synthesis and Rewrite Loop

When any gate fails:

1. The synthesis agent receives the draft, all gate outputs, the article brief, and optionally the research dossier
2. It produces a SynthesisOutput containing:
   - a revised_draft with changes applied
   - a change_log listing each change (gate, location, original, revised, reason)
   - conflict_resolutions when gates disagree (e.g., SEO wants more keywords but style says it reads unnaturally)
3. The change log is serialized as rewrite instructions and fed back to the writer agent
4. The writer produces a new draft incorporating the feedback
5. Quality gates run again on the new draft

This loop repeats up to 3 times. If all iterations are exhausted without passing, the article is marked as failed.
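The loop can be sketched as below; write_draft, run_gates, and synthesize_feedback are illustrative stand-ins for the real writer, gate, and synthesis agents:

```python
MAX_ITERATIONS = 3  # matches the pipeline's rewrite limit

def produce_article(brief, write_draft, run_gates, synthesize_feedback):
    """Run the draft → gates → synthesis loop until all gates pass or the
    iteration budget is exhausted."""
    feedback = None
    for iteration in range(1, MAX_ITERATIONS + 1):
        draft = write_draft(brief, feedback)
        results = run_gates(draft)
        if all(r["passed"] for r in results.values()):
            return {"status": "passed", "draft": draft, "iterations": iteration}
        # Synthesis compiles gate feedback into rewrite instructions.
        feedback = synthesize_feedback(draft, results)
    return {"status": "failed", "iterations": MAX_ITERATIONS}
```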

Langfuse Score Tracking

When Langfuse is configured, quality gate scores are posted to the article's root Langfuse span after every iteration:

Top-level scores (per iteration):

- factuality_iter1, factuality_iter2, factuality_iter3: numeric scores
- seo_passed_iter1, seo_passed_iter2: boolean (1.0/0.0)
- style_passed_iter1, style_passed_iter2: boolean (1.0/0.0)

Granular sub-scores (per iteration):

- Factuality: claims_checked, claims_verified, claims_flagged
- SEO: keyword_density, secondary_keywords, word_count, heading_hierarchy, faq_present
- Style: readability, structure, voice, humanizer, medical_language

Synthesis feedback (per iteration):

- synthesis_feedback_iter1: a string score containing the change log from the synthesis agent
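Assembling the per-iteration score names could look like this sketch; the actual Langfuse client call that posts them to the root span is omitted, since the exact API usage is not shown in this document:

```python
def iteration_scores(
    iteration: int, factuality: float, seo_passed: bool, style_passed: bool
) -> dict[str, float]:
    """Build the top-level score payload for one iteration; booleans are
    encoded as 1.0/0.0 as described above."""
    return {
        f"factuality_iter{iteration}": factuality,
        f"seo_passed_iter{iteration}": 1.0 if seo_passed else 0.0,
        f"style_passed_iter{iteration}": 1.0 if style_passed else 0.0,
    }
```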

This provides visibility into quality progression across rewrite iterations directly in the Langfuse dashboard, with enough granularity to diagnose which specific aspect of quality is causing rewrites.