Security¶
The pipeline handles medical content and external data sources, so security is a core concern.
External Content Handling¶
- Sanitization — all external content (scraped web pages, RSS feeds, search results) passes through
sanitize_external_content()before being sent to LLMs or stored in the database - User role only — external text is always placed in
userrole messages, never in system prompts, to prevent prompt injection
Database Security¶
- SSL required — all database connections must include
sslmode=require(enforced in config validation) - Least-privilege users — the runtime pipeline user has no DDL privileges; a separate migrations user handles schema changes
- IP allowlist — Neon's IP Allow feature restricts connections to known IPs (Fly.io egress, developer machines)
Dashboard Authentication¶
- Cloudflare Access — the dashboard is protected by Cloudflare Access (Zero Trust). Users authenticate via Google or GitHub OAuth. The JWT
Cf-Access-Jwt-Assertionheader is validated on each request - Role-based access —
adminandeditorroles restrict access to sensitive pages (prompt editing, user management) - Local bypass — when
ENVIRONMENT=local, a dev email bypass is available viaDASHBOARD_DEV_EMAIL
Image Storage¶
- Cloudflare R2 — featured images are stored in R2 object storage, not in the database. R2 credentials (
R2_ACCOUNT_ID,R2_ACCESS_KEY_ID,R2_SECRET_ACCESS_KEY) are Fly.io secrets - Presigned URLs — the dashboard uses presigned R2 URLs (15 min TTL) for image display, avoiding direct credential exposure to the browser
Secrets Management¶
- Environment variables only — all credentials are loaded from environment variables, never hardcoded
- detect-secrets — a pre-commit hook scans for accidentally committed secrets (baseline in
.secrets.baseline)
Cost Controls¶
- Per-flow ceilings -- each flow has a configurable token budget. The pipeline raises
CostLimitExceededif exceeded - Model selection -- cheaper models (Haiku) are used for classification/scoring tasks; expensive models (Sonnet) only for creative work
- Prompt caching -- system prompts use Anthropic's ephemeral cache to reduce repeat input costs
Observability Security¶
- Langfuse -- response previews are truncated to 500 characters to avoid sending full medical content to external services
- Sentry -- performance tracing sample rate is 0.0 (no request-level traces sent); only exceptions are captured
- Structured logs -- JSON output in production; no sensitive content in log fields (article IDs and agent names only)