Skip to content

Security

The pipeline handles medical content and external data sources, so security is a core concern.

External Content Handling

  • Sanitization — all external content (scraped web pages, RSS feeds, search results) passes through sanitize_external_content() before being sent to LLMs or stored in the database
  • User role only — external text is always placed in user role messages, never in system prompts, to prevent prompt injection

Database Security

  • SSL required — all database connections must include sslmode=require (enforced in config validation)
  • Least-privilege users — the runtime pipeline user has no DDL privileges; a separate migrations user handles schema changes
  • IP allowlist — Neon's IP Allow feature restricts connections to known IPs (Fly.io egress, developer machines)

Dashboard Authentication

  • Cloudflare Access — the dashboard is protected by Cloudflare Access (Zero Trust). Users authenticate via Google or GitHub OAuth. The JWT Cf-Access-Jwt-Assertion header is validated on each request
  • Role-based accessadmin and editor roles restrict access to sensitive pages (prompt editing, user management)
  • Local bypass — when ENVIRONMENT=local, a dev email bypass is available via DASHBOARD_DEV_EMAIL

Image Storage

  • Cloudflare R2 — featured images are stored in R2 object storage, not in the database. R2 credentials (R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY) are Fly.io secrets
  • Presigned URLs — the dashboard uses presigned R2 URLs (15 min TTL) for image display, avoiding direct credential exposure to the browser

Secrets Management

  • Environment variables only — all credentials are loaded from environment variables, never hardcoded
  • detect-secrets — a pre-commit hook scans for accidentally committed secrets (baseline in .secrets.baseline)

Cost Controls

  • Per-flow ceilings -- each flow has a configurable token budget. The pipeline raises CostLimitExceeded if exceeded
  • Model selection -- cheaper models (Haiku) are used for classification/scoring tasks; expensive models (Sonnet) only for creative work
  • Prompt caching -- system prompts use Anthropic's ephemeral cache to reduce repeat input costs

Observability Security

  • Langfuse -- response previews are truncated to 500 characters to avoid sending full medical content to external services
  • Sentry -- performance tracing sample rate is 0.0 (no request-level traces sent); only exceptions are captured
  • Structured logs -- JSON output in production; no sensitive content in log fields (article IDs and agent names only)