Skip to content

Security

The pipeline handles medical content and external data sources, so security is a core concern.

External Content Handling

  • Sanitization — all external content (scraped web pages, RSS feeds, search results) passes through sanitize_external_content() before being sent to LLMs or stored in the database
  • User role only — external text is always placed in user role messages, never in system prompts, to prevent prompt injection

Database Security

  • SSL required — all database connections must include sslmode=require (enforced in config validation)
  • Least-privilege users — the runtime pipeline user has no DDL privileges; a separate migrations user handles schema changes
  • IP allowlist — Neon's IP Allow feature restricts connections to known IPs (Fly.io egress, developer machines)

Secrets Management

  • Environment variables only — all credentials are loaded from environment variables, never hardcoded
  • detect-secrets — a pre-commit hook scans for accidentally committed secrets (baseline in .secrets.baseline)
  • Streamlit secrets — the dashboard reads credentials from Streamlit's secrets manager, not from .env

Cost Controls

  • Per-article ceiling — each article production run has a token budget. The pipeline stops if exceeded
  • Monthly budget — aggregate monthly token spend is tracked and alerted on threshold breach
  • Model selection — cheaper models (Haiku) are used for classification/scoring tasks; expensive models (Sonnet) only for creative work