Security¶
The pipeline handles medical content and external data sources, so security is a core concern.
External Content Handling¶
- Sanitization — all external content (scraped web pages, RSS feeds, search results) passes through
sanitize_external_content()before being sent to LLMs or stored in the database - User role only — external text is always placed in
userrole messages, never in system prompts, to prevent prompt injection
Database Security¶
- SSL required — all database connections must include
sslmode=require(enforced in config validation) - Least-privilege users — the runtime pipeline user has no DDL privileges; a separate migrations user handles schema changes
- IP allowlist — Neon's IP Allow feature restricts connections to known IPs (Fly.io egress, developer machines)
Secrets Management¶
- Environment variables only — all credentials are loaded from environment variables, never hardcoded
- detect-secrets — a pre-commit hook scans for accidentally committed secrets (baseline in
.secrets.baseline) - Streamlit secrets — the dashboard reads credentials from Streamlit's secrets manager, not from
.env
Cost Controls¶
- Per-article ceiling — each article production run has a token budget. The pipeline stops if exceeded
- Monthly budget — aggregate monthly token spend is tracked and alerted on threshold breach
- Model selection — cheaper models (Haiku) are used for classification/scoring tasks; expensive models (Sonnet) only for creative work