Skip to content

Naluma Content Pipeline

Security

Security¶

The pipeline handles medical content and external data sources, so security is a core concern.

External Content Handling¶

Sanitization — all external content (scraped web pages, RSS feeds, search results) passes through sanitize_external_content() before being sent to LLMs or stored in the database
User role only — external text is always placed in user role messages, never in system prompts, to prevent prompt injection

Database Security¶

SSL required — all database connections must include sslmode=require (enforced in config validation)
Least-privilege users — the runtime pipeline user has no DDL privileges; a separate migrations user handles schema changes
IP allowlist — Neon's IP Allow feature restricts connections to known IPs (Fly.io egress, developer machines)

Secrets Management¶

Environment variables only — all credentials are loaded from environment variables, never hardcoded
detect-secrets — a pre-commit hook scans for accidentally committed secrets (baseline in .secrets.baseline)
Streamlit secrets — the dashboard reads credentials from Streamlit's secrets manager, not from .env

Cost Controls¶

Per-article ceiling — each article production run has a token budget. The pipeline stops if exceeded
Monthly budget — aggregate monthly token spend is tracked and alerted on threshold breach
Model selection — cheaper models (Haiku) are used for classification/scoring tasks; expensive models (Sonnet) only for creative work