# Testing Guide

## Test Structure

Tests are organised in three layers (unit, integration, and e2e), plus an evaluation suite and shared fixtures:
```
tests/
    unit/         # Fast, no external dependencies
    integration/  # Real DB, marked with @pytest.mark.integration
    e2e/          # Full pipeline, mocked agents + WordPress
    evaluation/   # LLM-as-judge prompt evaluations (not in CI)
    fixtures/     # Shared test data (JSON, agent responses)
```
## Running Tests

```bash
# All unit tests
uv run pytest

# With coverage
uv run pytest --cov=src

# Integration tests (requires DB)
uv run pytest -m integration

# E2E tests
uv run pytest -m e2e

# Evaluation tests (not in CI, requires API keys)
uv run pytest tests/evaluation/ -m "not integration"

# Single test file
uv run pytest tests/unit/test_agents/test_writer.py -v
```
## Key Patterns

### Patch at Import Sites

Mock functions where they are imported, not where they are defined:
```python
from unittest.mock import patch

# CORRECT — patches the reference in the flow module
@patch("src.pipeline.article_flow.write_article")
async def test_flow(mock_write):
    ...

# WRONG — patches the original, but the flow module already imported its own copy
@patch("src.agents.writer.write_article")
async def test_flow(mock_write):
    ...
```
> **Warning: removing imports breaks test patches.** When removing an unused
> import from production code, search the tests for `patch("src.module.name")`
> first. Removing the import causes an `AttributeError` in tests that patch
> the import site.
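A quick pre-check is a recursive grep over the test suite; the patched path here reuses the example above:

```bash
# Find tests that patch write_article at its import site before removing the import
grep -rn 'patch("src.pipeline.article_flow.write_article' tests/
```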
### Prefect Task Bypass

E2E and integration tests monkey-patch `Task.__call__` so that `@task`-decorated
functions execute directly, without a Prefect API server:

```python
# conftest.py
import pytest
from prefect import Task

@pytest.fixture(autouse=True)
def bypass_prefect_tasks(monkeypatch):
    # Call the wrapped function directly, skipping Prefect's orchestration
    monkeypatch.setattr(Task, "__call__", lambda self, *a, **kw: self.fn(*a, **kw))
```
### Async Test Configuration

All async tests use a session-scoped event loop, configured in `pyproject.toml`:
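The config block itself is not reproduced here; a plausible shape, assuming pytest-asyncio (the keys are real pytest-asyncio options, the values are assumptions about this project), is:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"                           # discover async tests without per-test markers
asyncio_default_fixture_loop_scope = "session"  # share one event loop across the whole session
```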
For tests with concurrent DB access (e.g., parallel quality gates), use an
`asyncio.Lock()` to serialise session patches, as sketched below.
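A minimal sketch of the pattern; the patch target `src.db.session.get_session` and the helper names are hypothetical stand-ins for the project's real fixtures:

```python
import asyncio
from unittest.mock import patch

_session_lock = asyncio.Lock()  # shared by all concurrently running gates

async def run_with_patched_session(gate, fake_session):
    # Only one coroutine may hold the patch at a time, so parallel quality
    # gates never observe each other's patched session object.
    async with _session_lock:
        with patch("src.db.session.get_session", return_value=fake_session):
            return await gate()
```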
### Pydantic Strict Mode in Fixtures

JSON fixtures contain raw strings for `StrEnum` fields, which strict mode
rejects. Use `strict=False` when loading:
```python
import json

data = json.loads(fixture_path.read_text())
output = WriterOutput.model_validate(data, strict=False)
```
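Why this is needed, as a self-contained illustration (the `Tone` enum and `Draft` model are hypothetical): JSON can only supply plain strings, and Pydantic's strict mode requires actual enum members, while lax mode coerces matching strings.

```python
from enum import StrEnum
from pydantic import BaseModel, ValidationError

class Tone(StrEnum):       # hypothetical enum for illustration
    FORMAL = "formal"

class Draft(BaseModel):    # hypothetical model for illustration
    tone: Tone

Draft.model_validate({"tone": "formal"}, strict=False)   # ok: string coerced to Tone.FORMAL
try:
    Draft.model_validate({"tone": "formal"}, strict=True)
except ValidationError:
    pass                                                  # strict mode rejects the raw string
```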
## Adding New Database Tables

When adding a new table to `src/db/tables.py`, also update the expected set
in `tests/unit/test_db/test_tables.py::test_all_tables_defined`.
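As a rough sketch of what that test checks, assuming SQLAlchemy-style table metadata (the `metadata` attribute and the table names are assumptions, not the project's real values):

```python
from src.db import tables

EXPECTED_TABLES = {"articles", "revisions"}  # hypothetical; the real set lives in the test

def test_all_tables_defined():
    defined = {t.name for t in tables.metadata.sorted_tables}
    assert defined == EXPECTED_TABLES, f"mismatch: {defined ^ EXPECTED_TABLES}"
```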
## Evaluation Tests

`tests/evaluation/` contains LLM-as-judge tests that evaluate prompt quality.
They are not run in CI: they require API keys and are slow.

Each evaluation test sends a prompt to an LLM and checks the output against
quality criteria. Run these after modifying agent prompts to verify there is
no regression.
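The general shape of such a test, sketched with stub names (`judge`, `Verdict`, and the rubric are illustrative, not the project's real API):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    passed: bool
    reasoning: str

async def judge(article: str, rubric: str) -> Verdict:
    # The real suite would send the article plus the rubric to an LLM and
    # parse its grade; this stub only shows the test's shape.
    return Verdict(passed=bool(article.strip()), reasoning="stub")

async def test_writer_prompt_is_clear():
    article = "..."  # in the real test, produced by the agent under evaluation
    verdict = await judge(article, rubric="Is the article clear and on-topic?")
    assert verdict.passed, verdict.reasoning
```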