Testing Guide

Test Structure

Tests are organised in four layers, plus shared fixtures:

tests/
  unit/           # Fast, no external dependencies
  integration/    # Real DB, marked with @pytest.mark.integration
  e2e/            # Full pipeline, mocked agents + WordPress
  evaluation/     # LLM-as-judge prompt evaluations (not in CI)
  fixtures/       # Shared test data (JSON, agent responses)
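The custom markers used below (integration, e2e) are assumed to be registered in pyproject.toml; a typical registration looks like:

```toml
[tool.pytest.ini_options]
markers = [
    "integration: requires a real database",
    "e2e: full pipeline run with mocked agents and WordPress",
]
```

Registering markers avoids PytestUnknownMarkWarning and lets -m select tests reliably.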

Running Tests

# All unit tests
uv run pytest

# With coverage
uv run pytest --cov=src

# Integration tests (requires DB)
uv run pytest -m integration

# E2E tests
uv run pytest -m e2e

# Evaluation tests (not in CI, requires API keys)
uv run pytest tests/evaluation/ -m "not integration"

# Single test file
uv run pytest tests/unit/test_agents/test_writer.py -v

Key Patterns

Patch at Import Sites

Mock functions where they are imported, not where they are defined:

from unittest.mock import patch

# CORRECT — patches the reference in the flow module
@patch("src.pipeline.article_flow.write_article")
async def test_flow(mock_write):
    ...

# WRONG — patches the original, but flow already imported its own copy
@patch("src.agents.writer.write_article")
async def test_flow(mock_write):
    ...
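The difference can be demonstrated with two stand-in modules built at runtime (demo_writer and demo_flow are hypothetical names, not the project's real modules):

```python
import sys
import types
from unittest.mock import patch

# Simulate src.agents.writer, where write_article is defined
writer = types.ModuleType("demo_writer")
writer.write_article = lambda: "real"
sys.modules["demo_writer"] = writer

# Simulate src.pipeline.article_flow doing `from ... import write_article`:
# the flow module now holds its OWN reference to the function
flow = types.ModuleType("demo_flow")
flow.write_article = writer.write_article
flow.run = lambda: flow.write_article()
sys.modules["demo_flow"] = flow

# Patching the import site changes what the flow calls
with patch("demo_flow.write_article", return_value="mocked"):
    assert flow.run() == "mocked"

# Patching the definition site leaves the flow's copy untouched
with patch("demo_writer.write_article", return_value="mocked"):
    assert flow.run() == "real"
```

Because `from module import name` copies the reference, only the copy in the importing module matters at call time.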

Removing imports breaks test patches

When removing an unused import from production code, search for patch("src.module.name") in tests first. Removing the import causes AttributeError in tests that patch the import site.

Prefect Task Bypass

E2E and integration tests monkey-patch Task.__call__ so @task-decorated functions execute directly without a Prefect API server:

# conftest.py
import pytest
from prefect.tasks import Task

@pytest.fixture(autouse=True)
def bypass_prefect_tasks(monkeypatch):
    monkeypatch.setattr(Task, "__call__", lambda self, *a, **kw: self.fn(*a, **kw))
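The mechanism can be shown without Prefect installed, using a stand-in Task class (names here are illustrative, not Prefect's real internals):

```python
# Stand-in for Prefect's Task wrapper: calling it normally would
# try to contact the API server
class Task:
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *a, **kw):
        raise RuntimeError("would contact the Prefect API server")

def task(fn):
    return Task(fn)

@task
def add(x, y):
    return x + y

# Equivalent of the fixture's monkeypatch: route __call__ straight to .fn
Task.__call__ = lambda self, *a, **kw: self.fn(*a, **kw)

assert add(2, 3) == 5  # runs the wrapped function directly
```

Because monkeypatch restores the original `__call__` after each test, the bypass never leaks outside the test session.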

Async Test Configuration

All async tests use a session-scoped event loop (configured in pyproject.toml):

[tool.pytest.ini_options]
asyncio_default_test_loop_scope = "session"

For tests with concurrent DB access (e.g., parallel quality gates), use asyncio.Lock() to serialise session patches.
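A minimal sketch of the lock pattern, with run_gate standing in for a quality-gate coroutine that touches a patched session (the names are illustrative):

```python
import asyncio

# One shared lock serialises access to the patched session
session_lock = asyncio.Lock()

async def run_gate(gate_id: int, results: list[int]) -> None:
    async with session_lock:  # only one gate uses the session at a time
        results.append(gate_id)

async def main() -> list[int]:
    results: list[int] = []
    # Gates run concurrently, but the critical section is serialised
    await asyncio.gather(*(run_gate(i, results) for i in range(3)))
    return results

assert sorted(asyncio.run(main())) == [0, 1, 2]
```

Without the lock, concurrent coroutines can interleave mid-transaction on a single patched session and produce flaky failures.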

Pydantic Strict Mode in Fixtures

JSON fixtures contain raw strings for StrEnum fields. Use strict=False when loading:

import json

# fixture_path is a pathlib.Path to the JSON fixture file
data = json.loads(fixture_path.read_text())
output = WriterOutput.model_validate(data, strict=False)

Adding New Database Tables

When adding a new table to src/db/tables.py, also update the expected set in tests/unit/test_db/test_tables.py::test_all_tables_defined.
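The guard test presumably compares a hard-coded expected set against what the module defines; a hypothetical sketch (table names and helper are illustrative, the real test reads src/db/tables.py):

```python
# Expected set maintained by hand; update it alongside src/db/tables.py
EXPECTED_TABLES = {"articles", "authors"}

def get_defined_tables() -> set[str]:
    # In the real test this would inspect the table metadata
    # defined in src/db/tables.py
    return {"articles", "authors"}

def test_all_tables_defined():
    # Set equality catches both missing and unexpected tables
    assert get_defined_tables() == EXPECTED_TABLES

test_all_tables_defined()
```

Comparing sets rather than counts makes the failure message name exactly which table was added or dropped.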

Evaluation Tests

tests/evaluation/ contains LLM-as-judge tests that evaluate prompt quality. These are not run in CI — they require API keys and are slow.

# Run all evaluations
uv run pytest tests/evaluation/ -m "not integration" -v

Each evaluation test sends a prompt to an LLM and checks the output against quality criteria. Use these after modifying agent prompts to verify no regression.
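The shape of such a test can be sketched with the LLM call stubbed out; judge and evaluate are hypothetical names, and the real tests send the prompt to an actual model:

```python
def judge(article: str) -> dict[str, int]:
    # Stand-in for an LLM-as-judge call that scores quality criteria;
    # the real implementation hits an API and parses the response
    return {"clarity": 4, "accuracy": 5}

def evaluate(article: str, threshold: int = 3) -> bool:
    # Pass only if every criterion meets the threshold
    scores = judge(article)
    return all(score >= threshold for score in scores.values())

assert evaluate("draft article text") is True
```

Keeping the threshold explicit makes it easy to tighten the bar per criterion after a prompt change.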