Tools API

tools

Agent tools — external data sources and utilities.

Each tool module exports a JSON schema constant (*_SCHEMA) describing the tool's parameters and an async handler function that agents invoke via the tool-use loop in agents.base.call_agent().
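As a rough illustration of that convention, the sketch below pairs a schema constant with an async handler and dispatches to it by name. The `ECHO_SCHEMA` layout, the `echo` handler, and the `dispatch` helper are all hypothetical; the real schema shape and the loop inside agents.base.call_agent() may differ.

```python
import asyncio
from typing import Any

# Hypothetical schema constant. The field layout is an assumption modelled on
# common JSON-schema tool definitions, not copied from the project.
ECHO_SCHEMA: dict[str, Any] = {
    "name": "echo",
    "description": "Return the query unchanged (illustration only).",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


async def echo(query: str) -> dict[str, str]:
    """Hypothetical async handler paired with ECHO_SCHEMA."""
    return {"result": query}


# Map each tool name to its handler so a tool-use loop can look it up.
HANDLERS = {ECHO_SCHEMA["name"]: echo}


async def dispatch(tool_name: str, arguments: dict[str, Any]) -> Any:
    """Await the handler registered for tool_name with the given arguments."""
    handler = HANDLERS[tool_name]
    return await handler(**arguments)
```

Calling `asyncio.run(dispatch("echo", {"query": "tinnitus"}))` runs the handler the same way a tool-use loop might after the model requests the `echo` tool.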

Modules:

  - web_search: Brave Search API client.
  - pubmed: PubMed E-utilities client (search + fetch abstracts).
  - semantic_scholar: Semantic Scholar Academic Graph API client.
  - keyword_data: SEO keyword volume and difficulty data.
  - vault_reader: Hybrid pgvector + full-text search over Obsidian vault notes.
  - vault_chunking: Chunk vault notes for embedding.
  - source_scanner: Scan and validate source URLs.
  - image_gen: AI image generation tool wrapper.
  - image_colors: Image color extraction utilities.
  - clinical_trials: ClinicalTrials.gov API client.
  - source_enrichment: Source metadata enrichment.
  - sanitize: External content sanitization before LLM ingestion.

Web search tool for research agents.

Provides a pluggable backend architecture via WebSearchBackend protocol. The default backend uses Brave's LLM Context API which returns pre-processed content chunks optimized for LLM consumption. Results are sanitized before being returned to callers.

WebSearchResult

Bases: BaseModel

A single web search result.

WebSearchBackend

Bases: Protocol

Protocol for pluggable web search backends.
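A minimal sketch of how a protocol-based backend can work, assuming the protocol exposes a single async `search(query, max_results)` method as documented for BraveSearchBackend. The `Result` dataclass, `SearchBackend`, `FakeBackend`, and `run_search` are hypothetical stand-ins, and the result field names are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Result:
    # Hypothetical stand-in for WebSearchResult; field names are assumptions.
    title: str
    url: str
    snippet: str


class SearchBackend(Protocol):
    """Any object with a matching async search() satisfies this protocol."""

    async def search(self, query: str, max_results: int) -> list[Result]: ...


class FakeBackend:
    """In-memory backend for tests; note that it does not inherit from SearchBackend."""

    async def search(self, query: str, max_results: int) -> list[Result]:
        hits = [Result(f"{query} #{i}", f"https://example.com/{i}", "") for i in range(10)]
        return hits[:max_results]


async def run_search(backend: SearchBackend, query: str, max_results: int) -> list[Result]:
    """Call whichever backend was passed in; the caller never sees its concrete type."""
    return await backend.search(query, max_results)
```

Because protocols are checked structurally, a test double such as `FakeBackend` can replace a real provider without touching the calling code.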

BraveSearchBackend(api_key, maximum_number_of_tokens=2000)

Brave LLM Context API backend for web search.

Uses the /res/v1/llm/context endpoint which returns pre-processed content chunks ranked for LLM consumption. Same pricing as regular web search ($5/1K requests).

search(query, max_results) async

Search using Brave LLM Context API.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| max_results | int | Maximum number of URL results to return. | required |

NotImplementedBackend

Stub backend that raises until a real provider is configured.

configure_backend(backend)

Set the module-level web search backend.

configure_brave_backend()

Initialize BraveSearchBackend if brave_api_key is configured.

Search the web and return sanitized results.

Delegates to the module-level _backend. All string fields in each result are passed through sanitize_external_content before being returned.
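The delegation pattern described here might look roughly like the following sketch. Everything in it is illustrative: `StubBackend`, `StaticBackend`, `configure`, `clean`, and `search_web` are hypothetical names, results are plain dicts rather than WebSearchResult objects, and `clean` is only a placeholder for sanitize_external_content.

```python
import asyncio


class StubBackend:
    """Raises until a real provider is configured."""

    async def search(self, query: str, max_results: int) -> list[dict[str, str]]:
        raise NotImplementedError("no web search backend configured")


class StaticBackend:
    """Returns one canned, deliberately dirty result for demonstration."""

    async def search(self, query: str, max_results: int) -> list[dict[str, str]]:
        return [{"title": "  Hello\x00 ", "url": "https://example.com"}][:max_results]


_backend = StubBackend()  # module-level default


def configure(backend) -> None:
    """Swap the module-level backend."""
    global _backend
    _backend = backend


def clean(value: str) -> str:
    # Placeholder sanitizer: drops NUL bytes and trims surrounding whitespace.
    return value.replace("\x00", "").strip()


async def search_web(query: str, max_results: int = 5) -> list[dict[str, str]]:
    """Delegate to the configured backend, then clean every string field."""
    raw = await _backend.search(query, max_results)
    return [
        {key: clean(val) if isinstance(val, str) else val for key, val in item.items()}
        for item in raw
    ]
```

Keeping sanitization in the delegating function rather than in each backend means a newly added backend cannot skip it by accident.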

PubMed

pubmed

PubMed E-utilities client for searching and fetching article metadata.

Uses NCBI E-utilities (esearch + efetch) with rate limiting to stay within NCBI's usage policies. All external text is sanitized before returning.

PubMedArticle

Bases: BaseModel

Parsed article metadata from PubMed.

PubMedError(pmid, detail='')

Bases: Exception

Raised when a PubMed API operation fails.

search_pubmed(query, max_results=20, date_range_days=7) async

Search PubMed and return PMIDs with titles and publication dates.

First calls esearch to get matching PMIDs, then esummary to fetch titles and dates — enabling the LLM to select relevant papers before fetching full abstracts.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Free-text search query. | required |
| max_results | int | Maximum number of results to return (default 20). | 20 |
| date_range_days | int | Restrict to articles published within the last N days. | 7 |
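To make the two-step flow concrete, the sketch below builds the esearch and esummary request URLs using documented E-utilities parameters (`db`, `term`, `retmax`, `retmode`, `datetype`, `reldate`, `id`). The helper names `esearch_url` and `esummary_url` are hypothetical, and whether the module uses publication date (`pdat`) for its date filter is an assumption.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(query: str, max_results: int = 20, date_range_days: int = 7) -> str:
    """Build an esearch URL restricted to the last date_range_days days."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
        "datetype": "pdat",          # assumption: filter on publication date
        "reldate": date_range_days,  # articles from the last N days
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esummary_url(pmids: list[str]) -> str:
    """Build an esummary URL that fetches titles and dates for the given PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"
```

The PMIDs returned by the first request feed the second, so the caller sees titles and dates before deciding which abstracts are worth fetching.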

fetch_abstract(pmid) async

Fetch article metadata for a single PMID from PubMed efetch.

Returns a fallback PubMedArticle with empty fields if the article is unavailable (retracted, withdrawn, or missing from PubMed). Raises PubMedError only on unexpected parse failures.

fetch_paper(pmid_or_doi) async

Fetch article metadata by PMID or DOI.

If pmid_or_doi contains the substring "10.", it is treated as a DOI and resolved via a PubMed search first. Otherwise it is treated as a numeric PMID. Raises PubMedError for URLs or other non-identifier input.
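The routing rule can be sketched as a small classifier. This is an illustration only: `classify_identifier` and `IdentifierError` are hypothetical names (the latter standing in for PubMedError), and checking for URLs before the DOI test is an assumption about ordering, made so that a DOI URL is rejected rather than treated as a DOI.

```python
import re


class IdentifierError(ValueError):
    """Hypothetical stand-in for PubMedError."""


def classify_identifier(pmid_or_doi: str) -> tuple[str, str]:
    """Return ("doi", value) or ("pmid", value); raise on anything else."""
    value = pmid_or_doi.strip()
    # Assumption: URLs are rejected up front, even if they embed a DOI.
    if value.lower().startswith(("http://", "https://")):
        raise IdentifierError(f"expected a PMID or DOI, got a URL: {value!r}")
    # DOIs begin with the "10." directory indicator.
    if "10." in value:
        return ("doi", value)
    # PMIDs are purely numeric.
    if re.fullmatch(r"\d+", value):
        return ("pmid", value)
    raise IdentifierError(f"not a PMID or DOI: {value!r}")
```

For example, `classify_identifier("10.1000/xyz123")` is routed down the DOI path, while `classify_identifier("12345678")` is treated as a PMID.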

Semantic Scholar

semantic_scholar

Semantic Scholar Academic Graph API client.

Searches for papers by query, returning titles, URLs, and citation counts. Abstracts are fetched separately via fetch_s2_paper for selected papers. All external text is sanitized before returning. Rate-limited to 1 request per 2 seconds to stay within the authenticated API tier.
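One common way to enforce a limit such as one request per 2 seconds is a minimum-interval limiter shared by all callers. The sketch below is an assumption about how this could be done, not the module's actual mechanism; `MinIntervalLimiter` is a hypothetical name.

```python
import asyncio
import time
from typing import Optional


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive calls."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock = asyncio.Lock()
        self._last_call: Optional[float] = None

    async def wait(self) -> None:
        # The lock serializes callers so concurrent tasks queue up in turn.
        async with self._lock:
            if self._last_call is not None:
                elapsed = time.monotonic() - self._last_call
                remaining = self._min_interval - elapsed
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last_call = time.monotonic()


async def demo(interval: float, calls: int) -> float:
    """Run `calls` rate-limited no-op requests and return the elapsed seconds."""
    limiter = MinIntervalLimiter(interval)
    start = time.monotonic()
    for _ in range(calls):
        await limiter.wait()  # a real client would issue its HTTP request here
    return time.monotonic() - start
```

With `MinIntervalLimiter(2.0)`, each request after the first would wait until 2 seconds have passed since the previous one.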

SemanticScholarPaper

Bases: BaseModel

A single paper result from Semantic Scholar.

search_semantic_scholar(query, max_results=5, year='') async

Search Semantic Scholar for papers matching query.

Returns lightweight results (titles, dates, citation counts) without abstracts. Use fetch_s2_paper to get full details for selected papers.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Free-text search query. | required |
| max_results | int | Maximum number of papers to return (default 5). | 5 |
| year | str | Optional year range filter (e.g. "2020-2025"). | '' |
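These parameters map naturally onto the Academic Graph paper search endpoint, which accepts `query`, `limit`, `fields`, and `year`. The sketch below only builds the request URL; the helper name `s2_search_url` and the exact `fields` list are assumptions, chosen to match the lightweight results described above.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def s2_search_url(query: str, max_results: int = 5, year: str = "") -> str:
    """Build a paper search URL that requests no abstracts."""
    params = {
        "query": query,
        "limit": max_results,
        # Assumed field list: enough for a title, link, date, and citation count.
        "fields": "title,url,publicationDate,citationCount",
    }
    if year:  # an empty string means no year filter is sent at all
        params["year"] = year
    return f"{S2_SEARCH}?{urlencode(params)}"
```

Leaving `abstract` out of `fields` keeps search responses small; full details can then be fetched per paper.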

fetch_s2_paper(paper_id) async

Fetch full paper details (including abstract) for a single paper.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| paper_id | str | Semantic Scholar paper ID. | required |

Returns:

| Type | Description |
|---|---|
| SemanticScholarPaper | Full paper with abstract, title, URL, date, and citation count. |

Raises:

| Type | Description |
|---|---|
| HTTPStatusError | On API errors (404 if paper not found). |

Vault Reader

vault_reader

Search vault notes for tinnitus research context via DB-backed hybrid search.

Provides hybrid vector + full-text search over vault notes stored in Postgres. Matched content is sanitized before being returned to agents.

VaultNote

Bases: BaseModel

A single vault note with optional relevant section excerpts.

get_embedding(query) async

Embed query via OpenAI text-embedding-3-small.

Returns None on any failure so callers can fall back to text-only search.
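The None-on-failure contract lets the caller pick a search mode without handling exceptions. A minimal sketch, assuming the embedding call is injected as an async callable; `safe_embedding`, `choose_search_mode`, and the `Embedder` alias are hypothetical names, and no real OpenAI call is made.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Any async function mapping a query string to a vector.
Embedder = Callable[[str], Awaitable[list[float]]]


async def safe_embedding(query: str, embed: Embedder) -> Optional[list[float]]:
    """Return the embedding, or None if the embedder fails for any reason."""
    try:
        return await embed(query)
    except Exception:
        return None


async def choose_search_mode(query: str, embed: Embedder) -> str:
    """Use hybrid search when a vector is available, text-only otherwise."""
    vector = await safe_embedding(query, embed)
    return "hybrid" if vector is not None else "text_only"


async def working_embedder(query: str) -> list[float]:
    return [0.1, 0.2, 0.3]


async def failing_embedder(query: str) -> list[float]:
    raise RuntimeError("embedding service unavailable")
```

An outage of the embedding service then degrades search quality instead of breaking the tool.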

read_vault_notes(search_terms, folders=None) async

Search vault notes using hybrid vector + full-text search.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| search_terms | list[str] | Keywords combined into a single query for semantic and text search. | required |
| folders | list[str] \| None | Accepted for backward compatibility but effectively a no-op. All synced notes come from Know-how/. | None |

Returns:

| Type | Description |
|---|---|
| list[VaultNote] | Notes whose chunks match the query, grouped by note with relevant sections from matching chunks. Returns [] on any error. |
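The grouping step could be sketched as follows, assuming the database returns ranked (note path, chunk text) rows. The `Note` dataclass and `group_chunks` helper are hypothetical, and the real VaultNote fields may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    # Hypothetical stand-in for VaultNote.
    path: str
    sections: list[str] = field(default_factory=list)


def group_chunks(rows: list[tuple[str, str]]) -> list[Note]:
    """Group ranked (path, chunk_text) rows into one Note per path.

    Dicts preserve insertion order, so notes come out ordered by their
    best-ranked chunk, and sections keep their rank order within a note.
    """
    notes: dict[str, Note] = {}
    for path, chunk_text in rows:
        notes.setdefault(path, Note(path)).sections.append(chunk_text)
    return list(notes.values())
```

Returning one entry per note, rather than one per chunk, avoids showing the same note to the agent several times.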

Sanitization

sanitize

Content sanitization for external text before LLM or storage use.

Strips control characters, collapses excessive whitespace, escapes pipeline XML delimiters, and truncates to a safe length.

sanitize_external_content(text, max_length=10000)

Sanitize external content for safe use in LLM prompts and storage.

  1. Strip control characters (except \n and \t)
  2. Collapse runs of 3+ spaces to 2 spaces
  3. Escape pipeline XML delimiters to prevent delimiter confusion
  4. Truncate to max_length characters
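The four steps above can be sketched as a short function. This is an approximation under stated assumptions: the actual pipeline delimiter tags are not listed in this reference, so `_DELIMITER_TAGS` uses hypothetical tag names, and the function is named `sanitize_sketch` to avoid implying it matches the real implementation.

```python
import re

# Control characters except \t (0x09) and \n (0x0A); note this also drops \r.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")
_SPACE_RUNS = re.compile(r" {3,}")

# Hypothetical delimiter tag names; the real pipeline's tags are not documented here.
_DELIMITER_TAGS = ("document", "source")
_DELIMITER_RE = re.compile(
    r"</?(?:%s)\b[^>]*>" % "|".join(_DELIMITER_TAGS), re.IGNORECASE
)


def _escape_tag(match: "re.Match[str]") -> str:
    """Escape the angle brackets of one matched delimiter tag."""
    return match.group(0).replace("<", "&lt;").replace(">", "&gt;")


def sanitize_sketch(text: str, max_length: int = 10000) -> str:
    text = _CONTROL_CHARS.sub("", text)         # 1. strip control characters
    text = _SPACE_RUNS.sub("  ", text)          # 2. collapse 3+ spaces to 2
    text = _DELIMITER_RE.sub(_escape_tag, text) # 3. escape delimiter tags
    return text[:max_length]                    # 4. truncate
```

Truncating last means the length limit applies to the escaped text, although a cut can then land in the middle of an escape sequence.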