Tools API¶
tools
¶
Agent tools — external data sources and utilities.
Each tool module exports a JSON schema constant (*_SCHEMA) describing
the tool's parameters and an async handler function that agents invoke
via the tool-use loop in agents.base.call_agent().
Modules:
- web_search: Brave Search API client.
- pubmed: PubMed E-utilities client (search + fetch abstracts).
- semantic_scholar: Semantic Scholar Academic Graph API client.
- keyword_data: SEO keyword volume and difficulty data.
- vault_reader: Hybrid pgvector + full-text search over Obsidian vault notes.
- vault_chunking: Chunk vault notes for embedding.
- source_scanner: Scan and validate source URLs.
- image_gen: AI image generation tool wrapper.
- image_colors: Image color extraction utilities.
- clinical_trials: ClinicalTrials.gov API client.
- source_enrichment: Source metadata enrichment.
- sanitize: External content sanitisation before LLM ingestion.
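As a rough illustration of the schema-plus-handler pattern described above, the sketch below pairs a schema constant with an async handler and dispatches a tool call by name. ECHO_SCHEMA, echo, HANDLERS, and dispatch are hypothetical names invented for this example, and the schema keys are an assumption; the real *_SCHEMA constants and the loop in agents.base.call_agent() may look different.

```python
import asyncio
from typing import Any

# Hypothetical schema constant; the key layout is assumed, not taken from the package.
ECHO_SCHEMA: dict[str, Any] = {
    "name": "echo",
    "description": "Return the query unchanged (illustration only).",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


async def echo(query: str) -> dict[str, str]:
    """Async handler paired with ECHO_SCHEMA."""
    return {"echo": query}


# Hypothetical registry mapping tool names to their handlers.
HANDLERS = {ECHO_SCHEMA["name"]: echo}


async def dispatch(tool_name: str, arguments: dict[str, Any]) -> Any:
    """Look up the handler for a requested tool and await it with the given arguments."""
    handler = HANDLERS[tool_name]
    return await handler(**arguments)


result = asyncio.run(dispatch("echo", {"query": "tinnitus"}))
```

Keeping the schema next to its handler means the description sent to the model and the code that runs live in one module.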
Web Search¶
web_search
¶
Web search tool for research agents.
Provides a pluggable backend architecture via the WebSearchBackend protocol.
The default backend uses Brave's LLM Context API, which returns pre-processed
content chunks optimized for LLM consumption. Results are sanitized before
being returned to callers.
WebSearchResult
¶
Bases: BaseModel
A single web search result.
WebSearchBackend
¶
Bases: Protocol
Protocol for pluggable web search backends.
BraveSearchBackend(api_key, maximum_number_of_tokens=2000)
¶
Brave LLM Context API backend for web search.
Uses the /res/v1/llm/context endpoint, which returns pre-processed
content chunks ranked for LLM consumption. Pricing is the same as regular
web search ($5 per 1,000 requests).
search(query, max_results)
async
¶
Search using Brave LLM Context API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| max_results | int | Maximum number of URL results to return. | required |
NotImplementedBackend
¶
Stub backend that raises until a real provider is configured.
configure_backend(backend)
¶
Set the module-level web search backend.
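The pluggable-backend idea can be sketched as follows. This is a minimal illustration, not the module's code: Result stands in for WebSearchResult with assumed field names, and SearchBackend, UnconfiguredBackend, FakeBackend, set_backend, and run_search are hypothetical counterparts of the documented protocol, stub, and configure function.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Result:
    """Stand-in for WebSearchResult; the field names are assumed."""
    title: str
    url: str


class SearchBackend(Protocol):
    """Structural type: any object with a matching async search() qualifies."""

    async def search(self, query: str, max_results: int) -> list[Result]: ...


class UnconfiguredBackend:
    """Stub that raises until a real backend is installed."""

    async def search(self, query: str, max_results: int) -> list[Result]:
        raise NotImplementedError("no web search backend configured")


class FakeBackend:
    """In-memory backend, useful for tests."""

    async def search(self, query: str, max_results: int) -> list[Result]:
        hits = [Result(f"{query} #{i}", f"https://example.com/{i}") for i in range(10)]
        return hits[:max_results]


# Holder for the currently active backend (hypothetical layout).
_state: dict[str, SearchBackend] = {"backend": UnconfiguredBackend()}


def set_backend(backend: SearchBackend) -> None:
    """Swap the active backend."""
    _state["backend"] = backend


async def run_search(query: str, max_results: int = 5) -> list[Result]:
    """Delegate to whichever backend is currently installed."""
    return await _state["backend"].search(query, max_results)


set_backend(FakeBackend())
results = asyncio.run(run_search("tinnitus", max_results=3))
```

Because the protocol is structural, FakeBackend does not need to inherit from anything; matching the search signature is enough.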
configure_brave_backend()
¶
Initialize BraveSearchBackend if brave_api_key is configured.
web_search(query, max_results=5)
async
¶
Search the web and return sanitized results.
Delegates to the module-level _backend. All string fields in
each result are passed through sanitize_external_content before
being returned.
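One way to apply a sanitizer to every string field of a result is sketched below. It uses a dataclass rather than the Pydantic model, and clean is a placeholder for sanitize_external_content; Result, its fields, and sanitize_result are assumptions made for the example.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Result:
    """Stand-in for WebSearchResult; the field names are assumed."""
    title: str
    url: str
    rank: int


def clean(text: str) -> str:
    """Placeholder sanitizer: drops NUL characters and trims surrounding whitespace."""
    return text.replace("\x00", "").strip()


def sanitize_result(result: Result) -> Result:
    """Return a copy with every str field cleaned; other fields are left untouched."""
    updates = {
        f.name: clean(getattr(result, f.name))
        for f in fields(result)
        if isinstance(getattr(result, f.name), str)
    }
    return replace(result, **updates)


cleaned = sanitize_result(Result("  Hello\x00 ", "https://example.com", 1))
```

Iterating over the fields generically means a newly added string field is sanitized without further changes.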
PubMed¶
pubmed
¶
PubMed E-utilities client for searching and fetching article metadata.
Uses NCBI E-utilities (esearch + efetch) with rate limiting to stay within NCBI's usage policies. All external text is sanitized before returning.
PubMedArticle
¶
Bases: BaseModel
Parsed article metadata from PubMed.
PubMedError(pmid, detail='')
¶
Bases: Exception
Raised when a PubMed API operation fails.
search_pubmed(query, max_results=20, date_range_days=7)
async
¶
Search PubMed and return PMIDs with titles and publication dates.
First calls esearch to get matching PMIDs, then esummary to fetch titles and dates, so the LLM can select relevant papers before fetching full abstracts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Free-text search query. | required |
| max_results | int | Maximum number of results to return (default 20). | 20 |
| date_range_days | int | Restrict to articles published within the last N days. | 7 |
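The two-step esearch then esummary flow could translate into request URLs along these lines. The sketch only builds URLs and makes no network calls; esearch_url and esummary_url are hypothetical helpers, and the exact parameters search_pubmed sends are an assumption. The parameter names themselves (db, term, retmax, retmode, datetype, reldate, id) are standard E-utilities parameters.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(query: str, max_results: int = 20, date_range_days: int = 7) -> str:
    """Build an esearch URL that returns PMIDs published in the last N days."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
        "datetype": "pdat",          # filter on publication date
        "reldate": date_range_days,  # only the last N days
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esummary_url(pmids: list[str]) -> str:
    """Build an esummary URL that returns titles and dates for the given PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"


search_url = esearch_url("tinnitus therapy")
summary_url = esummary_url(["1", "2"])
```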
fetch_abstract(pmid)
async
¶
Fetch article metadata for a single PMID from PubMed efetch.
Returns a fallback PubMedArticle with empty fields if the
article is unavailable (retracted, withdrawn, or missing from PubMed).
Raises PubMedError only on unexpected parse failures.
fetch_paper(pmid_or_doi)
async
¶
Fetch article metadata by PMID or DOI.
If pmid_or_doi contains "10.", it is treated as a DOI and resolved
via a PubMed search first. Otherwise it is treated as a numeric PMID.
Raises PubMedError for URLs or other non-identifier input.
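The identifier rules above can be sketched as a small classifier. classify_identifier and IdentifierError are hypothetical names, and checking for URLs before the "10." test is an assumption about ordering (a DOI URL would otherwise be mistaken for a bare DOI); the real function may order its checks differently.

```python
import re


class IdentifierError(ValueError):
    """Stand-in for PubMedError in this sketch."""


def classify_identifier(value: str) -> tuple[str, str]:
    """Return ("doi", value) or ("pmid", value), or raise for anything else."""
    value = value.strip()
    # Assumed ordering: reject URLs first, since a DOI URL also contains "10.".
    if value.lower().startswith(("http://", "https://")):
        raise IdentifierError(f"expected a PMID or DOI, got a URL: {value!r}")
    if "10." in value:
        return ("doi", value)
    if re.fullmatch(r"[0-9]+", value):
        return ("pmid", value)
    raise IdentifierError(f"not a PMID or DOI: {value!r}")


doi_kind = classify_identifier("10.1000/xyz123")
pmid_kind = classify_identifier(" 12345678 ")
```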
Semantic Scholar¶
semantic_scholar
¶
Semantic Scholar Academic Graph API client.
Searches for papers by query, returning titles, URLs, and citation counts.
Abstracts are fetched separately via fetch_s2_paper for selected papers.
All external text is sanitized before returning. Rate-limited to 1 request
per 2 seconds to stay within the authenticated API tier.
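A minimum-interval limiter is one simple way to enforce a rule such as one request per 2 seconds. The sketch below is an assumption about how this could be done, not the module's implementation; MinIntervalLimiter is a hypothetical class, and the demo uses a short interval so it finishes quickly.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Allow at most one call per `interval` seconds across concurrent tasks."""

    def __init__(self, interval: float) -> None:
        self._interval = interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0

    async def wait(self) -> None:
        """Sleep until the next slot is free, then reserve the following one."""
        async with self._lock:
            delay = self._next_allowed - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_allowed = time.monotonic() + self._interval


async def demo() -> float:
    # A real client would use interval=2.0; 0.05 keeps the demo fast.
    limiter = MinIntervalLimiter(0.05)
    start = time.monotonic()
    for _ in range(3):
        await limiter.wait()  # the API request would follow here
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

The first call passes immediately and each later call waits for the remainder of the interval, so three calls take roughly two intervals.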
SemanticScholarPaper
¶
Bases: BaseModel
A single paper result from Semantic Scholar.
search_semantic_scholar(query, max_results=5, year='')
async
¶
Search Semantic Scholar for papers matching query.
Returns lightweight results (titles, dates, citation counts) WITHOUT
abstracts. Use fetch_s2_paper to get full details for
selected papers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Free-text search query. | required |
| max_results | int | Maximum number of papers to return (default 5). | 5 |
| year | str | Optional year range filter. | '' |
fetch_s2_paper(paper_id)
async
¶
Fetch full paper details (including abstract) for a single paper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paper_id | str | Semantic Scholar paper ID. | required |
Returns:
| Type | Description |
|---|---|
| SemanticScholarPaper | Full paper with abstract, title, URL, date, and citation count. |
Raises:
| Type | Description |
|---|---|
| HTTPStatusError | On API errors (404 if paper not found). |
Vault Reader¶
vault_reader
¶
Search vault notes for tinnitus research context via DB-backed hybrid search.
Provides hybrid vector + full-text search over vault notes stored in Postgres. Matched content is sanitized before being returned to agents.
VaultNote
¶
Bases: BaseModel
A single vault note with optional relevant section excerpts.
get_embedding(query)
async
¶
Embed query via OpenAI text-embedding-3-small.
Returns None on any failure so callers can fall back to text-only search.
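The return-None-on-failure contract lets a caller degrade to text-only search. A minimal sketch of that pattern follows; embed_or_none, choose_search_mode, and both embedders are hypothetical, and no real embedding API is called.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Embedder = Callable[[str], Awaitable[list[float]]]


async def embed_or_none(query: str, embedder: Embedder) -> Optional[list[float]]:
    """Return the embedding, or None if the embedder fails for any reason."""
    try:
        return await embedder(query)
    except Exception:
        return None


async def choose_search_mode(query: str, embedder: Embedder) -> str:
    """Use hybrid search when a vector is available, otherwise text-only."""
    vector = await embed_or_none(query, embedder)
    return "hybrid" if vector is not None else "text-only"


async def working_embedder(query: str) -> list[float]:
    return [0.1, 0.2, 0.3]  # fake vector, illustration only


async def broken_embedder(query: str) -> list[float]:
    raise TimeoutError("embedding service unavailable")


async def demo() -> tuple[str, str]:
    return (
        await choose_search_mode("tinnitus", working_embedder),
        await choose_search_mode("tinnitus", broken_embedder),
    )


modes = asyncio.run(demo())
```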
read_vault_notes(search_terms, folders=None)
async
¶
Search vault notes using hybrid vector + full-text search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search_terms | list[str] | Keywords combined into a single query for semantic and text search. | required |
| folders | list[str] \| None | Accepted for backward compatibility but effectively a no-op. All synced notes come from Know-how/. | None |
Returns:
| Type | Description |
|---|---|
| list[VaultNote] | Notes whose chunks match the query, grouped by note with relevant sections from matching chunks. |
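Grouping matching chunks into per-note results might look like the sketch below. ChunkHit, Note, and group_hits are hypothetical, the row shape is assumed, and ordering notes by their best-scoring chunk is a design choice made for the example rather than documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkHit:
    """Hypothetical row returned by the chunk search."""
    note_path: str
    section: str
    score: float


@dataclass
class Note:
    """Stand-in for VaultNote; the field names are assumed."""
    path: str
    sections: list[str] = field(default_factory=list)


def group_hits(hits: list[ChunkHit]) -> list[Note]:
    """Group chunk hits by note, keeping notes in order of their best hit."""
    by_note: dict[str, Note] = {}
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        note = by_note.setdefault(hit.note_path, Note(hit.note_path))
        if hit.section not in note.sections:
            note.sections.append(hit.section)
    return list(by_note.values())


notes = group_hits([
    ChunkHit("a.md", "Methods", 0.7),
    ChunkHit("b.md", "Dosage", 0.8),
    ChunkHit("a.md", "Intro", 0.9),
])
```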
Sanitization¶
sanitize
¶
Content sanitization for external text before LLM or storage use.
Strips control characters, collapses excessive whitespace, escapes pipeline XML delimiters, and truncates to a safe length.
sanitize_external_content(text, max_length=10000)
¶
Sanitize external content for safe use in LLM prompts and storage.
- Strip control characters (except \n and \t)
- Collapse runs of 3+ spaces to 2 spaces
- Escape pipeline XML delimiters to prevent delimiter confusion
- Truncate to max_length characters
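The four steps listed above can be sketched as follows. This is an illustrative approximation, not the module's implementation: the tag names in PIPELINE_TAGS are invented because the real pipeline delimiters are not listed here, and sanitize is a hypothetical stand-in for sanitize_external_content.

```python
import re

# Hypothetical delimiter tags; the real pipeline's tag names are not given in this page.
PIPELINE_TAGS = ("source", "context")

# C0 control characters and DEL, excluding \t (0x09) and \n (0x0a).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")
_SPACE_RUNS = re.compile(r" {3,}")
_TAGS = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(PIPELINE_TAGS), re.IGNORECASE)


def _escape_tag(match: re.Match) -> str:
    """Replace angle brackets so the tag can no longer act as a delimiter."""
    return match.group(0).replace("<", "&lt;").replace(">", "&gt;")


def sanitize(text: str, max_length: int = 10000) -> str:
    """Strip control characters, collapse space runs, escape delimiters, truncate."""
    text = _CONTROL_CHARS.sub("", text)
    text = _SPACE_RUNS.sub("  ", text)
    text = _TAGS.sub(_escape_tag, text)
    return text[:max_length]


cleaned = sanitize("a\x00b     c </source>")
```

In this sketch truncation runs last, so the result never exceeds max_length, at the cost of possibly cutting an escaped tag short at the boundary.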