OpenAI and Anthropic ship hosted tools alongside their inference APIs — web search, code execution, file search, computer use. The pattern is now standard for agent-era applications: one API call, many native capabilities. The catch for a sovereign platform is that those tools run on US infrastructure. If a model on Stav invokes OpenAI's web_search_preview tool, the search query and the returned content transit US-controlled systems and the sovereignty story leaks at the tool-call boundary.
Stav closes that gap with five sovereign built-in tools. Each is operated by Stav on EU infrastructure, exposed through the standard OpenAI tools array, and reusable across every model in the catalogue — including the routed-commercial ones. A request through Claude on Stav can invoke stav.web_search, and the search query never reaches Anthropic.
The five tools
Each tool is governed per team (admins enable each tool explicitly), billed per invocation, and audit-logged with its execution jurisdiction. Default posture is all tools off — enabling code execution or web access has to be a deliberate decision.
How to invoke
Sovereign tools ride the standard OpenAI tools array. The reserved stav.* function-name prefix tells the gateway to execute the tool server-side instead of returning a tool-call request for your application to handle.
One-off invocation
# `client` is an OpenAI SDK client configured to point at the Stav gateway.
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "What did the EU AI Act say about general-purpose model obligations?"}
    ],
    tools=[{"type": "function", "function": {"name": "stav.web_search"}}],
)
print(response.choices[0].message.content)
The model decides whether to call the tool, calls it, reads the results, and continues. From your side, that's a single HTTP request — no callback, no second round trip.
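The reserved prefix also makes it easy to tell, on the client side, which entries in a tools array the gateway will execute itself and which will come back as ordinary tool calls. A minimal sketch, assuming only the stav. prefix rule described above; the helper names and the get_invoice tool are hypothetical:

```python
from __future__ import annotations

SOVEREIGN_PREFIX = "stav."  # reserved prefix described above


def sovereign_tool(name: str) -> dict:
    """Build a tools-array entry for a sovereign tool (hypothetical helper)."""
    return {"type": "function", "function": {"name": f"{SOVEREIGN_PREFIX}{name}"}}


def split_tools(tools: list[dict]) -> tuple[list[str], list[str]]:
    """Return (server_side, client_side) tool names based on the prefix."""
    server_side, client_side = [], []
    for tool in tools:
        name = tool["function"]["name"]
        target = server_side if name.startswith(SOVEREIGN_PREFIX) else client_side
        target.append(name)
    return server_side, client_side


tools = [
    sovereign_tool("web_search"),
    # Hypothetical application-defined tool; calls to it would be returned
    # to your code as a normal tool-call request.
    {"type": "function", "function": {"name": "get_invoice"}},
]
server_side, client_side = split_tools(tools)
print(server_side, client_side)
```

Only the names in client_side need a handler in your application; the rest are resolved inside the single HTTP request.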
Declarative defaults
For agents that want a fixed tool suite available across every request, use the stav_tools shorthand:
response = client.chat.completions.create(
    model="auto",
    messages=[...],
    extra_body={"stav_tools": ["web_search", "file_search", "code_execution"]},
)
The shorthand expands to the full tools array at dispatch. Tools the team hasn't enabled are dropped without failing the request, and the response carries an info-level warning header noting the omission.
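The expansion can be pictured roughly as follows. This is a sketch of the documented behaviour, not the gateway's implementation; expand_stav_tools and the enabled set are hypothetical names.

```python
from __future__ import annotations


def expand_stav_tools(shorthand: list[str], enabled: set[str]) -> tuple[list[dict], list[str]]:
    """Expand stav_tools names into tools-array entries.

    Returns the expanded entries plus the names dropped because the team
    has not enabled them (what the warning header would report).
    """
    expanded, dropped = [], []
    for name in shorthand:
        if name in enabled:
            expanded.append({"type": "function", "function": {"name": f"stav.{name}"}})
        else:
            dropped.append(name)
    return expanded, dropped


# Hypothetical team that has enabled two of the three requested tools.
tools, dropped = expand_stav_tools(
    ["web_search", "file_search", "code_execution"],
    enabled={"web_search", "file_search"},
)
print([t["function"]["name"] for t in tools])
print(dropped)
```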
What the response carries
Sovereign tool invocations show up as a new tool_results block on the assistant message, parallel to tool_calls:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The EU AI Act, under Article 55, imposes transparency and systemic-risk obligations on providers of general-purpose AI models …",
      "tool_results": [
        {
          "tool_name": "stav.web_search",
          "tool_call_id": "call_0x12ab",
          "query": "EU AI Act general-purpose model obligations Article 55",
          "source_count": 6,
          "sources": [
            {"url": "https://eur-lex.europa.eu/...", "title": "Regulation (EU) 2024/1689", "snippet_chars": 480}
          ],
          "jurisdiction": "EU",
          "duration_ms": 1120
        }
      ]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 210,
    "completion_tokens": 340,
    "tool_invocations": {"stav.web_search": 1},
    "total_tokens": 550
  }
}
Two design notes worth highlighting:
- Every tool_results entry carries "jurisdiction": "EU". That's the contract: sovereign tools execute on EU infrastructure, and the response makes that auditable per invocation.
- Citations map byte ranges in content to source indices in tool_results[].sources. The shape matches Anthropic's equivalent field and is a superset of OpenAI's. Useful for in-app linkbacks.
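Because each entry states its jurisdiction, a client can check the contract on every response. A minimal sketch that operates on a response already parsed into a plain dict; the function name is hypothetical and the field names are taken from the example response:

```python
from __future__ import annotations


def audit_tool_results(response: dict, expected: str = "EU") -> list[dict]:
    """Collect per-invocation audit rows and fail on any jurisdiction mismatch."""
    rows = []
    for choice in response.get("choices", []):
        for result in choice["message"].get("tool_results", []):
            if result.get("jurisdiction") != expected:
                raise ValueError(
                    f"{result.get('tool_name')} ran outside {expected}: "
                    f"{result.get('jurisdiction')!r}"
                )
            rows.append({
                "tool": result["tool_name"],
                "call_id": result["tool_call_id"],
                "duration_ms": result["duration_ms"],
            })
    return rows


# Trimmed copy of the example response.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "...",
            "tool_results": [{
                "tool_name": "stav.web_search",
                "tool_call_id": "call_0x12ab",
                "jurisdiction": "EU",
                "duration_ms": 1120,
            }],
        },
        "finish_reason": "stop",
    }],
}
rows = audit_tool_results(response)
print(rows)
```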
The five tools in detail
stav.web_search
Grounded web search routed through Qwant Enterprise (primary) and Mojeek (secondary). Queries enter Stav's EU gateway and hit the sovereign operator, and results return through Stav. No query, result, or identifier crosses non-EU infrastructure.
Per-request options:
- web_search.max_results (default 6, max 20)
- web_search.region_filter (.eu, .de, .fr, .no, …) to constrain result geography
Default per-team rate limit is 500 queries/hour. Query text is part of the inference audit log; result snippets are stored hashed (for dedup) with a 30-day retention.
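A client-side check can catch out-of-range options before a request is sent. The sketch below assumes only the documented default and maximum; the lower bound of 1 is an assumption, the function name is hypothetical, and since this page does not say where the options sit in the request body, the function simply returns a plain dict keyed by option name.

```python
from __future__ import annotations

DEFAULT_MAX_RESULTS = 6   # documented default
MAX_RESULTS_LIMIT = 20    # documented maximum


def web_search_options(max_results: int = DEFAULT_MAX_RESULTS,
                       region_filter: list[str] | None = None) -> dict:
    """Validate per-request options and return them keyed by option name."""
    # Lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}")
    options: dict = {"web_search.max_results": max_results}
    if region_filter:
        # Region filters are written as TLD-style suffixes such as ".eu" or ".de".
        bad = [r for r in region_filter if not r.startswith(".")]
        if bad:
            raise ValueError(f"region filters must start with a dot: {bad}")
        options["web_search.region_filter"] = list(region_filter)
    return options


opts = web_search_options(max_results=10, region_filter=[".eu", ".no"])
print(opts)
```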
stav.code_execution
Firecracker microVMs running on a dedicated CPU pool at Green Mountain DC2. Each invocation gets a fresh VM: 1 vCPU, 2 GB RAM, 512 MB rootfs, 30-second wall-clock limit, no network egress by default. VMs are torn down immediately after the response.
Why Firecracker rather than gVisor or container-only sandboxes? It gives VM-level isolation with sub-200 ms cold starts, the same model AWS Lambda runs on. For regulated workloads, container-level isolation isn't sufficient. The Python scientific stack (numpy, pandas, scipy, matplotlib, and others) is pre-installed.
Egress is opt-in per team and allowlist-only. An optional per-session scratchpad (256 MB, 24-hour TTL) supports multi-turn analysis. Static analysis flags obviously sensitive patterns (SSN, IBAN, credit card) to the admin audit log.
stav.file_search
RAG over team-uploaded documents. Backed by Qdrant on Stav's Kubernetes cluster at DC2 plus a sovereign embedding model served on Stav GPUs. Per-team namespaces with hard isolation.
Ingestion goes through the OpenAI-compatible /v1/vector_stores endpoint: upload → chunker → sovereign embedder → Qdrant. Querying goes through the tool: the model emits stav.file_search with optional store_ids, and the top-k retrieval results, with citations, fold into the next model turn.
Storage quotas are tier-bound: Starter 500 MB, Professional 10 GB, Enterprise negotiated. PII detection runs on ingestion.
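To make the chunk, embed, and top-k steps concrete, here is a toy in-memory version of the same flow. It is purely illustrative: the bag-of-words embed function stands in for the sovereign embedding model, a Python list stands in for Qdrant, and the chunk sizes are arbitrary assumptions.

```python
from __future__ import annotations

import math
from collections import Counter


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Stand-in embedder: lowercase word counts, not a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


doc = (
    "Invoices must be paid within thirty days of receipt. "
    "Late payments accrue interest at the statutory rate. "
    "Contracts renew automatically unless cancelled in writing."
)
store = [(c, embed(c)) for c in chunk(doc)]  # ingestion: chunk, embed, store
hits = top_k("contracts renew automatically", store, k=1)  # query: top-k retrieval
print(hits)
```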
stav.url_fetch
Fetches a specific URL the model knows about (often produced by stav.web_search in a prior turn) and returns readable markdown. This splits finding from reading, the industry-standard pattern for grounded generation.
Hard block list: *.onion, localhost, RFC1918 ranges, cloud metadata endpoints, known-malicious domain lists. Per-team allowlist-only mode is available for high-compliance teams. Defaults: 2 MB max payload, 10-second timeout, no credentials followed, three-hop redirect cap.
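A hard block list of this kind can be approximated with the standard library. The sketch below is an illustration rather than Stav's implementation: it covers .onion hosts, localhost, the RFC 1918 ranges, and two well-known metadata endpoints, and it omits the malicious-domain lists. The function name and the exact metadata hosts are assumptions.

```python
import ipaddress
from urllib.parse import urlsplit

# RFC 1918 private ranges, listed explicitly.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# Well-known cloud metadata endpoints; a production list would be longer.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def is_blocked(url: str) -> bool:
    """Return True if the URL's host falls in one of the hard-blocked classes."""
    host = (urlsplit(url).hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".onion"):
        return True
    if host in METADATA_HOSTS:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: checking what it resolves to is out of scope for this sketch.
        return False
    # Loopback addresses are treated like localhost (an assumption).
    return ip.is_loopback or any(ip in net for net in RFC1918)


for candidate in ("https://eur-lex.europa.eu/", "http://10.1.2.3/", "http://example.onion/"):
    print(candidate, is_blocked(candidate))
```

A real fetcher would also have to re-check the resolved IP address and every redirect hop, since a public hostname can resolve or redirect to a private address.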
stav.document_ocr
Structured extraction from invoices, contracts, IDs, scanned documents. The agent passes a file_reference; Stav runs it through the document-vision pipeline (Qwen2-VL-72B on :precision, InternVL2-26B on :fast for high-volume simple extraction) and returns structured JSON.
Stav ships a starter library of extraction templates for common EU document types: invoice-eu, contract-generic, kyc-passport-eu, kyc-driver-license-eu, sepa-mandate, bank-statement. Enterprise customers can add custom templates under service agreement.
When to use which
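A loose decision rule:
- Need current information from the open web: stav.web_search.
- Already have a specific URL to read: stav.url_fetch.
- The answer lives in documents your team uploaded: stav.file_search.
- Need to compute or analyse data: stav.code_execution.
- Need structured fields from invoices, contracts, IDs, or scanned documents: stav.document_ocr.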
Sovereignty consequences
When you enable any sovereign tool, requests that use the tool are kept inside the sovereign candidate pool by the Smart Router, even if your team's sovereignty_preference is off. The reason — once a tool execution is sovereign, the model call chain has to be too, or the sovereignty story falls apart at the boundary the tool was supposed to close.
This is the design philosophy: sovereign tools and sovereign inference are coupled by construction. You can mix them with routed-commercial models on a per-request basis (a sensitive request invokes web search and lands on Llama 4; a follow-up creative request uses no tools and routes to Claude). What you can't do is have a routed-commercial model invoke a sovereign tool — the tools strategy doc explains the rationale.
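The coupling rule can be written as a small predicate. This is a sketch of the behaviour described above, not the Smart Router's code: the function name and pool labels are hypothetical, a request that declares a stav.* tool is treated as one that uses it, and the assumption that sovereignty_preference on its own also selects the sovereign pool is not confirmed by this page.

```python
from __future__ import annotations


def candidate_pool(tools: list[dict], sovereignty_preference: bool) -> str:
    """Return which model pool a request may be routed to.

    Any stav.* tool pins the request to the sovereign pool, regardless of
    the team's sovereignty_preference setting.
    """
    uses_sovereign_tool = any(
        t.get("function", {}).get("name", "").startswith("stav.") for t in tools
    )
    if uses_sovereign_tool or sovereignty_preference:
        return "sovereign"
    return "any"


web_search = {"type": "function", "function": {"name": "stav.web_search"}}
print(candidate_pool([web_search], sovereignty_preference=False))
print(candidate_pool([], sovereignty_preference=False))
```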
Next steps
- Sovereignty — what the three tiers mean and how the tool layer reinforces them.
- Smart Router — how routing interacts with the sovereign-tool constraint.