Smartflow API Reference
Platform v3.0 • SDK v0.3.0 • February 2026
Smartflow is an enterprise AI gateway that proxies requests to multiple LLM providers, enforces compliance policy, caches semantically, and orchestrates MCP tools and A2A agents. This document covers every API surface the platform exposes: the LLM proxy, management APIs, MCP gateway, A2A gateway, vector store, RAG pipeline, and the Python SDK.
Architecture Overview
Smartflow runs as five cooperating services:
smartflow (proxy), port 7775: LLM proxy, MCP gateway, A2A gateway, semantic caching, pre/post-call compliance hooks
api_server (management), port 7778: virtual keys, routing chains, audit logs, analytics
compliance_api_server, port 7777: ML content scanning, PII redaction, adaptive learning, intelligent scan
policy_perfect_api, port 7782: policy and preset CRUD, AI document-to-policy generation, assignment management
smartflow-hybrid-bridge, port 3500: cross-datacenter Redis log aggregation
All five services share a single Redis instance for state: routing tables, semantic cache, VAS logs, provider latency metrics, virtual key budgets, and the MCP server registry. The Policy Perfect API additionally requires PostgreSQL for durable policy and preset storage. In production the proxy sits behind a TLS-terminating reverse proxy (Caddy or nginx); the management, compliance, and policy APIs are backend surfaces and should not be exposed directly to the public internet.
Authentication
Virtual Keys
The primary credential. Issue sk-sf-{48-hex} tokens through the management API. Each key carries optional spend limits and model restrictions.
Authorization: Bearer sk-sf-a1b2c3...
Provider API Keys
Stored server-side in Redis. Clients never send raw provider credentials. The proxy resolves the correct key from the key store when forwarding requests to providers.
Anthropic Native Passthrough
For /anthropic/* routes, the proxy automatically injects the configured ANTHROPIC_API_KEY. Clients do not need to supply an x-api-key header.
JWT (Application Layer)
The SafeChat product and dashboard use smartflow_token cookie-based JWT for browser sessions. JWT validation occurs at the application layer, not in the proxy.
LLM Proxy Endpoints
All proxy endpoints are on port 7775 by default.
/v1/chat/completions
POST/v1/chat/completions
OpenAI-compatible chat completions. Accepts any OpenAI-format request body. Provider and model are resolved from the model name or an explicit prefix.
Model-prefix routing:
gpt-, o1-, o3-, chatgpt-: OpenAI
claude-*: Anthropic
gemini-*: Google Gemini
grok-*: xAI
mistral-, mixtral-: Mistral AI
command-, c4ai-: Cohere
llama-, groq/: Groq
openrouter/*: OpenRouter
ollama/*: Local Ollama
azure/*: Azure OpenAI
No prefix is required for the primary supported providers; the model-name heuristic detects gemini-*, claude-*, gpt-*, etc. automatically. An explicit provider/model prefix always takes precedence.
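A minimal request sketch using only the Python standard library (the base URL, virtual key, and model name are placeholders; the Python SDK documented below wraps the same call):

```python
import json
import urllib.request

BASE = "http://localhost:7775"  # assumed local proxy; adjust to your deployment
VKEY = "sk-sf-..."              # placeholder virtual key

# No provider prefix needed: "claude-*" is routed to Anthropic by the heuristic.
body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize MetaCache in one sentence."}],
}

def send(payload: dict) -> dict:
    """POST an OpenAI-format body to the proxy and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {VKEY}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# reply = send(body)  # requires a running proxy
```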
Multimodal — Image
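A sketch of the OpenAI content-array format for mixed text and image input; the model name and image URL are illustrative:

```python
# One user message carrying both a text part and an image part.
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}
body = {"model": "gpt-4o", "messages": [image_message]}
# POST body to /v1/chat/completions as in the example above.
```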
Multimodal — Audio (gpt-4o-audio-preview)
Response
/anthropic/v1/messages
POST/anthropic/v1/messages
Native Anthropic Messages API passthrough. The proxy injects the API key from the server key store. The full Anthropic request and response format is preserved with no translation. Also accessible as /cursor/v1/messages for Cursor IDE passthrough. The [1m] suffix that Claude Code appends to model names is stripped automatically.
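A sketch of a native passthrough call. The anthropic-version header follows the upstream Anthropic API (which the proxy preserves without translation); the model name is illustrative, and no x-api-key is needed:

```python
import json
import urllib.request

BASE = "http://localhost:7775"  # assumed local proxy

body = {
    "model": "claude-sonnet-4-20250514",  # a "[1m]" suffix, if present, is stripped by the proxy
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}

def send() -> dict:
    """POST a native Anthropic Messages body; the proxy injects the server-side API key."""
    req = urllib.request.Request(
        f"{BASE}/anthropic/v1/messages",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "anthropic-version": "2023-06-01"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

# send()  # requires a running proxy
```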
Multimodal — Image (native Anthropic)
Multimodal — PDF Document (native Anthropic)
/v1/embeddings
POST/v1/embeddings
Generate vector embeddings. Supports multi-provider routing via model prefix.
Response follows the OpenAI embeddings format with data[].embedding float arrays.
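A sketch of an embeddings request body (the model name is illustrative; prefix routing applies here as well):

```python
body = {
    "model": "text-embedding-3-small",  # illustrative; routed by model-name prefix
    "input": [
        "MetaCache stores semantically compressed responses.",
        "Fallback chains retry on 429 and 5xx errors.",
    ],
}
# POST body to /v1/embeddings with Authorization: Bearer <virtual key>;
# the response carries data[].embedding float arrays in the OpenAI format.
```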
/v1/audio/transcriptions
POST/v1/audio/transcriptions
Transcribe audio. Multipart form upload. Routes to OpenAI Whisper by default. Use groq/whisper-large-v3 for Groq, deepgram/nova-2 for Deepgram.
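A sketch of the multipart form fields, assuming the OpenAI parameter names; the audio itself is sent as the file part of the multipart body:

```python
# Form fields for POST /v1/audio/transcriptions (parameter names assumed to
# follow the OpenAI transcription API, which this endpoint mirrors).
fields = {
    "model": "whisper-1",       # or "groq/whisper-large-v3", "deepgram/nova-2"
    "response_format": "json",  # assumed optional parameter
}
# With httpx (already a dependency of the Python SDK), the upload would look like:
#   httpx.post(f"{BASE}/v1/audio/transcriptions", data=fields,
#              files={"file": open("meeting.wav", "rb")},
#              headers={"Authorization": f"Bearer {VKEY}"})
```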
/v1/audio/speech
POST/v1/audio/speech
Text-to-speech synthesis. Returns raw audio bytes.
/v1/images/generations
POST/v1/images/generations
/v1/rerank
POST/v1/rerank
Document reranking. Compatible with Cohere's rerank API.
/v1/models
GET/v1/models
List available models across all enabled providers.
/v1/completions
POST/v1/completions
Legacy text completions. Forwarded to the configured provider.
Routing and Provider Selection
Automatic Model-Name Heuristic
For requests to /v1/chat/completions with no explicit provider prefix, the proxy infers the provider from the model name. An explicit provider/model prefix always takes precedence over heuristic detection.
gpt-, o1-, o3-, o4-, chatgpt-, whisper-, tts-, dall-e-: OpenAI
claude-*: Anthropic
gemini-*: Google Gemini
grok-*: xAI
mistral-, mixtral-: Mistral
command-*: Cohere
llama-*: Groq
Routing Strategies
Configured per fallback chain via the management API:
round_robin: distribute requests across targets in order
weighted: traffic proportional to assigned weights
least_connections: send to the provider with the fewest in-flight requests
random: random selection among healthy providers
priority: try targets in order; fall back only on failure
latency: route to the provider with the lowest rolling-EMA p95 latency (tracked in Redis)
cost: route to the provider with the lowest per-token cost; skip providers over their daily budget cap
Fallback Chains
Named ordered provider lists with retry logic. Configured at POST /api/routing/fallback-chains.
On 429 or 5xx the proxy retries the next target with exponential backoff. Non-retryable 4xx errors bypass retry. Providers that have exceeded their daily budget cap are excluded from selection automatically.
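A hypothetical chain definition for POST /api/routing/fallback-chains; the field names (name, strategy, targets) are illustrative, not a confirmed schema, and the model names are placeholders:

```python
# Illustrative request body -- verify field names against your deployment's schema.
chain = {
    "name": "prod-chat",
    "strategy": "priority",  # one of the routing strategies listed above
    "targets": [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "groq/llama-3.3-70b-versatile",
    ],
}
# POST chain to {management host}:7778/api/routing/fallback-chains.
```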
MetaCache — Semantic Caching
The MetaCache intercepts every /v1/chat/completions request before any provider call is made.
How It Works
The incoming query is embedded and its cosine similarity is computed against stored request embeddings. If similarity exceeds the configured threshold, the cached response is returned. Otherwise the request is forwarded to the provider and the response is stored. Responses are semantically compressed before storage to reduce Redis footprint while preserving meaning.
Three tiers operate in sequence: L1 in-process memory, L2 Redis semantic similarity, L3 Redis exact match. Every cache lookup traverses all three before forwarding.
Per-Request Cache Controls
Cache-Control: no-cache (bypass cache read; always query the provider)
Cache-Control: no-store (bypass cache write; do not cache this response)
x-smartflow-cache-ttl: 3600 (override the TTL in seconds for this response)
x-smartflow-cache-namespace: (scope the cache to a logical partition)
Cached responses return x-smartflow-cache-hit: true and x-smartflow-cache-key for client-side correlation.
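For example, per-request headers to pin a TTL and namespace (the key and values are illustrative):

```python
headers = {
    "Authorization": "Bearer sk-sf-...",           # placeholder virtual key
    "x-smartflow-cache-ttl": "120",                # keep this response cached for 2 minutes
    "x-smartflow-cache-namespace": "support-bot",  # illustrative partition name
}
# On the response, inspect x-smartflow-cache-hit and x-smartflow-cache-key to see
# whether MetaCache served it; send Cache-Control: no-cache to force a provider call.
```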
MCP Gateway
Smartflow implements the Model Context Protocol (MCP) gateway. Register external MCP servers and invoke their tools through the proxy with shared authentication, budgeting, and audit logging.
Server Registry
GET/api/mcp/servers
List registered MCP servers.
POST/api/mcp/servers
Register an MCP server.
Tool Invocation
POST/{server_id}/mcp/
POST/mcp/v1/{server_id}/tools/call
The proxy authenticates the request, applies per-tool access controls, records cost, and forwards to the server.
Catalog and Search
GET/api/mcp/catalog
Browse the tool catalog across all registered servers.
GET/api/mcp/tools/search?q={query}&k={n}
Semantic search over the tool catalog. Returns the top k tools matching the natural-language query.
GET/api/mcp/tools/index
Full indexed tool list with embedding metadata.
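A sketch of a catalog search call (the base URL is assumed; the response shape depends on your deployment):

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:7775"  # assumed local proxy

def search_tools(query: str, k: int = 5) -> dict:
    """Semantic search over the MCP tool catalog; returns the parsed JSON result."""
    qs = urllib.parse.urlencode({"q": query, "k": k})
    with urllib.request.urlopen(f"{BASE}/api/mcp/tools/search?{qs}") as r:
        return json.load(r)

# search_tools("create a calendar event")  # requires a running proxy
```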
Access Control
Per-server configuration fields for access control:
allowed_tools (string[]): if non-empty, only these tools may be called
disallowed_tools (string[]): these tools are always blocked
allowed_params (object): per-tool parameter allowlists
guardrail_mode (string): "strict" blocks on policy violation; "log" flags and continues
available_on_public_internet (bool): if false, only accessible from approved network segments
Access Request Flow
GET/api/mcp/catalog/requests
POST/api/mcp/catalog/requests
POST/api/mcp/catalog/requests/{id}/approve
POST/api/mcp/catalog/requests/{id}/deny
OAuth Flow
GET/api/mcp/auth/initiate?server_id={id}
GET/api/mcp/auth/callback
GET/api/mcp/auth/tokens
Usage and Logs
GET/api/mcp/usage
Aggregated cost and call counts per server and tool.
GET/api/mcp/logs
Per-invocation audit logs.
API Generation from OpenAPI Spec
POST/api/mcp/generate
Auto-generate an MCP server adapter from an OpenAPI specification.
A2A Agent Gateway
Smartflow implements the A2A (Agent-to-Agent) protocol for inter-agent communication. Register external agents and invoke them with full logging and routing.
Agent Card
GET/a2a/{agent_id}/.well-known/agent.json
Returns the agent's machine-readable capability card: name, capabilities, supported task types, and authentication requirements.
Task Invocation
POST/a2a/{agent_id}
Send a task to a registered agent. The proxy forwards the request, captures the response, and logs both.
Supports synchronous JSON responses and SSE streaming for long-running tasks. Include x-a2a-trace-id to correlate task invocations across agents in distributed workflows.
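A sketch of a task invocation with trace propagation; the base URL and the task body shape are assumptions (consult the agent's capability card for the expected format):

```python
import json
import urllib.request
import uuid

BASE = "http://localhost:7775"  # assumed local proxy
task = {"message": "Summarize open support tickets"}  # body shape depends on the agent's card
trace_id = str(uuid.uuid4())  # propagate the same ID across agents in a workflow

def invoke(agent_id: str) -> dict:
    """Send a task to a registered agent and return the synchronous JSON response."""
    req = urllib.request.Request(
        f"{BASE}/a2a/{agent_id}",
        data=json.dumps(task).encode(),
        headers={"Content-Type": "application/json", "x-a2a-trace-id": trace_id},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)
```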
Vector Store API
Built-in vector store backed by Redis. No external vector database required. All endpoints are on the proxy at port 7775.
POST/v1/vector_stores
Create a vector store.
Response includes id, name, description, file_count, created_at.
GET/v1/vector_stores
List all vector stores.
GET/v1/vector_stores/{id}
Get a specific vector store.
DELETE/v1/vector_stores/{id}
Delete a vector store and all its files.
POST/v1/vector_stores/{id}/files
Add a text document. The document is chunked and embedded automatically.
GET/v1/vector_stores/{id}/files
List files in a vector store.
POST/v1/vector_stores/{id}/search
Semantic search over stored documents.
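Sketch request bodies for the create and search endpoints; the create fields mirror the response fields listed above, while the search field names are assumptions:

```python
create_body = {"name": "support-docs", "description": "Product manuals and FAQs"}
search_body = {"query": "how do I reset my password", "max_results": 3}  # field names assumed

# POST {BASE}/v1/vector_stores with create_body
#   -> response includes id, name, description, file_count, created_at
# POST {BASE}/v1/vector_stores/{id}/search with search_body
#   -> matching chunks ranked by semantic similarity
```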
RAG Pipeline API
Built on top of the vector store. Ingest documents with automatic chunking, then retrieve context for LLM augmentation.
POST/v1/rag/ingest
Chunk a document, embed each chunk, and store in a named vector store.
content (string, required): full document text
vector_store_id (string, required): target store (must already exist)
filename (string, default ""): display name for the file
chunk_size (int, default 512): characters per chunk
chunk_overlap (int, default 64): overlap between consecutive chunks
metadata (object, default {}): arbitrary key-value metadata
Response: { "store_id", "file_id", "chunks_created", "status": "completed" }
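An ingest body built from the documented fields (the store ID and document text are placeholders):

```python
ingest = {
    "content": "Refunds are processed within 5 business days. " * 40,  # full document text
    "vector_store_id": "vs_123",  # placeholder; the store must already exist
    "filename": "refund-policy.txt",
    "chunk_size": 512,
    "chunk_overlap": 64,
    "metadata": {"team": "support"},
}
# POST ingest to /v1/rag/ingest
#   -> {"store_id", "file_id", "chunks_created", "status": "completed"}
```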
POST/v1/rag/query
Embed a question, retrieve matching chunks, and optionally assemble a context string for injection into an LLM system prompt.
query (required): natural-language question
vector_store_id (required): store to search
max_results (default 5): maximum chunks to return
score_threshold (default 0.0): minimum cosine similarity (0 = return all)
include_context (default true): concatenate chunks into a context string field
Response includes chunks[], context (concatenated string for prompt injection), and total.
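A query body built from the documented fields, plus a sketch of injecting the returned context string into a chat prompt:

```python
query_body = {
    "query": "How long do refunds take?",
    "vector_store_id": "vs_123",  # placeholder store ID
    "max_results": 5,
    "score_threshold": 0.25,
    "include_context": True,
}
# POST query_body to /v1/rag/query -> {"chunks": [...], "context": "...", "total": n}

def augment(context: str, question: str) -> list:
    """Build an OpenAI-format message list that grounds the answer in RAG context."""
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]
```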
Management API
Management API runs on port 7778.
Virtual Keys
GET/api/enterprise/vkeys
List all virtual keys.
POST/api/enterprise/vkeys
Create a virtual key.
DELETE/api/enterprise/vkeys/{key}
Revoke a virtual key.
Routing API
GET/api/routing/fallback-chains
POST/api/routing/fallback-chains
DELETE/api/routing/fallback-chains/{name}
GET/api/routing/status
Current routing state: active provider, fallback chain, last failure.
POST/api/routing/force-provider
Audit Logs (VAS)
GET/api/vas/logs?limit=50&provider=openai
Retrieve VAS audit logs. Every request proxied through Smartflow produces a log entry including: timestamp, provider, model, prompt tokens, completion tokens, cost in USD, cache hit flag, compliance flags, user context, and latency.
GET/api/vas/logs/hybrid
Retrieve logs aggregated across multiple Smartflow instances via the hybrid bridge.
Analytics
GET/api/analytics?period=7d
Usage analytics: request volume, cost by provider, cache hit rate, top models, top users.
Compliance API
The Compliance API runs on port 7777. It provides ML-based content scanning, PII detection and redaction, and an adaptive learning loop that improves over time based on human feedback. The proxy integrates with this service on every request when pre/post-call scanning is enabled.
POST/v1/compliance/scan
Rule-based compliance scan against configured policies.
POST/v1/compliance/intelligent-scan
Maestro ML policy engine. Evaluates intent against your organization's policy documents — not keyword matching.
Response includes risk_score (0–1), risk_level, recommended_action (Allow / Flag / Block), violations, explanation.
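A sketch of acting on the scan verdict; the request field name (content) is an assumption, and the action mapping is illustrative:

```python
scan_body = {"content": "Please email the patient list to my personal address."}  # field name assumed
# POST scan_body to {compliance host}:7777/v1/compliance/intelligent-scan

def decide(result: dict) -> str:
    """Map recommended_action (Allow / Flag / Block) to a proxy-side decision (sketch)."""
    action = result.get("recommended_action", "Allow")
    return {"Allow": "forward", "Flag": "forward+log", "Block": "reject"}[action]
```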
POST/v1/compliance/feedback
Submit a correction to improve the ML model's future predictions.
POST/v1/compliance/redact
Detect and redact PII from content. Returns the redacted string.
GET/v1/compliance/learning/status/{user_id}
GET/v1/compliance/learning/summary
GET/v1/compliance/ml/stats
GET/v1/compliance/org/baseline/{org_id}
Policy Perfect API
The Policy Perfect API runs on port 7782. It manages the organization's compliance policy library — the source documents the Maestro ML engine reads when evaluating requests. Backed by PostgreSQL for durable policy storage.
GET/health
Liveness check for the Policy Perfect service.
GET/api/stats
Aggregate counts for the current state of the policy library.
Policies
Policies are named compliance rules attached to scopes. The Maestro engine evaluates all active policies on every request.
Policy types:
compliance: regulatory rules (HIPAA, GDPR, SOC 2, PCI-DSS, etc.)
brand: brand voice and communication standards
format: output format constraints
role: role-based access and behavior restrictions
industry: industry-specific usage rules
legal: legal department rules and disclaimers
security: security guardrails and data-handling policies
GET/api/policies
List all active policies.
POST/api/policies
Create a policy.
name (string): policy display name
policy_type (string): one of the seven policy types above
content (string): policy text read by the Maestro ML engine
priority (int): evaluation order (0-100); higher values evaluated first
applicable_providers (string[]): providers this policy applies to; ["all"] for universal
applicable_models (string[]): models this policy applies to; ["all"] for universal
regulatory_framework (string): HIPAA, GDPR, SOC2, PCI-DSS, etc.
severity (string): critical, high, medium, or low
metadata (object): Layer 2/3 targeting: source_ips, ad_groups, departments, applications
GET/api/policies/{id}
Get a policy by ID.
PUT/api/policies/{id}
Update a policy. All fields optional; only supplied fields are changed. Set "is_active": false to deactivate without deleting.
DELETE/api/policies/{id}
Delete a policy permanently.
Presets
Presets are named, ordered collections of policies. Assign a preset to a team, role, or virtual key instead of managing individual policies per scope.
GET/api/presets
List all presets. Each entry includes the preset metadata and its ordered policy list.
POST/api/presets
Create a preset.
Policy order in policy_ids determines evaluation priority.
GET/api/presets/{id}
Get a preset and its full ordered policy list.
AI Document-to-Policy Generation
Upload a compliance document (PDF, DOCX, TXT — up to 50 MB). The service uses GPT-4o to extract structured policy suggestions automatically. Processing is asynchronous; poll for progress with the returned job ID.
POST/api/policies/generate-from-document
Multipart form upload. Field name: file.
Immediate response:
GET/api/documents/job/{job_id}/progress
Poll for processing status. Status values: pending, processing, completed, failed.
GET/api/documents/job/{job_id}/results
Retrieve suggested policies once status is completed. Each suggestion includes a confidence score (0–1). Review suggestions and create live policies via POST /api/policies.
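A polling sketch against the documented status values (the port is the documented Policy Perfect port; the progress response field names are assumed):

```python
import json
import time
import urllib.request

BASE = "http://localhost:7782"  # Policy Perfect service (assumed host)

def is_terminal(status: str) -> bool:
    """True once a job has finished, per the documented status values."""
    return status in {"completed", "failed"}

def wait_for_job(job_id: str, interval: float = 2.0) -> dict:
    """Poll job progress until completed/failed, then return the final record."""
    while True:
        with urllib.request.urlopen(f"{BASE}/api/documents/job/{job_id}/progress") as r:
            job = json.load(r)
        if is_terminal(job.get("status", "")):  # "status" field name assumed
            return job
        time.sleep(interval)

# Once completed, fetch suggestions from /api/documents/job/{job_id}/results.
```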
Alerting
Smartflow fires HTTP POST webhooks when threshold events occur. Configuration is via environment variables on the proxy server.
BudgetThreshold: provider or virtual-key spend exceeds the configured cap
ProviderFailure: error rate for a provider exceeds the spike threshold
SlowRequest: request latency exceeds the slow-request threshold
Custom: programmatic alerts from the management API
Configure any combination of webhook destinations:
Alerts are fire-and-forget — they do not block the request that triggered them.
Observability
GET/health/liveliness
Returns 200 OK with {"status":"ok"} when the proxy process is running.
GET/health/readiness
Returns 200 OK when Redis is connected and providers are reachable.
GET/metrics
Prometheus-compatible metrics. Exposed metrics:
smartflow_requests_total: request counter by provider, model, status
smartflow_request_latency_seconds: request latency histogram
smartflow_cache_hits_total: cache hit counter by tier (L1/L2/L3)
smartflow_cache_misses_total: cache miss counter
smartflow_provider_errors_total: upstream error counter by provider and status
smartflow_tokens_total: token usage by provider and direction
smartflow_cost_usd_total: cumulative cost by provider
smartflow_mcp_calls_total: MCP tool invocation counter by server and tool
smartflow_vkey_spend_usd: per-virtual-key spend gauge
Python SDK
Installation
Requirements: Python 3.10+, httpx >= 0.24
SmartflowClient
The primary async client.
base_url (str, required): proxy URL, e.g. "https://smartflow.example.com"
api_key (str, default None): virtual key sent as Authorization: Bearer
timeout (float, default 30.0): request timeout in seconds
management_port (int, default 7778): Management API port
compliance_port (int, default 7777): Compliance API port
bridge_port (int, default 3500): hybrid bridge port
Core AI Methods
chat()
Send a message, receive the reply as a plain string.
chat_completions()
Full OpenAI-compatible completions. Returns an AIResponse object.
stream_chat()
Async generator that yields text delta strings as they stream.
embeddings()
rerank()
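A hypothetical end-to-end sketch of the async client; the import path and exact signatures are assumptions based on this reference:

```python
import asyncio

async def demo() -> str:
    # Import path and constructor/method signatures are assumed from this document.
    from smartflow import SmartflowClient

    client = SmartflowClient(
        base_url="https://smartflow.example.com",  # your proxy URL
        api_key="sk-sf-...",                       # placeholder virtual key
    )
    reply = await client.chat("Summarize our refund policy in one sentence.")
    return reply  # chat() returns the reply as a plain string

# asyncio.run(demo())  # requires a running proxy and the SDK package installed
```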
Audio and Image Methods
audio_transcription()
text_to_speech()
image_generation()
Compliance Methods
check_compliance()
intelligent_scan()
redact_pii()
submit_compliance_feedback()
Monitoring Methods
get_cache_stats()
health_comprehensive()
Other monitoring methods
health(): Dict; basic liveness check
get_provider_health(): List[ProviderHealth]; latency and success rate per provider
get_logs(limit, provider): List[VASLog]; audit log entries
get_analytics(period): Dict; usage and cost analytics
get_routing_status(): Dict; current routing state
force_provider(provider, duration_seconds): Dict; force routing for a duration
SmartflowAgent
Stateful agent with conversation memory and per-message compliance scanning.
chat(message, scan_input=True, scan_output=True): send a message; raises ComplianceError if blocked
clear_history(): reset the conversation, preserving the system prompt
get_history(): return a copy of the message history
message_count: number of messages in history
SmartflowWorkflow
Chain AI operations with branching and error handling.
"chat"
prompt, model, temperature
Chat completion; {input} / {output} are template variables
"compliance_check"
content
Compliance scan
"condition"
field, cases, default
Branch on a context value
SyncSmartflowClient
Synchronous wrapper for scripts and Jupyter notebooks. Every async method is available without await.
In Jupyter with an existing event loop: pip install nest_asyncio then nest_asyncio.apply().
OpenAI Drop-in Replacement
Any code targeting the OpenAI API works by pointing base_url at Smartflow. MetaCache, compliance scanning, VAS logging, and routing apply transparently.
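For example, with the official openai Python package (shown in comments; the package itself is not required by Smartflow, and the port is the assumed default):

```python
SMARTFLOW_BASE_URL = "http://localhost:7775/v1"  # point any OpenAI client here
SMARTFLOW_API_KEY = "sk-sf-..."                  # virtual key placeholder

# With the official openai package (illustrative):
#   from openai import OpenAI
#   client = OpenAI(base_url=SMARTFLOW_BASE_URL, api_key=SMARTFLOW_API_KEY)
#   r = client.chat.completions.create(
#       model="gpt-4o",
#       messages=[{"role": "user", "content": "hi"}],
#   )
#   print(r.choices[0].message.content)
```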
Response Types
AIResponse
content (str): first choice text
choices (list): full choices array
usage (Usage): token usage
model (str): model used
id (str): response ID
CacheStats
hit_rate (float)
total_requests (int)
tokens_saved (int)
cost_saved_usd (float)
l1_hits (int)
l2_hits (int)
l3_hits (int)
ComplianceResult
has_violations (bool)
compliance_score (float)
violations (list[str])
pii_detected (list[str])
risk_level (str): "low" / "medium" / "high" / "critical"
recommendations (list[str])
redacted_content (str | None)
IntelligentScanResult
risk_score (float): 0.0 to 1.0
risk_level (str)
recommended_action (str): "Allow" / "Flag" / "Block"
violations (list)
explanation (str)
Response Headers
Every proxied response includes these headers:
x-smartflow-provider: provider that served the request
x-smartflow-model: actual model used
x-smartflow-request-id: unique request ID for log correlation
x-smartflow-cache-hit: true if the response was served from MetaCache
x-smartflow-cache-key: cache key when cache-hit is true
x-smartflow-latency-ms: total proxy latency in milliseconds
x-smartflow-cost-usd: estimated cost in USD for this request
x-smartflow-compliance-score: compliance score (0-1) when pre-call scan is enabled
Environment Variables
Set on the Smartflow server; not used in client code.
Provider Keys
OPENAI_API_KEY: OpenAI
ANTHROPIC_API_KEY: Anthropic
GEMINI_API_KEY: Google Gemini
XAI_API_KEY: xAI / Grok
OPENROUTER_API_KEY: OpenRouter
AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION: Azure OpenAI
MISTRAL_API_KEY: Mistral AI
COHERE_API_KEY: Cohere
GROQ_API_KEY: Groq
DEEPGRAM_API_KEY: Deepgram
FIREWORKS_API_KEY: Fireworks AI
NVIDIA_NIM_API_KEY, NVIDIA_NIM_API_BASE: NVIDIA NIM
HUGGINGFACE_API_KEY, HUGGINGFACE_API_BASE: HuggingFace
TOGETHER_API_KEY: Together AI
PERPLEXITY_API_KEY: Perplexity AI
REPLICATE_API_KEY: Replicate
VERTEXAI_API_KEY, VERTEXAI_PROJECT, VERTEXAI_LOCATION: Vertex AI
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION: AWS Bedrock
NOVITA_API_KEY: Novita AI
VERCEL_AI_GATEWAY_API_KEY: Vercel AI Gateway
Feature Flags and Ports
GEMINI_ENABLED (default false): enable Google Gemini in intelligent routing
SMARTFLOW_ALERTS_ENABLED (default true): enable webhook alerting
SLACK_WEBHOOK_URL: Slack incoming webhook
TEAMS_WEBHOOK_URL: Microsoft Teams webhook
DISCORD_WEBHOOK_URL: Discord webhook
PROXY_PORT (default 7775): LLM proxy port
MANAGEMENT_PORT (default 7778): Management API port
COMPLIANCE_PORT (default 7777): Compliance API port
BRIDGE_PORT (default 3500): hybrid bridge port
Error Reference
HTTP Status Codes
400: malformed request; check the body format
401: missing or invalid API key
402: virtual key budget exceeded
403: request blocked by compliance policy
404: resource or route not found
429: rate limit exceeded (RPM or TPM)
500: proxy internal error
502: upstream provider returned an error
503: no providers available; fallback chain exhausted
SDK Exceptions
SmartflowError: base class for all SDK errors
ConnectionError: cannot connect to the proxy
AuthenticationError: 401; invalid or missing key
RateLimitError: 429; rate limit hit
ComplianceError: 403; request blocked by policy
ProviderError: upstream provider error
TimeoutError: request timeout
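A sketch of handling these exceptions; the exception names come from the list above, but the import paths are assumptions:

```python
def call_with_handling() -> str:
    # Import paths are assumed; exception names are from this reference.
    from smartflow import SyncSmartflowClient
    from smartflow.exceptions import ComplianceError, RateLimitError, SmartflowError

    client = SyncSmartflowClient(base_url="http://localhost:7775", api_key="sk-sf-...")
    try:
        return client.chat("hello")
    except ComplianceError:
        return "[blocked by policy]"   # 403: do not retry
    except RateLimitError:
        raise                          # 429: let a retry/backoff layer handle it
    except SmartflowError as e:
        return f"[smartflow error: {e}]"  # catch-all for remaining SDK errors
```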
Changelog
v3.0 (proxy) / v0.3.0 (SDK) — 2026
New in the proxy:
Vector Store API (/v1/vector_stores/*): Redis-backed, no external vector database required
RAG Pipeline API (/v1/rag/ingest, /v1/rag/query): document chunking, embedding, context retrieval
A2A Agent Gateway (/a2a/*): A2A protocol for inter-agent orchestration
Webhook alerting: Slack, Teams, Discord for budget, failure, and latency events
Model-name heuristic routing: claude-*, gemini-*, gpt-* detected automatically
Anthropic API key injection for /anthropic/* passthrough
Cost-based and latency-based routing strategies
Prometheus metrics endpoint (/metrics)
MCP access control: allowed_tools, disallowed_tools, guardrail_mode per server
MCP cost tracking via Redis HINCRBYFLOAT
New in the SDK:
image_generation(): multi-provider image generation
audio_transcription(): multipart audio; Groq/Deepgram/Fireworks routing
text_to_speech(): returns raw audio bytes
stream_chat(): async SSE iterator
rerank(): Cohere-compatible document reranking
Extended embeddings() with encoding_format, dimensions, input_type