Smartflow API Reference

Platform v3.0 • SDK v0.3.0 • February 2026

Smartflow is an enterprise AI gateway that proxies requests to multiple LLM providers, enforces compliance policy, caches semantically, and orchestrates MCP tools and A2A agents. This document covers every API surface the platform exposes: the LLM proxy, management APIs, MCP gateway, A2A gateway, vector store, RAG pipeline, and the Python SDK.

Architecture Overview

Smartflow runs as five cooperating services:

| Service | Default Port | Purpose |
|---|---|---|
| smartflow (proxy) | 7775 | LLM proxy, MCP gateway, A2A gateway, semantic caching, pre/post-call compliance hooks |
| api_server (management) | 7778 | Virtual keys, routing chains, audit logs, analytics |
| compliance_api_server | 7777 | ML content scanning, PII redaction, adaptive learning, intelligent scan |
| policy_perfect_api | 7782 | Policy and preset CRUD, AI document-to-policy generation, assignment management |
| smartflow-hybrid-bridge | 3500 | Cross-datacenter Redis log aggregation |

All five services share a single Redis instance for state: routing tables, semantic cache, VAS logs, provider latency metrics, virtual key budgets, and the MCP server registry. The Policy Perfect API additionally requires PostgreSQL for durable policy and preset storage. In production the proxy sits behind a TLS-terminating reverse proxy (Caddy or nginx); the management, compliance, and policy APIs are internal backend surfaces.

Authentication

Virtual Keys

The primary credential. Issue sk-sf-{48-hex} tokens through the management API. Each key carries optional spend limits and model restrictions.

Authorization: Bearer sk-sf-a1b2c3...

Provider API Keys

Stored server-side in Redis. Clients never send raw provider credentials. The proxy resolves the correct key from the key store when forwarding requests to providers.

Anthropic Native Passthrough

For /anthropic/* routes, the proxy automatically injects the configured ANTHROPIC_API_KEY. Clients do not need to supply an x-api-key header.

JWT (Application Layer)

The SafeChat product and dashboard use smartflow_token cookie-based JWT for browser sessions. JWT validation occurs at the application layer, not in the proxy.

LLM Proxy Endpoints

All proxy endpoints are on port 7775 by default.

/v1/chat/completions

POST /v1/chat/completions

OpenAI-compatible chat completions. Accepts any OpenAI-format request body. Provider and model are resolved from the model name or an explicit prefix.

Model-prefix routing:

| Prefix / Pattern | Provider |
|---|---|
| gpt-, o1-, o3-, chatgpt- | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google Gemini |
| grok-* | xAI |
| mistral-, mixtral- | Mistral AI |
| command-, c4ai- | Cohere |
| llama-, groq/ | Groq |
| openrouter/* | OpenRouter |
| ollama/* | Local Ollama |
| azure/* | Azure OpenAI |

No prefix is required for the primary supported providers; the model-name heuristic detects gemini-*, claude-*, gpt-*, and similar names automatically. An explicit provider/model prefix always takes precedence.
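The routing rules above can be approximated client-side, which is handy for predicting where a request will land before sending it. This is an illustrative sketch of the table, not the proxy's actual matcher, which may order and match rules differently:

```python
# Approximation of Smartflow's model-name routing (illustrative only).
# Explicit provider/ prefixes are checked before model-name heuristics,
# mirroring the documented precedence rule.
PREFIX_RULES = [
    ("openrouter/", "openrouter"),
    ("ollama/", "ollama"),
    ("azure/", "azure"),
    ("groq/", "groq"),
    (("gpt-", "o1-", "o3-", "chatgpt-"), "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "gemini"),
    ("grok-", "xai"),
    (("mistral-", "mixtral-"), "mistral"),
    (("command-", "c4ai-"), "cohere"),
    ("llama-", "groq"),
]

def resolve_provider(model: str):
    """Return the provider a model name would route to, or None if unknown."""
    for prefixes, provider in PREFIX_RULES:
        if isinstance(prefixes, str):
            prefixes = (prefixes,)
        if model.startswith(prefixes):
            return provider
    return None
```

For example, `resolve_provider("azure/gpt-4o")` resolves to Azure rather than OpenAI because the explicit prefix is checked first.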

Multimodal — Image

Multimodal — Audio (gpt-4o-audio-preview)

Response

/anthropic/v1/messages

POST /anthropic/v1/messages

Native Anthropic Messages API passthrough. The proxy injects the API key from the server key store. The full Anthropic request and response format is preserved with no translation. Also accessible as /cursor/v1/messages for Cursor IDE passthrough. The [1m] suffix that Claude Code appends to model names is stripped automatically.

Multimodal — Image (native Anthropic)

Multimodal — PDF Document (native Anthropic)

/v1/embeddings

POST /v1/embeddings

Generate vector embeddings. Supports multi-provider routing via model prefix.

Response follows the OpenAI embeddings format with data[].embedding float arrays.

/v1/audio/transcriptions

POST /v1/audio/transcriptions

Transcribe audio. Multipart form upload. Routes to OpenAI Whisper by default. Use groq/whisper-large-v3 for Groq, deepgram/nova-2 for Deepgram.

/v1/audio/speech

POST /v1/audio/speech

Text-to-speech synthesis. Returns raw audio bytes.

/v1/images/generations

POST /v1/images/generations

/v1/rerank

POST /v1/rerank

Document reranking. Compatible with Cohere's rerank API.

/v1/models

GET /v1/models

List available models across all enabled providers.

/v1/completions

POST /v1/completions

Legacy text completions. Forwarded to the configured provider.

Routing and Provider Selection

Automatic Model-Name Heuristic

For requests to /v1/chat/completions with no explicit provider prefix, the proxy infers the provider from the model name. An explicit provider/model prefix always takes precedence over heuristic detection.

| Pattern | Inferred Provider |
|---|---|
| gpt-, o1-, o3-, o4-, chatgpt-, whisper-, tts-, dall-e- | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google |
| grok-* | xAI |
| mistral-, mixtral- | Mistral |
| command-* | Cohere |
| llama-* | Groq |

Routing Strategies

Configured per fallback chain via the management API:

| Strategy | Behavior |
|---|---|
| round_robin | Distribute requests across targets in order |
| weighted | Traffic proportional to assigned weights |
| least_connections | Send to the provider with the fewest in-flight requests |
| random | Random selection among healthy providers |
| priority | Try targets in order; fall back only on failure |
| latency | Route to the provider with the lowest p95 rolling EMA latency (tracked in Redis) |
| cost | Route to the provider with the lowest per-token cost; skip providers over their daily budget cap |

Fallback Chains

Named ordered provider lists with retry logic. Configured at POST /api/routing/fallback-chains.

On 429 or 5xx the proxy retries the next target with exponential backoff. Non-retryable 4xx errors bypass retry. Providers that have exceeded their daily budget cap are excluded from selection automatically.
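A chain registration might look like the following sketch. The strategy names are the documented ones, but the request body shape (name, strategy, targets) is an assumption for illustration; consult your deployment for the exact schema:

```python
import json
import urllib.request

VALID_STRATEGIES = {"round_robin", "weighted", "least_connections",
                    "random", "priority", "latency", "cost"}

def build_fallback_chain(name: str, strategy: str, targets: list) -> dict:
    """Assemble a fallback-chain body (field names are assumptions)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    return {"name": name, "strategy": strategy, "targets": targets}

def register_chain(management_url: str, chain: dict) -> None:
    """POST the chain to /api/routing/fallback-chains (management API, port 7778)."""
    req = urllib.request.Request(
        f"{management_url}/api/routing/fallback-chains",
        data=json.dumps(chain).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)  # network call; requires a running management API
```

With a "priority" chain of ["claude-3-5-sonnet", "gpt-4o"], the second target is tried only after the first fails with a retryable status.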

MetaCache — Semantic Caching

The MetaCache intercepts every /v1/chat/completions request before any provider call is made.

How It Works

The incoming query is embedded and its cosine similarity is computed against stored request embeddings. If similarity exceeds the configured threshold, the cached response is returned. Otherwise the request is forwarded to the provider and the response is stored. Responses are semantically compressed before storage to reduce Redis footprint while preserving meaning.

Three tiers operate in sequence: L1 in-process memory, L2 Redis semantic similarity, L3 Redis exact match. A lookup checks each tier in turn and forwards to the provider only after all three miss.

Per-Request Cache Controls

| Header | Effect |
|---|---|
| Cache-Control: no-cache | Bypass cache read; always query the provider |
| Cache-Control: no-store | Bypass cache write; do not cache this response |
| x-smartflow-cache-ttl: 3600 | Override TTL in seconds for this response |
| x-smartflow-cache-namespace: | Scope the cache to a logical partition |

Cached responses return x-smartflow-cache-hit: true and x-smartflow-cache-key for client-side correlation.
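A small helper can assemble these headers per request. Combining no-cache and no-store in one Cache-Control value follows standard HTTP convention; whether the proxy accepts the combined form is an assumption:

```python
def cache_headers(bypass_read: bool = False, bypass_write: bool = False,
                  ttl: int = None, namespace: str = None) -> dict:
    """Build the per-request MetaCache control headers described above."""
    headers = {}
    if bypass_read:
        headers["Cache-Control"] = "no-cache"
    if bypass_write:
        # combine directives HTTP-style: "no-cache, no-store"
        headers["Cache-Control"] = ", ".join(
            filter(None, [headers.get("Cache-Control"), "no-store"]))
    if ttl is not None:
        headers["x-smartflow-cache-ttl"] = str(ttl)
    if namespace is not None:
        headers["x-smartflow-cache-namespace"] = namespace
    return headers
```

Merge the result into the request headers alongside the Authorization bearer token; on the response, check x-smartflow-cache-hit to confirm the effect.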

MCP Gateway

Smartflow provides a Model Context Protocol (MCP) gateway: register external MCP servers and invoke their tools through the proxy with shared authentication, budgeting, and audit logging.

Server Registry

GET /api/mcp/servers

List registered MCP servers.

POST /api/mcp/servers

Register an MCP server.

Tool Invocation

POST /{server_id}/mcp/

POST /mcp/v1/{server_id}/tools/call

The proxy authenticates the request, applies per-tool access controls, records cost, and forwards to the server.

GET /api/mcp/catalog

Browse the tool catalog across all registered servers.

GET /api/mcp/tools/search?q={query}&k={n}

Semantic search over the tool catalog. Returns the top k tools matching the natural-language query.

GET /api/mcp/tools/index

Full indexed tool list with embedding metadata.
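The search endpoint takes its query and result count as URL parameters, so queries with spaces or punctuation need encoding. A minimal URL builder:

```python
from urllib.parse import urlencode

def tool_search_url(base_url: str, query: str, k: int = 5) -> str:
    """Build the semantic tool-search URL for /api/mcp/tools/search."""
    return f"{base_url}/api/mcp/tools/search?" + urlencode({"q": query, "k": k})
```

Send the resulting URL as a GET with the usual Authorization header; the response lists the top k catalog tools matching the natural-language query.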

Access Control

Per-server configuration fields for access control:

| Field | Type | Description |
|---|---|---|
| allowed_tools | string[] | If non-empty, only these tools may be called |
| disallowed_tools | string[] | These tools are always blocked |
| allowed_params | object | Per-tool parameter allowlists |
| guardrail_mode | string | "strict" — block on policy violation; "log" — flag and continue |
| available_on_public_internet | bool | If false, only accessible from approved network segments |

Access Request Flow

GET /api/mcp/catalog/requests

POST /api/mcp/catalog/requests

POST /api/mcp/catalog/requests/{id}/approve

POST /api/mcp/catalog/requests/{id}/deny

OAuth Flow

GET /api/mcp/auth/initiate?server_id={id}

GET /api/mcp/auth/callback

GET /api/mcp/auth/tokens

Usage and Logs

GET /api/mcp/usage

Aggregated cost and call counts per server and tool.

GET /api/mcp/logs

Per-invocation audit logs.

API Generation from OpenAPI Spec

POST /api/mcp/generate

Auto-generate an MCP server adapter from an OpenAPI specification.

A2A Agent Gateway

Smartflow implements the A2A (Agent-to-Agent) protocol for inter-agent communication. Register external agents and invoke them with full logging and routing.

Agent Card

GET /a2a/{agent_id}/.well-known/agent.json

Returns the agent's machine-readable capability card: name, capabilities, supported task types, and authentication requirements.

Task Invocation

POST /a2a/{agent_id}

Send a task to a registered agent. The proxy forwards the request, captures the response, and logs both.

Supports synchronous JSON responses and SSE streaming for long-running tasks. Include x-a2a-trace-id to correlate task invocations across agents in distributed workflows.
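A synchronous invocation can be sketched as follows. The header names come from this section; the task payload shape is an assumption — consult the agent's card for its supported task types:

```python
import json
import urllib.request
import uuid

def a2a_headers(api_key: str, trace_id: str = None) -> dict:
    """Headers for an A2A call, with a trace ID for cross-agent correlation."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        "x-a2a-trace-id": trace_id or str(uuid.uuid4()),
    }

def invoke_agent(base_url: str, agent_id: str, task: dict,
                 api_key: str, trace_id: str = None) -> dict:
    """POST a task to /a2a/{agent_id} and return the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/a2a/{agent_id}",
        data=json.dumps(task).encode(),
        headers=a2a_headers(api_key, trace_id),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call
        return json.loads(resp.read())
```

Reusing the same trace_id across a chain of invocations lets the audit logs reconstruct the full distributed workflow.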

Vector Store API

Built-in vector store backed by Redis. No external vector database required. All endpoints are on the proxy at port 7775.

POST /v1/vector_stores

Create a vector store.

Response includes id, name, description, file_count, created_at.

GET /v1/vector_stores

List all vector stores.

GET /v1/vector_stores/{id}

Get a specific vector store.

DELETE /v1/vector_stores/{id}

Delete a vector store and all its files.

POST /v1/vector_stores/{id}/files

Add a text document. The document is chunked and embedded automatically.

GET /v1/vector_stores/{id}/files

List files in a vector store.

POST /v1/vector_stores/{id}/search

Semantic search over stored documents.
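The create → add file → search flow maps onto three POSTs. These builders return (path, body) pairs; the body field names (name, description, query, max_results) are assumptions consistent with the documented create response and RAG parameters:

```python
def create_store(name: str, description: str = ""):
    """Body for POST /v1/vector_stores (field names assumed from the response)."""
    return "/v1/vector_stores", {"name": name, "description": description}

def add_file(store_id: str, content: str, filename: str = ""):
    """Body for POST /v1/vector_stores/{id}/files; chunking/embedding is automatic."""
    return (f"/v1/vector_stores/{store_id}/files",
            {"content": content, "filename": filename})

def search_store(store_id: str, query: str, max_results: int = 5):
    """Body for POST /v1/vector_stores/{id}/search."""
    return (f"/v1/vector_stores/{store_id}/search",
            {"query": query, "max_results": max_results})
```

POST each (path, body) pair to the proxy on port 7775 with the virtual key as the Bearer token; the create response's id feeds the subsequent calls.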

RAG Pipeline API

Built on top of the vector store. Ingest documents with automatic chunking, then retrieve context for LLM augmentation.

POST /v1/rag/ingest

Chunk a document, embed each chunk, and store in a named vector store.

| Field | Type | Default | Description |
|---|---|---|---|
| content | string | required | Full document text |
| vector_store_id | string | required | Target store (must already exist) |
| filename | string | "" | Display name for the file |
| chunk_size | int | 512 | Characters per chunk |
| chunk_overlap | int | 64 | Overlap between consecutive chunks |
| metadata | object | {} | Arbitrary key-value metadata |

Response: { "store_id", "file_id", "chunks_created", "status": "completed" }

POST /v1/rag/query

Embed a question, retrieve matching chunks, and optionally assemble a context string for injection into an LLM system prompt.

| Field | Default | Description |
|---|---|---|
| query | required | Natural-language question |
| vector_store_id | required | Store to search |
| max_results | 5 | Maximum chunks to return |
| score_threshold | 0.0 | Minimum cosine similarity (0 = return all) |
| include_context | true | Concatenate chunks into a context string field |

Response includes chunks[], context (concatenated string for prompt injection), and total.
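The two payloads can be assembled directly from the documented fields and defaults above — a minimal sketch:

```python
def rag_ingest_payload(content: str, vector_store_id: str, *, filename: str = "",
                       chunk_size: int = 512, chunk_overlap: int = 64,
                       metadata: dict = None) -> dict:
    """Body for POST /v1/rag/ingest, using the documented defaults."""
    return {
        "content": content,
        "vector_store_id": vector_store_id,
        "filename": filename,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "metadata": metadata or {},
    }

def rag_query_payload(query: str, vector_store_id: str, *, max_results: int = 5,
                      score_threshold: float = 0.0,
                      include_context: bool = True) -> dict:
    """Body for POST /v1/rag/query, using the documented defaults."""
    return {
        "query": query,
        "vector_store_id": vector_store_id,
        "max_results": max_results,
        "score_threshold": score_threshold,
        "include_context": include_context,
    }
```

A typical loop ingests once, then on each user question queries the store and injects the response's context string into the LLM system prompt before calling /v1/chat/completions.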

Management API

Management API runs on port 7778.

Virtual Keys

GET /api/enterprise/vkeys

List all virtual keys.

POST /api/enterprise/vkeys

Create a virtual key.

DELETE /api/enterprise/vkeys/{key}

Revoke a virtual key.

Routing API

GET /api/routing/fallback-chains

POST /api/routing/fallback-chains

DELETE /api/routing/fallback-chains/{name}

GET /api/routing/status

Current routing state: active provider, fallback chain, last failure.

POST /api/routing/force-provider

Audit Logs (VAS)

GET /api/vas/logs?limit=50&provider=openai

Retrieve VAS audit logs. Every request proxied through Smartflow produces a log entry including: timestamp, provider, model, prompt tokens, completion tokens, cost in USD, cache hit flag, compliance flags, user context, and latency.

GET /api/vas/logs/hybrid

Retrieve logs aggregated across multiple Smartflow instances via the hybrid bridge.

Analytics

GET /api/analytics?period=7d

Usage analytics: request volume, cost by provider, cache hit rate, top models, top users.

Compliance API

The Compliance API runs on port 7777. It provides ML-based content scanning, PII detection and redaction, and an adaptive learning loop that improves over time based on human feedback. The proxy integrates with this service on every request when pre/post-call scanning is enabled.

POST /v1/compliance/scan

Rule-based compliance scan against configured policies.

POST /v1/compliance/intelligent-scan

Maestro ML policy engine. Evaluates intent against your organization's policy documents — not keyword matching.

Response includes risk_score (0–1), risk_level, recommended_action (Allow / Flag / Block), violations, explanation.
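Callers typically map the scan result to an enforcement decision. The field names below are from the response described above; the 0.8 blocking threshold is an illustrative choice, not a platform default:

```python
def enforce(result: dict, block_threshold: float = 0.8) -> str:
    """Map an intelligent-scan result to 'block', 'flag', or 'allow'."""
    action = result.get("recommended_action", "Allow")
    # Block on an explicit Block verdict, or on a very high risk score
    # even when the recommended action is softer.
    if action == "Block" or result.get("risk_score", 0.0) >= block_threshold:
        return "block"
    if action == "Flag":
        return "flag"
    return "allow"
```

A "block" outcome would typically short-circuit before any provider call and, via /v1/compliance/feedback, any human override can be fed back to improve future predictions.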

POST /v1/compliance/feedback

Submit a correction to improve the ML model's future predictions.

POST /v1/compliance/redact

Detect and redact PII from content. Returns the redacted string.

GET /v1/compliance/learning/status/{user_id}

GET /v1/compliance/learning/summary

GET /v1/compliance/ml/stats

GET /v1/compliance/org/baseline/{org_id}

Policy Perfect API

The Policy Perfect API runs on port 7782. It manages the organization's compliance policy library — the source documents the Maestro ML engine reads when evaluating requests. Backed by PostgreSQL for durable policy storage.

GET /health

Liveness check for the Policy Perfect service.

GET /api/stats

Aggregate counts for the current state of the policy library.

Policies

Policies are named compliance rules attached to scopes. The Maestro engine evaluates all active policies on every request.

Policy types:

| Type | Description |
|---|---|
| compliance | Regulatory rules — HIPAA, GDPR, SOC 2, PCI-DSS, etc. |
| brand | Brand voice and communication standards |
| format | Output format constraints |
| role | Role-based access and behavior restrictions |
| industry | Industry-specific usage rules |
| legal | Legal department rules and disclaimers |
| security | Security guardrails and data-handling policies |

GET /api/policies

List all active policies.

POST /api/policies

Create a policy.

| Field | Type | Description |
|---|---|---|
| name | string | Policy display name |
| policy_type | string | One of the seven policy types above |
| content | string | Policy text read by the Maestro ML engine |
| priority | int | Evaluation order (0–100); higher values evaluated first |
| applicable_providers | string[] | Providers this policy applies to; ["all"] for universal |
| applicable_models | string[] | Models this policy applies to; ["all"] for universal |
| regulatory_framework | string | HIPAA, GDPR, SOC2, PCI-DSS, etc. |
| severity | string | critical, high, medium, low |
| metadata | object | Layer 2/3 targeting: source_ips, ad_groups, departments, applications |

GET /api/policies/{id}

Get a policy by ID.

PUT /api/policies/{id}

Update a policy. All fields optional; only supplied fields are changed. Set "is_active": false to deactivate without deleting.

DELETE /api/policies/{id}

Delete a policy permanently.

Presets

Presets are named, ordered collections of policies. Assign a preset to a team, role, or virtual key instead of managing individual policies per scope.

GET /api/presets

List all presets. Each entry includes the preset metadata and its ordered policy list.

POST /api/presets

Create a preset.

Policy order in policy_ids determines evaluation priority.
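A creation body might be sketched as follows. Only policy_ids and its ordering semantics are documented here; name and description are assumed field names for illustration:

```python
def preset_payload(name: str, policy_ids: list, description: str = "") -> dict:
    """Body for POST /api/presets; policy_ids order sets evaluation priority."""
    if len(policy_ids) != len(set(policy_ids)):
        raise ValueError("duplicate policy id in preset")
    return {"name": name, "description": description, "policy_ids": policy_ids}
```

Assigning the resulting preset to a team or virtual key then applies the whole ordered policy list in one step.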

GET /api/presets/{id}

Get a preset and its full ordered policy list.

AI Document-to-Policy Generation

Upload a compliance document (PDF, DOCX, TXT — up to 50 MB). The service uses GPT-4o to extract structured policy suggestions automatically. Processing is asynchronous; poll for progress with the returned job ID.

POST /api/policies/generate-from-document

Multipart form upload. Field name: file.

Immediate response:

GET /api/documents/job/{job_id}/progress

Poll for processing status. Status values: pending, processing, completed, failed.

GET /api/documents/job/{job_id}/results

Retrieve suggested policies once status is completed. Each suggestion includes a confidence score (0–1). Review suggestions and create live policies via POST /api/policies.
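A polling loop over the progress endpoint can be sketched as below. The status values are the documented four; the assumption is that the progress response carries them in a "status" field:

```python
import json
import time
import urllib.request

TERMINAL = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    """pending/processing keep polling; completed/failed stop it."""
    return status in TERMINAL

def wait_for_job(base_url: str, job_id: str, interval: float = 2.0) -> str:
    """Poll /api/documents/job/{job_id}/progress until the job settles."""
    while True:
        with urllib.request.urlopen(
                f"{base_url}/api/documents/job/{job_id}/progress") as resp:
            status = json.loads(resp.read())["status"]
        if is_terminal(status):
            return status
        time.sleep(interval)
```

Once wait_for_job returns "completed", fetch the suggestions from the results endpoint and promote the high-confidence ones via POST /api/policies.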

Alerting

Smartflow fires HTTP POST webhooks when threshold events occur. Configuration is via environment variables on the proxy server.

| Alert Type | Trigger |
|---|---|
| BudgetThreshold | Provider or virtual key spend exceeds the configured cap |
| ProviderFailure | Error rate for a provider exceeds the spike threshold |
| SlowRequest | Request latency exceeds the slow-request threshold |
| Custom | Programmatic alerts from the management API |

Configure any combination of webhook destinations (SLACK_WEBHOOK_URL, TEAMS_WEBHOOK_URL, DISCORD_WEBHOOK_URL; see Environment Variables).

Alerts are fire-and-forget — they do not block the request that triggered them.

Observability

GET /health/liveliness

Returns 200 OK with {"status":"ok"} when the proxy process is running.

GET /health/readiness

Returns 200 OK when Redis is connected and providers are reachable.

GET /metrics

Prometheus-compatible metrics. Exposed metrics:

| Metric | Description |
|---|---|
| smartflow_requests_total | Request counter by provider, model, and status |
| smartflow_request_latency_seconds | Request latency histogram |
| smartflow_cache_hits_total | Cache hit counter by tier (L1/L2/L3) |
| smartflow_cache_misses_total | Cache miss counter |
| smartflow_provider_errors_total | Upstream error counter by provider and status |
| smartflow_tokens_total | Token usage by provider and direction |
| smartflow_cost_usd_total | Cumulative cost by provider |
| smartflow_mcp_calls_total | MCP tool invocation counter by server and tool |
| smartflow_vkey_spend_usd | Per-virtual-key spend gauge |

Python SDK

Installation

Requirements: Python 3.10+, httpx >= 0.24

SmartflowClient

The primary async client.

| Parameter | Type | Default | Description |
|---|---|---|---|
| base_url | str | — | Proxy URL, e.g. "https://smartflow.example.com" |
| api_key | str | None | Virtual key sent as Authorization: Bearer |
| timeout | float | 30.0 | Request timeout in seconds |
| management_port | int | 7778 | Management API port |
| compliance_port | int | 7777 | Compliance API port |
| bridge_port | int | 3500 | Hybrid bridge port |

Core AI Methods

chat()

Send a message, receive the reply as a plain string.

chat_completions()

Full OpenAI-compatible completions. Returns an AIResponse object.

stream_chat()

Async generator that yields text delta strings as they stream.

embeddings()

rerank()

Audio and Image Methods

audio_transcription()

text_to_speech()

image_generation()

Compliance Methods

check_compliance()

intelligent_scan()

redact_pii()

submit_compliance_feedback()

Monitoring Methods

get_cache_stats()

health_comprehensive()

Other monitoring methods

| Method | Returns |
|---|---|
| health() | Dict — basic liveness check |
| get_provider_health() | List[ProviderHealth] — latency and success rate per provider |
| get_logs(limit, provider) | List[VASLog] — audit log entries |
| get_analytics(period) | Dict — usage and cost analytics |
| get_routing_status() | Dict — current routing state |
| force_provider(provider, duration_seconds) | Dict — force routing for a duration |

SmartflowAgent

Stateful agent with conversation memory and per-message compliance scanning.

| Method | Description |
|---|---|
| chat(message, scan_input=True, scan_output=True) | Send message; raises ComplianceError if blocked |
| clear_history() | Reset conversation, preserve system prompt |
| get_history() | Return a copy of message history |
| message_count | Number of messages in history |

SmartflowWorkflow

Chain AI operations with branching and error handling.

| Action | Config fields | Description |
|---|---|---|
| "chat" | prompt, model, temperature | Chat completion; {input} / {output} are template variables |
| "compliance_check" | content | Compliance scan |
| "condition" | field, cases, default | Branch on a context value |

SyncSmartflowClient

Synchronous wrapper for scripts and Jupyter notebooks. Every async method is available without await.

In Jupyter with an existing event loop: pip install nest_asyncio then nest_asyncio.apply().

OpenAI Drop-in Replacement

Any code targeting the OpenAI API works by pointing base_url at Smartflow. MetaCache, compliance scanning, VAS logging, and routing apply transparently.

Response Types

AIResponse

| Field | Type | Description |
|---|---|---|
| content | str | First choice text |
| choices | list | Full choices array |
| usage | Usage | Token usage |
| model | str | Model used |
| id | str | Response ID |

CacheStats

| Field | Type |
|---|---|
| hit_rate | float |
| total_requests | int |
| tokens_saved | int |
| cost_saved_usd | float |
| l1_hits | int |
| l2_hits | int |
| l3_hits | int |

ComplianceResult

| Field | Type |
|---|---|
| has_violations | bool |
| compliance_score | float |
| violations | list[str] |
| pii_detected | list[str] |
| risk_level | str — "low" / "medium" / "high" / "critical" |
| recommendations | list[str] |
| redacted_content | str \| None |

IntelligentScanResult

| Field | Type |
|---|---|
| risk_score | float — 0.0 to 1.0 |
| risk_level | str |
| recommended_action | str — "Allow" / "Flag" / "Block" |
| violations | list |
| explanation | str |

Response Headers

Every proxied response includes these headers:

| Header | Description |
|---|---|
| x-smartflow-provider | Provider that served the request |
| x-smartflow-model | Actual model used |
| x-smartflow-request-id | Unique request ID for log correlation |
| x-smartflow-cache-hit | true if response was served from MetaCache |
| x-smartflow-cache-key | Cache key when cache-hit is true |
| x-smartflow-latency-ms | Total proxy latency in milliseconds |
| x-smartflow-cost-usd | Estimated cost in USD for this request |
| x-smartflow-compliance-score | Compliance score (0–1) when pre-call scan is enabled |
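These headers are convenient to fold into a single record for request logging. A small parser over the documented names (case-insensitive, since HTTP header casing varies by client):

```python
from dataclasses import dataclass

@dataclass
class ProxyTrace:
    provider: str
    model: str
    request_id: str
    cache_hit: bool
    latency_ms: float
    cost_usd: float

def parse_trace(headers: dict) -> ProxyTrace:
    """Extract the x-smartflow-* headers from a response into a record."""
    h = {k.lower(): v for k, v in headers.items()}
    def num(name):
        return float(h[name]) if name in h else None
    return ProxyTrace(
        provider=h.get("x-smartflow-provider", ""),
        model=h.get("x-smartflow-model", ""),
        request_id=h.get("x-smartflow-request-id", ""),
        cache_hit=h.get("x-smartflow-cache-hit", "").lower() == "true",
        latency_ms=num("x-smartflow-latency-ms"),
        cost_usd=num("x-smartflow-cost-usd"),
    )
```

Logging the request_id alongside application logs makes it easy to join client-side events with the VAS audit trail.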

Environment Variables

Set on the Smartflow server; not used in client code.

Provider Keys

| Variable | Provider |
|---|---|
| OPENAI_API_KEY | OpenAI |
| ANTHROPIC_API_KEY | Anthropic |
| GEMINI_API_KEY | Google Gemini |
| XAI_API_KEY | xAI / Grok |
| OPENROUTER_API_KEY | OpenRouter |
| AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION | Azure OpenAI |
| MISTRAL_API_KEY | Mistral AI |
| COHERE_API_KEY | Cohere |
| GROQ_API_KEY | Groq |
| DEEPGRAM_API_KEY | Deepgram |
| FIREWORKS_API_KEY | Fireworks AI |
| NVIDIA_NIM_API_KEY, NVIDIA_NIM_API_BASE | NVIDIA NIM |
| HUGGINGFACE_API_KEY, HUGGINGFACE_API_BASE | HuggingFace |
| TOGETHER_API_KEY | Together AI |
| PERPLEXITY_API_KEY | Perplexity AI |
| REPLICATE_API_KEY | Replicate |
| VERTEXAI_API_KEY, VERTEXAI_PROJECT, VERTEXAI_LOCATION | Vertex AI |
| AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION | AWS Bedrock |
| NOVITA_API_KEY | Novita AI |
| VERCEL_AI_GATEWAY_API_KEY | Vercel AI Gateway |

Feature Flags and Ports

| Variable | Default | Description |
|---|---|---|
| GEMINI_ENABLED | false | Enable Google Gemini in intelligent routing |
| SMARTFLOW_ALERTS_ENABLED | true | Enable webhook alerting |
| SLACK_WEBHOOK_URL | — | Slack incoming webhook |
| TEAMS_WEBHOOK_URL | — | Microsoft Teams webhook |
| DISCORD_WEBHOOK_URL | — | Discord webhook |
| PROXY_PORT | 7775 | LLM proxy port |
| MANAGEMENT_PORT | 7778 | Management API port |
| COMPLIANCE_PORT | 7777 | Compliance API port |
| BRIDGE_PORT | 3500 | Hybrid bridge port |

Error Reference

HTTP Status Codes

| Code | Meaning |
|---|---|
| 400 | Malformed request — check body format |
| 401 | Missing or invalid API key |
| 402 | Virtual key budget exceeded |
| 403 | Request blocked by compliance policy |
| 404 | Resource or route not found |
| 429 | Rate limit exceeded (RPM or TPM) |
| 500 | Proxy internal error |
| 502 | Upstream provider returned an error |
| 503 | No providers available — fallback chain exhausted |

SDK Exceptions

| Exception | Condition |
|---|---|
| SmartflowError | Base class for all SDK errors |
| ConnectionError | Cannot connect to proxy |
| AuthenticationError | 401 — invalid or missing key |
| RateLimitError | 429 — rate limit hit |
| ComplianceError | 403 — request blocked by policy |
| ProviderError | Upstream provider error |
| TimeoutError | Request timeout |

Changelog

v3.0 (proxy) / v0.3.0 (SDK) — 2026

New in the proxy:

  • Vector Store API (/v1/vector_stores/*) — Redis-backed, no external vector database required

  • RAG Pipeline API (/v1/rag/ingest, /v1/rag/query) — document chunking, embedding, context retrieval

  • A2A Agent Gateway (/a2a/*) — A2A protocol for inter-agent orchestration

  • Webhook alerting — Slack, Teams, Discord for budget, failure, and latency events

  • Model-name heuristic routing — claude-*, gemini-*, gpt-* detected automatically

  • Anthropic API key injection for /anthropic/* passthrough

  • Cost-based and latency-based routing strategies

  • Prometheus metrics endpoint (/metrics)

  • MCP access control — allowed_tools, disallowed_tools, guardrail_mode per server

  • MCP cost tracking via Redis HINCRBYFLOAT

New in the SDK:

  • image_generation() — multi-provider image generation

  • audio_transcription() — multipart audio, Groq/Deepgram/Fireworks routing

  • text_to_speech() — returns raw audio bytes

  • stream_chat() — async SSE iterator

  • rerank() — Cohere-compatible document reranking

  • Extended embeddings() with encoding_format, dimensions, input_type

v2.0 (proxy) / v0.2.0 (SDK)

  • MCP gateway with server registry, catalog, OAuth flow

  • SmartflowAgent with compliance scanning and conversation memory

  • SmartflowWorkflow for multi-step AI pipelines

  • Maestro ML policy engine (intelligent compliance)

v1.0 (proxy) / v0.1.0 (SDK)

  • OpenAI-compatible proxy, virtual keys, 3-tier semantic cache

  • Initial SDK: chat, chat_completions, embeddings

  • VAS audit logging, SyncSmartflowClient