Smartflow Platform Capabilities

Version 1.4

The enterprise AI gateway that speaks LLM, MCP, and A2A — unified control plane with a policy engine that learns, a semantic cache that thinks, and observability that tells you exactly what happened and what it cost.

LLM Proxy · MCP Gateway · A2A Agent Gateway · MetaCache · AI Policy Engine

MetaCache

Semantic similarity caching using embeddings. Not exact-match — conceptually equivalent prior answers are served from cache, collapsing redundant LLM calls across rephrased queries.

AI Policy Engine

Guardrail decisions made by an AI reading your actual compliance policies — not a keyword blocklist. Thresholds adapt over time from your organisation's real compliance outcomes.

Unified LLM + MCP + A2A

One gateway. One audit trail. One policy engine. All three protocols share identity, budget enforcement, compliance logging, and semantic caching infrastructure.

🤖 LLM Proxy

1

OpenAI-Compatible Endpoint with Provider Auto-Routing

LLM Proxy

Drop-in replacement for the OpenAI API. Any client using /v1/chat/completions works with zero changes. Streaming (text/event-stream), function/tool calling, and extended context are all supported. Provider is resolved automatically from the model name: gpt-* → OpenAI, claude-* → Anthropic, gemini-* → Google, ollama/* → local Ollama.

Unlocks

Existing OpenAI SDK clients, LangChain apps, and LlamaIndex pipelines connect to Smartflow and gain audit logging, policy enforcement, cost tracking, and semantic caching without any code changes.
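Because the endpoint is OpenAI-compatible, any plain HTTP client works. A minimal stdlib-only sketch of building such a request follows; the gateway URL and key below are placeholders, not a real deployment:

```python
import json
import urllib.request

# Hypothetical gateway URL and virtual key -- substitute your own deployment.
SMARTFLOW_URL = "http://localhost:4000"
API_KEY = "sk-sf-example"

def build_chat_request(model, messages):
    """Build a standard OpenAI-style /v1/chat/completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{SMARTFLOW_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The provider is inferred from the model name (claude-* -> Anthropic),
# so the client never names a provider explicitly.
req = build_chat_request("claude-sonnet-4", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it to a running gateway.
```

The same request shape is what the OpenAI SDK, LangChain, and LlamaIndex emit, which is why only the base URL needs to change.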

2

Cursor IDE / Claude Code Passthrough

LLM Proxy

Cursor and Claude Code point directly at Smartflow. Requests at /anthropic/* are forwarded natively to Anthropic with full header preservation. The [1m] extended-context suffix that Claude Code appends to model names is stripped automatically before forwarding.

Unlocks

Route every developer's IDE session through Smartflow for centralised logging, budget enforcement, and policy application — with zero client-side changes.

3

Virtual Keys with Spend Budgets

Key Management

Smartflow-issued virtual keys (sk-sf-...) carry hard spend limits scoped to teams, models, or use cases. Budget periods: daily, weekly, monthly, lifetime. Budget is checked before every request — if exceeded, the request returns 429 before any provider cost is incurred. Spend is recorded after each response using actual token cost.

Unlocks

Per-user, per-team, and per-application cost caps with zero-tolerance enforcement. Issue keys to internal users or external partners with guaranteed spend control.
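The check-before-call behaviour can be sketched in a few lines. This is an illustration of the semantics described above, not Smartflow's internal schema; the field names are hypothetical:

```python
# Illustrative sketch of the budget gate described above -- field names
# are hypothetical, not Smartflow's internal schema.
def budget_gate(key_record, estimated_cost):
    """Return an HTTP status: 429 before any provider cost if over budget."""
    if key_record["spent"] + estimated_cost > key_record["limit"]:
        return 429  # rejected before the provider is ever called
    return 200

def record_spend(key_record, actual_cost):
    """After the response, record spend using the actual token cost."""
    key_record["spent"] += actual_cost

key = {"alias": "team-research", "limit": 50.0, "spent": 49.99, "period": "daily"}
print(budget_gate(key, 0.02))  # over the daily cap -> 429
```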

4

Semantic Similarity Caching ★ UNIQUE

MetaCache

Every request is embedded and compared against stored request embeddings using cosine similarity. If a prior request is semantically close enough (configurable threshold), the cached response is returned — not as an exact string match, but as a conceptually equivalent prior answer. Responses are semantically compressed before storage using the same embedding model, reducing Redis memory footprint while preserving meaning.

Why This Matters

A user asking "what are the side effects of ibuprofen?" and another asking "ibuprofen side effects?" resolve to the same cached response. Exact-match caches miss this entirely. MetaCache collapses rephrased, paraphrased, and semantically equivalent queries into a single LLM call — silently, without any application changes.

Unlocks

Dramatically lower provider costs on repeated-topic workloads. Faster response times for semantically redundant queries. No application changes required.
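The lookup logic above reduces to a cosine-similarity search with a threshold. A toy sketch, using 3-dimensional vectors in place of real model embeddings and an assumed default threshold:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def lookup(query_emb, cache, threshold=0.92):
    """Return the cached response whose request embedding is closest to the
    query, provided it clears the similarity threshold; otherwise None."""
    best = max(cache, key=lambda e: cosine(query_emb, e["embedding"]), default=None)
    if best and cosine(query_emb, best["embedding"]) >= threshold:
        return best["response"]
    return None  # cache miss -> forward to the provider

# Toy embeddings stand in for real model embeddings.
cache = [{"embedding": [0.9, 0.1, 0.0], "response": "Ibuprofen side effects: ..."}]
print(lookup([0.88, 0.12, 0.01], cache))  # semantically close -> cache hit
```

A rephrased query lands near the original in embedding space, so it clears the threshold and hits the cache, where an exact-match key would miss.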

5

Per-Request Cache Controls

Caching

Callers control caching behaviour on individual requests without changing server configuration:

  • Cache-Control: no-cache — bypass the cache read

  • Cache-Control: no-store — bypass the cache write

  • x-smartflow-cache-ttl: 3600 — override the TTL

  • x-smartflow-cache-namespace: team-a — scope to a logical partition

Every cached response returns x-smartflow-cache-hit: true and x-smartflow-cache-key for client-side correlation.

Unlocks

Mix cacheable and non-cacheable calls in the same integration. Real-time lookups that must never be stale coexist with deterministic queries that benefit from caching — controlled per-request, not per-route.
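A small helper makes the per-request controls concrete; a sketch assembling the headers documented above:

```python
def cache_headers(no_cache=False, no_store=False, ttl=None, namespace=None):
    """Assemble the per-request cache headers documented above."""
    headers = {}
    directives = []
    if no_cache:
        directives.append("no-cache")   # bypass the cache read
    if no_store:
        directives.append("no-store")   # bypass the cache write
    if directives:
        headers["Cache-Control"] = ", ".join(directives)
    if ttl is not None:
        headers["x-smartflow-cache-ttl"] = str(ttl)
    if namespace:
        headers["x-smartflow-cache-namespace"] = namespace
    return headers

print(cache_headers(ttl=3600, namespace="team-a"))
# {'x-smartflow-cache-ttl': '3600', 'x-smartflow-cache-namespace': 'team-a'}
```

A real-time lookup would pass no_cache=True on just that call; everything else in the same integration keeps caching.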

6

Learning-Based Guardrails ★ UNIQUE

Policy Engine

Maestro, the AI Policy Engine, runs as a pre-call and post-call validation pass. It reads your organisation's compliance policies (stored as documents in the Policy Perfect API), embeds them, and evaluates each request against the semantic intent of those policies — not surface-level keyword matching. The Policy Perfect API maintains a living corpus of compliance decisions. Every flagged or blocked request outcome is fed back into the policy model. Guardrail thresholds adapt over time based on your organisation's actual decisions — not a vendor's preset calibration.

Why This Matters

Static keyword blocklists block "kill process" in a DevOps context but miss subtle policy violations in legal or medical text. Maestro reads the policy the same way a compliance officer would and makes contextual judgements. It gets better as your team reviews its decisions.

Unlocks

Compliance enforcement that improves over time. Zero false positives from keyword collisions. A policy engine that matches the nuance of your actual compliance requirements rather than a generic vendor baseline.

7

Policy Groups with Inheritance + Tag-Wildcard Scoping

Policy

Named guardrails (pii, toxicity, prompt_injection, compliance, custom) are grouped into named policies. Policies inherit from a parent and override specific guardrails — no duplication. Policies attach to scopes: team, virtual-key alias, model pattern, or tag wildcard (e.g. hipaa-*). Every proxied request returns headers listing which policies matched and why.

Endpoints

  • POST /api/guardrails, GET /api/guardrails

  • POST /api/policies, GET /api/policies/{name}

  • POST /api/policies/attachments

  • POST /api/policies/resolve — preview guardrails for a context
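The inheritance and tag-wildcard semantics can be sketched with a parent-chain merge and glob matching; policy names and guardrail values below are illustrative:

```python
from fnmatch import fnmatch

def resolve_policy(policies, name):
    """Walk the parent chain; child guardrail settings override the parent's."""
    policy = policies[name]
    guardrails = {}
    if policy.get("parent"):
        guardrails.update(resolve_policy(policies, policy["parent"]))
    guardrails.update(policy.get("guardrails", {}))
    return guardrails

def matches_scope(scope_pattern, tag):
    """Tag-wildcard scoping, e.g. 'hipaa-*' matches 'hipaa-prod'."""
    return fnmatch(tag, scope_pattern)

# Illustrative policies: 'medical' inherits 'base' and overrides one guardrail.
policies = {
    "base": {"guardrails": {"pii": "block", "toxicity": "warn"}},
    "medical": {"parent": "base", "guardrails": {"toxicity": "block"}},
}
print(resolve_policy(policies, "medical"))  # {'pii': 'block', 'toxicity': 'block'}
print(matches_scope("hipaa-*", "hipaa-prod"))  # True
```

This is what /api/policies/resolve previews: the effective guardrail set after inheritance and scope matching, without sending a real request.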

8

HTTP, SSE, and STDIO Transports

MCP Gateway

Every MCP transport type is supported. HTTP — standard JSON-RPC 2.0 over HTTPS. SSE — proxy opens and maintains the Server-Sent Events stream; events are parsed and routed transparently. STDIO — Smartflow spawns the local child process, communicates over stdin/stdout using JSON-RPC 2.0, and manages the process lifecycle automatically.

Unlocks

Real-time event-driven MCP servers, local CLI tools, and standard community servers (GitHub CLI, filesystem access, local databases) can all be registered and used through the same gateway.
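All three transports carry the same JSON-RPC 2.0 frames; only the channel differs. A sketch of the message shape (the tool name and arguments are illustrative):

```python
import itertools
import json

_ids = itertools.count(1)  # monotonically increasing JSON-RPC request ids

def jsonrpc_call(method, params):
    """Frame an MCP request as JSON-RPC 2.0, the wire format shared by the
    HTTP, SSE, and STDIO transports."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}

# Over HTTP this is the POST body; over STDIO it is one line on stdin.
msg = jsonrpc_call("tools/call", {"name": "read_file", "arguments": {"path": "notes.txt"}})
print(json.dumps(msg))
```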

9

Per-Server Tool Access Control

MCP Gateway

Each MCP server carries a fine-grained access policy: allowed_tools (whitelist), disallowed_tools (blacklist, overrides whitelist), and allowed_params (per-tool parameter allow-lists). Requests that call disallowed tools or pass disallowed parameters are rejected at the gateway before reaching the MCP server. available_on_public_internet: false blocks requests from non-RFC-1918 IPs entirely.

Unlocks

Fine-grained least-privilege enforcement for every MCP server — no trust required at the MCP server level. Public-facing deployments with private MCP infrastructure are safely isolated.
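The evaluation order described above (blacklist beats whitelist, then per-tool parameter allow-lists) can be sketched directly; the policy fields mirror the names in the text, the tool names are illustrative:

```python
def allowed(policy, tool, params):
    """Gateway-side check mirroring the per-server policy described above."""
    if tool in policy.get("disallowed_tools", []):
        return False  # blacklist overrides whitelist
    allow = policy.get("allowed_tools")
    if allow is not None and tool not in allow:
        return False  # not on the whitelist
    param_allow = policy.get("allowed_params", {}).get(tool)
    if param_allow is not None and any(p not in param_allow for p in params):
        return False  # disallowed parameter
    return True

policy = {
    "allowed_tools": ["read_file", "list_dir"],
    "disallowed_tools": ["list_dir"],          # blacklist wins
    "allowed_params": {"read_file": ["path"]},
}
print(allowed(policy, "read_file", {"path": "a.txt"}))          # True
print(allowed(policy, "list_dir", {}))                           # False
print(allowed(policy, "read_file", {"path": "a", "mode": "w"}))  # False
```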

10

Semantic Tool Filtering

MCP Gateway

All tools across all registered servers are indexed with embeddings of their name, description, and parameter signatures. Passing x-mcp-query: summarize a PDF on a tools/list request returns only semantically relevant tools — not the full catalogue. Agents discover capabilities by intent, not by knowing server names.

Endpoints

  • GET /api/mcp/tools/search?q=read+a+file&k=5

  • POST /api/mcp/tools/reindex

11

Built-In Vector Store API ★ UNIQUE

Vector / RAG

OpenAI-compatible vector store API backed by Redis and EmbeddingService — no external vector database required. CRUD for stores, automatic chunk-and-embed on file ingest, and top-K semantic search. The RAG pipeline endpoints compose retrieval-augmented generation end-to-end: ingest a document once, query it with a natural language question, and receive assembled context chunks ready to inject into a prompt.

Vector Store Endpoints

  • POST /v1/vector_stores — create store

  • GET /v1/vector_stores — list stores

  • POST /v1/vector_stores/{id}/files — ingest text (chunked + embedded)

  • POST /v1/vector_stores/{id}/search — top-K semantic search

RAG Endpoints

  • POST /v1/rag/ingest — chunk, embed, and store a document

  • POST /v1/rag/query — embed question, retrieve context, return chunks
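The chunk-and-embed step behind /v1/rag/ingest splits a document into overlapping windows before embedding. A sketch with illustrative sizes (the actual chunking parameters are configuration-dependent):

```python
def chunk(text, size=200, overlap=40):
    """Split a document into overlapping chunks; sizes are illustrative,
    not Smartflow's defaults."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 500
pieces = chunk(doc)
print([len(p) for p in pieces])  # [200, 200, 180]
```

Each chunk is then embedded and stored; /v1/rag/query embeds the question, retrieves the top-K nearest chunks, and returns them as assembled context.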

12

OAuth Client Credentials Auto-Refresh

MCP Auth

MCP servers secured with OAuth 2.0 client credentials flow are fully supported. Smartflow automatically obtains, caches, and refreshes tokens on behalf of the caller. Compatible with Azure AD, Okta, Auth0, and any standards-compliant OAuth provider.

13

OAuth PKCE Per-User Browser Consent

MCP Auth

MCP servers requiring individual user consent use the PKCE flow. The user is redirected to the provider's consent screen; after authorisation, Smartflow exchanges the code and stores a user-scoped token in Redis. Tokens are scoped per user and server and expire independently.

Endpoints

  • GET /api/mcp/auth/initiate?server_id=...&user_id=...

  • GET /api/mcp/auth/callback

  • GET /.well-known/oauth-protected-resource

  • GET /.well-known/oauth-authorization-server

14

Per-Request Server Auth Header Forwarding

MCP Auth

Callers pass server-specific credentials using x-mcp-{alias}-{header-name} headers. Smartflow extracts and forwards them only to the intended server and strips them before forwarding to the end LLM. User-specific or request-specific credentials (session tokens, scoped API keys) can be forwarded to MCP servers without storing them centrally. No credential leakage between servers.
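The routing rule above amounts to splitting inbound headers by alias and withholding the matched ones from anything forwarded onward. A sketch (it assumes aliases contain no hyphens, which the real gateway may handle differently):

```python
def split_mcp_headers(headers):
    """Group x-mcp-{alias}-{header-name} headers by server alias and strip
    them from what is forwarded onward."""
    per_server, forwarded = {}, {}
    for name, value in headers.items():
        lower = name.lower()
        if lower.startswith("x-mcp-") and lower.count("-") >= 3:
            _, _, alias, header = lower.split("-", 3)
            per_server.setdefault(alias, {})[header] = value
        else:
            forwarded[name] = value
    return per_server, forwarded

per_server, forwarded = split_mcp_headers({
    "x-mcp-github-authorization": "Bearer ghp_token",
    "Content-Type": "application/json",
})
print(per_server)  # {'github': {'authorization': 'Bearer ghp_token'}}
print(forwarded)   # {'Content-Type': 'application/json'}
```

The credential reaches only the github-aliased server and never appears in the request sent to the LLM.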

15

PreCall / DuringCall / Disabled Guardrail Modes

MCP Policy

Each registered MCP server declares its guardrail mode. PreCall — compliance is scanned before the tool is called; violations block the request. DuringCall — the MCP server response is scanned; violations in the response are caught before the result reaches the LLM. Disabled — no scanning for performance-sensitive internal tools.

Unlocks

Compliance coverage for both inbound tool requests and outbound tool responses — catching violations at both ends of the MCP call without requiring changes to the MCP server itself.

16

Google A2A Open Protocol Gateway ★ UNIQUE

Agent Gateway

Smartflow implements the Google A2A open protocol, making it interoperable with any A2A-compatible agent runtime: LangGraph, Vertex AI, Azure AI Foundry, Amazon Bedrock AgentCore, Pydantic AI. Agents are registered as named profiles in Redis with their own Agent Cards and skill declarations. External agent systems connect without custom integration code.

Unlocks

Cross-framework agent collaboration. LangGraph agents talk to Pydantic AI agents through Smartflow. Task chains span services with full traceability via X-A2A-Trace-Id. Task history is persisted in Redis for replay and audit.

Endpoints

  • GET /.well-known/agent.json — gateway Agent Card

  • GET /a2a/{id}/.well-known/agent.json — per-agent card

  • POST /a2a/{id} — tasks/send, tasks/sendSubscribe, tasks/get, tasks/cancel

  • GET/POST /api/a2a/agents — agent management
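A tasks/send call is an ordinary JSON-RPC 2.0 request whose params carry an A2A message. A sketch of the payload shape, following the A2A specification's message structure; the trace header is Smartflow's own:

```python
import uuid

def a2a_send(text, trace_id=None):
    """Frame a tasks/send request in A2A's JSON-RPC shape."""
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "tasks/send",
        "params": {
            "id": str(uuid.uuid4()),  # task id
            "message": {"role": "user", "parts": [{"type": "text", "text": text}]},
        },
    }
    headers = {"Content-Type": "application/json"}
    if trace_id:
        headers["X-A2A-Trace-Id"] = trace_id  # propagated across the task chain
    return payload, headers

payload, headers = a2a_send("Summarise this quarter's incidents", trace_id="trace-123")
print(payload["method"], headers["X-A2A-Trace-Id"])
```

POSTing this to /a2a/{id} addresses the named agent profile; reusing the same trace id across downstream sends is what stitches a multi-agent task chain together.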

17

Fallback Chains with Retry and Backoff

Routing

Named fallback chains define an ordered list of provider targets. Retryable errors (429, 5xx) trigger exponential backoff before trying the next target. Non-retryable errors (4xx) move immediately to the next step. Chains are stored in Redis and manageable via API.

Unlocks

High-availability LLM routing with no single point of failure. Multi-provider redundancy configurable per model or use case without application-level changes.
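The retry/fail-over decision described above reduces to a small state function; the base delay and attempt cap below are illustrative, not Smartflow's defaults:

```python
def next_action(status, attempt, base_delay=0.5, max_attempts=3):
    """Decide what the chain does after a failed step: retryable errors
    (429, 5xx) back off exponentially; other 4xx fail over immediately."""
    retryable = status == 429 or 500 <= status <= 599
    if retryable and attempt < max_attempts:
        return ("retry", base_delay * 2 ** attempt)  # exponential backoff
    return ("next_target", 0)  # move to the next provider in the chain

print(next_action(429, attempt=0))  # ('retry', 0.5)
print(next_action(429, attempt=1))  # ('retry', 1.0)
print(next_action(404, attempt=0))  # ('next_target', 0)
```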

18

Latency-Based + Tag-Based + Budget-Capped Routing

Routing

  • Latency-based (SMARTFLOW_ROUTING_STRATEGY=latency) — a rolling p95 EMA is tracked per provider in Redis; requests route to the fastest live provider.

  • Tag-based (SMARTFLOW_ROUTING_STRATEGY=tag) — the x-smartflow-tags header is matched against per-provider capability tags in Redis.

  • Budget caps (SMARTFLOW_PROVIDER_BUDGETS=openai:100,anthropic:50) — a provider is skipped when its daily spend cap is reached; the fallback chain takes over automatically.
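The SMARTFLOW_PROVIDER_BUDGETS value is a simple provider:cap list; a sketch of parsing it into per-provider daily caps (USD):

```python
def parse_budgets(raw):
    """Parse the SMARTFLOW_PROVIDER_BUDGETS format into per-provider caps."""
    return {
        provider: float(cap)
        for provider, cap in (item.split(":") for item in raw.split(",") if item)
    }

print(parse_budgets("openai:100,anthropic:50"))
# {'openai': 100.0, 'anthropic': 50.0}
```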

19

Microsoft Entra ID SSO + Group Sync

Identity

On every SSO sign-in, Smartflow decodes the OIDC id_token, extracts Entra group memberships and App Role claims, and automatically creates or updates Smartflow teams in Redis. Users are added to teams they belong to and removed from teams they have left. App Role values map to internal roles: proxy_admin, org_admin, proxy_admin_viewer, internal_user. Access controls, budgets, and guardrail policies attached to teams take effect immediately when membership changes in Entra.

Unlocks

Zero-touch team provisioning from Entra ID. No manual group-to-team mapping. Spend limits and compliance policies follow group membership automatically.

Endpoints

  • POST /api/auth/sso/config

  • POST /api/auth/sso/signin

  • GET /api/auth/sso/teams

  • GET /api/auth/sso/users/{id}/teams
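The sync amounts to a set diff between the token's group claims and the user's current teams, plus an App Role lookup. A sketch using the standard Entra ID groups and roles claims; the App Role names in ROLE_MAP are hypothetical examples:

```python
# App Role names on the left are hypothetical; the internal roles on the
# right are Smartflow's.
ROLE_MAP = {"Proxy.Admin": "proxy_admin", "Org.Admin": "org_admin"}

def sync_teams(current_teams, id_token_claims):
    """Diff the user's Entra groups against their Smartflow teams."""
    entra_groups = set(id_token_claims.get("groups", []))
    add = entra_groups - current_teams        # joined in Entra -> add to team
    remove = current_teams - entra_groups     # left in Entra -> remove from team
    roles = [ROLE_MAP[r] for r in id_token_claims.get("roles", []) if r in ROLE_MAP]
    return add, remove, roles

add, remove, roles = sync_teams(
    {"ml-platform", "legacy-team"},
    {"groups": ["ml-platform", "security"], "roles": ["Proxy.Admin"]},
)
print(sorted(add), sorted(remove), roles)  # ['security'] ['legacy-team'] ['proxy_admin']
```

Because the diff runs on every sign-in, team-attached budgets and policies follow the Entra membership change with no manual step.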

20

Prometheus /metrics Endpoint

Observability

GET /metrics exposes text-format Prometheus metrics: per-provider daily spend, per-provider rolling p95 latency, MCP call counts and costs by server, vector store count, and version info. Scrape directly into any Prometheus + Grafana stack.

21

Standardised Response Headers

Observability

Every response from the proxy carries a complete observability header set:

  • x-smartflow-call-id — unique trace ID

  • x-smartflow-response-cost — USD cost

  • x-smartflow-cache-hit — true/false

  • x-smartflow-duration-ms — end-to-end latency

  • x-smartflow-provider — which provider served the response

22

Alerting Webhooks — Slack, Teams, Discord

Alerting

Fire-and-forget webhook alerts on: provider budget threshold breach, provider failure spike, and slow/hanging API calls. Configure via environment variables: SLACK_WEBHOOK_URL, TEAMS_WEBHOOK_URL, DISCORD_WEBHOOK_URL. Alerts are non-blocking and do not add latency to the request path.

23

VAS Log Audit Trail + Compliance Dashboard

Compliance

Every proxied request writes a structured VAS log entry capturing: user identity, model, provider, token counts, cost estimate, cache outcome, end-to-end latency, matched policy names, and guardrail decisions. Logs are persisted in Redis (hot) and MongoDB (archive). The compliance dashboard surfaces every entry with named users, full Q&A replay in a modal panel, and filterable policy-match views.

Unlocks

Complete per-user, per-request audit trail for compliance reporting. Every LLM call, every MCP tool invocation, and every A2A task is logged with the identity of who triggered it and which policies applied.

Complete Feature Summary

1. OpenAI-compatible LLM proxy with provider auto-routing [LLM Proxy]

2. Cursor IDE / Claude Code passthrough [LLM Proxy]

3. Virtual keys with spend budgets (daily / weekly / monthly) [Key Mgmt]

4. MetaCache — semantic similarity caching [MetaCache]

5. Per-request cache controls (no-cache, no-store, ttl, namespace) [Caching]

6. AI Policy Engine (Maestro) — learning-based guardrails [Policy]

7. Policy groups with parent inheritance + tag-wildcard scoping [Policy]

8. Guardrail policy response headers on every request [Policy]

9. MCP HTTP / SSE / STDIO transports [MCP]

10. Per-server tool allow/deny lists + parameter allow-lists [MCP]

11. Semantic tool filtering via embedding index [MCP]

12. Built-in vector stores + RAG pipeline [RAG]

13. MCP server aliases + per-alias routing [MCP]

14. OAuth Client Credentials auto-refresh for MCP [MCP Auth]

15. OAuth PKCE per-user browser consent [MCP Auth]

16. Per-request auth header forwarding to MCP servers [MCP Auth]

17. Public internet IP gating per MCP server [MCP]

18. OAuth .well-known discovery endpoints [MCP Auth]

19. MCP guardrail modes (PreCall / DuringCall / Disabled) [MCP Policy]

20. MCP cost tracking by server / user / tool [MCP Obs]

21. A2A agent gateway (Google A2A protocol) [A2A]

22. Agent Cards + task streaming via SSE [A2A]

23. Cross-agent tracing (X-A2A-Trace-Id) [A2A]

24. Fallback chains with per-step retry and exponential backoff [Routing]

25. Latency-based + tag-based provider routing [Routing]

26. Per-provider daily budget caps with auto-failover [Routing]

27. Microsoft Entra ID SSO + zero-touch group sync [Identity]

28. Prometheus /metrics endpoint [Observability]

29. Standardised response headers (cost, trace, cache, latency, provider) [Observability]

30. Slack / Teams / Discord alerting webhooks [Alerting]

31. VAS log audit trail with compliance dashboard + Q&A replay [Compliance]