Smartflow API Reference
Platform v3.0 • SDK v0.3.0 • February 2026
Smartflow is an enterprise AI gateway that proxies requests to multiple LLM providers, enforces compliance policy, caches semantically, and orchestrates MCP tools and A2A agents. This document covers every API surface the platform exposes: the LLM proxy, management APIs, MCP gateway, A2A gateway, vector store, RAG pipeline, and the Python SDK.
Architecture Overview
Smartflow runs as five cooperating services:
smartflow (proxy), port 7775: LLM proxy, MCP gateway, A2A gateway, semantic caching, pre/post-call compliance hooks
api_server (management), port 7778: virtual keys, routing chains, audit logs, analytics
compliance_api_server, port 7777: ML content scanning, PII redaction, adaptive learning, intelligent scan
policy_perfect_api, port 7782: policy and preset CRUD, AI document-to-policy generation, assignment management
smartflow-hybrid-bridge, port 3500: cross-datacenter Redis log aggregation
All five services share a single Redis instance for state: routing tables, semantic cache, VAS logs, provider latency metrics, virtual key budgets, and the MCP server registry. The Policy Perfect API additionally requires PostgreSQL for durable policy and preset storage. In production the proxy sits behind a TLS-terminating reverse proxy (Caddy or nginx); the management, compliance, and policy APIs are backend surfaces and should not be exposed directly to the public internet.
Authentication
Virtual Keys
The primary credential. Issue sk-sf-{48-hex} tokens through the management API. Each key carries optional spend limits and model restrictions.
Authorization: Bearer sk-sf-a1b2c3...
Provider API Keys
Stored server-side in Redis. Clients never send raw provider credentials. The proxy resolves the correct key from the key store when forwarding requests to providers.
Anthropic Native Passthrough
For /anthropic/* routes, the proxy automatically injects the configured ANTHROPIC_API_KEY. Clients do not need to supply an x-api-key header.
JWT (Application Layer)
The SafeChat product and dashboard use smartflow_token cookie-based JWT for browser sessions. JWT validation occurs at the application layer, not in the proxy.
LLM Proxy Endpoints
All proxy endpoints are on port 7775 by default.
/v1/chat/completions
POST/v1/chat/completions
OpenAI-compatible chat completions. Accepts any OpenAI-format request body. Provider and model are resolved from the model name or an explicit prefix.
Model-prefix routing:
gpt-, o1-, o3-, chatgpt-: OpenAI
claude-*: Anthropic
gemini-*: Google Gemini
grok-*: xAI
mistral-, mixtral-: Mistral AI
command-, c4ai-: Cohere
llama-, groq/: Groq
openrouter/*: OpenRouter
ollama/*: Local Ollama
azure/*: Azure OpenAI
No prefix is required for the primary supported providers; the model-name heuristic detects gemini-*, claude-*, gpt-*, etc. automatically. An explicit provider/model prefix always takes precedence.
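A minimal request sketch using only the Python standard library (the base URL, virtual key, and model name are placeholders; the Python SDK documented below wraps the same call):

```python
import json
import urllib.request

BASE = "http://localhost:7775"  # assumed local proxy; adjust to your deployment
VKEY = "sk-sf-..."              # placeholder virtual key

# No provider prefix needed: "claude-*" is routed to Anthropic by the heuristic.
body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "messages": [{"role": "user", "content": "Summarize MetaCache in one sentence."}],
}

def send(payload: dict) -> dict:
    """POST an OpenAI-format body to the proxy and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {VKEY}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# reply = send(body)  # requires a running proxy
```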
Multimodal — Image
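A sketch of the OpenAI content-array format for mixed text and image input; the model name and image URL are illustrative:

```python
# One user message carrying both a text part and an image part.
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}
body = {"model": "gpt-4o", "messages": [image_message]}
# POST body to /v1/chat/completions as in the example above.
```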
Multimodal — Audio (gpt-4o-audio-preview)
Response
/anthropic/v1/messages
POST/anthropic/v1/messages
Native Anthropic Messages API passthrough. The proxy injects the API key from the server key store. The full Anthropic request and response format is preserved with no translation. Also accessible as /cursor/v1/messages for Cursor IDE passthrough. The [1m] suffix that Claude Code appends to model names is stripped automatically.
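A sketch of a native passthrough call. The anthropic-version header follows the upstream Anthropic API (which the proxy preserves without translation); the model name is illustrative, and no x-api-key is needed:

```python
import json
import urllib.request

BASE = "http://localhost:7775"  # assumed local proxy

body = {
    "model": "claude-sonnet-4-20250514",  # a "[1m]" suffix, if present, is stripped by the proxy
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}

def send() -> dict:
    """POST a native Anthropic Messages body; the proxy injects the server-side API key."""
    req = urllib.request.Request(
        f"{BASE}/anthropic/v1/messages",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "anthropic-version": "2023-06-01"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

# send()  # requires a running proxy
```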
Multimodal — Image (native Anthropic)
Multimodal — PDF Document (native Anthropic)
/v1/embeddings
POST/v1/embeddings
Generate vector embeddings. Supports multi-provider routing via model prefix.
Response follows the OpenAI embeddings format with data[].embedding float arrays.
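A sketch of an embeddings request body (the model name is illustrative; prefix routing applies here as well):

```python
body = {
    "model": "text-embedding-3-small",  # illustrative; routed by model-name prefix
    "input": [
        "MetaCache stores semantically compressed responses.",
        "Fallback chains retry on 429 and 5xx errors.",
    ],
}
# POST body to /v1/embeddings with Authorization: Bearer <virtual key>;
# the response carries data[].embedding float arrays in the OpenAI format.
```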
/v1/audio/transcriptions
POST/v1/audio/transcriptions
Transcribe audio. Multipart form upload. Routes to OpenAI Whisper by default. Use groq/whisper-large-v3 for Groq, deepgram/nova-2 for Deepgram.
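A sketch of the multipart form fields, assuming the OpenAI parameter names; the audio itself is sent as the file part of the multipart body:

```python
# Form fields for POST /v1/audio/transcriptions (parameter names assumed to
# follow the OpenAI transcription API, which this endpoint mirrors).
fields = {
    "model": "whisper-1",       # or "groq/whisper-large-v3", "deepgram/nova-2"
    "response_format": "json",  # assumed optional parameter
}
# With httpx (already a dependency of the Python SDK), the upload would look like:
#   httpx.post(f"{BASE}/v1/audio/transcriptions", data=fields,
#              files={"file": open("meeting.wav", "rb")},
#              headers={"Authorization": f"Bearer {VKEY}"})
```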
/v1/audio/speech
POST/v1/audio/speech
Text-to-speech synthesis. Returns raw audio bytes.
/v1/images/generations
POST/v1/images/generations
/v1/rerank
POST/v1/rerank
Document reranking. Compatible with Cohere's rerank API.
/v1/models
GET/v1/models
List available models across all enabled providers.
/v1/completions
POST/v1/completions
Legacy text completions. Forwarded to the configured provider.
Routing and Provider Selection
Automatic Model-Name Heuristic
For requests to /v1/chat/completions with no explicit provider prefix, the proxy infers the provider from the model name. An explicit provider/model prefix always takes precedence over heuristic detection.
gpt-, o1-, o3-, o4-, chatgpt-, whisper-, tts-, dall-e-: OpenAI
claude-*: Anthropic
gemini-*: Google Gemini
grok-*: xAI
mistral-, mixtral-: Mistral
command-*: Cohere
llama-*: Groq
Routing Strategies
Configured per fallback chain via the management API:
round_robin: distribute requests across targets in order
weighted: traffic proportional to assigned weights
least_connections: send to the provider with the fewest in-flight requests
random: random selection among healthy providers
priority: try targets in order; fall back only on failure
latency: route to the provider with the lowest rolling-EMA p95 latency (tracked in Redis)
cost: route to the provider with the lowest per-token cost; skip providers over their daily budget cap
Fallback Chains
Named ordered provider lists with retry logic. Configured at POST /api/routing/fallback-chains.
On 429 or 5xx the proxy retries the next target with exponential backoff. Non-retryable 4xx errors bypass retry. Providers that have exceeded their daily budget cap are excluded from selection automatically.
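A hypothetical chain definition for POST /api/routing/fallback-chains; the field names (name, strategy, targets) are illustrative, not a confirmed schema, and the model names are placeholders:

```python
# Illustrative request body -- verify field names against your deployment's schema.
chain = {
    "name": "prod-chat",
    "strategy": "priority",  # one of the routing strategies listed above
    "targets": [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "groq/llama-3.3-70b-versatile",
    ],
}
# POST chain to {management host}:7778/api/routing/fallback-chains.
```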
MetaCache — Semantic Caching
The MetaCache intercepts every /v1/chat/completions request before any provider call is made.
How It Works
The incoming query is embedded and its cosine similarity is computed against stored request embeddings. If similarity exceeds the configured threshold, the cached response is returned. Otherwise the request is forwarded to the provider and the response is stored. Responses are semantically compressed before storage to reduce Redis footprint while preserving meaning.
Three tiers operate in sequence: L1 in-process memory, L2 Redis semantic similarity, L3 Redis exact match. Every cache lookup traverses all three before forwarding.
Per-Request Cache Controls
Cache-Control: no-cache (bypass cache read; always query the provider)
Cache-Control: no-store (bypass cache write; do not cache this response)
x-smartflow-cache-ttl: 3600 (override the TTL in seconds for this response)
x-smartflow-cache-namespace: (scope the cache to a logical partition)
Cached responses return x-smartflow-cache-hit: true and x-smartflow-cache-key for client-side correlation.
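For example, per-request headers to pin a TTL and namespace (the key and values are illustrative):

```python
headers = {
    "Authorization": "Bearer sk-sf-...",           # placeholder virtual key
    "x-smartflow-cache-ttl": "120",                # keep this response cached for 2 minutes
    "x-smartflow-cache-namespace": "support-bot",  # illustrative partition name
}
# On the response, inspect x-smartflow-cache-hit and x-smartflow-cache-key to see
# whether MetaCache served it; send Cache-Control: no-cache to force a provider call.
```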
MCP Gateway
Smartflow implements the Model Context Protocol (MCP) gateway. Register external MCP servers and invoke their tools through the proxy with shared authentication, budgeting, and audit logging.
Server Registry
GET/api/mcp/servers
List registered MCP servers.
POST/api/mcp/servers
Register an MCP server.
Tool Invocation
POST/{server_id}/mcp/
POST/mcp/v1/{server_id}/tools/call
The proxy authenticates the request, applies per-tool access controls, records cost, and forwards to the server.
Catalog and Search
GET/api/mcp/catalog
Browse the tool catalog across all registered servers.
GET/api/mcp/tools/search?q={query}&k={n}
Semantic search over the tool catalog. Returns the top k tools matching the natural-language query.
GET/api/mcp/tools/index
Full indexed tool list with embedding metadata.
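A sketch of a catalog search call (the base URL is assumed; the response shape depends on your deployment):

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:7775"  # assumed local proxy

def search_tools(query: str, k: int = 5) -> dict:
    """Semantic search over the MCP tool catalog; returns the parsed JSON result."""
    qs = urllib.parse.urlencode({"q": query, "k": k})
    with urllib.request.urlopen(f"{BASE}/api/mcp/tools/search?{qs}") as r:
        return json.load(r)

# search_tools("create a calendar event")  # requires a running proxy
```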
Access Control
Per-server configuration fields for access control:
allowed_tools (string[]): if non-empty, only these tools may be called
disallowed_tools (string[]): these tools are always blocked
allowed_params (object): per-tool parameter allowlists
guardrail_mode (string): "strict" blocks on policy violation; "log" flags and continues
available_on_public_internet (bool): if false, only accessible from approved network segments
Access Request Flow
GET/api/mcp/catalog/requests
POST/api/mcp/catalog/requests
POST/api/mcp/catalog/requests/{id}/approve
POST/api/mcp/catalog/requests/{id}/deny
OAuth Flow
GET/api/mcp/auth/initiate?server_id={id}
GET/api/mcp/auth/callback
GET/api/mcp/auth/tokens
Usage and Logs
GET/api/mcp/usage
Aggregated cost and call counts per server and tool.
GET/api/mcp/logs
Per-invocation audit logs.
API Generation from OpenAPI Spec
POST/api/mcp/generate
Auto-generate an MCP server adapter from an OpenAPI specification.
A2A Agent Gateway
Smartflow implements the A2A (Agent-to-Agent) protocol for inter-agent communication. Register external agents and invoke them with full logging and routing.
Agent Card
GET/a2a/{agent_id}/.well-known/agent.json
Returns the agent's machine-readable capability card: name, capabilities, supported task types, and authentication requirements.
Task Invocation
POST/a2a/{agent_id}
Send a task to a registered agent. The proxy forwards the request, captures the response, and logs both.
Supports synchronous JSON responses and SSE streaming for long-running tasks. Include x-a2a-trace-id to correlate task invocations across agents in distributed workflows.
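A sketch of a task invocation with trace propagation; the base URL and the task body shape are assumptions (consult the agent's capability card for the expected format):

```python
import json
import urllib.request
import uuid

BASE = "http://localhost:7775"  # assumed local proxy
task = {"message": "Summarize open support tickets"}  # body shape depends on the agent's card
trace_id = str(uuid.uuid4())  # propagate the same ID across agents in a workflow

def invoke(agent_id: str) -> dict:
    """Send a task to a registered agent and return the synchronous JSON response."""
    req = urllib.request.Request(
        f"{BASE}/a2a/{agent_id}",
        data=json.dumps(task).encode(),
        headers={"Content-Type": "application/json", "x-a2a-trace-id": trace_id},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)
```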
Vector Store API
Built-in vector store backed by Redis. No external vector database required. All endpoints are on the proxy at port 7775.
POST/v1/vector_stores
Create a vector store.
Response includes id, name, description, file_count, created_at.
GET/v1/vector_stores
List all vector stores.
GET/v1/vector_stores/{id}
Get a specific vector store.
DELETE/v1/vector_stores/{id}
Delete a vector store and all its files.
POST/v1/vector_stores/{id}/files
Add a text document. The document is chunked and embedded automatically.
GET/v1/vector_stores/{id}/files
List files in a vector store.
POST/v1/vector_stores/{id}/search
Semantic search over stored documents.
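Sketch request bodies for the create and search endpoints; the create fields mirror the response fields listed above, while the search field names are assumptions:

```python
create_body = {"name": "support-docs", "description": "Product manuals and FAQs"}
search_body = {"query": "how do I reset my password", "max_results": 3}  # field names assumed

# POST {BASE}/v1/vector_stores with create_body
#   -> response includes id, name, description, file_count, created_at
# POST {BASE}/v1/vector_stores/{id}/search with search_body
#   -> matching chunks ranked by semantic similarity
```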
RAG Pipeline API
Built on top of the vector store. Ingest documents with automatic chunking, then retrieve context for LLM augmentation.
POST/v1/rag/ingest
Chunk a document, embed each chunk, and store in a named vector store.
content (string, required): full document text
vector_store_id (string, required): target store (must already exist)
filename (string, default ""): display name for the file
chunk_size (int, default 512): characters per chunk
chunk_overlap (int, default 64): overlap between consecutive chunks
metadata (object, default {}): arbitrary key-value metadata
Response: { "store_id", "file_id", "chunks_created", "status": "completed" }
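An ingest body built from the documented fields (the store ID and document text are placeholders):

```python
ingest = {
    "content": "Refunds are processed within 5 business days. " * 40,  # full document text
    "vector_store_id": "vs_123",  # placeholder; the store must already exist
    "filename": "refund-policy.txt",
    "chunk_size": 512,
    "chunk_overlap": 64,
    "metadata": {"team": "support"},
}
# POST ingest to /v1/rag/ingest
#   -> {"store_id", "file_id", "chunks_created", "status": "completed"}
```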
POST/v1/rag/query
Embed a question, retrieve matching chunks, and optionally assemble a context string for injection into an LLM system prompt.
query (required): natural-language question
vector_store_id (required): store to search
max_results (default 5): maximum chunks to return
score_threshold (default 0.0): minimum cosine similarity (0 = return all)
include_context (default true): concatenate chunks into a context string field
Response includes chunks[], context (concatenated string for prompt injection), and total.
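A query body built from the documented fields, plus a sketch of injecting the returned context string into a chat prompt:

```python
query_body = {
    "query": "How long do refunds take?",
    "vector_store_id": "vs_123",  # placeholder store ID
    "max_results": 5,
    "score_threshold": 0.25,
    "include_context": True,
}
# POST query_body to /v1/rag/query -> {"chunks": [...], "context": "...", "total": n}

def augment(context: str, question: str) -> list:
    """Build an OpenAI-format message list that grounds the answer in RAG context."""
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]
```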
Management API
Management API runs on port 7778.
Virtual Keys
GET/api/enterprise/vkeys
List all virtual keys.
POST/api/enterprise/vkeys
Create a virtual key.
DELETE/api/enterprise/vkeys/{key}
Revoke a virtual key.
Routing API
GET/api/routing/fallback-chains
POST/api/routing/fallback-chains
DELETE/api/routing/fallback-chains/{name}
GET/api/routing/status
Current routing state: active provider, fallback chain, last failure.
POST/api/routing/force-provider
Audit Logs (VAS)
GET/api/vas/logs?limit=50&provider=openai
Retrieve VAS audit logs. Every request proxied through Smartflow produces a log entry including: timestamp, provider, model, prompt tokens, completion tokens, cost in USD, cache hit flag, compliance flags, user context, and latency.
GET/api/vas/logs/hybrid
Retrieve logs aggregated across multiple Smartflow instances via the hybrid bridge.
Analytics
GET/api/analytics?period=7d
Usage analytics: request volume, cost by provider, cache hit rate, top models, top users.
Compliance API
The Compliance API runs on port 7777. It provides ML-based content scanning, PII detection and redaction, and an adaptive learning loop that improves over time based on human feedback. The proxy integrates with this service on every request when pre/post-call scanning is enabled.
POST/v1/compliance/scan
Rule-based compliance scan against configured policies.
POST/v1/compliance/intelligent-scan
Maestro ML policy engine. Evaluates intent against your organization's policy documents — not keyword matching.
Response includes risk_score (0–1), risk_level, recommended_action (Allow / Flag / Block), violations, explanation.
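A sketch of acting on the scan verdict; the request field name (content) is an assumption, and the action mapping is illustrative:

```python
scan_body = {"content": "Please email the patient list to my personal address."}  # field name assumed
# POST scan_body to {compliance host}:7777/v1/compliance/intelligent-scan

def decide(result: dict) -> str:
    """Map recommended_action (Allow / Flag / Block) to a proxy-side decision (sketch)."""
    action = result.get("recommended_action", "Allow")
    return {"Allow": "forward", "Flag": "forward+log", "Block": "reject"}[action]
```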
POST/v1/compliance/feedback
Submit a correction to improve the ML model's future predictions.
POST/v1/compliance/redact
Detect and redact PII from content. Returns the redacted string.
GET/v1/compliance/learning/status/{user_id}
GET/v1/compliance/learning/summary
GET/v1/compliance/ml/stats
GET/v1/compliance/org/baseline/{org_id}
Policy Perfect API
The Policy Perfect API runs on port 7782. It manages the organization's compliance policy library — the source documents the Maestro ML engine reads when evaluating requests. Backed by PostgreSQL for durable policy storage.
GET/health
Liveness check for the Policy Perfect service.
GET/api/stats
Aggregate counts for the current state of the policy library.
Policies
Policies are named compliance rules attached to scopes. The Maestro engine evaluates all active policies on every request.
Policy types:
compliance: regulatory rules (HIPAA, GDPR, SOC 2, PCI-DSS, etc.)
brand: brand voice and communication standards
format: output format constraints
role: role-based access and behavior restrictions
industry: industry-specific usage rules
legal: legal department rules and disclaimers
security: security guardrails and data-handling policies
GET/api/policies
List all active policies.
POST/api/policies
Create a policy.
name (string): policy display name
policy_type (string): one of the seven policy types above
content (string): policy text read by the Maestro ML engine
priority (int): evaluation order (0-100); higher values evaluated first
applicable_providers (string[]): providers this policy applies to; ["all"] for universal
applicable_models (string[]): models this policy applies to; ["all"] for universal
regulatory_framework (string): HIPAA, GDPR, SOC2, PCI-DSS, etc.
severity (string): critical, high, medium, or low
metadata (object): Layer 2/3 targeting: source_ips, ad_groups, departments, applications
GET/api/policies/{id}
Get a policy by ID.
PUT/api/policies/{id}
Update a policy. All fields optional; only supplied fields are changed. Set "is_active": false to deactivate without deleting.
DELETE/api/policies/{id}
Delete a policy permanently.
Presets
Presets are named, ordered collections of policies. Assign a preset to a team, role, or virtual key instead of managing individual policies per scope.
GET/api/presets
List all presets. Each entry includes the preset metadata and its ordered policy list.
POST/api/presets
Create a preset.
Policy order in policy_ids determines evaluation priority.
GET/api/presets/{id}
Get a preset and its full ordered policy list.
AI Document-to-Policy Generation
Upload a compliance document (PDF, DOCX, TXT — up to 50 MB). The service uses GPT-4o to extract structured policy suggestions automatically. Processing is asynchronous; poll for progress with the returned job ID.
POST/api/policies/generate-from-document
Multipart form upload. Field name: file.
Immediate response:
GET/api/documents/job/{job_id}/progress
Poll for processing status. Status values: pending, processing, completed, failed.
GET/api/documents/job/{job_id}/results
Retrieve suggested policies once status is completed. Each suggestion includes a confidence score (0–1). Review suggestions and create live policies via POST /api/policies.
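A polling sketch against the documented status values (the port is the documented Policy Perfect port; the progress response field names are assumed):

```python
import json
import time
import urllib.request

BASE = "http://localhost:7782"  # Policy Perfect service (assumed host)

def is_terminal(status: str) -> bool:
    """True once a job has finished, per the documented status values."""
    return status in {"completed", "failed"}

def wait_for_job(job_id: str, interval: float = 2.0) -> dict:
    """Poll job progress until completed/failed, then return the final record."""
    while True:
        with urllib.request.urlopen(f"{BASE}/api/documents/job/{job_id}/progress") as r:
            job = json.load(r)
        if is_terminal(job.get("status", "")):  # "status" field name assumed
            return job
        time.sleep(interval)

# Once completed, fetch suggestions from /api/documents/job/{job_id}/results.
```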
Alerting
Smartflow fires HTTP POST webhooks when threshold events occur. Configuration is via environment variables on the proxy server.
BudgetThreshold: provider or virtual-key spend exceeds the configured cap
ProviderFailure: error rate for a provider exceeds the spike threshold
SlowRequest: request latency exceeds the slow-request threshold
Custom: programmatic alerts from the management API
Configure any combination of webhook destinations:
Alerts are fire-and-forget — they do not block the request that triggered them.
Observability
GET/health/liveliness
Returns 200 OK with {"status":"ok"} when the proxy process is running.
GET/health/readiness
Returns 200 OK when Redis is connected and providers are reachable.
GET/metrics
Prometheus-compatible metrics. Exposed metrics:
smartflow_requests_total: request counter by provider, model, status
smartflow_request_latency_seconds: request latency histogram
smartflow_cache_hits_total: cache hit counter by tier (L1/L2/L3)
smartflow_cache_misses_total: cache miss counter
smartflow_provider_errors_total: upstream error counter by provider and status
smartflow_tokens_total: token usage by provider and direction
smartflow_cost_usd_total: cumulative cost by provider
smartflow_mcp_calls_total: MCP tool invocation counter by server and tool
smartflow_vkey_spend_usd: per-virtual-key spend gauge
Python SDK
Installation
Requirements: Python 3.10+, httpx >= 0.24
SmartflowClient
The primary async client.
base_url (str, required): proxy URL, e.g. "https://smartflow.example.com"
api_key (str, default None): virtual key sent as Authorization: Bearer
timeout (float, default 30.0): request timeout in seconds
management_port (int, default 7778): Management API port
compliance_port (int, default 7777): Compliance API port
bridge_port (int, default 3500): hybrid bridge port
Core AI Methods
chat()
Send a message, receive the reply as a plain string.
chat_completions()
Full OpenAI-compatible completions. Returns an AIResponse object.
stream_chat()
Async generator that yields text delta strings as they stream.
embeddings()
rerank()
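A hypothetical end-to-end sketch of the async client; the import path and exact signatures are assumptions based on this reference:

```python
import asyncio

async def demo() -> str:
    # Import path and constructor/method signatures are assumed from this document.
    from smartflow import SmartflowClient

    client = SmartflowClient(
        base_url="https://smartflow.example.com",  # your proxy URL
        api_key="sk-sf-...",                       # placeholder virtual key
    )
    reply = await client.chat("Summarize our refund policy in one sentence.")
    return reply  # chat() returns the reply as a plain string

# asyncio.run(demo())  # requires a running proxy and the SDK package installed
```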
Audio and Image Methods
audio_transcription()
text_to_speech()
image_generation()
Compliance Methods
check_compliance()
intelligent_scan()
redact_pii()
submit_compliance_feedback()
Monitoring Methods
get_cache_stats()
health_comprehensive()
Other monitoring methods
health(): Dict; basic liveness check
get_provider_health(): List[ProviderHealth]; latency and success rate per provider
get_logs(limit, provider): List[VASLog]; audit log entries
get_analytics(period): Dict; usage and cost analytics
get_routing_status(): Dict; current routing state
force_provider(provider, duration_seconds): Dict; force routing for a duration
SmartflowAgent
Stateful agent with conversation memory and per-message compliance scanning.
chat(message, scan_input=True, scan_output=True): send a message; raises ComplianceError if blocked
clear_history(): reset the conversation, preserving the system prompt
get_history(): return a copy of the message history
message_count: number of messages in history
SmartflowWorkflow
Chain AI operations with branching and error handling.
"chat"
prompt, model, temperature
Chat completion; {input} / {output} are template variables
"compliance_check"
content
Compliance scan
"condition"
field, cases, default
Branch on a context value
SyncSmartflowClient
Synchronous wrapper for scripts and Jupyter notebooks. Every async method is available without await.
In Jupyter with an existing event loop: pip install nest_asyncio then nest_asyncio.apply().
OpenAI Drop-in Replacement
Any code targeting the OpenAI API works by pointing base_url at Smartflow. MetaCache, compliance scanning, VAS logging, and routing apply transparently.
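For example, with the official openai Python package (shown in comments; the package itself is not required by Smartflow, and the port is the assumed default):

```python
SMARTFLOW_BASE_URL = "http://localhost:7775/v1"  # point any OpenAI client here
SMARTFLOW_API_KEY = "sk-sf-..."                  # virtual key placeholder

# With the official openai package (illustrative):
#   from openai import OpenAI
#   client = OpenAI(base_url=SMARTFLOW_BASE_URL, api_key=SMARTFLOW_API_KEY)
#   r = client.chat.completions.create(
#       model="gpt-4o",
#       messages=[{"role": "user", "content": "hi"}],
#   )
#   print(r.choices[0].message.content)
```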
Response Types
AIResponse
content (str): first choice text
choices (list): full choices array
usage (Usage): token usage
model (str): model used
id (str): response ID
CacheStats
hit_rate (float)
total_requests (int)
tokens_saved (int)
cost_saved_usd (float)
l1_hits (int)
l2_hits (int)
l3_hits (int)
ComplianceResult
has_violations (bool)
compliance_score (float)
violations (list[str])
pii_detected (list[str])
risk_level (str): "low" / "medium" / "high" / "critical"
recommendations (list[str])
redacted_content (str | None)
IntelligentScanResult
risk_score (float): 0.0 to 1.0
risk_level (str)
recommended_action (str): "Allow" / "Flag" / "Block"
violations (list)
explanation (str)
Response Headers
Every proxied response includes these headers:
x-smartflow-provider: provider that served the request
x-smartflow-model: actual model used
x-smartflow-request-id: unique request ID for log correlation
x-smartflow-cache-hit: true if the response was served from MetaCache
x-smartflow-cache-key: cache key when cache-hit is true
x-smartflow-latency-ms: total proxy latency in milliseconds
x-smartflow-cost-usd: estimated cost in USD for this request
x-smartflow-compliance-score: compliance score (0-1) when pre-call scan is enabled
Environment Variables
Set on the Smartflow server; not used in client code.
Provider Keys
OPENAI_API_KEY: OpenAI
ANTHROPIC_API_KEY: Anthropic
GEMINI_API_KEY: Google Gemini
XAI_API_KEY: xAI / Grok
OPENROUTER_API_KEY: OpenRouter
AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION: Azure OpenAI
MISTRAL_API_KEY: Mistral AI
COHERE_API_KEY: Cohere
GROQ_API_KEY: Groq
DEEPGRAM_API_KEY: Deepgram
FIREWORKS_API_KEY: Fireworks AI
NVIDIA_NIM_API_KEY, NVIDIA_NIM_API_BASE: NVIDIA NIM
HUGGINGFACE_API_KEY, HUGGINGFACE_API_BASE: HuggingFace
TOGETHER_API_KEY: Together AI
PERPLEXITY_API_KEY: Perplexity AI
REPLICATE_API_KEY: Replicate
VERTEXAI_API_KEY, VERTEXAI_PROJECT, VERTEXAI_LOCATION: Vertex AI
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION: AWS Bedrock
NOVITA_API_KEY: Novita AI
VERCEL_AI_GATEWAY_API_KEY: Vercel AI Gateway
Feature Flags and Ports
GEMINI_ENABLED (default false): enable Google Gemini in intelligent routing
SMARTFLOW_ALERTS_ENABLED (default true): enable webhook alerting
SLACK_WEBHOOK_URL: Slack incoming webhook
TEAMS_WEBHOOK_URL: Microsoft Teams webhook
DISCORD_WEBHOOK_URL: Discord webhook
PROXY_PORT (default 7775): LLM proxy port
MANAGEMENT_PORT (default 7778): Management API port
COMPLIANCE_PORT (default 7777): Compliance API port
BRIDGE_PORT (default 3500): hybrid bridge port
Error Reference
HTTP Status Codes
400: malformed request; check the body format
401: missing or invalid API key
402: virtual key budget exceeded
403: request blocked by compliance policy
404: resource or route not found
429: rate limit exceeded (RPM or TPM)
500: proxy internal error
502: upstream provider returned an error
503: no providers available; fallback chain exhausted
SDK Exceptions
SmartflowError: base class for all SDK errors
ConnectionError: cannot connect to the proxy
AuthenticationError: 401; invalid or missing key
RateLimitError: 429; rate limit hit
ComplianceError: 403; request blocked by policy
ProviderError: upstream provider error
TimeoutError: request timeout
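A sketch of handling these exceptions; the exception names come from the list above, but the import paths are assumptions:

```python
def call_with_handling() -> str:
    # Import paths are assumed; exception names are from this reference.
    from smartflow import SyncSmartflowClient
    from smartflow.exceptions import ComplianceError, RateLimitError, SmartflowError

    client = SyncSmartflowClient(base_url="http://localhost:7775", api_key="sk-sf-...")
    try:
        return client.chat("hello")
    except ComplianceError:
        return "[blocked by policy]"   # 403: do not retry
    except RateLimitError:
        raise                          # 429: let a retry/backoff layer handle it
    except SmartflowError as e:
        return f"[smartflow error: {e}]"  # catch-all for remaining SDK errors
```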
Changelog
v3.0 (proxy) / v0.3.0 (SDK) — 2026
New in the proxy:
Vector Store API (/v1/vector_stores/*): Redis-backed, no external vector database required
RAG Pipeline API (/v1/rag/ingest, /v1/rag/query): document chunking, embedding, context retrieval
A2A Agent Gateway (/a2a/*): A2A protocol for inter-agent orchestration
Webhook alerting: Slack, Teams, Discord for budget, failure, and latency events
Model-name heuristic routing: claude-*, gemini-*, gpt-* detected automatically
Anthropic API key injection for /anthropic/* passthrough
Cost-based and latency-based routing strategies
Prometheus metrics endpoint (/metrics)
MCP access control: allowed_tools, disallowed_tools, guardrail_mode per server
MCP cost tracking via Redis HINCRBYFLOAT
New in the SDK:
image_generation(): multi-provider image generation
audio_transcription(): multipart audio; Groq/Deepgram/Fireworks routing
text_to_speech(): returns raw audio bytes
stream_chat(): async SSE iterator
rerank(): Cohere-compatible document reranking
Extended embeddings() with encoding_format, dimensions, input_type