Conceptual AI Agents Universe — System Design Document
PART 1: HIGH-LEVEL DESIGN
1. Vision & Problem Statement
Organizations need AI automation across multiple business functions — code review, customer support, QA, documentation, data analysis, and more. Today, each of these is a siloed solution with its own interface, deployment, and learning curve.
AI Agents Universe solves this by providing a single platform where each AI capability is a self-contained, subscription-based plugin. One interface, one orchestrator, unlimited capabilities — added without touching existing code.
Core promise: A customer subscribes to the plugins they need. A developer publishes a plugin. The universe expands.
2. Technology Stack
All technology decisions are fixed here. These are not suggestions — they are the choices the platform is built on.
| Layer | Technology | Rationale |
|---|---|---|
| API framework | FastAPI + Uvicorn | Async-native, auto-generates OpenAPI docs, strong AI/ML ecosystem |
| Agent framework | LangGraph | Stateful graph-based agents, explicit control flow, production-grade for multi-step workflows |
| Default LLM | claude-sonnet-4-6 | Strong reasoning, good cost/performance balance for agentic tasks |
| Embedding model | text-embedding-3-small (OpenAI) | Hosted, fast, ~$0.02/1M tokens, sufficient for capability matching |
| Primary database | PostgreSQL 16 + SQLAlchemy 2 (async) + Alembic | All operational data; Alembic manages schema migrations |
| Vector store | pgvector extension (same PostgreSQL instance) | RAG for org knowledge; no separate service up to ~1M vectors per tenant |
| Cache | Redis 7 | Auth cache, plugin registry, circuit breakers, route cache, rate limits, Redis Streams queues |
| Async queues | Redis Streams | Task execution, background summarisation, webhook delivery — reuses Redis, no extra service |
| Object storage | MinIO (self-hosted, S3-compatible) | Artifacts and reports; cloud-agnostic |
| Container registry | Zot (self-hosted OCI registry) | Plugin container images; MinIO is S3, not an OCI registry |
| Secrets (v1) | Kubernetes Secrets + pgcrypto | Platform credentials in K8s Secrets; tenant tokens AES-256 encrypted in PostgreSQL |
| Rate limiting | slowapi (FastAPI middleware) | Sliding window; plugs into FastAPI route decorators |
| Metrics | Prometheus + Grafana | Kubernetes-native metrics and dashboards |
| Logs | Loki + Grafana | Structured log aggregation alongside metrics in Grafana |
| Deployment | Self-hosted Kubernetes (cloud-agnostic) | Plain Deployments + PVCs; no operators needed at v1 single-replica scale |
| Local dev | Docker Compose | Full local stack: app, PostgreSQL, Redis, MinIO, Zot |
v1 scope: Single replicas for all services (no explicit HA). SSE streaming deferred to v2. Third-party plugins are invite-only at launch. Vault introduced in v2 when untrusted third-party plugins require isolated secret management and audit trails.
What was simplified vs. earlier drafts:
| Removed / replaced | With | Why |
|---|---|---|
| HashiCorp Vault | K8s Secrets + pgcrypto | No untrusted plugins in v1; Vault overhead not justified |
| RabbitMQ | Redis Streams | Already have Redis; three queue use cases don’t need AMQP |
| 5 separate microservices | 1 FastAPI app + 2 background workers | Registration, billing, notification are not services at this scale |
| 5 Kubernetes operators | Plain Deployments + PVCs | Operators are for HA clusters; v1 is single-replica |
| Custom Schema Registry service | envelope_schemas table + in-process cache | A service for one table is over-engineering |
| MinIO as image store | Zot (OCI registry) | Containerd cannot pull from S3; OCI images need an OCI registry |
| bcrypt for API key hashing | SHA-256 | bcrypt is for passwords; API keys are long random strings; SHA-256 is fast and sufficient |
| 3 routing confidence thresholds | 2 configurable thresholds | Simpler to calibrate; removes an undefined dead zone between 0.45–0.65 |
| Gateway + Orchestrator (2 pods) | Single FastAPI app | Internal HTTP hop for nothing; auth is middleware, not a separate service |
3. System Overview
flowchart TB
subgraph External["External Channels"]
direction LR
CLI --- API --- WEB --- SLACK --- WEBHOOK
end
subgraph App["Platform App (single FastAPI + Uvicorn process)"]
AUTH["API Key Auth<br/><i>SHA-256 verify → tenant_id + user_id</i>"]
RATE["Rate Limiter<br/><i>slowapi sliding window</i>"]
ROUTER["Intent Router<br/><i>text-embedding-3-small + claude-sonnet-4-6 fallback</i>"]
REGISTRY["Plugin Registry<br/><i>Redis cache / PostgreSQL source of truth</i>"]
CONTEXT["Shared Context Store<br/><i>History (windowed) + RAG org knowledge (pgvector)</i>"]
SEQUENCER["Task Sequencer<br/><i>Single & multi-plugin plans, DFS dep check</i>"]
FAILURE["Failure Manager<br/><i>Retry, fallback, circuit breaker</i>"]
AUTH --> RATE --> ROUTER
ROUTER --> REGISTRY
ROUTER --> CONTEXT
ROUTER --> SEQUENCER
SEQUENCER --> FAILURE
end
subgraph Workers["Background Workers"]
WDELIVER["Webhook Delivery Worker<br/><i>Redis Streams consumer</i>"]
WBILLING["Billing Aggregation Worker<br/><i>Scheduled cron</i>"]
end
subgraph Subscription["Subscription Layer (in-process)"]
ACCESS["Access Control"]
METER["Usage Metering"]
QUOTA["Quota + Token Budget"]
ACCESS --> METER --> QUOTA
end
subgraph PluginUniverse["Plugin Universe (LangGraph agents)"]
direction LR
P1["Code Review Pro"] --- P2["Customer Support AI"] --- P3["QA Transformer"]
end
subgraph Infra["Shared Infrastructure"]
REDIS2["Redis 7<br/><i>Cache + Streams queues</i>"]
PG["PostgreSQL 16 + pgvector<br/><i>All operational data + org knowledge RAG</i>"]
MINIO["MinIO<br/><i>Artifacts + reports</i>"]
ZOT["Zot OCI Registry<br/><i>Plugin container images</i>"]
PROM["Prometheus + Grafana + Loki"]
K8SSEC["Kubernetes Secrets<br/><i>Platform credentials</i>"]
end
External --> App
App --> Subscription
Subscription --> PluginUniverse
App --> Workers
App --> Infra
Workers --> Infra
PluginUniverse --> Infra
4. Core Architecture Layers
Layer 1: Interface, Auth & Rate Limiting
The platform is a single FastAPI application. Auth middleware and rate limiting run in-process before route handlers — no separate gateway pod, no internal HTTP hop, no serialisation of auth context between services.
API Key Authentication
sequenceDiagram
participant Client
participant App as FastAPI App
participant Cache as Redis
participant DB as PostgreSQL
Client->>App: Request + Authorization: Bearer sk_live_a1b2c3d4...
App->>App: Extract lookup prefix<br/>(first 8 chars of random part: "a1b2c3d4")
App->>Cache: GET auth:a1b2c3d4 (5-min TTL)
alt Cache hit
Cache-->>App: {tenant_id, user_id, scopes}
else Cache miss
App->>DB: SELECT * FROM api_keys WHERE key_prefix = 'a1b2c3d4'
DB-->>App: {key_hash, tenant_id, user_id, scopes, expires_at, revoked_at}
App->>App: SHA-256(raw_key) == key_hash?
alt Valid + not expired + not revoked
App->>Cache: SET auth:a1b2c3d4 {tenant_id, user_id, scopes} EX 300
else Invalid / expired / revoked
App-->>Client: 401 AUTH_INVALID
end
end
App->>App: Attach (tenant_id, user_id, scopes) to request state
App-->>Client: Continue to handler
Key format: sk_live_<32-char random hex> for production; sk_test_<32-char random hex> for sandbox. The lookup prefix is the first 8 characters of the random part (the hex after sk_live_), not sk_live_ itself which is identical on every key. SHA-256 is used for hashing — API keys are long random strings, not passwords; SHA-256 runs in microseconds and is cryptographically sufficient for this use case. bcrypt’s brute-force resistance adds nothing over a 32-byte random value.
API Key Lifecycle
flowchart LR
CREATE["POST /v1/api-keys<br/><i>Generate random key<br/>SHA-256 hash → store<br/>Return plaintext once only</i>"]
--> ACTIVE["Active<br/><i>5-min Redis auth cache</i>"]
ACTIVE -->|"Manual rotation"| ROTATE["New key issued<br/><i>Old key valid for 24h grace period</i>"]
ROTATE --> ACTIVE
ACTIVE -->|"DELETE /v1/api-keys/{id}"| REVOKED["Revoked<br/><i>revoked_at set<br/>Cache invalidated immediately</i>"]
ACTIVE -->|"expires_at reached"| EXPIRED["Expired<br/><i>Rejected on next use</i>"]
REVOKED --> PURGED["Purged after 90 days"]
EXPIRED --> PURGED
Rate Limiting
slowapi runs as FastAPI middleware in the same process. Every response includes standard rate limit headers.
flowchart TD
REQ["Incoming Request<br/><i>tenant_id from auth state</i>"]
--> KEY["Build limit key<br/><i>rate:{tenant_id}:{plugin_name}:{window_start}</i>"]
--> INCR["Redis INCR + EXPIREAT<br/><i>atomic sliding window</i>"]
--> CHECK{"Count ≤ limit?"}
CHECK -->|"Yes"| HEADERS["Set response headers:<br/>X-RateLimit-Limit<br/>X-RateLimit-Remaining<br/>X-RateLimit-Reset"]
CHECK -->|"No"| REJECT["429 Too Many Requests<br/>Retry-After: {seconds}<br/>error_code: RATE_LIMITED"]
HEADERS --> PROCEED["Forward to route handler"]
Default limits (configurable per tenant):
| Tier | Requests/min per plugin |
|---|---|
| Starter | 20 |
| Professional | 100 |
| Enterprise | Custom |
Layer 2: Orchestrator (same process)
Intent Router, Plugin Registry, Context Store, Task Sequencer, Failure Manager — all in-process. See Section 14 for deep-dive.
Layer 3: Subscription Layer (in-process)
Access control, metering, quota enforcement. The subscription check occurs inside the intent router before any plan is built. See Section 17 for deep-dive.
Layer 4: Plugin Universe
Each plugin is a LangGraph StateGraph. Fully stateless per execution. See Section 10 for deep-dive.
Layer 5: Shared Infrastructure
- Redis 7: Auth cache, plugin registry cache, circuit breaker state, route cache, rate limit counters, and Redis Streams for all async queues
- PostgreSQL 16 + pgvector: All operational data + org knowledge RAG embeddings
- MinIO: Artifact and report storage. Path:
/{tenant_id}/{plugin_name}/{task_id}/ - Zot: Self-hosted OCI registry for plugin container images
- Prometheus + Grafana + Loki: Metrics, dashboards, log aggregation
- Kubernetes Secrets: Platform-level credentials (LLM API keys, embedding API key)
5. Request Lifecycle
sequenceDiagram
participant User
participant App as FastAPI App (auth + orchestrator)
participant Router as Intent Router
participant Registry as Plugin Registry (Redis/PG)
participant Sub as Subscription Layer
participant Sequencer as Task Sequencer
participant Plugin as Target Plugin (LangGraph)
participant Agents as Internal Agent Nodes
participant Store as MinIO
User->>App: POST /v1/tasks {command, attachments}
App->>App: SHA-256 verify API key → tenant_id, user_id
App->>App: slowapi rate limit check
App->>Router: Forward command + tenant context
Router->>Router: Embed command (text-embedding-3-small)
Router->>Registry: Score against capability embeddings
Registry-->>Router: Matching plugins + confidence scores
Router->>Sub: Check access for tenant → plugin
Sub-->>Router: Authorized / Denied / Quota exceeded
alt Authorized (sync)
Router->>Sequencer: Build execution plan
Sequencer->>Plugin: Send InputEnvelope
Plugin->>Agents: Invoke LangGraph StateGraph
Agents->>Agents: Node 1 → Node 2 → Node 3
Agents->>Store: Save artifacts (MinIO)
Agents-->>Plugin: Final graph state
Plugin-->>Sequencer: OutputEnvelope (SDK-validated)
Sequencer-->>Router: Final response
Router-->>App: Response + artifact URLs
App-->>User: 200 OK {task_id, status, result}
Sub->>Sub: Write execution_events row
else Authorized (async)
App->>App: XADD tasks:{plugin_name} (Redis Stream)
App-->>User: 202 Accepted {task_id}
else Not subscribed
App-->>User: 403 PLUGIN_NOT_SUBSCRIBED {upgrade_url}
else Quota exceeded
App-->>User: 429 QUOTA_EXCEEDED {quota_warning}
end
6. REST API Specification
All endpoints require Authorization: Bearer <api_key>.
6.1 Task Endpoints
| Method | Path | Description | Response |
|---|---|---|---|
POST | /v1/tasks | Submit a task | 202 {task_id} or 200 {result} |
GET | /v1/tasks/{task_id} | Poll task status | 200 {task} |
DELETE | /v1/tasks/{task_id} | Cancel a running task | 202 {task_id, status: cancelling} |
GET | /v1/tasks | List tasks (paginated) | 200 {tasks[], next_cursor} |
6.2 Plugin Endpoints
| Method | Path | Description | Response |
|---|---|---|---|
GET | /v1/plugins | List available plugins for tenant | 200 {plugins[]} |
GET | /v1/plugins/{name} | Plugin detail + capabilities | 200 {plugin} |
GET | /v1/plugins/{name}/health | Plugin health status | 200 {status, components} |
6.3 Auth Endpoints
| Method | Path | Description | Response |
|---|---|---|---|
POST | /v1/api-keys | Create API key | 201 {id, key (shown once), prefix} |
GET | /v1/api-keys | List API keys for tenant | 200 {keys[]} |
DELETE | /v1/api-keys/{id} | Revoke API key | 204 |
6.4 Registration Endpoints (internal — plugin pods only)
| Method | Path | Description | Response |
|---|---|---|---|
POST | /internal/registry/register | Plugin self-registers on startup | 201 {plugin_id} |
DELETE | /internal/registry/{name}/{version} | Plugin deregisters on shutdown | 204 |
POST | /internal/registry/{name}/health | Plugin reports health | 200 |
6.5 Webhook Endpoints
| Method | Path | Description | Response |
|---|---|---|---|
POST | /v1/webhooks | Register a webhook endpoint | 201 {webhook_id} |
DELETE | /v1/webhooks/{webhook_id} | Remove a webhook | 204 |
GET | /v1/webhooks/{webhook_id}/deliveries | View delivery attempts | 200 {deliveries[]} |
6.6 Standard Error Format
{
"error": {
"code": "PLUGIN_NOT_SUBSCRIBED",
"message": "Your plan does not include the code-review plugin.",
"upgrade_url": "https://app.aau.io/billing/upgrade",
"task_id": "task_abc123",
"request_id": "req_xyz789"
}
}
6.7 Error Code Taxonomy
| Code | HTTP | Category | Retry? |
|---|---|---|---|
PLUGIN_NOT_SUBSCRIBED | 403 | Permanent | No |
QUOTA_EXCEEDED | 429 | Permanent | No |
RATE_LIMITED | 429 | Transient | Yes (Retry-After) |
PLUGIN_UNAVAILABLE | 503 | Transient | Yes |
PLUGIN_TIMEOUT | 504 | Transient | Yes |
CIRCULAR_DEPENDENCY | 422 | Permanent | No |
INVALID_INPUT | 422 | Permanent | No |
TOKEN_BUDGET_EXCEEDED | 200 (partial) | Overage | Billed, no retry |
CONTENT_POLICY_VIOLATION | 422 | Permanent | No |
CONTEXT_WINDOW_OVERFLOW | 200 (retry) | Transient | Yes (auto-truncated) |
DEPENDENCY_FAILED | 200 (partial) | Depends | Partial result returned |
AUTH_INVALID | 401 | Permanent | No |
7. Subscription & Commercial Model
7.1 Subscription Tiers
| Tier | Plugins Included | Cross-Plugin Chaining | Custom Plugins | Pricing Model |
|---|---|---|---|---|
| Starter | 1 plugin of choice | No | No | Flat monthly per plugin |
| Professional | Up to 5 plugins | Yes | No | Bundle discount + per-execution overage |
| Enterprise | Unlimited | Yes | Yes (custom-built) | Annual contract, volume pricing |
Subscription enforcement position: The intent router checks the tenant’s subscription immediately after scoring plugins — before building an execution plan.
7.2 Plugin Marketplace
Third-party plugins are invite-only at launch. Marketplace infrastructure is a v2 deliverable.
graph LR
subgraph Marketplace["Plugin Marketplace (v1 — first-party only)"]
CATALOG["Plugin Catalog<br/><i>Browse & search</i>"]
DETAIL["Product Page<br/><i>Capabilities, use cases, pricing</i>"]
SANDBOX["Sandbox Mode<br/><i>10 free executions</i>"]
SUB["Subscribe<br/><i>Activate for tenant</i>"]
CATALOG --> DETAIL --> SANDBOX --> SUB
end
subgraph Revenue["Revenue Streams"]
FIRST["First-Party Plugins"]
CUSTOM["Enterprise Custom Builds"]
end
subgraph Billing["Billing Engine"]
BASE["Base Subscription"]
USAGE["Usage-Based Charges"]
OVERAGE["Overage Billing"]
BASE --> USAGE --> OVERAGE
end
Marketplace --> Revenue
Revenue --> Billing
8. Sample Plugin Catalog
| Plugin | Domain | Capabilities | Agents Inside |
|---|---|---|---|
| Code Review Pro | Engineering | PR reviews, security scanning, style checks, refactor suggestions | Diff Analyzer, Security Scanner, Style Checker, Summary Writer |
| Customer Support AI | Support | Ticket classification, resolution, response drafting, escalation | Classifier, Resolution Engine, Response Drafter, Escalation Agent |
| QA Transformer | Quality | Test case generation, coverage validation, gap analysis | Code Analyzer, Test Generator, Coverage Validator |
| Ticket Processor | Operations | Triage, assignment, status tracking, SLA monitoring | Triager, Assigner, Status Tracker, SLA Monitor |
| Doc Generator | Content | PRDs, tech docs, reports from conversations | Extractor, Structurer, Writer, Reviewer |
| Data Analyst | Analytics | SQL generation, visualization, insight surfacing | Query Builder, Visualizer, Insight Generator, Narrator |
PART 2: LOW-LEVEL DESIGN
9. Database Schema
9.1 Entity Relationship Diagram
erDiagram
tenants {
uuid id PK
text name
text plan_tier
jsonb settings
timestamptz created_at
}
users {
uuid id PK
uuid tenant_id FK
text email
text role
timestamptz created_at
}
api_keys {
uuid id PK
uuid tenant_id FK
uuid user_id FK
text key_hash
text key_prefix
text[] scopes
timestamptz expires_at
timestamptz revoked_at
}
tenant_secrets {
uuid id PK
uuid tenant_id FK
text plugin_name
text secret_name
bytea secret_value
timestamptz created_at
}
plugins {
uuid id PK
text name
text category
text owner_type
text status
timestamptz deprecated_at
timestamptz sunset_at
}
plugin_versions {
uuid id PK
uuid plugin_id FK
text version
jsonb manifest
text envelope_version
text status
timestamptz registered_at
}
envelope_schemas {
uuid id PK
text version
jsonb input_schema
jsonb output_schema
timestamptz deprecated_at
timestamptz created_at
}
subscriptions {
uuid id PK
uuid tenant_id FK
uuid plugin_id FK
text tier
int quota_limit
bool overage_enabled
timestamptz started_at
timestamptz ended_at
}
tenant_plugin_config {
uuid id PK
uuid tenant_id FK
uuid plugin_id FK
jsonb config
timestamptz updated_at
}
tasks {
uuid id PK
uuid tenant_id FK
uuid user_id FK
text plugin_name
text status
jsonb input
jsonb result
int tokens_used
numeric cost_estimate
text routing_method
timestamptz created_at
timestamptz completed_at
}
webhooks {
uuid id PK
uuid tenant_id FK
text url
text secret_hash
text[] events
bool active
}
execution_events {
uuid id PK
uuid tenant_id FK
text plugin_name
uuid task_id FK
text status
int tokens_used
int latency_ms
numeric cost_estimate
text routing_method
bool billed
timestamptz timestamp
}
org_knowledge_chunks {
uuid id PK
uuid tenant_id FK
text source_url
text content
vector embedding
jsonb metadata
}
tenants ||--o{ users : "has"
tenants ||--o{ api_keys : "has"
tenants ||--o{ subscriptions : "has"
tenants ||--o{ tenant_plugin_config : "configures"
tenants ||--o{ tasks : "submits"
tenants ||--o{ webhooks : "registers"
tenants ||--o{ execution_events : "generates"
tenants ||--o{ org_knowledge_chunks : "owns"
tenants ||--o{ tenant_secrets : "stores"
users ||--o{ api_keys : "owns"
users ||--o{ tasks : "submits"
plugins ||--o{ plugin_versions : "has"
plugins ||--o{ subscriptions : "subscribed via"
plugins ||--o{ tenant_plugin_config : "configured by"
tasks ||--o{ execution_events : "produces"
9.2 PostgreSQL — Full DDL
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Tenants and users
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
plan_tier TEXT NOT NULL CHECK (plan_tier IN ('starter', 'professional', 'enterprise')),
settings JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
email TEXT NOT NULL UNIQUE,
role TEXT NOT NULL DEFAULT 'member',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- API keys — SHA-256 hashed, not bcrypt
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
user_id UUID REFERENCES users(id), -- NULL = service account
key_hash TEXT NOT NULL UNIQUE, -- SHA-256(raw_key), hex-encoded
key_prefix TEXT NOT NULL, -- first 8 chars of random part for lookup
scopes TEXT[] NOT NULL DEFAULT '{}',
description TEXT,
last_used_at TIMESTAMPTZ,
expires_at TIMESTAMPTZ,
revoked_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Tenant-owned secrets: AES-256 via pgcrypto, replaces Vault for v1
-- Platform encryption key stored in Kubernetes Secret, injected as env var ENCRYPT_KEY
CREATE TABLE tenant_secrets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
plugin_name TEXT NOT NULL,
secret_name TEXT NOT NULL, -- e.g. "github_token", "jira_api_key"
secret_value BYTEA NOT NULL, -- pgp_sym_encrypt(value, ENCRYPT_KEY)
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (tenant_id, plugin_name, secret_name)
);
-- Helper functions for encrypt/decrypt
-- encrypt: SELECT pgp_sym_encrypt('value', current_setting('app.encrypt_key'))
-- decrypt: SELECT pgp_sym_decrypt(secret_value, current_setting('app.encrypt_key'))
-- Plugin registry (durable source of truth — Redis is the cache)
CREATE TABLE plugins (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL UNIQUE,
category TEXT NOT NULL,
owner_type TEXT NOT NULL CHECK (owner_type IN ('first_party', 'enterprise_custom')),
status TEXT NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'deprecated', 'sunset', 'retired')),
deprecated_at TIMESTAMPTZ,
sunset_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE plugin_versions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
plugin_id UUID NOT NULL REFERENCES plugins(id),
version TEXT NOT NULL,
manifest JSONB NOT NULL,
envelope_version TEXT NOT NULL DEFAULT '1.0',
status TEXT NOT NULL DEFAULT 'active',
registered_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (plugin_id, version)
);
-- Envelope schema registry — a table, not a service
-- Cached in-process by orchestrator on startup; refreshed on NOTIFY envelope_schemas
CREATE TABLE envelope_schemas (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
version TEXT NOT NULL UNIQUE,
input_schema JSONB NOT NULL,
output_schema JSONB NOT NULL,
deprecated_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Subscriptions
CREATE TABLE subscriptions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
plugin_id UUID NOT NULL REFERENCES plugins(id),
tier TEXT NOT NULL,
quota_limit INTEGER,
overage_enabled BOOLEAN NOT NULL DEFAULT false,
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
ended_at TIMESTAMPTZ,
UNIQUE (tenant_id, plugin_id)
);
-- Tenant-specific plugin config
CREATE TABLE tenant_plugin_config (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
plugin_id UUID NOT NULL REFERENCES plugins(id),
config JSONB NOT NULL DEFAULT '{}',
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (tenant_id, plugin_id)
);
-- Task tracking
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
user_id UUID NOT NULL REFERENCES users(id),
plugin_name TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN
('pending','queued','running','success','partial','failed','cancelled')),
input JSONB NOT NULL,
result JSONB,
error JSONB,
tokens_used INTEGER,
cost_estimate NUMERIC(10,6),
routing_method TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ
);
-- Webhooks
CREATE TABLE webhooks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
url TEXT NOT NULL,
secret_hash TEXT NOT NULL,
events TEXT[] NOT NULL DEFAULT '{task.completed}',
active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Metering (v1 in PostgreSQL; v2 upgrade to ClickHouse at billing scale)
CREATE TABLE execution_events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
plugin_name TEXT NOT NULL,
plugin_version TEXT NOT NULL,
task_id UUID NOT NULL,
status TEXT NOT NULL,
tokens_used INTEGER,
latency_ms INTEGER,
cost_estimate NUMERIC(10,6),
routing_method TEXT,
envelope_version TEXT,
billed BOOLEAN NOT NULL DEFAULT false,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON execution_events (tenant_id, timestamp);
CREATE INDEX ON execution_events (task_id);
-- pgvector: org knowledge RAG
CREATE TABLE org_knowledge_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
source_url TEXT,
content TEXT NOT NULL,
embedding vector(1536), -- text-embedding-3-small output dimension
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- HNSW index: no minimum row requirement, no reindexing needed, better recall at v1 scale
CREATE INDEX ON org_knowledge_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX ON org_knowledge_chunks (tenant_id);
Why HNSW over IVFFlat: IVFFlat requires 3 × lists minimum rows (300 rows with lists=100) to produce useful results and needs periodic REINDEX as new rows are inserted. HNSW has no minimum, never needs reindexing, and has better recall at small-to-medium scale. Switch to IVFFlat only if memory pressure becomes a concern at large scale.
10. Plugin Architecture — Deep Dive
10.1 Plugin Internal Structure
graph TB
subgraph PluginShell["Plugin Shell (Visible to Orchestrator)"]
META["Identity & Manifest<br/><i>name, version, capabilities, schema</i>"]
CONFIG["Configuration<br/><i>model, limits, token_budget, required_secret_names</i>"]
LIFECYCLE["Lifecycle Hooks<br/><i>initialize · health_check · shutdown</i>"]
EXECUTE["execute(InputEnvelope) → OutputEnvelope"]
end
subgraph PluginInternals["Plugin Internals — LangGraph StateGraph"]
subgraph AgentGraph["StateGraph"]
A1["Node: Task Decomposition"]
A2["Node: Domain Processing"]
A3["Node: Validation"]
A4["Node: Synthesis & Output"]
A1 -->|"GraphState"| A2
A2 -->|"GraphState"| A3
A3 -->|"GraphState"| A4
end
subgraph Budget["Token Budget Tracker (SDK)"]
TB1["Track tokens per LLM call"]
TB2["Emit warning at 80%"]
TB3["Log overage at 100%+"]
end
subgraph Tools["Tools"]
T1["External APIs<br/><i>credentials from InputEnvelope.context.secrets</i>"]
T2["PostgreSQL queries (tenant-scoped)"]
T3["MinIO artifact store"]
T4["LLM calls (Anthropic SDK)"]
end
AgentGraph --> Budget
AgentGraph --> Tools
end
EXECUTE --> PluginInternals
PluginInternals -->|"OutputEnvelope (SDK-validated)"| EXECUTE
How plugins access credentials: The orchestrator resolves required_secret_names from tenant_secrets (decrypting with pgcrypto) and injects the resolved key-value pairs into InputEnvelope.context.secrets before calling execute(). Plugins never call the secrets store directly — the orchestrator is the only component that decrypts.
10.2 Plugin Interface Contract
Identity & Manifest:
| Field | Type | Purpose |
|---|---|---|
name | string | Unique identifier (e.g., "code-review") |
version | string | Semantic version |
description | string | Human-readable summary |
capabilities | list[string] | What it can do — embedded for routing |
category | string | Domain tag |
dependencies | list[PluginDep] | {name, version_constraint} |
input_schema | dict | JSON schema for expected input |
envelope_version | string | Which envelope version this plugin speaks |
estimated_latency | string | fast / medium / slow |
required_scopes | list[string] | Declared capability scopes |
required_secret_names | list[string] | Secret names needed from tenant_secrets |
Configuration:
| Field | Type | Purpose |
|---|---|---|
model | string | LLM override (default: claude-sonnet-4-6) |
temperature | float | LLM temperature |
max_retries | int | Retry count before failing |
timeout_seconds | int | Max execution time |
token_budget | int | Max tokens per execution |
Lifecycle Methods:
| Method | When Called | Returns |
|---|---|---|
initialize() | Once at registration | Sets up connections, compiles LangGraph |
health_check() | Every 30s | HealthStatus |
execute(input) | Per request | OutputEnvelope |
shutdown() | On deregistration | Cleans up connections |
Health check returns structured status:
@dataclass
class HealthStatus:
status: Literal["healthy", "degraded", "unhealthy"]
components: dict[str, str] # {"llm": "healthy", "github_api": "degraded"}
message: str | None = None
Three consecutive unhealthy responses trip the circuit breaker.
11. Standardized Envelopes
11.1 InputEnvelope
classDiagram
class InputEnvelope {
+string task_id
+string task_description
+string envelope_version
+Context context
+Metadata metadata
+list~any~ attachments
}
class Context {
+string user_id
+string tenant_id
+dict user_profile
+list~dict~ conversation_history
+list~dict~ org_knowledge_chunks
+dict plugin_config
+dict secrets
}
class Metadata {
+string source_channel
+string priority
+string source_plugin
+string correlation_id
+datetime timestamp
}
InputEnvelope --> Context : context
InputEnvelope --> Metadata : metadata
| Field | Type | Description |
|---|---|---|
task_id | string | Unique ID for tracking |
task_description | string | Natural language task |
envelope_version | string | Schema version (e.g., "1.0") |
context.conversation_history | list | Last 20 turns (sliding window) |
context.org_knowledge_chunks | list | RAG-retrieved chunks (top-k, tenant-scoped) |
context.plugin_config | dict | Tenant config from tenant_plugin_config |
context.secrets | dict | Decrypted secrets from tenant_secrets (orchestrator-resolved, never stored) |
metadata.correlation_id | string | End-to-end trace ID |
attachments | list[any] | Files, code snippets, URLs, PRs |
11.2 OutputEnvelope
classDiagram
class OutputEnvelope {
+string task_id
+string envelope_version
+string status
+string result
+list~any~ artifacts
+float confidence
+string|null error
+string|null error_code
+list~string~ next_steps
+string|null deprecation_warning
+ExecutionMeta execution_meta
}
class ExecutionMeta {
+int tokens_used
+float latency_seconds
+int agents_invoked
+string model_used
+float cost_estimate
+string routing_method
+bool budget_exceeded
}
OutputEnvelope --> ExecutionMeta : execution_meta
OutputEnvelope validation: The SDK validates every OutputEnvelope against the in-process schema cache (loaded from the envelope_schemas table) before returning to the orchestrator.
12. Envelope Versioning
12.1 Versioning Rules
- Additive changes (new optional fields): minor bump. Old plugins ignore unknown fields.
- Breaking changes (removal, type change, rename): major bump. Orchestrator runs a version adapter.
- Schema storage:
envelope_schemastable in PostgreSQL. Orchestrator loads schemas into an in-process dict at startup and listens forNOTIFY envelope_schemasto refresh on updates — no schema registry service, no extra network calls. - Multi-version support: Current + one prior major version. Plugins on older versions receive
deprecation_warninginexecution_meta.
12.2 Adapter Flow
flowchart LR
ORCH["Orchestrator<br/><i>speaks v2</i>"]
ADAPT["Version Adapter<br/><i>in-process dict lookup</i>"]
PLUGIN["Plugin<br/><i>speaks v1</i>"]
ORCH -->|"v2 InputEnvelope"| ADAPT
ADAPT -->|"v1 InputEnvelope"| PLUGIN
PLUGIN -->|"v1 OutputEnvelope"| ADAPT
ADAPT -->|"v2 OutputEnvelope"| ORCH
Plugin developers are notified 90 days before a major version is sunset.
13. Plugin Registration & Discovery
13.1 Plugin Deployment Models
flowchart TD
subgraph Dev["Local Development (Docker Compose)"]
FILE["Plugin .py file"] --> WATCH["aau plugin dev<br/><i>File watcher + hot-reload</i>"]
WATCH --> STUB["Local app stub<br/><i>Sends test InputEnvelopes, prints OutputEnvelopes</i>"]
end
subgraph Prod["Production (Kubernetes)"]
IMAGE["Plugin Container Image<br/><i>aau plugin publish → Zot OCI registry</i>"]
--> DEPLOY["kubectl apply / Helm chart"]
--> STARTUP["Container starts, initialize() called"]
--> REGAPI["POST /internal/registry/register<br/><i>{manifest, envelope_version, required_scopes}</i>"]
--> VALIDATE{"Contract check +<br/>scope allowlist +<br/>cosign signature verify"}
VALIDATE -->|"Valid"| PGWRITE["INSERT plugin_versions (PostgreSQL)"]
VALIDATE -->|"Invalid"| REJECT["400 — reject + log"]
PGWRITE --> HEALTH{"health_check() passes?"}
HEALTH -->|"Yes"| REDISCACHE["SET registry:{name} in Redis<br/>PUBLISH registry.updated"]
HEALTH -->|"No"| BACKOFF["Log degraded + retry backoff"]
REDISCACHE --> READY["Plugin live in orchestrator"]
end
subgraph Teardown["Plugin Removal"]
SCALE["Scale to 0"] --> DRAIN["In-flight requests drain<br/><i>grace = timeout_seconds</i>"]
DRAIN --> DEREG["DELETE /internal/registry/{name}/{version}"]
DEREG --> PGMARK["UPDATE plugin_versions SET status='inactive'"]
PGMARK --> REDINVAL["DEL registry:{name} + PUBLISH registry.removed"]
end
13.2 Registration Rules
- Contract validation:
name,version,capabilities,required_scopes,execute(). Fails fast with a clear error. - Scope enforcement:
required_scopesvalidated against an allowlist at registration. - Health gating: Only healthy plugins enter the registry.
- Registry durability: PostgreSQL is source of truth. Redis rebuilt from PostgreSQL on cache miss or restart.
14. Orchestrator — Deep Dive
14.1 Intent Routing Architecture
The router uses two configurable thresholds: CONFIDENCE_MIN (below which asks for clarification or falls back to LLM) and CHAIN_THRESHOLD (above which routes directly; between the two, considers chaining). Both are environment config values with documented defaults.
flowchart TD
CMD["Incoming Command"] --> CACHE{"Route cache hit?<br/><i>Redis, TTL: 5 min</i>"}
CACHE -->|"Hit"| CACHED["Use cached plugin route"]
CACHE -->|"Miss"| EMBED["Embed command<br/><i>text-embedding-3-small</i>"]
EMBED --> MATCH["Cosine similarity vs<br/>capability embeddings in Redis"]
MATCH --> EVAL{"Score vs thresholds<br/><i>CHAIN_THRESHOLD=0.75<br/>CONFIDENCE_MIN=0.50</i>"}
EVAL -->|"Score > CHAIN_THRESHOLD<br/>single plugin"| DIRECT["Route directly"]
EVAL -->|"Score > CHAIN_THRESHOLD<br/>multiple plugins"| DEPCHECK{"Circular dependency?<br/><i>DFS check</i>"}
EVAL -->|"CONFIDENCE_MIN < Score ≤ CHAIN_THRESHOLD"| LLM["LLM fallback<br/><i>claude-sonnet-4-6</i>"]
EVAL -->|"Score < CONFIDENCE_MIN"| NOPE["No match → suggest closest"]
EVAL -->|"Embed service down"| KEYWORD["Keyword fallback<br/><i>substring match on capability strings</i>"]
LLM --> EVAL2{"LLM confident?"}
EVAL2 -->|"Yes"| DIRECT
EVAL2 -->|"Ambiguous"| CLARIFY["Ask user to clarify"]
EVAL2 -->|"No"| NOPE
DEPCHECK -->|"Cycle found"| ERROR["422 CIRCULAR_DEPENDENCY<br/><i>cycle path in error message</i>"]
DEPCHECK -->|"No cycle"| CHAIN["Build execution chain (topological sort)"]
DIRECT --> SUBCHECK{"Subscription check"}
CHAIN --> PLAN["Build execution plan"]
PLAN --> SUBCHECK
SUBCHECK -->|"Authorized"| ENVELOPE["Build InputEnvelope<br/><i>inject context + secrets</i>"]
SUBCHECK -->|"Not subscribed"| UPSELL["403 PLUGIN_NOT_SUBSCRIBED"]
SUBCHECK -->|"Quota exceeded"| QUOTA["429 QUOTA_EXCEEDED"]
ENVELOPE --> EXEC["Execute plugin(s)"]
EXEC --> RESULT["Return OutputEnvelope"]
RESULT --> WRITECACHE["Write to route cache (Redis)"]
CACHED --> SUBCHECK
Threshold calibration: CHAIN_THRESHOLD and CONFIDENCE_MIN are set in environment config. Calibrate by running a labeled command corpus through the router and adjusting to maximise precision while keeping recall above 95%. Two values are easier to tune and reason about than three.
LLM routing cost target: < 5% of requests. Tracked via routing_method in every execution_meta. Grafana alert fires if LLM fallback rate exceeds 10%.
14.2 Multi-Plugin Task Sequencing
flowchart LR
subgraph Example["Example: Review PR + create tickets for each issue"]
direction TB
CMD2["Command"] --> DECOMPOSE["Task Sequencer"]
DECOMPOSE --> DEPCHECK2["DFS dependency check"]
DEPCHECK2 --> STEP1["Step 1: Code Review Pro<br/><i>find issues</i>"]
STEP1 -->|"OutputEnvelope.artifacts"| TRANSFORM["Transform<br/><i>map issues to ticket format</i>"]
TRANSFORM --> STEP2["Step 2: Ticket Processor<br/><i>create tickets</i>"]
STEP2 -->|"OutputEnvelope"| MERGE["Merge Results"]
MERGE --> FINAL["Final OutputEnvelope"]
end
Sequencing rules:
- Serial chaining: Output of plugin A feeds plugin B
- Parallel execution: Independent plugins run concurrently; results merged
- Conditional branching: Plugin A output determines next plugin
- Early termination: Failed step →
status: partialwith what completed
14.3 Shared Context Store
Context Assembly Pipeline
flowchart TD
REQ["Request (tenant_id, user_id, task, plugin_name)"] --> PARALLEL["Parallel lookups"]
PARALLEL --> HIST["Conversation history<br/><i>Redis: ctx:{tenant_id}:{user_id}:history (last 20)</i>"]
PARALLEL --> PROF["User profile<br/><i>Redis cache → PostgreSQL users</i>"]
PARALLEL --> CFG["Plugin config<br/><i>Redis cache → tenant_plugin_config</i>"]
PARALLEL --> RAG["RAG retrieval<br/><i>pgvector cosine search, top-k chunks</i>"]
PARALLEL --> SECRETS["Decrypt secrets<br/><i>SELECT + pgp_sym_decrypt from tenant_secrets</i>"]
HIST --> ASSEMBLE["Context Assembler"]
PROF --> ASSEMBLE
CFG --> ASSEMBLE
RAG --> ASSEMBLE
SECRETS --> ASSEMBLE
ASSEMBLE --> ENVELOPE3["InputEnvelope.context<br/><i>history, profile, plugin_config,<br/>org_knowledge_chunks, secrets</i>"]
RAG Pipeline — Org Knowledge Indexing
flowchart TD
UPLOAD["POST /v1/knowledge {content, source_url}"]
--> CHUNK["Chunker<br/><i>~512-token chunks with overlap</i>"]
--> EMBED2["Embed each chunk<br/><i>text-embedding-3-small → vector(1536)</i>"]
--> STORE2["INSERT org_knowledge_chunks<br/><i>content, embedding, metadata, tenant_id</i>"]
--> READY2["Available for HNSW retrieval"]
CHUNK --> META["Extract metadata<br/><i>source_url, section headers</i>"]
META --> STORE2
RAG Pipeline — Retrieval at Request Time
flowchart TD
TASK["task_description"]
--> EMBED3["Embed task (text-embedding-3-small)"]
--> SEARCH["pgvector HNSW cosine search<br/><i>WHERE tenant_id = $1 ORDER BY embedding <=> $2 LIMIT 8</i>"]
--> FILTER["Filter: cosine similarity > 0.7"]
--> WRAP["Wrap in system prompt boundary<br/><i>prevents prompt injection from org docs</i>"]
--> INJECT["Inject as context.org_knowledge_chunks<br/><i>max 2000 tokens total</i>"]
Async History Summarisation
sequenceDiagram
participant Agent as Any Agent
participant Redis
participant Stream as Redis Stream
participant Worker as Summarisation Worker
participant LLM as claude-sonnet-4-6
Agent->>Redis: LPUSH ctx:tenant:user:history {new_turn}
Redis->>Redis: LLEN ctx:tenant:user:history
alt Length > 20
Redis->>Stream: XADD stream:summarization {tenant_id, user_id}
end
Note over Agent: Request continues unblocked
Stream->>Worker: XREADGROUP (consumer group)
Worker->>Redis: LRANGE history (turns 21..end)
Redis-->>Worker: Older turns
Worker->>LLM: Summarise older turns
LLM-->>Worker: Compressed summary
Worker->>Redis: SET ctx:tenant:user:summary {text}
Worker->>Redis: LTRIM ctx:tenant:user:history 0 19
Worker->>Stream: XACK (mark message processed)
14.4 Async Task Delivery (Redis Streams)
sequenceDiagram
participant User
participant App
participant Stream as Redis Stream
participant Worker as Plugin Worker
participant DB as PostgreSQL
participant Notify as Webhook Delivery Worker
participant Webhook as Customer Webhook URL
User->>App: POST /v1/tasks {command}
App->>DB: INSERT tasks (status=queued)
App->>Stream: XADD stream:tasks:{plugin_name} {task_id, payload}
App-->>User: 202 Accepted {task_id}
Stream->>Worker: XREADGROUP (consumer group)
Worker->>DB: UPDATE tasks SET status=running
Worker->>Worker: Run LangGraph graph
Worker->>DB: UPDATE tasks SET status=success/failed, result={...}
Worker->>Stream: XADD stream:webhooks {task_id, tenant_id}
Worker->>Stream: XACK tasks stream
Note over User: Poll for result
User->>App: GET /v1/tasks/{task_id}
App->>DB: SELECT tasks WHERE id=task_id
DB-->>App: {status, result}
App-->>User: 200 {status, result}
Note over Notify: Webhook delivery worker
Stream->>Notify: XREADGROUP stream:webhooks
Notify->>DB: SELECT webhooks WHERE tenant_id=...
Notify->>Notify: HMAC-SHA256 sign payload
Notify->>Webhook: POST {task_id, status, result}<br/>X-AAU-Signature: sha256=<hmac>
Notify->>Stream: XACK (on 2xx) or re-enqueue with backoff
Webhook Security Detail
flowchart TD
PAYLOAD["Webhook payload JSON"]
--> SIGN["HMAC-SHA256(payload, webhook.secret)<br/><i>secret from webhooks.secret_hash (decrypted)</i>"]
--> HEADER["X-AAU-Signature: sha256={digest}"]
--> POST["POST to customer URL (10s timeout)"]
--> RESP{"HTTP response?"}
RESP -->|"2xx"| LOG_OK["Log delivery success"]
RESP -->|"Non-2xx / timeout"| RETRY{"Retries < 5?"}
RETRY -->|"Yes"| BACKOFF2["Exponential backoff<br/><i>1m → 5m → 30m → 2h → 8h</i>"]
BACKOFF2 --> POST
RETRY -->|"No"| LOG_FAIL["Log permanent failure<br/><i>tenant notified in dashboard</i>"]
Task cancellation: DELETE /v1/tasks/{task_id} sets Redis key cancel:{task_id} EX 3600. The LangGraph graph checks this flag at each node boundary. Acknowledged within one node cycle.
Task state machine: pending → queued → running → success | partial | failed | cancelled
15. Failure Handling & Resilience
15.1 Failure Strategy
flowchart TD
EXEC2["Plugin Execution"] --> OUTCOME{"Outcome?"}
OUTCOME -->|"Success"| RETURN["OutputEnvelope status: success"]
OUTCOME -->|"Timeout"| RETRY{"Retries remaining?"}
OUTCOME -->|"Error"| ERRTYPE{"Error type?"}
RETRY -->|"Yes"| BACKOFF3["Exponential backoff → retry"]
RETRY -->|"No"| TIMEOUT["status: failed / PLUGIN_TIMEOUT"]
BACKOFF3 --> EXEC2
ERRTYPE -->|"Transient"| RETRY
ERRTYPE -->|"Token budget exceeded"| WARNBILL["Warn + continue<br/><i>overage tracked + billed</i>"]
ERRTYPE -->|"Context window overflow"| TRUNCATE["Truncate history → retry once"]
ERRTYPE -->|"Content policy violation"| POLICY["status: failed / CONTENT_POLICY_VIOLATION<br/><i>not retried, not billed</i>"]
ERRTYPE -->|"Permanent"| FALLBACK{"Fallback plugin?"}
ERRTYPE -->|"Partial result"| PARTIAL["status: partial"]
FALLBACK -->|"Yes"| ALT["Route to fallback plugin"]
FALLBACK -->|"No"| FAIL["status: failed"]
ALT --> EXEC2
TIMEOUT --> CIRCUIT["Update circuit breaker (Redis)"]
FAIL --> CIRCUIT
CIRCUIT -->|"3 consecutive failures"| OPEN["Open circuit — 60s cooldown"]
15.2 Token Budget Enforcement
flowchart TD
NODE_START["LangGraph node starts"]
--> SDK_CHECK{"tokens_used / token_budget"}
SDK_CHECK -->|"< 80%"| RUN["Run node (LLM call)"]
SDK_CHECK -->|"80–100%"| WARN2["Log warning + emit metric"]
WARN2 --> RUN
RUN --> TRACK["SDK reads response.usage<br/><i>input_tokens + output_tokens → running total in GraphState</i>"]
TRACK --> CHECK2{"Total > token_budget?"}
CHECK2 -->|"No"| NEXT["Proceed to next node"]
CHECK2 -->|"Yes"| OVERAGE["Set budget_exceeded=true in ExecutionMeta<br/><i>Overage billed to tenant</i>"]
OVERAGE --> CONTINUE["Continue execution"]
CONTINUE --> NEXT
15.3 Circuit Breaker per Plugin
stateDiagram-v2
[*] --> Closed
Closed --> Open : 3 consecutive failures
Open --> HalfOpen : 60s cooldown elapsed
HalfOpen --> Closed : Probe request succeeds
HalfOpen --> Open : Probe request fails
Open --> Open : Requests rejected immediately (no plugin call)
State stored in Redis (circuit:{plugin_name}:{version}). Surfaced as aau_circuit_breaker_state Gauge in Grafana.
16. Plugin Sandboxing & Security
16.1 Capability Scope Model
| Scope | Grants Access To |
|---|---|
context:read | Current request’s user and tenant context only |
secrets:{name} | A specific named secret resolved from tenant_secrets |
store:write | Writing to MinIO at /{tenant_id}/{plugin_name}/... |
store:read | Reading artifacts the plugin itself created |
plugin:chain | Calling other plugins as dependencies |
knowledge:read | Reading the tenant’s org knowledge base |
knowledge:write | Indexing new content into the tenant’s org knowledge base |
Note on secrets: Plugin pods never access the database or call a secrets endpoint. The orchestrator resolves and decrypts secrets from tenant_secrets before building the InputEnvelope, and injects them as context.secrets. Secrets leave the orchestrator only inside a request, over an in-cluster encrypted channel to the plugin pod.
16.2 Runtime Isolation
flowchart TB
subgraph App2["Platform App"]
EXEC3["execute(InputEnvelope)<br/><i>secrets already injected, tenant_id embedded</i>"]
end
subgraph PluginPod["Plugin Pod (network-isolated)"]
PROXY["Scope Enforcement Proxy<br/><i>validates all infra calls against declared scopes</i>"]
PLUGIN3["Plugin Code (LangGraph)"]
PROXY --> PLUGIN3
end
subgraph SharedInfra["Shared Infrastructure"]
PG3["PostgreSQL (WHERE tenant_id = $1 enforced)"]
STORE3["MinIO (path: /{tenant_id}/{plugin_name}/...)"]
end
EXEC3 -->|"InputEnvelope"| PROXY
PROXY -->|"Scope-validated"| STORE3
PROXY -->|"Read-only, tenant-scoped"| PG3
PLUGIN3 -->|"OutputEnvelope"| EXEC3
Network policy: Kubernetes NetworkPolicy restricts plugin pod egress to only the endpoints declared in required_secret_names. No direct database or secret store access from plugin pods.
Plugin signing: All plugins are signed with cosign during aau plugin publish. The registration endpoint verifies the signature before accepting.
16.3 Plugin Signing & Publish Flow
flowchart TD
CODE["Plugin code (my_plugin.py)"]
--> SDKBUILD["aau plugin publish<br/><i>Step 1: docker build using aau-base image</i>"]
--> IMAGE2["Container image built"]
--> SCAN["Step 2: trivy scan<br/><i>Fail on HIGH/CRITICAL CVEs</i>"]
--> SIGN2["Step 3: cosign sign image<br/><i>using platform private key</i>"]
--> PUSH["Step 4: push to Zot OCI registry<br/><i>registry.aau.internal/{name}:{version}</i>"]
--> MANIFEST["Step 5: generate manifest.json"]
--> UPLOAD2["Step 6: POST /internal/registry/register<br/><i>platform verifies cosign signature</i>"]
--> LIVE2["Plugin registered and live"]
SCAN --> FAIL2["Build fails — developer must fix CVEs"]
17. Subscription Enforcement — Deep Dive
17.1 Enforcement Flow
flowchart TD
REQ["Incoming Request (tenant_id from API key)"]
--> LOOKUP["Lookup subscription<br/><i>Redis cache → PostgreSQL subscriptions</i>"]
--> CHECK{"Plugin in subscription?"}
CHECK -->|"Yes"| QUOTA2{"Within quota?"}
CHECK -->|"No"| DENY["403 PLUGIN_NOT_SUBSCRIBED + upgrade_url"]
QUOTA2 -->|"< 80%"| ALLOW["Allow execution"]
QUOTA2 -->|"80–100%"| WARN3["Allow + emit quota.warning event"]
QUOTA2 -->|"Exceeded"| OVER{"Overage enabled?"}
OVER -->|"Yes"| CHARGE["Allow + set overage=true in metering"]
OVER -->|"No"| BLOCK["429 QUOTA_EXCEEDED"]
ALLOW --> LOG["INSERT execution_events"]
WARN3 --> LOG
CHARGE --> LOG
LOG --> BILL["Billing worker reads execution_events WHERE billed=false<br/><i>runs on schedule, marks billed=true after invoicing</i>"]
17.2 Usage Metering Record
| Field | Description |
|---|---|
tenant_id | Who |
plugin_name + plugin_version | What |
task_id | Which request |
timestamp | When |
tokens_used | From ExecutionMeta.tokens_used (SDK-tracked) |
latency_ms | Execution duration |
status | success / partial / failed |
cost_estimate | tokens_used × per-token rate for model_used |
routing_method | embedding / llm_fallback / cache / keyword |
billed | False for failed; True once invoiced by billing worker |
18. Plugin Dependency Management
18.1 Dependency Resolution
dependencies = [
{"name": "code-review", "version": ">=1.2.0,<2.0.0"},
{"name": "ticket-processor", "version": ">=2.0.0"},
]
At plan-build time: resolve versions → check subscriptions → DFS cycle detection → topological sort → execute.
18.2 Circular Dependency Detection
flowchart TD
PLAN["Build Execution Plan"] --> DFS["DFS over dependency graph"]
DFS --> CYCLE{"Cycle detected?"}
CYCLE -->|"Yes"| REJECT2["422 CIRCULAR_DEPENDENCY<br/><i>error message includes full cycle path</i>"]
CYCLE -->|"No"| TOPO["Topological sort → execution order"]
TOPO --> EXEC4["Execute in order"]
19. Plugin Deprecation Lifecycle
flowchart LR
ACTIVE["Active"] -->|"Owner initiates"| DEPRECATED["Deprecated<br/><i>90-day warning</i>"]
DEPRECATED -->|"90 days elapsed"| SUNSET["Sunset<br/><i>No new subscriptions</i>"]
SUNSET -->|"Zero active subscribers"| RETIRED["Retired<br/><i>Removed from registry</i>"]
DEPRECATED --> BLOCKED2["New subscriptions blocked"]
DEPRECATED --> WARN4["deprecation_warning in every OutputEnvelope<br/><i>sunset_at + migration_guide_url</i>"]
plugin_versions.statustransitions:active → deprecated → sunset → retired- At sunset, plugin accepts no new tasks; in-flight tasks complete before deregistration
20. Plugin SDK & Developer Experience
20.1 SDK Structure
aau-sdk/
├── base_plugin.py # BasePlugin abstract class (LangGraph-aware)
├── decorators.py # @register_plugin
├── envelopes.py # InputEnvelope, OutputEnvelope, typed dataclasses
├── graph.py # AAUStateGraph, standard node helpers
├── budget.py # Token budget tracker (wraps Anthropic client)
├── validation.py # OutputEnvelope schema validation (in-process envelope_schemas cache)
├── testing/
│ ├── harness.py # Local orchestrator stub
│ └── fixtures.py # Sample InputEnvelopes for testing
└── cli/
├── new.py # aau plugin new <name>
├── dev.py # aau plugin dev (Docker Compose + hot-reload)
├── test.py # aau plugin test
└── publish.py # build → trivy scan → cosign sign → Zot push → register
20.2 Local Development Stack
aau plugin dev runs docker compose up. All services are local — no external dependencies.
graph TB
subgraph DockerCompose["Docker Compose — Local Dev Stack"]
subgraph AppLayer["Application"]
LAPP["Platform App (FastAPI :8080)<br/><i>auth + orchestrator + registration routes</i>"]
end
subgraph Data["Data Services"]
LPG["PostgreSQL 16 + pgvector :5432<br/><i>Pre-seeded: 2 tenants, subscriptions,<br/>API keys, sample org knowledge</i>"]
LREDIS["Redis 7 :6379<br/><i>Cache + Streams queues</i>"]
end
subgraph Storage["Storage"]
LMINIO["MinIO :9000 / Console :9001<br/><i>Pre-created tenant buckets</i>"]
LZOT["Zot OCI Registry :5000<br/><i>Accepts plugin image pushes</i>"]
end
subgraph Observability2["Observability"]
LPROM["Prometheus :9090"]
LGRAFANA["Grafana :3000<br/><i>Pre-built plugin dashboards</i>"]
LLOKI["Loki :3100"]
end
subgraph Plugin2["Developer Plugin (hot-reload)"]
LPLUGIN["my_plugin.py<br/><i>Reloads on file save</i>"]
end
end
LAPP --> LPG
LAPP --> LREDIS
LAPP --> LMINIO
LPLUGIN -->|"POST /internal/registry/register"| LAPP
LPROM --> LGRAFANA
LLOKI --> LGRAFANA
Test API keys:
<YOUR_STRIPE_TEST_KEY_1>—test-tenant-1(Professional tier)<YOUR_STRIPE_TEST_KEY_2>—test-tenant-2(Starter tier)
20.3 Plugin Development Flow
flowchart LR
SCAFFOLD["aau plugin new my-plugin<br/><i>Scaffolds: my_plugin.py, tests/, Dockerfile, manifest.json</i>"]
--> CODE["Implement LangGraph nodes<br/><i>extend BasePlugin, define GraphState</i>"]
--> DEV["aau plugin dev<br/><i>docker compose up + hot-reload</i>"]
--> TEST["aau plugin test<br/><i>PluginHarness: tokens, scopes, schema</i>"]
--> PUBLISH["aau plugin publish<br/><i>build → trivy → cosign → Zot → register</i>"]
--> LIVE2["Plugin live in registry"]
20.4 Minimal Plugin Implementation
from aau_sdk import BasePlugin, register_plugin
from aau_sdk.envelopes import InputEnvelope, OutputEnvelope
from aau_sdk.graph import AAUStateGraph
from typing import TypedDict
from langchain_anthropic import ChatAnthropic
class GraphState(TypedDict):
task: str
result: str
@register_plugin
class MyPlugin(BasePlugin):
name = "my-plugin"
version = "1.0.0"
description = "Summarises documents."
capabilities = ["summarize documents", "extract key points"]
required_scopes = ["context:read", "store:write"]
required_secret_names = [] # declare any tenant secrets needed here
token_budget = 50_000
def initialize(self):
# SDK wraps client with token budget tracker automatically
self.llm = self.sdk.get_llm(model="claude-sonnet-4-6")
self.graph = self._build_graph()
def _build_graph(self):
graph = AAUStateGraph(GraphState)
graph.add_node("summarize", self._summarize_node)
graph.set_entry_point("summarize")
graph.set_finish_point("summarize")
return graph.compile()
def _summarize_node(self, state: GraphState) -> GraphState:
# plugin_config comes from tenant_plugin_config table
style = self.plugin_config.get("summary_style", "concise")
response = self.llm.invoke(f"Summarise in {style} style: {state['task']}")
return {"result": response.content}
def execute(self, input: InputEnvelope) -> OutputEnvelope:
final_state = self.graph.invoke({"task": input.task_description})
return OutputEnvelope(
task_id=input.task_id,
status="success",
result=final_state["result"],
confidence=0.9,
)
# SDK validates OutputEnvelope against in-process schema cache before returning
20.5 Local Test Harness
from aau_sdk.testing import PluginHarness
from my_plugin import MyPlugin
harness = PluginHarness(MyPlugin())
result = harness.run(
task="Summarise this architecture document",
tenant_id="test-tenant-1",
plugin_config={"summary_style": "detailed"},
attachments=["doc.pdf"],
)
assert result.status == "success"
assert result.confidence >= 0.7
harness.assert_tokens_within_budget()
harness.assert_no_cross_tenant_calls()
harness.assert_output_schema_valid()
21. Plugin Example — Code Review Pro (Detailed)
21.1 Internal LangGraph Agent Graph
flowchart TD
INPUT["InputEnvelope<br/><i>PR URL + context.secrets.github_token<br/>+ context.plugin_config.coding_standards</i>"]
--> FETCH["Diff Fetcher Node<br/><i>GitHub API (token from context.secrets) → parsed diff</i>"]
--> SPLIT["File Splitter Node<br/><i>Groups files by type/module</i>"]
--> PARALLEL2["Parallel Branch (LangGraph Send API)"]
PARALLEL2 --> SEC["Security Node<br/><i>OWASP + dependency vulns + secret detection</i>"]
PARALLEL2 --> STYLE2["Style Node<br/><i>plugin_config.coding_standards applied</i>"]
PARALLEL2 --> LOGIC2["Logic Node<br/><i>Bug detection, edge cases, race conditions</i>"]
PARALLEL2 --> PERF["Performance Node<br/><i>N+1 queries, memory leaks, complexity</i>"]
SEC --> MERGE3["Result Merger Node"]
STYLE2 --> MERGE3
LOGIC2 --> MERGE3
PERF --> MERGE3
MERGE3 --> PRIORITY2["Priority Ranker Node<br/><i>Critical → High → Medium → Low</i>"]
--> SUMMARY2["Summary Node<br/><i>Executive summary + inline comments</i>"]
--> OUTPUT2["OutputEnvelope<br/><i>Review report (MinIO URL) + confidence</i>"]
21.2 Agent Node Responsibilities
| Node | Input | Output | Tools |
|---|---|---|---|
| Diff Fetcher | PR URL | Parsed diff | GitHub API (context.secrets.github_token) |
| File Splitter | Diff | File batches | Internal logic |
| Security | File batch | Security findings | OWASP rules, Trivy |
| Style | File batch | Style violations | plugin_config.coding_standards |
| Logic | File batch | Bug candidates | claude-sonnet-4-6 |
| Performance | File batch | Perf concerns | Complexity analyser |
| Priority Ranker | All findings | Prioritised list | Scoring model |
| Summary | Prioritised findings | Review report | claude-sonnet-4-6 |
22. Data Storage Architecture
graph TB
subgraph DataStores["Data Layer"]
subgraph Primary["Primary Stores"]
PG4["PostgreSQL 16<br/><i>tenants, users, api_keys, tenant_secrets,<br/>plugins, plugin_versions, envelope_schemas,<br/>subscriptions, tenant_plugin_config,<br/>tasks, webhooks, execution_events</i>"]
PGV2["pgvector (HNSW index)<br/><i>org_knowledge_chunks<br/>scoped by tenant_id</i>"]
REDIS3["Redis 7<br/><i>Auth cache, plugin registry cache,<br/>circuit breakers, route cache,<br/>rate limit counters, cancellation flags<br/>+ Streams: tasks / summarisation / webhooks</i>"]
end
subgraph Storage2["Storage"]
MINIO3["MinIO<br/><i>/{tenant_id}/{plugin_name}/{task_id}/ — artifacts</i>"]
ZOT2["Zot OCI Registry<br/><i>Plugin container images</i>"]
end
subgraph PlatformSecrets["Platform Credentials"]
K8SSEC2["Kubernetes Secrets<br/><i>LLM API key, embedding API key,<br/>pgcrypto ENCRYPT_KEY</i>"]
end
end
APP2["Platform App"] --> PG4
APP2 --> PGV2
APP2 --> REDIS3
APP2 --> K8SSEC2
PLUGINS3["Plugins"] --> MINIO3
WBILLING2["Billing Worker"] --> PG4
WDELIVER2["Webhook Worker"] --> REDIS3
WDELIVER2 --> PG4
DEPLOY2["Plugin Deployments"] --> ZOT2
Data retention & GDPR:
| Data | Retention | GDPR erasure |
|---|---|---|
execution_events | 13 months | Pseudonymise user_id within 30 days |
| MinIO artifacts | 90 days (tenant-configurable) | Purge within 30 days |
tasks.input / tasks.result | 90 days | Purge within 30 days |
users row | Until account deleted | Hard delete (cascade) |
| Billing aggregates | 7 years (accounting) | Not erasable — no PII in aggregates |
GDPR Erasure Flow
flowchart TD
REQUEST["POST /v1/gdpr/erasure-request {user_id}"]
--> VALIDATE2["Verify requester is user or tenant admin"]
--> STREAM2["XADD stream:erasure {user_id, tenant_id}"]
--> ACK["202 Accepted — SLA: 30 days"]
STREAM2 --> WORKER2["Erasure Worker (Redis Streams consumer)"]
WORKER2 --> TASKS_PURGE["Nullify tasks.input, tasks.result WHERE user_id=$1"]
WORKER2 --> EVENTS_PSEUDO["Pseudonymise execution_events.user_id"]
WORKER2 --> ARTIFACTS_DEL["DELETE MinIO: /{tenant_id}/.../{user_id}/"]
WORKER2 --> HISTORY_DEL["DEL Redis ctx:{tenant_id}:{user_id}:*"]
WORKER2 --> USER_DEL["DELETE users WHERE id=$1 (cascades to api_keys)"]
WORKER2 --> AUDIT["Immutable erasure audit log (no PII)"]
23. Deployment Architecture
graph TB
subgraph K8s["Kubernetes Cluster (cloud-agnostic, plain Deployments + PVCs)"]
subgraph AppLayer2["Application (single replica — v1)"]
APP3["Platform App<br/><i>FastAPI + Uvicorn :8080<br/>auth, rate limiting, orchestrator,<br/>registration routes</i>"]
WWEBHOOK["Webhook Delivery Worker<br/><i>Redis Streams consumer</i>"]
WBILL["Billing Aggregation Worker<br/><i>Scheduled cron</i>"]
end
subgraph PluginPods2["Plugin Pods (HPA: 1–10 replicas per plugin)"]
PP4["Code Review Pro"]
PP5["Customer Support AI"]
PPN2["Plugin N"]
end
subgraph DataLayer["Data Services (plain Deployments + PVCs)"]
PG5["PostgreSQL 16<br/><i>Deployment + PVC</i>"]
REDIS4["Redis 7<br/><i>Deployment + PVC</i>"]
MINIO4["MinIO<br/><i>Deployment + PVC</i>"]
ZOT3["Zot OCI Registry<br/><i>Deployment + PVC</i>"]
end
subgraph Obs["Observability"]
PROM2["Prometheus"]
GRAF2["Grafana (Metrics + Logs)"]
LOKI3["Loki"]
end
end
subgraph External3["External"]
LB2["Ingress Controller (nginx)<br/><i>TLS termination</i>"]
CDN2["CDN — Marketplace UI"]
end
LB2 --> APP3
CDN2 --> APP3
APP3 --> PluginPods2
PluginPods2 -->|"POST /internal/registry/register"| APP3
APP3 -->|"Redis pub/sub"| APP3
PROM2 --> GRAF2
LOKI3 --> GRAF2
Key deployment principles:
- Cloud-agnostic: No cloud-provider-specific services. Plain Kubernetes
Deployment+PersistentVolumeClaimfor all stateful services. No operators required at single-replica scale. - One application, two workers:
Platform Apphandles all HTTP traffic.Webhook Delivery WorkerandBilling Aggregation Workerare separate pods consuming from Redis Streams and the scheduler respectively. - Plugins self-register: Plugin pods call
POST /internal/registry/registeron startup. The app writes to PostgreSQL and Redis. No file system watching. - Zero-downtime plugin updates: Kubernetes rolling deployment + graceful in-flight request draining.
- v2 HA path: Add a PostgreSQL streaming replica, Redis Sentinel, and increase replicas on the app. No architectural changes needed — plain Deployments scale horizontally.
24. Observability
24.1 Metrics (Prometheus + Grafana)
| Metric | Type | Labels | Purpose |
|---|---|---|---|
aau_router_confidence | Histogram | method | Routing quality over time |
aau_router_method_total | Counter | method | LLM fallback rate (alert > 10%) |
aau_plugin_execution_duration_seconds | Histogram | plugin, status | Latency per plugin |
aau_plugin_token_usage_total | Counter | plugin, model | Token cost attribution |
aau_plugin_budget_exceeded_total | Counter | plugin | Overage events |
aau_circuit_breaker_state | Gauge | plugin | 0=closed, 1=half-open, 2=open |
aau_task_stream_depth | Gauge | plugin | Redis Stream pending count |
aau_rate_limit_hits_total | Counter | tenant_id | Throttling events |
aau_webhook_delivery_failures_total | Counter | tenant_id | Delivery health |
aau_api_key_auth_latency_seconds | Histogram | cache_hit | Auth path performance |
24.2 Alerting Pipeline
flowchart TD
PROM3["Prometheus (scrapes every 15s)"]
--> RULES["PrometheusRule alerts"]
RULES --> A1["circuit_breaker_state == 2 → plugin circuit open"]
RULES --> A2["router_method{llm} / total > 0.10 → LLM routing rate high"]
RULES --> A3["task_stream_depth > 500 → queue backing up"]
RULES --> A4["execution_duration p99 > 30s → latency SLO breach"]
RULES --> A5["auth_latency p99 > 500ms → auth path slow"]
A1 --> ALERTMANAGER["Alertmanager"]
A2 --> ALERTMANAGER
A3 --> ALERTMANAGER
A4 --> ALERTMANAGER
A5 --> ALERTMANAGER
ALERTMANAGER --> SLACK2["Slack #alerts"]
ALERTMANAGER --> PAGERDUTY["PagerDuty (P1 only)"]
24.3 Structured Logging (Loki)
Every log line is JSON with mandatory fields:
{
"timestamp": "2026-02-28T12:00:00Z",
"level": "info",
"service": "platform-app",
"task_id": "task_abc123",
"tenant_id": "tenant_xyz",
"plugin_name": "code-review",
"node": "security_agent",
"routing_method": "embedding",
"duration_ms": 342,
"tokens_used": 1240,
"message": "Node completed"
}
PII policy: Never log email, API key plaintext, raw user input, or tasks.input content. task_id is the primary correlation key.
24.4 SLOs
| SLO | Target | Alert threshold |
|---|---|---|
| Router p99 latency (embedding path) | < 200ms | > 400ms |
| Plugin execution success rate | > 99% | < 98% |
| Task delivery (sync) p95 | < 5s | > 10s |
| LLM fallback routing rate | < 5% | > 10% |
| Webhook delivery success rate | > 99.5% | < 99% |
| Auth p99 latency | < 100ms | > 500ms |
25. Non-Functional Requirements
| Requirement | Target | Approach |
|---|---|---|
| Availability | 99.9% uptime | Circuit breakers per plugin; HA as v2 milestone (plain Deployment scale-out) |
| Scalability | 10K concurrent executions | Plugin HPA, Redis Streams for async backpressure |
| Latency | p95 < 5s sync tasks | Embedding routing (< 200ms), route cache, async summarisation |
| Isolation | Zero cross-tenant impact | tenant_id enforced at query layer, Redis namespacing, MinIO path isolation, NetworkPolicy |
| Security | SOC2-ready | SHA-256 key hashing, pgcrypto tenant secrets, K8s Secrets platform creds, cosign plugin signing, scope enforcement |
| Observability | Full request tracing | task_id correlation, Prometheus + Grafana + Loki, 6 SLOs with alert thresholds |
| Extensibility | Local dev in < 1 hour | Plugin SDK, Docker Compose full stack, aau plugin new scaffolding |
| Versioning | No breaking surprises | envelope_schemas table + in-process cache, version adapters, 90-day deprecation window |
| Data compliance | GDPR-ready | Tenant isolation, 30-day erasure SLA, erasure worker via Redis Streams |
| Operational simplicity | Minimal ops burden | 1 app + 2 workers, plain K8s Deployments, Redis Streams reuses existing Redis |
| Router cost | LLM calls < 5% | Embedding-first, 5-min route cache, keyword fallback, 2-threshold routing |
26. Summary
The AI Agents Universe is a platform where:
- For customers: Subscribe to the plugins you need. One API key, one interface. Long-running tasks return a
task_idimmediately — poll or receive a signed webhook. - For developers:
aau plugin newscaffolds a LangGraph plugin.aau plugin devstarts a full local Docker Compose stack with hot-reload.aau plugin publishbuilds, scans with Trivy, signs with cosign, and registers automatically. The entire contract is one Python class extendingBasePlugin. - For the business: Token-level usage billing. Invite-only plugin ecosystem at launch; public marketplace in v2. Enterprise custom builds via direct engagement.
- For operators: Self-hosted Kubernetes with no proprietary managed services and no Kubernetes operators — plain Deployments and PVCs throughout. One application pod, two worker pods, and five data services. Redis does double duty as cache and async queue via Streams, eliminating RabbitMQ. PostgreSQL does double duty as operational store and vector store via pgvector, eliminating a separate vector database. Secrets are stored encrypted in PostgreSQL using pgcrypto and in Kubernetes Secrets — HashiCorp Vault is a v2 addition when untrusted third-party plugins arrive.
The simplification principle: every infrastructure component must earn its place. When two use cases can be served by one tool already in the stack, that is always preferred over adding a new service.