Engineering Notes

Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Conceptual AI Agents Universe — System Design Document

28 Feb 2026

PART 1: HIGH-LEVEL DESIGN


1. Vision & Problem Statement

Organizations need AI automation across multiple business functions — code review, customer support, QA, documentation, data analysis, and more. Today, each of these is a siloed solution with its own interface, deployment, and learning curve.

AI Agents Universe solves this by providing a single platform where each AI capability is a self-contained, subscription-based plugin. One interface, one orchestrator, unlimited capabilities — added without touching existing code.

Core promise: A customer subscribes to the plugins they need. A developer publishes a plugin. The universe expands.


2. Technology Stack

All technology decisions are fixed here. These are not suggestions — they are the choices the platform is built on.

LayerTechnologyRationale
API frameworkFastAPI + UvicornAsync-native, auto-generates OpenAPI docs, strong AI/ML ecosystem
Agent frameworkLangGraphStateful graph-based agents, explicit control flow, production-grade for multi-step workflows
Default LLMclaude-sonnet-4-6Strong reasoning, good cost/performance balance for agentic tasks
Embedding modeltext-embedding-3-small (OpenAI)Hosted, fast, ~$0.02/1M tokens, sufficient for capability matching
Primary databasePostgreSQL 16 + SQLAlchemy 2 (async) + AlembicAll operational data; Alembic manages schema migrations
Vector storepgvector extension (same PostgreSQL instance)RAG for org knowledge; no separate service up to ~1M vectors per tenant
CacheRedis 7Auth cache, plugin registry, circuit breakers, route cache, rate limits, Redis Streams queues
Async queuesRedis StreamsTask execution, background summarisation, webhook delivery — reuses Redis, no extra service
Object storageMinIO (self-hosted, S3-compatible)Artifacts and reports; cloud-agnostic
Container registryZot (self-hosted OCI registry)Plugin container images; MinIO is S3, not an OCI registry
Secrets (v1)Kubernetes Secrets + pgcryptoPlatform credentials in K8s Secrets; tenant tokens AES-256 encrypted in PostgreSQL
Rate limitingslowapi (FastAPI middleware)Sliding window; plugs into FastAPI route decorators
MetricsPrometheus + GrafanaKubernetes-native metrics and dashboards
LogsLoki + GrafanaStructured log aggregation alongside metrics in Grafana
DeploymentSelf-hosted Kubernetes (cloud-agnostic)Plain Deployments + PVCs; no operators needed at v1 single-replica scale
Local devDocker ComposeFull local stack: app, PostgreSQL, Redis, MinIO, Zot

v1 scope: Single replicas for all services (no explicit HA). SSE streaming deferred to v2. Third-party plugins are invite-only at launch. Vault introduced in v2 when untrusted third-party plugins require isolated secret management and audit trails.

What was simplified vs. earlier drafts:

Removed / replacedWithWhy
HashiCorp VaultK8s Secrets + pgcryptoNo untrusted plugins in v1; Vault overhead not justified
RabbitMQRedis StreamsAlready have Redis; three queue use cases don’t need AMQP
5 separate microservices1 FastAPI app + 2 background workersRegistration, billing, notification are not services at this scale
5 Kubernetes operatorsPlain Deployments + PVCsOperators are for HA clusters; v1 is single-replica
Custom Schema Registry serviceenvelope_schemas table + in-process cacheA service for one table is over-engineering
MinIO as image storeZot (OCI registry)Containerd cannot pull from S3; OCI images need an OCI registry
bcrypt for API key hashingSHA-256bcrypt is for passwords; API keys are long random strings; SHA-256 is fast and sufficient
3 routing confidence thresholds2 configurable thresholdsSimpler to calibrate; removes an undefined dead zone between 0.45–0.65
Gateway + Orchestrator (2 pods)Single FastAPI appInternal HTTP hop for nothing; auth is middleware, not a separate service

3. System Overview

flowchart TB
    subgraph External["External Channels"]
        direction LR
        CLI --- API --- WEB --- SLACK --- WEBHOOK
    end

    subgraph App["Platform App (single FastAPI + Uvicorn process)"]
        AUTH["API Key Auth<br/><i>SHA-256 verify → tenant_id + user_id</i>"]
        RATE["Rate Limiter<br/><i>slowapi sliding window</i>"]
        ROUTER["Intent Router<br/><i>text-embedding-3-small + claude-sonnet-4-6 fallback</i>"]
        REGISTRY["Plugin Registry<br/><i>Redis cache / PostgreSQL source of truth</i>"]
        CONTEXT["Shared Context Store<br/><i>History (windowed) + RAG org knowledge (pgvector)</i>"]
        SEQUENCER["Task Sequencer<br/><i>Single & multi-plugin plans, DFS dep check</i>"]
        FAILURE["Failure Manager<br/><i>Retry, fallback, circuit breaker</i>"]

        AUTH --> RATE --> ROUTER
        ROUTER --> REGISTRY
        ROUTER --> CONTEXT
        ROUTER --> SEQUENCER
        SEQUENCER --> FAILURE
    end

    subgraph Workers["Background Workers"]
        WDELIVER["Webhook Delivery Worker<br/><i>Redis Streams consumer</i>"]
        WBILLING["Billing Aggregation Worker<br/><i>Scheduled cron</i>"]
    end

    subgraph Subscription["Subscription Layer (in-process)"]
        ACCESS["Access Control"]
        METER["Usage Metering"]
        QUOTA["Quota + Token Budget"]
        ACCESS --> METER --> QUOTA
    end

    subgraph PluginUniverse["Plugin Universe (LangGraph agents)"]
        direction LR
        P1["Code Review Pro"] --- P2["Customer Support AI"] --- P3["QA Transformer"]
    end

    subgraph Infra["Shared Infrastructure"]
        REDIS2["Redis 7<br/><i>Cache + Streams queues</i>"]
        PG["PostgreSQL 16 + pgvector<br/><i>All operational data + org knowledge RAG</i>"]
        MINIO["MinIO<br/><i>Artifacts + reports</i>"]
        ZOT["Zot OCI Registry<br/><i>Plugin container images</i>"]
        PROM["Prometheus + Grafana + Loki"]
        K8SSEC["Kubernetes Secrets<br/><i>Platform credentials</i>"]
    end

    External --> App
    App --> Subscription
    Subscription --> PluginUniverse
    App --> Workers
    App --> Infra
    Workers --> Infra
    PluginUniverse --> Infra
  

4. Core Architecture Layers

Layer 1: Interface, Auth & Rate Limiting

The platform is a single FastAPI application. Auth middleware and rate limiting run in-process before route handlers — no separate gateway pod, no internal HTTP hop, no serialisation of auth context between services.

API Key Authentication

sequenceDiagram
    participant Client
    participant App as FastAPI App
    participant Cache as Redis
    participant DB as PostgreSQL

    Client->>App: Request + Authorization: Bearer sk_live_a1b2c3d4...
    App->>App: Extract lookup prefix<br/>(first 8 chars of random part: "a1b2c3d4")
    App->>Cache: GET auth:a1b2c3d4 (5-min TTL)

    alt Cache hit
        Cache-->>App: {tenant_id, user_id, scopes}
    else Cache miss
        App->>DB: SELECT * FROM api_keys WHERE key_prefix = 'a1b2c3d4'
        DB-->>App: {key_hash, tenant_id, user_id, scopes, expires_at, revoked_at}
        App->>App: SHA-256(raw_key) == key_hash?
        alt Valid + not expired + not revoked
            App->>Cache: SET auth:a1b2c3d4 {tenant_id, user_id, scopes} EX 300
        else Invalid / expired / revoked
            App-->>Client: 401 AUTH_INVALID
        end
    end

    App->>App: Attach (tenant_id, user_id, scopes) to request state
    App-->>Client: Continue to handler
  

Key format: sk_live_<32-char random hex> for production; sk_test_<32-char random hex> for sandbox. The lookup prefix is the first 8 characters of the random part (the hex after sk_live_), not sk_live_ itself which is identical on every key. SHA-256 is used for hashing — API keys are long random strings, not passwords; SHA-256 runs in microseconds and is cryptographically sufficient for this use case. bcrypt’s brute-force resistance adds nothing over a 32-byte random value.

API Key Lifecycle

flowchart LR
    CREATE["POST /v1/api-keys<br/><i>Generate random key<br/>SHA-256 hash → store<br/>Return plaintext once only</i>"]
    --> ACTIVE["Active<br/><i>5-min Redis auth cache</i>"]
    ACTIVE -->|"Manual rotation"| ROTATE["New key issued<br/><i>Old key valid for 24h grace period</i>"]
    ROTATE --> ACTIVE
    ACTIVE -->|"DELETE /v1/api-keys/{id}"| REVOKED["Revoked<br/><i>revoked_at set<br/>Cache invalidated immediately</i>"]
    ACTIVE -->|"expires_at reached"| EXPIRED["Expired<br/><i>Rejected on next use</i>"]
    REVOKED --> PURGED["Purged after 90 days"]
    EXPIRED --> PURGED
  

Rate Limiting

slowapi runs as FastAPI middleware in the same process. Every response includes standard rate limit headers.

flowchart TD
    REQ["Incoming Request<br/><i>tenant_id from auth state</i>"]
    --> KEY["Build limit key<br/><i>rate:{tenant_id}:{plugin_name}:{window_start}</i>"]
    --> INCR["Redis INCR + EXPIREAT<br/><i>atomic sliding window</i>"]
    --> CHECK{"Count ≤ limit?"}

    CHECK -->|"Yes"| HEADERS["Set response headers:<br/>X-RateLimit-Limit<br/>X-RateLimit-Remaining<br/>X-RateLimit-Reset"]
    CHECK -->|"No"| REJECT["429 Too Many Requests<br/>Retry-After: {seconds}<br/>error_code: RATE_LIMITED"]

    HEADERS --> PROCEED["Forward to route handler"]
  

Default limits (configurable per tenant):

TierRequests/min per plugin
Starter20
Professional100
EnterpriseCustom

Layer 2: Orchestrator (same process)

Intent Router, Plugin Registry, Context Store, Task Sequencer, Failure Manager — all in-process. See Section 14 for deep-dive.

Layer 3: Subscription Layer (in-process)

Access control, metering, quota enforcement. The subscription check occurs inside the intent router before any plan is built. See Section 17 for deep-dive.

Layer 4: Plugin Universe

Each plugin is a LangGraph StateGraph. Fully stateless per execution. See Section 10 for deep-dive.

Layer 5: Shared Infrastructure


5. Request Lifecycle

sequenceDiagram
    participant User
    participant App as FastAPI App (auth + orchestrator)
    participant Router as Intent Router
    participant Registry as Plugin Registry (Redis/PG)
    participant Sub as Subscription Layer
    participant Sequencer as Task Sequencer
    participant Plugin as Target Plugin (LangGraph)
    participant Agents as Internal Agent Nodes
    participant Store as MinIO

    User->>App: POST /v1/tasks {command, attachments}
    App->>App: SHA-256 verify API key → tenant_id, user_id
    App->>App: slowapi rate limit check

    App->>Router: Forward command + tenant context
    Router->>Router: Embed command (text-embedding-3-small)
    Router->>Registry: Score against capability embeddings
    Registry-->>Router: Matching plugins + confidence scores

    Router->>Sub: Check access for tenant → plugin
    Sub-->>Router: Authorized / Denied / Quota exceeded

    alt Authorized (sync)
        Router->>Sequencer: Build execution plan
        Sequencer->>Plugin: Send InputEnvelope

        Plugin->>Agents: Invoke LangGraph StateGraph
        Agents->>Agents: Node 1 → Node 2 → Node 3
        Agents->>Store: Save artifacts (MinIO)
        Agents-->>Plugin: Final graph state

        Plugin-->>Sequencer: OutputEnvelope (SDK-validated)
        Sequencer-->>Router: Final response
        Router-->>App: Response + artifact URLs
        App-->>User: 200 OK {task_id, status, result}

        Sub->>Sub: Write execution_events row
    else Authorized (async)
        App->>App: XADD tasks:{plugin_name} (Redis Stream)
        App-->>User: 202 Accepted {task_id}
    else Not subscribed
        App-->>User: 403 PLUGIN_NOT_SUBSCRIBED {upgrade_url}
    else Quota exceeded
        App-->>User: 429 QUOTA_EXCEEDED {quota_warning}
    end
  

6. REST API Specification

All endpoints require Authorization: Bearer <api_key>.

6.1 Task Endpoints

MethodPathDescriptionResponse
POST/v1/tasksSubmit a task202 {task_id} or 200 {result}
GET/v1/tasks/{task_id}Poll task status200 {task}
DELETE/v1/tasks/{task_id}Cancel a running task202 {task_id, status: cancelling}
GET/v1/tasksList tasks (paginated)200 {tasks[], next_cursor}

6.2 Plugin Endpoints

MethodPathDescriptionResponse
GET/v1/pluginsList available plugins for tenant200 {plugins[]}
GET/v1/plugins/{name}Plugin detail + capabilities200 {plugin}
GET/v1/plugins/{name}/healthPlugin health status200 {status, components}

6.3 Auth Endpoints

MethodPathDescriptionResponse
POST/v1/api-keysCreate API key201 {id, key (shown once), prefix}
GET/v1/api-keysList API keys for tenant200 {keys[]}
DELETE/v1/api-keys/{id}Revoke API key204

6.4 Registration Endpoints (internal — plugin pods only)

MethodPathDescriptionResponse
POST/internal/registry/registerPlugin self-registers on startup201 {plugin_id}
DELETE/internal/registry/{name}/{version}Plugin deregisters on shutdown204
POST/internal/registry/{name}/healthPlugin reports health200

6.5 Webhook Endpoints

MethodPathDescriptionResponse
POST/v1/webhooksRegister a webhook endpoint201 {webhook_id}
DELETE/v1/webhooks/{webhook_id}Remove a webhook204
GET/v1/webhooks/{webhook_id}/deliveriesView delivery attempts200 {deliveries[]}

6.6 Standard Error Format

{
  "error": {
    "code": "PLUGIN_NOT_SUBSCRIBED",
    "message": "Your plan does not include the code-review plugin.",
    "upgrade_url": "https://app.aau.io/billing/upgrade",
    "task_id": "task_abc123",
    "request_id": "req_xyz789"
  }
}

6.7 Error Code Taxonomy

CodeHTTPCategoryRetry?
PLUGIN_NOT_SUBSCRIBED403PermanentNo
QUOTA_EXCEEDED429PermanentNo
RATE_LIMITED429TransientYes (Retry-After)
PLUGIN_UNAVAILABLE503TransientYes
PLUGIN_TIMEOUT504TransientYes
CIRCULAR_DEPENDENCY422PermanentNo
INVALID_INPUT422PermanentNo
TOKEN_BUDGET_EXCEEDED200 (partial)OverageBilled, no retry
CONTENT_POLICY_VIOLATION422PermanentNo
CONTEXT_WINDOW_OVERFLOW200 (retry)TransientYes (auto-truncated)
DEPENDENCY_FAILED200 (partial)DependsPartial result returned
AUTH_INVALID401PermanentNo

7. Subscription & Commercial Model

7.1 Subscription Tiers

TierPlugins IncludedCross-Plugin ChainingCustom PluginsPricing Model
Starter1 plugin of choiceNoNoFlat monthly per plugin
ProfessionalUp to 5 pluginsYesNoBundle discount + per-execution overage
EnterpriseUnlimitedYesYes (custom-built)Annual contract, volume pricing

Subscription enforcement position: The intent router checks the tenant’s subscription immediately after scoring plugins — before building an execution plan.

7.2 Plugin Marketplace

Third-party plugins are invite-only at launch. Marketplace infrastructure is a v2 deliverable.

graph LR
    subgraph Marketplace["Plugin Marketplace (v1 — first-party only)"]
        CATALOG["Plugin Catalog<br/><i>Browse & search</i>"]
        DETAIL["Product Page<br/><i>Capabilities, use cases, pricing</i>"]
        SANDBOX["Sandbox Mode<br/><i>10 free executions</i>"]
        SUB["Subscribe<br/><i>Activate for tenant</i>"]
        CATALOG --> DETAIL --> SANDBOX --> SUB
    end

    subgraph Revenue["Revenue Streams"]
        FIRST["First-Party Plugins"]
        CUSTOM["Enterprise Custom Builds"]
    end

    subgraph Billing["Billing Engine"]
        BASE["Base Subscription"]
        USAGE["Usage-Based Charges"]
        OVERAGE["Overage Billing"]
        BASE --> USAGE --> OVERAGE
    end

    Marketplace --> Revenue
    Revenue --> Billing
  

8. Sample Plugin Catalog

PluginDomainCapabilitiesAgents Inside
Code Review ProEngineeringPR reviews, security scanning, style checks, refactor suggestionsDiff Analyzer, Security Scanner, Style Checker, Summary Writer
Customer Support AISupportTicket classification, resolution, response drafting, escalationClassifier, Resolution Engine, Response Drafter, Escalation Agent
QA TransformerQualityTest case generation, coverage validation, gap analysisCode Analyzer, Test Generator, Coverage Validator
Ticket ProcessorOperationsTriage, assignment, status tracking, SLA monitoringTriager, Assigner, Status Tracker, SLA Monitor
Doc GeneratorContentPRDs, tech docs, reports from conversationsExtractor, Structurer, Writer, Reviewer
Data AnalystAnalyticsSQL generation, visualization, insight surfacingQuery Builder, Visualizer, Insight Generator, Narrator

PART 2: LOW-LEVEL DESIGN


9. Database Schema

9.1 Entity Relationship Diagram

erDiagram
    tenants {
        uuid id PK
        text name
        text plan_tier
        jsonb settings
        timestamptz created_at
    }
    users {
        uuid id PK
        uuid tenant_id FK
        text email
        text role
        timestamptz created_at
    }
    api_keys {
        uuid id PK
        uuid tenant_id FK
        uuid user_id FK
        text key_hash
        text key_prefix
        text[] scopes
        timestamptz expires_at
        timestamptz revoked_at
    }
    tenant_secrets {
        uuid id PK
        uuid tenant_id FK
        text plugin_name
        text secret_name
        bytea secret_value
        timestamptz created_at
    }
    plugins {
        uuid id PK
        text name
        text category
        text owner_type
        text status
        timestamptz deprecated_at
        timestamptz sunset_at
    }
    plugin_versions {
        uuid id PK
        uuid plugin_id FK
        text version
        jsonb manifest
        text envelope_version
        text status
        timestamptz registered_at
    }
    envelope_schemas {
        uuid id PK
        text version
        jsonb input_schema
        jsonb output_schema
        timestamptz deprecated_at
        timestamptz created_at
    }
    subscriptions {
        uuid id PK
        uuid tenant_id FK
        uuid plugin_id FK
        text tier
        int quota_limit
        bool overage_enabled
        timestamptz started_at
        timestamptz ended_at
    }
    tenant_plugin_config {
        uuid id PK
        uuid tenant_id FK
        uuid plugin_id FK
        jsonb config
        timestamptz updated_at
    }
    tasks {
        uuid id PK
        uuid tenant_id FK
        uuid user_id FK
        text plugin_name
        text status
        jsonb input
        jsonb result
        int tokens_used
        numeric cost_estimate
        text routing_method
        timestamptz created_at
        timestamptz completed_at
    }
    webhooks {
        uuid id PK
        uuid tenant_id FK
        text url
        text secret_hash
        text[] events
        bool active
    }
    execution_events {
        uuid id PK
        uuid tenant_id FK
        text plugin_name
        uuid task_id FK
        text status
        int tokens_used
        int latency_ms
        numeric cost_estimate
        text routing_method
        bool billed
        timestamptz timestamp
    }
    org_knowledge_chunks {
        uuid id PK
        uuid tenant_id FK
        text source_url
        text content
        vector embedding
        jsonb metadata
    }

    tenants ||--o{ users : "has"
    tenants ||--o{ api_keys : "has"
    tenants ||--o{ subscriptions : "has"
    tenants ||--o{ tenant_plugin_config : "configures"
    tenants ||--o{ tasks : "submits"
    tenants ||--o{ webhooks : "registers"
    tenants ||--o{ execution_events : "generates"
    tenants ||--o{ org_knowledge_chunks : "owns"
    tenants ||--o{ tenant_secrets : "stores"
    users ||--o{ api_keys : "owns"
    users ||--o{ tasks : "submits"
    plugins ||--o{ plugin_versions : "has"
    plugins ||--o{ subscriptions : "subscribed via"
    plugins ||--o{ tenant_plugin_config : "configured by"
    tasks ||--o{ execution_events : "produces"
  

9.2 PostgreSQL — Full DDL

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Tenants and users
CREATE TABLE tenants (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        TEXT NOT NULL,
    plan_tier   TEXT NOT NULL CHECK (plan_tier IN ('starter', 'professional', 'enterprise')),
    settings    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE users (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id   UUID NOT NULL REFERENCES tenants(id),
    email       TEXT NOT NULL UNIQUE,
    role        TEXT NOT NULL DEFAULT 'member',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- API keys — SHA-256 hashed, not bcrypt
CREATE TABLE api_keys (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id    UUID NOT NULL REFERENCES tenants(id),
    user_id      UUID REFERENCES users(id),   -- NULL = service account
    key_hash     TEXT NOT NULL UNIQUE,         -- SHA-256(raw_key), hex-encoded
    key_prefix   TEXT NOT NULL,               -- first 8 chars of random part for lookup
    scopes       TEXT[] NOT NULL DEFAULT '{}',
    description  TEXT,
    last_used_at TIMESTAMPTZ,
    expires_at   TIMESTAMPTZ,
    revoked_at   TIMESTAMPTZ,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Tenant-owned secrets: AES-256 via pgcrypto, replaces Vault for v1
-- Platform encryption key stored in Kubernetes Secret, injected as env var ENCRYPT_KEY
CREATE TABLE tenant_secrets (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id    UUID NOT NULL REFERENCES tenants(id),
    plugin_name  TEXT NOT NULL,
    secret_name  TEXT NOT NULL,               -- e.g. "github_token", "jira_api_key"
    secret_value BYTEA NOT NULL,              -- pgp_sym_encrypt(value, ENCRYPT_KEY)
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (tenant_id, plugin_name, secret_name)
);

-- Helper functions for encrypt/decrypt
-- encrypt: SELECT pgp_sym_encrypt('value', current_setting('app.encrypt_key'))
-- decrypt: SELECT pgp_sym_decrypt(secret_value, current_setting('app.encrypt_key'))

-- Plugin registry (durable source of truth — Redis is the cache)
CREATE TABLE plugins (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          TEXT NOT NULL UNIQUE,
    category      TEXT NOT NULL,
    owner_type    TEXT NOT NULL CHECK (owner_type IN ('first_party', 'enterprise_custom')),
    status        TEXT NOT NULL DEFAULT 'active'
                      CHECK (status IN ('active', 'deprecated', 'sunset', 'retired')),
    deprecated_at TIMESTAMPTZ,
    sunset_at     TIMESTAMPTZ,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE plugin_versions (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    plugin_id        UUID NOT NULL REFERENCES plugins(id),
    version          TEXT NOT NULL,
    manifest         JSONB NOT NULL,
    envelope_version TEXT NOT NULL DEFAULT '1.0',
    status           TEXT NOT NULL DEFAULT 'active',
    registered_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (plugin_id, version)
);

-- Envelope schema registry — a table, not a service
-- Cached in-process by orchestrator on startup; refreshed on NOTIFY envelope_schemas
CREATE TABLE envelope_schemas (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    version        TEXT NOT NULL UNIQUE,
    input_schema   JSONB NOT NULL,
    output_schema  JSONB NOT NULL,
    deprecated_at  TIMESTAMPTZ,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Subscriptions
CREATE TABLE subscriptions (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id       UUID NOT NULL REFERENCES tenants(id),
    plugin_id       UUID NOT NULL REFERENCES plugins(id),
    tier            TEXT NOT NULL,
    quota_limit     INTEGER,
    overage_enabled BOOLEAN NOT NULL DEFAULT false,
    started_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    ended_at        TIMESTAMPTZ,
    UNIQUE (tenant_id, plugin_id)
);

-- Tenant-specific plugin config
CREATE TABLE tenant_plugin_config (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id  UUID NOT NULL REFERENCES tenants(id),
    plugin_id  UUID NOT NULL REFERENCES plugins(id),
    config     JSONB NOT NULL DEFAULT '{}',
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (tenant_id, plugin_id)
);

-- Task tracking
CREATE TABLE tasks (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id      UUID NOT NULL REFERENCES tenants(id),
    user_id        UUID NOT NULL REFERENCES users(id),
    plugin_name    TEXT NOT NULL,
    status         TEXT NOT NULL DEFAULT 'pending'
                       CHECK (status IN
                           ('pending','queued','running','success','partial','failed','cancelled')),
    input          JSONB NOT NULL,
    result         JSONB,
    error          JSONB,
    tokens_used    INTEGER,
    cost_estimate  NUMERIC(10,6),
    routing_method TEXT,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at     TIMESTAMPTZ,
    completed_at   TIMESTAMPTZ
);

-- Webhooks
CREATE TABLE webhooks (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id   UUID NOT NULL REFERENCES tenants(id),
    url         TEXT NOT NULL,
    secret_hash TEXT NOT NULL,
    events      TEXT[] NOT NULL DEFAULT '{task.completed}',
    active      BOOLEAN NOT NULL DEFAULT true,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Metering (v1 in PostgreSQL; v2 upgrade to ClickHouse at billing scale)
CREATE TABLE execution_events (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id        UUID NOT NULL,
    plugin_name      TEXT NOT NULL,
    plugin_version   TEXT NOT NULL,
    task_id          UUID NOT NULL,
    status           TEXT NOT NULL,
    tokens_used      INTEGER,
    latency_ms       INTEGER,
    cost_estimate    NUMERIC(10,6),
    routing_method   TEXT,
    envelope_version TEXT,
    billed           BOOLEAN NOT NULL DEFAULT false,
    timestamp        TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX ON execution_events (tenant_id, timestamp);
CREATE INDEX ON execution_events (task_id);

-- pgvector: org knowledge RAG
CREATE TABLE org_knowledge_chunks (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id  UUID NOT NULL REFERENCES tenants(id),
    source_url TEXT,
    content    TEXT NOT NULL,
    embedding  vector(1536),              -- text-embedding-3-small output dimension
    metadata   JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- HNSW index: no minimum row requirement, no reindexing needed, better recall at v1 scale
CREATE INDEX ON org_knowledge_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

CREATE INDEX ON org_knowledge_chunks (tenant_id);

Why HNSW over IVFFlat: IVFFlat requires 3 × lists minimum rows (300 rows with lists=100) to produce useful results and needs periodic REINDEX as new rows are inserted. HNSW has no minimum, never needs reindexing, and has better recall at small-to-medium scale. Switch to IVFFlat only if memory pressure becomes a concern at large scale.


10. Plugin Architecture — Deep Dive

10.1 Plugin Internal Structure

graph TB
    subgraph PluginShell["Plugin Shell (Visible to Orchestrator)"]
        META["Identity & Manifest<br/><i>name, version, capabilities, schema</i>"]
        CONFIG["Configuration<br/><i>model, limits, token_budget, required_secret_names</i>"]
        LIFECYCLE["Lifecycle Hooks<br/><i>initialize · health_check · shutdown</i>"]
        EXECUTE["execute(InputEnvelope) → OutputEnvelope"]
    end

    subgraph PluginInternals["Plugin Internals — LangGraph StateGraph"]
        subgraph AgentGraph["StateGraph"]
            A1["Node: Task Decomposition"]
            A2["Node: Domain Processing"]
            A3["Node: Validation"]
            A4["Node: Synthesis & Output"]
            A1 -->|"GraphState"| A2
            A2 -->|"GraphState"| A3
            A3 -->|"GraphState"| A4
        end

        subgraph Budget["Token Budget Tracker (SDK)"]
            TB1["Track tokens per LLM call"]
            TB2["Emit warning at 80%"]
            TB3["Log overage at 100%+"]
        end

        subgraph Tools["Tools"]
            T1["External APIs<br/><i>credentials from InputEnvelope.context.secrets</i>"]
            T2["PostgreSQL queries (tenant-scoped)"]
            T3["MinIO artifact store"]
            T4["LLM calls (Anthropic SDK)"]
        end

        AgentGraph --> Budget
        AgentGraph --> Tools
    end

    EXECUTE --> PluginInternals
    PluginInternals -->|"OutputEnvelope (SDK-validated)"| EXECUTE
  

How plugins access credentials: The orchestrator resolves required_secret_names from tenant_secrets (decrypting with pgcrypto) and injects the resolved key-value pairs into InputEnvelope.context.secrets before calling execute(). Plugins never call the secrets store directly — the orchestrator is the only component that decrypts.

10.2 Plugin Interface Contract

Identity & Manifest:

FieldTypePurpose
namestringUnique identifier (e.g., "code-review")
versionstringSemantic version
descriptionstringHuman-readable summary
capabilitieslist[string]What it can do — embedded for routing
categorystringDomain tag
dependencieslist[PluginDep]{name, version_constraint}
input_schemadictJSON schema for expected input
envelope_versionstringWhich envelope version this plugin speaks
estimated_latencystringfast / medium / slow
required_scopeslist[string]Declared capability scopes
required_secret_nameslist[string]Secret names needed from tenant_secrets

Configuration:

FieldTypePurpose
modelstringLLM override (default: claude-sonnet-4-6)
temperaturefloatLLM temperature
max_retriesintRetry count before failing
timeout_secondsintMax execution time
token_budgetintMax tokens per execution

Lifecycle Methods:

MethodWhen CalledReturns
initialize()Once at registrationSets up connections, compiles LangGraph
health_check()Every 30sHealthStatus
execute(input)Per requestOutputEnvelope
shutdown()On deregistrationCleans up connections

Health check returns structured status:

@dataclass
class HealthStatus:
    status: Literal["healthy", "degraded", "unhealthy"]
    components: dict[str, str]  # {"llm": "healthy", "github_api": "degraded"}
    message: str | None = None

Three consecutive unhealthy responses trip the circuit breaker.


11. Standardized Envelopes

11.1 InputEnvelope

classDiagram
    class InputEnvelope {
        +string task_id
        +string task_description
        +string envelope_version
        +Context context
        +Metadata metadata
        +list~any~ attachments
    }

    class Context {
        +string user_id
        +string tenant_id
        +dict user_profile
        +list~dict~ conversation_history
        +list~dict~ org_knowledge_chunks
        +dict plugin_config
        +dict secrets
    }

    class Metadata {
        +string source_channel
        +string priority
        +string source_plugin
        +string correlation_id
        +datetime timestamp
    }

    InputEnvelope --> Context : context
    InputEnvelope --> Metadata : metadata
  
FieldTypeDescription
task_idstringUnique ID for tracking
task_descriptionstringNatural language task
envelope_versionstringSchema version (e.g., "1.0")
context.conversation_historylistLast 20 turns (sliding window)
context.org_knowledge_chunkslistRAG-retrieved chunks (top-k, tenant-scoped)
context.plugin_configdictTenant config from tenant_plugin_config
context.secretsdictDecrypted secrets from tenant_secrets (orchestrator-resolved, never stored)
metadata.correlation_idstringEnd-to-end trace ID
attachmentslist[any]Files, code snippets, URLs, PRs

11.2 OutputEnvelope

classDiagram
    class OutputEnvelope {
        +string task_id
        +string envelope_version
        +string status
        +string result
        +list~any~ artifacts
        +float confidence
        +string|null error
        +string|null error_code
        +list~string~ next_steps
        +string|null deprecation_warning
        +ExecutionMeta execution_meta
    }

    class ExecutionMeta {
        +int tokens_used
        +float latency_seconds
        +int agents_invoked
        +string model_used
        +float cost_estimate
        +string routing_method
        +bool budget_exceeded
    }

    OutputEnvelope --> ExecutionMeta : execution_meta
  

OutputEnvelope validation: The SDK validates every OutputEnvelope against the in-process schema cache (loaded from the envelope_schemas table) before returning to the orchestrator.


12. Envelope Versioning

12.1 Versioning Rules

12.2 Adapter Flow

flowchart LR
    ORCH["Orchestrator<br/><i>speaks v2</i>"]
    ADAPT["Version Adapter<br/><i>in-process dict lookup</i>"]
    PLUGIN["Plugin<br/><i>speaks v1</i>"]

    ORCH -->|"v2 InputEnvelope"| ADAPT
    ADAPT -->|"v1 InputEnvelope"| PLUGIN
    PLUGIN -->|"v1 OutputEnvelope"| ADAPT
    ADAPT -->|"v2 OutputEnvelope"| ORCH
  

Plugin developers are notified 90 days before a major version is sunset.


13. Plugin Registration & Discovery

13.1 Plugin Deployment Models

flowchart TD
    subgraph Dev["Local Development (Docker Compose)"]
        FILE["Plugin .py file"] --> WATCH["aau plugin dev<br/><i>File watcher + hot-reload</i>"]
        WATCH --> STUB["Local app stub<br/><i>Sends test InputEnvelopes, prints OutputEnvelopes</i>"]
    end

    subgraph Prod["Production (Kubernetes)"]
        IMAGE["Plugin Container Image<br/><i>aau plugin publish → Zot OCI registry</i>"]
        --> DEPLOY["kubectl apply / Helm chart"]
        --> STARTUP["Container starts, initialize() called"]
        --> REGAPI["POST /internal/registry/register<br/><i>{manifest, envelope_version, required_scopes}</i>"]
        --> VALIDATE{"Contract check +<br/>scope allowlist +<br/>cosign signature verify"}
        VALIDATE -->|"Valid"| PGWRITE["INSERT plugin_versions (PostgreSQL)"]
        VALIDATE -->|"Invalid"| REJECT["400 — reject + log"]
        PGWRITE --> HEALTH{"health_check() passes?"}
        HEALTH -->|"Yes"| REDISCACHE["SET registry:{name} in Redis<br/>PUBLISH registry.updated"]
        HEALTH -->|"No"| BACKOFF["Log degraded + retry backoff"]
        REDISCACHE --> READY["Plugin live in orchestrator"]
    end

    subgraph Teardown["Plugin Removal"]
        SCALE["Scale to 0"] --> DRAIN["In-flight requests drain<br/><i>grace = timeout_seconds</i>"]
        DRAIN --> DEREG["DELETE /internal/registry/{name}/{version}"]
        DEREG --> PGMARK["UPDATE plugin_versions SET status='inactive'"]
        PGMARK --> REDINVAL["DEL registry:{name} + PUBLISH registry.removed"]
    end
  

13.2 Registration Rules


14. Orchestrator — Deep Dive

14.1 Intent Routing Architecture

The router uses two configurable thresholds: CONFIDENCE_MIN (below which asks for clarification or falls back to LLM) and CHAIN_THRESHOLD (above which routes directly; between the two, considers chaining). Both are environment config values with documented defaults.

flowchart TD
    CMD["Incoming Command"] --> CACHE{"Route cache hit?<br/><i>Redis, TTL: 5 min</i>"}
    CACHE -->|"Hit"| CACHED["Use cached plugin route"]
    CACHE -->|"Miss"| EMBED["Embed command<br/><i>text-embedding-3-small</i>"]

    EMBED --> MATCH["Cosine similarity vs<br/>capability embeddings in Redis"]
    MATCH --> EVAL{"Score vs thresholds<br/><i>CHAIN_THRESHOLD=0.75<br/>CONFIDENCE_MIN=0.50</i>"}

    EVAL -->|"Score > CHAIN_THRESHOLD<br/>single plugin"| DIRECT["Route directly"]
    EVAL -->|"Score > CHAIN_THRESHOLD<br/>multiple plugins"| DEPCHECK{"Circular dependency?<br/><i>DFS check</i>"}
    EVAL -->|"CONFIDENCE_MIN < Score ≤ CHAIN_THRESHOLD"| LLM["LLM fallback<br/><i>claude-sonnet-4-6</i>"]
    EVAL -->|"Score < CONFIDENCE_MIN"| NOPE["No match → suggest closest"]
    EVAL -->|"Embed service down"| KEYWORD["Keyword fallback<br/><i>substring match on capability strings</i>"]

    LLM --> EVAL2{"LLM confident?"}
    EVAL2 -->|"Yes"| DIRECT
    EVAL2 -->|"Ambiguous"| CLARIFY["Ask user to clarify"]
    EVAL2 -->|"No"| NOPE

    DEPCHECK -->|"Cycle found"| ERROR["422 CIRCULAR_DEPENDENCY<br/><i>cycle path in error message</i>"]
    DEPCHECK -->|"No cycle"| CHAIN["Build execution chain (topological sort)"]

    DIRECT --> SUBCHECK{"Subscription check"}
    CHAIN --> PLAN["Build execution plan"]
    PLAN --> SUBCHECK

    SUBCHECK -->|"Authorized"| ENVELOPE["Build InputEnvelope<br/><i>inject context + secrets</i>"]
    SUBCHECK -->|"Not subscribed"| UPSELL["403 PLUGIN_NOT_SUBSCRIBED"]
    SUBCHECK -->|"Quota exceeded"| QUOTA["429 QUOTA_EXCEEDED"]

    ENVELOPE --> EXEC["Execute plugin(s)"]
    EXEC --> RESULT["Return OutputEnvelope"]
    RESULT --> WRITECACHE["Write to route cache (Redis)"]
    CACHED --> SUBCHECK
  

Threshold calibration: CHAIN_THRESHOLD and CONFIDENCE_MIN are set in environment config. Calibrate by running a labeled command corpus through the router and adjusting to maximise precision while keeping recall above 95%. Two values are easier to tune and reason about than three.

LLM routing cost target: < 5% of requests. Tracked via routing_method in every execution_meta. Grafana alert fires if LLM fallback rate exceeds 10%.

14.2 Multi-Plugin Task Sequencing

flowchart LR
    subgraph Example["Example: Review PR + create tickets for each issue"]
        direction TB
        CMD2["Command"] --> DECOMPOSE["Task Sequencer"]
        DECOMPOSE --> DEPCHECK2["DFS dependency check"]
        DEPCHECK2 --> STEP1["Step 1: Code Review Pro<br/><i>find issues</i>"]
        STEP1 -->|"OutputEnvelope.artifacts"| TRANSFORM["Transform<br/><i>map issues to ticket format</i>"]
        TRANSFORM --> STEP2["Step 2: Ticket Processor<br/><i>create tickets</i>"]
        STEP2 -->|"OutputEnvelope"| MERGE["Merge Results"]
        MERGE --> FINAL["Final OutputEnvelope"]
    end
  

Sequencing rules:

14.3 Shared Context Store

Context Assembly Pipeline

flowchart TD
    REQ["Request (tenant_id, user_id, task, plugin_name)"] --> PARALLEL["Parallel lookups"]

    PARALLEL --> HIST["Conversation history<br/><i>Redis: ctx:{tenant_id}:{user_id}:history (last 20)</i>"]
    PARALLEL --> PROF["User profile<br/><i>Redis cache → PostgreSQL users</i>"]
    PARALLEL --> CFG["Plugin config<br/><i>Redis cache → tenant_plugin_config</i>"]
    PARALLEL --> RAG["RAG retrieval<br/><i>pgvector cosine search, top-k chunks</i>"]
    PARALLEL --> SECRETS["Decrypt secrets<br/><i>SELECT + pgp_sym_decrypt from tenant_secrets</i>"]

    HIST --> ASSEMBLE["Context Assembler"]
    PROF --> ASSEMBLE
    CFG --> ASSEMBLE
    RAG --> ASSEMBLE
    SECRETS --> ASSEMBLE

    ASSEMBLE --> ENVELOPE3["InputEnvelope.context<br/><i>history, profile, plugin_config,<br/>org_knowledge_chunks, secrets</i>"]
  

RAG Pipeline — Org Knowledge Indexing

flowchart TD
    UPLOAD["POST /v1/knowledge {content, source_url}"]
    --> CHUNK["Chunker<br/><i>~512-token chunks with overlap</i>"]
    --> EMBED2["Embed each chunk<br/><i>text-embedding-3-small → vector(1536)</i>"]
    --> STORE2["INSERT org_knowledge_chunks<br/><i>content, embedding, metadata, tenant_id</i>"]
    --> READY2["Available for HNSW retrieval"]

    CHUNK --> META["Extract metadata<br/><i>source_url, section headers</i>"]
    META --> STORE2
  

RAG Pipeline — Retrieval at Request Time

flowchart TD
    TASK["task_description"]
    --> EMBED3["Embed task (text-embedding-3-small)"]
    --> SEARCH["pgvector HNSW cosine search<br/><i>WHERE tenant_id = $1 ORDER BY embedding &lt;=&gt; $2 LIMIT 8</i>"]
    --> FILTER["Filter: cosine similarity > 0.7"]
    --> WRAP["Wrap in system prompt boundary<br/><i>prevents prompt injection from org docs</i>"]
    --> INJECT["Inject as context.org_knowledge_chunks<br/><i>max 2000 tokens total</i>"]
  

Async History Summarisation

sequenceDiagram
    participant Agent as Any Agent
    participant Redis
    participant Stream as Redis Stream
    participant Worker as Summarisation Worker
    participant LLM as claude-sonnet-4-6

    Agent->>Redis: LPUSH ctx:tenant:user:history {new_turn}
    Redis->>Redis: LLEN ctx:tenant:user:history
    alt Length > 20
        Redis->>Stream: XADD stream:summarization {tenant_id, user_id}
    end

    Note over Agent: Request continues unblocked

    Stream->>Worker: XREADGROUP (consumer group)
    Worker->>Redis: LRANGE history (turns 21..end)
    Redis-->>Worker: Older turns
    Worker->>LLM: Summarise older turns
    LLM-->>Worker: Compressed summary
    Worker->>Redis: SET ctx:tenant:user:summary {text}
    Worker->>Redis: LTRIM ctx:tenant:user:history 0 19
    Worker->>Stream: XACK (mark message processed)
  

14.4 Async Task Delivery (Redis Streams)

sequenceDiagram
    participant User
    participant App
    participant Stream as Redis Stream
    participant Worker as Plugin Worker
    participant DB as PostgreSQL
    participant Notify as Webhook Delivery Worker
    participant Webhook as Customer Webhook URL

    User->>App: POST /v1/tasks {command}
    App->>DB: INSERT tasks (status=queued)
    App->>Stream: XADD stream:tasks:{plugin_name} {task_id, payload}
    App-->>User: 202 Accepted {task_id}

    Stream->>Worker: XREADGROUP (consumer group)
    Worker->>DB: UPDATE tasks SET status=running
    Worker->>Worker: Run LangGraph graph
    Worker->>DB: UPDATE tasks SET status=success/failed, result={...}
    Worker->>Stream: XADD stream:webhooks {task_id, tenant_id}
    Worker->>Stream: XACK tasks stream

    Note over User: Poll for result
    User->>App: GET /v1/tasks/{task_id}
    App->>DB: SELECT tasks WHERE id=task_id
    DB-->>App: {status, result}
    App-->>User: 200 {status, result}

    Note over Notify: Webhook delivery worker
    Stream->>Notify: XREADGROUP stream:webhooks
    Notify->>DB: SELECT webhooks WHERE tenant_id=...
    Notify->>Notify: HMAC-SHA256 sign payload
    Notify->>Webhook: POST {task_id, status, result}<br/>X-AAU-Signature: sha256=<hmac>
    Notify->>Stream: XACK (on 2xx) or re-enqueue with backoff
  

Webhook Security Detail

flowchart TD
    PAYLOAD["Webhook payload JSON"]
    --> SIGN["HMAC-SHA256(payload, webhook.secret)<br/><i>secret from webhooks.secret_hash (decrypted)</i>"]
    --> HEADER["X-AAU-Signature: sha256={digest}"]
    --> POST["POST to customer URL (10s timeout)"]
    --> RESP{"HTTP response?"}

    RESP -->|"2xx"| LOG_OK["Log delivery success"]
    RESP -->|"Non-2xx / timeout"| RETRY{"Retries < 5?"}

    RETRY -->|"Yes"| BACKOFF2["Exponential backoff<br/><i>1m → 5m → 30m → 2h → 8h</i>"]
    BACKOFF2 --> POST
    RETRY -->|"No"| LOG_FAIL["Log permanent failure<br/><i>tenant notified in dashboard</i>"]
  

Task cancellation: DELETE /v1/tasks/{task_id} sets Redis key cancel:{task_id} EX 3600. The LangGraph graph checks this flag at each node boundary. Acknowledged within one node cycle.

Task state machine: pending → queued → running → success | partial | failed | cancelled


15. Failure Handling & Resilience

15.1 Failure Strategy

flowchart TD
    EXEC2["Plugin Execution"] --> OUTCOME{"Outcome?"}

    OUTCOME -->|"Success"| RETURN["OutputEnvelope status: success"]
    OUTCOME -->|"Timeout"| RETRY{"Retries remaining?"}
    OUTCOME -->|"Error"| ERRTYPE{"Error type?"}

    RETRY -->|"Yes"| BACKOFF3["Exponential backoff → retry"]
    RETRY -->|"No"| TIMEOUT["status: failed / PLUGIN_TIMEOUT"]
    BACKOFF3 --> EXEC2

    ERRTYPE -->|"Transient"| RETRY
    ERRTYPE -->|"Token budget exceeded"| WARNBILL["Warn + continue<br/><i>overage tracked + billed</i>"]
    ERRTYPE -->|"Context window overflow"| TRUNCATE["Truncate history → retry once"]
    ERRTYPE -->|"Content policy violation"| POLICY["status: failed / CONTENT_POLICY_VIOLATION<br/><i>not retried, not billed</i>"]
    ERRTYPE -->|"Permanent"| FALLBACK{"Fallback plugin?"}
    ERRTYPE -->|"Partial result"| PARTIAL["status: partial"]

    FALLBACK -->|"Yes"| ALT["Route to fallback plugin"]
    FALLBACK -->|"No"| FAIL["status: failed"]
    ALT --> EXEC2

    TIMEOUT --> CIRCUIT["Update circuit breaker (Redis)"]
    FAIL --> CIRCUIT
    CIRCUIT -->|"3 consecutive failures"| OPEN["Open circuit — 60s cooldown"]
  

15.2 Token Budget Enforcement

flowchart TD
    NODE_START["LangGraph node starts"]
    --> SDK_CHECK{"tokens_used / token_budget"}

    SDK_CHECK -->|"< 80%"| RUN["Run node (LLM call)"]
    SDK_CHECK -->|"80–100%"| WARN2["Log warning + emit metric"]
    WARN2 --> RUN

    RUN --> TRACK["SDK reads response.usage<br/><i>input_tokens + output_tokens → running total in GraphState</i>"]
    TRACK --> CHECK2{"Total > token_budget?"}

    CHECK2 -->|"No"| NEXT["Proceed to next node"]
    CHECK2 -->|"Yes"| OVERAGE["Set budget_exceeded=true in ExecutionMeta<br/><i>Overage billed to tenant</i>"]
    OVERAGE --> CONTINUE["Continue execution"]
    CONTINUE --> NEXT
  

15.3 Circuit Breaker per Plugin

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : 3 consecutive failures
    Open --> HalfOpen : 60s cooldown elapsed
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails
    Open --> Open : Requests rejected immediately (no plugin call)
  

State stored in Redis (circuit:{plugin_name}:{version}). Surfaced as aau_circuit_breaker_state Gauge in Grafana.


16. Plugin Sandboxing & Security

16.1 Capability Scope Model

ScopeGrants Access To
context:readCurrent request’s user and tenant context only
secrets:{name}A specific named secret resolved from tenant_secrets
store:writeWriting to MinIO at /{tenant_id}/{plugin_name}/...
store:readReading artifacts the plugin itself created
plugin:chainCalling other plugins as dependencies
knowledge:readReading the tenant’s org knowledge base
knowledge:writeIndexing new content into the tenant’s org knowledge base

Note on secrets: Plugin pods never access the database or call a secrets endpoint. The orchestrator resolves and decrypts secrets from tenant_secrets before building the InputEnvelope, and injects them as context.secrets. Secrets leave the orchestrator only inside a request, over an in-cluster encrypted channel to the plugin pod.

16.2 Runtime Isolation

flowchart TB
    subgraph App2["Platform App"]
        EXEC3["execute(InputEnvelope)<br/><i>secrets already injected, tenant_id embedded</i>"]
    end

    subgraph PluginPod["Plugin Pod (network-isolated)"]
        PROXY["Scope Enforcement Proxy<br/><i>validates all infra calls against declared scopes</i>"]
        PLUGIN3["Plugin Code (LangGraph)"]
        PROXY --> PLUGIN3
    end

    subgraph SharedInfra["Shared Infrastructure"]
        PG3["PostgreSQL (WHERE tenant_id = $1 enforced)"]
        STORE3["MinIO (path: /{tenant_id}/{plugin_name}/...)"]
    end

    EXEC3 -->|"InputEnvelope"| PROXY
    PROXY -->|"Scope-validated"| STORE3
    PROXY -->|"Read-only, tenant-scoped"| PG3
    PLUGIN3 -->|"OutputEnvelope"| EXEC3
  

Network policy: Kubernetes NetworkPolicy restricts plugin pod egress to only the endpoints declared in required_secret_names. No direct database or secret store access from plugin pods.

Plugin signing: All plugins are signed with cosign during aau plugin publish. The registration endpoint verifies the signature before accepting.

16.3 Plugin Signing & Publish Flow

flowchart TD
    CODE["Plugin code (my_plugin.py)"]
    --> SDKBUILD["aau plugin publish<br/><i>Step 1: docker build using aau-base image</i>"]
    --> IMAGE2["Container image built"]
    --> SCAN["Step 2: trivy scan<br/><i>Fail on HIGH/CRITICAL CVEs</i>"]
    --> SIGN2["Step 3: cosign sign image<br/><i>using platform private key</i>"]
    --> PUSH["Step 4: push to Zot OCI registry<br/><i>registry.aau.internal/{name}:{version}</i>"]
    --> MANIFEST["Step 5: generate manifest.json"]
    --> UPLOAD2["Step 6: POST /internal/registry/register<br/><i>platform verifies cosign signature</i>"]
    --> LIVE2["Plugin registered and live"]

    SCAN --> FAIL2["Build fails — developer must fix CVEs"]
  

17. Subscription Enforcement — Deep Dive

17.1 Enforcement Flow

flowchart TD
    REQ["Incoming Request (tenant_id from API key)"]
    --> LOOKUP["Lookup subscription<br/><i>Redis cache → PostgreSQL subscriptions</i>"]
    --> CHECK{"Plugin in subscription?"}

    CHECK -->|"Yes"| QUOTA2{"Within quota?"}
    CHECK -->|"No"| DENY["403 PLUGIN_NOT_SUBSCRIBED + upgrade_url"]

    QUOTA2 -->|"< 80%"| ALLOW["Allow execution"]
    QUOTA2 -->|"80–100%"| WARN3["Allow + emit quota.warning event"]
    QUOTA2 -->|"Exceeded"| OVER{"Overage enabled?"}

    OVER -->|"Yes"| CHARGE["Allow + set overage=true in metering"]
    OVER -->|"No"| BLOCK["429 QUOTA_EXCEEDED"]

    ALLOW --> LOG["INSERT execution_events"]
    WARN3 --> LOG
    CHARGE --> LOG
    LOG --> BILL["Billing worker reads execution_events WHERE billed=false<br/><i>runs on schedule, marks billed=true after invoicing</i>"]
  

17.2 Usage Metering Record

FieldDescription
tenant_idWho
plugin_name + plugin_versionWhat
task_idWhich request
timestampWhen
tokens_usedFrom ExecutionMeta.tokens_used (SDK-tracked)
latency_msExecution duration
statussuccess / partial / failed
cost_estimatetokens_used × per-token rate for model_used
routing_methodembedding / llm_fallback / cache / keyword
billedFalse for failed; True once invoiced by billing worker

18. Plugin Dependency Management

18.1 Dependency Resolution

dependencies = [
    {"name": "code-review", "version": ">=1.2.0,<2.0.0"},
    {"name": "ticket-processor", "version": ">=2.0.0"},
]

At plan-build time: resolve versions → check subscriptions → DFS cycle detection → topological sort → execute.

18.2 Circular Dependency Detection

flowchart TD
    PLAN["Build Execution Plan"] --> DFS["DFS over dependency graph"]
    DFS --> CYCLE{"Cycle detected?"}
    CYCLE -->|"Yes"| REJECT2["422 CIRCULAR_DEPENDENCY<br/><i>error message includes full cycle path</i>"]
    CYCLE -->|"No"| TOPO["Topological sort → execution order"]
    TOPO --> EXEC4["Execute in order"]
  

19. Plugin Deprecation Lifecycle

flowchart LR
    ACTIVE["Active"] -->|"Owner initiates"| DEPRECATED["Deprecated<br/><i>90-day warning</i>"]
    DEPRECATED -->|"90 days elapsed"| SUNSET["Sunset<br/><i>No new subscriptions</i>"]
    SUNSET -->|"Zero active subscribers"| RETIRED["Retired<br/><i>Removed from registry</i>"]

    DEPRECATED --> BLOCKED2["New subscriptions blocked"]
    DEPRECATED --> WARN4["deprecation_warning in every OutputEnvelope<br/><i>sunset_at + migration_guide_url</i>"]
  

20. Plugin SDK & Developer Experience

20.1 SDK Structure

aau-sdk/
├── base_plugin.py        # BasePlugin abstract class (LangGraph-aware)
├── decorators.py         # @register_plugin
├── envelopes.py          # InputEnvelope, OutputEnvelope, typed dataclasses
├── graph.py              # AAUStateGraph, standard node helpers
├── budget.py             # Token budget tracker (wraps Anthropic client)
├── validation.py         # OutputEnvelope schema validation (in-process envelope_schemas cache)
├── testing/
│   ├── harness.py        # Local orchestrator stub
│   └── fixtures.py       # Sample InputEnvelopes for testing
└── cli/
    ├── new.py            # aau plugin new <name>
    ├── dev.py            # aau plugin dev (Docker Compose + hot-reload)
    ├── test.py           # aau plugin test
    └── publish.py        # build → trivy scan → cosign sign → Zot push → register

20.2 Local Development Stack

aau plugin dev runs docker compose up. All services are local — no external dependencies.

graph TB
    subgraph DockerCompose["Docker Compose — Local Dev Stack"]
        subgraph AppLayer["Application"]
            LAPP["Platform App (FastAPI :8080)<br/><i>auth + orchestrator + registration routes</i>"]
        end

        subgraph Data["Data Services"]
            LPG["PostgreSQL 16 + pgvector :5432<br/><i>Pre-seeded: 2 tenants, subscriptions,<br/>API keys, sample org knowledge</i>"]
            LREDIS["Redis 7 :6379<br/><i>Cache + Streams queues</i>"]
        end

        subgraph Storage["Storage"]
            LMINIO["MinIO :9000 / Console :9001<br/><i>Pre-created tenant buckets</i>"]
            LZOT["Zot OCI Registry :5000<br/><i>Accepts plugin image pushes</i>"]
        end

        subgraph Observability2["Observability"]
            LPROM["Prometheus :9090"]
            LGRAFANA["Grafana :3000<br/><i>Pre-built plugin dashboards</i>"]
            LLOKI["Loki :3100"]
        end

        subgraph Plugin2["Developer Plugin (hot-reload)"]
            LPLUGIN["my_plugin.py<br/><i>Reloads on file save</i>"]
        end
    end

    LAPP --> LPG
    LAPP --> LREDIS
    LAPP --> LMINIO
    LPLUGIN -->|"POST /internal/registry/register"| LAPP
    LPROM --> LGRAFANA
    LLOKI --> LGRAFANA
  

Test API keys:

20.3 Plugin Development Flow

flowchart LR
    SCAFFOLD["aau plugin new my-plugin<br/><i>Scaffolds: my_plugin.py, tests/, Dockerfile, manifest.json</i>"]
    --> CODE["Implement LangGraph nodes<br/><i>extend BasePlugin, define GraphState</i>"]
    --> DEV["aau plugin dev<br/><i>docker compose up + hot-reload</i>"]
    --> TEST["aau plugin test<br/><i>PluginHarness: tokens, scopes, schema</i>"]
    --> PUBLISH["aau plugin publish<br/><i>build → trivy → cosign → Zot → register</i>"]
    --> LIVE2["Plugin live in registry"]
  

20.4 Minimal Plugin Implementation

from aau_sdk import BasePlugin, register_plugin
from aau_sdk.envelopes import InputEnvelope, OutputEnvelope
from aau_sdk.graph import AAUStateGraph
from typing import TypedDict
from langchain_anthropic import ChatAnthropic

class GraphState(TypedDict):
    task: str
    result: str

@register_plugin
class MyPlugin(BasePlugin):
    name = "my-plugin"
    version = "1.0.0"
    description = "Summarises documents."
    capabilities = ["summarize documents", "extract key points"]
    required_scopes = ["context:read", "store:write"]
    required_secret_names = []   # declare any tenant secrets needed here
    token_budget = 50_000

    def initialize(self):
        # SDK wraps client with token budget tracker automatically
        self.llm = self.sdk.get_llm(model="claude-sonnet-4-6")
        self.graph = self._build_graph()

    def _build_graph(self):
        graph = AAUStateGraph(GraphState)
        graph.add_node("summarize", self._summarize_node)
        graph.set_entry_point("summarize")
        graph.set_finish_point("summarize")
        return graph.compile()

    def _summarize_node(self, state: GraphState) -> GraphState:
        # plugin_config comes from tenant_plugin_config table
        style = self.plugin_config.get("summary_style", "concise")
        response = self.llm.invoke(f"Summarise in {style} style: {state['task']}")
        return {"result": response.content}

    def execute(self, input: InputEnvelope) -> OutputEnvelope:
        final_state = self.graph.invoke({"task": input.task_description})
        return OutputEnvelope(
            task_id=input.task_id,
            status="success",
            result=final_state["result"],
            confidence=0.9,
        )
        # SDK validates OutputEnvelope against in-process schema cache before returning

20.5 Local Test Harness

from aau_sdk.testing import PluginHarness
from my_plugin import MyPlugin

harness = PluginHarness(MyPlugin())
result = harness.run(
    task="Summarise this architecture document",
    tenant_id="test-tenant-1",
    plugin_config={"summary_style": "detailed"},
    attachments=["doc.pdf"],
)

assert result.status == "success"
assert result.confidence >= 0.7
harness.assert_tokens_within_budget()
harness.assert_no_cross_tenant_calls()
harness.assert_output_schema_valid()

21. Plugin Example — Code Review Pro (Detailed)

21.1 Internal LangGraph Agent Graph

flowchart TD
    INPUT["InputEnvelope<br/><i>PR URL + context.secrets.github_token<br/>+ context.plugin_config.coding_standards</i>"]
    --> FETCH["Diff Fetcher Node<br/><i>GitHub API (token from context.secrets) → parsed diff</i>"]
    --> SPLIT["File Splitter Node<br/><i>Groups files by type/module</i>"]
    --> PARALLEL2["Parallel Branch (LangGraph Send API)"]

    PARALLEL2 --> SEC["Security Node<br/><i>OWASP + dependency vulns + secret detection</i>"]
    PARALLEL2 --> STYLE2["Style Node<br/><i>plugin_config.coding_standards applied</i>"]
    PARALLEL2 --> LOGIC2["Logic Node<br/><i>Bug detection, edge cases, race conditions</i>"]
    PARALLEL2 --> PERF["Performance Node<br/><i>N+1 queries, memory leaks, complexity</i>"]

    SEC --> MERGE3["Result Merger Node"]
    STYLE2 --> MERGE3
    LOGIC2 --> MERGE3
    PERF --> MERGE3

    MERGE3 --> PRIORITY2["Priority Ranker Node<br/><i>Critical → High → Medium → Low</i>"]
    --> SUMMARY2["Summary Node<br/><i>Executive summary + inline comments</i>"]
    --> OUTPUT2["OutputEnvelope<br/><i>Review report (MinIO URL) + confidence</i>"]
  

21.2 Agent Node Responsibilities

NodeInputOutputTools
Diff FetcherPR URLParsed diffGitHub API (context.secrets.github_token)
File SplitterDiffFile batchesInternal logic
SecurityFile batchSecurity findingsOWASP rules, Trivy
StyleFile batchStyle violationsplugin_config.coding_standards
LogicFile batchBug candidatesclaude-sonnet-4-6
PerformanceFile batchPerf concernsComplexity analyser
Priority RankerAll findingsPrioritised listScoring model
SummaryPrioritised findingsReview reportclaude-sonnet-4-6

22. Data Storage Architecture

graph TB
    subgraph DataStores["Data Layer"]
        subgraph Primary["Primary Stores"]
            PG4["PostgreSQL 16<br/><i>tenants, users, api_keys, tenant_secrets,<br/>plugins, plugin_versions, envelope_schemas,<br/>subscriptions, tenant_plugin_config,<br/>tasks, webhooks, execution_events</i>"]
            PGV2["pgvector (HNSW index)<br/><i>org_knowledge_chunks<br/>scoped by tenant_id</i>"]
            REDIS3["Redis 7<br/><i>Auth cache, plugin registry cache,<br/>circuit breakers, route cache,<br/>rate limit counters, cancellation flags<br/>+ Streams: tasks / summarisation / webhooks</i>"]
        end

        subgraph Storage2["Storage"]
            MINIO3["MinIO<br/><i>/{tenant_id}/{plugin_name}/{task_id}/ — artifacts</i>"]
            ZOT2["Zot OCI Registry<br/><i>Plugin container images</i>"]
        end

        subgraph PlatformSecrets["Platform Credentials"]
            K8SSEC2["Kubernetes Secrets<br/><i>LLM API key, embedding API key,<br/>pgcrypto ENCRYPT_KEY</i>"]
        end
    end

    APP2["Platform App"] --> PG4
    APP2 --> PGV2
    APP2 --> REDIS3
    APP2 --> K8SSEC2
    PLUGINS3["Plugins"] --> MINIO3
    WBILLING2["Billing Worker"] --> PG4
    WDELIVER2["Webhook Worker"] --> REDIS3
    WDELIVER2 --> PG4
    DEPLOY2["Plugin Deployments"] --> ZOT2
  

Data retention & GDPR:

DataRetentionGDPR erasure
execution_events13 monthsPseudonymise user_id within 30 days
MinIO artifacts90 days (tenant-configurable)Purge within 30 days
tasks.input / tasks.result90 daysPurge within 30 days
users rowUntil account deletedHard delete (cascade)
Billing aggregates7 years (accounting)Not erasable — no PII in aggregates

GDPR Erasure Flow

flowchart TD
    REQUEST["POST /v1/gdpr/erasure-request {user_id}"]
    --> VALIDATE2["Verify requester is user or tenant admin"]
    --> STREAM2["XADD stream:erasure {user_id, tenant_id}"]
    --> ACK["202 Accepted — SLA: 30 days"]

    STREAM2 --> WORKER2["Erasure Worker (Redis Streams consumer)"]
    WORKER2 --> TASKS_PURGE["Nullify tasks.input, tasks.result WHERE user_id=$1"]
    WORKER2 --> EVENTS_PSEUDO["Pseudonymise execution_events.user_id"]
    WORKER2 --> ARTIFACTS_DEL["DELETE MinIO: /{tenant_id}/.../{user_id}/"]
    WORKER2 --> HISTORY_DEL["DEL Redis ctx:{tenant_id}:{user_id}:*"]
    WORKER2 --> USER_DEL["DELETE users WHERE id=$1 (cascades to api_keys)"]
    WORKER2 --> AUDIT["Immutable erasure audit log (no PII)"]
  

23. Deployment Architecture

graph TB
    subgraph K8s["Kubernetes Cluster (cloud-agnostic, plain Deployments + PVCs)"]
        subgraph AppLayer2["Application (single replica — v1)"]
            APP3["Platform App<br/><i>FastAPI + Uvicorn :8080<br/>auth, rate limiting, orchestrator,<br/>registration routes</i>"]
            WWEBHOOK["Webhook Delivery Worker<br/><i>Redis Streams consumer</i>"]
            WBILL["Billing Aggregation Worker<br/><i>Scheduled cron</i>"]
        end

        subgraph PluginPods2["Plugin Pods (HPA: 1–10 replicas per plugin)"]
            PP4["Code Review Pro"]
            PP5["Customer Support AI"]
            PPN2["Plugin N"]
        end

        subgraph DataLayer["Data Services (plain Deployments + PVCs)"]
            PG5["PostgreSQL 16<br/><i>Deployment + PVC</i>"]
            REDIS4["Redis 7<br/><i>Deployment + PVC</i>"]
            MINIO4["MinIO<br/><i>Deployment + PVC</i>"]
            ZOT3["Zot OCI Registry<br/><i>Deployment + PVC</i>"]
        end

        subgraph Obs["Observability"]
            PROM2["Prometheus"]
            GRAF2["Grafana (Metrics + Logs)"]
            LOKI3["Loki"]
        end
    end

    subgraph External3["External"]
        LB2["Ingress Controller (nginx)<br/><i>TLS termination</i>"]
        CDN2["CDN — Marketplace UI"]
    end

    LB2 --> APP3
    CDN2 --> APP3
    APP3 --> PluginPods2
    PluginPods2 -->|"POST /internal/registry/register"| APP3
    APP3 -->|"Redis pub/sub"| APP3
    PROM2 --> GRAF2
    LOKI3 --> GRAF2
  

Key deployment principles:


24. Observability

24.1 Metrics (Prometheus + Grafana)

MetricTypeLabelsPurpose
aau_router_confidenceHistogrammethodRouting quality over time
aau_router_method_totalCountermethodLLM fallback rate (alert > 10%)
aau_plugin_execution_duration_secondsHistogramplugin, statusLatency per plugin
aau_plugin_token_usage_totalCounterplugin, modelToken cost attribution
aau_plugin_budget_exceeded_totalCounterpluginOverage events
aau_circuit_breaker_stateGaugeplugin0=closed, 1=half-open, 2=open
aau_task_stream_depthGaugepluginRedis Stream pending count
aau_rate_limit_hits_totalCountertenant_idThrottling events
aau_webhook_delivery_failures_totalCountertenant_idDelivery health
aau_api_key_auth_latency_secondsHistogramcache_hitAuth path performance

24.2 Alerting Pipeline

flowchart TD
    PROM3["Prometheus (scrapes every 15s)"]
    --> RULES["PrometheusRule alerts"]

    RULES --> A1["circuit_breaker_state == 2 → plugin circuit open"]
    RULES --> A2["router_method{llm} / total > 0.10 → LLM routing rate high"]
    RULES --> A3["task_stream_depth > 500 → queue backing up"]
    RULES --> A4["execution_duration p99 > 30s → latency SLO breach"]
    RULES --> A5["auth_latency p99 > 500ms → auth path slow"]

    A1 --> ALERTMANAGER["Alertmanager"]
    A2 --> ALERTMANAGER
    A3 --> ALERTMANAGER
    A4 --> ALERTMANAGER
    A5 --> ALERTMANAGER

    ALERTMANAGER --> SLACK2["Slack #alerts"]
    ALERTMANAGER --> PAGERDUTY["PagerDuty (P1 only)"]
  

24.3 Structured Logging (Loki)

Every log line is JSON with mandatory fields:

{
  "timestamp": "2026-02-28T12:00:00Z",
  "level": "info",
  "service": "platform-app",
  "task_id": "task_abc123",
  "tenant_id": "tenant_xyz",
  "plugin_name": "code-review",
  "node": "security_agent",
  "routing_method": "embedding",
  "duration_ms": 342,
  "tokens_used": 1240,
  "message": "Node completed"
}

PII policy: Never log email, API key plaintext, raw user input, or tasks.input content. task_id is the primary correlation key.

24.4 SLOs

SLOTargetAlert threshold
Router p99 latency (embedding path)< 200ms> 400ms
Plugin execution success rate> 99%< 98%
Task delivery (sync) p95< 5s> 10s
LLM fallback routing rate< 5%> 10%
Webhook delivery success rate> 99.5%< 99%
Auth p99 latency< 100ms> 500ms

25. Non-Functional Requirements

RequirementTargetApproach
Availability99.9% uptimeCircuit breakers per plugin; HA as v2 milestone (plain Deployment scale-out)
Scalability10K concurrent executionsPlugin HPA, Redis Streams for async backpressure
Latencyp95 < 5s sync tasksEmbedding routing (< 200ms), route cache, async summarisation
IsolationZero cross-tenant impacttenant_id enforced at query layer, Redis namespacing, MinIO path isolation, NetworkPolicy
SecuritySOC2-readySHA-256 key hashing, pgcrypto tenant secrets, K8s Secrets platform creds, cosign plugin signing, scope enforcement
ObservabilityFull request tracingtask_id correlation, Prometheus + Grafana + Loki, 6 SLOs with alert thresholds
ExtensibilityLocal dev in < 1 hourPlugin SDK, Docker Compose full stack, aau plugin new scaffolding
VersioningNo breaking surprisesenvelope_schemas table + in-process cache, version adapters, 90-day deprecation window
Data complianceGDPR-readyTenant isolation, 30-day erasure SLA, erasure worker via Redis Streams
Operational simplicityMinimal ops burden1 app + 2 workers, plain K8s Deployments, Redis Streams reuses existing Redis
Router costLLM calls < 5%Embedding-first, 5-min route cache, keyword fallback, 2-threshold routing

26. Summary

The AI Agents Universe is a platform where:

The simplification principle: every infrastructure component must earn its place. When two use cases can be served by one tool already in the stack, that is always preferred over adding a new service.

● Intelligence at Every Action

AI Native
Project Management

Stop using tools that bolt on AI as an afterthought. Jovis is built AI-first — smart routing, proactive monitoring, and intelligent workflows from the ground up.