# Amodal

> Documentation for the Amodal Agent Runtime — build domain-specific AI agents from your repo.

## Chat Widget

A standalone, embeddable chat widget for adding agent-powered chat to any web application. Available as a sub-export of `@amodalai/react`.

### Installation

```bash
npm install @amodalai/react
```

### Quick Start

```tsx
import { ChatWidget } from '@amodalai/react/widget'
import '@amodalai/react/widget/style.css'

function App() {
  return (
    <ChatWidget />
  )
}
```

### Positions

| Position   | Behavior                                       |
| ---------- | ---------------------------------------------- |
| `inline`   | Renders in-place within your layout            |
| `floating` | Floating button that expands into a chat panel |
| `right`    | Fixed panel on the right side                  |
| `bottom`   | Fixed panel at the bottom                      |

### Callbacks

```tsx
<ChatWidget
  onToolCall={(tool, args) => {
    console.log(`Agent called: ${tool}`, args)
  }}
  onKBProposal={(proposal) => {
    console.log('Knowledge proposal:', proposal)
  }}
/>
```

### SSE Events

The widget handles these event types from the runtime:

| Event                  | Description                    |
| ---------------------- | ------------------------------ |
| `text`                 | Streaming text output          |
| `tool_call`            | Tool invocation                |
| `skill_activated`      | Skill activation               |
| `kb_proposal`          | Knowledge base update proposal |
| `ConfirmationRequired` | Write operation needs approval |

### Theming

CSS custom properties — no Tailwind dependency:

```css
.pcw-widget {
  --pcw-primary: #6e56cf;
  --pcw-background: #ffffff;
  --pcw-text: #1a1a1a;
  --pcw-border: #e5e5e5;
  --pcw-radius: 12px;
}
```

### Bundle Size

| Format | Size  |
| ------ | ----- |
| ESM    | ~18KB |
| CSS    | ~6KB  |

## SDK Overview

Three ways to embed Amodal in your product:

* **[@amodalai/react](/sdk/react)** — Drop-in React components: `AmodalProvider`, `AmodalChat`, `AmodalAction`, and hooks. Talks to a running runtime server over HTTP/SSE.
* **[@amodalai/react/widget](/sdk/chat-widget)** — Standalone chat widget with SSE streaming, theming, and callbacks. No React required on the host page.
* **[@amodalai/runtime](#server-side-runtime)** — The server-side engine. Use `createAgent()` to embed the agent runtime directly in your Node.js application.

### React (client-side)

```bash
npm install @amodalai/react
```

```tsx
import { AmodalProvider, AmodalChat } from '@amodalai/react'

function App() {
  return (
    <AmodalProvider>
      <AmodalChat />
    </AmodalProvider>
  )
}
```

Your React app calls a running runtime server. See the [React SDK reference](/sdk/react) for the full component API.

### Server-side runtime

For server-side embedding — your own Express/Fastify/Hono/Next.js route handlers running the agent in-process — use `createAgent()` from `@amodalai/runtime`:

```bash
npm install @amodalai/runtime
```

```typescript
import { createAgent } from '@amodalai/runtime'

const agent = await createAgent({
  repoPath: './my-agent',
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY,
})

// In your Express route:
app.post('/api/chat', async (req, res) => {
  const session = await agent.createSession({ userId: req.user.id })
  for await (const event of session.stream(req.body.message)) {
    res.write(`data: ${JSON.stringify(event)}\n\n`)
  }
  res.end()
})
```

The agent you create owns its own tool registry, provider, and store backend — you bring your own database (Postgres pool or PGLite for dev), your own auth, your own framework. Amodal stays invisible to your end users.

#### What you get

* **State-machine agent loop** — see [State Machine](/learn/architecture/state-machine) for the full architecture.
* **Multi-provider support** — Anthropic, OpenAI, Google, DeepSeek, Groq, Mistral, xAI via the Vercel AI SDK. Provider failover chains built in.
* **Tool system** — store tools, connection tools with ACL enforcement, custom tools (`handler.ts` files), MCP tools, and admin file tools.
* **Sub-agent dispatch** — `dispatch_task` spawns a write-enabled sub-agent with its own context.
* **Context compaction + loop detection** — long agent runs stay coherent without blowing the budget.
* **Store backends** — PGLite for dev, Postgres for production, both via Drizzle ORM. Bring your own via the `storeBackend` injection.
* **SSE streaming** — every `session.stream()` call yields typed SSE events (`init`, `text_delta`, `tool_call_start`, `tool_call_result`, `done`).
* **MCP support** — discover tools from connected MCP servers.

#### When to use which

| Scenario                                                                       | Use                      |
| ------------------------------------------------------------------------------ | ------------------------ |
| You want to talk to your agent from your own web app                           | React SDK                |
| You want a chat widget on a marketing page or third-party site                 | Chat widget              |
| You're building a vertical SaaS and want the agent as the core of your product | `createAgent()` runtime  |
| You want to run `amodal dev` locally and iterate on agent config               | CLI (no SDK code needed) |

## @amodalai/react

High-level React components for embedding Amodal agents in your product. Provides a provider, chat UI, action components, and hooks.

### Installation

```bash
npm install @amodalai/react
```

### Components

#### AmodalProvider

Wraps your app with agent context:

```tsx
import { AmodalProvider } from '@amodalai/react'

<AmodalProvider>
  {children}
</AmodalProvider>
```

#### AmodalChat

Full chat interface:

```tsx
import { AmodalChat } from '@amodalai/react'

<AmodalChat />
```

#### AmodalAction

Trigger agent actions from buttons or other UI elements:

```tsx
import { AmodalAction } from '@amodalai/react'

<AmodalAction>Get Summary</AmodalAction>
```

### Hooks

| Hook               | Description                            |
| ------------------ | -------------------------------------- |
| `useAmodalBrief`   | Get a quick agent summary on a topic   |
| `useAmodalInsight` | Request a detailed analysis            |
| `useAmodalTask`    | Start and track background tasks       |
| `useAmodalQuery`   | Run a query and get structured results |

### Confirmation & Review

Built-in confirmation and review UIs for write operations. When the agent needs user approval, these components render automatically within the chat flow.
### Bundle Size

~26.7KB ESM

## Architecture Overview

Amodal is a layered system. The Runtime is the agent engine (state machine, providers, tools, stores, session manager). The CLI provides the developer interface. The React SDK and chat widget embed the runtime's streaming output in web apps. Each layer has a clear boundary and a clear job.

The runtime is transport-agnostic. The same engine runs behind an HTTP server (`amodal dev`), embedded in your own Node.js application via `createAgent()`, inside an automation runner, or in a test harness — without code changes.

### System Diagram

```
Chat Interfaces
├── CLI (amodal chat)
├── React SDK (@amodalai/react)
└── Chat Widget (@amodalai/react/widget)
        │
        ▼
Runtime (@amodalai/runtime)
  │  HTTP server, SSE streaming, session management,
  │  state machine agent loop, tools, stores, automations
  ▼
Providers (via Vercel AI SDK)
├── Anthropic
├── OpenAI
└── Google Gemini
```

### Packages

| Package                      | Role |
| ---------------------------- | ---- |
| **`@amodalai/runtime`**      | Agent engine. State machine, provider layer, tool system, stores, session manager, HTTP server, automation runner. `createAgent()` is the public entry point. |
| **`@amodalai/core`**         | Build utilities. `loadRepo`, `buildSnapshot`, NPM package management, knowledge base formatting, MCP manager. No agent runtime code. |
| **`@amodalai/types`**        | Zero-dep shared types. `AgentBundle`, `SSEEvent`, `ToolDefinition`, `StoreBackend`, `CustomToolContext`, branded ID types. Safe to import from any context. |
| **`@amodalai/amodal`** (CLI) | Terminal interface. Commands for project management, chat, eval, and package install. Built with Ink. |
| **`@amodalai/react`**        | React components + SSE chat client for embedding Amodal in web apps. |
| **`@amodalai/react/widget`** | Standalone embeddable chat widget (no React required on the host page). |

### Data Flow: What Happens When a User Sends a Message

When a user types a message and hits enter, here is what happens — from keystroke to streamed response:

#### 1. Client to Runtime

The chat client (CLI, web app, or widget) sends a POST to `/chat/stream` with the message text and an optional `sessionId`. If this is a new conversation, no session ID is sent. The connection stays open — the response streams back over SSE.

#### 2. Session Manager

The `StandaloneSessionManager` either creates a new session or loads an existing one. For a new session, it builds the session components (provider, tool registry, permission checker, logger, compiled system prompt). For an existing session, it loads the conversation history from the session store (PGLite locally, Postgres in production). It also checks that the session has not expired.

#### 3. Context Compilation

Before the message reaches the LLM, the context compiler assembles the system prompt. This is where the layered config becomes a single, coherent prompt:

* **Agent instructions**: the agent's role definition and `userContext` from `amodal.json`
* **Skills**: full body of every skill that passes requirement checks
* **Knowledge index**: a compact listing of available KB documents (titles, tags, categories) so the agent can load them on demand
* **Connection surfaces**: endpoints + field guidance + scope labels for every configured connection
* **Store schemas**: auto-generated from each store's entity definition
* **Tool definitions**: store tools, connection tools, custom tools, MCP tools, admin tools — all Vercel AI `tool()` definitions with Zod or JSON Schema

The compiled system prompt is cached and reused for every turn in the session (and prompt-cached with Anthropic providers to reduce cost).

#### 4. The State Machine
The agent loop is an explicit state machine — `thinking` → `streaming` → `executing` (if tools were called) → back to `thinking`, until done. Each state is a discriminated-union variant with its own handler. Compaction, loop detection, sub-agent dispatch, and user confirmation are all explicit states, not if-branches in a while loop.

See [State Machine](/learn/architecture/state-machine) for the full architecture: all six states, the transition rules, exhaustiveness checking, and how SSE events are returned as side effects of transitions.

#### 5. Tool Execution

When the model emits tool calls, the state machine enters `executing`. Each tool call runs through the `ToolRegistry` via the Vercel AI `tool()` interface:

* **`request`**: Calls a connection endpoint. The `PermissionChecker` resolves the connection's `access.json` — check ACL rules, strip hidden fields, apply rate limits, require confirmation for destructive writes.
* **`query_store` / `write_<store>` / `<store>_batch`**: CRUD against the configured store backend (PGLite or Postgres via Drizzle).
* **`dispatch_task`**: Spawns a sub-agent with its own isolated state machine, sharing the parent's tools and stores. Returns a compressed summary to the parent's context.
* **`present`**: Emits a widget SSE event with structured data for the client to render inline.
* **`stop_execution`**: Ends the current turn cleanly — useful for automations that shouldn't keep talking after completing their task.
* **`web_search` / `fetch_url`** (when `webTools` is configured): Grounded search and URL extraction via a dedicated Gemini Flash instance, routed regardless of the main model provider. See [Web Tools](/guide/tools#web-tools).
* **Custom tools**: Compiled from each tool's `handler.ts` via esbuild, then executed with a scoped `ctx` containing `request`, `store`, `exec`, `log`, and `env`.
* **MCP tools**: Proxied to the MCP server they were discovered from.
* **Admin file tools** (in admin sessions only): `read_repo_file`, `write_repo_file`, `edit_repo_file`, `delete_repo_file`, `list_repo_files`, `glob_repo_files`, `grep_repo_files`, `read_many_repo_files`, `internal_api` — all allowlist-scoped to the agent's config directories. See [Admin Agent](/guide/admin-agent).

Tool errors are observations the model reasons about — not exceptions that crash the loop. This is the "continue site" pattern: a tool failure becomes a `tool_call_result` with `status: 'error'`, and the model decides what to do next.

#### 6. Response Streaming

Throughout this entire process, the runtime streams events to the client over SSE. The client receives `text_delta` events as the LLM generates text, `tool_call_start` and `tool_call_result` events as tools execute, `subagent_event` events when sub-agents are working, and a final `done` event with token usage when the response is complete.

The client renders these events in real time. The user sees the agent thinking, calling tools, and composing its answer — not a loading spinner followed by a wall of text.

### Deployment

#### Local Development

```bash
amodal dev    # starts runtime reading from local git repo
amodal chat   # connects to local runtime
```

The runtime reads your config directory directly from the filesystem. Changes to files are hot-reloaded — edit a skill, and the next message uses the updated version. This is the fastest feedback loop for development.

#### Production

For production, run the runtime as a standalone server or in a Docker container:

```bash
amodal deploy serve      # run from local config
amodal ops docker build  # build a Docker image
```

The git repo is the source of truth. Everything about your agent — connections, skills, knowledge, automations, and config — lives in version-controlled files.
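The "continue site" pattern described under Tool Execution can be sketched as a small wrapper. This is illustrative only — the `ToolResult` shape here is an assumption, not the runtime's actual type:

```typescript
// Sketch: a tool failure becomes a result the model can reason about,
// rather than an exception that ends the turn. ToolResult is assumed.
type ToolResult =
  | { status: 'ok'; output: unknown }
  | { status: 'error'; error: string }

async function runTool(
  handler: (args: unknown) => Promise<unknown>,
  args: unknown,
): Promise<ToolResult> {
  try {
    return { status: 'ok', output: await handler(args) }
  } catch (err) {
    // The error text is returned to the model as an observation.
    return { status: 'error', error: err instanceof Error ? err.message : String(err) }
  }
}
```

The key design point: the agent loop never sees a thrown exception from a tool, only a `tool_call_result` it can react to on the next `thinking` transition.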
#### Embedded (ISV)

For ISVs embedding Amodal in their own SaaS product, `createAgent()` gives you the engine as a library:

```typescript
import { createAgent } from '@amodalai/runtime'

const agent = await createAgent({
  repoPath: './my-agent',
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY,
  storeBackend: myPostgresPool, // bring your own
})
```

See the [SDK overview](/sdk) for the embedded pattern.

### Security Boundaries

#### What Runs Where, What Has Access to What

The runtime process holds the resolved config (including decrypted secrets in memory), the LLM provider API clients, and the connected system API clients. Secrets are resolved from environment variables at startup and held only in memory — they are never written to disk, logged, or included in LLM prompts.

#### How Secrets Flow

Secrets never appear in deployment snapshots, API responses, LLM prompts, or logs.

1. You set secrets as environment variables in your hosting environment (Kubernetes secrets, Fly.io secrets, etc.)
2. `amodal.json` references them with `env:VARIABLE_NAME`
3. The runtime resolves `env:` references at startup, holding the values in memory
4. The connection tool uses them for API authentication — injecting headers, signing requests — but never includes them in the context sent to the LLM
5. The field scrubber scans all tool outputs against the connection's `access.json` hidden-field rules and strips them before the result reaches the model

#### Role-Based Tool Filtering

Not every caller should see every tool. The runtime filters the tool list based on the caller's role before the LLM sees it:

* **User role**: Sees all non-admin tools.
* **Admin role**: Sees everything including file tools, connection configuration, and automation control.
* **Automation role**: Sees read-only tools by default. Write tools are only available if `writeEnabled` is true for the specific automation.

This filtering happens at the session layer, before the prompt is compiled.
The LLM never knows about tools it cannot use — they are simply absent from its tool list.

### Key Protocols

| Protocol                         | Where                  | Purpose                                                      |
| -------------------------------- | ---------------------- | ------------------------------------------------------------ |
| **SSE** (Server-Sent Events)     | Runtime to Client      | Streaming chat responses, tool calls, widget events          |
| **REST**                         | Runtime API            | Session management, automation control, store/file endpoints |
| **MCP** (Model Context Protocol) | Runtime to MCP servers | Tool discovery and execution over stdio or SSE transport     |

SSE was chosen over WebSockets for client streaming because it is simpler (unidirectional), works through more proxies and CDNs, and automatically reconnects on connection drops.

### Deep Dives

* [The Core Loop](/learn/architecture/core-loop) — explore-plan-execute reasoning cycle
* [State Machine](/learn/architecture/state-machine) — the agent loop implementation
* [Agent Architecture](/learn/architecture/agents) — primary agent, sub-agents, context isolation
* [Context Management](/learn/architecture/context) — compaction, tool output masking, loop detection

## Runtime Event Bus

The runtime publishes server-level state changes (sessions created/updated, automations triggered/completed, store writes, config reloads) as events. Clients subscribe once to `GET /api/events` via SSE and react in real time — no polling.

This is **separate from the chat SSE stream**. Chat responses use the per-session streaming protocol (`text_delta`, `tool_call_start`, `done`, etc. — see [State Machine](/learn/architecture/state-machine)). Runtime events are cross-session, cross-cutting server state.
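The core behavior of such a bus — sequence-numbered fan-out with a bounded replay buffer — can be sketched in a few lines. This is illustrative only; the runtime's actual implementation differs in detail:

```typescript
// Sketch of a seq-numbered event bus with a bounded replay buffer.
// Names and the buffer mechanics are illustrative assumptions.
interface BusEvent {
  seq: number
  timestamp: string
  type: string
  [key: string]: unknown
}

class EventBus {
  private seq = 0
  private buffer: BusEvent[] = []
  private subscribers = new Set<(e: BusEvent) => void>()

  constructor(private bufferSize = 500) {}

  subscribe(fn: (e: BusEvent) => void): () => void {
    this.subscribers.add(fn)
    return () => this.subscribers.delete(fn)
  }

  emit(type: string, payload: Record<string, unknown> = {}): BusEvent {
    const event: BusEvent = {
      seq: ++this.seq, // monotonic sequence number
      timestamp: new Date().toISOString(),
      type,
      ...payload,
    }
    this.buffer.push(event)
    if (this.buffer.length > this.bufferSize) this.buffer.shift() // drop oldest
    for (const fn of this.subscribers) fn(event)
    return event
  }

  // Replay everything newer than the client's Last-Event-ID.
  replaySince(lastEventId: number): BusEvent[] {
    return this.buffer.filter((e) => e.seq > lastEventId)
  }
}
```

The `replaySince` call is what a reconnecting SSE client exercises via `Last-Event-ID`: buffered events newer than the client's last seen `seq` are replayed before live events resume.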
### Subscribe

```ts
const events = new EventSource('/api/events')

events.addEventListener('message', (e) => {
  const event = JSON.parse(e.data)
  // { seq: 42, timestamp: "2026-04-05T14:00:00Z", type: "session_created", sessionId: "...", appId: "local" }
})
```

A comment-line heartbeat is sent every 15s to prevent proxies (nginx, Cloudflare) from closing the connection during quiet periods.

### Reconnect-and-resume

Every event carries a monotonic `seq` number. The runtime keeps a ring buffer of recent events (default 500) so clients that miss a batch during a reconnect can catch up.

`EventSource` automatically sends `Last-Event-ID` on reconnect with the `seq` of the last event it received. The server replays any buffered events with `seq > Last-Event-ID` before streaming live events. Clients that track this themselves can send the header manually:

```
GET /api/events HTTP/1.1
Last-Event-ID: 42
```

Events older than the buffer's cutoff are dropped — if a client is offline long enough that the buffer wraps, it gets live events only. The runtime logs `events_replay_overrun` when that happens.

### Event catalog

#### Session events

| Type              | Fields                         | When                                                                     |
| ----------------- | ------------------------------ | ------------------------------------------------------------------------ |
| `session_created` | `sessionId`, `appId`           | A new session was created (chat, admin, automation).                     |
| `session_updated` | `sessionId`, `appId`, `title?` | A session was persisted — new message, title change, or metadata update. |
| `session_deleted` | `sessionId`                    | A session was deleted via the API.                                       |

#### Automation events

| Type                   | Fields           | When                                                                       |
| ---------------------- | ---------------- | -------------------------------------------------------------------------- |
| `automation_triggered` | `name`, `source` | Automation run starting. `source` is `"cron"`, `"webhook"`, or `"manual"`. |
| `automation_completed` | `name`, `durationMs`          | Automation finished successfully.                                                     |
| `automation_failed`    | `name`, `error`, `durationMs` | Automation threw during the agent loop.                                               |
| `automation_started`   | `name`, `intervalMs`          | A cron automation was registered (on startup or via `amodal ops automations resume`). |
| `automation_stopped`   | `name`                        | A cron automation was paused.                                                         |

#### Delivery events

| Type                 | Fields                                                                       | When                                    |
| -------------------- | ---------------------------------------------------------------------------- | --------------------------------------- |
| `delivery_succeeded` | `automation`, `targetType`, `targetUrl?`, `httpStatus?`, `durationMs`        | A delivery target accepted the payload. |
| `delivery_failed`    | `automation`, `targetType`, `targetUrl?`, `httpStatus?`, `error`, `attempts` | Delivery failed after retries.          |

#### Store events

| Type            | Fields                             | When                                                                                                |
| --------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------- |
| `store_updated` | `storeName`, `operation`, `count?` | A document was written, deleted, or batch-written. `operation` is `"put"`, `"delete"`, or `"batch"`. |

#### Config events

| Type               | Fields  | When                                                                                                                  |
| ------------------ | ------- | --------------------------------------------------------------------------------------------------------------------- |
| `manifest_changed` | —       | The agent manifest (connections, skills, automations) was reloaded from disk. `amodal dev` emits this on file changes. |
| `files_changed`    | `path?` | A file in the agent repo was touched. `amodal dev` watches for these to trigger hot reload.                            |

### Types

All events carry `seq`, `timestamp`, and `type` fields plus the event-specific payload.
The full type union is exported from `@amodalai/types`:

```ts
import type { RuntimeEvent, RuntimeEventType } from '@amodalai/types'

function handleEvent(event: RuntimeEvent) {
  switch (event.type) {
    case 'session_created':
      // event.sessionId, event.appId
      break
    case 'automation_completed':
      // event.name, event.durationMs
      break
    // ...
    default: {
      const _exhaustive: never = event
      throw new Error(`unhandled event: ${String(_exhaustive)}`)
    }
  }
}
```

### When to use events vs. polling

Use the event bus for **reactive UI updates** — anywhere your UI would otherwise poll every N seconds:

* Session list that updates when a new chat starts
* Automation status page showing live triggered/completed state
* Store-backed dashboards that refresh on write
* Config/manifest hot-reload indicators

Use HTTP requests for **point-in-time queries** — fetching session history, running an ad-hoc automation, reading a store document. The event bus signals that state changed; you still need the REST API to fetch the new state.

### Source

The event bus lives in `packages/runtime/src/events/event-bus.ts`. Internal runtime components emit events (session manager on create/update/delete, proactive runner on automation lifecycle, store backend wrapper on writes, config watcher on file changes). The bus fans events out to all SSE subscribers with sequence numbering and replay buffering.

## Runtime Server

The runtime (`@amodalai/runtime`) is an HTTP server that hosts the agent engine and exposes it over SSE for chat clients. It handles session management, automation scheduling, webhook ingestion, and tool execution. It is the bridge between "someone sends a message" and "the agent reasons about it and responds." The underlying engine is a state machine — see [State Machine](/learn/architecture/state-machine) for the architecture.

The runtime is deliberately stateless in its HTTP layer — all persistent state lives in the store backend (PGLite or Postgres).
This means you can run multiple runtime instances behind a load balancer, and sessions will work correctly as long as they share the same store.

### Endpoints

#### Chat

| Method | Path           | Description                                             |
| ------ | -------------- | ------------------------------------------------------- |
| `POST` | `/chat`        | Send a message, get a complete response (non-streaming) |
| `POST` | `/chat/stream` | Send a message, stream response via SSE                 |
| `GET`  | `/ai/stream`   | Direct AI streaming endpoint                            |

The `/chat/stream` endpoint is what most clients use. You POST a message and keep the connection open — the response streams back as SSE events. The non-streaming `/chat` endpoint waits for the full response before returning, which is simpler but means no real-time feedback during tool execution.

#### Sessions

| Method | Path            | Description                         |
| ------ | --------------- | ----------------------------------- |
| `GET`  | `/sessions`     | List sessions (supports pagination) |
| `GET`  | `/sessions/:id` | Get session history                 |
| `PUT`  | `/sessions/:id` | Update session metadata             |

#### Interactions

| Method | Path                 | Description                        |
| ------ | -------------------- | ---------------------------------- |
| `POST` | `/ask-user-response` | Respond to a confirmation prompt   |
| `POST` | `/widget-actions`    | Handle widget button/action clicks |

When the agent needs confirmation for a write operation, it emits a `confirmation_required` SSE event. The client shows the confirmation UI and sends the user's response back via `/ask-user-response`. The agent's state machine is paused in the `confirming` state while waiting.
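The confirmation round trip can be sketched from the client side. The payload shapes below are illustrative assumptions — the documented wire format for `/ask-user-response` is not specified here:

```typescript
// Sketch: build the response for POST /ask-user-response from a
// confirmation_required SSE event. Field names are assumptions.
interface ConfirmationEvent {
  type: 'confirmation_required'
  action: string
  details: unknown
}

interface ConfirmationResponse {
  sessionId: string
  action: string
  approved: boolean
}

function buildConfirmationResponse(
  sessionId: string,
  event: ConfirmationEvent,
  approved: boolean,
): ConfirmationResponse {
  return { sessionId, action: event.action, approved }
}

// Usage sketch: after asking the user, POST the decision back so the
// agent's state machine can leave the `confirming` state.
async function respond(sessionId: string, event: ConfirmationEvent, approved: boolean) {
  await fetch('http://localhost:3847/ask-user-response', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildConfirmationResponse(sessionId, event, approved)),
  })
}
```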
#### System

| Method | Path        | Description                       |
| ------ | ----------- | --------------------------------- |
| `GET`  | `/health`   | Health check                      |
| `POST` | `/webhooks` | Incoming webhooks for automations |

### SSE Event Types

When streaming via `/chat/stream`, the server emits these event types:

| Event                   | Payload               | Description                                            |
| ----------------------- | --------------------- | ------------------------------------------------------ |
| `init`                  | `{ sessionId }`       | Session created or resumed                             |
| `text_delta`            | `{ delta }`           | Incremental text output from the LLM                   |
| `tool_call_start`       | `{ name, id }`        | Tool execution beginning                               |
| `tool_call_result`      | `{ id, result }`      | Tool execution complete                                |
| `subagent_event`        | `{ agentId, event }`  | Task agent activity (dispatched work)                  |
| `skill_activated`       | `{ skill }`           | Skill matched and loaded into context                  |
| `widget`                | `{ type, data }`      | Widget rendered inline (entity-card, data-table, etc.) |
| `kb_proposal`           | `{ proposal }`        | Knowledge base update proposed                         |
| `confirmation_required` | `{ action, details }` | Waiting for user approval of a write                   |
| `approved`              | `{ action }`          | User approved an action                                |
| `credential_saved`      | `{ connection }`      | Connection credentials captured                        |
| `error`                 | `{ message }`         | Error occurred                                         |
| `done`                  | `{}`                  | Response complete                                      |

### Streaming Example

Here is what a real SSE stream looks like when a user asks "What were our top 5 customers by revenue last month?" and the agent queries Stripe to answer:

```
event: init
data: {"sessionId":"sess_k8m2x4n9"}

event: text_delta
data: {"delta":"I'll pull the revenue data from Stripe for last month."}

event: text_delta
data: {"delta":" Let me look that up."}

event: tool_call_start
data: {"name":"dispatch","id":"tc_01","args":{"task":"Query Stripe for all charges in the last month, grouped by customer.
Return the top 5 by total amount with customer name and total."}}

event: subagent_event
data: {"agentId":"agent_x2k4","event":{"type":"tool_call_start","name":"request","id":"tc_sub_01"}}

event: subagent_event
data: {"agentId":"agent_x2k4","event":{"type":"tool_call_result","id":"tc_sub_01","result":"[200 OK] 847 charges retrieved"}}

event: tool_call_result
data: {"id":"tc_01","result":"Top 5 customers by revenue (March 2026):\n1. Globex Corp — $48,290\n2. Initech — $41,750\n3. Acme Inc — $38,420\n4. Wayne Enterprises — $29,100\n5. Umbrella Corp — $24,680\nTotal for top 5: $182,240 (68% of total monthly revenue)"}

event: text_delta
data: {"delta":"Here are your top 5 customers by revenue for last month:\n\n"}

event: widget
data: {"type":"data-table","data":{"columns":["Rank","Customer","Revenue","% of Total"],"rows":[["1","Globex Corp","$48,290","18.0%"],["2","Initech","$41,750","15.6%"],["3","Acme Inc","$38,420","14.3%"],["4","Wayne Enterprises","$29,100","10.9%"],["5","Umbrella Corp","$24,680","9.2%"]]}}

event: text_delta
data: {"delta":"\nThese five customers represent 68% of your total monthly revenue of $267,840. Globex Corp leads at $48,290, which is 18.0% of the total."}

event: done
data: {}
```

Notice the sequence: the agent starts with a text message explaining what it will do, dispatches a task agent to gather the data, receives the summarized result, renders a widget with structured data, and wraps up with analysis. The client receives all of this in real time — it can render the text as it arrives, show a loading indicator during tool calls, and render the table widget inline.
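A client needs a renderer for `widget` events. As a sketch, the `{ columns, rows }` payload of the `data-table` widget shown in the stream above could be rendered as plain text like this — the helper is hypothetical, not part of the SDK:

```typescript
// Sketch: render a `data-table` widget payload as aligned plain text.
// The { columns, rows } shape is taken from the example stream;
// the renderer itself is an illustrative assumption.
interface DataTablePayload {
  columns: string[]
  rows: string[][]
}

function renderDataTable({ columns, rows }: DataTablePayload): string {
  const table = [columns, ...rows]
  // Width of each column = widest cell in that column.
  const widths = columns.map((_, i) => Math.max(...table.map((r) => r[i].length)))
  return table
    .map((row) => row.map((cell, i) => cell.padEnd(widths[i])).join('  '))
    .join('\n')
}
```

A web client would render the same payload as an HTML table instead; the point is that every `widget` event carries self-describing structured data, so the renderer is a pure function of the payload.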
### Building a Client

Here is a minimal JavaScript example that consumes the SSE stream:

```javascript
async function chat(message, sessionId) {
  const response = await fetch('http://localhost:3847/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, sessionId }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let currentSessionId = sessionId;
  let eventType = null; // persists across chunks so split frames still parse

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('event: ')) {
        eventType = line.slice(7);
      } else if (line.startsWith('data: ') && eventType) {
        const data = JSON.parse(line.slice(6));
        switch (eventType) {
          case 'init':
            currentSessionId = data.sessionId;
            break;
          case 'text_delta':
            process.stdout.write(data.delta);
            break;
          case 'tool_call_start':
            console.log(`\n[Calling ${data.name}...]`);
            break;
          case 'tool_call_result':
            console.log(`[Result received]`);
            break;
          case 'widget':
            renderWidget(data.type, data.data); // your widget renderer
            break;
          case 'error':
            console.error(`Error: ${data.message}`);
            break;
          case 'done':
            console.log('\n');
            break;
        }
        eventType = null;
      }
    }
  }

  return currentSessionId;
}

// Usage: maintain session across messages
let sessionId = null;
sessionId = await chat('What were our top customers last month?', sessionId);
sessionId = await chat('How does that compare to the month before?', sessionId);
```

For production use, the `@amodalai/react` package provides hooks that handle all of this — reconnection, event parsing, widget rendering, confirmation flows — so you do not need to build the SSE client from scratch:

* **`useChat`** — full-featured hook for the main chat endpoint (`/chat/stream`). Includes session resume, history loading, auth tokens, and ask-user flows.
* **`useAmodalChat`** — convenience wrapper for apps using the `AmodalProvider` context (uses `AmodalClient` for transport).
* **`useChatStream`** — low-level hook for custom endpoints. You supply a `streamFn` that connects to any SSE endpoint; the hook handles the reducer, event loop, tool-call tracking, and widget event bus. Use this when integrating with non-standard chat endpoints (e.g. admin chat at `/config/chat`).

### Session Lifecycle

#### Creation

The first message to `/chat/stream` without a `sessionId` creates a new session. The runtime generates a unique ID (`sess_*`), initializes the message history, compiles the initial context (system prompt, knowledge index, tool definitions), and emits the `init` event with the session ID. The client stores this ID and sends it with all subsequent messages.

#### Active State

During an active session, messages accumulate and context grows. Each exchange adds the user's message, the agent's reasoning, tool calls, and tool results to the conversation history. The session is persisted to the store after each exchange.

#### Context Compaction

As the conversation grows, it eventually approaches the model's context window limit. When the compiled prompt exceeds 80% of the window, the context compiler triggers **compaction**:

1. **Identify compactable segments**: Older conversation turns that are not part of an active tool chain
2. **Summarize**: The runtime sends the old turns to the explore model with a summarization prompt: "Compress this conversation history into a concise summary preserving key facts, decisions, and open threads"
3. **Replace**: The old turns are replaced with the summary in the session history
4.
**Continue**: The next LLM call uses the compacted history, freeing up context for new turns

```
Before compaction (context at 85%):
[system prompt] [KB index] [tools]
[turn 1: user question + agent response + 3 tool calls]
[turn 2: user question + agent response + 5 tool calls]
[turn 3: user question + agent response + 2 tool calls]   ← compactable
[turn 4: user question + agent response]                  ← recent, kept
[turn 5: current message]                                 ← current

After compaction (context at 52%):
[system prompt] [KB index] [tools]
[summary: "User asked about revenue trends. Agent found Q1 was up 12%.
 User then asked about churn — agent identified 3 at-risk accounts.
 User requested a comparison with last quarter."]
[turn 4: user question + agent response]                  ← kept in full
[turn 5: current message]                                 ← current
```

Compaction is transparent to the user. They do not see it happen. The agent retains the key facts from earlier turns and can reference them — it just loses the exact wording and tool call details of older exchanges.

For most conversations, this works well. For very long investigations where exact details from early turns matter, write those findings to a [store](/guide/stores) so they survive compaction and can be queried on demand from later turns.

#### Idle and Expiry

Sessions that receive no messages for the configured TTL period (`AMODAL_SESSION_TTL`, default 3600 seconds) transition to idle. After another TTL period, idle sessions expire and their context is discarded. The session metadata (ID, creation time, message count) is retained for audit purposes, but the full conversation history is cleaned up.

Sessions persist across reconnects. If the client's SSE connection drops (network hiccup, browser tab backgrounded), the client can reconnect and resume by sending the same `sessionId`. The session state lives in the store, not in the SSE connection.
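Under the stated 80% threshold, the compaction step can be sketched as a pure history rewrite. The token estimate and the `summarize` stand-in are illustrative assumptions — the real compiler operates on the compiled prompt and calls the explore model:

```typescript
// Sketch of compaction: when estimated usage exceeds 80% of the context
// window, older turns collapse into a single summary turn.
// Turn, token estimates, and summarize() are illustrative assumptions.
interface Turn {
  role: 'user' | 'assistant' | 'summary'
  content: string
  tokens: number
}

function compact(
  history: Turn[],
  windowTokens: number,
  summarize: (turns: Turn[]) => string,
  keepRecent = 2, // most recent turns are never compacted
): Turn[] {
  const used = history.reduce((n, t) => n + t.tokens, 0)
  if (used <= windowTokens * 0.8) return history // under threshold: no-op

  const old = history.slice(0, -keepRecent)
  const recent = history.slice(-keepRecent)
  const text = summarize(old)
  const summaryTurn: Turn = {
    role: 'summary',
    content: text,
    tokens: Math.ceil(text.length / 4), // rough chars-per-token estimate
  }
  return [summaryTurn, ...recent]
}
```

The design choice this illustrates: compaction is a rewrite of the stored history, not a transient prompt trick — which is why exact details from compacted turns are gone unless they were written to a store.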
### Automation Scheduling The runtime includes a built-in scheduler for cron automations and a webhook listener for event-triggered automations. These run alongside the HTTP server in the same process. #### Cron Scheduling On startup, the runtime reads all automation definitions from the `automations/` directory. For each automation with a `schedule` field, it registers a cron job. When a cron job fires: 1. The scheduler creates a fresh agent session with the `automation` role 2. The automation's prompt is set as the initial message, with `lastRunSummary` from the previous run injected as context 3. `lastRunTimestamp` is provided so the agent can scope queries to new data 4. The SDK runs the explore-plan-execute loop 5. On completion, the output is routed to the configured channel (Slack, email, webhook) 6. The run summary is stored for the next run's continuity 7. The session is closed and context is discarded The scheduler uses a lightweight in-process cron library. It does not depend on external job queues or message brokers. For high-availability deployments with multiple runtime instances, only one instance runs the scheduler (leader election via the store's advisory locks) to prevent duplicate runs. #### Webhook Routing Each webhook automation gets a deterministic URL path derived from its name: `/webhooks/`. When the runtime receives a POST to a webhook URL: 1. The request is matched to an automation by the URL path 2. If signature verification is configured, the runtime validates the request signature 3. The JSON payload is extracted and stringified 4. The `{{event}}` placeholder in the automation's prompt is replaced with the payload 5. A fresh agent session is created and the composed prompt is sent 6. The response is routed to the configured output channel Webhook automations run immediately — there is no queuing. If two webhooks arrive simultaneously, two separate sessions run in parallel. 
Each gets its own isolated context and does not interfere with the other. #### Automation Monitoring The runtime exposes automation run status through the `/sessions` API (automation sessions are tagged with their automation name and run ID) and through the health endpoint. Failed automation runs (LLM error, timeout, tool failure) are logged with full context. The runtime does not retry automatically — if a cron run fails, the next scheduled run starts fresh. For webhook automations, the runtime returns a 500 to the caller, which can retry according to its own policy. ### Configuration The runtime reads from `amodal.json` and environment variables: #### Environment Variables | Variable | Default | Description | | -------------------- | ------- | -------------------------- | | `PORT` | `3847` | Server port | | `AMODAL_SESSION_TTL` | `3600` | Session timeout in seconds | #### Store Backend The runtime uses PGLite (in-process Postgres) by default for session and store data. This requires no setup — PGLite runs embedded in the Node.js process and stores data in a local directory. For production, configure an external PostgreSQL instance: ```json { "stores": { "backend": "postgres", "connectionString": "env:DATABASE_URL" } } ``` Using external Postgres is recommended for production because it enables multiple runtime instances to share session state (required for load balancing) and provides proper backup and recovery. ### Starting the Server ```bash # Development (repo mode with hot reload) amodal dev # Production (standalone) amodal deploy serve # Docker amodal ops docker build docker run -p 3847:3847 amodal-runtime ``` ## Quick Start This guide walks you through creating an agent from scratch, connecting it to an API, and having a conversation with it. By the end you will have a working agent running locally that can query external systems and reason about the results. ### 1. 
Initialize a project ```bash npx amodal init ``` The init command is interactive — it asks for your product name and type, then scaffolds the full directory structure: ``` my-agent/ ├── .amodal/ │ └── config.json ← manifest (name, provider, model) ├── connections/ ← API definitions (empty for now) ├── skills/ │ └── general.md ← sample skill with basic reasoning ├── knowledge/ │ └── welcome.md ← sample knowledge doc ├── automations/ ← proactive triggers (empty) └── evals/ ← test scenarios (empty) ``` The `.amodal/config.json` file is the project manifest. Everything else is convention-over-configuration: drop files into the right folders and the runtime picks them up automatically. ### 2. Configure your provider Set your LLM provider credentials. Amodal auto-detects from environment variables: ```bash # Pick one: export ANTHROPIC_API_KEY=sk-ant-... export OPENAI_API_KEY=sk-... export GOOGLE_API_KEY=... ``` The auto-detection order is: **Anthropic → OpenAI → Google**. The runtime checks for each provider's environment variable in sequence and uses the first one it finds. If no key is set, `amodal dev` will start but the agent cannot respond — you will see a clear error message telling you which environment variables to set. You can also pin a specific provider and model in `amodal.json` to override auto-detection: ```json { "name": "My Agent", "provider": "anthropic", "model": "claude-sonnet-4-20250514" } ``` This is useful when your team standardizes on a model or when you want different models in dev vs. production. ### 3. Add a connection Install a pre-built plugin or create a custom connection: ```bash # Install a plugin (e.g., Slack, GitHub, Datadog) amodal pkg connect slack # Or sync from an OpenAPI spec amodal pkg sync --from https://api.example.com/openapi.json ``` When you run `amodal pkg connect slack`, the CLI installs the `@amodalai/connection-slack` package. 
The resolver automatically surfaces the plugin's contents — spec, access rules, documentation, and more — alongside your local files. The package includes:

* **spec.json** — API endpoints, methods, parameters, auth config
* **access.json** — field restrictions, action tiers, scoping rules
* **surface.md** — natural-language overview of what the API can do
* **entities.md** — key entities (channels, users, messages, threads)
* **rules.md** — usage constraints (rate limits, read-only by default)

The `surface.md` and `entities.md` files are what make this different from a raw OpenAPI spec. They teach the agent *when* to use the API and *what the data means* — in plain English.

To customize behavior, create a local `connections/slack/` directory and add your own files — local files always take precedence over package files. For example, adding a `rules.md` with "Never post to #general without explicit confirmation" will override the default rules.

You will be prompted to provide your Slack Bot Token, which is stored as an environment variable reference (e.g., `env:SLACK_BOT_TOKEN`).

### 4. Start the dev server

```bash
amodal dev
```

The dev server compiles your entire project into an optimized context and starts serving:

```
✓ Compiled context (1 connection, 2 skills, 4 knowledge docs)
✓ Token budget: 12,847 / 200,000 (6.4%)
✓ Connected APIs: slack (42 endpoints)

Runtime ready:
  Chat UI → http://localhost:3847
  SSE API → http://localhost:3847/api/v1/chat
  Health  → http://localhost:3847/health

Watching for changes (300ms debounce)...
```

The key number is the token budget. Amodal tracks exactly how many tokens your connections, skills, and knowledge consume so you know how much room the agent has for reasoning. If you edit any file, the runtime recompiles and hot-reloads — no restart needed.

### 5. Chat with your agent

```bash
amodal chat
```

This opens a terminal chat UI built with React and Ink.
Here is what a real conversation looks like with the Slack connection active: ``` You: What happened in #incidents today? Agent: I'll check the #incidents channel for today's activity. ► dispatch: query Slack #incidents channel for today's messages → request: GET conversations.history (channel: C04INCIDENTS, oldest: 1711411200) → Found 3 messages across 2 threads Here's what happened in #incidents today: ┌─────────────────────────────────────────────────────────┐ │ #incidents — March 26, 2026 │ ├─────────────────────────────────────────────────────────┤ │ 09:14 AM @sarah — Database connection pool exhausted │ │ on prod-db-02. P1 declared. Resolved at 9:47 │ │ by scaling pool from 50→200. │ │ │ │ 11:30 AM @mike — Elevated 5xx rates on /api/checkout. │ │ Traced to a bad deploy (v2.14.3). Rolled back │ │ at 11:42. Post-mortem scheduled. │ │ │ │ 02:05 PM @alex — False alarm: monitoring blip from │ │ Datadog agent restart. No user impact. │ └─────────────────────────────────────────────────────────┘ Two real incidents today — a DB connection pool issue (resolved) and a bad deploy causing checkout errors (rolled back). The third was a false alarm. Want me to dig into any of these? ``` Notice the `► dispatch` line: the primary agent delegated data gathering to a task agent, which queried Slack, parsed the raw API response, and returned a clean summary. The primary agent never saw the raw JSON — it stayed focused on reasoning and presentation. You can also connect to a remote server or resume a previous session: ```bash amodal chat --url http://localhost:3847 amodal chat --resume latest ``` ### 6. 
Validate and inspect Before deploying, verify that your project is well-formed and review what the agent actually sees: ```bash amodal validate # Check for missing connections, config issues amodal inspect # Show compiled context with token counts ``` The `inspect` command shows exactly how your token budget is allocated: ``` Context Budget Breakdown ──────────────────────────────────────────── System prompt 1,240 tokens (0.6%) Skills (2) 2,180 tokens (1.1%) general.md 890 triage.md 1,290 Knowledge (4) 3,420 tokens (1.7%) welcome.md 310 environment.md 1,450 team.md 820 baselines.md 840 Connections (1) 6,007 tokens (3.0%) slack 6,007 surface.md 1,200 entities.md 2,340 rules.md 467 spec (42 endpoints) 2,000 ──────────────────────────────────────────── Total context 12,847 tokens (6.4%) Available for chat 187,153 tokens (93.6%) ``` This breakdown is important. If your context grows too large (e.g., connecting 20 APIs with full specs), the agent has less room for reasoning and conversation history. The inspect command helps you find what to trim. ### What just happened? Here is the full flow of what happened when you chatted with your agent: 1. **Compile** — `amodal dev` read every file in your project (connections, skills, knowledge) and compiled them into a single optimized context. Connection specs became tool definitions. Skills became system instructions. Knowledge became loadable documents indexed by tag. 2. **Route** — When you asked "what happened in #incidents today?", the primary agent recognized this as a data-gathering question. Instead of trying to answer from memory, it planned a retrieval step. 3. **Dispatch** — The primary agent used the `dispatch` tool to spin up a task agent with a focused job: "query the Slack #incidents channel for today's messages." This task agent got its own fresh context with just the Slack connection docs loaded. 4. 
**Execute** — The task agent called the Slack API via the `request` tool (read-only, auth handled automatically by the connection config). It received the raw JSON response, interpreted it using the entity documentation from `entities.md`, and returned a structured summary back to the primary agent. 5. **Present** — The primary agent received the \~200-token summary (not the raw API response), reasoned about it, and presented the results using a formatted table. The primary agent's context stayed clean — around 4-6K tokens even though the task agent processed much more raw data. This is **context isolation** in action. The primary agent delegates data gathering to ephemeral workers, keeps its own context small for high-quality reasoning, and composes the final answer. Task agents can even dispatch sub-task agents for complex multi-system queries. ### What's Next * **[Project Structure](/quickstart/project-structure)** — Understand the full repo layout and conventions * **[CLI Reference](/cli)** — All 30+ commands with examples * **[Add skills](/guide/skills)** — Author reasoning frameworks that activate automatically * **[Add knowledge](/guide/knowledge-base)** — Teach your agent about your domain ## Introduction Amodal is a **git-repo-centric agent runtime**. Your agent's configuration — connections, skills, knowledge, tools, automations — lives as files in your repository. The runtime reads these files, compiles them into an optimized context, and runs a reasoning loop against any supported LLM provider. 
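The compile step starts from the root manifest. A minimal `amodal.json` needs only a few fields: `name`, `provider`, and `model` match the Quick Start, and `userContext` is the standing-instructions field described in the FAQ (its value here is purely illustrative):

```json
{
  "name": "My Agent",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "userContext": "You are the on-call assistant for the payments team."
}
```

Connections, skills, and knowledge then layer on as folders; per the convention-over-configuration model, nothing else needs to be registered in the manifest.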
### How It Works ``` Your Repo ├── amodal.json ← agent identity, provider config ├── connections/ ← API credentials + docs (or plugins) ├── skills/ ← Markdown reasoning frameworks ├── knowledge/ ← Domain knowledge documents ├── tools/ ← Custom HTTP/chain/function tools ├── automations/ ← Scheduled or trigger-based runs └── evals/ ← Test cases for agent quality ↓ amodal dev Runtime Server (localhost:3847) ├── Context Compiler ← builds optimized prompts ├── Token Allocator ← manages context window budget ├── Security Layer ← field scrubbing, output guards, action gates ├── Provider Adapter ← Anthropic / OpenAI / Google Gemini └── Session Manager ← TTL-based sessions with hot reload ``` ### The Core Loop Every agent runs the same fundamental cycle: ``` Explore → query connected systems, load knowledge, gather context Plan → reason about findings, decide next steps Execute → call APIs, dispatch sub-agents, present results, learn ``` Simple questions skip planning. Complex questions get the full loop with multi-agent dispatch and skill activation. The runtime matches depth to the question automatically. ### Key Capabilities | Capability | What It Does | | --------------------------- | ---------------------------------------------------------------------------- | | **Multi-provider** | Anthropic Claude, OpenAI, Google Gemini — with failover | | **Git-native config** | Everything is a file. 
Version, diff, review, and deploy your agent like code | | **20+ connection plugins** | Slack, GitHub, Stripe, Datadog, Jira, PagerDuty, Salesforce, and more | | **Security infrastructure** | Field scrubbing, output guards, action gates, scope checking, leak detection | | **Evaluation framework** | LLM-judged evals, experiments, cost tracking, multi-model comparison | | **Hot reload** | File watcher on your repo — edit config, agent updates instantly | | **React SDK** | `@amodalai/react` components and chat widget for embedding | | **Snapshot deployment** | Build → snapshot → deploy and self-host | ### Next Steps * **[Quick Start](/quickstart/create-agent)** — Build your first agent * **[Project Structure](/quickstart/project-structure)** — What goes where in your repo * **[CLI Overview](/cli)** — All available commands ## Project Structure Everything that defines your agent lives in your repo root. The runtime reads these files at startup and watches for changes during development. ``` my-agent/ ├── amodal.json ← root config: name, models, sandbox, MCP ├── connections/ ← REST APIs and MCP servers │ ├── slack/ │ │ ├── spec.json ← API source, auth, entities, sync config │ │ ├── access.json ← field restrictions, action tiers, scoping │ │ ├── surface.md ← (optional) endpoint documentation │ │ ├── entities.md ← (optional) entity definitions │ │ └── rules.md ← (optional) business rules │ ├── internal-api/ │ │ ├── spec.json │ │ └── access.json │ └── github/ ← MCP connection (no access.json needed) │ └── spec.json ├── skills/ ← Markdown reasoning frameworks │ └── triage/ │ └── SKILL.md ├── knowledge/ ← Domain knowledge documents │ ├── environment.md │ └── baselines.md ├── tools/ ← Custom tool handlers │ └── create_ticket/ │ ├── tool.json ← (optional) metadata, parameters, confirmation │ └── handler.ts ← handler code ├── stores/ ← Persistent data store schemas │ └── active-alerts.json ├── pages/ ← React UI views (dashboards, briefs) │ ├── ops-dashboard.tsx │ └── 
morning-brief.tsx ├── automations/ ← Scheduled or trigger-based runs │ └── daily-digest.json ├── evals/ ← Test cases for agent quality │ └── triage-accuracy.md ├── agents/ ← subagents + built-in overrides │ ├── explore/ ← override: replaces default explore agent │ │ └── AGENT.md │ ├── compliance-checker/ ← custom subagent │ │ └── AGENT.md │ └── vendor-lookup/ ← custom subagent │ └── AGENT.md └── .amodal/ └── store-data/ ← PGLite data directory (gitignored) ``` ### Config Reference | File/Directory | What It Does | Docs | | -------------- | --------------------------------------------------- | --------------------------------------- | | `amodal.json` | Agent identity, models, sandbox, MCP servers | [amodal.json](/guide/config) | | `connections/` | REST API and MCP server connections | [Connections](/guide/connections) | | `skills/` | Expert reasoning frameworks | [Skills](/guide/skills) | | `knowledge/` | Domain knowledge documents | [Knowledge Base](/guide/knowledge-base) | | `tools/` | Custom tool handlers with code | [Tools](/guide/tools) | | `stores/` | Persistent data store schemas | [Stores](/guide/stores) | | `pages/` | React UI views (dashboards, briefs, detail screens) | [Pages](/guide/pages) | | `automations/` | Scheduled/triggered agent runs | [Automations](/guide/automations) | | `evals/` | Test cases and assertions | [Evals](/guide/evals) | | `agents/` | Custom subagents + built-in agent overrides | [Agents](/guide/agents) | | MCP servers | Defined as connections or in `amodal.json` | [MCP Servers](/guide/mcp) | ### Hot Reload During `amodal dev`, the runtime watches all config files for changes with 300ms debounce. Edit any file and the agent picks up changes instantly — no restart needed. ## FAQ For architectural positioning vs. Vercel AI SDK, LangChain, Mastra, and raw provider SDKs, see [What is an agent?](/learn/what-is-an-agent). ### Does Amodal lock me into a specific LLM provider? No. Provider choice is a one-line config change. 
Amodal supports Anthropic, OpenAI, and Google Gemini today, with automatic failover between providers. Adding a new provider is wiring in the corresponding `@ai-sdk/*` package — the runtime's provider interface is thin. ### Can I self-host Amodal? Yes. The entire runtime is MIT-licensed and runs as a single Node.js process. Deploy it anywhere that runs Node (AWS, GCP, Fly, Railway, your own metal). Bring your own Postgres for production, or use the embedded PGLite for dev. The hosted cloud platform (`amodalai.com`) is a separate product. The OSS runtime is fully functional without it. ### Where's the intelligence configured? `amodal.json` at the root of your agent repo: ```json { "name": "My Agent", "provider": "anthropic", "model": "claude-sonnet-4-20250514" } ``` Swap the provider and model to change the intelligence. See [Providers](/guide/providers) for the full list. ### Where does the agent's "personality" and behavior come from? Three places, in order of weight: 1. **Skills** (`skills/*.md`) — expert reasoning methodologies that activate based on the question 2. **Knowledge** (`knowledge/*.md`) — persistent domain context the agent loads on demand 3. **`userContext` in `amodal.json`** — standing instructions that apply to every turn All three get compiled into the system prompt before every LLM call. ### Is Amodal a framework or a runtime? Both. `@amodalai/runtime` is the engine — a state machine agent loop with provider/tool/store/session systems. It's a library you can embed in your app with `createAgent()`. The CLI (`amodal dev`) wraps the runtime in an HTTP server with an admin UI. That's the "framework" layer — optional, but what most people use locally. ### Can I extend the runtime? Yes, several ways: * **Custom tools**: Drop a `handler.ts` in `tools//` and it's available to the agent. See [Tools](/guide/tools). * **MCP servers**: Connect any MCP server and its tools are discovered automatically. See [MCP Servers](/guide/mcp). 
* **Custom connections**: Define an OpenAPI-style spec in `connections//` to expose a new HTTP API to the agent. See [Connections](/guide/connections). * **Embedded runtime**: Use `createAgent()` to embed the agent in your own Node.js application. See [SDK Overview](/sdk). * **Fork it**: MIT license. Add new states to the state machine, new providers, new store backends. ### Is there a package ecosystem? Yes. Install pre-built connections, skills, and tools from the registry: ```bash amodal pkg install @amodalai/slack # Slack connection amodal pkg install @amodalai/stripe # Stripe connection amodal pkg install @amodalai/ops-pack # Bundle of on-call skills ``` Packaged connections and skills drop into your `connections/` and `skills/` directories. Install once, use everywhere. Publish your own to the marketplace. ### How does the runtime compare to Claude Code, Cursor, Aider? Those are **coding agents** — the domain is "your codebase" and the tools are file editing, shell commands, and git. They're opinionated products. Amodal is a **runtime for building domain-specific agents**. You specify the domain (via connections, skills, knowledge). The domain isn't code — it's whatever you want: SOC compliance, payment investigation, customer support, HR recruiting, ops incident response. If you want a coding agent, use Claude Code or Cursor. If you want to build an agent that lives inside your SaaS product, your ops pipeline, or your vertical domain, use Amodal. ### Can I use my own database? Yes. 
Inject a store backend at runtime creation time:

```typescript
import { createAgent, createPostgresStoreBackend } from '@amodalai/runtime'
import { Pool } from 'pg'

const agent = await createAgent({
  repoPath: './my-agent',
  storeBackend: createPostgresStoreBackend(
    new Pool({ connectionString: process.env.DATABASE_URL }),
  ),
})
```

For `amodal dev`, set the backend in `amodal.json`:

```json
{
  "stores": {
    "backend": "postgres",
    "connectionString": "env:DATABASE_URL"
  }
}
```

PGLite is the default for local dev (zero config). Postgres is the production path. Custom `StoreBackend` implementations can target any database that satisfies the interface.

### Are sessions persistent across server restarts?

Yes. Sessions are stored in the configured store backend (PGLite locally, Postgres in production). When you restart the runtime, sessions resume with their full conversation history.

### How do I handle secrets?

Environment variables referenced from `amodal.json` with the `env:` prefix:

```json
{
  "connections": {
    "stripe": {
      "auth": { "api_key": "env:STRIPE_API_KEY" }
    }
  }
}
```

Secrets are resolved at startup and held only in memory. They never appear in logs, the LLM prompt, or deployment snapshots. See [Security & Guardrails](/guide/security).

### How do I prevent the agent from making destructive API calls?

Connection ACLs via `connections//access.json`:

* Mark endpoints `allow`, `confirm`, or `deny`
* Mark fields `hidden` to strip them from responses
* Require `intent: "write"` or `"confirmed_write"` on mutating HTTP methods
* Rate limits per connection

The runtime enforces these rules before tool calls execute. See [Security & Guardrails](/guide/security) and [Connections](/guide/connections).

### How do I test an agent?

Evals. Define test cases in `evals/` with LLM-judged assertions, run them with `amodal eval`, compare scores across models. See [Evals](/guide/evals).

### Can automations run without the UI open?

Yes. Automations are cron/webhook-triggered and run in the runtime process.
They execute independently of any connected chat client and can post results to external systems (Slack, webhooks). See [Automations](/guide/automations). ### How do I debug when something's wrong? * **Logs**: Run with `-v` or `-vv` for structured log output. Every tool call, state transition, and error is logged with context. * **Inspect endpoints**: `GET /inspect/context` shows the compiled system prompt, active skills, and loaded knowledge. `/inspect/connections/:name` shows a connection's resolved spec. * **UI Activity panel**: The runtime admin UI (`amodal dev`) shows every SSE event in real time — text deltas, tool calls, errors, compaction, sub-agent dispatches. ### Is there a chat widget I can drop into my own web app? Yes. `@amodalai/react/widget` is a standalone embeddable chat widget with SSE streaming, theming, and callbacks. No React required on the host page. See [Chat Widget](/sdk/chat-widget). ## What is an agent? Useful definition: > **An agent is intelligence + a reasoning loop + configuration.** Most "what is an AI agent" explanations wave at emergent autonomy. That's the marketing answer. The engineering answer is three concrete pieces. ### 1. Intelligence The LLM. This is what people usually mean by "AI" — a model that takes tokens in and predicts tokens out. Claude, GPT-4, Gemini, whichever. The intelligence is **not** the agent. It's one component. A raw LLM call is stateless, has no tools, and has no idea what you're working on. Give it a chat interface and you have a chatbot — not an agent. Swapping the intelligence is a configuration change. Amodal supports Anthropic, OpenAI, and Google today, and can fail over between them. See [Providers](/guide/providers). ### 2. A reasoning loop The loop is what turns a single LLM call into something that can gather information, take actions, correct itself, and keep going. Without a loop, the LLM sees your message, emits a response, done. 
With a loop, the LLM can: * Ask for information ("call this API") * See the result * Decide what's next (ask for more, take an action, answer the user) * Repeat until it's done Every agent framework has some version of this loop. The canonical pattern is **ReAct** (Reason + Act, Princeton/Google, 2022): the model alternates between reasoning about what it knows and taking actions to learn more. Amodal's loop is an explicit state machine: `thinking → streaming → executing → back to thinking`, with additional states for sub-agent dispatch, user confirmation, and context compaction. It's deliberately more structured than a `while` loop so adding new behaviors (pausing for approval, detecting when the agent is stuck, spawning sub-agents) is additive instead of tangled. See [State Machine](/learn/architecture/state-machine). The loop is **domain-agnostic**. The same loop runs every agent you build. ### 3. Configuration This is the agent's knowledge of *your* domain. Where the first two are generic infrastructure, this is the thing that makes your agent different from every other agent. In Amodal, configuration is a git repo of plain files: * **Connections** tell the agent which APIs it can call, how to authenticate, and which endpoints/fields are off-limits. * **Skills** are markdown documents encoding expert reasoning ("when a user asks about payment failures, here's how to investigate"). * **Knowledge** is persistent domain context (environment details, team structure, historical patterns). * **Stores** are typed data buckets the agent reads and writes (findings, user preferences, automation outputs). * **Tools** are actions the agent can take (HTTP calls, custom handlers, MCP integrations). * **`amodal.json`** pins the provider, model, timeouts, sandbox rules, and everything else. The loop compiles this configuration into the system prompt for every turn. The LLM sees a coherent picture: who it is, what it knows, what it can do, and what rules it has to follow. 
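Of these pieces, tools are the most code-like. As a concrete illustration, here is a sketch of a custom tool handler. The argument and return shapes below are hypothetical assumptions, not the documented `handler.ts` contract (see [Tools](/guide/tools) for the real one).

```typescript
// Hypothetical handler for tools/create_ticket/handler.ts. The input
// and output shapes are illustrative assumptions; consult the Tools
// guide for the actual contract.
interface TicketInput {
  title: string
  priority: 'low' | 'high'
}

interface TicketResult {
  id: string
  url: string
}

export async function handler(input: TicketInput): Promise<TicketResult> {
  // A real handler would call your ticketing API here. This sketch
  // derives a deterministic ID so the return shape is visible.
  const tier = input.priority === 'high' ? 'P1' : 'P3'
  const id = `TICK-${input.title.length}-${tier}`
  return { id, url: `https://tickets.example.com/${id}` }
}
```

The runtime discovers the handler from its folder name and exposes it to the agent as a callable tool; an optional `tool.json` alongside it can add metadata, parameters, and confirmation requirements, per the project-structure reference.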
### The three pieces are independent | Piece | Swap example | Impact | | -------------- | -------------------------------------- | ------------------------------------------------------------------------------------------- | | Intelligence | Switch from Claude Sonnet to GPT-4o | Reasoning style changes, cost changes, speed changes. Configuration and loop are untouched. | | Reasoning loop | Add a new state (e.g., async approval) | All agents get the new behavior. Configuration and intelligence are untouched. | | Configuration | Edit `skills/triage.md` | Behavior on triage tasks changes. Loop and model are untouched. | *** ### Where Amodal fits in the stack This framing makes it easy to see what each tool in the ecosystem gives you. #### Raw provider SDKs (Anthropic, OpenAI, Google) **What you get:** Intelligence (streaming LLM calls, tool-calling primitives). **What you build yourself:** The loop. All of the configuration layer. Sessions, security, evals, scheduling, persistence. Starting here means \~3-6 months of infrastructure before you ship a single line of domain logic. #### Vercel AI SDK **What you get:** Intelligence + *loop primitives*. Unified streaming across providers, tool-calling with schema validation, built-in tool-loop control (`stopWhen: stepCountIs(N)`). **What you build yourself:** All of the configuration layer. Sessions, skills, knowledge, connections, security, stores, scheduling, MCP, evals. Vercel AI SDK is deliberately scoped to the loop primitives — their [own docs](https://ai-sdk.dev) describe it as "a unified API for generating text and structured objects, streaming responses, and building agentic systems." It's the right primitive layer. Amodal uses it internally for every LLM call. If you only need streaming LLM calls with tool support, use Vercel AI SDK directly. If you need a full agent, you'll keep building on top of it. 
#### LangChain, LlamaIndex **What you get:** Intelligence + loop + some configuration primitives (retrieval, memory, callbacks). **What you build yourself:** A coherent configuration model. The flexibility is also the burden — big API surface, many ways to do the same thing, lots to learn, more footguns. #### Mastra, CrewAI, AutoGen (framework peers) **What you get:** Intelligence + loop + configuration *in code*. Agents are defined as TypeScript/Python classes, workflows are chains, memory is an opinion. **What you build yourself:** Not much at the runtime layer. But you commit to defining agents in code, which means non-engineers can't edit agent behavior without a code deploy. #### Amodal **What you get:** * Intelligence (via Vercel AI SDK, provider-agnostic, with failover) * Loop (explicit state machine: thinking, streaming, executing, confirming, compacting, dispatching, done) * **Configuration in files, not code** — skills, knowledge, connections, stores, automations are markdown and JSON in a git repo * A security model baked in (field scrubbing, ACL enforcement, action tiers, confirmation gates) * Persistence (PGLite for dev, Postgres for production) * Package registry (`amodal pkg install @amodalai/stripe`, etc.) * Eval framework, automation scheduling, MCP client, sub-agent dispatch — all included **What you build yourself:** Your domain. Edit markdown, ship. **The bet:** agent behavior should be a git-versioned configuration asset that product/ops/domain experts can edit, not TypeScript classes that require a code deploy. Configuration-as-data > configuration-as-code for the thing that changes most often. ### What you don't get — and why Agents don't magically know your systems. They don't magically know your rules. They don't magically know which tool to call or which field to redact. That's all configuration. 
A framework that claims "zero config" is either hiding the config (making it hard to change) or operating on such a thin slice of the problem that the agent is barely useful. Good agents require work — but the work is writing markdown and editing JSON, not writing code. That's the trade. ### Next * [The Core Loop](/learn/architecture/core-loop) — the explore/plan/execute cycle * [State Machine](/learn/architecture/state-machine) — how the loop is implemented * [Project Structure](/quickstart/project-structure) — what a configuration repo looks like * [FAQ](/learn/faq) — practical questions ## Agent Architecture Amodal uses a multi-agent architecture where a **primary agent** delegates data-intensive work to ephemeral **task agents**, keeping its own context clean for reasoning. ### Agent Types #### Primary Agent One per chat session. Handles the conversation, reasons about findings, decides next steps, and presents results. Never loads API docs or parses raw JSON directly. #### Task Agents Ephemeral workers dispatched by the primary agent (or by other task agents). Each gets its own fresh context, loads KB docs, queries systems, interprets raw responses, and returns a clean summary (\~200-500 tokens). Context is discarded after. #### Automations Scheduled or triggered agent runs from installed marketplace automations plus custom automations. Admin configures parameters (frequency, channels), not logic. ### Context Isolation This is the key architectural insight. 
Without task agents, raw data accumulates in the primary agent's context and never leaves: ``` Without task agents: With task agents: [System prompt: 2K] [System prompt: 2K] [User: "Why is cash flow negative?"][User: "Why is cash flow negative?"] [API docs: 8K] ← stuck forever [Task result: 200 tokens] [Raw JSON: 3K] ← stuck forever [Task result: 250 tokens] [Pattern docs: 4K] ← stuck forever [Task result: 150 tokens] Context: 28K+ and growing Context: 3K — clean, focused ``` Task agents can dispatch sub-task agents (depth 2). Depth 3 is rare. Depth 4+ is blocked. ### Task Agent Capabilities Task agents inherit the parent's tool registry by default but are typically dispatched with a narrowed subset via `dispatch_task({toolSubset: [...]})`: | Tool | Available to sub-agents | | ---------------------------------- | ----------------------------------------------------------------------------- | | `request` (read-only) | Yes | | `query_store` | Yes | | `write_` | Yes (for automations with `writeEnabled: true`) | | `dispatch_task` (nested sub-tasks) | Yes (depth limited) | | `present` (widgets) | Usually filtered out — sub-agents return summaries, not widgets | | Custom tools | Yes (whichever ones the dispatcher includes in `toolSubset`) | | MCP tools | Yes (same rule) | | Skills | Not applicable — sub-agents have a task prompt, not an agent loop with skills | Task agents produce **200-500 token summaries** from 8-15K of processed data. The primary agent reasons about these summaries, not the raw data. ### Dispatch Example ``` User: "What happened to the payment service in the last hour?" Primary Agent dispatches 3 task agents in parallel: 1. "Query Datadog for payment-service metrics (error rate, latency, throughput)" 2. "Check PagerDuty for alerts on payment-service in the last hour" 3. 
"Query deployment logs for payment-service changes" Each task agent: - Loads relevant KB docs (API documentation) - Makes API calls via `request` - Processes raw JSON responses - Returns a clean summary Primary Agent receives 3 summaries (~600 tokens total) → Correlates: deployment at 2:15 PM matches error spike at 2:17 PM → Presents findings with timeline widget ``` ## Context Management The SDK provides several mechanisms to keep agent context clean and within limits, even during complex multi-step investigations. ### Smart Compaction When context approaches the limit, the SDK performs **structured state snapshots** that preserve key findings while discarding intermediate reasoning and raw data. What's preserved: * Key findings and conclusions * Active hypotheses * Entities and relationships discovered * User preferences from the session What's discarded: * Intermediate reasoning chains * Raw tool outputs already summarized * Superseded hypotheses ### Tool Output Masking A **backward-scan FIFO** mechanism prunes bulky tool outputs while protecting recent context. Older tool responses are truncated or removed when newer, more relevant data arrives. The mask operates on a priority system: 1. Recent tool outputs are protected 2. Older outputs with summaries already incorporated are candidates for removal 3. Large raw JSON responses are first to be pruned ### Eager Knowledge Loading All knowledge documents in `knowledge/` are loaded into the system prompt at session start. No on-demand fetch, no "which doc should I load" decision turn — the agent sees everything from turn one. This keeps reasoning simple: a user asks about payments-api and the agent already knows the baseline, the patterns, and the team directory without spending a turn on lookup. ``` System prompt: [full knowledge docs inline — 3-8K tokens typical] knowledge/environment.md: Production environment... knowledge/baselines.md: Service baselines... knowledge/patterns.md: Known patterns... 
knowledge/team.md: Team directory... ``` Sub-agents share the parent's context compiler, so they also start with full knowledge loaded. This works because realistic knowledge bases are small (\~20 docs, a few thousand tokens total). If your knowledge base starts to crowd the context window, split by session type (separate repos or personas) rather than adding retrieval — lookup turns compound across every investigation and the cost outweighs the context savings. ### Loop Detection The SDK detects when an agent enters unproductive loops: * **Pattern matching** — repeated identical tool calls or reasoning patterns * **LLM-based detection** — the model evaluates whether it's making progress When a loop is detected, the agent is nudged to try a different approach or escalate to the user. ## The Core Loop Every Amodal agent runs the same fundamental cycle: ``` Explore → what's going on? query systems, load context, gather data Plan → what should happen? reason about findings, decide next steps Execute → do it. call APIs, dispatch agents, present results, learn ``` This is the conceptual model. For the actual runtime implementation — the discriminated-union state machine that drives every agent turn — see [State Machine](/learn/architecture/state-machine). ### Adaptive Depth Not every question needs the full loop. The runtime matches depth to the question automatically: | Question | Loop Behavior | | ---------------------------------- | --------------------------------------------------------------------- | | "What's the current error rate?" | Explore only — query and answer | | "Why did latency spike at 3 PM?" | Explore + Plan — gather data, correlate, explain | | "Investigate the payment failures" | Full loop — sub-agent dispatch, iterative reasoning, skill activation | ### The Compounding Effect The loop compounds through stores and knowledge. 
Every execution can write findings to a [store](/guide/stores) — patterns identified, false positives flagged, baselines updated — so the next explore phase starts with prior context already loaded in. ``` Session 1: Explore → slow, everything is new Plan → generic reasoning Execute → discover false positive, write to findings store Session 50: Explore → fast, stores and KB have patterns and baselines Plan → informed reasoning with historical context Execute → focused on novel signals, skip known patterns ``` This is the flywheel — the system learns from use. See [Knowledge Base](/guide/knowledge-base) for details. ### How the loop actually runs Under the hood, the loop is implemented as an explicit state machine rather than a while loop with implicit states. Each agent turn transitions through: `thinking` → `streaming` → `executing` (if tools were called) → back to `thinking`, until the model stops or a stopping condition fires. Runtime guards: * **Max turns** — prevent infinite loops * **Max tokens** — hard budget ceiling * **Loop detection** — detects when the agent is stuck calling the same tool repeatedly with similar arguments * **Context compaction** — when the conversation exceeds a token threshold, older turns are summarized into a structured snapshot so the agent can keep going For the full state machine — all six states, the transition rules, and how streaming/tool-calls/compaction interleave — see [State Machine](/learn/architecture/state-machine). ## State Machine The Amodal agent loop is built as an **explicit state machine** — a discriminated union of states plus a `transition()` function that dispatches to a handler per state. This is what runs every time you send a message to an agent. 
The outer loop is trivial: ```typescript async function* runAgent(options): AsyncGenerator { let state: AgentState = { type: 'thinking', messages } yield { type: 'init', session_id } while (state.type !== 'done') { const { next, effects } = await transition(state, ctx) for (const event of effects) yield event // SSE to the client if (ctx.signal.aborted) { state = { type: 'done', reason: 'user_abort', usage }; continue } if (ctx.turnCount >= ctx.maxTurns) { state = { type: 'done', reason: 'max_turns', usage }; continue } state = next } yield { type: 'done', usage: ctx.usage } } ``` Every piece of the runtime logic lives inside the state handlers. The loop just drives transitions and yields events. ### The states ``` thinking ──────► streaming ──────► done (text only) ▲ │ │ ▼ │ executing ──► confirming ──► thinking (denied) │ │ │ │ │ └──► executing (approved) │ ▼ │ dispatching ──────► thinking │ │ │ ▼ └──── compacting ◄─── executing (context heavy) │ │ └──► thinking ``` | State | What happens | Next | | ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | | `thinking` | Compile system prompt, check for loops, clear old tool result bodies, call `streamText()`. | `streaming` (always) | | `streaming` | Consume the provider's streamed response — emit `text_delta` events, collect tool calls. | `done` (text only) or `executing` (tool calls) | | `executing` | Run the next tool in the queue, emit `tool_call_start`/`tool_call_result`, append results to the message history. | `confirming` (needs approval), `dispatching` (sub-agent), `executing` (next in queue), `compacting` (context heavy), or `thinking` | | `confirming` | Tool call needs user approval (ACL rule or `requiresConfirmation` flag). Emit `confirmation_required`, wait for response. 
| `executing` (approved) or `thinking` (denied) | | `dispatching` | Sub-agent runs in its own isolated state machine with the parent's tools/stores. Result becomes a tool result in the parent. | `thinking` | | `compacting` | Context exceeded threshold — summarize older turns into a structured snapshot and replace them. | `thinking` | | `done` | Terminal. Carries `usage` and `reason`. | — | ### Why a state machine The alternative is an imperative `while` loop (what most agent frameworks do) with implicit states hidden in if/else branches and variable flags. That works for simple agents but breaks down when you need to: * **Pause mid-loop for user confirmation** — an imperative loop has to awkwardly yield control to the transport layer, then resume with callbacks or blocking. * **Insert compaction, loop detection, or sub-agent dispatch** without tangling the control flow — new features become new states + new transitions, not if-branches scattered through the loop body. * **Test phases independently** — each state handler is a pure function: `(state, ctx) → { next, effects }`. You can test the compaction handler without running the whole loop. * **Answer "what state is the agent in right now?"** with a single variable instead of reconstructing it from flags. The tradeoff: more code upfront, unfamiliar pattern for developers who haven't worked with state machines. We think it's worth it once you hit the features above — and we hit all of them. ### Exhaustiveness is enforced by the compiler The `transition()` dispatcher uses the TypeScript `never` trick: ```typescript switch (state.type) { case 'thinking': return handleThinking(state, ctx) case 'streaming': return handleStreaming(state, ctx) // ... all states ... default: { const _exhaustive: never = state throw new Error(`Unhandled agent state: ${(_exhaustive as AgentState).type}`) } } ``` If someone adds a new state variant (say, `reviewing`) and forgets to add a case, the `_exhaustive` assignment fails to compile. 
Silent fallthrough is impossible. ### State data is minimal Each state variant carries **only** what its handler needs: ```typescript type AgentState = | { type: 'thinking'; messages: ModelMessage[] } | { type: 'streaming'; stream: StreamTextResult; pendingToolCalls: ToolCall[] } | { type: 'executing'; queue: ToolCall[]; current: ToolCall; results: ToolResult[] } | { type: 'confirming'; call: ToolCall; remainingQueue: ToolCall[] } | { type: 'compacting'; messages: ModelMessage[]; estimatedTokens: number } | { type: 'dispatching'; task: DispatchConfig; toolCallId: string; queue: ToolCall[]; results: ToolResult[] } | { type: 'done'; usage: TokenUsage; reason: DoneReason } ``` No shared "agent context" object leaks between states. The stuff that really is shared (provider, tool registry, logger, message history) lives on `AgentContext`, which is passed to every handler. ### SSE events are return values State handlers return `{ next, effects }` where `effects` is a list of SSE events. The outer loop yields them. This means: * **Business logic doesn't know about transport.** Handlers don't call `res.write()` or `sendEvent()`. They just return events. * **The same loop runs over HTTP, in automations, and in eval runners.** Each caller decides what to do with the yielded events. * **Testing is trivial.** Call `transition(state, ctx)`, assert on `next` and `effects`. 
### Done reasons The `done` state always carries `reason: DoneReason` and the final `usage`: | Reason | When | | ----------------- | ----------------------------------------------------------------------------------------------------------- | | `model_stop` | The model finished without requesting more tool calls | | `max_turns` | The turn budget was exhausted | | `user_abort` | The abort signal fired | | `error` | An unrecoverable error occurred inside a state handler | | `budget_exceeded` | Cumulative token usage hit the session's `maxSessionTokens` cap | | `loop_detected` | Loop detection hit the hard-stop threshold (8+ repeated tool calls with similar, non-pagination parameters) | The final `done` SSE event is yielded **unconditionally** with the accumulated usage — regardless of which reason triggered it. Downstream (audit log, usage reporting, the client) always sees token costs. ### Where to find it | File | What's in it | | ------------------------------------------ | -------------------------------------------------------------- | | `packages/runtime/src/agent/loop.ts` | The `runAgent` generator and `transition()` dispatcher | | `packages/runtime/src/agent/loop-types.ts` | `AgentState`, `AgentContext`, `DoneReason`, `TransitionResult` | | `packages/runtime/src/agent/states/` | One file per state handler | See [The Core Loop](/learn/architecture/core-loop) for the conceptual model this state machine implements. ## Admin Agent When you run `amodal dev`, the dev UI exposes a separate **admin chat** at `/config/chat`. Admin chat runs in its own session with its own system prompt — separate from your regular user chat — and has access to a set of file tools that let it read, search, and edit the files in your agent repo. You use it to configure the agent itself: add a connection, write a skill, fix a knowledge doc, delete a stale automation. Admin sessions are **local-only**. They exist in `amodal dev`, not in hosted runtimes or ISV embeddings. 
The admin agent's source code lives in the `@amodalai/agent-admin` package and is cached at `~/.amodal/admin-agent/` on first use. ### What makes an admin session different * **Different system prompt.** The admin agent is an assistant for configuring amodal agents. It knows the file structure (`skills/`, `knowledge/`, `connections/`, etc.), the config schemas, and the common patterns. * **File tools.** Nine admin-only tools for navigating and editing the repo. Regular chat sessions never see these. * **Path allowlist.** All file operations are scoped to the agent's config directories (`skills/`, `knowledge/`, `connections/`, `stores/`, `pages/`, `automations/`, `evals/`, `agents/`, `tools/`, `amodal_packages/`). Sensitive files (`.env`, `amodal.json`, `package.json`, `pnpm-lock.yaml`, `tsconfig.json`) are blocked. * **Read-only packages.** Files under `amodal_packages/` can be read but not written — they're installed from the registry and should be edited by updating the package, not the local copy. ### The nine file tools The admin agent has tools grouped by purpose: **discovery** (finding what's there), **read/write** (operating on files), and **introspection** (querying the runtime itself). #### Discovery Use these **before** reading or editing — don't guess paths. | Tool | When to use | | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | | **`grep_repo_files`** | You know a keyword that would appear *inside* the target file. `grep_repo_files({pattern: "emoji"})` is the fastest way to find where formatting rules live. | | **`list_repo_files`** | You want to see what exists in a directory (or across all allowlist dirs when `dir` is omitted). Good for "what skills do I have?" style questions. 
| | **`glob_repo_files`** | You want files matching a pattern by name/path structure: `**/SKILL.md`, `connections/*/spec.json`, `knowledge/*.md`. | **Rule of thumb:** keyword inside the file → grep. File/dir by name → list or glob. #### Read / Write | Tool | What it does | | -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **`read_repo_file`** | Read a single file. Returns up to 2000 lines by default; longer files come back with `truncated: true` and `total_lines` — paginate with `offset` + `limit`. | | **`read_many_repo_files`** | Batched read of up to 20 files (50KB each). Good for "review every SKILL.md." | | **`edit_repo_file`** | In-place find-and-replace. Default requires exactly one match of `old_string`; set `allow_multiple: true` to replace every occurrence. Use for targeted changes to existing files — preserves everything you don't touch. | | **`write_repo_file`** | Create a new file or fully rewrite an existing one. Prefer `edit_repo_file` for targeted changes. | | **`delete_repo_file`** | Delete a file. Always confirm with the user first. | #### Introspection | Tool | What it does | | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **`internal_api`** | GET any endpoint on the running runtime. Useful for checking eval results (`/api/evals/runs`), connection health (`/inspect/context`), store data, etc. Read-only (GET requests only). | ### The workflow 1. **Discover** — grep for a keyword, or list/glob for structure. Don't guess paths. 2. **Read** — confirm the file's current content before editing. 3. **Edit in place** — use `edit_repo_file` for targeted changes.
Use `write_repo_file` only for new files or full rewrites. 4. **Confirm** — tell the user what you changed. #### Example: "Reduce emoji usage in my formatting rules" ``` # 1. Discover: keyword search beats path-guessing grep_repo_files({pattern: "emoji"}) → {matches: [{file: "knowledge/formatting-rules.md", line_number: 3, text: "..."}]} # 2. Read: confirm current content read_repo_file({path: "knowledge/formatting-rules.md"}) → {content: "# Formatting Rules 🎨\n\nUse emojis liberally...", line_start: 1, line_end: 12, total_lines: 12} # 3. Edit: in-place substitution, not full rewrite edit_repo_file({ path: "knowledge/formatting-rules.md", old_string: "Use emojis liberally to make the output more engaging", new_string: "Use clear, concise language", }) → {edited: "knowledge/formatting-rules.md", occurrences: 1, bytes_before: 417, bytes_after: 289} ``` Three tool calls, done. Before these tools existed, agents guessed paths like `skills/content-analysis/SKILL.md` (doesn't exist) and spent turns flailing. ### Pagination — reading long files For large files (`amodal_packages/` specs, long lockfiles, generated docs), `read_repo_file` returns at most 2000 lines by default and sets `truncated: true`. Paginate with `offset`: ``` # First call — default read_repo_file({path: "knowledge/incident-history.md"}) → {content: "", line_start: 1, line_end: 2000, total_lines: 5432, truncated: true} # Continue from where we left off read_repo_file({path: "knowledge/incident-history.md", offset: 2001}) → {content: "", line_start: 2001, line_end: 4000, total_lines: 5432, truncated: true} # Last chunk read_repo_file({path: "knowledge/incident-history.md", offset: 4001}) → {content: "", line_start: 4001, line_end: 5432, total_lines: 5432} ``` For targeted reads (I want just line 2500), set `offset` directly — no need to page through everything first. 
| Parameter | Default | Max | | --------- | ----------- | -------- | | `offset` | `1` (start) | — | | `limit` | `2000` | `10_000` | ### Caps and limits | Tool | Cap | Behavior | | ---------------------- | ----------------------------- | -------------------------------------------------- | | `list_repo_files` | 2000 entries | `truncated: true` if exceeded | | `glob_repo_files` | 500 files | `truncated: true` if exceeded | | `grep_repo_files` | 100 matches | `truncated: true` if exceeded (matches gemini-cli) | | `read_repo_file` | 2000 lines default, 10000 max | `truncated: true` if more remain | | `read_many_repo_files` | 20 files × 50KB each | `truncated: true` for files cut at byte budget | `.git`, `node_modules`, and `.DS_Store` are always skipped across discovery tools. ### Customizing the admin agent The admin agent's source lives in the `@amodalai/agent-admin` package. Override the cached version with a local path in your `amodal.json`: ```json { "adminAgent": "/path/to/my/agent-admin" } ``` This is useful for iterating on the admin agent's prompt, skills, or knowledge without republishing the package. The runtime reads directly from the path on every admin-session start. ### When NOT to use the admin agent * **Regular user workflows.** Admin sessions are for configuring the agent; user chat is for the agent's actual job. * **Production deployments.** Admin is local-dev only. Hosted runtimes don't expose it. * **Editing files outside the allowlist.** Files at the repo root (`amodal.json`, `package.json`, `.env`) cannot be modified by admin tools — by design. Use your regular editor for those. ## Agents The `agents/` directory defines **custom subagents** and lets you **override built-in agents**. Each subdirectory is an agent with an `AGENT.md` file. 
``` agents/ ├── explore/ ← override: replaces default explore agent │ └── AGENT.md ├── plan/ ← override: replaces default plan agent │ └── AGENT.md ├── compliance-checker/ ← subagent: custom task agent │ └── AGENT.md └── vendor-lookup/ ← subagent: custom task agent └── AGENT.md ``` ### Reserved Names (Overrides) These directory names override built-in agents: | Name | What It Overrides | | --------- | --------------------------------------------------------------------------------- | | `explore` | The explore sub-agent that gathers data from connected systems | | `plan` | The plan agent that reasons before executing | | `main` | The primary agent prompt (backward compat: `agents/main.md` flat file also works) | Override agents use the raw `AGENT.md` content as the system prompt. ### Custom Subagents Any directory that isn't a reserved name defines a **custom subagent** — a reusable task agent the primary agent can dispatch by name for specialized work. #### AGENT.md Format (heading-based) ```markdown # Agent: Compliance Checker Checks regulatory compliance across transactions and flags violations. ## Config tools: [request, query_store, dispatch_task] maxDepth: 2 maxToolCalls: 15 timeout: 60 modelTier: advanced targetOutputMin: 200 targetOutputMax: 500 ## Prompt You are a compliance specialist. When dispatched: 1. Load the relevant compliance KB documents for the regulation in question 2. Query the transaction system for the entities specified 3. Check each entity against the compliance rules 4. Return a structured report: - Compliant items (brief) - Violations (detailed, with rule references) - Recommendations ``` #### AGENT.md Format (frontmatter) ```markdown --- displayName: Vendor Lookup description: Enriches vendor profiles from CRM and public data tools: [request, query_store] maxDepth: 1 maxToolCalls: 10 timeout: 30 modelTier: simple --- Query the vendor management system for the requested vendor. Cross-reference with public data sources. 
Return a standardized vendor profile with: - Company info, industry, size - Contract history - Risk indicators ``` ### Configuration Fields | Field | Type | Default | Description | | ----------------- | ------------------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------- | | `displayName` | string | directory name | Human-readable name | | `description` | string | displayName | Short description | | `tools` | string\[] | inherits parent's tool set | Subset of tools this sub-agent can use when dispatched. Names must exist in the parent's registered tool set. | | `maxDepth` | number (1-4) | `1` | Dispatch depth (1 = no sub-agents) | | `maxToolCalls` | number (1-100) | `10` | Max tool calls per execution | | `timeout` | number (5-600) | `20` | Timeout in seconds | | `targetOutputMin` | number (50-2000) | `200` | Min output tokens | | `targetOutputMax` | number (50-2000) | `400` | Max output tokens | | `modelTier` | `simple \| default \| advanced` | — | Model selection tier | #### Model Tiers | Tier | Use Case | | ---------- | -------------------------------------------------- | | `simple` | Data gathering, API queries, structured extraction | | `default` | Standard reasoning | | `advanced` | Complex analysis, multi-step reasoning | #### Available Tools Use any tool name registered on the parent's [tool registry](/guide/tools): `request`, `query_store`, `write_`, `dispatch_task`, `present`, `stop_execution`, plus any custom tools or MCP tools. Knowledge is pre-loaded into the sub-agent's context at dispatch time (no `load_knowledge` tool — knowledge comes in the system prompt, same as the primary agent). 
### How Subagents Are Dispatched The primary agent uses the `dispatch_task` tool to invoke subagents by name: ``` Primary Agent: "I need to check compliance for these transactions" dispatch_task({ agent: "compliance-checker", query: "Check SOX compliance for transactions TXN-001 through TXN-050" }) → Compliance checker agent runs in isolated context → Returns 200-500 token summary → Primary agent continues reasoning with the result ``` Each subagent gets its own context window. Results are returned as clean summaries, keeping the primary agent's context focused on reasoning. ### Context Isolation ``` Without subagents: With subagents: [System prompt: 2K] [System prompt: 2K] [User: "Check compliance"] [User: "Check compliance"] [Compliance rules: 8K] ← stuck [Subagent result: 300 tokens] [Raw transactions: 5K] ← stuck [Subagent result: 250 tokens] Context: 20K+ and growing Context: 3K — clean, focused ``` ### Backward Compatibility Flat files `agents/main.md` and `agents/explore.md` still work as override files. If both a flat file and a subdirectory exist, the subdirectory takes precedence. ## Automations Automations let your agent work proactively — monitoring systems, digesting data, responding to incidents, detecting drift — without someone typing in a chat box. Instead of waiting for a user to ask "what happened overnight?", the agent runs on its own, queries the systems it needs, and routes the results to Slack, email, a webhook, or wherever your team actually looks. Each automation is a file in the `automations/` directory. It defines what the agent should do (the prompt), when it should do it (a cron schedule or webhook trigger), and where the results go. The agent gets a fresh session each time — no accumulated state, no drift, no runaway loops. This is the difference between a chatbot and an operational agent. Chatbots wait. Agents act.
### JSON Format (Recommended) The simplest automation is a prompt and a schedule: ```json { "title": "Daily Revenue Digest", "schedule": "0 9 * * 1-5", "prompt": "Pull yesterday's revenue data and summarize by region. Highlight any anomalies compared to the weekly baseline." } ``` | Field | Type | Description | | -------------- | --------------------------------- | --------------------------------------------------------------------------------------------------------------- | | `title` | string | Display name, shown in the CLI and admin dashboard | | `prompt` | string | The message sent to the agent when the automation runs | | `schedule` | string | Cron expression (triggers `cron` mode) | | `trigger` | `"cron" \| "webhook" \| "manual"` | Trigger type (auto-inferred from `schedule` if present) | | `delivery` | object | Where results are routed on success — webhooks and/or ISV callbacks. See [Delivery routing](#delivery-routing). | | `failureAlert` | object | Where failure alerts go, with consecutive-failure threshold + cooldown. | | `writeEnabled` | boolean | Explicitly allow write operations (default: `false`) | ### Trigger Types | Type | How It Runs | | ----------- | ----------------------------------------------------------------------------------------------------- | | **cron** | On a schedule. Inferred automatically when `schedule` is present. Uses standard cron syntax. | | **webhook** | In response to an HTTP POST to the automation's webhook URL. The payload is injected into the prompt. | | **manual** | On-demand via CLI (`amodal ops automations trigger `). Useful for testing. | ### Complete Examples #### Example 1: Daily Revenue Digest A cron automation that runs every weekday morning, pulls revenue data from Stripe and QuickBooks, and posts a summary to Slack: ```json { "title": "Daily Revenue Digest", "schedule": "0 9 * * 1-5", "prompt": "Pull yesterday's revenue from Stripe (payments) and QuickBooks (invoices). 
Break down by region (NA, EMEA, APAC) using the customer metadata tags. Compare each region to its 7-day rolling average. Flag any region where revenue deviated more than 15% from the baseline. Include total MRR and net new revenue. Format as a concise Slack message.", "delivery": { "targets": [{ "type": "webhook", "url": "env:SLACK_WEBHOOK_REVENUE" }] } } ``` When this runs, the agent gets a fresh session with the prompt as the initial message. It uses the `request` tool to query Stripe's payment intents and QuickBooks' invoice endpoints, dispatches sub-agents to process each data source in parallel, compiles the summary, and the runtime POSTs the formatted result to the Slack incoming webhook URL configured via `SLACK_WEBHOOK_REVENUE` in `.env`. #### Example 2: Webhook-Triggered Incident Responder A webhook automation that fires when PagerDuty sends an incident notification. The agent investigates the incident, gathers context, and posts a summary to Slack: ```json { "title": "Incident Responder", "trigger": "webhook", "prompt": "A PagerDuty incident was triggered. The event data is included in this message.\n\nInvestigate this incident:\n1. Check Datadog for related metrics and anomalies in the affected service\n2. Search recent deployments in GitHub for changes to the affected service\n3. Check the knowledge base for known issues matching this pattern\n4.
Check Slack #incidents for any related discussion\n\nReturn your findings with a severity assessment and recommended next steps as a concise Slack message.", "delivery": { "targets": [{ "type": "webhook", "url": "env:SLACK_WEBHOOK_INCIDENTS" }] }, "failureAlert": { "after": 2, "cooldownMinutes": 15, "targets": [{ "type": "webhook", "url": "env:SLACK_WEBHOOK_ONCALL" }] } } ``` When PagerDuty sends a POST to the automation's webhook URL, the runtime parses the payload, appends it to the prompt inside a delimited event block (marked as untrusted input — the agent is explicitly instructed not to follow instructions from inside it), and starts a fresh agent session. The agent then runs the full loop: dispatches sub-agents to query Datadog, GitHub, and knowledge docs in parallel, synthesizes the findings, and returns a structured incident summary. The runtime POSTs that summary to the `SLACK_WEBHOOK_INCIDENTS` Slack webhook. If two consecutive runs fail, `failureAlert` POSTs to the oncall webhook — with a 15-minute cooldown so a sustained outage doesn't flood the channel. #### Example 3: Weekly API Drift Detector A cron automation that compares current API specifications against stored baselines to detect unexpected changes: ```json { "title": "API Drift Detector", "schedule": "0 6 * * 1", "prompt": "Run a drift detection pass across all connected APIs.\n\nFor each connection with a spec.json:\n1. Fetch the current API spec from the live endpoint (if discovery URL is configured)\n2. Compare against the stored spec.json in the connection directory\n3. Flag any new endpoints, removed endpoints, changed request/response schemas, or deprecated fields\n4. For each drift, assess the impact: breaking change, additive change, or cosmetic\n\nWrite each drift to the drift_history store (keyed by connection + timestamp).
Return a Slack-formatted summary — 'All clean' if nothing changed, or a prioritized list of breaking changes if anything did.", "delivery": { "targets": [{ "type": "webhook", "url": "env:SLACK_WEBHOOK_API_CHANGES" }] } } ``` This automation writes findings to a store. Over time, the agent builds up a queryable history of API changes, deprecation patterns, and known quirks — making each subsequent drift check smarter. Stores persist across sessions; the knowledge base gets loaded into context at session start. ### Webhook Automations Webhook automations respond to external events in real time. Each webhook automation gets a unique URL that you register with your external service (PagerDuty, GitHub, Stripe, or any system that sends HTTP webhooks). #### Getting the Webhook URL After deploying, the runtime generates a webhook URL for each webhook-triggered automation: ```bash $ amodal ops automations list Name Trigger Schedule Status Webhook URL ──────────────────── ───────── ────────────── ───────── ────────────────────────────────────────── revenue-digest cron 0 9 * * 1-5 active — incident-responder webhook — active http://localhost:3847/webhooks/auto_7kx2m9 drift-detector cron 0 6 * * 1 active — ``` Copy the webhook URL and configure it in your external service. For PagerDuty, add it as a webhook subscription. For GitHub, add it as a repository webhook. The URL is stable across deploys — it is tied to the automation name, not the deployment version. #### Webhook Payload Handling When the runtime receives a POST to a webhook URL, it: 1. Validates the request (optional signature verification per-connection) 2. Extracts the JSON payload 3. Replaces `{{event}}` in the automation's prompt with the stringified payload 4. Creates a fresh agent session with the composed prompt 5. Runs the explore-plan-execute loop 6. Routes output to the configured channel The raw payload is available to the agent as context. 
The agent can parse it, extract relevant fields, and use them to guide its investigation. For large payloads, the runtime truncates to 10KB and includes a note that the payload was trimmed. #### Webhook Security For production deployments, configure signature verification in the automation: ```json { "title": "GitHub Push Handler", "trigger": "webhook", "prompt": "...", "webhookAuth": { "type": "hmac-sha256", "secret": "env:GITHUB_WEBHOOK_SECRET", "header": "X-Hub-Signature-256" } } ``` The runtime validates the signature before processing the payload. Invalid signatures get a 401 response and the automation does not run. ### Markdown Format (Legacy) Automations can also be defined in Markdown. This format is supported but not recommended for new automations — JSON is more explicit and easier to validate: ```markdown # Automation: Morning Brief Schedule: 0 7 * * * ## Check Pull all active deals and recent activities from the CRM. Summarize wins, losses, and pipeline changes. ``` The parser extracts the title from the heading, the schedule from the `Schedule:` line, and the prompt from the `## Check` section. It works, but it is ambiguous in edge cases and does not support output routing. ### How Runs Work Each run is **stateless**. The agent queries systems fresh, using `since=lastRunTimestamp` to scope its queries to new data since the last run. The runtime provides `lastRunSummary` — a compact summary of the previous run's output — so the agent has continuity without accumulating state. 1. **Trigger** — The scheduler fires a cron job or the webhook listener receives a POST 2. **Session creation** — A fresh agent session is created with the automation's prompt (plus any injected event data) 3. **Explore-plan-execute** — The agent runs the full reasoning loop. It dispatches task agents, queries APIs, loads knowledge, and synthesizes findings 4. 
**Output routing** — Results are formatted and sent to the configured output channel (Slack, email, webhook, or stored in the session log) 5. **Cleanup** — The session is closed and context is discarded. The run summary is stored for the next run's continuity The statelessness guarantee matters. Automations cannot accumulate side effects across runs. Each run starts clean, which means a bug in one run cannot corrupt the next. If something goes wrong, the worst case is one bad summary — not a cascade of compounding errors. ### Guardrails in Practice #### Why Automations Cannot Write by Default Automations run without a human in the loop. There is no one to confirm a write operation, review a destructive action, or catch a hallucinated API call. This is fundamentally different from interactive chat, where the user sees every tool call and can approve or reject writes. Because of this asymmetry, automations are **read-only by default**. The agent can query any connected system, load knowledge, dispatch task agents, and generate output — but it cannot create, update, or delete data in external systems. The `request` tool rejects any call with `intent: 'write'` during an automation run unless writes are explicitly enabled. #### Enabling Writes for Specific Automations If an automation genuinely needs to write — posting a summary to Slack, creating a Jira ticket, updating a status page — you enable it explicitly in the automation config: ```json { "title": "Incident Responder", "trigger": "webhook", "prompt": "...", "writeEnabled": true } ``` When `writeEnabled` is `true`, the agent can make write calls — but the runtime still enforces rate limits, audit logging, and per-tool confirmation rules. The writes are logged in the automation run's audit trail. Setting `writeEnabled` is a deliberate decision. It shows up in code review. It is audited. There is no way to accidentally grant write access.
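The read-only rule can be pictured as a small policy gate in front of the tool executor. A minimal sketch, assuming hypothetical `ToolCall` and `RunContext` shapes (these names are illustrative, not the runtime's actual types):

```typescript
// Hypothetical sketch of the automation write gate. ToolCall and RunContext
// are illustrative shapes, not the runtime's internals.
type Intent = 'read' | 'write'

interface ToolCall { tool: string; intent: Intent }
interface RunContext { kind: 'interactive' | 'automation'; writeEnabled: boolean }

function checkWritePolicy(call: ToolCall, ctx: RunContext): { allowed: boolean; reason?: string } {
  // Reads are always allowed: the agent can query freely.
  if (call.intent === 'read') return { allowed: true }
  // Writes during automation runs require the explicit writeEnabled flag.
  if (ctx.kind === 'automation' && !ctx.writeEnabled) {
    return { allowed: false, reason: 'automations are read-only unless writeEnabled is set' }
  }
  return { allowed: true }
}
```

Interactive sessions take the per-tool confirmation path instead; this gate only models the automation case.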
### Delivery Routing When an automation completes successfully, the runtime POSTs its final response text to each configured delivery target. Delivery is out-of-band from the agent's tools — the agent doesn't call a "post to slack" tool, the runtime routes the result after the turn ends. This keeps automations **read-only by default** while still letting results reach Slack, webhooks, email (via webhook), or ISV systems. #### Targets Two target types: | Type | Shape | What it does | | ---------- | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `webhook` | `{type: "webhook", url: "..."}` | POST the delivery payload to the URL. Supports `env:VAR_NAME` substitution resolved at bundle-load time. | | `callback` | `{type: "callback", name?: "..."}` | Invoke the ISV-provided `onAutomationResult` handler registered with `createAgent({onAutomationResult})`. The optional `name` field lets ISVs route to specific handlers when they register several. | Slack delivery is just `webhook` pointing at a Slack incoming webhook URL. Email delivery works the same via any transactional email provider's webhook (SendGrid, Mailgun, etc.). #### Payload shape The runtime POSTs a JSON payload to webhook targets: ```json { "automation": "daily-revenue-digest", "status": "success", "timestamp": "2026-04-05T14:00:00Z", "result": "", "truncated": false, "message": "", "data": { "mrr": 12.4, "growth": "+3.2%" } } ``` `data` is populated automatically when the agent's last assistant message is valid JSON. That parsed object is available to templates. #### Templates If `delivery.template` is set, the runtime renders it with variables from `data` plus built-ins (`{{automation}}`, `{{timestamp}}`, `{{result}}`). 
Useful for shaping output per target: ```json { "delivery": { "targets": [{ "type": "webhook", "url": "env:SLACK_WEBHOOK" }], "template": "📊 *{{automation}}* — MRR ${{mrr}} ({{growth}})" } } ``` #### Failure alerts `failureAlert` is separate from `delivery` — it fires on failures, with a consecutive-failure threshold and cooldown: ```json { "failureAlert": { "after": 3, "cooldownMinutes": 60, "targets": [{ "type": "webhook", "url": "env:PAGERDUTY_WEBHOOK" }] } } ``` `after: 3` means don't page until 3 runs have failed in a row. `cooldownMinutes: 60` means even during a sustained outage, only alert once per hour. Both defaults (`after: 1`, `cooldownMinutes: 60`) err on the side of earlier alerting. **State is in-memory.** The consecutive-failure counter and last-alert timestamp live on the runtime process. Restarting the runtime resets the counter — a flapping service mid-cooldown may get re-alerted on restart. For hosted runtimes that cycle around deploys, expect occasional alert repetition at deploy boundaries. #### Why delivery lives outside the tool system Delivery is always allowed, even for automations without `writeEnabled: true`. Posting a summary to Slack isn't a "write" in the amodal sense — it's the runtime reporting what the agent found to the people who configured it. Writing means the agent itself calls a tool that changes external state. Delivery is just how automation results get where they need to go. 
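The `after` + `cooldownMinutes` semantics can be sketched as a tiny state machine. This is an illustration of the behavior described above, under assumed names (`AlertState`, `recordRun`, `shouldAlert` are not runtime exports):

```typescript
// Illustrative model of failureAlert gating. State is in-memory, so a
// process restart would reset both fields, matching the caveat above.
interface AlertState { consecutiveFailures: number; lastAlertAt: number | null }

function recordRun(state: AlertState, failed: boolean): AlertState {
  // A success resets the consecutive-failure counter.
  return failed
    ? { ...state, consecutiveFailures: state.consecutiveFailures + 1 }
    : { ...state, consecutiveFailures: 0 }
}

function shouldAlert(state: AlertState, after: number, cooldownMinutes: number, now: number): boolean {
  if (state.consecutiveFailures < after) return false
  // Suppress repeat alerts inside the cooldown window.
  if (state.lastAlertAt !== null && now - state.lastAlertAt < cooldownMinutes * 60_000) return false
  return true
}
```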
### Managing Automations ```bash # List all automations with their status and next run time $ amodal ops automations list Name Trigger Schedule Status Next Run ──────────────────── ───────── ────────────── ───────── ────────────────────── revenue-digest cron 0 9 * * 1-5 active 2026-03-27 09:00 UTC incident-responder webhook — active (on event) drift-detector cron 0 6 * * 1 paused — # Pause an automation (skips future runs until resumed) $ amodal ops automations pause drift-detector ✓ Paused "API Drift Detector" — will not run until resumed # Resume a paused automation $ amodal ops automations resume drift-detector ✓ Resumed "API Drift Detector" — next run: 2026-03-30 06:00 UTC # Manually trigger an automation (useful for testing) $ amodal ops automations trigger revenue-digest ✓ Triggered "Daily Revenue Digest" ℹ Run started — session auto_run_8k2mx4 ℹ Streaming output to #finance-daily # View the log of recent automation runs $ amodal ops automations history revenue-digest Run ID Started Duration Status Output ──────────────── ──────────────────── ────────── ────────── ────────── run_9x2k4m 2026-03-26 09:00 42s completed → #finance-daily run_8m3k1n 2026-03-25 09:00 38s completed → #finance-daily run_7k4n2p 2026-03-24 09:00 51s completed → #finance-daily ``` ## amodal.json Every Amodal project starts with a single file: `amodal.json` at the root of your repo. This is the manifest. It tells the runtime who the agent is, which LLM to use, how to handle failover when a provider goes down, and how to connect to the outside world — MCP servers, data stores, and sandbox environments. Think of it as `package.json` for an intelligent agent. The runtime reads it on startup, resolves any environment variable references, validates the schema, and uses it to bootstrap everything: the reasoning loop, provider connections, tool execution policies, and automation scheduling. The config is deliberately flat. There are no nested pipelines or DAGs to configure. 
You pick your models, point at your connections, and the runtime handles the rest. ### Minimal Config The smallest thing that works. Three fields — name, version, and a main model: ```json { "name": "my-agent", "version": "0.1.0", "models": { "main": { "provider": "anthropic", "model": "claude-sonnet-4-20250514" } } } ``` This gives you a working agent with no connections, no knowledge base, and no automations. It can chat, reason, and use built-in tools. You would use this during early development or when prototyping a new agent before wiring up APIs. The `provider` field tells the runtime which SDK adapter to load. The `model` field is passed directly to that provider's API. The runtime expects the provider's API key in the standard environment variable (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) unless you specify explicit credentials. ### Production Config with Failover A realistic configuration for a deployed agent. This is what a production ops-agent might look like — primary model with a fallback, a cheaper model for data-gathering sub-agents, and custom system context: ```json { "name": "ops-agent", "version": "2.4.1", "description": "Infrastructure monitoring and incident response for Acme Corp", "models": { "main": { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "credentials": { "api_key": "env:ANTHROPIC_API_KEY" }, "fallback": { "provider": "openai", "model": "gpt-4o", "credentials": { "api_key": "env:OPENAI_API_KEY" } } }, "explore": { "provider": "anthropic", "model": "claude-haiku-4-5-20251001", "fallback": { "provider": "google", "model": "gemini-2.0-flash" } } }, "userContext": "You are the operations agent for Acme Corp. Our infrastructure runs on AWS (us-east-1 and eu-west-1). We use Datadog for monitoring, PagerDuty for incident management, and Slack (#ops-alerts) for communication. Always check Datadog metrics before escalating. Never restart production services without explicit confirmation." 
} ``` The `userContext` string is injected at the top of every session prompt. Use it for standing instructions that should always apply — your company's infrastructure layout, naming conventions, escalation policies, or behavioral constraints. This is not the place for methodology (that goes in skills) or reference data (that goes in the knowledge base). Think of it as the agent's permanent memory of "who am I and what are the ground rules." ### Config with MCP, Stores, and Sandbox When you need external tool servers, persistent storage, and sandboxed execution: ```json { "name": "finops-agent", "version": "1.2.0", "description": "Financial operations analysis and reporting", "models": { "main": { "provider": "anthropic", "model": "claude-sonnet-4-20250514" }, "explore": { "provider": "google", "model": "gemini-2.0-flash" } }, "sandbox": { "shellExec": true, "template": "finops-sandbox-v2", "maxTimeout": 60000 }, "stores": { "dataDir": ".amodal/store-data", "backend": "pglite" }, "proactive": { "webhook": "https://hooks.acme.com/amodal/finops" }, "mcp": { "servers": { "github": { "transport": "stdio", "command": "uvx", "args": ["mcp-server-github"], "env": { "GITHUB_TOKEN": "env:GITHUB_TOKEN" } }, "postgres": { "transport": "stdio", "command": "npx", "args": ["-y", "@modelcontextprotocol/server-postgres"], "env": { "DATABASE_URL": "env:ANALYTICS_DB_URL" } }, "custom-tools": { "transport": "sse", "url": "https://tools.internal.acme.com/mcp", "headers": { "Authorization": "env:INTERNAL_TOOLS_TOKEN" } } } } } ``` MCP (Model Context Protocol) servers extend your agent with additional tools beyond the built-in set. Each server entry declares how to connect — either `stdio` (the runtime spawns the process) or `sse` (the runtime connects to a running server over HTTP). The tools exposed by these servers appear alongside built-in tools in the agent's tool list. The `sandbox` block controls shell execution. When `shellExec` is `true`, the agent can run arbitrary commands. 
The `template` field specifies which sandbox image to use (pre-configured with your dependencies), and `maxTimeout` caps how long any single command can run. The `stores` block configures the database backend used for **both** agent document stores (see [Stores](/guide/stores)) **and** session persistence (conversation history across restarts). PGLite (the default) runs an in-process WASM build of Postgres — no external database needed for local dev. Agent data lives in `.amodal/store-data/` and session history in `.amodal/session-data/`, both file-backed so they survive restarts. For hosted runtimes and ISV production deployments, switch to a real Postgres by setting `backend: "postgres"` and providing `postgresUrl` (typically `env:DATABASE_URL`). Both backends share the same Drizzle schema, so switching is a pure config change. ### Fields #### Required | Field | Type | Description | | ------------- | ----------- | --------------------------------------------------------------------------------------------------- | | `name` | string | Agent name (min 1 char). Used in logging and deployment IDs. | | `version` | string | Semantic version. Used for snapshot tagging and versioning. | | `models.main` | ModelConfig | Primary agent model. This is the reasoning brain — it handles the conversation, plans, and decides. | #### Optional | Field | Type | Description | | ---------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------- | | `description` | string | Agent description. Shown in CLI output. | | `userContext` | string | Injected at the top of every session prompt. Standing instructions, company context, behavioral constraints. | | `models.explore` | ModelConfig | Model for explore/gather sub-agents. Should be faster and cheaper than `main`. | | `sandbox` | object | Sandbox execution config. 
Controls whether custom tool handlers can call `ctx.exec()` for shell commands, and how sandboxing is enforced. | | `stores` | object | Data store backend config. Defaults to PGLite in the local `.amodal/store-data` directory. | | `proactive` | object | Webhook URL for external triggers (automation webhooks, third-party integrations). | | `mcp` | object | MCP server connections. Each server exposes additional tools to the agent. | | `webTools` | object | Enables `web_search` + `fetch_url` built-in tools via Gemini grounding. See [Web Tools](/guide/tools#web-tools). | ### ModelConfig ```typescript { "provider": "anthropic" | "openai" | "google", "model": "claude-sonnet-4-20250514", "baseUrl": "https://...", // optional, custom endpoint "credentials": { // optional, explicit keys "api_key": "env:ANTHROPIC_API_KEY" }, "fallback": { ... } // optional, another ModelConfig } ``` #### Model Tiers: Main vs. Explore The `main` model is your reasoning engine. It handles the user conversation, decides when to dispatch sub-agents, interprets their findings, plans next steps, and composes the final response. This is where model quality matters most — you want the best model you can afford here, because this is where judgment happens. The `explore` model is the workhorse. When the primary agent dispatches a task agent to gather data — "go query Datadog for the last hour of CPU metrics" or "pull the customer's recent Stripe invoices" — that task agent uses the explore model. These sub-agents do focused, bounded work: load some knowledge, make API calls, interpret the raw response, and return a clean summary. They do not need the full reasoning capability of the main model. This matters for cost and latency. A complex investigation might dispatch 5-10 task agents, each making multiple tool calls. If every one of those runs on your most expensive model, costs add up fast and the user waits longer. 
By routing sub-agents to a faster, cheaper model, you keep the primary agent's context clean (it only sees the summaries) and your token bill reasonable. Typical pairings: | Use Case | Main | Explore | | -------------- | ------------- | ------------- | | Cost-optimized | Claude Sonnet | Claude Haiku | | Quality-first | Claude Opus | Claude Sonnet | | Multi-provider | Claude Sonnet | Gemini Flash | If you omit `models.explore`, the runtime falls back to `models.main` for everything. This works fine — it just costs more and runs slower on complex questions. #### Fallback Chains Every `ModelConfig` can include a `fallback` — another `ModelConfig` that the runtime tries when the primary provider fails. Failures include HTTP 5xx errors, rate limits (429), timeouts, and authentication errors. The fallback is itself a full `ModelConfig`, which means it can have its own fallback, forming a chain: ```json { "models": { "main": { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "fallback": { "provider": "openai", "model": "gpt-4o", "fallback": { "provider": "google", "model": "gemini-2.5-pro" } } } } } ``` In this example: the runtime tries Anthropic first. If that is down (outage, rate limit), it falls back to OpenAI GPT-4o. If OpenAI also fails, it falls back to Gemini. The user never sees the failover — the runtime handles it transparently and logs the switch. Fallback is per-request. If Anthropic recovers, the next request goes back to the primary. There is no sticky routing. This is particularly useful for production deployments where uptime matters more than provider loyalty. A multi-provider fallback chain means your agent stays up even during provider outages, which happen more often than you would like. 
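The per-request chain walk can be sketched in a few lines. `ModelConfig` mirrors the config shape from this page, but `callWithFallback` and its retry behavior are an assumption for illustration, not the runtime's source (the real logic distinguishes retryable failures such as 5xx, 429, and timeouts; this sketch falls through on any error):

```typescript
// Illustrative walk of a fallback chain, one request at a time.
interface ModelConfig { provider: string; model: string; fallback?: ModelConfig }

async function callWithFallback<T>(
  config: ModelConfig,
  attempt: (c: ModelConfig) => Promise<T>,
): Promise<T> {
  let current: ModelConfig | undefined = config
  let lastError: unknown = new Error('empty fallback chain')
  while (current) {
    try {
      // Per-request: the next request starts over at the primary config.
      return await attempt(current)
    } catch (err) {
      lastError = err
      current = current.fallback // move down the chain
    }
  }
  throw lastError
}
```

Because the chain is walked fresh on every call, there is no sticky routing: a recovered primary is used again immediately.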
### Environment Variables Any string value in `amodal.json` can reference an environment variable using the `env:` prefix: ```json { "stores": { "postgresUrl": "env:DATABASE_URL" }, "mcp": { "servers": { "github": { "env": { "GITHUB_TOKEN": "env:GITHUB_TOKEN" } } } } } ``` The runtime resolves these at parse time — before any connections are established or tools are loaded. If a referenced variable is missing, the runtime throws an `ENV_NOT_SET` error and refuses to start. This is intentional. A misconfigured agent is worse than a stopped agent. #### When to Use env: vs. Hardcoded Values **Always use `env:` for:** * API keys, tokens, and secrets of any kind * Database connection strings (they contain passwords) * Anything that changes between environments (dev/staging/prod) **Hardcode when:** * The value is not sensitive: model names, project IDs, region strings, cron schedules * The value is part of the agent's identity: name, version, description A good rule of thumb: if you would not want the value visible in a public GitHub repo, use `env:`. The `amodal.json` file is checked into git. Secrets should never be in it directly. For local development, put your environment variables in a `.env` file at the repo root (and add it to `.gitignore`). The runtime loads `.env` automatically in repo mode. For production deployments, set them in your hosting environment's secret management — Kubernetes secrets, AWS Parameter Store, Fly.io secrets, or whatever your infrastructure uses. ## Connections Each connection is a directory in `connections/` containing a spec, access rules, and optional documentation. Pre-built plugins come with everything maintained — you just provide credentials. 
```bash amodal pkg connect slack # install a plugin amodal pkg sync --from <url> # sync from OpenAPI/GraphQL ``` ### Directory Structure ``` connections/my-api/ ├── spec.json ← API source, auth, entities, sync config ├── access.json ← field restrictions, action tiers, row scoping ├── surface.md ← (optional) endpoint documentation ├── entities.md ← (optional) entity definitions └── rules.md ← (optional) business rules ``` ### spec.json Connections support two protocols: **REST** (the default) for HTTP APIs, and **MCP** for [Model Context Protocol](/guide/mcp) tool servers. #### REST Connection ```json { "baseUrl": "https://api.example.com", "specUrl": "https://api.example.com/openapi.json", "format": "openapi", "auth": { "type": "bearer", "token": "env:MY_API_TOKEN", "header": "Authorization", "prefix": "Bearer " }, "sync": { "auto": true, "frequency": "on_push", "notify_drift": true }, "filter": { "tags": ["public"], "include_paths": ["/api/v2/**"], "exclude_paths": ["/api/v2/internal/**"] } } ``` #### MCP Connection ```json { "protocol": "mcp", "transport": "stdio", "command": "uvx", "args": ["mcp-server-github"], "env": { "GITHUB_TOKEN": "env:GITHUB_TOKEN" } } ``` MCP connections do not require `access.json`, `baseUrl`, or `format`. See [MCP Servers](/guide/mcp) for full details on transports and configuration. #### Fields | Field | Description | | ----------- | --------------------------------------------------------------------------------------- | | `protocol` | `"rest"` (default) or `"mcp"` | | `baseUrl` | API base URL (required for REST) | | `specUrl` | URL to the API spec document (optional) | | `testPath` | Relative path appended to baseUrl for `validate` health checks (optional, e.g.
`"/me"`) | | `format` | `"openapi"`, `"graphql"`, `"grpc"`, `"rest"`, or `"aws-api"` (REST only) | | `auth.type` | `"bearer"`, `"api_key"`, `"oauth2"`, `"basic"`, `"header"` (REST only) | | `sync` | Auto-sync settings and drift notification (REST only) | | `filter` | Include/exclude endpoints by tag or path glob (REST only) | | `transport` | `"stdio"`, `"sse"`, or `"http"` (MCP only) | | `command` | Command to spawn (MCP stdio only) | | `args` | Command arguments (MCP stdio only) | | `env` | Environment variables (MCP stdio only) | | `url` | Server URL (MCP sse/http only) | | `headers` | HTTP headers, e.g. for auth (MCP sse/http only) | | `trust` | Trust unsigned responses (MCP http only) | ### access.json Controls what the agent can see and do: ```json { "endpoints": { "GET /api/deals/{id}": { "returns": ["Deal"], "confirm": false }, "POST /api/deals": { "returns": ["Deal"], "confirm": true, "reason": "Creates a new deal", "thresholds": [ { "field": "body.amount", "above": 10000, "escalate": "review" } ] }, "DELETE /api/deals/{id}": { "returns": ["Deal"], "confirm": "never", "reason": "Deletion not allowed via agent" } }, "fieldRestrictions": [ { "entity": "Contact", "field": "ssn", "policy": "never_retrieve", "sensitivity": "pii_identifier", "reason": "PII — never exposed" }, { "entity": "Contact", "field": "email", "policy": "retrieve_but_redact", "sensitivity": "pii_name" }, { "entity": "Deal", "field": "internal_notes", "policy": "role_gated", "sensitivity": "internal", "allowedRoles": ["supervisor"] } ], "rowScoping": { "Deal": { "owner_id": { "type": "field_match", "userContextField": "userId", "label": "your deals" } } }, "delegations": { "enabled": true, "maxDurationDays": 7, "escalateConfirm": true }, "alternativeLookups": [ { "restrictedField": "Contact.ssn", "alternativeEndpoint": "GET /api/contacts/{id}/verification-status", "description": "Use verification status instead of raw SSN" } ] } ``` #### Action Tiers | Tier | Behavior | | 
-------------------- | -------------------------------------- | | `false` / omitted | Allow without confirmation | | `true` / `"confirm"` | Ask user for approval before executing | | `"review"` | Show the full plan before executing | | `"never"` | Block the operation entirely | #### Field Restriction Policies | Policy | Effect | | --------------------- | ----------------------------------------------------- | | `never_retrieve` | Field completely removed from API responses | | `retrieve_but_redact` | Kept in data, replaced with `[REDACTED]` in output | | `role_gated` | Removed if user lacks `allowedRoles`, else redactable | #### Threshold Escalation Endpoints can escalate their confirmation tier based on request parameters: ```json { "field": "body.amount", "above": 10000, "escalate": "review" } ``` If `body.amount > 10000`, the tier escalates from `confirm` to `review`. ### Drift Detection ```bash amodal pkg sync --check # report drift without updating (CI-friendly) amodal pkg sync # update local specs from remote ``` ### Pre-built Plugins 50+ pre-built connection plugins are available for popular APIs like Slack, GitHub, Stripe, Datadog, Jira, and more. Browse and install them from the **[Amodal Marketplace](https://www.amodalai.com/marketplace)**, or install directly from the CLI: ```bash amodal pkg connect slack amodal pkg connect stripe amodal pkg connect datadog ``` For APIs not in the marketplace, create custom connections using the directory structure above with your own OpenAPI or GraphQL spec. ## Engineering Standards The standards the runtime holds itself to. If you're writing custom tool handlers, embedding the runtime via `createAgent()`, or extending the codebase, follow these patterns. They're not aesthetic — every one exists because we paid for the opposite in a previous version of the codebase. ### Errors are values Functions that can fail return `Result` — not `null`, not thrown exceptions (except at module boundaries). 
```typescript type Result<T, E> = { ok: true; value: T } | { ok: false; error: E } ``` This forces the caller to handle both cases. You can't accidentally treat "not found" the same as "database is broken." **Never:** * `catch (e) { }` — empty catch * `catch (e) { return null }` — caller can't distinguish failure from absence * `catch (e) { log.error(e) }` without re-throwing — error logged but swallowed * `catch (e) { return [] }` — empty data hiding a broken system **Four valid reasons to catch:** 1. **Enrich and re-throw** — add context: `throw new StoreWriteError(store, id, err)` 2. **Module boundary → structured error response** — API routes, tool executors convert errors into the agent's observation 3. **Specific expected failure with specific handling** — retries, fallbacks 4. **Cleanup** — use `finally`, not `catch` Error boundaries live at **module edges** — API routes, tool executors, the session manager. Not inside store backends, not inside utility functions, not inside state handlers. ### Async discipline **No floating promises.** Every async call is `await`ed or explicitly `void`ed with a `.catch()`. A floating promise that rejects silently is as bad as a swallowed error. ```typescript // BAD executeStoreDirectly(storeBackend, storeName, data) // GOOD await executeStoreDirectly(storeBackend, storeName, data) // GOOD — intentional fire-and-forget with error handling void deliverResult(result).catch(err => logger.error('delivery_failed', { error: err.message }) ) ``` **Timeouts on all external operations.** Every provider call, MCP call, tool execution, and store operation gets an `AbortSignal.timeout()`. If the external system hangs, we don't hang with it. ```typescript await provider.request(url, { signal: AbortSignal.timeout(5000) }) ``` **Exhaustive switches on discriminated unions.** Use the `never` trick so adding a new variant causes a compile error, not silent fallthrough.
```typescript switch (state.type) { case 'thinking': return handleThinking(state, ctx) case 'streaming': return handleStreaming(state, ctx) // ... all cases ... default: { const _exhaustive: never = state throw new Error(`Unhandled state: ${(_exhaustive as AgentState).type}`) } } ``` ### Types as documentation * **No `any`.** Use `unknown` and narrow with type guards. * **No `as` casts** except at system boundaries (parsing external JSON/API responses, after validation). * **Discriminated unions** for state types (`AgentState`, `SSEEvent`, `ToolResult`). The type tells you what fields exist in each variant. * **Branded types for IDs** (`SessionId`, `TenantId`, `ToolCallId`) — prevents passing a session ID where a tenant ID is expected. ### Logging Logs are the runtime narrative of what happened. Use the `Logger` interface, never `console.log`, `console.error`, or `process.stderr.write`. ```typescript // BAD console.log(`Processing tool call ${toolName} for session ${sessionId}`) // GOOD logger.info('tool_call_start', { tool: toolName, session: sessionId, tenant: tenantId, }) ``` Snake\_case event names, structured data object. Every tool call, state transition, and error emits a structured log. **Always log** on tool calls: tool name, status, duration, session ID, tenant ID. On errors: what operation, what inputs, what state. **Never log:** raw API credentials, tokens, full PII. Use redacted patterns. ### Module boundaries * No importing from another module's internal files (`../agent/internal/helper.ts` from the session manager — no) * No accessing private fields via `(obj as any).field` or `obj['_privateField']` * No circular dependencies between modules * Each module wraps errors at its boundary with module-specific error types ### Tool schemas * **Code-defined tools** (store, connection, admin): use Zod schemas. You get TypeScript type inference on the execute function. 
* **External-schema tools** (MCP tools, custom tools from `tool.json`): use `jsonSchema()` from the AI SDK. Pass the schema through unchanged. Converting to Zod and back is a lossy round-trip that can lose `nullable`, `oneOf`, `$ref`, or `format` constraints. ### Testing * **Integration tests > unit tests** for tool execution — test the real path, not mocks. * **Contract tests for SSE events** — if an event shape changes, the test fails before the UI breaks. * **Don't test implementation details** — test public behavior. Private functions can be refactored freely. ### What this means for custom tool handlers When you write a `handler.ts` for a custom tool, the same rules apply to your handler's code: * Use `ctx.log` (structured event + data), not `console.log` * Don't catch errors to swallow them — let them propagate so the executor can turn them into a proper tool-error observation * Don't make fire-and-forget HTTP calls * Use `ctx.request()` for connection HTTP — it handles auth, timeouts, and permission checks ### What this means for `createAgent()` embedding When you embed the runtime via `createAgent({ storeBackend, sessionStore, ... })`: * Your injected `StoreBackend` should return `Result` from operations that can fail * Provider API keys belong in environment variables, not hardcoded * Timeouts on any HTTP client you pass into the runtime * Your own error handler middleware catches whatever bubbles up — the runtime throws typed errors at its public boundary ## Evals Evals live in `evals/` as Markdown files. Each eval defines a query, setup context, and assertions that measure agent quality. ### Eval File Format ```markdown # Eval: Revenue Drop Investigation Tests the agent's ability to investigate a revenue anomaly. ## Setup Context: Revenue dropped 30% yesterday compared to the weekly average. ## Query "Revenue was down 30% yesterday. What happened?" 
## Assertions - Should query Stripe charges for the relevant time period - Should compare against baseline or previous period - Should check for known issues (billing cycle, deployments, timezone effects) - Should provide specific numbers (dollar amounts, percentages) - Should NOT fabricate data or guess without querying - Should NOT blame external factors without evidence ``` ### Parsed Fields | Field | Source | Description | | --------------- | -------------------------------------- | ------------------------ | | `name` | Filename without `.md` | Eval identifier | | `title` | `# Eval: Title` heading | Display name | | `description` | Text between heading and first `##` | What the eval tests | | `setup.context` | `Context:` line in `## Setup` | Background context | | `query` | Content of `## Query` (without quotes) | The user message to test | | `assertions` | `## Assertions` list items | Quality criteria | ### Assertions Lines starting with `- Should` are **positive assertions** — things the agent must do. Lines starting with `- Should NOT` are **negated assertions** — things the agent must avoid. ### Running Evals ```bash amodal eval # run all evals amodal eval --file revenue-drop.md # run one eval amodal eval --providers anthropic,openai # compare providers ``` ### Evaluation Methods | Method | Description | | ----------------- | ------------------------------------------------------------ | | **LLM Judge** | A separate LLM evaluates the response against each assertion | | **Tool usage** | Verify expected tools were called | | **Cost tracking** | Token usage and cost per eval case | ### Experiments Compare configurations side-by-side: ```bash amodal ops experiment ``` Test different models, skills, knowledge docs, or prompts. Results include quality scores, costs, and latency. ### Multi-Model Comparison ```bash amodal eval --providers anthropic,openai,google ``` Runs the same suite against each provider for cost/quality/latency comparison. 
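For tooling that consumes eval files, the `Should` / `Should NOT` convention is mechanical to parse. A minimal sketch (illustrative only, not the runtime's actual parser):

```typescript
type Assertion = { text: string; negated: boolean }

// Lines starting with "- Should NOT" are negated assertions;
// plain "- Should" lines are positive assertions.
function parseAssertions(markdown: string): Assertion[] {
  return markdown
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('- Should'))
    .map((line) => ({
      text: line.replace(/^- /, ''),
      negated: /^- Should NOT\b/.test(line),
    }))
}

const parsed = parseAssertions(
  '- Should query Stripe charges\n- Should NOT fabricate data',
)
console.log(parsed[1]) // → { text: 'Should NOT fabricate data', negated: true }
```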
## Knowledge Base The knowledge base is what turns a general-purpose reasoning engine into *your* agent. Without knowledge, the agent is smart but uninformed — it can follow methodologies and call APIs, but it does not know that your payments service spikes every Friday at payroll time, or that the `svc-deploy` service account triggers brute-force alerts every time CI runs, or that your team prefers Slack over email for P1 escalations. Knowledge documents live in `knowledge/` as Markdown files. They encode everything the agent needs to know about your specific environment, your baselines, your team, and the patterns you have learned from past experience. Each document is loaded into the agent's system prompt at session start, so everything in `knowledge/` is available on every turn without the agent having to look anything up. You grow the knowledge base by editing files directly — either in your editor or through the [admin agent](/guide/admin-agent), which can search, read, and edit knowledge docs through its file tools. ``` knowledge/ ├── environment.md ├── baselines.md ├── patterns.md ├── false-positives.md ├── team.md └── response-procedures.md ``` ### Document Format ```markdown # Knowledge: Normal Traffic Patterns - Weekday: 2,000-4,000 RPS (peak 12-2 PM EST) - Weekend: 800-1,500 RPS - Error rate: < 0.05% on api-gateway - Deployment windows: 10-11 AM, 3-4 PM (brief spikes expected) - Black Friday: 15,000-25,000 RPS (sustained 12 hours) ``` The first `# Heading` becomes the title. The filename (without `.md`) becomes the document name/ID. Everything after the heading is the body. 
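That loading rule can be sketched in a few lines (an illustrative parser, not necessarily the runtime's implementation):

```typescript
import * as path from 'node:path'

type KnowledgeDoc = { name: string; title: string; body: string }

// Title = first `# Heading`, name = filename without `.md`,
// body = everything after the heading.
function parseKnowledgeDoc(filename: string, content: string): KnowledgeDoc {
  const lines = content.split('\n')
  const headingIdx = lines.findIndex((line) => line.startsWith('# '))
  const title = headingIdx >= 0 ? lines[headingIdx].slice(2).trim() : ''
  const body = lines.slice(headingIdx + 1).join('\n').trim()
  return { name: path.basename(filename, '.md'), title, body }
}

const doc = parseKnowledgeDoc(
  'baselines.md',
  '# Knowledge: Normal Traffic Patterns\n- Weekday: 2,000-4,000 RPS',
)
console.log(doc.name)  // → baselines
console.log(doc.title) // → Knowledge: Normal Traffic Patterns
```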
### Categories Knowledge documents cover these categories: | Category | What | Typical Source | | ------------------------ | ---------------------------------------- | -------------------------------- | | **system\_docs** | API endpoints, auth, response formats | Auto from connections | | **methodology** | What the data means, how to interpret it | Author writes | | **patterns** | Known patterns worth detecting | Author seeds, agent discovers | | **false\_positives** | Known benign anomalies | Agent discovers, author approves | | **response\_procedures** | SOPs, escalation paths | Author writes | | **environment** | Infrastructure layout, inventory | Author writes | | **baselines** | What "normal" looks like | Author seeds, agent refines | | **team** | Contacts, preferences, escalation paths | Author maintains | | **incident\_history** | Past sessions, resolutions | Agent proposes | | **working\_memory** | Agent's persistent context | Agent maintains | ### Realistic Knowledge Document Examples The following examples show what production knowledge documents look like. They are detailed enough to be genuinely useful to the agent, not just placeholders. #### Example 1: Environment Document This document gives the agent a mental map of your infrastructure. When the agent investigates an alert on `payments-api`, it can immediately understand what depends on it, where it runs, and who owns it. ```markdown # Knowledge: Production Environment ## Service Architecture ### Core Services - **api-gateway** — Kong. Entry point for all client traffic. Runs on EKS cluster `prod-us-east-1`. 12 pods, autoscales to 40. Owned by Platform team (#platform-eng). - **auth-service** — Internal. Handles JWT issuance and validation. Backed by Redis cluster `auth-sessions` (ElastiCache, r6g.xlarge, 3-node). Owned by Identity team (#identity-eng). - **payments-api** — Internal. Processes transactions, manages payment methods, handles refunds. 
Backed by Postgres RDS `payments-prod` (db.r6g.2xlarge, Multi-AZ). Integrates with Stripe (connection: `stripe-prod`) and Plaid (connection: `plaid-prod`). Owned by Payments team (#payments-eng, on-call: @payments-oncall in PagerDuty). - **notifications-service** — Internal. Sends emails (SendGrid), SMS (Twilio), push (Firebase). Queue-based: reads from SQS `notifications-prod`. Owned by Platform team. ### Data Stores - **Primary database:** Postgres RDS `main-prod` (db.r6g.4xlarge, Multi-AZ, read replicas: 2). Contains users, organizations, billing tables. - **Payments database:** Postgres RDS `payments-prod`. Isolated for PCI compliance. Only payments-api has access. - **Cache:** Redis ElastiCache `cache-prod` (r6g.xlarge, 3-node cluster). Used by api-gateway for rate limiting and auth-service for sessions. - **Search:** OpenSearch `search-prod` (3 data nodes, r6g.2xlarge). Powers product search and audit log queries. - **Queue:** SQS queues — `notifications-prod`, `analytics-events`, `payment-webhooks`. DLQ configured on all queues. ### Infrastructure - **Cloud:** AWS us-east-1 (primary), us-west-2 (DR, warm standby) - **Orchestration:** EKS 1.28, managed node groups - **CI/CD:** GitHub Actions → ArgoCD (connection: `argocd-prod`) - **Monitoring:** Datadog (connection: `datadog-prod`), PagerDuty (connection: `pagerduty-prod`) - **Secrets:** AWS Secrets Manager. Rotated quarterly. ### External Dependencies - **Stripe:** Payment processing. Webhook endpoint: api-gateway/webhooks/stripe. Average latency: 200-400ms. - **Plaid:** Bank account linking. Rate limit: 100 req/min per client. Latency: 500ms-2s (external, variable). - **SendGrid:** Email delivery. Rate limit: 100k/day on current plan. - **Twilio:** SMS. Rate limit: 500 msg/sec. 
## Deployment Windows - Standard deploys: 10-11 AM ET or 3-4 PM ET, weekdays only - Hotfixes: any time, but require on-call approval - Frozen periods: last week of each quarter (finance close), Black Friday through Cyber Monday ``` #### Example 2: Baselines Document Baselines tell the agent what "normal" looks like. Without this, the agent has no way to assess whether a metric value is healthy or alarming. Notice that baselines include time-of-day and day-of-week variation — flat thresholds miss too much. ```markdown # Knowledge: Service Baselines ## api-gateway - **Request rate:** Weekday 2,000-4,000 RPS (peak 12-2 PM ET), weekend 800-1,500 RPS. Below 500 RPS on a weekday is abnormal and may indicate a DNS or load balancer issue. - **Error rate (5xx):** < 0.05% sustained. Brief spikes to 0.2% during deployments are normal (rolling restarts cause 1-2 seconds of 503s). Anything above 0.1% sustained for more than 5 minutes outside a deploy window warrants investigation. - **Latency p50:** 45-80ms. **p95:** 150-250ms. **p99:** 400-800ms. p99 above 1.5s is a red flag — usually indicates database contention or a slow upstream dependency. - **CPU utilization:** 30-50% steady state. Autoscaler triggers at 70%. If CPU is above 70% and pods haven't scaled, check the HPA config. ## payments-api - **Request rate:** Weekday 200-600 RPS. Significant spikes on the 1st and 15th of each month (subscription renewals). Friday afternoons see a 40% bump from payroll-related transactions. - **Error rate (5xx):** < 0.02%. This service has a lower tolerance than the gateway because errors mean failed transactions. Above 0.05% is a P1 investigation. - **Latency p50:** 120-200ms (includes Stripe round-trip). **p99:** 800ms-1.2s. Latency increases when Stripe is slow — check Stripe status page before investigating internally. - **Database connections:** Pool max is 100. Steady state: 20-40 active. Above 60 active connections, query performance degrades. 
Above 80 is an emergency — connection leak or missing connection release. - **Queue depth (payment-webhooks):** 0-50 messages. Above 200 means the consumer is falling behind. Above 1,000 check if the consumer is running at all. ## notifications-service - **Queue depth (notifications-prod):** 0-200 messages during steady state. Spikes to 5,000-10,000 during marketing campaign sends (usually Tuesday/Thursday mornings). These clear within 30 minutes and are expected. - **Email send rate:** 50-200/minute steady state. Campaign bursts: 1,000-2,000/minute. Above 3,000/minute risks SendGrid rate limiting. - **SMS send rate:** 5-20/minute. Spikes during 2FA waves (Monday mornings when everyone logs in). ## General Patterns - **Monday mornings (9-10 AM ET):** 15-20% traffic increase as users log in. Auth-service sees 3-5x normal request rate. This is normal. - **Monthly subscription renewals (1st and 15th):** payments-api handles 3-4x normal volume. Pre-warm expectations: jobs start at midnight ET, complete by 6 AM ET. - **End of quarter:** Slightly elevated API traffic from reporting integrations pulling data. Not an issue unless it coincides with a deploy. ``` #### Example 3: Patterns Document Patterns document the recurring situations the agent should recognize. This is institutional knowledge — the kind of thing a senior engineer knows from experience but that does not exist in any runbook. ```markdown # Knowledge: Known Patterns ## Deployment-Induced Error Spikes **What it looks like:** 5xx error rate spikes to 0.1-0.5% for 30-90 seconds, then returns to baseline. **Root cause:** Rolling deployment restarts pods. During restart, some requests hit terminating pods. Kong retries but a small percentage fail. **How to identify:** Check deploy timestamps in ArgoCD. If the spike start time is within 60 seconds of a deploy and the spike duration is under 2 minutes, this is almost certainly deployment noise. **Action:** No action needed. 
If the spike lasts longer than 3 minutes, the new version may be failing health checks — escalate to deployment verification. ## Stripe Latency Propagation **What it looks like:** payments-api latency p99 increases by 2-5x. Error rate may or may not increase (depends on timeout configuration). **Root cause:** Stripe is experiencing elevated latency. Since payments-api calls Stripe synchronously for transaction processing, Stripe latency directly impacts payments-api latency. **How to identify:** Compare payments-api latency increase with Stripe API latency (available via Datadog integration or Stripe status page). If both increase at the same time, the root cause is Stripe. **Action:** Check status.stripe.com. If Stripe has an active incident, there is nothing to fix on our side. Notify the payments team so they can monitor. If no Stripe incident is reported, investigate the specific Stripe API endpoints being called. ## Monday Morning Auth Storm **What it looks like:** auth-service CPU spikes to 70-90%, Redis connections spike, latency increases by 2-3x. Usually between 9-10 AM ET. **Root cause:** Thousands of users logging in simultaneously after the weekend. JWT tokens issued Friday afternoon expire Monday morning, causing a burst of re-authentication. **How to identify:** Timing (Monday 9-10 AM ET) and the fact that the spike is concentrated in auth-service with no corresponding anomaly in other services. **Action:** No action needed. The spike is self-limiting (clears by 10:30 AM). If it does NOT clear by 11 AM, investigate — the auth-service may have an issue exacerbated by load. ## Database Connection Pool Exhaustion **What it looks like:** Service latency increases dramatically (5-10x), error rate climbs steadily (not a spike but a ramp), database CPU is low but connection count is at or near the pool maximum. **Root cause:** A code path is acquiring database connections but not releasing them, usually due to an unhandled exception in a transaction block. 
Often introduced in a recent deployment. **How to identify:** If connection count is above 80% of pool max AND database CPU is below 50%, the problem is almost certainly connection leaks rather than query load. Check for recent deployments to the affected service. **Action:** This is a P2 that becomes P1 quickly. The connection pool will fill up completely within minutes to hours. Short-term: restart the affected service pods to release connections. Medium-term: identify and fix the leaking code path. ## CI/CD Service Account Alerts **What it looks like:** Security alert for brute-force or anomalous login from service account `svc-deploy` or `svc-ci`. **Root cause:** CI/CD pipelines authenticate to internal services via service accounts. Rapid successive authentication attempts during a deploy pipeline trigger brute-force detection rules. **How to identify:** Check if the alert source is a known CI/CD service account (svc-deploy, svc-ci, svc-argocd) and whether the timing correlates with a pipeline run in GitHub Actions. **Action:** False positive. No action needed. If the service account is NOT in the known list, investigate normally. ``` ### How Knowledge Enters Context All knowledge documents are loaded into the agent's system prompt at session start. No on-demand loading, no tool call, no "which doc should I fetch" decision by the agent — it sees everything in `knowledge/` from turn one. Sub-agents dispatched via `dispatch_task` see the same knowledge: they share the parent's context compiler. This keeps the agent's reasoning simple. A user asks about elevated error rates on payments-api, and the agent already knows the baseline, the deployment windows, the known patterns, and the team directory — no lookup turn, no fetch decision. Straight to reasoning. #### The context budget Eager loading works because realistic knowledge bases are small. A deployment with 20 moderate-length knowledge docs fits comfortably in a few thousand tokens. 
At that scale, the simplicity of "everything in context" wins over on-demand loading. If your knowledge base grows past \~50K tokens and starts crowding the context window, the right fix is usually to **split by session type** (separate agent repos or different `userContext` blocks per persona) rather than adding retrieval. Retrieval adds a lookup hop that every investigation has to pay, and that cost compounds across turns. ### Growing the Knowledge Base Knowledge docs are source files. You grow the knowledge base by **writing to the files**: 1. **In your editor.** Treat knowledge docs like any other source code — version them in git, review them in PRs, roll them back when they're wrong. 2. **Through the admin agent.** Run `amodal dev` and open the admin chat at `/config/chat`. The admin agent has file tools (`grep_repo_files`, `read_repo_file`, `edit_repo_file`) scoped to `knowledge/` and other config directories. Describe the update in natural language; it finds the right file and edits it. See [Admin Agent](/guide/admin-agent). #### Capturing findings mid-investigation If you want the agent to record findings as it goes (new patterns it discovers, baselines it refines, false positives it identifies) — use a [store](/guide/stores). Stores are structured, queryable, and grow indefinitely. Knowledge docs are narrative and opinionated; stores are data. The pattern is: * **Knowledge doc** — stable guidance, written by humans (or admin agent): "Deployment-Induced Error Spikes last 2–3 min and concentrate on newly-deployed endpoints." * **Store** — session-by-session records, written by the agent: a `findings` store with one document per incident, containing the agent's observations and resolutions. Every N weeks, someone (human or admin agent) reviews the store, promotes recurring findings into knowledge docs, and archives the originals. That's the flywheel — no runtime approval queue, just the normal source-control loop. 
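To get a quick sense of whether a knowledge base is approaching that \~50K-token ceiling, a character-count heuristic is enough (the \~4 chars/token ratio is an assumption — the real count depends on the model's tokenizer):

```typescript
// ~4 characters per token is a rough heuristic, good enough for an
// order-of-magnitude check against the ~50K-token ceiling.
function estimateTokens(docs: string[]): number {
  const chars = docs.reduce((n, d) => n + d.length, 0)
  return Math.ceil(chars / 4)
}

const sample = '# Knowledge: Baselines\n'.repeat(100)
console.log(estimateTokens([sample])) // → 575
```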
## MCP Servers Amodal supports the [Model Context Protocol](https://modelcontextprotocol.io) for connecting external tool servers. MCP tools appear alongside your custom tools and built-in tools — the agent uses them transparently. ### Configuration The recommended way to define an MCP server is as a [connection](/guide/connections) with `"protocol": "mcp"` in its `spec.json`: ``` connections/github/ └── spec.json ``` ```json { "protocol": "mcp", "transport": "stdio", "command": "uvx", "args": ["mcp-server-github"], "env": { "GITHUB_TOKEN": "env:GITHUB_TOKEN" } } ``` This gives you all the benefits of the connections directory — `surface.md` documentation, `rules.md` business rules, and a unified view in the admin UI. MCP connections do not require `access.json`. #### Legacy: amodal.json MCP servers can also be defined in the `mcp.servers` block of `amodal.json`. This still works but the connection-directory approach above is preferred: ```json { "mcp": { "servers": { "github": { "transport": "stdio", "command": "uvx", "args": ["mcp-server-github"], "env": { "GITHUB_TOKEN": "env:GITHUB_TOKEN" } }, "filesystem": { "transport": "sse", "url": "http://localhost:8001" }, "docs": { "transport": "http", "url": "https://docs.example.com/api/mcp", "trust": true } } } } ``` ### Transports | Transport | Config | Use Case | | --------- | ------------------------- | ----------------------------- | | `stdio` | `command`, `args`, `env` | Local tools via child process | | `sse` | `url`, `headers` | Server-Sent Events over HTTP | | `http` | `url`, `trust`, `headers` | Streamable HTTP transport | The `headers` field is optional. Use it for authenticated MCP servers that require a Bearer token or other HTTP headers. ### Tool Discovery MCP tools are automatically discovered when servers connect. 
Tools are namespaced with the server name (or connection directory name) to avoid collisions: ``` github__create_issue github__list_repos filesystem__read_file ``` The separator is `__` (double underscore). ### CLI ```bash # MCP servers start automatically with amodal dev amodal dev # The agent sees MCP tools alongside custom and built-in tools amodal inspect # shows all discovered tools including MCP ``` ### Behavior * **Non-fatal startup** — if an MCP server fails to connect, other servers and the agent still work * **Auto-discovery** — tools are discovered after connection, no manual registration needed * **Graceful shutdown** — servers are cleaned up when the runtime stops * **Environment variables** — use `env:VAR_NAME` pattern for credentials ### Example: GitHub MCP (Connection Directory) ```json // connections/github/spec.json { "protocol": "mcp", "transport": "stdio", "command": "uvx", "args": ["mcp-server-github"], "env": { "GITHUB_TOKEN": "env:GITHUB_TOKEN" } } ``` This gives the agent access to GitHub tools (create issues, list PRs, read files, etc.) via the GitHub MCP server. ### Example: Authenticated HTTP MCP ```json // connections/internal-tools/spec.json { "protocol": "mcp", "transport": "http", "url": "https://tools.internal.acme.com/mcp", "headers": { "Authorization": "env:INTERNAL_TOOLS_TOKEN" }, "trust": true } ``` Use `headers` to pass Bearer tokens or API keys to authenticated MCP servers. ## Pages Pages are **React components** that render custom UI views in the runtime app. They live in `pages/` and use SDK hooks to read from stores, invoke skills, and display structured data. This is how you build dashboards, morning briefs, investigation views, and other composed screens. 
```
pages/
├── ops-dashboard.tsx
├── morning-brief.tsx
└── deal-detail.tsx
```

### Page Format

Each page file exports a `page` config object and a default React component:

```tsx
import React from 'react'

export const page = {
  name: 'ops-dashboard',
  description: 'Facility overview — alerts and device status',
  stores: ['active-alerts', 'zone-status'],
  automations: ['scan-alerts'],
}

export default function OpsDashboard() {
  const [alerts, setAlerts] = React.useState([])

  React.useEffect(() => {
    fetch('/api/stores/active-alerts')
      .then((res) => res.json())
      .then((data) => setAlerts(data.documents || []))
      .catch(() => {})
  }, [])

  return (

    <div>
      <h1>Operations Dashboard</h1>
      <p>{alerts.length} active alerts</p>
    </div>

) } ``` ### Page Config | Field | Type | Default | Description | | ------------- | ---------- | -------- | ---------------------------------------------------------------------------------------------------------- | | `name` | string | filename | Page identifier and URL slug | | `description` | string | — | Shown in the page header and sidebar tooltip | | `stores` | `string[]` | — | Stores this page reads from. Shown in the data source bar with links. | | `automations` | `string[]` | — | Automations that populate the page data. Shown in the data source bar with status, toggle, and run button. | | `icon` | string | — | Lucide icon name (e.g., `'shield'`, `'monitor'`, `'bar-chart'`) | | `hidden` | boolean | `false` | If true, excluded from sidebar | When `stores` or `automations` are declared, the runtime automatically shows a **data source bar** above the page with store links, automation live/paused toggle, schedule, and a run button. This is a platform feature — no code required in the page itself. 
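For reference, a `page` config exercising every field from the table (page, store, and automation names here are illustrative):

```tsx
// Detail-style page: declared data sources plus an icon, kept out of
// the sidebar. All names below are examples, not required values.
export const page = {
  name: 'morning-brief',
  description: 'Daily summary of overnight alerts',
  stores: ['classified-alerts'],
  automations: ['overnight-scan'],
  icon: 'monitor',
  hidden: false,
}
```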
### SDK Hooks Pages use hooks from `@amodalai/react` to access agent data: | Hook | Description | | -------------------------------- | ---------------------------------------------------------------- | | `useStoreList(store, options)` | Fetch multiple documents with filtering, sorting, and pagination | | `useStore(store, key)` | Fetch a single document by key | | `useSkillAction(skill, options)` | Invoke a skill from the page | #### useStoreList ```tsx const { data, loading, error, refresh } = useStoreList('classified-alerts', { filter: { severity: 'P1' }, sort: { field: 'timestamp', order: 'desc' }, limit: 50, refreshInterval: 10000, // auto-refresh every 10s }) ``` #### useStore ```tsx const { data: alert } = useStore('classified-alerts', alertId) ``` #### useSkillAction ```tsx const { invoke, loading } = useSkillAction('triage') const handleTriage = () => { invoke({ query: 'Triage the latest alerts' }) } ``` ### Sidebar Integration Pages appear in the runtime app sidebar under a **Pages** section. The name is auto-formatted from the filename (`ops-dashboard` → "Ops Dashboard"). Click a page to navigate to `/pages/{pageName}`. Hidden pages (`hidden: true`) are excluded from the sidebar but still accessible via direct URL — useful for detail pages navigated to from other pages. ### Context Pages Pages with `context` params are detail views that receive route parameters: ```tsx export const page = { name: 'deal-detail', icon: 'file-text', context: { dealId: 'string' }, hidden: true, // navigated to, not shown in sidebar } export default function DealDetail({ dealId }: { dealId: string }) { const { data: deal } = useStore('deals', dealId) // ... } ``` ### Entity Pages vs. 
Composed Pages * **Entity pages** are auto-generated from store definitions — list and detail views come for free without writing page files * **Composed pages** are what you define in `pages/` — custom views that combine data from multiple stores with custom layout and logic Only composed pages need explicit page files. If you just need a list/detail view of a single store, the runtime generates that automatically. ### Hot Reload During `amodal dev`, changes to page files trigger hot module replacement (HMR) via the Vite plugin. Edit a page and see changes instantly in the browser. ### Example: Surveillance Dashboard ```tsx import { useStoreList } from '@amodalai/react' export const page = { name: 'ops-dashboard', icon: 'shield', description: 'Facility surveillance overview', } export default function OpsDashboard() { const { data: alerts } = useStoreList('classified-alerts', { sort: { field: 'confidence', order: 'desc' }, limit: 10, }) const { data: zones } = useStoreList('zone-status') const { data: devices } = useStoreList('device-profiles') return (

    <div>
      <section>
        <h2>Active Alerts</h2>
        {alerts?.map((a) => (
          <div key={a.key}>{a.key}</div>
        ))}
      </section>
      <section>
        <h2>Zone Status</h2>
        {zones?.map((z) => (
          <div key={z.key}>{z.key}</div>
        ))}
      </section>
    </div>
) } ``` ## Providers Amodal supports multiple LLM providers with a unified interface, built on the Vercel AI SDK. Switch providers by changing a field in `amodal.json` — no code changes needed. ### Supported Providers | Provider | Models | Auth env var | Notes | | ---------------- | -------------------------------------- | ------------------- | --------------------------------------------------- | | **Anthropic** | Claude Opus 4, Sonnet 4, Haiku 4.5 | `ANTHROPIC_API_KEY` | Native adapter, prompt caching supported | | **OpenAI** | GPT-4o, GPT-4, o1, o3 | `OPENAI_API_KEY` | Native adapter | | **Google** | Gemini 3.1 Pro, 3 Flash, 2.5 Pro/Flash | `GOOGLE_API_KEY` | Native adapter, same key as webTools | | **Groq** | Llama 3.3 70B, Kimi-K2 | `GROQ_API_KEY` | OpenAI-compatible — ultra-low latency | | **DeepSeek** | deepseek-chat, deepseek-reasoner | `DEEPSEEK_API_KEY` | OpenAI-compatible | | **xAI** | Grok-4, Grok-3 | `XAI_API_KEY` | OpenAI-compatible | | **Mistral** | Mistral Large, Small | `MISTRAL_API_KEY` | OpenAI-compatible | | **Fireworks** | Open-weight models (Llama, Qwen, etc.) 
| `FIREWORKS_API_KEY` | OpenAI-compatible hosting | | **Together** | Open-weight models | `TOGETHER_API_KEY` | OpenAI-compatible hosting | | **Bedrock** | Claude, Titan, Llama (AWS) | AWS credentials | Available for eval judges; agent loop coming | | **Azure OpenAI** | GPT-4, GPT-4o on Azure | Azure creds | Available for eval judges; agent loop coming | | **Custom** | Any OpenAI-compatible endpoint | per-provider | Set `baseUrl` to point at your own inference server | #### Recommended models by use case | Use case | Model | Input/Output per 1M | Why | | ----------------- | ------------------------------- | ------------------- | ------------------------------------------------ | | Production agents | `gemini-3.1-pro-preview` | $2.00 / $12.00 | Best quality/cost ratio, strong tool-calling | | Budget agents | `gemini-3-flash-preview` | $0.50 / $3.00 | Fast, cheap, good for simple workflows | | Maximum quality | `claude-opus-4-20250514` | $15.00 / $75.00 | Best reasoning, prompt caching cuts repeat costs | | Fast + cheap | `gemini-3.1-flash-lite-preview` | $0.25 / $1.50 | Sub-agent dispatch, simple tasks | | Low-latency | `groq/llama-3.3-70b-versatile` | \~$0.60 / $0.80 | Groq hardware, \~300 tok/s | OpenAI-compatible providers (Groq, DeepSeek, xAI, Mistral, Fireworks, Together) reuse the OpenAI adapter with a per-provider `baseUrl`. Any additional OpenAI-compatible endpoint works by setting `baseUrl` explicitly in `amodal.json` — no code changes needed. ### Configuration #### Auto-detection Set the relevant environment variable and Amodal auto-detects the provider: ```bash export ANTHROPIC_API_KEY=sk-ant-... 
amodal dev # uses Anthropic automatically ``` #### Explicit config Specify under `models.main` in `amodal.json`: ```json { "models": { "main": { "provider": "anthropic", "model": "claude-sonnet-4-20250514" } } } ``` For OpenAI-compatible providers, set `provider` to the provider name and optionally override `baseUrl`: ```json { "models": { "main": { "provider": "groq", "model": "llama-3.3-70b-versatile", "apiKey": "env:GROQ_API_KEY" } } } ``` ### Failover The `createFailoverProvider()` factory cascades between providers with retry logic and linear backoff: ```json { "provider": "failover", "providers": ["anthropic", "openai"], "retries": 2, "backoffMs": 1000 } ``` If the primary provider fails, the runtime automatically tries the next one. ### Streaming All providers support streaming via the `LLMProvider.streamText()` interface. The streaming shape is unified — state handlers don't know which provider is active. ### Multi-Model Comparison Use `amodal eval` or `amodal ops experiment` to compare providers: ```bash amodal eval --providers anthropic,openai,google ``` This runs the same eval suite against each provider and reports quality, latency, and cost differences. ## Security & Guardrails Security is built into the runtime at every layer. Six components work together to protect sensitive data and control agent behavior. ### Security Pipeline ``` API Response → Field Scrubber (strip restricted fields) → Agent processes data → Output Guard (4-stage filtering before user sees output) 1. Field Redaction (replace scrubbed values with [REDACTED]) 2. Pattern Scanner (detect SSN, credit cards, bank accounts) 3. Leak Detector (compare output against tracked scrubbed values) 4. Scope Checker (flag unqualified aggregate claims) → Action Gate (confirm/review/block write operations) ``` ### Field Scrubber Strips restricted fields from API responses before the LLM sees them. 
Configured per connection in `access.json`: | Policy | Effect | | --------------------- | ----------------------------------------------------- | | `never_retrieve` | Field completely removed from response | | `retrieve_but_redact` | Kept in data, replaced with `[REDACTED]` in output | | `role_gated` | Removed if user lacks `allowedRoles`, else redactable | ### Output Guard Four-stage filter on every agent response: **1. Field Redaction** — replaces `retrieve_but_redact` and denied `role_gated` values with `[REDACTED]`. **2. Pattern Scanner** — regex detection of: * SSN (`XXX-XX-XXXX`) * Credit cards (13-19 digits, Luhn-validated) * Bank accounts (8-17 digits near keywords like "account", "routing") **3. Leak Detector** — compares agent output against all scrubbed values tracked in the session. `pii_identifier` values are always flagged. `pii_name` values only if entity context is nearby. **4. Scope Checker** — detects unqualified aggregate claims ("all devices", "every contact") about scoped entities. Flags when the agent says "all X" but only has access to a subset. Critical findings block output entirely. ### Action Gate Controls write operation confirmations based on `access.json`: | Tier | Behavior | | --------- | ------------------------------- | | `allow` | Execute without confirmation | | `confirm` | Ask user for approval | | `review` | Show full plan before executing | | `never` | Block entirely | **Threshold escalation**: endpoint tiers can escalate based on request parameters: ```json { "field": "body.amount", "above": 10000, "escalate": "review" } ``` **Delegation escalation**: if `isDelegated` (sub-agent acting on behalf), `confirm` escalates to `review`. 
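The threshold-escalation rule can be sketched as follows (types and function are illustrative, not the runtime's internals):

```typescript
// Reads a dotted field path (e.g. "body.amount") from the request and
// escalates the base tier when the value exceeds the threshold.
type Tier = 'allow' | 'confirm' | 'review' | 'never'

interface ThresholdRule {
  field: string
  above: number
  escalate: Tier
}

function effectiveTier(
  base: Tier,
  rule: ThresholdRule,
  request: Record<string, unknown>,
): Tier {
  const value = rule.field
    .split('.')
    .reduce<unknown>((obj, key) => (obj as Record<string, unknown> | undefined)?.[key], request)
  return typeof value === 'number' && value > rule.above ? rule.escalate : base
}

const rule: ThresholdRule = { field: 'body.amount', above: 10000, escalate: 'review' }
console.log(effectiveTier('confirm', rule, { body: { amount: 25000 } })) // → review
console.log(effectiveTier('confirm', rule, { body: { amount: 100 } }))   // → confirm
```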
### Role-Based Access Control Roles filter tools and skills **at the SDK layer** — the LLM never sees tools it doesn't have access to: ```json { "name": "analyst", "tools": ["request", "query_store", "dispatch_task"], "skills": ["triage", "deep-dive"], "automations": { "can_view": true, "can_create": false }, "constraints": {} } ``` Use `"*"` as wildcard to allow all tools/skills. ### Plan Mode When active, all writes are blocked until the user approves a plan: 1. Agent enters plan mode (triggered by security rules or manually) 2. Agent proposes a plan 3. User approves → plan is injected into context, writes re-enabled 4. Agent executes the approved plan ### Session Limits | Limit | Default | Description | | -------------- | ------------ | ---------------------------- | | Max turns | 15 | Prevents infinite loops | | Timeout | configurable | Hard time limit | | Loop detection | always on | Pattern matching + LLM-based | ### Audit Logging Every action is logged with immutable hash chains: | Event | Logged | | ------------------------------- | ----------------------------- | | `tool_call` | Tool name, params, duration | | `write_op` | Write operations specifically | | `session_start` / `session_end` | Session lifecycle | | `version_load` | Config version loaded | | `kb_proposal` | Knowledge update proposals | Two sinks: Console (dev), File (JSON). Each entry includes a SHA-256 hash of the previous entry, creating a tamper-evident chain. ## Skills Every LLM can answer questions. The difference between a generic chatbot and a domain expert is *how it reasons* about a problem. Skills encode that reasoning — they are step-by-step methodologies, written in Markdown, that teach the agent how to investigate, assess, and respond to specific situations. Think of skills as the institutional knowledge that lives in the heads of your best people. The security engineer who knows exactly which logs to check first when an alert fires. 
The sales ops analyst who can assess a deal's health in five minutes by pulling the right CRM fields. The SRE who has a mental checklist for verifying a deployment. Skills capture those workflows and make them available to every conversation. Without skills, the agent is smart but generic. With skills, it becomes a specialist. ``` skills/ ├── triage/ │ └── SKILL.md ├── deal-review/ │ └── SKILL.md ├── deployment-verification/ │ └── SKILL.md └── incident-response/ └── SKILL.md ``` ### SKILL.md Format Two formats are supported: #### Heading-based (recommended) ```markdown # Skill: Incident Response Gather context, assess impact, and coordinate response for active incidents. Trigger: When the user reports an outage, service degradation, or active incident. ## Behavior 1. Identify the affected service and symptoms 2. Assess blast radius — which systems and users are impacted? 3. Gather context: - Recent deployments or config changes - Monitoring metrics (error rates, latency) - Dependent service health 4. Correlate: what changed right before the incident? 5. Recommend immediate mitigation and investigation path ## Constraints - Do not restart services without explicit user confirmation - Do not dismiss alerts as false positives without evidence ``` #### Frontmatter-based ```markdown --- name: incident-response description: Gather context, assess impact, coordinate response trigger: When the user reports an outage or active incident --- ## Methodology ...body content... 
``` ### Parsed Fields | Field | Source | Description | | ------------- | -------------------------------------------- | ------------------- | | `name` | `# Skill: Name` or frontmatter `name` | Skill identifier | | `description` | First paragraph or frontmatter `description` | What the skill does | | `trigger` | `Trigger:` line or frontmatter `trigger` | When to activate | | `body` | Everything after name/description | The methodology | ### Skill Activation The agent sees all installed skill names and triggers at the start of every session. When a user's message matches a skill's trigger, the agent activates it — pulling the full methodology into its reasoning context. The agent can also activate skills mid-conversation when findings suggest a different approach is needed. This happens naturally: the agent does not need explicit instructions to transition between skills. ### Realistic Skill Examples The following are complete, production-grade skills that demonstrate how to encode real domain expertise into Markdown. #### Example 1: Security Alert Triage This skill teaches the agent to systematically investigate a security alert rather than jumping to conclusions. Notice how each step specifies what to query, what to look for, and what the findings mean. ```markdown # Skill: Security Alert Triage Investigate and classify security alerts by gathering context from detection systems, correlating with known patterns, and assessing real risk. Trigger: When the user shares a security alert, detection finding, suspicious activity, or asks about a potential threat. 
## Behavior ### Step 1: Parse the Alert Extract the core facts from the alert before doing anything else: - **What** was detected (signature, rule name, detection logic) - **Where** it happened (host, IP, user, service) - **When** it occurred (timestamp, timezone, duration) - **Severity** as reported by the detection system Dispatch a task agent to pull the raw alert details from the detection platform (Datadog, CrowdStrike, Splunk, etc.) and return a structured summary. ### Step 2: Enrich with Context For each entity in the alert, gather surrounding context: - **User context:** Role, department, login history, recent access patterns. Dispatch a task agent to query the identity provider (Okta, Azure AD). - **Host context:** OS, installed software, patch level, recent changes. Dispatch a task agent to query the asset inventory. - **Network context:** Source/destination IPs, geo-location, reputation scores. Dispatch a task agent to query threat intel feeds. Do NOT attempt to correlate yet — just gather. ### Step 3: Check Against Known Patterns Load knowledge tagged `patterns` and `false_positives`. Compare the alert against: - Known false positives (e.g., vulnerability scanners, pen-test IPs, CI/CD service accounts that trigger brute-force rules) - Known attack patterns (e.g., credential stuffing followed by lateral movement) - Recent similar alerts (query the detection platform for the same rule firing in the last 7 days) If the alert matches a known false positive with >90% confidence, classify it and explain why. Do not dismiss without evidence. ### Step 4: Assess Real Risk Based on gathered context, evaluate: - **Is this expected behavior?** A deploy service account pulling container images at 3 AM is normal. A marketing intern doing the same is not. - **What is the blast radius?** A compromised service account with read-only access to a staging bucket is different from one with admin access to prod databases. 
- **Is there evidence of progression?** A single failed login is noise. Ten failed logins followed by a successful one followed by a role change is a kill chain. Assign a confidence-weighted severity: - **Critical:** Active compromise with evidence of data access or lateral movement. Confidence > 80%. - **High:** Strong indicators of compromise, but no evidence of progression yet. Confidence > 70%. - **Medium:** Suspicious activity that warrants investigation but has benign explanations. Confidence 40-70%. - **Low/False Positive:** Matches a known pattern or has a clear benign explanation. Confidence > 85% it is benign. ### Step 5: Recommend Response Based on severity: - **Critical/High:** Present findings immediately. Recommend containment actions (disable account, isolate host, revoke tokens). List what the user should confirm before the agent takes action. - **Medium:** Present findings with recommended next investigation steps. Suggest what additional data would increase confidence. - **Low/FP:** Summarize findings. If this is a new false positive pattern not already in the KB, propose a knowledge update. ## Constraints - Never dismiss an alert as a false positive without loading and checking the false_positives knowledge category - Never recommend containment actions without presenting evidence first - Always state confidence level alongside severity assessment - If confidence is below 40% on any assessment, report as inconclusive and recommend manual review ``` #### Example 2: Deal Health Review This skill is for a sales ops use case. The agent pulls CRM data, evaluates a deal against known healthy deal patterns, and surfaces risks. Notice how it specifies which fields to pull and what "good" vs "bad" looks like. ```markdown # Skill: Deal Health Review Assess the health of an active deal by pulling CRM data, evaluating engagement signals, and comparing against historical win patterns. 
Trigger: When the user asks about a deal, opportunity, or pipeline item — or asks to review pipeline health.

## Behavior

### Step 1: Pull Deal Data

Dispatch a task agent to query the CRM (Salesforce, HubSpot) for the deal:

- Deal stage, amount, close date, days in current stage
- Contact roles: who is the champion, economic buyer, technical evaluator?
- Activity history: last 30 days of emails, calls, meetings
- Competitor mentions in notes or call transcripts
- Stage history: how long in each previous stage?

If the user asks about multiple deals or "the pipeline," dispatch parallel task agents — one per deal.

### Step 2: Evaluate Engagement Signals

Score engagement on a simple framework:

- **Champion engagement:** Has the identified champion had a meaningful interaction (call, meeting, email reply) in the last 10 days? If not, flag as "champion gone quiet."
- **Multi-threading:** Are there at least 2 contacts engaged from different departments? Single-threaded deals above $50K are high risk.
- **Meeting cadence:** Has there been at least one meeting in the last 14 days for deals in stages 3+? A stalled meeting cadence in late stages is the #1 predictor of a slip.
- **Next steps:** Is there a concrete next step with a date? "They'll get back to us" is not a next step.

### Step 3: Compare to Historical Patterns

Load knowledge tagged `patterns` and `baselines`. Compare:

- **Stage velocity:** How does time-in-stage compare to won deals of similar size? Deals that take 2x the median time in a stage close at less than half the rate.
- **Close date movement:** Has the close date been pushed more than once? Two pushes correlate with a 70% drop in win rate based on typical historical data.
- **Deal size changes:** A significant discount (>20%) in late stages often indicates the champion lost internal sponsorship and is trying to reduce risk for the buyer.
### Step 4: Present Assessment Use the present tool to render a deal health card: - Overall health: Healthy / At Risk / Critical - Key metrics: days in stage, engagement score, next step status - Risk flags: specific, actionable items (not vague warnings) - Recommended actions: what the rep should do this week For pipeline reviews, present a summary table sorted by risk level, then drill into the top 3 at-risk deals. ## Constraints - Never fabricate activity data — if the CRM has no recent activities, say so clearly - Always distinguish between "no data" and "negative signal" — a missing champion contact is different from a champion who stopped responding - Do not predict win probability as a precise number — use the health categories (Healthy / At Risk / Critical) with supporting evidence ``` #### Example 3: Deployment Verification This skill runs a post-deploy checklist. It is a good example of a linear, checklist-style skill where each step has clear pass/fail criteria. ```markdown # Skill: Deployment Verification Verify that a deployment is healthy by checking key metrics, error rates, and functional indicators against pre-deploy baselines. Trigger: When the user mentions a recent deploy, release, rollout, or asks to verify a deployment. ## Behavior ### Step 1: Identify the Deployment Determine what was deployed: - Service name and version - When it was deployed (timestamp) - What changed (commit range, PR numbers, changelog) - Deploy method (rolling, blue-green, canary) If the user doesn't provide this, dispatch a task agent to query the deployment system (GitHub Actions, ArgoCD, Spinnaker) for recent deploys. ### Step 2: Establish Baseline Load knowledge tagged `baselines` for the deployed service. 
Determine the pre-deploy baseline for each key metric: - Error rate (p50, p95, p99 for the last hour before deploy) - Latency (p50, p95, p99) - Request rate (to detect traffic drops) - CPU and memory utilization - Queue depths / consumer lag (if applicable) Dispatch a task agent to pull pre-deploy metrics from the monitoring system (Datadog, Prometheus, CloudWatch) for the 1-hour window before the deploy timestamp. ### Step 3: Check Post-Deploy Metrics Dispatch a task agent to pull the same metrics for the post-deploy window (deploy timestamp + 15 minutes through now). Compare: - **Error rate:** Any increase above 2x the baseline p99 is a red flag. A sustained increase above 1.5x for more than 10 minutes warrants investigation. - **Latency:** p50 increase > 20% is notable. p99 increase > 50% is a red flag. - **Request rate:** A drop of more than 10% may indicate failed health checks or load balancer issues — the service might be up but not receiving traffic. - **Resources:** CPU > 80% sustained or memory trending upward without plateau suggests a resource leak. ### Step 4: Check Functional Health Dispatch a task agent to check: - Health check endpoints (if defined in the connection config) - Recent errors in application logs (new error types not seen before the deploy) - Downstream dependency errors (did the deploy break a consumer?) ### Step 5: Render Verdict Present a deployment health card: - **Healthy:** All metrics within acceptable ranges. No new error types. Recommend: continue monitoring for 1 hour. - **Degraded:** One or more metrics outside range but not critical. Recommend: investigate the specific metric, prepare to rollback. - **Unhealthy:** Multiple red flags or critical metric breach. Recommend: rollback immediately. List the evidence. Include a comparison table: metric name, baseline value, current value, status (pass/warn/fail). 
## Constraints - Always compare to baseline, never to absolute thresholds — what is "normal" varies wildly between services - A 15-minute post-deploy window is the minimum for comparison. If the deploy happened less than 15 minutes ago, say so and recommend waiting - Do not recommend rollback without presenting the evidence — the user needs to make an informed decision - If baseline knowledge is missing, report that the verification is incomplete and recommend the user add baseline docs to the KB ``` ### Skill Chaining Skills are not isolated — they chain naturally as the agent's understanding of a situation evolves. The agent does not need explicit instructions to transition between skills. It recognizes when its findings point toward a different methodology and activates the appropriate skill. Here is a realistic example of skill chaining in practice: **The user asks:** "We got an alert about elevated 5xx errors on the payments service." 1. **Security Alert Triage activates.** The trigger matches — the user shared an alert. The agent begins its triage methodology: parse the alert, enrich with context, check against known patterns. 2. **During Step 2 (Enrich with Context),** a task agent queries the deployment system and discovers that a new version of the payments service was deployed 22 minutes ago. The error rate increase started 18 minutes ago — 4 minutes after the deploy. 3. **The agent transitions to Deployment Verification.** The findings strongly suggest this is a deployment issue, not a security incident. The agent does not abandon its security findings — it carries forward the relevant context (timestamps, error patterns) but now follows the deployment verification methodology. 4. **Deployment Verification Step 3 confirms the correlation.** Post-deploy error rates are 8x the baseline. Latency p99 increased by 300%. The new version introduced a regression. 5. 
**The agent presents a unified finding:** "The elevated 5xx errors on payments-service correlate with deploy v2.14.3 (deployed 22 minutes ago). Error rate is 8x baseline, latency p99 is 4x baseline. This appears to be a deployment regression, not a security incident. Recommend immediate rollback."

The user saw one seamless investigation. Behind the scenes, the agent used two skills, dispatched five task agents, and processed data from three different systems — all while keeping its primary context clean.

### Common Patterns

#### Decision Trees in Skills

Many real-world investigations have branching logic. Encode this explicitly in your skills rather than hoping the agent figures it out:

```markdown
### Step 3: Determine Root Cause Category

Based on findings so far, branch:

- **If a deployment was found within the anomaly window:** Follow deployment correlation path (Step 4a)
- **If no deployment but a config change was detected:** Follow config change path (Step 4b)
- **If no changes found but the pattern matches a known scaling issue:** Follow capacity path (Step 4c)
- **If none of the above:** Expand the investigation window to 24 hours and repeat Step 2
```

This is much more effective than a vague "investigate the root cause." The agent follows the branch that matches its findings, and each branch can have its own detailed methodology.

#### When to Dispatch Task Agents

Use `dispatch_task` in your skill instructions whenever the agent needs to process raw data from an external system. The rule of thumb: if a step involves querying an API and interpreting the response, that is task agent work. The primary agent should reason about clean summaries, not parse JSON payloads.

Good skill instruction:

```markdown
Dispatch a task agent to query Datadog for error rate metrics on the affected service for the last 2 hours. The task agent should return: service name, current error rate, baseline error rate, and whether the rate exceeds the 2x threshold.
``` Bad skill instruction: ```markdown Query the metrics system for error data and analyze it. ``` The first version tells the agent exactly what to delegate, what data to ask for, and what shape the summary should take. The second leaves everything ambiguous. #### Confidence Thresholds Investigations rarely produce certainty. Encode confidence expectations directly in the skill so the agent knows when to commit to a conclusion and when to hedge: ```markdown ## Confidence Framework - **> 85% confidence:** State the finding as a conclusion. "This is caused by the v2.14 deployment." - **60-85% confidence:** State the finding as a strong hypothesis. "This is most likely caused by the v2.14 deployment, based on the timing correlation and error signature match." - **40-60% confidence:** Present as one of several possibilities. "The timing suggests a possible correlation with the v2.14 deployment, but other factors may be involved." - **< 40% confidence:** Report as inconclusive. "There is insufficient evidence to determine the root cause. Recommended next steps: [specific additional data to gather]." ``` This prevents the agent from being either overconfident (stating guesses as facts) or underconfident (hedging on everything and providing no value). ### Best Practices * **Be specific about reasoning steps.** Not "investigate the issue" but "query deployment logs for changes within 30 minutes of the anomaly." * **Include decision points.** "If no deployments found, check for scaling events." * **Specify dispatching.** "Dispatch a task agent to query Datadog" uses context isolation. * **Define when to stop.** "If confidence is below 60%, report as inconclusive." * **Name your steps.** "Step 1: Parse the Alert" is easier for the agent to follow than an unnumbered wall of text. * **Separate gathering from reasoning.** Steps that query systems should be distinct from steps that interpret results. 
This maps naturally to the task agent model — gathering is delegated, reasoning stays in the primary agent.
* **Write for a smart colleague, not a machine.** The best skills read like runbooks written by a senior engineer. If a human could follow the skill and reach a good conclusion, the agent can too.

## Stores

Stores give your agent persistent, typed data storage. Define a schema in `stores/` and the runtime auto-generates CRUD tools the agent can use.

### Store Definition

Each store is a JSON file in `stores/`:

```json
{
  "name": "active-alerts",
  "entity": {
    "name": "ClassifiedAlert",
    "key": "{event_id}",
    "schema": {
      "event_id": { "type": "string" },
      "title": { "type": "string" },
      "severity": { "type": "enum", "values": ["P1", "P2", "P3", "P4"] },
      "confidence": { "type": "number", "min": 0, "max": 1 },
      "timestamp": { "type": "datetime" },
      "metadata": {
        "type": "object",
        "fields": {
          "category": { "type": "string" },
          "tags": { "type": "array", "item": { "type": "string" } }
        }
      },
      "relatedAlert": { "type": "ref", "store": "active-alerts" },
      "notes": { "type": "string", "nullable": true }
    }
  },
  "ttl": 86400,
  "ttl_conditional": {
    "default": 86400,
    "override": [
      { "condition": "severity IN ['P1', 'P2']", "ttl": 300 }
    ]
  },
  "failure": {
    "mode": "partial",
    "retries": 3,
    "backoff": "exponential",
    "deadLetter": true
  },
  "history": { "versions": 3 },
  "trace": true
}
```

### Field Types

| Type       | Description                | Extra Fields                              |
| ---------- | -------------------------- | ----------------------------------------- |
| `string`   | Text                       | —                                         |
| `number`   | Numeric                    | `min`, `max`                              |
| `boolean`  | True/false                 | —                                         |
| `datetime` | ISO 8601 timestamp         | —                                         |
| `enum`     | One of predefined values   | `values: string[]`                        |
| `array`    | List of items              | `item: FieldDefinition`                   |
| `object`   | Nested structure           | `fields: Record<string, FieldDefinition>` |
| `ref`      | Reference to another store | `store: string`                           |

Any field can set `nullable: true`.
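To make the field types concrete, here is a hypothetical TypeScript sketch of how a field definition could drive write-time validation. The `Field` union and `checkField` function are invented for illustration — they are not the runtime's actual validator — and only three of the eight types are shown.

```typescript
// Illustrative sketch only — not the runtime's real validation code.
// Covers string, number (with min/max), and enum fields, plus nullable.
type Field =
  | { type: 'string'; nullable?: boolean }
  | { type: 'number'; min?: number; max?: number; nullable?: boolean }
  | { type: 'enum'; values: string[]; nullable?: boolean }

function checkField(field: Field, value: unknown): boolean {
  // Nulls pass only when the field opts in with `nullable: true`.
  if (value === null) return field.nullable === true
  switch (field.type) {
    case 'string':
      return typeof value === 'string'
    case 'number':
      return (
        typeof value === 'number' &&
        (field.min === undefined || value >= field.min) &&
        (field.max === undefined || value <= field.max)
      )
    case 'enum':
      return typeof value === 'string' && field.values.includes(value)
  }
}
```

Under this sketch, a `confidence` value of `0.92` passes a `{ "type": "number", "min": 0, "max": 1 }` field, while a severity of `"P5"` fails the `enum` check.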
### TTL Simple TTL (seconds): ```json { "ttl": 86400 } ``` Conditional TTL: ```json { "ttl_conditional": { "default": 86400, "override": [ { "condition": "severity IN ['P1', 'P2']", "ttl": 300 } ] } } ``` ### Failure Handling | Mode | Behavior | | ---------------- | -------------------------------------- | | `partial` | Continue on individual write failures | | `all-or-nothing` | Rollback entire batch on first failure | | `skip` | Skip failed writes silently | ### Auto-Generated Tools Store names (kebab-case) become tool names (snake\_case with `store_` prefix): * `active-alerts` → `store_active_alerts` * `deal-health` → `store_deal_health` The agent uses these tools to get, put, list, delete, and query history on store documents. ### Storage Backend Configured in `amodal.json`: ```json { "stores": { "backend": "pglite", "dataDir": ".amodal/store-data" } } ``` | Backend | Description | | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `pglite` | In-process WASM build of Postgres. Runs embedded in the runtime, file-backed at `dataDir` or in-memory. Default for `amodal dev` — zero external dependencies. | | `postgres` | Real Postgres server via the `postgresUrl` field (supports `env:DATABASE_URL` substitution). Use for hosted runtimes and ISV production deployments. | Both backends use [Drizzle ORM](https://orm.drizzle.team) internally against the same schema, so switching between them is a pure config change — no migration scripts, no data-shape differences. Start with `pglite` locally, move to `postgres` when you need durability / shared state across processes. 
```json { "stores": { "backend": "postgres", "postgresUrl": "env:DATABASE_URL" } } ``` ### History & Tracing * `"history": { "versions": 3 }` — retain 3 previous versions of each document * `"trace": true` — store reasoning traces alongside documents ## Tools Tools are how the agent interacts with the world beyond conversation. They make API calls, query stores, dispatch sub-agents, create tickets, send messages, and perform any action your product needs. The tool system has two layers: built-in tools that ship with every Amodal agent, and custom tools that you define for your specific business logic. Built-in tools handle the universal operations — making authenticated HTTP requests, reading and writing stores, dispatching sub-agents, rendering widgets. You never define these; they are always available. Custom tools are where you encode your business-specific actions: creating a Jira ticket with your project's fields, calculating a weighted pipeline score with your formula, triggering a deploy through your CI system. The split is intentional. Built-in tools give the agent its core capabilities. Custom tools give it *your* capabilities. ``` tools/ └── create_ticket/ ├── tool.json ← (optional) metadata, parameters, confirmation ├── handler.ts ← handler code ├── package.json ← (optional) npm dependencies └── requirements.txt ← (optional) Python dependencies ``` ### Built-in Tools These are always available — you don't define them. They appear in the agent's tool list automatically whenever their prerequisites are met (a connection exists, a store is configured, etc.). | Tool | Registered when | What It Does | | ------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **request** | ≥1 connection defined | HTTP calls through a connection with automatic auth. 
Declares `intent: 'read' \| 'write'` for access control. |
| **query\_store**    | ≥1 store defined            | Read from any configured store — list, filter, paginate. |
| **write\_\<store>** | ≥1 store defined            | Auto-generated per [store](/guide/stores). Write/upsert a single document. |
| **\<store>\_batch** | ≥1 store defined            | Auto-generated per store. Batch-write multiple documents. |
| **dispatch\_task**  | always                      | Spawn a sub-agent with its own isolated state machine. Shares the parent's tools. See [Sub-Agents](/learn/architecture/agents). |
| **present**         | always                      | Render a widget (info-card, data-table, metric, etc.) as an SSE event for the client to display inline. |
| **stop\_execution** | always                      | End the current turn cleanly when the agent is done (useful for automations that shouldn't keep talking). |
| **web\_search**     | `webTools` block configured | Grounded web search via Gemini Flash + Google Search. Returns a synthesized answer with cited source URLs. See [Web Tools](#web-tools). |
| **fetch\_url**      | `webTools` block configured | Fetch and extract the main content of a URL via Gemini urlContext, with a local fetch + Readability fallback for private networks. See [Web Tools](#web-tools). |
| **MCP tools**       | ≥1 MCP server configured    | Auto-discovered from each configured MCP server. Tool names are prefixed with the server name. |

**Admin-only tools** — registered only for admin sessions (`/config/chat` in `amodal dev`), never for regular user chat or automations. See [Admin Agent](/guide/admin-agent) for the full list.

### Web Tools

`web_search` and `fetch_url` give the agent grounded access to the public web. They're opt-in via a `webTools` block in `amodal.json`:

```json
{
  "webTools": {
    "provider": "google",
    "apiKey": "env:GOOGLE_API_KEY",
    "model": "gemini-2.5-flash"
  }
}
```

Both tools route through a dedicated Gemini Flash instance with Google Search + urlContext grounding — this works **regardless of the main model provider**.
An agent configured to use Anthropic Claude or OpenAI GPT-4o for reasoning can still use `web_search` / `fetch_url`; the search call is proxied to Gemini under the hood. Only the `google` provider is supported today; more providers may be added later. #### web\_search ``` web_search({query: "kubernetes 1.31 deprecations", max_results: 5}) ``` Returns a synthesized answer (up to 2000 tokens) with cited source URLs. `max_results` defaults to 5, capped at 10. The tool description tells the agent to write specific queries — include dates, names, error messages as relevant — since query quality drives answer quality. #### fetch\_url ``` fetch_url({url: "https://example.com/article", prompt: "Extract the API changes"}) ``` Public URLs go through Gemini's urlContext grounding — the model fetches and summarizes the page directly. Private-network URLs (localhost, RFC1918, `.local`) automatically route through a local fetch path with Mozilla Readability extraction. * Per-hostname rate limit: 10 requests / 60 seconds * Local fetch: 10s timeout, 1MB body cap * Falls back to local fetch if Gemini urlContext fails #### Error handling Provider errors are classified by HTTP status so the agent knows whether to retry: | Status | Tool tells the agent | | --------------- | -------------------------------------------------------------------------- | | 400 / 401 / 403 | Auth problem — **don't retry**. Check the `GOOGLE_API_KEY` for `webTools`. | | 429 | Rate limit / quota exhausted — **don't retry**. | | 5xx | Transient — **may retry once**. | Unexpected errors bubble as `ToolExecutionError`. ### An End-to-End Example To make tools concrete, let's walk through a realistic scenario: your agent discovers an issue during an investigation and needs to create a Jira ticket to track it. #### The tool definition ``` tools/ └── create_ops_ticket/ ├── tool.json └── handler.ts ``` **tool.json:** ```json { "name": "create_ops_ticket", "description": "Create a Jira issue in the OPS project. 
Use this when an investigation reveals an issue that needs tracking — infrastructure problems, security findings that need remediation, or follow-up tasks from incident response.", "parameters": { "type": "object", "properties": { "summary": { "type": "string", "description": "A concise title for the issue (under 100 characters)" }, "description": { "type": "string", "description": "Detailed description including context from the investigation" }, "priority": { "type": "string", "enum": ["P1", "P2", "P3", "P4"], "description": "P1 = outage/security incident, P2 = degraded service, P3 = needs attention this sprint, P4 = backlog" }, "labels": { "type": "array", "items": { "type": "string" }, "description": "Labels to categorize the issue (e.g., 'security', 'deployment', 'performance')" } }, "required": ["summary", "priority"] }, "confirm": "review", "timeout": 30000, "env": ["JIRA_API_TOKEN"] } ``` **handler.ts:** ```typescript export default async (params, ctx) => { // Build the Jira issue payload const fields = { project: { key: 'OPS' }, summary: params.summary, priority: { name: params.priority }, issuetype: { name: 'Task' }, labels: params.labels || [], } // Add description if provided if (params.description) { fields.description = { type: 'doc', version: 1, content: [{ type: 'paragraph', content: [{ type: 'text', text: params.description }], }], } } // Create the issue via the Jira connection const result = await ctx.request('jira', '/rest/api/3/issue', { method: 'POST', data: { fields }, }) // Return a clean summary — the agent doesn't need the full Jira response return { ticketId: result.key, url: `https://yourcompany.atlassian.net/browse/${result.key}`, summary: params.summary, priority: params.priority, } } ``` #### What happens at runtime 1. The agent is investigating an alert and discovers a misconfigured security group. 2. It decides to create a tracking ticket. 
Because `create_ops_ticket` has `"confirm": "review"`, the agent presents the proposed action to the user: *"I'd like to create a P2 Jira ticket: 'Security group sg-0a1b2c3d allows unrestricted SSH access from 0.0.0.0/0'. Approve?"* 3. The user approves. The runtime calls the handler, which makes the authenticated API call to Jira via the `jira` connection. 4. The handler returns the ticket ID and URL. The agent reports: *"Created OPS-1847: Security group sg-0a1b2c3d allows unrestricted SSH access. [View in Jira](https://yourcompany.atlassian.net/browse/OPS-1847)"* The tool description is critical here — it tells the agent *when* to use the tool, not just what it does. "Use this when an investigation reveals an issue that needs tracking" gives the agent judgment about when ticket creation is appropriate. ### Tool Types #### HTTP Tool (Simple API Call) For straightforward API calls that don't need custom logic, you can define a tool with just `tool.json` and no handler. The runtime makes the request directly. ```json { "name": "get_deploy_status", "description": "Check the status of a deployment in ArgoCD. Use this to verify whether a recent deploy succeeded, is still rolling out, or failed.", "type": "http", "connection": "argocd", "endpoint": "/api/v1/applications/{{app_name}}", "method": "GET", "parameters": { "type": "object", "properties": { "app_name": { "type": "string", "description": "The ArgoCD application name (e.g., 'payments-prod', 'auth-staging')" } }, "required": ["app_name"] }, "confirm": false, "responseShaping": { "pick": ["status.sync.status", "status.health.status", "status.operationState.phase", "status.operationState.message"], "rename": { "status.sync.status": "syncStatus", "status.health.status": "healthStatus", "status.operationState.phase": "phase", "status.operationState.message": "message" } } } ``` No handler code needed. 
The runtime substitutes `{{app_name}}` into the endpoint, makes the GET request via the `argocd` connection (which handles auth), and applies the response shaping to return a clean summary instead of the full ArgoCD response object. #### Chain Tool (Multi-System) Chain tools call multiple systems in parallel and combine the results. Useful when a single logical action requires data from several places. ```json { "name": "get_service_overview", "description": "Pull a complete overview of a service: deployment status, error metrics, and recent incidents. Use this when starting an investigation or when the user asks about a service's current state.", "type": "chain", "steps": [ { "name": "deploy", "connection": "argocd", "endpoint": "/api/v1/applications/{{service_name}}", "method": "GET" }, { "name": "metrics", "connection": "datadog", "endpoint": "/api/v1/query", "method": "GET", "params": { "query": "avg:http.error_rate{service:{{service_name}}}", "from": "{{now_minus_1h}}", "to": "{{now}}" } }, { "name": "incidents", "connection": "pagerduty", "endpoint": "/incidents", "method": "GET", "params": { "service_ids[]": "{{pagerduty_service_id}}", "since": "{{now_minus_24h}}", "statuses[]": ["triggered", "acknowledged"] } } ], "parameters": { "type": "object", "properties": { "service_name": { "type": "string" }, "pagerduty_service_id": { "type": "string" } }, "required": ["service_name"] }, "confirm": false } ``` All three steps execute in parallel. The agent gets a single combined result with `deploy`, `metrics`, and `incidents` keys. #### Function Tool (Custom Logic) Function tools have a handler that runs custom code. This is for anything that requires logic beyond a simple API call: calculations, data transformation, conditional workflows, or calls to systems that don't have a clean REST API. 
```typescript
import { defineToolHandler } from '@amodalai/core'

export default defineToolHandler({
  description:
    'Calculate the weighted pipeline value for a set of deals based on stage probability and engagement score. Use this for pipeline reviews or forecasting.',
  parameters: {
    type: 'object',
    properties: {
      deal_ids: {
        type: 'array',
        items: { type: 'string' },
        description: 'Salesforce opportunity IDs to include',
      },
      include_at_risk: {
        type: 'boolean',
        description: 'Whether to include deals flagged as at-risk (default: true)',
      },
    },
    required: ['deal_ids'],
  },
  confirm: 'review',
  timeout: 60000,
  handler: async (params, ctx) => {
    // SOQL over REST has no bind parameters, so whitelist-validate the IDs
    // before interpolating them into the query string
    const safeIds = params.deal_ids.filter((id) => /^[a-zA-Z0-9]{15,18}$/.test(id))

    // Pull deal data from Salesforce
    const query = `SELECT Id, Name, Amount, StageName, Probability, LastActivityDate FROM Opportunity WHERE Id IN ('${safeIds.join("','")}')`
    const result = await ctx.request('salesforce', '/services/data/v59.0/query', {
      params: { q: query },
    })

    const deals = result.records.map((deal) => {
      const daysSinceActivity = deal.LastActivityDate
        ? Math.floor((Date.now() - new Date(deal.LastActivityDate).getTime()) / 86400000)
        : 999

      // Engagement decay: reduce probability if the deal has gone quiet
      const engagementMultiplier =
        daysSinceActivity <= 7 ? 1.0 : daysSinceActivity <= 14 ? 0.85 : daysSinceActivity <= 30 ? 0.6 : 0.3

      const adjustedProbability = (deal.Probability / 100) * engagementMultiplier
      const weightedValue = deal.Amount * adjustedProbability

      return {
        id: deal.Id,
        name: deal.Name,
        amount: deal.Amount,
        stage: deal.StageName,
        rawProbability: deal.Probability,
        daysSinceActivity,
        engagementMultiplier,
        adjustedProbability: Math.round(adjustedProbability * 100),
        weightedValue: Math.round(weightedValue),
        atRisk: daysSinceActivity > 14 || deal.Probability < 30,
      }
    })

    const included = params.include_at_risk !== false ?
deals : deals.filter((d) => !d.atRisk) return { totalWeightedValue: included.reduce((sum, d) => sum + d.weightedValue, 0), dealCount: included.length, atRiskCount: deals.filter((d) => d.atRisk).length, deals: included, } }, }) ``` The handler does real computation — engagement decay scoring, probability adjustments — that would be unreliable if left to the LLM. Domain computation belongs in tool code, not in the agent's reasoning. ### Custom Tool Definition #### Option A: tool.json + handler.ts **tool.json:** ```json { "name": "create_ticket", "description": "Create a Jira issue in the ops project", "parameters": { "type": "object", "properties": { "summary": { "type": "string" }, "priority": { "type": "string", "enum": ["P1", "P2", "P3", "P4"] } }, "required": ["summary"] }, "confirm": "review", "timeout": 30000, "env": ["JIRA_API_TOKEN"] } ``` | Field | Type | Default | Description | | ------------------ | -------------------------------------- | -------------- | ----------------------------------- | | `name` | string | directory name | Tool name (snake\_case) | | `description` | string | **required** | Shown to the LLM | | `parameters` | JSON Schema | `{}` | Input parameters | | `confirm` | `false \| true \| "review" \| "never"` | `false` | Confirmation tier | | `timeout` | number | `30000` | Timeout in ms | | `env` | string\[] | `[]` | Allowed env var names | | `responseShaping` | object | — | Transform response before returning | | `sandbox.language` | string | `"typescript"` | Handler language | **handler.ts:** ```typescript export default async (params, ctx) => { const result = await ctx.request('jira', '/rest/api/3/issue', { method: 'POST', data: { fields: { project: { key: 'OPS' }, summary: params.summary, priority: { name: params.priority }, issuetype: { name: 'Task' }, }, }, }) return { ticketId: result.key, url: result.self } } ``` #### Option B: defineToolHandler (single file) ```typescript import { defineToolHandler } from '@amodalai/core' export 
default defineToolHandler({ description: 'Calculate weighted pipeline value', parameters: { type: 'object', properties: { deal_ids: { type: 'array', items: { type: 'string' } }, }, required: ['deal_ids'], }, confirm: 'review', timeout: 60000, env: ['STRIPE_API_KEY'], handler: async (params, ctx) => { const deals = await ctx.request('crm', '/deals', { params: { ids: params.deal_ids.join(',') }, }) return { total: deals.reduce((sum, d) => sum + d.amount, 0) } }, }) ``` ### Response Shaping API responses are often verbose — hundreds of fields when the agent only needs five. Response shaping transforms the raw response before it reaches the agent, keeping context clean and token usage low. Define response shaping in `tool.json`: ```json { "name": "get_user_details", "description": "Look up a user's profile, role, and recent login activity from Okta.", "type": "http", "connection": "okta", "endpoint": "/api/v1/users/{{user_id}}", "method": "GET", "parameters": { "type": "object", "properties": { "user_id": { "type": "string" } }, "required": ["user_id"] }, "confirm": false, "responseShaping": { "pick": [ "profile.firstName", "profile.lastName", "profile.email", "profile.department", "status", "lastLogin", "created" ], "rename": { "profile.firstName": "firstName", "profile.lastName": "lastName", "profile.email": "email", "profile.department": "department" }, "template": "User: {{firstName}} {{lastName}} ({{email}})\nDepartment: {{department}}\nStatus: {{status}}\nLast Login: {{lastLogin}}\nAccount Created: {{created}}" } } ``` Without shaping, the Okta user response might be 2,000+ tokens of nested JSON including password policies, MFA factors, embedded links, and internal metadata. With shaping, the agent gets a clean 50-token summary with exactly the fields it needs for investigation. The `pick` field selects specific paths from the response. The `rename` field flattens nested keys. 
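To make those semantics concrete, here is a minimal sketch of dot-path `pick`/`rename` shaping in TypeScript. This illustrates the behavior described above, not the runtime's actual implementation:

```typescript
// Illustrative sketch of pick/rename response shaping over dot-paths.
// Not the runtime's implementation; it only demonstrates the semantics.
type Shaping = { pick: string[]; rename?: Record<string, string> }

function getPath(obj: unknown, path: string): unknown {
  return path.split('.').reduce<unknown>(
    (cur, key) =>
      cur && typeof cur === 'object' ? (cur as Record<string, unknown>)[key] : undefined,
    obj
  )
}

function shapeResponse(raw: unknown, shaping: Shaping): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const path of shaping.pick) {
    // Renamed paths get the flat key; everything else keeps its dotted path
    const key = shaping.rename?.[path] ?? path
    out[key] = getPath(raw, path)
  }
  return out
}

// A verbose Okta-style response reduced to just the picked fields
const shaped = shapeResponse(
  {
    profile: { firstName: 'Ada', lastName: 'Lovelace', email: 'ada@example.com' },
    status: 'ACTIVE',
    _links: { self: { href: 'https://example.okta.com/api/v1/users/0001' } },
  },
  {
    pick: ['profile.firstName', 'profile.email', 'status'],
    rename: { 'profile.firstName': 'firstName', 'profile.email': 'email' },
  }
)
// shaped: { firstName: 'Ada', email: 'ada@example.com', status: 'ACTIVE' }
```

Note that renaming only applies to picked paths; anything not listed in `pick` is dropped entirely, which is what keeps the agent's context small.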
The optional `template` field formats the output as a string instead of JSON — useful when the agent just needs to read the data, not process it programmatically. ### Confirmation Tiers in Practice Every tool has a confirmation level that determines whether the agent can call it freely or needs user approval. This is the safety model for write operations. | Tier | Value | Behavior | | ---------------- | ---------- | -------------------------------------------------------------- | | **Auto-approve** | `false` | Agent calls freely. No user interaction. | | **Confirm** | `true` | Agent shows what it will do and waits for user approval. | | **Review** | `"review"` | Agent shows full parameters and waits for explicit review. | | **Never** | `"never"` | Tool is blocked from agent use (only callable by other tools). | Here is how this plays out in a realistic investigation: **Scenario:** The agent is investigating elevated error rates on the payments service. 1. **Read operations auto-approve.** The agent queries Datadog for metrics (`request` with `intent: 'read'`), pulls deploy history from ArgoCD, checks PagerDuty for related incidents. All of these are reads — they happen instantly with no user interaction. The investigation flows smoothly. 2. **Single writes confirm.** The agent determines that a rollback is warranted and wants to trigger it via ArgoCD. The rollback tool has `"confirm": true`. The agent presents: *"I'd like to trigger a rollback of payments-service from v2.14.3 to v2.14.2 via ArgoCD. This will start a rolling deployment. Approve?"* The user reviews and confirms. 3. **Sensitive writes require review.** After the rollback, the agent wants to create a Jira ticket and post a summary to the #incidents Slack channel. Both tools have `"confirm": "review"`. The agent shows the full ticket payload and Slack message content for the user to review before sending. 4. 
**Bulk operations get extra scrutiny.** If the agent needed to update 12 Jira tickets (adding a label to all tickets related to the incident), the runtime requires itemized confirmation for bulk writes (more than 5 items). The user sees each ticket that will be modified and approves the batch. 5. **Dangerous operations are blocked.** A `delete_deployment` tool might have `"confirm": "never"` — the agent cannot call it at all during interactive sessions. It exists only for use by other tools in controlled workflows. The progression is natural: reads are fast, writes are confirmed, bulk writes are reviewed individually, and destructive operations are gated. The user stays in control without being interrupted for every API call. ### Handler Context The `ctx` object available in every handler: | Method | Description | | --------------------------------------------- | -------------------------------- | | `ctx.request(connection, endpoint, options?)` | Make an authenticated API call | | `ctx.exec(command, options?)` | Run a shell command | | `ctx.env(name)` | Read an allowed env var | | `ctx.log(message)` | Log a message | | `ctx.user` | User info: `{ roles: string[] }` | | `ctx.signal` | AbortSignal for cancellation | #### ctx.request in practice The `request` method handles authentication automatically based on the connection config. You specify the connection name and the endpoint path — the runtime resolves the base URL, attaches credentials, and handles retries. 
```typescript
export default async (params, ctx) => {
  // Simple GET — auth is handled by the 'datadog' connection config
  const metrics = await ctx.request('datadog', '/api/v1/query', {
    params: {
      query: `avg:system.cpu.user{host:${params.hostname}}`,
      from: params.startTime,
      to: params.endTime,
    },
  })

  // POST with a body
  const incident = await ctx.request('pagerduty', '/incidents', {
    method: 'POST',
    data: {
      incident: {
        type: 'incident',
        title: params.title,
        service: { id: params.serviceId, type: 'service_reference' },
        urgency: params.urgency,
      },
    },
  })

  return {
    cpuAverage: metrics.series[0]?.pointlist?.map(([ts, val]) => val) || [],
    incidentId: incident.incident.id,
  }
}
```

#### ctx.exec for shell commands

Use `ctx.exec` for operations that don't map to a REST API — data processing, file manipulation, or calling CLI tools.

```typescript
export default async (params, ctx) => {
  // Run a database query via the psql CLI. -t (tuples only) and -A (unaligned)
  // strip headers and padding, so stdout is just the raw count.
  const result = await ctx.exec(
    `psql "${ctx.env('DATABASE_URL')}" -t -A -c "SELECT count(*) FROM jobs WHERE status = 'failed' AND created_at > now() - interval '1 hour'"`,
    { timeout: 10000 }
  )

  return { failedJobsLastHour: Number(result.stdout.trim()) }
}
```

#### ctx.env for secrets

Only environment variables listed in the tool's `env` array are accessible. This prevents tools from reading secrets they should not have access to.

```typescript
// tool.json has: "env": ["SLACK_WEBHOOK_URL"]
export default async (params, ctx) => {
  const webhookUrl = ctx.env('SLACK_WEBHOOK_URL')

  // Use the webhook directly (not through a connection)
  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: params.message,
      channel: params.channel,
    }),
  })

  return { sent: response.ok }
}
```

### Naming Convention

Tool names must be **snake\_case**: lowercase letters, digits, and underscores, starting with a letter.
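As an illustrative check (a hypothetical helper, not an exported SDK function), the convention maps to a single regex:

```typescript
// Hypothetical validator for the tool-name convention: snake_case,
// starting with a letter, using only lowercase letters, digits, underscores.
const TOOL_NAME_PATTERN = /^[a-z][a-z0-9_]*$/

function isValidToolName(name: string): boolean {
  return TOOL_NAME_PATTERN.test(name)
}

isValidToolName('create_ticket') // true
isValidToolName('CreateTicket') // false: uppercase
isValidToolName('2fa_check') // false: must start with a letter
```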
Example: `create_ticket`, `fetch_deals`, `calculate_risk`.

## amodal chat

Open an interactive terminal chat with your agent. The chat UI is a React-based TUI built with Ink.

```bash
amodal chat
```

### Modes

#### Local mode (default)

Boots a local runtime server from your repo and connects to it:

```bash
amodal chat
```

#### Remote mode

Connect to an already-running server:

```bash
amodal chat --url http://localhost:3847
amodal chat --url https://my-agent.amodal.ai
```

#### Snapshot mode

Load from a snapshot file (built with `amodal deploy build`):

```bash
amodal chat --config snapshot.json
```

### Options

| Flag                    | Description                   |
| ----------------------- | ----------------------------- |
| `--url <url>`           | Connect to remote server      |
| `--config <file>`       | Load from snapshot            |
| `--app-id <id>`         | App identifier                |
| `--port <port>`         | Local server port             |
| `--resume <session-id>` | Resume a previous session     |
| `--fullscreen`          | Use alternate terminal buffer |

### Features

* **Streaming responses** — see the agent think in real-time
* **Tool call display** — watch tool invocations as they happen
* **Skill activation** — see which reasoning framework is active
* **Session resume** — pick up where you left off
* **Session browser** — navigate previous conversations
* **Markdown rendering** — formatted output in the terminal
* **Responsive layout** — adapts to terminal size

## amodal pkg connect & sync

### connect

Add a connection to your agent. Connections give the agent API access and documentation for external systems.

```bash
# Install a pre-built plugin
amodal pkg connect slack
amodal pkg connect datadog
amodal pkg connect github

# List available plugins
amodal pkg search --tag connection
```

This installs the plugin package into `node_modules`. The resolver automatically surfaces its connections, skills, knowledge, and other contents alongside your local files.

### sync

Sync API specifications from remote sources. Useful for custom APIs or keeping specs up to date.
```bash # Sync from an OpenAPI spec amodal pkg sync --from https://api.example.com/openapi.json # Sync from a GraphQL schema amodal pkg sync --from https://api.example.com/graphql # Check for drift without updating (useful in CI) amodal pkg sync --check ``` #### Drift Detection The `--check` flag compares your local specs against the remote source and reports differences without making changes. This is useful in CI pipelines to catch API changes early. ### Connection Structure Each connection directory contains: ``` connections/slack/ ├── spec.json ← endpoints, auth config, entity list ├── access.json ← field restrictions, action tiers, scoping rules └── credentials ← (gitignored) API keys, tokens ``` #### spec.json Machine-readable configuration: * **baseUrl** — API base URL * **specUrl** — URL to the API spec document (optional) * **auth** — Authentication method (bearer, OAuth, API key) * **sync** — Sync configuration and filters #### access.json Security rules: * **Field restrictions** — which fields are readable/writable * **Action tiers** — confirm/review/never for different operations * **Scoping rules** — app-level access control ### Available Plugins 50+ pre-built connection plugins are available. Browse them in the [Amodal Marketplace](https://www.amodalai.com/marketplace) or see [Connections](/guide/connections) for setup details. ## amodal dev Start a local runtime server for development. The server watches your repo config files and hot-reloads on every change. 
```bash amodal dev ``` ### Options | Flag | Default | Description | | ---------------- | ----------- | -------------------------------------------------------- | | `--port` | `3847` | Server port | | `--host` | `localhost` | Server host | | `--resume` | | Resume a previous session by ID or `latest` | | `-v` | | Increase log verbosity (`-v` for debug, `-vv` for trace) | | `-q` / `--quiet` | | Only show errors | ### Log Levels The default log level is `info` — startup messages, connection status, and warnings. Use `-v` or `-vv` to see more, or `-q` to see less. | Flag | Level | What you see | | ----------- | ----- | ------------------------------------------- | | `-q` | error | Errors and fatal only | | *(default)* | info | Startup, connections, warnings | | `-v` | debug | Session init, MCP, store ops, config reload | | `-vv` | trace | Tool execution, upstream library output | You can also set the `LOG_LEVEL` environment variable (`trace`, `debug`, `info`, `warn`, `error`, `fatal`, `none`). The env var takes precedence over CLI flags. ```bash # Quiet mode amodal dev -q # Debug output amodal dev -v # Full trace (including upstream library noise) amodal dev -vv # Via environment variable LOG_LEVEL=debug amodal dev ``` ### What It Does 1. Loads your repo configuration from the project root 2. Starts an Express server with SSE streaming 3. Watches all config files with 300ms debounce 4. 
Manages sessions with TTL-based cleanup ### Endpoints The dev server exposes: | Method | Path | Description | | ------ | ------------------ | --------------------------------------- | | `POST` | `/chat` | Send a message, receive SSE stream | | `POST` | `/task` | Start a background task | | `GET` | `/task/:id` | Get task status | | `GET` | `/task/:id/stream` | Stream task output | | `GET` | `/inspect/context` | View compiled context with token counts | | `GET` | `/health` | Health check | ### SSE Events The chat endpoint returns a Server-Sent Events stream with these event types: | Event | Description | | ----------------------------- | ------------------------------ | | `text` | Assistant text output | | `tool_call` | Tool invocation | | `ExploreStart` / `ExploreEnd` | System exploration phase | | `PlanMode` | Planning phase | | `skill_activated` | Skill activation | | `FieldScrub` | Sensitive field redaction | | `ConfirmationRequired` | Write operation needs approval | | `kb_proposal` | Knowledge base update proposal | ### Hot Reload Edit any config file and the runtime picks up changes immediately: * Connection specs, skills, knowledge, tools, automations * Config changes (provider, model, etc.) * No server restart needed ## amodal eval Run evaluation suites against your agent to measure quality, compare models, and track regressions. 
```bash amodal eval ``` ### Eval Files Evals live in `evals/` as YAML files: ```yaml name: triage-accuracy description: Test alert triage quality cases: - input: "Review recent security alerts" rubric: - "Correctly identifies critical alerts" - "Filters known false positives" - "Provides severity ranking" expected_tools: - request - query_store ``` ### Evaluation Methods | Method | Description | | ----------------- | -------------------------------------------------------- | | **LLM Judge** | An LLM evaluates the agent's response against the rubric | | **Tool usage** | Verify expected tools were called | | **Cost tracking** | Track token usage and cost per eval | ### Experiments Compare different configurations side-by-side: ```bash amodal ops experiment ``` Experiments let you test: * Different LLM providers or models * Different skill configurations * Different prompt variations * Different knowledge documents Results include cost comparison, quality scores, and latency metrics. ### Multi-Model Comparison Run the same eval suite against multiple providers to find the best model for your use case: ```bash amodal eval --providers anthropic,openai,google ``` ## CLI The `amodal` CLI is the primary interface for building, running, and testing agents. 
Install it globally or use via `npx`: ```bash npm install -g @amodalai/amodal # or npx amodal ``` ### Commands #### Project | Command | Description | | ---------------------- | ----------------------------------------------------------------- | | [`init`](/cli/init) | Scaffold a new agent project | | [`dev`](/cli/dev) | Start local dev server with hot reload | | `validate` | Check config and test live connections (`--skip-test` to disable) | | `inspect` | Show compiled context with token counts | | `build-manifest-types` | Generate TypeScript types from manifest | #### Packages (`amodal pkg`) | Command | Description | | ----------------------------- | ------------------------------------ | | [`pkg connect`](/cli/connect) | Add a connection (plugin or custom) | | [`pkg sync`](/cli/connect) | Sync API specs from remote sources | | `pkg install` | Install marketplace packages | | `pkg uninstall` | Remove packages | | `pkg list` | List installed items | | `pkg update` | Update packages | | `pkg diff` | Show package changes | | `pkg search` | Search the marketplace | | `pkg publish` | Publish to the registry | | `pkg link` | Link a local package for development | #### Deploy (`amodal deploy`) | Command | Description | | ----------------- | ---------------------------------- | | `deploy push` | Deploy to the platform | | `deploy build` | Build a deployment snapshot | | `deploy serve` | Run from a snapshot file | | `deploy status` | Show deployment status | | `deploy list` | List deployments | | `deploy rollback` | Roll back to a previous deployment | | `deploy promote` | Promote a deployment | #### Runtime | Command | Description | | ------------------- | ----------------------------------------- | | [`chat`](/cli/chat) | Interactive terminal chat with your agent | | `test-query` | Fire a one-off query against the agent | #### Testing & Evaluation | Command | Description | | ------------------- | --------------------- | | [`eval`](/cli/eval) | Run evaluation suites | | 
`test-query` | Test a single query | #### Ops (`amodal ops`) | Command | Description | | ----------------- | ----------------------------------- | | `ops secrets` | Manage secrets | | `ops docker` | Docker image management | | `ops automations` | Manage automations | | `ops audit` | View audit logs | | `ops experiment` | Compare models, prompts, or configs | #### Auth (`amodal auth`) | Command | Description | | ------------- | ---------------------- | | `auth login` | Log in to the platform | | `auth logout` | Log out | ## amodal init Initialize a new Amodal agent project. Creates the config file, directory structure, and starter files. ```bash amodal init ``` ### What It Creates ``` my-agent/ ├── amodal.json ← agent name, provider, model ├── skills/ ← starter skill template ├── knowledge/ ← sample knowledge document ├── connections/ ← empty, ready for connections ├── automations/ ← empty, ready for automations └── evals/ ← empty, ready for test cases ``` ### Options ```bash amodal init --name "My Agent" # set project name (defaults to directory name) amodal init --provider openai # use OpenAI instead of Anthropic (default) amodal init --provider google # use Google Gemini ``` ### Next Steps After init, add a connection and start the dev server: ```bash amodal pkg connect stripe # install a registry connection amodal validate # check config is valid amodal dev # start the agent ```