Deployment

Civitas uses a four-level deployment ladder. Every level runs the same agent code — only the transport and process topology change. Start at Level 1, graduate to higher levels as your scale or isolation requirements grow.
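
"Agent code" here means plain Python classes that receive and send messages through the runtime, never touching the transport directly. A minimal sketch of what that looks like (the Agent base class, handler signature, and helper methods shown are assumptions, not the verbatim civitas API):

# myapp.py — illustrative only; check your civitas version for the
# actual base class and handler signature.
from civitas import Agent

class Summarizer(Agent):
    async def handle_message(self, message):
        # No sockets, no subjects: the transport is injected by the
        # runtime, so this class runs unchanged at every level.
        text = message.payload["text"]
        summary = await self.model.complete(f"Summarize:\n{text}")
        await self.reply(message, {"summary": summary})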


The deployment ladder

[Diagram: Deployment Levels]


Level 1 — Single process

Default. No extra dependencies. For development and simple single-machine deployments.

[Diagram: Single Process Deployment]

All agents run as asyncio tasks inside one Python process. The InProcessTransport delivers messages via asyncio queues — fast (~2–5 µs per message), zero network overhead.
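
Conceptually, in-process delivery is just a dictionary of per-agent queues. A toy sketch of the idea (not the real InProcessTransport, which also handles supervision and tracing):

import asyncio

class ToyInProcessTransport:
    """One asyncio.Queue per agent; publish is an in-memory put."""

    def __init__(self) -> None:
        self.queues: dict[str, asyncio.Queue] = {}

    def register(self, agent_name: str) -> asyncio.Queue:
        # Each agent gets its own inbox queue.
        return self.queues.setdefault(agent_name, asyncio.Queue())

    async def send(self, agent_name: str, message: dict) -> None:
        # No serialization and no sockets, which is why latency is
        # measured in microseconds rather than milliseconds.
        await self.queues[agent_name].put(message)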

Install:

pip install civitas

topology.yaml:

transport:
  type: in_process

plugins:
  models:
    - type: anthropic
      config:
        default_model: claude-sonnet-4-6

supervision:
  name: root
  strategy: ONE_FOR_ONE
  children:
    - agent:
        name: orchestrator
        type: myapp.Orchestrator
    - agent:
        name: researcher
        type: myapp.Researcher
    - agent:
        name: summarizer
        type: myapp.Summarizer

Run:

civitas run --topology topology.yaml

When to use Level 1:

  • Development and testing
  • Workloads that are I/O-bound (LLM calls, API calls) — the GIL doesn't matter
  • Demos and quickstarts
  • Any deployment where process isolation is not needed

Level 2 — Multi-process (ZMQ)

Scale beyond the GIL. Isolate agent processes on a single machine.

[Diagram: Multi-Process ZMQ Deployment]

Agents marked process: worker in the topology run in a separate OS process. The supervisor process starts a ZMQ XSUB/XPUB proxy; all other processes connect to it.
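
The proxy itself is standard ZMQ. A standalone equivalent in pyzmq, for illustration (the supervisor runs this internally when start_proxy: true; mapping pub_addr to the XSUB side is an assumption based on the convention that publishers connect there):

import zmq

# Publishers connect to the XSUB side, subscribers to the XPUB side;
# zmq.proxy() forwards messages and subscription frames between them.
ctx = zmq.Context.instance()
xsub = ctx.socket(zmq.XSUB)
xsub.bind("tcp://127.0.0.1:5559")  # pub_addr in topology.yaml
xpub = ctx.socket(zmq.XPUB)
xpub.bind("tcp://127.0.0.1:5560")  # sub_addr in topology.yaml
zmq.proxy(xsub, xpub)              # blocks and forwards forever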

Install:

pip install civitas[zmq]

topology.yaml:

transport:
  type: zmq
  pub_addr: "tcp://127.0.0.1:5559"
  sub_addr: "tcp://127.0.0.1:5560"
  start_proxy: true

supervision:
  name: root
  strategy: ONE_FOR_ONE
  children:
    - agent:
        name: orchestrator
        type: myapp.Orchestrator
        # no process: — runs in the supervisor process

    - agent:
        name: researcher
        type: myapp.WebResearcher
        process: worker            # runs in a separate Worker process

    - agent:
        name: summarizer
        type: myapp.Summarizer
        process: worker

Run:

# Terminal 1 — supervisor process (also starts the ZMQ proxy)
civitas run --topology topology.yaml

# Terminal 2 — worker process (connects to the proxy)
civitas run --topology topology.yaml --process worker

Multiple agents can share the same process: name — they all run in one worker. Each distinct process name requires a separate civitas run --process <name> invocation.

When to use Level 2:

  • True CPU-bound agents that need to escape the GIL (image processing, local inference)
  • GPU agents that require process-level memory isolation (TensorFlow/PyTorch)
  • Crash isolation: a runaway agent in one process can't corrupt the memory of others
  • Single-machine staging before moving to distributed

Level 3 — Distributed (NATS)

Run agents on multiple machines. Production scale.

[Diagram: Distributed NATS Deployment]

NATS replaces the ZMQ proxy with a cloud-native message broker. Every agent subscribes to its own NATS subject (civitas.agent.<name>). The topology, supervision tree, and agent code are unchanged from Level 2.
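
Because the subjects are plain NATS subjects, you can watch agent traffic from any NATS client. A short nats-py sketch (illustrative; the transport manages these subscriptions for you):

import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")

    async def on_message(msg):
        # Each agent's inbox is a plain subject: civitas.agent.<name>
        print(f"{msg.subject}: {msg.data!r}")

    await nc.subscribe("civitas.agent.researcher", cb=on_message)

    # Anything published to the subject reaches that agent's process,
    # wherever it runs.
    await nc.publish("civitas.agent.researcher", b"ping")
    await asyncio.sleep(1)
    await nc.drain()

asyncio.run(main())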

Install:

pip install civitas[nats]

Start NATS:

# Local development
docker run -d -p 4222:4222 -p 8222:8222 nats

# With JetStream (for at-least-once delivery)
docker run -d -p 4222:4222 -p 8222:8222 nats --jetstream

topology.yaml:

transport:
  type: nats
  servers: "nats://localhost:4222"
  jetstream: false               # set true for at-least-once delivery

plugins:
  models:
    - type: anthropic
      config:
        default_model: claude-sonnet-4-6
        max_tokens: 8192
        max_retries: 3

  state:
    type: sqlite
    config:
      db_path: /data/agency_state.db

supervision:
  name: root
  strategy: ONE_FOR_ONE
  max_restarts: 3
  backoff: EXPONENTIAL
  backoff_base: 2.0
  children:
    - agent:
        name: orchestrator
        type: myapp.Orchestrator

    - agent:
        name: researcher
        type: myapp.WebResearcher
        process: worker

    - agent:
        name: summarizer
        type: myapp.Summarizer

Run:

# Machine A — supervisor
civitas run --topology topology.yaml

# Machine B — worker (point --nats-url at your NATS host if it's not localhost)
civitas run --topology topology.yaml --process worker --nats-url nats://nats-host:4222

JetStream (durable subscriptions):

With jetstream: true, agents use durable NATS subscriptions. Messages are persisted in the NATS server and redelivered if a subscriber disconnects. Use this when:

  • Agents must not miss messages during a restart or brief network partition
  • You need at-least-once delivery guarantees

For most workloads, at-most-once delivery with supervisor-level restart (the default) is sufficient and simpler to operate.
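
For reference, a durable subscription in nats-py looks roughly like this (illustrative; the stream name and subject wildcard are assumptions, and the transport sets all of this up for you when jetstream: true):

import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    # Persist agent subjects in one stream (names are assumptions).
    await js.add_stream(name="civitas", subjects=["civitas.agent.>"])

    async def on_message(msg):
        print(msg.data)
        await msg.ack()  # at-least-once: ack only after handling

    # The durable name lets the consumer resume where it left off
    # after a restart or a brief network partition.
    await js.subscribe("civitas.agent.researcher",
                       durable="researcher", cb=on_message)
    await asyncio.sleep(5)
    await nc.close()

asyncio.run(main())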

When to use Level 3:

  • Agents that need to scale horizontally across machines
  • Cloud deployments (Kubernetes pods, ECS tasks, GKE containers)
  • High-availability requirements — run a NATS cluster for HA
  • Geographic distribution of agent processes

Level 4 — Containerized (Docker Compose)

Generate a ready-to-run Docker Compose stack from your topology.

civitas deploy docker-compose --topology topology.yaml --output ./deploy

This reads your topology YAML, inspects the process: assignments, and generates:

  • Dockerfile — builds a Civitas image with the right transport extras
  • docker-compose.yml — one service per process group (supervisor + one per worker name)
  • .env — runtime environment variables with placeholders for secrets
The command prints a summary as it finishes:

  ✔ Generated deployment artifacts

    docker-compose.yml         supervisor + 2 workers
    Dockerfile                 agent image
    .env                       runtime config
    topology.yaml              topology (copied)

    4 agents across 3 containers

  Run with: cd deploy && docker compose up

Example output for a NATS topology:

# docker-compose.yml (generated)
services:
  nats:
    image: nats:latest
    ports:
      - "4222:4222"
      - "8222:8222"
    command: --jetstream
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:8222/healthz || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3

  supervisor:
    build: .
    command: ["--topology", "topology.yaml"]
    volumes:
      - ./topology.yaml:/app/topology.yaml:ro
    restart: unless-stopped
    depends_on:
      nats:
        condition: service_healthy
    environment:
      AGENCY_SERIALIZER: ${AGENCY_SERIALIZER:-msgpack}
      NATS_URL: nats://nats:4222

  worker-worker:
    build: .
    command: ["--topology", "topology.yaml", "--process", "worker"]
    volumes:
      - ./topology.yaml:/app/topology.yaml:ro
    restart: unless-stopped
    depends_on:
      nats:
        condition: service_healthy
      supervisor:
        condition: service_started
    environment:
      AGENCY_SERIALIZER: ${AGENCY_SERIALIZER:-msgpack}
      NATS_URL: nats://nats:4222

Deploy:

cd deploy
# Set secrets in .env before starting
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env

docker compose up --build

Scale workers horizontally:

docker compose up --scale worker-worker=3

Switching between levels

The same agent code runs at every level. Only the transport: block in your topology changes:

# Level 1 — development
transport:
  type: in_process

# Level 2 — multi-process on one machine
transport:
  type: zmq
  pub_addr: "tcp://127.0.0.1:5559"
  sub_addr: "tcp://127.0.0.1:5560"
  start_proxy: true

# Level 3 — distributed, multiple machines
transport:
  type: nats
  servers: "nats://prod-nats:4222"
  jetstream: true

You can also override the transport at runtime without editing the file:

civitas run --topology topology.yaml --transport nats --nats-url nats://prod:4222

Environment variables

Variable                       Default                 Description
AGENCY_SERIALIZER              msgpack                 Message serializer: msgpack or json
OTEL_EXPORTER_OTLP_ENDPOINT    None                    OTEL gRPC endpoint (e.g. http://localhost:4317). Unset = console output
ANTHROPIC_API_KEY              None                    Anthropic API key — required when using AnthropicProvider
OPENAI_API_KEY                 None                    OpenAI key — required when using LiteLLMProvider with OpenAI models
GEMINI_API_KEY                 None                    Google Gemini key — required when using LiteLLM with Gemini
NATS_URL                       nats://localhost:4222   NATS server URL — overrides transport.servers in topology
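
The serializer choice trades readability for size and speed. A quick comparison using the msgpack and json libraries (the payload shape is a made-up example; the framework handles this encoding internally):

import json
import msgpack

payload = {"agent": "researcher", "task": "summarize", "tokens": 4096}

as_json = json.dumps(payload).encode()
as_msgpack = msgpack.packb(payload)

# msgpack is a compact binary encoding; for typical message payloads
# it is smaller and faster to decode than JSON.
print(len(as_json), len(as_msgpack))
assert msgpack.unpackb(as_msgpack) == payload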

Standard OTEL SDK variables (OTEL_SERVICE_NAME, OTEL_RESOURCE_ATTRIBUTES, etc.) are respected when opentelemetry-sdk is installed.

Never put secrets in topology YAML. ANTHROPIC_API_KEY and other credentials are read from the environment automatically — no config key needed.


Production checklist

Transport

  • NATS is deployed with clustering for high availability
  • jetstream: true enabled if at-least-once delivery is required
  • NATS_URL is set in environment — not hardcoded in topology
  • NATS monitoring endpoint is reachable (http://nats-host:8222)

Supervision

  • max_restarts and restart_window tuned for expected failure rates
  • backoff: EXPONENTIAL set on supervisors that talk to external services
  • Critical agents (orchestrators, state managers) are at the root supervisor level

State

  • SQLiteStateStore (or a custom durable store) configured — not in_memory
  • db_path points to a persistent volume (not a container ephemeral filesystem)
  • State CLI commands tested: civitas state list, civitas state show <name>

Observability

  • OTEL_EXPORTER_OTLP_ENDPOINT set and validated against a real backend
  • OTEL_SERVICE_NAME set to identify this deployment in the trace backend
  • Cost attribution verified: LLM spans carry llm.cost_usd (see the sketch after this list)
  • Supervisor restart spans visible in trace backend
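
To check cost attribution, look for the llm.cost_usd attribute on LLM spans in your trace backend. An illustrative span emitted with the OpenTelemetry SDK (the attribute name comes from this page; the span name and surrounding structure are assumptions):

from opentelemetry import trace

tracer = trace.get_tracer("myapp")

# What an instrumented LLM call would look like in the trace backend.
with tracer.start_as_current_span("llm.call") as span:
    span.set_attribute("llm.cost_usd", 0.0023)
    span.set_attribute("llm.model", "claude-sonnet-4-6")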

Secrets

  • All API keys are in environment or secret manager — not in topology files
  • .env file is in .gitignore
  • Docker secrets or Kubernetes Secrets used in container deployments

Operations

  • civitas topology validate topology.yaml passes in CI
  • docker compose up --scale worker-worker=N tested for horizontal scale
  • Graceful shutdown tested: SIGTERM → runtime.stop() → spans flushed
  • Health check endpoint or NATS monitoring is wired to your load balancer or orchestrator

Upgrading between levels

Level 1 → Level 2: Install civitas[zmq]. Change transport.type to zmq. Add process: worker to agents you want isolated. Start a second terminal with --process worker. Agent code: unchanged.

Level 2 → Level 3: Install civitas[nats]. Start a NATS server. Change transport.type to nats, set servers. Run each process group on its own machine (or container). Agent code: unchanged.

Level 3 → Level 4: Run civitas deploy docker-compose --topology topology.yaml. Edit the generated .env with real secrets. Run docker compose up --build. Agent code: unchanged.