Observability¶

Civitas generates OpenTelemetry spans for every message, LLM call, tool invocation, and supervisor event automatically — no instrumentation code required in your agents. This document covers what is traced, how to view it, how to export to external backends, and how to add custom spans.

What is automatically traced¶

Every operation in the runtime emits a span:

Operation	Span name	Key attributes
Message sent	`send {type}`	`civitas.sender`, `civitas.recipient`, `civitas.message_type`, `civitas.message_id`
Message received	`recv {type}`	`civitas.sender`, `civitas.recipient`, `civitas.message_type`, `civitas.message_id`
Agent started	`civitas.agent.start`	`civitas.agent.name`
Message handled	`civitas.agent.handle`	`civitas.agent.name`, `civitas.message_type`, `civitas.attempt`
Agent stopped	`civitas.agent.stop`	`civitas.agent.name`
Message retried	`civitas.agent.retry`	`civitas.agent.name`, `civitas.attempt`
Supervisor restart	`supervisor.restart`	`civitas.supervisor`, `civitas.child`, `civitas.restart_count`, `civitas.strategy`, `civitas.error`
LLM call	`llm.chat {model}`	`llm.model`, `llm.tokens_in`, `llm.tokens_out`, `llm.cost_usd`, `llm.latency_ms`
Tool invocation	`tool.execute {name}`	`tool.name`, `tool.result_status`, `tool.latency_ms`

Zero configuration required. Run any Civitas program and these spans are emitted.

Three output modes¶

Civitas selects the output mode automatically based on what is installed and what environment variables are set:

Observability Setup

Mode 1 — Built-in console output¶

Default — no dependencies beyond python-civitas core.

When opentelemetry-sdk is not installed, Civitas prints a human-readable summary to the console via Python's logging module at DEBUG level. Enable it:

import logging
logging.basicConfig(level=logging.DEBUG)

Output format:

[10:00:00.123] orchestrator -> researcher: research_query
  [llm] claude-haiku-4-5: 1520in/430out $0.0089 2341ms
  [tool] web_search: ok 450ms
[10:00:02.480] researcher -> summarizer: summarize_request
  [llm] claude-haiku-4-5: 890in/210out $0.0003 615ms
[10:00:03.100] summarizer -> orchestrator: reply

This mode is zero-dependency and suitable for development. No spans are exported to any external system.

Mode 2 — OTEL ConsoleSpanExporter¶

Install opentelemetry-sdk, no endpoint configured.

pip install civitas[otel]

Without OTEL_EXPORTER_OTLP_ENDPOINT set, Civitas configures OpenTelemetry's built-in ConsoleSpanExporter, which writes full OTEL-format JSON spans to stdout. Useful for verifying span structure and attributes before connecting to a real backend.

{
    "name": "llm.chat claude-haiku-4-5",
    "context": {
        "trace_id": "0x4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "0x00f067aa0ba902b7"
    },
    "parent_id": "0xa3ce929d0e0e4736",
    "start_time": "2026-04-06T10:00:00.123Z",
    "end_time": "2026-04-06T10:00:02.464Z",
    "attributes": {
        "llm.model": "claude-haiku-4-5",
        "llm.tokens_in": 1520,
        "llm.tokens_out": 430,
        "llm.cost_usd": 0.0089,
        "llm.latency_ms": 2341.2
    },
    "status": "OK"
}

Mode 3 — OTLP export (Jaeger, Grafana, Datadog, etc.)¶

Install opentelemetry-sdk and set OTEL_EXPORTER_OTLP_ENDPOINT.

pip install civitas[otel]
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Civitas uses BatchSpanProcessor which exports spans in a background thread — the message loop is never blocked by network I/O. Spans are buffered and flushed automatically. On runtime.stop(), force_flush() is called to drain any pending spans.

Jaeger (local development)¶

# Start Jaeger all-in-one
docker run -d \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
python examples/research_assistant.py "Compare AI safety approaches"

# Open the trace UI
open http://localhost:16686

In Jaeger you will see a single distributed trace per request, with all agent spans, LLM calls, tool invocations, and supervisor events linked by parent-child relationships.

Grafana + Tempo¶

# Start Grafana Tempo (OTLP gRPC on port 4317)
docker run -d -p 4317:4317 -p 3200:3200 grafana/tempo

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
python my_agent.py

Other OTEL-compatible backends¶

Any backend accepting OTLP gRPC works: Datadog (http://localhost:4317), Honeycomb, New Relic, Lightstep, AWS X-Ray via ADOT collector, etc. Set OTEL_EXPORTER_OTLP_ENDPOINT to your collector's gRPC address.

Trace context propagation¶

Trace context flows automatically through every message. You never set trace_id or span_id manually.

Distributed Tracing

The trace_id is the same for every span in a causal chain. The parent-child relationships (parent_span_id) form the tree structure that OTEL backends render as a waterfall trace. This propagates across process and machine boundaries — spans from a Worker process appear in the same trace as spans from the supervisor process.

Adding custom spans in agents¶

llm_span — context manager¶

Use self.llm_span() to wrap any LLM call. It creates a span parented to the current handle() span and records timing automatically:

class MyAgent(AgentProcess):
    async def handle(self, message: Message) -> Message | None:
        with self.llm_span("claude-haiku-4-5") as span:
            response = await self.llm.chat(
                model="claude-haiku-4-5",
                messages=[{"role": "user", "content": message.payload["question"]}],
            )
            # Enrich the span with response data
            span.set_attribute("civitas.llm.tokens_in", response.tokens_in)
            span.set_attribute("civitas.llm.tokens_out", response.tokens_out)
            span.set_attribute("civitas.llm.cost_usd", response.cost_usd)

        return self.reply({"answer": response.content})

Errors inside the with block are automatically recorded on the span before re-raising.

tool_span — context manager¶

class MyAgent(AgentProcess):
    async def handle(self, message: Message) -> Message | None:
        tool = self.tools.get("web_search")

        with self.tool_span("web_search") as span:
            result = await tool.execute(query=message.payload["query"])
            span.set_attribute("civitas.tool.result_count", len(result["results"]))

        return self.reply({"results": result["results"]})

Custom spans via tracer directly¶

For anything that doesn't fit llm_span or tool_span, use self._tracer.start_span() directly:

class MyAgent(AgentProcess):
    async def handle(self, message: Message) -> Message | None:
        span = self._tracer.start_span(
            "my_custom_operation",
            trace_id=message.trace_id,
            parent_span_id=message.span_id,
            attributes={"my.custom.attr": "value"},
        )
        try:
            result = await do_work()
            span.set_attribute("my.result.size", len(result))
        except Exception as exc:
            span.set_error(exc)
            raise
        finally:
            span.end()   # always end the span

        return self.reply({"result": result})

Always call span.end() in a finally block. Unclosed spans are not exported.

Span attribute reference¶

All Civitas-emitted attributes follow the civitas.* and llm.* / tool.* namespace conventions:

Message spans¶

Attribute	Type	Description
`civitas.sender`	string	Name of the sending agent
`civitas.recipient`	string	Name of the target agent
`civitas.message_type`	string	Value of `message.type`
`civitas.message_id`	string	UUID7 message ID

Agent lifecycle spans¶

Attribute	Type	Description
`civitas.agent.name`	string	Agent name
`civitas.message_type`	string	Type of message being handled
`civitas.attempt`	int	Retry attempt number (0 = first delivery)

Supervisor spans¶

Attribute	Type	Description
`civitas.supervisor`	string	Supervisor name
`civitas.child`	string	Name of the restarted child
`civitas.restart_count`	int	Restart number for this child
`civitas.strategy`	string	`ONE_FOR_ONE` / `ONE_FOR_ALL` / `REST_FOR_ONE`
`civitas.error`	string	Exception that caused the restart

LLM spans¶

Attribute	Type	Description
`llm.model`	string	Model identifier (e.g. `claude-haiku-4-5`)
`llm.tokens_in`	int	Input token count
`llm.tokens_out`	int	Output token count
`llm.cost_usd`	float	Estimated cost in USD
`llm.latency_ms`	float	End-to-end call latency in milliseconds

Tool spans¶

Attribute	Type	Description
`tool.name`	string	Tool name
`tool.result_status`	string	`ok` or `error`
`tool.latency_ms`	float	Execution latency in milliseconds

Error attributes (any span)¶

Attribute	Type	Description
`error`	bool	`True` if an error was recorded
`error.type`	string	Exception class name
`error.message`	string	Exception message string

SpanQueue — non-blocking export¶

All span emission from the message loop goes through a SpanQueue. The tracer calls put_nowait() — it never blocks. A background consumer drains the queue and calls the export backend.

If the queue fills up (default: 10,000 spans), the oldest span is dropped to make room. Losing a span is preferable to stalling the message loop. In practice this only occurs under extreme load or if the export backend is very slow.

You do not interact with the SpanQueue directly — it is wired by the runtime.

Custom export backends¶

The ExportBackend protocol has two methods:

class ExportBackend(Protocol):
    async def export(self, spans: list[SpanData]) -> None: ...
    async def shutdown(self) -> None: ...

Example — sending spans to a custom HTTP endpoint:

from civitas.observability.export_backend import ExportBackend
from civitas.observability.span_queue import SpanData
import aiohttp

class HttpBackend:
    def __init__(self, url: str) -> None:
        self._url = url

    async def export(self, spans: list[SpanData]) -> None:
        async with aiohttp.ClientSession() as session:
            payload = [
                {
                    "name": s.name,
                    "trace_id": s.trace_id,
                    "duration_ms": (s.end_time - s.start_time) * 1000,
                    "attributes": s.attributes,
                    "status": s.status,
                }
                for s in spans
            ]
            await session.post(self._url, json=payload)

    async def shutdown(self) -> None:
        pass

Use FanOutBackend to export to multiple backends simultaneously:

from civitas.observability.export_backend import FanOutBackend, ConsoleBackend

backend = FanOutBackend([
    ConsoleBackend(),
    HttpBackend("https://my-collector/spans"),
])

Environment variables¶

Variable	Default	Description
`OTEL_EXPORTER_OTLP_ENDPOINT`	`None`	gRPC endpoint for OTLP export. If unset, falls back to console.
`AGENCY_SERIALIZER`	`msgpack`	Serializer for messages: `msgpack` or `json`.

Standard OTEL SDK environment variables (OTEL_SERVICE_NAME, OTEL_RESOURCE_ATTRIBUTES, etc.) are respected when opentelemetry-sdk is installed.

Cost attribution¶

Every LLM span carries llm.cost_usd. Aggregating this attribute by civitas.agent.name in your OTEL backend gives you per-agent cost attribution over time:

orchestrator   $0.0421  (3 LLM calls)
researcher     $0.0089  (1 LLM call)
summarizer     $0.0003  (1 LLM call)
─────────────────────────────────────
Total          $0.0513

The Anthropic provider computes cost from the model's token pricing. The LiteLLM provider uses LiteLLM's built-in cost calculation. Custom providers should populate cost_usd in ModelResponse for this to work.

Tips¶

Jaeger trace not appearing? Check that OTEL_EXPORTER_OTLP_ENDPOINT points to the gRPC port (default 4317), not the HTTP port (4318) or the Jaeger UI port (16686).

Spans not flushed on exit? Ensure await runtime.stop() is called — it calls force_flush() on the OTEL provider. If you Ctrl+C, register a signal handler that calls runtime.stop().

Too much noise in console mode? The built-in console output is at DEBUG level. Set logging.basicConfig(level=logging.INFO) to suppress it while keeping application logs.

Adding trace context to external HTTP calls? Inject message.trace_id and message.span_id as HTTP headers in your tool implementation to continue the trace across service boundaries.