
Observability

Collect metrics, traces, and logs from IdentityScribe. This guide covers endpoints, signals, playbooks, and PromQL examples.

For exhaustive inventories, see the generated telemetry reference.

Note: This document describes the OTel-first contract. Legacy /status/* endpoints have been removed in favor of /observe/* (breaking change, no aliases).

| Path | Description |
| --- | --- |
| `/metrics` | Prometheus scrape endpoint (OpenTelemetry-backed) |
| `/observe/health` | Combined MicroProfile health (JSON) |
| `/observe/health/live` | Liveness probe (JSON) |
| `/observe/health/ready` | Readiness probe (JSON) |
| `/observe/health/started` | Startup probe (JSON) |
| `/observe/health/check/{name}` | Individual health check by name (JSON) |
| `/livez` | Kubernetes liveness probe (plain text) |
| `/readyz` | Kubernetes readiness probe (plain text) |
| `/startedz` | Kubernetes startup probe (plain text) |
| `/healthz` | Kubernetes combined health (plain text) |

Note: The root path / no longer serves metrics — use /metrics explicitly.
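
A minimal Prometheus scrape configuration sketch; the job name and target address are placeholders for your deployment:

```yaml
scrape_configs:
  - job_name: identityscribe              # placeholder job name
    metrics_path: /metrics                # explicit path; the root path no longer serves metrics
    static_configs:
      - targets: ["scribe.example.com:8080"]   # replace with your host:port
```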

These endpoints expose on-demand insights (JSON, cached) for data that would explode metrics cardinality.

| Path | Description |
| --- | --- |
| `/observe` | OpenAPI documentation (HTML UI + JSON/YAML spec) |
| `/observe/status` | Basic status |
| `/observe/channels` | Channel and socket discovery with runtime binding info |
| `/observe/config` | Resolved configuration (passwords redacted) |
| `/observe/doctor` | Health report with threshold checks and recommendations |
| `/observe/services` | Service lifecycle status (state, uptime, failures) |
| `/observe/pressure` | Saturation metrics (queue/task/memory pressure) |
| `/observe/signals` | Golden signals summary (latency, traffic, errors, saturation) |
| `/observe/indexes` | Index build status and concurrent build detection |
| `/observe/hints` | Persisted hints |
| `/observe/signatures` | Query signatures |
| `/observe/stats/values` | Value size statistics per entry type and attribute |
| `/observe/stats/entries` | Entry blob size percentiles per entry type |
| `/observe/stats/events` | Event rate windows (dashboard-friendly buckets) |
| `/observe/stats/ingest` | Ingest lag and checkpoint positions |
| `/observe/mcp` | MCP server for AI assistants (see MCP Channel) |

Tip: Use /observe for interactive docs. If you need machine-readable status, use /observe/status.

Target: /observe/* replaces the legacy /status/* paths (breaking change, no aliases).

The /observe/stats/* endpoints execute direct database queries and can be expensive on large datasets:

  • Use case: Investigation, debugging, collecting real usage statistics—not high-frequency polling.
  • Caching: Two-tier caching protects the database:
    • In-process cache (30s TTL): /values, /entries, /ingest endpoints cache responses server-side.
    • HTTP Cache-Control: All responses include Cache-Control: max-age=30, private for client-side caching.
    • /events is parameterized by since, so it uses client-side caching only.
  • Row limits: Queries return at most 100 rows by default to bound response size.
  • since parameter (/observe/stats/events):
    • ISO-8601 timestamps: 2026-01-01T00:00:00Z, 2026-01-01T00:00:00+01:00
    • Duration strings: 1h, 24h, 7d, 30m, PT1H
    • Invalid input returns 400: Future timestamps, negative/zero durations, or unparseable values.
  • Precision: Numeric byte fields (avgBytes, p50Bytes, etc.) use double to preserve fractional values.

These endpoints are intended for operational investigation and dashboard population, not continuous scraping.
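
For example, a one-off investigation query against the events endpoint might look like this (host, port, and the `since` value are illustrative):

```bash
# Event rate buckets for the last 24 hours; since accepts ISO-8601 timestamps or duration strings
curl -s "http://localhost:8080/observe/stats/events?since=24h" | jq
```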

The /observe/channels endpoint returns enabled channels, sockets, and runtime binding information. It is useful for service discovery, UI connectivity information, and debugging network configuration.

Example response:

```json
{
  "channels": {
    "ldap": {
      "enabled": true,
      "running": true,
      "bindings": [
        {
          "host": "0.0.0.0",
          "configuredPort": 0,
          "actualPort": 10389,
          "ssl": false,
          "url": "ldap://0.0.0.0:10389"
        },
        {
          "host": "0.0.0.0",
          "configuredPort": 10636,
          "actualPort": 10636,
          "ssl": true,
          "url": "ldaps://0.0.0.0:10636"
        }
      ]
    },
    "identityHub": {
      "enabled": false,
      "running": false,
      "bindings": []
    },
    "rest": {
      "enabled": true,
      "sockets": ["@default"],
      "basePath": "/api"
    }
  },
  "monitoring": {
    "prometheus": { "enabled": true, "sockets": ["@default"], "path": "/metrics" },
    "observe": { "enabled": true, "sockets": ["@default"], "path": "/observe" },
    "health": { "enabled": true, "sockets": ["@default"], "paths": ["/livez", "/readyz", "/startedz", "/healthz"] }
  },
  "telemetry": {
    "traces": { "enabled": true },
    "metrics": { "prometheus": true, "otlp": false },
    "hints": { "enabled": true, "explain": false, "persistence": true }
  },
  "sockets": {
    "@default": {
      "host": "0.0.0.0",
      "configuredPort": 0,
      "actualPort": 8080,
      "ssl": false,
      "url": "http://api.example.com:8080"
    },
    "internal": {
      "host": "127.0.0.1",
      "configuredPort": 9090,
      "actualPort": 9090,
      "ssl": false,
      "url": "http://127.0.0.1:9090"
    }
  },
  "request": {
    "socket": "@default",
    "host": "api.example.com",
    "port": 8080,
    "scheme": "http"
  },
  "timestamp": "2026-01-13T12:00:00Z"
}
```

Key features:

  • channels: Enabled channels with binding info. LDAP shows running status and actual ports (important for ephemeral port 0). REST shows socket references.
  • sockets: HTTP sockets with both configuredPort and actualPort (useful when port 0 is configured for ephemeral binding).
  • request: Request context showing detected host/port/scheme (respects X-Forwarded-* headers from proxies).
  • url: Auto-generated connection URL. For the current socket, uses detected host/scheme from request headers. For other sockets, uses configured values.

Use cases:

  • UI service discovery (fetch socket URLs dynamically)
  • Debugging ephemeral port bindings in test environments
  • Verifying proxy header forwarding (request.host, request.scheme)
  • Operational dashboards showing enabled features
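
As a sketch of UI service discovery against the response shape shown above (host, port, and the socket name are assumptions for your deployment):

```bash
# Externally visible URL of the default HTTP socket
curl -s http://localhost:8080/observe/channels | jq -r '.sockets["@default"].url'

# LDAP binding URLs reported at runtime (includes resolved ephemeral ports)
curl -s http://localhost:8080/observe/channels | jq -r '.channels.ldap.bindings[].url'
```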

The /observe/config endpoint returns the resolved configuration with passwords redacted. It is equivalent to the --printconfig CLI flag but accessible at runtime via HTTP.

Content negotiation:

| Accept Header | Response Format |
| --- | --- |
| `application/json` (default) | JSON with `config` and `timestamp` fields |
| `text/plain` | Raw HOCON config text |

Example JSON response:

```json
{
  "config": "Configuration Sources:\n\n- system properties\n- reference.conf\n\napp {\n mode = production\n}\n\ndatabase {\n \"password\" : \"<REDACTED>\"\n host = \"localhost\"\n port = 5432\n}\n...",
  "timestamp": "2026-01-13T12:00:00Z"
}
```

Example plain text request:

```bash
curl -H "Accept: text/plain" http://localhost:8080/observe/config
```

Key features:

  • Password redaction: All password fields are replaced with <REDACTED>
  • Configuration sources: Shows merge order (system properties, env vars, config files)
  • Lazy caching: Config string is computed once on first request
  • Cache-Control: Responses include Cache-Control: max-age=3600, private (1-hour TTL)
  • Excludes JVM internals: Filters out java.*, jdk.*, sun.*, org.graalvm.* paths

Use cases:

  • Debugging configuration issues in production without shell access
  • Verifying environment variable overrides are applied correctly
  • Support diagnostics (config can be shared without exposing secrets)
  • CI/CD verification that config is resolved as expected
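
One possible CI check, assuming the running instance is reachable from the pipeline; the key and expected value are placeholders:

```bash
# Verify that an environment override is present in the resolved config
curl -s -H "Accept: text/plain" http://localhost:8080/observe/config | grep -E 'mode *= *production'
```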

The /observe/services endpoint returns detailed per-service status with uptime, startup duration, tags, and failure causes.

Example response:

```json
{
  "services": [
    {
      "id": "Scribe.user",
      "name": "Scribe",
      "state": "running",
      "healthy": true,
      "tags": {"entryType": "user"},
      "uptime_seconds": 3600,
      "startup_seconds": 2.5
    },
    {
      "id": "Database.System",
      "name": "Database.System",
      "state": "failed",
      "healthy": false,
      "failure": "Connection refused"
    }
  ],
  "summary": {
    "total": 8,
    "healthy": 7,
    "unhealthy": 1,
    "restarts_5m": 3
  },
  "timestamp": "2026-01-08T12:00:00Z"
}
```

Use case: Detailed service diagnostics, startup timing analysis, failure investigation.
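
For a quick failure scan during an incident, something along these lines works against the response shape shown above (host and port are placeholders):

```bash
# Print id, state, and failure cause for every unhealthy service
curl -s http://localhost:8080/observe/services \
  | jq -r '.services[] | select(.healthy == false) | "\(.id)\t\(.state)\t\(.failure // "-")"'
```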

The /observe/doctor endpoint returns an intelligent health report with threshold-based checks, per-service status, and actionable recommendations.

Key features:

  • services.down check shows which services are down by ID (not just count)
  • services array provides per-service state, uptime, and health
  • recommendations array with prioritized, actionable hints

Example services.down hint: "Down: Scribe.user, Database.Batch"
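
A sketch for pulling just the recommendations during triage (field name taken from the feature list above; host and port are placeholders):

```bash
curl -s http://localhost:8080/observe/doctor | jq '.recommendations'
```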

IdentityScribe exposes the Four Golden Signals for quick system health assessment, covering both query and ingest sides.

Query side:

| Signal | Metric | What it measures |
| --- | --- | --- |
| Latency | `scribe_signals_latency_p95` | Response time p95 (seconds) |
| Traffic | `scribe_signals_requests_per_second` | Request throughput |
| Errors | `scribe_signals_error_rate_percent` | Failure percentage |
| Saturation | `scribe_signals_traffic_ratio`, `scribe_db_pool_pressure` | Resource utilization |

Per-channel breakdown with channel label:

  • scribe_signals_channel_latency_p95{channel="ldap"}
  • scribe_signals_channel_requests_per_second{channel="rest"}
  • scribe_signals_channel_error_rate_percent{channel="graphql"}

Ingest side:

| Signal | Metric | What it measures |
| --- | --- | --- |
| Latency | `scribe_signals_ingest_task_duration_p95` | Task processing time p95 |
| Latency | `scribe_signals_ingest_lag_max_seconds` | Worst replication lag |
| Traffic | `scribe_signals_ingest_changes_per_second` | Change detection rate |
| Errors | `scribe_signals_ingest_failed_rate_percent` | Task failure percentage |

Per-entry-type breakdown with entry_type label:

  • scribe_signals_ingest_entry_lag_seconds{entry_type="user"}
  • scribe_signals_ingest_entry_task_duration_p95{entry_type="group"}
  • scribe_signals_ingest_entry_changes_per_second{entry_type="role"}

The /observe/signals endpoint returns a JSON summary for the built-in dashboard:

```bash
curl -s http://localhost:8080/observe/signals | jq
```

These metrics integrate with existing dashboards via PromQL:

```promql
# Query signals alerting
scribe_signals_latency_p95 > 2.0
scribe_signals_error_rate_percent > 5.0
scribe_signals_traffic_ratio > 5.0

# Ingest signals alerting
scribe_signals_ingest_lag_max_seconds > 300
scribe_signals_ingest_failed_rate_percent > 1.0
scribe_signals_ingest_task_duration_p95 > 5.0

# Per-entry-type lag comparison
scribe_signals_ingest_entry_lag_seconds
```

A dedicated Golden Signals dashboard is available at monitoring/grafana/dashboards/signals.json.
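
If you manage alerts as Prometheus rule files, the expressions above translate directly; a minimal sketch (alert name, duration, and threshold are illustrative):

```yaml
groups:
  - name: identityscribe-signals
    rules:
      - alert: ScribeIngestLagHigh
        expr: scribe_signals_ingest_lag_max_seconds > 300
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "IdentityScribe ingest lag above 5 minutes"
```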

All metrics use the scribe. prefix (canonical dot notation), which is auto-converted to scribe_ for Prometheus.

Channel metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `scribe.channel.requests.total` | Counter | `channel`, `op`, `result` | Total requests by channel/operation |
| `scribe.channel.request.duration.seconds` | Histogram | `channel`, `op`, `result` | Request latency distribution |
| `scribe.channel.inflight` | Gauge | `channel`, `op` | Currently processing requests |

Query metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `scribe.query.stage.duration.seconds` | Histogram | `channel`, `op`, `stage`, `result` | Per-stage latency breakdown |
| `scribe.query.shapes.total` | Counter | `channel`, `op`, `shape` | Query shape classification counts |
| `scribe.query.permit.pressure` | Gauge | (none) | Permit utilization (0..1) |
| `scribe.query.permit.queue` | Gauge | (none) | Threads waiting for permits (count) |
| `scribe.query.rejected.total` | Counter | `channel`, `result` | Rejected queries (resource exhaustion) |

Ingest metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `scribe.ingest.lag.seconds` | Gauge | `entry_type` | Seconds behind head |
| `scribe.ingest.queue.pressure` | Gauge | `entry_type` | Queue fill ratio (0..1) |
| `scribe.ingest.task.pressure` | Gauge | `entry_type` | Processing demand ratio (~1 steady) |
| `scribe.ingest.changes.total` | Counter | `entry_type`, `change` | Change events by type |
| `scribe.ingest.events.written.total` | Counter | `entry_type`, `event_type` | Events written to store |

Store metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `scribe.store.commit.duration.seconds` | Histogram | `phase` | Commit phase durations |
| `scribe.store.commit.wait.duration.seconds` | Histogram | `phase` | Wait time for commit phases |

Service lifecycle metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `scribe.service.restarts.total` | Counter | `service` | Service restart count |
| `scribe.service.up` | Gauge | `service` | Service health (0/1) |
| `scribe.service.transitions.total` | Counter | `service`, `from`, `to` | Service state transition count |
| `scribe.service.transition.duration.seconds` | Histogram | `service`, `from`, `to` | Time spent in each state before transition |

Registered services: Database.System, Database.Batch, Database.Channel, Channel.LDAP, Channel.IdentityHub, Scribe, TaskExecutor, HintEngine, LicenseVerification, HelidonObserveServer.

Hint persistence metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `scribe.hints.queue.size` | Gauge | (none) | Persistence queue size |
| `scribe.hints.queue.dropped.total` | Counter | (none) | Dropped hints (queue full) |
| `scribe.hints.persisted.total` | Counter | (none) | Successfully persisted hints |

These metrics provide minimal, portable runtime gauges that work reliably in both JVM and GraalVM native-image environments.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `jvm.memory.used` | Gauge | `jvm.memory.type` | Memory in use (bytes) |
| `jvm.memory.committed` | Gauge | `jvm.memory.type` | Memory committed (bytes) |
| `jvm.memory.limit` | Gauge | `jvm.memory.type` | Max memory (bytes) |
| `jvm.memory.pressure` | Gauge | (none) | Memory pressure (used/max, 0..1) |
| `jvm.thread.count` | Gauge | (none) | Active thread count |
| `jvm.cpu.count` | Gauge | (none) | Available processors |
| `process.uptime` | Gauge | (none) | Process uptime (seconds) |

Note: GC metrics (jvm.gc.*) are intentionally omitted as they are not reliably available in GraalVM native-image.
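
Assuming the same dot-to-underscore conversion applies on the Prometheus side, a simple saturation alert on these gauges could look like:

```promql
# Sustained high memory pressure (threshold is illustrative)
avg_over_time(jvm_memory_pressure[5m]) > 0.9
```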

External metrics (not in scribe.* contract)

These are emitted by libraries/frameworks and are kept as-is:

  • hikaricp_* — HikariCP connection pool metrics
  • jvm_* — JVM memory, threads (minimal, native-image safe; GC metrics intentionally omitted)
  • process_* — Process uptime (minimal, native-image safe)
  • system_* — System CPU, load
  • executor_* — Executor service metrics

Example PromQL queries:

```promql
# Request rate by channel
rate(scribe_channel_requests_total[5m])

# p99 latency by channel
histogram_quantile(0.99, rate(scribe_channel_request_duration_seconds_bucket[5m]))

# Error rate
sum(rate(scribe_channel_requests_total{result!="ok"}[5m]))
  / sum(rate(scribe_channel_requests_total[5m]))

# Query permit pressure sustained high
avg_over_time(scribe_query_permit_pressure[5m]) > 0.9

# Ingest queue pressure by entry_type
scribe_ingest_queue_pressure > 0.8

# Ingest falling behind
scribe_ingest_task_pressure > 1.2

# p95 by stage
histogram_quantile(0.95,
  sum by (stage, le) (rate(scribe_query_stage_duration_seconds_bucket[5m]))
)

# Which stage dominates?
sum by (stage) (rate(scribe_query_stage_duration_seconds_sum[5m]))
  / sum(rate(scribe_query_stage_duration_seconds_sum[5m]))

# Services that restarted recently
increase(scribe_service_restarts_total[1h]) > 0

# Services currently down
scribe_service_up == 0

# Service state transitions (e.g., identify flapping services)
rate(scribe_service_transitions_total[5m])

# How long services spent starting (to detect slow startups)
histogram_quantile(0.95, rate(scribe_service_transition_duration_seconds_bucket{to="running"}[1h]))

# Services that failed recently
increase(scribe_service_transitions_total{to="failed"}[1h]) > 0
```

When tracing is enabled, each LDAP/REST/GraphQL query produces a nested span hierarchy:

```
LDAP.Search (or REST.Search, etc.)      ← Channel entry span
└── Query.Normalize                     ← Normalization stage
    └── Query.Plan                      ← Planning stage
        └── Query.Compile               ← SQL emission stage (Prepare)
            └── Query.Execute           ← DB execution + result streaming
```

Span attributes:

| Attribute | Description | Example |
| --- | --- | --- |
| `scribe.result` | Outcome classification | `ok`, `cancelled`, `deadline_exceeded` |
| `scribe.search.kind` | Search pagination mode (trace-only) | `simple`, `paged`, `vlv` |
| `scribe.query.signature` | Query signature hash (trace-only) | `a1b2c3d4` |
| `scribe.entry_type` | Entry type(s) in scope (trace-only) | `inetOrgPerson` |

The stage label in scribe.query.stage.duration.seconds maps directly to span names:

| Stage | Span Name | Description |
| --- | --- | --- |
| `normalize` | `Query.Normalize` | Attribute mapping, filter canonicalization |
| `plan` | `Query.Plan` | Index selection, predicate classification |
| `compile` | `Query.Compile` | Logical plan → SQL |
| `execute` | `Query.Execute` | JDBC execution, result streaming |

To diagnose a slow query:
  1. Find the slow stage: Look at scribe_query_stage_duration_seconds by stage
  2. Get a sample trace: Filter by trace ID or look for traces with high duration
  3. Drill into the Execute span: If execute is slow, check for:
    • Database lock contention
    • Missing indexes (see /observe/hints)
    • Pool saturation (hikaricp_connections_pending > 0)
  4. Check the Plan span: If plan is slow, the query may be too complex

Configure OTLP traces export in application.conf:

```hocon
monitoring.telemetry.traces {
  enabled = true
  endpoint = "http://otel-collector:4317"
  protocol = "grpc" # or "http/protobuf" for port 4318
}
```

Or via environment variables:

```bash
export SCRIBE_TELEMETRY_TRACES_ENABLED=true
export SCRIBE_TELEMETRY_TRACES_ENDPOINT=http://otel-collector:4317
```

WideLogCollector provides trace-first observability by accumulating context throughout request/task execution and emitting one structured log line when the operation is “interesting.”

Operations emit at completion only when:

  • Failure: Any failure kind (normalized via Failure.wrap()) → ERROR level
  • Warnings: At least one warning recorded via WideLogCollector.warn() → WARN level
  • Exceptions: At least one exception event recorded → WARN level
  • Slow: Duration exceeds the configured threshold for the flow type
  • Marked: Explicitly marked interesting via WideLogCollector.markInteresting()

Staying silent for fast successes without warnings or exceptions is intentional: it keeps logs focused on actionable events.

IdentityScribe uses five canonical loggers, all under the com.kenoxa.scribe.* namespace:

| Logger | Config Key | Env Override | Purpose |
| --- | --- | --- | --- |
| `com.kenoxa.scribe.SuperVisor` | `log.SuperVisor` | `SCRIBE_LOG_SUPERVISOR` | Startup/shutdown, orchestration |
| `com.kenoxa.scribe.Ingest` | `log.Ingest` | `SCRIBE_LOG_INGEST` | Transcription pipeline, event store |
| `com.kenoxa.scribe.Monitoring` | `log.Monitoring` | `SCRIBE_LOG_MONITORING` | Wide-log output, observability |
| `com.kenoxa.scribe.License` | `log.License` | `SCRIBE_LOG_LICENSE` | License verification |
| `com.kenoxa.scribe.Config` | `log.Config` | `SCRIBE_LOG_CONFIG` | Configuration parsing |

All loggers inherit from log.level by default. Configure per-logger levels to control verbosity:

```bash
# Disable wide-log output entirely
SCRIBE_LOG_MONITORING=off

# Verbose transcription debugging
SCRIBE_LOG_INGEST=debug

# Quiet license checks
SCRIBE_LOG_LICENSE=error
```

Wide logs are emitted via the Monitoring logger at INFO/WARN/ERROR level:

{"trace_id":"...","span_id":"...","duration_seconds":1.5,"result":"ok","scribe.operation":"LDAP.Search",...}

Single-line JSON for machine parsing:

```json
{
  "trace_id": "abc123",
  "span_id": "def456",
  "duration_seconds": 1.5,
  "result": "ok",
  "scribe.operation": "LDAP.Search",
  "scribe.entry_type": "user",
  "scribe.search.kind": "paged",
  "events": [
    {"name": "warning", "timestamp": "...", "attributes": {"code": "unsupported_control", "message": "VLV not available"}}
  ]
}
```

Human-friendly format with header line, auto-grouped attributes, and segment timeline. When terminal color support is detected, output is enhanced with ANSI colors:

| Element | Color | Purpose |
| --- | --- | --- |
| Result `ok` | Green | Success at a glance |
| Result `ok` (with warnings) | Yellow | Success but needs attention |
| Result (error) | Red | Immediate attention required |
| Duration | Dim | Visual separation from content |
| Attribute group prefixes | Cyan | Quick section identification (`db:`, `http:`) |
| Attribute keys | Bold | Easy scanning within groups |
| Warning events | Yellow | Stand out in event timeline |
| Failure header | Red | Draws eye to error details |

Color detection uses TERM, FORCE_COLOR, NO_COLOR (no-color.org), and TTY presence. Disable colors with NO_COLOR=1 or TERM=dumb.
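
In CI pipelines, for example, you might disable colors and force machine-readable output using the documented environment variables (the binary name is a placeholder):

```bash
NO_COLOR=1 SCRIBE_LOG_FORMAT=json ./identityscribe
```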

Example success output:

```
LDAP.Search ok dur=1.5s warnings=1 events=5
  trace=abc123 span=def456
  db: system=postgresql duration.seconds=0.42 row_count=42
  http: route="/ldap/search"
  scribe: entry_type=user search.kind=paged
  scribe.query: scope="ou=users,o=org" where="(&(objectClass=*))" sort="cn"
  events:
    +  5.1ms  12.3ms Query.Plan
         warning code=unsupported_control message="VLV not available"
    + 18.0ms 401.9ms DB.Fetch row_count=42
         cache.hit
    +520.2ms   9.8ms Limiter.Acquire permits=1
```

Failure output (with full details and stacktrace):

```
scribe REST.Modify internal dur=311ms trace=trace123 span=span456
  failure:
    kind=INTERNAL code=SERVER_ERROR
    message="Database connection failed"
    details={"context": "user.modify", "attempt": 3}
    trace_id=trace123 span_id=span456
    cause_type=java.sql.SQLException
    cause_message="Connection refused"
    stacktrace:
      java.sql.SQLException: Connection refused
        at org.postgresql.Driver.connect(Driver.java:285)
        at ...
  scribe: entry_type=user
```

Features:

  • Header line: Operation, result (colored on TTY), duration, warning/event counts
  • Trace/span: Only shown on warnings or failures (de-emphasized on success)
  • Auto-grouped attributes: Keys grouped by prefix (e.g., db:, scribe:, scribe.query:)
  • Segment timeline: Child segments shown with offset and duration (+offset duration name)
  • JSON detection: String values that are JSON are auto-formatted
  • Color support: Auto-detected based on TTY, CI environment, and FORCE_COLOR/NO_COLOR env vars

Wide log fields:

| Field | Type | Description |
| --- | --- | --- |
| `trace_id` | string | OpenTelemetry trace ID (when tracing enabled) |
| `span_id` | string | OpenTelemetry span ID |
| `parent_span_id` | string | Parent span ID (optional) |
| `duration_seconds` | double | Operation duration in seconds |
| `result` | string | `ok` or failure kind (e.g., `internal`, `not_found`) |
| `failure` | object | Failure details: kind, code, message, details, stacktrace (when failed) |
| `events` | array | Event records including warnings (`name=warning`) and segment timing |
| `scribe.operation` | string | Operation name (e.g., `LDAP.Search`, `REST.GET`, `Transcription.WorkItem`) |
| Custom attributes | varies | Any attributes set via `Segment.annotate()` |
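
Because each wide log is a single JSON object, standard tooling can slice it; a sketch that pulls slow successful operations out of a captured file of JSON-format logs (file name and threshold are placeholders):

```bash
# Show operation name and duration for successful ops slower than one second
jq -c 'select(.result == "ok" and .duration_seconds > 1) | {op: ."scribe.operation", duration_seconds}' scribe.log
```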

Configuration:

```hocon
monitoring.log {
  # Enable/disable wide event logging (default: true)
  enabled = true
  enabled = ${?SCRIBE_LOG_ENABLED}

  # Log format: pretty (default) | json | auto
  # auto = pretty in dev mode + TTY, json otherwise
  # Uses app.mode (dev/development/local/test → dev mode)
  format = "pretty"
  format = ${?SCRIBE_LOG_FORMAT}

  # Random sampling for ops that don't match any rule
  # 0 = never sample, 100 = always log (default)
  sample-rate = 100
  sample-rate = ${?SCRIBE_LOG_SAMPLE_RATE}

  # Per-key redaction strategies
  # Keys support glob patterns: * = any chars, ? = single char
  #
  # Available strategies:
  #   replace  - Replace with "[REDACTED]"
  #   hash     - SHA-256, base64url, 16 chars
  #   truncate - Show ≤33% of chars (max 8 visible); ≤8 chars fall back to [REDACTED]
  #   omit     - Remove attribute entirely
  #
  # Hard-coded security patterns (always OMIT, cannot be disabled):
  #   *password*, *credential*, *token*, *secret*, *apikey*, *api_key*
  redaction {
    "*dn*" = hash        # Matches scribe.entry_dn, ldap.base_dn, etc.
    "*email*" = truncate # Matches user.email, notification.email, etc.
    "*.raw" = replace
    "*.pii.*" = hash
  }
}
```

Rules control which operations are logged. They evaluate in order — first match wins. Operations not matching any rule fall through to random sampling.

Decision flow:

  1. Failures → Always logged (ERROR level)
  2. Warnings/Exceptions → Always logged (WARN level)
  3. markInteresting() → Always logged
  4. First matching rule → include logs, exclude suppresses
  5. No match → Apply sample-rate (0-100%)

Rule syntax:

```hocon
monitoring.log.rules = [
  { action = include|exclude, name = "glob", where = "(filter)" }
]
```

| Field | Required | Description |
| --- | --- | --- |
| `action` | Yes | `include` (log) or `exclude` (suppress) |
| `name` | No | Glob pattern for operation name (`*` = any chars, `?` = single char) |
| `where` | No | LDAP-style filter on attributes |

Duration filtering:

The synthetic duration.seconds attribute is injected before rule evaluation, enabling duration-based filtering:

```hocon
monitoring.log.rules = [
  # Suppress fast successful operations
  { action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" }

  # Always log slow operations
  { action = include, where = "(duration.seconds>=5s)" }

  # Suppress internal maintenance under threshold
  { action = exclude, name = "Hints.*", where = "(duration.seconds<=100ms)" }
  { action = exclude, name = "Metrics.*", where = "(duration.seconds<=500ms)" }
]
```

Duration values support multiple formats: plain seconds (0.05), HOCON style (50ms, 5s, 1m), or ISO 8601 (PT5S).

Common patterns:

```hocon
# Log all LDAP operations
{ action = include, name = "LDAP.*" }

# Suppress fast successful ops
{ action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" }

# Log slow DB queries
{ action = include, where = "(db.duration.seconds>=1s)" }

# Exclude everything else (catch-all)
{ action = exclude }
```

See monitoring.log.rules for the complete reference.

Child segments can be automatically captured as events in wide logs, providing operation breakdown without trace analysis.

Modes:

| Mode | What’s Captured | Overhead |
| --- | --- | --- |
| `auto` | Full in dev mode, minimal in prod (default) | Varies |
| `off` | Nothing | None |
| `minimal` | Name, offset, duration | Low |
| `full` | Name, offset, duration, segment attributes | Moderate |

Configuration:

```hocon
monitoring.log.childs {
  # Mode: auto (default) | off | minimal | full
  # auto = full in dev mode, minimal in prod
  mode = "auto"
  mode = ${?SCRIBE_LOG_CHILDS_MODE}

  # Display: auto (default) | off | summary | details
  # auto = summary in dev, off in prod
  display = "auto"

  # Rules: which segments to track (first match wins, empty = include all)
  rules = [
    { action = include, where = "(duration.seconds>=1ms)" }
    { action = exclude } # Catch-all: filter sub-millisecond noise
  ]
}
```

Example configurations:

```hocon
# Track all segments (no filtering)
childs {
  mode = "full"
  rules = [] # Empty = include all
}

# Track only slow segments
childs {
  mode = "minimal"
  rules = [
    { action = include, name = "Query.*", where = "(duration.seconds>=10ms)" }
    { action = include, name = "DB.*", where = "(duration.seconds>=5ms)" }
    { action = exclude }
  ]
}
```

Event attributes (added to each tracked segment event):

| Attribute | Type | Description |
| --- | --- | --- |
| `offset.seconds` | double | Time from operation start to segment start |
| `duration.seconds` | double | Segment duration |
| Segment attributes | varies | In `full` mode, includes attributes set via `segment.annotate()` |

Ordering: Events appear in segment start-time order (not end-time), making the wide log easy to read chronologically.

Redaction: Segment-local attributes are subject to the same redaction rules as operation-level attributes.

Use case: Identify slow sub-operations without diving into traces:

```json
{
  "duration_seconds": 0.125,
  "result": "ok",
  "scribe.operation": "LDAP.Search",
  "events": [
    {
      "name": "Query.Plan",
      "timestamp": "2026-01-07T10:30:00.001Z",
      "attributes": { "offset.seconds": 0.001, "duration.seconds": 0.002 }
    },
    {
      "name": "Query.Execute",
      "timestamp": "2026-01-07T10:30:00.003Z",
      "attributes": { "offset.seconds": 0.003, "duration.seconds": 0.095 }
    },
    {
      "name": "Query.Map",
      "timestamp": "2026-01-07T10:30:00.098Z",
      "attributes": { "offset.seconds": 0.098, "duration.seconds": 0.025 }
    }
  ]
}
```

Example Kubernetes probe configuration using the plain-text health endpoints:

```yaml
startupProbe:
  httpGet:
    path: /startedz
    port: 8081
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 90 # 15min max for services + DB init
livenessProbe:
  httpGet:
    path: /livez
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 1
  successThreshold: 1
```