Observability
Collect metrics, traces, and logs from IdentityScribe. This guide covers endpoints, signals, playbooks, and PromQL examples.
For exhaustive inventories, see the generated telemetry reference.
Related:
- Monitoring — Dashboards, workflows, and troubleshooting
- Upgrading IdentityScribe — Migration notes for endpoints and metrics
- Failures — Error codes, retry semantics, and support workflow
Note: This document describes the OTel-first contract. Legacy /status/* endpoints have been removed in favor of /observe/* (breaking change, no aliases).
Endpoint taxonomy
Standard paths

| Path | Description |
|---|---|
| /metrics | Prometheus scrape endpoint (OpenTelemetry-backed) |
| /observe/health | Combined MicroProfile health (JSON) |
| /observe/health/live | Liveness probe (JSON) |
| /observe/health/ready | Readiness probe (JSON) |
| /observe/health/started | Startup probe (JSON) |
| /observe/health/check/{name} | Individual health check by name (JSON) |
| /livez | Kubernetes liveness probe (plain text) |
| /readyz | Kubernetes readiness probe (plain text) |
| /startedz | Kubernetes startup probe (plain text) |
| /healthz | Kubernetes combined health (plain text) |
Note: The root path / no longer serves metrics — use /metrics explicitly.
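A quick smoke test from the command line; a sketch assuming the default socket on localhost:8080 (the Kubernetes manifests later in this guide expose the probes on port 8081), with the check name in the last call purely hypothetical:

```sh
# Plain-text combined health (Kubernetes style)
curl -s http://localhost:8080/healthz

# JSON health with individual checks
curl -s http://localhost:8080/observe/health | jq

# A single named check — "database" is a hypothetical check name
curl -s http://localhost:8080/observe/health/check/database | jq
```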
Observe endpoints (/observe/*)
These endpoints expose on-demand insights (JSON, cached) for data that would explode metrics cardinality.

| Path | Description |
|---|---|
| /observe | OpenAPI documentation (HTML UI + JSON/YAML spec) |
| /observe/status | Basic status |
| /observe/channels | Channel and socket discovery with runtime binding info |
| /observe/config | Resolved configuration (passwords redacted) |
| /observe/doctor | Health report with threshold checks and recommendations |
| /observe/services | Service lifecycle status (state, uptime, failures) |
| /observe/pressure | Saturation metrics (queue/task/memory pressure) |
| /observe/signals | Golden signals summary (latency, traffic, errors, saturation) |
| /observe/indexes | Index build status and concurrent build detection |
| /observe/hints | Persisted hints |
| /observe/signatures | Query signatures |
| /observe/stats/values | Value size statistics per entry type and attribute |
| /observe/stats/entries | Entry blob size percentiles per entry type |
| /observe/stats/events | Event rate windows (dashboard-friendly buckets) |
| /observe/stats/ingest | Ingest lag and checkpoint positions |
| /observe/mcp | MCP server for AI assistants (see MCP Channel) |
Tip: Use /observe for interactive docs. If you need machine-readable status, use /observe/status.
Target: /observe/* replaces the legacy /status/* paths (breaking change, no aliases).
Stats endpoints performance note
The /observe/stats/* endpoints execute direct database queries and can be expensive on large datasets:
- Use case: Investigation, debugging, collecting real usage statistics—not high-frequency polling.
- Caching: Two-tier caching protects the database:
  - In-process cache (30s TTL): the /values, /entries, and /ingest endpoints cache responses server-side.
  - HTTP Cache-Control: all responses include Cache-Control: max-age=30, private for client-side caching. /events is parameterized by since, so it uses client-side caching only.
- Row limits: Queries return at most 100 rows by default to bound response size.
- since parameter (/observe/stats/events):
  - ISO-8601 timestamps: 2026-01-01T00:00:00Z, 2026-01-01T00:00:00+01:00
  - Duration strings: 1h, 24h, 7d, 30m, PT1H
  - Invalid input returns 400: future timestamps, negative/zero durations, or unparseable values.
- Precision: Numeric byte fields (avgBytes, p50Bytes, etc.) use double to preserve fractional values.
These endpoints are intended for operational investigation and dashboard population, not continuous scraping.
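For ad-hoc investigation, the stats endpoints compose well with curl and jq; a sketch assuming localhost:8080:

```sh
# Event-rate windows for the last 24 hours
curl -s "http://localhost:8080/observe/stats/events?since=24h" | jq

# Entry blob size percentiles per entry type
curl -s "http://localhost:8080/observe/stats/entries" | jq

# Invalid 'since' values (here: a future timestamp) return HTTP 400
curl -s -o /dev/null -w '%{http_code}\n' \
  "http://localhost:8080/observe/stats/events?since=2099-01-01T00:00:00Z"
```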
Channels endpoint (/observe/channels)
Returns enabled channels, sockets, and runtime binding information. Useful for service discovery, UI connectivity information, and debugging network configuration.
Example response:
{ "channels": { "ldap": { "enabled": true, "running": true, "bindings": [ { "host": "0.0.0.0", "configuredPort": 0, "actualPort": 10389, "ssl": false, "url": "ldap://0.0.0.0:10389" }, { "host": "0.0.0.0", "configuredPort": 10636, "actualPort": 10636, "ssl": true, "url": "ldaps://0.0.0.0:10636" } ] }, "identityHub": { "enabled": false, "running": false, "bindings": [] }, "rest": { "enabled": true, "sockets": ["@default"], "basePath": "/api" } }, "monitoring": { "prometheus": { "enabled": true, "sockets": ["@default"], "path": "/metrics" }, "observe": { "enabled": true, "sockets": ["@default"], "path": "/observe" }, "health": { "enabled": true, "sockets": ["@default"], "paths": ["/livez", "/readyz", "/startedz", "/healthz"] } }, "telemetry": { "traces": { "enabled": true }, "metrics": { "prometheus": true, "otlp": false }, "hints": { "enabled": true, "explain": false, "persistence": true } }, "sockets": { "@default": { "host": "0.0.0.0", "configuredPort": 0, "actualPort": 8080, "ssl": false, "url": "http://api.example.com:8080" }, "internal": { "host": "127.0.0.1", "configuredPort": 9090, "actualPort": 9090, "ssl": false, "url": "http://127.0.0.1:9090" } }, "request": { "socket": "@default", "host": "api.example.com", "port": 8080, "scheme": "http" }, "timestamp": "2026-01-13T12:00:00Z"}Key features:
- channels: Enabled channels with binding info. LDAP shows running status and actual ports (important for ephemeral port 0). REST shows socket references.
- sockets: HTTP sockets with both configuredPort and actualPort (useful when port 0 is configured for ephemeral binding).
- request: Request context showing detected host/port/scheme (respects X-Forwarded-* headers from proxies).
- url: Auto-generated connection URL. For the current socket, uses the detected host/scheme from request headers; for other sockets, uses configured values.
Use cases:
- UI service discovery (fetch socket URLs dynamically)
- Debugging ephemeral port bindings in test environments
- Verifying proxy header forwarding (request.host, request.scheme)
- Operational dashboards showing enabled features
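A discovery sketch using jq against the example response above (localhost:8080 is an assumption):

```sh
# Socket URLs, LDAP binding URLs, and the detected request context
curl -s http://localhost:8080/observe/channels | jq '.sockets | map_values(.url)'
curl -s http://localhost:8080/observe/channels | jq '.channels.ldap.bindings[].url'
curl -s http://localhost:8080/observe/channels | jq '.request'
```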
Config endpoint (/observe/config)
Returns the resolved configuration with passwords redacted. Equivalent to the --printconfig CLI flag but accessible at runtime via HTTP.
Content negotiation:
| Accept Header | Response Format |
|---|---|
| application/json (default) | JSON with config and timestamp fields |
| text/plain | Raw HOCON config text |
Example JSON response:
{ "config": "Configuration Sources:\n\n- system properties\n- reference.conf\n\napp {\n mode = production\n}\n\ndatabase {\n \"password\" : \"<REDACTED>\"\n host = \"localhost\"\n port = 5432\n}\n...", "timestamp": "2026-01-13T12:00:00Z"}Example plain text request:
```sh
curl -H "Accept: text/plain" http://localhost:8080/observe/config
```

Key features:
- Password redaction: All password fields are replaced with <REDACTED>
- Configuration sources: Shows merge order (system properties, env vars, config files)
- Lazy caching: Config string is computed once on first request
- Cache-Control: Responses include Cache-Control: max-age=3600, private (1-hour TTL)
- Excludes JVM internals: Filters out java.*, jdk.*, sun.*, org.graalvm.* paths
Use cases:
- Debugging configuration issues in production without shell access
- Verifying environment variable overrides are applied correctly
- Support diagnostics (config can be shared without exposing secrets)
- CI/CD verification that config is resolved as expected
Services endpoint (/observe/services)
Returns detailed per-service status with uptime, startup duration, tags, and failure causes.
Example response:
{ "services": [ { "id": "Scribe.user", "name": "Scribe", "state": "running", "healthy": true, "tags": {"entryType": "user"}, "uptime_seconds": 3600, "startup_seconds": 2.5 }, { "id": "Database.System", "name": "Database.System", "state": "failed", "healthy": false, "failure": "Connection refused" } ], "summary": { "total": 8, "healthy": 7, "unhealthy": 1, "restarts_5m": 3 }, "timestamp": "2026-01-08T12:00:00Z"}Use case: Detailed service diagnostics, startup timing analysis, failure investigation.
Doctor endpoint (/observe/doctor)
Returns an intelligent health report with threshold-based checks, per-service status, and actionable recommendations.
Key features:
- services.down check shows which services are down by ID (not just count)
- services array provides per-service state, uptime, and health
- recommendations array with prioritized, actionable hints
Example services.down hint: "Down: Scribe.user, Database.Batch"
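On the command line the report can be reduced to its actionable parts; a sketch assuming localhost:8080 and the field names described above (the exact response shape may differ):

```sh
# Recommendations and per-service status from the doctor report
curl -s http://localhost:8080/observe/doctor | jq '{recommendations, services}'
```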
Golden signals
IdentityScribe exposes the Four Golden Signals for quick system health assessment, covering both query and ingest sides.
Query signals (channel side)
| Signal | Metric | What it measures |
|---|---|---|
| Latency | scribe_signals_latency_p95 | Response time p95 (seconds) |
| Traffic | scribe_signals_requests_per_second | Request throughput |
| Errors | scribe_signals_error_rate_percent | Failure percentage |
| Saturation | scribe_signals_traffic_ratio, scribe_db_pool_pressure | Resource utilization |
Per-channel breakdown with channel label:
- scribe_signals_channel_latency_p95{channel="ldap"}
- scribe_signals_channel_requests_per_second{channel="rest"}
- scribe_signals_channel_error_rate_percent{channel="graphql"}
Ingest signals (sync side)
| Signal | Metric | What it measures |
|---|---|---|
| Latency | scribe_signals_ingest_task_duration_p95 | Task processing time p95 |
| Latency | scribe_signals_ingest_lag_max_seconds | Worst replication lag |
| Traffic | scribe_signals_ingest_changes_per_second | Change detection rate |
| Errors | scribe_signals_ingest_failed_rate_percent | Task failure percentage |
Per-entry-type breakdown with entry_type label:
- scribe_signals_ingest_entry_lag_seconds{entry_type="user"}
- scribe_signals_ingest_entry_task_duration_p95{entry_type="group"}
- scribe_signals_ingest_entry_changes_per_second{entry_type="role"}
Built-in dashboard
The /observe/signals endpoint returns a JSON summary for the built-in dashboard:
```sh
curl -s http://localhost:8080/observe/signals | jq
```

Grafana integration
These metrics integrate with existing dashboards via PromQL:
```promql
# Query signals alerting
scribe_signals_latency_p95 > 2.0
scribe_signals_error_rate_percent > 5.0
scribe_signals_traffic_ratio > 5.0

# Ingest signals alerting
scribe_signals_ingest_lag_max_seconds > 300
scribe_signals_ingest_failed_rate_percent > 1.0
scribe_signals_ingest_task_duration_p95 > 5.0

# Per-entry-type lag comparison
scribe_signals_ingest_entry_lag_seconds
```

A dedicated Golden Signals dashboard is available at monitoring/grafana/dashboards/signals.json.
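If dashboards are provisioned through the Grafana HTTP API rather than file provisioning, the bundled JSON can be pushed directly; a sketch assuming a Grafana instance at grafana:3000 and a service-account token in GRAFANA_TOKEN:

```sh
# Import the bundled Golden Signals dashboard via Grafana's /api/dashboards/db endpoint
curl -s -X POST "http://grafana:3000/api/dashboards/db" \
  -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat monitoring/grafana/dashboards/signals.json), \"overwrite\": true}"
```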
Core metrics inventory
All metrics use the scribe. prefix (canonical dot notation), which is auto-converted to scribe_ for Prometheus.
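To see the converted names on the wire, scrape the endpoint and filter by prefix; a sketch assuming localhost:8080:

```sh
# scribe.channel.requests.total appears as scribe_channel_requests_total
curl -s http://localhost:8080/metrics | grep '^scribe_channel_requests_total' | head

# Count of scribe_* sample lines currently exposed
curl -s http://localhost:8080/metrics | grep -c '^scribe_'
```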
Channel (front door SLO)
| Metric | Type | Labels | Description |
|---|---|---|---|
| scribe.channel.requests.total | Counter | channel, op, result | Total requests by channel/operation |
| scribe.channel.request.duration.seconds | Histogram | channel, op, result | Request latency distribution |
| scribe.channel.inflight | Gauge | channel, op | Currently processing requests |
Query pipeline
| Metric | Type | Labels | Description |
|---|---|---|---|
| scribe.query.stage.duration.seconds | Histogram | channel, op, stage, result | Per-stage latency breakdown |
| scribe.query.shapes.total | Counter | channel, op, shape | Query shape classification counts |
| scribe.query.permit.pressure | Gauge | — | Permit utilization (0..1) |
| scribe.query.permit.queue | Gauge | — | Threads waiting for permits (count) |
| scribe.query.rejected.total | Counter | channel, result | Rejected queries (resource exhaustion) |
Ingest
| Metric | Type | Labels | Description |
|---|---|---|---|
| scribe.ingest.lag.seconds | Gauge | entry_type | Seconds behind head |
| scribe.ingest.queue.pressure | Gauge | entry_type | Queue fill ratio (0..1) |
| scribe.ingest.task.pressure | Gauge | entry_type | Processing demand ratio (~1 steady) |
| scribe.ingest.changes.total | Counter | entry_type, change | Change events by type |
| scribe.ingest.events.written.total | Counter | entry_type, event_type | Events written to store |
Store

| Metric | Type | Labels | Description |
|---|---|---|---|
| scribe.store.commit.duration.seconds | Histogram | phase | Commit phase durations |
| scribe.store.commit.wait.duration.seconds | Histogram | phase | Wait time for commit phases |
Services
| Metric | Type | Labels | Description |
|---|---|---|---|
| scribe.service.restarts.total | Counter | service | Service restart count |
| scribe.service.up | Gauge | service | Service health (0/1) |
| scribe.service.transitions.total | Counter | service, from, to | Service state transition count |
| scribe.service.transition.duration.seconds | Histogram | service, from, to | Time spent in each state before transition |
Registered services: Database.System, Database.Batch, Database.Channel, Channel.LDAP, Channel.IdentityHub, Scribe, TaskExecutor, HintEngine, LicenseVerification, HelidonObserveServer.
Hints

| Metric | Type | Labels | Description |
|---|---|---|---|
| scribe.hints.queue.size | Gauge | — | Persistence queue size |
| scribe.hints.queue.dropped.total | Counter | — | Dropped hints (queue full) |
| scribe.hints.persisted.total | Counter | — | Successfully persisted hints |
JVM / process
These metrics provide minimal, portable runtime gauges that work reliably in both JVM and GraalVM native-image environments.
| Metric | Type | Labels | Description |
|---|---|---|---|
| jvm.memory.used | Gauge | jvm.memory.type | Memory in use (bytes) |
| jvm.memory.committed | Gauge | jvm.memory.type | Memory committed (bytes) |
| jvm.memory.limit | Gauge | jvm.memory.type | Max memory (bytes) |
| jvm.memory.pressure | Gauge | — | Memory pressure (used/max, 0..1) |
| jvm.thread.count | Gauge | — | Active thread count |
| jvm.cpu.count | Gauge | — | Available processors |
| process.uptime | Gauge | — | Process uptime (seconds) |
Note: GC metrics (jvm.gc.*) are intentionally omitted as they are not reliably available in GraalVM native-image.
External metrics (not in scribe.* contract)
These are emitted by libraries/frameworks and are kept as-is:
- hikaricp_* — HikariCP connection pool metrics
- jvm_* — JVM memory, threads (minimal, native-image safe; GC metrics intentionally omitted)
- process_* — Process uptime (minimal, native-image safe)
- system_* — System CPU, load
- executor_* — Executor service metrics
PromQL examples
Request rate and latency

```promql
# Request rate by channel
rate(scribe_channel_requests_total[5m])

# p99 latency by channel
histogram_quantile(0.99, rate(scribe_channel_request_duration_seconds_bucket[5m]))

# Error rate
sum(rate(scribe_channel_requests_total{result!="ok"}[5m])) / sum(rate(scribe_channel_requests_total[5m]))
```

Pressure alerts

```promql
# Query permit pressure sustained high
avg_over_time(scribe_query_permit_pressure[5m]) > 0.9

# Ingest queue pressure by entry_type
scribe_ingest_queue_pressure > 0.8

# Ingest falling behind
scribe_ingest_task_pressure > 1.2
```

Stage breakdown

```promql
# p95 by stage
histogram_quantile(0.95, sum by (stage, le) (rate(scribe_query_stage_duration_seconds_bucket[5m])))

# Which stage dominates?
sum by (stage) (rate(scribe_query_stage_duration_seconds_sum[5m])) / sum(rate(scribe_query_stage_duration_seconds_sum[5m]))
```

Service health

```promql
# Services that restarted recently
increase(scribe_service_restarts_total[1h]) > 0

# Services currently down
scribe_service_up == 0

# Service state transitions (e.g., identify flapping services)
rate(scribe_service_transitions_total[5m])

# How long services spent starting (to detect slow startups)
histogram_quantile(0.95, rate(scribe_service_transition_duration_seconds_bucket{to="running"}[1h]))

# Services that failed recently
increase(scribe_service_transitions_total{to="failed"}[1h]) > 0
```

Trace layout
Query pipeline spans
When tracing is enabled, each LDAP/REST/GraphQL query produces a nested span hierarchy:
```
LDAP.Search (or REST.Search, etc.)   ← Channel entry span
└── Query.Normalize                  ← Normalization stage
└── Query.Plan                       ← Planning stage
└── Query.Compile                    ← SQL emission stage (Prepare)
└── Query.Execute                    ← DB execution + result streaming
```

Span attributes:
| Attribute | Description | Example |
|---|---|---|
| scribe.result | Outcome classification | ok, cancelled, deadline_exceeded |
| scribe.search.kind | Search pagination mode (trace-only) | simple, paged, vlv |
| scribe.query.signature | Query signature hash (trace-only) | a1b2c3d4 |
| scribe.entry_type | Entry type(s) in scope (trace-only) | inetOrgPerson |
Stage names
The stage label in scribe.query.stage.duration.seconds maps directly to span names:
| Stage | Span Name | Description |
|---|---|---|
| normalize | Query.Normalize | Attribute mapping, filter canonicalization |
| plan | Query.Plan | Index selection, predicate classification |
| compile | Query.Compile | Logical plan → SQL |
| execute | Query.Execute | JDBC execution, result streaming |
Using traces for latency debugging
- Find the slow stage: Look at scribe_query_stage_duration_seconds by stage
- Get a sample trace: Filter by trace ID or look for traces with high duration
- Drill into the Execute span: If execute is slow, check for:
  - Database lock contention
  - Missing indexes (see /observe/hints)
  - Pool saturation (hikaricp_connections_pending > 0)
- Check the Plan span: If plan is slow, the query may be too complex
Enabling tracing
Configure OTLP traces export in application.conf:
```hocon
monitoring.telemetry.traces {
  enabled  = true
  endpoint = "http://otel-collector:4317"
  protocol = "grpc"   # or "http/protobuf" for port 4318
}
```

Or via environment variables:
```sh
export SCRIBE_TELEMETRY_TRACES_ENABLED=true
export SCRIBE_TELEMETRY_TRACES_ENDPOINT=http://otel-collector:4317
```
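After restarting with these settings, the effective switches can be read back from /observe/channels (see the telemetry block in the example response earlier); localhost:8080 is an assumption:

```sh
# Verify trace export is enabled at runtime
curl -s http://localhost:8080/observe/channels | jq '.telemetry.traces'
```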
Section titled “Wide event logging”WideLogCollector provides trace-first observability by accumulating context throughout request/task execution and emitting one structured log line when the operation is “interesting.”
What gets logged
Operations emit at completion only when:
- Failure: Any failure kind (normalized via Failure.wrap()) → ERROR level
- Warnings: At least one warning recorded via WideLogCollector.warn() → WARN level
- Exceptions: At least one exception event recorded → WARN level
- Slow: Duration exceeds the configured threshold for the flow type
- Marked: Explicitly marked interesting via WideLogCollector.markInteresting()
Staying silent for fast successes without warnings or exceptions is intentional: it keeps logs focused on actionable events.
Logger categories
IdentityScribe uses five canonical loggers, all under the com.kenoxa.scribe.* namespace:
| Logger | Config Key | Env Override | Purpose |
|---|---|---|---|
| com.kenoxa.scribe.SuperVisor | log.SuperVisor | SCRIBE_LOG_SUPERVISOR | Startup/shutdown, orchestration |
| com.kenoxa.scribe.Ingest | log.Ingest | SCRIBE_LOG_INGEST | Transcription pipeline, event store |
| com.kenoxa.scribe.Monitoring | log.Monitoring | SCRIBE_LOG_MONITORING | Wide-log output, observability |
| com.kenoxa.scribe.License | log.License | SCRIBE_LOG_LICENSE | License verification |
| com.kenoxa.scribe.Config | log.Config | SCRIBE_LOG_CONFIG | Configuration parsing |
All loggers inherit from log.level by default. Configure per-logger levels to control verbosity:
```sh
# Disable wide-log output entirely
SCRIBE_LOG_MONITORING=off

# Verbose transcription debugging
SCRIBE_LOG_INGEST=debug

# Quiet license checks
SCRIBE_LOG_LICENSE=error
```

Log format
Wide logs are emitted via the Monitoring logger at INFO/WARN/ERROR level:
{"trace_id":"...","span_id":"...","duration_seconds":1.5,"result":"ok","scribe.operation":"LDAP.Search",...}JSON format
Single-line JSON for machine parsing:
{ "trace_id": "abc123", "span_id": "def456", "duration_seconds": 1.5, "result": "ok", "scribe.operation": "LDAP.Search", "scribe.entry_type": "user", "scribe.search.kind": "paged", "events": [ {"name": "warning", "timestamp": "...", "attributes": {"code": "unsupported_control", "message": "VLV not available"}} ]}Pretty format (default)
Pretty format (default)

Human-friendly format with header line, auto-grouped attributes, and segment timeline. When terminal color support is detected, output is enhanced with ANSI colors:
| Element | Color | Purpose |
|---|---|---|
| Result ok | Green | Success at a glance |
| Result ok (with warnings) | Yellow | Success but needs attention |
| Result (error) | Red | Immediate attention required |
| Duration | Dim | Visual separation from content |
| Attribute group prefixes | Cyan | Quick section identification (db:, http:) |
| Attribute keys | Bold | Easy scanning within groups |
| Warning events | Yellow | Stand out in event timeline |
| Failure header | Red | Draws eye to error details |
Color detection uses TERM, FORCE_COLOR, NO_COLOR (no-color.org), and TTY presence. Disable colors with NO_COLOR=1 or TERM=dumb.
```
LDAP.Search ok dur=1.5s warnings=1 events=5 trace=abc123 span=def456
  db:           system=postgresql duration.seconds=0.42 row_count=42
  http:         route="/ldap/search"
  scribe:       entry_type=user search.kind=paged
  scribe.query: scope="ou=users,o=org" where="(&(objectClass=*))" sort="cn"
  events:
    +  5.1ms  12.3ms Query.Plan warning code=unsupported_control message="VLV not available"
    + 18.0ms 401.9ms DB.Fetch row_count=42 cache.hit
    +520.2ms   9.8ms Limiter.Acquire permits=1
```

Failure output (with full details and stacktrace):
```
scribe REST.Modify internal dur=311ms trace=trace123 span=span456
  failure: kind=INTERNAL code=SERVER_ERROR message="Database connection failed"
           details={"context": "user.modify", "attempt": 3}
           trace_id=trace123 span_id=span456
           cause_type=java.sql.SQLException cause_message="Connection refused"
  stacktrace:
    java.sql.SQLException: Connection refused
      at org.postgresql.Driver.connect(Driver.java:285)
      at ...
  scribe: entry_type=user
```

Features:
- Header line: Operation, result (colored on TTY), duration, warning/event counts
- Trace/span: Only shown on warnings or failures (de-emphasized on success)
- Auto-grouped attributes: Keys grouped by prefix (e.g., db:, scribe:, scribe.query:)
- Segment timeline: Child segments shown with offset and duration (+offset duration name)
- JSON detection: String values that are JSON are auto-formatted
- Color support: Auto-detected based on TTY, CI environment, and FORCE_COLOR/NO_COLOR env vars
Wide event fields
| Field | Type | Description |
|---|---|---|
| trace_id | string | OpenTelemetry trace ID (when tracing enabled) |
| span_id | string | OpenTelemetry span ID |
| parent_span_id | string | Parent span ID (optional) |
| duration_seconds | double | Operation duration in seconds |
| result | string | ok or failure kind (e.g., internal, not_found) |
| failure | object | Failure details: kind, code, message, details, stacktrace (when failed) |
| events | array | Event records including warnings (name=warning) and segment timing |
| scribe.operation | string | Operation name (e.g., LDAP.Search, REST.GET, Transcription.WorkItem) |
| Custom attributes | varies | Any attributes set via Segment.annotate() |
Configuration
```hocon
monitoring.log {
  # Enable/disable wide event logging (default: true)
  enabled = true
  enabled = ${?SCRIBE_LOG_ENABLED}

  # Log format: pretty (default) | json | auto
  # auto = pretty in dev mode + TTY, json otherwise
  # Uses app.mode (dev/development/local/test → dev mode)
  format = "pretty"
  format = ${?SCRIBE_LOG_FORMAT}

  # Random sampling for ops that don't match any rule
  # 0 = never sample, 100 = always log (default)
  sample-rate = 100
  sample-rate = ${?SCRIBE_LOG_SAMPLE_RATE}

  # Per-key redaction strategies
  # Keys support glob patterns: * = any chars, ? = single char
  #
  # Available strategies:
  #   replace  - Replace with "[REDACTED]"
  #   hash     - SHA-256, base64url, 16 chars
  #   truncate - Show ≤33% of chars (max 8 visible); ≤8 chars fall back to [REDACTED]
  #   omit     - Remove attribute entirely
  #
  # Hard-coded security patterns (always OMIT, cannot be disabled):
  #   *password*, *credential*, *token*, *secret*, *apikey*, *api_key*
  redaction {
    "*dn*"    = hash      # Matches scribe.entry_dn, ldap.base_dn, etc.
    "*email*" = truncate  # Matches user.email, notification.email, etc.
    "*.raw"   = replace
    "*.pii.*" = hash
  }
}
```

Rule-based filtering
Rules control which operations are logged. They evaluate in order — first match wins. Operations not matching any rule fall through to random sampling.
Decision flow:
- Failures → Always logged (ERROR level)
- Warnings/Exceptions → Always logged (WARN level)
- markInteresting() → Always logged
- First matching rule → include logs, exclude suppresses
- No match → Apply sample-rate (0-100%)
Rule syntax:
```hocon
monitoring.log.rules = [
  { action = include|exclude, name = "glob", where = "(filter)" }
]
```

| Field | Required | Description |
|---|---|---|
| action | Yes | include (log) or exclude (suppress) |
| name | No | Glob pattern for operation name (* = any chars, ? = single char) |
| where | No | LDAP-style filter on attributes |
Duration filtering:
The synthetic duration.seconds attribute is injected before rule evaluation, enabling duration-based filtering:
```hocon
monitoring.log.rules = [
  # Suppress fast successful operations
  { action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" }

  # Always log slow operations
  { action = include, where = "(duration.seconds>=5s)" }

  # Suppress internal maintenance under threshold
  { action = exclude, name = "Hints.*", where = "(duration.seconds<=100ms)" }
  { action = exclude, name = "Metrics.*", where = "(duration.seconds<=500ms)" }
]
```

Duration values support multiple formats: plain seconds (0.05), HOCON style (50ms, 5s, 1m), or ISO 8601 (PT5S).
Common patterns:
```hocon
# Log all LDAP operations
{ action = include, name = "LDAP.*" }

# Suppress fast successful ops
{ action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" }

# Log slow DB queries
{ action = include, where = "(db.duration.seconds>=1s)" }

# Exclude everything else (catch-all)
{ action = exclude }
```

See monitoring.log.rules for the complete reference.
Segment tracking
Child segments can be automatically captured as events in wide logs, providing operation breakdown without trace analysis.
Modes:
| Mode | What’s Captured | Overhead |
|---|---|---|
| auto | Full in dev mode, minimal in prod (default) | Varies |
| off | Nothing | None |
| minimal | Name, offset, duration | Low |
| full | Name, offset, duration, segment attributes | Moderate |
Configuration:
```hocon
monitoring.log.childs {
  # Mode: auto (default) | off | minimal | full
  # auto = full in dev mode, minimal in prod
  mode = "auto"
  mode = ${?SCRIBE_LOG_CHILDS_MODE}

  # Display: auto (default) | off | summary | details
  # auto = summary in dev, off in prod
  display = "auto"

  # Rules: which segments to track (first match wins, empty = include all)
  rules = [
    { action = include, where = "(duration.seconds>=1ms)" }
    { action = exclude }  # Catch-all: filter sub-millisecond noise
  ]
}
```

Example configurations:
```hocon
# Track all segments (no filtering)
childs {
  mode  = "full"
  rules = []   # Empty = include all
}

# Track only slow segments
childs {
  mode = "minimal"
  rules = [
    { action = include, name = "Query.*", where = "(duration.seconds>=10ms)" }
    { action = include, name = "DB.*", where = "(duration.seconds>=5ms)" }
    { action = exclude }
  ]
}
```

Event attributes (added to each tracked segment event):
| Attribute | Type | Description |
|---|---|---|
| offset.seconds | double | Time from operation start to segment start |
| duration.seconds | double | Segment duration |
| Segment attributes | varies | In full mode, includes attributes set via segment.annotate() |
Ordering: Events appear in segment start-time order (not end-time), making the wide log easy to read chronologically.
Redaction: Segment-local attributes are subject to the same redaction rules as operation-level attributes.
Use case: Identify slow sub-operations without diving into traces:
{ "duration_seconds": 0.125, "result": "ok", "scribe.operation": "LDAP.Search", "events": [ { "name": "Query.Plan", "timestamp": "2026-01-07T10:30:00.001Z", "attributes": { "offset.seconds": 0.001, "duration.seconds": 0.002 } }, { "name": "Query.Execute", "timestamp": "2026-01-07T10:30:00.003Z", "attributes": { "offset.seconds": 0.003, "duration.seconds": 0.095 } }, { "name": "Query.Map", "timestamp": "2026-01-07T10:30:00.098Z", "attributes": { "offset.seconds": 0.098, "duration.seconds": 0.025 } } ]}Kubernetes probe configuration
```yaml
startupProbe:
  httpGet:
    path: /startedz
    port: 8081
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 90   # 15min max for services + DB init

livenessProbe:
  httpGet:
    path: /livez
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /readyz
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 1
  successThreshold: 1
```
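To verify the wiring from a workstation, a port-forward plus curl is usually enough; the deployment name, namespace, and port below are assumptions:

```sh
# Forward the probe port and hit each endpoint
kubectl -n identity port-forward deploy/identityscribe 8081:8081 &
sleep 2
for path in /startedz /livez /readyz /healthz; do
  printf '%-10s ' "$path"
  curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:8081${path}"
done
kill %1
```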