# Logging and Traces
You saw something in the signals. Now you need to find out what happened.
IdentityScribe uses two investigation tools: wide event logs that capture one structured log line per operation, and distributed traces via OpenTelemetry. Both work together — wide logs include trace IDs, and traces link back to the operation that produced them.
Wide logs are your first stop. They show failures, warnings, and slow operations without any extra infrastructure. Traces go deeper — they break each operation into stages so you can see exactly where time is spent. Traces require a backend (Tempo, Jaeger) but give you the full picture when logs aren’t enough.
## Wide event logging

IdentityScribe accumulates context throughout each request, then emits a single structured log line when the operation finishes — but only when it’s worth looking at.
An operation logs when it:
- Failed (any failure kind) → ERROR level
- Had warnings or exceptions → WARN level
- Ran slow (exceeded the configured threshold)
- Was explicitly marked interesting
Fast, successful operations produce no log output. That’s by design — it keeps logs focused on things that need your attention.
## Logger categories

Five loggers, all under com.kenoxa.scribe.*:
| Logger | Env Override | Purpose |
|---|---|---|
| SuperVisor | SCRIBE_LOG_SUPERVISOR | Startup, shutdown, orchestration |
| Ingest | SCRIBE_LOG_INGEST | Transcription pipeline, event store |
| Monitoring | SCRIBE_LOG_MONITORING | Wide-log output |
| License | SCRIBE_LOG_LICENSE | License verification |
| Config | SCRIBE_LOG_CONFIG | Configuration parsing |
All inherit from log.level by default. Override individually:
```shell
SCRIBE_LOG_MONITORING=off    # Disable wide logs entirely
SCRIBE_LOG_INGEST=debug      # Verbose transcription debugging
SCRIBE_LOG_LICENSE=error     # Quiet license checks
```

## Log format

Pretty format (default in dev mode with a TTY):
```text
LDAP.Search ok dur=1.5s warnings=1 events=5 trace=abc123 span=def456
  db: system=postgresql duration.seconds=0.42 row_count=42
  scribe: entry_type=user search.kind=paged
  scribe.query: scope="ou=users,o=org" where="(&(objectClass=*))" sort="cn"
  events:
    +  5.1ms  12.3ms Query.Plan warning code=unsupported_control message="VLV not available"
    + 18.0ms 401.9ms DB.Fetch row_count=42
```

JSON format (default in production):
```json
{"trace_id":"abc123","span_id":"def456","duration_seconds":1.5,"result":"ok","scribe.operation":"LDAP.Search",...}
```

Failure output includes full details and stacktrace:
```text
scribe REST.Modify internal dur=311ms trace=trace123 span=span456
  failure: kind=INTERNAL code=SERVER_ERROR message="Database connection failed"
    cause_type=java.sql.SQLException cause_message="Connection refused"
  stacktrace:
    java.sql.SQLException: Connection refused
      at org.postgresql.Driver.connect(Driver.java:285)
```

Color detection uses TERM, FORCE_COLOR, NO_COLOR, and TTY presence. Disable colors with NO_COLOR=1.
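The combination of these variables usually follows a common community convention, sketched below in Python. This is an illustration of the typical precedence (NO_COLOR wins, then FORCE_COLOR, then TTY and TERM), not necessarily Scribe's exact implementation:

```python
import os
import sys

def colors_enabled(stream=sys.stdout) -> bool:
    """Sketch of the usual color-detection convention:
    NO_COLOR beats FORCE_COLOR, which beats TTY/TERM heuristics."""
    if os.environ.get("NO_COLOR"):      # any non-empty value disables colors
        return False
    if os.environ.get("FORCE_COLOR"):   # force colors on, even when piped
        return True
    term = os.environ.get("TERM", "")
    return stream.isatty() and term not in ("", "dumb")
```

Under this convention NO_COLOR=1 always wins, which matches the disable switch described above.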
## Configuration

```hocon
monitoring.log {
  enabled = true        # default
  format = "pretty"     # pretty | json | auto
  sample-rate = 100     # 0-100%, for ops that don't match any rule

  # Redaction strategies (glob patterns for keys)
  redaction {
    "*dn*" = hash        # SHA-256 hash
    "*email*" = truncate # Show ≤33% of chars
    "*.raw" = replace    # Replace with [REDACTED]
  }

  # Hard-coded: *password*, *credential*, *token*, *secret* always omitted
}
```

## Rule-based filtering
Rules control which operations get logged. They evaluate in order — first match wins. Operations without a matching rule fall through to random sampling.
Decision flow:
- Failures → always logged (ERROR)
- Warnings/Exceptions → always logged (WARN)
- Marked interesting → always logged
- First matching rule → `include` logs, `exclude` suppresses
- No match → apply `sample-rate`
Rule syntax:
```hocon
monitoring.log.rules = [
  { id = "suppress-fast-ok", action = exclude, where = "scribe.result=ok duration.seconds<=50ms" }
  { id = "log-slow", action = include, where = "duration.seconds>=5s" }
  { action = exclude, name = "Hints.*", where = "duration.seconds<=100ms" }
  { action = exclude, name = "Metrics.*", where = "duration.seconds<=500ms" }
]
```

| Field | Required | Description |
|---|---|---|
| action | Yes | include or exclude |
| name | No | Glob pattern for operation name |
| where | No | LDAP-style filter on attributes |
| id | No | Identifier for debugging (defaults to Rule #N) |
Duration values accept plain seconds (0.05), HOCON style (50ms, 5s), or ISO 8601 (PT5S).
Common patterns:
```hocon
{ action = include, name = "LDAP.*" }                                          # All LDAP ops
{ action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" }  # Fast successes
{ action = include, where = "(db.duration.seconds>=1s)" }                      # Slow DB queries
{ action = exclude }                                                           # Catch-all
```

See monitoring.log.rules for the complete reference. For all logging configuration options (redaction, leak detection, shutdown noise), see monitoring.log.
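The decision flow and first-match-wins evaluation can be sketched in a few lines of Python. This is a hypothetical model, not Scribe's actual code; where clauses are shown as plain predicates instead of LDAP-style filter strings:

```python
import random
from fnmatch import fnmatch

def should_log(op: dict, rules: list, sample_rate: float = 100) -> bool:
    """Sketch of the wide-log decision flow: failures, warnings, and
    operations marked interesting always log; otherwise rules evaluate
    in order (first match wins) and unmatched operations are sampled."""
    if op.get("result", "ok") != "ok":
        return True                                # failure → always (ERROR)
    if op.get("warnings") or op.get("interesting"):
        return True                                # warnings / marked → always
    for rule in rules:                             # first match wins
        if not fnmatch(op["name"], rule.get("name", "*")):
            continue
        where = rule.get("where")
        if where is not None and not where(op):
            continue
        return rule["action"] == "include"
    return random.uniform(0, 100) < sample_rate    # no match → sample-rate

# Mirrors { action = exclude, name = "Hints.*", where = "duration.seconds<=100ms" }
rules = [{"action": "exclude", "name": "Hints.*",
          "where": lambda op: op["duration"] <= 0.1}]

should_log({"name": "Hints.Refresh", "result": "ok", "duration": 0.05}, rules)
# → False: fast, successful, and matched by the exclude rule
```

Note that the always-log checks run before any rule, so an exclude rule can never suppress a failure.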
## Segment tracking

Child segments can appear as events in wide logs, giving you an operation breakdown without opening a trace viewer.
| Mode | What’s captured | Overhead |
|---|---|---|
| auto | Full in dev, minimal in prod (default) | Varies |
| off | Nothing | None |
| minimal | Name, offset, duration | Low |
| full | Name, offset, duration, attributes | Moderate |
```hocon
monitoring.log.childs {
  mode = "auto"
  rules = [
    { action = include, where = "(duration.seconds>=1ms)" }
    { action = exclude }
  ]
}
```

Events appear in start-time order (not end-time), so the log reads chronologically. Segment attributes follow the same redaction rules as operation-level attributes.
## Trace layout

When tracing is enabled, each query produces a span hierarchy:
```text
LDAP.Search (or REST.Search, etc.)   ← Channel entry span
└── Query.Normalize                  ← Normalization
└── Query.Plan                       ← Planning
└── Query.Compile                    ← SQL emission
└── Query.Execute                    ← DB execution + streaming
```

## Span attributes
| Attribute | Description | Example |
|---|---|---|
| scribe.result | Outcome | ok, cancelled, deadline_exceeded |
| scribe.search.kind | Pagination mode | simple, paged, vlv |
| scribe.query.signature | Query signature hash | a1b2c3d4 |
| scribe.entry_type | Entry type(s) in scope | inetOrgPerson |
## Phase names

The phase label in scribe.query.stage.duration.seconds maps directly to span names:
| Phase | Span Name | What it does |
|---|---|---|
| normalize | Query.Normalize | Attribute mapping, filter canonicalization |
| plan | Query.Plan | Index selection, predicate classification |
| compile | Query.Compile | Logical plan → SQL |
| execute | Query.Execute | JDBC execution, result streaming |
## Latency debugging with traces

1. Find the slow stage — look at scribe_query_stage_duration_seconds by phase
2. Get a sample trace — filter by trace ID or high duration
3. Drill into Execute — if it’s slow, check for lock contention, missing indexes (/observe/hints), or pool saturation (hikaricp_connections_pending > 0)
4. Check Plan — if it’s slow, the query may be too complex
## Enabling tracing

```hocon
monitoring.telemetry.traces {
  enabled = true
  endpoint = "http://otel-collector:4317"
  protocol = "grpc"   # or "http/protobuf" for port 4318
}
```

Or via environment variables:
```shell
export SCRIBE_TELEMETRY_TRACES_ENABLED=true
export SCRIBE_TELEMETRY_TRACES_ENDPOINT=http://otel-collector:4317
```

## Wide event fields
| Field | Type | Description |
|---|---|---|
| trace_id | string | OpenTelemetry trace ID (when tracing enabled) |
| span_id | string | OpenTelemetry span ID |
| duration_seconds | double | Operation duration |
| result | string | ok or failure kind |
| failure | object | Details when failed: kind, code, message, stacktrace |
| events | array | Warnings and segment timing |
| scribe.operation | string | Operation name (e.g., LDAP.Search, REST.GET) |
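Because every wide log line carries trace_id, correlating a slow or failed log entry with its trace takes one step in any log pipeline — a minimal sketch over the JSON format (the sample line is the one shown earlier in this page):

```python
import json

# One JSON wide log line, using the fields documented above.
line = ('{"trace_id":"abc123","span_id":"def456","duration_seconds":1.5,'
        '"result":"ok","scribe.operation":"LDAP.Search"}')
event = json.loads(line)

# Surface slow or failed operations, then pivot to the trace by its ID.
if event["result"] != "ok" or event["duration_seconds"] > 1.0:
    print(f"inspect trace {event['trace_id']} (op={event['scribe.operation']})")
# prints: inspect trace abc123 (op=LDAP.Search)
```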