# Logging and Traces
You saw something in the signals. Now you need to find out what happened.
IdentityScribe uses two investigation tools: wide event logs that capture one structured log line per operation, and distributed traces via OpenTelemetry. Both work together — wide logs include trace IDs, and traces link back to the operation that produced them.
Wide logs are your first stop. They show failures, warnings, and slow operations without any extra infrastructure. Traces go deeper — they break each operation into stages so you can see exactly where time is spent. Traces require a backend (Tempo, Jaeger) but give you the full picture when logs aren’t enough.
## Wide event logging

IdentityScribe accumulates context throughout each request, then emits a single structured log line when the operation finishes — but only when it’s worth looking at.
An operation logs when it:
- Failed (any failure kind) → ERROR level
- Had warnings or exceptions → WARN level
- Ran slow (exceeded the configured threshold)
- Was explicitly marked interesting
Fast, successful operations produce no log output. That’s by design — it keeps logs focused on things that need your attention.
## Logger categories

Five loggers, all under com.kenoxa.scribe.*:
| Logger | Env Override | Purpose |
|---|---|---|
| SuperVisor | SCRIBE_LOG_SUPERVISOR | Startup, shutdown, orchestration |
| Ingest | SCRIBE_LOG_INGEST | Transcription pipeline, event store |
| Monitoring | SCRIBE_LOG_MONITORING | Wide-log output |
| License | SCRIBE_LOG_LICENSE | License verification |
| Config | SCRIBE_LOG_CONFIG | Configuration parsing |
All inherit from log.level by default. Override individually:
```shell
SCRIBE_LOG_MONITORING=off    # Disable wide logs entirely
SCRIBE_LOG_INGEST=debug      # Verbose transcription debugging
SCRIBE_LOG_LICENSE=error     # Quiet license checks
```

## Log format

Pretty format (default in dev mode with a TTY):
```text
LDAP.Search ok dur=1.5s warnings=1 events=5 trace=abc123 span=def456
  db: system=postgresql duration.seconds=0.42 row_count=42
  scribe: entry_type=user search.kind=paged
  scribe.query: scope="ou=users,o=org" where="(&(objectClass=*))" sort="cn"
  events:
    +  5.1ms  12.3ms Query.Plan warning code=unsupported_control message="VLV not available"
    + 18.0ms 401.9ms DB.Fetch row_count=42
```

JSON format (default in production):
```json
{"trace_id":"abc123","span_id":"def456","duration_seconds":1.5,"result":"ok","scribe.operation":"LDAP.Search",...}
```

Failure output includes full details and stacktrace:
```text
scribe REST.Modify internal dur=311ms trace=trace123 span=span456
  failure: kind=INTERNAL code=SERVER_ERROR message="Database connection failed"
    cause_type=java.sql.SQLException cause_message="Connection refused"
  stacktrace:
    java.sql.SQLException: Connection refused
      at org.postgresql.Driver.connect(Driver.java:285)
```

Color detection uses TERM, FORCE_COLOR, NO_COLOR, and TTY presence. Disable colors with NO_COLOR=1.
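The combination of these variables usually follows a common community convention, sketched below in Python. This is an illustration of the typical precedence (NO_COLOR wins, then FORCE_COLOR, then TTY and TERM), not necessarily Scribe's exact implementation:

```python
import os
import sys

def colors_enabled(stream=sys.stdout) -> bool:
    """Sketch of the usual color-detection convention:
    NO_COLOR beats FORCE_COLOR, which beats TTY/TERM heuristics."""
    if os.environ.get("NO_COLOR"):      # any non-empty value disables colors
        return False
    if os.environ.get("FORCE_COLOR"):   # force colors on, even when piped
        return True
    term = os.environ.get("TERM", "")
    return stream.isatty() and term not in ("", "dumb")
```

Under this convention NO_COLOR=1 always wins, which matches the disable switch described above.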
## Configuration

```hocon
monitoring.log {
  enabled = true        # default
  format = "pretty"     # pretty | json | auto
  sample-rate = 100     # 0-100%, for ops that don't match any rule

  # Redaction strategies (glob patterns for keys)
  redaction {
    "*dn*" = hash        # SHA-256 hash
    "*email*" = truncate # Show ≤33% of chars
    "*.raw" = replace    # Replace with [REDACTED]
  }

  # Hard-coded: *password*, *credential*, *token*, *secret* always omitted
}
```

## Rule-based filtering
Rules control which operations get logged. They evaluate in order — first match wins. Operations without a matching rule fall through to random sampling.
Decision flow:
- Failures → always logged (ERROR)
- Warnings/Exceptions → always logged (WARN)
- Marked interesting → always logged
- First matching rule → `include` logs, `exclude` suppresses
- No match → apply `sample-rate`
Rule syntax:
```hocon
monitoring.log.rules = [
  { id = "suppress-fast-ok", action = exclude, where = "scribe.result=ok duration.seconds<=50ms" }
  { id = "log-slow", action = include, where = "duration.seconds>=5s" }
  { action = exclude, name = "Hints.*", where = "duration.seconds<=100ms" }
  { action = exclude, name = "Metrics.*", where = "duration.seconds<=500ms" }
]
```

| Field | Required | Description |
|---|---|---|
| action | Yes | include or exclude |
| name | No | Glob pattern for operation name |
| where | No | LDAP-style filter on attributes |
| id | No | Identifier for debugging (defaults to Rule #N) |
Duration values accept plain seconds (0.05), HOCON style (50ms, 5s), or ISO 8601 (PT5S).
Common patterns:
```hocon
{ action = include, name = "LDAP.*" }                                          # All LDAP ops
{ action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" }  # Fast successes
{ action = include, where = "(db.duration.seconds>=1s)" }                      # Slow DB queries
{ action = exclude }                                                           # Catch-all
```

See monitoring.log.rules for the complete reference. For all logging configuration options (redaction, leak detection, shutdown noise), see monitoring.log.
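The decision flow and first-match-wins evaluation can be sketched in a few lines of Python. This is a hypothetical model, not Scribe's actual code; where clauses are shown as plain predicates instead of LDAP-style filter strings:

```python
import random
from fnmatch import fnmatch

def should_log(op: dict, rules: list, sample_rate: float = 100) -> bool:
    """Sketch of the wide-log decision flow: failures, warnings, and
    operations marked interesting always log; otherwise rules evaluate
    in order (first match wins) and unmatched operations are sampled."""
    if op.get("result", "ok") != "ok":
        return True                                # failure → always (ERROR)
    if op.get("warnings") or op.get("interesting"):
        return True                                # warnings / marked → always
    for rule in rules:                             # first match wins
        if not fnmatch(op["name"], rule.get("name", "*")):
            continue
        where = rule.get("where")
        if where is not None and not where(op):
            continue
        return rule["action"] == "include"
    return random.uniform(0, 100) < sample_rate    # no match → sample-rate

# Mirrors { action = exclude, name = "Hints.*", where = "duration.seconds<=100ms" }
rules = [{"action": "exclude", "name": "Hints.*",
          "where": lambda op: op["duration"] <= 0.1}]

should_log({"name": "Hints.Refresh", "result": "ok", "duration": 0.05}, rules)
# → False: fast, successful, and matched by the exclude rule
```

Note that the always-log checks run before any rule, so an exclude rule can never suppress a failure.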
## Segment tracking

Child segments can appear as events in wide logs, giving you an operation breakdown without opening a trace viewer.
| Mode | What’s captured | Overhead |
|---|---|---|
| auto | Full in dev, minimal in prod (default) | Varies |
| off | Nothing | None |
| minimal | Name, offset, duration | Low |
| full | Name, offset, duration, attributes | Moderate |
```hocon
monitoring.log.childs {
  mode = "auto"
  rules = [
    { action = include, where = "(duration.seconds>=1ms)" }
    { action = exclude }
  ]
}
```

Events appear in start-time order (not end-time), so the log reads chronologically. Segment attributes follow the same redaction rules as operation-level attributes.
## Trace layout

When tracing is enabled, each query produces a span hierarchy:
```text
LDAP.Search (or REST.Search, etc.)   ← Channel entry span
└── Query.Normalize                  ← Normalization
└── Query.Plan                       ← Planning
└── Query.Compile                    ← SQL emission
└── Query.Execute                    ← DB execution + streaming
```

## Span attributes
| Attribute | Description | Example |
|---|---|---|
| scribe.result | Outcome | ok, cancelled, deadline_exceeded |
| scribe.search.kind | Pagination mode | simple, paged, vlv |
| scribe.query.signature | Query signature hash | a1b2c3d4 |
| scribe.entry_type | Entry type(s) in scope | inetOrgPerson |
## Phase names

The phase label in scribe.query.stage.duration.seconds maps directly to span names:
| Phase | Span Name | What it does |
|---|---|---|
| normalize | Query.Normalize | Attribute mapping, filter canonicalization |
| plan | Query.Plan | Index selection, predicate classification |
| compile | Query.Compile | Logical plan → SQL |
| execute | Query.Execute | JDBC execution, result streaming |
## Latency debugging with traces

1. Find the slow stage — look at scribe_query_stage_duration_seconds by phase
2. Get a sample trace — filter by trace ID or high duration
3. Drill into Execute — if it’s slow, check for lock contention, missing indexes (/observe/hints), or pool saturation (hikaricp_connections_pending > 0)
4. Check Plan — if it’s slow, the query may be too complex
## Enabling tracing

```hocon
monitoring.telemetry.traces {
  enabled = true
  endpoint = "http://otel-collector:4317"
  protocol = "grpc"   # or "http/protobuf" for port 4318
}
```

Or via environment variables:
```shell
export SCRIBE_TELEMETRY_TRACES_ENABLED=true
export SCRIBE_TELEMETRY_TRACES_ENDPOINT=http://otel-collector:4317
```

## Wide event fields
| Field | Type | Description |
|---|---|---|
| trace_id | string | OpenTelemetry trace ID (when tracing enabled) |
| span_id | string | OpenTelemetry span ID |
| duration_seconds | double | Operation duration |
| result | string | ok or failure kind |
| failure | object | Details when failed: kind, code, message, stacktrace |
| events | array | Warnings and segment timing |
| scribe.operation | string | Operation name (e.g., LDAP.Search, REST.GET) |
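Because every wide log line carries trace_id, correlating a slow or failed log entry with its trace takes one step in any log pipeline — a minimal sketch over the JSON format (the sample line is the one shown earlier in this page):

```python
import json

# One JSON wide log line, using the fields documented above.
line = ('{"trace_id":"abc123","span_id":"def456","duration_seconds":1.5,'
        '"result":"ok","scribe.operation":"LDAP.Search"}')
event = json.loads(line)

# Surface slow or failed operations, then pivot to the trace by its ID.
if event["result"] != "ok" or event["duration_seconds"] > 1.0:
    print(f"inspect trace {event['trace_id']} (op={event['scribe.operation']})")
# prints: inspect trace abc123 (op=LDAP.Search)
```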