Logging and Traces

You saw something in the signals. Now you need to find out what happened.

IdentityScribe uses two investigation tools: wide event logs that capture one structured log line per operation, and distributed traces via OpenTelemetry. Both work together — wide logs include trace IDs, and traces link back to the operation that produced them.

Wide logs are your first stop. They show failures, warnings, and slow operations without any extra infrastructure. Traces go deeper — they break each operation into stages so you can see exactly where time is spent. Traces require a backend (Tempo, Jaeger) but give you the full picture when logs aren’t enough.

IdentityScribe accumulates context throughout each request, then emits a single structured log line when the operation finishes — but only when it’s worth looking at.

An operation logs when it:

  • Failed (any failure kind) → ERROR level
  • Had warnings or exceptions → WARN level
  • Ran slow (exceeded the configured threshold)
  • Was explicitly marked interesting

Fast, successful operations produce no log output. That’s by design — it keeps logs focused on things that need your attention.
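The level selection above can be summarized in a short Python sketch. This is hypothetical code: the dict keys, the default slow threshold, and the INFO level for slow or explicitly-interesting operations are assumptions, not IdentityScribe's actual API.

```python
def log_level(op):
    """Map a finished operation to a log level, per the rules above.

    Illustrative sketch only: the `op` dict shape is an assumption.
    The level for slow and explicitly-interesting operations is
    assumed here to be INFO.
    """
    if op.get("failure"):
        return "ERROR"                    # any failure kind
    if op.get("warnings") or op.get("exceptions"):
        return "WARN"
    if op.get("duration_seconds", 0.0) > op.get("slow_threshold", 5.0):
        return "INFO"                     # exceeded configured threshold
    if op.get("interesting"):
        return "INFO"                     # explicitly marked interesting
    return None                           # fast + successful: no log line
```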

Five loggers, all under com.kenoxa.scribe.*:

Logger      Env Override           Purpose
SuperVisor  SCRIBE_LOG_SUPERVISOR  Startup, shutdown, orchestration
Ingest      SCRIBE_LOG_INGEST      Transcription pipeline, event store
Monitoring  SCRIBE_LOG_MONITORING  Wide-log output
License     SCRIBE_LOG_LICENSE     License verification
Config      SCRIBE_LOG_CONFIG      Configuration parsing

All inherit from log.level by default. Override individually:

SCRIBE_LOG_MONITORING=off # Disable wide logs entirely
SCRIBE_LOG_INGEST=debug # Verbose transcription debugging
SCRIBE_LOG_LICENSE=error # Quiet license checks

Pretty format (default in dev mode with a TTY):

LDAP.Search ok dur=1.5s warnings=1 events=5
  trace=abc123 span=def456
  db: system=postgresql duration.seconds=0.42 row_count=42
  scribe: entry_type=user search.kind=paged
  scribe.query: scope="ou=users,o=org" where="(&(objectClass=*))" sort="cn"
  events:
    + 5.1ms 12.3ms Query.Plan
        warning code=unsupported_control message="VLV not available"
    + 18.0ms 401.9ms DB.Fetch row_count=42

JSON format (default in production):

{"trace_id":"abc123","span_id":"def456","duration_seconds":1.5,"result":"ok","scribe.operation":"LDAP.Search",...}
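In production you will usually post-process these JSON lines with a script or log pipeline. A minimal Python sketch, assuming one JSON object per line with the fields shown above (the function name and the one-second threshold are illustrative, not part of IdentityScribe):

```python
import json

def slow_or_failed(line, threshold=1.0):
    """Pick out wide-log lines worth investigating.

    Returns the trace ID for failed or slow operations so it can be
    handed to a trace backend, and None otherwise. Field names match
    the JSON wide-log format shown above; the threshold is illustrative.
    """
    rec = json.loads(line)
    if rec["result"] != "ok" or rec["duration_seconds"] >= threshold:
        return rec.get("trace_id")
    return None
```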

Failure output includes full details and stacktrace:

scribe REST.Modify internal dur=311ms trace=trace123 span=span456
  failure:
    kind=INTERNAL code=SERVER_ERROR
    message="Database connection failed"
    cause_type=java.sql.SQLException
    cause_message="Connection refused"
    stacktrace:
      java.sql.SQLException: Connection refused
        at org.postgresql.Driver.connect(Driver.java:285)

Color detection uses TERM, FORCE_COLOR, NO_COLOR, and TTY presence. Disable colors with NO_COLOR=1.

monitoring.log {
  enabled = true      # default
  format = "pretty"   # pretty | json | auto
  sample-rate = 100   # 0-100%, for ops that don't match any rule

  # Redaction strategies (glob patterns for keys)
  redaction {
    "*dn*" = hash         # SHA-256 hash
    "*email*" = truncate  # Show ≤33% of chars
    "*.raw" = replace     # Replace with [REDACTED]
  }

  # Hard-coded: *password*, *credential*, *token*, *secret* always omitted
}
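The three redaction strategies can be illustrated with a short Python sketch. This is hypothetical code, not IdentityScribe's implementation; details such as the exact truncation length may differ in the real redactor.

```python
import hashlib
from fnmatch import fnmatch

# Glob pattern -> strategy, mirroring the example configuration above.
STRATEGIES = {
    "*dn*": "hash",
    "*email*": "truncate",
    "*.raw": "replace",
}

def redact(key, value):
    """Apply the first matching redaction strategy to an attribute value."""
    for pattern, strategy in STRATEGIES.items():
        if fnmatch(key, pattern):
            if strategy == "hash":
                return hashlib.sha256(value.encode()).hexdigest()  # SHA-256
            if strategy == "truncate":
                keep = len(value) // 3          # show at most ~33% of chars
                return value[:keep] + "..."
            return "[REDACTED]"
    return value                                # no pattern matched: pass through
```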

Rules control which operations get logged. They evaluate in order — first match wins. Operations without a matching rule fall through to random sampling.

Decision flow:

  1. Failures → always logged (ERROR)
  2. Warnings/Exceptions → always logged (WARN)
  3. Marked interesting → always logged
  4. First matching rule → include logs, exclude suppresses
  5. No match → apply sample-rate
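The flow above can be sketched in Python. This is illustrative only: the function name, dict shapes, and the simplified name-only matching are assumptions; real rules also support `where` filters.

```python
import random
from fnmatch import fnmatch

def should_log(op, rules, sample_rate=100):
    """First-match-wins rule evaluation, per the decision flow above.

    `op` is a dict like {"name": "LDAP.Search"}; a rule is a dict like
    {"action": "exclude", "name": "Hints.*"}. Simplified sketch:
    real rules also support `where` filters on attributes.
    """
    # Steps 1-3: failures, warnings, and interesting ops always log.
    if op.get("failure") or op.get("warnings") or op.get("interesting"):
        return True
    # Step 4: first matching rule wins; include logs, exclude suppresses.
    for rule in rules:
        if fnmatch(op["name"], rule.get("name", "*")):
            return rule["action"] == "include"
    # Step 5: no match, fall through to random sampling (0-100%).
    return random.uniform(0, 100) < sample_rate
```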

Rule syntax:

monitoring.log.rules = [
  { id = "suppress-fast-ok", action = exclude, where = "scribe.result=ok duration.seconds<=50ms" }
  { id = "log-slow", action = include, where = "duration.seconds>=5s" }
  { action = exclude, name = "Hints.*", where = "duration.seconds<=100ms" }
  { action = exclude, name = "Metrics.*", where = "duration.seconds<=500ms" }
]

Field   Required  Description
action  Yes       include or exclude
name    No        Glob pattern for operation name
where   No        LDAP-style filter on attributes
id      No        Identifier for debugging (defaults to Rule #N)

Duration values accept plain seconds (0.05), HOCON style (50ms, 5s), or ISO 8601 (PT5S).
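A sketch of parsing those three spellings into seconds (illustrative only; HOCON and ISO 8601 both accept more forms than this minimal parser handles):

```python
import re

def parse_duration(value):
    """Parse a duration into seconds: plain numbers (0.05),
    HOCON units (50ms, 5s), or ISO 8601 seconds (PT5S).

    Minimal illustration; both HOCON and ISO 8601 define more
    unit spellings than are covered here.
    """
    if isinstance(value, (int, float)):
        return float(value)                        # plain seconds
    m = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", value)  # ISO 8601, seconds only
    if m:
        return float(m.group(1))
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)  # HOCON style
    if m:
        unit = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}[m.group(2)]
        return float(m.group(1)) * unit
    return float(value)                            # plain seconds as string
```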

Common patterns:

{ action = include, name = "LDAP.*" } # All LDAP ops
{ action = exclude, where = "(&(scribe.result=ok)(duration.seconds<=50ms))" } # Fast successes
{ action = include, where = "(db.duration.seconds>=1s)" } # Slow DB queries
{ action = exclude } # Catch-all

See monitoring.log.rules for the complete reference. For all logging configuration options (redaction, leak detection, shutdown noise), see monitoring.log.

Child segments can appear as events in wide logs, giving you an operation breakdown without opening a trace viewer.

Mode     What’s captured                         Overhead
auto     Full in dev, minimal in prod (default)  Varies
off      Nothing                                 None
minimal  Name, offset, duration                  Low
full     Name, offset, duration, attributes      Moderate

monitoring.log.childs {
  mode = "auto"
  rules = [
    { action = include, where = "(duration.seconds>=1ms)" }
    { action = exclude }
  ]
}

Events appear in start-time order (not end-time), so the log reads chronologically. Segment attributes follow the same redaction rules as operation-level attributes.

When tracing is enabled, each query produces a span hierarchy:

LDAP.Search (or REST.Search, etc.)  ← Channel entry span
├── Query.Normalize                 ← Normalization
├── Query.Plan                      ← Planning
├── Query.Compile                   ← SQL emission
└── Query.Execute                   ← DB execution + streaming

Attribute               Description             Example
scribe.result           Outcome                 ok, cancelled, deadline_exceeded
scribe.search.kind      Pagination mode         simple, paged, vlv
scribe.query.signature  Query signature hash    a1b2c3d4
scribe.entry_type       Entry type(s) in scope  inetOrgPerson

The phase label in scribe.query.stage.duration.seconds maps directly to span names:

Phase      Span Name        What it does
normalize  Query.Normalize  Attribute mapping, filter canonicalization
plan       Query.Plan       Index selection, predicate classification
compile    Query.Compile    Logical plan → SQL
execute    Query.Execute    JDBC execution, result streaming

To track down a slow query:

  1. Find the slow stage — look at scribe_query_stage_duration_seconds by phase
  2. Get a sample trace — filter by trace ID or high duration
  3. Drill into Execute — if it’s slow, check for lock contention, missing indexes (/observe/hints), or pool saturation (hikaricp_connections_pending > 0)
  4. Check Plan — if it’s slow, the query may be too complex
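For step 1, a PromQL sketch for per-phase p95 latency. This assumes the stage metric is exported as a Prometheus histogram with the conventional `_bucket` suffix; verify the exact series name in your metrics backend.

```
histogram_quantile(
  0.95,
  sum by (phase, le) (rate(scribe_query_stage_duration_seconds_bucket[5m]))
)
```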
Enable tracing in configuration:

monitoring.telemetry.traces {
  enabled = true
  endpoint = "http://otel-collector:4317"
  protocol = "grpc"  # or "http/protobuf" for port 4318
}

Or via environment variables:

export SCRIBE_TELEMETRY_TRACES_ENABLED=true
export SCRIBE_TELEMETRY_TRACES_ENDPOINT=http://otel-collector:4317

Wide-log entries include these fields:

Field             Type    Description
trace_id          string  OpenTelemetry trace ID (when tracing enabled)
span_id           string  OpenTelemetry span ID
duration_seconds  double  Operation duration
result            string  ok or failure kind
failure           object  Details when failed: kind, code, message, stacktrace
events            array   Warnings and segment timing
scribe.operation  string  Operation name (e.g., LDAP.Search, REST.GET)