
Signals

Your system is running. What numbers should you watch, and when should you worry?

This page covers the signals that tell you whether IdentityScribe is healthy. For the full metrics inventory, see the generated Telemetry Reference. For pressure gauge playbooks, see Health and Monitoring.

Golden signals

Four dimensions of system health (example dashboard, status HEALTHY, all signals nominal):

- Latency: p50 12ms, p95 45ms
- Traffic: 120 req/s, 45 events/s
- Errors: server 0.1%, client 2.0%
- Saturation: permits 0.30, memory 0.45

Scribe exposes the Four Golden Signals for both query and ingest sides.

Five signals tell you whether the query side is healthy: latency, traffic, server errors, client errors, and saturation. Each is available as a pre-computed gauge — no PromQL needed.

Per-channel breakdowns use the channel label (e.g., scribe_signals_channel_latency_p95{channel="ldap"}).
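Per-channel gauges can be read straight out of a Prometheus scrape. A minimal sketch, assuming a hand-written sample of the exposition text (the metric name matches the doc; the sample values are illustrative):

```python
import re

# Hypothetical scrape output; only the metric name is taken from the docs above.
SAMPLE_SCRAPE = """\
scribe_signals_channel_latency_p95{channel="ldap"} 0.045
scribe_signals_channel_latency_p95{channel="rest"} 0.012
"""

def gauge(text: str, name: str, **labels: str) -> float:
    """Pull one labeled gauge value out of Prometheus text format."""
    label_part = ",".join(f'{k}="{v}"' for k, v in labels.items())
    pattern = rf'^{re.escape(name)}\{{{re.escape(label_part)}\}} (\S+)$'
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(f"{name}{{{label_part}}} not found")
    return float(match.group(1))

print(gauge(SAMPLE_SCRAPE, "scribe_signals_channel_latency_p95", channel="ldap"))
```

In practice you would point a real scraper (or PromQL) at the endpoint; the sketch just shows how the channel label selects one series.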

Four signals cover the ingest (sync) side: task latency, replication lag, change detection rate, and task failure rate. Per-entry-type breakdowns use the entry_type label.

LDAP searches that can’t be satisfied locally get delegated upstream. Track delegation rate with scribe_signals_ldap_delegation_rate_percent and per-reason counts with scribe_ldap_search_delegated_total. See LDAP channel docs for delegation reasons.
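The rate gauge is just the delegated share of all searches, expressed as a percentage. A sketch of the arithmetic (illustrative only, not the server's implementation):

```python
def delegation_rate_percent(delegated: int, total_searches: int) -> float:
    """Share of LDAP searches delegated upstream, as a percentage.
    Mirrors what scribe_signals_ldap_delegation_rate_percent reports."""
    if total_searches == 0:
        return 0.0  # no traffic, no delegation
    return 100.0 * delegated / total_searches

print(delegation_rate_percent(12, 480))  # 2.5
```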

For the full metric inventory with labels, units, and scrape details, see the Telemetry Reference.

curl -s http://localhost:8080/observe/signals | jq

Returns a JSON summary of all golden signals. The built-in Operator UI at /ui/observe visualizes these in real time.
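In a script, the same endpoint can be polled and checked. A minimal sketch, assuming a hypothetical payload shape (the real field names may differ; only the endpoint and the status values come from this page):

```python
import json

# Stand-in for: curl -s http://localhost:8080/observe/signals
# The field names here are assumptions for illustration.
payload = json.loads("""
{
  "status": "HEALTHY",
  "query": {"latency_p95_ms": 45, "server_error_rate": 0.1, "client_error_rate": 2.0},
  "sync": {"replication_lag_s": 1.2, "task_failure_rate": 0.0}
}
""")

if payload["status"] != "HEALTHY":
    print("investigate:", payload["query"])
else:
    print("all signals nominal")
```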

Not every error deserves the same response. A user typo shouldn’t page you at 3am. A database timeout should.

Server errors are problems inside Scribe — internal failures, timeouts, resource exhaustion. These affect health status because they mean the service itself has a problem.

Client errors are problems with the request — bad input, missing auth, resources that don’t exist. A spike might mean a misconfigured caller or a scanning attack, but Scribe itself is fine.

| Status | What it means | What to do |
| --- | --- | --- |
| HEALTHY | All signals within thresholds | Nothing |
| NOISY | Client error rate high (≥10%) | Check if a caller changed behavior |
| DEGRADED | Server errors ≥0.5%, or latency thresholds breached | Check logs for patterns |
| CRITICAL | Server errors ≥2% | Investigate immediately |

NOISY is intentionally separate from DEGRADED. A flood of 404s from a broken client shouldn’t mask a real database problem, and shouldn’t trigger your pager.
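The threshold logic in the table above can be sketched as a small classifier. This is illustrative, using the default percentages and the min-requests guard described on this page (latency thresholds omitted for brevity):

```python
def signal_status(requests: int, server_err_pct: float, client_err_pct: float,
                  min_requests: int = 200) -> str:
    """Classify health from error rates, mirroring the default thresholds."""
    if requests < min_requests:
        return "HEALTHY"   # too little traffic to judge
    if server_err_pct >= 2.0:
        return "CRITICAL"
    if server_err_pct >= 0.5:
        return "DEGRADED"
    if client_err_pct >= 10.0:
        return "NOISY"     # caller problem, not a Scribe problem
    return "HEALTHY"

print(signal_status(1000, 0.1, 25.0))  # NOISY
```

Note how server-error checks come first: a broken client flooding 404s cannot hide a real server-side failure.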

The defaults work for most deployments. Adjust them if your traffic pattern is unusual:

monitoring.signals {
  min-requests = 200  # Don't evaluate thresholds below this request volume
  server-error-rate.degraded = 0.5
  server-error-rate.critical = 2.0
  client-error-rate.noisy = 10.0
}

Different channels can have different tolerances. LDAP clients often need tighter latency than REST:

channels.ldap.signals = ${monitoring.signals} {
  latency-p99.degraded = 0.3
}
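The ${monitoring.signals} substitution behaves like object merging: the channel gets every base threshold, with the override laid on top. A sketch of the effective result as a plain dict merge (illustrative, mirrors HOCON's inheritance semantics):

```python
base = {
    "min-requests": 200,
    "server-error-rate.degraded": 0.5,
    "server-error-rate.critical": 2.0,
    "client-error-rate.noisy": 10.0,
}

# channels.ldap.signals = ${monitoring.signals} { latency-p99.degraded = 0.3 }
# resolves to the base object plus the override:
ldap_signals = {**base, "latency-p99.degraded": 0.3}

print(ldap_signals["min-requests"], ldap_signals["latency-p99.degraded"])  # 200 0.3
```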

See monitoring.signals for all thresholds.

By default, client errors pass through log rules and may get filtered out. Server errors and auth failures always log — you want to see those regardless.

To log all client errors (useful when debugging a broken caller):

monitoring.log.filter-client-errors = false

PromQL examples live in the reference: PromQL Recipes.