Signals
Your system is running. What numbers should you watch, and when should you worry?
This page covers the signals that tell you whether IdentityScribe is healthy. For the full metrics inventory, see the generated Telemetry Reference. For pressure gauge playbooks, see Health and Monitoring.
Golden signals
Scribe exposes the Four Golden Signals for both the query and ingest sides.
Query signals
Five signals tell you whether the query side is healthy: latency, traffic, server errors, client errors, and saturation. Each is available as a pre-computed gauge, so no PromQL is needed.
Per-channel breakdowns use the channel label (e.g., scribe_signals_channel_latency_p95{channel="ldap"}).
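If you read these gauges from a raw metrics scrape instead of through PromQL, you can pick out per-channel values by label. Here is a minimal Python sketch. It assumes the gauges are served in the Prometheus text exposition format with channel as the only label. The sample scrape body and its values are made up for illustration.

```python
import re

# Matches one sample line of the Prometheus text exposition format, e.g.
#   scribe_signals_channel_latency_p95{channel="ldap"} 0.042
# Assumption: a single `channel` label and no trailing timestamp.
_SAMPLE = re.compile(
    r'^scribe_signals_channel_latency_p95\{channel="(?P<channel>[^"]+)"\}\s+(?P<value>\S+)$'
)


def channel_p95(exposition: str) -> dict[str, float]:
    """Return {channel: p95 latency} parsed from a metrics scrape body."""
    result: dict[str, float] = {}
    for line in exposition.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:  # comment lines (# TYPE, # HELP) and other metrics are skipped
            result[match["channel"]] = float(match["value"])
    return result


# Hypothetical scrape body; the values are illustrative, not real measurements.
sample = """\
# TYPE scribe_signals_channel_latency_p95 gauge
scribe_signals_channel_latency_p95{channel="ldap"} 0.042
scribe_signals_channel_latency_p95{channel="rest"} 0.180
"""
print(channel_p95(sample))
```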
Ingest signals
Four signals cover the sync side: task latency, replication lag, change detection rate, and task failure rate. Per-entry-type breakdowns use the entry_type label.
LDAP delegation signals
LDAP searches that can’t be satisfied locally are delegated upstream. Track the delegation rate with scribe_signals_ldap_delegation_rate_percent and per-reason counts with scribe_ldap_search_delegated_total. See the LDAP channel docs for delegation reasons.
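scribe_ldap_search_delegated_total is a counter, so you get the per-reason breakdown for a time window by taking the difference between two scrapes. The Python sketch below assumes each reason is exposed as its own labeled series. The reason names and counts are hypothetical. The result is each reason's share of delegations, which is not the same as the delegation rate the gauge reports (that rate is relative to all searches).

```python
def increase(start: float, end: float) -> float:
    """Counter increase between two scrapes; a drop is treated as a counter reset."""
    return end - start if end >= start else end


def delegations_by_reason(
    before: dict[str, float], after: dict[str, float]
) -> dict[str, float]:
    """Per-reason delegated searches between two scrapes of the counter.

    Keys are the (hypothetical) reason label values; a reason that first
    appears in the later scrape is counted from zero.
    """
    return {
        reason: increase(before.get(reason, 0.0), value)
        for reason, value in after.items()
    }


# Hypothetical reason label values and counts, for illustration only.
before = {"reason_a": 100.0, "reason_b": 40.0}
after = {"reason_a": 130.0, "reason_b": 50.0, "reason_c": 5.0}

window = delegations_by_reason(before, after)
total = sum(window.values())
for reason, count in sorted(window.items(), key=lambda kv: -kv[1]):
    print(f"{reason}: {count:.0f} ({100 * count / total:.1f}% of delegations)")
```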
For the full metric inventory with labels, units, and scrape details, see the Telemetry Reference.
Built-in signals dashboard
curl -s http://localhost:8080/observe/signals | jq
This returns a JSON summary of all golden signals. The built-in Operator UI at /ui/observe visualizes these in real time.
Error classification
Not every error deserves the same response. A user typo shouldn’t page you at 3am. A database timeout should.
Server errors are problems inside Scribe — internal failures, timeouts, resource exhaustion. These affect health status because they mean the service itself has a problem.
Client errors are problems with the request — bad input, missing auth, resources that don’t exist. A spike might mean a misconfigured caller or a scanning attack, but Scribe itself is fine.
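To illustrate the split, here is one way the two classes could map onto HTTP status codes. This is a hypothetical Python sketch that assumes the conventional 4xx/5xx division. Scribe's own classification rules, for example for LDAP result codes or 429 responses, may differ.

```python
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    SERVER = "server"  # problem inside the service: affects health status
    CLIENT = "client"  # problem with the request: can make the service NOISY, not unhealthy


def classify_http_status(status: int) -> Optional[ErrorClass]:
    """Map an HTTP status code to an error class; None means no error."""
    if 500 <= status <= 599:
        # e.g. 500 internal failure, 503 resource exhaustion, 504 timeout
        return ErrorClass.SERVER
    if 400 <= status <= 499:
        # e.g. 400 bad input, 401 missing auth, 404 resource doesn't exist
        return ErrorClass.CLIENT
    return None
```

This split keeps a misbehaving caller's 404s out of the server error rate, which is the rate that drives DEGRADED and CRITICAL.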
Health states
| Status | What it means | What to do |
|---|---|---|
| HEALTHY | All signals within thresholds | Nothing |
| NOISY | Client error rate high (≥10%) | Check if a caller changed behavior |
| DEGRADED | Server error rate ≥0.5%, or a latency threshold breached | Check logs for patterns |
| CRITICAL | Server error rate ≥2% | Investigate immediately |
NOISY is intentionally separate from DEGRADED. A flood of 404s from a broken client shouldn’t mask a real database problem, and shouldn’t trigger your pager.
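The table does not say how the states combine, so the following Python sketch makes two assumptions. First, server-side states take precedence over NOISY, which matches the intent described above. Second, windows below min-requests are reported as HEALTHY. The default values come from the monitoring.signals example under Tuning thresholds. The class, function, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror the monitoring.signals example on this page (rates in percent).
    min_requests: int = 200
    server_error_degraded: float = 0.5
    server_error_critical: float = 2.0
    client_error_noisy: float = 10.0


def health(requests: int, server_errors: int, client_errors: int,
           latency_breached: bool, t: Thresholds = Thresholds()) -> str:
    """Derive a health state for one evaluation window (illustrative only)."""
    if requests < t.min_requests:
        return "HEALTHY"  # assumption: too little traffic to judge
    server_pct = 100.0 * server_errors / requests
    client_pct = 100.0 * client_errors / requests
    # Check server-side states first so client noise can never mask them.
    if server_pct >= t.server_error_critical:
        return "CRITICAL"
    if server_pct >= t.server_error_degraded or latency_breached:
        return "DEGRADED"
    if client_pct >= t.client_error_noisy:
        return "NOISY"
    return "HEALTHY"
```

With these defaults, 1,000 requests with 6 server errors (0.6%) and 200 client errors (20%) report DEGRADED, not NOISY.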
Tuning thresholds
The defaults work for most deployments. Adjust them if your traffic pattern is unusual:
monitoring.signals {
  min-requests = 200  # Ignore thresholds below this volume
  server-error-rate.degraded = 0.5
  server-error-rate.critical = 2.0
  client-error-rate.noisy = 10.0
}
Different channels can have different tolerances. LDAP clients often need tighter latency than REST:
channels.ldap.signals = ${monitoring.signals} {
  latency-p99.degraded = 0.3
}
See monitoring.signals for all thresholds.
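The channel override relies on HOCON object merging. Writing ${monitoring.signals} { ... } concatenates the base object with the override block. Fields from the later object win, and nested objects merge recursively. The Python sketch below models that behavior on plain dicts. It includes only the keys shown in the examples above, so a real effective config would contain more. The merge function is a hypothetical illustration, not Scribe's config loader.

```python
from typing import Any


def merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)  # merge nested objects
        else:
            result[key] = value  # scalar (or new key): later value wins
    return result


# Nested-dict form of the monitoring.signals example; HOCON dotted keys such
# as server-error-rate.degraded become nested objects.
monitoring_signals = {
    "min-requests": 200,
    "server-error-rate": {"degraded": 0.5, "critical": 2.0},
    "client-error-rate": {"noisy": 10.0},
}
ldap_override = {"latency-p99": {"degraded": 0.3}}

ldap_signals = merge(monitoring_signals, ldap_override)
print(ldap_signals)
```

The LDAP channel inherits every shared threshold and changes only the p99 latency limit. The base object is left untouched for other channels.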
Client error logging
By default, client errors pass through log rules and may be filtered out. Server errors and auth failures are always logged, because you want to see those regardless.
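The default logging decision fits in a small predicate. This Python sketch is only an illustration. The function and parameter names are hypothetical, and passes_log_rules stands in for whatever your configured log rules decide.

```python
def should_log(is_server_error: bool, is_auth_failure: bool,
               passes_log_rules: bool, filter_client_errors: bool = True) -> bool:
    """Decide whether an error is logged (hypothetical model of the behavior described)."""
    if is_server_error or is_auth_failure:
        return True  # always logged, regardless of log rules
    if not filter_client_errors:
        return True  # filtering disabled: every client error is logged
    return passes_log_rules  # default: client errors are subject to log rules
```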
To log all client errors (useful when debugging a broken caller):
monitoring.log.filter-client-errors = false
PromQL recipes
PromQL examples live in the reference: PromQL Recipes.