Health and Monitoring

Is IdentityScribe healthy? This page shows you how to check, what the numbers mean, and what to do when they look wrong.

For the full list of /observe/* endpoints and their response shapes, see the Endpoints Reference.

Scribe ships with a web interface. Point your browser at the running instance — no external tools needed.

Portal (/) shows system status at a glance: version, uptime, golden signals strip, entry counts, and change feed summary.

Operator UI (/ui/) goes deeper: entry counts by type, live sparklines for ingest activity, pressure indicators, and quick links to REST, GraphQL, and Observe views.

Entries browser (/ui/entries) lets you search, filter, and inspect directory entries. You can view point-in-time snapshots, compare versions, and browse the change timeline for any entry.

Observe dashboard (/ui/observe) shows pressure gauges, golden signals, service health, and the Operator Copilot — which surfaces recommendations when it detects problems.

What you need              Where to go
-------------              -----------
Browse entries             /ui/entries
Track changes              /ui/changes
Quick health spot-check    /ui/observe
Full health report         /ui/observe/doctor
Production alerting        Grafana bundle

Examples use port 8080 (the default). The bundled monitoring stack uses port 9001 via socket separation. These endpoints are also listed in the Endpoints Reference with full response schemas.

/observe/doctor is the smartest health check: it returns threshold-based assessments, per-service status, and actionable recommendations.

curl -s http://localhost:8080/observe/doctor | jq
Status      Meaning
------      -------
healthy     All checks pass
degraded    Warning thresholds exceeded — investigate soon
critical    Critical thresholds exceeded — act now
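
In scripts, it helps to turn the doctor verdict into an exit code so cron or CI can alert on anything short of healthy. A minimal sketch; the top-level "status" field name is an assumption, so verify it against the Endpoints Reference before wiring this into alerting:

```shell
#!/bin/sh
# Map the doctor verdict to an exit code for cron/CI alerting.
# The "status" field name is an assumption -- check the Endpoints
# Reference for the authoritative response schema.
classify() {
  case "$1" in
    healthy)  echo 0 ;;
    degraded) echo 1 ;;
    critical) echo 2 ;;
    *)        echo 3 ;;   # unknown verdict or endpoint unreachable
  esac
}

# Live usage would be:  report=$(curl -s http://localhost:8080/observe/doctor)
report='{"status":"degraded","checks":[]}'   # hypothetical sample payload

status=$(printf '%s' "$report" | jq -r '.status')
code=$(classify "$status")
echo "doctor status: $status (exit code $code)"
```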

/observe/pressure focuses on saturation metrics only. It shows a per-entry-type breakdown of queue and task pressure, plus query_permit_queue (threads waiting for permits). Use it when you want a quick saturation check without the full doctor report.

curl -s http://localhost:8080/observe/pressure | jq

For Kubernetes liveness, readiness, and startup checks, see the probe configuration in the deployment guide.

Endpoint     Purpose
--------     -------
/livez       Is the process alive?
/readyz      Can it serve traffic?
/startedz    Have services initialized?
/healthz     Combined health check

Append ?verbose for detailed output.
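
As a rough sketch, the probes map onto Kubernetes like this. Ports and timings here are illustrative assumptions only; the probe configuration in the deployment guide is authoritative:

```yaml
# Illustrative sketch -- use the deployment guide's values in production.
livenessProbe:
  httpGet: { path: /livez, port: 8080 }
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /readyz, port: 8080 }
  periodSeconds: 5
startupProbe:
  httpGet: { path: /startedz, port: 8080 }
  failureThreshold: 30
  periodSeconds: 2
```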

Pressure gauges

Four metrics that tell you if the system is saturated.

[Animated dashboard cycling through an operational scenario: steady state, traffic spike, cascade, full saturation, and recovery. Each gauge shows pressure from 0 to 1 with green, amber, and red zones: Healthy (< 0.5), Elevated (0.5–0.8), Saturated (> 0.8).]

Pressure gauges are the first thing to check when something feels off. Each one measures how close a resource is to saturation. For the full metric definitions with labels, units, and scrape details, see the Telemetry Reference.

Metric                         Range   Warning   Critical   What it means
------                         -----   -------   --------   -------------
scribe_query_permit_pressure   0..1    0.8       0.95       Query capacity used
scribe_query_permit_queue      0..N    5         10         Threads waiting for query permits
scribe_ingest_queue_pressure   0..1    0.8       0.95       Ingest buffer fill level
scribe_ingest_task_pressure    ~1      1.2       2.0        Processing keep-up ratio (>1 = falling behind)
jvm_memory_pressure            0..1    0.8       0.95       Heap utilization

Look for sustained elevation, not spikes. A brief spike during a bulk import is normal. Sustained high pressure with rising latency or rejections means you need to act.
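
If you alert from the bundled monitoring stack, you can encode "sustained, not spikes" as a hold duration on the alerting rule. A sketch assuming Prometheus-style rules; the rule name, duration, and labels are illustrative:

```yaml
# Sketch: fire only after pressure stays above warning for 10 minutes,
# so a brief bulk-import spike does not page anyone.
groups:
  - name: scribe-pressure
    rules:
      - alert: ScribeQueryPermitPressureHigh
        expr: scribe_query_permit_pressure > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Query permit pressure above 0.8 for 10 minutes"
```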

Metric                         Threshold
------                         ---------
scribe_db_connections_pending  > 0 means pool saturation
scribe_ingest_lag_seconds      > 60s warning, > 300s critical
scribe_query_rejected_5m       > 0 means queries are being dropped
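
The lag thresholds translate directly into a script-side classifier. A sketch; the sample value is made up, and in practice the live number would come from your metrics scrape:

```shell
#!/bin/sh
# Classify ingest lag against the documented thresholds
# (> 60 s warning, > 300 s critical).
lag_level() {
  if   [ "$1" -gt 300 ]; then echo critical
  elif [ "$1" -gt 60  ]; then echo warning
  else echo ok
  fi
}

echo "lag 120s -> $(lag_level 120)"   # sample value, not a live reading
```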

After a deployment or restart:

# 1. Services started?
curl "http://localhost:8080/startedz?verbose"
# 2. Uptime and status
curl -s http://localhost:8080/observe/status | jq
# 3. Sync completed?
curl "http://localhost:8080/readyz?verbose"
# 4. Any recent restarts?
curl -s http://localhost:8080/observe/doctor | jq '.checks[] | select(.name == "services.restarts_5m")'

If data looks stale, check ingest status:

curl -s http://localhost:8080/observe/stats/ingest | jq
Symptom                        Likely cause                 Action
-------                        ------------                 ------
High lag, low task pressure    Source LDAP slow             Check source LDAP performance
High lag, high task pressure   Processing bottleneck        Increase workers or check DB
High queue pressure            DB writes slow               Check connections, run VACUUM ANALYZE
ingest_completed = false       Initial sync still running   Check /readyz
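
The first two rows of the triage table can be sketched as a small decision helper. Thresholds are borrowed from the tables above (60 s lag, 1.2 task pressure); the output wording is illustrative, not a documented API:

```shell
#!/bin/sh
# Suggest where to look first, given ingest lag (seconds) and task pressure.
# Thresholds follow the documented warning levels; treat as a sketch.
triage() {
  lag=$1; task=$2
  high_lag=$(awk -v l="$lag" 'BEGIN { print (l > 60) }')
  high_task=$(awk -v t="$task" 'BEGIN { print (t > 1.2) }')
  if [ "$high_lag" = 1 ] && [ "$high_task" = 1 ]; then
    echo "processing bottleneck: add workers or check DB"
  elif [ "$high_lag" = 1 ]; then
    echo "source LDAP slow: check source performance"
  else
    echo "ingest nominal"
  fi
}

triage 300 2.5
```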

Alert: scribe_query_permit_pressure > 0.8

  1. Check /observe/doctor for query.permit_pressure
  2. Find slow queries via traces or stage durations
  3. Consider raising database.channelPoolSize
  4. Check /observe/hints?severity=warning for missing indexes

Alert: scribe_ingest_task_pressure > 1.2

  1. Check /observe/stats/ingest for per-entry-type breakdown
  2. Verify source LDAP connectivity
  3. Run VACUUM ANALYZE if DB is slow
  4. Consider raising ingest workers

Alert: jvm_memory_pressure > 0.8

  1. Check /observe/doctor context section
  2. Review -Xmx heap size
  3. Profile for leaks if persistent
  4. Restart as last resort

Alert: scribe_service_restarts_5m > 1

  1. Check /observe/doctor for services.restarts_5m
  2. Review logs for crash causes
  3. Check resource limits (CPU, memory)
  4. Check external dependencies (DB, LDAP)