Monitoring
Monitor IdentityScribe health, diagnose issues, and tune performance. Access the built-in portal at / or use the Grafana monitoring bundle for production alerting.
Related:
- Observability — Metrics, traces, and endpoint taxonomy
- REST Channel — API documentation including change history and temporal lookup
- Upgrading IdentityScribe — Probe and endpoint migrations
- HTTP Server Configuration — Socket separation for monitoring traffic
Portal and operator UI
IdentityScribe includes a built-in web interface for operators. No external tools required — just point your browser at the running instance.
Portal (/)
The root page shows system status at a glance:
- System status — version, uptime, health checks, and online/offline indicator
- Golden signals strip — traffic, latency, errors, saturation, ingest rate, and events
- Quick status bar — indexes, services, hints, signatures, and doctor check counts
- Destinations — quick links to Operator UI, Documentation, REST API, and GraphQL
- Directory data — entry type counts and change feed summary (adds, modifies, deletes)
Operator dashboard (/ui/)
The main operator dashboard provides:
- Entry counts by type with quick links to search
- Change rates with live sparklines for ingest activity
- Pressure indicators for query, ingest, and JVM
- Quick access to REST API, GraphQL, and Observe views
- Hints and signatures counts for performance awareness
Entries browser
Browse and inspect directory entries with full search and filtering capabilities.
| Path | Purpose |
|---|---|
| /ui/entries | Entry type chooser — select which type to browse |
| /ui/entries/{type} | Search and filter entries with query builder |
| /ui/entries/{type}/{id} | Entry detail view with attributes, relations, and history |
| /ui/entries/{type}/{id}?at=... | Point-in-time view — see entry state at a specific moment |
| /ui/entries/{type}/{id}/changes | Per-entry change timeline with diff viewing |
| /ui/entries/{type}/{id}/diff | Compare entry state between two timestamps |
Features:
- Query builder with FleX filter syntax support
- Attribute inspection with copy-to-clipboard
- Point-in-time navigation via the timeline or the ?at= parameter
- Diff viewing between any two versions
Changes feed
Track changes across all entries in real time.
| Path | Purpose |
|---|---|
| /ui/changes | Global change feed with event type filtering |
Features:
- Event type filtering — show only adds, modifies, moves, or deletes
- Time range filtering — focus on recent changes or specific periods
- Entry linking — click through to entry detail and history views
- Patch/merge views — see exactly what changed in each event
Observe views
The Observe section (/ui/observe) provides system health and performance monitoring.
Main dashboard (/ui/observe)
The observe dashboard provides:
- System status: Uptime, version, and service health at a glance
- Golden signals strip: Throughput, error rate, and latency metrics
- Pressure gauges: Visual indicators for query permit, ingest queue/task, and JVM memory utilization
- Health summary: Auto-refreshing status card (30-second default interval)
- Operator Copilot: AI-driven recommendations based on current system state
Drilldown views
Click through to specialized views for deeper investigation:
| Path | Purpose |
|---|---|
| /ui/observe/doctor | Full health report with all registered checks and thresholds |
| /ui/observe/services | Service lifecycle status and restart counts |
| /ui/observe/query-pipeline | Query stage breakdown with timing and pressure metrics |
| /ui/observe/signatures | Query signature analysis for pattern identification |
| /ui/observe/hints | Performance optimization hints and recommendations |
| /ui/observe/ingest | Ingest metrics per entry type (users, groups, etc.) |
| /ui/observe/indexes | Index build status and progress, unused index detection |
| /ui/observe/stats | Statistics overview (entries, values, events, slow queries) |
| /ui/observe/jvm | JVM metrics (memory, threads, GC) |
Features
- Auto-refresh: Dashboard updates automatically every 30 seconds (configurable)
- Operator Copilot: AI-powered recommendations appear when issues are detected
- No authentication required: Uses the same access controls as the /observe/* endpoints
- Mobile-friendly: Responsive layout works on tablets and phones
See the interactive API documentation at /observe for the full operational endpoints specification.
When to use
| Scenario | Recommended Tool |
|---|---|
| Browse and inspect entries | Operator UI (/ui/entries) |
| Track recent changes | Operator UI (/ui/changes) |
| Investigate entry history | Operator UI (/ui/entries/{type}/{id}/changes) |
| Quick spot-check during deployment | Operator UI (/ui/observe) |
| Ad-hoc troubleshooting | Operator UI drilldown views |
| Production monitoring with alerts | Grafana Monitoring Bundle |
| Historical analysis and trending | Grafana + Prometheus |
Channel discovery
The /observe/channels endpoint exposes enabled channels, their runtime binding information, and HTTP socket configuration. Useful for service discovery, UIs, and debugging.
```bash
curl -s http://localhost:8080/observe/channels | jq
```
Example response:
```json
{
  "channels": {
    "ldap": {
      "enabled": true,
      "running": true,
      "bindings": [
        {
          "host": "0.0.0.0",
          "configuredPort": 0,
          "actualPort": 10389,
          "ssl": false,
          "url": "ldap://0.0.0.0:10389"
        }
      ]
    },
    "rest": {
      "enabled": true,
      "sockets": ["@default"],
      "basePath": "/api"
    }
  },
  "sockets": {
    "@default": {
      "host": "0.0.0.0",
      "configuredPort": 8080,
      "actualPort": 8080,
      "ssl": false,
      "url": "http://localhost:8080"
    }
  },
  "request": {
    "socket": "@default",
    "host": "localhost",
    "port": 8080,
    "scheme": "http"
  }
}
```
Key fields:
| Field | Description |
|---|---|
| channels.ldap.running | Whether LDAP is actively listening |
| channels.ldap.bindings[].actualPort | Runtime-assigned port (important for ephemeral port 0) |
| channels.ldap.bindings[].url | Ready-to-use connection URL |
| sockets.<name>.actualPort | HTTP socket runtime port |
| sockets.<name>.url | Auto-generated base URL (proxy-aware for current socket) |
| request | Request context showing detected host/scheme from headers |
Use cases:
- Service discovery: Fetch socket URLs dynamically for UI configuration
- Ephemeral ports: Test environments using port 0 can discover actual assigned ports
- Proxy debugging: Verify that X-Forwarded-* headers are correctly interpreted
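As a concrete example, a test harness that starts IdentityScribe on an ephemeral port (configuredPort: 0) could discover the runtime LDAP address from this endpoint. This is a minimal sketch: the jq path follows the example response above, and the SCRIBE_URL variable and its default base URL are assumptions.

```bash
#!/usr/bin/env bash
# Discover the runtime LDAP URL from /observe/channels (useful when configuredPort is 0).
set -euo pipefail

BASE_URL="${SCRIBE_URL:-http://localhost:8080}"   # assumed base URL of the instance

# Take the first LDAP binding's ready-to-use connection URL from the channels report.
LDAP_URL=$(curl -sf "$BASE_URL/observe/channels" | jq -r '.channels.ldap.bindings[0].url')

echo "LDAP is listening at: $LDAP_URL"
```

The same pattern applies to HTTP sockets via the sockets.<name>.url field.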
Configuration inspection
The /observe/config endpoint exposes the resolved configuration with passwords redacted. It is equivalent to the --printconfig CLI option, but accessible at runtime.
```bash
# JSON response (default)
curl -s http://localhost:8080/observe/config | jq

# Plain text response (raw HOCON)
curl -s -H "Accept: text/plain" http://localhost:8080/observe/config
```
Example JSON response:
```json
{
  "config": "Configuration Sources:\n\n- system properties\n- reference.conf\n\napp {\n mode = production\n}\ndatabase {\n \"password\" : \"<REDACTED>\"\n host = \"localhost\"\n}",
  "timestamp": "2026-01-13T12:00:00Z"
}
```
Key features:
| Feature | Description |
|---|---|
| Password redaction | All password fields show <REDACTED> |
| Configuration sources | Shows merge order (system props, env vars, files) |
| Lazy caching | Computed once on first request |
| Cache-Control | max-age=3600, private (1-hour client-side TTL) |
| Content negotiation | Accept: text/plain returns raw HOCON |
Use cases:
- Debugging: Verify config without shell access to production
- Support diagnostics: Share config safely (passwords redacted)
- CI/CD verification: Confirm environment variables are applied correctly
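For instance, a CI/CD step could confirm that an environment override actually landed by grepping the plain-text (HOCON) form. This is a sketch only; the mode = production setting is taken from the example response above, and SCRIBE_URL is an assumed variable.

```bash
#!/usr/bin/env bash
# Fail a pipeline step if the resolved configuration does not contain the expected setting.
set -euo pipefail

BASE_URL="${SCRIBE_URL:-http://localhost:8080}"   # assumed base URL of the instance

# Fetch the resolved configuration as raw HOCON (passwords already redacted by the server).
CONFIG=$(curl -sf -H "Accept: text/plain" "$BASE_URL/observe/config")

if grep -q 'mode = production' <<< "$CONFIG"; then
  echo "Configuration check passed"
else
  echo "Configuration check failed: expected 'mode = production'" >&2
  exit 1
fi
```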
Quick health check
Note: Examples below use port 8080 (the default). The bundled monitoring stack (monitoring/docker, monitoring/helm) uses port 9001 for monitoring endpoints via named socket separation.
“Is everything OK?”
Use the new /observe/doctor endpoint for an intelligent health assessment:
```bash
curl -s http://localhost:8080/observe/doctor | jq
```
Example response:
```json
{
  "status": "healthy",
  "checks": [
    {
      "name": "query.permit_pressure",
      "status": "healthy",
      "value": 0.12,
      "threshold": 0.8,
      "message": "Permit utilization is 0.12"
    },
    {
      "name": "ingest.lag_seconds",
      "status": "healthy",
      "value": 2.5,
      "threshold": 60,
      "message": "Seconds behind event head is 2.50"
    }
  ],
  "context": {
    "jvm_thread_count": 42,
    "jvm_cpu_count": 8,
    "process_uptime_seconds": 86400,
    "jvm_memory_pressure": 0.45
  }
}
```
Status values:
| Status | Meaning |
|---|---|
| healthy | All checks passing, no issues detected |
| degraded | Warning thresholds exceeded, investigate soon |
| critical | Critical thresholds exceeded, immediate action needed |
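A simple gate for smoke tests or deployment scripts can exit non-zero unless the overall status is healthy. A minimal sketch based on the response shape shown above (SCRIBE_URL is an assumed variable):

```bash
#!/usr/bin/env bash
# Fail fast unless /observe/doctor reports an overall "healthy" status.
set -euo pipefail

BASE_URL="${SCRIBE_URL:-http://localhost:8080}"   # assumed base URL of the instance

REPORT=$(curl -sf "$BASE_URL/observe/doctor")
STATUS=$(jq -r '.status' <<< "$REPORT")
echo "doctor status: $STATUS"

if [ "$STATUS" != "healthy" ]; then
  # Print the non-healthy checks to aid triage, then signal failure.
  jq '.checks[] | select(.status != "healthy")' <<< "$REPORT"
  exit 1
fi
```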
Pressure-focused health check
For a quick view of just the pressure metrics (resource saturation signals), use /observe/pressure:
```bash
curl -s http://localhost:8080/observe/pressure | jq
```
Example response:
```json
{
  "status": "healthy",
  "metrics": [
    { "name": "query_permit", "value": 0.45, "status": "healthy", "meaning": "Query permit utilization (0-1)" },
    { "name": "ingest_queue", "value": 0.12, "status": "healthy", "meaning": "Queue fill ratio (0-1)" },
    { "name": "ingest_task", "value": 0.95, "status": "healthy", "meaning": "Task pressure (~1 steady, >1 backlog)" },
    { "name": "jvm_memory", "value": 0.62, "status": "healthy", "meaning": "Heap utilization (0-1)" }
  ],
  "entry_types": {
    "user": { "queue": 0.10, "task": 0.95 },
    "group": { "queue": 0.12, "task": 0.88 }
  },
  "query_permit_queue": 0,
  "timestamp": "2026-01-04T12:00:00Z"
}
```
The query_permit_queue field shows how many threads are currently waiting to acquire query permits. Non-zero values indicate contention — queries are queuing behind slower operations. This is particularly useful for diagnosing latency spikes when query_permit pressure appears low but queries are still slow.
Key differences from /observe/doctor:
- Focused only on pressure metrics (saturation signals)
- Per-entry-type breakdown shows queue and task pressure for each transcribe
- Actionable hints appear only when metrics exceed warning thresholds
- Omits metrics with non-finite values (e.g., task pressure during cold start before any tasks complete)
When to use which:
| Endpoint | Use Case |
|---|---|
| /observe/pressure | Quick saturation check, per-entry-type debugging |
| /observe/doctor | Comprehensive health assessment with recommendations |
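For example, a quick shell check can surface the permit queue depth and flag entry types whose ingest queue is filling up. This is a sketch: the field names follow the example response above, and the 0.8 cutoff mirrors the documented queue-pressure warning threshold.

```bash
#!/usr/bin/env bash
# Spot-check saturation: permit queue depth plus entry types with a busy ingest queue.
set -euo pipefail

BASE_URL="${SCRIBE_URL:-http://localhost:8080}"   # assumed base URL of the instance
PRESSURE=$(curl -sf "$BASE_URL/observe/pressure")

# Threads currently waiting for query permits (non-zero means contention).
echo "query_permit_queue: $(jq '.query_permit_queue' <<< "$PRESSURE")"

# Entry types whose ingest queue fill ratio exceeds the 0.8 warning threshold.
jq '.entry_types | to_entries[] | select(.value.queue > 0.8) | {type: .key, queue: .value.queue}' <<< "$PRESSURE"
```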
For simpler checks, use the Kubernetes-style health endpoints:
```bash
# Combined health (services + sync + indexes)
curl http://localhost:8080/healthz?verbose

# Liveness (is the process alive?)
curl http://localhost:8080/livez

# Readiness (can it serve traffic?)
curl http://localhost:8080/readyz?verbose

# Startup (have services initialized?)
curl http://localhost:8080/startedz?verbose
```
Key actionable metrics
These are the “first things to check” when investigating issues.
Pressure gauges (saturation signals)
The most actionable metrics - they indicate resource saturation:
| Metric | Description | Warning | Critical |
|---|---|---|---|
| scribe.query.permit.pressure | Query capacity utilization (0..1) | 0.8 | 0.95 |
| scribe.query.permit.queue | Threads waiting for query permits (count) | 5 | 10 |
| scribe.ingest.queue.pressure | Ingest buffer fill ratio (0..1) | 0.8 | 0.95 |
| scribe.ingest.task.pressure | Processing keep-up ratio (~1 = steady) | 1.2 | 2.0 |
| jvm.memory.pressure | Heap pressure (0..1) | 0.8 | 0.95 |
Pool / connection health
| Metric | Description | Threshold |
|---|---|---|
| scribe.db.connections.pending | Threads waiting for DB connections | > 0 = saturation |
| scribe.db.connections.active | Active DB connections across pools | (info) |
| scribe.ldap.connections.active | LDAP connection count | (info) |
Active work (current load)
| Metric | Description | Threshold |
|---|---|---|
| scribe.channel.inflight | Requests currently processing | (info) |
| scribe.ingest.tasks.active | Active ingest tasks | (info) |
| scribe_query_rejected_5m | Queries rejected in last 5 min | > 0 = issues |
Data freshness
| Metric | Description | Warning | Critical |
|---|---|---|---|
| scribe.ingest.lag.seconds | Seconds behind event head | 60s | 300s |
Deployment / restart verification
After a deployment or restart:
```bash
# 1. Verify services started
curl http://localhost:8080/startedz?verbose
# Expected: "startedz check passed"

# 2. Check uptime and status
curl -s http://localhost:8080/observe/status | jq
# Look for: "status": "OK", uptime_duration

# 3. Verify sync completed (if applicable)
curl http://localhost:8080/readyz?verbose
# Expected: "ingest ok", "indexes ok"

# 4. Check for recent restarts
curl -s http://localhost:8080/observe/doctor | jq '.checks[] | select(.name == "services.restarts_5m")'
# Expected: value = 0
```
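In automated rollouts these checks are often wrapped in a small wait loop that blocks until the instance reports ready and fails the deploy otherwise. A minimal sketch, assuming /readyz returns a non-2xx status until the instance is ready and that the timeout suits your environment:

```bash
#!/usr/bin/env bash
# Block a rollout until IdentityScribe reports ready, or fail after a timeout.
set -euo pipefail

BASE_URL="${SCRIBE_URL:-http://localhost:8080}"   # assumed base URL of the instance
TIMEOUT=120                                        # seconds to wait before giving up (arbitrary)

for ((waited = 0; waited < TIMEOUT; waited += 5)); do
  if curl -sf "$BASE_URL/readyz" > /dev/null; then
    echo "Instance is ready after ${waited}s"
    exit 0
  fi
  sleep 5
done

echo "Instance did not become ready within ${TIMEOUT}s" >&2
curl -s "$BASE_URL/readyz?verbose" || true   # print the verbose report for triage
exit 1
```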
Data freshness troubleshooting
“Why is data stale?”
```bash
# 1. Check ingest status with real-time gauges
curl -s http://localhost:8080/observe/stats/ingest | jq

# Look for the entry_types section with per-type metrics:
# - lag_seconds: how far behind
# - queue_pressure: buffer fill (0..1)
# - task_pressure: processing ratio (~1 = keeping up)
# - ingest_completed: has initial ingest completed?
```
Decision tree:
| Symptom | Likely Cause | Action |
|---|---|---|
| High lag_seconds, low task_pressure | Source LDAP slow | Check source LDAP performance |
| High lag_seconds, high task_pressure | Processing bottleneck | Increase workers or check DB |
| High queue_pressure | DB writes slow | Check DB connections, vacuum |
| ingest_completed = false | Ingest not started | Check /readyz |
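To see which entry types are behind, the per-type metrics can be sorted by lag in one line. A sketch, assuming the entry_types section exposes the lag_seconds field described above:

```bash
# List ingest lag per entry type, worst first (field names as described above).
curl -sf http://localhost:8080/observe/stats/ingest \
  | jq '.entry_types | to_entries | map({type: .key, lag_seconds: .value.lag_seconds}) | sort_by(-.lag_seconds)'
```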
Performance degradation diagnosis
“Why are queries slow?”
```bash
# 1. Check pressure gauges first
curl -s http://localhost:8080/observe/doctor | jq '.checks[] | select(.name | startswith("query.") or startswith("jvm."))'

# 2. Check front-door latency via Prometheus
# scribe_channel_request_duration_seconds{quantile="0.99"}

# 3. Check stage breakdown
# scribe_query_stage_duration_seconds{stage="execute"}
```
Common causes:
| High Pressure | Meaning | Action |
|---|---|---|
| query.permit_pressure > 0.8 | Too many concurrent queries | Increase database.channelPoolSize |
| jvm.memory_pressure > 0.8 | Heap pressure | Increase -Xmx |
| db.connections_pending > 0 | Pool saturation | Increase pool size |
| ingest.task_pressure > 1.5 | Ingest competing with queries | Check write load |
Index build status
Section titled “Index build status”# Check index build statuscurl -s http://localhost:8080/observe/indexes | jq
# With detailscurl -s 'http://localhost:8080/observe/indexes?details=true&limit=20' | jqKey fields:
- ready: true when all required indexes are built
- counts.pending: indexes waiting to build
- counts.running: indexes currently building
- concurrentBuildsInProgress: parallel builds in progress
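During rollouts or bulk reloads it can help to wait until all required indexes are built before shifting traffic. A minimal sketch built on the ready and counts fields above (the timeout and poll interval are arbitrary):

```bash
#!/usr/bin/env bash
# Poll /observe/indexes until ready = true, or give up after a timeout.
set -euo pipefail

BASE_URL="${SCRIBE_URL:-http://localhost:8080}"   # assumed base URL of the instance
TIMEOUT=600                                        # seconds (arbitrary)

for ((waited = 0; waited < TIMEOUT; waited += 10)); do
  if [ "$(curl -sf "$BASE_URL/observe/indexes" | jq -r '.ready')" = "true" ]; then
    echo "All required indexes are built (after ${waited}s)"
    exit 0
  fi
  # Show pending/running counts while waiting.
  curl -sf "$BASE_URL/observe/indexes" | jq '{pending: .counts.pending, running: .counts.running}'
  sleep 10
done

echo "Indexes not ready within ${TIMEOUT}s" >&2
exit 1
```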
Database performance monitoring
IdentityScribe exposes PostgreSQL performance insights via the observe API.
Index usage analysis
Check for unused or underutilized indexes:
```bash
curl -s http://localhost:8080/observe/stats/index-usage | jq
```
Example response:
```json
{
  "indexes": [
    {
      "schemaName": "public",
      "tableName": "entries",
      "indexName": "idx_entries_dn",
      "idxScan": 15420,
      "idxTupRead": 30840,
      "idxTupFetch": 30840,
      "size": "2 MB",
      "sizeBytes": 2097152,
      "usageStatus": "active"
    }
  ],
  "unusedCount": 2,
  "totalSizeUnused": "512 kB",
  "totalSizeUnusedBytes": 524288,
  "timestamp": "2026-01-24T12:00:00Z"
}
```
Usage status values:
| Status | Meaning | Action |
|---|---|---|
| active | Index is actively used (100+ scans) | Keep |
| low | Index has few scans (1-99) | Monitor |
| unused | Index has never been scanned | Consider dropping |
Important: Index usage statistics reset on pg_stat_reset() or PostgreSQL restart. After a restart, all indexes will appear “unused” until they are scanned again. Wait for normal traffic patterns before making decisions about unused indexes.
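To get a quick list of drop candidates, the response can be filtered to unused indexes and sorted by size. A sketch using the field names from the example response above; interpret it with the reset caveat in mind:

```bash
# List unused indexes with their on-disk sizes, largest first.
curl -sf http://localhost:8080/observe/stats/index-usage \
  | jq '[.indexes[] | select(.usageStatus == "unused")
         | {index: .indexName, table: .tableName, size: .size, sizeBytes: .sizeBytes}]
        | sort_by(-.sizeBytes)'
```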
Slow query analysis
Requires the pg_stat_statements extension in PostgreSQL:
```bash
curl -s http://localhost:8080/observe/stats/queries | jq
```
If you see "error": "pg_stat_statements extension not available", see the Enabling pg_stat_statements guide.
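A quick way to check whether the extension is in place before digging into results (a sketch based on the documented error field; it assumes the instance is reachable):

```bash
# Report whether slow-query statistics are available on this instance.
if curl -sf http://localhost:8080/observe/stats/queries | jq -e '.error' > /dev/null; then
  echo "pg_stat_statements is not available; slow query analysis is disabled"
else
  echo "pg_stat_statements is available"
fi
```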
Alert runbooks
High permit pressure
Alert: scribe_query_permit_pressure > 0.8
Symptoms: Slow queries, rejected requests
Actions:
- Check /observe/doctor for query.permit_pressure
- Identify slow queries via traces or stage durations
- Consider increasing database.channelPoolSize
- Check for missing indexes: /observe/hints?severity=warning
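The missing-index check can be run directly from the shell (a sketch using the severity filter shown above):

```bash
# Fetch current performance hints filtered to warning severity.
curl -sf 'http://localhost:8080/observe/hints?severity=warning' | jq
```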
Ingest falling behind
Alert: scribe_ingest_task_pressure > 1.2
Symptoms: Stale data, high lag_seconds
Actions:
- Check /observe/stats/ingest for the per-entry-type breakdown
- Check source LDAP connectivity
- Run VACUUM ANALYZE if the DB is slow
- Consider increasing ingest workers
Memory pressure elevated
Alert: jvm_memory_pressure > 0.8
Symptoms: GC pauses, slow response times
Actions:
- Check the /observe/doctor context section
- Review heap size configuration (-Xmx)
- Check for memory leaks via profiler
- Consider restarting if persistent
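The relevant context can be pulled in one call when triaging. A sketch; the field names follow the /observe/doctor example earlier on this page:

```bash
# Show JVM-related context from the doctor report: memory pressure, thread count, uptime.
curl -sf http://localhost:8080/observe/doctor \
  | jq '.context | {jvm_memory_pressure, jvm_thread_count, process_uptime_seconds}'
```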
Service flapping
Alert: scribe_service_restarts_5m > 1
Symptoms: Intermittent failures, connection errors
Actions:
- Check /observe/doctor for services.restarts_5m
- Review logs for crash causes
- Check resource limits (CPU, memory)
- Check external dependencies (DB, LDAP)
Prometheus queries
Common PromQL queries for dashboards:
```promql
# Query throughput
rate(scribe_channel_requests_total[5m])

# Query latency p99
histogram_quantile(0.99, rate(scribe_channel_request_duration_seconds_bucket[5m]))

# Ingest rate
rate(scribe_ingest_events_written_total[5m])

# Error rate
sum(rate(scribe_channel_requests_total{result!="ok"}[5m])) / sum(rate(scribe_channel_requests_total[5m]))

# Pressure gauges (direct)
scribe_query_permit_pressure
scribe_ingest_queue_pressure
jvm_memory_pressure
```
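These expressions can also be evaluated ad hoc against the Prometheus HTTP API, for example when validating a dashboard panel. A sketch; the Prometheus address (PROM_URL) is an assumption for your environment:

```bash
# Evaluate the error-rate expression directly against the Prometheus query API.
PROM_URL="${PROM_URL:-http://localhost:9090}"   # assumed Prometheus base URL

curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=sum(rate(scribe_channel_requests_total{result!="ok"}[5m])) / sum(rate(scribe_channel_requests_total[5m]))' \
  | jq '.data.result'
```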
Related documentation
- Observability - Metrics, traces, and endpoint taxonomy
- Deployment Guide - Installation and configuration