Monitoring

Monitor IdentityScribe health, diagnose issues, and tune performance. Access the built-in portal at / or use the Grafana monitoring bundle for production alerting.

IdentityScribe includes a built-in web interface for operators. No external tools required — just point your browser at the running instance.

The root page shows system status at a glance:

  • System status — version, uptime, health checks, and online/offline indicator
  • Golden signals strip — traffic, latency, errors, saturation, ingest rate, and events
  • Quick status bar — indexes, services, hints, signatures, and doctor check counts
  • Destinations — quick links to Operator UI, Documentation, REST API, and GraphQL
  • Directory data — entry type counts and change feed summary (adds, modifies, deletes)

The main operator dashboard provides:

  • Entry counts by type with quick links to search
  • Change rates with live sparklines for ingest activity
  • Pressure indicators for query, ingest, and JVM
  • Quick access to REST API, GraphQL, and Observe views
  • Hints and signatures counts for performance awareness

Browse and inspect directory entries with full search and filtering capabilities.

Path                               Purpose
/ui/entries                        Entry type chooser — select which type to browse
/ui/entries/{type}                 Search and filter entries with query builder
/ui/entries/{type}/{id}            Entry detail view with attributes, relations, and history
/ui/entries/{type}/{id}?at=...     Point-in-time view — see entry state at a specific moment
/ui/entries/{type}/{id}/changes    Per-entry change timeline with diff viewing
/ui/entries/{type}/{id}/diff       Compare entry state between two timestamps

Features:

  • Query builder with FleX filter syntax support
  • Attribute inspection with copy-to-clipboard
  • Point-in-time navigation via timeline or ?at= parameter
  • Diff viewing between any two versions

Track changes across all entries in real time.

Path           Purpose
/ui/changes    Global change feed with event type filtering

Features:

  • Event type filtering — show only adds, modifies, moves, or deletes
  • Time range filtering — focus on recent changes or specific periods
  • Entry linking — click through to entry detail and history views
  • Patch/merge views — see exactly what changed in each event

The Observe section (/ui/observe) provides system health and performance monitoring.

The observe dashboard provides:

  • System status: Uptime, version, and service health at a glance
  • Golden signals strip: Throughput, error rate, and latency metrics
  • Pressure gauges: Visual indicators for query permit, ingest queue/task, and JVM memory utilization
  • Health summary: Auto-refreshing status card (30-second default interval)
  • Operator Copilot: AI-driven recommendations based on current system state

Click through to specialized views for deeper investigation:

Path                          Purpose
/ui/observe/doctor            Full health report with all registered checks and thresholds
/ui/observe/services          Service lifecycle status and restart counts
/ui/observe/query-pipeline    Query stage breakdown with timing and pressure metrics
/ui/observe/signatures        Query signature analysis for pattern identification
/ui/observe/hints             Performance optimization hints and recommendations
/ui/observe/ingest            Ingest metrics per entry type (users, groups, etc.)
/ui/observe/indexes           Index build status and progress, unused index detection
/ui/observe/stats             Statistics overview (entries, values, events, slow queries)
/ui/observe/jvm               JVM metrics (memory, threads, GC)

  • Auto-refresh: Dashboard updates automatically every 30 seconds (configurable)
  • Operator Copilot: AI-powered recommendations appear when issues are detected
  • No authentication required: Uses the same access controls as /observe/* endpoints
  • Mobile-friendly: Responsive layout works on tablets and phones

See the interactive API documentation at /observe for the full specification of the operational endpoints.

Scenario                              Recommended Tool
Browse and inspect entries            Operator UI (/ui/entries)
Track recent changes                  Operator UI (/ui/changes)
Investigate entry history             Operator UI (/ui/entries/{type}/{id}/changes)
Quick spot-check during deployment    Operator UI (/ui/observe)
Ad-hoc troubleshooting                Operator UI drilldown views
Production monitoring with alerts     Grafana Monitoring Bundle
Historical analysis and trending      Grafana + Prometheus


The /observe/channels endpoint exposes enabled channels, their runtime binding information, and HTTP socket configuration. Useful for service discovery, UIs, and debugging.

Terminal window
curl -s http://localhost:8080/observe/channels | jq

Example response:

{
  "channels": {
    "ldap": {
      "enabled": true,
      "running": true,
      "bindings": [
        { "host": "0.0.0.0", "configuredPort": 0, "actualPort": 10389, "ssl": false, "url": "ldap://0.0.0.0:10389" }
      ]
    },
    "rest": { "enabled": true, "sockets": ["@default"], "basePath": "/api" }
  },
  "sockets": {
    "@default": { "host": "0.0.0.0", "configuredPort": 8080, "actualPort": 8080, "ssl": false, "url": "http://localhost:8080" }
  },
  "request": { "socket": "@default", "host": "localhost", "port": 8080, "scheme": "http" }
}

Key fields:

Field                                  Description
channels.ldap.running                  Whether LDAP is actively listening
channels.ldap.bindings[].actualPort    Runtime-assigned port (important for ephemeral port 0)
channels.ldap.bindings[].url           Ready-to-use connection URL
sockets.<name>.actualPort              HTTP socket runtime port
sockets.<name>.url                     Auto-generated base URL (proxy-aware for current socket)
request                                Request context showing detected host/scheme from headers

Use cases:

  • Service discovery: Fetch socket URLs dynamically for UI configuration
  • Ephemeral ports: Test environments using port 0 can discover actual assigned ports
  • Proxy debugging: Verify X-Forwarded-* headers are correctly interpreted
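
For the service-discovery case above, a startup or test script can read the runtime URLs instead of hard-coding ports. A minimal sketch, assuming the response shape shown in the example (a single LDAP binding and the @default socket):

Terminal window
# Discover the runtime LDAP and HTTP URLs from the channels endpoint
LDAP_URL=$(curl -s http://localhost:8080/observe/channels | jq -r '.channels.ldap.bindings[0].url')
HTTP_URL=$(curl -s http://localhost:8080/observe/channels | jq -r '.sockets["@default"].url')
echo "LDAP: $LDAP_URL"
echo "HTTP: $HTTP_URL"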

The /observe/config endpoint exposes the resolved configuration with passwords redacted. It is equivalent to the --printconfig CLI option, but accessible at runtime.

Terminal window
# JSON response (default)
curl -s http://localhost:8080/observe/config | jq
# Plain text response (raw HOCON)
curl -s -H "Accept: text/plain" http://localhost:8080/observe/config

Example JSON response:

{
  "config": "Configuration Sources:\n\n- system properties\n- reference.conf\n\napp {\n mode = production\n}\ndatabase {\n \"password\" : \"<REDACTED>\"\n host = \"localhost\"\n}",
  "timestamp": "2026-01-13T12:00:00Z"
}

Key features:

Feature                  Description
Password redaction       All password fields show <REDACTED>
Configuration sources    Shows merge order (system props, env vars, files)
Lazy caching             Computed once on first request
Cache-Control            max-age=3600, private (1-hour client-side TTL)
Content negotiation      Accept: text/plain returns raw HOCON

Use cases:

  • Debugging: Verify config without shell access to production
  • Support diagnostics: Share config safely (passwords redacted)
  • CI/CD verification: Confirm environment variables are applied correctly
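
For the CI/CD verification case above, the text/plain variant is easy to assert against. A minimal sketch; the key checked here (mode = production) is taken from the example response, so substitute a setting that matters in your deployment:

Terminal window
# Fail fast if the resolved configuration does not contain the expected setting
curl -sf -H "Accept: text/plain" http://localhost:8080/observe/config \
  | grep -q 'mode = production' || { echo "unexpected configuration"; exit 1; }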

Note: Examples below use port 8080 (the default). The bundled monitoring stack (monitoring/docker, monitoring/helm) uses port 9001 for monitoring endpoints via named socket separation.

“Is everything OK?”

Use the new /observe/doctor endpoint for an intelligent health assessment:

Terminal window
curl -s http://localhost:8080/observe/doctor | jq

Example response:

{
  "status": "healthy",
  "checks": [
    {
      "name": "query.permit_pressure",
      "status": "healthy",
      "value": 0.12,
      "threshold": 0.8,
      "message": "Permit utilization is 0.12"
    },
    {
      "name": "ingest.lag_seconds",
      "status": "healthy",
      "value": 2.5,
      "threshold": 60,
      "message": "Seconds behind event head is 2.50"
    }
  ],
  "context": {
    "jvm_thread_count": 42,
    "jvm_cpu_count": 8,
    "process_uptime_seconds": 86400,
    "jvm_memory_pressure": 0.45
  }
}

Status values:

Status      Meaning
healthy     All checks passing, no issues detected
degraded    Warning thresholds exceeded, investigate soon
critical    Critical thresholds exceeded, immediate action needed
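
These statuses make /observe/doctor easy to use as a deployment gate. A minimal sketch that fails unless the report is healthy and prints the offending checks otherwise (you may prefer to treat degraded as a warning rather than a hard failure):

Terminal window
# Gate a pipeline step on the doctor verdict
REPORT=$(curl -s http://localhost:8080/observe/doctor)
STATUS=$(echo "$REPORT" | jq -r '.status')
if [ "$STATUS" != "healthy" ]; then
  echo "doctor status: $STATUS"
  echo "$REPORT" | jq '.checks[] | select(.status != "healthy")'
  exit 1
fi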

For a quick view of just the pressure metrics (resource saturation signals), use /observe/pressure:

Terminal window
curl -s http://localhost:8080/observe/pressure | jq

Example response:

{
  "status": "healthy",
  "metrics": [
    { "name": "query_permit", "value": 0.45, "status": "healthy", "meaning": "Query permit utilization (0-1)" },
    { "name": "ingest_queue", "value": 0.12, "status": "healthy", "meaning": "Queue fill ratio (0-1)" },
    { "name": "ingest_task", "value": 0.95, "status": "healthy", "meaning": "Task pressure (~1 steady, >1 backlog)" },
    { "name": "jvm_memory", "value": 0.62, "status": "healthy", "meaning": "Heap utilization (0-1)" }
  ],
  "entry_types": {
    "user": { "queue": 0.10, "task": 0.95 },
    "group": { "queue": 0.12, "task": 0.88 }
  },
  "query_permit_queue": 0,
  "timestamp": "2026-01-04T12:00:00Z"
}

The query_permit_queue field shows how many threads are currently waiting to acquire query permits. Non-zero values indicate contention—queries are queuing behind slower operations. This is particularly useful for diagnosing latency spikes when query_permit pressure appears low but queries are still slow.
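
To watch permit contention develop during a latency spike, you can poll /observe/pressure and print the queue depth next to the gauges. A minimal sketch using the fields from the example response above; adjust the interval and stop with Ctrl-C:

Terminal window
# Print one line per sample: timestamp, permit queue depth, and each pressure gauge
while true; do
  curl -s http://localhost:8080/observe/pressure \
    | jq -r '[.timestamp, "queue=\(.query_permit_queue)"] + [.metrics[] | "\(.name)=\(.value)"] | join(" ")'
  sleep 5
done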

Key differences from /observe/doctor:

  • Focused only on pressure metrics (saturation signals)
  • Per-entry-type breakdown shows queue and task pressure for each entry type
  • Actionable hints appear only when metrics exceed warning thresholds
  • Omits metrics with non-finite values (e.g., task pressure during cold start before any tasks complete)

When to use which:

Endpoint             Use Case
/observe/pressure    Quick saturation check, per-entry-type debugging
/observe/doctor      Comprehensive health assessment with recommendations

For simpler checks, use the Kubernetes-style health endpoints:

Terminal window
# Combined health (services + sync + indexes)
curl http://localhost:8080/healthz?verbose
# Liveness (is the process alive?)
curl http://localhost:8080/livez
# Readiness (can it serve traffic?)
curl http://localhost:8080/readyz?verbose
# Startup (have services initialized?)
curl http://localhost:8080/startedz?verbose

These are the “first things to check” when investigating issues.
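
A deployment script can also poll /readyz before running smoke tests. A minimal sketch with a roughly five-minute timeout; adjust to your environment:

Terminal window
# Wait up to ~5 minutes for the instance to report ready
for i in $(seq 1 60); do
  if curl -sf http://localhost:8080/readyz > /dev/null; then
    echo "ready"
    exit 0
  fi
  sleep 5
done
echo "timed out waiting for /readyz"
exit 1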

The pressure gauges are the most actionable metrics; they indicate resource saturation:

Metric                          Description                                  Warning   Critical
scribe.query.permit.pressure    Query capacity utilization (0..1)            0.8       0.95
scribe.query.permit.queue       Threads waiting for query permits (count)    5         10
scribe.ingest.queue.pressure    Ingest buffer fill ratio (0..1)              0.8       0.95
scribe.ingest.task.pressure     Processing keep-up ratio (~1 = steady)       1.2       2.0
jvm.memory.pressure             Heap pressure (0..1)                         0.8       0.95

Connection metrics:

Metric                          Description                          Threshold
scribe.db.connections.pending   Threads waiting for DB connections   > 0 = saturation
scribe.db.connections.active    Active DB connections across pools   (info)
scribe.ldap.connections.active  LDAP connection count                (info)

Activity metrics:

Metric                      Description                      Threshold
scribe.channel.inflight     Requests currently processing    (info)
scribe.ingest.tasks.active  Active ingest tasks              (info)
scribe_query_rejected_5m    Queries rejected in last 5 min   > 0 = issues

Freshness metrics:

Metric                       Description                 Warning   Critical
scribe.ingest.lag.seconds    Seconds behind event head   60s       300s
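
If you just need a shell-friendly check of these saturation signals, the per-metric status field in /observe/pressure should already reflect the same warning and critical thresholds, so a quick filter is enough. A minimal sketch:

Terminal window
# List any pressure gauge that is not currently healthy
curl -s http://localhost:8080/observe/pressure \
  | jq -r '.metrics[] | select(.status != "healthy") | "\(.name)=\(.value) (\(.status))"'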

After a deployment or restart:

Terminal window
# 1. Verify services started
curl http://localhost:8080/startedz?verbose
# Expected: "startedz check passed"
# 2. Check uptime and status
curl -s http://localhost:8080/observe/status | jq
# Look for: "status": "OK", uptime_duration
# 3. Verify sync completed (if applicable)
curl http://localhost:8080/readyz?verbose
# Expected: "ingest ok", "indexes ok"
# 4. Check for recent restarts
curl -s http://localhost:8080/observe/doctor | jq '.checks[] | select(.name == "services.restarts_5m")'
# Expected: value = 0

“Why is data stale?”

Terminal window
# 1. Check ingest status with real-time gauges
curl -s http://localhost:8080/observe/stats/ingest | jq
# Look for entry_types section with per-type metrics:
# - lag_seconds: how far behind
# - queue_pressure: buffer fill (0..1)
# - task_pressure: processing ratio (~1 = keeping up)
# - ingest_completed: has initial ingest completed?

Decision tree:

Symptom                                 Likely Cause            Action
High lag_seconds, low task_pressure     Source LDAP slow        Check source LDAP performance
High lag_seconds, high task_pressure    Processing bottleneck   Increase workers or check DB
High queue_pressure                     DB writes slow          Check DB connections, vacuum
ingest_completed = false                Ingest not started      Check /readyz
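
To pull the per-entry-type numbers into a quick table, a jq one-liner is often enough. A sketch that assumes entry_types maps each type name to the fields listed in the comments above; adjust the paths if your response differs:

Terminal window
# Tabulate per-type ingest lag and pressure (field layout assumed, see note above)
curl -s http://localhost:8080/observe/stats/ingest \
  | jq -r '.entry_types | to_entries[] | "\(.key): lag=\(.value.lag_seconds)s queue=\(.value.queue_pressure) task=\(.value.task_pressure)"'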

“Why are queries slow?”

Terminal window
# 1. Check pressure gauges first
curl -s http://localhost:8080/observe/doctor | jq '.checks[] | select(.name | startswith("query.") or startswith("jvm."))'
# 2. Check front-door latency via Prometheus
# scribe_channel_request_duration_seconds{quantile="0.99"}
# 3. Check stage breakdown
# scribe_query_stage_duration_seconds{stage="execute"}

Common causes:

High Pressure                  Meaning                          Action
query.permit_pressure > 0.8    Too many concurrent queries      Increase database.channelPoolSize
jvm.memory_pressure > 0.8      Heap pressure                    Increase -Xmx
db.connections_pending > 0     Pool saturation                  Increase pool size
ingest.task_pressure > 1.5     Ingest competing with queries    Check write load

Terminal window
# Check index build status
curl -s http://localhost:8080/observe/indexes | jq
# With details
curl -s 'http://localhost:8080/observe/indexes?details=true&limit=20' | jq

Key fields:

  • ready: true when all required indexes are built
  • counts.pending: indexes waiting to build
  • counts.running: indexes currently building
  • concurrentBuildsInProgress: parallel builds in progress
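
The ready flag is convenient for automation, for example to hold off search benchmarks until index builds finish. A minimal sketch using the ready and counts fields above:

Terminal window
# Wait for all required indexes, reporting pending/running counts while building
until curl -s http://localhost:8080/observe/indexes | jq -e '.ready == true' > /dev/null; do
  curl -s http://localhost:8080/observe/indexes | jq '{pending: .counts.pending, running: .counts.running}'
  sleep 10
done
echo "all indexes ready"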

IdentityScribe exposes PostgreSQL performance insights via the observe API.

Check for unused or underutilized indexes:

Terminal window
curl -s http://localhost:8080/observe/stats/index-usage | jq

Example response:

{
  "indexes": [
    {
      "schemaName": "public",
      "tableName": "entries",
      "indexName": "idx_entries_dn",
      "idxScan": 15420,
      "idxTupRead": 30840,
      "idxTupFetch": 30840,
      "size": "2 MB",
      "sizeBytes": 2097152,
      "usageStatus": "active"
    }
  ],
  "unusedCount": 2,
  "totalSizeUnused": "512 kB",
  "totalSizeUnusedBytes": 524288,
  "timestamp": "2026-01-24T12:00:00Z"
}

Usage status values:

Status     Meaning                                Action
active     Index is actively used (100+ scans)    Keep
low        Index has few scans (1-99)             Monitor
unused     Index has never been scanned           Consider dropping

Important: Index usage statistics reset on pg_stat_reset() or PostgreSQL restart. After a restart, all indexes will appear “unused” until they are scanned again. Wait for normal traffic patterns before making decisions about unused indexes.
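
To list only the drop candidates, filter the response by usageStatus. A minimal sketch using the fields from the example above, largest first:

Terminal window
# Unused indexes, largest first (remember the statistics-reset caveat above)
curl -s http://localhost:8080/observe/stats/index-usage \
  | jq -r '.indexes | map(select(.usageStatus == "unused")) | sort_by(.sizeBytes) | reverse | .[] | "\(.indexName) on \(.tableName) (\(.size))"'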

Query statistics (/observe/stats/queries) require the pg_stat_statements extension in PostgreSQL:

Terminal window
curl -s http://localhost:8080/observe/stats/queries | jq

If you see "error": "pg_stat_statements extension not available", see the Enabling pg_stat_statements guide.

Alert: scribe_query_permit_pressure > 0.8

Symptoms: Slow queries, rejected requests

Actions:

  1. Check /observe/doctor for query.permit_pressure
  2. Identify slow queries via traces or stage durations
  3. Consider increasing database.channelPoolSize
  4. Check for missing indexes: /observe/hints?severity=warning

Alert: scribe_ingest_task_pressure > 1.2

Symptoms: Stale data, high lag_seconds

Actions:

  1. Check /observe/stats/ingest for per-entry-type breakdown
  2. Check source LDAP connectivity
  3. Run VACUUM ANALYZE if DB is slow
  4. Consider increasing ingest workers

Alert: jvm_memory_pressure > 0.8

Symptoms: GC pauses, slow response times

Actions:

  1. Check /observe/doctor context section
  2. Review heap size configuration (-Xmx)
  3. Check for memory leaks via profiler
  4. Consider restarting if persistent

Alert: scribe_service_restarts_5m > 1

Symptoms: Intermittent failures, connection errors

Actions:

  1. Check /observe/doctor for services.restarts_5m
  2. Review logs for crash causes
  3. Check resource limits (CPU, memory)
  4. Check external dependencies (DB, LDAP)

Common PromQL queries for dashboards:

# Query throughput
rate(scribe_channel_requests_total[5m])
# Query latency p99
histogram_quantile(0.99, rate(scribe_channel_request_duration_seconds_bucket[5m]))
# Ingest rate
rate(scribe_ingest_events_written_total[5m])
# Error rate
sum(rate(scribe_channel_requests_total{result!="ok"}[5m])) / sum(rate(scribe_channel_requests_total[5m]))
# Pressure gauges (direct)
scribe_query_permit_pressure
scribe_ingest_queue_pressure
jvm_memory_pressure
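
To spot-check one of these expressions without opening Grafana, you can query the Prometheus HTTP API directly. A sketch, assuming Prometheus is reachable at localhost:9090; adjust the host for your monitoring stack:

Terminal window
# Evaluate a single expression against Prometheus
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=scribe_query_permit_pressure' \
  | jq '.data.result'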