
PromQL Recipes

Ready-to-use PromQL queries for Scribe metrics and signals.

# Request rate by channel
rate(scribe_channel_requests_total[5m])
# p99 latency by channel
histogram_quantile(0.99, rate(scribe_channel_request_duration_seconds_bucket[5m]))
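# Variant (sketch): p99 across all channels combined; summing by le merges the per-channel bucket series before the quantile is computed
histogram_quantile(0.99, sum by (le) (rate(scribe_channel_request_duration_seconds_bucket[5m])))
# Variant (sketch): mean request latency in seconds per series, assuming the standard _sum and _count series are exported alongside _bucket
rate(scribe_channel_request_duration_seconds_sum[5m])
/ rate(scribe_channel_request_duration_seconds_count[5m])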
# Error rate
sum(rate(scribe_channel_requests_total{result!="ok"}[5m]))
/ sum(rate(scribe_channel_requests_total[5m]))
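# Variant (sketch): request rate broken down by the result label used in the error-rate query
sum by (result) (rate(scribe_channel_requests_total[5m]))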
# Query permit pressure sustained high
avg_over_time(scribe_query_permit_pressure[5m]) > 0.9
# Ingest queue pressure by entry type
scribe_ingest_queue_pressure > 0.8
# Ingest falling behind
scribe_ingest_task_pressure > 1.2
# Server error rate (use this for health checks rather than the total error rate)
scribe_signals_server_error_rate_percent > 2.0
# p95 latency signal
scribe_signals_latency_p95 > 2.0
# Client error alerting (informational only)
scribe_signals_client_error_rate_percent > 25.0
# Ingest signals
scribe_signals_ingest_lag_max_seconds > 300
scribe_signals_ingest_failed_rate_percent > 1.0
# p95 by phase
histogram_quantile(0.95,
  sum by (phase, le) (rate(scribe_query_stage_duration_seconds_bucket[5m]))
)
# Which phase dominates?
sum by (phase) (rate(scribe_query_stage_duration_seconds_sum[5m]))
/ sum(rate(scribe_query_stage_duration_seconds_sum[5m]))
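# Variant (sketch): mean stage duration in seconds per phase, assuming the standard _count series is exported alongside _sum and _bucket
sum by (phase) (rate(scribe_query_stage_duration_seconds_sum[5m]))
/ sum by (phase) (rate(scribe_query_stage_duration_seconds_count[5m]))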
# Services currently down
scribe_service_up == 0
# Services that restarted recently
increase(scribe_service_restarts_total[1h]) > 0
# Flapping services (per-second rate of service state transitions; higher values mean more flapping)
rate(scribe_service_transitions_total[5m])
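# Variant (sketch): the five series (services) with the most state transitions over the last hour
topk(5, increase(scribe_service_transitions_total[1h]))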
# Ingest rate
rate(scribe_ingest_events_written_total[5m])
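# Variant (sketch): total ingest rate across all series, in events per second
sum(rate(scribe_ingest_events_written_total[5m]))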
# Pressure gauges (direct read)
scribe_query_permit_pressure
scribe_ingest_queue_pressure
jvm_memory_pressure

For dashboards, see the Grafana monitoring bundle.