PromQL Recipes
Ready-to-use PromQL queries for Scribe metrics and signals.
Request rate and latency

    # Request rate by channel
    rate(scribe_channel_requests_total[5m])

    # p99 latency by channel
    histogram_quantile(0.99, rate(scribe_channel_request_duration_seconds_bucket[5m]))
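
`rate()` keeps every label on the bucket series, so the p99 query above is evaluated once per full label set. If the histogram carries labels beyond the channel, aggregating explicitly may be preferable. A minimal sketch, assuming the label is named `channel` (this page does not confirm the label name):

```promql
# p99 latency per channel, collapsing all other labels.
# `channel` is an assumed label name; `le` must be kept for histogram_quantile.
histogram_quantile(0.99, sum by (channel, le) (rate(scribe_channel_request_duration_seconds_bucket[5m])))
```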

    # Error rate
    sum(rate(scribe_channel_requests_total{result!="ok"}[5m])) / sum(rate(scribe_channel_requests_total[5m]))

Pressure alerts

    # Query permit pressure sustained high
    avg_over_time(scribe_query_permit_pressure[5m]) > 0.9

    # Ingest queue pressure by entry type
    scribe_ingest_queue_pressure > 0.8
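
The query permit alert averages over a window, while the ingest queue check matches on any single sample. If a sustained condition is wanted for the queue gauge as well, one option is to require every sample in a window to exceed the threshold. A sketch, assuming a 5-minute window is appropriate (the window is illustrative, not taken from this page):

```promql
# Ingest queue pressure above 0.8 for every sample in the last 5 minutes
# (the 5m window is an assumption; tune it to your scrape interval)
min_over_time(scribe_ingest_queue_pressure[5m]) > 0.8
```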

    # Ingest falling behind
    scribe_ingest_task_pressure > 1.2

Golden signals alerting

    # Server error rate (use this for health checks, not total error rate)
    scribe_signals_server_error_rate_percent > 2.0
    scribe_signals_latency_p95 > 2.0

    # Client error alerting (informational only)
    scribe_signals_client_error_rate_percent > 25.0

    # Ingest signals
    scribe_signals_ingest_lag_max_seconds > 300
    scribe_signals_ingest_failed_rate_percent > 1.0

Phase breakdown

    # p95 by phase
    histogram_quantile(0.95, sum by (phase, le) (rate(scribe_query_stage_duration_seconds_bucket[5m])))
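
A quantile shows tail behavior but not what a phase costs on average. As a rough complement, mean duration per phase can be derived from the `_sum` and `_count` series that Prometheus histograms expose alongside `_bucket`. A sketch, reusing the 5-minute window from the p95 query:

```promql
# Mean stage duration in seconds, by phase, over the last 5 minutes
sum by (phase) (rate(scribe_query_stage_duration_seconds_sum[5m]))
  / sum by (phase) (rate(scribe_query_stage_duration_seconds_count[5m]))
```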
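
    # Which phase dominates? (each phase's share of total stage time)
    # group_left is required because the total on the right has no phase label
    sum by (phase) (rate(scribe_query_stage_duration_seconds_sum[5m])) / ignoring(phase) group_left sum(rate(scribe_query_stage_duration_seconds_sum[5m]))

Service health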

    # Services currently down
    scribe_service_up == 0

    # Services that restarted recently
    increase(scribe_service_restarts_total[1h]) > 0
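
`scribe_service_up == 0` matches as soon as a single sample reports a service down. To ignore brief blips, one option is to require the gauge to stay at 0 across a whole window. A sketch, assuming the gauge only takes the values 0 and 1 and that 5 minutes is a reasonable window (both are assumptions):

```promql
# Services that have reported down for every sample in the last 5 minutes
# (assumes scribe_service_up is a 0/1 gauge; the 5m window is illustrative)
max_over_time(scribe_service_up[5m]) == 0
```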

    # Flapping services
    rate(scribe_service_transitions_total[5m])

Ingest monitoring

    # Ingest rate
    rate(scribe_ingest_events_written_total[5m])
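
A raw ingest rate is easier to judge against a baseline. One way to get a baseline is the `offset` modifier, which evaluates the same expression at an earlier time. A sketch comparing the current total rate with the rate one hour earlier (the one-hour offset is an arbitrary illustrative choice):

```promql
# Total ingest rate now relative to one hour ago (1.0 means unchanged)
sum(rate(scribe_ingest_events_written_total[5m]))
  / sum(rate(scribe_ingest_events_written_total[5m] offset 1h))
```

Both sides are aggregated with a bare `sum`, so the division matches on the empty label set and returns a single series.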

    # Pressure gauges (direct read)
    scribe_query_permit_pressure
    scribe_ingest_queue_pressure
    jvm_memory_pressure

For dashboards, see the Grafana monitoring bundle.