Change Detection
Scribe uses three mechanisms to detect directory changes: persistent search, polling, and reconciliation. Each trades speed for coverage. The fastest fires first; the others fill the gaps.
This page covers each mechanism, when it activates, and the knobs that affect sync lag.
For the high-level data flow, see Architecture. For tuning worker counts and queue sizes, see Transcribes configuration.
Persistent search
When the LDAP directory supports it (Active Directory via DirSync, OpenLDAP via syncrepl, or the persistent search control), Scribe registers for real-time change notifications. The directory pushes changes as they happen.
This is the fastest path. Lag is typically sub-second — bounded by network round-trip and PostgreSQL write latency. Scribe holds the connection open and reconnects automatically if it drops.
Not all directories support persistent search, and some only support it for certain subtrees. When persistent search isn’t available for a transcribe, Scribe falls back to polling.
Polling
Scribe periodically queries the directory for entries modified since the last check. The idle period between polls defaults to 5 seconds (ldap.idle-period). The effective interval is the search duration plus that idle period.
Lag for a polled transcribe is roughly the effective poll interval plus processing time. For a directory with moderate change rates (hundreds of changes per interval), processing adds negligible overhead. For high-rate sources (thousands of changes per poll), consider lowering ldap.idle-period or increasing the worker count (transcribes.<name>.transcription.workers).
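As a rough worked example of the arithmetic above, the following Python sketch adds up the pieces. The function names are hypothetical, and the formula is only the approximation stated on this page, not Scribe's internal accounting.

```python
def effective_poll_interval(search_seconds: float,
                            idle_period_seconds: float = 5.0) -> float:
    """Search duration plus the idle period (ldap.idle-period, default 5 s)."""
    return search_seconds + idle_period_seconds


def approx_polling_lag(search_seconds: float,
                       processing_seconds: float,
                       idle_period_seconds: float = 5.0) -> float:
    """Approximate lag for a polled transcribe: effective interval + processing."""
    return effective_poll_interval(search_seconds, idle_period_seconds) + processing_seconds


if __name__ == "__main__":
    # Illustrative numbers only: a 1 s search and 0.5 s of processing.
    print(approx_polling_lag(search_seconds=1.0, processing_seconds=0.5))
    # Same workload with the idle period lowered to 2 s.
    print(approx_polling_lag(search_seconds=1.0, processing_seconds=0.5,
                             idle_period_seconds=2.0))
```

Note that lowering the idle period shortens only one term of the sum. If the search itself is slow, the effective interval stays dominated by search duration.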
Polling uses modifyTimestamp or an equivalent attribute to detect changes. It misses entries that were modified and then reverted between polls — reconciliation catches those.
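To make the timestamp-based query concrete, here is a minimal Python sketch of how a "modified since the last check" LDAP filter could be built. It is an illustration, not Scribe's actual query: the helper names are hypothetical, and the only assumptions are the standard LDAP GeneralizedTime format and the modifyTimestamp attribute mentioned above.

```python
from datetime import datetime, timezone


def to_generalized_time(moment: datetime) -> str:
    """Format an aware datetime as LDAP GeneralizedTime in UTC (YYYYmmddHHMMSSZ)."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


def changed_since_filter(last_check: datetime,
                         attribute: str = "modifyTimestamp") -> str:
    """Build an LDAP filter matching entries modified at or after last_check."""
    return f"({attribute}>={to_generalized_time(last_check)})"


if __name__ == "__main__":
    last_check = datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc)
    print(changed_since_filter(last_check))
```

Because the filter compares against the directory server's timestamps, any disagreement between clocks can shift the window, which is one reason reconciliation exists as a backstop.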
Reconciliation
Reconciliation is the safety net. It periodically walks the entire directory tree for a transcribe and compares each entry against the local copy. Anything that persistent search or polling missed gets picked up here.
Typical triggers for missed changes:
- Network partition during a persistent search connection
- Directory failover to a replica with slightly different state
- Entries modified while Scribe was shut down
- Clock skew causing modifyTimestamp queries to miss a window
When it runs
Reconciliation is pressure-aware. It waits for the system to be idle (low ingest queue pressure, no active tasks consuming significant resources) before starting a full scan. This prevents reconciliation from competing with real-time change processing during peak load.
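The idle check can be pictured as a simple gate. The sketch below is an assumption-laden illustration in Python: the metric names, the SystemPressure type, and the 10% queue threshold are all hypothetical and are not taken from Scribe.

```python
from dataclasses import dataclass


@dataclass
class SystemPressure:
    """Hypothetical snapshot of the signals a pressure-aware scheduler might read."""
    ingest_queue_depth: int      # change events waiting to be processed
    ingest_queue_capacity: int   # maximum queue size
    heavy_tasks_running: int     # tasks currently consuming significant resources


def may_start_reconciliation(pressure: SystemPressure,
                             max_queue_fill: float = 0.1) -> bool:
    """Allow a full scan only when no heavy task runs and the queue is nearly empty."""
    if pressure.heavy_tasks_running > 0:
        return False
    return pressure.ingest_queue_depth <= pressure.ingest_queue_capacity * max_queue_fill


if __name__ == "__main__":
    quiet = SystemPressure(ingest_queue_depth=3, ingest_queue_capacity=1000,
                           heavy_tasks_running=0)
    busy = SystemPressure(ingest_queue_depth=800, ingest_queue_capacity=1000,
                          heavy_tasks_running=0)
    print(may_start_reconciliation(quiet), may_start_reconciliation(busy))
```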
After startup, Scribe runs an initial reconciliation to catch anything that changed while it was down. Scheduled runs happen only when you set ldap.reconciliation.interval or ldap.reconciliation.cron (there is no default interval).
ETag deduplication
Every entry has an attribute hash (ETag) computed from its current attribute values. During reconciliation, Scribe compares the directory entry’s ETag against the stored one. If they match, the entry is skipped: no event is written and no database update happens.
This makes reconciliation cheap even for directories with millions of entries. A full scan of a million-entry directory typically takes a few minutes and writes zero events if nothing changed.
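The skip-on-match idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Scribe's implementation: the hash function (SHA-256 over a canonical JSON form), the normalization rules, and the function names are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List

Attributes = Dict[str, List[str]]


def compute_etag(attributes: Attributes) -> str:
    """Hash attribute values in a canonical form (assumed scheme, for illustration).

    Attribute names are lowercased and values sorted so that ordering
    differences between reads do not change the hash.
    """
    canonical = json.dumps(
        {name.lower(): sorted(values) for name, values in attributes.items()},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(directory_entries: Dict[str, Attributes],
              stored_etags: Dict[str, str]) -> List[str]:
    """Return the DNs whose ETag differs from the stored one, updating the store."""
    changed = []
    for dn, attributes in directory_entries.items():
        etag = compute_etag(attributes)
        if stored_etags.get(dn) == etag:
            continue  # unchanged: no event, no database write
        stored_etags[dn] = etag
        changed.append(dn)
    return changed


if __name__ == "__main__":
    entries = {"uid=ada,dc=example,dc=com": {"cn": ["Ada"], "mail": ["ada@example.com"]}}
    store: Dict[str, str] = {}
    print(reconcile(entries, store))  # first pass: entry is new
    print(reconcile(entries, store))  # second pass: nothing changed
```

On an unchanged directory the loop does only hashing and dictionary lookups, which is why a full scan can finish without writing anything.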
What affects lag
The gap between a change happening in the directory and that change being queryable through Scribe depends on several factors:
| Factor | Impact | Tuning |
|---|---|---|
| Persistent search support | Sub-second lag when available; falls back to poll interval otherwise | Directory-dependent — enable DirSync/syncrepl if possible |
| Poll interval | Directly adds to lag for polled transcribes | ldap.idle-period |
| Worker count | More workers process changes faster under load | transcribes.<name>.transcription.workers |
| PostgreSQL write throughput | Bottleneck under high change rates | Connection pool size, disk I/O, VACUUM ANALYZE |
| Network latency to LDAP | Adds to each poll and persistent search reconnect | Network topology |
Monitor lag with scribe_ingest_lag_seconds per entry type. Sustained lag above 60 seconds warrants investigation — see Monitoring for the diagnostic workflow.
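One way to interpret "sustained" is that several consecutive samples of scribe_ingest_lag_seconds all exceed the threshold. The Python sketch below shows that check; the function name and the choice of five consecutive samples are assumptions for illustration, and only the 60-second threshold comes from this page.

```python
from typing import Sequence


def lag_is_sustained(samples: Sequence[float],
                     threshold_seconds: float = 60.0,
                     min_consecutive: int = 5) -> bool:
    """True if the most recent min_consecutive samples all exceed the threshold."""
    if len(samples) < min_consecutive:
        return False
    return all(value > threshold_seconds for value in samples[-min_consecutive:])


if __name__ == "__main__":
    spike = [2.0, 3.0, 95.0, 4.0, 2.0, 3.0]           # one brief spike
    stuck = [10.0, 70.0, 82.0, 91.0, 105.0, 120.0]    # lag keeps climbing
    print(lag_is_sustained(spike), lag_is_sustained(stuck))
```

Requiring consecutive samples avoids flagging a single spike, such as the burst that follows a persistent search reconnect.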