Database Tuning

This guide covers tuning a PostgreSQL server that backs IdentityScribe. The settings below size memory, storage, parallelism, and WAL for IdentityScribe's workload.

For caller-side index selection and query-shape guidance, see Query performance.

All formulas assume a dedicated database server. If PostgreSQL shares a host with IdentityScribe, reduce memory fractions accordingly.

Workload profile

IdentityScribe's database workload has distinct characteristics that inform tuning:

Three isolated connection pools — a default total of ~43 connections serving different workloads: write-heavy transcription tasks, schema maintenance, and latency-sensitive query serving (LDAP, REST, GraphQL).
Partition-pruned leaf tables — entries_data is sub-partitioned by entry type and attribute, producing dozens of leaf partitions. At 10M entries, a typical deployment has 27+ indexes across 16 partitions.
Multiple index types per attribute — equality and range B-tree indexes, sort-order B-tree indexes, and GIN trigram indexes for substring search. GIN indexes dominate disk usage (the largest can exceed 1 GB per attribute).
Read-dominant steady state — equality lookups, range scans with cursor pagination, substring search, and sorted result sets. All served concurrently from LDAP, REST, and GraphQL channels.
Bursty writes — event-sourced single-row inserts at steady state; multi-row bulk writes during reconciliation. When IdentityScribe prepares a new entry type, GIN trigram indexes are built at runtime.

Recommended settings

Settings are parameterized by server RAM and CPU cores.

Memory

Setting	Formula	16 GB	32 GB	64 GB	Rationale
`shared_buffers`	RAM × 0.25	4 GB	8 GB	16 GB	Must hold hot partition data and GIN indexes for query serving. Undersizing causes buffer churn — every query evicts pages needed by the next
`effective_cache_size`	RAM × 0.75	12 GB	24 GB	48 GB	Reflects OS page cache availability so PostgreSQL can favor index-backed reads when hot data fits in cache
`work_mem`	RAM / (connections × 4)	16 MB	32 MB	64 MB	Per-sort-node per-connection. Too high risks OOM under load; too low forces disk sorts on sorted pagination queries
`maintenance_work_mem`	min(RAM × 0.05, 2 GB)	800 MB	1.6 GB	2 GB	GIN trigram index builds are memory-intensive. Directly affects how fast new entry types become available
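As a quick check of the table, the three RAM-proportional formulas can be evaluated in a few lines. This is a minimal sketch: the function name `memory_settings_mb` is hypothetical, results are truncated to whole megabytes, and `work_mem` is left out because it also depends on the connection count.

```python
def memory_settings_mb(ram_gb: float) -> dict:
    """Apply the Memory table formulas; all results are in MB."""
    ram_mb = ram_gb * 1024
    return {
        "shared_buffers": int(ram_mb * 0.25),
        "effective_cache_size": int(ram_mb * 0.75),
        # 5% of RAM, capped at 2 GB (2048 MB)
        "maintenance_work_mem": int(min(ram_mb * 0.05, 2048)),
    }


for ram_gb in (16, 32, 64):
    print(ram_gb, "GB:", memory_settings_mb(ram_gb))
```

The table rounds the results for readability, for example 819 MB is shown as 800 MB.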

Session-level `work_mem` override

The table above recommends server-level work_mem for postgresql.conf. IdentityScribe also sets work_mem = 16MB per query connection via database.connection-hints.session-flags.work-mem. This connection-level setting overrides the server default for IdentityScribe's connections.

16MB works for most deployments. If query performance degrades over time or under load, insufficient work_mem may be the cause: PostgreSQL then spills sort and hash operations to disk, which is orders of magnitude slower than in-memory processing.

Observe detects spills automatically. When it finds disk spilling, it recommends a work_mem value and provides a ready-to-use config snippet. You can also calculate manually:

Tuning formula

A safe starting point for work_mem given your server's total RAM and connection count:

work_mem = max(16MB, min(512MB, total_ram_mb / max_connections / 4))

For a server with 32GB RAM and 100 connections:

work_mem = max(16MB, min(512MB, 32768 / 100 / 4)) ≈ 82MB → round up to 96MB

Start at the next standard size above the formula result (16, 32, 64, 96, 128, 256, 512 MB). Monitor temp file counts after each increase — if they stop growing, you have enough.
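The formula and the round-up step can be sketched as follows. The function names are hypothetical, and the sketch assumes that a formula result which already equals a standard size is used as-is.

```python
STANDARD_SIZES_MB = (16, 32, 64, 96, 128, 256, 512)


def work_mem_formula_mb(total_ram_mb: float, max_connections: int) -> float:
    """max(16MB, min(512MB, total_ram_mb / max_connections / 4))."""
    return max(16, min(512, total_ram_mb / max_connections / 4))


def starting_work_mem_mb(total_ram_mb: float, max_connections: int) -> int:
    """Round the formula result up to the next standard size."""
    raw = work_mem_formula_mb(total_ram_mb, max_connections)
    for size in STANDARD_SIZES_MB:
        if size >= raw:
            return size
    return STANDARD_SIZES_MB[-1]


# 32 GB RAM, 100 connections: formula gives about 82 MB
print(starting_work_mem_mb(32768, 100))
```

For the 32GB / 100-connection example this returns 96, which is the value to put into the work-mem setting as "96MB".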

How to adjust

Override globally in the IdentityScribe config:

database.connection-hints.session-flags {
  work-mem = "64MB"
}

Or per-channel (e.g., only for REST API queries):

channels.rest.connection-hints.session-flags {
  work-mem = "128MB"
}

The value must match <number><unit> where unit is B, kB, MB, GB, or TB (e.g., "256MB", "1GB").
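If you generate configuration from scripts, a pre-flight check for this format might look like the sketch below. It assumes that <number> is an unsigned integer and that no whitespace is allowed between number and unit; IdentityScribe's own validation may differ, and the helper name is hypothetical.

```python
import re

# Assumption: unsigned integer immediately followed by a case-sensitive unit.
WORK_MEM_PATTERN = re.compile(r"[0-9]+(B|kB|MB|GB|TB)")


def is_valid_work_mem(value: str) -> bool:
    """Return True if value looks like <number><unit>, e.g. "256MB"."""
    return WORK_MEM_PATTERN.fullmatch(value) is not None


for candidate in ("256MB", "1GB", "64 MB", "64mb"):
    print(candidate, is_valid_work_mem(candidate))
```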

Risk

Raising work_mem increases memory pressure. A single query can use work_mem multiple times — once per sort or hash operation, and once per parallel worker. On a server running many concurrent queries, doubling work_mem across the board can trigger out-of-memory errors. Raise gradually and verify in Observe that temporary-file pressure falls before moving higher.
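A rough upper bound makes the multiplication visible. The sketch below assumes every concurrent query runs the same number of sort or hash operations, that each parallel worker plus the leader process can use the full work_mem per operation, and it ignores any extra allowance PostgreSQL gives hash operations. The function name and the example numbers are illustrative only.

```python
def worst_case_work_mem_mb(work_mem_mb: int, concurrent_queries: int,
                           operations_per_query: int,
                           parallel_workers: int = 0) -> int:
    """Upper bound on memory claimed through work_mem, in MB."""
    # The leader process executes the plan alongside its workers.
    processes_per_query = parallel_workers + 1
    return (work_mem_mb * operations_per_query
            * processes_per_query * concurrent_queries)


# 16 concurrent queries, 2 sort/hash operations each, 2 parallel workers
print(worst_case_work_mem_mb(16, 16, 2, 2))  # 1536 MB at work_mem = 16MB
print(worst_case_work_mem_mb(32, 16, 2, 2))  # 3072 MB at work_mem = 32MB
```

Real usage is normally far below this bound, but the bound grows linearly with work_mem, which is why gradual increases are safer.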

Storage

Setting	Value	Rationale
`random_page_cost`	1.1	SSD random reads are nearly as fast as sequential. Helps PostgreSQL favor index-backed reads, which is critical for partition-pruned queries
`effective_io_concurrency`	200	SSD can handle many concurrent read requests. Benefits bitmap heap scans on GIN trigram results

For HDD storage, use random_page_cost = 2.0 and effective_io_concurrency = 4.

Parallelism

Setting	Formula	4 cores	8 cores	16 cores
`max_parallel_workers_per_gather`	cores / 4	1	2	4
`max_parallel_maintenance_workers`	cores / 2	2	4	8
`max_parallel_workers`	cores / 2	2	4	8

IdentityScribe uses parallel query execution for equality lookups at scale — each equality filter is translated to a direct join predicate that PostgreSQL can split across worker processes via Gather Merge.

Maintenance parallelism is higher to speed up index creation when new entry types are prepared.
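The three formulas from the Parallelism table can be applied to a core count like this. The sketch uses integer division, which is an assumption for core counts that are not multiples of four, and the function name is hypothetical.

```python
def parallel_settings(cores: int) -> dict:
    """Apply the Parallelism table formulas using integer division."""
    return {
        # Below 4 cores this yields 0, which turns parallel query off.
        "max_parallel_workers_per_gather": cores // 4,
        "max_parallel_maintenance_workers": cores // 2,
        "max_parallel_workers": cores // 2,
    }


for cores in (4, 8):
    print(cores, "cores:", parallel_settings(cores))
```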

Partition-wise operations

Setting	Value	Rationale
`enable_partitionwise_join`	on	IdentityScribe's schema is partitioned by entry type and attribute. Without this, PostgreSQL joins full parent tables and prunes afterward. With it, joins target individual partitions directly
`enable_partitionwise_aggregate`	on	Enables per-partition GROUP BY aggregation before merging results. Benefits sort and cursor queries

These settings are off by default in PostgreSQL because partition-wise planning can use significantly more CPU time and memory during query planning. IdentityScribe enables both per-session automatically on every query connection, so they work out of the box without server configuration.

Setting them server-level in postgresql.conf is still recommended — it avoids the per-connection SET overhead and ensures all connections (including ad-hoc psql sessions and monitoring queries) benefit.

WAL and checkpoints

Setting	Value	Rationale
`wal_level`	replica	Required — event sourcing requires WAL for crash recovery and potential replication
`synchronous_commit`	on	Required — event sourcing demands durable writes. Losing committed events breaks sync state
`max_wal_size`	RAM × 0.125 (2–8 GB)	Larger WAL before forced checkpoint. Sync writes are bursty during reconciliation
`checkpoint_timeout`	10–15 min	Reduces checkpoint frequency. Default 5 min causes excessive I/O during sustained writes
`checkpoint_completion_target`	0.9	Spread checkpoint writes over 90% of the interval (default; keep as-is)
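The max_wal_size rule can be sketched as a clamp. This assumes "(2–8 GB)" in the table means the result is kept between 2 GB and 8 GB; the function name is hypothetical.

```python
def max_wal_size_gb(ram_gb: float) -> float:
    """RAM x 0.125, kept within the 2 to 8 GB range."""
    return min(8.0, max(2.0, ram_gb * 0.125))


for ram_gb in (8, 16, 32, 64, 128):
    print(ram_gb, "GB RAM ->", max_wal_size_gb(ram_gb), "GB max_wal_size")
```

Under that reading, a 32 GB server gets 4 GB, and anything from 64 GB upward stays at 8 GB.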

Connections

Setting	Formula	Rationale
`max_connections`	pool total + 15	Sum of all connection pools, plus headroom for admin, monitoring, and migrations. Default pool total is ~43, so 60 is a safe starting point

Autovacuum

Setting	Value	Rationale
`autovacuum`	on	Required — partitioned tables with frequent updates need regular vacuum to prevent bloat and maintain visibility maps
`autovacuum_vacuum_scale_factor`	0.05	More aggressive than default (0.2) — partitions are smaller, so the default leaves too many dead tuples proportionally
`autovacuum_analyze_scale_factor`	0.02	Keeps statistics fresh. Stale partition stats reduce pruning efficiency and can slow filter-heavy searches
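To see what the scale factors mean in practice: PostgreSQL triggers autovacuum on a table once its dead tuples exceed a base threshold plus the scale factor times the table's row count, and triggers analyze by the same rule applied to changed rows. The sketch below assumes the base thresholds are left at PostgreSQL's default of 50 and uses a hypothetical partition of one million rows.

```python
def autovacuum_trigger_rows(row_count: int, scale_factor: float,
                            base_threshold: int = 50) -> int:
    """Rows that must change before autovacuum (or analyze) runs."""
    return base_threshold + round(scale_factor * row_count)


partition_rows = 1_000_000  # hypothetical partition size
print(autovacuum_trigger_rows(partition_rows, 0.2))   # PostgreSQL default
print(autovacuum_trigger_rows(partition_rows, 0.05))  # recommended vacuum
print(autovacuum_trigger_rows(partition_rows, 0.02))  # recommended analyze
```

With the recommended values, vacuum starts after roughly 50,000 dead rows instead of roughly 200,000.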

Settings that must not be used in production

These values weaken durability, crash safety, backups, or routine maintenance. Never use them in production.

Setting	Unsafe value	Production value	Why
`wal_level`	minimal	replica	Minimal logs only what crash recovery needs, which rules out WAL archiving, replication, and `pg_basebackup`
`max_wal_senders`	0	10 (default)	Required for replication and `pg_basebackup`
`synchronous_commit`	off	on	Event sourcing requires durable commits
`fsync`	off	on (default)	Disabling risks unrecoverable data corruption on crash
`full_page_writes`	off	on (default)	Prevents partial page writes; required for crash recovery
`autovacuum`	off	on	Prevents table bloat and maintains statistics

Example configuration

For a 32 GB RAM / 8-core SSD server with default IdentityScribe pool sizes:

# Memory
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 32MB
maintenance_work_mem = 1536MB

# Storage (SSD)
random_page_cost = 1.1
effective_io_concurrency = 200

# Parallelism
max_parallel_workers_per_gather = 2
max_parallel_maintenance_workers = 4
max_parallel_workers = 4

# Partition-wise operations (recommended; IdentityScribe also sets these per session)
enable_partitionwise_join = on
enable_partitionwise_aggregate = on

# WAL and Checkpoints
max_wal_size = 4GB
checkpoint_timeout = 10min

# Connections
max_connections = 60

# Autovacuum (tuned for partitioned tables)
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.02

After restarting PostgreSQL, confirm buffer allocation with SHOW shared_buffers and monitor query performance on the Health and Monitoring dashboard.

IdentityScribe connection pools

IdentityScribe manages three separate HikariCP connection pools to isolate workloads. These are configured in the IdentityScribe config (not postgresql.conf).

Pool	Used by	Default size	Tuning guidance
Batch	Transcription tasks (main write workload)	`concurrency + 5`, clamped to `concurrency × 1.5`	Increase if `scribe_db_connections_pending` is consistently high and traces point at write work; decrease on memory-constrained hosts
System	Migrations, maintenance, health checks, DDL	`max(transcribeCount + 4, concurrency / 4)`, clamped `[2, max-pool-size / 2]`	Increase if maintenance windows overlap with heavy write load
Channel	REST, GraphQL, LDAP query serving	`concurrency`, minimum `2`	Increase if channel latency is dominated by connection acquisition waits (`scribe_db_connections_pending` with high query permit pressure)

The total connection count is the sum of all three pools. Ensure max_connections in postgresql.conf can accommodate the total (see the Connections table above).
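The default sizing rules can be worked through for a concrete case. This is a sketch under several assumptions: "clamped to concurrency × 1.5" is read as an upper bound, divisions are integer divisions, max-pool-size is the batch pool size, and the inputs concurrency = 16 and transcribeCount = 2 are hypothetical values chosen because they reproduce the ~43 total quoted earlier.

```python
def default_pool_sizes(concurrency: int, transcribe_count: int) -> dict:
    """Default pool sizes per the table, under the stated assumptions."""
    # Batch: concurrency + 5, with concurrency x 1.5 read as an upper bound.
    batch = min(concurrency + 5, int(concurrency * 1.5))
    # System: max(transcribeCount + 4, concurrency / 4),
    # clamped to [2, max-pool-size / 2].
    system = max(transcribe_count + 4, concurrency // 4)
    system = max(2, min(system, batch // 2))
    # Channel: concurrency, minimum 2.
    channel = max(concurrency, 2)
    return {"batch": batch, "system": system, "channel": channel}


pools = default_pool_sizes(concurrency=16, transcribe_count=2)
pool_total = sum(pools.values())
print(pools, "total:", pool_total, "max_connections >=", pool_total + 15)
```

With these inputs the total is 43, so the pool total + 15 rule gives 58, which rounds up to the suggested starting point of 60.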

A semaphore (default: channel pool size) caps concurrent query connections across all channels. HTTP and GraphQL queries that cannot obtain a permit wait up to query-http-acquisition-timeout (default: 5s); if no permit frees up in that time, they return 503 with a Retry-After header. LDAP queries block up to their query time limit instead.
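The HTTP-side behaviour can be illustrated with a small model. This is not IdentityScribe's implementation, only a sketch of the permit-with-timeout pattern; the class name, the tuple return shape, and the Retry-After value of "1" are placeholders.

```python
import threading


class QueryPermits:
    """Caps concurrent queries; waiters give up after a timeout."""

    def __init__(self, permits: int, acquisition_timeout_s: float = 5.0):
        self._semaphore = threading.BoundedSemaphore(permits)
        self._timeout_s = acquisition_timeout_s

    def run(self, query):
        """Run query() under a permit; return (status, headers, body)."""
        if not self._semaphore.acquire(timeout=self._timeout_s):
            # No permit became free in time: tell the client to retry.
            return 503, {"Retry-After": "1"}, None
        try:
            return 200, {}, query()
        finally:
            self._semaphore.release()


permits = QueryPermits(permits=2, acquisition_timeout_s=0.05)
print(permits.run(lambda: "result"))
```

A caller that receives 503 should back off for the indicated interval rather than retry immediately, since the permit pool is already saturated.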

Config	Env var	Default
`database.max-pool-size`	`SCRIBE_DATABASE_MAX_POOL_SIZE`	`concurrency + 5`
`database.system-pool-size`	`SCRIBE_DATABASE_SYSTEM_POOL_SIZE`	`max(transcribeCount + 4, concurrency / 4)`
`database.channel-pool-size`	`SCRIBE_DATABASE_CHANNEL_POOL_SIZE`	`concurrency`
`database.query-http-acquisition-timeout`	`SCRIBE_DATABASE_QUERY_HTTP_ACQUISITION_TIMEOUT`	`5s`

Adaptive filtered-search tuning

For filtered, sorted, prefix, and substring searches, IdentityScribe adjusts execution automatically based on live directory size, attribute distribution, and recent production traffic.

Narrow-match searches finish quickly when only a small share of rows match.
Sorted searches that narrow to a small set of entries finish quickly even over large directories. When a sorted or virtual-list-view browse is selected by entry distinguished names (one or many), a unique identifier (uuid/uoid), or a selective attribute value, IdentityScribe applies that narrowing before sorting instead of ordering the whole result set first.
Broad-match searches stay predictable when a large share of rows match.
Prefix searches reuse recent measurements for repeated patterns.
Range searches use maintained coverage data when that is expected to reduce work.

No manual tuning is required for standard deployments. The default behaviour is conservative during startup and becomes more specific as normal traffic provides evidence.

When to intervene

Leave adaptive search tuning enabled unless you see a repeatable regression on a specific attribute or search pattern. If that happens:

Capture a Query Diagnostic Report for the affected request.
Compare the affected request against the Health and Monitoring dashboard for database load, queueing, and timeout signals.
Contact Kenoxa support with the report and the time window. Support can provide deployment-specific overrides when a temporary safety valve is needed.

Do not tune the advanced search thresholds from symptoms alone. The same latency symptom can come from stale database statistics, undersized memory, connection pressure, or an attribute distribution that changed after a bulk import.

Statistics target

IdentityScribe dynamically computes the PostgreSQL statistics target before each maintenance refresh, adjusting it based on directory size. This keeps filter and sort performance more consistent at scale without the overhead of a blanket high target.

The computed target is logged at the start of each maintenance window.

Config	Env var	Default
`database.maintenance.statistics-target-ratio`	`SCRIBE_DATABASE_MAINTENANCE_STATISTICS_TARGET_RATIO`	`0.00005`

Set to 0 to disable dynamic tuning and use PostgreSQL's default.
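The exact computation is not documented here. As a rough mental model only, the sketch below assumes the target is the entry count multiplied by the ratio, never below PostgreSQL's default target of 100 and never above PostgreSQL's maximum of 10000. Treat the function and its bounds as illustrative assumptions, and rely on the logged value for the real target.

```python
from typing import Optional


def estimated_statistics_target(entry_count: int,
                                ratio: float = 0.00005) -> Optional[int]:
    """Illustrative estimate; None means dynamic tuning is disabled."""
    if ratio == 0:
        return None  # PostgreSQL's default applies
    # Assumed bounds: PostgreSQL default (100) and maximum (10000).
    return max(100, min(10000, round(entry_count * ratio)))


for entries in (1_000_000, 10_000_000, 500_000_000):
    print(entries, "->", estimated_statistics_target(entries))
```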

Sort index backfill

Sort indexes store a compact min/max range per entry for each sortable attribute. These indexes are built in the background at startup; the service remains fully operational during the build window.

Backfill concurrency auto-scales from available system pool resources, reserving enough connections for maintenance and health checks.

Config	Env var	Default	When to change
`database.sort-index-backfill.max-concurrent`	`SCRIBE_DATABASE_SORT_INDEX_BACKFILL_MAX_CONCURRENT`	Auto-scaled	Increase to shorten backfill window on high-spec systems; decrease to reduce read pressure during initial sync

Quick reference

Common IdentityScribe database configuration environment variables:

Env var	Default
`SCRIBE_DATABASE_MAX_POOL_SIZE`	`concurrency + 5`
`SCRIBE_DATABASE_SYSTEM_POOL_SIZE`	`max(transcribeCount + 4, concurrency / 4)`
`SCRIBE_DATABASE_CHANNEL_POOL_SIZE`	`concurrency`
`SCRIBE_DATABASE_QUERY_HTTP_ACQUISITION_TIMEOUT`	`5s`
`SCRIBE_DATABASE_MAINTENANCE_STATISTICS_TARGET_RATIO`	`0.00005`
`SCRIBE_DATABASE_SORT_INDEX_BACKFILL_MAX_CONCURRENT`	Auto-scaled