Skip to content

Database

PostgreSQL database connection and pool configuration.

Collation guidance:

  • Recommended (multi-language prod): ICU collation “und-x-icu” for consistent

ordering across platforms. Use TEMPLATE template0 to guarantee your collation/ctype apply (template1 may carry different locale or extensions).

CREATE DATABASE "identity_scribe"
TEMPLATE template0
ENCODING = 'UTF8'
LC_COLLATE = 'und-x-icu'
LC_CTYPE = 'und-x-icu'
CONNECTION LIMIT = -1;
  • Performance-focused / ASCII-only prod: choose “C” for maximum speed; keep

it consistent across all environments.

CREATE DATABASE "identity_scribe"
TEMPLATE template0
ENCODING = 'UTF8'
LC_COLLATE = 'C'
LC_CTYPE = 'C'
CONNECTION LIMIT = -1;

Connection pool architecture

Identity Scribe uses three separate connection pools to isolate different workloads:

  • Batch pool - Used by: Transcription tasks (main workload) - Size: Configured via maxPoolSize (default: concurrency + 5, max: concurrency * 1.5) - Each transcription task uses one connection for its lifetime

  • System pool - Used by: System-level operations (commit coordination, migrations, maintenance, entry preparation) - Size: Configurable via systemPoolSize (optional) - Default: max(transcribeCount + 4, concurrency / 4), clamped between 2 and maxPoolSize / 2 - Scales with both number of transcribes and concurrency level - Isolates system operations from transcription tasks

  • Channel pool - Used by: LDAP channel services (IdentityHub, LDAP channels) - Size: Configurable via channelPoolSize (optional) - Default: concurrency, minimum 2 - Can experience heavy LDAP traffic with multiple channels - Isolates channel operations from batch and commit workloads

Total maximum connections (when all pools are fully utilized): = Batch Pool + System Pool + Channel Pool = maxPoolSize + systemPoolSize + channelPoolSize

Example with default settings (concurrency = 16, 2 transcribes):

  • Batch Pool: 16 + 5 = 21 connections (default maxPoolSize)
  • System Pool: max(2 + 4, 16/4) = max(6, 4) = 6 connections
  • Channel Pool: 16 connections
  • Total Maximum: 21 + 6 + 16 = 43 connections

Example with maxPoolSize explicitly set to 50 (concurrency = 16, 2 transcribes):

  • Batch Pool: 50 connections
  • System Pool: max(2 + 4, 16/4) = 6 connections (capped at 50/2 = 25)
  • Channel Pool: 16 connections
  • Total Maximum: 50 + 6 + 16 = 72 connections

Tuning guidelines:

  • Monitor pool metrics: Track hikari.connections.active, hikari.connections.idle,

and hikari.connections.pending for each pool

  • Signs of under-sizing: If pending connections are consistently high or active equals

maximum for extended periods, consider increasing pool size

  • Signs of over-sizing: If idle connections consistently exceed 50% of maximum,

consider reducing pool size to free database resources

  • High concurrency workloads: Many more tasks can be queued than

concurrency, but each task still uses one connection. Monitor actual connection utilization rather than task count.

Size of the channel connection pool. This pool handles LDAP channel operations (IdentityHub, LDAP channels).

Default: concurrency

Minimum: 2

Channels can experience heavy LDAP traffic, especially with multiple active channels. Increase for high-throughput read-heavy workloads.

Example: With concurrency = 16:

Default = 16 connections

Priority: SCRIBE_DATABASE_CHANNEL_POOL_SIZE > config

PropertyValue
Default16
OverrideSCRIBE_DATABASE_CHANNEL_POOL_SIZE (optional)
database.channel-pool-size = ${?SCRIBE_DATABASE_CHANNEL_POOL_SIZE}

Connection Hints (channel query defaults)

Session-level hints applied to channel queries (REST, LDAP, GraphQL, gRPC). All keys are unset by default (PostgreSQL defaults apply). Channels can override these defaults in their own connection-hints section. These hints control PostgreSQL session parameters for query execution. They are applied when opening a connection and restored when the query completes.

Lock acquisition timeout.

PostgreSQL will abort any statement that waits longer than this for a lock. Format: HOCON duration (e.g., 5s, 30s, 1m)

Priority: SCRIBE_DATABASE_LOCK_TIMEOUT > config

PropertyValue
Defaultnull
OverrideSCRIBE_DATABASE_LOCK_TIMEOUT (optional)
database.connection-hints.lock-timeout = ${?SCRIBE_DATABASE_LOCK_TIMEOUT}

Session flags (comma-separated).

Supported flags:

  • force-custom-plan: Forces PostgreSQL to use custom plans for prepared statements
  • jit-off: Disables JIT compilation

Priority: SCRIBE_DATABASE_SESSION_FLAGS > config

PropertyValue
Defaultnull
OverrideSCRIBE_DATABASE_SESSION_FLAGS (optional)
database.connection-hints.session-flags = ${?SCRIBE_DATABASE_SESSION_FLAGS}

database.connection-hints.statement-timeout

Section titled “database.connection-hints.statement-timeout”

Statement execution timeout (prevents runaway queries). PostgreSQL will abort any statement that takes longer than this duration. Format: HOCON duration (e.g., 30s, 1m, 500ms)

Priority: SCRIBE_DATABASE_STATEMENT_TIMEOUT > config

PropertyValue
Defaultnull
OverrideSCRIBE_DATABASE_STATEMENT_TIMEOUT (optional)
database.connection-hints.statement-timeout = ${?SCRIBE_DATABASE_STATEMENT_TIMEOUT}

PostgreSQL work_mem per-query memory allocation. Controls the amount of memory used for internal sort operations and hash tables. Format: PostgreSQL memory size (e.g., “64MB”, “256MB”, “1GB”) Valid units: B, kB, MB, GB, TB (case-insensitive)

Priority: SCRIBE_DATABASE_WORK_MEM > config

PropertyValue
Defaultnull
OverrideSCRIBE_DATABASE_WORK_MEM (optional)
database.connection-hints.work-mem = ${?SCRIBE_DATABASE_WORK_MEM}

Maintenance scheduling for tasks like vacuuming, re-indexing, etc.

Define a cron expression (unix cron format) to run the maintenance task at a specific time Default not set, eg no cron window

Priority: SCRIBE_DATABASE_MAINTENANCE_CRON > config

cron = ‘0 5 * * *’ # 5am daily

PropertyValue
OverrideSCRIBE_DATABASE_MAINTENANCE_CRON (optional)
database.maintenance.cron = ${?SCRIBE_DATABASE_MAINTENANCE_CRON}

If enabled, maintenance tasks will be scheduled Defaults to enabled unless readonly is set to true

Priority: SCRIBE_DATABASE_MAINTENANCE_ENABLED > config

PropertyValue
Defaulttrue
OverrideSCRIBE_DATABASE_MAINTENANCE_ENABLED (optional)
database.maintenance.enabled = ${?SCRIBE_DATABASE_MAINTENANCE_ENABLED}

Used after interval, to check if system is idle enough to run maintenance

Priority: SCRIBE_DATABASE_MAINTENANCE_HARD_THRESHOLDS > config

Defaults to:

hard-thresholds = {
scribe_db_connections_active = Math.ceil(Math.pow(Math.log(max-pool-size + channel-pool-size + system-pool-size), 2)) # DB pool load, eg many connections are active
}
PropertyValue
OverrideSCRIBE_DATABASE_MAINTENANCE_HARD_THRESHOLDS (optional)
database.maintenance.hard-thresholds = ${?SCRIBE_DATABASE_MAINTENANCE_HARD_THRESHOLDS}

Runs based on system load metrics, at least every 36 hours Only a well-known set of threshold metrics is supported (no labels/selectors):

  • scribe_ingest_tasks_active
  • scribe_ingest_task_pressure
  • scribe_ingest_queue_pressure
  • scribe_db_connections_active
  • scribe_ldap_connections_active
  • scribe_query_permit_pressure

Arbitrary metric names are NOT supported. These values are queried directly from ScribeMetrics (not Prometheus). The default value for interval is 36 hours, if no interval or cron is set

Priority: SCRIBE_DATABASE_MAINTENANCE_INTERVAL > config

PropertyValue
Default36 hours
OverrideSCRIBE_DATABASE_MAINTENANCE_INTERVAL (optional)
database.maintenance.interval = ${?SCRIBE_DATABASE_MAINTENANCE_INTERVAL}

Used after half of interval, to check if system is idle enough to run maintenance

Priority: SCRIBE_DATABASE_MAINTENANCE_SOFT_THRESHOLDS > config

Defaults to:

soft-thresholds = {
scribe_db_connections_active = Math.ceil(Math.log(max-pool-size + channel-pool-size + system-pool-size)) # DB pool load, eg not many connections are active
}
PropertyValue
OverrideSCRIBE_DATABASE_MAINTENANCE_SOFT_THRESHOLDS (optional)
database.maintenance.soft-thresholds = ${?SCRIBE_DATABASE_MAINTENANCE_SOFT_THRESHOLDS}

Maximum size of the batch connection pool. This pool is used by transcription tasks - each task uses one connection for its lifetime.

Default: concurrency + 5

Minimum: 5

Maximum: concurrency * 1.5 (rounded up) When setting this value, remember that the total connection count includes:

  • This batch pool (max-pool-size)
  • System pool (configurable via system-pool-size, default: max(transcribeCount + 4, concurrency / 4))
  • Channel pool (configurable via channel-pool-size, default: concurrency)

Example: With max-pool-size = 50, concurrency = 16, and 2 transcribes:

Total max connections = 50 (batch) + 6 (system) + 16 (channel) = 72 connections

Priority: SCRIBE_DATABASE_MAX_POOL_SIZE > config

PropertyValue
Default21
OverrideSCRIBE_DATABASE_MAX_POOL_SIZE (optional)
database.max-pool-size = ${?SCRIBE_DATABASE_MAX_POOL_SIZE}

Priority: SCRIBE_DATABASE_PASSWORD > config

PropertyValue
Default""
OverrideSCRIBE_DATABASE_PASSWORD (optional)
database.password = ${?SCRIBE_DATABASE_PASSWORD}

Prepared Statement Caching (advanced tuning)

Controls server-side prepared statements and per-connection caching. These improve performance by reusing query plans across executions. Memory footprint: poolSize * prepared-statement-cache-size

Example: 35 connections * 8 MiB = ~280 MiB

prepare-threshold: executions before server-side prepare (1 = immediate) prepared-statement-cache-queries: max cached statements per connection prepared-statement-cache-size: memory limit per connection (HOCON memory size syntax)

PropertyValue
Default1
database.prepare-threshold = 1

PropertyValue
Default256
database.prepared-statement-cache-queries = 256

PropertyValue
Default8 MiB
database.prepared-statement-cache-size = 8 MiB

Query Connection Limiter

Timeout for HTTP/GraphQL queries when the system is busy. When all connections are in use, HTTP and GraphQL queries wait up to this duration before returning 503 Service Unavailable with a Retry-After header. LDAP queries block up to their query time limit instead.

Default: 5 seconds

Priority: SCRIBE_DATABASE_QUERY_HTTP_ACQUISITION_TIMEOUT > config

PropertyValue
Default5s
OverrideSCRIBE_DATABASE_QUERY_HTTP_ACQUISITION_TIMEOUT (optional)
database.query-http-acquisition-timeout = ${?SCRIBE_DATABASE_QUERY_HTTP_ACQUISITION_TIMEOUT}

Retry policy for acquiring database connections. Unset fields inherit from the root retry block.

Initial delay between retries while waiting for database connections

Priority: SCRIBE_DATABASE_RETRY_INITIAL_DELAY > config

PropertyValue
Default100 milliseconds
OverrideSCRIBE_DATABASE_RETRY_INITIAL_DELAY (optional)
database.retry.initial-delay = ${?SCRIBE_DATABASE_RETRY_INITIAL_DELAY}

Randomized jitter added to each delay to avoid thundering herds

Priority: SCRIBE_DATABASE_RETRY_JITTER > config

PropertyValue
Default25 milliseconds
OverrideSCRIBE_DATABASE_RETRY_JITTER (optional)
database.retry.jitter = ${?SCRIBE_DATABASE_RETRY_JITTER}

Set to >0 to cap retries by attempt count instead of duration (0 = unlimited)

Priority: SCRIBE_DATABASE_RETRY_MAX_ATTEMPTS > config

PropertyValue
Default0
OverrideSCRIBE_DATABASE_RETRY_MAX_ATTEMPTS (optional)
database.retry.max-attempts = ${?SCRIBE_DATABASE_RETRY_MAX_ATTEMPTS}

Maximum delay between retries

Priority: SCRIBE_DATABASE_RETRY_MAX_DELAY > config

PropertyValue
Default5 seconds
OverrideSCRIBE_DATABASE_RETRY_MAX_DELAY (optional)
database.retry.max-delay = ${?SCRIBE_DATABASE_RETRY_MAX_DELAY}

Maximum time spent retrying before surfacing a timeout (leave unset to retry indefinitely)

Priority: SCRIBE_DATABASE_RETRY_MAX_DURATION > config

PropertyValue
Defaultnull
OverrideSCRIBE_DATABASE_RETRY_MAX_DURATION (optional)
database.retry.max-duration = ${?SCRIBE_DATABASE_RETRY_MAX_DURATION}

defaults to the root ssl configuration

The location of the root certificate for authenticating the server. File containing the root certificate when validating server (mode = “verify-ca” or “verify-full”). Default will be the file “root.crt” in “$HOME/.postgresql” (*nix) or “%APPDATA%\postgresql” (windows).

Priority: SCRIBE_DATABASE_SSL_CA > config > ssl.ca

PropertyValue
Default"relative/from/config/file/ca.pem"
OverrideSCRIBE_DATABASE_SSL_CA (optional)
database.ssl.ca = ${?SCRIBE_DATABASE_SSL_CA}

The location of the client’s SSL certificate File containing the SSL Certificate. Default will be the file “postgresql.crt” in “$HOME/.postgresql” (*nix) or “%APPDATA%\postgresql” (windows).

Priority: SCRIBE_DATABASE_SSL_CERT > config > ssl.cert

PropertyValue
Default"relative/from/config/file/cert.pem"
OverrideSCRIBE_DATABASE_SSL_CERT (optional)
database.ssl.cert = ${?SCRIBE_DATABASE_SSL_CERT}

Must be set to ‘true’ to enable SSL for the database

Priority: SCRIBE_DATABASE_SSL_ENABLED > config

PropertyValue
Defaultfalse
OverrideSCRIBE_DATABASE_SSL_ENABLED (optional)
database.ssl.enabled = ${?SCRIBE_DATABASE_SSL_ENABLED}

The location of the client’s PKCS#8 SSL key. File containing the SSL Key. Default will be the file “postgresql.pk8” in “$HOME/.postgresql” (*nix) or “%APPDATA%\postgresql” (windows).

Priority: SCRIBE_DATABASE_SSL_KEY > config > ssl.key

PropertyValue
Default"relative/from/config/file/cert.pem"
OverrideSCRIBE_DATABASE_SSL_KEY (optional)
database.ssl.key = ${?SCRIBE_DATABASE_SSL_KEY}

The password for the client’s ssl key.

Priority: SCRIBE_DATABASE_SSL_PASSWORD > config > ssl.password

PropertyValue
Default"..."
OverrideSCRIBE_DATABASE_SSL_PASSWORD (optional)
database.ssl.password = ${?SCRIBE_DATABASE_SSL_PASSWORD}

Size of the system connection pool. This pool handles commit coordination, migrations, maintenance, and entry preparation.

Default: max(transcribeCount + 4, concurrency / 4)

Minimum: 2

Maximum: max-pool-size / 2 The default scales with both the number of configured transcribes and concurrency level to handle workloads where multiple transcribes perform system operations simultaneously.

Example: With 4 transcribes and concurrency = 16:

Default = max(4 + 4, 16/4) = max(8, 4) = 8 connections

Priority: SCRIBE_DATABASE_SYSTEM_POOL_SIZE > config

PropertyValue
Default6
OverrideSCRIBE_DATABASE_SYSTEM_POOL_SIZE (optional)
database.system-pool-size = ${?SCRIBE_DATABASE_SYSTEM_POOL_SIZE}

https://www.prisma.io/dataguide/postgresql/short-guides/connection-uris https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database Multiple servers can be specified, separated by a comma, eg “postgres://server1,server2” Options can be set in the URL or as separate properties

Priority: SCRIBE_DATABASE_URL > config

url = “postgres:”${user.name}

PropertyValue
OverrideSCRIBE_DATABASE_URL (optional)
database.url = ${?SCRIBE_DATABASE_URL}

Priority: SCRIBE_DATABASE_USER > config

user = ${user.name}

PropertyValue
OverrideSCRIBE_DATABASE_USER (optional)
database.user = ${?SCRIBE_DATABASE_USER}