Observe, analyze, and optimize the usage, health, and performance of your PostgreSQL database.
The PostgreSQL monitoring solution is based on a remote monitoring approach implemented as a Dynatrace ActiveGate extension. The extension queries PostgreSQL databases for performance and health metrics. Dynatrace intelligence analyzes these metrics for anomalies and problems.
Designate an ActiveGate group or groups that will remotely connect to your PostgreSQL database server to pull data. All ActiveGates in each group must be able to connect to your PostgreSQL database server.
The remaining setup steps differ between self-hosted Postgres and cloud-managed Postgres services; both paths are described below.
Create a dedicated database user in your database instance. Dynatrace uses this user to run monitoring queries against your PostgreSQL database.
CREATE USER dynatrace WITH PASSWORD '<PASSWORD>' INHERIT;
In the Databases app, select a hosting type from the options. This choice determines which script generates the necessary database objects later in the process.
Set up the connection to your database instance. Provide the credentials of the dynatrace monitoring user you created, either directly in the wizard or through secure alternatives.
To monitor your database instance, configure specific settings in PostgreSQL. Dynatrace provides scripts that create the necessary database objects. Download the script to your host and run it as an admin user with sufficient permissions.
The script performs the following actions:
Grants the monitoring user membership in the pg_monitor role for read-only monitoring access.
GRANT pg_monitor TO dynatrace;
Creates a helper function that generates execution plans from SQL commands for deeper query insights.
CREATE SCHEMA dynatrace;

CREATE OR REPLACE FUNCTION dynatrace.dynatrace_execution_plan(
    query text,
    OUT explain JSON
) RETURNS SETOF JSON
LANGUAGE plpgsql
VOLATILE
RETURNS NULL ON NULL INPUT
SECURITY DEFINER
ROWS 1
SET plan_cache_mode = force_generic_plan
AS
$$
DECLARE
    arg_count integer;
    open_paren text;
    close_paren text;
    explain_cmd text;
    json_result json;
BEGIN
    /* reject statements containing a semicolon in the middle */
    IF pg_catalog.strpos(pg_catalog.rtrim(dynatrace_execution_plan.query, ';'),
                         ';') OPERATOR(pg_catalog.>) 0 THEN
        RAISE EXCEPTION 'query string must not contain a semicolon';
    END IF;

    /* get the parameter count */
    SELECT count(*) INTO arg_count
    FROM pg_catalog.regexp_matches( /* extract the "$n" */
             pg_catalog.regexp_replace( /* remove single quoted strings */
                 dynatrace_execution_plan.query,
                 '''[^'']*''',
                 '',
                 'g'),
             '\$\d{1,}',
             'g');

    IF arg_count OPERATOR(pg_catalog.=) 0 THEN
        open_paren := '';
        close_paren := '';
    ELSE
        open_paren := '(';
        close_paren := ')';
    END IF;

    /* construct a prepared statement */
    EXECUTE
        pg_catalog.concat(
            'PREPARE _stmt_',
            open_paren,
            pg_catalog.rtrim(pg_catalog.repeat('unknown,', arg_count), ','),
            close_paren,
            ' AS ',
            dynatrace_execution_plan.query);

    /* construct an EXPLAIN statement */
    explain_cmd :=
        pg_catalog.concat(
            'EXPLAIN (FORMAT JSON, ANALYZE FALSE) EXECUTE _stmt_',
            open_paren,
            pg_catalog.rtrim(pg_catalog.repeat('NULL,', arg_count), ','),
            close_paren);

    /* get and return the plan */
    EXECUTE explain_cmd INTO json_result;
    RETURN QUERY SELECT json_result;

    /* delete the prepared statement */
    DEALLOCATE _stmt_;
END;
$$;
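Once created, the function can be smoke-tested with a simple call (illustrative; any single read-only statement works as the argument):

```sql
SELECT * FROM dynatrace.dynatrace_execution_plan('SELECT 1');
```

The result is one row whose explain column contains the JSON execution plan.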
SECURITY DEFINER: the function executes with the privileges of the user who defined it, not the user who executes it.
EXPLAIN: requires the same permissions as running the query itself.
The user calling this function needs sufficient privileges to run PREPARE and EXPLAIN on the queries it will be explaining, as well as a USAGE grant on the dynatrace schema:
GRANT USAGE ON SCHEMA dynatrace TO <username>;
Set the search_path for the monitoring user:
ALTER USER dynatrace SET search_path TO dynatrace, public;
After these steps, metrics for the monitored PostgreSQL instance appear in the Databases app within 2–3 minutes. You can then select any instance to explore detailed metrics and performance insights.
You must run the script for the system to retrieve any database metrics. To learn more, refer to the helper function details in the Install the instance section.
Recommended
After running the creation script, run the validation script to confirm all required objects were created. This ensures the monitoring setup will work as expected.
Dynatrace supports both self-hosted monitoring and cloud-managed monitoring for Postgres databases.
Choose self-hosted Postgres for complete observability, execution plan analysis, automated onboarding, and advanced diagnostics.
Choose cloud-managed Postgres for reduced operational overhead with some monitoring limitations and manual configuration.
With self-hosted Postgres monitoring, you manage the database instance and infrastructure yourself. The extension collects data directly from the database using a read-only user. Complete the following setup to enable monitoring:
Configure database parameters.
Configure the following Postgres parameters in the postgresql.conf file and restart the server to apply the settings. For more details, see the Postgres documentation.
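As a sketch, a postgresql.conf fragment covering the parameters discussed in this guide might look like the following (values are illustrative; shared_preload_libraries must include pg_stat_statements for query-level statistics, and changing it requires a server restart):

```
# Illustrative postgresql.conf fragment -- adjust values to your environment
shared_preload_libraries = 'pg_stat_statements'   # enables query-level statistics
track_activity_query_size = 4096                  # keep SQL text up to 4096 characters
pg_stat_statements.max = 10000                    # track more normalized queries
pg_stat_statements.track_utility = off            # skip utility commands (PREPARE, EXPLAIN)
track_io_timing = on                              # time block reads and writes
```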
Grant the Dynatrace proxy access to the database.
Select and configure the ActiveGate group.
Cloud-managed PostgreSQL is a database provided as a service. Managed offerings such as AWS RDS, AWS Aurora, and Google Cloud SQL prevent direct database configuration. Enable the following features and settings to ensure full monitoring capability.
Add the pg_stat_statements extension.
This extension collects query-level statistics.
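Where the provider supports it, the extension is typically enabled per database with a statement along these lines (pg_stat_statements must also be loaded via shared_preload_libraries, which managed services usually expose through their parameter settings):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```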
Configure the following settings:
track_activity_query_size = 4096 Required
Enables the collection of larger queries by increasing the size of SQL text in pg_stat_activity. If left at the default value (1024), queries longer than 1024 characters aren't collected.
pg_stat_statements.max = 10000 Optional
Increases the number of normalized queries tracked in pg_stat_statements.
pg_stat_statements.track_utility = off Optional
Disables tracking of utility commands like PREPARE and EXPLAIN.
track_io_timing = on Optional
Collects timing information for block read and write operations in queries.
The metrics collected through this extension consume Dynatrace Davis Data Units (see DDUs for metrics).
A rough estimation of the amount of DDUs consumed by metric ingest can be obtained through the following formula:
( (11 * number of instances) + (29 * number of databases) + (1 * number of tablespaces) ) * 525.6 DDUs/year
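As a worked example, for a hypothetical environment with 1 instance, 5 databases, and 2 tablespaces, the formula evaluates as follows:

```sql
-- Hypothetical sizes: 1 instance, 5 databases, 2 tablespaces
SELECT ((11 * 1) + (29 * 5) + (1 * 2)) * 525.6 AS estimated_ddus_per_year;
-- (11 + 145 + 2) * 525.6 = 158 * 525.6 = 83044.8 DDUs/year
```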
For logs, regular DDU consumption for log monitoring applies. Depending on your licensing model, refer either to DDU consumption for Log Management and Analytics or DDUs for Log Monitoring Classic.
If your license consists of Custom Metrics, each custom metric is equivalent to 525.6 DDUs/yr. For more information, see Metric Cost Calculation.
If both Dynatrace log monitoring is enabled and the pg_stat_statements view is available, Dynatrace ingests the top 100 queries (sorted by total execution time) every 5 minutes and stores them as logs. These logs are available either from the database instance screen or in the Databases app, under Top queries by total execution time.
To filter by these queries on a dashboard or notebook, filter by dt.extension.name = com.dynatrace.extension.postgres and event.group = top_queries, as in the following DQL example:
fetch logs
| filter dt.extension.name == "com.dynatrace.extension.postgres" and event.group == "top_queries"
| sort total_exec_time desc
Regardless of whether pg_stat_statements is available, Dynatrace still collects queries from pg_stat_activity as part of the Queries feature set; these are similarly ingested as logs with event.group = longest_queries.
For SaaS users who have access to the Databases app and who have top query monitoring enabled (see previous section), fetching execution plans for these queries is possible. This can be done from the Databases app, under Statement performance, by clicking Request on the execution plan for a specific query.
For that query, the extension attempts to execute the following:
SELECT * FROM dynatrace.dynatrace_execution_plan({query})
and then ingest into Dynatrace the first row of the column named explain. These execution plans are ingested as logs with event.group = execution_plans.
The extension must establish a direct connection to the host being monitored; avoid connecting through a load balancer, proxy, or connection pooler (such as pg-pool).
If the connection switches between different hosts while the extension is running (for example, during failover), statistics differences may be computed across different DB instances. This can result in inaccurate metrics and misleading data.
When activating your extension using a monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension must collect at least one metric after activation.
In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.
All metrics that aren't categorized into any feature set are considered to be the default and are always reported.
A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
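Schematically, this inheritance can be pictured with an illustrative yaml fragment (the group, subgroup, and metric names below are made up for the example and are not the extension's actual definitions):

```yaml
# Illustrative sketch only -- names and keys are hypothetical
- group: replication
  featureSet: replication              # default for everything in this group
  subgroups:
    - subgroup: slots                  # no featureSet here: inherits "replication"
      metrics:
        - key: postgres.example.metric_a      # inherits the subgroup/group feature set
        - key: postgres.example.metric_b
          featureSet: replication_details     # metric-level value overrides both
```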
| Metric name | Metric key | Description |
|---|---|---|
| Instance uptime | postgres.uptime | Time since the instance has been started |
| Metric name | Metric key | Description |
|---|---|---|
| Replication WAL restart delay | postgres.replication.restart_delay | Difference between current WAL LSN and the restart_lsn as reported by pg_replication_slots |
| Replication WAL confirmed flush lag | postgres.replication.confirmed_flush_lag | Difference between current WAL LSN and the confirmed_flush_lsn as reported by pg_replication_slots |
| Replication WAL write lag | postgres.replication.write_lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it, as reported by pg_stat_replication. |
| Replication WAL flush lag | postgres.replication.flush_lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it, as reported by pg_stat_replication. |
| Replication WAL replay lag | postgres.replication.replay_lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it, as reported by pg_stat_replication. |
| Metric name | Metric key | Description |
|---|---|---|
| Scheduled checkpoints performed | postgres.checkpoints_timed.count | Number of scheduled checkpoints that have been performed |
| Requested checkpoints performed | postgres.checkpoints_req.count | Number of requested checkpoints that have been performed |
| Checkpoints write time | postgres.checkpoint_write_time.count | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk |
| Checkpoint sync time | postgres.checkpoint_sync_time.count | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk |
| Buffers written during checkpoints | postgres.buffers_checkpoint.count | Number of buffers written during checkpoints |
| Buffers written by background writer | postgres.buffers_clean.count | Number of buffers written by the background writer |
| Cleaning scan stops | postgres.maxwritten_clean.count | Number of times the background writer stopped a cleaning scan because it had written too many buffers |
| Buffers written by backend | postgres.buffers_backend.count | Number of buffers written directly by a backend |
| Backend fsync executions | postgres.buffers_backend_fsync.count | Number of times a backend had to execute its own fsync call |
| Buffers allocated | postgres.buffers_alloc.count | Number of buffers allocated |
| Metric name | Metric key | Description |
|---|---|---|
| Latest transaction XID age | postgres.xid_age | Difference between the current transaction's XID and datfrozenxid. If this value exceeds 2^31, this can cause a database crash due to transaction ID wraparound. |
| Number of backends | postgres.numbackends | Number of backends currently connected to this database |
| Committed transactions | postgres.xact_commit.count | Number of transactions in this database that have been committed |
| Rolled back transactions | postgres.xact_rollback.count | Number of transactions in this database that have been rolled back |
| Block read from disk | postgres.blks_read.count | Number of disk blocks read in this database |
| Blocks found in buffer cache | postgres.blks_hit.count | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary |
| Live rows returned | postgres.tup_returned.count | Number of live rows fetched by sequential scans and index entries returned by index scans in this database |
| Live rows fetched by index scans | postgres.tup_fetched.count | Number of live rows fetched by index scans in this database |
| Rows inserted | postgres.tup_inserted.count | Number of rows inserted by queries in this database |
| Rows updated | postgres.tup_updated.count | Number of rows updated by queries in this database |
| Rows deleted | postgres.tup_deleted.count | Number of rows deleted by queries in this database |
| Queries canceled due to conflict | postgres.conflicts.count | Number of queries canceled due to conflicts with recovery in this database |
| Temporary files created | postgres.temp_files.count | Number of temporary files created by queries in this database |
| Data written to temporary files | postgres.temp_bytes.count | Total amount of data written to temporary files by queries in this database |
| Deadlocks | postgres.deadlocks.count | Number of deadlocks detected in this database |
| Data file blocks reading time | postgres.blk_read_time.count | Time spent reading data file blocks by backends in this database |
| Data file blocks writing time | postgres.blk_write_time.count | Time spent writing data file blocks by backends in this database |
| Database Size | postgres.db_size | Size of the database in bytes |
| Data page checksum failures | postgres.checksum_failures.count | Number of data page checksum failures detected in this database. Only available if data checksums are enabled. |
| Time spent by sessions | postgres.session_time.count | Time spent by database sessions in this database |
| Time spent executing SQL statements | postgres.active_time.count | Time spent executing SQL statements in this database |
| Time spent idling | postgres.idle_in_transaction_time.count | Time spent idling while in a transaction in this database |
| Established sessions | postgres.sessions.count | Total number of sessions established |
| Abandoned sessions | postgres.sessions_abandoned.count | Number of database sessions to this database that were terminated because connection to the client was lost |
| Fatal error terminated sessions | postgres.sessions_fatal.count | Number of database sessions to this database that were terminated by fatal errors |
| Killed sessions | postgres.sessions_killed.count | Number of database sessions to this database that were terminated by operator intervention |
| Metric name | Metric key | Description |
|---|---|---|
| WAL diff size | postgres.wal_diff_size | Size of difference between current WAL and last WAL replay |
| WAL records per minute | postgres.wal_records.count | Number of WAL records generated per minute |
| WAL fpi per minute | postgres.wal_fpi.count | Number of WAL full page images generated per minute |
| WAL bytes | postgres.wal_bytes.count | Total amount of WAL generated in bytes |
| WAL buffers full | postgres.wal_buffers_full.count | Number of times WAL data was written to disk because WAL buffers became full |
| WAL write | postgres.wal_write.count | Number of times WAL buffers were written out to disk via XLogWrite request |
| WAL sync | postgres.wal_sync.count | Number of times WAL files were synced to disk via issue_xlog_fsync request |
| WAL write time | postgres.wal_write_time.count | Total amount of time spent writing WAL buffers to disk via XLogWrite request, in milliseconds |
| WAL sync time | postgres.wal_sync_time.count | Total amount of time spent syncing WAL files to disk via issue_xlog_fsync request, in milliseconds |
| Metric name | Metric key | Description |
|---|---|---|
| Tablespace size | postgres.tablespace.size | Tablespace size in bytes |
| Metric name | Metric key | Description |
|---|---|---|
| Instance recovery mode | postgres.recovery.state | Indicates whether the instance is in recovery mode: 1 if in recovery, 0 otherwise. |
| Metric name | Metric key | Description |
|---|---|---|
| Number of locks | postgres.locks | Number of locks as reported by pg_locks |
| Metric name | Metric key | Description |
|---|---|---|
| Active backend processes | postgres.activity.active | Number of server processes executing a query |
| Idle backend processes | postgres.activity.idle | Number of server processes waiting for a new client command |
| Idle in transaction backends processes | postgres.activity.idle_in_transaction | Number of server processes in transaction not currently executing a query |
| Idle in transaction aborted backends processes | postgres.activity.idle_in_transaction_aborted | Number of server processes in transaction not currently executing a query where one of the statements caused an error |
| Fast-path function backend processes | postgres.activity.fastpath_function_call | Number of server processes executing a fast-path function call |
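These activity counts correspond to backend states reported by pg_stat_activity; a comparable breakdown can be produced manually with a query such as:

```sql
-- Count backends by state, similar to the postgres.activity.* metrics
SELECT state, count(*) AS backends
FROM pg_stat_activity
WHERE state IS NOT NULL
GROUP BY state;
```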
Top queries:
ALTER statements are excluded from top query collection.
Execution plan details:
The extensions:configuration.actions:write permission is required to trigger the execution plan fetching.
Execution plans can be fetched only if the dynatrace.dynatrace_execution_plan function has been created.
This extension runs from your Dynatrace ActiveGates and connects to the configured databases. Once the connection has been established, the extension regularly runs queries on the database to gather performance and health metrics, reporting the results back to Dynatrace.
Only SELECT queries are executed to collect data. To see exactly which queries are executed, download the extension yaml artifact by going to Release notes, opening a release and pressing the Download version button.
From version 2.3.0 onwards, query execution frequency is controlled by the configuration variables query-interval and heavy-query-interval. Most of the queries executed by the extension run every query-interval minutes (default: 1 minute), while the queries under the Queries feature set run every heavy-query-interval minutes (default: 5 minutes).
For older versions, most queries run every minute, with exceptions for the heavy queries mentioned above, which run every 5 minutes.
To support a wide range of Postgres versions, the extension runs several versions of the same queries at the same time, since Postgres has changed column names in several tables over time. It is therefore expected that some queries fail; as long as no data is missing, there is no cause for concern.