Monitor your Postgres performance via our new EF2.0 extension framework.


The PostgreSQL monitoring solution is based on a remote monitoring approach implemented as a Dynatrace ActiveGate extension. The extension queries PostgreSQL databases for key performance and health metrics. Dynatrace's Davis AI then analyzes these metrics to provide anomaly and problem analysis.
There must be connectivity between the ActiveGate, where the extension is deployed, and the Postgres database.
A database user with the proper permissions must be provided. Example:
```sql
CREATE USER dynatrace WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO dynatrace;
```
For top query monitoring:
- The [pg_stat_statements view](https://www.postgresql.org/docs/current/pgstatstatements.html#PGSTATSTATEMENTS) must be enabled.
- pg_stat_statements.track_planning must be turned on to enable plan fetching.
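One way to enable these prerequisites is sketched below. This is only an illustration, assuming superuser access and that a server restart is acceptable; adapt it to your own configuration management practice.

```sql
-- Sketch only: load the pg_stat_statements module.
-- Append it to any libraries already listed in shared_preload_libraries.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Restart PostgreSQL, then in the database to be monitored:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Enable plan tracking and reload the configuration.
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();
```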
For execution plan details monitoring:

A special dynatrace.dynatrace_execution_plan function must be created in the database to which you will connect and from which the execution plans will be fetched.
```sql
CREATE SCHEMA dynatrace;

CREATE OR REPLACE FUNCTION dynatrace.dynatrace_execution_plan(
    query text,
    OUT explain JSON
) RETURNS SETOF JSON
LANGUAGE plpgsql
VOLATILE
RETURNS NULL ON NULL INPUT
SECURITY DEFINER
ROWS 1
SET plan_cache_mode = force_generic_plan
AS
$$DECLARE
    arg_count integer;
    open_paren text;
    close_paren text;
    explain_cmd text;
    json_result json;
BEGIN
    /* reject statements containing a semicolon in the middle */
    IF pg_catalog.strpos(
           pg_catalog.rtrim(dynatrace_execution_plan.query, ';'),
           ';'
       ) OPERATOR(pg_catalog.>) 0 THEN
        RAISE EXCEPTION 'query string must not contain a semicolon';
    END IF;

    /* get the parameter count */
    SELECT count(*) INTO arg_count
    FROM pg_catalog.regexp_matches( /* extract the "$n" */
             pg_catalog.regexp_replace( /* remove single quoted strings */
                 dynatrace_execution_plan.query,
                 '''[^'']*''',
                 '',
                 'g'
             ),
             '\$\d{1,}',
             'g'
         );

    IF arg_count OPERATOR(pg_catalog.=) 0 THEN
        open_paren := '';
        close_paren := '';
    ELSE
        open_paren := '(';
        close_paren := ')';
    END IF;

    /* construct a prepared statement */
    EXECUTE
        pg_catalog.concat(
            'PREPARE _stmt_',
            open_paren,
            pg_catalog.rtrim(pg_catalog.repeat('unknown,', arg_count), ','),
            close_paren,
            ' AS ',
            dynatrace_execution_plan.query
        );

    /* construct an EXPLAIN statement */
    explain_cmd :=
        pg_catalog.concat(
            'EXPLAIN (FORMAT JSON, ANALYZE FALSE) EXECUTE _stmt_',
            open_paren,
            pg_catalog.rtrim(pg_catalog.repeat('NULL,', arg_count), ','),
            close_paren
        );

    /* get and return the plan */
    EXECUTE explain_cmd INTO json_result;
    RETURN QUERY SELECT json_result;

    /* delete the prepared statement */
    DEALLOCATE _stmt_;
END;$$;
```
Notice that the function above is defined with SECURITY DEFINER, meaning it executes with the privileges of the user who defined it, not the user executing it. This is because the permissions required to EXPLAIN a query are the same as those required to run that query, so the user that defines this function needs sufficient privileges to run PREPARE and EXPLAIN on the queries it will be explaining. For full functionality, ensure that the function is defined by an appropriate user.
The monitoring user will need to have the USAGE grant on the dynatrace schema:
```sql
GRANT USAGE ON SCHEMA dynatrace TO <username>;
```
In some PostgreSQL configurations you might also need to set search_path for the monitoring user:
```sql
ALTER USER dynatrace SET search_path TO dynatrace, public;
```
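To check the setup, one option (a sketch, assuming the function and grants above are in place) is to connect as the monitoring user and request a plan for a trivial statement:

```sql
-- Should return a single row containing the JSON execution plan for the statement.
SELECT * FROM dynatrace.dynatrace_execution_plan('SELECT 1');
```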
Dynatrace version 1.255+
To activate remote monitoring, select PostgreSQL.

The metrics collected through this extension consume Dynatrace Davis Data Units (see DDUs for metrics).
A rough estimate of the number of DDUs consumed by metric ingest can be obtained with the following formula:
((11 * number of instances) + (29 * number of databases) + (1 * number of tablespaces)) * 525.6 DDUs/year
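For example, an environment with 1 instance, 3 databases, and 2 tablespaces would consume roughly ((11 * 1) + (29 * 3) + (1 * 2)) * 525.6 = 100 * 525.6 = 52,560 DDUs per year.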
For logs, regular DDU consumption for log monitoring applies. Depending on your licensing model, refer either to DDU consumption for Log Management and Analytics or DDUs for Log Monitoring Classic.
If your license consists of Custom Metrics, each custom metric is equivalent to 525.6 DDUs/yr. For more information, see Metric Cost Calculation.
If both Dynatrace log monitoring is enabled and the pg_stat_statements view is available, Dynatrace ingests the top 100 queries (sorted by total execution time) every 5 minutes and stores them as logs. These logs are available either from the database instance screen or in the Databases app, under Top queries by total execution time.
To filter for these queries on a dashboard or in a notebook, filter by dt.extension.name = com.dynatrace.extension.postgres and event.group = top_queries. An example DQL query:
```
fetch logs
| filter dt.extension.name=="com.dynatrace.extension.postgres" and event.group=="top_queries"
| sort total_exec_time desc
```
Regardless of whether pg_stat_statements is available, Dynatrace still collects queries from pg_stat_activity as part of the Queries feature set; these are similarly ingested as logs with event.group = longest_queries and can be retrieved the same way (see the example below).
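A minimal sketch of such a filter, assuming the same log fields as in the example above:

```
fetch logs
| filter dt.extension.name=="com.dynatrace.extension.postgres" and event.group=="longest_queries"
```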
For SaaS users who have access to the Databases app and who have top query monitoring enabled (see previous section), fetching execution plans for these queries is possible. This can be done from the Databases app, under Statement performance, by clicking Request on the execution plan for a specific query.
For that query, the extension will then attempt to execute the following:
```sql
SELECT * FROM dynatrace.dynatrace_execution_plan({query})
```
and then ingest into Dynatrace the first row of the column named explain. These execution plans are ingested as logs with event.group = execution_plans.
When activating your extension using monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension has to collect at least one metric after activation.
In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.
All metrics that aren't categorized into any feature set are considered to be the default and are always reported.
A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
| Metric name | Metric key | Description |
|---|---|---|
| Active backend processes | postgres.activity.active | Number of server processes executing a query |
| Idle backend processes | postgres.activity.idle | Number of server processes waiting for a new client command |
| Idle in transaction backends processes | postgres.activity.idle_in_transaction | Number of server processes in transaction not currently executing a query |
| Idle in transaction aborted backends processes | postgres.activity.idle_in_transaction_aborted | Number of server processes in transaction not currently executing a query where one of the statements caused an error |
| Fast-path function backend processes | postgres.activity.fastpath_function_call | Number of server processes executing a fast-path function call |
| Metric name | Metric key | Description |
|---|---|---|
| Instance uptime | postgres.uptime | Time since the instance has been started |
| Metric name | Metric key | Description |
|---|---|---|
| Replication WAL restart delay | postgres.replication.restart_delay | Difference between current WAL LSN and the restart_lsn as reported by pg_replication_slots |
| Replication WAL confirmed flush lag | postgres.replication.confirmed_flush_lag | Difference between current WAL LSN and the confirmed_flush_lsn as reported by pg_replication_slots |
| Replication WAL write lag | postgres.replication.write_lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it, as reported by pg_stat_replication. |
| Replication WAL flush lag | postgres.replication.flush_lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it, as reported by pg_stat_replication. |
| Replication WAL replay lag | postgres.replication.replay_lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it, as reported by pg_stat_replication. |
| Metric name | Metric key | Description |
|---|---|---|
| Scheduled checkpoints performed | postgres.checkpoints_timed.count | Number of scheduled checkpoints that have been performed |
| Requested checkpoints performed | postgres.checkpoints_req.count | Number of requested checkpoints that have been performed |
| Checkpoints write time | postgres.checkpoint_write_time.count | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk |
| Checkpoint sync time | postgres.checkpoint_sync_time.count | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk |
| Buffers written during checkpoints | postgres.buffers_checkpoint.count | Number of buffers written during checkpoints |
| Buffers written by background writer | postgres.buffers_clean.count | Number of buffers written by the background writer |
| Cleaning scan stops | postgres.maxwritten_clean.count | Number of times the background writer stopped a cleaning scan because it had written too many buffers |
| Buffers written by backend | postgres.buffers_backend.count | Number of buffers written directly by a backend |
| Backend fsync executions | postgres.buffers_backend_fsync.count | Number of times a backend had to execute its own fsync call |
| Buffers allocated | postgres.buffers_alloc.count | Number of buffers allocated |
| Metric name | Metric key | Description |
|---|---|---|
| Latest transaction XID age | postgres.xid_age | Difference between the current transaction's XID and datfrozenxid. If this value exceeds 2^31, this can cause a database crash due to transaction ID wraparound. |
| Number of backends | postgres.numbackends | Number of backends currently connected to this database |
| Committed transactions | postgres.xact_commit.count | Number of transactions in this database that have been committed |
| Rolled back transactions | postgres.xact_rollback.count | Number of transactions in this database that have been rolled back |
| Block read from disk | postgres.blks_read.count | Number of disk blocks read in this database |
| Blocks found in buffer cache | postgres.blks_hit.count | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary |
| Live rows returned | postgres.tup_returned.count | Number of live rows fetched by sequential scans and index entries returned by index scans in this database |
| Live rows fetched by index scans | postgres.tup_fetched.count | Number of live rows fetched by index scans in this database |
| Rows inserted | postgres.tup_inserted.count | Number of rows inserted by queries in this database |
| Rows updated | postgres.tup_updated.count | Number of rows updated by queries in this database |
| Rows deleted | postgres.tup_deleted.count | Number of rows deleted by queries in this database |
| Queries canceled due to conflict | postgres.conflicts.count | Number of queries canceled due to conflicts with recovery in this database |
| Temporary files created | postgres.temp_files.count | Number of temporary files created by queries in this database |
| Data written to temporary files | postgres.temp_bytes.count | Total amount of data written to temporary files by queries in this database |
| Deadlocks | postgres.deadlocks.count | Number of deadlocks detected in this database |
| Data file blocks reading time | postgres.blk_read_time.count | Time spent reading data file blocks by backends in this database |
| Data file blocks writing time | postgres.blk_write_time.count | Time spent writing data file blocks by backends in this database |
| Database size | postgres.db_size | Size of the database in bytes |
| Data page checksum failures | postgres.checksum_failures.count | Number of data page checksum failures detected in this database. Only available if data checksums are enabled. |
| Time spent by sessions | postgres.session_time.count | Time spent by database sessions in this database |
| Time spent executing SQL statements | postgres.active_time.count | Time spent executing SQL statements in this database |
| Time spent idling | postgres.idle_in_transaction_time.count | Time spent idling while in a transaction in this database |
| Established sessions | postgres.sessions.count | Total number of sessions established |
| Abandoned sessions | postgres.sessions_abandoned.count | Number of database sessions to this database that were terminated because connection to the client was lost |
| Fatal error terminated sessions | postgres.sessions_fatal.count | Number of database sessions to this database that were terminated by fatal errors |
| Killed sessions | postgres.sessions_killed.count | Number of database sessions to this database that were terminated by operator intervention |
| Metric name | Metric key | Description |
|---|---|---|
| WAL diff size | postgres.wal_diff_size | Size of difference between current WAL and last WAL replay |
| WAL records per minute | postgres.wal_records.count | Number of WAL records generated per minute |
| WAL fpi per minute | postgres.wal_fpi.count | Number of WAL full page images generated per minute |
| WAL bytes | postgres.wal_bytes.count | Total amount of WAL generated in bytes |
| WAL buffers full | postgres.wal_buffers_full.count | Number of times WAL data was written to disk because WAL buffers became full |
| WAL write | postgres.wal_write.count | Number of times WAL buffers were written out to disk via XLogWrite request |
| WAL sync | postgres.wal_sync.count | Number of times WAL files were synced to disk via issue_xlog_fsync request |
| WAL write time | postgres.wal_write_time.count | Total amount of time spent writing WAL buffers to disk via XLogWrite request, in milliseconds |
| WAL sync time | postgres.wal_sync_time.count | Total amount of time spent syncing WAL files to disk via issue_xlog_fsync request, in milliseconds |
| Metric name | Metric key | Description |
|---|---|---|
| Tablespace size | postgres.tablespace.size | Tablespace size in bytes |
| Metric name | Metric key | Description |
|---|---|---|
| Instance recovery mode | postgres.recovery.state | Indicates whether the instance is in recovery mode: 1 if in recovery, 0 otherwise. |
| Metric name | Metric key | Description |
|---|---|---|
| Number of locks | postgres.locks | Number of locks as reported by pg_locks |
Top queries:

- ALTER statements are excluded from top query collection.

Execution plan details:

- The extensions:configuration.actions:write permission is required to trigger the execution plan fetching.
- The dynatrace.dynatrace_execution_plan function must have been created.

This extension will run from your Dynatrace ActiveGates and connect to the configured databases. Once the connection has been established, the extension will regularly run queries on the database to gather performance and health metrics, reporting the results back to Dynatrace.
Only SELECT queries are executed to collect data. To see exactly which queries are executed, download the extension YAML artifact by going to Release notes, opening a release, and pressing the Download version button.
From version 2.3.0 onwards, query execution frequency is controlled by the configuration variables query-interval and heavy-query-interval. Most of the queries executed by the extension will run every query-interval minutes (with a default of 1 minute), while the queries under the Queries feature set will run every heavy-query-interval minutes (with a default of 5 minutes).
For older versions, most queries run every minute, with exceptions for the heavy queries mentioned above, which run every 5 minutes.
In order to support a wide range of Postgres versions, we need to run several versions of the same queries at the same time, since Postgres has changed column names in several tables over time. As such, it is expected that some queries fail; as long as no data is missing, there is no cause for concern.