Expose LSN and replication delay as metrics (#7610)
## Problem
We currently have no way to see what the current LSN of a compute is,
and in the case of read replicas, we don't know what the difference in LSNs
is.

## Summary of changes
Adds these metrics: `current_lsn`, `replication_delay_bytes`, `replication_delay_seconds`, and `checkpoint_stats`.
Sasha Krassovsky authored May 8, 2024
1 parent 0457980 commit 21e1a49
Showing 1 changed file with 43 additions and 1 deletion.
44 changes: 43 additions & 1 deletion vm-image-spec.yaml
@@ -244,6 +244,49 @@ files:
        values: [approximate_working_set_size]
        query: |
          select neon.approximate_working_set_size(false) as approximate_working_set_size;
      - metric_name: current_lsn
        type: gauge
        help: 'Current LSN of the database'
        key_labels:
        values: [lsn]
        query: |
          select
            case
              when pg_catalog.pg_is_in_recovery()
              then pg_last_wal_replay_lsn()
              else pg_current_wal_lsn()
            end as lsn;
      - metric_name: replication_delay_bytes
        type: gauge
        help: 'Bytes between received and replayed LSN'
        key_labels:
        values: [replication_delay_bytes]
        query: |
          SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replication_delay_bytes;
      - metric_name: replication_delay_seconds
        type: gauge
        help: 'Time since last LSN was replayed'
        key_labels:
        values: [replication_delay_seconds]
        query: |
          SELECT
            CASE
              WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
              ELSE GREATEST (0, EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()))
            END AS replication_delay_seconds;
      - metric_name: checkpoint_stats
        type: gauge
        help: 'Number of requested and scheduled checkpoints'
        key_labels:
        values:
        - checkpoints_req
        - checkpoints_timed
        query: |
          SELECT checkpoints_req, checkpoints_timed FROM pg_stat_bgwriter;
  - filename: neon_collector_autoscaling.yml
    content: |
      collector_name: neon_collector_autoscaling
@@ -295,7 +338,6 @@ files:
        values: [approximate_working_set_size]
        query: |
          select neon.approximate_working_set_size(false) as approximate_working_set_size;
build: |
  # Build cgroup-tools
  #
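
For readers who want to check the arithmetic behind `replication_delay_bytes` and `replication_delay_seconds`, here is a minimal Python sketch. It is not part of the commit. The helper names `parse_lsn`, `lsn_diff_bytes`, and `replication_delay_seconds` are hypothetical. The sketch assumes an LSN is printed as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit WAL position), and that all inputs are non-null, as they would normally be on a read replica that has received and replayed WAL.

```python
from datetime import datetime, timedelta, timezone


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(lsn1: str, lsn2: str) -> int:
    """Mirror pg_wal_lsn_diff(lsn1, lsn2): bytes by which lsn1 is ahead of lsn2."""
    return parse_lsn(lsn1) - parse_lsn(lsn2)


def replication_delay_seconds(
    receive_lsn: str,
    replay_lsn: str,
    last_replay_ts: datetime,
    now: datetime,
) -> float:
    """Mirror the CASE expression in the replication_delay_seconds query.

    Assumption: every argument is non-null; SQL NULL handling is not modelled.
    """
    # Everything received has been replayed, so the replica is caught up.
    if parse_lsn(receive_lsn) == parse_lsn(replay_lsn):
        return 0.0
    # Otherwise report the time since the last replayed transaction,
    # clamped at zero like GREATEST(0, ...).
    return max(0.0, (now - last_replay_ts).total_seconds())


now = datetime(2024, 5, 8, 12, 0, 10, tzinfo=timezone.utc)
last_replay = now - timedelta(seconds=10)

print(lsn_diff_bytes("0/3000060", "0/3000000"))  # 96
print(replication_delay_seconds("0/3000060", "0/3000000", last_replay, now))  # 10.0
print(replication_delay_seconds("0/3000060", "0/3000060", last_replay, now))  # 0.0
```

The sketch compares parsed integers rather than raw strings, so two spellings of the same position (for example `0/1` and `0/00000001`) are treated as equal.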

1 comment on commit 21e1a49

@github-actions

3105 tests run: 2957 passed, 2 failed, 146 skipped (full report)


Failures on Postgres 14

  • test_heavy_write_workload[neon_off-github-actions-selfhosted-10-5-5]: release
  • test_storage_controller_many_tenants[github-actions-selfhosted]: release
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_heavy_write_workload[neon_off-release-pg14-github-actions-selfhosted-10-5-5] or test_storage_controller_many_tenants[release-pg14-github-actions-selfhosted]"
Flaky tests (1)

Postgres 15

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 31.4% (6313 of 20125 functions)
  • lines: 47.3% (47575 of 100666 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
21e1a49 at 2024-05-08T17:32:20.911Z :recycle:
